kodtabla
Rancz Lajos
csigaelektro at freemail.hu
Tue Mar 22 08:18:56 CET 2005
Hi!
>> The optimizer doesn't only optimize within a function!!! That's exactly
>> the point! It can work across the _whole_ source file. On an older
>> menu-handling / data-entry etc. program I gained about 2 kBytes this way,
>> because it merged the similar parts (which are logically a bit different,
>> so I had put them into separate obj files) once they ended up in one file.
>> That C program of several hundred lines (full of pointers - here the
>> original topic comes back :-) - with handling of several languages) could
>> have been optimized by hand as well, but I settled for the compiler.
>>
> That's interesting; a plain C compiler, for example, doesn't even touch the
> library functions. It invokes them with a call even if they are 6 bytes long.
If that is a problem, then don't use the C library! Isn't the whole point that we use something because it is easier, faster and simpler to use?
> There is no optimization going on at all. Searching for similar parts?
> Produce some documentation on that, and even then I won't quite believe it.
Well then, let's see:
EWAVR 4.10, Compiler reference, p. 123
Common subexpression elimination
Redundant re-evaluation of common subexpressions is by default eliminated at
optimization levels Medium and High. This optimization normally reduces both code
size and execution time. However, the resulting code might be difficult to debug.
Note: This option has no effect at optimization levels Low and None.
To read more about the command line option, see --no_cse, page 190.
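(A quick sketch from me, not from the manual - the function and the names are made up:)
/* The subexpression (i * stride + offs) appears twice; at Medium/High the
   compiler computes it once and reuses the result instead of repeating the
   multiply and add. */
int pick(const int *buf, int i, int stride, int offs)
{
    int a = buf[i * stride + offs];
    int b = buf[i * stride + offs] + 1;   /* same subexpression, reused */
    return a + b;
}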
Function inlining
Function inlining means that a simple function, whose definition is known at compile
time, is integrated into the body of its caller to eliminate the overhead of the call. This
optimization, which is performed at optimization level High, normally reduces
execution time, but increases code size. The resulting code might also be difficult to
debug.
The compiler decides which functions to inline. Different heuristics are used when
optimizing for speed and size.
Note: This option has no effect at optimization levels None, Low, and Medium.
To read more about the command line option, see --no_inline, page 190.
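(Again my own illustration, not the manual's:)
/* A tiny accessor like this is a typical inlining candidate at High: the
   call/return overhead disappears and the body is pasted into the caller. */
static int square(int x)
{
    return x * x;
}

int sum_of_squares(int a, int b)
{
    return square(a) + square(b);   /* both calls can be inlined */
}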
Code motion
Evaluation of loop-invariant expressions and common subexpressions are moved to
avoid redundant re-evaluation. This optimization, which is performed at optimization
level Medium, normally reduces code size and execution time. The resulting code might
however be difficult to debug.
Note: This option has no effect at optimization levels None, and Low.
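(My own sketch of what gets moved:)
/* The product (scale * gain) does not depend on the loop counter, so at
   Medium and above it can be hoisted out of the loop and computed once. */
void amplify(int *v, int n, int scale, int gain)
{
    int i;
    for (i = 0; i < n; i++)
        v[i] = v[i] * (scale * gain);   /* loop-invariant subexpression */
}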
Type-based alias analysis
When two or more pointers reference the same memory location, these pointers are said
to be aliases for each other. The existence of aliases makes optimization more difficult
because it is not necessarily known at compile time whether a particular value is being
changed.
Type-based alias analysis optimization assumes that all accesses to an object will take
place using its declared type or as a char type. This assumption lets the compiler detect
whether pointers may reference the same memory location or not.
Type-based alias analysis is performed at optimization level High. For ISO/ANSI
standard-conforming C or C++ application code, this optimization can reduce code size
and execution time. However, non-standard-conforming C or C++ code might result in
the compiler producing code that leads to unexpected behavior. Therefore, it is possible
to turn this optimization off.
Note: This option has no effect at optimization levels None, Low, and Medium.
To read more about the command line option, see --no_tbaa, page 191.
Example
short f(short * p1, long * p2)
{
    *p2 = 0;
    *p1 = 1;
    return *p2;
}
With type-based alias analysis, it is assumed that a write access to the short pointed to
by p1 cannot affect the long value that p2 points to. Thus, it is known at compile time
that this function returns 0. However, in non-standard-conforming C or C++ code these
pointers could overlap each other by being part of the same union. By using explicit
casts, you can also force pointers of different pointer types to point to the same memory
location.
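(To make the non-conforming case concrete - this is my addition, building on the f() above; x is a made-up global:)
short f(short * p1, long * p2);   /* the function from the example above */

long x;

short call_f_with_overlap(void)
{
    /* Both arguments point into the same long, so the write through p1
       changes what p2 points to. With type-based alias analysis the
       compiler may still return the cached 0; without it, the freshly
       written value is re-read. Deliberately non-conforming code. */
    return f((short *)&x, &x);
}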
Clustering of variables
When clustering of variables is enabled, static and global variables are arranged so that
variables that are accessed in the same function are stored close to each other. This
makes it possible for the compiler to use the same base pointer for several accesses.
Note: This option has no effect at optimization levels None and Low.
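(A small made-up example of the kind of layout this helps:)
/* All three file-scope variables are used in the same function, so with
   clustering enabled the compiler may place them next to each other and
   reach them through a single base pointer. */
static unsigned char head;
static unsigned char count;
static unsigned char buf[16];

void push(unsigned char c)
{
    buf[head] = c;
    head = (unsigned char)((head + 1) & 15);
    count++;
}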
Cross call
Common code sequences are extracted to local subroutines. This optimization, which is
performed at optimization level High, can reduce code size, sometimes dramatically, at
the expense of execution time and stack size. The resulting code might however be difficult
to debug. This optimization cannot be disabled using the #pragma optimize directive.
Note: This option has no effect at optimization levels None, Low, and Medium, unless
the option --do_cross_call is used.
To read more about related command line options, see --no_cross_call, page 189,
--do_cross_call, page 179, and --cross_call_passes, page 174.
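(My own sketch; the register names are invented:)
/* The wait loop and the final status write are the same instruction
   sequence in both functions; at High the cross-call optimization can pull
   such shared sequences into an internal subroutine and call it from both
   places, trading a little speed and stack for code size. */
static volatile unsigned char status;
static volatile unsigned char data_reg;

void send_a(unsigned char c)
{
    while (status & 0x01)          /* wait until the peripheral is ready */
        ;
    data_reg = c;
    status |= 0x80;
}

void send_b(unsigned char c)
{
    while (status & 0x01)          /* identical sequence as in send_a */
        ;
    data_reg = c ^ 0x55;           /* only this line differs */
    status |= 0x80;
}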
That was EW AVR; now let's look at Keil 7.5, Optimizing C Compiler and Library Reference, p. 63:
0 Constant Folding:
The compiler performs calculations that
reduce expressions to numeric constants, where possible.
This includes calculations of run-time addresses.
Simple Access Optimizing: The compiler optimizes access
of internal data and bit addresses in the 8051 system.
Jump Optimizing: The compiler always extends jumps to the
final target. Jumps to jumps are deleted.
1 Dead Code Elimination:
Unused code fragments and
artifacts are eliminated.
Jump Negation: Conditional jumps are closely examined to
see if they can be streamlined or eliminated by the inversion of
the test logic.
2 Data Overlaying:
Data and bit segments suitable for static
overlay are identified and internally marked. The BL51
Linker/Locator has the capability, through global data flow
analysis, of selecting segments which can then be overlaid.
3 Peephole Optimizing:
Redundant MOV instructions are
removed. This includes unnecessary loading of objects from
the memory as well as load operations with constants.
Complex operations are replaced by simple operations when
memory space or execution time can be saved.
4 Register Variables: Automatic variables and function
arguments are located in registers when possible.
Reservation of data memory for these variables is omitted.
Extended Access Optimizing: Variables from the IDATA,
XDATA, PDATA and CODE areas are directly included in
operations. The use of intermediate registers is not necessary
most of the time.
Local Common Subexpression Elimination: If the same
calculations are performed repetitively in an expression, the
result of the first calculation is saved and used further
whenever possible. Superfluous calculations are eliminated
from the code.
Case/Switch Optimizing: Code involving switch and case
statements is optimized as jump tables or jump strings (a small
example of this follows the list).
5 Global Common Subexpression Elimination:
Identical subexpressions within a function are calculated only once when
possible. The intermediate result is stored in a register and
used instead of a new calculation.
Simple Loop Optimizing: Program loops that fill a memory
range with a constant are converted and optimized.
6 Loop Rotation:
Program loops are rotated if the resulting
program code is faster and more efficient.
7 Extended Index Access Optimizing:
Uses the DPTR for
register variables where appropriate. Pointer and array
access are optimized for both execution speed and code size.
8 Common Tail Merging:
When there are multiple calls to a
single function, some of the setup code can be reused,
thereby reducing program size.
9 Common Block Subroutines:
Detects recurring instruction
sequences and converts them into subroutines. Cx51 even
rearranges code to obtain larger recurring sequences.
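(And a small example of the Case/Switch Optimizing point at level 4 - again my own, not Keil's:)
/* A dense switch on small consecutive values is the kind of code the
   level-4 optimization can turn into a compact jump table instead of a
   chain of compares. */
unsigned char map_state(unsigned char s)
{
    switch (s) {
    case 0:  return 10;
    case 1:  return 20;
    case 2:  return 40;
    case 3:  return 80;
    default: return 0;
    }
}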
>> Sure. I didn't say it turns bad code into good. It only takes care of the
>> grunt work, and in most cases that is enough.
>>
> Yeah... We got here gradually, after your initial wild statements. Which,
> after I quoted them back at your request, you then denied :)
Tut-tut, there we go again, getting personal and winking cynically. Tsk.
This is what I wrote:
>>> So let's say there are a lot of similar tests in a source file
>>> (if (something extremely complicated) {} else {}) - well, here the
>>> compiler notices that the condition is common - similar - and moves it
>>> out into a subroutine. This is d*mn hard to do in assembly, because you
>>> don't trivially spot the logical similarities. A lot can be gained from these.
What is so outrageous about that? How does it differ from the optimization features quoted above?
Regards,
Lajos