A large amount of gcc data is available: executable performance, compilation performance, executable sizes, and various synthetic and real-world application benchmarks.
For compilation time performance, a couple of people are running tests.
In addition, the CSiBE people monitor compilation time; see their current time observations.
Also, SuSE tests a number of C++ cases for compile-time performance; see their periodic C++ application and compiler performance testing.
Apple is interested in the compilation performance of two files. One is a hunk of proprietary internal code, and the other is a pre-processed Qt file.
There are questions about coverage and style in the listed compile time benchmarks. It might be interesting to find our own compilation time benchmark, one that includes heavily templatized code.
Code size is routinely monitored by the Department of Software Engineering, University of Szeged, which hosts the GCC Code-Size Benchmark Environment (CSiBE).
SPEC runs at Red Hat courtesy of Diego Novillo.
SPEC runs at SuSE courtesy of Andreas Jaeger.
Apparently there are SPEC runs at IBM for powerpc/powerpc64, but these results are not public.
In addition, Scott Robert Ladd occasionally posts comprehensive benchmarking numbers with his own benchmarking suite. Unfortunately, his site is down at the moment, and archives are not available.
This is an attempt to survey current C++ performance for three compilers on one architecture.
Note that all compilers use the GNU C++ runtime.
The architecture is a 3 GHz P4 running Fedora Core 3. It might be nice to test on a 64-bit architecture, and it would probably be fun to compare against the other major proprietary compiler, the IBM xlC/Visual Age compiler. However, that compiler is not available for the selected architecture.
Current gcc-4.0.0 libstdc++ sources were built with each compiler, and the time taken to compile all 1761 files in the libstdc++ testsuite was then measured after rebooting the computer. Compile flags were varied to try to measure various parts of the compiler in isolation. This should be a pretty decent simulation of compiling a large C++ project.
Times were collected for various options:
-O0: This just times the C++ FE.
-O2: This times the C++ FE and the optimizers.
-O2 -include stdc++.h: This times the C++ FE and the optimizers, in conjunction with precompiled header support.
This can be measured with the command
time make check-compile
on current gcc sources.
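The measurement described above (compile every file once per flag set and total the time) can be sketched as a small harness. This is only an illustration under stated assumptions, not the script used for these results: the function names are hypothetical, the flag sets are taken from the column headings of the results table, and it records wall-clock time with `time.perf_counter`, whereas the figures on this page come from `time`.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def time_flag_sets(base_cmd, sources, flag_sets):
    """Compile every source once per flag set; return {label: total seconds}."""
    totals = {}
    for label, flags in flag_sets.items():
        totals[label] = sum(
            time_command([*base_cmd, *flags, src]) for src in sources
        )
    return totals


# Flag sets taken from the column headings of the results table.
FLAG_SETS = {
    "-O0": ["-O0"],
    "-O2": ["-O2"],
    "-O2 -include stdc++.h": ["-O2", "-include", "stdc++.h"],
}

# Hypothetical usage (the source file names are placeholders):
# totals = time_flag_sets(["g++", "-c", "-o", "/dev/null"],
#                         ["a.cc", "b.cc"], FLAG_SETS)
```

Keeping the compiler invocation in `base_cmd` lets the same loop be pointed at a different compiler driver without changing the timing code.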
In addition, two other pieces of commercial code were compiled. One (ipr.ii) is from Gaby and is a pre-processed file from a templatized library for representing C++, and the other (sc_wif_trace.ii) is from Joe Buck and comes from the SystemC reference implementation (see www.systemc.org for more info).
It does seem as if opinions vary widely on what style of C++ should be tested and what constitutes a good test case for measuring compilation speed. Users of C++ compilers have their own favorite pathological code examples. For that reason, we include these three tests instead of relying on one pre-processed sample. No doubt code samples can be constructed that highlight the deficiencies of any compiler, so examples should be taken with a grain of salt and should be re-sampled to account for new designs and techniques. For instance, meta-programming techniques, which already tax C++ front ends, might have different performance footprints on varying implementations.
A variety of tests were used to gauge the quality of produced executables.
oopack.gcc.exe Max=500000 Matrix=5000 Complex=200000 Iterator=500000
make check-performance
./check --benchmark
Measured time is system time, lower is better.
| compiler | check-compile -O0 | check-compile -O2 | check-compile -O2 -include stdc++.h | ipr.ii -O0 | ipr.ii -O2 | sc_wif_trace.ii -O0 | sc_wif_trace.ii -O2 |
|---|---|---|---|---|---|---|---|
| gcc-3.4.4 | 15:07.13 | 19:19.67 | 9:49.12 | 26.14 | 27.35 | 5.96 | 6.36 |
| gcc-4 | 11:48.25 | 15:08.62 | 8:05.08 | 36.54 | 32.10 | 4.38 | 5.46 |
| icc-8.1 | 11:46.67 | 12:11.89 | 8:59.50* | 32.22 | 220.36 | 3.18 | 4.03 |
Results for oopack, measurement is Mflops, higher is better.
| compiler | max C | max C++ | matrix C | matrix C++ | complex C | complex C++ | iterator C | iterator C++ |
|---|---|---|---|---|---|---|---|---|
| gcc-3.4.4 | 413.2 | 413.2 | 1041.7 | 1050.4 | 1428.6 | 114.2 | 1020.4 | 1075.3 |
| gcc-4 | 406.5 | 406.5 | 1041.7 | 1050.4 | 1403.5 | 114.5 | 1000 | 1075 |
| icc-8.1 | 500.0 | 543.5 | 1068.4 | 1059.3 | 1758.2 | 96.3 | 1098.9 | 1075.3 |
Results for complex dft, measurement is seconds, lower is better.
| compiler | complex dft |
|---|---|
| gcc-3.4.4 | 13.02 |
| gcc-4 | 4.27 |
| icc-8.1 | 13.92 |
Select results for botan, measurement is Mflops, higher is better.
| compiler | aes-192 | rc-5(16) | isaac | seal | widerwake4+1 | adler32 | tiger |
|---|---|---|---|---|---|---|---|
| gcc-3.4.4 | 39.48 | 21.67 | 210.65 | 100.00 | 210.83 | 2002.85 | 35.96 |
| gcc-4 | 35.38 | 31.23 | 238.50 | 154.60 | 199.67 | 537.19 | 35.46 |
| icc-8.1 | 56.33 | 45.88 | 350.73 | 279.71 | 317.83 | 2237.50 | 60.88 |
It looks like gcc-4 will compile faster than gcc-3.4 by a significant margin: on check-compile it takes about 22% less time at -O0. Most of this speed advantage looks to be from performance optimizations in the C++ FE or preprocessor, although it appears the move to tree-ssa did slightly improve things on the optimization front. This is good news for tree-ssa.
For best results, gcc-4 with pre-compiled header use should be encouraged. Even with gcc-3.4, pre-compiled headers cut compile times roughly in half.
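The arithmetic behind these compile-time conclusions can be checked with a short sketch. It converts the check-compile timings to seconds, treats the difference between -O2 and -O0 as the time spent in the optimizers (following the description of what each option measures), and computes the ratio of PCH to non-PCH time. The names are hypothetical, and the values "19:19.67" and "9:49.12" assume that the second colon sometimes printed in those table cells is a typo for a decimal point.

```python
def to_seconds(stamp):
    """Convert a 'minutes:seconds' string such as '15:07.13' to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


# check-compile timings copied from the results table (see the assumption
# about the decimal point in the lead-in).
CHECK_COMPILE = {
    "gcc-3.4.4": {"O0": "15:07.13", "O2": "19:19.67", "O2_pch": "9:49.12"},
    "gcc-4": {"O0": "11:48.25", "O2": "15:08.62", "O2_pch": "8:05.08"},
}


def summarize(times):
    """Return the optimizer overhead in seconds and the PCH/non-PCH ratio."""
    o0, o2, pch = (to_seconds(times[key]) for key in ("O0", "O2", "O2_pch"))
    return {
        "optimizer_overhead_s": round(o2 - o0, 2),
        "pch_ratio": round(pch / o2, 3),
    }


SUMMARY = {name: summarize(times) for name, times in CHECK_COMPILE.items()}

# Fraction of -O0 compile time saved by moving from gcc-3.4.4 to gcc-4.
O0_REDUCTION = 1 - (to_seconds(CHECK_COMPILE["gcc-4"]["O0"])
                    / to_seconds(CHECK_COMPILE["gcc-3.4.4"]["O0"]))
```

Under these assumptions the PCH ratio comes out at roughly 0.51 for gcc-3.4.4 and 0.53 for gcc-4, the optimizer overhead drops from about 253 to about 200 seconds, and the -O0 time falls by about 22%.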
There were problems creating pre-compiled headers with the Intel compiler, which are still being investigated. Although the options are different, it should in theory be possible to create and use a pre-compiled header with the Intel compiler in a manner equivalent to the GNU compiler. In practice, this does not work at the moment and so was not tested, which leads to inferior results for the Intel compiler. The GNU compiler's PCH implementation seems simpler to use and less restrictive, with better diagnostics.
Documentation of PCH strategies is weak with both compilers, although the GNU compiler has better debugging features for analyzing problems.
Results for the libstdc++ performance testsuite showed little difference between Intel and GNU compilers.
For the complex dft test, results indicate that the gcc-4 builtin complex functions are effective. The performance advantage that gcc enjoys here is likely to be short-lived, as Intel generally adds whatever builtins GNU comes up with to their implementation. However, in the near term (RHEL4) it will make sense to use GNU compilers for this type of code.
The botan results shown are a selection: the algorithms with the worst-case difference between the GNU and Intel compilers. Some of the performance differences are quite severe, and an attempt should be made to figure out what is going on and to get suitable regression test cases into gcc bugzilla.
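One way to decide which botan algorithms to investigate first is to rank them by the ratio between two compilers' results. The sketch below does this with the figures from the botan table; the function name is hypothetical, and "adler32" is assumed to be the Adler-32 checksum column of that table.

```python
# Column names and figures copied from the botan results table.
ALGORITHMS = ["aes-192", "rc-5(16)", "isaac", "seal",
              "widerwake4+1", "adler32", "tiger"]

RESULTS = {
    "gcc-3.4.4": [39.48, 21.67, 210.65, 100.00, 210.83, 2002.85, 35.96],
    "gcc-4": [35.38, 31.23, 238.50, 154.60, 199.67, 537.19, 35.46],
    "icc-8.1": [56.33, 45.88, 350.73, 279.71, 317.83, 2237.50, 60.88],
}


def rank_gaps(baseline, other):
    """Return (algorithm, other/baseline) pairs, largest ratio first."""
    ratios = [
        (name, o / b)
        for name, b, o in zip(ALGORITHMS, RESULTS[baseline], RESULTS[other])
    ]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


# Largest gaps between icc-8.1 and gcc-4, and between gcc-3.4.4 and gcc-4.
ICC_VS_GCC4 = rank_gaps("gcc-4", "icc-8.1")
GCC34_VS_GCC4 = rank_gaps("gcc-4", "gcc-3.4.4")
```

With these figures, adler32 tops both rankings: icc-8.1 is a little over four times faster than gcc-4 on it, and gcc-3.4.4 is also much faster than gcc-4, which suggests a gcc-4 regression worth a test case.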