Performance Results
| AM2 up to 36% faster | SPEC CPU2000 up to 26% faster |
| HIMENO BMT up to 20% faster | SPEC OMP2001 World Record Results at 2P & 4P |
| POLYHEDRON up to 33% faster | STREAM up to 16% faster |
| QUANTUM MONTE CARLO up to 73% faster | |
At PathScale, we are focused on maximizing real-world application performance. With help from some of our partners, we have been working with a number of well-known application codes. The PathScale Compiler Suite has demonstrated performance advantages of up to 40% over the most well-known competitive Opteron compiler alternative. As always, the results on your specific codes will vary, but we are confident that using the PathScale C, C++, and Fortran compilers will bring immediate benefits to Opteron and Athlon64 users.
Do you have benchmark results you would like to share with other PathScale compiler customers? Send email to support@pathscale.com and we will work to include them on this page.
The PathScale Compiler Suite produces the fastest and most accurate results for AMD Opteron systems for the Polyhedron 2004 Fortran 90 and Fortran 77 benchmarks. The Polyhedron web pages demonstrate this -- http://www.polyhedron.co.uk/compare/linux/f90bench_AMD.html, http://www.polyhedron.co.uk/compare/linux/f77bench_AMD.html -- as they show that PathScale is the only compiler vendor with no red squares for 64-bit results (meaning that on no code is 64-bit PathScale 50% or more slower than the fastest compiler on a benchmark code).
Here are some comparisons obtained from those web pages:
Polyhedron 2004 F77 Benchmarks: 64-bit Compiler Comparisons
| Compiler | Geometric Mean Time in seconds | PathScale % Faster |
| PathScale 2.1 | 19.16 | |
| Commercial Compiler A | 25.54 | +33% |
| Commercial Compiler B | 22.75 | +19% |
| Commercial Compiler C | 29.26 | +53% |
Polyhedron 2004 F90 Benchmarks: 64-bit Compiler Comparisons
| Compiler | Geometric Mean Time in seconds | PathScale % Faster |
| PathScale 2.1 | 22.76 | |
| Commercial Compiler A | 26.25 | +15% |
| Commercial Compiler B | 28.41 | +25% |
| Commercial Compiler C | 35.30 | +55% |
PathScale 2.1 64-bit optimization flags:
F77: -O3 -LNO:fu=9 -OPT:div_split:fast_math:fast_sqrt -IPA:plimit=3500
F90: -Ofast -OPT:fast_math=on -WOPT:if_conv=off -LNO:fu=9:full_unroll_size=7000
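The "PathScale % Faster" columns in the Polyhedron tables above appear to follow from the geometric mean times as the ratio of the other compiler's time to the PathScale time, minus one. The sketch below recomputes them under that assumption; the function and variable names are illustrative and do not come from Polyhedron or PathScale.

```python
# Geometric mean run times in seconds (lower is better), copied from the
# Polyhedron 2004 tables above.
F77_TIMES = {
    "PathScale 2.1": 19.16,
    "Commercial Compiler A": 25.54,
    "Commercial Compiler B": 22.75,
    "Commercial Compiler C": 29.26,
}
F90_TIMES = {
    "PathScale 2.1": 22.76,
    "Commercial Compiler A": 26.25,
    "Commercial Compiler B": 28.41,
    "Commercial Compiler C": 35.30,
}


def percent_faster_by_time(baseline_seconds, other_seconds):
    """Percentage by which the baseline is faster when lower times win."""
    return (other_seconds / baseline_seconds - 1.0) * 100.0


def advantage_table(times, baseline="PathScale 2.1"):
    """Rounded percent-faster figure of the baseline over every other entry."""
    base = times[baseline]
    return {
        name: round(percent_faster_by_time(base, seconds))
        for name, seconds in times.items()
        if name != baseline
    }


if __name__ == "__main__":
    for label, times in (("F77", F77_TIMES), ("F90", F90_TIMES)):
        for name, pct in advantage_table(times).items():
            print(f"{label}: PathScale 2.1 vs {name}: +{pct}%")
```

Because these are timings, the slower compiler's time goes in the numerator; dividing the other way would give the percentage by which the competitor is slower, which is a smaller number.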
The PathScale Compiler Suite enables the highest performance results for both integer and floating point SPEC CPU2000 speed benchmarks for any AMD64-based Linux® system. The best evidence for this is that from October 2004 through August 26, 2005, on AMD processors and Linux operating systems, there have been 186 CPU2000 results published at www.spec.org using PathScale compilers and none with other compilers.
Since no results with competitive compilers have been published recently on the SPEC web site, we ran our own comparison against a competitive compiler, using the latest version of each, with the following results:
| Compiler | SPECint®2000 | SPECfp®2000 |
| PathScale™ v2.2.1 | 1598 | 1984 |
| PGI® Workstation 6.0-5 | 1269 | 1779 |
| % Faster for PathScale | +26% | +12% |
Benchmarks were run on a 2.2 GHz 1-CPU system with DDR400/PC3200 memory. Full details on the compiler flags and configuration used are available here. If anyone can provide us with improved base or peak optimization flags for the competitive compiler, we will be happy to use them and update these results.
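SPEC CPU2000 scores are higher-is-better, so the "% Faster for PathScale" row is presumably the ratio of the PathScale score to the competing score, minus one. A minimal sketch under that assumption, with illustrative names that are not part of any SPEC tooling:

```python
# SPEC CPU2000 scores (higher is better), copied from the table above.
SCORES = {
    "SPECint2000": {"PathScale v2.2.1": 1598, "PGI Workstation 6.0-5": 1269},
    "SPECfp2000": {"PathScale v2.2.1": 1984, "PGI Workstation 6.0-5": 1779},
}


def percent_faster_by_score(baseline_score, other_score):
    """Percentage by which the baseline leads when higher scores win."""
    return (baseline_score / other_score - 1.0) * 100.0


def spec_advantage(scores):
    """Rounded PathScale advantage for each benchmark suite in the table."""
    return {
        suite: round(
            percent_faster_by_score(
                results["PathScale v2.2.1"], results["PGI Workstation 6.0-5"]
            )
        )
        for suite, results in scores.items()
    }


if __name__ == "__main__":
    for suite, pct in spec_advantage(SCORES).items():
        print(f"{suite}: +{pct}%")
```

The same score-based ratio applies to any higher-is-better figure on this page, such as bandwidth or MFLOPS.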
Results Published by Our Partners Using the PathScale Compiler Suite -- Including Dual-Core Opteron Results
Recently, AMD and HP have chosen to submit SPEC CPU2000 results for dual-core Opteron (for example, Opteron Models 275 and 875) systems with PathScale Compilers. Also, IBM, HP, Fujitsu-Siemens, Sun, and AMD continue to choose the PathScale Compiler Suite to get the highest level of performance from their AMD64-based Linux® systems.
| [partner logo] | SPECint2000, SPECfp2000, SPECint_rate2000 (4 CPU, dual core), SPECfp_rate2000 (4 CPU, dual core), SPECint_rate2000 (2 CPU, dual core), SPECfp_rate2000 (2 CPU, dual core) |
| [partner logo] | SPECint2000, SPECfp2000, SPECint_rate2000 (4 CPU, dual core), SPECint_rate2000 (4 CPU, single core), SPECint_rate2000 (2 CPU), SPECfp_rate2000 (4 CPU, dual core), SPECfp_rate2000 (4 CPU, single core), SPECfp_rate2000 (2 CPU) |
| [partner logo] | SPECint2000, SPECfp2000, SPECint_rate2000 (4 CPU), SPECint_rate2000 (2 CPU), SPECfp_rate2000 (4 CPU), SPECfp_rate2000 (2 CPU) |
| [partner logo] | SPECint2000, SPECfp2000, SPECint_rate2000 (2 CPU), SPECfp_rate2000 (2 CPU) |
"Sun tested 2-way Sun Fire V20z and 4-way Sun Fire V40z servers using multiple SPEC benchmarks, including the SPEC® ompM2001 suite of OpenMP® benchmarks. The PathScale Compiler Suite helped Sun's AMD® Opteron® processor-based servers set world records for SPEC ompM2001 on two-processor and four-processor systems. The Sun/PathScale two-processor results were 29 percent faster (footnote 1) than previous-best Linux ompM2001 benchmarks using non-PathScale Fortran and C compilers. This 29 percent advantage, enabled in large part by PathScale compilers, far exceeds the eight percent faster clock rate of the newer Sun systems."
| SPECompM2001 (2 thread, 2 core) |
| SPECompM2001 (4 thread, 4 core) |
Footnote 1:
(1) About the SPEC OMPM2001 Results Reported Above:
Two-Processor Results: The Sun V40z server with PathScale Compiler Suite and 2.6 GHz AMD Opteron CPUs achieved a result of 6486 on a system with two cores, two chips and two threads. This comparison is based on the best performing two-processor Linux servers currently shipping, including previous results with a competitor's compiler on a 2.4 GHz Sun Java Workstation W2100z system [SPECompM2001 5085, two cores, two chips, two threads].
Four-Processor Results: The Sun V40z server with PathScale Compiler Suite and 2.6 GHz AMD Opteron CPUs achieved a result of 11223 on a system with four cores, four chips and four threads. This comparison is based on the best performing four-processor Linux servers currently shipping, including previous results on a 2.4 GHz Sun V40z system with a non-PathScale compiler [SPECompM2001 8694, four cores, four chips, four threads].
The PathScale Compiler Suite produces the highest single-CPU and OpenMP parallel STREAM results for any system powered by AMD CPUs.
OpenMP
| Machine ID (Higher results are better) | ncpus | COPY | SCALE | ADD | TRIAD |
| AMD_Opteron_848 (PathScale 2.0) | 4 | 15378 | 15845 | 15618 | 15921 |
| PathScale 2.2 | 4 | 16872 | 16932 | 16543 | 16545 |
| % Faster for PathScale 2.2 vs. 2.0 | | +8% | +7% | +6% | +4% |
Single CPU
| Machine ID (Higher results are better) | ncpus | COPY | SCALE | ADD | TRIAD |
| ASUS_SK8N_Opteron248 (EKOPath 2.0) | 1 | 4811 | 4782 | 4685 | 4682 |
| ASUS_SK8N_Opteron248 (Comm'l 64bit Compiler) | 1 | 4304 | 4251 | 4497 | 4458 |
| PathScale 2.2 | 1 | 4902 | 4871 | 4979 | 4987 |
| % Faster for PathScale 2.2 | +14% | +15% | +11% | +12% |
The above results are for STREAM benchmarks run on Opteron 248 (2.2 GHz) machines with DDR400 memory and are posted at http://www.cs.virginia.edu/stream/standard/Bandwidth.html. Results for both PathScale and our competitor are identified as 'ASUS_SK8N_Opteron248' and 'ASUS_SK8N_Opteron248 (1 CPU)'. Click on the data link at the right of those lines for more details on the submission.
* Results with 'PathScale 2.1' are on the same system as the 2.0 results on the STREAM web site and use the following optimization flags:
OpenMP: pathf90/pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp
Serial: pathf90/pathcc -O3 -CG:use_prefetchnta
HimenoBMT "The Performance Evaluation"
The PathScale Compiler Suite produces excellent single-CPU and 4-CPU OpenMP results using the popular Himeno benchmark: http://accc.riken.jp/HPC/HimenoBMT/index_e.html
Serial results on Opteron 2.2 GHz, PC3200
| Compiler | F77 MFLOPS | F90 MFLOPS | C MFLOPS |
| PathScale 2.0 | 1584 | 1189 | 267 |
| 64-bit Commercial Compiler | 1419 | 1125 | 141 |
| GNU compilers (3.4.3 & g95) | 1002 | 588 | 213 |
| PathScale Advantage | | | |
| vs. Commercial 64-bit compiler | +12% | +6% | +89% |
| vs. GNU compilers (3.4.3 & g95) | +58% | +102% | +25% |
4-thread OpenMP Results on 4-CPU (Microway) 2.2 GHz Opteron, PC3200 server
| Code | Compiler | 4-thread MFLOPS | PathScale Advantage |
| Original Himeno F77 OpenMP code | PathScale 2.0 | 1969 | |
| Original Himeno F77 OpenMP code | Commercial 64-bit compiler | 1691 | +16% |
| PathScale-modified* Himeno F77 OpenMP code | PathScale 2.0 | 5155 | |
| PathScale-modified* Himeno F77 OpenMP code | Commercial 64-bit compiler | 4309 | +20% |
* _System & Compiler Flag & source code Details_
SPEC® and the benchmark names SPECfp® and
SPECint® are registered trademarks of the Standard Performance Evaluation
Corporation. AMD, AMD Opteron, and combinations thereof are trademarks of
Advanced Micro Devices, Inc. Linux is a registered trademark of Linus
Torvalds. All other trademarks and company names mentioned are the property of
their respective owners.
| This code is used at the University of Utah's Meteorology Department for climate research. The code consists of several closely coupled modules and is parallelized with MPI. It is written with Fortran 95 constructs. |
| Results for this benchmark were run independently at the University of Utah and published with their permission. |
| Compiler | 1 CPU | 2 CPU | 4 CPU |
| PathScale v1.2 | 368.89 sec. | 201.88 sec. | 99.11 sec. |
| PGI v5.2 | 483.45 sec. | 253.38 sec. | 135.53 sec. |
| % Faster for PathScale | +31.1% | +25.5% | +36.7% |
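The timings in the table above can also be read for parallel scaling. The sketch below computes speedup and efficiency relative to each compiler's own single-CPU time; these derived figures are a reading of the table and were not reported by the University of Utah, and the helper names are illustrative.

```python
# Wall-clock times in seconds from the University of Utah table above,
# keyed by compiler and then by CPU count.
TIMES = {
    "PathScale v1.2": {1: 368.89, 2: 201.88, 4: 99.11},
    "PGI v5.2": {1: 483.45, 2: 253.38, 4: 135.53},
}


def speedup(times, ncpus):
    """Single-CPU time divided by the time measured on ncpus CPUs."""
    return times[1] / times[ncpus]


def efficiency(times, ncpus):
    """Speedup divided by CPU count; 1.0 would be perfect linear scaling."""
    return speedup(times, ncpus) / ncpus


if __name__ == "__main__":
    for compiler, times in TIMES.items():
        for ncpus in (2, 4):
            print(
                f"{compiler}: {ncpus} CPUs, "
                f"speedup {speedup(times, ncpus):.2f}, "
                f"efficiency {efficiency(times, ncpus):.2f}"
            )
```

Each speedup here is measured against the same compiler's 1-CPU run, so it describes how well the MPI code scales with that build rather than how the two compilers compare with each other.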
| Monte Carlo methods are extremely important in computational physics and related applied fields, and have many diverse applications. PathScale compilers do particularly well in Monte Carlo codes. |
| Results for this benchmark were run independently at Los Alamos National Laboratory and published with their permission. |
| Compiler | Time (lower number is better) | PathScale % Faster |
| PathScale v1.0 | 78.08 sec. | |
| PGI v5.1 | 135.80 sec. | 73.93% |
| GCC v3.4.0 | 111.01 sec. | 42.20% |
Compiler settings for Quantum Monte Carlo C++ application
PathScale v1.0: pathCC -64 -ansiE -Ofast -ffast-math
PGI v5.1: pgCC -Kieee -fastsse -O3 -Minline=levels:10 -Msafeptr=global -Mvect=sse -Mvect=assoc -Mvect=cachesize:1048576 -Mvect=prefetch
GCC v3.4.0: g++ -O3 -ffast-math -mtune=opteron -mfpmath=sse,387 -mieee-fp -m64
For more information on this benchmark go to: http://sourceforge.net/projects/qmcbeaver.







