Performance Results

The PathScale™ Compiler Suite is consistently proving to be the highest-performing 64-bit compiler suite for AMD Opteron™ systems running the Linux operating system. Relative to the competition, our performance advantages are clear.

 

AM2 up to 36% faster
SPEC CPU2000 up to 26% faster
HIMENO BMT up to 20% faster
SPEC OMP2001 World Record Results at 2P & 4P
POLYHEDRON up to 33% faster
STREAM up to 16% faster
QUANTUM MONTE CARLO up to 73% faster

At PathScale, we are focused on maximizing real-world application performance. With help from some of our partners, we have been working with a number of well-known application codes, and on these the PathScale Compiler Suite has demonstrated performance advantages of up to 40% over the best-known competing Opteron compiler. As always, results on your specific codes will vary, but we are confident that the PathScale C, C++, and Fortran compilers will bring immediate benefits to Opteron and Athlon 64 users.

Do you have benchmark results you would like to share with other PathScale compiler customers? Send email to support@pathscale.com and we will work to include them on this page.

POLYHEDRON

The PathScale Compiler Suite produces the fastest and most accurate results on AMD Opteron systems for the Polyhedron 2004 Fortran 90 and Fortran 77 benchmarks. The Polyhedron comparison pages (http://www.polyhedron.co.uk/compare/linux/f90bench_AMD.html and http://www.polyhedron.co.uk/compare/linux/f77bench_AMD.html) show that PathScale is the only compiler vendor with no red squares in the 64-bit results; that is, on no benchmark code is 64-bit PathScale 50% or more slower than the fastest compiler.

Here are some comparisons obtained from those web pages:

Polyhedron 2004 F77 Benchmarks: 64-bit Compiler Comparisons

Compiler                   Geometric Mean Time (seconds)   PathScale % Faster
PathScale 2.1              19.16
Commercial Compiler A      25.54                           +33%
Commercial Compiler B      22.75                           +19%
Commercial Compiler C      29.26                           +53%

 

Polyhedron 2004 F90 Benchmarks: 64-bit Compiler Comparisons

Compiler                   Geometric Mean Time (seconds)   PathScale % Faster
PathScale 2.1              22.76
Commercial Compiler A      26.25                           +15%
Commercial Compiler B      28.41                           +25%
Commercial Compiler C      35.30                           +55%

PathScale 2.1 64-bit optimization flags:
F77: -O3 -LNO:fu=9 -OPT:div_split:fast_math:fast_sqrt -IPA:plimit=3500
F90: -Ofast -OPT:fast_math=on -WOPT:if_conv=off -LNO:fu=9:full_unroll_size=7000
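Both Polyhedron tables summarize each compiler by the geometric mean of its per-benchmark run times, and the "% Faster" column appears to be the competitor's time divided by PathScale's time, minus one (this reproduces every tabulated figure). The Python sketch below shows that arithmetic on the F77 numbers. The red-square check encodes our reading of the Polyhedron rule described earlier (a time at least 1.5 times the fastest compiler's time); all function names are illustrative and not part of any Polyhedron tooling.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive values, computed in log space."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one positive value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def percent_faster_by_time(other_time, pathscale_time):
    """Lower time is better, so the advantage is other / pathscale - 1."""
    return (other_time / pathscale_time - 1.0) * 100.0


def is_red_square(time, fastest_time):
    """Assumed reading of the red-square rule: 50% or more slower than the fastest."""
    return time >= 1.5 * fastest_time


# Geometric-mean times (seconds) from the F77 table above.
F77_PATHSCALE = 19.16
F77_OTHERS = {
    "Commercial Compiler A": 25.54,
    "Commercial Compiler B": 22.75,
    "Commercial Compiler C": 29.26,
}

for name, t in F77_OTHERS.items():
    print(f"{name}: +{percent_faster_by_time(t, F77_PATHSCALE):.0f}%")
```

Applying geometric_mean to the individual benchmark times listed on the Polyhedron pages would give the summary times used in the tables; the per-code times are not reproduced here.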



SPEC® CPU2000

The PathScale Compiler Suite enables the highest SPEC CPU2000 speed results, for both integer and floating point, on any AMD64-based Linux® system. The best evidence for this is that from October 2004 through August 26, 2005, 186 CPU2000 results for AMD processors running Linux were published at www.spec.org using PathScale compilers, and none were published using other compilers.

Because no recent results using competing compilers have been published on the SPEC web site, we ran our own comparison against a competing compiler, using the latest release of each, with the following results:

Compiler                    SPECint®2000   SPECfp®2000
PathScale™ v2.2.1           1598           1984
PGI® Workstation 6.0-5      1269           1779
% Faster for PathScale      +26%           +12%

Benchmarks were run on a 2.2 GHz single-CPU system with DDR400/PC3200 memory. Full details on the compiler flags and configuration used are available here. If anyone can provide improved base or peak optimization flags for the competing compiler, we will be happy to use them and update these results.
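SPEC CPU2000 scores are higher-is-better, so the advantage is PathScale's score divided by the competitor's, minus one. As we understand the CPU2000 rules, each benchmark's ratio is 100 times the reference machine's run time divided by the measured run time, and the overall metric is the geometric mean of those ratios. The Python helpers below are a sketch of that calculation under that assumption, not SPEC's own tools; the inputs in the usage line are the scores from the table above.

```python
import math


def spec_ratio(reference_seconds, measured_seconds):
    """Per-benchmark ratio (assumed CPU2000 convention): 100 x reference / measured."""
    return 100.0 * reference_seconds / measured_seconds


def spec_score(ratios):
    """Overall metric: geometric mean of the per-benchmark ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def percent_faster_by_score(pathscale_score, other_score):
    """Higher score is better, so the advantage is pathscale / other - 1."""
    return (pathscale_score / other_score - 1.0) * 100.0


# Scores from the comparison table above: (metric, PathScale v2.2.1, PGI 6.0-5).
for metric, ps, pgi in [("SPECint2000", 1598, 1269), ("SPECfp2000", 1984, 1779)]:
    print(f"{metric}: +{percent_faster_by_score(ps, pgi):.0f}%")
```

Because the overall score is a geometric mean, a large gain on one benchmark moves the total less than it would with an arithmetic mean.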

Results Published by Our Partners Using the PathScale Compiler Suite -- Including Dual-Core Opteron Results

Recently, AMD and HP have chosen to submit SPEC CPU2000 results for dual-core Opteron systems (for example, Opteron Models 275 and 875) using PathScale compilers. IBM, HP, Fujitsu-Siemens, Sun, and AMD also continue to choose the PathScale Compiler Suite to get the highest level of performance from their AMD64-based Linux® systems.

AMD:
SPECint2000
SPECfp2000
SPECint_rate2000 (4 CPU, dual core)
SPECfp_rate2000 (4 CPU, dual core)
SPECint_rate2000 (2 CPU, dual core)
SPECfp_rate2000 (2 CPU, dual core)

HP:
SPECint2000
SPECfp2000
SPECint_rate2000 (4 CPU, dual core)
SPECint_rate2000 (4 CPU, single core)
SPECint_rate2000 (2 CPU)
SPECfp_rate2000 (4 CPU, dual core)
SPECfp_rate2000 (4 CPU, single core)
SPECfp_rate2000 (2 CPU)

Sun Microsystems:
SPECint2000
SPECfp2000
SPECint_rate2000 (4 CPU)
SPECint_rate2000 (2 CPU)
SPECfp_rate2000 (4 CPU)
SPECfp_rate2000 (2 CPU)

IBM:
SPECint2000
SPECfp2000
SPECint_rate2000 (2 CPU)
SPECfp_rate2000 (2 CPU)

Fujitsu Siemens:
SPECint2000
SPECfp2000
SPECint_rate2000 (2 CPU)
SPECfp_rate2000 (2 CPU)


SPEC® OMP2001

"Sun tested 2-way Sun Fire V20z and 4-way Sun Fire V40z servers using multiple SPEC benchmarks, including the SPEC® ompM2001 suite of OpenMP® benchmarks. The PathScale Compiler Suite helped Sun's AMD® Opteron® processor-based servers set world records for SPEC ompM2001 on two-processor and four-processor systems. The Sun/PathScale two-processor results were 29 percent faster (footnote 1) than previous-best Linux ompM2001 benchmarks using non-PathScale Fortran and C compilers. This 29 percent advantage, enabled in large part by PathScale compilers, far exceeds the eight percent faster clock rate of the newer Sun systems."

Sun Microsystems:
SPECompM2001 (2 thread, 2 core)
SPECompM2001 (4 thread, 4 core)

Footnote 1: About the SPEC OMPM2001 Results Reported Above

Two-Processor Results: The Sun V40z server with the PathScale Compiler Suite and 2.6 GHz AMD Opteron CPUs achieved a result of 6486 on a system with two cores, two chips, and two threads. This comparison is based on the best-performing two-processor Linux servers currently shipping, including previous results with a competitor's compiler on a 2.4 GHz Sun Java Workstation W2100z system [SPECompM2001 5085, two cores, two chips, two threads].

Four-Processor Results: The Sun V40z server with the PathScale Compiler Suite and 2.6 GHz AMD Opteron CPUs achieved a result of 11223 on a system with four cores, four chips, and four threads. This comparison is based on the best-performing four-processor Linux servers currently shipping, including previous results on a 2.4 GHz Sun V40z system with a non-PathScale compiler [SPECompM2001 8694, four cores, four chips, four threads].


STREAM

The PathScale Compiler Suite produces the highest single-CPU and OpenMP parallel STREAM results of any system powered by AMD CPUs.

OpenMP (bandwidth in MB/s; higher results are better)

Machine ID                           ncpus   COPY    SCALE   ADD     TRIAD
AMD_Opteron_848 (PathScale 2.0)      4       15378   15845   15618   15921
PathScale 2.2                        4       16872   16932   16543   16545
% Faster for PathScale 2.2 vs. 2.0           +10%    +7%     +6%     +4%

Single CPU (bandwidth in MB/s; higher results are better)

Machine ID                                       ncpus   COPY   SCALE   ADD    TRIAD
ASUS_SK8N_Opteron248 (EKOPath 2.0)               1       4811   4782    4685   4682
ASUS_SK8N_Opteron248 (Comm'l 64bit Compiler)     1       4304   4251    4497   4458
PathScale 2.2                                    1       4902   4871    4979   4987
% Faster for PathScale 2.2 vs. Commercial                +14%   +15%    +11%   +12%

The above results are for STREAM benchmarks run on Opteron 248 (2.2 GHz) machines with DDR400 memory, and are posted at http://www.cs.virginia.edu/stream/standard/Bandwidth.html. The PathScale and competitor results are identified there as 'ASUS_SK8N_Opteron248' and 'ASUS_SK8N_Opteron248 (1 CPU)'; the data link to the right of each of those lines gives more details on the submission.

* Results labeled 'PathScale 2.2' were obtained on the same system as the 2.0 results on the STREAM web site, using the following optimization flags:
OpenMP: pathf90/pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp
Serial: pathf90/pathcc -O3 -CG:use_prefetchnta
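The four STREAM columns correspond to four simple vector kernels, and each rate is the number of bytes the kernel is credited with moving divided by its run time. The Python sketch below shows that bookkeeping, assuming 8-byte doubles (16 bytes per element for COPY and SCALE, 24 for ADD and TRIAD) and 1 MB = 10^6 bytes. It is a simplification: the real benchmark is compiled C or Fortran (built with flags like those above) and reports the best of several trials, whereas interpreted Python lists measure interpreter overhead rather than memory bandwidth, so the printed numbers are not comparable to the tables. All names are illustrative.

```python
import time


# The four STREAM kernels (q is the scalar multiplier).
def stream_copy(a, b, c, q):
    for j in range(len(a)):
        c[j] = a[j]


def stream_scale(a, b, c, q):
    for j in range(len(a)):
        b[j] = q * c[j]


def stream_add(a, b, c, q):
    for j in range(len(a)):
        c[j] = a[j] + b[j]


def stream_triad(a, b, c, q):
    for j in range(len(a)):
        a[j] = b[j] + q * c[j]


# Bytes credited per element, assuming 8-byte doubles:
# COPY and SCALE touch two arrays (16 bytes); ADD and TRIAD touch three (24 bytes).
KERNELS = [
    ("COPY", stream_copy, 16),
    ("SCALE", stream_scale, 16),
    ("ADD", stream_add, 24),
    ("TRIAD", stream_triad, 24),
]


def stream_rates(n=100_000, q=3.0):
    """Time one pass of each kernel and return rates in MB/s (1 MB = 10**6 bytes)."""
    a, b, c = [1.0] * n, [2.0] * n, [0.0] * n
    rates = {}
    for name, kernel, bytes_per_element in KERNELS:
        start = time.perf_counter()
        kernel(a, b, c, q)
        elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
        rates[name] = bytes_per_element * n / elapsed / 1e6
    return rates


for name, rate in stream_rates().items():
    print(f"{name:5s} {rate:10.1f} MB/s")
```

TRIAD is usually the headline figure because it combines two reads, one write, and a multiply-add in each iteration.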


HimenoBMT "The Performance Evaluation"

The PathScale Compiler Suite produces excellent single-CPU and 4-CPU OpenMP results on the popular Himeno benchmark: http://accc.riken.jp/HPC/HimenoBMT/index_e.html

Serial results on Opteron 2.2 GHz, PC3200 (MFLOPS; higher is better)

Compiler                          F77    F90    C
PathScale 2.0                     1584   1189   267
64-bit Commercial Compiler        1419   1125   141
GNU compilers (3.4.3 & g95)       1002   588    213

PathScale advantage over:
Commercial 64-bit compiler        +12%   +6%    +89%
GNU compilers (3.4.3 & g95)       +58%   +102%  +25%

4-thread OpenMP results on a 4-CPU (Microway) 2.2 GHz Opteron PC3200 server

Code and compiler                                 4-thread MFLOPS   PathScale Advantage
Original Himeno F77 OpenMP code:
  PathScale 2.0                                   1969
  Commercial 64-bit compiler                      1691              +16%
PathScale-modified* Himeno F77 OpenMP code:
  PathScale 2.0                                   5155
  Commercial 64-bit compiler                      4309              +20%

* System, compiler flag, and source code details
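Himeno reports MFLOPS from a fixed operation count rather than from hardware counters. As we understand the Himeno source, it credits 34 floating-point operations per interior grid point per Jacobi iteration, excluding the boundary planes; the Python sketch below assumes that convention, and the grid dimensions in the test are hypothetical. It also reproduces the "PathScale Advantage" figures in both tables as the PathScale rate divided by the competitor's rate, minus one.

```python
def himeno_flops(imax, jmax, kmax, iterations):
    """Operations credited for one run.

    Assumption: 34 floating-point operations per interior grid point per
    iteration; the two boundary planes in each dimension are excluded.
    """
    return (imax - 2) * (jmax - 2) * (kmax - 2) * 34.0 * iterations


def himeno_mflops(imax, jmax, kmax, iterations, seconds):
    """Reported rate: credited operations divided by elapsed time, in MFLOPS."""
    return himeno_flops(imax, jmax, kmax, iterations) / seconds / 1e6


def percent_advantage(pathscale_mflops, other_mflops):
    """Higher MFLOPS is better, so the advantage is pathscale / other - 1."""
    return (pathscale_mflops / other_mflops - 1.0) * 100.0


# Serial MFLOPS from the table above: (PathScale 2.0, commercial, GNU).
SERIAL = {"F77": (1584, 1419, 1002), "F90": (1189, 1125, 588), "C": (267, 141, 213)}

for lang, (ps, commercial, gnu) in SERIAL.items():
    print(f"{lang}: +{percent_advantage(ps, commercial):.0f}% vs commercial, "
          f"+{percent_advantage(ps, gnu):.0f}% vs GNU")
```

Because the operation count is fixed by the grid size, a higher MFLOPS figure on the same problem is simply a shorter run time.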



SPEC® and the benchmark names SPECfp® and SPECint® are registered trademarks of the Standard Performance Evaluation Corporation. AMD, AMD Opteron, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Linux is a registered trademark of Linus Torvalds. All other trademarks and company names mentioned are the property of their respective owners.


AM2 ATMOSPHERE MODEL CODE

This code is used at the University of Utah's Meteorology Department for climate research. It consists of several closely coupled modules, is parallelized with MPI, and is written using Fortran 95 constructs.
Center for High Performance Computing, University of Utah: Results for this benchmark were run independently at the University of Utah and published with their permission.

 

Run time (lower is better)   1 CPU          2 CPU          4 CPU
PathScale v1.2               368.89 sec.    201.88 sec.    99.11 sec.
PGI v5.2                     483.45 sec.    253.38 sec.    135.53 sec.
% Faster for PathScale       +31.1%         +25.5%         +36.7%
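The "% Faster" row is PGI's time divided by PathScale's time at the same CPU count, minus one. The same timings also give PathScale's own MPI scaling: the 1-CPU time divided by the N-CPU time. The Python sketch below computes both using only the numbers in the table; the names are illustrative.

```python
# Wall-clock times in seconds, keyed by CPU count, from the table above.
PATHSCALE_V12 = {1: 368.89, 2: 201.88, 4: 99.11}
PGI_V52 = {1: 483.45, 2: 253.38, 4: 135.53}


def percent_faster_by_time(other_seconds, pathscale_seconds):
    """Lower time is better, so the advantage is other / pathscale - 1."""
    return (other_seconds / pathscale_seconds - 1.0) * 100.0


def speedup(times, ncpu):
    """Strong-scaling speedup relative to the 1-CPU run."""
    return times[1] / times[ncpu]


for n in (1, 2, 4):
    s = speedup(PATHSCALE_V12, n)
    print(f"{n} CPU: PathScale +{percent_faster_by_time(PGI_V52[n], PATHSCALE_V12[n]):.1f}% "
          f"vs PGI; PathScale speedup {s:.2f}x, parallel efficiency {100.0 * s / n:.0f}%")
```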

QUANTUM MONTE CARLO

Monte Carlo methods are extremely important in computational physics and related applied fields, and have many diverse applications. PathScale compilers do particularly well in Monte Carlo codes.
Los Alamos National Laboratory: Results for this benchmark were run independently at Los Alamos National Laboratory and published with their permission.

 

Compiler          Time (lower is better)   PathScale % Faster
PathScale v1.0    78.08 sec.
PGI v5.1          135.80 sec.              73.93%
GCC v3.4.0        111.01 sec.              42.20%

Compiler settings for Quantum Monte Carlo C++ application
PathScale v1.0: pathCC -64 -ansiE -Ofast -ffast-math
PGI v5.1: pgCC -Kieee -fastsse -O3 -Minline=levels:10 -Msafeptr=global -Mvect=sse -Mvect=assoc -Mvect=cachesize:1048576 -Mvect=prefetch
GCC v3.4.0: g++ -O3 -ffast-math -mtune=opteron -mfpmath=sse,387 -mieee-fp -m64

For more information on this benchmark go to: http://sourceforge.net/projects/qmcbeaver.