
Thursday, August 4, 2005

Floating Point Benchmark: Comparing Languages

I have used my floating point benchmark, based on an optical design ray tracing program I wrote almost a quarter of a century ago, to measure the performance of computers and C language implementations ever since the days of the IBM PC/AT. Unlike many benchmarks in the early days of personal computing, it uses double precision floating point and trigonometric functions heavily, and it checks the answers, which has proved embarrassing to a number of compiler vendors over the years. The algorithm is well-understood (it's based on James H. Wyld's classic chapter on ray tracing in Amateur Telescope Making), and has been confirmed to produce results identical to the 11th decimal place on three different double precision floating point formats: IBM 360/370, VAX, and IEEE 754. As Moore's Law and compiler optimisation technology have progressed over the last two decades, current timings on the benchmark are more than five orders of magnitude faster than those of the first machines on which it was run--with identical results. The benchmark is a small, simple program which is easy to port even to less-than-standard language implementations and, in case problems arise, relatively easy to debug. During the 1980s, the benchmark proved uncannily accurate in predicting how fast AutoCAD would regenerate large drawings, and was used by Autodesk to test compilers and evaluate the suitability of different machines as AutoCAD platforms.
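To give a sense of the pattern without reproducing the fbench source here, the following is a minimal self-checking sketch in C. The kernel is a stand-in: a trigonometric identity whose exact value is known, rather than Wyld's ray trace, but the shape is the same--a double precision, trig-heavy computation run a user-specified number of times, timed, with the answer verified at the end.

    /*  Sketch of the benchmark's structure; not the fbench source.
        Compile with:  cc -O3 sketch.c -o sketch -lm               */

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <time.h>

    #define INNER 10000L          /* Trig evaluations per kernel call */

    /*  Sum sin^2(x) + cos^2(x) over INNER points.  In exact arithmetic
        the sum is precisely INNER, so any deviation measures the
        accumulated floating point error.  */
    static double kernel(void)
    {
        double acc = 0.0;
        long i;

        for (i = 1; i <= INNER; i++) {
            double x = (double) i / INNER;
            acc += sin(x) * sin(x) + cos(x) * cos(x);
        }
        return acc;
    }

    int main(int argc, char *argv[])
    {
        long n, iterations = (argc > 1) ? atol(argv[1]) : 1000;
        double result = 0.0;
        clock_t start, stop;

        start = clock();
        for (n = 0; n < iterations; n++) {
            result = kernel();
        }
        stop = clock();

        /*  Check the answer, as fbench does: a compiler or library
            which cuts corners on double precision trigonometry will
            be caught here.  */
        if (fabs(result - (double) INNER) > 1e-8) {
            fprintf(stderr, "Error: got %.11f, expected %ld\n",
                    result, INNER);
            return 1;
        }
        printf("%ld iterations in %.3f seconds; result %.11f checks.\n",
               iterations, (double) (stop - start) / CLOCKS_PER_SEC,
               result);
        return 0;
    }

Scaling the iteration count from the command line is what lets the same tiny program remain usable across five orders of magnitude of machine speed.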

The reference implementation of the benchmark is in C, and it has primarily been used to test C implementations, but FORTRAN and Microsoft BASIC versions have been available almost since its inception. Jim White ("mathimagics"--approach with extreme caution, known to be in possession of weapons of math instruction) recently ported the benchmark to Java, Microsoft Visual BASIC 6, and Scilab (a free MATLAB-like scientific software package) and kindly contributed his versions for others to use. His work motivated me to dust off another project from my infinite "to do" list, and I've added ports of the benchmark to Perl, Python, and JavaScript to the distribution as well; the whole collection may now be downloaded from the main benchmark page.

Having all these different language implementations (which produce identical results) invites comparison among languages, so I've spent the last couple of days running benchmarks, all on the same machine, with the following results. I've set the run time of the C implementation to 1, and expressed the run time of the other languages relative to that: a relative time of 5 means that language runs the benchmark five times more slowly than the C version (the sketch following the table shows the arithmetic). Now obviously, if you're doing heavy-duty number-crunching in a scripting language like Perl, Python, or JavaScript, you're in a state of sin, but it's interesting to know just how bad (or tolerable) the hit is, since it may make sense to do modest numerical work directly in a script rather than add all the complexity of, for example, writing a module in a compiled language. The performance figures for Visual BASIC and Java may surprise you; they surprised me.

Language             Relative Time   Details
C                          1         GCC 3.2.3 -O3, Linux
Visual Basic .NET          0.866     All optimisations, Windows XP
FORTRAN                    1.008     GNU Fortran (g77) 3.2.3 -O3, Linux
Java                       1.121     Sun JDK 1.5.0_04-b05, Linux
Visual Basic 6             1.132     All optimisations, Windows XP
Python                    17.6       Python 2.3.3 -OO, Linux
Perl                      23.6       Perl v5.8.0, Linux
JavaScript                27.6       Opera 8.0, Linux
                          39.1       Internet Explorer 6.0.2900, Windows XP
                          46.9       Mozilla Firefox 1.0.6, Linux
QBasic                   148.3       MS-DOS QBasic 1.1, Windows XP Console
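For the record, the relative figures are simple division--each language's wall clock time over the C time, so C is 1 by definition. The timings in this sketch are made-up placeholders, not the measurements behind the table:

    #include <stdio.h>

    int main(void)
    {
        /*  Placeholder timings in seconds; NOT actual measurements.  */
        static const char *lang[] = { "C", "Java", "Python" };
        static const double seconds[] = { 10.0, 11.2, 176.0 };
        int i, n = sizeof seconds / sizeof seconds[0];

        /*  Normalise every timing to the C run, which comes out
            as 1.000 by construction.  */
        for (i = 0; i < n; i++) {
            printf("%-10s %8.3f\n", lang[i], seconds[i] / seconds[0]);
        }
        return 0;
    }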

For additional details, including the complete history of results on various machines, please see the fbench page. There is something about benchmarking which invites debate and acrimony, so let me state that I have no inclination to participate in the former nor indulge in or be the target of the latter. While I have taken care in running the benchmarks and am confident the results are repeatable within 2% on my machine with my programming environment, runs on other systems, or with "benchmark tweaking" of the various programs, may yield quite different results. In any case, the relative speed numbers are representative only of a program of this kind, and are meaningless for comparing other kinds of tasks (for example, I/O intensive work or text processing).

Posted at August 4, 2005 02:32