Saturday, September 22, 2012

Floating Point Benchmark: Haskell Language Added

I have posted an update to my trigonometry-intense floating point benchmark which adds Haskell to the list of languages in which the benchmark is implemented. A new release of the benchmark collection including Haskell is now available for downloading.

The Haskell benchmark was developed and tested on Glasgow Haskell Compiler version 7.4.1 on Ubuntu Linux 11.04; the relative performance of the various language implementations (with C taken as 1) is as follows. All implementations of the benchmark listed below produced identical results to the last (11th) decimal place.

Language            Relative   Details
C                      1       GCC 3.2.3 -O3, Linux
Visual Basic .NET      0.866   All optimisations, Windows XP
FORTRAN                1.008   GNU Fortran (g77) 3.2.3 -O3, Linux
Pascal                 1.027   Free Pascal 2.2.0 -O3, Linux
                               GNU Pascal 2.1 (GCC 2.95.2) -O3, Linux
Java                   1.121   Sun JDK 1.5.0_04-b05, Linux
Visual Basic 6         1.132   All optimisations, Windows XP
Haskell                1.223   GHC 7.4.1 -O2 -funbox-strict-fields, Linux
Ada                    1.401   GNAT/GCC 3.4.4 -O3, Linux
Lisp                   7.41    GNU Common Lisp 2.6.7, Compiled, Linux
                               GNU Common Lisp 2.6.7, Interpreted
Smalltalk              7.59    GNU Smalltalk 2.3.5, Linux
Python                17.6     Python 2.3.3 -OO, Linux
Perl                  23.6     Perl v5.8.0, Linux
Ruby                  26.1     Ruby 1.8.3, Linux
JavaScript            27.6     Opera 8.0, Linux
                               Internet Explorer 6.0.2900, Windows XP
                               Mozilla Firefox 1.0.6, Linux
QBasic               148.3     MS-DOS QBasic 1.1, Windows XP Console

Benchmarking in a purely functional language with lazy evaluation like Haskell presents formidable challenges and requires some judgement calls which do not occur in benchmarks of procedural languages (although similar situations may arise in procedural languages with strong optimisation of invariant results).

Due to Haskell's lazy evaluation, a straightforward port of the floating point benchmark code from a procedural language (I started with the Ada implementation for this project, since I considered it the cleanest and best documented, using data types much in the spirit of Haskell) results in a benchmark program which runs in essentially constant time. Since only the last result of the many iterations through the benchmark computation actually contributes to output visible to the outside world, all of the earlier computations go ker-thunk into the memory hole and are never actually performed. “If they can't see it, why do it?” is the Haskell motto. I've had employees who work that way.
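The effect is easy to reproduce in miniature. In the following sketch (illustrative names only, not code from fbench), a list of a million “expensive” results is built lazily, but only the last one is ever demanded, so only one of the million computations is actually performed:

```haskell
-- Stand-in for one ray trace: deliberately costly to evaluate fully.
expensive :: Double -> Double
expensive x = sum (map sin [x, x + 0.001 .. x + 100])

main :: IO ()
main = do
    -- Lazily build a million results...
    let results = map expensive [1 .. 1000000]
    -- ...but demand only the last one.  The other 999,999 thunks are
    -- never forced, so the run time is essentially independent of the
    -- number of "iterations" -- just as with a naive fbench port.
    print (last results)
```

This is precisely what happens to a naive port of the benchmark loop: the iteration count changes how many unevaluated thunks are created, not how much trigonometry is done.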

A wise manager once told me, “People don't do what you expect, but what you inspect.” So it is with Haskell. In order to get a fair benchmark against a procedural language, we must arrange that each of the requested iterations in the benchmark run actually be fully calculated. In the main fbench.hs, we accomplish this by two ruses. First of all, for all but the final ray trace we vary the clear aperture of the design by adding a pseudorandom value to it. (Since the ray tracing algorithm runs in constant time regardless of this parameter [or the rest of the design, for that matter], this does not affect the timing result.) Each ray trace is then forced to be evaluated by applying the “deepseq” function to its ultimate result. (You may have to install the deepseq package into your Haskell environment—it is not, for example, installed by default on Ubuntu Linux when you install the compiler, ghc.) Since all we care about is ensuring the clear aperture is different on each successive ray trace, we use a pseudorandom generator with a constant seed.
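The two ruses can be sketched in miniature. The real fbench.hs perturbs the clear aperture of the lens design and forces each full ray trace with deepseq; the sketch below (all names hypothetical) uses only the base library, standing in a trivial trig computation for the ray trace, a hand-rolled linear congruential generator with a constant seed for the pseudorandom perturbation, and seq (sufficient here, since the result is a single unboxed Double) for the forcing:

```haskell
-- Hypothetical stand-in for one ray trace through the design.
rayTrace :: Double -> Double
rayTrace aperture = sin aperture + cos (aperture / 2)

-- Minimal linear congruential generator.  A constant seed makes the
-- perturbation sequence reproducible, as in the real benchmark.
lcg :: Int -> Int
lcg s = (s * 1103515245 + 12345) `mod` 2147483648

-- Run n iterations, perturbing the aperture slightly each time and
-- forcing each result with seq so laziness cannot discard the work.
runBench :: Int -> Double -> Double
runBench n aperture = go n 42 0
  where
    go 0 _ lastResult = lastResult
    go k s _ =
        let s'      = lcg s
            perturb = fromIntegral s' / 2147483648 * 1e-9
            r       = rayTrace (aperture + perturb)
        in r `seq` go (k - 1) s' r   -- strict in r: every trace happens

main :: IO ()
main = print (runBench 1000 4.0)
```

Because the perturbation is on the order of 10⁻⁹, each iteration computes a genuinely different result (so it cannot be shared), yet the final answer is indistinguishable from the unperturbed one for timing purposes.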

Reference results are produced by compiling fbench.hs into a binary executable (see the Makefile for details) and then running it with an iteration count (specified by the first argument on the command line) which results in a run time of around five minutes. If a positive iteration count is specified, a classic fbench run which expects manual timing will be done. If the iteration count is negative, the run will be in batch mode, intended to be timed by running under the Unix “time” utility or equivalent.

But note that we're doing additional work in generating the pseudorandom numbers, and incurring overhead from the clanking recursion machinery and its consequent garbage collections. Haskell's default pseudorandom generator is famously slow, so we must be careful that we're measuring Haskell's performance on our own code, rather than the wrapper which invokes the generator. This is accomplished by the fbench_baseline.hs program. This uses precisely the same logic to run the benchmark, but performs a null computation in place of the ray trace. (I've left all the ray trace code in this program, in case you want to experiment with switching all or part of it back on.) To calculate the “true” run time of the benchmark on the ray tracing code, we take the run time of fbench.hs and subtract that of fbench_baseline.hs for the same number of iterations. In comparing this with the run time of procedural languages, we assume that the loop overhead of those languages wrapping iterations of the benchmark is negligible; this is almost always the case, but it is not so in Haskell, hence the need for this correction.
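The correction itself is simple arithmetic; the following sketch (with made-up timings, purely for illustration) spells it out:

```haskell
-- The baseline run performs the same scaffolding (pseudorandom
-- perturbation, forcing, loop recursion) with a null computation in
-- place of the ray trace, so subtracting its time isolates the cost
-- of the ray tracing itself.
correctedSecs :: Double -> Double -> Double
correctedSecs fbenchSecs baselineSecs = fbenchSecs - baselineSecs

-- Corrected time per benchmark iteration, in microseconds.
microsPerIteration :: Double -> Double -> Double -> Double
microsPerIteration fbenchSecs baselineSecs iterations =
    correctedSecs fbenchSecs baselineSecs / iterations * 1e6

main :: IO ()
main =
    -- Illustrative numbers: a 312 second full run, a 14 second
    -- baseline run, one million iterations.
    print (microsPerIteration 312 14 1e6)   -- prints 298.0
```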

Haskell purists may object to this whole exercise and contend that the fairest benchmark is a straight port of the C, FORTRAN, etc. code, measuring it against those languages. I am sympathetic to this argument—after all, if one of the main design goals of the language is to allow factoring out redundant computations, isn't that worthy of being measured in a benchmark? On the other hand, the mission of fbench, since its origin in the 1980s, has been to measure the performance of trigonometric function intense code, and one hardly accomplishes that by running code where all but the last iteration of the benchmark is optimised out by the compiler. Understand—I am extremely impressed by, if not in awe of, Haskell's optimisation, but I also want to get a result which people can compare against realistic code, not a contrived benchmark like this one which (in procedural language implementations) simply does the same computation over and over. If you want to experience the magic of lazy evaluation, build the fbench_pure.hs program. It eschews randomisation of the design being ray traced and forcing evaluation of the results, and consequently runs in near-constant time regardless of the iteration count requested. If you wish to explore other strategies for fairly benchmarking Haskell, it's probably best to start with fbench_pure.hs, to which you can add your own code to force evaluation as appropriate.

Finally, let me note that this is the first Haskell program I have ever written (I developed a deep appreciation for the language in the process, and I trust it shall not be my last). I have tried to do things “the Haskell way”, but I may have committed any number of beginner blunders which elicit gnashville sounds from the teeth of those fluent in the language. If so, please chastise me and offer suggestions as to how I might remedy the shortcomings you perceive in this code.

In this release, the C benchmarks, which were previously at the top level of the directory tree in the archive, have been moved into their own c subdirectory with its own Makefile. The C benchmark programs are unchanged.

Update: Benchmark results for Haskell revised to use timings with GHC 7.4.1 and Haskell platform 2012.2.0.0. The results are substantially better than those obtained with the 6.12.3 compiler I originally used. (2012-09-23 12:23 UTC)

Update: Benchmark results for Haskell revised to use timings with strict fields for primitive types and the -funbox-strict-fields compiler option, which almost doubled execution speed. Special thanks to Don Stewart for analysing the benchmark and recommending this. (2012-09-24 15:40 UTC)

Posted at September 22, 2012 15:47