Monday, November 6, 2017

Floating Point Benchmark: Back to BASICs

The floating point benchmark was born in BASIC. The progenitor of the benchmark was an interactive optical design and ray tracing application I wrote in 1980 in Marinchip QBASIC [PDF, 19 Mb]. This was, for the time, a computationally intensive process, as analysis of a multi-element lens design required tracing four light rays with different wavelengths and axial incidence through each surface of the assembly, with multiple trigonometric function evaluations for each surface transit. In the days of software floating point, before the advent of math coprocessors or integrated floating point units, this took a while; more than a second for each analysis.

After I became involved in Autodesk, and we began to receive requests from numerous computer manufacturers and resellers to develop versions of AutoCAD for their machines, I perceived a need to be able to evaluate the relative performance of candidate platforms before investing the major effort to do a full port of AutoCAD. It was clear that the main bottleneck in AutoCAD's “REGEN” performance (the time it took to display a view of a drawing file on the screen) was the machine's floating point performance. Unlike many competitors at the time, AutoCAD used floating point numbers (double precision for the 80x86 version of the program) for the coordinates in its database, and mapping these to the screen required a great deal of floating point arithmetic to be performed. It occurred to me that the largely forgotten lens design program might be a good model for AutoCAD's performance, and since it was only a few pages of code, even less when stripped of its interactive user interface and ability to load and save designs, it could be easily ported to new machines. I made a prototype of the benchmark on the Marinchip machine by stripping the lens design program down to its essential inner loop and then, after testing, rewrote the program in C, using the Lattice C compiler we used for the IBM PC and other 8086 versions of AutoCAD.

Other than running a timing test on the Marinchip 9900 to establish that, even in 1986, it was still faster than the IBM PC/AT, the QBASIC version of the benchmark was set aside and is now lost in the mists of time. Throughout the 1980s and '90s, the C version was widely used to test candidate machines and proved unreasonably effective in predicting how fast AutoCAD would run when ported to them. Since by then most machines had compatible C compilers, running the benchmark was simply a matter of recompiling it and running a timing test. From the start, the C version of the benchmark checked the results of the ray trace and analysis to the last (11th) decimal place against the reference results from the original QBASIC program. This was useful in detecting errors in floating point implementations and mathematical function libraries which would be disastrous for AutoCAD and would preclude a port until remedied. The benchmark's accuracy check was shown to be invariant of the underlying floating point format, producing identical results on a variety of implementations.
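The accuracy check described above can be sketched roughly as follows. This is a minimal illustration in Python, not the benchmark's actual code: the function name `matches_reference` and the sample value are hypothetical, and the only detail taken from the text is that results are compared to the 11th decimal place.

```python
def matches_reference(computed: float, reference: str, places: int = 11) -> bool:
    """Return True if `computed`, printed to `places` decimals, equals `reference`."""
    return f"{computed:.{places}f}" == reference


# Hypothetical reference value for illustration only;
# it is NOT one of the benchmark's real results.
REFERENCE = "1.23456789012"

print(matches_reference(1.23456789012, REFERENCE))  # same to the 11th decimal place
print(matches_reference(1.23456789013, REFERENCE))  # differs in the 11th decimal place
```

Comparing the formatted strings rather than testing against a tolerance means that any difference in the last printed digit is flagged, which is the kind of strictness the text describes.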

Later, I became interested in comparing the performance of different programming languages for scientific computation, so I began to port the C benchmark to various languages, resulting in the comparison table at the end of this article. First was a port of the original QBASIC program to the Microsoft/IBM BASICA included with MS-DOS on the IBM PC and PC/AT. This involved transforming QBASIC's relatively clean (for BASIC, anyway) syntax to the gnarly world of line numbers, two character variable names, and GOSUBs which was BASICA. This allowed comparing the speed of BASICA with the C compiler we were using and showed that, indeed, C was much faster. The BASICA version of the benchmark was preserved and has been included in distributions of the benchmark collection for years in the mbasic (for Microsoft BASIC) directory, but due to the obsolescence of the language, no timings of it have been done since the original run on the PC/AT in 1984.

I was curious how this archaic version of the benchmark might perform on a modern machine, so when I happened upon Michael Haardt's Bas, a free implementation of the original BASICA/GW-BASIC language written in portable C, I realised the opportunity for such a comparison might be at hand. I downloaded version 2.4 of Bas and built it without any problems. I was delighted to discover that it ran the Microsoft BASIC version of the benchmark, last saved in 1998, without any modifications, and produced results accurate to the last digit.

Bas is a tokenising interpreter, not a compiler, so it could be expected to run much slower than any compiled language. I started with the usual comparison to C. I ran a preliminary timing test to determine an iteration count which would yield a run time of around five minutes, ran five timing runs on an idle machine, and for 3,056,858 iterations obtained a mean run time of 291.64 seconds, or 95.4052 microseconds per iteration. Compared to the C benchmark, which runs in 1.7858 microseconds per iteration on the same machine, this is 53.42 times slower, which is shown in the table below in the “BASICA/GW-BASIC” row. This is still 2.78 times faster than Microsoft QBasic (not to be confused with Marinchip QBASIC) which, when I compared it to C on a Windows XP machine in the early 2000s (running in the MS-DOS console window), ran 148.3 times slower than the C version compiled with the Microsoft Visual C compiler on the same machine.
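The timing procedure in the paragraph above reduces to two small calculations, sketched here in Python. The function names and the preliminary-run numbers are hypothetical. The measured figures (291.64 seconds for 3,056,858 iterations) come from the text, and the C time is taken as 1.7858 microseconds per iteration, the value quoted in the FreeBASIC comparison later in this article.

```python
TARGET_SECONDS = 300.0  # "around five minutes"


def calibrate_iterations(trial_iterations: int, trial_seconds: float,
                         target_seconds: float = TARGET_SECONDS) -> int:
    """Scale a short preliminary run up to the target run time."""
    return round(trial_iterations * target_seconds / trial_seconds)


def microseconds_per_iteration(mean_seconds: float, iterations: int) -> float:
    """Convert a mean run time into microseconds per iteration."""
    return mean_seconds / iterations * 1e6


# Hypothetical preliminary run: 100,000 iterations in 10 seconds.
print(calibrate_iterations(100_000, 10.0))

# Figures from the text.
bas_us = microseconds_per_iteration(291.64, 3_056_858)
c_us = 1.7858
relative_to_c = bas_us / c_us
print(f"Bas: {bas_us:.3f} us/iteration, {relative_to_c:.2f} times the C run time")
```

Normalising to time per iteration is what makes runs with very different iteration counts comparable across languages.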

Since I had benchmarked this program with IBM BASICA in the 1980s, it was possible to do a machine performance comparison. The IBM PC/AT, running at 6 MHz, with BASICA version A3.00 and software floating point ran 1000 iterations of the benchmark in 3290 seconds (yes, almost an hour), or 3.29 seconds per iteration. Dividing this by the present-day figure of 95.4052 microseconds per iteration with Bas, we find that the same program, still running in interpreted mode on a modern machine with hardware floating point, runs 34,484 times faster than 1984's flagship personal computer.
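For readers who want to check the arithmetic, here is a short Python sketch of the speed-up calculation. The variable names are my own; all of the numbers are the ones given in the paragraph above.

```python
# IBM PC/AT with BASICA: 1000 iterations in 3290 seconds.
pcat_seconds_per_iteration = 3290 / 1000

# Bas on the modern machine: 95.4052 microseconds per iteration.
bas_seconds_per_iteration = 95.4052e-6

speedup = pcat_seconds_per_iteration / bas_seconds_per_iteration
print(f"Speed-up: about {int(speedup):,} times")
```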

This made me curious how a modern compiled BASIC might stack up. In 2005 Jim White ported the C benchmark to Microsoft Visual Basic (both version 6 and .NET), and obtained excellent results, with the .NET release actually running faster than Microsoft Visual C on Windows XP. (Well, of course this was a comparison against Monkey C, so maybe I shouldn't be surprised.) These ports of the benchmark are available in the visualbasic directory of the benchmark collection.

FreeBASIC is an open source (GPL) command line compiler for a language which is essentially compatible with Microsoft QuickBasic with some extensions for access to operating system facilities. The compiler produces executable code, using the GNU Binutils suite as its back-end. The compiler runs on multiple platforms including Linux and Microsoft Windows. Since Visual Basic is an extension of QuickBasic, crudded up with Windows-specific junk, I decided to try to port Jim White's Visual Basic version 6 code to FreeBASIC.

This wasn't as easy as I'd hoped, because in addition to stripping out all of the “Form” crap, I had to substantially change the source code due to Microsoft-typical fiddling with the language in their endless quest to torpedo developers foolish enough to invest work in their wasting asset platforms. I restructured the program in the interest of readability and added comments to explain what the program is doing. The “Form” output was rewritten to use conventional “Print using” statements. The internal Microsoft-specific timing code was removed and replaced with external timing. The INTRIG compile option (local math functions written in BASIC) was removed—I'm interested in the performance of the language's math functions, not the language for writing math functions.

After getting the program to compile and verifying that it produced correct output, I once again ran a preliminary timing test, determined an iteration count, and ran the archival timing tests, yielding a mean time of 296.54 seconds for 127,172,531 iterations, or 2.3318 microseconds per iteration. Compared to C's 1.7858 microseconds per iteration, this gives a run time 1.306 times that of C, or almost exactly 30% slower. This is in the middle of the pack for compiled languages, although slower than the heavily optimised ones. See the “FreeBASIC” line in the table for the relative ranking.

The relative performance of the various language implementations (with C taken as 1) is as follows. All language implementations of the benchmark listed below produced identical results to the last (11th) decimal place.

Language             Relative Time   Details
-------------------  --------------  ----------------------------------------------------------
C                    1               GCC 3.2.3 -O3, Linux
JavaScript           0.372           Mozilla Firefox 55.0.2, Linux
                     0.424           Safari 11.0, MacOS X
                     1.334           Brave 0.18.36, Linux
                     1.378           Google Chrome 61.0.3163.91, Linux
                     1.386           Chromium 60.0.3112.113, Linux
                     1.495           Node.js v6.11.3, Linux
Chapel               0.528           Chapel 1.16.0, -fast, Linux
                     0.0314          Parallel, 64 threads
Visual Basic .NET    0.866           All optimisations, Windows XP
C++                  0.939           G++ 5.4.0, -O3, Linux, double
                     0.964           long double (80 bit)
                     31.00           __float128 (128 bit)
                     189.7           MPFR (128 bit)
                     499.9           MPFR (512 bit)
FORTRAN              1.008           GNU Fortran (g77) 3.2.3 -O3, Linux
Pascal               1.027           Free Pascal 2.2.0 -O3, Linux
                     1.077           GNU Pascal 2.1 (GCC 2.95.2) -O3, Linux
Swift                1.054           Swift 3.0.1, -O, Linux
Rust                 1.077           Rust 0.13.0, --release, Linux
Java                 1.121           Sun JDK 1.5.0_04-b05, Linux
Visual Basic 6       1.132           All optimisations, Windows XP
Haskell              1.223           GHC 7.4.1-O2 -funbox-strict-fields, Linux
Scala                1.263           Scala 2.12.3, OpenJDK 9, Linux
FreeBASIC            1.306           FreeBASIC 1.05.0, Linux
Ada                  1.401           GNAT/GCC 3.4.4 -O3, Linux
Go                   1.481           Go version go1.1.1 linux/amd64, Linux
Simula               2.099           GNU Cim 5.1, GCC 4.8.1 -O2, Linux
Lua                  2.515           LuaJIT 2.0.3, Linux
                     22.7            Lua 5.2.3, Linux
Python               2.633           PyPy 2.2.1 (Python 2.7.3), Linux
                     30.0            Python 2.7.6, Linux
Erlang               3.663           Erlang/OTP 17, emulator 6.0, HiPE [native, {hipe, [o3]}]
                     9.335           Byte code (BEAM), Linux
ALGOL 60             3.951           MARST 2.7, GCC 4.8.1 -O3, Linux
PL/I                 5.667           Iron Spring PL/I 0.9.9b beta, Linux
Lisp                 7.41            GNU Common Lisp 2.6.7, Compiled, Linux
                     19.8            GNU Common Lisp 2.6.7, Interpreted
Smalltalk            7.59            GNU Smalltalk 2.3.5, Linux
Ruby                 7.832           Ruby 2.4.2p198, Linux
Forth                9.92            Gforth 0.7.0, Linux
Prolog               11.72           SWI-Prolog 7.6.0-rc2, Linux
                     5.747           GNU Prolog 1.4.4, Linux, (limited iterations)
COBOL                12.5            Micro Focus Visual COBOL 2010, Windows 7
                     46.3            Fixed decimal instead of computational-2
Algol 68             15.2            Algol 68 Genie 2.4.1 -O3, Linux
Perl                 23.6            Perl v5.8.0, Linux
BASICA/GW-BASIC      53.42           Bas 2.4, Linux
QBasic               148.3           MS-DOS QBasic 1.1, Windows XP Console
Mathematica          391.6           Mathematica 10.3.1.0, Raspberry Pi 3, Raspbian

Posted at 14:04