Description of Benchmarks

Note: Our educated guesses concerning the reasons for a particular system's performance appear within [square brackets].

Microbenchmarks: floating point arithmetic


Sums the integers from 0 to 10000 using inexact (floating point) numbers, 10000 iterations. A test of floating point arithmetic. Uses essentially the same code as the sum benchmark. [In Larceny, all floating point calculations are done using generic arithmetic.]


Calculation of the 25th Fibonacci number using inexact numbers, 50 iterations. A test of floating point arithmetic. Uses essentially the same code as the fib benchmark.

Microbenchmarks: calls, closures, continuations


Sums the integers from 0 to 10000, 10000 iterations.


Doubly recursive computation of the 25th Fibonacci number (75025), using (< n 2) to terminate the recursion. [Register windows should perform well on this benchmark, but gcc may squander part of this advantage by allocating a register window even for the base case.]


A triply recursive integer function related to the Takeuchi function, a Gabriel benchmark, 1000 iterations. A test of non-tail calls and arithmetic. Historical note: The Symbolics 3600 performed 1 iteration of this benchmark in 0.43 seconds using generic arithmetic, which is about 65 times as slow as a Sun Ultra 1.


The tak benchmark in continuation-passing style, 300 iterations. A test of closure creation. [Improperly tail recursive implementations convert this benchmark's tail recursion into deep recursion, which may exceed limits on the size of a control stack.]


The tak benchmark in continuation-capturing style, 30 iterations. A test of call-with-current-continuation. [Larceny's code for call-with-current-continuation is now written in C, and most of its time on this benchmark is spent crossing the Scheme/C barrier.]

Microbenchmarks: lists


The tak benchmark using lists to represent integers, a Gabriel benchmark, 200 iterations. This benchmark contains a peculiar boolean expression for which some compilers generate very poor code:
    (define (shorterp x y)
      (and (not (null? y))
           (or (null? x)
               (shorterp (cdr x)
                         (cdr y)))))
Compare with the ntakl benchmark.


A version of the takl benchmark in which the peculiar boolean expression has been rewritten to make it less confusing for most people and for some compilers:
    (define (shorterp x y)
      (cond ((null? y) #f)
            ((null? x) #t)
            (else (shorterp (cdr x) (cdr y)))))
The rewritten expression is logically equivalent to the original. [The C versions of the takl and ntakl benchmarks make no attempt to reclaim heap storage, which gives C an advantage over garbage-collected languages. This advantage disappears in benchmarks that allocate large amounts of heap storage, such as diviter, divrec, gcbench, and perm9.]


Table-driven symbolic differentiation, a Gabriel benchmark, 800000 iterations. Uses association lists instead of the original benchmark's property lists.


Symbolic differentiation, a Gabriel benchmark, 800000 iterations.


Destructive list operations, a Gabriel benchmark, 300 iterations.


Divides 200 by 2 using lists as a unary notation for integers, a Gabriel benchmark, 400000 iterations. This benchmark tests null?, cons, car, cdr, and little else. [This benchmark allocates 320 megabytes of heap storage, so the C version can't get by without deallocating storage. The C code for this benchmark performs node-at-a-time deallocation using either free or a custom deallocator. On our test machine, the standard library version of free is very slow for this benchmark, and even the custom deallocator can't reclaim storage quite as fast as a generational garbage collector. Compare this benchmark with perm9, for which a custom node-at-a-time deallocator outperforms generational garbage collection.]


This benchmark is the same as diviter except it uses deep recursion instead of iteration. [Implementations that use the SPARC's register windows to implement recursion will perform poorly on this benchmark. That is why gcc performs so poorly even with the custom storage allocator.]


Browsing a data base, a Gabriel benchmark, 600 iterations. [May be a test of string->symbol and/or symbol->string.]

Larger benchmarks: floating point


Fast Fourier Transform, a Gabriel benchmark, 2000 iterations. A test of floating point arithmetic. (I haven't yet checked to see whether it has the same bugs as the original Gabriel benchmark.)


Generation of a Mandelbrot set, 30 iterations. A test of floating point arithmetic.


Determination of a nucleic acid's spatial structure, 10 iterations. A test of floating point arithmetic, and a real program.


Testing to see whether a point is contained within a 2-dimensional polygon, 10000 iterations. A test of floating point arithmetic.


Ray tracing a simple scene, 10 iterations. A test of floating point arithmetic. This program is translated from the Common Lisp code in Example 9.8 of Paul Graham's book on ANSI Common Lisp.


Simplex algorithm, 60000 iterations. A test of floating point arithmetic, and a real program.

Larger benchmarks: puzzles


Combinatorial search of a state space, a Gabriel benchmark, 100 iterations. A test of arrays and classical compiler optimizations. This benchmark was originally written in Pascal by Forrest Baskett.


Another combinatorial search similar to puzzle, a Gabriel benchmark, 10 iterations.

Larger benchmarks: miscellaneous


A type checker written by Jim Miller, 20 iterations.


Dynamic type inference, self-applied, 10 iterations. Written by Fritz Henglein. A real program.


Earley's parsing algorithm, parsing a 9-symbol input according to one of the simplest ambiguous grammars, 150 iterations. A real program, applied to toy data.


This program was provided by Andrew Wright, but we don't know much about it, and would appreciate more information. This higher order program creates closures almost as often as it performs non-tail procedure calls.


Another program that was provided by Andrew Wright. This program may have been written by Jim Miller. It enumerates the order-preserving maps between finite lattices.


Constructs a maze on a hexagonal grid, 2500 iterations. Written by Olin Shivers.


Constructs a maze on a rectangular grid using purely functional style, 100 iterations. Written by Olin Shivers.


Partial evaluation of Scheme code, 100 iterations. Written by Marc Feeley.


A Scheme interpreter evaluating a merge sort, 200 iterations. Written by Marc Feeley.


Scheme to LaTeX processor, 20 iterations. A test of file i/o and probably much else. Part of a real program written by Dorai Sitaram.

Larger benchmarks: variations on boyer


Bob Boyer's theorem proving benchmark, a Gabriel benchmark, 10 iterations. This benchmark uses association lists in place of the original benchmark's property lists. The nboyer benchmark is an updated version of this benchmark.


Translation into Scheme of a translation into Standard ML of a translation into Scheme of the Common Lisp code for the original boyer benchmark, 10 iterations. Thanks to Standard ML, the data structures are somewhat more convoluted than in the boyer benchmark, but aren't much better.


An updated and exponentially scalable version of the boyer benchmark, 1 iteration on a problem of size 3. (The boyer and smlboyer benchmarks solve a problem of size 0.) This benchmark's data structures are considerably more appropriate than the data structures used in the boyer and smlboyer benchmarks. A test of lists, vectors, and garbage collection.


A version of nboyer that has been tuned (by Henry Baker) to reduce storage allocation, making it less of a garbage collection benchmark and more of a compiler benchmark. Only 4 lines of code were changed, and another 7 lines of code were added.

Synthetic benchmarks: garbage collection


This program was written to mimic the phase structure that has been conjectured for a class of application programs for which garbage collection may represent a significant fraction of the execution time. This benchmark warms up by allocating and then dropping a binary tree of height 18. Then it allocates a permanent tree of height 16 and a permanent array of 500000 floating point numbers. Then it allocates about 350 megabytes of tree storage in seven phases, allocating about 50 megabytes per phase. The first phase allocates 67648 trees of height 4, dropping each tree before allocating the next. The second phase allocates and drops 16512 trees of height 6, and so on to the last phase, which allocates 8 trees of height 16. Each phase is divided into two subphases. The first subphase allocates trees top-down using side effects, while the second subphase allocates trees bottom-up without using side effects. This benchmark was written in Java by John Ellis and Pete Kovac, modified by Hans Boehm, and translated into Scheme, Standard ML, C++, and C by William Clinger.

One glaring difference between this benchmark and typical application code is that none of the trees are traversed after they are constructed. [This puts C and C++ at a disadvantage, because languages that rely on explicit deallocation must traverse the trees to deallocate their storage.]

[Generational garbage collectors perform extremely well on the early phases of this benchmark, but are likely to perform worse than non-generational collectors on the last phase or two.]


Creates a list containing all 362880 permutations of a list of 9 integers, in Gray code order, using Zaks's algorithm, 5 iterations. A test of storage allocation and garbage collection. At the end of each iteration, the oldest half of the live storage becomes garbage.

[This benchmark is particularly difficult for generational garbage collectors, since it violates their assumption that young objects have a shorter future life expectancy than older objects. Nevertheless the generational collectors used in Larceny and in Chez Scheme perform reasonably well.]

[This benchmark is also difficult for C, because the permutation lists share much of their substructure. To avoid freeing nodes more than once, the C version of this benchmark performs an extra pass over the lists to compute (biased negative) reference counts for each node:

  for ( p = perms; p != 0; p = p->cdr ) {
    for ( x = p->car; x != 0; x = x->cdr ) {
      if ( x->num > 0 )
        x->num = 0;
      else
        --(x->num);
    }
  }
This extra pass can be omitted when using a storage allocator written especially for this benchmark instead of using malloc and free. With this custom allocator, the code generated by gcc runs almost 10 times as fast. The code shown above does not account for very much of this improvement; most of the improvement must be blamed on the library code for free.]

Last updated 6 July 1999.