Wednesday, 8 April 2009

Hash table woes

Recent performance studies have shown that Haskell's defacto-standard compiler, GHC, still bundles an extremely inefficient hash table implementation. Specifically, the benchmark showed optimized Haskell compiled to native code running even slower than interpreted Python and 32× slower than F#.

Surprisingly, the poor performance turned out to be due to a design flaw in the implementation of GHC's garbage collector that renders it incapable of handling updates to arrays of boxed types efficiently.

Contrary to a statement in the book "Real World Haskell", trees are incapable of providing comparable performance for the majority of use cases. Therefore, it is extremely important that this deficiency is addressed because, without a decent hash table implementation, almost all dictionary data structures will be over an order of magnitude slower in Haskell than other functional languages such as OCaml and F#. Suffice to say, hash tables are an extremely valuable data structure and failing to provide a decent implementation is a serious shortcoming.

Although Haskell programmers have been complaining about the poor performance of its hash tables for at least three years, Donald Stewart has just promised to address the issue by implementing a new and reasonably-performant hash table for GHC.


2 comments:

dons said...

I find your commentary on the Haskell community, it's activities and goals, and constant references to me personally, and also to Real World Haskell, amusing.

Flying Frog Consultancy Ltd. said...

@dons

You appear to be by far the most prolific contributor in the Haskell community and your book "Real World Haskell" is also by far the most popular book on Haskell. So I think you deserve a lot of press coverage.

I have the utmost confidence in your forthcoming hash table implementation. How is it coming along?