Thursday, October 14, 2010

Hacking together a working version of Haskell Platform 2010.2 for GHC 7.1

I don't know if GHC developers typically attempt to bring up a full, cabal-install based setup around development versions of the compiler, but that's what I wanted. To that end I hacked up a modified version of Haskell Platform 2010.2 that works with GHC 7.1.20101014. This is a recent version of GHC that includes, among other things:
  • Major run-time improvements: namely, a BLACKHOLE fix that greatly improves parallel performance in some cases relative to 6.12.
  • The LLVM backend.
  • The new type inference engine.
The modified version of the Haskell Platform can be downloaded from the link below. You should be able to build it against a recent GHC grabbed from the nightly snapshots (try the 2010.10.14 snapshot to be safe). If that doesn't work, you can download my own build of the compiler below (Linux, 64-bit), which is inflexible and needs to be unpacked in /opt/. I'm including links to download it with or without the platform pre-installed:
After that you should be ready to install other software via "cabal install".

The modified Haskell Platform tarball contains my notes on the hacks that were necessary, copied below:

I did a few hacks to get this working with GHC 7.1.20101014

The first was to upgrade the mtl package to a slightly newer version,
downloaded from Hackage. Ditto for deepseq.

The second was to hack the .cabal file in haskell-src to be less
picky about upper bounds on version numbers.
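Relaxing an upper bound like that is usually just a one-line change to the build-depends constraint in the .cabal file. The exact packages and versions below are illustrative, not copied from haskell-src.cabal:

```cabal
-- Before: rejects the newer base that ships with GHC 7.1
build-depends: base >= 4 && < 4.3, pretty, array

-- After: loosened upper bound lets the newer base through
build-depends: base >= 4 && < 5, pretty, array
```

The risk, of course, is that the loosened package no longer compiles against the newer dependency, which is exactly what happened next.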

But I still ran into compile problems with haskell-src. (And there's
no new version of it released at the moment to upgrade to.) Namely, I
got the following compile error:

Could not find module `Data.Generics.Instances':
Use -v to see a list of the files searched for.

OK, so I added "syb" to the list of packages.
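Concretely: Data.Generics.Instances lives in the syb package once syb is split out of base, so the module only resolves if syb appears in build-depends. A sketch of the change (the real haskell-src.cabal will list different sibling dependencies):

```cabal
-- Hypothetical before/after for a package that used Data.Generics.*
build-depends: base >= 4 && < 5, pretty, array
-- becomes:
build-depends: base >= 4 && < 5, pretty, array, syb
```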

Next, a build error on QuickCheck... I upgraded it to a newer version,
but that didn't fix the problem -- "split" was still undefined.

The package 'parallel' had to be loosened up to tolerate newer containers versions.

HTTP complained about wanting base 3... but why was the "old-base"
flag set anyway?

Ditto zlib.

Finally, the cabal-install package also required some relaxation of
version numbers, and worse, it seems the type of a library function
has changed from a Maybe result to a definite one.

Here I had to make a choice between updating to cabal-install 0.9.2
or hacking the 0.8.2 version. It works to add an "fmap Just" to get
0.8.2 to build, and besides, the 0.9.2 version I have is actually just
the darcs head -- there hasn't been a release yet.
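The "fmap Just" trick in isolation: when a library function's result type changes from `IO (Maybe a)` to `IO a`, old call sites that pattern-match on the Maybe can be kept intact by lifting the definite result back into Maybe. The function names below are hypothetical stand-ins, not the actual Cabal library API:

```haskell
-- Hypothetical new-style API: now returns a definite result.
findCompiler :: IO FilePath
findCompiler = return "/opt/ghc-7.1/bin/ghc"

-- Compatibility wrapper: `fmap Just` restores the Maybe shape
-- that the old 0.8.2-era call sites expect.
findCompilerCompat :: IO (Maybe FilePath)
findCompilerCompat = fmap Just findCompiler

main :: IO ()
main = do
  mpath <- findCompilerCompat
  case mpath of
    Just p  -> putStrLn ("found: " ++ p)
    Nothing -> putStrLn "no compiler found"
```

Since the new API can no longer signal failure, the Nothing branch becomes dead code, which is harmless for getting an old client to build.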

After picking the hacked 0.8.2 version, `make` succeeded for my
modified haskell-platform-2010.2.

Tuesday, June 29, 2010

Good ol' FP infighting

A Schemer by upbringing, I know all about infighting!

It seems there's been an ongoing back-and-forth between certain ML and Haskell partisans about performance predictability. Further, there's been the accusation that Haskell has horrible hash table performance. (I haven't run these tests myself yet, but I'd like to.)

Anyway, the Haskell CnC project that I announced on the Intel blogs has been garnering some feedback. It got panned particularly brutally by Jon Harrop here. I don't know, maybe that's just what Jon likes to do. In any case, I wrote a response in the comments and it grew big enough that I thought I'd make it into a post (below).

Response to:

I think there's a bit of a misunderstanding here in the initial conflation of CnC with "graph reduction". CnC is a programming model based on static graphs (no reduction) like synchronous data flow. (I recommend the papers on StreamIT in this area.)

If anything, you could view this as backing down from some of the traditional Haskell implementation decisions. The user makes decisions as to the granularity of the parallel tasks (the granularity of the nodes in the graph), and CnC eschews lazy evaluation during graph execution.
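A loose sketch of the "static graph, eager evaluation" idea in plain Haskell -- this is not the CnC API, just an illustration. Two fixed nodes are connected by an MVar edge, and each value is forced with `seq` before it crosses the edge, so the consumer receives evaluated data rather than thunks:

```haskell
import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar)

-- A fixed two-node graph: producer -> consumer over one edge.
runGraph :: IO Int
runGraph = do
  chan <- newEmptyMVar   -- the edge between the two nodes
  done <- newEmptyMVar
  -- Node 1: produces squares; `seq` forces each value eagerly
  -- before it crosses the edge, so no thunks travel the graph.
  _ <- forkIO $
         mapM_ (\x -> let y = x * x in y `seq` putMVar chan y)
               [1 .. 4 :: Int]
  -- Node 2: consumes exactly four items and reduces them.
  _ <- forkIO $ do
         xs <- sequence (replicate 4 (takeMVar chan))
         putMVar done (sum xs)
  takeMVar done

main :: IO ()
main = runGraph >>= print   -- 1 + 4 + 9 + 16 = 30
```

The graph shape (two nodes, one edge, four items) is fixed before execution starts, which is the sense in which the model is static rather than a reduction of a dynamically unfolding expression tree.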

I guess I feel like I've been caught in the middle of some existing flame wars over hash tables and performance predictability. I'm aware of the existing battle over Haskell hash tables, and I have no problem with hash tables. The Haskell CnC implementation makes no use of persistence and can use mutable map data structures just as well as immutable ones. I use Haskell's hash tables, but since they don't support concurrent update (and aren't high-performance to begin with), they're not the default. I'd love better hash tables. In other implementations we use TBB hashmaps.

Regarding performance predictability -- sure, it's a problem, both because of lazy evaluation and the dependence on high-level (read: fragile) optimizations. But I don't see why this needs to be an argument. People who need great predictability today can use something else, while Haskellers continue to try to improve predictability and scaling. What else can they do? Even in this little CnC project we have made progress -- complex runtime, quirks and all -- and have Haskell CnC benchmarks getting parallel speedups of over 20X. Not perfect, but improving. (I'll give your ray-tracing benchmark a try sometime, if you like; that sounds fun.)

I also don't think it's a good idea to endorse Dmitriy Vyukov "kindly" dismissing all of Haskell and Erlang. That's an even more extreme position than Jon's own "doesn't scale beyond a few cores" stance.

Jon, I'm amenable to your arguments and not necessarily in a different camp (I'm an eager functional programmer more than a lazy one), but I would appreciate not being so cursorily panned!

P.S. (Regarding "the only problem that we've solved" being Mandelbrot: indeed, that would be bad if it were so. Please see the paper. We don't have nearly as many benchmarks as I'd like yet, but we're doing Cholesky factorization, the Black-Scholes equation, and the N-body problem at least. The paper also has some more in-depth discussion of scheduling. By the way, I don't consider any of the current schedulers to be anywhere near an endpoint; much low-hanging fruit remains.)

P.P.S. We also have a CnC implementation in F#.

I guess a blog wouldn't hurt.

I've been posting a bit on the Intel blogs recently, but I guess there will always be posts that aren't appropriate in that context, so why not create one of these newfangled Weblogs?