Yep, that's the one I found. I implemented algorithm ZS1, and it was slower than my own implementation.
First, some Java technicalities: I implemented it using the Iterator interface. That means you have to make two methods:
1) hasNext() which reports if there's a next element in the list;
2) next() which actually gives the next element.
I put all the actual calculation in hasNext(): if it finds a next partition, it reports 'true', and it also keeps that next partition ready. The implementation of next() then only entails returning that partition, plus two checks beforehand: whether one was already found, or whether we first have to find the next one (in case next() is called twice in a row without a hasNext() in between).
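The shape described above can be sketched like this. This is a minimal sketch with my own names, not the original code; the successor step is a standard descending-partition algorithm, not necessarily the exact one used:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Sketch: hasNext() does the real work and caches the next partition;
// next() just hands the cached one over.
class PartitionIterator implements Iterator<int[]> {
    private final int[] a;   // parts, in non-increasing order
    private int k;           // number of parts currently in use
    private boolean ready;   // is a[0..k-1] an unconsumed partition?
    private boolean done;

    PartitionIterator(int n) {
        a = new int[Math.max(n, 1)];
        a[0] = n;
        k = 1;
        ready = n > 0;       // the first partition is just (n)
        done = n <= 0;
    }

    @Override
    public boolean hasNext() {
        if (ready) return true;
        if (done) return false;
        // Step to the successor: strip trailing 1s, decrement the last
        // part > 1, then redistribute the remainder greedily.
        int rem = 0;
        while (k > 0 && a[k - 1] == 1) { rem++; k--; }
        if (k == 0) { done = true; return false; }
        a[k - 1]--;
        rem++;
        while (rem > a[k - 1]) { a[k] = a[k - 1]; rem -= a[k - 1]; k++; }
        a[k] = rem;
        k++;
        ready = true;
        return true;
    }

    @Override
    public int[] next() {
        // If next() is called twice in a row, find the next partition first.
        if (!ready && !hasNext()) throw new NoSuchElementException();
        ready = false;                 // mark the cached partition consumed
        return Arrays.copyOf(a, k);    // plain int array, no fancy objects
    }
}
```

For n = 4 this yields (4), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1), in that order.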
So color me skeptical when the built-in Java profiling said it spent an equal amount of time in both methods. Then I put time counters into the code myself, and this is what it reported for a run calculating the partitions of 75:
time in hasNext: 10828831745 8118265
time in next: 10659977218 8118263
The first number is the time in nanoseconds, the second the number of calls.
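The hand-rolled counters amount to wrapping each call in System.nanoTime(). A minimal sketch of that pattern, with my own names (the stand-in iterator is just for demonstration):

```java
import java.util.Iterator;
import java.util.List;

class IteratorTimer {
    // Drains an iterator, timing hasNext() and next() separately.
    // Returns {nanosInHasNext, callsToHasNext, nanosInNext, callsToNext}.
    static long[] drainTimed(Iterator<?> it) {
        long nsHasNext = 0, callsHasNext = 0, nsNext = 0, callsNext = 0;
        while (true) {
            long t0 = System.nanoTime();
            boolean more = it.hasNext();
            nsHasNext += System.nanoTime() - t0;
            callsHasNext++;
            if (!more) break;
            long t1 = System.nanoTime();
            it.next();
            nsNext += System.nanoTime() - t1;
            callsNext++;
        }
        return new long[] { nsHasNext, callsHasNext, nsNext, callsNext };
    }

    public static void main(String[] args) {
        long[] c = drainTimed(List.of(1, 2, 3).iterator());
        System.out.println("time in hasNext: " + c[0] + " " + c[1]);
        System.out.println("time in next:    " + c[2] + " " + c[3]);
    }
}
```

Note that a full drain makes one more hasNext() call than next() calls, which matches the slight difference in call counts above.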
My implementation of the ZS1 algorithm from the paper gives:
time in convert: 10953318714 8118263
time in hasNext: 10293843313 8118265
time in next: 10072671076 8118263
The extra method 'convert' is needed to convert the result to the right format.
For the partition 4 = 2 + 1 + 1, the ZS1 algorithm gives the list (2, 1, 1), and I have to convert that to the list (2, 1, 0, 0). As you can see, that costs about as much time as either of the other two methods.
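The convert step is essentially just zero-padding the ZS1 output to a fixed length. A hypothetical sketch (method and parameter names are mine, not from the original code):

```java
class Zs1Convert {
    // Pads the ZS1 result, which holds one entry per part, out to a
    // fixed-length array of n entries, e.g. (2, 1, 1) -> (2, 1, 0, 0).
    static int[] convert(int[] parts, int numParts, int n) {
        int[] out = new int[n];                        // zero-filled by default
        System.arraycopy(parts, 0, out, 0, numParts);  // copy the actual parts
        return out;
    }
}
```

Even this trivial copy shows up as a third of the runtime in the measurements above, which supports the point that per-call overhead dominates.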
My conclusion from this is that the time needed for the actual calculations is a trifle compared to the overhead of the function calls. It would be a nice experiment to rewrite the stuff in C or C++ to see if that works better.
Oh, and I already threw out most of the other overhead. The partitions I return are simple integer arrays, no fancy objects.