DefaultKeyedValues does not scale well

giorgio42 · Post by **giorgio42** » Wed Oct 31, 2007 12:06 am

Among other methods in this class the getIndex(Comparable key) method iterates over the data list to find the matching row. This does not scale well with category charts that contain several thousands data points.

I wrote a unit test that uses the LineAndShapeRenderer to create scatter plot and profiled it in NetBeans. With 500 data points (doubles) the chart creation spends about 2.6 seconds in DefaultKeyedValues.getIndex(Comparable) (Dell D600 with 1.8 GHz and 1 GB RAM).

In my real application I have more than 5000 data points and it takes more than 10 minutes to generate the chart (on a Intel D640 CPU and 2 GB RAM), which means it is simply unusable in my interactive web application.

Is there an alternative to this implementation (it needs to be CategoryPlot, otherwise I could use FastScatterPlot)?

Maybe the class could use something like LinkedHashMap instead of List to avoid the iteration?

Thanks,
Georg

Post by **david.gilbert** » Wed Oct 31, 2007 12:19 pm

Although it sounds like you might be mis-using the category classes a little (especially if your data points are on a "value" scale), it's still a good idea to fix this performance bottle-neck. There's only one snag - we still support JRE 1.3.1, and LinkedHashMap didn't get introduced until JRE 1.4. So we'll have to either find an appropriate implementation from somewhere else or write our own. In the mean time, you can work around the problem by creating your own CategoryDataset implementation...there's no absolute requirement to use the default one provided.

Post by **Taqua** » Wed Oct 31, 2007 1:54 pm

Well, no need for a fancy new class here. This case could be solved by just maintaining two collections, one list for the indexed access and one hash-map for the faster lookup.

And while working on it, this class can be redesigned to get rid of the internal use of KeyedValue-items altogether. This greatly reduces the number of objects created here - which makes the Garbage-collector happy and your code faster.

I've commited a change to the SVN that removes the use of DefaultKeyValue objects inside that class and that splits the internal storage into two lists (one for the keys, one for the values - both for indexed access) and a hashmap that manages the key to index mapping.

This implementation may be not the *best* one, (getting rid of the disgusting Java-Collections would create the fastest solution), but it is fundamentally better than what was there before. However, sorting and inserting keys is now more expensive - but considering the fact that people tend to read more than they insert or sort, this seems to be ok.

If the performance of sort is a problem, then real cause for the performance trouble would be the programmer who abused this class in such a horrible way.

Post by **david.gilbert** » Wed Oct 31, 2007 3:47 pm

Taqua wrote:Well, no need for a fancy new class here. This case could be solved by just maintaining two collections, one list for the indexed access and one hash-map for the faster lookup.

That seems like a good workaround.

Taqua wrote:And while working on it, this class can be redesigned to get rid of the internal use of KeyedValue-items altogether. This greatly reduces the number of objects created here - which makes the Garbage-collector happy and your code faster.

Not so sure about that. Before, there was a DefaultKeyedValue instance for each data item, pointing to a key object and a Number object (3 objects per data item). Now, there's still a key object and a Number object, plus an Integer object representing the index for the key, and a HashMap.Entry for each key in the map (4 objects per data item). I'm not bothered about that particularly though...

Taqua wrote:I've commited a change to the SVN that removes the use of DefaultKeyValue objects inside that class and that splits the internal storage into two lists (one for the keys, one for the values - both for indexed access) and a hashmap that manages the key to index mapping.

Thanks. I'm going to rename the keyToValueMapping field to 'indexMap', but otherwise this looks good. Note that the trunk in Subversion will be JFreeChart 1.2.x one day - the 1.0.x series is in a branch. I'll copy your changes across so they make it into the 1.0.7 release.

giorgio42 · Post by **giorgio42** » Wed Oct 31, 2007 11:37 pm

I saw that the changes have already been copied over to the branch in the svn repository, therefore I checked out the source code, compiled jfreechart and reran the unit test.

Generating a CategoryChart with 5500 data points now takes about 15 seconds on my laptop.

Creating a similar chart on my more powerful PC took - letting it ran overnight - actually took more than 2 hours without the performance fix.

In my real application most of the (about 25 million) values are null. I will try to filter those out before generating the chart.

The issue is resolved with a performance gain way beyond my expectations.

Thanks,
Georg