Fast plotting
i was looking at FastScatterPlot and it sure is fast, but it also removes a lot of the functionality of other plots.
i think we could unify "fast" plotting with nice plotting by just adding getX() and getY() methods to the Dataset interface classes.
the main problem with fast plotting is that these interfaces require the data be returned as Number objects and therefore force object creation. not good.
i think that all of the functionality gained from access through Number could be done with
double getX()
and
double getY()
as well (this is for XYDataset, but it's the same thing for all other kinds and varieties).
namely:
- use as integer / long: (int) getX(); / (long) getX()
- check for x == null: Double.isNaN(x)
as far as i can see, those are the only things the Number object (as opposed to a double value) is used for.
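A minimal sketch of the idea above, assuming a hypothetical PrimitiveXY interface (the interface and class names are made up for illustration and are not part of JFreeChart). It shows the two uses listed: casting to an integer and using Double.isNaN() where a Number-based API would test for null.

```java
// Hypothetical interface for illustration only; not the JFreeChart API.
interface PrimitiveXY {
    int getItemCount();
    double getX(int item);
    double getY(int item);
}

// Array-backed implementation: reading a value creates no Number objects.
final class ArrayXY implements PrimitiveXY {
    private final double[] x;
    private final double[] y;

    ArrayXY(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y must have the same length");
        }
        this.x = x;
        this.y = y;
    }

    @Override public int getItemCount() { return x.length; }
    @Override public double getX(int item) { return x[item]; }
    @Override public double getY(int item) { return y[item]; }
}

final class PrimitiveAccessDemo {
    // Counts the items that can be plotted, treating NaN as "no value".
    static int countPlottable(PrimitiveXY data) {
        int count = 0;
        for (int i = 0; i < data.getItemCount(); i++) {
            if (!Double.isNaN(data.getY(i))) {
                count++;
            }
        }
        return count;
    }

    // "Use as integer": the cast truncates toward zero.
    static int asInt(PrimitiveXY data, int item) {
        return (int) data.getX(item);
    }
}
```

Whether NaN is an acceptable stand-in for null is a separate question; the sketch only shows that the calling code stays simple.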
i also had some ideas on how to make a faster number object:
1) make a mutable subclass of Number and return that from your data. if you use final modifiers for doubleValue, this would be almost as fast as direct array access. since Numbers are supposed to be immutable in Java it would be a dangerous hack - but it might work in the context of JFreeChart drawing...
2) make a Number subclass with final methods for doubleValue etc. that would make it faster (by about a factor of 2), but it would still not do anything for memory performance.
what do ppl think of this and are there other ideas? i ran some tests and found that direct array access is largely useless. it's not any faster than a final method call (the JVM does indeed optimize the method call away), but it's a lot less flexible. IMHO there is never a need to directly access arrays. it's always better to wrap it in an interface.
-
- JFreeChart Project Leader
- Posts: 11734
- Joined: Fri Mar 14, 2003 10:29 am
- antibot: No, of course not.
- Contact:
A lot of the (relative) speed in the FastScatterPlot class comes from the fact that the data is rendered as a simple dot using:
g2.fillRect(transX, transY, 1, 1);
There is no shape lookup, translation, or outlining to worry about, so it is faster than the more general renderer mechanism. Some gain (of course) comes from the use of double primitives and the direct array access, but I'm fairly certain the data access time is usually small relative to the data drawing time (although I have no reproducible benchmarking code to back this up, yet).
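For readers who want to see the dot-rendering approach in isolation, here is a small sketch that draws each point with a single fillRect call onto an off-screen BufferedImage. The class name and the assumption that the data is already in pixel coordinates are mine; it is not the FastScatterPlot source.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

final class DotRenderSketch {
    // Draws each (x, y) pair as a 1x1 rectangle. The coordinates are assumed
    // to be in pixel space already, so no axis translation is done here.
    static BufferedImage render(double[] xs, double[] ys, int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g2 = image.createGraphics();
        try {
            g2.setColor(Color.WHITE);
            g2.fillRect(0, 0, width, height);
            g2.setColor(Color.RED);
            for (int i = 0; i < xs.length; i++) {
                int transX = (int) xs[i];
                int transY = (int) ys[i];
                g2.fillRect(transX, transY, 1, 1); // no shape lookup, no outline
            }
        } finally {
            g2.dispose();
        }
        return image;
    }
}
```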
My preference for using Number objects in the dataset interfaces is mainly for the following reasons:
(1) It is more convenient for displaying datasets using Swing's JTable / TableModel (important to me);
(2) It allows the use of 'null' to represent missing values.
(3) It works fast enough for a large fraction of common requirements.
I'm turning my focus now towards getting a stable version 1.0 completed, and therefore I think it is unlikely that the dataset interfaces will be changed. I'd change my mind if (a) there was some solid benchmark code that shows a dramatic improvement in performance (reproducible across platforms) resulting from dataset changes, and (b) support for the change among a large group of JFreeChart users.
David Gilbert
JFreeChart Project Leader


david,
i think you are spot on with the notion that fast drawing equates to drawing as little as possible. drawing as little as possible most likely has the largest effect on drawing speed. i agree that _accessing_ the Number object is fast enough...
i was not concerned with drawing speed as much as with creating the Number objects.
here is my use case, a typical scientific application: i have 30,000 data points, and display them on screen. then, i apply some sort of calculation to the data points. imagine i do this interactively by dragging around a slider.
if the API enforces the use of a Number object, this will be too slow, solely because i need to create 30,000 objects each time the data changes.
what i am afraid of is that enforcement of Number makes JFreeChart unusable for apps with a lot of data. the solution would be to offer Number as part of the interface for things that need it (like JTable) but to make it optional.
test results show that the overhead for object creation is significant. in the worst case (direct set of a value vs. creating a Number object containing that value) the difference is 50:1.
i also assume there will be a pretty big difference in memory use, both in static allocation (Number using a lot more memory than double) and in dynamic performance (creating lots of Objects causes lots of garbage collects/generally bad memory performance).
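To make the allocation point concrete, here is a small sketch (not a benchmark, and the class and method names are invented for illustration). Recomputing primitive data overwrites the array in place, while recomputing Number-based data has to produce one new object per point because the standard wrappers are immutable.

```java
final class UpdateCostSketch {
    // Primitive storage: the recalculation overwrites values in place,
    // so no objects are allocated however often the slider moves.
    static void scaleInPlace(double[] values, double factor) {
        for (int i = 0; i < values.length; i++) {
            values[i] *= factor;
        }
    }

    // Number storage: every recalculated value needs a new Double,
    // because an existing Double cannot be changed.
    static Number[] scaleAsNumbers(Number[] values, double factor) {
        Number[] result = new Number[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = Double.valueOf(values[i].doubleValue() * factor);
        }
        return result;
    }
}
```

With 30,000 points, the second method allocates 30,000 objects per update; how much that costs in practice depends on the JVM and garbage collector.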
some of these issues _could_ be addressed by caching mutable Number objects (and returning them to the cache when not used anymore). but not all.
i agree that Number is generally better than primitive types for all the reasons you mentioned. but it will also make writing scientific apps in JFreeChart a lot harder.
of course, we could just duplicate all the code and make fast versions of it, but in the interest of the architecture it would really be a lot better to find a common solution.
i think one approach would be to have both methods, e.g.
public Number getXValue(int series, int item);
and
public double getX(int series, int item);
in the interface.
if we then assume that Double.NaN is a valid replacement for a null value, we could gradually rewrite the code so that it uses getX() whenever possible, and getXValue() only when necessary.
this would cover JTable (where you have to convert your data to objects anyway even in scientific apps)... i think we can have the cake, and eat it, too.
my suggestion is therefore to add getX() and getY() - the primitive versions - to the XYDataset interface before 1.0 ships. then, we can still gradually make everything "fast" later... in effect that means gradually moving away from using Number objects everywhere where they are not needed.
david.gilbert wrote: I'm turning my focus now towards getting a stable version 1.0 completed, and therefore I think it is unlikely that the dataset interfaces will be changed. I'd change my mind if (a) there was some solid benchmark code that shows a dramatic improvement in performance (reproducible across platforms) resulting from dataset changes, and (b) support for the change among a large group of JFreeChart users.
i love JFreeChart, so i want to make sure it will work well for large datasets. basically all scientific applications deal with large data sets at one point or another.
-
OK, you are beginning to convince me. Regarding the Double.NaN representing missing values - is this a good assumption to make? It doesn't *feel* right to me, but I can't think of a valid reason why...are there any downsides to it?
David Gilbert
JFreeChart Project Leader


david.gilbert wrote: OK, you are beginning to convince me. Regarding the Double.NaN representing missing values - is this a good assumption to make? It doesn't *feel* right to me, but I can't think of a valid reason why...are there any downsides to it?
Good point. I had the same feeling, but, for lack of other options, suggested it anyway. Now i did some research...
to quote the VM spec at http://java.sun.com/docs/books/vmspec/2 ... s.doc.html
"The NaN value is used to represent the result of certain invalid operations such as dividing zero by zero"
for the purpose of plotting, the difference is not relevant. it just marks a point that can't be plotted.
i think that concludes the argument: missing and NaN are two different things. NaN has very specific semantics (such as that all arithmetic operations involving NaN return NaN as well, and all comparisons involving NaN return false, except != which returns true).
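The NaN semantics mentioned here are easy to check in plain Java. The class below is only a demonstration harness with invented method names; the behaviour it exercises is standard floating-point behaviour in Java.

```java
final class NaNSemanticsSketch {
    // Dividing zero by zero is one of the invalid operations that yields NaN.
    static double invalidResult() {
        double zero = 0.0;
        return zero / zero;
    }

    // Arithmetic involving NaN produces NaN again.
    static boolean propagates(double v) {
        return Double.isNaN(v + 1.0) && Double.isNaN(v * 0.0);
    }

    // For NaN this is false: NaN is not equal to anything, including itself.
    static boolean equalsItself(double v) {
        return v == v;
    }

    // Ordered comparisons with NaN are always false.
    static boolean lessThanOne(double v) {
        return v < 1.0;
    }
}
```

This is why a null check cannot be replaced by `x == Double.NaN`; Double.isNaN(x) has to be used instead.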
the question remains on whether or not we can live with that in JFreeChart.
actually... i can think of a problem: an algorithm could ignore missing data points but not NaN data points. ie. maybe you would want to do some extrapolation if the data points are missing, but, at the same time, pass on invalid results (as they should)... great, i just shot down my own suggestion.

too bad... if we can't use NaN, we would have to use an explicit
isMissing()
method in Dataset... maybe that is actually the cleanest option.
-
nikster wrote: too bad... if we can't use NaN, we would have to use an explicit isMissing() method in Dataset... maybe that is actually the cleanest option.
Yeah, I think that is the best approach. If I can't think of any other obstacles, I'll try to get these changes made for the 0.9.19 release.
David Gilbert
JFreeChart Project Leader


great!
we will have something like (names are just examples...)
getX()
getY()
isMissing(series, item)
& eventually use them everywhere we use getXValue() now, except places where we must have a Number object.
=> voila, scientific apps (and other apps with lots of data) just work in JFreeChart.
e.g. i can implement a super-fast XYDataset type with direct array access encapsulated in final versions of the "fast" methods above and it will work with all the rest of JFreeChart. which doesn't prevent us from doing things like FastScatterPlot for things that need even more speed. it's up to me how i implement isMissing() - from always returning false to checking for NaN to keeping a separate list.
impact on existing Number-based applications is minimal since the three methods are trivial to implement if you have the Number object.
very cool

-
Yes, thanks for suggesting the approach. Most others---unless I misunderstood them---have advocated *replacing* the Number objects with double primitives throughout the dataset interfaces. But your approach allows both to co-exist, and I think it is an excellent compromise (it allows a sort of "lazy" Number creation for those that want/need it). Of course, if someone else can see a big gotcha that we're missing, please speak up!
On the method names, do you prefer isMissing() or isUnknown()? I'm leaning towards the latter because a missing value is unknown, but an unknown value isn't necessarily missing. It is a small point though, I'm not that bothered by it...just thought I'd gather some opinions.
David Gilbert
JFreeChart Project Leader


i am pretty happy with our co-developed solution too
it's going to work.

david.gilbert wrote: On the method names, do you prefer isMissing() or isUnknown()? I'm leaning towards the latter because a missing value is unknown, but an unknown value isn't necessarily missing. It is a small point though, I'm not that bothered by it...just thought I'd gather some opinions.
no particular opinion.... i am just as happy to take isUnknown()
in the apps i was writing so far, we always had lots of "missing" data (because the data comes from sensors and the sensors sometimes don't work or measure something out of bounds etc). but you might as well call that unknown data. in fact, to say there is some data, but we don't know it is probably more accurate even in my case.
i don't know when or how the missing data condition occurs in other applications...
-
I just had a thought - the isMissing() method is redundant. Its purpose is only to determine the meaning of getY() when it returns Double.NaN (which is a double primitive)...is it really 'not a number' or is it equivalent to 'null' (a missing or unknown value)?
My first thought was to change isMissing() to isNaN() because that is what it really tells us. But then why not get the same information from the getYValue() method - by checking whether it returns 'null' or a Number object (which is, presuming the dataset is behaving consistently, a Double or Float object where the isNaN() method returns true).
David Gilbert
JFreeChart Project Leader


I'm currently developing an application that displays data sampled at 64 Hz over a duration of 60 s (3840 points). I ran into a CPU performance problem because of the recreation of 3840 Number objects at a rate of 64 Hz. One of my solutions was to subclass Number and allow the application to modify the value it holds. In other words, I made it not a final class, and that seemed to solve my problems.
Am I missing something?
i was considering the same thing.
the only problem is that this breaks the usual expectation that a Number is immutable. here is an old article about this: http://www.artima.com/intv/gosling313.html
what you did is create a mutable subclass of Number.
this is not a problem as long as you have full control over the use of your number objects. but because the Number classes included with Java (Double, Integer and so on) are immutable, it remains a hack. you could run into trouble when classes from the JDK or other libraries use the number object, because everybody assumes that a Number is immutable.
so they will probably, sometime, rely on the immutability of the number object. at best, this creates ambiguities, at worst, errors.
i think our solution will be better because we can avoid the topic altogether.
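A short sketch of how a mutable Number can go wrong once it leaves your control. MutableDouble and the collecting method are hypothetical names made up for this example; the point is only that code which keeps a reference, expecting the value to stay fixed, sees it change.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a mutable Number subclass of the kind discussed above.
final class MutableDouble extends Number {
    private double value;

    void set(double value) { this.value = value; }

    @Override public int intValue() { return (int) value; }
    @Override public long longValue() { return (long) value; }
    @Override public float floatValue() { return (float) value; }
    @Override public double doubleValue() { return value; }
}

final class MutableNumberPitfall {
    // Reuses a single MutableDouble for every sample to avoid allocation.
    // The list stands in for any code that stores the Number it was handed.
    static List<Number> collectWithSharedInstance(double[] samples) {
        MutableDouble shared = new MutableDouble();
        List<Number> kept = new ArrayList<>();
        for (double sample : samples) {
            shared.set(sample);
            kept.add(shared); // the same object is stored again and again
        }
        return kept;
    }
}
```

Every element of the returned list reports the last sample, because all entries are the same object. As long as nothing retains the reference between updates, the trick works, which matches the "full control" caveat above.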
david.gilbert wrote: I just had a thought - the isMissing() method is redundant. Its purpose is only to determine the meaning of getY() when it returns Double.NaN (which is a double primitive)...is it really 'not a number' or is it equivalent to 'null' (a missing or unknown value)?
well, i am not sure of that. i would think that if isMissing() returns true, the getY() value returned is irrelevant (i.e. undefined).
isNaN() is not really the same in this respect. we want to find out when the value is missing (in order to be compatible with the meaning of null in getYValue())... isMissing therefore must have the same semantics.
david.gilbert wrote: My first thought was to change isMissing() to isNaN() because that is what it really tells us. But then why not get the same information from the getYValue() method - by checking whether it returns 'null' or a Number object (which is, presuming the dataset is behaving consistently, a Double or Float object where the isNaN() method returns true).
uh, the whole point was to not have to call getYValue() wherever you don't need a number - in order to prevent unnecessary object creation.
since you don't know whether it's null in advance, you would end up creating all those numbers and the whole idea of doing lazy number initialization goes out the window.
to summarize the solution:
1 - in any kind of XYDataset, you add the following methods:
Code:
double getX(...) {
    return getXValue(...).doubleValue();
}
double getY(...) {
    return getYValue(...).doubleValue();
}
boolean isMissing(...) {
    return getXValue(...) == null;
}
2 - in all the drawing code and wherever you can substitute the Number object with the above calls, you do so.
3 - i can then implement the fast dataset like this
Code:
double[] x, y;
boolean[] missingValueMap;
getX(..) {
    return x[..];
}
getY(..) {
    return y[..];
}
isMissing(..) {
    return missingValueMap[..];
}
getXValue(..) {
    if (missingValueMap[..])
        return null;
    else
        return new Double(x[..]); // optionally cache the number object here...
}
getYValue(..) {
    if (missingValueMap[..])
        return null;
    else
        return new Double(y[..]);
}
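the advantage is that i then only create number objects when i must. and we can make the code work so that is very rarely, e.g. definitely not at drawing time.
the other advantage is that existing code does not change at all and people who don't want to deal with primitives don't have to.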
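The fast dataset outlined above could be filled in roughly as follows. This is a sketch under assumptions: the LazyNumberXY interface is a hypothetical stand-in (its method names follow the thread, not a released JFreeChart API), and the per-series two-dimensional arrays are one possible layout among several.

```java
// Hypothetical interface for this sketch; not a released JFreeChart API.
interface LazyNumberXY {
    double getX(int series, int item);
    double getY(int series, int item);
    boolean isMissing(int series, int item);
    Number getXValue(int series, int item);
    Number getYValue(int series, int item);
}

// Array-backed dataset: the primitive accessors never allocate.
final class ArrayBackedXY implements LazyNumberXY {
    private final double[][] x;
    private final double[][] y;
    private final boolean[][] missing;

    ArrayBackedXY(double[][] x, double[][] y, boolean[][] missing) {
        this.x = x;
        this.y = y;
        this.missing = missing;
    }

    @Override public double getX(int series, int item) { return x[series][item]; }
    @Override public double getY(int series, int item) { return y[series][item]; }
    @Override public boolean isMissing(int series, int item) { return missing[series][item]; }

    // Number objects are created only when a caller explicitly asks for one.
    @Override public Number getXValue(int series, int item) {
        return missing[series][item] ? null : Double.valueOf(x[series][item]);
    }

    @Override public Number getYValue(int series, int item) {
        return missing[series][item] ? null : Double.valueOf(y[series][item]);
    }
}

final class LazyNumberDemo {
    // Stand-in for drawing code: touches only the primitive accessors.
    static double sumPresentY(LazyNumberXY data, int series, int itemCount) {
        double sum = 0.0;
        for (int i = 0; i < itemCount; i++) {
            if (!data.isMissing(series, i)) {
                sum += data.getY(series, i);
            }
        }
        return sum;
    }
}
```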
one thing i had not thought of before: can it be that getXValue() returns null and getYValue returns != null? does that have a meaning? if so, we need to parameterize the isMissing method or have two of them...
agreed?

-
nikster wrote: well, i am not sure of that. i would think that if isMissing() returns true, the getY() value returned is irrelevant (e.g. undefined). isNaN() is not really the same in this respect. we want to find out when the value is missing (in order to be compatible with the meaning of null in getYValue())... isMissing therefore must have the same semantics.
If isMissing() returns true, getY() should always be Double.NaN. That way, you can call getY() and if it is a number, just use it. But if it is Double.NaN, then you can refer to isMissing() to resolve whether it is *really* not-a-number, or actually a missing or unknown value.
My point, though, is that you don't need isMissing() to do the resolution.
nikster wrote: uh, the whole point was to not have to call getYValue() wherever you don't need a number - in order to prevent unneccessary object creation. since you don't know whether it's null in advance, you would end up creating all those numbers and the whole idea of doing lazy number initialization goes out the window.
But notice that you only need to call getYValue() in one special circumstance where the y-value is either 'null' or 'new Double(Double.NaN)'. The former involves no object creation, and the latter could be a static instance shared by all datasets...no object creation required.
nikster wrote: 1 - in any kind of XYDataset, you add the following methods:
double getX(...) { return getXValue(...).doubleValue(); }
double getY() { return getYValue(..).doubleValue(); }
boolean isMissing(...) { return getXValue() == null; }
I'm using this as the default for getX() and getY() (it went into CVS this morning):
Code:
/**
 * Returns the x-value (as a double primitive) for an item within a series.
 *
 * @param series  the series (zero-based index).
 * @param item  the item (zero-based index).
 *
 * @return The x-value.
 */
public double getX(int series, int item) {
    double result = Double.NaN;
    Number x = getXValue(series, item);
    if (x != null) {
        result = x.doubleValue();
    }
    return result;
}

/**
 * Returns the y-value (as a double primitive) for an item within a series.
 *
 * @param series  the series (zero-based index).
 * @param item  the item (zero-based index).
 *
 * @return The y-value.
 */
public double getY(int series, int item) {
    double result = Double.NaN;
    Number y = getYValue(series, item);
    if (y != null) {
        result = y.doubleValue();
    }
    return result;
}
nikster wrote: 2 - in all the drawing code and wherever you can substitute the Number object with the above calls, you do so.
Agreed.
nikster wrote: 3 - i can then implement the fast dataset like this
double[] x, y;
boolean[] missingValueMap;
getX(..) { return x[..]; }
getY(..) { return y[..]; }
isMissing(..) { return missingValueMap[..] }
getXValue(..) {
    if (missingValueMap[..]) return null;
    else return new Double(x[..]); // optionally cache the number object here...
}
getYValue(..) {
    if (missingValueMap[..]) return null;
    else return new Double(y[..]);
}
the advantage is that i then only create number objects when i must. and we can make the code work so that is very rarely, e.g. definitely not at drawing time.
If you ensure that y[..] is 'Double.NaN' (a double primitive) when missingValueMap[..] is 'true', then getYValue(..) will return 'null' or 'new Double(Double.NaN)' which tells you the same information as calling isMissing(). Now just modify the getYValue() code slightly to check for Double.NaN and return a static instance to prevent creating a 'new Double(Double.NaN)' every time. Something like this:
Code:
getYValue(..) {
    if (missingValueMap[..])
        return null;
    else if (Double.isNaN(y[..]))
        return XYDataset.DOUBLE_NAN;
    else
        return new Double(y[..]);
}
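A runnable version of this idea might look like the sketch below. It is an assumption-laden illustration: it uses a single series, and it defines its own DOUBLE_NAN constant rather than relying on a constant in XYDataset, since the post only proposes one.

```java
final class SharedNaNDataset {
    // Local stand-in for the shared constant proposed in the post above.
    static final Double DOUBLE_NAN = Double.valueOf(Double.NaN);

    private final double[] y;
    private final boolean[] missing;

    SharedNaNDataset(double[] y, boolean[] missing) {
        this.y = y.clone();
        this.missing = missing.clone();
        // Enforce the convention: a missing item always reads as NaN.
        for (int i = 0; i < this.y.length; i++) {
            if (this.missing[i]) {
                this.y[i] = Double.NaN;
            }
        }
    }

    double getY(int item) {
        return y[item];
    }

    Number getYValue(int item) {
        if (missing[item]) {
            return null;               // missing or unknown: no object created
        }
        if (Double.isNaN(y[item])) {
            return DOUBLE_NAN;         // genuine NaN: shared instance, no allocation
        }
        return Double.valueOf(y[item]); // ordinary value: allocated on request
    }
}
```

With this in place, a caller that sees NaN from getY() can tell the two cases apart by checking whether getYValue() returns null.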
nikster wrote: one thing i had not thought of before: can it be that getXValue() returns null and getYValue returns != null? does that have a meaning? if so, we need to parameterize the isMissing method or have two of them...
Life is simpler if we require non-null x values, which is what I've always assumed (but the code may not enforce that everywhere). I can't think of an application that requires null x-values and defined y-values.
David Gilbert
JFreeChart Project Leader


david.gilbert wrote: If isMissing() returns true, getY() should always be Double.NaN. That way, you can call getY() and if it is a number, just use it. But if it is Double.NaN, then you can refer to isMissing() to resolve whether it is *really* not-a-number, or actually a missing or unknown value. My point, though, is that you don't need isMissing() to do the resolution.
i see.
let's assume there is no isMissing method and that getX() returns Double.NaN for missing values.
=> system finds Double.NaN
=> checks getXValue() to see if it's null or NaN
my getXValue code then does this:
1) check if it's missing and return null in that case
2) check if it's NaN and return a static NaN number in that case
3) create object Number and return it
only 3) would be expensive.
you are correct that that would work and would not trigger the object creation...
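On the calling side, the flow described above (find NaN, then consult the Number accessor) could be sketched like this. YSource, NumberBackedY and the Kind enum are hypothetical names for illustration; the Number-backed implementation stands in for an existing application that already stores Number objects.

```java
// Hypothetical two-method view of a dataset, for illustration only.
interface YSource {
    double getY(int item);
    Number getYValue(int item);
}

// An existing Number-based dataset: the primitive accessor is trivial to add.
final class NumberBackedY implements YSource {
    private final Number[] values;

    NumberBackedY(Number[] values) {
        this.values = values;
    }

    @Override public double getY(int item) {
        Number v = values[item];
        return v == null ? Double.NaN : v.doubleValue();
    }

    @Override public Number getYValue(int item) {
        return values[item];
    }
}

final class ResolveOnNaN {
    enum Kind { VALUE, MISSING, NOT_A_NUMBER }

    // The Number accessor is consulted only in the rare NaN case.
    static Kind classify(YSource data, int item) {
        double y = data.getY(item);
        if (!Double.isNaN(y)) {
            return Kind.VALUE;
        }
        return data.getYValue(item) == null ? Kind.MISSING : Kind.NOT_A_NUMBER;
    }
}
```

An explicit isMissing() method would replace the second lookup with a single boolean call; which reads better in the calling code is the open question in this post.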
the decision of whether to do it this way or that, in my opinion, depends entirely on which way to do this is more elegant in the calling code / the grand scheme of things.
in general, i am not one to hesitate in adding a new method to an interface if it makes things clearer or if it makes code dealing with the class less arcane.
i don't know JFreeChart well enough to see all the nitty-gritty details so i would not be able to make that call...