Post by lsvalgaard on Jul 14, 2012 14:30:28 GMT
Thanks! Updated the graphs for the new data (thanks, this literally doubled the number of data points I have). There's also a couple of new graphs showing just 2010-present; they don't show much of any interest though: cyberiantiger.org/lpdata. Just updated the existing images, so just refreshing the page here should work for the graphs I posted already.

I have a slight problem with the heavy red 'heat' in 2008 and 2009, when there were hardly any spots.
Post by lsvalgaard on Jul 14, 2012 15:31:03 GMT
Thanks! Updated the graphs for the new data (thanks, this literally doubled the number of data points I have). There's also a couple of new graphs showing just 2010-present; they don't show much of any interest though: cyberiantiger.org/lpdata. Just updated the existing images, so just refreshing the page here should work for the graphs I posted already.

Perhaps you should not normalize, but simply let the 'heat' show the 'density' of observations. Ah, you did that in the 'long version', and it is clear that some normalization is needed, but you may need some fudging to get around the very small number of observations in 2008-2009. E.g. try to use as Z-value the square root of what you use now.
Post by cybertiger on Jul 15, 2012 9:49:34 GMT
Agree about the normalizing.
The current algorithm uses the maximum possible value to normalize buckets, which leads to red splodges where there is only one data point.
I'm going to try a few different algorithms and see if I can come up with better ideas.
A simple one would be to normalize against the maximum value in the column, which would lead to at least one red point in every column of the map.
An improvement on this would be to add a minimum value to the normalization, so that a column with less than a certain amount of data never contains a maximum-value point (there's a rough sketch of this at the end of this post).
e.g. x = max(column); if (x < threshold) column = column / threshold; else column = column / x;
I don't think the kernel I'm using to make the heat map needs any alteration, at least; it appears to be working very well (if a little slow).
Edit: the graphs without normalization are on the linked page at the top.
Edit2: I might also try the sqrt approach, however there's one problem with this: the z value has a range of 0-MAX, and sqrt normalizes to 1. I could try sqrt(1+value) though. A better choice might be to use a log scale for z (and it's probably easier to justify from a scientific point of view).
Edit3: Thinking about the log scale some more: in order for it to work I'll need to fudge the z-scale minimum value, as the buckets very far away from any data contain very small values.
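For what it's worth, here's a rough Python/NumPy sketch of the column normalization above, plus the sqrt and log variants (this isn't the actual plotting code; the function name and the 'mode' switch are just for illustration):

    import numpy as np

    def normalize_columns(z, threshold, mode="linear"):
        # z is the 2-D array of bucket values (rows = field strength, cols = time).
        # Each column is divided by max(column max, threshold), so a column with
        # very little data never reaches full intensity; "sqrt" and "log" first
        # compress the dynamic range as discussed in Edit2/Edit3.
        z = np.asarray(z, dtype=float)
        if mode == "sqrt":
            z, threshold = np.sqrt(z), np.sqrt(threshold)
        elif mode == "log":
            z, threshold = np.log1p(z), np.log1p(threshold)  # log(1+z) avoids log(0)
        scale = np.maximum(z.max(axis=0), threshold)
        return z / scale

Setting threshold to the maximum possible value reproduces the current behaviour of dividing everything by that maximum; lowering it towards a typical column total gives the per-column scheme.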
Post by lsvalgaard on Jul 15, 2012 14:27:44 GMT
About Edit2: the sqrt has a range 0-sqrt(MAX). I don't see a problem with that. The log has problems with zero.
Post by cybertiger on Jul 15, 2012 20:33:30 GMT
About Edit2: the sqrt has a range 0-sqrt(MAX). I don't see a problem with that. The log has problems with zero.

E.g. if z has a range of 0-1 and you sqrt it, you make the problem worse, not better.

Anyway, I've come up with a slightly different approach and I'm currently working on improving it a little. The theory goes: instead of plotting a point for each row of data, we're plotting a probability surface with a variance in x-units and y-units. With two points, if you're plotting the probability that at least one point falls into a bucket, you don't simply sum the probabilities; instead p(A or B) = p(A) + p(B) - p(A and B). Another advantage is scaling: given the probability p(C) that one point will fall into some bucket, the probability for 2 points would be p(C)^2 and for n points p(C)^n. And I can call it scaling rather than some arbitrary normalizing function based on how I want the data to look, and I can even use fractional numbers of points falling into the bucket if I desire.

I'm currently modifying the code to allow specifying separate variances in x and y units (rather than choosing variances in terms of the number of buckets). I estimate the probability of a point landing in a bucket from the bivariate normal distribution (with a correlation of 0) evaluated at the centre point of the bucket. This method falls down if either the x or y variance is of the same order of magnitude as the width or height of the bucket respectively. Estimating the cumulative bivariate normal distribution in order to calculate this properly isn't impossible, but the mathematics is starting to cause me to struggle (there are recent papers on calculating it quickly, so I guess it's still an active area of study).

I'm finding it rather amusing that Gauss is responsible for not only the units of the y axis but also a lot of the maths I'm using to make the plot.
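Here's a rough Python/NumPy sketch of that idea (my own function and parameter names, not the real source): each observation contributes a probability of landing in a bucket, approximated by the bivariate normal density at the bucket centre times the bucket area, and the per-bucket probabilities are combined through the complement, which is the n-point generalization of p(A or B) = p(A) + p(B) - p(A and B) for independent points.

    import numpy as np

    def probability_surface(x, y, sigma_x, sigma_y, xedges, yedges):
        # Probability that at least one observation (xi, yi) falls in each bucket,
        # treating every observation as an independent bivariate normal with
        # correlation 0. The density-times-area approximation only holds while
        # the buckets are much smaller than sigma_x and sigma_y.
        xc = 0.5 * (xedges[:-1] + xedges[1:])  # bucket centres
        yc = 0.5 * (yedges[:-1] + yedges[1:])
        area = (xedges[1] - xedges[0]) * (yedges[1] - yedges[0])
        X, Y = np.meshgrid(xc, yc)
        norm = 1.0 / (2.0 * np.pi * sigma_x * sigma_y)
        p_none = np.ones_like(X)               # running P(no point in this bucket)
        for xi, yi in zip(x, y):
            dens = norm * np.exp(-0.5 * (((X - xi) / sigma_x) ** 2
                                         + ((Y - yi) / sigma_y) ** 2))
            p_none *= 1.0 - np.clip(dens * area, 0.0, 1.0)
        return 1.0 - p_none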
Post by cybertiger on Jul 15, 2012 23:18:26 GMT
Updated graphs using the probability method outlined in a recent post. I've updated the webpage with all this stuff (including new source code) here. I'm rather happy with this method of representing the data, as I don't think I've done anything which can't be justified from a scientific point of view. Having said that, I did a quick search for any published papers on using this method and didn't find any (I didn't try very hard though, and I don't have access to any of the published paper archives which require payment).
Post by sigurdur on Jul 16, 2012 0:44:46 GMT
Cybertiger: Thank you for the work and effort you have put into this.
This visual is very easy to understand, and confirms the trends.
Post by lsvalgaard on Jul 16, 2012 3:51:07 GMT
Cybertiger: Thank you for the work and effort you have put into this. This visual is very easy to understand, and confirms the trends.

Ditto that.
Post by cybertiger on Jul 16, 2012 8:45:52 GMT
Couple of things I've not mentioned.
The number of buckets is 640x480, and the graph is scaled such that the width and height of one bucket are equal in terms of pixels (I tried to aim for 1 bucket = 1 pixel, but I haven't managed to get gnuplot to behave quite how I want).
The x and y axis ranges are chosen such that all the data points fit in the graph (this is hopefully obvious).
It'd be nice to draw a best-fit line through the median points, and maybe also plot points and best-fit lines for the quartiles, but that's a project for another weekend. Medians and quartiles should be easy to calculate; getting gnuplot to overlay the points and best-fit lines might be more fun.
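If it's useful, here's a small Python sketch of the median/quartile part (illustrative only, with names I've made up): bin the observations by time, take the 25th/50th/75th percentiles of field strength in each bin, and write them to a text file that gnuplot can overlay on the heat map.

    import numpy as np

    def quartiles_by_time(t, b, nbins=40):
        # t = observation times, b = field strengths (gauss).
        # Returns rows of (bin centre, 25th, 50th, 75th percentile) for
        # every time bin that actually contains data.
        edges = np.linspace(t.min(), t.max(), nbins + 1)
        idx = np.clip(np.digitize(t, edges) - 1, 0, nbins - 1)
        rows = []
        for i in range(nbins):
            sel = b[idx == i]
            if sel.size:
                q1, med, q3 = np.percentile(sel, [25, 50, 75])
                rows.append((0.5 * (edges[i] + edges[i + 1]), q1, med, q3))
        return np.array(rows)

    # np.savetxt("quartiles.dat", quartiles_by_time(t, b)) then gives gnuplot
    # something to overlay, e.g. plot "quartiles.dat" using 1:3 with lines.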
Post by andrewuwe on Jul 16, 2012 10:34:13 GMT
Your probability images seem to show that we are already losing many sunspots to the L&P effect. I didn't realise.
Is there a way to picture the number or % that are "lost" against time? (And a testable prediction would be nice too.)
Post by cybertiger on Jul 16, 2012 11:32:47 GMT
Your probability images seem to show that we are already losing many sunspots to the L&P effect. I didn't realise. Is there a way to picture the number or % that are "lost" against time? (And a testable prediction would be nice too.)

I think it's very hard to quantify the number of sunspots being lost without making some clearly invalid assumptions. Fitting straight lines to the median and the lower and upper quartiles is probably a reasonable method, but the assumption that the trend is a straight line is clearly invalid, and also, as you point out, some sunspots aren't appearing (if the theory is correct) towards the end of the graph, which will skew the best-fit lines upwards (especially the one for the lower quartile).

Anyway, the points where the fitted lines for the lower quartile, median and upper quartile cross ~1450 gauss would be where the number of sunspots is reduced by 25%, 50% and 75% respectively. That would be a reasonable prediction, but hard to validate, as there are no good models for predicting the total number of sunspots beyond the next cycle. There's a wonderful presentation from 2009 on the state of solar cycle prediction here: www.leif.org/research/Predicting%20the%20Solar%20Cycle.pdf
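Purely as an illustration of that last paragraph (my own made-up code, not anything from the actual analysis, and it inherits the dubious straight-line assumption), fitting a line to a chosen percentile of the field strength per year and solving for the year it crosses ~1450 gauss might look like:

    import numpy as np

    def crossing_year(t, b, q, threshold=1450.0):
        # Fit a straight line to the q-th percentile of field strength per year
        # and return the year at which the fitted line drops to the threshold.
        years = np.unique(np.floor(t))
        quant = np.array([np.percentile(b[np.floor(t) == yr], q) for yr in years])
        slope, intercept = np.polyfit(years, quant, 1)
        if slope >= 0:
            return None  # no downward trend, so the line never crosses
        return (threshold - intercept) / slope

    # crossing_year(t, b, 25) ~ 25% of spots lost, q=50 ~ half, q=75 ~ three quarters.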
Post by lsvalgaard on Jul 31, 2012 21:55:55 GMT
updated with July data
Post by justsomeguy on Aug 1, 2012 1:02:48 GMT
More nice data.
When does the paper come out?
Post by lsvalgaard on Aug 1, 2012 2:52:42 GMT
More nice data. When does the paper come out?

I think in a week or two...
Post by sigurdur on Aug 1, 2012 2:59:37 GMT
Dr. Svalgaard: The jet stream over Europe is a bit south again this year. Last year NOAA had a piece indicating that UV rays have an effect on the placement of the jet stream.
In your opinion, is there something to this?