I have long-format data like the following.
year company value
2001 AAAAAAA 200
2002 AAAAAAA 300
2003 AAAAAAA 400
2001 BBBBBBB 150
2002 BBBBBBB 250
2000 CCCCCCC 500
2001 CCCCCCC 600
2002 CCCCCCC 550
Is it possible to calculate the average income for each company, and then make a histogram of those averages in dc.js?
What I first did was create a dimension by company and calculate the average, then try to use the average as the key.
The problem I ran into is that to make a histogram out of that, you have to reduceCount on a dimension of the averages. But of course the average does not exist in the crossfilter object.
This is my jsfiddle script.
http://jsfiddle.net/adamist521/n1383zdk/3/
An image of the desired final graph is also included in the script.
This is probably related to the following question, but I couldn't figure out how to deal with non-integer averages:
DC.js histogram of crossfilter dimension counts
To calculate a running average (Crossfilter is about running calculations) you need to keep track of both the sum and count. Once you have these, calculating the average is trivial. In fact, you should not calculate averages in Crossfilter at all, but just track the components. You need to use a custom group for this to track both count and sum on the same group.
There is an example of just such a custom group in the Crossfilter documentation at: https://github.com/crossfilter/crossfilter/wiki/API-Reference#group_order
In your dc.js chart, you would then define a valueAccessor to calculate the average at time of display:
chart
    // ...
    .valueAccessor(function(d) { return d.value.total / d.value.count; })
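The sum-and-count bookkeeping described above can be sketched as three reducer functions. This is only a minimal sketch, not the code from the fiddle: the property names `total` and `count` match the valueAccessor above, while the record field `value` is assumed from the sample data in the question and `companyDimension` is a hypothetical name. With Crossfilter loaded, the functions would be passed as `companyDimension.group().reduce(reduceAdd, reduceRemove, reduceInitial)`.

```javascript
// Reducers that track the components of an average rather than the average itself.
// Assumption: each record has a numeric `value` field, as in the question's sample data.
function reduceAdd(p, v) {
  p.count += 1;
  p.total += v.value;
  return p;
}

function reduceRemove(p, v) {
  p.count -= 1;
  p.total -= v.value;
  return p;
}

function reduceInitial() {
  return { count: 0, total: 0 };
}

// Computed only at display time; guards against an empty group.
function average(p) {
  return p.count ? p.total / p.count : 0;
}

// Stand-alone check without Crossfilter: fold one company's records by hand.
const rows = [
  { year: 2001, company: "AAAAAAA", value: 200 },
  { year: 2002, company: "AAAAAAA", value: 300 },
  { year: 2003, company: "AAAAAAA", value: 400 },
];
const acc = rows.reduce(reduceAdd, reduceInitial());
console.log(average(acc));
```

Because the group stores only `count` and `total`, a filter that removes a record just subtracts from both, and the average stays correct without rescanning the data.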
Related
I'm writing a script that superimposes many cliparts (small pictures) on one large image. The width of the overlaid cliparts can be random, for example "from 20 to 100 pixels". As an additional effect, blur is added to each clipart; its level is also given as a range, for example "from 2.00 to 8.00", and is applied randomly. There are no problems with this, everything works fine: the cliparts are added randomly, with a random blur...
Now I want to make it so that the smaller the current clipart is (in the program loop), the more it is blurred, taking into account the two specified ranges of values: width in pixels "from 20 to 100" and blur level "from 2.00 to 8.00".
For example:
If the clipart is 100 pixels wide, the minimum blur of 2 should be applied.
If the clipart is 20 pixels wide, the maximum blur of 8 should be applied.
I do not understand how to correctly calculate the needed blur, from 2.00 to 8.00, for any clipart width in the range of 20 to 100. Please help me make this calculation. I do not understand mathematics well and cannot find a formula.
Okay, I was helped to find the answer to this question on the Russian version of this site here: https://ru.stackoverflow.com/questions/767962/
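For reference, one common way to connect the two ranges is an inverted linear mapping. The sketch below is not taken from the linked answer. It assumes a straight-line relationship is acceptable, and `blurForWidth` is a hypothetical helper name.

```javascript
// Map a clipart width linearly onto a blur level, inverted:
// the widest clipart gets the weakest blur and the narrowest gets the strongest.
function blurForWidth(width, minWidth, maxWidth, minBlur, maxBlur) {
  // Position of the width inside its range: 0 for the narrowest, 1 for the widest.
  const t = (width - minWidth) / (maxWidth - minWidth);
  // Invert, so that t = 0 gives maxBlur and t = 1 gives minBlur.
  return maxBlur - t * (maxBlur - minBlur);
}

console.log(blurForWidth(100, 20, 100, 2, 8)); // widest clipart
console.log(blurForWidth(20, 20, 100, 2, 8));  // narrowest clipart
console.log(blurForWidth(60, 20, 100, 2, 8));  // halfway between
```

With the ranges from the question, a width of 100 gives blur 2, a width of 20 gives blur 8, and a width of 60, halfway through the range, gives blur 5.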
I'm trying to reduce the number of points in a DC.js line chart to improve performance. The docs lead me to believe xUnits() is the way to do this:
The coordinate grid chart uses the xUnits function to calculate the number of data projections on x axis such as the number of bars for a bar chart or the number of dots for a line chart.
but xUnits does not even seem to be used:
http://jsfiddle.net/m5tguakf/2/
What am I doing wrong?
The number of points is actually determined by crossfilter - dc.js doesn't do any aggregation on its own, so it has no way to add or reduce the number of points.
That documentation may be misleading - it doesn't alter the shape of the data. xUnits is really just needed for dc.js to know the number of elements it is going to draw. It's used for two purposes:
to determine the width of bars or box-plots
to know whether the x scale is ordinal or quantitative
Could dc.js just count the number of points in the crossfilter group? Perhaps.
Anyway, to get back to your original question: if you want to reduce the number of points drawn, aggregate your data differently in your group. Usually this means creating larger bins which either sum or average the data which fall into that interval.
As a simple example, you can combine every other point in your fiddle by binning by even numbers, like so:
var BINSIZE = 2;
// ...
speedSumGroup = runDimension
    .group(function(r) { return Math.floor(r / BINSIZE) * BINSIZE; })
// ...
http://jsfiddle.net/gordonwoodhull/djrhodkj/2/
This causes e.g. both Run 6 and Run 7 to fall in the same bin, because they have the same group key. In a real example, you'd probably want to average them, as shown in the annotated stock example.
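To see why Run 6 and Run 7 share a bin, and what averaging within a bin could look like, here is a small stand-alone sketch in plain JavaScript. It does not use crossfilter, and the `runs` array with its `speed` values is made up for illustration. Only the key function is taken from the group definition above.

```javascript
const BINSIZE = 2;

// Same key function as in the group definition above.
function binKey(r) {
  return Math.floor(r / BINSIZE) * BINSIZE;
}

// Hypothetical data: a run number with one speed measurement each.
const runs = [
  { run: 6, speed: 10 },
  { run: 7, speed: 14 },
  { run: 8, speed: 20 },
];

// Accumulate sum and count per bin, then derive each bin's average.
const bins = new Map();
for (const d of runs) {
  const key = binKey(d.run);
  const bin = bins.get(key) || { sum: 0, count: 0 };
  bin.sum += d.speed;
  bin.count += 1;
  bins.set(key, bin);
}
const averages = [...bins].map(([key, b]) => ({ key, value: b.sum / b.count }));
console.log(averages);
```

Runs 6 and 7 both map to key 6, so three input points become two plotted points.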
I have some data of the form student + test points, which I would like to plot. The test scores have a maximum of 100 points and a minimum of 0. For example:
John 56points
Ann 72points
and so on for all the students
I have the data nicely in an array. I would like to draw a bar plot with 10 bars, each corresponding to a 10-point range of test scores, so the first bar is 0-10 and so on up to the last one, 90-100. The height of each bar should be the number of students whose grades fall in that range.
My question is: can d3 do this for me with the data format I have, or should I transform my data, do the counting, and then simply plot the new data? What is the proper way of showing different aspects of the same data?
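The counting step described in the question is small enough to sketch in plain JavaScript, independent of whichever charting code draws the bars. The field names `name` and `points`, the extra sample students, and the helper `countByRange` are assumptions for illustration. A score of exactly 100 is placed in the last (90-100) bar.

```javascript
// Hypothetical input: one entry per student, shaped like the question's examples.
const students = [
  { name: "John", points: 56 },
  { name: "Ann", points: 72 },
  { name: "Sam", points: 100 }, // made-up extra rows to exercise the edge cases
  { name: "Eve", points: 9 },
];

// Count how many scores fall into each range of width `binWidth`.
function countByRange(data, binWidth, maxScore) {
  const nBins = Math.ceil(maxScore / binWidth);
  const counts = new Array(nBins).fill(0);
  for (const d of data) {
    // A perfect score would index one past the end, so clamp it into the last bin.
    const i = Math.min(Math.floor(d.points / binWidth), nBins - 1);
    counts[i] += 1;
  }
  return counts;
}

const counts = countByRange(students, 10, 100);
console.log(counts);
```

Each entry of `counts` is the height of one bar, so the original array can stay untouched and the counts can be recomputed whenever a different bin width is wanted.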
I have some data which was collected over 6 days, between 8:00 AM and 11:00 AM. I need to plot all the data on the same plot, one over the other. This is how I am doing it now:
hold on
plot(y1,x1,':b*','MarkerEdgeColor','k')
plot(y2,x2,':r*','MarkerEdgeColor','k')
plot(y3,x3,':y*','MarkerEdgeColor','k')
plot(y4,x4,':g*','MarkerEdgeColor','k')
plot(y5,x5,':c*','MarkerEdgeColor','k')
plot(y6,x6,':w*','MarkerEdgeColor','k')
datetick('x','HH:MM:SS')
hold off
where x1 to x6 hold the y-axis data and y1 to y6 are computed as
y(i) = datenum(Year(1:5), Month(1:5), Input_Vector(1:5,2), Input_Vector(1:5,3), Input_Vector(1:5,4), Input_Vector(1:5,5));
When I plot using the above, I get the attached image.
But I need to find patterns by observing the data, so I need the days plotted one above the other, with the x axis running from 8:00:00 to 11:00:00.
I need something like the second graph, which I got by setting the DAY parameter to a constant date.
If you want to plot one day over another, then the method you used to make the second graph - discarding/replacing the date part of your datetime - is likely the best way to do it. It matches up nicely with the conceptual question that the graph answers, i.e.: "Is there a link between time of day and duration of journey, regardless of the day it was taken on?"
If you still want to preserve the day information, you could always perform the multiple plots with different line specs, and have the legend show which line corresponds to which day.
If the above question - finding a link between time and journey duration - is what you are trying to do, rather than plotting that specific type of graph, I would also try something like this:
Split your day into half hour or quarter hour slots and take the average of all data points in each block. This gives you a single value for each half/quarter hour span.
Plot this as a bar chart with error bars showing standard error (this can be done using bar and errorbar).
If you see anything, try fitting it with an appropriate model and check for goodness of fit. In your case this would probably be a Gaussian model, as your data kinda looks like it peaks around 9:20.
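The slot-averaging step in the list above can be sketched as follows. This is written in JavaScript rather than MATLAB, so treat it as an outline of the computation and not as drop-in code for the question. The `samples` array and the helper name `summarizeSlots` are made up for illustration. The resulting means and standard errors are what would be handed to bar and errorbar.

```javascript
// Hypothetical samples: minutes since midnight plus a measured value (e.g. journey duration).
const samples = [
  { minute: 8 * 60 + 5, value: 30 },
  { minute: 8 * 60 + 20, value: 34 },
  { minute: 8 * 60 + 40, value: 40 },
  { minute: 8 * 60 + 50, value: 44 },
  { minute: 8 * 60 + 55, value: 48 },
];

// Group values into fixed-length time slots and summarize each slot.
function summarizeSlots(data, slotMinutes) {
  const slots = new Map();
  for (const d of data) {
    const start = Math.floor(d.minute / slotMinutes) * slotMinutes;
    if (!slots.has(start)) slots.set(start, []);
    slots.get(start).push(d.value);
  }
  return [...slots].map(([start, values]) => {
    const n = values.length;
    const mean = values.reduce((a, b) => a + b, 0) / n;
    // Sample variance (n - 1 denominator); undefined for a single point, so report NaN.
    const variance = n > 1
      ? values.reduce((a, v) => a + (v - mean) ** 2, 0) / (n - 1)
      : NaN;
    // Standard error of the mean: sample standard deviation divided by sqrt(n).
    return { start, n, mean, stdErr: Math.sqrt(variance / n) };
  });
}

const summary = summarizeSlots(samples, 30);
console.log(summary);
```

Here `start` is the slot's starting minute of the day, so 480 is the 8:00 slot and 510 is the 8:30 slot.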
Let's say I have a list of values and I have already chunked them into groups to make a histogram.
Since Excel doesn't have histograms, I made a bar plot using the groups I developed. Specifically, I have the frequencies 2 6 12 10 2 and it produces the bar plot you see below.
Next, I want to add a normal distribution (line plot) with a mean of 0.136 and standard deviation of 0.497 on top of this histogram. How can I do this in Excel? I need the axes to line up such that the curve takes up the width of the bar plot. Otherwise, you get something like the chart I've attached.
But... the normal curve should be overlaid on the bar plot. How can I get this effect?
There are two main parts to this answer:
First, I reverse-engineered the grouped data to come up with an appropriate mean and standard deviation on this scale.
Second, I employed some chart trickery to make the normal distribution curve look right when superimposed on the column chart. I used Excel 2007 for this; hopefully you have the same options available in your version.
Part 1: Reverse-Engineer
The column B formulae are:
Last Point =MAX(A2:A6)
Mean =SUMPRODUCT(B2:B6,A2:A6)/SUM(B2:B6)
E(x^2f) =SUMPRODUCT(A2:A6^2,B2:B6)
E(xf)^2 =SUMPRODUCT(A2:A6,B2:B6)^2
E(f) =SUM(B2:B6)
Variance =B10-B11/B12
StDev =SQRT(B13/(B12-1))
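As a cross-check of the worksheet formulas above, the same arithmetic can be sketched outside Excel. The frequencies are the ones from the question. The category values 1 to 5 in column A are an assumption, inferred from the 0.5-5.5 axis range used for the chart. Note that the row labelled "Variance" holds the sum of squared deviations, and the division by n - 1 only happens in the StDev row.

```javascript
// Column A: category values (assumed to be 1..5). Column B: frequencies from the question.
const x = [1, 2, 3, 4, 5];
const f = [2, 6, 12, 10, 2];

function groupedStats(x, f) {
  const n = f.reduce((a, b) => a + b, 0);                        // SUM(B2:B6)
  const sumXF = x.reduce((a, xi, i) => a + xi * f[i], 0);        // SUMPRODUCT(A2:A6,B2:B6)
  const sumX2F = x.reduce((a, xi, i) => a + xi * xi * f[i], 0);  // SUMPRODUCT(A2:A6^2,B2:B6)
  const mean = sumXF / n;
  const sumSquares = sumX2F - (sumXF * sumXF) / n;               // B10-B11/B12
  const stDev = Math.sqrt(sumSquares / (n - 1));                 // SQRT(B13/(B12-1))
  return { n, mean, sumSquares, stDev };
}

const stats = groupedStats(x, f);
console.log(stats);
```

Under that assumption the mean comes out at 3.125 and the standard deviation at roughly 1.008, both expressed on the category scale of the column chart.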
Part 2: Chart Trickery
Data table:
Column D is just an incremental counter. This will be the number of data points in the normal distribution curve.
E2 =D2/$B$8 etc.
F2 =NORMDIST(E2,$B$9,$B$14,FALSE) etc.
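For readers who want to see what column F computes, NORMDIST with FALSE as its last argument returns the normal density. The sketch below reproduces that density and samples it across the chart's x range. The even spacing, the 0.5-5.5 range, and the mean and standard deviation passed in are illustrative assumptions. They are not the answer's exact column E values.

```javascript
// Normal probability density, the quantity NORMDIST(x, mean, sd, FALSE) returns.
function normalPdf(x, mean, sd) {
  const z = (x - mean) / sd;
  return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2 * Math.PI));
}

// Sample the curve at evenly spaced x values between `from` and `to` (inclusive).
function sampleCurve(mean, sd, from, to, steps) {
  const points = [];
  for (let i = 0; i <= steps; i++) {
    const xVal = from + (i * (to - from)) / steps;
    points.push({ x: xVal, y: normalPdf(xVal, mean, sd) });
  }
  return points;
}

// Illustrative parameters only: a mean and standard deviation on the 1..5 category scale.
const curve = sampleCurve(3.125, 1.008, 0.5, 5.5, 50);
console.log(curve.length, curve[0], curve[curve.length - 1]);
```

The density values are on a different scale from the bar counts, which is why the answer puts the curve on its own secondary Y-axis.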
Chart:
Now, add Columns E:F to the chart. You will need to massage a few things:
Change the series to be an X-Y plot. This might require some editing of the chart series to force a single series to use your desired X and Y values.
Change the series to use the secondary axes (both X and Y).
Change the secondary X-axis range to 0.5-5.5 (i.e., 0.5 on either side of the column chart category values). This will effectively align the primary and secondary X-axes.
Change the secondary Y-axis range to 0-1
Format the X-Y series appearance to taste (I suggest removing value markers).
The result so far:
Lastly, you can remove the tick marks and labels on the secondary axes to clean up the look.
Postscript: Thanks to John Peltier for innumerable charting inspirations over the years.