Let's say I have a list of values and I have already chunked them into groups to make a histogram.
Since Excel doesn't have histograms, I made a bar plot using the groups I developed. Specifically, I have the frequencies 2 6 12 10 2 and it produces the bar plot you see below.
Next, I want to add a normal distribution (line plot) with a mean of 0.136 and standard deviation of 0.497 on top of this histogram. How can I do this in Excel? I need the axes to line up so that the curve spans the full width of the bar plot. Otherwise, you get something like what I've attached.
But the normal curve should be overlaid on the bar plot. How can I get this effect?
There are two main parts to this answer:
First, I reverse-engineered the grouped data to come up with an appropriate mean and standard deviation on this scale.
Second, I employed some chart trickery to make the normal distribution curve look right when superimposed on the column chart. I used Excel 2007 for this; hopefully you have the same options available in your version.
Part 1: Reverse-Engineer
The column B formulae are:
Last Point =MAX(A2:A6)
Mean =SUMPRODUCT(B2:B6,A2:A6)/SUM(B2:B6)
E(x^2f) =SUMPRODUCT(A2:A6^2,B2:B6)
E(xf)^2 =SUMPRODUCT(A2:A6,B2:B6)^2
E(f) =SUM(B2:B6)
Variance =B10-B11/B12
StDev =SQRT(B13/(B12-1))
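As a rough cross-check of the formulas above, the same grouped-data calculation can be sketched in JavaScript. This assumes column A holds the category positions 1 to 5 (not stated in the answer, but consistent with the 0.5-5.5 axis range used in Part 2) and column B holds the frequencies from the question. The function name `groupedStats` is made up for illustration.

```javascript
// Grouped-data mean and sample standard deviation, mirroring the column B formulas.
// Assumption: x holds the category positions 1..5, f holds the frequencies.
function groupedStats(x, f) {
  const n = f.reduce((acc, fi) => acc + fi, 0);                      // SUM(B2:B6)
  const sumXF = x.reduce((acc, xi, i) => acc + xi * f[i], 0);        // SUMPRODUCT(A, B)
  const sumX2F = x.reduce((acc, xi, i) => acc + xi * xi * f[i], 0);  // SUMPRODUCT(A^2, B)
  const mean = sumXF / n;
  // The cell labelled "Variance" holds the sum of squared deviations.
  const sumSq = sumX2F - (sumXF * sumXF) / n;
  const stDev = Math.sqrt(sumSq / (n - 1));
  return { n: n, mean: mean, sumSq: sumSq, stDev: stDev };
}

const stats = groupedStats([1, 2, 3, 4, 5], [2, 6, 12, 10, 2]);
console.log(stats.mean, stats.stDev);
```

Under that assumption the mean comes out at 3.125 and the standard deviation at roughly 1.008 on the category scale.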
Part 2: Chart Trickery
Data table:
Column D is just an incremental counter. This will be the number of data points in the normal distribution curve.
E2 =D2/$B$8 etc.
F2 =NORMDIST(E2,$B$9,$B$14,FALSE) etc.
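For readers who want to see what column F contains, here is a minimal JavaScript sketch of the density that NORMDIST returns when its last argument is FALSE. The helper names, the even spacing of the x values, and the sample mean and standard deviation (3.125 and 1.008, which assume categories positioned at 1 to 5) are illustrative assumptions, not the answer's exact worksheet layout.

```javascript
// Normal probability density, the quantity NORMDIST(x, mean, sd, FALSE) returns.
function normalPdf(x, mean, sd) {
  const z = (x - mean) / sd;
  return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2 * Math.PI));
}

// Build (x, y) pairs for the curve, evenly spaced from xMin to xMax inclusive.
function curvePoints(mean, sd, xMin, xMax, steps) {
  const points = [];
  for (let i = 0; i <= steps; i++) {
    const x = xMin + (i * (xMax - xMin)) / steps;
    points.push({ x: x, y: normalPdf(x, mean, sd) });
  }
  return points;
}

// Assumed inputs: mean 3.125 and sd 1.008 on a 1..5 category scale,
// sampled across a 0.5 to 5.5 axis range.
const points = curvePoints(3.125, 1.008, 0.5, 5.5, 50);
console.log(points.length);
```

More steps give a smoother line; the curve peaks at the mean, where the density equals 1 / (sd * sqrt(2 * pi)).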
Chart:
Now, add Columns E:F to the chart. You will need to massage a few things:
Change the series to be an X-Y plot. This might require some editing of the chart series to force a single series to use your desired X and Y values.
Change the series to use the secondary axes (both X and Y).
Change the secondary X-axis range to 0.5-5.5 (i.e., 0.5 on either side of the column chart category values). This will effectively align the primary and secondary X-axes.
Change the secondary Y-axis range to 0-1
Format the X-Y series appearance to taste (I suggest removing value markers).
The result so far:
Lastly, you can remove the tick marks and labels on the secondary axes to clean up the look.
Postscript: Thanks to Jon Peltier for innumerable charting inspirations over the years.
I would like to create a barplot in TIBCO Spotfire with frequency on Y axis based on two factors: Stage and Genotype.
This is the standard expression that I have from Spotfire:
Count() THEN [Value] / Sum([Value]) OVER (All([Axis.X]))
It turns out I do not want the frequency over ALL the data, but within each Stage, so that the frequencies within each stage sum to 100%.
I watched some videos and still have not figured it out.
I tried to find a solution to your problem but couldn't find a working expression. This may still help you:
What I would have done in your case is:
remove Genotype from the X axis
set the visualization to 100% stacked bars (via right-click)
add Genotype as the color-by parameter (in the visualization options)
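To make the target quantity concrete, here is a small JavaScript sketch of "frequency within Stage": each (Stage, Genotype) count divided by the total for that Stage. It is not a Spotfire expression. The field names `stage` and `genotype` and the function name are hypothetical, and it only illustrates the proportions a 100% stacked bar chart would be expected to display.

```javascript
// Share of each Genotype within its Stage, so that every Stage sums to 1 (100%).
// Assumption: each row is an object with hypothetical fields `stage` and `genotype`.
function shareWithinStage(rows) {
  const stageTotals = new Map();
  const pairCounts = new Map();
  for (const row of rows) {
    stageTotals.set(row.stage, (stageTotals.get(row.stage) || 0) + 1);
    const key = row.stage + "|" + row.genotype;
    pairCounts.set(key, (pairCounts.get(key) || 0) + 1);
  }
  const result = [];
  for (const [key, count] of pairCounts) {
    const parts = key.split("|");
    result.push({ stage: parts[0], genotype: parts[1], share: count / stageTotals.get(parts[0]) });
  }
  return result;
}
```

The denominator is the per-Stage total rather than the grand total, which is the difference from the default expression in the question.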
When I make a scatter plot, by default the axes show fractions from 0.0 to 1.0.
For example, the following graph contains a straight line that goes from (0,0) to (10m,10m), but it shows:
Detailed data generation is shown at: Large plot: ~20 million samples, gigabytes of data
How to make the axes show from 0 to 10 million instead?
The inspiration for this comes from this question.
Tested in VisIt 2.13.3.
Since a scatter plot associates variables of potentially radically different scales, by default it maps each variable's range into [0,1]. We have this ticket for it. You can change this manually by opening the scatter plot attributes window, going to the Appearance tab, and unchecking the 'Normalize the axes to a cube' option.
I'm trying to reduce the number of points in a DC.js line chart to improve performance. The docs lead me to believe xUnits() is the way to do this:
The coordinate grid chart uses the xUnits function to calculate the number of data projections on x axis such as the number of bars for a bar chart or the number of dots for a line chart.
but xUnits does not even seem to be used:
http://jsfiddle.net/m5tguakf/2/
What am I doing wrong?
The number of points is actually determined by crossfilter - dc.js doesn't do any aggregation on its own, so it has no way to add or reduce the number of points.
That documentation may be misleading - it doesn't alter the shape of the data. xUnits is really just needed for dc.js to know the number of elements it is going to draw. It's used for two purposes:
to determine the width of bars or box-plots
to know whether the x scale is ordinal or quantitative
Could dc.js just count the number of points in the crossfilter group? Perhaps.
Anyway, to get back to your original question: if you want to reduce the number of points drawn, aggregate your data differently in your group. Usually this means creating larger bins which either sum or average the data which fall into that interval.
As a simple example, you can combine every other point in your fiddle by binning by even numbers, like so:
var BINSIZE = 2;
// ...
speedSumGroup = runDimension
.group(function(r) { return Math.floor(r/BINSIZE) * BINSIZE; })
// ...
http://jsfiddle.net/gordonwoodhull/djrhodkj/2/
This causes e.g. both Run 6 and Run 7 to fall in the same bin, because they have the same group key. In a real example, you'd probably want to average them, as shown in the annotated stock example.
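As a rough illustration of averaging within bins, here is a plain-JavaScript sketch that uses the same key function on an ordinary array, without crossfilter. The record fields `run` and `speed` and the function names are assumptions made for the example. In crossfilter, the same sum-and-count bookkeeping would typically go into the functions passed to the group's `reduce`.

```javascript
const BINSIZE = 2;

// Same key function as in the snippet above: Run 6 and Run 7 both map to bin 6.
function binKey(run) {
  return Math.floor(run / BINSIZE) * BINSIZE;
}

// Average the values that fall into each bin (plain arrays, no crossfilter).
// Assumption: records look like { run: <number>, speed: <number> }.
function averageByBin(records) {
  const bins = new Map();
  for (const record of records) {
    const key = binKey(record.run);
    const bin = bins.get(key) || { sum: 0, count: 0 };
    bin.sum += record.speed;
    bin.count += 1;
    bins.set(key, bin);
  }
  return Array.from(bins, function (entry) {
    return { key: entry[0], value: entry[1].sum / entry[1].count };
  });
}
```

Tracking both the sum and the count per bin, rather than the average itself, keeps the result correct as records are added one at a time.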
I am drawing charts using dc.js. The following is a Frequency vs. Day chart:
I am using the following line to generate the titles:
..something.yAxisLabel("Frequency").xAxisLabel('Day');
The problem, as you can see, is that when the frequency is very large, the Y axis title collides with the frequency numbers. Is there a simple way to move the Y axis title to the left?
The layout of auxiliary elements such as axes and legends is not completely automatic in dc.js; use .margins() to adjust where necessary.
https://github.com/dc-js/dc.js/blob/master/web/docs/api-latest.md#marginsmargins
It would be great to figure this out automatically but it is difficult to calculate, and easy to work around, so I guess no one has gotten annoyed enough to submit a fix. :)
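A minimal sketch of the workaround, assuming `chart` is a dc.js chart whose `margins()` returns an object of the form `{top, right, bottom, left}` when called without arguments and sets the margins when given one. The helper name `widenLeftMargin` and the pixel amount are hypothetical.

```javascript
// Adds extra pixels to a chart's left margin so the Y axis label clears the tick numbers.
// Assumption: `chart` exposes dc.js-style margins(), acting as getter with no
// arguments and as setter when passed {top, right, bottom, left}.
function widenLeftMargin(chart, extraPx) {
  const m = chart.margins();
  chart.margins({ top: m.top, right: m.right, bottom: m.bottom, left: m.left + extraPx });
  return chart;
}

// Hypothetical usage, before rendering:
// widenLeftMargin(myChart, 40);
```

Passing a fresh object rather than mutating the returned one avoids relying on whether the getter hands back the chart's internal margins object.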
I've written a program which reads measurements from an impedance analyzer as it sweeps over a range of frequencies or voltages, saves the data to a text file, and also creates a scatter plot. In one type of measurement, I obtain x and y values for complex impedance, neither of which is the independent parameter. When plotting this graph, it appears that each x value is simply placed to the right of the previous one at regular spacing, so the x axis labels read, from left to right, [45000, 43000, 40000,... etc.].
I've tried forcing the x-axis to start from zero which did not change anything and haven't been able to find much else on this. Is there a way to make sure the plot reflects the actual x values of each point?
Here's my current method of creating the chart, which pulls the data from the already created table:
For Each row In table.Rows
Chart1.Series("series1").Points.AddXY(row(0), row(1))
Next