What is a good way to give the points/symbols in a scatter chart different sizes?
We want to make the scatter point sizes larger based on the count of records with the same value. Currently we use http://dc-js.github.io/dc.js/examples/scatter-brushing.html, which draws a single point for records that share the same values, with its size controlled by symbolSize.
We want the symbolSize to vary with the count of records: the more records share a value, the larger the point.
I am looking for something like the c3.js jsfiddle example in Scatter plot size on "tooltip".
Has anyone added the ability to display values in a boxplot for dc.js?
There is an interesting answer to a related question about matplotlib:
Adding a scatter of points to a boxplot using matplotlib
As it's currently implemented, the box plot will display any outliers as circles, and outliers are defined as the points which do not fall within the whiskers.
If you're willing to change the source, it's pretty easy to disable the whiskers and show all the data points.
You just need to change line 42:
var _whiskers = _whiskersIqr(_whiskerIqrFactor);
https://github.com/dc-js/dc.js/blob/356fccea3a1dbd49a76fb1841f280ffad87d725f/src/box-plot.js#L42
You could just set it to null, or add an accessor for whiskers. (There really should be one, looks like an oversight.)
It looks like this with no whiskers:
You'd have to dig a bit deeper and change the underlying d3-plugin if you want to display whiskers along with the data points.
I'm trying to reduce the number of points in a dc.js line chart to improve performance. The docs lead me to believe xUnits() is the way to do this:
The coordinate grid chart uses the xUnits function to calculate the number of data projections on x axis such as the number of bars for a bar chart or the number of dots for a line chart.
but xUnits does not even seem to be used:
http://jsfiddle.net/m5tguakf/2/
What am I doing wrong?
The number of points is actually determined by crossfilter - dc.js doesn't do any aggregation on its own, so it has no way to add or reduce the number of points.
That documentation may be misleading - it doesn't alter the shape of the data. xUnits is really just needed for dc.js to know the number of elements it is going to draw. It's used for two purposes:
to determine the width of bars or box-plots
to know whether the x scale is ordinal or quantitative
Could dc.js just count the number of points in the crossfilter group? Perhaps.
Anyway, to get back to your original question: if you want to reduce the number of points drawn, aggregate your data differently in your group. Usually this means creating larger bins which either sum or average the data which fall into that interval.
As a simple example, you can combine every other point in your fiddle by binning by even numbers, like so:
var BINSIZE = 2;
// ...
speedSumGroup = runDimension
.group(function(r) { return Math.floor(r/BINSIZE) * BINSIZE; })
// ...
http://jsfiddle.net/gordonwoodhull/djrhodkj/2/
This causes e.g. both Run 6 and Run 7 to fall in the same bin, because they have the same group key. In a real example, you'd probably want to average them, as shown in the annotated stock example.
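To make the averaging idea concrete, here is a minimal sketch of a reducer triple in the shape that crossfilter's group.reduce(add, remove, initial) expects. The record shape ({Run, Speed}) and the sample values are assumptions for illustration and are not taken from the fiddle; the last few lines exercise the reducers by hand so the snippet runs without crossfilter loaded.

```javascript
// Assumed record shape (hypothetical field names): {Run: 6, Speed: 800}
var BINSIZE = 2;

// Same binning function as in the answer: Runs 6 and 7 both map to key 6.
function binKey(r) { return Math.floor(r / BINSIZE) * BINSIZE; }

// Reducers that keep a running count and total so an average can be derived.
function reduceAdd(p, v) {
  p.count += 1;
  p.total += v.Speed;
  p.avg = p.total / p.count;
  return p;
}
function reduceRemove(p, v) {
  p.count -= 1;
  p.total -= v.Speed;
  p.avg = p.count ? p.total / p.count : 0; // avoid dividing by zero for an empty bin
  return p;
}
function reduceInitial() { return { count: 0, total: 0, avg: 0 }; }

// With crossfilter and dc.js loaded, the wiring would be roughly:
//   var speedAvgGroup = runDimension.group(binKey)
//       .reduce(reduceAdd, reduceRemove, reduceInitial);
//   chart.valueAccessor(function (kv) { return kv.value.avg; });

// Stand-alone exercise of the reducers on made-up data:
var records = [{Run: 6, Speed: 800}, {Run: 7, Speed: 900}, {Run: 8, Speed: 1000}];
var bins = {};
records.forEach(function (rec) {
  var k = binKey(rec.Run);
  bins[k] = reduceAdd(bins[k] || reduceInitial(), rec);
});
console.log(bins);
```

Because the group value becomes an object rather than a number, the chart needs a valueAccessor that picks out the avg field, as in the commented wiring.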
I'm using the corrplot function in seaborn and everything works flawlessly. However, I want to do a little filtering on the data. Is there a way to hide correlations below or above a certain value? I have a large data frame and I only want to see correlations greater than an arbitrary number, say .4.
I'd like all the 'squares' in the image that are not greater than .4 to be set to white, grey or some other color. I'm not sure how to do this because the corrplot takes a full data frame and calculates the correlations internally. I don't want to filter on the data frame values, just the resulting correlation values.
Maybe there's some way to get the resulting image from the underlying matshow call back to my own code and then replot it by filtering the image itself?
As per @mwaskom's comments, you can use sns.heatmap(). You'll have to compute the correlation matrix yourself, but it's otherwise more flexible in presentation and allows you to pass, e.g., a mask to do exactly what you want.
I've written a program which reads measurements from an impedance analyzer as it sweeps over a range of frequencies or voltages, saves the data to a text file, and also creates a scatter plot. In one type of measurement, I obtain x and y values for complex impedance, neither of which is the independent parameter. When plotting this graph, it appears that the chart simply puts each x value to the right of the previous one at regular spacing, resulting in x-axis labels that read, from left to right, [45000, 43000, 40000, ... etc.].
I've tried forcing the x-axis to start from zero, which did not change anything, and I haven't been able to find much else on this. Is there a way to make sure the plot reflects the actual x values of each point?
Here's my current method of creating the chart, which pulls the data from the already created table:
For Each row In table.Rows
Chart1.Series("series1").Points.AddXY(row(0), row(1))
Next
Let's say I have a list of values and I have already chunked them into groups to make a histogram.
Since Excel doesn't have histograms, I made a bar plot using the groups I developed. Specifically, I have the frequencies 2 6 12 10 2 and it produces the bar plot you see below.
Next, I want to add a normal distribution (line plot) with a mean of 0.136 and standard deviation of 0.497 on top of this histogram. How can I do this in Excel? I need the axes to line up such that the curve takes up the width of the bar plot. Otherwise, you get something like what I've attached.
But the normal curve should be overlaid on the bar plot. How can I get this effect?
There are two main parts to this answer:
First, I reverse-engineered the grouped data to come up with an appropriate mean and standard deviation on this scale.
Second, I employed some chart trickery to make the normal distribution curve look right when superimposed on the column chart. I used Excel 2007 for this; hopefully you have the same options available in your version.
Part 1: Reverse-Engineer
The column B formulae are:
B8 (Last Point): =MAX(A2:A6)
B9 (Mean): =SUMPRODUCT(B2:B6,A2:A6)/SUM(B2:B6)
B10 (E(x^2f)): =SUMPRODUCT(A2:A6^2,B2:B6)
B11 (E(xf)^2): =SUMPRODUCT(A2:A6,B2:B6)^2
B12 (E(f)): =SUM(B2:B6)
B13 (Variance): =B10-B11/B12
B14 (StDev): =SQRT(B13/(B12-1))
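As a cross-check on the column B formulae, the same arithmetic can be sketched outside Excel. The snippet assumes the category values in A2:A6 are 1 to 5 (suggested by the 0.5-5.5 axis range used later, but not stated outright) and uses the frequencies from the question. Note that the cell labelled "Variance" actually holds the sum of squared deviations; the division by n-1 happens in the StDev cell.

```javascript
// Assumption: A2:A6 hold the category values 1..5; B2:B6 hold the question's frequencies.
const x = [1, 2, 3, 4, 5];
const f = [2, 6, 12, 10, 2];

const sumF = f.reduce((acc, fi) => acc + fi, 0);                    // E(f)
const sumXF = x.reduce((acc, xi, i) => acc + xi * f[i], 0);         // SUMPRODUCT(A2:A6,B2:B6)
const sumX2F = x.reduce((acc, xi, i) => acc + xi * xi * f[i], 0);   // E(x^2f)

const mean = sumXF / sumF;                          // Mean
const sumSq = sumX2F - (sumXF * sumXF) / sumF;      // the "Variance" cell: E(x^2f) - E(xf)^2 / E(f)
const stDev = Math.sqrt(sumSq / (sumF - 1));        // StDev (sample standard deviation)

console.log({ mean, sumSq, stDev });
```

Under these assumptions the mean and standard deviation are expressed in category units (1 to 5), which is the scale the column chart's X axis uses.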
Part 2: Chart Trickery
Data table:
Column D is just an incremental counter. This will be the number of data points in the normal distribution curve.
E2 =D2/$B$8 etc.
F2 =NORMDIST(E2,$B$9,$B$14,FALSE) etc.
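For reference, NORMDIST with FALSE as its last argument returns the normal probability density rather than the cumulative probability. A minimal sketch of that density, plus a hypothetical helper (sampleCurve, not part of the original answer) that produces evenly spaced points in the spirit of columns E:F:

```javascript
// Normal probability density, the quantity NORMDIST(x, mean, sd, FALSE) computes.
function normalPdf(x, mean, sd) {
  const z = (x - mean) / sd;
  return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2 * Math.PI));
}

// Hypothetical helper: sample the curve at steps + 1 evenly spaced x values
// between from and to, returning [x, density] pairs.
function sampleCurve(mean, sd, from, to, steps) {
  const points = [];
  for (let i = 0; i <= steps; i++) {
    const xi = from + (i * (to - from)) / steps;
    points.push([xi, normalPdf(xi, mean, sd)]);
  }
  return points;
}

// Example call with illustrative parameters on the 0.5-5.5 axis range.
const curve = sampleCurve(3, 1, 0.5, 5.5, 50);
console.log(curve.length, curve[0], curve[curve.length - 1]);
```

More sample points give a smoother line in the X-Y series; the density values themselves do not depend on how many points are drawn.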
Chart:
Now, add Columns E:F to the chart. You will need to massage a few things:
Change the series to be an X-Y plot. This might require some editing of the chart series to force a single series to use your desired X and Y values.
Change the series to use the secondary axes (both X and Y).
Change the secondary X-axis range to 0.5-5.5 (i.e., 0.5 on either side of the column chart category values). This will effectively align the primary and secondary X-axes.
Change the secondary Y-axis range to 0-1.
Format the X-Y series appearance to taste (I suggest removing value markers).
The result so far:
Lastly, you can remove the tick marks and labels on the secondary axes to clean up the look.
Postscript: Thanks to Jon Peltier for innumerable charting inspirations over the years.