seaborn.distplot with probability rather than probability density on the y-axis

Can I get seaborn.distplot to show probability on the y-axis rather than probability density?

I'm not quite sure that this is what you meant, but:
import seaborn as sns  # x is assumed to be a 1-D array of observations
ax = sns.distplot(x, rug=True, rug_kws={"color": "g"},
                  kde_kws={"color": "k", "lw": 3, "label": "KDE"},
                  hist_kws={"histtype": "step", "linewidth": 3,
                            "alpha": 1, "color": "g"})
Seems like a rugplot could be a solution to your problem as well.
Taken from the seaborn documentation:
rugplot:
Draw small vertical lines to show each observation in a distribution.
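If the goal is a y-axis that shows the fraction of observations per bin, one common approach is to weight every observation by 1/N so that the bar heights sum to 1. The sketch below computes both kinds of bar height in plain Python to show how they differ; the data and bin edges are made up for illustration. With distplot, the same weights can be passed through hist_kws={"weights": ...} together with kde=False, and seaborn 0.11 and later (where distplot is deprecated) offer sns.histplot(x, stat="probability").

```python
def bin_heights(data, edges):
    """Return (probability, density) bar heights for the given bin edges."""
    n = len(data)
    counts = [0] * (len(edges) - 1)
    for value in data:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            # Bins are half-open [lo, hi), except the last one, which is closed.
            if edges[i] <= value < edges[i + 1] or (last and value == edges[-1]):
                counts[i] += 1
                break
    # Probability: each observation contributes 1/n, so heights sum to 1.
    probability = [c / n for c in counts]
    # Density: heights are divided by bin width as well, so areas sum to 1.
    density = [c / (n * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]
    return probability, density


data = [0.1, 0.4, 0.5, 0.9, 1.2, 1.6, 1.7, 1.9]  # illustrative values only
edges = [0.0, 0.5, 1.0, 1.5, 2.0]
probability, density = bin_heights(data, edges)
print(probability)
print(density)
```

The two sets of heights differ only by the bin width, which is why the density axis can exceed 1 for narrow bins while the probability axis cannot.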


3 and 4 degree curves in three.js

I am trying to reproduce the degree-3 or degree-4 3D curves typically found in parametric CAD programs like Rhino or AutoCAD, which take any number of 3D points to create long curves. I've found that three.js has Quadratic (degree-2) and Cubic (degree-3) Bezier curves available, but they take exactly three and four vectors, respectively. I'd like to create curves with 10 or more inputs, not just 3 or 4. I've also found that three.js has 'Path', which allows building a 2D curve of mixed-degree segments using the .bezierCurveTo() or .quadraticCurveTo() methods.
So my question:
Is there currently a way to construct long chains of CubicBezierCurve3 curves that join smoothly? Ideally with a constructor that takes a simple array of vertices?
If I need to implement this myself, where is the best place to start? I'm thinking the .quadraticCurveTo() method could be extended to use a z component and added to SplineCurve3? I'm not 100% clear on how the array of curves works in the 'Path' object.
THREE.Path.prototype.quadraticCurveTo = function( aCPx, aCPy, aX, aY ) {
    var args = Array.prototype.slice.call( arguments );
    var lastargs = this.actions[ this.actions.length - 1 ].args;
    var x0 = lastargs[ lastargs.length - 2 ];
    var y0 = lastargs[ lastargs.length - 1 ];
    var curve = new THREE.QuadraticBezierCurve( new THREE.Vector2( x0, y0 ),
                                                new THREE.Vector2( aCPx, aCPy ),
                                                new THREE.Vector2( aX, aY ) );
    this.curves.push( curve );
    this.actions.push( { action: THREE.PathActions.QUADRATIC_CURVE_TO, args: args } );
};
Thanks for your help!
Thanks to karatedog and fang for your in-depth answers. In searching for more information about B-spline curves, I stumbled upon this extra library for three.js, NURBS, which is exactly what I needed. Upon closer inspection of the THREE.NURBSCurve() constructor in this library, it's implemented exactly as fang described: with arrays of both control points and knots. Knots are defined similarly to the method described in fang's answer. I'm marking fang's answer as correct, but I wanted to add this link to the pre-existing library as well, so any n00bs like myself could use it :)
If you are fine with using a high-degree Bezier curve, you can implement it using the De Casteljau algorithm. The link in karatedog's answer provides a good source for this algorithm. If you want to stick with a degree-3 polynomial curve with many control points, a B-spline curve is a good choice. A B-spline curve can be implemented using the Cox-de Boor algorithm; you can find plenty of references on the internet. A B-spline curve definition requires a degree, control points and a knot vector. If you want your function to simply take an array of 3D points, you can set degree = 3 and internally define the knot vector as
[0, 0, 0, 0, 1/(N-3), 2/(N-3), ..., 1, 1, 1, 1],
where N = number of control points.
For example,
N=4, knot vector=[0, 0, 0, 0, 1, 1, 1, 1],
N=5, knot vector=[0, 0, 0, 0, 1/2, 1, 1, 1, 1],
N=6, knot vector=[0, 0, 0, 0, 1/3, 2/3, 1, 1, 1, 1].
For the N=4 case, the B-spline curve is essentially the same as a cubic Bezier curve.
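The knot construction above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names clamped_uniform_knots and bspline_point are made up here, the evaluation uses de Boor's algorithm (the usual evaluation scheme built on the Cox-de Boor recursion), and control points are plain tuples rather than three.js vectors.

```python
def clamped_uniform_knots(n, degree=3):
    """Knot vector [0,0,0,0, 1/(n-3), ..., 1,1,1,1] for n control points (degree 3)."""
    if n < degree + 1:
        raise ValueError("need at least degree + 1 control points")
    spans = n - degree  # number of polynomial segments
    interior = [i / spans for i in range(1, spans)]
    return [0.0] * (degree + 1) + interior + [1.0] * (degree + 1)


def bspline_point(points, t, degree=3):
    """Evaluate the clamped B-spline at t in [0, 1] with de Boor's algorithm."""
    n = len(points)
    knots = clamped_uniform_knots(n, degree)
    # Find the knot span k with knots[k] <= t < knots[k + 1] (t == 1 uses the last span).
    k = degree
    while k < n - 1 and t >= knots[k + 1]:
        k += 1
    # Only degree + 1 control points influence the curve on this span.
    d = [list(points[j + k - degree]) for j in range(degree + 1)]
    for r in range(1, degree + 1):
        for j in range(degree, r - 1, -1):
            i = j + k - degree
            alpha = (t - knots[i]) / (knots[i + degree - r + 1] - knots[i])
            d[j] = [(1 - alpha) * a + alpha * b for a, b in zip(d[j - 1], d[j])]
    return tuple(d[degree])


four = [(0, 0, 0), (1, 2, 0), (3, 2, 0), (4, 0, 0)]
six = four + [(5, -2, 1), (7, 0, 2)]
print(clamped_uniform_knots(6))
print(bspline_point(four, 0.5))  # same point as the cubic Bezier on these four points
print(bspline_point(six, 1.0))   # a clamped curve ends at the last control point
```

Because the knot vector is clamped, the curve starts at the first control point and ends at the last one, which is usually what a "curve through an array of vertices" constructor is expected to do at its ends.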
I suggest implementing your own calculation algorithm; it is fairly easy, the learning process is short, and it is worth the time invested. Check this page: http://pomax.github.io/bezierinfo/
It describes a language-agnostic method for calculating Bézier curves with any number of control points, although a calculation that is specific to a certain number of control points (like cubic or quadratic) can be highly optimized.
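As a rough sketch of the general method, here is De Casteljau's algorithm in Python for any number of control points. The function name de_casteljau and the sample points are illustrative only; a three.js version would do the same interpolation on Vector3 objects.

```python
def de_casteljau(points, t):
    """Evaluate a Bezier curve of degree len(points) - 1 at parameter t in [0, 1]."""
    level = [tuple(p) for p in points]
    # Repeatedly interpolate between neighbouring points until one point is left.
    while len(level) > 1:
        level = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(level, level[1:])
        ]
    return level[0]


cubic = [(0, 0, 0), (1, 2, 0), (3, 2, 0), (4, 0, 0)]
print(de_casteljau(cubic, 0.5))
# The same function works unchanged for ten control points (a degree-9 curve).
ten = [(i, (i % 3) - 1, 0.5 * i) for i in range(10)]
print(de_casteljau(ten, 0.25))
```

Note that a single high-degree curve reacts globally to every control point, which is one reason the B-spline route is often preferred for long curves.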

Efficient algorithm to fit a linear line along the upper boundary of data only

I'm currently trying to fit a straight line through a spread of scattered data in MATLAB. This is easy enough using the polyfit function, from which I can obtain my y = mx + c equation. However, I now need to fit a line along the upper boundary of my data, i.e., the top few data points. I know this description is vague, so let's assume that my scattered data is in the shape of a cone, with its apex on the y-axis, spreading outwards and upwards in the +x and +y direction. I need to fit a best-fit line on the 'upper edge of the cone', if you will.
I've developed an algorithm but it's extremely slow. It involves first fitting a line of best fit through ALL data, deleting all data points below this line of best fit, and iterating through until only 5% of the initial data points are left. The final best fit line will then reside close to the top edge of the cone. For 250 data points, this takes about 5s and with me dealing with more than a million data points, this algorithm is simply too inefficient.
I guess my question is: is there an algorithm to more efficiently achieve what I need? Or is there a way to sharpen up my code to eliminate unnecessary complexity?
Here is my code in MATLAB:
(As an example)
a = [4, 5, 1, 8, 1.6, 3, 8, 9.2]; %To be used as x-axis points
b = [45, 53, 12, 76, 25, 67, 75, 98]; %To be used as y-axis points
while prod(size(a)) > (0.05*prod(size(a))) %Iterative line fitting occurs until there are less than 5% of the data points left
    lobf = polyfit(a,b,1); %Line of Best Fit for current data points
    alen = length(a);
    for aindex = alen:-1:1 %For loop to delete all points below line of best fit
        ValLoBF = lobf(1)*a(aindex) + lobf(2)
        if ValLoBF > b(aindex) %if LoBF is above current point...
            a(aindex) = []; %delete x coordinate...
            b(aindex) = []; %and delete its corresponding y coordinate
        end
    end
end
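Well, first of all, your example code seems to be running indefinitely ;) The loop condition compares prod(size(a)) with 5% of itself, which is always true as long as a is non-empty.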
Some optimizations for your code:
a = [4, 5, 1, 8, 1.6, 3, 8, 9.2]; % To be used as x-axis points
b = [45, 53, 12, 76, 25, 67, 75, 98]; % To be used as y-axis points
n_init_a = length(a); % Remember the initial number of points
while length(a) > 0.05*n_init_a % Iterate until fewer than 5% of the initial data points are left
    lobf = polyfit(a,b,1); % Line of best fit for current data points
    % Delete data points below the line using logical indexing
    % First evaluate the fitted line at every x with vectorized arithmetic
    temp = lobf(1)*a + lobf(2); % Contains all polyfit values
    % Use logical indexing to discard all points below the line
    a(b<temp)=[]; % First remove from a
    b(b<temp)=[]; % Then from b; the order matters because the mask is computed from b
end
Also you should try profiling your code by typing in the command window
profile viewer
and check which step takes the most time when calculating your results. I suspect it is polyfit, but that probably can't be sped up much.
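For comparison, the same pruning loop can be written in Python with an explicit stopping guard, so that it cannot spin forever once nothing more is removed. This is only a sketch: the names fit_line and upper_edge_line are made up, the least-squares fit is hand-written instead of calling polyfit, and the floor of two remaining points is an assumption added here because a line needs at least two points.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def upper_edge_line(xs, ys, keep_fraction=0.05):
    """Refit repeatedly, dropping points below the line, until few points remain."""
    target = max(2, int(keep_fraction * len(xs)))  # assumed floor of two points
    pts = list(zip(xs, ys))
    while True:
        slope, intercept = fit_line([x for x, _ in pts], [y for _, y in pts])
        if len(pts) <= target:
            return slope, intercept
        kept = [(x, y) for x, y in pts if y >= slope * x + intercept]
        # Guard: stop if nothing was removed or too few distinct x values remain.
        if len(kept) == len(pts) or len({x for x, _ in kept}) < 2:
            return slope, intercept
        pts = kept


a = [4, 5, 1, 8, 1.6, 3, 8, 9.2]
b = [45, 53, 12, 76, 25, 67, 75, 98]
slope, intercept = upper_edge_line(a, b)
print(slope, intercept)
```

On the eight example points, the first pass already leaves only (3, 67) and (9.2, 98), so the returned line is the one through those two points.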
What you are looking for is not line fitting. You are trying to find the convex hull of the points.
You should check out the function convhull. Once you find the hull, you can remove all of the points that aren't close to it, and fit each part independently to avoid the fact that the data is noisy.
Alternatively, you could render the points onto some pixel grid, and then do some kind of morphological operation, like imclose, and finish with Hough transform. Check out also this answer.
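To illustrate the convex hull idea, the sketch below extracts only the upper part of the hull with Andrew's monotone chain method in Python. It stands in for MATLAB's convhull, which returns the whole hull; the function names are made up for this example. The cost is dominated by the sort, so it scales to millions of points far better than repeated refitting.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(points):
    """Vertices of the upper convex hull, ordered from left to right."""
    hull = []
    for p in sorted(set(points)):
        # Drop the last vertex while it lies on or below the segment to the new point.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


a = [4, 5, 1, 8, 1.6, 3, 8, 9.2]
b = [45, 53, 12, 76, 25, 67, 75, 98]
edge = upper_hull(list(zip(a, b)))
print(edge)
```

A line can then be fitted to the hull vertices, or to the points within some distance of the hull, as the answer suggests for noisy data.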

How to compute the variances in Expectation Maximization with n dimensions?

I have been reviewing Expectation Maximization (EM) in research papers such as this one:
http://pdf.aminer.org/000/221/588/fuzzy_k_means_clustering_with_crisp_regions.pdf
I have some doubts that I have not been able to resolve. For example, what happens if each datapoint has many dimensions?
For example, I have the following dataset with 6 datapoints and 4 dimensions:
D1, D2, D3, D4
5, 19, 72, 5
6, 18, 14, 1
7, 22, 29, 4
3, 22, 51, 1
2, 21, 89, 2
1, 12, 28, 1
Does this mean that, to compute the expectation step, I need to compute 4 standard deviations (one for each dimension)?
Do I also have to compute the variance for each cluster, assuming k=3 (I do not know whether it is necessary based on the formula from the paper), or just the variances for each dimension (4 attributes)?
Usually, you use a Covariance matrix, which also includes variances.
But it really depends on your chosen model. The simplest model does not use variances at all.
A more complex model has a single variance value, the average variance over all dimensions.
Next, you can have a separate variance for each dimension independently; and last but not least a full covariance matrix. That is probably the most flexible GMM in popular use.
Depending on your implementation, there can be many more.
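The model choices above (a single shared variance, one variance per dimension, a full covariance matrix) can be sketched for one cluster as follows. This is a minimal Python illustration using the dataset from the question; the function name weighted_covariances is made up, and the responsibilities are set to 1 for every point purely for the demonstration. In the usual GMM formulation each of the k clusters has its own responsibilities, so these statistics are computed once per cluster (with k=3 and a diagonal model, that is 3 x 4 variances).

```python
def weighted_covariances(data, resp):
    """M-step statistics for one cluster, given one responsibility per datapoint."""
    total = sum(resp)
    dims = len(data[0])
    mean = [sum(r * row[d] for r, row in zip(resp, data)) / total for d in range(dims)]
    # Full covariance: one entry per pair of dimensions.
    full = [
        [
            sum(r * (row[i] - mean[i]) * (row[j] - mean[j]) for r, row in zip(resp, data)) / total
            for j in range(dims)
        ]
        for i in range(dims)
    ]
    # Diagonal model: keep only the per-dimension variances.
    diagonal = [full[d][d] for d in range(dims)]
    # Spherical model: a single variance, the average over all dimensions.
    spherical = sum(diagonal) / dims
    return mean, spherical, diagonal, full


data = [
    (5, 19, 72, 5),
    (6, 18, 14, 1),
    (7, 22, 29, 4),
    (3, 22, 51, 1),
    (2, 21, 89, 2),
    (1, 12, 28, 1),
]
resp = [1.0] * len(data)  # illustrative: every point fully assigned to this cluster
mean, spherical, diagonal, full = weighted_covariances(data, resp)
print(mean)
print(diagonal)
```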
From R's mclust documentation:
univariate mixture
"E" = equal variance (one-dimensional)
"V" = variable variance (one-dimensional)
multivariate mixture
"EII" = spherical, equal volume
"VII" = spherical, unequal volume
"EEI" = diagonal, equal volume and shape
"VEI" = diagonal, varying volume, equal shape
"EVI" = diagonal, equal volume, varying shape
"VVI" = diagonal, varying volume and shape
"EEE" = ellipsoidal, equal volume, shape, and orientation
"EEV" = ellipsoidal, equal volume and equal shape
"VEV" = ellipsoidal, equal shape
"VVV" = ellipsoidal, varying volume, shape, and orientation
single component
"X" = univariate normal
"XII" = spherical multivariate normal
"XXI" = diagonal multivariate normal
"XXX" = ellipsoidal multivariate normal

Validating fractal dimension computation in Mathematica

I've written an implementation of the standard box-counting algorithm for determining the fractal dimension of an image or a set in Mathematica, and I'm trying to validate it. I've generated a Sierpinski triangle matrix using the CellularAutomaton function, and computed its fractal dimension to be 1.58496 with a statistical error of about 10^-15. This matches the expected value of log(3)/log(2) = 1.58496 incredibly well.
The problem arises when I try to test my algorithm against a randomly-generated matrix. The fractal dimension in this case should be exactly 2, but I get about 1.994, with a statistical error of about 0.004. Hence, my box-counting algorithm seems to work perfectly fine for the Sierpinski triangle, but not quite so well for the random distribution. Any ideas why not?
Code below:
sierpinski512 = CellularAutomaton[90, {{1}, 0}, 512];
ArrayPlot[%]
d512 = FractalDimension[sierpinski512, {512, 256, 128, 64, 32, 16, 8, 4, 2}]
rtable = Table[Round[RandomReal[]], {i, 1, 512}, {j, 1, 1024}];
ArrayPlot[%]
drand = FractalDimension[rtable, {512, 256, 128, 64, 32, 16, 8, 4, 2}]
I can post the FractalDimension code if anybody really needs it, but I think the solution (if any) is not to do with the FractalDimension algorithm, but the rtable I'm generating above.
I have studied this problem a little empirically in consultation with a well-known physicist, and we believe that the fractal dimension of a random point process goes to 2 (in the limit, I think) as the number of points grows large. I can't provide an exact definition of "large" but it can't be less than a few thousand points. So, I think you should expect to get D < 2 unless the number of points is quite large, theoretically, large enough to tile the plane.
I would be grateful for your FractalDimension code!
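One way to see why a random 0/1 matrix gives slightly less than 2 is to look at the smallest boxes: a 2x2 box is empty with probability (1/2)^4 = 1/16, so the count at that scale falls short of a full tiling. The Python sketch below works with expected counts rather than a random sample. It rests on assumptions that the question does not confirm: that FractalDimension counts non-empty, non-overlapping boxes and fits log(count) against log(1/size) by least squares. The function names are made up.

```python
import math


def expected_box_counts(rows, cols, sizes, p=0.5):
    """Expected number of non-empty s-by-s boxes when each cell is 1 with probability p."""
    return [(rows // s) * (cols // s) * (1 - (1 - p) ** (s * s)) for s in sizes]


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


sizes = [512, 256, 128, 64, 32, 16, 8, 4, 2]  # box sizes from the question
counts = expected_box_counts(512, 1024, sizes)  # same shape as rtable
dimension = least_squares_slope(
    [math.log(1 / s) for s in sizes],
    [math.log(c) for c in counts],
)
print(dimension)
```

Under these assumptions the estimate comes out a little below 2, close to the 1.994 reported in the question, and almost all of the shortfall comes from the 2x2 boxes. Dropping the smallest box size, or filling cells with a higher probability, should move the estimate towards 2.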

Mathematica: Help me understand Mathematica 3D coordinates system

I gave up trying to understand Mathematica's 3D axes configuration.
When I make a 3D plot, label the 3 axes to identify which axis is which, and then place points on these axes using the Point command, which takes {x,y,z} coordinates, the points appear on different axes than I expect.
Here is an example
g=Graphics3D[
  {
    {PointSize[0],Point[{0,0,0}]}
  },
  AxesOrigin->{0,0,0}, PlotRange->{{-3,3},{-3,3},{-3,3}},
  Axes->True, AxesLabel->{"X","Y","Z"},
  LabelStyle->Directive[Bold,Red,16],
  PreserveImageOptions->False, Ticks->None,Boxed->False]
The above results in
So, now I added a point at the end of the x-axis, at the end of the y-axis, and at the end of the z-axis. I gave each point a different color to help identify them on the plot.
g=Graphics3D[
  {
    {Red,PointSize[.03],Point[{3,0,0}]},
    {Black,PointSize[.03],Point[{0,3,0}]},
    {Blue,PointSize[.03],Point[{0,0,3}]}
  },
  AxesOrigin->{0,0,0},PlotRange->{{-3,3},{-3,3},{-3,3}},
  Axes->True,AxesLabel->{"X","Y","Z"},
  LabelStyle->Directive[Bold,Red,16],PreserveImageOptions->False,
  Ticks->None,Boxed->False]
The result is this:
As you can see, the red point, which I expected to appear at the end of the x-axis, shows up at the end of the Z axis. The black point, instead of showing up at the end of the Y-axis, shows up on the X-axis, and the blue point, instead of showing up at the end of the Z axis, shows up at the end of the Y-axis.
Maybe the labels are wrong? Maybe I am looking at the image in the wrong way?
I am really confused, as I am clearly not understanding something. I looked at the documentation and could not find anything to help me see what I am doing wrong. I am just starting to learn Mathematica 3D graphics.
EDIT:
Added an image with Ticks on it, in reply to Simon; I did not know how to do it in the comment box:
g=Graphics3D[
  {
    Cuboid[{-.1,-.1,-.1},{.1,.1,.1}],
    {Red,PointSize[.03],Point[{2,0,0}]},
    {Black,PointSize[.03],Point[{0,2,0}]},
    {Blue,PointSize[.03],Point[{0,0,2}]}
  },
  AxesOrigin->{0,0,0},
  PlotRange->{{-2,2},{-2,2},{-2,2}},
  Axes->True,
  AxesLabel->{"X","Y","Z"},
  LabelStyle->Directive[Bold,Red,16],
  PreserveImageOptions->False,
  Ticks->True, TicksStyle->Directive[Black,8],
  Boxed->False
]
here is the result:
EDIT: OK, I decided to forget about using AxesLabel and placed the labels myself. Much clearer now:
m=3;
labels={Text[Style["X",16],{1.2 m,0,0}],Text[Style["Y",16],{0,1.2 m,0}],
  Text[Style["Z",16],{0,0,1.2 m}]};
g=Graphics3D[
  {
    {Red,PointSize[.03],Point[{m,0,0}]},
    {Black,PointSize[.03],Point[{0,m,0}]},
    {Blue,PointSize[.03],Point[{0,0,m}]},
    labels
  },
  AxesOrigin->{0,0,0},
  PlotRange->{{-m,m},{-m,m},{-m,m}},
  Axes->True,
  AxesLabel->None,
  LabelStyle->Directive[Bold,Red,16],
  PreserveImageOptions->False,
  Ticks->True, TicksStyle->Directive[Black,8],
  Boxed->False
]
I agree with you that AxesLabel for 3D graphics is next to worthless. Look at the effects of a small interactive viewpoint change on your figure:
IMHO WRI should really improve the operation of this option, and preferably provide some more placement control too (end/mid of axes etc.).
I believe the labels are being placed in unintuitive spots. Replacing your dots with colored lines of different length is clearer to me. I've also removed the explicit plot range which helps Mathematica put the labels in much clearer places.
g=Graphics3D[
  {
    {Red,Thick, Line[{{0, 0, 0}, {1, 0, 0}}]},
    {Black,Thick, Line[{{0, 0, 0}, {0, 2, 0}}]},
    {Blue,Thick, Line[{{0, 0, 0}, {0, 0, 3}}]}
  },
  AxesOrigin->{0,0,0},
  Axes->True,AxesLabel->{"X","Y","Z"},
  LabelStyle->Directive[Bold,Red,16],PreserveImageOptions->False,
  Ticks->None,Boxed->False]
