Suppose I have this data:
_sample=np.array([1,2,3,4,5,6,7,8,9,10])
I am plotting the data with seaborn's distplot, which draws a histogram of the data together with a KDE curve.
In the left image, I use bins=10.
The bars in that plot have a height of about 0.11, but I expected exactly 0.1, since value/n = 0.1.
In the right image, I use bins=[1,2,3,4,5,6,7,8,9,10].
There, most (90%) of the values are at 0.10, but a few reach 0.20 on the y-axis. Why does the right side of the plot reach 0.20 when all of the bars should be at 0.10?
Please let me know what I am missing; I am not able to understand this.
Update: adding code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
_fig,_ax=plt.subplots(1,2,figsize=(15,5))
_sample=np.array([1,2,3,4,5,6,7,8,9,10])
sns.distplot(_sample,bins=10,ax=_ax[0],axlabel='bins=10')
sns.distplot(_sample,bins=[1,2,3,4,5,6,7,8,9,10],ax=_ax[1],axlabel='bins=[1,2,3,4,5,6,7,8,9,10]')
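One way to see where 0.11 and 0.20 come from is to compute the histogram directly. The following is only a sketch: it assumes that the bars distplot draws alongside the KDE use the same density normalization as np.histogram(..., density=True), i.e. count / (n * bin_width) rather than count / n.

```python
import numpy as np

_sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# bins=10 splits the range [1, 10] into 10 equal bins of width 0.9,
# so each bar has height 1 / (10 * 0.9), roughly 0.111.
dens_10, edges_10 = np.histogram(_sample, bins=10, density=True)

# A list of 10 edges defines only 9 bins of width 1. The last bin is
# closed on both sides, so it holds both 9 and 10: 2 / (10 * 1) = 0.2.
dens_list, edges_list = np.histogram(
    _sample, bins=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], density=True
)

print(np.round(edges_10, 2))
print(np.round(dens_10, 3))
print(np.round(dens_list, 3))
```

Under that assumption, neither plot is expected to show exactly 0.1 for every bar: the height depends on the bin width as well as on the count.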
I don't understand the seaborn.boxplot() graph below.
Data source for the CSV file
The code is:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('train.csv')
df.head()
plt.figure(figsize = (8,8))
sns.color_palette("Paired")
sns.boxplot(x="Gender",y="Purchase", hue="Age", data=df, palette="Paired")
plt.legend(bbox_to_anchor=(1.05,1),loc=2, borderaxespad=0)
plt.grid(True)
plt.draw()
That produces:
df[(df.Gender == 'F') & (df.Age =='55+')].Purchase.describe()
That produces:
count 5083.000000
mean 9007.036199
std 4801.556874
min 12.000000
25% 6039.500000
50% 8084.000000
75% 10067.000000
max 23899.000000
Name: Purchase, dtype: float64
I find some values but not all. For example, I do not see the maximum.
But most of all, I don't understand these clusters of black dots that
I circled in red on the graph. I don't know what they correspond to.
Do you have any idea what they represent?
As Johann C has indicated, the whiskers extend up to 1.5 times the interquartile range (IQR, the span from the 25% to the 75% quantile, which covers the middle 50% of the values) beyond the box. Values that fall outside the whiskers are drawn as outliers, and this is what the dots you circled in red represent. In theory both whiskers could be 1.5 times the IQR long, but a whisker stops at the most extreme data point inside that range, so the lower whisker is cut off at the min value of 12. From the looks of it, you have a right-skewed distribution.
From the looks of it, these are outliers which are so numerous that they overlap. You might thus want to check whether you're actually dealing with two separate populations whose samples have been thrown together, or with a bimodal distribution. Both deserve investigation IMO. However, that would be better discussed in a statistics channel (it's not specific to seaborn).
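To make the whisker rule concrete, here is a small sketch that applies it to the describe() output above for the (Gender == 'F', Age == '55+') group. It assumes the plot uses the default whisker factor of 1.5; the variable names are only illustrative.

```python
# Quartiles and extremes copied from the describe() output above
q1, q3 = 6039.5, 10067.0
data_min, data_max = 12.0, 23899.0

iqr = q3 - q1                  # interquartile range (height of the box)
lower_fence = q1 - 1.5 * iqr   # whiskers may not extend below this value
upper_fence = q3 + 1.5 * iqr   # whiskers may not extend above this value

# The minimum lies inside the lower fence, so the lower whisker ends at it.
lower_whisker_ends_at_min = data_min >= lower_fence
# The maximum lies beyond the upper fence, so it is drawn as an outlier dot.
max_is_outlier = data_max > upper_fence

print(iqr, lower_fence, upper_fence)
print(lower_whisker_ends_at_min, max_is_outlier)
```

With these numbers, every purchase above the upper fence would be drawn as an individual dot rather than as part of the whisker, which would also explain why the maximum does not appear as a whisker end.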
How can I select a layer from a tf.estimator.Estimator and access the weights vector for each unit in that layer? Specifically, I'm trying to visualize a Dense layer's weights.
Looking at https://github.com/tensorflow/tensorflow/blob/r1.3/tensorflow/python/layers/core.py it seems that the weights are called kernels, but I'm not able to access those when using the Estimator abstraction.
Ps: for an example of an implementation of Estimator, let's reference https://www.tensorflow.org/get_started/estimator
Estimator has a method called get_variable_value. So, once you have produced a checkpoint (or loaded the variable values from one) and if you know the name of the dense layer, you could do something like this using matplotlib:
import matplotlib.pyplot as plt
weights = estimator.get_variable_value('dense/kernel')
plt.imshow(weights, cmap='gray')
plt.show()
I just used the pre-compiled Estimator for testing and this worked properly for me.
import matplotlib.pyplot as plt
# List every variable the estimator holds, then print each one's value
names = classifier.get_variable_names()
print("name:", names)
for i in names:
    print(classifier.get_variable_value(i))
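import cv2
import numpy as np
while True:
    cam = cv2.VideoCapture(0)
    while(cam.isOpened()):
        ret, im = cam.read()
        # Flatten the frame to one row of pixels and convert it to tuples
        im=im.reshape(1,-1,3)
        im_list=im.tolist()
        im_tuples=map(tuple,im_list[0])
        # The size of the set is the number of distinct colors in the frame
        im_set=set(im_tuples)
        print(len(im_set))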
This is my code. It runs really slow (like once a second). How can I increase its speed? It already seems really small. Do I lower the image dimensions or something? Or is this as fast as it gets?
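One possible direction, offered only as a sketch and not measured on the setup described above: the per-pixel conversion to Python tuples is likely the expensive part, and the same count can be done inside NumPy by packing each pixel's three channels into one integer. The helper name count_unique_colors and the tiny synthetic frame are illustrative assumptions; in the real loop the frame would come from cam.read().

```python
import numpy as np

def count_unique_colors(im):
    """Count the distinct colors in an H x W x 3 uint8 image."""
    # Widen the channels so the shifts below cannot overflow uint8
    pixels = im.reshape(-1, 3).astype(np.uint32)
    # Pack the three 8-bit channels into a single 24-bit integer per pixel
    packed = (pixels[:, 0] << 16) | (pixels[:, 1] << 8) | pixels[:, 2]
    return np.unique(packed).size

# Synthetic stand-in for a camera frame: black plus two colored pixels
frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[0, 0] = (255, 0, 0)
frame[1, 1] = (0, 255, 0)

n_colors = count_unique_colors(frame)
print(n_colors)
```

Lowering the image dimensions before counting would reduce the work further, at the cost of possibly missing colors that only appear in a few pixels.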
A very similar question, solved the same way: how to use 'extent' in matplotlib.pyplot.imshow
I have a list of geographical coordinates (a "tracklog") that describe a geographical trajectory. Also, I have the means of obtaining an image spanning the tracklog coverage, where I know the "geographical coordinates" of the corners of the image.
My plot currently looks like this (notice the ticks - x=longitudes, y=latitudes, in UTM, WGS84):
Then suppose I know the corner coordinates of the following image (or a version of it without the blue track), and would like to plot it SO THAT IT FITS THE COORDINATE SYSTEM of the plot.
How would I do it?
(as a side note, in case that matters, I plan to use tiles)
As per the comment of Joe Kington (waiting for his actual answer so that I can accept it), the following code works as expected, giving a pannable and zoomable fixed-aspect "georeferenced" tile over which I am able to plot tracklogs:
import matplotlib.pyplot as plt
from PIL import Image
import numpy
imarray = numpy.asarray(Image.open('map.jpg'))
plt.plot([0,1], [0,1], 'o', c='red', ms=20) ## some reference circles for debugging
plt.imshow(imarray, extent=[0,1,0,1]) ## some random map whose corners have known coordinates
plt.axis('equal')
plt.show()
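With real corner coordinates, the same idea might look like the sketch below. The corner values, the straight-line track, and the generated gradient image are all made-up placeholders (in practice imarray would come from the map file and the track from the tracklog); the only part taken from the code above is passing extent=[left, right, bottom, top] to imshow.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder for the map image (a simple gradient instead of a real tile)
imarray = np.linspace(0.0, 1.0, 50 * 100).reshape(50, 100)

# Hypothetical corner coordinates of the image, in the plot's own units
left, right, bottom, top = 500000.0, 502000.0, 4649000.0, 4650000.0

# Hypothetical tracklog lying inside the image coverage
track_x = np.linspace(left, right, 20)
track_y = np.linspace(bottom, top, 20)

fig, ax = plt.subplots()
# extent maps the image edges onto data coordinates: [left, right, bottom, top]
img = ax.imshow(imarray, extent=[left, right, bottom, top], cmap="gray")
ax.plot(track_x, track_y, color="blue")
ax.set_aspect("equal")

extent = img.get_extent()
print(extent)
```

For interactive use, drop the matplotlib.use("Agg") line and call plt.show() at the end.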
There is really not much of an answer here, but if you are using matplotlib and you do geo stuff, take a look at the Basemap toolkit (mpl_toolkits.basemap).
By default all operations are done on UTM maps, but you can choose your own projection.
Also take a look at the list of good tutorials at http://www.geophysique.be, for example.
I have two figures: one is a data plot resulting from some calculations and made with matplotlib, and the other is a world map figure taken from Google Maps. I would like to shrink the matplotlib figure to some percentage of its size, superpose it over the map picture at a certain position, and get a final "mixed" picture. I know it can be done with graphics programs and such, but I would like to do it automatically from the shell for thousands of different cases. Could you propose a methodology or some ideas for this?
Just in case you wanted to do it directly using matplotlib when you're plotting your data (imagemagick is great otherwise):
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
dpi = 100.0
im = Image.open('Dymaxion_map_unfolded.png')
width, height = im.size
fig = plt.figure(figsize=(width / dpi, height / dpi))
fig.figimage(np.array(im) / 255.0)
# Make an axis in the upper left corner that takes up 20% of the width and 30%
# of the height of the figure (add_axes takes [left, bottom, width, height])
ax = fig.add_axes([0, 0.7, 0.2, 0.3])
ax.plot(range(10))
plt.show()
ImageMagick can do the job, specifically the composite command. For usage examples, see http://www.imagemagick.org/Usage/annotating/#overlay
This sounds like something ImageMagick would be well suited for, esp. the -layers switch.