Add mean and variability to seaborn FacetGrid distplots - seaborn

What is the best way to add a point representing the mean (or another measure of central tendency) and a measure of variability (e.g., standard deviation or confidence interval) to each histogram in a seaborn FacetGrid?
The result should look similar to the figure shown here, but with a mean/SD in each of the FacetGrid subplots. This is a related question for the non-FacetGrid case.

Based on @mwaskom's comment, here is one possible solution (using boxplot; the approach is analogous for pointplot):
import seaborn as sns
tips = sns.load_dataset("tips")
sns.set(font_scale=1.3)
def dist_boxplot(x, **kwargs):
    # histogram/KDE on the primary axis
    ax = sns.distplot(x, hist_kws=dict(alpha=0.2))
    # boxplot on a twin axis so it overlays the distribution
    ax2 = ax.twinx()
    sns.boxplot(x=x, ax=ax2)
    ax2.set(ylim=(-5, 5))
g = sns.FacetGrid(tips, col="sex")
g.map(dist_boxplot, "total_bill");
(Not sure why the 0.01 is shifted slightly rightwards...)
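If a plain mean with a standard-deviation bar is preferred over a boxplot, a minimal sketch along the same lines might look like this. The helper name mean_sd_marker and the random data are made up for illustration; the demo uses plain matplotlib subplots so it runs without seaborn, and placing the marker at y = 0 is just one possible choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def mean_sd_marker(x, **kwargs):
    """Draw the mean as a point with a +/- 1 SD horizontal error bar on the current axes.

    **kwargs is accepted (and ignored) so the function can also be passed to
    FacetGrid.map, which forwards keyword arguments such as color and label.
    """
    ax = plt.gca()
    x = np.asarray(x, dtype=float)
    mean, sd = x.mean(), x.std(ddof=1)  # sample standard deviation
    # Point at the mean on the baseline (y = 0), horizontal bar spanning mean +/- SD
    ax.errorbar(mean, 0, xerr=sd, fmt="o", color="black", capsize=4, zorder=3)
    return mean, sd


# Demo with two hypothetical groups, one histogram per subplot
rng = np.random.default_rng(0)
groups = {"A": rng.normal(20, 5, 200), "B": rng.normal(25, 8, 200)}
fig, axes = plt.subplots(1, 2, sharex=True)
for ax, (name, values) in zip(axes, groups.items()):
    plt.sca(ax)  # make this subplot the current axes
    ax.hist(values, bins=20, alpha=0.3)
    mean_sd_marker(values)
    ax.set_title(name)
```

With seaborn, the same function could presumably be mapped after the histogram, e.g. g.map(mean_sd_marker, "total_bill"), since FacetGrid.map makes each facet the current axes before calling a user-supplied function.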

Related

Area of country in kilometers squared from Polygons

I am using geopandas sample data for this question.
import geopandas as gpd
df = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
My real dataset is somewhat different, containing only 'polygon' type geometries (in EPSG:4326), but what I would like to do is figure out the area of each polygon for each country in square kilometers.
I am new to geopandas, so I'm not sure if I am doing this right. My process is as follows:
ndf=df
ndf.to_crs("epsg:32633")
ndf["area"] = ndf['geometry'].area/ 10**6
ndf.head(2)
but the resulting areas don't make sense.
So I tried
df_2= df.to_crs({'proj':'cea'})
df_2["area"] = df_2['geometry'].area/ 10**6
df_2.head(2)
which is better, but still not accurate when I run a Google search for the areas.
So I'm wondering 1) is this the correct method? 2) how do I know the best projection type?
Computing polygon areas on an equal-area map projection does not always yield good results, because it requires dense vertices along the boundaries of the polygons involved.
Computing on the unprojected earth surface is not difficult. With an appropriate Python library that uses great-circle arcs between successive vertices to form the surface area in the computation, the results are more accurate.
The most accurate (imho) method to compute surface areas on the earth with Python can be demonstrated with this simple code.
import geopandas as gpd
from pyproj import Geod, Proj
# Use the included dataset of Geopandas
df = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
# Prep an ellipsoidal earth (WGS84's parameters)
geod = Geod('+a=6378137 +f=0.0033528106647475126')
# List of countries
some_countries = ["Thailand", "Nepal"]
def area_perim(country_name):
    pgon = df[df["name"]==country_name].geometry.iloc[0]
    # Extract list of longitude/latitude of country's boundary
    lons, lats = pgon.exterior.xy[:][0], pgon.exterior.xy[:][1]
    # Compute surface area and perimeter
    poly_area, poly_perimeter = geod.polygon_area_perimeter(lons, lats)
    # Print the results
    print("\nCountry:", country_name)
    print("Area, (sq.Km): {:.1f}".format(abs(poly_area)/10**6))
    print("Perimeter, (Km): {:.2f}".format(poly_perimeter/10**3))
for each in some_countries:
    area_perim(each)
Output:
Country: Thailand
Area, (sq.Km): 510125.6
Perimeter, (Km): 5555.56
Country: Nepal
Area, (sq.Km): 150706.9
Perimeter, (Km): 1983.42
Note that df has CRS = epsg:4326.
If the source geodataframe you use has CRS other than epsg:4326, you can convert it to epsg:4326 before use.
See reference for more details.
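The function above reads only the exterior ring of a single Polygon, so it would not work on MultiPolygon geometries (for example, countries with islands). A minimal sketch of one way to cover those, assuming a pyproj version that provides Geod.geometry_area_perimeter and that shapely is installed. The helper name geodesic_area_km2 and the one-degree test boxes are made up for illustration.

```python
from pyproj import Geod
from shapely.geometry import MultiPolygon, box

# Same WGS84 ellipsoid as above, selected by name
geod = Geod(ellps="WGS84")


def geodesic_area_km2(geom):
    """Geodesic area of a shapely geometry given in lon/lat degrees, in square km."""
    area_m2, _perimeter_m = geod.geometry_area_perimeter(geom)
    # The area is signed according to ring orientation, so take the absolute value
    return abs(area_m2) / 1e6


# Hypothetical test geometries: 1 x 1 degree cells on the equator
cell = box(0, 0, 1, 1)
two_cells = MultiPolygon([box(0, 0, 1, 1), box(10, 0, 11, 1)])

print("one cell  (sq km): {:.1f}".format(geodesic_area_km2(cell)))
print("two cells (sq km): {:.1f}".format(geodesic_area_km2(two_cells)))
```

On a GeoDataFrame in epsg:4326 this could be applied per row, e.g. df.geometry.apply(geodesic_area_km2).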

Add location marker on plotted Geopandas Dataframe using Folium

Context
I have a merged geodataframe, called results, of 1) postal code areas and 2) the total number of deliveries within each postal code area in the city of Groningen. The geodataframe's geometry includes Polygons and MultiPolygons representing the different postal code areas within the city.
I am new to GeoPandas and have therefore tried different tutorials, including this one from the official geopandas website, where I was introduced to interactive Folium maps, which I really like. I was able to plot my geodataframe using result.explore(), which resulted in the following map
The problem
So far so good, but now I simply want to place a marker using the folium library, with the goal of calculating the distance between the marker and the postal code areas. After some searching on the internet I found in the quickstart guide that you need to create a folium.Map, then a folium.Choropleth for my geodataframe and a folium.Marker, and add them to the folium.Map.
m = folium.Map(location=[53.21917, 6.56667], zoom_start=15)
folium.Marker(
    [53.210903, 6.598276],
    popup="My marker"
).add_to(m)
folium.Choropleth(results, data=results, columns="Postcode", fill_color='OrRd', name="Postalcode areas").add_to(m)
folium.LayerControl().add_to(m)
m
But when I try to run the above code I get the following error:
What is the (possible) best way?
Besides my failing code (it would be great if someone could help me out with it), I am curious whether this is the way to do it (Folium map + marker + choropleth). Is it not possible to call geodataframe.explore(), which results in the map in the second picture, and then just add a marker to the same map? I have the feeling that I am making it too difficult; there must be a better solution using GeoPandas.
You have not provided the geometry, so I found postal districts of the Netherlands and used those.
explore() will draw a point as a marker when given the appropriate parameters.
Hence there are two layers:
one is the postal areas, coloured by number of deliveries
the second is the point, with the distance to each area calculated
import geopandas as gpd
import shapely.geometry
import pandas as pd
import numpy as np
geo_url = "https://geodata.nationaalgeoregister.nl/cbsgebiedsindelingen/wfs?request=GetFeature&service=WFS&version=2.0.0&typeName=cbs_provincie_2017_gegeneraliseerd&outputFormat=json"
gdf = gpd.read_file(geo_url).assign(
    deliveries=lambda d: np.random.randint(10**4, 10**6, len(d))
)
# marker location given as lon/lat (WGS84), hence epsg:4326
p = gpd.GeoSeries(shapely.geometry.Point(6.598276, 53.210903), crs="epsg:4326")
# calc distances to point
gdf["distance"] = gdf.distance(p.to_crs(gdf.crs).values[0])
# dataframe of flattened distances
dfp = pd.DataFrame(
    [
        "<br>".join(
            [f"{a} - {b:.2f}" for a, b in gdf.loc[:, ["statcode", "distance"]].values]
        )
    ],
    columns=["info"],
)
# generate colored choropleth
m = gdf.explore(
    column="deliveries", categorical=True, legend=False, height=400, width=400
)
# add marker with distances
gpd.GeoDataFrame(
    geometry=p,
    data=dfp,
).explore(m=m, marker_type="marker")

Keras Image Data Generator show labels

I am using an ImageDataGenerator to augment my images. I need to get the y labels from the generator.
Example : I have 10 training images, 7 are label 0 and 3 are label 1. I want to increase training set size to 100.
total_training_images = 100
total_val_images = 50
model.fit_generator(
    train_generator,
    steps_per_epoch= total_training_images // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps= total_val_images // batch_size)
By my understanding, this trains a model on 100 training images for each epoch, with each image being augmented in some way or the other according to my data generator, and then validates on 50 images.
If I do train_generator.classes, I get an output [0,0,0,0,0,0,0,1,1,1]. This corresponds to my 7 images of label 0 and 3 images of label 1.
For these new 100 images, how do I get the y-labels?
Does this mean when I am augmenting this to 100 images, my new train_generator labels are the same thing, but repeated 10 times? Essentially np.append(train_generator.classes) 10 times?
I am following this tutorial, if that helps :
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
The labels are generated as one-hot encodings together with the images. Hope this helps!
train_generator.class_indices
from keras.preprocessing import image
import matplotlib.pyplot as plt
x,y = train_generator.next()
for i in range(0,3):
    image = x[i]
    label = y[i]
    print (label)
    plt.imshow(image)
    plt.show()
Based on what you're saying about the generator, yes.
It will replicate the same label for each augmented image. (Otherwise the model would not train properly).
One simple way to check what the generator is outputting is to get what it yields:
X,Y = train_generator.next() #or next(train_generator)
Just remember that this will place the generator in a position to yield the second element, not the first anymore. (This would make the fit method start from the second element).
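To make the label bookkeeping concrete, here is a toy stand-in for an augmenting generator written with NumPy only. It is not Keras's implementation; the function name, the random horizontal flip used as the "augmentation", and the 4x4 dummy images are all assumptions for illustration. It mirrors the example from the question: 10 images (7 of label 0, 3 of label 1) drawn until 100 samples have been seen.

```python
import numpy as np


def toy_augmenting_generator(images, labels, batch_size, rng):
    """Yield (augmented_batch, label_batch) forever, loosely like a Keras iterator."""
    n = len(images)
    while True:
        order = rng.permutation(n)  # reshuffle once per pass over the data
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            batch = images[idx].copy()
            # "Augmentation": flip a random subset of the batch horizontally
            flip = rng.random(len(idx)) < 0.5
            batch[flip] = batch[flip][:, :, ::-1]
            # Labels are passed through unchanged for every augmented image
            yield batch, labels[idx]


rng = np.random.default_rng(0)
images = rng.random((10, 4, 4))       # 10 dummy grayscale images
labels = np.array([0] * 7 + [1] * 3)  # 7 images of class 0, 3 of class 1

gen = toy_augmenting_generator(images, labels, batch_size=5, rng=rng)
# 100 samples with batch_size 5 means 20 steps, i.e. 10 full passes over the data
seen = np.concatenate([next(gen)[1] for _ in range(100 // 5)])
counts = np.bincount(seen)
print(counts)  # [70 30]
```

The class proportions are the original ones repeated, but because each pass is shuffled the label order is generally not a simple tiling of train_generator.classes.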

matlab: texture classification

I have a histology image like this:
From the image, we can observe there are two kinds of different cells.
and
Is there any way that I can separate these two types of cells into two groups?
How about using your raw image and previous code to achieve this?
% % % your old code
I=imread(file);
t1=graythresh(I);
k1=im2bw(I,t1);
k1=~k1;
se = strel('disk',1);
k0=imfill(~k1,'holes');
cc = bwconncomp(k0); % connected components of the binary image
k0(cc.PixelIdxList{1})=0;
k1=imfill(k1,'holes');
mask=k0 | k1;
%%%%%%%%%%%%%%%%%%
This will give you:
I=rgb2hsv(I);
I=double(I);
I1=I(:,:,1); % again, the channel that maximizes the margin between donut and full circle
Imask=(I1-0.2).*(I1-0.9)<0;
k2=mask-Imask;
k2=bwareaopen(k2,100);
This will give you:
k2=mask-Imask;
I2=zeros(size(I1,1),size(I1,2),3);
I2(:,:,1)=(k2==1)*255;
I2(:,:,3)=((I1-0.2).*(I1-0.9)<0)*255;
imshow(I2)
will finally give you (the two types are stored in two channels in the rgb image):
I would use regionprops
props=regionprops(YourBinaryImage, 'Solidity');
The objects with a high solidity will be the disks, those with a lower solidity will be the circles.
(Edit) More formally:
I=imread('yourimage.jpg');
Bw=~im2bw(I, 0.5);
BWnobord = imclearborder(Bw, 4); % clears the partial objects
Props=regionprops(BWnobord, 'All');
solidity=cell2mat({Props.Solidity});
Images={Props.Image};
Access the elements of Images where the value in solidity is higher than 0.9 and you get your disks. The circles are the other ones.
Hope it helps

In Matplotlib, how do you add an Imagedraw object to a PyPlot?

I need to add a shape to a preexisting image generated using pyplot (plt). The best way I know of to generate basic shapes quickly is using ImageDraw's predefined shapes. The original data has points with corresponding colors in line_holder and colorholder. I need to add a bounding box (or in this case an ellipse) to the plot to make it obvious to the user whether the data is in an acceptable range.
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from PIL import Image
...
lines = LineCollection(mpl.line_holder, colors=mpl.colorholder , linestyle='solid')
plt.axes().add_collection(lines)
plt.axes().set_aspect('equal', 'datalim')
plt.axes().autoscale_view(True,True,True)
plt.draw()
plt.show()
I tried inserting this before the show():
image = Image.new('1',(int(ceil(disc/conv))+2,int(ceil(disc/conv))+1), 1)
draw = ImageDraw.Draw(image)
box=(1, 1, int(ceil(disc/conv)), int(ceil(disc/conv))) #create bounding box
draw.ellipse(box, 1, 0) #draw circle in black
but I cannot find a way to then add this ellipse to the pyplot. Does anyone know how one would go about getting the images together? If it is not possible to add an imagedraw object to a pyplot, are there good alternatives for performing this type of operation?
Matplotlib has several patches (shapes) that appear to meet your needs (and remove PIL as a dependency). They are documented here. A helpful example using shapes is here.
To add an ellipse to a plot, you first create an Ellipse patch and then add that patch to the axes you're currently working on. Beware that Circles (or Ellipses with equal width and height) will appear elliptical if your aspect ratio is not equal.
In your snippet you call plt.axes() several times. This is unnecessary: I think it is clearer to keep the axes object and operate on it directly rather than repeatedly fetching it. Note that in older Matplotlib versions plt.axes() with no arguments returned the current axes, whereas recent versions create a new Axes on every call, so plt.gca() is the reliable way to retrieve the current one. The end of my script demonstrates this.
I've also replaced your add_collection() line by plotting a single line. The two essentially do the same thing, and this allows my snippet to be executed as a standalone script.
import matplotlib.pyplot as plt
import matplotlib as mpl
# set up your axes object
ax = plt.axes()
ax.set_aspect('equal', 'datalim')
ax.autoscale_view(True, True, True)
# adding a LineCollection is equivalent to plotting a line
# this will run as a stand alone script
x = range(10)
plt.plot( x, x, 'x-')
# add an ellipse to the axes
c = mpl.patches.Ellipse( (5, 5), 1, 6, angle=45)
ax.add_patch(c)
# you can get the current axes a few ways
# (plt.axes() would create a new Axes in recent Matplotlib versions)
ax2 = plt.gcf().gca()
c2 = mpl.patches.Ellipse( (7, 7), 1, 6, angle=-45, color='green')
ax2.add_patch(c2)
ax3 = plt.gca()
c3 = mpl.patches.Ellipse( (0, 2), 3, 3, color='black')
ax3.add_patch(c3)
print(id(ax), id(ax2), id(ax3))
plt.show()
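For completeness, a minimal sketch that keeps the LineCollection from the question and adds an unfilled Ellipse as the bounding shape on the same axes. The segment coordinates, colors, and ellipse size are made-up stand-ins for line_holder, colorholder, and the disc/conv extent in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from matplotlib.patches import Ellipse

# Hypothetical stand-ins for line_holder and colorholder
segments = [[(0, 0), (2, 3)], [(2, 3), (5, 4)], [(5, 4), (8, 8)]]
colors = ["red", "green", "blue"]

fig, ax = plt.subplots()
ax.add_collection(LineCollection(segments, colors=colors, linestyles="solid"))

# Unfilled ellipse marking the acceptable range; equal width and height give a circle
bound = Ellipse((4, 4), width=10, height=10, fill=False, edgecolor="black")
ax.add_patch(bound)

# An equal aspect ratio keeps the circle from looking stretched
ax.set_aspect("equal", "datalim")
ax.autoscale_view()
fig.canvas.draw()
```

Passing fill=False keeps the data visible underneath, which seems closer to a bounding outline than the filled patches used above.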
