GDF.simplify messes up geometries - geopandas

I am trying to plot river basins on a map. In order to reduce the size of the resulting vector graphics, I am applying GeoSeries.simplify().
import cartopy
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import geopandas as gpd
# %%% Earth
fig = plt.figure()
latlon_proj = ccrs.PlateCarree()
axis_proj = ccrs.Orthographic()
ax = plt.axes(
    projection=axis_proj
)
# %%% Major River Basins
mrb_basins = gpd.read_file('mrb_basins.json') # 520 entries
mrb_basins['geometry'] = mrb_basins['geometry'].simplify(0.1)
for shape in mrb_basins['geometry']:
    feat = cartopy.feature.ShapelyFeature(
        [shape],
        latlon_proj,
        facecolor='red',
    )
    ax.add_feature(feat)
mrb_basins.plot()
The problem is, the resulting map of the earth is fully covered by a red shape.
This is not the case, if I remove the line mrb_basins['geometry'] = mrb_basins['geometry'].simplify(0.1).
How can I simplify the geometries whilst keeping their integrity?
The data set of major river basins is available here.

GeoSeries.simplify() does not always return valid geometries, because of the simplification algorithm that GEOS uses underneath, and cartopy has trouble plotting invalid geometries.
You need to fix your geometries before passing them to cartopy. A simple trick is to call buffer(0).
mrb_basins['geometry'] = mrb_basins['geometry'].simplify(0.1).buffer(0)
Then your code works fine.
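A minimal sketch of the repair step on its own, using a hand-made self-intersecting polygon as a hypothetical stand-in for an outline damaged by simplification (it is not taken from the river basin data). It assumes shapely is installed and shows two common ways to obtain a valid geometry: buffer(0), as above, and shapely.validation.make_valid.

```python
from shapely.geometry import Polygon
from shapely.validation import make_valid

# A "bowtie": the ring crosses itself at (1, 1), so the polygon is invalid.
# This is a made-up example geometry, not one of the river basins.
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])
print("valid before repair:", bowtie.is_valid)

# buffer(0) rebuilds the polygon and returns a valid geometry,
# but it may drop part of the original shape.
fixed_by_buffer = bowtie.buffer(0)
print("valid after buffer(0):", fixed_by_buffer.is_valid)

# make_valid repairs the geometry by splitting it at the self-intersection.
fixed_by_make_valid = make_valid(bowtie)
print("valid after make_valid:", fixed_by_make_valid.is_valid)
```

It is worth comparing the area before and after the repair, because buffer(0) can silently discard parts of a badly broken polygon.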

Related

Area of country in kilometers squared from Polygons

I am using geopandas sample data for this question.
import geopandas as gpd
df = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
My real dataset is somewhat different containing only 'polygon' type geometry points (in EPSG::4326), but what I would like to do is figure out the area of each polygon for each country in kilometers squared.
I am new to geopandas so I'm not sure if I am doing this right. My process is as follows;
ndf=df
ndf.to_crs("epsg:32633")
ndf["area"] = ndf['geometry'].area/ 10**6
ndf.head(2)
but the resulting areas don't make sense.
So I tried
df_2= df.to_crs({'proj':'cea'})
df_2["area"] = df_2['geometry'].area/ 10**6
df_2.head(2)
which is better, but the areas still don't match what I find when I run a Google search for them.
So I'm wondering 1) is this the correct method? 2) how do I know the best projection type?
Computing polygon areas on an equal-area map projection does not always give good results, because it requires dense vertices along the boundaries of the polygons involved.
Computing on the unprojected surface of the earth is not difficult. With a Python library that treats the edges between successive vertices as great-circle (geodesic) arcs when computing the surface area, the results are more accurate.
The most accurate (imho) method to compute surface areas on the earth with Python can be demonstrated with this simple code.
import geopandas as gpd
from pyproj import Geod
# Use the included dataset of Geopandas
df = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
# Prep an ellipsoidal earth (WGS84's parameters)
geod = Geod('+a=6378137 +f=0.0033528106647475126')
# List of countries
some_countries = ["Thailand", "Nepal"]
def area_perim(country_name):
    pgon = df[df["name"] == country_name].geometry.iloc[0]
    # Extract the longitudes/latitudes of the country's outer boundary
    # (this assumes a single Polygon, not a MultiPolygon)
    lons, lats = pgon.exterior.xy
    # Compute surface area and perimeter
    poly_area, poly_perimeter = geod.polygon_area_perimeter(lons, lats)
    # Print the results
    print("\nCountry:", country_name)
    print("Area, (sq.Km): {:.1f}".format(abs(poly_area)/10**6))
    print("Perimeter, (Km): {:.2f}".format(poly_perimeter/10**3))
for each in some_countries:
    area_perim(each)
Output:
Country: Thailand
Area, (sq.Km): 510125.6
Perimeter, (Km): 5555.56
Country: Nepal
Area, (sq.Km): 150706.9
Perimeter, (Km): 1983.42
Note that, df has CRS = epsg:4326.
If the source geodataframe you use has CRS other than epsg:4326, you can convert it to epsg:4326 before use.
See reference for more details.
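As a sketch of the same idea for geometries that are not a single simple ring: pyproj's Geod.geometry_area_perimeter accepts a shapely geometry directly, including MultiPolygons and polygons with holes. The 1° × 1° cell below is a made-up test shape, and Geod(ellps="WGS84") is used here as an assumption in place of the explicit ellipsoid parameters above.

```python
from pyproj import Geod
from shapely.geometry import box

# Built-in WGS84 ellipsoid (assumed here instead of explicit a/f parameters)
geod = Geod(ellps="WGS84")

# Hypothetical test shape: a 1 degree x 1 degree cell at the equator,
# given as (min lon, min lat, max lon, max lat)
cell = box(0.0, 0.0, 1.0, 1.0)

# Edges are treated as geodesics on the ellipsoid; results are in metres
area_m2, perimeter_m = geod.geometry_area_perimeter(cell)

# The sign of the area depends on the ring orientation, hence abs()
area_km2 = abs(area_m2) / 10**6
perimeter_km = perimeter_m / 10**3
print("Area, (sq.Km): {:.1f}".format(area_km2))
print("Perimeter, (Km): {:.2f}".format(perimeter_km))
```

For a GeoDataFrame in epsg:4326, the same call can be applied per row, for example with df.geometry.apply(lambda g: abs(geod.geometry_area_perimeter(g)[0]) / 10**6).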

Add location marker on plotted Geopandas Dataframe using Folium

Context
I have a merged geodataframe, called results, of 1) postal code areas and 2) the total number of deliveries within each postal code area in the city of Groningen. Its geometry includes Polygons and MultiPolygons that represent the different postal code areas within the city.
I am new to GeoPandas, so I have tried different tutorials, including this one from the official geopandas website, which introduced me to interactive Folium maps, which I really like. I was able to plot my geodataframe using results.explore(), which resulted in the following map:
The problem
So far so good, but now I simply want to place a marker using the folium library, with the goal of calculating the distance between the marker and the postal code areas. After some searching on the internet, I found in the quickstart guide that you need to create a folium.Map, then a folium.Choropleth for my geodataframe and a folium.Marker, and add them to the folium.Map.
m = folium.Map(location=[53.21917, 6.56667], zoom_start=15)
folium.Marker(
    [53.210903, 6.598276],
    popup="My marker"
).add_to(m)
folium.Choropleth(results, data=results, columns="Postcode", fill_color='OrRd', name="Postalcode areas").add_to(m)
folium.LayerControl().add_to(m)
m
But when I try to run the above code I get the following error:
What is the (possible) best way?
Besides my failing code (it would be great if someone could help me out with that), I am curious whether this is the way to do it (Folium map + marker + choropleth). Is it not possible to call geodataframe.explore(), which produces the map in the second picture, and then just add a marker to the same map? I have the feeling that I am making this too difficult; there must be a better solution using GeoPandas.
You have not provided the geometry, so I found the postal districts of the Netherlands and used those.
explore() will draw a point as a marker when given the appropriate parameters,
hence two layers:
one is the postal areas, coloured by the number of deliveries;
the second is the point, with the distance to each area calculated.
import geopandas as gpd
import shapely.geometry
import pandas as pd
import numpy as np
geo_url = "https://geodata.nationaalgeoregister.nl/cbsgebiedsindelingen/wfs?request=GetFeature&service=WFS&version=2.0.0&typeName=cbs_provincie_2017_gegeneraliseerd&outputFormat=json"
gdf = gpd.read_file(geo_url).assign(
    deliveries=lambda d: np.random.randint(10**4, 10**6, len(d))
)
# the marker position is given as lon/lat, i.e. WGS84 (EPSG:4326)
p = gpd.GeoSeries(shapely.geometry.Point(6.598276, 53.210903), crs="epsg:4326")
# calc distances to point
gdf["distance"] = gdf.distance(p.to_crs(gdf.crs).values[0])
# dataframe of flattened distances
dfp = pd.DataFrame(
    [
        "<br>".join(
            [f"{a} - {b:.2f}" for a, b in gdf.loc[:, ["statcode", "distance"]].values]
        )
    ],
    columns=["info"],
)
# generate colored choropleth
m = gdf.explore(
    column="deliveries", categorical=True, legend=False, height=400, width=400
)
# add marker with distances
gpd.GeoDataFrame(
    geometry=p,
    data=dfp,
).explore(m=m, marker_type="marker")
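The distance step can be sketched in isolation. The example below assumes the areas are stored in the Dutch RD New CRS (EPSG:28992, units of metres) and uses two hypothetical 1 km squares placed relative to the marker instead of real postal code areas. The key point is that the marker is reprojected to the CRS of the areas before distance() is called, so the result is in metres rather than degrees.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# The marker in lon/lat (EPSG:4326), reprojected to RD New (EPSG:28992, metres)
marker = gpd.GeoSeries([Point(6.598276, 53.210903)], crs="EPSG:4326")
marker_rd = marker.to_crs("EPSG:28992").iloc[0]

# Two hypothetical 1 km x 1 km "postal areas" placed relative to the marker:
# one centred on it, one starting 1.5 km to the east of it
x, y = marker_rd.x, marker_rd.y
areas = gpd.GeoDataFrame(
    {"Postcode": ["around_marker", "east_of_marker"]},
    geometry=[
        box(x - 500, y - 500, x + 500, y + 500),
        box(x + 1500, y - 500, x + 2500, y + 500),
    ],
    crs="EPSG:28992",
)

# Distance from every area to the marker, in metres (0 if the marker is inside)
areas["distance_m"] = areas.distance(marker_rd)
print(areas[["Postcode", "distance_m"]])
```

The object returned by explore() is a folium.Map, so folium.Marker([53.210903, 6.598276]).add_to(m) can also be used to place a plain marker on it.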

Geoview and geopandas groupby projection error

I’m experiencing projection errors following a groupby on geodataframe. Below you will find the libraries that I am using:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import holoviews as hv
from holoviews import opts
import panel as pn
from bokeh.resources import INLINE
import geopandas as gpd
import geoviews as gv
from cartopy import crs
hv.extension('bokeh', 'matplotlib')
gv.extension('bokeh')
pd.options.plotting.backend = 'holoviews'
Whilst these are the versions of some key libraries:
bokeh 2.1.1
geopandas 0.6.1
geoviews 1.8.1
holoviews 1.13.3
I have concatenated 3 shapefiles to build a polygon picture of UK healthcare boundaries (links to files provided if needed). Unfortunately, from what I have found, the UK doesn't produce one file that combines all of those, so I have had to merge the shapefiles from the 3 individual countries I'm interested in. The 3 shape files have a size of:
shape file 1 = (https://www.opendatani.gov.uk/dataset/department-of-health-trust-boundaries)
shape file 2 = (https://geoportal.statistics.gov.uk/datasets/5252644ec26e4bffadf9d3661eef4826_4)
shape file 3 = (https://data.gov.uk/dataset/31ab16a2-22da-40d5-b5f0-625bafd76389/local-health-boards-december-2016-ultra-generalised-clipped-boundaries-in-wales)
My code to concat them together is below:
England_CCG.drop(['objectid', 'bng_e', 'bng_n', 'long', 'lat', 'st_areasha', 'st_lengths'], inplace = True, axis = 1 )
Wales_HB.drop(['objectid', 'bng_e', 'bng_n', 'long', 'lat', 'st_areasha', 'st_lengths', 'lhb16nmw'], inplace = True, axis = 1 )
Scotland_HB.drop(['Shape_Leng', 'Shape_Area'], inplace = True, axis = 1)
#NI_HB.drop(['Shape_Leng', 'Shape_Area'], inplace = True, axis = 1 )
England_CCG.rename(columns={'ccg20cd': 'CCG_Code', 'ccg20nm': 'CCG_Name'}, inplace = True )
Wales_HB.rename(columns={'lhb16cd': 'CCG_Code', 'lhb16nm': 'CCG_Name'}, inplace = True )
Scotland_HB.rename(columns={'HBCode': 'CCG_Code', 'HBName': 'CCG_Name'}, inplace = True )
#NI_HB.rename(columns={'TrustCode': 'CCG_Code', 'TrustName': 'CCG_Name'}, inplace = True )
UK_shape = [England_CCG, Wales_HB, Scotland_HB]
Merged_Shapes = gpd.GeoDataFrame(pd.concat(UK_shape))
Each of the files has the same esri projection once joined, and the shape plots perfectly as one when I run:
Test= gv.Polygons(Merged_Shapes, vdims=[('CCG_Name')], crs=crs.OSGB())
This gives me a polygon plot of the UK, with all the area boundaries for each ccg.
To my geodataframe, I then add a new column, called ‘Country’ which attributes each CCG to whatever the country they belong to. So, all the Welsh CCGs are attributed to Wales, all the English ones to England and all the Scottish ones to Scotland. Just a simple additional grouping of the data really.
What I want to achieve is a dropdown next to the polygon map I am making that shows all the CCGs in a particular country when that country is selected from the dropdown widget. I understand that the way to do this is with a groupby. However, when I use the following code to achieve this:
c1 = gv.Polygons(Merged_Shapes, vdims=[('CCG_Name','Country')], crs=crs.OSGB()).groupby(['Country'])
I get a long list of projection errors stating:
“WARNING:param.project_path: While projecting a Polygons element from a PlateCarree coordinate reference system (crs) to a Mercator projection none of the projected paths were contained within the bounds specified by the projection. Ensure you have specified the correct coordinate system for your data.”
This leaves me without a map, although I retain the widget. Does anyone know what is going wrong here and what a possible solution would be? It's been driving me crazy!
Kind regards,
For some reason geoviews doesn't like the OSGB projection followed by a groupby, as it tries to default back to the PlateCarree projection.
The way I fixed it was to reproject the entire dataset to epsg:4326. For anyone who also runs into this problem, the code is below (it is a well-documented solution):
Merged_Shapes.to_crs({'init': 'epsg:4326'},inplace=True)
gv.Polygons(Merged_Shapes, vdims=[('CCG_Name'),('Country')]).groupby('Country')
The groupby works fine after this.
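A small sketch of the reprojection step on its own, so the result can be checked before handing it to geoviews. The two single-point frames are hypothetical stand-ins for the real boundary files, and to_crs(epsg=4326) is used as an assumption in place of the {'init': ...} dictionary, which newer pyproj versions flag as deprecated.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Two hypothetical single-row frames in British National Grid (EPSG:27700)
england = gpd.GeoDataFrame(
    {"CCG_Name": ["Example A"], "Country": ["England"]},
    geometry=[Point(530000, 180000)],  # roughly central London
    crs="EPSG:27700",
)
wales = gpd.GeoDataFrame(
    {"CCG_Name": ["Example B"], "Country": ["Wales"]},
    geometry=[Point(318000, 176000)],  # roughly Cardiff
    crs="EPSG:27700",
)

# Concatenate as in the question, then reproject to lon/lat
merged = gpd.GeoDataFrame(pd.concat([england, wales], ignore_index=True))
merged_ll = merged.to_crs(epsg=4326)

# The bounds should now be plausible longitudes/latitudes for the UK
print(merged_ll.crs)
print(merged_ll.total_bounds)
```

If the bounds still look like eastings and northings (values in the hundreds of thousands), the reprojection was not applied to the frame that is passed to gv.Polygons.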

How to rotate ylabel of pairplot in seaborn? [duplicate]

I have a simple factorplot
import seaborn as sns
g = sns.factorplot("name", "miss_ratio", "policy", dodge=.2,
                   linestyles=["none", "none", "none", "none"], data=df[df["level"] == 2])
The problem is that the x labels all run together, making them unreadable. How do you rotate the text so that the labels are readable?
I had a problem with the answer by @mwaskom, namely that
g.set_xticklabels(rotation=30)
fails, because this also requires the labels. A bit easier than the answer by @Aman is to just add
plt.xticks(rotation=45)
You can rotate tick labels with the tick_params method on matplotlib Axes objects. To provide a specific example:
ax.tick_params(axis='x', rotation=90)
This is still a matplotlib object. Try this:
# <your code here>
locs, labels = plt.xticks()
plt.setp(labels, rotation=45)
For seaborn plots that return a FacetGrid (e.g. catplot), this works:
g.set_xticklabels(rotation=30)
However, barplot, countplot, etc. return a matplotlib Axes rather than a FacetGrid, and Axes.set_xticklabels also requires the labels. The following works for them:
g.set_xticklabels(g.get_xticklabels(), rotation=30)
Also, if you have 2 graphs overlaid on top of each other, try set_xticklabels on the graph that supports it.
If anyone wonders how to do this for clustermap CorrGrids (part of a given seaborn example):
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(context="paper", font="monospace")
# Load the dataset of correlations between cortical brain networks
df = sns.load_dataset("brain_networks", header=[0, 1, 2], index_col=0)
corrmat = df.corr()
# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(12, 9))
# Draw the heatmap using seaborn
g=sns.clustermap(corrmat, vmax=.8, square=True)
rotation = 90
for i, ax in enumerate(g.fig.axes):  ## getting all axes of the fig object
    ax.set_xticklabels(ax.get_xticklabels(), rotation = rotation)
g.fig.show()
You can also use plt.setp as follows:
import matplotlib.pyplot as plt
import seaborn as sns
plot=sns.barplot(data=df, x=" ", y=" ")
plt.setp(plot.get_xticklabels(), rotation=90)
to rotate the labels 90 degrees.
For a seaborn.heatmap, you can rotate these using (based on @Aman's answer)
pandas_frame = pd.DataFrame(data, index=names, columns=names)
heatmap = seaborn.heatmap(pandas_frame)
loc, labels = plt.xticks()
heatmap.set_xticklabels(labels, rotation=45)
heatmap.set_yticklabels(labels[::-1], rotation=45) # reversed order for y
One can do this with matplotlib.pyplot.xticks
import matplotlib.pyplot as plt
plt.xticks(rotation = 'vertical')
# Or use degrees explicitly
degrees = 70 # Adjust according to one's preferences/needs
plt.xticks(rotation=degrees)
Here one can see an example of how it works.
Use ax.tick_params(labelrotation=45). You can apply this to the Axes of the plot without having to provide labels. This is an alternative to using the FacetGrid if that's not the path you want to take.
If the labels have long names it may be hard to get it right. A solution that worked well for me using catplot was:
import matplotlib.pyplot as plt
fig = plt.gcf()
fig.autofmt_xdate()
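To tie the answers above together, here is a minimal sketch of the tick_params approach on a plain matplotlib Axes. A matplotlib bar chart with made-up labels stands in for the Axes that an axes-level seaborn function such as barplot would return, and the Agg backend is assumed so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical data with labels long enough to run together
names = ["first long label", "second long label", "third long label"]
values = [3, 1, 2]

fig, ax = plt.subplots()
ax.bar(names, values)

# Rotate the x tick labels without having to pass the labels themselves
ax.tick_params(axis="x", labelrotation=45)

# Read the rotation back from the tick labels to confirm it was applied
rotations = [label.get_rotation() for label in ax.get_xticklabels()]
print(rotations)

fig.tight_layout()  # make room for the rotated labels
```

For a figure-level plot that returns a FacetGrid, the same call can be applied to each Axes in g.axes.flat.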

In Matplotlib, how do you add an Imagedraw object to a PyPlot?

I need to add a shape to a preexisting image generated using a pyplot (plt). The best way I know of to generate basic shapes quickly is using ImageDraw's predefined shapes. The original data has points with corresponding colors in line_holder and colorholder. I need to add a bounding box (or in this case an ellipse) to the plot to make it obvious to the user whether the data is in an acceptable range.
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from PIL import Image
...
lines = LineCollection(mpl.line_holder, colors=mpl.colorholder , linestyle='solid')
plt.axes().add_collection(lines)
plt.axes().set_aspect('equal', 'datalim')
plt.axes().autoscale_view(True,True,True)
plt.draw()
plt.show()
I tried inserting this before the show():
image = Image.new('1',(int(ceil(disc/conv))+2,int(ceil(disc/conv))+1), 1)
draw = ImageDraw.Draw(image)
box=(1, 1, int(ceil(disc/conv)), int(ceil(disc/conv))) #create bounding box
draw.ellipse(box, 1, 0) #draw circle in black
but I cannot find a way to then add this ellipse to the pyplot. Does anyone know how one would go about getting the images together? If it is not possible to add an ImageDraw object to a pyplot, are there good alternatives for performing this type of operation?
Matplotlib has several patches (shapes) that appear to meet your needs (and remove PIL as a dependency). They are documented here. A helpful example using shapes is here.
To add an ellipse to a plot, you first create an Ellipse patch and then add that patch to the axes you're currently working on. Beware that Circles (or Ellipses with equal width and height) will appear elliptical if your aspect ratio is not equal.
In your snippet you call plt.axes() several times. This is unnecessary: I think it is clearer to keep the axes object and operate on it directly rather than repeatedly fetching it. Note that in recent Matplotlib releases each plt.axes() call creates a new Axes, so use plt.gca() when you want the current one. The end of my script demonstrates this.
I've also replaced your add_collection() line with a plot of a single line. The two do essentially the same thing, and this allows my snippet to be executed as a standalone script.
import matplotlib.pyplot as plt
import matplotlib as mpl
# set up your axes object
ax = plt.axes()
ax.set_aspect('equal', 'datalim')
ax.autoscale_view(True, True, True)
# adding a LineCollection is equivalent to plotting a line
# this will run as a stand alone script
x = range(10)
plt.plot( x, x, 'x-')
# add an ellipse to the axes
c = mpl.patches.Ellipse( (5, 5), 1, 6, angle=45)
ax.add_patch(c)
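# you can get the current axes a few ways
# (recent Matplotlib versions create a new Axes on every plt.axes() call,
# so the current one is fetched via the figure here)
ax2 = plt.gcf().gca()
c2 = mpl.patches.Ellipse( (7, 7), 1, 6, angle=-45, color='green')
ax2.add_patch(c2)
ax3 = plt.gca()
c3 = mpl.patches.Ellipse( (0, 2), 3, 3, color='black')
ax3.add_patch(c3)
print(id(ax), id(ax2), id(ax3))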
plt.show()
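For the original use case (a LineCollection plus an outline that marks the acceptable range), a sketch could look like the following. The segments, colors and ellipse dimensions are made-up values standing in for line_holder, colorholder and the real limits, and the Agg backend is assumed so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from matplotlib.patches import Ellipse

# Hypothetical stand-ins for line_holder and colorholder
segments = [[(0, 0), (2, 3)], [(2, 3), (4, 1)], [(4, 1), (6, 5)]]
colors = ["red", "green", "blue"]

# Hypothetical acceptable range: centre, full width and full height
cx, cy, width, height = 3.0, 2.5, 10.0, 8.0

fig, ax = plt.subplots()
ax.add_collection(LineCollection(segments, colors=colors, linestyle="solid"))

# An unfilled ellipse so the data underneath stays visible
limit = Ellipse((cx, cy), width, height, fill=False, edgecolor="black")
ax.add_patch(limit)

ax.set_aspect("equal", "datalim")
ax.autoscale_view()

# Numeric check of whether every vertex lies inside the ellipse
inside = all(
    ((x - cx) / (width / 2)) ** 2 + ((y - cy) / (height / 2)) ** 2 <= 1
    for segment in segments
    for x, y in segment
)
print("all points within range:", inside)

fig.canvas.draw()  # render once; use plt.show() in an interactive session
```

Setting fill=False is a design choice: a filled patch, as in the script above, would hide the lines drawn beneath it unless an alpha value is set.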
