geopandas: plot a polygon on an existing figure

I have a plot of a dataframe, and I would like to continue by plotting a polygon on it, but I get the message <Figure size 432x288 with 0 Axes>.
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Polygon

ax1 = stopvu_count.plot.scatter(x='lon', y='lat')
polygonY = Polygon([(37.991920, 23.731388),
                    (37.991771, 23.731337),
                    (37.991181, 23.735570)])
p1 = gpd.GeoSeries(polygonY)
p1.plot(facecolor="none", edgecolor='black', ax=ax1)
plt.show()
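For reference, here is a self-contained sketch of the same overlay pattern with a stand-in stopvu_count DataFrame, since the real one is not shown; the column values are placeholders, and the polygon vertices are written in (lon, lat) order so they share the coordinate system of the x='lon', y='lat' scatter:

import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Polygon

# stand-in point data (placeholder values, roughly in the same area as the polygon)
stopvu_count = pd.DataFrame({'lon': [23.7310, 23.7330, 23.7350],
                             'lat': [37.9910, 37.9915, 37.9918]})

# scatter plot first, keeping a handle on its Axes
ax1 = stopvu_count.plot.scatter(x='lon', y='lat')

# polygon vertices in (lon, lat) order to match the scatter's x/y columns
polygonY = Polygon([(23.731388, 37.991920),
                    (23.731337, 37.991771),
                    (23.735570, 37.991181)])

# draw the polygon outline on the same Axes via ax=
gpd.GeoSeries([polygonY]).plot(facecolor="none", edgecolor="black", ax=ax1)
plt.show()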

Related

How to convert convex hull vertices into a geopandas polygon

I am using DBSCAN to cluster coordinates together and then using ConvexHull to draw 'polygons' around each cluster. I then want to construct geopandas polygons out of my convex hull shapes to be used for spatial joining.
import pandas as pd, numpy as np, matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from scipy.spatial import ConvexHull

Lat = [10, 10, 20, 23, 27, 28, 29, 34, 11, 34, 66, 22]
Lon = [39, 40, 23, 21, 11, 29, 66, 33, 55, 22, 11, 55]
D = list(zip(Lat, Lon))
df = pd.DataFrame(D, columns=['LAT', 'LON'])
X = np.array(df[['LAT', 'LON']])

kms_per_radian = 6371.0088
epsilon = 1500 / kms_per_radian
db = DBSCAN(eps=epsilon, min_samples=3)
model = db.fit(np.radians(X))
cluster_labels = db.labels_
num_clusters = len(set(cluster_labels))
cluster_labels = cluster_labels.astype(float)
cluster_labels[cluster_labels == -1] = np.nan
labels = pd.DataFrame(db.labels_, columns=['CLUSTER_LABEL'])
dfnew = pd.concat([df, labels], axis=1, sort=False)

z = []  # HULL simplices coordinates will be appended here
for i in range(0, num_clusters - 1):
    dfq = dfnew[dfnew['CLUSTER_LABEL'] == i]
    Y = np.array(dfq[['LAT', 'LON']])
    hull = ConvexHull(Y)
    plt.plot(Y[:, 1], Y[:, 0], 'o')
    z.append(Y[hull.vertices, :].tolist())
    for simplex in hull.simplices:
        ploted = plt.plot(Y[simplex, 1], Y[simplex, 0], 'k-', c='m')
plt.show()
print(z)
The vertices appended to the list z represent the coordinates of each convex hull; however, they are not ordered in sequence and do not form a closed loop, so constructing a polygon with polygon = Polygon(point1, point2, point3) will not produce a polygon object. Is there a way to construct a geopandas polygon object from the convex hull vertices so it can be used for spatial joining? Thanks for your advice.
Instead of generating the polygon directly, I would make a MultiPoint out of your coordinates and then generate the convex hull around that MultiPoint. That should result in the same geometry, but with its vertices in proper order.
With z as a list of lists, as you have it:
from shapely.geometry import MultiPoint

chulls = []
for hull in z:
    chulls.append(MultiPoint(hull).convex_hull)

chulls
[<shapely.geometry.polygon.Polygon at 0x117d50dc0>,
 <shapely.geometry.polygon.Polygon at 0x11869aa30>]
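To use these for the spatial join mentioned in the question, the hull polygons can be wrapped in a GeoDataFrame; a minimal sketch, assuming a plain lat/lon CRS (EPSG:4326) and reusing dfnew from above (recent geopandas takes predicate=, older versions use op=):

import geopandas as gpd

# hull polygons as a GeoDataFrame, one row per cluster
hulls_gdf = gpd.GeoDataFrame({'CLUSTER_LABEL': range(len(chulls))},
                             geometry=chulls, crs='EPSG:4326')

# original points as a GeoDataFrame
points_gdf = gpd.GeoDataFrame(dfnew,
                              geometry=gpd.points_from_xy(dfnew['LON'], dfnew['LAT']),
                              crs='EPSG:4326')

# spatial join: attach each point to the hull that contains it
joined = gpd.sjoin(points_gdf, hulls_gdf, how='left', predicate='within')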

How do I perform a curve fit on an array of points that touches a specific point in that array

I need help with curve fitting a given set of points. The points form a parabola, and I need to find the peak point of the result. The issue is that when I do a curve fit, it sometimes doesn't touch the max y-coordinate, even if the actual point is given in the input array.
The code snippet follows. Here 1.88 is the actual peak y-coordinate, at (13.05, 1.88), but the graph generated by the code does not pass through that point because of the curve fit. Is there a way to fit the curve while making sure that it touches the max point given in the input array?
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit, minimize_scalar

fig = plt.gcf()
#fig.set_size_inches(18.5, 10.5)

x = [4.59, 9.02, 13.05, 18.47, 20.3]
y = [1.7, 1.84, 1.88, 1.7, 1.64]

def f(x, p1, p2, p3):
    return p3*(p1/((x-p2)**2 + (p1/2)**2))

plt.plot(x, y, "ro")
popt, pcov = curve_fit(f, x, y)

# find the peak
fm = lambda x: -f(x, *popt)
r = minimize_scalar(fm, bounds=(1, 5))
print("maximum:", r["x"], f(r["x"], *popt))  # maximum: 2.99846874275 18.3928199902
plt.text(1, 1.9, 'maximum ' + str(round(r["x"], 2)) + '( #' + str(round(f(r["x"], *popt), 2)) + ' )')

x_curve = np.linspace(min(x), max(x), 50)
plt.plot(x_curve, f(x_curve, *popt))
plt.plot(r['x'], f(r['x'], *popt), 'ko')
plt.show()
Here is a graphical code example using your equation with weighted fitting, where I have made the max point larger to more easily see the effect of the weighting. In non-weighted curve fitting, all weights are implicitly 1.0, as all data points have equal weight. Scipy's curve_fit routine uses weights in the form of uncertainties, so giving a point a very small uncertainty (which I have done) is like giving the point a very large weight. This technique can be used to make a fit pass arbitrarily close to any single data point in any software that can perform weighted fitting.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

x = [4.59, 9.02, 13.05, 18.47, 20.3]
y = [1.7, 1.84, 2.0, 1.7, 1.64]

# note the single very small uncertainty - try making this value 1.0
uncertainties = numpy.array([1.0, 1.0, 1.0E-6, 1.0, 1.0])

# rename data to use previous example
xData = numpy.array(x)
yData = numpy.array(y)

def func(x, p1, p2, p3):
    return p3*(p1/((x-p2)**2 + (p1/2)**2))

# these are the same as the scipy defaults
initialParameters = numpy.array([1.0, 1.0, 1.0])

# curve fit the test data, first without uncertainties to
# get us closer to initial starting parameters
ssqParameters, pcov = curve_fit(func, xData, yData, p0=initialParameters)

# now that we have better starting parameters, use uncertainties
fittedParameters, pcov = curve_fit(func, xData, yData, p0=ssqParameters, sigma=uncertainties, absolute_sigma=True)

modelPredictions = func(xData, *fittedParameters)
absError = modelPredictions - yData

SE = numpy.square(absError)  # squared errors
MSE = numpy.mean(SE)  # mean squared errors
RMSE = numpy.sqrt(MSE)  # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))

print('Parameters:', fittedParameters)
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()

##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data')  # X axis data label
    axes.set_ylabel('Y Data')  # Y axis data label

    plt.show()
    plt.close('all')  # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
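To recover the peak location of the weighted fit, the minimize_scalar approach from the question can be reused; a short sketch (the bounded search interval spanning the data range is an assumption):

from scipy.optimize import minimize_scalar

# negate the fitted function and minimize to locate the peak of the weighted fit
fm = lambda xx: -func(xx, *fittedParameters)
r = minimize_scalar(fm, bounds=(min(x), max(x)), method='bounded')
print('peak at x =', r.x, 'y =', func(r.x, *fittedParameters))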

Geopandas: Get a box that covers the area of a GeoDataFrame, to use for inverting a map

I'm trying to invert a map.
import geopandas as gpd
import geoplot as gplt
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
denmark = world[world.name == 'Denmark']
I would like to find the boundaries of the "denmark" dataframe so that I can create a box-shaped GeoDataFrame that covers all of Denmark.
I'd then intersect that with "denmark" to get a shape of everything that is not Denmark, which I can later use to cover parts of a map I don't want to show.
I tried looking through the GeoDataFrame to create this box manually, but that doesn't work well.
from shapely.geometry import mapping, Polygon

cords = [c3
         for c in mapping(denmark['geometry'])['features']
         for c2 in c['geometry']['coordinates']
         for c3 in c2]
xcords = [x[0] for x in cords if isinstance(x[0], float)]
ycords = [y[1] for y in cords if isinstance(y[1], float)]
w3 = gpd.GeoDataFrame(
    [Polygon([[max(xcords), max(ycords)],
              [max(xcords), min(ycords)],
              [min(xcords), min(ycords)],
              [min(xcords), max(ycords)]])],
    columns=['geometry'],
    geometry='geometry')
Is there an easy, quick way to get this box?
Or is there a way to invert a GeoDataFrame?
A GeoDataFrame has the total_bounds attribute, which returns the minx, miny, maxx, maxy of all geometries (the min/max of the bounds of all geometries).
And to create a Polygon of this, you can then pass those values to the shapely.geometry.box function:
>>> denmark.total_bounds
array([ 8.08997684, 54.80001455, 12.69000614, 57.73001659])
>>> from shapely.geometry import box
>>> box(*denmark.total_bounds)
<shapely.geometry.polygon.Polygon at 0x7f06be3e7668>
>>> print(box(*denmark.total_bounds))
POLYGON ((12.6900061377556 54.80001455343792, 12.6900061377556 57.73001658795485, 8.089976840862221 57.73001658795485, 8.089976840862221 54.80001455343792, 12.6900061377556 54.80001455343792))
Looks like a GeoDataFrame has a property "total_bounds"
So it's
denmark.total_bounds
which returns
array([ 8.08997684, 54.80001455, 12.69000614, 57.73001659])
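Building on total_bounds, a minimal sketch of the "invert" step asked about, using overlay with how='difference' (any padding of the box is left out):

from shapely.geometry import box
import geopandas as gpd

# bounding box of Denmark as a one-row GeoDataFrame
bbox = gpd.GeoDataFrame(geometry=[box(*denmark.total_bounds)], crs=denmark.crs)

# keep the part of the box that is not Denmark; this can later be
# drawn on top of the map to hide everything outside Denmark
not_denmark = gpd.overlay(bbox, denmark, how='difference')
not_denmark.plot()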

Best learning algorithms for concentric, not linearly separable data

Below are two scatter plots. The first one is for data points that have values of x and y, and I would like to know if there is a clustering algorithm that will automatically recognize that there are two clusters. They are concentric and not linearly separable. K-means is not right for several reasons. The other plot is similar but it has x, y and color values, and I would like to know what learning algorithm would be best at classifying or predicting the correct color from the values of x and y.
I got good classifier results for this problem using the sklearn MLPClassifier algorithm. Here are the resulting scatter and contour plots:
Detailed code at: https://www.linkedin.com/pulse/couple-scikit-learn-classifiers-peter-thorsteinson. The simplified code below shows how it works:
import math
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

# Generate the artificial data set and display the resulting scatter plot
x = []
y = []
z = []
for i in range(500):
    rand = np.random.uniform(0.0, 2*math.pi)
    randx = np.random.normal(0.0, 30.0)
    randy = np.random.normal(0.0, 30.0)
    if np.random.random() > 0.5:
        z.append(0)
        x.append(100*math.cos(rand) + randx)
        y.append(100*math.sin(rand) + randy)
    else:
        z.append(1)
        x.append(300*math.cos(rand) + randx)
        y.append(300*math.sin(rand) + randy)

plt.axis('equal')
plt.axis([-500, 500, -500, 500])
plt.scatter(x, y, c=z)
plt.show()

# Run the MLPClassifier algorithm on the training data
XY = pd.DataFrame({'x': x, 'y': y})
print(XY.head())
Z = pd.DataFrame({'z': z})
print(Z.head())
XY_train, XY_test, Z_train, Z_test = train_test_split(XY, Z, test_size=0.20)
mlp = MLPClassifier(hidden_layer_sizes=(10, 10, 10), max_iter=1000)
mlp.fit(XY_train, Z_train.values.ravel())

# Make predictions on the test data and display the resulting scatter plot
predictions = mlp.predict(XY_test)
print(confusion_matrix(Z_test, predictions))
print(classification_report(Z_test, predictions))
plt.axis('equal')
plt.axis([-500, 500, -500, 500])
plt.scatter(XY_test.x, XY_test.y, c=predictions)
plt.show()
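The first half of the question asks for a clustering algorithm that finds the two concentric rings without labels; density- or graph-based clusterers handle that shape. A small sketch on the ring data generated above (the eps value is a rough guess for this scale, not a tuned setting):

from sklearn.cluster import DBSCAN, SpectralClustering

XYarr = np.column_stack([x, y])  # reuse the ring data from the example above

# DBSCAN groups points by density, so each ring becomes its own cluster
db_labels = DBSCAN(eps=40, min_samples=5).fit_predict(XYarr)

# SpectralClustering with a nearest-neighbour affinity also separates concentric shapes
sc_labels = SpectralClustering(n_clusters=2, affinity='nearest_neighbors').fit_predict(XYarr)

plt.axis('equal')
plt.scatter(XYarr[:, 0], XYarr[:, 1], c=db_labels)
plt.show()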

matplotlib: histogram and bin labels

I'm trying to plot a histogram as a bar chart, and I'm having difficulty figuring out how to align the x-axis labels with the actual bins. The code below generates the following plot:
As you can see, the end of each x-label is not aligned to the center of its bin. The way I'm thinking about this is: when I apply a 45-degree rotation, the label pivots around its geometrical center. I was wondering if it's possible to move the pivot up to the top of the label. (Or simply translate all the labels slightly left.)
import matplotlib.pyplot as plt
import numpy as np

# data
np.random.seed(42)
data = np.random.rand(5)
names = ['A:GBC_1233', 'C:WERT_423', 'A:LYD_342', 'B:SFS_23', 'D:KDE_2342']

ax = plt.subplot(111)
width = 0.3
# list() is needed on Python 3, where map returns an iterator
bins = list(map(lambda x: x - width/2, range(1, len(data)+1)))
ax.bar(bins, data, width=width)
ax.set_xticks(list(range(1, len(data)+1)))
ax.set_xticklabels(names, rotation=45)
plt.show()
Use:
ax.set_xticklabels(names, rotation=45, rotation_mode="anchor", ha="right")
With ha="right" and rotation_mode="anchor", each label is rotated about its right end, so the end of the label lines up with the center of its bin.
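Applied to the last lines of the example above (a minimal sketch of where the change goes):

ax.set_xticks(list(range(1, len(data) + 1)))
# anchor the rotation at the right end of each label so it meets its tick
ax.set_xticklabels(names, rotation=45, rotation_mode="anchor", ha="right")
plt.show()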
