I wanted to calculate the distance from Manila to cities in the Philippines using the geopandas GeoSeries.distance(self, other) method.
Steps:
import geopandas as gpd
# So I start with the dataset, which should produce a GeoDataFrame of cities, each with a polygon of its boundaries in lat/long.
url = 'https://raw.githubusercontent.com/macoymejia/geojsonph/master/MuniCities/MuniCities.minimal.json'
df1 = gpd.read_file(url)
# then I define a centroid column
df1['Centroid'] = df1.geometry.centroid
# then I define Manila's location as a shapely point geometry; geocode returns a GeoDataFrame with point geometry and address as columns
manila_loc = gpd.tools.geocode('Manila')
# then I try to calculate the distance
df1.Centroid.distance(manila_loc.geometry)
But I'm getting this error:
AttributeError Traceback (most recent call last)
<ipython-input-30-76585915942f> in <module>
----> 1 df1.Centroid.distance(manila_loc.geometry)
~/opt/anaconda3/envs/Coursera/lib/python3.8/site-packages/pandas/core/generic.py in __getattr__(self, name)
5137 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5138 return self[name]
-> 5139 return object.__getattribute__(self, name)
5140
5141 def __setattr__(self, name: str, value) -> None:
AttributeError: 'Series' object has no attribute 'distance'
I'm new to GeoPandas, but from the documentation I thought the distance method could act on a GeoSeries, and that df1.Centroid and manila_loc.geometry are valid shapely geometry objects. So I don't know what I am missing. Any help would be appreciated.
Try this:
# relevant code only
dists = []
for i, centr in df1.Centroid.items():  # Series.iteritems() was removed in pandas 2.0
    dist = centr.distance(manila_loc.geometry.iloc[0])
    dists.append(dist)
    print("Dist2Manila: ", dist)
To create a new column for the distances:
df1["Dist2Manila"] = dists
You need to pass a single Point object (not a GeoSeries) to the distance method; it is then compared against every row:
from shapely.geometry import Point
from geopandas import GeoDataFrame
destination = Point(5, 5)
geoms = [Point(x, y) for x, y in [(0, 0), (3, 3), (4, 1), (8, 2), (1, 10)]]
departures = GeoDataFrame({'city': list('ABCDE'), 'geometry': geoms})
print(departures.assign(dist_to_dest=departures.distance(destination)))
Which gives me:
city geometry dist_to_dest
0 A POINT (0.00000 0.00000) 7.071068
1 B POINT (3.00000 3.00000) 2.828427
2 C POINT (4.00000 1.00000) 4.123106
3 D POINT (8.00000 2.00000) 4.242641
4 E POINT (1.00000 10.00000) 6.403124
I was able to solve it. I mistakenly assumed that because df1 is a GeoDataFrame, its Centroid column would already be treated as a GeoSeries, but it seems you have to convert it explicitly. So that's what I did, with a minor revision of the original code, from:
df1.Centroid.distance(manila_loc.geometry)
to:
gpd.GeoSeries(df1.Centroid).distance(manila_loc.geometry.iloc[0])
It worked.
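A minimal sketch of why the single-point form matters, and how to get metres instead of degrees, assuming a recent geopandas with pyproj installed. The two centroids and the Manila point below are hypothetical stand-ins for df1.Centroid and the geocoded result (so no network call is needed), and EPSG:32651 (UTM zone 51N) is an assumed metric CRS that roughly covers Manila; cities far to the west or east of that zone will pick up some distortion. When the argument is itself a GeoSeries, distance() pairs rows by index, so only the row whose index matches gets a value, while a single shapely Point is compared against every row. With lat/long data the result is in degrees (geopandas typically warns about this), which is why reprojecting first is usually worthwhile.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical stand-ins for df1.Centroid and the geocoded Manila point (lon, lat in EPSG:4326)
centroids = gpd.GeoSeries([Point(121.00, 14.60), Point(123.90, 10.30)], crs="EPSG:4326")
manila = gpd.GeoSeries([Point(120.98, 14.60)], crs="EPSG:4326")

# GeoSeries vs GeoSeries: rows are paired by index, so index 1 has no partner (NaN)
aligned = centroids.distance(manila)

# Reproject to UTM zone 51N (assumed CRS) so distances are in metres, not degrees
centroids_m = centroids.to_crs(epsg=32651)
manila_m = manila.to_crs(epsg=32651).iloc[0]  # a single shapely Point broadcasts to every row

dist_km = centroids_m.distance(manila_m) / 1000
print(dist_km.round(1))
```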
I am using nearest_points from shapely to retrieve the nearest points between two polygons.
I get the expected result for two simple polygons:
However for more complex polygons, the points are not the expected nearest points between the polygons.
Note: I added ax.set_aspect('equal') so that the line between the nearest points would have to be at a right angle (right?)
What is wrong with my code or my polygons (or me)?
from shapely.geometry import Point, Polygon, LineString, MultiPolygon
from shapely.ops import nearest_points
import matplotlib.pyplot as plt
import shapely.wkt as wkt
#Set of 2 polygons A where the nearest points don't seem right
poly1=wkt.loads('POLYGON((-0.136319755454978 51.460464712623626, -0.1363352511419218 51.46042713866513, -0.1363348393705439 51.460425, -0.1365967582352347 51.460425, -0.1363077932125138 51.4605825392028, -0.136237298707157 51.46052697162038, -0.136319755454978 51.460464712623626))')
poly2=wkt.loads('POLYGON ((-0.1371553266140889 51.46046700960882, -0.1371516327997412 51.46046599134276, -0.1371478585043985 51.46046533117243, -0.1371440383598866 51.46046503515535, -0.1371402074187299 51.460465106007696, -0.1371364008325196 51.460465543079344, -0.137132653529373 51.46046634235985, -0.1371289998934435 51.46046749651525, -0.1371254734494216 51.46046899495536, -0.1371221065549237 51.46047082393093, -0.1366012836405492 51.460786954965236, -0.1365402944168757 51.46074798846902, -0.1370125055334012 51.46045400071198, -0.1371553266140889 51.46046700960882))')
#Set of 2 polygons B where the nearest points seem right
#poly1 = Polygon([(0, 0), (2, 8), (14, 10), (6, 1)])
#poly2 = Polygon([(10, 0),(13,5),(14,2)])
p1, p2 = nearest_points(poly1, poly2)
fig,ax= plt.subplots()
ax.set_aspect('equal')
x1,y1=poly1.exterior.xy
x2,y2=poly2.exterior.xy
#Plot polygons
plt.plot(x1,y1)
plt.plot(x2,y2)
#Plot LineString connecting the nearest points
plt.plot([p1.x, p2.x],[p1.y,p2.y], color='green')
plt.show()
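One way to sanity-check nearest_points, sketched here on the "set B" polygons from the question: confirm both inputs are valid, then compare the length of the connecting segment with poly1.distance(poly2); the two should agree. Note that the connector is perpendicular to an edge only when one of the nearest points lies in the interior of that edge; when both nearest points are vertices, it can meet the outlines at any angle. If is_valid returns False for your real polygons, shapely.validation.explain_validity(poly) may help find out why.

```python
from shapely.geometry import LineString, Polygon
from shapely.ops import nearest_points

# "Set B" polygons from the question, where the result looked right
poly1 = Polygon([(0, 0), (2, 8), (14, 10), (6, 1)])
poly2 = Polygon([(10, 0), (13, 5), (14, 2)])

# Distance results on invalid geometries can be misleading, so check validity first
print("valid:", poly1.is_valid, poly2.is_valid)

p1, p2 = nearest_points(poly1, poly2)
connector = LineString([(p1.x, p1.y), (p2.x, p2.y)])

# The connecting segment should be exactly as long as the polygon-to-polygon distance
print("connector length:", connector.length)
print("poly1.distance(poly2):", poly1.distance(poly2))
```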
I’m trying to find the length of the linestring between its starting point and the point that is used to find the nearest distance to a polygon.
So I used the following code to get the minimum distance between the linestring and some polygons.
gdf['MinDistToTrack'] = gdf.geometry.apply(lambda l: min(rail_or.distance(l)))
and I would also like to get the distance from the start of the linestring to the point used by the above code.
Now I would like a dataframe containing the polygons with a value 'MinDistToTrack' (which I already have) and also a value ‘Length_Of_Linestring_Up_To_Location_Of_Polygon’.
So, let’s say that from the start of the linestring to the polygon there are 22 meters following the path of the linestring, then this is the value I would like to save together with the 'MinDistToTrack'
Polygon ID : 1
'MinDistToTrack' : 1m
'LengthOfLinestringUpToLocationOfPolygon' : 22m
Is this possible, or do I need to split the linestring into small elements, find the element nearest to the polygon, and add up the lengths of all the preceding elements?
Picture showing the problem
You may use the following concepts from shapely:
The nearest_points() function in shapely.ops calculates the nearest points in a pair of geometries.
shapely.ops.nearest_points(geom1, geom2)
Returns a tuple of the nearest points in the input geometries. The points are returned in the same order as the input geometries.
https://shapely.readthedocs.io/en/stable/manual.html#shapely.ops.nearest_points
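from shapely.geometry import LineString, Polygon
from shapely.ops import nearest_points
P = Polygon([(0, 0), (1, 0), (0.5, 1), (0, 0)])
Lin = LineString([(0, 2), (1, 2), (1, 3), (0, 3)])
nps = nearest_points(P, Lin)
## [o.wkt for o in nps] gives ['POINT (0.5 1)', 'POINT (0.5 2)']
np_lin = nps[1]  # keep the Point object; project() needs a geometry, not a WKT string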
You can then take the point np_lin and project it onto Lin to get the distance along the line:
d = Lin.project(np_lin)
d will be the distance along Lin to the point np_lin i.e. nearest to the corresponding Point of P.
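Applied to the question's setup, a minimal sketch might look like the following. It assumes the track is a single LineString (the hypothetical `track` below) rather than the rail_or GeoSeries, and the polygons and the LengthAlongTrack column name are made up for illustration. For each polygon, nearest_points gives the closest point on the track, and project() turns that point into a distance measured along the track from its first vertex. Both values are in CRS units, so the data should be in a projected CRS to get metres.

```python
import geopandas as gpd
from shapely.geometry import LineString, Polygon
from shapely.ops import nearest_points

# Hypothetical track: 10 units east, then 10 units north
track = LineString([(0, 0), (10, 0), (10, 10)])

# Hypothetical polygons, each with one clearly closest vertex
gdf = gpd.GeoDataFrame(
    {"poly_id": [1, 2]},
    geometry=[
        Polygon([(5, 1), (4, 3), (6, 3)]),    # apex 1 unit north of the first leg
        Polygon([(12, 6), (14, 5), (14, 7)]),  # apex 2 units east of the second leg
    ],
)


def dist_and_chainage(poly):
    """Return (shortest distance to track, length along track up to the nearest point)."""
    _, on_track = nearest_points(poly, track)  # second item lies on the track
    return poly.distance(track), track.project(on_track)


results = [dist_and_chainage(g) for g in gdf.geometry]
gdf["MinDistToTrack"] = [r[0] for r in results]
gdf["LengthAlongTrack"] = [r[1] for r in results]
print(gdf[["poly_id", "MinDistToTrack", "LengthAlongTrack"]])
```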
I have a GeoDataFrame with a LINESTRING Z geometry where Z is my altitude for the lat/long. (There are other columns in the dataframe that I deleted for ease of sharing but are relevant when displaying the resulting track)
TimeUTC Latitude Longitude AGL geometry
0 2021-06-16 00:34:04+00:00 42.835413 -70.919610 82.2 LINESTRING Z (-70.91961 42.83541 82.20000, -70...
I would like to find the maximum Z value in that linestring but I am unable to find a way to access it or extract the x,y,z values in a way that I can determine the maximum value outside of the linestring.
line.geometry.bounds only returns the x,y min/max.
The best solution I could come up with was to turn all the points into a list of tuples:
points = line.apply(lambda x: [y for y in x['geometry'].coords], axis=1)
And then find the maximum value of the third element:
from operator import itemgetter
max(points.iloc[0], key=itemgetter(2))[2]
I hope there is a better solution available.
Thank you.
You can take your lambda function approach and just take it one step further:
import numpy as np
line['geometry'].apply(lambda geom: np.max([coord[2] for coord in geom.coords]))
Here's a fully reproducible example from start to finish:
from shapely.geometry import LineString
import numpy as np
import geopandas as gpd
linestring = LineString([[0,0,0],
                         [1,1,1],
                         [2,2,2]])
gdf = gpd.GeoDataFrame({'id':[1,2,3],
'geometry':[linestring,
linestring,
linestring]})
gdf['max_z'] = (gdf['geometry']
.apply(lambda geom:
np.max([coord[2] for coord in geom.coords])))
In the example above, I create a new column called "max_z" that stores the maximum Z value for each row.
Important note
This solution will only work if you exclusively have LineStrings in your geometries. If, for example, you have MultiLineStrings, you'll have to adapt the function I wrote to take care of that.
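If you are on geopandas 0.13 or later (an assumption; older versions lack it), GeoSeries.get_coordinates(include_z=True) can do the unpacking for you, and it also covers the MultiLineString case from the note above. It returns one row per vertex with x, y and z columns, indexed by the original row, so a groupby on the index gives the per-row maximum. The two geometries below are made up for illustration.

```python
import geopandas as gpd
from shapely.geometry import LineString, MultiLineString

# Made-up geometries: one LineString Z and one MultiLineString Z
gdf = gpd.GeoDataFrame(
    {"id": [1, 2]},
    geometry=[
        LineString([(0, 0, 0), (1, 1, 5), (2, 2, 2)]),
        MultiLineString([[(0, 0, 1), (1, 1, 3)], [(2, 2, 7), (3, 3, 4)]]),
    ],
)

# One row per vertex with x, y, z columns, indexed by the original row label
coords = gdf.geometry.get_coordinates(include_z=True)
# Highest z per original row; the assignment aligns on the index
gdf["max_z"] = coords.groupby(level=0)["z"].max()
print(gdf[["id", "max_z"]])
```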
I assume this is possible. Example: I have polygons in a GeoDataFrame, and some of them share the same attribute data. They are separate individual polygons with the same data, and each polygon has its own row in the gdf.
I would like to combine the polygons into a multipolygon so they take up only 1 row in the gdf.
The two polygons overlap, I do not want to dissolve them together, I want them to remain 2 separate entities.
There are also single polygons. I assume they will have to be converted to multipolygons too, even though they are singular, since they will ultimately be exported for use in GIS software, which expects one geometry type per dataset.
I have managed a .dissolve(by='ID'), but as stated above, I do not want to change the polygons' geometry.
Suggestions?
You can adapt geopandas' dissolve to generate a MultiPolygon instead of a unary union. The original code I adapted is here.
import geopandas as gpd
from shapely.geometry import Polygon, MultiPolygon
def groupby_multipoly(df, by, aggfunc="first"):
    data = df.drop(labels=df.geometry.name, axis=1)
    aggregated_data = data.groupby(by=by).agg(aggfunc)
    # Process spatial component
    def merge_geometries(block):
        # Collect the group's polygons into one MultiPolygon without unioning them
        return MultiPolygon(list(block))
    g = df.groupby(by=by, group_keys=False)[df.geometry.name].agg(
        merge_geometries
    )
    # Aggregate
    aggregated_geometry = gpd.GeoDataFrame(g, geometry=df.geometry.name, crs=df.crs)
    # Recombine
    aggregated = aggregated_geometry.join(aggregated_data)
    return aggregated
df = gpd.GeoDataFrame(
{"a": [0, 0, 1], "b": [1, 2, 3]},
geometry=[
Polygon([(0, 0), (1, 0), (1, 1)]),
Polygon([(1, 0), (1, 0), (1, 1)]),
Polygon([(0, 2), (1, 0), (1, 1)]),
],
)
grouped = groupby_multipoly(df, by='a')
grouped
geometry b
a
0 MULTIPOLYGON (((0.00000 0.00000, 1.00000 0.000... 1
1 MULTIPOLYGON (((0.00000 2.00000, 1.00000 0.000... 3
If you change MultiPolygon within merge_geometries to GeometryCollection, you should be able to combine any type of geometry into a single row, but that might not be supported by certain file formats.
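A variation on the function above, sketched under the assumption that a group may already contain MultiPolygons (merge_geometries as written probably expects plain Polygons): flatten every group into its polygon parts and build one MultiPolygon per ID, without any union. The ID values and shapes are made up. Be aware that a MultiPolygon whose parts overlap is invalid under the OGC simple features rules, so some GIS tools may flag or repair it on import; shapely reports this through is_valid.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, Polygon


def to_multipolygon(geoms):
    """Gather every polygon part in a group into one MultiPolygon, with no union."""
    parts = []
    for geom in geoms:
        if isinstance(geom, MultiPolygon):
            parts.extend(geom.geoms)  # unpack existing MultiPolygons into their parts
        else:
            parts.append(geom)
    return MultiPolygon(parts)


# Made-up data: two overlapping squares share ID 0, a lone triangle has ID 1
df = gpd.GeoDataFrame(
    {"ID": [0, 0, 1], "b": [1, 2, 3]},
    geometry=[
        Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
        Polygon([(1, 1), (3, 1), (3, 3), (1, 3)]),  # overlaps the first square
        Polygon([(5, 5), (6, 5), (6, 6)]),  # a single polygon on its own
    ],
    crs="EPSG:3857",
)

geom_col = df.geometry.name
geoms = df.groupby("ID")[geom_col].agg(to_multipolygon)
attrs = df.drop(columns=geom_col).groupby("ID").first()
grouped = gpd.GeoDataFrame(geoms, geometry=geom_col, crs=df.crs).join(attrs)
print(grouped)
```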
I've got two datasets: a list of shops with UK coordinates, and a list of train stations, also with coordinates.
I'm using BallTree to get the nearest station to each shop, with the distance, using code from this website; I've swapped in my dataframes accordingly.
https://automating-gis-processes.github.io/site/notebooks/L3/nearest-neighbor-faster.html
Code:
import pandas as pd
import numpy as np
import geopandas as gpd
from sklearn.neighbors import BallTree
df_pocs = pd.read_csv(r'C:\Users\FLETCHWI\Desktop\XX\shops.csv', encoding = "ISO-8859-1", engine='python')
df_stations = pd.read_csv(r'C:\Users\FLETCHWI\Desktop\xx\uk_stations.csv', encoding = "ISO-8859-1", engine='python')
gdf_pocs = gpd.GeoDataFrame(
df_pocs, geometry=gpd.points_from_xy(df_pocs.longitude, df_pocs.latitude))
gdf_stations = gpd.GeoDataFrame(
df_stations, geometry=gpd.points_from_xy(df_stations.longitude, df_stations.latitude))
def get_nearest(src_points, candidates, k_neighbors=1):
    """Find nearest neighbors for all source points from a set of candidate points"""
    # Create tree from the candidate points
    tree = BallTree(candidates, leaf_size=15, metric='haversine')
    # Find closest points and distances
    distances, indices = tree.query(src_points, k=k_neighbors)
    # Transpose to get distances and indices into arrays
    distances = distances.transpose()
    indices = indices.transpose()
    # Get closest indices and distances (i.e. array at index 0)
    # note: for the second closest points, you would take index 1, etc.
    closest = indices[0]
    closest_dist = distances[0]
    # Return indices and distances
    return (closest, closest_dist)
def nearest_neighbor(left_gdf, right_gdf, return_dist=False):
    """
    For each point in left_gdf, find closest point in right GeoDataFrame and return them.
    NOTICE: Assumes that the input Points are in WGS84 projection (lat/lon).
    """
    left_geom_col = left_gdf.geometry.name
    right_geom_col = right_gdf.geometry.name
    # Ensure that index in right gdf is formed of sequential numbers
    right = right_gdf.copy().reset_index(drop=True)
    # Parse coordinates from points and insert them into a numpy array as RADIANS
    left_radians = np.array(left_gdf[left_geom_col].apply(lambda geom: (geom.x * np.pi / 180, geom.y * np.pi / 180)).to_list())
    right_radians = np.array(right[right_geom_col].apply(lambda geom: (geom.x * np.pi / 180, geom.y * np.pi / 180)).to_list())
    # Find the nearest points
    # -----------------------
    # closest ==> index in right_gdf that corresponds to the closest point
    # dist ==> distance between the nearest neighbors (in radians; converted to meters below)
    closest, dist = get_nearest(src_points=left_radians, candidates=right_radians)
    # Return points from right GeoDataFrame that are closest to points in left GeoDataFrame
    closest_points = right.loc[closest]
    # Ensure that the index corresponds to the one in left_gdf
    closest_points = closest_points.reset_index(drop=True)
    # Add distance if requested
    if return_dist:
        # Convert to meters from radians
        earth_radius = 6371000  # meters
        closest_points['distance'] = dist * earth_radius
    return closest_points
# Find the closest station for each shop and also get the distance based on haversine distance
# Note: haversine distance which is implemented here is a bit slower than using e.g. 'euclidean' metric
# but useful as we get the distance between points in meters
closest_stations = nearest_neighbor(gdf_pocs, gdf_stations, return_dist=True)
Upon running the code, it returns the same station for every shop. However, I'd like it to find the nearest station for each shop and the distance to it.
Any help appreciated, thanks!
I did some testing of the functions and indeed lat/long needs to be reversed for it to work.
Notice the warning:
NOTICE: Assumes that the input Points are in WGS84 projection (lat/lon).
Hence, when defining the points, simply change
gdf_pocs = gpd.GeoDataFrame(
df_pocs, geometry=gpd.points_from_xy(df_pocs.longitude, df_pocs.latitude))
gdf_stations = gpd.GeoDataFrame(
df_stations, geometry=gpd.points_from_xy(df_stations.longitude, df_stations.latitude))
to
gdf_pocs = gpd.GeoDataFrame(
df_pocs, geometry=gpd.points_from_xy(df_pocs.latitude, df_pocs.longitude))
gdf_stations = gpd.GeoDataFrame(
df_stations, geometry=gpd.points_from_xy(df_stations.latitude, df_stations.longitude))
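Swapping the columns works because the tutorial code passes (geom.x, geom.y) to BallTree, while scikit-learn's haversine metric expects (latitude, longitude) in radians. An alternative that keeps the geometries conventional (x = longitude) is to leave points_from_xy as it was and change the two lambdas in nearest_neighbor to return (geom.y * np.pi / 180, geom.x * np.pi / 180). To cross-check results on a small sample, a brute-force haversine search in plain NumPy, sketched below, may help. This is a swapped-in technique, not the tutorial's BallTree, and it builds a full shops-by-stations distance matrix, so it only suits small inputs. The function name and coordinates are hypothetical.

```python
import numpy as np

EARTH_RADIUS_M = 6371000  # same mean radius as the tutorial code


def nearest_haversine(src_lat, src_lon, cand_lat, cand_lon):
    """Brute-force nearest candidate for each source point (degrees in, metres out)."""
    # Convert to radians and broadcast to an (n_src, n_cand) grid
    lat1 = np.radians(np.asarray(src_lat, dtype=float))[:, None]
    lon1 = np.radians(np.asarray(src_lon, dtype=float))[:, None]
    lat2 = np.radians(np.asarray(cand_lat, dtype=float))[None, :]
    lon2 = np.radians(np.asarray(cand_lon, dtype=float))[None, :]
    # Haversine formula: latitude and longitude play different roles, so order matters
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))
    idx = dist.argmin(axis=1)  # index of the closest candidate per source point
    return idx, dist[np.arange(len(idx)), idx]


# Hypothetical shops (London, Edinburgh) and stations (Edinburgh, London)
shop_lat, shop_lon = [51.50, 55.94], [-0.13, -3.20]
station_lat, station_lon = [55.952, 51.531], [-3.189, -0.124]

idx, dist_m = nearest_haversine(shop_lat, shop_lon, station_lat, station_lon)
print(idx, dist_m.round())
```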