I have a dataframe with the following columns: Name (string), size (num), latitude (num), longitude (num), geometry (shapely.geometry.point.Point).
When I plot my points on a map and try to annotate each point, the annotations are not shown at all. My guess is that this is due to the projection I'm using.
Here is the code I'm running:
import geopandas as gpd
import geoplot as gplt
import matplotlib.pyplot as plt
proj = gplt.crs.AlbersEqualArea()
fig, ax = plt.subplots(figsize=(10, 10), subplot_kw={'projection': proj})
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.longitude, df.latitude))
gplt.pointplot(gdf, hue='size', s=15, ax=ax, cmap=palette, legend=True, zorder=10)
for idx, row in gdf.iterrows():
    plt.annotate(s=row['Name'], xy=[row['latitude'],row['longitude']])
plt.show()
You need a coordinate transformation in this line:
plt.annotate(s=row['Name'], xy=[row['latitude'],row['longitude']])
Define the target projection:
import cartopy.crs as ccrs
xtran = ccrs.AlbersEqualArea()
Then replace that line with:
x, y = xtran.transform_point(row['longitude'], row['latitude'], ccrs.PlateCarree())
plt.annotate(row['Name'], xy=(x, y))
I am having trouble adding a basemap to my map. My geodataframe is created using X and Y coords of a bunch of points.
gdf = geo.GeoDataFrame(
df, geometry=gpd.points_from_xy(df['X'], df['Y']))
gdf.set_crs(epsg=3857)
Which looks like this:
After using contextily to get a basemap, I cannot get the basemap to show up properly. The coordinates should show the bottom of the Mississippi River Basin.
ax = gdf.plot(color="red", figsize=(9, 9))
cx.add_basemap(ax, zoom=0, crs= gdf.crs)
Let me know if there is anything wrong with my code that would explain why the basemap is not showing up.
Thanks!
It looks like your data is in WGS84/EPSG:4326 (i.e. lat/lon) coordinates. So I think you're confusing geopandas.GeoDataFrame.set_crs, which tells geopandas what the CRS of the data is, with geopandas.GeoDataFrame.to_crs, which transforms the data from the current CRS to the new one you specify. Also note that neither of these operations is in-place by default. So I think you want:
gdf = geo.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df['X'], df['Y'])
)
gdf = gdf.set_crs("epsg:4326")
gdf_mercator = gdf.to_crs("epsg:3857")
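To make the difference between relabelling and transforming concrete, here is a minimal sketch in plain Python of the spherical Web Mercator formula that EPSG:3857 uses. The helper name lonlat_to_web_mercator and the sample point are assumptions for illustration only; in practice to_crs does this work for you.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used by EPSG:3857


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project a lon/lat pair in degrees to EPSG:3857 metres (hypothetical helper)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# A point near New Orleans, in degrees (illustrative values).
lon, lat = -90.07, 29.95

# What set_crs(epsg=3857) amounts to: the numbers are unchanged, so they are
# read as metres and land within about 100 m of 0 degrees N, 0 degrees E.
relabelled = (lon, lat)

# What to_crs(epsg=3857) amounts to: the numbers are converted to metres.
projected = lonlat_to_web_mercator(lon, lat)

print(relabelled)
print(projected)
```

The projected values are in the millions of metres, which is the scale the basemap tiles expect; the relabelled values are not.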
This is really the same as Michael Delgado's answer. It's simpler to state the CRS at GeoDataFrame construction time. Also make sure you are using the correct CRS.
MWE
import geopandas as gpd
import geopandas as geo
import pandas as pd
import contextily as cx
# construct a dataframe with X and Y of some points in US
places = gpd.read_file(
    gpd.datasets.get_path("naturalearth_cities"),
    mask=gpd.read_file(gpd.datasets.get_path("naturalearth_lowres")).loc[
        lambda d: d["iso_a3"].eq("USA")
    ],
)
df = pd.DataFrame({"X": places.geometry.x, "Y": places.geometry.y})
# user code, state CRS at construction time
gdf = geo.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df["X"], df["Y"]), crs="epsg:4326"
)
ax = gdf.plot(color="red", figsize=(9, 9))
cx.add_basemap(ax, zoom=0, crs=gdf.crs)
I have a dataframe 'Spreads' where one of the columns is 'HY_OAS'. My goal is to draw a horizontal line (basically representing a range of values for 'HY_OAS') and plot the column mean on that line. In addition, I wanted the x axis min/max to be the min/max for that column and I'd like to include text boxes annotating the min/max. The problem is I'm not sure how to proceed because all I have is the below. Thanks for any and all help. The goal is the second image and the current code is the first image.
fig8 = px.scatter(x=[Spreads['HY_OAS'].mean()], y=[0])
fig8.update_xaxes(visible=True,showticklabels=False,range=[Spreads['HY_OAS'].min(),Spreads['HY_OAS'].max()])
fig8.update_yaxes(visible=True,showticklabels=False, range=[0,0])
Following what you describe and what you have coded, the steps are:
generate some sample data in a dataframe
scatter values along x-axis and use constant for y-axis
add mean marker
format figure
add required annotations
import numpy as np
import plotly.express as px
import pandas as pd
# simulate some data
Spreads = pd.DataFrame({"HY_OAS": np.sin(np.random.uniform(0, np.pi * 2, 50))})
# scatter values along x-axis and a larger point for the mean
fig = px.scatter(Spreads, x="HY_OAS", y=np.full(len(Spreads), 0)).add_traces(
    px.scatter(x=[Spreads["HY_OAS"].mean()], y=[0])
    .update_traces(marker={"color": "red", "size": 20})
    .data
)
# fix up figure config
fig.update_layout(
    xaxis_visible=False,
    yaxis_visible=False,
    showlegend=False,
    paper_bgcolor="rgba(0,0,0,0)",
    plot_bgcolor="rgba(0,0,0,0)",
)
# finally, the required annotations
fig.add_annotation(x=Spreads["HY_OAS"].mean(), y=0, text=Spreads["HY_OAS"].mean().round(4))
fig.add_annotation(x=Spreads["HY_OAS"].min(), y=0, text=Spreads["HY_OAS"].min().round(2), showarrow=False, xshift=-20)
fig.add_annotation(x=Spreads["HY_OAS"].max(), y=0, text=Spreads["HY_OAS"].max().round(2), showarrow=False, xshift=20)
For a straight line instead of scattered points, build the base figure as follows, then use the same code as above to add the annotations and configure the layout:
fig = px.line(x=[Spreads["HY_OAS"].min(), Spreads["HY_OAS"].max()], y=[0, 0]).add_traces(
    px.scatter(x=[Spreads["HY_OAS"].mean()], y=[0])
    .update_traces(marker={"color": "red", "size": 20})
    .data
)
I would like to be able to change values in iris based on the coordinate, instead of the index.
For example, consider the following cube and say that I wish to set values from -45N to 45N and 160E to 240E to 1:
import iris
import numpy as np
from iris.coords import DimCoord
from iris.cube import Cube
latitude_vals = np.linspace(-90, 90, 4)
longitude_vals = np.linspace(45, 360, 8)
latitude = DimCoord(latitude_vals, standard_name="latitude", units="degrees")
longitude = DimCoord(longitude_vals, standard_name="longitude", units="degrees")
cube = Cube(
    np.zeros((4, 8), np.float32), dim_coords_and_dims=[(latitude, 0), (longitude, 1)]
)
In this example, what I want can be done by invoking xarray:
import xarray as xr
da = xr.DataArray.from_iris(cube)
da.loc[dict(latitude=slice(-45, 45), longitude=slice(160, 240))] = 1
But can this be done entirely within iris, without having to resort to specifying the indices manually?
Example of specifying the indices manually:
cube.data[1:3, 3:5] = cube.data[1:3, 3:5] + 1
Update (22 Jan 2021): This is a known issue, see this cross-post and links for related discussion.
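One possible workaround, not confirmed anywhere in this thread, is to build boolean masks from the coordinate values and index the data array with them. The sketch below uses plain NumPy arrays standing in for the cube; with a real cube the assumption is that the coordinate values come from cube.coord("latitude").points and cube.coord("longitude").points, and that the assignment is made on cube.data as in the manual-index example above.

```python
import numpy as np

# Same coordinate values and data shape as the cube in the question.
latitude_vals = np.linspace(-90, 90, 4)
longitude_vals = np.linspace(45, 360, 8)
data = np.zeros((4, 8), np.float32)  # stand-in for cube.data

# One boolean mask per dimension, built from coordinate values rather than indices.
lat_mask = (latitude_vals >= -45) & (latitude_vals <= 45)
lon_mask = (longitude_vals >= 160) & (longitude_vals <= 240)

# np.ix_ combines the two 1-D masks so that only the lat/lon box is selected.
data[np.ix_(lat_mask, lon_mask)] = 1

print(data)
```

For these coordinate values the masks select the same cells as the manual slice [1:3, 3:5].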
This is my code:
import pandas as pd
import geoplot as gplt
import geopandas as gpd
import geoplot.crs as gcrs
import contextily
import matplotlib.pyplot as plt
from shapely import geometry
df = pd.read_csv('dataframe_master.csv', index_col='id')
crs = {'init': 'epsg:4326'}
geometry = [geometry.Point(xy) for xy in zip(df['latitude'], df['longitude'])]
df_geo = gpd.GeoDataFrame(df, crs=crs, geometry=geometry)
test = df_geo[:200000]
test = test.to_crs(epsg=3857)
ax = test.plot(marker='o', markersize=1)
contextily.add_basemap(ax)
plt.show()
It generates an image that doesn't show a background map and seems a little distorted.
My coordinate data was originally made with the RD-coordinaten standard (EPSG:28992), which I converted to EPSG:4326 with this code:
from pyproj import Proj, transform
lon_l = []
lat_l = []
p1 = Proj(init='epsg:28992')
p2 = Proj(proj='latlong',datum='WGS84')
for row in range(len(df)):
    lon, lat, z = transform(p1, p2, df.iloc[row, 7], df.iloc[row, 8], 0.0)
    lon_l.append(lon)
    lat_l.append(lat)
I did a sanity check on the longitude latitude output by comparing to some online converters, and the output points to the correct locations.
I tried following this solution: https://gis.stackexchange.com/questions/348339/using-crs-epsg3857-but-misalignment-between-stamen-background-and-coordinates-o in case my conversion was missing the "towgs84" part, but the image still looked the same, with a slightly different colour.
I figured it out! I should've listed longitude before latitude when building the geometry.
geometry = [geometry.Point(xy) for xy in zip(df['longitude'], df['latitude'])]
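A quick way to catch this kind of swap before plotting is to check the points against a rough bounding box for the region you expect. This is only a sketch: the helper fraction_inside, the approximate Netherlands bounding box, and the three sample cities are assumptions for illustration, not part of the original data.

```python
def fraction_inside(points, bbox):
    """Share of (x, y) points inside bbox = (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = bbox
    inside = sum(
        1 for x, y in points if min_x <= x <= max_x and min_y <= y <= max_y
    )
    return inside / len(points)


# Rough lon/lat bounding box of the Netherlands (approximate, for illustration).
NL_BBOX = (3.0, 50.5, 7.5, 54.0)

# Illustrative (longitude, latitude) pairs: Amsterdam, Rotterdam, Groningen.
lon_lat = [(4.90, 52.37), (4.48, 51.92), (6.57, 53.22)]

# The accidental swap: (latitude, longitude) passed as (x, y).
lat_lon = [(lat, lon) for lon, lat in lon_lat]

print(fraction_inside(lon_lat, NL_BBOX))
print(fraction_inside(lat_lon, NL_BBOX))
```

If the share of points inside the box drops to zero, the x and y columns are probably in the wrong order.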
Hi,
I am trying to recreate some of the covid-19 charts that we have seen. I am using data from the Johns Hopkins database.
The data is arranged so that the city names are in the rows and the columns are dates. A screenshot of the csv file is attached. I want to plot line graphs in seaborn that have days on the x axis and confirmed cases by city on the y axis. For some reason, I am unable to reproduce the exponential curves of the death rate.
My code is:
'''loading the file'''
date_columns = list(range(12,123))
df_covid_us = pd.read_csv(covid_us_file, parse_dates=date_columns)
df_covid_us = pd.read_csv(covid_us_file)
'''slicing the columns needed. Province_State and the date columns'''
df = df_covid_us.iloc[:, np.r_[6, 12:123]]
df = df[df['Province_State']=='New York']
'''using df.melt'''
df2 =df.melt(id_vars='Province_State',var_name='Date',value_name='Deaths')
'''plotting using seaborn'''
sns.lineplot(x='Date',y='Deaths',data=df2, ci=None)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
plt.gca().xaxis.set_major_locator(mdates.DayLocator(interval=20))
plt.show()
With a small sample of made-up data:
import pandas as pd, seaborn as sns
import matplotlib.pyplot as plt, matplotlib.dates as mdates
df = pd.DataFrame({'Province_State':['American Samoa','Guam','Puerto Rico'],
                   '2020-01-22':[0,1,2],
                   '2020-01-23':[2,1,0]})
# to get dates in rows: select the date columns (named like '2020-01-22' in this sample)
date_columns = [c for c in df.columns.tolist() if c.startswith('2020-')]
df2 = df.melt(id_vars='Province_State',value_vars=date_columns,
              var_name='Date',value_name='Deaths')
# dates from string to datetime
df2['Date'] = pd.to_datetime(df2['Date'])
sns.lineplot(x='Date',y='Deaths',hue='Province_State',data=df2)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
plt.gca().xaxis.set_major_locator(mdates.DayLocator(interval=1))
plt.show()