Create a date scatterplot with seaborn colored by category [duplicate] - seaborn

This question already has answers here:
Select DataFrame rows between two dates
(13 answers)
Pandas: select all dates with specific month and day
(3 answers)
Select rows for a specific month in Pandas
(2 answers)
seaborn scatterplot is plotting more dates than exist in the original data
(2 answers)
Closed 6 months ago.
I searched through a variety of examples, but many of them feature edge cases where the poster is trying to do something more than a simple plot.
I just want a plot with the date on the x-axis, a count on the y-axis, and the dots colored by category. At first I got close, but for some reason the plot showed dates earlier than the point where I wanted it to start.
This post was "Closed", but after some work and using the suggestions, the plot works. The working code is below.
# Plot
# Load libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Create sample data set
pen = sns.load_dataset('penguins')
pen['dates'] = pd.date_range(start='2012-01-01', end='2013-06-01')[:344]
pen = pen[['dates', 'flipper_length_mm', 'sex']].dropna()
# Use a mask to subset obs if needed
# https://stackoverflow.com/questions/29370057/select-dataframe-rows-between-two-dates
mask = (pen['dates'] >= '2012-04-01')
pen2 = pen.loc[mask]
# Create Plots
fig, ax = plt.subplots(figsize=(12, 12))
ax = sns.scatterplot(x='dates', y='flipper_length_mm', data=pen2, hue='sex', ax=ax)
# Limit date range (explicit Timestamps avoid relying on string-to-date conversion)
# https://stackoverflow.com/questions/53963816/seaborn-scatterplot-is-plotting-more-dates-than-exist-in-the-original-data
ax.set(xlim=(pd.Timestamp('2012-04-01'), pd.Timestamp('2013-01-01')))
Here is a working plotnine version of the same concept.
from plotnine import *
from mizani.breaks import date_breaks
from mizani.formatters import date_format
(
ggplot(pen, aes(x='dates', y = 'flipper_length_mm', color = 'sex'))
+ geom_point()
+ scale_x_datetime(breaks = date_breaks('1 month'))
+ theme(axis_text_x = element_text(rotation = 90, hjust = 1))
+ labs(title = "Penguins")
)
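The mask in the seaborn code above keeps everything from a start date onward. If both ends of the window should be bounded in the data itself (rather than only via xlim), one possible sketch uses Series.between. The DataFrame df and its value column are made up here for illustration; only the dates column mirrors the example above.

```python
import pandas as pd

# Hypothetical stand-in for the 'dates' column used above
df = pd.DataFrame({
    "dates": pd.date_range(start="2012-01-01", end="2013-06-01"),
})
df["value"] = range(len(df))  # made-up values, only here to have something to plot

start = pd.Timestamp("2012-04-01")
end = pd.Timestamp("2013-01-01")

# Series.between is inclusive on both ends by default
window = df.loc[df["dates"].between(start, end)]
```

The resulting window can be passed as data= to sns.scatterplot in place of pen2.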

Related

Set Ranges in Displot (Seaborn) [duplicate]

This question already has answers here:
How to set some xlim and ylim in Seaborn lmplot facetgrid
(2 answers)
set individual ylim of seaborn lmplot columns
(1 answer)
Closed 12 days ago.
I was trying to plot this using displot.
This is my plot
my code
plt = sns.displot(reg_pred-y_test,kind = 'kde')
Now I want to set ranges of X axis (-20,20) and Y axis (0.00 to 0.12).
I tried plt.xlim(-20,20)
It gives me the following error message:
AttributeError: 'FacetGrid' object has no attribute 'xlim'
Can anyone help me with setting the ranges?

How to make an animation (or animated gif), from a number of geopandas plots

I have a GeoDataFrame ("mhg") whose index is months (i.e. "2019-01-01", "2019-02-01", ...). The GDF has a column with the geometry of certain regions (i.e. POLYGON(...)) and another column with the population corresponding to that geometry in that month.
Sample data (with only two months) could be created by:
import geopandas as gpd
data = [['2019-01-01', 'POLYGON(123...)', 1000], ['2019-01-01', 'POLYGON(456...)', 1500], ['2019-01-01', 'POLYGON(789...)', 1400], ['2019-02-01', 'POLYGON(123...)', 1100], ['2019-02-01', 'POLYGON(456...)', 1600], ['2019-02-01', 'POLYGON(789...)', 1300]]
mhg = gpd.GeoDataFrame(data, columns=['month','geometry', 'population'])
mhg = mhg.set_index('month')  # set_index returns a new frame, so assign it back
I can make a multicolor plot of the users living in each region (all periods) with:
mhg.plot(column='population',cmap='jet')
and I can make the same, but filtering by month, using:
mhg.loc['2019-01-01'].plot(column='population',cmap='jet')
I would like to get some kind of "animation" or animated gif where I can see the temporal evolution of the population, using something like this pseudocode:
for all the plots in
mhg.loc['2019-01-01'].plot(column='population',cmap='jet')
mhg.loc['2019-02-01'].plot(column='population',cmap='jet')
mhg.loc['2019-03-01'].plot(column='population',cmap='jet')
...
then merge all plots into 1 animated gif
But I don't know how to do it: the number of months can be up to hundreds, I don't know how to write the for loop, and I don't even know how to start...
Any suggestions?
EDIT: I tried the following (following https://linuxtut.com/en/c089c549df4d4a6d815c/):
months = np.sort(np.unique(mhg.month.values))
from matplotlib.animation import FuncAnimation
from matplotlib.animation import PillowWriter
fig, ax = plt.subplots()
ims = []
def update_fig(month):
    if len(ims) > 0:
        ims[0].remove()
        del ims[0]
    geos = mhg['geometry'].values
    users = mhg[(mhg.month==month)].population
    apl = gpd.plotting.plot_polygon_collection(ax, geos, population, True, cmap='jet')
    ims.append(apl)
    ax.set_title('month = ' + str(month))
    return ims
anim = FuncAnimation(fig, update_fig, interval=1000, repeat_delay=3000, frames=months)
plt.show()
But I got a UserWarning: animation was deleted without rendering anything...
So I am stuck again.
I managed to do it this way:
import matplotlib.colors
import matplotlib.pyplot as plt
mhg = mhg.reset_index()
groups = mhg.groupby('month')
# One PNG per month, all sharing the same log colour scale
for month, grp in groups:
    grp.plot(column='users', cmap='jet', legend=True, figsize=(10, 10), norm=matplotlib.colors.LogNorm(vmin=mhg.users.min(), vmax=mhg.users.max()))
    plt.title(str(month))
    plt.xlim([-20, 5])
    plt.ylim([25, 45])
    plt.savefig("plot{month}.png".format(month=month), facecolor='white')
    plt.close()
And then I joined all the PNGs with convert (the ImageMagick tool):
convert -delay 50 -loop 0 *.png animation.gif
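If ImageMagick is not available, the joining step could probably also be done from Python. The following is only a sketch using Pillow, not what the poster used: pngs_to_gif is a hypothetical helper, and the three solid-colour frames are made-up stand-ins for the saved monthly plots. A delay of 500 ms corresponds to -delay 50 (hundredths of a second), and loop=0 means loop forever.

```python
from pathlib import Path
import tempfile

from PIL import Image


def pngs_to_gif(png_paths, gif_path, delay_ms=500):
    """Join PNG frames into a looping GIF (hypothetical helper)."""
    frames = [Image.open(p).convert("RGB") for p in png_paths]
    frames[0].save(
        gif_path,
        save_all=True,              # write every frame, not just the first
        append_images=frames[1:],
        duration=delay_ms,          # per-frame delay in milliseconds
        loop=0,                     # 0 = repeat forever
    )


# Made-up frames standing in for plot{month}.png
tmp = Path(tempfile.mkdtemp())
for i, colour in enumerate(["red", "green", "blue"]):
    Image.new("RGB", (64, 64), colour).save(tmp / f"plot{i}.png")

gif_path = tmp / "animation.gif"
pngs_to_gif(sorted(tmp.glob("plot*.png")), gif_path)
```

Sorting the file names keeps the frames in order as long as the month in each name sorts chronologically, which is the case for names like plot2019-01-01.png.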

Seaborn PairGrid: pairplot two data set with different transparency

I'd like to make a PairGrid plot with the seaborn library.
My data has two classes: a training set and a single target point.
I'd like to plot the one-target point as opaque, however, the samples in the training set should be transparent.
And I'd like to plot the one-target point also in lower cells.
Here is my code and image:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data = pd.read_csv("data.csv")
g = sns.PairGrid(data, hue='type')
g.map_upper(sns.scatterplot, alpha=0.2, palette="husl")
g.map_lower(sns.kdeplot, lw=3, palette="husl")
g.map_diag(sns.kdeplot, lw=3, palette="husl")
g.add_legend()
plt.show()
The data.csv looks like this:
logP tPSA QED HBA HBD type
0 -2.50000 200.00 0.300000 8 1 Target 1
1 1.68070 87.31 0.896898 3 2 Training set
2 3.72930 44.12 0.862259 4 0 Training set
3 2.29702 91.68 0.701022 6 3 Training set
4 -2.21310 102.28 0.646083 8 2 Training set
You can reassign the dataframe used after partial plotting. E.g. g.data = data[data['type'] == 'Target 1']. So, you can first plot the training dataset, change g.data and then plot the target with other parameters.
The following example supposes the first row of the iris dataset is the target point and treats the full dataset as the training set. A custom legend is added (this might provoke a warning that should be ignored).
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import seaborn as sns
iris = sns.load_dataset('iris')
g = sns.PairGrid(iris)
color_for_trainingset = 'paleturquoise'
# color_for_trainingset = sns.color_palette('husl', 2) [-1] # this is the color from the question
g.map_upper(sns.scatterplot, alpha=0.2, color=color_for_trainingset)
g.map_lower(sns.kdeplot, color=color_for_trainingset)
g.map_diag(sns.kdeplot, lw=3, color=color_for_trainingset)
g.data = iris.iloc[:1]
# g.data = data[data['type'] == 'Target 1']
g.map_upper(sns.scatterplot, alpha=1, color='red')
g.map_lower(sns.scatterplot, alpha=1, color='red', zorder=3)
handles = [Line2D([], [], color='red', ls='', marker='o', label='target'),
           Line2D([], [], color=color_for_trainingset, lw=3, label='training set')]
g.add_legend(handles=handles)
plt.show()

Changing values in iris cube based on coordinates instead of index

I would like to be able to change values in iris based on the coordinate, instead of the index.
For example, consider the following cube and say that I wish to set values from -45N to 45N and 160E to 240E to 1:
import iris
import numpy as np
from iris.coords import DimCoord
from iris.cube import Cube
latitude_vals = np.linspace(-90, 90, 4)
longitude_vals = np.linspace(45, 360, 8)
latitude = DimCoord(latitude_vals, standard_name="latitude", units="degrees")
longitude = DimCoord(longitude_vals, standard_name="longitude", units="degrees")
cube = Cube(
np.zeros((4, 8), np.float32), dim_coords_and_dims=[(latitude, 0), (longitude, 1)]
)
In this example, what I want can be done by invoking xarray:
import xarray as xr
da = xr.DataArray.from_iris(cube)
da.loc[dict(latitude=slice(-45, 45), longitude=slice(160, 240))] = 1
But can this be done entirely within iris, without having to resort to specifying the indices manually?
Example of specifying the indices manually:
cube.data[1:3, 3:5] = cube.data[1:3, 3:5] + 1
Update (22 Jan 2021): This is a known issue, see this cross-post and links for related discussion.
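As a workaround that avoids typing the indices by hand, the index ranges can be derived from the coordinate values with boolean masks. This is only a sketch in plain NumPy, not an iris feature: the lat, lon and data arrays stand in for cube.coord("latitude").points, cube.coord("longitude").points and cube.data from the example above.

```python
import numpy as np

# Same coordinate values as the cube above; with a real cube these would be
# cube.coord("latitude").points and cube.coord("longitude").points
lat = np.linspace(-90, 90, 4)
lon = np.linspace(45, 360, 8)
data = np.zeros((4, 8), np.float32)  # stand-in for cube.data

# Boolean masks built from coordinate values rather than indices
lat_mask = (lat >= -45) & (lat <= 45)
lon_mask = (lon >= 160) & (lon <= 240)

# np.ix_ combines the two 1-D masks into a rectangular selection
data[np.ix_(lat_mask, lon_mask)] = 1
```

For these coordinates the masks select the same block as the manual slice [1:3, 3:5].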

seaborn line plots with date on the x axis

Hi,
I am trying to recreate some of the covid-19 charts that we have seen. I am using data from the Johns Hopkins database.
The data is arranged so that the city names are in the rows and the columns are dates. A screenshot of the csv file is attached. I want to plot line graphs in seaborn that have days on the x-axis and confirmed cases by city on the y-axis. For some reason, I am unable to reproduce the exponential curves of the death rate.
My code is:
'''loading the file'''
date_columns = list(range(12,123))
df_covid_us = pd.read_csv(covid_us_file, parse_dates=date_columns)
df_covid_us = pd.read_csv(covid_us_file)
'''slicing the columns needed. Province_State and the date columns'''
df = df_covid_us.iloc[:, np.r_[6, 12:123]]
df = df[df['Province_State']=='New York']
'''using df.melt'''
df2 =df.melt(id_vars='Province_State',var_name='Date',value_name='Deaths')
'''plotting using seaborn'''
sns.lineplot(x='Date',y='Deaths',data=df2, ci=None)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
plt.gca().xaxis.set_major_locator(mdates.DayLocator(interval=20))
plt.show()
With a small sample of made-up data:
import pandas as pd, seaborn as sns
import matplotlib.pyplot as plt, matplotlib.dates as mdates
df = pd.DataFrame({'Province_State':['American Samoa','Guam','Puerto Rico'],
'2020-01-22':[0,1,2],
'2020-01-23':[2,1,0]})
# to get dates in rows
date_columns = [c for c in df.columns.tolist() if c.startswith('2020-')]
df2 = df.melt(id_vars='Province_State',value_vars=date_columns,
var_name='Date',value_name='Deaths')
# dates from string to datetime
df2['Date'] = pd.to_datetime(df2['Date'])
sns.lineplot(x='Date',y='Deaths',hue='Province_State',data=df2)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
plt.gca().xaxis.set_major_locator(mdates.DayLocator(interval=1))
plt.show()
