Hi,
I am trying to recreate some of the COVID-19 charts that we have seen. I am using data from the Johns Hopkins database.
The data is arranged so that the city names are in the rows and the columns are dates. A screenshot of the csv file is attached. I want to plot line graphs in seaborn that have days on the x axis and confirmed cases by city on the y axis. For some reason, I am unable to reproduce the exponential curves of the death rate.
My code is:
'''loading the file'''
date_columns = list(range(12,123))
df_covid_us = pd.read_csv(covid_us_file, parse_dates=date_columns)
df_covid_us = pd.read_csv(covid_us_file)
'''slicing the columns needed. Province_State and the date columns'''
df = df_covid_us.iloc[:, np.r_[6, 12:123]]
df = df[df['Province_State']=='New York']
'''using df.melt'''
df2 =df.melt(id_vars='Province_State',var_name='Date',value_name='Deaths')
'''plotting using seaborn'''
sns.lineplot(x='Date',y='Deaths',data=df2, ci=None)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
plt.gca().xaxis.set_major_locator(mdates.DayLocator(interval=20))
plt.show()
With a small sample of made-up data:
import pandas as pd, seaborn as sns
import matplotlib.pyplot as plt, matplotlib.dates as mdates
df = pd.DataFrame({'Province_State': ['American Samoa', 'Guam', 'Puerto Rico'],
                   '2020-01-22': [0, 1, 2],
                   '2020-01-23': [2, 1, 0]})
# to get dates in rows: every column except the id column holds a date
date_columns = [c for c in df.columns if c != 'Province_State']
df2 = df.melt(id_vars='Province_State', value_vars=date_columns,
              var_name='Date', value_name='Deaths')
# dates from string to datetime
df2['Date'] = pd.to_datetime(df2['Date'])
sns.lineplot(x='Date',y='Deaths',hue='Province_State',data=df2)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
plt.gca().xaxis.set_major_locator(mdates.DayLocator(interval=1))
plt.show()
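If the real file has several rows per state (the Johns Hopkins US file lists one row per county, as far as I can tell), seaborn's lineplot averages those rows at each date, which flattens the curve. Below is a minimal sketch, assuming made-up county rows and `m/d/yy` style column names, that parses the dates and sums per state before plotting. The frame names `wide`, `long` and `by_state` are only illustrative.

```python
import pandas as pd

# Made-up wide data: one row per county, one column per date (m/d/yy strings).
wide = pd.DataFrame({
    'Province_State': ['New York', 'New York', 'Guam'],
    '3/1/20': [1, 2, 0],
    '3/2/20': [3, 5, 1],
})

# Wide to long: one row per (county row, date).
long = wide.melt(id_vars='Province_State', var_name='Date', value_name='Deaths')

# Parse the date strings so the x axis is a real time axis.
long['Date'] = pd.to_datetime(long['Date'], format='%m/%d/%y')

# Sum the county rows so each state has exactly one value per date.
by_state = long.groupby(['Province_State', 'Date'], as_index=False)['Deaths'].sum()
print(by_state)
```

`by_state` can then be passed to `sns.lineplot(x='Date', y='Deaths', hue='Province_State', data=by_state)`.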
# -*- coding: utf-8 -*-
"""
Created on Thu Feb 16 18:17:32 2023
#author: avnth
"""
import seaborn as sb
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import scale
from sklearn.metrics import silhouette_score
from sklearn.metrics import davies_bouldin_score
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler as sc
from mpl_toolkits import mplot3d
import plotly.express as px
dta=pd.read_csv("D:/XLRI/Term-4/ML/Assignment-2/Prpd_2.csv")
dta.head()
dta1=dta.drop("Cid",axis=1,inplace=False)
#dta1=dta1.iloc[:,1:4]
dta1=pd.DataFrame(dta1)
dta1.head()
dta1.describe()
dta1=pd.DataFrame(dta1)
dta1.describe()
ncl=[]
for i in range(1,15):
    kn=KMeans(n_clusters=i)
    kn.fit(dta1)
    ncl.append(kn.inertia_)
plt.plot(range(1,15),ncl)
#silhouette method
sil = []
for n in range(2,15):
    kn1=KMeans(n_clusters = n)
    kn1.fit(dta1)
    # labels = kn1.labels_
    sil.append(silhouette_score(dta1,kn1.labels_, metric = 'euclidean'))
plt.plot(range(2,15),sil)
#Davies Bouldin Index method
db = []
K1 = range(2,8)
for l in K1:
    kn2 = (KMeans(n_clusters = l) )
    kn2.fit(dta1)
    db.append(davies_bouldin_score(dta1,kn2.labels_))
plt.plot(range(2,8),db)
sa=sc()
sa.fit(dta1)
tdta1=sa.transform(dta1)
tdta1=pd.DataFrame(tdta1)
kmc=KMeans(n_clusters=6)
kmc.fit(tdta1)
clus=kmc.predict(tdta1)
dta["clus"]=clus
dta.head()
clus4=dta[dta.clus==4]
clus4.describe()
clus0=dta[dta.clus==0]
clus0.describe()
clus5=dta[dta.clus==5]
clus5.describe()
clus3=dta[dta.clus==3]
clus3.describe()
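# x and y are passed by keyword; recent seaborn versions no longer accept them positionally
sb.scatterplot(x="Recency",y="Frequency",data=dta,hue="clus")
sb.scatterplot(x="Frequency",y="Money",data=dta,hue="clus")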
# Creating dataset
z = dta.Recency
x = dta.Frequency
y = dta.Money
z.head()
x.head()
y.head()
# Creating figure
#fig = plt.figure()
#ax = fig.add_subplot(111,projection ="3d")
#dta=pd.DataFrame(dta)
#dta.head()
#for a in range(0,5):
# ax.scatter(dta.Frequency[dta.clus==a],dta.Recency[dta.clus==a],dta.Money[dta.clus==a],label=a,hue="clus")
#ax.legend()
#plt.title("simple 3D scatter plot")
#plt.show()
#df = px.data.iris()
#fig = px.scatter_3d(df, x='sepal_length', y='sepal_width', z='petal_width',color='petal_length',symbol='species')
#fig=plt.figure()
Hello Friends,
I am a newbie to Python, just learning. I have taken a dataset and clustered it. Now I want to plot it in a 3D scatter plot with a 4th dimension, my cluster, as the color. Each cluster should keep a single color. So a data point will be plotted by its x, y, z attributes, but its color will be based on the 4th column, which is my cluster number. I know how to do it in 2D with hue, but I am unable to find a similar thing for a 3D plot. Any help will be appreciated. Attaching my code too.
I tried many libraries from online tutorials but I am not getting exactly what I am looking for. I have attached a sample of how I want it to be plotted. The sample is taken from plotly.com and is just an illustration of how I want to plot.
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
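pts = ax.scatter(z,x,y, marker=".", c=dta["clus"], s=50, cmap="RdBu")
# one legend entry per cluster value, built from the scatter's colour mapping
ax.legend(*pts.legend_elements(), title="clus")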
plt.title("4D scatterplot")
ax.set_xlabel("Recency")
ax.set_ylabel("Frequency")
ax.set_zlabel("Money")
plt.show()
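An alternative that gives each cluster its own discrete color and legend entry is to call scatter once per cluster. This is only a sketch on made-up random data; the arrays `recency`, `frequency`, `money` and `clus` stand in for the columns of `dta` and are assumptions, not the real dataset.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt

# Made-up stand-ins for the Recency / Frequency / Money columns and cluster labels.
rng = np.random.default_rng(0)
n = 60
recency = rng.random(n)
frequency = rng.random(n)
money = rng.random(n)
clus = rng.integers(0, 3, size=n)

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")

# One scatter call per cluster: matplotlib assigns a new colour to each call,
# and the label becomes that cluster's legend entry.
for label in np.unique(clus):
    mask = clus == label
    ax.scatter(recency[mask], frequency[mask], money[mask],
               s=20, label=f"cluster {label}")

ax.set_xlabel("Recency")
ax.set_ylabel("Frequency")
ax.set_zlabel("Money")
ax.legend()
```

With the real data the same loop would filter on `dta.clus == label`.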
I am trying to create visualizations for recent commonwealth medal tally dataset.
I would like to create a grouped bar chart of top ten countries by total number of medals won.
Y axis = total
x axis = Country name
How can I divide the totals into three bars showing the number of
gold, silver and bronze medals won by each country?
I created one using Excel, but don't know how to do it using seaborn.
P.S. I have already tried using a list of columns for hue.
df_10 = df.head(10)
sns.barplot(data = df_10, x = 'team' , y = 'total' , hue = df_10[["gold" ,
"silver","bronze"]].apply(tuple , axis = 1) )
Here is the chart that I created using Excel: [image]
To plot the graph, you need to reshape the dataframe into a format that is easy to plot. One way to do this is DataFrame.melt(). The approach you tried creates one hue level per (gold, silver, bronze) combination rather than one per medal type, so it will not give grouped bars. Once the data is in long format, plotting becomes simple. As you have not provided the format of df_10, I have assumed the data has 4 columns: Country, Gold, Silver and Bronze. Below is the code.
## Use melt using Country as ID and G, S, B as the rows for values
df_10 = pd.melt(df_10, id_vars=['Country'], value_vars=['Gold', 'Silver', 'Bronze'])
df_10.rename(columns={'value':'Count', 'variable':'Medals'}, inplace=True) ##Rename so the plot has informative texts
fig, ax=plt.subplots(figsize=(12, 7)) ## Set figure size
ax=sns.barplot(data=df_10, x='Country', y='Count', hue='Medals') ## Plot the graph
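If seaborn is not a hard requirement, pandas' own plotting gives the same grouped bars directly from the wide table, without melting. This is a sketch that swaps in `DataFrame.plot.bar` for `sns.barplot`; the `medals` frame and its numbers are made up, and the column names follow the assumption above (Country, Gold, Silver, Bronze).

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt

# Made-up medal table in wide format.
medals = pd.DataFrame({
    'Country': ['A', 'B', 'C'],
    'Gold': [10, 7, 3],
    'Silver': [8, 9, 4],
    'Bronze': [5, 6, 7],
})

# Each selected column becomes one bar per country; the index supplies the x labels.
ax = medals.set_index('Country')[['Gold', 'Silver', 'Bronze']].plot.bar(
    figsize=(12, 7), rot=0)
ax.set_ylabel('Count')
ax.legend(title='Medals')
```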
I have a GeoDataFrame ("mhg") whose index holds months (i.e. "2019-01-01", "2019-02-01", ...). It has one column with the geometry of certain regions (i.e. POLYGON(...)) and another column with the population of that geometry in that month.
Sample data (with only two months) could be created by:
import geopandas as gpd
data = [['2019-01-01', 'POLYGON(123...)', 1000], ['2019-01-01', 'POLYGON(456...)', 1500], ['2019-01-01', 'POLYGON(789...)', 1400], ['2019-02-01', 'POLYGON(123...)', 1100], ['2019-02-01', 'POLYGON(456...)', 1600], ['2019-02-01', 'POLYGON(789...)', 1300]]
mhg = gpd.GeoDataFrame(data, columns=['month','geometry', 'population'])
mhg = mhg.set_index('month')  # set_index returns a new frame, so assign it back
I can make a multicolor plot of the users living in each region (all periods) with:
mhg.plot(column='population',cmap='jet')
and I can make the same, but filtering by month, using:
mhg.loc['2019-01-01'].plot(column='population',cmap='jet')
I would like to get some kind of "animation" or animated GIF where I can see the temporal evolution of the population, using this kind of pseudocode:
for all the plots in
mhg.loc['2019-01-01'].plot(column='population',cmap='jet')
mhg.loc['2019-02-01'].plot(column='population',cmap='jet')
mhg.loc['2019-03-01'].plot(column='population',cmap='jet')
...
then merge all plots into 1 animated gif
But I don't know how to do it: the number of months can be up to hundreds, I don't know how to write the for loop, and I don't even know how to start...
Any suggestions?
EDIT: I tried the following (following https://linuxtut.com/en/c089c549df4d4a6d815c/):
months = np.sort(np.unique(mhg.month.values))
from matplotlib.animation import FuncAnimation
from matplotlib.animation import PillowWriter
fig, ax = plt.subplots()
ims = []
def update_fig(month):
    if len(ims) > 0:
        ims[0].remove()
        del ims[0]
    geos = mhg['geometry'].values
    users = mhg[(mhg.month==month)].population
    apl = gpd.plotting.plot_polygon_collection(ax, geos, population, True, cmap='jet')
    ims.append(apl)
    ax.set_title('month = ' + str(month))
    return ims
anim = FuncAnimation(fig, update_fig, interval=1000, repeat_delay=3000, frames=months)
plt.show()
But I got a UserWarning: animation was deleted without rendering anything...
So I am stuck again.
I managed to do it this way:
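import matplotlib.colors
import matplotlib.pyplot as plt
mhg = mhg.reset_index()
groups = mhg.groupby('month')
for month, grp in groups:
    # 'users' is the value column in my real data ('population' in the sample above);
    # the shared LogNorm keeps the colour scale identical across months
    grp.plot(column='users',cmap='jet',legend=True,figsize=(10, 10),norm=matplotlib.colors.LogNorm(vmin=mhg.users.min(), vmax=mhg.users.max()))
    plt.title(month)
    plt.xlim([-20, 5])
    plt.ylim([25, 45])
    plt.savefig("plot{month}.png".format(month=month), facecolor='white')
    plt.close()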
And then I joined all the PNGs with convert (ImageMagick tool):
convert -delay 50 -loop 0 *.png animation.gif
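The joining step can also be done from Python with Pillow instead of ImageMagick. The sketch below is an assumption-laden stand-in: it draws simple bar charts with made-up values in place of the per-month maps, writes them to a temporary folder, and joins them with `Image.save(..., save_all=True)`. The 500 ms frame duration corresponds to `-delay 50`.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
from PIL import Image

out_dir = tempfile.mkdtemp()
months = ["2019-01-01", "2019-02-01", "2019-03-01"]  # hypothetical frame labels

# Stand-in for the per-month maps: one PNG per month.
paths = []
for i, month in enumerate(months):
    fig, ax = plt.subplots()
    ax.bar(["a", "b", "c"], [1 + i, 2, 3 - i])  # made-up values
    ax.set_title(month)
    path = os.path.join(out_dir, f"plot{month}.png")
    fig.savefig(path, facecolor="white")
    plt.close(fig)
    paths.append(path)

# Join the frames in order: 500 ms per frame, loop=0 repeats forever.
frames = [Image.open(p).convert("RGB") for p in paths]
gif_path = os.path.join(out_dir, "animation.gif")
frames[0].save(gif_path, save_all=True, append_images=frames[1:],
               duration=500, loop=0)
```

Building `paths` in the plotting loop, rather than globbing `*.png`, keeps the frame order explicit.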
This is my code:
import pandas as pd
import geoplot as gplt
import geopandas as gpd
import geoplot.crs as gcrs
import contextily
import matplotlib.pyplot as plt
from shapely import geometry
df = pd.read_csv('dataframe_master.csv', index_col='id')
crs = {'init': 'epsg:4326'}
geometry = [geometry.Point(xy) for xy in zip(df['latitude'], df['longitude'])]
df_geo = gpd.GeoDataFrame(df, crs=crs, geometry=geometry)
test = df_geo[:200000]
test = test.to_crs(epsg=3857)
ax = test.plot(marker='o', markersize=1)
contextily.add_basemap(ax)
plt.show()
And it generates this image, which doesn't show a background map and seems a little distorted.
My coordinate data was originally made with the RD-coordinaten standard (EPSG:28992), which I converted to EPSG:4326 with this code:
from pyproj import Proj, transform
lon_l = []
lat_l = []
p1 = Proj(init='epsg:28992')
p2 = Proj(proj='latlong',datum='WGS84')
for row in range(len(df)):
    lon, lat, z = transform(p1, p2, df.iloc[row, 7], df.iloc[row, 8], 0.0)
    lon_l.append(lon)
    lat_l.append(lat)
I did a sanity check on the longitude latitude output by comparing to some online converters, and the output points to the correct locations.
I tried following this solution: https://gis.stackexchange.com/questions/348339/using-crs-epsg3857-but-misalignment-between-stamen-background-and-coordinates-o in case my conversion was missing the "towgs84" part, but the image still looked the same, only with a slightly different colour.
I figured it out! I should've listed longitude before latitude when building the geometry, because a Point takes (x, y), i.e. (longitude, latitude).
geometry = [geometry.Point(xy) for xy in zip(df['longitude'], df['latitude'])]
I am making a scatter plot in seaborn and I want to add text to each point according to my data (the "Countries" column in the hap_educ and hap_rel tables). I think I need a loop to do this but cannot figure out how to do it in seaborn. Here is the code I use:
https://ibb.co/hZ9NBV0
https://ibb.co/ZYLdgkt
import pandas as pd
import os
import seaborn as sns
import matplotlib.pyplot as plt
# Set up working directory
os.chdir(r'D:/PROJECT CSS/')
#importing data from xlsx files
educ = pd.read_excel(r'D:\PROJECT CSS\educ.xlsx')
happiness= pd.read_excel(r'D:\PROJECT CSS\happiness edited.xlsx')
religious=pd.read_excel(r'D:\PROJECT CSS\religious edited.xlsx')
#Merging data into 2 tables
hap_rel = pd.merge(religious, happiness, on ='Country')
hap_educ= pd.merge(educ, happiness, on ='Country')
p1=sns.regplot(x =hap_educ['Score'], y =hap_educ['Pupil teacher ratio'], data=hap_educ, label='Countries')
plt.xlabel("Index of happiness")
plt.ylabel("Pupil / teacher ratio")
p2=sns.regplot(x=hap_rel['Score'], y=hap_rel['Yes'], data=hap_rel)
plt.xlabel("Index of happiness")
plt.ylabel("Percent of religious people(1=100%)")
I expect each point to be annotated with the country name from my table.
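One possible approach is a loop over the rows that calls `annotate` on the Axes. The sketch below uses plain matplotlib and a made-up `hap_educ` frame (the country names and values are invented, and the label column is assumed to be called 'Country', as in the merge). Since `sns.regplot` returns a matplotlib Axes, the same loop should work on `p1` or `p2`.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt

# Made-up stand-in for the merged hap_educ table.
hap_educ = pd.DataFrame({
    'Country': ['Aland', 'Borduria', 'Syldavia'],
    'Score': [7.1, 5.4, 6.2],
    'Pupil teacher ratio': [12.0, 25.5, 18.3],
})

fig, ax = plt.subplots()
ax.scatter(hap_educ['Score'], hap_educ['Pupil teacher ratio'])

# Put each country's name next to its point, shifted 4 pt up and to the right.
for _, row in hap_educ.iterrows():
    ax.annotate(row['Country'],
                (row['Score'], row['Pupil teacher ratio']),
                xytext=(4, 4), textcoords='offset points')

ax.set_xlabel("Index of happiness")
ax.set_ylabel("Pupil / teacher ratio")
```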