I am trying to create visualizations for the recent Commonwealth medal tally dataset.
I would like to create a grouped bar chart of the top ten countries by total number of medals won.
Y axis = total
X axis = country name
How can I split each total into three bars showing the number of gold, silver, and bronze medals won by each country?
I created one using Excel, but I don't know how to do it using seaborn.
P.S. I have already tried using a list of columns for hue:
df_10 = df.head(10)
sns.barplot(data = df_10, x = 'team' , y = 'total' , hue = df_10[["gold" ,
"silver","bronze"]].apply(tuple , axis = 1) )
Here is the chart that I created using Excel:
To plot the graph, you need to reshape the dataframe into a long format that seaborn can plot easily. One way to do this is with pandas.melt() (also available as DataFrame.melt()). The approach you tried is unlikely to give grouped bars, because hue expects a single categorical variable rather than several value columns. Once the data has one row per country and medal type, the plotting itself is simple. As you have not provided the format of df_10, I have assumed the data has 4 columns: Country, Gold, Silver and Bronze. Below is the code.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Melt using Country as the ID and Gold, Silver, Bronze as the value columns
df_10 = pd.melt(df_10, id_vars=['Country'], value_vars=['Gold', 'Silver', 'Bronze'])
df_10 = df_10.rename(columns={'value': 'Count', 'variable': 'Medals'})  ## Rename so the plot has informative labels
fig, ax = plt.subplots(figsize=(12, 7))  ## Set figure size
sns.barplot(data=df_10, x='Country', y='Count', hue='Medals', ax=ax)  ## Plot the grouped bars
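For readers without the original data, here is a minimal, self-contained sketch of the reshape step. The country names and medal counts are made-up placeholder values, the column names follow the assumption stated above (Country, Gold, Silver, Bronze), and plot_medals is a hypothetical helper name, not part of seaborn.

```python
import pandas as pd

# Placeholder data: hypothetical counts, not real results
wide = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Gold": [10, 7, 3],
    "Silver": [8, 9, 4],
    "Bronze": [5, 6, 2],
})

# Wide -> long: one row per (Country, medal type) pair
long_df = wide.melt(
    id_vars="Country",
    value_vars=["Gold", "Silver", "Bronze"],
    var_name="Medals",
    value_name="Count",
)


def plot_medals(df):
    """Draw grouped bars: one group per country, one bar per medal type."""
    # Imported here so the reshape above can be run without seaborn installed
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(12, 7))
    sns.barplot(data=df, x="Country", y="Count", hue="Medals", ax=ax)
    return ax
```

Calling plot_medals(long_df) and then plt.show() should draw three bars per country. Passing var_name and value_name to melt avoids the separate rename step.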
My dataframe has a column 'rideable_type' which has 3 unique values:
1.classic_bike
2.docked_bike
3.electric_bike
While plotting a barplot using the following code:
g = sns.FacetGrid(electric_casual_type_week, col='member_casual', hue='rideable_type', height=7, aspect=0.65)
g.map(sns.barplot, 'day_of_week', 'number_of_rides').add_legend()
I only get a plot showing 2 unique 'rideable_type' values.
Here is the plot:
As you can see only 'electric_bike' and 'classic_bike' are seen and not 'docked_bike'.
The main problem is that all the bars are drawn on top of each other, so the bars of one hue value hide those of another. Seaborn's barplots don't easily support stacked bars. Also, this way of creating the barplot doesn't support the default "dodging": barplot is called separately for each hue value, whereas dodging only works when a single call sees all hue values.
Therefore, the recommended way is to use catplot, a special version of FacetGrid for categorical plots.
g = sns.catplot(kind='bar', data=electric_casual_type_week, x='day_of_week', y='number_of_rides',
col='member_casual', hue='rideable_type', height=7, aspect=0.65)
Here is an example using Seaborn's 'tips' dataset:
import seaborn as sns
tips = sns.load_dataset('tips')
g = sns.FacetGrid(data=tips, col='time', hue='sex', height=7, aspect=0.65)
g.map_dataframe(sns.barplot, x='day', y='total_bill')
g.add_legend()
When comparing with sns.catplot, the coinciding bars are clear:
g = sns.catplot(kind='bar', data=tips, x='day', y='total_bill', col='time', hue='sex', height=7, aspect=0.65)
I plotted a scatterplot with the seaborn library and I want to change the legend text, but I don't know how to do that.
Example:
The following uses the iris dataset, with the Species column encoded as 0/1/2.
plt.figure(figsize=(8,8))
pl = sns.scatterplot(x='petal_length', y ='petal_width', hue='Species', data=data, s=40,
palette='Set1', legend='full')
I want to change the legend text from [0, 1, 2] to ['setosa', 'versicolor', 'virginica'].
Can anybody help?
First, Seaborn (and Matplotlib) usually pick up the legend labels for hue from the unique values of the array you provide as hue. So as a first step, check whether the column Species in your dataframe actually contains the values "setosa", "versicolor", "virginica". If not, one solution is to map the codes to the names temporarily, just for plotting:
legend_map = {0: 'setosa',
              1: 'versicolor',
              2: 'virginica'}
plt.figure(figsize=(8,8))
ax = sns.scatterplot(x=data['petal_length'], y=data['petal_width'], hue=data['Species'].map(legend_map),
                     s=40, palette='Set1', legend='full')
plt.show()
Alternatively, if you want to manipulate the plot directly rather than the underlying data, you can do so by editing the legend texts:
plt.figure(figsize=(8,8))
ax = sns.scatterplot(x='petal_length', y='petal_width', hue='Species', data=data, s=40,
                     palette='Set1', legend='full')
l = ax.legend()
l.get_texts()[0].set_text('Species') # You can also change the legend title
l.get_texts()[1].set_text('Setosa')
l.get_texts()[2].set_text('Versicolor')
l.get_texts()[3].set_text('Virginica')
plt.show()
This methodology allows you to also change the legend title, if need be.
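Editing legend texts by position depends on whether the legend title is stored as the first text entry, which I believe varies between seaborn versions. A variant that does not rely on positions is to rebuild the legend from its own handles and swap only the labels. The sketch below uses plain Matplotlib with made-up placeholder points so that it runs without the iris data; the same calls should apply to the Axes returned by sns.scatterplot, but that is an assumption and is not tested here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Maps the encoded labels shown in the legend to readable names
legend_map = {"0": "setosa", "1": "versicolor", "2": "virginica"}

# Placeholder points: hypothetical values standing in for the iris columns
points = {
    "0": ([1.4, 1.5], [0.2, 0.3]),
    "1": ([4.5, 4.7], [1.4, 1.5]),
    "2": ([5.8, 6.0], [2.1, 2.3]),
}

fig, ax = plt.subplots(figsize=(8, 8))
for code, (xs, ys) in points.items():
    ax.scatter(xs, ys, s=40, label=code)

# Rebuild the legend from the existing handles, swapping only the labels.
# Labels that are not in legend_map are left unchanged.
handles, labels = ax.get_legend_handles_labels()
new_labels = [legend_map.get(label, label) for label in labels]
legend = ax.legend(handles, new_labels, title="Species")

legend_texts = [text.get_text() for text in legend.get_texts()]
```

Because the title is passed explicitly, it does not matter how the original legend stored it.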
I have created my pivot plot and am now looking to restrict it to the last few days only.
I have not been able to do this. Can someone help me with it?
In the plot function, use show_last = x,
where x is the number of bars to show, counting back from the latest bar.
For example to plot the pivots on the last 10 bars use:
plot(series = pivots, title = "Pivots", show_last = 10)
I am working on a Tableau stacked bar chart.
The chart shows percent of total, so all the bars have the same length.
Now I would like to sort the dimension (referee) based on the values of the legend categories (highest to lowest).
Can anyone suggest how to do it?
I have also attached the packaged workbook here.
Here is a picture of the sort screen:
Level of the data source below:
Below is the screenshot based on the final answer provided:
Thanks,
Zep
To get this, you first need a calc field that computes the win %:
SUM(IF [FTR] = 'AWins' OR [FTR] = 'Hwins' THEN 1 END)/COUNTD([Game ID])
This can then be used to rank the referees:
The reason your technique may not be working is that you're sorting on COUNTD(Wins), which is the total number of wins, not the win percentage for the ref. So a ref who has simply had more games may come up higher in the rank.
Now that you have the calc field, you can go back to your report and sort on the new field:
I rearranged the legend so you can see that the refs with the best win % are shown first (red and blue bars).
If you don't want it sorted by win %, then change the calc field to:
SUM(IF [FTR] = 'AWins' OR [FTR] = 'Hwins' THEN 1 END)
For the COUNTD of games, if you only have the date and the game available and want to create a unique ID from them, create a calc field like this:
game-date-id = STR([game]) + ' ' + STR([date])
This will then be used in your COUNTD if statement:
SUM(IF [FTR] = 'AWins' OR [FTR] = 'Hwins' THEN 1 END)/COUNTD([game-date-id])
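To see why sorting by win count and sorting by win % can give different orders, here is a small sketch of the same arithmetic in Python. It is not Tableau code; the referee names, game IDs, and the 'Draw' value are hypothetical placeholders, and only 'AWins' and 'Hwins' come from the calc field above.

```python
from collections import defaultdict

# Hypothetical rows: (referee, game_id, FTR)
rows = [
    ("Ref A", 1, "Hwins"),
    ("Ref A", 2, "Draw"),
    ("Ref A", 3, "AWins"),
    ("Ref A", 4, "Hwins"),
    ("Ref A", 5, "Draw"),
    ("Ref B", 6, "Hwins"),
    ("Ref B", 7, "AWins"),
]

wins = defaultdict(int)
games = defaultdict(set)
for referee, game_id, ftr in rows:
    games[referee].add(game_id)        # plays the role of COUNTD([Game ID])
    if ftr in ("AWins", "Hwins"):      # plays the role of IF ... THEN 1 END
        wins[referee] += 1

# Win % per referee: wins divided by distinct games
win_pct = {ref: wins[ref] / len(games[ref]) for ref in games}

# Two possible sort orders, highest first
by_count = sorted(games, key=lambda ref: wins[ref], reverse=True)
by_pct = sorted(games, key=lambda ref: win_pct[ref], reverse=True)
```

With these placeholder rows, Ref A leads on raw count (3 wins in 5 games) while Ref B leads on percentage (2 wins in 2 games), which is the effect described above.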
I have attached the picture of the dashboard.
I want to sort the referees based on Hwin.
Yeah, it did not work out as expected.
I'm not sure how to do this. I have the following data:
Date, Country, QuantityA, QuantityB.
I want to make a timeline chart of the ratio between QuantityA and QuantityB. I also want to create a bar chart by Country, which will show the ratio for every country.
The problem is that the ratios are not additive, so if I do this:
var timeDim = ndx.dimension(function(d) { return d.Date; });
var ratioAB = timeDim.group().reduceSum(function(d) { return d.QuantityA / d.QuantityB; });
This will return the ratios for every country separately and will add them up. What I want is to add up QuantityA and QuantityB and then do the ratio.
Thus, the timeline chart will only show the right ratio if I filter in one of the countries.
Is there a way to add both the country and the date as a dimension?
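You can create a custom grouping (crossfilter's group.reduce with add, remove, and initial functions) that tracks the sum of QuantityA and the sum of QuantityB, and derive the ratio from those two sums. Or you could just create two sum groups, one summing QuantityA and the other QuantityB, and then calculate the ratio when you build the visualization. For the bar chart, do the same on a separate Country dimension; filters on one crossfilter dimension already apply to groups on the others, so you don't need a combined country-and-date dimension.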
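The difference between adding up ratios and taking the ratio of sums can be checked with a few lines of arithmetic. The sketch below is in Python rather than crossfilter code, and the records are hypothetical placeholder values; ratio_by is an illustrative helper name.

```python
from collections import defaultdict

# Hypothetical records: (Date, Country, QuantityA, QuantityB)
records = [
    ("2020-01-01", "X", 10, 100),
    ("2020-01-01", "Y", 30, 60),
    ("2020-01-02", "X", 5, 50),
]


def ratio_by(rows, key_index):
    """Sum QuantityA and QuantityB per key, then divide the two sums."""
    sums = defaultdict(lambda: [0, 0])
    for row in rows:
        key = row[key_index]
        sums[key][0] += row[2]  # running sum of QuantityA
        sums[key][1] += row[3]  # running sum of QuantityB
    # Guard against a zero denominator by returning None for that key
    return {key: (a / b if b else None) for key, (a, b) in sums.items()}


ratio_by_date = ratio_by(records, 0)
ratio_by_country = ratio_by(records, 1)

# What reduceSum on QuantityA / QuantityB computes for the first date:
# the per-row ratios added together
sum_of_ratios = sum(a / b for date, _, a, b in records if date == "2020-01-01")
```

For the first date, adding the per-country ratios gives 0.1 + 0.5, while the ratio of the sums is 40 / 160, so the two approaches only agree when a single country is selected.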