Here is my code:
order = sorted(subway_df.conds.unique())
g0 = sns.boxplot(subway_df.ENTRIESn_hourly, groupby = subway_df.conds, data=subway_df, palette = "Set2", order = order )
g0.set(yscale = 'symlog')
g0.set_xlabel('conds', fontdict={'fontsize' : 14})
g0.set_ylabel('ENTRIESn_hourly', fontdict={'fontsize' : 14})
fig.suptitle('Hourly entries at various weather conditions', fontdict={'fontsize' : 16})
and the following graph:
I would like to find a way to sort the appearance of boxes by mean or variance as opposed to alphabetical order. Thanks!
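One possible approach, sketched under assumptions: compute the per-group statistic with pandas, sort it, and pass the resulting labels as `order`. The helper names `order_by_stat` and `plot_sorted_boxes` and the toy data are hypothetical, and the plotting call uses the `data=`/`x=`/`y=` form of `sns.boxplot` rather than the `groupby=` form in the question.

```python
import pandas as pd


def order_by_stat(df, group_col, value_col, stat="mean", ascending=True):
    """Return the group labels sorted by a summary statistic of value_col."""
    summary = df.groupby(group_col)[value_col].agg(stat)
    return summary.sort_values(ascending=ascending).index.tolist()


def plot_sorted_boxes(df, group_col, value_col, stat="mean"):
    """Draw a box plot whose boxes follow the order computed above."""
    # Imported lazily so the ordering helper can be used without plotting.
    import matplotlib.pyplot as plt
    import seaborn as sns

    order = order_by_stat(df, group_col, value_col, stat)
    fig, ax = plt.subplots()
    sns.boxplot(data=df, x=group_col, y=value_col, order=order, ax=ax)
    ax.set_yscale("symlog")
    return fig, ax


# Toy stand-in for subway_df (values are invented for illustration only).
toy = pd.DataFrame({
    "conds": ["Rain", "Rain", "Clear", "Clear", "Fog", "Fog"],
    "ENTRIESn_hourly": [19, 21, 100, 300, 0, 30],
})
mean_order = order_by_stat(toy, "conds", "ENTRIESn_hourly", "mean")
var_order = order_by_stat(toy, "conds", "ENTRIESn_hourly", "var")
```

With the real data, something like `plot_sorted_boxes(subway_df, "conds", "ENTRIESn_hourly", stat="var")` should then draw the boxes from lowest to highest variance.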
I am trying to create visualizations for a recent Commonwealth medal tally dataset.
I would like to create a grouped bar chart of the top ten countries by total number of medals won.
Y axis = total
X axis = country name
How can I split each country's total into three bars showing the number of
gold, silver, and bronze medals it won?
I created one in Excel, but I don't know how to do it with seaborn.
P.S. I have already tried passing a list of columns as hue.
df_10 = df.head(10)
sns.barplot(data = df_10, x = 'team' , y = 'total' , hue = df_10[["gold" ,
"silver","bronze"]].apply(tuple , axis = 1) )
Here is the chart that I created using Excel:
To plot the graph, first reshape the dataframe into a long format that seaborn can plot directly. One way to do this is DataFrame.melt(). The approach you tried is unlikely to work, because hue expects one categorical value per row, not a tuple of three counts. Once the data has one row per country and medal type, plotting becomes simple. As you have not shown the format of df_10, I have assumed it has 4 columns: Country, Gold, Silver and Bronze. The code is below.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

## Melt with Country as the ID column; Gold, Silver, Bronze become rows of values
df_10 = pd.melt(df_10, id_vars=['Country'], value_vars=['Gold', 'Silver', 'Bronze'])
## Rename so the axis label and legend title are informative
df_10.rename(columns={'value':'Count', 'variable':'Medals'}, inplace=True)
fig, ax = plt.subplots(figsize=(12, 7)) ## Set figure size
ax = sns.barplot(data=df_10, x='Country', y='Count', hue='Medals', ax=ax) ## Plot the graph
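To make the reshaping step concrete, here is a small sketch on an invented three-country table. The column names follow the assumption in the answer above, the `Total` column and all numbers are made up, and `nlargest` stands in for `head(10)` in case the real table is not already sorted by total.

```python
import pandas as pd

# Hypothetical medal table; the numbers are invented for illustration.
wide = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Gold": [3, 1, 0],
    "Silver": [2, 4, 1],
    "Bronze": [0, 5, 1],
})
wide["Total"] = wide[["Gold", "Silver", "Bronze"]].sum(axis=1)

# Keep the top countries by total (2 here; 10 in the question).
top = wide.nlargest(2, "Total")

# Wide -> long: one row per (Country, medal type) pair.
long = top.melt(
    id_vars="Country",
    value_vars=["Gold", "Silver", "Bronze"],
    var_name="Medals",
    value_name="Count",
)
print(long)
```

Each country now appears once per medal type, which is the layout that `hue='Medals'` needs in order to draw three bars per country.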
My dataframe has a column 'rideable_type' which has 3 unique values:
1.classic_bike
2.docked_bike
3.electric_bike
While plotting a barplot using the following code:
g = sns.FacetGrid(electric_casual_type_week, col='member_casual', hue='rideable_type', height=7, aspect=0.65)
g.map(sns.barplot, 'day_of_week', 'number_of_rides').add_legend()
I only get a plot showing 2 unique 'rideable_type' values.
Here is the plot:
As you can see only 'electric_bike' and 'classic_bike' are seen and not 'docked_bike'.
The main problem is that all the bars are drawn on top of each other, so one rideable_type can be completely hidden behind another. Seaborn's barplots don't easily support stacked bars. Also, creating the barplot this way doesn't support the default "dodging": FacetGrid calls barplot separately for each hue value, whereas dodging only works when barplot receives all hue values in a single call.
Therefore, the recommended way is to use catplot, a special version of FacetGrid for categorical plots.
g = sns.catplot(kind='bar', data=electric_casual_type_week, x='day_of_week', y='number_of_rides',
col='member_casual', hue='rideable_type', height=7, aspect=0.65)
Here is an example using Seaborn's 'tips' dataset:
import seaborn as sns
tips = sns.load_dataset('tips')
g = sns.FacetGrid(data=tips, col='time', hue='sex', height=7, aspect=0.65)
g.map_dataframe(sns.barplot, x='day', y='total_bill')
g.add_legend()
Comparing with sns.catplot makes the overlapping bars obvious:
g = sns.catplot(kind='bar', data=tips, x='day', y='total_bill', col='time', hue='sex', height=7, aspect=0.65)
I'm trying to plot 2 categorical variables and 1 numerical variable in separate boxplots using FacetGrid. I need the boxes to be colored differently for each gender.
Here is the code I tried:
box_gender_order= pisa['Gender'].value_counts().index
g = sb.FacetGrid(data = pisa, col = 'Country', height = 3, col_order= high_math_score.head(3).index,palette= sb.color_palette(['blue','orange']),
col_wrap= 3, margin_titles = True)
g.map(sb.boxplot, 'Gender', 'Avg Math Score', order= box_gender_order);
The box plots are drawn correctly, but they do not show any difference in color.
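A likely cause is that FacetGrid only applies `palette` to the levels of a `hue` variable, and no `hue` is set here. One way around it, sketched under assumptions, is to let `sns.catplot` handle both the faceting and the coloring. The helper names `gender_palette` and `plot_scores_by_country` are hypothetical; the column names are taken from the question.

```python
def gender_palette(levels, colors=("blue", "orange")):
    """Map each category level to a fixed color so every facet agrees."""
    if len(levels) > len(colors):
        raise ValueError("need at least one color per level")
    return dict(zip(levels, colors))


def plot_scores_by_country(df, countries, levels):
    """Faceted box plots, one panel per country, colored by Gender."""
    # Imported lazily so the palette helper works without seaborn installed.
    import seaborn as sns

    return sns.catplot(
        data=df, kind="box",
        x="Gender", y="Avg Math Score",
        hue="Gender", palette=gender_palette(levels),
        order=levels, hue_order=levels,
        dodge=False,  # hue repeats x, so the boxes should not be shifted
        col="Country", col_order=countries, col_wrap=3,
        height=3,
    )
```

With the question's variables this would be called as `plot_scores_by_country(pisa, high_math_score.head(3).index, list(box_gender_order))`.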
My class is defined in "unscaled.BL_yFYield_CSUSHPINSA" (basically 1: up, 0: down). I want to color the scatterplot by class, the way this example highlights species with 3 colors (note that I have reduced my example to two colors).
http://www.sthda.com/english/wiki/scatter-plot-matrices-r-base-graphs
This image specifically is what I'm trying to achieve (coloring based on my_cols and a categorical variable). In the iris example, I only saw two species (when I iterated over iris$species), but the online code uses 3 colors in the graph, so I'm not sure how that works.
In my example I have two colors for two classes (though eventually I want to extend this to more than 2 classes).
For example, suppose BL_yFYield_CSUSHPINSA took the categorical values 0, 1, 2 and I had 3 colors defined in my_cols.
Right now, when I plot the output, this is what I get:
pre_MyData <- read.csv(file="https://raw.githubusercontent.com/thistleknot/FredAPIR/master/reduced.csv", header=TRUE, sep=",")
MyData <- pre_MyData[,11:18]
my_cols <- c("#00AFBB", "#E7B800")
pairs(MyData[,1:8], pch = 19, cex = 0.5,
col = my_cols[MyData$unscaled.BL_yFYield_CSUSHPINSA],
lower.panel = NULL)
I thought about it, and the answer was in my screenshot. my_cols is skipping the values with 0 in BL_yfield... (treating them as null): R vectors are indexed from 1, so an index of 0 selects nothing. I could try to fix it after the fact, or I could add 1 to the values in my original dataset to remove the 0's...
Problem solved:
pre_MyData <- read.csv(file="https://raw.githubusercontent.com/thistleknot/FredAPIR/master/reduced.csv", header=TRUE, sep=",")
MyData <- pre_MyData[,11:18]
my_cols <- c("#00AFBB", "#E7B800")
pairs(MyData[,1:8], pch = 19, cex = 0.5,
col = my_cols[MyData$unscaled.BL_yFYield_CSUSHPINSA+1],
lower.panel = NULL)
I'm using Bokeh to plot the results of ~700 simulations against another set of results using a scatter plot. I'd like to use the hover tool to qualitatively determine patterns in the data by assigning a custom index that identifies the simulation parameters.
In the code below, x and y are the columns from a Pandas DataFrame which has the simulation IDs for the index. I've been able to assign this index to an array using <DataFrameName>.index.values but I haven't found any documentation on how to assign an index to the hover tool.
# Bokeh Plotting
h = 500
w = 500
default_tools = "pan, box_zoom, resize, wheel_zoom, save, reset"
custom_tools = ", hover"
fig = bp.figure(x_range=xr, y_range=yr, plot_width=w, plot_height=h, tools=default_tools+custom_tools)
fig.x(x, y, size=5, color="red", alpha=1)
bp.show(fig)
The documentation for configuring the hover tool has an example of how to do this that worked for me. Here's the code I used:
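from bokeh.models import ColumnDataSource, HoverTool

# Put the plotted columns and the simulation IDs in one data source
cds = ColumnDataSource(
    data=dict(
        x=xdata,
        y=ydata,
        desc=sim
    )
)

hover = HoverTool()
# "$" fields are built-in hover values; "@" fields refer to columns of the data source
hover.tooltips = [
    ("Index", "$index"),
    ("(2z,1z)", "($x, $y)"),
    ("ID", "@desc")
]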