How to display custom index using Bokeh hover tool? - scatter-plot

I'm using Bokeh to plot the results of ~700 simulations against another set of results using a scatter plot. I'd like to use the hover tool to qualitatively determine patterns in the data by assigning a custom index that identifies the simulation parameters.
In the code below, x and y are the columns from a Pandas DataFrame which has the simulation IDs for the index. I've been able to assign this index to an array using <DataFrameName>.index.values but I haven't found any documentation on how to assign an index to the hover tool.
# Bokeh Plotting
h = 500
w = 500
default_tools = "pan, box_zoom, resize, wheel_zoom, save, reset"
custom_tools = ", hover"
fig = bp.figure(x_range=xr, y_range=yr, plot_width=w, plot_height=h, tools=default_tools+custom_tools)
fig.x(x, y, size=5, color="red", alpha=1)
bp.show(fig)

The documentation for configuring the hover tool has an example of how to do this that worked for me. Here's the code I used:
from bokeh.models import ColumnDataSource, HoverTool
cds = ColumnDataSource(
data=dict(
x=xdata,
y=ydata,
desc=sim
)
)
hover = HoverTool()
hover.tooltips = [
("Index", "$index"),
("(2z,1z)", "($x, $y)"),
("ID", "#desc")
]

Related

Seaborn grouped Bar plot

I am trying to create visualizations for recent commonwealth medal tally dataset.
I would like to create a grouped bar chart of top ten countries by total number of medals won.
Y axis = total
x axis = Country name
How can I divide totals into three bars consisting of no of :
gold, Silver,Bronze medals won by each country?
I created one using excel, but don't know how to do it using seaborn
P.S. I have already tried using a list of columns for hue.
df_10 = df.head(10)
sns.barplot(data = df_10, x = 'team' , y = 'total' , hue = df_10[["gold" ,
"silver","bronze"]].apply(tuple , axis = 1) )
Here is the chart that I created using excel:
enter image description here
To plot the graph, you will need to change the dataframe to the format that will allow for easy plotting. One of the ways to do this is using dataframe.melt(). The method used by you may not work... Once the data is in a format that seaborn understands easily, plotting will become simple. As you have not provided the format for df_10, I have assumed the data to have 4 columns - Country, Gold, Silver and Bronze. Below is the code...
## Use melt using Country as ID and G, S, B as the rows for values
df_10 = pd.melt(df_10, id_vars=['Country'], value_vars=['Gold', 'Silver', 'Bronze'])
df_10.rename(columns={'value':'Count', 'variable':'Medals'}, inplace=True) ##Rename so the plot has informative texts
fig, ax=plt.subplots(figsize=(12, 7)) ## Set figure size
ax=sns.barplot(data=df_10, x='Country', y='Count', hue='Medals') ## Plot the graph

Plot does not highlight all the unique values of a column represented by hue

My dataframe has a column 'rideable_type' which has 3 unique values:
1.classic_bike
2.docked_bike
3.electric_bike
While plotting a barplot using the following code:
g = sns.FacetGrid(electric_casual_type_week, col='member_casual', hue='rideable_type', height=7, aspect=0.65)
g.map(sns.barplot, 'day_of_week', 'number_of_rides').add_legend()
I only get a plot showing 2 unique 'rideable_type' values.
Here is the plot:
As you can see only 'electric_bike' and 'classic_bike' are seen and not 'docked_bike'.
The main problem is that all the bars are drawn on top of each other. Seaborn's barplots don't easily support stacked bars. Also, this way of creating the barplot doesn't support the default "dodging" (barplot is called separately for each hue value, while it would be needed to call it in one go for dodging to work).
Therefore, the recommended way is to use catplot, a special version of FacetGrid for categorical plots.
g = sns.catplot(kind='bar', data=electric_casual_type_week, x='day_of_week', y='number_of_rides',
col='member_casual', hue='rideable_type', height=7, aspect=0.65)
Here is an example using Seaborn's 'tips' dataset:
import seaborn as sns
tips = sns.load_dataset('tips')
g = sns.FacetGrid(data=tips, col='time', hue='sex', height=7, aspect=0.65)
g.map_dataframe(sns.barplot, x='day', y='total_bill')
g.add_legend()
When comparing with sns.catplot, the coinciding bars are clear:
g = sns.catplot(kind='bar', data=tips, x='day', y='total_bill', col='time', hue='sex', height=7, aspect=0.65)

Scatterplot with x axis only

I have a dataframe 'Spreads' where one of the columns is 'HY_OAS'. My goal is to draw a horizontal line (basically representing a range of values for 'HY_OAS') and plot the column mean on that line. In addition, I wanted the x axis min/max to be the min/max for that column and I'd like to include text boxes annotating the min/max. The problem is I'm not sure how to proceed because all I have is the below. Thanks for any and all help. The goal is the second image and the current code is the first image.
fig8 = px.scatter(x=[Spreads['HY_OAS'].mean()], y=[0])
fig8.update_xaxes(visible=True,showticklabels=False,range=[Spreads['HY_OAS'].min(),Spreads['HY_OAS'].max()])
fig8.update_yaxes(visible=True,showticklabels=False, range=[0,0])
Following what you describe and what you have coded
generate some sample data in a dataframe
scatter values along x-axis and use constant for y-axis
add mean marker
format figure
add required annotations
import numpy as np
import plotly.express as px
import pandas as pd
# simulate some data
Spreads = pd.DataFrame({"HY_OAS": np.sin(np.random.uniform(0, np.pi * 2, 50))})
# scatter values along x-axis and and larger point for mean
fig = px.scatter(Spreads, x="HY_OAS", y=np.full(len(Spreads), 0)).add_traces(
px.scatter(x=[Spreads.mean()], y=[0])
.update_traces(marker={"color": "red", "size": 20})
.data
)
# fix up figure config
fig.update_layout(
xaxis_visible=False,
yaxis_visible=False,
showlegend=False,
paper_bgcolor="rgba(0,0,0,0)",
plot_bgcolor="rgba(0,0,0,0)",
)
# finally required annootations
fig.add_annotation(x=Spreads["HY_OAS"].mean(), y=0, text=Spreads["HY_OAS"].mean().round(4))
fig.add_annotation(x=Spreads["HY_OAS"].min(), y=0, text=Spreads["HY_OAS"].min().round(2), showarrow=False, xshift=-20)
fig.add_annotation(x=Spreads["HY_OAS"].max(), y=0, text=Spreads["HY_OAS"].max().round(2), showarrow=False, xshift=20)
straight line
build base figure as follows
then same code to add annotations and configure layout
fig = px.line(x=[Spreads["HY_OAS"].min(), Spreads["HY_OAS"].max()], y=[0,0]).add_traces(
px.scatter(x=[Spreads.mean()], y=[0])
.update_traces(marker={"color": "red", "size": 20})
.data
)

Geoview and geopandas groupby projection error

I’m experiencing projection errors following a groupby on geodataframe. Below you will find the libraries that I am using:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import holoviews as hv
from holoviews import opts
import panel as pn
from bokeh.resources import INLINE
import geopandas as gpd
import geoviews as gv
from cartopy import crs
hv.extension('bokeh', 'matplotlib')
gv.extension('bokeh')
pd.options.plotting.backend = 'holoviews'
Whilst these are the versions of some key libraries:
bokeh 2.1.1
geopandas 0.6.1
geoviews 1.8.1
holoviews 1.13.3
I have concatenated 3 shapefiles to build a polygon picture of UK healthcare boundaries (links to files provided if needed). Unfortunately, from what i have found the UK doesn’t produce one file that combines all of those, so have had to merge the shape files from the 3 individual countries i’m interested in. The 3 shape files have a size of:
shape file 1 = (https://www.opendatani.gov.uk/dataset/department-of-health-trust-boundaries)
shape file 2 = (https://geoportal.statistics.gov.uk/datasets/5252644ec26e4bffadf9d3661eef4826_4)
shape file 3 = (https://data.gov.uk/dataset/31ab16a2-22da-40d5-b5f0-625bafd76389/local-health-boards-december-2016-ultra-generalised-clipped-boundaries-in-wales)
My code to concat them together is below:
England_CCG.drop(['objectid', 'bng_e', 'bng_n', 'long', 'lat', 'st_areasha', 'st_lengths'], inplace = True, axis = 1 )
Wales_HB.drop(['objectid', 'bng_e', 'bng_n', 'long', 'lat', 'st_areasha', 'st_lengths', 'lhb16nmw'], inplace = True, axis = 1 )
Scotland_HB.drop(['Shape_Leng', 'Shape_Area'], inplace = True, axis = 1)
#NI_HB.drop(['Shape_Leng', 'Shape_Area'], inplace = True, axis = 1 )
England_CCG.rename(columns={'ccg20cd': 'CCG_Code', 'ccg20nm': 'CCG_Name'}, inplace = True )
Wales_HB.rename(columns={'lhb16cd': 'CCG_Code', 'lhb16nm': 'CCG_Name'}, inplace = True )
Scotland_HB.rename(columns={'HBCode': 'CCG_Code', 'HBName': 'CCG_Name'}, inplace = True )
#NI_HB.rename(columns={'TrustCode': 'CCG_Code', 'TrustName': 'CCG_Name'}, inplace = True )
UK_shape = [England_CCG, Wales_HB, Scotland_HB]
Merged_Shapes = gpd.GeoDataFrame(pd.concat(UK_shape))
Each of the files has the same esri projection once joined, and the shape plots perfectly as one when I run:
Test= gv.Polygons(Merged_Shapes, vdims=[('CCG_Name')], crs=crs.OSGB())
This gives me a polygon plot of the UK, with all the area boundaries for each ccg.
To my geodataframe, I then add a new column, called ‘Country’ which attributes each CCG to whatever the country they belong to. So, all the Welsh CCGs are attributed to Wales, all the English ones to England and all the Scottish ones to Scotland. Just a simple additional grouping of the data really.
What I want to achieve is to have a dropdown next to the polygon map I am making, that will show all the CCGs in a particular country when it is selected from the drop down widget. I understand that the way to to do this is by a groupby. However, when I use the following code to achieve this:
c1 = gv.Polygons(Merged_Shapes, vdims=[('CCG_Name','Country')], crs=crs.OSGB()).groupby(['Country'])
I get a long list of projection errors stating:
“WARNING:param.project_path: While projecting a Polygons element from a PlateCarree coordinate reference system (crs) to a Mercator projection none of the projected paths were contained within the bounds specified by the projection. Ensure you have specified the correct coordinate system for your data.”
To which I am left without a map but I retain the widget. Does anyone know what is going wrong here and what a possible solution would be? its been driving me crazy!
Kind regards,
For some reason geoviews doesn't like the OSGB projection then followed by a groupby, as it tries to default back to platecaree projection.
The way I fixed it was to just make the entire dataset project in epsg:4326. For anyone who also runs into this problem, code below (it is a well documented solution:
Merged_Shapes.to_crs({'init': 'epsg:4326'},inplace=True)
gv.Polygons(Merged_Shapes, vdims=[('CCG_Name'),('Country')]).groupby('Country')
The groupby works fine after this.

Python regionprops sci-kit image

I am using sci-kit image to get the "regionprops" of a segmented image. I then wish to replace each of the segment labels with their corresponding statistic (e.g eccentricity).
from skimage import segmentation
from skimage.measure import regionprops
#a segmented image
labels = segmentation.slic(img1, compactness=10, n_segments=200)
propimage = labels
#props loop
for region in regionprops(labels1, properties ='eccentricity') :
eccentricity = region.eccentricity
propimage[propimage==region] = eccentricity
This runs, but the propimage values do not change from their original labels
I have also tried:
for i in range(0,max(labels)):
prop = regions[i].eccentricity #the way to cal a single prop
propimage[i]= prop
This delivers this error
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I am a recent migrant from matlab where I have implemented this, but the data structures used are completely different.
Can any one help me with this?
Thanks
Use ndimage from scipy : the sum() function can operate using your label array.
from scipy import ndimage as nd
sizes = nd.sum(label_file[0]>0, labels=label_file[0], index=np.arange(0,label_file[1])
You can then evaluate the distribution with numpy.histogram and so on.

Resources