How to not show repeated values in a heatmap in plotly express (px.imshow)?

I'm trying to plot a matrix as a heatmap, but I would like to avoid showing repeated values.
With seaborn we can set a "mask" to hide some of the values, but I can't find the equivalent in Plotly / Plotly Express.
I would like to see something like:
But at the moment, it is in the format below:
Below is an MWE of my data structure. Any reference or help with this would be very welcome.
import pandas as pd
import plotly.express as px
heatmap_data=pd.DataFrame(
{'user1': {'user1': 1,
'user2': 0.5267109866774764,
'user3': 0.905914413030722},
'user2': {'user1': 0.5267109866774764,
'user2': 1,
'user3': 0.5160264783692895},
'user3': {'user1': 0.905914413030722,
'user2': 0.5160264783692895,
'user3': 1}
})
fig = px.imshow(heatmap_data, zmin=0, zmax=1,
text_auto=True,
color_continuous_scale="Plasma")
fig
Thank you in advance

The plotly heatmap does not implement the mask functionality you would expect, although other matrix diagrams, such as the scatter-plot matrix, do have the ability to hide the upper half. See this for examples. So I take advantage of the fact that null values are not displayed and replace the unwanted cells with null values in the original data. The default theme would still show its background and axis lines in the emptied area, so we change the template and hide the axis lines. Finally, the length of the color bar is adjusted.
import pandas as pd
import plotly.express as px
heatmap_data=pd.DataFrame(
{'user1': {'user1': 1,
'user2': 0.5267109866774764,
'user3': 0.905914413030722},
'user2': {'user1': 0.5267109866774764,
'user2': 1,
'user3': 0.5160264783692895},
'user3': {'user1': 0.905914413030722,
'user2': 0.5160264783692895,
'user3': 1}
})
heatmap_data.loc['user1','user2']=None
heatmap_data.loc['user1','user3']=None
heatmap_data.loc['user2','user3']=None
fig = px.imshow(heatmap_data,
zmin=0,
zmax=1,
text_auto=True,
color_continuous_scale="Plasma",
template='simple_white'
)
fig.update_xaxes(showline=False)
fig.update_yaxes(showline=False)
fig.update_layout(autosize=False, width=400, coloraxis=dict(colorbar=dict(len=0.8)))
fig
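The three manual assignments above only cover a 3x3 matrix. As a minimal sketch of how the same idea could be generalized to any square matrix, the upper triangle can be blanked with a boolean mask. The helper name `mask_upper_triangle` and its `keep_diagonal` parameter are hypothetical, introduced only for this illustration, and the values are rounded versions of the question's data:

```python
import numpy as np
import pandas as pd


def mask_upper_triangle(df: pd.DataFrame, keep_diagonal: bool = True) -> pd.DataFrame:
    """Return a copy of a square DataFrame with the upper triangle set to NaN.

    Hypothetical helper for illustration; not part of pandas or plotly.
    """
    # np.tril keeps the lower triangle; k=-1 also drops the diagonal
    k = 0 if keep_diagonal else -1
    keep = np.tril(np.ones(df.shape, dtype=bool), k=k)
    # DataFrame.where keeps values where the mask is True and writes NaN elsewhere
    return df.where(keep)


users = ["user1", "user2", "user3"]
corr = pd.DataFrame(
    [[1.0, 0.53, 0.91],
     [0.53, 1.0, 0.52],
     [0.91, 0.52, 1.0]],
    index=users,
    columns=users,
)
masked = mask_upper_triangle(corr)
```

The resulting `masked` frame can then be passed to px.imshow in place of heatmap_data, with the same template and axis settings as in the answer.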

Related

Is there any way to change the legends in seaborn?

I would like to change the format of pIC50 in the legend box. I would like it to be "circle according to the size with no filled color". Any suggestions are welcome!
plt.figure(figsize=(7, 7))
sns.scatterplot(x='MW', y='LogP', data=df_2class, hue='class', size='pIC50', edgecolor='black', alpha=0.2)
sns.set_style("whitegrid", {"ytick.major.size": 100,"xtick.major.size": 2, 'grid.linestyle': 'solid'})
plt.xlabel('MW', fontsize=14, fontweight='bold')
plt.ylabel('LogP', fontsize=14, fontweight='bold')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0)
In this case, you can loop through the last legend handles and change the color of the dots. Here is an example using the iris dataset:
import matplotlib.pyplot as plt
import seaborn as sns
iris = sns.load_dataset('iris')
ax = sns.scatterplot(data=iris, x='sepal_length', y='petal_length', hue='species', size='sepal_width')
handles, labels = ax.get_legend_handles_labels()
for h in handles[-5:]:  # changes the 5 last handles, this number might be different in your case
    h.set_facecolor('none')
ax.legend(handles=handles, labels=labels, bbox_to_anchor=[1.02, 1.02], loc='upper left')
plt.tight_layout()
plt.show()

Scatterplot with x axis only

I have a dataframe 'Spreads' where one of the columns is 'HY_OAS'. My goal is to draw a horizontal line (basically representing a range of values for 'HY_OAS') and plot the column mean on that line. In addition, I wanted the x axis min/max to be the min/max for that column and I'd like to include text boxes annotating the min/max. The problem is I'm not sure how to proceed because all I have is the below. Thanks for any and all help. The goal is the second image and the current code is the first image.
fig8 = px.scatter(x=[Spreads['HY_OAS'].mean()], y=[0])
fig8.update_xaxes(visible=True,showticklabels=False,range=[Spreads['HY_OAS'].min(),Spreads['HY_OAS'].max()])
fig8.update_yaxes(visible=True,showticklabels=False, range=[0,0])
Following what you describe and what you have coded:
- generate some sample data in a dataframe
- scatter values along the x-axis and use a constant for the y-axis
- add a mean marker
- format the figure
- add the required annotations
import numpy as np
import plotly.express as px
import pandas as pd
# simulate some data
Spreads = pd.DataFrame({"HY_OAS": np.sin(np.random.uniform(0, np.pi * 2, 50))})
# scatter values along the x-axis and add a larger point for the mean
fig = px.scatter(Spreads, x="HY_OAS", y=np.full(len(Spreads), 0)).add_traces(
    px.scatter(x=[Spreads["HY_OAS"].mean()], y=[0])
    .update_traces(marker={"color": "red", "size": 20})
    .data
)
# fix up figure config
fig.update_layout(
xaxis_visible=False,
yaxis_visible=False,
showlegend=False,
paper_bgcolor="rgba(0,0,0,0)",
plot_bgcolor="rgba(0,0,0,0)",
)
# finally, add the required annotations
fig.add_annotation(x=Spreads["HY_OAS"].mean(), y=0, text=Spreads["HY_OAS"].mean().round(4))
fig.add_annotation(x=Spreads["HY_OAS"].min(), y=0, text=Spreads["HY_OAS"].min().round(2), showarrow=False, xshift=-20)
fig.add_annotation(x=Spreads["HY_OAS"].max(), y=0, text=Spreads["HY_OAS"].max().round(2), showarrow=False, xshift=20)
To draw a straight line instead of the scattered points, build the base figure as follows, then use the same code as before to add the annotations and configure the layout:
fig = px.line(x=[Spreads["HY_OAS"].min(), Spreads["HY_OAS"].max()], y=[0, 0]).add_traces(
    px.scatter(x=[Spreads["HY_OAS"].mean()], y=[0])
    .update_traces(marker={"color": "red", "size": 20})
    .data
)
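The three annotation calls used in this answer differ only in position, label and offset, so they could also be generated from one place. The following is a rough sketch rather than part of the original answer: the helper `range_annotations` and its `digits` parameter are hypothetical, and it only builds plain dictionaries whose keys (x, y, text, showarrow, xshift) match the keyword arguments already passed to fig.add_annotation above.

```python
import pandas as pd


def range_annotations(values: pd.Series, digits: int = 2) -> list:
    """Build keyword arguments for min, mean and max annotations on a y=0 line.

    Hypothetical helper for illustration; it does not call plotly itself.
    """
    lo, mean, hi = float(values.min()), float(values.mean()), float(values.max())
    return [
        # min label, nudged left of the line's start
        dict(x=lo, y=0, text=f"{lo:.{digits}f}", showarrow=False, xshift=-20),
        # mean label, drawn with an arrow pointing at the marker
        dict(x=mean, y=0, text=f"{mean:.{digits}f}", showarrow=True),
        # max label, nudged right of the line's end
        dict(x=hi, y=0, text=f"{hi:.{digits}f}", showarrow=False, xshift=20),
    ]


# small made-up series standing in for Spreads["HY_OAS"]
example = pd.Series([1.0, 2.0, 6.0])
annotations = range_annotations(example)
```

With a figure built as above, each dictionary could be applied with fig.add_annotation(**kwargs) in a loop.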

Seaborn Scatter plot custom legend showing Single Label

I used the below code to plot a scatter plot using seaborn. I need to change the labels text in legend. But when I add custom text for the legends, it's only showing one label. I need to have legend text as ['set', 'versi', 'vir']. The code is as below -
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset("iris")
scatter = sns.scatterplot(x='sepal_length', y ='sepal_width', hue='species', data=iris, legend=False)
scatter.legend(labels = ['set', 'versi', 'vir'], loc='upper right')
plt.show(scatter)
Seaborn's sophisticated way of working can't always follow the rules needed for a standard legend (see e.g. issue 2280). Often, the legend is custom created. Currently, matplotlib doesn't provide simple functions to move (or alter) such a legend.
In seaborn 0.11.2, a function sns.move_legend(ax, ...) (info on github) was added, which can move the legend and change some other properties (but not the labels).
So, you can first let sns.scatterplot create a legend, and then move it.
The labels in the legend come from the element names in the hue-column. To obtain different names, the most straightforward way is to temporarily rename them.
Here is some example code (note that plt.show() doesn't have an ax as parameter, but does have an optional block= parameter):
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset("iris")
ax = sns.scatterplot(x='sepal_length', y='sepal_width', hue='species',
data=iris.replace({'species': {'setosa': 'set', 'versicolor': 'versi', 'virginica': 'vir'}}))
sns.move_legend(ax, loc='upper right')
plt.show()
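The "temporary rename" relies on DataFrame.replace returning a new frame rather than modifying the original. A small sketch of that behavior, using a made-up four-row frame in place of the iris dataset (the variable names `short_names` and `renamed` are only for illustration):

```python
import pandas as pd

# tiny stand-in for the iris data; the values are illustrative only
df = pd.DataFrame({
    "species": ["setosa", "versicolor", "virginica", "setosa"],
    "sepal_length": [5.1, 7.0, 6.3, 4.9],
})

short_names = {"setosa": "set", "versicolor": "versi", "virginica": "vir"}

# the nested dict restricts the replacement to the 'species' column;
# replace() returns a copy, so df keeps the full names
renamed = df.replace({"species": short_names})

print(renamed["species"].unique())
print(df["species"].unique())
```

Because only the copy passed to sns.scatterplot carries the short names, the rest of the analysis can keep using the original labels.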

How do I get rid of space between x-ticks and axis in Seaborn heatmap?

In a Seaborn heatmap (within Jupyter Notebook), I am getting extra space between the axis and the x-ticks, which I've moved to the top. If I leave the ticks at the bottom, they are flush as expected, but I need them at the top. I can't figure how to get rid of that space between the upper edge of the plot and the x-ticks. I tried the padding setting in set_tick_params, but that only adjusts space between the tick and the label.
Here's a subset of the data to play with
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
axis_labels = ['Q1','Q2','Q3','Q4','Q5']
data = pd.DataFrame([[np.nan,0.14,0.01,0.00,-0.05],
[0.30,np.nan,0.01,0.03,-0.04],
[0.16,0.10,np.nan,0.01,-0.02],
[0.14,0.05,0.02,np.nan,-0.04],
[0.16,0.09,0.01,0.02,np.nan]])
fig, ax = plt.subplots(figsize=(15,15))
sb.heatmap(data, ax=ax, center=0, annot=True, mask=data.isnull(),
square=True, cmap=sb.diverging_palette(275, 150, s=80, l=55, as_cmap=True), cbar_kws={"shrink": 0.75})
ax.set_ylim(5,-0.5)
ax.set_xticklabels(axis_labels, rotation=90, ha='center', fontsize=12)
ax.set_yticklabels(axis_labels, rotation=0, fontsize=12)
ax.xaxis.tick_top();
Probably something super simple that I'm missing. Any ideas?

When using rasterize=True with datashader, how do I get transparency where count=0 to see the underlying tile?

Currently, when I do this:
import pandas as pd
import hvplot.pandas
df = pd.util.testing.makeDataFrame()
plot = df.hvplot.points('A', 'B', tiles=True, rasterize=True, geo=True,
aggregator='count')
I can't see the underlying tile source.
To see the underlying tile source, philippjfr suggested setting the lower color bar limit slightly higher than 0 and setting the min clipping_colors to transparent:
plot = plot.redim.range(**{'Count': (0.25, 1)})
plot = plot.opts('Image', clipping_colors={'min': 'transparent'})
Now the underlying tile source is viewable.
Full Code:
import pandas as pd
import hvplot.pandas
df = pd.util.testing.makeDataFrame()
plot = df.hvplot.points('A', 'B', tiles=True, rasterize=True, geo=True,
aggregator='count')
plot = plot.redim.range(**{'Count': (0.25, 1)})
plot = plot.opts('Image', clipping_colors={'min': 'transparent'})
plot
