Add lines between cells in a bivariate seaborn histplot - seaborn

I've been trying to add lines between the cells of a bivariate sns.histplot (two variables), like the ones in a sns.heatmap, but I've failed every time.
I've tried the linewidths argument, since that is how it is done with a heatmap:
penguins = sns.load_dataset("penguins")
sns.histplot(penguins, x="bill_depth_mm", y="body_mass_g", linewidths=1)
but nothing changes. I know I could aggregate the data first and then use a heatmap, but I feel dumb that I can't do it in a single step. I'm using seaborn 0.11.2.
Thanks in advance!

import seaborn as sns

penguins = sns.load_dataset("penguins")
sns.histplot(
    penguins, x="bill_depth_mm", y="body_mass_g", linewidths=1,
    edgecolor="w"  # <-- Here's what you're missing: a color for the cell borders
)
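For comparison, the aggregate-then-heatmap route mentioned in the question can be sketched as follows. This is a minimal sketch, assuming synthetic stand-in data (so it runs without downloading the penguins dataset) and an arbitrary 10 bins per axis: np.histogram2d does the aggregation and sns.heatmap draws the grid lines through its linewidths and linecolor parameters.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import seaborn as sns

# Synthetic stand-in for penguins[["bill_depth_mm", "body_mass_g"]]
rng = np.random.default_rng(0)
x = rng.normal(17, 2, size=300)
y = rng.normal(4200, 800, size=300)

# Aggregate first: counts[i, j] = number of points in x-bin i and y-bin j
counts, x_edges, y_edges = np.histogram2d(x, y, bins=10)

# Transpose so rows are y bins, then flip so larger y values sit at the top
grid = counts.T[::-1]

# linewidths/linecolor draw the separating lines between heatmap cells
ax = sns.heatmap(grid, linewidths=1, linecolor="w")
ax.set_xlabel("bill_depth_mm bin")
ax.set_ylabel("body_mass_g bin")
```

The trade-off is that the heatmap labels cells by bin index, while the histplot route above keeps the real axis units.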

Related

seaborn FacetGrid empty

I have a simple dataset with one column named "measurement", which has about 20 distinct values, and another named "value".
g = sns.FacetGrid(data, col='measurement',col_wrap=4)
g.map(sns.displot,'value')
I get a warning about the number of figures matplotlib.pyplot creates:
RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (matplotlib.pyplot.figure) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam figure.max_open_warning).
fig, axes = plt.subplots(nrow, ncol, **kwargs)
These 2 lines of code give me a long column with all the individual plots and, at the end, an empty FacetGrid. I have no idea why this is happening; does anybody have an idea?
Thanks
When I tried
g = sns.FacetGrid(data, col='measurement',col_wrap=4)
g.map(sns.displot,'value')
I got the same result, but switching to distplot or histplot gave me the desired result, if that's what you want.

Transformer-XL: Input and labels for Language Modeling

I'm trying to fine-tune the pretrained Transformer-XL model transfo-xl-wt103 for a language modeling task. Therefore, I use the model class TransfoXLLMHeadModel.
To iterate over my dataset I use the LMOrderedIterator from the file tokenization_transfo_xl.py which yields a tensor with the data and its target for each batch (and the sequence length).
Let's assume the following data with batch_size = 1 and bptt = 8:
data = tensor([[1,2,3,4,5,6,7,8]])
target = tensor([[2,3,4,5,6,7,8,9]])
mems # from the previous output
My question is: I currently pass this data into the model like this:
output = model(input_ids=data, labels=target, mems=mems)
Is this correct?
I am wondering because the documentation says for the labels parameter:
labels (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None):
Labels for language modeling.
Note that the labels are shifted inside the model, i.e. you can set lm_labels = input_ids
So what about the lm_labels parameter? I only see labels defined in the forward method.
And if the labels "are shifted" inside the model, does this mean I have to pass data twice (i.e. as labels, instead of target) because it is shifted inside? But then how does the model know which token to predict next?
I also read through this bug report and the fix in this pull request, but I don't quite understand how to use the model now (before vs. after the fix).
Thanks in advance for some help!
Edit: Link to issue on Github
That does sound like a typo carried over from another model's convention. You do have to pass data twice, once to input_ids and once to labels (in your case, [1, ... , 8] for both). The model will then attempt to predict [2, ... , 8] from [1, ... , 7]. I am not sure adding something at the beginning of the target tensor would work, as that would probably cause size mismatches later down the line.
Passing twice is the default way to do this in transformers; before the aforementioned PR, TransfoXL did not shift labels internally and you had to shift the labels yourself. The PR changed it to be consistent with the library and the documentation, where you have to pass the same data twice.
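To make "shifted inside the model" concrete: when labels equal input_ids, the prediction made at position t is scored against the label at position t + 1, so the last position has nothing to predict. The pure-Python sketch below only illustrates that pairing for the question's example sequence; it does not call transformers, and shifted_pairs is a hypothetical helper, not a library function.

```python
def shifted_pairs(labels):
    """Pair each token the model has seen with the token it must predict next.

    Illustrates the usual convention of scoring outputs at positions [:-1]
    against labels at positions [1:].
    """
    return list(zip(labels[:-1], labels[1:]))


data = [1, 2, 3, 4, 5, 6, 7, 8]
labels = data  # the same sequence goes to input_ids and to labels

for seen, target in shifted_pairs(labels):
    print(f"after token {seen}, predict {target}")
```

This is why passing [1, ..., 8] twice trains the model to predict [2, ..., 8] from [1, ..., 7].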

Using Seaborn's PairGrid to plot many variables against one

I'd like to visualize how one variable in my dataset correlates with 13 other variables. Seaborn's PairGrid lets me do this fairly easily, but the resulting figure ends up being a single row of 13 plots. FacetGrid has a col_wrap parameter that makes this type of plot look more attractive. Any suggestions for how to implement this column wrapping with PairGrid?
The code I'm currently using to generate the 1x13 plot:
g = sns.PairGrid(dataframe, hue=classes, y_vars=var_of_interest, x_vars = list_of_13_covariates)
g.map(plt.scatter)
The PairGrid object does not have a col_wrap parameter.
See the docs here:
http://seaborn.pydata.org/generated/seaborn.PairGrid.html#seaborn.PairGrid

Paraview rotate fields

I am using Paraview 5.0.1. If any solution requires updating, I can try.
I want to programmatically obtain field plots (and corresponding PlotOverLine) of displacements and stresses in rotated coordinate systems.
What are appropriate/convenient/possible ways of doing this?
So far, I have created one Calculator filter for each component of displacements and stresses.
For instance, in 2D I used Calculators with expressions such as
(displacement.iHat)*cos(0.7853981625)+(displacement.jHat)*sin(0.7853981625)
(stress_3-stress_0)*sin(45.0*3.14159265/180)*cos(45.0*3.14159265/180)+stress_1*((cos(45.0*3.14159265/180))^2-(sin(45.0*3.14159265/180))^2)
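For reference, the two Calculator expressions are components of the standard 2D rotation of the coordinate axes by an angle θ (here θ = 45° = π/4). Assuming stress_0, stress_1 and stress_3 hold σ_xx, σ_xy and σ_yy (a row-major 2×2 layout; this mapping is an assumption) and that the stress tensor is symmetric, they correspond to:

```latex
\begin{aligned}
u_{x'} &= u_x\cos\theta + u_y\sin\theta, \\
u_{y'} &= -u_x\sin\theta + u_y\cos\theta, \\
\sigma_{x'y'} &= (\sigma_{yy} - \sigma_{xx})\sin\theta\cos\theta
              + \sigma_{xy}\left(\cos^2\theta - \sin^2\theta\right),
\end{aligned}
\qquad
\mathbf{u}' = R\,\mathbf{u},\quad
\boldsymbol{\sigma}' = R\,\boldsymbol{\sigma}\,R^{\mathsf T},\quad
R = \begin{pmatrix}\cos\theta & \sin\theta \\ -\sin\theta & \cos\theta\end{pmatrix}.
```

Writing it as one matrix product is what makes it possible to compute every rotated component at once instead of one Calculator per component.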
It works fine, but it is quite cumbersome, in several aspects:
Creating them (one filter per component).
Plotting several of them in a single XY plot.
Exporting them (one export per component).
Is there a simple way to do this?
PS: The Transform filter does not accomplish this. It rotates the view, not the fields.
Two solutions:
Ugly, inefficient solution:
Use Transform and check "Transform All Input Vectors"
Add a Calculator and add a dummy array
Apply Transform again with the inverse rotation, without checking "Transform All Input Vectors"
Correct solution:
Compute the transformation yourself in a Programmable Filter:
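import math
import vtk  # assumed importable from ParaView's Python environment

theta = math.radians(45.0)  # rotation angle about the z axis
c, s = math.cos(theta), math.sin(theta)

inp = self.GetUnstructuredGridInput()
output = self.GetUnstructuredGridOutput()
output.ShallowCopy(inp)

data = inp.GetPointData().GetArray("YourArray")  # name of the vector array to rotate
vec = vtk.vtkDoubleArray()
vec.SetNumberOfComponents(3)
vec.SetName("TransformedVectors")

for i in range(inp.GetNumberOfPoints()):
    x, y, z = data.GetTuple3(i)
    # Components in axes rotated by theta about z (same formula as the Calculator)
    vec.InsertNextTuple3(c * x + s * y, -s * x + c * y, z)

output.GetPointData().AddArray(vec)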
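The same idea extends to the stress tensor: all rotated components can be computed in one pass as R u and R σ Rᵀ instead of one Calculator per component. A minimal NumPy sketch, assuming 2D point data has already been extracted as arrays of shape (N, 2) for the vectors and (N, 2, 2) for the symmetric stress tensors; rotate_fields is a hypothetical helper name:

```python
import numpy as np


def rotate_fields(u, sigma, theta):
    """Express 2D vectors and stress tensors in axes rotated by theta (radians).

    u:     (N, 2) array of vectors, e.g. displacements
    sigma: (N, 2, 2) array of symmetric stress tensors
    Returns (u_rot, sigma_rot) with the same shapes.
    """
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, s],
                  [-s, c]])
    u_rot = u @ R.T              # applies R to every row vector
    sigma_rot = R @ sigma @ R.T  # R sigma_i R^T, broadcast over the N points
    return u_rot, sigma_rot


u = np.array([[1.0, 0.0]])
sigma = np.array([[[1.0, 0.0],
                   [0.0, 0.0]]])
u_rot, sigma_rot = rotate_fields(u, sigma, np.pi / 4)
print(u_rot)      # rotated displacement components per point
print(sigma_rot)  # full rotated stress tensor per point
```

Entry [0, 1] of each rotated tensor matches the shear-stress Calculator expression from the question.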

Caffe Multiple Input Images

I'm looking at implementing a Caffe CNN which accepts two input images and a label (later perhaps other data) and was wondering if anyone was aware of the correct syntax in the prototxt file for doing this? Is it simply an IMAGE_DATA layer with additional tops? Or should I use separate IMAGE_DATA layers for each?
Thanks,
James
Edit: I have been using the HDF5_DATA layer lately for this and it is definitely the way to go.
HDF5 is a key value store, where each key is a string, and each value is a multi-dimensional array. Thus, to use the HDF5_DATA layer, just add a new key for each top you want to use, and set the value for that key to store the image you want to use. Writing these HDF5 files from python is easy:
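import h5py
import numpy as np

# Hypothetical loaders standing in for your own image-reading code (H x W x C arrays)
def get_some_image(i):
    return np.random.rand(32, 32, 3).astype(np.float32)

def get_another_image(i):
    return np.random.rand(32, 32, 3).astype(np.float32)

filelist = []
for i in range(100):
    image1 = get_some_image(i)
    image2 = get_another_image(i)
    filename = '/tmp/my_hdf5%d.h5' % i
    with h5py.File(filename, 'w') as f:
        # Caffe expects N x C x H x W: channels first, plus a batch axis of size 1
        f['data1'] = np.transpose(image1, (2, 0, 1))[np.newaxis]
        f['data2'] = np.transpose(image2, (2, 0, 1))[np.newaxis]
    filelist.append(filename)

# The HDF5_DATA source is a text file listing one .h5 path per line
with open('/tmp/filelist.txt', 'w') as f:
    for filename in filelist:
        f.write(filename + '\n')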
Then simply set the source of the HDF5_DATA param to be '/tmp/filelist.txt', and set the tops to be "data1" and "data2".
I'm leaving the original response below:
====================================================
There are two good ways of doing this. The easiest is probably to use two separate IMAGE_DATA layers, one with the first image and label, and a second with the second image. Caffe retrieves images from LMDB or LEVELDB, which are key value stores, and assuming you create your two databases with corresponding images having the same integer id key, Caffe will in fact load the images correctly, and you can proceed to construct your net with the data/labels of both layers.
The problem with this approach is that having two data layers is not really very satisfying, and it doesn't scale very well if you want to do more advanced things like having non-integer labels for things like bounding boxes, etc. If you're prepared to make a time investment in this, you can do a better job by modifying the tools/convert_imageset.cpp file to stack images or other data across channels. For example you could create a datum with 6 channels - the first 3 for your first image's RGB, and the second 3 for your second image's RGB. After reading this in using the IMAGE_DATA layer, you can split the stream into two images using a SLICE layer with a slice_point at index 3 along the slice_dim = 1 dimension. If further down the road, you decide that you want to load even more complex assortments of data, you'll understand the encoding scheme and can write your own decoding layer based off of src/caffe/layers/data_layer.cpp to gain full control of the pipeline.
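The stacking and slicing idea can be checked outside Caffe. The NumPy sketch below only emulates it under stated assumptions: two random 3-channel image batches in Caffe's N x C x H x W layout are concatenated along axis 1 into one 6-channel blob, and splitting at index 3 along that axis (what a SLICE layer with slice_point 3 and slice_dim = 1 would do) recovers the originals. It does not use Caffe itself.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two batches of RGB images in N x C x H x W layout (N=4, C=3, H=W=8)
img_a = rng.random((4, 3, 8, 8), dtype=np.float32)
img_b = rng.random((4, 3, 8, 8), dtype=np.float32)

# Encoding: stack along the channel axis -> 6 channels per datum
stacked = np.concatenate([img_a, img_b], axis=1)

# Decoding: split at index 3 along axis 1, mirroring the SLICE layer
top_a, top_b = np.split(stacked, [3], axis=1)
```

The same encoding generalizes to more inputs by adding further slice points.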
You may also consider using the HDF5_DATA layer with multiple tops.
