How can I plot different types of seaborn plots on different x ticks?

I want to have multiple types of seaborn plots using the same y axis but with different x coordinates (see image below).
I've tried several ways of specifying the x-axis coordinates, but can't seem to get it to work.
Here is an example of almost-working code:
x=[1,2,3,3,3,4,4,5,5,6] # first violin
y=[4,4,5,5,5,5,6] # second violin
z=[5,5,6] # swarmplot over second violin
for data,label in [(x,'x'),(y,'y'),(z,'z')]:
for i in data:
ax = sns.violinplot(data=data.loc[data.key.isin(['x','y'])], x='key', y='value',palette=['honeydew','lightgreen'])
sns.swarmplot(x=['swarmplot']*len(data), y=data['value'], order=ax.get_xticklabels() + ['swarmplot'], ax=ax) #.loc[data.key=='z',:]
It produces the following image:
However, it is plotting all values associated with x/y/z instead of just z. When I slice the dataframe to only 'z' in the swarmplot as below, I get an error:
sns.swarmplot(x=['swarmplot']*len(data), y=data.loc[data.key=='z',:]['value'], order=ax.get_xticklabels() + ['swarmplot'], ax=ax)
KeyError: 'swarmplot'
Any suggestions?

To draw a second plot onto the same x-axis, you can pass order= a list of the existing tick labels with the new labels appended.
Here is an example:
import seaborn as sns
tips = sns.load_dataset('tips')
ax = sns.swarmplot(data=tips, x='day', y='total_bill')
sns.violinplot(x=['violin']*len(tips), y=tips['total_bill'], order=ax.get_xticklabels() + ['violin'], ax=ax)
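The problem with the code in the updated question is that x= and y= of the swarmplot need the same number of elements: ['swarmplot']*len(data) has one entry per row of the whole dataframe, while y= only contains the 'z' rows. It also seems the swarmplot resets the y limits, so I added some code to readjust those: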
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
x = [1, 2, 3, 3, 3, 4, 4, 5, 5, 6] # first violin
y = [4, 4, 5, 5, 5, 5, 6] # second violin
z = [5, 5, 6] # swarmplot over second violin
data = pd.DataFrame({'value': np.concatenate([x, y, z]),
                     'key': ['x'] * len(x) + ['y'] * len(y) + ['z'] * len(z)})
fig, ax = plt.subplots(1, figsize=(5, 5))
ax = sns.violinplot(data=data.loc[data.key.isin(['x', 'y'])], x='key', y='value', palette=['honeydew', 'lightgreen'])
ymin1, ymax1 = ax.get_ylim()
swarm_data = data.loc[data.key == 'z', :]['value']
sns.swarmplot(x=['swarmplot'] * len(swarm_data), y=swarm_data, order=ax.get_xticklabels() + ['swarmplot'], ax=ax)
ymin2, ymax2 = ax.get_ylim()
ax.set_ylim(min(ymin1, ymin2), max(ymax1, ymax2))
ax.set_xticklabels(['x', 'y', 'swarmplot'])
You can simplify things by directly using the data without creating a dataframe:
x = [1, 2, 3, 3, 3, 4, 4, 5, 5, 6] # first violin
y = [4, 4, 5, 5, 5, 5, 6] # second violin
z = [5, 5, 6] # swarmplot over second violin
fig, ax = plt.subplots(1, figsize=(5, 5))
ax = sns.violinplot(x=['x']*len(x) + ['y']*len(y), y=x + y, palette=['honeydew', 'lightgreen'])
ymin1, ymax1 = ax.get_ylim()
sns.swarmplot(x=['swarmplot'] * len(z), y=z, order=ax.get_xticklabels() + ['swarmplot'], ax=ax)
ymin2, ymax2 = ax.get_ylim()
ax.set_ylim(min(ymin1, ymin2), max(ymax1, ymax2))
ax.set_xticklabels(['x', 'y', 'swarmplot'])
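A variation on the same idea (not from the answer above): instead of reading the tick labels back from the axes, give both calls the same explicit order= list, so each key lands on the same x position by construction and no relabelling is needed. This is a sketch assuming a seaborn version whose categorical functions accept order= and ax=; the Agg backend line only lets it run without a display, and color= is used instead of palette= to avoid palette-without-hue warnings in newer seaborn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

x = [1, 2, 3, 3, 3, 4, 4, 5, 5, 6]  # first violin
y = [4, 4, 5, 5, 5, 5, 6]           # second violin
z = [5, 5, 6]                       # swarmplot

# One long-form table; 'key' names the x position each value belongs to
data = pd.DataFrame({
    'value': x + y + z,
    'key': ['x'] * len(x) + ['y'] * len(y) + ['swarmplot'] * len(z),
})
# The same explicit order in both calls maps each key to the same x position
order = ['x', 'y', 'swarmplot']

fig, ax = plt.subplots(figsize=(5, 5))
sns.violinplot(data=data[data['key'] != 'swarmplot'], x='key', y='value',
               order=order, color='lightgreen', ax=ax)
sns.swarmplot(data=data[data['key'] == 'swarmplot'], x='key', y='value',
              order=order, color='black', ax=ax)

fig.canvas.draw()  # make sure tick labels are populated before reading them
labels = [t.get_text() for t in ax.get_xticklabels()]
```

Because the violin call already reserves a slot for 'swarmplot' (drawn empty), the swarm points fall into the third position without touching the tick labels afterwards.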


pytorch: efficient way to perform operations on 2 tensors of different sizes, where one has a one-to-many relation

I have 2 tensors. The first tensor is 1D (e.g. a tensor of 3 values). The second tensor is 2D: its first column holds indices into the first tensor, in a one-to-many relationship (e.g. a tensor of shape (6, 2)).
# e.g. simple example of dot product
import torch
a = torch.tensor([2, 4, 3])
b = torch.tensor([[0, 2], [0, 3], [0, 1], [1, 4], [2, 3], [2, 1]]) # 1st column is the index to tensor a, 2nd column is the value
output = [(2*2)+(2*3)+(2*1),(4*4),(3*3)+(3*1)]
output = [12, 16, 12]
Currently, I find the size of each id group in b (e.g. [3, 1, 2]), use torch.split to group the rows into a list of tensors, and run a for loop over the groups. This is fine for small tensors, but when the tensors have millions of elements, with tens of thousands of arbitrary-sized groups, it becomes very slow.
Any better solutions?
You can use numpy.bincount or torch.bincount to sum the values in the second column of b grouped by the key in its first column, and then multiply by a:
import numpy as np
a = np.array([2,4,3])
b = np.array([[0,2], [0,3], [0,1], [1,4], [2,3], [2,1]])
print( np.bincount(b[:,0], b[:,1]) )
# [6. 4. 4.]
print( a * np.bincount(b[:,0], b[:,1]) )
# [12. 16. 12.]
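If you prefer an explicit scatter-add in numpy (for example, to accumulate into an array you already have), np.add.at computes the same per-key sums. The simpler-looking sums[b[:, 0]] += b[:, 1] would be wrong here, because buffered fancy-index assignment does not accumulate repeated indices. A minimal sketch:

```python
import numpy as np

a = np.array([2, 4, 3])
b = np.array([[0, 2], [0, 3], [0, 1], [1, 4], [2, 3], [2, 1]])

# One accumulator slot per element of a
sums = np.zeros(len(a), dtype=b.dtype)
# Unbuffered in-place add: every row of b contributes, even for repeated keys
np.add.at(sums, b[:, 0], b[:, 1])
output = a * sums
```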
import torch
a = torch.tensor([2,4,3])
b = torch.tensor([[0,2], [0,3], [0,1], [1,4], [2,3], [2,1]])
torch.bincount(b[:,0], b[:,1])
# tensor([6., 4., 4.], dtype=torch.float64)
a * torch.bincount(b[:,0], b[:,1])
# tensor([12., 16., 12.], dtype=torch.float64)
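The a * bincount(...) trick works because a[k] is a common factor of its whole group. For a per-row operation that does not factor out like that (the subtraction below is only a hypothetical example), one sketch is to gather a onto the rows of b first with a[b[:, 0]], apply the elementwise operation, and only then reduce by key:

```python
import torch

a = torch.tensor([2., 4., 3.])
b = torch.tensor([[0, 2], [0, 3], [0, 1], [1, 4], [2, 3], [2, 1]])
idx, vals = b[:, 0], b[:, 1].to(a.dtype)

# Gather: repeat each a[k] once for every row of b that points at it
per_row = a[idx] * vals  # [4., 6., 2., 16., 9., 3.]
# Reduce: sum the per-row results back into one slot per key;
# minlength keeps keys that have no rows at all (they get 0)
output = torch.bincount(idx, weights=per_row, minlength=a.shape[0])

# Same pattern with an operation that does not factor out of the group sum
diff = torch.bincount(idx, weights=a[idx] - vals, minlength=a.shape[0])
```

Everything stays vectorised, so there is no Python loop over the groups.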
See also: the numpy.bincount and torch.bincount official documentation, and the related question "How can I reduce a numpy array based on a key rather than an axis?"
Another alternative in PyTorch, if gradients are needed, is index_add_:
import torch
a = torch.tensor([2,4,3])
b = torch.tensor([[0,2], [0,3], [0,1], [1,4], [2,3], [2,1]])
output = torch.zeros(a.shape[0], dtype=torch.long).index_add_(0, b[:, 0], b[:, 1]) * a
Alternatively, torch.Tensor.scatter_add also works.
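The index_add_ line above uses integer (torch.long) tensors, which cannot carry gradients. A minimal sketch of the scatter_add route with floating-point tensors, so autograd can track a; the out-of-place scatter_add is used so nothing that requires grad is modified in place:

```python
import torch

a = torch.tensor([2., 4., 3.], requires_grad=True)  # float so autograd can track it
b = torch.tensor([[0, 2], [0, 3], [0, 1], [1, 4], [2, 3], [2, 1]])
idx = b[:, 0]               # int64 keys, as scatter_add requires
vals = b[:, 1].to(a.dtype)  # values cast to a's float dtype

# Out-of-place scatter_add along dim 0: sums[idx[i]] += vals[i]
sums = torch.zeros_like(a).scatter_add(0, idx, vals)
output = sums * a

# Backpropagate a scalar; d(sum(sums * a))/da equals sums
output.sum().backward()
```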

I need to convert a 3 column array into a square matrix?

I am looking to convert my data to a square matrix:
Say your input is a list; you can then convert it to a list of lists (i.e., a proxy for a matrix) with a list comprehension:
>>> x = [0, 5, 10, 5, 0, 2, 10, 2, 0]
>>> [x[3*k:3*k+3] for k in range(3)]
[[0, 5, 10], [5, 0, 2], [10, 2, 0]]
To help you parse the line: you are building a list by iterating over k from 0 to 2, where each element is the slice of x that starts at index 3*k and stops just before index 3*k+3. Thus, your list is [x[0:3], x[3:6], x[6:9]].
That said, it's much better to use numpy for this kind of numerical work. There, you would do:
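The same slicing generalises to any n x n size if you derive n from the length of the list. A minimal sketch; the helper name to_square is hypothetical, and math.isqrt needs Python 3.8 or later:

```python
import math

def to_square(flat):
    """Split a flat list of n*n items into n rows of n items each."""
    n = math.isqrt(len(flat))
    if n * n != len(flat):
        raise ValueError(f"length {len(flat)} is not a perfect square")
    # Row k is the slice flat[n*k : n*k + n]
    return [flat[n * k:n * k + n] for k in range(n)]

x = [0, 5, 10, 5, 0, 2, 10, 2, 0]
square = to_square(x)
```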
>>> import numpy as np
>>> x = np.array([0, 5, 10, 5, 0, 2, 10, 2, 0])
>>> x.reshape(3, 3)
array([[ 0,  5, 10],
       [ 5,  0,  2],
       [10,  2,  0]])
The reshape() function converts your 1D array into the requested 2D matrix.
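If by "3 column array" you instead mean rows of (row, column, value) triplets, such as pairwise distances, numpy fancy indexing can scatter them into a square matrix. A sketch under that assumption; the triplets below are hypothetical, chosen so the result matches the matrix above, and mirroring into the lower triangle assumes the matrix is symmetric:

```python
import numpy as np

# Hypothetical (row, column, value) triplets for the upper triangle
triplets = np.array([[0, 1, 5], [0, 2, 10], [1, 2, 2]])
rows, cols, vals = triplets[:, 0], triplets[:, 1], triplets[:, 2]

n = int(triplets[:, :2].max()) + 1  # matrix size from the largest index
m = np.zeros((n, n), dtype=triplets.dtype)
m[rows, cols] = vals  # fill the given entries
m[cols, rows] = vals  # mirror them, assuming symmetry
```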

Why does this simple LightGBM binary classifier perform poorly?

I tried to train a LightGBM binary classifier with the Python API to learn the relation:
if feature > 5, then 1 else 0
import pandas as pd
import numpy as np
import lightgbm as lgb
x_train = pd.DataFrame([4, 7, 2, 6, 3, 1, 9])
y_train = pd.DataFrame([0, 1, 0, 1, 0, 0, 1])
x_test = pd.DataFrame([8, 2])
y_test = pd.DataFrame([1, 0])
lgb_train = lgb.Dataset(x_train, y_train)
lgb_eval = lgb.Dataset(x_test, y_test, reference=lgb_train)
params = { 'objective': 'binary', 'metric': {'binary_logloss', 'auc'}}
gbm = lgb.train(params, lgb_train, valid_sets=lgb_eval)
y_pred = gbm.predict(x_test, num_iteration=gbm.best_iteration)
# array([0.42857143, 0.42857143])
np.where((y_pred > 0.5), 1, 0)
# array([0, 0])
Clearly it failed to predict the first test value, 8. Can anyone see what went wrong?
LightGBM's parameter defaults are set with the expectation of moderate-sized training data, and might not work well on extremely small datasets like the one in this question.
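Two parameters in particular are affecting your result:
min_data_in_leaf (default 20): the minimum number of samples that must fall into a leaf node. With only 7 training rows, no split can give both children 20 samples, so no tree can grow.
min_sum_hessian_in_leaf (default 1e-3): the minimum sum of the hessian (second derivative of the loss) over the samples in one leaf; roughly, the minimum contribution to the loss function for one leaf node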
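This also explains the exact value 0.42857143 in the question: if no split is possible, the model can only output its starting score. A numpy sketch of that arithmetic, assuming LightGBM's default boost_from_average behaviour for the binary objective (start from the log-odds of the mean label):

```python
import numpy as np

y_train = np.array([0, 1, 0, 1, 0, 0, 1])

p = y_train.mean()                # fraction of positives: 3/7
init_score = np.log(p / (1 - p))  # raw score (log-odds) before any tree
# No tree can split, so the raw score never moves and
# the predicted probability is just sigmoid(init_score)
pred = 1 / (1 + np.exp(-init_score))
```

Since 3/7 is below 0.5, both test points get class 0, whatever their feature value.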
Setting these to the lowest possible values can force LightGBM to overfit to such a small dataset.
import pandas as pd
import numpy as np
import lightgbm as lgb
x_train = pd.DataFrame([4, 7, 2, 6, 3, 1, 9])
y_train = pd.DataFrame([0, 1, 0, 1, 0, 0, 1])
x_test = pd.DataFrame([8, 2])
y_test = pd.DataFrame([1, 0])
lgb_train = lgb.Dataset(x_train, y_train)
lgb_eval = lgb.Dataset(x_test, y_test, reference=lgb_train)
params = {
    'objective': 'binary',
    'metric': {'binary_logloss', 'auc'},
    'min_data_in_leaf': 1,
    'min_sum_hessian_in_leaf': 0
}
gbm = lgb.train(params, lgb_train, valid_sets=lgb_eval)
y_pred = gbm.predict(x_test, num_iteration=gbm.best_iteration)
# array([6.66660313e-01, 1.89048958e-05])
np.where((y_pred > 0.5), 1, 0)
# array([1, 0])
For details on all the parameters and their defaults, see the Parameters page of the LightGBM documentation.

sympy matrix element round?

What am I doing wrong with the macro?
from sympy import *
from decimal import *
def MyMatrixRound(A):
    m = A.shape[0]
    n = A.shape[1]
    for i in range(m):
        for j in range(n):
            A[i, j] = round(A[i, j] + 0.2, 1)
            A[i, j] = Decimal(str(A[i, j])).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
    return A
x = Matrix(4, 3, range(12))
print(x)
# Matrix([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]])
print(MyMatrixRound(x))
# Matrix([[0.200000000000000, 1.20000000000000, 2.20000000000000], [3.20000000000000, 4.20000000000000, 5.20000000000000], [6.20000000000000, 7.20000000000000, 8.20000000000000], [9.20000000000000, 10.2000000000000, 11.2000000000000]])
# I want
# Matrix([[0.2, 1.2, 2.2], [3.2, 4.2, 5.2], [6.2, 7.2, 8.2], [9.2, 10.2, 11.2]])
Thank you in advance, and sorry for my bad English!
from sympy import *
from decimal import *
def MyMatrixPrint(A, iDec):
    if iDec == 0:
        cDec = '0'
    elif iDec == 1:
        cDec = '.1'
    elif iDec == 2:
        cDec = '.01'
    elif iDec == 3:
        cDec = '.001'
    m = A.shape[0]
    n = A.shape[1]
    arr = [[0 for j in range(n)] for i in range(m)]
    cMatrix = "myMatrix([["
    for i in range(m):
        for j in range(n):
            arr[i][j] = Decimal(str(A[i, j])).quantize(Decimal(cDec), rounding=ROUND_HALF_UP)
            cMatrix = cMatrix + str(arr[i][j]) + " ,"
        cMatrix = cMatrix + "],["
    cMatrix = cMatrix + "]"
    return cMatrix
x = Matrix(4, 3, range(12))
print(" ", x)
r = x.applyfunc(lambda e: e + .2)
print(MyMatrixPrint(r, 1))
# Output (close to what I want):
# Matrix([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]])
# myMatrix([[0.2 ,1.2 ,2.2 ,],[3.2 ,4.2 ,5.2 ,],[6.2 ,7.2 ,8.2 ,],[9.2 ,10.2 ,11.2 ,],[]
It may not be correct, but at least it works.
Please correct me if some of my assumptions are wrong on this.
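The same Decimal-quantize idea can be written more compactly, with the row and column loops as comprehensions and the separators placed like Matrix's one-line output. A sketch only; the helper names matrix_to_decimal_rows and format_rows are hypothetical, and the result is a plain nested list of Decimals plus a string, not a sympy Matrix:

```python
from decimal import Decimal, ROUND_HALF_UP
from sympy import Matrix

def matrix_to_decimal_rows(A, places=1):
    """Return A's entries as nested lists of Decimals, rounded half-up to `places` decimals."""
    step = Decimal(1).scaleb(-places)  # places=1 -> Decimal('0.1')
    return [[Decimal(str(A[i, j])).quantize(step, rounding=ROUND_HALF_UP)
             for j in range(A.cols)]
            for i in range(A.rows)]

def format_rows(rows):
    """Render nested lists in the style of Matrix's one-line str()."""
    inner = ", ".join("[" + ", ".join(str(v) for v in row) + "]" for row in rows)
    return "Matrix([" + inner + "])"

x = Matrix(4, 3, range(12))
r = x.applyfunc(lambda e: e + 0.2)
rows = matrix_to_decimal_rows(r, 1)
text = format_rows(rows)
```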
How to round Matrix elements in sympy?
from sympy import *
x = Matrix(4, 3, range(12))
x[2, 2]=x[2, 2]+0.2
# Matrix([[0, 1, 2], [3, 4, 5], [6, 7, 8.20000000000000], [9, 10, 11]])
# Matrix([[0, 1.00, 2.00], [3.00, 4.00, 5.00], [6.00, 7.00, 8.20], [9.00, 10.0, 11.0]])
In what environment are you running this? With the Windows command line I get:
>>> x = Matrix(4, 3, range(12))
>>> r=x.applyfunc(lambda e:
... Decimal(str(round(e+.2,1))).quantize(Decimal('.01'),
... rounding=ROUND_HALF_UP))
>>> r[0]
>>> r
Matrix([
[0.2, 1.2, 2.2],
[3.2, 4.2, 5.2],
[6.2, 7.2, 8.2],
[9.2, 10.2, 11.2]])
>>> r[0].round(1)

Manipulating matrix elements in tensorflow

How can I do the following in tensorflow?
mat = [4, 2, 6, 2, 3]
mat[2] = 0  # simply zero the 3rd element
I can't use the [] brackets because they only work on constants, not on variables. I can't use the slice function either, because it returns a tensor and you can't assign to a tensor.
import tensorflow as tf
sess = tf.Session()
var1 = tf.Variable(initial_value=[2, 5, -4, 0])
assignZerosOP = (var1[2] = 0) # < ------ This is what I want to do
Will print:
[2, 5, -4, 0]
[2, 5, 0, 0]
You can't change a tensor - but, as you noted, you can change a variable.
There are three patterns you could use to accomplish what you want:
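(a) Use tf.scatter_update to directly poke into the part of the variable you want to change.
import tensorflow as tf
a = tf.Variable(initial_value=[2, 5, -4, 0])
b = tf.scatter_update(a, [1], [9])
init = tf.initialize_all_variables()  # tf.global_variables_initializer() in later TF 1.x
with tf.Session() as s:
    s.run(init)
    print(s.run(a))  # before the update
    print(s.run(b))  # runs the update and returns the new value
    print(s.run(a))  # the variable itself has changed
# [ 2  5 -4  0]
# [ 2  9 -4  0]
# [ 2  9 -4  0]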
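(b) Create two tf.slice()s of the tensor, excluding the item you want to change, and then concatenate them back together with a zero in between: tf.concat(0, [a, [0], b]) in the pre-1.0 API, or tf.concat([a, [0], b], 0) from TensorFlow 1.0 on.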
(c) Create b = tf.zeros_like(a), and then use to choose which items from a you want, and which zeros from b that you want.
I've included (b) and (c) because they work with normal tensors, not just variables.
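For reference, approaches (b) and (c) can be sketched on a plain tensor like this, assuming TensorFlow 2.x eager mode rather than the 1.x session API used above; in 2.x, tf.where(condition, x, y) does the element selection, and tf.tensor_scatter_nd_update is an out-of-place counterpart of (a) for ordinary tensors:

```python
import tensorflow as tf

t = tf.constant([2, 5, -4, 0])
i = 2  # index of the element to zero

# (b) slice around index i and concatenate a zero in between
zeroed_b = tf.concat([t[:i], tf.zeros([1], dtype=t.dtype), t[i + 1:]], axis=0)

# (c) mask that is True only at index i; take zeros there, t elsewhere
mask = tf.equal(tf.range(tf.size(t)), i)
zeroed_c = tf.where(mask, tf.zeros_like(t), t)

# Scatter-style update that returns a new tensor (t itself is unchanged)
zeroed_s = tf.tensor_scatter_nd_update(t, [[i]], [0])
```

All three build new tensors and leave t unchanged, which is why they also work on values that are not variables.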
