How can I plot different types of seaborn plots on different x ticks? - seaborn

I want to have multiple types of seaborn plots using the same y axis but with different x coordinates (see image below).
I've tried doing this multiple different ways with specifying the X-axis coordinates differently but can't seem to get it to work.
Here is an example of code that almost works:
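import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
x=[1,2,3,3,3,4,4,5,5,6] # first violin
y=[4,4,5,5,5,5,6] # second violin
z=[5,5,6] # swarmplot over second violin
c2v = {'value': [], 'key': []} # long-form storage: one row per value, tagged with its group
for data,label in [(x,'x'),(y,'y'),(z,'z')]:
    for i in data:
        c2v['value'].append(i)
        c2v['key'].append(label)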
data=pd.DataFrame(c2v)
data.head()
print(data.loc[data.key=='z'])
fig,ax=plt.subplots(1,figsize=(5,5),dpi=200)
ax = sns.violinplot(data=data.loc[data.key.isin(['x','y'])], x='key', y='value',palette=['honeydew','lightgreen'])
sns.swarmplot(x=['swarmplot']*len(data), y=data['value'], order=ax.get_xticklabels() + ['swarmplot'], ax=ax) #.loc[data.key=='z',:]
ax.set_xlabel('')
It produces the following image:
However, it is plotting all values associated with x/y/z instead of just z. When I slice the dataframe to only 'z' in the swarmplot as below, I get an error:
sns.swarmplot(x=['swarmplot']*len(data), y=data.loc[data.key=='z',:]['value'], order=ax.get_xticklabels() + ['swarmplot'], ax=ax)
KeyError: 'swarmplot'
Any suggestions?

To draw a second plot onto the same x-axis, you can pass order= a list of the existing tick labels with the new labels appended.
Here is an example:
import seaborn as sns
tips = sns.load_dataset('tips')
ax = sns.swarmplot(data=tips, x='day', y='total_bill')
sns.violinplot(x=['violin']*len(tips), y=tips['total_bill'], order=ax.get_xticklabels() + ['violin'], ax=ax)
ax.set_xlabel('')
The problem with the code in the new question is that the x= and y= arguments of the swarmplot need the same number of elements. It also seems the swarmplot resets the y limits, so I added some code to readjust those:
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
x = [1, 2, 3, 3, 3, 4, 4, 5, 5, 6] # first violin
y = [4, 4, 5, 5, 5, 5, 6] # second violin
z = [5, 5, 6] # swarmplot over second violin
data = pd.DataFrame({'value': np.concatenate([x, y, z]),
                     'key': ['x'] * len(x) + ['y'] * len(y) + ['z'] * len(z)})
fig, ax = plt.subplots(1, figsize=(5, 5))
ax = sns.violinplot(data=data.loc[data.key.isin(['x', 'y'])], x='key', y='value', palette=['honeydew', 'lightgreen'])
ymin1, ymax1 = ax.get_ylim()
swarm_data = data.loc[data.key == 'z', :]['value']
sns.swarmplot(x=['swarmplot'] * len(swarm_data), y=swarm_data, order=ax.get_xticklabels() + ['swarmplot'], ax=ax)
ymin2, ymax2 = ax.get_ylim()
ax.set_ylim(min(ymin1, ymin2), max(ymax1, ymax2))
ax.set_xlabel('')
ax.set_xticks(np.arange(3))
ax.set_xticklabels(['x', 'y', 'swarmplot'])
plt.show()
You can simplify things by directly using the data without creating a dataframe:
x = [1, 2, 3, 3, 3, 4, 4, 5, 5, 6] # first violin
y = [4, 4, 5, 5, 5, 5, 6] # second violin
z = [5, 5, 6] # swarmplot over second violin
fig, ax = plt.subplots(1, figsize=(5, 5))
ax = sns.violinplot(x=['x']*len(x) + ['y']*len(y), y=x + y, palette=['honeydew', 'lightgreen'])
ymin1, ymax1 = ax.get_ylim()
sns.swarmplot(x=['swarmplot'] * len(z), y=z, order=ax.get_xticklabels() + ['swarmplot'], ax=ax)
ymin2, ymax2 = ax.get_ylim()
ax.set_ylim(min(ymin1, ymin2), max(ymax1, ymax2))
ax.set_xticks(np.arange(3))
ax.set_xticklabels(['x', 'y', 'swarmplot'])
plt.show()

Related

pytorch: efficient way to perform operations on 2 tensors of different sizes, where one has a one-to-many relation

I have 2 tensors. The first tensor is 1D (e.g. a tensor of 3 values). The second tensor is 2D; its first column holds indices into the first tensor, in a one-to-many relationship (e.g. a tensor with a shape of (6, 2)).
# e.g. simple example of dot product
import torch
a = torch.tensor([2, 4, 3])
b = torch.tensor([[0, 2], [0, 3], [0, 1], [1, 4], [2, 3], [2, 1]]) # 1st column is the index to tensor a, 2nd column is the value
output = [(2*2)+(2*3)+(2*1),(4*4),(3*3)+(3*1)]
output = [12, 16, 12]
Currently, I find the size of each ID group in b (e.g. [3,1,2]), use torch.split to group the rows into a list of tensors, and run a for loop over the groups. That is fine for a small tensor, but when the tensors have millions of elements, with tens of thousands of arbitrary-sized groups, it becomes very slow.
Any better solutions?
You can use numpy.bincount or torch.bincount to sum the elements of b by key:
import numpy as np
a = np.array([2,4,3])
b = np.array([[0,2], [0,3], [0,1], [1,4], [2,3], [2,1]])
print( np.bincount(b[:,0], b[:,1]) )
# [6. 4. 4.]
print( a * np.bincount(b[:,0], b[:,1]) )
# [12. 16. 12.]
import torch
a = torch.tensor([2,4,3])
b = torch.tensor([[0,2], [0,3], [0,1], [1,4], [2,3], [2,1]])
torch.bincount(b[:,0], b[:,1])
# tensor([6., 4., 4.], dtype=torch.float64)
a * torch.bincount(b[:,0], b[:,1])
# tensor([12., 16., 12.], dtype=torch.float64)
References:
numpy.bincount official documentation;
torch.bincount official documentation;
How can I reduce a numpy array based on a key rather than an axis?
Another alternative in PyTorch, if a gradient is needed:
import torch
a = torch.tensor([2,4,3])
b = torch.tensor([[0,2], [0,3], [0,1], [1,4], [2,3], [2,1]])
output = torch.zeros(a.shape[0], dtype=torch.long).index_add_(0, b[:, 0], b[:, 1]) * a
Alternatively, torch.Tensor.scatter_add also works.
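For checking any of the vectorised versions above on small inputs, it can help to spell out what they compute as a plain loop. The following is only a reference sketch in pure Python; the function name grouped_weighted_sum is made up for illustration and is not part of NumPy or PyTorch.

```python
def grouped_weighted_sum(a, b):
    """For each index k, return a[k] times the sum of the values in b keyed by k.

    a: list of scale factors, one per group.
    b: list of [key, value] pairs, where key is an index into a.
    """
    totals = [0] * len(a)          # one running sum per group
    for key, value in b:
        totals[key] += value       # the same accumulation bincount does with weights
    return [scale * total for scale, total in zip(a, totals)]


a = [2, 4, 3]
b = [[0, 2], [0, 3], [0, 1], [1, 4], [2, 3], [2, 1]]
print(grouped_weighted_sum(a, b))  # [12, 16, 12]
```

This loop is far too slow for millions of rows, but it gives a ground truth to compare the bincount or index_add_ results against.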

I need to convert a 3 column array into a square matrix?

I am looking to convert my data to a square matrix:
Say your input is a list; you can then convert it to a list of lists (i.e., a proxy to a matrix) with list comprehension:
>>> x = [0, 5, 10, 5, 0, 2, 10, 2, 0]
>>> [x[3*k:3*k+3] for k in range(3)]
[[0, 5, 10], [5, 0, 2], [10, 2, 0]]
To help you parse the line: you are building a list by iterating over k from 0 to 2, where each element will be a slice of x that starts from index 3*k and ends at index 3*k+3. Thus, your list is [x[0:3], x[3:6], x[6:9]].
That said, it's much better to use numpy for all such needs. There, you would do:
>>> import numpy as np
>>> x = np.array([0, 5, 10, 5, 0, 2, 10, 2, 0])
>>> x.reshape(3, 3)
array([[ 0,  5, 10],
       [ 5,  0,  2],
       [10,  2,  0]])
The reshape() method converts your 1D array into the requested 2D matrix.
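The list comprehension above hard-codes a 3 by 3 result. A small sketch of the same slicing idea for any flat list whose length is a perfect square might look like this; to_square is a hypothetical helper name, and raising an error for other lengths is an assumption about the desired behaviour.

```python
import math


def to_square(values):
    """Split a flat sequence of n*n items into n rows of n items each."""
    n = math.isqrt(len(values))            # integer square root (Python 3.8+)
    if n * n != len(values):
        raise ValueError("length %d is not a perfect square" % len(values))
    # Row k is the slice starting at n*k and ending just before n*(k+1).
    return [list(values[n * k:n * (k + 1)]) for k in range(n)]


print(to_square([0, 5, 10, 5, 0, 2, 10, 2, 0]))
# [[0, 5, 10], [5, 0, 2], [10, 2, 0]]
```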

Why does this simple LightGBM binary classifier perform poorly?

I tried to train a LightGBM binary classifier using the Python API to learn the relation:
if feature > 5, then 1 else 0
import pandas as pd
import numpy as np
import lightgbm as lgb
x_train = pd.DataFrame([4, 7, 2, 6, 3, 1, 9])
y_train = pd.DataFrame([0, 1, 0, 1, 0, 0, 1])
x_test = pd.DataFrame([8, 2])
y_test = pd.DataFrame([1, 0])
lgb_train = lgb.Dataset(x_train, y_train)
lgb_eval = lgb.Dataset(x_test, y_test, reference=lgb_train)
params = { 'objective': 'binary', 'metric': {'binary_logloss', 'auc'}}
gbm = lgb.train(params, lgb_train, valid_sets=lgb_eval)
y_pred = gbm.predict(x_test, num_iteration=gbm.best_iteration)
y_pred
array([0.42857143, 0.42857143])
np.where((y_pred > 0.5), 1, 0)
array([0, 0])
Clearly it failed to predict the first test value, 8. Can anyone see what went wrong?
LightGBM's parameter defaults are set with the expectation of moderate-sized training data, and might not work well on extremely small datasets like the one in this question.
Two parameters in particular are affecting your result:
min_data_in_leaf: minimum number of samples that must fall into a leaf node
min_sum_hessian_in_leaf: basically, the minimum contribution to the loss function for one leaf node
Setting these to the lowest possible values can force LightGBM to overfit to such a small dataset.
import pandas as pd
import numpy as np
import lightgbm as lgb
x_train = pd.DataFrame([4, 7, 2, 6, 3, 1, 9])
y_train = pd.DataFrame([0, 1, 0, 1, 0, 0, 1])
x_test = pd.DataFrame([8, 2])
y_test = pd.DataFrame([1, 0])
lgb_train = lgb.Dataset(x_train, y_train)
lgb_eval = lgb.Dataset(x_test, y_test, reference=lgb_train)
params = {
    'objective': 'binary',
    'metric': {'binary_logloss', 'auc'},
    'min_data_in_leaf': 1,
    'min_sum_hessian_in_leaf': 0
}
gbm = lgb.train(params, lgb_train, valid_sets=lgb_eval)
y_pred = gbm.predict(x_test, num_iteration=gbm.best_iteration)
y_pred
# array([6.66660313e-01, 1.89048958e-05])
np.where((y_pred > 0.5), 1, 0)
# array([1, 0])
For details on all the parameters and their defaults, see https://lightgbm.readthedocs.io/en/latest/Parameters.html.
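To see where the original 0.42857143 comes from, the arithmetic can be checked without LightGBM at all. The sketch below assumes the documented default of min_data_in_leaf = 20, and assumes that a model which cannot split falls back to predicting the share of positive labels in the training data, which is consistent with the output in the question.

```python
y_train = [0, 1, 0, 1, 0, 0, 1]
n_rows = len(y_train)

DEFAULT_MIN_DATA_IN_LEAF = 20  # assumed LightGBM default, see the parameters page

# A split produces two leaves, and each leaf needs at least min_data_in_leaf rows.
can_split_with_defaults = n_rows >= 2 * DEFAULT_MIN_DATA_IN_LEAF
can_split_with_min_one = n_rows >= 2 * 1

# With no split possible, every row gets the same prediction: the positive rate.
base_rate = sum(y_train) / n_rows

print(can_split_with_defaults, can_split_with_min_one)  # False True
print(round(base_rate, 8))  # 0.42857143
```

Seven rows can never satisfy a minimum of 20 rows per leaf, so no tree is grown and both test rows receive 3/7.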

sympy matrix element round?

What am I doing wrong with the macro?
from sympy import *
from decimal import *
def MyMatrixRound(A):
    m = A.shape[0]
    n = A.shape[1]
    for i in range(m):
        for j in range(n):
            A[i, j]=round(A[i, j]+0.2,1)
            A[i, j] = Decimal(str(A[i, j])).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
    return A
x = Matrix(4, 3, range(12))
print(x)
print(MyMatrixRound(x))
# Matrix([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]])
# Matrix([[0.200000000000000, 1.20000000000000, 2.20000000000000], [3.20000000000000, 4.20000000000000, 5.20000000000000], [6.20000000000000, 7.20000000000000, 8.20000000000000], [9.20000000000000, 10.2000000000000, 11.2000000000000]])
#
# I want
# Matrix([[0.2, 1.2, 2.2], [3.2, 4.2, 5.2], [6.2, 7.2, 8.2], [9.2, 10.2, 11.2]])
Thank you in advance and sorry for the bad english!
2018-12-11------------------------------
FullScript.py
from sympy import *
from decimal import *
def MyMatrixPrint(A,iDec):
    var('arr')
    if iDec == 0 :
        cDec = '0'
    elif iDec == 1:
        cDec='.1'
    elif iDec == 2:
        cDec = '.01'
    elif iDec == 3:
        cDec = '.001'
    else:
        print("unknown")
    m = A.shape[0]
    n = A.shape[1]
    arr = [[0 for j in range(m+n)] for i in range(m+n)]
    cMatrix = "myMatrix([["
    for i in range(m):
        for j in range(n):
            arr[i][j] = Decimal(str(A[i,j])).quantize(Decimal(cDec), rounding=ROUND_HALF_UP)
            cMatrix = cMatrix+str(arr[i][j]) + " ,"
        cMatrix = cMatrix + "],["
    cMatrix = cMatrix + "]"
    return cMatrix
x = Matrix(4, 3, range(12))
print(" ",x)
r = x.applyfunc(lambda e: e+.2)
print(MyMatrixPrint(r,1))
# I want a little
# Matrix([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]])
# myMatrix([[0.2 ,1.2 ,2.2 ,],[3.2 ,4.2 ,5.2 ,],[6.2 ,7.2 ,8.2 ,],[9.2 ,10.2 ,11.2 ,],[]
May not be correct but at least it works
Please correct me if some of my assumptions are wrong on this.
2018-12-21------------------------------
How to round Matrix elements in sympy?
from sympy import *
x = Matrix(4, 3, range(12))
x[2, 2]=x[2, 2]+0.2
print(x)
print(x.evalf(3))
# Matrix([[0, 1, 2], [3, 4, 5], [6, 7, 8.20000000000000], [9, 10, 11]])
# Matrix([[0, 1.00, 2.00], [3.00, 4.00, 5.00], [6.00, 7.00, 8.20], [9.00, 10.0, 11.0]])
In what environment are you running this? With Windows command line I get:
>>> x = Matrix(4, 3, range(12))
>>> r=x.applyfunc(lambda e:
... Decimal(str(round(e+.2,1))).quantize(Decimal('.01'),
... rounding=ROUND_HALF_UP))
>>> r[0]
0.200000000000000
>>> r
Matrix([
[0.2, 1.2, 2.2],
[3.2, 4.2, 5.2],
[6.2, 7.2, 8.2],
[9.2, 10.2, 11.2]])
>>> r[0].round(1)
0.2
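The if/elif chain in the question's MyMatrixPrint, which maps a number of decimals to a quantize step, can be replaced by computing the step directly. This is a minimal sketch using only the standard decimal module on plain nested lists rather than a sympy Matrix; the names quantum and round_rows are made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def quantum(n_decimals):
    """Return the quantize step for n_decimals places: 0 -> 1, 1 -> 0.1, 2 -> 0.01."""
    if n_decimals < 0:
        raise ValueError("n_decimals must be non-negative")
    return Decimal(1).scaleb(-n_decimals)  # shift the exponent of 1 by -n_decimals


def round_rows(rows, n_decimals):
    """Round every entry of a nested list half-up to n_decimals places."""
    step = quantum(n_decimals)
    return [
        [Decimal(str(value)).quantize(step, rounding=ROUND_HALF_UP) for value in row]
        for row in rows
    ]


rows = [["0.25", "1.2"], ["3.14159", "10"]]
for row in round_rows(rows, 1):
    print([str(value) for value in row])
```

Going through str() before Decimal() mirrors the question's code and avoids carrying binary floating-point noise into the Decimal value.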

Manipulating matrix elements in tensorflow

How can I do the following in tensorflow?
mat = [4,2,6,2,3] #
mat[2] = 0 # simple zero the 3rd element
I can't use the [] brackets because they only work on constants and not on variables. I can't use the slice function either, because that returns a tensor and you can't assign to a tensor.
import tensorflow as tf
sess = tf.Session()
var1 = tf.Variable(initial_value=[2, 5, -4, 0])
assignZerosOP = (var1[2] = 0) # < ------ This is what I want to do
sess.run(tf.initialize_all_variables())
print sess.run(var1)
sess.run(assignZerosOP)
print sess.run(var1)
Will print
[2, 5, -4, 0]
[2, 5, 0, 0]
You can't change a tensor - but, as you noted, you can change a variable.
There are three patterns you could use to accomplish what you want:
(a) Use tf.scatter_update to directly poke the part of the variable you want to change.
import tensorflow as tf
a = tf.Variable(initial_value=[2, 5, -4, 0])
b = tf.scatter_update(a, [1], [9])
init = tf.initialize_all_variables()
with tf.Session() as s:
    s.run(init)
    print(s.run(a))
    print(s.run(b))
    print(s.run(a))
[ 2 5 -4 0]
[ 2 9 -4 0]
[ 2 9 -4 0]
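(b) Create two tf.slice()s of the tensor, excluding the item you want to change, and then tf.concat them back together with the new value in between: tf.concat(0, [a, [0], b]), or tf.concat([a, [0], b], 0) in TensorFlow 1.0 and later, where the axis comes last.
(c) Create b = tf.zeros_like(a), and then use tf.select() (renamed tf.where() in TensorFlow 1.0) to choose which items you want from a and which zeros you want from b.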
I've included (b) and (c) because they work with normal tensors, not just variables.
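Patterns (b) and (c) are easier to see on a concrete array. The sketch below uses NumPy as a stand-in for tensors, so it only illustrates the idea; it is not TensorFlow code, and the TensorFlow calls named above would take the place of np.concatenate and np.where.

```python
import numpy as np

a = np.array([2, 5, -4, 0])
i = 2  # position of the element to replace with zero

# Pattern (b): slice around the element and concatenate with the new value.
rebuilt = np.concatenate([a[:i], [0], a[i + 1:]])

# Pattern (c): build a boolean mask, then pick from zeros where it is True
# and from `a` everywhere else.
mask = np.arange(a.shape[0]) == i
selected = np.where(mask, np.zeros_like(a), a)

print(rebuilt)   # [2 5 0 0]
print(selected)  # [2 5 0 0]
```

Both approaches build a new array and leave a untouched, which is why they carry over to immutable tensors.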
