seaborn countplot: add an xtick that is the sum of all other xtick values

I'm trying to plot a seaborn countplot with parameter x and hue:
data = {"group1":[1, 2, 3, 1, 2, 3, 1, 1, 2, 2], "group2":["A", "B", "C", "A", "A", "B", "C", "B", "A", "C"]}
df = pd.DataFrame(data=data)
sns.countplot(data=df, x="group1", hue="group2")
plt.show()
Output:
I want to add another x tick to the same graph that summarizes the values across all other x ticks (the A value would be 4, the B value would be 3, and the C value would be 3).
How can I do it?

I was looking for an elegant solution to your request, but so far I have only come up with this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = {"group1":[1, 2, 3, 1, 2, 3, 1, 1, 2, 2],
"group2":["A", "B", "C", "A", "A", "B", "C", "B", "A", "C"]}
df = pd.DataFrame(data=data)
g1 = sns.countplot(data=df, x="group1", hue="group2")
count_labels = np.repeat(df["group2"].value_counts().values, # repeat group2 category counts
3) # for number of group1 categories/x-ticks
g2 = g1.twiny() # add twin axes with shared y-axis
g2.set_xticks([p.get_x() for p in g1.patches]) # place ticks at where g1 bars are
g2.set_xticklabels(count_labels) # assign tick labels
g2.set_xlabel("group2 category count")
g2.xaxis.set_ticks_position("bottom")
g2.xaxis.set_label_position("bottom")
g2.spines["bottom"].set_position(("axes", -0.2))
g2.spines["bottom"].set_visible(False)
plt.tick_params(which="both", top=False)
This is what it looks like:
So I thought you might rather want to annotate the bars:
for p, label in zip(g1.patches, count_labels):
    # write each group2 count just above the base of its bar
    g1.annotate(label, (p.get_x()+0.1, 0.1))
And it looks like this:
In case you want to use subplots:
fig, axes = plt.subplots(2, 1)
g1 = sns.countplot(data=df, x="group1", hue="group2", ax=axes[0])
g2 = sns.countplot(data=df, x="group2", ax=axes[1])
This would look this way:
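Another option, not part of the answer above, is to give the totals their own x tick by duplicating every row under an extra category before plotting. This is only a sketch: the category name "total" and the variable names `totals`, `combined` and `total_counts` are made up for illustration, and `group1` is cast to strings so the new label can share the axis with the numeric groups.

```python
import pandas as pd
import seaborn as sns

data = {"group1": [1, 2, 3, 1, 2, 3, 1, 1, 2, 2],
        "group2": ["A", "B", "C", "A", "A", "B", "C", "B", "A", "C"]}
df = pd.DataFrame(data=data)

# Copy every row under a hypothetical extra category "total",
# so the bars at that tick count all rows per group2 value.
totals = df.assign(group1="total")
combined = pd.concat([df.astype({"group1": str}), totals], ignore_index=True)

# Fix the tick and hue order explicitly so "total" is drawn last.
ax = sns.countplot(data=combined, x="group1", hue="group2",
                   order=["1", "2", "3", "total"], hue_order=["A", "B", "C"])

# The heights of the bars at the "total" tick, for checking.
total_counts = totals["group2"].value_counts().to_dict()
```

Calling plt.show() afterwards, as in the question, displays the figure; the bars at the last tick should then have the heights 4, 3 and 3 that the question asks for.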

Related

pytorch: efficient way to perform operations on 2 tensors of different sizes, where one has a one-to-many relation

I have 2 tensors. The first tensor is 1D (e.g. a tensor of 3 values). The second tensor is 2D, and its first column holds indices into the first tensor in a one-to-many relationship (e.g. a tensor with a shape of (6, 2)).
# e.g. simple example of dot product
import torch
a = torch.tensor([2, 4, 3])
b = torch.tensor([[0, 2], [0, 3], [0, 1], [1, 4], [2, 3], [2, 1]]) # 1st column is the index to tensor a, 2nd column is the value
output = [(2*2)+(2*3)+(2*1),(4*4),(3*3)+(3*1)]
output = [12, 16, 12]
Currently, I find the size of each ID group in b (e.g. [3, 1, 2]), use torch.split to group the rows into a list of tensors, and run a for loop over the groups. This is fine for a small tensor, but when the tensors have millions of elements and tens of thousands of arbitrary-sized groups, it becomes very slow.
Any better solutions?
You can use numpy.bincount or torch.bincount to sum the elements of b by key:
import numpy as np
a = np.array([2,4,3])
b = np.array([[0,2], [0,3], [0,1], [1,4], [2,3], [2,1]])
print( np.bincount(b[:,0], b[:,1]) )
# [6. 4. 4.]
print( a * np.bincount(b[:,0], b[:,1]) )
# [12. 16. 12.]
import torch
a = torch.tensor([2,4,3])
b = torch.tensor([[0,2], [0,3], [0,1], [1,4], [2,3], [2,1]])
torch.bincount(b[:,0], b[:,1])
# tensor([6., 4., 4.], dtype=torch.float64)
a * torch.bincount(b[:,0], b[:,1])
# tensor([12., 16., 12.], dtype=torch.float64)
References:
numpy.bincount official documentation;
torch.bincount official documentation;
How can I reduce a numpy array based on a key rather than an axis?
Another alternative in PyTorch, if gradients are needed:
import torch
a = torch.tensor([2,4,3])
b = torch.tensor([[0,2], [0,3], [0,1], [1,4], [2,3], [2,1]])
output = torch.zeros(a.shape[0], dtype=torch.long).index_add_(0, b[:, 0], b[:, 1]) * a
Alternatively, torch.Tensor.scatter_add also works.
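A minimal sketch of the scatter_add variant follows. It assumes floating-point tensors, since integer tensors such as the ones above cannot carry gradients; the names `idx` and `vals` are illustrative and simply hold the two columns of b.

```python
import torch

# Floating-point copies of the example data so autograd can track them.
a = torch.tensor([2.0, 4.0, 3.0], requires_grad=True)
idx = torch.tensor([0, 0, 0, 1, 2, 2])                 # first column of b
vals = torch.tensor([2.0, 3.0, 1.0, 4.0, 3.0, 1.0],    # second column of b
                    requires_grad=True)

# Sum vals into one slot per index (out-of-place, so autograd stays happy).
sums = torch.zeros_like(a).scatter_add(0, idx, vals)
output = sums * a

# Gradients flow back to both a and vals.
output.sum().backward()
```

Here a.grad holds the per-group sums and vals.grad holds the matching entry of a for each row.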

How can I plot different types of seaborn plots on different x ticks?

I want to have multiple types of seaborn plots using the same y axis but with different x coordinates (see image below).
I've tried doing this multiple different ways with specifying the X-axis coordinates differently but can't seem to get it to work.
Here is an example of almost working code
x=[1,2,3,3,3,4,4,5,5,6] # first violin
y=[4,4,5,5,5,5,6] # second violin
z=[5,5,6] # swarmplot over second violin
c2v = {'value': [], 'key': []}  # assumed initialisation; not shown in the original post
for data, label in [(x, 'x'), (y, 'y'), (z, 'z')]:
    for i in data:
        c2v['value'].append(i)
        c2v['key'].append(label)
data = pd.DataFrame(c2v)
data.head()
print(data.loc[data.key=='z'])
fig,ax=plt.subplots(1,figsize=(5,5),dpi=200)
ax = sns.violinplot(data=data.loc[data.key.isin(['x','y'])], x='key', y='value',palette=['honeydew','lightgreen'])
sns.swarmplot(x=['swarmplot']*len(data), y=data['value'], order=ax.get_xticklabels() + ['swarmplot'], ax=ax) #.loc[data.key=='z',:]
ax.set_xlabel('')
It produces the following image:
However, it is plotting all values associated with x/y/z instead of just z. When I slice the dataframe to only 'z' in the swarmplot as below, I get an error:
sns.swarmplot(x=['swarmplot']*len(data), y=data.loc[data.key=='z',:]['value'], order=ax.get_xticklabels() + ['swarmplot'], ax=ax)
KeyError: 'swarmplot'
Any suggestions?
To draw a second plot onto the same x-axis, you can use order= giving a list of existing tick labels, appending the new labels.
Here is an example:
import seaborn as sns
tips = sns.load_dataset('tips')
ax = sns.swarmplot(data=tips, x='day', y='total_bill')
sns.violinplot(x=['violin']*len(tips), y=tips['total_bill'], order=ax.get_xticklabels() + ['violin'], ax=ax)
ax.set_xlabel('')
The problem with the code in the new question is that the x= and y= of the swarmplot need the same number of elements. It also seems the swarmplot resets the y limits, so I added some code to readjust those:
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
x = [1, 2, 3, 3, 3, 4, 4, 5, 5, 6] # first violin
y = [4, 4, 5, 5, 5, 5, 6] # second violin
z = [5, 5, 6] # swarmplot over second violin
data = pd.DataFrame({'value': np.concatenate([x, y, z]),
'key': ['x'] * len(x) + ['y'] * len(y) + ['z'] * len(z)})
fig, ax = plt.subplots(1, figsize=(5, 5))
ax = sns.violinplot(data=data.loc[data.key.isin(['x', 'y'])], x='key', y='value', palette=['honeydew', 'lightgreen'])
ymin1, ymax1 = ax.get_ylim()
swarm_data = data.loc[data.key == 'z', :]['value']
sns.swarmplot(x=['swarmplot'] * len(swarm_data), y=swarm_data, order=ax.get_xticklabels() + ['swarmplot'], ax=ax)
ymin2, ymax2 = ax.get_ylim()
ax.set_ylim(min(ymin1, ymin2), max(ymax1, ymax2))
ax.set_xlabel('')
ax.set_xticks(np.arange(3))
ax.set_xticklabels(['x', 'y', 'swarmplot'])
plt.show()
You can simplify things by directly using the data without creating a dataframe:
x = [1, 2, 3, 3, 3, 4, 4, 5, 5, 6] # first violin
y = [4, 4, 5, 5, 5, 5, 6] # second violin
z = [5, 5, 6] # swarmplot over second violin
fig, ax = plt.subplots(1, figsize=(5, 5))
ax = sns.violinplot(x=['x']*len(x) + ['y']*len(y), y=x + y, palette=['honeydew', 'lightgreen'])
ymin1, ymax1 = ax.get_ylim()
sns.swarmplot(x=['swarmplot'] * len(z), y=z, order=ax.get_xticklabels() + ['swarmplot'], ax=ax)
ymin2, ymax2 = ax.get_ylim()
ax.set_ylim(min(ymin1, ymin2), max(ymax1, ymax2))
ax.set_xticks(np.arange(3))
ax.set_xticklabels(['x', 'y', 'swarmplot'])
plt.show()

Manipulating matrix elements in tensorflow

How can I do the following in tensorflow?
mat = [4,2,6,2,3] #
mat[2] = 0 # simple zero the 3rd element
I can't use the [] brackets because they only work on constants and not on variables. I can't use the slice function either, because that returns a tensor and you can't assign to a tensor.
import tensorflow as tf
sess = tf.Session()
var1 = tf.Variable(initial_value=[2, 5, -4, 0])
assignZerosOP = (var1[2] = 0) # < ------ This is what I want to do
sess.run(tf.initialize_all_variables())
print sess.run(var1)
sess.run(assignZerosOP)
print sess.run(var1)
Will print
[2, 5, -4, 0]
[2, 5, 0, 0]
You can't change a tensor - but, as you noted, you can change a variable.
There are three patterns you could use to accomplish what you want:
(a) Use tf.scatter_update to directly poke to the part of the variable you want to change.
import tensorflow as tf
a = tf.Variable(initial_value=[2, 5, -4, 0])
b = tf.scatter_update(a, [1], [9])
init = tf.initialize_all_variables()
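with tf.Session() as s:
    s.run(init)
    print(s.run(a))  # original values
    print(s.run(b))  # runs the update and returns the new values
    print(s.run(a))  # the variable itself has changed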
[ 2 5 -4 0]
[ 2 9 -4 0]
[ 2 9 -4 0]
(b) Create two tf.slice()s of the tensor, excluding the item you want to change, and then tf.concat(0, [a, 0, b]) them back together.
(c) Create b = tf.zeros_like(a), and then use tf.select() to choose which items from a you want, and which zeros from b that you want.
I've included (b) and (c) because they work with normal tensors, not just variables.
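Patterns (b) and (c) could be sketched as follows. This is not the answer's own code: it assumes TensorFlow 2.x with eager execution, where tf.concat takes the list of values first and the axis second, and where tf.select goes by the name tf.where. The names `b_result`, `mask` and `c_result` are illustrative.

```python
import tensorflow as tf

a = tf.constant([2, 5, -4, 0])

# (b) Slice around index 2 and concatenate the pieces with a replacement zero.
b_result = tf.concat([a[:2], tf.zeros([1], dtype=a.dtype), a[3:]], axis=0)

# (c) Build a boolean mask for index 2, then pick zeros where the mask is
# true and the original values everywhere else.
mask = tf.equal(tf.range(tf.size(a)), 2)
c_result = tf.where(mask, tf.zeros_like(a), a)
```

Both results are new tensors; the original tensor a is left unchanged.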

Shuffle array with exceptions

Is there a way to shuffle all elements in an array with the exception of a specified index using the shuffle function?
Without having to manually write a method, does Ruby support anything similar?
For example, say I have an array of integers:
array = [1,2,3,4,5]
and I want to shuffle the elements in any random order but leave the first int in its place. The final result could be something like:
=> [1,4,3,2,5]
Just as long as that first element remains in its place. I've obviously found workarounds by creating my own methods to do this, but I wanted to see if there was some sort of built in function that could help cut down on time and space.
The short answer is no. Based on the latest Ruby documentation of Array#shuffle, the only argument it accepts is a random number generator. So you will need to write your own method. Here's my take on it:
module ArrayExtender
  def shuffle_except(index)
    clone = self.clone
    clone.delete_at(index)
    clone.shuffle.insert(index, self[index])
  end
end
array = %w(a b c d e f)
array.extend(ArrayExtender)
print array.shuffle_except(1) # => ["e", "b", "f", "a", "d", "c"]
print array.shuffle_except(2) # => ["e", "a", "c", "b", "f", "d"]
There is no built-in function, but it's still pretty easy to do:
First element:
arr = [1, 2, 3, 4, 5]
hold = arr.shift
# => 1
arr.shuffle.unshift(hold)
# => [1, 4, 5, 2, 3]
Specific index:
arr = [1, 2, 3, 4, 5]
index = 2
hold = arr.delete_at(index)
# => 3
arr.shuffle.insert(index, hold)
# => [5, 1, 3, 2, 4]

Creating Map from a number of Sets with Map/Reduce

Suppose there are N sets of words, and I would like to create a map from each word to the number of its occurrences across all these sets.
For example:
N = 3
S1 = {"a", "b", "c"}, S2 = {"a", "b", "d"}, S3 = {"a", "c", "e"}
M = { "a" -> 3, "b" -> 2, "c" -> 2, "d" -> 1, "e" -> 1}
Now I have M computers to use. Thus, I can make each computer create a map from N/M sets. In the second (final) phase, I can create a map from the M maps. This looks like map/reduce. Does it make sense? How would you improve this approach?
This is the standard map reduce example.
For example here is Python code based on the mincemeat map/reduce library:
#!/usr/bin/env python
import mincemeat
S1 = {"a", "b", "c"}
S2 = {"a", "b", "d"}
S3 = {"a", "c", "e"}
datasource = dict(enumerate([S1,S2,S3]))
def mapfn(k, v):
    # emit (word, 1) for every word in the set
    for w in v:
        yield w, 1
def reducefn(k, vs):
    # sum the ones emitted for a given word
    result = sum(vs)
    return result
s = mincemeat.Server()
s.datasource = datasource
s.mapfn = mapfn
s.reducefn = reducefn
results = s.run_server(password="changeme")
print(results)
Prints
{'a': 3, 'c': 2, 'b': 2, 'e': 1, 'd': 1}
Note that the way that map/reduce is structured means that the server gives new tasks to clients as they complete their tasks.
This means that there is not necessarily a fixed partitioning of N/M tasks to each client.
If one client is faster than the others then it will end up being given more tasks in order to make best use of the available resources.
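For comparison, the two-phase scheme from the question can be sketched in plain Python without any map/reduce library. This is an illustration under stated assumptions, not how mincemeat works internally: the helper names `map_phase` and `reduce_phase` and the worker count are hypothetical, and the split across workers is static, unlike the dynamic task assignment described above.

```python
from collections import Counter
from functools import reduce

def map_phase(sets):
    """One worker's job: count the words in its share of the sets."""
    partial = Counter()
    for s in sets:
        partial.update(s)  # each word in a set counts once
    return partial

def reduce_phase(partials):
    """Final phase: merge the per-worker maps by adding counts."""
    return reduce(lambda left, right: left + right, partials, Counter())

all_sets = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "e"}]
num_workers = 2  # hypothetical number of computers

# Static round-robin split of the sets across workers.
shares = [all_sets[i::num_workers] for i in range(num_workers)]
partials = [map_phase(share) for share in shares]
word_counts = dict(reduce_phase(partials))
```

In a real deployment each call to map_phase would run on a different machine, and only the small partial maps would travel over the network to the final phase.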
