Suppose there are N sets of words, and I would like to build a map from each word to the number of times it occurs across all these sets.
For example:
N = 3
S1 = {"a", "b", "c"}, S2 = {"a", "b", "d"}, S3 = {"a", "c", "e"}
M = { "a" -> 3, "b" -> 2, "c" -> 2, "d" -> 1, "e" -> 1}
Now I have M computers to use. Thus, I can make each computer create a map from N/M sets. In the second (final) phase I can create a single map from the M maps. This looks like map/reduce. Does it make sense? How would you improve this approach?
This is the standard map reduce example.
For example here is Python code based on the mincemeat map/reduce library:
#!/usr/bin/env python
import mincemeat
S1 = {"a", "b", "c"}
S2 = {"a", "b", "d"}
S3 = {"a", "c", "e"}
datasource = dict(enumerate([S1,S2,S3]))
def mapfn(k, v):
    for w in v:
        yield w, 1
def reducefn(k, vs):
    result = sum(vs)
    return result
s = mincemeat.Server()
s.datasource = datasource
s.mapfn = mapfn
s.reducefn = reducefn
results = s.run_server(password="changeme")
print(results)
Prints
{'a': 3, 'c': 2, 'b': 2, 'e': 1, 'd': 1}
Note that the way that map/reduce is structured means that the server gives new tasks to clients as they complete their tasks.
This means that there is not necessarily a fixed partitioning of N/M tasks to each client.
If one client is faster than the others then it will end up being given more tasks in order to make best use of the available resources.
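For comparison, the fixed two-phase scheme described in the question can be sketched in plain Python without any map/reduce library. This is only an illustration under stated assumptions: the "computers" are simulated by ordinary function calls, and the names count_chunk, merge_counts and workers are hypothetical and not part of mincemeat.

```python
from collections import Counter

def count_chunk(sets):
    """Phase 1: one worker counts the words in its share of the sets."""
    counts = Counter()
    for s in sets:
        counts.update(s)  # a word occurs at most once per set
    return counts

def merge_counts(partials):
    """Phase 2: merge the per-worker maps into the final map."""
    return sum(partials, Counter())

sets = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "e"}]
workers = 2  # hypothetical number of machines (M in the question)

# Static partitioning: worker i gets every workers-th set, starting at i.
chunks = [sets[i::workers] for i in range(workers)]
partials = [count_chunk(chunk) for chunk in chunks]
total = merge_counts(partials)
print(dict(total))
```

Unlike the server-driven scheduling described above, this partitioning is fixed up front, so a slow worker delays the final merge.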
I'm trying to plot a seaborn countplot with parameter x and hue:
data = {"group1":[1, 2, 3, 1, 2, 3, 1, 1, 2, 2], "group2":["A", "B", "C", "A", "A", "B", "C", "B", "A", "C"]}
df = pd.DataFrame(data=data)
sns.countplot(data=df, x="group1", hue="group2")
plt.show()
Output:
I want to add another x tick to the same graph, summarizing the values across all the other x ticks (A would be 4, B would be 3, C would be 3).
How can I do it?
I was trying to find an elegant solution to your request, but so far I have only come up with this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = {"group1":[1, 2, 3, 1, 2, 3, 1, 1, 2, 2],
"group2":["A", "B", "C", "A", "A", "B", "C", "B", "A", "C"]}
df = pd.DataFrame(data=data)
g1 = sns.countplot(data=df, x="group1", hue="group2")
count_labels = np.repeat(df["group2"].value_counts().values, # repeat group2 category counts
3) # for number of group1 categories/x-ticks
g2 = g1.twiny() # add twin axes with shared y-axis
g2.set_xticks([p.get_x() for p in g1.patches]) # place ticks at where g1 bars are
g2.set_xticklabels(count_labels) # assign tick labels
g2.set_xlabel("group2 category count")
g2.xaxis.set_ticks_position("bottom")
g2.xaxis.set_label_position("bottom")
g2.spines["bottom"].set_position(("axes", -0.2))
g2.spines["bottom"].set_visible(False)
plt.tick_params(which="both", top=False)
This is what it looks like:
So I thought you might rather want to annotate the bars:
for p, label in zip(g1.patches, count_labels):
g1.annotate(label, (p.get_x()+0.1, 0.1))
And it looks like this:
In case you want to use subplots:
fig, axes = plt.subplots(2, 1)
g1 = sns.countplot(data=df, x="group1", hue="group2", ax=axes[0])
g2 = sns.countplot(data=df, x="group2", ax=axes[1])
This would look this way:
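Another possible route to the extra x tick asked for in the question is to append a copy of every row under a synthetic group1 value and let countplot count it like any other category. The sketch below assumes pandas, seaborn and matplotlib are installed; the helper names with_total_group and plot_counts and the label "all" are made up for illustration.

```python
import pandas as pd

def with_total_group(df, x="group1", label="all"):
    """Return df plus a copy of every row whose x value is replaced by label."""
    as_text = df.astype({x: str})         # string labels, so label can sit next to them
    total = as_text.assign(**{x: label})  # every row is counted once more under label
    return pd.concat([as_text, total], ignore_index=True)

def plot_counts(combined, x="group1", hue="group2", label="all"):
    """Draw the grouped count plot with the summary tick placed last."""
    # Imported here so the counting helper can be used without a plotting stack.
    import matplotlib.pyplot as plt
    import seaborn as sns
    order = sorted(combined[x].unique(), key=lambda v: (v == label, v))
    ax = sns.countplot(data=combined, x=x, hue=hue, order=order)
    plt.show()
    return ax

data = {"group1": [1, 2, 3, 1, 2, 3, 1, 1, 2, 2],
        "group2": ["A", "B", "C", "A", "A", "B", "C", "B", "A", "C"]}
df = pd.DataFrame(data=data)
combined = with_total_group(df)

# Counts per (group1, group2) pair, including the synthetic "all" group.
counts = combined.groupby(["group1", "group2"]).size()
print(counts)
```

Calling plot_counts(combined) should then draw the ticks 1, 2, 3 and all on a single axis, with the bars above all showing the totals per group2 category.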
Is there a way to shuffle all elements in an array with the exception of a specified index using the shuffle function?
Without having to manually write a method, does Ruby support anything similar?
For example, say I have an array of integers:
array = [1,2,3,4,5]
and I want to shuffle the elements in any random order but leave the first int in its place. The final result could be something like:
=> [1,4,3,2,5]
Just as long as that first element remains in its place. I've obviously found workarounds by creating my own methods to do this, but I wanted to see if there was some sort of built in function that could help cut down on time and space.
The short answer is no. According to the Ruby documentation for Array#shuffle, the only argument it accepts is a random number generator (the random: keyword). So you will need to write your own method. Here's my take on it:
module ArrayExtender
  def shuffle_except(index)
    clone = self.clone
    clone.delete_at(index)
    clone.shuffle.insert(index, self[index])
  end
end
array = %w(a b c d e f)
array.extend(ArrayExtender)
print array.shuffle_except(1) # => ["e", "b", "f", "a", "d", "c"]
print array.shuffle_except(2) # => ["e", "a", "c", "b", "f", "d"]
There is no built in function. It's still pretty easy to do that:
First element:
arr = [1, 2, 3, 4, 5]
hold = arr.shift
# => 1
arr.shuffle.unshift(hold)
# => [1, 4, 5, 2, 3]
Specific index:
arr = [1, 2, 3, 4, 5]
index = 2
hold = arr.delete_at(index)
# => 3
arr.shuffle.insert(index, hold)
# => [5, 1, 3, 2, 4]
I have a small piece of code to generate sequences, which is ok.
List = Reap[
For[i = 1, i <= 10000, i++,
Sow[RandomSample[Join[Table["a", {2}], Table["b", {2}]], 2]]];][[2, 1]];
Tally[List]
Giving the following output,
{{{"b", "b"}, 166302}, {{"b", "a"}, 333668}, {{"a", "b"}, 332964}, {{"a", "a"}, 167066}}
My problem is that I have yet to find a way to extract the frequencies from the output.
Thanks in advance for any help
Note: Generally do not start user-created Symbol names with a capital letter as these may conflict with internal functions.
It is not clear to me how you wish to transform the output. One interpretation is that you just want:
{166302, 333668, 332964, 167066}
In your code you use [[2, 1]] so I presume you know how to use Part, of which this is a short form. The documentation for Part includes:
If any of the list_i are All or ;;, all parts at that level are kept.
You could therefore use:
Tally[list][[All, 2]]
You could also use:
Last /@ Tally[list]
As george comments you can use Sort, which due to the structure of the Tally data will sort first by the item because it appears first in each list, and each list has the same length.
tally =
{{{"b","b"},166302},{{"b","a"},333668},{{"a","b"},332964},{{"a","a"},167066}};
Sort[tally][[All, 2]]
{167066, 332964, 333668, 166302}
You could also convert your data into a list of Rule objects and then pull values from a predetermined list:
rules = Rule @@@ tally
{{"b", "b"} -> 166302, {"b", "a"} -> 333668, {"a", "b"} -> 332964, {"a", "a"} -> 167066}
These could be in any order you choose:
{{"a", "a"}, {"a", "b"}, {"b", "a"}, {"b", "b"}} /. rules
{167066, 332964, 333668, 166302}
Merely to illustrate another technique if you have a specific list of items you wish to count you may find value in this Sow and Reap construct. For example, with a random list of "a", "b", "c", "d":
SeedRandom[1];
dat = RandomChoice[{"a", "b", "c", "d"}, 50];
Counting the "a" and "c" elements:
Reap[Sow[1, dat], {"a", "c"}, Tr[#2] &][[2, All, 1]]
{19, 5}
This is not as fast as Tally but it is faster than doing a Count for each element, and sometimes the syntax is useful.
So this is hurting my head, I am not very good with programming obviously. I have,
LetterArray = [a,b,c,d,e,f,g]
NumArray = [1,2,3,4,5,6,7,8,9,10]
ListOfLetters = []
and I want to take an element from NumArray and, starting at LetterArray[0], move forward that many positions (say x) in LetterArray, then add the element found there (say y) to ListOfLetters. Then, starting from y, move forward by the next number in NumArray, and so on. Finally, print ListOfLetters to the console.
My goal is for the output to be like this: [a, c, f, c, a, f, e, e, f, a].
I am drawing a blank on how to go about this in code.
Something like this (if I get your requirements right of course)?
letter_array = %w[a b c d e f g]
number_array = [1,2,3,4,5,6,7,8,9,10]
list_of_letters = []
number_array.inject(0) do |offset, delta|
  list_of_letters << letter_array[offset]
  (offset + delta) % letter_array.size
end
p list_of_letters #=> ["a", "b", "d", "g", "d", "b", "a", "a", "b", "d"]
Either I don't understand your problem description, or the example output you showed is wrong from a certain point onwards. Anyway, maybe this gets you started:
letter_array = [*?a..?g]
number_array = *1..10
list_of_letters = []
number_array.inject(0) do |s, n|
  i = s + n
  list_of_letters << letter_array[i % letter_array.size - 1]
  i
end
This produces the output ["a", "c", "f", "c", "a", "g", "g", "a", "c", "f"].
Alternatively you can also first create the indices and then use them (this doesn't require a pre-initialized list_of_letters):
indices = number_array.inject([]) { |a, n| a << (a.last || 0) + n ; a }
list_of_letters = indices.map { |i| letter_array[i%letter_array.size-1] }
ar = ('a'..'g').to_a.cycle #keeps on cycling
res = []
10.times do |n|
  n.times { ar.next } # skip n letters (nothing is skipped the first time, when n is 0)
  res << ar.next      # take the next letter and store it
end
p res #=>["a", "c", "f", "c", "a", "g", "g", "a", "c", "f"]
I have two arrays a, b of the same length:
a = [a_1, a_2, ..., a_n]
b = [b_1, b_2, ..., b_n]
When I sort a using sort_by!, the elements of a will be arranged in different order:
a.sort_by!{|a_i| some_condition(a_i)}
How can I reorder b in the same order/rearrangement as the reordering of a? For example, if a after sort_by! is
[a_3, a_6, a_1, ..., a_i_n]
then I want
[b_3, b_6, b_1, ..., b_i_n]
Edit
I need to do it in place (i.e., retain the object_id of a and b). The two answers given so far are useful in that, given the sorted arrays:
a_sorted
b_sorted
I can do
a.replace(a_sorted)
b.replace(b_sorted)
but if possible, I want to do it directly. If not, I will accept one of the answers already given.
One approach would be to zip the two arrays together and sort them at the same time. Something like this, perhaps?
a = [1, 2, 3, 4, 5]
b = %w(a b c d e)
a,b = a.zip(b).sort_by { rand }.transpose
p a #=> [3, 5, 2, 4, 1]
p b #=> ["c", "e", "b", "d", "a"]
How about:
ary_a = [ 3, 1, 2] # => [3, 1, 2]
ary_b = [ 'a', 'b', 'c'] # => ["a", "b", "c"]
ary_a.zip(ary_b).sort{ |a,b| a.first <=> b.first }.map{ |a,b| b } # => ["b", "c", "a"]
or
ary_a.zip(ary_b).sort_by(&:first).map{ |a,b| b } # => ["b", "c", "a"]
If the entries are unique, the following may work. I haven't tested it. This is partially copied from https://stackoverflow.com/a/4283318/38765
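temporary_copy = a.sort_by { |a_i| some_condition(a_i) }
new_indexes = a.map { |a_i| temporary_copy.index(a_i) }
# Enumerator has no sort_by!, so sort a copy and write it back with replace,
# which keeps the object_id of a and b.
a.replace(a.each_with_index.sort_by { |_, i| new_indexes[i] }.map(&:first))
b.replace(b.each_with_index.sort_by { |_, i| new_indexes[i] }.map(&:first))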