Get frequency of related tags in a table - calculated table or ml? - algorithm

I have a main table of multiple string tags:
["A", "B", "C", "D"]
["A", "C", "D", "G"]
["A", "F", "G", "H"]
["A", "B", "G", "H"]
...
When I create a new row and insert the first tag (for example "A"), I want to be offered the tags most frequently related to it, based on the existing rows.
In other words, for each tag (for example "A"), I want to know the frequency of its related tags and get a list of those related tags ordered from most to least frequent.
For example:
"A".get_most_frequently_related_tags()
= {"G": 3, "B": 2, "C": 2, "H": 2}
My approach is to iterate over the main table and dynamically create a new table with these contents:
[ tag, related_tag, freq ]
[ "A", "B", 2 ]
[ "A", "G", 3 ]
[ "A", "H", 2 ]
...
and then select only the rows with tag "A" to extract a hash of ordered [related_tag: freq] pairs.
Is that the best approach? I don't know if there's a better algorithm (or using machine learning?)...

Instead of a new table with one row per pair (tag, related_tag), I suggest a mapping with one row per tag, but this row maps the tag to the whole list of all its related tags (and their frequencies).
Most programming languages have a "map" type in their standard library: in C++, it's std::map or std::unordered_map; in Java, it's the interface java.util.Map, implemented as java.util.HashMap or java.util.TreeMap; in Python, it's dict.
Here is a solution in Python. The map is implemented with collections.defaultdict, and it maps each tag to a collections.Counter, the Python tool of choice for counting frequencies.
from collections import Counter, defaultdict
table = [
["A", "B", "C", "D"],
["A", "C", "D", "G"],
["A", "F", "G", "H"],
["A", "B", "G", "H"],
]
def build_frequency_table(table):
    freqtable = defaultdict(Counter)
    for row in table:
        for tag in row:
            freqtable[tag].update(row)
    for c,freq in freqtable.items():
        del freq[c]
    return freqtable
freqtable = build_frequency_table(table)
print( freqtable )
# defaultdict(<class 'collections.Counter'>,
# {'A': Counter({'G': 3, 'B': 2, 'C': 2, 'D': 2, 'H': 2, 'F': 1}),
# 'B': Counter({'A': 2, 'C': 1, 'D': 1, 'G': 1, 'H': 1}),
# 'C': Counter({'A': 2, 'D': 2, 'B': 1, 'G': 1}),
# 'D': Counter({'A': 2, 'C': 2, 'B': 1, 'G': 1}),
# 'G': Counter({'A': 3, 'H': 2, 'C': 1, 'D': 1, 'F': 1, 'B': 1}),
# 'F': Counter({'A': 1, 'G': 1, 'H': 1}),
# 'H': Counter({'A': 2, 'G': 2, 'F': 1, 'B': 1})})
print(freqtable['A'].most_common())
# [('G', 3), ('B', 2), ('C', 2), ('D', 2), ('H', 2), ('F', 1)]
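The same idea can be sketched in Ruby, in case that is closer to the asker's environment (an assumption based on the question's use of "hash"). The method names build_frequency_table and most_common are hypothetical helpers chosen for this illustration, and ties are broken alphabetically, which is an arbitrary choice rather than anything the question requires.

```ruby
table = [
  ["A", "B", "C", "D"],
  ["A", "C", "D", "G"],
  ["A", "F", "G", "H"],
  ["A", "B", "G", "H"],
]

# Hypothetical helper: maps each tag to a Hash of { related_tag => count }.
def build_frequency_table(table)
  freqtable = Hash.new { |hash, tag| hash[tag] = Hash.new(0) }
  table.each do |row|
    row.each do |tag|
      row.each do |other|
        freqtable[tag][other] += 1 unless other == tag
      end
    end
  end
  freqtable
end

# Hypothetical helper: related tags of `tag`, most frequent first.
# fetch avoids creating an entry for a tag that was never seen.
def most_common(freqtable, tag)
  freqtable.fetch(tag, {}).sort_by { |related, freq| [-freq, related] }
end

freqtable = build_frequency_table(table)
p most_common(freqtable, "A")
# [["G", 3], ["B", 2], ["C", 2], ["D", 2], ["H", 2], ["F", 1]]
```

Building the table is a single pass over the rows, so looking up suggestions for a tag afterwards only costs the sort of that tag's own counts.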

I've had a go at finding a solution for this in C#. I cannot defend this approach performance-wise, but 1) it serves the purpose (at least for inputs that are not too large); and 2) I found it to be an interesting challenge personally.
As in Stef's answer, a dictionary is created and may be used to look up any wanted tag to see all of the tag's related tags, ordered by frequency.
I've placed the dictionary creation inside an extension method:
public static IDictionary<string, List<(string Tag, int Count)>> AsRelatedTagWithFrequencyMap
    (this IEnumerable<IEnumerable<string>> relatedTags)
{
    return relatedTags
        .SelectMany(row => row
            .Select(targetTag =>
                (TargetTag: targetTag,
                 RelatedTags: row.Where(tag => tag != targetTag))))
        .GroupBy(relations => relations.TargetTag)
        .ToDictionary(
            grouping => grouping.Key,
            grouping => grouping
                .SelectMany(relations => relations.RelatedTags)
                .GroupBy(relatedTag => relatedTag)
                // inner parameter renamed so it does not shadow the outer "grouping"
                .Select(relatedGroup => (Tag: relatedGroup.Key, Count: relatedGroup.Count()))
                .OrderByDescending(relatedTag => relatedTag.Count)
                .ToList());
}
It is used as follows:
var tagsUsedWithTags = new List<string[]>
{
new[] { "A", "B", "C", "D" },
new[] { "A", "C", "D", "G" },
new[] { "A", "F", "G", "H" },
new[] { "A", "B", "G", "H" }
};
var relatedTagsOfTag = tagsUsedWithTags.AsRelatedTagWithFrequencyMap();
Printing the dictionary content:
foreach (var relation in relatedTagsOfTag)
{
Console.WriteLine($"{relation.Key}: [ {string.Join(", ", relation.Value.Select(related => $"({related.Tag}: {related.Count})"))} ]");
}
A: [ (G: 3), (B: 2), (C: 2), (D: 2), (H: 2), (F: 1) ]
B: [ (A: 2), (C: 1), (D: 1), (G: 1), (H: 1) ]
C: [ (A: 2), (D: 2), (B: 1), (G: 1) ]
D: [ (A: 2), (C: 2), (B: 1), (G: 1) ]
F: [ (A: 1), (G: 1), (H: 1) ]
G: [ (A: 3), (H: 2), (C: 1), (D: 1), (F: 1), (B: 1) ]
H: [ (A: 2), (G: 2), (F: 1), (B: 1) ]

Related

In Ruby how do I sort a hash by its key values in alphabetical order?

Suppose I have a hash,
{"c": 1, "b": 2, "a": 3}
How do I sort the hash so the elements are in order of the key value?
myh = {"c" => 1, "b" => 2, "a" => 3}
myh.sort
=> [["a", 3], ["b", 2], ["c", 1]]
{"c" => 1, "b" => 2, "a" => 3}.sort.to_h
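As a small sketch of what the lines above do: Hash#sort returns an array of [key, value] pairs sorted by key, and to_h turns it back into a hash, which keeps that order because Ruby hashes preserve insertion order. The variable names below are only for illustration. Note that the literal in the question, {"c": 1, ...}, creates symbol keys, which sort the same way.

```ruby
myh = { "c" => 1, "b" => 2, "a" => 3 }

pairs  = myh.sort       # array of [key, value] pairs, ordered by key
sorted = pairs.to_h     # back to a Hash, in sorted key order

# sort_by makes the sort criterion explicit
by_key = myh.sort_by { |key, _value| key }.to_h

# The question's literal uses symbol keys; they sort the same way
sym_sorted = { "c": 1, "b": 2, "a": 3 }.sort.to_h

p sorted      # {"a"=>3, "b"=>2, "c"=>1}
p myh.keys    # ["c", "b", "a"] (the original hash is unchanged)
```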

Strange Ruby 2+ Behavior with "select!"

I'm having an issue that I can't seem to find documented or explained anywhere so I'm hoping someone here can help me out. I've verified the unexpected behavior on three versions of Ruby, all 2.1+, and verified that it doesn't happen on an earlier version (though it's through tryruby.org and I don't know which version they're using). Anyway, for the question I'll just post some code with results and hopefully someone can help me debug it.
arr = %w( r a c e c a r ) #=> ["r","a","c","e","c","a","r"]
arr.select { |c| arr.count(c).odd? } #=> ["e"]
arr.select! { |c| arr.count(c).odd? } #=> ["e","r"] <<<<<<<<<<<<<<< ??????
I think the confusing part for me is clearly marked and if anyone can explain if this is a bug or if there's some logic to it, I'd greatly appreciate it. Thanks!
You're modifying the array while you're reading from it and iterating over it. I'm not sure the result is defined behavior. The algorithm isn't required to keep the object in any kind of sane state while it's running.
Some debug printing during the iteration shows why your particular result happens:
irb(main):005:0> x
=> ["r", "a", "c", "e", "c", "a", "r"]
irb(main):006:0> x.select! { |c| p x; x.count(c).odd? }
["r", "a", "c", "e", "c", "a", "r"]
["r", "a", "c", "e", "c", "a", "r"]
["r", "a", "c", "e", "c", "a", "r"]
["r", "a", "c", "e", "c", "a", "r"] # "e" is kept...
["e", "a", "c", "e", "c", "a", "r"] # ... and moved to the start of the array
["e", "a", "c", "e", "c", "a", "r"]
["e", "a", "c", "e", "c", "a", "r"] # now "r" is kept
=> ["e", "r"]
You can see by the final iteration, there is only one r, and that the e has been moved to the front of the array. Presumably the algorithm modifies the array in-place, moving matched elements to the front, overwriting elements that have already failed your test. It keeps track of how many elements are matched and moved, and then truncates the array down to that many elements.
So, instead, use select.
A longer example that matches more elements makes the problem a little clearer:
irb(main):001:0> nums = (1..10).to_a
=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
irb(main):002:0> nums.select! { |i| p nums; i.even? }
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[2, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[2, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[2, 4, 3, 4, 5, 6, 7, 8, 9, 10]
[2, 4, 3, 4, 5, 6, 7, 8, 9, 10]
[2, 4, 6, 4, 5, 6, 7, 8, 9, 10]
[2, 4, 6, 4, 5, 6, 7, 8, 9, 10]
[2, 4, 6, 8, 5, 6, 7, 8, 9, 10]
[2, 4, 6, 8, 5, 6, 7, 8, 9, 10]
=> [2, 4, 6, 8, 10]
You can see that it does indeed move matched elements to the front of the array, overwriting non-matched elements, and then truncate the array.
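To make the presumed mechanism concrete, here is a minimal sketch that imitates it in plain Ruby. compacting_select! is a hypothetical method written for this illustration; it is not Ruby's actual implementation, and unlike the real select! it always returns the array. It reproduces both results shown above, and the last lines show one way to avoid the problem by counting before filtering.

```ruby
# Hypothetical imitation of the behaviour described above:
# copy kept elements to the front, then truncate.
def compacting_select!(array)
  kept = 0
  (0...array.size).each do |i|
    element = array[i]
    if yield(element)
      array[kept] = element   # overwrites an element that was already visited
      kept += 1
    end
  end
  array.slice!(kept..-1)      # drop everything after the kept elements
  array
end

arr = %w( r a c e c a r )
p compacting_select!(arr) { |c| arr.count(c).odd? }
# ["e", "r"]

nums = (1..10).to_a
p compacting_select!(nums) { |i| i.even? }
# [2, 4, 6, 8, 10]

# Counting first means the block never reads the array being modified
safe = %w( r a c e c a r )
counts = safe.each_with_object(Hash.new(0)) { |c, h| h[c] += 1 }
safe.select! { |c| counts[c].odd? }
p safe
# ["e"]
```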
Just to give you some other ways of accomplishing what you're doing:
arr = %w( r a c e c a r )
arr.group_by{ |c| arr.count(c).odd? }
# => {false=>["r", "a", "c", "c", "a", "r"], true=>["e"]}
arr.group_by{ |c| arr.count(c).odd? }.values
# => [["r", "a", "c", "c", "a", "r"], ["e"]]
arr.partition{ |c| arr.count(c).odd? }
# => [["e"], ["r", "a", "c", "c", "a", "r"]]
And if you want more readable keys:
arr.group_by{ |c| arr.count(c).odd? ? :odd : :even }
# => {:even=>["r", "a", "c", "c", "a", "r"], :odd=>["e"]}
partition and group_by are basic building blocks for separating elements in an array into some sort of grouping, so it is good to be familiar with them.

How can I compare strings to an array to determine highest index value?

I have several strings that I need to compare with values in an array to determine which has the highest index number. For example, the data looks like this:
array = [2, 3, 4, 5, 6, 7, 8, 9, 10, 'J', 'Q', 'K', 'A']
v1 = "4"
v2 = "A"
v3 = "8"
How would I write it so that it would compare each value and return the fact that v2 is the winner based on the index number for A being 12?
A short version:
array = [2, 3, 4, 5, 6, 7, 8, 9, 10, 'J', 'Q', 'K', 'A']
target = [4, "A", 8]
target & array #=> [4, "A", 8]
array & target #=> [4, 8, "A"]
(array & target ).last #=> "A"
target = ["B", "C"]
(array & target ).last #=> nil
array = [2, 3, 4, 5, 6, 7, 8, 9, 10, 'J', 'Q', 'K', 'A']
v1 = "4"
v2 = "A"
v3 = "8"
array.reverse.map(&:to_s).find { |e| [v1, v2, v3].include?(e) }
# => "A"
or
array.reverse.map(&:to_s).find(&[v1, v2, v3].method(:include?))
# => "A"
You could write:
array = [2, 3, 4, 5, 6, 7, 8, 9, 10, 'J', 'Q', 'K', 'A']
a = array.map(&:to_s)
#=> ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
target = ["4", "A", "8"]
(target & a).empty? ? nil : a[target.map { |s| a.index(s) }.compact.max]
#=> "A"
target = ["B", "C"]
(target & a).empty? ? nil : a[target.map { |s| a.index(s) }.compact.max]
#=> nil
I have assumed that array may not be sorted.
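Another way to express the comparison, sketched here under the assumption that the candidates are strings as in the question, is to build a lookup from each card value to its index once and then pick the candidate with the largest index. highest_ranked is a hypothetical helper name.

```ruby
# Hypothetical helper: returns the candidate with the highest index in
# `array`, or nil if none of the candidates appears in it.
def highest_ranked(array, candidates)
  # {"2"=>0, "3"=>1, ..., "A"=>12}
  rank = array.map(&:to_s).each_with_index.to_h
  candidates.select { |c| rank.key?(c) }.max_by { |c| rank[c] }
end

array = [2, 3, 4, 5, 6, 7, 8, 9, 10, 'J', 'Q', 'K', 'A']
v1 = "4"
v2 = "A"
v3 = "8"

p highest_ranked(array, [v1, v2, v3])  # "A"
p highest_ranked(array, ["B", "C"])    # nil
```

The lookup hash avoids calling index once per candidate, which only matters if the array or the list of candidates grows large.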

Convert array into a hash

I'm trying to learn map and group_by, but it's difficult...
My array of arrays :
a = [ [1, 0, "a", "b"], [1, 1, "c", "d"], [2, 0, "e", "f"], [3, 1, "g", "h"] ]
Expected result :
b= {
1=> {0=>["a", "b"], 1=>["c", "d"]} ,
2=> {0=>["e", "f"]} ,
3=> {1=>["g", "h"]}
}
Group by the first value, the second value can just be 0 or 1.
A starting point:
a.group_by{ |e| e.shift}.map { |k, v| {k=>v.group_by{ |e| e.shift}} }
=> [{1=>{0=>[["a", "b"]], 1=>[["c", "d"]]}},
{2=>{0=>[["e", "f"]]}}, {3=>{1=>[["g", "h"]]}}]
I want to be able to look up "a" and "b" using the first two values; a hash of hashes is the only solution I've found.
Not sure if group_by is the simplest solution here:
a = [ [1, 0, "a", "b"], [1, 1, "c", "d"], [2, 0, "e", "f"], [3, 1, "g", "h"] ]
result = a.inject({}) do |acc,(a,b,c,d)|
  acc[a] ||= {}
  acc[a][b] = [c,d]
  acc
end
puts result.inspect
Will print:
{1=>{0=>["a", "b"], 1=>["c", "d"]}, 2=>{0=>["e", "f"]}, 3=>{1=>["g", "h"]}}
Also, avoid changing the items you're operating on directly (the shift calls), the collections you could be receiving in your code might not be yours to change.
If you want a somewhat custom group_by, I tend to just do it manually. group_by creates an Array of grouped values, so it creates [["a", "b"]] instead of ["a", "b"]. In addition, your code is destructive, i.e. it manipulates the value of a. That is only a bad thing if you plan on reusing a later on in its original form, but it is important to note.
As I mentioned though, you might as well just loop through a once and build the desired structure instead of doing multiple group_bys.
b = {}
a.each do |aa|
  (b[aa[0]] ||= {})[aa[1]] = aa[2..3]
end
b # => {1=>{0=>["a", "b"], 1=>["c", "d"]}, 2=>{0=>["e", "f"]}, 3=>{1=>["g", "h"]}}
With (b[aa[0]] ||= {}) we check for the existence of the key aa[0] in the Hash b. If it does not exist, we assign an empty Hash ({}) to that key. Following that, we insert the last two elements of aa (= aa[2..3]) into that Hash, with aa[1] as key.
Note that this does not account for duplicate primary + secondary keys. That is, if you have another entry [1, 1, "x", "y"] it will overwrite the entry of [1, 1, "c", "d"] because they both have keys 1 and 1. You can fix that by storing the values in an Array, but then you might as well just do a double group_by. For example, with destructive behavior on a, handling "duplicates":
# Added [1, 1, "x", "y"], removed some others
a = [ [1, 0, "a", "b"], [1, 1, "c", "d"], [1, 1, "x", "y"] ]
b = Hash[a.group_by(&:shift).map { |k, v| [k, v.group_by(&:shift) ] }]
#=> {1=>{0=>[["a", "b"]], 1=>[["c", "d"], ["x", "y"]]}}
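If the destructive shift calls are a concern, the same duplicate-preserving structure can be built without touching a. This is only a sketch of one possible variant: it destructures each row in the block parameters, and the names primary, secondary and rest are chosen here for illustration.

```ruby
a = [ [1, 0, "a", "b"], [1, 1, "c", "d"], [1, 1, "x", "y"] ]

# Each row is split into its two keys and the remaining values;
# rows sharing both keys are collected in an Array instead of overwritten.
b = a.each_with_object({}) do |(primary, secondary, *rest), acc|
  ((acc[primary] ||= {})[secondary] ||= []) << rest
end

p b
# {1=>{0=>[["a", "b"]], 1=>[["c", "d"], ["x", "y"]]}}
p a
# [[1, 0, "a", "b"], [1, 1, "c", "d"], [1, 1, "x", "y"]] (unchanged)
```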
[[1, 0, "a", "b"], [1, 1, "c", "d"], [2, 0, "e", "f"], [3, 1, "g", "h"]].
group_by{ |e| e.shift }.
map{ |k, v| [k, v.inject({}) { |h, v| h[v.shift] = v; h }] }.
to_h
#=> {1=>{0=>["a", "b"], 1=>["c", "d"]}, 2=>{0=>["e", "f"]}, 3=>{1=>["g", "h"]}}
Here's how you can do it (nondestructively) with two Enumerable#group_by's and an Object#tap. The elements of a (arrays) could vary in size, as long as each has two or more elements.
Code
def convert(arr)
  h = arr.group_by(&:first)
  h.keys.each { |k| h[k] = h[k].group_by { |a| a[1] }
                               .tap { |g| g.keys.each { |j|
                                        g[j] = g[j].first[2..-1] } } }
  h
end
Example
a = [ [1, 0, "a", "b"], [1, 1, "c", "d"], [2, 0, "e", "f"], [3, 1, "g", "h"] ]
convert(a)
#=> {1=>{0=>["a", "b"], 1=>["c", "d"]}, 2=>{0=>["e", "f"]}, 3=>{1=>["g", "h"]}}
Explanation
h = a.group_by(&:first)
#=> {1=>[[1, 0, "a", "b"], [1, 1, "c", "d"]],
# 2=>[[2, 0, "e", "f"]],
# 3=>[[3, 1, "g", "h"]]}
keys = h.keys
#=> [1, 2, 3]
The first value of keys passed into the block assigns the value 1 to the block variable k. We will set h[1] to a hash f, computed as follows.
f = h[k].group_by { |a| a[1] }
#=> [[1, 0, "a", "b"], [1, 1, "c", "d"]].group_by { |a| a[1] }
#=> {0=>[[1, 0, "a", "b"]], 1=>[[1, 1, "c", "d"]]}
We need to do further processing of this hash, so we capture it with tap and assign it to tap's block variable g (i.e., g will initially equal f above). g will be returned by the block after modification.
We have
g.keys #=> [0, 1]
so 0 is the first value passed into each's block and assigned to the block variable j. We then compute:
g[j] = g[j].first[2..-1]
#=> g[0] = [[1, 0, "a", "b"]].first[2..-1]
#=> ["a", "b"]
Similarly, when g's second key (1) is passed into the block,
g[j] = g[j].first[2..-1]
#=> g[1] = [[1, 1, "c", "d"]].first[2..-1]
#=> ["c", "d"]
Ergo,
h[1] = g
#=> {0=>["a", "b"], 1=>["c", "d"]}
h[2] and h[3] are computed similarly, giving us the desired result.

Ruby Array - Delete first 10 digits

I have an array in Ruby and I would like to delete the first 10 digits in the array.
array = [1, "a", 3, "b", 2, "c", 4, "d", 5, "a", 1, "z", 7, "e", 21, "q", 30, "a", 4, "t", 7, "m", 5, 1, 2, "q", "s", "l", 13, 46, 31]
It would ideally return
['a', 'b', 'c', 'd', 'a', 'z', 'e', 'q', 0, 'a', 4, 't', 7, 'm', 5, 1, 2, 'q', 's', 'l', 13, 46, 31]
By removing the first 10 digits (1,3,2,4,5,1,7,2,1,3).
Note that 21(2 and 1) and 30(3 and 0) both have 2 digits
Here's what I've tried
digits = array.join().scan(/\d/).first(10).map{|s|s.to_i}
=> [1,3,2,4,5,1,7,2,1,3]
elements = array - digits
This is what I got
["a", "b", "c", "d", "a", "z", "e", 21, "q", 30, "a", "t", "m", "q", "s", "l", 13, 46, 31]
Now it looks like it took the difference instead of subtracting.
I have no idea where to go from here, and now I'm lost. Any help is appreciated.
To delete 10 numbers:
10.times.each {array.delete_at(array.index(array.select{|i| i.is_a?(Integer)}.first))}
array
To delete 10 digits:
array = [1, "a", 3, "b", 2, "c", 4, "d", 5, "a", 1, "z", 7, "e", 21, "q", 30, "a", 4, "t", 7, "m", 5, 1, 2, "q", "s", "l", 13, 46, 31]
i = 10
while i > 0
  x = array.find { |item| item.is_a?(Integer) }
  break if x.nil? # no integers left to trim
  if x.to_s.length > i
    # more digits than we still need to remove: drop only the first i digits
    array[array.index(x)] = x.to_s[i..-1].to_i
  else
    array.delete_at(array.index(x))
  end
  i -= x.to_s.length
end
array
Unfortunately not a one-liner:
count = 10
array.each_with_object([]) { |e, a|
  if e.is_a?(Integer) && count > 0
    str = e.to_s                     # convert integer to string
    del = str.slice!(0, count)       # delete up to 'count' characters
    count -= del.length              # subtract number of actually deleted characters
    a << str.to_i unless str.empty?  # append remaining characters as integer if any
  else
    a << e
  end
}
#=> ["a", "b", "c", "d", "a", "z", "e", "q", 0, "a", 4, "t", 7, "m", 5, 1, 2, "q", "s", "l", 13, 46, 31]
I would be inclined to do it like this.
Code
def doit(array, max_nbr_to_delete)
  cnt = 0 # number of digits removed so far
  array.map do |e|
    if e.is_a?(Integer) && cnt < max_nbr_to_delete
      s = e.to_s
      to_remove = max_nbr_to_delete - cnt # digits still to be removed
      if s.size <= to_remove
        cnt += s.size
        nil
      else
        cnt = max_nbr_to_delete
        s[to_remove..-1].to_i
      end
    else
      e
    end
  end.compact
end
Examples
array = [ 1, "a", 3, "b", 2, "c", 4, "d", 5, "a", 1, "z", 7, "e", 21, "q",
          30, "a", 4, "t", 7, "m", 5, 1, 2, "q", "s", "l", 13, 46, 31]
doit(array, 10)
#=> ["a", "b", "c", "d", "a", "z", "e", "q", 0, "a", 4,
#    "t", 7, "m", 5, 1, 2, "q", "s", "l", 13, 46, 31]
doit(array, 100)
#=> ["a", "b", "c", "d", "a", "z", "e", "q", "a", "t", "m", "q", "s", "l"]
Explanation
Each element e of the array that is not an integer is mapped to e.
For each non-negative integer e having d digits, suppose cnt is the number of digits that have already been removed. There are three possibilities:
if cnt >= max_nbr_to_delete, no more digits are to be removed, so e (itself) is returned
if cnt + d <= max_nbr_to_delete, all d digits of e are to be removed, which is done by mapping e to nil and subsequently removing nil elements with compact
if cnt < max_nbr_to_delete and cnt + d > max_nbr_to_delete, only the first max_nbr_to_delete-cnt digits of e are removed, so e.to_s[max_nbr_to_delete-cnt..-1].to_i is returned.
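As for why array - digits in the question gave a surprising result: Array#- is a set-style difference, so it removes every element equal to any element of the right-hand array, regardless of position or how many times it occurs, and it compares whole elements rather than individual digits. A small sketch with made-up values:

```ruby
array  = [1, "a", 21, 4, "t", 4]
digits = [1, 4]

# Every 1 and every 4 is removed, wherever it appears;
# 21 stays because it is not equal to 1 (or to 4) as a whole element.
result = array - digits

p result
# ["a", 21, "t"]
```

That is why the later 4, 7, 5, 1 and 2 vanished from the asker's output while 21 and 30 were left untouched.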
