Subtract the values for each group in java 8 - java-8

I have a class Data10 with 5 fields and a list of its instances. For each unique id and valuePeriod group, I want to subtract the values as in the example below, and generate a new Data10 object with the same id, same year, a new name, the same valuePeriod, and the result of the subtraction as its value.
Note: In the example below there are only 2 names, "A" and "B". I want to compute A-B, and the name becomes "C" in the expected result.
Important note: I want A-B, not B-A.
class Data10 {
int id;
int year;
String name;
int valuePeriod;
BigDecimal value;
}
List<Data10> list = new ArrayList<>();
list.add(new Data10(1, 2020, "A", 1, new BigDecimal(5.5)));
list.add(new Data10(1, 2020, "B", 1, new BigDecimal(2.5)));
list.add(new Data10(1, 2020, "A", 2, new BigDecimal(8.5)));
list.add(new Data10(1, 2020, "B", 2, new BigDecimal(1.5)));
list.add(new Data10(1, 2020, "A", 3, new BigDecimal(6.5)));
list.add(new Data10(1, 2020, "B", 3, new BigDecimal(2.5)));
list.add(new Data10(2, 2020, "A", 1, new BigDecimal(6.5)));
list.add(new Data10(2, 2020, "B", 1, new BigDecimal(1.5)));
list.add(new Data10(2, 2020, "A", 2, new BigDecimal(9.5)));
list.add(new Data10(2, 2020, "B", 2, new BigDecimal(3.5)));
list.add(new Data10(2, 2020, "A", 3, new BigDecimal(7.5)));
list.add(new Data10(2, 2020, "B", 3, new BigDecimal(5.5)));
I tried grouping the data as shown below, but how do I subtract the values for the same valuePeriod and then generate a new object? Any help please. Thanks in advance!
list.stream().collect(Collectors.groupingBy(Data10::getId, Collectors.groupingBy(Data10::getValuePeriod)));
Expected Result:
Data10(1, 2020, "C", 1, 3.0); (Subtract the value for same valuePeriod A-B)
Data10(1, 2020, "C", 2, 7.0);
Data10(1, 2020, "C", 3, 4.0);
Data10(2, 2020, "C", 1, 5.0);
Data10(2, 2020, "C", 2, 6.0);
Data10(2, 2020, "C", 3, 2.0);

If I think of a better way I will post it, but for now this is a two-stage process.
First, create a map keyed on the name, id, valuePeriod, and year. This presumes every A has a matching B (otherwise it gets more complicated).
Map<String, Data10> map =
list.stream()
.collect(Collectors.toMap(
d -> d.getName() + d.getId()
+ d.getValuePeriod()+d.getYear(),
d -> d));
map.entrySet().forEach(System.out::println);
prints
A112020=[1, 2020, A, 1, 5.5]
A222020=[2, 2020, A, 2, 9.5]
A232020=[2, 2020, A, 3, 7.5]
A122020=[1, 2020, A, 2, 8.5]
A132020=[1, 2020, A, 3, 6.5]
B212020=[2, 2020, B, 1, 1.5]
B222020=[2, 2020, B, 2, 3.5]
B112020=[1, 2020, B, 1, 2.5]
B232020=[2, 2020, B, 3, 5.5]
A212020=[2, 2020, A, 1, 6.5]
B132020=[1, 2020, B, 3, 2.5]
B122020=[1, 2020, B, 2, 1.5]
Now stream the original list, keeping only the "A" entries (that is, filtering out the "B" values), and use the map to look up the matching A and B entries by key.
Create a new Data10 instance, copying the existing fields and setting the value to the difference A-B.
List<Data10> result = list.stream().filter(d->d.getName().equals("A"))
.map(d->new Data10(d.getId(),
d.getYear(), "C", d.getValuePeriod(),
map.get("A" + d.getId() + d.getValuePeriod()+d.getYear()).getValue()
.subtract(map.get("B" + d.getId()
+ d.getValuePeriod()+d.getYear()).getValue())))
.collect(Collectors.toList()); // Stream.toList() needs Java 16+; this form works on Java 8
result.forEach(System.out::println);
prints
[1, 2020, C, 1, 3.0]
[1, 2020, C, 2, 7.0]
[1, 2020, C, 3, 4.0]
[2, 2020, C, 1, 5.0]
[2, 2020, C, 2, 6.0]
[2, 2020, C, 3, 2.0]
I made an assumption that there could be different years that might have the same valuePeriod so it was necessary to include it in the key so as to target the appropriate values. You can of course modify to fit your exact requirements.
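The same two-stage idea can be sketched in Python (used here only for brevity; it is not the Java code above). This is a minimal sketch under a few assumptions: the `Data10` tuple and `subtract_a_minus_b` are hypothetical names, the key is a tuple rather than a concatenated string (so that, for example, id 1 with valuePeriod 12 cannot collide with id 11 with valuePeriod 2), and an "A" row without a matching "B" is simply skipped.

```python
from decimal import Decimal
from typing import NamedTuple


class Data10(NamedTuple):
    id: int
    year: int
    name: str
    value_period: int
    value: Decimal


def subtract_a_minus_b(rows):
    # Stage 1: index every row by (name, id, year, value_period).
    by_key = {(r.name, r.id, r.year, r.value_period): r for r in rows}
    result = []
    # Stage 2: walk the "A" rows and look up the matching "B" row.
    for r in rows:
        if r.name != "A":
            continue
        b = by_key.get(("B", r.id, r.year, r.value_period))
        if b is None:
            continue  # assumption: an "A" without a "B" is skipped
        result.append(Data10(r.id, r.year, "C", r.value_period, r.value - b.value))
    return result


rows = [
    Data10(1, 2020, "A", 1, Decimal("5.5")),
    Data10(1, 2020, "B", 1, Decimal("2.5")),
    Data10(1, 2020, "A", 2, Decimal("8.5")),
    Data10(1, 2020, "B", 2, Decimal("1.5")),
    Data10(2, 2020, "A", 1, Decimal("6.5")),
    Data10(2, 2020, "B", 1, Decimal("1.5")),
]
result = subtract_a_minus_b(rows)
for row in result:
    print(row)
```

Because the order of the "A" rows drives the output, the result keeps the order of the original list.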


Get frequency of related tags in a table - calculated table or ml?

I have a main table of multiple string tags:
["A", "B", "C", "D"]
["A", "C", "D", "G"]
["A", "F", "G", "H"]
["A", "B", "G", "H"]
...
When I create a new row and insert the first tag (for example "A"), I want to be offered the tags most frequently used together with it, based on the existing rows.
In other words, for each tag (for example "A"), I want to know the frequency of its related tags and get a list of related tags ordered from most to least frequent.
For example:
"A".get_most_frequently_related_tags()
= {"G": 3, "B": 2, "C": 2, "H": 2}
My approach is to iterate over the main table and dynamically create a new table with these contents:
[ tag, related_tag, freq ]
[ "A", "B", 2 ]
[ "A", "G", 3 ]
[ "A", "H", 2 ]
...
and then select only the rows with tag "A" to extract an ordered hash of [related_tag: freq].
Is that the best approach? I don't know if there's a better algorithm (or whether machine learning would help)...
Instead of a new table with one row per pair (tag, related_tag), I suggest a mapping with one row per tag, but this row maps the tag to the whole list of all its related tags (and their frequencies).
Most programming languages have a standard "map" in their standard library: in C++, it's std::map or std::unordered_map; in Java, it's the interface java.util.Map, implemented as java.util.HashMap or java.util.TreeMap; in python, it's dict.
Here is a solution in python. The map is implemented with collections.defaultdict, and it maps each tag to a collections.Counter, the python tool of choice to count frequencies.
from collections import Counter, defaultdict
table = [
["A", "B", "C", "D"],
["A", "C", "D", "G"],
["A", "F", "G", "H"],
["A", "B", "G", "H"],
]
def build_frequency_table(table):
    freqtable = defaultdict(Counter)
    for row in table:
        for tag in row:
            freqtable[tag].update(row)
    for c,freq in freqtable.items():
        del freq[c]
    return freqtable
freqtable = build_frequency_table(table)
print( freqtable )
# defaultdict(<class 'collections.Counter'>,
# {'A': Counter({'G': 3, 'B': 2, 'C': 2, 'D': 2, 'H': 2, 'F': 1}),
# 'B': Counter({'A': 2, 'C': 1, 'D': 1, 'G': 1, 'H': 1}),
# 'C': Counter({'A': 2, 'D': 2, 'B': 1, 'G': 1}),
# 'D': Counter({'A': 2, 'C': 2, 'B': 1, 'G': 1}),
# 'G': Counter({'A': 3, 'H': 2, 'C': 1, 'D': 1, 'F': 1, 'B': 1}),
# 'F': Counter({'A': 1, 'G': 1, 'H': 1}),
# 'H': Counter({'A': 2, 'G': 2, 'F': 1, 'B': 1})})
print(freqtable['A'].most_common())
# [('G', 3), ('B', 2), ('C', 2), ('D', 2), ('H', 2), ('F', 1)]
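To connect this back to the original use case (suggesting tags once the first tag of a new row is known), a small helper could wrap the lookup. This is only a sketch: `suggest_tags` and its `limit` parameter are hypothetical names, not part of the answer above, and an unknown tag is assumed to yield no suggestions.

```python
from collections import Counter, defaultdict


def build_frequency_table(table):
    # Same construction as above: each tag maps to a Counter of co-occurring tags.
    freqtable = defaultdict(Counter)
    for row in table:
        for tag in row:
            freqtable[tag].update(row)
    for tag, freq in freqtable.items():
        del freq[tag]  # a tag is not "related" to itself
    return freqtable


def suggest_tags(freqtable, tag, limit=3):
    # Hypothetical helper: the `limit` most frequent related tags, or [] if unknown.
    if tag not in freqtable:
        return []
    return [related for related, _ in freqtable[tag].most_common(limit)]


table = [
    ["A", "B", "C", "D"],
    ["A", "C", "D", "G"],
    ["A", "F", "G", "H"],
    ["A", "B", "G", "H"],
]
freqtable = build_frequency_table(table)
suggestions = suggest_tags(freqtable, "A")
print(suggestions)
```

The membership check avoids silently inserting an empty Counter into the defaultdict when an unseen tag is looked up.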
I've had a go at finding a solution for this in C#. I cannot defend this approach performance-wise, but 1) it serves the purpose (at least for inputs that are not too large); and 2) I found it to be an interesting challenge personally.
As in Stef's answer, a dictionary is created and may be used to look up any wanted tag to see all of the tag's related tags, ordered by frequency.
I've placed the dictionary creation inside an extension method:
public static IDictionary<string, List<(string Tag, int Count)>> AsRelatedTagWithFrequencyMap
(this IEnumerable<IEnumerable<string>> relatedTags)
{
return relatedTags
.SelectMany(row => row
.Select(targetTag =>
(TargetTag: targetTag,
RelatedTags: row.Where(tag => tag != targetTag))))
.GroupBy(relations => relations.TargetTag)
.ToDictionary(
grouping => grouping.Key,
grouping => grouping
.SelectMany(relations => relations.RelatedTags)
.GroupBy(relatedTag => relatedTag)
.Select(grouping => (RelatedTag: grouping.Key, Count: grouping.Count()))
.OrderByDescending(relatedTag => relatedTag.Count)
.ToList());
}
It is used as follows:
var tagsUsedWithTags = new List<string[]>
{
new[] { "A", "B", "C", "D" },
new[] { "A", "C", "D", "G" },
new[] { "A", "F", "G", "H" },
new[] { "A", "B", "G", "H" }
};
var relatedTagsOfTag = tagsUsedWithTags.AsRelatedTagWithFrequencyMap();
Printing the dictionary content:
foreach (var relation in relatedTagsOfTag)
{
Console.WriteLine($"{relation.Key}: [ {string.Join(", ", relation.Value.Select(related => $"({related.Tag}: {related.Count})"))} ]");
}
A: [ (G: 3), (B: 2), (C: 2), (D: 2), (H: 2), (F: 1) ]
B: [ (A: 2), (C: 1), (D: 1), (G: 1), (H: 1) ]
C: [ (A: 2), (D: 2), (B: 1), (G: 1) ]
D: [ (A: 2), (C: 2), (B: 1), (G: 1) ]
F: [ (A: 1), (G: 1), (H: 1) ]
G: [ (A: 3), (H: 2), (C: 1), (D: 1), (F: 1), (B: 1) ]
H: [ (A: 2), (G: 2), (F: 1), (B: 1) ]

Why d3.extent without parseInt returns the max value of "9" on column num_colors on Bob Ross dataset?

With this dataset (https://github.com/jwilber/Bob_Ross_Paintings/tree/master/data), I want to get the min and max value of the column:
num_colors.
With d3.extent() I'm able to achieve it, but the values are strings and need to be parsed first.
Why does it return "9" as the max without parseInt?
The possible values are the following for num_colors:
1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
const dataset = await d3.csv("./bob_ross_paintings.csv")
const x1 = d => d.num_colors // accessor returning the raw string
console.log(d3.extent(dataset, x1))
// ["1", "9"]
const x2 = d => parseInt(d.num_colors) // accessor returning a number
console.log(d3.extent(dataset, x2))
// [1, 15]
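Without parseInt, the num_colors values stay strings, so d3.extent compares them lexicographically (character by character) instead of numerically. "9" is greater than "15" in that ordering because the character "9" sorts after "1", so "9" is reported as the maximum.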
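The effect is not specific to d3. A minimal Python sketch (the list below just restates the possible values from the question as strings) shows the same difference between comparing text and comparing numbers:

```python
# The possible num_colors values, as they arrive from a CSV: strings.
num_colors = ["1", "3", "4", "5", "6", "7", "8", "9",
              "10", "11", "12", "13", "14", "15"]

# String comparison is character by character, so "9" beats "15".
as_text = (min(num_colors), max(num_colors))

# Converting first gives the numeric extent.
as_numbers = (min(map(int, num_colors)), max(map(int, num_colors)))

print(as_text)     # ('1', '9')
print(as_numbers)  # (1, 15)
```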

Sort Array based on other Sorted Array

I have two arrays of the same size and I sort the second one. How can I reorder the first one to match?
Basic example (imagine replacing Ints with Strings):
var array1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
var array2 = [5, 2, 3, 4, 5, 6, 8, 5, 4, 5, 1]
array2.sort({ $0 > $1 })
Result:
array2 is now [8, 6, 5, 5, 5, 5, 4, 4, 3, 2, 1]
How to sort array1's index value to match array2?
array1 should now be [6, 5, 0, 4, 7, 9, 3, 8, 2, 1, 0]
Zip2, sorted and map
array1 = map(sorted(Zip2(array1, array2), {$0.1 > $1.1}), { $0.0 })
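The zip, sort, map pattern is language independent. As a minimal sketch in Python (not Swift; the variable names simply mirror the question), pairing the arrays, sorting the pairs by the second component in descending order, and then projecting the first component looks like this:

```python
array1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
array2 = [5, 2, 3, 4, 5, 6, 8, 5, 4, 5, 1]

# Pair each element of array1 with its partner in array2,
# then sort the pairs by the array2 value, largest first.
pairs = sorted(zip(array1, array2), key=lambda pair: pair[1], reverse=True)

# Keep only the array1 side of each pair.
reordered = [first for first, _ in pairs]

print(reordered)  # [6, 5, 0, 4, 7, 9, 3, 8, 2, 1, 0]
```

Python's sort is stable, so elements with equal keys keep their original relative order; that guarantee is a property of Python and should not be assumed for every sort implementation.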
Combining filter
var array1 = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "0"]
var array2 = [5, 2, 3, 4, 5, 6, 8, 5, 4, 5, 1]
func isEven(x:Int) -> Bool {
return x % 2 == 0
}
let result = map(sorted(filter(Zip2(array1, array2), { isEven($0.1) }), {$0.1 > $1.1}), { $0.0 })
// -> ["6", "5", "3", "8", "1"]
As you can see, that line is too complex; you might prefer the Array method-chaining syntax:
let result2 = Array(Zip2(array1, array2))
.filter({ isEven($0.1) })
.sorted({ $0.1 > $1.1 })
.map({ $0.0 })
Anyway, if your array2 is [PFObject], you can implement the function something like:
func isOpen(restaurant: PFObject, forTime time: String, onDay day: Int) -> Bool {
// return `true` if the restaurant is open, `false` otherwise
}

Strange Ruby 2+ Behavior with "select!"

I'm having an issue that I can't seem to find documented or explained anywhere so I'm hoping someone here can help me out. I've verified the unexpected behavior on three versions of Ruby, all 2.1+, and verified that it doesn't happen on an earlier version (though it's through tryruby.org and I don't know which version they're using). Anyway, for the question I'll just post some code with results and hopefully someone can help me debug it.
arr = %w( r a c e c a r ) #=> ["r","a","c","e","c","a","r"]
arr.select { |c| arr.count(c).odd? } #=> ["e"]
arr.select! { |c| arr.count(c).odd? } #=> ["e","r"] <<<<<<<<<<<<<<< ??????
I think the confusing part for me is clearly marked and if anyone can explain if this is a bug or if there's some logic to it, I'd greatly appreciate it. Thanks!
You're modifying the array while you read from it and iterate over it. I'm not sure the result is defined behavior. The algorithm isn't required to keep the object in any kind of sane state while it's running.
Some debug printing during the iteration shows why your particular result happens:
irb(main):005:0> x
=> ["r", "a", "c", "e", "c", "a", "r"]
irb(main):006:0> x.select! { |c| p x; x.count(c).odd? }
["r", "a", "c", "e", "c", "a", "r"]
["r", "a", "c", "e", "c", "a", "r"]
["r", "a", "c", "e", "c", "a", "r"]
["r", "a", "c", "e", "c", "a", "r"] # "e" is kept...
["e", "a", "c", "e", "c", "a", "r"] # ... and moved to the start of the array
["e", "a", "c", "e", "c", "a", "r"]
["e", "a", "c", "e", "c", "a", "r"] # now "r" is kept
=> ["e", "r"]
You can see by the final iteration, there is only one r, and that the e has been moved to the front of the array. Presumably the algorithm modifies the array in-place, moving matched elements to the front, overwriting elements that have already failed your test. It keeps track of how many elements are matched and moved, and then truncates the array down to that many elements.
So, instead, use select.
A longer example that matches more elements makes the problem a little clearer:
irb(main):001:0> nums = (1..10).to_a
=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
irb(main):002:0> nums.select! { |i| p nums; i.even? }
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[2, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[2, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[2, 4, 3, 4, 5, 6, 7, 8, 9, 10]
[2, 4, 3, 4, 5, 6, 7, 8, 9, 10]
[2, 4, 6, 4, 5, 6, 7, 8, 9, 10]
[2, 4, 6, 4, 5, 6, 7, 8, 9, 10]
[2, 4, 6, 8, 5, 6, 7, 8, 9, 10]
[2, 4, 6, 8, 5, 6, 7, 8, 9, 10]
=> [2, 4, 6, 8, 10]
You can see that it does indeed move matched elements to the front of the array, overwriting non-matched elements, and then truncate the array.
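The presumed algorithm can be modelled in a few lines. The sketch below is in Python and is only a model that is consistent with the debug output above, not Ruby's actual implementation; `select_in_place` is a hypothetical name.

```python
def select_in_place(items, keep):
    # Move every kept element to the front, overwriting rejected ones,
    # then truncate the list to the number of kept elements.
    write = 0
    for read in range(len(items)):
        item = items[read]
        if keep(item):
            items[write] = item
            write += 1
    del items[write:]
    return items


arr = list("racecar")
# The predicate inspects the same list that is being rewritten,
# so it sees the partially compacted state.
in_place = select_in_place(arr, lambda c: arr.count(c) % 2 == 1)

word = list("racecar")
# Filtering into a new list never observes intermediate state.
copied = [c for c in word if word.count(c) % 2 == 1]

print(in_place)  # ['e', 'r']
print(copied)    # ['e']
```

When "e" is moved to index 0 it overwrites the first "r", so by the time the last "r" is tested it occurs only once and passes the odd-count test.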
Just to give you some other ways of accomplishing what you're doing:
arr = %w( r a c e c a r )
arr.group_by{ |c| arr.count(c).odd? }
# => {false=>["r", "a", "c", "c", "a", "r"], true=>["e"]}
arr.group_by{ |c| arr.count(c).odd? }.values
# => [["r", "a", "c", "c", "a", "r"], ["e"]]
arr.partition{ |c| arr.count(c).odd? }
# => [["e"], ["r", "a", "c", "c", "a", "r"]]
And if you want more readable keys:
arr.group_by{ |c| arr.count(c).odd? ? :odd : :even }
# => {:even=>["r", "a", "c", "c", "a", "r"], :odd=>["e"]}
partition and group_by are basic building blocks for separating elements in an array into some sort of grouping, so it is good to be familiar with them.

Ruby Array - Delete first 10 digits

I have an array in Ruby and I would like to delete the first 10 digits in the array.
array = [1, "a", 3, "b", 2, "c", 4, "d", 5, "a", 1, "z", 7, "e", 21, "q", 30, "a", 4, "t", 7, "m", 5, 1, 2, "q", "s", "l", 13, 46, 31]
It would ideally return
['a', 'b', 'c', 'd', 'a', 'z', 'e', 'q', 0, 'a', 4, 't', 7, 'm', 5, 1, 2, 'q', 's', 'l', 13, 46, 31]
By removing the first 10 digits (1,3,2,4,5,1,7,2,1,3).
Note that 21(2 and 1) and 30(3 and 0) both have 2 digits
Here's what I've tried
digits = array.join().scan(/\d/).first(10).map{|s|s.to_i}
=> [1,3,2,4,5,1,7,2,1,3]
elements = array - digits
This is what I got
["a", "b", "c", "d", "a", "z", "e", 21, "q", 30, "a", "t", "m", "q", "s", "l", 13, 46, 31]
It looks like this computed the array difference (removing every element equal to one of those digits) instead of removing only the first 10 digits.
I have no idea where to go from here. Any help is appreciated.
To delete 10 numbers:
10.times.each {array.delete_at(array.index(array.select{|i| i.is_a?(Integer)}.first))}
array
To delete 10 digits:
array = [1, "a", 3, "b", 2, "c", 4, "d", 5, "a", 1, "z", 7, "e", 21, "q", 30, "a", 4, "t", 7, "m", 5, 1, 2, "q", "s", "l", 13, 46, 31]
i = 10
while (i > 0) do
x = array.select{|item| item.is_a?(Integer)}.first
if x.to_s.length > i
y = array.index(x)
array[y] = x.to_s[i..-1].to_i # drop the first i digits, keep the rest
else
array.delete_at(array.index(x))
end
i -= x.to_s.length
end
array
Unfortunately not a one-liner:
count = 10
array.each_with_object([]) { |e, a|
if e.is_a?(Integer) && count > 0
str = e.to_s # convert integer to string
del = str.slice!(0, count) # delete up to 'count' characters
count -= del.length # subtract number of actually deleted characters
a << str.to_i unless str.empty? # append remaining characters as integer if any
else
a << e
end
}
#=> ["a", "b", "c", "d", "a", "z", "e", "q", 0, "a", 4, "t", 7, "m", 5, 1, 2, "q", "s", "l", 13, 46, 31]
I would be inclined to do it like this.
Code
def doit(array, max_nbr_to_delete)
cnt = 0
array.map do |e|
if (e.is_a? Integer) && cnt < max_nbr_to_delete
cnt += e.to_s.size
if cnt <= max_nbr_to_delete
nil
else
e.to_s[max_nbr_to_delete-cnt..-1].to_i # keep only the last (cnt - max_nbr_to_delete) digits
end
else
e
end
end.compact
end
Examples
array = [ 1, "a", 3, "b", 2, "c", 4, "d", 5, "a", 1, "z", 7, "e", 21, "q",
30, "a", 4, "t", 7, "m", 5, 1, 2, "q", "s", "l", 13, 46, 31]
doit(array, 10)
#=> ["a", "b", "c", "d", "a", "z", "e", "q", 0, "a", 4,
# "t", 7, "m", 5, 1, 2, "q", "s", "l", 13, 46, 31]
doit(array, 100)
#=> ["a", "b", "c", "d", "a", "z", "e", "q", "a", "t", "m", "q", "s", "l"]
Explanation
Each element e of the array that is not an integer is mapped to e.
For each non-negative integer e having d digits, suppose cnt is the number of digits that map has already removed. There are three possibilities:
if cnt >= max_nbr_to_delete, no more digits are to be removed, so e (itself) is returned
if cnt + d <= max_nbr_to_delete all d digits of e are to be removed, which is done by mapping e to nil and subsequently removing nil elements
if cnt < max_nbr_to_delete and cnt + d > max_nbr_to_delete, e is mapped to the integer formed by its last cnt + d - max_nbr_to_delete digits (i.e. the first max_nbr_to_delete - cnt digits of e are removed).
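The three cases above can be checked with a short sketch. It is written in Python rather than Ruby, `drop_leading_digits` is a hypothetical name, and it assumes every integer in the list is non-negative.

```python
def drop_leading_digits(items, max_digits):
    remaining = max_digits  # digits still to be removed
    result = []
    for item in items:
        if isinstance(item, int) and remaining > 0:
            digits = str(item)
            kept = digits[remaining:]  # what survives after removing up to `remaining` digits
            remaining -= len(digits) - len(kept)
            if kept:  # all digits removed: drop the element entirely
                result.append(int(kept))
        else:
            result.append(item)
    return result


array = [1, "a", 3, "b", 2, "c", 4, "d", 5, "a", 1, "z", 7, "e", 21, "q",
         30, "a", 4, "t", 7, "m", 5, 1, 2, "q", "s", "l", 13, 46, 31]
trimmed = drop_leading_digits(array, 10)
print(trimmed)
```

A partially removed number such as 30 keeps its trailing digits (here 0), and a three-digit number with one digit left to remove, for example 345, would keep 45.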
