How to use streams to get a running count? - java-8

I have a class
class Person {
String name;
....
Optional<Integer> children;
}
How do I use streams to get a total count of all children?
public int totalCount(final Set<Person> people) {
int total = 0;
for (Person person : people) {
if (person.getChildren().isPresent()) {
total += person.getChildren().get();
}
}
return total;
}
How can I do this with Java 8 streams?
public int totalCount(final Set<Person> people) {
int total = 0;
people.stream()
.filter(p -> p.getChildren().isPresent())
// ???
}

Alternative:
int sum = people.stream().mapToInt( p -> p.getChildren().orElse(0) ).sum();

You can use Collectors.summingInt:
int count = people.stream()
.filter(p -> p.getChildren().isPresent())
.collect(Collectors.summingInt(p -> p.getChildren().get()));

Another variant would be to use mapToInt in order to obtain an IntStream and then call sum() on it:
int count = people.stream()
.filter(p -> p.getChildren().isPresent())
.mapToInt(p -> p.getChildren().get())
.sum();
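For reference, here is a minimal self-contained sketch of the same idea. The Person class below is a hypothetical stand-in for the one in the question (only name and children are modelled), and ChildrenCountSketch is an illustrative name, not something from the original post.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

final class ChildrenCountSketch {

    // Hypothetical stand-in for the question's Person class.
    static final class Person {
        private final String name;
        private final Optional<Integer> children;

        Person(String name, Optional<Integer> children) {
            this.name = name;
            this.children = children;
        }

        String getName() { return name; }
        Optional<Integer> getChildren() { return children; }
    }

    // Sums the children of everyone who has a value present; empty Optionals are skipped.
    static int totalCount(Set<Person> people) {
        return people.stream()
                .map(Person::getChildren)
                .filter(Optional::isPresent)
                .mapToInt(Optional::get)
                .sum();
    }

    public static void main(String[] args) {
        Set<Person> people = new HashSet<>(Arrays.asList(
                new Person("Ann", Optional.of(2)),
                new Person("Bob", Optional.empty()),
                new Person("Cid", Optional.of(3))));
        System.out.println(totalCount(people));
    }
}
```

Mapping to the Optional first means getChildren() is called once per person rather than twice.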

Related

Big O of union method of dynamic connectivity problem

I ran into a puzzle while analysing this function: why is its time complexity N^2 and not N?
public void union(int a, int b) {
int aid = ids[a];
int bid = ids[b];
for (int i = 0; i < ids.length; i++) {
if (ids[i] == aid) {
ids[i] = bid;
}
}
}
It's an implementation of the eager approach to solving the dynamic connectivity problem. The complete code is:
// Union method has N^2 time complexity!!
class EagerApproach extends UnionFind {
protected int[] ids;
EagerApproach(int[] input) {
super(input);
ids = new int[input.length];
System.arraycopy(input, 0, ids, 0, input.length);
}
public boolean connected(int a, int b) {
return ids[a] == ids[b];
}
public void union(int a, int b) {
int aid = ids[a];
int bid = ids[b];
for (int i = 0; i < ids.length; i++) {
if (ids[i] == aid) {
ids[i] = bid;
}
}
}
public int[] getIds() {
return ids;
}
}
Provided your array access ids[x] is in constant time O(1), the time complexity of the union method is linear in the length of the array ids. So
O(ids.length)
or O(n) if we define n as ids.length.
Be careful with the definition of n and ids though. If, in your specific application, n is defined such that ids.length = n * n, then this is obviously O(n^2), with n being sqrt(ids.length).
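To check the linear claim empirically, the sketch below counts loop iterations. The class EagerUnionCost, its loopIterations counter, and the costOfChaining helper are hypothetical additions for illustration and are not part of the original code. One possible reading of the "N^2" comment (an assumption, since the question does not say where it comes from) is that it describes the total cost of roughly N union calls rather than one call.

```java
final class EagerUnionCost {
    private final int[] ids;
    private long loopIterations = 0; // hypothetical counter, not in the original class

    EagerUnionCost(int n) {
        ids = new int[n];
        for (int i = 0; i < n; i++) {
            ids[i] = i; // every element starts in its own component
        }
    }

    // Same logic as the union method in the question, plus the counter.
    void union(int a, int b) {
        int aid = ids[a];
        int bid = ids[b];
        for (int i = 0; i < ids.length; i++) {
            loopIterations++;
            if (ids[i] == aid) {
                ids[i] = bid;
            }
        }
    }

    boolean connected(int a, int b) {
        return ids[a] == ids[b];
    }

    long loopIterations() {
        return loopIterations;
    }

    // Chains 0..n-1 together with n-1 union calls and returns the total iteration count.
    static long costOfChaining(int n) {
        EagerUnionCost uf = new EagerUnionCost(n);
        for (int i = 1; i < n; i++) {
            uf.union(i - 1, i);
        }
        return uf.loopIterations();
    }

    public static void main(String[] args) {
        System.out.println(costOfChaining(1000));
    }
}
```

A single call always performs exactly ids.length iterations, so the per-call cost is linear; only the sum over many calls grows quadratically.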

Shortest uncommon subsequence

Given two strings s and t, determine the length of the shortest string z such that z is a subsequence of s and not a subsequence of t.
Example:
s: babab
t: babba
Solution: 3 (aab)
I'm not looking for copy-pastable code; I'd appreciate help with the intuition for solving this.
Thanks a lot!
Here you go. I created an IEnumerable method that yields all possible combinations, each of which is compared with t. I optimized the solution to loop only once over the non-matching string t.
using System;
using System.Collections.Generic;
namespace GuessTheNumber
{
public class Element:IComparable<Element>
{
public string Seq { get; set; }
public int Id { get; set; }
public int CompareTo(Element other)
{
return this.Seq.CompareTo(other.Seq);
}
}
class Program
{
static void Main(string[] args)
{
string s = "babab";
string t = "babba";
string z = ShortestUncommonSuqsequence(s, t);
}
static public string ShortestUncommonSuqsequence(string SubSequenceOf, string NotSubSequenceOf)
{
var uniqueSeq = new SortedList<Element, int>();
uniqueSeq.Add(new Element() { Seq = "", Id = -1 }, -1);
foreach (Element oneSequence in GetNextUniqueSequences(uniqueSeq, SubSequenceOf))
{
int index = oneSequence.Id + 1;
while (index < NotSubSequenceOf.Length)
{
char NotChar = NotSubSequenceOf[index];
if (oneSequence.Seq[oneSequence.Seq.Length - 1] == NotChar) break;
index++;
}
if (index == NotSubSequenceOf.Length)
{
return oneSequence.Seq;
}
else
{
oneSequence.Id = index;
}
}
return null;
}
static public IEnumerable<Element> GetNextUniqueSequences(SortedList<Element, int> UniqueSeq, string Input)
{
SortedList<Element, int> results = new SortedList<Element, int>();
foreach (var prevResult in UniqueSeq)
{
for (int i = 0; i < Input.Length; i++)
{
if (prevResult.Value < prevResult.Key.Seq.Length + i)
{
string nextStr = prevResult.Key.Seq + Input[i].ToString();
Element newElem = new Element() { Seq = nextStr, Id = prevResult.Key.Id };
if (!results.Keys.Contains(newElem))
{
results.Add(newElem, prevResult.Key.Seq.Length + i);
yield return newElem;
}
}
}
}
if (Input.Length > 1)
{
foreach (Element res in GetNextUniqueSequences(results, Input.Substring(1)))
{
yield return res;
}
}
}
}
}
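For the intuition the question asks about, a dynamic-programming view may help; this is a different technique from the enumeration above, sketched here in Java under stated assumptions. Let dp[i][j] be the length of the shortest subsequence of s[i..] that is not a subsequence of t[j..]. For each character s[i] there are two choices: skip it, or use it as the next character of z, in which case t is consumed greedily up to the first occurrence of that character at or after j. If there is no such occurrence, that single character already fails to match, so the cost is 1. The class name, method name, and the -1 return value for "no such z exists" are illustrative choices.

```java
import java.util.Arrays;

final class ShortestUncommonSubsequence {
    private static final int INF = Integer.MAX_VALUE / 2; // "no valid z" marker, safe to add 1 to

    // Returns the length of the shortest z that is a subsequence of s but not of t,
    // or -1 if every subsequence of s is also a subsequence of t.
    static int shortestLength(String s, String t) {
        int n = s.length();
        int m = t.length();
        int[][] dp = new int[n + 1][m + 1];
        Arrays.fill(dp[n], INF); // nothing left in s: no z can be built

        for (int i = n - 1; i >= 0; i--) {
            for (int j = m; j >= 0; j--) {
                int skip = dp[i + 1][j];            // do not use s[i]
                int k = t.indexOf(s.charAt(i), j);  // first match of s[i] in t[j..], or -1
                int take = (k < 0) ? 1 : 1 + dp[i + 1][k + 1];
                dp[i][j] = Math.min(INF, Math.min(skip, take));
            }
        }
        return dp[0][0] >= INF ? -1 : dp[0][0];
    }

    public static void main(String[] args) {
        System.out.println(shortestLength("babab", "babba")); // the question's example
    }
}
```

The indexOf call makes this sketch cubic in the worst case; precomputing a next-occurrence table for t would bring it down to O(|s| * |t|).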

Merge two text input files, each line of the files one after the other. See example

I was trying to solve a problem using java 8 that I have already solved using a simple for loop. However I have no idea how to do this.
The Problem is :
File1 :
1,sdfasfsf
2,sdfhfghrt
3,hdfxcgyjs
File2 :
10,xhgdfgxgf
11,hcvcnhfjh
12,sdfgasasdfa
13,ghdhtfhdsdf
Output should be like
1,sdfasfsf
10,xhgdfgxgf
2,sdfhfghrt
11,hcvcnhfjh
3,hdfxcgyjs
12,sdfgasasdfa
13,ghdhtfhdsdf
I already have this basically working,
The core logic is :
List<String> left = readFile(lhs);
List<String> right = readFile(rhs);
int leftSize = left.size();
int rightSize = right.size();
int size = leftSize > rightSize? leftSize : right.size();
for (int i = 0; i < size; i++) {
if(i < leftSize) {
merged.add(left.get(i));
}
if(i < rightSize) {
merged.add(right.get(i));
}
}
MergeInputs.java
UnitTest
Input files are in src/test/resources/com/linux/test/merge/list of the same repo (only allowed to post two links)
However, I boasted I could do this easily using streams and now I am not sure if this can even be done.
Help is really appreciated.
You may simplify your operation to have fewer conditionals per element:
int leftSize = left.size(), rightSize = right.size(), min = Math.min(leftSize, rightSize);
List<String> merged = new ArrayList<>(leftSize+rightSize);
for(int i = 0; i < min; i++) {
merged.add(left.get(i));
merged.add(right.get(i));
}
if(leftSize!=rightSize) {
merged.addAll(
(leftSize<rightSize? right: left).subList(min, Math.max(leftSize, rightSize)));
}
Then, you may replace the first part by a stream operation:
int leftSize = left.size(), rightSize = right.size(), min = Math.min(leftSize, rightSize);
List<String> merged=IntStream.range(0, min)
.mapToObj(i -> Stream.of(left.get(i), right.get(i)))
.flatMap(Function.identity())
.collect(Collectors.toCollection(ArrayList::new));
if(leftSize!=rightSize) {
merged.addAll(
(leftSize<rightSize? right: left).subList(min, Math.max(leftSize, rightSize)));
}
But it isn’t really simpler than the loop variant. The loop variant may be even more efficient due to its presized list.
Incorporating both operations into one stream operation would be even more complicated (and probably even less efficient).
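For comparison, a single-pipeline version could look like the sketch below. InterleaveSketch and interleave are hypothetical names, and this is only one possible formulation: it iterates up to the longer list's size and, for each index, emits whichever lists still have an element there. It assumes both lists offer cheap random access (for example ArrayList).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class InterleaveSketch {

    // Alternates elements of left and right; leftovers of the longer list follow in order.
    static <T> List<T> interleave(List<T> left, List<T> right) {
        int max = Math.max(left.size(), right.size());
        return IntStream.range(0, max)
                .boxed()
                .flatMap(i -> Stream.of(left, right)
                        .filter(list -> i < list.size()) // skip a list that has run out
                        .map(list -> list.get(i)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("1,sdfasfsf", "2,sdfhfghrt", "3,hdfxcgyjs");
        List<String> right = Arrays.asList("10,xhgdfgxgf", "11,hcvcnhfjh",
                "12,sdfgasasdfa", "13,ghdhtfhdsdf");
        interleave(left, right).forEach(System.out::println);
    }
}
```

It creates a small stream per index, so it is unlikely to beat the plain loop.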
The code logic should look like this:
int leftSize = left.size();
int rightSize = right.size();
int minSize = Math.min(leftSize,rightSize);
for (int i = 0; i < minSize; i++) {
merged.add(left.get(i));
merged.add(right.get(i));
}
// adding remaining elements
merged.addAll(
minSize < leftSize ? left.subList(minSize, leftSize)
: right.subList(minSize, rightSize)
);
Another option is using toggle mode through Iterator, for example:
toggle(left, right).forEachRemaining(merged::add);
//OR using stream instead
List<String> merged = Stream.generate(toggle(left, right)::next)
.limit(left.size() + right.size())
.collect(Collectors.toList());
The toggle method is as follows:
<T> Iterator<? extends T> toggle(List<T> left, List<T> right) {
return new Iterator<T>() {
private final int RIGHT = 1;
private final int LEFT = 0;
int cursor = -1;
Iterator<T>[] pair = arrayOf(left.iterator(), right.iterator());
@SafeVarargs
private final Iterator<T>[] arrayOf(Iterator<T>... iterators) {
return iterators;
}
@Override
public boolean hasNext() {
for (Iterator<T> each : pair) {
if (each.hasNext()) {
return true;
}
}
return false;
}
@Override
public T next() {
return pair[cursor = next(cursor)].next();
}
private int next(int cursor) {
cursor=pair[LEFT].hasNext()?pair[RIGHT].hasNext()?cursor: RIGHT:LEFT;
return (cursor + 1) % pair.length;
}
};
}

can I get filtered list as a parameter to filter predicate in java 8?

I am trying to TDD Uncle Bob's prime factors kata in Java 8 style.
I got the code working, and I want to add an improvement: checking whether a number is prime by looking only at the previously found prime numbers rather than at all numbers.
public class PrimeFactorsGenerator {
public static List<Integer> generate(int number) {
return createListOfNumberUntil(number).stream().
filter(x -> isDivided(number, x)).
filter(x -> isPrime(x)).
collect(Collectors.toList());
}
private static boolean isPrime(Integer t) {
for (int i = 2; i < t; i++) {
if(isDivided(t, i)){
return false;
}
}
return true;
}
private static List<Integer> createListOfNumberUntil(int number) {
List<Integer> $ = Lists.newArrayList();
for (int i = 2; i <= number; i++) {
$.add(i);
}
return $;
}
private static boolean isDivided(int num, int i) {
return num % i == 0;
}
}
Is it possible to get the filtered list as a parameter to the isPrime method?
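One way to approximate this, offered only as a sketch under assumptions, is to keep the primes found so far in a list that the primality check consults. PrimeFactorsSketch and its local primes list are illustrative names that do not appear in the original code. The approach relies on candidates being visited sequentially in ascending order, which is why it uses forEachOrdered on a sequential stream; it would not be safe on a parallel stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

final class PrimeFactorsSketch {

    // Returns the distinct prime divisors of number, in ascending order,
    // matching what the question's generate method computes.
    static List<Integer> generate(int number) {
        List<Integer> primes = new ArrayList<>();  // primes seen so far (mutable state)
        List<Integer> factors = new ArrayList<>();

        IntStream.rangeClosed(2, number).forEachOrdered(candidate -> {
            // A candidate is prime if no smaller prime divides it.
            boolean isPrime = primes.stream().noneMatch(p -> candidate % p == 0);
            if (isPrime) {
                primes.add(candidate);
                if (number % candidate == 0) {
                    factors.add(candidate);
                }
            }
        });
        return factors;
    }

    public static void main(String[] args) {
        System.out.println(generate(12));
    }
}
```

Because the lambda mutates outside state, this gives up the side-effect-free style of a filter chain; it trades that for never testing divisibility by composite numbers.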

Performance of Mass-Evaluating Expressions in IronPython

In a C# 4.0 application, I have a Dictionary of strongly typed ILists that all have the same length: a dynamic, strongly typed, column-based table.
I want the user to provide one or more (python-)expressions based on the available columns that will be aggregated over all rows. In a static context it would be:
IDictionary<string, IList> table;
// ...
IList<int> a = table["a"] as IList<int>;
IList<int> b = table["b"] as IList<int>;
double sum = 0;
for (int i = 0; i < n; i++)
sum += (double)a[i] / b[i]; // Expression to sum up
For n = 10^7 this runs in 0.270 sec on my laptop (win7 x64). Replacing the expression by a delegate with two int arguments it takes 0.580 sec, for a nontyped delegate 1.19 sec.
Creating the delegate from IronPython with
IDictionary<string, IList> table;
// ...
var options = new Dictionary<string, object>();
options["DivisionOptions"] = PythonDivisionOptions.New;
var engine = Python.CreateEngine(options);
string expr = "a / b";
Func<int, int, double> f = engine.Execute("lambda a, b : " + expr);
IList<int> a = table["a"] as IList<int>;
IList<int> b = table["b"] as IList<int>;
double sum = 0;
for (int i = 0; i < n; i++)
sum += f(a[i], b[i]);
it takes 3.2 sec (and 5.1 sec with Func<object, object, object>) - factor 4 to 5.5. Is this the expected overhead for what I'm doing? What could be improved?
If I have many columns, the approach chosen above will not be sufficient any more. One solution could be to determine the required columns for each expression and use only those as arguments. The other solution I've unsuccessfully tried was using a ScriptScope and dynamically resolve the columns. For that I defined a RowIterator that has a RowIndex for the active row and a property for each column.
class RowIterator
{
IList<int> la;
IList<int> lb;
public RowIterator(IList<int> a, IList<int> b)
{
this.la = a;
this.lb = b;
}
public int RowIndex { get; set; }
public int a { get { return la[RowIndex]; } }
public int b { get { return lb[RowIndex]; } }
}
A ScriptScope can be created from an IDynamicMetaObjectProvider, which I expected C#'s dynamic to implement, but at runtime the call resolves to engine.CreateScope(IDictionary) instead, which fails.
dynamic iterator = new RowIterator(a, b) as dynamic;
var scope = engine.CreateScope(iterator);
var expr = engine.CreateScriptSourceFromString("a / b").Compile();
double sum = 0;
for (int i = 0; i < n; i++)
{
iterator.RowIndex = i;
sum += expr.Execute<double>(scope);
}
Next I let RowIterator inherit from DynamicObject and got a running example, but with terrible performance: 158 sec.
class DynamicRowIterator : DynamicObject
{
Dictionary<string, object> members = new Dictionary<string, object>();
IList<int> la;
IList<int> lb;
public DynamicRowIterator(IList<int> a, IList<int> b)
{
this.la = a;
this.lb = b;
}
public int RowIndex { get; set; }
public int a { get { return la[RowIndex]; } }
public int b { get { return lb[RowIndex]; } }
public override bool TryGetMember(GetMemberBinder binder, out object result)
{
if (binder.Name == "a") // Why does this happen?
{
result = this.a;
return true;
}
if (binder.Name == "b")
{
result = this.b;
return true;
}
if (base.TryGetMember(binder, out result))
return true;
if (members.TryGetValue(binder.Name, out result))
return true;
return false;
}
public override bool TrySetMember(SetMemberBinder binder, object value)
{
if (base.TrySetMember(binder, value))
return true;
members[binder.Name] = value;
return true;
}
}
I was surprised that TryGetMember is called with the name of the properties. From the documentation I would have expected that TryGetMember would only be called for undefined properties.
For sensible performance I would probably need to implement IDynamicMetaObjectProvider for my RowIterator to make use of dynamic CallSites, but I couldn't find a suitable example to start from. In my experiments I didn't know how to handle __builtins__ in BindGetMember:
class Iterator : IDynamicMetaObjectProvider
{
IList<int> la;
IList<int> lb;
public Iterator(IList<int> a, IList<int> b)
{
this.la = a;
this.lb = b;
}
public int RowIndex { get; set; }
public int a { get { return la[RowIndex]; } }
public int b { get { return lb[RowIndex]; } }
public DynamicMetaObject GetMetaObject(Expression parameter)
{
return new MetaObject(parameter, this);
}
private class MetaObject : DynamicMetaObject
{
internal MetaObject(Expression parameter, Iterator self)
: base(parameter, BindingRestrictions.Empty, self) { }
public override DynamicMetaObject BindGetMember(GetMemberBinder binder)
{
switch (binder.Name)
{
case "a":
case "b":
Type type = typeof(Iterator);
string methodName = binder.Name;
Expression[] parameters = new Expression[]
{
Expression.Constant(binder.Name)
};
return new DynamicMetaObject(
Expression.Call(
Expression.Convert(Expression, LimitType),
type.GetMethod(methodName),
parameters),
BindingRestrictions.GetTypeRestriction(Expression, LimitType));
default:
return base.BindGetMember(binder);
}
}
}
}
I'm sure my code above is suboptimal; at the very least it doesn't handle the IDictionary of columns yet. I would be grateful for any advice on how to improve the design and/or performance.
I also compared the performance of IronPython against a C# implementation. The expression is simple, just adding the values of two arrays at a specified index. Accessing the arrays directly provides the base line and theoretical optimum. Accessing the values via a symbol dictionary has still acceptable performance.
The third test creates a delegate from a naive (and intentionally bad) expression tree without any fancy stuff like call-site caching, but it's still faster than IronPython.
Scripting the expression via IronPython takes the most time. My profiler shows me that most time is spent in PythonOps.GetVariable, PythonDictionary.TryGetValue and PythonOps.TryGetBoundAttr. I think there's room for improvement.
Timings:
Direct: 00:00:00.0052680
via Dictionary: 00:00:00.5577922
Compiled Delegate: 00:00:03.2733377
Scripted: 00:00:09.0485515
Here's the code:
public static void PythonBenchmark()
{
var engine = Python.CreateEngine();
int iterations = 1000;
int count = 10000;
int[] a = Enumerable.Range(0, count).ToArray();
int[] b = Enumerable.Range(0, count).ToArray();
Dictionary<string, object> symbols = new Dictionary<string, object> { { "a", a }, { "b", b } };
Func<int, object> calculate = engine.Execute("lambda i: a[i] + b[i]", engine.CreateScope(symbols));
var sw = Stopwatch.StartNew();
int sum = 0;
for (int iteration = 0; iteration < iterations; iteration++)
{
for (int i = 0; i < count; i++)
{
sum += a[i] + b[i];
}
}
Console.WriteLine("Direct: " + sw.Elapsed);
sw.Restart();
for (int iteration = 0; iteration < iterations; iteration++)
{
for (int i = 0; i < count; i++)
{
sum += ((int[])symbols["a"])[i] + ((int[])symbols["b"])[i];
}
}
Console.WriteLine("via Dictionary: " + sw.Elapsed);
var indexExpression = Expression.Parameter(typeof(int), "index");
var indexerMethod = typeof(IList<int>).GetMethod("get_Item");
var lookupMethod = typeof(IDictionary<string, object>).GetMethod("get_Item");
Func<string, Expression> getSymbolExpression = symbol => Expression.Call(Expression.Constant(symbols), lookupMethod, Expression.Constant(symbol));
var addExpression = Expression.Add(
Expression.Call(Expression.Convert(getSymbolExpression("a"), typeof(IList<int>)), indexerMethod, indexExpression),
Expression.Call(Expression.Convert(getSymbolExpression("b"), typeof(IList<int>)), indexerMethod, indexExpression));
var compiledFunc = Expression.Lambda<Func<int, object>>(Expression.Convert(addExpression, typeof(object)), indexExpression).Compile();
sw.Restart();
for (int iteration = 0; iteration < iterations; iteration++)
{
for (int i = 0; i < count; i++)
{
sum += (int)compiledFunc(i);
}
}
Console.WriteLine("Compiled Delegate: " + sw.Elapsed);
sw.Restart();
for (int iteration = 0; iteration < iterations; iteration++)
{
for (int i = 0; i < count; i++)
{
sum += (int)calculate(i);
}
}
Console.WriteLine("Scripted: " + sw.Elapsed);
Console.WriteLine(sum); // make sure cannot be optimized away
}
Although I don't know all the specific details in your case, a slowdown of only 5x for doing anything this low level in IronPython is actually pretty good. Most entries in the Computer Language Benchmarks Game show a 10-30x slowdown.
A major part of the reason is that IronPython has to allow for the possibility that you've done something sneaky at runtime, and thus can't produce code of the same efficiency.
