Appending occurrences and aggregations to duplicate records - algorithm

This is my file which contains these lines:
**name,bus_id,bus_timing,bus_ticket**
(yuhindmklwm00409219,958193628,0305delete,2700)
(yuhindmklwm00409219,958193628,0305delete,800)
(yuhindmklwm00409219,959262446,0219delete,62)
(yuhindmklwm00437293,752013801,0220delete,2700)
(yuhindmklwm00437293,85382,0126delete,500)
(yuhindmklwm00437293,863056514,0326delete,-2700)
(yuhindmklwm00437293,863056514,0326delete,2700)
(yuhindmklwm00437293,85258,0313delete,1000)
(yuhindmklwm00437293,85012,0311delete,1000)
(yuhindmklwm00437293,85718,0311delete,2700)
(yuhindmklwm00437293,744622574,0322delete,90)
(yuhindmklwm00437293,83704,0215delete,17)
(yuhindmklwm00437293,85253,0331delete,-2700)
(yuhindmklwm00437293,85253,0331delete,2700)
(yuhindmklwm00437293,752013801,0305delete,2700)
(yuhindmklwm00437293,33165,0315delete,1000)
(yuhindmklwm00437293,85018,0319delete,100)
(yuhindmklwm00437293,85018,0219delete,100)
(yuhindmklwm00437293,85018,0118delete,100)
(yuhindmklwm00437293,90265,0312delete,6)
(yuhindmklwm00437293,02465,0312delete,25)
(yuhindmklwm00437293,857164939,0313delete,15)
(yuhindmklwm00437293,22102,0313delete,4)
(yuhindmklwm00437293,55423,0313delete,100)
(yuhindmklwm00437293,02465,0314delete,1)
(yuhindmklwm00437293,90265,0312delete,1)
(yuhindmklwm00437293,93108,0315delete,25)
(yuhindmklwm00437293,220432304,0315delete,35)
(yuhindmklwm00437293,701211570,0315delete,35)
(yuhindmklwm00437293,28801,0315delete,10)
(yuhindmklwm00437293,93108,0211delete,3)
(yuhindmklwm00437293,93108,02)
My final output should contain the duplicate records and their occurrences, with the sum amount and percentile:
(name,bus_id,bus_timing, 60th percentile value of bus_ticket, sum_bus_ticket, occurrence)
(yuhindmklwm00409219,958193628,0305delete,2000, 2700, 1)
(yuhindmklwm00409219,958193628,0305delete,2000, 3500, 2)
.......
.......
......
This can be solved with lists, but that is not efficient. Can somebody think of other data structures?
It's OK if you ignore aggregations like sum or percentile, but at least one aggregation should be there.
This is my percentile function:
scala> def percentileValue(p: Int,data: List[Int]): Int = {val firstSort=data.sorted; val k=math.ceil((data.size-1) * (p / 100.0)).toInt; return firstSort(k).toInt}
percentileValue: (p: Int, data: List[Int])Int
scala> val lst=List(1,2,3,4,5,6)
lst: List[Int] = List(1, 2, 3, 4, 5, 6)
scala> percentileValue(60,lst)
res142: Int = 4

I shortened the data for easier testing. Something like this?
val lili = List(
  List("yuhindmklwm004092193", "9581936283", "0305delete3", 2700),
  List("yuhindmklwm004092193", "9581936283", "0305delete3", 800),
  List("yuhindmklwm004092193", "9592624463", "0219delete3", 62),
  List("yuhindmklwm004372933", "7520138013", "0220delete3", 2700),
  List("yuhindmklwm004372933", "853823", "0126delete3", 500),
  List("yuhindmklwm004372933", "8630565143", "0326delete3", -2700),
  List("yuhindmklwm004372933", "8630565143", "0326delete3", 2700),
  List("yuhindmklwm004372933", "852583", "0313delete3", 1000))
Grouping:
scala> lili.groupBy {case (list) => list(0) }.map {case (k, v) => (k, v.map (_(3)))}
res18: scala.collection.immutable.Map[Any,List[Any]] = Map(yuhindmklwm004372933 -> List(2700, 500, -2700, 2700, 1000), yuhindmklwm004092193 -> List(2700, 800, 62))
Mapping on percentileValue:
lili.groupBy {case (list) => list(0) }.map {case (k, v) => (k, v.map (_(3)))}.map {case (k, v:List[Int])=> (k, percentileValue (60, v))}
<console>:10: warning: non-variable type argument Int in type pattern List[Int] (the underlying of List[Int]) is unchecked since it is eliminated by erasure
lili.groupBy {case (list) => list(0) }.map {case (k, v) => (k, v.map (_(3)))}.map {case (k, v:List[Int])=> (k, percentileValue (60, v))}
^
res22: scala.collection.immutable.Map[Any,Int] = Map(yuhindmklwm004372933 -> 2700, yuhindmklwm004092193 -> 2700)
scala> lili.groupBy {case (list) => list(0) }.map {case (k, v) => (k, v.map (_(3)))}.map {case (k, v:List[Int])=> (k, percentileValue (10, v))}
<console>:10: warning: non-variable type argument Int in type pattern List[Int] (the underlying of List[Int]) is unchecked since it is eliminated by erasure
lili.groupBy {case (list) => list(0) }.map {case (k, v) => (k, v.map (_(3)))}.map {case (k, v:List[Int])=> (k, percentileValue (10, v))}
^
res23: scala.collection.immutable.Map[Any,Int] = Map(yuhindmklwm004372933 -> 500, yuhindmklwm004092193 -> 800)
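To get closer to the requested output (occurrence count, sum and percentile per duplicate record rather than per name), one option is to group on the whole (name, bus_id, bus_timing) key. A minimal sketch, assuming the lili data and the percentileValue function from above are in scope; Row and its field names are illustrative, not part of the original code, and this produces one row per distinct record with its totals rather than the running per-occurrence rows shown in the question:
case class Row(name: String, busId: String, busTiming: String, ticket: Int)

// lift the List[Any] rows used above into typed records
val rows = lili.collect { case List(n: String, b: String, t: String, amt: Int) => Row(n, b, t, amt) }

// one entry per distinct (name, bus_id, bus_timing) with its aggregations
val aggregated = rows
  .groupBy(r => (r.name, r.busId, r.busTiming))
  .map { case ((n, b, t), group) =>
    val tickets = group.map(_.ticket)
    (n, b, t, percentileValue(60, tickets), tickets.sum, tickets.size)
  }
  .toList
This still holds every group in memory; for very large inputs a single pass that accumulates (count, sum) into a mutable Map keyed by the record, keeping the ticket values only if the percentile is really needed, would be one way to trade memory for the extra aggregation.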

Related

Adjacency-List Paths in Scala

If I have an adjacency list, represented in Scala like this:
val l = List(
  List((1, 1), (2, 3), (4, 10)),
  List((2, 1)),
  List((3, 1), (4, 5)),
  List((4, 1)),
  List())
Every "List" contains the costs of the path from one node to another in a directed graph. So the first "List" with three entries represents the successors of the first node (counting from 0). That means Node 0 directs to Node 1 with a cost of 1, to Node 2 with a cost of 3 and to Node 4 with a cost of 10 and so on.
How would it be possible to recursively compute if there are paths with a max cost from a given node to another? I thought of something like this:
def hasMaxCostPath: (List[List[(Int, Int)]], Int, Int, Int) => Int => Boolean = (adj, from, to, max) => len =>
So the function receives the adjacency list "adj", the start node "from", the end node "to", the max cost of the path "max" and the max length of the path "len".
So I think the result should be like the following based on the above given adjacency list:
hasMaxCostPath(l, 0, 2, 2)(1) == false
hasMaxCostPath(l, 0, 2, 3)(1) == true
Furthermore, how would it be possible to recursively compute a list of all costs of paths that go from one specified node to another within a given max length? Maybe like this:
def getPaths: (List[List[(Int, Int)]], List[(Int, Int)], Int, Int) => List[Int] =
(adj, vn, dest, len) =>
So this function would get the adjacency list "adj", a list of already visited nodes "vn" with (node, cost), in which we take the given node as the start node, the destination node "dest" and the max length of the path "len". And for this function the results could be like the following:
getPaths(l, List((0, 0)), 2, 1) == List(3) // Node0 -> Node2
getPaths(l, List((0, 0)), 2, 2) == List(2, 3) // Node0 -> Node1 -> Node2 AND Node0 -> Node2
Sorry, I'm very new to Scala
Does this work for you?
package foo

object Foo {
  def main(args: Array[String]): Unit = {
    val edges = List(
      List((1, 1), (2, 3), (4, 10)),
      List((2, 1)),
      List((3, 1), (4, 5)),
      List((4, 1)),
      List())
    println(hasMaxCostPath(edges, 0, 1, 2))
    println(hasMaxCostPath(edges, 0, 2, 2))
  }

  def hasMaxCostPath(edges: List[List[(Int, Int)]], start: Int, end: Int, maxCost: Int): Boolean = {
    maxCost > 0 &&
      edges(start).exists(a =>
        (a._1 == end && a._2 <= maxCost) ||
          hasMaxCostPath(edges, a._1, end, maxCost - a._2)
      )
  }
}
Edit: ====
The above solution did not take into account the length parameter.
Here is a solution with the length parameter:
package foo

object Foo {
  def main(args: Array[String]): Unit = {
    val edges = List(
      List((1, 1), (2, 3), (4, 10)),
      List((2, 1)),
      List((3, 1), (4, 5)),
      List((4, 1)),
      List())
    assert(!hasMaxCostPath(edges, 0, 4, 4, 3))
    assert(hasMaxCostPath(edges, 0, 4, 4, 4))
  }

  def hasMaxCostPath(edges: List[List[(Int, Int)]], start: Int, end: Int, maxCost: Int, maxLength: Int): Boolean = {
    maxLength > 0 &&
      maxCost >= 0 &&
      edges(start).exists(a =>
        (a._1 == end && a._2 <= maxCost) ||
          hasMaxCostPath(edges, a._1, end, maxCost - a._2, maxLength - 1)
      )
  }
}
=== Edit:
And here is a solution including your second problem:
package foo

object Foo {
  def main(args: Array[String]): Unit = {
    val edges = List(
      List((1, 1), (2, 3), (4, 10)),
      List((2, 1)),
      List((3, 1), (4, 5)),
      List((4, 1)),
      List())
    assert(!hasMaxCostPath(edges, 0, 4, 4, 3))
    assert(hasMaxCostPath(edges, 0, 4, 4, 4))
    assert(getMaxCostPaths(edges, 0, 0, 5, 5) == List())
    assert(getMaxCostPaths(edges, 0, 1, 1, 1) == List(List(0, 1)))
    assert(getMaxCostPaths(edges, 0, 2, 2, 2) == List(List(0, 1, 2)))
    assert(getMaxCostPaths(edges, 0, 2, 5, 5) == List(List(0, 2), List(0, 1, 2)))
  }

  def hasMaxCostPath(edges: List[List[(Int, Int)]], start: Int, end: Int, maxCost: Int, maxLength: Int): Boolean = {
    maxLength > 0 &&
      maxCost >= 0 &&
      edges(start).exists(a =>
        (a._1 == end && a._2 <= maxCost) ||
          hasMaxCostPath(edges, a._1, end, maxCost - a._2, maxLength - 1)
      )
  }

  def getMaxCostPaths(
      edges: List[List[(Int, Int)]],
      from: Int, to: Int,
      maxCost: Int,
      maxLength: Int): List[List[Int]] = {
    getMaxCostPathsRec(edges, from, to, maxCost, maxLength, List(from))
  }

  def getMaxCostPathsRec(
      edges: List[List[(Int, Int)]],
      start: Int, end: Int,
      maxCost: Int,
      maxLength: Int,
      path: List[Int]): List[List[Int]] = {
    if (maxLength <= 0 || maxCost < 0) return List()
    val direct = edges(start).filter(a => a._1 == end && a._2 <= maxCost).map(edge => path ::: List(edge._1))
    val transitive = edges(start).flatMap(a =>
      getMaxCostPathsRec(edges, a._1, end, maxCost - a._2, maxLength - 1, path ::: List(a._1))
    )
    direct ::: transitive
  }
}
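Note that the asker's getPaths was specified to return the costs of the qualifying paths rather than the node sequences. Under the same adjacency representation, a small variation of the recursion could accumulate costs instead; this is a sketch only, and getPathCosts is an illustrative name, not from the question:
// total cost of every path from `start` to `end` using at most maxLength edges
def getPathCosts(edges: List[List[(Int, Int)]], start: Int, end: Int, maxLength: Int): List[Int] =
  if (maxLength <= 0) Nil
  else edges(start).flatMap { case (next, cost) =>
    val viaNext = getPathCosts(edges, next, end, maxLength - 1).map(cost + _)
    if (next == end) cost :: viaNext else viaNext
  }

// getPathCosts(edges, 0, 2, 1) == List(3)     // Node0 -> Node2
// getPathCosts(edges, 0, 2, 2) == List(2, 3)  // Node0 -> Node1 -> Node2 and Node0 -> Node2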

How to calculate the inversion count using merge sort for a list of Int in Scala?

def mergeSort(xs: List[Int]): List[Int] = {
  val n = xs.length / 2
  if (n == 0) xs
  else {
    def merge(xs: List[Int], ys: List[Int]): List[Int] =
      (xs, ys) match {
        case (Nil, ys) => ys
        case (xs, Nil) => xs
        case (x :: xs1, y :: ys1) =>
          if (x < y) {
            x :: merge(xs1, ys)
          } else {
            y :: merge(xs, ys1)
          }
      }
    val (left, right) = xs splitAt n
    merge(mergeSort(left), mergeSort(right))
  }
}
The inversion count for an array indicates how far (or close) the array is from being sorted. If the array is already sorted, the inversion count is 0. If the array is sorted in reverse order, the inversion count is the maximum.
Formally speaking, two elements a[i] and a[j] form an inversion if a[i] > a[j] and i < j.
Example:
The sequence 2, 4, 1, 3, 5 has three inversions (2, 1), (4, 1), (4, 3).
So if this list (2, 4, 1, 3, 5) is passed to the function, the inversion count should be 3.
How do I add a variable to get the number?
Maybe something like this will help:
def mergeSort(xs: List[Int], cnt: Int): (List[Int], Int) = {
  val n = xs.length / 2
  if (n == 0) (xs, cnt)
  else {
    // merge two sorted lists; whenever the head of the right list wins,
    // it forms an inversion with every remaining element of the left list,
    // so the count grows by xs.size (the size of the remaining left list)
    def merge(xs: List[Int], ys: List[Int], cnt: Int): (List[Int], Int) =
      (xs, ys) match {
        case (Nil, ys) => (ys, cnt)
        case (xs, Nil) => (xs, cnt)
        case (x :: xs1, y :: ys1) =>
          if (x <= y) {
            val t = merge(xs1, ys, cnt)
            (x :: t._1, t._2)
          } else {
            val t = merge(xs, ys1, cnt + xs.size)
            (y :: t._1, t._2)
          }
      }
    val (left, right) = xs splitAt n
    // sort each half with a fresh count so the incoming cnt is not added twice
    val leftMergeSort = mergeSort(left, 0)
    val rightMergeSort = mergeSort(right, 0)
    merge(leftMergeSort._1, rightMergeSort._1, cnt + leftMergeSort._2 + rightMergeSort._2)
  }
}
I am passing a tuple along all the function calls, that's it.
I increment cnt when the head of the right list is smaller than the head of the left list; in that scenario we add the size of the remaining left list to cnt. Look at the code to get a clearer view.
Hope this helps!
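A quick sanity check with the sequence from the question, as a sketch assuming the mergeSort above is in scope and started with cnt = 0:
val (sorted, inversions) = mergeSort(List(2, 4, 1, 3, 5), 0)
// sorted     == List(1, 2, 3, 4, 5)
// inversions == 3, i.e. the inversions (2, 1), (4, 1), (4, 3)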

Scala List to Map with Next and Previous values

What would be the best way to transform:
val arr: List[String]
To:
val mapArr: List[(Int, String)]
Where:
each Tuple is:
- the String value is an odd-index element of the list
- the Int is the length of the previous element.
Example:
val stringArr = List("a", "aaa", "bb", "abc")
val resultShouldBe = List((1, "aaa"), (2, "abc"))
You can use IterableLike.grouped for that:
val result = stringArr
  .grouped(2)
  .collect { case List(toIndex, value) => (toIndex.length, value) }
  .toList
Which yields:
scala> val stringArr = List("a", "aaa", "bb", "abc")
stringArr: List[String] = List(a, aaa, bb, abc)
scala> stringArr.grouped(2).collect { case List(toIndex, value) => (toIndex.length, value) }.toList
res1: List[(Int, String)] = List((1,aaa), (2,abc))
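One detail worth knowing about grouped: for an odd-length input the final group has a single element, so it does not match the List(toIndex, value) pattern and collect simply drops it, which is what you want here. A small illustration with a hypothetical three-element list:
List("a", "aaa", "bb").grouped(2).toList
// List(List(a, aaa), List(bb)) -- the trailing List(bb) is skipped by the collect above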

Counting the number of inversions using Merge Sort

I need to count the number of inversions using Merge Sort:
object Example {
  def msort(xs: List[Int]): List[Int] = {
    def merge(left: List[Int], right: List[Int]): Stream[Int] = (left, right) match {
      case (x :: xs, y :: ys) if x < y => Stream.cons(x, merge(xs, right))
      case (x :: xs, y :: ys) => Stream.cons(y, merge(left, ys))
      case _ => if (left.isEmpty) right.toStream else left.toStream
    }
    val n = xs.length / 2
    if (n == 0) xs
    else {
      val (ys, zs) = xs splitAt n
      merge(msort(ys), msort(zs)).toList
    }
  }

  msort(List(8, 15, 3))
}
I guess I have to count it in this line (where y < x, the second case in the match):
case (x :: xs, y :: ys) => Stream.cons(y, merge(left, ys))
However, when I tried I failed.
How do I do that?
UPDATE:
a version with an accumulator:
def msort(xs: List[Int]): List[Int] = {
  def merge(left: List[Int], right: List[Int], inversionAcc: Int = 0): Stream[Int] = (left, right) match {
    case (x :: xs, y :: ys) if x < y => Stream.cons(x, merge(xs, right, inversionAcc))
    case (x :: xs, y :: ys) => Stream.cons(y, merge(left, ys, inversionAcc + 1))
    case _ => if (left.isEmpty) right.toStream else left.toStream
  }
  val n = xs.length / 2
  if (n == 0) xs
  else {
    val (ys, zs) = xs splitAt n
    merge(msort(ys), msort(zs)).toList
  }
}
How do I easily return inversionAcc? I guess I can only return it as part of a tuple, like this:
def merge(left: List[Int], right: List[Int], invariantAcc: Int = 0): (Stream[Int], Int)
It doesn't look good, though.
UPDATE2:
And it actually doesn't count properly; I can't find where the error is.
This is the Scala port of my Frege solution.
object CountInversions {

  def inversionCount(xs: List[Int], size: Int): (Int, List[Int]) =
    xs match {
      case _ :: _ :: _ => { // if the list has more than one element
        val mid = size / 2
        val lsize = mid
        val rsize = size - mid
        val (left, right) = xs.splitAt(mid)
        val (lcount, lsorted) = inversionCount(left, lsize)
        val (rcount, rsorted) = inversionCount(right, rsize)
        val (mergecount, sorted) = inversionMergeCount(lsorted, lsize, rsorted, rsize, 0, Nil)
        val count = lcount + rcount + mergecount
        (count, sorted)
      }
      case xs => (0, xs)
    }

  def inversionMergeCount(xs: List[Int], m: Int, ys: List[Int], n: Int,
                          acc: Int, sorted: List[Int]): (Int, List[Int]) =
    (xs, ys) match {
      case (xs, Nil) => (acc, sorted.reverse ++ xs)
      case (Nil, ys) => (acc, sorted.reverse ++ ys)
      case (x :: restx, y :: resty) =>
        if (x < y) inversionMergeCount(restx, m - 1, ys, n, acc, x :: sorted)
        else if (x > y) inversionMergeCount(xs, m, resty, n - 1, acc + m, y :: sorted)
        else inversionMergeCount(restx, m - 1, resty, n - 1, acc, x :: y :: sorted)
    }
}
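A quick usage check with the list from the question, shown as a sketch under the assumption that the object above compiles as given:
// CountInversions.inversionCount(List(8, 15, 3), 3) == (2, List(3, 8, 15))
// i.e. the two inversions (8, 3) and (15, 3)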
If the solution doesn't have to be strictly functional then you can just add a simplistic counter:
object Example {
  var inversions = 0

  def msort(xs: List[Int]): List[Int] = {
    def merge(left: List[Int], right: List[Int]): Stream[Int] = (left, right) match {
      case (x :: xs, y :: ys) if x <= y => Stream.cons(x, merge(xs, right))
      case (x :: xs, y :: ys) =>
        // y is smaller than every remaining element of `left`,
        // so it forms an inversion with each of them
        inversions = inversions + left.length
        Stream.cons(y, merge(left, ys))
      case _ => if (left.isEmpty) right.toStream else left.toStream
    }
    val n = xs.length / 2
    if (n == 0) xs
    else {
      val (ys, zs) = xs splitAt n
      merge(msort(ys), msort(zs)).toList
    }
  }
}
Example.msort(List(8, 15, 3))
println(Example.inversions)
If it has to remain functional, then you'll need to create an accumulator, thread it through all of the method calls, and return a pair from each function that includes the accumulator value, then sum the accumulator values at each merge step. (My functional-fu is not very good; I'd already tried solving this functionally before trying the simple var approach.)
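For reference, here is one way that accumulator threading could look, as a minimal sketch (msortCount is an illustrative name, not from the question): merge returns the merged list together with the inversions it found, adding left.length whenever an element of the right list is emitted first.
def msortCount(xs: List[Int]): (List[Int], Int) = {
  // merge two sorted lists; whenever the head of `right` wins it is smaller
  // than every remaining element of `left`, so add left.length to the count
  def merge(left: List[Int], right: List[Int]): (List[Int], Int) = (left, right) match {
    case (Nil, _) => (right, 0)
    case (_, Nil) => (left, 0)
    case (x :: xs1, y :: ys1) =>
      if (x <= y) { val (m, c) = merge(xs1, right); (x :: m, c) }
      else { val (m, c) = merge(left, ys1); (y :: m, c + left.length) }
  }
  val n = xs.length / 2
  if (n == 0) (xs, 0)
  else {
    val (ys, zs) = xs.splitAt(n)
    val (ls, lc) = msortCount(ys)
    val (rs, rc) = msortCount(zs)
    val (merged, mc) = merge(ls, rs)
    (merged, lc + rc + mc)
  }
}

// msortCount(List(8, 15, 3))       == (List(3, 8, 15), 2)
// msortCount(List(2, 4, 1, 3, 5))._2 == 3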

Ruby: Deleting all instances of a particular key from hash of hashes

I have a hash like
h = {1 => {"inner" => 45}, 2 => {"inner" => 46}, "inner" => 47}
How do I delete every pair that contains the key "inner"?
You can see that some of the "inner" pairs appear directly in h, while others appear in nested hashes inside h.
Note that I only want to delete the "inner" pairs, so if I call my mass-delete method on the above hash, I should get
h = {1 => {}, 2 => {}}
since these pairs don't have a key == "inner".
Really, this is what reject! is for:
def f! x
  return x unless x.is_a? Hash
  x.reject! { |k, v| 'inner' == k }
  x.each { |k, v| f! v }
end
def f x
  x.inject({}) do |m, (k, v)|
    v = f v if v.is_a? Hash # note, arbitrarily recursive
    m[k] = v unless k == 'inner'
    m
  end
end

p f h
Update: slightly improved...
def f x
  x.is_a?(Hash) ? x.inject({}) { |m, (k, v)|
    m[k] = f v unless k == 'inner'
    m
  } : x
end
def except_nested(x, key)
  case x
  when Hash then x = x.inject({}) { |m, (k, v)| m[k] = except_nested(v, key) unless k == key; m }
  when Array then x.map! { |e| except_nested(e, key) }
  end
  x
end
Here is what I came up with:
class Hash
  def deep_reject_key!(key)
    keys.each { |k| delete(k) if k == key || self[k] == self[key] }
    values.each { |v| v.deep_reject_key!(key) if v.is_a? Hash }
    self
  end
end
Works for a Hash or a HashWithIndifferentAccess
> x = {'1' => 'cat', '2' => { '1' => 'dog', '2' => 'elephant' }}
=> {"1"=>"cat", "2"=>{"1"=>"dog", "2"=>"elephant"}}
> y = x.with_indifferent_access
=> {"1"=>"cat", "2"=>{"1"=>"dog", "2"=>"elephant"}}
> x.deep_reject_key!(:"1")
=> {"1"=>"cat", "2"=>{"1"=>"dog", "2"=>"elephant"}}
> x.deep_reject_key!("1")
=> {"2"=>{"2"=>"elephant"}}
> y.deep_reject_key!(:"1")
=> {"2"=>{"2"=>"elephant"}}
A similar answer, but it takes a whitelist-type approach. For Ruby 1.9+:
# recursive remove keys
def deep_simplify_record(hash, keep)
  hash.keep_if do |key, value|
    if keep.include?(key)
      deep_simplify_record(value, keep) if value.is_a?(Hash)
      true
    end
  end
end
hash = {:a => 1, :b => 2, :c => {:a => 1, :b => 2, :c => {:a => 1, :b => 2, :c => 4}} }
deep_simplify_record(hash, [:b, :c])
# => {:b=>2, :c=>{:b=>2, :c=>{:b=>2, :c=>4}}}
Also here are some other methods which I like to use for hashes.
https://gist.github.com/earlonrails/2048705
