Counting the number of inversions using Merge Sort - algorithm

I need to count the number of inversions using Merge Sort:
object Example {
  def msort(xs: List[Int]): List[Int] = {
    def merge(left: List[Int], right: List[Int]): Stream[Int] = (left, right) match {
      case (x :: xs, y :: ys) if x < y => Stream.cons(x, merge(xs, right))
      case (x :: xs, y :: ys) => Stream.cons(y, merge(left, ys))
      case _ => if (left.isEmpty) right.toStream else left.toStream
    }
    val n = xs.length / 2
    if (n == 0) xs
    else {
      val (ys, zs) = xs splitAt n
      merge(msort(ys), msort(zs)).toList
    }
  }
  msort(List(8, 15, 3))
}
I guess I have to count it in the second case of the match (the one reached when y <= x):
case (x :: xs, y :: ys) => Stream.cons(y, merge(left, ys))
However, when I tried I failed.
How do I do that?
UPDATE:
a version with an accumulator:
def msort(xs: List[Int]): List[Int] = {
  def merge(left: List[Int], right: List[Int], inversionAcc: Int = 0): Stream[Int] = (left, right) match {
    case (x :: xs, y :: ys) if x < y => Stream.cons(x, merge(xs, right, inversionAcc))
    case (x :: xs, y :: ys) => Stream.cons(y, merge(left, ys, inversionAcc + 1))
    case _ => if (left.isEmpty) right.toStream else left.toStream
  }
  val n = xs.length / 2
  if (n == 0) xs
  else {
    val (ys, zs) = xs splitAt n
    merge(msort(ys), msort(zs)).toList
  }
}
How do I easily return inversionAcc? I guess I can only return it as part of a tuple, like this:
def merge(left: List[Int], right: List[Int], inversionAcc: Int = 0): (Stream[Int], Int)
It doesn't look good, though.
UPDATE2:
It also doesn't count properly, and I can't find where the error is.

This is the Scala port of my Frege solution.
object CountInversions {
  def inversionCount(xs: List[Int], size: Int): (Int, List[Int]) =
    xs match {
      case _ :: _ :: _ => { // If the list has more than one element
        val mid = size / 2
        val lsize = mid
        val rsize = size - mid
        val (left, right) = xs.splitAt(mid)
        val (lcount, lsorted) = inversionCount(left, lsize)
        val (rcount, rsorted) = inversionCount(right, rsize)
        val (mergecount, sorted) = inversionMergeCount(lsorted, lsize, rsorted,
          rsize, 0, Nil)
        val count = lcount + rcount + mergecount
        (count, sorted)
      }
      case xs => (0, xs)
    }
  // m and n are the numbers of elements still left in xs and ys.
  def inversionMergeCount(xs: List[Int], m: Int, ys: List[Int], n: Int,
      acc: Int, sorted: List[Int]): (Int, List[Int]) =
    (xs, ys) match {
      case (xs, Nil) => (acc, sorted.reverse ++ xs)
      case (Nil, ys) => (acc, sorted.reverse ++ ys)
      case (x :: restx, y :: resty) =>
        // On ties take only x: equal elements are not inversions, but larger
        // elements still in xs must later be counted against y.
        if (x <= y) inversionMergeCount(restx, m - 1, ys, n, acc, x :: sorted)
        // y is smaller than all m elements left in xs: m inversions.
        else inversionMergeCount(xs, m, resty, n - 1, acc + m, y :: sorted)
    }
}

If the solution doesn't have to be strictly functional then you can just add a simplistic counter:
object Example {
  var inversions = 0
  def msort(xs: List[Int]): List[Int] = {
    def merge(left: List[Int], right: List[Int]): Stream[Int] = (left, right) match {
      // <= so that equal elements are not counted as inversions
      case (x :: xs, y :: ys) if x <= y => Stream.cons(x, merge(xs, right))
      case (x :: xs, y :: ys) =>
        // y is smaller than every element still in left, so each of them is an inversion
        inversions = inversions + left.length
        Stream.cons(y, merge(left, ys))
      case _ => if (left.isEmpty) right.toStream else left.toStream
    }
    val n = xs.length / 2
    if (n == 0) xs
    else {
      val (ys, zs) = xs splitAt n
      merge(msort(ys), msort(zs)).toList
    }
  }
}
Example.msort(List(8, 15, 3))
println(Example.inversions)
If it has to remain functional, you'll need to create an accumulator, thread it through all of the method calls, and return a pair from each function that includes the accumulator value; then sum the accumulator values wherever two merges converge. (My functional-fu is not very good; I tried solving this functionally before falling back on the simple var approach.)
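The accumulator-threading idea can be sketched without any mutable global state. The following is a minimal Python sketch, not the Scala from the thread; `sort_and_count` and `merge_and_count` are hypothetical names chosen for illustration. Each call returns a pair of the sorted list and the inversion count, and the counts from the two halves and from the merge are summed.

```python
from typing import List, Tuple


def merge_and_count(left: List[int], right: List[int]) -> Tuple[List[int], int]:
    """Merge two sorted lists and count pairs (l, r) with l > r."""
    merged: List[int] = []
    count = 0
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left,
            # so it forms an inversion with each of them.
            merged.append(right[j])
            count += len(left) - i
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, count


def sort_and_count(xs: List[int]) -> Tuple[List[int], int]:
    """Return (sorted copy of xs, number of inversions in xs)."""
    if len(xs) <= 1:
        return list(xs), 0
    mid = len(xs) // 2
    left, left_count = sort_and_count(xs[:mid])
    right, right_count = sort_and_count(xs[mid:])
    merged, split_count = merge_and_count(left, right)
    return merged, left_count + right_count + split_count


print(sort_and_count([8, 15, 3]))  # ([3, 8, 15], 2)
```

Note that the count grows by the number of elements remaining in the left half, not by 1, each time an element is taken from the right half.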

Related

Is there a known algorithm for simplifying a boolean expression with number comparisons?

For example, if I have the expression (A > 5) && (A == 6),
that expression can be simplified to just (A == 6), and still have the same behavior for A ∈ ℤ.
I also need it to work with multiple variables, so for instance ((B > 2) && (C == 2)) || ((B > 2) && (C < 2)) should simplify to (B > 2) && (C < 3).
I won't need to compare two unknowns, only unknowns and numbers, and I only need it to work with the operators <, >, and == for numbers, and && and || for expressions (&& being AND and || being OR, of course). All unknowns are integers.
Is there any algorithm that takes such an expression and returns an expression with equal behavior and a minimal amount of operators?
(in my specific case, || operators are preferred over &&)
Here's a slow dynamic programming algorithm along the lines that you were thinking of.
from collections import defaultdict, namedtuple
from heapq import heappop, heappush
from itertools import product
from math import inf
# Constructors for Boolean expressions. False and True are also accepted.
Lt = namedtuple("Lt", ["lhs", "rhs"])
Eq = namedtuple("Eq", ["lhs", "rhs"])
Gt = namedtuple("Gt", ["lhs", "rhs"])
And = namedtuple("And", ["lhs", "rhs"])
Or = namedtuple("Or", ["lhs", "rhs"])
# Variable names. Arbitrary strings are accepted.
A = "A"
B = "B"
C = "C"
# Example formulas.
first_example = And(Gt(A, 5), Eq(A, 6))
second_example = Or(And(Gt(B, 2), Eq(C, 2)), And(Gt(B, 2), Lt(C, 2)))
third_example = Or(And(Gt(A, 1), Gt(B, 1)), And(Gt(A, 0), Gt(B, 2)))
fourth_example = Or(Lt(A, 6), Gt(A, 5))
fifth_example = Or(And(Eq(A, 2), Gt(C, 2)), And(Eq(B, 2), Lt(C, 2)))
# Returns a map from each variable to the set of values such that the formula
# might evaluate differently for variable = value-1 versus variable = value.
def get_critical_value_sets(formula, result=None):
    if result is None:
        result = defaultdict(set)
    if isinstance(formula, bool):
        pass
    elif isinstance(formula, Lt):
        result[formula.lhs].add(formula.rhs)
    elif isinstance(formula, Eq):
        result[formula.lhs].add(formula.rhs)
        result[formula.lhs].add(formula.rhs + 1)
    elif isinstance(formula, Gt):
        result[formula.lhs].add(formula.rhs + 1)
    elif isinstance(formula, (And, Or)):
        get_critical_value_sets(formula.lhs, result)
        get_critical_value_sets(formula.rhs, result)
    else:
        assert False, str(formula)
    return result
# Returns a list of inputs sufficient to compare Boolean combinations of the
# primitives returned by enumerate_useful_primitives.
def enumerate_truth_table_inputs(critical_value_sets):
    variables, value_sets = zip(*critical_value_sets.items())
    return [
        dict(zip(variables, values))
        for values in product(*({-inf} | value_set for value_set in value_sets))
    ]
# Returns both constants and all single comparisons whose critical value set is
# a subset of the given ones.
def enumerate_useful_primitives(critical_value_sets):
    yield False
    yield True
    for variable, value_set in critical_value_sets.items():
        for value in value_set:
            yield Lt(variable, value)
            if value + 1 in value_set:
                yield Eq(variable, value)
            yield Gt(variable, value - 1)
# Evaluates the formula recursively on the given input.
def evaluate(formula, input):
    if isinstance(formula, bool):
        return formula
    elif isinstance(formula, Lt):
        return input[formula.lhs] < formula.rhs
    elif isinstance(formula, Eq):
        return input[formula.lhs] == formula.rhs
    elif isinstance(formula, Gt):
        return input[formula.lhs] > formula.rhs
    elif isinstance(formula, And):
        return evaluate(formula.lhs, input) and evaluate(formula.rhs, input)
    elif isinstance(formula, Or):
        return evaluate(formula.lhs, input) or evaluate(formula.rhs, input)
    else:
        assert False, str(formula)
# Evaluates the formula on the many inputs, packing the values into an integer.
def get_truth_table(formula, inputs):
    truth_table = 0
    for input in inputs:
        truth_table = (truth_table << 1) + evaluate(formula, input)
    return truth_table
# Returns (the number of operations in the formula, the number of Ands).
def get_complexity(formula):
    if isinstance(formula, bool):
        return (0, 0)
    elif isinstance(formula, (Lt, Eq, Gt)):
        return (1, 0)
    elif isinstance(formula, And):
        ops_lhs, ands_lhs = get_complexity(formula.lhs)
        ops_rhs, ands_rhs = get_complexity(formula.rhs)
        return (ops_lhs + 1 + ops_rhs, ands_lhs + 1 + ands_rhs)
    elif isinstance(formula, Or):
        ops_lhs, ands_lhs = get_complexity(formula.lhs)
        ops_rhs, ands_rhs = get_complexity(formula.rhs)
        return (ops_lhs + 1 + ops_rhs, ands_lhs + ands_rhs)
    else:
        assert False, str(formula)
# Formula compared by complexity.
class HeapItem:
    __slots__ = ["_complexity", "formula"]
    def __init__(self, formula):
        self._complexity = get_complexity(formula)
        self.formula = formula
    def __lt__(self, other):
        return self._complexity < other._complexity
    def __le__(self, other):
        return self._complexity <= other._complexity
    def __eq__(self, other):
        return self._complexity == other._complexity
    def __ne__(self, other):
        return self._complexity != other._complexity
    def __ge__(self, other):
        return self._complexity >= other._complexity
    def __gt__(self, other):
        return self._complexity > other._complexity
# Like heapq.merge except we can add iterables dynamically.
class Merge:
    __slots__ = ["_heap", "_iterable_count"]
    def __init__(self):
        self._heap = []
        self._iterable_count = 0
    def update(self, iterable):
        iterable = iter(iterable)
        try:
            value = next(iterable)
        except StopIteration:
            return
        heappush(self._heap, (value, self._iterable_count, iterable))
        self._iterable_count += 1
    def __iter__(self):
        return self
    def __next__(self):
        if not self._heap:
            raise StopIteration
        value, index, iterable = heappop(self._heap)
        try:
            next_value = next(iterable)
        except StopIteration:
            return value
        heappush(self._heap, (next_value, index, iterable))
        return value
class Combinations:
    __slots__ = ["_op", "_formula", "_best_formulas", "_i", "_n"]
    def __init__(self, op, formula, best_formulas):
        self._op = op
        self._formula = formula
        self._best_formulas = best_formulas
        self._i = 0
        self._n = len(best_formulas)
    def __iter__(self):
        return self
    def __next__(self):
        if self._i >= self._n:
            raise StopIteration
        formula = self._op(self._formula, self._best_formulas[self._i])
        self._i += 1
        return HeapItem(formula)
# Returns the simplest equivalent formula, breaking ties in favor of fewer Ands.
def simplify(target_formula):
    critical_value_sets = get_critical_value_sets(target_formula)
    inputs = enumerate_truth_table_inputs(critical_value_sets)
    target_truth_table = get_truth_table(target_formula, inputs)
    best = {}
    merge = Merge()
    for formula in enumerate_useful_primitives(critical_value_sets):
        merge.update([HeapItem(formula)])
    best_formulas = []
    for item in merge:
        if target_truth_table in best:
            return best[target_truth_table]
        formula = item.formula
        truth_table = get_truth_table(formula, inputs)
        if truth_table in best:
            continue
        n = len(best_formulas)
        for op in [And, Or]:
            merge.update(Combinations(op, formula, best_formulas))
        best[truth_table] = formula
        best_formulas.append(formula)
print(simplify(first_example))
print(simplify(second_example))
print(simplify(third_example))
print(simplify(fourth_example))
print(simplify(fifth_example))
Output:
Eq(lhs='A', rhs=6)
And(lhs=Lt(lhs='C', rhs=3), rhs=Gt(lhs='B', rhs=2))
And(lhs=And(lhs=Gt(lhs='B', rhs=1), rhs=Gt(lhs='A', rhs=0)), rhs=Or(lhs=Gt(lhs='B', rhs=2), rhs=Gt(lhs='A', rhs=1)))
True
Or(lhs=And(lhs=Eq(lhs='B', rhs=2), rhs=Lt(lhs='C', rhs=2)), rhs=And(lhs=Gt(lhs='C', rhs=2), rhs=Eq(lhs='A', rhs=2)))
Maybe you can consider intervals for your variables, for example:
(A > 5) && (A == 6)
Given you have a variable A, set an initial interval for it: A: [-∞, ∞].
Each condition that you read, you can reduce your interval:
(A > 5) sets the interval for A: [6, ∞]
(A == 6) sets the interval for A: [6, 6]
For each update on the interval, check if the new condition is possible, for example:
(A > 5) sets the interval for A: [6, ∞]
(A == 5) out of the interval, impossible condition.
Just another example:
((B > 2) && (C == 2)) || ((B > 2) && (C < 2))
Initially: B: [-∞, ∞] and C: [-∞, ∞].
((B > 2) && (C == 2))
(B > 2) sets the interval for B: [3, ∞]
(C == 2) sets the interval for C: [2, 2]
The next condition is attached with ||, so you add intervals:
((B > 2) && (C < 2))
(B > 2) sets the interval for B: [3, ∞]
(C < 2) sets the interval for C: [2, 2] U [-∞, 1] = [-∞, 2]
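The interval bookkeeping above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `interval_for`, `intersect` and `union` are hypothetical helper names, bounds are inclusive integers, and `-inf`/`inf` stand for an unbounded side. Merging intervals per variable across `||` is only sound when the other variables have identical intervals in both branches, as B does in the example.

```python
from math import inf
from typing import Optional, Tuple

Interval = Tuple[float, float]  # inclusive bounds; -inf / inf mean unbounded


def interval_for(op: str, value: int) -> Interval:
    """Interval of integers that satisfy `variable op value`."""
    if op == "<":
        return (-inf, value - 1)
    if op == ">":
        return (value + 1, inf)
    if op == "==":
        return (value, value)
    raise ValueError(f"unsupported operator: {op}")


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Combine two conditions joined by &&; None means they cannot both hold."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def union(a: Interval, b: Interval) -> Optional[Interval]:
    """Combine two conditions joined by ||; None if they leave a gap."""
    (lo1, hi1), (lo2, hi2) = sorted([a, b])
    if lo2 > hi1 + 1:  # integers between hi1 and lo2 are excluded
        return None
    return (lo1, max(hi1, hi2))


print(intersect(interval_for(">", 5), interval_for("==", 6)))  # (6, 6)
print(intersect(interval_for(">", 5), interval_for("==", 5)))  # None
print(union(interval_for("==", 2), interval_for("<", 2)))      # (-inf, 2)
```

The last result corresponds to C < 3, which matches the simplification asked for in the question.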

Julia merge-sort implementation not working correctly

I'm not quite sure why my merge-sort implementation is not working.
merge_sort takes as arguments an array A, and starting and final indices p and r. If I try to run merge_sort(A, 1, 9) on A = [1, 64, 64, 315, 14, 2, 3, 4, 5], A will become A = [1, 1, 1, 1, 1, 2, 2, 4, 5]. I'm trying to use a sentinel to detect whether the L and R arrays have been exhausted.
Here is the code:
function merge_sort(A, p, r)
    if p < r
        q = floor(Int, (p+r)/2)
        merge_sort(A, p, q)
        merge_sort(A, q+1, r)
        merge(A, p, q, r)
    end
end
function merge(A, p, q, r)
    n1 = q-p+1
    n2 = r-q
    L = []
    R = []
    for i = 1:n1
        push!(L, A[p+1-1])
    end
    for j = 1:n2
        push!(R, A[q+j])
    end
    sentinel = 123456789
    push!(L, sentinel)
    push!(R, sentinel)
    i=1
    j=1
    for k=p:r
        if L[i] <= R[j]
            A[k] = L[i]
            i = i+1
        else
            A[k] = R[j]
            j = j+1
        end
    end
end
You have a typo in push!(L, A[p+1-1]), which should be push!(L, A[p+i-1]). As written, every element pushed to L is A[p].
Here is a slightly cleaned-up version of your code (I kept your logic rather than trying to fully optimize it):
function merge_sort!(A, p = 1, r = length(A))
    if p < r
        q = div(p+r, 2)
        merge_sort!(A, p, q)
        merge_sort!(A, q+1, r)
        merge!(A, p, q, r)
    end
    A
end
function merge!(A, p, q, r)
    sentinel = typemax(eltype(A))
    L = A[p:q]
    R = A[(q+1):r]
    push!(L, sentinel)
    push!(R, sentinel)
    i, j = 1, 1
    for k in p:r
        if L[i] <= R[j]
            A[k] = L[i]
            i += 1
        else
            A[k] = R[j]
            j += 1
        end
    end
end
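For comparison, the same sentinel technique can be sketched in Python. This is an illustrative port, not part of the original answer: it uses 0-based inclusive indices and `math.inf` as the sentinel, which assumes the elements are ordinary numbers that compare below infinity.

```python
from math import inf


def merge(a, p, q, r):
    """Merge the sorted runs a[p..q] and a[q+1..r] (inclusive) in place."""
    left = a[p:q + 1] + [inf]        # the sentinel marks the end of each run
    right = a[q + 1:r + 1] + [inf]
    i = j = 0
    for k in range(p, r + 1):
        # Once one run is exhausted its sentinel loses every comparison,
        # so the remaining elements of the other run are copied over.
        if left[i] <= right[j]:
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1


def merge_sort(a, p=0, r=None):
    """Sort a[p..r] in place and return the list."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = (p + r) // 2
        merge_sort(a, p, q)
        merge_sort(a, q + 1, r)
        merge(a, p, q, r)
    return a


print(merge_sort([1, 64, 64, 315, 14, 2, 3, 4, 5]))
# [1, 2, 3, 4, 5, 14, 64, 64, 315]
```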

How to calculate the inversion count using merge sort for a list of Int in Scala?

def mergeSort(xs: List[Int]): List[Int] = {
  val n = xs.length / 2
  if (n == 0) xs
  else {
    def merge(xs: List[Int], ys: List[Int]): List[Int] =
      (xs, ys) match {
        case (Nil, ys) => ys
        case (xs, Nil) => xs
        case (x :: xs1, y :: ys1) =>
          if (x < y) {
            x :: merge(xs1, ys)
          }
          else {
            y :: merge(xs, ys1)
          }
      }
    val (left, right) = xs splitAt(n)
    merge(mergeSort(left), mergeSort(right))
  }
}
The inversion count of an array indicates how far (or close) the array is from being sorted. If the array is already sorted, the inversion count is 0. If the array is sorted in reverse order, the inversion count is at its maximum.
Formally, two elements a[i] and a[j] form an inversion if a[i] > a[j] and i < j.
Example:
The sequence 2, 4, 1, 3, 5 has three inversions: (2, 1), (4, 1), (4, 3).
So if List(2, 4, 1, 3, 5) is passed to the function, the inversion count should be 3.
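The definition can be checked directly with a brute-force enumeration. The Python sketch below is only a quadratic-time reference for verifying small cases, not the merge-sort approach the question asks about; `inversions` is a hypothetical helper name.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def inversions(a: Sequence[int]) -> List[Tuple[int, int]]:
    """List every pair (a[i], a[j]) with i < j and a[i] > a[j]."""
    # combinations() yields pairs in index order, so x always precedes y in a.
    return [(x, y) for x, y in combinations(a, 2) if x > y]


print(inversions([2, 4, 1, 3, 5]))  # [(2, 1), (4, 1), (4, 3)]
```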
How do I add a variable to get the number?
Maybe something like this will help:
def mergeSort(xs: List[Int], cnt: Int): (List[Int], Int) = {
  val n = xs.length / 2
  if (n == 0) (xs, cnt)
  else {
    def merge(xs: List[Int], ys: List[Int], cnt: Int): (List[Int], Int) =
      (xs, ys) match {
        case (Nil, ys) => (ys, cnt)
        case (xs, Nil) => (xs, cnt)
        case (x :: xs1, y :: ys1) =>
          if (x <= y) {
            val t = merge(xs1, ys, cnt)
            (x :: t._1, t._2)
          }
          else {
            val t = merge(xs, ys1, cnt + xs.size)
            (y :: t._1, t._2)
          }
      }
    val (left, right) = xs splitAt(n)
    val leftMergeSort = mergeSort(left, cnt)
    val rightMergeSort = mergeSort(right, cnt)
    merge(leftMergeSort._1, rightMergeSort._1, leftMergeSort._2 + rightMergeSort._2)
  }
}
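I am passing a tuple through all the function calls, that's all.
I increase cnt whenever the first element of the left list is greater than the first element of the right list. In that case every element remaining in the left list forms an inversion with that right element, so I add xs.size to cnt. Call it with an initial count of 0, e.g. mergeSort(List(2, 4, 1, 3, 5), 0), which returns (List(1, 2, 3, 4, 5), 3). Look at the code for a clearer view.
Hope this helps!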

Appending occurrences and aggregation to the duplicates records

This is my file which contains these lines:
**name,bus_id,bus_timing,bus_ticket**
(yuhindmklwm00409219,958193628,0305delete,2700)
(yuhindmklwm00409219,958193628,0305delete,800)
(yuhindmklwm00409219,959262446,0219delete,62)
(yuhindmklwm00437293,752013801,0220delete,2700)
(yuhindmklwm00437293,85382,0126delete,500)
(yuhindmklwm00437293,863056514,0326delete,-2700)
(yuhindmklwm00437293,863056514,0326delete,2700)
(yuhindmklwm00437293,85258,0313delete,1000)
(yuhindmklwm00437293,85012,0311delete,1000)
(yuhindmklwm00437293,85718,0311delete,2700)
(yuhindmklwm00437293,744622574,0322delete,90)
(yuhindmklwm00437293,83704,0215delete,17)
(yuhindmklwm00437293,85253,0331delete,-2700)
(yuhindmklwm00437293,85253,0331delete,2700)
(yuhindmklwm00437293,752013801,0305delete,2700)
(yuhindmklwm00437293,33165,0315delete,1000)
(yuhindmklwm00437293,85018,0319delete,100)
(yuhindmklwm00437293,85018,0219delete,100)
(yuhindmklwm00437293,85018,0118delete,100)
(yuhindmklwm00437293,90265,0312delete,6)
(yuhindmklwm00437293,02465,0312delete,25)
(yuhindmklwm00437293,857164939,0313delete,15)
(yuhindmklwm00437293,22102,0313delete,4)
(yuhindmklwm00437293,55423,0313delete,100)
(yuhindmklwm00437293,02465,0314delete,1)
(yuhindmklwm00437293,90265,0312delete,1)
(yuhindmklwm00437293,93108,0315delete,25)
(yuhindmklwm00437293,220432304,0315delete,35)
(yuhindmklwm00437293,701211570,0315delete,35)
(yuhindmklwm00437293,28801,0315delete,10)
(yuhindmklwm00437293,93108,0211delete,3)
(yuhindmklwm00437293,93108,02)
My final output should contain the duplicate records and their occurrences, along with the summed amount and the percentile:
(name,bus_id,bus_timing, 60th percentile value of bus_ticket, sum_bus_ticket, occurrence)
(yuhindmklwm00409219,958193628,0305delete,2000, 2700, 1)
(yuhindmklwm00409219,958193628,0305delete,2000, 3500, 2)
.......
.......
......
This can be solved with lists, but that's not efficient. Can somebody think of other data structures?
It's OK to ignore aggregations like sum or percentile, but at least one aggregation should be there.
This is my percentile function:
scala> def percentileValue(p: Int,data: List[Int]): Int = {val firstSort=data.sorted; val k=math.ceil((data.size-1) * (p / 100.0)).toInt; return firstSort(k).toInt}
percentileValue: (p: Int, data: List[Int])Int
scala> val lst=List(1,2,3,4,5,6)
lst: List[Int] = List(1, 2, 3, 4, 5, 6)
scala> percentileValue(60,lst)
res142: Int = 4
I shortened the data for easier testing. Something like this?
val lili = List (List ("yuhindmklwm004092193", "9581936283", "0305delete3", 2700),
List ("yuhindmklwm004092193", "9581936283", "0305delete3", 800),
List ("yuhindmklwm004092193", "9592624463", "0219delete3", 62),
List ("yuhindmklwm004372933", "7520138013", "0220delete3", 2700),
List ("yuhindmklwm004372933", "853823", "0126delete3", 500),
List ("yuhindmklwm004372933", "8630565143", "0326delete3", -2700),
List ("yuhindmklwm004372933", "8630565143", "0326delete3", 2700),
List ("yuhindmklwm004372933", "852583", "0313delete3", 1000))
Grouping:
scala> lili.groupBy {case (list) => list(0) }.map {case (k, v) => (k, v.map (_(3)))}
res18: scala.collection.immutable.Map[Any,List[Any]] = Map(yuhindmklwm004372933 -> List(2700, 500, -2700, 2700, 1000), yuhindmklwm004092193 -> List(2700, 800, 62))
Mapping on percentileValue:
lili.groupBy {case (list) => list(0) }.map {case (k, v) => (k, v.map (_(3)))}.map {case (k, v:List[Int])=> (k, percentileValue (60, v))}
<console>:10: warning: non-variable type argument Int in type pattern List[Int] (the underlying of List[Int]) is unchecked since it is eliminated by erasure
lili.groupBy {case (list) => list(0) }.map {case (k, v) => (k, v.map (_(3)))}.map {case (k, v:List[Int])=> (k, percentileValue (60, v))}
^
res22: scala.collection.immutable.Map[Any,Int] = Map(yuhindmklwm004372933 -> 2700, yuhindmklwm004092193 -> 2700)
scala> lili.groupBy {case (list) => list(0) }.map {case (k, v) => (k, v.map (_(3)))}.map {case (k, v:List[Int])=> (k, percentileValue (10, v))}
<console>:10: warning: non-variable type argument Int in type pattern List[Int] (the underlying of List[Int]) is unchecked since it is eliminated by erasure
lili.groupBy {case (list) => list(0) }.map {case (k, v) => (k, v.map (_(3)))}.map {case (k, v:List[Int])=> (k, percentileValue (10, v))}
^
res23: scala.collection.immutable.Map[Any,Int] = Map(yuhindmklwm004372933 -> 500, yuhindmklwm004092193 -> 800)
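The grouping idea can also be sketched with a hash map keyed by the duplicate fields, which avoids repeated list scans. The Python sketch below makes two assumptions not confirmed by the thread: duplicates are defined by the triple (name, bus_id, bus_timing), and one summary row per group is wanted. `summarize` is a hypothetical name, and `percentile_value` mirrors the index rule of percentileValue from the question.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # assumed duplicate key: (name, bus_id, bus_timing)


def percentile_value(p: int, data: List[int]) -> int:
    """Same rule as the question's percentileValue: index ceil((n - 1) * p / 100)."""
    ordered = sorted(data)
    k = math.ceil((len(ordered) - 1) * (p / 100.0))
    return ordered[k]


def summarize(rows: List[Tuple[str, str, str, int]],
              p: int = 60) -> Dict[Key, Tuple[int, int, int]]:
    """Map each key to (p-th percentile, sum, occurrences) of its ticket values."""
    groups: Dict[Key, List[int]] = defaultdict(list)
    for name, bus_id, timing, ticket in rows:
        groups[(name, bus_id, timing)].append(ticket)
    return {
        key: (percentile_value(p, tickets), sum(tickets), len(tickets))
        for key, tickets in groups.items()
    }


rows = [
    ("yuhindmklwm00409219", "958193628", "0305delete", 2700),
    ("yuhindmklwm00409219", "958193628", "0305delete", 800),
    ("yuhindmklwm00409219", "959262446", "0219delete", 62),
]
for key, stats in summarize(rows).items():
    print(key, stats)
```

Typed tuples are used here instead of a List[Any] per record, which in the Scala version would also avoid the unchecked-erasure warning shown above.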

no function clause matching in Enum.reduce/3

I'm getting this error when trying to sum a list I'm getting back from a comprehension:
range = 1..999
multiple_of_3_or_5? = fn(n) -> (rem(n, 3) == 0 || rem(n, 5) == 0) end
IO.inspect for n <- range, multiple_of_3_or_5?.(n),
do: Enum.reduce n, 0, fn(x) -> x end
#=> ** (FunctionClauseError) no function clause matching in Enum.reduce/3
Why am I getting this error?
The function in the third parameter of Enum.reduce needs to have two parameters, the element from the enumerable and an accumulator. You currently only have one parameter, x.
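Both the first and the third argument are wrong: n is a single integer rather than an enumerable, and the function must take two parameters. You can try this: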
range = 1..999
multiple_of_3_or_5? = fn(n) -> (rem(n, 3) == 0 || rem(n, 5) == 0) end
for n <- range, multiple_of_3_or_5?.(n) do n end |> Enum.reduce(0,
fn(x, acc) -> x + acc end)
or
range = 1..999
multiple_of_3_or_5? = fn(n) -> (rem(n, 3) == 0 || rem(n, 5) == 0) end
Enum.reduce_while(range, 0, fn i, acc ->
if multiple_of_3_or_5?.(i), do: {:cont, acc + i}, else: {:cont, acc}
end)
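The two-argument reducer described in these answers looks much the same in other languages. As a hedged illustration in Python (not Elixir), `functools.reduce` also takes a function of two arguments, although Python passes the accumulator first and the element second, the reverse of Elixir's `fn(x, acc)`. The helper name `multiple_of_3_or_5` is just a port of the question's predicate.

```python
from functools import reduce


def multiple_of_3_or_5(n: int) -> bool:
    return n % 3 == 0 or n % 5 == 0


# Filter first, then fold the surviving numbers into a running sum.
# The reducer receives (accumulator, element) and returns the new accumulator.
total = reduce(
    lambda acc, x: acc + x,
    (n for n in range(1, 1000) if multiple_of_3_or_5(n)),
    0,
)
print(total)  # 233168
```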
