Use Ruby to truncate duplicate patterns in an Array

For example, I have
tt = [0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0]
and I would like to slim it down to
tt_out = [0, 1, 1, 2, 2, 1, 1, 0, 0]
I would also like to know where each run of repeated values begins and ends, so I'd like to produce the following tip:
tip = '0','1.','.5','6.','.11','12.','.15','16.','.20'

tt = [0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0]
tip = []
tt_out = tt.map.with_index { |t, i|
  # A run starts where the previous element differs (or at index 0)
  start_range = (i == 0 || tt[i - 1] != tt[i])
  # A run ends where the next element differs (tt[i + 1] is nil past the end)
  end_range = (tt[i + 1] != tt[i])
  if start_range && end_range
    tip << "#{i}"
  elsif start_range
    tip << "#{i}."
  elsif end_range
    tip << ".#{i}"
  end
  # Keep only the first and last element of each run
  t if start_range || end_range
}.compact
tip
=> ["0", "1.", ".5", "6.", ".11", "12.", ".15", "16.", ".20"]
tt_out
=> [0, 1, 1, 2, 2, 1, 1, 0, 0]
P.S: You've got an error in your example, the last element of tip should be '.20'

Related

Pandas Series correlation against a single vector

I have a DataFrame with a list of arrays as one column.
import pandas as pd
v = [1, 2, 3, 4, 5, 6, 7]
v1 = [1, 0, 0, 0, 0, 0, 0]
v2 = [0, 1, 0, 0, 1, 0, 0]
v3 = [1, 1, 0, 0, 0, 0, 1]
df = pd.DataFrame({'A': [v1, v2, v3]})
print(df)
Output:
                       A
0  [1, 0, 0, 0, 0, 0, 0]
1  [0, 1, 0, 0, 1, 0, 0]
2  [1, 1, 0, 0, 0, 0, 1]
I want to do a pd.Series.corr for each row of df.A against the single vector v.
I'm currently doing a loop on df.A and achieving it. It is very slow.
Expected Output:
                       A         B
0  [1, 0, 0, 0, 0, 0, 0] -0.612372
1  [0, 1, 0, 0, 1, 0, 0] -0.158114
2  [1, 1, 0, 0, 0, 0, 1] -0.288675
Here's one using the correlation definition with NumPy tools meant for performance, via a helper function corr2_coeff_rowwise -
import numpy as np
a = np.array(df.A.tolist()) # or np.vstack(df.A.values)
df['B'] = corr2_coeff_rowwise(a, np.asarray(v)[None])
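The helper corr2_coeff_rowwise is used above without its definition. The following is a minimal sketch of what such a function could look like, assuming it computes the Pearson correlation between corresponding rows and that a single-row second argument is broadcast against every row of the first. It is an illustration of the correlation definition, not necessarily the answerer's exact code:

```python
import numpy as np

def corr2_coeff_rowwise(a, b):
    """Pearson correlation between matching rows of a and b.

    Assumed shapes: a is (n, k); b is (n, k) or (1, k), in which case
    the single row of b is broadcast against every row of a.
    """
    # Subtract each row's mean so the rows are centered
    a_c = a - a.mean(axis=1, keepdims=True)
    b_c = b - b.mean(axis=1, keepdims=True)
    # Row-wise sums of squares (the unnormalized variances)
    ss_a = (a_c ** 2).sum(axis=1)
    ss_b = (b_c ** 2).sum(axis=1)
    # Row-wise covariance divided by the product of standard deviations
    return (a_c * b_c).sum(axis=1) / np.sqrt(ss_a * ss_b)

v = [1, 2, 3, 4, 5, 6, 7]
rows = np.array([[1, 0, 0, 0, 0, 0, 0],
                 [0, 1, 0, 0, 1, 0, 0],
                 [1, 1, 0, 0, 0, 0, 1]])
result = corr2_coeff_rowwise(rows, np.asarray(v)[None])
print(result)
```

For the three sample rows this reproduces the values in the expected output (-0.612372, -0.158114, -0.288675). A row with zero variance would produce a division by zero, which this sketch does not guard against.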
Runtime test -
Case #1 : 1000 rows
In [59]: df = pd.DataFrame({'A': [np.random.randint(0,9,(7)) for i in range(1000)]})
In [60]: v = np.random.randint(0,9,(7)).tolist()
# @jezrael's soln
In [61]: %timeit df['new'] = pd.DataFrame(df['A'].values.tolist()).corrwith(pd.Series(v), axis=1)
10 loops, best of 3: 142 ms per loop
In [62]: %timeit df['B'] = corr2_coeff_rowwise(np.array(df.A.tolist()), np.asarray(v)[None])
1000 loops, best of 3: 461 µs per loop
Case #2 : 10000 rows
In [63]: df = pd.DataFrame({'A': [np.random.randint(0,9,(7)) for i in range(10000)]})
In [64]: v = np.random.randint(0,9,(7)).tolist()
# @jezrael's soln
In [65]: %timeit df['new'] = pd.DataFrame(df['A'].values.tolist()).corrwith(pd.Series(v), axis=1)
1 loop, best of 3: 1.38 s per loop
In [66]: %timeit df['B'] = corr2_coeff_rowwise(np.array(df.A.tolist()), np.asarray(v)[None])
100 loops, best of 3: 3.05 ms per loop
Use corrwith, but if performance is important, Divakar's answer should be faster:
df['new'] = pd.DataFrame(df['A'].values.tolist()).corrwith(pd.Series(v), axis=1)
print (df)
                       A       new
0  [1, 0, 0, 0, 0, 0, 0] -0.612372
1  [0, 1, 0, 0, 1, 0, 0] -0.158114
2  [1, 1, 0, 0, 0, 0, 1] -0.288675

Linearly reading a multi-dimensional array obeying dimensional sub-sectioning

I have an API for reading multi-dimensional arrays that requires passing a vector of ranges to read sub-rectangles (or hypercubes) from the backing array. I want to read this array "linearly", i.e. all elements in some given order with arbitrary chunk sizes. Thus, given an offset off and a length len, the task is to translate the elements covered by this range into the smallest possible set of hypercubes, i.e. the smallest number of read commands issued through the API.
For example, we can calculate the index vector across all dimensions for a given linear index:
def calcIndices(off: Int, shape: Vector[Int]): Vector[Int] = {
  val modsDivs = shape zip shape.scanRight(1)(_ * _).tail
  modsDivs.map { case (mod, div) =>
    (off / div) % mod
  }
}
Let's say the shape is this, representing an array with rank 4 and 120 elements in total:
val sz = Vector(2, 3, 4, 5)
val num = sz.product // 120
A utility to print these index vectors for a range of linear offsets:
def printIndices(off: Int, len: Int): Unit =
  (off until (off + len)).map(calcIndices(_, sz))
    .map(_.mkString("[", ", ", "]")).foreach(println)
We can generate all those vectors:
printIndices(0, num)
[0, 0, 0, 0]
[0, 0, 0, 1]
[0, 0, 0, 2]
[0, 0, 0, 3]
[0, 0, 0, 4]
[0, 0, 1, 0]
[0, 0, 1, 1]
[0, 0, 1, 2]
[0, 0, 1, 3]
[0, 0, 1, 4]
[0, 0, 2, 0]
[0, 0, 2, 1]
[0, 0, 2, 2]
[0, 0, 2, 3]
[0, 0, 2, 4]
[0, 0, 3, 0]
[0, 0, 3, 1]
[0, 0, 3, 2]
[0, 0, 3, 3]
[0, 0, 3, 4]
[0, 1, 0, 0]
...
[1, 2, 1, 4]
[1, 2, 2, 0]
[1, 2, 2, 1]
[1, 2, 2, 2]
[1, 2, 2, 3]
[1, 2, 2, 4]
[1, 2, 3, 0]
[1, 2, 3, 1]
[1, 2, 3, 2]
[1, 2, 3, 3]
[1, 2, 3, 4]
Let's look at an example chunk that should be read,
the first six elements:
val off1 = 0
val len1 = 6
printIndices(off1, len1)
I will already partition the output by hand into hypercubes:
// first hypercube or read
[0, 0, 0, 0]
[0, 0, 0, 1]
[0, 0, 0, 2]
[0, 0, 0, 3]
[0, 0, 0, 4]
// second hypercube or read
[0, 0, 1, 0]
So the task is to define a method
def partition(shape: Vector[Int], off: Int, len: Int): List[Vector[Range]]
which outputs the correct list and uses the smallest possible list size.
So for off1 and len1, we have the expected result:
val res1 = List(
Vector(0 to 0, 0 to 0, 0 to 0, 0 to 4),
Vector(0 to 0, 0 to 0, 1 to 1, 0 to 0)
)
assert(res1.map(_.map(_.size).product).sum == len1)
A second example, elements at indices 6 until 22, with manual partitioning giving three hypercubes or read commands:
val off2 = 6
val len2 = 16
printIndices(off2, len2)
// first hypercube or read
[0, 0, 1, 1]
[0, 0, 1, 2]
[0, 0, 1, 3]
[0, 0, 1, 4]
// second hypercube or read
[0, 0, 2, 0]
[0, 0, 2, 1]
[0, 0, 2, 2]
[0, 0, 2, 3]
[0, 0, 2, 4]
[0, 0, 3, 0]
[0, 0, 3, 1]
[0, 0, 3, 2]
[0, 0, 3, 3]
[0, 0, 3, 4]
// third hypercube or read
[0, 1, 0, 0]
[0, 1, 0, 1]
expected result:
val res2 = List(
Vector(0 to 0, 0 to 0, 1 to 1, 1 to 4),
Vector(0 to 0, 0 to 0, 2 to 3, 0 to 4),
Vector(0 to 0, 1 to 1, 0 to 0, 0 to 1)
)
assert(res2.map(_.map(_.size).product).sum == len2)
Note that for val off3 = 6; val len3 = 21, we would need four readings.
The idea of the following algorithm is as follows:
- A point-of-interest (poi) is the left-most position at which two index representations differ (for example, for [0, 0, 0, 1] and [0, 1, 0, 0] the poi is 1).
- We recursively sub-divide the original (start, stop) linear index range.
- We use motions in two directions: first by keeping the start constant and decreasing the stop through a special "ceil" operation on the start, later by keeping the stop constant and increasing the start through a special "floor" operation on the stop.
- For each sub-range, we calculate the poi of the boundaries, and we calculate "trunc", which is the ceil or floor operation described above.
- If this trunc value is identical to its input, we add the entire region and return; otherwise we recurse.
- The special "ceil" operation takes the previous start value, increases the element at the poi index and zeroes the subsequent elements; e.g. for [0, 0, 1, 1] and poi = 2, the ceil would be [0, 0, 2, 0].
- The special "floor" operation takes the previous stop value and zeroes the elements after the poi index; e.g. for [0, 0, 1, 1] and poi = 2, the floor would be [0, 0, 1, 0].
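The poi, "ceil" and "floor" operations described above can be tried out in isolation. Below is a small Python sketch of just those building blocks (not the full partitioning algorithm). The function names calc_indices, calc_off, calc_poi and index_trunc are illustrative choices, and calc_indices follows the first, fully modular definition given earlier:

```python
def strides(shape):
    # Row-major strides: each entry is the product of all dimensions to its right
    out, acc = [], 1
    for dim in reversed(shape):
        out.append(acc)
        acc *= dim
    return out[::-1]

def calc_indices(off, shape):
    # Linear offset -> index vector
    return [(off // div) % mod for mod, div in zip(shape, strides(shape))]

def calc_off(indices, shape):
    # Index vector -> linear offset
    return sum(i * div for i, div in zip(indices, strides(shape)))

def calc_poi(a, b):
    # Left-most position at which the two index vectors differ
    for pos, (ai, bi) in enumerate(zip(a, b)):
        if ai != bi:
            return pos
    return len(a)

def index_trunc(a, poi, inc):
    # inc=True is the "ceil" (bump the poi element), inc=False the "floor";
    # both zero every element after the poi
    out = []
    for pos, ai in enumerate(a):
        if pos < poi:
            out.append(ai)
        elif pos > poi:
            out.append(0)
        else:
            out.append(ai + 1 if inc else ai)
    return out

shape = [2, 3, 4, 5]
start = calc_indices(6, shape)            # [0, 0, 1, 1]
ceil_idx = index_trunc(start, 2, True)    # [0, 0, 2, 0]
floor_idx = index_trunc(start, 2, False)  # [0, 0, 1, 0]
print(start, ceil_idx, floor_idx, calc_off(ceil_idx, shape))
```

For the second example (off = 6), the ceil of the start at poi = 2 maps back to linear offset 10, which is exactly where the second hypercube of the hand-partitioned output begins.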
Here is my implementation. First, a few utility functions:
def calcIndices(off: Int, shape: Vector[Int]): Vector[Int] = {
  val modsDivs = (shape, shape.scanRight(1)(_ * _).tail, shape.indices).zipped
  modsDivs.map { case (mod, div, idx) =>
    val x = off / div
    if (idx == 0) x else x % mod
  }
}
def calcPOI(a: Vector[Int], b: Vector[Int], min: Int): Int = {
  val res = (a.drop(min) zip b.drop(min)).indexWhere { case (ai,bi) => ai != bi }
  if (res < 0) a.size else res + min
}
def zipToRange(a: Vector[Int], b: Vector[Int]): Vector[Range] =
  (a, b).zipped.map { (ai, bi) =>
    require (ai <= bi)
    ai to bi
  }
def calcOff(a: Vector[Int], shape: Vector[Int]): Int = {
  val divs = shape.scanRight(1)(_ * _).tail
  (a, divs).zipped.map(_ * _).sum
}
def indexTrunc(a: Vector[Int], poi: Int, inc: Boolean): Vector[Int] =
  a.zipWithIndex.map { case (ai, i) =>
    if (i < poi) ai
    else if (i > poi) 0
    else if (inc) ai + 1
    else ai
  }
Then the actual algorithm:
def partition(shape: Vector[Int], off: Int, len: Int): List[Vector[Range]] = {
  val rankM = shape.size - 1
  def loop(start: Int, stop: Int, poiMin: Int, dir: Boolean,
           res0: List[Vector[Range]]): List[Vector[Range]] =
    if (start == stop) res0 else {
      val last = stop - 1
      val s0 = calcIndices(start, shape)
      val s1 = calcIndices(stop , shape)
      val s1m = calcIndices(last , shape)
      val poi = calcPOI(s0, s1m, poiMin)
      val ti = if (dir) s0 else s1
      val to = if (dir) s1 else s0
      val st = if (poi >= rankM) to else indexTrunc(ti, poi, inc = dir)
      val trunc = calcOff(st, shape)
      val split = trunc != (if (dir) stop else start)
      if (split) {
        if (dir) {
          val res1 = loop(start, trunc, poiMin = poi+1, dir = true , res0 = res0)
          loop (trunc, stop , poiMin = 0 , dir = false, res0 = res1)
        } else {
          val s1tm = calcIndices(trunc - 1, shape)
          val res1 = zipToRange(s0, s1tm) :: res0
          loop (trunc, stop , poiMin = poi+1, dir = false, res0 = res1)
        }
      } else {
        zipToRange(s0, s1m) :: res0
      }
    }
  loop(off, off + len, poiMin = 0, dir = true, res0 = Nil).reverse
}
Examples:
val sz = Vector(2, 3, 4, 5)
partition(sz, 0, 6)
// result:
List(
Vector(0 to 0, 0 to 0, 0 to 0, 0 to 4), // first hypercube
Vector(0 to 0, 0 to 0, 1 to 1, 0 to 0) // second hypercube
)
partition(sz, 6, 21)
// result:
List(
Vector(0 to 0, 0 to 0, 1 to 1, 1 to 4), // first read
Vector(0 to 0, 0 to 0, 2 to 3, 0 to 4), // second read
Vector(0 to 0, 1 to 1, 0 to 0, 0 to 4), // third read
Vector(0 to 0, 1 to 1, 1 to 1, 0 to 1) // fourth read
)
The maximum number of reads, if I'm not mistaken, would be 2 * rank.

Mathematica Generate Binary Numbers with Locked Bits

I have a very specific Mathematica question. I am trying to generate all the binary numbers around certain 'locked' bits. I am using a list of string values to denote which bits are locked, e.g. {"U","U","L","U"}, where U is an "unlocked" mutable bit and L is a "locked" immutable bit. I start with a temporary list of random binary digits that has been formatted to match the previous list, e.g. {0, 1, 1, 0}, where the 1 in the third position is the locked bit. I need to find all the remaining binary numbers in which that locked bit stays constant. I've approached this problem recursively, iteratively, and with a combination of both, with no results. This is for research I am doing at my university.
I am building a list of base 10 forms of the binary numbers. I realize that this code is completely wrong. This is just one attempt.
Do[
If[bits[[pos]] == "U",
AppendTo[returnList, myFunction[bits, temp, pos, returnList]]; ],
{pos, 8, 1}]
myFunction[bits_, bin_, pos_, rList_] :=
Module[{binary = bin, current = Length[bin], returnList = rList},
If[pos == current,
Return[returnList],
If[bits[[current]] == "U",
(*If true*)
If[! MemberQ[returnList, FromDigits[binary, 2]],
(*If true*)
AppendTo[returnList, FromDigits[binary, 2]];
binary[[current]] = Abs[binary[[current]] - 1],
(*If false*)
binary[[current]] = 0;
current = current - 1]; ,
(*If false*)
current = current - 1];
returnList = myFunction[bits, binary, pos, returnList];
Return[returnList]]]
You can use Tuples and Fold to generate only bit sets that you are interested in.
bits = {"U", "U", "L", "U"};
Fold[
Function[{running, next},
Insert[running, 1, next]], #, Position[bits, "L"]] & /@ Tuples[{0, 1}, Count["U"]@bits]
(*
{{0, 0, 1, 0}, {0, 0, 1, 1}, {0, 1, 1, 0}, {0, 1, 1, 1},
{1, 0, 1, 0}, {1, 0, 1, 1}, {1, 1, 1, 0}, {1, 1, 1, 1}}
*)
Hope this helps.
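For readers who want to cross-check the expected output outside Mathematica, the same idea (enumerate every 0/1 tuple for the unlocked positions and leave the locked positions untouched) can be sketched in Python. The function name locked_variants and its arguments are hypothetical, chosen only for this illustration:

```python
from itertools import product

def locked_variants(mask, start):
    """All bit lists that agree with `start` at every "L" position.

    mask  -- list of "U"/"L" markers
    start -- list of 0/1 values; only its "L" positions are kept
    """
    unlocked = [i for i, m in enumerate(mask) if m == "U"]
    variants = []
    # One tuple of bits per combination of the unlocked positions
    for combo in product([0, 1], repeat=len(unlocked)):
        bits = list(start)
        for pos, bit in zip(unlocked, combo):
            bits[pos] = bit
        variants.append(bits)
    return variants

variants = locked_variants(["U", "U", "L", "U"], [0, 1, 1, 0])
# Base-10 forms, most significant bit first
decimals = [int("".join(map(str, bits)), 2) for bits in variants]
print(variants)
print(decimals)
```

With three unlocked bits this yields 2^3 = 8 variants, in the same order as the Mathematica output above.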
in = IntegerDigits[Round[ Pi 10^9 ], 2];
mask = RandomSample[ConstantArray["L", 28]~Join~ConstantArray["U", 4],32];
subs[in_, mask_] := Module[ {p = Position[mask, "U"]} ,
ReplacePart[in, Rule @@@ Transpose[{p, #}]] & /@
Tuples[{0, 1}, Length@p]]
subs[in, mask]
{{1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1,
0, 0, 1, 0, 0, 1, 0, 1, 0}, {1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0,
0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0}, {1, 0, 1,
1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0,
1, 0, 0, 1, 0, 1, 0}, {1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0,
0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0}, ...
FromDigits[#, 2] & /@ %
{3108030026, 3108030030, 3108038218, 3108038222, 3108095562,
3108095566, 3108103754, 3108103758, 3141584458, 3141584462,
3141592650, 3141592654, 3141649994, 3141649998, 3141658186,
3141658190}
myFunction[bits_] := Module[{length, num, range, all, pattern},
length = Length[bits];
num = 2^length;
range = Range[0, num - 1];
all = PadLeft[IntegerDigits[#, 2], length] & /@ range;
pattern = bits /. {"U" -> _, "L" -> 1};
Cases[all, pattern]]
bits = {"U", "U", "L", "U"};
myFunction[bits]
{{0, 0, 1, 0}, {0, 0, 1, 1}, {0, 1, 1, 0}, {0, 1, 1, 1},
{1, 0, 1, 0}, {1, 0, 1, 1}, {1, 1, 1, 0}, {1, 1, 1, 1}}

ruby array sum of elements with structure conversion

I have
{
3=>[
{63=>[5, 0, 1, 0]},
{64=>[0, 0, 0, 0]},
{65=>[0, 1, 2, 2]}
],
1=>[
{31=>[2, 0, 0, 0]},
{32=>[0, 0, 3, 0]}
]
}
I need to convert into
{ 3 => [5,1,3,2], 1 => [2,0,3,0] }
h= {
3=>[
{63=>[5, 0, 1, 0]},
{64=>[0, 0, 0, 0]},
{65=>[0, 1, 2, 2]}
],
1=>[
{31=>[2, 0, 0, 0]},
{32=>[0, 0, 3, 0]}
]
}
p h.map{ |k, v| { k=> v.map(&:values).flatten(1).transpose.map{ |r| r.reduce(:+) } } }
# => [{3=>[5, 1, 3, 2]}, {1=>[2, 0, 3, 0]}]
It's nothing difficult, you just need a little attention.
a = {
3=>[
{63=>[5, 0, 1, 0]},
{64=>[0, 0, 0, 0]},
{65=>[0, 1, 2, 2]}
],
1=>[
{31=>[2, 0, 0, 0]},
{32=>[0, 0, 3, 0]}
]
}
b = a.each_with_object({}) do |(k, v), memo|
  res = []
  v.each do |h|
    h.each do |_, v2|
      v2.each_with_index do |el, idx|
        res[idx] ||= 0
        res[idx] += el
      end
    end
  end
  memo[k] = res
end
b # => {3=>[5, 1, 3, 2], 1=>[2, 0, 3, 0]}
Here is a version with readable variable names and a basic explanation.
a = {
3=>[
{63=>[5, 0, 1, 0]},
{64=>[0, 0, 0, 0]},
{65=>[0, 1, 2, 2]}
],
1=>[
{31=>[2, 0, 0, 0]},
{32=>[0, 0, 3, 0]}
]
}
b = a.each_with_object({}) do |(key, sub_hashes), result|
  # Get the subarray for each nested hash (Ignore keys on the nested hashes)
  # Also flattening while mapping to get appropriate array of arrays
  value = sub_hashes.flat_map(&:values).
    # Transpose each row into a column
    # e.g. [[5,0,1,0], [0,0,0,0], [0,1,2,2]] becomes [[5,0,0], [0,0,1], [1,0,2], [0,0,2]]
    transpose.
    # Sum each column
    # e.g. [1,0,2] = 1 + 0 + 2 = 3
    map { |column| column.reduce(0, :+) }
  # Update results set (Could also get rid of intermediate variable 'value' if you wish)
  result[key] = value
end
puts b # => {3=>[5, 1, 3, 2], 1=>[2, 0, 3, 0]}
puts b == {3 => [5,1,3,2], 1=>[2,0,3,0]}
Edit: Now using flat_map!

Most performant way to invert an array?

If I have an array like
ary = [0, 0, 3, 0, 0, 0, 2, 0, 1, 0, 1, 1, 0]
What is the most performant way to get a list in which each index appears as many times as the value stored at that index?
inverted = [2,2,2,6,6,8,10,11]
This is what I've come up with, but it seems like there is a more efficient way:
a = []
ary.each_with_index{|v,i| a << Array.new(v, i) if v != 0}
a.flatten
=> [2, 2, 2, 6, 6, 8, 10, 11]
Unless profiling proves this to be a bottleneck, a functional approach is cleaner:
>> ary.each_with_index.map { |x, idx| [idx]*x }.flatten(1)
=> [2, 2, 2, 6, 6, 8, 10, 11]
If you use Ruby 1.9, I'd recommend this (thanks to sawa for pointing out Enumerable#flat_map):
>> ary.flat_map.with_index { |x, idx| [idx]*x }
=> [2, 2, 2, 6, 6, 8, 10, 11]
[edit: removed examples using inject and each_with_object, it's unlikely they are faster than flat_map + with_index]
You could use Array#push instead of Array#<< to speed this up a little.
ary.each_with_index{|v,i| a.push(*Array.new(v, i)) if v != 0}
Some quick benchmarking shows me that this is about 30% faster than using <<.
> ary = [0, 0, 3, 0, 0, 0, 2, 0, 1, 0, 1, 1, 0]
# => [0, 0, 3, 0, 0, 0, 2, 0, 1, 0, 1, 1, 0]
> quick_bench(10**5) do
> a = []
> ary.each_with_index{|v,i| a << Array.new(v, i) if v != 0}
> a.flatten
> end
Rehearsal ------------------------------------
1.200000 0.020000 1.220000 ( 1.209861)
--------------------------- total: 1.220000sec
user system total real
1.150000 0.000000 1.150000 ( 1.147103)
# => nil
> quick_bench(10**5) do
> a = []
> ary.each_with_index{|v,i| a.push(*Array.new(v, i)) if v != 0}
> end
Rehearsal ------------------------------------
0.870000 0.000000 0.870000 ( 0.865190)
--------------------------- total: 0.870000sec
user system total real
0.860000 0.000000 0.860000 ( 0.858628)
# => nil
> a = []
# => []
> ary.each_with_index{|v,i| a.push(*Array.new(v, i)) if v != 0}
# => [0, 0, 3, 0, 0, 0, 2, 0, 1, 0, 1, 1, 0]
> a
# => [2, 2, 2, 6, 6, 8, 10, 11]
