Reformat text file using awk or sed - bash

I have a text data file that looks like below:
Day-Hour, 08188, 0, 08188, 1, (indicating the time is year 2008, julian day 188, between hour0 and hour1)
Receptor, A, (actual data begins)
1, 2, 3, 4,
5, 6, 7, 8,
Receptor, B,
1, 2, 3, 4,
5, 6, 7, 8,
... (continue data for a total of 22 receptors, each receptor has 8 data values)
Day-Hour, 08188, 1, 08188, 2,
Receptor, A,
1, 2, 3, 4,
5, 6, 7, 8,
Receptor, B,
1, 2, 3, 4,
5, 6, 7, 8,
... (continue data for a total of 22 receptors, each receptor has 8 data values, but this is for hours 1 to 2)
...... (continue the same previous pattern for a total of 24 times)
I'd like to reformat it to be like this:
day, time, receptor, data1, data2, data3, ....data8 (header)
08188, 0, A, 1, 2, 3, 4, 5, 6, 7, 8
08188, 0, B, 1, 2, 3, 4, 5, 6, 7, 8
... (repeat the same hour for all 22 receptor sites)
08188, 1, A, 1, 2, 3, 4, 5, 6, 7, 8
08188, 1, B, 1, 2, 3, 4, 5, 6, 7, 8
...(repeat the same hour for all 22 receptor sites)
...
...(repeat the same order 24 times)
I've managed to get the format I want in several steps using combinations of awk and sed, as shown below, but going through so many steps feels clumsy, and I'm hoping the experts can suggest a much simpler approach. Your input is greatly appreciated!
(example steps:)
step1: $ grep -v "Day-Hour" infile.txt > temp1.txt # remove all Day-Hour lines,
# as I know the order of the data
step2: $ sed '/^$/d' temp1.txt > temp2.txt # remove empty lines
step3: $ awk 'ORS=NR%3?" ":"\n"' temp2.txt > temp3.txt # concatenate every 3 lines
step4: $ (create a file, e.g. daytime.txt, with 2 fields (day and hour) with following content)
08188, 0,
(repeat 22 times)
08188, 1,
(repeat 22 times)
.... (continue through hour 23)
step5: $ paste daytime.txt temp3.txt > final.txt

This may do the job:
cat file
Day-Hour, 08188, 0, 08188, 1
Receptor, A,
1, 2, 3, 4,
5, 6, 7, 8,
Receptor, B,
11, 12, 13, 14,
15, 16, 17, 18,
Receptor, C,
21, 22, 23, 24,
25, 26, 27, 28,
Day-Hour, 08188, 1, 08188, 2
Receptor, A,
1, 2, 3, 4,
5, 6, 7, 8,
Receptor, B,
1, 2, 3, 4,
5, 6, 7, 8,
awk -v RS= -v OFS=", " -F", *|\n" 'BEGIN {print "day, time, receptor, data1, data2, data3,....data8"} {for (i=7;i<=NF;i+=13) print $2,$3,$i,$(i+2),$(i+3),$(i+4),$(i+5),$(i+7),$(i+8),$(i+9),$(i+10)}' file
day, time, receptor, data1, data2, data3,....data8
08188, 0, A, 1, 2, 3, 4, 5, 6, 7, 8
08188, 0, B, 11, 12, 13, 14, 15, 16, 17, 18
08188, 0, C, 21, 22, 23, 24, 25, 26, 27, 28
08188, 1, A, 1, 2, 3, 4, 5, 6, 7, 8
08188, 1, B, 1, 2, 3, 4, 5, 6, 7, 8
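This prints every receptor, whether there is 1 or 22. Because RS is empty, awk reads the file in paragraph mode, so each Day-Hour block must be separated from the next by a blank line.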
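For comparison, the same reformatting can be sketched as a small line-by-line state machine in Python. This is only an illustration, not part of the awk answer: the function name `reformat` and the `sample` input are made up here, and the sketch assumes every receptor has exactly 8 values, as stated in the question.

```python
def reformat(lines):
    """Flatten Day-Hour / Receptor blocks into one row per receptor."""
    header = "day, time, receptor, " + ", ".join(f"data{i}" for i in range(1, 9))
    rows = [header]
    day = hour = receptor = None
    values = []
    for raw in lines:
        # Split on commas and drop the empty field left by a trailing comma.
        fields = [f.strip() for f in raw.split(",") if f.strip()]
        if not fields:
            continue  # blank separator line
        if fields[0] == "Day-Hour":
            day, hour = fields[1], fields[2]
        elif fields[0] == "Receptor":
            receptor, values = fields[1], []
        else:
            values.extend(fields)
            if len(values) == 8:  # assumption: 8 values per receptor
                rows.append(", ".join([day, hour, receptor] + values))
    return rows


sample = """\
Day-Hour, 08188, 0, 08188, 1
Receptor, A,
1, 2, 3, 4,
5, 6, 7, 8,
Receptor, B,
11, 12, 13, 14,
15, 16, 17, 18,

Day-Hour, 08188, 1, 08188, 2
Receptor, A,
1, 2, 3, 4,
5, 6, 7, 8,
""".splitlines()

for row in reformat(sample):
    print(row)
```

Unlike the awk one-liner, this version does not depend on blank lines between blocks, because it tracks the current day and hour explicitly.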

This will join them up:
sed 's/$/,/;N;N;N;N;N;N;N; s/\n/ /g' foo.txt
into this:
Day-Hour, 08188, 0, 08188, 1, Receptor, A, 1, 2, 3, 4, 5, 6, 7, 8,
Receptor, B, 1, 2, 3, 4, 5, 6, 7, 8, Day-Hour, 08188, 1, 08188, 2,
Receptor, A, 1, 2, 3, 4, 5, 6, 7, 8, Receptor, B, 1, 2, 3, 4, 5, 6, 7,
8,
Then I got lazy in the repackaging:
... | awk '{ $1 = ""; $4 = ""; $5 = ""; print }' | sed -e 's/ \(.*\) Receptor, \(A,.*\)Receptor, \(B, .*\)/\1\2\n\1\3/'
Which produced the desired output on my system.

Split an array which contains elements in partial relative order into two sorted arrays in O(n) time

Assume I have two arrays, both of them are sorted, for example:
A: [1, 4, 5, 8, 10, 24]
B: [3, 6, 9, 29, 50, 65]
Then I merge these two arrays into one array, keeping the original relative order within each of them:
C: [1, 4, 3, 5, 6, 9, 8, 29, 10, 24, 50, 65]
Is there any way to split C into two sorted arrays in O(n) time?
note: not necessarily into the original A and B
Greedily assign your integers to list 1 if they can go there. If they can't, assign them to list 2.
Here's some Ruby code to play around with this idea. It randomly splits the integers from 0 to n-1 into two sorted lists, then randomly merges them, then applies the greedy approach.
def f(n)
  split1 = []
  split2 = []
  0.upto(n-1) do |i|
    if rand < 0.5
      split1.append(i)
    else
      split2.append(i)
    end
  end
  puts "input 1: #{split1.to_s}"
  puts "input 2: #{split2.to_s}"
  merged = []
  split1.reverse!
  split2.reverse!
  while split1.length > 0 && split2.length > 0
    if rand < 0.5
      merged.append(split1.pop)
    else
      merged.append(split2.pop)
    end
  end
  merged += split1.reverse
  merged += split2.reverse
  puts "merged: #{merged.to_s}"
  merged.reverse!
  greedy1 = [merged.pop]
  greedy2 = []
  while merged.length > 0
    if merged[-1] >= greedy1[-1]
      greedy1.append(merged.pop)
    else
      greedy2.append(merged.pop)
    end
  end
  puts "greedy1: #{greedy1.to_s}"
  puts "greedy2: #{greedy2.to_s}"
end
Here's sample output:
> f(20)
input 1: [2, 3, 4, 5, 8, 9, 10, 18, 19]
input 2: [0, 1, 6, 7, 11, 12, 13, 14, 15, 16, 17]
merged: [2, 0, 1, 6, 3, 4, 5, 8, 9, 7, 10, 11, 18, 12, 13, 19, 14, 15, 16, 17]
greedy1: [2, 6, 8, 9, 10, 11, 18, 19]
greedy2: [0, 1, 3, 4, 5, 7, 12, 13, 14, 15, 16, 17]
> f(20)
input 1: [1, 3, 5, 6, 8, 9, 10, 11, 13, 15]
input 2: [0, 2, 4, 7, 12, 14, 16, 17, 18, 19]
merged: [0, 2, 4, 7, 12, 14, 16, 1, 3, 5, 6, 8, 17, 9, 18, 10, 19, 11, 13, 15]
greedy1: [0, 2, 4, 7, 12, 14, 16, 17, 18, 19]
greedy2: [1, 3, 5, 6, 8, 9, 10, 11, 13, 15]
> f(20)
input 1: [0, 1, 2, 6, 7, 9, 11, 14, 15, 18]
input 2: [3, 4, 5, 8, 10, 12, 13, 16, 17, 19]
merged: [3, 4, 5, 8, 10, 12, 0, 13, 16, 17, 1, 19, 2, 6, 7, 9, 11, 14, 15, 18]
greedy1: [3, 4, 5, 8, 10, 12, 13, 16, 17, 19]
greedy2: [0, 1, 2, 6, 7, 9, 11, 14, 15, 18]
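The greedy rule can also be tried directly on the array C from the question. Below is a minimal Python sketch of that rule; the function name `greedy_split` is made up here, and the explicit error for inputs that cannot be split is an addition that the Ruby code above does not have.

```python
def greedy_split(merged):
    """Put each value in the first list if it keeps that list sorted,
    otherwise in the second list."""
    first, second = [], []
    for x in merged:
        if not first or x >= first[-1]:
            first.append(x)
        elif not second or x >= second[-1]:
            second.append(x)
        else:
            # Added check: x fits in neither list, so the input is not a
            # merge of two sorted arrays.
            raise ValueError(f"cannot place {x}")
    return first, second


C = [1, 4, 3, 5, 6, 9, 8, 29, 10, 24, 50, 65]
first, second = greedy_split(C)
print(first)   # [1, 4, 5, 6, 9, 29, 50, 65]
print(second)  # [3, 8, 10, 24]
```

Each element is examined once, so the split runs in O(n) time. As the question allows, the result is not the original A and B.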
Let's take your example.
[1, 4, 3, 5, 6, 9, 8, 29, 10, 24, 50, 65]
In time O(n) you can work out the minimum of the tail.
[1, 3, 3, 5, 6, 8, 8, 10, 10, 24, 50, 65]
And now the one stream is all cases where it is the minimum, and the other is the cases where it isn't.
[1, 3, 5, 6, 8, 10, 24, 50, 65]
[ 4, 9, 29, ]
This is all doable in time O(n).
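The first two steps above (the minimum of the tail, then the split on it) can be sketched in Python as follows. The function name `split_by_suffix_min` is made up for this illustration, and the sketch assumes the values are distinct, as in the example.

```python
def split_by_suffix_min(c):
    """Return (suffix minima, elements equal to their suffix minimum, the rest)."""
    suffix_min = list(c)
    # One right-to-left pass: suffix_min[i] = min(c[i], c[i+1], ..., c[-1]).
    for i in range(len(c) - 2, -1, -1):
        suffix_min[i] = min(c[i], suffix_min[i + 1])
    minima = [x for x, m in zip(c, suffix_min) if x == m]
    rest = [x for x, m in zip(c, suffix_min) if x != m]
    return suffix_min, minima, rest


C = [1, 4, 3, 5, 6, 9, 8, 29, 10, 24, 50, 65]
suffix_min, minima, rest = split_by_suffix_min(C)
print(suffix_min)  # [1, 3, 3, 5, 6, 8, 8, 10, 10, 24, 50, 65]
print(minima)      # [1, 3, 5, 6, 8, 10, 24, 50, 65]
print(rest)        # [4, 9, 29]
```

The stream of minima is always increasing. The other stream is increasing only when the input really is a merge of two sorted arrays, which this sketch does not check.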
We can go further and now split into 3 streams based on which values in the first stream could have gone in the last without changing it being increasing.
[ 3, 5, 6, 8, 10, 24, ]
[1, 5, 6, 8, 50, 65]
[ 4, 9, 29, ]
And now we can start enumerating the 2^6 = 64 different ways of splitting the original stream back into 2 increasing streams.

How to prove this Josephus problem variation is an NP-complete problem?

I have a problem that is a Josephus problem variation. It is described below:
There are m cards numbered from 1 to m, each with a unique number. The cards are dealt to n people who sit in a circle. Note that m >= n.
Then we choose the person "A" sitting at position "p" to leave the circle, just as in the Josephus problem. Next, we skip "k" people to the right of p, where k is the number on the card held by person "A", and we repeat this until only one person is left in the circle.
The question: given n people and m cards, can we choose n cards and assign them to the n people so that, whichever position we start from (excluding the first position), the last survivor is always the first person in the circle?
For example, m = n = 5, the only solution is (4, 1, 5, 3, 2).
I think this problem is NP-complete, but I can't prove it. Does anybody have a good idea for a polynomial-time solution, or a proof that it's NP-hard?
--- example solutions ---
2: [ 1, 2]
2: [ 2, 1]
3: [ 1, 3, 2]
3: [ 3, 1, 2]
4: [ 4, 1, 3, 2]
5: [ 4, 1, 5, 3, 2]
7: [ 5, 7, 3, 1, 6, 4, 2]
9: [ 2, 7, 3, 9, 1, 6, 8, 5, 4]
9: [ 3, 1, 2, 7, 6, 5, 9, 4, 8]
9: [ 3, 5, 1, 8, 9, 6, 7, 4, 2]
9: [ 3, 9, 2, 7, 6, 1, 5, 4, 8]
9: [ 6, 1, 8, 3, 7, 9, 4, 5, 2]
10: [ 3, 5, 6, 10, 1, 9, 8, 7, 4, 2]
10: [ 4, 5, 2, 8, 7, 10, 6, 1, 9, 3]
10: [ 5, 1, 9, 2, 10, 3, 7, 6, 8, 4]
10: [ 6, 3, 1, 10, 9, 8, 7, 4, 5, 2]
10: [ 8, 5, 9, 10, 1, 7, 2, 6, 4, 3]
10: [10, 5, 2, 1, 8, 7, 6, 9, 3, 4]
11: [ 2, 1, 10, 11, 9, 3, 7, 5, 6, 8, 4]
11: [ 3, 7, 11, 10, 9, 8, 1, 6, 5, 4, 2]
11: [ 3, 11, 10, 9, 8, 1, 7, 2, 4, 5, 6]
11: [ 4, 1, 10, 2, 9, 8, 7, 5, 11, 3, 6]
11: [ 4, 2, 7, 11, 5, 1, 10, 9, 6, 3, 8]
11: [ 4, 7, 2, 3, 1, 10, 9, 6, 11, 5, 8]
11: [ 4, 7, 3, 9, 11, 10, 1, 8, 6, 5, 2]
11: [ 4, 11, 7, 2, 1, 10, 9, 6, 5, 3, 8]
11: [ 5, 11, 3, 9, 8, 7, 6, 1, 10, 4, 2]
11: [ 6, 1, 10, 2, 9, 8, 7, 5, 11, 3, 4]
11: [ 6, 2, 7, 11, 5, 1, 10, 9, 4, 3, 8]
11: [ 6, 11, 1, 3, 10, 2, 7, 5, 4, 9, 8]
11: [ 9, 5, 3, 1, 10, 2, 8, 7, 11, 6, 4]
12: [ 1, 7, 11, 10, 4, 9, 2, 12, 6, 5, 8, 3]
12: [ 3, 7, 12, 2, 11, 10, 9, 1, 6, 5, 4, 8]
12: [ 3, 8, 11, 2, 12, 9, 1, 7, 5, 10, 4, 6]
12: [ 4, 2, 5, 1, 11, 10, 9, 8, 12, 7, 3, 6]
12: [ 4, 3, 7, 6, 1, 11, 10, 9, 8, 12, 5, 2]
12: [ 5, 1, 6, 11, 9, 2, 10, 7, 12, 8, 3, 4]
12: [ 5, 2, 3, 12, 9, 10, 7, 6, 1, 11, 4, 8]
12: [ 5, 7, 12, 2, 10, 9, 8, 11, 1, 4, 6, 3]
12: [ 7, 1, 2, 3, 5, 9, 10, 8, 11, 6, 12, 4]
12: [ 8, 7, 1, 11, 9, 3, 5, 10, 6, 4, 12, 2]
12: [ 8, 7, 11, 10, 12, 3, 1, 9, 6, 5, 4, 2]
12: [12, 3, 11, 5, 1, 10, 8, 7, 6, 4, 9, 2]
12: [12, 7, 11, 1, 9, 3, 2, 10, 6, 5, 4, 8]
13: [ 2, 1, 4, 7, 11, 6, 3, 10, 13, 5, 8, 12, 9]
13: [ 2, 5, 13, 12, 4, 11, 3, 1, 9, 7, 8, 6, 10]
13: [ 2, 13, 12, 11, 3, 1, 9, 4, 8, 7, 10, 5, 6]
13: [ 3, 5, 2, 1, 12, 9, 11, 10, 7, 6, 13, 4, 8]
13: [ 3, 5, 13, 1, 11, 2, 9, 8, 7, 12, 6, 4, 10]
13: [ 4, 13, 3, 1, 12, 11, 10, 9, 7, 2, 5, 6, 8]
13: [ 6, 4, 3, 1, 10, 11, 13, 5, 9, 12, 7, 8, 2]
13: [ 6, 4, 13, 7, 5, 1, 12, 11, 10, 9, 8, 3, 2]
13: [ 6, 7, 3, 13, 12, 11, 10, 2, 1, 9, 5, 4, 8]
13: [ 6, 7, 13, 11, 2, 10, 9, 1, 8, 12, 5, 3, 4]
13: [ 6, 11, 7, 13, 1, 10, 2, 12, 9, 8, 5, 4, 3]
13: [ 7, 3, 2, 1, 11, 10, 9, 8, 13, 5, 12, 4, 6]
13: [ 7, 5, 13, 3, 10, 11, 2, 9, 1, 6, 8, 4, 12]
13: [ 7, 5, 13, 3, 11, 2, 9, 8, 1, 6, 12, 4, 10]
13: [ 7, 5, 13, 3, 11, 12, 2, 1, 9, 8, 6, 4, 10]
13: [ 7, 9, 1, 11, 3, 13, 2, 10, 12, 6, 5, 4, 8]
13: [ 8, 3, 5, 11, 13, 9, 10, 7, 1, 6, 4, 12, 2]
13: [ 8, 3, 13, 1, 5, 11, 10, 9, 12, 7, 6, 4, 2]
13: [ 9, 3, 13, 2, 10, 4, 1, 7, 6, 5, 12, 11, 8]
13: [ 9, 4, 7, 5, 1, 11, 13, 10, 12, 8, 6, 3, 2]
13: [ 9, 5, 4, 13, 2, 11, 8, 10, 1, 7, 12, 3, 6]
13: [ 9, 5, 13, 4, 11, 1, 8, 3, 7, 12, 6, 10, 2]
13: [10, 4, 3, 5, 13, 1, 9, 11, 7, 6, 8, 12, 2]
13: [11, 2, 7, 3, 12, 1, 10, 9, 6, 5, 13, 4, 8]
13: [11, 13, 5, 2, 10, 9, 8, 7, 1, 6, 4, 3, 12]
13: [11, 13, 7, 1, 12, 9, 2, 3, 10, 5, 4, 6, 8]
13: [12, 1, 3, 5, 11, 13, 4, 10, 9, 8, 7, 6, 2]
13: [12, 7, 13, 3, 11, 1, 9, 8, 6, 5, 10, 4, 2]
13: [12, 13, 7, 11, 2, 5, 1, 9, 10, 6, 4, 3, 8]
13: [13, 3, 1, 12, 11, 2, 9, 10, 7, 6, 4, 5, 8]
13: [13, 3, 7, 1, 5, 12, 4, 10, 9, 8, 11, 6, 2]
14: [ 3, 5, 13, 14, 1, 12, 11, 10, 9, 8, 7, 6, 4, 2]
14: [ 3, 9, 1, 13, 11, 10, 2, 4, 7, 14, 6, 8, 5, 12]
14: [ 3, 14, 4, 12, 11, 1, 9, 8, 2, 13, 7, 5, 10, 6]
14: [ 4, 11, 1, 13, 7, 10, 12, 2, 14, 9, 8, 5, 6, 3]
14: [ 4, 14, 2, 5, 13, 1, 12, 11, 7, 6, 10, 9, 3, 8]
14: [ 5, 7, 1, 13, 12, 11, 10, 2, 9, 8, 14, 6, 4, 3]
14: [ 6, 3, 14, 5, 11, 13, 2, 12, 9, 1, 7, 4, 8, 10]
14: [ 6, 14, 1, 12, 5, 13, 2, 11, 9, 7, 8, 4, 3, 10]
14: [ 7, 5, 13, 12, 1, 11, 4, 10, 2, 14, 9, 8, 6, 3]
14: [ 7, 11, 5, 13, 1, 3, 2, 4, 10, 9, 14, 6, 8, 12]
14: [ 7, 14, 1, 13, 2, 5, 11, 12, 10, 9, 8, 4, 3, 6]
14: [ 8, 7, 5, 13, 2, 11, 3, 9, 10, 12, 1, 14, 4, 6]
14: [11, 2, 10, 5, 8, 7, 9, 1, 13, 14, 12, 4, 3, 6]
14: [11, 3, 14, 2, 13, 1, 10, 8, 9, 7, 5, 12, 4, 6]
14: [11, 5, 3, 14, 2, 1, 13, 10, 8, 7, 6, 12, 4, 9]
14: [11, 14, 5, 3, 13, 1, 10, 2, 9, 4, 7, 8, 12, 6]
14: [12, 1, 14, 3, 13, 4, 10, 9, 2, 7, 6, 5, 11, 8]
14: [12, 11, 7, 5, 13, 3, 2, 14, 1, 9, 8, 4, 6, 10]
14: [12, 14, 7, 13, 6, 5, 11, 1, 10, 9, 8, 4, 3, 2]
14: [13, 1, 7, 2, 11, 3, 9, 14, 8, 6, 5, 10, 4, 12]
14: [13, 11, 3, 1, 4, 2, 7, 10, 9, 6, 14, 12, 5, 8]
14: [14, 1, 13, 3, 11, 5, 10, 9, 2, 6, 8, 7, 4, 12]
14: [14, 5, 1, 13, 12, 2, 11, 3, 7, 9, 6, 8, 4, 10]
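Solutions like those listed above can be checked with a short simulation. The Python sketch below is an illustration, not the code used to produce the list. Its function names are made up, and it assumes a counting convention inferred from the question's example: after a person holding card k leaves, the k-th remaining person to their right leaves next. The brute-force search is only practical for small n.

```python
from itertools import permutations


def survivor(cards, start):
    """Simulate the game from 0-based position `start`; return the last card left."""
    circle = list(cards)
    idx = start
    while len(circle) > 1:
        k = circle.pop(idx)                # this person leaves, holding card k
        idx = (idx + k - 1) % len(circle)  # k-th remaining person to the right
    return circle[0]


def is_valid(cards):
    """True if the first person survives for every start except position 0."""
    return all(survivor(cards, s) == cards[0] for s in range(1, len(cards)))


def solutions(n):
    """All valid assignments for m = n, found by brute force."""
    return [p for p in permutations(range(1, n + 1)) if is_valid(p)]


print(is_valid([4, 1, 5, 3, 2]))  # True
print(solutions(3))               # [(1, 3, 2), (3, 1, 2)]
```

Cards are unique, so comparing the surviving card with `cards[0]` identifies the first person.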
--- possibly helpful for a mathematical solution ---
I noticed that starting with length 9, at least one solution for every length has a longish sequence of integers that decrement by 1.
9: [3, 1, 2, 7, 6, 5, 9, 4, 8]
10: [6, 3, 1, 10, 9, 8, 7, 4, 5, 2]
11: [3, 7, 11, 10, 9, 8, 1, 6, 5, 4, 2]
11: [3, 11, 10, 9, 8, 1, 7, 2, 4, 5, 6]
11: [5, 11, 3, 9, 8, 7, 6, 1, 10, 4, 2]
12: [4, 2, 5, 1, 11, 10, 9, 8, 12, 7, 3, 6]
12: [4, 3, 7, 6, 1, 11, 10, 9, 8, 12, 5, 2]
13: [6, 4, 13, 7, 5, 1, 12, 11, 10, 9, 8, 3, 2]
14: [3, 5, 13, 14, 1, 12, 11, 10, 9, 8, 7, 6, 4, 2]
I noticed that for every length I tested except the very small, at least one solution contains a relatively long run of descending numbers. So far this answer only considers m = n. Here are a few examples; note that excess is n - run_len:
n = 3, run_len = 2, excess = 1: [1] + [3-2] + []
n = 4, run_len = 2, excess = 2: [4, 1] + [3-2] + []
n = 5, run_len = 2, excess = 3: [4, 1, 5] + [3-2] + []
n = 6, no solution
n = 7, run_len = 1, excess = 6: [5] + [7-7] + [3, 1, 6, 4, 2]
n = 8, no solution
n = 9, run_len = 3, excess = 6: [3, 1, 2] + [7-5] + [9, 4, 8]
n = 10, run_len = 4, excess = 6: [6, 3, 1] + [10-7] + [4, 5, 2]
n = 11, run_len = 4, excess = 7: [3, 7] + [11-8] + [1, 6, 5, 4, 2]
n = 12, run_len = 4, excess = 8: [4, 2, 5, 1] + [11-8] + [12, 7, 3, 6]
n = 13, run_len = 5, excess = 8: [6, 4, 13, 7, 5, 1] + [12-8] + [3, 2]
n = 14, run_len = 7, excess = 7: [3, 5, 13, 14, 1] + [12-6] + [4, 2]
n = 15, run_len = 8, excess = 7: [3, 15, 2] + [13-6] + [1, 5, 4, 14]
n = 16, run_len = 6, excess = 10: [6, 3, 1, 10] + [16-11] + [2, 9, 7, 4, 5, 8]
n = 17, run_len = 8, excess = 9: [2, 5, 17, 15, 14, 1] + [13-6] + [4, 3, 16]
n = 18, run_len = 10, excess = 8: [6, 3, 17, 18, 1] + [16-7] + [5, 4, 2]
n = 19, run_len = 10, excess = 9: [4, 19, 3, 17, 18, 1] + [16-7] + [5, 6, 2]
n = 20, no solution found with run_length >= 10
n = 21, run_len = 14, excess = 7: [3, 21, 2] + [19-6] + [1, 5, 4, 20]
n = 22, run_len = 14, excess = 8: [22, 3, 2, 1] + [20-7] + [5, 21, 4, 6]
n = 23, run_len = 14, excess = 9: [7, 1, 23, 3] + [21-8] + [6, 5, 22, 4, 2]
n = 24, run_len = 16, excess = 8: [6, 5, 24, 2] + [22-7] + [3, 1, 23, 4]
n = 25, run_len = 17, excess = 8: [25, 3, 2, 1] + [23-7] + [5, 24, 4, 6]
n = 26, run_len = 17, excess = 9: [26, 3, 25, 2, 1] + [23-7] + [5, 24, 4, 6]
n = 27, run_len = 20, excess = 7: [3, 27, 2] + [25-6] + [1, 5, 4, 26]
n = 28, run_len = 18, excess = 10: [28, 1, 27, 2, 3] + [25-8] + [6, 5, 7, 4, 26]
n = 29, run_len = 20, excess = 9: [2, 5, 29, 27, 26, 1] + [25-6] + [4, 3, 28]
n = 30, run_len = 23, excess = 7: [30, 5, 2, 1] + [28-6] + [29, 3, 4]
n = 31, run_len = 24, excess = 7: [5, 31, 3] + [29-6] + [1, 30, 4, 2]
n = 32, run_len = 23, excess = 9: [7, 32, 31, 2, 1] + [30-8] + [5, 4, 3, 6]
n = 33, run_len = 26, excess = 7: [3, 33, 2] + [31-6] + [1, 5, 4, 32]
n = 34, run_len = 27, excess = 7: [3, 5, 33, 34, 1] + [32-6] + [4, 2]
n = 35, run_len = 27, excess = 8: [5, 35, 3, 33, 34, 1] + [32-6] + [4, 2]
n = 36, run_len = 26, excess = 10: [35, 7, 3, 1, 36, 2] + [34-9] + [6, 5, 4, 8]
n = 37, run_len = 29, excess = 8: [6, 5, 2, 1] + [35-7] + [36, 37, 3, 4]
n = 38, run_len = 29, excess = 9: [3, 7, 37, 38, 1] + [36-8] + [6, 4, 5, 2]
n = 39, run_len = 32, excess = 7: [3, 39, 2] + [37-6] + [1, 5, 4, 38]
n = 40, run_len = 31, excess = 9: [5, 2, 1] + [38-8] + [3, 7, 40, 4, 6, 39]
n = 41, run_len = 33, excess = 8: [3, 5, 1, 40, 2] + [38-6] + [41, 39, 4]
n = 42, run_len = 33, excess = 9: [42, 3, 41, 2, 1] + [39-7] + [5, 4, 40, 6]
n = 43, run_len = 34, excess = 9: [6, 5, 7, 43, 1] + [41-8] + [42, 4, 3, 2]
n = 44, run_len = 35, excess = 9: [5, 3, 2, 1] + [42-8] + [43, 7, 4, 44, 6]
n = 45, run_len = 38, excess = 7: [3, 45, 2] + [43-6] + [1, 5, 4, 44]
n = 50, run_len = 43, excess = 7: [50, 5, 2, 1] + [48-6] + [49, 3, 4]
n = 100, run_len = 91, excess = 9: [5, 2, 1] + [98-8] + [3, 7, 100, 4, 6, 99]
n = 201, run_len = 194, excess = 7: [3, 201, 2] + [199-6] + [1, 5, 4, 200]
20 is missing from the above table because the run length is at most 10, and is taking a long time to compute. No larger value that I've tested has such a small max run length relative to n.
I found these by checking run lengths from n-1 descending, with all possible starting values and permutations of the run & surrounding elements. This reduces the search space immensely.
For a given n, if the max run in any solution to n is length n-k, then this will find it in O(k! * n). While this looks grim, if k has a constant upper bound (e.g. k <= some threshold for all sufficiently large n) then this is effectively O(n). 'Excess' is what I'm calling k in the examples above. I haven't found any greater than 10, but I don't have a solution yet to n = 20. If it has a solution then its excess will exceed 10.
UPDATE: There are a lot of patterns here.
If n mod 6 is 3 and n >= 9, then [3, n, 2, [n-2, n-3, ..., 6], 1, 5, 4, n-1] is valid.
If n mod 12 is 5 and n >= 17 then [2, 5, n, n-2, n-3, 1, [n-4, n-5, ..., 6], 4, 3, n-1] is valid.
If n mod 20 is 10, then [n, 5, 2, 1, [n-2, n-3, ..., 6], n-1, 3, 4] is valid.
If n mod 60 is 7, 11, 31, or 47, then [5, n, 3, [n-2, n-3, ..., 6], 1, n-1, 4, 2] is valid.
If n mod 60 is 6 or 18 and n >= 18 then [6, 3, n-1, n, 1, [n-2, n-3, ..., 7], 5, 4, 2] is valid.
If n mod 60 is 1, 22, 25 or 52 and n >= 22 then [n, 3, 2, 1, [n-2, n-3, ..., 7], 5, n-1, 4, 6] is valid.
If n mod 60 is 23 then [7, 1, n, 3, [n-2, n-3, ..., 8], 6, 5, n-1, 4, 2] is valid.
If n mod 60 is 14 or 34 then [3, 5, n-1, n, 1, [n-2, n-3, ..., 6], 4, 2] is valid.
If n mod 60 is 24 then [6, 5, n, 2, [n-2, n-3, ..., 7], 3, 1, n-1, 4] is valid.
If n mod 60 is 2, 6, 26, 42 and n >= 26 then [n, 3, n-1, 2, 1, [n-3, n-4, ..., 7], 5, n-2, 4, 6] is valid.
If n mod 60 is 16 or 28 then [n, 1, n-1, 2, 3, [n-3, n-4, ..., 8], 6, 5, 7, 4, n-2] is valid.
If n mod 60 is 32 then [7, n, n-1, 2, 1, [n-2, n-3, ..., 8], 5, 4, 3, 6] is valid.
If n mod 60 is 35 or 47 then [5, n, 3, n-2, n-1, 1, [n-3, n-4, ..., 6], 4, 2] is valid.
If n mod 60 is 37 then [6, 5, 2, 1, [n-2, n-3, ..., 7], n-1, n, 3, 4] is valid.
If n mod 60 is 38 then [3, 7, n-1, n, 1, [n-2, n-3, ..., 8], 6, 4, 5, 2] is valid.
If n mod 60 is 40 then [5, 2, 1, [n-2, n-3, ..., 8], 3, 7, n, 4, 6, n-1] is valid
If n mod 60 is 0 and n >= 60 then [3, 5, n, 2, [n-2, n-3, ..., 7], 1, 6, n-1, 4] is valid
If n mod 60 is 7, 19, or 31 and n >= 19 then [4, n, 3, n-2, n-1, 1, [n-3, n-4, ..., 7], 5, 6, 2] is valid
If n mod 60 is 23, 38, or 43 then [7, 3, n, 1, [n-2, n-3, ..., 8], 6, 5, n-1, 4, 2] is a valid solution
If n mod 60 is 14 or 44 and n >= 74 then [3, 5, n-1, n, 1, [n-3, n-4, ..., 6], n-2, 4, 2] is valid.
If n mod 60 is 1 or 49 and n >= 49 then [3, 5, n, 1, [n-2, n-3, ..., 7], 2, n-1, 4, 6] is valid.
If n mod 60 is 6, 18, 30, 42, or 54 and n >= 18 then [n, 3, n-1, 2, 1, [n-3, n-4, ..., 7], 5, 4, n-2, 6] is valid.
If n mod 60 is 10, 18, 38 or 58 and n >= 18 then [n-1, 7, 5, n, 1, [n-2, n-3, ..., 8], 2, 6, 4, 3] is valid.
Currently solved for n mod 60 is any of the following values:
0, 1, 2, 3, 5, 6, 7, 9,
10, 11, 14, 15, 16, 17, 18, 19,
21, 22, 23, 24, 25, 26, 27, 28, 29,
30, 31, 32, 33, 34, 35, 37, 38, 39,
40, 41, 42, 43, 44, 45, 47, 49,
50, 51, 52, 53, 54, 57, 58
Also,
If n mod 42 is 31 then [n, 3, 2, 1, [n-2, n-3, ..., 8], n-1, 5, 4, 7, 6] is valid.
If n mod 420 is 36 or 396 then [n-1, 7, 3, 1, n, 2, [n-2, n-3, ..., 9], 6, 5, 4, 8] is valid.
--- Example for n=21, using the first pattern listed above, and all starting indices.
1: [21, 2, 18, 19, 16, 17, 14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 5, 4, 20, 1]
2: [ 2, 18, 21, 16, 19, 14, 17, 12, 15, 10, 13, 8, 11, 6, 9, 5, 1, 4, 20, 7]
3: [19, 21, 18, 2, 16, 17, 14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 5, 4, 20, 1]
4: [18, 21, 19, 17, 2, 15, 16, 13, 14, 11, 12, 9, 10, 7, 8, 1, 5, 4, 20, 6]
5: [17, 21, 19, 18, 16, 2, 14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 5, 4, 20, 1]
6: [16, 21, 19, 18, 17, 15, 2, 13, 14, 11, 12, 9, 10, 7, 8, 1, 5, 4, 20, 6]
7: [15, 21, 19, 18, 17, 16, 14, 2, 12, 13, 10, 11, 8, 9, 6, 7, 5, 4, 20, 1]
8: [14, 21, 19, 18, 17, 16, 15, 13, 2, 11, 12, 9, 10, 7, 8, 1, 5, 4, 20, 6]
9: [13, 21, 19, 18, 17, 16, 15, 14, 12, 2, 10, 11, 8, 9, 6, 7, 5, 4, 20, 1]
10: [12, 21, 19, 18, 17, 16, 15, 14, 13, 11, 2, 9, 10, 7, 8, 1, 5, 4, 20, 6]
11: [11, 21, 19, 18, 17, 16, 15, 14, 13, 12, 10, 2, 8, 9, 6, 7, 5, 4, 20, 1]
12: [10, 21, 19, 18, 17, 16, 15, 14, 13, 12, 11, 9, 2, 7, 8, 1, 5, 4, 20, 6]
13: [ 9, 21, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 8, 2, 6, 7, 5, 4, 20, 1]
14: [ 8, 21, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 7, 2, 1, 5, 4, 20, 6]
15: [ 7, 21, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 6, 2, 5, 4, 20, 1]
16: [ 6, 21, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 1, 5, 4, 20, 2]
17: [ 1, 5, 2, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 4, 19, 20, 21]
18: [ 5, 2, 18, 19, 16, 17, 14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 1, 20, 21]
19: [ 4, 2, 18, 19, 16, 17, 14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 5, 20, 21, 1]
20: [20, 4, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 1, 5, 21, 2]
You can observe the same relationship between elements from the decrementing run and other elements for all values of n that the pattern applies to. This isn't a proof, but you can turn this into a proof, though I think the work would need to be done for each pattern separately and it's beyond the scope of what I'm going to spend time on for an S/O question.
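The n = 21 table can be reproduced with a short Python sketch. It builds the first pattern listed above (n mod 6 is 3) and simulates the eliminations. The function names are made up for this illustration, and the counting convention (the k-th remaining person to the right leaves next) is an assumption inferred from the question's example.

```python
def pattern_mod6_is_3(n):
    """[3, n, 2, [n-2, n-3, ..., 6], 1, 5, 4, n-1] for n mod 6 == 3, n >= 9."""
    if n % 6 != 3 or n < 9:
        raise ValueError("pattern applies only when n mod 6 is 3 and n >= 9")
    return [3, n, 2] + list(range(n - 2, 5, -1)) + [1, 5, 4, n - 1]


def elimination_order(cards, start):
    """Return (cards in the order they leave, surviving card)."""
    circle = list(cards)
    idx = start
    order = []
    while len(circle) > 1:
        k = circle.pop(idx)
        order.append(k)
        idx = (idx + k - 1) % len(circle)  # assumed counting convention
    return order, circle[0]


cards = pattern_mod6_is_3(21)
order, last = elimination_order(cards, 1)
print(order)  # [21, 2, 18, 19, 16, 17, 14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 5, 4, 20, 1]
print(last)   # 3, the card held by the first person
```

Starting from index 1 reproduces the first row of the table, and the other rows follow by changing `start`.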
--- We can fill in the blanks by using m > n. ---
The pattern [n-1, n, 1, [n-2, n-3, ..., 3], n+5] is valid for n mod 4 is 1 and n >= 9.
The pattern [n, 2, 1, [n-2, n-3, ..., 3], n+4] is valid for n mod 2 is 0 and n >= 6.
With these two, plus what we already found, we get nearly everything. I found these by checking a single replacement value in a limited range.
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
50, 51, 52, 53, 54, 56, 57, 58
If n mod 30 is 29, then [3, n, 2, [n-2, n-3, ..., 4], n-1, n+15] is valid, giving us n mod 60 is 59. We're left with just one unknown: n mod 60 is 55.
...And finally! If n mod 12 is 7 (i.e. n mod 60 is 7, 19, 31, 43, or 55) then [n-1, n, 1, [n-2, n-3, ..., 6], 2, 5, 3, n+4] is valid for all n >= 19.
We now have solutions for all n mod 60, using m=n in most cases, and m=n+15 in the worst case.

Find Top N Most Frequent Sequence of Numbers in List of a Billion Sequences

Let's say I have the following list of lists:
x = [[1, 2, 3, 4, 5, 6, 7], # sequence 1
[6, 5, 10, 11], # sequence 2
[9, 8, 2, 3, 4, 5], # sequence 3
[12, 12, 6, 5], # sequence 4
[5, 8, 3, 4, 2], # sequence 5
[1, 5], # sequence 6
[2, 8, 8, 3, 5, 9, 1, 4, 12, 5, 6], # sequence 7
[7, 1, 7, 3, 4, 1, 2], # sequence 8
[9, 4, 12, 12, 6, 5, 1], # sequence 9
]
Essentially, for any list that contains the target number 5 (i.e., target=5) anywhere within the list, what are the top N=2 most frequently observed subsequences with length M=4?
So, the conditions are:
if target doesn't exist in the list then we ignore that list completely
if the list length is less than M then we ignore the list completely
if the list is exactly length M but target is not in the Mth position then we ignore it (but we count it if target is in the Mth position)
if the list length, L, is longer than M and target is in the i=M position (or i=M+1 position, or i=M+2 position, ..., i=L position) then we count the subsequence of length M where target is in the final position in the subsequence
So, using our list-of-lists example, we'd count the following subsequences:
subseqs = [[2, 3, 4, 5], # taken from sequence 1
[2, 3, 4, 5], # taken from sequence 3
[12, 12, 6, 5], # taken from sequence 4
[8, 8, 3, 5], # taken from sequence 7
[1, 4, 12, 5], # taken from sequence 7
[12, 12, 6, 5], # taken from sequence 9
]
Of course, what we want are the top N=2 subsequences by frequency. So, [2, 3, 4, 5] and [12, 12, 6, 5] are the top two most frequent sequences by count. If N=3 then all of the subsequences (subseqs) would be returned since there is a tie for third.
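To pin down the counting rules, here is a brute-force Python sketch. It is not an efficient solution for billions of lists. The function names `count_subsequences` and `top_n` are made up for this illustration, and ties at the cutoff are kept, as described for N=3.

```python
from collections import Counter


def count_subsequences(seqs, target, m):
    """Count every length-m window that ends on an occurrence of target."""
    counts = Counter()
    for seq in seqs:
        for end, value in enumerate(seq):
            if value == target and end + 1 >= m:
                counts[tuple(seq[end + 1 - m:end + 1])] += 1
    return counts


def top_n(counts, n):
    """Return the n most frequent windows, including ties with the n-th one."""
    ranked = counts.most_common()
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1][1]
    return [(sub, c) for sub, c in ranked if c >= cutoff]


x = [[1, 2, 3, 4, 5, 6, 7],
     [6, 5, 10, 11],
     [9, 8, 2, 3, 4, 5],
     [12, 12, 6, 5],
     [5, 8, 3, 4, 2],
     [1, 5],
     [2, 8, 8, 3, 5, 9, 1, 4, 12, 5, 6],
     [7, 1, 7, 3, 4, 1, 2],
     [9, 4, 12, 12, 6, 5, 1]]

counts = count_subsequences(x, target=5, m=4)
print(top_n(counts, 2))
```

This makes one pass over all elements per query, so it serves only as a reference for checking faster approaches.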
Important
This is super simplified but, in reality, my actual list-of-sequences
consists of a few billion lists of positive integers (between 1 and 10,000)
each list can be as short as 1 element or as long as 500 elements
N and M can be as small as 1 or as big as 100
My questions are:
Is there an efficient data structure that would allow for fast queries assuming that N and M will always be less than 100?
Are there known algorithms for performing this kind of analysis for various combinations of N and M? I've looked at suffix trees but I'd have to roll my own custom version to even get close to what I need.
For the same dataset, I need to repeatedly query the dataset for various values or different combinations of target, N, and M (where target <= 10,000, N <= 100 and M <= 100). How can I do this efficiently?
Extending on my comment. Here is a sketch how you could approach this using an out-of-the-box suffix array:
1) reverse and concatenate your lists with a stop symbol (I used 0 here).
[7, 6, 5, 4, 3, 2, 1, 0, 11, 10, 5, 6, 0, 5, 4, 3, 2, 8, 9, 0, 5, 6, 12, 12, 0, 2, 4, 3, 8, 5, 0, 5, 1, 0, 6, 5, 12, 4, 1, 9, 5, 3, 8, 8, 2, 0, 2, 1, 4, 3, 7, 1, 7, 0, 1, 5, 6, 12, 12, 4, 9]
2) Build a suffix array
[53, 45, 24, 30, 12, 19, 33, 7, 32, 6, 47, 54, 51, 38, 44, 5, 46, 25, 16, 4, 15, 49, 27, 41, 37, 3, 14, 48, 26, 59, 29, 31, 40, 2, 13, 10, 20, 55, 35, 11, 1, 34, 21, 56, 52, 50, 0, 43, 28, 42, 17, 18, 39, 60, 9, 8, 23, 36, 58, 22, 57]
3) Build the LCP array. The LCP array will tell you how many numbers a suffix has in common with its neighbour in the suffix array. However, you need to stop counting when you encounter a stop symbol
[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 2, 1, 1, 0, 2, 1, 1, 2, 0, 1, 3, 2, 2, 1, 0, 1, 1, 1, 4, 1, 2, 4, 1, 0, 1, 2, 1, 3, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 2, 1, 2, 0]
4) When a query comes in (target = 5, M = 4) you search for the first occurrence of your target in the suffix array and scan the corresponding LCP array until the starting number of the suffixes changes. Below is the part of the LCP array that corresponds to all suffixes starting with 5.
[..., 1, 1, 1, 4, 1, 2, 4, 1, 0, ...]
This tells you that there are two sequences of length 4 that occur two times. Glossing over some details, you can use the indexes to find the sequences and reverse them back to get your final results.
Complexity
Building the suffix array takes O(n) time and O(n) space, where n is the total number of elements in all lists.
Building the LCP array is also O(n) in both time and space.
Searching for a target number in the suffix array is O(log n) on average.
The cost of scanning through the relevant subsequences is linear in the number of times the target occurs, which should be about 1/10000 of all elements on average according to your given parameters.
The first two steps happen offline. Querying is technically O(n) (due to step 4) but with a small constant (0.0001).
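The steps above can be illustrated with a small Python sketch. It deliberately simplifies two things: the suffix array is built by naively sorting suffixes (not in O(n)), and a query counts windows directly with a `Counter` where the answer uses the LCP array. The names `build_index` and `query` are made up for this illustration.

```python
from bisect import bisect_left, bisect_right
from collections import Counter

STOP = 0  # stop symbol, as in the answer; assumes 0 never occurs in the data


def build_index(seqs):
    """Reverse and concatenate the lists, then sort the suffixes naively."""
    text = []
    for n, seq in enumerate(seqs):
        if n:
            text.append(STOP)
        text.extend(reversed(seq))
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    first_symbols = [text[i] for i in suffix_array]  # non-decreasing
    return text, suffix_array, first_symbols


def query(index, target, m):
    """Count length-m subsequences ending in target, via the sorted suffixes."""
    text, suffix_array, first_symbols = index
    lo = bisect_left(first_symbols, target)   # binary search for the block
    hi = bisect_right(first_symbols, target)  # of suffixes starting with target
    counts = Counter()
    for start in suffix_array[lo:hi]:
        window = text[start:start + m]
        if len(window) == m and STOP not in window:
            counts[tuple(reversed(window))] += 1  # undo the reversal
    return counts


x = [[1, 2, 3, 4, 5, 6, 7], [6, 5, 10, 11], [9, 8, 2, 3, 4, 5],
     [12, 12, 6, 5], [5, 8, 3, 4, 2], [1, 5],
     [2, 8, 8, 3, 5, 9, 1, 4, 12, 5, 6], [7, 1, 7, 3, 4, 1, 2],
     [9, 4, 12, 12, 6, 5, 1]]

index = build_index(x)
print(query(index, target=5, m=4).most_common(2))
```

The index is built once and reused, so each query only touches the suffixes that start with the target.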

Linux script using a Hardware (True) Random number generator

I'd like to use the built in hardware random number generator in my RPI3 for a project. Currently I'm only able to use /dev/hwrng to save binary dumps with
dd if=/dev/hwrng of=data.bin bs=25 count=1
What I need for my project is to read 200 bit long data chunks from the random source (/dev/hwrng) with a frequency of 1 reading/second and count the 1's in it and write the result as decimal into a text file with a timestamp, like this:
datetime, value
11/20/2018 12:48:09, 105
11/20/2018 12:48:10, 103
11/20/2018 12:48:11, 97
The decimal number should be always close to 100, since it is a random data source and the expected number of 1's and 0's should be the same.
Any help is appreciated....
I did come up with a Perl script that is close to what I want, so let me share it. I'm sure it could be done in a much cleaner way though...
#!/usr/bin/perl
use strict;
use warnings;
use DateTime;
my @bitcounts = (
0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3,
3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4,
3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 1, 2,
2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5,
3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5,
5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 1, 2, 2, 3,
2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4,
4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 2, 3, 3, 4, 3, 4,
4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6,
5, 6, 6, 7, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5,
5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
);
for (my $i=0; $i <= 10; $i++) {
    system("dd if=/dev/hwrng of=temprnd.bin bs=25 count=1 status=none");
    my $filename = 'temprnd.bin';
    open(my $fh, '<', $filename) or die "$!";
    binmode $fh;
    my $count = 0;
    my $byte = 0;
    while ( read $fh, $byte, 1 ) {
        $count += $bitcounts[ord($byte)];
    }
    my $dt = DateTime->now;
    print join ',', $dt->ymd, $dt->hms,"$count\n";
    system("rm temprnd.bin");
    sleep 1;
}
__END__
Try running the following code
for ((n=0; n<200; ++n)); do echo $(date '+%m/%d/%Y %H:%M:%S'), $(od -vAn -N1 -tu1 < /dev/hwrng); sleep 1; done
If you want to save it to a file, add simple redirect in the end
> somefile
Updating on the new request, try running the following code
for ((n=0; n<10; ++n)); do
    count=0
    for ((s=0; s<200; ++s)); do
        if (( $(od -vAn -N1 -tu1 < /dev/hwrng) > 127 )); then ((++count)); fi
    done
    echo $(date '+%m/%d/%Y %H:%M:%S'), $count
    sleep 1
done
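For comparison, the sampling described in the question (one 25-byte chunk, i.e. 200 bits, per second, with the set bits counted) can be sketched in Python. The function names and the output file name `bits.csv` are made up for this illustration. It assumes `/dev/hwrng` is readable by the current user and returns 25 bytes per read; it has not been tested on a Raspberry Pi.

```python
import time
from datetime import datetime


def count_ones(data: bytes) -> int:
    """Number of 1 bits in a chunk of bytes."""
    return bin(int.from_bytes(data, "big")).count("1")


def format_row(when: datetime, ones: int) -> str:
    """One output line in the 'datetime, value' layout from the question."""
    return f"{when.strftime('%m/%d/%Y %H:%M:%S')}, {ones}"


def log_samples(samples, device="/dev/hwrng", out_path="bits.csv"):
    """Read one 25-byte (200-bit) chunk per second and append a row each time."""
    # buffering=0 so that only the requested 25 bytes are pulled from the device.
    with open(device, "rb", buffering=0) as rng, open(out_path, "a") as out:
        for _ in range(samples):
            chunk = rng.read(25)
            out.write(format_row(datetime.now(), count_ones(chunk)) + "\n")
            out.flush()
            time.sleep(1)
```

Calling `log_samples(10)` would append ten rows. Unlike the loop above, which uses only the top bit of each of 200 bytes, this counts all 200 bits of a single 25-byte read.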

Element-wise maximum value for two lists

Given two Mathematica sets of data such as
data1 = {0, 1, 3, 4, 8, 9, 15, 6, 5, 2, 0};
data2 = {0, 1, 2, 5, 8, 7, 16, 5, 5, 2, 1};
how can I create a set giving me the maximum value of the two lists, i.e. how to obtain
data3 = {0, 1, 3, 5, 8, 9, 16, 6, 5, 2, 1};
?
data1 = {0, 1, 3, 4, 8, 9, 15, 6, 5, 2, 0};
data2 = {0, 1, 2, 5, 8, 7, 16, 5, 5, 2, 1};
Max /@ Transpose[{data1, data2}]
(* {0, 1, 3, 5, 8, 9, 16, 6, 5, 2, 1} *)
Another possible solution is to use the MapThread function:
data3 = MapThread[Max, {data1, data2}]
belisarius's solution, however, is much faster.
Simplest, though not the fastest:
Inner[Max,data1,data2,List]
