Covering the zeros with minimum lines in the Hungarian Method - algorithm

I am trying to follow the steps for covering the zeros with the minimum number of lines in the Hungarian Method, as follows:
Tick all unassigned rows.
If a ticked row has zeros, then tick the corresponding column.
Within a ticked column, if there is an assignment, then tick the corresponding row.
Draw a line through each un-ticked row and each ticked column.
Repeat for each unassigned row.
Then find Theta (the smallest uncovered value).
The problem is that when I do this, I still have uncovered zeros, which makes Theta zero and sends the algorithm into an infinite loop.
For example, take the following matrix (25 by 25):
1 5 5 2 3 1 2 3 2 4 5 2 3 1 5 5 2 3 1 5 1 4 3 2 5
5 5 3 2 3 2 5 1 4 3 2 5 3 2 4 5 2 5 2 1 1 4 1 2 5
5 1 4 3 2 5 1 1 4 1 2 5 2 2 3 4 1 4 5 3 2 4 5 2 5
1 1 4 1 2 5 3 2 4 5 2 5 5 5 1 5 1 5 5 2 2 3 4 1 4
3 2 4 5 2 5 2 2 3 4 1 4 5 4 2 1 3 2 5 5 5 1 5 1 5
2 2 3 4 1 4 5 5 1 5 1 5 5 5 2 5 5 1 4 5 4 2 1 3 2
5 5 1 5 1 5 5 5 3 2 3 2 1 5 5 1 5 1 5 5 5 2 5 5 1
5 4 2 1 3 2 5 1 4 3 2 5 5 5 4 2 1 3 2 5 1 4 3 2 5
5 5 2 5 5 1 1 1 4 1 2 5 1 5 5 2 5 5 1 1 1 4 1 2 5
2 4 5 3 4 2 3 2 4 5 2 5 2 2 4 5 3 4 2 3 2 4 5 2 5
2 2 5 5 1 3 2 2 3 4 1 4 2 2 2 5 5 1 3 2 2 3 4 1 4
4 1 5 4 5 3 5 5 1 5 1 5 5 4 1 5 4 5 3 5 5 1 5 1 5
5 1 4 3 2 5 3 2 4 5 2 5 5 5 1 4 3 2 5 3 2 4 5 2 5
1 1 4 1 2 5 2 2 3 4 1 4 1 1 1 4 1 2 5 2 2 3 4 1 4
3 2 4 5 2 5 5 5 1 5 1 5 4 3 2 4 5 2 5 5 5 1 5 1 5
2 2 3 4 1 4 5 4 2 1 3 2 1 2 2 3 4 1 4 5 4 2 1 3 2
5 5 1 5 1 5 5 5 2 5 5 1 2 5 5 1 5 1 5 5 5 2 5 5 1
5 1 4 3 2 5 3 5 1 4 3 2 5 3 5 2 2 3 5 2 2 3 2 5 3
3 4 1 4 1 1 1 1 1 4 1 2 5 5 1 4 3 2 5 1 4 1 2 5 2
1 5 5 2 3 1 5 3 2 4 5 2 5 1 1 4 1 2 5 2 4 5 2 5 5
5 5 3 2 3 2 2 2 2 3 4 1 4 3 2 4 5 2 5 2 3 4 1 4 3
5 1 4 3 2 5 2 5 5 1 5 1 5 2 2 3 4 1 4 5 1 5 1 5 5
1 1 4 1 2 5 2 5 4 2 1 3 2 5 5 1 5 1 5 4 2 1 3 2 1
3 2 4 5 2 5 1 5 5 2 5 5 1 5 4 2 1 3 2 5 2 5 5 1 3
2 2 3 4 1 4 1 2 4 5 3 4 2 5 5 2 5 5 1 4 5 3 4 2 2
After subtracting the row minima and then the column minima (steps 1 and 2 of the Hungarian method), I get:
0 4 4 1 2 0 1 2 1 3 4 1 2 0 4 4 1 2 0 4 0 3 2 1 4
4 4 2 1 2 1 4 0 3 2 1 4 2 1 3 4 1 4 1 0 0 3 0 1 4
4 0 3 2 1 4 0 0 3 0 1 4 1 1 2 3 0 3 4 2 1 3 4 1 4
0 0 3 0 1 4 2 1 3 4 1 4 4 4 0 4 0 4 4 1 1 2 3 0 3
2 1 3 4 1 4 1 1 2 3 0 3 4 3 1 0 2 1 4 4 4 0 4 0 4
1 1 2 3 0 3 4 4 0 4 0 4 4 4 1 4 4 0 3 4 3 1 0 2 1
4 4 0 4 0 4 4 4 2 1 2 1 0 4 4 0 4 0 4 4 4 1 4 4 0
4 3 1 0 2 1 4 0 3 2 1 4 4 4 3 1 0 2 1 4 0 3 2 1 4
4 4 1 4 4 0 0 0 3 0 1 4 0 4 4 1 4 4 0 0 0 3 0 1 4
0 2 3 1 2 0 1 0 2 3 0 3 0 0 2 3 1 2 0 1 0 2 3 0 3
1 1 4 4 0 2 1 1 2 3 0 3 1 1 1 4 4 0 2 1 1 2 3 0 3
3 0 4 3 4 2 4 4 0 4 0 4 4 3 0 4 3 4 2 4 4 0 4 0 4
4 0 3 2 1 4 2 1 3 4 1 4 4 4 0 3 2 1 4 2 1 3 4 1 4
0 0 3 0 1 4 1 1 2 3 0 3 0 0 0 3 0 1 4 1 1 2 3 0 3
2 1 3 4 1 4 4 4 0 4 0 4 3 2 1 3 4 1 4 4 4 0 4 0 4
1 1 2 3 0 3 4 3 1 0 2 1 0 1 1 2 3 0 3 4 3 1 0 2 1
4 4 0 4 0 4 4 4 1 4 4 0 1 4 4 0 4 0 4 4 4 1 4 4 0
4 0 3 2 1 4 2 4 0 3 2 1 4 2 4 1 1 2 4 1 1 2 1 4 2
2 3 0 3 0 0 0 0 0 3 0 1 4 4 0 3 2 1 4 0 3 0 1 4 1
0 4 4 1 2 0 4 2 1 3 4 1 4 0 0 3 0 1 4 1 3 4 1 4 4
4 4 2 1 2 1 1 1 1 2 3 0 3 2 1 3 4 1 4 1 2 3 0 3 2
4 0 3 2 1 4 1 4 4 0 4 0 4 1 1 2 3 0 3 4 0 4 0 4 4
0 0 3 0 1 4 1 4 3 1 0 2 1 4 4 0 4 0 4 3 1 0 2 1 0
2 1 3 4 1 4 0 4 4 1 4 4 0 4 3 1 0 2 1 4 1 4 4 0 2
1 1 2 3 0 3 0 1 3 4 2 3 1 4 4 1 4 4 0 3 4 2 3 1 1
When we then do the assignment, we get 23 assignments instead of 25, so we cover the zeros using the steps described above, and I get the following:
The bold cells are the ones covered according to the above steps.
Notice that some zeros are still uncovered. One of them is selected as Theta in the next step, which causes the infinite loop.
Please help me.
Thank you in advance
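A hedged note on the ticking steps: by König's theorem, the ticking procedure is only guaranteed to cover every zero with as many lines as there are assignments when the assignment is a maximum matching on the zero cells, and when the row/column ticking is repeated until no new tick appears. The following Python sketch illustrates that idea under those assumptions. The names max_matching and min_line_cover are hypothetical, and the matching step uses a plain augmenting-path search rather than whatever assignment heuristic the question uses.

```python
def max_matching(zero):
    """Augmenting-path (Kuhn) matching on zero cells.

    zero[i][j] is True where the reduced matrix holds a 0.
    Returns match_col, where match_col[j] is the row assigned to column j, or -1.
    """
    n, m = len(zero), len(zero[0])
    match_col = [-1] * m

    def try_row(i, seen):
        # Try to assign row i, re-routing earlier assignments if needed.
        for j in range(m):
            if zero[i][j] and not seen[j]:
                seen[j] = True
                if match_col[j] == -1 or try_row(match_col[j], seen):
                    match_col[j] = i
                    return True
        return False

    for i in range(n):
        try_row(i, [False] * m)
    return match_col


def min_line_cover(matrix):
    """Return (covered_rows, covered_cols) covering every zero of matrix."""
    zero = [[value == 0 for value in row] for row in matrix]
    n, m = len(zero), len(zero[0])
    match_col = max_matching(zero)

    # Step 1: tick all unassigned rows.
    ticked_rows = set(range(n)) - {r for r in match_col if r != -1}
    ticked_cols = set()
    frontier = list(ticked_rows)
    # Steps 2 and 3, repeated until no new tick appears.
    while frontier:
        i = frontier.pop()
        for j in range(m):
            if zero[i][j] and j not in ticked_cols:
                ticked_cols.add(j)            # zero in a ticked row: tick the column
                r = match_col[j]
                if r != -1 and r not in ticked_rows:
                    ticked_rows.add(r)        # assignment in a ticked column: tick its row
                    frontier.append(r)
    # Step 4: lines go through un-ticked rows and ticked columns.
    return set(range(n)) - ticked_rows, ticked_cols
```

For example, on the 3 by 3 matrix [[0, 1, 1], [0, 2, 3], [4, 0, 0]] the sketch draws one line through the last row and one through the first column, which matches the two assignments that are possible there.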

You can use a min-cost max-flow algorithm to solve the problem when each worker may accomplish two tasks.
First, let's see how to solve the standard assignment problem using min-cost max flow. Create a bipartite graph with the workers in one part and the tasks in the other. Put an edge with capacity 1 and cost cost_ij between worker i and task j for all i, j. Then add a source S with an edge of capacity 1 and cost 0 to every worker. Similarly, add a sink T with an edge of capacity 1 and cost 0 from every task. If you then find a min-cost max flow from S to T, its cost is the minimum total assignment cost.
So, if you allow each worker to select two tasks, the edges from the source to the workers should have capacity 2. This modification solves your problem optimally if you disregard the given constraint on the maximum difference.
However, at the moment I do not know a solution that respects the given restriction for every possible input. If your input values have some special structure, mention it in a reply and we can think about special cases of the problem.
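The graph construction described in this answer can be sketched in Python as follows. This is only an illustration under stated assumptions: min_cost_assignment and tasks_per_worker are hypothetical names, the flow is found by successive shortest augmenting paths with a queue-based Bellman-Ford search, and the constraint on the maximum difference is not modelled.

```python
from collections import deque


def min_cost_assignment(cost, tasks_per_worker=1):
    """Return (number of assigned tasks, total cost) via min-cost max flow."""
    n_workers, n_tasks = len(cost), len(cost[0])
    source, sink = 0, n_workers + n_tasks + 1
    n = sink + 1
    # Each edge is [to, remaining capacity, cost, index of the reverse edge].
    graph = [[] for _ in range(n)]

    def add_edge(u, v, cap, c):
        graph[u].append([v, cap, c, len(graph[v])])
        graph[v].append([u, 0, -c, len(graph[u]) - 1])

    for i in range(n_workers):
        add_edge(source, 1 + i, tasks_per_worker, 0)       # S -> worker
        for j in range(n_tasks):
            add_edge(1 + i, 1 + n_workers + j, 1, cost[i][j])  # worker -> task
    for j in range(n_tasks):
        add_edge(1 + n_workers + j, sink, 1, 0)            # task -> T

    flow = total = 0
    while True:
        # Cheapest S -> T path in the residual graph (negative costs allowed).
        dist = [float("inf")] * n
        prev = [None] * n
        in_queue = [False] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            in_queue[u] = False
            for idx, (v, cap, c, _) in enumerate(graph[u]):
                if cap > 0 and dist[u] + c < dist[v]:
                    dist[v] = dist[u] + c
                    prev[v] = (u, idx)
                    if not in_queue[v]:
                        in_queue[v] = True
                        queue.append(v)
        if prev[sink] is None:
            break  # no augmenting path left: the flow is maximum
        # Task -> T edges have capacity 1, so each path carries one unit.
        v = sink
        while v != source:
            u, idx = prev[v]
            graph[u][idx][1] -= 1
            graph[v][graph[u][idx][3]][1] += 1
            v = u
        flow += 1
        total += dist[sink]
    return flow, total
```

With the costs [[1, 1], [5, 5]], the default call assigns one task to each worker, while tasks_per_worker=2 lets the cheaper worker take both tasks.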

Fastest way to transpose large, space delimited text file [duplicate]

This question already has answers here:
An efficient way to transpose a file in Bash
(33 answers)
Closed last month.
I have a large text file that contains space-delimited numbers ranging from 0 to 9. Each line contains 3207 numbers and the file consists of 4611769 lines. I want to transpose this file.
Input example :
9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 
2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 29 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 
2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 
2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 29 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 
2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 9 2 0 2 2 2 2 9 2 2 2 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 9 2 2 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2 9 2 2 2 2 2 9 2 2 1 1 0 2 2 2 2 2 1 2 2 9 2 2 9 2 2 2 2 2 2 1 2 2 9 2 2 2 2 2 2 9 2 1 1 2 9 2 2 9 2 2 2 2 2 1 2 2 2 9 2 2 2 2 9 9 2 2 2 2 2 2 2 2 2 2 2 9 2 9 2 2 2 2 2 9 2 2 1 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
...
I already tried this awk-solution : awk '{for (i=1; i<=NF; i++) a[i]=a[i](NR!=1?FS:"")$i} END {for (i=1; i in a; i++) print a[i]}' which I found here.
I chose an awk-solution due to this similar question where one user already benchmarked different solutions.
This operation has now been running for more than 24 hours, and I'm curious whether there is any other way, in any language, to achieve the same result in less time.
Q: What is the fastest way to transpose such a file?
EDIT I: The large number of possible answers in this similar question is an argument for not treating this question as a duplicate. The simple datamash answer suggested in the comments should help users less experienced with bash find the answer more easily.
As mentioned in the comments by @kvantour and @Inian, datamash seems to be the way to go. This one-liner should solve the problem:
datamash transpose -t ' ' < input.txt > output.txt
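To make the operation itself concrete, here is a minimal Python sketch of what transposing a space-delimited file means. It is not a replacement for datamash: transpose_lines is a hypothetical helper that holds all rows in memory, so it only illustrates the transformation on small inputs and would likely not cope with a file of the size described in the question.

```python
def transpose_lines(lines, sep=" "):
    """Transpose whitespace-delimited rows; assumes every row has the same number of fields."""
    rows = [line.split() for line in lines if line.strip()]
    # zip(*rows) pairs up the i-th field of every row, i.e. yields the columns.
    return [sep.join(column) for column in zip(*rows)]
```

Calling transpose_lines(["1 2 3", "4 5 6"]) turns two rows of three fields into three rows of two fields.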

Generation of a counter variable for episodes in panel data in stata [duplicate]

This question already has an answer here:
Calculating consecutive ones
(1 answer)
Closed 1 year ago.
I am trying to generate a counter variable that describes the duration of a temporal episode in panel data.
I am using long format data that looks something like this:
clear
input byte id int time byte var1 int aim1
1 1 0 .
1 2 0 .
1 3 1 1
1 4 1 2
1 5 0 .
1 6 0 .
1 7 0 .
2 1 0 .
2 2 1 1
2 3 1 2
2 4 1 3
2 5 0 .
2 6 1 1
2 7 1 2
end
I want to generate a variable like aim1 that starts with a value of 1 when var1==1, and counts up one unit with each subsequent observation per ID where var1 is still equal to 1. For each observation where var1!=1, aim1 should contain missing values.
I already tried using rangestat (count) to solve the problem, however the created variable does not restart the count with each episode:
ssc install rangestat
gen var2=1 if var1==1
rangestat (count) aim2=var2, interval(time -7 0) by (id)
Here are two ways to do it: (1) from first principles, but see this paper for more and (2) using tsspell from SSC.
clear
input byte id int time byte var1 int aim1
1 1 0 .
1 2 0 .
1 3 1 1
1 4 1 2
1 5 0 .
1 6 0 .
1 7 0 .
2 1 0 .
2 2 1 1
2 3 1 2
2 4 1 3
2 5 0 .
2 6 1 1
2 7 1 2
end
bysort id (time) : gen wanted = 1 if var1 == 1 & var1[_n-1] != 1
by id: replace wanted = wanted[_n-1] + 1 if var1 == 1 & missing(wanted)
tsset id time
ssc inst tsspell
tsspell, cond(var1 == 1)
list, sepby(id _spell)
     +---------------------------------------------------------+
     | id   time   var1   aim1   wanted   _seq   _spell   _end |
     |---------------------------------------------------------|
  1. |  1      1      0      .        .      0        0      0 |
  2. |  1      2      0      .        .      0        0      0 |
     |---------------------------------------------------------|
  3. |  1      3      1      1        1      1        1      0 |
  4. |  1      4      1      2        2      2        1      1 |
     |---------------------------------------------------------|
  5. |  1      5      0      .        .      0        0      0 |
  6. |  1      6      0      .        .      0        0      0 |
  7. |  1      7      0      .        .      0        0      0 |
     |---------------------------------------------------------|
  8. |  2      1      0      .        .      0        0      0 |
     |---------------------------------------------------------|
  9. |  2      2      1      1        1      1        1      0 |
 10. |  2      3      1      2        2      2        1      0 |
 11. |  2      4      1      3        3      3        1      1 |
     |---------------------------------------------------------|
 12. |  2      5      0      .        .      0        0      0 |
     |---------------------------------------------------------|
 13. |  2      6      1      1        1      1        2      0 |
 14. |  2      7      1      2        2      2        2      1 |
     +---------------------------------------------------------+
The approach of tsspell is very close to what you ask for, except that (a) its counter (by default _seq) is 0 when out of spell, but replace _seq = . if _seq == 0 gets what you ask for, and (b) its auxiliary variables (by default _spell and _end) are useful in many problems. You must install tsspell with ssc install tsspell before you can use it.
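For readers who do not use Stata, the first-principles logic (start at 1 when a spell begins, add 1 while it continues, missing otherwise, restart for each id) can be sketched in Python. This is an illustration only: spell_counter is a hypothetical name, records are assumed to be (id, time, var1) tuples, and None stands in for Stata's missing value.

```python
def spell_counter(records):
    """Map (id, time) to the running length of the current var1 == 1 spell, or None."""
    result = {}
    prev_id, run = None, 0
    # Sorting the tuples orders them by id, then time, like bysort id (time).
    for pid, time, var1 in sorted(records):
        if pid != prev_id:
            run = 0              # a new panel unit never continues an old spell
            prev_id = pid
        run = run + 1 if var1 == 1 else 0
        result[(pid, time)] = run if run else None
    return result
```

Applied to the example data, id 2 gets the values missing, 1, 2, 3, missing, 1, 2, matching aim1.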

Mysterious: Elif and Counter Increment Bug

Hi, I am trying to run a for loop with an increment counter that switches into an elif statement. The for loop builds a string of syllables to synthesize with macintalk. I would like to add a short silence every 20ms, but I can't seem to get it to work. I've tried a bunch of debugging steps, but none of them helped. Can anyone spot the bug that prevents the elif from being accessed?
EDIT
OK, so I followed the suggestion below and used -eq instead of =, but I noticed that the counter only resets once and the conditional statement is not reached a second time. The revised code is posted below:
counter=0;
for k in $indx
do
    counter=$(($counter + 1));
    echo 'increment counter'
    echo $counter
    if [ $k -eq 0 ]
    then
        stream=$stream'#_'${syllarray[k]}
    elif [ $counter -eq 20 ]
    then
        echo Adding Silence after syllable: ${syllarray[k]}
        stream=$stream'_'${syllarray[k]}'[[ slnc 20 ]]'
        counter=0;
        echo 'reset counter'
        echo $counter
    else
        stream=$stream'_'${syllarray[k]}
    fi
done
Sample output:
Synthesize A Syllable Stream with Predetermined Lexicon, Word Order and Phonology
Parameters:
Voice -- Victoria
Rate (words/min) -- 120
Pitch Modulation Interval -- 0
Baseline pitch -- 55
Directory Exists :)
Opening Syllable Transcription for Victoria
0
bIY
1
bUW
2
dAE
3
dOW
4
gOW
5
kUW
6
lAE
7
pAE
8
pIY
9
rOW
10
tIY
11
tUW
Counter Balanced Stimulus Order (Indexed by Syllables in Alphabetical Order)
11 2 9 8 4 6 0 5 10 11 2 9 8 4 6 1 3 7 8 4 6 0 5 10 11 2 9 1 3 7 0 5 10 8 4 6 1 3 7 0 5 10 8 4 6 11 2 9 0 5 10 1 3 7 8 4 6 11 2 9 1 3 7 11 2 9 0 5 10 1 3 7 8 4 6 0 5 10 11 2 9 8 4 6 1 3 7 0 5 10 8 4 6 11 2 9 0 5 10 8 4 6 11 2 9 0 5 10 1 3 7 8 4 6 1 3 7 0 5 10 11 2 9 1 3 7 8 4 6 0 5 10 1 3 7 11 2 9 1 3 7 11 2 9 1 3 7 0 5 10 8 4 6 1 3 7 0 5 10 11 2 9 0 5 10 11 2 9 8 4 6 11 2 9 1 3 7 8 4 6 0 5 10 1 3 7 11 2 9 8 4 6 0 5 10 8 4 6 1 3 7 11 2 9 0 5 10 1 3 7 8 4 6 11 2 9 8 4 6 1 3 7 11 2 9 0 5 10 8 4 6 11 2 9 0 5 10 8 4 6 11 2 9 1 3 7 0 5 10 1 3 7 8 4 6 0 5 10 1 3 7 0 5 10 11 2 9 1 3 7 11 2 9 8 4 6 0 5 10 11 2 9 8 4 6 1 3 7
Creating counterbalanced stimulus stream string with proper Macintalk formatting
increment counter
1
increment counter
2
increment counter
3
increment counter
4
increment counter
5
increment counter
6
increment counter
7
increment counter
8
increment counter
9
increment counter
10
increment counter
11
increment counter
12
increment counter
13
increment counter
14
increment counter
15
increment counter
16
increment counter
17
increment counter
18
increment counter
19
increment counter
20
Adding Silence after syllable: gOW
reset counter
0
increment counter
1
..
[truncated for clarity]
..
increment counter
268
Printing Stream to Screen
_tUW_dAE_rOW_pIY_gOW_lAE#_bIY_kUW_tIY_tUW_dAE_rOW_pIY_gOW_lAE_bUW_dOW_pAE_pIY_gOW[[ slnc 20 ]]_lAE#_bIY_kUW_tIY_tUW_dAE_rOW_bUW_dOW_pAE#_bIY_kUW_tIY_pIY_gOW_lAE_bUW_dOW_pAE#_bIY_kUW_tIY_pIY_gOW_lAE_tUW_dAE_rOW#_bIY_kUW_tIY_bUW_dOW_pAE_pIY_gOW_lAE_tUW_dAE_rOW_bUW_dOW_pAE_tUW_dAE_rOW#_bIY_kUW_tIY_bUW_dOW_pAE_pIY_gOW_lAE#_bIY_kUW_tIY_tUW_dAE_rOW_pIY_gOW_lAE_bUW_dOW_pAE#_bIY_kUW_tIY_pIY_gOW_lAE_tUW_dAE_rOW#_bIY_kUW_tIY_pIY_gOW_lAE_tUW_dAE_rOW#_bIY_kUW_tIY_bUW_dOW_pAE_pIY_gOW_lAE_bUW_dOW_pAE#_bIY_kUW_tIY_tUW_dAE_rOW_bUW_dOW_pAE_pIY_gOW_lAE#_bIY_kUW_tIY_bUW_dOW_pAE_tUW_dAE_rOW_bUW_dOW_pAE_tUW_dAE_rOW_bUW_dOW_pAE#_bIY_kUW_tIY_pIY_gOW_lAE_bUW_dOW_pAE#_bIY_kUW_tIY_tUW_dAE_rOW#_bIY_kUW_tIY_tUW_dAE_rOW_pIY_gOW_lAE_tUW_dAE_rOW_bUW_dOW_pAE_pIY_gOW_lAE#_bIY_kUW_tIY_bUW_dOW_pAE_tUW_dAE_rOW_pIY_gOW_lAE#_bIY_kUW_tIY_pIY_gOW_lAE_bUW_dOW_pAE_tUW_dAE_rOW#_bIY_kUW_tIY_bUW_dOW_pAE_pIY_gOW_lAE_tUW_dAE_rOW_pIY_gOW_lAE_bUW_dOW_pAE_tUW_dAE_rOW#_bIY_kUW_tIY_pIY_gOW_lAE_tUW_dAE_rOW#_bIY_kUW_tIY_pIY_gOW_lAE_tUW_dAE_rOW_bUW_dOW_pAE#_bIY_kUW_tIY_bUW_dOW_pAE_pIY_gOW_lAE#_bIY_kUW_tIY_bUW_dOW_pAE#_bIY_kUW_tIY_tUW_dAE_rOW_bUW_dOW_pAE_tUW_dAE_rOW_pIY_gOW_lAE#_bIY_kUW_tIY_tUW_dAE_rOW_pIY_gOW_lAE_bUW_dOW_pAE
Saving Synthesized Stream
Writing to: ./synthesis/stream-Victoria/stream.flac
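A possible reading of the sample output: counting through the stimulus order, the 40th index is 0, so on the iteration where the counter reaches 20 for the second time the first branch (k equal to 0) runs and the elif is never evaluated. The counter then moves past 20, and a test for exactly 20 cannot match again. One way around this is to make the silence an independent check instead of an elif branch. The sketch below shows that control flow in Python rather than shell; build_stream and its parameters are hypothetical names, not part of the original script.

```python
def build_stream(indices, syllables, every=20):
    """Join syllables, adding a silence marker after every `every` syllables."""
    stream = ""
    counter = 0
    for k in indices:
        counter += 1
        # The prefix depends only on k, as in the original if/else branches.
        prefix = "#_" if k == 0 else "_"
        stream += prefix + syllables[k]
        # The silence check is separate, so it also runs when k == 0.
        if counter == every:
            stream += "[[ slnc 20 ]]"
            counter = 0
    return stream
```

In the shell script the same effect could be had by closing the first if with fi and testing the counter in a second, separate if.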

Shell command to reverse each line of file

I have a file test.txt with the following text
1 2 3 4
3 4 5 6
8 7 3 2
I want to save it as
4 3 2 1
6 5 4 3
2 3 7 8
Is there any shell command which does that?
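rev will do the job here. It reverses the characters of each line, which is the same as reversing the fields only because every field in the example is a single character: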
rev file
4 3 2 1
6 5 4 3
2 3 7 8
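If the fields could be longer than one character, reversing characters and reversing fields give different results. A small Python sketch of the difference, where reverse_chars and reverse_fields are hypothetical helper names:

```python
def reverse_chars(line):
    """Reverse a line character by character, as rev does."""
    return line[::-1]


def reverse_fields(line):
    """Reverse the order of whitespace-separated fields, keeping each field intact."""
    return " ".join(reversed(line.split()))
```

For the line "1 2 3 4" both return "4 3 2 1", but for "10 2 3" only reverse_fields keeps the 10 intact.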

Filling in gaps with awk or anything

I have a list such as the one below, where the first column is the position and the other columns aren't important for this question.
1 1 2 3 4 5
2 1 2 3 4 5
5 1 2 3 4 5
8 1 2 3 4 5
9 1 2 3 4 5
10 1 2 3 4 5
11 1 2 3 4 5
I want to fill in the gaps such that the list is continuous and it reads
1 1 2 3 4 5
2 1 2 3 4 5
3 0 0 0 0 0
4 0 0 0 0 0
5 1 2 3 4 5
6 0 0 0 0 0
7 0 0 0 0 0
8 1 2 3 4 5
9 1 2 3 4 5
10 1 2 3 4 5
11 1 2 3 4 5
I am familiar with awk and shell scripts, but whatever way it can be done is fine with me.
Thanks for any help..
This one-liner may work for you:
awk '$1>++p{for(;p<$1;p++)print p" 0 0 0 0 0"}1' file
with your example:
kent$ echo '1 1 2 3 4 5
2 1 2 3 4 5
5 1 2 3 4 5
8 1 2 3 4 5
9 1 2 3 4 5
10 1 2 3 4 5
11 1 2 3 4 5'|awk '$1>++p{for(;p<$1;p++)print p" 0 0 0 0 0"}1'
1 1 2 3 4 5
2 1 2 3 4 5
3 0 0 0 0 0
4 0 0 0 0 0
5 1 2 3 4 5
6 0 0 0 0 0
7 0 0 0 0 0
8 1 2 3 4 5
9 1 2 3 4 5
10 1 2 3 4 5
11 1 2 3 4 5
You can use the following awk one-liner:
awk '{b=a;a=$1;while(a>(b++)+1){print b,"0 0 0 0 0"}}1' input.file
Tested with here-doc input:
awk '{b=a;a=$1;while(a>(b++)+1){print b,"0 0 0 0 0"}}1' <<EOF
1 1 2 3 4 5
2 1 2 3 4 5
5 1 2 3 4 5
8 1 2 3 4 5
9 1 2 3 4 5
10 1 2 3 4 5
11 1 2 3 4 5
EOF
the output is as follows:
1 1 2 3 4 5
2 1 2 3 4 5
3 0 0 0 0 0
4 0 0 0 0 0
5 1 2 3 4 5
6 0 0 0 0 0
7 0 0 0 0 0
8 1 2 3 4 5
9 1 2 3 4 5
10 1 2 3 4 5
11 1 2 3 4 5
Explanation:
On every input line, b is set to the previous value of a (the first column of the previous line), and a is then set to the first column of the current line. Because of the order in which b and a are assigned, b can be used in a while loop that runs as long as b < a-1 and inserts the missing lines, filled up with zeros. The 1 at the end of the script finally prints the input line itself.
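The same loop can be written out in Python for comparison. This is a sketch under assumptions: fill_gaps is a hypothetical name, positions are assumed to be ascending integers in the first field, and, like the awk one-liners, it starts filling from position 1.

```python
def fill_gaps(lines, filler="0 0 0 0 0"):
    """Insert a filler line for every position missing from the first column."""
    out = []
    expected = 1  # next position we expect to see
    for line in lines:
        pos = int(line.split()[0])
        # Emit one filler line per missing position before the current one.
        while expected < pos:
            out.append(f"{expected} {filler}")
            expected += 1
        out.append(line)
        expected = pos + 1
    return out
```

Passing the seven lines of the example input returns the eleven lines of the desired output.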
This is only for fun:
join -a2 FILE <(seq -f "%g 0 0 0 0 0" $(tail -1 FILE | cut -d' ' -f1)) | cut -d' ' -f -6
produces:
1 1 2 3 4 5
2 1 2 3 4 5
3 0 0 0 0 0
4 0 0 0 0 0
5 1 2 3 4 5
6 0 0 0 0 0
7 0 0 0 0 0
8 1 2 3 4 5
9 1 2 3 4 5
10 1 2 3 4 5
11 1 2 3 4 5
Here is another way:
awk '{x=$1-b;while(x-->1){print ++b,"0 0 0 0 0"};b=$1}1' file
Test:
$ cat file
1 1 2 3 4 5
2 1 2 3 4 5
5 1 2 3 4 5
8 1 2 3 4 5
9 1 2 3 4 5
10 1 2 3 4 5
11 1 2 3 4 5
$ awk '{x=$1-b;while(x-->1){print ++b,"0 0 0 0 0"};b=$1}1' file
1 1 2 3 4 5
2 1 2 3 4 5
3 0 0 0 0 0
4 0 0 0 0 0
5 1 2 3 4 5
6 0 0 0 0 0
7 0 0 0 0 0
8 1 2 3 4 5
9 1 2 3 4 5
10 1 2 3 4 5
11 1 2 3 4 5
