I have two text files (tsv format), which each have 240 columns and 100 lines. I would like to sort the columns alternately and make one file (480 columns and 100 lines). How could I achieve this goal with standard command line tools in Linux?
Example (in case of a single line) :
FileA:
1 2 3 4 5 ・・・
FileB:
001 002 003 004 005 ・・・
Expected Result:
1 001 2 002 3 003 ・・・
just awk with "getline"
==> file1 <==
a b c d e f g h i j k l m
n o p q r s t u v w x y z
==> file2 <==
1 2 3 4 5 6 7 8 9 10 11 12 13
14 15 16 17 18 19 20 21 22 23 24 25 26
$ awk '{split($0,f1);
getline < "file2";
for(i=1;i<=NF;i++) printf "%s%s%s%s", f1[i], OFS, $i, (i==NF?ORS:OFS)}' file1
a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 10 k 11 l 12 m 13
n 14 o 15 p 16 q 17 r 18 s 19 t 20 u 21 v 22 w 23 x 24 y 25 z 26
if space is not the required output delimiter set OFS accordingly...
ps. getline use is normally discouraged for any non-trivial script, and usually should be avoided by beginners. See here for example for more explanation.
paste + awk solution:
Sample file1:
a b c d e f g h i j k l m n o p q r s t u v w x y z
a b c d e f g h i j k l m n o p q r s t u v w x y z
Sample file2:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
paste file1 file2 \
| awk '{ len=NF/2;
for (i=1; i<=len; i++)
printf "%s %s%s", $i, $(i+len),(i==len? ORS:OFS)
}'
The output:
a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 10 k 11 l 12 m 13 n 14 o 15 p 16 q 17 r 18 s 19 t 20 u 21 v 22 w 23 x 24 y 25 z 26
a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 10 k 11 l 12 m 13 n 14 o 15 p 16 q 17 r 18 s 19 t 20 u 21 v 22 w 23 x 24 y 25 z 26
Use bash to make some dummy files that match the spec, along with some letter-line suffixes to tell them apart:
for f in {A..z} {A..j} ; do echo $( seq -f '%g'"$f" 240 ) ; done > FileA
for f in {z..A} {j..A} ; do echo $( seq -f '%03.3g'"$f" 240 ) ; done > FileB
Use bash, paste and xargs:
paste -d' ' <(tr ' ' '\n' < FileA) <(tr ' ' '\n' < FileB) | xargs -L 240 echo
Since the output of that is a bit unweildy, show first ten lines, with both the first and last six columns:
paste -d' ' <(tr ' ' '\n' < FileA) <(tr ' ' '\n' < FileB) | xargs -L 240 echo |
head | cut -d' ' -f1-6,476-480
1A 001z 2A 002z 3A 003z 238z 239A 239z 240A 240z
1B 001y 2B 002y 3B 003y 238y 239B 239y 240B 240y
1C 001x 2C 002x 3C 003x 238x 239C 239x 240C 240x
1D 001w 2D 002w 3D 003w 238w 239D 239w 240D 240w
1E 001v 2E 002v 3E 003v 238v 239E 239v 240E 240v
1F 001u 2F 002u 3F 003u 238u 239F 239u 240F 240u
1G 001t 2G 002t 3G 003t 238t 239G 239t 240G 240t
1H 001s 2H 002s 3H 003s 238s 239H 239s 240H 240s
1I 001r 2I 002r 3I 003r 238r 239I 239r 240I 240r
1J 001q 2J 002q 3J 003q 238q 239J 239q 240J 240q
So i'm issuing a query to mysql and it's returning say 1,000 rows,but each iteration of the program could return a different number of rows. I need to break up (without using a mysql limit) this result set into chunks of 100 rows that i can then programatically iterate through in these 100 row chunks.
So
MySQLOutPut='1 2 3 4 ... 10,000"
I need to turn that into an array that looks like
array[1]="1 2 3 ... 100"
array[2]="101 102 103 ... 200"
etc.
I have no clue how to accomplish this elegantly
Using Charles' data generation:
MySQLOutput=$(seq 1 10000 | tr '\n' ' ')
# the sed command will add a newline after every 100 words
# and the mapfile command will read the lines into an array
mapfile -t MySQLOutSplit < <(
sed -r 's/([^[:blank:]]+ ){100}/&\n/g; $s/\n$//' <<< "$MySQLOutput"
)
echo "${#MySQLOutSplit[#]}"
# 100
echo "${MySQLOutSplit[0]}"
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
echo "${MySQLOutSplit[99]}"
# 9901 9902 9903 9904 9905 9906 9907 9908 9909 9910 9911 9912 9913 9914 9915 9916 9917 9918 9919 9920 9921 9922 9923 9924 9925 9926 9927 9928 9929 9930 9931 9932 9933 9934 9935 9936 9937 9938 9939 9940 9941 9942 9943 9944 9945 9946 9947 9948 9949 9950 9951 9952 9953 9954 9955 9956 9957 9958 9959 9960 9961 9962 9963 9964 9965 9966 9967 9968 9969 9970 9971 9972 9973 9974 9975 9976 9977 9978 9979 9980 9981 9982 9983 9984 9985 9986 9987 9988 9989 9990 9991 9992 9993 9994 9995 9996 9997 9998 9999 10000
Something like this:
# generate content
MySQLOutput=$(seq 1 10000 | tr '\n' ' ') # seq is awful, don't use in real life
# split into a large array, each item stored individually
read -r -a MySQLoutArr <<<"$MySQLOutput"
# add each batch of 100 items into a new array entry
batchSize=100
MySQLoutSplit=( )
for ((i=0; i<${#MySQLoutArr[#]}; i+=batchSize)); do
MySQLoutSplit+=( "${MySQLoutArr[*]:i:batchSize}" )
done
To explain some of the finer points:
read -r -a foo reads contents into an array named foo, split on IFS, up to the next character specified by read -d (none given here, thus reading only a single line). If you wanted each line to be a new array entry, consider IFS=$'\n' read -r -d '' -a foo, which will read each line into an array, terminated at the first NUL in the input stream.
"${foo[*]:i:batchSize}" expands to a list of items in array foo, starting at index i, and taking the next batchSize items, concatenated into a single string with the first character in $IFS used as a separator.
I've managed to extract data (from an html page) that goes into a table, and I've isolated the columns of said table into a text file that contains the lines below:
[30,30,32,35,34,43,52,68,88,97,105,107,107,105,101,93,88,80,69,55],
[28,6,6,50,58,56,64,87,99,110,116,119,120,117,114,113,103,82,6,47],
[-7,,,43,71,30,23,28,13,13,10,11,12,11,13,22,17,3,,-15,-20,,38,71],
[0,,,3,5,1.5,1,1.5,0.5,0.5,0,0.5,0.5,0.5,0.5,1,0.5,0,-0.5,-0.5,2.5]
Each bracketed list of numbers represents a column. What I'd like to do is turn these lists into actual columns that I can work with in different data formats. I'd also like to be sure to include that blank parts of these lists too (i.e., "[,,,]")
This is basically what I'm trying to accomplish:
30 28 -7 0
30 6
32 6
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
. . . .
. . . .
. . . .
I'm parsing data from a web page, and ultimately planning to make the process as automated as possible so I can easily work with the data after I output it to a nice format.
Anyone know how to do this, have any suggestions, or thoughts on scripting this?
Since you have your lists in python, just do it in python:
l=[["30", "30", "32"], ["28","6","6"], ["-7", "", ""], ["0", "", ""]]
for i in zip(*l):
print "\t".join(i)
produces
30 28 -7 0
30 6
32 6
awk based solution:
awk -F, '{gsub(/\[|\]/, ""); for (i=1; i<=NF; i++) a[i]=a[i] ? a[i] OFS $i: $i}
END {for (i=1; i<=NF; i++) print a[i]}' file
30 28 -7 0
30 6
32 6
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
..........
..........
Another solution, but it works only for file with 4 lines:
$ paste \
<(sed -n '1{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
<(sed -n '2{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
<(sed -n '3{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
<(sed -n '4{s,\[,,g;s,\],,g;s|,|\n|g;p}' t)
30 28 -7 0
30 6
32 6
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
68 87 28 1.5
88 99 13 0.5
97 110 13 0.5
105 116 10 0
107 119 11 0.5
107 120 12 0.5
105 117 11 0.5
101 114 13 0.5
93 113 22 1
88 103 17 0.5
80 82 3 0
69 6 -0.5
55 47 -15 -0.5
-20 2.5
38
71
Updated: or another version with preprocessing:
$ sed 's|\[||;s|\][,]\?||' t >t2
$ paste \
<(sed -n '1{s|,|\n|g;p}' t2) \
<(sed -n '2{s|,|\n|g;p}' t2) \
<(sed -n '3{s|,|\n|g;p}' t2) \
<(sed -n '4{s|,|\n|g;p}' t2)
If a file named data contains the data given in the problem (exactly as defined above), then the following bash command line will produce the output requested:
$ sed -e 's/\[//' -e 's/\]//' -e 's/,/ /g' <data | rs -T
Example:
cat data
[30,30,32,35,34,43,52,68,88,97,105,107,107,105,101,93,88,80,69,55],
[28,6,6,50,58,56,64,87,99,110,116,119,120,117,114,113,103,82,6,47],
[-7,,,43,71,30,23,28,13,13,10,11,12,11,13,22,17,3,,-15,-20,,38,71],
[0,,,3,5,1.5,1,1.5,0.5,0.5,0,0.5,0.5,0.5,0.5,1,0.5,0,-0.5,-0.5,2.5]
$ sed -e 's/[//' -e 's/]//' -e 's/,/ /g' <data | rs -T
30 28 -7 0
30 6 43 3
32 6 71 5
35 50 30 1.5
34 58 23 1
43 56 28 1.5
52 64 13 0.5
68 87 13 0.5
88 99 10 0
97 110 11 0.5
105 116 12 0.5
107 119 11 0.5
107 120 13 0.5
105 117 22 1
101 114 17 0.5
93 113 3 0
88 103 -15 -0.5
80 82 -20 -0.5
69 6 38 2.5
55 47 71