I am trying to transpose a table (10K rows x 10K columns) using the following script.
A simple data example
$ cat rm1
t1 t2 t3
n1 1 2 3
n2 2 3 44
n3 1 1 1
$ sh transpose.sh rm1
n1 n2 n3
t1 1 2 1
t2 2 3 1
t3 3 44 1
However, I am getting a memory error. Any help would be appreciated.
awk -F "\t" '{
for (f = 1; f <= NF; f++)
a[NR, f] = $f
}
NF > nf { nf = NF }
END {
for (f = 1; f <= nf; f++)
for (r = 1; r <= NR; r++)
printf a[r, f] (r==NR ? RS : FS)
}'
Error
awk: cmd. line:2: (FILENAME=input FNR=12658) fatal: dupnode: r->stptr: can't allocate 10 bytes of memory (Cannot allocate memory)
Here's one way to do it in chunks, as I mentioned in my comments. Here I show the mechanics on a tiny 12r x 10c file, but I also ran a chunk of 1000 rows on a 10K x 10K file in not much more than a minute (Mac Powerbook).
EDIT: The following was updated to handle an M x N matrix with unequal numbers of rows and columns. The previous version only worked for an N x N matrix.
$ cat et.awk
BEGIN {
start = chunk_start
limit = chunk_start + chunk_size - 1
}
{
n = (limit > NF) ? NF : limit
for (f = start; f <= n; f++) {
a[NR, f] = $f
}
}
END {
n = (limit > NF) ? NF : limit
for (f = start; f <= n; f++)
for (r = 1; r <= NR; r++)
printf "%s%s", a[r, f], (r==NR ? RS : FS)
}
$ cat t.txt
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99
A0 A1 A2 A3 A4 A5 A6 A7 A8 A9
B0 B1 B2 B3 B4 B5 B6 B7 B8 B9
C0 C1 C2 C3 C4 C5 C6 C7 C8 C9
$ cat et.sh
inf=$1
outf=$2
rm -f $outf
for i in $(seq 1 2 12); do
echo chunk for rows $i $(expr $i + 1)
awk -v chunk_start=$i -v chunk_size=2 -f et.awk $inf >> $outf
done
$ sh et.sh t.txt t-transpose.txt
chunk for rows 1 2
chunk for rows 3 4
chunk for rows 5 6
chunk for rows 7 8
chunk for rows 9 10
chunk for rows 11 12
$ cat t-transpose.txt
10 20 30 40 50 60 70 80 90 A0 B0 C0
11 21 31 41 51 61 71 81 91 A1 B1 C1
12 22 32 42 52 62 72 82 92 A2 B2 C2
13 23 33 43 53 63 73 83 93 A3 B3 C3
14 24 34 44 54 64 74 84 94 A4 B4 C4
15 25 35 45 55 65 75 85 95 A5 B5 C5
16 26 36 46 56 66 76 86 96 A6 B6 C6
17 27 37 47 57 67 77 87 97 A7 B7 C7
18 28 38 48 58 68 78 88 98 A8 B8 C8
19 29 39 49 59 69 79 89 99 A9 B9 C9
And then running the first chunk on the huge file looks like:
$ time awk -v chunk_start=1 -v chunk_size=1000 -f et.awk tenk.txt > tenk-transpose.txt
real 1m7.899s
user 1m5.173s
sys 0m2.552s
Doing that ten times with the next chunk_start set to 1001, etc. (and appending with >> to the output, of course) should finally give you the full transposed result.
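The repeated runs described above could be wrapped in a loop. The following is only a sketch: `transpose_chunked` is a hypothetical helper name, the column count is passed in by hand, and the awk body is inlined (mirroring et.awk, with an explicit printf format) so the snippet is self-contained.

```shell
# transpose_chunked FILE NCOLS CHUNK
# Hypothetical helper: transposes FILE by processing CHUNK input columns per awk run.
transpose_chunked() {
    file=$1
    ncols=$2
    chunk=$3
    start=1
    while [ "$start" -le "$ncols" ]; do
        awk -v chunk_start="$start" -v chunk_size="$chunk" '
            BEGIN { limit = chunk_start + chunk_size - 1 }
            {
                # keep only the fields that belong to the current chunk
                n = (limit > NF) ? NF : limit
                for (f = chunk_start; f <= n; f++)
                    a[NR, f] = $f
            }
            END {
                # n still holds the last column of this chunk
                for (f = chunk_start; f <= n; f++)
                    for (r = 1; r <= NR; r++)
                        printf "%s%s", a[r, f], (r == NR ? "\n" : " ")
            }' "$file"
        start=$((start + chunk))
    done
}
```

For the large file this would be called as `transpose_chunked tenk.txt 10000 1000 > tenk-transpose.txt`. Each awk run only holds one chunk of columns in memory, at the cost of rereading the input once per chunk.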
There is a simple and quick algorithm based on sorting:
1) Make a pass through the input, prepending the row number and column number to each field. Output is a three-tuple of row, column, value for each cell in the matrix. Write the output to a temporary file.
2) Sort the temporary file by column, then row.
3) Make a pass through the sorted temporary file, reconstructing the transposed matrix.
The two outer passes are done by awk. The sort is done by the system sort. Here's the code:
$ echo '1 2 3
2 3 44
1 1 1' |
awk '{ for (i=1; i<=NF; i++) print i, NR, $i}' |
sort -k1,1n -k2,2n |
awk ' NR>1 && $2==1 { print "" }; { printf "%s ", $3 }; END { print "" }'
1 2 1
2 3 1
3 44 1
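For inputs with ten or more rows, the sort has to compare the row number numerically as well as the column number. A sketch of the same three steps as a function, assuming whitespace-separated input on stdin; `transpose_sorted` is a hypothetical name, and the last awk stage is rewritten slightly so that lines carry no trailing space.

```shell
# transpose_sorted: reads a matrix on stdin, writes its transpose on stdout.
transpose_sorted() {
    # step 1: emit "column row value" for every cell
    awk '{ for (i = 1; i <= NF; i++) print i, NR, $i }' |
    # step 2: sort by column, then by row, both numerically
    sort -k1,1n -k2,2n |
    # step 3: start a new output line whenever the column number changes
    awk '
        $1 != prev { if (NR > 1) print ""; prev = $1; sep = "" }
        { printf "%s%s", sep, $3; sep = " " }
        END { if (NR > 0) print "" }'
}
```

For example, `seq 1 11 | transpose_sorted` puts the eleven values on one line in their original order, which a plain lexical tie-break on the row number would not guarantee.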
I am looking for a way to keep only the records from a txt file that meet a certain condition.
Here is an example of the data:
aa bb cc
11 22 33
44 55 66
77 88 99
aa bb cc
11 22 33 44 55 66 77
44 55 66 66
77 88 99
aa bb cc
11 22 33 44 55
44 55 66
77 88 99 77
...
Basically, the file consists of records of 5 lines each: 4 lines of tab-delimited strings/numbers, followed by an empty line (\n).
The first line of a record always has 3 elements, while the number of elements in the 2nd, 3rd and 4th lines can vary.
What I need to do is remove every record (5-line block) where the number of elements in the second line is > 3 (I don't care about the number of elements in the other lines). The output for the example should look like this:
aa bb cc
11 22 33
44 55 66
77 88 99
...
So only the records whose second line has 3 elements are kept and written to the new txt file.
I tried to do it with awk by modifying FS and RS values like this:
awk 'BEGIN {RS="\n\n"; FS="\n";}
{if(length($2)==3) print $2"\n\n"; }' test_filter.txt
but if(length($2)==3) is not correct: I should count the number of entries in the 2nd field instead of its length, and I can't find out how to do that. Any help would be much appreciated!
thanks in advance,
You can use the split() function to break a line/field/string into components; in this case:
n=split($2,arr," ")
Where:
we split field #2 using a single space (" ") as the delimiter, which awk treats as "split on any run of whitespace", so tab-separated entries are counted too ...
components are stored in array arr[] and ...
n is the number of elements in the array
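A quick way to see what split() returns is to feed it a few hand-made lines. This is only an illustration; `count_entries` is a hypothetical helper name and the sample lines are made up.

```shell
# count_entries: prints, for each input line, how many whitespace-separated
# entries split() finds in it.
count_entries() {
    awk '{ n = split($0, arr, " "); print n }'
}

printf '11\t22\t33\n' | count_entries          # tab-separated, three entries
printf '11 22\t33  44\n' | count_entries       # mixed spaces and tabs, four entries
```

The two calls print 3 and 4 respectively, since runs of spaces and tabs both count as a single separator here.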
Pulling this into OP's current awk code, along with a couple small changes, we get:
awk 'BEGIN {ORS=RS="\n\n"; FS="\n"} {n=split($2,arr," "); if (n>=4) next}1' test_filter.txt
With an additional block added to our sample:
$ cat test_filter.txt
aa bb cc
11 22 33
44 55 66
77 88 99
aa bb cc
11 22 33 44 55 66 77
44 55 66 66
77 88 99
aa bb cc
111 222 333
444 555 665
777 888 999
aa bb cc
11 22 33 44 55
44 55 66
77 88 99 77
This awk solution generates:
aa bb cc
11 22 33
44 55 66
77 88 99
aa bb cc
111 222 333
444 555 665
777 888 999
# blank line here
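A multi-character RS such as "\n\n" is, as far as I know, an extension that not every awk supports. A possible alternative is awk's paragraph mode (RS set to the empty string), which treats blank-line-separated blocks as records. The sketch below assumes the records really are separated by empty lines; `filter_records` is a hypothetical name.

```shell
# filter_records FILE: prints only the records whose second line has at most
# three whitespace-separated entries.
filter_records() {
    awk 'BEGIN { RS = ""; FS = "\n"; ORS = "\n\n" }
         # in paragraph mode each line of a record is one field
         split($2, arr, " ") <= 3' "$1"
}
```

Calling `filter_records test_filter.txt` should give the same records as the one-liner above, each followed by a blank line.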
I want to print the numbers 1 to 100, but in columns: 1 to 10 in the first column, 11 to 20 in the 2nd column, 21 to 30 in the 3rd column, ..., 91 to 100 in the 10th column.
How can I achieve this in Bash? I have tried:
#!/bin/bash
for ((i=1; i <= 100 ; i++)) do
echo " $i"
done
It's a bit heavy since it spawns many subprocesses, but here it is as a one-liner:
paste <(seq 1 10) <(seq 11 20) <(seq 21 30) <(seq 31 40) <(seq 41 50) <(seq 51 60) <(seq 61 70) <(seq 71 80) <(seq 81 90) <(seq 91 100)
for 1-10 in 1st column, 11-20 in 2nd column and so on..
and
seq 1 100 | paste - - - - - - - - - -
for 1-10 in 1st row, 11-20 in 2nd row and so on..
Note: There are 10 hyphens in the 2nd command. In the 1st one, <(command) is process substitution, i.e. the output of the process is made available to paste as if it were a file.
Edit: Approach purely using for loop
for ((i=1;i<=10;i++)); do
for ((j=i;j<=(i+90);j+=10)); do
printf "%2d " $j
done
echo
done
for 1-10 in 1st column, 11-20 in 2nd column and so on..
and
for ((i=0;i<10;i++)); do
for ((j=1;j<=10;j++)); do
printf "%2d " $((i*10+j))
done
echo
done
for 1-10 in 1st row, 11-20 in 2nd row and so on..
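The column-wise loop generalizes to other sizes. A minimal sketch in plain POSIX sh, assuming the numbers start at 1; `print_columns` is a hypothetical name, and the output is separated by single spaces rather than aligned.

```shell
# print_columns ROWS COLS: prints 1..ROWS*COLS so that consecutive numbers
# run down each column.
print_columns() {
    rows=$1
    cols=$2
    r=1
    while [ "$r" -le "$rows" ]; do
        c=0
        line=""
        while [ "$c" -lt "$cols" ]; do
            # the value in row r, column c+1 is r + c*rows
            line="$line${line:+ }$((r + c * rows))"
            c=$((c + 1))
        done
        echo "$line"
        r=$((r + 1))
    done
}
```

`print_columns 10 10` gives the layout asked for in the question; `print_columns 2 3` prints "1 3 5" and "2 4 6".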
There is an easier way!
$ seq 100 | pr -10t
1 11 21 31 41 51 61 71 81 91
2 12 22 32 42 52 62 72 82 92
3 13 23 33 43 53 63 73 83 93
4 14 24 34 44 54 64 74 84 94
5 15 25 35 45 55 65 75 85 95
6 16 26 36 46 56 66 76 86 96
7 17 27 37 47 57 67 77 87 97
8 18 28 38 48 58 68 78 88 98
9 19 29 39 49 59 69 79 89 99
10 20 30 40 50 60 70 80 90 100
I have a file with the following content:
$ cat somefile
28 46 5d a2 26 7a 192 168 2 2
0 15 e c8 a8 a3 192 168 100 3
54 4 2b 8 c 26 192 168 20 3
As you can see, the values in the first six columns are in hex and the values in the last four columns are in decimal. I just want to prepend a 0 to every single-digit hexadecimal value.
Thanks beforehand.
This one should work out for you:
while read -a line
do
hex=(${line[@]:0:6})
printf "%02x " ${hex[@]/#/0x}
echo ${line[@]:6:4}
done < somefile
Example:
$ cat somefile
28 46 5d a2 26 7a 192 168 2 2
0 15 e c8 a8 a3 192 168 100 3
54 4 2b 8 c 26 192 168 20 3
$ while read -a line
> do
> hex=(${line[@]:0:6})
> printf "%02x " ${hex[@]/#/0x}
> echo ${line[@]:6:4}
> done < somefile
28 46 5d a2 26 7a 192 168 2 2
00 15 0e c8 a8 a3 192 168 100 3
54 04 2b 08 0c 26 192 168 20 3
Here is a way with awk if that is an option:
awk '{for(i=1;i<=6;i++) if(length($i)<2) $i=0$i}1' file
Test:
$ cat file
28 46 5d a2 26 7a 192 168 2 2
0 15 e c8 a8 a3 192 168 100 3
54 4 2b 8 c 26 192 168 20 3
$ awk '{for(i=1;i<=6;i++) if(length($i)<2) $i=0$i}1' file
28 46 5d a2 26 7a 192 168 2 2
00 15 0e c8 a8 a3 192 168 100 3
54 04 2b 08 0c 26 192 168 20 3
Please try this too, if it helps (bash version 4.1.7(1)-release)
#!/bin/bash
while read line;do
arr=($line)
i=0
for num in "${arr[@]}";do
if [ $i -lt 6 ];then
if [ ${#num} -eq 1 ];then
arr[i]='0'${arr[i]};
fi
fi
i=$((i+1))
done
echo "${arr[*]}"
done<your_file
This might work for you (GNU sed):
sed 's/\b\S\s/0&/g' file
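Finds a single non-space character followed by whitespace and prepends a 0. Note that the match is not limited to the first six columns, so a single-digit decimal value in columns 7 to 9 is padded as well.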
I came across an ASCII table in this style.
Of course I can store it in a file named ascii and use cat ascii to display its contents.
But I want it to behave more like a command.
UPDATE
While reading CS:APP I realized there was no need to store it in a file and use other commands.
Just run man ascii
If your shell supports aliases, you can do:
alias ascii='cat ~/ascii'
Then just type ascii et voila!
If you're using bash, put the above line in your .bashrc to persist it across logins. Other shells have similar features.
Dec Hex Dec Hex Dec Hex Dec Hex Dec Hex Dec Hex Dec Hex Dec Hex
0 00 NUL 16 10 DLE 32 20 48 30 0 64 40 @ 80 50 P 96 60 ` 112 70 p
1 01 SOH 17 11 DC1 33 21 ! 49 31 1 65 41 A 81 51 Q 97 61 a 113 71 q
2 02 STX 18 12 DC2 34 22 " 50 32 2 66 42 B 82 52 R 98 62 b 114 72 r
3 03 ETX 19 13 DC3 35 23 # 51 33 3 67 43 C 83 53 S 99 63 c 115 73 s
4 04 EOT 20 14 DC4 36 24 $ 52 34 4 68 44 D 84 54 T 100 64 d 116 74 t
5 05 ENQ 21 15 NAK 37 25 % 53 35 5 69 45 E 85 55 U 101 65 e 117 75 u
6 06 ACK 22 16 SYN 38 26 & 54 36 6 70 46 F 86 56 V 102 66 f 118 76 v
7 07 BEL 23 17 ETB 39 27 ' 55 37 7 71 47 G 87 57 W 103 67 g 119 77 w
8 08 BS 24 18 CAN 40 28 ( 56 38 8 72 48 H 88 58 X 104 68 h 120 78 x
9 09 HT 25 19 EM 41 29 ) 57 39 9 73 49 I 89 59 Y 105 69 i 121 79 y
10 0A LF 26 1A SUB 42 2A * 58 3A : 74 4A J 90 5A Z 106 6A j 122 7A z
11 0B VT 27 1B ESC 43 2B + 59 3B ; 75 4B K 91 5B [ 107 6B k 123 7B {
12 0C FF 28 1C FS 44 2C , 60 3C < 76 4C L 92 5C \ 108 6C l 124 7C |
13 0D CR 29 1D GS 45 2D - 61 3D = 77 4D M 93 5D ] 109 6D m 125 7D }
14 0E SO 30 1E RS 46 2E . 62 3E > 78 4E N 94 5E ^ 110 6E n 126 7E ~
15 0F SI 31 1F US 47 2F / 63 3F ? 79 4F O 95 5F _ 111 6F o 127 7F DEL
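If neither a stored file nor the man page is wanted, a shell function can generate a simplified table on the fly. This is only a sketch: `ascii_table` is a hypothetical name, it covers just the printable range 32 to 126, and it prints one character per line instead of the eight-column layout above.

```shell
# ascii_table: prints decimal code, hex code and character for 32..126.
ascii_table() {
    awk 'BEGIN {
        for (i = 32; i <= 126; i++)
            printf "%3d %02X %c\n", i, i, i   # %c turns the number into its character
    }'
}
```

Defined in a shell startup file, it can then be used like a command, for example `ascii_table | grep ' A$'`.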
I'm trying to write a program that reads each line, calculates the line's average and stores it in an array. For example, the program will read the first line, add all the numbers and divide by 24 to calculate the average, which will be stored in Avg_list[1].
When I try to run the program, I get the following error and I have no idea why it doesn't work. Can someone identify the problem?
Code (in ksh):
c=0
while read -r line ; do
v=$line
set -- $v
((g=($2+$3+$4+$5+$6+$7+$8+$9+$10+$11+$12+$13+$14+$15+$16+$17+$18+$19+$20+$21+$22+$23+$24+$25+$26)/24))
echo $g
Avg_list[${c}]=$g
((c=c+1))
done < daily.txt
daily.txt
CPU 55 54 54 54 54 54 54 54 54 54 54 54 54 54 54 55 54 54 55 56 57 54 57 54
CPEAK 56 56 57 55 58 56 56 56 57 55 60 56 55 56 55 56 58 55 57 56 63 56 72 57
RAM 97 97 97 97 97 96 96 96 96 96 96 93 91 89 86 84 90 90 95 97 97 97 97 97
RPEAK 97 97 97 97 97 97 96 96 96 96 96 96 92 90 91 81 94 89 97 97 97 97 97 97
Error note:
while read -r line ; do
v=$line
set -- $v
((g=($2+$3+$4+$5+$6+$7+$8+$9+$10+$11+$12+$13+$14+$15+$16+$17+$18+$19+$20+$21+$22+$23+$24+$25+$26)/24))
echo $g
Avg_list[${c}]=$g
((c=c+1))
done < daily.txt
+ 0< daily.txt
+ read -r line
+ v=CPU 54 54 54 54 54 54 54 54 54 54 54 54 54 54 55 54 54 55 56 57 54 57 54 54
+ set -- CPU 54 54 54 54 54 54 54 54 54 54 54 54 54 54 55 54 54 55 56 57 54 57 54 54
+ (( g=(54+54+54+54+54+54+54+54+CPU0+CPU1+CPU2+CPU3+CPU4+CPU5+CPU6+CPU7+CPU8+CPU9+540+541+542+543+544+545+546)/24 ))
PerformanceAM.sh[21]: g=(54+54+54+54+54+54+54+54+CPU0+CPU1+CPU2+CPU3+CPU4+CPU5+CPU6+CPU7+CPU8+CPU9+540+541+542+543+544+545+546)/24: 0403-009 The specified number is not valid for this command.
EDIT
while read -r line ; do
v=$line
set -- $v
((g=${2}+${3}+${4}+${5}+${6}+${7}+${8}+${9}+${10}+${11}+${12}+${13}+${14}+${15}+${16}+${17}+${18}+${19}+${20}+${21}+${22}+${23}+${24}+${25}+${26})/24))
echo $g
Avg_list[${c}]=$g
((c=c+1))
done < daily.txt
New error:
while read -r line ; do
v=$line
set -- $v
((g=${2}+${3}+${4}+${5}+${6}+${7}+${8}+${9}+${10}+${11}+${12}+${13}+${14}+${15}+${16}+${17}+${18}+${19}+${20}+${21}+${22}+${23}+${24}+${25}+${26})/24)
PerformanceAM.sh[18]: 0403-057 Syntax error at line 21 : `/24' is not expected.
Thanks for your suggestions! When I tried using brackets I got this error. I'm now even more confused; it seems like it's not collecting the numbers at all.
#!/bin/ksh
while read -r line ; do
v=$line
set -- $v
((g=(${2}+${3}+${4}+${5}+${6}+${7}+${8}+${9}+${10}+${11}+${12}+${13}+${14}+${15}+${16}+${17}+${18}+${19}+${20}+${21}+${22}+${23}+${24}+${25})/24))
echo $g
Avg_list[${c}]=$g
((c=c+1))
done < daily.txt
You were missing one ( at g=${2} and the arguments only go until ${25} not ${26}.
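The `CPU0+CPU1+...` terms in the first trace come from how the shell reads `$10`: it is `$1` followed by a literal `0`, while `${10}` is the tenth argument. A small illustration with made-up arguments (`show_args` is a hypothetical name):

```shell
# show_args: prints "$10" and "${10}" to show how differently they expand.
show_args() {
    echo "$10"      # expands $1, then appends the character 0
    echo "${10}"    # expands the tenth positional parameter
}

show_args CPU 1 2 3 4 5 6 7 8 9
```

With these arguments the first line printed is CPU0 and the second is 9, which is why the braces are needed for every parameter from the tenth onward.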
This should do it:
while read -r line; do
sum=0
total=0
for x in $line; do
# if x is numeric
if echo "$x" | grep -E '^[0-9]*$' > /dev/null ; then
((sum=sum+x))
((total=total+1))
else
echo -n "$x "
fi
done
if [ $total = 0 ]; then
echo
else
echo $((sum/total))
fi
done < daily.txt
This follows my general principle of never making long lists of $1 $2... This solution works for any number of integers per line, and it also prints out the line label (a feature easy to remove if you want).
For reference purposes, here's how awk can be used to solve this
array=( $(awk '{sum=0; for (i=2;i<=25; i++) sum=sum+$i; printf "%.0f ",sum/24 }' daily.txt ) )
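The awk approach can also avoid the hard-coded 24 by dividing by the number of values actually present on each line. A sketch under the assumption that the first field is always a label and every other field is a number; `row_averages` is a hypothetical name.

```shell
# row_averages FILE: prints each line's label followed by the mean of its values.
row_averages() {
    awk '{
        sum = 0
        for (i = 2; i <= NF; i++)
            sum += $i
        if (NF > 1)                      # skip lines that have a label only
            printf "%s %.3f\n", $1, sum / (NF - 1)
    }' "$1"
}
```

`row_averages daily.txt` then prints one "label average" pair per line, regardless of how many samples a line holds.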
Given a list of numbers, it's a little tidier to use reverse-polish notation for the calculations:
c=0
while read line; do
set -- $line
shift
script="3 k $* + + + + + + + + + + + + + + + + + + + + + + + 24 / p"
Avg_list[c++]=$( dc -e "$script" )
done < daily.txt
Then
printf "%s\n" "${Avg_list[@]}"
produces
54.458
57.250
94.333
94.875