C-shell: print result from row to column

I have this output:
1 rpt 4 qor 5 are 6 oip
I want it to become:
1 rpt
4 qor
5 are
6 oip
This is my code:
set d = `sort "04.txt" | uniq -c`
echo $d

With your shown samples, please try the following.
xargs -n 2 < Input_file
From man xargs:
-n max-args, --max-args=max-args
    Use at most max-args arguments per command line. Fewer than max-args arguments will be used if the size (see the -s option) is exceeded, unless the -x option is given, in which case xargs will exit.
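If xargs is not at hand, the same two-fields-per-line grouping can be sketched with printf, which reuses its format string until all arguments are consumed. This is only a sketch for a POSIX sh-style shell (not csh), and the helper name pairs_per_line is made up for illustration.

```shell
# pairs_per_line is a hypothetical helper name, not part of the original answers.
pairs_per_line() {
    # $1 is left unquoted on purpose so the shell splits it into words;
    # printf then repeats the format for every two words.
    # Caveat: unquoted expansion also performs filename globbing.
    printf '%s %s\n' $1
}

pairs_per_line "1 rpt 4 qor 5 are 6 oip"
```

With an odd number of words, the last line would carry a single word followed by a space.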

akshay#sys:~$ cat file
1 rpt 4 qor 5 are 6 oip
akshay#sys:~$ sed 's/ /\n/2; P; D' file
1 rpt
4 qor
5 are
6 oip
akshay#sys:~$ awk -v n=2 '{for (i=n+1; i<=NF; i+=n) $i = "\n" $i}1' file
1 rpt
4 qor
5 are
6 oip
akshay#sys:~$ awk -v RS=" " '{$1=$1; ORS=NR%2?FS:"\n" }1' file
1 rpt
4 qor
5 are
6 oip

Related

Sorting tab delimited numbers by column with pure bash script.

I'm stuck on some homework. The requirements of the assignment are to accept an input file and perform some statistics on the values. The user may specify whether to calculate the statistics by row or by value. The shell script must be pure bash, so I can't use awk, sed, perl, python etc.
sample input:
1 1 1 1 1 1 1
39 43 4 3225 5 2 2
6 57 8 9 7 3 4
3 36 8 9 14 4 3
3 4 2 1 4 5 5
6 4 4814 7 7 6 6
I can't figure out how to sort and process the data by column. My code for processing the rows works fine.
# CODE FOR ROWS
while read -r line; do
echo $(printf "%d\n" $line | sort -n) | tr ' ' \\t > sorted.txt
....
#I perform the stats calculations
# for row line by working with the temp file sorted.txt
done
How could I process this data by column? I've never worked with shell script so I've been staring at this for hours.
If you wanted to analyze by columns you'll need the cols value first (number of columns). head -n 1 gives you the first row, and NF counts the number of fields, giving us the number of columns.
cols=$(head -n 1 input.txt | awk '{print NF}')
Then you can use cut with the '\t' delimiter to grab every column from input.txt, and run it through sort -n, as you did in your original post.
$ for i in $(seq 1 "$cols"); do cut -f"$i" -d$'\t' input.txt; done | sort -n > output.txt
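Because the assignment rules out awk, one column can also be pulled out with shell built-ins alone. The following is a minimal sketch, assuming whitespace-separated integers on standard input; column_values is a hypothetical helper name, not something from the original answer.

```shell
# column_values is a hypothetical helper: print field $1 (1-based) of each stdin line.
column_values() {
    col=$1
    while read -r line; do
        # Unquoted on purpose: split the row into positional parameters.
        set -- $line
        # Skip rows that are too short to have the requested field.
        [ "$#" -ge "$col" ] || continue
        shift $((col - 1))
        printf '%s\n' "$1"
    done
}

# Second column of a three-row sample, sorted numerically.
column_values 2 <<'EOF' | sort -n
39 43 4
6 57 8
3 36 8
EOF
```

Sorting each column separately, rather than all values at once, keeps the per-column statistics apart.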
For rows, you can use the shell built-in printf with the format specifier %d for integers. The sort command works on lines of input, so we replace spaces ' ' with newlines \n using the tr command:
$ cat input.txt | while read line; do echo $(printf "%d\n" $line); done | tr ' ' '\n' | sort -n > output.txt
Now take the output file to gather our statistics:
Min: cat output.txt | head -n 1
Max: cat output.txt | tail -n 1
Sum: (courtesy of Dimitre Radoulov): cat output.txt | paste -sd+ - | bc
Mean: (courtesy of porges): cat output.txt | awk '{ total += $1 } END { print total/NR }'
Median: (courtesy of maxschlepzig): cat output.txt | awk ' { a[i++]=$1; } END { print a[int(i/2)]; }'
Histogram: cat output.txt | uniq -c
8 1
3 2
4 3
6 4
3 5
4 6
3 7
2 8
2 9
1 14
1 36
1 39
1 43
1 57
1 3225
1 4814
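Several of the statistics above lean on awk and bc, which the question excludes. As a hedged alternative, the sketch below gathers min, max, sum, mean and median using only shell arithmetic. It assumes integers that are already sorted ascending, one per line, and the function name summarize is invented for this example. The mean is truncated because shell arithmetic is integer-only.

```shell
# summarize is a hypothetical helper; input must be sorted ascending, one integer per line.
summarize() {
    count=0
    sum=0
    values=
    while read -r n; do
        if [ "$count" -eq 0 ]; then min=$n; fi   # first value is the minimum
        max=$n                                    # last value seen is the maximum
        sum=$((sum + n))
        count=$((count + 1))
        values="$values $n"
    done
    if [ "$count" -eq 0 ]; then return 1; fi
    # Median: the same element the awk one-liner picks, a[int(count/2)] (0-based).
    set -- $values
    shift $((count / 2))
    echo "min=$min max=$max sum=$sum mean=$((sum / count)) median=$1"
}

printf '%s\n' 6 57 8 9 7 3 4 | sort -n | summarize
```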

Writing shell script to print a certain number of lines with certain arguments

I have five variables, and each variable contains five values. I want to print five lines, taking one value from each of the five variables in turn.
For example
$a=1 2 3 4 5
$b=4 2 3 4 5
$c=8 9 7 6 5
$d= 8 7 6 5 4
$e=5 6 7 3 3
I want to print five lines in this format
My options was a=1,b=4,c=8,d=8 and e=5
My options was a=2,b=2,c=9,d=7 and e=6
and so on upto five values.
I got confused using the loops. Can anyone help me write the loops in a script to obtain this output?
a="1 2 3 4 5"
b="4 2 3 4 5"
c="8 9 7 6 5"
d="8 7 6 5 4"
e="5 6 7 3 3"
for i in $(seq 1 5); do
echo -e "My options was \c"
echo -e "a=$(echo $a | cut -f$i -d' '),\c"
echo -e "b=$(echo $b | cut -f$i -d' '),\c"
echo -e "c=$(echo $c | cut -f$i -d' '),\c"
echo -e "d=$(echo $d | cut -f$i -d' ') and \c"
echo -e "e=$(echo $e | cut -f$i -d' ')"
done
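The same loop can be written without cut or echo -e by letting the shell split each list and pick the i-th word. This is a sketch under the assumption that every variable holds space-separated words; nth_word is a hypothetical helper name.

```shell
# nth_word is a hypothetical helper: print word number $1 (1-based) of the list in $2.
nth_word() {
    idx=$1
    # Unquoted on purpose: split the list into positional parameters.
    set -- $2
    shift $((idx - 1))
    printf '%s\n' "$1"
}

a="1 2 3 4 5"
b="4 2 3 4 5"
c="8 9 7 6 5"
d="8 7 6 5 4"
e="5 6 7 3 3"

i=1
while [ "$i" -le 5 ]; do
    printf 'My options was a=%s,b=%s,c=%s,d=%s and e=%s\n' \
        "$(nth_word "$i" "$a")" "$(nth_word "$i" "$b")" "$(nth_word "$i" "$c")" \
        "$(nth_word "$i" "$d")" "$(nth_word "$i" "$e")"
    i=$((i + 1))
done
```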
Using this awk command with a bash loop:
for i in {1..5}; do
awk '{printf "My options was a=%d, b=%d, c=%d, d=%d and e=%d\n", $1, $2, $3, $4, $5}' <<< $(awk '{print $'$i'}' <(echo -e "$a\n$b\n$c\n$d\n$e") | tr $'\n' ' '); done
Output:
$ a='1 2 3 4 5'
$ b='4 2 3 4 5'
$ c='8 9 7 6 5'
$ d='8 7 6 5 4'
$ e='5 6 7 3 3'
$ for i in {1..5}; do
awk '{printf "My options was a=%d, b=%d, c=%d, d=%d and e=%d\n", $1, $2, $3, $4, $5}' <<< $(awk '{print $'$i'}' <(echo -e "$a\n$b\n$c\n$d\n$e") | tr $'\n' ' '); done
My options was a=1, b=4, c=8, d=8 and e=5
My options was a=2, b=2, c=9, d=7 and e=6
My options was a=3, b=3, c=7, d=6 and e=7
My options was a=4, b=4, c=6, d=5 and e=3
My options was a=5, b=5, c=5, d=4 and e=3
If you transpose the matrix, this is really simple, portable, and idiomatic.
while read -r a b c d e; do
: stuff with "$a", "$b", etc
done <<____
1 4 8 8 5
2 2 9 7 6
3 3 7 6 7
4 4 6 5 3
5 5 5 4 3
____
Notice how the first column enumerates the a values, the second, the bs, etc.
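To make the transposed approach concrete, here is a minimal sketch that fills in the loop body with printf to produce the requested lines. The function name print_options is made up for this example, and the data is the same transposed matrix.

```shell
# print_options is a hypothetical helper: read transposed rows and format each one.
print_options() {
    while read -r a b c d e; do
        printf 'My options was a=%s,b=%s,c=%s,d=%s and e=%s\n' "$a" "$b" "$c" "$d" "$e"
    done
}

print_options <<'EOF'
1 4 8 8 5
2 2 9 7 6
3 3 7 6 7
4 4 6 5 3
5 5 5 4 3
EOF
```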

Add column to csv file

I have two files, and I need to take the last column of one file and append it to the other file.
file1
1 2 3
1 2 3
1 2 3
file2
5 5
5 5
5 5
Initial proposal
#!/usr/bin/env bash
column=$(awk '{print $(NF)}' $file1)
paste -d',' $file2 < $column
Expected result
file2
5 5 3
5 5 3
5 5 3
But this script does not work yet.
Note: I do not know how many columns the file has, so I need a more generic solution.
You can use this paste command:
paste -d " " file2 <(awk '{print $NF}' file1)
5 5 3
5 5 3
5 5 3
To append last column of file1 to file2:
paste -d " " file2 <(rev file1 | cut -d " " -f 1 | rev)
Output:
5 5 3
5 5 3
5 5 3
To paste the second column of file 1 to file 2:
while read line; do
read -u 3 c1 c2 c3;
echo $line $c2;
done < file2 3< file1
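The loop above names three fields explicitly, so it does not cover the case where the number of columns is unknown. A possible variant, sketched here with POSIX read on file descriptor 3 instead of read -u, takes whatever the last field is. The name append_last_field is hypothetical.

```shell
# append_last_field is a hypothetical helper.
# $1: file whose last field is taken; $2: file whose lines are extended.
append_last_field() {
    while read -r line && read -r other <&3; do
        last=
        # Unquoted on purpose: after this loop, last holds the final word of the row.
        for last in $other; do :; done
        printf '%s %s\n' "$line" "$last"
    done < "$2" 3< "$1"
}

# Demonstration on throwaway files.
workdir=$(mktemp -d)
printf '1 2 3\n1 2 3\n1 2 3\n' > "$workdir/file1"
printf '5 5\n5 5\n5 5\n' > "$workdir/file2"
append_last_field "$workdir/file1" "$workdir/file2"
rm -rf "$workdir"
```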
You can use Perl too:
$ paste -d ' ' file2.txt <(perl -lne 'print $1 if m/(\S+)\s*$/' file1.txt)
5 5 3
5 5 3
5 5 3
Or grep:
$ paste -d ' ' file2.txt <(grep -Eo '(\S+)\s*$' file1.txt)
5 5 3
5 5 3
5 5 3
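All of the answers above print the merged lines rather than changing file2, while the question expects file2 itself to end up with the extra column. One way to do that, sketched here without process substitution so it also runs in plain sh, is to write to a temporary file and move it into place. The helper name append_last_column is an assumption of this example.

```shell
# append_last_column is a hypothetical helper.
# $1: file supplying the last column; $2: file that receives it (rewritten in place).
append_last_column() {
    src=$1
    dst=$2
    tmp=$(mktemp) || return 1
    # "-" tells paste to read the second input from the pipe.
    awk '{ print $NF }' "$src" | paste -d ' ' "$dst" - > "$tmp" && mv "$tmp" "$dst"
}

# Demonstration on throwaway files.
workdir=$(mktemp -d)
printf '1 2 3\n1 2 3\n1 2 3\n' > "$workdir/file1"
printf '5 5\n5 5\n5 5\n' > "$workdir/file2"
append_last_column "$workdir/file1" "$workdir/file2"
cat "$workdir/file2"
rm -rf "$workdir"
```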

split file into multiple files (by columns)

I have a file data.txt with 200 columns and 200 rows (a square matrix). I have been trying to split it into 200 files, each containing one column of the big data file. These were my two attempts, using cut and awk, but I don't understand why they are not working.
NM=`awk 'NR==1{print NF-2}' < file.txt`
echo $NM
for (( i=1; i = $NM; i++ ))
do
echo $i
cut -f ${i} file.txt > tmpgrid_0${i}.dat
#awk '{print '$i'}' file.txt > tmpgrid_0${i}.dat
done
Any suggestions?
EDIT: Thank you very much to all of you. All answers were valid, but I cannot vote for all of them.
awk '{for(i=1;i<=5;i++){name=FILENAME"_"i;print $i> name}}' your_file
Tested with 5 columns:
> cat temp
PHE 5 2 4 6
PHE 5 4 6 4
PHE 5 4 2 8
TRP 7 5 5 9
TRP 7 5 7 1
TRP 7 5 7 3
TYR 2 4 4 4
TYR 2 4 4 0
TYR 2 4 5 3
> nawk '{for(i=1;i<=5;i++){name=FILENAME"_"i;print $i> name}}' temp
> ls -1 temp_*
temp_1
temp_2
temp_3
temp_4
temp_5
> cat temp_1
PHE
PHE
PHE
TRP
TRP
TRP
TYR
TYR
TYR
>
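The one-liner above hard-codes five columns. A hedged generalisation is to loop up to NF so the column count is taken from the data. The wrapper name split_columns is invented here, and some awk implementations may limit the number of simultaneously open output files, which could matter with 200 columns.

```shell
# split_columns is a hypothetical helper: write <file>_1, <file>_2, ... one per column.
split_columns() {
    # The parentheses around the output name keep the concatenation unambiguous.
    awk '{ for (i = 1; i <= NF; i++) print $i > (FILENAME "_" i) }' "$1"
}

# Demonstration on a throwaway file.
workdir=$(mktemp -d)
printf 'PHE 5 2 4 6\nTRP 7 5 5 9\nTYR 2 4 4 4\n' > "$workdir/temp"
split_columns "$workdir/temp"
cat "$workdir/temp_1"
rm -rf "$workdir"
```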
To summarise my comments, I suggest something like this (untested as I have no sample file):
NM=$(awk 'NR==1{print NF-2}' file.txt)
echo $NM
for (( i=1; i <= $NM; i++ ))
do
echo $i
awk '{print $'$i'}' file.txt > tmpgrid_0${i}.dat
done
An alternative solution using tr and split
< file.txt tr ' ' '\n' | split -nr/200
This assumes that the file is space delimited, but the tr command could be tweaked as appropriate for any delimiter. Essentially this puts each entry on its own line, and then uses split's round robin version to write each 200th line to the same file.
paste -d' ' x* | cmp - file.txt
verifies that it worked if split is writing files with an x prefix.
I got this solution from Reuti on the coreutils mailing list.

Split specific column(s)

I have this kind of records:
1 2 12345
2 4 98231
...
I need to split the third column into sub-columns to get this (separated by single-space for example):
1 2 1 2 3 4 5
2 4 9 8 2 3 1
Can anybody offer me a nice solution in sed, awk, ... etc ? Thanks!
EDIT: the size of the original third column may vary record by record.
Awk
% echo '1 2 12345
2 4 98231
...' | awk '{
gsub(/./, "& ", $3)
print
}
'
1 2 1 2 3 4 5
2 4 9 8 2 3 1
...
[Tested with GNU Awk 3.1.7]
This takes every character (/./) in the third column ($3) and replaces (gsub()) it with itself followed by a space ("& ") before printing the entire line.
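The gsub approach leaves a trailing space inside the rewritten field, which shows up as a double space when more columns follow. As a sketch, the field number can be passed in as a parameter and the trailing space trimmed. The wrapper name split_column and the awk variable col are assumptions made for this example.

```shell
# split_column is a hypothetical helper: space out the characters of field $1 on stdin.
split_column() {
    awk -v col="$1" '{
        gsub(/./, "& ", $col)   # "12345" becomes "1 2 3 4 5 "
        sub(/ $/, "", $col)     # drop the trailing space added by gsub
        print
    }'
}

printf '1 2 12345 6 7\n2 4 98231 8 0\n' | split_column 3
```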
Sed solution:
sed -e 's/\([0-9]\)/\1 /g' -e 's/ \+/ /g'
The first sed expression replaces every digit with the same digit followed by a space. The second expression replaces every block of spaces with a single space, thus handling the double spaces introduced by the previous expression. With non-GNU seds you may need to use two sed invocations (one for each -e).
Using awk substr and printf:
[srikanth#myhost ~]$ cat records.log
1 2 12345 6 7
2 4 98231 8 0
[srikanth#myhost ~]$ awk '{ len=length($3); for(i=1; i<=NF; i++) { if(i==3) { for(j = 1; j <= len; j++){ printf substr($3,j,1) " "; } } else { printf $i " "; } } printf("\n"); }' records.log
1 2 1 2 3 4 5 6 7
2 4 9 8 2 3 1 8 0
You can use this for more than three column records as well.
Using perl:
perl -pe 's/([0-9])(?! )/\1 /g' INPUT_FILE
Test:
[jaypal:~/Temp] cat tmp
1 2 12345
2 4 98231
[jaypal:~/Temp] perl -pe 's/([0-9])(?! )/\1 /g' tmp
1 2 1 2 3 4 5
2 4 9 8 2 3 1
Using gnu sed:
sed 's/[0-9]/& /3g' INPUT_FILE
Test:
[jaypal:~/Temp] sed 's/[0-9]/& /3g' tmp
1 2 1 2 3 4 5
2 4 9 8 2 3 1
Using gnu awk:
gawk '{print $1,$2,gensub(/./,"& ","G", $NF)}' INPUT_FILE
Test:
[jaypal:~/Temp] gawk '{print $1,$2,gensub(/./,"& ","G", $NF)}' tmp
1 2 1 2 3 4 5
2 4 9 8 2 3 1
If you don't care about spaces, this is a succinct version:
sed 's/[0-9]/& /g'
but if you need to remove spaces, we just chain another regexp:
sed 's/[0-9]/& /g;s/  */ /g'
Note this is compatible with the original sed, and thus will run on any UNIX-like system.
$ awk -F '' '$1=$1' data.txt | tr -s ' '
1 2 1 2 3 4 5
2 4 9 8 2 3 1
This might work for you:
echo -e "1 2 12345\n2 4 98231" | sed 's/\B\s*/ /g'
1 2 1 2 3 4 5
2 4 9 8 2 3 1
Most probably GNU sed only.
