How to modify a bash script after each run? - bash

The following Script1.sh uses 1st (key) and 2nd columns (values) in every file and prints some output based on some code in other_scripts.sh. script2.sh simply runs both script1.sh and others_script.sh together.
Now, is it possible to extend the similar process to 1st and 3rd columns (output.n2) and repeat the process (output.nn)?
Note: I have around 20k columns. Every file has same exact number of columns.
$ cat file1
s n1 n2 n3
s1 2 3 4
s2 3 4 5
s3 0 1 4
s4 9 8 7
$ cat file2
s n1 n2 n3
s1 12 13 14
s2 13 14 15
s3 10 11 14
s4 19 18 17
$ cat file3
s n1 n2 n3
s1 12 33 44
s2 13 43 54
s3 10 13 44
s4 19 83 74
$ cat filen
s n1 n2 n3
s1 25 33 40
s2 35 43 50
s3 50 13 40
s4 95 83 70
script1.sh
awk '{print $1"\t"$2}' file1 | awk '{print $1"\t""file1""\t"$2}' >> r.1
awk '{print $1"\t"$2}' file2 | awk '{print $1"\t""file2""\t"$2}' >> r.1
awk '{print $1"\t"$2}' file3 | awk '{print $1"\t""file3""\t"$2}' >> r.1
awk '{print $1"\t"$2}' filen | awk '{print $1"\t""filen""\t"$2}' >> r.1
other_scripts.sh
grep file r.1 |awk '{print $1"\t"$2"\t"$3*100}' > output.n1
rm r.1
script2.sh
sh script1.sh
sh other_scripts.sh
output.n1
s1 file1 200
s2 file1 300
s3 file1 0
s4 file1 900
s1 file2 1200
s2 file2 1300
s3 file2 1000
s4 file2 1900
s1 file3 1200
s2 file3 1300
s3 file3 1000
s4 file3 1900
s1 filen 2500
s2 filen 3500
s3 filen 5000
s4 filen 9500

Try with this script. This reproduce exactly the output you required with the input you provided.
#!/bin/bash
NUMBER_OF_COLUMNS=3
NUMBER_OF_FILES=4 # Assuming the files are like file{n}
for coln in `seq 1 $NUMBER_OF_COLUMNS`; do
for filen in `seq 1 $NUMBER_OF_FILES`; do
awk -v n=$coln -v filen=$filen 'NR>1{printf"%s\tfile%i\t%s\n", $1, filen, $(1+n)*100}' file$filen >> output.$coln
done
done
Copy it in a file, make it executable (chmod +x <name of the file>) and run the script (./<name of the file>) inside the directory containing all your files. Remember to put the right numbers as NUMBER_OF_COLUMNS and NUMBER_OF_FILES.

Related

distribute data in both increment and decrement order

I have a file which has n number of rows, i want it's data to be distributed in 7 files as per below order
** my input file has n number of rows, this is just an example.
Input file
1
2
3
4
5
6
7
8
9
10
11
12
13
14
1
5
16
17
.
.
28
Output file
1 2 3 4 5 6 7
14 13 12 11 10 9 8
15 16 17 18 19 20 21
28 27 26 25 24 23 22
so if i open the first file it should have rows
1
14
15
28
similarly if i open the second file it should have rows
2
13
16
27
similarly output for the other files as well.
Can anybody please help, with below code it is doing what is required but not in required order.
awk '{print > ("te1234"++c".txt");c=(NR%n)?c:0}' n=7 test6.txt
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
EDIT: Since OP has changed sample of Input_file totally different so adding this solution now, again this is written and tested with shown samples only.
With xargs + single awk: (recommended one)
xargs -n7 < Input_file |
awk '
FNR%2!=0{
for(i=1;i<=NF;i++){
print $i >> (i".txt")
close(i".txt")
}
next
}
FNR%2==0{
for(i=NF;i>0;i--){
count++
print $i >> (count".txt")
close(i".txt")
}
count=""
}'
Initial solution:
xargs -n7 < Input_file |
awk '
FNR%2==0{
for(i=NF;i>0;i--){
val=(val?val OFS:"")$i
}
$i=val
val=""
}
1' |
awk '
{
for(i=1;i<=NF;i++){
print $i >> (i".txt")
close(i".txt")
}
}'
Above could be done with single awk too will add xargs + awk(single) solution in few mins too.
Could you please try following, written and tested with shown samples in GNU awk.
awk '{for(i=1;i<=NF;i++){print $i >> (i".txt");close(i".txt")}}' Input_file
The output file counter could descend for each second group of seven:
awk 'FNR%n==1 {asc=!asc}
{
out="te1234" (asc ? ++c : c--) ".txt";
print >> out;
close(out)
}' n=7 test6.txt
$ ls
file tst.awk
$ cat tst.awk
{ rec = (cnt % 2 ? $1 sep rec : rec sep $1); sep=FS }
!(NR%n) {
++cnt
nf = split(rec,flds)
for (i=1; i<=nf; i++) {
out = "te1234" i ".txt"
print flds[i] >> out
close(out)
}
rec=sep=""
}
.
$ awk -v n=7 -f tst.awk file
.
$ ls
file te12342.txt te12344.txt te12346.txt tst.awk
te12341.txt te12343.txt te12345.txt te12347.txt
$ cat te12341.txt
1
14
15
28
$ cat te12342.txt
2
13
16
27
If you can have input that's not an exact multiple of n then move the code that's currently in the !(NR%n) block into a function and call that function there and in an END section.
This might work for you (GNU sed & parallel):
parallel 'echo {1}~14w file{1}; echo {2}~14w file{1}' ::: {1..7} :::+ {14..8} |
sed -n -f - file &&
paste file{1..7}
Create a sed script to write files named filen where n is 1 thru 7 (see above first set of parameters in the parallel command and also in the paste command).
The sed script uses the n~m address where n is the starting address and m is the modulo thereafter.
The distributed files are created first and the paste command then joins them all together to produce a single output file (tab separated by default, use paste -d option to get desired delimiter).
Alternative using Bash & sed:
for ((n=1,m=14;n<=7;n++,m--));do echo "$n~14w file$n";echo "$m~14w file$n";done |
sed -nf - file &&
paste file{1..7}

shell script for extracting line of file using awk

I want, the selected lines of file to be print in output file side by side separated by space. Here what I have did so far,
for file in SAC*
do
awk 'FNR==2 {print $4}' $file >>exp
awk 'FNR==3 {print $4}' $file >>exp
awk 'FNR==4 {print $4}' $file >>exp
awk 'FNR==5 {print $4}' $file >>exp
awk 'FNR==7 {print $4}' $file >>exp
awk 'FNR==8 {print $4}' $file >>exp
awk 'FNR==24 {print $0}' $file >>exp
done
My output is:
XV
AMPY
BHZ
2012-08-15T08:00:00
2013-12-31T23:59:59
I want output should be
XV AMPY BHZ 2012-08-15T08:00:00 2013-12-31T23:59:59
First the test data (only 9 rows, tho):
$ cat file
1 2 3 14
1 2 3 24
1 2 3 34
1 2 3 44
1 2 3 54
1 2 3 64
1 2 3 74
1 2 3 84
1 2 3 94
Then the awk. No need for that for loop in shell, awk can handle multiple files:
$ awk '
BEGIN {
ORS=" "
a[2];a[3];a[4];a[5];a[7];a[8] # list of records for which $4 should be outputed
}
FNR in a { print $4 } # output the $4s
FNR==9 { printf "%s\n",$0 } # replace 9 with 24
' file file # ... # the files you want to process (SAC*)
24 34 44 54 74 84 1 2 3 94
24 34 44 54 74 84 1 2 3 94

Shell script to find common values and write in particular pattern with subtraction math to range pattern

Shell script to find common values and write in particular pattern with subtraction math to range pattern
Shell script to get command values in two files and write i a pattern to new file AND also have the first value of the range pattern to be subtracted by 1
$ cat file1
2
3
4
6
7
8
10
12
13
16
20
21
22
23
27
30
$ cat file2
2
3
4
8
10
12
13
16
20
21
22
23
27
Script that works:
awk 'NR==FNR{x[$1]=1} NR!=FNR && x[$1]' file1 file2 | sort | awk 'NR==1 {s=l=$1; next} $1!=l+1 {if(l == s) print l; else print s ":" l; s=$1} {l=$1} END {if(l == s) print l; else print s ":" l; s=$1}'
Script out:
2:4
8
10
12:13
16
20:23
27
Desired output:
1:4
8
10
11:13
16
19:23
27
Similar to sputnick's, except using comm to find the intersection of the file contents.
comm -12 <(sort file1) <(sort file2) |
sort -n |
awk '
function print_range() {
if (start != prev)
printf "%d:", start-1
print prev
}
FNR==1 {start=prev=$1; next}
$1 > prev+1 {print_range(); start=$1}
{prev=$1}
END {print_range()}
'
1:4
8
10
11:13
16
19:23
27
Try doing this :
awk 'NR==FNR{x[$1]=1} NR!=FNR && x[$1]' file1 file2 |
sort |
awk 'NR==1 {s=l=$1; next}
$1!=l+1 {if(l == s) print l; else print s -1 ":" l; s=$1}
{l=$1}
END {if(l == s) print l; else print s -1 ":" l; s=$1}'

shellscript and awk extraction to calculate averages

I have a shell script that contains a loop. This loop is calling another script. The output of each run of the loop is appended inside a file (outOfLoop.tr). when the loop is finished, awk command should calculate the average of specific columns and append the results to another file(fin.tr). At the end, the (fin.tr) is printed.
I managed to get the first part which is appending the results from the loop into (outOfLoop.tr) file. also, my awk commands seem to work... But I'm not getting the final expected output in terms of format. I think I'm missing something. Here is my try:
#!/bin/bash
rm outOfLoop.tr
rm fin.tr
x=1
lmax=4
while [ $x -le $lmax ]
do
calling another script >> outOfLoop.tr
x=$(( $x + 1 ))
done
cat outOfLoop.tr
#/////////////////
#//I'm getting the above part correctly and the output is :
27 194 119 59 178
27 180 100 30 187
27 175 120 59 130
27 189 125 80 145
#////////////////////
#back again to the script
echo "noRun\t A\t B\t C\t D\t E"
echo "----------------------\n"
#// print the total number of runs from the loop
echo "$lmax\t">>fin.tr
#// extract the first column from the output which is 27
awk '{print $1}' outOfLoop.tr >>fin.tr
echo "\t">>fin.tr
#Sum the column---calculate average
awk '{s+=$5;max+=0.5}END{print s/max}' outOfLoop.tr >>fin.tr
echo "\t">>fin.tr
awk '{s+=$4;max+=0.5}END{print s/max}' outOfLoop.tr >>fin.tr
echo "\t">>fin.tr
awk '{s+=$3;max+=0.5}END{print s/max}' outOfLoop.tr >>fin.tr
echo "\t">>fin.tr
awk '{s+=$2;max+=0.5}END{print s/max}' outOfLoop.tr >> fin.tr
echo "-------------------------------------------\n"
cat fin.tr
rm outOfLoop.tr
I want the format to be like :
noRun A B C D E
----------------------------------------------------------
4 27 average average average average
I have incremented max inside the awk command by 0.5 as there was new line between the out put of the results (output of outOfLoop file)
$ cat file
27 194 119 59 178
27 180 100 30 187
27 175 120 59 130
27 189 125 80 145
$ cat tst.awk
NF {
for (i=1;i<=NF;i++) {
sum[i] += $i
}
noRun++
}
END {
fmt="%-10s%-10s%-10s%-10s%-10s%-10s\n"
printf fmt,"noRun","A","B","C","D","E"
printf "----------------------------------------------------------\n"
printf fmt,noRun,$1,sum[2]/noRun,sum[3]/noRun,sum[4]/noRun,sum[5]/noRun
}
$ awk -f tst.awk file
noRun A B C D E
----------------------------------------------------------
4 27 184.5 116 57 160

Compare two file columns (unsorted files)

Input File 1
A1 123 AA
B1 123 BB
C2 44 CC1
D1 12 DD1
E1 11 EE1
Input File 2
A sad21 1
DD1 124f2 2
CC 123tges 3
BB 124sdf 4
AA 1asrf 5
Output File
A1 123 AA 1asrf 5
B1 123 BB 124sdf 4
D1 12 DD1 124f2 2
Making of Output file
We check 3rd column of Input File 1 and 1st Col of Input File 2.
If they match , we print it in Output file.
Note :
The files are not sorted
I tried :
join -t, A B | awk -F "\t" 'BEGIN{OFS="\t"} {if ($3==$4) print $1,$2,$3,$4,$6}'
But this doesnot work as files are unsorted. so the condition ($3==$4) won't work all the time. Please help .
nawk 'FNR==NR{a[$3]=$0;next}{if($1 in a){p=$1;$1="";print a[p],$0}}' file1 file2
tested below:
> cat file1
A1 123 AA
B1 123 BB
C2 44 CC1
D1 12 DD1
E1 11 EE1
> cat file2
A sad21 1
DD1 124f2 2
CC 123tges 3
BB 124sdf 4
AA 1asrf 5
> awk 'FNR==NR{a[$3]=$0;next}{if($1 in a){p=$1;$1="";print a[p],$0}}' file1 file2
D1 12 DD1 124f2 2
B1 123 BB 124sdf 4
A1 123 AA 1asrf 5
>
You can use join, but you need to sort on the key field first and tell join that the key in the first file is column 3 (-1 3):
join -1 3 <(sort -k 3,3 file1) <(sort file2)
Will get you the correct fields, output (with column -t for output formatting):
AA A1 123 1asrf 5
BB B1 123 124sdf 4
DD1 D1 12 124f2 2
To get the same column ordering listed in the question, you need to specify the output format:
join -1 3 -o 1.1,1.2,1.3,2.2,2.3 <(sort -k 3,3 file1) <(sort file2)
i.e. file 1 fields 1 through 3 then file 2 fields 2 and 3. Output (again with column -t):
A1 123 AA 1asrf 5
B1 123 BB 124sdf 4
D1 12 DD1 124f2 2
perl -F'/\t/' -anle 'BEGIN{$f=1}if($f==1){$H{$F[2]}=$_;$f++ if eof}else{$l=$H{$F[0]};print join("\t",$l,#F[1..$#F]) if defined$l}' f1.txt f2.txt
or shorter
perl -F'/\t/' -anle'$f?($l=$H{$F[0]})&&print(join"\t",$l,#F[1..$#F]):($H{$F[2]}=$_);eof&&$f++' f1.txt f2.txt
One way using awk:
awk 'BEGIN { FS=OFS="\t" } FNR==NR { array[$1]=$2 OFS $3; next } { if ($3 in array) print $0, array[$3] }' file2.txt file1.txt
Results:
A1 123 AA 1asrf 5
B1 123 BB 124sdf 4
D1 12 DD1 124f2 2
This might work for you (GNU sed):
sed 's|\(\S*\)\(.*\)|/\\s\1$/s/$/\2/p|' file2 | sed -nf - file1

Resources