Indexing variable created by awk in bash

I'm having some trouble indexing a variable (consisting of 1 line with 4 values) derived from a text file with awk.
In particular, I have a text-file containing all input information for a loop. Every row contains 4 specific input values, and every iteration makes use of a different row of the input file.
Input file looks like this:
/home/hannelore/TVB-pipe_local/subjects/CON02T1/ 10012 100000 1001 --> used for iteration 1
/home/hannelore/TVB-pipe_local/subjects/CON02T1/ 10013 7200 1001 --> used for iteration 2
...
From this input text file, I identified the different columns (path, seed, count, target), and then I wanted to index these variables in each iteration of the loop. However, index 0 returns the entire variable, and higher indices return nothing. I tried awk, cut, and IFS on the obtained variable, but I wasn't able to split it. Can anyone help me with this?
Some code that I used:
seed=$(awk '{print $2}' $input_file)
--> extract column information from input file, this works
seedsplit=$(awk '{print $2}' $seed)
seedsplit=$(cut -f2 -d ' ' $seed)
Thank you in advance!
Kind regards,
Hannelore

If I understand you correctly, you want to extract the values from the input file row by row.
while read -r a b c d; do echo "var1: $a"; done < file
will print
var1: /home/hannelore/TVB-pipe_local/subjects/CON02T1/
var1: /home/hannelore/TVB-pipe_local/subjects/CON02T1/
Similarly, you can access the other values in b, c, and d.
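Applied to the loop from the question, a minimal sketch could look like this (the names path, seed, count, and target are taken from your column description, and $input_file is assumed to hold the file name):
while read -r path seed count target; do
    # run one iteration of the pipeline with this row's four values
    echo "iteration inputs: $path $seed $count $target"
done < "$input_file"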

If you want an array, then use array assignment notation:
seed=( $(awk '{print $2}' $input_file) )
Now you will have each word of awk's output in a separate array element.
col1=( $(awk '{print $1}' $input_file) )
col3=( $(awk '{print $3}' $input_file) )
Now you have three arrays which can be indexed in parallel.
for i in $(seq 0 $(( ${#col1[@]} - 1 )))
do
    echo "${col1[$i]} in col1; ${seed[$i]} in col2; ${col3[$i]} in col3"
done
(Bash arrays are zero-indexed, so the loop starts at 0.)

Related

AWK write to a file based on number of fields in csv

I want to iterate over a csv file and, while writing to an output file, discard the rows that don't have all of their columns.
I have an input file mtest.csv like this:
IP##Process##Device##ID
TestIP1##TestProcess2##TestDevice1##TestID1
TestIP2##TestProcess2##TestDevice2
TestIP3##TestProcess3##TestDevice3##TestID3
But I only want to write the row records where all 4 columns are present. The output should not contain the TestIP2 row, since it has only 3 columns.
Sample output should look like this:
IP##Process##Device##ID
TestIP1##TestProcess2##TestDevice1##TestID1
TestIP3##TestProcess3##TestDevice3##TestID3
I used the following to get all the columns earlier, but it writes the TestIP2 row (which has only 3 columns) as well:
awk -F "\##" '{print $1"\##"substr($2,1,50)"\##"substr($3,1,50)"\##"substr($4,1,50)}' mtest.csv >output2.csv
But when I try to ensure that it only writes to the file when all 4 columns are present, it doesn't work:
awk -F "\##", 'NF >3 {print $1"\##"substr($2,1,50)"\##"substr($3,1,50)"\##"substr($4,1,50); exit}' mtest.csv >output2.csv
You are making things harder than they need to be. All you need to do is check NF==4 to output only the records containing four fields. Your total awk expression would be:
awk -F'##' NF==4 < mtest.csv
(Note: the default action in awk is print, so no explicit print is required.)
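Written out explicitly, the same filter with the default action spelled out would be:
awk -F'##' 'NF==4 { print $0 }' < mtest.csv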
Example Use/Output
With your sample input in mtest.csv, you would receive:
$ awk -F'##' NF==4 < mtest.csv
IP##Process##Device##ID
TestIP1##TestProcess2##TestDevice1##TestID1
TestIP3##TestProcess3##TestDevice3##TestID3
Thanks David and vukung. Both your solutions are okay. I want to write to a file so that I can trim the length of each field as well.
I think the statement below works:
awk -F "##" 'NF>3 {print $1"\##"substr($2,1,50)"\##"substr($3,1,2)"\##"substr($4,1,3)}' mtest.csv >output2.csv

Divide variable by column #2 of file1, compare result to file2

I have two text files.
One contains:
Feb,30000
March,40000
April,60000
The other contains a value :
0.134
I want to take numeric input in a variable, divide that number by each of the first file's second-column values (30000, 40000, etc.), and compare the result with the second file's value.
How can I do this in a script?
As bash has no floating-point arithmetic, you can use either bc or awk.
input=10000
value2=$(awk '{print $1}' file2)
# divide the input by column 2 of every row in file1 and compare against the threshold
awk -F, -v x="$input" -v t="$value2" '{print ((x/$2 > t) ? "larger" : "smaller")}' file1
With input=10000 and the sample file1, every quotient exceeds 0.134, so this prints larger three times.
Some other tool (e.g. bc, awk, or calc) is needed, since bash lacks floating point.
Let's call the first file f1, and the second file f2. First use sed and bash to write some text expressions:
x=4900 ; sed 's#.*,\(.*\)#'$x'/\1>'$(<f2)'#' f1
Output:
4900/30000>0.134
4900/40000>0.134
4900/60000>0.134
From there, feed that expression to calc (which will divide, then compare):
x=4900 ; sed 's#.*,\(.*\)#'$x'/\1>'$(<f2)'#' f1 | calc -p
Output (in calc the "1" means true; here only 4900/30000, or 0.163..., is greater than 0.134):
1
0
0
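For completeness, a sketch of the same comparison with bc (assuming GNU bc and the same file names f1 and f2; bc prints 1 when the relation holds, 0 otherwise):
x=4900
t=$(<f2)
while IFS=, read -r month value; do
    echo "$x / $value > $t" | bc -l
done < f1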

match awk column value to a column in another file

I need to know if I can match an awk value while inside a piped command, like below:
somebinaryGivingOutputToSTDOUT | grep -A3 "sometext" | grep "somemoretext" | awk -F '[:|]' 'BEGIN{OFS=","; print "Col1,Col2,Col3,Col4"}{print $4,$6,$4*10^10+$6,$8}'
From here I need to check whether the computed value $4*10^10+$6 is present in a column of another file. If it is present, then print the row; otherwise just move on.
File where value needs to be matched is as below:
a,b,c,d,e
1,2,30000000000,3,4
I need to match with the 3rd column of the above file.
I would ideally like this to be in the same command, because without this check it prints more than 100 million rows (a very large file).
I have already read this question.
Adding more info:
Breaking my command into parts
part1-command:
somebinaryGivingOutputToSTDOUT | grep -A3 "sometext" | grep "Something:"
part1-output(just showing 1 iteration output):
Something:38|Something1:1|Something2:10588429|Something3:1491539456372358463
part2-command Now I use awk
awk -F '[:|]' 'BEGIN{OFS=","; print "Col1,Col2,Col3,Col4"}{print $4,$6,$4*10^10+$6,$8}'
part2-command output: currently the values below are printed (note how 1*10^10+10588429 gives 10010588429):
1,10588429,10010588429,1491539456372358463
3,12394810,30012394810,1491539456372359082
1,10588430,10010588430,1491539456372366413
Now I need to put a check (within the command, near awk) to print only if 10010588429 is present in another file (say another_file.csv, as below)
another_file.csv
A,B,C,D,E
1,2, 10010588429,4,5
x,y,z,z,k
10,20, 10010588430,40,50
output should only be
1,10588429,10010588429,1491539456372358463
1,10588430,10010588430,1491539456372366413
So for every row of awk output, we check for an entry in column C of the second file.
Using the associative array approach from the previous question, include a hyphen in place of the first file to direct awk to the input stream.
Example:
grep -A3 "sometext" | grep "somemoretext" | awk -F '[:|]' '
BEGIN {OFS=","; print "Col1,Col2,Col3,Col4"}
NR==FNR {
    key = $4*10^10 + $6
    out[key] = $4 OFS $6 OFS key OFS $8
    next
}
($3+0) in out {   # +0 drops the leading space in column C of another_file.csv
    print out[$3+0]
}' - FS=',' another_file.csv > output.csv
More info on the merging process in the answer cited in the question:
Using AWK to Process Input from Multiple Files
I'll post a template which you can utilize for your computation
awk 'BEGIN {FS=OFS=","}
NR==FNR {lookup[$3]; next}
/sometext/ {c=4}
c&&c--&&/somemoretext/ {value= # implement your computation here
if(value in lookup)
print "what you want"}' lookup.file FS=':' grep.files...
Here awk loads the values from the third column of the first file (which is comma-delimited) into the lookup array (a hashmap in disguise). For the next set of files it sets the delimiter to : and, similar to grep -A3, looks within 3 lines of the first pattern for the second pattern, does the computation, and prints what you want.
In awk you can also control which column your pattern matches; here I replicated the grep example.
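As an illustration only, here is one hypothetical way to fill in the template for the computation in the question (the FS switches between file arguments and the +0 normalization for the stray spaces in another_file.csv are assumptions):
somebinaryGivingOutputToSTDOUT |
awk 'NR==FNR {lookup[$3+0]; next}   # load column C of the csv; +0 drops the leading spaces
     /sometext/ {c=4}               # open a window of the next 3 lines, like grep -A3
     c && c-- && /somemoretext/ {
         value = $4*10^10 + $6
         if (value in lookup)
             print $4, $6, value, $8
     }' FS=',' another_file.csv FS='[:|]' OFS=',' -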
This is another simplified example to focus on the core of the problem.
awk 'BEGIN{for(i=1;i<=1000;i++) print int(rand()*1000), rand()}' |
awk 'NR==FNR{lookup[$1]; next}
$1 in lookup' perfect.numbers -
The first process creates 1000 random records, and the second one filters the ones where the first field is in the lookup table.
28 0.736027
496 0.968379
496 0.404218
496 0.151907
28 0.0421234
28 0.731929
For the lookup file:
$ head perfect.numbers
6
28
496
8128
The piped data is substituted as the second file at -.
You can pipe your grep or awk output into a while read loop, which gives you some degree of freedom. There you can decide whether to forward a line:
grep -A3 "sometext" | grep "somemoretext" | while read -r LINE; do
    COMPUTED=$(echo "$LINE" | awk -F '[:|]' '{print $4*10^10+$6}')
    if grep -q "$COMPUTED" /the/file/to/search; then
        echo "$LINE"
    fi
done
Note that this forks a grep per input line, so it will be slow on a very large stream.

Extracting data from a particular column number in csv file

I have a csv file that looks like this:
arruba,jamaica, bermuda, bahama, keylargo, montigo, kokomo
80,70,90,85,86,89,83
77,75,88,87,83,85,77
76,77,83,86,84,86,84
I want to set up a shell script that extracts the data so that I can categorize it by column.
I know that the line of code:
IFS="," read -ra arr <"vis.degrib"
for ((i=0 ; i<${#arr[#]} ; i++));do
ifname=`printf "%s\n" "${arr[i]}"`
echo "$ifname"
done
will print out the individual column components for the first row. How do I also do this again for subsequent rows?
Thank you for your time.
I'm extrapolating from the OP:
awk -F, 'NR==1{for(i=1;i<=NF;i++) {gsub(/^ +/,"",$i);print $i}}' vis.degrib
will print
arruba
jamaica
bermuda
bahama
keylargo
montigo
kokomo
Note that the space at the beginning of each field is trimmed. If you remove the NR==1 condition, the same will be done for all rows. Was this your request? Please comment...
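If you would rather stay in bash, the read loop from the question extends to all rows by reading line by line; a minimal sketch (assuming the same vis.degrib file):
while IFS=',' read -ra arr; do
    for field in "${arr[@]}"; do
        echo "${field# }"   # drop the single leading space, like the gsub above
    done
done < vis.degrib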
Perhaps you want to convert the columnar format to a row-based format (transpose)? There are many ways; this awk script will do:
awk -F, -v OFS=, '{sep=(NR==1)?"":OFS} {for(i=1;i<=NF;i++) a[i]=a[i] sep $i} END{for(i=1;i<=NF;i++) print a[i]}' vis.degrib
will print
arruba,80,77,76
jamaica,70,75,77
bermuda,90,88,83
bahama,85,87,86
keylargo,86,83,84
montigo,89,85,86
kokomo,83,77,84
You can again trim the space from the beginning of the labels as shown above; see the combined variant below.
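For example, a hypothetical combined version that trims each field while transposing:
awk -F, -v OFS=, '{sep=(NR==1)?"":OFS}
    {for(i=1;i<=NF;i++) {gsub(/^ +/,"",$i); a[i]=a[i] sep $i}}
    END{for(i=1;i<=NF;i++) print a[i]}' vis.degrib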
Another approach without awk.
tr ',' '\n' <vis.degrib | pr -4ts,
will generate the same
arruba,80,77,76
jamaica,70,75,77
bermuda,90,88,83
bahama,85,87,86
keylargo,86,83,84
montigo,89,85,86
kokomo,83,77,84
4 is the number of rows in the original file.
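If the number of rows varies, it can be computed first; a sketch assuming the file has no trailing blank line:
rows=$(wc -l < vis.degrib)
tr ',' '\n' < vis.degrib | pr -"$rows"ts,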

extracting values from text file using awk

I have 100 text files which look like this:
File title
4
Realization number
variable 2 name
variable 3 name
variable 4 name
1 3452 4538 325.5
The first number on the 7th line (1) is the realization number, which SHOULD relate to the file name. i.e. The first file is called file1.txt and has realization number 1 (as shown above). The second file is called file2.txt and should have realization number 2 on the 7th line. file3.txt should have realization number 3 on the 7th line, and so on...
Unfortunately every file has realization=1, whereas it should increment according to the file name.
I want to extract variables 2, 3 and 4 from the 7th line (3452, 4538 and 325.5) in each of the files and append them to a summary file called summary.txt.
I know how to extract the information from 1 file:
awk 'NR==7,NR==7{print $2, $3, $4}' file1.txt
Which, correctly gives me:
3452 4538 325.5
My first problem is that this command doesn't seem to give the same results when run from a bash script on multiple files.
#!/bin/bash
for ((i=1;i<=100;i++));do
awk 'NR=7,NR==7{print $2, $3, $4}' File$((i)).txt
done
I get multiple lines being printed to the screen when I use the above script.
Secondly, I would like to output those values to the summary file along with the CORRECT preceding realization number, i.e. I want a file that looks like this:
1 3452 4538 325.5
2 4582 6853 158.2
...
100 4865 3589 15.15
Thanks for any help!
You can simplify some things and get the result you're after:
#!/bin/bash
for ((i=1;i<=100;i++))
do
echo $i $(awk 'NR==7{print $2, $3, $4}' File$i.txt)
done
You really don't want to assign to NR=7 (as you did) and you don't need to repeat the NR==7,NR==7 either. You also really don't need the $((i)) notation when $i is sufficient.
If all the files are exactly 7 lines long, you can do it all in one awk command (instead of 100 of them):
awk 'NR%7==0 { print ++i, $2, $3, $4 }' File*.txt
(Be aware that the glob expands in lexical order: File1.txt, File10.txt, File100.txt, File11.txt, and so on. The ++i numbering may therefore not match the file names; the ls -v approach in the last answer avoids this.)
Notice that you have only one = in your bash script. Do all the files have exactly 7 lines? If you are only interested in the 7th line, then:
#!/bin/bash
for ((i=1;i<=100;i++));do
awk 'NR==7{print $2, $3, $4}' File$((i)).txt
done
Since your realization number starts from 1, you can simply add it using the nl command.
For example, if your bash script is called s.sh then:
./s.sh | nl > summary.txt
will get you the result with the expected lines in summary.txt
Here's one way using awk:
awk 'FNR==7 { print ++i, $2, $3, $4 > "summary.txt" }' $(ls -v file*)
The -v flag simply sorts the glob by version numbers. If your version of ls doesn't support this flag, try: ls file* | sort -V instead.
