Divide variable by column #2 of file1, compare result to file2 - shell

I have two text files.
One contains:
Feb,30000
March,40000
April,60000
The other contains a single value:
0.134
I want to take a numeric input in a variable, divide that number by each value in the 1st file's 2nd column (i.e. 30000, 40000, etc.), and compare the result with the 2nd file's value.
How can I do this in a script?

As bash can't do floating-point arithmetic, you can use either bc or awk.
input=10000
limit=$(awk '{print $1}' file2)
awk -F, -v input="$input" -v limit="$limit" '{print (input/$2 > limit) ? "larger" : "smaller"}' file1
Passing the shell variables in with -v avoids fragile quote-splicing, and the comparison is made once per line of file1.

Some other tool (e.g. bc, awk, calc, etc.) is needed, since bash lacks floating point.
Let's call the first file f1, and the second file f2. First use sed and bash to write some text expressions:
x=4900 ; sed 's#.*,\(.*\)#'$x'/\1>'$(<f2)'#' f1
Output:
4900/30000>0.134
4900/40000>0.134
4900/60000>0.134
From there, feed that expression to calc (which will divide, then compare):
x=4900 ; sed 's#.*,\(.*\)#'$x'/\1>'$(<f2)'#' f1 | calc -p
Output (in calc, "1" means true; here only 4900/30000, i.e. 0.163..., is greater than 0.134):
1
0
0

Related

Indexing variable created by awk in bash

I'm having some trouble indexing a variable (consisting of 1 line with 4 values) derived from a text file with awk.
In particular, I have a text-file containing all input information for a loop. Every row contains 4 specific input values, and every iteration makes use of a different row of the input file.
Input file looks like this:
/home/hannelore/TVB-pipe_local/subjects/CON02T1/ 10012 100000 1001 --> used for iteration 1
/home/hannelore/TVB-pipe_local/subjects/CON02T1/ 10013 7200 1001 --> used for iteration 2
...
From this input text file, I identified the different columns (path, seed, count, target), and then I wanted to index these variables in each iteration of the loop. However, index 0 returns the entire variable, and higher indices return no output. Using awk, cut, or IFS on the obtained variable, I wasn't able to split it. Can anyone help me with this?
Some code that I used:
seed=$(awk '{print $2}' $input_file)
--> extract column information from input file, this works
seedsplit=$(awk '{print $2}' $seed)
seedsplit=$(cut -f2 -d ' ' $seed)
Thank you in advance!
Kind regards,
Hannelore
If I understand you correctly, you want to extract the values from the input file row by row.
while read -r a b c d; do echo "var1:" ${a}; done < file
will print
var1: /home/hannelore/TVB-pipe_local/subjects/CON02T1/
var1: /home/hannelore/TVB-pipe_local/subjects/CON02T1/
Similarly, you can access the other fields in b, c, and d.
If you want an array, then use array assignment notation:
seed=( $(awk '{print $2}' $input_file) )
Now you will have the words from each line of output from awk in a separate array element.
col1=( $(awk '{print $1}' $input_file) )
col3=( $(awk '{print $3}' $input_file) )
Now you have three arrays which can be indexed in parallel.
for i in "${!col1[@]}"
do
echo "${col1[$i]} in col1; ${seed[$i]} in col2; ${col3[$i]} in col3"
done
(Bash arrays are zero-indexed, so iterating over the index list "${!col1[@]}" is safer than seq.)
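Putting the array answer together, a minimal self-contained run (the file name rows.txt and the two sample rows are made up for illustration):

```shell
cd "$(mktemp -d)"
printf '%s\n' '/path/a 10012 100000 1001' '/path/b 10013 7200 1001' > rows.txt

col1=( $(awk '{print $1}' rows.txt) )   # paths
seed=( $(awk '{print $2}' rows.txt) )   # seeds
col3=( $(awk '{print $3}' rows.txt) )   # counts

for i in "${!col1[@]}"; do              # iterate over the array indices
    echo "${col1[$i]} in col1; ${seed[$i]} in col2; ${col3[$i]} in col3"
done
```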

Bash Text file formatting

I have some files with the following format:
555584280113;01-04-2013 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
552185022741;01-04-2013 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz
5511965271852;01-04-2013 00:00:14;0,22;889;30008;1501;sms;/xxx/yyy/zzz
5511980644500;01-04-2013 00:00:22;0,22;889;30008;1501;sms;/xxx/yyy/zzz
553186398559;01-04-2013 00:00:31;0,22;889;30008;1501;sms;/xxx/yyy/zzz
555584280113;01-04-2013 00:00:41;0,22;889;30008;1501;sms;/xxx/yyy/zzz
558487839822;01-04-2013 00:01:09;0,22;889;30008;1501;sms;/xxx/yyy/zzz
I need to have them with a 10-digit sequence number at the beginning, the prefix 55 removed from the second column (which I have done with a simple sed 's/^55//g'), and the date reformatted to look like this:
0000000001;555584280113;20130401 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000002;552185022741;20130401 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000003;5511965271852;20130401 00:00:14;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000004;5511980644500;20130401 00:00:22;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000005;553186398559;20130401 00:00:31;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000006;555584280113;20130401 00:00:41;0,22;889;30008;1501;sms;/xxx/yyy/zzz
I have the date part in a separate way:
cat file.txt | cut -d\; -f2 | awk '{print $1}' |awk -v OFS="-" -F"-" '{print $3$2$1}'
And it works, but I don't know how to put all of them together, the sequence + sed for the prefix + change the date format. The sequence part I'm not even sure how to do it.
Thanks for the help.
awk is one of the best tools out there for text parsing and formatting. Here is one way of meeting your requirements:
awk '
BEGIN { FS = OFS = ";" }
{
    printf "%010d;", NR
    $1 = substr($1, 3)
    split($2, tmp, /[- ]/)
    $2 = tmp[3] tmp[2] tmp[1] " " tmp[4]
}1' file
We set the input and output field separators to ;
We use printf to produce the zero-padded 10-digit sequence number from NR
We use the substr function to remove the first two characters of column 1
We use the split function to take the date apart and reassemble it as yyyymmdd
The trailing 1 is an always-true pattern with no action, so awk prints the modified record as-is.
Output:
0000000001;5584280113;20130401 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000002;2185022741;20130401 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000003;11965271852;20130401 00:00:14;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000004;11980644500;20130401 00:00:22;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000005;3186398559;20130401 00:00:31;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000006;5584280113;20130401 00:00:41;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000007;8487839822;20130401 00:01:09;0,22;889;30008;1501;sms;/xxx/yyy/zzz
If the name of the input file is input, then the following command removes the 55, adds a 10-digit line number, and rearranges the date. With GNU sed:
nl -nrz -w10 -s\; input | sed -r 's/;55/;/; s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3\2\1/'
If one is using Mac OS X (or another OS without GNU sed), then a slight change is required:
nl -nrz -w10 -s\; input | sed -E 's/;55/;/; s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3\2\1/'
Sample output:
0000000001;5584280113;20130401 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000002;2185022741;20130401 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000003;11965271852;20130401 00:00:14;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000004;11980644500;20130401 00:00:22;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000005;3186398559;20130401 00:00:31;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000006;5584280113;20130401 00:00:41;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000007;8487839822;20130401 00:01:09;0,22;889;30008;1501;sms;/xxx/yyy/zzz
How it works: nl is a handy *nix utility for adding line numbers. -w10 tells nl that we want 10-digit line numbers, -nrz tells nl to pad the line numbers with zeros, and -s\; tells nl to add a semicolon after the line number. (We have to escape the semicolon so that the shell ignores it.)
The remaining changes are handled by sed. The command s/;55/;/ deletes the 55 that follows the first semicolon; anchoring on the semicolon keeps it from ever matching digits inside the 10-digit line number (e.g. on line 55). The rearrangement of the date is handled by s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3\2\1/.
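To see what the nl stage contributes on its own before sed touches anything (the single input line is one sample row piped in for illustration):

```shell
printf '555584280113;01-04-2013 00:00:11;0,22\n' | nl -nrz -w10 -s\;
# -> 0000000001;555584280113;01-04-2013 00:00:11;0,22
```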
You could actually use a Bash loop to do this.
i=0
while read -r f1 f2; do
    ((++i))
    IFS=\; read -r n d <<< "$f1"
    n=${n#55}                      # strip the 55 prefix
    d=${d:6:4}${d:3:2}${d:0:2}     # dd-mm-yyyy -> yyyymmdd
    printf '%010d;%s;%s %s\n' "$i" "$n" "$d" "$f2"
done < file.txt

How to convert a column value in a file to its byte-flipped hexadecimal format

Input file:
0 2950|abc|def|0|
0 2564|abc|def|0|
Output file:
An additional 0000 is appended after the conversion, i.e., I want the output as:
0000860B0000|abc|def|0|
0000040A0000|abc|def|0|
Only the first column is changed. I'm able to convert a single value with the print command, but how do I apply the conversion to just the first column of every line of the above input format in a shell script?
OK, I'll explain the conversion process for the first column, for example 0 2950. I make hex of 0, i.e. 0000, and flip its bytes, which again gives 0000 (00 00); then hex of 2950 (0B86), flipped to 860B (86 0B); then I concatenate them, i.e. 0000860B. Since I want a 12-character word, I append the remaining 0's: 0000860B0000. I hope that makes the requirement clear.
As I commented, the requirement is not 100% clear; however, for the examples in your question (values <= FFFF) this GNU awk line works:
awk -F'|' -v OFS='|' '{
    sub(/^\S* */, "", $1)                         # drop the leading "0 "
    $1 = sprintf("%04X", $1)                      # decimal -> 4 hex digits
    $1 = gensub(/^(..)(..)$/, "\\2\\1", "g", $1)  # swap the two bytes
}
$1 = "0000" $1 "0000"' file
test one example:
kent$ cat f
0 2950|abc|def|0|
0 2564|abc|def|0|
0 7|abc|def|0|
kent$ awk -F'|' -v OFS="|" '{sub(/^\S* */,"",$1);$1=sprintf("%04X",$1);$1=gensub(/^(..)(..)$/, "\\2\\1","g",$1)}$1="0000"$1"0000"' f
0000860B0000|abc|def|0|
0000040A0000|abc|def|0|
000007000000|abc|def|0|
This might work for you (GNU sed):
sed -r 'y/|/ /;s/.*/printf "%04X%04X0000|%s|%s|%s|" &/e;s/(..)(..)/\2\1/2' file
$ cat file
0 2950|abc|def|0|
0 2564|abc|def|0|
0 7|abc|def|0|
Awk Code
awk '{
gsub(/[[:space:]]/,x,$1) # Remove Space in field1
fmt=sprintf("%%0%dd%%s%%0%dd",4,4) # Format variable for number zeros
split(sprintf("%04X",$1),A,r) # Split is used (if gensub not available as its feature of gawk)
$1=sprintf(fmt,0,A[3]A[4]A[1]A[2],0) # Finally field1 with new values
}1
' FS="|" OFS="|" file
Result:
0000860B0000|abc|def|0|
0000040A0000|abc|def|0|
000007000000|abc|def|0|
You may change the fmt variable to adjust the number of padding zeros.
Example: fmt=sprintf("%%0%dd%%s%%0%dd",1,3)
This pads with one zero on the left and three on the right, which results in:
0860B000|abc|def|0|
0040A000|abc|def|0|
00700000|abc|def|0|
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
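The per-value conversion described in the question (decimal to four hex digits, swap the two bytes, pad out to twelve characters) can also be traced step by step in plain bash; the variable names are just for illustration:

```shell
n=2950
hex=$(printf '%04X' "$n")        # 2950 -> 0B86
flipped=${hex:2:2}${hex:0:2}     # swap the two bytes: 0B86 -> 860B
echo "0000${flipped}0000"        # -> 0000860B0000
```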

How to get/copy a specific word from a text file using bash?

When I do this:
LINE=$(head -38 fine.txt | tail -1 | cut -f2)
I get the 38th line of the file, which is:
Xres = 1098
but I only need to record 1098 as the value of a variable.
I am trying to read a text file, record values, and use them as parameters later in my script.
Add | awk '{print $3}' to the pipeline.
sed -n '38s/.*= *//p' fine.txt
By default, sed prints every input line; the -n option disables this behavior. 38 selects line number 38, and when that line is seen, the substitution deletes everything up to the last equals sign (plus any following spaces) and p prints the result.
That's assuming the second field is the last field. If the input line is more complex than I have assumed, try the substitution s/^[^\t]*\t[^\t]*= *//p. If your sed does not recognize \t as a tab character, you'll need to supply literal tabs (you can enter one with the key sequence ctrl-v tab in some shells).
If the input file is large, you may want to refactor the sed script to quit after the 38th line.
Wrapping up, that gets us
LINE=$(sed -n '38!b;s/^[^\t]*\t[^\t]*= *//p;q' fine.txt)
However, this is becoming somewhat complex, to the point of hampering legibility and thus maintainability. The same in awk is more readable:
awk -F '\t' 'NR==38 { sub(/.*= */,"",$2); print $2; exit 0 }' fine.txt
More generally, you might want to split on tabs, then on spaces. The following implements cut -f2 | awk '{ print $3 }' more precisely:
awk -F '\t' 'NR==38 { split($2,f); print f[3]; exit 0 }' fine.txt
The option -F '\t' sets tab as the input field separator. The condition NR==38 selects the 38th line, and split($2,f) splits the second tab-separated field on spaces into the array f. Then we simply print the third element of f, and exit.
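If the layout really is as simple as Xres = 1098, bash parameter expansion alone can pull the value out; in this sketch the literal line stands in for the head/tail/cut extraction above:

```shell
line='Xres = 1098'        # stand-in for: head -38 fine.txt | tail -1 | cut -f2
value=${line##*= }        # strip everything through the last '= '
echo "$value"             # -> 1098
```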

extracting values from text file using awk

I have 100 text files which look like this:
File title
4
Realization number
variable 2 name
variable 3 name
variable 4 name
1 3452 4538 325.5
The first number on the 7th line (1) is the realization number, which SHOULD relate to the file name. i.e. The first file is called file1.txt and has realization number 1 (as shown above). The second file is called file2.txt and should have realization number 2 on the 7th line. file3.txt should have realization number 3 on the 7th line, and so on...
Unfortunately every file has realization=1, where they should be incremented according to the file name.
I want to extract variables 2, 3 and 4 from the 7th line (3452, 4538 and 325.5) in each of the files and append them to a summary file called summary.txt.
I know how to extract the information from 1 file:
awk 'NR==7,NR==7{print $2, $3, $4}' file1.txt
Which, correctly gives me:
3452 4538 325.5
My first problem is that this command doesn't seem to give the same results when run from a bash script on multiple files.
#!/bin/bash
for ((i=1;i<=100;i++));do
awk 'NR=7,NR==7{print $2, $3, $4}' File$((i)).txt
done
I get multiple lines being printed to the screen when I use the above script.
Secondly, I would like to output those values to the summary file along with the CORRECT preceding realization number, i.e. I want a file that looks like this:
1 3452 4538 325.5
2 4582 6853 158.2
...
100 4865 3589 15.15
Thanks for any help!
You can simplify some things and get the result you're after:
#!/bin/bash
for ((i=1;i<=100;i++))
do
echo $i $(awk 'NR==7{print $2, $3, $4}' File$i.txt)
done
You really don't want to assign NR=7 (as you did), and you don't need to repeat the condition as NR==7,NR==7 either. You also don't need the $((i)) notation when $i is sufficient.
If all the files are exactly 7 lines long, you can do it all in one awk command (instead of 100 of them):
awk 'NR%7==0 { print ++i, $2, $3, $4 }' File*.txt
Notice that you have only one = in your bash script (NR=7 assigns 7 to NR instead of comparing). Do all the files have exactly 7 lines? If you are only interested in the 7th line, then:
#!/bin/bash
for ((i=1;i<=100;i++));do
awk 'NR==7{print $2, $3, $4}' File$((i)).txt
done
Since your realization number starts from 1, you can simply add it using the nl command.
For example, if your bash script is called s.sh then:
./s.sh | nl > summary.txt
will get you the result with the expected lines in summary.txt
Here's one way using awk:
awk 'FNR==7 { print ++i, $2, $3, $4 > "summary.txt" }' $(ls -v file*)
The -v flag simply sorts the glob by version numbers. If your version of ls doesn't support this flag, try: ls file* | sort -V instead.
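Since the question notes that the realization number inside each file is wrong anyway, a hedged variant is to derive it from the file name instead of a counter; this sketch builds two made-up 7-line files named file1.txt and file2.txt:

```shell
cd "$(mktemp -d)"
# build two 7-line sample files (made-up data)
printf 'h\nh\nh\nh\nh\nh\n1 3452 4538 325.5\n' > file1.txt
printf 'h\nh\nh\nh\nh\nh\n1 4582 6853 158.2\n' > file2.txt

# FNR==7 picks line 7 of each file; the number comes from FILENAME
awk 'FNR==7 { n = FILENAME; gsub(/[^0-9]/, "", n); print n, $2, $3, $4 }' \
    $(ls -v file*.txt)
# -> 1 3452 4538 325.5
#    2 4582 6853 158.2
```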
