How can I compare the numeric values of the last two fields in a file? - bash

I have a file that contains the following information:
organic_apple;2;organic_apple_212_212
organic_tomato;3;organic_tomato_24_29
fruit_juice;5;fruit_juice_15_15
So I want a file that contains the following output:
organic_apple;2;organic_apple_212
organic_tomato;3;organic_tomato_24_29
fruit_juice;5;fruit_juice_15
Compare the last two fields: if they are the same, display the value once; if not, display them both.
I'm writing in bash on Solaris.

Regardless of the number of underscores, compare the last two:
awk 'BEGIN{FS=OFS="_"}$NF==$(NF-1){--NF;$1=$1}1' test.in

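Try this (it assumes every line splits into exactly five "_"-separated fields, as in the sample, so the last two are $4 and $5):
awk -v OFS=_ -F_ '{if ($4 == $5) print $1, $2, $3, $4; else print}' file.txt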

This script removes the last field, together with its leading "_", if it is equal to the one before it:
awk -F_ '$NF==$(NF-1){sub(/_[^_]*$/,"")}1' file
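To check this kind of one-liner against the sample from the question, here is a minimal sketch. It assumes a POSIX awk (on Solaris that would likely be nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk); the shell variable names input and result are made up for the illustration.

```shell
# Sample input copied from the question.
input='organic_apple;2;organic_apple_212_212
organic_tomato;3;organic_tomato_24_29
fruit_juice;5;fruit_juice_15_15'

# Split on "_". When the last two fields are equal, strip the trailing
# "_field" from the line; the bare 1 then prints every line.
result=$(printf '%s\n' "$input" |
  awk -F_ 'NF > 1 && $NF == $(NF-1) { sub(/_[^_]*$/, "") } 1')

printf '%s\n' "$result"
```

In a POSIX awk, fields that look like numbers are compared numerically, so 15 and 015 would count as equal here.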

Related

Add string to columns in bash

I have a comma-delimited file to which I want to append a string in specific columns. I am trying to do something like this, but I haven't managed it so far.
re1,1,a1e,a2e,AGT
re2,2,a1w,a2w,AGT
re3,3,a1t,a2t,ACGTCA
re12,4,b1e,b2e,ACGTACT
And I want to append 'some_string' to columns 3 and 4:
re1,1,some_stringa1e,some_stringa2e,AGT
re2,2,some_stringa1w,some_stringa2w,AGT
re3,3,some_stringa1t,some_stringa2t,ACGTCA
re12,4,some_stringb1e,some_stringb2e,ACGTACT
I was trying something similar to the suggested solution, but to no avail:
awk -v OFS=$'\,' '{ $3="some_string" $3; print}' $lookup_file
Also, I would like my string to be added to both columns. How would you do this with awk or bash?
Thanks a lot in advance
You can do that with (almost) what you have:
pax> echo 're1,1,a1e,a2e,AGT
re2,2,a1w,a2w,AGT
re3,3,a1t,a2t,ACGTCA
re12,4,b1e,b2e,ACGTACT' | awk 'BEGIN{FS=OFS=","}{$3 = "pre3:"$3; $4 = "pre4:"$4; print}'
re1,1,pre3:a1e,pre4:a2e,AGT
re2,2,pre3:a1w,pre4:a2w,AGT
re3,3,pre3:a1t,pre4:a2t,ACGTCA
re12,4,pre3:b1e,pre4:b2e,ACGTACT
The BEGIN block sets the input and output field separators, the two assignments modify fields 3 and 4, and the print outputs the modified line.
You need to set FS to comma, not just OFS. There's a shortcut for setting FS: the -F option.
awk -F, -v OFS=',' '{ $3="some_string" $3; $4 = "some_string" $4; print}' "$lookup_file"
In awk, strings placed next to each other are concatenated, so "three" $3 yields a single string. The trailing 1 is a pattern that is always true, and a pattern with no {action} defaults to print. You can use Bash's brace expansion to set both variables after the script: {O,}FS=, expands to OFS=, FS=,
awk '{$3 = "three" $3; $4 = "four" $4} 1' {O,}FS=,
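If more columns need the same treatment, the prefix and the column list can be passed in as variables. The following is only a sketch: the variable names prefix and cols are invented here, and the input is the first two lines of the sample.

```shell
input='re1,1,a1e,a2e,AGT
re2,2,a1w,a2w,AGT'

# prefix: string to put in front of each chosen field (hypothetical name)
# cols:   comma-separated list of field numbers to change (hypothetical name)
result=$(printf '%s\n' "$input" |
  awk -F, -v OFS=, -v prefix='some_string' -v cols='3,4' '
    BEGIN { n = split(cols, c, ",") }          # c[1]=3, c[2]=4
    { for (i = 1; i <= n; i++) $(c[i]) = prefix $(c[i]); print }')

printf '%s\n' "$result"
```

Setting both -F and OFS matters here: assigning to a field makes awk rebuild the line with OFS.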

AWK: search substring in first file against second

I have the following files:
data.txt
Estring|0006|this_is_some_random_text|more_text
Fstring|0010|random_combination_of_characters
Fstring|0028|again_here
allids.txt (here the columns are separated by semicolon; the real input is tab-delimited)
Estring|0006;MAR0593
Fstring|0002;MAR0592
Fstring|0028;MAR1195
Please note that in data.txt the important part is the first two "columns" (name|number).
Now I want to use awk to look up the first part (name|number) of each data.txt line in allids.txt and append the second column (the one starting with MAR).
So my expected output would be (again, tab-delimited in reality):
Estring|0006|this_is_some_random_text|more_text;MAR0593
Fstring|0010|random_combination_of_characters
Fstring|0028|again_here;MAR1195
I do not know how to match that first conserved part within awk; the rest should then be something like:
awk 'BEGIN{FS=OFS="\t"} FNR == NR { a[$1] = $1; next } $1 in a { print a[$0], [$1] }' data.txt allids.txt
I would use a set of field delimiters, like this:
awk -F'[|\t;]' 'NR==FNR{a[$1"|"$2]=$0; next}
$1"|"$2 in a {print a[$1"|"$2]"\t"$NF}' data.txt allids.txt
In your real data you can remove the ; from the separator set. It is there only to be able to reproduce the example in the question. Note that this prints only the entries whose name|number key appears in both files.
Here is another awk that uses a different field separator for each file:
awk -F ';' 'NR==FNR{a[$1]=FS $2; next} {k=$1 FS $2}
k in a{$0=$0 a[k]} 1' allids.txt FS='|' data.txt
Estring|0006|this_is_some_random_text|more_text;MAR0593
Fstring|0010|random_combination_of_characters
Fstring|0028|again_here;MAR1195
This command uses ; as FS for allids.txt and uses | as FS for data.txt.
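For the real, tab-delimited allids.txt, one possible variant keeps every line of data.txt and appends the ID only when there is one. This is a sketch under the assumption that allids.txt has exactly two tab-separated columns; the temporary directory and the array names id, p and f are illustrative.

```shell
tmpdir=$(mktemp -d)

# Recreate the sample files, with a real tab in allids.txt.
printf '%s\n' \
  'Estring|0006|this_is_some_random_text|more_text' \
  'Fstring|0010|random_combination_of_characters' \
  'Fstring|0028|again_here' > "$tmpdir/data.txt"
printf '%s\t%s\n' \
  'Estring|0006' 'MAR0593' \
  'Fstring|0002' 'MAR0592' \
  'Fstring|0028' 'MAR1195' > "$tmpdir/allids.txt"

# First file: remember the ID for each "name|number" key.
# Second file: rebuild the key from the first two "|" parts,
# then append the ID if one is known.
result=$(awk '
  NR == FNR { split($0, p, "\t"); id[p[1]] = p[2]; next }
  {
    split($0, f, "|")
    k = f[1] "|" f[2]
    if (k in id) print $0 "\t" id[k]
    else print
  }' "$tmpdir/allids.txt" "$tmpdir/data.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Using split() instead of changing FS leaves $0 untouched, so the data.txt line is printed exactly as it was read.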

Awk, Shell Scripting

I have a file which has the following form:
#id|firstName|lastName|gender|birthday|creationDate|locationIP|browserUsed
111|Arkas|Sarkas|male|1995-09-11|2010-03-17T13:32:10.447+0000|192.248.2.123|Midori
Every field is separated by "|". I am writing a shell script and my goal is to remove the "-" from the fifth field (birthday), so that the dates can be compared as if they were numbers.
For example, I want the fifth field to look like |19950911|
The only solution I have reached so far, using sed, deletes every "-" on each line, which is not what I want.
I would be extremely grateful if you could show me a solution to my problem using awk.
If this is homework, writing the complete script would be a disservice. Some hints: the function you should be using is gsub in awk. The fifth field is $5, and you can set the field separator with -F'|' or in a BEGIN block as FS="|".
Also, the current line number is in the NR variable; to skip the first line, for example, you can add the condition NR>1.
An awk one-liner:
awk 'BEGIN { FS="|" } { gsub("-","",$5); print }' infile.txt
Assigning to $5 makes awk rebuild the line using OFS, which is a space by default. To keep "|" as the output separator, set OFS to "|" as well:
... | awk 'BEGIN { FS="|"; OFS="|"} {gsub("-","",$5); print $0 }'
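Putting the hints together on the sample from the question gives something like the sketch below. The cutoff date and the shell variable names are made up for the example; the second command shows one way to do the numeric comparison without modifying the line at all.

```shell
input='#id|firstName|lastName|gender|birthday|creationDate|locationIP|browserUsed
111|Arkas|Sarkas|male|1995-09-11|2010-03-17T13:32:10.447+0000|192.248.2.123|Midori'

# Strip "-" from field 5 on data lines only (NR > 1 skips the header);
# OFS="|" keeps the separator when awk rebuilds the line.
cleaned=$(printf '%s\n' "$input" |
  awk 'BEGIN { FS = OFS = "|" } NR > 1 { gsub(/-/, "", $5) } 1')
printf '%s\n' "$cleaned"

# Compare the birthday as a number without modifying the line:
# print the first name of everyone born after the (made-up) cutoff date.
names=$(printf '%s\n' "$input" |
  awk -F'|' -v cutoff=19900101 '
    NR > 1 { d = $5; gsub(/-/, "", d); if (d + 0 > cutoff + 0) print $2 }')
printf '%s\n' "$names"
```

Only field 5 is touched, so the "-" characters in creationDate survive.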

awk combine 2 commands for csv file formatting

I have a CSV file which has 4 columns. I want to:
print the first 10 items of each column
only print the items in the third column
My method is to pipe the first awk command into another, but I didn't get exactly what I wanted:
awk 'NR < 10' my_file.csv | awk '{ print $3 }'
The only missing thing was the -F.
awk -F "," 'NR < 10' my_file.csv | awk -F "," '{ print $3 }'
You don't need to run awk twice.
awk -F, 'NR<=10{print $3}'
This prints the third field for every line whose record number (line) is less than or equal to 10.
Note that < is different from <=. The former matches records one through nine, the latter matches records one through ten. If you need ten records, use the latter.
Note that this will walk through your entire file, so if you want to optimize performance:
awk -F, 'NR>10{exit} {print $3}'
This exits as soon as the record number is greater than 10; otherwise it prints the third column. It therefore does not step through your entire file.
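A quick way to convince yourself of the record count is to run the early-exit form on generated input. In this sketch the 25-line sample and the variable names are invented for the test; the exit test comes first so that record 11 is never printed.

```shell
# Generate 25 four-column CSV lines: a1,b1,c1,d1 ... a25,b25,c25,d25
data=$(awk 'BEGIN { for (i = 1; i <= 25; i++) print "a" i ",b" i ",c" i ",d" i }')

# Stop reading as soon as record 11 arrives; print column 3 otherwise.
result=$(printf '%s\n' "$data" | awk -F, 'NR > 10 { exit } { print $3 }')

printf '%s\n' "$result"
# Count the printed lines with awk to avoid padding in wc output.
count=$(printf '%s\n' "$result" | awk 'END { print NR }')
printf 'lines printed: %s\n' "$count"
```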
Note also that awk's "CSV" matching is very simple; awk does not understand quoted fields, so the record:
red,"orange,yellow",green
has four fields, two of which have double quotes in them. YMMV depending on your input.
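The field count for that record can be checked directly. This sketch only demonstrates the limitation of a plain -F, split; it does not attempt real CSV parsing, and the variable names are illustrative.

```shell
record='red,"orange,yellow",green'

# Splitting on every comma ignores the quotes.
nf=$(printf '%s\n' "$record" | awk -F, '{ print NF }')
second=$(printf '%s\n' "$record" | awk -F, '{ print $2 }')

printf 'fields: %s\n' "$nf"
printf 'field 2: %s\n' "$second"
```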

Redirect output of one command to different files in shell script

I have a tab-separated string.
I want to copy the first column to one file and the remaining columns to another file in one go, because the string can change in between if I use two different commands.
I tried:
tab_seperated_string | awk -F"\t" '{ print $2"\t"$3"\t"$4"\t"$5} {print $1}'
Columns 2, 3, 4 and 5 should go to one file and column 1 should go to another file.
You can do it like this:
tab_seperated_string | awk -F"\t" '{print $2,$3,$4,$5 > "file2"; print $1 > "file1"}' OFS="\t"
It will then save data to two different files.
By setting OFS to \t, you do not need all the \t in the print statement.
Here is another way, useful if many fields go to one file and the first field to another:
awk -F"\t" '{print $1 > "file1"; sub(/[^\t]+\t/,""); print $0 > "file2"}' OFS="\t"
The sub(/[^\t]+\t/,"") removes the first field and the first tab.
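Put together with sample data, the one-pass split might look like the sketch below. The output paths are passed in with -v; the names first and rest, the temporary directory, and the two sample rows are all assumptions made for the demonstration.

```shell
tmpdir=$(mktemp -d)

# One awk process reads each line once and writes to both files.
printf 'k1\ta\tb\tc\td\nk2\te\tf\tg\th\n' |
  awk -F'\t' -v first="$tmpdir/file1" -v rest="$tmpdir/file2" '
    {
      print $1 > first              # column 1
      sub(/^[^\t]*\t/, "")          # drop column 1 and its tab from $0
      print > rest                  # remaining columns, tabs intact
    }'

col1=$(cat "$tmpdir/file1")
others=$(cat "$tmpdir/file2")
printf '%s\n' "$col1" "$others"
rm -rf "$tmpdir"
```

Because both files are written by the same awk process, each input line is read exactly once, which is what the question asks for.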
