How to add a prefix to every 2nd delimited field in every record in a shell script? - bash

I have a comma-delimited file with the following records:
143849998,+4564656
6345353,000345345
754656,0345345
64555546,3453452345
The requirement is to add a prefix to the 2nd field of every record. The prefix depends on the field's current value:
If the second field starts with "+", leave it as it is.
If the second field starts with "0" (any number of zeroes), replace all the leading zeroes with a single "+".
In any other case, prefix the field with "+234".
The output should be something like this:
143849998,+4564656
6345353,+345345
754656,+345345
64555546,+2343453452345
How can I achieve this using AWK? I can handle the last condition, and the first is straightforward, but I fail when I try to combine all the conditions in one awk command.

This line should do it:
awk -F, -v OFS="," '$2!~/^\+/{if(!sub(/^0+/,"+",$2))$2="+234"$2}7' file
143849998,+4564656
6345353,+345345
754656,+345345
64555546,+2343453452345
or it could be this too:
awk -F, -v OFS="," '$2!~/^\+/&&!sub(/^0+/,"+",$2){$2="+234"$2}7' file
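The one-liners above are compact. Spelled out, the same three rules might look like the sketch below. The sample records are piped in through printf only to keep the example self-contained (in practice you would pass your file name), and the result variable is purely illustrative.

```shell
# Sample records from the question, piped in instead of read from a file.
result=$(printf '%s\n' '143849998,+4564656' '6345353,000345345' \
                       '754656,0345345' '64555546,3453452345' |
awk -F, -v OFS=, '
$2 !~ /^\+/ {                  # rule 1: fields that already start with "+" are skipped
    if ($2 ~ /^0/) {
        sub(/^0+/, "+", $2)    # rule 2: replace the run of leading zeroes with one "+"
    } else {
        $2 = "+234" $2         # rule 3: anything else gets the +234 prefix
    }
}
{ print }                      # print every record, modified or not
')
printf '%s\n' "$result"
```

Setting OFS matters here: once $2 is modified, awk rebuilds the record using OFS, which would otherwise be a space.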

Using sed:
sed 's|,\([1-9]\)|,+234\1|; s|,0\+|,+|' file
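The equivalent in awk (sub() does not support backreferences such as \1, so the digit is tested in the pattern instead):
awk '/,[1-9]/ { sub(/,/, ",+234") } { sub(/,0+/, ",+") } 1' file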
Output:
143849998,+4564656
6345353,+345345
754656,+345345
64555546,+2343453452345

$ awk -F, '{print $1",+"($2~/^[+0]/?"":234)$2+0}' file
143849998,+4564656
6345353,+345345
754656,+345345
64555546,+2343453452345
Adding 0 to $2 strips off any leading zeros and/or plus sign since it's doing an arithmetic operation on it and so the natural result will not have a sign or leading zeros.
Note that this approach will convert +03 to +3 and -3 to +234-3, so if those can occur in your input and that's not the desired behavior, update your question to show those cases in the sample input/output.
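To check how the arithmetic trick treats unusual values, a small experiment along these lines may help. The two records are made up for illustration and do not come from the question. The parentheses around $2 + 0 are added only for readability, since + already binds tighter than concatenation in awk.

```shell
# Two hypothetical records (not from the question) exercising the edge cases:
# a plus sign followed by a zero, and a leading minus sign.
result=$(printf '%s\n' '1,+03' '2,-3' |
awk -F, '{ print $1 ",+" ($2 ~ /^[+0]/ ? "" : 234) ($2 + 0) }')
printf '%s\n' "$result"
```

Because the field goes through a numeric conversion, very long digit strings may also lose precision or be printed in exponent form by some awk implementations, so the string-based answers are the safer choice for phone-number-like data.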

Using awk:
awk 'BEGIN{FS=OFS=","} $2~/^0/{sub(/^0+/, "+", $2);} !($2~/^\+/){$2="+234" $2}1' file
143849998,+4564656
6345353,+345345
754656,+345345
64555546,+2343453452345
OR using non-regex based checks:
awk 'BEGIN{FS=OFS=","} substr($2,1,1)=="0"{sub(/^0+/, "+", $2);}
substr($2,1,1)!="+"{$2="+234" $2}1' file
143849998,+4564656
6345353,+345345
754656,+345345
64555546,+2343453452345

If your data always looks like the sample:
awk -F",|,[0]+" '/,\+/{ print $0;next } /,0/ {print $1",+"$2; next} {print $1",+234"$2} ' file
gives
143849998,+4564656
6345353,+345345
754656,+345345
64555546,+2343453452345
It works as follows:
The field separator is either , or ,000 (a comma followed by any number of 0s). Then there are three rules:
the line contains ,+ (literally) -> print the line unchanged and go to the next line
the line contains ,0 -> awk has already removed the , and the 0s while splitting, so print the two fields joined by ",+" and go to the next line
no match so far -> print the two fields joined by ",+234"
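To see what the separator regex does to each kind of record, one could print the resulting fields in brackets, as in this sketch. It relies on awk choosing the longest match at a given position, so ",000" wins over a bare ",". The bracket formatting and the result variable are illustrative only.

```shell
# Show how the separator regex splits three of the sample records into fields.
result=$(printf '%s\n' '143849998,+4564656' '6345353,000345345' '64555546,3453452345' |
awk -F',|,[0]+' '{ print "[" $1 "] [" $2 "]" }')
printf '%s\n' "$result"
```

The leading zeroes disappear from the second record because they are consumed as part of the separator, which is why the rules only need to glue the fields back together with the right prefix.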

You can use the following awk script:
awk -F, 'BEGIN{OFS=","}{sub(/^0+/, "+", $2)}!($2~/^\+/){$2="+234"$2}1' file

This might work for you (GNU sed):
sed '/,+/b;/,00*/s//,+/;t;s/,/&+234/' file
As per spec.
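Since the sed one-liner is terse, here is a commented sketch of the same logic, written one command per line. It uses a plain s command instead of the empty-regex shortcut, which should behave the same way. The sample records are piped in only to keep the example self-contained.

```shell
result=$(printf '%s\n' '143849998,+4564656' '6345353,000345345' \
                       '754656,0345345' '64555546,3453452345' |
sed '
# Second field already starts with "+": jump to the end of the script and print.
/,+/b
# Replace the comma and the run of zeroes after it with ",+".
s/,00*/,+/
# If that substitution fired, jump to the end of the script.
t
# Otherwise insert +234 right after the comma.
s/,/&+234/
')
printf '%s\n' "$result"
```

In a basic regular expression an unescaped + is a literal plus sign, which is why /,+/ matches the text ",+" rather than one or more commas.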

Related

Print part of a comma-separated field using AWK

I have a line containing this string:
$DLOAD , 123 , Loadcase name=SUBCASE_1
I am trying to only print SUBCASE_1. Here is my code, but I get a syntax error.
awk -F, '{n=split($3,a,"="); a[n]} {printf(a[1]}' myfile
How can I fix this?
1st solution: If you only want the last field (the part after the =), then with the samples shown you could try:
awk -F',[[:space:]]+|=' '{print $NF}' Input_file
2nd solution: If you specifically want the value after = in the 3rd field, try the following awk code. It uses a comma followed by whitespace as the field separator, splits the 3rd field on = into the array arr, and prints the 2nd element of arr.
awk -F',[[:space:]]+' '{split($3,arr,"=");print arr[2]}' Input_file
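As for the syntax error itself: the attempt in the question never closes the parenthesis in printf(a[1]}, and a[n] on its own is just an expression that prints nothing. A minimal corrected sketch that keeps the asker's split-based approach could look like this (the sample line is piped in for illustration):

```shell
# The sample line from the question; single quotes keep $DLOAD literal in the shell.
result=$(printf '%s\n' '$DLOAD , 123 , Loadcase name=SUBCASE_1' |
awk -F, '{ n = split($3, a, "="); print a[n] }')
printf '%s\n' "$result"
```

split returns the number of pieces, so a[n] is the last piece, i.e. the text after the final = in the third field.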
Possibly the shortest solution would be:
awk -F= '{print $NF}' file
Where you simply use '=' as the field-separator and then print the last field.
Example Use/Output
Using your sample input in a heredoc with the delimiter quoted to prevent expansion of $DLOAD, you would have:
$ awk -F= '{print $NF}' << 'eof'
> $DLOAD , 123 , Loadcase name=SUBCASE_1
> eof
SUBCASE_1
(of course in this case it probably doesn't matter whether $DLOAD was expanded or not, but for completeness, in case $DLOAD included another '=' ...)

Extract the last three columns from a text file with awk

I have a .txt file like this:
ENST00000000442 64073050 64074640 64073208 64074651 ESRRA
ENST00000000233 127228399 127228552 ARF5
ENST00000003100 91763679 91763844 CYP51A1
I want to get only the last 3 columns of each line.
As you can see, there are sometimes empty lines between two lines, which must be ignored. Here is the output that I want:
64073208 64074651 ESRRA
127228399 127228552 ARF5
91763679 91763844 CYP51A1
awk  '/a/ {print $1- "\t" $-2 "\t" $-3}'  file.txt.
It does not return what I want. Do you know how to correct the command?
The following awk command should do this:
awk 'NF{print $(NF-2),$(NF-1),$NF}' OFS="\t" Input_file
Output will be as follows.
64073208 64074651 ESRRA
127228399 127228552 ARF5
91763679 91763844 CYP51A1
EDIT: Adding an explanation of the command. (The annotated version below is only meant as an explanation; the one-liner above is what you would normally run.)
awk '
NF {   # NF is the built-in count of fields in the current line; it is 0 for an empty line, so empty lines are skipped.
    # Print the third-to-last, second-to-last and last fields of the current line.
    print $(NF-2), $(NF-1), $NF
}
' OFS="\t" Input_file   # OFS (the output field separator) is set to a tab.
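One caveat with the NF test: a non-empty line with fewer than three fields makes $(NF-2) refer to $0 or to a negative field index, which most awk implementations reject. A sketch that guards against this, assuming such short lines should simply be skipped (the "only two" line is made up for the example):

```shell
# Hypothetical input: two lines from the question, an empty line, and one short line.
result=$(printf '%s\n' 'ENST00000000233 127228399 127228552 ARF5' '' 'only two' \
                       'ENST00000003100 91763679 91763844 CYP51A1' |
awk -v OFS='\t' 'NF >= 3 { print $(NF-2), $(NF-1), $NF }')
printf '%s\n' "$result"
```

The condition NF >= 3 also covers empty lines, since their field count is 0.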
You can use sed too (GNU sed, assuming the columns are tab-separated):
sed -E '/^$/d;s/.*\t(([^\t]*[\t|$]){2})/\1/' infile
With some piping:
$ cat file | tr -s '\n' | rev | cut -f 1-3 | rev
64073208 64074651 ESRRA
127228399 127228552 ARF5
91763679 91763844 CYP51A1
First, cat the file to tr to squeeze out repeated \ns and so get rid of the empty lines. Then reverse the lines, cut the first three fields and reverse again. You could avoid the useless cat by running the first rev directly on the file.

awk rounding the float numbers

My file a.txt contains the following lines; fields are separated by commas:
ab,b1,c,d,5.986627e738,e,5.986627e738
cd,g2,h,i,7.3423542344,j,7.3423542344
ef,l3,m,n,9.3124234323,o,9.3124234323
when I issue the below command
awk -F"," 'NR>-1{OFS=",";gsub($5,$5+10);OFS=",";print }' a.txt
it is printing
ab,b1,c,d,inf,e,inf
cd,g2,h,i,17.3424,j,17.3424
ef,l3,m,n,19.3124,o,19.3124
Here I have two issues:
I asked awk to add 10 to the 5th column only, but it also changed the 7th column because the two hold the same value.
It rounds the numbers; instead, I need the decimals printed as they are.
How can I fix this?
awk 'BEGIN {FS=OFS=","}{$5=sprintf("%.10f", $5+10)}7' file
In your data, $5 on line 1 contains an e, so it was turned into 10.0000... in the output.
You did the substitution with gsub, so all occurrences were replaced.
Use printf/sprintf to produce output in a specific format.
Tested with gawk.
If you want to set the format in printf dynamically:
kent$ cat f
ab,b1,c,d,5.9866,e,5.986627e738
cd,g2,h,i,7.34235,j,7.3423542344
ef,l3,m,n,9.312423,o,9.3124234323
kent$ awk 'BEGIN {FS=OFS=","}{split($5,p,".");$5=sprintf("%.*f",length(p[2]), $5+10)}7' f
ab,b1,c,d,15.9866,e,5.986627e738
cd,g2,h,i,17.34235,j,7.3423542344
ef,l3,m,n,19.312423,o,9.3124234323
What you did is a replacement on the whole record; what you really want is to change only the 5th field:
awk 'BEGIN {FS=OFS=","}
{$5+=10}1' a.txt
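Note that assigning a number to a field converts it back to text using CONVFMT, whose default is %.6g, so the field assignment on its own would still print 17.3424. One possible way around this, sketched below, is to raise CONVFMT. The ten decimal places are an assumption chosen to match the sample data, and the sample line is piped in for illustration.

```shell
# One sample line from the question; only the 5th field should change.
result=$(printf '%s\n' 'cd,g2,h,i,7.3423542344,j,7.3423542344' |
awk 'BEGIN { FS = OFS = ","; CONVFMT = "%.10f" } { $5 += 10 } 1')
printf '%s\n' "$result"
```

The 7th field is left alone because it is never assigned, so it keeps its original text.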

Awk, Shell Scripting

I have a file which has the following form:
#id|firstName|lastName|gender|birthday|creationDate|locationIP|browserUsed
111|Arkas|Sarkas|male|1995-09-11|2010-03-17T13:32:10.447+0000|192.248.2.123|Midori
Every field is separated with "|". I am writing a shell script and my goal is to remove the "-" from the fifth field (birthday), so that the values can be compared as if they were numbers.
For example, I want the fifth field to look like |19950911|
The only solution I have found so far uses sed and deletes every "-" on each line, which is not what I want.
I would be extremely grateful if you could show me a solution to my problem using awk.
If this is homework, writing the complete script would be a disservice. Some hints: the function to use is gsub in awk. The fifth field is $5, and you can set the field separator with -F'|' or in a BEGIN block with FS="|".
Also, the line number is in the NR variable; to skip the first line, for example, you can add the condition NR>1.
An awk one liner:
awk 'BEGIN { FS="|" } { gsub("-","",$5); print }' infile.txt
To keep "|" as output separator, it is better to define OFS value as "|" :
... | awk 'BEGIN { FS="|"; OFS="|"} {gsub("-","",$5); print $0 }'
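Putting the hints together, a sketch that leaves the header line alone and strips the dashes only from the fifth field might look like this (the two sample lines from the question are piped in to keep it self-contained):

```shell
result=$(printf '%s\n' \
  '#id|firstName|lastName|gender|birthday|creationDate|locationIP|browserUsed' \
  '111|Arkas|Sarkas|male|1995-09-11|2010-03-17T13:32:10.447+0000|192.248.2.123|Midori' |
awk 'BEGIN { FS = OFS = "|" } NR > 1 { gsub(/-/, "", $5) } 1')
printf '%s\n' "$result"
```

The dashes in the creationDate field survive because gsub is given $5 as its target instead of the whole record.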

Cut and replace bash

I have to process a file with data organized like this
AAAAA:BB:CCC:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
etc
Columns can have different length but lines always have the same number of columns.
I want to be able to cut a specific column of a given line and change it to the value I want.
For example I'd apply my command and change the file to
AAAAA:BB:XXXX:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
I know how to select a specific line with sed and then cut the field but I have no idea on how to replace the field with the value I have.
Thanks
Here's a way to do it with awk:
Going with your example, if you wanted to replace the 3rd field of the 1st line:
awk 'BEGIN{FS=OFS=":"} {if (NR==1) {$3 = "XXXX"}; print}' input_file
Input:
AAAAA:BB:CCC:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
Output:
AAAAA:BB:XXXX:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
Explanation:
awk: invoke the awk command
'...': everything enclosed by single-quotes are instructions to awk
BEGIN{FS=OFS=":"}: Use : as delimiters for both input and output. FS stands for Field Separator. OFS stands for Output Field Separator.
if (NR==1) {$3 = "XXXX"};: If Number of Records (NR) read so far is 1, then set the 3rd field ($3) to "XXXX".
print: print the current line
input_file: name of your input file.
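If the line, column and new value change from run to run, they can be passed in as awk variables instead of being hard-coded. In the sketch below the names line, col and val are illustrative choices, not part of the answer above, and the sample data is piped in for demonstration.

```shell
# line, col and val are hypothetical variable names chosen for this example.
result=$(printf '%s\n' 'AAAAA:BB:CCC:EEEE:DDDD' 'FF:III:JJJ:KK:LLL' 'MMMM:NN:OOO:PP' |
awk -v line=1 -v col=3 -v val=XXXX 'BEGIN { FS = OFS = ":" } NR == line { $col = val } 1')
printf '%s\n' "$result"
```

Since awk writes to standard output, updating the file itself would mean redirecting to a temporary file and then moving it over the original.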
If instead what you are trying to accomplish is simply replace all occurrences of CCC with XXXX in your file, simply do:
sed -i 's/CCC/XXXX/g' input_file
Note that this will also replace partial matches, such as ABCCCDD -> ABXXXXDD
This might work for you (GNU sed):
sed -r 's/^(([^:]*:?){2})CCC/\1XXXX/' file
or
awk -F: -v OFS=: '$3=="CCC"{$3="XXXX"};1' file

Resources