Replace multiple values in a CSV file - bash

I have a CSV file:
1,1,1,2
2,2,1,2
3,3,1,2
4,4,1,2
5,5,1,2
6,6,1,2
7,7,1,2
8,8,1,2
9,9,1,2
10,10,2,2
11,11,2,2
12,12,2,2
13,13,3,2
I want to replace the third value on each line as follows:
If 1; then 22
If 2; then 35
If 3; then 14
This is what I have written:
awk -F , -v OFS=, '{if ($3=="1") $3="22";if ($3=="2") $3="35";if ($3=="3") $3="14"} {print "\""$1"\""",""\""$2"\""",""\""$3"\""",""\""$4"\""}' /tmp/test.csv
It works well on Debian but not on Ubuntu.
What is the problem? Thank you.
[EDIT]
With the example I cited yesterday it works, but not with this one:
cat -v test.csv
1,1,1,2
2,2,1,2
3,3,1,2
4,4,1,2
5,5,1,2
6,6,1,2
7,7,1,2
8,8,1,2
9,9,1,2
10,10,1,2
11,11,1,2
12,12,1,2
13,13,1,2
14,14,1,2
15,15,1,2
16,16,1,2
17,17,1,2
18,18,1,2
19,19,1,2
20,20,1,2
21,21,1,2
22,22,1,2
23,23,1,2
24,24,1,2
25,25,1,2
26,26,1,2
27,27,1,2
28,28,1,2
29,29,1,2
30,30,1,2
31,31,1,2
32,32,1,2
33,33,1,2
34,34,1,2
35,35,1,2
36,36,1,2
37,37,1,2
38,38,1,2
39,39,1,2
40,40,1,2
And now the command returns:
awk -F , -v OFS=, '{if ($3=="1") $3="2";if ($3=="2") $3="3";if ($3=="3") $3="5"} {print "\""$1"\""",""\""$2"\""",""\""$3"\""",""\""$4"\""}' toast.csv
"1","1","5","2"
"2","2","5","2"
"3","3","5","2"
"4","4","5","2"
"5","5","5","2"
"6","6","5","2"
"7","7","5","2"
"8","8","5","2"
"9","9","5","2"
"10","10","5","2"
"11","11","5","2"
"12","12","5","2"
"13","13","5","2"
"14","14","5","2"
"15","15","5","2"
"16","16","5","2"
"17","17","5","2"
"18","18","5","2"
"19","19","5","2"
"20","20","5","2"
"21","21","5","2"
"22","22","5","2"
"23","23","5","2"
"24","24","5","2"
"25","25","5","2"
"26","26","5","2"
"27","27","5","2"
"28","28","5","2"
"29","29","5","2"
"30","30","5","2"
"31","31","5","2"
"32","32","5","2"
"33","33","5","2"
"34","34","5","2"
"35","35","5","2"
"36","36","5","2"
"37","37","5","2"
"38","38","5","2"
"39","39","5","2"
"40","40","5","2"
All third values are equal to 5 instead of 2. Same issue with this example on Debian.

None of the code you have posted will behave differently on one machine versus another. Saying that it did, and posting the wrong code initially, was a red herring; you just have buggy code, that's all.
The code you added in your latest edit says:
if ($3=="1") $3="2";if ($3=="2") $3="3";if ($3=="3") $3="5"
So let's say you start with a $3 in your input file that has value 1. Your first test/assignment is if ($3=="1") $3="2" so after that code executes $3 has value 2. Now your second test/assignment is if ($3=="2") $3="3" Well, $3 IS now 2 after your first code segment executes, so now it gets set to 3. And then your next test/assignment sets it to 5.
So given a $3 that is 1 you set $3 to 2, then you set it to 3, then you set it to 5. The net result is that it's always 5. Throw in some "else"s:
if ($3=="1") $3="2"; else if ($3=="2") $3="3"; else if ($3=="3") $3="5"
but at least change your script to avoid having to print each field individually:
awk -F, -v OFS='","' '{if ($3=="1") $3="2"; else if ($3=="2") $3="3"; else if ($3=="3") $3="5"} {print "\""$0"\""}' toast.csv
and consider using a more idiomatic approach:
$ cat file
9,9,1,2
10,10,2,2
13,13,3,2
$ awk -F, -v OFS='","' 'BEGIN{split("2,3,5",m)} {$3=m[$3]} {print "\""$0"\""}' file
"9","9","2","2"
"10","10","3","2"
"13","13","5","2"
The above assumes your $3 is always one of the values you show/test for. If not, there are easy tweaks.
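One such tweak, sketched with the values from the original question (1 becomes 22, 2 becomes 35, 3 becomes 14), is to leave any unmapped value untouched. The sample rows, the extra row with a 4 in the third field, and the variable names are only illustrative assumptions:

```shell
# Sample input; the last row has a third field (4) that is not in the map.
csv=$(printf '%s\n' 9,9,1,2 10,10,2,2 13,13,3,2 7,7,4,2)

out=$(printf '%s\n' "$csv" | awk -F, -v OFS='","' '
  # Build the lookup table once, before any input is read.
  BEGIN { m[1] = 22; m[2] = 35; m[3] = 14 }
  {
    # Always assign to $3 so awk rebuilds the record with the new OFS,
    # even when the value is left unchanged.
    $3 = ($3 in m) ? m[$3] : $3
    print "\"" $0 "\""
  }')

printf '%s\n' "$out"
```

Because each value is looked up exactly once, a replacement can never be picked up again by a later rule, which is what went wrong with the chained "if" statements.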
In general to map one set of arbitrary numbers to another and allow for some input data that doesn't need to get mapped:
$ awk -F, -v OFS='","' 'BEGIN{split("1,2,3",a); split("2,3,5",b); for (i in a) m[a[i]]=b[i]} {$3=($3 in m ? m[$3] : $3)} {print "\""$0"\""}' file
"9","9","2","2"
"10","10","3","2"
"13","13","5","2"
or if you prefer:
$ awk -F, -v OFS='","' 'BEGIN{split("1,2,2,3,3,5",t); for (i=2;i in t;i+=2) m[t[i-1]]=t[i]} {$3=($3 in m ? m[$3] : $3)} {print "\""$0"\""}' file
"9","9","2","2"
"10","10","3","2"
"13","13","5","2"

It might be easier with sed:
sed 's/\([0-9]*,[0-9]*,\)1\(,[0-9]*\)/\122\2/' /tmp/test.csv
sed 's/\([0-9]*,[0-9]*,\)2\(,[0-9]*\)/\135\2/' /tmp/test.csv
sed 's/\([0-9]*,[0-9]*,\)3\(,[0-9]*\)/\114\2/' /tmp/test.csv
I believe that should do the trick and will most likely work on most sh/bash environments.
EDIT:
Note that this just prints out the replacements each command makes, so you know what is going to happen before you actually change anything. You may want to first back up your file and then do in-place replacements with the -i flag:
$ cat /tmp/test.csv
1,1,1,2
2,2,1,2
3,3,1,2
4,4,1,2
5,5,1,2
6,6,1,2
7,7,1,2
8,8,1,2
9,9,1,2
10,10,2,2
11,11,2,2
12,12,2,2
13,13,3,2
$ cp /tmp/test.csv /tmp/test.csv.bak
$ sed -i 's/\([0-9]*,[0-9]*,\)1\(,[0-9]*\)/\122\2/' /tmp/test.csv
$ sed -i 's/\([0-9]*,[0-9]*,\)2\(,[0-9]*\)/\135\2/' /tmp/test.csv
$ sed -i 's/\([0-9]*,[0-9]*,\)3\(,[0-9]*\)/\114\2/' /tmp/test.csv
$ cat /tmp/test.csv
1,1,22,2
2,2,22,2
3,3,22,2
4,4,22,2
5,5,22,2
6,6,22,2
7,7,22,2
8,8,22,2
9,9,22,2
10,10,35,2
11,11,35,2
12,12,35,2
13,13,14,2
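If a single pass is preferred, the three substitutions can probably be combined into one sed call. This is only a sketch, assuming every line has exactly four comma-separated fields; it anchors each pattern at the start of the line so that only the third field can match, and the sample rows are taken from the question:

```shell
# Three sample rows, one for each value that needs replacing.
csv=$(printf '%s\n' 1,1,1,2 10,10,2,2 13,13,3,2)

# Each expression keeps the first two fields (group 1) and rewrites a third
# field that is exactly 1, 2 or 3. "\122" is backreference \1 followed by "22".
out=$(printf '%s\n' "$csv" | sed \
  -e 's/^\([^,]*,[^,]*,\)1,/\122,/' \
  -e 's/^\([^,]*,[^,]*,\)2,/\135,/' \
  -e 's/^\([^,]*,[^,]*,\)3,/\114,/')

printf '%s\n' "$out"
```

The order matters less than it might seem here: none of the new values (22, 35, 14) is exactly 1, 2 or 3, so a later expression never rewrites the result of an earlier one. With a different mapping that may no longer hold.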

Related

Using sed in a shell script to extract a substring at a fixed position and replace it

I'm working with data in a text file and I can't find a way with sed to select a substring at a fixed position and replace it.
This is what I have:
X|001200000000000000000098765432|1234567890|TQ
This is what I need:
'X','00000098765432','1234567890','TQ'
The following sed code gives the substring I need (00000098765432) but does not overwrite the field at the position I need:
echo " X|001200000000000000000098765432|1234567890|TQ" | sed "s/ *//g;s/|/','/g;s/^/'/;s/$/'/"
Could you help me?
Rather than sed, I would use awk for this.
echo "X|001200000000000000000098765432|1234567890|TQ" | awk 'BEGIN {FS="|";OFS=","} {print $1,substr($2,17,14),$3,$4}'
Gives output:
X,00000098765432,1234567890,TQ
Here is how it works:
FS = Field separator (in the input)
OFS = Output field separator (the way you want output to be delimited)
BEGIN -> think of it as the place where configurations are set. It runs only one time. So you are saying you want output to be comma delimited and input is pipe delimited.
substr($2,17,14) -> Take $2 (i.e. the second field; awk begins counting from 1) and apply substring to it: 17 is the starting character position and 14 is the number of characters to take from that position onwards.
In my opinion, this is much more readable and maintainable than sed version you have.
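To see the 1-based indexing of substr in isolation, here is a tiny sketch on a made-up string (the string "abcdefgh" and the variable names are assumptions chosen only for illustration):

```shell
# substr(s, start, length): start counts from 1, not 0.
middle=$(awk 'BEGIN { s = "abcdefgh"; print substr(s, 3, 4) }')

# Without a length, substr returns everything from start to the end.
tail_part=$(awk 'BEGIN { s = "abcdefgh"; print substr(s, 3) }')

printf '%s\n%s\n' "$middle" "$tail_part"
```

The first call prints cdef and the second prints cdefgh, which mirrors how substr($2,17,14) skips the first 16 characters of the field.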
If you want to put the quotes in, I'd still use awk.
$: awk -F'|' 'BEGIN{q="\047"} {print q $1 q","q substr($2,17,14) q","q $3 q","q $4 q}' <<< "X|001200000000000000000098765432|1234567890|TQ"
'X','00000098765432','1234567890','TQ'
If you just want to use sed, note that you say above you want to remove 16 characters, but you are actually only removing 14.
$: sed -E "s/^(.)[|].{14}([^|]+)[|]([^|]+)[|]([^|]+)/'\1','\2','\3','\4'/" <<< "X|0012000000000000000098765432|1234567890|TQ"
'X','00000098765432','1234567890','TQ'
Using sed
$ sed "s/|\(0[0-9]\{15\}\)\?/','/g;s/^\|$/'/g" input_file
'X','00000098765432','1234567890','TQ'
Using any POSIX awk:
$ echo 'X|001200000000000000000098765432|1234567890|TQ' |
awk -F'|' -v OFS="','" -v q="'" '{sub(/.{16}/,"",$2); print q $0 q}'
'X','00000098765432','1234567890','TQ'
Not as elegant as I hoped for, but it gets the job done:
'X','00000098765432','1234567890','TQ'
# gawk profile, created Mon May 9 21:19:17 2022
# BEGIN rule(s)
'BEGIN {
1 _ = sprintf("%*s", (__ = +2)^++__+--__*++__,__--)
1 gsub(".", "[0-9]", _)
1 sub("$", "$", _)
1 FS = "[|]"
1 OFS = "\47,\47"
}
# Rule(s)
1 (NF *= NF == __*__) * sub(_, "|&", $__) * \
sub("^.*[|]", "", $__) * sub(".+", "\47&\47") }'
Tested and confirmed working on gnu gawk 5.1.1, mawk 1.3.4, mawk 1.9.9.6, and macosx nawk
— The 4Chan Teller
awk -v del1="\047" \
-v del2="," \
-v start="3" \
-v len="17" \
'{
gsub(substr($0,start+1,len),"");
gsub(/[\|]/,del1 del2 del1);
print del1$0del1
}' input_file
'X',00000098765432','1234567890','TQ'

Need to use awk to get a specific word or value after another specific word?

I need to use awk to get a specific word or value that follows another specific word. I have tried some awk commands already, but only after other filters such as grep and sed. The file I need to extract the word from contains the same line more than once, like the line below:
Configuration: number=6 model=MSA SNT=4 IC=8 SIZE=16384MB NRF=24 meas=2.00
If I need 24, I use:
grep IC file | awk 'NF>1{print $NF}'
If I need 16384MB, I use:
grep IC file | awk -F'SIZE=' '{ print $2 }'|awk '{ print $1 }'
Can we get any word from that line using awk alone? What I used gets what is needed, but I would like a shorter awk command.
I am sure a single awk command can get the needed info from one line.
sed -r 's/.*SIZE=([^ ]+).*/\1/' input
16384MB
sed -r 's/.*NRF=([^ ]+).*/\1/' input
24
grep way :
grep -oP 'SIZE=\K[^ ]+' input
16384MB
awk way :
awk '{for(i=1;i<=NF;i++) if($i ~ /SIZE=/) split($i,a,"=");print a[2]}' input
You could use awk with a multi-character delimiter as below to get this done. Loop through the fields, match the key you need, and print the next field, which contains the value.
awk -F'[:= ]' -v option="${match}" '{for(i=1;i<=NF;i++) if ($i ~ option) {print $(i+1)}}' file
Examples,
match="number"
awk -F'[:= ]' -v option="${match}" '{for(i=1;i<=NF;i++) if ($i ~ option) {print $(i+1)}}' file
6
match="model"
awk -F'[:= ]' -v option="${match}" '{for(i=1;i<=NF;i++) if ($i ~ option) {print $(i+1)}}' file
MSA
match="meas"
awk -F'[:= ]' -v option="${match}" '{for(i=1;i<=NF;i++) if ($i ~ option) {print $(i+1)}}' file
2.00
Here is a more general approach:
... | awk -v k=NRF '{for(i=2;i<=NF;i++) {split($i,a,"="); m[a[1]]=a[2]} print m[k]}'
The code stays the same; just change the key k.
If you have GNU awk you could use the third parameter of match:
$ awk 'match($0,/( IC=)([^ ]*)/,a)&& $0=a[2]' file
8
Or get the meas:
$ awk 'match($0,/( meas=)([^ ]*)/,a)&& $0=a[2]' file
2.00
Should you use some other awk, you could use this combination of split, substr and match:
$ awk 'split(substr($0,match($0,/ IC=[^ ]*/),RLENGTH),a,"=") && $0=a[2]' file
8
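Several of the commands above match the key as a regular expression, so a short key could also match inside a longer one. A sketch that compares the key exactly, using only POSIX awk features, might look like this (the helper name get_value and the variable line are hypothetical, and the sample line is the one from the question):

```shell
line='Configuration: number=6 model=MSA SNT=4 IC=8 SIZE=16384MB NRF=24 meas=2.00'

# Print the value of the key given as the first argument.
get_value() {
  printf '%s\n' "$line" | awk -v key="$1" '{
    for (i = 1; i <= NF; i++) {
      eq = index($i, "=")                  # position of "=" in this field, 0 if none
      if (eq && substr($i, 1, eq - 1) == key) {
        print substr($i, eq + 1)           # everything after the "="
        exit                               # stop at the first match
      }
    }
  }'
}

get_value SIZE
get_value NRF
```

To run it against a file instead of a variable, the printf could be replaced by passing the file name to awk; the exit then stops after the first matching line.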

Using awk to search for a line that starts with but also contains a string

I have a file with multiple lines that start with a keyword. I only want to modify one of them, and it's easy to distinguish the two: I want the one under the [dbinfo] section. The domain name is static, so I know that won't change.
awk -F '=' '$1 ~ /^dbhost/ {print $NF};' myfile.txt
myfile.txt
[ual]
path=/web/
dbhost=ez098sf
[dbinfo]
dbhost=ec0001.us-east-1.localdomain
dbname=ez098sf_default
dbpass=XXXXXX
You can use this awk command to first check for presence of [dbinfo] section and then modify dbhost parameter:
awk -v h='newhost' 'BEGIN{FS=OFS="="}
$0 == "[dbinfo]" {sec=1} sec && $1 == "dbhost"{$2 = h; sec=0} 1' file
[ual]
path=/web/
dbhost=ez098sf
[dbinfo]
dbhost=newhost
dbname=ez098sf_default
dbpass=XXXXXX
You want to utilize a little bit of a state machine here:
awk -F '=' '
$0 ~ /^\[.*\]/ {in_db_info=($0=="[dbinfo]")}
$0 ~ /^dbhost/{if (in_db_info) print $2;}' myfile.txt
You can also do it with sed:
sed '/\[dbinfo\]/,/\[/s/\(^dbhost=\).*/\1domain.com/' myfile.txt
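The state-machine idea can also be used for the replacement itself. The following is a sketch only: it recreates the sample file from the question in a temporary file and uses 'newhost' as a placeholder value, both of which are assumptions for illustration:

```shell
ini=$(mktemp)
cat > "$ini" <<'EOF'
[ual]
path=/web/
dbhost=ez098sf
[dbinfo]
dbhost=ec0001.us-east-1.localdomain
dbname=ez098sf_default
dbpass=XXXXXX
EOF

out=$(awk -v h='newhost' '
  BEGIN { FS = OFS = "=" }
  # Every section header updates the flag: on for [dbinfo], off otherwise.
  /^\[/ { in_db = ($0 == "[dbinfo]") }
  # Only a dbhost line seen while the flag is on gets rewritten.
  in_db && $1 == "dbhost" { $2 = h }
  { print }
' "$ini")

rm -f "$ini"
printf '%s\n' "$out"
```

Resetting the flag on every header means a dbhost line in a section that follows [dbinfo] would be left alone as well.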

awk load one file into array, test against another file

I have two files:
seqs.fa:
>seq000007;size=72768;
ACTGTGAG
>seq000010;size=53132;
GTAAGATC
GAATTCTT
>seq00045;size=40321;
ACCCATTT
...
numbers.txt
72768
53132
my desired output would be the lines from the first file that match a number from the second file:
>seq000007;size=72768;
>seq000010;size=53132;
I attempted to use awk, but it only returns lines matching the first number:
awk -F"\n" -v RS=">" 'NR==FNR{for(i=1;i<=NF;i++) A[$i]; next} END {for (header in A) {if ( match(header,$1) ) {print header}}}' seqs.fa numbers.txt
seq000007;size=72768;
seq072768;size=1;
Why is awk only looping through the "header" array for the first line in numbers.txt? And, if this is an XY problem, is there a better way to accomplish this goal?
After fixing the typo in your numbers file:
$ awk -F'=|;' 'NR==FNR{a[$1]; next}; $3 in a' numbers.txt seqs.fa
>seq000007;size=72768;
>seq000010;size=53132;
In this special case you can use GNU grep like this:
grep -F -f numbers.txt seqs.fa
The option -f filename uses all the patterns found in filename for the search. The option -F tells grep that the patterns are simple fixed strings.
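Note that a fixed-string search matches the number anywhere on the line, including inside a sequence ID. Where that matters, the two-file awk idiom can compare the size field exactly. A commented sketch, assuming the header layout shown in the question (the temporary directory and the extra header ">seq072768;size=1;" are added here only to illustrate the difference):

```shell
dir=$(mktemp -d)
printf '%s\n' 72768 53132 > "$dir/numbers.txt"
printf '%s\n' \
  '>seq000007;size=72768;' ACTGTGAG \
  '>seq000010;size=53132;' GTAAGATC GAATTCTT \
  '>seq00045;size=40321;' ACCCATTT \
  '>seq072768;size=1;' ACGT > "$dir/seqs.fa"

out=$(awk -F'[=;]' '
  # First file: NR equals FNR only while reading numbers.txt.
  # Remember each number as an array key, then skip to the next line.
  NR == FNR { wanted[$1]; next }
  # Second file: print header lines whose size field is a remembered number.
  /^>/ && ($3 in wanted)
' "$dir/numbers.txt" "$dir/seqs.fa")

rm -rf "$dir"
printf '%s\n' "$out"
```

Here the header with size=1 is not printed even though its ID contains 72768, because only the third field is compared.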

Parsing a variable in search field in AWK: issue with syntax

I read a few topics but still cannot solve the problem.
This is an example test file:
1:abc:100:/k/ll
2:abd:120:/k/gg
3:www:3:/k/ll
4:rrr:66:/k/gg
5:ddd:140:/k/ll
This is my code:
ZM=${2:-test}
VAR=$1
awk -F':' -v one="$VAR" '$4 ~ one $3 > 100' $ZM
I want this script to print the lines where the third field is greater than 100 and the fourth field contains the string given in the variable, e.g. "ll".
For example:
./test.sh ll
Output:
1:abc:100:/k/ll
5:ddd:140:/k/ll
What am I doing wrong? Thanks for your responses!
For $3 > 100:
awk -v FS=":" -v one="$VAR" '{if($3>100 && $4~one){print $0}}' my_file
For $3 >= 100 (since your expected output differs from your stated requirement):
awk -v FS=":" -v one="$VAR" '{if($3>=100 && $4~one){print $0}}' my_file
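The same conditions can also be written as a plain awk pattern, without the if and the explicit print. A small sketch, assuming the test data from the question (the function name filter_lines and the temporary file are hypothetical):

```shell
data=$(mktemp)
printf '%s\n' \
  '1:abc:100:/k/ll' '2:abd:120:/k/gg' '3:www:3:/k/ll' \
  '4:rrr:66:/k/gg' '5:ddd:140:/k/ll' > "$data"

# $1 = string to look for in field 4, $2 = file to read.
# A pattern with no action prints every line for which it is true.
filter_lines() {
  awk -F: -v pat="$1" '$3 > 100 && $4 ~ pat' "$2"
}

strict=$(filter_lines ll "$data")

# Same test with >= so that the line with exactly 100 is kept too.
inclusive=$(awk -F: -v pat=ll '$3 >= 100 && $4 ~ pat' "$data")

rm -f "$data"
printf '%s\n' "$strict" "$inclusive"
```

The two conditions have to be joined with &&. Written next to each other, as in the question, awk concatenates one and $3 into a single string before comparing, which is not the intended test.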

Resources