how to replace a string at a specific position in a csv file using bash - bash

I have several .csv files and each csv file has lines which look like this.
AA,1,CC,1,EE
AA,FF,6,7,8,9
BB,6,7,8,99,AA
I am reading through each line of each csv file and then trying to replace the 4th position of each line beginning with AA with "ZZ"
Expected output
AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,8,99,AA
However the variable "y" does contain the 4th variable "1" and "7" respectively, but when I use sed command it replaces the first occurrence of "1" with "ZZ".
How do I modify my code to replace only the 4th position of each line irrespective of what value it holds?
My code looks like this
file="name of file which contains list of all csv files"
for i in $(cat "$file")
do
while IFS= read -r line
do
if [[ $line == AA* ]] ; then
y=$(echo "$line" | cut -d',' -f 4)
sed -i "s/${y}/ZZ/" "$i"
fi
done < "$i"
done

Using sed, you can also direct that only the 4th field of a comma separated values file be changed to "ZZ" for lines beginning "AA" with:
sed -i '/^AA/s/[^,][^,]*/ZZ/4' file
Explanation
sed -i call sed to edit file in place;
general form /find/s/match/replace/occurrence; where
find is /^AA/ line beginning with "AA";
match [^,][^,]* a character not a comma followed by any number of non-commas;
replace /ZZ/4 the 4th occurrence of match with "ZZ".
Note, both awk and sed provide good solutions in this case, so see the answers by @perreal and @RavinderSingh13
Example Input File
$ cat file
AA,1,CC,1,EE
AA,FF,6,7,8,9
BB,6,7,8,99,AA
Example Use/Output
(note: -i not used below so the changes are simply output to stdout)
$ sed '/^AA/s/[^,][^,]*/ZZ/4' file
AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,8,99,AA
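A minimal sketch of why the original loop went wrong, using one line of the sample data: an s command without an occurrence flag replaces the first match on the line, whatever field it is in. The variable names are only for illustration.

```shell
line='AA,1,CC,1,EE'

# Without an occurrence flag, sed replaces the FIRST "1" on the line (field 2).
first_match=$(printf '%s\n' "$line" | sed 's/1/ZZ/')

# With the occurrence flag 4, the 4th run of non-comma characters is replaced.
fourth_field=$(printf '%s\n' "$line" | sed '/^AA/s/[^,][^,]*/ZZ/4')

printf '%s\n' "$first_match" "$fourth_field"
```

The first result is AA,ZZ,CC,1,EE and the second is AA,1,CC,ZZ,EE.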

To robustly do this is just:
$ awk 'BEGIN{FS=OFS=","} $1=="AA"{$4="ZZ"} 1' csv
AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,8,99,AA
Note that the above is doing a literal string comparison and a literal string replacement so unlike the other solutions posted so far it won't fail if the target string (AA in this example) contains regexp metachars like . or *, nor if it can be part of another string like AAX, nor if the replacement string (ZZ in this example) contains backreferences like & or \1.
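As a small illustration of the AAX point, the sketch below runs a regex-anchored sed command and the literal awk comparison over the same two lines. The AAX line is made-up test data, not part of the question.

```shell
# Hypothetical input: the second line starts with AAX, not AA.
input='AA,1,CC,1,EE
AAX,1,CC,1,EE'

# Regex anchor: /^AA/ also matches the AAX line, so both lines are edited.
regex_out=$(printf '%s\n' "$input" | sed '/^AA/s/[^,][^,]*/ZZ/4')

# Literal comparison of the whole first field: only the AA line is edited.
literal_out=$(printf '%s\n' "$input" | awk 'BEGIN{FS=OFS=","} $1=="AA"{$4="ZZ"} 1')

printf '%s\n' "regex:" "$regex_out" "literal:" "$literal_out"
```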
If you want to map multiple strings in one pass:
$ awk 'BEGIN{FS=OFS=","; m["AA"]="ZZ"; m["BB"]="FOO"} $1 in m{$4=m[$1]} 1' csv
AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,FOO,99,AA
and just like GNU sed has -i for "inplace" editing, GNU awk has -i inplace, so you can discard the shell loop and just do:
awk -i inplace '
BEGIN { FS=OFS="," }
(NR==FNR) { ARGV[ARGC++]=$0 }
(NR!=FNR) && ($1=="AA") { $4="ZZ" }
{ print }
' file
and it'll operate on all of the files named in file in one call to awk. "file" in that last case is your file containing a list of other CSV file names.
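If GNU awk is not available, a shell loop over the list with a temporary file gives the same result. This is only a sketch: list.txt, a.csv and b.csv are hypothetical names, and it assumes one file name per line.

```shell
# Work in a scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical file names: list.txt names the CSV files, one per line.
printf '%s\n' 'AA,1,CC,1,EE' 'BB,6,7,8,99,AA' > a.csv
printf '%s\n' 'AA,FF,6,7,8,9' > b.csv
printf '%s\n' a.csv b.csv > list.txt

while IFS= read -r csv; do
    tmp=$(mktemp) || exit 1
    # Write the edited copy first; replace the original only if awk succeeded.
    awk 'BEGIN{FS=OFS=","} $1=="AA"{$4="ZZ"} 1' "$csv" > "$tmp" && mv "$tmp" "$csv"
done < list.txt

cat a.csv b.csv
```

Reading the list with while read instead of for i in $(cat ...) also keeps file names containing spaces intact.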

EDIT1: Since the OP has changed the requirement a bit, adding the following now.
awk 'BEGIN{FS=OFS=","} /^AA/||/^BB/{$4="ZZ"} /^CC/||/^DD/{$5="NEW_VALUE"} 1' Input_file > temp_file && mv temp_file Input_file
Could you please try following.
awk -F, '/^AA/{$4="ZZ"} 1' OFS=, Input_file > temp_file && mv temp_file Input_file
OR
awk 'BEGIN{FS=OFS=","} /^AA/{$4="ZZ"} 1' Input_file > temp_file && mv temp_file Input_file
Explanation: Adding explanation to above code too now.
awk '
BEGIN{ ##Starting BEGIN section of awk which will be executed before reading Input_file.
FS=OFS="," ##Setting field separator and output field separator as comma here for all lines of Input_file.
} ##Closing block for BEGIN section of this program.
/^AA/{ ##Checking condition if a line starts from string AA then do following.
$4="ZZ" ##Setting 4th field as ZZ string as per OP.
} ##Closing this condition block here.
1 ##By mentioning 1 we are asking awk to print edited or non-edited line of Input_file.
' Input_file ##Mentioning Input_file name here.

Using sed:
sed -i 's/\(^AA,[^,]*,[^,]*,\)[^,]*/\1ZZ/' input_file
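One case where this positional form differs from counting occurrences of [^,][^,]* is a row with an empty field. The sketch below uses a made-up row to show it; the anchor is written outside the group here, which should behave the same but does not rely on ^ being treated as an anchor inside \( \).

```shell
line='AA,,6,7,8,9'   # hypothetical row whose 2nd field is empty

# Occurrence counting skips the empty field, so the 5th field is replaced.
by_occurrence=$(printf '%s\n' "$line" | sed '/^AA/s/[^,][^,]*/ZZ/4')

# Spelling out the first three fields with [^,]* (zero or more) keeps the count right.
by_position=$(printf '%s\n' "$line" | sed 's/^\(AA,[^,]*,[^,]*,\)[^,]*/\1ZZ/')

printf '%s\n' "$by_occurrence" "$by_position"
```

The first result is AA,,6,7,ZZ,9 and the second is AA,,6,ZZ,8,9.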

Related

How to find content in a file and replace the adjacent value

Using bash how do I find a string and update the string next to it for example pass value
my.site.com|test2.spin:80
proxy_pass.map
my.site2.com test2.spin:80
my.site.com test.spin:8080;
Expected output is to update proxy_pass.map with
my.site2.com test2.spin:80
my.site.com test2.spin:80;
I tried using awk
awk '{gsub(/^my\.site\.com\s+[A-Za-z0-9]+\.spin:8080;$/,"my.site2.comtest2.spin:80"); print}' proxy_pass.map
but it does not seem to work. Is there a better way to approach the problem?
One awk idea, assuming spacing needs to be maintained:
awk -v rep='my.site.com|test2.spin:80' '
BEGIN { split(rep,a,"|") # split "rep" variable and store in
site[a[1]]=a[2] # associative array
}
$1 in site { line=$0 # if 1st field is in site[] array then make copy of current line
match(line,$1) # find where 1st field starts (in case 1st field does not start in column #1)
newline=substr(line,1,RSTART+RLENGTH-1) # save current line up through matching 1st field
line=substr(line,RSTART+RLENGTH) # strip off 1st field
match(line,/[^[:space:];]+/) # look for string that does not contain spaces or ";" and perform replacement, making sure to save everything after the match (";" in this case)
newline=newline substr(line,1,RSTART-1) site[$1] substr(line,RSTART+RLENGTH)
$0=newline # replace current line with newline
}
1 # print current line
' proxy_pass.map
This generates:
my.site2.com test2.spin:80
my.site.com test2.spin:80;
If the input looks like:
$ cat proxy_pass.map
my.site2.com test2.spin:80
my.site.com test.spin:8080;
This awk script generates:
my.site2.com test2.spin:80
my.site.com test2.spin:80;
NOTES:
if multiple replacements need to be performed I'd suggest placing them in a file and having awk process said file first
the 2nd match() is hardcoded based on OP's example; depending on actual file contents it may be necessary to expand on the regex used in the 2nd match()
once satisfied with the result the original input file can be updated in a couple ways ... a) if using GNU awk then awk -i inplace -v rep.... or b) save result to a temp file and then mv the temp file to proxy_pass.map
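The first note (replacements kept in a file that awk reads first) could be sketched as below. This is a simplified illustration, not the answer's script: replacements.txt is a hypothetical file with one site|target pair per line, and assigning to $2 collapses the spacing between columns to a single space.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical file names; replacements.txt holds one site|target pair per line.
printf '%s\n' 'my.site.com|test2.spin:80' > replacements.txt
printf '%s\n' 'my.site2.com test2.spin:80' 'my.site.com test.spin:8080;' > proxy_pass.map

awk '
NR==FNR { split($0, a, "|"); site[a[1]] = a[2]; next }   # 1st file: load the pairs
$1 in site { $2 = site[$1] (($2 ~ /;$/) ? ";" : "") }    # 2nd file: swap the target, keep a trailing ";"
1                                                        # print every line
' replacements.txt proxy_pass.map > proxy_pass.map.new

cat proxy_pass.map.new
```

The result is written to a new file so it can be checked before it is moved over proxy_pass.map.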
If the number of spaces between the columns is not significant, a simple
proxyf=proxy_pass.map
tmpf=$$.txt
awk '$1 == "my.site.com" { $2 = "test2.spin:80;" } {print}' <$proxyf >$tmpf && mv $tmpf $proxyf
should do. If you need the columns to be lined up nicely, you can replace the print by a suitable printf .... statement.
With your shown samples and attempts, please try the following awk code. A shell variable named var stores the value my.site.com|test2.spin:80 and is passed to the awk program as the awk variable var1.
In the BEGIN section, split() breaks var1 on | into the array arr, and num holds the number of pieces. A for loop then steps through arr two elements at a time and builds arr2, using each odd-numbered element as the key and the element after it as the value.
In the main block, if $1 is a key in arr2 the line is printed with arr2's value as the second field; otherwise $2 is printed unchanged.
##Shell variable named var is being created here...
var="my.site.com|test2.spin:80"
awk -v var1="$var" '
BEGIN{
num=split(var1,arr,"|")
for(i=1;i<=num;i+=2){
arr2[arr[i]]=arr[i+1]
}
}
{
print $1,(($1 in arr2)?arr2[$1]:$2)
}
' Input_file
OR, in case you want to maintain the spaces between the 1st and 2nd fields, try the following small tweak of the above code. Written and tested with your shown samples only.
awk -v var1="$var" '
BEGIN{
num=split(var1,arr,"|")
for(i=1;i<=num;i+=2){
arr2[arr[i]]=arr[i+1]
}
}
{
match($0,/[[:space:]]+/)
print $1 substr($0,RSTART,RLENGTH) (($1 in arr2)?arr2[$1]:$2)
}
' Input_file
NOTE: This program can take multiple values separated by | in shell variable to be passed and checked on in awk program. But it considers that it will be in format of key|value|key|value... only.
#!/bin/sh -x
f1=$(echo "my.site.com|test2.spin:80" | cut -d'|' -f1)
f2=$(echo "my.site.com|test2.spin:80" | cut -d'|' -f2)
echo "${f1}%${f2};" >> proxy_pass.map
tr '%' '\t' < proxy_pass.map >> p1
cat > ed1 <<EOF
$
-1
d
wq
EOF
ed -s p1 < ed1
mv -v p1 proxy_pass.map
rm -v ed1
This might work for you (GNU sed):
<<<'my.site.com|test2.spin:80' sed -E 's#\.#\\.#g;s#^(\S+)\|(\S+)#/^\1\\b/s/\\S+/\2/2#' |
sed -Ef - file
Build a sed script from the input arguments and apply it to the input file.
The input arguments are first prepared so that their metacharacters (in this case the .'s) are escaped.
Then the first argument is used to prepare a match command and the second is used as the value to be replaced in a substitution command.
The result is piped into a second sed invocation that takes the sed script and applies it to the input file.
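To make the two stages easier to follow, the sketch below runs only the first sed and prints the script it generates. It assumes GNU sed, since \S and \b are GNU extensions; the variable name generated is just for illustration.

```shell
# Requires GNU sed: \S and \b are GNU extensions.
generated=$(printf '%s\n' 'my.site.com|test2.spin:80' |
    sed -E 's#\.#\\.#g;s#^(\S+)\|(\S+)#/^\1\\b/s/\\S+/\2/2#')

# Show the sed script that the second invocation would read with -f -
printf '%s\n' "$generated"
```

With GNU sed this prints /^my\.site\.com\b/s/\S+/test2\.spin:80/2, that is: on lines starting with my.site.com, replace the second run of non-space characters.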

Extract specific substring in shell

I have a file which contains following line:
ro fstype=sd timeout=10 console=ttymxc1,115200 show=true
I'd like to extract and store the fstype attribute "sd" in a variable.
I did the job using bash
IFS=" " read -r -a args < file
for arg in "${args[@]}"; do
if [[ "$arg" =~ "fstype" ]]; then
id=$(cut -d "=" -f2 <<< "$arg")
echo $id
fi
done
and following awk command in another shell script:
awk -F " " '{print $2}' file | cut -d '=' -f2
Because 'fstype' argument position and file content can differ, how to do the same things and keep compatibility in shell script ?
Could you please try following.
awk 'match($0,/fstype=[^ ]*/){print substr($0,RSTART+7,RLENGTH-7)}' Input_file
OR more specifically to handle any string before = try following:
awk '
match($0,/fstype=[^ ]*/){
val=substr($0,RSTART,RLENGTH)
sub(/.*=/,"",val)
print val
val=""
}
' Input_file
With sed:
sed 's/.*fstype=\([^ ]*\).*/\1/' Input_file
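One thing to watch with the sed form: lines without fstype= are printed unchanged. A hedged sketch of a variant that prints only successful substitutions and stores the result in a variable, as the question asks, follows; the second input line is made up for the demonstration.

```shell
# Hypothetical input: the second line has no fstype= pair.
input='ro fstype=sd timeout=10 console=ttymxc1,115200 show=true
ro timeout=10 show=true'

# Without -n, the non-matching line is printed unchanged.
plain=$(printf '%s\n' "$input" | sed 's/.*fstype=\([^ ]*\).*/\1/')

# With -n and the p flag, only lines where the substitution happened are printed.
fstype=$(printf '%s\n' "$input" | sed -n 's/.*fstype=\([^ ]*\).*/\1/p')

printf 'fstype is: %s\n' "$fstype"
```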
awk code's explanation:
awk ' ##Starting awk program from here.
match($0,/fstype=[^ ]*/){ ##Using match function to match regex fstype= till first space comes in current line.
val=substr($0,RSTART,RLENGTH) ##Creating variable val which has sub-string of current line from RSTART to till RLENGTH.
sub(/.*=/,"",val) ##Substituting everything till = in value of val here.
print val ##Printing val here.
val="" ##Nullifying val here.
}
' Input_file ##mentioning Input_file name here.
Any time you have tag=value pairs in your data I find it best to start by creating an array (f[] below) that maps those tags (names) to their values:
$ awk -v tag='fstype' -F'[ =]' '{for (i=2;i<NF;i+=2) f[$i]=$(i+1); print f[tag]}' file
sd
$ awk -v tag='console' -F'[ =]' '{for (i=2;i<NF;i+=2) f[$i]=$(i+1); print f[tag]}' file
ttymxc1,115200
With the above approach you can do whatever you like with the data just by referencing it by its name as the index in the array, e.g.:
$ awk -F'[ =]' '{
for (i=2;i<NF;i+=2) f[$i]=$(i+1)
if ( (f["show"] == "true") && (f["timeout"] < 20) ) {
print f["console"], f["fstype"]
}
}' file
ttymxc1,115200 sd
If your data has more than 1 row and there can be different fields on each row (doesn't appear to be true for your data) then add delete f as the first line of the script.
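A short sketch of what that looks like with two rows that carry different tags. The input is made up, and it assumes an awk that accepts delete on a whole array, which the common implementations do.

```shell
# Hypothetical two-row input; only the first row has an fstype tag.
input='ro fstype=sd timeout=10 show=true
rw timeout=5'

# Without resetting f[], the second row still sees fstype from the first row.
stale=$(printf '%s\n' "$input" |
    awk -F'[ =]' '{for (i=2;i<NF;i+=2) f[$i]=$(i+1); print f["fstype"]}')

# With "delete f" first, each row only sees its own tags.
fresh=$(printf '%s\n' "$input" |
    awk -F'[ =]' '{delete f; for (i=2;i<NF;i+=2) f[$i]=$(i+1); print (("fstype" in f) ? f["fstype"] : "(none)")}')

printf '%s\n' "without delete:" "$stale" "with delete:" "$fresh"
```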
If the key and value can be matched by the regex fstype=[^ ]*, grep and -o option which extracts matched pattern can be used.
$ grep -o 'fstype=[^ ]*' file
fstype=sd
In addition, the regex \K can be used with the -P option (note that this option is only available in GNU grep).
Patterns that are to the left of \K are not shown with -o.
Therefore, below expression can extract the value only.
$ grep -oP 'fstype=\K[^ ]*' file
sd

How to add new line in file in bash?

my input file contains
<arg>arg1</arg>
<arg>arg2</arg>
<arg>arg3</arg>
<arg>arg4</arg>
Now I want to add a new line <arg>arg5</arg>.
I used below command
awk '{gsub("<arg>arg4</arg>", "<arg>arg4</arg>\n<arg>arg5</arg>", $0); print}' inputfile > tempfile
But it's not working at all. It's also not giving any errors.
Please help me out here.
You can use a simple string comparison to avoid escaping of special characters like $, ( and ) in regular expressions:
awk '1
$0 == "<arg>arg4</arg>"{
print "<arg>arg5</arg>"
}
' inputfile > tempfile
The first 1 prints the current line and if the current line is <arg>arg4</arg>, print
<arg>arg5</arg>.
If the search string is only part of the line (padded by whitespace for example), you could use index to get the position of the search string
and insert the new string after it:
# define two shell variables
search='<arg>arg4</arg>'
insert='<arg>arg5</arg>'
awk -v search="$search" -v insert="$insert" '
{
idx=index($0, search)
if (idx){
print substr($0, 1, idx+length(search)-1) ORS insert substr($0, idx+length(search))
next
}
}1' inputfile > tempfile
The long print statement prints the following parts
the string before the search string + the search string itself
a newline
the insert string
the string after the search string (possibly empty)
One way using sed:
File1:
$ cat file1
<arg>arg1</arg>
<arg>arg2</arg>
<arg>arg3</arg>
<arg>arg4</arg>
File2:
$ cat file2
<arg>arg5</arg>
sed command:
$ sed -i '$r file2' file1
Check file1:
$ cat file1
<arg>arg1</arg>
<arg>arg2</arg>
<arg>arg3</arg>
<arg>arg4</arg>
<arg>arg5</arg>
Using sed, we can simply read the contents of another file into current file.
$r file2 reads (r) the contents of file2 when the last line ($) is reached; -i edits the file in place.
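If the new line has to go after a specific line rather than at the end of the file, sed's append command is another option. The sketch below uses the POSIX spelling (a\ followed by the text on the next line) and writes to a separate file instead of using -i; the file names follow the question.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Recreate the question's input: four <arg> lines.
printf '<arg>arg%s</arg>\n' 1 2 3 4 > inputfile

# POSIX form of the append command: "a\" followed by the text on the next line.
sed '/<arg>arg4<\/arg>/a\
<arg>arg5</arg>' inputfile > tempfile

cat tempfile
```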

How do I join lines using space and comma

I have the file that contains content like:
IP
111
22
25
I want to print the output in the format IP 111,22,25.
I have tried tr ' ' , but it's not working.
Welcome to paste
$ paste -sd " ,," file
IP 111,22,25
Normally what paste does is write to standard output lines consisting of sequentially corresponding lines of each given file, separated by a <tab> character. The option -s does it differently: it pastes all the lines of each file one after another onto a single line, with a <tab> character as the delimiter. With the -d flag you can give a list of delimiters to be used instead of the <tab> character. The list is used in order and starts over from the beginning when it runs out, so here I gave " ,," (a space followed by two commas) to get a space after the first line and commas after that. A longer file needs more commas in the list.
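Because paste cycles through the delimiter list, a fixed list only fits a known number of lines. A sketch that shows the cycling and one way to handle any number of lines (split off the first line, join the rest with commas) is below; it assumes the input is a regular file named file, as in the question.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' IP 111 22 25 > file

# The delimiter list is cycled: space, comma, then space again.
cycled=$(paste -sd ' ,' file)

# For any number of lines: join everything after line 1 with commas, then prepend line 1.
first=$(head -n 1 file)
rest=$(tail -n +2 file | paste -sd , -)
joined="$first $rest"

printf '%s\n' "$cycled" "$joined"
```

The first result is IP 111,22 25 and the second is IP 111,22,25.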
In pure Bash:
# Read file into array
mapfile -t lines < infile
# Print to string, comma-separated from second element on
printf -v str '%s %s' "${lines[0]}" "$(IFS=,; echo "${lines[*]:1}")"
# Print
echo "$str"
Output:
IP 111,22,25
I'd go with:
{ read a; read b; read c; read d; } < file
echo "$a $b,$c,$d"
This will also work:
xargs printf "%s %s,%s,%s" < file
Try cat file.txt | tr '\n' ',' | sed "s/IP,/IP /g"
tr replaces newlines with commas, sed changes IP,111,22,25 into IP 111,22,25 (a trailing comma remains where the final newline was)
The following awk script will do the requested:
awk 'BEGIN{OFS=","} FNR==1{first=$0;next} {val=val?val OFS $0:$0} END{print first FS val}' Input_file
Explanation: Adding explanation for above code now.
awk ' ##Starting awk program here.
BEGIN{ ##Starting BEGIN section here of awk program.
OFS="," ##Setting OFS as comma, output field separator.
} ##Closing BEGIN section of awk here.
FNR==1{ ##Checking if line is first line then do following.
first=$0 ##Creating variable first whose value is current first line.
next ##next keyword is awk out of the box keyword which skips all further statements from here.
} ##Closing FNR==1 BLOCK here.
{ ##This BLOCK will be executed for all lines apart from 1st line.
val=val?val OFS $0:$0 ##Creating variable val whose values will be keep concatenating its own value.
}
END{ ##Mentioning awk END block here.
print first FS val ##Printing variable first FS(field separator) and variable val value here.
}' Input_file ##Mentioning Input_file name here which is getting processed by awk.
Using Perl
$ cat captain.txt
IP
111
22
25
$ perl -0777 -ne ' @k=split(/\s+/); print $k[0]," ",join(",",@k[1..$#k]) ' captain.txt
IP 111,22,25
$

print first 3 characters and / rest of the string with stars

I have input like this
John:boofoo
I want to print rest of the string with stars and keep only 3 characters of the string.
The output will be like this
John:boo***
this my command
awk -F ":" '{print $1,$2 ":***"}'
I want to use only print command if possible. Thanks
With GNU sed:
echo 'John:boofoo' | sed -E 's/(:...).*/\1***/'
Output:
John:boo***
With GNU awk for gensub():
$ awk 'BEGIN{FS=OFS=":"} {print $1, substr($2,1,3) gensub(/./,"*","g",substr($2,4))}' file
John:boo***
With any awk:
awk 'BEGIN{FS=OFS=":"} {tl=substr($2,4); gsub(/./,"*",tl); print $1, substr($2,1,3) tl}' file
John:boo***
Could you please try the following. It keeps the first 3 characters of the 2nd field as they are and prints one star for every character after them.
awk '
BEGIN{
FS=OFS=":"
}
{
stars=""
val=substr($2,1,3)
for(i=4;i<=length($2);i++){
stars=stars"*"
}
$2=val stars
}
1
' Input_file
Output will be as follows.
John:boo***
Explanation: Adding explanation for above code too here.
awk '
BEGIN{ ##Starting BEGIN section from here.
FS=OFS=":" ##Setting FS and OFS value as : here.
} ##Closing block of BEGIN section here.
{ ##Here starts main block of awk program.
stars="" ##Nullifying variable stars here.
val=substr($2,1,3) ##Creating variable val whose value is 1st 3 letters of 2nd field.
for(i=4;i<=length($2);i++){ ##Starting a for loop from 4 (because we need the 4th character through the last one in the 2nd field) till length of 2nd field.
stars=stars"*" ##Keep concatenating stars variable to its own value with *.
}
$2=val stars ##Assigning value of variable val and stars to 2nd field here.
}
1 ##Mentioning 1 here to print edited/non-edited lines for Input_file here.
' Input_file ##Mentioning Input_file name here.
Or even with good old sed
$ echo "John:boofoo" | sed 's/...$/***/'
Output:
John:boo***
(note: this just replaces the last 3 characters of any string with "***", so if you need to key off the ':', see the GNU sed answer from Cyrus.)
Another awk variant:
awk -F ":" '{print $1 FS substr($2, 1, 3) "***"}' <<< 'John:boofoo'
John:boo***
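Note that this variant always appends exactly three stars, while the gsub-based answer prints one star per hidden character. The sketch below compares the two on a longer, made-up input.

```shell
line='John:boofoobar'   # hypothetical input with six characters to mask

# Fixed suffix: always three stars, whatever the length of the tail.
fixed=$(printf '%s\n' "$line" | awk -F ":" '{print $1 FS substr($2, 1, 3) "***"}')

# Length-aware: one star per character after the third.
masked=$(printf '%s\n' "$line" |
    awk 'BEGIN{FS=OFS=":"} {tl=substr($2,4); gsub(/./,"*",tl); print $1, substr($2,1,3) tl}')

printf '%s\n' "$fixed" "$masked"
```

The first result is John:boo*** and the second is John:boo******. Which one is wanted depends on whether the output should reveal the original length.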
Since we have the tags awk, bash and sed: for completeness sake here is a bash only solution:
INPUT="John:boofoo"
printf "%s:%s\n" ${INPUT%%:*} $(TMP1=${INPUT#*:};TMP2=${TMP1:3}; echo "${TMP1:0:3}${TMP2//?/*}")
It uses two arguments to printf after the format string. The first one is INPUT stripped of everything from the : onward. Let's break down the second argument $(TMP1=${INPUT#*:};TMP2=${TMP1:3}; echo "${TMP1:0:3}${TMP2//?/*}"):
$(...) the string is interpreted as a bash command its output is substituted as last argument to printf
TMP1=${INPUT#*:}; remove everything up to and including the :, store the string in TMP1.
TMP2=${TMP1:3}; get all characters of TMP1 from offset 3 to the end and store them in TMP2.
echo "${TMP1:0:3}${TMP2//?/*}" output the temporary strings: the first three chars from TMP1 unmodified and all chars from TMP2 as *
the output of the last echo is the last argument to printf
Here is the bash -x output:
+ INPUT=John:boofoo
++ TMP1=boofoo
++ TMP2=foo
++ echo 'boo***'
+ printf '%s:%s\n' John 'boo***'
John:boo***
Another sed : replace all chars after the third by *
sed -E ':A;s/([^:]*:...)(.*)[^*]([*]*)/\1\2\3*/;tA'
Some more awk
awk 'BEGIN{FS=OFS=":"}{s=sprintf("%0*d",length(substr($2,4)),0); gsub(/0/,"*",s);print $1,substr($2,1,3) s}' infile
You can use the %* form of printf, which accepts a variable width. If you print 0 as the value with zero padding, the result is a string of zeros of that width, which gsub() then turns into stars.
Better Readable:
awk 'BEGIN{
FS=OFS=":"
}
{
s=sprintf("%0*d",length(substr($2,4)),0);
gsub(/0/,"*",s);
print $1,substr($2,1,3) s
}
' infile
Test Results:
$ awk --version
GNU Awk 3.1.7
Copyright (C) 1989, 1991-2009 Free Software Foundation.
$ cat f
John:boofoo
$ awk 'BEGIN{FS=OFS=":"}{s=sprintf("%0*d",length(substr($2,4)),0); gsub(/0/,"*",s);print $1,substr($2,1,3) s}' f
John:boo***
Another pure Bash, using the builtin regular expression predicate.
input="John:boofoo"
if [[ $input =~ ^([^:]*:...)(.*)$ ]]; then
printf '%s%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]//?/*}"
else
echo >&2 "String doesn't match pattern"
fi
We split the string in two parts: the first part being everything up to (and including) the three chars found after the first colon (stored in ${BASH_REMATCH[1]}), the second part being the remaining part of string (stored in ${BASH_REMATCH[2]}). If the string doesn't match this pattern, we just insult the user.
We then print the first part unchanged, and the second part with every character replaced with *.
