change numerical value in file to characters via awk - bash

I want to replace the numerical values in a file with a new value that I provide. The number can appear anywhere in the line; it is often the third field, but not always. I would also like to save the result as a new version of the file.
original format
A:fdg:user#server:r
A:g:1234:xtcy
A:d:1111:xtcy
modified format
A:fdg:user#server:rxtTncC
A:g:replaced_value:xtcy
A:d:replaced_value:xtcy
bash line command with awk:
awk -v newValue="newVALUE" 'BEGIN{FS=OFS=":"} /:.:.*:/ && $3~/^[0-9]+$/{$3=newValue} 1' original_file.txt > replaced_file.txt

You can simply use sed instead of awk:
sed -E 's/\b[0-9]+\b/replaced_value/g' /path/to/infile > /path/to/outfile

Here is an awk that asks you for replacement values for each numerical value it meets:
$ awk '
BEGIN {
FS=OFS=":" # delimiters
}
{
for(i=1;i<=NF;i++) # loop all fields
if($i~/^[0-9]+$/) { # if numerical value found
printf "Provide replacement value for %d: ",$i > "/dev/stderr"
getline $i < "/dev/stdin" # ask for a replacement
}
}1' file_in > file_out # write output to a new file

I would use GNU AWK for this task in the following way. Let the content of file.txt be
A:fdg:user#server:rxtTncC
A:g:1234:xtcy
A:d:1111:xtcy
then
awk 'BEGIN{newvalue="replacement"}{gsub(/[[:digit:]]+/,newvalue);print}' file.txt
output
A:fdg:user#server:rxtTncC
A:g:replacement:xtcy
A:d:replacement:xtcy
Explanation: replace each run of one or more digits with newvalue. Disclaimer: I assumed that a numeric value is something consisting solely of digits.
(tested in gawk 4.2.1)

How about
awk -F: -v OFS=: '$3 ~ /^[0-9]+$/ { $3 = "new value" } { print }' original_file > replaced_file
?
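Since the question says the number is not always in the third field, a variation on the answers above is to test every field. The following is a minimal sketch, not taken from any of the answers: replace_numeric_fields is a hypothetical helper name, and it assumes that fields are separated by : and that "numeric" means digits only.

```shell
# Hypothetical helper: replace every all-digit field with the given value.
# Reads from stdin, writes to stdout.
replace_numeric_fields() {
  awk -v newValue="$1" '
    BEGIN { FS = OFS = ":" }
    {
      for (i = 1; i <= NF; i++)              # check every field, not only $3
        if ($i ~ /^[0-9]+$/) $i = newValue   # whole field must be digits
      print
    }'
}

printf '%s\n' 'A:fdg:user#server:r' 'A:g:1234:xtcy' 'A:d:1111:xtcy' |
  replace_numeric_fields replaced_value
```

With the sample input this prints the first line unchanged and replaces 1234 and 1111 with replaced_value. Redirect the output to a new file to keep the original intact.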

Related

awk: select first column and value in column after matching word

I have a .csv where each row corresponds to a person (first column) followed by the attributes and values that are available for that person. I want to extract the names and the values of a particular attribute for the persons where that attribute is available. The doc is structured as follows:
name,attribute1,value1,attribute2,value2,attribute3,value3
joe,height,5.2,weight,178,hair,
james,,,,,,
jesse,weight,165,height,5.3,hair,brown
jerome,hair,black,breakfast,donuts,height,6.8
I want a file that looks like this:
name,attribute,value
joe,height,5.2
jesse,height,5.3
jerome,height,6.8
Using this earlier post, I've tried a few different awk methods but am still having trouble getting both the first column and then whatever column has the desired value for the attribute (say height). For example the following returns everything.
awk -F "height," '{print $1 "," FS$2}' file.csv
I could grep only the rows with height in them, but I'd prefer to do everything in a single line if I can.
You may use this awk:
cat attrib.awk
BEGIN {
FS=OFS=","
print "name,attribute,value"
}
NR > 1 && match($0, k "[^,]+") {
print $1, substr($0, RSTART+1, RLENGTH-1)
}
# then run it as
awk -v k=',height,' -f attrib.awk file
name,attribute,value
joe,height,5.2
jesse,height,5.3
jerome,height,6.8
# or this one
awk -v k=',weight,' -f attrib.awk file
name,attribute,value
joe,weight,178
jesse,weight,165
With your shown samples, please try the following awk code, written and tested in GNU awk. In short: set RS (the record separator) to ^[^,]*,height,[^,]* and print RT, the text that matched RS, to get the expected output.
awk -v RS='^[^,]*,height,[^,]*' 'RT{print RT}' Input_file
I'd suggest a sed one-liner:
sed -n 's/^\([^,]*\).*\(,height,[^,]*\).*/\1\2/p' file.csv
One awk idea:
awk -v attr="height" '
BEGIN { FS=OFS="," }
FNR==1 { print "name", "attribute", "value"; next }
{ for (i=2;i<=NF;i+=2) # loop through even-numbered fields
if ($i == attr) { # if field value is an exact match to the "attr" variable then ...
print $1,$i,$(i+1) # print current name, current field and next field to stdout
next # no need to check rest of current line; skip to next input line
}
}
' file.csv
NOTE: this assumes the input value (height in this example) will match exactly (including same capitalization) with a field in the file
This generates:
name,attribute,value
joe,height,5.2
jesse,height,5.3
jerome,height,6.8
With a perl one-liner:
$ perl -lne '
print "name,attribute,value" if $.==1;
print "$1,$2" if /^(\w+).*(height,\d+\.\d+)/
' file
output
name,attribute,value
joe,height,5.2
jesse,height,5.3
jerome,height,6.8
awk accepts variable-value arguments following a -v flag before the script. Thus, the name of the required attribute can be passed into an awk script using the general pattern:
awk -v attr=attribute1 ' {} ' file.csv
Inside the script, the value of the passed variable is referenced by the variable name, in this case attr.
Your criteria are to print column 1 (the first column, containing the name), the column corresponding to the required header value, and the column immediately after it (holding the matched values).
Thus, the following script allows you to fish out the column headed "attribute1" and its next neighbour:
awk -v attr=attribute1 ' BEGIN {FS=","} $0 ~ attr {for (i=1;i<=NF;i++) if($i == attr) col=i;} {print $1","$col","$(col+1)} ' data.txt
result:
name,attribute1,value1
joe,height,5.2
james,,
jesse,weight,165
jerome,hair,black
another column (attribute 3):
awk -v attr=attribute3 ' BEGIN {FS=","} $0 ~ attr {for (i=1;i<=NF;i++) if($i == attr) col=i;} {print $1","$col","$(col+1)} ' awkNames.txt
result:
name,attribute3,value3
joe,hair,
james,,
jesse,hair,brown
jerome,height,6.8
Just change the value of the -v attr= argument for the required column.
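One point worth keeping in mind with -v variables: in awk, /attr/ is a regex literal that matches the text "attr", not the value of the variable attr. To match against the variable, write $0 ~ attr (or compare a field with ==). The sketch below only illustrates that difference; match_demo is a hypothetical name and the two input lines are made up.

```shell
# Hypothetical demo: regex literal versus dynamic match on a -v variable.
match_demo() {
  awk -v attr="$1" '
    /attr/    { print "literal: " $0 }   # lines containing the text "attr"
    $0 ~ attr { print "dynamic: " $0 }   # lines matching the value of attr
  '
}

printf '%s\n' 'attribute1,value1' 'height,5.2' | match_demo height
```

Here the first line is reported as a literal match only (it contains "attr" but not "height"), and the second as a dynamic match only.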

Updating a specific field with sed

I'm trying to update a specific field on a specific line with the sed command in Bourne Shell.
Let's say I have a file TopScorer.txt:
Player:Games:Goals:Assists
Salah:9:9:3
Kane:10:8:4
I need to update the 3rd column (Goals) of a player. I tried the command below, and it works unless Games and Goals have the same value, in which case it updates the first one:
player="Salah"
NewGoals="10"
OldGoals=$(awk -F':' '$1=="'$player'"' TopScorer.txt | cut -d':' -f3)
sed -i '/^'$player'/ s/'$OldGoals'/'$NewGoals'/' TopScorer.txt
Output> Salah:10:9:3 instead of Salah:9:10:3
Is there any solution? Should I use delimiters and $3==... to specify that field?
I also tried the option /2 for second occurrence but it's not very convenient in my case.
You can just do this with awk alone and not with sed. Also note that awk has an internal syntax to import variables from the shell. So your code just becomes
awk -F: -v pl="$player" -v goals="$NewGoals" \
'BEGIN { OFS = FS } $1 == pl { $3= goals }1' TopScorer.txt
The -F: sets the input delimiter to :, and each -v imports a shell variable into the context of awk. The BEGIN { OFS = FS } sets the output field separator to the same value as the input one. Then we match on the imported player name and update $3 to the required value.
To make the modifications in-place, use a temporary file
awk -F: -v pl="$player" -v goals="$NewGoals" \
'BEGIN { OFS = FS } $1 == pl { $3= goals }1' TopScorer.txt > tmpfile && mv tmpfile TopScorer.txt
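The temporary-file approach can be wrapped so that the original file is only overwritten when awk succeeds. This is a sketch under assumptions: update_goals is a hypothetical helper name, mktemp is assumed to be available, and the Goals column is assumed to be field 3 as in the sample file.

```shell
# Hypothetical helper: update_goals FILE PLAYER GOALS
update_goals() {
  tmp=$(mktemp) || return 1
  if awk -F: -v pl="$2" -v goals="$3" \
       'BEGIN { OFS = FS } $1 == pl { $3 = goals } 1' "$1" > "$tmp"
  then
    cat "$tmp" > "$1"   # overwrite the contents only if awk succeeded
  fi
  rm -f "$tmp"
}

# Demo on a throwaway copy of the sample data
demo=$(mktemp)
printf '%s\n' 'Player:Games:Goals:Assists' 'Salah:9:9:3' 'Kane:10:8:4' > "$demo"
update_goals "$demo" Salah 10
cat "$demo"
rm -f "$demo"
```

Writing back with cat rather than mv keeps the permissions of the original file; the trade-off is that the overwrite itself is not atomic.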
This might work for you (GNU sed):
(player=Salah;newGoals=10;sed -i "/^$player/s/[^:]*/$newGoals/3" /tmp/file)
Use a sub shell so as not to pollute the current shell (...). Use sed and pattern matching to match the first field of each record to the variable player and replace the third field of the matching record with the contents of newGoals.
P.S. If the variables are needed in further processes the sub shell is not necessary i.e. remove the ( and )
You can try it with Perl
$ player="Salah"
$ NewGoals="10"
$ perl -F: -lane "\$F[2]=$NewGoals if ( \$F[0] eq $player ) ; print join(':',@F) " TopScorer.txt
Player:Games:Goals:Assists
Salah:9:10:3
Kane:10:8:4
$
or export them and call Perl one-liner within single quotes
$ export NewGoals="10"
$ export player="Salah"
$ perl -F: -lane '$F[2]=$ENV{NewGoals} if $F[0] eq $ENV{player} ; print join(":",@F) ' TopScorer.txt
Player:Games:Goals:Assists
Salah:9:10:3
Kane:10:8:4
$
Note that Perl has -i switch and you can do the replacement in-place, so
$ perl -i.bak -F: -lane '$F[2]=$ENV{NewGoals} if $F[0] eq $ENV{player} ; print join(":",@F) ' TopScorer.txt
$ cat TopScorer.txt
Player:Games:Goals:Assists
Salah:9:10:3
Kane:10:8:4
$
This will work.
The first part of the sed expression matches the full line for the player and captures the fields I want to keep with \( ... \).
The second part rebuilds the line from the constants, the values of \1 and \2, and the new goals value.
player="Salah"
NewGoals="10"
sed "s/^$player:\([^:]*\):[^:]*:\([^:]*\)\$/$player:\1:$NewGoals:\2/" TopScorer.txt
Could you please try the following. The advantage of this approach is that the field number for Goals is not hard-coded: the program looks up the Goals column in the header, wherever it is (4th, 5th, or any other field), and changes only that column.
1st solution: to change every row that matches the player name, use the following.
NewGoals=10
awk -v newgoals="$NewGoals" 'BEGIN{FS=OFS=":"} FNR==1{for(i=1;i<=NF;i++){if($i=="Goals"){field=i}}} FNR>1{if($1=="Salah"){$field=newgoals}} 1' Input_file
2nd solution: to change a specific player's goals only on a specific row, try the following.
NewGoals=10
awk -v newgoals="$NewGoals" 'BEGIN{FS=OFS=":"} FNR==1{for(i=1;i<=NF;i++){if($i=="Goals"){field=i}}} FNR>1{if($1=="Salah" && FNR==2){$field=newgoals}} 1' Input_file
The above changes only row 2; you can change that by editing FNR==2 in the second condition, where FNR is the row number in awk. To save the output back into Input_file, append > temp_file && mv temp_file Input_file to the commands above.
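The header lookup can be combined with -v so that neither the player name nor the column name is hard-coded. This is a minimal sketch, assuming a :-separated file with a header row; set_stat is a hypothetical helper name, and it reads from stdin and writes to stdout.

```shell
# Hypothetical helper: set_stat PLAYER COLUMN VALUE < file
set_stat() {
  awk -v pl="$1" -v col="$2" -v val="$3" '
    BEGIN { FS = OFS = ":" }
    # Locate the column by its header name
    FNR == 1 { for (i = 1; i <= NF; i++) if ($i == col) field = i }
    # Update only rows for the given player, and only if the column exists
    FNR > 1 && field && $1 == pl { $field = val }
    1'
}

printf '%s\n' 'Player:Games:Goals:Assists' 'Salah:9:9:3' 'Kane:10:8:4' |
  set_stat Salah Goals 10
```

For the sample data this changes the Salah row to Salah:9:10:3 and leaves the other lines as they are. If the column name is not found in the header, nothing is modified.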

print first 3 characters and / rest of the string with stars

I have input like this:
John:boofoo
I want to keep only the first 3 characters of the string after the colon and print the rest of it as stars.
The output should look like this:
John:boo***
This is my command:
awk -F ":" '{print $1,$2 ":***"}'
I want to use only print command if possible. Thanks
With GNU sed:
echo 'John:boofoo' | sed -E 's/(:...).*/\1***/'
Output:
John:boo***
With GNU awk for gensub():
$ awk 'BEGIN{FS=OFS=":"} {print $1, substr($2,1,3) gensub(/./,"*","g",substr($2,4))}' file
John:boo***
With any awk:
awk 'BEGIN{FS=OFS=":"} {tl=substr($2,4); gsub(/./,"*",tl); print $1, substr($2,1,3) tl}' file
John:boo***
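The same substr/gsub idea generalizes to any number of visible characters. The following is a sketch, not part of the answer above: mask_after is a hypothetical helper name, and the number of characters to keep is passed in with -v.

```shell
# Hypothetical helper: mask_after N  (keep the first N characters of field 2)
mask_after() {
  awk -v keep="$1" '
    BEGIN { FS = OFS = ":" }
    {
      tail = substr($2, keep + 1)         # everything after the kept prefix
      gsub(/./, "*", tail)                # one star per remaining character
      print $1, substr($2, 1, keep) tail
    }'
}

echo 'John:boofoo' | mask_after 3
```

With keep set to 3 this prints John:boo***; a second field shorter than the prefix is printed unchanged.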
Could you please try the following. It keeps the first 3 characters of the 2nd field as they are and prints one star for every character after them.
awk '
BEGIN{
FS=OFS=":"
}
{
stars=""
val=substr($2,1,3)
for(i=4;i<=length($2);i++){
stars=stars"*"
}
$2=val stars
}
1
' Input_file
Output will be as follows.
John:boo***
Explanation: the same code with a comment on each line.
awk '
BEGIN{ ##Starting BEGIN section from here.
FS=OFS=":" ##Setting FS and OFS value as : here.
} ##Closing block of BEGIN section here.
{ ##Here starts main block of awk program.
stars="" ##Nullifying variable stars here.
val=substr($2,1,3) ##Creating variable val whose value is 1st 3 letters of 2nd field.
for(i=4;i<=length($2);i++){ ##Starting a for loop from 4 (because we need the 4th character through the last one in the 2nd field) up to the length of the 2nd field.
stars=stars"*" ##Keep concatenating stars variable to its own value with *.
}
$2=val stars ##Assigning value of variable val and stars to 2nd field here.
}
1 ##Mentioning 1 here to print edited/non-edited lines for Input_file here.
' Input_file ##Mentioning Input_file name here.
Or even with good old sed
$ echo "John:boofoo" | sed 's/...$/***/'
Output:
John:boo***
(note: this just replaces the last 3 characters of any string with "***", so if you need to key off the ':', see the GNU sed answer from Cyrus.)
Another awk variant:
awk -F ":" '{print $1 FS substr($2, 1, 3) "***"}' <<< 'John:boofoo'
John:boo***
Since we have the tags awk, bash and sed: for completeness sake here is a bash only solution:
INPUT="John:boofoo"
printf "%s:%s\n" ${INPUT%%:*} $(TMP1=${INPUT#*:};TMP2=${TMP1:3}; echo "${TMP1:0:3}${TMP2//?/*}")
It uses two arguments to printf after the format string. The first one is INPUT stripped of everything from the : onward. Let's break down the second argument $(TMP1=${INPUT#*:};TMP2=${TMP1:3}; echo "${TMP1:0:3}${TMP2//?/*}"):
$(...) the string is interpreted as a bash command its output is substituted as last argument to printf
TMP1=${INPUT#*:}; remove everything up to and including the :, store the string in TMP1.
TMP2=${TMP1:3}; get all characters of TMP1 from offset 3 to the end and store them in TMP2.
echo "${TMP1:0:3}${TMP2//?/*}" output the temporary strings: the first three chars from TMP1 unmodified and all chars from TMP2 as *
the output of the last echo is the last argument to printf
Here is the bash -x output:
+ INPUT=John:boofoo
++ TMP1=boofoo
++ TMP2=foo
++ echo 'boo***'
+ printf '%s:%s\n' John 'boo***'
John:boo***
Another sed : replace all chars after the third by *
sed -E ':A;s/([^:]*:...)(.*)[^*]([*]*)/\1\2\3*/;tA'
Some more awk
awk 'BEGIN{FS=OFS=":"}{s=sprintf("%0*d",length(substr($2,4)),0); gsub(/0/,"*",s);print $1,substr($2,1,3) s}' infile
You can use the %* form of printf, which accepts a variable width. If you print 0 as the value, right-aligned and zero-padded on the left, you get a string of zeros of the required length, which gsub then turns into stars.
More readable:
awk 'BEGIN{
FS=OFS=":"
}
{
s=sprintf("%0*d",length(substr($2,4)),0);
gsub(/0/,"*",s);
print $1,substr($2,1,3) s
}
' infile
Test Results:
$ awk --version
GNU Awk 3.1.7
Copyright (C) 1989, 1991-2009 Free Software Foundation.
$ cat f
John:boofoo
$ awk 'BEGIN{FS=OFS=":"}{s=sprintf("%0*d",length(substr($2,4)),0); gsub(/0/,"*",s);print $1,substr($2,1,3) s}' f
John:boo***
Another pure Bash, using the builtin regular expression predicate.
input="John:boofoo"
if [[ $input =~ ^([^:]*:...)(.*)$ ]]; then
printf '%s%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]//?/*}"
else
echo >&2 "String doesn't match pattern"
fi
We split the string in two parts: the first part being everything up to (and including) the three chars found after the first colon (stored in ${BASH_REMATCH[1]}), the second part being the remaining part of string (stored in ${BASH_REMATCH[2]}). If the string doesn't match this pattern, we just insult the user.
We then print the first part unchanged, and the second part with every character replaced with *.

Using a value from stored in a different file awk

I have a value stored in a file named cutoff1
If I cat cutoff1 it will look like
0.34722
I want to use the value stored in cutoff1 inside an awk script. Something like following
awk '{ if ($1 >= 'cat cutoff1' print $1 }' hist1.dat >hist_oc1.dat
I think I am making some mistakes. If I do manually it will look like
awk '{ if ($1 >= 0.34722) print $1 }' hist1.dat >hist_oc1.dat
How can I use the value stored in cutoff1 file inside the above mentioned awk script?
The easiest ways to achieve this are
awk -v cutoff="$(cat cutoff1)" '($1 >= cutoff){print $1}' hist.dat
awk -v cutoff="$(< cutoff1)" '($1 >= cutoff){print $1}' hist.dat
or
awk '(NR==FNR){cutoff=$1;next}($1 >= cutoff){print $1}' cutoff1 hist.dat
or
awk '($1 >= cutoff){print $1}' cutoff="$(cat cutoff1)" hist.dat
awk '($1 >= cutoff){print $1}' cutoff="$(< cutoff1)" hist.dat
Note: thanks to Glenn Jackman for pointing to this passage:
man bash Command substitution: Bash performs the expansion by executing command and replacing the command substitution with the
standard output of the command, with any trailing newlines deleted.
Embedded newlines are not deleted, but they may be removed during word
splitting. The command substitution $(cat file) can be replaced by
the equivalent but faster $(< file).
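Put together, the -v approach can be tried end to end on throwaway files. This is a sketch under assumptions: filter_at_least is a hypothetical helper name, the cutoff file is assumed to hold a single number, and the data values are made up. The + 0 forces a numeric comparison even if a value would otherwise be treated as a string.

```shell
# Hypothetical helper: filter_at_least CUTOFF_FILE DATA_FILE
filter_at_least() {
  awk -v cutoff="$(cat "$1")" '$1 + 0 >= cutoff + 0 { print $1 }' "$2"
}

# Demo with temporary files named after the ones in the question
dir=$(mktemp -d)
printf '0.34722\n' > "$dir/cutoff1"
printf '%s\n' 0.1 0.34722 0.9 > "$dir/hist1.dat"
filter_at_least "$dir/cutoff1" "$dir/hist1.dat"
rm -r "$dir"
```

For this made-up data the function prints 0.34722 and 0.9 and drops 0.1.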
Since awk can read multiple files, just add the filename before your data file and treat the first line specially. No need for an external variable declaration.
awk 'NR==1{cutoff=$1; next} $1>=cutoff{print $1}' cutoff data
PS: Just noticed that it's similar to @kvantour's second answer, but keeping it here as a different flavor.
You could use getline to read a value from another file at your convenience. First the main file to process:
$ cat > file
wait
wait
did you see that
nothing more to see here
And cutoff:
$ cat cutoff
0.34722
An awk script that reads a line from cutoff when it meets the string see in a record:
$ awk '/see/{if((getline val < "cutoff") > 0) print val}1' file
wait
wait
0.34722
did you see that
nothing more to see here
Explained:
$ awk '
/see/ { # when string see is in the line
if((getline val < "cutoff") > 0) # read a value from cutoff if there are any available
print val # and output the value from cutoff
}1' file # output records from file
As there was only one value, it was printed only once, even though see was seen twice.

Using awk to search for a line that starts with but also contains a string

I have a file with multiple lines that start with the same keyword. I only want to modify one of them, and it's easy to distinguish the two: I want the one under the [dbinfo] section. The domain name is static so I know that won't change.
awk -F '=' '$1 ~ /^dbhost/ {print $NF};' myfile.txt
myfile.txt
[ual]
path=/web/
dbhost=ez098sf
[dbinfo]
dbhost=ec0001.us-east-1.localdomain
dbname=ez098sf_default
dbpass=XXXXXX
You can use this awk command to first check for presence of [dbinfo] section and then modify dbhost parameter:
awk -v h='newhost' 'BEGIN{FS=OFS="="}
$0 == "[dbinfo]" {sec=1} sec && $1 == "dbhost"{$2 = h; sec=0} 1' file
[ual]
path=/web/
dbhost=ez098sf
[dbinfo]
dbhost=newhost
dbname=ez098sf_default
dbpass=XXXXXX
You want to utilize a little bit of a state machine here:
awk -F '=' '
$0 ~ /^\[.*\]/ {in_db_info=($0=="[dbinfo]")}
$0 ~ /^dbhost/{if (in_db_info) print $2;}' myfile.txt
You can also do it with sed:
sed '/\[dbinfo\]/,/\[/s/\(^dbhost=\).*/\1domain.com/' myfile.txt
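The state-machine idea can also be used to modify the value rather than print it, with the section, key and new value passed in as variables. This is a minimal sketch, assuming key=value lines under [section] headers as in myfile.txt; set_ini_value is a hypothetical helper name, and it reads from stdin and writes to stdout.

```shell
# Hypothetical helper: set_ini_value SECTION KEY VALUE < file
set_ini_value() {
  awk -v section="[$1]" -v key="$2" -v val="$3" '
    BEGIN { FS = OFS = "=" }
    /^\[/ { in_section = ($0 == section) }        # re-evaluate at every section header
    in_section && $1 == key { $0 = key OFS val }  # rewrite the line inside that section only
    1'
}

printf '%s\n' '[ual]' 'path=/web/' 'dbhost=ez098sf' '[dbinfo]' \
  'dbhost=ec0001.us-east-1.localdomain' 'dbname=ez098sf_default' 'dbpass=XXXXXX' |
  set_ini_value dbinfo dbhost newhost
```

Because the flag is recomputed at each header, a dbhost line in any other section is left alone, even when [dbinfo] itself has no dbhost entry. Note that awk processes backslash escapes in -v values, so a replacement value containing backslashes would need extra care.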

Resources