modify line in a .txt file in bash [duplicate] - bash

This question already has answers here:
Modify column 2 only using awk and sed
(2 answers)
Closed 6 years ago.
I have a .txt file whose lines contain data separated by ",", for example:
10,05,nov,2016,122,2,2,330,user
What I want is to be able to modify one field of a given line, where the search key is the first number, which is unique and not repeated.
For example, find the line whose first field is 10 (f1) and modify the field containing 122 (f5).
I've tried it with sed but I can't do it.
I've been told that with awk I could do it, but I haven't studied that command.
Any help?

A simple awk script like the following should do the trick:
awk -v find="10" -v field="5" -v newval="abcd" 'BEGIN {FS=OFS=","} {if ($1 == find) $field=newval; print $0}' test.csv
Explanation:
awk -v find="10" -v field="5" -v newval="abcd" : defines 3 variables for awk. find, that contains the pattern we are looking for,field that contains the number of the field we want to edit, and newval with the value to replace.
BEGIN {FS=OFS=","} : before iterating through the file, we set the File Separator and Output File Separator to ",".
if ($1 == find) $field=newval: if the 1rst field of a line contains the pattern we want, we set the Nth field (1st if $field=1, 2nd if $field=2, ...) to the value of newval
print $0: whatever the result from the if test, we print the whole line.
A shorter (but less understandable) version of this script could be written as follows:
awk -v a="10" -v f="5" -v n="abcd" -F, '$1 == a {$f=n}OFS=FS' test.csv
Where a refers to find, f refers to field, n refers to newval, and -F, is equivalent to FS=",".
Script in action:
> cat test.csv
11,05,nov,2016,122,2,2,330,user
10,05,nov,2016,123,2,2,330,user
12,05,nov,2016,124,2,2,330,user
> awk -v find="10" -v field="5" -v newval="abcd" 'BEGIN {FS=OFS=","} {if ($1 == find) $field=newval; print $0}' test.csv
11,05,nov,2016,122,2,2,330,user
10,05,nov,2016,abcd,2,2,330,user
12,05,nov,2016,124,2,2,330,user
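If the goal is to update test.csv itself rather than print to stdout, a couple of options (a sketch; the -i inplace flag requires GNU awk 4.1 or later):
> awk -i inplace -v find="10" -v field="5" -v newval="abcd" 'BEGIN {FS=OFS=","} {if ($1 == find) $field=newval; print $0}' test.csv
or, portably, write to a temporary file and move it back:
> awk -v find="10" -v field="5" -v newval="abcd" 'BEGIN {FS=OFS=","} {if ($1 == find) $field=newval; print $0}' test.csv > test.csv.tmp && mv test.csv.tmp test.csv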

With sed:
$ sed '/^10/s/,[^,]*/,333/4' <<< "10,05,nov,2016,122,2,2,330,user"
10,05,nov,2016,333,2,2,330,user
In lines starting with 10, find the 4th occurrence of a comma followed by non-comma characters and replace it with a comma and your substitution string.
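To edit the file in place instead of printing the result, the same substitution can be applied with sed -i (a sketch, assuming GNU sed and a file named file.txt; anchoring on ^10, avoids also matching keys such as 100):
$ sed -i '/^10,/s/,[^,]*/,333/4' file.txt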

Related

How can we use '~|~' delimiter to split the records using scripting command?

Please suggest how I can split the columns separated with the ~|~ delimiter (file: abc.dat).
a~|~1~|~x
b~|~1~|~y
c~|~2~|~z
I am trying the below awk command but am getting a count of 0 as output.
awk -F'~|~' '$2 == 1' ${file} | wc -l
With your shown samples, please try the following. We need not use the wc command along with awk; it can be done within awk itself.
awk -F'~\\|~' '$2 == 1{count++} END{print count}' "$file"
Explanation: Setting the field separator to ~|~ (with the | escaped here, since FS is treated as a regex). Then checking if the 2nd field is 1, and incrementing the variable count if so. In the END block of this program, printing its value.
For saving the value into a shell variable, use it like:
var=$(awk -F'~\\|~' '$2 == 1{count++} END{print count}' "$file")
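The likely reason the original attempt counts 0: with a multi-character -F value, awk treats the separator as a regular expression, so -F'~|~' means "~ OR ~" and the lines get split on every single ~, leaving $2 equal to "|" instead of 1. A quick way to see this (using the abc.dat sample above):
awk -F'~|~' '{print $2}' abc.dat
# prints "|" for every line, which is why $2 == 1 never matches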
You can also use ~[|]~ as the FS value, as a pipe char used inside a bracket expression always matches itself, a literal pipe char:
counter=$(awk 'BEGIN{FS="~[|]~"} $2==1{cnt++} END{print cnt}' file)
See the awk demo:
s='a~|~1~|~x
b~|~1~|~y
c~|~2~|~z'
counter=$(awk 'BEGIN{FS="~[|]~"} $2==1{cnt++} END{print cnt}' <<< "$s")
echo $counter
# => 2

Condition on Nth character of string in a Mth column in bash

I have a sample
$ cat c.csv
a,1234543,c
b,1231456,d
c,1230654,e
I need to grep only the lines where the 4th character of the 2nd column is not 0 or 1.
Output must be
a,1234543,c
I only know this:
awk -F, 'BEGIN { OFS = FS } $2 ~/^[2-9]/' c.csv
Is it possible to put a condition on 4th character?
Could you please try the following.
awk 'BEGIN{FS=","} substr($2,4,1)!=0 && substr($2,4,1)!=1' Input_file
OR as per Ed site's suggestion:
awk 'BEGIN{FS=","} substr($2,4,1)!~[01]' Input_file
Explanation: Adding a detailed explanation for above code here.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section from here.
FS="," ##Setting field separator as comma here.
} ##Closing BLOCK for this program BEGIN section.
substr($2,4,1)!=0 && substr($2,4,1)!=1 ##Checking conditions if 4th character of current line is NOT 0 and 1 then print the current line.
' Input_file ##Mentioning Input_file name here.
This might work for you (GNU sed or grep):
grep -vE '^([^,]*,){1}[^,]{3}[01]' file
or:
sed -E '/^([^,]*,){1}[^,]{3}[01]/d' file
Replace the 1 with m-1 for the m'th column and the 3 with n-1 for the n'th character in that column.
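For instance, to run the same test against the 2nd character of the 3rd column (a hypothetical variation, just to illustrate how the counts change), the repetition counts become {2} and {1}:
grep -vE '^([^,]*,){2}[^,]{1}[01]' file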
Grep is the answer.
But here is another way, using an array and variable substitution:
test=( $(cat c.csv) ) # load c.csv data to an array
echo ${test[@]//*,???[0-1]*/} # print all items from an array,
# but remove the ones that correspond to this regex *,???[0-1]*
# so 'b,1231456,d' and 'c,1230654,e' from example will be removed
# and only 'a,1234543,c' will be printed
There are many ways to do this with awk. The most literal form would be:
4th character of 2nd column is not 0 or 1
$ awk -F, '($2 !~ /^...[01]/)' file
$ awk -F, '($2 ~ /^...[^01]/)' file
These will also match a line like a,abcdefg,b.
2nd column is an integer and 4th character is not 0 or 1
$ awk -F, '($2+0==$2) && ($2 !~ /[.]/) && ($2 !~ /^...[01]/)' file
$ awk -F, '($2 ~ /^[0-9][0-9][0-9][^01][0-9]*$/)'
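Run against the sample c.csv from the question, both integer-aware forms should keep only the first line (a quick check, not exhaustive):
$ awk -F, '($2 ~ /^[0-9][0-9][0-9][^01][0-9]*$/)' c.csv
a,1234543,c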

Prepend text to specific line numbers with variables

I have spent hours trying to solve this. There are a bunch of answers on how to prepend to all lines or to specific lines, but not with variable text and a variable line number.
while [ $FirstVariable -lt $NextVariable ]; do
#sed -i "$FirstVariables/.*/$FirstVariableText/" "$PWD/Inprocess/$InprocessFile"
cat "$PWD/Inprocess/$InprocessFile" | awk 'NR==${FirstVariable}{print "$FirstVariableText"}1' > "$PWD/Inprocess/Temp$InprocessFile"
FirstVariable=$[$FirstVariable+1]
done
Essentially I am looking for a particular string delimiter, then figuring out where the next one is, and appending the first result back onto the following lines... Note that I have already figured out the logic; I am just having issues prepending to the lines using the variables.
Example:
This >
Line1:
1
2
3
Line2:
1
2
3
Would turn into >
Line1:
Line1:1
Line1:2
Line1:3
Line2:
Line2:1
Line2:2
Line2:3
You can do all that using the below awk one-liner.
Assuming your pattern starts with Line, the below script can be used.
> awk '{if ($1 ~ /Line/ ){var=$1;print $0;}else{ if ($1 !="")print var $1}}' $PWD/Inprocess/$InprocessFile
Line1:
Line1:1
Line1:2
Line1:3
Line2:
Line2:1
Line2:2
Line2:3
Here is how the above script works:
If the first field of a record matches the word Line, it is copied into an awk variable var and the record is printed as-is. For the following records, if the record is not empty, var is printed in front of the first field, producing the desired result.
If you need to pass the variables dynamically from the shell to awk, you can use the -v option, like below:
awk -v var1="$FirstVariable" -v var2="$FirstVariableText" 'NR==var1{print var2}1' "$PWD/Inprocess/$InprocessFile" > "$PWD/Inprocess/Temp$InprocessFile"
The way you addressed the problem is by using both bash and awk to process the file: you make use of bash to loop over the lines, and then use awk to manipulate one line at a time. The whole thing can actually be done with a single awk script:
awk '/^Line/{str=$1; print; next}{print (NF ? str $0 : "")}' inputfile > outputfile
or
awk 'BEGIN{RS="";ORS="\n\n";FS=OFS="\n"}{gsub(FS,OFS $1)}1' inputfile > outputfile

Using a value stored in a different file in awk

I have a value stored in a file named cutoff1
If I cat cutoff1 it will look like
0.34722
I want to use the value stored in cutoff1 inside an awk script. Something like the following:
awk '{ if ($1 >= 'cat cutoff1' print $1 }' hist1.dat >hist_oc1.dat
I think I am making some mistakes. If I do it manually, it will look like:
awk '{ if ($1 >= 0.34722) print $1 }' hist1.dat >hist_oc1.dat
How can I use the value stored in cutoff1 file inside the above mentioned awk script?
The easiest ways to achieve this are
awk -v cutoff="$(cat cutoff1)" '($1 >= cutoff){print $1}' hist.dat
awk -v cutoff="$(< cutoff1)" '($1 >= cutoff){print $1}' hist.dat
or
awk '(NR==FNR){cutoff=$1;next}($1 >= cutoff){print $1}' cutoff1 hist.dat
or
awk '($1 >= cutoff){print $1}' cutoff="$(cat cutoff1)" hist.dat
awk '($1 >= cutoff){print $1}' cutoff="$(< cutoff1)" hist.dat
note: thanks to Glenn Jackman for pointing to man bash, Command Substitution:
"Bash performs the expansion by executing command and replacing the command substitution with the standard output of the command, with any trailing newlines deleted. Embedded newlines are not deleted, but they may be removed during word splitting. The command substitution $(cat file) can be replaced by the equivalent but faster $(< file)."
Since awk can read multiple files, just add the filename before your data file and treat the first line specially. No need for an external variable declaration.
awk 'NR==1{cutoff=$1; next} $1>=cutoff{print $1}' cutoff data
PS: Just noticed that it's similar to @kvantour's second answer, but keeping it here as a different flavor.
You could use getline to read a value from another file at your convenience. First the main file to process:
$ cat > file
wait
wait
did you see that
nothing more to see here
And cutoff:
$ cat cutoff
0.34722
An awk script that reads a line from cutoff when it meets the string see in a record:
$ awk '/see/{if((getline val < "cutoff") > 0) print val}1' file
wait
wait
0.34722
did you see that
nothing more to see here
Explained:
$ awk '
/see/ { # when string see is in the line
if((getline val < "cutoff") > 0) # read a value from cutoff if there are any available
print val # and output the value from cutoff
}1' file # output records from file
As there was only one value in cutoff, it was printed only once even though see was seen twice.

awk syntax — what is OFS and why the 1 at the end?

awk -F"\t" -v OFS="\t" '{if($18~/^ *[0-9]*(\.[0-9]+)?" *$/)sub(/"/,"",$18);else $18=" "}1' sample.txt
The code above is some awk code used in a script I'm modifying. I'm new to Unix, so I am not able to understand the syntax of the awk above.
-F is for splitting the columns with the delimiter.
What is OFS?
And what is the use of 1 at the end of the awk script?
-v OFS="\n" passes a param named OFS from the shell to the awk script. Like the -F option or FS it is the field separator - but for the output. It is called the output field separator
You can test it:
awk -v OFS=' ' '{print 1,2}' a.txt
Output separated by spaces:
1 2
1 2
awk -v OFS=';' '{print 1,2}' a.txt
Output separated by ;:
1;2
1;2
In your case it means that the output will be separated by tabs (like the input).
The 1 at the end of the awk script makes awk print the current line (including any modifications made by the preceding block). That's because an awk script usually consists of tests (regexes, etc.) and actions for them. The test 1 is always true, and as the default action of awk is to print the current record, it prints the line.
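A minimal sketch of that behaviour (hypothetical two-line input, just for illustration):
$ printf '1,2\n3,4\n' | awk -F, -v OFS=, '{$1="X"}1'
X,2
X,4
# the trailing 1 is an always-true test with the default action,
# i.e. the same as writing '{$1="X"} {print $0}'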
