Replace third column on all rows of a file in shell - bash

I have a file that has about 60 columns of data. The file is also about 80 million records long. I need a bash command to replace the third column with '20190113'. How do we determine it is the third column? The fields are delimited by the non-printable character '\001'.
So: replace the third field on all records of a file delimited by the special character '\001' with the value '20190113'.

awk can handle non-printing characters, including \001.
$ cat -v test.in
abc^Axyz^Afoo
def^Awvu^Abar
$ awk '{$3 = "20190113"}1' FS=$'\1' OFS=$'\1' test.in | cat -v
abc^Axyz^A20190113
def^Awvu^A20190113
$'…' is a construction supported by most shells that lets you use escape characters.
^A represents the \001 character; -v tells cat to print that instead of a literal non-printing \001 byte.
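With roughly 80 million records, a single streaming awk pass like this is about as fast as it gets. awk cannot edit a file in place, so a typical workflow (the file names here are placeholders) writes to a temporary file and swaps it in:
$ awk '{$3 = "20190113"}1' FS=$'\1' OFS=$'\1' test.in > test.out && mv test.out test.in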

Not as elegant as awk, but here is a method with sed.
a=$(printf "1\0012\0013\0014\0015")
# check
echo "$a" | hexdump -c
b=$(echo "$a" | sed -r 's/([^\x01]*\x01[^\x01]*\x01)[^\x01]*\x01/\120190113\x01/')
# check
echo "$b" | hexdump -c

You can use the hex format "\xdd" to specify the delimiters for awk.
Just set the Input and Output delimiters in the BEGIN section.
$ cat -v brian.txt
abc^Axyz^Afoo
def^Awvu^Abar
$ awk ' BEGIN{ FS=OFS="\x01"} { $3="20190113"; print } ' brian.txt
abcxyz20190113
defwvu20190113
$ awk ' BEGIN{ FS=OFS="\x01"} { $3="20190113"; print } ' brian.txt | cat -v
abc^Axyz^A20190113
def^Awvu^A20190113
$
You can also try it with Perl:
$ perl -F"\x01" -lane ' $F[2]="20190113"; print join("\x01",@F) ' brian.txt
abcxyz20190113
defwvu20190113
$ perl -F"\x01" -lane ' $F[2]="20190113"; print join("\x01",#F) ' brian.txt | cat -v
abc^Axyz^A20190113
def^Awvu^A20190113
$
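If the goal is to update the file rather than print to stdout, the same Perl one-liner can edit in place with -i (here keeping a .bak backup):
$ perl -i.bak -F"\x01" -lane ' $F[2]="20190113"; print join("\x01",@F) ' brian.txt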

This might work for you (GNU sed):
sed 's/[^[.\d1.]]*/20190113/3' file
This replaces the third run of characters that are not \001 (i.e. the third field) with the string 20190113 on every line of the file; [.\d1.] is GNU sed's collating-symbol syntax wrapping its \d1 escape, the character with decimal value 1, i.e. \001.

Related

Ignore comma after backslash in a line in a text file using awk or sed

I have a text file containing several lines of the following format:
name,list_of_subjects,list_of_sports,school
Eg1: john,science\,social,football,florence_school
Eg2: james,painting,tennis\,ping_pong\,chess,highmount_school
I need to parse the text file and print the fields while ignoring the escaped commas. Here those are fields 2 and 3, like this:
science, social
tennis, ping_pong, chess
I do not know how to ignore escaped characters. How can I do it with awk or sed in terminal?
Substitute \, with a character that your records do not contain normally (e.g. \n), and restore it before printing. For example:
$ awk -F',' 'NR>1{ if(gsub(/\\,/,"\n")) gsub(/\n/,",",$2); print $2 }' file
science,social
painting
Since the first gsub is performed on the whole record (i.e. $0), awk is forced to recompute the fields. But the second one is performed only on the second field (i.e. $2), so it will not affect the other fields. See: Changing Fields.
To be able to extract multiple fields with properly escaped commas you need to gsub the \ns back in all fields with a for loop, as in the following example:
$ awk 'BEGIN{ FS=OFS="," } NR>1{ if(gsub(/\\,/,"\n")) for(i=1;i<=NF;++i) gsub(/\n/,"\\,",$i); print $2,$3 }' file
science\,social,football
painting,tennis\,ping_pong\,chess
See also: What's the most robust way to efficiently parse CSV using awk?.
You could replace the \, sequences with another character that won't appear in your text, split the text around the remaining commas, then restore the chosen character to commas:
sed $'s/\\\\,/\x1f/g' input | awk -F, '{ printf "Name: %s\nSubjects : %s\nSports: %s\nSchool: %s\n\n", $1, $2, $3, $4 }' | tr $'\x1f' ','
In this case the stand-in is the ASCII control character "Unit Separator" (0x1f), which I'm pretty sure your input won't contain.
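A quick check on the first sample record (using echo in place of the input file):
$ echo 'Eg1: john,science\,social,football,florence_school' | sed $'s/\\\\,/\x1f/g' | awk -F, '{ printf "Name: %s\nSubjects : %s\nSports: %s\nSchool: %s\n\n", $1, $2, $3, $4 }' | tr $'\x1f' ','
Name: Eg1: john
Subjects : science,social
Sports: football
School: florence_school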
Why awk and sed when bash with coreutils is just enough:
# Sorry my cat. Using `cat` as input pipe
cat <<EOF |
name,list_of_subjects,list_of_sports,school
Eg1: john,science\,social,football,florence_school
Eg2: james,painting,tennis\,ping_pong\,chess,highmount_school
EOF
# remove first line!
tail -n+2 |
# substitute `\,` by an unreadable character:
sed 's/\\\,/\xff/g' |
# read the comma separated list
while IFS=, read -r name list_of_subjects list_of_sports school; do
# read the \xff separated list into an array
IFS=$'\xff' read -r -d '' -a list_of_subjects < <(printf "%s" "$list_of_subjects")
# read the \xff separated list into an array
IFS=$'\xff' read -r -d '' -a list_of_sports < <(printf "%s" "$list_of_sports")
echo "list_of_subjects : ${list_of_subjects[#]}"
echo "list_of_sports : ${list_of_sports[#]}"
done
will output:
list_of_subjects : science social
list_of_sports : football
list_of_subjects : painting
list_of_sports : tennis ping_pong chess
Note that this will most probably be slower than the solutions using awk.
Note that the principle of operation is the same as in the other answers: substitute the \, string with some other unique character and then use that character to iterate over the second and third field elements.
This might work for you (GNU sed):
sed -E 's/\\,/\n/g;y/,\n/\n,/;s/^[^,]*$//Mg;s/\n//g;/^$/d' file
Replace the escaped commas with newlines, then transliterate so the original separators become newlines and the escaped commas become commas again. Blank out every embedded line that does not contain a comma, remove the remaining newlines, and delete empty lines.
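Checked against the sample file (GNU sed is required for the M flag), the header and the fields without escaped commas drop out, leaving:
$ sed -E 's/\\,/\n/g;y/,\n/\n,/;s/^[^,]*$//Mg;s/\n//g;/^$/d' file
science,social
tennis,ping_pong,chess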
Using Perl: change the \, to some control char, say \x01, and then replace it with , again.
$ cat laxman.txt
john,science\,social,football,florence_school
james,painting,tennis\,ping_pong\,chess,highmount_school
$ perl -ne ' s/\\,/\x01/g and print ' laxman.txt | perl -F, -lane ' for(@F) { if( /\x01/ ) { s/\x01/,/g ; print } } '
science,social
tennis,ping_pong,chess
You can perhaps join columns with a function.
function joincol(col,  i) {
    $col = $col FS $(col+1)
    for (i = col+1; i < NF; i++) {
        $i = $(i+1)
    }
    NF--
}
This might get used thusly (FS and OFS must be set to the comma, and note the while: a joined field can itself still end in a backslash and need joining again):
BEGIN { FS = OFS = "," }
{
    for (col = 1; col <= NF; col++) {
        while ($col ~ /\\$/) {
            joincol(col)
        }
    }
}
Note that decrementing NF is undefined behaviour in POSIX. It may delete the last field, or it may not, and still be POSIX compliant. This works for me in BSD awk and GNU awk. YMMV. May contain nuts.
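Wired up as a one-shot command against the question's sample file ("file" is a placeholder, and the BEGIN block and the final print $3 are assumptions about how you'd use it):
$ awk 'function joincol(col, i){$col=$col FS $(col+1); for(i=col+1;i<NF;i++) $i=$(i+1); NF--}
       BEGIN{FS=OFS=","}
       {for(col=1;col<=NF;col++) while($col~/\\$/) joincol(col); print $3}' file
list_of_sports
football
tennis\,ping_pong\,chess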
Use gawk's FPAT:
awk -v FPAT='(\\\\.|[^,\\\\]*)+' '{print $3}' file
#list_of_sports
#football
#tennis\,ping_pong\,chess
then use gensub() to strip the backslashes:
awk -v FPAT='(\\\\.|[^,\\\\]*)+' '{print gensub("\\\\", "", "g", $3)}' file
#list_of_sports
#football
#tennis,ping_pong,chess

awk to ignore leading and trailing space and blank lines and commented lines if any from a file

Need help on awk:
awk to ignore leading and trailing spaces, blank lines, and commented lines, if any, from a file.
Here you go,
grep "MyText" FromMyLog.log | awk -F " " '{print $2}' | awk -F "#" '{print $1}'
Here MyText is the pattern grepped from the file FromMyLog.log.
-F sets the field separator, here the space between the quotes.
'{print $2}' prints the 2nd field of the output; use $1, $2, etc. as your requirement dictates.
awk -F "#" '{print $1}' drops everything after a #, which ignores the commented part of a line.
This is just a hint; modify the code to your requirements. This works for me when filtering with grep.
grep -v '^$\|^\s*#' <filename>, or egrep -v '^[[:space:]]*$|^ *#' <file_name> if blank lines may contain whitespace.
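A quick sanity check of the second form (grep -E is the modern spelling of egrep):
$ printf '   \n# comment\n  # indented\nkeep me\n' | grep -E -v '^[[:space:]]*$|^ *#'
keep me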
I think this is what you were asking for:
$> echo -e ' abc \t
\t efg
# alskdjfl
#
awk
# askdfh
' |
awk '
# match if the first non-space character is not a hash sign
/^[[:space:]]*[^[:space:]#]/ {
    # delete spaces from the start and the end of the line
    sub(/^[[:space:]]*/, "");
    sub(/[[:space:]]*$/, "");
    print
}'
abc
efg
awk
This can be folded onto a single line if so needed. Any problems, an actual example of the input (in a code block in your question) would be helpful.
Here's one way to extract required content ignoring spaces
FILE CONTENT
Server: 192.168.XX.XX
Address 1: 192.168.YY.YY
Name: central.google.com
Now to extract the server's address without spaces.
COMMAND
awk -F':' '/Server/ {print $2}' YOURFILENAME | tr -s " "
option -s for squeezing the repetition of spaces.
which gives,
192.168.XX.XX
Here, notice that there is one leading space in the address.
To completely ignore spaces you can change that to,
awk -F':' '/Server/ {print $2}' YOURFILENAME | tr -d '[:space:]'
option -d for removing particular characters, which is [:space:] here.
which gives,
192.168.YY.YY
without any leading or trailing spaces.
tr is a UNIX utility for translating, deleting, or squeezing repeated characters; the name stands for "translate".
Example:
echo youareawesome | tr '[:lower:]' '[:upper:]'
gives,
YOUAREAWESOME
Hope that helps.

Count number of Special Character in Unix Shell

I have a delimited file that is separated by octal \036 or Hexadecimal value 1e.
I need to count the number of delimiters on each line using a bash shell script.
I was trying to use awk, not sure if this is the best way.
Sample Input (| is a representation of \036)
Example|Running|123|
Expected output:
3
awk -F'|' '{print NF-1}' file
Change | to whatever separator you like. If your file can have empty lines then you need to tweak it to:
awk -F'|' '{print (NF ? NF-1 : 0)}' file
You can try
awk '{print gsub(/\|/,"")}'
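gsub() returns the number of substitutions it made, which is what makes this count the delimiters. With the real \036 separator the same idea reads (gawk accepts the octal escape in a regexp):
awk '{print gsub(/\036/,"")}' file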
Simply try
awk -F"|" '{print substr($3,length($3))}' OFS="|" Input_file
Explanation: set the field separator -F to |, print the last character of the 3rd column with substr($3,length($3)), set OFS (the output field separator) to |, and finally mention the Input_file name. (Caveat: this prints the last character of the third field, which happens to equal the delimiter count in the sample input; it does not actually count delimiters.)
This will work as far as I know
echo "Example|Running|123|" | tr -cd '|' | wc -c
Output
3
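That counts a single string. To count per line across a whole file with the real \036 delimiter, a plain (if slow) bash loop works; "file" is a placeholder:
while IFS= read -r line; do
    printf '%s' "$line" | tr -cd $'\036' | wc -c
done < file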
This should work for you:
awk -F '\036' '{print NF-1}' file
3
-F '\036' sets input field delimiter as octal value 036
Awk may not be the best tool for this. GNU grep has a cool -o option that prints each matching pattern on a separate line. You can then count how many matching lines are generated for each input line, and that's the count of your delimiters. E.g. (where ^^ in the file is actually hex 1e):
$ cat -v i
a^^b^^c
d^^e^^f^^g
$ grep -n -o $'\x1e' i | uniq -c
2 1:
3 2:
If you remove the uniq -c you can see how it's working: you'll get "1" printed twice because there are two matching patterns on the first line. Or try it with some regular ASCII characters and it becomes clearer what the -o and -n options are doing.
If you want to print the line number followed by the delimiter count for that line, I'd do something like:
$ grep -n -o $'\x1e' i | tr -d ':' | uniq -c | awk '{print $2 " " $1}'
1 2
2 3
This assumes that every line in the file contains at least one delimiter. If that's not the case, here's another approach that's probably faster too:
$ tr -d -c $'\x1e\n' < i | awk '{print length}'
2
3
0
0
0
This uses tr to delete (-d) all characters that are not (-c) 1e or \n. It then pipes that stream of data to awk which just counts how many characters are left on each line. If you want the line number, add " | cat -n" to the end.

How to retrieve digits including the separator "."

I am using grep to get a string like this: ANS_LENGTH=266.50 then I use sed to only get the digits: 266.50
This is my full command: grep --text 'ANS_LENGTH=' log.txt | sed -e 's/[^[[:digit:]]]*//g'
The result is : 26650
How can this line be changed so the result still shows the separator: 266.50
You don't need grep if you are going to use sed; just use sed's // address to match the lines you need to print.
sed -n '/ANS_LENGTH/s/[^=]*=\(.*\)/\1/p' log.txt
-n suppresses printing of lines that do not match /ANS_LENGTH/.
Using the captured group we print the value after the = sign.
The p flag at the end prints the lines on which the substitution succeeded.
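For example, on a line like the one your grep produces:
$ echo 'ANS_LENGTH=266.50' | sed -n '/ANS_LENGTH/s/[^=]*=\(.*\)/\1/p'
266.50
Note that \(.*\) captures everything after the first =, so this assumes the value is the last thing on the line.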
If your grep happens to support -P option then you can do:
grep -oP '(?<=ANS_LENGTH=).*' log.txt
(?<=...) is a look-behind construct that matches only what follows ANS_LENGTH= without including it; it requires the -P (PCRE) option.
-o prints only the matching part of each line.
You need to match a literal dot as well as the digits.
Try sed -e 's/[^[:digit:].]*//g'
Inside a bracket expression the dot is already literal, so simply adding it to the set preserves it along with the digits. The doubled brackets in your original were the bug: [[:digit:]] is itself a complete bracket expression, and the extra pair made the command delete every non-digit character, dot included.
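With that fix the original pipeline works as intended (assuming log.txt lines of the form ANS_LENGTH=266.50):
$ grep --text 'ANS_LENGTH=' log.txt | sed -e 's/[^[:digit:].]*//g'
266.50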
Here are some awk examples:
cat file:
some data ANS_LENGTH=266.50 other=22
not mye data=43
GNU awk (needed because a regex RS is a gawk extension)
awk '/ANS_LENGTH/ {f=NR} f&&NR-1==f' RS="[ =]" file
266.50
awk '/ANS_LENGTH/ {getline;print}' RS="[ =]" file
266.50
Plain awk
awk -F"[ =]" '{for(i=1;i<=NF;i++) if ($i=="ANS_LENGTH") print $(i+1)}' file
266.50
awk '{for(i=1;i<=NF;i++) if ($i~"ANS_LENGTH") {split($i,a,"=");print a[2]}}' file
266.50

Shell: Substitute a string between 2 known strings

I wish to replace the contents of the new_version variable (13.2.0/8) in between the abc_def_APP and application1.war strings in file1.
Script:
#!/bin/ksh
new_version="13.0.5/8"
old_version=($(grep -r "location=.*application1.war" /path/file1 | awk '{print ($1)}' | cut -f8- -d"/" | sed 's/.\{1\}$//'))
echo "$old_version"  # This gives me the version number from file1 which needs to be replaced (13.2.0/9)
File1 Contents:
location="cc://view/blah/blah/blah/abc_def_APP/13.2.0/9/application1.war"
Use the following sed command for the replacement. Since $new_version contains a slash, use # as the s### delimiter so the variable can be interpolated directly:
sed -i.bak -r "s#^(.*/abc_def_APP/).*(/application1\.war.*)#\1$new_version\2#" /path/file1
With GNU awk (for gensub()):
$ cat file
location="cc://view/blah/blah/blah/abc_def_APP/13.2.0/9/application1.war"
$ new_version="13.2.0/8"
$ gawk -v nv="$new_version" '{$0=gensub(/^(location.*abc_def_APP\/).*(\/application1.war.*)/,"\\1" nv "\\2","")}1' file
location="cc://view/blah/blah/blah/abc_def_APP/13.2.0/8/application1.war"
The difference between this and a sed solution is that awk doesn't require you to jump through hoops due to your new_version variable containing a "/" (or any other character).
