Bash: Filtering records in a file based on a multi-character delimiter

I need help in Bash filtering records based on a multi-character delimiter.
The delimiter is |^^|
Sample record:
xyz#ATT.NET|^^|xyz|^^|307
Awk processes the file when I use a single-character delimiter, but not with a multi-character one:
awk -F"|^^|" "NF !=3 {print}" file.txt
Any suggestions?

The issue is that every character in your delimiter is a regexp metacharacter so you need to escape them when appropriate so awk knows you want them treated literally. This might be overkill:
awk -F'\\|\\^\\^\\|' 'NF!=3' file.txt
but I can't test it, since you provided only one line of input rather than a set of lines, some matching and some not, which would be needed to test the script.
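A quick way to check the escaped separator is to run it against a few made-up records. The sketch below assumes three hypothetical rows (only the first comes from the question) written to a temporary file; it should print the rows that do not have exactly three fields.

```shell
# Hypothetical sample: the 3-field row from the question plus made-up 2- and 4-field rows.
tmp=$(mktemp)
printf '%s\n' \
  'xyz#ATT.NET|^^|xyz|^^|307' \
  'abc#ATT.NET|^^|abc' \
  'def#ATT.NET|^^|def|^^|12|^^|extra' > "$tmp"

# Every metacharacter in |^^| is escaped so awk treats the separator literally.
bad=$(awk -F'\\|\\^\\^\\|' 'NF != 3' "$tmp")
printf '%s\n' "$bad"
rm -f "$tmp"
```

With these rows, only the second and third lines should be printed.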

awk -F "<regex>" ...
The argument to -F is not a multi-column delimiter; it is a regular expression.
A simple regex, such as one that matches a single character, is what you are used to,
but that is not all there is.

One way is to escape all the regex characters, as Ed Morton's answer does.
Alternatively, you can replace every |^^| with a single character that never appears in your file content, say a comma:
sed 's/|^^|/,/g' file.txt
xyz#ATT.NET,xyz,307
The command would be
sed 's/|^^|/,/g' file.txt | awk -F, 'NF != 3'
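The comma only works if it never occurs in the data. A hedged variant is to use a control character such as the ASCII unit separator (octal 037) and to check first that it is absent from the input. The sample rows, the variable names, and the temporary file below are assumptions for illustration.

```shell
# Hypothetical sample: the second row contains a comma inside a field.
tmp=$(mktemp)
printf '%s\n' 'xyz#ATT.NET|^^|xyz|^^|307' 'a,b|^^|c' > "$tmp"

sep=$(printf '\037')   # ASCII unit separator, unlikely to appear in text data

if grep -q "$sep" "$tmp"; then
  echo "separator already present in input; choose another" >&2
  bad_lines=
else
  # Replace every |^^| with the separator, then report line numbers lacking 3 fields.
  bad_lines=$(sed "s/|^^|/$sep/g" "$tmp" | awk -F"$sep" 'NF != 3 { print NR }')
fi
printf '%s\n' "$bad_lines"
rm -f "$tmp"
```

With a comma as the replacement character, the second row would be counted as three fields and slip through; with the unit separator it is reported as line 2.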

Related

How to remove string between two characters and before the first occurrence using sed

I would like to remove the string between ":" and the first "|" using sed.
input:
|abc:1.2.3|def|
output from sed:
|abc|def|
I managed to come up with sed 's|\(:\)[^|]*|\1|', but this sed command does not remove the first character (":"). How can I modify this command to also remove the colon?
You don't need to group : in your pattern and use it in substitution.
You should keep it simple:
s='|abc:1.2.3|def|'
sed 's/:[^|]*//' <<< "$s"
|abc|def|
: matches a colon and [^|]* matches 0 or more non-pipe characters
1st solution: With awk, you could try the following program.
awk 'match($0,/:[^|]*/){print substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH)}' Input_file
Explanation: The match function looks for a colon followed by everything up to, but not including, the next |. Whenever the regex matches, match sets the built-in variables RSTART and RLENGTH. Based on those, we print the substring before the match and the substring after it, which drops the matched part and gives the output required in the question.
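To see what match records, the following sketch prints RSTART and RLENGTH for the sample line from the question before printing the trimmed line. The variable names line and result are only for illustration.

```shell
line='|abc:1.2.3|def|'

# Print where the match starts, how long it is, and the line with the match cut out.
result=$(printf '%s\n' "$line" |
  awk 'match($0, /:[^|]*/) {
         print RSTART, RLENGTH
         print substr($0, 1, RSTART - 1) substr($0, RSTART + RLENGTH)
       }')
printf '%s\n' "$result"
```

For this input the match is ":1.2.3", which starts at character 5 and is 6 characters long, so the output is "5 6" followed by "|abc|def|".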
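2nd solution: Using the FPAT variable in GNU awk, which defines fields by their content rather than by their separators. Here each field is a run of characters that are neither a colon nor a pipe, so the part to drop is field 2. Written and tested with your shown samples only.
awk -v FPAT='[^:|]+' '{print "|" $1 "|" $3 "|"}' Input_file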

Shell Script Replace a Specified Column with sed

I have an example dataset separated by semicolons, as below:
123;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
I would like to replace values in a specified column. Let's say I want to change "ZMIR" to "IZMIR", but only in the third column; the values in the second column must stay the same.
Desired output is;
123;IZMIR;IZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;IZMIR;bob
BBB;ANKR;RRRR;ABC
I tried;
sed 's/;ZMIR;/;IZMIR;/' file.txt
The problem is that it also changes values outside the 3rd column.
I also tried;
awk -F";" '{gsub("ZMIR",";IZMIR;",$2)}1'
This does target a column, but it somehow adds spaces:
123 I;IZMIR; ZMIR 123
abc;ANKAR;aaa;999
AAA ;IZMIR; ZMIR bob
BBB;ANKR;RRRR;ABC
sed doesn't know about columns, awk does (but in awk they're called "fields"):
awk 'BEGIN{FS=OFS=";"} $3=="ZMIR"{$3="IZMIR"} 1' file
Note that since the above is doing a literal string search and replace, you don't have to worry about regexp or backreference metacharacters in the search or replacement strings, unlike in a sed solution (see https://stackoverflow.com/a/29626460/1745001).
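The same comparison can be parameterised. This sketch assumes hypothetical awk variables col, old, and new passed with -v, and recreates the question's data in a temporary file.

```shell
tmp=$(mktemp)
printf '%s\n' \
  '123;IZMIR;ZMIR;123' \
  'abc;ANKAR;aaa;999' \
  'AAA;ZMIR;ZMIR;bob' \
  'BBB;ANKR;RRRR;ABC' > "$tmp"

# col, old and new are illustrative names; the field is compared as a whole string,
# so no regex metacharacters are involved.
out=$(awk -v col=3 -v old='ZMIR' -v new='IZMIR' '
  BEGIN { FS = OFS = ";" }
  $col == old { $col = new }
  1
' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

One caveat: values passed with -v have backslash escape sequences interpreted, so strings containing backslashes are safer passed through the environment and read with ENVIRON.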
wrt what you tried previously with awk:
awk -F";" '{gsub("ZMIR",";IZMIR;",$2)}1'
That says: find "ZMIR" in the 2nd semi-colon-separated field and replace it with ";IZMIR;" and also change every existing ";" on the line to a blank character.
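The separator change is easy to reproduce on a one-line input. In this small sketch (the variable names are made up), the first command leaves OFS at its default and the second sets it to match FS.

```shell
# Assigning to a field makes awk rebuild the record using OFS (a space by default).
default_ofs=$(echo 'a;b;c' | awk -F';' '{ $2 = "X" } 1')

# Setting OFS to the same character as FS keeps the original separator.
same_ofs=$(echo 'a;b;c' | awk 'BEGIN { FS = OFS = ";" } { $2 = "X" } 1')

printf '%s\n' "$default_ofs" "$same_ofs"
```

The first line of output is "a X c" and the second is "a;X;c".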
To learn awk, read the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
If you know exactly where the word to replace is located and how many occurrences there are on that line, you could use sed with something like:
sed '3 s/ZMIR/IZMIR/2'
The 3 at the beginning selects the third line, and the 2 at the end selects the second occurrence on that line. The awk solution is the better one, but now you know how it works in sed ;)
This might work for you (GNU sed):
sed -r 's/[^;]+/\n&\n/3;s/\nZMIR\n/IZMIR/;s/\n//g' file
Surround the required field by unique markers then replace the required string (plus markers) by the replacement string. Finally remove the unique markers.
Perl on the command line
Input
123;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
$. == 1 restricts the substitution to the first row; for the second row, use $. == 2
$F[0] holds the value of the first column, and the substitution replaces the first occurrence of that value on the line; the fourth column would be $F[3]
-a -F\; splits each line into fields using ; as the delimiter
What you want:
perl -a -F\; -pe 's/$F[0]/***/ if $. == 1' your-file
output
***;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
for row == 2 and column == 2
perl -a -F\; -pe 's/$F[1]/***/ if $. == 2' your-file
123;IZMIR;ZMIR;123
abc;***;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
Also without -a -F
perl -pe 's/123/***/ if $. == 1' your-file
output
***;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
If you want to edit the file itself, add the -i option, which means edit in place. It simply finds, replaces, and saves to the same file:
perl -i -a -F\; and so on
You need to include some absolute references in the pattern:
^ for the beginning of the line
an unambiguous separator pattern
^.*ZMIR and [^;]*;ZMIR match different text: .* takes everything before ZMIR, and sed always takes the longest possible match
Specific:
sed 's/^\([^;]*;[^;]*;\)ZMIR;/\1IZMIR;/' YourFile
Generic, where Old and New are shell variables (remember that their values are used as regex text, so regex rules apply, such as escaping special characters):
Old='ZMIR'
New='IZMIR'
sed 's/^\(\([^;]*;\)\{2\}\)'"${Old}"';/\1'"${New}"';/' YourFile
In this simple case sed is an alternative, but awk is better for complex or long lines.
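One hedged way to wrap the generic form is a small shell function. The name replace_in_column and its argument order are made up for this sketch. It uses sed -E (extended regex, accepted by GNU and BSD sed) and also accepts end of line after the value, so the last column can be targeted too. The old value is still interpreted as a regex, and neither value may contain / or &.

```shell
# replace_in_column COL OLD NEW FILE  (hypothetical helper, not part of sed)
replace_in_column() {
  skip=$(( $1 - 1 ))   # number of fields to step over before the target column
  # Group 1: the skipped fields; group 3: the ';' or end of line after the value.
  sed -E "s/^(([^;]*;){$skip})$2(;|\$)/\1$3\3/" "$4"
}

tmp=$(mktemp)
printf '%s\n' \
  '123;IZMIR;ZMIR;123' \
  'abc;ANKAR;aaa;999' \
  'AAA;ZMIR;ZMIR;bob' \
  'BBB;ANKR;RRRR;ABC' > "$tmp"

out=$(replace_in_column 3 ZMIR IZMIR "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

Because the pattern is anchored at the start of the line and each skipped field is [^;]*;, the match can only land on the requested column.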

How to escape single quote using awk in a bash script

This is what I have so far; I tried multiple ways but can't get it just right.
My goal is to sanitize the input to prevent problems when loading it into MySQL from a text file.
cat 'file.txt' | awk '{gsub(/'"'"'/, '.') ; print $0}' > 'file_sanitized.txt'
What you want is:
awk '{gsub(/\047/,".")}1' file
See http://awk.freeshell.org/PrintASingleQuote.
Since you requested "how to escape single quote using awk" here's a solution using awk:
awk '{ gsub("\x27", ".");print $0}'
where \x27 is the escape sequence representation of the hexadecimal value 27 (a single quote).
For a list of all escape sequences see https://www.gnu.org/software/gawk/manual/html_node/Escape-Sequences.html
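For a quick comparison, here is a sketch of two ways to get a single quote into a single-quoted awk program: the octal escape \047, and an awk variable set with -v. The variable name q and the sample text are assumptions for illustration.

```shell
input="it's John's file"

# \047 is the octal escape for a single quote; it is safe inside a single-quoted awk program.
octal=$(printf '%s\n' "$input" | awk '{ gsub(/\047/, ".") } 1')

# Alternatively, pass the quote in as an awk variable (q is an illustrative name).
viavar=$(printf '%s\n' "$input" | awk -v q="'" '{ gsub(q, ".") } 1')

printf '%s\n' "$octal" "$viavar"
```

Both commands print "it.s John.s file".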
I'm not sure if I got the problem correctly. If you want to replace all occurrences of a single quote by a dot, use tr:
tr "'" "." < file.txt > sanitized.txt
If you want to escape the single quote with a backslash use sed like this:
sed "s/'/\\\'/g" file.txt > sanitized.txt
Note: Please take the advice from CharlesDuffy seriously. This is far from a stable and safe solution to escape values for an SQL import.
You have used wrong quoting:
awk '{gsub(/'"'"'/, "."); print}'

How to extract specific string in a file using awk, sed or other methods in bash?

I have a file with the following text (multiple lines with different values):
TokenRange(start_token:8050285221437500528,end_token:8051783269940793406,...
I want to extract the value of start_token and end_token. I tried awk and cut, but I am not able to figure out the best way to extract the targeted values.
Something like:
cat filename| get the values of start_token and end_token
grep -oP '(?<=token:)\d+' filename
Explanation:
-o: print only part that matches, not complete line
-P: use Perl regex engine (for look-around)
(?<=token:): positive look-behind – zero-width pattern
\d+: one or more digits
Result:
8050285221437500528
8051783269940793406
A (potentially more efficient) variant of this, as pointed out by hek2mgl in his comment, uses \K, the variable-width look-behind:
grep -oP 'token:\K\d+'
\K keeps everything that has been matched to the left of it, but does not include it in the match (see perlre).
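grep -P is not available in every grep build. A possible alternative without PCRE, assuming only that grep supports -o and -E, is to match the token: prefix together with the digits and then strip the prefix with cut. The sample line is the one from the question, elision included.

```shell
line='TokenRange(start_token:8050285221437500528,end_token:8051783269940793406,...'

# Match "token:" plus the digits, then keep only the part after the colon.
tokens=$(printf '%s\n' "$line" | grep -oE 'token:[0-9]+' | cut -d: -f2)
printf '%s\n' "$tokens"
```

This prints the start_token value on the first line and the end_token value on the second.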
Using awk:
awk -F '[(:,]' '{print $3, $5}' file
8050285221437500528 8051783269940793406
First value is start_token and last value is end_token.
a sed version
sed -e '/^TokenRange(/!d' -e 's/.*:\([0-9]*\),.*:\([0-9]*\),.*/\1 \2/' YourFile

add character to particular field using awk

How do I add a '0' before the first and the 9th digit of the 2nd field using awk?
data
12345,20150303024955
output
12345,0201503030024955
I am new to shell scripting.
Assuming by "add to" you mean "prefix with":
$ echo '12345,20150303024955' |
awk 'BEGIN{FS=OFS=","} {sub(/.{8}/,"&0",$2); $2="0"$2}1'
12345,0201503030024955
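The same result can be written with substr, which makes the two insertion points explicit. This is only a sketch of an equivalent form, using the sample record from the question.

```shell
# A zero goes before character 1 and another before character 9 of the second field.
out=$(echo '12345,20150303024955' |
  awk 'BEGIN { FS = OFS = "," } { $2 = "0" substr($2, 1, 8) "0" substr($2, 9) } 1')
printf '%s\n' "$out"
```

This prints 12345,0201503030024955, matching the expected output.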
You asked for awk but this is also easy to do in sed:
$ echo '12345,20150303024955' | sed -r 's/,(.{8})/,0\10/'
12345,0201503030024955
How it works
-r
Turn on extended regex so that we don't need backslash escapes.
s/,(.{8})/,0\10/
Look for a comma followed by eight characters. Replace that with a comma, a zero, those eight characters, and another zero.

Resources