capture-specific-columns and mask the column - shell

I am trying to write a script that captures specific columns and masks one of them. I need the 4th column in clear text and also a masked copy of it in the output file. I am not sure how to mask that column.
Please help me rewrite the command below, or suggest a new one.
input.txt
---------
AA | BB | CC | 123456
output.txt
---------
BB | 123456 | 12xx56
Script I wrote
cat input.txt | nawk -F '|' '{print $2 "|" $4 "|" $4}' > output.txt

nawk -F '|' '{print $2 "|" $4 "|" substr($4, 1,3) "xx" substr($4,6,2)}' input.txt > output.txt
output
BB | 123456| 12xx56
Assuming you don't really need the leading and trailing spaces, I would make it
nawk -F '|' '{gsub(/ */, "", $0);print $2 "|" $4 "|" substr($4, 1,2) "xx" substr($4,5,2)}' input.txt > output.txt
cat output.txt
BB|123456|12xx56
final solution
echo "AA | BB | CC | 12345678" \
| awk -F '|' '{gsub(/ */, "", $0)
#dbg print "length$4=" (length($4)-4)
masking=sprintf("%"(length($4)-4)"s", " ") ; gsub(/ /, "x", masking)
print $2 "|" $4 "|" substr($4, 1,2) masking substr($4,(length($4)-1),2)
}'
BB|12345678|12xxxx78
I'm using echo "..." to simplify testing. You can take that out, add input.txt > output.txt at the end of the line, and it will work as before.
I've added (length($4)-1) to make the position of the second-to-last character of $4 dynamic, based on the length of whatever word is in $4.
IHTH
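A minimal sketch that wraps the final solution above in a reusable function, assuming plain POSIX awk is available in place of nawk. The function name mask_col4 is hypothetical, and the guard for values of four characters or fewer is an addition that is not part of the answer:

```shell
# Hypothetical helper: print column 2, column 4, and a masked copy of column 4.
# Keeps the first 2 and last 2 characters and replaces the rest with "x".
mask_col4() {
  awk -F '|' '{
    gsub(/ +/, "", $0)                      # strip spaces; awk re-splits the fields
    n = length($4)
    if (n <= 4) {                           # too short to mask: print it unchanged
      print $2 "|" $4 "|" $4
      next
    }
    mask = sprintf("%" (n - 4) "s", "")     # n-4 spaces ...
    gsub(/ /, "x", mask)                    # ... turned into n-4 x characters
    print $2 "|" $4 "|" substr($4, 1, 2) mask substr($4, n - 1)
  }'
}

printf 'AA | BB | CC | 123456\n' | mask_col4
```

To process a file instead, redirect it into the function: mask_col4 < input.txt > output.txt.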

Related

Extracting string from line, give as input to a command and then output the entire line with replacing the string

I have a file containing like below, multiple rows are there
test1| 1234 | test2 | test3
Extract second column 1234 and run a command feeding that as input
lets say we get X as output to the command
Print the output as below for each of the line
test1 | X | test2 | test3
Prefer if I could do it in one-liner, but open to ideas.
I am able to extract string using awk, but I am not sure how I can still preserve the initial output and replace it in the output. Below is what I tested
cat file.txt | awk -F '|' '{newVar=system("command "$2); print newVar $4}'
#
Sample command output, where we extract the "name"
openstack show 36a6c06e-5e97-4a53-bb42
+----------------------------+-----------------------------------+
| Property | Value |
+----------------------------+-----------------------------------+
| id | 36a6c06e-5e97-4a53-bb42 |
| name | testVM1 |
+----------------------------+-----------------------------------+
Perl to the rescue!
perl -lF'/\|/' -ne 'chomp( $F[1] = qx{ command $F[1] }); print join "|", @F' < file.txt
-n reads the input line by line
-l removes newlines from input and adds them to prints
-F specifies how to split each input line into the @F array
$F[1] corresponds to the second column, we replace it with the output of the command
chomp removes the trailing newline from the command output
join glues the array back to one line
Using awk:
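awk -F ' *[|] *' -v OFS=' | ' '{("command "$2) | getline $2}1' file.txt
e.g.
$ awk -F ' *[|] *' -v OFS=' | ' '{("date -d @"$2) | getline $2}1' file.txt
test1 | Thu 01 Jan 1970 05:50:34 AM IST | test2 | test3
I changed the field separator from | to ' *[|] *' to absorb the spaces surrounding the fields, and set OFS to ' | ' so the rebuilt line keeps its delimiters. The | is bracketed because a bare | in a multi-character separator is regex alternation. You can drop the space handling depending on your actual input.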
This finally did the trick..
awk -F' *[|] *' -v OFS=' | ' '{
cmd = "openstack show \047" $2 "\047"
while ( (cmd | getline line) > 0 ) {
if ( line ~ /name/ ) {
split(line,flds,/ *[|] */)
$2 = flds[3]
break
}
}
close(cmd)
print
}' file
If command can take the whole list of values once and generate the converted list as output (e.g. tr 'a-z' 'A-Z') then you'd want to do something like this to avoid spawning a shell once per input line (which is extremely slow):
awk -F' *[|] *' '{print $2}' file |
command |
awk -F' *[|] *' -v OFS=' | ' 'NR==FNR{a[FNR]=$0; next} {$2=a[FNR]} 1' - file
otherwise if command needs to be called with one value at a time (e.g. echo) or you just don't care about execution speed then you'd do:
awk -F' *[|] *' -v OFS=' | ' '{
cmd = "command \047" $2 "\047"
if ( (cmd | getline line) > 0 ) {
$2 = line
}
close(cmd)
print
}' file
The \047s will produce single quotes around $2 when it's passed to command and so shield it from shell interpretation (see https://mywiki.wooledge.org/Quotes) and the test on the result of getline will protect you from silently overwriting the current $2 with the output of an earlier command execution in the event of a failure (see http://awk.freeshell.org/AllAboutGetline). The close() ensures that you don't end up with a "too many open files" error or other cryptic problem if the pipe isn't being closed properly, e.g. if command is generating multiple lines and you're just reading the first one.
Given your comment below, if you're going with the 2nd approach above then you'd write something like:
awk -F' *[|] *' -v OFS=' | ' '{
cmd = "openstack show \047" $2 "\047"
while ( (cmd | getline line) > 0 ) {
split(line,flds)
if ( flds[2] == "name" ) {
$2 = flds[3]
break
}
}
close(cmd)
print
}' file
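The quoting, getline test, and close() points from this answer can be tried without openstack. The sketch below is only an illustration under assumptions: the function name upper_col2 is hypothetical, and "echo ... | tr a-z A-Z" stands in for whatever command really has to run once per value.

```shell
# Hypothetical helper: replace column 2 with the output of a per-value command.
# "echo ... | tr a-z A-Z" is a stand-in for the real command. A value that
# itself contains a single quote would still break the \047 quoting.
upper_col2() {
  awk -F ' *[|] *' -v OFS=' | ' '{
    cmd = "echo \047" $2 "\047 | tr a-z A-Z"
    if ( (cmd | getline line) > 0 ) {   # only overwrite $2 after a successful read
      $2 = line
    }
    close(cmd)                          # release the pipe before the next record
    print
  }'
}

printf 'test1| abc | test2 | test3\n' | upper_col2
```

Because a shell is spawned for every input line, this form is only reasonable for small inputs; the batch pipeline shown earlier in the answer avoids that cost when the command can take all values at once.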

add output in command

I started only a few weeks ago with scripting or I am trying at least ...
bash-4.3# /usr/openv/netbackup/bin/admincmd/bperror -backstat -hoursago 72 \
| grep xxx1 \
| awk '{ print $1 "\t" $19 "\t" $12 "\t" $14 "\t" $16 }' >> test
bash-4.3# cat test
1535229470 0 xxx1 policy1 sched1
1535314239 0 xxx1 policy1 sched1
1535400749 0 xxx1 policy1 sched1
Now I want to transform the first entry (timestamp) into a readable date
date=$(awk 'NR == 1 {print $1}' test); bpdbm -ctime $date |awk '{ print $3 " " $4 " " $5 " " $6 " " $8 }'
Sat Aug 25 22:37:50 2018
How can I now replace the first entry on each line by this output or change the first command?
thank you very much!
Using GNU awk:
awk '$1~/[0-9]+/{$1=strftime(PROCINFO["strftime"],$1)}1' file
This replaces the timestamp in the first field of the line with the associated readable date using the function strftime.
The date format used is gawk's default, stored in PROCINFO["strftime"], as described in the gawk man page.
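A hedged variant with an explicit format, in case the default is not what you want. It assumes an awk that provides strftime() (gawk does; recent mawk and busybox awk do as well), and the function name epoch_to_date is made up for this example. The timezone is pinned to UTC so the result does not depend on the machine, and the tab separator is set on both input and output so the columns written by the first command stay tab-delimited:

```shell
# Hypothetical helper: rewrite the epoch timestamp in column 1 as a readable date.
# Assumes the awk in use implements strftime(), which is not part of POSIX awk.
epoch_to_date() {
  TZ=UTC awk -F '\t' -v OFS='\t' '
    $1 ~ /^[0-9]+$/ { $1 = strftime("%Y-%m-%d %H:%M:%S", $1) }   # numeric field only
    1                                                             # print every line
  '
}

printf '1535229470\t0\txxx1\tpolicy1\tsched1\n' | epoch_to_date
```

Drop TZ=UTC to get local time, which is what the bpdbm -ctime output in the question appears to show.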

awk print something if column is empty

I am trying out one script in which a file [ file.txt ] has so many columns like
abc|pqr|lmn|123
pqr|xzy|321|azy
lee|cha| |325
xyz| |abc|123
I would like to list a column in a bash script using awk: if the column is empty it should print blank, otherwise it should print the column value.
I have tried the below possibilities but it is not working
cat file.txt | awk -F "|" {'print $2'} | sed -e 's/^$/blank/' // Using awk and sed
cat file.txt | awk -F "|" '!$2 {print "blank"} '
cat file.txt | awk -F "|" '{if ($2 =="" ) print "blank" } '
please let me know how can we do that using awk or any other bash tools.
Thanks
I think what you're looking for is
awk -F '|' '{print match($2, /[^ ]/) ? $2 : "blank"}' file.txt
match(str, regex) returns the position in str of the first match of regex, or 0 if there is no match. So in this case, it will return a non-zero value if there is some non-blank character in field 2. Note that in awk, the index of the first character in a string is 1, not 0.
Here, I'm assuming that you're interested only in a single column.
If you wanted to be able to specify the replacement string from a bash variable, the best solution would be to pass the bash variable into the awk program using the -v switch:
awk -F '|' -v blank="$replacement" \
'{print match($2, /[^ ]/) ? $2 : blank}' file.txt
This mechanism avoids problems with escaping metacharacters.
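As a small sketch of the -v approach, here it is wrapped in a function so the replacement text comes from a shell argument. The name col2_or_default is hypothetical, and the ternary is fully parenthesised because some awk implementations are strict about an unparenthesised ?: after print. One caveat: awk still processes backslash escapes in a -v value, so a replacement containing backslashes needs care.

```shell
# Hypothetical helper: print column 2, or the given replacement when the
# column is empty or contains only spaces.
#   $1 = replacement text
col2_or_default() {
  awk -F '|' -v blank="$1" '{ print (($2 ~ /[^ ]/) ? $2 : blank) }'
}

printf 'abc|pqr|lmn|123\nxyz| |abc|123\n' | col2_or_default 'blank'
```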
You can do it using this sed script:
sed -r 's/\| +\|/\|blank\|/g' File
abc|pqr|lmn|123
pqr|xzy|321|azy
lee|cha|blank|325
xyz|blank|abc|123
If you don't want the |:
sed -r 's/\| +\|/\|blank\|/g; s/\|/ /g' File
abc pqr lmn 123
pqr xzy 321 azy
lee cha blank 325
xyz blank abc 123
Else with awk:
awk '{gsub(/\| +\|/,"|blank|")}1' File
abc|pqr|lmn|123
pqr|xzy|321|azy
lee|cha|blank|325
xyz|blank|abc|123
You can use awk like this:
awk 'BEGIN{FS=OFS="|"} {for (i=1; i<=NF; i++) if ($i ~ /^ *$/) $i="blank"} 1' file
abc|pqr|lmn|123
pqr|xzy|321|azy
lee|cha|blank|325
xyz|blank|abc|123

Print column using while loop in awk

I need to extract the values of the 2nd column from a file, starting at the row where $1 = 2 and stopping before the row where $1 = 3. As an example, from the file
1 | 2.158e+06
| 2.31e+06
| 5.008e+06
2 | 693000
| 718000
| 725000
3 | 2.739e+06
| 2.852e+06
| 2.865e+06
| 2.874e+06
4 | 4.033e+06
| 4.052e+06
| 4.059e+06
I would like to extract values of the 2nd column from $1=2 until $1=3
693000
718000
725000
I tried using awk, but I have only figured out how to extract the values from $1=1 until $1=2
awk -F "|" '{if ($1>1) exit; else print $2}' foo.txt
Output
2.158e+06
2.31e+06
5.008e+06
I also tried this
awk -F "|" '{i=2; do {print $2; i++} while ($4); if ($1>i) exit}' foo.txt
But it gives me the whole 2nd column
2.158e+06
2.31e+06
5.008e+06
693000
718000
725000
2.739e+06
2.852e+06
2.865e+06
2.874e+06
4.033e+06
Does anyone know how to do this using awk or other command?
Thanks
A range pattern could work nicely here. The pattern $1==2,$1==3 will start executing the action when the first column is 2 and stop when it is 3. (Since the range is inclusive we need to check that the first column is not 3 before printing the second column in this case.)
$ awk -F\| '$1==2,$1==3 { if ($1 != 3) print $2 }' foo.txt
693000
718000
725000
hzhang@dell-work ~ $ cat sample.csv
1 | 2.158e+06
| 2.31e+06
| 5.008e+06
2 | 693000
| 718000
| 725000
3 | 2.739e+06
| 2.852e+06
| 2.865e+06
| 2.874e+06
4 | 4.033e+06
| 4.052e+06
| 4.059e+06
hzhang@dell-work ~ $ awk -F"|" 'BEGIN{c=0}{if($1>=3){c=0} if(c==1 ||($1>=2 && $1<3)){c = 1;print $2}}' sample.csv
693000
718000
725000
I set a flag c. When $1 reaches 3 or more, the flag is reset to 0; when $1 is between 2 and 3, the flag is set to 1. While the flag is 1, we print $2.
This is what I came up with:
awk -F "|" '{if ($1==3) exit} /^2/,EOF {print $2}' file
1) /^2/,EOF {print $2} prints the second column from the first row that begins with a 2 through the end of the file (EOF is not a keyword here, just an unset awk variable, so the end condition of the range is never true)
2) {if ($1==3) exit} stops processing as soon as the first column is the number 3
Output
693000
718000
725000
using getline statement in awk tactically
awk -v FS=" [|] " '$1=="2"{print $2;getline;while(($1==" "||$1==2)){print $2;$0="";getline>0}}' my_file
Here is another awk
awk -F\| '/^2/ {f=1} /^3/ {f=0} f {print $2+0}' file
693000
718000
725000
-F\| sets the field separator to |
/^2/ if the line starts with 2, set flag f to true.
/^3/ if the line starts with 3, set flag f to false.
f {print $2+0} if flag f is true, print field 2.
$2+0 forces a numeric conversion, which removes the space in front of the number. Drop the +0 if the field contains letters.
Just so you don't have to read the entire file, exit when you see a '3':
$ awk -F\| '/^2\s+/ {f=1} /^3\s+/ {exit} f {print $2+0}' file
693000
718000
725000
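The flag-based answers above hard-code the keys 2 and 3. As a sketch only, the same idea can take the start and stop keys as arguments; the function name between_keys and its parameters are assumptions made for this example, and it stops reading at the stop key, as in the last answer. It prints the second column as text rather than using $2+0, so values such as 2.739e+06 are not reformatted.

```shell
# Hypothetical helper: print column 2 from the row whose key equals $1
# up to (not including) the row whose key equals $2.
between_keys() {
  awk -F '|' -v from="$1" -v to="$2" '
    { key = $1; gsub(/[ \t]+/, "", key) }     # key column without padding
    key "" == to ""   { exit }                # stop key reached: quit early
    key "" == from "" { on = 1 }              # start key reached: begin printing
    on { val = $2; gsub(/^[ \t]+|[ \t]+$/, "", val); print val }
  '
}

between_keys 2 3 <<'EOF'
1 | 2.158e+06
  | 2.31e+06
2 | 693000
  | 718000
  | 725000
3 | 2.739e+06
  | 2.852e+06
EOF
```

The keys are compared as strings (the "" concatenation forces that), so a key of 2 will not match 2.0.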

Multiple condition in nawk command

I have a nawk command that needs to format the data based on its length. I always need to keep the first 6 digits and the last 4 digits and replace everything in the middle with x characters. Can you help fine-tune the script below?
#!/bin/bash
FILES=/export/home/input.txt
cat $FILES | nawk -F '|' '{
if (length($3) >= 13 )
print $1 "|" $2 "|" substr($3,1,6) "xxxxxx" substr($3,13,4) "|" $4"|" $5
else
print $1 "|" $2 "|" $3 "|" $4 "|" $5
}' > output.txt
input.txt
"2"|"X"|"A"|"ST"|"245552544555201"|"1111-11-11"|75.00
"6"|"Y"|"D"|"VT"|"245652544555200"|"1111-11-11"|95.00
"5"|"X"|"G"|"ST"|"3445625445552023"|"1111-11-11"|75.00
"3"|"Y"|"S"|"VT"|"24532254455524"|"1111-11-11"|95.00
output.txt
"X"|"ST"|"245552544555201"|"245552xxxxx5201"
"Y"|"VT"|"245652544555200"|"245652xxxxx5200"
"X"|"ST"|"3445625445552023"|"344562xxxxxx2023"
"Y"|"VT"|"24532254455524"|"245322xxxx5524"
Try this:
$ awk '
BEGIN {FS = OFS = "|"}
length($5)>=13 {
fld5=$5
start = substr($5,1,7)
end = substr($5,length($5)-4)
gsub(/./,"x",fld5)
sub(/^......./,start,fld5)
sub(/.....$/,end,fld5)
$1=$2; $2=$4; $3=$5; $4=fld5; NF-=3;
}1' file
"X"|"ST"|"245552544555201"|"245552xxxxx5201"
"Y"|"VT"|"245652544555200"|"245652xxxxx5200"
"X"|"ST"|"3445625445552023"|"344562xxxxxx2023"
"Y"|"VT"|"24532254455524"|"245322xxxx5524"

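An alternative sketch of the same masking that computes the number of x characters from the length directly, instead of overwriting the ends of an all-x string. It assumes the value to mask is field 5 and is wrapped in double quotes, as in the sample input; the function name mask_card is hypothetical. Values shorter than 13 digits are passed through unmasked, mirroring the length test in the question.

```shell
# Hypothetical helper: print fields 2, 4, 5 and a masked copy of field 5,
# keeping the first 6 and last 4 digits of the quoted value.
mask_card() {
  awk 'BEGIN { FS = OFS = "|" }
  {
    v = $5
    gsub(/"/, "", v)                         # digits without the surrounding quotes
    n = length(v)
    if (n >= 13) {
      mid = sprintf("%" (n - 10) "s", "")    # n-10 spaces ...
      gsub(/ /, "x", mid)                    # ... turned into n-10 x characters
      v = substr(v, 1, 6) mid substr(v, n - 3)
    }
    print $2, $4, $5, "\"" v "\""
  }'
}

printf '%s\n' '"2"|"X"|"A"|"ST"|"245552544555201"|"1111-11-11"|75.00' | mask_card
```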