I have a nawk command that needs to mask a field based on its length. I always need to keep the first 6 digits and the last 4 digits and replace the digits in between with x. Can you help me fine-tune the script below?
#!/bin/bash
FILES=/export/home/input.txt
cat $FILES | nawk -F '|' '{
if (length($3) >= 13 )
print $1 "|" $2 "|" substr($3,1,6) "xxxxxx" substr($3,13,4) "|" $4"|" $5
else
print $1 "|" $2 "|" $3 "|" $4 "|" $5
}' > output.txt
input.txt
"2"|"X"|"A"|"ST"|"245552544555201"|"1111-11-11"|75.00
"6"|"Y"|"D"|"VT"|"245652544555200"|"1111-11-11"|95.00
"5"|"X"|"G"|"ST"|"3445625445552023"|"1111-11-11"|75.00
"3"|"Y"|"S"|"VT"|"24532254455524"|"1111-11-11"|95.00
output.txt
"X"|"ST"|"245552544555201"|"245552xxxxx5201"
"Y"|"VT"|"245652544555200"|"245652xxxxx5200"
"X"|"ST"|"3445625445552023"|"344562xxxxxx2023"
"Y"|"VT"|"24532254455524"|"245322xxxx5524"
Try this:
$ awk '
BEGIN {FS = OFS = "|"}
length($5)>=13 {
fld5=$5
start = substr($5,1,7)
end = substr($5,length($5)-4)
gsub(/./,"x",fld5)
sub(/^......./,start,fld5)
sub(/.....$/,end,fld5)
$1=$2; $2=$4; $3=$5; $4=fld5; NF-=3;
}1' file
"X"|"ST"|"245552544555201"|"245552xxxxx5201"
"Y"|"VT"|"245652544555200"|"245652xxxxx5200"
"X"|"ST"|"3445625445552023"|"344562xxxxxx2023"
"Y"|"VT"|"24532254455524"|"245322xxxx5524"
Related
I only started scripting a few weeks ago, or at least I am trying to ...
bash-4.3# /usr/openv/netbackup/bin/admincmd/bperror -backstat -hoursago 72 \
| grep xxx1 \
| awk '{ print $1 "\t" $19 "\t" $12 "\t" $14 "\t" $16 }' >> test
bash-4.3# cat test
1535229470 0 xxx1 policy1 sched1
1535314239 0 xxx1 policy1 sched1
1535400749 0 xxx1 policy1 sched1
Now I want to transform the first entry (timestamp) into a readable date
date=$(awk 'NR == 1 {print $1}' test); bpdbm -ctime $date |awk '{ print $3 " " $4 " " $5 " " $6 " " $8 }'
Sat Aug 25 22:37:50 2018
How can I now replace the first entry on each line by this output or change the first command?
thank you very much!
Using GNU awk:
awk '$1~/[0-9]+/{$1=strftime(PROCINFO["strftime"],$1)}1' file
This replaces the timestamp in the first field of each line with the corresponding readable date, using GNU awk's strftime function.
The date format is the default one, stored in PROCINFO["strftime"], as described in the gawk man page.
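One side effect worth knowing: assigning to $1 makes awk rebuild the line with OFS, which is a single space by default, so the tabs of the original file are lost. The following is only a sketch of a variant that keeps the tabs and uses an explicit format string. It assumes GNU awk is installed as gawk; stamp_to_date is a hypothetical name.

```shell
# stamp_to_date [FILE...]: convert the epoch timestamp in column 1 of
# tab-separated input into a readable date (requires GNU awk).
stamp_to_date() {
    gawk 'BEGIN { FS = OFS = "\t" }
          # only touch lines whose first field is purely numeric
          $1 ~ /^[0-9]+$/ { $1 = strftime("%Y-%m-%d %H:%M:%S", $1) }
          { print }' "$@"
}
```

It could be used as stamp_to_date test, or at the end of the bperror pipeline in place of the redirect. The result is in local time, so the printed hour depends on the TZ setting of the machine.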
I have a little script to compare some columns inside a bunch of CSV files.
It's working fine, but there are some things that are bugging me.
Here is the code:
FILES=./*
for f in $FILES
do
cat -v $f | sed "s/\^A/,/g" > op_tmp.csv
awk -F, -vOFS=, 'NR == 1{next} $9=="T"{t[$8]+=$7;n[$8]} $9=="A"{a[$8]+=$7;n[$8]} $9=="C"{c[$8]+=$7;n[$8]} $9=="R"{r[$8]+=$7;n[$8]} $9=="P"{p[$8]+=$7;n[$8]} END{ for (i in n){print i "|" "A" "|" a[i]; print i "|" "C" "|" c[i]; print i "|" "R" "|" r[i]; print i "|" "P" "|" p[i]; print i "|" "T" "|" t[i] "|" (t[i]==a[i]+c[i]+r[i]+p[i] ? "ERROR" : "MATCHED")} }' op_tmp.csv >> output.csv
rm op_tmp.csv
done
Just to explain:
I get all the files in the directory, then I use cat -v and sed to replace the ^A delimiter with a comma.
Then I use the awk one-liner to compare the columns I need and print the result to output.csv.
But now I want to print the filename before every loop.
I tried chaining cat, sed, and awk on one line and printing $FILENAME, but it doesn't work:
cat -v $f | sed "s/\^A/,/g" | awk -F, -vOFS=, 'NR == 1{next} $9=="T"{t[$8]+=$7;n[$8]} $9=="A"{a[$8]+=$7;n[$8]} $9=="C"{c[$8]+=$7;n[$8]} $9=="R"{r[$8]+=$7;n[$8]} $9=="P"{p[$8]+=$7;n[$8]} END{ for (i in n){print i "|" "A" "|" a[i]; print i "|" "C" "|" c[i]; print i "|" "R" "|" r[i]; print i "|" "P" "|" p[i]; print i "|" "T" "|" t[i] "|" (t[i]==a[i]+c[i]+r[i]+p[i] ? "ERROR" : "MATCHED")} }' > output.csv
Can anyone help?
You could rewrite the whole script more cleanly, but assuming it already does what you want, just add
echo "$f" >> output.csv
before the awk call.
If you want the filename on every awk output line, you have to pass it in as a variable, i.e.
awk ... -v fname="$f" '{...; print fname... etc
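As a rough illustration of the -v approach, here is a minimal sketch that prefixes every line of a file with the file's name. tag_lines and the sample file are hypothetical, and the real script would of course print its computed columns instead of $0.

```shell
# tag_lines FILE: print every line of FILE prefixed with its name.
# fname is an awk variable set from the shell with -v.
tag_lines() {
    awk -v fname="$1" -v OFS='|' '{ print fname, $0 }' "$1"
}

# Small demonstration on a throwaway file.
tmp=$(mktemp)
printf 'a\nb\n' > "$tmp"
tag_lines "$tmp"
rm -f "$tmp"
```

Note that -v interprets backslash escapes in the value, so a filename containing a backslash would need extra care.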
A rewrite:
for f in ./*; do
awk -F '\x01' -v OFS="|" '
BEGIN {
letter[1]="A"; letter[2]="C"; letter[3]="R"; letter[4]="P"; letter[5]="T"
letters["A"] = letters["C"] = letters["R"] = letters["P"] = letters["T"] = 1
}
NR == 1 {next}
$9 in letters {
count[$9,$8] += $7
seen[$8]
}
END {
print FILENAME
for (i in seen) {
sum = 0
for (j=1; j<=4; j++) {
print i, letter[j], count[letter[j],i]
sum += count[letter[j],i]
}
print i, "T", count["T",i], (count["T",i] == sum ? "ERROR" : "MATCHED")
}
}
' "$f"
done > output.csv
Notes:
your method of iterating over files will break as soon as you have a filename with a space in it
try to reduce duplication as much as possible.
newlines are free, use them to improve readability
improve your variable names i, n, etc -- here "letter" and "letters" could use improvement to hold some meaning about those symbols.
awk has a FILENAME variable (here's the actual answer to your question)
awk understands \x01 to be a Ctrl-A (GNU awk does; the octal form \001 is the more portable spelling) -- I assume that's the field separator in your input files
define an Output Field Separator that you'll actually use
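A few of these notes can be seen together in a small sketch: a quoted loop variable that survives a space in the filename, the octal spelling \001 for Ctrl-A, and FILENAME in the END block. The function name, the file name, and the two-column sample data are made up for the demonstration.

```shell
# sum_amounts DIR: for each file in DIR, print its name and the sum
# of column 2, treating Ctrl-A (octal 001) as the field separator.
sum_amounts() {
    for f in "$1"/*; do
        awk -F '\001' '
            FNR == 1 { next }            # skip the header line
            { total += $2 }
            END { print FILENAME ": " total }
        ' "$f"                           # quoted, so spaces are safe
    done
}

# Demonstration with a file whose name contains a space.
dir=$(mktemp -d)
sep=$(printf '\001')
printf 'id%samount\nx%s5\ny%s7\n' "$sep" "$sep" "$sep" > "$dir/my data.csv"
sum_amounts "$dir"
rm -rf "$dir"
```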
If you have GNU awk (version ???) you can use the ENDFILE block and do away with the shell for loop altogether:
gawk -F '\x01' -v OFS="|" '
BEGIN {...}
FNR == 1 {next}
$9 in letters {...}
ENDFILE {
print FILENAME
for ...
# clean up the counters for the next file
delete count
delete seen
}
' ./* > output.csv
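If GNU awk is not available, a similar per-file report can be approximated in any POSIX awk by noticing when FNR resets to 1. This is only a sketch of that idea on simplified data (pipe-separated, summing column 2); per_file_sums and report are hypothetical names, and empty input files are silently skipped.

```shell
# per_file_sums FILE...: print "name total" for each file, where
# total is the sum of column 2, without relying on ENDFILE.
per_file_sums() {
    awk -F '|' '
        # report(): print the total collected for the previous file
        function report() {
            if (prev != "") print prev, total
            total = 0
        }
        FNR == 1 { report(); prev = FILENAME }   # a new file starts
        { total += $2 }
        END { report() }                         # flush the last file
    ' "$@"
}

# Demonstration with two small files.
dir=$(mktemp -d)
printf 'a|1\nb|2\n' > "$dir/one.txt"
printf 'c|5\n' > "$dir/two.txt"
( cd "$dir" && per_file_sums one.txt two.txt )
rm -rf "$dir"
```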
I have a csv file that needs a lot of manipulation. Maybe by using awk and sed?
input:
"Sequence","Fat","Protein","Lactose","Other Solids","MUN","SCC","Batch Name"
1,4.29,3.3,4.69,5.6,11,75,"35361305a"
2,5.87,3.58,4.41,5.32,10.9,178,"35361305a"
3,4.01,3.75,4.75,5.66,12.2,35,"35361305a"
4,6.43,3.61,3.56,4.41,9.6,275,"35361305a"
final output:
43330075995647
59360178995344
40380035995748
64360275964436
I'm able to get through some of it going step by step.
How do I test specific columns for a value over 9.9 and replace it with 9.9 ?
Also, is there a way to combine any of these steps?
remove first line:
tail -n +2 test.csv > test1.txt
remove commas:
sed 's/,/ /g' test1.txt > test2.txt
remove quotes:
sed 's/"//g' test2.txt > test3.txt
remove columns 1 and 8 and
reorder remaining columns as 1,2,6,5,4,3:
sort test3.txt | uniq -c | awk '{print $3 "\t" $4 "\t" $8 "\t" $7 "\t" $6 "\t" $5}' > test4.txt
test new columns 1,2,4,5,6 - if the value is over 9.9, replace it with 9.9
How should I do this step?
The solution for the following parts was found in a previous question (reformatting a text file):
columns 1,2,4,5,6 round decimals to tenths
column 3 needs to be four characters long, using zero to left fill
remove periods and spaces
awk '{$0=sprintf("%.1f%.1f%4s%.1f%.1f%.1f", $1,$2,$3,$4,$5,$6);gsub(/ /,"0");gsub(/\./,"")}1' test5.txt > test6.txt
This produces the output you want from the original file. Note that in the question you specified "column 4 round to whole number", but in the desired output you had rounded it to one decimal place instead:
awk -F'[,"]+' 'function m(x) { return x < 9.9 ? x : 9.9 }
NR > 1 {
s = sprintf("%.1f%.1f%04d%.1f%.1f%.1f", m($2),m($3),$7,m($6),m($5),m($4))
gsub(/\./, "", s)
print s
}' test.csv
I have specified the field separator as any number of commas and double quotes together, so this "parses" your CSV format for you without requiring any additional steps.
The function m returns the minimum of 9.9 and the number you pass to it.
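To see which field number each value ends up in with that separator, a quick inspection helper can be useful. This is just a sketch; show_fields is a hypothetical name and the sample line is the first data row from the question.

```shell
# show_fields: print each field of every input line with its index,
# using the same separator as the answer above.
show_fields() {
    awk -F'[,"]+' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

printf '%s\n' '1,4.29,3.3,4.69,5.6,11,75,"35361305a"' | show_fields
```

For that row, field 7 is the SCC value and field 8 is the batch name with its quotes already consumed as separators, which is why the answer can refer to $2 through $7 directly.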
Output:
43330075995647
59360178995344
40380035995748
64360275964436
The first three steps in one go:
awk -F, '{gsub(/"/,"");$1=$1} NR>1' test.csv
1 4.29 3.3 4.69 5.6 11 75 35361305a
2 5.87 3.58 4.41 5.32 10.9 178 35361305a
3 4.01 3.75 4.75 5.66 12.2 35 35361305a
4 6.43 3.61 3.56 4.41 9.6 275 35361305a
tail -n +2 file | sort -u | awk -F , '
{
$0 = $1 FS $2 FS $6 FS $5 FS $4 FS $3
for (i = 1; i <= 6; ++i)
if ($i > 9.9)
$i = 9.9
$0 = sprintf("%.1f%.1f%4s%.0f%.1f%.1f", $1, $2, $3, $4, $5, $6)
gsub(/ /, "0"); gsub(/[.]/, "")
print
}
'
Or
< file awk -F , '
NR > 1 {
$0 = $1 FS $2 FS $6 FS $5 FS $4 FS $3
for (i = 1; i <= 6; ++i)
if ($i > 9.9)
$i = 9.9
$0 = sprintf("%.1f%.1f%4s%.0f%.1f%.1f", $1, $2, $3, $4, $5, $6)
gsub(/ /, "0"); gsub(/[.]/, "")
print
}
'
Output:
104309964733
205909954436
304009964838
406409643636
I am trying to write a script that captures and masks a specific column. I need the 4th column in the output file twice: once in clear text and once masked. I am not sure how to mask the same column.
Please help me rewrite the command below, or suggest a new one.
input.txt
---------
AA | BB | CC | 123456
output.txt
---------
BB | 123456 | 12xx56
Script I wrote
cat input.txt | nawk -F '|' '{print $2 "|" $4 "|" $4}' >output.txt
nawk -F '|' '{print $2 "|" $4 "|" substr($4, 1,3) "xx" substr($4,6,2)}' input.txt > output.txt
output
BB | 123456| 12xx56
Assuming you don't really need the leading and trailing spaces, I would make it
nawk -F '|' '{gsub(/ */, "", $0);print $2 "|" $4 "|" substr($4, 1,2) "xx" substr($4,5,2)}' input.txt > output.txt
cat output.txt
BB|123456|12xx56
final solution
echo "AA | BB | CC | 12345678" \
| awk -F '|' '{gsub(/ */, "", $0)
#dbg print "length$4=" (length($4)-4)
masking=sprintf("%"(length($4)-4)"s", " ") ; gsub(/ /, "x", masking)
print $2 "|" $4 "|" substr($4, 1,2) masking substr($4,(length($4)-1),2)
}'
BB|12345678|12xxxx78
I am using echo "..." to simplify testing. You can take that out and put input.txt > output.txt at the end of the line instead, and it will work as before.
I've added (length($4)-1) to make the position of the second-to-last character of $4 dynamic, based on the length of whatever word is in $4.
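One case the command above does not handle is a value of four characters or fewer, where the computed width is zero or negative. A possible guard is sketched below, with the number of characters kept on each side passed in as parameters. mask_middle, head, and tail are hypothetical names, and the sketch masks whole input lines rather than a field to keep it short.

```shell
# mask_middle HEAD TAIL: for each input line, keep the first HEAD and
# last TAIL characters and replace everything in between with x.
# Lines too short to have a middle are printed unchanged.
mask_middle() {
    awk -v head="$1" -v tail="$2" '{
        n = length($0) - head - tail
        if (n <= 0) { print; next }            # nothing to mask
        masking = sprintf("%" n "s", "")       # n spaces
        gsub(/ /, "x", masking)
        print substr($0, 1, head) masking substr($0, length($0) - tail + 1)
    }'
}

printf '%s\n' 12345678 123456 1234 12 | mask_middle 2 2
```

The same function could be called as mask_middle 6 4 when six leading and four trailing digits have to stay visible.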
IHTH
I have the following awk command which works perfectly:
awk 'BEGIN { ORS = " " } {print 22, $1, NR; for(i=2;i<=NF;++i) print $i}{print "\n"}' file
What it does is insert two columns: the value 22 as the first, and the nr of the row as the third (see this question).
I tried running this command in a for loop as follows:
for p in 20 21 22
do
awk 'BEGIN { ORS = " " } {print $p, \$1, NR; for(i=2;i<=NF;++i) print \$i}{print "\n"}' file > file$p
done
So instead of the 22 in the first column, 3 files are made where each file has 20, 21 or 22 as the first column. Unfortunately, this doesn't work and I get the message: ^ backslash not last character on line.
It probably has something to do with how I escaped the $ characters, but I don't know how else to do it... any ideas?
You can assign a shell-parameter to awk using
-v parameter=value
i.e. use
awk -v p=$p ...
Like this:
for p in 20 21 22
do
awk -v p=$p 'BEGIN { ORS = " " } [TOO LONG LINE]' file > file$p
done
Complete command:
for p in 20 21 22; do awk -v p=$p 'BEGIN { ORS = " " } {print p, $1, NR; for(i=2;i<=NF;++i) print $i}{print "\n"}' file > file$p; done
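For comparison, the same two columns can also be inserted by rewriting $1 and letting awk print the rebuilt record, which avoids the ORS juggling. This is an alternative sketch, not the command from the answer; add_columns is a hypothetical name, and runs of blanks in the input are collapsed to single spaces.

```shell
# add_columns P FILE: print FILE with P inserted as a new first
# column and the line number inserted as a new third column.
add_columns() {
    awk -v p="$1" '{ $1 = p OFS $1 OFS NR } 1' "$2"
}

# Demonstration on a throwaway file.
tmp=$(mktemp)
printf 'a b c\nd e f\n' > "$tmp"
for p in 20 21 22; do
    add_columns "$p" "$tmp"
done
rm -f "$tmp"
```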
You are heading in the right direction. Because your awk script is inside ' (single quotes), bash does not replace $p with its value in the script, and the backslashes in \$1 and \$i reach awk unchanged, which is what triggers the error (see Advanced Bash-Scripting Guide - Quoting).
You need to do one of the following:
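1 (Recommended): pass the value in with -v and leave the $ of awk fields unescaped:
for p in 20 21 22
do
awk -v p="$p" 'BEGIN { ORS = " " } {print p, $1, NR; for(i=2;i<=NF;++i) print $i}{print "\n"}' file > "file$p"
done
2: Use " instead of ' around the awk script so that the shell expands $p. Inside the script you then have to escape every " as \" and every awk $ (as in $1 and $i) as \$.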
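A minimal sketch of what the double-quoted form might look like is shown below, on a simplified program rather than the full command. add_columns_dq is a hypothetical name. Because the shell value is pasted into the awk source text, this form is only reasonable when the value is trusted, which is one reason the -v form is preferable.

```shell
# add_columns_dq P FILE: insert P as a new first column and the line
# number as a new third column, with a double-quoted awk program.
add_columns_dq() {
    p=$1
    # $p is expanded by the shell; \$1 and \" are passed on to awk
    # as $1 and ", so awk sees: { $1 = "20" OFS $1 OFS NR } 1
    awk "{ \$1 = \"$p\" OFS \$1 OFS NR } 1" "$2"
}

# Demonstration on a throwaway file.
tmp=$(mktemp)
printf 'a b c\nd e f\n' > "$tmp"
add_columns_dq 20 "$tmp"
rm -f "$tmp"
```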