Display message when no match found in AWK - bash

I'm writing a small BASH script that reads a csv file with names on it and prompts the user for a name to be removed. The csv file looks like this:
Smith,John
Jackie,Jackson
The first and last name of the person to be removed from the list are saved in the bash variables $first_name and $last_name.
This is what I have so far:
cat file.csv | awk -F',' -v last="$last_name" -v first="$first_name" ' ($1 != last || $2 != first) { print } ' > tmpfile1
This works fine. However, it still outputs to tmpfile1 even if no employee matches that name. What I would like is to have something like:
if ($1 != last || $2 != first) { print } > tmpfile1 ; else { print "No Match Found." }
I'm new to awk and can't get that last part to work.
NOTE: I do not want to use something like grep -v "$last_name,$first_name"; I want to use a filtering function.

You can redirect right inside the awk script, and only output the records that don't match.
awk -F',' -v last="$last_name" -v first="$first_name" '
$1==last && $2==first {next}
{print > "tmpfile"}
' file.csv
Here are the differences between your script and this one:
This has awk reading your CSV directly, rather than a useless use of cat (UUOC).
This actively skips the records you want to skip,
and prints everything else through a redirect.
Note that you could, if you wanted, specify the target to which to redirect in a variable you pass in using -v as well.
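For example (a sketch, reusing the question's $first_name/$last_name variables and file.csv, with the target name held in a shell variable):

```shell
# Pass the output filename into awk as a variable rather than hard-coding it
outfile="tmpfile1"
awk -F',' -v last="$last_name" -v first="$first_name" -v out="$outfile" '
$1 == last && $2 == first { next }   # skip the record to be removed
{ print > out }                      # everything else goes to the target file
' file.csv
```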
If you really want the "No match found" error, you can set a flag, then use the END special condition in awk...
awk -F',' -v last="$last_name" -v first="$first_name" '
$1==last && $2==first { found=1; next }
{ print > "tmpfile" }
END { if (!found) print "No match found." > "/dev/stderr" }
' file.csv
And if you want no tmpfile to be created when there is no match, you would either need to scan the file TWICE (once to verify that there's a match, and once to print), or, if there's no risk that the file is too large for available memory, you could keep a buffer:
awk -F',' -v last="$last_name" -v first="$first_name" '
$1==last && $2==first { next }
{ output = (output ? output ORS : "" ) $0 }
END {
if (output)
print output > "tmpfile"
else
print "No match found." > "/dev/stderr"
}
' file.csv
Disclaimer: I haven't tested any of these. :)

You can do two passes over the file, or you can queue up all of the file so far in memory and then just fail if you reach the END block with no match.
awk -F',' -v first="$first_name" -v last="$last_name" '
# Match found: flush the buffered lines, set the flag, and drop this line
$1 == last && $2 == first {
for (i=1; i<=n; ++i) print a[i] >>"tempfile"; p=1; n=0; split("", a); next }
# No match yet, remember this line for later
!p { a[++n] = $0; next }
# If we get through to here, there was a match
p { print >>"tempfile" }
END { if (!p) { print "no match" >"/dev/stderr"; exit 1 } }' filename
This requires you to have enough memory to store the entire file (this will be required when there is no match).
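For comparison, the two-pass approach mentioned above could be sketched like this (assuming the same variables as the question; it reads the file twice but needs no buffer):

```shell
# Pass 1: exit 0 only if a matching record exists; pass 2: write the filtered file
if awk -F',' -v last="$last_name" -v first="$first_name" \
    '$1 == last && $2 == first { found = 1; exit } END { exit !found }' file.csv
then
  awk -F',' -v last="$last_name" -v first="$first_name" \
    '$1 != last || $2 != first' file.csv > tempfile
else
  echo "No match found." >&2
fi
```

No tempfile is created when there is no match, at the cost of reading the file twice.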

With a bash script, you can test whether awk printed something.
If yes, remove the tmpfile.
c=$(awk -F',' -v a="$last_name" -v b="$first_name" '
$1==a && $2==b {c=1;next}
{print > "tmpfile"}
END{if (!c){print "no match"}}' infile)
[ -n "$c" ] && { echo "$c"; rm tmpfile;}

Editing text in Bash

I am trying to edit text in Bash. I got to a point where I am no longer able to continue and I need help.
The text I need to edit:
Symbol Name Sector Market Cap, $K Last Links
AAPL
Apple Inc
Computers and Technology
2,006,722,560
118.03
AMGN
Amgen Inc
Medical
132,594,808
227.76
AXP
American Express Company
Finance
91,986,280
114.24
BA
Boeing Company
Aerospace
114,768,960
203.30
The text I need:
Symbol,Name,Sector,Market Cap $K,Last,Links
AAPL,Apple Inc,Computers and Technology,2,006,722,560,118.03
AMGN,Amgen Inc,Medical,132,594,808,227.76
AXP,American Express Company,Finance,91,986,280,114.24
BA,Boeing Company,Aerospace,114,768,960,203.30
I already tried:
sed 's/$/,/' BIPSukol.txt > BIPSukol1.txt | awk 'NR==1{print}' BIPSukol1.txt | awk '(NR-1)%5{printf "%s ", $0;next;}1' BIPSukol1.txt | sed 's/.$//'
But it doesn't quite do the job.
(BIPSukol1.txt is the name of the file I am editing.)
The biggest problem you have is that you do not have consistent delimiters between your fields. Some have commas, some don't, and some are just a combination of 3 fields that happen to run together.
The tool you want is awk. It will allow you to treat the first line differently and then condition the output that follows with convenient counters you keep within the script. In awk you write rules (a pattern followed by an action between {...}), and awk applies your rules in the order they are written. This allows you to "fix up" your haphazard format and arrive at the desired output.
The first rule, FNR==1, is applied to the 1st line. It loops over the fields, finds the problematic "Market Cap $K" field and considers it as one, skipping beyond it to output the remaining headings. It stores a counter n = NF - 3, as you have 5 lines of data for each Symbol, and skips to the next record.
When count==n, the next rule is triggered, which just outputs the records stored in the a[] array, zeros count and deletes the a[] array for refilling.
The next rule is applied to every record (line) of input from the 2nd on. It simply removes any extra whitespace from the fields by forcing awk to recalculate the record with $1 = $1, and then stores the record in the array, incrementing count.
The last rule, END, is a special rule that runs after all records are processed (it lets you sum final tallies or output final lines of data). Here it is used to output the records that remain in a[] when the end of the file is reached.
Putting it all together:
awk '
FNR==1 {
for (i=1;i<=NF;i++)
if ($i == "Market") {
printf ",Market Cap $K"
i = i + 2
}
else
printf (i>1?",%s":"%s"), $i
print ""
n = NF-3
count = 0
next
}
count==n {
for (i=1;i<=n;i++)
printf (i>1?",%s":"%s"), a[i]
print ""
delete a
count = 0
}
{
$1 = $1
a[++count] = $0
}
END {
for (i=1;i<=count;i++)
printf (i>1?",%s":"%s"), a[i]
print ""
}
' file
Example Use/Output
Note: you can simply select-copy the script above and then middle-mouse-paste it into an xterm with the directory set so it contains file (you will need to rename file to whatever your input filename is)
$ awk '
> FNR==1 {
> for (i=1;i<=NF;i++)
> if ($i == "Market") {
> printf ",Market Cap $K"
> i = i + 2
> }
> else
> printf (i>1?",%s":"%s"), $i
> print ""
> n = NF-3
> count = 0
> next
> }
> count==n {
> for (i=1;i<=n;i++)
> printf (i>1?",%s":"%s"), a[i]
> print ""
> delete a
> count = 0
> }
> {
> $1 = $1
> a[++count] = $0
> }
> END {
> for (i=1;i<=count;i++)
> printf (i>1?",%s":"%s"), a[i]
> print ""
> }
> ' file
Symbol,Name,Sector,Market Cap $K,Last,Links
AAPL,Apple Inc,Computers and Technology,2,006,722,560,118.03
AMGN,Amgen Inc,Medical,132,594,808,227.76
AXP,American Express Company,Finance,91,986,280,114.24
BA,Boeing Company,Aerospace,114,768,960,203.30
(note: it is unclear why you want the "Links" heading included since there is no information for that field -- but that is how your desired output is specified)
More Efficient No Array
You always have afterthoughts that creep in after you post an answer, no different from remembering a better way to answer a question as you are walking out of an exam, or thinking of the one additional question you wish you had asked after you excuse a witness or rest your case at trial. (there was some song that captured it -- a little bit ironic :)
The following does essentially the same thing, but without using arrays. Instead it simply outputs the information after formatting it, rather than buffering it in an array to output all at once. It was one of those afterthoughts:
awk '
FNR==1 {
for (i=1;i<=NF;i++)
if ($i == "Market") {
printf ",Market Cap $K"
i = i + 2
}
else
printf (i>1?",%s":"%s"), $i
print ""
n = NF-3
count = 0
next
}
count==n {
print ""
count = 0
}
{
$1 = $1
printf (++count>1?",%s":"%s"), $0
}
END { print "" }
' file
(same output)
With your shown samples, please try the following (written and tested in GNU awk). Judging by your attempts, after the header of Input_file you want to make every 5 lines into a single line.
awk '
BEGIN{
OFS=","
}
FNR==1{
NF--
match($0,/Market.*\$K/)
matchedPart=substr($0,RSTART,RLENGTH)
firstPart=substr($0,1,RSTART-1)
lastPart=substr($0,RSTART+RLENGTH)
gsub(/,/,"",matchedPart)
gsub(/ +/,",",firstPart)
gsub(/ +/,",",lastPart)
print firstPart matchedPart lastPart
next
}
{
sub(/^ +/,"")
}
++count==5{
print val,$0
count=0
val=""
next
}
{
val=(val?val OFS:"")$0
}
' Input_file
Or, if your awk doesn't support NF--, try the following.
awk '
BEGIN{
OFS=","
}
FNR==1{
match($0,/Market.*\$K/)
matchedPart=substr($0,RSTART,RLENGTH)
firstPart=substr($0,1,RSTART-1)
lastPart=substr($0,RSTART+RLENGTH)
gsub(/,/,"",matchedPart)
gsub(/ +/,",",firstPart)
gsub(/ +Links( +)?$/,"",lastPart)
gsub(/ +/,",",lastPart)
print firstPart matchedPart lastPart
next
}
{
sub(/^ +/,"")
}
++count==5{
print val,$0
count=0
val=""
next
}
{
val=(val?val OFS:"")$0
}
' Input_file
NOTE: your header/first line needs special manipulation because we can't simply substitute a , for every space, so this solution takes care of that as per the shown samples.
With GNU awk, if your first line is always the same:
echo 'Symbol,Name,Sector,Market Cap $K,Last,Links'
awk 'NR>1 && NF=5' RS='\n ' ORS='\n' FS='\n' OFS=',' file
Output:
Symbol,Name,Sector,Market Cap $K,Last,Links
AAPL,Apple Inc,Computers and Technology,2,006,722,560,118.03
AMGN,Amgen Inc,Medical,132,594,808,227.76
AXP,American Express Company,Finance,91,986,280,114.24
BA,Boeing Company,Aerospace,114,768,960,203.30
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR

awk output to file based on filter

I have a big CSV file that I need to cut into different pieces based on the value in one of the columns. My input file dataset.csv is something like this:
NOTE: edited to clarify that the data is comma-separated, with no spaces.
action,action_type, Result
up,1,stringA
down,1,strinB
left,2,stringC
So, to split by action_type I simply do (I need the whole matching line in the resulting file):
awk -F, '$2 ~ /^1$/ {print}' dataset.csv >> 1_dataset.csv
awk -F, '$2 ~ /^2$/ {print}' dataset.csv >> 2_dataset.csv
This works as expected, but I am basically traversing my original dataset twice. My original dataset is about 5GB and I have 30 action_type categories. I need to do this every day, so I need to script the thing to run on its own efficiently.
I tried the following but it does not work:
# This is a file called myFilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Then I run it as:
awk -f myFilter.awk dataset.csv
But I get nothing. Literally nothing, not even errors. Which sort of tells me that my code is simply not matching anything or my print / pipe statement is wrong.
You may try this awk to do this in a single command:
awk -F, 'NR > 1{fn = $2 "_dataset.csv"; print >> fn; close(fn)}' file
With GNU awk to handle many concurrently open files and without replicating the header line in each output file:
awk -F',' '{print > ($2 "_dataset.csv")}' dataset.csv
or if you also want the header line to show up in each output file then with GNU awk:
awk -F',' '
NR==1 { hdr = $0; next }
!seen[$2]++ { print hdr > ($2 "_dataset.csv") }
{ print > ($2 "_dataset.csv") }
' dataset.csv
or the same with any awk:
awk -F',' '
NR==1 { hdr = $0; next }
{ out = $2 "_dataset.csv" }
!seen[$2]++ { print hdr > out }
{ print >> out; close(out) }
' dataset.csv
As currently coded the input field separator has not been defined.
Current:
$ cat myfilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Invocation:
$ awk -f myfilter.awk dataset.csv
There are a couple ways to address this:
$ awk -v FS="," -f myfilter.awk dataset.csv
or
$ cat myfilter.awk
BEGIN {FS=","}
{
action_type=$2
if (action_type=="1") print $0 >> "1_dataset.csv";
else if (action_type=="2") print $0 >> "2_dataset.csv";
}
$ awk -f myfilter.awk dataset.csv
Note that the output filenames are also quoted here; unquoted, awk parses 1_dataset.csv as an expression rather than a filename string.

awk - command how to handle when file has no records

I have an awk command which compares two csv files and, if a record is found, appends the file with values from one file and creates a new file. This works fine and gives the expected output when both files have records, but when one of the files is empty I am not getting the desired result. How do I handle the empty file and get the required output?
The script I have is:
#!/bin/ksh
set -x
#dos2unix $SCRIPT_HOME/input/declined/file1.csv $SCRIPT_HOME/input/declined/file1.csv
/usr/xpg4/bin/awk 'BEGIN{FS=OFS="|"} FNR==NR {a[$2]=$1 ;b[$15]=$15;c[17]=substr($17,1,1);next;}
{
print $12;
if($12 in b){
print $0,a[$17],c[17];
}
else {
{print $0}}
}' $SCRIPT_HOME/input/declined/declined.csv $SCRIPT_HOME/input/declined/file2.csv > $SCRIPT_HOME/input/error/file2.csv
In my case, when the file declined.csv is empty, I just want to print out the file2.csv records as they are.
Change:
FNR==NR
to:
FILENAME==ARGV[1]
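This is because when the first file is empty, awk never reads a record from it, so NR and FNR stay in lockstep through the second file and FNR==NR is true for every one of its lines. A minimal demonstration (file names here are only illustrative):

```shell
: > empty.txt                 # zero-byte first file
printf 'a\nb\n' > data.txt

# FNR==NR is true for every line of data.txt, so every line is skipped
awk 'FNR==NR { next } { print }' empty.txt data.txt           # prints nothing

# FILENAME==ARGV[1] checks which file the record actually came from
awk 'FILENAME==ARGV[1] { next } { print }' empty.txt data.txt # prints a, b
```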
I used the test [ ! -s $SCRIPT_HOME/input/declined/declined.csv ] ||
to check whether the file is empty or not.
[ ! -s $SCRIPT_HOME/input/declined/declined.csv ] ||
/usr/xpg4/bin/awk 'BEGIN{FS=OFS="|"} FNR==NR {a[$2]=$1 ;b[$15]=$15;c[17]=substr($17,1,1);next;}
{
print $12;
if($12 in b){
print $0,a[$17],c[17];
}
else {
{print $0}}
}' $SCRIPT_HOME/input/declined/declined.csv $SCRIPT_HOME/input/declined/file2.csv > $SCRIPT_HOME/input/error/filea2.csv
[ -s $SCRIPT_HOME/input/declined/declined.csv ] ||
{
echo "i am here executing"
cp $SCRIPT_HOME/input/declined/file2.csv $SCRIPT_HOME/input/error/file2.csv
}

Bash script to grep through one file for a list names, then grep through a second file to match those names to get a lookup value

Somehow, being specific just doesn't translate well into a title.
Here is my goal, using BASH script in a cygwin environment:
Read text file $filename to get a list of schemas and table names
Take that list of schemas and table names and find a match in $lookup_file to get a value
Use that value to make a logic choice
I basically have each item working separately. I just can't figure out how to glue it all together.
For step one, it's
grep $search_string $filename | awk '{print $1, $5}' | sed -e 's~"~~g' -e 's~ ~\t~g'
Which gives a list of schema{tab}table
For step two, it's
grep -e '{}' $lookup_file | awk '{print $3}'
Where $lookup_file is schema{tab}table{tab}value
Step three is basically, based on the value returned, do "something"; file a report, email a warning, ignore it, etc.
I tried stringing part one and two together with xargs, but it treats the schema and the table name as filenames and throws errors.
What is the glue I'm missing? Or is there a better method?
awk -v s="$search_string" 'NR == FNR { if ($0 ~ s) { gsub(/"/, "", $5); a[$1, $5] = 1; }; next; } a[$1, $2] { print $3; }' "$filename" "$lookup_file"
Explained:
NR == FNR { if ($0 ~ s) { gsub(/"/, "", $5); a[$1, $5] = 1; }; next; } targets the first file, searching for valid matches on it, and save key values in array a.
a[$1, $2] { print $3; } targets the second file and prints the value in its third column if it finds matches with the first and second column of the file and the keys in array a.
awk -v search="$search_string" '$0 ~ search { gsub(/"/, "", $5);
print $1"\t"$5; }' "$filename" |
while read line
do
result=$(awk -v search="$line" 'index($0, search) == 1 { print $3 }' "$lookup_file");
# Do "something" with $result
done
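The "do something" step could then be a simple case on the looked-up value; the action names below are placeholders for whatever report/email/ignore logic you need:

```shell
# Hypothetical dispatch on the value pulled from the lookup file
case "$result" in
  report)    echo "$line: filing a report" >> report.log ;;
  warn)      echo "warning: check $line" >&2 ;;
  ignore|"") : ;;                  # no match, or explicitly ignored
  *)         echo "unknown action '$result' for $line" >&2 ;;
esac
```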

Overwriting a file in bash

I have a file, of which a part is shown below:
OUTPUT_FILENAME="out.Received.Power.x.0.y.1.z.0.41
X_TX=0
Y_TX=1
Z_TX=0.41
I would like to automatically change some parts of it with BASH: every time I see OUTPUT_FILENAME I want to overwrite the name next to it and change it to a new one. Then I want to do the same with the values X_TX, Y_TX and Z_TX: delete the value next to each and write a new one. For example, instead of X_TX=0 I want X_TX=0.3, or vice versa.
Do you think it's possible? Maybe with grep or so...
You can use sed like this:
For example, to replace the value of X_TX with 123 you can do:
sed -i -e 's/X_TX=.*/X_TX=123/g' /tmp/file1.txt
One option using awk. Your values are passed as variables to the awk script and substituted when a match exists:
awk -v outfile="str_outfile" -v x_tx="str_x" -v y_tx="str_y" -v z_tx="str_z" '
BEGIN { FS = OFS = "=" }
$1 == "OUTPUT_FILENAME" { $2 = outfile; print; next }
$1 == "X_TX" { $2 = x_tx; print $0; next }
$1 == "Y_TX" { $2 = y_tx; print $0; next }
$1 == "Z_TX" { $2 = z_tx; print $0; next }
' infile
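Since awk cannot edit a file in place, a common pattern is to write to a temporary file and move it over the original; the replacement values here are only examples:

```shell
# Rewrite infile with new values, then replace the original
awk -v outfile="out.new.name" -v x_tx="0.3" -v y_tx="1" -v z_tx="0.5" '
BEGIN { FS = OFS = "=" }
$1 == "OUTPUT_FILENAME" { $2 = outfile }
$1 == "X_TX"            { $2 = x_tx }
$1 == "Y_TX"            { $2 = y_tx }
$1 == "Z_TX"            { $2 = z_tx }
{ print }
' infile > infile.tmp && mv infile.tmp infile
```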
