Output a record from an existing file based on a matching condition in bash scripting - shell

I need to be able to output a record if a condition is true.
Suppose this is the existing file,
Record_ID,Name,Last Name,Phone Number
I am trying to output a record if the last name matches. I collect user input to get the last name and then perform the following operation.
read last_name
cat contact_records.txt | awk -F, '{if($3=='$last_name')print "match"; else print "no match";}'
This script outputs no match for every record within contact_records.txt

Your script has two problems:
First, $last_name is not quoted inside the awk program. For example, if "John" is queried, you are comparing $3 with the (undefined) awk variable John rather than the string "John". This can be fixed by adding a pair of double quotes as below:
read last_name
cat contact_records.txt | awk -F, '{if($3=="'$last_name'")print "match"; else print "no match";}'
Second, it scans the whole contact_records.txt and prints match/no match for every line it compares. For example, if contact_records.txt has 100 lines with "John" in one of them, querying whether John is in it with this script yields 1 "match" and 99 "no match"'s. This might not be what you want. Here's a fix:
read last_name
if [ "$(cut -d, -f3 contact_records.txt | grep -cx "$last_name")" -eq 0 ]; then
echo "no match"
else
echo "match"
fi
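Alternatively, awk's -v option passes the shell variable in cleanly and avoids the quoting problem altogether. A sketch, assuming the same file name and field layout as the question (the sample records here are made up):

```shell
# hypothetical sample data in the question's layout: Record_ID,Name,Last Name,Phone Number
printf '%s\n' '1,John,Smith,555-0101' '2,Jane,Doe,555-0102' > contact_records.txt

last_name="Smith"   # stands in for `read last_name`

# -v copies the shell variable into awk, so no shell quoting tricks are needed;
# END prints a single verdict instead of one line per record
awk -F, -v name="$last_name" '
    $3 == name { found = 1 }
    END { print (found ? "match" : "no match") }
' contact_records.txt
```

This prints match exactly once, or no match if the last name appears nowhere in the file.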

Related

regex to print lines if value between patterns is greater than number - solution which is independent of column position

2001-06-30T11:33:33,543 DEBUG (Bss-Thread-948:[]) SUNCA#44#77#CALMED#OK#58#NARDE#4356#68654768961#BHR#TST#DEV
2001-06-30T11:33:33,543 DEBUG (Bss-Thread-948:[]) SUNCA#44#77#CALMED#OK#58#NARDE#89034#1234567#BHR#TST#DEV
2001-06-30T11:33:33,543 DEBUG (Bss-Thread-948:[]) SUNCA#44#77#OK#58#BHREDD#234586#4254567#BHR#TST#DEV
2001-06-30T11:33:33,543 DEBUG (Bss-Thread-948:[]) SUNCA#44#77#OK#58#NARDE#89034#1034567#BHR#TST#DEV
I have the log file shown above. I would like to print lines only if the value between the patterns # and #BHR is greater than 1100000.
In my log file I can see lines with the values 68654768961, 1234567, 4254567 and 1034567. As per the requirement, the output should contain only the first 3 lines.
I am looking for regex to get desired output.
One question: should the #58#BHR inside #58#BHREDD on the third line be ignored? If yes, I will take the value between the patterns # and #BHR#.
Normally a question like this would be solved by writing a script around the business logic, but you could try this one-line awk command.
awk '{if (0 == system("[ $(echo \"" $0 "\"" " | grep -oP \"" "(?<=#)\\d+(?=#BHR#)\" || echo 0) -gt 1100000 ]")) {print $0}}' log_file
Mainly, it uses system() to extract the value with grep:
# if grep can't find the pattern, the value falls back to 0
echo $one_line | grep -oP "(?<=#)\d+(?=#BHR#)" || echo 0
and compares the value to 1100000 with [ "$value" -gt 1100000 ] inside awk.
FYI, [ ... ] exits with status 0 (success) when the value is greater than 1100000, which is why the awk condition tests for 0.
system(cmd): executes cmd and returns its exit status
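The shell round-trip through system() can be avoided entirely: since # separates the values, a pure-awk sketch (same log format as above, written to a file here just for illustration):

```shell
# recreate the sample log lines from the question
cat > log_file <<'EOF'
2001-06-30T11:33:33,543 DEBUG (Bss-Thread-948:[]) SUNCA#44#77#CALMED#OK#58#NARDE#4356#68654768961#BHR#TST#DEV
2001-06-30T11:33:33,543 DEBUG (Bss-Thread-948:[]) SUNCA#44#77#CALMED#OK#58#NARDE#89034#1234567#BHR#TST#DEV
2001-06-30T11:33:33,543 DEBUG (Bss-Thread-948:[]) SUNCA#44#77#OK#58#BHREDD#234586#4254567#BHR#TST#DEV
2001-06-30T11:33:33,543 DEBUG (Bss-Thread-948:[]) SUNCA#44#77#OK#58#NARDE#89034#1034567#BHR#TST#DEV
EOF

# split on '#'; print the line if the field right before a literal BHR field
# is numerically greater than 1100000 (BHREDD is a different field, so it does not match)
awk -F'#' '{
    for (i = 2; i <= NF; i++)
        if ($i == "BHR" && $(i-1) + 0 > 1100000) { print; next }
}' log_file
```

This prints the first three lines and skips the last one (1034567 is not greater than 1100000).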

Why does a "while read" loop stop when grep is run with an empty argument?

The following code does not work as I would expect:
(the original purpose of the script is to make a relation between items of two files where the identifiers are not sorted in the same order, but my question raises rather a curiosity about basic shell functionalities)
#!/bin/sh
process_line() {
    id="$1"
    entry=$(grep $id index.txt) # the "grep" line
    if [ "$entry" = "" ]; then
        echo 00000 $id
    else
        echo $entry | awk '{print $2, $1;}'
    fi
}
cat << EOF > index.txt
xyz 33333
abc 11111
def 22222
EOF
cat << EOF | while read line ; do process_line "$line"; done
abc
def

xyz
EOF
The output is:
11111 abc
22222 def
00000
But I would expect:
11111 abc
22222 def
00000
33333 xyz
(the last line is missing in the actual output)
My investigations show that the "grep" line is the one that leads to the early interruption of the while loop. However I cannot see the causal relationship.
That's because in the third iteration, with the empty line, you call process_line with an empty id. This leads to plain grep index.txt, i.e. with no file name, so "index.txt" becomes the pattern. This grep reads from stdin, and that consumes all the input you pipe into the while loop.
To see this in action, add set -x at the top of your script.
You can get the desired behaviour if you replace an empty id with a string guaranteed not to be found, such as
entry=$(grep "${id:-NoSuchString}" index.txt)
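For reference, a runnable version of the script with that one change applied (same data as in the question):

```shell
#!/bin/sh
process_line() {
    id="$1"
    # an empty id becomes a pattern that can never match, so grep always
    # gets a pattern argument and never falls back to reading stdin
    entry=$(grep "${id:-NoSuchString}" index.txt)
    if [ "$entry" = "" ]; then
        echo 00000 $id
    else
        echo $entry | awk '{print $2, $1;}'
    fi
}

cat << EOF > index.txt
xyz 33333
abc 11111
def 22222
EOF

printf 'abc\ndef\n\nxyz\n' | while read line; do process_line "$line"; done
```

Now all four input lines produce output, including 33333 xyz at the end.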
Changing the "process_line" function to the following might help...
process_line() {
    id=$1
    if [ "$id" = "" ]
    then
        echo "00000"
    else
        entry=$(grep "${id}" index.txt)
        echo "$entry" | awk '{ print $2, $1 }'
    fi
}
Explanation:
if the "id" passed in is empty then just output the default
move the grep to the else clause so it only executes when "id" has a value
solves the problem with the missing quotes around id in the grep statement
Another thing to consider is the case where "id" is non-empty but not found in index.txt: this could result in blank output. Depending on the overall intention, it may be a good idea to add an if statement after the grep call to handle that case.
Hope that helps

find if there is any string in a line after matching regex bash

In a file looking like the one below, I would like to find whether every line with "PRIO" has a value after it, and report the ones where the value is missing.
I've tried to do this with grep, but it matches as long as there is even one occurrence of the word.
cat PATH_TO_FILE | grep 'PRIO' &> /dev/null
if [ $? == 0 ]; then
    echo "matched"
else
    echo "not found"
fi
The file structure looks similar to the one below
name1
sdgk
PRIO 3
name2
PRIO
dsl dfhhhdf
name3
fnslkf hsdhfd
jlkg;jslk sgdgdsg
kfasdjmgkdlsgl sdggsehg
PRIO 1
name4
sdgds
dsdsgdg
PRIO 2
sdgg
With awk this is very simple by checking the number of fields.
awk '/PRIO/{ str=(NF>1)?"matched":"not found"; print str }' <file>
This does :
/PRIO/ : if a line contains the word PRIO, perform action {...}
{...} : if the number of fields is bigger than 1 (NF>1), it matched; otherwise it did not.
If you want to ensure that PRIO is the first word, then use $1=="PRIO", and if you want to print the line number then use
awk '($1=="PRIO"){ str=(NF>1)?"matched":"not found"; print NR,str }' <file>
Not sure what you want to test, but
$ grep -q 'PRIO\s*$' file
checks whether there is any PRIO without a value after it. For your sample input this will succeed, and you can use that as an error condition.
if grep -q 'PRIO\s*$' file
then echo "found missing value instance"
else echo "all instances have values"
fi
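To report where the values are actually missing, rather than just whether any are, grep -n prints the offending line numbers. A sketch using the sample file from the question (note that \s is a GNU grep extension; [[:space:]] is the portable spelling):

```shell
# recreate the sample file from the question
printf '%s\n' 'name1' 'sdgk' 'PRIO 3' 'name2' 'PRIO' 'dsl dfhhhdf' \
    'name3' 'fnslkf hsdhfd' 'jlkg;jslk sgdgdsg' 'kfasdjmgkdlsgl sdggsehg' \
    'PRIO 1' 'name4' 'sdgds' 'dsdsgdg' 'PRIO 2' 'sdgg' > file

# -n prefixes each match with its line number; the pattern only matches
# PRIO followed by nothing but optional whitespace up to end of line
grep -n 'PRIO[[:space:]]*$' file
```

For this input it prints 5:PRIO, pointing at the one PRIO line with no value.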

Awk print exact fields when finding an exact match

I have a persons.dat file containing information.
Here's an example line.
1129|Lepland|Carmen|female|1984-02-18|228T04:39:58.781+0000|81.25.252.111|Internet Explorer
1129 is the ID.
I am asked to display the information of anyone based on their ID, in particular their first name (Carmen), last name (Lepland) and date of birth (1984-02-18), separated by a space.
I have stored the id in a shell variable IDNumber as shown below:
for arg1 in $#;do # Retrieve ID Number
    if [ $1 = "-id" ];then
        IDNumber="$2"
    fi
    shift
done
How can I use awk to display the exact fields of one ID?
The shell script's command-line argument parsing is a bit confused, since arg1 is never used.
And even after it finds -id and assigns $2 to IDNumber,
the iteration continues.
For example when the arguments are -id 3,
after IDNumber=3,
the iteration continues, checking if [ 3 = "-id" ].
Also, the "$1" in if [ $1 = ... ] should be double-quoted,
otherwise the script will crash if there is an empty argument.
Here's one way to fix these issues:
while [ $# -gt 0 ]; do
    if [ "$1" = -id ]; then
        id=$2
    fi
    shift
done
Then you can use id with Awk like this:
awk -F'|' -v id="$id" '$1 == id {print $3, $2, $5}' persons.dat
That is:
Set | as the field separator
Set id variable in Awk to the value of $id in the shell
Find the record in the input where the first field ($1) is equal to id
Print the columns you need
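Putting the two pieces together, a minimal end-to-end sketch (using the one sample record from the question as the whole file):

```shell
# sample persons.dat with the record from the question
cat > persons.dat <<'EOF'
1129|Lepland|Carmen|female|1984-02-18|228T04:39:58.781+0000|81.25.252.111|Internet Explorer
EOF

# argument parsing as fixed above, then the awk lookup
id=""
while [ $# -gt 0 ]; do
    if [ "$1" = -id ]; then
        id=$2
    fi
    shift
done

awk -F'|' -v id="$id" '$1 == id {print $3, $2, $5}' persons.dat
```

Invoked as ./script.sh -id 1129, this prints Carmen Lepland 1984-02-18.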

bash csv file column extraction and deduplication

I have a .csv file I am working with and I need to output another csv file that contains a de-duplicated list of columns 2 and 6 from the first csv, with some caveats.
This is a bit difficult to explain in words but here is an example of what my input is:
"customer_name","cid”,”boolean_status”,”type”,”number”
“conotoso, inc.”,”123456”,”TRUE”,”Inline”,”210”
"conotoso, inc.","123456”,”FALSE”,”Inline”,”411"
“afakename”,”654321”,”TRUE","Inline”,”253”
“bfakename”,”909090”,”FALSE”,”Inline”,”321”
“cfakename”,”121212”,”TRUE","Inline","145”
What I need this to do is create a new .csv file containing only the "customer_name" column and the "boolean_status" column.
I also need there to be only one line per "customer_name", showing "TRUE" if ANY of that customer's rows has "TRUE" in the boolean column.
The output from the above input should be this:
"customer_name",”boolean_status”
“conotoso, inc.”,”TRUE”
“afakename”,”TRUE"
“cfakename”,”TRUE"
So far I tried
awk -F "\"*\",\"*\"" '{print $1","$6}' data1.csv >data1out.csv
to give me the output file, but then I attempted cat data1out.csv | grep 'TRUE' with no luck.
Can someone help me out on what I should do to manipulate this properly?
I'm also running into issues with the awk printing out the leading commas
All I really need at the end is a number of "how many unique 'customer_names' have at least 1 'True' in the "boolean" column?"
You will get your de-duplicated file by using
sort -u -t, -k2,2 -k6,6 filename>sortedfile
After this, you can write a script to extract the required columns.
while read -r line
do
    if echo "$line" | grep -q "TRUE"
    then
        a=$(echo "$line" | cut -d',' -f1,3)
        echo "$a" >>outputfile
    fi
done <sortedfile
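For the final count ("how many unique customer_names have at least one TRUE"), a different sketch in awk avoids the intermediate files altogether. It assumes the quotes in the real file are plain ASCII double quotes and splits on the "," between fields, which keeps the comma inside "conotoso, inc." intact:

```shell
# the sample input from the question, with plain ASCII quotes
cat > data1.csv <<'EOF'
"customer_name","cid","boolean_status","type","number"
"conotoso, inc.","123456","TRUE","Inline","210"
"conotoso, inc.","123456","FALSE","Inline","411"
"afakename","654321","TRUE","Inline","253"
"bfakename","909090","FALSE","Inline","321"
"cfakename","121212","TRUE","Inline","145"
EOF

# split on the "," separators, so $1 is the name and $3 the boolean;
# count each name at most once (!seen[$1]++), and only when its boolean is TRUE
awk -F'","' 'NR > 1 && $3 == "TRUE" && !seen[$1]++ { n++ } END { print n }' data1.csv
```

This prints 3, for conotoso, afakename and cfakename.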
