A shell script to validate a column - bash

I have a script that reads in the CSV files in a directory. However, I want the script to validate an entire column of the .csv file before it moves on to the second column, i.e.:
Name, City
Joe, Orlando
Sam, Copper Town
Mike, Atlanta
So it has to check the whole Name column first before it moves on to check City. How would I change the following script to do this?
# Read all files. no file have spaces in their names
for file in /source/*.csv ; do
    # init two variables before processing a new file
    FILESTATUS=GOOD
    FIRSTROW=true
    # process file 1 line a time, splitting the line by the
    # Internal Field Sep ,
    cat "${file}" | while IFS=, read field1 field2; do
        # Skip first line, the header row
        if [ "${FIRSTROW}" = "true" ]; then
            FIRSTROW=FALSE
            # skip processing of this line, continue with next record
            continue;
        fi
        #different validations
        if [[ "${field1}" = somestringprefix* ]]; then
            ${FILESTATUS}=BAD
            # Stop inner loop
            break
        fi
        somecheckonField2
    done
    if [ ${FILESTATUS} = "GOOD" ] ; then
        mv ${file} /source/good
    else
        mv ${file} /source/bad
    fi
done

I would use awk in the inner loop:
if awk -F, '$1 !~/^prefix1/ || $2 !~ /^prefix2/ {exit(1)}' "$file" ; then
mv "$file" good
else
mv "$file" bad
fi
Here ^prefix1 and ^prefix2 are placeholders for the regex patterns you actually want to enforce.
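The one-liner above checks both fields row by row. If the requirement really is to finish the whole Name column before touching City, one awk pass per column preserves that order. A minimal sketch, assuming a letters-and-spaces rule for both columns (the validate_file name and the patterns are placeholders, not from the question):

```shell
validate_file() {
    # pass 1 checks the whole Name column; pass 2 (City) runs only if
    # pass 1 saw no bad value. Each pass skips the header row (NR > 1)
    # and exits non-zero at the first invalid field.
    local file=$1
    awk -F, 'NR > 1 && $1 !~ /^[A-Za-z ]+$/ { exit 1 }' "$file" &&
        awk -F, 'NR > 1 && $2 !~ /^[A-Za-z ]+$/ { exit 1 }' "$file"
}
```

Something like validate_file "$file" could then replace the inner loop: move the file to /source/good if it returns 0, to /source/bad otherwise.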

Related

Using a FOR loop to compare items in a list with items in an ARRAY

Without too much fluff: I'm creating an array of IP addresses from a user-provided file. Then I have another file with three columns of data and multiple lines; the first column is IP addresses.
What I'm trying to do is loop through the file with 3 columns of data and compare the IP addresses with the values in the array; if a value from the file is present in the array, print some text as well as the 3rd column from that line of the file.
I have a feeling I'm taking a really wrong approach and making things a lot harder than what they need to be!
Semi-Pseudo code below
#!/bin/bash
scopeFile=$1
data=$2
scopeArray=()
while IFS= read -r line; do
scopeArray+=("$line")
done <$1
for line in $2; do
if [[ $line == scopeArray ]]; then
awk '{print $3 " is in scope!"}' $2;
else
echo "$line is NOT in scope!"
fi;
done
EDIT: Added example files for context. The data.txt file is dynamically generated elsewhere, but the format is always the same.
scope.txt ($1):
192.168.0.14
192.168.0.15
192.168.0.16
data.txt ($2):
192.168.0.14 : example.com
192.168.0.15 : foobar.com
192.168.0.19 : test.com
Here is one way of doing what you wanted.
#!/usr/bin/env bash

mapfile -t scopeArray < "$1"

while read -r col1 col2 col3; do
    found=false
    for item in "${scopeArray[@]}"; do
        if [[ $col1 == "$item" ]]; then
            found=true
            break
        fi
    done
    if "$found"; then
        printf '%s is in scope!\n' "$col3"
    else
        printf '%s is NOT in scope!\n' "$col1" >&2
    fi
done < "$2"
The shell is not the best tool for comparing files, but it will get you there, slowly but surely.
mapfile is a bash 4+ feature, just FYI.
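With bash 4+ you can also drop the inner loop entirely: load the scope file into an associative array once, then test each IP with a constant-time lookup instead of scanning the array per line. A sketch under the same file layout as above (check_scope is a made-up name):

```shell
# check_scope SCOPE_FILE DATA_FILE -- associative-array lookup (bash 4+)
check_scope() {
    local scope_file=$1 data_file=$2 ip col1 col2 col3
    declare -A inScope=()
    # build the lookup table from the scope file, one IP per line
    while IFS= read -r ip; do
        [[ -n $ip ]] && inScope[$ip]=1
    done < "$scope_file"
    # data lines look like "192.168.0.14 : example.com",
    # so col1 is the IP, col2 the ":", col3 the hostname
    while read -r col1 col2 col3; do
        if [[ -n ${inScope[$col1]+set} ]]; then
            printf '%s is in scope!\n' "$col3"
        else
            printf '%s is NOT in scope!\n' "$col1" >&2
        fi
    done < "$data_file"
}
```

The ${inScope[$col1]+set} expansion is non-empty only when the key exists, which avoids false matches on empty values.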

Running math, ignoring non-numeric values

I am trying to do some math on the 2nd column of a txt file, but some lines are not numbers. I only want to operate on the lines which have numbers, and keep the other lines unchanged.
The txt file looks like this:
aaaaa
1 2
3 4
How can I do this?
Doubling the second column in any line that doesn't contain any alphabetic content might look a bit like the following in native bash:
#!/bin/bash
# iterate over lines in input file
while IFS= read -r line; do
    if [[ $line = *[[:alpha:]]* ]]; then
        # line contains letters; emit unmodified
        printf '%s\n' "$line"
    else
        # break into a variable for the first word, one for the second, one for the rest
        read -r first second rest <<<"$line"
        if [[ $second ]]; then
            # we extracted a second word: emit it, doubled, between the first word and the rest
            printf '%s\n' "$first $(( second * 2 )) $rest"
        else
            # no second word: just emit the whole line unmodified
            printf '%s\n' "$line"
        fi
    fi
done
This reads from stdin and writes to stdout, so usage is something like:
./yourscript <infile >outfile
Thanks all, this is my second time using this website. I find it so helpful; I got an answer very quickly.
I also found an answer, below:
#!/bin/bash
FILE=$1
while read f1 f2; do
    if [[ $f1 != *[!0-9]* ]]; then
        f2=`echo "$f2 - 1" | bc`
        echo "$f1 $f2"
    else
        echo "$f1 $f2"
    fi
done < "$FILE"
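The same "numbers only" filter is also a one-liner in awk. This sketch doubles the second field on all-digit lines, to match the first answer above (swap in whatever math you actually need), and prints every other line untouched:

```shell
# Double field 2 when field 1 is all digits; the trailing "1" is awk's
# idiom for "print the (possibly modified) line".
printf 'aaaaa\n1 2\n3 4\n' |
    awk '$1 ~ /^[0-9]+$/ { $2 = $2 * 2 } 1'
```

Replace the printf with your input file; the logic reads any stream line by line.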

Shell script to validate a csv file column by column

I was wondering how I would go about writing this in shell. I want to validate the fields in a csv file column by column. For example, I only want to validate that column number one is a number:
Number,Letter
1,u
2,h
3,d
4,j
Pseudocode for the above:
loop - for all files (loop1)
    loop over rows 2..n (loop2)   # skipping the first row since it's a header
        validate column 1
        validate column 2
        ...
    end loop2
    if (file passes validation)
        copy to goodFile directory
    else
        send to badFile directory
end loop1
What I have below is a row-by-row validation; what modifications would I need to make it work like the pseudocode above? I am terrible at Unix and have just started learning about awk.
#!/bin/sh
for file in /source/*.csv
do
    awk -F"," '{ # awk -F", " {'print$2'} to get the fields.
        $date_regex = '~(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d~';
        if (length($1) == "")
            break
        if (length($2) == "") && (length($2) > 30)
            break
        if (length($3) == "") && ($3 !~ /$date_regex/)
            break
        if (length($4) == "") && (($4 != "S") || ($4 != "E")
            break
        if (length($5) == "") && ((length($5) < 9 || (length($5) > 11)))
            break
    }' file
    #whatever you need with "$file"
done
I will combine two different ways to write the loop. Lines starting with # are comments:
# Read all files. I hope no file has spaces in its name.
for file in /source/*.csv ; do
    # init two variables before processing a new file
    FILESTATUS=GOOD
    FIRSTROW=true
    # process the file one line at a time, splitting each line by the
    # Internal Field Separator , -- note the loop reads the file via
    # redirection rather than a pipe: a pipe would run the loop in a
    # subshell and the change to FILESTATUS would be lost
    while IFS=, read field1 field2; do
        # Skip the first line, the header row
        if [ "${FIRSTROW}" = "true" ]; then
            FIRSTROW=false
            # skip processing of this line, continue with the next record
            continue
        fi
        # Lots of different checks are possible here
        # (easy to google, e.g. "bash check field is integer")
        if [[ "${field1}" = somestringprefix* ]]; then
            FILESTATUS=BAD
            # Stop the inner loop
            break
        fi
        somecheckonField2
    done < "${file}"
    if [ "${FILESTATUS}" = "GOOD" ] ; then
        mv "${file}" /source/good
    else
        mv "${file}" /source/bad
    fi
done
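A quick self-contained check of that approach, condensed into a sandbox: a temp directory stands in for /source, the file is read via redirection (so the status variable survives the loop), and a made-up rule ("field1 must not start with bad_") stands in for the real validation:

```shell
# Sandbox demo: sort sample CSVs into good/ and bad/ subdirectories.
dir=$(mktemp -d)
mkdir -p "$dir/good" "$dir/bad"
printf 'Name,City\nJoe,Orlando\n'   > "$dir/ok.csv"
printf 'Name,City\nbad_x,Atlanta\n' > "$dir/oops.csv"

for file in "$dir"/*.csv; do
    FILESTATUS=GOOD
    FIRSTROW=true
    while IFS=, read field1 field2; do
        # skip the header row
        if [ "$FIRSTROW" = "true" ]; then FIRSTROW=false; continue; fi
        # example check: reject rows whose first field starts with "bad_"
        if [[ $field1 == bad_* ]]; then FILESTATUS=BAD; break; fi
    done < "$file"
    if [ "$FILESTATUS" = GOOD ]; then
        mv "$file" "$dir/good"
    else
        mv "$file" "$dir/bad"
    fi
done
```

After the run, ok.csv ends up in good/ and oops.csv in bad/.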
Assuming no stray whitespace in the file, here's how I'd do it in bash.
# validate: first field is an integer
# validate: 2nd field is a lower-case letter
for file in *.csv; do
    good=true
    {
        read -r header   # skip the header row
        while IFS=, read -ra fields; do
            if ! [[ ${fields[0]} =~ ^[+-]?[[:digit:]]+$ && ${fields[1]} == [a-z] ]]; then
                good=false
                break
            fi
        done
    } < "$file"
    if $good; then
        : # handle good file
    else
        : # handle bad file
    fi
done
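The per-row test in that loop can also be pulled out into a small function for experimenting at the prompt (row_ok is a made-up name; the field rules follow the question's sample):

```shell
# row_ok "1,u" -- true when field 1 is an integer and field 2 is a
# single lower-case letter.
row_ok() {
    local -a fields
    IFS=, read -ra fields <<< "$1"
    [[ ${fields[0]} =~ ^[+-]?[[:digit:]]+$ && ${fields[1]} == [a-z] ]]
}
```

Because the header "Number,Letter" fails both tests, a function like this also shows why the loop above skips the first row before validating.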

How do I translate a list using a dictionary in bash?

Say I have a dictionary TSV file dict.txt:
apple pomme
umbrella parapluie
glass verre
... ...
and another file list.txt containing a list of words (from the left column of dict.txt):
pie
apple
blue
...
I'd like to translate them into the corresponding words from the right column of dict.txt, i.e:
tarte
pomme
bleu
...
what is the easiest way to do so?
You can use awk:
awk 'FNR==NR{a[$1]=$2;next} a[$1]{print a[$1]}' dict.txt list.txt
EDIT: If the dictionary's meanings may be multiple words (separated by spaces), use tab as the field separator:
awk -F '\t' 'FNR==NR{a[$1]=$2;next} a[$1]{print a[$1]}' dict.txt list.txt
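A quick way to see it work, with the pie/blue entries that the "..." rows imply added to the sample dictionary (file locations under /tmp are just for the demo):

```shell
# Build the sample dictionary and word list, then translate.
# Dictionary columns are tab-separated, matching -F '\t'.
printf 'apple\tpomme\numbrella\tparapluie\nglass\tverre\npie\ttarte\nblue\tbleu\n' > /tmp/dict.txt
printf 'pie\napple\nblue\n' > /tmp/list.txt
awk -F '\t' 'FNR==NR{a[$1]=$2;next} a[$1]{print a[$1]}' /tmp/dict.txt /tmp/list.txt
```

This prints tarte, pomme, bleu, one per line. Words missing from the dictionary are silently skipped because of the a[$1] guard; drop the guard and print $1 in an else-style action if you'd rather echo unknown words through.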
If you don't have many words (so that everything fits in memory) you can use an associative array:
#!/bin/bash
declare -A english2french=()

# Build dictionary
linenb=0
while ((++linenb)) && IFS=$'\t' read -r en fr; do
    if [[ -z $fr ]] || [[ -z $en ]]; then
        echo "Error line $linenb: one of the two is empty fr=\`$fr' en=\`$en'"
        continue
    fi
    english2french["$en"]=$fr
done < dict.txt

# Translate
linenb=0
while ((++linenb)) && read -r en; do
    [[ -z $en ]] && continue
    fr=${english2french["$en"]}
    if [[ -n $fr ]]; then
        echo "$fr"
    else
        echo >&2 "Error line $linenb: word \`$en' unknown"
    fi
done < list.txt
It seems a bit long, but there are lots of error checks ;).

Shell issue for loop in while loop

I am using a while loop to read a file (xyz.txt) whose contents look like below:
2 - info1
4 - info2
6 - info3
9 - info4
Further, I use an if condition: when the count is greater (-gt) than the value of y, it sends an email. The problem I am facing is that every time the condition matches it sends an email, but I only want one: the script should read the file to the end, store the output of every matching line in a file, and then send that file with all the information. At present I am receiving a number of emails.
Hope my question is clear. I think I am looking for a way to keep reading the file to the end once the condition matches, storing the matching lines as it goes.
count=`echo $line | awk '{print $3}'`
cnt=o
while read line
do
    if [ "$count" -gt "$x" ]; then   # ---> This logic is working fine
        cnt=$(( $cnt + 1))           # ---> This logic is working fine
        echo $line > info.txt        # ---> In info.txt I want to store the info in one go, for every line that matches
        export info.txt=$info.txt
        ${PERL_BIN}/perl $send_mail
    fi
done < file.txt
If you only want to send email once, don't put the invocation of Perl which sends mail inside the loop; put it outside the loop (after the end of the loop). Use append (>>) to build the file up piecemeal.
cnt=0 # 0, not o!
rm -f info.txt # start with a fresh file
while read line
do
    count=`echo $line | awk '{print $3}'` # compute count from the current line, inside the loop
    if [ "$count" -gt "$x" ]; then
        cnt=$(($cnt + 1))
        echo $line >> info.txt
    fi
done < file.txt
if [ $cnt -gt 0 ]
then
    export info_txt=info.txt
    ${PERL_BIN}/perl $send_mail
fi
Okay. I've tried to grasp what you want, I think it is this:
First, before the loop, remove any old info.txt file.
rm -f info.txt
Then, each time through the loop, append new lines to it like so:
echo $line >> info.txt
Notice the double arrows >>. This means append, instead of overwrite.
Finally, do the email sending after the loop.
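The difference between the two redirections is easy to see in isolation (the file names here are made up for the demo):

```shell
# ">" truncates the file on every write; ">>" appends to it.
rm -f /tmp/clobber.txt /tmp/append.txt
for word in one two three; do
    echo "$word" >  /tmp/clobber.txt
    echo "$word" >> /tmp/append.txt
done
```

Afterwards clobber.txt holds only the last line, while append.txt holds all three. That is why the loop writes with >> and any stale info.txt is removed before the loop starts.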