A modification of this script in shell

I have this script that needs to READ ALL fields in a column and validate them before it can move on to the second column. For example:
Name, City
Joe, Orlando
Sam, Copper Town
Mike, Atlanta
So the script should read the entire Name column (top to bottom) and validate it for null before it moves to the second column. It should NOT read line by line. Please add some pointers on how to modify/correct this:
# Read all files. no file have spaces in their names
for file in /export/home/*.csv ; do
    # init two variables before processing a new file
    $date_regex = '~(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d~';
    FILESTATUS=GOOD
    FIRSTROW=true
    # process file 1 line a time, splitting the line by the
    # Internal Field Sep ,
    cat "${file}" | while IFS=, read field1 field2 field3 field4; do
        # Skip first line, the header row
        if [ "${FIRSTROW}" = "true" ]; then
            FIRSTROW=FALSE
            # skip processing of this line, continue with next record
            continue;
        fi
        #different validations
        if [[ ! -n "$field1" ]]; then
            ${FILESTATUS}=BAD
            # Stop inner loop
            break
        fi
        #somecheckonField2
        if [[ ! -n "$field2"]] && ("$field2" =~ $date_regex) ; then
            ${FILESTATUS}=BAD
            # Stop inner loop
            break
        fi
        if [[ ! -n "$field3" ]] && (("$field3" != "S") || ("$field3" != "E")); then
            ${FILESTATUS}=BAD
            # Stop inner loop
            break
        fi
        if [[ ! -n "$field4" ]] || (( ${#field4} < 9 || ${#field4} > 11 )); then
            ${FILESTATUS}=BAD
            # Stop inner loop
            break
        fi
    done
    if [ ${FILESTATUS} = "GOOD" ] ; then
        mv ${file} /export/home/goodFile
    else
        mv ${file} /export/home/badFile
    fi
done

This awk reads the whole file; you can then do your verification in the END block:
for file in /export/home/*.csv ; do
    awk -F', ' '
        # skip the header and blank lines
        NR == 1 || NF == 0 { next }
        # save the data, one row per line, one cell per field
        { ++nr; for (i = 1; i <= NF; i++) data[nr, i] = $i }
        END {
            status = "OK"
            # verify column 1
            for (lineno = 1; lineno <= nr; lineno++) {
                if (length(data[lineno, 1]) == 0) {
                    status = "BAD"
                    break
                }
            }
            printf "file: %s, verify column 1, status: %s\n", FILENAME, status
            # verify other columns ...
        }
    ' "$file"
done

Here's an attempt at an awk script that does what it seems like the original script is trying to do:
#!/usr/bin/awk -f
# fields separated by commas
BEGIN { FS = "," }
# skip first line
NR == 1 { next }
# check for empty fields
$1 == "" || $2 == "" || $3 == "" || $4 == "" { exit 1 }
# check for "valid" date (urk... doing this with a regex is horrid)
# it would be better to split it into components and validate each sub-field,
# but I'll leave that as a learning exercise for the reader
$2 !~ /^(0[1-9]|1[012])[- \/.](0[1-9]|[12][0-9]|3[01])[- \/.](19|20)[0-9][0-9]$/ { exit 1 }
# third field should be either S or E
$3 !~ /^[SE]$/ { exit 1 }
# check the length of the fourth field is between 9 and 11
length($4) < 9 || length($4) > 11 { exit 1 }
# if we haven't found problems up to here, things are good: awk exits 0
# by default.  (Beware adding an explicit "END { exit 0 }" here: END still
# runs after an exit in a main rule, so it would clobber the "exit 1" status.)
Save that in e.g. validate.awk, and set the executable bit on it (chmod +x validate.awk), then you can simply do:
if ./validate.awk < somefile.txt
then
mv somefile.txt goodfiles/
else
mv somefile.txt badfiles/
fi
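The "learning exercise" from the comments might look something like the sketch below: a helper function (valid_date is a name invented here) that splits the field on the same separators the regex allows and range-checks each component, which also catches dates the regex cannot, such as February 31st:
# sketch of a component-based date check; assumes the same MM-DD-YYYY
# shape (with -, space, / or . as separators) that the regex above accepts
function valid_date(s,    parts, m, d, y, dim) {
    if (split(s, parts, /[- \/.]/) != 3)
        return 0
    m = parts[1] + 0; d = parts[2] + 0; y = parts[3] + 0
    if (y < 1900 || y > 2099 || m < 1 || m > 12)
        return 0
    # days in each month; February is handled separately for leap years
    split("31 28 31 30 31 30 31 31 30 31 30 31", dim, " ")
    if (m == 2 && y % 4 == 0 && (y % 100 != 0 || y % 400 == 0))
        return (d >= 1 && d <= 29)
    return (d >= 1 && d <= dim[m])
}
The date rule above would then become simply: !valid_date($2) { exit 1 } (an empty or malformed field already fails the split test).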

Related

Write a line to a file, but also write previous line if it matches a pattern and exists

I'm trying to write a function which reads text from a large file and writes specific blocks of text to another file.
Example file
#Tag
Scenario 1:
Do thing 1
Do thing 2
Scenario 2:
Do thing 1
Do thing 3
#Tag2
Scenario 3:
Do thing 1
Don't do thing 4
I'm trying to read through this file line by line (using IFS right now) and want the output to be something like this:
File 1
#Tag
Scenario 1:
Do thing 1
Do thing 2
File 2
Scenario 2:
Do thing 1
Do thing 3
File 3
#Tag2
Scenario 3:
Do thing 1
Don't do thing 4
I have the pieces in place to read through the file and separate out on the "Scenario" pattern and the lines after it, but the problem I'm running into is trying to figure out how to capture the #Tag pattern and write it if it exists above the "Scenario" pattern.
Edit: here is the current relevant part of the script:
function writeToTestFile {
    while IFS='' read -r line || [[ -n "$line" ]]; do
        #if line matches the tag pattern of "#" followed by anything, store it
        if [[ $line == *#* || "" ]]; then
            local tagValue=$line
        #if line in file matches "Scenario:" pattern, write to new file
        elif [[ $line == *Scenario:* ]]; then
            fileToWriteTo=$filename$counter$extention
            ((counter++))
            echo "writing to $fileToWriteTo"
            touch $dirToWriteTo/$fileToWriteTo
        else
            #if line does not match "Scenario:" pattern, check for existing file and write to that
            if [[ -e $dirToWriteTo/$fileToWriteTo ]]; then
                echo " "$line >> $dirToWriteTo/$fileToWriteTo
            fi
            # if file does not exist and line does match pattern, do nothing
        fi
    done < "$1"
}
You can parse the file fairly easily with your function in bash. The key is not to worry about looking for the tag lines: simply save the previous line in some variable like tag on each iteration, and when a Scenario line is found, check whether tag holds a tag line. If so, write the tag line out before the Scenario line, then continue writing output as normal.
#!/bin/bash

function writeToTestFile {
    [ -z "$1" ] && {  ## validate input
        printf "%s() error: insufficient input.\n" "$FUNCNAME"
        return 1
    }
    [ -r "$1" ] || {  ## validate file readable
        printf "%s() error: file not readable '%s'\n" "$FUNCNAME" "$1"
        return 1
    }
    local tag=""      ## use local declarations
    local line=""
    local num=""
    local fname=""
    while IFS='' read -r line || [ -n "$line" ]; do
        if [ "${line// */}" = "Scenario" ]; then   ## check Scenario
            num="${line/Scenario /}"               ## parse num
            fname="File_${num%:}.txt"              ## parse fname
            :> "$fname"                            ## truncate fname
            [ -n "$tag" ] && printf "%s\n" "$tag" > "$fname"  ## tagline
            printf "%s\n" "$line" >> "$fname"      ## write Scenario line
        fi  ## write normal lines & update tagline
        [ "${line:0:1}" = " " ] && printf "%s\n" "$line" >> "$fname"
        [ "${line:0:1}" = "#" ] && tag="$line" || tag=
    done < "$1"
    return 0
}

writeToTestFile "$1"
(note: File_X.txt is truncated before being written, adjust as required. If there is any chance a line, other than a tag line, begins with '#', you can further anchor the comparison with "${line:0:4}" = "#Tag")
Input File
$ cat tagfile.txt
#Tag
Scenario 1:
Do thing 1
Do thing 2
Scenario 2:
Do thing 1
Do thing 3
#Tag2
Scenario 3:
Do thing 1
Don't do thing 4
Use/Output
$ bash tags.sh tagfile.txt
Checking the output files:
$ cat File_1.txt
#Tag
Scenario 1:
Do thing 1
Do thing 2
$ cat File_2.txt
Scenario 2:
Do thing 1
Do thing 3
$ cat File_3.txt
#Tag2
Scenario 3:
Do thing 1
Don't do thing 4
Look it over and let me know if you have any questions.
I'd use awk:
awk -v MATCH="Scenario 1" '
!/^[[:space:]]/ {show=0}
$0==MATCH {print prev; show=1}
show {print}
{prev=$0}
' input_file
That makes several assumptions about the formats which start and end the capture; you might need to adjust the first two conditions.
It would be easy enough to find a similar solution based on your existing bash script. But it would be useful to see the existing bash script.
Perl version:
#!/usr/bin/perl
my ($i, $t, $fh) = (0, 0, 0);
while (<>) {
    if ((/^Scenario/ && !$t) || ($t = /^#\w+$/)) {
        close($fh) if $fh;
        open($fh, '>', "File".++$i.".txt") or die;
    }
    print $fh $_;
}
close($fh) if $fh;
Usage: ./script.pl < input.txt
Here is a simple awk solution that incorporates the idea of a buffer to hold #Tag records immediately prior to Scenario records. It also derives each output filename from the corresponding Scenario record. Records not part of a Scenario are discarded:
#! /usr/bin/awk -f
BEGIN {
    buffer = filename = ""
}
/^#Tag/ {
    if (buffer ~ /./ && filename ~ /./)
        print buffer > filename
    buffer = $0
    next
}
/^Scenario [0-9]+:/ {
    filename = $0
    sub(/^Scenario +/, "File ", filename)
    sub(/:[ \t\r]*$/, "", filename)
}
filename ~ /./ {
    if (buffer ~ /./) {
        print buffer > filename
        buffer = ""
    }
    print > filename
}
END {
    if (buffer ~ /./ && filename ~ /./)
        print buffer > filename
}
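Assuming the script above is saved as scenarios.awk (a name invented here) and given the executable bit, usage parallels the Perl version; note that the output names it derives contain a space (File 1, File 2, ...):
$ chmod +x scenarios.awk
$ ./scenarios.awk tagfile.txt
$ cat 'File 1'
#Tag
Scenario 1:
Do thing 1
Do thing 2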

shell: Replacing a part of a line in a file

I am new to shell programming and had to do the following task.
I have a file with the following line at line number 28 (static):
page.sysauth = "Admin"
I want to replace this line using a shell script each time a new entry is made to sysauth:
page.sysauth = {"Admin", "root"}
page.sysauth = {"Admin", "root", "newAdmin"}
etc.
I would also want to remove entries from this sysauth variable:
page.sysauth = {"Admin", "root", "newAdmin"}
page.sysauth = {"Admin", "root"}
page.sysauth = "Admin"
Please provide pointers on achieving this.
EDIT:
Thank you for the inputs.
Assumption: the first entry should be present, e.g. page.sysauth = "Admin".
The script fails when page.sysauth=______ (empty).
Here's my working script sysauth_adt.sh:
#!/bin/bash

add () {
    sed -i~ -e '28 { s/= "\(.*\)"/= {"\1"}/;  # Add curlies to a single entry.
                     s/}/,"'"$entry"'"}/      # Add the new entry.
                   }' "$file"
}

remove () {
    sed -i~ -e '28 { s/"'"$entry"'"//;        # Remove the entry.
                     s/,}/}/;                 # Remove the trailing comma (entry was last).
                     s/{,/{/;                 # Remove the leading comma (entry was first).
                     s/,,/,/;                 # Remove surplus comma (entry was inside).
                     s/{"\([^,]*\)"}/"\1"/    # Remove curlies for single entry.
                   }' "$file"
}

if (( $# == 3 )) ; then
    file=$1
    action=$2
    entry=$3
    if [[ $action == add ]] ; then
        if head -n28 "$file" | tail -n1 | grep -q "$entry" ; then
            echo 0
        else
            add
        fi
    elif [[ $action == remove ]] ; then
        if head -n28 "$file" | tail -n1 | grep -q "$entry" ; then
            remove
        else
            echo 0
        fi
    fi
else
    echo "Usage: ${0#*/} file (add | remove) entry" >&2
    exit 1
fi
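The empty-value failure noted in the EDIT could be patched with a guarded first substitution in add(). A sketch, assuming the empty case means line 28 ends at the equals sign (GNU sed, as implied by the -i~ above):
add () {
    sed -i~ -e '28 { s/=[[:space:]]*$/= "'"$entry"'"/  # empty list: insert the first entry
                     # if that matched, t branches past the remaining commands
                     t
                     s/= "\(.*\)"/= {"\1"}/            # add curlies to a single entry
                     s/}/,"'"$entry"'"}/               # add the new entry
                   }' "$file"
}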
If your entries will always be single words with no commas, you can use simple sed scripts:
#!/bin/bash

add () {
    sed -i~ -e '28 { s/= "\(.*\)"/= {"\1"}/;  # Add curlies to a single entry.
                     s/}/,"'"$entry"'"}/      # Add the new entry.
                   }' "$file"
}

remove () {
    sed -i~ -e '28 { s/"'"$entry"'"//;        # Remove the entry.
                     s/,}/}/;                 # Remove the trailing comma (entry was last).
                     s/{,/{/;                 # Remove the leading comma (entry was first).
                     s/,,/,/;                 # Remove surplus comma (entry was inside).
                     s/{"\([^,]*\)"}/"\1"/    # Remove curlies for single entry.
                   }' "$file"
}

if (( $# == 3 )) ; then
    file=$1
    action=$2
    entry=$3
    if [[ $action == add ]] ; then
        add
    elif [[ $action == remove ]] ; then
        remove
    fi
else
    echo "Usage: ${0#*/} file (add | remove) entry" >&2
    exit 1
fi
The script doesn't check whether an entry already exists in the list when adding or removing it. For more complicated tasks, I'd probably switch to a real programming language.
awk -F '[[:blank:]]+=[[:blank:]]+' '
    # first pass: collect every single (non-braced) entry
    NR != 28 && FNR == NR && $1 ~ /^page\.sysauth$/ && $0 !~ /\{.*\}/ { aSysAdd[++SysAdd] = $2 }
    # first pass: load special line 28
    NR == 28 {
        Datas = $2
        gsub(/^{|}$/, "", Datas)
        SysOrg = split(Datas, aSysOrg, /[[:blank:]]*,[[:blank:]]*/)
    }
    # second pass: print standard lines
    FNR != NR && $1 !~ /^page\.sysauth$/ { print }
    # second pass: print line 28 (new version)
    FNR != NR && FNR == 28 {
        printf "page.sysauth = {%s", aSysOrg[1]
        for (i = 2; i <= SysOrg; i++) printf ", %s", aSysOrg[i]
        for (i = 1; i <= SysAdd; i++) printf ", %s", aSysAdd[i]
        print "}"
        # (other page.sysauth lines are not printed)
    }
' YourFile YourFile > YourFile.new
mv YourFile.new YourFile
This uses awk with two passes over the file (the doubled YourFile is not an error; it is required).

reading in a file in a loop and shell scripting

if [ ${FILESTATUS} = "GOOD" ] ; then
    mv ${file} /export/home/goodFile
else
    mv ${file} /export/home/badFile
fi
I want the above to be integrated into the script below. If both columns pass validation, then THAT file (.csv) should be moved to the good-file directory; otherwise it should be moved to the bad-file directory. Please help with integrating the logic/loop.
for file in /export/home/*.csv ; do
    awk -F', ' '
        # skip the header and blank lines
        NR == 1 || NF == 0 { next }
        # save the data
        { ++nr; for (i = 1; i <= NF; i++) data[nr, i] = $i }
        END {
            status = "OK"
            # verify column 1
            for (lineno = 1; lineno <= nr; lineno++) {
                if (length(data[lineno, 1]) == 0) {
                    status = "BAD"
                    break
                }
            }
            printf "file: %s, verify column 1, status: %s\n", FILENAME, status
            # verify column 2
            for (linenum = 1; linenum <= nr; linenum++) {
                if (length(data[linenum, 2]) == 0) {
                    STATUS = "BAD"
                    break
                }
                if (data[linenum, 2] !~ /^(0[1-9]|1[012])[- \/.](0[1-9]|[12][0-9]|3[01])[- \/.](19|20)[0-9][0-9]$/) {
                    STATUS = "BAD"
                    break
                }
            }
            # verify other columns ...
        }
    ' "$file"
done
I have this script that is supposed to read in about 10 or so .csv files from a directory. I want it to work so that if a file successfully passes the validation steps it goes to the goodFile directory, and otherwise it goes to the badFile directory. I am not sure where to include this looping mechanism.
Anywhere in the awk code you write STATUS = "BAD", replace that with exit 1
Then, in the for loop, test awk's exit status
for file in /export/home/*.csv ; do
    awk -F', ' '....' "$file"
    if [[ $? -eq 0 ]]; then
        # "good" status
        mv "$file" /export/home/goodFile
    else
        # "bad" status
        mv "$file" /export/home/badFile
    fi
done
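Putting the two changes together, the whole loop might look like this sketch (the column rules are the ones from the question; the per-column printf is dropped because the exit status now carries the result):
for file in /export/home/*.csv ; do
    if awk -F', ' '
        # skip the header and blank lines
        NR == 1 || NF == 0 { next }
        # save the data
        { ++nr; for (i = 1; i <= NF; i++) data[nr, i] = $i }
        END {
            # verify column 1: must be non-empty
            for (lineno = 1; lineno <= nr; lineno++)
                if (length(data[lineno, 1]) == 0) exit 1
            # verify column 2: must be non-empty and look like a MM-DD-YYYY date
            for (lineno = 1; lineno <= nr; lineno++) {
                if (length(data[lineno, 2]) == 0) exit 1
                if (data[lineno, 2] !~ /^(0[1-9]|1[012])[- \/.](0[1-9]|[12][0-9]|3[01])[- \/.](19|20)[0-9][0-9]$/)
                    exit 1
            }
        }
    ' "$file"
    then
        mv "$file" /export/home/goodFile
    else
        mv "$file" /export/home/badFile
    fi
done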

Shell script to validate a csv file column by column

I was wondering how I would go about writing this in shell. I want to validate the fields in a csv file column by column. For example, I only want to validate that column number one is a number:
Number,Letter
1,u
2,h
3,d
4,j
Pseudocode for the above:
Loop - for all files (loop1)
    loop from rows(2-n) (loop2)  # skipping first row since it's a header
        validate column 1
        validate column 2
        ...
    end loop2
    if (file passes validation)
        copy to goodFile directory
    else
        send to badFile directory
end loop1
What I have below is a row-by-row validation. What modifications would I need to make it work like the pseudocode above? I am terrible at unix and just started learning about awk.
#!/bin/sh
for file in /source/*.csv
do
    awk -F"," '{ # awk -F", " {'print$2'} to get the fields.
        $date_regex = '~(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d~';
        if (length($1) == "")
            break
        if (length($2) == "") && (length($2) > 30)
            break
        if (length($3) == "") && ($3 !~ /$date_regex/)
            break
        if (length($4) == "") && (($4 != "S") || ($4 != "E")
            break
        if (length($5) == "") && ((length($5) < 9 || (length($5) > 11)))
            break
    }' file
    #whatever you need with "$file"
done
I will combine two different ways to write a loop.
Lines starting with # are comments:
# Read all files. I hope no files have spaces in their names
for file in /source/*.csv ; do
    # init two variables before processing a new file
    FILESTATUS=GOOD
    FIRSTROW=true
    # process the file one line at a time, splitting each line on the
    # Internal Field Separator ,
    # (redirect into the loop rather than piping from cat: a pipeline would
    # run the loop in a subshell, and the FILESTATUS set there would be lost)
    while IFS=, read -r field1 field2; do
        # Skip first line, the header row
        if [ "${FIRSTROW}" = "true" ]; then
            FIRSTROW=false
            # skip processing of this line, continue with next record
            continue
        fi
        # Lots of different checks are possible here
        # (easy to google, e.g. "check field integer")
        if [[ "${field1}" = somestringprefix* ]]; then
            FILESTATUS=BAD
            # Stop inner loop
            break
        fi
        # somecheckonField2
    done < "${file}"
    if [ "${FILESTATUS}" = "GOOD" ] ; then
        mv "${file}" /source/good
    else
        mv "${file}" /source/bad
    fi
done
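For instance, the integer check the comments allude to could be a bash regex test on field1 (a sketch, using the Number,Letter layout from the question):
        if [[ ! "${field1}" =~ ^[0-9]+$ ]]; then
            FILESTATUS=BAD
            # Stop inner loop
            break
        fi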
Assuming no stray whitespace in the file, here's how I'd do it in bash.
# validate: first field is an integer
# validate: 2nd field is a lower-case letter
for file in *.csv; do
    good=true
    {
        read -r _header   # skip the header row
        while IFS=, read -ra fields; do
            if [[ ! (
                    ${fields[0]} =~ ^[+-]?[[:digit:]]+$
                    && ${fields[1]} == [a-z]
                ) ]]
            then
                good=false
                break
            fi
        done
    } < "$file"
    if $good; then
        : # handle good file
    else
        : # handle bad file
    fi
done
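The two placeholder branches would typically become the same move operations as in the earlier questions, e.g. (goodFile/ and badFile/ are assumed destination directories):
    if $good; then
        mv "$file" goodFile/   # assumed destination for valid files
    else
        mv "$file" badFile/    # assumed destination for invalid files
    fi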
