testing last column header of text file for equality - bash

I am trying to check if columns of a user file called testfile.txt are correctly named (they should be named var1, var2, var3). The file looks like
var1 var2 var3
5 6 7
I was thinking the following should work:
read i j k < testfile.txt
echo "${k}"
if [[ "${i}" != "var1" || "${j}" != "var2" || "${k}" != "var3" ]]; then
echo "input incorrect"
else
echo "input correct"
fi
but this returns
var3
input incorrect
So although the last column seems to be correctly named, the test fails. If I only test for the names of the first two columns, it works, but the test for the last column is always deemed false somehow.
How can I correct the script so that it can also test correctly for the value of the last column header?

The problem is that testfile.txt evidently has Windows-style CRLF line endings: read keeps the trailing carriage return as part of the last field, so $k is actually "var3" followed by a CR, which prints like "var3" but is not equal to it.
If you just want to strip the CRs from the header line:
read i j k <<< "$( sed '1 s/\r//g; 2q' testfile.txt )"
If you want to clean the whole file:
tr -d '\r' < testfile.txt > x && mv x testfile.txt

One in GNU awk (or any awk that supports multichar RS like mawk or Busybox awk):
$ awk 'BEGIN {
    RS="\r?\n"                        # regard \r
    headers="var1, var2, var3"        # header names
    n=split(headers,h,/, */)          # split them into an array
}
NR==1 {                               # only process the first record
    for(i=1;i<=NF;i++)                # and every field of it
        if($i!=h[i] || n!=NF) {       # if a header differs or the count is wrong
            print "input incorrect"   # complain
            exit                      # and leave
        }
    exit                              # leave without complaining
}' testfile
Output could be:
input incorrect
or not.
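If you also want an explicit confirmation when the headers match, a small variation of the same idea (a sketch, not from the original answer):
awk 'BEGIN {
    RS="\r?\n"
    n=split("var1, var2, var3",h,/, */)
}
NR==1 {
    ok=(NF==n)                        # the field count must match
    for(i=1;i<=NF && ok;i++)          # stop early on the first mismatch
        if($i!=h[i]) ok=0
    print (ok ? "input correct" : "input incorrect")
    exit
}' testfile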

Related

Bash/Shell -- Inserting line breaks between dates

I've got data that comes in a file with multiple dates/times, etc.
example:
12/15/19,23:30,80.2
12/15/19,23:45,80.6
12/16/19,00:00,80.5
12/16/19,00:15,80.2
I would like some command that automatically goes through the whole file and, any time the date changes, inserts 2 blank lines so that I can see more clearly where the date changes.
Example of what I'm looking for the file to look like after said command:
12/15/19,23:30,80.2
12/15/19,23:45,80.6


12/16/19,00:00,80.5
12/16/19,00:15,80.2
What is the best way to do this through bash/shell command line commands?
Using awk:
awk -F',' 'NR>1 && prev!=$1{ print ORS }
           { prev=$1; print }' file
Use , as the field separator.
If this is not the first line and prev differs from field 1, print two newlines: print emits its argument ORS (one newline) followed by the output record separator ORS itself (another one), which yields the two blank lines.
For each line, save the value of field 1 in the variable prev and print the line.
Since you're detecting patterns over multiple lines, you'll want to use bash builtins instead of programs like grep or sed.
# initialize variable
last_date=''
# loop over file lines (IFS='' preserves whitespace, -r preserves backslashes)
while IFS='' read -r line; do
    # extract date (up to first comma)
    this_date="${line%%,*}"
    # print two blank lines unless the dates are equal (or this is the first line)
    [[ -z "$last_date" || "$this_date" = "$last_date" ]] || printf '\n\n'
    # remember date for next line
    last_date="$this_date"
    # print the line itself
    printf '%s\n' "$line"
# feed loop with file
done < my_file.txt
Here's the shorter copy/paste version:
b='';while IFS='' read -r l;do a="${l%%,*}";[[ -z "$b" || "$a" = "$b" ]]||printf '\n\n';b="$a";printf '%s\n' "$l";done < my_file.txt
And you can also make it a function:
function add_spaces {
    # initialize variable
    last_date=''
    # loop over file lines (IFS='' preserves whitespace, -r preserves backslashes)
    while IFS='' read -r line; do
        # extract date (up to first comma)
        this_date="${line%%,*}"
        # print two blank lines unless the dates are equal (or this is the first line)
        [[ -z "$last_date" || "$this_date" = "$last_date" ]] || printf '\n\n'
        # remember date for next line
        last_date="$this_date"
        # print the line itself
        printf '%s\n' "$line"
    # feed loop with file
    done < "$1" # $1 is the first argument to the function
}
So that you can call it whenever you want:
add_spaces my_file.txt

How to Print output of shell script in different columns of csv file

I have written a shell script and want to print the output of 5 defined variables to a csv file. I am using a condition: if condition 1 succeeds, the output should go in the first 5 columns; if condition 2 succeeds, it should go in the next 5 columns, like below:
if condition 1 succeeds
$DAY,$ModName,$Version,$END_TIME,$START_TIME
(should print in columns 1..5 of the csv)
if condition 2 succeeds
$DAY,$ModName,$Version,$END_TIME,$START_TIME
(should print in columns 6..10 of the csv)
But with my code the output always lands at the start of the next row.
Below is my code:
if [ "$Version" = linux ]
then
echo "$DAY","$ModName","$Version","$END_TIME","$START_TIME" | awk -F "\"*,\"*" '{print $a ","}' >> output.csv;
else
echo "$DAY","$ModName","$Version","$END_TIME","$START_TIME" | awk -F "\"*,\"*" '{print $b}' >> output.csv;
fi
I tried n number of things apart from this code, but was not able to find the solution.
I would appreciate your help :)
{print $6, $7, $8, $9, $10} refers to input fields, not output.
When you want to start with 5 empty fields, just printf the commas (avoiding a \n):
if [ "${Version}" != "linux" ]; then
    printf "%s" ",,,,,"
fi
echo "${DAY},${ModName},${Version},${END_TIME},${START_TIME}"
(Next time please use lowercase variable names)
When a variable can contain a ',', you might need to wrap the values in double quotes.
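Putting the pieces together, a minimal sketch of the full write (with every field double-quoted, as suggested above, in case a value contains a comma):
{
    if [ "${Version}" != "linux" ]; then
        printf '%s' ',,,,,'    # five commas: shift the record to columns 6..10
    fi
    # quote each field so embedded commas don't break the csv
    printf '"%s","%s","%s","%s","%s"\n' \
        "${DAY}" "${ModName}" "${Version}" "${END_TIME}" "${START_TIME}"
} >> output.csv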

Bash shell test if all characters in one string are in another string

I have two strings which I want to compare character by character; every char of the test string must occur in mychars, but mychars can have extra chars.
mychars="abcdefg"
testone="abcdefgh" # false h is not in mychars
testtwo="abcddabc" # true all char in testtwo are in mychars
function test() {
    if each char in $1 is in $2 # PSEUDO CODE
    then
        return 1
    else
        return 0
    fi
}
if test $testone $mychars; then
    echo "All in the string"
else
    echo "Not all in the string"
fi
# should echo "Not all in the string" because the h is not in the string mychars
if test $testtwo $mychars; then
    echo "All in the string"
else
    echo "Not all in the string"
fi
# should echo 'All in the string'
What is the best way to do this? My guess is to loop over all the chars in the first parameter.
You can use tr to replace any char from mychars with a symbol, and then test whether the resulting string is anything other than that symbol, e.g.:
tr -s "[$mychars]" "." <<< "ggaaabbbcdefg"
Outputs:
.
But:
tr -s "[$mychars]" "." <<< "xxxggaaabbbcdefgxxx"
Prints:
xxx.xxx
So, your function could be like the following:
function test() {
    local dictionary="$1"
    local res=$(tr -s "[$dictionary]" "." <<< "$2")
    if [ "$res" == "." ]; then
        return 1
    else
        return 0
    fi
}
Update: As suggested by @mklement0, the whole function can be shortened (and the logic fixed to follow the shell convention that 0 means success):
function test() {
    local dictionary="$1"
    [[ '.' == $(tr -s "[$dictionary]" "." <<< "$2") ]]
}
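For clarity, a quick usage sketch of the shortened version (note the argument order: the dictionary comes first; also note that naming the function test shadows the test builtin, which the next answer addresses):
mychars="abcdefg"
if test "$mychars" "abcddabc"; then
    echo "All in the string"
else
    echo "Not all in the string"
fi
# prints: All in the string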
The accepted answer's solution is short, clever, and efficient.
Here's a less efficient alternative, which may be of interest if you want to know which characters are unique to the 1st string, returned as a sorted, distinct list:
charTest() {
    local charsUniqueToStr1
    # Determine which chars. in $1 aren't in $2.
    # This returns a sorted, distinct list of chars., each on its own line.
    charsUniqueToStr1=$(comm -23 \
        <(sed 's/\(.\)/\1\'$'\n''/g' <<<"$1" | sort -u) \
        <(sed 's/\(.\)/\1\'$'\n''/g' <<<"$2" | sort -u))
    # The test succeeds if there are no chars. in $1 that aren't also in $2.
    [[ -z $charsUniqueToStr1 ]]
}
mychars="abcdefg" # define reference string
charTest "abcdefgh" "$mychars"
echo $? # print exit code: 1 - 'h' is not in reference string
charTest "abcddabc" "$mychars"
echo $? # print exit code: 0 - all chars. are in reference string
Note that I've renamed test() to charTest() to avoid a name collision with the test builtin/utility.
sed 's/\(.\)/\1\'$'\n''/g' splits the input into individual characters by placing each on a separate line.
Note that the command creates an extra empty line at the end, but that doesn't matter in this case; to eliminate it, append ; ${s/\n$//;} to the sed script.
The command is written in a POSIX-compliant manner, which complicates it, due to having to splice in an \-escaped actual newline (via an ANSI C-quoted string, $'\n'); if you have GNU sed, you can simplify it to sed -r 's/(.)/\1\n/g'.
sort -u then sorts the resulting list of characters and weeds out duplicates (-u).
comm -23 compares the distinct set of sorted characters in both strings and prints those unique to the 1st string (comm uses a 3-column layout, with the 1st column containing lines unique to the 1st file, the 2nd column containing lines unique to the 2nd file, and the 3rd column containing lines the two input files have in common; -23 suppresses the 2nd and 3rd columns, effectively only printing the lines that are unique to the 1st input).
[[ -z $charsUniqueToStr1 ]] then tests if $charsUniqueToStr1 is empty (-z);
in other words: success (exit code 0) is indicated, if the 1st string contains no chars. that aren't also contained in the 2nd string; otherwise, failure (exit code 1); by virtue of the conditional ([[ .. ]]) being the last statement in the function, its exit code also becomes the function's exit code.
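To make the pipeline less abstract, here is what the first stage should produce for the failing example (a sketch; the leading blank line is the extra empty line mentioned above):
$ sed 's/\(.\)/\1\'$'\n''/g' <<<"abcdefgh" | sort -u

a
b
c
d
e
f
g
h
comm -23 then diffs this against the same listing for "abcdefg", leaving only the h, so charTest reports failure.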

Shell script to validate a csv file column by column

I was wondering how I would go about writing this in shell. I want to validate the fields in a csv file column by column. For example, I only want to validate that column number one is a number:
Number,Letter
1,u
2,h
3,d
4,j
For the file above, the pseudocode would be:
loop over all files (loop1)
    loop over rows 2..n (loop2) # skipping the first row since it's a header
        validate column 1
        validate column 2
        ...
    end loop2
    if file passes validation
        copy to goodFile directory
    else
        send to badFile directory
end loop1
What I have below is a row-by-row validation. What modification would I need to make it work like the pseudocode above? I am terrible at unix and have just started learning about awk.
#!/bin/sh
for file in /source/*.csv
do
    awk -F"," '{ # awk -F", " {'print$2'} to get the fields.
        $date_regex = '~(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d~';
        if (length($1) == "")
            break
        if (length($2) == "") && (length($2) > 30)
            break
        if (length($3) == "") && ($3 !~ /$date_regex/)
            break
        if (length($4) == "") && (($4 != "S") || ($4 != "E")
            break
        if (length($5) == "") && ((length($5) < 9 || (length($5) > 11)))
            break
    }' file
    #whatever you need with "$file"
done
I will combine two kinds of loop: an outer for over the files and an inner while read over each file's lines.
Lines starting with # are comments:
# Read all files. I hope no file has spaces in its name
for file in /source/*.csv ; do
    # init two variables before processing a new file
    FILESTATUS=GOOD
    FIRSTROW=true
    # process the file 1 line at a time, splitting each line on the
    # Internal Field Separator ,
    # (redirect into the loop rather than piping from cat: a pipeline
    # would run the loop in a subshell and lose the FILESTATUS changes)
    while IFS=, read -r field1 field2; do
        # Skip first line, the header row
        if [ "${FIRSTROW}" = "true" ]; then
            FIRSTROW=false
            # skip processing of this line, continue with next record
            continue
        fi
        # Lots of different checks are possible here
        # (easy to google, e.g. "shell check field is integer")
        if [[ "${field1}" = somestringprefix* ]]; then
            FILESTATUS=BAD
            # Stop inner loop
            break
        fi
        somecheckonField2
    done < "${file}"
    if [ "${FILESTATUS}" = "GOOD" ] ; then
        mv "${file}" /source/good
    else
        mv "${file}" /source/bad
    fi
done
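As a concrete example of one of those checks, field1 from the sample file could be validated as an integer with a case pattern (a sketch to drop in where the placeholder checks sit):
# field1 must be a non-empty string of digits
case "${field1}" in
    ''|*[!0-9]*) FILESTATUS=BAD; break ;;
esac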
Assuming no stray whitespace in the file, here's how I'd do it in bash.
# validate: first field is an integer
# validate: 2nd field is a lower-case letter
for file in *.csv; do
    good=true
    {
        read -r header    # skip the header row; it would fail the integer check
        while IFS=, read -ra fields; do
            if [[ ! (
                    ${fields[0]} =~ ^[+-]?[[:digit:]]+$
                    && ${fields[1]} == [a-z]
                ) ]]
            then
                good=false
                break
            fi
        done
    } < "$file"
    if $good; then
        : # handle good file
    else
        : # handle bad file
    fi
done
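To match the good/bad directories in the pseudocode, the placeholder branches could simply move the file (a sketch; it assumes the target directories already exist):
if $good; then
    mv -- "$file" /source/good/
else
    mv -- "$file" /source/bad/
fi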

Loop through a file with colon-separated strings

I have a file that looks like this:
work:week:day:england:
work1:week:day:sweden:
work2:week:day::
.....
Each time I loop through the list I want to get each string as a variable which I can use.
E.g. to know which location a user works in, I would look at the fourth (location) column of the row whose first column matches "work*".
I tried this:
for country in $( awk -F '[:]' '{print $1}' file.txt); do
    if [[ "$country" == "england" ]];
    then
        echo "This user works in England!"
    else
        echo "You do not work in England!"
    fi
done
I would like to get each colon-separated string as a variable for each row on each pass through the loop.
You can use just bash for this: set IFS (the internal field separator) to : and read will split the fields properly:
while IFS=":" read -r a b c country
do
echo "$country"
done < "file"
This returns:
england
sweden
This way you will be able to use $a for the first field, $b for the second, $c for the third and $country for the fourth (the trailing _ soaks up the empty field created by the final colon; without it, country would receive "england:"). Of course, adapt the number and names to your requirements.
All together:
while IFS=":" read a b c country
do
if [[ "$country" == "england" ]]; then
echo "this user works in England"
else
echo "You do not work in England"
fi
done < "file"
Just do the whole thing in awk:
awk -F: '$4=="england"{print "this user works in England";next}
         {print "You do not work in England"}' file
Set the field separator to a colon. If the fourth field is "england", print the first message. next skips to the next line. Otherwise, print the second message.
The fields on each line are accessible as $1, $2, etc., so you can use the data in each field within awk to do whatever you want. The file is read line by line automatically, so there's no need to write your own while read loop.
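Run against the sample file, the awk version should print one line per input row (a sketch of the expected output; the third row's empty fourth field also takes the "not England" branch):
$ awk -F: '$4=="england"{print "this user works in England";next}
         {print "You do not work in England"}' file
this user works in England
You do not work in England
You do not work in England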
