Parsing CSV Values in Bash


I am trying to write a Bash script that reads a particular CSV file and moves files based on the values of one of its columns. However, when I run it, every comparison evaluates as false and no files are moved, even though I know for sure that roughly one in five lines should evaluate as true.
The code is as follows:
#!/bin/bash
FILE=filename.csv
while IFS=, read -a csv_line;
do
    EMAIL="${csv_line[1]}" #identify the filename
    HASVAL="${csv_line[62]}" #should be either 1 or 0
    if [ -e "$EMAIL" ]
    then
        echo "detected"
        if [ "$HASVAL" = "1" ]
        then
            mv "$EMAIL" /home/targetdirectory
            echo "moved"
        fi
    fi
done < "$FILE"
I cannot see what is wrong with this script. It only prints "detected", never prints "moved", and does not move the files, so I suspect it is not matching the text correctly. Is it possible that I am reading the contents of the CSV wrong, and that not all values in a CSV are strings? Or am I doing something else wrong?
Thank you for any help you can give.
EDIT: replacing the offending if statement with
[ "$HASVAL" -eq 1 ]
gives me
detected
: integer expression expected
on every line so I'm not sure integer comparison will work either.
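EDIT: As discussed below, the problem has been solved. The .csv file had DOS (CRLF) line endings, and since I was reading the last column, the trailing '\r' stayed attached to its value, so it never matched "1". The last column had to be trimmed of the '\r' before it would match. (That stray carriage return is also why the -eq error above appears to start with a colon: the '\r' moves the cursor back to the start of the line and the rest of the message overwrites it.) Thanks to everybody for the assistance.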
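A minimal sketch of the fix described in the edit above, assuming a plain comma-separated file with no quoted commas. The function name `flagged_rows` and the sample data are hypothetical; column indexes are zero-based, like `csv_line` in the question. Each line has its trailing carriage return removed before it is split, so the comparison with "1" works no matter which column is last.

```bash
#!/bin/bash
# flagged_rows FILE NAME_COL FLAG_COL  (hypothetical helper)
# Prints column NAME_COL of every row whose column FLAG_COL is exactly "1".
# Assumes simple comma-separated fields (no quoted commas).
flagged_rows() {
  local file=$1 name_col=$2 flag_col=$3 line
  local -a fields
  # "|| [[ -n $line ]]" also processes a last line that has no trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line%$'\r'}                    # drop the CR left behind by DOS (CRLF) line endings
    IFS=, read -r -a fields <<< "$line"   # split the cleaned line on commas
    if [[ ${fields[flag_col]} == 1 ]]; then
      printf '%s\n' "${fields[name_col]}"
    fi
  done < "$file"
}

# Demo with made-up CRLF data: the flag column is 1 on two of the three rows.
demo=$(mktemp)
printf 'r1,a.txt,0\r\nr2,b.txt,1\r\nr3,c.txt,1\r\n' > "$demo"
flagged_rows "$demo" 1 2
rm -f "$demo"
```

In the question's own loop, the equivalent one-line change would be `HASVAL="${csv_line[62]%$'\r'}"`. Converting the file once beforehand, for example with `tr -d '\r' < filename.csv > unix.csv` or with `dos2unix`, also works.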

Edit: best practice in bash is to use double brackets [[ ]], with == inside them, for your if statement. Also, have you verified whether you are comparing string values, or should you be doing an integer comparison?

Related

Ubuntu bash script string contains similar words

I am trying to write a bash script that will tell whether two strings are of similar value. I have produced this bash script:
#!/bin/bash
value="java.lang.NullPointerException"
if [[ "java.lang.NullPointerException" = "$value" || "java.lang.NullPointerException" == "$value"* ]]; then
echo "Match"
fi
Basically, what I want to achieve is: if the two strings are equal, or if the value contains the same text with extra text on either side, then echo "Match".
I have tried a number of resources but can't get this example to work. I have taken a look at:
In bash, how can I check if a string begins with some value?
How to test that a variable starts with a string in bash?
https://ubuntuforums.org/showthread.php?t=1118003
Please note that these values will eventually come from a text file, so they will be held in variables. I have tried different approaches but can't get it working; I just want to get this if statement to work. It works when the strings match exactly, but not when there is extra text on either side. The value could be "java.lang.NullPointerException: Unexpected" or "Unexpected java.lang.NullPointerException".

#!/bin/bash
value="java.lang.NullPointerException" #or java.lang.NullPointerException: Unexpected
if [[ $value == *"java.lang.NullPointerException"* ]];
then
echo "Match"
fi

A simple and portable (POSIX compliant) technique for wildcard matching is to use a case statement rather than if. For your example, this would look something like
#!/bin/sh
value="java.lang.NullPointerException"
case "$value" in
*java.lang.NullPointerException*) echo Match;;
esac
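A small sketch that wraps the case technique in a reusable function, assuming (as the question says) that both strings arrive in variables. The helper name `contains` is hypothetical. Quoting "$2" inside the pattern makes the needle match literally, so characters such as * in text read from a file are not treated as wildcards; in a glob pattern the dots are already literal, unlike in a =~ regex. The function body uses only POSIX syntax.

```bash
#!/bin/bash
# contains HAYSTACK NEEDLE  (hypothetical helper name)
# Succeeds if NEEDLE occurs anywhere in HAYSTACK, with or without text on either side.
contains() {
  case "$1" in
    *"$2"*) return 0 ;;   # quoted "$2": matched literally, not as a glob
    *) return 1 ;;
  esac
}

needle="java.lang.NullPointerException"
for value in "java.lang.NullPointerException" \
             "java.lang.NullPointerException: Unexpected" \
             "Unexpected java.lang.NullPointerException" \
             "java.lang.IllegalStateException"; do
  if contains "$value" "$needle"; then
    echo "Match: $value"
  else
    echo "No match: $value"
  fi
done
```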

Unexpected end of file in while loop in bash

I am trying to write a bash script that will do the following:
Take a directory or file as input (will always begin with /mnt/user/)
Search other mount points for same file or directory (will always begin with /mnt/diskx)
Return value
So, for example, the input will be "/mnt/user/my_files/file.txt". It will check whether "/mnt/disk1/my_files/file.txt" exists, and will then look at each disk in turn (disk2, disk3, etc.) until it finds it or reaches disk20.
This is what I have so far:
#/user/bin/bash
var=$1
i=0
while [ -e $check_var = echo $var | sed 's:/mnt/user:/mnt/disk$i+1:']
do
final=$check_var
done
It's incomplete, yes, but I'm not very proficient in bash, so I'm doing a little at a time. I'm sure my command won't work properly yet either, but right now I am getting an "unexpected end of file" error and I can't figure out why.

There are many issues here:
If this is the actual code you're getting "unexpected end of file" on, you should save the file in Unix format, not DOS format.
The shebang should be #!/usr/bin/bash or #!/bin/bash depending on your system
You have to assign check_var before running [ .. ] on it.
You have to use $(..) to expand a command
Variables like $i are not expanded in single quotes
sed can't add numbers
i is never incremented
the loop logic is inverted: it should loop until it matches, not while it matches.
You'd want to assign final after (not in) the loop.
Consider doing it in even smaller pieces; it's easier to debug e.g. the single statement sed 's:/mnt/user:/mnt/disk$i+1:' than your entire while loop.
Here's a more canonical way of doing it:
#!/bin/bash
var="${1#/mnt/user/}"
for file in /mnt/disk{1..20}/"$var"
do
[[ -e "$file" ]] && final="$file" && break
done
if [[ $final ]]
then
echo "It exists at $final"
else
echo "It doesn't exist anywhere"
fi
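For comparison, here is a sketch of the while loop the question was aiming for, with the fixes from the list applied: the candidate path is built in double quotes so $i expands, arithmetic expansion does the counting that sed cannot, and the loop stops as soon as a match is found. The function name `find_on_disks` is hypothetical, and the optional ROOT argument (defaulting to /mnt) is an assumption added only so the function can be tried against a scratch directory.

```bash
#!/bin/bash
# find_on_disks PATH [ROOT]  (hypothetical helper)
# Prints the first ROOT/diskN/<rest of PATH> that exists, for N = 1..20.
find_on_disks() {
  local rel=${1#/mnt/user/} root=${2:-/mnt} i=1 candidate
  while (( i <= 20 )); do
    candidate="$root/disk$i/$rel"   # double quotes, so $i is expanded
    if [[ -e $candidate ]]; then
      printf '%s\n' "$candidate"
      return 0
    fi
    i=$((i + 1))                     # arithmetic expansion handles the increment
  done
  return 1                           # not found on any disk
}
```

Usage would be along the lines of `final=$(find_on_disks "$1") || echo "It doesn't exist anywhere"`.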

Bash/Shell | How to prioritize quote from IFS in read [duplicate]

This question already has answers here:
IFS separate a string like "Hello","World","this","is, a boring", "line"
I'm working with a hand-filled file and I'm having trouble parsing it.
My input file cannot be altered, and the code has to stay a bash script.
I made a simple example to make it easy for you ^^
var="hey","i'm","happy, like","you"
IFS="," read -r one two tree for five <<<"$var"
echo $one:$two:$tree:$for:$five
Now I think you already saw the problem here. I would like to get
hey:i'm:happy, like:you:
but I get
hey:i'm:happy: like:you
I need a way to tell read that the double quotes take priority over IFS. I have read about the eval command, but I can't take that risk.
For context, this is a directory export file, and the troublesome field is the description, so it could contain basically anything.
The original file looks like this:
"type","cn","uid","gid","gecos","description","timestamp","disabled"
"type","cn","uid","gid","gecos","description","timestamp","disabled"
"type","cn","uid","gid","gecos","description","timestamp","disabled"
Edit #1
I will give a better example; the one I used above is too simple, and @StefanHegny found that it causes another error.
while read -r ldapLine
do
    IFS=',' read -r objectClass dumy1 uidNumber gidNumber username description modifyTimestamp nsAccountLock gecos homeDirectory loginShell createTimestamp dumy2 <<<"$ldapLine"
    isANetuser=0
    while IFS=":" read -r -a class
    do
        for i in "${class[@]}"
        do
            if [ "$i" == "account" ]
            then
                isANetuser=1
                break
            fi
        done
    done <<< $objectClass
    if [ $isANetuser == 0 ]
    then
        continue
    fi
    #MORE STUFF APPEND#
done < file.csv
This is only a small part of the code, but it should show what I'm doing. file.csv contains many lines like this:
"top:shadowAccount:account:posixAccount","Jdupon","12345","6789","Jdupon","Jean Mark, Dupon","20140511083750Z","","Jean Mark, Dupon","/home/user/Jdupon","/bin/ksh","20120512083750Z","",""

If the bash versions you will use are all recent enough, you could use something like the following function: [Note 1] Regexes and BASH_REMATCH were introduced in v3.0, but the ${BASH_REMATCH[1]:1:-1} substring with a negative length needs bash 4.2 or later.
each_field () {
local v=,$1;
while [[ $v =~ ^,(([^\",]*)|\"[^\"]*\") ]]; do
printf "%s\n" "${BASH_REMATCH[2]:-${BASH_REMATCH[1]:1:-1}}";
v=${v:${#BASH_REMATCH[0]}};
done
}
Its argument is a single line (remember to quote it!), and it prints each comma-separated field on a separate line. As written, it assumes that no field contains an embedded newline; that's legal in CSV, but it makes dividing the file into lines a lot more complicated. If you actually needed to handle that case, you could change the \n in the printf statement to \0 and then use something like xargs -0 to process the output. (Or you could put whatever processing you need to do on each field in place of the printf statement.)
It goes to some trouble to dequote quoted fields without modifying unquoted fields. However, it will fail on fields with embedded double quotes. That's fixable, if necessary. [Note 2]
Here's a sample, in case that wasn't obvious:
while IFS= read -r line; do
each_field "$line"
printf "%s\n" "-----"
done <<EOF
type,cn,uid,gid,gecos,"description",timestamp,disabled
"top:shadowAccount:account:posixAccount","Jdupon","12345","6789","Jdupon","Jean Mark, Dupon","20140511083750Z","","Jean Mark, Dupon","/home/user/Jdupon","/bin/ksh","20120512083750Z","",""
EOF
Output:
type
cn
uid
gid
gecos
description
timestamp
disabled
-----
top:shadowAccount:account:posixAccount
Jdupon
12345
6789
Jdupon
Jean Mark, Dupon
20140511083750Z

Jean Mark, Dupon
/home/user/Jdupon
/bin/ksh
20120512083750Z


-----
Notes:
Note 1: I'm not saying you should use this function. You should use a CSV parser, or a language that includes a good CSV parsing library, such as Python. But I believe this bash function will work, albeit slowly, on correctly formatted CSV files of a certain common CSV dialect.
Note 2: Here's a version that handles doubled quotes inside quoted fields, which is the classic CSV syntax for embedded quotes:
each_field () {
local v=,$1;
while [[ $v =~ ^,(([^\",]*)|\"(([^\"]|\"\")*)\") ]]; do
printf "%s\n" "${BASH_REMATCH[2]:-${BASH_REMATCH[3]//\"\"/\"}}";
v=${v:${#BASH_REMATCH[0]}};
done
}
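To get the fields into named variables, as the question's read line does, the function's output can be collected into an array. This sketch assumes bash 4 or later (for mapfile); `split_csv_line` and the global `FIELDS` array are hypothetical names. It reuses the doubled-quote version of each_field, printing with printf and building the "" pattern from a variable so the quoting is easier to follow.

```bash
#!/bin/bash
# Doubled-quote version of each_field from the answer above.
each_field () {
  local v=,$1 dq='"'
  while [[ $v =~ ^,(([^\",]*)|\"(([^\"]|\"\")*)\") ]]; do
    # Group 2 holds an unquoted field; group 3 the inside of a quoted one,
    # where each doubled quote "" is turned back into a single ".
    printf '%s\n' "${BASH_REMATCH[2]:-${BASH_REMATCH[3]//$dq$dq/$dq}}"
    v=${v:${#BASH_REMATCH[0]}}   # drop the separator and field just matched
  done
}

# split_csv_line LINE  (hypothetical wrapper)
# Stores the fields of LINE, one array element per field, in the global array FIELDS.
split_csv_line() {
  mapfile -t FIELDS < <(each_field "$1")
}

# Example with the record from the question: index 5 is the description.
line='"top:shadowAccount:account:posixAccount","Jdupon","12345","6789","Jdupon","Jean Mark, Dupon","20140511083750Z","","Jean Mark, Dupon","/home/user/Jdupon","/bin/ksh","20120512083750Z","",""'
split_csv_line "$line"
objectClass=${FIELDS[0]}
description=${FIELDS[5]}
printf '%s | %s\n' "$objectClass" "$description"
```

Empty fields such as "" become empty array elements, so the indexes stay aligned with the columns.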

My suggestion, as in some previous answers (see below), is to switch the separator to | (and then use IFS="|"):
sed -r 's/,([^,"]*|"[^"]*")/|\1/g'
This requires a sed that supports extended regular expressions (-r in GNU sed; both GNU and BSD sed also accept -E).
Should I use AWK or SED to remove commas between quotation marks from a CSV file? (BASH)
Is it possible to write a regular expression that matches a particular pattern and then does a replace with a part of the pattern
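A sketch of how the sed approach above could feed read, applied to the question's simple example. The function names `to_pipes` and `unquote` are hypothetical, and the sed expression is the one from the answer written with -E. It assumes the first field contains no comma (each match starts at a separator comma, so a comma inside the first field would be rewritten) and that no field contains a |. The surrounding quotes survive the conversion, so they are stripped afterwards.

```bash
#!/bin/bash
# to_pipes: rewrite the field separators of CSV records on stdin from "," to "|".
# Sketch only: assumes the first field has no comma and no field contains "|".
to_pipes() {
  # Each match is a separator comma plus the whole following field, quoted or not,
  # so commas inside quoted fields after the first one are never touched.
  sed -E 's/,([^,"]*|"[^"]*")/|\1/g'
}

# unquote FIELD: print FIELD without one pair of surrounding double quotes, if present.
unquote() {
  local f=${1#\"}
  printf '%s\n' "${f%\"}"
}

line="\"hey\",\"i'm\",\"happy, like\",\"you\""
IFS='|' read -r one two three four <<< "$(printf '%s\n' "$line" | to_pipes)"
printf '%s:%s:%s:%s\n' "$(unquote "$one")" "$(unquote "$two")" "$(unquote "$three")" "$(unquote "$four")"
```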

Use first 3 characters of a filename as a variable in shell script

This is my first post, so hopefully my question is clear.
I am new to shell scripts, and my task with this one is to add a new value to every line of a CSV file. The value to add is based on the first 3 characters of the filename.
A bit of background: the CSV files I receive are eventually loaded into partitioned Oracle tables. The start of the file name (e.g. BATTESTFILE.txt) contains the partition site, so I need to write a script that takes the first 3 characters of the filename (in this example BAT) and adds them to the end of each line of the file.
The closest I have got so far is when I stripped the code to the bare basics of what I need to do:
build_files()
{
OLDFILE=${filename[@]}.txt
NEWFILE=${filename[@]}.NEW.txt
ABSOLUTE='path/scripts/'
FULLOLD=$ABSOLUTE$OLDFILE
FULLNEW=$ABSOLUTE$NEWFILE
sed -e s/$/",${j}"/ "${FULLOLD}" > "${FULLNEW}"
}
set -A site 'BAT'
set -A filename 'BATTESTFILE'
for j in ${site[@]}; do
for i in ${filename[@]}; do
build_files ${j}
done
done
Here I have set up an array site, as there will be 6 'sites' and this makes it easy to add additional sites to the code as the files come through to me. The same goes for the filename array.
This code works, but it isn't as automated as I need. One of my most recent attempts is below:
build_files()
{
OLDFILE=${filename[@]}.txt
NEWFILE=${filename[@]}.NEW.txt
ABSOLUTE='/app/dss/dsssis/sis/scripts/'
FULLOLD=$ABSOLUTE$OLDFILE
FULLNEW=$ABSOLUTE$NEWFILE
sed -e s/$/",${j}"/ "${FULLOLD}" > "${FULLNEW}"
}
set -A site 'BAT'
set -A filename 'BATTESTFILE'
for j in ${site[@]}; do
for i in ${filename[@]}; do
trust=echo "$filename" | cut -c1-3
echo "$trust"
if ["$trust" = 'BAT']; then
${j} = 'BAT'
fi
build_files ${j}
done
done
I found the code trust=echo "$filename" | cut -c1-3 in another question on Stack Overflow while researching, but it doesn't seem to work for me. I added the echo to check what trust was holding, but it was empty.
I am getting 2 errors back:
Line 17 - BATTESTFILE: not found
Line 19 - test: ] missing
Sorry for the long-winded question. Hopefully it contains helpful info and shows the steps I have taken. Any questions, comment away. Any help or guidance is very much appreciated. Thanks.

When you are new to shell scripting, try to avoid arrays.
In an if statement, put spaces before and after the [ and ] characters.
Get used to surrounding your shell variables with {}, like ${trust}.
I don't know how you fill your array; if it is hardcoded, try replacing it with
SITE=file1
SITE="${SITE} file2"
And you must tell the shell that you want the right-hand side evaluated as a command, using $(..) (better than backticks):
trust=$(echo "${filename}" | cut -c1-3)
Some guidelines and syntax help can be found at Google

Just use shell parameter expansion:
$ var=abcdefg
$ echo "${var:0:3}"
abc
This assumes a reasonably capable shell such as bash or ksh93; plain POSIX sh (and the older ksh88) does not support ${var:offset:length}.
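Putting that parameter expansion together with the sed command from the question, a bash sketch might look like this. The function name `add_site_column` is hypothetical, and it assumes that the site code is the first three characters of the file's base name, that a comma is the right separator to append (the asker's final version below uses a semicolon), and that the code contains no & or \ (which sed would treat specially in the replacement).

```bash
#!/bin/bash
# add_site_column FILE  (hypothetical helper)
# Appends ",XYZ" to every line of FILE, where XYZ is the first three characters
# of FILE's base name, and writes the result to the same name ending in .NEW.txt.
add_site_column() {
  local file=$1 base site
  base=${file##*/}      # strip any directory part
  site=${base:0:3}      # first three characters, no echo | cut needed
  sed "s/\$/,${site}/" "$file" > "${file%.txt}.NEW.txt"
}
```

For example, `add_site_column /path/to/BATTESTFILE.txt` would write BATTESTFILE.NEW.txt with ",BAT" on the end of each line.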

Just in case it is useful for anyone else, now or in the future, I got my code to work as desired using the code below. Thanks to Walter A for his answer to my main problem of getting the first 3 characters from the filename and using them as a variable.
This gave me the desired output of taking the first 3 characters of the filename, and adding them to the end of each line in my csv file.
## Get the current Directory and file name, create a new file name
build_files()
{
OLDFILE=${i}.txt
NEWFILE=${i}.NEW.txt
ABSOLUTE='/app/dss/dsssis/sis/scripts/'
FULLOLD=$ABSOLUTE$OLDFILE
FULLNEW=$ABSOLUTE$NEWFILE
## Take the 3 characters from the filename and
## add them onto the end of each line in the csv file.
sed -e s/$/";${j}"/ "${FULLOLD}" > "${FULLNEW}"
}
## Loop to take the first 3 characters from the file names held in
## an array to be added into the new file above
set -A filename 'BATTESTFILE'
for i in ${filename[@]}; do
trust=$(echo "${i}" | cut -c1-3)
echo "${trust}"
j="${trust}"
echo "${i} ${j}"
build_files ${i} ${j}
done
Hope this is useful for someone else.

String contains in Bash that is a directory path

I am writing an SVN script that will export only changed files. In doing so, I only want to export files whose names don't contain a specific string.
So, to start out I am modifying the script found here.
I found a way to check if a string contains using the functionality found here.
Now, when I try to run the following:
filename=`echo "$line" |sed "s|$repository||g"`
if [ ! -d $target_directory$filename ] && [[!"$filename" =~ *myfile* ]] ; then
fi
However I keep getting errors stating:
/home/home/myfile: "no such file or directory"
It appears that BASH is treating $filename as a literal. How do I get it so that it reads it as a string and not a path?
Thanks for your help!

You have some syntax issues (a shell script linter can weed those out):
You need a space after "[[", otherwise it'll be interpreted as a command (giving an error similar to the one you posted).
You need a space after the "!", otherwise it'll be considered part of the operand.
You also need something in the then clause, but since you managed to run it, I'll assume you just left it out.
You combined two different answers from the substring question you linked: [[ $foo == *bar* ]] and [[ $foo =~ .*bar.* ]]. The first uses a glob, the second uses a regex. Just use [[ ! $filename == *myfile* ]]
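A sketch of the corrected test, wrapped in a hypothetical `should_export` function so it can be tried in isolation. It also quotes the path in the -d test, which the original left unquoted; [[ ! $filename =~ myfile ]] would be the regex equivalent of the glob test.

```bash
#!/bin/bash
# should_export TARGET_DIR FILENAME  (hypothetical helper)
# Succeeds when TARGET_DIR/FILENAME is not an existing directory and
# FILENAME does not contain "myfile" anywhere (glob match, not regex).
should_export() {
  local target_directory=$1 filename=$2
  if [[ ! -d "$target_directory$filename" ]] && [[ ! $filename == *myfile* ]]; then
    return 0
  fi
  return 1
}
```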
