Bash/Shell | How to prioritize quote from IFS in read [duplicate] - bash

This question already has answers here:
IFS separate a string like "Hello","World","this","is, a boring", "line"
(3 answers)
Closed 6 years ago.
I'm working with a hand-filled file and I'm having trouble parsing it.
My input file cannot be altered, and the code has to stay a bash script.
I made a simple example to make it easy for you ^^
var="hey","i'm","happy, like","you"
IFS="," read -r one two tree for five <<<"$var"
echo $one:$two:$tree:$for:$five
Now I think you already saw the problem here. I would like to get
hey:i'm:happy, like:you:
but I get
hey:i'm:happy: like:you
I need a way to tell read that the quotes take precedence over the IFS. I have read about the eval command, but I can't take that risk.
Finally, this is a directory file, and the troublesome field is the description one, so it could contain basically anything.
original file looking like that
"type","cn","uid","gid","gecos","description","timestamp","disabled"
"type","cn","uid","gid","gecos","description","timestamp","disabled"
"type","cn","uid","gid","gecos","description","timestamp","disabled"
Edit #1
I will give a better example; the one I used above is too simple, and @StefanHegny found it caused another error.
while read -r ldapLine
do
IFS=',' read -r objectClass dumy1 uidNumber gidNumber username description modifyTimestamp nsAccountLock gecos homeDirectory loginShell createTimestamp dumy2 <<<"$ldapLine"
isANetuser=0
while IFS=":" read -r -a class
do
for i in "${class[@]}"
do
if [ "$i" == "account" ]
then
isANetuser=1
break
fi
done
done <<< $objectClass
if [ $isANetuser == 0 ]
then
continue
fi
#MORE STUFF APPEND#
done < file.csv
So this is a small part of the code but it should explain what I do. The file.csv is a lot of lines like this:
"top:shadowAccount:account:posixAccount","Jdupon","12345","6789","Jdupon","Jean Mark, Dupon","20140511083750Z","","Jean Mark, Dupon","/home/user/Jdupon","/bin/ksh","20120512083750Z","",""

If the various bash versions you will use are all more recent than v3.0, when regexes and BASH_REMATCH were introduced, you could use something like the following function: [Note 1]
each_field () {
local v=,$1;
while [[ $v =~ ^,(([^\",]*)|\"[^\"]*\") ]]; do
printf "%s\n" "${BASH_REMATCH[2]:-${BASH_REMATCH[1]:1:-1}}";
v=${v:${#BASH_REMATCH[0]}};
done
}
Its argument is a single line (remember to quote it!) and it prints each comma-separated field on a separate line. As written, it assumes that no field has an embedded newline; that's legal in CSV, but it makes dividing the file into lines a lot more complicated. If you actually needed to deal with that scenario, you could change the \n in the printf statement to a \0 and then use something like xargs -0 to process the output. (Or you could insert whatever processing you need to do to the field in place of the printf statement.)
It goes to some trouble to dequote quoted fields without modifying unquoted fields. However, it will fail on fields with embedded double quotes. That's fixable, if necessary. [Note 2]
Here's a sample, in case that wasn't obvious:
while IFS= read -r line; do
each_field "$line"
printf "%s\n" "-----"
done <<EOF
type,cn,uid,gid,gecos,"description",timestamp,disabled
"top:shadowAccount:account:posixAccount","Jdupon","12345","6789","Jdupon","Jean Mark, Dupon","20140511083750Z","","Jean Mark, Dupon","/home/user/Jdupon","/bin/ksh","20120512083750Z","",""
EOF
Output:
type
cn
uid
gid
gecos
description
timestamp
disabled
-----
top:shadowAccount:account:posixAccount
Jdupon
12345
6789
Jdupon
Jean Mark, Dupon
20140511083750Z
Jean Mark, Dupon
/home/user/Jdupon
/bin/ksh
20120512083750Z
-----
Notes:
I'm not saying you should use this function. You should use a CSV parser, or a language which includes a good CSV parsing library, like Python. But I believe this bash function will work, albeit slowly, on correctly formatted CSV files of a certain common CSV dialect.
Here's a version which handles doubled quotes inside quoted fields, which is the classic CSV syntax for interior quotes:
each_field () {
local v=,$1;
while [[ $v =~ ^,(([^\",]*)|\"(([^\"]|\"\")*)\") ]]; do
echo "${BASH_REMATCH[2]:-${BASH_REMATCH[3]//\"\"/\"}}";
v=${v:${#BASH_REMATCH[0]}};
done
}
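Called on a field with interior quotes, for example (the function is repeated here so the snippet runs standalone):

```shell
# Repeat of the doubled-quote-aware function above, so this demo is self-contained
each_field () {
    local v=,$1;
    while [[ $v =~ ^,(([^\",]*)|\"(([^\"]|\"\")*)\") ]]; do
        echo "${BASH_REMATCH[2]:-${BASH_REMATCH[3]//\"\"/\"}}";
        v=${v:${#BASH_REMATCH[0]}};
    done
}

# A quoted field with doubled interior quotes, plus an unquoted field
each_field '"Jean ""JM"" Dupon",12345'
# prints:
#   Jean "JM" Dupon
#   12345
```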

My suggestion, as in some previous answers (see below), is to switch the separator to | (and use IFS="|" instead):
sed -r 's/,([^,"]*|"[^"]*")/|\1/g'
This requires a sed with support for extended regular expressions (-r), however.
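A minimal sketch of the round trip, using -E (the portable spelling of GNU sed's -r) and the question's sample line:

```shell
line='"hey","i'\''m","happy, like","you"'

# Replace only the field-separating commas with |, leaving quoted commas alone
converted=$(printf '%s\n' "$line" | sed -E 's/,([^,"]*|"[^"]*")/|\1/g')

# Now read can split on | without breaking the "happy, like" field
IFS='|' read -r one two three four <<<"$converted"
printf '%s\n' "$three"    # "happy, like"  (quotes still attached)
```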
Should I use AWK or SED to remove commas between quotation marks from a CSV file? (BASH)
Is it possible to write a regular expression that matches a particular pattern and then does a replace with a part of the pattern

Related

How do I ignore a byte order marker from a while read loop in zsh

I need to verify that all images mentioned in a csv are present inside a folder. I wrote a small shell script for that
#!/bin/zsh
red='\033[0;31m'
color_Off='\033[0m'
csvfile=$1
imgpath=$2
cat $csvfile | while IFS=, read -r filename rurl
do
if [ -f "${imgpath}/${filename}" ]
then
echo -n
else
echo -e "$filename ${red}MISSING${color_Off}"
fi
done
My CSV looks something like
Image1.jpg,detail-1
Image2.jpg,detail-1
Image3.jpg,detail-1
The csv was created by excel.
Now all 3 images are present in imgpath but for some reason my output says
Image1.jpg MISSING
Upon using zsh -x to run the script i found that my CSV file has a BOM at the very beginning making the image name as \ufeffImage1.jpg which is causing the whole issue.
How can I ignore a BOM(byte-order marker) in a while read operation?
zsh provides a parameter expansion (also available in POSIX shells) to remove a prefix: ${var#prefix} will expand to $var with prefix removed from the front of the string.
zsh also, like ksh93 and bash, supports ANSI C-like string syntax: $'\ufeff' refers to the Unicode sequence for a BOM.
Combining these, one can refer to ${filename#$'\ufeff'} to refer to the content of $filename but with the Unicode sequence for a BOM removed if it's present at the front.
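The expansion can be checked in isolation (in zsh, or bash 4.2+, both of which understand $'\ufeff'):

```shell
# Simulate a BOM-prefixed field like the one the Excel export produced
filename=$'\ufeff'Image1.jpg

# Strip the BOM prefix if present; a clean name passes through untouched
clean=${filename#$'\ufeff'}
printf '%s\n' "$clean"    # Image1.jpg
```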
The below also makes some changes for better performance, more reliable behavior with odd filenames, and compatibility with non-zsh shells.
#!/bin/zsh
red='\033[0;31m'
color_Off='\033[0m'
csvfile=$1
imgpath=$2
while IFS=, read -r filename rurl; do
filename=${filename#$'\ufeff'}
if ! [ -f "${imgpath}/${filename}" ]; then
printf '%s %bMISSING%b\n' "$filename" "$red" "$color_Off"
fi
done <"$csvfile"
Notes on changes unrelated to the specific fix:
Replacing echo -e with printf lets us pick which specific variables get escape sequences expanded: %s for filenames means backslashes and other escapes in them are unmodified, whereas %b for $red and $color_Off ensures that we do process highlighting for them.
Replacing cat $csvfile | with < "$csvfile" avoids the overhead of starting up a separate cat process, and ensures that your while read loop is run in the same shell as the rest of your script rather than a subshell (which may or may not be an issue for zsh, but is a problem with bash when run without the non-default lastpipe flag).
echo -n isn't reliable as a noop: some shells print -n as output, and the POSIX echo standard, by marking behavior when -n is present as undefined, permits this. If you need a noop, : or true is a better choice; but in this case we can just invert the test and move the else path into the truth path.

Read CSV and add data using condition

I am trying to read a CSV which has data like:
Name Time
John
Ken
Paul
I want to read column one if it matches then change time. For example, if $1 = John then change time of the John to $2.
Here is what I have so far:
while IFS=, read -r col1 col2
do
echo "$col1"
if[$col1 eq $1] then
echo "$2:$col2"
done < test.csv >> newupdate.csv
To run ./test.sh John 30.
I am trying to keep the csv updated so making a new file I thought would be okay. so I can read updated file again for next run and update again.
Your shell script has a number of syntax errors. You need spaces inside [...] and you should generally quote your variables. You can usefully try http://shellcheck.net/ before asking for human assistance.
while IFS=, read -r col1 col2
do
if [ "$col1" = "$1" ]; then
col2=$2
fi
echo "$col1,$col2" # comma or colon separated?
done < test.csv >newupdate.csv
Notice how we always print the entire current line, with or without modifications depending on the first field. Notice also the semicolon (or equivalently newline) before then, and use of = as the equality comparison operator for strings. (The numeric comparison operator is -eq with a dash, not eq.)
However, it's probably both simpler and faster to use Awk instead. The shell isn't very good (or very quick) at looping over lines in the first place.
awk -F , -v who="$1" -v what="$2" 'BEGIN { OFS=FS }
$1 == who { $2 = what } 1' test.csv >newupdate.csv
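For instance, with hypothetical rows shaped like the question's data, piping avoids the temp files:

```shell
# Hypothetical sample rows shaped like the question's CSV
printf '%s\n' 'Name,Time' 'John,' 'Ken,' 'Paul,' |
  awk -F , -v who="John" -v what="30" 'BEGIN { OFS = FS }
    $1 == who { $2 = what } 1'
# prints:
#   Name,Time
#   John,30
#   Ken,
#   Paul,
```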
Doing this in sed will be even more succinct; but the error symptoms if your variables contain characters which have a special meaning to sed will be bewildering. So don't really do this.
sed "s/^$1,.*/$1,$2/" test.csv >newupdate.csv
There are ways to make this less brittle, but then not using sed for any non-trivial scripts is probably the most straightforward solution.
None of these scripts use any Bash-specific syntax, so you could run them under any POSIX-compatible shell.

How to read a space in bash - read will not

Many people have shown how to keep spaces when reading a line in bash. But I have a character-based algorithm which needs to process each and every character separately, spaces included. Unfortunately I am unable to get bash's read to read a single space character from input.
while read -r -n 1 c; do
printf "[%c]" "$c"
done <<< "mark spitz"
printf "[ ]\n"
yields
[m][a][r][k][][s][p][i][t][z][][ ]
I've hacked my way around this, but it would be nice to figure out how to read any single character.
Yep, tried setting IFS, etc.
Just set the input field separator (a) so that it doesn't treat space (or any character) as a delimiter; that works just fine:
printf 'mark spitz' | while IFS="" read -r -n 1 c; do
printf "[%c]" "$c"
done
echo
That gives you:
[m][a][r][k][ ][s][p][i][t][z]
You'll notice I've also slightly changed how you're getting the input there; <<< appears to provide an extraneous character at the end and, while it's not important to the input method itself, I thought it best to change that to avoid any confusion.
(a) Yes, I'm aware that you said you've tried setting IFS but, since you didn't actually show how you'd tried this, and it appears to work fine the way I do it, I have to assume you may have just done something wrong.

Writing a bash script that removes every line of text not beginning with x, y, or z

I'm trying to write a bash script that goes through a text file, TextInput.txt, and remove every line of text that doesn't start with s1, s2 ... sN (where sX is a single- or multi-character string; in the example below, I'll go with 'Rabbit', 'Squirrel', and 'Puppy' as accepted starts of lines). It should then write everything to a new text file, TextOutput.txt.
Being new to bash scripting, I've only managed to write this fairly limited dummy code.
#!/bin/sh
TextInput=$(<TextInput.txt)
AcceptedStrings[0]='Rabbit'
AcceptedStrings[1]='Squirrel'
AcceptedStrings[2]='Puppy'
# What goes here?
echo "$TextModified" > TextOutput.txt
If the strings are specified on the command line, you can build a pattern from them in a loop. If you already know all the strings, you can do egrep '^(str1|str2|str3)' filename to keep only the matching lines (quote the pattern so the shell doesn't interpret the parentheses and pipes).
Reading your question again, you said that you want to remove lines that don't start with the specified string but in the example, you gave a list of accepted strings. You may need to clarify on that.
As mentioned in the comment above, you could use grep. However, if you wanted to do it using bash, you could say:
acceptedstring=( Rabbit Squirrel Puppy )
pattern=$( IFS=$'|'; echo "${acceptedstring[*]}" )
while read -r line; do
[[ "$line" =~ ^($pattern) ]] && echo "$line"
done < TextInput.txt > TextOutput.txt
First convert your array of accepting strings into a pattern and then use grep to find all the lines which match the pattern:
pattern=$(printf "|%s" "${AcceptedStrings[@]}")
pattern=${pattern:1}
grep -E "^($pattern)" TextInput.txt > TextOutput.txt
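A quick run against hypothetical input lines shows the effect of the pattern-building step:

```shell
AcceptedStrings=(Rabbit Squirrel Puppy)

# Join the array into an alternation: Rabbit|Squirrel|Puppy
pattern=$(printf "|%s" "${AcceptedStrings[@]}")
pattern=${pattern:1}

# Keep only lines starting with one of the accepted strings
printf '%s\n' 'Rabbit hops away' 'Fox watches' 'Puppy naps' |
  grep -E "^($pattern)"
# prints:
#   Rabbit hops away
#   Puppy naps
```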

Reading a very long line without breaking it into multiple lines in a shell script

I have a file which has very long rows of data. When I try to read it using a shell script, the data comes out on multiple lines, i.e., it breaks at certain points.
Example row:
B_18453583||Active|917396140129|405819121107402|Active|7396140129||7396140129|||||||||18-MAY-10|||||18-MAY-10|405819121107402|Outgoing International Calls,Outgoing Calls,WAP,Call Waiting,MMS,Data Service,National Roaming-Voice,Outgoing International Calls except home country,Conference Call,STD,Call Forwarding-Barr,CLIP,Incoming Calls,INTSNS,WAPSNS,International Roaming-Voice,ISD,Incoming Calls When Roaming Internationally,INTERNET||For You Plan||||||||||||||||||
All this is the content of a single line.
I use a normal read like this :
var=`cat pranay.psv`
for i in $var; do
echo $i
done
The output comes as:
B_18453583||Active|917396140129|405819121107402|Active|7396140129||7396140129|||||||||18- MAY-10|||||18-MAY-10|405819121107402|Outgoing
International
Calls,Outgoing
Calls,WAP,Call
Waiting,MMS,Data
Service,National
Roaming-Voice,Outgoing
International
Calls
except
home
country,Conference
Call,STD,Call
Forwarding-Barr,CLIP,Incoming
Calls,INTSNS,WAPSNS,International
Roaming-Voice,ISD,Incoming
Calls
When
Roaming
Internationally,INTERNET||For
You
Plan||||||||||||||||||
How do I print it all on a single line?
This is because of word splitting. An easier way to do this (which also dispenses with the useless use of cat) is this:
while IFS= read -r -d $'\n' -u 9
do
echo "$REPLY"
done 9< pranay.psv
To explain in detail:
$'...' can be used to create human readable strings with escape sequences. See man bash.
IFS= is necessary to avoid that any characters in IFS are stripped from the start and end of $REPLY.
-r avoids interpreting backslash in text specially.
-d $'\n' splits lines by the newline character.
Using file descriptor 9 for the data instead of standard input avoids greedy commands inside the loop (like cat) eating all of it.
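The word splitting described above can be seen directly: an unquoted expansion is split on IFS (whitespace by default) before the for loop sees it, while a quoted one stays whole:

```shell
# A shortened stand-in for one of the question's long rows
line='B_18453583|Outgoing International Calls|For You Plan'

# Unquoted: the shell splits on whitespace, producing five words
for i in $line; do printf '[%s]' "$i"; done; echo
# prints: [B_18453583|Outgoing][International][Calls|For][You][Plan]

# Quoted: the line survives as a single word
printf '[%s]\n' "$line"
# prints: [B_18453583|Outgoing International Calls|For You Plan]
```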
You need proper quoting. In your case, you should use the command read:
while read line ; do
echo "$line"
done < pranay.psv
