Combining variable concatenation and for loops in bash - bash

I have this function in R, which I use to produce a list of dates:
#!/usr/bin/env Rscript
date_seq = function(){
args = commandArgs(trailingOnly = TRUE)
library(lubridate)
days = seq(ymd(args[1]),ymd(args[2]),1)
days =format(days, "%Y%m%d")
return(days)
}
date_seq()
I call this function in a bash script to create a vector of dates:
Rscript date_seq.R 20160730 20160801 > dates
I define a couple of other string variables in the bash script:
home_url="https://pando-rgw01.chpc.utah.edu/hrrr/sfc/"
hrrr_file="/hrrr.t{00-23}z.wrfsfcf00.grib2"
The final goal is to create a vector of download links that incorporates the three variables home_url, date and hrrr_file, like so:
"https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20160730/hrrr.t{00-23}z.wrfsfcf00.grib2"
"https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20160731/hrrr.t{00-23}z.wrfsfcf00.grib2"
"https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20160801/hrrr.t{00-23}z.wrfsfcf00.grib2"
I tried a few lines in bash script:
for date in $dates; do download_url=$home_url$date$hrrr_file; cat $download_url; done
for date in $dates; do download_url="${home_url}${date}${hrrr_file}"; cat $download_url;
done
for date in $dates; do download_url="$home_url"; download_url+="$date"; download_url+="$hrrr_file"; cat $download_url; done
None of these produce the output I expect. I am not sure if the download_url variable is not being produced, or is being produced and stored somewhere I cannot see. Can anyone please help me understand?
Edit
Results of trying the suggestions below:
@triplee suggested using
sed "s#.*#$home_url&$hrrr_file#" "dates"
and
while read -r date; do printf '%s%s%s\n' "$home_url" "$date" "$hrrr_file"; done <dates
Both of these produce this output:
https://pando-rgw01.chpc.utah.edu/hrrr/sfc/[1] "20160730" "20160731" "20160801"/hrrr.t{00-23}z.wrfsfcf00.grib2
@xdhmoore suggested using
for date in $(cat dates); do echo "${home_url}${date}${hrrr_file}"; done
which produces this output:
https://pando-rgw01.chpc.utah.edu/hrrr/sfc/[1]/hrrr.t{00-23}z.wrfsfcf00.grib2
https://pando-rgw01.chpc.utah.edu/hrrr/sfc/"20160730"/hrrr.t{00-23}z.wrfsfcf00.grib2
https://pando-rgw01.chpc.utah.edu/hrrr/sfc/"20160731"/hrrr.t{00-23}z.wrfsfcf00.grib2
https://pando-rgw01.chpc.utah.edu/hrrr/sfc/"20160801"/hrrr.t{00-23}z.wrfsfcf00.grib2
Neither is the output I am expecting, though the solution by @xdhmoore is closer. But I see another problem in @xdhmoore's solution: the quotation marks around the dates in the output. The output of cat dates looks like this: [1] "20160730" "20160731" "20160801", so I think I have to rework the function, or the way I call it in the bash script, as well.
I'll keep updating the question to reflect the output of all suggestions, since it is simpler to do so than trying to answer each comment. As always, thanks a lot!

The for statement loops over the tokens you give it as arguments, not the contents of files.
You seem to be looking for
sed "s#.*#$home_url&$hrrr_file#" "dates"
The token & recalls the text which was matched by the regex in a sed substitution.
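For instance, with stand-in values (example.com and the simplified file name are hypothetical, standing in for the real home_url and hrrr_file), the & can be seen in action:

```shell
# Hypothetical values standing in for home_url and hrrr_file
home_url="https://example.com/hrrr/sfc/"
hrrr_file="/hrrr.t00z.wrfsfcf00.grib2"
# & recalls whatever .* matched, i.e. the whole line (here, a date)
printf '%s\n' 20160730 20160731 |
sed "s#.*#$home_url&$hrrr_file#"
# https://example.com/hrrr/sfc/20160730/hrrr.t00z.wrfsfcf00.grib2
# https://example.com/hrrr/sfc/20160731/hrrr.t00z.wrfsfcf00.grib2
```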
The same thing could be done vastly more slowly with a shell loop:
while read -r date; do
printf '%s%s%s\n' "$home_url" "$date" "$hrrr_file"
done <dates
which illustrates how to (slowly) iterate over the lines in a file without the use of external utilities.
Either of these can be piped to xargs curl (or perhaps xargs -n 1 curl); or you could refactor the while loop:
while read -r date; do
curl "$home_url$date$hrrr_file"
done <dates
As noted in comments, cat is a command for copying files, not echoing text; for the latter, use echo or (for any nontrivial formatting) printf.
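The distinction is easy to demonstrate with a throwaway value (the URL here is hypothetical):

```shell
download_url="https://example.com/file.grib2"   # hypothetical URL
printf '%s\n' "$download_url"    # prints the string itself
cat "$download_url"              # fails: tries to open a *file* named like the URL
```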
Update: The above assumes your R output generated one date per line. To split the file into lines and remove quotes around the values, you can preprocess with sed 's/"\([^"]*\)" */\1\n/g' "dates" (provided your sed dialect supports \n as an escape for a newline); or perhaps do
sed "s#\"\([^\"]*\)\" *#$home_url\\1$hrrr_file\\
#g" "dates"
again with some reservation for differences between sed dialects. In the worst case, maybe switch to Perl, which actually brings some relief to the backslashitis, but requires new backslashes in other places:
perl -pe "s#\"(\d+)\" *#$home_url\$1$hrrr_file\n#g" "dates"
But probably a better solution is to change your R script so it doesn't produce wacky output. Or just don't use R in the first place. See e.g. https://stackoverflow.com/a/3494814/874188 for how to get dates from Perl. Or if you have GNU date, try
#!/bin/bash
start=$(date -d "$1" +%s)
end=$(date -d "$2" +%s)
for ((i=start; i<=end; i+=60*60*24)); do
date -d "#$i" +%Y%m%d
done
(If you are on a Mac or similar, the date program won't accept a date as an argument to -d and you will have to use slightly different syntax. It's not hard to do but this answer has too many speculations already.)


Bash change filenames from month name to month number

I have been trying to do this all afternoon and cannot figure out how to do it. I'm running MX Linux and from the command line am trying (unsuccessfully) to batch edit a bunch of filenames (I have about 500, so I don't want to do this by hand) from:
2020-August-15.pdf
2021-October-15.pdf
To:
2020-08-15.pdf
2021-10-15.pdf
I cannot find anything that does this (in a way I understand) so am wondering. Is this possible or am I to do this by hand?
Admittedly I'm not very good with Bash but I can use sed, awk, rename, date, etc. I just can't seem to find a way to combine them to rename my files.
I cannot find anything on here that has been of any help in doing this.
Many thanks.
EDIT:
I'm looking for a way to combine commands and ideally not have to overtly for-loop through the files and the months. What I mean is I would prefer, and was trying to, pipe ls into a command combination to convert as specified above. Sorry for the confusion.
EDIT 2:
Thank you to everyone who came up with answers, and for your patience with my lack of ability. I don't think I'm qualified to make a decision as to the best answer; however, I have settled, for my use-case, on the following:
declare -A months=( [January]=01 [February]=02 [March]=03 [April]=04 [May]=05\
[June]=06 [July]=07 [August]=08 [September]=09 [October]=10 [November]=11 [December]=12 )
for oldname in 202[01]-[A-Za-z]*-15.pdf
do
IFS=-. read y m d ext <<< "${oldname}"
mv "$oldname" "$y-${months[$m]}-$d.$ext"
done
I think this offers the best flexibility. I would have liked the date command but don't know how to avoid hard-coding the file extension. I was unaware of the read command, or that you could use patterns in the for loop.
I have learned a lot from this thread so again thank you all. Really my solution is a cross of most of the solutions below as I've taken from them all.
With just Bash built-ins, try
months=(\
January February March April May June \
July August September October November December)
for file in ./*; do
dst=$file
for ((i=1; i<=${#months[@]}; ++i)); do
printf -v mon '%02d' "$i"
dst=${dst//${months[i-1]}/$mon}
done
mv -- "$file" "$dst"
done
This builds up an array of month names, and loops over it to find the correct substitution.
The line printf -v mon '%02d' "$i" adds zero padding for single-digit month numbers; use '%d' instead if that's undesired.
As an aside, you should basically never use ls in scripts.
The explicit loop could be avoided if you had a command which already knows how to rename files, but this implements that command. If you want to save it in a file, replace the hard-coded ./* with "$#", add a #!/bin/bash shebang up top, save it as monthrenamer somewhere in your PATH, and chmod a+x monthrenamer. Then you can run it like
monthrenamer ./*
to rename all the files in the current directory without an explicit loop, or a more restricted wildcard argument to only select a smaller number of files, like
monthrenamer /path/to/files/2020*.pdf
You could run date twelve times to populate the array, but it's not like hard-coding the month names is going to be a problem. We don't expect them to change (and calling twelve subprocesses at startup just to avoid that seems quite excessive in this context).
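If you did want to generate the names at run time anyway, a sketch (GNU date and the C/English locale assumed):

```shell
# Build the month-name array by formatting twelve dummy dates
months=()
for ((m=1; m<=12; ++m)); do
    printf -v d '2000-%02d-01' "$m"           # e.g. 2000-08-01
    months+=("$(LC_ALL=C date -d "$d" +%B)")  # full month name
done
printf '%s\n' "${months[7]}"   # August (bash arrays are zero-indexed)
```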
As an aside, probably try to fix the process which creates these files to produce machine-readable file names. It's fairly obvious to a human, too, that 2021-07 refers to the month of July, whereas going the other way is always cumbersome (you will need to work around it in every tool or piece of code which wants to order the files by name).
Assuming you have the GNU version of date(1), you could use date -d to map the month names to numbers:
for f in *.pdf; do
IFS=- read y m d <<<"${f%.pdf}"
mv "$f" "$(date -d "$m $d, $y" +%F.pdf)"
done
I doubt it's any more efficient than your sed -e 's/January/01/' -e 's/February/02/' etc, but it does feel less tedious to type. :)
Explanation:
Loop over the .pdf files, setting f to each filename in turn.
The read line is best explained right to left:
a. "${f%.pdf}" expands to the filename without the .pdf part, e.g. "2020-August-15".
b. <<< turns that value into a here-string, which is a mechanism for feeding a string as standard input to some command. Essentially, x <<<y does the same thing as echo y | x, with the important difference that the x command is run in the current shell instead of a subshell, so it can have side effects like setting variables.
c. read is a shell builtin that by default reads a single line of input and assigns it to one or more shell variables.
d. IFS is a parameter that tells the shell how to split lines up into words. Here we're setting it – only for the duration of the read command – to -. That tells read to split the line it reads on hyphens instead of whitespace; IFS=- read y m d <<<"2020-August-15" assigns "2020" to y, "August" to m, and "15" to d.
e. The GNU version of date(1) has a -d parameter that tells it to display another date instead of the current one. It accepts a number of different formats itself, sadly not including "yyyy-Mon-dd", which is why I had to split the filename up with read. But it does accept "Mon dd, yyyy", so that's what I pass to it. +%F.pdf tells it that when it prints the date back out it should do so ISO-style as "yyyy-mm-dd", and append ".pdf" to the result. ("%F" is short for "%Y-%m-%d"; I could also have used -I instead of +anything and moved the .pdf outside the command expansion.)
f. The call to date is wrapped in $(...) to capture its output, and that result is used as the second parameter to mv to rename the files.
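The whole transformation can be checked in isolation (GNU date assumed):

```shell
f="2020-August-15.pdf"
IFS=- read -r y m d <<<"${f%.pdf}"   # y=2020 m=August d=15
date -d "$m $d, $y" +%F.pdf          # prints 2020-08-15.pdf
```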
Another way with POSIX shell:
# Iterate over pattern that will exclude already renamed pdf files
for file in [0-9][0-9][0-9][0-9]-[^0-9]*.pdf
do
# Remove echo if result match expectations
echo mv -- "$file" "$(
# Set field separator to - or . to split filename components
IFS=-.
# Transfer filename components into arguments using IFS
set -- $file
# Format numeric date string
date --date "$3 $2 $1" '+%Y-%m-%d.pdf'
)"
done
If you are using GNU utilities and the Perl version of rename (not the util-linux version), you can build a one-liner quite easily:
rename "$(
seq -w 1 12 |
LC_ALL=C xargs -I# date -d 1970-#-01 +'s/^(\d{4}-)%B(-\d{2}\.pdf)$/$1%m$2/;'
)" *.pdf
You can shorten if you don't care about safety (or legibility)... :-)
rename "$(seq -f%.f/1 12|date -f- +'s/%B/%m/;')" *.pdf
What I mean is I would prefer, and was trying to, pipe ls into a command combination to convert as specified above.
Well, you may need to implement that command combination then. Here’s one consisting of a single “command” and in pure Bash without external processes. Pipe your ls output into that and, once satisfied with the output, remove the final echo…
#!/bin/bash
declare -Ar MONTHS=(
[January]=01
[February]=02
[March]=03
[April]=04
[May]=05
[June]=06
[July]=07
[August]=08
[September]=09
[October]=10
[November]=11
[December]=12)
while IFS= read -r path; do
IFS=- read -ra segments <<<"$path"
segments[-2]="${MONTHS["${segments[-2]}"]}"
IFS=- new_path="${segments[*]}"
echo mv "$path" "$new_path"
done
What is working for me in Mac OS 12.5 with GNU bash, version 3.2.57(1)-release (arm64-apple-darwin21)
is the following :
for f in *.pdf; do mv "$f" "$(echo $f |sed -e 's/Jan/-01-/gi' -e 's/Feb/-02-/gi' -e 's/Mar/-03-/gi' -e 's/Apr/-04-/gi' -e 's/May/-05-/gi' -e 's/jun/-06-/gi' -e 's/Jul/-07-/gi' -e 's/Aug/-08-/gi' -e 's/Sep/-09-/gi' -e 's/Oct/-10-/gi' -e 's/Nov/-11-/gi' -e 's/Dec/-12-/gi' )"; done
Note the original files had the month expressed as three letters in my case:
./04351XXX73435-2021Mar08-2021Apr08.pdf

Read CSV and add data using condition

I am trying to read a CSV which has data like:
Name Time
John
Ken
Paul
I want to read column one if it matches then change time. For example, if $1 = John then change time of the John to $2.
Here is what I have so far:
while IFS=, read -r col1 col2
do
echo "$col1"
if[$col1 eq $1] then
echo "$2:$col2"
done < test.csv >> newupdate.csv
To run ./test.sh John 30.
I am trying to keep the CSV updated, so I thought making a new file would be okay; that way I can read the updated file again on the next run and update it again.
Your shell script has a number of syntax errors. You need spaces inside [...] and you should generally quote your variables. You can usefully try http://shellcheck.net/ before asking for human assistance.
while IFS=, read -r col1 col2
do
if [ "$col1" = "$1" ]; then
col2=$2
fi
echo "$col1,$col2" # comma or colon separated?
done < test.csv >newupdate.csv
Notice how we always print the entire current line, with or without modifications depending on the first field. Notice also the semicolon (or equivalently newline) before then, and use of = as the equality comparison operator for strings. (The numeric comparison operator is -eq with a dash, not eq.)
However, it's probably both simpler and faster to use Awk instead. The shell isn't very good (or very quick) at looping over lines in the first place.
awk -F , -v who="$1" -v what="$2" 'BEGIN { OFS=FS }
$1 == who { $2 = what } 1' test.csv >newupdate.csv
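With a three-line sample file, the Awk version behaves like this:

```shell
# Create a small sample and update John's time to 30
printf '%s\n' 'John,' 'Ken,' 'Paul,' > test.csv
awk -F , -v who="John" -v what="30" 'BEGIN { OFS=FS }
$1 == who { $2 = what } 1' test.csv
# John,30
# Ken,
# Paul,
```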
Doing this in sed will be even more succinct; but the error symptoms if your variables contain characters which have a special meaning to sed will be bewildering. So don't really do this.
sed "s/^$1,.*/$1,$2/" test.csv >newupdate.csv
There are ways to make this less brittle, but then not using sed for any non-trivial scripts is probably the most straightforward solution.
None of these scripts use any Bash-specific syntax, so you could run them under any POSIX-compatible shell.

Time difference between two dates in the log files

I am trying to get the time difference between two dates as given below in Bash script. However I am not successful
head_info: 05-31-2017:04:27:37
tail_info: 05-31-2017:04:30:57
The problem is that after reformatting the above times, the space causes the time part to be ignored when I try to calculate the difference in seconds.
This is my script:
fm_head_info=(${head_info:6:4}"-"${head_info:0:2}"-"${head_info:3:2}" \
"${head_info:11:8})
fm_tail_info=(${tail_info:6:4}"-"${tail_info:0:2}"-"${tail_info:3:2}" \
"${tail_info:11:8})
$ fm_head_info
-bash: 2017-05-31: command not found
Thank you
Let's define your shell variables:
$ tail_info=05-31-2017:04:30:57
$ head_info=05-31-2017:04:27:37
Now, let's create a function to convert those dates to seconds-since-epoch:
$ date2sec() { date -d "$(sed 's|-|/|g; s|:| |' <<<"$*")" +%s; }
To find the time difference between those two date in seconds:
$ echo $(( $(date2sec "$tail_info") - $(date2sec "$head_info") ))
200
As written above, this requires bash (or similar advanced shell) and GNU date. In other words, this should work on any standard Linux. To make this work on OSX, some changes to the date command will likely be necessary.
How it works
Starting with the innermost command inside the function date2sec, we have:
sed 's|-|/|g; s|:| |' <<<"$*"
In the argument to the function, this replaces all - with / and it replaces the first : with a space. This converts the dates from the format in your input to one that the GNU date command will understand. For example:
$ sed 's|-|/|g; s|:| |' <<<"05-31-2017:04:30:57"
05/31/2017 04:30:57
With this form, we can use date to find seconds-since-epoch:
$ date -d "05/31/2017 04:30:57" +%s
1496230257
And, for the head_info:
$ date -d "05/31/2017 04:27:37" +%s
1496230057
Now that we have that, all that is left is to subtract the times:
$ echo $(( 1496230257 - 1496230057 ))
200
Your immediate issue is the erroneous (...) surrounding your string-indexed assignments, together with questionable quoting. It looks like you intended:
fm_head_info="${head_info:6:4}-${head_info:0:2}-${head_info:3:2} ${head_info:11:8}"
fm_tail_info="${tail_info:6:4}-${tail_info:0:2}-${tail_info:3:2} ${tail_info:11:8}"
Your use of string indexes is correct, e.g.
#!/bin/bash
head_info=05-31-2017:04:27:37
tail_info=05-31-2017:04:30:57
fm_head_info="${head_info:6:4}-${head_info:0:2}-${head_info:3:2} ${head_info:11:8}"
fm_tail_info="${tail_info:6:4}-${tail_info:0:2}-${tail_info:3:2} ${tail_info:11:8}"
echo "fm_head_info: $fm_head_info"
echo "fm_tail_info: $fm_tail_info"
Example Use/Output
$ bash headinfo.sh
fm_head_info: 2017-05-31 04:27:37
fm_tail_info: 2017-05-31 04:30:57
You can then use date -d "$var" +%s, as John shows in his answer, to arrive at the time difference. Note, string indexes are limited to bash, while a sed solution (absent the here-string) would be portable to all POSIX shells.
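For reference, a sed-based rearrangement of the same timestamp (POSIX sed, hence portable beyond bash) might look like:

```shell
head_info="05-31-2017:04:27:37"
# Capture MM, DD, YYYY and emit YYYY-MM-DD followed by the time
fm_head_info=$(printf '%s\n' "$head_info" |
    sed 's|^\([0-9][0-9]\)-\([0-9][0-9]\)-\([0-9][0-9][0-9][0-9]\):|\3-\1-\2 |')
printf '%s\n' "$fm_head_info"   # 2017-05-31 04:27:37
```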

How do I use `sed` to alter a variable in a bash script?

I'm trying to use enscript to print PDFs from Mutt, and hitting character encoding issues. One way around them seems to be to just use sed to replace the problem characters: sed -ir 's/[“”]/"/g' {input}
My test input file is this:
“very dirty”
we’re
I'm hoping to get "very dirty" and we're but instead I'm still getting
â\200\234very dirtyâ\200\235
weâ\200\231re
I found a nice little post on printing to PDFs from Mutt that I used as a starting point. I have a bash script that I point to from my .muttrc with set print_command="$HOME/.mutt/print.sh" -- the script currently reads about like this:
#!/bin/bash
input="$1" pdir="$HOME/Desktop" open_pdf=evince
# Straighten out curly quotes
sed -ir 's/[“”]/"/g' $input
sed -ir "s/[’]/'/g" $input
tmpfile="`mktemp $pdir/mutt_XXXXXXXX.pdf`"
enscript --font=Courier8 $input -2r --word-wrap --fancy-header=mutt -p - 2>/dev/null | ps2pdf - $tmpfile
$open_pdf $tmpfile >/dev/null 2>&1 &
sleep 1
rm $tmpfile
It does a fine job of creating a PDF (and works fine if you give it a file as an argument) but I can't figure out how to fix the curly quotes.
I've tried a bunch of variations on the sed line:
input=sed -r 's/[“”]/"/g' $input
$input=sed -ir "s/[’]/'/g" $input
Per the suggestion at Can I use sed to manipulate a variable in bash? I also tried input=$(sed -r 's/[“”]/"/g' <<< $input) and I get an error: "Syntax error: redirection unexpected"
But none manages to actually change $input -- what is the correct syntax to change $input with sed?
Note: I accepted an answer that resolved the question I asked, but as you can see from the comments there are a couple of other issues here. enscript is taking in a whole file as a variable, not just the text of the file. So trying to tweak the text inside the file is going to take a few extra steps. I'm still learning.
On Editing Variables In General
BashFAQ #21 is a comprehensive reference on performing search-and-replace operations in bash, including within variables, and is thus recommended reading. On this particular case:
Use the shell's native string manipulation instead; this is far higher performance than forking off a subshell, launching an external process inside it, and reading that external process's output. BashFAQ #100 covers this topic in detail, and is well worth reading.
Depending on your version of bash and configured locale, it might be possible to use a bracket expression (ie. [“”], as your original code did). However, the most portable thing is to treat “ and ” separately, which will work even without multi-byte character support available.
input='“hello ’cruel’ world”'
input=${input//'“'/'"'}
input=${input//'”'/'"'}
input=${input//'’'/"'"}
printf '%s\n' "$input"
...correctly outputs:
"hello 'cruel' world"
On Using sed
To provide a literal answer -- you almost had a working sed-based approach in your question.
input=$(sed -r 's/[“”]/"/g' <<<"$input")
...adds the missing syntactic double quotes around the parameter expansion of $input, ensuring that it's treated as a single token regardless of how it might be string-split or glob-expanded.
But All That May Not Help...
The below is mentioned because your test script is manipulating content passed on the command line; if that's not the case in production, you can probably disregard the below.
If your script is invoked as ./yourscript “hello * ’cruel’ * world”, then information about exactly what the user entered is lost before the script is started, and nothing you can do here will fix that.
This is because $1, in that scenario, will only contain “hello; ’cruel’ and world” are in their own argv locations, and the *s will have been replaced with lists of files in the current directory (each such file substituted as a separate argument) before the script was even started. Because the shell responsible for parsing the user's command line (which is not the same shell running your script!) did not recognize the quotes as valid at the time when it ran this parsing, by the time the script is running, there's nothing you can do to recover the original data.
Abstract: The way to use sed to change a variable is explored, but what you really need is a way to use and edit a file. It is covered ahead.
Sed
The (two) sed line(s) could be solved with this (note that -i is not used, it is not a file but a value):
input='“very dirty”
we’re'
sed 's/[“”]/\"/g;s/’/'\''/g' <<<"$input"
But it should be faster (for small strings) to use the internals of the shell:
input='“very dirty”
we’re'
input=${input//[“”]/\"}
input=${input//[’]/\'}
printf '%s\n' "$input"
$1
But there is an underlying problem with your script, you are trying to clean an input received from the command line. You are using $1 as the source of the string. Once somebody writes:
./script “very dirty”
we’re
That input is lost. It is broken into shell's tokens and "$1" will be “very only.
But I do not believe that is what you really have.
file
However, you are also saying that the input comes from a file. If that is the case, then read it in with:
input="$(<infile)" # not $1
sed 's/[“”]/\"/g;s/’/'\''/g' <<<"$input"
Or, if you don't mind to edit (change) the file, do this instead:
sed -i 's/[“”]/\"/g;s/’/'\''/g' infile
input="$(<infile)"
Or, if you are clear and certain that what is being given to the script is a filename, like:
./script infile
You can use:
infile="$1"
sed -i 's/[“”]/\"/g;s/’/'\''/g' "$infile"
input="$(<"$infile")"
Other comments:
Then:
Quote your variables.
Do not use the very old `…` syntax, use $(…) instead.
Do not use variables in UPPER case, those are reserved for environment variables.
And (unless you actually meant sh) use a shebang (first line) that targets bash.
The command enscript most definitely requires a file, not a variable.
Maybe you should use evince to open the PS file, there is no need of the step to make a pdf, unless you know you really need it.
I believe it is better to use a file to store the output of enscript and ps2pdf.
Do not hide the errors printed by the commands until everything is working as desired, then, just call the script as:
./script infile 2>/dev/null
Or as required to make it less verbose.
Final script.
If you call the script with the name of the file that enscript is going to use, something like:
./script infile
Then, the whole script will look like this (runs both in bash or sh):
#!/usr/bin/env bash
Usage(){ echo "$0: This script requires a source file"; exit 1; }
[ $# -lt 1 ] && Usage
[ ! -e "$1" ] && Usage
infile="$1"
pdir="$HOME/Desktop"
open_pdf=evince
# Straighten out curly quotes
sed -i 's/[“”]/\"/g;s/’/'\''/g' "$infile"
tmpfile="$(mktemp "$pdir"/mutt_XXXXXXXX.pdf)"
outfile="${tmpfile%.*}.ps"
enscript --font=Courier10 "$infile" -2r \
--word-wrap --fancy-header=mutt -p "$outfile"
ps2pdf "$outfile" "$tmpfile"
"$open_pdf" "$tmpfile" >/dev/null 2>&1 &
sleep 5
rm "$tmpfile" "$outfile"

Bash/Shell | How to prioritize quote from IFS in read [duplicate]

This question already has answers here:
IFS separate a string like "Hello","World","this","is, a boring", "line"
(3 answers)
Closed 6 years ago.
I'm working with a hand fill file and I am having issue to parse it.
My file input file cannot be altered, and the language of my code can't change from bash script.
I made a simple example to make it easy for you ^^
var="hey","i'm","happy, like","you"
IFS="," read -r one two tree for five <<<"$var"
echo $one:$two:$tree:$for:$five
Now I think you already saw the problem here. I would like to get
hey:i'm:happy, like:you:
but I get
hey:i'm:happy: like:you
I need a way to tell the read that the " " are more important than the IFS. I have read about the eval command but I can't take that risk.
To end, this is a directory file and the troublesome field is the description one, so it could have basically anything in it.
original file looking like that
"type","cn","uid","gid","gecos","description","timestamp","disabled"
"type","cn","uid","gid","gecos","description","timestamp","disabled"
"type","cn","uid","gid","gecos","description","timestamp","disabled"
Edit #1
I will give a better example; the one I used above is too simple, and @StefanHegny found it causes another error.
while read -r ldapLine
do
IFS=',' read -r objectClass dumy1 uidNumber gidNumber username description modifyTimestamp nsAccountLock gecos homeDirectory loginShell createTimestamp dumy2 <<<"$ldapLine"
isANetuser=0
while IFS=":" read -r -a class
do
for i in "${class[@]}"
do
if [ "$i" == "account" ]
then
isANetuser=1
break
fi
done
done <<< $objectClass
if [ $isANetuser == 0 ]
then
continue
fi
#MORE STUFF APPEND#
done < file.csv
So this is a small part of the code but it should explain what I do. The file.csv is a lot of lines like this:
"top:shadowAccount:account:posixAccount","Jdupon","12345","6789","Jdupon","Jean Mark, Dupon","20140511083750Z","","Jean Mark, Dupon","/home/user/Jdupon","/bin/ksh","20120512083750Z","",""
If the various bash versions you will use are all more recent than v3.0, when regexes and BASH_REMATCH were introduced, you could use something like the following function: [Note 1]
each_field () {
local v=,$1;
while [[ $v =~ ^,(([^\",]*)|\"[^\"]*\") ]]; do
printf "%s\n" "${BASH_REMATCH[2]:-${BASH_REMATCH[1]:1:-1}}";
v=${v:${#BASH_REMATCH[0]}};
done
}
Its argument is a single line (remember to quote it!) and it prints each comma-separated field on a separate line. As written, it assumes that no field has an enclosed newline; that's legal in CSV, but it makes dividing the file into lines a lot more complicated. If you actually needed to deal with that scenario, you could change the \n in the printf statement to a \0 and then use something like xargs -0 to process the output. (Or you could insert whatever processing you need to do to the field in place of the printf statement.)
It goes to some trouble to dequote quoted fields without modifying unquoted fields. However, it will fail on fields with embedded double quotes. That's fixable, if necessary. [Note 2]
Here's a sample, in case that wasn't obvious:
while IFS= read -r line; do
each_field "$line"
printf "%s\n" "-----"
done <<EOF
type,cn,uid,gid,gecos,"description",timestamp,disabled
"top:shadowAccount:account:posixAccount","Jdupon","12345","6789","Jdupon","Jean Mark, Dupon","20140511083750Z","","Jean Mark, Dupon","/home/user/Jdupon","/bin/ksh","20120512083750Z","",""
EOF
Output:
type
cn
uid
gid
gecos
description
timestamp
disabled
-----
top:shadowAccount:account:posixAccount
Jdupon
12345
6789
Jdupon
Jean Mark, Dupon
20140511083750Z
Jean Mark, Dupon
/home/user/Jdupon
/bin/ksh
20120512083750Z
-----
Notes:
I'm not saying you should use this function. You should use a CSV parser, or a language which includes a good CSV parsing library, like python. But I believe this bash function will work, albeit slowly, on correctly-formatted CSV files of a certain common CSV dialect.
Here's a version which handles doubled quotes inside quoted fields, which is the classic CSV syntax for interior quotes:
each_field () {
local v=,$1;
while [[ $v =~ ^,(([^\",]*)|\"(([^\"]|\"\")*)\") ]]; do
echo "${BASH_REMATCH[2]:-${BASH_REMATCH[3]//\"\"/\"}}";
v=${v:${#BASH_REMATCH[0]}};
done
}
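A quick self-contained check of this doubled-quote variant (bash is assumed, since the function uses [[ =~ ]] and BASH_REMATCH):

```shell
each_field () {
    local v=,$1;
    while [[ $v =~ ^,(([^\",]*)|\"(([^\"]|\"\")*)\") ]]; do
        echo "${BASH_REMATCH[2]:-${BASH_REMATCH[3]//\"\"/\"}}";
        v=${v:${#BASH_REMATCH[0]}};
    done
}
each_field 'a,"b ""quoted"" c",d'
# a
# b "quoted" c
# d
```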
My suggestion, as in some previous answers (see below), is to switch the separator to | (and use IFS="|" instead):
sed -r 's/,([^,"]*|"[^"]*")/|\1/g'
This requires a sed that has extended regular expressions (-r) however.
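Applied to the example line from the question, that substitution looks like this (-r is GNU sed; BSD sed spells the same option -E):

```shell
var='"hey","i'\''m","happy, like","you"'
printf '%s\n' "$var" | sed -r 's/,([^,"]*|"[^"]*")/|\1/g'
# "hey"|"i'm"|"happy, like"|"you"
```

after which an IFS="|" read can split the fields without tripping over the embedded comma.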
Should I use AWK or SED to remove commas between quotation marks from a CSV file? (BASH)
Is it possible to write a regular expression that matches a particular pattern and then does a replace with a part of the pattern
