newline character in POSIX shell - shell

I'm parsing output of avahi-browse tool and my script should be POSIX compatible.
I'm doing it this way:
local _dnssd=`avahi-browse -apt`
if [ -z "$_dnssd" ]; then
echo "No info"
else
IFS='
' # it's new line character in IFS
for _row in $_dnssd
do
local _tmpIFS="$IFS"
IFS=";"
case "$_row" in
...
esac
IFS="$_tmpIFS"
done
fi
I really don't like the line with the newline assignment to IFS. Is it possible to replace it in a better way?
I tried some suggestions from Stack Overflow, but they don't work:
IFS=$(echo -e '\n')
avahi-browse output:
+;br0;IPv4;switch4B66E4;_http._tcp;local
+;br0;IPv4;switch4B66E4;_csco-sb._tcp;local

Add a space after \n in the IFS variable, then remove that space again:
IFS="$(printf '\n ')" && IFS="${IFS% }"
#IFS="$(printf '\n ')" && IFS="${IFS%?}"
printf '%s' "$IFS" | od -A n -c
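For example, splitting the sample avahi-browse output from the question with this IFS (a minimal sketch with hard-coded data standing in for the real command):

```shell
#!/bin/sh
# Build an IFS containing only a newline, portably.
IFS="$(printf '\n ')" && IFS="${IFS% }"

# Stand-in for `avahi-browse -apt` output.
_dnssd='+;br0;IPv4;switch4B66E4;_http._tcp;local
+;br0;IPv4;switch4B66E4;_csco-sb._tcp;local'

for _row in $_dnssd; do
    printf 'row: %s\n' "$_row"
done
```

Each iteration sees one full line, since the only separator is the newline.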

It's better to use a while loop than trying to iterate over a string that contains the entire output.
avahi-browse -apt | while IFS=";" read field1 field2 ...; do
case ... in
...
esac
done
Note that you need one name per field for the read command. The ... is just a placeholder, not valid shell syntax for a variable number of fields.
This simply does nothing if the program produces no output. If you really need to detect that case, try
avahi-browse -apt | {
read line || { echo "No info"; exit; }
while : ; do
IFS=";" read field1 field2 ... <<EOF
$line
EOF
case ... in
...
esac
read line || break
done
}
In both cases, any variables set in the right-hand side of the pipe are local to that shell. If you need to set variables for later use, you'll need to make some further adjustments.

If one can rely* on IFS having its default value (space, tab, newline), then one can simply strip the first two characters (space and tab) and the newline character will remain:
IFS=${IFS#??}
*You can rely on it if IFS has not been modified by the script before and if it is a POSIX shell (as the topic implies):
The shell shall set IFS to <space><tab><newline> when it is invoked.
See http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_05_03
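A quick way to verify the result (a sketch; od renders the remaining byte):

```shell
#!/bin/sh
# With the default IFS (space, tab, newline), strip the first two
# characters; only the newline remains.
IFS=${IFS#??}
printf '%s' "$IFS" | od -A n -c   # shows a single \n
```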

Related

How to iterate over the characters of a string in a POSIX shell script?

A POSIX compliant shell shall provide mechanisms like this to iterate over collections of strings:
for x in $(seq 1 5); do
echo $x
done
But, how do I iterate over each character of a word?
It's a little circuitous, but I think this will work in any POSIX-compliant shell. I've tried it in dash, but I don't have busybox handy to test with.
var='ab * cd'
tmp="$var" # The loop will consume the variable, so make a temp copy first
while [ -n "$tmp" ]; do
rest="${tmp#?}" # All but the first character of the string
first="${tmp%"$rest"}" # Remove $rest, and you're left with the first character
echo "$first"
tmp="$rest"
done
Output:
a
b
*
c
d
Note that the double-quotes around the right-hand side of assignments are not needed; I just prefer to use double-quotes around all expansions rather than trying to keep track of where it's safe to leave them off. On the other hand, the double-quotes in [ -n "$tmp" ] are absolutely necessary, and the inner double-quotes in first="${tmp%"$rest"}" are needed if the string contains "*".
Use getopts to process input one character at a time. The leading : in the option string puts getopts in silent error-reporting mode, so it ignores illegal options and stores each one in OPTARG. The leading - in the input makes getopts treat the string as options.
If getopts encounters a colon, it will not set OPTARG, so the script uses parameter expansion to return : when OPTARG is not set/null.
#!/bin/sh
IFS='
'
split_string () {
OPTIND=1;
while getopts ":" opt "-$1"
do echo "'${OPTARG:-:}'"
done
}
while read -r line;do
split_string "$line"
done
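A self-contained run of that function (the string ab* is just illustrative):

```shell
#!/bin/sh
# Each byte of the argument becomes an "illegal option" for getopts,
# so it lands in OPTARG one character at a time.
split_string () {
    OPTIND=1
    while getopts ":" opt "-$1"; do
        echo "'${OPTARG:-:}'"
    done
}

split_string 'ab*'
```

which prints 'a', 'b' and '*', one per line.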
As with the accepted answer, this processes strings byte-wise instead of character-wise, corrupting multibyte codepoints. The trick is to detect multibyte codepoints, concatenate their bytes and then print them:
#!/bin/sh
IFS='
'
split_string () {
OPTIND=1;
while getopts ":" opt "$1";do
case "${OPTARG:=:}" in
([[:print:]])
[ -n "$multi" ] && echo "$multi" && multi=
echo "$OPTARG" && continue
esac
multi="$multi$OPTARG"
case "$multi" in
([[:print:]]) echo "$multi" && multi=
esac
done
[ -n "$multi" ] && echo "$multi"
}
while read -r line;do
split_string "-$line"
done
Here the extra case "$multi" is used to detect when the multi buffer contains a printable character. This works in shells like Bash and Zsh, but Dash and busybox ash do not pattern-match multibyte codepoints and ignore the locale.
This degrades somewhat nicely: Dash/ash treat sequences of multibyte codepoints as one character, but handle multibyte characters surrounded by single-byte characters fine.
Depending on your requirements it may be preferable not to split consecutive multibyte codepoints anyway, as the next codepoint may be a combining character which modifies the character before it.
This won't handle the case where a single byte character is followed by a combining character.
This works in dash and busybox (note that grep -o is not specified by POSIX, but GNU, BSD and busybox grep all support it):
echo 'ab * cd' | grep -o .
Output:
a
b
*
c
d
I was developing a script that needed stacks, and the same idea can be used to iterate through strings:
#!/bin/sh
# posix script
pop () {
# $1 top
# $2 stack
eval $1='$(expr "'\$$2'" : "\(.\).*")'
eval $2='$(expr "'\$$2'" : ".\(.*\)" )'
}
string="ABCDEFG"
while [ "$string" != "" ]
do
pop c string
echo "--" $c
done
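The expr calls fork a subprocess per character; the same pop can be written with pure parameter expansions (a sketch using the first/rest trick shown in the accepted answer earlier):

```shell
#!/bin/sh
# pop: move the first character of the stack variable named $2
# into the variable named $1, using only parameter expansions.
pop () {
    eval "_rest=\${$2#?}"            # all but the first character
    eval "$1=\${$2%\"\$_rest\"}"     # the first character
    eval "$2=\$_rest"
}

string="ABCDEFG"
while [ "$string" != "" ]; do
    pop c string
    echo "-- $c"
done
```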

Assign and/or manipulate incoming variables (string) from external program in bash

I have an external program which hands me a bunch of information via stdin ($1) to my script.
I get a line like the following:
session number="2018/06/20-234",data name="XTRDF_SLSLWX3_FSLO",data group="Testing",status="Error",data type="0"
Now I want to use this line split into single variables.
I thought about two ways until now:
INPUT='session number="2018/06/20-234",data name="XTRDF_SLSLWX3_FSLO",data group="Testing",status="Error",data type="0"'
echo "$INPUT" | tr ',' '\n' | tr ' ' '_' > vars.tmp
set vars.tmp
This will do the job until I have a data_name value with a space in it: my tr command will automatically change it to _, and my assigned variable is no longer correct in upcoming checks.
So I thought about loading the input into a array and do some pattern substitution on the array to delete everything until and including the = and do some variable assignments afterwards
INPUT='session number="2018/06/20-234",data name="XTRDF_SLSLWX3_FSLO",data group="Testing",status="Error",data type="0"'
IFS=',' read -r -a array <<< "$INPUT"
array=("${array[@]/#*=/}")
session_number="${array[0]}"
data_name="${array[1]}"
....
But now I get strange behaviour that cuts the input if there is a = somewhere in the data name or data group, and I have no idea if this is the way to do it. I'm pretty sure there should be no = in the data name or data group field, compared to a space, but you never know...
How could I do this?
Simple Case: No Commas Within Strings
If you don't need to worry about commas or literal quotes inside the quoted data, the following handles the case you asked about (stray =s within the data) sanely:
#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: Requires bash 4.0 or newer" >&2; exit 1;; esac
input='session number="2018/06/20-234",data name="XTRDF_SLSLWX3_FSLO",data group="Testing",status="Error",data type="0"'
declare -A data=( )
IFS=, read -r -a pieces <<<"$input"
for piece in "${pieces[@]}"; do
key=${piece%%=*} # delete everything past the *first* "=", ignoring later ones
value=${piece#*=} # delete everything before the *first* "=", ignoring later ones
value=${value#'"'} # remove leading quote
value=${value%'"'} # remove trailing quote
data[$key]=$value
done
declare -p data
...results in (whitespace added for readability, otherwise literal output):
declare -A data=(
["data type"]="0"
[status]="Error"
["data group"]="Testing"
["data name"]="XTRDF_SLSLWX3_FSLO"
["session number"]="2018/06/20-234"
)
Handling Commas Inside Quotes
Now, let's say you do need to worry about commas inside your quotes! Consider the following input:
input='session number="123",error="Unknown, please try again"'
Now, if we try to split on commas without considering their position, we'll have error="Unknown and have please try again as a stray value.
To solve this, we can use GNU awk with the FPAT feature.
#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: Requires bash 4.0 or newer" >&2; exit 1;; esac
input='session number="123",error="Unknown, please try again"'
# Why do so many awk people try to write one-liners? Isn't this more readable?
awk_script='
BEGIN {
FPAT = "[^=,]+=(([^,]+)|(\"[^\"]+\"))"
}
{
printf("%s\0", NF)
for (i = 1; i <= NF; i++) {
printf("%s\0", $i)
}
}
'
while :; do
IFS= read -r -d '' num_fields || break
declare -A data=( )
for ((i=0; i<num_fields; i++)); do
IFS= read -r -d '' piece || break
key=${piece%%=*}
value=${piece#*=}
value=${value#'"'}
value=${value%'"'}
data[$key]=$value
done
declare -p data # maybe invoke a callback here, before going on to the next line
done < <(gawk "$awk_script" <<<"$input")
...whereafter output is properly:
declare -A data=(["session number"]="123" [error]="Unknown, please try again" )

IFS separate a string like "Hello","World","this","is, a boring", "line"

I'm trying to parse a .csv file and I have some problems with IFS.
The file contains lines like this:
"Hello","World","this","is, a boring","line"
The columns are separated with a comma, so I tried to explode the line with this code:
IFS=, read -r -a tempArr <<< "$line"
But I get this output:
"Hello"
"World"
"this"
"is
a boring"
"line"
I understand why, so I tried some other commands but I don't get my expected output.
IFS=\",\"
IFS=\",
IFS=',\"'
IFS=,\"
Every time the third element is separated into 2 parts.
How can I use IFS to separate the string into 5 parts like this?
"Hello"
"World"
"this"
"is, a boring"
"line"
give this a try:
sed 's/","/"\n"/g' <<<"${line}"
sed's s command performs a regex search and replace; here it replaces the , in each "," delimiter with a newline character.
As a consequence each element is on a separate line.
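Note that \n in the replacement text is a GNU sed extension; POSIX sed wants a backslash followed by a literal newline. A portable version (sketch):

```shell
#!/bin/sh
line='"Hello","World","this","is, a boring","line"'
# POSIX sed: the replacement contains a backslash-escaped real newline.
result=$(printf '%s\n' "$line" | sed 's/","/"\
"/g')
printf '%s\n' "$result"
```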
You may wish to use the gawk with FPAT to define what makes a valid string -
Input :
"hello","world","this,is"
Script :
gawk -n 'BEGIN{FS=",";OFS="\n";FPAT="([^,]+)|(\"[^\"]+\")"}{$1=$1;print $0}' somefile.csv
Output :
"hello"
"world"
"this,is"
bashlib provides a csvline function. Assuming you've installed it somewhere in your PATH:
line='"Hello","World","this","is, a boring","line"'
source bashlib
csvline <<<"$line"
printf '%s\n' "${CSVLINE[@]}"
...output from the above being:
Hello
World
this
is, a boring
line
To quote the implementation (which is copyright lhunath, the below text being taken from this specific revision of the relevant git repo):
# _______________________________________________________________________
# |__ csvline ____________________________________________________________|
#
# csvline [-d delimiter] [-D line-delimiter]
#
# Parse a CSV record from standard input, storing the fields in the CSVLINE array.
#
# By default, a single line of input is read and parsed into comma-delimited fields.
# Fields can optionally contain double-quoted data, including field delimiters.
#
# A different field delimiter can be specified using -d. You can use -D
# to change the definition of a "record" (eg. to support NULL-delimited records).
#
csvline() {
CSVLINE=()
local line field quoted=0 delimiter=, lineDelimiter=$'\n' c
local OPTIND=1 arg
while getopts :d: arg; do
case $arg in
d) delimiter=$OPTARG ;;
esac
done
IFS= read -d "$lineDelimiter" -r line || return
while IFS= read -rn1 c; do
case $c in
\")
(( quoted = !quoted ))
continue ;;
$delimiter)
if (( ! quoted )); then
CSVLINE+=( "$field" ) field=
continue
fi ;;
esac
field+=$c
done <<< "$line"
[[ $field ]] && CSVLINE+=( "$field" ) ||:
} # _____________________________________________________________________

Bash read function returns error code when using new line delimiter

I have a script that I am returning multiple values from, each on a new line. To capture those values as bash variables I am using the read builtin (as recommended here).
The problem is that when I use the new line character as the delimiter for read, I seem to always get a non-zero exit code. This is playing havoc with the rest of my scripts, which check the result of the operation.
Here is a cut-down version of what I am doing:
$ read -d '\n' a b c < <(echo -e "1\n2\n3"); echo $?; echo $a $b $c
1
1 2 3
Notice the exit status of 1.
I don't want to rewrite my script (the echo command above) to use a different delimiter (as it makes sense to use new lines in other places of the code).
How do I get read to play nice and return a zero exit status when it successfully reads 3 values?
Update
Hmmm, it seems that I may be using the "delimiter" wrongly. From the man page:
-d *delim*
The first character of delim is used to terminate the input line,
rather than newline.
Therefore, one way I could achieve the desired result is to do this:
read -d '#' a b c < <(echo -e "1\n2\n3\n## END ##"); echo $?; echo $a $b $c
Perhaps there's a nicer way though?
The "problem" here is that read returns non-zero when it reaches EOF which happens when the delimiter isn't at the end of the input.
So adding a newline to the end of your input will make it work the way you expect (and fix the argument to -d as indicated in gniourf_gniourf's comment).
What's happening in your example is that read is scanning for \ and hitting EOF before finding it. Then the input line is being split on \n (because of IFS) and assigned to $a, $b and $c. Then read is returning non-zero.
Using -d for this is fine but \n is the default delimiter so you aren't changing anything if you do that and if you had gotten the delimiter correct (-d $'\n') in the first place you would have seen your example not work at all (though it would have returned 0 from read). (See http://ideone.com/MWvgu7)
A common idiom when using read (mostly with non-standard values for -d) is to test both read's return value and whether the variable assigned to has a value: read -d '' line || [ "$line" ], for example. This works even when read fails on the last "line" of input because of a missing terminator at the end.
So to get your example working you want to either use multiple read calls the way chepner indicated or (if you really want a single call) then you want (See http://ideone.com/xTL8Yn):
IFS=$'\n' read -d '' a b c < <(printf '1 1\n2 2\n3 3')
echo $?
printf '[%s]\n' "$a" "$b" "$c"
And adding \0 to the end of the input stream (e.g. printf '1 1\n2 2\n3 3\0') or putting || [ "$a" ] at the end will avoid the failure return from the read call.
IFS is set for read to prevent the shell from word-splitting on spaces and breaking up the input incorrectly. -d '' makes read use the NUL byte (\0) as its delimiter.
-d is the wrong thing to use here. What you really want is three separate calls to read:
{ read a; read b; read c; } < <(echo $'1\n2\n3\n')
Be sure that the input ends with a newline so that the final read has an exit status of 0.
If you don't know how many lines are in the input ahead of time, you need to read the values into an array. In bash 4, that takes just a single call to readarray:
readarray -t arr < <(echo $'1\n2\n3\n')
Prior to bash 4, you need to use a loop:
while read value; do
arr+=("$value")
done < <(echo $'1\n2\n3\n')
read always reads a single line of input; the -d option changes read's idea of what terminates a line. An example:
$ while read -d'#' value; do
> echo "$value"
> done << EOF
> a#b#c#
> EOF
a
b
c

How to split one string into multiple strings separated by at least one space in bash shell?

I have a string containing many words with at least one space between each two. How can I split the string into individual words so I can loop through them?
The string is passed as an argument. E.g. ${2} == "cat cat file". How can I loop through it?
Also, how can I check if a string contains spaces?
I like the conversion to an array, to be able to access individual elements:
sentence="this is a story"
stringarray=($sentence)
now you can access individual elements directly (it starts with 0):
echo ${stringarray[0]}
or convert back to string in order to loop:
for i in "${stringarray[@]}"
do
:
# do whatever on $i
done
Of course looping through the string directly was answered before, but that answer had the disadvantage of not keeping track of the individual elements for later use:
for i in $sentence
do
:
# do whatever on $i
done
See also Bash Array Reference.
Did you try just passing the string variable to a for loop? Bash, for one, will split on whitespace automatically.
sentence="This is a sentence."
for word in $sentence
do
echo $word
done
This
is
a
sentence.
Probably the easiest and most secure way in BASH 3 and above is:
var="string to split"
read -ra arr <<<"$var"
(where arr is the array which takes the split parts of the string) or, if there might be newlines in the input and you want more than just the first line:
var="string to split"
read -ra arr -d '' <<<"$var"
(please note the space in -d ''; it cannot be omitted), but this might give you an unexpected newline from <<<"$var" (as this implicitly adds an LF at the end).
Example:
touch NOPE
var="* a *"
read -ra arr <<<"$var"
for a in "${arr[@]}"; do echo "[$a]"; done
Outputs the expected
[*]
[a]
[*]
as this solution (in contrast to all previous solutions here) is not prone to unexpected and often uncontrollable shell globbing.
Also this gives you the full power of IFS as you probably want:
Example:
IFS=: read -ra arr < <(grep "^$USER:" /etc/passwd)
for a in "${arr[@]}"; do echo "[$a]"; done
Outputs something like:
[tino]
[x]
[1000]
[1000]
[Valentin Hilbig]
[/home/tino]
[/bin/bash]
As you can see, spaces can be preserved this way, too:
IFS=: read -ra arr <<<' split : this '
for a in "${arr[@]}"; do echo "[$a]"; done
outputs
[ split ]
[ this ]
Please note that the handling of IFS in BASH is a subject on its own, so do your tests; some interesting topics on this:
unset IFS: splits on runs of SPC, TAB and NL, ignoring them at line starts and ends
IFS='': no field separation, just reads everything
IFS=' ': splits on runs of SPC (and SPC only)
Some last examples:
var=$'\n\nthis is\n\n\na test\n\n'
IFS=$'\n' read -ra arr -d '' <<<"$var"
i=0; for a in "${arr[@]}"; do let i++; echo "$i [$a]"; done
outputs
1 [this is]
2 [a test]
while
unset IFS
var=$'\n\nthis is\n\n\na test\n\n'
read -ra arr -d '' <<<"$var"
i=0; for a in "${arr[@]}"; do let i++; echo "$i [$a]"; done
outputs
1 [this]
2 [is]
3 [a]
4 [test]
BTW:
If you are not used to $'ANSI-ESCAPED-STRING' get used to it; it's a timesaver.
If you do not include -r (as in read -a arr <<<"$var"), then read processes backslash escapes. This is left as an exercise for the reader.
For the second question:
To test for something in a string I usually stick to case, as it can check for multiple cases at once (note: case only executes the first match; if you need fallthrough, use multiple case statements), and this need is quite often the case (pun intended):
case "$var" in
'') empty_var;; # variable is empty
*' '*) have_space "$var";; # have SPC
*[[:space:]]*) have_whitespace "$var";; # have whitespaces like TAB
*[^-+.,A-Za-z0-9]*) have_nonalnum "$var";; # non-alphanum-chars found
*[-+.,]*) have_punctuation "$var";; # some punctuation chars found
*) default_case "$var";; # if all above does not match
esac
So you can set the return value to check for SPC like this:
case "$var" in (*' '*) true;; (*) false;; esac
Why case? Because it usually is a bit more readable than regex sequences, and thanks to Shell metacharacters it handles 99% of all needs very well.
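Wrapped in a function, that return value composes naturally with if and && (a small sketch):

```shell
#!/bin/sh
# has_space: succeed (return 0) if the argument contains a space.
has_space () {
    case "$1" in
        (*' '*) return 0 ;;
        (*)     return 1 ;;
    esac
}

if has_space "hello world"; then
    echo "has spaces"
else
    echo "no spaces"
fi
```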
Just use the shell's set built-in. For example,
set $text
After that, individual words in $text will be in $1, $2, $3, etc. For robustness, one usually does
set -- junk $text
shift
to handle the case where $text is empty or starts with a dash. For example:
text="This is a test"
set -- junk $text
shift
for word; do
echo "[$word]"
done
This prints
[This]
[is]
[a]
[test]
$ echo "This is a sentence." | tr -s " " "\012"
This
is
a
sentence.
For checking for spaces, use grep:
$ echo "This is a sentence." | grep " " > /dev/null
$ echo $?
0
$ echo "Thisisasentence." | grep " " > /dev/null
$ echo $?
1
echo $WORDS | xargs -n1 echo
This outputs every word on its own line; you can process that list as you see fit afterwards. Note that xargs itself interprets quotes and backslashes in its input, so this is only safe for simple strings.
(A) To split a sentence into its words (space separated) you can simply use the default IFS by using
array=( $string )
Example running the following snippet
#!/bin/bash
sentence="this is the \"sentence\" 'you' want to split"
words=( $sentence )
len="${#words[@]}"
echo "words counted: $len"
printf "%s\n" "${words[@]}" ## print array
will output
words counted: 8
this
is
the
"sentence"
'you'
want
to
split
As you can see you can use single or double quotes too without any problem
Notes:
-- this is basically the same as mob's answer, but this way you store the array for any further use. If you only need a single loop, you can use his answer, which is one line shorter :)
-- please refer to this question for alternate methods to split a string based on delimiter.
(B) To check for a character in a string you can also use a regular expression match.
Example to check for the presence of a space character you can use:
regex='[[:space:]]+'  # more portable than \s, which is a GNU extension
if [[ "$sentence" =~ $regex ]]
then
echo "Space here!";
fi
For checking spaces just with bash:
[[ "$str" = "${str% *}" ]] && echo "no spaces" || echo "has spaces"
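The expansion ${str% *} removes the shortest trailing space-and-suffix, so it equals the original only when no space is present. For example (a sketch):

```shell
#!/bin/sh
check () {
    # If removing " *" changes nothing, there was no space.
    [ "$1" = "${1% *}" ] && echo "no spaces" || echo "has spaces"
}
check "hello world"
check "helloworld"
```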
$ echo foo bar baz | sed 's/ /\n/g'
foo
bar
baz
For my use case, the best option was:
grep -oP '\w+' file
Basically this is a regular expression that matches contiguous non-whitespace characters. This means that any type and any amount of whitespace won't match. The -o parameter outputs each word match on a different line. Note that -P (Perl-compatible regular expressions) is specific to GNU grep.
Another take on this (using Perl):
$ echo foo bar baz | perl -nE 'say for split /\s/'
foo
bar
baz
