Explanation of bash specific syntax - bash

Came across this piece of code:
for entry in $(echo $tmp | tr ';' '\n')
do
echo $entry
rproj="${entry%%,*}"
rhash="${entry##*,}"
remoteproj[$rproj]=$rhash
done
So I understand that ';' is first converted to a newline so that each entry ends up on a separate line. However, I am seeing this for the first time:
rproj="${entry%%,*}"
rhash="${entry##*,}"
I understand that this takes everything before ',' and everything after ','. But is this more efficient than splitting the string? Also, could someone please explain the syntax, because I am unable to relate it to regular expressions or to any bash syntax I know.

These are parameter-expansion operators for string manipulation. The part after the operator is a glob pattern, not a regular expression.
${string##pattern}
Deletes the longest match of pattern from the front of $string.
So ${entry##*,} removes everything up to and including the last comma, leaving the text after it.
${string%%pattern}
Deletes the longest match of pattern from the back of $string.
So ${entry%%,*} removes everything from the first comma onward, leaving the text before it.
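A quick way to see the difference between the single and doubled operators is to apply all four to one value. This is a small sketch with a made-up sample string:

```shell
#!/usr/bin/env bash
# The text after the operator is a glob pattern: # and % remove the
# shortest match, ## and %% remove the longest match.
entry="a,b,c"
echo "${entry#*,}"    # shortest "*," removed from the front: b,c
echo "${entry##*,}"   # longest  "*," removed from the front: c
echo "${entry%,*}"    # shortest ",*" removed from the back:  a,b
echo "${entry%%,*}"   # longest  ",*" removed from the back:  a
```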
Btw, I would use the internal field separator instead of the tr command:
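declare -A remoteproj # associative array, needed for string keys (bash 4+)
IFS=';'
for entry in $tmp ; do
echo "$entry"
rproj="${entry%%,*}"
rhash="${entry##*,}"
remoteproj[$rproj]=$rhash
done
unset IFS # back to default word splitting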
Like this.
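If you prefer not to leave IFS and globbing changed for the rest of the script, one option is to confine both to a function. This is a sketch assuming bash 4 or later (for declare -A); parse_entries and the sample value of tmp are made up for illustration.

```shell
#!/usr/bin/env bash
declare -A remoteproj            # associative array (bash 4+)
tmp="alpha,1111;beta,2222;gamma,3333"

# Hypothetical helper: split $1 on ';' and each entry on ','.
parse_entries() {
  local IFS=';'                  # the changed IFS lives only in this function
  local entry
  set -f                         # no filename expansion of the unquoted $1
  for entry in $1; do
    remoteproj[${entry%%,*}]=${entry##*,}
  done
  set +f                         # assumes globbing was enabled before the call
}

parse_entries "$tmp"
for key in "${!remoteproj[@]}"; do   # key order is not guaranteed
  echo "$key -> ${remoteproj[$key]}"
done
```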

Use the read command both to split the original line and to split each entry.
declare -A remoteproj # associative array (bash 4+)
IFS=';' read -r -a entries <<< "$tmp"
for entry in "${entries[@]}"; do
IFS=, read -r rproj rhash <<< "$entry"
remoteproj["$rproj"]=$rhash
done
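One behavioural difference from the ## version in the question is worth knowing: with read, the last variable receives the rest of the line, so an entry with extra commas keeps them in the second field. A small sketch with a made-up entry:

```shell
#!/usr/bin/env bash
entry="a,b,c"
# The last variable (rest) gets everything after the first field,
# including the remaining commas.
IFS=, read -r first rest <<< "$entry"
echo "read:      $first | $rest"                 # a | b,c
echo "expansion: ${entry%%,*} | ${entry##*,}"    # a | c
```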

For performance, it is best to avoid subshells and external commands. I still get % and # confused, but these built-in expansions are much faster than calling sed, cut or perl.
The %% means "remove the largest possible matching string from the end of the variable's contents".
The ## means "remove the largest possible matching string from the beginning of the variable's contents".
You can see how they work with a simple test:
for entry in key,value a,b,c
do
echo "$entry is split into ${entry%%,*} and ${entry##*,}"
done
The result of splitting key,value is obvious. When you are splitting a,b,c the field b is lost.
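To check the performance claim on your own machine, you can time both styles. This rough sketch (the iteration count and sample value are arbitrary) compares 1000 parameter expansions with 1000 calls to cut, each of which needs a subshell and a new process. Timings depend entirely on the system, so none are shown here.

```shell
#!/usr/bin/env bash
entry="key,value"

# Pure parameter expansion: no new processes.
time for ((i = 0; i < 1000; i++)); do
  key=${entry%%,*}
done

# Command substitution + cut: one fork/exec per iteration.
time for ((i = 0; i < 1000; i++)); do
  key=$(cut -d, -f1 <<< "$entry")
done

echo "$key"   # key
```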

Related

How to grep from a single line

I'm using a weather API that outputs all data in a single line. How do I use grep to get the values for "summary" and "apparentTemperature"? My command of regular expressions is basically nonexistent, but I'm ready to learn.
{"latitude":59.433335,"longitude":24.750486,"timezone":"Europe/Tallinn","offset":2,"currently":{"time":1485880052,"summary":"Clear","icon":"clear-night","precipIntensity":0,"precipProbability":0,"temperature":0.76,"apparentTemperature":-3.34,"dewPoint":-0.13,"humidity":0.94,"windSpeed":3.99,"windBearing":262,"visibility":9.99,"cloudCover":0.11,"pressure":1017.72,"ozone":282.98}}
Thank you!
How do I use grep to get the values for "summary" and "apparentTemperature"?
You use grep's -o flag, which makes it output only the matched part.
Since you don't know much about regex, I suggest you instead learn to use a JSON parser, which would be more appropriate for this task.
For example with jq, the following command would extract the current summary :
<whatever is your JSON source> | jq '.currently.summary'
Assume your single-line data is contained in a variable called DATA_LINE.
If you are certain the field is only present once in the whole line, you could do something like this in Bash:
if
[[ "$DATA_LINE" =~ \"summary\":\"([^\"]*)\" ]]
then
summary="${BASH_REMATCH[1]}"
echo "Summary field is : $summary"
else
echo "Summary field not found"
fi
You would have to do that once for each field, unless you build a more complex matching expression that assumes fields are in a specific order.
As a note, the matching expression \"summary\":\"([^\"]*)\" finds the first occurrence in the data of a substring consisting of:
"summary":" (double quotes included), followed by
([^\"]*), a sub-expression matching zero or more characters other than a double quote; it is in parentheses so that it is available later as an element of the BASH_REMATCH array, because this is the value you want to extract,
and finally a closing quote; this is not strictly necessary, but protects against reading a truncated data line.
For apparentTemperature the code will be a bit different because the field does not have the same format.
if
[[ "$DATA_LINE" =~ \"apparentTemperature\":([^,]*), ]]
then
apparentTemperature="${BASH_REMATCH[1]}"
echo "Apparent temperature field is : $apparentTemperature"
else
echo "Apparent temperature field not found"
fi
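To avoid writing the test once per field, one option is a small helper that accepts both quoted string values and bare numbers. This is a sketch under the same assumption as above (each key occurs only once in the line); get_field is a hypothetical name, and the sample DATA_LINE is a shortened copy of the question's data.

```shell
#!/usr/bin/env bash
# get_field LINE KEY: print the value of "KEY" from a one-line JSON string.
# The optional \"? skips an opening quote; the value then runs up to the
# next quote, comma or closing brace. Nested objects and escaped quotes
# are not handled.
get_field() {
  local line=$1 key=$2
  local re="\"$key\":\"?([^\",}]*)"
  if [[ $line =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

DATA_LINE='{"currently":{"time":1485880052,"summary":"Clear","apparentTemperature":-3.34,"dewPoint":-0.13}}'
summary=$(get_field "$DATA_LINE" summary)
apparentTemperature=$(get_field "$DATA_LINE" apparentTemperature)
echo "Summary: $summary, apparent temperature: $apparentTemperature"
```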
This is fairly easily understood if your skills are limited - like mine! Assuming your string is in a variable called $LINE:
summary=$(sed -e 's/.*summary":"//' -e 's/".*//' <<< $LINE)
Then check:
echo $summary
Clear
That runs two sed expressions (each given with -e). The first replaces everything up to and including summary":" with nothing, and the second replaces the first remaining double quote and everything after it with nothing.
Extract apparent temperature:
appTemp=$(sed -e 's/.*apparentTemperature"://' -e 's/,.*//' <<< $LINE)
Then check:
echo $appTemp
-3.34
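The same two-step trim can also be done with the parameter expansions from the first question above, without starting sed. A sketch using a shortened copy of the sample data; like the sed version, it assumes the key is present (if it is not, the first expansion removes nothing).

```shell
#!/usr/bin/env bash
LINE='{"currently":{"time":1485880052,"summary":"Clear","icon":"clear-night","apparentTemperature":-3.34,"dewPoint":-0.13}}'

rest=${LINE#*'"summary":"'}            # drop everything up to summary":"
summary=${rest%%'"'*}                  # drop the first remaining quote and after

rest=${LINE#*'"apparentTemperature":'}
appTemp=${rest%%,*}                    # drop the first comma and after

echo "$summary"   # Clear
echo "$appTemp"   # -3.34
```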
As Aaron mentioned, a JSON parser like jq is the right tool for this, but since the question was about grep, let's see one way to do it.
Assuming your API return value is in $json:
json='{"latitude":59.433335,"longitude":24.750486,"timezone":"Europe/Tallinn","offset":2,"currently":{"time":1485880052,"summary":"Clear","icon":"clear-night","precipIntensity":0,"precipProbability":0,"temperature":0.76,"apparentTemperature":-3.34,"dewPoint":-0.13,"humidity":0.94,"windSpeed":3.99,"windBearing":262,"visibility":9.99,"cloudCover":0.11,"pressure":1017.72,"ozone":282.98}}'
The patterns in parentheses in the commands below are lookbehind and lookahead assertions for context matching. They require grep's -P (Perl-compatible regex) option, and the text they match is not included in the output.
summary=$(<<< "$json" grep -oP '(?<="summary":").*?(?=",)')
apparentTemperature=$(<<< "$json" grep -oP '(?<="apparentTemperature":).*?(?=,)')
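The -P option is a GNU grep extension that not every grep provides. A more portable sketch uses plain grep -o and lets cut strip the key; it assumes the string values contain no escaped quotes, and the json value is a shortened copy of the sample.

```shell
#!/usr/bin/env bash
json='{"currently":{"summary":"Clear","apparentTemperature":-3.34,"dewPoint":-0.13}}'

# "summary":"Clear" -> 4th field when split on double quotes
summary=$(grep -o '"summary":"[^"]*"' <<< "$json" | cut -d'"' -f4)

# "apparentTemperature":-3.34 -> 2nd field when split on ':'
apparentTemperature=$(grep -o '"apparentTemperature":[^,}]*' <<< "$json" | cut -d: -f2)

echo "$summary $apparentTemperature"
```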

bash - split string into array WITH empty values

pattern="::a::b::"
oldIFS=$IFS
IFS="::"
read -r -a extractees <<< $pattern
IFS=$oldIFS
this results in
{"a","b"}
however, I need to maintain the indices, so I want
{"","a","b",""}
(For comparison, if I wanted {"a","b"}, I would have written "a::b".)
Why? Because these elements are later split again (on a different delimiter), and the empty "" values should then result in an empty list.
How do I achieve this?
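IFS is a set of single characters, not a multi-character separator, so IFS="::" behaves exactly like IFS=":". Unfortunately, no field separator can be longer than one character, so you have to use ':' instead of '::'.
Aside from that, globbing should be explicitly turned off to prevent filename expansion of the unquoted variable.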
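set -f # disable globbing
pattern=":a:b c:"
oldIFS=$IFS
IFS=":"
extractees=($pattern)
IFS=$oldIFS
set +f # re-enable globbing
echo "'${extractees[0]}'" # ''
echo "'${extractees[1]}'" # 'a'
echo "'${extractees[2]}'" # 'b c'
echo "'${extractees[3]}'" # '' (unset: word splitting drops a trailing empty field)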
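If you need the original two-character '::' delimiter and want every empty field kept (including the trailing one), one alternative is to avoid IFS altogether and peel fields off with parameter expansion. This is a sketch; split_keep_empty is a hypothetical helper, and its local -n nameref requires bash 4.3 or later.

```shell
#!/usr/bin/env bash
# split_keep_empty STRING DELIM ARRAYNAME
# Splits STRING on the (possibly multi-character) DELIM into ARRAYNAME,
# keeping leading, inner and trailing empty fields. ARRAYNAME must not
# be "out", "str" or "delim" (those names are used locally).
split_keep_empty() {
  local str=$1 delim=$2
  local -n out=$3                  # nameref to the caller's array
  out=()
  while [[ $str == *"$delim"* ]]; do
    out+=("${str%%"$delim"*}")     # field before the first delimiter
    str=${str#*"$delim"}           # drop that field and the delimiter
  done
  out+=("$str")                    # last field (may be empty)
}

split_keep_empty "::a::b::" "::" extractees
printf "'%s'\n" "${extractees[@]}"   # '' 'a' 'b' '' on separate lines
```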

How to split strings over multiple lines in Bash?

How can I split my long string constant over multiple lines?
I realize that you can do this:
echo "continuation \
lines"
>continuation lines
However, if you have indented code, it doesn't work out so well:
    echo "continuation \
    lines"
>continuation     lines
This is what you may want:
$       echo          "continuation"\
>             "lines"
continuation lines
This creates two arguments to echo. If you only want one, let's look at string concatenation: in bash, placing two strings next to each other concatenates them:
$ echo "continuation""lines"
continuationlines
So a continuation line without an indent is one way to break up a string:
$ echo "continuation"\
> "lines"
continuationlines
But when an indent is used:
$       echo "continuation"\
>       "lines"
continuation lines
You get two arguments because this is no longer a concatenation.
If you would like a single string which crosses lines, while indenting but not getting all those spaces, one approach you can try is to ditch the continuation line and use variables:
$ a="continuation"
$ b="lines"
$ echo $a$b
continuationlines
This will allow you to have cleanly indented code at the expense of additional variables. If you make the variables local it should not be too bad.
Here documents introduced with <<-HERE work well for indented multi-line text strings: the dash makes the shell remove all leading tab characters (but not spaces) from each line of the here document, including the terminator line. (The line terminators still remain, though.)
cat <<-____HERE
continuation
lines
____HERE
See also http://ss64.com/bash/syntax-here.html
If you need to preserve some, but not all, leading whitespace, you might use something like
sed 's/^  //' <<____HERE
    This has four leading spaces.
    Two of them will be removed by sed.
____HERE
or maybe use tr to get rid of newlines:
tr -d '\012' <<-____
continuation
lines
____
(The second line has a tab and a space up front; the tab will be removed by the dash operator before the heredoc terminator, whereas the space will be preserved.)
For wrapping long complex strings over many lines, I like printf:
printf '%s' \
"This will all be printed on a " \
"single line (because the format string " \
"doesn't specify any newline)"
It also works well in contexts where you want to embed nontrivial pieces of shell script in another language where the host language's syntax won't let you use a here document, such as in a Makefile or Dockerfile.
printf '%s\n' >./myscript \
'#!/bin/sh' \
"echo \"G'day, World\"" \
'date +%F\ %T' && \
chmod a+x ./myscript && \
./myscript
You can use bash arrays
$ str_array=("continuation"
"lines")
then
$ echo "${str_array[*]}"
continuation lines
there is a space between the words, because (from the bash manual):
If the word is double-quoted, ${name[*]} expands to a single word with
the value of each array member separated by the first character of the
IFS variable
So set IFS='' to get rid of the extra space (note that this also turns off word splitting until you restore IFS):
$ IFS=''
$ echo "${str_array[*]}"
continuationlines
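If you would rather not leave IFS changed afterwards, here is a sketch of two ways to join the same array; both rely only on standard bash features.

```shell
#!/usr/bin/env bash
str_array=("continuation"
           "lines")

# 1) Change IFS only inside a command substitution (a subshell),
#    so the caller's IFS is untouched.
joined1=$(IFS=''; printf '%s' "${str_array[*]}")

# 2) printf reuses its format for every argument; -v stores the result
#    in a variable instead of printing it (bash builtin printf).
printf -v joined2 '%s' "${str_array[@]}"

echo "$joined1"   # continuationlines
echo "$joined2"   # continuationlines
```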
In certain scenarios utilizing Bash's concatenation ability might be appropriate.
Example:
temp='this string is very long '
temp+='so I will separate it onto multiple lines'
echo $temp
this string is very long so I will separate it onto multiple lines
From the PARAMETERS section of the Bash Man page:
name=[value]...
...In the context where an assignment statement is assigning a value to a shell variable or array index, the += operator can be used to append to or add to the variable's previous value. When += is applied to a variable for which the integer attribute has been set, value is evaluated as an arithmetic expression and added to the variable's current value, which is also evaluated. When += is applied to an array variable using compound assignment (see Arrays below), the variable's value is not unset (as it is when using =), and new values are appended to the array beginning at one greater than the array's maximum index (for indexed arrays) or added as additional key-value pairs in an associative array. When applied to a string-valued variable, value is expanded and appended to the variable's value.
You could simply separate the string with newlines (without backslashes), indented as required, and then strip out the newlines, as follows.
Example:
echo "continuation
of
lines" | tr '\n' ' '
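If you store the string in a variable, the newlines are kept; expanding the variable unquoted (echo $x) word-splits it, so the newlines and indentation collapse into single spaces. Strip the remaining spaces only where that is appropriate.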
x="continuation
of multiple
lines"
y="red|blue|
green|yellow"
echo $x # fine here: the spaces produced by word splitting are meaningful
echo $y | tr -d ' ' # here stripping the spaces may be preferable
This isn't exactly what the user asked, but another way to create a long string that spans multiple lines is by incrementally building it up, like so:
$ greeting="Hello"
$ greeting="$greeting, World"
$ echo $greeting
Hello, World
Obviously, in this case it would have been simpler to build it in one go, but this style can be lightweight and easy to follow when dealing with longer strings.
Line continuations also can be achieved through clever use of syntax.
In the case of echo:
# echo's -n flag suppresses the trailing newline
echo -n "This is my one-line statement" ;
echo -n " that I would like to make."
This is my one-line statement that I would like to make.
In the case of vars:
outp="This is my one-line statement" ;
outp+=" that I would like to make." ;
echo -n "${outp}"
This is my one-line statement that I would like to make.
Another approach in the case of vars:
outp="This is my one-line statement" ;
outp="${outp} that I would like to make." ;
echo -n "${outp}"
This is my one-line statement that I would like to make.
Voila!
I came across a situation in which I had to send a long message as part of a command argument and had to adhere to a line length limit. The command looks something like this:
somecommand --message="I am a long message" args
The way I solved this was to move the message out into a here document (as @tripleee suggested). But a here document is fed to standard input, so it needs to be read back in; I went with the approach below:
message=$(
tr "\n" " " <<-END
This is a
long message
END
)
somecommand --message="$message" args
This has the advantage that $message can be used exactly like a string constant, without line breaks or indentation (apart from one trailing space, which tr produces from the final newline).
Note that each message line above is prefixed with a tab character, which the here document itself strips (because of the use of <<-). The line breaks remain, and tr then replaces them with spaces.
Note also that if you don't remove the newlines, they will appear as-is when "$message" is expanded. In some cases you may be able to work around this by removing the double quotes around $message, but then the message will no longer be a single argument.
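Because tr also turns the final newline into a space, $message ends with one blank. A sketch of two ways to avoid it; it uses a plain <<END here document so it does not depend on literal tabs, and paste -sd is an assumption about your paste supporting -s and -d (GNU and BSD versions do).

```shell
#!/usr/bin/env bash
message=$(tr '\n' ' ' <<END
This is a
long message
END
)
printf '[%s]\n' "$message"        # [This is a long message ]

# Option 1: trim one trailing space with parameter expansion.
printf '[%s]\n' "${message% }"    # [This is a long message]

# Option 2: join the lines with paste, which adds no trailing separator;
# the command substitution removes paste's final newline.
message=$(paste -sd ' ' - <<END
This is a
long message
END
)
printf '[%s]\n' "$message"        # [This is a long message]
```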
Depending on what sort of risks you will accept and how well you know and trust the data, you can use simplistic variable interpolation.
$: x="
this
is
variably indented
stuff
"
$: echo "$x" # preserves the newlines and spacing
this
is
variably indented
stuff
$: echo $x # no quotes, stacks it "neatly" with minimal spacing
this is variably indented stuff
Following @tripleee's printf example (+1):
LONG_STRING=$( printf '%s' \
'This is the string that never ends.' \
' Yes, it goes on and on, my friends.' \
' My brother started typing it not knowing what it was;' \
" and he'll continue typing it forever just because..." \
' (REPEAT)' )
echo $LONG_STRING
This is the string that never ends. Yes, it goes on and on, my friends. My brother started typing it not knowing what it was; and he'll continue typing it forever just because... (REPEAT)
Note that we have included explicit spaces between the sentences, e.g. ' Yes...'. We can also do without the variable:
echo "$( printf '%s' \
'This is the string that never ends.' \
' Yes, it goes on and on, my friends.' \
' My brother started typing it not knowing what it was;' \
" and he'll continue typing it forever just because..." \
' (REPEAT)' )"
This is the string that never ends. Yes, it goes on and on, my friends. My brother started typing it not knowing what it was; and he'll continue typing it forever just because... (REPEAT)
Acknowledgement for the song that never ends
However, if you have indented code, it doesn't work out so well:
echo "continuation \
lines"
>continuation lines
Try with single quotes and concatenating the strings:
echo 'continuation' \
'lines'
>continuation lines
Note: these are two separate arguments, so echo joins them with a single space; the strings are not actually concatenated.
This probably doesn't really answer your question but you might find it useful anyway.
The first command creates the script that's displayed by the second command.
The third command makes that script executable.
The fourth command provides a usage example.
john@malkovich:~/tmp/so$ echo $'#!/usr/bin/env python\nimport textwrap, sys\n\ndef bash_dedent(text):\n    """Dedent all but the first line in the passed `text`."""\n    try:\n        first, rest = text.split("\\n", 1)\n        return "\\n".join([first, textwrap.dedent(rest)])\n    except ValueError:\n        return text  # single-line string\n\nprint(bash_dedent(sys.argv[1]))' > bash_dedent
john@malkovich:~/tmp/so$ cat bash_dedent
#!/usr/bin/env python
import textwrap, sys

def bash_dedent(text):
    """Dedent all but the first line in the passed `text`."""
    try:
        first, rest = text.split("\n", 1)
        return "\n".join([first, textwrap.dedent(rest)])
    except ValueError:
        return text  # single-line string

print(bash_dedent(sys.argv[1]))
john@malkovich:~/tmp/so$ chmod a+x bash_dedent
john@malkovich:~/tmp/so$ echo "$(./bash_dedent "first line
>     second line
>     third line")"
first line
second line
third line
Note that if you really want to use this script, it makes more sense to move the executable script into ~/bin so that it will be in your path.
Check the python reference for details on how textwrap.dedent works.
If the usage of $'...' or "$(...)" is confusing to you, ask another question (one per construct) if there's not already one up. It might be nice to provide a link to the question you find/ask so that other people will have a linked reference.

grep a pattern and output non-matching part of line

I know it is possible to invert grep output with the -v flag. Is there a way to only output the non-matching part of the matched line? I ask because I would like to use the return code of grep (which sed won't have). Here's sort of what I've got:
tags=$(grep "^$PAT" >/dev/null 2>&1)
[ "$?" -eq 0 ] && echo $tags
You could use sed:
$ sed -n "/$PAT/s/$PAT//p" $file
The only problem is that sed returns an exit code of 0 as long as the expression is valid, even if the pattern is not found anywhere.
Explanation
The -n parameter tells sed not to print lines automatically; by default, sed prints every line of the file. Let's look at each part of the sed program in between the slashes. Assume the program is /1/2/3/4/5:
/$PAT/: This selects the lines that match $PAT for the substitution command. Otherwise, sed would run the command on all lines, even those where nothing is substituted.
/s/: This says you will be doing a substitution
/$PAT/: This is the pattern you will be substituting. It's $PAT. So, you're searching for lines that contain $PAT and then you're going to substitute the pattern for something.
//: This is what you're substituting for $PAT. It is null. Therefore, you're deleting $PAT from the line.
/p: This final p says to print out the line.
Thus:
You tell sed not to print out the lines of the file as it processes them.
You're searching for all lines that contain $PAT.
On these lines, you're using the s command (substitution) to remove the pattern.
You're printing out the line once the pattern is removed from the line.
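If what you really need is grep's "did anything match?" signal, one sketch is to test whether sed printed anything. PAT and the sample file are made up for illustration, and $PAT is still interpreted by sed as a basic regular expression.

```shell
#!/usr/bin/env bash
PAT="tag:"
file=$(mktemp)
printf '%s\n' "tag:alpha" "no match here" "tag:beta" > "$file"

# s/$PAT//p prints only the lines where a substitution happened,
# so empty output means "no match".
if tags=$(sed -n "s/$PAT//p" "$file") && [ -n "$tags" ]; then
  echo "$tags"            # alpha and beta on separate lines
else
  echo "no match" >&2
fi
rm -f "$file"
```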
How about using a combination of grep, sed and $PIPESTATUS to get the correct exit status?
$ echo Humans are not proud of their ancestors, and rarely invite them round to dinner | grep dinner | sed -n "/dinner/s/dinner//p"
Humans are not proud of their ancestors, and rarely invite them round to
$ echo "${PIPESTATUS[1]}"
0
The members of the PIPESTATUS array hold the exit status of each command in the most recently executed pipeline. ${PIPESTATUS[0]} holds the exit status of the first command in the pipe, ${PIPESTATUS[1]} the exit status of the second command, and so on. The braces are required: $PIPESTATUS[1] expands to ${PIPESTATUS[0]} followed by a literal [1].
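Because every new command overwrites PIPESTATUS, it is safest to copy it immediately after the pipeline. A sketch with made-up input lines:

```shell
#!/usr/bin/env bash
echo "rarely invite them round to dinner" | grep dinner | sed -n "s/dinner//p"
status=("${PIPESTATUS[@]}")             # exit codes of echo, grep, sed
echo "grep exited with ${status[1]}"    # grep exited with 0

echo "no match" | grep dinner | sed -n "s/dinner//p"
status=("${PIPESTATUS[@]}")
echo "grep exited with ${status[1]}"    # grep exited with 1
```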
Your $tags will never have a value, because you send grep's output to /dev/null. Apart from that problem, grep has no input: you give it neither a file nor anything on stdin.
echo hello |grep "^he" -q ;
ret=$? ;
if [ $ret -eq 0 ];
then
echo there is he in hello;
fi
A successful return code is 0.
Here is one take on your 'problem':
pat="most of ";
data="The apples are ripe. I will use most of them for jam.";
echo $data |grep "$pat" -q;
ret=$?;
[ $ret -eq 0 ] && echo $data |sed "s/$pat//"
The apples are ripe. I will use them for jam.
... exact same thing?:
echo The apples are ripe. I will use most of them for jam. | sed ' s/most\ of\ //'
It seems to me you have confused the basic concepts. What are you trying to do anyway?
I am going to answer the title of the question directly instead of considering the detail of the question itself:
"grep a pattern and output non-matching part of line"
The title to this question is important to me because the pattern I am searching for contains characters that sed will assign special meaning to. I want to use grep because I can use -F or --fixed-strings to cause grep to interpret the pattern literally. Unfortunately, sed has no literal option, but both grep and bash have the ability to interpret patterns without considering any special characters.
Note: In my opinion, trying to backslash or escape special characters in a pattern appears complex in code and is unreliable because it is difficult to test. Using tools which are designed to search for literal text leaves me with a comfortable 'that will work' feeling without considering POSIX.
I used both grep and bash to produce the result because bash is slow and my use of fast grep creates a small output from a large input. This code searches for the literal twice, once during grep to quickly extract matching lines and once during =~ to remove the match itself from each line.
while IFS= read -r || [[ -n "$REPLY" ]]; do
if [[ "$REPLY" =~ (.*)("$LITERAL_PATTERN")(.*) ]]; then
printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[3]}"
else
printf 'NOT-REFOUND\n' >&2 # should never happen
exit 1
fi
done < <(grep -F "$LITERAL_PATTERN" < "$INPUT_FILE")
Explanation:
IFS= Assigning IFS as a prefix to the read command changes it for that command only. Setting it to the empty string makes read keep leading and trailing spaces and tabs, so each line is read literally up to the end of the line (assuming IFS would otherwise be the default space-tab-newline).
-r Tells read to accept backslashes in the input stream literally instead of considering them as the start of an escape sequence.
$REPLY Is created by read to store characters from the input stream. The newline at the end of each line will NOT be in $REPLY.
|| [[ -n "$REPLY" ]] The logical OR makes the while loop also accept a final line that is not newline-terminated. It is not strictly needed here, because grep always outputs a trailing newline for every match. But I habitually use it in my read loops: without it, any characters between the last newline and the end of the file are ignored, because read returns failure there even though it has read content.
=~ (.*)("$LITERAL_PATTERN")(.*) ]] Is a standard bash regex test, but anything in quotes is taken literally. If I wanted =~ to interpret the regex characters contained in $LITERAL_PATTERN, I would need to remove the double quotes.
"${BASH_REMATCH[@]}" Is set by [[ =~ ]], where [0] is the entire match and [N] is the text matched by the Nth set of parentheses.
Note: I do not like to redirect stdin into a while loop, because it is easy to get wrong and difficult to see what is happening later. I usually create a function for this type of operation, which behaves like a typical filter: it takes file name parameters, or reads stdin when it is redirected during the call.
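Following that note, here is a sketch of such a function. strip_literal is a hypothetical name; it returns grep's exit status (1 when nothing matches), which was the original goal, and like the loop above it removes only the last occurrence of the literal on each line.

```shell
#!/usr/bin/env bash
# strip_literal LITERAL [FILE...]
# Prints each line containing LITERAL with its last occurrence removed.
# Reads stdin when no FILE is given; returns grep's exit status.
strip_literal() {
  local literal=$1; shift
  local matches line
  # Separate declaration so "local" does not mask grep's exit status.
  matches=$(grep -F -- "$literal" "$@") || return
  while IFS= read -r line; do
    # Quoted part of the regex is matched literally.
    if [[ $line =~ (.*)("$literal")(.*) ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[3]}"
    fi
  done <<< "$matches"
}

printf '%s\n' 'cost: $5.00 (approx)' 'nothing here' | strip_literal '$5.00 ('
```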

linux shell script: split string, put them in an array then loop through them [duplicate]

Possible Duplicate:
Split string based on delimiter in Bash?
In a bash script how do I split string with a separator like ; and loop through the resulting array?
You can probably skip the step of explicitly creating an array...
One trick that I like to use is to set the internal field separator (IFS) to the delimiter character. This is especially handy for iterating through the space- or newline-delimited output of any of a number of Unix commands.
Below is an example using semicolons (as you had mentioned in your question):
export IFS=";"
sentence="one;two;three"
for word in $sentence; do
echo "$word"
done
Note: in regular Bourne-shell scripting setting and exporting the IFS would occur on two separate lines (IFS='x'; export IFS;).
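Since the question asks for an array, here is a sketch that builds one with read -a while changing IFS only for the read command itself, so nothing needs to be saved or exported:

```shell
#!/usr/bin/env bash
sentence="one;two;three"
# The IFS assignment applies to this read command only.
IFS=';' read -r -a words <<< "$sentence"
for word in "${words[@]}"; do
  echo "$word"
done
echo "${#words[@]} elements"   # 3 elements
```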
If you don't wish to mess with IFS (perhaps for the code within the loop) this might help.
If you know that your string will not contain whitespace, you can replace each ';' with a space and use the for/in construct:
#local str
for str in ${STR//;/ } ; do
echo "+ \"$str\""
done
But if you might have whitespace, then for this approach you will need to use a temp variable to hold the "rest" like this:
#local str rest
rest=$STR
while [ -n "$rest" ] ; do
str=${rest%%;*} # Everything up to the first ';'
# Trim up to the first ';' -- and handle final case, too.
[ "$rest" = "${rest/;/}" ] && rest= || rest=${rest#*;}
echo "+ \"$str\""
done
Here's a variation on ashirazi's answer which doesn't rely on $IFS. It does have its own issues, which I outline below.
sentence="one;two;three"
sentence=${sentence//;/$'\n'} # change the semicolons to white space
for word in $sentence
do
echo "$word"
done
Here I've used a newline, but you could use a tab ($'\t') or a space. However, if any of those characters appear in the text, it will be split there too. That's the advantage of $IFS: it not only enables your separator but also disables the default ones. Just make sure you save its value before you change it, as others have suggested.
Here is an example code that you may use:
$ STR="String;1;2;3"
$ for EACH in `echo "$STR" | grep -o -e "[^;]*"`; do
echo "Found: \"$EACH\"";
done
grep -o -e "[^;]*" selects each run of characters that are not ';', thereby splitting the string on ';'.
Hope that helps.
sentence="one;two;three"
a="${sentence};"
while [ -n "${a}" ]
do
echo ${a%%;*}
a=${a#*;}
done
