How to grep from a single line - bash

I'm using a weather API that outputs all data in a single line. How do I use grep to get the values for "summary" and "apparentTemperature"? My command of regular expressions is basically nonexistent, but I'm ready to learn.
{"latitude":59.433335,"longitude":24.750486,"timezone":"Europe/Tallinn","offset":2,"currently":{"time":1485880052,"summary":"Clear","icon":"clear-night","precipIntensity":0,"precipProbability":0,"temperature":0.76,"apparentTemperature":-3.34,"dewPoint":-0.13,"humidity":0.94,"windSpeed":3.99,"windBearing":262,"visibility":9.99,"cloudCover":0.11,"pressure":1017.72,"ozone":282.98}}
Thank you!

How do I use grep to get the values for "summary" and "apparentTemperature"?
You use grep's -o flag, which makes it output only the matched part.
Since you don't know much about regex, I suggest you instead learn to use a JSON parser, which would be more appropriate for this task.
For example, with jq the following command extracts the current summary:
<whatever is your JSON source> | jq '.currently.summary'
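A slightly fuller sketch, assuming jq is installed and the API response has already been stored in a shell variable (called json here, a name chosen only for illustration). The -r option makes jq print raw strings, so you get Clear rather than "Clear", and asking for both paths in one jq call avoids parsing the data twice:

```shell
json='{"latitude":59.433335,"longitude":24.750486,"timezone":"Europe/Tallinn","offset":2,"currently":{"time":1485880052,"summary":"Clear","icon":"clear-night","precipIntensity":0,"precipProbability":0,"temperature":0.76,"apparentTemperature":-3.34,"dewPoint":-0.13,"humidity":0.94,"windSpeed":3.99,"windBearing":262,"visibility":9.99,"cloudCover":0.11,"pressure":1017.72,"ozone":282.98}}'

# jq prints each requested value on its own line; -r drops the JSON quotes
{
  read -r summary
  read -r apparentTemperature
} < <(jq -r '.currently.summary, .currently.apparentTemperature' <<< "$json")

echo "Summary: $summary, apparent temperature: $apparentTemperature"
```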

Assume your single-line data is contained in a variable called DATA_LINE.
If you are certain the field is only present once in the whole line, you could do something like this in Bash:
if [[ "$DATA_LINE" =~ \"summary\":\"([^\"]*)\" ]]; then
  summary="${BASH_REMATCH[1]}"
  echo "Summary field is: $summary"
else
  echo "Summary field not found"
fi
You would have to do that once for each field, unless you build a more complex matching expression that assumes fields are in a specific order.
As a note, the matching expression \"summary\":\"([^\"]*)\" finds the first occurrence in the data of a substring consisting of:
"summary":" (double quotes included), followed by
([^\"]*), a sub-expression matching a sequence of zero or more characters other than a double quote. It is in parentheses so that it is available afterwards as an element of the BASH_REMATCH array, because this is the value you want to extract,
and finally a closing double quote. This is not strictly necessary, but it protects against reading from a truncated data line.
For apparentTemperature the code is a bit different, because the value is a number and is not enclosed in double quotes, so the expression captures everything up to the next comma instead:
if [[ "$DATA_LINE" =~ \"apparentTemperature\":([^,]*), ]]; then
  apparentTemperature="${BASH_REMATCH[1]}"
  echo "Apparent temperature field is: $apparentTemperature"
else
  echo "Apparent temperature field not found"
fi
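The two tests above differ only in the field name and in whether the value is quoted. A minimal sketch of a helper that covers both cases, assuming (as above) that each field appears only once in the line; the function name get_field is made up for illustration. It tries the quoted form first and falls back to a bare value that ends at the next comma or closing brace:

```shell
# get_field NAME LINE: print the value of field NAME found in LINE.
# Works for string values ("summary":"Clear") and bare values
# ("apparentTemperature":-3.34). Returns 1 if the field is not found.
get_field() {
  local name=$1 line=$2
  # Quoted value: everything up to the next double quote
  local quoted="\"$name\":\"([^\"]*)\""
  # Bare value (number, true/false/null): up to the next comma or closing brace
  local bare="\"$name\":([^,}]*)"
  if [[ $line =~ $quoted ]] || [[ $line =~ $bare ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

DATA_LINE='{"latitude":59.433335,"longitude":24.750486,"timezone":"Europe/Tallinn","offset":2,"currently":{"time":1485880052,"summary":"Clear","icon":"clear-night","precipIntensity":0,"precipProbability":0,"temperature":0.76,"apparentTemperature":-3.34,"dewPoint":-0.13,"humidity":0.94,"windSpeed":3.99,"windBearing":262,"visibility":9.99,"cloudCover":0.11,"pressure":1017.72,"ozone":282.98}}'

summary=$(get_field summary "$DATA_LINE")
apparentTemperature=$(get_field apparentTemperature "$DATA_LINE")
echo "$summary $apparentTemperature"
```

Keeping the regexes in variables and using them unquoted on the right of =~ is the usual way to avoid quoting problems in [[ ]].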

This is fairly easy to understand if your skills are limited, like mine! Assuming your string is in a variable called $LINE:
summary=$(sed -e 's/.*summary":"//' -e 's/".*//' <<< "$LINE")
Then check:
echo "$summary"
Clear
That runs two sed commands, each introduced by -e. The first replaces everything up to and including summary":" with nothing, and the second replaces the first remaining double quote and everything after it with nothing.
Extract apparent temperature:
appTemp=$(sed -e 's/.*apparentTemperature"://' -e 's/,.*//' <<< "$LINE")
Then check:
echo "$appTemp"
-3.34
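One weakness of the two-command approach is that if the field is missing, the first command changes nothing and the second still cuts the line at its first double quote, so you get { instead of an empty result. A sketch of a variant, assuming the same $LINE variable, that uses sed -n with a capture group and the p flag, so sed prints something only when the field is actually found:

```shell
LINE='{"latitude":59.433335,"longitude":24.750486,"timezone":"Europe/Tallinn","offset":2,"currently":{"time":1485880052,"summary":"Clear","icon":"clear-night","precipIntensity":0,"precipProbability":0,"temperature":0.76,"apparentTemperature":-3.34,"dewPoint":-0.13,"humidity":0.94,"windSpeed":3.99,"windBearing":262,"visibility":9.99,"cloudCover":0.11,"pressure":1017.72,"ozone":282.98}}'

# -n: no automatic printing; p: print only if the substitution succeeded.
# The whole line is replaced by the captured group \1.
summary=$(sed -n 's/.*"summary":"\([^"]*\)".*/\1/p' <<< "$LINE")
appTemp=$(sed -n 's/.*"apparentTemperature":\([^,}]*\).*/\1/p' <<< "$LINE")
# A field that does not exist yields an empty string rather than garbage
missing=$(sed -n 's/.*"noSuchField":"\([^"]*\)".*/\1/p' <<< "$LINE")

echo "summary=$summary appTemp=$appTemp missing=$missing"
```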

As Aaron mentioned, a JSON parser like jq is the right tool for this, but since the question was about grep, let's see one way to do it.
Assuming your API return value is in $json:
json='{"latitude":59.433335,"longitude":24.750486,"timezone":"Europe/Tallinn","offset":2,"currently":{"time":1485880052,"summary":"Clear","icon":"clear-night","precipIntensity":0,"precipProbability":0,"temperature":0.76,"apparentTemperature":-3.34,"dewPoint":-0.13,"humidity":0.94,"windSpeed":3.99,"windBearing":262,"visibility":9.99,"cloudCover":0.11,"pressure":1017.72,"ozone":282.98}}'
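The patterns in parentheses are lookbehind (?<=...) and lookahead (?=...) assertions used for context matching. They need grep's -P (Perl regex) option, and the text they match is not included in the output. The .*? between them is non-greedy, so it stops at the first following ", (or ,) rather than the last one.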
summary=$(<<< "$json" grep -oP '(?<="summary":").*?(?=",)')
apparentTemperature=$(<<< "$json" grep -oP '(?<="apparentTemperature":).*?(?=,)')
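-P is a GNU grep option and is not available in every grep implementation. A sketch of a more portable alternative, assuming the same $json variable: use plain grep -o to pull out the whole "key":value pair, then cut to keep only the value:

```shell
json='{"latitude":59.433335,"longitude":24.750486,"timezone":"Europe/Tallinn","offset":2,"currently":{"time":1485880052,"summary":"Clear","icon":"clear-night","precipIntensity":0,"precipProbability":0,"temperature":0.76,"apparentTemperature":-3.34,"dewPoint":-0.13,"humidity":0.94,"windSpeed":3.99,"windBearing":262,"visibility":9.99,"cloudCover":0.11,"pressure":1017.72,"ozone":282.98}}'

# grep -o prints "summary":"Clear"; split on double quotes, field 4 is Clear
summary=$(grep -o '"summary":"[^"]*"' <<< "$json" | cut -d'"' -f4)
# grep -o prints "apparentTemperature":-3.34; split on the colon, field 2 is the value
apparentTemperature=$(grep -o '"apparentTemperature":[^,}]*' <<< "$json" | cut -d: -f2)

echo "$summary $apparentTemperature"
```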

Related

Replacing variable section in string

I have a long list of strings (actually file names) in $var, looking like this:
p1035sEthinylestradiol913
p1035sTAbs872
p946sCarbaryl1182
Now I wish to replace the substring between the first s and the first digit [1-9] with R. Hence the output should look like:
p1035sR913
p1035sR872
p946sR1182
I was trying something like this:
echo ${var/s*[1-9]/R}
But this, of course, also removes the first digit after the s match, and that is not what I want. Can someone help me out here? Thanks a lot in advance!
To keep the matched digit, you could switch from parameter expansions like ${var/s*[1-9]/R} to matching with [[ string =~ pattern ]]. The matched digit could then be retrieved from BASH_REMATCH. However, you would still have to do this for every entry in your list.
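A sketch of that [[ =~ ]] approach, assuming the names are held in a bash array (names is an illustrative variable name). The regex captures everything up to and including the first s, skips the non-digits that follow, and captures the rest starting at the first digit, so the digit is kept:

```shell
names=(p1035sEthinylestradiol913 p1035sTAbs872 p946sCarbaryl1182)
renamed=()
for name in "${names[@]}"; do
  # group 1: up to and including the first s
  # group 2: from the first digit after it to the end of the name
  if [[ $name =~ ^([^s]*s)[^0-9]*([0-9].*)$ ]]; then
    renamed+=("${BASH_REMATCH[1]}R${BASH_REMATCH[2]}")
  else
    renamed+=("$name")  # no s followed by a digit: leave unchanged
  fi
done
printf '%s\n' "${renamed[@]}"
```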
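With sed you automatically change every line, and keeping the digit is easy. Use [^0-9]* rather than .* between the s and the digit: .* is greedy, so s.*([0-9]) would run to the last digit on the line and turn p1035sEthinylestradiol913 into p1035sR3.
sed -E 's/s[^0-9]*([0-9])/sR\1/' file
or
someCommand | sed -E 's/s[^0-9]*([0-9])/sR\1/'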
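The original parameter-expansion attempt can also be fixed without sed or =~ by swapping in bash's extended globbing (shopt -s extglob). The pattern s*([!0-9]) means an s followed by any run of non-digits; ${var/pattern/replacement} replaces the longest match at the first position where the pattern matches, and since the pattern cannot match a digit, the digit stays. A sketch, assuming bash with extglob enabled before the expansion runs (names is an illustrative array):

```shell
shopt -s extglob  # enables *(...) and friends in patterns

names=(p1035sEthinylestradiol913 p1035sTAbs872 p946sCarbaryl1182)
renamed=()
for var in "${names[@]}"; do
  # s*([!0-9]): an s followed by zero or more non-digits; the digit is not matched
  renamed+=("${var/s*([!0-9])/sR}")
done
printf '%s\n' "${renamed[@]}"
```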

bash script on specific URL string manipulation

I need to manipulate a string (URL) whose length I don't know.
The string is something like
https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring
I basically need a regular expression which returns this:
https://x.xx.xxx.xxx/keyword/restofstring
where x is the current IP, which can vary every time, and I don't know the number of dontcares.
I actually have no idea how to do it; I've spent 2 hours on the problem but didn't find a solution.
Thanks!
You can use sed as follows:
sed -E 's=(https://[^/]*).*(/keyword/.*)=\1\2='
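s stands for substitute and has the form s=search pattern=replacement pattern=. Here = is used as the delimiter instead of the usual /, so the slashes in the URL need no escaping.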
The search pattern is a regex in which we grouped (...) the parts you want to extract.
The replacement pattern accesses these groups with \1 and \2.
You can feed a file or stdin to sed and it will process the input line by line.
If you have a string variable and use bash, zsh, or something similar, you can also feed that variable directly into stdin using <<<.
Example usage for bash:
input='https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring'
output="$(sed -E 's=(https://[^/]*).*(/keyword/.*)=\1\2=' <<< "$input")"
echo "$output" # prints https://x.xx.xxx.xxx/keyword/restofstring
echo "https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring" | sed "s/dontcare[0-9]\+\///g"
sed is used to manipulate text. dontcare[0-9]\+\/ is the escaped form of the regular expression dontcare[0-9]+/, which matches the word "dontcare" followed by one or more digits, followed by the / character (\+ is a GNU extension to basic regular expressions). This assumes the N in dontcareN stands for a number; a literal dontcareN is not matched.
sed's substitute command has the form s/find/replace/g, where the g flag makes sed replace every match on the line rather than only the first one.
Note that this assumes there are no dontcare components in the rest of the string. If there are, Socowi's answer works better.
You could also use read with a / value for $IFS to parse out the trash.
$: IFS=/ read proto trash url trash trash trash keyword rest <<< "https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring"
$: echo "$proto//$url/$keyword/$rest"
https://x.xx.xxx.xxx/keyword/restofstring
This is more general when the dontcare... values are not known, predictable strings, although it still assumes there are exactly three of them.
This one is pure bash, though I like Socowi's answer better.
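If the number of dontcare components varies, a fixed list of trash variables no longer lines up. A pure-bash sketch using parameter expansion instead, assuming (as in the sed answer) that the literal text /keyword/ is known; the variable names are illustrative:

```shell
input='https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring'

scheme=${input%%://*}           # drop the longest suffix starting at :// -> https
rest=${input#*://}              # drop the scheme -> x.xx.xxx.xxx/dontcare1/...
host=${rest%%/*}                # keep everything before the first / -> x.xx.xxx.xxx
path_rest=${input#*/keyword/}   # drop everything up to the first /keyword/ -> restofstring

output="$scheme://$host/keyword/$path_rest"
echo "$output"
```

This works for any number of dontcare components, because only the host and the part after /keyword/ are taken from the input.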
Here's a sed variation which picks out the host part and the last two components from the path.
url='http://example.com:1234/ick/poo/bar/quux/fnord'
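newurl=$(echo "$url" | sed 's%\(https*://[^/?]*[^?/]\)[^ <>'"'"'"]*\(/[^/ <>'"'"'"]*/[^/ <>'"'"'"]*\)%\1\2%')
The general form is sed 's%pattern%replacement%' (% is the delimiter, so slashes need no escaping). The pattern matches through the end of the host name part (captured into the first set of backslashed parentheses), then skips everything up to the penultimate slash, then captures the remainder of the URL starting at that slash; the replacement simply recalls the two captured groups without the skipped part between them. The '"'"' sequences put a literal single quote into the single-quoted script, so the bracket expressions exclude spaces, angle brackets and both kinds of quotes. For the example above, the result is http://example.com:1234/quux/fnord.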

Source grep expression from array

I am passing input to grep from a previously declared variable that contains multiple lines. My goal is to extract only certain lines.
As I increase the argument count in grep, the readability goes down.
var1="
_id=1234
_type=document
date_found=988657890
whateverelse=1211121212"
echo "$var1"
_id=1234
_type=document
date_found=988657890
whateverelse=1211121212
grep -e 'file1\|^_id=\|_type\|date_found\|whateverelse' <<< $var1
_id=1234
_type=document
date_found=988657890
whateverelse=1211121212
My idea was to pass the parameters from an array, which should increase readability:
declare -a grep_array=(
"^_id=\|"
"_type\|"
"date_found\|"
"whateverelse"
)
echo ${grep_array[@]}
^_id=\| _type\| date_found\| whateverelse
grep -e '${grep_array[@]}' <<<$var1
---- no results
How can I pass multiple OR conditions to grep from somewhere else, instead of writing them all on one line?
As I add more arguments, readability and manageability go down.
Your idea is right, but there are a couple of issues in the logic. First, in your command the expansion is inside single quotes, so it does not happen at all: grep searches for the literal text ${grep_array[@]}. Second, even with double quotes, the expansion ${array[@]} produces each array element as a separate word. While you wanted to pass a single regexp string to grep, the shell expands the array into its elements, and grep ends up being called as
grep -e '^_id=\|' '_type\|' 'date_found\|' whateverelse
which means only the first string is used as a regexp; the remaining ones are treated as names of files to search.
So, to let grep treat the whole array content as a single string, use the ${array[*]} expansion. This type of expansion joins the elements with the first character of IFS, so with the default IFS you get a space between the words. The syntax below sets IFS to the empty string in a subshell and prints the joined array content:
grep -e "$(IFS=; printf '%s' "${grep_array[*]}")" <<<"$var1"
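Another way to keep the list readable is to drop the \| separators and give grep each pattern through its own -e option; grep prints a line if it matches any of them. A sketch using the same patterns as the question (the names patterns and grep_args are illustrative):

```shell
var1="
_id=1234
_type=document
date_found=988657890
whateverelse=1211121212"

# One pattern per element, no \| needed
patterns=(
  '^_id='
  '_type'
  'date_found'
  'whateverelse'
)

# Build the argument list: -e '^_id=' -e '_type' -e ...
grep_args=()
for p in "${patterns[@]}"; do
  grep_args+=(-e "$p")
done

result=$(grep "${grep_args[@]}" <<< "$var1")
printf '%s\n' "$result"
```

grep -f, which reads the patterns from a file with one pattern per line, works the same way if you prefer to keep the list in a separate file.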

Extracting snmpdump values (with an exact MIB) from a shell script

I have an SNMP dump:
1.3.6.1.2.1.1.2.0|5|1.3.6.1.4.1.9.1.1178
1.3.6.1.2.1.1.3.0|7|1881685367
1.3.6.1.2.1.1.4.0|6|""
1.3.6.1.2.1.1.5.0|6|"hgfdhg-4365.gfhfg.dfg.com"
1.3.6.1.2.1.1.6.0|6|""
1.3.6.1.2.1.1.7.0|2|6
1.3.6.1.2.1.1.8.0|7|0
1.3.6.1.2.1.1.9.1.2.1|5|1.3.6.1.4.1.9.7.129
1.3.6.1.2.1.1.9.1.2.2|5|1.3.6.1.4.1.9.7.115
I need to grep all data in the first line after 1.3.6.1.2.1.1.2.0|5|, but without including that prefix in grep's output. So I should get 1.3.6.1.4.1.9.1.1178 from grep. I've tried to use this regex:
\b1.3.6.1.2.1.1.2.0\|5\|\s*([^\n\r]*)
But without any success. If a regular expression, or grep, is in fact the right tool, can you help me find the right regex? Otherwise, what tools should I consider instead?
With GNU grep built with PCRE support, you can use Perl's \K escape, which discards everything matched so far from the reported match:
grep -Po "1\.3\.6\.1\.2\.1\.1\.2\.0\|5\|\K.*"
-P enables Perl's regex mode and -o switches output to matched parts rather than whole lines.
I had to escape the characters that have special meaning in Perl regexes, but this can be avoided, as 123 suggests, by enclosing the characters to interpret literally between \Q and \E:
grep -Po "\Q1.3.6.1.2.1.1.2.0|5|\E\K.*"
I would usually solve this with sed as follows:
sed -n 's/1\.3\.6\.1\.2\.1\.1\.2\.0|5|\(.*\)/\1/p'
The -n flag disables implicit output and the search and replace command will remove the searched prefix from the line, leaving the relevant part to be printed.
The characters that have special meaning in GNU Basic Regular Expressions (BRE) must be escaped, which in this case is only the dot (.). Also note that the grouping tokens are \( and \) rather than the usual ( and ).
An alternate way to do this is in native shell, without any regexes at all. Consider:
prefix='1.3.6.1.2.1.1.2.0|5|'
while IFS= read -r line; do
  [[ $line = "$prefix"* ]] && printf '%s\n' "${line#"$prefix"}"
done
If your original string is piped into the while read loop, the output is precisely 1.3.6.1.4.1.9.1.1178.
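Since the dump is already split into fields by |, awk is another tool worth considering: it can compare the first two fields as plain strings, so nothing needs escaping. A sketch, assuming the dump would normally come from a file or pipe; here it is fed from a here-document so the example is self-contained:

```shell
# -F'|' splits each line on the pipe character;
# $1 and $2 are compared as literal strings, not as regexes
result=$(awk -F'|' '$1 == "1.3.6.1.2.1.1.2.0" && $2 == "5" { print $3 }' <<'EOF'
1.3.6.1.2.1.1.2.0|5|1.3.6.1.4.1.9.1.1178
1.3.6.1.2.1.1.3.0|7|1881685367
1.3.6.1.2.1.1.4.0|6|""
1.3.6.1.2.1.1.5.0|6|"hgfdhg-4365.gfhfg.dfg.com"
1.3.6.1.2.1.1.6.0|6|""
1.3.6.1.2.1.1.7.0|2|6
1.3.6.1.2.1.1.8.0|7|0
1.3.6.1.2.1.1.9.1.2.1|5|1.3.6.1.4.1.9.7.129
1.3.6.1.2.1.1.9.1.2.2|5|1.3.6.1.4.1.9.7.115
EOF
)
echo "$result"
```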

grep a pattern and output non-matching part of line

I know it is possible to invert grep output with the -v flag. Is there a way to only output the non-matching part of the matched line? I ask because I would like to use the return code of grep (which sed won't have). Here's sort of what I've got:
tags=$(grep "^$PAT" >/dev/null 2>&1)
[ "$?" -eq 0 ] && echo $tags
You could use sed:
$ sed -n "/$PAT/s/$PAT//p" "$file"
The only problem is that sed returns an exit code of 0 as long as the sed program is valid, even if no line matches the pattern.
Explanation
The -n option tells sed not to print lines automatically; by default, sed prints every line of the file. Let's look at each part of the sed program /$PAT/s/$PAT//p, piece by piece:
/$PAT/: This address selects the lines that match $PAT, so the substitution command runs only on them. Otherwise, sed would try the substitution on all lines, even those where there is nothing to substitute.
/s/: This says you will be doing a substitution
/$PAT/: This is the pattern you will be substituting. It's $PAT. So, you're searching for lines that contain $PAT and then you're going to substitute the pattern for something.
//: This is what you're substituting for $PAT. It is null. Therefore, you're deleting $PAT from the line.
/p: This final p flag prints the line if a substitution was made.
Thus:
You tell sed not to print out the lines of the file as it processes them.
You're searching for all lines that contain $PAT.
On these lines, you're using the s command (substitution) to remove the pattern.
You're printing out the line once the pattern is removed from the line.
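If you want both the stripped text and a grep-like exit status from a single command, one option, swapped in here in place of sed, is awk: index() searches for a fixed string, so the pattern needs no escaping, and the END block sets the exit status. A minimal sketch; the function name strip_literal and the variable STRIP_PAT are made up for illustration, and the pattern is passed through the environment (ENVIRON) rather than awk -v, because -v would interpret backslash escapes in it:

```shell
# strip_literal PATTERN [FILE...]: print each line containing PATTERN with its
# first occurrence removed; exit 0 if any line matched, 1 otherwise (like grep)
strip_literal() {
  STRIP_PAT="$1" awk '
    (i = index($0, ENVIRON["STRIP_PAT"])) > 0 {
      print substr($0, 1, i - 1) substr($0, i + length(ENVIRON["STRIP_PAT"]))
      found = 1
    }
    END { exit !found }
  ' "${@:2}"
}

if tags=$(printf '%s\n' 'tags: a.b c' 'other line' | strip_literal 'tags: '); then
  echo "$tags"
fi
```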
How about using a combination of grep, sed and $PIPESTATUS to get the correct exit status?
$ echo Humans are not proud of their ancestors, and rarely invite them round to dinner | grep dinner | sed -n "/dinner/s/dinner//p"
Humans are not proud of their ancestors, and rarely invite them round to
$ echo "${PIPESTATUS[1]}"
0
The members of the PIPESTATUS array hold the exit status of each command in the most recent pipeline: ${PIPESTATUS[0]} holds the exit status of the first command, ${PIPESTATUS[1]} the exit status of the second command (grep here), and so on. The braces are required: without them, $PIPESTATUS[1] expands to the first element followed by the literal text [1], which prints 0[1].
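PIPESTATUS is overwritten by the next command you run, so if you need more than one element it is safest to copy the whole array right after the pipeline. A sketch (the array names status and nomatch are arbitrary); set -o pipefail, shown at the end, is the other common option when you only need to know whether any stage failed:

```shell
text='Humans are not proud of their ancestors, and rarely invite them round to dinner'

printf '%s\n' "$text" | grep dinner | sed -n 's/dinner//p'
status=("${PIPESTATUS[@]}")   # copy immediately: printf, grep, sed
echo "printf: ${status[0]}, grep: ${status[1]}, sed: ${status[2]}"

printf '%s\n' 'no match here' | grep dinner | sed -n 's/dinner//p'
nomatch=("${PIPESTATUS[@]}")  # grep returns 1, sed still returns 0

# With pipefail, the pipeline's status is that of the last command that failed
set -o pipefail
if ! printf '%s\n' 'no match here' | grep dinner | sed -n 's/dinner//p'; then
  echo "no line contained dinner"
fi
set +o pipefail
```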
Your $tags will never have a value because you send grep's output to /dev/null. Apart from that little problem, there is no input to grep.
echo hello | grep -q "^he"
ret=$?
if [ $ret -eq 0 ]; then
  echo there is he in hello
fi
A successful return code is 0.
...here is one take on your 'problem':
pat="most of "
data="The apples are ripe. I will use most of them for jam."
echo "$data" | grep -q "$pat"
ret=$?
[ $ret -eq 0 ] && echo "$data" | sed "s/$pat//"
The apples are ripe. I will use them for jam.
... the exact same thing?
echo The apples are ripe. I will use most of them for jam. | sed 's/most of //'
It seems to me you have confused the basic concepts. What are you trying to do anyway?
I am going to answer the title of the question directly instead of considering the detail of the question itself:
"grep a pattern and output non-matching part of line"
The title of this question is important to me because the pattern I am searching for contains characters that sed assigns special meaning to. I want to use grep because I can use -F or --fixed-strings to make grep interpret the pattern literally. Unfortunately, sed has no literal option, but both grep and bash can match patterns without treating any characters as special.
Note: In my opinion, backslash-escaping the special characters in a pattern makes the code look complex and is unreliable, because it is difficult to test. Using tools that are designed to search for literal text leaves me with a comfortable 'that will work' feeling without having to think about POSIX.
I used both grep and bash to produce the result because bash is slow, and running fast grep first reduces a large input to a small output. This code searches for the literal twice: once with grep to quickly extract matching lines, and once with =~ to remove the match itself from each line.
while IFS= read -r || [[ -n "$REPLY" ]]; do
  if [[ "$REPLY" =~ (.*)("$LITERAL_PATTERN")(.*) ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[3]}"
  else
    printf 'NOT-REFOUND\n' >&2  # should never happen
    exit 1
  fi
done < <(grep -F -- "$LITERAL_PATTERN" < "$INPUT_FILE")
Explanation:
IFS= Prefixing read with an assignment to IFS changes the input field separator for that read command only. Setting IFS to the empty string makes read accept each line with all leading and trailing spaces and tabs kept literally until the end of the line (the default IFS is space, tab, newline).
-r Tells read to accept backslashes in the input stream literally instead of considering them as the start of an escape sequence.
$REPLY Is created by read to store characters from the input stream. The newline at the end of each line will NOT be in $REPLY.
|| [[ -n "$REPLY" ]] The logical OR makes the while loop also accept a final line that is not newline-terminated. It is not strictly needed here, because grep always ends every matching line with a newline. But I habitually use it in my read loops, because without it, characters between the last newline and the end of the file are ignored: read returns failure at end of file even though it did read content.
=~ (.*)("$LITERAL_PATTERN")(.*) ]] Is a standard bash regex test, but anything in quotes is taken literally. If I wanted =~ to honor the regex characters contained in $LITERAL_PATTERN, I would need to remove the double quotes.
"${BASH_REMATCH[@]}" Is the array created by [[ =~ ]], where [0] is the entire match and [N] is the text matched by the Nth set of parentheses.
Note: I do not like to redirect stdin into a while loop, because it is easy to get wrong and difficult to see what is happening later. I usually create a function for this type of operation, which behaves like a typical filter and expects file name parameters or a redirection of stdin when it is called.
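Following that suggestion to wrap the operation in a function, here is a sketch that keeps grep -F for the fast literal search and for its exit status, but removes the match with bash's ${line/"$pattern"/} expansion instead of =~. Quoting $pattern inside the expansion makes bash treat it as a literal string rather than a glob. The function name strip_fixed is hypothetical; it takes the pattern first and then optional file names, reading stdin when none are given:

```shell
# strip_fixed PATTERN [FILE...]
# Prints every line containing PATTERN (taken literally) with its first
# occurrence removed. Returns grep's status: 0 = match, 1 = none, 2 = error.
strip_fixed() {
  local pattern=$1 matches line
  shift
  # grep -F: no regex interpretation; -- protects patterns starting with '-'.
  # "|| return" passes grep's exit status straight to the caller.
  matches=$(grep -F -- "$pattern" "$@") || return
  while IFS= read -r line; do
    # Quoted pattern: literal match, not a glob; removes the first occurrence
    printf '%s\n' "${line/"$pattern"/}"
  done <<< "$matches"
}

printf '%s\n' 'cost: $5 [approx]' 'nothing here' | strip_fixed '$5 ['
```

For example, if strip_fixed "$LITERAL_PATTERN" "$INPUT_FILE"; then ... fi tests the result the same way you would test grep.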

Resources