grep a pattern and output non-matching part of line - bash

I know it is possible to invert grep output with the -v flag. Is there a way to only output the non-matching part of the matched line? I ask because I would like to use the return code of grep (which sed won't have). Here's sort of what I've got:
tags=$(grep "^$PAT" >/dev/null 2>&1)
[ "$?" -eq 0 ] && echo $tags

You could use sed:
$ sed -n "/$PAT/s/$PAT//p" $file
The only problem is that it'll return an exit code of 0 as long as the pattern is good, even if the pattern can't be found.
Explanation
The -n parameter tells sed not to print out any lines; sed's default is to print every line of the file. Let's look at each part of the sed program /$PAT/s/$PAT//p:
/$PAT/: This is an address: the command that follows runs only on lines that match pattern $PAT. Without it, sed would run the substitution on every line.
s: This says you will be doing a substitution.
/$PAT/: This is the pattern you will be substituting away. It's $PAT. So, you're searching for lines that contain $PAT and then you're going to substitute something for the pattern.
//: This is what you're substituting for $PAT. It is empty. Therefore, you're deleting $PAT from the line.
p: This final p is a flag on the s command that says to print the line if a substitution was made.
Thus:
You tell sed not to print out the lines of the file as it processes them.
You're searching for all lines that contain $PAT.
On these lines, you're using the s command (substitution) to remove the pattern.
You're printing out the line once the pattern is removed from the line.
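As a quick sanity check, here is the same command run on throwaway input (the pattern bar and the sample lines are mine, for illustration):

```shell
# Only the line containing the pattern is printed, with the pattern removed.
result=$(printf 'foo bar baz\nsecond line\n' | sed -n '/bar/s/bar//p')
echo "$result"    # -> "foo  baz" (note the doubled space where "bar" was)
```

The non-matching second line is suppressed entirely by -n, which is exactly the "output only the non-matching part of the matched line" behaviour asked for.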

How about using a combination of grep, sed and $PIPESTATUS to get the correct exit-status?
$ echo "Humans are not proud of their ancestors, and rarely invite them round to dinner" \
    | grep dinner | sed -n "/dinner/s/dinner//p"
Humans are not proud of their ancestors, and rarely invite them round to
$ echo "${PIPESTATUS[1]}"
0
The members of the PIPESTATUS array hold the exit status of each respective command executed in a pipeline: ${PIPESTATUS[0]} holds the exit status of the first command in the pipe, ${PIPESTATUS[1]} the exit status of the second command, and so on. Note that PIPESTATUS is overwritten by every command, so copy it immediately if you need more than one element.
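A small self-contained illustration (the sample strings here are arbitrary), saving the array right away since the very next command resets it:

```shell
printf 'no dinner here\n' | grep -q banquet | cat > /dev/null
status=("${PIPESTATUS[@]}")    # copy immediately: every command resets PIPESTATUS
echo "printf: ${status[0]}, grep: ${status[1]}, cat: ${status[2]}"
# -> printf: 0, grep: 1, cat: 0   (grep found no match, so its status is 1)
```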

Your $tags will never have a value because you redirect grep's output to /dev/null. Aside from that little problem, there is no input to grep either.
echo hello |grep "^he" -q ;
ret=$? ;
if [ $ret -eq 0 ];
then
echo there is he in hello;
fi
a successful return code is 0.
...here is 1 take at your 'problem':
pat="most of ";
data="The apples are ripe. I will use most of them for jam.";
echo $data |grep "$pat" -q;
ret=$?;
[ $ret -eq 0 ] && echo $data |sed "s/$pat//"
The apples are ripe. I will use them for jam.
... exact same thing?:
echo The apples are ripe. I will use most of them for jam. | sed ' s/most\ of\ //'
It seems to me you have confused the basic concepts. What are you trying to do anyway?

I am going to answer the title of the question directly instead of considering the detail of the question itself:
"grep a pattern and output non-matching part of line"
The title to this question is important to me because the pattern I am searching for contains characters that sed will assign special meaning to. I want to use grep because I can use -F or --fixed-strings to cause grep to interpret the pattern literally. Unfortunately, sed has no literal option, but both grep and bash have the ability to interpret patterns without considering any special characters.
Note: In my opinion, trying to backslash or escape special characters in a pattern appears complex in code and is unreliable because it is difficult to test. Using tools which are designed to search for literal text leaves me with a comfortable 'that will work' feeling without considering POSIX.
I used both grep and bash to produce the result because bash is slow and my use of fast grep creates a small output from a large input. This code searches for the literal twice, once during grep to quickly extract matching lines and once during =~ to remove the match itself from each line.
while IFS= read -r || [[ -n "$REPLY" ]]; do
if [[ "$REPLY" =~ (.*)("$LITERAL_PATTERN")(.*) ]]; then
printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[3]}"
else
printf '%s\n' "NOT-REFOUND" >&2 # should never happen
exit 1
fi
done < <(grep -F "$LITERAL_PATTERN" < "$INPUT_FILE")
Explanation:
IFS= Reassigning the input field separator is a special prefix for a read statement. Assigning IFS to the empty string causes read to accept each line with all spaces and tabs literally until end of line (assuming IFS is default space-tab-newline).
-r Tells read to accept backslashes in the input stream literally instead of considering them as the start of an escape sequence.
$REPLY Is created by read to store characters from the input stream. The newline at the end of each line will NOT be in $REPLY.
|| [[ -n "$REPLY" ]] The logical or causes the while loop to accept input that is not newline terminated. It is not strictly needed here, because grep newline-terminates every match it prints. But I habitually use it in my read loops: without it, any characters between the last newline and the end of file would be ignored, because read returns failure on that final unterminated fragment even though it has successfully read content.
=~ (.*)("$LITERAL_PATTERN")(.*) ]] Is a standard bash regex test, but anything in double quotes is taken as a literal. If I wanted =~ to honor the regex characters contained in $LITERAL_PATTERN, I would need to remove the double quotes.
"${BASH_REMATCH[@]}" Is created by [[ =~ ]], where [0] is the entire match and [N] is the contents of the match in the Nth set of parentheses.
Note: I do not like to reassign stdin to a while loop because it is easy to error and difficult to see what is happening later. I usually create a function for this type of operation which acts typically and expects file_name parameters or reassignment of stdin during the call.
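A compact demonstration of the quoted-literal =~ trick, using a made-up pattern full of regex metacharacters (the variable values here are mine, chosen only to show the behaviour):

```shell
LITERAL_PATTERN='1.5*(a+b)'                 # metacharacters, meant literally
REPLY='cost is 1.5*(a+b) dollars'
if [[ "$REPLY" =~ (.*)("$LITERAL_PATTERN")(.*) ]]; then
    # Group 2 is the literal match; groups 1 and 3 are everything around it.
    echo "${BASH_REMATCH[1]}${BASH_REMATCH[3]}"    # -> "cost is  dollars"
fi
```

Unquoted, the same pattern would be treated as a regex, and `*` and `(...)` would change its meaning entirely.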


Passing regular expression as parameter [duplicate]

I am trying to pass a regular expression as a parameter. What should I fix in my code?
My goal is to send find and the regular expression string, then use grep on the parameter so I can do whatever I want with what grep finds (which is print the count of occurrences).
This is what I send:
$ ./lab12.sh find [Gg]reen
Here's my bash code:
if [[ "$1" == "find" ]]
then
declare -i cnt=0
for file in /tmp/beatles/*.txt ; do
if [[ grep -e $2 ]] //problem is here...
then
((cnt=cnt+1))
fi
done
echo "$cnt songs contain the pattern "$2""
fi
The if statement takes a command; [[ is one command, and grep is another. Writing [[ grep ... ]] is essentially as wrong as writing vim grep, or cat grep, etc. Just use:
if grep -q -e "$2" "$file"
then
...
instead.
The -q switch to grep disables output and sets the exit status to 0 (success) when the pattern is matched, and 1 (failure) otherwise; the if statement will only execute the then block if the command succeeded.
Using -q also allows grep to exit as soon as it finds the first matching line.
And as always, remember to wrap your parameter expansions in double quotes, to avoid pathname expansion and word splitting.
Note that square brackets [...] will be interpreted by your calling shell, and you should escape them, or wrap the whole pattern in quotes.
It's generally recommended to use single quotes, as the only character that is special inside them is another single quote.
$ ./lab12.sh find '[Gg]reen'
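Putting the advice together, here is the question's loop reworked as a small function (the function name and the directory parameter are mine, for illustration) so that grep actually has a file to search and everything is quoted:

```shell
#!/bin/bash
# count_matching_files DIR PATTERN: how many .txt files in DIR contain PATTERN?
count_matching_files() {
    local dir=$1 pattern=$2
    local -i cnt=0
    for file in "$dir"/*.txt; do
        if grep -q -e "$pattern" "$file"; then
            cnt+=1
        fi
    done
    echo "$cnt"
}
```

Called as count_matching_files /tmp/beatles '[Gg]reen', it prints the number of songs containing the pattern.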

POSIX/Bash pad variable with trailing newlines

I have a variable with some lines in it and I would like to pad it with a number of newlines defined in another variable. However it seems that the subshell may be stripping the trailing newlines. I cannot just use '\n' with echo -e as the lines may already contain escaped chars which need to be printed as is.
I have found I can print an arbitrary number of newlines using this.
n=5
yes '' | sed -n "1,${n}p;${n}q"
But if I run this in a subshell to store it in the variable, the subshell appears to strip the trailing newlines.
I can approximate the functionality but it's clumsy and due to the way I am using it I would much rather be able to just call echo "$var" or even use $var itself for things like string concatenation. This approximation runs into the same issue with subshells as soon as the last (filler) line of the variable is removed.
This is my approximation
n=5
var="test"
#I could also just set n=6
cmd="1,$((n+1))p;$((n+1))q"
var="$var$(yes '' | sed -n $cmd; echo .)"
#Now I can use it with
echo "$var" | head -n -1
Essentially I need a good way of appending a number of newlines to a variable which can then be printed with echo.
I would like to keep this POSIX compliant if at all possible but at this stage a bash solution would also be acceptable. I am also using this as part of a tool for which I have set a challenge of minimizing line and character count while maintaining readability. But I can work that out once I have a workable solution
Command substitutions with either $( ) or backticks will trim trailing newlines. So don't use them; use the shell's built-in string manipulation:
n=5
var="test"
while [ "$n" -gt 0 ]; do
var="$var
"
n=$((n-1))
done
Note that there must be nothing after the var="$var (before the newline), and nothing before the " on the next line (no indentation!).
A sequence of n newlines:
printf -v spaces "%*s" $n ""
newlines=${spaces// /$'\n'}
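You can verify that this appends exactly n newline characters by counting bytes on the result (n=5 and var="test" are just the example values from the question):

```shell
n=5
var="test"
printf -v spaces "%*s" "$n" ""    # a run of n spaces
newlines=${spaces// /$'\n'}       # swap each space for a newline
var="$var$newlines"
printf %s "$var" | wc -c          # 9: four bytes of "test" plus five newlines
```

Because the variable is only ever expanded (never passed through a command substitution again), the trailing newlines survive and echo "$var" prints them.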

How to grep from a single line

I'm using a weather API that outputs all data in a single line. How do I use grep to get the values for "summary" and "apparentTemperature"? My command of regular expressions is basically nonexistent, but I'm ready to learn.
{"latitude":59.433335,"longitude":24.750486,"timezone":"Europe/Tallinn","offset":2,"currently":{"time":1485880052,"summary":"Clear","icon":"clear-night","precipIntensity":0,"precipProbability":0,"temperature":0.76,"apparentTemperature":-3.34,"dewPoint":-0.13,"humidity":0.94,"windSpeed":3.99,"windBearing":262,"visibility":9.99,"cloudCover":0.11,"pressure":1017.72,"ozone":282.98}}
Thank you!
How do I use grep to get the values for "summary" and "apparentTemperature"?
You use grep's -o flag, which makes it output only the matched part.
Since you don't know much about regex, I suggest you instead learn to use a JSON parser, which would be more appropriate for this task.
For example with jq, the following command would extract the current summary :
<whatever is your JSON source> | jq '.currently.summary'
Assume your single-line data is contained in a variable called DATA_LINE.
If you are certain the field is only present once in the whole line, you could do something like this in Bash:
if
[[ "$DATA_LINE" =~ \"summary\":\"([^\"]*)\" ]]
then
summary="${BASH_REMATCH[1]}"
echo "Summary field is : $summary"
else
echo "Summary field not found"
fi
You would have to do that once for each field, unless you build a more complex matching expression that assumes fields are in a specific order.
As a note, the matching expression \"summary\":\"([^\"]*)\" finds the first occurrence in the data of a substring consisting of:
"summary":" (double quotes included), followed by
([^\"]*), a sub-expression matching a sequence of zero or more characters other than a double quote: it is in parentheses to make it available later as an element of the BASH_REMATCH array, because this is the value you want to extract,
and finally a closing double quote; this is not strictly necessary, but it protects against reading from a truncated data line.
For apparentTemperature the code will be a bit different because the field does not have the same format.
if
[[ "$DATA_LINE" =~ \"apparentTemperature\":([^,]*), ]]
then
apparentTemperature="${BASH_REMATCH[1]}"
echo "Apparent temperature field is : $apparentTemperature"
else
echo "Apparent temperature field not found"
fi
This is fairly easily understood if your skills are limited - like mine! Assuming your string is in a variable called $LINE:
summary=$(sed -e 's/.*summary":"//' -e 's/".*//' <<< "$LINE")
Then check:
echo $summary
Clear
That executes two sed commands (each introduced with -e). The first substitutes everything up to and including summary":" with nothing, and the second substitutes the first remaining double quote and everything after it with nothing.
Extract apparent temperature:
appTemp=$(sed -e 's/.*apparentTemperature"://' -e 's/,.*//' <<< "$LINE")
Then check:
echo $appTemp
-3.34
As Aaron mentioned, a JSON parser like jq is the right tool for this, but since the question was about grep, let's see one way to do it.
Assuming your API return value is in $json:
json='{"latitude":59.433335,"longitude":24.750486,"timezone":"Europe/Tallinn","offset":2,"currently":{"time":1485880052,"summary":"Clear","icon":"clear-night","precipIntensity":0,"precipProbability":0,"temperature":0.76,"apparentTemperature":-3.34,"dewPoint":-0.13,"humidity":0.94,"windSpeed":3.99,"windBearing":262,"visibility":9.99,"cloudCover":0.11,"pressure":1017.72,"ozone":282.98}}'
The patterns you see in parentheses are lookbehind and lookahead assertions for context matching. They require grep's -P Perl-regex option and are not captured in the output.
summary=$(<<< "$json" grep -oP '(?<="summary":").*?(?=",)')
apparentTemperature=$(<<< "$json" grep -oP '(?<="apparentTemperature":).*?(?=,)')

how to count the number of lines in a variable in a shell script

Having some trouble here. I want to capture output from ls command into variable. Then later use that variable and count the number of lines in it. I've tried a few variations
This works, but then if there are NO .txt files, it says the count is 1:
testVar=`ls -1 *.txt`
count=`wc -l <<< $testVar`
echo '$count'
This works for when there are no .txt files, but the count comes up short by 1 when there are .txt files:
testVar=`ls -1 *.txt`
count=`printf '$testVar' | wc -l`
echo '$count'
This variation also says the count is 1 when NO .txt files exist:
testVar=`ls -1 *.txt`
count=`echo '$testVar' | wc -l`
echo '$count'
Edit: I should mention this is korn shell.
The correct approach is to use an array.
# Use ~(N) so that if the match fails, the array is empty instead
# of containing the pattern itself as the single entry.
testVar=( ~(N)*.txt )
count=${#testVar[@]}
This little question actually includes the result of three standard shell gotchas (both bash and korn shell):
Here-strings (<<<...) have a newline added to them if they don't end with a newline. That makes it impossible to send a completely empty input to a command with a here-string.
All trailing newlines are removed from the output of a command used in command substitution (backticks or, preferably, $(cmd)). So you have no way to know how many blank lines were at the end of the output.
(Not really a shell gotcha, but it comes up a lot). wc -l counts the number of newline characters, not the number of lines. So the last "line" is not counted if it is not terminated with a newline. (A non-empty file which does not end with a newline character is not a Posix-conformant text file. So weird results like this are not unexpected.)
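All three gotchas can be seen in a few lines:

```shell
v=$(printf 'a\n\n\n')       # gotcha 2: command substitution strips ALL trailing newlines
echo "[$v]"                 # -> [a]
printf %s "$v" | wc -l      # -> 0: no newline characters survive
wc -l <<< "$v"              # -> 1: gotcha 1, the here-string adds one newline back
printf 'a\nb' | wc -l       # -> 1: gotcha 3, the unterminated "b" line is not counted
```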
So, when you do:
var=$(cmd)
utility <<<"$var"
The command substitution in the first line removes all trailing newlines, and then the here-string expansion in the second line puts exactly one trailing newline back. That converts an empty output to a single blank line, and otherwise removes blank lines from the end of the output.
So if utility is wc -l, then you will get the correct count unless the output was empty, in which case it will be 1 instead of 0.
On the other hand, with
var=$(cmd)
printf %s "$var" | utility
The trailing newline(s) are removed, as before, by the command substitution, so the printf leaves the last line (if any) unterminated. Now if utility is wc -l, you'll end up with 0 if the output was empty, but for non-empty output the count will not include the last line.
One possible shell-independent work-around is to use the second option, but with grep '' as a filter:
var=$(cmd)
printf %s "${var}" | grep '' | utility
The empty pattern '' will match every line, and grep always terminates every line of output. (Of course, this still won't count blank lines at the end of the output.)
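Wrapped up as a helper (the function name is mine), this counts lines the way people usually expect: empty input is 0 lines, and an unterminated final line still counts as a line:

```shell
count_lines() {
    # grep '' matches every line and newline-terminates its output,
    # so wc -l also counts a final line that lacked its own newline.
    printf %s "$1" | grep '' | wc -l
}
count_lines ''          # -> 0
count_lines 'a'         # -> 1
count_lines $'a\nb'     # -> 2
```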
Having said all that, it is always a bad idea to try to parse the output of ls, even just to count the number of files. (A filename might include a newline character, for example.) So it would be better to use a glob expansion combined with some shell-specific way of counting the number of objects in the glob expansion (and some other shell-specific way of detecting when no file matches the glob).
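In bash, the glob-based equivalent of the ksh ~(N) pattern is nullglob, which makes a non-matching glob expand to an empty list instead of to itself:

```shell
shopt -s nullglob       # unmatched globs expand to nothing
files=(*.txt)
echo "${#files[@]}"     # the number of .txt files in the current directory (0 if none)
```

This sidesteps parsing ls entirely, so filenames containing newlines or spaces are counted correctly.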
I was going to suggest this, which is a construct I've used in bash:
f=($(</path/to/file))
echo ${#f[@]}
To handle multiple files, you'd just .. add files.
f=($(</path/to/file))
f+=($(</path/to/otherfile))
or
f=($(</path/to/file) $(</path/to/otherfile))
To handle lots of files, you could loop:
f=()
for file in *.txt; do
f+=($(<$file))
done
Then I saw chepner's response, which I gather is more korn-y than mine.
NOTE: loops are better than parsing ls.
You can also use like this:
#!/bin/bash
testVar=`ls -1 *.txt`
if [ -z "$testVar" ]; then
# Empty
count=0
else
# Not Empty
count=`wc -l <<< "$testVar"`
fi
echo "Count : $count"

Bash not recognising strings as equal

I have two variables line and sorted, which are both space-delimited strings of numbers. Part of my script depends on checking whether these two strings are equal:
if [[ $sorted == $line ]]
then
echo "test"
fi
When running this I get no output. A visual check using echo $sorted and echo $line gives two seemingly identical outputs.
I thought that this may be due to either of the two outputs having an extra white space character at the end, so I decided to check whether removing spaces from the strings removed the problem:
test1=`echo $sorted | tr -d ' '`
test2=`echo $line | tr -d ' '`
Subsequently performing:
if [[ "$test1" == "$test2" ]]
then
echo "test"
fi
Did give the desired "test" output. However, when comparing the number of characters of both variables using wc, the output is the same for both variables. Furthermore, checking the number of white space characters in line and sorted with echo <variable> | grep -o "\s" | wc -l also gives the same output for both variables.
My question is what could be causing this behaviour; running tr removes the problem, yet counting the number of white spaces with wc and grep shows that the number of spaces (or at least, characters) is similar.
I think that some of your tests to see whether the strings are the same are broken, because you're not quoting your variables: unquoted, the shell collapses runs of whitespace before the command ever sees the value. Quoted, this count should show a difference between the two variables:
echo "$var_name" | grep -o "\s" | wc -l
You can use declare -p var_name to see the exact contents of your variable, which will show you where the leading/trailing whitespace is.
As you're using bash, you can also take advantage of the <<< here-string syntax instead of using echo:
grep -o "\s" <<<"$var_name" | wc -l
As kojiro points out in the comments (thanks), this is a more robust approach and saves creating a subshell.
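For example, an invisible trailing space (the sample values here are made up) shows up immediately with declare -p:

```shell
line='3 1 2'
sorted='3 1 2 '                 # trailing space: looks identical when echoed
[ "$sorted" = "$line" ] || echo "not equal"
declare -p line sorted          # prints the values in quotes, exposing the space
```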
