I have two variables line and sorted, which are both space-delimited strings of numbers. Part of my script depends on checking whether these two strings are equal:
if [[ $sorted == $line ]]
then
echo "test"
fi
When running this I get no output. A visual check, using: echo $sorted and echo $line gives two seemingly similar outputs.
I thought that this may be due to either of the two outputs having an extra white space character at the end, so I decided to check whether removing spaces from the strings removed the problem:
test1=`echo $sorted | tr -d ' '`
test2=`echo $line | tr -d ' '`
Subsequently performing:
if [[ "$test1" == "$test2" ]]
then
echo "test"
fi
Did give the desired "test" output. However, when comparing the number of characters of both variables using wc, the output is the same for both variables. Furthermore, checking the number of white space characters in line and sorted with echo <variable> | grep -o "\s" | wc -l also gives the same output for both variables.
My question is what could be causing this behaviour; running tr removes the problem, yet counting the number of white spaces with wc and grep shows that the number of spaces (or at least, characters) is similar.
I think that some of your tests to see whether the strings are the same are broken, because you're not quoting your variables. This test should show a different count for the two variables:
echo "$var_name" | grep -c "\s"
You can use declare -p var_name to see the contents of your variable, which should show you where the leading/trailing white space is.
As you're using bash, you can also take advantage of the <<< here string syntax instead of using echo:
grep -c "\s" <<<"$var_name"
As kojiro points out in the comments (thanks), this is a more robust approach and saves creating a subshell.
Related
I have a problem with the following for loop:
X="*back* OLD"
for P in $X
do
echo "-$P"
done
I need it to output just:
-*back*
-OLD
However, it lists all files in the current directory matching the *back* pattern. For example it gives the following:
-backup.bkp
-backup_new.bkp
-backup_X
-OLD
How to force it to output the exact pattern?
Use an array, as unquoted parameter expansions are still subject to globbing.
X=( "*back*" OLD )
for P in "${X[#]}"; do
printf '%s\n' "$P"
done
(Use printf, as echo could try to interpret an argument as an option, for example, if you had n in the value of X.)
Use set -o noglob before your loop and set +o noglob after to disable and enable globbing.
To prevent filename expansion you could read in the string as a Here String.
To iterate over the items, you could turn them into lines using parameter expansion and read them linewise using read. In order to be able to put a - sign as the first character, use printf instead of echo.
X="*back* OLD"
while read -r x
do printf -- '-%s\n' "$x"
done <<< "${X/ /$'\n'}"
Another way could be to use tr to transform the string into lines, then use paste with the - sign as delimiter and "nothing" from /dev/null as first column.
X="*back* OLD"
tr ' ' '\n' <<< "$X" | paste -d- /dev/null -
Both should output:
-*back*
-OLD
I have a variable with some lines in it and I would like to pad it with a number of newlines defined in another variable. However it seems that the subshell may be stripping the trailing newlines. I cannot just use '\n' with echo -e as the lines may already contain escaped chars which need to be printed as is.
I have found I can print an arbitrary number of newlines using this.
n=5
yes '' | sed -n "1,${n}p;${n}q"
But if I run this in a subshell to store it in the variable, the subshell appears to strip the trailing newlines.
I can approximate the functionality but it's clumsy and due to the way I am using it I would much rather be able to just call echo "$var" or even use $var itself for things like string concatenation. This approximation runs into the same issue with subshells as soon as the last (filler) line of the variable is removed.
This is my approximation
n=5
var="test"
#I could also just set n=6
cmd="1,$((n+1))p;$((n+1))q"
var="$var$(yes '' | sed -n $cmd; echo .)"
#Now I can use it with
echo "$var" | head -n -1
Essentially I need a good way of appending a number of newlines to a variable which can then be printed with echo.
I would like to keep this POSIX compliant if at all possible but at this stage a bash solution would also be acceptable. I am also using this as part of a tool for which I have set a challenge of minimizing line and character count while maintaining readability. But I can work that out once I have a workable solution
Command substitutions with either $( ) or backticks will trim trailing newlines. So don't use them; use the shell's built-in string manipulation:
n=5
var="test"
while [ "$n" -gt 0 ]; do
var="$var
"
n=$((n-1))
done
Note that there must be nothing after the var="$var (before the newline), and nothing before the " on the next line (no indentation!).
A sequence of n newlines:
printf -v spaces "%*s" $n ""
newlines=${spaces// /$'\n'}
To parse colon-delimited fields I can use read with a custom IFS:
$ echo 'foo.c:41:switch (color) {' | { IFS=: read file line text && echo "$file | $line | $text"; }
foo.c | 41 | switch (color) {
If the last field contains colons, no problem, the colons are retained.
$ echo 'foo.c:42:case RED: //alert' | { IFS=: read file line text && echo "$file | $line | $text"; }
foo.c | 42 | case RED: //alert
A trailing delimiter is also retained...
$ echo 'foo.c:42:case RED: //alert:' | { IFS=: read file line text && echo "$file | $line | $text"; }
foo.c | 42 | case RED: //alert:
...Unless it's the only extra delimiter. Then it's stripped. Wait, what?
$ echo 'foo.c:42:case RED:' | { IFS=: read file line text && echo "$file | $line | $text"; }
foo.c | 42 | case RED
Bash, ksh93, and dash all do this, so I'm guessing it is POSIX standard behavior.
Why does it happen?
What's the best alternative?
I want to parse the strings above into three variables and I don't want to mangle any text in the third field. I had thought read was the way to go but now I'm reconsidering.
Yes, that's standard behaviour (see the read specification and Field Splitting). A few shells (ash-based including dash, pdksh-based, zsh, yash at least) used not to do it, but except for zsh (when not in POSIX mode), busybox sh, most of them have been updated for POSIX compliance.
That's the same for:
$ var='a:b:c:' IFS=:
$ set -- $var; echo "$#"
3
(see how the POSIX specification for read actually defers to the Field Splitting mechanism where a:b:c: is split into 3 fields, and so with IFS=: read -r a b c, there are as many fields as variables).
The rationale is that in ksh (on which the POSIX spec is based) $IFS (initially in the Bourne shell the internal field separator) became a field delimiter, I think so any list of elements (not containing the delimiter) could be represented.
When $IFS is a separator, one can't represent a list of one empty element ("" is split into a list of 0 element, ":" into a list of two empty elements¹). When it's a delimiter, you can express a list of zero element with "", or one empty element with ":", or two empty elements with "::".
It's a bit unfortunate as one of the most common usages of $IFS is to split $PATH. And a $PATH like /bin:/usr/bin: is meant to be split into "/bin", "/usr/bin", "", not just "/bin" and "/usr/bin".
Now, with POSIX shells (but not all shells are compliant in that regard), for word splitting upon parameter expansion, that can be worked around with:
IFS=:; set -o noglob
for dir in $PATH""; do
something with "${dir:-.}"
done
That trailing "" makes sure that if $PATH ends in a trailing :, an extra empty element is added. And also that an empty $PATH is treated as one empty element as it should be.
That approach can't be used for read though.
Short of switching to zsh, there's no easy work around other than inserting an extra : and remove it afterwards like:
echo a:b:c: | sed 's/:/::/2' | { IFS=: read -r x y z; z=${z#:}; echo "$z"; }
Or (less portable):
echo a:b:c: | paste -d: - /dev/null | { IFS=: read -r x y z; z=${z%:}; echo "$z"; }
I've also added the -r which you generally want when using read.
Most likely here you'd want to use a proper text processing utility like sed/awk/perl instead of writing convoluted and probably inefficient code around read which has not been designed for that.
¹ Though in the Bourne shell, that was still split into zero elements as there was no distinction between IFS-whitespace and IFS-non-whitespace characters there, something that was also added by ksh
One "feature" of read is that it will strip leading and trailing whitespace separators in the variables it populates - it is explained in much more detail at the linked answer. This enables beginners to have read do what they expect when doing for example read first rest <<< ' foo bar ' (note the extra spaces).
The take-away? It is hard to do accurate text processing using Bash and shell tools. If you want full control it's probably better to use a "stricter" language like for example Python, where split() will do what you want, but where you might have to dig much deeper into string handling to explicitly remove newline separators or handle encoding.
I know it is possible to invert grep output with the -v flag. Is there a way to only output the non-matching part of the matched line? I ask because I would like to use the return code of grep (which sed won't have). Here's sort of what I've got:
tags=$(grep "^$PAT" >/dev/null 2>&1)
[ "$?" -eq 0 ] && echo $tags
You could use sed:
$ sed -n "/$PAT/s/$PAT//p" $file
The only problem is that it'll return an exit code of 0 as long as the pattern is good, even if the pattern can't be found.
Explanation
The -n parameter tells sed not to print out any lines. Sed's default is to print out all lines of the file. Let's look at each part of the sed program in between the slashes. Assume the program is /1/2/3/4/5:
/$PAT/: This says to look for all lines that matches pattern $PAT to run your substitution command. Otherwise, sed would operate on all lines, even if there is no substitution.
/s/: This says you will be doing a substitution
/$PAT/: This is the pattern you will be substituting. It's $PAT. So, you're searching for lines that contain $PAT and then you're going to substitute the pattern for something.
//: This is what you're substituting for $PAT. It is null. Therefore, you're deleting $PAT from the line.
/p: This final p says to print out the line.
Thus:
You tell sed not to print out the lines of the file as it processes them.
You're searching for all lines that contain $PAT.
On these lines, you're using the s command (substitution) to remove the pattern.
You're printing out the line once the pattern is removed from the line.
How about using a combination of grep, sed and $PIPESTATUS to get the correct exit-status?
$ echo Humans are not proud of their ancestors, and rarely invite
them round to dinner | grep dinner | sed -n "/dinner/s/dinner//p"
Humans are not proud of their ancestors, and rarely invite them round to
$ echo $PIPESTATUS[1]
0[1]
The members of the $PIPESTATUS array hold the exit status of each respective command executed in a pipe. $PIPESTATUS[0] holds the exit status of the first command in the pipe, $PIPESTATUS[1] the exit status of the second command, and so on.
Your $tags will never have a value because you send it to /dev/null. Besides from that little problem, there is no input to grep.
echo hello |grep "^he" -q ;
ret=$? ;
if [ $ret -eq 0 ];
then
echo there is he in hello;
fi
a successful return code is 0.
...here is 1 take at your 'problem':
pat="most of ";
data="The apples are ripe. I will use most of them for jam.";
echo $data |grep "$pat" -q;
ret=$?;
[ $ret -eq 0 ] && echo $data |sed "s/$pat//"
The apples are ripe. I will use them for jam.
... exact same thing?:
echo The apples are ripe. I will use most of them for jam. | sed ' s/most\ of\ //'
It seems to me you have confused the basic concepts. What are you trying to do anyway?
I am going to answer the title of the question directly instead of considering the detail of the question itself:
"grep a pattern and output non-matching part of line"
The title to this question is important to me because the pattern I am searching for contains characters that sed will assign special meaning to. I want to use grep because I can use -F or --fixed-strings to cause grep to interpret the pattern literally. Unfortunately, sed has no literal option, but both grep and bash have the ability to interpret patterns without considering any special characters.
Note: In my opinion, trying to backslash or escape special characters in a pattern appears complex in code and is unreliable because it is difficult to test. Using tools which are designed to search for literal text leaves me with a comfortable 'that will work' feeling without considering POSIX.
I used both grep and bash to produce the result because bash is slow and my use of fast grep creates a small output from a large input. This code searches for the literal twice, once during grep to quickly extract matching lines and once during =~ to remove the match itself from each line.
while IFS= read -r || [[ -n "$RESULT" ]]; do
if [[ "$REPLY" =~ (.*)("$LITERAL_PATTERN")(.*) ]]; then
printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[3]}"
else
printf "NOT-REFOUND" # should never happen
exit 1
fi
done < <(grep -F "$LITERAL_PATTERN" < "$INPUT_FILE")
Explanation:
IFS= Reassigning the input field separator is a special prefix for a read statement. Assigning IFS to the empty string causes read to accept each line with all spaces and tabs literally until end of line (assuming IFS is default space-tab-newline).
-r Tells read to accept backslashes in the input stream literally instead of considering them as the start of an escape sequence.
$REPLY Is created by read to store characters from the input stream. The newline at the end of each line will NOT be in $REPLY.
|| [[ -n "$REPLY" ]] The logical or causes the while loop to accept input which is not newline terminated. This does not need to exist because grep always provides a trailing newline for every match. But, I habitually use this in my read loops because without it, characters between the last newline and the end of file will be ignored because that causes read to fail even though content is successfully read.
=~ (.*)("$LITERAL_PATTERN")(.*) ]] Is a standard bash regex test, but anything in quotes in taken as a literal. If I wanted =~ to consider the regex characters in contained in $PATTERN, then I would need to eliminate the double quotes.
"${BASH_REMATCH[#]}" Is created by [[ =~ ]] where [0] is the entire match and [N] is the contents of the match in the Nth set of parentheses.
Note: I do not like to reassign stdin to a while loop because it is easy to error and difficult to see what is happening later. I usually create a function for this type of operation which acts typically and expects file_name parameters or reassignment of stdin during the call.
I want to assign the grep result to a variable for further use:
lines=$(cat abc.txt | grep "hello")
but seems newline characters are removed in the result, when I do
echo $lines
only one line is printed. How can I preserve newline characters, so when I echo $lines, it generates the same result as cat abc.txt | grep "hello" does.
You want to say
echo "$lines"
instead of
echo $lines
To elaborate:
echo $lines means "Form a new command by replacing $lines with the contents of the variable named lines, splitting it up on whitespace to form zero or more new arguments to the echo command. For example:
lines='1 2 3'
echo $lines # equivalent to "echo 1 2 3"
lines='1 2 3'
echo $lines # also equivalent to "echo 1 2 3"
lines="1
2
3"
echo $lines # also equivalent to "echo 1 2 3"
All these examples are equivalent, because the shell ignores the specific kind of whitespace between the individual words stored in the variable lines. Actually, to be more precise, the shell splits the contents of the variable on the characters of the special IFS (Internal Field Separator) variable, which defaults (at least on my version of bash) to the three characters space, tab, and newline.
echo "$lines", on the other hand, means to form a single new argument from the exact value of the variable lines.
For more details, see the "Expansion" and "Word Splitting" sections of the bash manual page.
Using the Windows port for grep (not the original question, I kow and not applicable to *nix). I found that -U solved the problem nicely.
From the --help:
-U, --binary do not strip CR characters at EOL (MSDOS)