Matching to multi-line string in bash with word boundary not working - bash

I'm using GNU bash, version 4.4.19 to match to a line in a multi-line string (which I'm reading from a file).
File in.txt:
abc/def/
bar/foo/x
foobar/foo/y
foobar/quux/
In this file, with a pattern like ^bar/foo/.*$, I'm trying to match bar/foo/x (and not foobar/foo/y).
But since it's a multiline string, ^ and $ will not match each line but rather all of the string. Therefore I'm trying to use \b (word boundary) in my regexp.
This is what I'm trying, but it's not working:
in="$(cat in.txt)"
re=\\bbar/foo/.*\\b
[[ "$in" =~ $re ]] && echo OK
Other patterns I tried and didn't work:
re=\bbar/foo/.*\n
re=\\bbar/foo/.*\\n
re=\\bbar\/foo\/.*\\b
re=\\bbar\/foo\/.*\\n
re=\\bbar\/foo\/\(.*\)\\n

bash uses POSIX.2 regular expressions for =~ (see man 7 regex) as described here
... a '\' followed by one of the characters "^.[$()|*+?{\" (matching that character taken as an ordinary character), a '\' followed by any other character(!) (matching that character taken as an ordinary character, as if the '\' had not been present(!)), or a single character with no other significance (matching that character).
and they don't support \b as word boundary delimiter.
Your best option is to match line by line
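re='^bar/foo/.*$' # ^ and $ now anchor to each individual line
ii=( $(< in.txt) ) # word-splits on whitespace, which is fine for this file
for l in "${ii[@]}"
do
    [[ "$l" =~ $re ]] && echo "match: $l"
done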
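If a loop is not wanted, another option is to keep the whole string and put a literal newline into the pattern where \b was intended. The following is only a sketch of that idea; the inlined sample data and the names re and found are illustrative assumptions, not part of the question.

```shell
#!/usr/bin/env bash
# Sample data mirroring in.txt (inlined so the sketch is self-contained).
in=$'abc/def/\nbar/foo/x\nfoobar/foo/y\nfoobar/quux/'

# $'...' turns \n into a real newline, which POSIX ERE treats as an ordinary
# character: "start of string or newline", then the rest of that line.
re=$'(^|\n)(bar/foo/[^\n]*)'

found=
if [[ $in =~ $re ]]; then
    found=${BASH_REMATCH[2]}   # group 2 is the matched line without the leading newline
fi
printf 'match: %s\n' "$found"
```

With this sample data the sketch reports bar/foo/x and skips foobar/foo/y, because bar must follow either the start of the string or a newline.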

grep's -P option (Perl-compatible regular expressions) supports the word boundary \b. So with GNU grep you could do:
in="$(cat in.txt)"
re="\bbar/foo/[^ ]*"
grep -oP "$re" <<< "$in"
The output is:
bar/foo/x
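Since grep already works line by line, the ^ anchor from the original pattern may be enough without -P. A small sketch, with the sample data inlined as an assumption:

```shell
# Sample data mirroring in.txt (inlined so the sketch is self-contained).
in=$(printf '%s\n' 'abc/def/' 'bar/foo/x' 'foobar/foo/y' 'foobar/quux/')

# grep tests each input line separately, so ^ anchors to the start of a line.
out=$(printf '%s\n' "$in" | grep '^bar/foo/')
printf '%s\n' "$out"
```

This variant uses only basic regular expressions, so it should not depend on GNU extensions.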

You can use awk to match the wanted line:
awk '/^bar\/foo\//{print $0}' in.txt
Hope this helps.

Related

Cut string of numbers at letter in bash

I have a string such as plantford1775.274.284b63.11.
I have been using identity=$( echo "$identity" | cut -d'.' -f3) to cut at each dot, and then choose the third section. I am left with 284b63.
The format of this part is always a letter, sandwiched by varying amounts of numbers. I would like to take the first few numbers before the letter. An example code line would be this:
identity=$( echo "$identity" | cut -d'anyletter' -f1)
What do I replace anyletter with to cut at whatever letter is listed there, so that I end with a string of 284?
This can be done in a single awk command; the following was written and tested with your sample.
echo "$identity" | awk -F'.' '{sub(/[^0-9].*/,"",$3);print $3}'
Explanation: the output of echo is piped to awk as standard input. The awk program sets the field separator to ., then uses sub to delete everything in the 3rd field from the first non-digit character onwards, and prints what remains of that field.
Try:
echo plantford1775.274.284b63.11 | cut -d. -f3 | sed 's/[a-z].*//'
Or a slight variation on the REGEX, with [[...]] in bash:
v="plantford1775.274.284b63.11"
[[ $v =~ ^[^.]+\.[^.]+\.([^.]+).*$ ]] && echo "${BASH_REMATCH[1]}"
Output
284b63
Or if you are only interested in the digits before the letter:
[[ $v =~ ^[^.]+\.[^.]+\.([[:digit:]]+)[^.]+.*$ ]] && echo "${BASH_REMATCH[1]}"
Output
284
With bash, using the =~ operator :
[[ $identity =~ ^[^.]*\.[^.]*\.([0-9]+) ]] && identity=${BASH_REMATCH[1]}
or, in POSIX shell:
identity=${identity#*.*.}
identity=${identity%%[!0-9]*}
or, using sed:
identity=$(sed 's/^[^.]*\.[^.]*\.\([0-9]*\).*/\1/' <<< "$identity")
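The POSIX parameter-expansion variant above is compact, so here is a step-by-step sketch of what each expansion removes. The intermediate names rest and digits are only illustrative.

```shell
identity='plantford1775.274.284b63.11'

rest=${identity#*.*.}       # drop the shortest prefix ending at the second dot: 284b63.11
digits=${rest%%[!0-9]*}     # drop the longest suffix starting at the first non-digit: 284

printf '%s\n' "$digits"
```

No external process is started here, which may matter if this runs inside a loop over many strings.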
Maybe you can use a bash regex and get the result from $BASH_REMATCH.
[[ "$identity" =~ ([0-9]+)[a-z][0-9]+ ]] && identity="${BASH_REMATCH[1]}"
Say we have
identity=284b63
then you can do a
lead=${identity%[a-z]*}
to set lead to 284. Feel free to adapt the pattern to upper case letters and/or other separators.
If the format of this part is always a letter sandwiched between varying amounts of digits, and you want to match this format, you might also use GNU awk, setting the field separator to . and using a pattern with a capture group for the 3rd field.
The pattern captures 1 or more digits from the start of the field, then matches one or more chars [a-z] followed by a digit.
echo "$identity" | awk -F'.' 'match($3, /^([0-9]+)[a-z]+[0-9]/, ary) {print ary[1]}'
Output
284
Or using sed with a pattern matching the first 2 dots and the capture group after the 2nd dot:
identity=$(sed 's/^[^.]\+\.[^\.]\+\.\([0-9]\+\)[a-z]\+[0-9].*/\1/' <<< "$identity")

How to properly expand a Bash variable that contains newlines on sed replacement (insertion) side

Bear with me at first, thank you. Suppose I have
$ echo $'foo\nbar'
foo
bar
Now when I assign the string to a Bash variable, Bash does not give the same vertical output anymore:
$ str='foo\nbar'
$
$ echo $str
foo\nbar
$
$ echo $'str'
str
Try printf:
$ printf "$str\n"
foo
bar
Those examples are for illustration purposes because I am looking for a way to expand the newline(s) inside the $str variable such that I can substitute the $str variable on sed replacement (insertion) side.
# this does not work:
sed -i.bak $'/<!-- insert here -->/i\\\n'$'str'$'\\\n' index.html
# this works as expected though:
sed -i.bak $'/<!-- insert here -->/i\\\n'foo$'\\\n'bar$'\\\n' index.html
I tried several hacks to get this working but none succeeded; here is one example:
# this does not work:
sed -i.bak $'/<!-- insert here -->/i\\\n'`printf 'foo\\x0Abar'`$'\\\n' index.html
After further tests, I realized that as long as the variable does not contain newlines, things work as expected:
# This works as long as str2 does not contain any newline.
str2='foo_bar'
sed -i.bak $'/<!-- insert here -->/i\\\n'$str2$'\\\n' index.html
The expected result is that sed inserts the two lines immediately before <!-- insert here --> in the index.html file.
foo
bar
<!-- insert here -->
I try to achieve this as one liner. I know I can break sed into the vertical, multi-line form, which will be easier for me; however, I want to explore if there is a one liner style.
Is this doable or not?
My system is macOS High Sierra 10.13.6
Bash version: 3.2.57(1)-release
BSD sed was last updated on May 10, 2005
Your examples have a few subtle errors, so here are a few examples regarding quoting and newlines in strings in bash and sed.
How quoting works in general:
# bash converts escape-sequence '\n' to real newline (0x0a) before passing it to echo
$ echo $'foo\nbar'
foo
bar
# bash passes literal 8 characters 'foo\nbar' to echo and echo simply prints them
$ echo 'foo\nbar'
foo\nbar
# bash passes literal 8 characters 'foo\nbar' to echo and echo converts escape-sequence
$ echo -e 'foo\nbar'
foo
bar
# bash passes literal string 'foo\nbar' to echo (twice)
# then echo recombines both arguments using a single space
$ str='foo\nbar'
$ echo $str "$str"
foo\nbar foo\nbar
# bash interprets escape-sequences and stores result 'foo<0x0a>bar' in str,
# then passes two arguments 'foo' and 'bar' to echo, due to "word splitting"
# then echo recombines both arguments using a single space
$ str=$'foo\nbar'
$ echo $str
foo bar
# bash interprets escape-sequences and stores result 'foo<0x0a>bar' in str,
# then passes it as a single argument to echo, without "word splitting"
$ str=$'foo\nbar'
$ echo "$str"
foo
bar
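One quick way to check which of the two forms a variable actually holds is to compare string lengths. This is just a sketch; the variable names plain and ansi are made up for the illustration.

```shell
#!/usr/bin/env bash
plain='foo\nbar'     # backslash and n stay as two separate characters
ansi=$'foo\nbar'     # \n is converted to one newline character (0x0a)

# Print both lengths: 8 for the escape sequence, 7 for the real newline.
printf '%d %d\n' "${#plain}" "${#ansi}"
```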
How to apply shell quoting, when dealing with newlines in sed
# replace a character with newline, using newline's escape-sequence
# GNU sed will convert '\n' to a literal newline (0x0a); BSD sed does not
$ sed 's/-/foo\nbar/' <<< 'blah-blah'
# replace a character with newline, using newline's escape-sequence in a variable
# GNU sed will convert '\n' to a literal newline (0x0a); BSD sed does not
$ str='foo\nbar' # str contains the escape-sequence '\n' and not a literal newline
$ sed 's/-/'"$str"'/' <<< 'blah-blah'
# replace a character with newline, using a literal newline.
# note the line-continuation-mark \ after 'foo' before the literal newline,
# which is part of the sed script, since everything in-between '' is literal
$ sed 's/-/foo\
bar/' <<< 'blah-blah' # end-of-command
# replace a character with newline, using a newline in shell-escape-mode
# note the same line-continuation-mark \ before $'\n', which is part of the sed script
# note: the sed script is a single string composed of three parts '…\', $'\n' and '…',
$ sed 's/-/foo\'$'\n''bar/' <<< 'blah-blah'
# the same as above, but with a single shell-escape-mode string instead of 3 parts.
# note the required quoting of the line-continuation-mark with an additional \ escape
# i.e. after shell-escaping the sed script contains a single \ and a literal newline
$ sed $'s/-/foo\\\nbar/' <<< 'blah-blah'
# replace a character with newline, using a shell-escaped string in a variable
$ str=$'\n' # str contains a literal newline (0x0a) due to shell escaping
$ sed 's/-/foo\'"$str"'bar/' <<< 'blah-blah'
# same as above with the required (quoted) line-continuation inside the variable
# note, how the single \ from '…foo\' (previous example) became \\ inside $'\\…'
$ str=$'\\\n' # str contains \ and a literal newline (0x0a) due to shell escaping
$ sed 's/-/foo'"$str"'bar/' <<< 'blah-blah'
All the sed examples will print the same (the first two only with a sed that interprets \n in the replacement, such as GNU sed):
blahfoo
barblah
So, a newline in sed's replacement string must either be
(1) newline's escape-sequence (i.e. '\n'), so sed can replace it with a literal newline, or
(2) a literal newline preceded by a line-continuation-mark (i.e. $'\\\n' or '\'$'\n', which is NOT the same as '\\\n' or '\\n' or $'\\n').
This means you need to replace each literal newline <0x0a> with newline's escape-sequence \n or insert a line-continuation-mark before each literal newline inside your replacement string before double-quote-expanding it into sed's substitute replacement string.
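The second option can be sketched with a bash pattern substitution that puts a backslash before every literal newline. This assumes the replacement text contains no /, & or \ characters, which would need extra escaping; the name escaped is illustrative.

```shell
#!/usr/bin/env bash
str=$'foo\nbar'

# Insert a line-continuation backslash in front of every literal newline.
escaped=${str//$'\n'/$'\\\n'}

# The expanded value is not re-interpreted by the shell, so sed receives
# the script: s/-/foo\<newline>bar/
out=$(sed "s/-/$escaped/" <<< 'blah-blah')
printf '%s\n' "$out"
```

Because the newline is literal and preceded by a backslash, this form does not rely on sed interpreting \n in the replacement.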
Since there are many more caveats regarding escaping in sed, I recommend you use awk's gsub function instead, passing your replacement string as a variable via -v, e.g.
$ str=$'foo\nbar'
$ awk -v REP="$str" -- '{gsub(/-/, REP); print}' <<< 'blah-blah'
blahfoo
barblah
PS: I don't know, if this answer is entirely true in your case, because your operating system uses an outdated version of bash.
echo -e $str
where -e is
enable interpretation of backslash escapes
Use sed command r to insert arbitrary text
str="abc\ndef"
tmp=$(mktemp)
(
echo
printf -- "$str"
echo
) > "$tmp"
sed -i.bak '/<!-- insert here -->/r '"$tmp" index.html
rm -r "$tmp"
sed treats a newline as the command delimiter. The ; is not a true command delimiter in sed; only the newline is. Don't append ; or } or spaces after the filename of the w command, because they will be interpreted as part of the filename (yes, spaces too). The filename argument of commands like w or r is terminated by a newline.
If you want more flexibility, consider moving to awk.
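As a rough sketch of the awk route, the text can be handed over through the environment and printed just before the marker line. The sample HTML and the environment variable name STR are assumptions made for this illustration; ENVIRON is used so that awk does not reprocess backslashes in the text.

```shell
str=$(printf 'foo\nbar')
# Stand-in for index.html, inlined so the sketch is self-contained.
html=$(printf '%s\n' '<body>' '<!-- insert here -->' '</body>')

# Print the text before the marker line, then print every line unchanged.
out=$(printf '%s\n' "$html" |
    STR="$str" awk '/<!-- insert here -->/ { print ENVIRON["STR"] } { print }')
printf '%s\n' "$out"
```

To edit a real file, the output would have to be written to a temporary file and moved over the original, since plain awk has no in-place option.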

shell script for reading file and replacing new file with | symbol

I have a txt file like the one below, with an empty line between ghi and 123.
abc
def
ghi
123
456
789
expected output is
abc|def|ghi
123|456|789
I want to replace each newline with the pipe symbol (|) so that I can use the result in egrep. After an empty line, the output should start a new line.
you can try with awk
awk -v RS= -v OFS="|" '{$1=$1}1' file
you get,
abc|def|ghi
123|456|789
Explanation
Set RS to a null/blank value to get awk to operate on sequences of blank lines.
From the POSIX specification for awk:
RS
The first character of the string value of RS shall be the input record separator; a <newline> by default. If RS contains more than one character, the results are unspecified. If RS is null, then records are separated by sequences consisting of a <newline> plus one or more blank lines, leading or trailing blank lines shall not result in empty records at the beginning or end of the input, and a <newline> shall always be a field separator, no matter what the value of FS is.
The assignment $1=$1 forces awk to rebuild the record with OFS as the separator; the trailing 1 is an always-true pattern, so every record is printed.
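To make the paragraph mode visible, this sketch prints the record number in front of each rebuilt record. The sample input is inlined, and the BEGIN block is just an equivalent way of setting RS and OFS.

```shell
# Two blocks of lines separated by one empty line.
input=$(printf '%s\n' abc def ghi '' 123 456 789)

# RS="" makes each blank-line-separated block one record; newlines inside a
# block act as field separators, and $1=$1 rejoins the fields with OFS.
out=$(printf '%s\n' "$input" |
    awk 'BEGIN { RS = ""; OFS = "|" } { $1 = $1; print NR ": " $0 }')
printf '%s\n' "$out"
```

Each block of lines becomes a single numbered record, which shows that the empty line, not the newline, is acting as the record separator.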
Here's one using GNU sed:
cat file | sed ':a; N; $!ba; s/\n/|/g; s/||/\n/g'
If you're using BSD sed (the flavor packaged with Mac OS X), you will need to pass in each expression separately, and use a literal newline instead of \n:
cat file | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/|/g' -e 's/||/\
/g'
If file is (with an empty line between ghi and 123):
abc
def
ghi
123
456
789
You get:
abc|def|ghi
123|456|789
This replaces each newline with a | (credit to this answer), and then || (i.e. what was a pair of newlines in the original input) with a newline.
The caveat here is that | can't appear at the beginning or end of a line in your input; otherwise, the second substitution will add newlines in the wrong places. To work around that, you can use another character that won't be in your input as an intermediate value, and then replace singletons of that character with | and pairs with \n.
EDIT
Here's an example that implements the workaround above, using the NUL character \x00 (which should be highly unlikely to appear in your input) as the intermediate character:
cat file | sed ':a;N;$!ba; s/\n/\x00/g; s/\x00\x00/\n/g; s/\x00/|/g'
Explanation:
:a;N;$!ba; puts the entire file in the pattern space, including newlines
s/\n/\x00/g; replaces all newlines with the NUL character
s/\x00\x00/\n/g; replaces all pairs of NULs with a newline
s/\x00/|/g replaces the remaining singletons of NULs with a |
BSD version:
sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/\x00/g' -e 's/\x00\x00/\
/g' -e 's/\x00/|/g'
EDIT 2
For a more direct approach (GNU sed only), provided by @ClaudiuGeorgiu:
sed -z 's/\([^\n]\)\n\([^\n]\)/\1|\2/g; s/\n\n/\n/g'
Explanation:
-z uses NUL characters as line-endings (so newlines are not given special treatment and can be matched in the regular expression)
s/\([^\n]\)\n\([^\n]\)/\1|\2/g; replaces every 3-character sequence of <non-newline><newline><non-newline> with <non-newline>|<non-newline>
s/\n\n/\n/g replaces all pairs of newlines with a single newline
In native bash:
#!/usr/bin/env bash
curr=
while IFS= read -r line; do
    if [[ $line ]]; then
        curr+="|$line"
    else
        printf '%s\n' "${curr#|}"
        curr=
    fi
done
[[ $curr ]] && printf '%s\n' "${curr#|}"
Tested:
$ f() { local curr= line; while IFS= read -r line; do if [[ $line ]]; then curr+="|$line"; else printf '%s\n' "${curr#|}"; curr=; fi; done; [[ $curr ]] && printf '%s\n' "${curr#|}"; }
$ f < <(printf '%s\n' 'abc' 'def' 'ghi' '' 123 456 789)
abc|def|ghi
123|456|789
Use rs. For example:
rs -C'|' 2 3 < file
rs = reshape data array. Here I'm specifying that I want 2 rows, 3 columns, and the output separator to be pipe.

How to remove white spaces (\t, \n, \r, space) from the beginning and the end of a string in shell?

I want to remove white spaces (\t, \n, \r, space) from the beginning and the end of a string if they exist.
How to do that?
Is it possible to do that only with expressions like ${str#*}?
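If you're using bash (which your idea of ${str#} seems to suggest), then you can use this. Note that a lone [[:space:]] pattern would strip only one character, so extglob's +(...) is needed to strip a whole run:
shopt -s extglob
echo "${str##+([[:space:]])}" # trim all initial whitespace characters
echo "${str%%+([[:space:]])}" # trim all trailing whitespace characters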
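Where extended globbing is not available, the same trimming can be sketched with nested parameter expansions only. The function name trim and the variable s are illustrative; s is a global here because plain POSIX sh has no local variables.

```shell
trim() {
    s=$1
    s=${s#"${s%%[![:space:]]*}"}   # remove the leading run of whitespace
    s=${s%"${s##*[![:space:]]}"}   # remove the trailing run of whitespace
    printf '%s' "$s"
}

# Sample string with spaces, a tab, a carriage return and a newline around it.
raw=$(printf ' \t hello world \r\n ')
trimmed=$(trim "$raw")
printf '[%s]\n' "$trimmed"
```

The inner expansion isolates the whitespace run, and the outer one strips exactly that run, so whitespace inside the string is left alone.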
You can say
sed -e 's/^[ \t\r\n]*//' -e 's/[ \t\r\n]*$//' <<< "string"
# ^^^^^^^^^^^ ^^^^^^^^^^
# beginning end of string
Or use \s to match tab and space if it is supported by your sed version.
If you can use sed then:
echo "${str}" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'

How can I capture the text between specific delimiters into a shell variable?

I have little problem with specifying my variable. I have a file with normal text and somewhere in it there are brackets [ ] (only 1 pair of brackets in whole file), and some text between them. I need to capture the text within these brackets in a shell (bash) variable. How can I do that, please?
Bash/sed:
VARIABLE=$(tr -d '\n' < filename | sed -n -e '/\[[^]]/s/^[^[]*\[\([^]]*\)].*$/\1/p')
If that is unreadable, here's a bit of an explanation:
VARIABLE=$(subexpression) Assigns the output of the subexpression to the variable VARIABLE.
tr -d '\n' < filename Reads filename on standard input (tr takes no file arguments), deletes newline characters, and prints the result to sed's input
sed -n -e 'command' Executes the sed command without printing any lines
/\[[^]]/ Execute the command only on lines which contain [some text]
s/ Substitute
^[^[]* Match any non-[ text
\[ Match [
\([^]]*\) Match any non-] text into group 1
] Match ]
.*$ Match any text
/\1/ Replaces the line with group 1
p Prints the line
May I point out that while most of the suggested solutions might work, there is absolutely no reason why you should fork another shell, and spawn several processes to do such a simple task.
The shell provides you with all the tools you need:
$ var='foo[bar] pinch'
$ var=${var#*[}; var=${var%%]*}
$ echo "$var"
bar
See: http://mywiki.wooledge.org/BashFAQ/073
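The same two expansions also cope with brackets that sit on different lines once the text is in a variable. A sketch with inlined sample text; with a real file the first line would read the file instead, for example with text=$(cat filename). The names text and inner are illustrative.

```shell
# Sample text with [ on one line and ] on the next.
text=$(printf '%s\n' 'some text [captured' 'value] more text')

inner=${text#*\[}       # drop everything up to and including the first [
inner=${inner%%\]*}     # drop everything from the first ] onwards

printf '%s\n' "$inner"
```

The newline between the two captured words is preserved in inner, so quote the variable when using it.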
Sed is not necessary:
var=`grep -E -o '\[.*\]' FILENAME | tr -d ']['`
But it only works with single-line matches.
Using Bash builtin regex matching seems like yet another way of doing it:
var='foo[bar] pinch'
[[ "$var" =~ [^\]\[]*\[([^\[]*)\].* ]] # Bash 3.0
var="${BASH_REMATCH[1]}"
echo "$var"
Assuming you are asking about bash variable:
$ export YOUR_VAR=$(perl -ne'print $1 if /\[(.*?)\]/' your_file.txt)
The above works if brackets are on the same line.
What about:
shell_variable=$(sed -ne '/\[/,/\]/{s/^.*\[//;s/\].*//;p;}' $file)
Worked for me on Solaris 10 under Korn shell; should work with Bash too. Replace '$(...)' with back-ticks in Bourne shell.
Edit: worked when given [ on one line and ] on another. For the single line case as well, use:
shell_variable=$(sed -n -e '/\[[^]]*$/,/\]/{s/^.*\[//;s/\].*//;p;}' \
-e '/\[.*\]/s/^.*\[\([^]]*\)\].*$/\1/p' $file)
The first '-e' deals with the multi-line spread; the second '-e' deals with the single-line case. The first '-e' says:
From the line containing an open bracket [ not followed by a close bracket ] on the same line
Until the line containing close bracket ],
substitute anything up to and including the open bracket with an empty string,
substitute anything from the close bracket onwards with an empty string, and
print the result
The second '-e' says:
For any line containing both open bracket and close bracket
Substitute the pattern consisting of 'characters up to and including open bracket', 'characters up to but excluding close bracket' (and remember this), 'stuff from close bracket onwards' with the remembered characters in the middle, and
print the result
For the multi-line case:
$ file=xxx
$ cat xxx
sdsajdlajsdl
asdajsdkjsaldjsal
sdasdsad [aaaa
bbbbbbb
cccc] asdjsalkdjsaldjlsaj
asdjsalkdjlksjdlaj
asdasjdlkjsaldja
$ shell_variable=$(sed -n -e '/\[[^]]*$/,/\]/{s/^.*\[//;s/\].*//;p;}' \
-e '/\[.*\]/s/^.*\[\([^]]*\)\].*$/\1/p' $file)
$ echo $shell_variable
aaaa bbbbbbb cccc
$
And for the single-line case:
$ cat xxx
sdsajdlajsdl
asdajsdkjsaldjsal
sdasdsad [aaaa bbbbbbb cccc] asdjsalkdjsaldjlsaj
asdjsalkdjlksjdlaj
asdasjdlkjsaldja
$
$ shell_variable=$(sed -n -e '/\[[^]]*$/,/\]/{s/^.*\[//;s/\].*//;p;}' \
-e '/\[.*\]/s/^.*\[\([^]]*\)\].*$/\1/p' $file)
$ echo $shell_variable
aaaa bbbbbbb cccc
$
Somewhere about here, it becomes simpler to do the whole job in Perl, slurping the file and editing the result string in two multi-line substitute operations.
var=`grep -e '\[.*\]' test.txt | sed -e 's/.*\[\(.*\)\].*/\1/'`
Thanks to everyone, I used Strager's version and it works perfectly, thanks a lot once again...
Backslashes (BSL) got munched up ... :
var='foo[bar] pinch'
[[ "$var" =~ [^\]\[]*\[([^\[]*)\].* ]] # Bash 3.0
# Just in case ...:
[[ "$var" =~ [^BSL]BSL[]*BSL[([^BSL[]*)BSL].* ]] # Bash 3.0
var="${BASH_REMATCH[1]}"
echo "$var"
2 simple steps to extract the text.
split var at [ and get the right part
split var at ] and get the left part
cb0$ var='foo[bar] pinch'
cb0$ var=${var#*[}
cb0$ var=${var%]*} && echo $var
bar
