output of sed gives strange result when using capture groups - bash

I'm doing the following command in a bash:
echo -e 'UNUSED\nURL: ^/tags/0.0.0/abcd' | sed -rn 's#^URL: \^/tags/([^/]+)/#\1#p'
I think this should output only the matching lines and the content of the capture group. So I'm expecting 0.0.0 as the result. But I'm getting 0.0.0abcd
Why contains the capture group parts from the left and the right side of the /? What I am doing wrong?

echo -e 'UNUSED\nURL: ^/tags/0.0.0/abcd' |
sed -rn 's#^URL: \^/tags/([^/]+)/#\1#p'
echo outputs two lines:
UNUSED
URL: ^/tags/0.0.0/abcd
The regular expression given to sed does not match the first line, so this line is not printed. The regular expression matches the second line, so URL: ^/tags/0.0.0/ is replaced with 0.0.0; only the matched part of the line is replaced, so abcd is passed unchanged.
To obtain the desired output you must also match abcd, for example with
sed -rn 's#^URL: \^/tags/([^/]+)/.*#\1#p'
where the .* eats all characters to the end of the line.

You can use awk:
echo -e 'UNUSED\nURL: ^/tags/0.0.0/abcd' | awk -F/ 'index($0, "^/tags/"){print $3}'
0.0.0
This awk command uses / as field delimiter and prints 3rd column when there ^/tags/ text in input.
Alternatively, you can use gnu grep:
echo -e 'UNUSED\nURL: ^/tags/0.0.0/abcd' | grep -oP '^URL: \^/tags/\K([^/]+)'
0.0.0
Or this sed:
echo -e 'UNUSED\nURL: ^/tags/0.0.0/abcd' | sed -nE 's~^URL: \^/tags/([^/]+).*~\1~p'
0.0.0

This sed catch your desired output.
echo -e 'UNUSED\nURL: ^/tags/0.0.0/abcd' | sed -E '/URL/!d;s#.*/(.*)/[^/]*#\1#'

Related

How to filter all the paths from the urls using "sed" or "grep"

I was trying to filter all the files from the URLs and get only paths.
echo -e "http://sub.domain.tld/secured/database_connect.php\nhttp://sub.domain.tld/section/files/image.jpg\nhttp://sub.domain.tld/.git/audio-files/top-secret/audio.mp3" | grep -Ei "(http|https)://[^/\"]+" | sort -u
http://sub.domain.tld
But I want the result like this
http://sub.domain.tld/secured/
http://sub.domain.tld/section/files/
http://sub.domain.tld/.git/audio-files/top-secret/
Is there any way to do it with sed or grep
Using grep
$ echo ... | grep -o '.*/'
http://sub.domain.tld/secured/
http://sub.domain.tld/section/files/
http://sub.domain.tld/.git/audio-files/top-secret/
with grep
If your grep has the -o option:
... | grep -Eio 'https?://.*/'
If there could be multiple URLs per line:
... | grep -Eio 'https?://[^[:space:]]+/'
with sed
If the input is always precisely one URL per line and nothing else, you can just delete the filename part:
... | sed 's/[^/]*$//'
You could use match function of awk, will work in any version of awk. Simple explanation would be, passing echo command's output to awk program. Using match matching everything till last occurrence of / and then printing the sub-string to print just before /(with -1 to RLENGTH).
your_echo_command | awk 'match($0,/.*\//){print substr($0,RSTART,RLENGTH-1)}'
GNU Awk
$ echo ... | awk 'match($0,/.*\//,a){print a[0]}'
$ echo ... | awk '{print gensub(/(.*\/).*/,"\\1",1)}'
$ echo ... | awk 'sub(/[^/]*$/,"")'
http://sub.domain.tld/secured/
http://sub.domain.tld/section/files/
http://sub.domain.tld/.git/audio-files/top-secret/
xargs
$ echo ... | xargs -i sh -c 'echo $(dirname "{}")/'
http://sub.domain.tld/secured/
http://sub.domain.tld/section/files/
http://sub.domain.tld/.git/audio-files/top-secret/

Regular expression using sed in UNIX

I want to replace variable using sed .To replace i need to know what is present in a file ,So i want to extract that string using regular expression .
$ cat file1.txt
select * from ${database_name}.tab_name;
I want to take ${type_database_name_env} into a string and use sed replace command to replace that variable with actual name
sed -n 's/[${][a-z][_][a-z][_][a-z][_][a-z][}]/,/./p' file1.txt
I need output as
$ var1=`sed command` # I am looking for proper sed command
$ echo $var1
${database_name}
With grep, you may use
var1="$(grep -o '\${[^{}]*}' file1.txt | head -1)"
The | head -1 is used to exract the first match in case there are more.
See the online demo:
f='select * from ${database_name}.tab_name;'
var1="$(grep -o '\${[^{}]*}' <<< "$f" | head -1)"
echo "$var1"
With sed, you may use
var1="$(sed -En 's/.*(\$\{[^{}]*}).*/\1/p' file"
See the online demo:
f='select * from ${database_name}.tab_name;'
var1="$(sed -En 's/.*(\$\{[^{}]*}).*/\1/p' <<< $f)"
echo "$var1"
# => ${database_name}
Regex details
.* - matches 0+ chars
(\$\{[^{}]*}) - captures into Group 1 (\1) a $ char followed with {, 0+ chars other than { and } and then a }
.* - matches 0+ chars.
As the replacement is the reference to the Group 1 text, it is all there remains after sed does its job. Note the -E option: it enables the POSIX ERE syntax where (...) are used to specify a capturing group, not \(...\).
You could just use awk:
$ awk -F'[ .]+' '{print $4}' file
${database_name}

bash script command output execution doesn't assign full output when using backticks

I used many times [``] to capture output of command to a variable. but with following code i am not getting right output.
#!/bin/bash
export XLINE='($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER'
echo 'Original XLINE'
echo $XLINE
echo '------------------'
echo 'Extract all word with $ZWP'
#works fine
echo $XLINE | sed -e 's/\$/\n/g' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP
echo '------------------'
echo 'Assign all word with $ZWP to XVAR'
#XVAR doesn't get all the values
export XVAR=`echo $XLINE | sed -e 's/\$/\n/g' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP` #fails
echo "$XVAR"
and i get:
Original XLINE
($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER
------------------
Extract all word with $ZWP
ZWP_SCRIP_NAME
ZWP_LT_RSI_TRIGGER
ZWP_RTIMER
------------------
Assign all word with $ZWP to XVAR
ZWP_RTIMER
why XVAR doesn't get all the values?
however if i use $() to capture the out instead of ``, it works fine. but why `` is not working?
Having GNU grep you can use this command:
XVAR=$(grep -oP '\$\KZWP[A-Z_]+' <<< "$XLINE")
If you pass -P grep is using Perl compatible regular expressions. The key here is the \K escape sequence. Basically the regex matches $ZWP followed by one or more uppercase characters or underscores. The \K after the $ removes the $ itself from the match, while its presence is still required to match the whole pattern. Call it poor man's lookbehind if you want, I like it! :)
Btw, grep -o outputs every match on a single line instead of just printing the lines which match the pattern.
If you don't have GNU grep or you care about portability you can use awk, like this:
XVAR=$(awk -F'$' '{sub(/[^A-Z_].*/, "", $2); print $2}' RS=',' <<< "$XLINE")
First, the smallest change that makes your code "work":
echo "$XLINE" | tr '$' '\n' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP_
The use of tr replaces a sed expression that didn't actually do what you thought it did -- try looking at its output to see.
One sane alternative would be to rely on GNU grep's -o option. If you can't do that...
zwpvars=( ) # create a shell array
zwp_assignment_re='[$](ZWP_[[:alnum:]_]+)(.*)' # ...and a regex
content="$XLINE"
while [[ $content =~ $zwp_assignment_re ]]; do
zwpvars+=( "${BASH_REMATCH[1]}" ) # found a reference
content=${BASH_REMATCH[2]} # stuff the remaining content aside
done
printf 'Found variable: %s\n' "${zwpvars[#]}"

sed output in case of not match

I want to extract some fields from a line, like:
echo "aaa,bbb-ccc,ddd" | sed 's/.*,\(.*\)-\(.*\),.*/\1 \2/'
It will output "bbb ccc" as i expect. But if the input line is not of the pattern,
echo "aaa,bbb,ccc,ddd" | sed 's/.*,\(.*\)-\(.*\),.*/\1 \2/'
It will output the whole line "aaa,bbb,ccc,ddd". What i want is when not matched, output nothing. How can i do it?
sed -n 's/.*,\(.*\)-\(.*\),.*/\1 \2/p'
Example:
$ echo "aaa,bbb-ccc,ddd" | sed -n 's/.*,\(.*\)-\(.*\),.*/\1 \2/p'
bbb ccc
$ echo "aaa,bbb,ccc,ddd" | sed -n 's/.*,\(.*\)-\(.*\),.*/\1 \2/p'
$
-n disables printing of the pattern space by default.
p makes sed print when matched.
echo "a-b" | sed -n 's/-/#/p'
a#b
echo "a_b" | sed -n 's/-/#/p'
-n means 'no output' as default.
/p means print
If that's the case, think of a simpler approach
echo "aaa,bbb-ccc,ddd" | awk -F"," '$2 ~ /-/{ print $2}'

Using sed to replace a string with the contents of a variable, even if it's an escape character

I'm using
sed -e "s/\*DIVIDER\*/$DIVIDER/g" to replace *DIVIDER* with a user-specified string, which is stored in $DIVIDER. The problem is that I want them to be able to specify escape characters as their divider, like \n or \t. When I try this, I just end up with the letter n or t, or so on.
Does anyone have any ideas on how to do this? It will be greatly appreciated!
EDIT: Here's the meat of the script, I must be missing something.
curl --silent "$URL" > tweets.txt
if [[ `cat tweets.txt` == *\<error\>* ]]; then
grep -E '(error>)' tweets.txt | \
sed -e 's/<error>//' -e 's/<\/error>//' |
sed -e 's/<[^>]*>//g' |
head $headarg | sed G | fmt
else
echo $REPLACE | awk '{gsub(".", "\\\\&");print}'
grep -E '(description>)' tweets.txt | \
sed -n '2,$p' | \
sed -e 's/<description>//' -e 's/<\/description>//' |
sed -e 's/<[^>]*>//g' |
sed -e 's/\&amp\;/\&/g' |
sed -e 's/\&lt\;/\</g' |
sed -e 's/\&gt\;/\>/g' |
sed -e 's/\&quot\;/\"/g' |
sed -e 's/\&....\;/\?/g' |
sed -e 's/\&.....\;/\?/g' |
sed -e 's/^ *//g' |
sed -e :a -e '$!N;s/\n/\*DIVIDER\*/;ta' | # Replace newlines with *divider*.
sed -e "s/\*DIVIDER\*/${DIVIDER//\\/\\\\}/g" | # Replace *DIVIDER* with the actual divider.
head $headarg | sed G
fi
The long list of sed lines are replacing characters from an XML source, and the last two are the ones that are supposed to replace the newlines with the specified character. I know it seems redundant to replace a newline with another newline, but it was the easiest way I could come up with to let them pick their own divider. The divider replacement works great with normal characters.
You can use bash to escape the backslash like this:
sed -e "s/\*DIVIDER\*/${DIVIDER//\\/\\\\}/g"
The syntax is ${name/pattern/string}. If pattern begins with /, every occurence of pattern in name is replaced by string. Otherwise only the first occurence is replaced.
Maybe:
case "$DIVIDER" in
(*\\*) DIVIDER=$(echo "$DIVIDER" | sed 's/\\/\\\\/g');;
esac
I played with this script:
for DIVIDER in 'xx\n' 'xxx\\ddd' "xxx"
do
echo "In: <<$DIVIDER>>"
case "$DIVIDER" in (*\\*) DIVIDER=$(echo "$DIVIDER" | sed 's/\\/\\\\/g');;
esac
echo "Out: <<$DIVIDER>>"
done
Run with 'ksh' or 'bash' (but not 'sh') on MacOS X:
In: <<xx\n>>
Out: <<xx\\n>>
In: <<xxx\\ddd>>
Out: <<xxx\\\\ddd>>
In: <<xxx>>
Out: <<xxx>>
It seems to be a simple substitution:
$ d='\n'
$ echo "a*DIVIDER*b" | sed "s/\*DIVIDER\*/$d/"
a
b
Maybe I don't understand what you're trying to accomplish.
Then maybe this step could take the place of the last two of yours:
sed -n ":a;$ {s/\n/$DIVIDER/g;p;b};N;ba"
Note the space after the dollar sign. It prevents the shell from interpreting "${s..." as a variable name.
And as ghostdog74 suggested, you have way too many calls to sed. You may be able to change a lot of the pipe characters to backslashes (line continuation) and delete "sed" from all but the first one (leave the "-e" everywhere). (untested)
You just need to escape the escape char.
\n will match \n
\ will match \
\\ will match \
Using FreeBSD sed (e.g. on Mac OS X) you have to preprocess the $DIVIDER user input:
d='\n'
d='\t'
NL=$'\\\n'
TAB=$'\\\t'
d="${d/\\n/${NL}}"
d="${d/\\t/${TAB}}"
echo "a*DIVIDER*b" | sed -E -e "s/\*DIVIDER\*/${d}/"

Resources