Regular expression using sed in UNIX

I want to replace a variable using sed. To do the replacement, I need to know what is present in the file, so I want to extract that string using a regular expression.
$ cat file1.txt
select * from ${database_name}.tab_name;
I want to capture a placeholder such as ${type_database_name_env} in a string and then use the sed replace command to replace that variable with the actual name.
sed -n 's/[${][a-z][_][a-z][_][a-z][_][a-z][}]/,/./p' file1.txt
I need output as
$ var1=`sed command` # I am looking for proper sed command
$ echo $var1
${database_name}

With grep, you may use
var1="$(grep -o '\${[^{}]*}' file1.txt | head -1)"
The | head -1 is used to extract the first match in case there are more.
See the online demo:
f='select * from ${database_name}.tab_name;'
var1="$(grep -o '\${[^{}]*}' <<< "$f" | head -1)"
echo "$var1"
With sed, you may use
var1="$(sed -En 's/.*(\$\{[^{}]*}).*/\1/p' file)"
See the online demo:
f='select * from ${database_name}.tab_name;'
var1="$(sed -En 's/.*(\$\{[^{}]*}).*/\1/p' <<< "$f")"
echo "$var1"
# => ${database_name}
Regex details
.* - matches 0+ chars
(\$\{[^{}]*}) - captures into Group 1 (\1) a $ char followed with {, 0+ chars other than { and } and then a }
.* - matches 0+ chars.
As the replacement is the reference to the Group 1 text, it is all there remains after sed does its job. Note the -E option: it enables the POSIX ERE syntax where (...) are used to specify a capturing group, not \(...\).
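The question's end goal was to replace the placeholder once it is known. A minimal sketch of that follow-up step, assuming a hypothetical replacement value (sales_db) and an inline sample line instead of file1.txt:

```shell
# Hypothetical input line and replacement value, for illustration only.
line='select * from ${database_name}.tab_name;'
actual_name='sales_db'

# Extract the ${...} placeholder with the sed expression from the answer.
placeholder="$(printf '%s\n' "$line" | sed -En 's/.*(\$\{[^{}]*}).*/\1/p')"

# Drop the leading "${" and the trailing "}" to get the bare name.
name="${placeholder#??}"
name="${name%?}"

# Replace the placeholder; [$] matches a literal dollar sign.
result="$(printf '%s\n' "$line" | sed "s/[$]{$name}/$actual_name/")"

echo "$placeholder"
echo "$result"
```

Because the leading .* is greedy, the extraction picks up the last placeholder on a line when there are several; the grep -o variant is the safer choice in that case.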

You could just use awk:
$ awk -F'[ .]+' '{print $4}' file
${database_name}

Related

output of sed gives strange result when using capture groups

I'm doing the following command in a bash:
echo -e 'UNUSED\nURL: ^/tags/0.0.0/abcd' | sed -rn 's#^URL: \^/tags/([^/]+)/#\1#p'
I think this should output only the matching lines and the content of the capture group. So I'm expecting 0.0.0 as the result. But I'm getting 0.0.0abcd
Why does the output contain parts from both the left and the right side of the /? What am I doing wrong?

echo -e 'UNUSED\nURL: ^/tags/0.0.0/abcd' |
sed -rn 's#^URL: \^/tags/([^/]+)/#\1#p'
echo outputs two lines:
UNUSED
URL: ^/tags/0.0.0/abcd
The regular expression given to sed does not match the first line, so this line is not printed. The regular expression matches the second line, so URL: ^/tags/0.0.0/ is replaced with 0.0.0; only the matched part of the line is replaced, so abcd is passed unchanged.
To obtain the desired output you must also match abcd, for example with
sed -rn 's#^URL: \^/tags/([^/]+)/.*#\1#p'
where the .* eats all characters to the end of the line.
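The difference can be checked side by side. This is a small sketch using the sample line from the question; the variable names are only illustrative:

```shell
input='URL: ^/tags/0.0.0/abcd'

# Pattern stops after the second slash: the unmatched tail "abcd" is kept.
partial="$(printf '%s\n' "$input" | sed -En 's#^URL: \^/tags/([^/]+)/#\1#p')"

# Pattern also consumes the rest of the line: only the group remains.
full="$(printf '%s\n' "$input" | sed -En 's#^URL: \^/tags/([^/]+)/.*#\1#p')"

echo "$partial"
echo "$full"
```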

You can use awk:
echo -e 'UNUSED\nURL: ^/tags/0.0.0/abcd' | awk -F/ 'index($0, "^/tags/"){print $3}'
0.0.0
This awk command uses / as the field delimiter and prints the 3rd column when the text ^/tags/ is present in the input.
Alternatively, you can use GNU grep:
echo -e 'UNUSED\nURL: ^/tags/0.0.0/abcd' | grep -oP '^URL: \^/tags/\K([^/]+)'
0.0.0
Or this sed:
echo -e 'UNUSED\nURL: ^/tags/0.0.0/abcd' | sed -nE 's~^URL: \^/tags/([^/]+).*~\1~p'
0.0.0

This sed produces your desired output:
echo -e 'UNUSED\nURL: ^/tags/0.0.0/abcd' | sed -E '/URL/!d;s#.*/(.*)/[^/]*#\1#'

bash script command output execution doesn't assign full output when using backticks

I have used backticks (``) many times to capture the output of a command in a variable, but with the following code I am not getting the right output.
#!/bin/bash
export XLINE='($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER'
echo 'Original XLINE'
echo $XLINE
echo '------------------'
echo 'Extract all word with $ZWP'
#works fine
echo $XLINE | sed -e 's/\$/\n/g' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP
echo '------------------'
echo 'Assign all word with $ZWP to XVAR'
#XVAR doesn't get all the values
export XVAR=`echo $XLINE | sed -e 's/\$/\n/g' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP` #fails
echo "$XVAR"
and i get:
Original XLINE
($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER
------------------
Extract all word with $ZWP
ZWP_SCRIP_NAME
ZWP_LT_RSI_TRIGGER
ZWP_RTIMER
------------------
Assign all word with $ZWP to XVAR
ZWP_RTIMER
Why doesn't XVAR get all the values?
However, if I use $() instead of `` to capture the output, it works fine. Why are backticks not working?

Having GNU grep you can use this command:
XVAR=$(grep -oP '\$\KZWP[A-Z_]+' <<< "$XLINE")
If you pass -P, grep uses Perl-compatible regular expressions. The key here is the \K escape sequence. Basically the regex matches $ZWP followed by one or more uppercase characters or underscores. The \K after the $ removes the $ itself from the match, while its presence is still required to match the whole pattern. Call it poor man's lookbehind if you want, I like it! :)
Btw, grep -o prints every match on its own line instead of printing the whole lines that match the pattern.
If you don't have GNU grep or you care about portability you can use awk, like this:
XVAR=$(awk -F'$' '{sub(/[^A-Z_].*/, "", $2); print $2}' RS=',' <<< "$XLINE")

First, the smallest change that makes your code "work":
echo "$XLINE" | tr '$' '\n' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP_
The use of tr replaces a sed expression that didn't actually do what you thought it did -- try looking at its output to see.
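The reason the two substitution forms differ is that, inside backticks, the shell itself consumes a backslash that precedes $, ` or \ before the command runs, even within single quotes. A minimal sketch of the effect, replacing with a visible X instead of a newline (the variable names are only illustrative):

```shell
XLINE='($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER'

# Backticks: the shell turns \$ into $ before sed runs, so sed sees
# s/$/X/g and appends X at the end of the line.
with_backticks=`printf '%s\n' "$XLINE" | sed -e 's/\$/X/g'`

# $(...): the backslash survives, so sed sees s/\$/X/g and replaces
# every literal dollar sign.
with_parens=$(printf '%s\n' "$XLINE" | sed -e 's/\$/X/g')

echo "$with_backticks"
echo "$with_parens"
```

In the original script this means the first sed never split the line, so the greedy .* in the second sed left only the last ZWP name.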
One sane alternative would be to rely on GNU grep's -o option. If you can't do that...
zwpvars=( ) # create a shell array
zwp_assignment_re='[$](ZWP_[[:alnum:]_]+)(.*)' # ...and a regex
content="$XLINE"
while [[ $content =~ $zwp_assignment_re ]]; do
zwpvars+=( "${BASH_REMATCH[1]}" ) # found a reference
content=${BASH_REMATCH[2]} # stuff the remaining content aside
done
printf 'Found variable: %s\n' "${zwpvars[@]}"

shell command to remove characters after a special character in bash/shell

I have filename
hello_1.0_25.tgz
a_hello_1.25.6_154.tgz
<name>_<name1>.tgz
The output which i need is
hello_1.0
a_hello_1.25.6
<name>
How can I get the string before the last _ character in bash (or another shell)?

In bash, this is easy:
$ f=hello_1.0_25.tgz
$ echo "${f%_*}"
hello_1.0
${f%_*} simply removes the _ and anything after it from the end of the variable f.
This is more concise than other approaches that use external tools and also saves using an extra process when one isn't needed.
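As a small sketch of how the operator behaves on the other sample names, and how it differs from its greedy sibling %% (the variable names here are only illustrative):

```shell
# % removes the shortest suffix matching _*, i.e. from the last underscore on.
for f in hello_1.0_25.tgz a_hello_1.25.6_154.tgz; do
  echo "${f%_*}"
done

f=a_hello_1.25.6_154.tgz
shortest="${f%_*}"    # cut at the last underscore
longest="${f%%_*}"    # %% removes the longest match: cut at the first underscore
echo "$shortest"
echo "$longest"
```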
more tips on string manipulation in bash

Something like
sed -r 's/(.*)_.*/\1/'
Test
$ echo "hello_1.0_25.tgz" | sed -r 's/(.*)_.*/\1/'
hello_1.0
$ echo "a_hello_1.25.6_154.tgz" | sed -r 's/(.*)_.*/\1/'
a_hello_1.25.6
$ echo "<name>_<name1>.tgz" | sed -r 's/(.*)_.*/\1/'
<name>
What it does?
s substitute command
(.*) matches anything till the last _ . Saved in \1
_.* matches _ followed by the rest
/\1/ replaced with \1, first capture group
OR
sed -r 's/_[^_]+$//'
Test
$ echo "hello_1.0_25.tgz" | sed -r 's/_[^_]+$//'
hello_1.0
$ echo "a_hello_1.25.6_154.tgz" | sed -r 's/_[^_]+$//'
a_hello_1.25.6
$ echo "<name>_<name1>.tgz" | sed -r 's/_[^_]+$//'
<name>
What it does?
[^_]+ Matches anything other than _. + quantifies the previous pattern one or more times
$ matches the end of the line
// replaced with empty

this sed line should do:
sed 's/_[^_]*$//'
little test with your example:
kent$ cat f
hello_1.0_25.tgz
a_hello_1.25.6_154.tgz
<name>_<name1>.tgz
kent$ sed 's/_[^_]*$//' f
hello_1.0
a_hello_1.25.6
<name>
awk can do it for sure too:
kent$ awk -F_ -v OFS="_" 'NF--' f
hello_1.0
a_hello_1.25.6
<name>
or grep if you like:
kent$ grep -Po '.*(?=_[^_]*$)' f
hello_1.0
a_hello_1.25.6
<name>
and @Tom Fenech's bash way is nice too.

A slight variation on substring extraction:
$ m="a_hello_1.25.6_154.tgz"
$ echo "${m/%_${m/#*_/}/}"
a_hello_1.25.6
This says: ${m/#*_/} finds the text following the last _, here 154.tgz (call it stuff); ${m/%_stuff/} then removes it, together with the preceding underscore, from the end of the string. Combined, that gives the complete expression ${m/%_${m/#*_/}/}.

Try this one.
sed 's/\(.*\)_.*/\1/' file_name

With Bash regular expressions:
$ f=hello_1.0_25.tgz
$ if [[ $f =~ (.*)_.*\.tgz$ ]]; then echo "${BASH_REMATCH[1]}"; fi
hello_1.0

This awk should do:
awk -F_ '{$NF="";sub(/_$/,"")}1' OFS=_ file
hello_1.0
a_hello_1.25.6
<name>
-F_ Sets Field Separator to _
$NF="" Removes last field.
sub(/_$/,"") Removes last field separator.
1 Prints out all lines.

Split from 40900000 to 409-00-000

Does anybody know a way to convert "40900000" to "409-00-000" with a single command, sed or awk?
I have already tried a couple of ways with sed but had no luck at all. I need to do this in bulk: there are around 40k lines, and some of these lines are not in the proper format, so they need to be fixed.
Thanks in advance

Using GNU sed, I would do it like this:
sed -r 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/' filename
# or, equivalently
sed -E 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/' filename
The -r or -E enables extended regex mode, which avoids the need to escape all the parentheses
\1 is the first capture group (the bits in between the ( ))
[0-9] means the range zero to nine
{3} means three of the preceding character or range
edit: Thanks for all the comments.
On other systems that lack the -r switch, or its alias -E, you have to escape the ( ) and { } above. That leaves you with:
sed 's/\([0-9]\{3\}\)\([0-9]\{2\}\)\([0-9]\{3\}\)/\1-\2-\3/' filename
At the expense of repetition, you can avoid some of the escapes by simply repeating the [0-9]:
sed 's/\([0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9][0-9]\)/\1-\2-\3/' filename
For the record, Perl is equally capable of doing this sort of thing:
perl -pwe 's/(\d{3})(\d{2})(\d{3})/$1-$2-$3/' filename
-p means print
-w means enable warnings
-e means execute one line
\d is the "digit" character class (zero to nine)
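Since the question mentions that only some lines need fixing, one possible refinement is to anchor the pattern so that only lines consisting of exactly eight digits are rewritten. This is a sketch under that assumption, with made-up sample input:

```shell
# Anchored: only lines that are exactly eight digits are rewritten.
fixed="$(printf '%s\n' 40900000 409-00-000 1234567890 |
  sed -E 's/^([0-9]{3})([0-9]{2})([0-9]{3})$/\1-\2-\3/')"
echo "$fixed"

# Unanchored, for contrast: the first eight digits of a longer number
# are rewritten as well.
loose="$(printf '%s\n' 1234567890 |
  sed -E 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/')"
echo "$loose"
```

Lines that are already formatted, or that have a different length, pass through the anchored version unchanged.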

No need to run external commands, bash or ksh can do it themselves.
$ a=12345678
$ [ ${#a} = 8 ] && { b=${a:0:3}-${a:3:2}-${a:5};a=$b;}
$ echo $a
123-45-678
$ a=abc-de-fgh
$ [ ${#a} = 8 ] && { b=${a:0:3}-${a:3:2}-${a:5};a=$b;}
$ echo $a
abc-de-fgh

You can use sed, like this:
sed 's/\([0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9][0-9]\)/\1-\2-\3/'
or more succinctly, with extended regex syntax:
sed -E 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/'

For golfing:
$ echo "40900000" | awk '$1=$1' FIELDWIDTHS='3 2 3' OFS='-'
409-00-000

With sed:
sed 's/\(...\)\(..\)\(...\)/\1-\2-\3/'
The dot matches any character, and surrounding it with \( and \) makes it a group. The \1 references the first group.

Just for the fun of it, an awk
echo "40900000" | awk '{a=$0+0} length(a)==8 {$0=substr(a,1,3)"-"substr(a,4,2)"-"substr(a,6)}1'
409-00-000
This tests whether there are 8 digits.
A more complex version (needs GNU awk because of gensub):
echo "40900000" | awk --re-interval '{print gensub(/([0-9]{3})([0-9]{2})([0-9]{3})/,"\\1-\\2-\\3","g")}'
409-00-000
echo "409-00-000" | awk --re-interval '{print gensub(/([0-9]{3})([0-9]{2})([0-9]{3})/,"\\1-\\2-\\3","g")}'
409-00-000

Workaround from STDIN:
echo "40900000" | grep -E "[0-9]{8}" | cut -c "1-3,4-5,6-8" --output-delimiter=-
from file:
grep -E "[0-9]{8}" filename | cut -c "1-3,4-5,6-8" --output-delimiter=-
But I prefer Tom Fenech's solution.

Using grep to get the line number of first occurrence of a string in a file

I am using a bash script for testing purposes. During my testing I have to find the line number of the first occurrence of a string in a file. I have tried both "awk" and "grep", but neither of them returns the value.
Awk example
#!/bin/bash
....
VAR=searchstring
...
cpLines=$(awk '/$VAR/{print NR}' $MYDIR/Configuration.xml)
This does not expand $VAR. If I use the value of VAR directly it works, but I want to use VAR.
Grep example
#!/bin/bash
...
VAR=searchstring
...
cpLines=grep -n -m 1 $VAR $MYDIR/Configuration.xml |cut -f1 -d:
this gives error line 20: -n: command not found

grep -n -m 1 SEARCH_TERM FILE_PATH |sed 's/\([0-9]*\).*/\1/'
grep switches
-n = include line number
-m 1 = match one
sed options (stream editor):
's/X/Y/' - replace X with Y
\([0-9]*\) - matches zero or more digits; the escaped parentheses form a capture group, and the text it matches is available as \1 in the replacement string Y
\([0-9]*\).* - the trailing .* matches the rest of the line (any character, zero or more times), so the whole line is replaced by the line number alone

You need command substitution, $(), to capture the output of grep in a variable:
cpLines=$(grep -n -m 1 "$VAR" "$MYDIR/Configuration.xml" | cut -f1 -d:)

Try something like:
awk -v search="$var" '$0~search{print NR; exit}' inputFile
In awk, the text between / / is taken literally as the regular expression, so a variable name placed there is not expanded. You need to use the match operator (~) instead. Here we test each input line against the variable; if it matches, we print the line number stored in NR and exit.
-v creates an awk variable (search in the example above), to which you assign your bash variable ($var).
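Note that ~ treats the variable as a regular expression. If the search string should be taken literally, one option is awk's index() function, or grep with -F. Here is a minimal sketch against a throwaway temp file; the file contents and variable names are made up for illustration:

```shell
# Hypothetical sample file standing in for Configuration.xml.
tmp="$(mktemp)"
printf '%s\n' '<a>' '<b>needle</b>' '<c>needle</c>' > "$tmp"
var='needle'

# awk: index() does a plain substring search; exit after the first hit.
line_awk="$(awk -v search="$var" 'index($0, search) {print NR; exit}' "$tmp")"

# grep: -n prefixes line numbers, -F disables regex interpretation.
line_grep="$(grep -n -F -- "$var" "$tmp" | head -n 1 | cut -d: -f1)"

echo "$line_awk"
echo "$line_grep"
rm -f "$tmp"
```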

grep -n -m 1 SEARCH_TERM FILE_PATH | grep -Po '^[0-9]+'
explanation:
-Po = -P -o
-P use perl regex
-o only print matched string (not the whole line)

Try piping (note that this counts the matching lines rather than printing a line number):
grep -P 'SEARCH TERM' fileName.txt | wc -l
