Shell scripting selecting a part of a word

I need to extract only "v1.0.42" from the string below. There are no spaces anywhere in it:
"ansible-project-development-environment-TEMPLATE-v1.0.42-role_test_example_run_environment"

Could you please try following.
awk 'match($0,/v[0-9]+\.[0-9]+\.[0-9]+/){print substr($0,RSTART,RLENGTH)}' Input_file
2nd solution: Using GNU sed:
sed -E 's/.*(v[0-9]+\.[0-9]+\.[0-9]+).*/\1/' Input_file
OR with BRE sed as per David sir's comments:
sed 's/^.*-\(v[^-][^-]*\).*$/\1/' Input_file
3rd solution: With perl one liner.
perl -ne 'print "$&\n" if /v[0-9]+\.[0-9]+\.[0-9]+/' Input_file

If you're using Bash and the above string in var $var:
$ [[ $var =~ v([0-9]+\.?)+ ]] && echo ${BASH_REMATCH[0]}
v1.0.42

Assuming you have the string in a shell variable, you could use parameter expansion to remove the parts you don't want:
v="ansible-project-development-environment-TEMPLATE-v1.0.42-role_test_example_run_environment"
v=${v#*-TEMPLATE-} # v1.0.42-role_test_example_run_environment
v=${v%%-*} # v1.0.42
This is standard POSIX shell, not requiring any non-standard extensions.
Relevant quote:
${parameter#[word]}
Remove Smallest Prefix Pattern. The word shall be expanded to produce a pattern. The parameter expansion shall then result in
parameter, with the smallest portion of the prefix matched by the
pattern deleted. If present, word shall not begin with an unquoted
'#'.
${parameter%%[word]}
Remove Largest Suffix Pattern. The word shall be expanded to produce a pattern. The parameter expansion shall then result in
parameter, with the largest portion of the suffix matched by the
pattern deleted.
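As a rough illustration of the difference between the smallest and the largest match, here are all four operators applied to the string from the question. The variable names are made up for this sketch only.

```shell
#!/bin/sh
# Sample value taken from the question; variable names are illustrative.
s="ansible-project-development-environment-TEMPLATE-v1.0.42-role_test_example_run_environment"

shortest_prefix=${s#*-}    # drops "ansible-" (up to the first "-")
longest_prefix=${s##*-}    # drops everything up to the last "-"
shortest_suffix=${s%-*}    # drops the last "-" and what follows it
longest_suffix=${s%%-*}    # drops the first "-" and what follows it

printf '%s\n' "$shortest_prefix" "$longest_prefix" "$shortest_suffix" "$longest_suffix"
```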

try this:
echo "ansible-project-development-environment-TEMPLATE-v1.0.42-role_test_example_run_environment" | cut -d '-' -f6

Related

Using sed or awk to filter a specific pattern?

I have a file with 1000 IP addresses in the format below. After the four octets comes the source port, which I need to filter out. I am a bit new to bash scripting, so I'm struggling to grep the IP address while filtering out the port number. Any suggestions on how to use sed or awk to filter out the port number would be really appreciated.
192.168.100.1.111
192.168.200.10.111
192.168.200.128.501
192.168.150.5.300
Output desired
192.168.100.1

Too easy.
awk -F'.' '{print $1"."$2"."$3"."$4}' < /tmp/ipaddresses.txt

You can do it with the below command:-
sed 's/\.[^.]*$//' your_ip_address.txt > filtered_ip_address.txt
It will remove the last dot along with following string.
For example:-
echo 192.168.100.1.111 | sed 's/\.[^.]*$//'
Will output:-
192.168.100.1

Remember that calling awk or sed is calling a new program, while Bash is already running. Instead you could filter your input directly in Bash:
$ while read url; do echo ${url%.*}; done <<EOL
129.168.100.1.111
1291.168.200.10.111
EOL
129.168.100.1
1291.168.200.10
That is
while read url
do
echo ${url%.*}
done < urls.txt
References
The parameter expansion (see man bash) ${url%.*} takes the variable $url and strips (%) the shortest trailing part that matches the pattern .*. Note that pattern matching is different from regular expressions; in particular, its syntax is simpler. Here . matches the literal character . and * matches any string. Pattern matching is most commonly known from filename patterns such as *.txt.
${parameter%word}
${parameter%%word}
Remove matching suffix pattern. The word is expanded to produce
a pattern just as in pathname expansion. If the pattern matches
a trailing portion of the expanded value of parameter, then the
result of the expansion is the expanded value of parameter with
the shortest matching pattern (the ``%'' case) or the longest
matching pattern (the ``%%'' case) deleted. If parameter is @
or *, the pattern removal operation is applied to each
positional parameter in turn, and the expansion is the resultant
list. If parameter is an array variable subscripted with @ or
*, the pattern removal operation is applied to each member of
the array in turn, and the expansion is the resultant list.
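The last sentence of the quote suggests that a whole Bash array can be stripped in one expansion, without a loop. A minimal sketch, assuming the addresses are already in an array (the names ips and stripped are invented for this example):

```shell
#!/bin/bash
# Sample addresses from the question; the array names are illustrative.
ips=(192.168.100.1.111 192.168.200.10.111 192.168.200.128.501)

# With the [@] subscript, the suffix removal is applied to every element.
stripped=("${ips[@]%.*}")

printf '%s\n' "${stripped[@]}"
```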

An awk example:
your_port=111
awk -F\. -v port=$your_port '$NF==port{$NF="";sub(/.$/, "");print}' OFS='.' file
If you just want to remove the ports:
sed 's/\.[^\.]*$//' your_file
Or in awk:
awk -F\. '{$NF="";sub(/.$/, "")}1' OFS='.' your_file
To avoid duplicates in just one process:
awk -F\. '{$NF="";sub(/.$/, "")}!a[$0]++' OFS='.' your_file

Remove a fixed prefix/suffix from a string in Bash

I want to remove the prefix/suffix from a string. For example, given:
string="hello-world"
prefix="hell"
suffix="ld"
How do I get the following result?
"o-wor"

$ prefix="hell"
$ suffix="ld"
$ string="hello-world"
$ foo=${string#"$prefix"}
$ foo=${foo%"$suffix"}
$ echo "${foo}"
o-wor
This is documented in the Shell Parameter Expansion section of the manual:
${parameter#word}
${parameter##word}
The word is expanded to produce a pattern and matched according to the rules described below (see Pattern Matching). If the pattern matches the beginning of the expanded value of parameter, then the result of the expansion is the expanded value of parameter with the shortest matching pattern (the # case) or the longest matching pattern (the ## case) deleted. […]
${parameter%word}
${parameter%%word}
The word is expanded to produce a pattern and matched according to the rules described below (see Pattern Matching). If the pattern matches a trailing portion of the expanded value of parameter, then the result of the expansion is the value of parameter with the shortest matching pattern (the % case) or the longest matching pattern (the %% case) deleted. […]
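Because the word is treated as a pattern, the double quotes around "$prefix" in the answer above matter. A small sketch of the difference, using a hypothetical prefix that contains a glob character:

```shell
#!/bin/bash
string="hello-world"
prefix="h*-"   # hypothetical prefix containing a glob character

unquoted=${string#$prefix}     # "h*-" acts as a pattern and matches "hello-"
quoted=${string#"$prefix"}     # literal "h*-" does not match, nothing removed

printf '%s\n' "$unquoted" "$quoted"
```

Quoting is therefore the safer default when the prefix or suffix is meant as fixed text.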

Using sed:
$ echo "$string" | sed -e "s/^$prefix//" -e "s/$suffix$//"
o-wor
Within the sed command, the ^ character matches text beginning with $prefix, and the trailing $ matches text ending with $suffix.
Adrian Frühwirth makes some good points in the comments below, but sed for this purpose can be very useful. The fact that the contents of $prefix and $suffix are interpreted by sed can be either good OR bad- as long as you pay attention, you should be fine. The beauty is, you can do something like this:
$ prefix='^.*ll'
$ suffix='ld$'
$ echo "$string" | sed -e "s/^$prefix//" -e "s/$suffix$//"
o-wor
which may be what you want, and is both fancier and more powerful than bash variable substitution. If you remember that with great power comes great responsibility (as Spiderman says), you should be fine.
A quick introduction to sed can be found at http://evc-cit.info/cit052/sed_tutorial.html
A note regarding the shell and its use of strings:
For the particular example given, the following would work as well:
$ echo $string | sed -e s/^$prefix// -e s/$suffix$//
...but only because:
echo doesn't care how many strings are in its argument list, and
There are no spaces in $prefix and $suffix
It's generally good practice to quote a string on the command line because even if it contains spaces it will be presented to the command as a single argument. We quote $prefix and $suffix for the same reason: each edit command to sed will be passed as one string. We use double quotes because they allow for variable interpolation; had we used single quotes the sed command would have gotten a literal $prefix and $suffix which is certainly not what we wanted.
Notice, too, my use of single quotes when setting the variables prefix and suffix. We certainly don't want anything in the strings to be interpreted, so we single quote them so no interpolation takes place. Again, it may not be necessary in this example but it's a very good habit to get into.

$ string="hello-world"
$ prefix="hell"
$ suffix="ld"
$ #remove "hell" from "hello-world" if "hell" is found at the beginning.
$ prefix_removed_string=${string/#$prefix}
$ #remove "ld" from "o-world" if "ld" is found at the end.
$ suffix_removed_String=${prefix_removed_string/%$suffix}
$ echo $suffix_removed_String
o-wor
Notes:
#$prefix : adding # makes sure that substring "hell" is removed only if it is found in beginning.
%$suffix : adding % makes sure that substring "ld" is removed only if it is found in end.
Without these, the first occurrence of "hell" or "ld" would be removed wherever it is found, even in the middle of the string.

I use grep for removing prefixes from paths (which aren't handled well by sed):
echo "$input" | grep -oP "^$prefix\K.*"
\K removes from the match all the characters before it.

Do you know the length of your prefix and suffix? In your case:
result=$(echo $string | cut -c5- | rev | cut -c3- | rev)
Or more general:
result=$(echo $string | cut -c$((${#prefix}+1))- | rev | cut -c$((${#suffix}+1))- | rev)
But the solution from Adrian Frühwirth is way cool! I didn't know about that!

Small and universal solution:
expr "$string" : "$prefix\(.*\)$suffix"

Using the =~ operator:
$ string="hello-world"
$ prefix="hell"
$ suffix="ld"
$ [[ "$string" =~ ^$prefix(.*)$suffix$ ]] && echo "${BASH_REMATCH[1]}"
o-wor

NOTE: Not sure if this was possible back in 2013 but it's certainly possible today (10 Oct 2021) so adding another option ...
Since we're dealing with known fixed length strings (prefix and suffix) we can use a bash substring to obtain the desired result with a single operation.
Inputs:
string="hello-world"
prefix="hell"
suffix="ld"
Plan:
bash substring syntax: ${string:<start>:<length>}
skipping over prefix="hell" means our <start> will be 4
<length> will be total length of string (${#string}) minus the lengths of our fixed length strings (4 for hell / 2 for ld)
This gives us:
$ echo "${string:4:(${#string}-4-2)}"
o-wor
NOTE: the parens can be removed and still obtain the same result
If the values of prefix and suffix are unknown, or could vary, we can still use this same operation but replace 4 and 2 with ${#prefix} and ${#suffix}, respectively:
$ echo "${string:${#prefix}:${#string}-${#prefix}-${#suffix}}"
o-wor
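Note that the substring approach slices by length only, so it would cut characters off even if the string did not actually start with the prefix or end with the suffix. One possible guard, sketched here with an illustrative result variable, is to test the shape of the string first:

```shell
#!/bin/bash
string="hello-world"
prefix="hell"
suffix="ld"

# Slice only if the string really starts with $prefix and ends with $suffix;
# the quotes make both be compared as literal text, not as patterns.
if [[ $string == "$prefix"*"$suffix" ]]; then
    result=${string:${#prefix}:${#string}-${#prefix}-${#suffix}}
else
    result=$string
fi

echo "$result"
```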

Using @Adrian Frühwirth's answer:
function strip {
local STRING=${1#"$2"}
echo "${STRING%"$2"}"
}
use it like this
HELLO=":hello:"
HELLO=$(strip "$HELLO" ":")
echo $HELLO # hello

How to ensure I have exactly 2 spaces before string and zero spaces after

I get a string that can have from zero to multiple leading and trailing spaces.
I'm trying to get rid of them without lot of hackery but my code looks huge.
How to do this in a clean way?

as easy as:
$ src="    some text   "
$ dst="  $(echo $src)"
$ echo ":$dst:"
:  some text:
$(echo $src) gets rid of all surrounding spaces (the unquoted expansion is word-split, which also collapses runs of inner whitespace).
Then you simply add as many spaces as you need in front of it.

How are you calling out the string? If it's an echo you can just put
Echo "<2 spaces>". "string";
If it's a normal string, you just put 2 spaces between the first quote and the string.
"<2spaces> string here"

One way using GNU sed:
sed 's/^[ \t]*/  /; s/[ \t]*$//' file.txt
You can apply this to a bash variable like this:
echo "$string" | sed 's/^[ \t]*/  /; s/[ \t]*$//'
And save it like this:
variable=$(echo "$string" | sed 's/^[ \t]*/  /; s/[ \t]*$//')
Explanation:
The first substitution removes all leading whitespace and replaces it with two spaces.
The second substitution simply removes all trailing whitespace from the line.

The simplest is probably to use an external process.
value=$(echo "$value" | sed 's/^ *\(.*[^ ]\) *$/  \1/')
If you need to transform an empty string into two spaces, you'll need to modify the regex; and if you're not on Linux, your sed dialect may differ slightly. For maximum portability, switch to awk or Perl, or do it all in Bash. That gets a bit more complex, but for a start, trailing=${value##*[! ]} contains any trailing spaces, and you can trim them off with ${value%$trailing}, and similarly for leading spaces. See the section on variable substitution in the Bash manual for details.
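The all-Bash route hinted at above could look roughly like this. It is only a sketch: the sample value is invented, and it handles plain spaces only, not tabs.

```shell
#!/bin/bash
value="    some text   "   # sample input; the amount of spacing is arbitrary

trailing=${value##*[! ]}      # whatever follows the last non-space character
value=${value%"$trailing"}    # trim the trailing spaces
leading=${value%%[! ]*}       # whatever precedes the first non-space character
value=${value#"$leading"}     # trim the leading spaces
value="  $value"              # put exactly two spaces in front

printf ':%s:\n' "$value"
```

No external process is started, which can matter inside a tight loop.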

You can use a regular expression to match everything between the leading and trailing spaces. The matched text is found in the BASH_REMATCH array (the text matching the first parentheses group is in element 1).
spcs='\ *'
text='.*[^ ]'
[[ $src =~ ^$spcs($text)$spcs$ ]]
dst="  ${BASH_REMATCH[1]}"

Regexp in bash for number between "quotes"

Input:
hello world "22" bye world
I need a regex that will work in bash that can get me the numbers between the quotes. The regex should match 22.
Thanks!

Hmm have you tried \"([0-9]+)\" ?

In Bash >= 3.2:
while read -r line
do
[[ $line =~ .*\"([0-9]+)\".* ]]
echo "${BASH_REMATCH[1]}"
done < inputfile.txt
Same thing using sed so it's more portable:
while read -r line
do
result=$(printf '%s\n' "$line" | sed -n 's/.*"\([0-9][0-9]*\)".*/\1/p')
echo "$result"
done < inputfile.txt

Pure Bash, no Regex. Number is in array element 1.
IFS=\" # input field separator is a double quote now
while read -a line ; do
echo -e "${line[1]}"
done < "$infile"

There are not really regexes in bash itself. There are however some programs that can use regexes, amongst them grep and sed.
grep's main functionality is to filter lines that match a given regex, ie you give it some data to stdin or a file and it prints the lines that match the regex.
sed transforms data. It doesn't just return the matching lines; you can tell it what to return with the s/regex/replacement/ command. The replacement part can contain references to groups (\x, where x is the number of the group). With the -r option (extended regexes) the groups are written with plain parentheses.
So what we need is sed. Your input contains some stuff (^.*), a ", some digits ([0-9]+), a ", and some stuff (.*$). We later need to reference the digits, so we need to make the digits a group. So our complete matching regex is: ^.*"([0-9]+)".*$. We want to replace that with only the digits, so the replacement part is just \1.
Building the complete sed command is left as an exercise to you :-)
(Note that sed does not transform lines that don't match. If your input is only the line you provided above, that's fine. If there are other lines you'd like to silently skip, you need to specify the option -n (no automatic printing) and add a p to the end of the sed expression, which instructs it to print the line. That way it only prints the matching line(s).)
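For reference, one way the pieces described above could be put together is sketched below. It uses -E, which both GNU and BSD sed accept for extended regexes (GNU sed also takes -r); the variable names are illustrative.

```shell
#!/bin/sh
line='hello world "22" bye world'

# -n suppresses automatic printing; the trailing p prints only the lines
# on which the substitution succeeded. \1 is the group with the digits.
number=$(printf '%s\n' "$line" | sed -n -E 's/^.*"([0-9]+)".*$/\1/p')

echo "$number"
```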

How to insert a newline in front of a pattern?

How to insert a newline before a pattern within a line?
For example, this will insert a newline behind the regex pattern.
sed 's/regex/&\n/g'
How can I do the same but in front of the pattern?
Given this sample input file, the pattern to match on is the phone number.
some text (012)345-6789
Should become
some text
(012)345-6789

This works in bash and zsh, tested on Linux and OS X:
sed 's/regexp/\'$'\n/g'
In general, for $ followed by a string literal in single quotes bash performs C-style backslash substitution, e.g. $'\t' is translated to a literal tab. Plus, sed wants your newline literal to be escaped with a backslash, hence the \ before $. And finally, the dollar sign itself shouldn't be quoted so that it's interpreted by the shell, therefore we close the quote before the $ and then open it again.
Edit: As suggested in the comments by @mklement0, this works as well:
sed $'s/regexp/\\\n/g'
What happens here is: the entire sed command is now a C-style string, which means the backslash that sed requires to be placed before the new line literal should now be escaped with another backslash. Though more readable, in this case you won't be able to do shell string substitutions (without making it ugly again.)
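Applied to the phone number from the question, the same quoting trick might look like the sketch below. The regex is only a guess at the number format shown in the sample, and & is added after the newline so that the matched text is kept.

```shell
#!/bin/bash
input='some text (012)345-6789'

# '...\' ends the first quoted part with a backslash, $'\n' supplies a real
# newline, and '&/' continues the sed script; & stands for the matched text.
# In a basic regex ( and ) are literal, and \{3\} means "exactly three".
output=$(printf '%s\n' "$input" |
    sed 's/([0-9]\{3\})[0-9]\{3\}-[0-9]\{4\}/\'$'\n''&/')

printf '%s\n' "$output"
```

The first output line keeps the space that preceded the number.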

Some of the other answers didn't work for my version of sed.
Switching the position of & and \n did work.
sed 's/regexp/\n&/g'
Edit: This doesn't seem to work on OS X, unless you install gnu-sed.

In sed, you can't add newlines in the output stream easily. You need to use a continuation line, which is awkward, but it works:
$ sed 's/regexp/\
&/'
Example:
$ echo foo | sed 's/.*/\
&/'
foo
See here for details. If you want something slightly less awkward you could try using perl -pe with match groups instead of sed:
$ echo foo | perl -pe 's/(.*)/\n$1/'
foo
$1 refers to the first matched group in the regular expression, where groups are in parentheses.

On my mac, the following inserts a single 'n' instead of newline:
sed 's/regexp/\n&/g'
This replaces with newline:
sed "s/regexp/\\`echo -e '\n\r'`/g"

echo one,two,three | sed 's/,/\
/g'

You can use perl one-liners much like you do with sed, with the advantage of full perl regular expression support (which is much more powerful than what you get with sed). There is also very little variation across *nix platforms - perl is generally perl. So you can stop worrying about how to make your particular system's version of sed do what you want.
In this case, you can do
perl -pe 's/(regex)/\n$1/'
-pe puts perl into a "execute and print" loop, much like sed's normal mode of operation.
' quotes everything else so the shell won't interfere
() surrounding the regex is a grouping operator. $1 on the right side of the substitution prints out whatever was matched inside these parens.
Finally, \n is a newline.
Regardless of whether you are using parentheses as a grouping operator, you have to escape any parentheses you are trying to match. So a regex to match the pattern you list above would be something like
\(\d\d\d\)\d\d\d-\d\d\d\d
\( or \) matches a literal paren, and \d matches a digit.
Better:
\(\d{3}\)\d{3}-\d{4}
I imagine you can figure out what the numbers in braces are doing.
Additionally, you can use delimiters other than / for your regex. So if you need to match / you won't need to escape it. Either of the below is equivalent to the regex at the beginning of my answer. In theory you can substitute any character for the standard /'s.
perl -pe 's#(regex)#\n$1#'
perl -pe 's{(regex)}{\n$1}'
A couple final thoughts.
using -ne instead of -pe acts similarly, but doesn't automatically print at the end. It can be handy if you want to print on your own. E.g., here's a grep-alike (m/foobar/ is a regex match):
perl -ne 'if (m/foobar/) {print}'
If you are finding dealing with newlines troublesome, and you want it to be magically handled for you, add -l. Not useful for the OP, who was working with newlines, though.
Bonus tip - if you have the pcre package installed, it comes with pcregrep, which uses full perl-compatible regexes.

In this case, I do not use sed. I use tr.
cat Somefile |tr ',' '\012'
This replaces each comma with a newline (octal 012 is the line feed character, not a carriage return).

To insert a newline to output stream on Linux, I used:
sed -i "s/def/abc\\ndef/" file1
Where file1 was:
def
Before the sed in-place replacement, and:
abc
def
After the sed in-place replacement. Please note the use of \\n: inside double quotes the shell reduces \\ to a single backslash, so GNU sed receives \n. If the pattern contains a ", escape it as \".

Hmm, just escaped newlines seem to work in more recent versions of sed (I have GNU sed 4.2.1),
dev:~/pg/services/places> echo 'foobar' | sed -r 's/(bar)/\n\1/;'
foo
bar

echo pattern | sed -E -e $'s/^(pattern)/\\\n\\1/'
worked fine on El Capitan, with support for () groups

In my case the below method works.
sed -i 's/playstation/PS4/' input.txt
Can be written as:
sed -i 's/playstation/PS4\nplaystation/' input.txt
PS4
playstation
Consider using \\n while using it in a string literal.
sed : the stream editor
-i : edits the source file in place
/ : the delimiter
I hope the above information works for you 😃.

in sed you can reference groups in your pattern with "\1", "\2", ....
so if the pattern you're looking for is "PATTERN", and you want to insert "BEFORE" in front of it, you can use, sans escaping
sed 's/(PATTERN)/BEFORE\1/g'
i.e.
sed 's/\(PATTERN\)/BEFORE\1/g'

You can also do this with awk, using -v to provide the pattern:
awk -v patt="pattern" '$0 ~ patt {gsub(patt, "\n"patt)}1' file
This checks whether a line contains the given pattern. If so, it inserts a newline in front of every occurrence of it.
See a basic example:
$ cat file
hello
this is some pattern and we are going ahead
bye!
$ awk -v patt="pattern" '$0 ~ patt {gsub(patt, "\n"patt)}1' file
hello
this is some
pattern and we are going ahead
bye!
Note that it affects every occurrence of the pattern in a line:
$ cat file
this pattern is some pattern and we are going ahead
$ awk -v patt="pattern" '$0 ~ patt {gsub(patt, "\n"patt)}1' file
this
pattern is some
pattern and we are going ahead
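If only the first occurrence on each line should be affected, sub() can be used in place of gsub(). A small sketch with the same sample line (the shell variable names are illustrative):

```shell
#!/bin/sh
line='this pattern is some pattern and we are going ahead'

# sub() replaces only the first match on each line, unlike gsub().
result=$(printf '%s\n' "$line" |
    awk -v patt="pattern" '{ sub(patt, "\n" patt) } 1')

printf '%s\n' "$result"
```

The explicit $0 ~ patt test is dropped here because sub() simply does nothing on lines without a match.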

sed -e 's/regexp/\0\n/g'
\0 stands for the whole match (in GNU sed it is the same as &), so the matched text is kept and then...
\n is the new line
This doesn't work on some flavors of Unix, but I think it's the solution to your problem.
echo "Hello" | sed -e 's/Hello/\0\ntmow/g'
Hello
tmow

This works on a Mac for me:
sed -i.bak -e 's/regex/xregex/g' input.txt
sed -i.bak -e 's/qregex/\'$'\nregex/g' input.txt
I don't know whether it's a perfect one...

After reading all the answers to this question, it still took me many attempts to get the correct syntax to the following example script:
#!/bin/bash
# script: add_domain
# using fixed values instead of command line parameters $1, $2
# to show typical variable values in this example
ipaddr="127.0.0.1"
domain="example.com"
# no need to escape $ipaddr and $domain values if we use separate quotes.
sudo sed -i '$a \\n'"$ipaddr www.$domain $domain" /etc/hosts
The script appends a newline \n followed by another line of text to the end of a file using a single sed command.

In vi on Red Hat, I was able to insert carriage returns using just the \r character. I believe this internally executes 'ex' instead of 'sed', but it's similar, and vi can be another way to do bulk edits such as code patches. For example, I am surrounding a search term with an if statement that insists on carriage returns after the braces:
:.,$s/\(my_function(.*)\)/if(!skip_option){\r\t\1\r\t}/
Note that I also had it insert some tabs to make things align better.

Just to add to the list of many ways to do this, here is a simple python alternative. You could of course use re.sub() if a regex were needed.
python -c 'print(open("./myfile.txt", "r").read().replace("String to match", "String to match\n"))' > myfile_lines.txt

sed 's/regexp/\'$'\n/g'
works, as explained in detail by mojuba in his answer.
However, this did not work:
sed 's/regexp/\\\n/g'
It added a new line, but at the end of the original line, a \n was added.
