Substring from a string in bash using scripting language - bash

How can we fetch a substring from a string in bash using scripting language?
Example:
fullstring="mnuLOCNMOD.URL = javascript:parent.doC...something"
The substring I want is everything before ".URL" in the full string.

With Parameter Expansion, you can do:
fullstring="mnuLOCNMOD.URL = javascript:parent.doC...something"
echo ${fullstring%\.URL*}
prints:
mnuLOCNMOD
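The pattern operators behind this one-liner can be sketched as follows. This is a minimal illustration on the question's string; the variable names before_url, before_dot and after_eq are made up for the example.

```shell
fullstring="mnuLOCNMOD.URL = javascript:parent.doC...something"

# %  removes the shortest suffix that matches the pattern,
# %% removes the longest suffix that matches the pattern.
before_url="${fullstring%.URL*}"   # mnuLOCNMOD (text before the last ".URL")
before_dot="${fullstring%%.*}"     # mnuLOCNMOD (text before the first ".")

# #  removes the shortest matching prefix; here it keeps what follows " = ".
after_eq="${fullstring#* = }"      # javascript:parent.doC...something

printf '%s\n' "$before_url" "$before_dot" "$after_eq"
```

These expansions are handled by the shell itself, so no external process is started.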

$ fullstring="mnuLOCNMOD.URL = javascript:parent.doC...something"
$ sed -r 's/^(.*)\.URL.*$/\1/g' <<< "$fullstring"
mnuLOCNMOD
$

You can use grep:
echo "mnuLOCNMOD.URL = javas" | grep -oP '\w+(?=\.URL)'
and assign the result to a variable. I used a positive lookahead, (?=regex), because it is a zero-length assertion: it has to match, but it is not included in the output.
The -o flag prints only the matched part of the line, and -P enables Perl-compatible regular expressions (a GNU grep feature); see grep --help.

Parameter Expansion is the way to go.
If you are interested in a simple grep:
% fullstring="mnuLOCNMOD.URL = javascript:parent.doC...something"
% grep -o '^[^.]*' <<<"$fullstring"
mnuLOCNMOD

fullstring="mnuLOCNMOD.URL = javascript:parent.doC...something"
menuID=`echo $fullstring | cut -f 1 -d '.'`
Here I used a dot as the separator.
This works in .sh files.

To offer yet another alternative: Bash's regular-expression matching operator, =~:
fullstring="mnuLOCNMOD.URL = javascript:parent.doC...something"
echo "$([[ $fullstring =~ ^(.*)'.URL' ]] && echo "${BASH_REMATCH[1]}")"
Note how the (one and only) capture group ((.*)) is reported through element 1 of the special "${BASH_REMATCH[@]}" array variable.
While in this case l3x's parameter expansion solution is simpler, =~ generally offers more flexibility.
awk offers an easy solution as well:
echo "$(awk -F'\\.URL' '{ print $1 }' <<<"$fullstring")"
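When the match might fail, the =~ approach can be written as an ordinary if statement so the no-match case is handled explicitly. A minimal sketch for Bash; the names re and menu_id are made up for the example.

```shell
fullstring="mnuLOCNMOD.URL = javascript:parent.doC...something"

# Keeping the regex in a variable avoids quoting surprises inside [[ ]].
re='^(.*)\.URL'

if [[ $fullstring =~ $re ]]; then
    menu_id=${BASH_REMATCH[1]}   # first capture group
else
    menu_id=''                   # no ".URL" found
fi

printf '%s\n' "$menu_id"         # mnuLOCNMOD
```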

Related

shell script concatenation is printing double quotes"" [duplicate]

Below is the snippet of a shell script from a larger script. It removes the quotes from the string that is held by a variable. I am doing it using sed, but is it efficient? If not, then what is the efficient way?
#!/bin/sh
opt="\"html\\test\\\""
temp=`echo $opt | sed 's/.\(.*\)/\1/' | sed 's/\(.*\)./\1/'`
echo $temp
Use tr to delete ":
echo "$opt" | tr -d '"'
NOTE: This does not fully answer the question, removes all double quotes, not just leading and trailing. See other answers below.
There's a simpler and more efficient way, using the native shell prefix/suffix removal feature:
temp="${opt%\"}"
temp="${temp#\"}"
echo "$temp"
${opt%\"} will remove the suffix " (escaped with a backslash to prevent shell interpretation).
${temp#\"} will remove the prefix " (escaped with a backslash to prevent shell interpretation).
Another advantage is that it will remove surrounding quotes only if there are surrounding quotes.
BTW, your solution always removes the first and last character, whatever they may be (of course, I'm sure you know your data, but it's always better to be sure of what you're removing).
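If the quotes should be removed only when both a leading and a trailing quote are present, the same prefix/suffix removal can be guarded with a case pattern. A sketch that should work in any POSIX shell; strip_pair is a hypothetical helper name, not something from the answers above.

```shell
# Print $1 without its surrounding double quotes, but only if it has both.
strip_pair() {
    case $1 in
        \"*\")                      # starts and ends with a double quote
            s=${1#\"}               # drop the leading quote
            s=${s%\"}               # drop the trailing quote
            printf '%s\n' "$s" ;;
        *)
            printf '%s\n' "$1" ;;   # anything else is printed unchanged
    esac
}

strip_pair '"html\test\"'    # html\test\
strip_pair '"only first'     # "only first
```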
Using sed:
echo "$opt" | sed -e 's/^"//' -e 's/"$//'
(Improved version, as indicated by jfgagne, getting rid of echo)
sed -e 's/^"//' -e 's/"$//' <<<"$opt"
This replaces a leading " with nothing and a trailing " with nothing, in the same invocation. There is no need to pipe into a second sed: with -e you can give several editing commands to one sed process.
If you're using jq and trying to remove the quotes from the result, the other answers will work, but there's a better way. By using the -r option, you can output the result with no quotes.
$ echo '{"foo": "bar"}' | jq '.foo'
"bar"
$ echo '{"foo": "bar"}' | jq -r '.foo'
bar
There is a straightforward way using xargs:
> echo '"quoted"' | xargs
quoted
xargs uses echo as the default command if no command is provided, and it strips quotes from the input. Note, however, that this works only if the string does not contain additional quotes. If it does, xargs will either fail (unbalanced quotes) or remove all of them.
If you came here for aws cli --query, try adding --output text.
You can do it with only one call to sed:
$ echo "\"html\\test\\\"" | sed 's/^"\(.*\)"$/\1/'
html\test\
The shortest way around - try:
echo $opt | sed "s/\"//g"
This removes all double quotes from opt. (Are there really going to be any double quotes other than at the beginning and the end? If not, it amounts to the same thing and is much shorter. ;-))
The easiest solution in Bash:
$ s='"abc"'
$ echo $s
"abc"
$ echo "${s:1:-1}"
abc
This is called substring expansion (see the GNU Bash manual and search for ${parameter:offset:length}). In this example it takes the substring of s starting at position 1 (the second character, since offsets count from 0) and ending just before the last character. This works because a negative length is interpreted as an offset from the end of parameter rather than as a number of characters; negative lengths require Bash 4.2 or newer.
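A few variants of substring expansion, as a sketch for Bash. The variable names inner, portable and last are made up for the example.

```shell
s='"abc"'

inner=${s:1:-1}           # abc  (negative length, Bash 4.2 or newer)
portable=${s:1:${#s}-2}   # abc  (explicit length, also works on older Bash)
last=${s: -1}             # "    (the space keeps this distinct from ${s:-1})

printf '%s\n' "$inner" "$portable" "$last"
```

Unlike the prefix/suffix removal shown earlier, this drops the first and last character whatever they are.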
Update
A simple and elegant answer from Stripping single and double quotes in a string using bash / standard Linux commands only:
BAR=$(eval echo $BAR) strips quotes from BAR. Because eval runs the value as shell code, use this only on input you trust.
=============================================================
Based on hueybois's answer, I came up with this function after much trial and error:
function stripStartAndEndQuotes {
local temp
# Copy the value of the variable named by $1, minus one trailing double quote
eval "temp=\${$1%\\\"}"
# Remove one leading double quote
temp="${temp#\"}"
# Assign the result back to the variable named by $1
eval "$1=\$temp"
}
The function prints nothing; it changes the named variable in place.
Usage:
$ BAR='"FOO BAR"'
$ echo "$BAR"
"FOO BAR"
$ stripStartAndEndQuotes "BAR"
$ echo "$BAR"
FOO BAR
This is the most discrete way without using sed:
x='"fish"'
printf " quotes: %s\nno quotes: %s\n" "$x" "${x//\"/}"
Or
echo $x
echo ${x//\"/}
Output:
quotes: "fish"
no quotes: fish
I got this from a source.
Linux=`cat /etc/os-release | grep "ID" | head -1 | awk -F= '{ print $2 }'`
echo $Linux
Output:
"amzn"
Simplest ways to remove double quotes from variables are
Linux=`echo "$Linux" | tr -d '"'`
Linux=$(eval echo $Linux)
Linux=`echo ${Linux//\"/}`
Linux=`echo $Linux | xargs`
All of these produce the output without double quotes:
echo $Linux
amzn
I know this is a very old question, but here is another sed variation, which may be useful to someone. Unlike some of the others, it only replaces double quotes at the start or end...
echo "$opt" | sed -r 's/^"|"$//g'
If you need to match single or double quotes, and only strings that are properly quoted, you can use this slightly more complex regex:
echo $opt | sed -E "s|^(['\"])(.*)\1$|\2|g"
This uses backreferences to ensure the quote at the end is the same as the one at the start.
In Bash, you could use the following one-liner:
[[ "${var}" == \"*\" || "${var}" == \'*\' ]] && var="${var:1:-1}"
This will remove surrounding quotes (both single and double) from the string stored in var while keeping quote characters inside the string intact. Also, this won't do anything if there's only a single leading quote or only a single trailing quote or if there are mixed quote characters at start/end.
Wrapped in a function:
#!/usr/bin/env bash
# Strip surrounding quotes from string [$1: variable name]
function strip_quotes() {
local -n var="$1"
[[ "${var}" == \"*\" || "${var}" == \'*\' ]] && var="${var:1:-1}"
}
str="'hello world'"
echo "Before: ${str}"
strip_quotes str
echo "After: ${str}"
My version
strip_quotes() {
while [[ $# -gt 0 ]]; do
local value=${!1}
local len=${#value}
[[ ${value:0:1} == \" && ${value:$len-1:1} == \" ]] && declare -g $1="${value:1:$len-2}"
shift
done
}
The function accepts variable name(s) and strips quotes in place. It only strips a matching pair of leading and trailing quotes. It doesn't check if the trailing quote is escaped (preceded by \ which is not itself escaped).
In my experience, general-purpose string utility functions like this (I have a library of them) are most efficient when manipulating the strings directly, not using any pattern matching and especially not creating any sub-shells, or calling any external tools such as sed, awk or grep.
var1="\"test \\ \" end \""
var2=test
var3=\"test
var4=test\"
echo before:
for i in var{1,2,3,4}; do
echo $i="${!i}"
done
strip_quotes var{1,2,3,4}
echo
echo after:
for i in var{1,2,3,4}; do
echo $i="${!i}"
done
I use this regular expression, which avoids removing quotes from strings that are not properly quoted. The outputs for different inputs are shown below; only the string with both a leading and a trailing quote is changed:
echo '"only first' | sed 's/^"\(.*\)"$/\1/'
Output: >"only first<
echo 'only last"' | sed 's/^"\(.*\)"$/\1/'
Output: >only last"<
echo '"both"' | sed 's/^"\(.*\)"$/\1/'
Output: >both<
echo '"space after" ' | sed 's/^"\(.*\)"$/\1/'
Output: >"space after" <
echo ' "space before"' | sed 's/^"\(.*\)"$/\1/'
Output: > "space before"<
STR='"0.0.0"' ## OR STR="\"0.0.0\""
echo "${STR//\"/}"
## Output: 0.0.0
There is another way to do it. Like:
echo ${opt:1:-1}
If you try to remove quotes because the Makefile keeps them, try this:
$(subst $\",,$(YOUR_VARIABLE))
Based on another answer: https://stackoverflow.com/a/10430975/10452175

How can I increment an infix variable in Bash?

I have a string foo-0 that I want to convert to bar1baz, i.e., parse the trailing index, increment it, and add a prefix/suffix. The part before the trailing index (in this case foo-) can also contain numeric characters, but those should not be changed.
I tried the following:
echo foo-0 | cut -d'-' -f 2 | sed 's/.*/bar&baz/'
but that gives me only a partial solution (bar0baz). How can I increment the infix variable?
EDIT: the solutions below only work partially for what I am trying to achieve. This is my fault because I simplified the example above too much for the sake of clarity.
The final goal is to set an environmental variable (let's call it MY_ENV) to the output value using bash with the following syntax:
/bin/sh -c "echo $var | ... (some bash magic to replace the trailing index) | ... (some bash magic to set MY_ENV=the output of the pipe)"
Side note: The reason I am using /bin/sh -c "..." is because I want to use the command in a Kubernetes YAML.
Partial solution (using awk)
This works:
echo foo-0 | awk -F- '{print "bar" $2+1 "baz"}'
This doesn't (output is 1baz):
/bin/sh -c "echo foo-0 | awk -F- '{print \"bar\" $2+1 \"baz\"}'"
Partial solution (using arithmetic context and parameter expansion)
$ var=foo-0
$ echo "bar$((${var//[![:digit:]]}+1))baz"
This does not work if var contains other numeric characters before the trailing index (e.g. for var=r2a-foo-0).
You may use awk:
awk -F- '{print "bar" $2+1 "baz"}' <<< 'foo-0'
bar1baz
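A likely reason the /bin/sh -c attempt in the question prints 1baz is that $2 sits inside double quotes, so the calling shell expands it (typically to nothing) before awk ever runs. Escaping the dollar sign passes it through to awk. The sketch below also swaps $2 for $NF (the last field), which is my own substitution so that inputs such as r2a-foo-0 work; the variable name out is made up for the example.

```shell
# \$NF is escaped so that the outer shell leaves it for awk to interpret.
out=$(/bin/sh -c "echo r2a-foo-0 | awk -F- '{print \"bar\" \$NF+1 \"baz\"}'")
echo "$out"    # bar1baz
```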
You could use an arithmetic context and parameter expansion:
$ var=foo-0
$ echo "bar$((${var//[![:digit:]]}+1))baz"
bar1baz
Unrolled, from the inside:
${var//[![:digit:]]} removes all non-digits from var:
$ echo "${var//[![:digit:]]}"
0
$((blah+1)) adds 1 to the variable blah:
$ blah=0
$ echo "$((blah+1))"
1
or, instead of blah, we can use the result of the inner substitution:
$ echo "$(( ${var//[![:digit:]]} + 1 ))"
1
and finally, putting this between bar and baz, you get bar1baz.
Amending for the other case brought up: assuming there might be other digits and we want to increment only the trailing ones, e.g.,
var=2a-foo-21
To do this, we can use nested parameter expansion with extended globs (shopt -s extglob) and the +(pattern) pattern, which matches one or more of pattern. Observe:
$ echo "${var#"${var%%+([[:digit:]])}"}"
21
The outer expansion is ${var#pattern}, which removes the shortest match of pattern from the beginning of $var. For pattern, we use
"${var%%+([[:digit:]])}"
which is "remove the longest match of +([[:digit:]]) (one or more digits) from the end of $var". This leaves us with just the trailing digits, and incrementing them and adding string before and after looks something like this:
$ echo "bar$((${var#"${var%%+([[:digit:]])}"}+1))baz"
bar22baz
This is so unreadable that I'd suggest using regex instead:
$ re='([[:digit:]]+)$'
$ [[ $var =~ $re ]]
$ echo "bar$((${BASH_REMATCH[1]}+1))baz"
bar22baz
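Since the question targets /bin/sh, where [[ =~ ]] and extglob may not be available, here is a sketch that uses only POSIX parameter expansion. It assumes the index is everything after the last hyphen and is a plain decimal number; a value with a leading zero such as 08 would be read as octal by the arithmetic expansion. The name idx is made up for the example.

```shell
var=r2a-foo-0

idx=${var##*-}                # remove the longest prefix ending in "-": leaves 0
MY_ENV="bar$((idx + 1))baz"   # increment and add the prefix/suffix

echo "$MY_ENV"                # bar1baz
```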

to extract pattern from a variable in unix

I have a pattern for example
type=hello(10,1)
How can I extract text alone into one variable and the contents into another variable in shell script.
The desired output is
text=hello
p1=10
p2=1
Appreciate any help in this regard.
Thanks
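You can use awk with -F to define the field delimiter.
In this case the delimiter can be the bracket expression [=(,)], quoted so that the shell does not interpret it:
$ y=$(echo "type=hello(10,1)" | awk -F '[=(,)]' '{print $2}')
$ echo $y
hello
awk fields are numbered from 1, so $1 = type, $2 = hello, $3 = 10 and $4 = 1. The closing parenthesis is part of the delimiter set so that it does not stay attached to the last field.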
If you are using bash, you can use a regular expression with capture groups. A simple example:
[[ $type =~ (.*)\((.*),(.*)\) ]]
text=${BASH_REMATCH[1]}
p1=${BASH_REMATCH[2]}
p2=${BASH_REMATCH[3]}
If you are using a shell that does not directly support regular expression matching, you can use the expr command, but it only supports one capture group at a time.
$ text=$( expr "$type" : '\(.*\)(' )
$ p1=$( expr "$type" : '.*(\(.*\),' )
$ p2=$( expr "$type" : '.*(.*,\(.*\))' )
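The same split can also be done without any external command, using only POSIX parameter expansion. A minimal sketch, assuming the input always has the shape name=text(a,b); the names input, rest and args are made up for the example.

```shell
input='type=hello(10,1)'

rest=${input#*=}      # hello(10,1)  drop everything up to the first "="
text=${rest%%\(*}     # hello        drop from the first "(" to the end
args=${rest#*\(}      # 10,1)        drop everything up to the first "("
args=${args%\)}       # 10,1         drop the trailing ")"
p1=${args%%,*}        # 10           text before the first ","
p2=${args#*,}         # 1            text after the first ","

printf 'text=%s\np1=%s\np2=%s\n' "$text" "$p1" "$p2"
```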

How do I avoid the usage of the "for" loop in this bash function?

I am creating this function to make multiple grep's over every line of a file. I run it as following:
cat file.txt | agrep string1 string2 ... stringN
function agrep () {
for a in $@; do
cmd+=" | grep '$a'";
done ;
while read line ; do
eval "echo "\'"$line"\'" $cmd";
done;
}
The idea is to print every line that contains all the strings: string1, string2, ..., stringN. This already works but I want to avoid the usage of the for to construct the expression:
| grep string1 | grep string2 ... | grep stringN
And if it's possible, also the usage of eval. I tried to make some expansion as follows:
echo "| grep $"{1..3}
And I get:
| grep $1 | grep $2 | grep $3
This is almost what I want but the problem is that when I try:
echo "| grep $"{1..$#}
The expansion doesn't occur because bash can't expand {1..$#} due to the $#. Brace expansion only works with literal numbers. I would like to construct some expansion that works in order to avoid the usage of the for in the agrep function.
agrep () {
if [ $# = 0 ]; then
cat
else
pattern="$1"
shift
grep -e "$pattern" | agrep "$@"
fi
}
Instead of running multiple greps on each line, just get all the lines that match string1, then pipe those to grep for string2, and so on. One way to do this is to make agrep recursive.
agrep () {
if (( $# == 0 )); then
cat # With no arguments, just output everything
else
grep "$1" | agrep "${@:2}"
fi
}
It's not the most efficient solution, but it's simple.
(Be sure to note Rob Mayoff's answer, which is the POSIX-compliant version of this.)
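A quick way to check the recursive approach is to feed it a few lines and see which ones survive every filter. The sketch below repeats the POSIX-style function so that it is self-contained; the sample lines and the variable name result are made up for the example.

```shell
agrep() {
    if [ "$#" -eq 0 ]; then
        cat                          # no patterns left: pass everything through
    else
        pattern=$1
        shift
        grep -e "$pattern" | agrep "$@"   # filter, then recurse on the rest
    fi
}

# Only lines containing both "red" and "green" should remain.
result=$(printf '%s\n' 'red green blue' 'red blue' 'green red' | agrep red green)
printf '%s\n' "$result"
```

This prints "red green blue" and "green red"; the order of the patterns does not matter because each one is applied as a separate filter.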
awk to the rescue!
You can avoid multiple grep calls and constructing the command by switching to awk:
awk -v pat='string1 string2 string3' 'BEGIN{n=split(pat,p)}
{for(i=1;i<=n;i++) if($0!~p[i]) next}1 ' file
Enter your space-delimited strings as in the example above. Each string is treated as a regular expression.
Not building a string for the command is definitely better (see chepner's and Rob Mayoff's answers). However, just as an example, you can avoid the for by using printf:
agrep () {
cmd=$(printf ' | grep %q' "$@")
sh -c "cat $cmd"
}
Using printf also helps somewhat with special characters in the patterns. From help printf:
In addition to the standard format specifications described in printf(1),
printf interprets:
%b expand backslash escape sequences in the corresponding argument
%q quote the argument in a way that can be reused as shell input
%(fmt)T output the date-time string resulting from using FMT as a format
string for strftime(3)
Since the aim of %q is providing output suitable for shell input, this should be safe.
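One way to see what %q does is a round trip: quote a string that contains shell metacharacters, evaluate the quoted form as shell input, and compare the result with the original. This is a sketch for Bash, whose builtin printf supports %q; the names s, quoted and copy are made up for the example.

```shell
s='a b; echo "hi" $HOME'

quoted=$(printf '%q' "$s")   # escaped form that is safe to reuse as shell input
eval "copy=$quoted"          # re-parse the escaped form as an assignment

[ "$copy" = "$s" ] && echo "round trip ok"
```

The exact escaped form can differ between Bash versions, and it is meant to be read back by Bash; other shells may not understand every form that %q can produce.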
Also: You almost always want to use "$@" with the quotes, not just plain $@.

How to obtain the first letter in a Bash variable?

I have a Bash variable, $word, which is sometimes a word or sentence, e.g.:
word="tiger"
Or:
word="This is a sentence."
How can I make a new Bash variable which is equal to only the first letter found in the variable? E.g., the above would be:
echo $firstletter
t
Or:
echo $firstletter
T
word="tiger"
firstletter=${word:0:1}
word=something
first=${word::1}
initial="$(echo $word | head -c 1)"
Every time you say "first" in your problem description, head is a likely solution.
A portable way to do it is to use parameter expansion (which is a POSIX feature):
$ word='tiger'
$ echo "${word%"${word#?}"}"
t
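Unrolled, the nested expansion works in two steps: first compute everything except the first character, then remove that text from the end of the original. A sketch that should work in any POSIX shell; the names rest and first are made up for the example.

```shell
word='This is a sentence.'

rest=${word#?}          # remove one leading character: "his is a sentence."
first=${word%"$rest"}   # remove $rest from the end, leaving only "T"

echo "$first"           # T
```

The inner quotes around $rest matter: they make the shell treat its contents as literal text rather than as a pattern.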
With cut :
word='tiger'
echo "${word}" | cut -c 1
Since you have a sed tag here is a sed answer:
echo "$word" | sed -e "{ s/^\(.\).*/\1/ ; q }"
Play by play for those who enjoy those (I do!):
{
s: start a substitution routine
/: Start specifying what is to be substituted
^\(.\): capture the first character in Group 1
.*: match the rest of the line so that it is replaced too
/: start specifying the replacement
\1: insert Group 1
/: The rest is discarded;
q: Quit sed so it won't repeat this block for other lines if there are any.
}
Well that was fun! :) You can also use grep and etc but if you're in bash the ${x:0:1} magick is still the better solution imo. (I spent like an hour trying to use POSIX variable expansion to do that but couldn't :( )
Using bash 4:
x="test"
read -N 1 var <<< "${x}"
echo "${var}"

Resources