extract path value substring using sed - bash

I am trying to extract part of a path stored in a variable that has the following value:
path_value="path/to/value/src"
I want to extract just value from the above variable and use that later in my script. I know it can be done using grep or awk but I wanted to know how it can be done using sed
So I tried this
service_name=$(echo $path_value | sed -e 's/path/to/(.*\)/.*/\1/')
But I get this error: bad flag in substitute command: '('
Could you please suggest what is the right regex to achieve what I am trying to do?

Using parameter substitution and eliminating the subprocess calls:
$ path_value="path/to/value/src"
$ tempx="${path_value%/*}"
$ echo "${tempx}"
path/to/value
$ service_name="${tempx##*/}"
$ echo "${service_name}"
value
Performing a bash/regex comparison and retrieving the desired item from the BASH_REMATCH[] array (also eliminates subprocess calls):
$ regex='.*/([^/]+)/([^/]+)$'
$ [[ "${path_value}" =~ $regex ]] && service_name="${BASH_REMATCH[1]}"
$ echo "${service_name}"
value
# fwiw, contents of the BASH_REMATCH[] array:
$ typeset -p BASH_REMATCH
declare -ar BASH_REMATCH=([0]="path/to/value/src" [1]="value" [2]="src")
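If the same extraction is needed for several paths, the two parameter expansions can be wrapped in a small function. This is only a sketch: the name parent_component is hypothetical, and it assumes the path has at least two components and no trailing slash.

```shell
#!/bin/bash
# Hypothetical helper: print the second-to-last component of a slash-separated path.
# Assumes at least two components and no trailing slash.
parent_component() {
    local trimmed="${1%/*}"          # strip the last component: path/to/value
    printf '%s\n' "${trimmed##*/}"   # strip everything up to the last slash: value
}

service_name=$(parent_component "path/to/value/src")
echo "$service_name"
```

Note that capturing the result with $(...) forks a subshell again, so this trades the "no subprocess" property for reuse.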

You can use
#!/bin/bash
path_value="path/to/value/src"
service_name=$(echo "$path_value" | sed 's~path/to/\([^/]*\)/.*~\1~')
echo "$service_name"
# => value
Note I replaced / regex delimiters with ~ so as to avoid escaping / chars inside the pattern.
The capturing parentheses must both be escaped in a POSIX BRE regex.
The [^/]* part only matches zero or more chars other than /.
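For comparison, here is a sketch of the same substitution with extended regular expressions, where the capturing parentheses are not escaped. It assumes a sed that accepts -E (GNU and BSD sed both do).

```shell
#!/bin/bash
path_value="path/to/value/src"
# -E selects extended regular expressions, where ( ) group without backslashes.
service_name=$(printf '%s\n' "$path_value" | sed -E 's~path/to/([^/]*)/.*~\1~')
echo "$service_name"
# => value
```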

echo inside a for loop lists files matching the output pattern

I have a problem with the following for loop:
X="*back* OLD"
for P in $X
do
echo "-$P"
done
I need it to output just:
-*back*
-OLD
However, it lists all files in the current directory matching the *back* pattern. For example it gives the following:
-backup.bkp
-backup_new.bkp
-backup_X
-OLD
How to force it to output the exact pattern?
Use an array, as unquoted parameter expansions are still subject to globbing.
X=( "*back*" OLD )
for P in "${X[@]}"; do
printf '%s\n' "$P"
done
(Use printf, as echo could try to interpret an argument as an option, for example, if you had -n in the value of X.)
Use set -o noglob before your loop and set +o noglob after to disable and enable globbing.
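A minimal sketch of the noglob approach, collecting the lines in an array (the array name out is just for illustration) so that the result can be inspected afterwards:

```shell
#!/bin/bash
X="*back* OLD"
out=()                 # collected lines, kept in an array so they can be inspected
set -o noglob          # same effect as set -f: unquoted expansions are not globbed
for P in $X; do        # word splitting still applies, so we get two words
    out+=("-$P")
done
set +o noglob          # restore normal filename expansion
printf '%s\n' "${out[@]}"
```

This should print -*back* and -OLD regardless of which files exist in the current directory.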
To prevent filename expansion you could read in the string as a Here String.
To iterate over the items, you could turn them into lines using parameter expansion and read them linewise using read. In order to be able to put a - sign as the first character, use printf instead of echo.
X="*back* OLD"
while read -r x
do printf -- '-%s\n' "$x"
done <<< "${X/ /$'\n'}"
Another way could be to use tr to transform the string into lines, then use paste with the - sign as delimiter and "nothing" from /dev/null as first column.
X="*back* OLD"
tr ' ' '\n' <<< "$X" | paste -d- /dev/null -
Both should output:
-*back*
-OLD
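Along the same lines, read -a can split the here-string directly into an array, since read performs word splitting but no filename expansion. A sketch, assuming X holds a single line and the default IFS (the array name items is made up):

```shell
#!/bin/bash
X="*back* OLD"
# read splits on $IFS but never performs filename expansion.
read -r -a items <<< "$X"
printf -- '-%s\n' "${items[@]}"
```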

Using value inside a variable without expanding

I am trying to find and replace a specific text content using the sed command and to run it via a shell script.
Below is the sample script that I am using:
fp=/asd/filename.txt
fd="sed -i -E 's ($2).* $2:$3 g' ${fp}"
eval $fd
and executing the same by passing the arguments:
./test.sh update asd asdfgh
But if an argument contains $, the command breaks and the text is replaced with the wrong value, for example:
./test.sh update asd $apr1$HnIF6bOt$9m3NzAwr.aG1Yp.t.bpIS1.
How can I make sure that the values inside the variables are not expanded because of the $?
Updated
sh file test.sh
set -xv
fp="/asd/filename.txt"
sed -iE "s/(${2//'$'/'\$'}).*/${2//'$'/'\$'}:${3//'$'/'\$'}/g" "$fp"
text file filename.txt
hello:world
Outputs
1)
./test.sh update hello WORLD
sed -iE "s/(${2//'$'/'\$'}).*/${2//'$'/'\$'}:${3//'$'/'\$'}/g" "$fp"
++ sed -iE 's/(hello).*/hello:WORLD/g' /asd/filename.txt
2)
./test.sh update hello '$apr1$hosgaxyv$D0KXp5dCyZ2BUYCS9BmHu1'
sed -iE "s/(${2//'$'/'\$'}).*/${2//'$'/'\$'}:${3//'$'/'\$'}/g" "$fp"
++ sed -iE 's/(hello).*/hello:'\''$'\''apr1'\''$'\''hosgaxyv'\''$'\''D0KXp5dCyZ2BUYCS9BmHu1/g' /asd/filename.txt
In both cases, the content is not replaced.
You don't need eval here at all:
fp=/asd/filename.txt
sed -i -E "s/(${2//'$'/'\$'}).*/\1:${3//'$'/'\$'}/g" "$fp"
The whole sed command is in double quotes so variables can expand.
I've replaced the blank as the s separator with / (doesn't really matter in the example).
I've used \1 to reference the first capture group instead of repeating the variable in the substitution.
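Most importantly, I've used ${2//'$'/'\$'} instead of $2 (and similar for $3). This escapes every $ sign as \$, so that sed treats it as a literal character; with -E, an unescaped $ in the pattern is the end-of-line anchor. (The shell does not re-expand the result of a parameter expansion, so the escaping is for sed's benefit, not because of the double quotes.)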
When you call your script, you must escape any $ in the input, or the shell tries to expand them as variable names:
./test.sh update asd '$apr1$HnIF6bOt$9m3NzAwr.aG1Yp.t.bpIS1.'
Put the command-line arguments that are filenames in single quotes:
./test.sh update 'asd' '$apr1$HnIF6bOt$9m3NzAwr.aG1Yp.t.bpIS1'
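Dollar signs are not the only characters that can upset sed: in the replacement part, backslash, & and the delimiter are special too. The sketch below escapes those before substituting. The helper name escape_replacement and the sample value are made up for illustration; it assumes the key contains no regex metacharacters and the value contains no newline.

```shell
#!/bin/bash
# Hypothetical helper: backslash-escape the characters that are special in the
# replacement part of s/pattern/replacement/ (backslash, ampersand, and /).
escape_replacement() {
    printf '%s\n' "$1" | sed -e 's/[\/&]/\\&/g'
}

key='asd'                                  # assumed to contain no regex metacharacters
value='$apr1$HnIF6bOt$9m3N/zAwr&x'         # made-up value containing $, / and &
escaped=$(escape_replacement "$value")

# Applied to a sample line instead of a file, so nothing is modified in place.
result=$(printf '%s\n' 'asd:old' | sed -E "s/^(${key}).*/\1:${escaped}/")
echo "$result"
```

Once the output looks right, the same sed expression could be run with -i against the real file.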
You must quote any script argument that contains spaces or special shell characters, and escape each dollar sign ($). Also use -Ei instead of -iE (GNU sed reads -iE as "edit in place with backup suffix E", so extended regex is never enabled); better still, drop -i while testing and add it back once you are sure the command does what you want.
I admit I don't fully understand your regex, so here is just the gist of the solution; there is no need for eval:
fp=/asd/filename.txt
sed -Ei "s/($2).*/$2:$3/g" "$fp"
./test.sh update asd '\$apr1\$HnIF6bOt\$9m3NzAwr.aG1Yp.t.bpIS1.'

Assign part of a file name to bash variable?

I have a file and its name looks like:
12U12345._L001_R1_001.fastq.gz
I want to assign to a variable just the 12U12345 part.
So far I have:
variable=`basename $fastq | sed {s'/_S[0-9]*_L001_R1_001.fastq.gz//'}`
Note: $fastq is a variable with the full path to the file in it.
This solution currently returns the full file name, any ideas how to get this right?
Just use the built-in parameter expansion provided by the shell, instead of spawning a separate process
fastq="12U12345._L001_R1_001.fastq.gz"
printf '%s\n' "${fastq%%.*}"
12U12345
or use printf itself to store the result in a new variable in one step:
printf -v numericPart '%s' "${fastq%%.*}"
printf '%s\n' "${numericPart}"
Bash also has a built-in regular expression matching operator, =~, which you could use like this:
fastq="12U12345._L001_R1_001.fastq.gz"
regex='^([[:alnum:]]+)\.(.*)'
if [[ $fastq =~ $regex ]]; then
numericPart="${BASH_REMATCH[1]}"
printf '%s\n' "${numericPart}"
fi
You could use cut:
$> fastq="/path/to/12U12345._L001_R1_001.fastq.gz"
$> variable=$(basename "$fastq" | cut -d '.' -f 1)
$> echo "$variable"
12U12345
Also, please note that:
It's better to wrap your variable inside quotes. Otherwise your command won't work with filenames that contain space(s).
You should use $() instead of the backticks.
Using Bash Parameter Expansion to extract the basename and then extract the portion of the filename you want:
fastq="/path/to/12U12345._L001_R1_001.fastq.gz"
file="${fastq##*/}" # gives 12U12345._L001_R1_001.fastq.gz
string="${file%%.*}" # gives 12U12345
Note that Bash doesn't allow us to nest the parameter expansion. Otherwise, we could have combined statements 2 and 3 above.
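If the suffix is fixed and known in advance, a one-step alternative is to pass it to basename as a second argument, at the cost of one subprocess. A sketch, assuming every file ends in exactly ._L001_R1_001.fastq.gz (the variable name sample is made up):

```shell
#!/bin/bash
fastq="/path/to/12U12345._L001_R1_001.fastq.gz"
# basename NAME SUFFIX removes the directory part and, if present, the trailing SUFFIX.
sample=$(basename "$fastq" "._L001_R1_001.fastq.gz")
echo "$sample"
# => 12U12345
```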

How to do string separation using a keyword?

I have the STRING as given below. There is no specific separator between the keys. The only way to identify the keys is by the keyword "key_1", "key_2", etc.
All keys begin with "key_" and can never appear in the value of another:
STRING="key_1=mislanious_string1 key_2=miscellaneous_string2"
I want the output as below.
echo $STRING1 should print:
key_1=mislanious_string1
echo $STRING2 should print:
key_2=mislanious_string2
e.g:
If STRING="key_1=foobarzkey_2=bash" , then the output should look like , STRING1=key_1=foobarz and STRING2=key_2=bash.
There may be more keys like key_1 , key_2 , key_3 etc. Each key starts with "key_" and can never appear in the value of another:
How can I do this in the UNIX bash shell?
Using grep -P (PCRE) to support multiple key-value pairs in input:
STRING="key_1=mislanious_string1key_2=miscellaneous_string2key_3=fookey_4=BASH"
grep -oP 'key_[^=]+=.*?(?=key_|$)' <<< "$STRING"
key_1=mislanious_string1
key_2=miscellaneous_string2
key_3=foo
key_4=BASH
To store them into BASH array you can use:
read -d '' -ra arr < <(grep -oP 'key_[^=]+=.*?(?=key_|$)' <<< "$STRING")
printf "%s\n" "${arr[@]}"
key_1=mislanious_string1
key_2=miscellaneous_string2
key_3=foo
key_4=BASH
declare -p arr
declare -a arr='([0]="key_1=mislanious_string1" [1]="key_2=miscellaneous_string2" [2]="key_3=foo" [3]="key_4=BASH")'
UPDATE:: Here is a pure BASH (non-gnu) way of splitting these strings. We first insert an invisible character before every occurrence of key_ string and then use that for splitting the string:
STRING="key_1=mislanious_string1key_2=miscellaneous_string2key_3=fookey_4=BASH"
c=$'\x06'
s="${STRING//key_/${c}key_}"
arr=()
while [[ "$s" =~ ${c}(key_[^=]+=[^${c}]+)(.*) ]]; do
arr+=( "${BASH_REMATCH[1]}" )
s="${BASH_REMATCH[2]}"
done
Then to test:
printf "<%s>\n" "${arr[@]}"
<key_1=mislanious_string1>
<key_2=miscellaneous_string2>
<key_3=foo>
<key_4=BASH>
I like anubhava's grep -oP solution best. Here's an awk solution:
STRING="key_15=foobarzkey_3=bash"
awk -v RS="key_" 'NR>1{split($0, a, /=/); print "STRING" a[1] "=" RS $0}' <<< "$STRING"
STRING15=key_15=foobarz
STRING3=key_3=bash
So, to create that output as shell variables
eval $(awk -v RS="key_" 'NR>1{split($0, a, /=/); print "STRING" a[1] "=" RS $0}' <<< "$STRING")
echo $STRING3 # => key_3=bash
echo $STRING15 # => key_15=foobarz
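Because eval executes whatever text it is given, a value containing shell syntax would be run as code. A possible eval-free sketch in pure bash creates the same variables with printf -v. It borrows the control-character separator idea and assumes keys are numeric (key_15, key_3) and values contain neither that control character nor a newline; the names parts, part and re are made up.

```shell
#!/bin/bash
STRING="key_15=foobarzkey_3=bash"
c=$'\x06'                                  # separator assumed not to occur in the data
s="${STRING//key_/${c}key_}"               # put the separator in front of every key
IFS="$c" read -r -a parts <<< "$s"         # split on the separator only
re='^key_([0-9]+)='
for part in "${parts[@]}"; do
    [[ $part =~ $re ]] || continue         # skip the empty leading field
    printf -v "STRING${BASH_REMATCH[1]}" '%s' "$part"
done
echo "$STRING3"    # => key_3=bash
echo "$STRING15"   # => key_15=foobarz
```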
This answer originally didn't recognize keys not preceded by whitespace. This has been fixed. In its current form this answer provides value as a portable solution. If you disagree, please let me know.
The answers provided by Glenn Jackman and anubhava are helpful, but use GNU extensions not available on all platforms (grep -P, awk with a multi-char. RS value).
Here's a POSIX-compliant sed solution that should work on most platforms, using either bash, ksh, or zsh as the shell:
str='key_1=mislanious_string1 key_2=miscellaneous_string2key_3=last'
while read -r varDef; do
[[ -n $varDef ]] && typeset "$varDef"
done < <(sed 's/\(key_\([0-9]\{1,\}\)=\)/\'$'\n''string\2=\1/g' <<<"$str")
#'# Print the variables created ($string1, $string2, $string3).
typeset -p ${!string@}
Note that lowercase variable names (string1, ...) are used so as to prevent potential conflicts with environment variables.
sed is used to split the string into key-value tokens each on their own line, preceded by the desired target variable name and =, effectively outputting shell variable assignments; e.g., for key_1, the sed command passes out:
string1=key_1=mislanious_string1 
The while loop then reads each output line and uses typeset to declare and assign the variable (note that typeset was chosen for ksh compatibility - while typeset also works in bash and zsh you'd typically use declare there); [[ -n $varDef ]] ignores the empty line that the sed output starts with.
Note: This solution trims trailing whitespace from values, consistent with the example in the question. This trimming happens due to use of read with the default $IFS value (internal field separators) - to preserve trailing whitespace, simply use IFS= read instead of just read.
Also note that using process substitution to provide the input (while ...; done < <(sed ...)), as opposed to a pipeline (sed ... | while ...), is required to ensure that the variables are defined in the current shell (rather than in a subshell, which would leave them invisible to the current shell).
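The difference can be seen with a small counter experiment. This sketch assumes bash with its default settings (the lastpipe option off); ksh and zsh run the last pipeline segment in the current shell and would behave differently. The variable names are made up.

```shell
#!/bin/bash
count=0
printf 'a\nb\n' | while read -r line; do count=$((count + 1)); done
pipeline_count=$count      # the loop ran in a subshell, so count is still 0 here

count=0
while read -r line; do count=$((count + 1)); done < <(printf 'a\nb\n')
procsub_count=$count       # the loop ran in the current shell, so count is 2

echo "pipeline: $pipeline_count, process substitution: $procsub_count"
```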
Some background info on what makes the above sed command POSIX-compliant:
POSIX only mandates basic regular expressions for sed, which takes away many features (e.g., quantifiers ? and +, alternation (|)) and makes escaping more cumbersome (e.g., ( and ) must be \-escaped).
POSIX sed also doesn't support escape sequences such as \n in replacement strings passed to s, so ANSI-C quoting is used to splice an \-escaped actual newline into the replacement string using $'\n'.
As an example of how useful the non-POSIX GNU sed extensions are, here's an equivalent command taking full advantage of GNU sed's features (extended regular expressions, support for \n), resulting in a shorter and more readable command:
sed -r 's/(key_([0-9]+)=)/\nstring\2=\1/g' <<<"$str"
Sometimes the simplest solution can be overlooked:
STRING="key_1=mislanious_string1key_2=miscellaneous_string2"
read STRING1 STRING2<<<${STRING//key_/ key_}
echo $STRING1
echo $STRING2

Assigning a value having semicolon (';') to a variable in bash

I'm trying to escape ('\') a semicolon (';') in a string on unix shell (bash) with sed. It works when I do it directly without assigning the value to a variable. That is,
$ echo "hello;" | sed 's/\([^\\]\);/\1\\;/g'
hello\;
$
However, it doesn't appear to work when the above command is assigned to a variable:
$ result=`echo "hello;" | sed 's/\([^\\]\);/\1\\;/g'`
$
$ echo $result
hello;
$
Any idea why?
I tried by using the value enclosed with and without quotes but that didn't help. Any clue greatly appreciated.
btw, I first thought the semicolon at the end of the string was somehow acting as a terminator and hence the shell didn't continue executing the sed (if that made any sense). However, that doesn't appear to be an issue. I tried by using the semicolon not at the end of the string (somewhere in between). I still see the same result as before. That is,
$ echo "hel;lo" | sed 's/\([^\\]\);/\1\\;/g'
hel\;lo
$
$ result=`echo "hel;lo" | sed 's/\([^\\]\);/\1\\;/g'`
$
$ echo $result
hel;lo
$
You don't need sed (or any other regex engine) for this at all:
s='hello;'
echo "${s//;/\;}"
This is a parameter expansion which replaces ; with \;.
That said -- why are you trying to do this? In most cases, you don't want escape characters (which are syntax) to be inside of scalar variables (which are data); they only matter if you're parsing your data as syntax (such as using eval), which is a bad idea for other reasons, and best avoided (or done programmatically, as via printf %q).
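As a rough illustration of the printf %q route, the sketch below quotes a string and then re-parses it to show that the original data comes back. The variable names quoted and restored are made up, and the exact quoted form that bash prints may vary between versions.

```shell
#!/bin/bash
s='hello; world'
printf -v quoted '%q' "$s"     # shell-quoted form of the data, safe to re-parse
echo "$quoted"
eval "restored=$quoted"        # re-parsing yields the original string again
[[ $restored == "$s" ]] && echo "round trip ok"
```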
I find it interesting that the use of back-ticks gives one result (your result) and the use of $(...) gives another result (the wanted result):
$ echo "hello;" | sed 's/\([^\\]\);/\1\\;/g'
hello\;
$ z1=$(echo "hello;" | sed 's/\([^\\]\);/\1\\;/g')
$ z2=`echo "hello;" | sed 's/\([^\\]\);/\1\\;/g'`
$ printf "%s\n" "$z1" "$z2"
hello\;
hello;
$
If ever you needed an argument for using the modern x=$(...) notation in preference to the older x=`...` notation, this is probably it. The shell does an extra round of backslash interpretation with the back-ticks. I can demonstrate this with a little program I use when debugging shell scripts called al (for 'argument list'); you can simulate it with printf "%s\n":
$ z2=`echo "hello;" | al sed 's/\([^\\]\);/\1\\;/g'`
$ echo "$z2"
sed
s/\([^\]\);/\1\;/g
$ z1=$(echo "hello;" | al sed 's/\([^\\]\);/\1\\;/g')
$ echo "$z1"
sed
s/\([^\\]\);/\1\\;/g
$ z1=$(echo "hello;" | printf "%s\n" sed 's/\([^\\]\);/\1\\;/g')
$ echo "$z1"
sed
s/\([^\\]\);/\1\\;/g
$
As you can see, the script executed by sed differs depending on whether you use x=$(...) notation or x=`...` notation.
s/\([^\]\);/\1\;/g # ``
s/\([^\\]\);/\1\\;/g # $()
Summary
Use $(...); it is easier to understand.
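The extra round of backslash processing can be reproduced without sed or the al helper. In this small sketch the same single-quoted argument reaches printf unchanged under $(...) but with one backslash fewer under back-ticks:

```shell
#!/bin/bash
z1=$(printf '%s\n' 'a\\b')    # printf receives a\\b unchanged
z2=`printf '%s\n' 'a\\b'`     # back-ticks turn \\ into \ first, so printf receives a\b
printf '%s\n' "$z1" "$z2"
# a\\b
# a\b
```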
You need to use four backslashes (three also work). I guess it's because the backslashes are interpreted twice: once by the shell when it processes the back-ticks, and once by sed:
result=`echo "hello;" | sed 's/\([^\\]\);/\1\\\\;/g'`
And
echo "$result"
yields:
hello\;
