I'm looking to change a variable with format 8.0.3.0 to 8.0-3.0 in a bash script
So it's always the middle decimal point to a dash
What's the most efficient way to do this?
You can use parameter expansion, like this:
v="8.0.3.0"
echo "${v%*.*.*}-${v#*.*.}"
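As a minimal sketch, the same expansion can be wrapped in a function with each step spelled out. The function name dash_middle and the variable names front and back are made up for illustration, and the sketch assumes the input has exactly four dot-separated fields:

```bash
# Hypothetical helper: replace the second dot of a four-field version with a dash.
dash_middle() {
    v=$1
    front=${v%.*.*}    # strip the shortest suffix of the form ".x.y": 8.0.3.0 -> 8.0
    back=${v#*.*.}     # strip the shortest prefix of the form "x.y.": 8.0.3.0 -> 3.0
    printf '%s-%s\n' "$front" "$back"
}

dash_middle "8.0.3.0"    # prints 8.0-3.0
```

Because it only uses % and # expansions, this should also run in a plain POSIX sh.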
With sed you can append the number of the occurrence you want to replace to the s command (the numeric flag is specified by POSIX, so it is not limited to GNU sed):
$ echo "a.b.c.d" | sed 's/\./-/2'
a.b-c.d
If you have a lot of data to handle, it may be worth trying this solution. It is less concise, but it uses bash's built-in regex matching instead of calling an external program, so it may be faster.
string=8.0.3.0
if
[[ $string =~ ^([0-9])[.]([0-9])[.]([0-9])[.]([0-9])$ ]]
then
newstring="${BASH_REMATCH[1]}.${BASH_REMATCH[2]}-${BASH_REMATCH[3]}.${BASH_REMATCH[4]}"
else
echo "ERROR: no match on $string"
fi
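Note that the pattern above accepts only single-digit fields. As a sketch, assuming fields may contain several digits, the same idea with + quantifiers might look like this (the function name dash_middle_re is made up for illustration):

```bash
# Hypothetical variant: allow multi-digit fields such as 10.12.3.40.
dash_middle_re() {
    local re='^([0-9]+)[.]([0-9]+)[.]([0-9]+)[.]([0-9]+)$'
    if [[ $1 =~ $re ]]; then
        # BASH_REMATCH[1..4] hold the four captured fields.
        printf '%s.%s-%s.%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" \
                               "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
    else
        echo "ERROR: no match on $1" >&2
        return 1
    fi
}

dash_middle_re "10.12.3.40"    # prints 10.12-3.40
```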
If GNU awk is an option (gensub is a gawk extension):
awk '$0 = gensub( /\./, "-", 2 )' file
8.0-3.0
Trying to extract text between a path variable which has the following value
path_value="path/to/value/src"
I want to extract just value from the above variable and use that later in my script. I know it can be done using grep or awk but I wanted to know how it can be done using sed
So I tried this
service_name=$(echo $path_value | sed -e 's/path/to/(.*\)/.*/\1/')
But I get this error: bad flag in substitute command: '('
Could you please suggest what is the right regex to achieve what I am trying to do?
Using parameter substitution and eliminating the subprocess calls:
$ path_value="path/to/value/src"
$ tempx="${path_value%/*}"
$ echo "${tempx}"
path/to/value
$ service_name="${tempx##*/}"
$ echo "${service_name}"
value
Performing a bash/regex comparison and retrieving the desired item from the BASH_REMATCH[] array (also eliminates subprocess calls):
$ regex='.*/([^/]+)/([^/]+)$'
$ [[ "${path_value}" =~ $regex ]] && service_name="${BASH_REMATCH[1]}"
$ echo "${service_name}"
value
# fwiw, contents of the BASH_REMATCH[] array:
$ typeset -p BASH_REMATCH
declare -ar BASH_REMATCH=([0]="path/to/value/src" [1]="value" [2]="src")
You can use
#!/bin/bash
path_value="path/to/value/src"
service_name=$(echo "$path_value" | sed 's~path/to/\([^/]*\)/.*~\1~')
echo "$service_name"
# => value
Note I replaced / regex delimiters with ~ so as to avoid escaping / chars inside the pattern.
The capturing parentheses must both be escaped in a POSIX BRE regex.
The [^/]* part only matches zero or more chars other than /.
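For comparison, here is a sketch of the same substitution using extended regular expressions, where the capturing parentheses are not escaped. It assumes a sed that accepts the -E option (GNU sed and BSD sed both do):

```bash
path_value="path/to/value/src"
# -E switches sed to extended regular expressions, so ( and ) capture without backslashes.
service_name=$(echo "$path_value" | sed -E 's~path/to/([^/]*)/.*~\1~')
echo "$service_name"    # prints value
```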
I have a variable with some lines in it and I would like to pad it with a number of newlines defined in another variable. However it seems that the subshell may be stripping the trailing newlines. I cannot just use '\n' with echo -e as the lines may already contain escaped chars which need to be printed as is.
I have found I can print an arbitrary number of newlines using this.
n=5
yes '' | sed -n "1,${n}p;${n}q"
But if I run this in a subshell to store it in the variable, the subshell appears to strip the trailing newlines.
I can approximate the functionality but it's clumsy and due to the way I am using it I would much rather be able to just call echo "$var" or even use $var itself for things like string concatenation. This approximation runs into the same issue with subshells as soon as the last (filler) line of the variable is removed.
This is my approximation
n=5
var="test"
#I could also just set n=6
cmd="1,$((n+1))p;$((n+1))q"
var="$var$(yes '' | sed -n $cmd; echo .)"
#Now I can use it with
echo "$var" | head -n -1
Essentially I need a good way of appending a number of newlines to a variable which can then be printed with echo.
I would like to keep this POSIX compliant if at all possible but at this stage a bash solution would also be acceptable. I am also using this as part of a tool for which I have set a challenge of minimizing line and character count while maintaining readability. But I can work that out once I have a workable solution
Command substitutions with either $( ) or backticks will trim trailing newlines. So don't use them; use the shell's built-in string manipulation:
n=5
var="test"
while [ "$n" -gt 0 ]; do
var="$var
"
n=$((n-1))
done
Note that there must be nothing after the var="$var (before the newline), and nothing before the " on the next line (no indentation!).
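One way to sidestep the indentation constraint, as a sketch: keep a single newline in its own variable and append that instead. The variable names nl and i are chosen for illustration only:

```bash
nl='
'                      # nl holds exactly one newline character
n=5
var="test"
i=0
while [ "$i" -lt "$n" ]; do
    var=$var$nl        # append one newline per iteration
    i=$((i + 1))
done
printf '%s' "$var" | wc -l    # counts the 5 appended newlines
```

Printing with printf '%s' shows the variable exactly as stored; echo "$var" would add one more newline at the end.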
A sequence of n newlines:
printf -v spaces "%*s" $n ""
newlines=${spaces// /$'\n'}
I have a file and its name looks like:
12U12345._L001_R1_001.fastq.gz
I want to assign to a variable just the 12U12345 part.
So far I have:
variable=`basename $fastq | sed {s'/_S[0-9]*_L001_R1_001.fastq.gz//'}`
Note: $fastq is a variable with the full path to the file in it.
This solution currently returns the full file name, any ideas how to get this right?
Just use the built-in parameter expansion provided by the shell, instead of spawning a separate process
fastq="12U12345._L001_R1_001.fastq.gz"
printf '%s\n' "${fastq%%.*}"
12U12345
Or use bash's printf -v to store the result in a new variable in one step:
printf -v numericPart '%s' "${fastq%%.*}"
printf '%s\n' "${numericPart}"
Bash also has a built-in regular expression matching operator, =~, which you could use like this:
fastq="12U12345._L001_R1_001.fastq.gz"
regex='^([[:alnum:]]+)\.(.*)'
if [[ $fastq =~ $regex ]]; then
numericPart="${BASH_REMATCH[1]}"
printf '%s\n' "${numericPart}"
fi
You could use cut:
$> fastq="/path/to/12U12345._L001_R1_001.fastq.gz"
$> variable=$(basename "$fastq" | cut -d '.' -f 1)
$> echo "$variable"
12U12345
Also, please note that:
It's better to wrap your variable inside quotes. Otherwise your command won't work with filenames that contain space(s).
You should use $() instead of the backticks.
Using Bash Parameter Expansion to extract the basename and then extract the portion of the filename you want:
fastq="/path/to/12U12345._L001_R1_001.fastq.gz"
file="${fastq##*/}" # gives 12U12345._L001_R1_001.fastq.gz
string="${file%%.*}" # gives 12U12345
Note that Bash doesn't allow us to nest the parameter expansion. Otherwise, we could have combined statements 2 and 3 above.
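If a single step is wanted anyway, one possible alternative is a regex match instead of nested expansions. This is only a sketch and swaps in a different technique from the one above; the pattern and the variable name re are assumptions made for illustration:

```bash
fastq="/path/to/12U12345._L001_R1_001.fastq.gz"
# (^|/)      start of the string or a slash: the start of the last path component
# ([^/.]+)   the part of the file name before its first dot
# [^/]*$     the rest of the file name, up to the end of the string
re='(^|/)([^/.]+)[^/]*$'
if [[ $fastq =~ $re ]]; then
    string=${BASH_REMATCH[2]}
fi
printf '%s\n' "$string"    # prints 12U12345
```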
I have the STRING given below. There is no specific separator between the keys. The only way to identify a key is by the keyword "key_1", "key_2", etc.
All keys begin with "key_" and can never appear in the value of another:
STRING="key_1=mislanious_string1 key_2=miscellaneous_string2"
I want the output as below.
echo $STRING1 should print:
key_1=mislanious_string1
echo $STRING2 should print:
key_2=miscellaneous_string2
e.g:
If STRING="key_1=foobarzkey_2=bash", then the output should be STRING1=key_1=foobarz and STRING2=key_2=bash.
There may be more keys like key_1 , key_2 , key_3 etc. Each key starts with "key_" and can never appear in the value of another:
How can I do this in the UNIX bash shell?
Using grep -P (PCRE) to support multiple key-value pairs in input:
STRING="key_1=mislanious_string1key_2=miscellaneous_string2key_3=fookey_4=BASH"
grep -oP 'key_[^=]+=.*?(?=key_|$)' <<< "$STRING"
key_1=mislanious_string1
key_2=miscellaneous_string2
key_3=foo
key_4=BASH
To store them into BASH array you can use:
read -d '' -ra arr < <(grep -oP 'key_[^=]+=.*?(?=key_|$)' <<< "$STRING")
printf "%s\n" "${arr[@]}"
key_1=mislanious_string1
key_2=miscellaneous_string2
key_3=foo
key_4=BASH
declare -p arr
declare -a arr='([0]="key_1=mislanious_string1" [1]="key_2=miscellaneous_string2" [2]="key_3=foo" [3]="key_4=BASH")'
UPDATE: Here is a pure bash way (no GNU tools required) of splitting these strings. We first insert an invisible character before every occurrence of the key_ string and then use that character to split the string:
STRING="key_1=mislanious_string1key_2=miscellaneous_string2key_3=fookey_4=BASH"
c=$'\x06'
s="${STRING//key_/${c}key_}"
arr=()
while [[ "$s" =~ ${c}(key_[^=]+=[^${c}]+)(.*) ]]; do
arr+=( "${BASH_REMATCH[1]}" )
s="${BASH_REMATCH[2]}"
done
Then to test:
printf "<%s>\n" "${arr[@]}"
<key_1=mislanious_string1>
<key_2=miscellaneous_string2>
<key_3=foo>
<key_4=BASH>
I like anubhava's grep -oP solution best. Here's an awk solution:
STRING="key_15=foobarzkey_3=bash"
awk -v RS="key_" 'NR>1{split($0, a, /=/); print "STRING" a[1] "=" RS $0}' <<< "$STRING"
STRING15=key_15=foobarz
STRING3=key_3=bash
So, to create that output as shell variables
eval $(awk -v RS="key_" 'NR>1{split($0, a, /=/); print "STRING" a[1] "=" RS $0}' <<< "$STRING")
echo $STRING3 # => key_3=bash
echo $STRING15 # => key_15=foobarz
This answer originally didn't recognize keys not preceded by whitespace. This has been fixed. In its current form this answer provides value as a portable solution. If you disagree, please let me know.
The answers provided by Glenn Jackman and anubhava are helpful, but use GNU extensions not available on all platforms (grep -P, awk with a multi-char. RS value).
Here's a POSIX-compliant sed solution that should work on most platforms, using either bash, ksh, or zsh as the shell:
str='key_1=mislanious_string1 key_2=miscellaneous_string2key_3=last'
while read -r varDef; do
[[ -n $varDef ]] && typeset "$varDef"
done < <(sed 's/\(key_\([0-9]\{1,\}\)=\)/\'$'\n''string\2=\1/g' <<<"$str")
# Print the variables created ($string1, $string2, $string3).
typeset -p ${!string@}
Note that lowercase variable names (string1, ...) are used so as to prevent potential conflicts with environment variables.
sed is used to split the string into key-value tokens each on their own line, preceded by the desired target variable name and =, effectively outputting shell variable assignments; e.g., for key_1, the sed command passes out:
string1=key_1=mislanious_string1
The while loop then reads each output line and uses typeset to declare and assign the variable (note that typeset was chosen for ksh compatibility - while typeset also works in bash and zsh you'd typically use declare there); [[ -n $varDef ]] ignores the empty line that the sed output starts with.
Note: This solution trims trailing whitespace from values, consistent with the example in the question. This trimming happens due to use of read with the default $IFS value (internal field separators) - to preserve trailing whitespace, simply use IFS= read instead of just read.
Also note that using process substitution to provide input (while ... done < <(sed ...)), as opposed to a pipeline (sed ... | while ...), is required to ensure that the variables are defined in the current shell (rather than in a subshell, which would result in variables not visible to the current shell).
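The difference can be seen in a small sketch. It assumes bash with default settings (in particular, the lastpipe option is off); the variable names are made up for illustration:

```bash
count=0
printf 'a\nb\n' | while read -r line; do
    count=$((count + 1))      # runs in a subshell; the change is lost afterwards
done
pipeline_count=$count

count=0
while read -r line; do
    count=$((count + 1))      # runs in the current shell; the change persists
done < <(printf 'a\nb\n')
procsub_count=$count

echo "pipeline: $pipeline_count, process substitution: $procsub_count"
# prints: pipeline: 0, process substitution: 2
```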
Some background info on what makes the above sed command POSIX-compliant:
POSIX only mandates basic regular expressions for sed, which takes away many features (e.g., quantifiers ? and +, alternation (|)) and makes escaping more cumbersome (e.g., ( and ) must be \-escaped).
POSIX sed also doesn't support escape sequences such as \n in replacement strings passed to s, so ANSI-C quoting is used to splice an \-escaped actual newline into the replacement string using $'\n'.
As an example of how useful the non-POSIX GNU sed extensions are, here's an equivalent command taking full advantage of GNU sed's features (extended regular expressions, support for \n), resulting in a shorter and more readable command:
sed -r 's/(key_([0-9]+)=)/\nstring\2=\1/g' <<<"$str"
Sometimes the simplest solution can be overlooked:
STRING="key_1=mislanious_string1key_2=miscellaneous_string2"
read STRING1 STRING2<<<${STRING//key_/ key_}
echo $STRING1
echo $STRING2
I know it is possible to invert grep output with the -v flag. Is there a way to only output the non-matching part of the matched line? I ask because I would like to use the return code of grep (which sed won't have). Here's sort of what I've got:
tags=$(grep "^$PAT" >/dev/null 2>&1)
[ "$?" -eq 0 ] && echo $tags
You could use sed:
$ sed -n "/$PAT/s/$PAT//p" $file
The only problem is that it'll return an exit code of 0 as long as the pattern is good, even if the pattern can't be found.
Explanation
The -n parameter tells sed not to print out any lines. Sed's default is to print out all lines of the file. Let's look at each part of the sed program in between the slashes. Assume the program is /1/2/3/4/5:
/$PAT/: This says to look for all lines that match the pattern $PAT and to run your substitution command only on those. Otherwise, sed would operate on all lines, even if there is no substitution.
/s/: This says you will be doing a substitution.
/$PAT/: This is the pattern you will be substituting. It's $PAT. So, you're searching for lines that contain $PAT and then you're going to substitute the pattern for something.
//: This is what you're substituting for $PAT. It is null. Therefore, you're deleting $PAT from the line.
/p: This final p says to print out the line.
Thus:
You tell sed not to print out the lines of the file as it processes them.
You're searching for all lines that contain $PAT.
On these lines, you're using the s command (substitution) to remove the pattern.
You're printing out the line once the pattern is removed from the line.
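If a grep-style exit status is still wanted, one possible sketch is to let grep decide the status and sed do the editing. The function name strip_match is made up for illustration, and the sketch assumes the pattern is a basic regular expression that contains no / character:

```bash
# Hypothetical helper: print matching lines of a file with the match removed.
# Returns grep's non-zero status when no line matches.
strip_match() {
    local pat=$1 file=$2
    grep -q -e "$pat" "$file" || return    # no match (or an error): pass grep's status on
    sed -n "/$pat/s/$pat//p" "$file"
}

# Usage, with a hypothetical file name:
#   tags=$(strip_match "^$PAT" notes.txt) && echo "$tags"
```

The file is read twice, so this trades some speed for a meaningful return code.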
How about using a combination of grep, sed and $PIPESTATUS to get the correct exit-status?
$ echo Humans are not proud of their ancestors, and rarely invite them round to dinner | grep dinner | sed -n "/dinner/s/dinner//p"
Humans are not proud of their ancestors, and rarely invite them round to
$ echo "${PIPESTATUS[1]}"
0
The members of the PIPESTATUS array hold the exit status of each command in the most recently executed pipeline. ${PIPESTATUS[0]} holds the exit status of the first command in the pipe, ${PIPESTATUS[1]} the exit status of the second command, and so on. Note the braces: without them, $PIPESTATUS[1] expands to the first element followed by a literal [1].
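Because the array is overwritten by the next command that runs, it is usually copied right away. A minimal sketch, with the array name status chosen for illustration:

```bash
echo "no match here" | grep dinner | sed -n 's/dinner//p'
status=("${PIPESTATUS[@]}")    # copy immediately; the next command overwrites PIPESTATUS
echo "echo: ${status[0]}, grep: ${status[1]}, sed: ${status[2]}"
# prints: echo: 0, grep: 1, sed: 0
```

Here grep finds nothing and reports 1, while sed still exits with 0.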
Your $tags will never have a value because you send the output to /dev/null. Apart from that little problem, there is no input to grep.
echo hello |grep "^he" -q ;
ret=$? ;
if [ $ret -eq 0 ];
then
echo there is he in hello;
fi
A successful return code is 0.
...here is one take on your 'problem':
pat="most of ";
data="The apples are ripe. I will use most of them for jam.";
echo $data |grep "$pat" -q;
ret=$?;
[ $ret -eq 0 ] && echo $data |sed "s/$pat//"
The apples are ripe. I will use them for jam.
... exact same thing?:
echo The apples are ripe. I will use most of them for jam. | sed ' s/most\ of\ //'
It seems to me you have confused the basic concepts. What are you trying to do anyway?
I am going to answer the title of the question directly instead of considering the detail of the question itself:
"grep a pattern and output non-matching part of line"
The title to this question is important to me because the pattern I am searching for contains characters that sed will assign special meaning to. I want to use grep because I can use -F or --fixed-strings to cause grep to interpret the pattern literally. Unfortunately, sed has no literal option, but both grep and bash have the ability to interpret patterns without considering any special characters.
Note: In my opinion, trying to backslash or escape special characters in a pattern appears complex in code and is unreliable because it is difficult to test. Using tools which are designed to search for literal text leaves me with a comfortable 'that will work' feeling without considering POSIX.
I used both grep and bash to produce the result because bash is slow and my use of fast grep creates a small output from a large input. This code searches for the literal twice, once during grep to quickly extract matching lines and once during =~ to remove the match itself from each line.
while IFS= read -r || [[ -n "$REPLY" ]]; do
if [[ "$REPLY" =~ (.*)("$LITERAL_PATTERN")(.*) ]]; then
printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[3]}"
else
printf "NOT-REFOUND" # should never happen
exit 1
fi
done < <(grep -F "$LITERAL_PATTERN" < "$INPUT_FILE")
Explanation:
IFS= Reassigning the input field separator is a special prefix for a read statement. Assigning IFS to the empty string causes read to accept each line with all spaces and tabs literally until end of line (assuming IFS is default space-tab-newline).
-r Tells read to accept backslashes in the input stream literally instead of considering them as the start of an escape sequence.
$REPLY Is created by read to store characters from the input stream. The newline at the end of each line will NOT be in $REPLY.
|| [[ -n "$REPLY" ]] The logical or causes the while loop to accept input which is not newline terminated. This does not need to exist because grep always provides a trailing newline for every match. But, I habitually use this in my read loops because without it, characters between the last newline and the end of file will be ignored because that causes read to fail even though content is successfully read.
=~ (.*)("$LITERAL_PATTERN")(.*) ]] Is a standard bash regex test, but anything in quotes is taken as a literal. If I wanted =~ to interpret the regex characters contained in $LITERAL_PATTERN, then I would need to remove the double quotes.
"${BASH_REMATCH[@]}" Is created by [[ =~ ]] where [0] is the entire match and [N] is the contents of the match in the Nth set of parentheses.
Note: I do not like to redirect stdin into a while loop because it is easy to get wrong and difficult to see what is happening later. I usually create a function for this type of operation which behaves like a typical command and expects file_name parameters or redirection of stdin during the call.
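A sketch of what such a function might look like, reading from stdin only. The name strip_literal is made up for illustration, and this is one possible shape rather than the function actually used:

```bash
# Hypothetical helper: read lines from stdin and print each line that contains
# the literal string $1, with one occurrence of that string removed.
strip_literal() {
    local literal=$1
    # grep -F pre-filters quickly; the quoted "$literal" in =~ is matched literally.
    grep -F -e "$literal" | while IFS= read -r || [[ -n $REPLY ]]; do
        if [[ $REPLY =~ (.*)("$literal")(.*) ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[3]}"
        fi
    done
}

printf 'a.*b and more\nno match\n' | strip_literal '.*'    # prints: ab and more
```

The caller decides where the input comes from, for example strip_literal "$LITERAL_PATTERN" < "$INPUT_FILE".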