Extracting a string between last two slashes in Bash - bash

I know this can be easily done using regex like I answered on https://stackoverflow.com/a/33379831/3962126, however I need to do this in bash.
So the closest question on Stackoverflow I found is this one bash: extracting last two dirs for a pathname, however the difference is that if
DIRNAME = /a/b/c/d/e
then I need to extract
d

This may be relatively long, but it's also much faster to execute than most preceding answers (other than the zsh-only one and that by j.a.), since it uses only string manipulations built into bash and uses no subshell expansions:
string='/a/b/c/d/e' # initial data
dir=${string%/*} # trim everything past the last /
dir=${dir##*/} # ...then remove everything before the last / remaining
printf '%s\n' "$dir" # demonstrate output
printf is used in the above because echo doesn't work reliably for all values (think about what it would do on a GNU system with /a/b/c/-n/e).

Here a pure bash solution:
[[ $DIRNAME =~ /([^/]+)/[^/]*$ ]] && printf '%s\n' "${BASH_REMATCH[1]}"
Compared to some of the other answers:
It matches the string between the last two slashes. So, for example, it doesn't match d if DIRNAME=d/e.
It's shorter and fast (just uses built-ins and doesn't create subprocesses).
Support any character between last two slashes (see Charles Duffy's answer for more on this).
Also notice that is not the way to assign a variable in bash:
DIRNAME = /a/b/c/d/e
^ ^
Those spaces are wrong, so remove them:
DIRNAME=/a/b/c/d/e

Using awk:
echo "/a/b/c/d/e" | awk -F / '{ print $(NF-1) }' # d
Edit: This does not work when the path contains newlines, and still gives output when there are less than two slashes, see comments below.

Using sed
if you want to get the fourth element
DIRNAME="/a/b/c/d/e"
echo "$DIRNAME" | sed -r 's_^(/[^/]*){3}/([^/]*)/.*$_\2_g'
if you want to get the before last element
DIRNAME="/a/b/c/d/e"
echo "$DIRNAME" | sed -r 's_^.*/([^/]*)/[^/]*$_\1_g'

OMG, maybe this was obvious, but not to me initially. I got the right result with:
dir=$(basename -- "$(dirname -- "$str")")
echo "$dir"

Using zsh parameter substitution is pretty cool too
echo ${${DIRNAME%/*}##*/}
I think it's faster than the double $() as well, because it won't need any subprocesses.
Basically it slices off the right side first, and then all the remaining left side second.

Related

filename comparison with wildcard

I am working on a script and I need to compare a filename to another one and look for specific changes (in this case a "(x)" added to a filename when OS X needs to add a file to a directory, when a filename already exists) so this is an excerpt of the script, modified to be tested without the rest of it.
#!/bin/bash
p2_s2="/Path/to file (name)/containing - many.special chars.docx.gdoc"
next_line="/Path/to file (name)/containing - many.special chars.docx (1).gdoc"
file_ext=$(echo "${p2_s2}" | rev | cut -d '.' -f 1 | rev)
file_name=$(basename "${p2_s2}" ".${file_ext}")
file_dir=$(dirname "${p2_s2}")
esc_file_name=$(printf '%q' "${file_name}")
esc_file_dir=$(printf '%q' "${file_dir}")
esc_next_line=$(printf '%q' "${next_line}")
if [[ ${esc_next_line} =~ ${esc_file_dir}/${esc_file_name}\ \(?\).${file_ext} ]]
then
echo "It's a duplicate!"
fi
What I'm trying to do here is detect if the file next_line is a duplicate of p2_s2. As I am expecting multiple duplicates, next_line can have a (1) appended at the end of a filename or any other number in brackets (Although I am sure no double digits). As I can't do a simple string compare with a wildcard in the middle, I tried using the "=~" operator and escaping all the special chars. Any idea what I'm doing wrong?
You can trim ps2_s2's extension, trim next_line's extension including the number inside the parenthesis and see if you get the same file name. If you do - it's a duplicate. In order to do so, [[ allows us to perform a comparison between a string and a Glob.
I used extglob's +( ... ) pattern, so I can use +([0-9]) to match the number inside the parenthesis. Notice that extglob is enabled by shopt -s extglob.
#!/bin/bash
p2_s2="/Path/to/ps2.docx.gdoc"
next_line="/Path/to/ps2(1).docx.gdoc"
shopt -s extglob
if [[ "${p2_s2%%.*}" = "${next_line%%\(+([0-9])\).*}" ]]; then
printf '%s is a duplicate of %s\n' "$next_line" "$p2_s2"
fi
EDIT:
I now see that you've edited your question, so in case this solution is not enough, I'm positive that it'll be a good template to work with.
The (1) in next_line doesn't come before the final . it comes before the second to final . in the original filename but you only strip off a single . as the extension.
So when you generate the comparison filename you end up with /Path/to\ file\ \(name\)/containing\ -\ many.special\ chars.docx\ \(?\).gdoc which doesn't match what you expect.
If you had added set -x to the top of your script you'd have seen what the shell was actually doing and seen this.
What does OS X actually do in this situation? Does it add (#) before .gdoc? Does it add it before.docx`? Does it depend on whether OS X knows what the filename is (it is some type it can open natively)?

Trimming filenames in bash with built-in tools only (no rename command)

I have filenames of the form
word-123_AnotherWord--asdf_12345.mp4
word-123_AnotherWord-_asdf-12345.mp4
word-123_AnotherWord-asdf--12345.mp4
word-123_AnotherWord-asdf_-12345.mp4
...which I wish to trim to only contain the last 11 characters and extension.
My current attempt to do so looks like the following:
$ for i in *.mp4 ; do
mv "$i" "${/.*?(.{1,11}\.mp4)$/}";
done
But I gives me this error:
bash: ${/.*?(.{1,11}.mp4)$/}: bad substitution
Any idea why?
This question is a continue to this stack , but answer there works on my PC locally only, I didn't work on my server remotely!
Thanks in advance!
In the syntax ${var/pattern/replacement}, there are several things wrong with the usage "${/.*?(.{1,11}\.mp4)$/}":
First, var isn't optional, it's mandatory.
Second, pattern needs to be given in glob format, not regex format. If you want fancier expressions, use extglob syntax.
Third, unless intent is to delete the parts matching the expression, the final / should actually contain something.
If you want to trim everything but the last 15 characters of each name (11 + 4 for the extension), that's trivial:
for i in *.mp4; do
mv "$i" "${i:${#i}-15}"
done
Now, if you really want to use a regex:
name_re='.{1,11}\.mp4$'
for i in *.mp4; do
[[ $i =~ $name_re ]] && mv -- "$i" "${BASH_REMATCH[0]}"
done

Match exact word in bash script, extract number from string

I'm trying to create a very simple bash script that will open new link base on the input command
Use case #1
$ ./myscript longname55445
It should take the number 55445 and then assign that to a variable which will later be use to open new link based on the given number.
Use case #2
$ ./myscript l55445
It should do the exact same thing as above by taking the number and then open the same link.
Use case #3
$ ./myscript 55445
If no prefix given then we just simply open that same link as a fallback.
So far this is what I have
#!/bin/sh
BASE_URL=http://api.domain.com
input=$1
command=${input:0:1}
if [ "$command" == "longname" ]; then
number=${input:1:${#input}}
url="$BASE_URL?id="$number
open $url
elseif [ "$command" == "l" ]; then
number=${input:1:${#input}}
url="$BASE_URL?id="$number
open $url
else
number=${input:1:${#input}}
url="$BASE_URL?id="$number
open $url
fi
But this will always fallback to the elseif there.
I'm using zsh at the moment.
input=$1
command=${input:0:1}
sets command to the first character of the first argument. It's not possible for a one character string to be equal to an eight-character string ("longname"), so the if condition must always fail.
Furthermore, both your elseif and your else clauses set
number=${input:1:${#input}}
Which you could have written more simply as
number=${input:1}
But in both cases, you're dropping the first character of input. Presumably in the else case, you wanted the entire first argument.
see whether this construct is helpful for your purpose:
#!/bin/bash
name="longname55445"
echo "${name##*[A-Za-z]}"
this assumes a letter adjacent to number.
The following is NOT another way to write the same, because it is wrong.
Please see comments below by mklement0, who noticed this. Mea culpa.
echo "${name##*[:letter:]}"
You have command=${input:0:1}
It takes the first single char, and you compare it to "longname", of course it will fail, and go to elseif.
The key problem is to check if the input is beginning with l or longnameor nothing. If in one of the 3 cases, take the trailing numbers.
One grep line could do it, you can just grep on input and get the returned text:
kent$ grep -Po '(?<=longname|l|^)\d+' <<<"l234"
234
kent$ grep -Po '(?<=longname|l|^)\d+' <<<"longname234"
234
kent$ grep -Po '(?<=longname|l|^)\d+' <<<"234"
234
kent$ grep -Po '(?<=longname|l|^)\d+' <<<"foobar234"
<we got nothing>
You can use regex matching in bash.
[[ $1 =~ [0-9]+ ]] && number=$BASH_REMATCH
You can also use regex matching in zsh.
[[ $1 =~ [0-9]+ ]] && number=$MATCH
Based on the OP's following clarification in a comment,
I'm only looking for the numbers [...] given in the input.
the solution can be simplified as follows:
#!/bin/bash
BASE_URL='http://api.domain.com'
# Strip all non-digits from the 1st argument to get the desired number.
number=$(tr -dC '[:digit:]' <<<"$1")
open "$BASE_URL?id=$number"
Note the use of a bash shebang, given the use of 'bashism' <<< (which could easily be restated in a POSIX-compliant manner).
Similarly, the OP's original code should use a bash shebang, too, due to use of non-POSIX substring extraction syntax.
However, judging by the use of open to open a URL, the OP appears to be on OSX, where sh is essentially bash (though invocation as sh does change behavior), so it'll still work there. Generally, though, it's safer to be explicit about the required shell.

Shell: extract words matching pattern, but ignore circumventing expression

I am currently trying to extract ALL matching expressions from a text which e.g. looks like this and put them into an array.
aaaaaaaaa${bbbbbbb}ccccccc${dddd}eeeee
ssssssssssssssssss${TTTTTT}efhsekfh ej
348653jlk3jß1094utß43t59ßgöelfl,-s-fko
The matching expressions are similar to this: ${}. Beware that I need the full expression, not only the word in between this expression! So in this case the result should be an array which contains:
${bbbbbbb}
${dddd}
${TTTTTTT}
Problems I have stumbled upon and couldn't solve:
It should NOT recognizes this as a whole
${bbbbbbb}ccccccc${dddd} but each for its own
grep -o is not installed on the old machine, Perl is not allowed either!
Many commands e.g. BASH_REMATCH only deliver the whole line or the first occurrence of the expression, instead of all matching expressions in the line!
The mentioned pattern \${[^}]*} seems to work partly, as it can extract the first occurrence of the expression, however it always omitts the ones following after that, if it's in the same text line. What I need is ALL matching expressions found in the line, not only the first one.
You could split the string on any of the characters $,{,}:
$ s='...blaaaaa${blabla}bloooo${bla}bluuuuu...'
$ echo "$s"
...blaaaaa${blabla}bloooo${bla}bluuuuu...
$ IFS='${}' read -ra words <<< "$s"
$ for ((i=0; i<${#words[#]}; i++)); do printf "%d %s\n" $i "${words[i]}"; done
0 ...blaaaaa
1
2 blabla
3 bloooo
4
5 bla
6 bluuuuu...
So if you're trying to extract the words inside the braces:
$ for ((i=2; i<${#words[#]}; i+=3)); do printf "%d %s\n" $i "${words[i]}"; done
2 blabla
5 bla
If the above doesn't suit you, grep will work:
$ echo '...blaaaaa${blabla}bloooo${bla}bluuuuu...' | grep -o '\${[^}]\+}'
${blabla}
${bla}
You still haven't told us exactly what output you want.
Since it bugged me a lot I have asked directly on www.unix.com and was kindly provided with a solution which fits for my ancient shell. So if anyone got the same problem here is the solution:
line='aaaa$aa{yyy}aaa${important}xxxxxxxx${important2}oo{o$}oo$oo${importantstring3}'
IFS=\$ read -a words <<< "$line"
regex='^(\{[^}]+})'
for e in "${words[#]}"; do
if [[ $e =~ $regex ]]; then
echo "\$${BASH_REMATCH[0]}";
fi;
done
which prints then the following - without even getting disturbed by random occurrences of $ and { or } between the syntactically correct expressions:
${important}
${important2}
${importantstring3}
I have updated the full solution after I got another update from the forums: now it also ignores this: aaa$aa{yyy}aaaa - which it previously printed as ${yyy} - but which it should completely ignore as there are characters between $ and {. Now with the additional anchoring on the beginning of the regexp it works as expected.
I just found another issue: theoretically using the above approach I would still get a wrong output if the read line looks like this line='{ccc}aaaa${important}aaa'. The IFS would split it and the REGEX would match {ccc} although this hadn't the $ sign in front. This is suboptimal.
However following approach could solve it: after getting the BASH_REMATCH I would need to do a search in the original line - the one I gave to the IFS - for this exact expression ${ccc} - with the difference, that the $ is included! And only if it finds this exact match, only then, it counts as a valid match; otherwise it should be ignored. Kind of a reverse search method...
Updated - add this reverse search to ignore the trap on the beginning of the line:
pattern="\$${BASH_REMATCH[0]}";
searchresult="";
searchresult=`echo "$line" | grep "$pattern"`;
if [ "$searchresult" != "" ]; then echo "It was found!"; fi;
Neglectable issue: If the line looks like this line='{ccc}aaaaaa${ccc}bbbbb' it would recognize the first {ccc} as a valid match (although it isn't) and print it, because the reverse search found the second ${ccc}. Although this is not intended it's irrelevant for my specific purpose as it implies that this pattern does in fact exist at least once in the same line.

Bash: trim a parameter from both ends

Greetings!
This are well know Bash parameter expansion patterns:
${parameter#word}, ${parameter##word}
and
${parameter%word}, ${parameter%%word}
I need to chop one part from the beginning and anoter part from the trailing of the parameter. Could you advice something for me please?
If you're using Bash version >= 3.2, you can use regular expression matching with a capture group to retrieve the value in one command:
$ path='/xxx/yyy/zzz/ABC/abc.txt'
$ [[ $path =~ ^.*/([^/]*)/.*$ ]]
$ echo ${BASH_REMATCH[1]}
ABC
This would be equivalent to:
$ path='/xxx/yyy/zzz/ABC/abc.txt'
$ path=$(echo "$path" | sed 's|^.*/\([^/]*\)/.*$|\1|p')
$ echo $path
ABC
I don't know that there's an easy way to do this without resorting to sub-shells, something you probably want to avoid for efficiency. I would just use:
> xx=hello_there
> yy=${xx#he}
> zz=${yy%re}
> echo ${zz}
llo_the
If you're not fussed about efficiency and just want a one-liner:
> zz=$(echo ${xx%re} | sed 's/^he//')
> echo ${zz}
llo_the
Keep in mind that this second method starts sub-shells - it's not something I'd be doing a lot of if your script has to run fast.
This solution uses what Andrey asked for and it does not employ any external tool. Strategy: Use the % parameter expansion to remove the file name, then use the ## to remove all but the last directory:
$ path=/path/to/my/last_dir/filename.txt
$ dir=${path%/*}
$ echo $dir
/path/to/my/last_dir
$ dir=${dir##*/}
$ echo $dir
last_dir
I would highly recommend going with bash arrays as their performance is just over 3x faster than regular expression matching.
$ path='/xxx/yyy/zzz/ABC/abc.txt'
$ IFS='/' arr=( $path )
$ echo ${arr[${#arr[#]}-2]}
ABC
This works by telling bash that each element of the array is separated by a forward slash / via IFS='/'. We access the penultimate element of the array by first determining how many elements are in the array via ${#arr[#]} then subtracting 2 and using that as the index to the array.

Resources