Case-insensitive Parameter Expansion - bash

My problem is that I am trying to strip a string from the start of a variable. I have run shopt -s extglob to get extended pattern matching.
a="HelloDolly"
echo "${a#[A-Z]+([a-z])}"
I thought that +([a-z]) means as many lowercase letters as possible, and that [A-Z]+([a-z]) should therefore match Hello,
so the expansion should return Dolly, but I get lloDolly back. If I try / instead of #:
echo "${a/[A-Z]+([a-z])}"
I get back nothing. It looks as if the parameter expansion is case-insensitive.
Thanks to anyone who can give me a hint.

Using a single #, you get the shortest possible match: "He" is the shortest string consisting of one uppercase letter followed by one or more lowercase letters. Switch to a double ## to get the longest possible match, "Hello":
echo "${a##[A-Z]+([a-z])}"
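To avoid issues with locale-based interpretation of character ranges (in some locales and bash versions [a-z] also matches uppercase letters, which would explain why the / form removed the entire string), use character classes instead: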
echo "${a##[[:upper:]]+([[:lower:]])}"
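A minimal sketch comparing the two operators with the character-class pattern from the answer. The function names strip_short and strip_long are hypothetical helpers introduced only for illustration; the script assumes bash with the extglob option available.

```shell
#!/usr/bin/env bash
# extglob must be enabled before bash parses any line that uses +(...)
shopt -s extglob

# strip_short (hypothetical helper): single # removes the SHORTEST leading match
strip_short() { printf '%s\n' "${1#[[:upper:]]+([[:lower:]])}"; }

# strip_long (hypothetical helper): double ## removes the LONGEST leading match
strip_long() { printf '%s\n' "${1##[[:upper:]]+([[:lower:]])}"; }

strip_short "HelloDolly"   # prints lloDolly ("He" removed)
strip_long  "HelloDolly"   # prints Dolly ("Hello" removed)
```

Because [[:upper:]] and [[:lower:]] are character classes rather than ranges, their meaning does not depend on the collation order of the current locale.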

Related

BASH - Replace substring "$$" with substring "$$$"

Essentially, what I am trying to do is take a string with a bunch of text and, if it contains the substring "$$", replace it with the substring "$$$".
Example:
string="abcde\$\$fghi"
# Modify string
echo $string
# ^ should give "abcde$$$fghi"
I have been at this for like 2 hours now and it seems like a very simple thing, so if anyone could provide some help then I would greatly appreciate it. Thanks!
EDIT: Changed original string in the question from "abcde$$fghi" to "abcde\$\$fghi"
$$ is a special parameter in the shell: it expands to the process ID of the current shell. Parameters are expanded inside double quotes, so with the original "abcde$$fghi" the string does not contain $$ but a number (the PID of the shell) instead.
Enclose the string in apostrophes (single quotes) to get $$ inside it.
The replacement you need can be done in multiple ways. The simplest way (probably) and also the fastest way (for sure) is to use / in the parameter expansion of $string:
echo "${string/'$$'/'$$$'}"
To make it work you have to use the same trick as before: wrap $$ and $$$ in single quotes to prevent the shell from replacing them with something else. The double quotes around the entire expression preserve any whitespace contained in $string; without them the result is split into words on whitespace and echo outputs these words separated by single spaces.
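A small sketch of both forms, assuming bash 4.3 or later (where quote removal is applied to the replacement string). The variable names old, new and multi are illustrative only. It also shows //, which replaces every occurrence instead of just the first.

```shell
#!/usr/bin/env bash
# Single quotes keep the shell from expanding $$ (the PID of the current shell).
string='abcde$$fghi'

# Replace the first occurrence, as in the answer.
first="${string/'$$'/'$$$'}"

# Variant (illustrative names): keep pattern and replacement in variables and
# use // to replace every occurrence; quoting "$old" makes the pattern literal.
old='$$'
new='$$$'
multi='a$$b$$c'
all="${multi//"$old"/$new}"

printf '%s\n' "$first"   # abcde$$$fghi
printf '%s\n' "$all"     # a$$$b$$$c
```

Keeping the strings in variables avoids nesting single quotes inside the expansion, which can be easier to read when the pattern is longer.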
If you quote the string with single quote marks (i.e. string='abcde$$fghi') you can do the replacement with echo "${string/'$$'/'$$$'}"
Edit: this is basically what @axiac said in their comment.

What does ${img_file%.*} in a shell script mean?

I know that .* means fetch all files regardless of their extension (I hope I'm not wrong). However, I can't for the life of me figure out what that extra % sign means!
Here are two code snippets that might help describe the situation a bit more:
img_files=${img_files}' '$(ls ${TRAINING_DIR}/*.exp${exposure}.tif)
for img_file in ${img_files}; do
run_command tesseract ${img_file} ${img_file%.*} \
${box_config} ${config} &
done
The expression ${img_file%.*} removes the rightmost dot and everything after it from the value of img_file (if the value contains no dot, it is left unchanged). From man bash:
${parameter%word}
${parameter%%word}
Remove matching suffix pattern. The word is expanded to produce
a pattern just as in pathname expansion. If the pattern matches
a trailing portion of the expanded value of parameter, then the
result of the expansion is the expanded value of parameter with
the shortest matching pattern (the ``%'' case) or the longest
matching pattern (the ``%%'' case) deleted.
Example:
>var="word1 word2"
>echo ${var%word2}
word1
>echo ${var%word1}
word1 word2
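Here % means removal from the right edge. For example, consider a variable img_file="racecar":
${img_file%c*} returns race (the shortest matching suffix, car, is removed).
${img_file%%c*} returns ra (the longest matching suffix, cecar, is removed).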
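A short sketch contrasting % and %%, using a made-up file name in the style of the training script (the path train/eng.exp0.tif is hypothetical), plus the racecar example from above:

```shell
#!/usr/bin/env bash
# Hypothetical file name in the style of the training script.
img_file="train/eng.exp0.tif"

base="${img_file%.*}"    # shortest suffix matching '.*' removed -> train/eng.exp0
stem="${img_file%%.*}"   # longest suffix matching '.*' removed  -> train/eng

# With no dot in the value the pattern never matches, so nothing is removed.
plain="README"
same="${plain%.*}"       # -> README

word="racecar"
short="${word%c*}"       # removes "car"   -> race
long="${word%%c*}"       # removes "cecar" -> ra

printf '%s\n' "$base" "$stem" "$same" "$short" "$long"
```

This is why the script can pass ${img_file%.*} to tesseract as an output base name: only the final extension is stripped.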

What does this variable assignment do?

I'm having to code a Subversion hook script, and I found a few examples online, mostly Python and Perl. I found one or two shell scripts (bash) as well. I am confused by one line and am sorry this is such a basic question.
FILTER=".(sh|SH|exe|EXE|bat|BAT)$"
The script later uses this to perform a test, such as (assume EXT=ex):
if [[ "$FILTER" == *"$EXT"* ]]; then blah
My problem is the above test is true. However, I'm not asking you to assist in writing the script, just explaining the initial assignment of FILTER. I don't understand that line.
Editing in a closer example FILTER line. Of course the script, as written, does not work, because 'ex' matches too, not just 'exe'. My only problem here, however, is that I don't understand the layout of the variable assignment itself.
Why is there a period at the beginning? ".(sh..."
Why is there a dollar sign at the end? "...BAT)$"
Why are there pipes between each pattern? "sh|SH|exe"
You are probably looking for something like this:
#!/usr/bin/env bash
FILTER="\.(sh|SH|exe|EXE|bat|BAT)$"
for EXT
do
    if [[ "$EXT" =~ $FILTER ]];
    then
        echo "$EXT extension disallowed"
    else
        echo "$EXT is allowed"
    fi
done
Save it to myscript.sh and run it as
bash myscript.sh bash ba.sh
and you will get
bash is allowed
ba.sh extension disallowed
If you don't escape the dot, i.e. with FILTER=".(sh|SH|exe|EXE|bat|BAT)$", you will get
bash extension disallowed
ba.sh extension disallowed
which is (of course) wrong.
For the questions:
Why is there a period at the beginning? ".(sh..."
Because you want to match .sh (as an extension) and not, for example, bash (without the dot). Therefore the . must be escaped as \., because an unescaped . in a regex means "any character".
Why is there a dollar sign at the end? "...BAT)$"
The $ means "end of string". You want to match file.sh and not file.sh.jpg, so the .sh must be at the end of the string.
Why are there pipes between each pattern? "sh|SH|exe"
In a regex, the (...|...|...) construction delimits the alternatives, as you surely guessed.
You really need to read a regex tutorial; the topic is more involved than can be explained in one answer.
PS: NEVER use UPPERCASE names for your own variables; they can collide with environment variables.
This just assigns a string to FILTER; the contents of that string have no special meaning. When you match it against the pattern *ex*, the result is true if the value of $FILTER consists of the string ex with anything on either side. That is the case: ex is a substring of exe.
FILTER=".(sh|SH|exe|EXE|bat|BAT)$"
                ^^
                |
                +---- here is the "ex" from the pattern.
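To make the difference concrete, here is a sketch contrasting the question's glob test with a regex test on whole file names. The helper name check, the variable glob_result and the sample file names are hypothetical.

```shell
#!/usr/bin/env bash
FILTER=".(sh|SH|exe|EXE|bat|BAT)$"
EXT="ex"

# Glob test from the question: true whenever "ex" appears anywhere in FILTER.
if [[ "$FILTER" == *"$EXT"* ]]; then glob_result=match; else glob_result=no-match; fi

# Regex test with the dot escaped; the pattern is kept in a variable and left
# unquoted on the right of =~ so bash treats it as a regex, not a literal.
regex='\.(sh|SH|exe|EXE|bat|BAT)$'
check() { [[ "$1" =~ $regex ]] && echo disallowed || echo allowed; }

echo "$glob_result"   # match
check "setup.exe"     # disallowed
check "setup.ex"      # allowed: ".ex" is not one of the alternatives
check "bash"          # allowed: there is no literal dot
```

The glob test treats FILTER as plain text to search in, while =~ treats it as a regex to match file names against, which is what the original author presumably intended.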
As far as I can tell, this is similar to a regular expression pattern:
In regular expressions the start of the string is written as ^; here the . seems intended to do something similar (although in a regex an unescaped . actually matches any single character).
Inside the parentheses you have the exact strings, i.e. the file extensions to be matched, combined with 'or' using '|'.
And at the end, the '$' marks the end of the string, so nothing may follow the extension.
I would say that is how the original author probably looked at it and implemented it.

count quotes in a string that do not have a backslash before them

Hey, I'm trying to use a regex to count the number of quotes in a string that are not preceded by a backslash.
For example, the following strings:
"\"Some text
"\"Some \"text
The code I have was previously using String#count('"'), which is obviously not good enough.
When I count the quotes in both of these examples, I need the result to be 1.
I have been searching here for similar questions and I've tried using lookbehinds, but I cannot get them to work in Ruby.
I have tried the following regexs on Rubular from this previous question
/[^\\]"/
^"((?<!\\)[^"]+)"
^"([^"]|(?<!\)\\")"
None of them gives me the results I'm after.
Maybe a regex is not the way to do this, and a programmatic approach is the solution.
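How about string.count('"') - string.scan('\\"').length? String#count treats its argument as a set of characters, so string.count('\\"') would count every backslash and every quote rather than the two-character sequence \", which is why String#scan is used for the second term.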
result = subject.scan(
  /(?:          # match either
     ^          # start-of-string\/line
   |            # or
     \G         # the position where the previous match ended
   |            # or
     [^\\]      # one non-backslash character
   )            # then
   (?:\\\\)*    # match an even number of backslashes (0 is even, too)
   "            # match a quote
  /x)
This gives you an array of all unescaped quote characters (each possibly preceded by one non-backslash character and by pairs of backslashes), so result.length is the count you are after. The groups are non-capturing so that scan returns the matched strings rather than arrays of captures.
The \G anchor is needed to match successive quotes, and the (?:\\\\)* makes sure that backslashes are only counted as escaping characters if they occur in odd numbers before the quote (to take Amarghosh's correct caveat into account).
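Since the rest of this page is about bash, here is the same counting rule (a quote counts only when an even number of backslashes precedes it) sketched as a plain character loop in bash instead of a regex. count_unescaped_quotes is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# count_unescaped_quotes (hypothetical helper): print how many '"' characters
# in $1 are preceded by an even number of backslashes (zero counts as even).
count_unescaped_quotes() {
  local s=$1 count=0 backslashes=0 ch i
  for (( i = 0; i < ${#s}; i++ )); do
    ch=${s:i:1}
    if [[ $ch == '\' ]]; then
      # Track the length of the current run of backslashes.
      backslashes=$((backslashes + 1))
    else
      # An even run means the quote itself is not escaped.
      if [[ $ch == '"' ]] && (( backslashes % 2 == 0 )); then
        count=$((count + 1))
      fi
      backslashes=0
    fi
  done
  printf '%s\n' "$count"
}

count_unescaped_quotes '"\"Some text'     # 1
count_unescaped_quotes '"\"Some \"text'   # 1
```

The parity check plays the same role as (?:\\\\)* in the Ruby regex: a pair of backslashes escapes itself, so only an odd-length run escapes the following quote.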

Linux: shell builtin string matching

I am trying to become more familiar with the built-in string matching features available in Linux shells. I came across this guy's posting, and he showed an example:
a="abc|def"
echo ${a#*|} # will yield "def"
echo ${a%|*} # will yield "abc"
I tried it out and it does what it's advertised to do, but I don't understand what the $, {}, #, *, and | are doing. I tried looking for a reference online and in the manuals, but I couldn't find anything. Can anyone explain to me what's going on here?
This article in the Linux Journal says that the # operator deletes the shortest possible match on the left, while the % operator deletes the shortest possible match on the right.
So ${a#*|} returns everything after the |, and ${a%|*} returns everything before the |.
If you had a situation that called for greedy matching, you'd use ## or %%.
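A quick sketch of all four operators on a string with two separators, so that the shortest and longest matches differ (the value of a is made up for illustration):

```shell
#!/usr/bin/env bash
a="abc|def|ghi"

front_short="${a#*|}"   # shortest prefix matching '*|' removed -> def|ghi
front_long="${a##*|}"   # longest prefix matching '*|' removed  -> ghi
back_short="${a%|*}"    # shortest suffix matching '|*' removed -> abc|def
back_long="${a%%|*}"    # longest suffix matching '|*' removed  -> abc

printf '%s\n' "$front_short" "$front_long" "$back_short" "$back_long"
```

With the original "abc|def" there is only one |, so # and ## (or % and %%) happen to give the same result.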
Take a look at this.
${string%substring}
Deletes shortest match of $substring
from back of $string.
${string#substring}
Deletes shortest match of $substring
from front of $string.
EDIT:
I don't understand what the $,{},#,*,| are doing
I recommend reading this.
Typically, ${somename} will substitute the contents of a defined parameter:
mystring="1234567"
echo ${mystring} # produces '1234567'
The % and # operators modify this default behavior by removing part of the value before it is substituted.
The asterisk '*' is a wildcard, while the pipe '|' is simply a literal character to match. Let me do the same thing using '4' as the matching character:
mystring="1234567"
echo ${mystring#*4} # produces '567'
Those features and other similarly useful ones are documented in the Shell Parameter Expansion section of the Bash Reference Manual. Here's another really good reference.
