Split a string by two delimiters [duplicate] - ruby

This question already has answers here:
Split string by multiple delimiters
(6 answers)
Closed 6 years ago.
I want to split a string by whitespace and # using a single Ruby command.
word.split(" ") will split by whitespace;
word.split("#") will split by #.
How do I do both at once?

Use regular expressions' character class to do that:
word.split(/[ #]/)
To match any whitespace character use \s : word.split(/[\s#]/)
A character class is delimited with square brackets ([, ]) and lists
characters that may appear at that point in the match. /[ab]/ means a
or b, as opposed to /ab/ which means a followed by b.
/\s/ - A whitespace character: /[ \t\r\n\f]/
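As a small sketch of how the two character classes above behave, here is a run on a made-up input string (the sample text and variable names are assumptions, not from the question). It also shows that adjacent delimiters produce empty strings unless the class is repeated with +.

```ruby
# Hypothetical sample input containing spaces, a tab, and '#'.
word = "alpha beta#gamma\tdelta"

# Split on a literal space or '#'; the tab is not a delimiter here.
space_or_hash = word.split(/[ #]/)

# Split on any whitespace character or '#'.
whitespace_or_hash = word.split(/[\s#]/)

# Adjacent delimiters yield empty strings unless the class is repeated with +.
with_empties = "a #b".split(/[ #]/)
collapsed = "a #b".split(/[ #]+/)

p space_or_hash       # ["alpha", "beta", "gamma\tdelta"]
p whitespace_or_hash  # ["alpha", "beta", "gamma", "delta"]
p with_empties        # ["a", "", "b"]
p collapsed           # ["a", "b"]
```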

Related

Git bash on Windows different result than terminal on CentOS for regex [duplicate]

This question already has answers here:
Removal of special characters from string using perl script
(2 answers)
Closed 9 months ago.
See the following cleanCustomer.sh file
#!/bin/bash
customer=Reportçós
cleanedCustomer=${customer//[^a-zA-Z0-9 \-_.]/}
echo $cleanedCustomer
When I run it on Windows 11 in Git Bash it prints Reports.
When I run it on CentOS in terminal it prints Reportçós.
Does anybody know why a-z matches accented letters on CentOS but not on Windows?
How do I ensure that only English characters are matched on CentOS?
From the bash manual:
A pair of characters separated by a hyphen denotes a range expression; any character that falls between those two characters, inclusive, using the current locale’s collating sequence and character set, is matched. If the first character following the ‘[’ is a ‘!’ or a ‘^’ then any character not enclosed is matched. A ‘-’ may be matched by including it as the first or last character in the set.
Your Git Bash locale uses collation rules under which ranges like a-z don't match accented characters; your CentOS locale's rules do. You can address this by using a consistent locale such as C for collation (note that LC_ALL, if set, overrides LC_COLLATE). Also, your - is in the wrong spot: it needs to be first or last, and the backslash needs to be escaped with another backslash to match a literal one.
#!/bin/bash
LC_COLLATE=C
customer=Reportçós
cleanedCustomer=${customer//[^a-zA-Z0-9 \\_.-]/}
printf "%s\n" "$cleanedCustomer"
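For comparison only, here is a rough Ruby sketch of the same cleanup. Ruby regex ranges are compared by code point rather than by the locale's collating sequence, so a-z covers only the ASCII letters whatever the environment. The variable names are assumptions, and the accented letters are assumed to be precomposed code points.

```ruby
# Sample value from the question, written with escapes so the accented
# letters are the precomposed code points U+00E7 and U+00F3.
customer = "Report\u00E7\u00F3s"

# Same character class as the corrected bash pattern: ASCII letters, digits,
# space, backslash, underscore, period, and a trailing literal hyphen.
cleaned_customer = customer.gsub(/[^a-zA-Z0-9 \\_.-]/, "")

puts cleaned_customer  # Reports
```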

How do I truncate the last two characters of all files in a directory? [duplicate]

This question already has answers here:
Bash script to remove 'x' amount of characters the end of multiple filenames in a directory?
(3 answers)
Closed 5 years ago.
So pretty simple question. All of the files in my directory are of the form 6bfefb348d746eca288c6d62f6ebec04_0.jpg. I want them to look like 6bfefb348d746eca288c6d62f6ebec04.jpg. Essentially, I want to take off the _0 at the end of every file name. How would I go about doing this with bash?
With Perl's standalone rename command:
rename -n 's/..(\....)$/$1/' *
If everything looks fine, remove -n.
It is possible to use this standalone rename command with a syntax similar to sed's s/regexp/replacement/ command. In regex a . matches one character. \. matches a . and $ matches end of line (here end of filename). ( and ) are special characters in regex to mark a subexpression (here one . and three characters at the end of your filename) which then can be reused with $1. sed uses \1 for first back-reference, rename uses $1.
See: Back-references and Subexpressions with sed
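To see what the substitution does to a single name before touching any files, the same regex can be tried in Ruby. This is only an illustrative sketch: it prints the old and new names and renames nothing, and the variable names are assumptions.

```ruby
# File name taken from the question.
old_name = "6bfefb348d746eca288c6d62f6ebec04_0.jpg"

# Same pattern as the rename command: two arbitrary characters, then a
# captured literal dot plus three characters at the end of the name.
new_name = old_name.sub(/..(\....)\z/, '\1')

# A name too short to match the pattern is left unchanged.
untouched = "a.txt".sub(/..(\....)\z/, '\1')

puts "#{old_name} -> #{new_name}"
puts untouched
```

Note that the pattern drops any two characters before the extension, not only _0, so a dry run is worth doing first.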

What does expanding a variable as "${var%%r*}" mean in bash? [duplicate]

This question already has an answer here:
Bash: manipulating with strings (percent sign)
(1 answer)
Closed 6 years ago.
I've got the following variable set in bash:
ver=$(/usr/lib/virtualbox/VBoxManage -v | tail -1)
then I have the following variable which I do not quite understand:
pkg_ver="${ver%%r*}"
Could anyone elaborate on what this does, and how pkg_ver is related to the original ver value?
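It is bash parameter expansion syntax that removes the longest suffix matching the pattern r*, that is, everything from the first occurrence of r to the end of the string, and keeps the text before it: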
name="Ivory"
printf "%s\n" "${name%%r*}"
Ivo
${PARAMETER%%PATTERN}
This form is to remove the described pattern trying to match it from the end of the string. The operator "%" will try to remove the shortest text matching the pattern, while "%%" tries to do it with the longest text matching.
pkg_ver will hold everything in ver up to, but not including, the first "r" character.
export ver=aaarrr
echo "${ver%%r*}"
aaa
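The difference between % and %% can be mimicked in Ruby with regular expressions, which may make the "shortest" versus "longest" wording easier to see. This is an analogy, not what bash does internally, and the version string below is a hypothetical sample rather than real VBoxManage output.

```ruby
# Hypothetical version string; the real VBoxManage output may differ.
ver = "6.1.26r145957"

# Like ${ver%%r*}: drop the longest suffix that starts with "r",
# i.e. everything from the first "r" onward.
longest_removed = ver.sub(/r.*\z/m, "")

# Like ${var%r*}: drop the shortest such suffix, i.e. from the last "r" onward.
shortest_removed = "aaarrr".sub(/r[^r]*\z/, "")

puts longest_removed   # 6.1.26
puts shortest_removed  # aaarr
```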

How remove first match in gsub? [duplicate]

This question already has answers here:
Ruby - replace the first occurrence of a substring with another string
(3 answers)
Closed 6 years ago.
I have this
2016-05-20T13:36:29.835, CTF3D57C
and I want this
2016-05-2013:36:29.835, CTF3D57C
I just want to remove the first T character. How do I do this?
This will substitute the first 'T' in string with anything you want:
str = str.sub('T', '')
If you wish to substitute all occurrences of a substring or regex, use gsub.
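A quick sketch of the difference, using the line from the question (the variable names are assumptions):

```ruby
str = "2016-05-20T13:36:29.835, CTF3D57C"

# sub replaces only the first match.
first_only = str.sub("T", "")

# gsub replaces every match, including the T inside CTF3D57C.
every_match = str.gsub("T", "")

puts first_only   # 2016-05-2013:36:29.835, CTF3D57C
puts every_match  # 2016-05-2013:36:29.835, CF3D57C
```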

How to allow string with letters, numbers, period, hyphen, and underscore?

I am trying to make a regular expression that allows strings made of lowercase and uppercase letters and numbers (a-zA-Z0-9) as well as the characters . - _
How do I make such a regex?
The following regex should be what you are looking for (explanation below):
\A[-\w.]*\z
The following character class should match only the characters that you want to allow:
[-a-zA-Z0-9_.]
You could shorten this to the following since \w is equivalent to [a-zA-Z0-9_]:
[-\w.]
Note that to include a literal - in your character class, it needs to be the first or last character, because otherwise it will be interpreted as a range (for example [a-d] is equivalent to [abcd]). The other option is to escape it with a backslash.
Normally . means any character except newlines, and you would need to escape it to match a literal period, but this isn't necessary inside of character classes.
The \A and \z are anchors to the beginning and end of the string. Without them you would match strings that contain any of the allowed characters, instead of strings that contain only the allowed characters.
The * means zero or more characters. If you want to require one or more characters, change the * to a +.
/\A[\w\-\.]+\z/
\w means alphanumeric (case-insensitive) and "_"
\- means dash
\. means period
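\A means beginning of the string (stricter than ^, which also matches at the start of any line)
\z means end of the string (stricter than $, which also matches at the end of any line)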
for example:
>> 'a-zA-z0-9._' =~ /\A[\w\-\.]+\z/
=> 0 # this means a match
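A minimal sketch of how this pattern might be wrapped in a check, with the line-anchor pitfall shown at the end. The constant and method names are hypothetical, chosen only for illustration.

```ruby
# Hypothetical constant and helper names, used only for illustration.
ALLOWED = /\A[\w\-\.]+\z/

def allowed?(text)
  ALLOWED.match?(text)
end

p allowed?("file-name_1.txt")  # true
p allowed?("bad name")         # false
p allowed?("")                 # false

# ^ and $ anchor to lines, so a multi-line string can slip through.
p(/^[\w\-\.]+$/.match?("ok\nnot ok!"))    # true
p(/\A[\w\-\.]+\z/.match?("ok\nnot ok!"))  # false
```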
UPDATED thanks phrogz for improvement
