What is the Ruby regex to match a string with at least one period and no spaces? - ruby

What is the regex to match a string with at least one period and no spaces?

You can use this :
/^\S*\.\S*$/
It works like this :
^ <-- Starts with
\S <-- Any character but white spaces (notice the upper case) (same as [^ \t\r\n])
* <-- Repeated but not mandatory
\. <-- A period
\S <-- Any character but white spaces
* <-- Repeated but not mandatory
$ <-- Ends here
You can replace \S by [^ ] to work strictly with spaces (not with tabs etc.)

Something like
^[^ ]*\.[^ ]*$
(match any non-spaces, then a period, then some more non-spaces)

no need regular expression. Keep it simple
>> s="test.txt"
=> "test.txt"
>> s["."] and s.count(" ")<1
=> true
>> s="test with spaces.txt"
=> "test with spaces.txt"
>> s["."] and s.count(" ")<1
=> false

Try this:
/^\S*\.\S*$/

Related

Regex: match something except within arbitrary delimiters

My string:
a = "Please match spaces here <but not here>. Again match here <while ignoring these>"
Using Ruby's regex flavor, I would like to do something like:
a.gsub /regex_pattern/, '_'
And obtain:
"Please_match_spaces_here_<but not here>._Again_match_here_<while ignoring these>"
This should do it:
result = subject.gsub(/\s+(?![^<>]*>)/, '_')
This regex assumes there's nothing tricky like escaped angle brackets. Also be aware that \s matches newlines, TABs and other whitespace characters as well as spaces. That's probably what you want, but you have the option of matching only spaces:
/ +(?![^<>]*>)/
I think, it works:
a = "Please match spaces here <but not here>. Again match here <while ignoring these>"
pattern = /<(?:(?!<).)*>/
a.gsub(pattern, '')
# => "Please match spaces here . Again match here "

what would the regular expression to extract the 3 from be?

I basically need to get the bit after the last pipe
"3083505|07733366638|3"
What would the regular expression for this be?
You can do this without regex. Here:
"3083505|07733366638|3".split("|").last
# => "3"
With regex: (assuming its always going to be integer values)
"3083505|07733366638|3".scan(/\|(\d+)$/)[0][0] # or use \w+ if you want to extract any word after `|`
# => "3"
Try this regex :
.*\|(.*)
It returns whatever comes after LAST | .
You could do that most easily by using String#rindex:
line = "3083505|07733366638|37"
line[line.rindex('|')+1..-1]
#=> "37"
If you insist on using a regex:
r = /
.* # match any number of any character (greedily!)
\| # match pipe
(.+) # match one or more characters in capture group 1
/x # extended mode
line[r,1]
#=> "37"
Alternatively:
r = /
.* # match any number of any character (greedily!)
\| # match pipe
\K # forget everything matched so far
.+ # match one or more characters
/x # extended mode
line[r]
#=> "37"
or, as suggested by #engineersmnky in a comment on #shivam's answer:
r = /
(?<=\|) # match a pipe in a positive lookbehind
\d+ # match any number of digits
\z # match end of string
/x # extended mode
line[r]
#=> "37"
I would use split and last, but you could do
last_field = line.sub(/.+\|/, "")
That remove all chars up to and including the last pipe.

How do prevent whitespace from appearing in these bash variables?

I'm reading in values from an .ini file, and sometimes may get trailing or leading whitespace.
How do I amend this first line to prevent that?
db=$(sed -n 's/.*DB_USERNAME *= *\([^ ]*.*\)/\1/p' < config.ini);
echo -"$db"-
Result;
-myinivar -
I need;
-myinivar-
Use parameter expansion.
echo "=${db% }="
You don't need the .* inside the capturing group (or the semicolon at the end of line):
db="$(sed -n 's/.*DB_USERNAME *= *\([^ ]*\).*/\1/p' < config.ini)"
To elaborate:
.* matches anything at all
DB_USERNAME matches that literal string
* (a single space followed by an asterisk) matches any number of spaces
= matches that literal string
* (a single space followed by an asterisk) matches any number of spaces
\( starts the capturing group that is used for \1 later
[^ ] matches anything which is not a space character
* repeats that zero or more times
\) ends the capturing group
.* matches anything at all
Therefore, the result will be all the characters after DB_USERNAME = and any number of spaces, up to the next space or end of line, whichever comes first.
You can use echo to trim whitespace:
db='myinivar '
echo -"$(echo $db)"-
-myinivar-
Use crudini which handles these ini file edge cases transparently
db=$(crudini --get config.ini '' DB_USERNAME)
To get rid of more than one trailing space, use %% which removes the longest matching pattern from the end of the string
echo "=${db%% *}="

Ruby regex to split text

I am using the below regex to split a text at certain ending punctuation however it doesn't work with quotes.
text = "\"Hello my name is Kevin.\" How are you?"
text.scan(/\S.*?[...!!??]/)
=> ["\"Hello my name is Kevin.", "\" How are you?"]
My goal is to produce the following result, but I am not very good with regex expressions. Any help would be greatly appreciated.
=> ["\"Hello my name is Kevin.\"", "How are you?"]
text.scan(/"(?>[^"\\]+|\\{2}|\\.)*"|\S.*?[...!!??]/)
The idea is to check for quoted parts before. The subpattern is a bit more elaborated than a simple "[^"]*" to deal with escaped quotes (* see at the end to a more efficient pattern).
pattern details:
" # literal: a double quote
(?> # open an atomic group: all that can be between quotes
[^"\\]+ # all that is not a quote or a backslash
| # OR
\\{2} # 2 backslashes (the idea is to skip even numbers of backslashes)
| # OR
\\. # an escaped character (in particular a double quote)
)* # repeat zero or more times the atomic group
" # literal double quote
| # OR
\S.*?[...!!??]
to deal with single quote to you can add: '(?>[^'\\]+|\\{2}|\\.)*'| to the pattern (the most efficient), but if you want make it shorter you can write this:
text.scan(/(['"])(?>[^'"\\]+|\\{2}|\\.|(?!\1)["'])*\1|\S.*?[...!!??]/)
where \1 is a backreference to the first capturing group (the found quote) and (?!\1) means not followed by the found quote.
(*) instead of writing "(?>[^"\\]+|\\{2}|\\.)*", you can use "[^"\\]*+(?:\\.[^"\\]*)*+" that is more efficient.
Add optional quote (["']?) to the pattern:
text.scan(/\S.*?[...!!??]["']?/)
# => ["\"Hello my name is Kevin.\"", "How are you?"]

Why $ doesn't match \r\n

Can someone explain this:
str = "hi there\r\n\r\nfoo bar"
rgx = /hi there$/
str.match rgx # => nil
rgx = /hi there\s*$/
str.match rgx # => #<MatchData "hi there\r\n\r">
On the one hand it seems like $ does not match \r. But then if I first capture all the white spaces, which also include \r, then $ suddenly does appear to match the second \r, not continuing to capture the trailing "\nfoo bar".
Is there some special rule here about consecutive \r\n sequences? The docs on $ simply say it will match "end of line" which doesn't explain this behavior.
$ is a zero-width assertion. It doesn't match any character, it matches at a position. Namely, it matches either immediately before a \n, or at the end of string.
/hi there\s*$/ matches because \s* matches "\r\n\r", which allows the $ to match at the position before the second \n. The $ could have also matched at the position before the first \n, but the \s* is greedy and matches as much as it can, while still allowing the overall regex to match.

Resources