Find last word in string without any dots, white spaces - preg-match

I don't understand regular expressions, because they seem very difficult. What I found only finds the last word together with dots and so on.
$content='Nullam quis risus eget urna mollis ornare vel eu leo.';
$pattern = '/[^ ]*$/';
preg_match($pattern, $content, $result);
echo $result[0];
I get "leo.".
How can I get just "leo", without the dot or question mark?
Thank you.

You can obtain the last word using the $ anchor and a lookahead:
$pattern = '~[a-z]+(?=[^a-z]*$)~i';
explanation:
~ # pattern delimiter
[a-z]+ # all characters between a and z (included) one or more times
(?= # open a lookahead. Means followed by:
[^a-z]* # all characters that are not letters zero or more times
$ # until the end of the string
) # close the lookahead
~ # pattern delimiter
i # case insensitive ( [a-z] => [a-zA-Z] and [^a-z] => [^a-zA-Z])
To experiment with your regexes, I suggest using this online tool, which is specific to PHP.
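The same lookahead idea carries over to other regex engines. As a minimal sketch in Ruby rather than PHP (the method name last_word is hypothetical, and \z is used instead of $ because in Ruby $ matches at the end of each line, not only at the end of the string):

```ruby
# Hypothetical helper, not part of the original answer: returns the last
# run of ASCII letters in +text+, or nil if there are none.
def last_word(text)
  # [a-z]+          one or more letters (case-insensitive because of /i)
  # (?=[^a-z]*\z)   followed only by non-letters up to the end of the string
  text[/[a-z]+(?=[^a-z]*\z)/i]
end

puts last_word('Nullam quis risus eget urna mollis ornare vel eu leo.')  # prints leo
```

Because the lookahead does not consume anything, the trailing dot or question mark is checked but never becomes part of the match.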

You can use rubular.com to learn and test regular expressions. They also provide a nice cheat-sheet =).
You will catch your word with: \b([a-zA-Z]*)[\W]$
With this regex, you will match the period too. To extract only "leo", you will need to understand capturing groups: http://www.regular-expressions.info/named.html

Somehow I solved this problem with the pattern '/[\p{L}\d]+(?=[^a-z]*$)/u'. Does anyone know how to find the first and last words in a string and put them in arrays with preg_match?

Related

Regexp hangs when input string contains brackets

I have:
vv = /added:\s{0,}\d{1,2}\/\d{1,2}\/\d{4}|terminated:\s{0,}\d{1,2}\/\d{1,2}\/\d{4}|(?-mix:\((\w+([\p{P}\s]{,3}\w*)*)\))/i
Below is my experiment:
detail = "(value containts lorem ipsum lorum ipsum"
detail =~ vv
When I try without bracket at the start of input string, it works.
detail = "value containts lorem ipsum lorum ipsum"
detail =~ vv
# => nil
The problem you are experiencing is catastrophic backtracking. Your \w+([\p{P}\s]{,3}\w*)* causes the issue because ([\p{P}\s]{,3}\w*)* contains a nested zero-or-more quantifier *. The problem arises because the parts inside are both optional (they can match empty strings) and quantified. See your regex demo and try adding one more symbol to watch the step count increase: adding a space after (value containt raises the number of steps from 65,742 to 102,610! Adding one more symbol crashes the demo.
Replacing it with \w+(?:[\p{P}\s]{1,3}\w+)*, or even \w+(?:\W{1,3}\w+)* should fix the issue as the subpatterns inside the grouping (...) construct will no longer be matching empty strings (but the whole group will be optional, zero or more repetitions). [\p{P}\s]{1,3} requires at least 1 punctuation or whitespace and \w+ requires one or more word characters.
Also note that you do not need the (?-mix:...) group, so I removed it from my suggested pattern: you have no . inside (no need for m), no letters that can be in lower or upper case (no need for i), and no spaces to ignore in the pattern (no need for x). Also, the {0,} quantifier is equivalent to *, so I replaced both occurrences at the beginning.
Use
vv = /added:\s*\d{1,2}\/\d{1,2}\/\d{4}|terminated:\s*\d{1,2}\/\d{1,2}\/\d{4}|\((\w+(?:[\p{P}\s]{1,3}\w+)*)\)/i
detail = "(value containts lorem ipsum lorum ipsum"
detail =~ vv
See Ruby demo
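To check the rewritten pattern on a few inputs, a small sketch such as the following can be used. The constant and method names are made up for illustration; only the pattern itself comes from the suggestion above.

```ruby
# Illustrative names (VV_FIXED, first_match); the pattern is the suggested one.
VV_FIXED = /added:\s*\d{1,2}\/\d{1,2}\/\d{4}|terminated:\s*\d{1,2}\/\d{1,2}\/\d{4}|\((\w+(?:[\p{P}\s]{1,3}\w+)*)\)/i

# Returns the matched text, or nil when nothing matches.
def first_match(detail)
  match = VV_FIXED.match(detail)
  match && match[0]
end

p first_match("(value containts lorem ipsum lorum ipsum")   # nil: no closing parenthesis
p first_match("(value containts lorem ipsum lorum ipsum)")  # the whole parenthesised text
p first_match("Added: 1/2/2016")                            # matched thanks to the i flag
```

Since each repetition of the group now has to consume at least one separator and one word character, the engine has far fewer ways to split the same text, so the failing case returns quickly.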

Regex matching chars around text

I have a string with chars inside and I would like to match only the chars around a string.
"This is a [1]test[/1] string. And [2]test[/2]"
Rubular http://rubular.com/r/f2Xwe3zPzo
Currently, the code in the link matches the text inside the special chars, how can I change it?
Update
To clarify my question. It should only match if the opening and closing has the same number.
"[2]first[/2] [1]second[/2]"
In the code above, only first should match and not second. The text inside the special chars (first), should be ignored.
Try this:
(\[[0-9]\]).+?(\[\/[0-9]\])
Permalink to the example on Rubular.
Update
Since you want to remove the 'special' characters, try this instead:
foo = "This is a [1]test[/1] string. And [2]test[/2]"
foo.gsub /\[\/?\d\]/, ""
# => "This is a test string. And test"
Update, Part II
You only want to remove the 'special' characters when the surrounding tags match, so what about this:
foo = "This is a [1]test[/1] string. And [2]test[/2], but not [3]test[/2]"
foo.gsub /(?:\[(?<number>\d)\])(?<content>.+?)(?:\[\/\k<number>\])/, '\k<content>'
# => "This is a test string. And test, but not [3]test[/2]"
\[([0-9])\].+?\[\/\1\]
([0-9]) is a capture since it is surrounded with parentheses. The \1 tells it to use the result of that capture. If you had more than one capture, you could reference them as well, \2, \3, etc.
Rubular
You can also use a named capture, rather than \1 to make it a little less cryptic. As in: \[(?<number>[0-9])\].+?\[\/\k<number>\]
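As a rough illustration of how the backreference restricts matches to tags with equal numbers, the following sketch lists the pairs that qualify. The method name matched_pairs is hypothetical.

```ruby
# Hypothetical helper: lists [number, content] pairs for tags whose
# opening and closing numbers agree.
def matched_pairs(text)
  # (\d) captures the opening number, (.+?) the enclosed text,
  # and \1 requires the closing tag to repeat the same number.
  text.scan(/\[(\d)\](.+?)\[\/\1\]/)
end

p matched_pairs("[2]first[/2] [1]second[/2]")  # => [["2", "first"]]
```

The second tag pair is skipped because \1 holds "1" after the opening tag, and no [/1] follows it.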
Here's a way to do it that uses the form of String#gsub that takes a block. The idea is to pull strings such as "[1]test[/1]" into the block, and there remove the unwanted bits.
str = "This is a [1]test[/1] string. And [2]test[/2], plus [3]test[/99]"
r = /
\[ # match a left bracket
(\d+) # capture one or more digits in capture group 1
\] # match a right bracket
.+? # match one or more characters lazily
\[\/ # match a left bracket and forward slash
\1 # match the contents of capture group 1
\] # match a right bracket
/x
str.gsub(r) { |s| s[/(?<=\]).*?(?=\[)/] }
#=> "This is a test string. And test, plus [3]test[/99]"
Aside: When I first heard of named capture groups, they seemed like a great idea, but now I wonder if they really make regexes easier to read than \1, \2....

Need some help understanding backreferences in ruby

I was working on a coderbyte problem where I output the number of occurrences of a character along with the corresponding character. For example "wwwggopp" would return 3w2g1o2p. I was able to solve it but I compared my answer to someone else's and they came up with the following:
def RunLength(str)
  chunks = str.scan(/((\w)\2*)/)
  output = ''
  chunks.each do |chunk|
    # chunk[0] is the whole run, chunk[1] is the repeated character
    output << chunk[0].size.to_s + chunk[1]
  end
  output
end
I get most of the code but what exactly is happening here?
(/((\w)\2*)/)
I understand that \w refers to any character, \2 is a 'backreference', and * means 0 or more instances, but together I'm not sure what they mean, mostly because I don't really know what a backreference is or how it works. I've been reading about it but I'm still struggling to grasp the concept. Does the \2 refer to the "2nd group" and if so, what exactly is the "2nd group"?
Backreferences recall what was matched by a capturing group. A backreference is specified as a backslash (\) followed by a digit indicating the number of the group to be recalled.
Your regular expression, broken down:
( # group and capture to \1:
( # group and capture to \2:
\w # word characters (a-z, A-Z, 0-9, _)
) # end of \2
\2* # what was matched by capture \2 (0 or more times)
) # end of \1
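To see the two groups in action, it may help to look at what scan returns for the example input. This is only a sketch; the method names run_chunks and run_length are hypothetical, while the pattern is the one from the question.

```ruby
# Hypothetical helper names; the pattern is the one from the question.
def run_chunks(str)
  # Each scan result is [whole_run, single_char], i.e. [group 1, group 2].
  str.scan(/((\w)\2*)/)
end

def run_length(str)
  run_chunks(str).map { |run, char| "#{run.size}#{char}" }.join
end

p run_chunks("wwwggopp")     # => [["www", "w"], ["gg", "g"], ["o", "o"], ["pp", "p"]]
puts run_length("wwwggopp")  # prints 3w2g1o2p
```

Group 2 captures a single word character, and \2* then keeps consuming exact repeats of that character, so group 1 ends up holding the whole run.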

Substring starting with a combination of digits till the next white space

I have a very long string of around 2000 chars. The string is a join of segments with first two chars of each segment as the segment indicator.
Eg- '11xxxxx 12yyyy 14ddddd gghgfbddc 0876686589 SANDRA COLINS 201 STMONK CA'
Now I want to extract the segment with indicator 14.
I achieved this using:
key = nil # declared outside the block so it is still visible afterwards
str.split(' ').each do |substr|
  if substr.start_with?('14')
    key = substr.slice(2, 5).to_i
    break
  end
end
I feel there should be a better way to do this. I am not able to find a more direct and one line solution for string matching in ruby. Please someone suggest a better approach.
It's not entirely clear what you're looking for, because your example string shows letters, but your title says digits. Either way, this is a good task for a regular expression.
foo = '12yyyy 014dddd 14ddddd gghgfbddc'
bar = '12yyyy 014dddd 1499999 gghgfbddc'
baz = '12yyyy 014dddd 14a9B9z gghgfbddc'
foo[/\b14[a-zA-Z]+/] # => "14ddddd"
bar[/\b14\d+/] # => "1499999"
baz[/\b14\w+/] # => "14a9B9z"
foo[/\b14\S+/] # => "14ddddd"
bar[/\b14\S+/] # => "1499999"
baz[/\b14\S+/] # => "14a9B9z"
In the patterns:
\b means word boundary, so the pattern has to start at a transition between a non-word character (such as a space or punctuation) or the start of the string, and a word character.
[a-zA-Z]+ means one or more letters.
\d+ means one or more digits.
\w+ means one or more of letters, digits and '_'. That is equivalent to the character set [a-zA-Z0-9_]+.
\S+ means non-whitespace, which is useful if you want everything up to a space.
Which of those is appropriate for your use-case is really up to you to decide.
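Applied to the string from the question, the \S+ variant might be wrapped like this. It is a sketch under the assumption that the indicator is passed in as a string; the method name segment_for is hypothetical.

```ruby
# Hypothetical helper: returns the first run of non-whitespace characters
# that begins with +indicator+ at a word boundary, or nil if there is none.
def segment_for(str, indicator)
  # Regexp.escape guards against indicators containing regex metacharacters.
  str[/\b#{Regexp.escape(indicator)}\S+/]
end

line = '11xxxxx 12yyyy 14ddddd gghgfbddc 0876686589 SANDRA COLINS 201 STMONK CA'
segment = segment_for(line, '14')
puts segment                      # prints 14ddddd
puts segment.delete_prefix('14')  # prints ddddd
```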

Ruby Regex Match Between "foo" and "bar"

I have unfortunately wandered into a situation where I need a regex in Ruby. Basically, I want to match the part of this string after the underscores and before the first parenthesis. So the end result would be 'table salt'.
_____ table salt (1) [F]
As usual I tried to fight this battle on my own and with rubular.com. I got the first part
^_____ (Match the beginning of the string with underscores ).
Then I got bolder,
^_____(.*?) ( Do the first part of the match, then give me any amount of words and letters after it )
Regex had had enough and put an end to that nonsense and crapped out. So I was wondering if anyone on stackoverflow knew or would have any hints on how to say my goal to the Ruby Regex parser.
EDIT: Thanks everyone, this is the pattern I ended up using after creating it with rubular.
ingredientNameRegex = /^_+([^(]*)/;
Everything got better once I took a deep breath, and thought about what I was trying to say.
str = "_____ table salt (1) [F]"
p str[ /_{3}\s(.+?)\s+\(/, 1 ]
#=> "table salt"
That says:
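Find three underscores in a row (_{3} means exactly three, but since the pattern is not anchored it can match the last three of a longer run)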
and a whitespace character (\s)
and then one or more (+) of any character (.), but as little as possible (?), up until you find
one or more whitespace characters,
and then a literal (
The parens in the middle save that bit, and the 1 pulls it out.
Try this: ^[_]+([^(]*)\(
It will match lines starting with one or more underscores followed by anything not equal to an opening bracket: http://rubular.com/r/vthpGpVr4y
Here's working regex:
str = "_____ table salt (1) [F]"
match = str.match(/_([^_]+?)\(/)
p match[1].strip # => "table salt"
You could use
^_____\s*([^(]+?)\s*\(
^_____ matches five underscores at the beginning of the string (in Ruby, ^ anchors at the start of each line)
\s* matches zero or more whitespace characters
( grouping start
[^(]+ matches one or more characters that are not (
? makes the preceding + match the shortest possible string (non-greedy)
) grouping end
\s* matches zero or more whitespace characters
\( matches a literal (
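In Ruby, the pattern broken down above could be used roughly as follows. The constant and method names are hypothetical, chosen only for this sketch.

```ruby
# Hypothetical names; the pattern is the one explained above.
INGREDIENT_NAME = /^_____\s*([^(]+?)\s*\(/

# Returns the text captured by group 1, or nil if the line does not match.
def ingredient_name(line)
  line[INGREDIENT_NAME, 1]
end

puts ingredient_name('_____ table salt (1) [F]')  # prints table salt
p ingredient_name('table salt (1) [F]')           # nil: no leading underscores
```

The lazy +? together with the trailing \s* keeps the space before the parenthesis out of the capture, so no strip call is needed.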
"_____ table salt (1) [F]".gsub(/[_]\s(.+)\s\(/, ' >>>\1<<< ')
# => "____ >>>table salt<<< 1) [F]"
It seems to me the simplest regex to do what you want is:
/^_____ ([\w\s]+) /
That says:
leading underscores, space, then capture any combination of word chars or spaces, then another space.
