I have a regex in Ruby to match a huge list of emoticons:
/\|?>?[:*;Xx8=<(%)D]-?'?,?o?\_^?[-DOo0S*Ppb3c:;\/\\|)(}{\]><]\)?/
But it doesn't match a few emoticons like
:'-(
and
:*(
The link with my set of matching emoticons is http://rubular.com/r/1vnWEvN76v
How do I match the unmatched ones?
Use :'-\( for the first example and :\*\( for the second; you can add them to your regex as alternatives with a pipe (|). How you combine them depends on what you want the regex to match; you can also append them after the other alternatives.
Note that ( and * are regex metacharacters, so you need a \ before them to match them literally.
In this case, you just need to add |\( to the end of your regex:
\|?>?[:*;Xx8=<(%)D]-?'?,?o?\_^?[-DOo0S*Ppb3c:;\/\\|)(}{\]><]\)?|\(
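As a minimal sketch of the escaping advice above, the two missing emoticons can be written as their own alternation, either escaped by hand or built with Regexp.union, which escapes plain strings itself. The variable names are illustrative only, and how the result is merged into the larger pattern is left open here.

```ruby
# The two emoticons the original pattern misses.
missing = [":'-(", ":*("]

# Hand-escaped: ( and * are metacharacters, so each gets a backslash.
by_hand = /:'-\(|:\*\(/

# Regexp.union escapes string arguments and joins them with |.
by_union = Regexp.union(missing)

missing.each do |emoticon|
  puts "#{emoticon}: by_hand=#{by_hand.match?(emoticon)} by_union=#{by_union.match?(emoticon)}"
end
```

Both patterns match each emoticon in full, so either could be added to the main regex as extra alternatives.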
I want to use this regex to match any block comment (C-style) in a string.
Why does the pattern below not match?
rblockcmt = Regexp.new "/\\*[.\s]*?\\*/" # match block comment
p rblockcmt=~"/* 22/Nov - add fee update */"
==> nil
And in addition to what Sir Swoveland posted, a . matches any character except a newline:
The following metacharacters also behave like character classes:
/./ - Any character except a newline.
https://ruby-doc.org/core-2.3.0/Regexp.html
If you need . to match a newline, you can specify the m flag, e.g. /.*?/m
Options
The end delimiter for a regexp can be followed by one or more
single-letter options which control how the pattern can match.
/pat/i - Ignore case
/pat/m - Treat a newline as a character matched by .
...
https://ruby-doc.org/core-2.3.0/Regexp.html
Because having exceptions/quirks like newline not matching a . can be painful, some people specify the m option for every regex they write.
It appears that you intend [.\s]*? to match any character or a whitespace, zero or more times, lazily. Firstly, whitespace characters are characters, so you don't need \s. That simplifies your expression to [.]*?. Secondly, if your intent is to match any character there is no need for a character class; just write a bare . on its own. Thirdly, and most importantly, a period within a character class is simply the literal character ".".
You want .*? (or [^*]*).
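A short sketch of the difference, assuming the intended character class was [.\s] and using the m flag mentioned above so that comments spanning several lines are covered too. The names broken, fixed, one_line and multi_line are made up for this illustration.

```ruby
# Inside a character class, "." is a literal period, so this class only
# accepts periods and whitespace between /* and */.
broken = Regexp.new("/\\*[.\\s]*?\\*/")

# A bare dot with the m flag matches any character, including newlines.
fixed = %r{/\*.*?\*/}m

one_line   = "/* 22/Nov - add fee update */"
multi_line = "x = 1 /* first line\n second line */ y = 2"

p broken =~ one_line   # nil
p fixed =~ one_line    # 0
p multi_line[fixed]    # "/* first line\n second line */"
```

The lazy quantifier keeps the match from running on to a later */ when a string holds more than one comment.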
I have an input file named test which looks like this
leonid sergeevich vinogradov
ilya alexandrovich svintsov
and when I use grep like this grep 'leonid*vinogradov' test it says nothing, but when I type grep 'leonid.*vinogradov' test it gives me the first string. What's the difference between * and .*? Because I see no difference between any number of any characters and any character followed by any number of any characters.
I use ubuntu 14.04.3.
Unlike in a file glob, * doesn't match any number of characters. It is an operator that indicates 0 or more matches of the previous character. The regular expression leonid*vinogradov would require a v to appear immediately after 0 or more ds. The . is the regular expression metacharacter representing any single character, so .* matches 0 or more arbitrary characters.
grep uses regular expressions, and .* matches 0 or more of any character.
'leonid*vinogradov' is also evaluated as a regex, where it means leoni followed by 0 or more of the letter d and then vinogradov, hence your match fails.
grep uses regular expressions (regexp for short), not the wildcards you had in mind. In this case, "." means any character and "*" means any number (including zero) of the previous character, so ".*" means anything here.
Check the link, or google it; it's a powerful tool that you'll find worth knowing.
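The same two patterns behave the same way in Ruby's regex engine, so the difference can be checked without grep. This is only an illustrative sketch; the array stands in for the lines of the test file, and the variable names are invented.

```ruby
# Stand-in for the two lines of the input file.
lines = ["leonid sergeevich vinogradov", "ilya alexandrovich svintsov"]

star_only = /leonid*vinogradov/   # "leoni", zero or more "d", then "vinogradov"
dot_star  = /leonid.*vinogradov/  # "leonid", any run of characters, then "vinogradov"

# Enumerable#grep keeps the elements the pattern matches.
star_hits     = lines.grep(star_only)
dot_star_hits = lines.grep(dot_star)

p star_hits      # []
p dot_star_hits  # ["leonid sergeevich vinogradov"]

# Strings the first pattern does accept:
p star_only.match?("leonivinogradov")      # true
p star_only.match?("leonidddvinogradov")   # true
```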
I am trying to match mathematical expressions like 1+2 and 1*2/3, of any length. Can someone explain why my regex matches the final case below, and how to fix it so that it matches only valid expressions (that might stretch forever)?
perms = ["12+2*4", "2+2", "-2+", "12+34-"]
perms.each do |line|
  puts "#{line}=#{eval(line)}" if line =~ /^\d+([+-\/*]\d+){1,}/
end
I expected the output to be:
12+2*4=20
2+2=4
Inside a [character set], the - character defines a range of characters -- think of [a-z] or [0-9]. If you want to match a literal -, it must be the first or last character.
/^\d+(?:[+\/*-]\d+)+$/
Other things: {1,} is exactly +; and you need to anchor at the end too, so you don't match 1+2+
You should finalize your expression with $ to match the entire input string:
/^\d+([-+\/*]\d+){1,}$/
The wrong position of the hyphen - is one source of error in your expression. The missing $ is the other.
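To see both problems side by side, here is a small sketch comparing the question's pattern with an anchored one. Two things are my own additions rather than part of the answers: the extra input "1,2", which shows what the accidental range +-/ lets through (it covers the characters + , - . /), and the use of \A and \z in place of ^ and $ so that the whole string must match, not just one line.

```ruby
original = /^\d+([+-\/*]\d+){1,}/       # range "+" to "/", no end anchor
fixed    = /\A\d+(?:[-+\/*]\d+)+\z/      # literal "-" first, anchored at both ends

candidates = ["12+2*4", "2+2", "-2+", "12+34-", "1,2"]

original_hits = candidates.select { |s| original.match?(s) }
fixed_hits    = candidates.select { |s| fixed.match?(s) }

p original_hits  # ["12+2*4", "2+2", "12+34-", "1,2"]
p fixed_hits     # ["12+2*4", "2+2"]
```

The unanchored pattern accepts "12+34-" because the prefix "12+34" already satisfies it.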
I'm trying to make a regex that matches anything except an exact ending string, in this case, the extension '.exe'.
Examples for a file named:
'foo' (no extension) I want to get 'foo'
'foo.bar' I want to get 'foo.bar'
'foo.exe.bar' I want to get 'foo.exe.bar'
'foo.exe1' I want to get 'foo.exe1'
'foo.bar.exe' I want to get 'foo.bar'
'foo.exe' I want to get 'foo'
So far I created the regex /.*\.(?!exe$)[^.]*/
but it doesn't work for cases 1 and 6.
You can use a positive lookahead.
^.+?(?=\.exe$|$)
^ start of string
.+? non greedily match one or more characters...
(?=\.exe$|$) until literal .exe occurs at end. If not, match end.
See demo at Rubular.com
Wouldn't a simple replacement work?
string.sub(/\.exe\z/, "")
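As a quick check, both the lookahead pattern and the sub call from the answers above can be run against the six filenames in the question. The expected array is copied from the question's list; the other names are illustrative.

```ruby
names    = ["foo", "foo.bar", "foo.exe.bar", "foo.exe1", "foo.bar.exe", "foo.exe"]
expected = ["foo", "foo.bar", "foo.exe.bar", "foo.exe1", "foo.bar",     "foo"]

# Replacement approach: drop a trailing ".exe" if there is one.
by_sub = names.map { |name| name.sub(/\.exe\z/, "") }

# Lookahead approach: lazily match up to a final ".exe" or the end of the string.
by_lookahead = names.map { |name| name[/^.+?(?=\.exe$|$)/] }

p by_sub == expected        # true
p by_lookahead == expected  # true
```

The sub version reads more directly when the goal is just to strip the extension; the lookahead version is the one to use if a pure match is required.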
Do you mean regex matching or capturing?
There may be a regex only answer, but it currently eludes me. Based on your test data and what you want to match, doing something like the following would cover both what you want to match and capture:
name = 'foo.bar.exe'
match = /(.*)\.exe$/.match(name)  # escape the dot so only a literal ".exe" counts
if match.nil?
  # no .exe extension, so this filename already matches your conditions
  print name
else
  # otherwise match[1] is the capture: the filename without the .exe extension
  print match[1]
end
string pattern = @" (?x) (.* (?= \.exe$ )) | ((?=.*\.exe).*)";
First match is a positive look-ahead that checks if your string
ends with .exe. The condition is not included in the match.
Second match is a positive look-ahead with the condition included in the
match. It only checks if you have something followed by .exe.
(?x) means that white spaces inside the pattern string are ignored.
Or don't use (?x) and just delete all white spaces.
It works for all the 6 scenarios provided.
I have the following string format:
[[TEXT|TEXT]] <-- the "|TEXT" is optional
And so far what works fine is:
/([^\[]+)(?=\]\])/
Which will return:
TEXT
or
TEXT|TEXT
I want to match up to "|TEXT" if it has been included and only ever match the left side of "|" or "]]" depending on which is first.
Any suggestions?
UPDATE - if you only want to match what's in the [[]] and not including the |TEXT if it's there, try this:
([^|\[]+?)(?=(?:\]\]|\|))
Which is more-or-less your starting regex but it stops at a | or a ]].
Or if your regex flavour supports look-behinds you could do:
(?<=\[\[)([^\|]+?)(?=(?:\]\]|\|))
This just enforces that you match from [[ onwards. Not really necessary but a bit more unknown-content-proof.
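A small Ruby sketch of the two patterns, using made-up sample strings. The look-behind variant is written here with balanced parentheses, and Ruby is assumed only because its regex engine supports look-behinds; other flavours may differ.

```ruby
# Stops at the first "|" or "]]", whichever comes first.
stop_early = /([^|\[]+?)(?=(?:\]\]|\|))/

# Same idea, but only starts matching right after "[[".
anchored = /(?<=\[\[)([^|]+?)(?=(?:\]\]|\|))/

samples = { "[[Page|Label]]" => "Page", "[[Page]]" => "Page" }

samples.each do |input, want|
  # String#[] with a regex and 1 returns the first capture group.
  puts "#{input} -> #{input[stop_early, 1]} / #{input[anchored, 1]} (want #{want})"
end
```

Only the first match is taken here; scanning for all matches with the first pattern would also return the text after the pipe.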