How to use şŞıİçÇöÖüÜĞğ characters in a regular expression in Ruby? - ruby

I tried using a regular expression to capture names:
r[1].scan(/^([A-Z]|[ŞİÇÖÜĞ])([a-z]|[şŞıİçÇöÖüÜĞğ])*\s([A-Z]|[ŞİÇÖÜĞ])([a-z]|[şŞıİçÇöÖüÜĞğ])*/u)
But, it gives me an error:
syntax error, unexpected $end, expecting ')'
...atches = r[1].scan(/^([A-Z]|[ŞİÇÖÜĞ])([a-z]|[şŞ�...
...
I see that the problem is the Turkish characters I'm using. Is it possible to use unicode values of the characters in regexp? How can I use these problematic characters in this regexp?

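Use Ruby 1.9, which supports Unicode character properties in regular expressions, so the Turkish letters don't have to be listed by hand.
Go with /\p{Word}+\p{Space}\p{Word}*/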
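A minimal sketch of the property-class approach, assuming the source file and the input are both UTF-8. CAPITALIZED_NAME and first_name_pair are hypothetical names, and \p{Lu} / \p{Ll} (uppercase / lowercase letter) are swapped in for \p{Word} here only to keep the capitalization check from the original pattern:

```ruby
# encoding: utf-8

# Hypothetical pattern: two capitalized words separated by one whitespace
# character, anchored at the start of a line. \p{Lu} is any uppercase letter
# and \p{Ll} any lowercase letter, so the Turkish letters need no listing.
CAPITALIZED_NAME = /^\p{Lu}\p{Ll}*\s\p{Lu}\p{Ll}*/

# Returns the matched "First Last" text, or nil when the line does not
# start with two capitalized words.
def first_name_pair(line)
  line[CAPITALIZED_NAME]
end

puts first_name_pair("Şükrü Öztürk geldi")
```

If digits and underscores in names are acceptable, the shorter \p{Word} version from the answer works the same way.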

Related

How to use lookbehind regexp in go?

I'm trying to convert this ruby regex to go.
GROUP_CALL = /^(?<i1>[ \t]*)group\(?[ \t]*(?<grps>#{SYMBOLS})[ \t]*\)?[ \t]+do[ \t]*?\n(?<blk>.*?)\n^\k<i1>end[ \t]*$/m
I converted it to
groupCall := regexp.MustCompile("^(?P<i1>[ \\t]*)group\\(?[ \\t]*(?P<grps>(?::\\w+|:\"[^\"#]+\"|:'[^']+')([ \\t]*,[ \\t]*(?::\\w+|:\"[^\"#]+\"|:'[^']+'))*)[ \\t]*\\)?[ \\t]+do[ \\t]*?\\n(?P<blk>.*?)\\n^\\k<i1>end[ \\t]*$/s")
but when run I get this error
error parsing regexp: invalid escape sequence: \k
There's no mention of \k in the Go docs. Is there no equivalent in Go?
Lookbehinds aren't supported, and neither are backreferences, as @stribizhev mentioned.
Regular Expression 2 (RE2) Syntax Reference:
https://github.com/google/re2/wiki/Syntax
The syntax of the regular expressions accepted is the same general
syntax used by Perl, Python, and other languages. More precisely, it
is the syntax accepted by RE2 and described at
//code.google.com/p/re2/wiki/Syntax, except for \C. --GoLang Docs
(ref: https://golang.org/pkg/regexp/)

Regular expression escape warning

For code
url.gsub(/"|\[|]| /, '')
Ruby raises this warning:
warning: regular expression has ']' without escape: /"|\[|]| /
How do I fix it?
Escape the closing bracket. Since every alternative is a single character, the regex can also be reduced to one character class:
url.gsub(/[ "\[\]]/, '')
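As a small illustration of the character-class form, here is a sketch wrapped in a hypothetical helper (strip_url_noise and the sample URL are made up for the example):

```ruby
# Hypothetical helper: removes double quotes, square brackets, and spaces.
# Both brackets are escaped inside the character class, so Ruby has no
# unescaped ']' to warn about.
def strip_url_noise(url)
  url.gsub(/[ "\[\]]/, '')
end

puts strip_url_noise('["http://example.com/a b"]')
```

Keeping the original alternation and only escaping the bracket, as in /"|\[|\]| /, should silence the warning as well.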

Using preg_match?

I use preg_match in my program like this:
if (preg_match('/^[a-z0-9]+\:/{1,2}', $filename))
But it shows an error like this
Warning: preg_match() [function.preg-match]: Unknown modifier '{'
How do I fix this?
You are missing the closing '/' delimiter at the end of the regex, and you should escape the '/' inside the regex itself. With both changes the warning should go away (setting aside whether the regex does what you want):
if (preg_match('/^[a-z0-9]+\:\/{1,2}/', $filename))

Why is the IRB prompt changing to an asterisk when I try to match this regex?

I am a newbie in Ruby, I'm using version 1.9.3. I have the following regular expression:
/\\\//
As far as I know, it should match a string which has the characters '\' and '/', one following the other, right?
I am using the following code in order to get true in case the regex matches the string or symbol in the far right:
!(regex !~ :"string or symbol to match")
Because using =~ gives me the index of the match and I simply want a boolean. Besides, I'm trying to see how ugly or hackish can Ruby look compared to C :P
When I try to match the symbol :\/ the IRB prompt changes to an asterisk, and returns nothing. Why?
When I try to match the string "\/" my little ugly snippet returns false. Why?
The symbol :\/ is not a valid symbol literal. Use :'\/' if you want a symbol version of the string '\/'. Matching "\/" returns false because, inside double quotes, \/ is just an escaped slash, so the string is actually '/'. You want either '\/' or "\\/".
Finally, it's better code and more conventional to write your test like so:
!!(regex =~ :'\/')
!!(regex =~ '\/')
!!(regex =~ "\\/")
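The quoting difference can be checked directly. This is a minimal sketch; matches? is a hypothetical helper name, not part of Ruby:

```ruby
# The regex from the question: a backslash followed by a forward slash.
REGEX = /\\\//

single_quoted  = '\/'   # two characters: backslash, slash
double_quoted  = "\/"   # one character: the backslash is dropped
double_escaped = "\\/"  # two characters: backslash, slash

# Hypothetical helper that turns the match index (or nil) into a boolean.
def matches?(regex, value)
  !!(regex =~ value)
end

puts single_quoted.length                  # 2
puts double_quoted.length                  # 1
puts matches?(REGEX, single_quoted)        # true
puts matches?(REGEX, double_quoted)        # false
puts matches?(REGEX, double_escaped)       # true
puts matches?(REGEX, :'\/')                # true
```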

How do you use unicode characters within a regular expression in Ruby?

I am attempting to write a line of code that will take a line of japanese text and delete a certain set of characters. However I am having trouble with using unicode characters inside of the regular expression.
I am currently using text.gsub(/《.*?》/u, '') but I get the error
'gsub': invalid byte sequence in Windows-31J (Argument error)
Can anyone tell me what I am doing incorrectly?
Example text : その仕草《しぐさ》があまりに無造作《むぞうさ》だったので
Expected result: その仕草があまりに無造作だったので
Thanks
edit: # encoding: utf-8 is present at the top of the script.
Try this:
text.encode('utf-8', 'utf-8').gsub(/《.*?》/u, '')
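The error message suggests the string is tagged as Windows-31J even though its bytes are UTF-8. As a sketch under that assumption, the following swaps in force_encoding (which re-tags the bytes without transcoding them) in place of the encode call above; strip_annotations is a hypothetical helper name:

```ruby
# encoding: utf-8

# Hypothetical helper: re-tags a copy of the string as UTF-8 (the bytes are
# not changed), then removes every 《...》 annotation.
# Assumes the bytes really are valid UTF-8.
def strip_annotations(text)
  text.dup.force_encoding(Encoding::UTF_8).gsub(/《.*?》/, '')
end

sample = "その仕草《しぐさ》があまりに無造作《むぞうさ》だったので"
puts strip_annotations(sample)
```

If the text comes from a file, opening it with an explicit encoding (for example File.read(path, encoding: 'utf-8')) may avoid the mis-tagged string in the first place.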

Resources