Regular expression problem with ruby

I have a regular expression to match filenames which look like this:
name - subname goes here v4 03.txt
name - subname long 03.txt
name - subname v4 #03.txt
I want to extract the name and subname without any additional data. I can extract the data just fine; the part that gives me trouble is v4. It's a version marker (a v followed by a digit), and it isn't present in every filename. I want to exclude it, but my regex captures it along with the subname...
My regex looks like this:
^([\w \.]+)(?:-)?([\w \.-]+)? #?\d+
I tried something like this, but it only works if I drop the ? after "(?:v\d+ )", and then it can't match filenames that have no version:
^([\w \.]+)(?:-)?([\w \.-]+)? (?:v\d+ )?#?\d+
How do I make it work?

Try this:
/^([\w \.]+?) - ([\w \.-]+?)(?: v\d+)? #?\d+/
I think you need to understand the difference between (\w+?), a lazy group, and (\w+)?, an optional group.
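A quick check of the regex above against the three filenames from the question, plus a one-line illustration of the (\w+?) versus (\w+)? difference. This is a sketch; the variable names are only illustrative.

```ruby
# Filenames from the question.
filenames = [
  "name - subname goes here v4 03.txt",
  "name - subname long 03.txt",
  "name - subname v4 #03.txt"
]

# The lazy +? quantifiers stop each group as early as possible, so the
# optional " v4" is left for (?: v\d+)? instead of ending up in the subname.
pattern = /^([\w \.]+?) - ([\w \.-]+?)(?: v\d+)? #?\d+/

results = filenames.map do |f|
  m = pattern.match(f)
  m && [m[1], m[2]]
end
p results
# => [["name", "subname goes here"], ["name", "subname long"], ["name", "subname"]]

# (\w+?) is lazy and must match at least one character;
# (\w+)? is greedy but the whole group may be skipped.
p "abc"[/(\w+?)/, 1]  # => "a"
p "abc"[/(\w+)?/, 1]  # => "abc"
```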

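I would do this in two stages. First, remove the parts that you don't want (the optional version marker, the optional # and the trailing number with its extension). In /x mode an unescaped # starts a comment, so it is written \#:
a = str.sub(/\s* (?:v\d+)? \s* \#? \d+ \.[^.]*? $/x, '')
Then split the string on ' - ':
a.split(/\s*-\s*/)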
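The two-stage approach can be sketched as a hypothetical helper, split_name (the name is only for illustration). It anchors with \z instead of $ and includes the optional \# so that the "#03" filename is handled too.

```ruby
# Stage 1 pattern: optional version marker, optional '#', trailing number
# and extension at the very end of the string. In /x mode whitespace in
# the pattern is ignored and '#' would start a comment, hence \#.
TAIL = /\s* (?:v\d+)? \s* \#? \d+ \. [^.]* \z/x

# Hypothetical helper: returns [name, subname].
def split_name(filename)
  base = filename.sub(TAIL, '')
  # Stage 2: split "name - subname" on the dash separator.
  base.split(/\s*-\s*/)
end

p split_name("name - subname goes here v4 03.txt")  # => ["name", "subname goes here"]
p split_name("name - subname long 03.txt")          # => ["name", "subname long"]
p split_name("name - subname v4 #03.txt")           # => ["name", "subname"]
```

Note that the second stage also splits on any hyphen inside a subname; if that can occur, splitting on /\s+-\s+/ is stricter.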

Related

Reformatting dates

I'm trying to reformat German dates (e.g. 13.03.2011 to 2011-03-13).
This is my code:
str = "13.03.2011\n14:30\n\nHannover Scorpions\n\nDEG Metro Stars\n60\n2 - 3\n\n\n\n13.03.2011\n14:30\n\nThomas Sabo Ice Tigers\n\nKrefeld Pinguine\n60\n2 - 3\n\n\n\n"
str = str.gsub("/(\d{2}).(\d{2}).(\d{4})/", "/$3-$2-$1/")
I get the same output as the input. I also tried my code with and without the leading and trailing slashes, but I don't see a difference. Any hints?
I tried to store my regexes in variables like find = /(\d{2}).(\d{2}).(\d{4})/ and replace = /$3-$2-$1/, so my code looked like this:
str = "13.03.2011\n14:30\n\nHannover Scorpions\n\nDEG Metro Stars\n60\n2 - 3\n\n\n\n13.03.2011\n14:30\n\nThomas Sabo Ice Tigers\n\nKrefeld Pinguine\n60\n2 - 3\n\n\n\n"
find = /(\d{2}).(\d{2}).(\d{4})/
replace = /$3-$2-$1/
str = str.gsub(find, replace)
TypeError: no implicit conversion of Regexp into String
from (irb):4:in `gsub'
Any suggestions for this problem?

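The first mistake is the regex delimiter. You should not pass the regex as a string: gsub treats a String pattern as literal text, so it searches for that exact text and never finds it. Just put the pattern inside regex delimiters like //.
The second mistake is referring to the captured groups as $1 in the replacement. In a replacement string, write them as \\1 (inside double quotes). The replacement must also be a String, not a Regexp, which is why your second attempt raises TypeError.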
str = str.gsub(/(\d{2})\.(\d{2})\.(\d{4})/, "\\3-\\2-\\1")
Also, notice that I have escaped the . character as \., because in a regex . means any character except \n.
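A minimal sketch, using a shortened copy of the question's input, showing both ways of referring to groups: \\1 in a replacement string, or $1 inside a block, where the match globals are set for each match.

```ruby
# Shortened copy of the input from the question.
str = "13.03.2011\n14:30\n\nHannover Scorpions\n\nDEG Metro Stars\n60\n2 - 3\n"

# Dots escaped so they only match a literal period.
date = /(\d{2})\.(\d{2})\.(\d{4})/

# Replacement as a String: groups are \\1, \\2, \\3 inside double quotes.
via_string = str.gsub(date, "\\3-\\2-\\1")

# Replacement as a block: $1, $2, $3 refer to the current match here.
via_block = str.gsub(date) { "#{$3}-#{$2}-#{$1}" }

puts via_string.lines.first  # prints 2011-03-13
```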

Separate word Regex Ruby

I have a bunch of input files in a loop and I am extracting tags from them. However, I want to separate some of the words. The incoming strings are in the form cs###, where ### is any number made of the digits 0-9. I want the result to be cs ###. The closest answer I found was "Regex to separate Numeric from Alpha", but I can't get it to work, because there the string is predefined (static) and mine changes.
Found answer:
Never mind, I found the answer. The following separates alphabetic and numeric characters and removes any unwanted non-alphanumeric characters, so something like ab5#6$% => ab 56:
gsub(/(?<=[0-9])(?=[a-z])|(?<=[a-z])(?=[0-9])/i, ' ').gsub(/[^0-9a-z ]/i, '')
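Wrapped in a hypothetical helper (separate_alnum is only an illustrative name), the two substitutions behave like this on the question's cs### form and on the example above. The first gsub inserts a space at each zero-width letter/digit boundary; the second deletes everything that is not a letter, digit or space.

```ruby
# Hypothetical helper around the two substitutions from the answer.
def separate_alnum(str)
  str.gsub(/(?<=[0-9])(?=[a-z])|(?<=[a-z])(?=[0-9])/i, ' ')
     .gsub(/[^0-9a-z ]/i, '')
end

p separate_alnum("cs123")    # => "cs 123"
p separate_alnum('ab5#6$%')  # => "ab 56"
```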

If your string is something like
str = "cs3232
cs23
cs423"
Then you can do something like
str.scan(/((cs)(\d{1,10}))/m).collect{|e| e.shift; e }
# [["cs", "3232"], ["cs", "23"], ["cs", "423"]]
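If the goal is the "cs ###" form from the question, two capture groups make the shift unnecessary, and gsub can build the spaced string directly. A sketch, assuming the same three-line input:

```ruby
str = "cs3232\ncs23\ncs423"

# Two groups, so scan returns [prefix, digits] pairs directly.
pairs = str.scan(/(cs)(\d+)/)
p pairs   # => [["cs", "3232"], ["cs", "23"], ["cs", "423"]]

# Or produce "cs ###" in one pass; '\1 \2' reuses both groups.
spaced = str.gsub(/(cs)(\d+)/, '\1 \2')
p spaced  # => "cs 3232\ncs 23\ncs 423"
```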

Rails 3 + regex - Replace part of a string, 1 occurrence

I'm new to Rails, and even newer to regex. I've been looking around, but I'm stuck...
I have a string like this:
Current: http://zs.domain.com/user_images/123456789/imageName_size.ext
Wanted: http://zs.domain.com/user_images/123456789/imageName.ext
I've managed to get this:
http://a0.twimg.com/profile/1240267050/logo1.png
=> which removes every occurrence, using
picture.gsub!(/_([a-z0-9-]+)/, '')
or this:
http://a0.twimg.com/profile_images/1240267050/logo1
=> which changes only the last occurrence, but loses the extension, using
picture.gsub!(/_([a-z0-9-]+).(png|gif|jpg|jpeg)/, '')

You're almost there. The second parameter is the string with which the match will be replaced, and you can re-use matched groups from the match. This will do the trick:
picture.gsub!(/_([a-z0-9-]+).(png|gif|jpg|jpeg)/, '.\2')
To accommodate the additional conditions posed in the comment:
picture.gsub!(/_([^\/]+).(png|gif|jpg|jpeg)/, '.\2')

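markijbema's answer will change the string
.../xxx_yyygifzzz/...
into
.../xxx.gifzzz/...,
because the unescaped . matches any character and nothing ties the match to the file name at the end of the path.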
In order to avoid that, you can do this:
picture.gsub!(/_[^\/]+(?=\.[^\.]+\z)/, '')
(?=...) is a lookahead: it requires the pattern inside it to follow the match, but that text is not included in the match, so it is not removed.
\z matches the end of the string, so only the final file extension is considered, and this regexp is safe to use even when an intermediate directory contains a string like the one above.
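A small sketch of the lookahead approach wrapped in a hypothetical helper, strip_size_suffix. It uses sub rather than gsub!, because the \z anchor allows at most one match, and it returns a new string instead of modifying in place. The second URL, with an underscore in a directory name, is a made-up example.

```ruby
# "_" plus non-slash characters, but only if followed by ".ext" at the
# very end of the string; the lookahead part itself is not replaced.
SUFFIX = /_[^\/]+(?=\.[^.]+\z)/

# Hypothetical helper.
def strip_size_suffix(url)
  url.sub(SUFFIX, '')
end

p strip_size_suffix("http://zs.domain.com/user_images/123456789/imageName_size.ext")
# => "http://zs.domain.com/user_images/123456789/imageName.ext"
p strip_size_suffix("http://a0.twimg.com/xxx_yyygifzzz/logo1_normal.png")
# => "http://a0.twimg.com/xxx_yyygifzzz/logo1.png"
```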

Ruby MatchData class is repeating captures, instead of including additional captures as it "should"

Ruby 1.9.1, OSX 10.5.8
I'm trying to write a simple app that parses a bunch of Java-based HTML template files and replaces a period (.) with an underscore if it's contained within a specific tag. I use Ruby all the time for these kinds of utility apps, and thought it would be no problem to whip up something using Ruby's regex support. So I create a Regexp.new... object, open a file, read it in line by line, and match each line against the pattern. If I get a match, I create a new string using replaceString = currentMatch.gsub(/\./, '_'), then create a regex for the whole matched string with newReplaceRegex = Regexp.escape(currentMatch), and finally replace it back into the current line with line.gsub(newReplaceRegex, replaceString). Code below, of course, but first...
The problem I'm having is that when accessing the indexes within the returned MatchData object, I get the first result twice, and the second substring it should otherwise be finding is missing. Stranger still, when I test the same pattern and the same test text on rubular.com, it works as expected.
My pattern:
(<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+.)+(?:[a-zA-Z0-9]+)(?:>))
Test text:
<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>moreNonMatchingText<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText
Here's the relevant code:
tagRegex = Regexp.new('(<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>))+')
testFile = File.open('RegexTestingCompFix.txt', "r+")
lineCount=0
testFile.each{|htmlLine|
  lineCount += 1
  puts ("Current line: #{htmlLine} at line num: #{lineCount}")
  tagMatch = tagRegex.match(htmlLine)
  if(tagMatch)
    matchesArray = tagMatch.to_a
    firstMatch = matchesArray[0]
    secondMatch = matchesArray[1]
    puts "First match: #{firstMatch} and second match #{secondMatch}"
    tagMatch.captures.each {|lineMatchCapture|
      puts "Current capture for tagMatches: #{lineMatchCapture} of total match count #{matchesArray.size}"
      #create a new regex using the match results; make sure to use auto escape method
      originalPatternString = Regexp.escape(lineMatchCapture)
      replacementRegex = Regexp.new(originalPatternString)
      #replace any periods with underscores in a copy of lineMatchCapture
      periodToUnderscoreCorrection = lineMatchCapture.gsub(/\./, '_')
      #replace original match with underscore replaced copy within line
      htmlLine.gsub!(replacementRegex, periodToUnderscoreCorrection)
      puts "The modified htmlLine is now: #{htmlLine}"
    }
  end
}
I would think that I should get the first tag in matchData[0] and the second tag in matchData[1], or, since I don't know how many matches I'll get within any given line, iterate over them with matchData.to_a.each. In this case, matchData has two captures, but they're both the first tag match,
which is: <WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>
So, what the heck am I doing wrong, why does rubular test give me the expected results?

You want to use String#scan instead of Regexp#match:
tag_regex = /<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>)/
lines = "<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>moreNonMatchingText\
<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText"
lines.scan(tag_regex)
# => ["<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>", "<WEBOBJECT NAME=admin.SecondLineMatch>"]
A few recommendations for your next Ruby questions:
newlines and spaces are your friends; you don't lose points for using more lines in your code ;-)
use do...end for multi-line blocks instead of {}; it improves readability a lot
declare variables in snake case (hello_world) instead of camel case (helloWorld)
Hope this helps

I ended up using the String#scan approach. The only tricky point was figuring out that it returns an array of arrays (because my pattern has a capture group), not a MatchData object, so there was some initial confusion on my part, mostly due to my Ruby green-ness, but it's working as expected now. I also trimmed the regex per Trevoke's suggestion. But snake case? Never... ;-) Anyway, here goes:
tagRegex = /(<(?:webobject) (?:name)=(?:\w+\.)+(?:\w+)(?:>))/i
testFile = File.open('RegexTestingCompFix.txt', "r+")
lineCount=0
testFile.each do |htmlLine|
  lineCount += 1
  puts ("Current line: #{htmlLine} at line num: #{lineCount}")
  oldMatches = htmlLine.scan(tagRegex) #oldMatches thusly named due to not explicitly using Regexp or MatchData, as in "the old way..."
  if(oldMatches.size > 0)
    oldMatches.each_index do |index|
      arrayMatch = oldMatches[index]
      aMatch = arrayMatch[0]
      #create a new regex using the match results; make sure to use auto escape method
      replacementRegex = Regexp.new(Regexp.escape(aMatch))
      #replace any periods with underscores in a copy of aMatch
      periodToUnderscoreCorrection = aMatch.gsub(/\./, '_')
      #replace original match with underscore replaced copy within line, matching against the new escaped literal regex
      htmlLine.gsub!(replacementRegex, periodToUnderscoreCorrection)
      puts "The modified htmlLine is now: #{htmlLine}"
    end # I kind of still prefer the brackets...;-)
  end
end
Now, why does MatchData work the way it does? Its behavior really seems like a bug, and it's certainly not very useful in general if you can't get it to provide a simple way to access all the matches. Just my $.02
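For the record, this is documented behavior rather than a bug: a MatchData object describes a single match. Index 0 is the entire matched text and indexes 1 and up are the capture groups, so with one group wrapping the whole pattern, [0] and [1] hold the same string. Rubular most likely looked different because it lists every match in the test string. A sketch using the question's test text; the gsub-with-block variant at the end is an alternative to building an escaped Regexp per match, not the approach used above.

```ruby
# The question's test text, as one line.
line = "<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>" \
       "moreNonMatchingText<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText"

# One capture group around the whole tag, as in the trimmed regex.
tag_regex = /(<webobject name=(?:\w+\.)+\w+>)/i

# Regexp#match returns ONE MatchData, for the FIRST match only.
m = tag_regex.match(line)
puts m[0] == m[1]     # prints true: whole match and group 1 are the same text
puts m.captures.size  # prints 1

# A MatchData for every match: call scan with a block, read Regexp.last_match.
all = []
line.scan(tag_regex) { all << Regexp.last_match }
tags = all.map { |md| md[0] }

# Dots to underscores inside each tag, in one pass.
fixed = line.gsub(tag_regex) { |tag| tag.tr('.', '_') }
puts fixed
```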

Small bits:
This regexp gets you "normalMode" but not "SecondLineMatch":
<webobject name=\w+\.((?:\w+)).+> (with option 'i', for "case insensitive")
This regexp gets you "SecondLineMatch" but not "normalMode":
<webobject name=\w+\.((?:\w+))> (with option 'i', for "case insensitive").
I'm not really good at regexps, but I'll keep toiling at it. :)
I don't know if this helps you at all, but here's a way to get both:
<webobject name=admin.(\w+) (with option 'i').
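Applied with String#scan, that last pattern returns both names at once. A sketch using the question's test text; the dot after admin is escaped here so it only matches a literal period.

```ruby
text = "<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>" \
       "moreNonMatchingText<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText"

# scan returns one [group1] array per match; flatten gives a flat list.
names = text.scan(/<webobject name=admin\.(\w+)/i).flatten
p names  # => ["normalMode", "SecondLineMatch"]
```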

Ruby Regex match unless escaped with \

Using Ruby I'm trying to split the following text with a Regex
~foo\~\=bar =cheese~monkey
Where ~ or = denotes the beginning of a match unless it is escaped with \
So it should match
~foo\~\=bar
then
=cheese
then
~monkey
I thought the following would work, but it doesn't.
([~=]([^~=]|\\=|\\~)+)(.*)
What is a better regex expression to use?
Edit: To be more specific, the above regex matches all occurrences of = and ~.
Edit: Working solution. Here is what I came up with to solve the issue. I found that Ruby 1.8 has lookahead but no lookbehind. So after looking around a bit, I came across this post in comp.lang.ruby and completed it with the following:
# Iterates through the answer clauses
def split_apart clauses
  reg = Regexp.new('.*?(?:[~=])(?!\\\\)', Regexp::MULTILINE)
  # need to use reverse since Ruby 1.8 has look ahead, but not look behind
  matches = clauses.reverse.scan(reg).reverse.map {|clause| clause.strip.reverse}
  matches.each do |match|
    yield match
  end
end

What does "remove the head" mean in this context?
If you want to remove everything before a certain char, this will do:
.*?(?<!\\)= // anything up to the first "=" that is not preceded by "\"
.*?(?<!\\)~ // same, but for the squiggly "~"
.*?(?<!\\)(?=~) // same, but excluding the separator itself (if you need that)
Replace by "", repeat, done.
If your string has exactly three elements ("1=2~3") and you want to match all of them at once, you can use:
^(.*?(?<!\\)(?:=))(.*?(?<!\\)(?:~))(.*)$
matches \~foo\~\=bar =cheese~monkey with group 1 = "\~foo\~\=bar =", group 2 = "cheese~", group 3 = "monkey"
Alternatively, you split the string using this regex:
(?<!\\)[=~]
returns: ['\~foo\~\=bar ', 'cheese', 'monkey'] for "\~foo\~\=bar =cheese~monkey"
returns: ['', 'foo\~\=bar ', 'cheese', 'monkey'] for "~foo\~\=bar =cheese~monkey"
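Ruby 1.9 and later ship a regex engine with lookbehind, so the split pattern above can be used directly in Ruby. A sketch using the question's string; the scan variant is an alternative that is not from the answers above and keeps each delimiter with its clause.

```ruby
input = '~foo\~\=bar =cheese~monkey'  # single quotes keep the backslashes

# Split on any ~ or = that is not preceded by a backslash.
parts = input.split(/(?<!\\)[=~]/)
p parts   # => ["", "foo\\~\\=bar ", "cheese", "monkey"]

# Alternative: a clause is ~ or = followed by any run of escaped
# characters (\x) or characters other than \, ~ and =.
tokens = input.scan(/[~=](?:\\.|[^\\~=])*/).map(&:strip)
p tokens  # => ["~foo\\~\\=bar", "=cheese", "~monkey"]
```

One difference between the two: the scan version treats \\ as an escaped backslash, so in a\\~b the ~ still starts a new clause, while the lookbehind version would treat that ~ as escaped.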

Develop Reference
