Partial string replace with gsub

I have an array of different image URLs in which I need to replace "/s_" with "/xl_". I've tried a number of different ways, but none of them seems to work as I expect. Here is my latest version:
available_images.each do |img|
img.gsub(/.*(\/s_).*\.jpg/, "\/xl_")
end
available_images is the array holding a number of strings (which of course match the provided regex: .*(/s_).*.jpg ).
Any thoughts on how that can be fixed?
Thanks in advance!

A gsub! (with the ! because you use each rather than map) with a simple string (instead of a regex) should work:
"path/to/s_image.jpg".gsub '/s_', '/xl_'
# => "path/to/xl_image.jpg"
Update
As pointed out in the comments, the solution might result in unexpected behavior if the path contains multiple occurrences of '/s_'.
"path/s_thing/s_image.jpg".gsub '/s_', '/xl_'
#=> "path/xl_thing/xl_image.jpg"
#         ▲        ▲
Borodin posted a nice, short regex substitution, which works in that case:
"path/s_thing/s_image.jpg".sub %r|/s_(?!.*/)|, '/xl_'
#=> "path/s_thing/xl_image.jpg"
#         △       ▲
It only replaces the last occurrence of '/s_'.
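To apply the substitution to the whole array from the question, a minimal sketch might look like this. The sample URLs are made up for illustration. The point is that sub and gsub return new strings, which each throws away, so you either collect them with map or use the bang variant to change each string in place.

```ruby
# Hypothetical sample data standing in for the asker's available_images.
available_images = [
  "path/to/s_image.jpg",
  "path/s_thing/s_photo.jpg"
]

# Non-destructive: map collects the strings that sub returns.
xl_images = available_images.map do |img|
  img.sub(%r{/s_(?!.*/)}, "/xl_") # only the last "/s_" in each path
end

# Destructive: sub! changes each string in place, so each is enough.
in_place = available_images.map(&:dup)
in_place.each { |img| img.sub!(%r{/s_(?!.*/)}, "/xl_") }

p xl_images
p in_place
```

The copy made with dup is only there so the original array stays untouched for comparison.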

Related

Extracting numbers with regex in ruby from a numbers divided by a dot (thousand delimiter)

Trying to extract '4995' from the string '4.995,-' with regex in Ruby.
I tried with
/\d+/
Which seems to work from this Rubular screenshot: http://cl.ly/image/111c2x0N3s0C
but running it only outputs
4

You cannot match it in a single regex because it is not a single substring.
"4.995,-".gsub(/\D/, "") # => "4995"

I'm up-voting sawa's answer because it's a good answer.
But since you are new to regular expressions, you may want further explanation as to why his answer works for you.
When you are trying to match with the regexp /\d+/, what you are saying is "Match for me 1 or more consecutive digits." But your target string, 4.995,-, is not made up of only consecutive digits. It has a 4 and it has a 995. The first match of "1 or more consecutive digits" is 4. That's why what you're getting as a result is 4.
Try to look at your problem differently. Instead of saying, "Find me all the digits and extract those out," you could say, "Find me anything that's not a digit, and get rid of it." To do this, you can use Ruby's search-and-replace function, gsub. gsub searches a target string for anything that matches a given regular expression, and then it replaces those matches with some replacement string that you also provide. Documentation on gsub can be found in the Ruby docs for String#gsub.
The regular expression for "non-digit" is /\D/. So, you can do a gsub that looks for any /\D/ and replaces it with a blank string.
'4.995,-'.gsub(/\D/,'')
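The difference between the three approaches discussed here can be seen side by side in a short sketch; the variable names are only illustrative.

```ruby
price = "4.995,-"

first_run   = price[/\d+/]          # first run of consecutive digits
all_runs    = price.scan(/\d+/)     # every run of consecutive digits
digits_only = price.gsub(/\D/, "")  # drop everything that is not a digit

p first_run    # "4"
p all_runs     # ["4", "995"]
p digits_only  # "4995"
```

Joining the result of scan (all_runs.join) gives the same string as the gsub version.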

Do as below using String#[] and String#tr:
"4.995,-"[/\d+\.\d+/].tr('.','') # => "4995"
# more Rubyish way using #tr method only
"4.995,-".tr("^0-9",'') # => "4995"

p '4.995,-1'.delete('.')[/\d+/] #=> "4995"

Here's another way that, like @Arup's solution, works when a digit follows the first non-digit:
'4.995,-1'.sub('.','').to_i.to_s #=> "4995"
This works because
'4.995,-1'.sub('.','') #=> "4995,-1"
and to_i takes the leading part of a string that can be converted to an Integer.
Alternatively:
'4.995,-1'.to_f.to_s.sub('.','') #=> "4995"
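Both variants rest on assumptions about the input that are worth checking. The sketch below uses two made-up prices ("1.234.567,-" and "4.990,-") to show where they differ: sub removes only the first dot, and to_f reads the dot as a decimal point, so trailing zeros disappear.

```ruby
# sub removes only the first dot; delete removes all of them.
one_dot  = "4.995,-1".sub(".", "").to_i.to_s
two_dots = "1.234.567,-".sub(".", "").to_i.to_s
all_dots = "1.234.567,-".delete(".").to_i.to_s

# to_f treats the dot as a decimal point, so trailing zeros are lost.
via_float = "4.990,-".to_f.to_s.sub(".", "")

p one_dot    # "4995"
p two_dots   # "1234"
p all_dots   # "1234567"
p via_float  # "499"
```

If the numbers can contain more than one thousands separator or end in zeros, delete followed by to_i is probably the safer choice.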

ruby remove variable length string from regular expression leaving hyphen

I have a string such as this: "im# -33.870816,151.203654"
I want to extract the two numbers including the hyphen.
I tried this:
mystring = "im# -33.870816,151.203654"
/\D*(\-*\d+\.\d+),(\-*\d+\.\d+)/.match(mystring)
This gives me:
33.870816,151.203654
How do I get the hyphen?
I need to do this in ruby
Edit: I should clarify, the "im# " was just an example; there can be any set of characters before the numbers. The numbers are mostly well formed with the comma. I was having trouble with the hyphen (-).
Edit2: Note that the two numbers are latitude and longitude. That pattern is mostly fixed. However, in theory, the preceding string can be arbitrary. I don't expect it to contain numbers or a hyphen, but you never know.

How about this?
arr = "im# -33.2222,151.200".split(/[, ]/)[1..-1]
and arr is ["-33.2222", "151.200"], (using the split method).
now
arr[0].to_f is -33.2222 and arr[1].to_f is 151.2
EDIT: stripped "im#" part with [1..-1] as suggested in comments.
EDIT2: also, this works regardless of what the first characters are, as long as they contain no space or comma.

If you want to capture the two numbers with the hyphen you can use this regex:
> str = "im# -33.870816,151.203654"
> str.match(/([\d.,-]+)/).captures
=> ["-33.870816,151.203654"]
Edit: now it captures hyphen.
This one captures each number separately: http://rubular.com/r/NNP2OTEdiL
Note: String#scan matches all occurrences of the given pattern, in this case:
> str.scan /([-\d.]+)/
=> [["-33.870816"], ["151.203654"]] # Good, but flattened version is better
> str.scan(/([-\d.]+)/).flatten
=> ["-33.870816", "151.203654"]
I recommend playing around a little with Rubular. There are also some docs about regular expressions in Ruby:
http://www.ruby-doc.org/docs/ProgrammingRuby/html/language.html#UJ
http://www.regular-expressions.info/ruby.html
http://www.ruby-doc.org/core-1.9.3/Regexp.html

Your regex doesn't work because the hyphen is caught by \D, so you have to modify it to catch only the right set of characters.
[^0-9-]* would be a good option.
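Applied to the regex from the question, that change could look like the sketch below. Using -? instead of \-* for the optional sign and converting the captures with to_f are my own additions, not part of the original attempt.

```ruby
mystring = "im# -33.870816,151.203654"

# [^0-9-]* skips the prefix without swallowing the minus sign.
coords = mystring.match(/[^0-9-]*(-?\d+\.\d+),(-?\d+\.\d+)/)

lat_text, lng_text = coords.captures
lat = lat_text.to_f
lng = lng_text.to_f

p coords.captures  # ["-33.870816", "151.203654"]
```

With the original \D* the prefix match is greedy and consumes "im# -", which is why the first capture started at the digit.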

Match comma separated list with Ruby Regex

Given the following string, I'd like to match the elements of the list and parts of the rest after the colon:
foo,bar,baz:something
I.e. I am expecting the first three match groups to be "foo", "bar", "baz". No commas and no colon. The minimum number of elements is 1, and there can be arbitrarily many. Assume no whitespace and lower case.
I've tried this, which should work, but doesn't populate all the match groups for some reason:
^([a-z]+)(?:,([a-z]+))*:(something)
That matches foo in \1 and baz (or whatever the last element is) in \2. I don't understand why I don't get a match group for bar.
Any ideas?
EDIT: Ruby 1.9.3, if that matters.
EDIT2: Rubular link: http://rubular.com/r/pDhByoarbA
EDIT3: Add colon to the end, because I am not just trying to match the list. Sorry, oversimplified the problem.

This expression works for me: /(\w+)/i

If you want to do it with regex, how about this?
(?<=^|,)("[^"]*"|[^,]*)(?=,|$)
This matches comma-separated fields, including the possibility of commas appearing inside quoted strings like 123,"Yes, No". Regexr for this.
More verbosely:
(?<=^|,) # Must be preceded by start-of-line or comma
(
"[^"]*"| # A quote, followed by a bunch of non-quotes, followed by quote, OR
[^,]* # OR anything until the next comma
)
(?=,|$) # Must end with comma or end-of-line
Usage would be with something like Python's re.findall(), which returns all non-overlapping matches in the string (working from left to right, if that matters.) Don't use it with your equivalent of re.search() or re.match() which only return the first match found.
(NOTE: This actually doesn't work in Python because the lookbehind (?<=^|,) isn't fixed width. Grr. Open to suggestions on this one.)
Edit: Use a non-capturing group to consume start-of-line or comma, instead of a lookbehind, and it works in Python.
>>> test_str = '123,456,"String","String, with, commas","Zero-width fields next",,"",nyet,123'
>>> m = re.findall('(?:^|,)("[^"]*"|[^,]*)(?=,|$)',test_str)
>>> m
['123', '456', '"String"', '"String, with, commas"',
'"Zero-width fields next"', '', '""', 'nyet', '123']
Edit 2: The Ruby equivalent of Python's re.findall(needle, haystack) is haystack.scan(needle).
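As a sketch of that Ruby equivalent, here is the same test string run through scan. Because the pattern contains a capturing group, scan returns one single-element array per match, so flatten is added to get a plain list.

```ruby
test_str = '123,456,"String","String, with, commas","Zero-width fields next",,"",nyet,123'

# Same pattern as the Python version; the group captures one field per match.
fields = test_str.scan(/(?:^|,)("[^"]*"|[^,]*)(?=,|$)/).flatten

p fields
```

Note that ^ and $ match at line boundaries in Ruby, so this assumes the input is a single line.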

Maybe split would be a better solution for this case?
'foo,bar,baz'.split(',')
=> ["foo", "bar", "baz"]
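For the full input from the question, including the part after the colon, the split idea might be extended as in this sketch. It also shows what the original regex does: a repeated capturing group keeps only its last repetition, which is why bar never shows up as a match group.

```ruby
line = "foo,bar,baz:something"

# Split once at the colon, then split the list part at the commas.
list, rest = line.split(":", 2)
items = list.split(",")

# The regex from the question: group 2 is overwritten on every repetition.
groups = line.match(/^([a-z]+)(?:,([a-z]+))*:(something)/).captures

p items   # ["foo", "bar", "baz"]
p rest    # "something"
p groups  # ["foo", "baz", "something"]
```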

If I am interpreting your post correctly, you want everything separated by commas before the colon (:).
The appropriate regex for this would be:
[^\s:]*(,[^\s:]*)*(:.*)?
This should find everything you are looking for.

Regular expression to strip everything but words

I'm helpless on regular expressions so please help me on this problem.
Basically I am downloading web pages and RSS feeds and want to strip everything except plain words. No periods, commas, ifs, ands, or buts. I literally have a list of the most common words used in English and I want to strip those too, but I think I know how to do that and don't need a regular expression, because it would be way too long.
How do I strip everything from a chunk of text except words that are delimited by spaces? Everything else goes in the trash.
This works quite well thanks to Pavel .split(/[^[:alpha:]]/).uniq!

I think what fits you best is splitting the string into words. In this case, String#split is the better option. It accepts a regexp that matches the separators, and the source string is split into array elements wherever that regexp matches.
In your case, it should be "some non-alphabetic characters". Alphabetic character class is denoted by [:alpha:]. So, here's the example of what you need:
irb(main):001:0> "asd, < er >w , we., wZr,fq.".split(/[^[:alpha:]]+/)
=> ["asd", "er", "w", "we", "wZr", "fq"]
You may further filter the result by intersecting the resultant array with array that contains only English words:
irb(main):001:0> ["asd", "er", "w", "we", "wZr", "fq"] & ["we","you","me"]
=> ["we"]
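Putting the pieces together, a minimal sketch of the whole cleanup could look like this. The sample text and the stop-word list are made up. It uses scan rather than split, which avoids an empty first element when the text starts with punctuation, and uniq rather than uniq!, because uniq! returns nil when there is nothing to remove.

```ruby
# Hypothetical input and stop-word list, for illustration only.
text = "The cat, the hat & the RSS feed."
stop_words = %w[the and a]

words = text.downcase.scan(/[[:alpha:]]+/) # runs of letters only
unique_words = words.uniq - stop_words     # drop duplicates, then common words

p unique_words  # ["cat", "hat", "rss", "feed"]
```

Array subtraction removes the unwanted words, whereas the & intersection shown above keeps only the listed ones.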

Try \b\w+\b to match whole words.

Very odd issue with Ruby and regex

I am getting completely different results from string.scan and several regex testers...
I am just trying to grab the domain from the string, it is the last word.
The regex in question:
/([a-zA-Z0-9\-]*\.)*\w{1,4}$/
The string (1 single line, verified in Ruby's runtime btw)
str = 'Show more results from software.informer.com'
It works fine in the regex testers, but in Ruby...
irb(main):050:0> str.scan /([a-zA-Z0-9\-]*\.)*\w{1,4}$/
=> [["informer."]]
I would think that I would get a match on software.informer.com ,which is my goal.

Your regex is correct, the result has to do with the way String#scan behaves. From the official documentation:
"If the pattern contains groups, each individual result is itself an array containing one entry per group."
Basically, if you put parentheses around the whole regex, the first element of each array in your results will be what you expect.

It does not look as if you expect more than one result (especially as the regex is anchored). In that case there is no reason to use scan.
'Show more results from software.informer.com'[ /([a-zA-Z0-9\-]*\.)*\w{1,4}$/ ]
#=> "software.informer.com"
If you do need to use scan (in which case you obviously need to remove the anchor), you can use (?:) to create non-capturing groups.
'foo.bar.baz lala software.informer.com'.scan( /(?:[a-zA-Z0-9\-]*\.)*\w{1,4}/ )
#=> ["foo.bar.baz", "lala", "software.informer.com"]

You are getting a match on software.informer.com. Check the value of $&. The return of scan is an array of the captured groups. Add capturing parentheses around the suffix, and you'll get the .com as part of the return value from scan as well.
The regex testers and Ruby are not disagreeing about the fundamental issue (the regex itself). Rather, their interfaces are differing in what they are emphasizing. When you run scan in irb, the first thing you'll see is the return value from scan (an Array of the captured subpatterns), which is not the same thing as the matched text. Regex testers are most likely oriented toward displaying the matched text.
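The difference between the matched text and the captured groups can be checked directly. The sketch below uses the string and regex from the question; the non-capturing variant is the only change.

```ruby
str = "Show more results from software.informer.com"

capturing     = /([a-zA-Z0-9\-]*\.)*\w{1,4}$/
non_capturing = /(?:[a-zA-Z0-9\-]*\.)*\w{1,4}$/

groups_only = str.scan(capturing)      # one array per match, one entry per group
whole_match = str.match(capturing)[0]  # the full matched text, like $&
plain_scan  = str.scan(non_capturing)  # no groups, so scan returns the matches

p groups_only  # [["informer."]]
p whole_match  # "software.informer.com"
p plain_scan   # ["software.informer.com"]
```

The group holds only "informer." because a repeated group keeps its last repetition, while the overall match still covers the whole domain.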

How about doing this :
/([a-zA-Z0-9\-]*\.*\w{1,4})$/
This returns
informer.com
On your test string.
http://rubular.com/regexes/13670
