Regex to extract last number portion of varying URL - ruby

I'm creating a URL parser and have three kinds of URLs. From each, I would like to extract the number at the end of the URL, increment it by 10, and update the URL. I'm trying to use a regex for the extraction, but I'm new to regex and having trouble.
These are the three URL structures whose last number portion I'd like to increment:
Increment last number 20 by 10:
http://forums.scamadviser.com/site-feedback-issues-feature-requests/20/
Increment last number 50 by 10:
https://forums.questionablecontent.net/index.php/board,1.50.html
Increment last number 30 by 10:
https://forums.comodo.com/how-can-i-help-comodo-please-we-need-you-b39.30/

With the \d+(?!.*\d) regex, you will match the last chunk of digits in the string. Then use gsub with a block to modify the number and put the result back into the string.
See this Ruby demo:
strs = ['http://forums.scamadviser.com/site-feedback-issues-feature-requests/20/', 'https://forums.questionablecontent.net/index.php/board,1.50.html', 'https://forums.comodo.com/how-can-i-help-comodo-please-we-need-you-b39.30/']
arr = strs.map {|item| item.gsub(/\d+(?!.*\d)/) {$~[0].to_i+10}}
Note: $~ is a MatchData object, and using the [0] index we can access the whole match value.
Results:
http://forums.scamadviser.com/site-feedback-issues-feature-requests/30/
https://forums.questionablecontent.net/index.php/board,1.60.html
https://forums.comodo.com/how-can-i-help-comodo-please-we-need-you-b39.40/

Try this regex:
\d+(?=(\/)|(\.html))
It will extract the last number.
Demo: https://regex101.com/r/zqUQlF/1
Substitute back with this regex:
(.*?)(\d+)((\/)|(\.html))
Demo: https://regex101.com/r/zqUQlF/2

This regex matches only the last whole number in each URL by using a lookahead (which 'sees' a pattern but doesn't consume any characters):
\d+(?=\D*$)
online demo here.
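As a minimal sketch of how that lookahead pattern could be applied in Ruby (the method name increment_last_number and its step parameter are made up for illustration and are not part of the answer):

```ruby
# Hypothetical helper: bump the last run of digits in a URL by `step`.
def increment_last_number(url, step = 10)
  # \d+      one or more digits
  # (?=\D*$) followed only by non-digits up to the end of the string
  url.sub(/\d+(?=\D*$)/) { |digits| (digits.to_i + step).to_s }
end

urls = [
  'http://forums.scamadviser.com/site-feedback-issues-feature-requests/20/',
  'https://forums.questionablecontent.net/index.php/board,1.50.html',
  'https://forums.comodo.com/how-can-i-help-comodo-please-we-need-you-b39.30/'
]

puts urls.map { |url| increment_last_number(url) }
```

Because \D* may match nothing, this also works when the URL ends directly in a digit.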

Like this:
urls = ['http://forums.scamadviser.com/site-feedback-issues-feature-requests/20/', 'https://forums.questionablecontent.net/index.php/board,1.50.html', 'https://forums.comodo.com/how-can-i-help-comodo-please-we-need-you-b39.30/']
pattern = /(\d+)(?=[^\d]+$)/
urls.each do |url|
url.gsub!(pattern) {|m| m.to_i + 10}
end
puts urls
You can also test it online here: https://ideone.com/smBJCQ


How do I regex-match an unknown number of repeating elements?

I'm trying to write a Ruby script that replaces all rem values in a CSS file with their px equivalents. This would be an example CSS file:
body{font-size:1.6rem;margin:4rem 7rem;}
The MatchData I'd like to get would be:
# Match 1        Match 2
# 1. font-size   1. margin
# 2. 1.6         2. 4
#                3. 7
However I'm entirely clueless as to how to get multiple and different MatchData results. The RegEx that got me closest is this (you can also take a look at it at Rubular):
/([^}{;]+):\s*([0-9.]+?)rem(?=\s*;|\s*})/i
This will match single instances of value declarations (so it will properly return the desired Match 1 result), but entirely disregards multiples.
I also tried something along the lines of ([0-9.]+?rem\s*)+, but that didn't return the desired result either, and doesn't feel like I'm on the right track, as it won't return multiple result data sets.
EDIT: After the suggestions in the answers, I ended up solving the problem like this:
# search for any declarations that contain rem unit values and modify blockwise
output.gsub!(/([^ }{;]+):\s*([^}{;]*[0-9.]rem+[^;]*)(?=\s*;|\s*})/i) do |match|
  # search for any single rem value
  string = match.gsub(/([0-9.]+)rem/i) do |value|
    # convert the rem value to px by multiplying by 10 (this is not universal!)
    value = sprintf('%g', Regexp.last_match[1].to_f * 10).to_s + 'px'
  end
  string += ';' + match # append the original match result to the replacement
  match = string # overwrite the matched result
end
You can't capture a dynamic number of match groups (at least not in ruby).
Instead you could do either one of the following:
Capture the whole value and split on space
Use multilevel matching to capture first the whole key/value pair and secondly match the value. You can use blocks on the match method in ruby.
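A rough sketch of the first option, assuming the declarations are simple prop:value pairs with no nested braces or comments (the variable names are illustrative):

```ruby
css = 'body{font-size:1.6rem;margin:4rem 7rem;}'

# First level: grab each "property:value" pair between {, ; and }.
# Second level: split the captured value on whitespace.
declarations = css.scan(/([^{};:]+):([^{};]+)/).map do |prop, value|
  [prop.strip, value.split]
end

declarations.each { |prop, values| puts "#{prop}: #{values.inspect}" }
```

Each property ends up paired with an array of however many values it carries, so margin:4rem 7rem 9rem needs no extra capture groups.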
This regex will do the job for your example:
([^}{;]+):(?:([0-9\.]+?)rem\s?)?(?:([0-9\.]+?)rem\s?)
But with this you can't match something like: margin:4rem 7rem 9rem
This is what I've been able to do: DEMO
Regex: (?<={|;)([^:}]+)(?::)([^A-Za-z]+)
And this is what my result looks like:
# Match 1        Match 2
# 1. font-size   1. margin
# 2. 1.6         2. 4
As @koffeinfrei says, dynamic capture isn't possible in Ruby. It would be smarter to capture the whole string and remove the spaces.
str = 'body{font-size:1.6rem;margin:4rem 7rem;}'
str.scan(/(?<=[{; ]).+?(?=[;}])/)
.map { |e| e.match /(?<prop>.+):(?<value>.+)/ }
#⇒ [
# [0] #<MatchData "font-size:1.6rem" prop:"font-size" value:"1.6rem">,
# [1] #<MatchData "margin:4rem 7rem" prop:"margin" value:"4rem 7rem">
# ]
The latter match might be easily adapted to return whatever you want, value.split(/\s+/) will return all the values, \d+ instead of .+ will match digits only etc.
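For instance, building on the named captures above, the numeric parts could be pulled out as follows. This is only a sketch: the helper name rem_values is hypothetical, and it assumes every value of interest carries the rem unit.

```ruby
# Hypothetical helper: map each CSS property to its numeric rem values.
def rem_values(css)
  css.scan(/(?<=[{; ]).+?(?=[;}])/).each_with_object({}) do |decl, result|
    m = decl.match(/(?<prop>.+?):(?<value>.+)/)
    next unless m
    # Collect every number that is immediately followed by "rem".
    result[m[:prop]] = m[:value].scan(/[\d.]+(?=rem)/).map(&:to_f)
  end
end

p rem_values('body{font-size:1.6rem;margin:4rem 7rem;}')
```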

Return specific segment from Ruby regex

I have a big chunk of text I am scanning through and I am searching with a regex that is prefixed by some text.
var1 = textchunk.match(/thedata=(\d{6})/)
My result from var1 would return something like:
thedata=123456
How do I return only the number part of the match (just 123456 in the example above) without taking var1 and stripping thedata= off in a separate line below?
If you expect just one match in the string, you may use your own code and access the captures property and get the first item (since the data you need is captured with the first set of unescaped parentheses that form a capturing group):
textchunk.match(/thedata=(\d{6})/).captures.first
See this IDEONE demo
If you have multiple matches, just use scan:
textchunk.scan(/thedata=(\d{6})/)
NOTE: to match thedata= followed by exactly 6 digits, add a word boundary:
/thedata=(\d{6})\b/
                ^^
or a negative lookahead (if the 6 digits can be followed by word characters other than digits):
/thedata=(\d{6})(?!\d)/
                ^^^^^^
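To see what the extra boundary buys you, here is a small sketch with a made-up input that contains a seven-digit value:

```ruby
text = 'thedata=1234567 thedata=654321'

# Without a boundary, the first six digits of the longer number match too.
loose  = text.scan(/thedata=(\d{6})/).flatten
# With the negative lookahead, only values of exactly six digits match.
strict = text.scan(/thedata=(\d{6})(?!\d)/).flatten

p loose   # ["123456", "654321"]
p strict  # ["654321"]
```

Since the pattern has one capture group, scan returns one single-element array per match, which is why flatten is used here.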
▶ textchunk = 'garbage=42 thedata=123456'
#⇒ "garbage=42 thedata=123456"
▶ textchunk[/thedata=(\d{6})/, 1]
#⇒ "123456"
▶ textchunk[/(?<=thedata=)\d{6}/]
#⇒ "123456"
The latter uses positive lookbehind.

best way to find substring in ruby using regular expression

I have a string https://stackoverflow.com. I want a new string that contains the domain from the given string, using regular expressions.
Example:
x = "https://stackoverflow.com"
newstring = "stackoverflow.com"
Example 2:
x = "https://www.stackoverflow.com"
newstring = "www.stackoverflow.com"
"https://stackverflow.com"[/(?<=:\/\/).*/]
#⇒ "stackverflow.com"
(?<=..) is a positive lookbehind.
If string = "http://stackoverflow.com",
a really easy way is string.split("http://")[1]. But this isn't regex.
A regex solution would be as follows:
string.scan(/^http:\/\/(.+)$/).flatten.first
To explain:
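String#scan returns an array of all matches of the regex. Because the regex contains a capture group, each element is itself an array of the captured values.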
The regex:
^ matches beginning of line
http: matches those characters
\/\/ matches //
(.+) sets a "match group" containing one or more of any character. This is the value returned by the scan.
$ matches end of line
.flatten.first extracts the results from String#scan, which in this case returns a nested array.
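To make the nesting concrete, here is a short sketch. The https? tweak, which also accepts https URLs like the one in the question, is an addition for illustration and not part of the answer above:

```ruby
string = 'http://stackoverflow.com'

# scan returns one inner array per match, holding that match's captures.
nested = string.scan(/^http:\/\/(.+)$/)
p nested                # [["stackoverflow.com"]]
p nested.flatten.first  # "stackoverflow.com"

# Making the "s" optional lets the same pattern handle both schemes.
secure = 'https://www.stackoverflow.com'
domain = secure.scan(/^https?:\/\/(.+)$/).flatten.first
p domain                # "www.stackoverflow.com"
```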
You might want to try this:
#!/usr/bin/env ruby
str = "https://stackoverflow.com"
if mtch = str.match(/(?::\/\/)(\S+)/)
f1 = mtch.captures.first
end
There are two groups in the regex: the first is a non-capturing group that matches the :// separator, and the second captures everything after it. The captures method returns the captured values as an array, and its first element, the desired result, is assigned to f1.
I hope this solves your problem.

RegEx to remove new line characters and replace with comma

I scraped a website using Nokogiri and after using xpath I was left with the following string (which is a few td's pushed into one string).
"Total First Downs\n\t\t\t\t\t\t\t\t359\n\t\t\t\t\t\t\t\t274\n\t\t\t\t\t\t\t"
My goal is to make this into an array that looks like the following(it will be a nested array):
["Total First Downs", "359", "274"]
The issue is creating a regex that removes the escaped characters and substitutes a single "," for each run of them, but does not add a "," after the last set of integers. If the comma after the last set of integers is unavoidable, I could use #compact to get rid of the nil that occurs in the array. If you need the code I used to scrape the website, here it is (please note I saved the webpage for testing so that my IP address would not get burned during the trial phase):
require 'nokogiri'
f = File.open('page')
doc = Nokogiri::HTML(f)
f.close
number = doc.xpath('//tr[@class="tbdy1"]').count
stats = Array.new(number) {Array.new}
i = 0
doc.xpath('//tr[@class="tbdy1"]').each do |tr|
stats[i] << tr.text
i += 1
end
Thanks for your help
I don't fully understand your problem, but the result can be easily achieved with this:
"Total First Downs\n\t\t\t\t\t\t\t\t359\n\t\t\t\t\t\t\t\t274\n\t\t\t\t\t\t\t"
.split(/[\n\t]+/)
# => ["Total First Downs", "359", "274"]
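A small sketch of how this could be applied to several rows at once, which is what the nested array in the question needs. The rows array is made-up sample data; calling strip first guards against a leading whitespace run, which would otherwise yield an empty first element:

```ruby
# Made-up sample rows standing in for the tr.text values from the page.
rows = [
  "Total First Downs\n\t\t\t\t\t\t\t\t359\n\t\t\t\t\t\t\t\t274\n\t\t\t\t\t\t\t",
  "\n\t\tRushing\n\t\t120\n\t\t98\n"
]

# Trim each row, then split on runs of newlines and tabs.
stats = rows.map { |text| text.strip.split(/[\n\t]+/) }
p stats
```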
Try with gsub, passing the pattern as a regex literal rather than a string:
"Total First Downs\n\t\t\t\t\t\t\t\t359\n\t\t\t\t\t\t\t\t274\n\t\t\t\t\t\t\t".gsub(/[\n\t]+/, ",")

Find two numbers in string with Ruby using Regex

I have a string that looks like this:
Results 1 - 10 of 20
How would I find the number 10 and 20 of that sentence using regex in Ruby?
Something like:
first_number, second_number = compute_regex(my_string)...
Thanks
Like so:
first, second = *source.scan(/\d+/)[-2,2]
Explanation
\d+ matches one or more digits
scan finds all matches of its regular expression argument in source
[-2,2] returns the last two numbers in an array: starts at index -2 from the end, returns the next 2
* (the splat operator) unpacks these two matches into the variables first and second (NOTE: this operator is not necessary here and can be removed, but I like the concept)
Try this:
a = "Results 1 - 10 of 20"
first_number, second_number = a.match(/\w+ (\d+) \- (\d+) of (\d+)/)[2..3].map(&:to_i)
The map call is necessary because the captured groups in the MatchData object are strings.
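If the positional indexes feel fragile, named captures are another option. A sketch, where the group names from, to and total are arbitrary choices:

```ruby
text = 'Results 1 - 10 of 20'

# Each (?<name>...) group can be read back from the MatchData by name.
m = text.match(/(?<from>\d+)\s*-\s*(?<to>\d+)\s+of\s+(?<total>\d+)/)

if m
  first_number  = m[:to].to_i
  second_number = m[:total].to_i
  puts "#{first_number} #{second_number}"  # prints "10 20"
end
```

Checking m before indexing avoids a NoMethodError on nil when the string does not have the expected shape.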
