I have a relative URI / resource:
"/v1/threads/110408889879497140/"
I want to just parse out the ID (the final number in this string).
Hoping for something other than regex :)
a = "/v1/threads/110408889879497140/"
a.split('/').last
You can also do it with rpartition:
"/v1/threads/110408889879497140/".rpartition('threads/').last.chop
Use scan with a regex (note that scan returns an array of matches):
a.scan(/\d{5,}/)
# => ["110408889879497140"]
If you want to isolate the digits in a string without regex, you can use the fact that the digits 0-9 occupy ASCII codes 48 to 57 and do something like:
a = "/v1/threads/110408889879497140/"
a.each_char{ |c| a.delete!(c) unless c.ord.between?(48, 57) }
p a #=> "1110408889879497140"
Note that this keeps every digit in the string, so the result also includes the 1 from "v1".
The path portion of a URL has the same shape as a file path, so you can use File.basename, which was designed to work with file paths:
File.basename("/v1/threads/110408889879497140/")
# => "110408889879497140"
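File.basename only sees the string it is given, so for a full URL with a query string it may help to isolate the path first. A minimal sketch, assuming the stdlib URI parser is acceptable here; last_segment is a hypothetical helper name and the example.com URL is made up for illustration:

```ruby
require 'uri'

# Hypothetical helper: returns the final path segment of a URL or relative path.
def last_segment(url)
  # URI.parse separates the path from the scheme, host, and query string.
  File.basename(URI.parse(url).path)
end

relative_id = last_segment("/v1/threads/110408889879497140/")
absolute_id = last_segment("https://example.com/v1/threads/110408889879497140/?page=2")

puts relative_id
puts absolute_id
```

Both calls should yield the same ID string, since the trailing slash and the query string are not part of the final path segment.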
I am new to Ruby and am writing an expression that replaces the string between the XML tags with a hash of that value.
I did the following to replace it with the new password:
puts "<password>check1</password>".gsub(/(?<=password\>)[^\/]+(?=\<\/password)/,'New \0')
RESULT: <password>New check1</password> (EXPECTED)
My expectation is to get the result like this (Md5 checksum of the value "New check1")
<password>6aaf125b14c97b307c85fc6e681c410e</password>
I tried it in the following ways and none of them worked (I have included the required library with "require 'digest'").
puts "<password>check1</password>".gsub(/(?<=password\>)[^\/]+(?=\<\/password)/,Digest::MD5.hexdigest('\0'))
puts "<password>check1</password>".gsub(/(?<=password\>)[^\/]+(?=\<\/password)/,Digest::MD5.hexdigest '\0')
puts "<password>check1</password>".gsub(/(?<=password\>)[^\/]+(?=\<\/password)/, "Digest::MD5.hexdigest \0")
Any help achieving this would be very much appreciated.
This will work:
require 'digest'
line = "<other>stuff</other><password>check1</password><more>more</more>"
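# The named capture is read from $~ inside the block; it is not a local variable in sub's arguments.
line.sub(/<password>(?<pwd>[^<]+)<\/password>/) { "<password>#{Digest::SHA2.hexdigest($~[:pwd])}</password>" }
=> "<other>stuff</other><password>8a859fd2a56cc37285bc3e307ef0d9fc1d2ec054ea3c7d0ec0ff547cbfacf8dd</password><more>more</more>"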
Make sure the input is one line at a time, and you'll probably want sub, not gsub.
P.S.: I agree with Tom Lord's comment. If your XML is not gargantuan in size, try to use an XML library to parse it, such as Ox or Nokogiri.
Different libraries have different advantages.
This is a variant of Tilo's answer.
require 'digest'
line = "<other>stuff</other><password>check1</password><more>more</more>"
r = /(?<=<password>).+?(?=<\/password>)/
line.sub(r) { |pwd| Digest::SHA2.hexdigest(pwd) }
#=> "<other>stuff</other><password>8a859fd2a56cc37285bc3e307ef0d9f
# c1d2ec054ea3c7d0ec0ff547cbfacf8dd</password><more>more</more>"
(I've displayed the returned string on two lines to make it readable without the need for horizontal scrolling.)
The regular expression reads, "match '<password>' in a positive lookbehind ((?<=...)), followed by one or more characters, matched lazily ('?'), followed by the string '</password>' in a positive lookahead ((?=...))".
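The attempts in the question fail because Digest::MD5.hexdigest('\0') runs before gsub is called, so it hashes the two literal characters \0 rather than the matched text. A block is evaluated once per match, so the asker's original goal (an MD5 of "New " plus the old value) could be sketched roughly as follows; xml and hashed are just illustrative variable names:

```ruby
require 'digest'

xml = "<password>check1</password>"

# The block receives each matched password, so the digest is computed per match.
hashed = xml.gsub(/(?<=<password>)[^<]+(?=<\/password>)/) do |pwd|
  Digest::MD5.hexdigest("New #{pwd}")
end

puts hashed
```

Because the lookarounds leave the tags out of the match, only the text between them is replaced.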
I'm creating a URL parser and have three kinds of URLs. From each one I would like to extract the number at the end of the URL, increment it by 10, and update the URL. I'm trying to use regex to extract the number, but I'm new to regex and having trouble.
These are the three URL structures whose last number I'd like to increment:
Increment last number 20 by 10:
http://forums.scamadviser.com/site-feedback-issues-feature-requests/20/
Increment last number 50 by 10:
https://forums.questionablecontent.net/index.php/board,1.50.html
Increment last number 30 by 10:
https://forums.comodo.com/how-can-i-help-comodo-please-we-need-you-b39.30/
The regex \d+(?!.*\d) matches the last run of digits in the string. Then use gsub with a block to modify the number and put it back into the result.
See this Ruby demo:
strs = ['http://forums.scamadviser.com/site-feedback-issues-feature-requests/20/', 'https://forums.questionablecontent.net/index.php/board,1.50.html', 'https://forums.comodo.com/how-can-i-help-comodo-please-we-need-you-b39.30/']
arr = strs.map {|item| item.gsub(/\d+(?!.*\d)/) {$~[0].to_i+10}}
Note: $~ is a MatchData object, and using the [0] index we can access the whole match value.
Results:
http://forums.scamadviser.com/site-feedback-issues-feature-requests/30/
https://forums.questionablecontent.net/index.php/board,1.60.html
https://forums.comodo.com/how-can-i-help-comodo-please-we-need-you-b39.40/
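Try this regex:
\d+(?=(\/)|(.html))
It extracts the number that is immediately followed by / or .html, which is the last number in each of the sample URLs. Note that the unescaped . matches any character; \d+(?=\/|\.html) is the stricter form.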
Demo: https://regex101.com/r/zqUQlF/1
Substitute back with this regex:
(.*?)(\d+)((\/)|(.html))
Demo: https://regex101.com/r/zqUQlF/2
This regex matches only the last whole number in each URL by using a lookahead (which 'sees' patterns but doesn't consume any characters):
\d+(?=\D*$)
Like this:
urls = ['http://forums.scamadviser.com/site-feedback-issues-feature-requests/20/', 'https://forums.questionablecontent.net/index.php/board,1.50.html', 'https://forums.comodo.com/how-can-i-help-comodo-please-we-need-you-b39.30/']
pattern = /(\d+)(?=[^\d]+$)/
urls.each do |url|
url.gsub!(pattern) {|m| m.to_i + 10}
end
puts urls
You can also test it online here: https://ideone.com/smBJCQ
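The answers above share one idea: match the final run of digits, then compute the replacement in a block. A minimal sketch of that idea as a reusable method, assuming the number is always a plain non-negative integer; bump_last_number and its step parameter are hypothetical names, not part of any answer:

```ruby
# Hypothetical helper: adds `step` to the last run of digits in a URL.
def bump_last_number(url, step = 10)
  # \d+(?=\D*\z) matches digits that are followed only by non-digits up to the end.
  url.sub(/\d+(?=\D*\z)/) { |number| (number.to_i + step).to_s }
end

urls = [
  'http://forums.scamadviser.com/site-feedback-issues-feature-requests/20/',
  'https://forums.questionablecontent.net/index.php/board,1.50.html',
  'https://forums.comodo.com/how-can-i-help-comodo-please-we-need-you-b39.30/'
]

bumped = urls.map { |url| bump_last_number(url) }
puts bumped
```

Unlike gsub!, this returns new strings and leaves the originals untouched. A URL with no digits comes back unchanged, and leading zeros would be lost by to_i.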
I have the following string:
<http://test.host/users?param1=1&param=1>; rel=\"rel_value\"
And I would like to get the URL and the rel value. That is:
http://test.host/users?param1=1&param=1
and
rel_value
I know how to get the URL:
string[/<.*?>/]
But I'm failing to see how to get the rel. Any ideas on a regex that could get both?
If the string is guaranteed to have that format (with literal backslashes before the quotes):
/<(.+)>; rel=\\\"(.+)\\\"/
To be used like so:
m = s.match(/<(.+)>; rel=\\\"(.+)\\\"/)
m[1] #=> http://test.host/users?param1=1&param=1
m[2] #=> rel_value
(m[0] would be the entire matched string, not the first group.)
Additionally, you could just use two regexes to search for each thing in the string:
s[/(?<=<).+(?=>)/] #=> http://test.host/users?param1=1&param=1
s[/(?<=rel=\\\").+(?=\\\")/] #=> rel_value
(These use lookahead and lookbehind to not capture anything besides the values).
As you asked for a regex that does both:
<(.*)>.*rel=\\"(.*)\\"
The first capturing group contains the URL, and the second one the rel value. But you could just do one regex for each.
For the URL:
<(.*)>
And for the rel value:
rel=\\"(.*)\\"
There should be at least one non-regex solution:
str.tr('<>\\\"','').split('; rel=')
#=> ["http://test.host/users?param1=1&param=1", "rel_value"]
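It is not clear from the question whether the backslashes are really in the string or only appear because the string was displayed in escaped form. A minimal sketch that tolerates both cases, using named captures; LINK_PATTERN and parse_link are hypothetical names introduced for illustration:

```ruby
# Hypothetical pattern: \\? makes the backslash before each quote optional.
LINK_PATTERN = /<(?<url>[^>]+)>;\s*rel=\\?"(?<rel>[^"\\]+)\\?"/

# Hypothetical helper: returns a hash with :url and :rel, or nil when nothing matches.
def parse_link(header)
  m = LINK_PATTERN.match(header)
  m && { url: m[:url], rel: m[:rel] }
end

plain   = '<http://test.host/users?param1=1&param=1>; rel="rel_value"'
escaped = '<http://test.host/users?param1=1&param=1>; rel=\"rel_value\"'

p parse_link(plain)
p parse_link(escaped)
```

Named captures avoid the off-by-one confusion between m[0] (the whole match) and the numbered groups.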
I have some VMware server tools that return a string listing all the virtual machines that each of our VMware servers is hosting. I am using Ruby and need to get all the hostnames. However, when I do, the string looks like this:
=>"/vmfs/volumes/d9a12362-2cc7sfe/server1/server1.vmx"
Is there a regex to replace the string with what is between the last "/" and ".vmx"?
There are two handy methods for this:
string = "/vmfs/volumes/d9a12362-2cc7sfe/server1/server1.vmx"
File.extname(string) # => '.vmx'
File.basename(string, File.extname(string)) # => "server1"
If you wanted to use regex you can do this with gsub or sub:
s = '=>"/vmfs/volumes/d9a12362-2cc7sfe/server1/server1.vmx"'
result = s.gsub(/(?mi)(=>".+\/)(.+)(\.vmx")/, '\1new\3')
Result:
=>"/vmfs/volumes/d9a12362-2cc7sfe/server1/new.vmx"
I wouldn't worry about using a regex. Ruby has methods that make this easy:
path = "/vmfs/volumes/d9a12362-2cc7sfe/server1/server1.vmx"
dir, basename = File.split(path)
File.join(dir, 'foo' + File.extname(basename))
# => "/vmfs/volumes/d9a12362-2cc7sfe/server1/foo.vmx"
Regular expressions are powerful but that power comes at a price:
They can be hard to maintain, especially as they become more complex.
They can really eat up CPU time, especially when they do the wrong thing or aren't explicit enough.
They can open up logic-holes resulting in bad results, so we have to test them carefully.
Your question isn't entirely clear though:
...replace the string with what is between the last "/" and ".vmx"?
could mean you want to replace the entire path with the basename minus the extension:
path = File.basename(path, File.extname(path))
# => "server1"
You could do something like this:
string = "/vmfs/volumes/d9a12362-2cc7sfe/server1/server1.vmx"
new_string = string.split("/")[-1].gsub(".vmx","")
This way you don't need to use regex.
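Since the question mentions getting all the hostnames, the File.basename approach can be applied to a whole list. A minimal sketch, assuming the tool output has already been split into one path per entry; the second path is a made-up example:

```ruby
# One .vmx path per virtual machine; the second entry is hypothetical.
paths = [
  "/vmfs/volumes/d9a12362-2cc7sfe/server1/server1.vmx",
  "/vmfs/volumes/d9a12362-2cc7sfe/server2/server2.vmx"
]

# ".*" tells File.basename to strip whatever extension is present.
hostnames = paths.map { |path| File.basename(path, ".*") }

p hostnames
```

Passing ".*" avoids a separate File.extname call when the extension does not need to be checked.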
I'm trying to separate out path elements from a URL in ruby. My URL is:
/cm/api/m_deploymentzones_getDeploymentZones.htm
And using a regex:
/\/([^\/]*)\//
The match is /cm/. I expected to see just cm, given that the / characters are explicitly excluded from the capturing group. What am I doing wrong?
I am also trying to get the next path element with:
/\/[^\/]*\/([^\/]*)\//
And this returns: /cm/api/
Any ideas?
It would be easier to use URI.parse():
require 'uri'
url = '/cm/api/m_deploymentzones_getDeploymentZones.htm'
parts = URI.parse(url).path.split('/') # ["", "cm", "api", "m_deploymentzones_getDeploymentZones.htm"]
Here's a regex matching the path components:
(?<=/).+?(?=/|$)
But you'd be better off just splitting the string on / characters. You don't need regex for this.
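As for why the question's regex appears to return /cm/: the whole match (index 0) always includes the slashes the pattern consumed, while the capturing group (index 1) holds only cm. A short sketch of the difference, alongside the split approach; the variable names are illustrative:

```ruby
url = '/cm/api/m_deploymentzones_getDeploymentZones.htm'

m = url.match(/\/([^\/]*)\//)
whole_match = m[0] # everything the pattern consumed, slashes included
first_part  = m[1] # only what the parentheses captured

# String#[] accepts a capture index as its second argument.
second_part = url[/\/[^\/]*\/([^\/]*)\//, 1]

# Splitting and dropping the empty leading element yields every segment at once.
segments = url.split('/').reject(&:empty?)

p whole_match, first_part, second_part, segments
```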