Remove specific parts from url - ruby

Let's suppose I have a URL like this:
https://www.youtube.com/watch/3e4345?v=rwmEkvPBG1s
What is the best and shortest way to get only the 3e4345 part?
Sometimes the URL has no query string (the part after the ?).
I don't want to use any gems.
What I did was:
url = url.split('/watch/')
url = url[1].split('/')[0].split('?')[0]
Is there a better way? Thanks

Possibly the safest and best approach is to use URI from the standard library:
require 'uri'
URI("https://www.youtube.com/watch/34345?v=rwmEkvPBG1s").path.split("/").last
For more, see "How to extract URL parameters from a URL with Ruby or Rails?"

You could use the match method to find a match based on a regular expression. The value at [1] is the first capture from the regular expression.
The parentheses around [^\/?]+ capture everything after /watch/ up to the next / or ?. A digits-only \d+ would also work for an ID like 34345, but it would stop at the first letter in an ID such as 3e4345.
url.to_s.match(/\/watch\/([^\/?]+)/)[1]
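A minimal sketch of a nil-safe variant of the regex approach, assuming the ID is whatever sits between /watch/ and the next /, ? or #. The helper name watch_id is hypothetical. String#[] with a regex and a capture index returns nil when nothing matches, whereas calling [1] on a failed match raises NoMethodError.

```ruby
# Hypothetical helper: returns the segment after /watch/, or nil if absent.
def watch_id(url)
  # %r{...} avoids escaping slashes; capture group 1 stops at /, ? or #.
  url.to_s[%r{/watch/([^/?#]+)}, 1]
end

p watch_id("https://www.youtube.com/watch/3e4345?v=rwmEkvPBG1s") # with a query string
p watch_id("https://www.youtube.com/watch/3e4345")               # without one
p watch_id("https://www.youtube.com/")                           # no /watch/ segment
```

The first two calls return "3e4345" and the last returns nil, so the caller can decide how to handle a URL that does not have the expected shape.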

require 'uri'
x = "https://www.youtube.com/watch/34345?v=rwmEkvPBG1s"
File.basename(URI(x).path)
# => "34345"
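The answers above take the last path segment, while the code in the question takes the segment right after /watch/. As a rough sketch of the latter using URI, assuming the ID always directly follows a "watch" segment (the helper name segment_after_watch is hypothetical):

```ruby
require 'uri'

# Hypothetical helper: the path segment immediately after "watch", or nil.
def segment_after_watch(url)
  # URI#path excludes the query string, so "?v=..." never reaches the split.
  segments = URI(url).path.split("/").reject(&:empty?)
  index = segments.index("watch")
  index && segments[index + 1]
end

p segment_after_watch("https://www.youtube.com/watch/3e4345?v=rwmEkvPBG1s")
p segment_after_watch("https://www.youtube.com/watch/3e4345/extra")
p segment_after_watch("https://www.youtube.com/feed")
```

Unlike taking the last segment, this still returns the ID when more path segments follow it, and it returns nil when there is no watch segment at all.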

Related

cy.contains match with regex?

I am trying to match part of a url http://www.mywebsite.com/get-stuff in cypress and haven't been able to figure out how to code a regex match.
I tried:
cy.contains('http.*get-stuff')
but it doesn't find a match for:
do some things
If you are trying to see if some content on your website has the text http://www.mywebsite.com/get-stuff using regex, you will need to pass in a valid regular expression. Your argument is a plain string, so cy.contains treats it as literal text rather than as a pattern; pass a regex literal such as /http.*get-stuff/ instead.
If you are trying to check that your site has navigated to http://www.mywebsite.com/get-stuff, you likely want to write an assertion off of the cy.url() command like so:
cy.url().should('match', /myregexp/)
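I know it's been quite a long time, but for those who are still looking for a solution (just like me): if the accepted solution didn't work for you, you can use cy.url().should('include', 'get-stuff') for a plain substring check. Note that 'contain' and 'include' expect a string, not a regular expression; for a regex, use 'match'.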
This solution should work too:
cy.get('div') // select DOM element (tag, class or id)
  .invoke('text') // read the element's text content (calls jQuery's .text())
  .should('match', /regex/) // compare with a regular expression
More on:
https://docs.cypress.io/api/commands/invoke

How to return file path without url link?

I have
http://foobar.s3.amazonaws.com/uploads/users/15/photos/12/foo.jpg
How do I return
uploads/users/15/photos/12/foo.jpg
It is better to use the URI parsing that is part of the Ruby standard library
than to experiment with some regular expression that may or may not take every
possible special case into account.
require 'uri'
url = "http://foo.s3.amazonaws.com/uploads/users/15/photos/12/foo.jpg"
path = URI.parse(url).path
# => "/uploads/users/15/photos/12/foo.jpg"
path[1..-1]
# => "uploads/users/15/photos/12/foo.jpg"
No need to reinvent the wheel.
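As a small sketch of what the parsed URI gives you beyond the path: the ?size=large query string below is an assumption added for illustration, and String#delete_prefix (Ruby 2.5+) is used as an alternative to path[1..-1].

```ruby
require 'uri'

# Assumed sample: the URL from the question plus a made-up query string.
url = "http://foobar.s3.amazonaws.com/uploads/users/15/photos/12/foo.jpg?size=large"
uri = URI.parse(url)

# uri.path never includes the query string; delete_prefix drops the leading "/".
relative_path = uri.path.delete_prefix("/")

puts relative_path
puts uri.host
puts uri.query
```

Because the host, path, and query are separate accessors, there is no need to adjust a pattern when the URL gains a query string or a different host.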
"http://foobar.s3.amazonaws.com/uploads/users/15/photos/12/foo.jpg".sub("http://foobar.s3.amazonaws.com/","")
would be an explicit version, in which you replace the scheme-and-host part with an empty string.
For a more universal approach I would recommend a regular expression, similar to this one:
string = "http://foobar.s3.amazonaws.com/uploads/users/15/photos/12/foo.jpg"
string.sub(/(http:\/\/)*.*?\.\w{2,3}\//,"")
If it's needed, I could explain the regular expression.
link = "http://foobar.s3.amazonaws.com/uploads/users/15/photos/12/foo.jpg"
path = link.match(/\/\/[^\/]*\/(.*)/)
path[1]
#=> "uploads/users/15/photos/12/foo.jpg"
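Someone recommended this approach as well (note that URI.escape was deprecated and then removed in Ruby 3.0, so this only runs on older Rubies):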
URI.parse(URI.escape('http://foobar.s3.amazonaws.com/uploads/users/15/photos/12/foo.jpg')).path[1..-1]
Are there any disadvantages using something like this versus a regexp approach?
The cheap answer is to just strip everything before the first single /.
Better answers are "How do I process a URL in ruby to extract the component parts (scheme, username, password, host, etc)?" and "Remove subdomain from string in ruby".
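One practical difference between the two approaches is that URI.parse raises URI::InvalidURIError on strings that are not valid URIs (for example, a path containing a raw space), while a regex accepts anything. A hedged sketch that prefers URI and falls back to a regex, with the hypothetical helper name path_without_host:

```ruby
require 'uri'

# Hypothetical helper: prefer URI, fall back to a regex for strings URI rejects.
def path_without_host(url)
  URI.parse(url).path.delete_prefix("/")
rescue URI::InvalidURIError
  # Fallback: everything after the first single slash that follows "//host".
  # Unlike URI#path, this keeps any query string that is present.
  url[%r{//[^/]*/(.*)}, 1]
end

p path_without_host("http://foobar.s3.amazonaws.com/uploads/users/15/photos/12/foo.jpg")
p path_without_host("http://foobar.s3.amazonaws.com/uploads/my photo.jpg")
```

Whether a silent fallback is desirable depends on the data; in many cases letting the exception surface is the better signal that the input needs cleaning.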

Ruby regex: extract a list of urls from a string

I have a string of images' URLs and I need to convert it into an array.
http://rubular.com/r/E2a5v2hYnJ
How do I do this?
URI.extract(your_string)
That's all you need if you already have it in a string. Put require 'uri' in there first. Gotta love that standard library!
See the docs for URI.extract (note that newer Ruby versions mark this method as obsolete).
String#scan returns an array:
myarray = mystring.scan(/regex/)
See here on regular-expressions.info
The best answer will depend very much on exactly what input string you expect.
If your test string is accurate then I would not use a regex, do this instead (as suggested by Marnen Laibow-Koser):
mystring.split('?v=3')
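If you really don't have constant fluff between your useful strings, then a regex might be better. Your regex is greedy. This will get you part of the way:
mystring.scan(/https?:\/\/[\w.\/-]*?\.(?:jpe?g|gif|png)/)
Note the '?' after the '*' in the part capturing the server and path pieces of the URL; this makes the regex non-greedy. The extension group is non-capturing, (?:...), because scan returns only the captured groups when the pattern contains any, and the hyphen sits last in the character class so that it is a literal rather than part of a range.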
The problem with this is that if your server name or path contains any of .jpg, .jpeg, .gif or .png then the result will be wrong in that instance.
Figuring out what is best needs more information about your input string. You might for example find it better to pattern match the fluff between your desired URLs.
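A short sketch of the scan approach on an assumed input string; the sample URLs and the filler between them are made up for illustration, since the original test string is only available through the Rubular link.

```ruby
# Assumed sample input: image URLs separated by arbitrary filler text.
mystring = "http://example.com/a/one.jpg?v=3 junk " \
           "https://example.com/b/two.png,http://example.com/three.gif"

# (?:...) is non-capturing, so scan returns whole matches, not just extensions.
# The lazy *? stops at the first image extension it can reach.
urls = mystring.scan(%r{https?://[\w./-]*?\.(?:jpe?g|gif|png)})

p urls
```

With a capturing group instead, the same scan would return [["jpg"], ["png"], ["gif"]], which is a common surprise with String#scan.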
Use String#split (see the docs for details).
Part of the problem is that in Rubular you are using https instead of http. This gets you closer to what you want if the other answers don't work for you:
http://rubular.com/r/cIjmjxIfz5

Ruby RegEx issue

I'm having a problem getting my RegEx to work with my Ruby script.
Here is what I'm trying to match:
http://my.test.website.com/{GUID}/{GUID}/
Here is the RegEx that I've tested and should be matching the string as shown above:
/([-a-zA-Z0-9#:%_\+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9#:%_\+.~#?&\/\/=]*)([\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/])*?\/)/
3 capturing groups:
group 1: ([-a-zA-Z0-9#:%_\+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9#:%_\+.~#?&\/\/=]*)([\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/])*?\/)
group 2: (\/[-a-zA-Z0-9#:%_\+.~#?&\/\/=]*)
group 3: ([\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/])
Ruby is giving me an error when trying to validate a match against this regex:
empty range in char class: (My RegEx goes here) (SyntaxError)
I appreciate any thoughts or suggestions on this.
You could simplify things a bit by using URI to handle parsing the URL, \h (a hexadecimal digit) in the regex, and scan to pull out the GUIDs:
require 'uri'
uri = URI.parse(your_url)
path = uri.path
guids = path.scan(/\h{8}-\h{4}-\h{4}-\h{4}-\h{12}/)
If you need any of the non-path components of the URL, then you can easily pull them out of uri.
You might need to tighten things up a bit depending on your data or it might be sufficient to check that guids has two elements.
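A sketch of both the loose and the tightened check, assuming a sample URL with two made-up GUIDs in the shape the question describes; the variable names guid, exact, and valid are illustrative.

```ruby
require 'uri'

# Assumed sample URL: host from the question, two made-up GUIDs in the path.
your_url = "http://my.test.website.com/3f2504e0-4f89-11d3-9a0c-0305e82c3301/" \
           "9a1b2c3d-0000-4abc-8def-0123456789ab/"

guid = /\h{8}-\h{4}-\h{4}-\h{4}-\h{12}/
path = URI.parse(your_url).path

# Loose check: pull out every GUID-shaped substring and count them.
guids = path.scan(guid)

# Tighter check: the whole path must be exactly /GUID/GUID/.
exact = %r{\A/#{guid}/#{guid}/\z}
valid = path.match?(exact)

p guids
p valid
```

Interpolating the guid regex into a larger one keeps the GUID pattern in a single place, which avoids the repetition that made the original expression hard to debug.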
You have several errors in your RegEx. I am very sleepy now, so I'll just give you a hint instead of a solution:
...[\/\/[0-9a-fA-F]....
the first [ does not belong there. Also, having \/\/ inside [] is unnecessary - you only need each character once inside []. Also,
...[-a-zA-Z0-9#:%_\+.~#?&\/\/=]{2,256}...
is greedy, and includes a period - indeed, includes all chars (AFAICS) that can come after it, effectively swallowing the whole string (when you get rid of other bugs). Consider {2,256}? instead.

what does the empty regex match in ruby?

following a RoR security tutorial (here), i wrote something along the lines of
@@private_re = //
def secure?
  action_name =~ @@private_re
end
the idea is that in the base case, this shouldn't match anything, and return nil. problem is that it doesn't. i've worked around for the time being by using a nonsensical string, but i'd like to know the answer.
The empty regular expression successfully matches every string.
Examples of regular expressions that will always fail to match:
/(?=a)b/
/\Zx\A/
/[^\s\S]/
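A quick way to check both claims, using a handful of assumed sample strings: the empty regex matches at position 0 of every string, including the empty one, while the three patterns above never match.

```ruby
# Assumed sample strings, including ones the "never" patterns might seem to fit.
samples = ["", "index", "show\n", "ab", "x", "anything at all"]

# The empty regex matches at position 0 of every string.
empty_matches = samples.map { |s| s =~ // }

# Each of these patterns demands something contradictory, so =~ returns nil.
never = [/(?=a)b/, /\Zx\A/, /[^\s\S]/]
never_matches = never.flat_map { |re| samples.map { |s| s =~ re } }

p empty_matches
p never_matches.all?(&:nil?)
```

Since =~ returns the match position, the empty regex yields 0, which is truthy in Ruby; that is why secure? in the question never returns nil.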
It is meant to not change the behavior of the controller in any way, as // will match every string.
The idea is that @@private_re is meant to be set in the controller to match things you DO want to be private. Thus, that code is meant to do nothing by default, but when combined with
@@private_re = /.../ in the controller, it gives you a nice privacy mechanism.

Resources