Proper gsub regular expression for this URL? - ruby

Say I have a string representing a URL:
http://www.mysite.com/somepage.aspx?id=33
I'd like to escape the forward slashes and the question mark:
http:\/\/www.mysite.com\/somepage.aspx\?id=33
How can I do this via gsub? I've been playing with some regular expressions in there but haven't hit on the winning formula yet.

I suggest you use
url = url.gsub(/(?=[\/?])/, '\\')
For example:
url = 'http://www.mysite.com/somepage.aspx?id=33'
url = url.gsub(/(?=[\/?])/, '\\')
puts url
Output:
http:\/\/www.mysite.com\/somepage.aspx\?id=33

How about this one: result = searchText.gsub(/(\/|\?)/, "\\\\\\1")
Note that Ruby replacement strings refer to capture groups with \1, not $1.

I'd suggest using a block to make it more readable:
url.gsub(/[\/?]/) { |c| "\\#{c}" }
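As a minimal sketch of the block form above, the substitution can be wrapped in a small helper. The method name escape_slashes_and_question_marks is made up for this illustration and does not come from any of the answers:

```ruby
# Hypothetical helper (name is illustrative): prefixes every "/" and "?"
# in the given string with a backslash.
def escape_slashes_and_question_marks(url)
  # The block receives each matched character; "\\" is one literal backslash.
  url.gsub(/[\/?]/) { |char| "\\#{char}" }
end

url = 'http://www.mysite.com/somepage.aspx?id=33'
puts escape_slashes_and_question_marks(url)
# prints: http:\/\/www.mysite.com\/somepage.aspx\?id=33
```

The block form avoids the backslash-counting needed in a replacement string, because the block's return value is inserted literally.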

Related

How to remove amp; from URL

I am getting a URL that contains amp;. Is there any way to remove it? I tried the URLDecode function, but it's not working. Do I need to remove it using simple string replacement, or is there a better way to do this?
As @Lankymart pointed out, URLDecode only works on URL-encoded characters (%26), not on HTML entities (&amp;). Use a regular string replacement to change the HTML entity &amp; into a literal ampersand:
url = Replace(url, "&amp;", "&")
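For comparison, here is a rough Ruby sketch of the same idea: replacing the HTML entity for an ampersand with a literal ampersand by plain string substitution. The helper name decode_ampersands and the example URL are assumptions made up for this illustration:

```ruby
# Hypothetical helper (name is illustrative): turns the HTML entity
# "&amp;" back into a literal "&" using plain string replacement.
def decode_ampersands(url)
  url.gsub('&amp;', '&')
end

# Made-up URL for demonstration purposes only.
url = 'http://example.com/page?user_id=1&amp;practice_id=2&amp;patient_id=3'
puts decode_ampersands(url)
# prints: http://example.com/page?user_id=1&practice_id=2&patient_id=3
```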
In Angular, I worked around it by adding amp; to the parameter names:
this.activatedRoute.queryParams.subscribe(params => {
this.user_id = params['user_id'];
this.practice_id = params['amp;practice_id'];
this.patient_id = params['amp;patient_id'];
});

Cleanest way to inject into string

We are looking to optimize images with a thumbnail version, which are stored under a funky version of the existing URL:
Original Image:
https://image.s3-us-west-2.amazonaws.com/8/flower.jpg
Thumbnail Image:
https://image.s3-us-west-2.amazonaws.com/8/thumbnails/medium_flower.jpg
I was going to look from the end of the string for the last '/' and replace it with '/thumbnails/medium_'. In my case this is always safe, but I can't figure out how to do this kind of mutation in Ruby on Rails.
s = "https://image.s3-us-west-2.amazonaws.com/8/flower.jpg"
img_url = s.split('/')[-1] # should give 'flower.jpg'
The issue is to get everything before the last '/' to inject in 'thumbnails/medium_'. Any ideas?
s = "https://image.s3-us-west-2.amazonaws.com/8/flower.jpg"
img_url = s.insert(s.rindex('/')+1, 'thumbnails/medium_')
# The above approach modifies the original string, if this is unsatisfactory, use:
img_url = s.dup.insert(s.rindex('/')+1, 'thumbnails/medium_')
s = "https://image.s3-us-west-2.amazonaws.com/8/flower.jpg"
img_url = "#{File.dirname(s)}/thumbnails/medium_#{File.basename(s)}"
# => "https://image.s3-us-west-2.amazonaws.com/8/thumbnails/medium_flower.jpg"
I would probably use URI and Pathname to work with URLs and file paths:
require 'uri'
require 'pathname'
url = "https://image.s3-us-west-2.amazonaws.com/8/flower.jpg"
uri = URI.parse(url)
path = Pathname.new(uri.path)
uri.path = "#{path.dirname}/thumbnails/medium_#{path.basename}"
uri.to_s
#=> "https://image.s3-us-west-2.amazonaws.com/8/thumbnails/medium_flower.jpg"
s = "https://image.s3-us-west-2.amazonaws.com/8/flower.jpg"
s.sub(/([^\/]+)$/, 'thumbnails/medium_\1')
The second argument to s.sub should be written with single quotes; with double quotes you have to escape the backslash in the \1 part.
UPDATE
s.sub(/([^\/]+?)(?=$|\?|#)/, 'thumbnails/medium_\1')
This handles the case where the path is followed by a query string, a fragment, or both, any of which may contain slashes.
What you need is Array#[] with a Range:
# a small performance optimization: no need to split the string twice
parts = s.split('/')
img_url = parts[0..-2].join('/') + "/thumbnails/medium_" + parts[-1]
On a side note, if you are using a Rails plugin for handling images (CarrierWave or Paperclip), you should use its built-in mechanisms for URL interpolation.
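Pulling the URI-based idea together, a minimal sketch that also keeps any query string or fragment intact might look like the following. The method name thumbnail_url and its prefix parameter are assumptions introduced for this example, not part of any answer above:

```ruby
require 'uri'

# Hypothetical helper (name and default prefix are illustrative):
# inserts the prefix in front of the file name in the URL's path.
def thumbnail_url(url, prefix = 'thumbnails/medium_')
  uri = URI.parse(url)
  # File.split separates the directory part from the last path component.
  dir, file = File.split(uri.path)
  uri.path = File.join(dir, "#{prefix}#{file}")
  uri.to_s
end

puts thumbnail_url('https://image.s3-us-west-2.amazonaws.com/8/flower.jpg')
# prints: https://image.s3-us-west-2.amazonaws.com/8/thumbnails/medium_flower.jpg
```

Because only the path component is rewritten, slashes inside a query string or fragment are left alone.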

Regular Expression find usage of word after "/" in URL

I am trying to parse through URLs using Ruby and return the URLs that contain a given word after a "/" following .com, .org, etc.
If I am trying to capture "questions" in a URL such as
https://stackoverflow.com/questions I also want to be able to capture https://stackoverflow.com/blah/questions. But I do not want to capture https://stackoverflow.com/queStioNs.
Currently my expression can match https://stackoverflow.com/questions but cannot match with "questions" after another "/", or 2 "/"s, etc.
The end of my regular expression is using \bquestions\b.
I tried doing ([a-zA-Z]+\W{1}+\bjob\b|\bjob\b) but this only gets me URLs with /questions and /blah/questions but not /blah/bleh/questions.
What am I doing wrong and how do I match what I need?
You don't actually need a regex for this, you can instead use the URI module:
require 'uri'
urls = ['https://stackoverflow.com/blah/questions', 'https://stackoverflow.com/queStioNs']
urls.each do |url|
  the_path = URI(url).path
  puts the_path if the_path.include?('questions')
end
I don't know whether there is a simpler way, but here is my solution:
regexp = '^(https|http)?:\/\/[\w]+\.(com|org|edu)(\/{1}[a-z]+)*$'
group_length = "https://stackoverflow.com/blah/questions".match(regexp).length
"https://stackoverflow.com/blah/questions".match(regexp)[group_length - 1].gsub("/","")
It will return 'questions'.
Update as per your comments below:
use [\S]*(\/questions){1}$
Hope it helps :)
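As one possible alternative to a regex, the path can be split into segments and compared exactly, which is case-sensitive and works at any depth. This is only a sketch; the method name path_has_segment? is made up for illustration:

```ruby
require 'uri'

# Hypothetical helper (name is illustrative): true when one of the
# "/"-separated path segments equals word exactly (case-sensitive).
def path_has_segment?(url, word)
  URI.parse(url).path.split('/').include?(word)
end

urls = [
  'https://stackoverflow.com/questions',
  'https://stackoverflow.com/blah/bleh/questions',
  'https://stackoverflow.com/queStioNs'
]

# Prints only the first two URLs; the mixed-case one is rejected.
puts urls.select { |url| path_has_segment?(url, 'questions') }
```

Unlike a plain include? on the whole path, comparing whole segments does not match a longer segment that merely contains the word.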

How to get part of string after some word with Ruby?

I have a string containing a path:
/var/www/project/data/path/to/file.mp3
I need to get the substring starting with '/data' and delete all before it. So, I need to get only /data/path/to/file.mp3.
What would be the fastest solution?
'/var/www/project/data/path/to/file.mp3'.match(/\/data.*/)[0]
=> "/data/path/to/file.mp3"
It could be as easy as:
string = '/var/www/project/data/path/to/file.mp3'
path = string[/\/data.*/]
puts path
=> /data/path/to/file.mp3
Using a regular expression is a good way. I am not familiar with Ruby, but it should have some function like "substring()" (perhaps under another name).
Here is a demo using JavaScript:
var str = "/var/www/project/data/path/to/file.mp3";
var startIndex = str.indexOf("/data");
var result = str.substring(startIndex );
I think the code in Ruby is similar; you can check the documentation. Hope it's helpful.
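A rough Ruby counterpart to the indexOf/substring approach could use String#index and a range slice. The helper name substring_from is an assumption made up for this sketch:

```ruby
# Hypothetical helper (name is illustrative): returns the part of str
# starting at the first occurrence of marker, or nil if it is absent.
def substring_from(str, marker)
  start = str.index(marker)
  start ? str[start..-1] : nil
end

puts substring_from('/var/www/project/data/path/to/file.mp3', '/data')
# prints: /data/path/to/file.mp3
```

Returning nil when the marker is missing mirrors what the regex slice string[/\/data.*/] does for a non-matching string.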
Please try this:
"/var/www/project/data/path/to/file.mp3".scan(/\/data.*/)
It returns an array of all occurrences.

Ruby: replace a given URL in an HTML string

In Ruby, I want to replace a given URL in an HTML string.
Here is my unsuccessful attempt:
escaped_url = url.gsub(/\//,"\/").gsub(/\./,"\.").gsub(/\?/,"\?")
path_regexp = Regexp.new(escaped_url)
html.gsub!(path_regexp, new_url)
Note: url is actually a Google Chart request URL I wrote, which will not have more special characters than /?|.=%:
The gsub method can take a string or a Regexp as its first argument; the same goes for gsub!. For example:
>> 'here is some ..text.. xxtextxx'.gsub('..text..', 'pancakes')
=> "here is some pancakes xxtextxx"
So you don't need to bother with a regex or escaping at all, just do a straight string replacement:
html.gsub!(url, new_url)
Or better, use an HTML parser to find the particular node you're looking for and do a simple attribute assignment.
I think you're looking for something like:
path_regexp = Regexp.new(Regexp.escape(url))
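To illustrate why escaping matters, here is a small sketch comparing an escaped pattern with a naive one. The helper name replace_url and all URLs are made up for this example:

```ruby
# Hypothetical helper (name is illustrative): replaces every literal
# occurrence of url inside html with new_url.
def replace_url(html, url, new_url)
  # Regexp.escape neutralizes metacharacters such as "." and "?",
  # so the pattern matches the URL text exactly.
  html.gsub(Regexp.new(Regexp.escape(url)), new_url)
end

# Made-up values for demonstration purposes only.
url     = 'http://example.com/chart?cht=p3'
new_url = 'http://example.com/cached/chart.png'
html    = %(<img src="#{url}" alt="chart">)

puts replace_url(html, url, new_url)
# prints: <img src="http://example.com/cached/chart.png" alt="chart">

# For a plain literal match, the string form gives the same result.
puts replace_url(html, url, new_url) == html.gsub(url, new_url)
# prints: true
```

Note that a backslash in new_url would still be interpreted by gsub as a backreference marker; the made-up replacement here contains none.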
