How to change the value of a url parameter in ruby? - ruby

What is a better way to do this?
Ideally using a URI parsing class of some sort, rather than relying on my own regex
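url = "http://example.com" # or "http://example.com?after=111"
# Replace an existing after=N value, keeping the parameter name
next_url = url.gsub(/after=\d+/, "after=666")
# If nothing was replaced, the parameter is missing, so append it
if next_url.eql?(url)
  if !url.include?('?')
    next_url = url + "?after=666"
  else
    next_url = url + "&after=666"
  end
end
puts next_url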

I recommend using the Addressable gem when you are taking URLs apart or putting them together. It's very comprehensive, and has query_values(options = {}) and query_values=(new_query_values) to extract all the query components into a hash, or to rebuild it from a hash. It will also handle decoding and encoding the parameters as needed, things that URI will not do for you.

Not sure about your question, but you probably want something like this?
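path, query = url.split('?', 2)
# Split into key=value pairs first; a greedy /(.+)=(.+)/ scan would swallow several pairs at once
query = (query || '').split('&').map { |pair| k, v = pair.split('=', 2); "#{k}=#{k == 'after' ? 666 : v}" }.join('&')
puts [path, query].join('?')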

There's a Ruby library called URI that handles parsing and building of URLs.
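As a rough sketch of what that could look like for the question above, using only the standard library: the helper name set_query_param is made up for illustration, and the sketch assumes each parameter appears at most once, since converting to a hash collapses repeated keys.

```ruby
require 'uri'

# Hypothetical helper (not part of URI): set or replace one query parameter.
def set_query_param(url, key, value)
  uri = URI.parse(url)
  # decode_www_form returns [key, value] pairs; a missing query is treated as empty
  params = URI.decode_www_form(uri.query || '').to_h
  params[key.to_s] = value.to_s
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

puts set_query_param('http://example.com', 'after', 666)
# => http://example.com?after=666
puts set_query_param('http://example.com?after=111', 'after', 666)
# => http://example.com?after=666
```

Both forms of the URL from the question go through the same code path, so no special-casing of ? versus & is needed.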

Related

How to replace string in URL with captured regex pattern

I want to replace 'hoge' to 'foo' with regex. But the user's value is dynamic so I can't use str.gsub('hoge', 'foo').
str = '?user=hoge&tab=fuga'
What should I do?
Don't do this with a regular expression.
This is how to manipulate URIs using the existing wheels:
require 'uri'
str = 'http://example.com?user=hoge&tab=fuga'
uri = URI.parse(str)
query = URI.decode_www_form(uri.query).to_h # => {"user"=>"hoge", "tab"=>"fuga"}
query['user'] = 'foo'
uri.query = URI.encode_www_form(query)
uri.to_s # => "http://example.com?user=foo&tab=fuga"
Alternately:
require 'addressable'
uri = Addressable::URI.parse('http://example.com?tab=fuga&user=hoge')
query = uri.query_values # => {"tab"=>"fuga", "user"=>"hoge"}
query['user'] = 'foo'
uri.query_values = query
uri.to_s # => "http://example.com?tab=fuga&user=foo"
Note that in the examples the order of the parameters changed, but the code handled the difference without problems.
The reason you want to use URI or Addressable is because parameters and values have to be correctly encoded when they contain illegal characters. URI and Addressable know the rules and will follow them, whereas naive code assumes it's OK to not bother with encoding, causing broken URIs.
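To illustrate the encoding point, here is a small sketch with a made-up value that contains separator characters. It compares naive string interpolation with URI.encode_www_form from the standard library.

```ruby
require 'uri'

value = 'a&b=c d' # illustrative value containing &, = and a space

naive   = "user=#{value}"
encoded = URI.encode_www_form('user' => value)

puts encoded
# => user=a%26b%3Dc+d

# The naive string is read back as two parameters, the encoded one as a single parameter
p URI.decode_www_form(naive)   # => [["user", "a"], ["b", "c d"]]
p URI.decode_www_form(encoded) # => [["user", "a&b=c d"]]
```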
URI is part of the Ruby Standard Library, and Addressable is more full-featured. Take your pick.
You can try below regex
([?&]user=)([^&]+)
You probably want to find out what the user query maps to first before using a .gsub to replace whatever value it is.
First, parse the URL string into a URI object using the URI module. Then use the CGI module's query-parsing methods to get the key-value pairs of the query params from the URI object. Finally, you can .gsub the values in that hash.

How do I post/upload multiple files at once using HttpClient?

def test_post_with_file(filename = 'test01.xml')
  File.open(filename) do |file|
    response = @http_client.post(url, {'documents'=>file})
  end
end
How do I modify the above method to handle a multi-file-array post/upload?
file_array = ['test01.xml', 'test02.xml']
You mean like this?
def test_post_with_file(file_array=[])
  file_array.each do |filename|
    File.open(filename) do |file|
      response = @http_client.post(url, {'documents'=>file})
    end
  end
end
I was having the same problem and finally figured out how to do it:
def test_post_with_file(file_array)
  # One ['documents[]', file] pair per file; the [] suffix marks the parameter as an array
  form = file_array.map { |n| ['documents[]', File.open(n)] }
  response = @http_client.post(@url, form)
end
You can see in the docs how to pass multiple values: http://rubydoc.info/gems/httpclient/HTTPClient#post_content-instance_method .
In the "body" row, I tried without success to use the 4th example. Somehow HttpClient just decides to apply .to_s to each hash in the array.
Then I tried the 2nd solution, and it wouldn't work either because the server keeps only the last value. But after some tinkering I discovered that the second solution works if the parameter name includes square brackets to indicate that there are multiple values forming an array.
Maybe this is a bug in Sinatra (that's what I'm using), maybe the handling of such data is implementation-dependent, maybe the HttpClient doc is outdated/wrong. Or a combination of these.

How can I remove Google tracking parameters (UTM) from an URL?

I have a bunch of URLs which I would like to clean. They all contain UTM parameters, which are not necessary, or rather harmful in this case. Example:
http://houseofbuttons.tumblr.com/post/22326009438?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+HouseOfButtons+%28House+of+Buttons%29
All potential parameters begin with utm_.
How can I remove them easily with a Ruby script / structure without destroying other, potentially "good" URL parameters?
You can apply a regex to the urls to clean them up. Something like this should do the trick:
url = 'http://houseofbuttons.tumblr.com/post/22326009438?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+HouseOfButtons+%28House+of+Buttons%29&normal_param=1'
url.gsub(/&?utm_.+?(&|$)/, '') # => "http://houseofbuttons.tumblr.com/post/22326009438?normal_param=1"
This uses the URI lib to deconstruct and change the querystring (no regex):
require 'uri'
str ='http://houseofbuttons.tumblr.com/post/22326009438?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+HouseOfButtons+%28House+of+Buttons%29&normal_param=1'
uri = URI.parse(str)
clean_key_vals = URI.decode_www_form(uri.query).reject{|k, _| k.start_with?('utm_')}
uri.query = URI.encode_www_form(clean_key_vals)
p uri.to_s #=> "http://houseofbuttons.tumblr.com/post/22326009438?normal_param=1"
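A possible way to wrap this up for a batch of URLs is sketched below. The helper name strip_utm is hypothetical, and the sketch makes two assumptions beyond the snippet above: URLs without a query string are returned unchanged, and the query is dropped entirely when only utm_ parameters were present, so no dangling ? is left behind.

```ruby
require 'uri'

# Hypothetical helper: remove every query parameter whose name starts with utm_.
def strip_utm(url)
  uri = URI.parse(url)
  return url if uri.query.nil? # nothing to clean

  kept = URI.decode_www_form(uri.query).reject { |key, _| key.start_with?('utm_') }
  # Setting the query to nil removes the "?" when no parameters remain
  uri.query = kept.empty? ? nil : URI.encode_www_form(kept)
  uri.to_s
end

puts strip_utm('http://example.com/post/1?utm_source=feedburner&id=5')
# => http://example.com/post/1?id=5
puts strip_utm('http://example.com/post/1?utm_source=feedburner&utm_medium=feed')
# => http://example.com/post/1
```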

How do I safely join relative url segments?

I'm trying to find a robust method of joining partial url path segments together. Is there a quick way to do this?
I tried the following:
puts URI::join('resource/', '/edit', '12?option=test')
I expect:
resource/edit/12?option=test
But I get the error:
`merge': both URI are relative (URI::BadURIError)
I have used File.join() in the past for this but something does not seem right about using the file library for urls.
URI's API is not necessarily great.
URI::join will work only if the first one starts out as an absolute uri with protocol, and the later ones are relative in the right ways... except I try to do that and can't even get that to work.
This at least doesn't error, but why is it skipping the middle component?
URI::join('http://somewhere.com/resource', './edit', '12?option=test')
I think maybe URI just kind of sucks. It lacks significant api on instances, such as an instance #join or method to evaluate relative to a base uri, that you'd expect. It's just kinda crappy.
I think you're going to have to write it yourself. Or just use File.join and other File path methods, after testing all the edge cases you can think of to make sure it does what you want/expect.
edit 9 Dec 2016 I figured out the addressable gem does it very nicely.
base = Addressable::URI.parse("http://example.com")
base + "foo.html"
# => #<Addressable::URI:0x3ff9964aabe4 URI:http://example.com/foo.html>
base = Addressable::URI.parse("http://example.com/path/to/file.html")
base + "relative_file.xml"
# => #<Addressable::URI:0x3ff99648bc80 URI:http://example.com/path/to/relative_file.xml>
base = Addressable::URI.parse("https://example.com/path")
base + "//newhost/somewhere.jpg"
# => #<Addressable::URI:0x3ff9960c9ebc URI:https://newhost/somewhere.jpg>
base = Addressable::URI.parse("http://example.com/path/subpath/file.html")
base + "../up-one-level.html"
# => #<Addressable::URI:0x3fe13ec5e928 URI:http://example.com/path/up-one-level.html>
Given uri as a URI::Generic or a subclass thereof:
uri.path += '/123'
Enjoy!
06/25/2016 UPDATE for skeptical folk
require 'uri'
uri = URI('http://ioffe.net/boris')
uri.path += '/123'
p uri
Outputs
<URI::HTTP:0x2341a58 URL:http://ioffe.net/boris/123>
The problem is that resource/ is relative to the current directory, but /edit refers to the top level directory due to the leading slash. It's impossible to join the two directories without already knowing for certain that edit contains resource.
If you're looking for purely string operations, simply remove the leading or trailing slashes from all parts, then join them with / as the glue.
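A minimal sketch of that string-only approach follows. The method name join_segments is made up for illustration. Note that it also drops a leading slash on the first part and a trailing slash on the last, which may or may not be what you want.

```ruby
# Hypothetical helper: strip leading/trailing slashes from each part,
# skip nil or empty parts, then glue the rest together with "/".
def join_segments(*parts)
  parts
    .compact
    .map { |part| part.to_s.gsub(%r{\A/+|/+\z}, '') }
    .reject(&:empty?)
    .join('/')
end

puts join_segments('resource/', '/edit', '12?option=test')
# => resource/edit/12?option=test
```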
The way to do it using URI.join is:
URI.join('http://example.com', '/foo/', 'bar')
Pay attention to the trailing slashes. You can find the complete documentation here:
http://www.ruby-doc.org/stdlib-1.9.3/libdoc/uri/rdoc/URI.html#method-c-join
As you noticed, URI::join won't combine paths with repeated slashes, so it doesn't fit the bill.
Turns out it doesn't require a lot of Ruby code to achieve this:
module GluePath
def self.join(*paths, separator: '/')
paths = paths.compact.reject(&:empty?)
last = paths.length - 1
paths.each_with_index.map { |path, index|
_expand(path, index, last, separator)
}.join
end
def self._expand(path, current, last, separator)
if path.start_with?(separator) && current != 0
path = path[1..-1]
end
unless path.end_with?(separator) || current == last
path = [path, separator]
end
path
end
end
The algorithm takes care of consecutive slashes, preserves start and end slashes, and ignores nil and empty strings.
puts GluePath::join('resource/', '/edit', '12?option=test')
outputs
resource/edit/12?option=test
Use this code:
File.join('resource/', '/edit', '12?option=test').
gsub(File::SEPARATOR, '/').
sub(/^\//, '')
# => resource/edit/12?option=test
example with empty strings:
File.join('', '/edit', '12?option=test').
gsub(File::SEPARATOR, '/').
sub(/^\//, '')
# => edit/12?option=test
Or, if you can supply segments like resource/, edit/, 12?option=test, use the following, where http: is only a placeholder to get a valid URI. This works for me.
URI.
join('http:', 'resource/', 'edit/', '12?option=test').
path.
sub(/^\//, '')
# => "resource/edit/12"
An unoptimized solution. Note that it doesn't take query params into account. It only handles paths.
class URL
def self.join(*str)
str.map { |path|
new_path = path
# Check the first character
if path[0] == "/"
new_path = new_path[1..-1]
end
# Check the last character
if path[-1] != "/"
new_path += "/"
end
new_path
}.join
end
end
This question is nearly a decade old, yet it seems that there is no perfect solution posted.
A handful of posted answers fail to handle multiple //, e.g. stuff like path = path[1..-1] if path.start_with?('/')
Answers that simply call File.join(*paths) seem to be the accepted "Ruby way," yet they fail in cases where you pass a URI object, e.g. File.join(URI.join('some/path')) fails with TypeError: no implicit conversion of URI::Generic into String.
Below is what I ended up using:
module UrlHelper
def self.join(*paths)
# yes, Ruby's stdlib really does lack a functional join method for URLs
File.join(*paths.map(&:to_s))
end
end
You can use File.join('resource/', '/edit', '12?option=test')
I improved @Maximo Mussini's script to make it work more gracefully:
SmartURI.join('http://example.com/subpath', 'hello', query: { token: 'secret' })
# => "http://example.com/subpath/hello?token=secret"
https://gist.github.com/zernel/0f10c71f5a9e044653c1a65c6c5ad697
require 'uri'
module SmartURI
  SEPARATOR = '/'

  def self.join(*paths, query: nil)
    paths = paths.compact.reject(&:empty?)
    last = paths.length - 1
    url = paths.each_with_index.map { |path, index|
      _expand(path, index, last)
    }.join
    if query.nil?
      return url
    elsif query.is_a? Hash
      return url + "?#{URI.encode_www_form(query.to_a)}"
    else
      raise "Unexpected input type for query: #{query}, it should be a hash."
    end
  end

  def self._expand(path, current, last)
    # start_with?/end_with? are core String methods; starts_with?/ends_with? require ActiveSupport
    if path.start_with?(SEPARATOR) && current != 0
      path = path[1..-1]
    end
    unless path.end_with?(SEPARATOR) || current == last
      path = [path, SEPARATOR]
    end
    path
  end
end
You can use this:
URI.join('http://exemple.com', '/a/', 'b/', 'c/', 'd')
=> #<URI::HTTP http://exemple.com/a/b/c/d>
URI.join('http://exemple.com', '/a/', 'b/', 'c/', 'd').to_s
=> "http://exemple.com/a/b/c/d"
See: http://ruby-doc.org/stdlib-2.4.1/libdoc/uri/rdoc/URI.html#method-c-join-label-Synopsis
My understanding of URI::join is that it thinks like a web browser does.
To evaluate it, point your mental web browser to the first parameter, and keep clicking links until you browse to the last parameter.
For example, URI::join('http://example.com/resource/', '/edit', '12?option=test'), you would browse like this:
http://example.com/resource/, click a link to /edit (a file at the root of the site)
http://example.com/edit, click a link to 12?option=test (a file in the same directory as edit)
http://example.com/12?option=test
If the first link were /edit/ (with a trailing slash), or /edit/foo, then the next link would be relative to /edit/ rather than /.
This page possibly explains it better than I can: Why is URI.join so counterintuitive?
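The browsing analogy can be checked with a short sketch. It uses the example URL from above and varies only the slashes on the middle segment.

```ruby
require 'uri'

base = 'http://example.com/resource/'

# "/edit" is a file at the site root, so the next link resolves relative to "/"
a = URI.join(base, '/edit', '12?option=test').to_s
# "/edit/" is a directory at the site root, so the next link resolves inside it
b = URI.join(base, '/edit/', '12?option=test').to_s
# "edit/" (no leading slash) is a directory inside /resource/
c = URI.join(base, 'edit/', '12?option=test').to_s

puts a # => http://example.com/12?option=test
puts b # => http://example.com/edit/12?option=test
puts c # => http://example.com/resource/edit/12?option=test
```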
This is my simple take on this problem: just split up all the path segments and join them together again. This only works if you're working solely with relative path segments, but if that's all you want to do, it's handy.
def join_paths *paths
paths.map{|p| p.split('/')}
.flatten
.reject(&:empty?)
.compact
.join('/')
end
Then you can use it like so:
join_paths 'foo/', '/bar', 'a/b/c', 'd' #=> "foo/bar/a/b/c/d"

In Ruby/Rails, how can I encode/escape special characters in URLs?

How do I encode or 'escape' the URL before I use OpenURI to open(url)?
We're using OpenURI to open a remote url and return the xml:
getresult = open(url).read
The problem is the URL contains some user-input text that contains spaces and other characters, including "+", "&", "?", etc. potentially, so we need to safely escape the URL. I saw lots of examples when using Net::HTTP, but have not found any for OpenURI.
We also need to be able to un-escape a similar string we receive in a session variable, so we need the reciprocal function.
Don't use URI.escape: it has been deprecated since Ruby 1.9 and was removed in Ruby 3.0.
Rails' Active Support adds Hash#to_query:
{foo: 'asd asdf', bar: '"<#$dfs'}.to_query
# => "bar=%22%3C%23%24dfs&foo=asd+asdf"
Also, as you can see, it always orders query parameters the same way, which is good for HTTP caching.
Ruby Standard Library to the rescue:
require 'uri'
user_text = URI.escape(user_text)
url = "http://example.com/#{user_text}"
result = open(url).read
See more at the docs for the URI::Escape module. It also has a method to do the inverse (unescape)
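On Ruby 3.0 and later URI.escape is no longer available. A rough sketch of an escape/unescape pair that still ships with the standard library is shown below. It assumes the user text goes into a query-string value rather than a path segment, because this form encoding writes spaces as +. The sample text and the /search?q= URL are made up for illustration.

```ruby
require 'uri'

user_text = 'rock & roll + jazz?' # illustrative user input

# Escape a single query value
escaped = URI.encode_www_form_component(user_text)
url = "http://example.com/search?q=#{escaped}"
puts url
# => http://example.com/search?q=rock+%26+roll+%2B+jazz%3F

# The reciprocal function restores the original text
restored = URI.decode_www_form_component(escaped)
puts restored == user_text
# => true
```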
The main thing you have to consider is that you have to escape the keys and values separately before you compose the full URL.
All the methods which get the full URL and try to escape it afterwards are broken, because they cannot tell whether any & or = character was supposed to be a separator, or maybe a part of the value (or part of the key).
The CGI library seems to do a good job, except for the space character, which was traditionally encoded as +, and nowadays should be encoded as %20. But this is an easy fix.
Please, consider the following:
require 'cgi'
def encode_component(s)
# The space-encoding is a problem:
CGI.escape(s).gsub('+','%20')
end
def url_with_params(path, args = {})
return path if args.empty?
path + "?" + args.map do |k,v|
"#{encode_component(k.to_s)}=#{encode_component(v.to_s)}"
end.join("&")
end
def params_from_url(url)
path,query = url.split('?',2)
return [path,{}] unless query
q = query.split('&').inject({}) do |memo,p|
k,v = p.split('=',2)
memo[CGI.unescape(k)] = CGI.unescape(v)
memo
end
return [path, q]
end
u = url_with_params( "http://example.com",
"x[1]" => "& ?=/",
"2+2=4" => "true" )
# "http://example.com?x%5B1%5D=%26%20%3F%3D%2F&2%2B2%3D4=true"
params_from_url(u)
# ["http://example.com", {"x[1]"=>"& ?=/", "2+2=4"=>"true"}]
Ruby has the built-in URI library, and the Addressable gem, in particular Addressable::URI
I prefer Addressable::URI. It's very full featured and handles the encoding for you when you use the query_values= method.
I've seen some discussions about URI going through some growing pains so I tend to leave it alone for handling encoding/escaping until these things get sorted out:
http://osdir.com/ml/ruby-core/2010-06/msg00324.html
http://osdir.com/ml/lang-ruby-core/2009-06/msg00350.html
http://osdir.com/ml/ruby-core/2011-06/msg00748.html

Resources