Ruby CGI::Cookie raw cookie parsing

I want to be able to parse raw cookie strings in ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin10.8.0].
The CGI::Cookie library looked promising; however, it does not work the way I expected.
For example,
CGI::Cookie::parse("ASPSESSIONIDSCDRSRTS=HHALOHOBJGJMLPIANNLDOMCJ; path=/").each_key {|name| p 'Cookie name: ' + name}
Will return:
"Cookie name: ASPSESSIONIDSCDRSRTS"
"Cookie name: path"
What I would like is something that works like a CGI::Cookie::new instance, except that I pass it a raw cookie string:
cookie1 = CGI::Cookie::new('name' => 'name',
'value' => ['value1', 'value2', ...],
'path' => 'path', # optional
'domain' => 'domain', # optional
'expires' => Time.now, # optional
'secure' => true # optional
)
name = cookie1.name
values = cookie1.value
path = cookie1.path
domain = cookie1.domain
expires = cookie1.expires
secure = cookie1.secure
What I can't figure out is how to do this elegantly from a raw cookie string.
EDIT ---
The following code is in the ~/.rvm/src/ruby-1.9.3-p194/lib/cgi/cookie.rb file. As the comments below show, each entry should be built with cookies[name] = Cookie::new(name, *values), which is not what I seem to be getting.
# Parse a raw cookie string into a hash of cookie-name=>Cookie
# pairs.
#
# cookies = CGI::Cookie::parse("raw_cookie_string")
# # { "name1" => cookie1, "name2" => cookie2, ... }
#
def Cookie::parse(raw_cookie)
  cookies = Hash.new([])
  return cookies unless raw_cookie
  raw_cookie.split(/[;,]\s?/).each do |pairs|
    name, values = pairs.split('=',2)
    next unless name and values
    name = CGI::unescape(name)
    values ||= ""
    values = values.split('&').collect{|v| CGI::unescape(v,@@accept_charset) }
    if cookies.has_key?(name)
      values = cookies[name].value + values
    end
    cookies[name] = Cookie::new(name, *values)
  end
  cookies
end
EDIT ---
This seems to be a bug in Ruby's CGI::Cookie.parse method. I've opened a bug report on the Ruby bug tracker - https://bugs.ruby-lang.org/issues/7364

I was just having the same problem. CGI::Cookie.new is simply not able to consume what CGI::Cookie::parse produces; the two don't complement each other the way we expected.
Searching through the gems quickly, I found https://github.com/dwaite/cookiejar . This gem tries to do a lot, but at least it has code to parse raw cookies the way we expect. I didn't try to understand the whole library because it does so much, and I'm only concerned with "consume a cookie -> modify it -> produce a cookie".
I came up with this quick hack:
require 'cgi'
require 'cookiejar'
# Some sample
raw_cookie = "somecookie=some%7Cvalue; expires=Wed, 02 Apr 2014 13:36:50 GMT; path=/; domain=.somedomain.com"
parts = CookieJar::CookieValidation.parse_set_cookie raw_cookie
# Needs manual unescape
parts[:value] = CGI::unescape(parts[:value])
# Per spec, this name is different
parts[:expires] = parts[:expires_at]
# Remove old ones
parts.delete :expires_at
# CookieJar adds them always, remove
parts.delete :version
# Convert symbol keys to strings for CGI::Cookie
cookie = parts.inject(Hash.new) do |acc, (k,v)|
  acc[k.to_s] = v
  acc
end
puts CGI::Cookie.new(cookie).to_s
#=> somecookie=some%7Cvalue; domain=.somedomain.com; path=/; expires=Wed, 02 Apr 2014 13:36:50 GMT
I guess this calls for a gem that does only one thing with cookies and does it well: parsing and generating raw cookie strings. But to be honest, the problem domain is probably much more complex than fooling around the way I did.

I haven't used the library, but according to the Ruby docs ( http://www.ruby-doc.org/stdlib-1.9.3/libdoc/cgi/rdoc/CGI/Cookie.html ) what you're calling |name| in your block is actually a Cookie object.
parse(raw_cookie)
Parse a raw cookie string into a hash of cookie-name=>Cookie pairs.
You're probably just seeing the name printed out as that's what an implicit .to_s will give you on a Cookie.
Try printing name.path or name.value (or any of the accessors of a Cookie object) instead.

Related

How to replace string in URL with captured regex pattern

I want to replace 'hoge' with 'foo' using a regex. But the user's value is dynamic, so I can't use str.gsub('hoge', 'foo').
str = '?user=hoge&tab=fuga'
What should I do?
Don't do this with a regular expression.
This is how to manipulate URIs using the existing wheels:
require 'uri'
str = 'http://example.com?user=hoge&tab=fuga'
uri = URI.parse(str)
query = URI.decode_www_form(uri.query).to_h # => {"user"=>"hoge", "tab"=>"fuga"}
query['user'] = 'foo'
uri.query = URI.encode_www_form(query)
uri.to_s # => "http://example.com?user=foo&tab=fuga"
Alternately:
require 'addressable'
uri = Addressable::URI.parse('http://example.com?tab=fuga&user=hoge')
query = uri.query_values # => {"tab"=>"fuga", "user"=>"hoge"}
query['user'] = 'foo'
uri.query_values = query
uri.to_s # => "http://example.com?tab=fuga&user=foo"
Note that in the examples the order of the parameters changed, but the code handled the difference without problems.
The reason you want to use URI or Addressable is because parameters and values have to be correctly encoded when they contain illegal characters. URI and Addressable know the rules and will follow them, whereas naive code assumes it's OK to not bother with encoding, causing broken URIs.
URI is part of the Ruby Standard Library, and Addressable is more full-featured. Take your pick.
You can try the regex below:
([?&]user=)([^&]+)
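A minimal sketch of applying that pattern with String#sub: group 1 keeps the "?user=" or "&user=" prefix through the \1 backreference, and group 2 (whatever the current value is) is replaced. The replacement value 'foo' is assumed to be URL-safe already; otherwise encode it first.

```ruby
str = '?user=hoge&tab=fuga'

# Group 1 captures "?user=" or "&user="; group 2 ([^&]+) matches the
# current value up to the next "&", whatever that value is.
# '\1foo' is single-quoted so sub receives the \1 backreference literally.
result = str.sub(/([?&]user=)([^&]+)/, '\1foo')
result # => "?user=foo&tab=fuga"
```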
You probably want to find out what the user query parameter maps to first, before using .gsub to replace whatever value it has.
First, parse the URL string into a URI object using the URI module. Then use CGI.parse from the CGI module to get the key/value pairs of the query parameters from the URI object's query string. Finally, you can .gsub the old value found in that hash.

How to insert variables in a URL?

I have this code to send a request to a URL, and I want to put two variables in the URL:
talksList = open('http://yolo.com/?action=cp_list&id=#{variable1}&key=#{variable2}')
But when I insert my variables like this, it doesn't work. Can you help me?
Thanks in advance.
As @YuHao said, you're trying to interpolate variables into a single-quoted string, which Ruby does not interpolate. But you have a bigger long-term problem.
Don't inject unencoded variables into a URL. It may seem to work, but you run the risk of generating malformed URLs that a browser would accept but code won't. Instead, use the appropriate tools to modify the URL; they will maintain the correct encoding for you.
Here's an example using URI:
require 'uri'
variable1 = 'foo'
variable2 = 'bar'
uri = URI.parse('http://yolo.com/?action=cp_list')
params = URI.decode_www_form(uri.query)
params << ['id', variable1]
params << ['key', variable2]
uri.query = URI.encode_www_form(params)
uri.to_s # => "http://yolo.com/?action=cp_list&id=foo&key=bar"
You can do the same thing using the Addressable gem, which is more full-featured:
require 'addressable/uri'
variable1 = 'foo'
variable2 = 'bar'
uri = Addressable::URI.parse('http://yolo.com/?action=cp_list')
params = uri.query_values
uri.query_values = params.merge('id' => variable1, 'key' => variable2)
uri.to_s # => "http://yolo.com/?action=cp_list&id=foo&key=bar"
That's because you are using strings with single quotes. In single quoted strings, nothing is replaced except \\ and \'.
Interpolation is only available in double-quoted strings, so try:
talksList = open("http://yolo.com/?action=cp_list&id=#{variable1}&key=#{variable2}")
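A short sketch of the difference, combined with the other answer's point about escaping each value before it goes into the URL. variable2 holds a made-up value with a space to show the escaping; URI.encode_www_form_component encodes a space as +:

```ruby
require 'uri'

variable1 = 'foo'
variable2 = 'bar baz' # hypothetical value containing a space

single = 'id=#{variable1}' # single quotes: the #{...} stays literal
double = "id=#{variable1}" # double quotes: interpolated

single # => "id=\#{variable1}"
double # => "id=foo"

# Escape each value before interpolating it into the URL
url = "http://yolo.com/?action=cp_list" \
      "&id=#{URI.encode_www_form_component(variable1)}" \
      "&key=#{URI.encode_www_form_component(variable2)}"
url # => "http://yolo.com/?action=cp_list&id=foo&key=bar+baz"
```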

In Ruby/Rails, how can I encode/escape special characters in URLs?

How do I encode or 'escape' the URL before I use OpenURI to open(url)?
We're using OpenURI to open a remote url and return the xml:
getresult = open(url).read
The problem is the URL contains some user-input text that contains spaces and other characters, including "+", "&", "?", etc. potentially, so we need to safely escape the URL. I saw lots of examples when using Net::HTTP, but have not found any for OpenURI.
We also need to be able to un-escape a similar string we receive in a session variable, so we need the reciprocal function.
Don't use URI.escape: it has been deprecated since Ruby 1.9 and was removed entirely in Ruby 3.0.
Rails' Active Support adds Hash#to_query:
{foo: 'asd asdf', bar: '"<#$dfs'}.to_query
# => "bar=%22%3C%23%24dfs&foo=asd+asdf"
Also, as you can see, it always orders the query parameters the same way (sorted by key), which is good for HTTP caching.
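Outside Rails, the standard library gets close: URI.encode_www_form builds the query string and, like to_query here, encodes a space as +. The sort_by call is an added assumption to mimic the stable key order, and URI.decode_www_form is the reciprocal function the question asks for:

```ruby
require 'uri'

params = { foo: 'asd asdf', bar: '"<#$dfs' }

# Sort by key name so the same hash always yields the same query string
query = URI.encode_www_form(params.sort_by { |key, _| key.to_s })
query # => "bar=%22%3C%23%24dfs&foo=asd+asdf"

# The reverse direction: an array of [key, value] string pairs
pairs = URI.decode_www_form(query)
```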
Ruby Standard Library to the rescue:
require 'uri'
user_text = URI.escape(user_text)
url = "http://example.com/#{user_text}"
result = open(url).read
See more at the docs for the URI::Escape module. It also has a method to do the inverse (unescape)
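URI.escape was later removed (in Ruby 3.0), so on current Rubies a sketch of the same idea can use ERB::Util.url_encode from the standard library instead. It escapes everything except unreserved characters and turns spaces into %20, which suits a single path segment or value; user_text is made-up sample input:

```ruby
require 'erb'
require 'uri'

user_text = 'rock & roll? 50+'

# url_encode percent-encodes every byte except A-Z a-z 0-9 - _ . ~
encoded = ERB::Util.url_encode(user_text)
encoded # => "rock%20%26%20roll%3F%2050%2B"

url = "http://example.com/#{encoded}"

# Reverse direction. decode_www_form_component would also turn a literal
# "+" into a space, which is harmless here because "+" was sent as %2B.
decoded = URI.decode_www_form_component(encoded)
decoded # => "rock & roll? 50+"
```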
The main thing you have to consider is that you have to escape the keys and values separately before you compose the full URL.
All the methods which get the full URL and try to escape it afterwards are broken, because they cannot tell whether any & or = character was supposed to be a separator, or maybe a part of the value (or part of the key).
The CGI library seems to do a good job, except for the space character, which was traditionally encoded as +, and nowadays should be encoded as %20. But this is an easy fix.
Consider the following:
require 'cgi'
def encode_component(s)
  # CGI.escape turns a space into "+" (and a literal "+" into %2B),
  # so every remaining "+" is a space and can safely become %20
  CGI.escape(s).gsub('+','%20')
end
def url_with_params(path, args = {})
  return path if args.empty?
  path + "?" + args.map { |k,v|
    "#{encode_component(k.to_s)}=#{encode_component(v.to_s)}"
  }.join("&")
end
def params_from_url(url)
  path,query = url.split('?',2)
  return [path,{}] unless query
  q = query.split('&').inject({}) do |memo,p|
    k,v = p.split('=',2)
    # v is nil for a bare key such as "flag"; treat it as an empty value
    memo[CGI.unescape(k)] = CGI.unescape(v.to_s)
    memo
  end
  return [path, q]
end
u = url_with_params( "http://example.com",
                     "x[1]" => "& ?=/",
                     "2+2=4" => "true" )
# "http://example.com?x%5B1%5D=%26%20%3F%3D%2F&2%2B2%3D4=true"
params_from_url(u)
# ["http://example.com", {"x[1]"=>"& ?=/", "2+2=4"=>"true"}]
Ruby has the built-in URI library, and there is also the Addressable gem, in particular Addressable::URI.
I prefer Addressable::URI. It's very full featured and handles the encoding for you when you use the query_values= method.
I've seen some discussions about URI going through some growing pains so I tend to leave it alone for handling encoding/escaping until these things get sorted out:
http://osdir.com/ml/ruby-core/2010-06/msg00324.html
http://osdir.com/ml/lang-ruby-core/2009-06/msg00350.html
http://osdir.com/ml/ruby-core/2011-06/msg00748.html

Using Open-URI to fetch XML and the best practice in case of problems with a remote url not returning/timing out?

Current code works as long as there is no remote error:
def get_name_from_remote_url
  cstr = "http://someurl.com"
  getresult = open(cstr, "User-Agent" => "Ruby-OpenURI").read
  doc = Nokogiri::XML(getresult)
  my_data = doc.xpath("/session/name").text
  # => 'Fred' or 'Sam' etc
  return my_data
end
But what if the remote URL times out or returns nothing? How do I detect that and return nil, for example?
And does Open-URI provide a way to define how long to wait before giving up? This method is called while a user is waiting for a response, so how do we set a maximum timeout before we give up and tell the user "sorry, the remote server we tried to access is not available right now"?
Open-URI is convenient, but that ease of use comes from hiding many of the configuration details that other HTTP clients like Net::HTTP expose.
It depends on what version of Ruby you're using. For 1.8.7 you can use the Timeout module. From the docs:
require 'timeout'
getresult = nil # define it outside the block so it is still visible afterwards
begin
  status = Timeout::timeout(5) {
    getresult = open(cstr, "User-Agent" => "Ruby-OpenURI").read
  }
rescue Timeout::Error => e
  puts e.to_s
end
Then check whether getresult is nil or empty to see if you got any content:
if getresult.nil? || getresult.empty?
  puts "got nothing from url"
end
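The same idea can be wrapped in a small helper that returns nil on timeout, which is what the question asks for. with_timeout is a hypothetical name and the block stands in for the Open-URI call; in real code you would probably also rescue errors such as OpenURI::HTTPError or SocketError:

```ruby
require 'timeout'

# Runs the block but gives up after `seconds`, returning nil instead of
# raising. In the question's code the block would be the open(...).read call.
def with_timeout(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  nil
end

with_timeout(1)   { 'response body' } # => "response body"
with_timeout(0.1) { sleep 2; 'late' } # => nil
```

A nil result then means "timed out", while an empty string still means "the server answered with nothing", so the two cases stay distinguishable.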
If you are using Ruby 1.9.2 you can add a :read_timeout => 10 option to the open() method.
Also, your code could be tightened up and made a bit more flexible. This will let you pass in a URL or default to the currently used URL. Also read Nokogiri's NodeSet docs to understand the difference between xpath, /, css and at, %, at_css, at_xpath:
def get_name_from_remote_url(cstr = 'http://someurl.com')
  doc = Nokogiri::XML(open(cstr, 'User-Agent' => 'Ruby-OpenURI'))
  # xpath returns a NodeSet, which has to be iterated over
  # my_data = doc.xpath('/session/name').text # => 'Fred' or 'Sam' etc
  # at returns a single node, or nil if nothing matches
  node = doc.at('/session/name')
  node && node.text
end

How to check if a URL is valid

How can I check if a string is a valid URL?
For example:
http://hello.it => yes
http:||bra.ziz => no
If it is a valid URL, how can I check whether it points to an image file?
Note:
As pointed out by @CGuess, there is a bug report about this, and it has been documented for over 9 years that validation is not the purpose of this regular expression (see https://bugs.ruby-lang.org/issues/6520).
Use the URI module distributed with Ruby:
require 'uri'
if url =~ URI::regexp
  # Correct URL
end
As Alexander Günther said in the comments, this only checks whether a string contains a URL.
To check if the string is a URL, use:
url =~ /\A#{URI::regexp}\z/
If you only want to check for web URLs (http or https), use this:
url =~ /\A#{URI::regexp(['http', 'https'])}\z/
Similar to the answers above, I find using this regex to be slightly more accurate:
URI::DEFAULT_PARSER.regexp[:ABS_URI]
That will invalidate URLs with spaces, as opposed to URI.regexp which allows spaces for some reason.
I have recently found a shortcut for the different URI regexps: you can access any of URI::DEFAULT_PARSER.regexp.keys directly as URI::#{key}.
For example, the :ABS_URI regexp can be accessed from URI::ABS_URI.
The problem with the current answers is that a URI is not an URL.
A URI can be further classified as a locator, a name, or both. The
term "Uniform Resource Locator" (URL) refers to the subset of URIs
that, in addition to identifying a resource, provide a means of
locating the resource by describing its primary access mechanism
(e.g., its network "location").
Since URLs are a subset of URIs, it is clear that matching specifically for URIs will successfully match undesired values. For example, URNs:
"urn:isbn:0451450523" =~ URI::regexp
=> 0
That being said, as far as I know, Ruby doesn't have a built-in way to parse URLs specifically, so you'll most likely need a gem to do so. If you need to match URLs specifically in HTTP or HTTPS format, you could do something like this:
uri = URI.parse(my_possible_url)
if uri.kind_of?(URI::HTTP) or uri.kind_of?(URI::HTTPS)
  # do your stuff
end
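A slightly more defensive sketch of the same check: URI.parse raises URI::InvalidURIError on input it cannot parse, and URI::HTTPS is a subclass of URI::HTTP, so a single is_a? test covers both schemes. http_url? is a hypothetical helper name, and requiring a non-empty host is an added assumption:

```ruby
require 'uri'

# true only for strings that parse as http:// or https:// URLs with a host
def http_url?(string)
  uri = URI.parse(string)
  # URI::HTTPS inherits from URI::HTTP, so this accepts both schemes
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

http_url?('http://hello.it')        # => true
http_url?('https://hello.it/a.png') # => true
http_url?('urn:isbn:0451450523')    # => false (a URN, not a URL)
http_url?('ftp://hello.it')         # => false
```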
I prefer the Addressable gem. I have found that it handles URLs more intelligently.
require 'addressable/uri'
SCHEMES = %w(http https)
def valid_url?(url)
parsed = Addressable::URI.parse(url) or return false
SCHEMES.include?(parsed.scheme)
rescue Addressable::URI::InvalidURIError
false
end
This is a fairly old entry, but I thought I'd go ahead and contribute:
require 'uri'
String.class_eval do
  def is_valid_url?
    uri = URI.parse self
    uri.kind_of? URI::HTTP
  rescue URI::InvalidURIError
    false
  end
end
Now you can do something like:
if "http://www.omg.wtf".is_valid_url?
  p "huzzah!"
end
For me, I use this regular expression:
/\A(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?\z/ix
Options:
i - case insensitive
x - ignore whitespace in regex
You can define this method to validate URLs:
def valid_url?(url)
  return false if url.include?("<script")
  url_regexp = /\A(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?\z/ix
  url =~ url_regexp ? true : false
end
To use it:
valid_url?("http://stackoverflow.com/questions/1805761/check-if-url-is-valid-ruby")
Testing with wrong URLs:
http://ruby3arabi - result is invalid
http://http://ruby3arabi.com - result is invalid
http:// - result is invalid
http://test.com\n<script src=\"nasty.js\"> - result is invalid (caught by the simple "<script" check)
127.0.0.1 - result is invalid (IP addresses are not supported)
Test with correct URLs:
http://ruby3arabi.com - result is valid
http://www.ruby3arabi.com - result is valid
https://www.ruby3arabi.com - result is valid
https://www.ruby3arabi.com/article/1 - result is valid
https://www.ruby3arabi.com/websites/58e212ff6d275e4bf9000000?locale=en - result is valid
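In general,
/\A#{URI::regexp}\z/
will work well (\A and \z anchor the whole string, whereas ^ and $ in Ruby only anchor a single line, so a string with an embedded newline could still match). If you only want to match http or https, you can pass those in as options to the method:
/\A#{URI::regexp(%w(http https))}\z/
That tends to work a little better if you want to reject protocols like ftp://.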
This is a little bit old, but here is how I do it: use Ruby's URI module to parse the URL. If it can be parsed, then it's a valid URL (which doesn't mean it is reachable).
URI supports many schemes, plus you can add custom schemes yourself (instance_values below comes from Rails' Active Support):
irb> uri = URI.parse "http://hello.it" rescue nil
=> #<URI::HTTP:0x10755c50 URL:http://hello.it>
irb> uri.instance_values
=> {"fragment"=>nil,
"registry"=>nil,
"scheme"=>"http",
"query"=>nil,
"port"=>80,
"path"=>"",
"host"=>"hello.it",
"password"=>nil,
"user"=>nil,
"opaque"=>nil}
irb> uri = URI.parse "http:||bra.ziz" rescue nil
=> nil
irb> uri = URI.parse "ssh://hello.it:5888" rescue nil
=> #<URI::Generic:0x105fe938 URL:ssh://hello.it:5888>
irb> uri.instance_values
=> {"fragment"=>nil,
"registry"=>nil,
"scheme"=>"ssh",
"query"=>nil,
"port"=>5888,
"path"=>"",
"host"=>"hello.it",
"password"=>nil,
"user"=>nil,
"opaque"=>nil}
See the documentation for more information about the URI module.
You could also use a regex, maybe something like the one at http://www.geekzilla.co.uk/View2D3B0109-C1B2-4B4E-BFFD-E8088CBC85FD.htm. Assuming this regex is correct (I haven't fully checked it), the following shows whether each URL is valid:
# A regex literal keeps \w and \d intact; inside a double-quoted string
# passed to Regexp.new they would lose their backslash.
url_regex = %r{((https?|ftp|file):((//)|(\\))+[\w\d:\#@%/;$()~_?\+-=\\.&]*)}
urls = [
  "http://hello.it",
  "http:||bra.ziz"
]
urls.each { |url|
  if url =~ url_regex then
    puts "%s is valid" % url
  else
    puts "%s not valid" % url
  end
}
The above example outputs:
http://hello.it is valid
http:||bra.ziz not valid
