Match regex works for one search, but scan does not - ruby

The following gets me one match:
query = "http://0.0.0.0:9393/review?first_name=aoeu&last_name=rar"
find = /(?<=(\?|\&)).*?(?=(\&|\z))/.match(query)
When I examine 'find' I get:
first_name=aoeu
I want to match everything between a '?' and a '&', so I tried
find = query.scan(/(?<=(\?|\&)).*?(?=(\&|\z))/)
But when I examine 'find' now, I get:
[["?", "&"], ["&", ""]]
What do I need to do to get:
[first_name=aoeu][last_name=rar]
or
["first_name=aoeu","last_name=rar"]
?

Use String#split.
query.split(/[&?]/).drop(1)
or
query[/(?<=\?).*/].split("&")
But if your real purpose is to extract the parameters from a URL, then see the linked question and its answer.
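As for why scan behaves differently from match: when a pattern contains capture groups, String#scan returns the captured groups for each match instead of the whole match. The two `(...)` groups inside the lookarounds therefore produce `[["?", "&"], ["&", ""]]`. Regexp#match returns MatchData, and its `to_s` is the whole match, which is why the single match looked right. Below is a minimal sketch that drops the groups: a character class in the lookbehind and a plain alternation in the lookahead. With no groups, scan returns whole matches.

```ruby
query = "http://0.0.0.0:9393/review?first_name=aoeu&last_name=rar"

# The original pattern: it has capture groups, so scan returns the groups.
with_groups = query.scan(/(?<=(\?|\&)).*?(?=(\&|\z))/)

# The same idea with no capture groups, so scan returns each whole match.
params = query.scan(/(?<=[?&]).*?(?=&|\z)/)

p with_groups # => [["?", "&"], ["&", ""]]
p params      # => ["first_name=aoeu", "last_name=rar"]
```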

Using the modules provided by Ruby or Rails will make your code more maintainable and readable.
require 'uri'
uri = 'http://0.0.0.0:9393/review?first_name=aoeu&last_name=rar'
require 'rack'
require 'rack/utils'
Rack::Utils.parse_query(URI.parse(uri).query)
# => {"first_name"=>"aoeu", "last_name"=>"rar"}
# or CGI
require 'cgi'
CGI::parse(URI.parse(uri).query)
# => {"first_name"=>["aoeu"], "last_name"=>["rar"]}
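If you would rather not depend on Rack, the standard library's URI.decode_www_form parses a query string into key/value pairs. Calling `to_h` on the result gives a hash shaped like the Rack one. This is a minimal sketch using only `uri`:

```ruby
require 'uri'

uri = 'http://0.0.0.0:9393/review?first_name=aoeu&last_name=rar'

# URI#query is the part after "?" (nil if there is none, hence to_s).
query = URI.parse(uri).query.to_s

pairs  = URI.decode_www_form(query) # array of [key, value] pairs, percent-decoded
params = pairs.to_h                 # a repeated key keeps its last value

p pairs                    # => [["first_name", "aoeu"], ["last_name", "rar"]]
puts params["last_name"]   # prints rar
```

Unlike CGI.parse, `to_h` keeps only one value per key. Keep the array of pairs if a parameter can repeat.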

If you need to extract query params from a URI, please check the thread "How to extract URL parameters from a URL with Ruby or Rails?". It contains a lot of solutions that don't use regexps.

Related

Ruby unescape HTML string

Any idea how I can unescape the following string in Ruby?
C:&#92;inetpub&#92;wwwroot&#92;adminWeb
to
C:\inetpub\wwwroot\adminWeb
or to
C%3A%5Cinetpub%5Cwwwroot%5CadminWeb
Tried with URI.decode with no success.
The CGI library is one option:
require 'cgi'
CGI.unescapeHTML('C:&#92;inetpub&#92;wwwroot&#92;adminWeb')
# => "C:\\inetpub\\wwwroot\\adminWeb"
One more variant is the HTMLEntities gem:
require 'htmlentities'
HTMLEntities.new.decode "C:&#92;inetpub&#92;wwwroot&#92;adminWeb"
# => "C:\\inetpub\\wwwroot\\adminWeb"
I prefer it because it handles rarer entities such as &aring; and &mdash;, which CGI.unescapeHTML does not.
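An alternative for the percent-encoded form is the standard library's URI module. URI.unescape was removed in Ruby 3.0, so use URI.decode_www_form_component instead:
require 'uri'
URI.decode_www_form_component("C%3A%5Cinetpub%5Cwwwroot%5CadminWeb") # => "C:\\inetpub\\wwwroot\\adminWeb"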
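This is a self-contained sketch of both decodings using only the standard library. It assumes the HTML-escaped input uses the numeric entity `&#92;` for each backslash. It uses URI.decode_www_form_component, because URI.unescape no longer exists on Ruby 3.0+.

```ruby
require 'cgi'
require 'uri'

html_escaped = 'C:&#92;inetpub&#92;wwwroot&#92;adminWeb'
url_escaped  = 'C%3A%5Cinetpub%5Cwwwroot%5CadminWeb'

# CGI.unescapeHTML decodes numeric entities such as &#92; (a backslash).
from_html = CGI.unescapeHTML(html_escaped)

# decode_www_form_component decodes %XX escapes (and would turn "+" into a space).
from_url = URI.decode_www_form_component(url_escaped)

puts from_html # prints C:\inetpub\wwwroot\adminWeb
puts from_url  # prints C:\inetpub\wwwroot\adminWeb
```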

Is there a way to combine multiple regular expressions for a substring command?

Is there a way to combine these two regular expressions I am using to convert multi-platform file paths to a URL?
@image_file = "#{request.protocol}#{request.host}/#{@image_file.path.sub(/^([a-z]):\//,"")}".sub(/^\//,"")
This handles both my Windows and *IX platforms for file path conversion to a URL. For example, both of the following file path strings are handled properly:
- "c:\users\docs\pictures\image.jpg" goes to "http://localhost/users/docs/pictures/image.jpg"
- "/home/usr_name/pictures/image.jpg" goes to "http://localhost/home/usr_name/pictures/image.jpg"
I would prefer not to have to use two sub calls on a string if there is a way to combine them properly.
Suggestions and feedback from the community welcome!
The regex you are looking for is /^([a-z]:)?\//:
"c:/users/docs/pictures/image.jpg".sub(/^([a-z]:)?\//, '')
=> "users/docs/pictures/image.jpg"
"/home/usr_name/pictures/image.jpg".sub(/^([a-z]:)?\//, '')
=> "home/usr_name/pictures/image.jpg"
As some background on working with filenames and URLs...
First, Ruby doesn't require you to use backslashes in Windows filenames, so if you're generating them, don't bother. Instead, rely on the fact that the IO class knows what OS you're on and will auto-sense the path separator and convert things for you on the fly. This is from the IO documentation:
Ruby will convert pathnames between different operating system conventions if possible. For instance, on a Windows system the filename "/gumby/ruby/test.rb" will be opened as "\gumby\ruby\test.rb". When specifying a Windows-style filename in a Ruby string, remember to escape the backslashes:
"c:\\gumby\\ruby\\test.rb"
Our examples here will use the Unix-style forward slashes; File::ALT_SEPARATOR can be used to get the platform-specific separator character.
If you're receiving the paths from another source, this makes it easy to normalize them into something Ruby likes:
path = "c:\\users\\docs\\pictures\\image.jpg" # => "c:\\users\\docs\\pictures\\image.jpg"
puts path
# >> c:\users\docs\pictures\image.jpg
path.gsub!(/\\/, '/') if path['\\']
path # => "c:/users/docs/pictures/image.jpg"
puts path
# >> c:/users/docs/pictures/image.jpg
For convenience, write a little helper method:
def normalize_path(p)
  p.gsub(/\\/, '/')
end
normalize_path("c:\\users\\docs\\pictures\\image.jpg") # => "c:/users/docs/pictures/image.jpg"
normalize_path("/users/docs/pictures/image.jpg") # => "/users/docs/pictures/image.jpg"
Ruby's File and Pathname classes are very helpful when dealing with paths:
foo = normalize_path(path) # => "c:/users/docs/pictures/image.jpg"
File.dirname(foo) # => "c:/users/docs/pictures"
File.basename(foo) # => "image.jpg"
and:
File.split(foo) # => ["c:/users/docs/pictures", "image.jpg"]
path_to_file, filename = File.split(foo)
path_to_file # => "c:/users/docs/pictures"
filename # => "image.jpg"
Alternatively, there's the Pathname class:
require 'pathname'
bar = Pathname.new(foo)
bar.dirname # => #<Pathname:c:/users/docs/pictures>
bar.basename # => #<Pathname:image.jpg>
Pathname is an experimental class in Ruby's standard library that wraps up the functionality of File and FileTest, plus some of Dir and FileUtils, into one umbrella class. It's worth getting to know:
The goal of this class is to manipulate file path information in a neater way than standard Ruby provides. The examples below demonstrate the difference.
All functionality from File, FileTest, and some from Dir and FileUtils is included, in an unsurprising way. It is essentially a facade for all of these, and more.
Back to your question...
Ruby's standard library also contains the URI module. It's well tested and is a better way to build URLs than simple string concatenation, because of idiosyncrasies that can occur when characters need to be encoded.
require 'uri'
url = URI::HTTP.build({:host => 'www.foo.com', :path => foo[/^(?:[a-z]:)?(.+)/, 1]})
url # => #<URI::HTTP:0x007fe91117a438 URL:http://www.foo.com/users/docs/pictures/image.jpg>
The build method applies syntax rules to make sure the URL is valid.
If you need it, at this point you can tack on to_s to get the stringified version:
url.to_s # => "http://www.foo.com/users/docs/pictures/image.jpg"
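Putting the pieces together, here is a sketch of the conversion the question asks for, using a single regular expression. The method name `path_to_url` and its `host` parameter are hypothetical. It assumes the input is an absolute path, optionally with a drive letter, and that the URL should use plain http.

```ruby
require 'uri'

# Hypothetical helper: turns a Windows or Unix file path into an http URL.
def path_to_url(path, host)
  normalized = path.gsub("\\", "/")                       # backslashes -> forward slashes
  url_path   = normalized[%r{\A(?:[a-z]:)?(/.*)\z}i, 1]   # drop an optional drive letter
  raise ArgumentError, "not an absolute path: #{path}" unless url_path

  # URI::HTTP.build checks each component, e.g. that the path is absolute.
  URI::HTTP.build(host: host, path: url_path).to_s
end

puts path_to_url("c:\\users\\docs\\pictures\\image.jpg", "localhost")
# prints http://localhost/users/docs/pictures/image.jpg
puts path_to_url("/home/usr_name/pictures/image.jpg", "localhost")
# prints http://localhost/home/usr_name/pictures/image.jpg
```

Characters that are not valid in a URL path, such as spaces, make `build` raise rather than encode them. Escape such paths first if your file names can contain them.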

Get filename on server while using Ruby gem Curb

Is there a way to get the filename of the file being downloaded (without having to parse the url provided)? I am hoping to find something like:
c = Curl::Easy.new("http://google.com/robots.txt")
c.perform
File.open( c.file_name, "w") { |file| file.write c.body_str }
Unfortunately, there's nothing in the Curb documentation regarding polling the filename. I don't know whether you have a particular aversion to parsing, but it's a simple process if using the URI module:
require 'uri'
url = 'http://google.com/robots.txt'
uri = URI.parse(url)
puts File.basename(uri.path)
#=> "robots.txt"
UPDATE:
In the comments to this question, the OP suggests using split() to split the URL by slashes (/). While this may work in the majority of situations, it isn't a catch-all solution. For instance, versioned files won't be parsed correctly:
url = 'http://google.com/robots.txt?1234567890'
puts url.split('/').last
#=> "robots.txt?1234567890"
In comparison, using URI.parse() guarantees the filename – and only the filename – is returned:
require 'uri'
url = 'http://google.com/robots.txt?1234567890'
uri = URI.parse(url)
puts File.basename(uri.path)
#=> "robots.txt"
In sum, for optimal coherence and integrity, it's wise to use the URI library to parse URIs; it's what it was created for, after all.
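The URI approach still leaves one case to you: a URL whose path is empty or ends in a slash has no file name, and `File.basename("/")` returns `"/"`. Here is a sketch with a fallback. The method name `filename_from_url` and the default `index.html` are assumptions for illustration.

```ruby
require 'uri'

# Hypothetical helper: last path segment of a URL, ignoring any query string.
def filename_from_url(url, default = 'index.html')
  path = URI.parse(url).path.to_s   # query and fragment are not part of #path
  name = path.end_with?('/') ? '' : File.basename(path)
  name.empty? ? default : name
end

puts filename_from_url('http://google.com/robots.txt?1234567890') # prints robots.txt
puts filename_from_url('http://google.com/')                       # prints index.html
puts filename_from_url('http://google.com')                        # prints index.html
```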

Getting webpage content with Ruby -- I'm having troubles

I want to get the content off this* page. Everything I've looked up suggests parsing CSS elements, but that page has none.
Here's the only code that I found that looked like it should work:
file = File.open('http://hiscore.runescape.com/index_lite.ws?player=zezima', "r")
contents = file.read
puts contents
Error:
tracker.rb:1:in 'initialize': Invalid argument - http://hiscore.runescape.com/index_lite.ws?player=zezima (Errno::EINVAL)
from tracker.rb:1:in 'open'
from tracker.rb:1
*http://hiscore.runescape.com/index_lite.ws?player=zezima
If you try to format this as a link in the post it doesn't recognize the underscore (_) in the URL for some reason.
You want the open method provided by OpenURI, which can read from URIs; you just need to require the library first. On Ruby 3.0 and later, Kernel#open no longer opens URLs, so call URI.open:
require 'open-uri'
file = URI.open('http://hiscore.runescape.com/index_lite.ws?player=zezima')
contents = file.read
puts contents
This related SO thread covers the same question:
Open an IO stream from a local file or url
The appropriate way to fetch the content of a website is through the Net::HTTP class in Ruby:
require 'uri'
require 'net/http'
url = "http://hiscore.runescape.com/index_lite.ws?player=zezima"
# Pass the whole URI; get_response(host, path) with URI#path would drop "?player=zezima"
r = Net::HTTP.get_response(URI.parse(url))
puts r.body
File.open() does not support URIs.
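Whether you pass only the path or the whole URI matters here, because the player name lives in the query string. The sketch below shows which part of the URL each URI method returns. It also defines a small fetch helper (the name `fetch_body` is hypothetical) that raises on non-2xx responses. The helper is only defined, not called, so the snippet runs without network access.

```ruby
require 'uri'
require 'net/http'

url = "http://hiscore.runescape.com/index_lite.ws?player=zezima"
uri = URI.parse(url)

puts uri.path        # prints /index_lite.ws (the query string is not included)
puts uri.request_uri # prints /index_lite.ws?player=zezima

# Hypothetical helper: GET a URL and return the body, raising unless the status is 2xx.
def fetch_body(url)
  response = Net::HTTP.get_response(URI.parse(url)) # sends path and query
  raise "HTTP #{response.code} for #{url}" unless response.is_a?(Net::HTTPSuccess)
  response.body
end
```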
Please use open-uri; it supports both URIs and local files:
require 'open-uri'
contents = URI.open('http://www.google.com') { |f| f.read }
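On current Ruby the call is URI.open. For strings that don't look like a URL it falls back to Kernel#open, so the same call can read a local file. The sketch below exercises the local-file path with a temporary file so it runs offline. The helper name `read_source` is hypothetical.

```ruby
require 'open-uri'
require 'tempfile'

# Hypothetical helper: read either a URL ("http://...") or a local file path.
def read_source(source)
  URI.open(source) { |io| io.read }
end

content = nil
Tempfile.create('sample') do |f|
  f.write("hello from disk")
  f.flush                       # make sure the data is on disk before reopening
  content = read_source(f.path)
end
puts content # prints hello from disk
```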

Equivalent of cURL for Ruby?

Is there a cURL library for Ruby?
Curb and Curl::Multi provide cURL bindings for Ruby.
If you prefer something less low-level, there is also Typhoeus, which likewise wraps libcurl.
Use OpenURI and
URI.open("http://...", :http_basic_authentication=>[user, password])
to access sites/pages/resources that require HTTP authentication.
Curb-fu is a wrapper around Curb, which in turn uses libcurl. What does Curb-fu offer over Curb? Just a lot of syntactic sugar, but that is often what you need.
HTTP clients is a good page to help you make decisions about the various clients.
You might also have a look at Rest-Client
If you know how to write your request as a curl command, there is an online tool that can turn it into ruby (2.0+) code: curl-to-ruby
Currently, it knows the following options: -d/--data, -H/--header, -I/--head, -u/--user, --url, and -X/--request. It is open to contributions.
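Here is a hand-written sketch (not the tool's actual output) of the standard-library Net::HTTP equivalent of `curl -X POST https://www.myurl.com/users -d "name=pat" -d "age=21"`, to show what such a translation looks like. It only builds and inspects the request; the commented-out last line would send it.

```ruby
require 'net/http'
require 'uri'

uri = URI.parse("https://www.myurl.com/users")

# -X POST with -d fields becomes a POST request with a form-encoded body.
request = Net::HTTP::Post.new(uri)
request.set_form_data("name" => "pat", "age" => "21")

puts request.method          # prints POST
puts request.path            # prints /users
puts request["Content-Type"] # prints application/x-www-form-urlencoded
puts request.body            # prints name=pat&age=21

# Sending it (needs network access):
# response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
```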
The eat gem is a "replacement" for OpenURI, so you need to install the eat gem first:
$ gem install eat
Now you can use it
require 'eat'
eat('http://yahoo.com') #=> String
eat('/home/seamus/foo.txt') #=> String
eat('file:///home/seamus/foo.txt') #=> String
It uses HTTPClient under the hood. It also has some options:
eat('http://yahoo.com', :timeout => 10) # timeout after 10 seconds
eat('http://yahoo.com', :limit => 1024) # only read the first 1024 chars
eat('https://yahoo.com', :openssl_verify_mode => 'none') # don't bother verifying SSL certificate
Here's a little program I wrote to get some files with.
base = "http://media.pragprog.com/titles/ruby3/code/samples/tutthreads_"
(1..50).each do |i|
  url  = "#{base}#{i}.rb"
  file = "tutthreads_#{i}.rb"
  # curl writes the file itself via -o; separate arguments avoid going through a shell
  system("curl", "-o", file, url)
end
I know it could be a little more elegant, but it serves its purpose. Check it out. I just cobbled it together today because I got tired of going to each URL to get the code for the book that was not included in the source download.
There's also Mechanize, which is a very high-level web scraping client that uses Nokogiri for HTML parsing.
Adding a more recent answer: HTTPClient is another Ruby HTTP library; it supports parallel threads and lots of the curl goodies. I use HTTPClient and Typhoeus for any non-trivial apps.
To state the maybe-too-obvious, backticks execute shell commands in Ruby as well. Provided curl is installed where your Ruby code runs:
puts `curl http://www.google.com?q=hello`
or
result = `
curl -X POST https://www.myurl.com/users \
-d "name=pat" \
-d "age=21"
`
puts result
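Backticks run the command through the shell, and you have to check `$?` yourself to notice a failure. The sketch below uses the standard library's Open3.capture3 instead. It takes the command as separate arguments, so no shell quoting is involved, and returns stdout, stderr and the exit status. The helper name `run!` is hypothetical. The demo runs the current Ruby interpreter as a stand-in for curl so that it works offline.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper: run a command without a shell and return its stdout.
def run!(*argv)
  stdout, stderr, status = Open3.capture3(*argv)
  raise "#{argv.first} failed (#{status.exitstatus}): #{stderr}" unless status.success?
  stdout
end

# Stand-in for curl so the example needs no network: RbConfig.ruby is the
# path of the Ruby interpreter running this script.
output = run!(RbConfig.ruby, "-e", "print 'hello'")
puts output # prints hello

# With curl installed, the POST above would be:
# run!("curl", "-s", "-X", "POST", "https://www.myurl.com/users", "-d", "name=pat", "-d", "age=21")
```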
A nice minimal reproducible example to copy/paste into your rails console:
require 'open-uri'
require 'nokogiri'
url = "https://www.example.com"
html_file = URI.open(url)
doc = Nokogiri::HTML(html_file)
doc.css("h1").text
# => "Example Domain"
