I am using the -O (--output-document) option of wget to fetch the HTML from a website. However, -O requires a file to store the output in, and I would like to store it in a variable in my program so that I can manipulate it more easily. Is there any way to do this without reading it back in from the file? In essence, I am manually creating a crude cache.
Sample code
#!/usr/bin/ruby
url= "http://www.google.com"
whereIWantItStored = `wget #{url} --output-document=outsideFile`
Reference:
I found this post helpful in using wget within my program: Using wget via Ruby on Rails
#!/usr/bin/ruby
url= "http://www.google.com"
whereIWantItStored = `wget #{url} -O -`
Be sure to sanitize your url to avoid shell injection. The - after -O means standard output, which gets captured by the ruby backticks.
https://www.owasp.org/index.php/Command_Injection explains shell injection.
http://apidock.com/ruby/Shellwords/shellescape For Ruby >=1.9 or the Escape Gem for ruby 1.8.x
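As a rough sketch of the escaping advice above, using Shellwords from the standard library. The helper names wget_command and fetch_with_wget are made up for this example, and -q is only there to silence wget's progress output.

```ruby
require 'shellwords'

# Hypothetical helper: build the wget command line with the URL escaped,
# so shell metacharacters in it are treated as literal text.
def wget_command(url)
  "wget -q -O - #{Shellwords.escape(url)}"
end

# Hypothetical helper: run wget and return the page body as a String.
def fetch_with_wget(url)
  `#{wget_command(url)}`
end

# The hostile part stays inside a single argument instead of running as a command.
puts wget_command('http://www.google.com/?q=a b; rm -rf /')
```

Calling fetch_with_wget(url) then gives the same result as the backtick line above, with the URL passed through as one shell word.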
I wouldn't use wget. I'd use something like HTTParty.
Then you could do:
require 'httparty'
url = 'http://www.google.com'
response = HTTParty.get(url)
whereIWantItStored = response.code == 200 ? response.body : nil
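Since the goal in the question is a crude cache, here is one possible sketch that keeps fetched pages in a Hash, using only the standard library. PageCache and its fetch method are hypothetical names, and the optional block exists only so the fetching step can be swapped out (for HTTParty, wget, or a stub).

```ruby
require 'net/http'
require 'uri'

# Hypothetical in-memory cache: each URL is fetched at most once per process.
class PageCache
  # An optional block replaces the default Net::HTTP fetcher.
  def initialize(&fetcher)
    @store = {}
    @fetcher = fetcher || method(:http_get)
  end

  # Return the cached body, fetching it on first use.
  # A nil result (failed request) is not cached, so it is retried next time.
  def fetch(url)
    @store[url] ||= @fetcher.call(url)
  end

  private

  # Default fetcher: the body on a 2xx response, nil otherwise.
  def http_get(url)
    response = Net::HTTP.get_response(URI(url))
    response.is_a?(Net::HTTPSuccess) ? response.body : nil
  end
end
```

With this, PageCache.new.fetch('http://www.google.com') would hit the network on the first call and return the stored string afterwards.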
Related
I'm trying to download some files from a URL. I wrote a tiny Ruby script to fetch them. Here is the script:
require 'nokogiri'
require 'open-uri'
(1..2).each do |season|
(1..3).each do |ep|
season = season.to_s.rjust(2, '0')
ep = ep.to_s.rjust(2, '0')
page = Nokogiri::HTML(open("https://some-url/s#{season}e#{ep}/releases"))
page.css('table.table tbody tr td a').each do |el|
link = el['href']
`curl "https://some-url#{link}"` if link.match('sujaidr.srt$')
end
end
end
puts "done"
But the response from curl is:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>Redirecting...</title>
<h1>Redirecting...</h1>
<p>You should be redirected automatically to target URL:
/some-url/s0Xe0Y/releases. If not click the link.
When I use wget, the redirect page is what gets downloaded. I tried setting the user agent, but that doesn't work. The server always redirects the link whenever I try to download the files through curl or other CLI tools such as wget, aria2c, httpie, etc. I can't find a solution so far.
How can I do this?
Solved
I decided to use Watir WebDriver for this. It works great so far.
If you want to download the file rather than the page doing the redirection, try using the -L option in your code, for example:
curl -L "https://some-url#{link}"
From the curl man:
-L, --location
(HTTP) If the server reports that the requested page has moved to a different
location (indicated with a Location: header and a 3XX
response code), this option will make curl redo the request on
the new place.
If you are using Ruby, instead of calling curl or other third-party tools, you may want to use something like this:
require 'net/http'
# Must be somedomain.net instead of somedomain.net/, otherwise, it will throw exception.
Net::HTTP.start("somedomain.net") do |http|
resp = http.get("/flv/sample/sample.flv")
open("sample.flv", "wb") do |file|
file.write(resp.body)
end
end
puts "Done."
See the answer this example came from: https://stackoverflow.com/a/2263547/1135424
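Net::HTTP does not follow redirects on its own, so the effect of curl -L has to be written by hand. A minimal sketch, assuming a small redirect limit is acceptable; resolve_location and fetch_following_redirects are hypothetical names, not part of Net::HTTP.

```ruby
require 'net/http'
require 'uri'

# Resolve a Location header (absolute or relative) against the URL that produced it.
def resolve_location(base_url, location)
  URI.join(base_url, location).to_s
end

# Hypothetical helper: GET a URL, following at most `limit` redirects.
# Returns the final Net::HTTPResponse.
def fetch_following_redirects(url, limit = 5)
  raise ArgumentError, 'too many redirects' if limit.zero?

  response = Net::HTTP.get_response(URI(url))
  if response.is_a?(Net::HTTPRedirection)
    # 3xx: repeat the request at the new location.
    fetch_following_redirects(resolve_location(url, response['location']), limit - 1)
  else
    response
  end
end
```

The body could then be written to disk the same way as in the example above, e.g. File.binwrite('sample.flv', fetch_following_redirects(url).body).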
I want to write a Ruby method that does two things:
Determine what the current stable version of Ruby is. My first thought is to get the response from https://www.ruby-lang.org/en/downloads/ and use a regex to isolate the phrase "The current stable version is [x]". Is there an API I'm not aware of?
Get the URL to download the .tar.gz of that release. For this I was thinking the same thing, get it from the output of the site URL.
I'm looking for advice about the best way to go about it, or direction if there's something in place I might use to determine my desired results.
Ruby code to fetch the download page, then parse the current version and the link URL:
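require 'net/http'
html = Net::HTTP.get(URI("https://www.ruby-lang.org/en/downloads/"))
# Dots are escaped so that ".tar.gz" only matches the literal file extension.
vers = html[/http.*ruby-(.*)\.tar\.gz/,1]
link = html[/http.*ruby-.*\.tar\.gz/]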
GitHub code: ruby-stable-version.rb
Shell code:
ruby-stable-version
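A slightly stricter variant of the same idea, shown against a made-up snippet of HTML so it runs without network access. The sample markup and the parse_ruby_release name are assumptions for illustration, and the sketch simply takes the first tarball link it finds, which may or may not be the stable release on the real page.

```ruby
# Matches a URL ending in ruby-X.Y.Z.tar.gz and captures X.Y.Z.
RELEASE_LINK = %r{https?://[^"'\s<>]+/ruby-(\d+\.\d+\.\d+)\.tar\.gz}

# Hypothetical helper: return the first matching version and link, or nil.
def parse_ruby_release(html)
  match = html.match(RELEASE_LINK)
  match && { version: match[1], link: match[0] }
end

# Made-up sample markup; the real download page may be laid out differently.
sample = '<a href="https://cache.ruby-lang.org/pub/ruby/3.3/ruby-3.3.0.tar.gz">Ruby 3.3.0</a>'
p parse_ruby_release(sample)
```

Restricting the match to non-quote, non-space characters keeps the regex from running across several links on one line, which the greedy .* in the shorter version can do.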
If you are using rbenv you can use ruby-build to get a list of ruby versions and then grep against that.
ruby-build --definitions | tail -r | grep -x -G -m 1 '[0-9]\.[0-9].[0-9]\-*[p0-9*]*'
You can then use that within your code like so:
version = `ruby-build --definitions | tail -r | grep -x -G -m 1 '[0-9]\.[0-9].[0-9]\-*[p0-9*]*'`.strip
You can then use this value to get the download URL.
url = "http://cache.ruby-lang.org/pub/ruby/#{version[0..2]}/ruby-#{version}.tar.gz"
And then download the file:
require 'open-uri'
open("ruby-#{version}.tar.gz", 'wb') do |file|
file << open(url).read
end
Learn more about rbenv here and ruby-build here.
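One caveat with version[0..2] is that it assumes a single-digit major and minor number. A small sketch that derives the directory name from the first two components instead; ruby_tarball_url is a hypothetical helper, and the URL layout is simply the one used above.

```ruby
# Hypothetical helper: build the tarball URL for a version string such as "2.1.2".
def ruby_tarball_url(version)
  # "2.1.2" -> "2.1"; also copes with a multi-digit minor such as "2.10.0".
  series = version.split('.').first(2).join('.')
  "http://cache.ruby-lang.org/pub/ruby/#{series}/ruby-#{version}.tar.gz"
end

puts ruby_tarball_url('2.1.2')
```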
Another possibility would be to use the Ruby source repository. Check version.h in every branch, filter by RUBY_PATCHLEVEL > -1 (-1 is used for -dev versions), sort by RUBY_VERSION and take the latest one.
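The filter-and-sort step could look roughly like this once the RUBY_VERSION and RUBY_PATCHLEVEL values have been extracted from each branch's version.h. The branch data below is invented for the example, and latest_stable is a hypothetical name; fetching and parsing version.h is left out.

```ruby
require 'rubygems' # for Gem::Version, which compares version strings numerically

# Hypothetical helper: pick the highest version whose patchlevel is not -1 (-dev).
def latest_stable(branches)
  stable = branches.select { |branch| branch[:patchlevel] > -1 }
  newest = stable.max_by { |branch| Gem::Version.new(branch[:version]) }
  newest && newest[:version]
end

# Invented sample data standing in for values read from version.h.
branches = [
  { version: '2.2.0', patchlevel: -1 },
  { version: '2.1.2', patchlevel: 95 },
  { version: '2.0.0', patchlevel: 481 },
  { version: '1.9.3', patchlevel: 547 }
]
puts latest_stable(branches)
```

Gem::Version is used rather than plain string comparison so that a version like 2.10.0 would sort after 2.9.0.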
You can use:
Ruby's built-in OpenURI, and Nokogiri, to read a page, parse it, search for certain tags, extract a parameter such as a "src" or "href".
OpenURI to read the URL, or curl or wget at the command-line to retrieve the file.
Nokogiri's tutorials include an example showing how to use OpenURI to retrieve the page and hand it off to Nokogiri.
OpenURI's docs show how to "open" URLs and retrieve their content using read. Once you've done that, the data will be easy to save to disk using something like this for text files:
File.write('some_file', open('http://www.example.com/').read)
or for binary:
File.open('some_file', 'wb') { |fo| fo.write(open('http://www.example.com/').read) }
There are examples of using both Nokogiri and OpenURI for this all over Stack Overflow.
If I run a simple script using OpenURI, I can access a web page. The results get written to the terminal.
Normally I would use bash redirection to write the results to a file.
How do I use ruby to write the results of an OpenURI call to a file?
require 'open-uri'
open("file_to_write.html", "wb") do |file|
URI.open("http://www.example.com/") do |uri|
file.write(uri.read)
end
end
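Note: URI.open was added in Ruby 2.5; in Ruby < 2.5 you must use open(url) instead of URI.open(url). From Ruby 3.0 on, open(url) no longer opens URLs, so URI.open is required. See https://bugs.ruby-lang.org/issues/15893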
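For larger downloads, one option is to copy the stream to disk in chunks instead of building the whole body as a String first. A minimal sketch assuming Ruby 2.5 or later for URI.open; save_stream and download are hypothetical helper names.

```ruby
require 'open-uri'

# Hypothetical helper: copy any readable IO-like object to a file in binary mode.
# Returns the number of bytes written.
def save_stream(io, path)
  File.open(path, 'wb') { |file| IO.copy_stream(io, file) }
end

# Hypothetical helper: stream a URL straight to disk.
def download(url, path)
  URI.open(url) { |remote| save_stream(remote, path) }
end
```

download("http://www.example.com/", "file_to_write.html") would then produce the same file as the block above.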
The pickaxe to the rescue. (this used to be a good page, but is no longer working)
Try this instead: Open an IO stream from a local file or url
To refresh Redmine, I need SVN to ping our Redmine installation from our post-commit hook. Our post-commit hook is a Ruby script that generates an email. I'd like to insert a call to do this:
curl --insecure https://redmineserver+webappkey
This call works from the command line but when I try to do this:
#!/usr/bin/ruby -w
REFRESH_DRADIS_URL = "https://redmineserver+webappkey"
system("/usr/bin/curl", "--insecure", "#{REFRESH_DRADIS_URL}")
It doesn't work. How do I do this in ruby? I googled 'ruby system curl' but I just got a bunch of links to integrate curl into ruby (which is NOT what I'm interested in).
There are many ways
REFRESH_DRADIS_URL = "https://redmineserver+webappkey"
result = `/usr/bin/curl --insecure #{REFRESH_DRADIS_URL}`
but I don't think you have to use curl at all. Try this
require 'open-uri'
open(REFRESH_DRADIS_URL)
If the certificate isn't valid then it gets a little more complicated
require 'net/https'
http = Net::HTTP.new("amazon.com", 443)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
resp = http.get("/")  # on Ruby 1.9+, get returns only the response object
data = resp.body
system ("curl --insecure #{url}")
For such a simple problem, I wouldn't bother with shelling out to curl, I'd simply do
require 'net/https'
http = Net::HTTP.new('redmineserver+webappkey', 443)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
http.get('/')
And for more complex problems, I'd still not shell out to curl, but rather use one of the many Ruby libcurl bindings.
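For a hook script it can help to know whether the ping actually succeeded. A sketch along the same lines that parses the full URL (so any path and query string are kept) and reports success; insecure_client and ping are hypothetical names, and skipping certificate verification is only reasonable for a server you control.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Hypothetical helper: a client that skips certificate verification,
# mirroring curl --insecure.
def insecure_client(uri)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.verify_mode = OpenSSL::SSL::VERIFY_NONE
  http
end

# Hypothetical helper: GET the URL and return true on a 2xx response.
def ping(url)
  uri = URI(url)
  response = insecure_client(uri).get(uri.request_uri)
  response.is_a?(Net::HTTPSuccess)
end
```

In the hook this might be used as warn "Redmine refresh failed" unless ping(REFRESH_DRADIS_URL).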
I had to do this recently and tried the following and it worked:
test.rb
class CurlTest
def initialize()
end
def dumpCurl
test = `curl -v https://google.com 2>&1`
puts test
end
end
curlTest = CurlTest.new()
curlTest.dumpCurl
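An alternative to 2>&1 is Open3 from the standard library, which returns stdout, stderr and the exit status separately and, when given an argument list, does not go through a shell. A minimal sketch; run_command is a hypothetical helper, and the demo runs the current Ruby interpreter rather than curl so that it works offline.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper: run a command given as separate arguments (no shell involved)
# and return its output streams plus whether it exited successfully.
def run_command(*argv)
  stdout, stderr, status = Open3.capture3(*argv)
  { stdout: stdout, stderr: stderr, success: status.success? }
end

# Offline demo; for the case above this would be
# run_command('curl', '-v', 'https://google.com')
result = run_command(RbConfig.ruby, '-e', '$stdout.print "ok"; $stderr.print "diagnostic"')
puts result[:stdout]
puts result[:stderr]
```

Checking result[:success] is a quick way to see why a call "doesn't work" when the command itself fails silently.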
Is there a cURL library for Ruby?
Curb and Curl::Multi provide cURL bindings for Ruby.
If you like it less low-level, there is also Typhoeus, which is built on top of Curl::Multi.
Use OpenURI, passing the credentials as an option to open:
open("http://...", :http_basic_authentication=>[user, password])
This lets you access sites, pages, or resources that require HTTP authentication.
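The same thing can be sketched with Net::HTTP, whose request objects have a basic_auth method. The helper names basic_auth_request and fetch_with_basic_auth are made up for this example.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a GET request carrying an Authorization: Basic header.
def basic_auth_request(url, user, password)
  request = Net::HTTP::Get.new(URI(url))
  request.basic_auth(user, password)
  request
end

# Hypothetical helper: send the request and return the response object.
def fetch_with_basic_auth(url, user, password)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(basic_auth_request(url, user, password))
  end
end
```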
Curb-fu is a wrapper around Curb, which in turn uses libcurl. What does Curb-fu offer over Curb? Just a lot of syntactic sugar, but that is often what you need.
HTTP clients is a good page to help you make decisions about the various clients.
You might also have a look at Rest-Client
If you know how to write your request as a curl command, there is an online tool that can turn it into ruby (2.0+) code: curl-to-ruby
Currently, it knows the following options: -d/--data, -H/--header, -I/--head, -u/--user, --url, and -X/--request. It is open to contributions.
The eat gem is a "replacement" for OpenURI, so you need to install it first:
$ gem install eat
Now you can use it
require 'eat'
eat('http://yahoo.com') #=> String
eat('/home/seamus/foo.txt') #=> String
eat('file:///home/seamus/foo.txt') #=> String
It uses HTTPClient under the hood. It also has some options:
eat('http://yahoo.com', :timeout => 10) # timeout after 10 seconds
eat('http://yahoo.com', :limit => 1024) # only read the first 1024 chars
eat('https://yahoo.com', :openssl_verify_mode => 'none') # don't bother verifying SSL certificate
Here's a little program I wrote to get some files with.
base = "http://media.pragprog.com/titles/ruby3/code/samples/tutthreads_"
for i in 1..50
url = "#{ base }#{ i }.rb"
file = "tutthreads_#{i}.rb"
File.open(file, 'w') do |f|
system "curl -o #{f.path} #{url}"
end
end
I know it could be a little more elegant, but it serves its purpose. Check it out. I just cobbled it together today because I got tired of going to each URL to get the code for the book that was not included in the source download.
There's also Mechanize, which is a very high-level web scraping client that uses Nokogiri for HTML parsing.
Adding a more recent answer: HTTPClient is another Ruby HTTP library (a pure-Ruby one rather than a libcurl binding) that supports asynchronous requests and many of the curl goodies. I use HTTPClient and Typhoeus for any non-trivial apps.
To state the maybe-too-obvious, backticks execute shell commands from Ruby as well. Provided your Ruby code is running in an environment that has curl:
puts `curl http://www.google.com?q=hello`
or
result = `
curl -X POST https://www.myurl.com/users \
-d "name=pat" \
-d "age=21"
`
puts result
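For comparison, the POST above could be sketched without a shell using Net::HTTP. The URL and field values are the ones from the curl example; form_post_request and post_form are hypothetical helper names.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a POST request with a URL-encoded form body,
# the equivalent of curl's -d "name=pat" -d "age=21".
def form_post_request(url, fields)
  request = Net::HTTP::Post.new(URI(url))
  request.set_form_data(fields)
  request
end

# Hypothetical helper: send the form and return the response object.
def post_form(url, fields)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(form_post_request(url, fields))
  end
end
```

A call such as post_form('https://www.myurl.com/users', 'name' => 'pat', 'age' => '21').body would then correspond to the result variable above.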
A nice minimal reproducible example to copy/paste into your rails console:
require 'open-uri'
require 'nokogiri'
url = "https://www.example.com"
html_file = URI.open(url)
doc = Nokogiri::HTML(html_file)
doc.css("h1").text
# => "Example Domain"