I want to write a Ruby method that does two things:
Determine what the current stable version of Ruby is. My first thought is to get the response from https://www.ruby-lang.org/en/downloads/ and use a regex to isolate the phrase "The current stable version is [x]". Is there an API I'm not aware of?
Get the URL to download the .tar.gz of that release. For this I was thinking the same thing, get it from the output of the site URL.
I'm looking for advice about the best way to go about it, or direction if there's something in place I might use to determine my desired results.
Ruby code to fetch the download page, then parse the current version and the link URL:
require 'net/http'
require 'uri'

# Fetch the downloads page and pull out the first ruby-X.Y.Z.tar.gz link.
html = Net::HTTP.get(URI("https://www.ruby-lang.org/en/downloads/"))
vers = html[/http\S+ruby-(\S+?)\.tar\.gz/, 1]
link = html[/http\S+ruby-\S+?\.tar\.gz/]
The same logic is available as standalone scripts: ruby-stable-version.rb (Ruby, on GitHub) and ruby-stable-version (shell).
If you are using rbenv, you can use ruby-build to get a list of Ruby versions and then grep against that (note that tail -r is BSD-specific; on GNU systems use tac instead):
ruby-build --definitions | tail -r | grep -x -G -m 1 '[0-9]\.[0-9]\.[0-9]\-*[p0-9]*'
You can then use that within your code like so:
version = `ruby-build --definitions | tail -r | grep -x -G -m 1 '[0-9]\.[0-9]\.[0-9]\-*[p0-9]*'`.strip
You can then use this value to get the download URL.
url = "http://cache.ruby-lang.org/pub/ruby/#{version[0..2]}/ruby-#{version}.tar.gz"
And then download the file:
require 'open-uri'

# Stream the tarball to disk. URI.open replaces Kernel#open for URLs;
# Kernel#open no longer accepts URLs as of Ruby 3.0.
File.open("ruby-#{version}.tar.gz", 'wb') do |file|
  file << URI.open(url).read
end
Learn more about rbenv and ruby-build in their respective project READMEs.
Another possibility would be to use the Ruby source repository. Check version.h in every branch, filter by RUBY_PATCHLEVEL > -1 (-1 is used for -dev versions), sort by RUBY_VERSION and take the latest one.
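A rough sketch of that approach against the GitHub mirror at github.com/ruby/ruby, assuming version.h defines RUBY_VERSION as a string and RUBY_PATCHLEVEL as an integer, as older release branches did (the branch listing is also paginated and rate-limited, so this only inspects the first page):

require 'net/http'
require 'uri'
require 'json'

def fetch(url)
  Net::HTTP.get(URI(url))
end

# List branch names via the GitHub API (first page only in this sketch).
branches = JSON.parse(fetch("https://api.github.com/repos/ruby/ruby/branches"))
               .map { |b| b["name"] }

stable = branches.filter_map do |branch|
  header  = fetch("https://raw.githubusercontent.com/ruby/ruby/#{branch}/version.h")
  version = header[/RUBY_VERSION\s+"([^"]+)"/, 1]
  patch   = header[/RUBY_PATCHLEVEL\s+(-?\d+)/, 1]
  # RUBY_PATCHLEVEL of -1 marks a -dev version, so skip those branches.
  [Gem::Version.new(version), branch] if version && patch && patch.to_i > -1
end.max_by(&:first)
# stable is a [version, branch] pair for the newest non-dev branch.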
You can use:
Ruby's built-in OpenURI plus Nokogiri to read a page, parse it, search for certain tags, and extract a parameter such as a "src" or "href".
OpenURI to read the URL, or curl or wget at the command line to retrieve the file.
Nokogiri's tutorials, which include showing how to use OpenURI to retrieve the page and hand it off to Nokogiri.
OpenURI's docs show how to "open" URLs and retrieve their content using read. Once you've done that, the data is easy to save to disk using something like this for text files:
require 'open-uri'
File.write('some_file', URI.open('http://www.example.com/').read)
or for binary:
File.open('some_file', 'wb') { |fo| fo.write(URI.open('http://www.example.com/').read) }
There are examples of using both Nokogiri and OpenURI for this all over Stack Overflow.
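If you want something less brittle than a regex over raw HTML, here is a minimal sketch of the Nokogiri + OpenURI approach; it assumes the downloads page links the stable tarball with an href ending in .tar.gz:

require 'open-uri'
require 'nokogiri'

doc = Nokogiri::HTML(URI.open('https://www.ruby-lang.org/en/downloads/'))
# CSS attribute-suffix selector: first anchor whose href ends in .tar.gz.
link = doc.css('a[href$=".tar.gz"]').first
url = link['href']
version = url[/ruby-(.+?)\.tar\.gz/, 1]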
I'm trying to remove the following bash command from my ruby script:
nodes = "knife search 'chef_environment:#{env} AND recipe:#{microservice}' -i 2>&1 | tail -n 2"
node = %x[ #{nodes} ].split
node.each do |n|
  puts n
end
And replace it with something like this:
node = Chef::Knife.search("chef_environment:#{env} AND recipe:#{microservice}").split
Is this possible? Is there any documentation on the Chef::Knife library in Ruby and how to use it?
To access a chef server, you could try to use the ridley gem, which is also used by Berkshelf and thus generally up-to-date.
A usage example could be:
require 'ridley'

ridley = Ridley.from_chef_config('/path/to/knife.rb')
ridley.search(:node, "chef_environment:#{env} AND recipe:#{microservice}")
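For example, to print each matching node's name, roughly mirroring the original knife pipeline (a sketch; the name attribute is how Ridley's node objects expose it):

ridley.search(:node, "chef_environment:#{env} AND recipe:#{microservice}").each do |node|
  puts node.name
end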
See the documentation of the gem for a more detailed description of its options.
I am using the -O or --output-document option on wget to store the HTML from a website. However, the -O option requires a file for the output to be stored in, and I would like to store it in a variable in my program so that I can manipulate it more easily. Is there any way to do this without rereading it from the file? In essence, I am manually creating a crude cache.
Sample code
#!/usr/bin/ruby
url= "http://www.google.com"
whereIWantItStored = `wget #{url} --output-document=outsideFile`
Reference:
I found this post helpful in using wget within my program: Using wget via Ruby on Rails
#!/usr/bin/ruby
url= "http://www.google.com"
whereIWantItStored = `wget #{url} -O -`
Be sure to sanitize your url to avoid shell injection. The - after -O means standard output, which gets captured by the ruby backticks.
https://www.owasp.org/index.php/Command_Injection explains shell injection.
Use Shellwords#shellescape for Ruby >= 1.9 (http://apidock.com/ruby/Shellwords/shellescape), or the Escape gem for Ruby 1.8.x.
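For example, escaping the URL with Shellwords before interpolating it (the -q flag is added here to quiet wget's logging):

require 'shellwords'

url = "http://www.google.com"
whereIWantItStored = `wget -q -O - #{url.shellescape}`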
I wouldn't use wget. I'd use something like HTTParty.
Then you could do:
require 'httparty'
url = 'http://www.google.com'
response = HTTParty.get(url)
whereIWantItStored = response.code == 200 ? response.body : nil
I am writing a program in Ruby which will search for strings in text files within a directory - similar to Grep.
I don't want it to attempt to search in binary files but I can't find a way in Ruby to determine whether a file is binary or text.
The program needs to work on both Windows and Linux.
If anyone could point me in the right direction that would be great.
Thanks,
Xanthalas
libmagic is a library which detects file types. For this solution I assume that all MIME types which start with text/ represent text files, and everything else is a binary file. This assumption is not correct for all MIME types (e.g. application/x-latex, application/json), but libmagic detects those as text/plain anyway.
require "filemagic"
def binary?(filename)
begin
fm= FileMagic.new(FileMagic::MAGIC_MIME)
!(fm.file(filename)=~ /^text\//)
ensure
fm.close
end
end
Alternatively, the ptools gem provides a ready-made check:
gem install ptools
require 'ptools'
File.binary?(file)
An alternative to using the ruby-filemagic gem is to rely on the file command that ships with most Unix-like operating systems. I believe it uses the same libmagic library under the hood but you don't need the development files required to compile the ruby-filemagic gem. This is helpful if you're in an environment where it's a bit of work to install additional libraries (e.g. Heroku).
According to man file, text files will usually contain the word text in their description:
$ file Gemfile
Gemfile: ASCII text
You can run the file command through Ruby and capture the output:
require "open3"
def text_file?(filename)
file_type, status = Open3.capture2e("file", filename)
status.success? && file_type.include?("text")
end
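For example, with the Gemfile from above and a hypothetical PNG in the same directory:

text_file?("Gemfile")  #=> true
text_file?("logo.png") #=> false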
One caveat with the above: file's output includes the file name itself, so a name containing "text" produces a false positive, as in this example:
file /tmp/ball-texture.png
/tmp/ball-texture.png: PNG image data, 11 x 18, 8-bit/color RGBA, non-interlaced
The updated code inspects only the description after the colon:
def text_file?(filename)
  file_type, status = Open3.capture2e('file', filename)
  status.success? && file_type.split(':').last.include?('text')
end
Is there a cURL library for Ruby?
Curb and Curl::Multi provide cURL bindings for Ruby.
If you like it less low-level, there is also Typhoeus, which is built on top of Curl::Multi.
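A minimal Curb sketch, assuming the curb gem is installed (example.com is a placeholder):

require 'curb'

# Curl.get performs a GET and returns a Curl::Easy handle.
http = Curl.get("http://www.example.com/")
puts http.response_code
puts http.body_str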
Use OpenURI with the :http_basic_authentication option when accessing sites/pages/resources that require HTTP authentication:
require 'open-uri'
URI.open("http://...", :http_basic_authentication => [user, password])
Curb-fu is a wrapper around Curb, which in turn uses libcurl. What does Curb-fu offer over Curb? Just a lot of syntactic sugar, but that can often be exactly what you need.
HTTP clients is a good page to help you make decisions about the various clients.
You might also have a look at Rest-Client
If you know how to write your request as a curl command, there is an online tool that can turn it into ruby (2.0+) code: curl-to-ruby
Currently, it knows the following options: -d/--data, -H/--header, -I/--head, -u/--user, --url, and -X/--request. It is open to contributions.
The eat gem is a "replacement" for OpenURI, so you need to install the eat gem first:
$ gem install eat
Now you can use it
require 'eat'
eat('http://yahoo.com') #=> String
eat('/home/seamus/foo.txt') #=> String
eat('file:///home/seamus/foo.txt') #=> String
It uses HTTPClient under the hood. It also has some options:
eat('http://yahoo.com', :timeout => 10) # timeout after 10 seconds
eat('http://yahoo.com', :limit => 1024) # only read the first 1024 chars
eat('https://yahoo.com', :openssl_verify_mode => 'none') # don't bother verifying SSL certificate
Here's a little program I wrote to get some files with.
base = "http://media.pragprog.com/titles/ruby3/code/samples/tutthreads_"
for i in 1..50
url = "#{ base }#{ i }.rb"
file = "tutthreads_#{i}.rb"
File.open(file, 'w') do |f|
system "curl -o #{f.path} #{url}"
end
end
I know it could be a little more eloquent, but it serves its purpose. Check it out. I just cobbled it together today because I got tired of going to each URL to get the code for the book that was not included in the source download.
There's also Mechanize, which is a very high-level web scraping client that uses Nokogiri for HTML parsing.
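A small Mechanize sketch, fetching a page and listing its links (assumes the mechanize gem is installed; example.com is a placeholder):

require 'mechanize'

agent = Mechanize.new
page = agent.get('http://www.example.com/')
page.links.each { |link| puts link.href }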
Adding a more recent answer: HTTPClient is another mature Ruby HTTP client that supports parallel requests and many of curl's goodies. I use HTTPClient and Typhoeus for any non-trivial apps.
To state the maybe-too-obvious: backticks execute shell code in Ruby as well. Provided your Ruby code is running in a shell that has curl:
puts `curl http://www.google.com?q=hello`
or
result = `
curl -X POST https://www.myurl.com/users \
-d "name=pat" \
-d "age=21"
`
puts result
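If you'd rather avoid shelling out, the same POST can be written with the standard library's Net::HTTP (a sketch reusing the placeholder URL and form fields from the curl example above):

require 'net/http'
require 'uri'

uri = URI('https://www.myurl.com/users')

res = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) do |http|
  req = Net::HTTP::Post.new(uri)
  # Mirrors curl's -d "name=pat" -d "age=21" form fields.
  req.set_form_data('name' => 'pat', 'age' => '21')
  http.request(req)
end

puts res.body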
A nice minimal reproducible example to copy/paste into your rails console:
require 'open-uri'
require 'nokogiri'
url = "https://www.example.com"
html_file = URI.open(url)
doc = Nokogiri::HTML(html_file)
doc.css("h1").text
# => "Example Domain"
Saving the HTML of a web page using Ruby is very easy.
One way to do it is by using rio:
require 'rubygems'
require 'rio'
rio('http://www.google.com') > rio('google.html')
Is it possible to do the same by parsing the HTML, requesting the different images, JavaScript, and CSS files again, and then saving each of them?
I don't think that would be very efficient.
So, is there a way to save a web page plus all of its related images, CSS, and JavaScript, and all of this automatically?
What about system("wget -r -l 1 http://google.com")? The -r flag recurses into linked pages and -l 1 limits that recursion to one level.
Most of the time we can use the system's tools. As dimus said, you can use wget to download the page.
Ruby also ships useful APIs for network tasks, such as net/ftp, net/http, and net/https; see the Net::HTTP documentation for details.
But these methods only fetch the response; the extra work is parsing the HTML document yourself. Embedding a browser engine such as Mozilla's is another option.
url = "docs.zillabyte.com"
output_dir = "/tmp/crawl"
# -E = adjust malformed extensions (e.g. /some_image/ -> /some_image.gif)
# -H = span hosts (e.g. include assets from other domains)
# -p = download all assets associated with the page
# -P = output prefix (a.k.a the directory to dump the assets)
system("wget -E -H -p '#{url}' -P '#{output_dir}'")
# read files from 'output_dir'
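To pick up the mirrored files afterwards, something like this minimal sketch works (output_dir is the prefix set above):

# Collect every regular file wget dumped under output_dir.
files = Dir.glob(File.join(output_dir, "**", "*")).select { |f| File.file?(f) }
files.each { |f| puts f }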