How to get "first modified" in Ruby - ruby

How can I get the first-modified time in Ruby? The code below gets the last-modified time,
but I can't get the first-modified one:
require 'open-uri'
open("link") do |f|
f.each_line {|line| p line}
puts p f.last_modified
end
How can I get the first-modified time in Ruby? Thanks.
What code do I have to write?
I tried:
require 'open-uri'
open("link") do |f|
f.each_line {|line| p line}
puts p f.first_modified
end
but it didn't work.

There's no "first_modified" method in OpenURI, because the underlying HTTP protocol has no support for it. So what you want to do is impossible.
OpenURI docs: http://ruby-doc.org/stdlib-2.1.0/libdoc/open-uri/rdoc/OpenURI/Meta.html

First-Modified is not a standard or common non-standard HTTP header.
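If the server does send a header, you can read it from the meta hash even when it is not exposed as a Ruby method. Note that the keys of the meta hash are the header names in lowercase.
require 'open-uri'
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
p URI.open('http://www.google.com').meta['your-header-name']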
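To illustrate the meta hash lookup without touching the network, here is a minimal sketch. The helper name header_time and the sample hash are assumptions made up for this example; the hash only imitates what OpenURI's meta returns (lowercase header names mapped to string values).

```ruby
require 'time'

# Hypothetical helper (not part of OpenURI): looks up a header in a
# meta-style hash with lowercase keys and parses it as an HTTP date.
# Returns nil when the header is absent or is not a valid HTTP date.
def header_time(meta, name)
  value = meta[name.downcase]
  return nil if value.nil?

  Time.httpdate(value)
rescue ArgumentError
  nil
end

# Made-up sample data standing in for f.meta
sample_meta = { 'last-modified' => 'Wed, 21 Oct 2015 07:28:00 GMT' }

p header_time(sample_meta, 'Last-Modified')
p header_time(sample_meta, 'First-Modified') # no such header in the sample
```

A server that never sends First-Modified will simply give nil here, which matches the point made in the answers above.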

Related

Getting all unique URL's using nokogiri

I've been working for a while to try to use the .uniq method to generate a unique list of URLs from a website (within the /informatics path). No matter what I try, I get a method error when trying to generate the list. I'm sure it's a syntax issue, and I was hoping someone could point me in the right direction.
Once I get the list I'm going to need to store it in a database via ActiveRecord, but I need the unique list before I start to wrap my head around that.
require 'nokogiri'
require 'open-uri'
require 'active_record'
ARGV[0]="https://www.nku.edu/academics/informatics.html"
ARGV.each do |arg|
open(arg) do |f|
# Display connection data
puts "#"*25 + "\nConnection: '#{arg}'\n" + "#"*25
[:base_uri, :meta, :status, :charset, :content_encoding,
:content_type, :last_modified].each do |method|
puts "#{method.to_s}: #{f.send(method)}" if f.respond_to? method
end
# Display the href links
base_url = /^(.*\.nku\.edu)\//.match(f.base_uri.to_s)[1]
puts "base_url: #{base_url}"
Nokogiri::HTML(f).css('a').each do |anchor|
href = anchor['href']
# Make Unique
if href =~ /.*informatics/
puts href
#store stuff to active record
end
end
end
end
Replace the Nokogiri::HTML part so that it selects only those anchors whose href attribute matches /.*informatics/. The result of select is already an array, so you can map it to the href values and call uniq:
require 'nokogiri'
require 'open-uri'
require 'active_record'
ARGV[0] = 'https://www.nku.edu/academics/informatics.html'
ARGV.each do |arg|
open(arg) do |f|
puts "#{'#' * 25} \nConnection: '#{arg}'\n #{'#' * 25}"
%i[base_uri meta status charset content_encoding content_type last_modified].each do |method|
puts "#{method.to_s}: #{f.send(method)}" if f.respond_to? method
end
puts "base_url: #{/^(.*\.nku\.edu)\//.match(f.base_uri.to_s)[1]}"
anchors = Nokogiri::HTML(f).css('a').select { |anchor| anchor['href'] =~ /.*informatics/ }
puts anchors.map { |anchor| anchor['href'] }.uniq
end
end
See output.
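One thing uniq alone will not catch is the same page linked once with a relative href and once with an absolute one. A possible refinement, sketched below with only the standard library, is to resolve every href against the page URL before calling uniq. The hrefs array is made-up sample data standing in for the values Nokogiri would return.

```ruby
require 'uri'

base = 'https://www.nku.edu/academics/informatics.html'

# Made-up sample hrefs, standing in for anchor['href'] values from the page
hrefs = [
  '/academics/informatics.html',
  'informatics/programs.html',
  'https://www.nku.edu/academics/informatics.html',
  nil,                 # anchors without an href yield nil
  '/admissions.html'
]

links = hrefs
        .compact                           # drop anchors without an href
        .grep(/informatics/)               # keep only the informatics links
        .map { |href| URI.join(base, href).to_s } # make them absolute
        .uniq

puts links
```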

Is there a unified way to get content at a file:// or http:// URI scheme in Ruby?

It appears the Net::HTTP library doesn't support loading of local file via file:// . I'd like to configure loading of content from a file or remotely, depending on environment.
Is there a standard Ruby way to access either type the same way, or barring that some succinct code that branches?
Do you know about open-uri?
require 'open-uri'
open("/home/me/file.txt") { |f| ... }
open("http://www.google.com") { |f| ... }
So to support either "http://" or "file://" in one statement, simply remove the "file://" from the beginning of the uri if it is present (and no need to do any processing for "http://"), like so:
uri = ...
open(uri.sub(%r{^file://}, ''))
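As a rough sketch of the branching variant, the helper below reads from HTTP(S) through open-uri and from disk otherwise, after stripping a leading "file://". The name read_resource is an assumption for this example, and the code assumes Unix-style absolute paths in file:// URIs.

```ruby
require 'open-uri'
require 'tempfile'

# Hypothetical helper: returns the content at an http(s):// URL,
# a file:// URI, or a plain local path as a string.
def read_resource(location)
  if location.start_with?('http://', 'https://')
    URI.open(location, &:read)
  else
    File.read(location.sub(%r{\Afile://}, ''))
  end
end

# Local demonstration with a temporary file (no network needed)
Tempfile.create('example') do |tmp|
  tmp.write('hello from disk')
  tmp.flush
  puts read_resource("file://#{tmp.path}")
  puts read_resource(tmp.path)
end
```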
Here's some experimental code that teaches "open-uri" to handle "file:" URIs:
require 'open-uri'
require 'uri'
module URI
class File < Generic
def open(*args, &block)
::File.open(self.path, &block)
end
end
@@schemes['FILE'] = File
end
As Ben Lee pointed out, open-uri is the way to go here. I've also used it in combination with paperclip for storing resources associated with models, which makes everything brilliantly simple.
require 'open-uri'
class SomeModel < ActiveRecord::Base
attr_accessor :remote_url
has_attached_file :resource # etc, etc.
before_validation :get_remote_resource, :if => :remote_url_provided?
validates_presence_of :remote_url, :if => :remote_url_provided?,
:message => 'is invalid or missing'
def get_remote_resource
self.resource = SomeModel.download_remote_resource(self.remote_url)
end
def self.download_remote_resource (uri)
io = open(URI.parse(uri))
def io.original_filename; base_uri.path.split('/').last; end
io.original_filename.blank? ? nil : io
rescue
end
end
# SomeModel.new(:remote_url => 'http://www.google.com/').save

Save Webscraped data

I am trying to scrape a website. I am able to scrape data from that website. I am having trouble saving the data from the scrape to yaml file that I have included
My Code:
require 'rubygems'
require 'open-uri'
require 'hpricot'
article = []
doc = open("http://www.cmegroup.com/trading/interest-rates/cleared-otc/irs.html"{|f| Hpricot(f) }
(doc/"/html/body/div/div/div/div/table/").each do |article|
puts "#{article.inner_html}"
end
File.open('test.yaml', 'w') { |f|
f <<article.to_yaml
}
First you are missing a closing parenthesis for the open call (a ) right before the block starts).
When you add that you'll notice that you'll get a NoMethodError (undefined method 'to_yaml' for []:Array). To fix that you have to require 'yaml', which pulls in the monkey-patches for the Array class. After that you'll notice that your yaml file is empty, because you never put anything into article. Here's a fixed version:
require 'rubygems'
require 'open-uri'
require 'hpricot'
require 'yaml'
articles = []
url = "http://www.cmegroup.com/trading/interest-rates/cleared-otc/irs.html"
doc = open(url) {|f| Hpricot(f) }
(doc/"/html/body/div/div/div/div/table/").each do |article|
articles << article.inner_html
end
File.open('test.yaml', 'w') { |f| f << articles.to_yaml }
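To check that the YAML file really contains what was scraped, you can load it back and compare. This is only a sketch: the two strings are made-up stand-ins for the inner_html values, and a temporary directory is used so that nothing is left behind.

```ruby
require 'yaml'
require 'tmpdir'

# Made-up stand-ins for the scraped table fragments
articles = ['<tr><td>first row</td></tr>', '<tr><td>second row</td></tr>']

restored = Dir.mktmpdir do |dir|
  path = File.join(dir, 'test.yaml')
  File.write(path, articles.to_yaml) # same data that goes into test.yaml
  YAML.load_file(path)               # read it back as Ruby objects
end

p restored == articles
```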

Download a zip file through Net::HTTP

I am trying to download the latest.zip from WordPress.org using Net::HTTP. This is what I have got so far:
Net::HTTP.start("wordpress.org/") { |http|
resp = http.get("latest.zip")
open("a.zip", "wb") { |file|
file.write(resp.body)
}
puts "WordPress downloaded"
}
But this only gives me a 4-kilobyte HTML page with a 404 error (as I can see if I change the file name to a.txt). I think this has something to do with the URL being redirected somehow, but I have no clue what I am doing. I am a newbie to Ruby.
My first question is why use Net::HTTP, or code to download something that could be done more easily using curl or wget, which are designed to make it easy to download files?
But, since you want to download things using code, I'd recommend looking at Open-URI if you want to follow redirects. It's a standard library for Ruby, and very useful for fast HTTP/FTP access to pages and files:
require 'open-uri'
open('latest.zip', 'wb') do |fo|
fo.print open('http://wordpress.org/latest.zip').read
end
I just ran that, waited a few seconds for it to finish, ran unzip against the downloaded file "latest.zip", and it expanded into the directory containing their content.
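For large files you may prefer not to hold the whole body in memory with read. One option is IO.copy_stream, sketched here with a StringIO standing in for the IO object that open-uri yields, so the example runs without a network connection. The fake bytes and the temporary target path are assumptions for illustration only.

```ruby
require 'stringio'
require 'tmpdir'

# Stand-in for the IO object yielded by open-uri (assumption for this demo)
source = StringIO.new('PK fake zip bytes')

copied = Dir.mktmpdir do |dir|
  target = File.join(dir, 'latest.zip')
  IO.copy_stream(source, target) # copies from the source to the target path
  File.binread(target)           # read back only to show what was written
end

puts copied.bytesize
```

With a real download, source would be the object passed to the block of the open-uri call.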
Beyond Open-URI, there's HTTPClient and Typhoeus, among others, that make it easy to open an HTTP connection and send queries/receive data. They're very powerful and worth getting to know.
Net::HTTP doesn't provide a nice way of following redirects. Here is a piece of code that I've been using for a while now:
require 'net/http'
class RedirectFollower
class TooManyRedirects < StandardError; end
attr_accessor :url, :body, :redirect_limit, :response
def initialize(url, limit=5)
@url, @redirect_limit = url, limit
end
def resolve
raise TooManyRedirects if redirect_limit < 0
self.response = Net::HTTP.get_response(URI.parse(url))
if response.kind_of?(Net::HTTPRedirection)
self.url = redirect_url
self.redirect_limit -= 1
resolve
end
self.body = response.body
self
end
def redirect_url
if response['location'].nil?
response.body.match(/<a href=\"([^>]+)\">/i)[1]
else
response['location']
end
end
end
wordpress = RedirectFollower.new('http://wordpress.org/latest.zip').resolve
puts wordpress.url
File.open("latest.zip", "wb") do |file|
file.write wordpress.body
end
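The class above assigns the Location value straight to url, which works when the server sends an absolute URL. A Location header can also be relative, so one possible tweak is to resolve it against the current URL first. The helper name next_url and the example.com URLs are assumptions for this sketch.

```ruby
require 'uri'

# Hypothetical helper: resolves a Location header value (absolute or
# relative) against the URL that produced the redirect.
def next_url(current_url, location)
  URI.join(current_url, location).to_s
end

puts next_url('http://wordpress.org/latest.zip', 'https://wordpress.org/latest.zip')
puts next_url('http://example.com/a/b.zip', '/downloads/b.zip')
```

In RedirectFollower this could replace the plain assignment, for example self.url = next_url(url, redirect_url).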

Retrieve contents of URL as string

For tedious reasons to do with Hpricot, I need to write a function that is passed a URL, and returns the whole contents of the page as a single string.
I'm close. I know I need to use OpenURI, and it should look something like this:
require 'open-uri'
open(url) {
# do something mysterious here to get page_string
}
puts page_string
Can anyone suggest what I need to add?
You can do the same without OpenURI:
require 'net/http'
require 'uri'
def open(url)
Net::HTTP.get(URI.parse(url))
end
page_content = open('http://www.google.com')
puts page_content
Or, more succinctly:
Net::HTTP.get(URI.parse('http://www.google.com'))
The open method passes an IO representation of the resource to your block when it yields. You can read from it using the IO#read method
open([mode [, perm]] [, options]) [{|io| ... }]
open(path) { |io| data = io.read }
require 'open-uri'
open(url) do |f|
page_string = f.read
end
See also the documentation of IO class
I was also unsure which one to use for better performance, so I ran a benchmark of both to make it clearer:
require 'benchmark'
require 'net/http'
require "uri"
require 'open-uri'
url = "http://www.google.com"
Benchmark.bm do |x|
x.report("net-http:") { content = Net::HTTP.get_response(URI.parse(url)).body if url }
x.report("open-uri:") { open(url){|f| content = f.read } if url }
end
Its result is:
                user     system      total        real
net-http:   0.000000   0.000000   0.000000 (  0.097779)
open-uri:   0.030000   0.010000   0.040000 (  0.864526)
I'd say it depends on what your requirements are and how you want to process the content.
To make code a little clearer, the OpenURI open method will return the value returned by the block, so you can assign open's return value to your variable. For example:
xml_text = open(url) { |io| io.read }
Starting with Ruby 3.0, calling URI.open via Kernel#open has been removed, so instead call URI.open directly:
require 'open-uri'
page_string = URI.open(url, &:read)
Try the following instead:
require 'open-uri'
content = URI(your_url).read
require 'open-uri'
open(url) {|f| #url must specify the protocol
str = f.read()
}

Resources