
Parsing HTTPResponse with Nokogiri
Hi, I am having trouble parsing Net::HTTPResponse objects with Nokogiri.
I use this function to fetch a website:
# fetch a link
def fetch(uri_str, limit = 10)
  # You should choose a better exception.
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0

  url = URI.parse(URI.encode(uri_str.strip))
  puts url
  # build the GET request for the path
  req = Net::HTTP::Get.new(url.path, headers)
  # open the TCP/IP connection and send the request
  response = Net::HTTP.start(url.host, url.port) { |http|
    http.request(req)
  }
  case response
  when Net::HTTPSuccess then
    # print the final redirect to a file
    puts "this is location " + uri_str
    puts "this is the host #{url.host}"
    puts "this is the path #{url.path}"
    return response
  when Net::HTTPRedirection then
    # if you get a 302 response, follow the redirect
    puts "this is redirect " + response['location']
    return fetch(response['location'], aFile, limit - 1)
  else
    response.error!
  end
end

html = fetch("http://www.somewebsite.com/hahaha/")
puts html
noko = Nokogiri::HTML(html)
html = fetch("http://www.somewebsite.com/hahaha/")
puts html
noko = Nokogiri::HTML(html)
When I do this, html prints a whole bunch of gibberish, and Nokogiri complains that "node_set must be a Nokogiri::XML::NodeSet".
If anyone could offer help it would be much appreciated.

First thing: your fetch method returns a Net::HTTPResponse object, not just the body. You should pass the body to Nokogiri.
response = fetch("http://www.somewebsite.com/hahaha/")
puts response.body
noko = Nokogiri::HTML(response.body)
I've updated your script so it's runnable (below). A couple of things were undefined.
require 'nokogiri'
require 'net/http'

def fetch(uri_str, limit = 10)
  # You should choose a better exception.
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0

  url = URI.parse(URI.encode(uri_str.strip))
  puts url
  # build the GET request for the path
  headers = {}
  req = Net::HTTP::Get.new(url.path, headers)
  # open the TCP/IP connection and send the request
  response = Net::HTTP.start(url.host, url.port) { |http|
    http.request(req)
  }
  case response
  when Net::HTTPSuccess then
    # print the final redirect to a file
    puts "this is location " + uri_str
    puts "this is the host #{url.host}"
    puts "this is the path #{url.path}"
    return response
  when Net::HTTPRedirection then
    # if you get a 302 response, follow the redirect
    puts "this is redirect " + response['location']
    return fetch(response['location'], limit - 1)
  else
    response.error!
  end
end

response = fetch("http://www.google.com/")
puts response
noko = Nokogiri::HTML(response.body)
puts noko
response = fetch("http://www.google.com/")
puts response
noko = Nokogiri::HTML(response.body)
puts noko
The script runs without errors and prints the content. You may be getting the Nokogiri error because of the content you're receiving; one common problem I've encountered with Nokogiri is character encoding. Without the exact error it's impossible to tell what's going on.
I'd recommend looking at the following Stack Overflow questions:
ruby 1.9: invalid byte sequence in UTF-8 (specifically this answer)
How to convert a Net::HTTP response to a certain encoding in Ruby 1.9.1?
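Without the exact error this is a guess, but here's a minimal sketch of scrubbing the body into valid UTF-8 before handing it to Nokogiri (the UTF-16 round trip is the usual workaround on Rubies without String#scrub):
body = response.body
# Net::HTTP often tags the body as ASCII-8BIT (binary); declare what the bytes should be
body.force_encoding('UTF-8')
# Replace any invalid byte sequences; the UTF-16 round trip forces a real re-encoding
unless body.valid_encoding?
  body = body.encode('UTF-16', invalid: :replace, undef: :replace).encode('UTF-8')
end
noko = Nokogiri::HTML(body)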

Related

Cannot make HTTP Delete request with Ruby's net/http library

I've been trying to make an API call to my server to delete a user record on a dev database. When I use Fiddler to call the URL with the DELETE operation, I am able to immediately delete the user record. When I call that same URL, again with the DELETE operation, from my script below, I get this error:
{"Message":"The requested resource does not support http method 'DELETE'."}
I have changed the URL in my script below. The URL I am using is definitely correct. I suspect that there is a logical error in my code that I haven't caught. My script:
require 'net/http'
require 'json'
require 'pp'
require 'uri'

def deleteUserRole
  # prepare the request
  url = "http://my.database.5002143.access" # dev
  uri = URI.parse(url)
  request = Net::HTTP::Delete.new(uri.path)
  http = Net::HTTP.new(uri.host, uri.port)

  # send the request
  response = http.request(request)
  puts "response: \n"
  puts response.body
  puts "response code: " + response.code + "\n \n"

  # parse the response
  buffer = response.body
  result = JSON.parse(buffer)
  status = result["Success"]
  if status == true
    puts "passed"
  else
    puts "failed"
  end
end

deleteUserRole
It turns out that I was typing in the wrong command. I needed to change this line:
request = Net::HTTP::Delete.new(uri.path)
to this line:
request = Net::HTTP::Delete.new(uri)
By typing uri.path I was excluding part of the URL from the API call. When I was debugging, I would type puts uri and that would show me the full URL, so I was certain the URL was right. The URL was right, but I was not including the full URL in my DELETE call.
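A quick illustration of the difference (the URL here is hypothetical):
require 'uri'

uri = URI.parse("http://example.com/api/users?id=42") # hypothetical URL
uri.path        # => "/api/users"        (query string is dropped)
uri.request_uri # => "/api/users?id=42"  (path plus query string)
Passing the whole uri object lets Net::HTTP::Delete build the complete request target.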
If you omit the query parameters when making the DELETE request, it won't work. You can do it like this:
uri = URI.parse('http://localhost/test')
http = Net::HTTP.new(uri.host, uri.port)
attribute_url = '?'
attribute_url << body.map { |k, v| "#{k}=#{v}" }.join('&')
request = Net::HTTP::Delete.new(uri.request_uri + attribute_url)
response = http.request(request)
Here body is a hash in which you define the query params; the code above joins them into the URL when sending the request.
For example: body = { :resname => 'res', :bucket_name => 'bucket', :uploaded_by => 'upload' }
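As a side note, here's a sketch of the same idea that lets the standard library do the escaping; URI.encode_www_form percent-encodes keys and values for you (same hypothetical body hash as above):
require 'net/http'
require 'uri'

uri = URI.parse('http://localhost/test')
body = { :resname => 'res', :bucket_name => 'bucket', :uploaded_by => 'upload' }
# encode_www_form escapes each key/value pair and joins them with '&'
uri.query = URI.encode_www_form(body)
request = Net::HTTP::Delete.new(uri.request_uri)
response = Net::HTTP.new(uri.host, uri.port).request(request)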

Issue while fetching data from nested json

I am trying to fetch data from a nested JSON and am not able to understand the issue here. Please ignore the fields that I am passing to the ChildArticle class; I can sort that out.
URL for JSON - http://api.nytimes.com/svc/mostpopular/v2/mostshared/all-sections/email/30.json?api-key=31fa4521f6572a0c05ad6822ae109b72:2:72729901
Below is my code:
url = 'http://api.nytimes.com'
# Define the HTTP object
uri = URI.parse(url)
http = Net::HTTP.new(uri.host, uri.port)
# If the API being scraped uses https, then set use_ssl to true.
http.use_ssl = false
# Define the request_url
request = '/svc/mostpopular/v2/mostshared/all-sections/email/30.json?api-key=31fa4521f6572a0c05ad6822ae109b72:2:72729901'
# Make a GET request to the given url
response = http.send_request('GET', request)
# Parse the response body
forecast = JSON.parse(response.body)
forecast["results"]["result"].each do |item|
  date = Date.parse(item["published_date"].to_s)
  if (@start <= date) && (@end >= date)
    article = News::ChildArticle.new(author: item["author"], title: item["title"], summary: item["abstract"],
                                     images: item["images"], source: item["url"], date: item["published_date"],
                                     guid: item["guid"], link: item["link"], section: item["section"],
                                     item_type: item["item_type"], updated_date: item["updated_date"],
                                     created_date: item["created_date"],
                                     material_type_facet: item["material_type_facet"])
    @articles.concat([article])
  end
end
I get the error below:
in `[]': no implicit conversion of String into Integer (TypeError)
at forecast["results"]["result"].each do |item|
Looks like forecast['results'] is simply an array, not a hash.
Take a look at this slightly modified script. Give it a run in your terminal, and check out its output.
require 'net/http'
require 'json'

url = 'http://api.nytimes.com'
uri = URI.parse(url)
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = false
request = '/svc/mostpopular/v2/mostshared/all-sections/email/30.json?api-key=31fa4521f6572a0c05ad6822ae109b72:2:72729901'
response = http.send_request('GET', request)
forecast = JSON.parse(response.body)

forecast["results"].each.with_index do |item, i|
  puts "Item #{i}:"
  puts '--'
  item.each do |k, v|
    puts "#{k}: #{v}"
  end
  puts '----'
end
Also, you may want to inspect the JSON structure of the API return from that URL. If you go to that URL, open your JavaScript console, and paste in
JSON.parse(document.body.textContent)
you can inspect the JSON structure very easily.
Another option would be downloading the response to a JSON file and inspecting it in your editor. You'll need a JSON prettifier, though.
File.open('response.json', 'w') do |f|
  f.write(response.body)
end
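If you'd rather not depend on an external prettifier, here's a small sketch using the standard library's JSON.pretty_generate to write an indented file directly:
require 'json'

# Re-serialize the parsed response with indentation so it's readable as-is
File.open('response.json', 'w') do |f|
  f.write(JSON.pretty_generate(JSON.parse(response.body)))
end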

Net::HTTP Proxy list

I understand that you can use a proxy with Ruby's Net::HTTP. However, I have no idea how to do this with a bunch of proxies. I need Net::HTTP to change to another proxy and send another POST request after every POST request. Also, is it possible to make Net::HTTP change to another proxy if the previous proxy is not working? If so, how?
Code I'm trying to implement this in:
require 'net/http'

sleep(8)
http = Net::HTTP.new('URLHERE', 80)
http.read_timeout = 5000
http.use_ssl = false
path = 'PATHHERE'
data = '(DATAHERE)'
headers = {
  'Referer' => 'REFERER HERE',
  'Content-Type' => 'application/x-www-form-urlencoded; charset=UTF-8',
  'User-Agent' => '(USERAGENTHERE)'
}
resp, data = http.post(path, data, headers)
# Output on the screen -> we should get either a 302 redirect (after a successful login) or an error page
puts 'Code = ' + resp.code
puts 'Message = ' + resp.message
resp.each { |key, val| puts key + ' = ' + val }
puts data
Given an array of proxies, the following example will make a request through each proxy in the array until it receives a "302 Found" response. (This isn't actually a working example because Google doesn't accept POST requests, but it should work if you insert your own destination and working proxies.)
require 'net/http'

destination = URI.parse "http://www.google.com/search"
proxies = [
  "http://proxy-example-1.net:8080",
  "http://proxy-example-2.net:8080",
  "http://proxy-example-3.net:8080"
]

# Create your POST request_object once
request_object = Net::HTTP::Post.new(destination.request_uri)
request_object.set_form_data({"q" => "stack overflow"})

proxies.each do |raw_proxy|
  proxy = URI.parse raw_proxy
  # Create a new http_object for each new proxy
  http_object = Net::HTTP.new(destination.host, destination.port, proxy.host, proxy.port)
  # Make the request
  response = http_object.request(request_object)
  # If we get a 302, report it and break
  if response.code == "302"
    puts "#{proxy.host}:#{proxy.port} responded with #{response.code} #{response.message}"
    break
  end
end
You should also probably do some error checking with begin ... rescue ... end each time you make a request. If you don't do any error checking and a proxy is down, control will never reach the line that checks for response.code == "302"; the program will just fail with some kind of connection timeout error.
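A minimal sketch of that error handling, wrapped around the request inside the loop above (the rescued exception classes are a reasonable starting set, not an exhaustive list):
begin
  response = http_object.request(request_object)
rescue Errno::ECONNREFUSED, Errno::ETIMEDOUT, Timeout::Error, SocketError => e
  # Report the dead proxy and move on to the next one in the array
  puts "#{proxy.host}:#{proxy.port} failed: #{e.class}: #{e.message}"
  next
end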
See the Net::HTTPHeader docs for other methods that can be used to customize the Net::HTTP::Post object.
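For example, extra headers can be set on the request object before it is sent (the header values here are just placeholders):
request_object['User-Agent'] = 'MyScript/1.0'        # placeholder value
request_object['Referer']    = 'http://example.com/' # placeholder value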

Ruby URL Validation

I wrote this script to parse a text file of URLs and return the HTTP response code for each, but I can't get it to work. I'm able to import and parse the file, but I'm unable to get the response code. Thanks in advance!
require 'net/http'

# Open URLs from file
File.open("sample_input_file", "r") do |infile|
  while (URI = infile.gets)
  end
end

# Get HTTP response code
http = Net::HTTP.new
response = http.request_head(URI)

# Print result
if response.code != "200"
  puts URI + "Error"
else
  puts "Ok"
end
.gets returns a string; you need to actually make a URI by calling, for example, URI.parse:
http://www.ruby-doc.org/stdlib-1.9.3/libdoc/uri/rdoc/
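Putting that together, here's a minimal sketch with the check moved inside the loop, assuming sample_input_file holds one URL per line:
require 'net/http'
require 'uri'

File.foreach("sample_input_file") do |line|
  next if line.strip.empty?
  uri = URI.parse(line.strip)
  # request_head fetches only the headers, which is enough for the status code
  response = Net::HTTP.start(uri.host, uri.port) do |http|
    http.request_head(uri.request_uri)
  end
  puts(response.code != "200" ? "#{uri} Error" : "Ok")
end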

How to replace a particular string found in a URL taken from the console with the content found on each line of a text file in Ruby?

I want to replace the word fuzz, which appears in a URL taken from the console, with the contents of a text file that has one string per line.
After replacing the fuzz word with each line of the file, I want to fire an HTTP request with the replaced content and store the responses, along with the modified requests, in a new file.
I have written it like this, but I'm getting an error:
fuzz1.rb:16: private method `gsub' called for #<URI::HTTP:0x2969040> (NoMethodError)
Code is here:
require 'net/http'

puts "Enter Target:\n"
target = URI(gets())
new_reference = target
a1 = target.clone

Net::HTTP.start(target.host, target.port) do |http|
  request = Net::HTTP::Get.new target.request_uri
  response = http.request request
  puts response.body
end

puts "File contents:\n"
f = File.open("fuzz.txt", "r")
while line = f.gets do
  puts "Attack value: #{line}"
  b = a1.gsub('fuzz', '#{line}')
  c = b
  Net::HTTP.start(c.host, c.port) do |http|
    request = Net::HTTP::Get.new c.request_uri
    response = http.request request
    puts response.body
  end
end
I have no idea why this gsub error is coming up instead of the fuzz word in the URL being replaced with the file line content.
a1 is a URI object; it doesn't have a gsub method. You need to cast it to a string first. Note also that '#{line}' in single quotes is a literal string (no interpolation happens), and line carries a trailing newline, so use line.chomp instead. Try this:
b = URI.parse(a1.to_s.gsub('fuzz', line.chomp))
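Putting it together, here's a minimal sketch of the corrected loop (the target URL is hypothetical and is assumed to contain the literal word fuzz, e.g. http://example.com/page?q=fuzz, and each line of fuzz.txt is assumed to be URL-safe):
require 'net/http'
require 'uri'

template = a1.to_s # e.g. "http://example.com/page?q=fuzz" (hypothetical)
File.foreach("fuzz.txt") do |line|
  attack = line.chomp
  puts "Attack value: #{attack}"
  candidate = URI.parse(template.gsub('fuzz', attack))
  Net::HTTP.start(candidate.host, candidate.port) do |http|
    response = http.request(Net::HTTP::Get.new(candidate.request_uri))
    puts response.body
  end
end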
