rails 2 read file from http post body - ruby

I am using rails 2 and I am trying to read a file from http post request body.
def get_file_from_request(request, file_name)
file = Tempfile.new([File.basename(file_name, ".*"), File.extname(file_name)])
Rails.logger.info "#{request.body.size}" # returns 130
file.write(request.body.read)
file
end
If I do
request.inspect
I get following:
...
"rack.version"=>[1, 2], "rack.input"=>#<PhusionPassenger::Utils::TeeInput:0x0000000d3a6148 @len=130, @socket=nil, @bytes_read=130, @tmp=#<StringIO:0x0000000d3a60d0>>, "rack.errors"=>#<IO:<STDERR>>, "rack.multithread"=>false, "rack.multiprocess"=>true, "rack.run_once"=>false, "rack.url_scheme"=>"http", "rack.hijack?"=>true, "rack.hijack"=>#<Proc:0x0000000d3a5ec8@/data/rbenv/.rbenv/versions/1.9.3-p550/lib/ruby/gems/1.9.1/gems/passenger-4.0.42/lib/phusion_passenger/rack/thread_handler_extension.rb:69 (lambda)>,
...
Are there any obvious problems with my approach?
Can someone help me with extracting files from the request body? The file is definitely not 130 bytes; it is about 3 MB.
EDIT: Here is the controller class
class Api::PhotosController < Api::BaseController
before_filter :verify_session, :except => [:new, :show_image]
def create
Rails.logger.info "upload photo request: params=#{params.inspect}, request=#{request.inspect}"
...
file = get_file_from_request(request, params[:file_name])
...
rescue => e
Rails.logger.info "Could not upload photo: params=#{params.inspect}, exception=#{e.backtrace}"
render_error(e)
end
private
def get_file_from_request(request, file_name)
file = Tempfile.new([File.basename(file_name, ".*"), File.extname(file_name)])
Rails.logger.info "#{request.body.size}" # returns 130
file.write(request.body.read)
file
end
end

In Rails 2 you may have to use request.raw_post. Does that help?
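If request.raw_post does return the full upload, writing it to a Tempfile could look roughly like the sketch below. The helper name tempfile_from_body and the sample bytes are made up for illustration; in the controller you would pass request.raw_post as the first argument. The sketch opens the file in binary mode and rewinds it before returning, two things the original helper does not do.

```ruby
require "tempfile"

# Write a raw request body to a Tempfile named after file_name.
# raw_body is assumed to be the String returned by request.raw_post.
def tempfile_from_body(raw_body, file_name)
  ext  = File.extname(file_name)
  base = File.basename(file_name, ".*")
  file = Tempfile.new([base, ext])
  file.binmode          # no newline or encoding translation of image bytes
  file.write(raw_body)
  file.rewind           # let the caller read from the start of the file
  file
end

# Stand-in for request.raw_post; these bytes are invented for the example.
fake_body = "\xFF\xD8\xFF\xE0".b + ("\x00".b * 1024)
photo = tempfile_from_body(fake_body, "holiday.jpg")
puts photo.size
```

Note that this only helps if the client really sends the file as the raw body. If the body is still 130 bytes, the client is probably sending something else (for example a small JSON or form payload), and that is worth checking first.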

Related

How do I scrape a website and output data to xml file with Nokogiri?

I've been trying to scrape data using Nokogiri and HTTParty and can scrape data off a website successfully and print it to the console but I can't work out how to output the data to an xml file in the repo.
Right now the code looks like this:
class Scraper
attr_accessor :parse_page
def initialize
doc = HTTParty.get("https://store.nike.com/gb/en_gb/pw/mens-nikeid-lifestyle-shoes/1k9Z7puZoneZoi3?ref=https%253A%252F%252Fwww.google.com%252F")
@parse_page ||= Nokogiri::HTML(doc)
end
def get_names
item_container.css(".product-display-name").css("p").children.map { |name| name.text }.compact
end
def get_prices
item_container.css(".product-price").css("span.local").children.map { |price| price.text }.compact
end
private
def item_container
parse_page.css(".grid-item-info")
end
scraper = Scraper.new
names = scraper.get_names
prices = scraper.get_prices
(0...prices.size).each do |index|
puts " - - - Index #{index + 1} - - -"
puts "Name: #{names[index]} | Price: #{prices[index]}"
end
end
I've tried changing the .each method to include a File.write() but all it ever does is write the last line of the output into the xml file. I would appreciate any insight as to how to parse the data correctly, I am new to scraping.
"I've tried changing the .each method to include a File.write() but all it ever does is write the last line of the output into the xml file."
Is the File.write call inside the each loop? If so, you are overwriting the file on every iteration, which is why you only see the last line.
Try putting the each loop inside the block of File.open, like this:
File.open(yourfile, 'w') do |file|
(0...prices.size).each do |index|
file.write("your text")
end
end
I also recommend reading about Nokogiri::XML::Builder and then saving its output to the file.
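To make the File.open suggestion concrete, here is a minimal sketch that writes every name and price inside a single File.open block. The sample data, the element names (products, product, name, price) and the output path are assumptions for illustration; in the real script the two arrays would come from scraper.get_names and scraper.get_prices. It builds the XML by hand with String#encode(xml: :text) for escaping, which is a stand-in for Nokogiri::XML::Builder, not a replacement for it.

```ruby
require "tmpdir"

# Hypothetical sample data standing in for scraper.get_names / get_prices.
names  = ["Air Max 90", "Air Force 1 & Co"]
prices = ["115.00", "95.00"]

# Open the file once and write every product inside the block, so earlier
# entries are not overwritten on each iteration.
def write_products_xml(path, names, prices)
  File.open(path, "w") do |file|
    file.puts '<?xml version="1.0" encoding="UTF-8"?>'
    file.puts "<products>"
    names.zip(prices).each do |name, price|
      file.puts "  <product>"
      # encode(xml: :text) escapes &, < and > in element content
      file.puts "    <name>#{name.to_s.encode(xml: :text)}</name>"
      file.puts "    <price>#{price.to_s.encode(xml: :text)}</price>"
      file.puts "  </product>"
    end
    file.puts "</products>"
  end
end

xml_path = File.join(Dir.tmpdir, "products.xml")
write_products_xml(xml_path, names, prices)
puts File.read(xml_path)
```

Hand-built XML is fine for a flat list like this; for anything nested or attribute-heavy, a builder is less error-prone.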

How to Save File using Ruby 2.2.3 and rest-client

I am trying to use a REST API to download a file. The request appears to work, but no file is actually downloaded. I assume that's because the response is held in memory and never written to my file system.
Below is the portion of code responsible. My URL is slightly edited when pasting it below, and my authToken is valid.
backup_url = "#{proto}://#{my_host}/applications/ws/migration/export?noaudit=#{include_audit}&includebackup=#{include_backup_zips}&authToken=#{my_token}"
resource = RestClient::Resource.new(
backup_url,
:timeout => nil,
:open_timeout => nil)
response = resource.get
if response.code == 200
puts "Backup Complete"
else
puts "Backup Failed"
abort("Response Code was not 200: Response Code #{response.code}")
end
Returns:
# => 200 OK | application/zip 222094570 bytes
Backup Complete
There is no file present though.
Thanks,
Well you actually have to write to the file yourself.
Pathname('backup.zip').write response.to_s
You can save the zip file using the File class:
...
if response.code == 200
f = File.new("backup.zip", "wb")
f << response.body
f.close
puts "Backup Complete"
else
...
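Both answers come down to writing the response body to disk yourself, in binary mode. A small sketch of that step, with a check that the whole body was written, might look like this. The helper name save_backup, the fake payload and the output path are assumptions; in the original script the first argument would be response.body.

```ruby
require "tmpdir"

# Write body to path in binary mode and confirm the byte count.
def save_backup(body, path)
  written = File.binwrite(path, body)
  unless written == body.bytesize
    raise "short write: #{written} of #{body.bytesize} bytes"
  end
  written
end

# Stand-in for response.body; these bytes are invented for the example.
fake_zip = "PK\x03\x04".b + ("\x00".b * 100)
backup_path = File.join(Dir.tmpdir, "backup_example.zip")
puts save_backup(fake_zip, backup_path)
```

This still holds the whole response in memory before writing, which may matter for a download of roughly 222 MB as reported above.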

ruby net/http `read_body': Net::HTTPOK#read_body called twice (IOError)

I'm getting read_body called twice (IOError) using the net/http library. I'm trying to download files and use http sessions efficiently. Looking for some help or advice to fix my issues. From my debug message it appears when I log the response code, readbody=true. Is that why read_body is read twice when I try to write the large file in chunks?
D, [2015-04-12T21:17:46.954928 #24741] DEBUG -- : #<Net::HTTPOK 200 OK readbody=true>
I, [2015-04-12T21:17:46.955060 #24741] INFO -- : file found at http://hidden:8080/job/project/1/maven-repository/repository/org/project/service/1/service-1.zip.md5
/usr/lib/ruby/2.2.0/net/http/response.rb:195:in `read_body': Net::HTTPOK#read_body called twice (IOError)
from ./deploy_application.rb:36:in `block in get_file'
from ./deploy_application.rb:35:in `open'
from ./deploy_application.rb:35:in `get_file'
from ./deploy_application.rb:59:in `block in <main>'
from ./deploy_application.rb:58:in `each'
from ./deploy_application.rb:58:in `<main>'
require 'net/http'
require 'logger'
STAMP = Time.now.utc.to_i
@log = Logger.new(STDOUT)
# project , build, service remove variables above
project = "project"
build = "1"
service = "service"
version = "1"
BASE_URI = URI("http://hidden:8080/job/#{project}/#{build}/maven-repository/repository/org/#{service}/#{version}/")
# file pattern for application is zip / jar. Hopefully the lib in the zipfile is acceptable.
# example for module download /#{service}/#{version}.zip /#{service}/#{version}.zip.md5 /#{service}/#{version}.jar /#{service}/#{version}.jar.md5
def clean_exit(code)
# remove temp files on exit
end
def get_file(file)
puts BASE_URI
uri = URI.join(BASE_URI,file)
@log.debug(uri)
request = Net::HTTP::Get.new uri #.request_uri
@log.debug(request)
response = @http.request request
@log.debug(response)
case response
when Net::HTTPOK
size = 0
progress = 0
total = response.header["Content-Length"].to_i
@log.info("file found at #{uri}")
# need to handle file open error
Dir.mkdir "/tmp/#{STAMP}"
File.open "/tmp/#{STAMP}/#{file}", 'wb' do |io|
response.read_body do |chunk|
size += chunk.size
new_progress = (size * 100) / total
unless new_progress == progress
@log.info("\rDownloading %s (%3d%%) " % [file, new_progress])
end
progress = new_progress
io.write chunk
end
end
when 404
@log.error("maven repository file #{uri} not found")
exit 4
when 500...600
@log.error("error getting #{uri}, server returned #{response.code}")
exit 5
else
@log.error("unknown http response code #{response.code}")
end
end
@http = Net::HTTP.new(BASE_URI.host, BASE_URI.port)
files = [ "#{service}-#{version}.zip.md5", "#{service}-#{version}.jar", "#{service}-#{version}.jar.md5" ].each do |file| #"#{service}-#{version}.zip",
get_file(file)
end
Edit: Revised answer!
Net::HTTP#request, when called without a block, will pre-emptively read the body. The documentation isn't clear about this, but it hints at it by suggesting that the body is not read if a block is passed.
If you want to make the request without reading the body, you'll need to pass a block to the request call, and then read the body from within that. That is, you want something like this:
@http.request request do |response|
# ...
response.read_body do |chunk|
# ...
end
end
This is made clear in the implementation; Response#reading_body will first yield the unread response to a block if given (from #transport_request, which is called from #request), then read the body unconditionally. The block parameter to #request gives you that chance to intercept the response before the body is read.
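A fuller, self-contained sketch of the block form is below. To keep it runnable without the real Maven repository, it starts a throwaway local TCP server that returns one fixed response; that server, the payload, the request path and the helper name download_to are all assumptions for the demo. The part that matters is that read_body is called inside the block passed to request, so the body is read only once, chunk by chunk.

```ruby
require "net/http"
require "socket"
require "tempfile"

# Stream a response body to path in chunks. Passing a block to
# Net::HTTP#request yields the response before its body has been read,
# so read_body can be called once with a chunk handler.
def download_to(http, request_path, path)
  http.request(Net::HTTP::Get.new(request_path)) do |response|
    raise "unexpected status #{response.code}" unless response.is_a?(Net::HTTPOK)
    File.open(path, "wb") do |io|
      response.read_body { |chunk| io.write(chunk) }
    end
  end
  File.size(path)
end

# Throwaway local server (demo only) that serves one fixed response.
payload = "x" * 50_000
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]
server_thread = Thread.new do
  client = server.accept
  # Consume the request line and headers up to the blank line.
  loop do
    line = client.gets
    break if line.nil? || line == "\r\n"
  end
  client.write("HTTP/1.1 200 OK\r\nContent-Length: #{payload.bytesize}\r\nConnection: close\r\n\r\n")
  client.write(payload)
  client.close
end

target = Tempfile.new(["download", ".bin"])
http = Net::HTTP.new("127.0.0.1", port, nil) # nil: ignore any proxy from ENV
bytes_written = http.start { |session| download_to(session, "/service-1.zip", target.path) }
server_thread.join
server.close
puts bytes_written
```

In the original script the same pattern would wrap the existing progress logic: move the case/when handling inside the block given to @http.request.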

How to get RSS feed in xml format for ruby script

I am using the following ruby script from this dashing widget that retrieves an RSS feed and parses it and sends that parsed title and description to a widget.
require 'net/http'
require 'uri'
require 'nokogiri'
require 'htmlentities'
news_feeds = {
"seattle-times" => "http://seattletimes.com/rss/home.xml",
}
Decoder = HTMLEntities.new
class News
def initialize(widget_id, feed)
@widget_id = widget_id
# pick apart feed into domain and path
uri = URI.parse(feed)
@path = uri.path
@http = Net::HTTP.new(uri.host)
end
def widget_id()
@widget_id
end
def latest_headlines()
response = @http.request(Net::HTTP::Get.new(@path))
doc = Nokogiri::XML(response.body)
news_headlines = [];
doc.xpath('//channel/item').each do |news_item|
title = clean_html( news_item.xpath('title').text )
summary = clean_html( news_item.xpath('description').text )
news_headlines.push({ title: title, description: summary })
end
news_headlines
end
def clean_html( html )
html = html.gsub(/<\/?[^>]*>/, "")
html = Decoder.decode( html )
return html
end
end
@News = []
news_feeds.each do |widget_id, feed|
begin
@News.push(News.new(widget_id, feed))
rescue Exception => e
puts e.to_s
end
end
SCHEDULER.every '60m', :first_in => 0 do |job|
@News.each do |news|
headlines = news.latest_headlines()
send_event(news.widget_id, { :headlines => headlines })
end
end
The example rss feed works correctly because the URL is for an xml file. However I want to use this for a different rss feed that does not provide an actual xml file. This rss feed I want is at http://www.ttc.ca/RSS/Service_Alerts/index.rss
This doesn't seem to display anything on the widget. Instead of using "http://www.ttc.ca/RSS/Service_Alerts/index.rss", I also tried "http://www.ttc.ca/RSS/Service_Alerts/index.rss?format=xml" and "view-source:http://www.ttc.ca/RSS/Service_Alerts/index.rss" but with no luck. Does anyone know how I can get the actual xml data related to this rss feed so that I can use it with this ruby script?
You're right, that link does not provide regular XML, so that script won't work in parsing it since it's written specifically to parse the example XML. The rss feed you're trying to parse is providing RDF XML and you can use the Rubygem: RDFXML to parse it.
Something like:
require 'nokogiri'
require 'rdf/rdfxml'
rss_feed = 'http://www.ttc.ca/RSS/Service_Alerts/index.rss'
RDF::RDFXML::Reader.open(rss_feed) do |reader|
# use reader to iterate over elements within the document
end
From here you can try learning how to use RDFXML to extract the content you want. I'd begin by inspecting the reader object for methods I could use:
puts reader.methods.sort - Object.methods
That will print out the reader's own methods. Look for one you might be able to use for your purposes, such as reader.each_entry.
To further dig down you can inspect what each entry looks like:
reader.each_entry do |entry|
puts "----here's an entry----"
puts entry.inspect
end
or see what methods you can call on the entry:
reader.each_entry do |entry|
puts "----here's an entry's methods----"
puts entry.methods.sort - Object.methods
break
end
I was able to crudely find some titles and descriptions using this hack job:
RDF::RDFXML::Reader.open('http://www.ttc.ca/RSS/Service_Alerts/index.rss') do |reader|
reader.each_object do |object|
puts object.to_s if object.is_a? RDF::Literal
end
end
# returns:
# TTC Service Alerts
# http://www.ttc.ca/Service_Advisories/index.jsp
# TTC Service Alerts.
# TTC.ca
# http://www.ttc.ca
# http://www.ttc.ca/images/ttc-main-logo.gif
# Service Advisory
# http://www.ttc.ca/Service_Advisories/all_service_alerts.jsp#Service+Advisory
# 196 York University Rocket route diverting northbound via Sentinel, Finch due to a collision that has closed the York U Bus way.
# - Affecting: Bus Routes: 196 York University Rocket
# 2013-12-17T13:49:03.800-05:00
# Service Advisory (2)
# http://www.ttc.ca/Service_Advisories/all_service_alerts.jsp#Service+Advisory+(2)
# 107B Keele North route diverting northbound via Keele, Lepage due to a collision that has closed the York U Bus way.
# - Affecting: Bus Routes: 107 Keele North
# 2013-12-17T13:51:08.347-05:00
But I couldn't quickly find a way to know which one was a title, and which a description :/
Finally, if you still can't find how to extract what you want, start a new question with this info.
Good luck!
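One way to tell titles from descriptions is to skip the RDF layer and read the feed as plain XML, matching elements by local name so the RSS 1.0 default namespace does not get in the way. The sketch below uses REXML rather than the rdf/rdfxml gem from the answer, and it runs against a made-up inline sample, not the live TTC feed. It assumes the feed follows the RSS 1.0 layout, with item elements as direct children of the rdf:RDF root; the helper name headlines_from_rss1 is hypothetical.

```ruby
require "rexml/document"

# Invented RSS 1.0 (RDF) sample; the real feed's contents will differ.
SAMPLE_FEED = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns="http://purl.org/rss/1.0/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel rdf:about="http://example.com/alerts">
      <title>Example Alerts</title>
    </channel>
    <item rdf:about="http://example.com/alerts#1">
      <title>Service Advisory</title>
      <description>Route 1 diverting via Main St.</description>
      <dc:date>2013-12-17T13:49:03-05:00</dc:date>
    </item>
    <item rdf:about="http://example.com/alerts#2">
      <title>Service Advisory (2)</title>
      <description>Route 2 delayed.</description>
    </item>
  </rdf:RDF>
XML

# Collect title/description pairs from every <item> under the root.
# Element#name is the local name, so namespace prefixes are ignored.
def headlines_from_rss1(xml)
  doc = REXML::Document.new(xml)
  headlines = []
  doc.root.each_element do |element|
    next unless element.name == "item"
    fields = {}
    element.each_element { |child| fields[child.name] = child.text.to_s.strip }
    headlines << { title: fields["title"], description: fields["description"] }
  end
  headlines
end

headlines_from_rss1(SAMPLE_FEED).each do |headline|
  puts "#{headline[:title]}: #{headline[:description]}"
end
```

The returned hashes have the same shape as the ones latest_headlines builds in the widget script, so the result could be passed to send_event unchanged.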

send_file for a tempfile in Sinatra

I'm trying to use Sinatra's built-in send_file command but it doesn't seem to be working for tempfiles.
I basically do the following to zip an album of mp3s:
get '/example' do
songs = ...
file_name = "zip_test.zip"
t = Tempfile.new(['temp_zip', '.zip'])
# t = File.new("testfile.zip", "w")
Zip::ZipOutputStream.open(t.path) do |z|
songs.each do |song|
name = song.name
name += ".mp3" unless name.end_with?(".mp3")
z.put_next_entry(name)
z.print(open(song.url) {|f| f.read })
p song.name + ' added to file'
end
end
p t.path
p t.size
send_file t.path, :type => 'application/zip',
:disposition => 'attachment',
:filename => file_name,
:stream => false
t.close
t.unlink
end
When I use t = File.new(...) things work as expected, but I don't want to use File as it will have concurrency problems.
When I use t = Tempfile.new(...), I get:
!! Unexpected error while processing request: The file identified by body.to_path does not exist
Edit: It looks like part of the problem is that I'm sending multiple files. If I just send one song, the Tempfile system works as well.
My guess is that you have a typo in one of your song-names, or maybe a slash in one of the last parts of song.url? I adopted your code and if all the songs exist, sending the zip as a tempfile works perfectly fine.
