Why does open('uri').read delete the response data? - ruby

If I open a URI and read the response as follows:
response = open("https://www.example.com")
result = response.read
That works fine, but if I then call response.read again an empty string is returned. This seems like odd behavior. Why is this the case?

It's because OpenURI is returning a Tempfile object, which is a special implementation of the File class:
A Tempfile objects behaves just like a File object, and you can perform all the usual file operations on it: reading data, writing data, changing its permissions, etc. So although this class does not explicitly document all instance methods supported by File, you can in fact call any File instance method on a Tempfile object.
File in turn inherits from IO, so when you call read you're calling IO's implementation of the method.
In other words, response.read reads a file from the current position until end of file and leaves the position there. A second read starts at end of file, where nothing is left, so it returns an empty string. (For small responses OpenURI may hand back a StringIO instead of a Tempfile; it keeps a read position in the same way.)
Here's one way to examine this and see what's going on:
require 'open-uri'
# On Ruby 3.0+ Kernel#open no longer opens URLs; URI.open (Ruby 2.5+) does.
response = URI.open('http://google.com')
puts response.class # => Tempfile
puts response.read # => <!doctype html><html ...
puts response.pos # => 10941
puts response.read # => ""
response.rewind
puts response.pos # => 0
puts response.read # => <!doctype html><html ...
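The same behavior can be reproduced without any network access. This is a minimal sketch that uses a plain Tempfile as a stand-in for the object OpenURI returns; the file name prefix and the HTML string are made up for illustration.

```ruby
require 'tempfile'

# Stand-in for the object OpenURI returns; no network access needed.
body = Tempfile.new('response')
body.write('<!doctype html><html></html>')
body.rewind

first  = body.read  # reads from the current position up to end of file
second = body.read  # position is already at end of file, so nothing is left
body.rewind         # move the position back to the start
third  = body.read  # the full contents again

body.close!         # close and delete the temporary file
```

If you need the body more than once, it is usually simpler to keep the string from the first read in a variable than to rewind and read again.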

Related

How can I access the raw request body in ruby CGI scripts?

In a Ruby script that I run as a CGI program, I need to access the body of an HTTP POST request. The request body contains JSON data:
{"data":"a"}
I want to take the whole body and parse it with JSON.parse to process it. What's the canonical way to do this? The Ruby docs don't mention the request body.
I only found a hint in a blog post that
CGI tries to parse the request body as form parameters so a blob of JSON awkwardly ends up as the one and only parameter key.
This approach seems to work
puts cgi.params.keys.first # prints {"data":"a"}
but fails as soon as the value for data is a base64 encoded string that contains an = for padding: Using this body
{"data":"a="}
results in the following output (characters missing at the end):
puts cgi.params.keys.first # prints {"data":"a
What's the correct approach to solve this?
As you might already know, when parameters and their values are urlencoded they are delimited with an =: name=Theo&language=ruby and so on.
This is why the name of the first parameter stops at the character before the =. The approach of using the first key, as described in that blog post, isn't really reliable.
Instead, in a CGI script you can read the request body directly from stdin e.g.
request_body = $stdin.read
Note, when you instantiate a CGI object it will read in everything from stdin and attempt to parse it into the params hash.
This means that if you'd still like to use the cgi library for building your response you'll need to read from stdin earlier in the code, before creating the CGI object. e.g.
# minimal example that just outputs the request body
require 'cgi'
request_body = $stdin.read
cgi = CGI.new
cgi.out("status" => "OK", "type" => "text/plain", "connection" => "close") do
request_body
end
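Putting the pieces together, reading the body and parsing it as JSON could look like the sketch below. The helper name read_json_body and its two parameters are assumptions made for illustration (they let the example run outside a web server); in a real CGI script you would call it with the defaults, before creating any CGI object.

```ruby
require 'json'
require 'stringio'

# Hypothetical helper: read the raw POST body from `input` and parse it as JSON.
# `env` stands in for ENV so the example can run without a web server.
def read_json_body(input = $stdin, env = ENV)
  length = env['CONTENT_LENGTH'].to_i
  # Read exactly CONTENT_LENGTH bytes when the server provides it, else everything.
  raw = length > 0 ? input.read(length) : input.read
  JSON.parse(raw.to_s)
end

# Simulated request whose value contains "=", the case that broke cgi.params.
fake_body = '{"data":"a="}'
fake_env  = { 'CONTENT_LENGTH' => fake_body.bytesize.to_s }

parsed = read_json_body(StringIO.new(fake_body), fake_env)
puts parsed['data'] # prints a=
```

Because the body is never split on = or &, base64 padding survives intact. An empty body makes JSON.parse raise, so a real script would probably want to rescue JSON::ParserError.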
Apparently there is no easy solution for this in Ruby.
But there are two ways you can achieve this.
Redefine the CGI::parse method.
This method in the CGI module parses both POST and GET parameters into the params hash. You can redefine it in your code so that it adds an extra entry called RAW_DATA to the params hash.
def CGI::parse(query)
  params = {}
  query.split(/[&;]/).each do |pairs|
    key, value = pairs.split('=', 2).collect { |v| CGI::unescape(v) }
    next unless key
    params[key] ||= []
    params[key].push(value) if value
  end
  # Add RAW_DATA to params
  params[:RAW_DATA] = query
  params.default = [].freeze
  params
end
Use $stdin.read() before creating CGI instance.
But this may prevent you from making use of other CGI features.
So you may replace $stdin temporarily with a StringIO object.
require 'cgi'
require 'stringio'
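raw_data = $stdin.read
real_stdin = $stdin
# As far as I can tell, CGI reads the request body through $stdin, so swapping
# the global is enough. Reassigning the STDIN constant is left out because it
# triggers an "already initialized constant" warning.
$stdin = StringIO.new(raw_data)
begin
  cgi = CGI.new
  # Your CGI code here
  # ........
ensure
  # Restore the real stdin even if the CGI code raises
  $stdin = real_stdin
end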

How do I call a function in Ruby?

I'm trying to call a method but I keep getting an error. This is my code:
require 'rubygems'
require 'net/http'
require 'uri'
require 'json'
class AlchemyAPI
#Setup the endpoints
@@ENDPOINTS = {}
@@ENDPOINTS['taxonomy'] = {}
@@ENDPOINTS['taxonomy']['url'] = '/url/URLGetRankedTaxonomy'
@@ENDPOINTS['taxonomy']['text'] = '/text/TextGetRankedTaxonomy'
@@ENDPOINTS['taxonomy']['html'] = '/html/HTMLGetRankedTaxonomy'
@@BASE_URL = 'http://access.alchemyapi.com/calls'
def initialize()
begin
key = File.read('C:\Users\KVadher\Desktop\api_key.txt')
key.strip!
if key.empty?
#The key file shouldn't be blank
puts 'The api_key.txt file appears to be blank, please copy/paste your API key in the file: api_key.txt'
puts 'If you do not have an API Key from AlchemyAPI please register for one at: http://www.alchemyapi.com/api/register.html'
Process.exit(1)
end
if key.length != 40
#Keys should be exactly 40 characters long
puts 'It appears that the key in api_key.txt is invalid. Please make sure the file only includes the API key, and it is the correct one.'
Process.exit(1)
end
@apiKey = key
rescue => err
#The file doesn't exist, so show the message and create the file.
puts 'API Key not found! Please copy/paste your API key into the file: api_key.txt'
puts 'If you do not have an API Key from AlchemyAPI please register for one at: http://www.alchemyapi.com/api/register.html'
#create a blank file to hold the key
File.open("api_key.txt", "w") {}
Process.exit(1)
end
end
# Categorizes the text for a URL, text or HTML.
# For an overview, please refer to: http://www.alchemyapi.com/products/features/text-categorization/
# For the docs, please refer to: http://www.alchemyapi.com/api/taxonomy/
#
# INPUT:
# flavor -> which version of the call, i.e. url, text or html.
# data -> the data to analyze, either the url, text or html code.
# options -> various parameters that can be used to adjust how the API works, see below for more info on the available options.
#
# Available Options:
# showSourceText -> 0: disabled (default), 1: enabled.
#
# OUTPUT:
# The response, already converted from JSON to a Ruby object.
#
def taxonomy(flavor, data, options = {})
unless @@ENDPOINTS['taxonomy'].key?(flavor)
return { 'status'=>'ERROR', 'statusInfo'=>'Taxonomy info for ' + flavor + ' not available' }
end
#Add the URL encoded data to the options and analyze
options[flavor] = data
return analyze(@@ENDPOINTS['taxonomy'][flavor], options)
print
end
**taxonomy(text,"trees",1)**
end
I have marked my call with ** **. Am I doing something incorrect? The error I receive is:
C:/Users/KVadher/Desktop/testrub:139:in `<class:AlchemyAPI>': undefined local variable or method `text' for AlchemyAPI:Class (NameError)
from C:/Users/KVadher/Desktop/testrub:6:in `<main>'
I feel as though I'm calling as normal and that there is something wrong with the api code itself? Although I may be wrong.
Yes, as jon snow says, the method call must be outside of the class body. Inside the class body the methods are only being defined.
Also, options should be a Hash, not a number: the method does options[flavor] = data, which would cause your next problem if you pass 1.
I believe you meant to put text in quotes, since "text" is one of your flavors.
Furthermore, because taxonomy is defined inside a class, it is an instance method, and you must create an instance of the class to call it:
my_instance = AlchemyAPI.new
my_taxonomy = my_instance.taxonomy("text", "trees")
That's enough to get the call itself to work, though it seems like you have a ways to go to get everything working. Good luck!
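The points above can be tried out without an API key. The sketch below uses a made-up class, Categorizer, as a stand-in for AlchemyAPI; its FLAVORS constant and return values are assumptions for illustration and do not reflect the real library.

```ruby
# Hypothetical stand-in for the AlchemyAPI class; no network calls are made.
class Categorizer
  FLAVORS = %w[url text html].freeze

  # Instance method: it can only be called on an instance of Categorizer.
  def taxonomy(flavor, data, options = {})
    unless FLAVORS.include?(flavor)
      return { 'status' => 'ERROR', 'statusInfo' => "Taxonomy info for #{flavor} not available" }
    end
    # options must be a Hash for this step to work
    options = options.merge(flavor => data)
    { 'status' => 'OK', 'options' => options }
  end
end

# The call lives outside the class body and goes through an instance.
categorizer = Categorizer.new
ok  = categorizer.taxonomy('text', 'trees', { 'showSourceText' => 1 })
bad = categorizer.taxonomy('pdf', 'trees')
```

Here ok carries the merged options and bad carries the error hash, which mirrors the two paths through the original taxonomy method.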

Retrieve a file in Ruby

So what I am trying to do is pass a file name into a method and check if the file is closed. What I am struggling with is getting a file object from the file name without actually opening the file.
def file_is_closed(file_name)
file = # The method I am looking for
file.closed?
end
I have to fill in the commented part. I tried using the load_file method from the YAML module but I think that gives the content of the file instead of the actual file.
I couldn't find a method in the File module to call. Is there a method maybe that I don't know?
File#closed? returns whether that particular File object is closed, so there is no method that is going to make your current attempted solution work:
f1 = File.new("test.file")
f2 = File.new("test.file")
f1.close
f1.closed? # => true # Even though f2 still has the same file open
It would be best to retain the File object that you're using in order to ask it if it is closed, if possible.
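One way to retain the File objects is to open files through a small registry and ask it later. This is only a sketch: FileRegistry and its methods are hypothetical names, and it only knows about files opened through it.

```ruby
require 'tempfile'

# Hypothetical registry that remembers the File objects this program opened.
class FileRegistry
  def initialize
    @files = Hash.new { |hash, path| hash[path] = [] }
  end

  # Open a file and remember the resulting File object under its absolute path.
  def open(path, mode = 'r')
    file = File.open(path, mode)
    @files[File.absolute_path(path)] << file
    file
  end

  # True when no File opened through this registry is still open for the path.
  def closed?(path)
    @files.fetch(File.absolute_path(path), []).all?(&:closed?)
  end
end

scratch  = Tempfile.new('registry-demo') # throwaway file to open twice
registry = FileRegistry.new
f1 = registry.open(scratch.path)
f2 = registry.open(scratch.path)

f1.close
closed_after_first = registry.closed?(scratch.path)  # f2 is still open
f2.close
closed_after_second = registry.closed?(scratch.path) # nothing left open
scratch.close!
```

A path that was never opened through the registry is reported as closed, which may or may not be what you want.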
If you really want to know if your current Ruby process has any File objects open for a particular path, something like this feels hack-ish but should mostly work:
def file_is_closed?(file_name)
ObjectSpace.each_object(File) do |f|
if File.absolute_path(f) == File.absolute_path(file_name) && !f.closed?
return false
end
end
true
end
I wouldn't vouch for how well that handles corner cases, but it seems to work for me in general:
f1 = File.new("test.file")
f2 = File.new("test.file")
file_is_closed?("test.file") # => false
f1.close
file_is_closed?("test.file") # => false
f2.close
file_is_closed?("test.file") # => true
If you want to know if any process has the file open, I think you'll need to resort to something external like lsof.
For those cases where you no longer have access to the original file objects in Ruby (after fork + exec, for instance), a list of open file descriptors is available in /proc/pid/fd. Each file there is named for the file descriptor number, and is a symlink to the opened file, pipe, or socket:
# Returns hash in form fd => filename
def open_file_descriptors
Hash[
Dir.glob( File.join( '/proc', Process.pid.to_s, 'fd', '*' ) ).
map { |fn| [File.basename(fn).to_i, File.readlink(fn)] rescue [nil, nil] }.
delete_if { |fd, fn| fd.nil? or fd < 3 }
]
end
# Return IO object for the named file, or nil if it's not open
def io_for_path(path)
fd, fn = open_file_descriptors.find {|k,v| path === v}
fd.nil? ? nil : IO.for_fd(fd)
end
# close an open file
file = io_for_path('/my/open/file')
file.close unless file.nil?
The open_file_descriptors method parses the fd directory and returns a hash like {3 => '/my/open/file'}. It is then a simple matter to get the file descriptor number for the desired file, and have Ruby produce an IO object for it with for_fd.
This assumes you are on Linux, of course.

Ruby: how can use the dump method to output data to a csv file?

I am trying to use Ruby's standard csv library to dump an array of objects to a CSV file called 'a.csv':
http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html#method-c-dump
dump(ary_of_objs, io = "", options = Hash.new)
But how can I make this method dump into a file?
I could not find any examples of this, either in the docs or by searching.
Also, the docs said that...
The next method you can provide is an instance method called
csv_headers(). This method is expected to return the second line of
the document (again as an Array), which is to be used to give each
column a header. By default, ::load will set an instance variable if
the field header starts with an @ character or call send() passing the
header as the method name and the field value as an argument. This
method is only called on the first object of the Array.
Anyone knows how to pass the instance method csv_headers() to this dump function?
I haven't tested this out yet, but it looks like io should be set to a file. According to the doc you linked "The io parameter can be used to serialize to a File"
Something like:
File.open("filename", "w") do |f| # "w" so the file is opened for writing
  CSV.dump(ary_of_objs, f)
end
The accepted answer doesn't really answer the question so I thought I'd give a useful example.
First of all if you look at the docs at http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html, if you hover over the method name for dump you see you can click to show source. If you do that you'll see that the dump method attempts to call csv_headers on the first object you pass in from ary_of_objs:
obj_template = ary_of_objs.first
...snip...
headers = obj_template.csv_headers
Then later you see that the method will call csv_dump on each object in ary_of_objs and pass in the headers:
ary_of_objs.each do |obj|
begin
csv << obj.csv_dump(headers)
rescue NoMethodError
csv << headers.map do |var|
if var[0] == ?@
obj.instance_variable_get(var)
else
obj[var[0..-2]]
end
end
end
end
So we need each entry in ary_of_objs to respond to those two methods. Here's an example wrapper class that takes a Hash, returns the hash keys as the CSV headers, and dumps each row based on those headers.
class CsvRowDump
  def initialize(row_hash)
    @row = row_hash
  end

  def csv_headers
    @row.keys
  end

  def csv_dump(headers)
    headers.map { |h| @row[h] }
  end
end
There's one more catch though. This dump method wants to write an extra line at the top of the CSV file before the headers, and there's no way to skip that if you call this method due to this code at the top:
# write meta information
begin
csv << obj_template.class.csv_meta
rescue NoMethodError
csv << [:class, obj_template.class]
end
Even if you return '' from CsvRowDump.csv_meta, that will still be a blank line where a parser expects the headers. So instead, let's let dump write that line and then remove it afterwards when we call dump. This example assumes you have an array of hashes that all have the same keys (which will be the CSV header).
@rows = @hashes.map { |h| CsvRowDump.new(h) }
File.open(@filename, "wb") do |f|
  str = CSV::dump(@rows)
  f.write(str.split(/\n/)[1..-1].join("\n"))
end
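If CSV.dump is not available in your Ruby (it is documented for 1.9.3, and I believe it was dropped in later releases), the same header-plus-rows file can be produced with CSV.generate. This is a sketch under the same assumption as above, namely a non-empty array of hashes that all share the same keys; the sample data and the helper name hashes_to_csv are made up.

```ruby
require 'csv'

# Hypothetical sample data; every hash is assumed to share the same keys.
rows = [
  { 'name' => 'Theo', 'language' => 'ruby' },
  { 'name' => 'Ada',  'language' => 'math' }
]

# Hypothetical helper: builds CSV text with one header row and one row per hash.
def hashes_to_csv(rows)
  headers = rows.first.keys
  CSV.generate do |csv|
    csv << headers
    rows.each { |row| csv << row.values_at(*headers) }
  end
end

csv_text = hashes_to_csv(rows)
# To write it to disk: File.write('a.csv', csv_text)
```

This avoids the extra meta line entirely, so nothing has to be stripped afterwards.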

understanding Ruby code?

I was wondering if anyone can help me understand the Ruby code below? I'm pretty new to Ruby programming and am having trouble understanding what each function does.
When I run this with my Twitter username and password as parameters, I get a stream of Twitter feed samples. What do I need to do with this code to display only the hashtags?
I'm trying to gather the hashtags every 30 seconds, then sort them from the least to the most occurrences.
Not looking for solutions, but for ideas. Thanks!
require 'eventmachine'
require 'em-http'
require 'json'
usage = "#{$0} <user> <password>"
abort usage unless user = ARGV.shift
abort usage unless password = ARGV.shift
url = 'https://stream.twitter.com/1/statuses/sample.json'
def handle_tweet(tweet)
return unless tweet['text']
puts "#{tweet['user']['screen_name']}: #{tweet['text']}"
end
EventMachine.run do
http = EventMachine::HttpRequest.new(url).get :head => { 'Authorization' => [ user, password ] }
buffer = ""
http.stream do |chunk|
buffer += chunk
while line = buffer.slice!(/.+\r?\n/)
handle_tweet JSON.parse(line)
end
end
end
puts "#{tweet['user']['screen_name']}: #{tweet['text']}"
That line shows you a user name followed by the content of the tweet.
Let's take a step back for a sec.
Hash tags appear inside the tweet's content--this means they're inside tweet['text']. A hash tag always takes the form of a # followed by a bunch of non-space characters. That's really easy to grab with a regex. Ruby's core API facilitates that via String#scan. Example:
"twitter is short #foo yawn #bar".scan(/\#\w+/) # => ["#foo", "#bar"]
What you want is something like this:
def handle_tweet(tweet)
return unless tweet['text']
# puts "#{tweet['user']['screen_name']}: #{tweet['text']}" # OLD
puts tweet['text'].scan(/\#\w+/).to_s
end
tweet['text'].scan(/#\w+/) is an array of strings. You can do whatever you want with that array. Supposing you're new to Ruby and want to print the hash tags to the console, here's a brief note about printing arrays with puts:
puts array # => "#foo\n#bar"
puts array.to_s # => '["#foo", "#bar"]'
#Load Libraries
require 'eventmachine'
require 'em-http'
require 'json'
# Looks like this section assumes you're calling this from commandline.
usage = "#{$0} <user> <password>" # $0 returns the name of the program
abort usage unless user = ARGV.shift # Return first argument passed when program called
abort usage unless password = ARGV.shift
# The URL
url = 'https://stream.twitter.com/1/statuses/sample.json'
# method which, when called later, prints out the tweets
def handle_tweet(tweet)
return unless tweet['text'] # Ensures tweet object has 'text' property
puts "#{tweet['user']['screen_name']}: #{tweet['text']}" # write the result
end
# Create an HTTP request obj to URL above with user authorization
EventMachine.run do
http = EventMachine::HttpRequest.new(url).get :head => { 'Authorization' => [ user, password ] }
# Initiate an empty string for the buffer
buffer = ""
# Read the stream by line
http.stream do |chunk|
buffer += chunk
while line = buffer.slice!(/.+\r?\n/) # cut each line at newline
handle_tweet JSON.parse(line) # send each tweet object to handle_tweet method
end
end
end
Here's a commented version of what the source is doing. If you just want the hashtag, you'll want to rewrite handle_tweet to something like this:
def handle_tweet(tweet)
  return unless tweet['text'] # skip messages that carry no text
  tweet['text'].scan(/#\w+/) do |tag|
    puts tag
  end
end
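For the "every 30 seconds, sorted from least to most" part of the question, one idea is to tally the tags in a hash and flush it on a timer. The sketch below shows only the tallying; HashtagCounter and its methods are hypothetical names, and folding tags to lower case is an assumption about what should count as the same tag.

```ruby
# Hypothetical helper: tallies hashtags and reports them from least to most frequent.
class HashtagCounter
  def initialize
    @counts = Hash.new(0)
  end

  # Extract hashtags from a tweet's text and count them (case-insensitively).
  def add(text)
    text.to_s.scan(/#\w+/).each { |tag| @counts[tag.downcase] += 1 }
  end

  # Returns [[tag, count], ...] sorted ascending by count, then clears the tally.
  def flush
    report = @counts.sort_by { |tag, count| [count, tag] }
    @counts = Hash.new(0)
    report
  end
end

counter = HashtagCounter.new
counter.add('twitter is short #foo yawn #bar')
counter.add('more #foo')
counter.add('no tags here')
report = counter.flush # [["#bar", 1], ["#foo", 2]]
```

Inside the EventMachine.run block, handle_tweet could call counter.add(tweet['text']), and EventMachine.add_periodic_timer(30) { p counter.flush } could print and reset the tally every 30 seconds.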
