How to find the file path from the open command - Ruby

I need to get the path of the file in the fo variable so that I can pass it to the unzip_file function. How do I get the path here?
require 'open-uri'
url = 'http://www.dtniq.com/product/mktsymbols_v2.zip'
open(url, 'r') do |fo|
puts "unzipfile "
unzip_file(fo, "c:\\temp11\\")
end

In terms of how to do it, I would do this:
1. Find out the class of the object I am dealing with:
ruby-1.9.2-p290 :001 > tmp_file = open('tmp.txt', 'r')
=> #<File:tmp.txt>
ruby-1.9.2-p290 :001 > tmp_file.class
=> File
2. Go look up the documentation for that class.
Google search: ruby file
This returns "Class: File" on ruby-doc.org => www.ruby-doc.org/core/classes/File.html
3. Look at the methods. There is one called path, which looks interesting.
4. If I haven't found an answer by now, then:
a. Continue looking around Google/Stack Overflow for a bit.
b. If I really can't find a solution that matches my problem, it's time to ask a question here.
Most of the time, steps 1 to 3 should get you what you need. Once you learn to read the documentation, you can do things a lot quicker; the hard part is getting into the docs when you first start.
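The same investigation can be done from inside irb, since every object can report its class and the methods it responds to. A minimal sketch, assuming a scratch file tmp.txt written to the system temp directory (not part of the original answer):

```ruby
require 'tmpdir'

# Scratch file standing in for the tmp.txt used in the console session above.
path = File.join(Dir.tmpdir, 'tmp.txt')
File.write(path, "hello\n")
tmp_file = File.open(path, 'r')

p tmp_file.class                      # prints File
p tmp_file.respond_to?(:path)         # prints true
p tmp_file.methods.grep(/path/).sort  # method names mentioning "path", such as :path and :to_path
p tmp_file.path == path               # prints true
tmp_file.close
```

Grepping the method list is a quick way to narrow things down before opening the class documentation.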

The fo in your block should be a Tempfile so you can use the path method:
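require 'open-uri'
url = 'http://www.dtniq.com/product/mktsymbols_v2.zip'
# On Ruby 3.0+, open-uri no longer overrides Kernel#open; use URI.open(url, 'r') there.
open(url, 'r') do |fo|
  puts "unzipfile "
  unzip_file(fo.path, "c:\\temp11\\")
end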
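One caveat: open-uri only buffers a download in a Tempfile once it grows past roughly 10 KB; smaller responses arrive as a StringIO, which has no path method. A minimal sketch that copes with both, assuming unzip_file accepts a path string; the helper name with_local_path is hypothetical:

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper: yields a filesystem path for whatever IO open-uri handed over.
# A Tempfile (large downloads) already lives on disk, so its own path is used.
# A StringIO (small downloads) has no path, so its bytes are copied into a
# temporary file that is deleted again when the block returns.
def with_local_path(io)
  if io.respond_to?(:path) && io.path
    yield io.path
  else
    Tempfile.create(['download', '.zip']) do |tmp|
      tmp.binmode
      tmp.write(io.read)
      tmp.flush
      yield tmp.path
    end
  end
end
```

Inside the original block this would be called as with_local_path(fo) { |path| unzip_file(path, "c:\\temp11\\") }. Note that the path is only valid inside that block in the StringIO case.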

Related

XPath search using LibXML + Ruby

I am trying to search for a specific node in an XML file using XPath. This search worked fine under REXML, but REXML was too slow for large XML documents, so I moved over to LibXML.
My simple example processes a Yum repomd.xml file; an example can be found here: http://mirror.san.fastserv.com/pub/linux/centos/6/os/x86_64/repodata/repomd.xml
My test script is as follows:
require 'rubygems'
require 'libxml'
p = LibXML::XML::Parser.file( "/tmp/dr.xml")
repomd = p.parse
filelist = repomd.find_first("/repomd/data[@type='filelists']/location@href")
puts "Length: " + filelist.length.to_s
filelist.each do |f|
puts f.attributes['href']
end
I get this error:
Error: Invalid expression.
/usr/lib/ruby/gems/1.8/gems/libxml-ruby-2.7.0/lib/libxml/document.rb:123:in `find': Error: Invalid expression. (LibXML::XML::Error)
from /usr/lib/ruby/gems/1.8/gems/libxml-ruby-2.7.0/lib/libxml/document.rb:123:in `find'
from /usr/lib/ruby/gems/1.8/gems/libxml-ruby-2.7.0/lib/libxml/document.rb:130:in `find_first'
from /tmp/scripty.rb:6
I have also tried simpler examples like the one below, but still no dice.
p = LibXML::XML::Parser.file( "/tmp/dr.xml")
repomd = p.parse
filelist = repomd.root.find(".//location")
puts "Length: " + filelist.length.to_s
In the above case I get the output:
Length: 0
Your inspired guidance would be greatly appreciated; I have searched for what I am doing wrong, and I just can't figure it out...
Here is some code that fetches the file and processes it; it still doesn't work...
require 'rubygems'
require 'open-uri'
require 'libxml'
raw_xml = open('http://mirror.san.fastserv.com/pub/linux/centos/6/os/x86_64/repodata/repomd.xml').read
p = LibXML::XML::Parser.string(raw_xml)
repomd = p.parse
filelist = repomd.find_first("//data[@type='filelists']/location[@href]")
puts "First: " + filelist
In the end I reverted to REXML and used stream processing. It was much faster, and its XPath implementation was much easier to work with.
Looking at your code, it seems you want to collect only those location elements which have an href attribute. If that's the case, the following should work:
"//data[@type='filelists']/location[@href]"

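Whichever expression you use, namespaces matter here too. repomd.xml files normally declare a default namespace (xmlns="http://linux.duke.edu/metadata/repo"), and libxml follows the XPath 1.0 rule that an unprefixed name such as data only matches elements in no namespace, which would explain the Length: 0 result; with libxml-ruby you would need to register a prefix for that URI and use it in the expression. REXML's default (non-strict) mode does not require a prefix for elements in the default namespace. A minimal sketch using REXML on a small, made-up repomd.xml fragment (the href values are illustrative, not taken from the real file):

```ruby
require 'rexml/document'

# Hypothetical, trimmed-down repomd.xml; note the default namespace on the root.
xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <repomd xmlns="http://linux.duke.edu/metadata/repo">
    <data type="primary">
      <location href="repodata/primary.xml.gz"/>
    </data>
    <data type="filelists">
      <location href="repodata/filelists.xml.gz"/>
    </data>
  </repomd>
XML

doc = REXML::Document.new(xml)

# Select the location element of the filelists entry, then read its href attribute.
location = REXML::XPath.first(doc, "//data[@type='filelists']/location")
href = location && location.attributes['href']
puts href
```
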
Reading several URIs in Ruby

I need to read the contents of a web page several times and extract some information from it, for which I use regular expressions. I am using open-uri to read the contents of the page, and the sample code I wrote is as follows:
require 'open-uri'
def getResults(words)
results = []
words.each do |word|
results.push getAResult(word)
end
results
end
def getAResult(word)
file = open("http://www.somapage.com?option=#{word}")
contents = file.read
file.close
contents.match /some-regex-here/
$1.empty? ? -1 : $1.to_f
end
The problem is that unless I comment out the file.close line, getAResult always returns -1. When I try this code in the console, getAResult immediately returns -1, but the ruby process keeps running for another two to three seconds or so.
If I remove the file.close line, getAResult returns the correct result, but getResults now returns a bunch of -1s except for the first one. I tried using the curb gem to read the page, but a similar problem appears.
This seems like an issue related to threading. However, I couldn't come up with anything reasonable to search for to find a corresponding solution. What do you think the problem is?
NOTE: The web page I am trying to read does not return results very fast. It takes some time.
Try Hpricot or Nokogiri; they can search your HTML document via XPath.
$1 only holds something when your regex has a capture group (parentheses), as this console session shows:
1.9.3-327 (main):0 > contents.match /div/
=> #<MatchData "div">
1.9.3-327 (main):0 > $1
=> nil
1.9.3-327 (main):0 > contents.match /(div)/
=> #<MatchData "div" 1:"div">
1.9.3-327 (main):0 > $1
=> "div"
In Ruby, $~ and the $n regexp variables are thread-local and method-local, so thread safety is not really the issue here; still, it is clearer to capture your result directly, like this:
value = contents[/regexp/]
Here is a more idiomatic version of that method:
require 'open-uri'

def getAResult(word)
  contents = open("http://www.somapage.com?option=#{word}") { |f| f.read }
  value = contents[/some-regex-here/]   # nil when the regex does not match
  value.nil? || value.empty? ? -1 : value.to_f
end
The block form of #open (as above) automatically closes the file when you are done with it.
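If you want the number inside the match rather than the whole match, String#[] also takes a capture-group index, which returns that group directly (or nil) without touching $~ or $1. A minimal sketch; the helper name extract_result and the pattern /result: ([\d.]+)/ are hypothetical stand-ins for the real page's markup:

```ruby
# Hypothetical helper: returns the first captured number in a page body, or -1.
# contents[pattern, 1] is the text of capture group 1, or nil when nothing matches.
def extract_result(contents, pattern = /result: ([\d.]+)/)
  value = contents[pattern, 1]
  value.nil? || value.empty? ? -1 : value.to_f
end

p extract_result('<p>result: 3.75</p>')  # prints 3.75
p extract_result('<p>no data</p>')       # prints -1
```

With a helper like this, getResults itself can shrink to words.map { |word| getAResult(word) }.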

How do I avoid EOFError with Ruby script?

I have a Ruby script (1.9.2p290) where I am trying to call a number of URLs and then append information from those URLs to a file. The issue is that I keep getting an end-of-file error, EOFError. An example of what I'm trying to do is:
require "open-uri"
proxy_uri = URI.parse("http://IP:PORT")
somefile = File.open("outputlist.txt", 'a')
output = []
(1..100).each do |num|
  page = open("SOMEURL#{num}", :proxy => proxy_uri).read
  pattern = "<img"
  tags = page.scan(pattern)
  output << tags.length
end
somefile.puts output
somefile.close
I don't know why I keep getting this end-of-file error, or how I can avoid it. I think it might have something to do with the URL I'm calling (based on some discussion here: What is an EOFError in Ruby file I/O?), but I'm not sure why that would affect the I/O or cause an end-of-file error.
Any thoughts on what I might be doing wrong here or how I can get this to work?
Thanks in advance!
The way you are writing your file isn't idiomatic Ruby. This should work better:
output = []
(1..100).each do |num|
  page = open("SOMEURL#{num}", :proxy => proxy_uri).read
  pattern = "<img"
  tags = page.scan(pattern)
  output << tags.length
end
File.open("outputlist.txt", 'a') do |fo|
  fo.puts output
end
I suspect the file is being closed because it is opened and then not written to while 100 pages are processed. If that takes a while, I can see why it might be closed to stop apps from using up all the file handles. Writing it the Ruby way closes the file immediately after the write, so no handle is held open unnecessarily.
As a secondary point, rather than using a simple pattern match to locate image tags, use a real HTML parser. There will be little difference in processing speed, but you will potentially get more accurate results.
Replace:
page = open("SOMEURL#{num}", :proxy => proxy_uri).read
pattern = "<img"
tags = page.scan(pattern)
output << tags.length
with:
require 'nokogiri'
doc = Nokogiri::HTML(open("SOMEURL#{num}", :proxy => proxy_uri))
output << doc.search('img').size
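Separately from the file handling, an EOFError in a loop like this is commonly raised by Net::HTTP (which open-uri uses) when the server or proxy closes the connection before the response is complete. A minimal sketch of retrying a fetch a few times before giving up; with_retries is a hypothetical helper, the attempt count and delay are arbitrary, and the flaky lambda only simulates a dropped connection:

```ruby
# Hypothetical helper: runs the block, retrying on EOFError up to `attempts` times.
def with_retries(attempts: 3, delay: 0.5)
  tries = 0
  begin
    tries += 1
    yield
  rescue EOFError
    raise if tries >= attempts   # out of attempts: let the caller see the error
    sleep delay
    retry
  end
end

# Simulated fetch that drops the connection twice before succeeding.
calls = 0
flaky = lambda do
  calls += 1
  raise EOFError, 'end of file reached' if calls < 3
  '<img src="a.png"><img src="b.png">'
end

page = with_retries(delay: 0) { flaky.call }
p page.scan('<img').length  # prints 2
p calls                      # prints 3
```

In the loop above, the fetch would become page = with_retries { open("SOMEURL#{num}", :proxy => proxy_uri).read }.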

Reading file with Ruby returns strange output

I am trying to read in a JSON file with Ruby and the output is extremely strange. Here is the code that I am using:
require 'rubygems'
class ServiceCalls
def initialize ()
end
def getFile()
Dir.entries('./json').each do |mFile|
if mFile[0,1] != "."
self.sendServiceRequest(mFile)
end
end
end
def sendServiceRequest(mFile)
currentFile = File.new("./json/" + mFile, "r")
puts currentFile.read
currentFile.close
end
end
mServiceCalls = ServiceCalls.new
mServiceCalls.getFile
And here is the output:
Macintosh H??=A?v?P$66267945-2481-3907-B88A-1094AA9DAB6D??/??is32??????…?PNG
bookmark88?A[DT>??A?#
ApplicationsMAMPhtdocsServiceTestAutomationMDXservicecatalog-verizon.json$4T??
…
UIEvolutions-MacBook-Pro-109:MDXServiceTesting Banderson$ ruby testmdxservices.rb
bookmark88?A?,P>??A?#
ApplicationsMAMPhtdocsServiceTestAutomationMDXservicecatalog-adaptation.json$4T??
…
Macintosh H??=A?v?P$66267945-2481-3907-B88A-1094AA9DAB6D??/?<icns<?TOC his32?s8mic118il32?l8mic1?ic07ic13#ic08#ic14^?ic09_ic1?is32…?PNG
IHDR szz?iCCPICC Profile…
(several more lines of unprintable binary data follow)
Can someone please tell me what I am doing wrong? Do I need to specify what type of encoding I'm using? I have tried reading the file with gets, sysread, and another method I can't remember.
I am not completely sure why, but I believe it is the './json' path that is causing the issue. I tried the script on my Windows XP machine and got similar results.
However, when I rewrote the script to use File.dirname(__FILE__) instead of './', it worked. I also cleaned up some of the code.
class ServiceCalls
def get_file
dirname = File.join(File.dirname(__FILE__), 'json')
Dir.entries(dirname).each do |file|
unless file.start_with? '.'
File.open(File.join(dirname, file), 'r') {|f| puts f.read}
end
end
end
end
sc = ServiceCalls.new
sc.get_file
__FILE__ is the path of the current script, so unlike './', which is resolved against the current working directory, the path no longer depends on where you run the script from. File.join uses system-independent path separators. File.open, if you pass it a block, closes the file for you when the block completes. String#start_with? is a cleaner way to check the first character of a string than [0,1].
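If the goal is to use the JSON rather than print it, parsing each file is also a quick way to spot files that are not really JSON: they produce an error message instead of a screen of binary noise. A minimal sketch; load_json_files is a hypothetical helper, and it assumes the files of interest end in .json:

```ruby
require 'json'

# Hypothetical helper: parses every *.json file in dirname and returns
# { file name => parsed data }, or an error string for files that fail to parse.
def load_json_files(dirname)
  Dir.glob(File.join(dirname, '*.json')).sort.each_with_object({}) do |path, results|
    name = File.basename(path)
    begin
      results[name] = JSON.parse(File.read(path))
    rescue JSON::ParserError, EncodingError => e
      results[name] = "not valid JSON: #{e.message[0, 60]}"
    end
  end
end
```

It would be called as load_json_files(File.join(File.dirname(__FILE__), 'json')), which keeps the script-relative path from the answer above.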
Try this:
Dir.entries('./json').each do |mFile|
  next if ['.', '..'].include?(mFile)
  self.sendServiceRequest(mFile)
end

Ruby unable to use require

This is a newbie question as I am attempting to learn Ruby by myself, so apologies if it sounds like a silly question!
I am reading through the examples in why's (poignant) guide to ruby and am on chapter 4. I typed the code_words Hash into a file called wordlist.rb.
I opened another file, typed require 'wordlist.rb' as the first line, and typed the rest of the code as below:
#Get evil idea and swap in code
print "Enter your ideas "
idea = gets
code_words.each do |real, code|
idea.gsub!(real, code)
end
#Save the gibberish to a new file
print "File encoded, please enter a name to save the file"
ideas_name = gets.strip
File::open( 'idea-' + ideas_name + '.txt', 'w' ) do |f|
f << idea
end
When I execute this code, it fails with the following error message:
C:/MyCode/MyRubyCode/filecoder.rb:5: undefined local variable or method `code_words' for main:Object (NameError)
I use Windows XP and Ruby 1.8.6.
I know I should be setting something like a ClassPath, but I'm not sure where or how to do so!
Many thanks in advance!
While the top level of every file is executed in the same context (the main object), each file has its own scope for local variables. In other words, each file has its own set of local variables that can be accessed throughout that file, but not from other files.
On the other hand, constants (CodeWords), globals ($code_words) and methods (def code_words) would be accessible across files.
Some solutions:
CodeWords = {:real => "code"}
$code_words = {:real => "code"}
def code_words
{:real => "code"}
end
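The difference is easy to see by requiring a throwaway file. A minimal sketch; the file name wordlist_demo.rb and its contents are made up for illustration, and the <<~ heredoc needs Ruby 2.3 or later rather than the 1.8.6 from the question:

```ruby
require 'tmpdir'
require 'fileutils'

dir = Dir.mktmpdir
# A throwaway stand-in for wordlist.rb, defining the same hash three ways.
File.write(File.join(dir, 'wordlist_demo.rb'), <<~'RUBY')
  code_words  = { 'catapult' => 'chucky go-go' }  # local variable: visible only in this file
  CodeWords   = { 'catapult' => 'chucky go-go' }  # constant: visible after require
  $code_words = { 'catapult' => 'chucky go-go' }  # global: visible after require
RUBY
require File.join(dir, 'wordlist_demo')
FileUtils.remove_entry(dir)

p defined?(code_words)     # prints nil: the local did not cross the file boundary
p CodeWords['catapult']    # prints "chucky go-go"
p $code_words['catapult']  # prints "chucky go-go"
```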
An OO solution that is definitely too complex for this case:
# first file
class CodeWords
DEFAULT = {:real => "code"}
attr_reader :words
def initialize(words = nil)
@words = words || DEFAULT
end
end
# second file
print "Enter your ideas "
idea = gets
code_words = CodeWords.new
code_words.words.each do |real, code|
idea.gsub!(real, code)
end
#Save the gibberish to a new file
print "File encoded, please enter a name to save the file"
ideas_name = gets.strip
File::open( 'idea-' + ideas_name + '.txt', 'w' ) do |f|
f << idea
end
I think the problem might be that require executes the code in another context, so the local variable is no longer available after the require.
What you could try is making it a constant:
CodeWords = { :real => 'code' }
That will be available everywhere.
Here is some background on variable scopes etc.
I was just looking at the same example and was having the same problem.
What I did was change the variable name in both files from code_words to $code_words.
This makes it a global variable and thus accessible from both files, right?
My question is: wouldn't this be a simpler solution than making it a constant and having to write CodeWords = { :real => 'code' }, or is there a reason not to do it?
A simpler way would be to use the Marshal.dump feature to save the code words.
# Save to File
code_words = {
'starmonkeys' => 'Phil and Pete, those prickly chancellors of the New Reich',
'catapult' => 'chucky go-go', 'firebomb' => 'Heat-Assisted Living',
'Nigeria' => "Ny and Jerry's Dry Cleaning (with Donuts)",
'Put the kabosh on' => 'Put the cable box on'
}
# Serialize (binary mode, so Windows does not translate newlines inside the Marshal data)
File.open('codewords', 'wb') do |f|
  Marshal.dump(code_words, f)
end
Now at the beginning of your file you would put this:
# Load the serialized data (the block form closes the file afterwards)
code_words = File.open('codewords', 'rb') { |f| Marshal.load(f) }
Here's an easy way to make sure you can always require a file that's in the same directory as your app: put this before the require statement:
$:.unshift File.dirname(__FILE__)
$: (also available as $LOAD_PATH) is the global array of directories that require searches, Ruby's closest equivalent to a "CLASSPATH".
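A minimal sketch of what that line does, using a temporary directory as a stand-in for the app's directory and a hypothetical wordlist.rb that defines a constant. On Ruby 1.9.2 and later, require_relative 'wordlist' is a more direct way to load a file next to the current script, though it is not available in the 1.8.6 from the question:

```ruby
require 'tmpdir'

app_dir = Dir.mktmpdir   # stand-in for File.dirname(__FILE__)
File.write(File.join(app_dir, 'wordlist.rb'),
           "CODE_WORDS = { 'firebomb' => 'Heat-Assisted Living' }\n")

# $: and $LOAD_PATH name the same array of directories that require searches, in order.
p $:.equal?($LOAD_PATH)   # prints true
$:.unshift(app_dir)        # search the app directory first
require 'wordlist'         # now resolves to app_dir/wordlist.rb

p CODE_WORDS['firebomb']   # prints "Heat-Assisted Living"
```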
