How to better handle control flow and nil objects in Ruby

I have this script that is part of a bigger one. I have three different XML files that look a little different from each other, and I need some kind of control structure to handle nil objects and XPath expressions better.
The script that I have right now outputs nil objects:
require 'open-uri'
require 'rexml/document'
include REXML

urls = []
urls << "http://testnavet.skolverket.se/SusaNavExport/EmilObjectExporter?id=186956355&strId=info.uh.kau.KTADY1&EMILVersion=1.1"
urls << "http://testnavet.skolverket.se/SusaNavExport/EmilObjectExporter?id=184594606&strId=info.uh.gu.GS5&EMILVersion=1.1"
urls << "http://testnavet.skolverket.se/SusaNavExport/EmilObjectExporter?id=185978100&strId=info.uh.su.ARO720&EMILVersion=1.1"

urls.each do |url|
  # URI.open (open-uri) is required for URLs on Ruby 3.0+; plain Kernel#open no longer fetches them
  doc = REXML::Document.new(URI.open(url).read)
  doc.elements.each("/educationInfo/extensionInfo/nya:textualDescription/nya:textualDescriptionPhrase | /ns:educationInfo/ns:extensionInfo/gu:guInfoExtensions/gu:guSubject/gu:descriptions/gu:description | //*[name()='ct:text']") do |e|
    m = e.text
    m.gsub!(/<.+?>/, "") # strip any embedded tags
    puts "Description: " + m
    puts ""
  end
end
OUTPUT:
Description: bestrykning, kalandrering, tryckning, kemiteknik
Description: Vill du jobba med internationella och globala frågor med...
Description: The study of globalisation is becoming ever more
important for our understanding of today´s world and the School of
Global Studies is a unique environment for research.
Description:
Description:
Description: Kursen behandlar identifieringen och beskrivningen av
sjukliga förändringar i mänskliga skelett. Kursen ger en
ämneshistorisk bakgrund och skelettförändringars förhållanden till
moderna kliniska data diskuteras.

See this post on how to skip over entries when using a block in Ruby. The method each() on doc.elements is being called with a block (which is your code containing the gsub and puts calls). The next keyword lets you stop executing the block for the current element and move on to the next one.
doc.elements.each("/educationInfo/extensionInfo/nya:textualDescription/nya:textualDescriptionPhrase | /ns:educationInfo/ns:extensionInfo/gu:guInfoExtensions/gu:guSubject/gu:descriptions/gu:description | //*[name()='ct:text']") do |e|
  m = e.text
  m.gsub!(/<.+?>/, "")
  next if m.empty? # skip this element and continue with the next one
  puts "Description: " + m
  puts ""
end
We know that m is a string (and not nil) when we reach the next keyword, because we just called gsub! on it without an error; calling gsub! on nil would have raised a NoMethodError. That means the blank Descriptions are caused by empty strings, not nil objects.
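If some elements can genuinely have no text node, REXML's Element#text returns nil rather than an empty string, so a more defensive loop body converts to a string first and then checks for blank output. A minimal sketch, assuming a hypothetical helper named clean_description and made-up sample strings standing in for e.text:

```ruby
# Hypothetical helper: normalizes a possibly-nil element text.
# Returns nil when nothing meaningful is left after cleaning.
def clean_description(raw)
  text = raw.to_s.gsub(/<.+?>/, "").strip # nil.to_s is "", so no NoMethodError
  text.empty? ? nil : text
end

# Made-up values standing in for e.text from the three XML formats.
samples = ["bestrykning, kalandrering", nil, "   ", "<p>Kursen behandlar</p>"]

samples.each do |raw|
  desc = clean_description(raw)
  next if desc.nil? # skips both nil and whitespace-only entries
  puts "Description: #{desc}"
end
```

Inside the REXML loop this would become desc = clean_description(e.text) followed by next if desc.nil?, which covers both the nil case and the empty-string case.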

Related

Loop through a JSON array and retrieve one attribute (gives errors)

I am new to programming in Ruby, and I am trying to get the value of json['earning_rate_hr'], but I get an error: in '[]': no implicit conversion of String into Integer (TypeError).
I know and understand the error; however, that is not my main question. Here is my file:
checkingchecker.rb:
#require_relative '../lib/hackex/net/typhoeus'
require_relative '../lib/hackex'
require 'rubygems'
require 'json'

file = 'accounts1.txt'
f = File.open file, 'r'
puts "MADE BY THE PEOPLE, FOR THE PEOPLE #madebylorax"
puts ""
puts "--------------------------------------------------------"
puts ""
while line = f.gets
  line = line.chomp.split(';')
  email, password = line
  puts "logging in as " + email
  HackEx.LoginDo(email, password) do |http, auth_token, user|
    puts "getting info..."
    user = HackEx::Request.Do(http, HackEx::Request.UserInfo(auth_token))['user']
    puts "received user info!"
    bank = HackEx::Request.Do(http, HackEx::Request.UserBank(auth_token))['user_bank']
    puts "received bank info!"
    json = HackEx::Request.Do(http, HackEx::Request.UserSpam(auth_token))['spam']
    puts "received spam info!"
    # Error line: json is an array, and Array#[] expects an Integer index, not a String.
    # Is there a way to puts this value without it being treated as an integer index?
    puts json['earning_rate_hr']
    userchecking = bank["checking"]
    checking = userchecking.scan(/.{1,3}/).join(',')
    puts email + " has in Checking: BTC #{checking}"
    puts ""
    puts "--------------------------------------------------------"
    puts ""
  end
end
I tried puts json; it prints items like this one:
{"id"=>"9867351", "user_id"=>"289108", "victim_user_id"=>"1512021",
"victim_ip"=>"86.60.226.175", "spam_level"=>"50", "earning_rate_hr"=>"24300", "total_earnings"=>"13267800", "started_at"=>"2015-11-01 07:46:59",
"last_collected_at"=>"2015-11-24 01:46:59"}
What I want to do is select the earning_rate_hr of each one of them and add them together. However, I have no clue how to do that, since the error is not fixed and I can't get the value.
PS: I tried turning it into a Hash, and I also tried using .first, but .first only shows the first one; I want to show all of them. Thank you.
I know you from LINE messenger. I haven't written Ruby in a long time, and this one keeps giving me Cloudflare errors; I'm not sure if it's because of server downtime/maintenance or something else. Anyway, here's your script, enjoy farming ;) -LineOne
PS: I changed a few strings to make the output a little cleaner so you can see the spam income more easily, and I added sleep 1, because sleeping for one second before reconnecting helps prevent Cloudflare errors.
Also, you don't need to require json or rubygems in your HackEx scripts, because the library already requires them before your script runs.
require_relative 'libv5/lib/hackex'

loop do
  begin
    print 'Filename: '
    fn = gets.chomp
    file = fn + '.txt'
    puts "MADE BY THE PEOPLE, FOR THE PEOPLE #madebylorax" # helped by lineone
    puts ""
    puts "--------------------------------------------------------"
    puts ""
    # File.foreach reads line by line and closes the file when done
    File.foreach(file) do |line|
      email, password = line.chomp.split(';')
      HackEx.LoginDo(email, password) do |http, auth_token, user|
        puts "Retrieving Info..."
        puts ''
        user = HackEx::Request.Do(http, HackEx::Request.UserInfo(auth_token))['user']
        bank = HackEx::Request.Do(http, HackEx::Request.UserBank(auth_token))['user_bank']
        json = HackEx::Request.Do(http, HackEx::Request.UserSpam(auth_token))['spam']
        cash_count = 0
        tot_count = 0
        # json is an array of hashes: read the keys from each element
        json.each do |j|
          earn_rate = j['earning_rate_hr']
          total = j['total_earnings']
          cash_count += earn_rate.to_i
          tot_count += total.to_i
        end
        print "#{email}: current earnings: #{cash_count} per hour, Total earnings #{tot_count},"
        userchecking = bank["checking"]
        checking = userchecking.scan(/.{1,3}/).join(',')
        puts " #{checking} BTC in Checking"
        puts ""
        puts "--------------------------------------------------------"
        puts ""
        sleep 1 # pause before the next login to avoid Cloudflare errors
      end
    end
  rescue StandardError => e
    puts e.message
  end
end
That's fine. You can also calculate the total income of all your farms by adding new variables at the top (for example, a = 0) and then adding each account's number to it at the end (a += tot_count).
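A minimal sketch of that idea, with the HackEx login replaced by a hard-coded hash of per-account spam records. The field names match the ones printed in the question; the email addresses and numbers are made up for illustration:

```ruby
# Made-up data standing in for each account's ['spam'] array.
accounts = {
  "a@example.com" => [{ "earning_rate_hr" => "24300", "total_earnings" => "13267800" },
                      { "earning_rate_hr" => "1000",  "total_earnings" => "5000" }],
  "b@example.com" => [{ "earning_rate_hr" => "700",   "total_earnings" => "2100" }]
}

grand_rate = 0  # initialized once, before looping over accounts
grand_total = 0

accounts.each do |email, spam|
  cash_count = 0  # per-account counters, reset for every account
  tot_count = 0
  spam.each do |j|
    cash_count += j['earning_rate_hr'].to_i
    tot_count += j['total_earnings'].to_i
  end
  puts "#{email}: #{cash_count} per hour, total #{tot_count}"
  grand_rate += cash_count   # accumulate across all accounts
  grand_total += tot_count
end

puts "All accounts: #{grand_rate} per hour, total #{grand_total}"
```

The key point is scope: the grand totals live outside the per-account loop, while cash_count and tot_count are reset inside it.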
This should help:
earning_rates = json.map{|e| e["earning_rate_hr"]}
puts "Earning rates per hour: #{earning_rates.join(" ")}"
puts "Sum of earning rates: #{earning_rates.map{|e| e.to_i}.inject{|sum, x| sum + x}}"
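One caveat with inject without an initial value: on an empty array it returns nil, so an account with no spam entries would print an empty sum. Array#sum (Ruby 2.4+) starts from 0 and accepts a block, which folds the to_i conversion in. A small sketch using the record shape printed in the question (the second record is made up):

```ruby
# Records shaped like the 'spam' entries printed in the question.
json = [
  { "earning_rate_hr" => "24300", "total_earnings" => "13267800" },
  { "earning_rate_hr" => "1200",  "total_earnings" => "86400" } # made-up entry
]

earning_rates = json.map { |e| e["earning_rate_hr"] }

# inject without a seed works for non-empty arrays...
puts earning_rates.map(&:to_i).inject { |sum, x| sum + x }
# ...but returns nil (not 0) for an empty array.
p [].inject { |sum, x| sum + x }

# Array#sum starts at 0 and converts each element in its block.
puts earning_rates.sum(&:to_i)
puts [].sum(&:to_i)
```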

How can I process huge JSON files as streams in Ruby, without consuming all memory?

I'm having trouble processing a huge JSON file in Ruby. What I'm looking for is a way to process it entry-by-entry without keeping too much data in memory.
I thought the yajl-ruby gem would do the job, but it consumes all my memory. I've also looked at the Yajl::FFI and JSON::Stream gems, but there it is clearly stated:
For larger documents we can use an IO object to stream it into the parser. We still need room for the parsed object, but the document itself is never fully read into memory.
Here's what I've done with Yajl:
require 'yajl'

file_stream = File.open(file, "r")
json = Yajl::Parser.parse(file_stream)
json.each do |entry|
  entry.do_something
end
file_stream.close
The memory usage keeps getting higher until the process is killed.
I don't see why Yajl keeps processed entries in memory. Can I somehow free them, or did I just misunderstand the capabilities of the Yajl parser?
If it cannot be done using Yajl: is there a way to do this in Ruby via any library?
Problem
json = Yajl::Parser.parse(file_stream)
When you invoke Yajl::Parser like this, the entire stream is loaded into memory to create your data structure. Don't do that.
Solution
Yajl provides Parser#parse_chunk, Parser#on_parse_complete, and other related methods that enable you to trigger parsing events on a stream without requiring that the whole IO stream be parsed at once. The README contains an example of how to use chunking instead.
The example given in the README is:
Or let's say you didn't have access to the IO object that contained JSON data, but instead only had access to chunks of it at a time. No problem!
(Assume we're in an EventMachine::Connection instance)
def post_init
  @parser = Yajl::Parser.new(:symbolize_keys => true)
end

def object_parsed(obj)
  puts "Sometimes one pays most for the things one gets for nothing. - Albert Einstein"
  puts obj.inspect
end

def connection_completed
  # once a full JSON object has been parsed from the stream
  # object_parsed will be called, and passed the constructed object
  @parser.on_parse_complete = method(:object_parsed)
end

def receive_data(data)
  # continue passing chunks
  @parser << data
end
Or if you don't need to stream it, it'll just return the built object from the parse when it's done. NOTE: if there are going to be multiple JSON strings in the input, you must specify a block or callback as this is how yajl-ruby will hand you (the caller) each object as it's parsed off the input.
obj = Yajl::Parser.parse(str_or_io)
One way or another, you have to parse only a subset of your JSON data at a time. Otherwise, you are simply instantiating a giant Hash in memory, which is exactly the behavior you describe.
Without knowing what your data looks like and how your JSON objects are composed, it isn't possible to give a more detailed explanation than that; as a result, your mileage may vary. However, this should at least get you pointed in the right direction.
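As an illustration of "parse only a subset at a time" that needs no extra gem: if you control the file format and can write one JSON object per line (often called JSON Lines or NDJSON), Ruby's standard json library can parse each line independently, so only one entry is in memory at a time. This is a swapped-in technique, not Yajl's API, and it assumes that line-delimited layout; the method name each_json_line is hypothetical:

```ruby
require 'json'
require 'stringio'

# Yields one parsed Hash per non-blank line of an IO holding
# newline-delimited JSON. Only the current line is held in memory.
def each_json_line(io)
  io.each_line do |line|
    next if line.strip.empty?
    yield JSON.parse(line)
  end
end

# StringIO stands in for File.open('huge.jsonl'); the data is made up.
io = StringIO.new(<<~JSONL)
  {"id": 1, "name": "fred", "dead": true}
  {"id": 2, "name": "tony", "dead": true}

  {"id": 3, "name": "erik", "dead": false}
JSONL

each_json_line(io) { |entry| puts entry["name"] }
```

With a real file you would call each_json_line inside File.open(path) { |f| ... }, and memory use stays proportional to the longest line rather than the whole document.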
Both @CodeGnome's and @A. Rager's answers helped me understand the solution.
I ended up creating the gem json-streamer that offers a generic approach and spares the need to manually define callbacks for every scenario.
Your solutions seem to be json-stream and yajl-ffi. There's a pretty similar example for each (they're by the same author):
def post_init
  @parser = Yajl::FFI::Parser.new
  @parser.start_document { puts "start document" }
  @parser.end_document { puts "end document" }
  @parser.start_object { puts "start object" }
  @parser.end_object { puts "end object" }
  @parser.start_array { puts "start array" }
  @parser.end_array { puts "end array" }
  @parser.key { |k| puts "key: #{k}" }
  @parser.value { |v| puts "value: #{v}" }
end

def receive_data(data)
  begin
    @parser << data
  rescue Yajl::FFI::ParserError => e
    close_connection
  end
end
There, he sets up the callbacks for possible data events that the stream parser can experience.
Given a JSON document that looks like:
{
  "1": {
    "name": "fred",
    "color": "red",
    "dead": true
  },
  "2": {
    "name": "tony",
    "color": "six",
    "dead": true
  },
  ...
  "n": {
    "name": "erik",
    "color": "black",
    "dead": false
  }
}
One could stream-parse it with yajl-ffi using something like this:
require 'yajl/ffi'
require 'pp'

def parse_dudes(file_io, chunk_size)
  parser = Yajl::FFI::Parser.new
  object_nesting_level = 0
  current_row = {}
  current_key = nil

  parser.start_object { object_nesting_level += 1 }
  parser.end_object do
    if object_nesting_level.eql? 2
      yield current_row # here, we yield the fully collected record to the passed block
      current_row = {}
    end
    object_nesting_level -= 1
  end
  parser.key do |k|
    if object_nesting_level.eql? 2
      current_key = k
    elsif object_nesting_level.eql? 1
      current_row["id"] = k
    end
  end
  parser.value { |v| current_row[current_key] = v }

  # feed fixed-size chunks so the whole file is never held in memory
  while (chunk = file_io.read(chunk_size))
    parser << chunk
  end
end

File.open('dudes.json') do |f|
  parse_dudes f, 1024 do |dude|
    pp dude
  end
end

Ruby win32ole - Can't replace text in word document

I'm trying to replace text in a document like so:
require 'win32ole'

def replace_doc(doc, find, repl)
  begin
    word = WIN32OLE.new('Word.Application')
    word.Visible = true
    doc = word.Documents.Open(doc)
    word.Selection.HomeKey(unit=6)
    finder = word.Selection.Find
    finder.Text = "[#{find}]"
    while word.Selection.Find.Execute
      word.Selection.TypeText(text=repl)
    end
    doc.SaveAs(doc)
    doc.Close
  rescue Exception => e
    puts e.message
    puts "Unable to edit file."
  end
end

def main()
  puts "File: "
  doc = gets.chomp()
  puts "Find: "
  find = gets.chomp()
  puts "Replace with: "
  repl = gets.chomp()
  replace_doc(doc, find, repl)
end

main()
I'm running Ruby 2.0 on Windows XP. The WINWORD.exe process starts (I see it in Task Manager), and no exception is raised. However, when I open the document, none of the text I expect to be replaced has actually been replaced. What is going on? I've copied the code (except for a few things) from here.
It's hard to say without the actual Word document and input data you're using, but I suspect that the square brackets in finder.Text are your issue. As your program is now, entering foo as the find text would search for [foo] in your Word document, not plain foo. Note that in the post you linked, there are actual square brackets in the example Word document (it contains [date] etc.).

How to work around the unavailability of pass-by-reference in Ruby?

My problem is around the fact that I cannot pass by reference in Ruby.
I have two functions, searching and get_title_ids.
I have two arrays in searching, (1) title and (2) href, which need to be updated.
def searching
  title = []
  href = []
  (0..20).step(10) do |i|
    prev = title.length
    title, href = get_title_ids(i, title, href) ## <<-- Should have just done "get_title_ids(i, title, href)"
    ## something which on next iteration will increase the ids and hrefs
    puts "\nthe title length is #{title.length} and href length is #{href.length}\n"
    assert_operator prev, :<, title.length, "After scrolling to bottom no new jobs are getting added"
  end
end

def get_title_ids(start_from = 0, title = [], href = [])
  # Part of code which can store all links and titles of jobs displayed
  # (elided here; titles is assumed to be defined in this part)
  (start_from..(titles.length - 1)).each do |i|
    unless titles[i].text.chomp.empty? # skip entries with blank titles
      title << titles[i].text.chomp
      href << titles[i].attribute("href")
    end
  end
  return [title, href] ### <<---- this is what messed it up
end
The problem is I am unable to push new elements into the arrays title and href that have been defined in searching.
Each time I call get_title_ids, I do not want to gather data that I had previously gathered (hence the start_from).
My problem is not memory but time. So I am not too concerned about the data being duplicated when I call get_title_ids, compared with the fact that I have to waste time scraping data that I already scraped in the previous loop iteration.
So does anyone know how to hack pass-by-reference in Ruby?
EDIT
So, from reading the answers below, it turns out I didn't need to return anything from get_title_ids. And then it all worked.
Arrays in Ruby are most certainly passed by reference (well, technically, they are passed by value, but that value is a reference to the array). Observe:
def push_new ary
  ary << 'new element'
end

a = ['first element']
push_new a
a # => ["first element", "new element"]
Even if a reference-type object is passed by value, it still refers to the same object in memory. If this were not the case, the example below would not work.
Example:
def searching
  title = []
  href = []
  test(title, href)
  puts "Title: #{title.inspect} Href: #{href.inspect}"
end

def test(title, href)
  title << "title1"
  title << "title2"
  href << "www.title1.com"
  href << "www.title2.com"
end

searching
Output:
Title: ["title1", "title2"] Href: ["www.title1.com", "www.title2.com"]
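The flip side, which is what usually trips people up: mutating the array inside the method (<<, push, concat) is visible to the caller, but assigning a new array to the parameter only rebinds the method's local variable. A small sketch with hypothetical method names:

```ruby
# Mutates the caller's array in place: the caller sees the change.
def append_title(titles)
  titles << "title3"
end

# Rebinds the local parameter to a brand-new array: the caller does not see it.
def replace_titles(titles)
  titles = ["something else"]
  titles
end

title = ["title1", "title2"]
append_title(title)
p title    # ["title1", "title2", "title3"]

replace_titles(title)
p title    # unchanged: ["title1", "title2", "title3"]

# To use the new array, the caller has to assign the return value itself.
title = replace_titles(title)
p title    # ["something else"]
```

So a helper like get_title_ids only needs to push into the arrays it receives; returning them is optional, and reassigning them inside the method would not help.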

Ruby parameterize if ... then blocks

I am parsing a text file and want to be able to extend the sets of tokens that can be recognized easily. Currently I have the following:
if line =~ /!DOCTYPE/
  puts "token doctype " + line[0,20]
  #ast[:doctype] << line
elsif line =~ /<html/
  puts "token main HTML start " + line[0,20]
  html_scanner_off = false
elsif line =~ /<head/ and not html_scanner_off
  puts "token HTML header starts " + line[0,20]
  html_header_scanner_on = true
elsif line =~ /<title/
  puts "token HTML title " + line[0,20]
  #ast[:HTML_header_title] << line
end
Is there a way to write this with a yield block, e.g. something like scanLine("title", :HTML_header_title, line)?
Don't parse HTML with regexes.
That aside, there are several ways to do what you're talking about. One:
class Parser
  class Token
    attr_reader :name, :pattern, :block

    def initialize(name, pattern, block)
      @name = name
      @pattern = pattern
      @block = block
    end

    def process(line)
      @block.call(self, line)
    end
  end

  def initialize
    @tokens = []
  end

  # Runs the first token whose pattern matches; does nothing if none match.
  def scanLine(line)
    @tokens.find { |t| line =~ t.pattern }&.process(line)
  end

  def addToken(name, pattern, &block)
    @tokens << Token.new(name, pattern, block)
  end
end

parser = Parser.new
parser.addToken("title", /<title/) { |token, line| puts "token #{token.name}: #{line}" }
parser.scanLine('<title>This is the title</title>')
This has some limitations (like not checking for duplicate tokens), but works:
$ ruby parser.rb
token title: <title>This is the title</title>
$
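Closer to the scanLine("title", :HTML_header_title, line) call in the question, a lighter variant keeps the patterns in a hash and yields the matching token name to a block, so adding a token means adding one hash entry. The TOKENS table and the scan_line name are hypothetical, it drops the html_scanner_off condition from the original chain, and the same caveat about regexes and HTML applies:

```ruby
# Hypothetical token table: symbol => pattern. Order matters, because the
# first matching entry wins, just like the original if/elsif chain.
TOKENS = {
  doctype:           /!DOCTYPE/,
  html_start:        /<html/,
  html_header_start: /<head/,
  html_header_title: /<title/
}.freeze

# Yields the first matching token name and the line.
# Returns the block's value, or nil if no pattern matches.
def scan_line(line)
  name, _pattern = TOKENS.find { |_name, pattern| line =~ pattern }
  return nil unless name
  yield name, line
end

["<!DOCTYPE html>", "<title>This is the title</title>", "plain text"].each do |line|
  scan_line(line) { |token, text| puts "token #{token}: #{text[0, 20]}" }
end
```

Because the caller supplies the block, each call site decides what to do with a token (print it, append it to an AST hash keyed by the symbol, and so on) without touching the table.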
If you're intending to parse HTML content, you might want to use one of the HTML parsers like Nokogiri (http://nokogiri.org/) or Hpricot (http://hpricot.com/), which are really high-quality. A roll-your-own approach will probably take longer to perfect than figuring out how to use one of these parsers.
On the other hand, if you're dealing with something that's not quite HTML and can't be parsed that way, then you'll need to roll your own somehow. There are a few Ruby parser frameworks out there that may help, but for simple tasks where performance isn't critical, you can get by with a pile of regexps like you have here.
