Why does a negative value become positive after dumping it into a YAML file? - ruby

I have a simple Sinatra app that uses YAML files to store data. One of its features is that a user can vote for or veto a Question. The vote feature works fine, but I ran into something weird while implementing the veto feature.
Briefly:
when a question's current votes_count is positive (>= 1), the number decreases correctly
but when a question's current votes_count is zero or negative, the number decreases correctly in the data hash, yet after the hash is dumped into the YAML file, the negative value becomes positive.
This is the yaml file for Question:
'1': !ruby/hash:Sinatra::IndifferentHash
  title: " Best way to require all files from a directory in ruby?"
  description: What's the best way to require all files from a directory in ruby ?
  user_id: '3'
  votes_count: 0
  # other user information
This is the route handler for the veto feature:
post "/questions/:id/veto" do
  check_vote_validity_for_question(params[:id])
  @question = Question.find_by(:id, params[:id])
  @question.votes_count = (@question.votes_count.to_i - 1)
  Question.update(params[:id], votes_count: @question.votes_count)
  # omit user related code
end
This is the update method:
def self.update(id, attrs)
  data = load_data_of(data_name)
  # binding.pry
  obj_info = data[id]
  attrs.each do |k, v|
    v = v.to_s if v.is_a?(Array)
    obj_info[k] = v
  end
  # binding.pry
  File.open(File.join(data_path, "#{data_name}.yaml"), "w+") do |f|
    f.write(Psych.dump(data).delete("---"))
  end
end
If I pause the program inside the update method before and after updating the data hash, it shows that the value of votes_count is set correctly.
Before:
[1] pry(Question)> data
=> {"1"=>
{"title"=>" Best way to require all files from a directory in ruby?",
"description"=>"What's the best way to require all files from a directory in ruby ?",
"user_id"=>"3",
"votes_count"=>0},
After:
[1] pry(Question)> data
=> {"1"=>
{"title"=>" Best way to require all files from a directory in ruby?",
"description"=>"What's the best way to require all files from a directory in ruby ?",
"user_id"=>"3",
"votes_count"=>-1},
The value of the key "votes_count" in the data hash is -1 after updating, but after I dump the data hash into the YAML file, the value of "votes_count" in the file becomes 1. And if the value in the hash is -2, it becomes 2 in the YAML file.
I tried making a hash with a negative value in irb and dumping it into a YAML file, and everything worked fine. I have no idea what happened. Could anybody help me?

It looks like the problem is in this line:
f.write(Psych.dump(data).delete("---"))
String#delete removes every occurrence of each character in its argument, not the literal substring, so delete("---") strips every - from the dump, including minus signs.
For example:
"-1".delete("---") #=> "1"

Related

How to read data from a different file without using YAML or JSON

I'm experimenting with a Ruby script that will add data to a Neo4j database using REST API. (Here's the tutorial with all the code if interested.)
The script works if I include the hash data structure in the initialize method but I would like to move the data into a different file so I can make changes to it separately using a different script.
I'm relatively new to Ruby. If I copy the following data structure into a separate file, is there a simple way to read it from my existing script when I call @data? I've heard one could do something with YAML or JSON (I'm not familiar with how either works). What's the easiest way to read a file, and how could I go about coding that?
# I want to copy this data into a different file and read it with my script when I call @data.
{
  nodes: [
    {:label=>"Person", :title=>"title_here", :name=>"name_here"}
  ]
}
And here is part of my code; it should be enough for the purposes of this question.
class RGraph
  def initialize
    @url = 'http://localhost:7474/db/data/cypher'
    # If I put this hash structure into a different file, how do I make @data read that file?
    @data = {
      nodes: [
        {:label=>"Person", :title=>"title_here", :name=>"name_here"}
      ]
    }
  end

  # more code here... not relevant to question

  def create_nodes
    # Scan file, find each node and create it in Neo4j
    @data.each do |key, value|
      if key == :nodes
        @data[key].each do |node| # Cycle through each node
          next unless node.has_key?(:label) # Make sure this node has a label
          # We have sufficient data to create a node
          label = node[:label]
          attr = Hash.new
          node.each do |k, v| # Hunt for additional attributes
            next if k == :label # Don't create an attribute for "label"
            attr[k] = v
          end
          create_node(label, attr)
        end
      end
    end
  end
end

rGraph = RGraph.new
rGraph.create_nodes
Given that OP said in comments "I'm not against using either of those", let's do it in YAML (which preserves the Ruby object structure best). Save it:
@data = {
  nodes: [
    {:label=>"Person", :title=>"title_here", :name=>"name_here"}
  ]
}
require 'yaml'
File.write('config.yaml', YAML.dump(@data))
This will create config.yaml:
---
:nodes:
- :label: Person
  :title: title_here
  :name: name_here
If you read it in, you get exactly what you saved:
require 'yaml'
@data = YAML.load(File.read('config.yaml'))
puts @data.inspect
# => {:nodes=>[{:label=>"Person", :title=>"title_here", :name=>"name_here"}]}
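Since the question also mentions JSON, here is a minimal sketch of the same round trip with it; note that JSON.parse returns string keys unless you ask for symbols back:
require 'json'
File.write('config.json', JSON.generate(@data))
@data = JSON.parse(File.read('config.json'), symbolize_names: true)
# => {:nodes=>[{:label=>"Person", :title=>"title_here", :name=>"name_here"}]}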

How do I ignore the nil values in the loop with parsed values from Mechanize?

In my text file is a list of URLs. Using Mechanize, I'm going through that list and parsing out the title and meta description of each page. However, some of those pages don't have a meta description, which stops my script with a nil error:
undefined method `[]' for nil:NilClass (NoMethodError)
I've read up and seen solutions for Rails, but for plain Ruby I've only seen reject and compact suggested as ways to ignore nil values. I added compact at the end of the loop, but that doesn't seem to do anything.
require 'rubygems'
require 'mechanize'
File.readlines('parsethis.txt').each do |line|
  page = Mechanize.new.get(line)
  title = page.title
  metadesc = page.at("head meta[name='description']")[:content]
  puts "%s, %s, %s" % [line.chomp, title, metadesc]
end.compact!
It's just a list of URLs in a text file, like this:
http://www.a.com
http://www.b.com
This is what it outputs to the console, for example:
http://www.a.com, Title, This is a description.
If a page in the list has no description or title, it throws the nil error. I don't want it to skip any URLs; I want it to go through the whole list.
Here is one way to do it.
Edit (for the added requirement to not skip any URLs): check for the element before asking for its attribute:
metadesc = page.at("head meta[name='description']")
puts "%s, %s, %s" % [line.chomp, title, metadesc ? metadesc[:content] : "N/A"]
This is untested but I'd do something like this:
require 'open-uri'
require 'nokogiri'

page_info = {}
File.foreach('parsethis.txt') { |url|
  page = Nokogiri::HTML(open(url))
  title = page.title
  meta_desc = page.at("head meta[name='description']")
  meta_desc_content = meta_desc ? meta_desc[:content] : nil
  page_info[url] = {:title => title, :meta_desc => meta_desc_content}
}

page_info.each do |url, info|
  puts [
    url,
    info[:title],
    info[:meta_desc]
  ].join(', ')
end
File.foreach iteratively reads a file, returning each line individually.
page.title could return nil if a page doesn't have a title; titles are optional in pages.
I break down accessing the meta description into two steps. Meta tags are optional in HTML, so they might not exist, in which case at returns nil. Trying to access the content attribute on that nil raises an exception; I think that's what you're seeing.
Instead, in my code, meta_desc_content is conditionally assigned a value if the meta-description tag was found, or nil.
The code populates the page_info hash with key/value pairs of the URL and its associated title and meta-description. I did it this way because a hash-of-hashes, or possibly an array-of-hashes, is a very convenient structure for all sorts of secondary manipulations, such as returning the information as JSON or inserting into a database.
As a second step the code iterates over that hash, retrieving each key/value pair. It then joins the values into a string and prints them.
There are lots of things in your code that are either wrong, or not how I'd do them:
File.readlines('parsethis.txt').each reads the whole file into an array before iterating over it. That isn't scalable, nor is it efficient. File.foreach reads one line at a time and is faster than File.readlines(...).each, so get in the habit of using it unless you are absolutely sure you know why you should use readlines.
You use Mechanize for something that Nokogiri and OpenURI can do faster. Mechanize is a great tool if you are working with forms and need to navigate a site, but you're not doing that, so instead you're dragging around additional code-weight that isn't necessary. Don't do that; it leads to slow programs, among other things.
page.at("head meta[name='description']")[:content] is an exception waiting to happen. As I said above, meta descriptions won't necessarily exist in a page. If one doesn't, you're effectively calling nil[:content], which will definitely raise an exception. Instead, work your way down to the data you want, so you can make sure the meta description exists before you try to get at its content.
You can't use compact or compact! the way you were. each returns its receiver, not the values produced inside the block, so there is nothing for compact to clean up. You could have used map, but the logic would have been messy, and puts inside map is rarely used. (It probably shouldn't be used at all, but that's a different subject.)

Parsing large file with SaxMachine seems to be loading the whole file into memory

I have a 1.6 GB XML file, and when I parse it with SAX Machine it does not seem to stream or eat the file in chunks; rather, it appears to load the whole file into memory (or maybe there is a memory leak somewhere?), because my Ruby process climbs upwards of 2.5 GB of RAM. I don't know where it stops growing because I ran out of memory.
On a smaller file (50 MB) it also appears to load the whole file. My task iterates over the records in the XML file and saves each record to a database. It takes about 30 seconds of "idling" and then all of a sudden the database queries start executing.
I thought SAX was supposed to let you work with large files like this without loading the whole thing into memory.
Is there something I am overlooking?
Many thanks
Update to add code sample
class FeedImporter

  class FeedListing
    include ::SAXMachine

    element :id
    element :title
    element :description
    element :url

    def to_hash
      {}.tap do |hash|
        self.class.column_names.each do |key|
          hash[key] = send(key)
        end
      end
    end
  end

  class Feed
    include ::SAXMachine
    elements :listing, :as => :listings, :class => FeedListing
  end

  def perform
    open('~/feeds/large_feed.xml') do |file|
      # I think that SAXMachine is trying to load all of the listing elements into this one ruby object.
      puts 'Parsing'
      feed = Feed.parse(file)

      # We are now iterating over each of the listing elements, but they have been "parsed" from the feed already.
      puts 'Importing'
      feed.listings.each do |listing|
        Listing.import(listing.to_hash)
      end
    end
  end
end
As you can see, I don't care about the <listings> element in the feed. I just want the attributes of each <listing> element.
The output looks like this:
Parsing
... wait forever
Importing (actually, I never see this on the big 1.6 GB file, because too much memory is used :(
Here's a Reader that will yield each listing's XML to a block, so you can process each listing without loading the entire document into memory:
reader = Nokogiri::XML::Reader(file)
while reader.read
  if reader.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT and reader.name == 'listing'
    listing = FeedListing.parse(reader.outer_xml)
    Listing.import(listing.to_hash)
  end
end
If listing elements could be nested, and you wanted to parse the outermost listings as single documents, you could do this:
require 'rubygems'
require 'nokogiri'

# Monkey-patch Nokogiri to make this easier
class Nokogiri::XML::Reader
  def element?
    node_type == TYPE_ELEMENT
  end

  def end_element?
    node_type == TYPE_END_ELEMENT
  end

  def opens?(name)
    element? && self.name == name
  end

  def closes?(name)
    (end_element? && self.name == name) ||
      (self_closing? && opens?(name))
  end

  def skip_until_close
    raise "node must be TYPE_ELEMENT" unless element?
    name_to_close = self.name
    if self_closing?
      # DONE!
    else
      level = 1
      while read
        level += 1 if opens?(name_to_close)
        level -= 1 if closes?(name_to_close)
        return if level == 0
      end
    end
  end

  def each_outer_xml(name, &block)
    while read
      if opens?(name)
        yield(outer_xml)
        skip_until_close
      end
    end
  end
end
Once you have it monkey-patched, it's easy to deal with each listing individually:
open('~/feeds/large_feed.xml') do |file|
  reader = Nokogiri::XML::Reader(file)
  reader.each_outer_xml('listing') do |outer_xml|
    listing = FeedListing.parse(outer_xml)
    Listing.import(listing.to_hash)
  end
end
Unfortunately there are now three different repos for sax-machine, and worse, the gemspec version was not bumped.
Despite the comment on Greg Weber's blog, I don't think this code was integrated into pauldix's or ezkl's forks. To use the lazy, fiber-based version of the code, I think you need to reference gregwebs' version specifically in your Gemfile, like this:
gem 'sax-machine', :git => 'https://github.com/gregwebs/sax-machine'
I forked sax-machine so that it uses constant memory: https://github.com/gregwebs/sax-machine
Good news: there is a new maintainer that is planning on merging my changes.
The new maintainer and I have been using my fork without issue for a year now.
You are right, SAXMachine reads the whole document eagerly. Have a look at its handler source: https://github.com/pauldix/sax-machine/blob/master/lib/sax-machine/sax_handler.rb
To solve your problem, I would use http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/SAX/Parser.html directly and implement the handler yourself.
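A minimal sketch of what such a handler could look like, assuming each <listing> carries <id>, <title>, <description>, and <url> children as in the classes above (the element and field names come from the question; everything else is illustrative):
require 'nokogiri'

class ListingHandler < Nokogiri::XML::SAX::Document
  FIELDS = %w[id title description url]

  def start_element(name, attrs = [])
    if name == 'listing'
      @listing = {} # start collecting a new record
    elsif @listing && FIELDS.include?(name)
      @current = name # remember which field we're inside
      @buffer = ''
    end
  end

  def characters(string)
    @buffer << string if @current
  end

  def end_element(name)
    if name == @current
      @listing[@current] = @buffer
      @current = nil
    elsif name == 'listing' && @listing
      Listing.import(@listing) # one record at a time; nothing accumulates
      @listing = nil
    end
  end
end

Nokogiri::XML::SAX::Parser.new(ListingHandler.new).parse(File.open('large_feed.xml'))
This only ever holds one listing in memory at a time, which is the behavior the question expected from SAX in the first place.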

Sinatra with a persistent variable

My Sinatra app has to parse a ~60 MB XML file. This file hardly ever changes: it is overwritten by a nightly cron job.
Are there tricks or ways to keep the parsed file in memory, as a variable, so that I can read from it on incoming requests but don't have to parse it over and over for each incoming request?
Some pseudocode to illustrate my problem:
get '/projects/:id' do
  return @nokogiri_object.search("//projects/project[@id=#{params[:id]}]/name/text()")
end

post '/projects/update' do
  if params[:token] == "s3cr3t"
    @nokogiri_object = reparse_the_xml_file
  end
end
What I need to know is how to create such a @nokogiri_object so that it persists while Sinatra runs. Is that possible at all? Or do I need some storage for that?
You could try:
configure do
  @@nokogiri_object = parse_xml
end
Then @@nokogiri_object will be available in your request methods. It's a class variable rather than an instance variable, but it should do what you want.
The proposed solution gives a warning:
warning: class variable access from toplevel
You can use a class method to access the class variable, and the warning will disappear:
require 'sinatra'

class Cache
  @@count = 0

  def self.init()
    @@count = 0
  end

  def self.increment()
    @@count = @@count + 1
  end

  def self.count()
    return @@count
  end
end

configure do
  Cache::init()
end

get '/' do
  if Cache::count() == 0
    Cache::increment()
    "First time"
  else
    Cache::increment()
    "Another time #{Cache::count()}"
  end
end
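As a further sketch, Sinatra's set/settings mechanism can hold the parsed document and avoids class variables entirely (the file name here is illustrative, and reassigning via settings relies on the setter that set defines):
require 'sinatra'
require 'nokogiri'

configure do
  set :feed, Nokogiri::XML(File.read('projects.xml'))
end

get '/projects/:id' do
  settings.feed.search("//projects/project[@id=#{params[:id]}]/name/text()").to_s
end

post '/projects/update' do
  settings.feed = Nokogiri::XML(File.read('projects.xml')) if params[:token] == "s3cr3t"
end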
Two options:
1. Save the parsed file to a new file and always read that one.
You can save to a file – serialize – a hash with two keys: 'last-modified' and 'data'.
The 'last-modified' value is a date, and on every request you check whether that date is today. If it is not today, a new file is downloaded, parsed, and stored with today's date.
The 'data' value is the parsed file.
That way you only parse once per change, a sort of cache (see the sketch after this list).
2. Save the parsed file to a NoSQL database, for example Redis.
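A minimal sketch of that caching idea, keyed on the XML file's modification time rather than the date so any overwrite is picked up (class and file names are illustrative):
require 'nokogiri'

class FeedCache
  FILE = 'projects.xml'

  # Returns the parsed document, reparsing only when the file on disk has changed.
  def self.doc
    mtime = File.mtime(FILE)
    if @doc.nil? || mtime != @mtime
      @doc = Nokogiri::XML(File.read(FILE))
      @mtime = mtime
    end
    @doc
  end
end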

How to parse SOAP response from ruby client?

I am learning Ruby and I have written the following code to find out how to consume SOAP services:
require 'soap/wsdlDriver'
wsdl="http://www.abundanttech.com/webservices/deadoralive/deadoralive.wsdl"
service=SOAP::WSDLDriverFactory.new(wsdl).create_rpc_driver
weather=service.getTodaysBirthdays('1/26/2010')
The response that I get back is:
#<SOAP::Mapping::Object:0x80ac3714
{http://www.abundanttech.com/webservices/deadoralive} getTodaysBirthdaysResult=#<SOAP::Mapping::Object:0x80ac34a8
{http://www.w3.org/2001/XMLSchema}schema=#<SOAP::Mapping::Object:0x80ac3214
{http://www.w3.org/2001/XMLSchema}element=#<SOAP::Mapping::Object:0x80ac2f6c
{http://www.w3.org/2001/XMLSchema}complexType=#<SOAP::Mapping::Object:0x80ac2cc4
{http://www.w3.org/2001/XMLSchema}choice=#<SOAP::Mapping::Object:0x80ac2a1c
{http://www.w3.org/2001/XMLSchema}element=#<SOAP::Mapping::Object:0x80ac2774
{http://www.w3.org/2001/XMLSchema}complexType=#<SOAP::Mapping::Object:0x80ac24cc
{http://www.w3.org/2001/XMLSchema}sequence=#<SOAP::Mapping::Object:0x80ac2224
{http://www.w3.org/2001/XMLSchema}element=[#<SOAP::Mapping::Object:0x80ac1f7c>,
#<SOAP::Mapping::Object:0x80ac13ec>,
#<SOAP::Mapping::Object:0x80ac0a28>,
#<SOAP::Mapping::Object:0x80ac0078>,
#<SOAP::Mapping::Object:0x80abf6c8>,
#<SOAP::Mapping::Object:0x80abed18>]
>>>>>>> {urn:schemas-microsoft-com:xml-diffgram-v1}diffgram=#<SOAP::Mapping::Object:0x80abe6c4
{}NewDataSet=#<SOAP::Mapping::Object:0x80ac1220
{}Table=[#<SOAP::Mapping::Object:0x80ac75e4
{}FullName="Cully, Zara"
{}BirthDate="01/26/1892"
{}DeathDate="02/28/1979"
{}Age="(87)"
{}KnownFor="The Jeffersons"
{}DeadOrAlive="Dead">,
#<SOAP::Mapping::Object:0x80b778f4
{}FullName="Feiffer, Jules"
{}BirthDate="01/26/1929"
{}DeathDate=#<SOAP::Mapping::Object:0x80c7eaf4>
{}Age="81"
{}KnownFor="Cartoonists"
{}DeadOrAlive="Alive">]>>>>
I am having a great deal of difficulty figuring out how to parse and show the returned information in a nice table, or even just how to loop through the records and access each element (i.e. FullName, Age, etc.). I went through the whole "getTodaysBirthdaysResult.methods - Object.new.methods" exercise and kept working down, trying to figure out how to access the elements, but then I got to the array and got lost.
Any help that can be offered would be appreciated.
If you're going to parse the XML anyway, you might as well skip SOAP4r and go with Handsoap. Disclaimer: I'm one of the authors of Handsoap.
An example implementation:
# wsdl: http://www.abundanttech.com/webservices/deadoralive/deadoralive.wsdl
DEADORALIVE_SERVICE_ENDPOINT = {
  :uri => 'http://www.abundanttech.com/WebServices/DeadOrAlive/DeadOrAlive.asmx',
  :version => 1
}

class DeadoraliveService < Handsoap::Service
  endpoint DEADORALIVE_SERVICE_ENDPOINT

  def on_create_document(doc)
    # register namespaces for the request
    doc.alias 'tns', 'http://www.abundanttech.com/webservices/deadoralive'
  end

  def on_response_document(doc)
    # register namespaces for the response
    doc.add_namespace 'ns', 'http://www.abundanttech.com/webservices/deadoralive'
  end

  # public methods

  def get_todays_birthdays
    soap_action = 'http://www.abundanttech.com/webservices/deadoralive/getTodaysBirthdays'
    response = invoke('tns:getTodaysBirthdays', soap_action)
    (response/"//NewDataSet/Table").map do |table|
      {
        :full_name => (table/"FullName").to_s,
        :birth_date => Date.strptime((table/"BirthDate").to_s, "%m/%d/%Y"),
        :death_date => Date.strptime((table/"DeathDate").to_s, "%m/%d/%Y"),
        :age => (table/"Age").to_s.gsub(/^\(([\d]+)\)$/, '\1').to_i,
        :known_for => (table/"KnownFor").to_s,
        :alive? => (table/"DeadOrAlive").to_s == "Alive"
      }
    end
  end
end
Usage:
DeadoraliveService.get_todays_birthdays
SOAP4R always returns a SOAP::Mapping::Object, which is sometimes a bit difficult to work with unless you are just getting hash values, which you can access using hash notation like so:
weather['fullName']
However, that does not work when you have an array of hashes. A workaround is to get the result as XML instead of a SOAP::Mapping::Object. To do that, I would modify your code like this:
require 'soap/wsdlDriver'
wsdl="http://www.abundanttech.com/webservices/deadoralive/deadoralive.wsdl"
service=SOAP::WSDLDriverFactory.new(wsdl).create_rpc_driver
service.return_response_as_xml = true
weather=service.getTodaysBirthdays('1/26/2010')
Now the above gives you an XML response, which you can parse using Nokogiri or REXML. Here is an example using REXML:
require 'rexml/document'
rexml = REXML::Document.new(weather)
birthdays = nil
rexml.each_recursive {|element| birthdays = element if element.name == 'getTodaysBirthdaysResult'}
birthdays.each_recursive{|element| puts "#{element.name} = #{element.text}" if element.text}
This will print out all elements that have any text.
So once you have an XML document, you can do pretty much anything with it, depending upon the methods of the library you choose, i.e. REXML or Nokogiri.
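For comparison, a sketch of the same extraction with Nokogiri, assuming weather holds the XML string returned above (remove_namespaces! is used because the response mixes several namespaces):
require 'nokogiri'

doc = Nokogiri::XML(weather)
doc.remove_namespaces!
doc.xpath('//NewDataSet/Table').each do |table|
  # print each child element's name and text, e.g. "FullName = Cully, Zara"
  table.element_children.each { |e| puts "#{e.name} = #{e.text}" }
end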
Well, here's my suggestion.
The issue is that you have to snag the right part of the result, something you can actually iterate over. Unfortunately, all the inspecting in the world won't help you, because it's a huge blob of unreadable text.
What I do is this:
File.open('myresult.yaml', 'w') { |f| f.write(result.to_yaml) }
This gives a much more human-readable format. What you are probably looking for is something like this:
--- !ruby/object:SOAP::Mapping::Object
__xmlattr: {}
__xmlele:
- - &id024 !ruby/object:XSD::QName
    name: ListAddressBooksResult   <-- Hash name, so it's result["ListAddressBooksResult"]
    namespace: http://apiconnector.com
    source:
  - !ruby/object:SOAP::Mapping::Object
    __xmlattr: {}
    __xmlele:
    - - &id023 !ruby/object:XSD::QName
        name: APIAddressBook   <-- this bastard is enumerable :) YAY! so it's result["ListAddressBooksResult"]["APIAddressBook"].each
        namespace: http://apiconnector.com
        source:
      - - !ruby/object:SOAP::Mapping::Object
The above is a result from DotMailer's API; I spent the last hour trying to figure out how to enumerate over its results, and this is the technique I used to figure out what the heck was going on. I think it beats using REXML etc. This way, I could do something like this:
result['ListAddressBooksResult']['APIAddressBook'].each {|book| puts book["Name"]}
Well, I hope this helps anyone else who is looking.
/jason
