Extend existing XML with Nokogiri - ruby

I'm trying to extend an existing XML file by adding a new node. I load the XML, which contains a lot of products, add a new one and save it.
I'm using Nokogiri and Ruby 1.9.3.
This is the best I have come up with:
builder = Nokogiri::XML::Builder.new do
root do
load_xml = Nokogiri::XML(IO.read("test.xml"))
parent.add_child(load_xml.root)
data do
name "Name"
end
end
end
file = File.open("test.xml",'w')
file.puts builder.to_xml
file.close

Nokogiri::XML::Builder is meant for creating new XML documents, not for editing existing ones.
Also, your code loads the XML and puts it into a new root node (root), then appends a new child (the data node) to that new root. Is this really the desired behaviour?
Normally you would add a node like this:
doc = Nokogiri::XML(IO.read("test.xml"))
name_node = Nokogiri::XML::Node.new("name",doc)
name_node.content = "Name"
data_node = Nokogiri::XML::Node.new("data",doc)
data_node.add_child(name_node)
doc.root.add_child(data_node)
file = File.open("test.xml",'w')
file.puts doc.to_xml
file.close
This version does not create a new root node, because that seems a little peculiar to me.
You might also want to read the Nokogiri documentation; it is fairly extensive.
Another way is to use Nokogiri::XML::Builder to create the data node and everything below it, then attach the result to the loaded document. This is an example of the combined approach:
builder = Nokogiri::XML::Builder.new do
data do
name "Name"
end
end
doc = Nokogiri::XML(IO.read("test.xml"))
doc.root.add_child builder.doc.root
file = File.open("test.xml",'w')
file.puts doc.to_xml
file.close

Related

Nokogiri - Checking if the value of an xpath exists and is blank or not in Ruby

I have an XML file, and before I process it I need to make sure that a certain element exists and is not blank.
Here is the code I have:
CSV.open("#{csv_dir}/products.csv","w",{:force_quotes => true}) do |out|
out << headers
Dir.glob("#{xml_dir}/*.xml").each do |xml_file|
gdsn_doc = GDSNDoc.new(xml_file)
logger.info("Processing xml file #{xml_file}")
:x
@desc_exists = @gdsn_doc.xpath("//productData/description")
if !@desc_exists.empty?
row = []
headers.each do |col|
row << product[col]
end
out << row
end
end
end
The following code is not working to find the "description" element and to check whether it is blank or not:
@desc_exists = @gdsn_doc.xpath("//productData/description")
if !@desc_exists.empty?
Here is a sample of the XML file:
<productData>
<description>Chocolate biscuits </description>
</productData>
This is how I have defined the class and Nokogiri:
class GDSNDoc
def initialize(xml_file)
@doc = File.open(xml_file) {|f| Nokogiri::XML(f)}
@doc.remove_namespaces!
The code had to be moved up to an earlier stage, where Nokogiri was initialised. It doesn't get runtime errors, but it does let XML files with blank descriptions get through and it shouldn't.
class GDSNDoc
def initialize(xml_file)
@doc = File.open(xml_file) {|f| Nokogiri::XML(f)}
@doc.remove_namespaces!
desc_exists = @doc.xpath("//productData/descriptions")
if !desc_exists.empty?
You are creating your instance like this:
gdsn_doc = GDSNDoc.new(xml_file)
then use it like this:
@desc_exists = @gdsn_doc.xpath("//productData/description")
@gdsn_doc and gdsn_doc are two different things in Ruby - try just using the version without the @:
@desc_exists = gdsn_doc.xpath("//productData/description")
The basic test is to use:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<productData>
<description>Chocolate biscuits </description>
</productData>
EOT
# using XPath selectors...
doc.xpath('//productData/description').to_html # => "<description>Chocolate biscuits </description>"
doc.xpath('//description').to_html # => "<description>Chocolate biscuits </description>"
xpath works fine when the document is parsed correctly.
I get an error "undefined method 'xpath' for nil:NilClass (NoMethodError)"
Usually this means you didn't parse the document correctly. In your case it's because you're not using the right variable:
gdsn_doc = GDSNDoc.new(xml_file)
...
@desc_exists = @gdsn_doc.xpath("//productData/description")
Note that gdsn_doc is not the same as @gdsn_doc. The latter doesn't appear to have been initialized.
@doc = File.open(xml_file) {|f| Nokogiri::XML(f)}
While that should work, it's idiomatic to write it as:
@doc = Nokogiri::XML(File.read(xml_file))
File.open(...) do ... end is preferred if you're processing inside the block and want Ruby to automatically close the file. That isn't necessary when you're simply reading the content and passing it to something else for processing, hence the use of File.read(...), which slurps the file. (Slurping isn't necessarily good practice because it can have scalability problems, but for reasonably sized XML/HTML it's OK because it's easier to use DOM-based parsing than SAX.)
If Nokogiri doesn't raise an exception, it was able to parse the content. However, that still doesn't mean the content was valid. It's a good idea to check
@doc.errors
to see whether Nokogiri/libXML had to do some fix-ups on the content just to be able to parse it. Fixing the markup can change the DOM from what you expect, making it impossible to find a tag based on your assumptions for the selector. You could use xmllint or one of the XML validators to check, but Nokogiri will still have to be happy.
Nokogiri includes a command-line tool, nokogiri, that accepts a URL to the document you want to parse:
nokogiri http://example.com
It'll open IRB with the content loaded and ready for you to poke at it. It's very convenient when debugging and testing. It's also a decent way to make sure the content actually exists if you're dealing with HTML containing DHTML that loads parts of the page dynamically.

How to read data from a different file without using YAML or JSON

I'm experimenting with a Ruby script that will add data to a Neo4j database using REST API. (Here's the tutorial with all the code if interested.)
The script works if I include the hash data structure in the initialize method but I would like to move the data into a different file so I can make changes to it separately using a different script.
I'm relatively new to Ruby. If I copy the following data structure into a separate file, is there a simple way to read it from my existing script when I call @data? I've heard one could do something with YAML or JSON (I'm not familiar with how either works). What's the easiest way to read a file and how could I go about coding that?
#I want to copy this data into a different file and read it with my script when I call @data.
{
nodes:[
{:label=>"Person", :title=>"title_here", :name=>"name_here"}
]
}
And here is part of my code, it should be enough for the purposes of this question.
class RGraph
def initialize
@url = 'http://localhost:7474/db/data/cypher'
#If I put this hash structure into a different file, how do I make @data read that file?
@data = {
nodes:[
{:label=>"Person", :title=>"title_here", :name=>"name_here"}
]
}
end
#more code here... not relevant to question
def create_nodes
# Scan file, find each node and create it in Neo4j
@data.each do |key,value|
if key == :nodes
@data[key].each do |node| # Cycle through each node
next unless node.has_key?(:label) # Make sure this node has a label
#WE have sufficient data to create a node
label = node[:label]
attr = Hash.new
node.each do |k,v| # Hunt for additional attributes
next if k == :label # Don't create an attribute for "label"
attr[k] = v
end
create_node(label,attr)
end
end
end
end
rGraph = RGraph.new
rGraph.create_nodes
end
Given that OP said in comments "I'm not against using either of those", let's do it in YAML (which preserves the Ruby object structure best). Save it:
@data = {
nodes:[
{:label=>"Person", :title=>"title_here", :name=>"name_here"}
]
}
require 'yaml'
File.write('config.yaml', YAML.dump(@data))
This will create config.yaml:
---
:nodes:
- :label: Person
  :title: title_here
  :name: name_here
If you read it in, you get exactly what you saved:
require 'yaml'
@data = YAML.load(File.read('config.yaml'))
puts @data.inspect
# => {:nodes=>[{:label=>"Person", :title=>"title_here", :name=>"name_here"}]}
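To tie this back to the class in the question, here is a minimal sketch, assuming the file path is passed to the constructor. The path parameter and the attr_reader are additions for illustration; the original initialize takes no arguments:

```ruby
require 'yaml'
require 'tmpdir'

class RGraph
  attr_reader :data

  # `path` is an assumed parameter pointing at the YAML file.
  def initialize(path)
    @url  = 'http://localhost:7474/db/data/cypher'
    @data = YAML.load(File.read(path))
  end
end

# Demonstration using a temporary directory instead of a real config file.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'config.yaml')
  nodes = { nodes: [{ label: 'Person', title: 'title_here', name: 'name_here' }] }
  File.write(path, YAML.dump(nodes))

  graph = RGraph.new(path)
  puts graph.data[:nodes].first[:label]
end
```

The other script can then edit config.yaml on its own, and create_nodes keeps iterating over @data unchanged.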

How to properly automate xml to xls

I have been getting a lot of XML files recently that I want to analyse in Excel. Instead of using the XML conversion built into (newer versions of) Excel, I want to use Ruby code that converts a number of files automatically.
I am not very familiar with REXML, however. After half a day's work I got the code to convert just one(!) XML node. This is how it looks:
require 'rexml/document'
Dir.glob("FILES/archive/*.xml") do |eksemel|
puts "converting #{eksemel}"
filename = (/\d+/.match(eksemel)).to_s
xml_file = File.open("#{eksemel}", "r")
csv_file = File.new("#{filename}.csv", "w")
xml = REXML::Document.new( xml_file )
counter = 0
xml.elements.each("RESULTS") do |e|
e.elements.each("component") do |f|
f.elements.each("paragraph") do |g|
counter = counter + 1
csv_file.puts g.text
end
end
end
end
Is there a way to a) let Ruby find the element names and their number automatically instead of defining them by hand, and b) save all of these as separate columns in a CSV file?
It isn't clear what you are using counter for. It would also help if you clarified what kind of structure the XML file has (for instance, are there many <paragraph> elements within each <component> element?). But here is a cleaner way to write what I think you are shooting for:
require 'rexml/document'
require 'csv'
Dir.glob('FILES/archive/*.xml') do |eksemel|
puts "converting #{eksemel}"
# I assume you are creating a .csv file with the same name as your .xml file
xml_file = File.new(eksemel)
csv_file = CSV.open(eksemel.sub(/\.xml$/, '.csv'), 'w')
xml = REXML::Document.new(xml_file)
counter = xml.elements.to_a('RESULTS//component//paragraph').length
xml.elements.each('RESULTS//component') do |component|
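# Write each paragraph's text; the element objects themselves would be written as XML markup
csv_file << component.elements.to_a('paragraph').map(&:text)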
end
[xml_file, csv_file].each {|f| f.close}
end
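For a quick check of the row layout without touching the file system, here is a small sketch that builds the CSV in memory. It assumes the RESULTS > component > paragraph nesting from the question; components_to_csv and the sample XML are made up for illustration:

```ruby
require 'rexml/document'
require 'csv'

# Hypothetical helper: one CSV row per <component>, one column per <paragraph>.
def components_to_csv(xml_string)
  xml = REXML::Document.new(xml_string)
  CSV.generate do |csv|
    xml.elements.each('RESULTS/component') do |component|
      csv << component.elements.to_a('paragraph').map(&:text)
    end
  end
end

sample = <<~XML
  <RESULTS>
    <component><paragraph>a</paragraph><paragraph>b</paragraph></component>
    <component><paragraph>c</paragraph></component>
  </RESULTS>
XML

puts components_to_csv(sample)
```

Rows can have different lengths this way; if Excel needs a fixed number of columns, the shorter rows would have to be padded.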

How to save the result of Nokogiri

This is part of a ruby script. I want to save the results to a text file. I only want the results specified in these two DIVS.
url = browser.html
doc = Nokogiri::HTML(open(url))
price = doc.css("#sectionPrice").text
ship = doc.css("#shippingCharges td").text
How do I save the scraped results? Mind you, the script loading the page is working correctly. In the shell I can see the values of my scrape using XPath as follows.
page_html = Nokogiri::HTML.parse(browser.html)
shipping = puts page_html.xpath(".//*[@id='shippingCharges']").inner_text
price = puts page_html.xpath(".//*[@id='sectionPrice']").inner_text
How do I save this data to a CSV or XML?
Side question: Is the data returned in the shell saved anywhere? How do I access it outside of the shell?
url = browser.html
doc = Nokogiri::HTML(open(url))
price = doc.css("#sectionPrice").text
ship = doc.css("#shippingCharges td").text
CSV.open("/users/fabio/desktop/ruby/gp.csv", "wb") do |csv|
csv << [price, ship]
end
This is not creating the CSV file. Nothing appears in the directory. What gives?
It is pretty simple to write this to a csv file.
Just add the following in:
require 'csv'
CSV.open("file.csv", "wb") do |csv|
csv << [price, ship]
end
If shipping and price are arrays then you will want to iterate through them, but this is how you create a CSV.
Hope this gets you on your way.
Cheers!

Create one XML file that joins many others

I am trying to create one XML file from a list of XML files.
Here is an example list of the XML files:
java.xml :
<JavaDetails>
<SomeList> ... </SomeList>
....
</JavaDetails>
c.xml
<CDetails>
<SomeList> ... </SomeList>
....
</CDetails>
I want to create a Programming.xml from the XML files above.
It should look like:
<programming>
<Java>
<JavaDetails>
<SomeList> ... </SomeList>
....
</JavaDetails>
</Java>
<C>
<CDetails>
<SomeList> ... </SomeList>
....
</CDetails>
</C>
</programming>
I am currently looking into Nokogiri for this, as performance is a major factor. What I am not sure about is how to create nodes for the output XML. Any code help in Ruby using Nokogiri is much appreciated.
To create a new XML file with a specific root, it can be as simple as:
doc = Nokogiri.XML("<programming/>")
One way to add a child node to that document:
java = doc.root.add_child('<Java/>').first
To read in another XML file from disk and append it:
java_details = Nokogiri.XML( IO.read('java.xml') )
java << java_details.root
Thus, if you have an array of filenames and you want to construct wrapping elements from each based on the name:
require 'nokogiri'
files = %w[ java.xml c.xml ]
doc = Nokogiri.XML('<programming/>')
files.each do |filename|
wrap_name = File.basename(filename,'.*').capitalize
wrapper = doc.root.add_child("<#{wrap_name} />").first
wrapper << Nokogiri.XML(IO.read(filename)).root
end
puts doc
Alternatively, if you want to use the Builder interface of Nokogiri:
builder = Nokogiri::XML::Builder.new do |xml|
xml.programming do
files.each do |filename|
wrap_name = File.basename(filename,'.*').capitalize
xml.send(wrap_name) do
xml.parent << Nokogiri.XML(IO.read(filename)).root
end
end
end
end
puts builder.to_xml
To install it:
gem install nokogiri
Here's the syntax:
require 'nokogiri'
builder = Nokogiri::XML::Builder.new do |xml|
xml.programming {
xml.Java {
xml.JavaDetails {
xml.SomeList 'List item'
}
}
}
end
The result can be retrieved with to_xml:
builder.to_xml
HTH!

Resources