XML - How to pull out repeating children nodes - ruby

I am trying to extract repeating child elements from an xpath.
This is a sample of the XML:
<productDetail>
<productTypeCode>123</productTypeCode>
<productPrice currency="EUR">13.27</productPrice>
<productPrice currency="US">15</productPrice>
</productDetail>
As you can see the productPrice currency node is repeating.
I am able to pull out one of them by looping through each of the elements:
@node.children.each do |c|
  if c.name == "productDetail"
    info = {}
    productTypeCode = nil
    c.children.each do |gc|
      name = gc.name
      if name == "productTypeCode"
        productTypeCode = gc.text
      elsif name == "productPrice"
        # Each productPrice writes to the same keys, so only the last one is kept.
        info["productPrice"] = gc.text
        attrs = gc.attributes
        info["productPrice_cur"] = attrs["currency"].value
      end
    end
  end
end
As you can see I only have the "productPrice" information once in the loop, but there are two of them in the XML data.
How do I access both of the values, seeing as the XPath and element names are the same?
I am coding this in Ruby.
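One possible approach is to collect every productPrice into an array instead of assigning to a single hash key. The sketch below is only an illustration: it uses REXML rather than the parser from the question (an assumption made so the example is self-contained), and the "productPrices" key and the shape of each entry are hypothetical names, not part of the original code.

```ruby
require 'rexml/document'

# Sample XML copied from the question.
xml = <<~XML
  <productDetail>
    <productTypeCode>123</productTypeCode>
    <productPrice currency="EUR">13.27</productPrice>
    <productPrice currency="US">15</productPrice>
  </productDetail>
XML

doc = REXML::Document.new(xml)
detail = doc.root

# "productPrices" is a hypothetical key: an array with one entry per element.
info = {
  'productTypeCode' => detail.elements['productTypeCode'].text,
  'productPrices'   => []
}

# Append each repeating child instead of overwriting a single value.
detail.elements.each('productPrice') do |price|
  info['productPrices'] << {
    'currency' => price.attributes['currency'],
    'value'    => price.text
  }
end

info['productPrices'].each do |price|
  puts "#{price['currency']}: #{price['value']}"
end
```

The same idea should carry over to the original loop: replace the two assignments in the productPrice branch with an append to an array.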

Related

Avoid nested select blocks

I have to retrieve some information related to movies and shows from a json document.
unique_nos = js['navigation']['category'].select{|n| n['name']=="Home"}.first['category'].select{|s| s['name']=="#{type}"}.first['category'].select{|k| k['name']=='Movie Studios'}.first['category'].map{|l| l['categoryId']}
The same would go for tv shows also.
unique_nos = js['navigation']['category'].select{|n| n['name']=="Home"}.first['category'].select{|s| s['name']=='TV'}.first['category'].select{|k| k['name']=='Networks'}.first['category'].map{|l| l['categoryId']}
I would like to avoid duplicating code that performs the same task. I would rather pass the varying parts in as parameters so the lookup can be dynamic. Is there any way to achieve this with metaprogramming?
You can simply extract it as a method:
def find_unique_nos(js, type, category)
js['navigation']['category'].select{|n| n['name']=="Home"}.first['category'].select{|s| s['name']== type }.first['category'].select{|k| k['name']==category}.first['category'].map{|l| l['categoryId']}
end
On a side note, select { ... }.first is equivalent to find { ... }, so you can simplify this to:
def find_unique_nos(js, type, category)
  js['navigation']['category'].find { |n| n['name'] == "Home" }['category']
    .find { |s| s['name'] == type }['category']
    .find { |k| k['name'] == category }['category']
    .map { |l| l['categoryId'] }
end
If you want to be more sophisticated, you can use a builder to do the repetitive job of find{ ... }['category']:
def find_unique_nos(js, type, category)
  ['Home', type, category].inject(js['navigation']['category']) do |cat, name|
    cat.find { |n| n['name'] == name }['category']
  end.map { |l| l['categoryId'] }
end
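To make the inject version concrete, here is a small self-contained sketch. The JSON document is hypothetical sample data shaped the way the question implies, and the KeyError guard is an addition that the answer above does not have; it only makes a missing category fail with a clearer message than a NoMethodError on nil.

```ruby
require 'json'

# Hypothetical sample data; the real document may contain more keys.
js = JSON.parse(<<~DATA)
  {
    "navigation": {
      "category": [
        { "name": "Home", "category": [
          { "name": "TV", "category": [
            { "name": "Networks", "category": [
              { "categoryId": 101 }, { "categoryId": 102 }
            ] }
          ] },
          { "name": "Movies", "category": [
            { "name": "Movie Studios", "category": [
              { "categoryId": 201 }
            ] }
          ] }
        ] }
      ]
    }
  }
DATA

def find_unique_nos(js, type, category)
  # Walk down one level per name, starting from the top-level category list.
  ['Home', type, category].inject(js['navigation']['category']) do |cats, name|
    node = cats.find { |c| c['name'] == name }
    # Added guard (not in the original answer): fail loudly on a missing name.
    raise KeyError, "no category named #{name.inspect}" unless node
    node['category']
  end.map { |c| c['categoryId'] }
end

tv_ids    = find_unique_nos(js, 'TV', 'Networks')
movie_ids = find_unique_nos(js, 'Movies', 'Movie Studios')
```

No metaprogramming is needed here: the only things that vary between the two lookups are the names, so they can be passed as plain arguments.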
Please consider breaking such long chains across lines or into intermediate variables; it makes debugging and comprehension easier. Here is your same code, reformatted:
def unique_numbers(json:, type:)
  category = type == 'TV' ? 'Networks' : 'Movie Studios'
  json['navigation']['category']
    .select { |n| n['name'] == "Home" }
    .first['category']
    .select { |s| s['name'] == type }
    .first['category']
    .select { |k| k['name'] == category }
    .first['category']
    .map { |l| l['categoryId'] }
end

How do I access an attribute from the outer node inside an inner condition?

I want to perform the following XPath:
/Configs/Category/InputMenu/Config[@Value = 'DualPack' and (/Configs/Category/MasterSlave/Config[@No = ./@No]/@Value = 'Master')]
Where the "./@No" in the part "[@No = ./@No]" is from /Configs/Category/InputMenu/Config/@No, not from /Configs/Category/MasterSlave/Config/@No
How can I specify that the ./@No is from that outer node?
thanks - dave
In XPath 2, you could use for to simulate the let of XQuery:
/Configs/Category/InputMenu/Config[@Value = 'DualPack' and (for $no in ./@No return /Configs/Category/MasterSlave/Config[@No = $no]/@Value = 'Master')]
Otherwise you could turn it around and check whether the No attribute is equal to the No attribute of some element that has the value Master (instead of checking whether the value of an element with the same No attribute is Master).
/Configs/Category/InputMenu/Config[@Value = 'DualPack' and @No = /Configs/Category/MasterSlave/Config[@Value = 'Master']/@No]
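The "turn it around" idea can also be carried out in two steps from host code, which avoids relying on how a particular XPath engine handles the nested comparison. The sketch below uses Ruby with REXML; the sample document is hypothetical, with its layout inferred from the paths in the question, and the two-step filtering is a substitute for the single XPath expression rather than a translation of it.

```ruby
require 'rexml/document'

# Hypothetical sample document; layout inferred from the XPath in the question.
xml = <<~XML
  <Configs>
    <Category>
      <InputMenu>
        <Config No="1" Value="DualPack"/>
        <Config No="2" Value="DualPack"/>
        <Config No="3" Value="Single"/>
      </InputMenu>
      <MasterSlave>
        <Config No="1" Value="Master"/>
        <Config No="2" Value="Slave"/>
        <Config No="3" Value="Master"/>
      </MasterSlave>
    </Category>
  </Configs>
XML

doc = REXML::Document.new(xml)

# Step 1: collect the No values of every MasterSlave config marked Master.
master_path = "/Configs/Category/MasterSlave/Config[@Value='Master']"
master_nos = REXML::XPath.match(doc, master_path)
                         .map { |config| config.attributes['No'] }

# Step 2: keep the DualPack InputMenu configs whose No appears in that list.
input_path = "/Configs/Category/InputMenu/Config[@Value='DualPack']"
matches = REXML::XPath.match(doc, input_path)
                      .select { |config| master_nos.include?(config.attributes['No']) }

matched_nos = matches.map { |config| config.attributes['No'] }
puts matched_nos.inspect
```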

How do I find a collection of nodes in HtmlAgilityPack using linq to xml?

I want to extract information from various websites. I am using HtmlAgilityPack and Linq to XML. So far I have managed to extract the value from a single node in a website by writing:
var q = document.DocumentNode.DescendantNodes()
    .Where(n => n.Name == "img" && n.Id == "GraphicalBoard001")
    .FirstOrDefault();
But I am really interested in the whole collection of img's that start with "GraphicalBoard". I tried something like:
var q2 = document.DocumentNode.DescendantNodes()
    .Where(n => n.Name == "img" && n.Id.Contains("GraphicalBoard"))
    .Select...
But it seems that linq doesn't like the Contains-method, since I lose the Select option in intellisense. How can I extract all the img-tags where the Id starts with "GraphicalBoard"?
How can I extract all the img-tags where the Id starts with "GraphicalBoard"?
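You had it already; just stop at the call to Where(). Where() filters the collection, keeping only the items that satisfy the predicate, so no Select is needed afterwards.
You should, however, filter over the img descendants only, not all descendants, and use StartsWith rather than Contains since you want IDs that start with "GraphicalBoard".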
var query = doc.DocumentNode.Descendants("img")
    .Where(img => img.Id.StartsWith("GraphicalBoard"));

Set array elements (string) as variable name in Ruby

I have the following array, that I use to later write the header on an Excel file.
fields = ["fileName", "type", "id"]
And then I have the following code that reads values from an XML:
filename = xml.xpath('//path/filename').text
type = xml.xpath('//path/type').text
id = xml.xpath('//path/id').text
Then I iterate over the initial array (fields) in order to set the Excel cells to the values extracted in the previous step:
row = 2
c = 1
fields.each do |content|
  ws.Cells(row,c).Value = content
  c = c + 1
end
I'm trying to treat the contents of the fields array as variable names rather than strings, so that I can reuse the header fields.
Can anyone recommend a way of making it possible?
It sounds like you need a Hash to associate the field names with the values you extracted:
fields = {
  "fileName" => xml.xpath('//path/filename').text,
  "type" => xml.xpath('//path/type').text,
  "id" => xml.xpath('//path/id').text
}
row = 2
c = 1
fields.each do |key, value|
  ws.Cells(row, c).Value = value
  c = c + 1
end
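Because a Hash keeps both the names and the values, the same object can drive the header row as well as the data row. The sketch below illustrates that under some assumptions: the field values are hypothetical sample strings standing in for the XPath lookups, and cells is a plain Hash keyed by [row, column] that stands in for the worksheet, so the example runs without Excel.

```ruby
# Hypothetical sample values standing in for the xml.xpath(...).text lookups.
fields = {
  'fileName' => 'report.xml',
  'type'     => 'invoice',
  'id'       => '42'
}

# Stand-in for the worksheet: a Hash keyed by [row, column].
cells = {}

fields.each_with_index do |(header, value), index|
  column = index + 1            # worksheet columns are 1-based
  cells[[1, column]] = header   # header row reuses the hash keys
  cells[[2, column]] = value    # data row uses the matching values
end

cells.sort.each do |(row, column), content|
  puts "row #{row}, column #{column}: #{content}"
end
```

With a real worksheet, the two assignments inside the block would become ws.Cells(1, column).Value = header and ws.Cells(2, column).Value = value.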

Multiple Counts within a single query

I want a list of counts for some of my data (the number of open/closed tasks, etc.), and I want to get all the counts in one query, so I am not sure what to do with my LINQ statement below.
_user is an object that returns info about the currently logged-on user.
_repo is an object that returns an IQueryable of whichever table I want to select.
var counters = (from task in _repo.All<InstructionTask>()
                where task.AssignedToCompanyID == _user.CompanyID || task.CompanyID == _user.CompanyID
                join instructions in _repo.GetAllMyInstructions(_user)
                    on task.InstructionID equals instructions.InstructionID
                group new { task, instructions } by new { task } into g
                select new
                {
                    TotalEveryone = g.Count(),
                    TotalMine = g.Count(),
                    TotalOpen = g.Count(x => x.task.IsOpen),
                    TotalClosed = g.Count(c => !c.task.IsOpen)
                }).SingleOrDefault();
Do I convert my object to single or default? The exception I am getting is, this sequence contains more than one element
Note: I want overall stats, not for each task, but for all tasks - not sure how to get that?
You need to dump everything into a single group, and use a regular Single. I am not sure if LINQ-to-SQL would be able to translate it correctly, but it's definitely worth a try.
var counters = (from task in _repo.All<InstructionTask>()
                where task.AssignedToCompanyID == _user.CompanyID || task.CompanyID == _user.CompanyID
                join instructions in _repo.GetAllMyInstructions(_user)
                    on task.InstructionID equals instructions.InstructionID
                group task by 1 /* <<=== All tasks go into one group */ into g
                select new
                {
                    TotalEveryone = g.Count(),
                    TotalMine = g.Count(), // <<=== You probably need a condition here
                    TotalOpen = g.Count(x => x.IsOpen),
                    TotalClosed = g.Count(c => !c.IsOpen)
                }).Single();
From MSDN
Returns the only element of a sequence, or a default value if the
sequence is empty; this method throws an exception if there is more
than one element in the sequence.
You need to use FirstOrDefault. SingleOrDefault is designed for sequences that contain exactly one element (or none).
