Linq to XML: Finding value of a specific element - linq

I am a newbie to LINQ and LINQ to XML. I have the following XML tree:
<WLANProfile xmlns="http://www.microsoft.com/networking/WLAN/profile/v1">
<name>IWS</name>
<SSIDConfig>
<SSID>
<hex>496153</hex>
<name>ISL</name>
</SSID>
</SSIDConfig>
<connectionType>ESS</connectionType>
<MSM>
<security>
<authEncryption>
<authentication>WPA2PSK</authentication>
<encryption>AES</encryption>
<useOneX>false</useOneX>
</authEncryption>
<sharedKey>
<keyType>networkKey</keyType>
<protected>false</protected>
<keyMaterial>BFEBBEA9B0E78ECD671A8D35D96556A32E001B7524A1</keyMaterial>
</sharedKey>
</security>
</MSM>
</WLANProfile>
How can I retrieve the value of the keyMaterial element using LINQ to XML?
I tried the following code, but I get an empty enumeration:
var networkKey = from c in doc.Descendants("WLANProfile")
select (string)c.Element("keyMaterial").Value;
Any suggestions?

Two mistakes:
1.) keyMaterial is not a direct child of WLANProfile, which is why you don't get any results (c.Element only looks at direct children).
2.) You need to use the namespace declared in the XML; otherwise no node will match.
Both applied:
XNamespace xns = "http://www.microsoft.com/networking/WLAN/profile/v1";
var networkKey = (from c in doc.Descendants(xns + "keyMaterial")
select (string)c.Value).FirstOrDefault();
Somewhat shorter in method syntax, if you know there is always exactly one key (Single() throws if there are zero or several matches):
string networkKey = doc.Descendants(xns + "keyMaterial").Single().Value;
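For readers outside .NET, the same namespace rule can be sketched in Ruby with REXML. This is only an illustration, not part of the original answer: the prefix `w`, the constant names, and the helper `network_key` are assumptions, and the XML is a trimmed copy of the profile from the question.

```ruby
require "rexml/document"

# Trimmed copy of the profile from the question; only the path to keyMaterial is kept.
WLAN_XML = <<~XML
  <WLANProfile xmlns="http://www.microsoft.com/networking/WLAN/profile/v1">
    <name>IWS</name>
    <MSM>
      <security>
        <sharedKey>
          <keyType>networkKey</keyType>
          <keyMaterial>BFEBBEA9B0E78ECD671A8D35D96556A32E001B7524A1</keyMaterial>
        </sharedKey>
      </security>
    </MSM>
  </WLANProfile>
XML

WLAN_NS = "http://www.microsoft.com/networking/WLAN/profile/v1"

# Hypothetical helper: returns the text of the first keyMaterial element
# in the WLAN namespace, or nil when there is none.
def network_key(xml)
  doc = REXML::Document.new(xml)
  # "w" is an arbitrary prefix bound to the namespace URI for this query only.
  node = REXML::XPath.first(doc, "//w:keyMaterial", { "w" => WLAN_NS })
  node && node.text
end

puts network_key(WLAN_XML)
```

As in the LINQ version, the query searches all descendants and qualifies the element name with the namespace URI rather than relying on the bare local name.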

Related

trying to parse specific data using xpath

I have a small XML file from which I'm trying to grab the away_team first and then the home_team second.
/game/team/statistics/@goals gives me the data I want, but I need to reverse the order. So I'm trying to understand how to get the away_team goals first, followed by the home_team goals.
Below is the file
<game id="f24275a9-4f30-4a81-abdf-d16a9aeda087" status="closed" coverage="full" home_team="4416d559-0f24-11e2-8525-18a905767e44" away_team="44167db4-0f24-11e2-8525-18a905767e44" scheduled="2013-10-10T23:00:00+00:00" attendance="18210" start_time="2013-10-10T23:08:00+00:00" end_time="2013-10-11T01:32:00+00:00" clock="00:00" period="3" xmlns="http://feed.elasticstats.com/schema/hockey/game-v2.0.xsd">
<venue id="bd7b42fa-19bb-4b91-8615-214ccc3ff987" name="First Niagara Center" capacity="18690" address="One Seymour H. Knox III Plaza" city="Buffalo" state="NY" zip="14203" country="USA"/>
<team name="Sabres" market="Buffalo" id="4416d559-0f24-11e2-8525-18a905767e44" points="1">
<scoring>
<period number="1" sequence="1" points="1"/>
<period number="2" sequence="2" points="0"/>
<period number="3" sequence="3" points="0"/>
</scoring>
<statistics goals="1" assists="2" penalties="7" penalty_minutes="23" team_penalties="0" team_penalty_minutes="0" shots="27" blocked_att="14" missed_shots="8" hits="25" giveaways="5" takeaways="10" blocked_shots="7" faceoffs_won="22" faceoffs_lost="28" powerplays="1" faceoffs="50" faceoff_win_pct="44.0" shooting_pct="3.7" points="3">
<powerplay faceoffs_won="2" faceoffs_lost="0" shots="0" goals="0" missed_shots="1" assists="0" faceoff_win_pct="100.0" faceoffs="2"/>
<shorthanded faceoffs_won="3" faceoffs_lost="3" shots="1" goals="0" missed_shots="0" assists="0" faceoffs="6" faceoff_win_pct="50.0"/>
<evenstrength faceoff_win_pct="40.5" missed_shots="7" goals="1" faceoffs_won="17" shots="26" faceoffs="42" faceoffs_lost="25" assists="2"/>
<penalty shots="0" goals="0" missed_shots="0"/>
</statistics>
<shootout shots="0" missed_shots="0" goals="0" shots_against="0" goals_against="0" saves="0" saves_pct="0"/>
<goaltending shots_against="33" goals_against="4" saves="29" saves_pct="0.879" total_shots_against="33" total_goals_against="4">
<powerplay shots_against="0" goals_against="0" saves="0" saves_pct="0"/>
<shorthanded shots_against="7" goals_against="0" saves="7" saves_pct="1.0"/>
<evenstrength goals_against="4" saves_pct="0.846" shots_against="26" saves="22"/>
<penalty shots_against="0" goals_against="0" saves="0" saves_pct="0"/>
<emptynet goals_against="0" shots_against="0">
<powerplay goals_against="0"/>
<shorthanded goals_against="0"/>
<evenstrength goals_against="0"/>
</emptynet>
</goaltending>
Here's an XPath 2.0 expression that should do what you asked, yielding a sequence of two attribute nodes with the away team first:
(/game/team[@id = /game/@away_team]/statistics/@goals,
/game/team[@id = /game/@home_team]/statistics/@goals)
Credit to @Ian for sleuthing out the details of the question.
In XPath 1.0, you could concatenate string data from the two teams in whatever order you want:
concat(/game/team[@id = /game/@away_team]/statistics/@goals, ' ',
/game/team[@id = /game/@home_team]/statistics/@goals)
But as Ian said, you can't produce a nodeset with an order different from document order. (I don't think a nodeset has any intrinsic order at all... it's how it's processed that imposes an order.)
Update:
As Ian pointed out, your XML data is in a namespace, thanks to the default namespace declaration on <game>. Since you said that "/game/team/statistics/@goals gives me the data", I'm assuming that you've already taken care of this aspect of the problem, perhaps by declaring the default namespace in your XPath execution environment.
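If the host language is doing the work anyway, the ordering can also be imposed outside XPath. Here is a rough Ruby sketch using REXML, under stated assumptions: the question's XML is cut off before the second <team>, so the away team's name and its goals="4" below are made up for illustration, and `goals_away_then_home` is a hypothetical helper. It compares local element names, so the default namespace needs no special handling here.

```ruby
require "rexml/document"

# Reduced sample. The home team's id and goals come from the question; the
# away team's name and goals="4" are illustrative assumptions.
GAME_XML = <<~XML
  <game home_team="4416d559-0f24-11e2-8525-18a905767e44"
        away_team="44167db4-0f24-11e2-8525-18a905767e44"
        xmlns="http://feed.elasticstats.com/schema/hockey/game-v2.0.xsd">
    <team name="Sabres" id="4416d559-0f24-11e2-8525-18a905767e44">
      <statistics goals="1"/>
    </team>
    <team name="Away" id="44167db4-0f24-11e2-8525-18a905767e44">
      <statistics goals="4"/>
    </team>
  </game>
XML

# Returns [away_goals, home_goals] as strings, regardless of document order.
def goals_away_then_home(xml)
  game = REXML::Document.new(xml).root
  teams = game.elements.select { |e| e.name == "team" }
  %w[away_team home_team].map do |role|
    team_id = game.attributes[role]                 # id stored on <game>
    team = teams.find { |t| t.attributes["id"] == team_id }
    stats = team.elements.find { |e| e.name == "statistics" }
    stats.attributes["goals"]
  end
end

p goals_away_then_home(GAME_XML)
```

The order of the result is driven by the `%w[away_team home_team]` list, not by where each <team> sits in the file.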

Handle storing of child elements with same name and different XPath?

I'm trying to extract values from XML with Nokogiri.
I want to store, separated in an array, the child elements with the same name but different xpath. Those elements are ProdA, ProdB.
Currently I'm only trying to print the child elements, but the code I have so far prints only "SDocument" and not the child elements.
The goal is have an array like this:
array = [["2","8"], ["8","9"]]
This is the code:
#!/usr/bin/env ruby
require 'nokogiri'
doc = Nokogiri::XML(File.open("input.xml"))
a = doc.xpath("//SDocument").each do |n|
n if n.text?
end
puts a
This is the XML:
<?xml version="1.0" encoding="UTF-8"?>
<Document-St-5>
<SDocument>
<ItemList>
<Items_A>
<ItemElem>
<Item_Values>
<ProdA>2</ProdA>
<ProdB>8</ProdB>
</Item_Values>
</ItemElem>
</Items_A>
<Items_B>
<ItemElem>
<Item_Values>
<ProdA>8</ProdA>
<ProdB>9</ProdB>
</Item_Values>
</ItemElem>
</Items_B>
</ItemList>
</SDocument>
</Document-St-5>
Can somebody point me to the correct way please?
Update:
What I actually want is to store, in an array, the XPath of every unique child element of the SDocument node, and to group the values of those that occur more than once. If possible, I'd like to get the unique XPaths without knowing the names of the children in advance.
For example:
The child elements StName and StCode occur only once each, so the array of XPaths so far would be:
arr_Xpath = [ ["/Document-St-5/SDocument/StName"], ["/Document-St-5/SDocument/StCode"], ... ]
The ProdA nodes that are children of Items_A have the following XPath:
/Document-St-5/SDocument/ItemList/Items_A/ItemElem/Item_Values/ProdA
The ProdA nodes that are children of Items_B have the following XPath:
/Document-St-5/SDocument/ItemList/Items_B/ItemElem/Item_Values/ProdA
Then the array of unique XPaths of the child elements would be (including the XPaths of the ProdB nodes):
arr_Xpath = [ "/Document-St-5/SDocument/StName",
"/Document-St-5/SDocument/StCode",
"/Document-St-5/SDocument/ItemList/Items_A/ItemElem/Item_Values/ProdA",
"/Document-St-5/SDocument/ItemList/Items_A/ItemElem/Item_Values/ProdB",
"/Document-St-5/SDocument/ItemList/Items_B/ItemElem/Item_Values/ProdA",
"/Document-St-5/SDocument/ItemList/Items_B/ItemElem/Item_Values/ProdB" ]
I think that, once the unique XPaths are known, it would be possible to use doc.xpath("..") to get the values of each child element and group them when there is more than one occurrence. So, the final array I'd like to get is:
arr_Values = [ ["WERLJ01"], ["MEKLD"],["2","9"],["8","3"],["1"],["17"]]
Where:
arr_Values[0] is the array that contains StName values
arr_Values[1] is the array that contains StCode values
arr_Values[2] is the array that contains the values of all the ProdA nodes under Items_A.
arr_Values[3] is the array that contains the values of all the ProdB nodes under Items_A.
arr_Values[4] is the array that contains the values of all the ProdA nodes under Items_B.
arr_Values[5] is the array that contains the values of all the ProdB nodes under Items_B.
An XML example is:
<?xml version="1.0" encoding="UTF-8"?>
<Document-St-5>
<SDocument>
<StName>WERLJ01</StName>
<StCode>MEKLD</StCode>
<ItemList>
<Items_A>
<ItemElem>
<Item_Values>
<ProdA>2</ProdA>
<ProdB>8</ProdB>
</Item_Values>
</ItemElem>
</Items_A>
<Items_A>
<ItemElem>
<Item_Values>
<ProdA>9</ProdA>
<ProdB>3</ProdB>
</Item_Values>
</ItemElem>
</Items_A>
<Items_B>
<ItemElem>
<Item_Values>
<ProdA>1</ProdA>
<ProdB>17</ProdB>
</Item_Values>
</ItemElem>
</Items_B>
</ItemList>
</SDocument>
</Document-St-5>
Update 2:
Hello the Tin Man, it works! What do "%w" and "%w[element1 element2]" mean? Does the %w[...] form accept more than 2 elements?
I'm a newbie to Nokogiri. I only mentioned XPath because the XML has more than 200 unique child nodes (unique XPaths). Do you suggest using the same technique with CSS for all child nodes, or is there a way to process the XML and do the same (group in an array the elements that have the same name and the same XPath) without knowing the names of the child nodes? I'd like to know which way you suggest.
Thanks again
Here's one way:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="UTF-8"?>
<Document-St-5>
<SDocument>
<ItemList>
<Items_A>
<ItemElem>
<Item_Values>
<ProdA>2</ProdA>
<ProdB>8</ProdB>
</Item_Values>
</ItemElem>
</Items_A>
<Items_B>
<ItemElem>
<Item_Values>
<ProdA>8</ProdA>
<ProdB>9</ProdB>
</Item_Values>
</ItemElem>
</Items_B>
</ItemList>
</SDocument>
</Document-St-5>
EOT
data = doc.search('SDocument').map{ |node|
%w[ProdA ProdB].map{ |n| node.search(n).map(&:text) }
}
data # => [[["2", "8"], ["8", "9"]]]
It results in a bit deeper nesting than you want but it's close.
A little different way, perhaps more easily understood, is:
data = doc.search('SDocument').map{ |node|
%w[A B].map{ |ab|
node.at("Items_#{ ab }").search('ProdA, ProdB').map(&:text)
}
}
The reason the nesting is one-level deeper than you specified is, I'm assuming there will be multiple <SDocument> tags in the XML. If there won't be, then the code can be modified a bit to return the array as you're asking:
data = doc.search('Items_A, Items_B').map{ |node|
node.search('ProdA, ProdB').map(&:text)
}
data # => [["2", "8"], ["8", "9"]]
Notice I'm using CSS selectors, to make it easy to specify I want the code to look at two different nodes, both for Items_A and Items_B, and ProdA and ProdB.
Update after the question completely changed:
Here's the set-up:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="UTF-8"?>
<Document-St-5>
<SDocument>
<StName>WERLJ01</StName>
<StCode>MEKLD</StCode>
<ItemList>
<Items_A>
<ItemElem>
<Item_Values>
<ProdA>2</ProdA>
<ProdB>8</ProdB>
</Item_Values>
</ItemElem>
</Items_A>
<Items_A>
<ItemElem>
<Item_Values>
<ProdA>9</ProdA>
<ProdB>3</ProdB>
</Item_Values>
</ItemElem>
</Items_A>
<Items_B>
<ItemElem>
<Item_Values>
<ProdA>1</ProdA>
<ProdB>17</ProdB>
</Item_Values>
</ItemElem>
</Items_B>
</ItemList>
</SDocument>
</Document-St-5>
EOT
Here's the code:
data = %w[StName StCode].map{ |n| [doc.at(n).text] }
%w[ProdA ProdB].each do |prod|
data << doc.search('Items_A').map{ |item| item.at(prod).text }
end
%w[ProdA ProdB].each do |prod|
data << [doc.at("Items_B #{prod}").text]
end
Here's what was captured:
data # => [["WERLJ01"], ["MEKLD"], ["2", "9"], ["8", "3"], ["1"], ["17"]]
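For the follow-up about doing this without knowing the child names, one possible approach is to walk the tree and group leaf values by their path. The sketch below is an assumption-laden illustration rather than the answer's method: it uses REXML instead of Nokogiri so it has no external dependency, `collect_leaves` and `group_values_by_path` are hypothetical helpers, and the "XPath" keys are plain name paths without positional predicates.

```ruby
require "rexml/document"

# Same data as the question's second sample, compacted onto fewer lines.
SDOC_XML = <<~XML
  <Document-St-5>
    <SDocument>
      <StName>WERLJ01</StName>
      <StCode>MEKLD</StCode>
      <ItemList>
        <Items_A><ItemElem><Item_Values><ProdA>2</ProdA><ProdB>8</ProdB></Item_Values></ItemElem></Items_A>
        <Items_A><ItemElem><Item_Values><ProdA>9</ProdA><ProdB>3</ProdB></Item_Values></ItemElem></Items_A>
        <Items_B><ItemElem><Item_Values><ProdA>1</ProdA><ProdB>17</ProdB></Item_Values></ItemElem></Items_B>
      </ItemList>
    </SDocument>
  </Document-St-5>
XML

# Walks the tree depth-first. Every element without child elements is a leaf;
# its text is appended to the array stored under its name path.
def collect_leaves(element, parent_path, groups)
  path = "#{parent_path}/#{element.name}"
  if element.has_elements?
    element.each_element { |child| collect_leaves(child, path, groups) }
  else
    (groups[path] ||= []) << element.text
  end
  groups
end

def group_values_by_path(xml)
  collect_leaves(REXML::Document.new(xml).root, "", {})
end

groups = group_values_by_path(SDOC_XML)
arr_xpath  = groups.keys    # unique paths, in first-seen order
arr_values = groups.values  # values grouped per path
p arr_xpath
p arr_values
```

Because a Ruby Hash keeps insertion order, the keys and values line up with the order in which each path first appears in the document.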

Unable to findnodes() restricted just to current parent

I'm parsing a simple XML file to create a flat text file from it. The desired outcome is shown below the sample XML. The XML has sort of a header-detail structure (Assembly_Info and Part respectively), with a unique header node followed by any number of detail record nodes, all of which are siblings. After digging into the elements under the header, I can't then find a way back 'up' to then pick up all the sibling detail nodes.
XML file looks like this:
<?xml version="1.0" standalone="yes" ?>
<Wrapper>
<Record>
<Product>
<prodid>4094</prodid>
</Product>
<Assembly>
<Assembly_Info>
<id>DF-7A</id>
<interface>C</interface>
</Assembly_Info>
<Part>
<status>N/A</status>
<dev_name>0000</dev_name>
</Part>
<Part>
<status>Ready</status>
<dev_name>0455</dev_name>
</Part>
<Part>
<status>Ready</status>
<dev_name>045A</dev_name>
</Part>
</Assembly>
<Assembly>
<Assembly_Info>
<id>DF-7A</id>
<interface>C</interface>
</Assembly_Info>
<Part>
<status>N/A</status>
<dev_name>0002</dev_name>
</Part>
<Part>
<status>Ready</status>
<dev_name>0457</dev_name>
</Part>
</Assembly>
</Record>
</Wrapper>
For each Assembly I need to read the values of the two elements in Assembly_Info, which I do successfully. But I then want to read each of the Part records associated with that Assembly. The objective is to 'flatten' the file into this:
prodid id interface status dev_name
4094 DF-7A C N/A 0000
4094 DF-7A C Ready 0455
4094 DF-7A C Ready 045A
4094 DF-7A C N/A 0002
4094 DF-7A C Ready 0457
I'm attempting to use findnodes() to do this, as that's about the only tool I thought I understood. My code unfortunately reads all of the Part records from the entire file foreach Assembly--since the only way I've been able to find the Part nodes is to start at the root. I don't know how to change 'where I am', if you will; to tell findnodes to begin at current parent. Code looks like this:
my $parser = XML::LibXML -> new();
my $tree = $parser -> parse_file ('DEMO.XML');
for my $product ($tree->findnodes ('/Wrapper/Record/Product/prodid')) {
$prodid = $product->textContent();
}
foreach my $assembly ($tree->findnodes ('/Wrapper/Record/Assembly')){
$assemblies++;
$parts = 0;
for my $assembly ($tree->findnodes ('/Wrapper/Record/Assembly/Assembly_Info')) {
$id = $assembly->findvalue('id');
$interface = $assembly->findvalue('interface');
}
foreach my $part ($tree->findnodes ('/Wrapper/Record/Assembly/Part')) {
$parts++;
$status = $part->findvalue('status');
$dev_name = $part->findvalue('dev_name');
}
print "Assembly No: ", $assemblies, " Parts: ",$parts, "\n";
}
How do I get just the Part nodes for a given Assembly, after I've gone down to the Assembly_Info depths? There is quite a bit I'm not getting, and I think a problem may be that I'm thinking of this as 'navigating' or moving a cursor, if you will. Examples of XPath path expressions have not helped me.
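Instead of always using $tree as the starting point for findnodes, you can call it on any other node, including the nodes returned by an earlier findnodes call. That node becomes the context node, so a relative XPath expression searches only beneath it. For example:
for my $record ($tree->findnodes('/Wrapper/Record')) {
    my $prodid = $record->findvalue('./Product/prodid');
    for my $assembly ($record->findnodes('./Assembly')) {
        my $id        = $assembly->findvalue('./Assembly_Info/id');
        my $interface = $assembly->findvalue('./Assembly_Info/interface');
        for my $part ($assembly->findnodes('./Part')) {
            # Only the Part children of the current Assembly are visited here.
            print join(' ', $prodid, $id, $interface,
                $part->findvalue('./status'), $part->findvalue('./dev_name')), "\n";
        }
    }
}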
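The same idea (query relative to the node you already hold, not the document root) carries over to other XML libraries. As a hedged illustration only, here is how the flattening might look in Ruby with REXML; `flatten_parts` is a hypothetical helper and the XML is a shortened version of the question's sample.

```ruby
require "rexml/document"

# Shortened version of the question's sample: two assemblies, three parts.
WRAPPER_XML = <<~XML
  <Wrapper>
    <Record>
      <Product><prodid>4094</prodid></Product>
      <Assembly>
        <Assembly_Info><id>DF-7A</id><interface>C</interface></Assembly_Info>
        <Part><status>N/A</status><dev_name>0000</dev_name></Part>
        <Part><status>Ready</status><dev_name>0455</dev_name></Part>
      </Assembly>
      <Assembly>
        <Assembly_Info><id>DF-7A</id><interface>C</interface></Assembly_Info>
        <Part><status>N/A</status><dev_name>0002</dev_name></Part>
      </Assembly>
    </Record>
  </Wrapper>
XML

# Returns one row per Part: [prodid, id, interface, status, dev_name].
def flatten_parts(xml)
  doc = REXML::Document.new(xml)
  rows = []
  doc.elements.each("/Wrapper/Record") do |record|
    prodid = record.elements["Product/prodid"].text
    # Relative path: only the Assembly children of this record.
    record.elements.each("Assembly") do |assembly|
      id = assembly.elements["Assembly_Info/id"].text
      interface = assembly.elements["Assembly_Info/interface"].text
      # Relative path again: only the Part children of this assembly.
      assembly.elements.each("Part") do |part|
        rows << [prodid, id, interface,
                 part.elements["status"].text, part.elements["dev_name"].text]
      end
    end
  end
  rows
end

flatten_parts(WRAPPER_XML).each { |row| puts row.join(" ") }
```

Each inner loop starts from the element yielded by the outer loop, which is what keeps the parts of one assembly from leaking into another.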

REXML parsing an XML in ruby

Folks,
I am using REXML for a sample XML file:
<Accounts title="This is the test title">
<Account name="frenchcustomer">
<username name = "frencu"/>
<password pw = "hello34"/>
<accountdn dn = "https://frenchcu.com/"/>
<exporttest name="basic">
<exportname name = "basicexport"/>
<exportterm term = "oldschool"/>
</exporttest>
</Account>
<Account name="britishcustomer">
<username name = "britishcu"/>
<password pw = "mellow34"/>
<accountdn dn = "https://britishcu.com/"/>
<exporttest name="existingsearch">
<exportname name = "largexpo"/>
<exportterm term = "greatschool"/>
</exporttest>
</Account>
</Accounts>
I am reading the XML like this:
@data = (REXML::Document.new file).root
@dataarr = @@testdata.elements.to_a("//Account")
Now I want to get the username of the frenchcustomer, so I tried this:
@dataarr[@name=fenchcustomer].elements["username"].attributes["name"]
This fails. I do not want to use the array index; for example,
@dataarr[1].elements["username"].attributes["name"]
will work, but I don't want to do that. Is there something I'm missing here? I want to use the array and get the username of the French user by the Account name.
Thanks a lot.
I recommend using XPath.
For the first match, you can use the first method; to get an array of all matches, use match.
The code below returns the username for the Account "frenchcustomer":
REXML::XPath.first(yourREXMLDocument, "//Account[@name='frenchcustomer']/username/@name").value
If you really want to use the array created with @@testdata.elements.to_a("//Account"), you could use the find method:
french_cust_elt = the_array.find { |elt| elt.attributes['name'].eql?('frenchcustomer') }
french_username = french_cust_elt.elements["username"].attributes["name"]
puts @data.elements["//Account[@name='frenchcustomer']"]
  .elements["username"]
  .attributes["name"]
If you want to iterate over multiple identical names:
@data.elements.each("//Account[@name='frenchcustomer']") do |fc|
  puts fc.elements["username"].attributes["name"]
end
I don't know what your @@testdata is, so I tried with the following test code:
require "rexml/document"
@data = (REXML::Document.new DATA).root
@dataarr = @data.elements.to_a("//Account")
# Works
p @dataarr[1].elements["username"].attributes["name"]
# Does not work
#~ p @dataarr[@name='fenchcustomer'].elements["username"].attributes["name"]
# @dataarr is an array
puts "===Array#each"
@dataarr.each{|acc|
  next unless acc.attributes['name'] =='frenchcustomer'
  p acc.elements["username"].attributes["name"]
}
puts "===XPATH"
@data.elements.to_a("//Account[@name='frenchcustomer']").each{|acc|
  p acc.elements["username"].attributes["name"]
}
__END__
<Accounts title="This is the test title">
<Account name="frenchcustomer">
<username name = "frencu"/>
<password pw = "hello34"/>
<accountdn dn = "https://frenchcu.com/"/>
<exporttest name="basic">
<exportname name = "basicexport"/>
<exportterm term = "oldschool"/>
</exporttest>
</Account>
<Account name="britishcustomer">
<username name = "britishcu"/>
<password pw = "mellow34"/>
<accountdn dn = "https://britishcu.com/"/>
<exporttest name="existingsearch">
<exportname name = "largexpo"/>
<exportterm term = "greatschool"/>
</exporttest>
</Account>
</Accounts>
I'm not very familiar with REXML, so I expect there is a better solution. But perhaps somebody can use my code to build a better one.

Want to skip a tag and get by index

Given this XML:
<mets:mets>
<mets:fileSec>
<mets:fileGrp ID="fileGrp001" USE="image/dynamic">
<mets:file ID="filebib4112678_18760203_1_24_0001_m.jp2" MIMETYPE="image/jp2" SIZE="5308416"
CREATED="2009-11-10T00:00:00" USE="image/dynamic" ADMID="techMD001"
CHECKSUM="c07f516d77d8a5ca452775d489ffe78c" CHECKSUMTYPE="MD5">
<mets:FLocat LOCTYPE="URL" xlink:type="simple"
xlink:href="file:bib4112678_18760203_1_24_0001_m.jp2"/>
</mets:file>
<mets:file ID="filebib4112678_18760203_1_24_0002_m.jp2" MIMETYPE="image/jp2" SIZE="5308416"
CREATED="2009-11-10T00:00:00" USE="image/dynamic" ADMID="techMD002"
CHECKSUM="6497ceb7a8477fbe9ba4ff9e6e57999f" CHECKSUMTYPE="MD5">
<mets:FLocat LOCTYPE="URL" xlink:type="simple"
xlink:href="file:bib4112678_18760203_1_24_0002_m.jp2"/>
</mets:file>
</mets:fileGrp>
<mets:fileGrp ID="fileGrp002" USE="text/alto">
<mets:file ID="filebib4112678_18760203_1_24_0001_alto.xml" MIMETYPE="text/xml" SIZE="1114112"
CREATED="2009-11-10T00:00:00" USE="text/alto" ADMID="techMD005"
CHECKSUM="e391852693f78d2eb024caf6dbdb97c6" CHECKSUMTYPE="MD5">
<mets:FLocat LOCTYPE="URL" xlink:type="simple"
xlink:href="file:bib4112678_18760203_1_24_0001_alto.xml"/>
</mets:file>
<mets:file ID="filebib4112678_18760203_1_24_0002_alto.xml" MIMETYPE="text/xml" SIZE="1114112"
CREATED="2009-11-10T00:00:00" USE="text/alto" ADMID="techMD006"
CHECKSUM="e391852693f78d2eb024caf6dbdb97c6" CHECKSUMTYPE="MD5">
<mets:FLocat LOCTYPE="URL" xlink:type="simple"
xlink:href="file:bib4112678_18760203_1_24_0002_alto.xml"/>
</mets:file>
</mets:fileGrp>
</mets:fileSec>
</mets:mets>
This expression:
/mets/fileSec/fileGrp[2]/file[2]/@ADMID
gives the result "techMD006"
However, I would like to get the same result using something like this expression/query:
/mets/fileSec//file[4]/@ADMID
That is, I don't want to bother with the fileGrp element, since it makes things more complicated. Unfortunately the expression above didn't work.
Does anyone know how to make such an expression?
thanx!
Your expression retrieves all file elements that are descendants of /mets/fileSec and are the fourth file child of their parent, because the [4] predicate binds more tightly than // and is evaluated per parent:
/mets/fileSec//file[4]/@ADMID
But you have no such elements. What you want is to retrieve all file elements that are descendants of /mets/fileSec and then take the fourth one overall. Use this:
(/mets/fileSec//file)[4]/@ADMID
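The "collect everything first, then index" idea can also be spelled out in a host language. Below is a small Ruby sketch with REXML under stated assumptions: the sample is reduced to the ADMID attributes, the mets: and xlink: prefixes are dropped because the question's snippet does not declare them, and `nth_file_admid` is a hypothetical helper. It does the indexing in Ruby rather than inside the XPath expression.

```ruby
require "rexml/document"

# Reduced, prefix-free version of the METS sample from the question.
METS_XML = <<~XML
  <mets>
    <fileSec>
      <fileGrp ID="fileGrp001">
        <file ADMID="techMD001"/>
        <file ADMID="techMD002"/>
      </fileGrp>
      <fileGrp ID="fileGrp002">
        <file ADMID="techMD005"/>
        <file ADMID="techMD006"/>
      </fileGrp>
    </fileSec>
  </mets>
XML

# Returns the ADMID of the n-th file element (1-based, like XPath positions)
# among all file descendants of /mets/fileSec, or nil if there are fewer.
def nth_file_admid(xml, n)
  doc = REXML::Document.new(xml)
  # First gather every file element under fileSec, ignoring fileGrp boundaries...
  files = REXML::XPath.match(doc, "/mets/fileSec//file")
  # ...then pick by overall position, which is what (…)[n] does in XPath.
  file = files[n - 1]
  file && file.attributes["ADMID"]
end

puts nth_file_admid(METS_XML, 4)
```

Indexing the flattened list is the counterpart of wrapping the path in parentheses before applying the positional predicate.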

Resources