Checking multiple values (and) in a node in XPath

Hello, I have found many XPath examples, but unfortunately I can't get them to do what I want. :(
I need to check an XML script, and I want to raise an alert if two sub-elements are both correct.
Here is part of my XML file:
<node componentName="printinput" componentVersion="0.102" offsetLabelX="0" >
<elementParameter field="TEXT" name="UNIQUE_NAME" value="name1" show="false"/>
<elementParameter field="CHECK" name="TCK_HELP" value="true"/>
<elementParameter field="TEXT" name="CO_ON" value="10000" show="false"/>
</node>
I would like to check whether TCK_HELP=true AND CO_ON=10000. With or it works fine, but I don't know how to do this with and. I understand why it is not working, but I don't know how to fix it. Thanks a lot for your help.
One of my tries:
/*[local-name() = 'ProcessType']
/*[local-name() = 'node']
[
@componentName='printinput'
]
/*[local-name() = 'elementParameter']
[@name='TCK_HELP' and @value!='true'
and
@name='CO_ON' and @value='10000'
]

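What about:
//node[@componentName="printinput"][elementParameter[@name="TCK_HELP"][@value="true"]][elementParameter[@name="CO_ON"][@value="10000"]]
Each elementParameter[...] predicate is tested separately against the children of node, so the two conditions can be met by two different elementParameter elements. In your attempt, all four tests apply to one and the same elementParameter, and that element can never have a name equal to both TCK_HELP and CO_ON.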
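The same idea can be sketched in Python with the standard library's xml.etree.ElementTree. ElementTree supports only a subset of XPath: it handles attribute predicates, but it has no and and no nested predicates. This is a minimal sketch under two assumptions. First, the sample is wrapped in a hypothetical <ProcessType> root, taken from the path in the question. Second, the sample has no namespace, although the use of local-name() suggests the real file may have one. The helper names has_param and matching_nodes are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample wrapped in a hypothetical <ProcessType> root, without a namespace.
XML = """<ProcessType>
  <node componentName="printinput" componentVersion="0.102" offsetLabelX="0">
    <elementParameter field="TEXT" name="UNIQUE_NAME" value="name1" show="false"/>
    <elementParameter field="CHECK" name="TCK_HELP" value="true"/>
    <elementParameter field="TEXT" name="CO_ON" value="10000" show="false"/>
  </node>
</ProcessType>"""


def has_param(node, name, value):
    """Like elementParameter[@name=NAME][@value=VALUE]: one child must match both."""
    path = f"elementParameter[@name='{name}'][@value='{value}']"
    return node.find(path) is not None


def matching_nodes(root):
    """Like //node[@componentName='printinput'][TCK_HELP test][CO_ON test].

    Each has_param call is a separate test, so the two conditions may be
    satisfied by two different elementParameter children.
    """
    return [
        n for n in root.iter("node")
        if n.get("componentName") == "printinput"
        and has_param(n, "TCK_HELP", "true")
        and has_param(n, "CO_ON", "10000")
    ]


root = ET.fromstring(XML)
print(len(matching_nodes(root)))  # 1
```

Changing either value, for example CO_ON to 5, makes the list empty. This mirrors the XPath answer, where both element predicates must hold.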


Is it possible to extract the text of an element by comparing it with the end of an attribute value of the current node?

I have XML like this:
<Values>
<Value ID="Contents01" Name="Contents 01" QualifierID="en-US">
<Text>It is a test [{placeHolder01}]</Text>
</Value>
<Value ID="VarPlaceHolderValue01" Name="Var Place Holder 01" QualifierID="en-US">[{placeHolder01}]</Value>
<Value ID="Contents02" Name="Contents 02" QualifierID="en-US">
<Text>Some extra text.</Text></Value>
<Value ID="PlaceHolder01" Name="PlaceHolder 01" QualifierID="en-US">
<Text>For StackOverflow</Text>
</Value>
</Values>
Would it be possible to get, with an expression, the QualifierID of VarPlaceHolderValue01 when the currently selected node is PlaceHolder01?
So the idea would be something like this, starting from an already selected node:
//Values/Value[starts-with(@ID,'Var') and substring(./@ID, string-length(./@ID) - 2) = substring(@ID, string-length(@ID) - 2)]/text()
However, I get a syntax error from the XPath checkers. How should it be written correctly?
Is it possible to do this with XPath alone? The idea is to extract the text of the element VarPlaceHolderValue01, knowing that its ID starts with Var and ends with the same number as the ID of the currently selected node.
Trying it out in IPython (root is the <Values> element parsed with lxml, whose elements provide the xpath() method):
First, select the node:
In [11]: root.xpath('//Value[starts-with(@ID, "PlaceHolder")]')
Out[11]: [<Element Value at 0x1094a1a00>]
Next, isolate the string to be matched:
In [13]: root.xpath('substring-after(//Value[starts-with(@ID, "PlaceHolder")]/@ID, "PlaceHolder")')
Out[13]: '01'
Next, match the "Var"-prefixed element and extract its text:
In [18]: root.xpath('string(//Value[starts-with(@ID, "Var") and contains(@ID, substring-after(//Value[starts-with(@ID, "PlaceHolder")]/@ID, "PlaceHolder"))])')
Out[18]: '[{placeHolder01}]'
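A note on the original attempt: inside a predicate, ./@ID and @ID both refer to the Value being tested, so that expression compares each ID with itself. The answer avoids this by re-selecting the PlaceHolder node with an absolute path. Also note that contains() matches the suffix anywhere in the ID, not only at the end.

The sketch below does the "starts with Var, ends with the same number" match in plain Python, on top of the standard library's xml.etree.ElementTree, which has no starts-with() or substring-after(). The function name var_value_for and the fixed prefix "PlaceHolder" are illustrative assumptions. The function returns the element itself, so both its text and its QualifierID are available.

```python
import xml.etree.ElementTree as ET

XML = """<Values>
  <Value ID="Contents01" Name="Contents 01" QualifierID="en-US">
    <Text>It is a test [{placeHolder01}]</Text>
  </Value>
  <Value ID="VarPlaceHolderValue01" Name="Var Place Holder 01" QualifierID="en-US">[{placeHolder01}]</Value>
  <Value ID="Contents02" Name="Contents 02" QualifierID="en-US">
    <Text>Some extra text.</Text></Value>
  <Value ID="PlaceHolder01" Name="PlaceHolder 01" QualifierID="en-US">
    <Text>For StackOverflow</Text>
  </Value>
</Values>"""


def var_value_for(root, current, prefix="PlaceHolder"):
    """Return the Var* Value element whose ID ends with the same suffix
    that follows `prefix` in the current node's ID, or None."""
    current_id = current.get("ID", "")
    if not current_id.startswith(prefix):
        return None
    suffix = current_id[len(prefix):]  # like substring-after(@ID, 'PlaceHolder')
    if not suffix:
        return None
    for value in root.findall("Value"):
        vid = value.get("ID", "")
        if vid.startswith("Var") and vid.endswith(suffix):  # ends-with, not contains
            return value
    return None


root = ET.fromstring(XML)
current = root.find("Value[@ID='PlaceHolder01']")  # the "currently selected" node
match = var_value_for(root, current)
print(match.text, match.get("QualifierID"))  # [{placeHolder01}] en-US
```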

Trying to parse specific data using XPath

I have a small XML file from which I'm trying to grab the away_team goals first and the home_team goals second.
/game/team/statistics/@goals gives me the data I want, but I need to reverse the order. So I'm trying to understand how to get the away_team goals first, followed by the home_team goals.
Below is the file:
<game id="f24275a9-4f30-4a81-abdf-d16a9aeda087" status="closed" coverage="full" home_team="4416d559-0f24-11e2-8525-18a905767e44" away_team="44167db4-0f24-11e2-8525-18a905767e44" scheduled="2013-10-10T23:00:00+00:00" attendance="18210" start_time="2013-10-10T23:08:00+00:00" end_time="2013-10-11T01:32:00+00:00" clock="00:00" period="3" xmlns="http://feed.elasticstats.com/schema/hockey/game-v2.0.xsd">
<venue id="bd7b42fa-19bb-4b91-8615-214ccc3ff987" name="First Niagara Center" capacity="18690" address="One Seymour H. Knox III Plaza" city="Buffalo" state="NY" zip="14203" country="USA"/>
<team name="Sabres" market="Buffalo" id="4416d559-0f24-11e2-8525-18a905767e44" points="1">
<scoring>
<period number="1" sequence="1" points="1"/>
<period number="2" sequence="2" points="0"/>
<period number="3" sequence="3" points="0"/>
</scoring>
<statistics goals="1" assists="2" penalties="7" penalty_minutes="23" team_penalties="0" team_penalty_minutes="0" shots="27" blocked_att="14" missed_shots="8" hits="25" giveaways="5" takeaways="10" blocked_shots="7" faceoffs_won="22" faceoffs_lost="28" powerplays="1" faceoffs="50" faceoff_win_pct="44.0" shooting_pct="3.7" points="3">
<powerplay faceoffs_won="2" faceoffs_lost="0" shots="0" goals="0" missed_shots="1" assists="0" faceoff_win_pct="100.0" faceoffs="2"/>
<shorthanded faceoffs_won="3" faceoffs_lost="3" shots="1" goals="0" missed_shots="0" assists="0" faceoffs="6" faceoff_win_pct="50.0"/>
<evenstrength faceoff_win_pct="40.5" missed_shots="7" goals="1" faceoffs_won="17" shots="26" faceoffs="42" faceoffs_lost="25" assists="2"/>
<penalty shots="0" goals="0" missed_shots="0"/>
</statistics>
<shootout shots="0" missed_shots="0" goals="0" shots_against="0" goals_against="0" saves="0" saves_pct="0"/>
<goaltending shots_against="33" goals_against="4" saves="29" saves_pct="0.879" total_shots_against="33" total_goals_against="4">
<powerplay shots_against="0" goals_against="0" saves="0" saves_pct="0"/>
<shorthanded shots_against="7" goals_against="0" saves="7" saves_pct="1.0"/>
<evenstrength goals_against="4" saves_pct="0.846" shots_against="26" saves="22"/>
<penalty shots_against="0" goals_against="0" saves="0" saves_pct="0"/>
<emptynet goals_against="0" shots_against="0">
<powerplay goals_against="0"/>
<shorthanded goals_against="0"/>
<evenstrength goals_against="0"/>
</emptynet>
</goaltending>
Here's an XPath 2.0 expression that should do what you asked. It yields a sequence of two attribute nodes, with the away team first:
(/game/team[@id = /game/@away_team]/statistics/@goals,
/game/team[@id = /game/@home_team]/statistics/@goals)
Credit to @Ian for sleuthing out the details of the question.
In XPath 1.0, you could concatenate string data from the two teams in whatever order you want:
concat(/game/team[@id = /game/@away_team]/statistics/@goals, ' ',
/game/team[@id = /game/@home_team]/statistics/@goals)
But as Ian said, you can't produce a nodeset with an order different from document order. (I don't think a nodeset has any intrinsic order at all... it's how it's processed that imposes an order.)
Update:
As Ian pointed out, your XML data is in a namespace, thanks to the default namespace declaration on <game>. Since you said that "/game/team/statistics/@goals gives me the data", I'm assuming that you've already taken care of this aspect of the problem, perhaps by declaring the default namespace in your XPath execution environment.
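Here is a minimal Python sketch of the same lookup, using the standard library's xml.etree.ElementTree. The default namespace is mapped to an arbitrary prefix, h. The document is hypothetical: the team ids and goal counts are made up, because the file in the question is cut off after the first team. The helper names goals_for and away_then_home are also illustrative. The result is a plain Python list, so its order is whatever the code builds, not document order.

```python
import xml.etree.ElementTree as ET

# Default namespace from the question, mapped to an arbitrary prefix "h".
NS = {"h": "http://feed.elasticstats.com/schema/hockey/game-v2.0.xsd"}

# Hypothetical, trimmed-down document: ids and goal counts are made up.
XML = """<game home_team="H1" away_team="A1"
      xmlns="http://feed.elasticstats.com/schema/hockey/game-v2.0.xsd">
  <team name="Home" id="H1"><statistics goals="1"/></team>
  <team name="Away" id="A1"><statistics goals="3"/></team>
</game>"""


def goals_for(game, team_id):
    """Like /game/team[@id = TEAM_ID]/statistics/@goals; None if absent."""
    stats = game.find(f"h:team[@id='{team_id}']/h:statistics", NS)
    return None if stats is None else stats.get("goals")


def away_then_home(game):
    """Away goals first, then home goals, regardless of document order."""
    return [goals_for(game, game.get("away_team")),
            goals_for(game, game.get("home_team"))]


game = ET.fromstring(XML)
print(away_then_home(game))  # ['3', '1']
```

The namespace matters here as well. Without the h: prefix, game.find("team") finds nothing.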

Can't address XML attribute through XPath in Ruby (using Nokogiri)

I'm trying to filter an XML file to get nodes with a certain attribute. I can successfully filter by node (e.g. /top_manager), but when I try //top_manager[@salary='great'] I get nothing.
<?xml version= "1.0"?>
<employee xmlns="http://www.w3schools.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="employee.xsd">
<top_manager>
<ceo salary="great" respect="enormous" type="extra">
<fname>
Vasya
</fname>
<lname>
Pypkin
</lname>
<hire_date>
19
</hire_date>
<descr>
Big boss
</descr>
</ceo>
<cio salary="big" respect="great" type="intro">
<fname>
Petr
</fname>
<lname>
Pypkin
</lname>
<hire_date>
25
</hire_date>
<descr>
Resposible for information security
</descr>
</cio>
</top_manager>
......
How should I correct this code to get what I need?
require 'nokogiri'
f = File.open("employee.xml")
doc = Nokogiri::XML(f)
doc.xpath("//top_manager[@salary='great']").each do |node|
puts node.text
end
thank you.
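That's because salary is not an attribute of the <top_manager> element. It is an attribute of <top_manager>'s child elements. Also, the document declares a default namespace (xmlns="http://www.w3schools.com"), which Nokogiri registers under the prefix xmlns, so element names need that prefix:
//xmlns:top_manager[*[@salary='great']]
The XPath above selects the <top_manager> element if any of its child elements has a salary attribute equal to "great". If you meant to select the children themselves instead (the <ceo> element in this case):
//xmlns:top_manager/*[@salary='great']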
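For comparison, here is a minimal sketch of both expressions using Python's xml.etree.ElementTree rather than Nokogiri, run against a trimmed copy of employee.xml. It shows the namespace effect too. ElementTree needs an explicit prefix mapping (here the arbitrary prefix w), much as Nokogiri needs xmlns:. ElementTree's XPath subset has no nested predicates, so the first expression is emulated with a Python filter.

```python
import xml.etree.ElementTree as ET

NS = {"w": "http://www.w3schools.com"}  # prefix name "w" is arbitrary

# Trimmed copy of employee.xml: same default namespace, fewer child elements.
XML = """<employee xmlns="http://www.w3schools.com">
  <top_manager>
    <ceo salary="great" respect="enormous" type="extra"><fname>Vasya</fname></ceo>
    <cio salary="big" respect="great" type="intro"><fname>Petr</fname></cio>
  </top_manager>
</employee>"""

root = ET.fromstring(XML)

# Unprefixed names do not match elements in the default namespace.
print(root.findall(".//top_manager"))  # []

# Like //xmlns:top_manager/*[@salary='great']: the child carrying the attribute.
great = root.findall(".//w:top_manager/*[@salary='great']", NS)
print([el.tag.split("}")[1] for el in great])  # ['ceo']

# Like //xmlns:top_manager[*[@salary='great']]: no nested predicates in
# ElementTree, so the child test is done in Python.
managers = [tm for tm in root.iterfind(".//w:top_manager", NS)
            if tm.find("*[@salary='great']") is not None]
print(len(managers))  # 1
```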

Unable to restrict findnodes() to the current parent

I'm parsing a simple XML file to create a flat text file from it. The desired outcome is shown below the sample XML. The XML has sort of a header-detail structure (Assembly_Info and Part respectively), with a unique header node followed by any number of detail record nodes, all of which are siblings. After digging into the elements under the header, I can't then find a way back 'up' to then pick up all the sibling detail nodes.
XML file looks like this:
<?xml version="1.0" standalone="yes" ?>
<Wrapper>
<Record>
<Product>
<prodid>4094</prodid>
</Product>
<Assembly>
<Assembly_Info>
<id>DF-7A</id>
<interface>C</interface>
</Assembly_Info>
<Part>
<status>N/A</status>
<dev_name>0000</dev_name>
</Part>
<Part>
<status>Ready</status>
<dev_name>0455</dev_name>
</Part>
<Part>
<status>Ready</status>
<dev_name>045A</dev_name>
</Part>
</Assembly>
<Assembly>
<Assembly_Info>
<id>DF-7A</id>
<interface>C</interface>
</Assembly_Info>
<Part>
<status>N/A</status>
<dev_name>0002</dev_name>
</Part>
<Part>
<status>Ready</status>
<dev_name>0457</dev_name>
</Part>
</Assembly>
</Record>
</Wrapper>
For each Assembly I need to read the values of the two elements in Assembly_Info, which I do successfully. But I then want to read each of the Part records that are associated with that Assembly. The objective is to 'flatten' the file into this:
prodid id interface status dev_name
4094 DF-7A C N/A 0000
4094 DF-7A C Ready 0455
4094 DF-7A C Ready 045A
4094 DF-7A C N/A 0002
4094 DF-7A C Ready 0457
I'm attempting to use findnodes() to do this, as that's about the only tool I thought I understood. Unfortunately, my code reads all of the Part records from the entire file for each Assembly, since the only way I've been able to find the Part nodes is to start at the root. I don't know how to change 'where I am', if you will, to tell findnodes to begin at the current parent. Code looks like this:
my $parser = XML::LibXML -> new();
my $tree = $parser -> parse_file ('DEMO.XML');
for my $product ($tree->findnodes ('/Wrapper/Record/Product/prodid')) {
$prodid = $product->textContent();
}
foreach my $assembly ($tree->findnodes ('/Wrapper/Record/Assembly')){
$assemblies++;
$parts = 0;
for my $assembly ($tree->findnodes ('/Wrapper/Record/Assembly/Assembly_Info')) {
$id = $assembly->findvalue('id');
$interface = $assembly->findvalue('interface');
}
foreach my $part ($tree->findnodes ('/Wrapper/Record/Assembly/Part')) {
$parts++;
$status = $part->findvalue('status');
$dev_name = $part->findvalue('dev_name');
}
print "Assembly No: ", $assemblies, " Parts: ",$parts, "\n";
}
How do I get just the Part nodes for a given Assembly, after I've gone down to the Assembly_Info depths? There is quite a bit I'm not getting, and I think a problem may be that I'm thinking of this as 'navigating' or moving a cursor, if you will. Examples of XPath path expressions have not helped me.
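Instead of always using $tree as the starting point for findnodes, you can call it on any other node, including child nodes. Then you can use a relative XPath expression, which is evaluated from that node. For example:
use XML::LibXML;
my $tree = XML::LibXML->new->parse_file('DEMO.XML');
for my $record ($tree->findnodes('/Wrapper/Record')) {
    my $prodid = $record->findvalue('./Product/prodid');    # relative to this Record
    for my $assembly ($record->findnodes('./Assembly')) {
        my $id        = $assembly->findvalue('./Assembly_Info/id');
        my $interface = $assembly->findvalue('./Assembly_Info/interface');
        for my $part ($assembly->findnodes('./Part')) {     # only this Assembly's Parts
            print join(' ', $prodid, $id, $interface, $part->findvalue('./status'), $part->findvalue('./dev_name')), "\n";
        }
    }
}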
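Here is the same relative-path approach, sketched in Python with the standard library's xml.etree.ElementTree instead of XML::LibXML. Each findall()/findtext() call is made on the current Record, Assembly, or Part element, so it only sees that element's children. The function name flatten is illustrative, and the sample is the DEMO.XML content from the question with its whitespace compacted.

```python
import xml.etree.ElementTree as ET

XML = """<Wrapper><Record>
  <Product><prodid>4094</prodid></Product>
  <Assembly>
    <Assembly_Info><id>DF-7A</id><interface>C</interface></Assembly_Info>
    <Part><status>N/A</status><dev_name>0000</dev_name></Part>
    <Part><status>Ready</status><dev_name>0455</dev_name></Part>
    <Part><status>Ready</status><dev_name>045A</dev_name></Part>
  </Assembly>
  <Assembly>
    <Assembly_Info><id>DF-7A</id><interface>C</interface></Assembly_Info>
    <Part><status>N/A</status><dev_name>0002</dev_name></Part>
    <Part><status>Ready</status><dev_name>0457</dev_name></Part>
  </Assembly>
</Record></Wrapper>"""


def flatten(root):
    """One row per Part: (prodid, id, interface, status, dev_name)."""
    rows = []
    for record in root.findall("Record"):            # relative to <Wrapper>
        prodid = record.findtext("Product/prodid")   # relative to this Record
        for assembly in record.findall("Assembly"):
            info = assembly.find("Assembly_Info")
            for part in assembly.findall("Part"):    # only this Assembly's Parts
                rows.append((prodid, info.findtext("id"), info.findtext("interface"),
                             part.findtext("status"), part.findtext("dev_name")))
    return rows


rows = flatten(ET.fromstring(XML))
for row in rows:
    print(" ".join(row))
```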

Want to skip a tag and get by index

Given this XML:
<mets:mets>
<mets:fileSec>
<mets:fileGrp ID="fileGrp001" USE="image/dynamic">
<mets:file ID="filebib4112678_18760203_1_24_0001_m.jp2" MIMETYPE="image/jp2" SIZE="5308416"
CREATED="2009-11-10T00:00:00" USE="image/dynamic" ADMID="techMD001"
CHECKSUM="c07f516d77d8a5ca452775d489ffe78c" CHECKSUMTYPE="MD5">
<mets:FLocat LOCTYPE="URL" xlink:type="simple"
xlink:href="file:bib4112678_18760203_1_24_0001_m.jp2"/>
</mets:file>
<mets:file ID="filebib4112678_18760203_1_24_0002_m.jp2" MIMETYPE="image/jp2" SIZE="5308416"
CREATED="2009-11-10T00:00:00" USE="image/dynamic" ADMID="techMD002"
CHECKSUM="6497ceb7a8477fbe9ba4ff9e6e57999f" CHECKSUMTYPE="MD5">
<mets:FLocat LOCTYPE="URL" xlink:type="simple"
xlink:href="file:bib4112678_18760203_1_24_0002_m.jp2"/>
</mets:file>
</mets:fileGrp>
<mets:fileGrp ID="fileGrp002" USE="text/alto">
<mets:file ID="filebib4112678_18760203_1_24_0001_alto.xml" MIMETYPE="text/xml" SIZE="1114112"
CREATED="2009-11-10T00:00:00" USE="text/alto" ADMID="techMD005"
CHECKSUM="e391852693f78d2eb024caf6dbdb97c6" CHECKSUMTYPE="MD5">
<mets:FLocat LOCTYPE="URL" xlink:type="simple"
xlink:href="file:bib4112678_18760203_1_24_0001_alto.xml"/>
</mets:file>
<mets:file ID="filebib4112678_18760203_1_24_0002_alto.xml" MIMETYPE="text/xml" SIZE="1114112"
CREATED="2009-11-10T00:00:00" USE="text/alto" ADMID="techMD006"
CHECKSUM="e391852693f78d2eb024caf6dbdb97c6" CHECKSUMTYPE="MD5">
<mets:FLocat LOCTYPE="URL" xlink:type="simple"
xlink:href="file:bib4112678_18760203_1_24_0002_alto.xml"/>
</mets:file>
</mets:fileGrp>
</mets:fileSec>
</mets:mets>
This expression:
/mets/fileSec/fileGrp[2]/file[2]/@ADMID
gives the result "techMD006".
However, I would like to get the same result using something like this expression/query:
/mets/fileSec//file[4]/@ADMID
I.e., I don't want to bother with the fileGrp element, since it makes things more complicated. Unfortunately, the expression above didn't work.
Does anyone know how to make such an expression?
Thanks!
Your expression retrieves all file elements that are descendants of /mets/fileSec and are the fourth file child of their parent:
/mets/fileSec//file[4]/@ADMID
But you have no such elements. The // is short for /descendant-or-self::node()/, so the [4] predicate applies to the child::file step, which means it is counted per parent. What you want is to retrieve all file elements that are descendants of /mets/fileSec and then take the fourth one. Use this:
(/mets/fileSec//file)[4]/@ADMID
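Here is a minimal Python sketch of the difference, using the standard library's xml.etree.ElementTree. Its positional predicate likewise counts same-named siblings under one parent. The sketch makes three assumptions. It adds the mets namespace declaration (http://www.loc.gov/METS/), which the snippet in the question omits. It trims attributes to ID and ADMID and shortens the IDs. It emulates the XPath parentheses by indexing the Python list, which is 0-based.

```python
import xml.etree.ElementTree as ET

# Assumed namespace declaration; attributes trimmed, IDs shortened.
NS = {"mets": "http://www.loc.gov/METS/"}
XML = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp ID="fileGrp001">
      <mets:file ID="f1" ADMID="techMD001"/>
      <mets:file ID="f2" ADMID="techMD002"/>
    </mets:fileGrp>
    <mets:fileGrp ID="fileGrp002">
      <mets:file ID="f3" ADMID="techMD005"/>
      <mets:file ID="f4" ADMID="techMD006"/>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

root = ET.fromstring(XML)  # root is <mets:mets>

# Like /mets/fileSec//file[4]: [4] is counted per parent, so nothing matches.
per_parent = root.findall("mets:fileSec//mets:file[4]", NS)

# Like (/mets/fileSec//file)[4]: collect all matches in document order, then index.
all_files = root.findall("mets:fileSec//mets:file", NS)
print(len(per_parent), all_files[3].get("ADMID"))  # 0 techMD006
```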
