How do I count the XML properties using xpath in ruby? - ruby

I have this XML:
<SPEECH>
<SPEAKER>ADAM</SPEAKER>
<LINE>Yonder comes my master, your brother.</LINE>
</SPEECH>
<SPEECH>
<SPEAKER>ORLANDO</SPEAKER>
<LINE>Go apart, Adam, and thou shalt hear how he will</LINE>
<LINE>shake me up.</LINE>
</SPEECH>
<STAGEDIR>Enter OLIVER</STAGEDIR>
<SPEECH>
<SPEAKER>ADAM</SPEAKER>
<LINE>Now, sir! what make you here?</LINE>
</SPEECH>
How do I count how many LINE elements a SPEAKER with the text ADAM has in total?
I tried something like this:
@source.xpath("//SPEAKER[//*[contains(text(), 'ADAM')]]//LINE")

I'm not familiar with Ruby, but the XPath to get all LINE elements from SPEAKER named "ADAM" would be:
//SPEECH[SPEAKER='ADAM']/LINE
or, if you want to use contains instead of an exact match for SPEAKER:
//SPEECH[contains(SPEAKER, 'ADAM')]/LINE
Brief explanation:
//SPEECH: find SPEECH elements anywhere in the document...
[contains(SPEAKER, 'ADAM')]: ...where its child element SPEAKER contains text 'ADAM'
/LINE: from such SPEECH elements, select child element LINE
A few problems in your attempted XPath:
//*[contains(text(), 'ADAM')] will match any element anywhere in the XML document whose text contains 'ADAM', not just elements within SPEAKER, because a path that starts with / is evaluated from the document root. You should, at least, add . at the beginning so that the path is relative to the current SPEAKER.
LINE is not a descendant of SPEAKER, so //SPEAKER[...]//LINE will not match any element in the XML above.

Related

Select XML Node by position

I have the following XML structure
<Root>
<BundleItem>
<Item>1</Item>
<Item>2</Item>
<Item>3</Item>
</BundleItem>
<Item>4</Item>
<Item>5</Item>
<Item>6</Item>
<BundleItem>
<Item>7</Item>
<Item>8</Item>
<Item>9</Item>
</BundleItem>
</Root>
And by providing the following XPath
//Item[1]
I am selecting
<Item>1</Item>
<Item>4</Item>
<Item>7</Item>
My goal is to select only <Item>1</Item> or <Item>7</Item>, regardless of the parent element in which they are found, depending only on the position that I provide in the XPath.
Is it possible to do that using only the position, without providing additional criteria in the XPath?
//Item[1] selects every <Item/> element that is the first <Item/> child of its parent, regardless of which parent that is.
To get the two items you are looking for you could use //Item[text() = 1 or text() = 7].
A good tutorial can be found at w3schools.com and you can play with XPath expressions over your XML input here. (I am not affiliated with either of these resources but find them useful.)

XPATH - get all <tag> text data except the last two <tag>

This is my HTML data
<book></book>
<book></book>
<book></book>
....
....
'N' books
I'd like to get all the text data inside all the <book></book> nodes except the last 2 <book> nodes.
Basically I need //book[1 to n-2]//text()
Is there an XPath query I can write for this?
One possible way to get all book elements within the same parent except the last two (in other words, excluding the last one and the one before it):
//book[position() < last()-1]

Using Nokogiri with multiple search elements

In this XML snippet I need to replace the data in the UID for some of the blocks. The actual file contains more than 100 similar blocks.
Although I have been able to extract subsets based on name="Track (TimeLine)", I am struggling to reduce this subset to the specific block I need by also using the data in <TrackID>. That is, if name="Track (TimeLine)" and the text of <TrackID> is 0x1200, then set UID to xxxx.
I am new to Nokogiri and, although I write test scripts, I do not consider myself a programmer.
<StructuralMetadata key="06.0E.2B.34.02.53.01.01.0D.01.01.01.01.01.3B.00" length="116" name="Track (TimeLine)">
<EditRate>25/1</EditRate>
<Origin>0</Origin>
<Sequence>32-04-25-67-E7-A7-86-4A-9B-28-53-6F-66-74-65-6C</Sequence>
<TrackID>0x1200</TrackID>
<TrackName>Softel VBI Data</TrackName>
<TrackNumber>0x17010101</TrackNumber>
<UID>34-C1-B9-B9-5F-07-A4-4E-8F-F4-53-6F-66-74-65-6C</UID>
</StructuralMetadata>
<StructuralMetadata key="06.0E.2B.34.02.53.01.01.0D.01.01.01.01.01.3B.00" length="116" name="Track (TimeLine)">
<EditRate>25/1</EditRate>
<Origin>0</Origin>
<Sequence>35-12-2D-86-E6-74-0B-4C-B4-24-53-6F-66-74-65-6C</Sequence>
<TrackID>0x1300</TrackID>
<TrackName>Softel VBI Data</TrackName>
<TrackNumber>0x0</TrackNumber>
<UID>37-0C-80-34-4C-8D-CE-41-85-F3-53-6F-66-74-65-6C</UID>
</StructuralMetadata>
Using xpath:
//StructuralMetadata
will select all StructuralMetadata elements in your XML. The double slash at the start means to select nodes wherever they appear in the document.
You don't want all the nodes, though; you can filter for the ones you want with a predicate:
//StructuralMetadata[@name="Track (TimeLine)" and TrackID="0x1200"]
This will select all StructuralMetadata elements that have a name attribute with the value Track (TimeLine), and a TrackID child element with contents 0x1200.
As you're interested in the UID element, you can further refine the expression:
//StructuralMetadata[@name="Track (TimeLine)" and TrackID="0x1200"]/UID
This expression will match all the UID elements that are children of StructuralMetadata elements that match the predicate described above.
Putting this to use:
require 'nokogiri'
# Parse the document, assuming xml_file is a File object containing the XML
doc = Nokogiri::XML(xml_file)
# I'm assuming there is only one element in the document that matches
# the criteria, so I'm using at_xpath
node = doc.at_xpath('//StructuralMetadata[@name="Track (TimeLine)" and TrackID="0x1200"]/UID')
# At this point, doc contains a representation of the xml, and node points to
# the UID node within that representation. We can update the contents of
# this node
node.content = 'XXX'
# Now write out the updated XML. This just writes it to standard output,
# you could write it to a file or elsewhere if needed
puts doc.to_xml
A great way to approach this problem is with the ‘map reduce’ style of programming, which takes a large list of things, narrows it down, and combines it into the result you're after. Specifically, Enumerable#find and Enumerable#select (available on arrays and on Nokogiri node sets) are really useful for this sort of problem. Check out this example:
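require 'nokogiri'
# Parse the sample file; css returns a NodeSet of every StructuralMetadata element
xml = Nokogiri::XML(File.read('sample.xml'))
# find returns the first element for which the block is true (or nil if none is)
element = xml.css('StructuralMetadata').find do |item|
  item['name'] == 'Track (TimeLine)' && item.css('TrackID').text == '0x1200'
end
puts element.to_xml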
This little program first uses a CSS selector to get all of the <StructuralMetadata> elements in the document. The result is a Nokogiri NodeSet, which behaves like an array and can be filtered down to just what we want with find. select is its cousin, which returns all of the matching objects instead of only the first one found.
Inside the block is a test that checks whether the <StructuralMetadata> element is the one we’re after. The program then prints the element.to_xml string to the console, so you can see what it found if you run this as a command-line script. Now that you can find the element, you can modify it in the usual way and save out a new XML file or whatever.

Can't get nth node in Selenium

I try to write xpath expressions so that my tests won't be broken by small design changes. So instead of the expressions that Selenium IDE generates, I write my own.
Here's an issue:
//input[@name='question'][7]
This expression doesn't work at all. Input nodes named 'question' are spread across the page. They're not siblings.
I've tried using an intermediate expression, but it also fails.
(//input[@name='question'])[2]
error = Error: Element (//input[@name='question'])[2] not found
That's why I suppose Selenium has a wrong implementation of XPath.
According to the XPath docs, the position predicate must filter by the position in the node-set, so it must find the seventh input with the name 'question'. In Selenium this doesn't work. CSS selectors (:nth-of-type) don't work either.
I had to write an expression that filters their common parents:
//*[contains(@class, 'question_section')][7]//input[@name='question']
Is this a Selenium-specific issue, or am I reading the specs the wrong way? What can I do to make a shorter expression?
Here's an issue:
//input[@name='question'][7]
This expression doesn't work at all.
This is a FAQ.
[] has a higher priority than //.
The above expression selects every input element with @name = 'question' that is the 7th such child of its parent, and apparently the parents of the input elements in the document (which is not shown) don't have that many input children.
Use (note the parentheses):
(//input[@name='question'])[7]
This selects the 7th element input in the document that satisfies the conditions in the predicate.
Edit:
People who know Selenium (Dave Hunt) suggest that the above expression is written in Selenium as:
xpath=(//input[@name='question'])[7]
If you want the 7th input with a name attribute value of question in the source, then try the following:
/descendant::input[@name='question'][7]

Use XPath to select the element with a certain token in the value

I have the following XML:
<ZMARA SEGMENT="1">
<MATERIAL>000000000030001004</MATERIAL>
<PRODUCT_GROUP>14000IAA</PRODUCT_GROUP>
<PRODUCT_GROUP_DESC>HER 30 AR NEW Size</PRODUCT_GROUP_DESC>
<CLASS_CODE>I046</CLASS_CODE>
<CLASS_CODE_DESC>Heritage 30</CLASS_CODE_DESC>
<CHARACTERISTICS_01>,001,PLANNING_ALERT_PERCENTAGE, 50.000,PLANNI</CHARACTERISTICS_01>
<CHARACTERISTICS_02>X,001,COLOR_ATTRIBUTE,Weathered Wood,WEWD,Col</CHARACTERISTICS_02>
<CHARACTERISTICS_03>,001,ARMA_UOM,SALES SQUARE,SSQ,ARMA UNIT OF M</CHARACTERISTICS_03>
<CHARACTERISTICS_04>,001,ARMA_A_CATEGORY,05-Below 260 Lam/Multi-l</CHARACTERISTICS_04>
</ZMARA>
Using XPath I need to select the CHARACTERISTICS_XX element whose value contains the COLOR_ATTRIBUTE token. It will not always be characteristics_02. Thanks for the help. I am a total noob at XPath.
This looks like it's taken from an SAP IDoc; you can count yourself lucky that the field names are not six-character abbreviations :)
The answer given by spinon is correct. However, if there could be another element that contains the text 'COLOR_ATTRIBUTE', this would give a more specific match:
/ZMARA/*[starts-with(local-name(.), 'CHARACTERISTICS_')][contains(.,'COLOR_ATTRIBUTE')]
Another suggestion is to avoid the '//' expression if you know where the ZMARA element can occur. In the expression above, ZMARA is only looked for as the root element, which should perform better.
This should work:
//ZMARA/*[contains(.,'COLOR_ATTRIBUTE')]
