XPath expression to pull data from a table row

I am new to XPath. I tried to write an XPath expression that returns all the valid elements in the table row below (marked as 'Required').
XPath used:
.//*[@id='reportListItemID0']/td[not(@width) and child::node()] | .//*[@id='reportListItemID0']/td[not(@width) and not(child::node())]
<tr style="" class="deatilRow" id="reportListItemID0" nowrap="">
<td width="10px"> </td>
<td>04/04/2013</td><!----------------------------Required-->
<td width="3px" colspan="10"> </td>
<td>Morkel, Rashid</td><!------------------------Required-->
<td width="3px" colspan="10"> </td>
<td>100668041</td><!-----------------------------Required-->
<td width="3px" colspan="10"> </td>
<td></td><!--------------------------------------Required-->
<td width="3px" colspan="10"> </td>
<td>
XA0404181004596<!--Required-->
</td>
<td width="3px" colspan="10"> </td>
<td>$31.00</td><!--------------------------------Required-->
<td width="3px" colspan="10"> </td>
<td>
<span class="workedYesNoColor">N</span><!----Required-->
</td>
<td width="3px" colspan="10"> </td>
</tr>
The XPath above returns the following elements:
<td>04/04/2013</td>
<td>Morkel, Rashid</td>
<td>100668041</td>
<td></td>
<td> XA0404181004596 </td>
<td>$31.00</td>
<td> <span class="workedYesNoColor">N</span> </td>
But the expected result is as follows (only the leaf nodes are required, not the enclosing 'td'):
<td>04/04/2013</td>
<td>Morkel, Rashid</td>
<td>100668041</td>
<td></td>
XA0404181004596
<td>$31.00</td>
<span class="workedYesNoColor">N</span>
Important Note: The position of the 'td' with tags 'a' and 'span' can vary. They are not present at the 5th and 6th position consistently.
Please let me know what I am missing.
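One way to read the requirement: for every 'td' without a width attribute, return its child elements if it has any, otherwise return the 'td' itself. In XPath 1.0 that could be written as a union such as .//*[@id='reportListItemID0']/td[not(@width)][not(*)] | .//*[@id='reportListItemID0']/td[not(@width)]/*. The sketch below applies the same rule in Python, because the standard library's ElementTree does not support not(). The row is a shortened copy of the one in the question, and the <a> wrapper around the XA value is an assumption based on the note about 'a' and 'span' tags; the helper name leaf_nodes is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the row from the question.
# ASSUMPTION: the XA value sits inside an <a> element, as the note about
# 'a' and 'span' tags suggests; the posted markup shows it as bare text.
ROW = """<tr class="deatilRow" id="reportListItemID0">
<td width="10px"> </td>
<td>04/04/2013</td>
<td width="3px" colspan="10"> </td>
<td></td>
<td width="3px" colspan="10"> </td>
<td><a>XA0404181004596</a></td>
<td width="3px" colspan="10"> </td>
<td><span class="workedYesNoColor">N</span></td>
</tr>"""


def leaf_nodes(row):
    """Return child elements of data cells, or the cell itself if it has none."""
    leaves = []
    for td in row.findall("td"):
        if "width" in td.attrib:
            continue  # spacer cell, same role as not(@width) in the XPath
        children = list(td)
        leaves.extend(children if children else [td])
    return leaves


row = ET.fromstring(ROW)
result = [(el.tag, el.text) for el in leaf_nodes(row)]
print(result)
```

Cells that hold only text (or nothing) come back as 'td' elements, while cells that wrap an element come back as that element, regardless of their position in the row.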

Related

Problem with rowspan and page break in DomPDF (nesting loop)

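I have a problem with rowspan and page breaks in DomPDF when the table rows are generated by nested loops. This is my code (below).
How can I build the nested rows so that the rowspan on the numbering column works and the page breaks fall correctly?
I can't seem to find anything about this in the documentation.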
<table class="table" style="table-layout: fixed; width: 100%;" >
<tbody>
<tr>
<td class="text-center" style="width:5%;">
<h6>III.</h6>
</td>
<td colspan="2">
<h6>INFORMASI TENTANG KUALIFIKASI DAN HASIL YANG DICAPAI</h6>
<h6 class="font-italic font-weight-bold">INFORMATION OF QUALIFICATION AND LEARNING OUTCOME
</h6>
</td>
</tr>
<tr>
<td class="text-center" rowspan="10">
<p>3.1</p>
</td>
<td colspan="2">
<p class=" font-weight-bold">Capaian Pembelajaran</p>
<p class="font-italic font-weight-bold">Learning Outcomes</p>
</td>
</tr>
<tr>
<td class="text-center" style="width:50%;">
<p class=" font-weight-bold">Bahasa Indonesia</p>
</td>
<td class="text-center" style="width:50%;">
<p class="font-weight-bold">Bahasa Inggris</p>
</td>
</tr>
@foreach($kcs as $kc)
<tr>
<td class="text-center" style="width:50%;">
<p class=" font-weight-bold">{{$kc->kategori_id}}</p>
</td>
<td class="text-center" style="width:50%;">
<p class="font-weight-bold font-italic">{{$kc->kategori_en}}</p>
</td>
</tr>
@foreach($cps as $cp)
@if($cp->id_ps==$data->id_ps && $kc->id==$cp->id_kategori)
<tr>
<td class="text-center" style="width:50%;">
{!! $cp->cpl_id!!}
</td>
<td class="text-center font-italic" style="width:50%;">
{!! $cp->cpl_en!!}
</td>
</tr>
@endif
@endforeach
@endforeach
</tbody>
</table>

List only records populated with Capybara

Friends helped me with a solution that validates whether there are [active/inactive] records in the list. When I print the records, Capybara also returns blank lines. How do I disregard empty records?
def validate_active_inactive_records
  expect(page).to have_css("td:nth-child(5)", :text => /^(ACTIVE|INACTIVE)$/)
  # ***listing records***
  page.all('.tvGrid tr > td:nth-child(5)').each do |td|
    puts td.text
  end
end
<table width="100%" class="tvGrid">
<tbody>
<tr>
<th colspan="1" class="tvHeader">Id</th>
<th colspan="1" class="tvHeader">Code</th>
<th colspan="1" class="tvHeader">Description</th>
<th colspan="1" class="tvHeader">Operational Center</th>
<th colspan="1" class="tvHeader">Status</th>
</tr>
<tr class="tvRowEmpty">
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr class="tvRowEmpty">
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr class="tvRowEmpty">
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr class="tvRowEmpty">
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr class="tvRowEmpty">
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr class="tvRowEmpty">
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr class="tvRowEmpty">
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
</tbody>
</table>
Are you asking how to remove the rows with the class tvRowEmpty from your search results? If so, you can use the :not operator in your finder:
def validate_active_inactive_records
  expect(page).to have_css("td:nth-child(5)", :text => /^(ACTIVE|INACTIVE)$/)
  # ***listing records***
  page.all('.tvGrid tr:not(.tvRowEmpty) > td:nth-child(5)').each do |td|
    puts td.text
  end
end
If you want to exclude any td that contains only whitespace (including a non-breaking space), you could use the following finder with a regex that matches only cells containing at least one non-whitespace character:
page.all('.tvGrid tr > td:nth-child(5)', text: /[^[:space:]]/).each

xpath that exclude some specific elements

This is a simplified version of the HTML of the page that I want to analyse:
<table class="class_1">
<tbody>
<tr class="class_2">
<td class="class_3"> </td>
<td class="class_4"> </td>
<td class="class_5"> </td>
</tr>
<tr class="class_2">
<td class="class_3"> </td>
<td class="class_4"> </td>
<td class="class_5"><span class="class_6"></span>square</td>
</tr>
<tr class="class_2">
<td class="class_3"> </td>
<td class="class_4"> </td>
<td class="class_5"><span class="class_7"></span>circle</td>
</tr>
<tr class="class_2">
<td class="class_3"> </td>
<td class="class_4"> </td>
<td class="class_5"><span class="class_6"></span>triangle</td>
</tr>
</tbody>
</table>
You can find the page at
https://sabbiobet.netsons.org/test.html
If you try this function in Google Sheets:
=IMPORTXML("https://sabbiobet.netsons.org/test.html";"//td[@class='class_5']")
you get:
square
circle
triangle
I need to obtain all the <td> with class="class_5" except the ones that contain only a non-breaking space or that have a <span class=class_7>.
In other words, I want to obtain only these values:
square
triangle
Can somebody help me?
The following XPath expression
//td[@class='class_5' and span and not(span[@class='class_7'])]
selects all td elements having an attribute class with value class_5, having a child element span and not having a child element span where its class attribute has the value class_7.
Note that you could also use
//td[@class='class_5' and span[@class='class_6']]
to get the same result in this case.
This should work:
//td[@class='class_5'][not(text()=' ')][not(./span[@class='class_7'])]
where the character between the quotes in [not(text()=' ')] is not a regular space but the no-break space, Unicode U+00A0. On Windows you can type it with Alt+0160, entering the digits on the numeric keypad.
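To check the selection logic outside Google Sheets, here is a minimal Python sketch. It is not a drop-in for IMPORTXML: Python's standard ElementTree supports only a subset of XPath (no and, no not()), so the "has a span, but no span with class_7" condition from the first expression is applied in Python instead. The markup is a trimmed copy of the table in the question with U+00A0 in the empty cells, and the function name wanted_cells is made up for illustration.

```python
import xml.etree.ElementTree as ET

NBSP = "\u00a0"  # the no-break space that fills the empty cells

# Trimmed copy of the table from the question (only the class_5 column kept).
TABLE = f"""<table class="class_1"><tbody>
<tr class="class_2"><td class="class_5">{NBSP}</td></tr>
<tr class="class_2"><td class="class_5"><span class="class_6"></span>square</td></tr>
<tr class="class_2"><td class="class_5"><span class="class_7"></span>circle</td></tr>
<tr class="class_2"><td class="class_5"><span class="class_6"></span>triangle</td></tr>
</tbody></table>"""


def wanted_cells(root):
    """Keep class_5 cells that have a span child but no span of class_7."""
    cells = []
    for td in root.iterfind(".//td[@class='class_5']"):
        has_span = td.find("span") is not None
        has_class_7 = td.find("span[@class='class_7']") is not None
        if has_span and not has_class_7:
            cells.append(td)
    return cells


root = ET.fromstring(TABLE)
values = ["".join(td.itertext()).strip() for td in wanted_cells(root)]
print(values)
```

The cell holding only the no-break space is dropped because it has no span child, and the "circle" cell is dropped because of its class_7 span.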

Thymeleaf th:block condition

I want to set up some conditions on Thymeleaf templates like this, but it doesn't work.
<table border=2>
<thead>
<tr>
<td> Identifiant </td>
<td> Nom Formation </td>
<td> Descirption Formation </td>
<td> Adresse Formation </td>
<td>Status Formation </td>
<td> Chef Projet </td>
<td> Formateur </td>
<td>Ressource Humain</td>
<td>Update</td>
<td>Liste Devellopeur</td>
</tr>
</thead>
<tbody>
<tr th:each="formations : ${formations}">
<th:block th:if="${StatusFormation}} =='Traitement' }">
<td th:text="${formations.id}"> </td>
<td th:text="${formations.NomFormation}"> </td>
<td th:text="${formations.DescriptionFormation}"> </td>
<td th:text="${formations.StatusFormation}"> </td>
<td th:text="${formations.AdresseFormation}"> </td>
<td th:text="${formations.chef_projet}"> </td>
<td th:text="${formations.formateurs}"> </td>
<td th:text="${formations.ressourcehumain}"> </td>
</th:block>
</tr>
</tbody>
</table>
The error is:
Caused by: org.thymeleaf.exceptions.TemplateProcessingException: Could not parse as expression: "${StatusFormation}} =='Traitement' }" (template: "ChefProjetFormationHome" - line 29, col 11) at org.thymeleaf.standard.expression.StandardExpressionParser.parseExpression(StandardExpressionParser.java:131)
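The problem is the two stray closing braces in this line:
<th:block th:if="${StatusFormation}} =='Traitement' }">
You should change it to:
<th:block th:if="${StatusFormation} == 'Traitement'">
Since the condition is evaluated inside the th:each loop, you probably also need to read the status through the iteration variable, as the other cells do: ${formations.StatusFormation}.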

How do I retrieve multiple row node data from an html table in XPATH?

Sometime during the dark ages a script was built that outputs the following html..
...
<TABLE BORDER=0 FRAME=ALL_FRAMES RULES=ALL_RULES ALIGN=CENTER BGCOLOR="ffffe5">
<CAPTION ALIGN=TOP>
<FONT COLOR=009594 SIZE=-1><B>Access Information</B></FONT>
</CAPTION>
<TR>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
<FONT COLOR=black SIZE=-1><B>Access Circuit(s):</B></FONT>
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
**DATA TO COLLECT 111**
</TD>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
<FONT COLOR=black SIZE=-1><B>Other Circuit(s):</B></FONT>
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
&nbsp
</TD>
</TR>
<TR>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
&nbsp
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
**DATA TO COLLECT AAA**
</TD>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
&nbsp
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
&nbsp
</TD>
</TR>
<TR>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
&nbsp
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
**DATA TO COLLECT BBB**
</TD>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
&nbsp
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
&nbsp
</TD>
</TR>
<TR>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
&nbsp
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
**DATA TO COLLECT CCC**
</TD>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
&nbsp
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
&nbsp
</TD>
</TR>
<TR>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
<FONT COLOR=black SIZE=-1><B>Customer:</B></FONT>
</TD>
...
Sorry, I would show you the table layout but I don't know how without <table> on SO
How can I use XPATH (in PHP) to collect only each DATA TO COLLECT section? So far I've been able to retrieve the first row with //*[*='Access Circuit(s):']/following-sibling::td[1].
Things to note:
This is only a small section of a large document.
I cannot change the scripts output.
I won't know how many rows there will be (figure 0 to 6).
The data should be expected to always be in the same "column".
I may only have XPATH version 1. But version 2 answers are still welcomed.
The expression I came up with is this:
//TR[(.//B[.='Access Circuit(s):']) or ((./preceding-sibling::TR//B[.='Access Circuit(s):']) and (./following-sibling::TR//B[.='Customer:']))]//TD[2]
returns
<TD ALIGN="LEFT" VALIGN="MIDDLE">**DATA TO COLLECT 111**</TD>
<TD ALIGN="LEFT" VALIGN="MIDDLE">**DATA TO COLLECT AAA**</TD>
<TD ALIGN="LEFT" VALIGN="MIDDLE">**DATA TO COLLECT BBB**</TD>
<TD ALIGN="LEFT" VALIGN="MIDDLE">**DATA TO COLLECT CCC**</TD>
It uses the knowledge that the first row contains Access Circuit(s): and the first uncollected row contains Customer:. If you can't be sure of either one of those, then I think it can't be done with a single XPath expression.
Step-by-step
1. //TR[
2. (.//B[.="Access Circuit(s):"])
3. or ( (./preceding-sibling::TR//B[.="Access Circuit(s):"])
4. and (./following-sibling::TR//B[.="Customer:"]) )
5. ]//TD[2]
Means
1. all TR nodes
2. that either contain "Access Circuit(s):"
3. or
- (3.) are positioned after "Access Circuit(s):"
- (4.) and are positioned before "Customer:"
5. all TD nodes that are the second TD of their parents
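The same "collect everything between two marker rows" idea can be sketched procedurally. The Python example below is only an illustration of the logic, not the PHP solution: it uses a simplified, well-formed stand-in for the table (the real markup has unquoted attributes and would need an HTML parser), the cell values are placeholders, and the function name collect_between is made up. Unlike the XPath expression, it keeps collecting to the end of the table if the stop label never appears.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the table in the question.
# ASSUMPTION: labels sit in <B> elements and data in the second <TD>.
SAMPLE = """<TABLE>
<TR><TD><FONT><B>Access Circuit(s):</B></FONT></TD><TD>DATA 111</TD>
    <TD><FONT><B>Other Circuit(s):</B></FONT></TD><TD></TD></TR>
<TR><TD></TD><TD>DATA AAA</TD><TD></TD><TD></TD></TR>
<TR><TD></TD><TD>DATA BBB</TD><TD></TD><TD></TD></TR>
<TR><TD><FONT><B>Customer:</B></FONT></TD><TD>placeholder</TD></TR>
</TABLE>"""


def collect_between(table, start_label, stop_label):
    """Second-cell text of each row from the start row up to the stop row."""
    collected = []
    collecting = False
    for row in table.iterfind("TR"):
        labels = [b.text for b in row.iter("B")]
        if stop_label in labels:
            break  # first uncollected row, e.g. "Customer:"
        if start_label in labels:
            collecting = True  # the start row itself is collected too
        if collecting:
            cells = row.findall("TD")
            if len(cells) >= 2:
                collected.append((cells[1].text or "").strip())
    return collected


table = ET.fromstring(SAMPLE)
data = collect_between(table, "Access Circuit(s):", "Customer:")
print(data)
```

Walking the rows once with a flag avoids the repeated preceding-sibling and following-sibling scans, which may matter on a large document, but it relies on the same two marker labels as the XPath version.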

Resources