I'm using Selenium to automate testing. I'm having trouble selecting the checkbox in the table row below, in the row whose cell has the title "1794 Pecos Rd, Las Vegas NV 89115, UNITED STATES". I'm using XPath to do this. Below is the HTML.
<tr class="odd" role="row">
<td class="sorting_1" title="<input type="checkbox">">
<input type="checkbox" style="background-color: rgb(255, 255, 255);">
</td>
<td title="1206164015">1206164015</td>
<td title="1794 Pecos Rd, Las Vegas NV 89115, UNITED STATES">1794 Pecos Rd, Las Vegas NV 89115, UNITED STATES</td>
</tr>
Below is the XPath that I have tried:
//*[@title="110 Pennington St, Tuscon AZ 85701, UNITED STATES"]/preceding-sibling::td/input
I am trying to select all p elements located between two h5 elements. The text of the first h5 is "Subject" and the text of the second h5 is "tenders file".
You may see the picture attached as well.
I don't want to have other p elements which are coming after the second h5.
I have tried the following XPath, based on an idea from a related answer:
//p[preceding-sibling::h5//*[contains(text() , 'SUBJECT')] and following-sibling::h5//*[contains(text() , 'Tender’s Files,')]]
but it does not return the right paragraphs. It still selects other paragraphs after the second h5.
<div>
<table class="table table-striped table-bordered table-hover" width="90%">
<tbody>
<tr>
<td style="vertical-align: middle;" colspan="2" width="90%">
<h5 style="padding-left: 10px;"><strong><span style="color: #3577be;">Tender Title:</span> Testing of Non-Fortified Wheat Flour in NES</strong></h5>
</td>
</tr>
<tr>
<td style="vertical-align: middle;" width="45%">
<h5 style="padding-left: 10px;"><strong><span style="color: #3577be;">Tender No:</span> SYRIA-TA-2021-005</strong></h5>
</td>
<td style="vertical-align: middle;">
<h5 style="padding-left: 10px;"><strong><span style="color: #3577be;">Location:</span> North East Syria</strong></h5>
</td>
</tr>
<tr>
<td style="vertical-align: middle;" colspan="2">
<h5 style="padding-left: 10px;"><strong><span style="color: #3577be;">Tender Package Available from:</span> 2021-01-10</strong></h5>
</td>
</tr>
<tr>
<td style="vertical-align: middle;" colspan="2">
<h5 style="padding-left: 10px;"><strong><span style="color: #3577be;">Deadline for Offer Submission:</span> 2021-01-18 17:00 (Iraqi Time)</strong></h5>
</td>
</tr>
</tbody>
</table>
<table class="table " width="90%">
<tbody>
<tr>
<td style="text-align: center;"> </td>
</tr>
</tbody>
</table>
<h5><strong><u>SUBJECT:</u></strong> <strong>Testing of Non-Fortified Wheat Flour in NES</strong></h5>
<p>Our organization, a non-profit organization, provides humanitarian assistance to “people in need”, is seeking quotations from eligible contractors to <strong>Testing of Non-Fortified Wheat Flour in NES</strong>. Our organization anticipates awarding Multiple or Single contract(s) as a result of this Solicitation. Our organization reserves the right to award more or none under this RFQ.</p>
<p>All bids shall be submitted <strong>via e-mail to</strong> <span id="cloak1f9ac73a082c1f52174ccee4f406b81c"><strong>Syr-tendering#blumont.org</strong></span> <strong>as PDF format and clearly written the subject of the tender</strong> This RFQ is in no way obligates our organization Our organization to award a contract nor does it commit our organization to pay any cost incurred in the preparation and submission of a proposal.</p>
<p>Our organization bears no responsibility for data errors resulting from transmission or conversion processes.</p>
<p> </p>
<ul>
<li><strong>To help us with our procurement effort, please indicate in your email where (ngotenders.net) you saw this tender/procurement notice.</strong></li>
</ul>
<p><strong>Sincerely</strong></p>
<p><strong>Procurement Committee</strong></p>
<h5><strong>Tender’s Files,</strong></h5>
<h5><strong>5ffb04ba52a49-005-announcement.zip, </strong></h5>
<hr>
<h5 dir="rtl"><strong><u>الموضوع</u></strong><strong><u>:</u></strong> <strong>فحص الطحين الغير مدعم في شمال شرق سوريا.</strong><strong> </strong></h5>
<p dir="rtl">منظمتنا و هي منظمة غير ربحية تعمل لخدمة المنكوبين في العالم و تسعى للحصول على عروض أسعار من المقاولين المؤهلين لغرض الموضوع: <strong>فحص الطحين الغير مدعم في شمال شرق سوريا.</strong> وتتوقع منظمتنا منح (عقود) متعددة أو مفردة نتيجة لهذا الطلب. وتحتفظ منظمتنا بالحق في منح التعاقد بأكثر أو أقل من المتوقع للطلب أعلاه.</p>
<p dir="rtl">لهذا الطلب. وتحتفظ منظمتنا بالحق في منح التعاقد بأكثر أو أقل من المتوقع للطلب أعلاه.</p>
<p dir="rtl"> يجب على جميع مقدمي العطاءات تقديم العروض عبر الايميل :<strong>عبر الايميل: </strong><span id="cloakc42a61e471daa10a7992dbd8b44f9b26"><strong>Syr-tendering#blumont.org</strong></span> <strong>و بصيغة</strong><strong> PDF</strong> و تم التوضيح للموضوع المناقصة بان المنظمة لا تلتزم بأي حال من الأحوال بمنح العقد كما أن المنظمة لا تلتزم بدفع أي تكاليف متكبدة في إعداد وتقديم العرض.</p>
<p dir="rtl">كما ان منظمتنا لا تتحمل أية مسؤولية عن أي أخطاء في البيانات الناتجة عن عمليات النقل أو التحويل او المحادثة.</p>
<p dir="rtl">
</p><p dir="rtl"><strong>مع فائق الاحترام و التقدير</strong></p>
<p dir="rtl"><strong>لجنة المشتريات</strong></p>
<h5><strong>Tender’s Files,</strong></h5>
<h5><strong>5ffb04ba52a49-005-announcement.zip, </strong></h5>
</div>
Above is the page source code.
Using techniques from the following Q/A:
XPath to select all elements between two headings?
Testing text() nodes vs string values in XPath
The following XPath,
//p[ preceding-sibling::h5[starts-with(normalize-space(),'SUBJECT:')]
and following-sibling::h5[normalize-space()='Tender’s Files,']]
will select all p elements between your two targeted headlines, as requested.
Update after OP included actual markup:
Your actual markup includes duplicate
<h5><strong>Tender’s Files,</strong></h5>
headings. The above XPath will select through to the last such heading.
If you want to select through only the first such heading, use this XPath instead:
//p[ preceding-sibling::h5[starts-with(normalize-space(),'SUBJECT:')]
and following-sibling::h5[normalize-space()='Tender’s Files,']
and not(preceding-sibling::h5[normalize-space()='Tender’s Files,'])]
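Your XPath should work if you add a positional predicate to the second condition, so that a p matches only while two 'Tender’s Files,' headings still follow it: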
//p[preceding-sibling::h5//*[contains(text() , 'SUBJECT')] and (following-sibling:: h5//*[contains(text() , 'Tender’s Files,')])[2]]
I have a dynamic web table and I want to select a row based on the text values of two different cells.
//tr[.//td[contains(text(),'SATWIK GHANSIYAL')] and .//td[contains(text(),'07/07/2002')]]
HTML:
<html><head></head><body><table>
<tbody><tr style="background-color:White;height:24px;">
<td class="gridtext" align="center">
<span class="checkboxclass"><input id="ctl00_ContentPlaceHolder1_grdUsers_ctl02_chkSelect" type="checkbox" name="ctl00$ContentPlaceHolder1$grdUsers$ctl02$chkSelect" onclick="javascript:setTimeout('__doPostBack(\'ctl00$ContentPlaceHolder1$grdUsers$ctl02$chkSelect\',\'\')',
0)"></span>
</td><td class="gridtext" align="left" style="background-color:#FDE9D9;">SATWIK GHANSIYAL</td>
<td class="gridtext" align="left" style="background-color:#FDE9D9;" xpath="1">RAJESH GHANSIYAL</td>
<td class="gridtext" align="left" style="background-color:#FDE9D9;">SHELLY</td>
<td class="gridtext" align="left" style="background-color:#FDE9D9;">07/07/2002</td>
</tr>
</tbody></table>
</body></html>
I am getting the message "no element found".
Use this:
//tr[.//td[contains(.,'SATWIK GHANSIYAL')] and .//td[contains(.,'07/07/2002')]]
I tried evaluating this expression against the given XML:
//tr[.//td[contains(text(),'SATWIK GHANSIYAL')] and .//td[contains(text(),'07/07/2002')]]
text() returns a sequence of more than one item here, which an XPath 2.0 processor rejects with this error message:
Unable to perform XPath operation. A sequence of more than one item is not allowed as the first argument of contains() ("", "")
I am trying to parse the following HTML using Ruby and Nokogiri:
<div class="vevent">
<table width="750"><tr>
<td width="25"> </td>
<td valign="top" width="200">
<font size="2" face="sans-serif">
<font color="black"><b>June 30, 2015</b></font>
<br>
<span class="dtstart"><span class="value-title" title="2015-06-30"></span></span><br><span class="summary"><font color="#92161" size="3"><b>Band Concert</b></font></span>
<br><font color="#333333">Event</font><br>
<br>
<br>
<br clear="left">Have a question? email us.<br>
<br></font>
</td>
<td valign="top" width="10"></td>
<td valign="top">
<br clear="left"><font color="#92161">111 Main Street</font><br>
<font color="#92161">Mainstreet, Ohio 55111</font>
<a rel="nofollow" href="http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=%221700+111+MainStreet+NE+Mainstreet,+Ohio+55111%22" target="_blank"><font size="1" face="sans-serif">map link</font></a><br><br>
<font color="#92161"><font size="2" face="sans-serif">Telephone:</font> 3305551000</font><br><br>
Visit our website for complete information.<br><br>
Enjoy a summer evening concert on Main Street at 8pm. Doors and cash bar open at 7pm.<br><br>Look for more details and ticket sales to be released soon on our website<br> <br><br>
<br>
</td>
</tr></table>
</div>
I am trying to grab the last bit of text:
Visit our website for complete information.<br><br>
Enjoy a summer evening concert on Main Street at 8pm. Doors and cash bar open at 7pm.<br><br>Look for more details and ticket sales to be released soon on our website<br> <br><br>
Here is my code thus far:
events = doc.css("div.vevent")
events.collect do |row|
row.css("td")[3]
end
This gets me to the fourth td (index 3), which has the text that I am looking for, as follows:
<td valign="top">
<br clear="left"><font color="#92161">111 Main Street</font><br>
<font color="#92161">Mainstreet, Ohio 55111</font>
<a rel="nofollow" href="http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=%221700+111+MainStreet+NE+Mainstreet,+Ohio+55111%22" target="_blank"><font size="1" face="sans-serif">map link</font></a><br><br>
<font color="#92161"><font size="2" face="sans-serif">Telephone:</font> 3305551000</font><br><br>
Visit our website for complete information.<br><br>
Enjoy a summer evening concert on Main Street at 8pm. Doors and cash bar open at 7pm.<br><br>Look for more details and ticket sales to be released soon on our website<br> <br><br>
<br>
</td>
However, once there, if I call text on that td, I get all the text inside the td. I only want the last bit that is not inside any element. I tried using XPath and parent so that I could say "just give me the text that is inside the td (not nested inside another element)", but I couldn't get that to work. Does anyone have any ideas on this?
Try this code: doc.css('td')[3].css('> text()').to_s.strip
I suggest using XPath, which is more flexible.
If I understand you correctly, this is what you would like:
"I only want the last bit that is not inside any element"
So, try this XPath:
//table//td[last()]/text()
I'm trying to import some data from an HTML page with feeds importer. The context is this:
<table class="tabela">
<tr valign="TOP">
<td class="formulario-legenda">Nome:</td>
<td nowrap="nowrap">
<b>Raul Fernando de Almeida Moreira Vidal</b>
</td>
</tr>
<tr valign="TOP">
<td class="formulario-legenda">Sigla:</td>
<td>
<b>RMV</b>
</td>
</tr>
<tr valign="TOP">
<td class="formulario-legenda">Código:</td>
<td>206415</td>
</tr>
<tr valign="TOP">
<td class="formulario-legenda">Estado:</td>
<td>Ativo</td>
</tr>
</table>
<table>
<tr>
<td class="topo">
<table>
<tr>
<td class="formulario-legenda">Categoria:</td>
<td>Professor Associado</td>
</tr>
<tr>
<td class="formulario-legenda">Carreira:</td>
<td>Pessoal Docente de Universidades</td>
</tr>
<tr>
<td class="formulario-legenda">Grupo profissional:</td>
<td>Docente</td>
</tr>
<tr valign="TOP">
<td class="formulario-legenda">Departamento:</td>
<td>
<a href="uni_geral.unidade_view?pv_unidade=151"
title="Departamento de Engenharia Informática">Departamento de Engenharia Informática</a>
</td>
</tr>
</table>
</td>
</tr>
</table>
I tried with this:
/html/body/div/div/div/div/div/div/div/table/tbody/tr/td/table/tbody/tr[1]/td[2]
but nothing appears. Can someone help me with the right syntax to obtain "Grupo Profissional"?
Quick answer that might work
Considering just the HTML sample you provided (which only has two tables) you can select the text you want using this expression, based on the table's position:
//table[2]//tr[3]/td[1]/text()
This will work in the HTML you pasted above. But it might not work in your actual scenario, since you might have other tables, the table you want to select has no ID and you didn't suggest some invariant text in your code which could be used to anchor the context for the expression. Assuming the initial part of your XPath expression (the div sequence) is correct, you might be able to use:
/html/body/div/div/div/div/div/div/div/table[2]//tr[3]/td[1]/text()
But it's quite a fragile expression, vulnerable to any change in the document.
A (possibly) better solution
A better alternative is to look for some identifier you could use. I can only guess, since I don't know your code. In your sample, I would guess that Código: and the number following it, 206415, might be such an identifier. If so, you could use it to anchor your context. First you select it:
//table[.//td[text()='Código:']/following-sibling::td='206415']
The expression above selects the table which contains a td with the exact text Código: followed by a td containing the exact text 206415. This creates a unique context (assuming the number is a unique identifier). From that context, you can now select the text you want, which is inside the next table (following-sibling::table[1]). This is the context of the second table:
//table[.//td[text()='Código:']/following-sibling::td='206415']/following-sibling::table[1]
And this should select the text you want (Grupo profissional:) which is in the third row tr[3] and first cell/column td[1] of that table:
//table[.//td[text()='Código:']/following-sibling::td='206415']/following-sibling::table[1]//tr[3]/td[1]/text()
I'm trying to get the table containing the MMEL codes from this site, and I'm trying to accomplish it with CSS selectors.
What I've got so far is:
require_relative 'sources/Downloader'
require 'nokogiri'
html_content = Downloader.download_page('http://www.s-techent.com/ATA100.htm')
parsed_html = Nokogiri::HTML(html_content)
tmp = parsed_html.css("tr[*]")
puts tmp.text
And I'm getting an error while trying to get this tr with an attribute. How can I get this table in a simple form? I want to convert it to JSON. It would be nice to get it in sections and process each one in an .each block.
EDIT:
It'd be nice if I could get things in blocks like this (look at the page's source):
<TR><TD WIDTH="10%" VALIGN="TOP" ROWSPAN=5>
<B><FONT FACE="Arial" SIZE=2><P ALIGN="CENTER">11</B></FONT></TD>
<TD WIDTH="40%" VALIGN="TOP" COLSPAN=2>
<B><FONT FACE="Arial" SIZE=2><P>PLACARDS AND MARKINGS</B></FONT></TD>
<TD WIDTH="50%" VALIGN="TOP">
<FONT FACE="Arial" SIZE=2><P ALIGN="LEFT">All procurable placards, labels, etc., shall be included in the illustrated Parts Catalog. They shall be illustrated, showing the part number, Legend and Location. The Maintenance Manual shall provide the approximate Location (i.e., FWD -UPPER -RH) and illustrate each placard, label, marking, self -illuminating sign, etc., required for safety information, maintenance significant information or by government regulations. Those required by government regulations shall be so identified.</FONT></TD>
</TR>
This should print all those TRs from the source at line 96. There are three tables on that page, and table[1] has all the text you need:
require 'nokogiri'
require 'open-uri'
# URI.open comes from open-uri; a bare open() no longer accepts URLs on Ruby 3.0+
doc = Nokogiri::HTML(URI.open('http://www.s-techent.com/ATA100.htm'))
doc.css("table")[1].css("tr").each do |i|
  puts i #=> prints the exact html between TR tags (including)
  puts i.text #=> prints the text
end
For instance:
puts doc.css("table")[1].css("tr")[2]
prints the following:
<tr>
<td valign="TOP" colspan="3">
<b><font face="Arial" size="2"><p align="CENTER">GROUP DEFINITION - AIRCRAFT</p></font></b>
</td>
<td valign="TOP">
<font face="Arial" size="2"><p align="LEFT">The complete operational unit. Includes dimensions and areas, lifting and shoring, leveling and weighing, towing and taxiing, parking and mooring, required placards, servicing.</p></font>
</td>
</tr>
You could do the same using XPath.
Below is the content of the first table of the webpage given in the OP's post:
require 'nokogiri'
require 'open-uri'
# URI.open comes from open-uri; a bare open() no longer accepts URLs on Ruby 3.0+
doc = Nokogiri.HTML(URI.open('http://www.s-techent.com/ATA100.htm'))
doc.xpath('(//table)[1]/tr').each do |tr|
  puts tr.to_html(:encoding => 'utf-8')
end
Output:
<tr>
<td width="33%" valign="MIDDLE" colspan="2">
<p><img src="S-Tech-Logo-Blue2.gif" width="274" height="127"></p>
</td>
<td width="67%" valign="MIDDLE">
<b><i><font face="Arial" color="#0000ff">
<p align="CENTER"><big>AIRCRAFT PARTS MANUFACTURING ASSISTANCE (PMA)</big><br><big>DAR SERVICES</big></p></font></i></b>
</td>
</tr>
Now, if you want to collect the rows of the last table, do:
require 'nokogiri'
require 'open-uri'
# URI.open comes from open-uri; a bare open() no longer accepts URLs on Ruby 3.0+
doc = Nokogiri.HTML(URI.open('http://www.s-techent.com/ATA100.htm'))
p doc.xpath('(//table)[3]/tr').to_a.size # => 1
doc.xpath('(//table)[3]/tr').each do |tr|
  puts tr.to_html(:encoding => 'utf-8')
end
Output:
<tr>
<td width="40%" valign="TOP" height="10">
<p align="CENTER"><b><font face="Arial" size="2" color="#0000ff">149 AZALEA CIRCLE • LIMERICK, PA 19468-1330</font></b></p>
</td>
<td width="30%" valign="TOP" height="10">
<p align="CENTER"><b><font face="Arial" size="2" color="#0000ff">610-495-6898 (Office) • 484-680-0507 (Cell)</font></b></p>
</td>
<td width="110%" valign="TOP" height="10">
<p align="CENTER"><b><font face="Arial" size="2">E-mail S-Tech</font></b></p>
</td>
</tr>