<td nowrap>Source name</td>
<td style="text-align: justify">Fresh postmortem eye<br>
I need to extract the text "Fresh postmortem eye" that comes after "Source name" on a form. I tried this:
//Source name/following-sibling::text()[1]
and didn't get anything. I'm hacking together a webscraper.
Try this one:
//td[.="Source name"]/following-sibling::td[1]/text()
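If you want to check the "label cell, then the next cell" idea without a full XPath engine, here is a minimal sketch using only Python's standard library. It assumes the two cells sit in the same row, so the fragment is wrapped in a made-up `<table><tr>` and tidied into well-formed markup (quoted attribute, closed `<br/>`). `xml.etree.ElementTree` does not support the `following-sibling` axis, so the sibling step is done in Python; `value_after_label` is a hypothetical helper name. With lxml or Scrapy you could run the XPath above directly.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the form row (the wrapper elements are an assumption).
ROW = """
<table><tr>
  <td nowrap="nowrap">Source name</td>
  <td style="text-align: justify">Fresh postmortem eye<br/></td>
</tr></table>
"""


def value_after_label(root, label):
    """Return the text of the <td> that directly follows the <td> whose text is `label`."""
    for row in root.iter("tr"):
        cells = row.findall("td")
        for i, cell in enumerate(cells[:-1]):
            if (cell.text or "").strip() == label:
                # Same idea as following-sibling::td[1]/text(), first text node only.
                return (cells[i + 1].text or "").strip()
    return None


root = ET.fromstring(ROW)
result = value_after_label(root, "Source name")
print(result)
```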
I need to modify the format of a form in a model-driven app to make it more readable/intuitive. Currently, the form looks like this:
I tried to use a web resource to create a simple HTML table with <script> and Xrm.Page.getAttribute() to pull in the relevant fields under the planned and actual columns, but that isn't working. I set the dependencies and assigned it to the proper form element, but no luck. The code that I used is this:
<div>
<table>
<tr>
<td><u>Tasks</u></td>
<td><u>Planned</u></td>
<td><u>Actual</u></td>
</tr>
<tr>
<td>Task 1</td>
<td><script>Xrm.Page.getAttribute("[plannedField_1]")</script></td>
<td><script>Xrm.Page.getAttribute("[actualField_1]")</script></td>
</tr>
<tr>
<td>Task 2</td>
<td><script>Xrm.Page.getAttribute("[plannedField_2]")</script></td>
<td><script>Xrm.Page.getAttribute("[actualField_2]")</script></td>
</tr>
<tr>
<td>Task 3</td>
<td><script>Xrm.Page.getAttribute("[plannedField_3]")</script></td>
<td><script>Xrm.Page.getAttribute("[actualField_3]")</script></td>
</tr>
</table>
</div>
Is this a valid way to modify form output, or is there another/better way to do this that doesn't involve creating an elaborate solution with dynamically scripted HTML?
Uncheck the Display label on the form for one of the controls.
I am trying to scrape a table which looks like the below.
<table class="table">
<caption>Caption</caption>
<tbody>
<tr>
<th scope="row">Title</th>
<td>Detail</td>
</tr>
<tr>
<th scope="row">Title 2</th>
<td>Detail 2</td>
</tr>
</tbody>
</table>
How would you set up scrapy so my output file generates an output similar to the below?!
Title: Detail
Title2: Detail2
Currently I can get all the text using two css selectors (one for the td's and one for the th's) but I would love to be able to combine these!
Unfortunately the number of rows differs from page to page..
Using xpath:
tabledata = {}
for i in response.xpath("//table[@class='table']//tr"):
    tabledata[i.xpath("th/text()").extract_first()] = i.xpath("td/text()").extract_first()
Output
{"Title":"Detail", "Title 2":"Detail 2"}
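To try the row-pairing logic outside a spider, here is a rough standard-library sketch run against the sample table from the question. It is not Scrapy code: `rows_to_dict` is a hypothetical helper, and it relies on the sample being well-formed enough for `xml.etree.ElementTree` to parse. Because it loops over whatever rows exist, it does not matter that the row count differs from page to page.

```python
import xml.etree.ElementTree as ET

TABLE = """
<table class="table">
  <caption>Caption</caption>
  <tbody>
    <tr><th scope="row">Title</th><td>Detail</td></tr>
    <tr><th scope="row">Title 2</th><td>Detail 2</td></tr>
  </tbody>
</table>
"""


def rows_to_dict(table):
    """Map each row's <th> text to the same row's <td> text."""
    result = {}
    for row in table.iter("tr"):
        title = row.findtext("th")
        detail = row.findtext("td")
        if title is not None:  # skip rows that have no header cell
            result[title.strip()] = (detail or "").strip()
    return result


tabledata = rows_to_dict(ET.fromstring(TABLE))
for title, detail in tabledata.items():
    print(f"{title}: {detail}")
```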
I have allowedcontent=true which is working and allowing me to have attributes in my opening tags; however, CKEditor is still removing the closing tag attributes. I am using the editor to allow modification of simple Handlebars templates that use {{each}} and {{/each}}. The issue comes when using this with a table and wanting to repeat my rows.
For example, I have the following HTML entered into source:
<table>
<tr data-each="{{each Person}}">
<td class="col-student-id">{{Identifier}}</td>
<td class="col-name">{{Name}}</td>
</tr data-each="{{/each}}">
</table>
When I click out of source, it removes the attribute on my closing tr tag.
Is there any way to force CKEditor not to remove this attribute? If not, does anyone know of a way that allows me to use something like this:
<table>
{{each Person}}
<tr>
<td class="col-student-id">{{Identifier}}</td>
<td class="col-name">{{Name}}</td>
</tr>
{{/each}}
</table>
When I try the above example, it is reformatted to be:
<section>{{each Person}} {{/each}}
<table>
<tr>
<td class="col-student-id">{{Identifier}}</td>
<td class="col-name">{{Name}}</td>
</tr>
</table>
Your input source code is invalid - closing tags cannot have attributes in HTML, so CKEditor ignores them. Read more in CKEditor HTML Autocorrection Issue.
I'm trying to import some data from a HTML page with feeds importer. The context is this:
<table class="tabela">
<tr valign="TOP">
<td class="formulario-legenda">Nome:</td>
<td nowrap="nowrap">
<b>Raul Fernando de Almeida Moreira Vidal</b>
</td>
</tr>
<tr valign="TOP">
<td class="formulario-legenda">Sigla:</td>
<td>
<b>RMV</b>
</td>
</tr>
<tr valign="TOP">
<td class="formulario-legenda">Código:</td>
<td>206415</td>
</tr>
<tr valign="TOP">
<td class="formulario-legenda">Estado:</td>
<td>Ativo</td>
</tr>
</table>
<table>
<tr>
<td class="topo">
<table>
<tr>
<td class="formulario-legenda">Categoria:</td>
<td>Professor Associado</td>
</tr>
<tr>
<td class="formulario-legenda">Carreira:</td>
<td>Pessoal Docente de Universidades</td>
</tr>
<tr>
<td class="formulario-legenda">Grupo profissional:</td>
<td>Docente</td>
</tr>
<tr valign="TOP">
<td class="formulario-legenda">Departamento:</td>
<td>
<a href="uni_geral.unidade_view?pv_unidade=151"
title="Departamento de Engenharia Informática">Departamento de Engenharia Informática</a>
</td>
</tr>
</table>
</td>
</tr>
</table>
I tried with this:
/html/body/div/div/div/div/div/div/div/table/tbody/tr/td/table/tbody/tr[1]/td[2]
but nothing appears. Can someone help me with the right syntax to obtain "Grupo Profissional"?
Quick answer that might work
Considering just the HTML sample you provided (which only has two tables) you can select the text you want using this expression, based on the table's position:
//table[2]//tr[3]/td[1]/text()
This will work in the HTML you pasted above. But it might not work in your actual scenario, since you might have other tables, the table you want to select has no ID and you didn't suggest some invariant text in your code which could be used to anchor the context for the expression. Assuming the initial part of your XPath expression (the div sequence) is correct, you might be able to use:
/html/body/div/div/div/div/div/div/div/table[2]//tr[3]/td[1]/text()
But it's quite a fragile expression and vulnerable to any changes in the document.
A (possibly) better solution
A better alternative is to look for some identifier you could use. I can only guess, since I don't know your code. In your sample code, I would guess that Código: and the number following it, 206415, might be some identifier. If it is, you could use it to anchor your context. First you select it:
//table[.//td[text()='Código:']/following-sibling::td='206415']
The expression above will select the table which contains a td with the exact text Código: followed by a td containing the exact text 206415. This will create a unique context (assuming that the number is a unique identifier). From that context, you can now select the text you want, which is inside the next table (following-sibling::table[1]). This is the context of the second table:
//table[.//td[text()='Código:']/following-sibling::td='206415']/following-sibling::table[1]
And this should select the text you want (Grupo profissional:) which is in the third row tr[3] and first cell/column td[1] of that table:
//table[.//td[text()='Código:']/following-sibling::td='206415']/following-sibling::table[1]//tr[3]/td[1]/text()
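The anchoring steps above can be sketched with Python's standard library on a trimmed copy of the sample. This assumes the two outer tables are siblings under one parent (a made-up `<div>` here), which the question does not confirm. `xml.etree.ElementTree` has no `following-sibling` axis, so that step is done by indexing the parent's children; `has_pair` is a hypothetical helper. The final `.//tr[3]/td[1]` step is the same as in the expression above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample; the wrapping <div> is an assumption.
DOC = """
<div>
  <table class="tabela">
    <tr><td class="formulario-legenda">Código:</td><td>206415</td></tr>
  </table>
  <table>
    <tr><td class="topo">
      <table>
        <tr><td class="formulario-legenda">Categoria:</td><td>Professor Associado</td></tr>
        <tr><td class="formulario-legenda">Carreira:</td><td>Pessoal Docente de Universidades</td></tr>
        <tr><td class="formulario-legenda">Grupo profissional:</td><td>Docente</td></tr>
      </table>
    </td></tr>
  </table>
</div>
"""


def has_pair(table, label, value):
    """True if some row of `table` starts with the cells `label`, `value`."""
    for row in table.iter("tr"):
        cells = [(c.text or "").strip() for c in row.findall("td")]
        if cells[:2] == [label, value]:
            return True
    return False


root = ET.fromstring(DOC)
tables = list(root)  # the outer tables, in document order

# Step 1: find the anchor table by its Código/206415 pair.
anchor_index = next(i for i, t in enumerate(tables) if has_pair(t, "Código:", "206415"))
# Step 2: take the table right after it (following-sibling::table[1]).
next_table = tables[anchor_index + 1]
# Step 3: third row, first cell, as in //tr[3]/td[1].
label = next_table.find(".//tr[3]/td[1]").text
value = next_table.find(".//tr[3]/td[2]").text
print(label, value)
```

Swapping `td[1]` for `td[2]` gives the value cell (Docente) instead of the label.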
I want to create an XPath for clicking on "Run" (4th column) based on the first column value (xyz). The XPath below doesn't work. Can you suggest a better way of writing it?
//table/tbody/tr/td[text()='xyz fix']/parent::tr/td[4]
<div id="main">
<table class="FixedLayout" width="1000px">
<tbody>
<tr></tr>
<tr>
<td class="RowHeight">
xyz
</td>
<td>xyz fix</td>
<td>1125</td>
<td>
Run
</td>
</tr>
<tr>
<td class="RowHeight">
abc
</td>
<td>abc fix</td>
<td>1125</td>
<td>
Run
</td>
</tr>
</tbody>
</table>
</div>
I don't see why yours didn't work. Please clarify what "doesn't work" means: NoSuchElementException? ElementNotVisibleException? Wrong XPath? Link not being clicked?
Meanwhile, try the following XPaths (the issue could be in your Selenium code rather than the XPath):
Here I assume you want the <a> link rather than the <td>, because you mentioned you want to click it.
Use an XPath predicate:
//*[@id='main']//table/tbody/tr[td[text()='xyz']]/td[4]/a
Use an XPath predicate with an attribute selector to avoid using an index:
//*[@id='main']//table/tbody/tr[td[text()='xyz']]//a[contains(@href, 'Instance/Create')]
Use .. to get the parent:
//*[@id='main']//table/tbody/tr/td[text()='xyz']/../td[4]/a
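One thing worth checking: in the sample as pasted, the first cell's text is surrounded by newlines and spaces, so an exact text()='xyz' comparison will not match it; in XPath 1.0 normalize-space() is the usual way around that. Here is a small standard-library sketch of the same row lookup with whitespace stripped. It is not Selenium code, `cell_for_row` is a hypothetical helper, and since the pasted sample has no `<a>` inside the cell it returns the `<td>` itself.

```python
import xml.etree.ElementTree as ET

TABLE = """
<div id="main">
  <table class="FixedLayout">
    <tbody>
      <tr></tr>
      <tr>
        <td class="RowHeight">
          xyz
        </td>
        <td>xyz fix</td>
        <td>1125</td>
        <td>
          Run
        </td>
      </tr>
      <tr>
        <td class="RowHeight">
          abc
        </td>
        <td>abc fix</td>
        <td>1125</td>
        <td>
          Run
        </td>
      </tr>
    </tbody>
  </table>
</div>
"""


def cell_for_row(root, first_cell_text, column):
    """Return the `column`-th <td> (1-based) of the row whose first cell matches."""
    for row in root.iter("tr"):
        cells = row.findall("td")
        # strip() plays the role of normalize-space() on the first cell.
        if cells and (cells[0].text or "").strip() == first_cell_text:
            return cells[column - 1]
    return None


root = ET.fromstring(TABLE)
run_cell = cell_for_row(root, "xyz", 4)
print(run_cell.text.strip())
```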