Making Link Description point to Link Reference in IMPORTXML statement - xpath

I am scraping FINVIZ financial data using the Google Sheets IMPORTXML function, specifically:
=IMPORTXML("http://finviz.com/quote.ashx?t="&B1,"//table[@id='news-table']/tr")
Against the following source:
<table width="100%" cellpadding="1" cellspacing="0" border="0" id="news-table" class="fullview-news-outer">
<tr>
<td width="130" align="right" style="white-space:nowrap">Feb-15-20 03:58PM </td>
<td align="left">U.S. Woman From Cruise Falls Ill as 2,200 Head Home: Virus Update <span style="color:#aa6dc0;font-size:9px">Bloomberg</span></td>
</tr>
<tr>
<td width="130" align="right">03:41PM </td>
<td align="left">Afraid of sky-high stock valuations? Consider this deep value strategy <span style="color:#aa6dc0;font-size:9px">MarketWatch</span></td>
</tr>
<tr>
<td width="130" align="right">10:47AM </td>
<td align="left">If you could buy only one stock for 5G and artificial intelligence, this would be it <span style="color:#aa6dc0;font-size:9px">MarketWatch</span></td>
</tr>
</table>
Hence, if I choose the stock ticker for Apple (AAPL), the XPath component of the IMPORTXML call above returns each news row's timestamp and headline as plain text.
Question: How do I modify the XPATH component to make the news description clickable to its HREF?
I DO NOT want the URL to replace the news - just the news to point to its destination such that clicking the news takes me to the link.

Try:
=ARRAYFORMULA({INDEX(IMPORTXML("http://finviz.com/quote.ashx?t="&B1,
"//table[@id='news-table']/tr"),,1),
HYPERLINK(IMPORTXML("http://finviz.com/quote.ashx?t="&B1,
"//table[@id='news-table']/tr//a/@href"),
INDEX(IMPORTXML("http://finviz.com/quote.ashx?t="&B1,
"//table[@id='news-table']/tr"),,2))})

Related

Xpath's //tr/td/a[text()="France"] shows 3 matches. How to get value of "x" result?

For example, take this piece of markup:
<tr>
<td width="270" class="news">
Ukraine
<span>17.06.15<span>
</td>
</tr>
<tr>
<td width="270" class="news">
France
<span>17.06.15<span>
</td>
</tr>
<tr>
<td width="270" class="news">
USA
<span>17.06.15<span>
</td>
</tr>
<tr>
<td width="270" class="news">
France
<span>10.10.15<span>
</td>
</tr>
<tr>
<td width="270" class="news">
Germany
<span>17.06.15<span>
</td>
</tr>
<tr>
<td width="270" class="news">
France
<span>23.12.15<span>
</td>
</tr>
In the Chrome dev panel, the XPath //tr/td/a[text()="France"] shows me 3 results.
I can't find a way to get the value I need.
For example, how do I get the date of the 2nd result (10.10.15)?
I tried different approaches with position, such as //tr/td/a[text()="France"][position=2]/span/text() and //tr/td/a[position(text()="France")][2]/span/text(), but with no success. Maybe "position" is not what I need?
EDITED:
The positions of "France" in the code differ from page to page. In the example above the positions are 2, 4, 6, but on another page they can be 3, 10, 15, 25, 27. How do I get the 2nd result and the last one, regardless of where "France" is actually located?
This is one possible way:
(//tr/td[a='France'])[2]/span/text()
The parentheses before the position index are required because the predicate ([]) has higher precedence (priority) than the path operators (// and /) *. Without them, the expression //tr/td[a='France'][2]/span/text(), for example, looks for the 2nd matching td within a single tr parent, which doesn't exist in the HTML sample you posted.
*: for further explanation on this matter: How to select specified node within Xpath node sets by index with Selenium?
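To make the "index over the whole result set" idea concrete, here is a minimal Python sketch using only the standard library. It assumes a well-formed version of the sample in which each country name sits inside an a element (the SAMPLE string is hypothetical, reconstructed for illustration). ElementTree supports only a subset of XPath and cannot evaluate the parenthesised form directly, so the sketch collects every match first and indexes the resulting list, which plays the same role as (...)[2] and (...)[last()].

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed version of the rows from the question.
# The <a> wrappers and closed <span> tags are assumptions.
SAMPLE = """
<table>
  <tr><td class="news"><a>Ukraine</a><span>17.06.15</span></td></tr>
  <tr><td class="news"><a>France</a><span>17.06.15</span></td></tr>
  <tr><td class="news"><a>USA</a><span>17.06.15</span></td></tr>
  <tr><td class="news"><a>France</a><span>10.10.15</span></td></tr>
  <tr><td class="news"><a>Germany</a><span>17.06.15</span></td></tr>
  <tr><td class="news"><a>France</a><span>23.12.15</span></td></tr>
</table>
""".strip()

root = ET.fromstring(SAMPLE)

# Every td with a child <a> whose text is 'France', in document order,
# no matter which row it sits in.
france_cells = root.findall(".//tr/td[a='France']")

# Indexing the whole list mirrors (//tr/td[a='France'])[2] and [last()].
second_date = france_cells[1].find("span").text
last_date = france_cells[-1].find("span").text

print(second_date)
print(last_date)
```

Because the list is built from all matches before indexing, it does not matter whether "France" appears in rows 2, 4, 6 or in rows 3, 10, 15, 25, 27.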
Your 'xml' was invalid so first I corrected it (I know it isn't valid HTML but good enough to test):
<?xml version="1.0" encoding="utf-8"?>
<form>
<tr>
<td width="270" class="news">
France
<span>17.06.15</span>
</td>
</tr>
<tr>
<td width="270" class="news">
France
<span>10.10.15</span>
</td>
</tr>
<tr>
<td width="270" class="news">
France
<span>23.12.15</span>
</td>
</tr>
</form>
Then, this worked:
//tr[position()=2]/td/a[text()="France"]/../span/text()

Listing job offers (schema.org’s JobPosting)

I have a page with a list of job offers, and every job in the list is a link to the page with that job offer.
And I have a problem with Microdata, and my question is, which variant is better?
First variant:
<table itemscope itemtype="http://schema.org/JobPosting">
<tr>
<td itemprop="title" itemtype="http://schema.org/JobPosting" itemscope>job 1</td>
</tr>
<tr>
<td itemprop="title" itemtype="http://schema.org/JobPosting" itemscope>job 2</td>
</tr>
<tr>
<td itemprop="title" itemtype="http://schema.org/JobPosting" itemscope>job 3</td>
</tr>
</table>
Second variant:
<table>
<tr itemscope itemtype="http://schema.org/JobPosting">
<td itemprop="title"><a href..>job 1</a></td>
</tr>
<tr itemscope itemtype="http://schema.org/JobPosting">
<td itemprop="title"><a href..>job 2</a></td>
</tr>
<tr itemscope itemtype="http://schema.org/JobPosting">
<td itemprop="title"><a href..>job 3</a></td>
</tr>
</table>
Your first variant means: There is a JobPosting which has three titles. Each of these titles consists of another JobPosting.
Your second variant means: There are three JobPostings, each one has a title.
So you want to go with your second variant.
Note that you have an error on your current page. Instead of the example contained in your question, on your page you use itemprop="title" on the a element. But then the href value is the title, not the anchor text.
So instead of
<td>
<a itemprop="title" href="…" title="…">…</a>
</td>
<!-- the value of 'href' is the JobPosting title -->
you should use
<td itemprop="title">
<a class="list1" href="…" title="…">…</a>
</td>
<!-- the value of 'a' is the JobPosting title -->
And why not use the url property here?
<td itemprop="title">
<a itemprop="url" href="…" title="…">…</a>
</td>
The second one. The first one describes the table itself as a JobPosting, which it isn't.

Outlook 2007 and 2010 table cell formatting lost with long content

The set up:
<table width="600" >
<tr>
<td width="400" rowspan="2" valign="top">
With very long content here*
</td>
<td width="200" valign="top">
Top-aligned content
</td>
</tr>
<tr>
<td valign="bottom">
*Bottom-aligned content loses vertical alignment
and appears as if valign="middle"
</td>
</tr>
</table>
Example code is in jsfiddle as it is too long (lots of content needed to trigger the bug).
So see these:
http://jsfiddle.net/webhelpla/XZyg2/ sent as an email looks OK
http://jsfiddle.net/webhelpla/XZyg2/1/ sent as an email: bottom-aligned content is not bottom-aligned anymore.
Any ideas and experience with workarounds for this?
Try adding vertical-align:bottom; as well:
<td valign="bottom" style='vertical-align:bottom;' >
*Bottom-aligned content loses vertical alignment
and appears as if valign="middle"
</td>
Try this fiddle. I removed the rowspan=2 from the td; always use cellpadding in place of that.
Add table-layout:fixed to the CSS for the table and see if that helps. Coder1984 is correct in adding the style attribute inside the td, since some email clients do better with that.
At any rate, it is very hard to predict how HTML email will render in various clients. I use Email on Acid to check rendering in a wide range of clients, from webmail through email clients to mobile.
Outlook has an issue with content of blocks over a certain size (2300px if I remember correctly). You may be able to avoid the issue with a third cell in your right-hand row:
<table width="600" >
<tr>
<td width="400" rowspan="3" valign="top">
With very long content here*
</td>
<!-- Add minimal heights to force the middle row to take the space -->
<td width="200" valign="top" height="1">
Top-aligned content
</td>
</tr>
<tr><td style="page-break:always"><!-- Let's make a page break *here* --></td></tr>
<tr>
<td valign="bottom" height="1">
*Bottom-aligned content loses vertical alignment
and appears as if valign="middle"
</td>
</tr>
</table>

Need query for XPath that finds all <tr> elements that contain 7 <td> elements

Hello and hopefully thanks for the help.
Honestly I am not very experienced at XPath and I am hoping a guru out there will have a quick answer for me.
I am scraping a web page for data. The defining aspect of the data I want is that it is contained in a row <tr> that has 7 <td> elements. Each <td> element has one of the pieces of data I need to import. I am using the HTML Agility Pack on CodePlex to grab the data, but I can't seem to figure out how to define the query.
Contained in the web page is a section like this:
<table border="0" cellpadding="3" cellspacing="1" width="100%">
<tr class="bgWhite" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
<td class="dataHdrText02" valign="top" width="50" align="center"><nobr>SYMBOL</nobr></td>
<td class="dataHdrText02" valign="top" align="center">PERIOD</td>
<td class="dataHdrText02" valign="top" align="center" width="*">EVENT TITLE</td>
<td class="dataHdrText02" valign="top" align="center">EPS ESTIMATE</td>
<td class="dataHdrText02" valign="top" align="center">EPS ACTUAL</td>
<td class="dataHdrText02" valign="top" align="center">PREV. YEAR ACTUAL</td>
<td class="dataHdrText02" valign="top" align="center"><nobr>DATE/TIME (ET)</nobr></td>
</tr>
<tr class="bgWhite">
<td align="center" width="50"><nobr>CSCO </nobr></td>
<td align="center">Q4 2011</td>
<td align="left" width="*">Q4 2011 CISCO Systems Inc Earnings Release</td>
<td align="center">$ 0.38 </td>
<td align="center">n/a </td>
<td align="center">$ 0.43 </td>
<td align="center"><nobr>10-Aug-11</nobr></td>
</tr>
<tr class="bgWhite">
<td align="center" width="50"><nobr>CSCO  </nobr></td>
<td align="center">Q3 2011</td>
<td align="left" width="*">Q3 2011 Cisco Systems Earnings Release</td>
<td align="center">$ 0.37 </td>
<td align="center">$ 0.42 </td>
<td align="center">$ 0.42 </td>
<td align="center"><nobr>11-May-11 AMC</nobr></td>
</tr>
<tr class="bgWhite" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
<td align="center" colspan="7"><img src="/format/cb/images/spacer.gif" width="1" height="4"></td>
</tr>
</table>
My goal is to grab the earnings event data and place it into a database for analysis. My original thought was to grab all <tr> elements with 7 <td> elements then work with that data. Any advice or alternative suggestions would be welcome.
This should do it for you.
//tr[count(td)=7]
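As a rough illustration of what that expression selects, here is a Python sketch using the standard library's ElementTree. This is not the HTML Agility Pack API; ElementTree has no count() function, so the same filter is written as a list comprehension. The SAMPLE string is a simplified, hypothetical cut of the table from the question.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical version of the earnings table (attributes dropped).
SAMPLE = """
<table>
  <tr><td>SYMBOL</td><td>PERIOD</td><td>EVENT TITLE</td><td>EPS ESTIMATE</td>
      <td>EPS ACTUAL</td><td>PREV. YEAR ACTUAL</td><td>DATE/TIME (ET)</td></tr>
  <tr><td>CSCO</td><td>Q4 2011</td><td>Q4 2011 CISCO Systems Inc Earnings Release</td>
      <td>$ 0.38</td><td>n/a</td><td>$ 0.43</td><td>10-Aug-11</td></tr>
  <tr><td>CSCO</td><td>Q3 2011</td><td>Q3 2011 Cisco Systems Earnings Release</td>
      <td>$ 0.37</td><td>$ 0.42</td><td>$ 0.42</td><td>11-May-11 AMC</td></tr>
  <tr><td colspan="7"></td></tr>
</table>
""".strip()

root = ET.fromstring(SAMPLE)

# Same idea as //tr[count(td)=7]: keep rows with exactly seven direct td children.
rows = [tr for tr in root.iter("tr") if len(tr.findall("td")) == 7]

# Flatten each kept row into a list of trimmed cell strings.
cells = [[(td.text or "").strip() for td in tr.findall("td")] for tr in rows]

for row in cells:
    print(row)
```

Note that the header row also has seven td cells, so it is selected along with the data rows, while the spacer row (a single td with colspan="7") is not. If only the data is wanted, the first selected row can be skipped.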

Finding table values in watij using xpath

I am using watij to automate my UI testing. I have many tables in a web page. I need to find the table which has a width of 95%. It contains many rows. I have to find each row by its text, say "running first UI test on local" as below, and need to get the td value "Complete". I am not able to get the value; I get the watij address instead. Let me know how I can find this.
<table width=95%>
<tr>
<th align="left">
<span id="lblHeaderComponent" style="font-size:10pt;font-weight:bold;">Component</span>
</th>
<th align="left">
<span id="lblHeaderServer" style="font-size:10pt;font-weight:bold;">Server</span>
</th>
<th align="left">
<span id="lblHeaderStatus" style="font-size:10pt;font-weight:bold;">
</span>
</th>
</tr>
<tr>
<td align="left"
nowrap="nowrap" style="font-size:12px;">running first UI test on local</td>
<td align="left" style="font-size:12px;">Google</td>
<td align="left" style="font-size:12px;">
<a style='color:#336600;'>Complete</a>
</td>
</tr>
<tr>
<td align="left"
style="border-top:1px solid #cfcfcf;border-bottom:1px solid #cfcfcf;"
colspan="3"
style="font-size:12px; color:#ff3300;">
</td>
</tr>
<tr>
<td align="left" nowrap="nowrap" style="font-size:12px;">running second UI test on local</td>
<td align="left" style="font-size:12px;">Google</td>
<td align="left" style="font-size:12px;">
<a style='color:#336600;'>Complete</a>
</td>
</tr>
</table>
You can try an xpath visualizer like this one to assist you in getting the right expression. It lets you see the results visually.
Using XPath on HTML assumes the HTML is XHTML - in other words it must be well-formed XML.
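For a sense of what such an expression could look like once the markup is well-formed, here is a minimal Python sketch with the standard library's ElementTree. It does not use watij's API. The SAMPLE string is a hypothetical, cleaned-up version of the table (quoted width attribute, a body wrapper), and status_for is an illustrative helper name, not something from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed version of the status table; width is quoted here.
SAMPLE = """
<body>
  <table width="95%">
    <tr><th>Component</th><th>Server</th><th>Status</th></tr>
    <tr><td>running first UI test on local</td><td>Google</td>
        <td><a>Complete</a></td></tr>
    <tr><td>running second UI test on local</td><td>Google</td>
        <td><a>Complete</a></td></tr>
  </table>
</body>
""".strip()

root = ET.fromstring(SAMPLE)


def status_for(label):
    """Return the link text from the row containing a cell equal to label."""
    # Table with width 95% -> row that has a td with this exact text -> its link.
    path = f".//table[@width='95%']/tr[td='{label}']/td/a"
    link = root.find(path)
    return link.text.strip() if link is not None else None


first_status = status_for("running first UI test on local")
missing_status = status_for("no such test")

print(first_status)
print(missing_status)
```

The path selects the element and the code then reads its text, which is the difference between getting an element reference and getting the value "Complete". The label is interpolated directly into the path, so this sketch would need escaping for labels that contain an apostrophe.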
