Google sheet importxml xpath query - xpath

I'm trying to get the "Next Earnings Announcement" value from http://www.bloomberg.com/quote/1880:HK.
I have tried
=ImportXml( "http://www.bloomberg.com/quote/1880:HK", "//span[@class='company_stat']" )
=ImportXml( "http://www.bloomberg.com/quote/1880:HK", "/html/body/div[2]/div[1]/div[1]/div[2]/div[2]/div[2]/div[1]/div[1]/div[3]/table/tbody/tr[16]/td/text()" )
Both return #N/A; I want 10/27/2014 as the result.

Instead of your fragile, horribly complicated XPath expression, try a useful one:
//th[normalize-space() = 'Next Earnings Announcement']/following-sibling::td

Related

XPath to return value only elements containing the text

I would like to return the values of 'Earnings per share' (i.e. -7.3009, -7.1454, -19.6295, -1.6316)
from "http://www.aastocks.com/en/stocks/analysis/company-fundamental/earnings-summary?symbol=01801"
using the formula below as an example for '-7.3009':
=importxml("http://www.aastocks.com/en/stocks/analysis/company-fundamental/earnings-summary?symbol=01801", "//tr/td[contains(text(),'Earnings')]/td[2]")
However, it returns #N/A.
Can someone help?
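This XPath will return the data you want:
id("cnhk-list")//tr[td[contains(., "Earnings Per Share")]]/td[starts-with(@class, "cfvalue")]//text()
In plain English: in the table with the ID cnhk-list, select the row in which a cell contains "Earnings Per Share", then take the text of that row's td cells whose class starts with "cfvalue". Your original expression returns nothing because td[...]/td[2] looks for a td nested inside another td, which never exists.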
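To make the selection logic concrete, here is a minimal Python sketch of the same steps. It is not the ImportXML formula itself: the standard library's ElementTree does not support contains() or starts-with(), so the two conditions are written as plain Python. The TABLE markup, the txt_r class and the row_values helper are hypothetical stand-ins for the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the table with id "cnhk-list".
TABLE = """
<table id="cnhk-list">
  <tr><td>Closing Date</td><td class="cfvalue txt_r">2013/12</td></tr>
  <tr><td>Earnings Per Share</td><td class="cfvalue txt_r">-7.3009</td><td class="cfvalue txt_r">-7.1454</td></tr>
</table>
""".strip()


def cell_text(cell):
    """Return all text inside a cell, like '.' does in contains(., ...)."""
    return "".join(cell.itertext()).strip()


def row_values(table, label):
    """Values of the first row that has a cell containing `label`."""
    for row in table.iter("tr"):
        cells = row.findall("td")
        # tr[td[contains(., label)]]
        if any(label in cell_text(cell) for cell in cells):
            # td[starts-with(@class, "cfvalue")]
            return [cell_text(cell) for cell in cells
                    if cell.get("class", "").startswith("cfvalue")]
    return []


table = ET.fromstring(TABLE)
values = row_values(table, "Earnings Per Share")
print(values)
```

Note that the value cells are siblings of the label cell inside the same row, which is why the row is selected first and the cells second.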

how to scrape popups on the website with scrapy

I want to scrape the name, age and gender of the reviewers on boots.com. The age and gender are only visible when you hover the mouse over the name in each review. First, I wrote the code for scraping the name, but it's not working. Second, I don't know how to scrape the age and gender from the popup. Could you help me, please? Thanks in advance.
Link:https://www.boots.com/clearasil-ultra-rapid-action-treatment-cream-25ml-10084703
import scrapy
from ..items import BootsItem
from scrapy.loader import ItemLoader
class bootsSpider(scrapy.Spider):
    name = 'boots'
    start_urls = ['https://www.boots.com/clearasil-ultra-rapid-action-treatment-cream-25ml-10084703']
    allowed_domains = ["boots.com"]
    def parse(self, response):
        reviews = response.xpath("//div[@class='bv-content-item-avatar-offset bv-content-item-avatar-offset-off']")
        for review in reviews:
            loader = ItemLoader(item=BootsItem(), selector=review, response=response)
            loader.add_xpath("name", ".//div[@class='bv-content-reference-data bv-content-author-name']/span/text()")
            yield loader.load_item()
JavaScript is used to display the data (78 reviews in your case), so you should use Selenium to scrape it. To display all the comments, you'll have to click the following button multiple times:
//button[contains(@class,"load-more")]
Then, to scrape the names of all consumers, you can use the following XPath (then use the .text property to extract the data):
//li//div[@class="bv-content-header-meta"][./span[@class="bv-content-rating bv-rating-ratio"]]//span[@class="bv-author"]/*/span
Output : 78 nodes
If you want to scrape the review texts, you can use:
//li//div[@class="bv-content-header-meta"][./span[@class="bv-content-rating bv-rating-ratio"]]/following::p[1]
Output : 78 nodes
To get the age and gender of each consumer, you'll have to mouse over their names (see the preceding XPath), then fetch the values with the following XPaths:
//span[@class="bv-author-userinfo-value"][preceding-sibling::span[@class="bv-author-userinfo-data"][.="Age"]]
//span[@class="bv-author-userinfo-value"][preceding-sibling::span[@class="bv-author-userinfo-data"][.="Gender"]]
Alternatively, if you don't want to or can't use Selenium, you can download the JSON (see the XHR requests in your browser), which contains everything you need.
https://api.bazaarvoice.com/data/batch.json?passkey=324y3dv5t1xqv8kal1wzrvxig&apiversion=5.5&displaycode=2111-en_gb&resource.q0=reviews&filter.q0=isratingsonly:eq:false&filter.q0=productid:eq:868029&filter.q0=contentlocale:eq:en_EU,en_GB,en_IE,en_US,en_CA&sort.q0=submissiontime:desc&stats.q0=reviews&filteredstats.q0=reviews&include.q0=authors,products,comments&filter_reviews.q0=contentlocale:eq:en_EU,en_GB,en_IE,en_US,en_CA&filter_reviewcomments.q0=contentlocale:eq:en_EU,en_GB,en_IE,en_US,en_CA&filter_comments.q0=contentlocale:eq:en_EU,en_GB,en_IE,en_US,en_CA&limit.q0=100&offset.q0=0&limit_comments.q0=3&callback=bv_1111_50671
In this case, I set limit.q0 to 100 and offset.q0 to 0 to be sure to fetch all the data. Once you get the JSON, you'll find all the information in: Batched Results>q0>Results>0,1,2,3,...,78
To download the JSON, use requests and extract the data with the json module.
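A minimal sketch of the extraction step, assuming the response is JSONP (the URL ends in callback=bv_1111_50671, so the JSON is wrapped in a function call). The key names BatchedResults, UserNickname and ContextDataValues are assumptions about the payload, and SAMPLE is a hypothetical stand-in used instead of a live request; check them against the real response before relying on this.

```python
import json
import re


def strip_jsonp(text):
    """Return the JSON body of a JSONP response such as 'callback({...})'."""
    match = re.match(r"^\s*[\w.$]+\s*\((.*)\)\s*;?\s*$", text, re.DOTALL)
    return match.group(1) if match else text


def extract_reviewers(jsonp_text):
    """Collect name, age and gender for every review in the batch."""
    data = json.loads(strip_jsonp(jsonp_text))
    # Assumed layout: BatchedResults > q0 > Results > [review, review, ...]
    results = data["BatchedResults"]["q0"]["Results"]
    reviewers = []
    for review in results:
        context = review.get("ContextDataValues", {})
        reviewers.append({
            "name": review.get("UserNickname"),
            "age": context.get("Age", {}).get("Value"),
            "gender": context.get("Gender", {}).get("Value"),
        })
    return reviewers


# Hypothetical sample shaped like the assumed response (not real data).
SAMPLE = (
    'bv_1111_50671({"BatchedResults": {"q0": {"Results": ['
    '{"UserNickname": "Anna", "ContextDataValues": '
    '{"Age": {"Value": "25to34"}, "Gender": {"Value": "Female"}}}, '
    '{"UserNickname": "Sam", "ContextDataValues": {}}'
    ']}}})'
)

reviewers = extract_reviewers(SAMPLE)
print(reviewers)
```

With a live request, the text passed to extract_reviewers would be the body returned for the URL above, for example the .text of a requests response.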

Splitting XPATH produces more results than the actual possible

I have been trying to gather some historical data on managers of football clubs and noticed some odd behaviour. I am trying to scrape the history table of the clubs managed by a manager from this website: https://www.transfermarkt.co.in/carlo-ancelotti/profil/trainer/523
With the entire XPath passed as a single expression, the code works as expected:
clubs = response.xpath("//div[@id='yw1']//td[@class='hauptlink no-border-links']//a/text()").extract()
print(clubs)
Output : ['Everton', 'SSC Napoli', 'Bayern Munich ', 'Real Madrid', 'Paris SG',\
'Chelsea', 'Milan', 'Juventus', 'AC Parma', 'Reggiana', 'Italy']
That's the list of clubs from the history table mentioned above. However, when the XPath is split as shown in the following code, it also fetches club names from the other table, even though that table has a different div id (not 'yw1'):
career_table = response.xpath("//div[@id='yw1']")
clubs = career_table.xpath("//td[@class='hauptlink no-border-links']//a/text()").extract()
print(clubs)
Output : ['Everton', 'SSC Napoli', 'Bayern Munich ', 'Real Madrid', 'Paris SG',\
'Chelsea', 'Milan', 'Juventus', 'AC Parma', 'Reggiana', 'Italy', 'Milan', 'Retired',\
'AS Roma', 'Milan', 'AC Parma', 'AS Roma', 'Parma U19', 'AC Parma', 'Reggiolo', 'Parma U19']
Can someone enlighten me, what is that I'm missing here?
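You need to use a relative XPath (starting with a dot). In Scrapy, a path that begins with // is evaluated from the document root even when it is called on a nested selector, so it ignores career_table:
clubs = career_table.xpath(".//td[@class='hauptlink no-border-links']//a/text()").extract()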
print(clubs)
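The difference between the two starting points can be sketched with the standard library's ElementTree. This is only an illustration: ElementTree refuses absolute paths on an element, so a search from the root stands in for what a leading // does in Scrapy. The PAGE markup and the id yw2 are hypothetical, heavily simplified stand-ins for the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with two tables; only the first is inside div#yw1.
PAGE = """
<html><body>
  <div id="yw1"><table><tr>
    <td class="hauptlink no-border-links"><a>Everton</a></td>
    <td class="hauptlink no-border-links"><a>SSC Napoli</a></td>
  </tr></table></div>
  <div id="yw2"><table><tr>
    <td class="hauptlink no-border-links"><a>AS Roma</a></td>
  </tr></table></div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)
career_table = root.find(".//div[@id='yw1']")
cell_path = ".//td[@class='hauptlink no-border-links']/a"

# Searching from the document root: what an absolute '//' path amounts to.
from_root = [a.text for a in root.findall(cell_path)]

# Searching from the selected div: what the relative './/' path does.
from_table = [a.text for a in career_table.findall(cell_path)]

print(from_root)
print(from_table)
```

Only the second search is limited to the descendants of the selected div, which matches the behaviour of the relative path in the answer.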

YouTube Analytics API doesn't return ROWS for query for a specific video

I am using the YouTube Analytics API to get analytics for specific queries.
For channel queries it works great. Problems occur when I execute a query for a specific video, like this ($startDate is defined):
/*****************************
* deviceType-stats
*****************************/
$optparams = array('dimensions' => 'deviceType',
"filters" => "video==" . $videoId,
);
$currentDate = date("Y-m-d", time());
$resDeviceTypes = $youtubeService->analytics->reports->query(
"channel==" . $videoendorsement->ytchannelid
,$startDate
,$currentDate
,"views,estimatedMinutesWatched"
,$optparams);
I don't get any result ROWS for this query, but only for some video IDs. For other video IDs it works. By the way, the videos for which I don't get any result ROWS have only been online on YouTube for 2 days. Do I have to wait a little longer before I get a result from the API?
There is also a general problem getting demographics and geography for any video. No matter which video ID I use, I don't get any result ROWS for demographic and geographic stats.
Does anybody know where the problem is?
When no results match the specified criteria, no rows are returned, for example if you query a range of dates before the video was published.

Google spreadsheet xpath only selects arrays instead of single values

I am trying to use this bit of code to select a single show from a show database. I only want the latest show, but Google Spreadsheet keeps returning an array with the latest show from every season.
=importXML("http://services.tvrage.com/feeds/episode_list.php?sid="&B2, "//episode[number(translate(airdate, '-', '')) < "&I2&"][last()]")
B2 is the show ID (11215) and I2 is today's date in an ISO-style format (20130626). The date is generated in Google Spreadsheet with the formula =TEXT( TODAY() ; "yyyyMMdd" )
So can anyone help me get just the latest show for the current season?
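Put parentheses around the first part of your XPath expression, so that [last()] applies to the whole result set instead of to the episodes of each parent element: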
(//episode[number(translate(airdate, '-', '')) < "&I2&"])[last()]
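A small sketch of why the unparenthesized form returns one episode per season, using the standard library's ElementTree. ElementTree cannot evaluate a parenthesized expression, so taking the last item of the full result list stands in for (//episode)[last()]. The FEED markup is a hypothetical, simplified stand-in for the real feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: episodes are grouped under one Season element each.
FEED = """
<Show><Episodelist>
  <Season no="1">
    <episode><title>S1 first</title></episode>
    <episode><title>S1 last</title></episode>
  </Season>
  <Season no="2">
    <episode><title>S2 first</title></episode>
    <episode><title>S2 last</title></episode>
  </Season>
</Episodelist></Show>
""".strip()

root = ET.fromstring(FEED)

# [last()] is evaluated per parent, so every Season contributes one episode.
per_season = [e.findtext("title") for e in root.findall(".//episode[last()]")]

# The last item of the complete result list mirrors (//episode)[last()].
all_episodes = root.findall(".//episode")
latest = all_episodes[-1].findtext("title")

print(per_season)
print(latest)
```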
