Extract table values using XPath and Yahoo Pipes?

I'd like to track the appearance of new values in a table via an RSS feed: specifically, new competitions at http://www.kaggle.com/competitions
So I registered for Yahoo Pipes and used the Firefox XPath Checker to find the XPath:
id('competitions-table')/tbody/tr/td[1]/div/a/h4
and used it in the Pipes XPath Fetch module. I expected a list of competition names, but I get zero results :/
Am I doing something wrong? Any other suggestions for accomplishing this?

Try this one: //table[@id='competitions-table']//tr//h4
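A likely reason the path from XPath Checker matches nothing is that browsers insert a tbody element into every table in the DOM they display, even when the HTML the server sends has none, so a fetcher that parses the raw HTML never sees /tbody. The sketch below reproduces that with Python's standard xml.etree.ElementTree on a hypothetical, simplified version of the table (the real Kaggle markup is not shown here); ElementTree has no id() function, so both paths start from the table's id attribute instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup as a server might send it: note there is no <tbody>.
SOURCE = """
<html><body>
  <table id="competitions-table">
    <tr><td><div><a href="/c/one"><h4>Competition One</h4></a></div></td></tr>
    <tr><td><div><a href="/c/two"><h4>Competition Two</h4></a></div></td></tr>
  </table>
</body></html>
"""

root = ET.fromstring(SOURCE.strip())

# Path as copied from the browser DOM, which contains an inserted <tbody>.
browser_path = ".//table[@id='competitions-table']/tbody/tr/td[1]/div/a/h4"
# Path that uses // to skip intermediate elements instead of naming each one.
relaxed_path = ".//table[@id='competitions-table']//tr//h4"

browser_hits = [h4.text for h4 in root.findall(browser_path)]
relaxed_hits = [h4.text for h4 in root.findall(relaxed_path)]

print(browser_hits)  # []
print(relaxed_hits)  # ['Competition One', 'Competition Two']
```

The relaxed path trades precision for robustness: it still anchors on the table id, but no longer depends on every intermediate element the browser happens to show.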

Related

Getting an XPath from an XML document

I am trying to get some values from an online XML document, but I cannot find the right XPath to navigate to them. I want to import these values into a Google Sheets document, which requires the exact XPath.
The website is this one, and I am trying to get the "WillPay" values from MeetingInfo Venue=S1, Races RaceNo=1, Pools PoolInfo Pool=WIN, in OddsInfo.
For now, the value for "Number=1" should be 3350 (or something close to it; it changes quite often), and I would like to load all of these values into the Google Sheets document.
I tried locating the path to each element, and my best attempt was
"/AOSBS_XML/Meetings/MeetingInfo/Races/Pools/PoolInfo/OddsSet/OddsInfo/@WillPay"
but it doesn't work.
I've been stuck on this problem for months and have been avoiding it, but I've realised I can't anymore because it's hindering my work. Please help.
Try using this XPath expression:
//MeetingInfo[@Venue="S1"]/Races//RaceInfo[@RaceNo="1"]//Pools//PoolInfo[@Pool="WIN"]//OddsSet//OddsInfo[@Number="1"]/@WillPay
An alternative (this selects the OddsInfo elements that have a WillPay attribute; append /@WillPay to get the values themselves):
//OddsInfo[@WillPay][ancestor::PoolInfo[@Pool='WIN'] and ancestor::RaceInfo[@RaceNo='1'] and ancestor::MeetingInfo[@Venue='S1']]
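The same selection can be checked offline with Python's standard xml.etree.ElementTree. The feed below is a hypothetical reconstruction of the nesting implied by the question and the first answer (AOSBS_XML, Meetings, MeetingInfo, Races, RaceInfo, Pools, PoolInfo, OddsSet, OddsInfo); apart from 3350, the WillPay numbers are placeholders. It also suggests a probable cause of the original failure: the attempted path goes straight from Races to Pools and skips the RaceInfo level. ElementTree supports only a subset of XPath, with no ancestor:: axis and no trailing /@WillPay step, so the sketch selects the OddsInfo elements and reads the attribute with .get().

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the nesting and the 3350 value come from the question.
FEED = """<AOSBS_XML>
  <Meetings>
    <MeetingInfo Venue="S1">
      <Races>
        <RaceInfo RaceNo="1">
          <Pools>
            <PoolInfo Pool="WIN">
              <OddsSet>
                <OddsInfo Number="1" WillPay="3350"/>
                <OddsInfo Number="2" WillPay="1200"/>
              </OddsSet>
            </PoolInfo>
          </Pools>
        </RaceInfo>
      </Races>
    </MeetingInfo>
  </Meetings>
</AOSBS_XML>"""

root = ET.fromstring(FEED)

# Element path with [@attr='value'] filters; note the RaceInfo step under Races.
path = (
    ".//MeetingInfo[@Venue='S1']/Races//RaceInfo[@RaceNo='1']"
    "//Pools//PoolInfo[@Pool='WIN']//OddsSet//OddsInfo"
)

# Map each runner Number to its WillPay attribute.
will_pay = {odds.get("Number"): odds.get("WillPay") for odds in root.findall(path)}

print(will_pay)       # {'1': '3350', '2': '1200'}
print(will_pay["1"])  # 3350
```

Collecting every OddsInfo rather than only Number="1" matches the goal of loading all of the values; in Google Sheets the equivalent is to drop the [@Number="1"] filter from the first expression.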

Problem finding the correct selector for response.xpath or response.css in Scrapy on CoinMarketCap

I'd like to loop over the top 20 exchanges on CoinMarketCap to crawl their tables, e.g. https://coinmarketcap.com/exchanges/fatbtc/
I've now spent a few hours trying to find the selector, e.g. for the price.
In the Scrapy shell I tried the following (and many more), but none of them work:
From the XPath Helper add-on:
response.xpath('/html/body/div[@id='__next']/div[@class='cmc-app-wrapper.cmc-app-wrapper--env-prod.sc-1mezg3x-0.fUoBLh']/div[@class='container.cmc-main-section']/div[@class='cmc-main-section__content']/div[@class='cmc-exchanges.sc-1tluhf0-0.wNRWa']/div[@class='cmc-details-panel-table.sc-3klef5-0.cSzKTI']/div[@class='cmc-markets-listing.lgsxp9-0.eCrwnv']/div[@class='cmc-table.sc-1yv6u5n-0.dNLqEp']/div[@class='cmc-table__table-wrapper-outer']/div/table/tbody/tr[@class='cmc-table-row.sc-1ebpa92-0.kQmhAn'][1]/td[@class='cmc-table__cell.cmc-table__cell--sortable.cmc-table__cell--right.cmc-table__cell--sort-by__price']').getall()
From the Chrome Inspector:
response.xpath('/td[@class='cmc-table__cell.cmc-table__cell--sortable.cmc-table__cell--right.cmc-table__cell--sort-by__price']').getall()
From the Chrome Inspector's "Copy XPath":
response.xpath('//*[@id="__next"]/div/div[2]/div[1]/div[2]/div[2]/div/div[2]/div[3]/div/table/tbody/tr[1]/td[5]').extract()
I'm using the Chrome Inspector and, since today, an add-on called "XPath Helper" to show the selectors, but I still don't really understand what I'm doing there :(. I'd really appreciate any ideas on how to access that data, and any help understanding how to find these selectors.
Pretty easy (I used position() to skip the table header):
for row in response.xpath('//table[@id="exchange-markets"]//tr[position() > 1]'):
    price = row.xpath('.//span[@class="price"]/text()').get()
    # price = row.xpath('.//span[@class="price"]/@data-usd').get()  # if you need more precision
XPath expressions basically take the form //tagname[@attribute='value'], matching HTML elements by tag name and attribute value.
For your site, you can loop over the names with //table[@id='exchange-markets']//tr/td[2]/a
and get the prices with //table[@id='exchange-markets']//tr/td[5]
where we are basically saying to look in the fifth cell (column 5) of each table row.
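Positional predicates are 1-based, so td[2] is the second cell of a row and td[5] the fifth. The sketch below checks the two column paths with Python's standard xml.etree.ElementTree on a hypothetical, simplified markets table (the live page's markup differs and changes over time). Because the header row in this sample uses th cells, it has no td[2] or td[5] and drops out on its own, which is an alternative to the position() > 1 filter when the header really is built from th.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: a header row of <th> cells, then data rows of <td>.
PAGE = """<html><body>
<table id="exchange-markets">
  <tr><th>Rank</th><th>Currency</th><th>Pair</th><th>Volume</th><th>Price</th></tr>
  <tr><td>1</td><td><a href="/c/a">CoinA</a></td><td>A/BTC</td><td>100</td><td>1.50</td></tr>
  <tr><td>2</td><td><a href="/c/b">CoinB</a></td><td>B/BTC</td><td>200</td><td>2.25</td></tr>
</table>
</body></html>"""

root = ET.fromstring(PAGE)

# td[2]/a: the link in the second cell of each row (the name column).
names = [a.text for a in root.findall(".//table[@id='exchange-markets']//tr/td[2]/a")]
# td[5]: the fifth cell of each row (the price column).
prices = [td.text for td in root.findall(".//table[@id='exchange-markets']//tr/td[5]")]

print(list(zip(names, prices)))  # [('CoinA', '1.50'), ('CoinB', '2.25')]
```

In Scrapy the same expressions go into response.xpath(...), with /text() appended to get the strings rather than the elements.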

Confused about XPath Syntax

Problem Summary:
I'm trying to learn the Scrapy framework for Python (available at https://scrapy.org). I'm following a tutorial I found here: https://www.scrapehero.com/scrape-alibaba-using-scrapy/, but I wanted to practice on a different site rather than just copy them on Alibaba. My goal is to get game data from https://www.mlb.com/scores.
So I need to use XPath to tell the spider which parts of the HTML to scrape (I'm about halfway down the tutorial page on the ScrapeHero site, at the "Construct Xpath selectors for the product list" section). The problem is that I'm having a hell of a time figuring out what the syntax should actually be to get the pieces I want. I've been going over XPath examples all morning, but I haven't been able to get it right.
Background info:
What I want is an xpath() command that returns an array of all the games displayed on https://www.mlb.com/scores.
Following the tutorial, my understanding is that I should inspect the elements on the webpage, determine their class/id, and specify that in the xpath command.
I've tried a lot of variations to get the data but all are returning empty arrays.
I don't really have any training in XPath, so I'm not sure whether my syntax is just off somewhere, but I'd really appreciate any help getting this command to return the objects I'm looking for. Thanks for taking the time to read this.
Code:
Here are some of the attempts that didn't work:
response.xpath("//div[@class='g5-component--mlb-scores__game-wrapper']")
response.xpath("//div[@class='g5-component]")
response.xpath("//li[@class='mlb-scores__list-item mlb-scores__list-item--game']")
response.xpath("//li[@class='mlb-scores__list-item']")
response.xpath("//div[@!data-game-pk-id > 0]")'
response.xpath("//div[contains(@class, 'g5-component')]")
Expected Results and Actual Results
I want an XPath command that returns an array containing a selector object for each game on the mlb.com/scores page.
So far I've been able to get generic returns that aren't actually what I want (I can get a selector that returns the whole page by just leaving out the predicates, but whenever I try to specify I end up with an empty array).
So for all my attempts I either get the wrong objects or an empty array.
You should always check the HTML source code (Ctrl+U in a browser) for the data you need. On the MLB page you'll find that the content you want to parse is loaded dynamically using JavaScript.
You can try Scrapy-Splash to render the target content from your start_urls, or you can find the direct HTTP request used to fetch the information you want (using the Network tab of Chrome Developer Tools) and parse the JSON:
https://statsapi.mlb.com/api/v1/schedule?sportId=1,51&date=2019-06-26&gameTypes=E,S,R,A,F,D,L,W&hydrate=team(leaders(showOnPreview(leaderCategories=[homeRuns,runsBattedIn,battingAverage],statGroup=[pitching,hitting]))),linescore(matchup,runners),flags,liveLookin,review,broadcasts(all),decisions,person,probablePitcher,stats,homeRuns,previousPlay,game(content(media(featured,epg),summary),tickets),seriesStatus(useOverride=true)&useLatestGames=false&language=en&leagueId=103,104,420
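The JSON route can be sketched with the Python standard library. This is a sketch under assumptions: it assumes the schedule endpoint answers with only the sportId and date parameters (the full hydrate=... list above adds detail), and that the response nests games under dates[].games[] with team names at teams.away.team.name and teams.home.team.name. The endpoint is not formally documented, so confirm the shape against a real response in the Network tab. The helper names are hypothetical, and a hand-written sample payload stands in for a live request.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SCHEDULE_URL = "https://statsapi.mlb.com/api/v1/schedule"


def schedule_url(date: str) -> str:
    # Assumption: sportId and date are enough for a basic schedule query.
    return SCHEDULE_URL + "?" + urlencode({"sportId": 1, "date": date})


def fetch_schedule(date: str) -> dict:
    # Performs a network request; defined for completeness, not called below.
    with urlopen(schedule_url(date)) as resp:
        return json.load(resp)


def parse_games(payload: dict) -> list:
    # Assumed shape: {"dates": [{"games": [{"teams": {"away": ..., "home": ...}}]}]}
    games = []
    for day in payload.get("dates", []):
        for game in day.get("games", []):
            teams = game["teams"]
            games.append((teams["away"]["team"]["name"], teams["home"]["team"]["name"]))
    return games


# Hypothetical payload in the assumed shape, used instead of a live request.
sample = {
    "dates": [
        {"games": [
            {"teams": {"away": {"team": {"name": "Away Club"}},
                       "home": {"team": {"name": "Home Club"}}}}
        ]}
    ]
}

print(schedule_url("2019-06-26"))  # https://statsapi.mlb.com/api/v1/schedule?sportId=1&date=2019-06-26
print(parse_games(sample))         # [('Away Club', 'Home Club')]
```

Inside a Scrapy spider, the same parse_games() could be applied to json.loads(response.text) in the callback for a request to schedule_url(...), which avoids XPath entirely for this page.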

Syntax for scraping double quotes in RapidMiner (XPath)

I'm having trouble using XPath in RapidMiner when trying to retrieve reviews from the Google Play Store. The problem seems to be that these reviews are in double quotes, and I can't get RapidMiner to spit out the text, only blanks. I have a number of other XPath queries that work fine for other commands where I use divs, spans, etc. I'm able to get this query to work in a Google spreadsheet through =IMPORTXML, but not in RapidMiner.
This is what I have in XPath:
//*[@class='review-text']
So I added /text() to the end, and still nothing. I have played around with using //div instead of //* and have also used h:/span. I'm kind of hoping there's a special syntax for retrieving quotes that I'm unaware of.
Here is the HTML I'm looking to scrape, in the image below:
https://i.stack.imgur.com/dl6I8.png

Pasting a constantly updating XPath value into Google Sheets

I'm pretty new to this and am trying to pull a certain XPath value from a website into Sheets.
URL: "https://www.btcmarkets.net/"
XPath (from Chrome's Copy XPath function): //*[@id="LastPriceAUDBTC"]
I keep getting a "formula parse error".
I have managed to get the table headings in with:
XPath: "//tr"
but not the information within the table.
Is this even possible?
I know about the Google Finance add-ons, but I am analyzing the price differences between exchanges.
QUERY #2
I would also like to import this:
=importxml("http://www.xe.com/currencyconverter/convert/?Amount=1&From=EUR&To=CAD","//*[@id="ucc-container"]/span[2]/span[2]")
Should I be using =IMPORTDATA and trimming off what I don't want?
You need to use double quotes around the entire XPath but single quotes around the attribute values inside it (the id or class name); a double quote inside the XPath ends the formula's string early, which causes the parse error:
"//*[@id='LastPriceAUDBTC']"
And
=importxml("http://www.xe.com/currencyconverter/convert/?Amount=1&From=EUR&To=CAD","//*[@id='ucc-container']/span[2]/span[2]")
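The quoting rule can be captured in a tiny Python helper that builds the formula text, for example when generating many formulas at once. importxml_formula is a hypothetical name; the sketch simply swaps every double quote in the XPath for a single quote, which is only safe when the XPath contains no single quotes of its own, so it refuses XPaths that mix both.

```python
def importxml_formula(url: str, xpath: str) -> str:
    """Return an =IMPORTXML(...) formula with the XPath quoted for Sheets.

    The formula wraps its arguments in double quotes, so double quotes copied
    into the XPath (e.g. by Chrome's "Copy XPath") are turned into single quotes.
    """
    if "'" in xpath and '"' in xpath:
        # Swapping quotes here could change the XPath's meaning; fix it by hand.
        raise ValueError("XPath mixes single and double quotes")
    return '=IMPORTXML("{}","{}")'.format(url, xpath.replace('"', "'"))


copied = '//*[@id="LastPriceAUDBTC"]'  # as produced by Chrome's "Copy XPath"
formula = importxml_formula("https://www.btcmarkets.net/", copied)
print(formula)  # =IMPORTXML("https://www.btcmarkets.net/","//*[@id='LastPriceAUDBTC']")
```

Whether the cell then shows a price still depends on the value being present in the page's HTML rather than filled in later by JavaScript, which IMPORTXML does not run.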
