I have a table and I am trying to get the number of rows in it, but the following statement takes too long to execute.
Dim NumOfRows As Integer = selenium.GetXpathCount("//table[@id='Group']/tbody/tr")
Any idea what is taking so long?
I'm guessing you must be using IE? Its XPath performance is horrific for queries like this, and you'll no doubt run into more performance issues. My advice is to simply get the entire DOM out of Selenium and parse it yourself with HTML Agility Pack.
http://htmlagilitypack.codeplex.com/
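The same approach can be sketched in Python, with lxml standing in for HTML Agility Pack since the idea is identical: pull the DOM out once and count locally. This is a rough sketch, assuming the modern WebDriver bindings and a hypothetical URL; only the table id comes from the question.

from lxml import html
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("http://example.com/page-with-table")  # hypothetical URL

# Pull the whole DOM out of the browser once, then count the rows
# locally instead of going through Selenium's slow in-browser XPath.
tree = html.fromstring(driver.page_source)
num_rows = len(tree.xpath("//table[@id='Group']/tbody/tr"))
print(num_rows)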
Related
Normally, one would use an XPath query to obtain a certain value or node. In my case, I'm doing some web scraping with Google Sheets, using the importXML function to automatically update some values. Two examples are given below:
=importxml("http://www.creditagricoledtvm.com.br/";"(//td[#class='xl7825385'])[9]")
=importxml("http://www.bloomberg.com/quote/ELIPCAM:BZ";"(//span)[32]")
The problem is that the pages I'm scraping will change every now and then and I understand very little about XML/XPath, so it takes a lot of trial and error to get to a node. I was wondering if there is any tool I could use to point to an element (either in the page or in its code) that would provide an appropriate query.
For example, in the second case, I noticed the info I wanted was in a span node (hence (//span)), so I printed all of them in a spreadsheet and used the line count to find the [32] index. This takes a long time to load, so it's pretty inconvenient. Also, I don't even remember how I figured out the //td[@class='xl7825385'] query. That's why I'm wondering if there is a more practical method of pointing to page elements.
Some clues:
Learning XPath basics is still useful. W3Schools is a good starting point.
https://www.w3schools.com/xml/xpath_intro.asp
Otherwise, the built-in dev tools of your browser can help you generate an absolute XPath: select an element, right-click on it, then Copy > Copy XPath.
https://developers.google.com/web/tools/chrome-devtools/open
Browser extensions like ChroPath can generate absolute or relative XPaths for you.
https://autonomiq.io/chropath/
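For example (a hypothetical illustration; neither path refers to a real page), Copy XPath tends to produce an absolute path, while a hand-written relative one keyed on a stable attribute survives layout changes much better:

/html/body/div[2]/div[1]/table/tbody/tr[5]/td[3]   (absolute, as Copy XPath produces; breaks as soon as the layout shifts)
//td[@class='price']   (relative, keyed on a stable attribute)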
I am trying to get some values from an online XML document, but I cannot find the right XPath to navigate to those values. I want to import these values into a Google Sheets document, which requires me to get the exact XPath.
The website is this one, and I am trying to get the "WillPay" information from MeetingInfo Venue=S1, Races RaceNo=1, Pools PoolInfo Pool=WIN, in OddsInfo.
For now, the value for "Number=1" should be 3350 (or something close to it; the value changes quite often), and I would like to load all of these values into the Google Sheets document.
What I've tried is locating the XPath of all of it; my best attempt was
"/AOSBS_XML/Meetings/MeetingInfo/Races/Pools/PoolInfo/OddsSet/OddsInfo/#WillPay"
but it doesn't work.
I've been stuck on this problem for months now and I've been avoiding it, but realised I can't anymore because it's hindering my work. Please help.
Thanks!
-Brandon
Try using this XPath expression:
//MeetingInfo[@Venue="S1"]/Races//RaceInfo[@RaceNo="1"]//Pools//PoolInfo[@Pool="WIN"]//OddsSet//OddsInfo[@Number="1"]/@WillPay
An alternative:
//OddsInfo[@WillPay][ancestor::PoolInfo[@Pool='WIN'] and ancestor::RaceInfo[@RaceNo='1'] and ancestor::MeetingInfo[@Venue='S1']]
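If you want to sanity-check such an expression before wiring it into importXML, you can run it locally. A minimal Python sketch, assuming the lxml package and a placeholder URL for the feed (the real address isn't shown in the question):

from urllib.request import urlopen
from lxml import etree

# Placeholder URL: substitute the actual XML feed from the question.
xml = urlopen("http://example.com/feed.xml").read()
tree = etree.fromstring(xml)

# Same expression as above; attribute selections come back as plain strings.
values = tree.xpath('//MeetingInfo[@Venue="S1"]/Races//RaceInfo[@RaceNo="1"]'
                    '//Pools//PoolInfo[@Pool="WIN"]//OddsSet//OddsInfo[@Number="1"]/@WillPay')
print(values)  # e.g. ['3350']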
I'd like to loop over the top 20 exchanges on CoinMarketCap to crawl the tables, e.g. https://coinmarketcap.com/exchanges/fatbtc/
I've now spent a few hours trying to find the selector, e.g. for the price.
In the Scrapy shell I tried the following (and many more), but none of them work:
From the XPath Helper addon:
response.xpath("/html/body/div[@id='__next']/div[@class='cmc-app-wrapper.cmc-app-wrapper--env-prod.sc-1mezg3x-0.fUoBLh']/div[@class='container.cmc-main-section']/div[@class='cmc-main-section__content']/div[@class='cmc-exchanges.sc-1tluhf0-0.wNRWa']/div[@class='cmc-details-panel-table.sc-3klef5-0.cSzKTI']/div[@class='cmc-markets-listing.lgsxp9-0.eCrwnv']/div[@class='cmc-table.sc-1yv6u5n-0.dNLqEp']/div[@class='cmc-table__table-wrapper-outer']/div/table/tbody/tr[@class='cmc-table-row.sc-1ebpa92-0.kQmhAn'][1]/td[@class='cmc-table__cell.cmc-table__cell--sortable.cmc-table__cell--right.cmc-table__cell--sort-by__price']").getall()
From the Chrome inspector:
response.xpath("/td[@class='cmc-table__cell.cmc-table__cell--sortable.cmc-table__cell--right.cmc-table__cell--sort-by__price']").getall()
From the Chrome inspector's Copy XPath:
response.xpath('//*[#id="__next"]/div/div[2]/div[1]/div[2]/div[2]/div/div[2]/div[3]/div/table/tbody/tr[1]/td[5]').extract()
I'm using the Chrome inspector and, as of today, an addon called "XPath Helper" to show the selectors, but I still don't really understand what I'm doing there :(. I'd really appreciate any idea of how to access that data, and a better understanding of how to find these selectors.
Pretty easy (I used position() to skip the table header):
for row in response.xpath('//table[@id="exchange-markets"]//tr[position() > 1]'):
    price = row.xpath('.//span[@class="price"]/text()').get()
    # price = row.xpath('.//span[@class="price"]/@data-usd').get()  # if you need to be more precise
    yield {'price': price}
XPaths are basically //tagname[@attribute='value'] expressions matched against the HTML.
For your site, you can loop over names with //table[@id='exchange-markets']//tr/td[2]/a
and get prices with //table[@id='exchange-markets']//tr/td[5]
where we are basically saying to look within the table rows on column 5.
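Putting those selectors together, a minimal spider along these lines should work; the spider name, the start URL list, and the yielded field names are my own illustration, not part of the answer above:

import scrapy

class ExchangeSpider(scrapy.Spider):
    name = "exchanges"  # hypothetical name
    start_urls = ["https://coinmarketcap.com/exchanges/fatbtc/"]

    def parse(self, response):
        # Skip the header row, then read name and price from each row.
        for row in response.xpath('//table[@id="exchange-markets"]//tr[position() > 1]'):
            yield {
                "name": row.xpath("./td[2]/a/text()").get(),
                "price": row.xpath('.//span[@class="price"]/text()').get(),
            }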
I continue to get this error when I try to run this XPath query
//div[@iti='0']
on this link (flight search from google)
https://www.google.com/flights/#search;f=LGW;t=JFK;d=2014-05-22;r=2014-05-26
I get something like this:
=ImportXML("https://www.google.fr/flights/#search;f=jfk;t=lgw;d=2014-02-22;r=2014-02-26";"//div[#iti='0']")
I checked, and the XPath is correct (I get the desired result using XPath Helper; the desired result is the data for the first flight selected).
I guess it is a syntax problem, but I tried more or less every combination of lower/uppercase and punctuation (replacing ; , ' "), and I tried linking to the URL and the XPath query stored in cells, but nothing works.
Any help will be appreciated.
As a matter of fact, maybe it is a bug in the new Google Sheets, or they have changed how the function works. I've activated mine, and when I try to use ImportXML it simply won't work. Since I have some old sheets here (on the old mechanism), they still work normally. If I copy and paste the formula from the old one to the new one, it simply doesn't get any data.
Here is an example:
=ImportXML("http://www.nytimes.com/pages/todayspaper/index.html";"//div[#class='columnGroup first']//h3")
If I run this on the old mechanism it works fine, but if I run the same on the new mechanism, it first exchanges my ";" for a "," and then returns "#N/A" with the warning "Error: Imported XML content cannot be parsed".
Edit (05/05/2015):
I am happy to say that I tested this function again today on the new spreadsheets and they've fixed it. I had been checking every two months, and they have now finally solved this issue. The example I added above is now returning information.
I'm sorry, but you won't be able to easily parse Google result pages. The reason your function throws an error is that the content of the page you see in your browser is generated by JavaScript, and Google Sheets doesn't execute JS.
Your ImportXML has the right syntax; it doesn't return anything because the node you're looking for isn't there (importXML Parse Error).
You will have to find another source if you want these results in your spreadsheet. For info, some libraries already parse the usual result page (http://www.seerinteractive.com/blog/google-scraper-in-google-docs-update for example, if it still works), but I doubt finding one for your special case will be easy.
This gives the answer (importXML Parse Error), but it's not entirely obvious.
ImportXML doesn't load JavaScript. When you're building ImportXML queries on Google results, make sure you're testing against a version of the page that has JavaScript turned off. You can do this using the Chrome DevTools.
(But I agree that ImportXML is fickle, idiosyncratic, and generally rage-inducing).
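One quick way to see exactly what ImportXML receives is to fetch the page without a browser and run the query against the raw HTML. A small Python sketch, assuming the requests and lxml packages:

import requests
from lxml import html

# Fetch the page the way ImportXML does: no JavaScript is executed.
raw = requests.get("https://www.google.com/flights/#search;f=LGW;t=JFK;d=2014-05-22;r=2014-05-26").text
tree = html.fromstring(raw)

# An empty list here means the node is built by JS, so ImportXML cannot see it.
print(tree.xpath("//div[@iti='0']"))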
Has anyone tested sorting with Selenium? I'd like to verify that sorting a table in different ways works (a-z, z-a, state, date, etc.). Any help would be very much appreciated.
/Göran
Before checking it with Selenium, you have to do one small thing: store the table values in a string or array. Then perform the sorting using Selenium and capture the new list:
String new_list = selenium.getTable("xpath");
Now compare both sets of values and check whether they are in the expected order.
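A minimal sketch of that strategy with the modern Python WebDriver bindings; the URL, table locator, and sort control here are hypothetical placeholders:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("http://example.com/sortable-table")  # hypothetical page

def column_values():
    # Read the first column of every row; adjust the locator to your table.
    return [c.text for c in driver.find_elements(By.XPATH, "//table[@id='results']//tr/td[1]")]

before = column_values()
driver.find_element(By.ID, "sort-by-name").click()  # hypothetical sort control
after = column_values()

# The UI-sorted list should match the pre-sort capture sorted in code.
assert after == sorted(before)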
I have shared a strategy for testing the sorting feature of an application on my blog. You can use it to automate test cases that verify the sorting feature of an application, in places like the search results page, item listings, and the report module. The strategy explained does not require creating test data and is fully scalable.
You can get the value of fields like this:
//div[@id='sortResult']/div[1]/div (this'd be row 1 of the search result)
//div[@id='sortResult']/div[2]/div (row 2)
(I'm making some assumptions about the HTML structure here, but you get my drift...)
These can be quite fragile assertions, so I'd recommend you anchor these XPath references to an outer container element (not the root of your document, as lots of "automatic" tools do).
When you click sort, the value changes. You'll have to find out what the values are supposed to be.
Also watch out for browser compatibility with such XPaths; they don't always behave the same ;)
The way I approached this was to define the expected sorted results as an array and then iterate over the results returned from the sorted page to make sure they met my expectations.
It's a little slow, but it does work. (We actually managed to find a few low-level sorting defects on multiple pages this way.)
You could use the WebDriver API from Selenium 2.0 (currently in alpha) to return an array of elements with the findElements command before and after the sort. This becomes a bit more difficult, however, if what you're sorting is paginated.
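For the paginated case, one workable pattern is to accumulate the column across pages before comparing, in the spirit of the sketch above; the locators here are again hypothetical placeholders:

from selenium.webdriver.common.by import By

def all_column_values(driver):
    # Accumulate the first column across every page of results.
    values = []
    while True:
        cells = driver.find_elements(By.XPATH, "//table[@id='results']//tr/td[1]")
        values.extend(c.text for c in cells)
        nxt = driver.find_elements(By.LINK_TEXT, "Next")  # hypothetical pager link
        if not nxt:
            return values  # no more pages
        nxt[0].click()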