I'm scraping SuperLawyers.com for name and address info of lawyers. It's scraping all of the correct data except the phone number. The profile pages have the phone number twice; I'm happy with extracting either of them. An example of a page:
https://profiles.superlawyers.com/massachusetts/somerville/lawyer/wyckoff-nissenbaum/e854f9a4-28d2-46e6-bf69-dee74c7ffdb1.html
My XPath: phone = response.xpath('//div[@id="lawyer_phone_button"]//text()').extract()
You should not skip the a tag in this case. Use the following:
phone = response.xpath('//div[@id="lawyer_phone_button"]/a/text()').extract()
It works on the page you posted.
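For context, here is a minimal sketch of how the corrected selector might sit in a Scrapy parse callback. The spider name and the yielded field are illustrative, not from the asker's project:

# Minimal illustrative sketch of the corrected selector in a Scrapy callback.
import scrapy

class LawyerSpider(scrapy.Spider):
    name = "superlawyers_profile"  # hypothetical spider name
    start_urls = [
        "https://profiles.superlawyers.com/massachusetts/somerville/lawyer/"
        "wyckoff-nissenbaum/e854f9a4-28d2-46e6-bf69-dee74c7ffdb1.html"
    ]

    def parse(self, response):
        # The phone number is the text of the <a> inside the button div,
        # so descend through the <a> instead of taking all descendant text.
        phone = response.xpath('//div[@id="lawyer_phone_button"]/a/text()').extract()
        yield {"phone": phone[0].strip() if phone else None}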
I would like to get the "3,776" price from this website:
https://jp.mercari.com/item/m68422230699
I copied the full XPath, but it returns #N/A:
=IMPORTXML("https://jp.mercari.com/item/m68422230699","/html/body/div[1]/div[1]/div/div/div/main/article/div[2]/section[1]/section[1]/div/mer-price//span[2]")
#N/A, in this case, is the result of trying to scrape JavaScript-rendered content/elements, which Google Sheets does not support. You can test this simply by disabling JS for a given site; whatever is left is what can be scraped. In your case, that is nothing.
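One quick way to confirm this outside of Sheets is to fetch the raw HTML without a browser and search it for the price. A minimal sketch, assuming the requests library is available (the URL is the one from the question):

# Check whether the price exists in the static HTML that IMPORTXML would see.
import requests

url = "https://jp.mercari.com/item/m68422230699"
html = requests.get(url, timeout=30).text

# If this prints False, the value is injected by JavaScript after page load,
# which is why IMPORTXML returns #N/A.
print("3,776" in html)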
There is a Google Sheet containing a list of MPNs (manufacturer part numbers). I'm trying to scrape a site called wikiarms for the UPC codes when I have the MPN for an item.
I have the correct formula for doing this on another site.
=IMPORTXML("http://gun.deals/search/apachesolr_search/"&B1,"//dd/a[../../dt[contains(text(),'UPC')]]|//dd/span[../../dt[contains(text(),'UPC')]]")
I'm trying to figure out the correct XPath to complete this formula. Some videos I have watched said to open the page in Chrome and use the inspector to select and copy the XPath to complete the IMPORTXML function. I tried this with no luck.
Sample
Visit https://www.wikiarms.com/guns?q=20071
In the table there is a button "available in 6 stores"; click that to reveal the list. The UPC should be listed after the MPN.
If I copy the XPath in Chrome, this is the result:
/html/body/div[1]/div/div/div[2]/div/div/div[2]/div[2]/table/tbody/tr[2]/td[5]
=IMPORTXML("https://www.wikiarms.com/guns?q="&B2,"xpath here")
What do I have to add at the end of this formula to pull in the UPC code? I will be using this formula to pull in UPC codes for about 1,000 items.
Thank you for your help.
Using your sample link, try
=IMPORTXML("https://www.wikiarms.com/guns?q=20071","//td[#class='upc']/a/#title")
and see if it works for you.
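If you want to sanity-check that expression outside of Google Sheets, here is a minimal sketch using requests and lxml. It assumes, as the formula above does, that the UPC sits in the title attribute of the link inside the td with class 'upc':

# Verify the suggested XPath against the live page with lxml.
import requests
from lxml import html

page = requests.get("https://www.wikiarms.com/guns?q=20071", timeout=30)
tree = html.fromstring(page.content)

# Same expression as in the IMPORTXML formula above.
upcs = tree.xpath("//td[@class='upc']/a/@title")
print(upcs)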
I am having problems locating elements using XPath while trying to write automated web UI tests with Arquillian Drone + Graphene.
To figure things out, I tried to locate the search button on the Google homepage. Even that I cannot get working, neither with an absolute nor a relative XPath.
However, I am able to locate elements using IDs, or when the XPath string has an ID in it, but only when the ID is a real ID and not a generated one. For example, on the Google homepage the Google logo has a real ID, "hplogo". I can locate this element either directly by the ID or by using the ID within the XPath expression.
Why is locating the Google logo using the ID "hplogo" possible, while it fails using the absolute XPath "/html/body/div[1]/div[5]/span/center/div[1]/div/div"?
I am really confused. What am I doing wrong? Any help is appreciated!
EDIT:
WebElement e = browser.findElement(By.xpath("/html/body/div[1]/div[5]/span/center/div[1]/div/div"));
is causing a NoSuchElementException.
Your expression works on Firefox, but on WebKit-based browsers (e.g., Chrome) the rendered DOM is a bit different. Maybe it also depends on localization (google.co.uk for me). If I force google.com, the logo image for me is:
/html/body/div/div[5]/span/center/div[1]/img on Firefox 37 and /html/body/div/div[6]/span/center/div[1]/img on Chrome 42.
EDIT:
After discussing in chat, we figured out that HtmlUnit is indeed creating a DOM that is different from the one real browsers render. I suggested migrating to FirefoxDriver.
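As a practical takeaway, here is a short illustrative sketch using Selenium's Python bindings (the question uses Java, but the idea is identical): prefer a stable locator such as the real ID over an absolute XPath, because the absolute path differs between browsers and between HtmlUnit and real browsers.

# Illustrative only: stable ID locator vs. brittle absolute XPath.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("https://www.google.com")

# Brittle: depends on the exact DOM each browser/driver renders.
# logo = driver.find_element(By.XPATH, "/html/body/div[1]/div[5]/span/center/div[1]/div/div")

# Robust: the real ID mentioned in the question.
logo = driver.find_element(By.ID, "hplogo")
print(logo.tag_name)
driver.quit()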
I'm 'scraping' a few product descriptions from a website and bringing them into a Google spreadsheet using importXML.
It has gone fairly smoothly, but there is one major snag that I would love to correct, and I need your help!
The website in question prohibits those posting products from including contact information (email addresses usually) in the product description. Sometimes people ignore the rule, and include the contact information anyways. When this occurs, the website automatically hides the contact information in the product description, replacing it with [obscured], as in "...please feel free to contact me at [obscured]" or something close to that. The [obscured] appears in a different colour, and is obviously treated differently by the website.
When these product descriptions are imported into my spreadsheet, the [obscured] causes the scraping to be 'bumped': the description text stops prior to [obscured], the word [obscured] appears in an adjacent cell all by itself, and the description text that follows [obscured] then continues in a third cell.
This separation ruins the alignment and logic in my spreadsheet, as product descriptions having an [obscured] word become broken up and misaligned from those that do not.
I would love to be able to have my importXML or XPath accommodate this and essentially 'ignore' the [obscured]. I don't mind it being included in the scraped description, but I want to stop the breaking-up into 3 separate adjacent cells.
The [obscured] is part of a 'span' that appears to occasionally lie within the description class 'desc' I am calling.
Is there a way to do this? Can I instruct importXML to import that 'desc' class but ignore/omit the span which might sometimes appear within it?
I've included the source code (inspect element in Safari) below:
<div class="desc descFull collapsed">
<span class="obscureText">[obscured]</span>
As mentioned, this span only occurs in some of the product descriptions, not all of them.
Does anyone know what kind of language I would use in the importXML to call the 'desc' but ignore the 'span', or to prevent the splitting into 3 cells when the [obscured] is encountered?
My current call is
=ImportXML(A1,"//div[@class='desc']")
which works fine, unless the [obscured] span is encountered.
Thank you for any help you can give!
Unless Google Drive is breaking the definition of XPath, XPath can't be used to query CSS classes the way CSS selectors can.
The XPath //div[@class='desc'] will only match a div element whose class attribute is literally "desc". It won't match "desc descFull collapsed", as the string is different.
As for excluding the text of the obscured node, that would require finding the text nodes and excluding one, which would return a node-set, not a string, and you wouldn't be able to concatenate these back together using XPath 1.0. If Google Drive uses XPath 2.0 it might be possible, using the techniques in that linked question.
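Outside of Sheets, with an XPath engine you control, both points can be illustrated. The sketch below uses lxml: a containment test on @class matches the multi-valued "desc descFull collapsed" attribute, and string(.) concatenates the description's text nodes, including the [obscured] span, into a single value. The HTML fragment is adapted from the snippet in the question:

# Sketch: class-token matching and text-node concatenation with lxml.
from lxml import html

snippet = """
<div class="desc descFull collapsed">
  please feel free to contact me at
  <span class="obscureText">[obscured]</span>
  for more details.
</div>
"""
tree = html.fromstring(snippet)

# @class='desc' would NOT match "desc descFull collapsed"; a containment
# test on the class token does.
div = tree.xpath("//div[contains(concat(' ', normalize-space(@class), ' '), ' desc ')]")[0]

# string(.) concatenates all descendant text nodes into one string, which is
# the behaviour the asker wants instead of three separate cells.
full_text = div.xpath("string(.)")
print(" ".join(full_text.split()))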
I am new to Windows Phone development. Please tell me whether there is any option to highlight phone numbers, email IDs and URL links, and also to set auto-links. I have a TextBlock that contains a description (e.g., email ID, phone number and some text). I want to set an auto-link property. Please tell me, is this possible in Windows Phone 7? Or is there any way to set an auto-link for an email ID?
I solved my problem by combining the information from the following links:
How to set the links in a text block clickable in wp7
http://msdn.microsoft.com/en-us/sharepointandwindowsphone7trainingcourse_addinglauncherstoemailandphonenumberslab_topic2.aspx
I think this is helpful for developers who want to set auto-links (the option to call a phone number and mail an email ID).
Although only discussing URLs, this question should be a starting point:
How to set the links in a text block clickable in wp7
Unfortunately there is no built-in autolink functionality.