Unable to extract data with an XPath or CSS selector

When I inspect the element or view the page source, the required data is present on the page, but when I try to extract it with XPath or CSS, I get an empty list. I even tried extracting all the nodes and their content, but the data shown in View Page Source is still not extracted. What could be the reason?
Below is the example code:
I need to extract the href value from the <a> tag.
<div class="url-link">
<a data-id="abc" class="abc xyz" data-is-avod="" href="/ab/extract/xyz/3&t=25">Title</a>
<span>title</span>
</div>
I used response.xpath('//div/a/@href').extract(), but I am unable to extract the desired content.
I have analyzed this and found that inspect element and View Page Source show this <a> tag only when I am logged in to the website; otherwise it is not shown. So I think that to get the @href value I need to submit the login form with my credentials, but I don't know how to submit a form or how to find the form's details.
Please help.
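
One way to check the diagnosis above is to run the same selection against the raw HTML the crawler actually receives, not against what the browser shows after login. The following is only a sketch under stated assumptions: it uses Python's standard html.parser as a stand-in for Scrapy's selector, the names LinkCollector and extract_hrefs are hypothetical, and the logged-out markup is a guess at what an anonymous response might look like.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags whose parent is a <div> (like //div/a/@href)."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags, innermost last
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.stack and self.stack[-1] == "div":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def extract_hrefs(html):
    """Return every href found directly under a <div> in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Markup from the question (what a logged-in browser shows).
LOGGED_IN = (
    '<div class="url-link">'
    '<a data-id="abc" class="abc xyz" data-is-avod="" href="/ab/extract/xyz/3&t=25">Title</a>'
    "<span>title</span></div>"
)
# Assumed markup for an anonymous request: the <a> tag is simply absent.
LOGGED_OUT = '<div class="url-link"><span>title</span></div>'

print(extract_hrefs(LOGGED_IN))
print(extract_hrefs(LOGGED_OUT))
```

If the second call is what the crawler sees, the selector is fine and the empty list comes from the missing markup, which points back to the login step.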

Related

How to extract javascript text with xpath

Is it possible to use XPath to extract a part of a JavaScript block that is not visible on the page?
<script type="text/javascript"> "agri_discount_group":"","agri_discount_text":"Promoties ","reindex":"1","list_name":"","list_active_variant":"0","list_is_single":"","recalc_rules":"0","agri_discount_type9_art":"","dropshipment":"","leadtime_dropshipment":"0","leadtime_delivery_extra_costs":"","agri_discount_start":"1675033560","agri_discount_stop":"1676242740","stock_feed":"2","stock_feed_tstamp":"1675396701","default_scancode":"8710429017146","tstamp_first_online":"1558595899", </script>
I know how to get the whole text of a div/span tag that is visible on the website, but not how to extract data that is not visible on the frontend (but is in the source code). Secondly, I am not sure how to select only the value behind the text element "default_scancode". Is this possible?
I use Octoparse to extract the info from a website.
Edit:
This part of the code is what I am looking for:
Code element. The entire page source is shown here:
source code
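
XPath can select the text of a script element, but picking one value out of that text is a string operation. The sketch below covers only that second step in Python, assuming the script text has already been obtained (for example with an XPath such as //script[contains(text(), "default_scancode")]/text()). The function name extract_value is hypothetical, and SCRIPT_TEXT is a shortened copy of the snippet in the question.

```python
import re

# Shortened copy of the script content shown in the question.
SCRIPT_TEXT = (
    '"stock_feed":"2","stock_feed_tstamp":"1675396701",'
    '"default_scancode":"8710429017146","tstamp_first_online":"1558595899",'
)


def extract_value(script_text, key):
    """Return the quoted string value that follows "key": in script_text, or None."""
    pattern = r'"' + re.escape(key) + r'"\s*:\s*"([^"]*)"'
    match = re.search(pattern, script_text)
    return match.group(1) if match else None


print(extract_value(SCRIPT_TEXT, "default_scancode"))  # prints 8710429017146
```

A regular expression is used here because the snippet is a fragment and not a complete JSON object; if the page exposes a full JSON object, parsing it with a JSON parser would be the more robust choice.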

Using relative xpath to scrape custom div attribute

I have a few hundred URLs where I'm trying to scrape the image path for an image on a page. Each page is the same format, but the div class is unique to each page.
I want to be able to use IMPORTXML in Google Sheets to scrape just the content of the data-path attribute.
I've tried and failed to use XPath to pull out the URLs.
<div class="uniqueid active" data-path="/~/media/Images/image.jpg" data-alt="Anything"></div>
E.g. //div[@class='*']/@data-path
Example of site: https://www.cannondale.com/en/Australia/Bike/ProductDetail?Id=77d3b8fe-41f7-42b6-bf69-b5cf0ae55548&parentid=undefined
If the div class follows the pattern "uniqueid active", you can try the following XPath:
//div[contains(@class, "active")]/@data-path
Otherwise, if the div class can be anything, use this query:
//div[@class]/@data-path
UPDATE:
I tried to get the values of the data-path attributes with IMPORTXML but didn't succeed. I then tried it in Python (requests and lxml) and it works. So the problem is probably in Google Sheets: some limitation or bug, I don't know.
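
For reference, here is a minimal Python sketch of the second query. It is not the requests and lxml code mentioned above. It uses the standard library's ElementTree on a small, well-formed sample, and the function name data_paths and the surrounding html/body wrapper are assumptions. ElementTree supports only a subset of XPath, so the attribute is read with .get() in place of a trailing /@data-path step; a real page would likely need a lenient HTML parser such as lxml.

```python
import xml.etree.ElementTree as ET

# Sample page: the first div is from the question, the rest is assumed filler.
PAGE = """<html><body>
<div class="uniqueid active" data-path="/~/media/Images/image.jpg" data-alt="Anything"></div>
<div class="other">no data-path here</div>
</body></html>"""


def data_paths(markup):
    """Return the data-path value of every div that has a class attribute."""
    root = ET.fromstring(markup)
    # .//div[@class] mirrors //div[@class]; divs without data-path are skipped.
    values = [div.get("data-path") for div in root.findall(".//div[@class]")]
    return [value for value in values if value is not None]


print(data_paths(PAGE))
```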

Extracting links (get href values) with certain text with Xpath under a div tag with certain class

SO contributors. I am fully aware of the following question How to obtain href values from a div using xpath?, which basically deals with one part of my problem yet for some reason the solution posted there does not work in my case, so I would kindly ask for help in resolving two related issues. In the example below, I would like to get the href value of the "more" hyperlink (http://www.thestraddler.com/201715/piece2.php), which is under the div tag with content class.
<div class="content">
<h3>Against the Renting of Persons: A conversation with David Ellerman</h3>
[1]
</p>
<p>More here.</p>
</div>
In theory I should be able to extract the links under a div tag with
xidel website -e //div[@class="content"]//a/@href
but for some reason it does not work. How can I resolve this and (2nd part) how can I extract the href value of only the "here" hyperlink?
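
The second part (only the "here" link) amounts to filtering the anchors by their text, which in XPath would be something like //div[@class="content"]//a[text()="here"]/@href. The Python sketch below illustrates the same idea with the standard library's ElementTree. The markup is an assumed reconstruction, because the anchors are missing from the excerpt above: the excerpt text and the example.com URL are hypothetical, and only the piece2.php URL comes from the question. The function name hrefs_by_link_text is also hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed markup: the <a> tags and the "here" target are hypothetical.
SAMPLE = """<html><body>
<div class="content">
<h3>Against the Renting of Persons: A conversation with David Ellerman</h3>
<p>Excerpt text <a href="http://www.thestraddler.com/201715/piece2.php">more</a></p>
<p>More <a href="http://example.com/hypothetical-target">here</a>.</p>
</div>
</body></html>"""


def hrefs_by_link_text(markup, link_text):
    """Return hrefs of anchors under div.content whose text equals link_text."""
    root = ET.fromstring(markup)
    anchors = root.findall(".//div[@class='content']//a")
    return [a.get("href") for a in anchors if (a.text or "").strip() == link_text]


print(hrefs_by_link_text(SAMPLE, "here"))
print(hrefs_by_link_text(SAMPLE, "more"))
```

When running the xidel command from a shell, it may also be worth wrapping the whole expression in single quotes, since an unquoted shell argument loses its double quotes before xidel sees it.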

Visualforce link component for rerendering?

Is there a Visualforce component for links? I'd like a link (<a>) on my page that can trigger an Ajax call to one of the functions in the controller and rerender an element on the page.
This is how I'm doing it right now, but I don't want it to be a button, I need a link:
There are two standard apex link components, apex:outputLink and apex:commandLink. Both render anchor tags in HTML. From what you are asking, it sounds like you need the command link, but I've posted information about both of them here.
You can find out more about them in the Visualforce Developer's Guide.
The apex:outputLink should be used when you want to create a standard hyperlink:
This component is rendered in HTML as an anchor tag
with an href attribute. Like its HTML equivalent, the body of an
<apex:outputLink> is the text or image that displays as the link. To
add query string parameters to a link, use nested <apex:param>
components.
<apex:outputLink value="https://www.salesforce.com"
id="theLink">www.salesforce.com</apex:outputLink>
The example above renders the following HTML:
<a id="theLink" name="theLink"
href="https://www.salesforce.com">www.salesforce.com</a>
The apex:commandLink is probably what you need.
... executes an action defined by a controller, and then either
refreshes the current page, or navigates to a different page based on
the PageReference variable that is returned by the action. An
apex:commandLink component must always be a child of an apex:form
component.
<apex:commandLink action="{!save}" value="Save" id="theCommandLink"/>
The example above renders the following HTML:
<a id="thePage:theForm:theCommandLink" href="#" onclick="generatedJs()">Save</a>

How do HtmlAgilityPack extract text from html node whose class attribute appended dynamically

Dear friends, I want to extract the text 平均3.6 星 from this code segment, excerpted from amazon.cn.
<div class="content"><ul>
<li><b>用户评分:</b>
<span class="crAvgStars" style="white-space:no-wrap;">
<span class="asinReviewsSummary" ref="dp_db_cm_cr_acr_pop_" name="B004GUSIKO">
<a>
<span class="swSprite s_star_3_5 " title="平均3.6 星">
<span>平均3.6 星</span>
</span>
</a>
My problem is that the span's class value "s_star_3_5 " varies with the customer rating level and is appended dynamically. So I tried doc.DocumentNode.SelectSingleNode("//span[@class='swSprite']").InnerText and //span[@class='swSprite s_star_3_5 '], but the result is either an error or not what I want!
Any suggestions?
First of all, I suggest saving the value of doc.DocumentNode.OuterHtml to a local .html file and checking whether the HTML you're obtaining is really that code. Sometimes you start parsing a website with HtmlAgilityPack, but the very first problem is that you're not getting valid HTML at all: maybe you're getting a 404 error, a redirection, etc.
I'm suggesting this because I tested //span[@class='swSprite s_star_3_5 '] and it worked correctly.
That was the issue in the following questions:
Selecting nodes that have an attribute with spaces using HTMLAgilityPack
XPath Query Problem using HTML Agility Pack
If that doesn't help, post the HTML code and I'll help you ;)
This works for me:
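HtmlDocument doc = new HtmlDocument();
doc.Load(myHtml); // Load() reads from a file path or stream; use LoadHtml() for an HTML string
// starts-with matches any class value that begins with "swSprite"
HtmlNode node = doc.DocumentNode.SelectSingleNode("//span[starts-with(@class, 'swSprite')]");
Console.WriteLine("Text=" + node.InnerText.Trim());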
and outputs
平均3.6 星
Note that I use the XPath starts-with function.
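
A prefix match works here because swSprite is the first class. If the order of the classes could change, matching on the class token is a possible alternative. The sketch below shows that idea in Python with the standard library's ElementTree. It is an analogue for illustration, not HtmlAgilityPack code: the function name text_of_class is hypothetical, and the closing tags in FRAGMENT are assumed because the excerpt in the question is cut off.

```python
import xml.etree.ElementTree as ET

# Excerpt from the question, with the missing closing tags assumed.
FRAGMENT = """<div class="content"><ul><li><b>用户评分:</b>
<span class="crAvgStars">
<span class="asinReviewsSummary" name="B004GUSIKO">
<a><span class="swSprite s_star_3_5 " title="平均3.6 星"><span>平均3.6 星</span></span></a>
</span></span></li></ul></div>"""


def text_of_class(markup, class_token):
    """Return the text of the first span whose class list contains class_token."""
    root = ET.fromstring(markup)
    for span in root.iter("span"):
        # split() breaks the class attribute into tokens and ignores extra spaces.
        if class_token in span.get("class", "").split():
            return "".join(span.itertext()).strip()
    return None


print(text_of_class(FRAGMENT, "swSprite"))
```

The token test ignores the dynamic s_star_* part and the trailing space in the class value.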
