Given the following HTML:
<html>
<body>
<div id="first">
<div id="sub">
<a href="test.html">test</a>
</div>
</div>
<div id="second">
</div>
</body>
</html>
According to the Chrome Developer tools, the XPath to the "test" link is
/html/body/div[1]/div/a
However, when I do
const selector ="/html/body/div[1]/div/a";
await page.waitForXPath(selector);
console.log("after waiting for selector -> selector was found");
it never passes the await page.waitForXPath(selector); line.
Can anyone explain why and how I have to modify the XPath?
I experimented a bit: waiting for an XPath works with /html/body/div[1], but not with /html/body/div[1]/div, and not with /html/body/div[1]/div[1] either.
I am using puppeteer-core#5.3.1 and Chrome 85.0.4183.121 on Ubuntu.
Update:
Just to make sure my XPath is correct, I tested it in the Chrome Devtools console, where it works fine:
$x("/html/body/div[1]/div/a")
[a] <-- returns expected result
Still can't understand why it's not working with Puppeteer.
Try these xpath expressions and see if they work:
//div/a/text()
should return (using the html in your question) test. And
//div/a/@href
should return
test.html
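Try this. It waits until the "test" link shows up and then returns its href value.
const selector = "/html/body/div[1]/div/a";
// Wait until the link exists in the DOM.
await page.waitForXPath(selector);
// page.$x resolves to an array of element handles.
const links = await page.$x(selector);
// Read the href of each matched element inside the page context.
const hrefValues = await page.evaluate((...els) => els.map(e => e.href), ...links);
console.log(hrefValues);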
Using puppeteer sharp I load a page and try to read the value of an attribute.
Html page:
<body>
<img src="data:image/png;base64,R0lGOD" alt="Red dot" />
<a href="#" id="bottle">
I use this:
string awaitXPath = "//img[contains(@src, 'data:image/png;base64')][1]";
var element = await _page.WaitForXPathAsync(awaitXPath, new PuppeteerSharp.WaitForSelectorOptions() { Timeout = 5000 });
string strBase64 = await element.GetPropertyAsync("src").Result.JsonValueAsync<string>();
This works in some cases, but sometimes execution freezes when I call GetPropertyAsync. Is there a way to add a timeout to GetPropertyAsync? Or does somebody have another idea for getting the value of the @src attribute?
Thank you.
Await the call instead of blocking on .Result:
var strBase64 = await (await element.GetPropertyAsync("src")).JsonValueAsync<string>();
I have the following div
<div data-dmid="product-detail-page" itemscope="" itemtype="http://schema.org/Product" itemid="3600542198158">
from which I would like to extract the itemid -> 3600542198158
I was using the following Xpath which does however not return any value:
//div[@data-dmid='product-detail-page']/@itemid
Could someone please advise how to build the XPath correctly?
Unfortunately I have to revise my question.
I had been looking at the code with the Firefox inspection tool.
The raw HTML source differs from what the inspection tool shows, and it contains the following relevant part:
<div class="onCanvas content-with-footer">
<div id="container-main" class="content-main">
<div data-dmid="uvp-banner-container" style="height: 54px; width: 100%"></div>
<script>
document.addEventListener("DOMContentLoaded", function() {
var props = {};
ReactInit.initReactComponent("contentViewService", "UvpBannerContainer", props, document.querySelector("[data-dmid='uvp-banner-container']"));
});
</script>
<div id="react-product-detail-page"></div>
<script>
var props = {
gtin: 3600542198158,
locale: dmSettings.localeLanguage
};
ReactInit.initReactComponent("product-detail-page", "ProductDetailPage", props, document.getElementById("react-product-detail-page"));
$(document).ready(function () {
var props = {
locale: dmSettings.localeLanguage
};
ReactInit.initReactComponent("product-detail-page", "PriceLegend", props, document.getElementById("react-price-legend"));
});
</script>
I need to get the gtin (a plain number) from the second script.
I want to use the XPath in a scraping tool, which is why only plain XPath will work for me.
Thank you again, and please excuse my earlier, not fully correct question.
I am assuming that you don't mind JavaScript and jQuery since you didn't specify:
var itemId = $("div[data-dmid]").attr("itemid");
console.log(itemId);
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div data-dmid="product-detail-page" itemscope="" itemtype="http://schema.org/Product" itemid="3600542198158">
I got the answer with help of another post on Stackoverflow.
Reading a javascript variable's value
The correct code for my updated question is
substring-before(substring-after(//div[@class='onCanvas content-with-footer']//script[2][contains(.,'gtin')]/text(), "gtin: "), ",")
Thank you for any help.
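For readers who want to check what the two nested string functions do, here is a minimal JavaScript sketch of the same extraction. The helpers substringAfter and substringBefore are hypothetical names written for this illustration to mimic the XPath 1.0 functions (both return an empty string when the delimiter is absent); the sample text is copied from the second script in the question.

```javascript
// Hypothetical helpers mimicking XPath 1.0 substring-after / substring-before.
function substringAfter(text, marker) {
  const i = text.indexOf(marker);
  return i === -1 ? "" : text.slice(i + marker.length);
}

function substringBefore(text, marker) {
  const i = text.indexOf(marker);
  return i === -1 ? "" : text.slice(0, i);
}

// Text content of the second <script>, as shown in the question.
const scriptText = [
  "var props = {",
  "gtin: 3600542198158,",
  "locale: dmSettings.localeLanguage",
  "};",
].join("\n");

// Keep everything after "gtin: ", then cut at the first comma.
const gtin = substringBefore(substringAfter(scriptText, "gtin: "), ",");
console.log(gtin);
```

The inner call discards everything up to and including "gtin: "; the outer call then stops at the first comma, which leaves only the digits.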
This is my first time asking a question, so I am a true SO newbie. I am currently working on a mobile app and I am using Parse React and Ratchet to build it. I have read the React documentation on the FB GitHub, but apparently I do not understand it well enough to solve some problems. One of them is using the result of a Parse query, declared in the observe function of a ParseComponent, as a prop of a rendered React component, which in turn attempts to render the passed value as HTML. Below is the parent component:
export default class CategoryPage extends ParseComponent
{
observe(props,state){
return{
category: new Parse.Query('BusinessCategory').equalTo("objectId", this.props.categoryId)
};
}
render() {
return (
<div>
<Header text={this.data.category.objectId} back="true"/>
<div className="content">
<BusinessList categoryId={this.data.category.objectId}/>
</div>
<NavBar />
</div>
);
}
};
Notice I am passing the objectId of the category found in the Query as a text attribute of the Header React component. I am expecting Header as a child to use the passed property as follows:
var Header = React.createClass({
render: function () {
return(
<header className="bar bar-nav">
<h1 className="title">{this.props.text}</h1>
</header>
);
}
});
However the h1 is not rendering anything! I am thinking that this.data.category.objectId is a string and therefore should be rendered in the h1 tag as a string.
I do appreciate your answers very much.
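One possible explanation, offered as an assumption rather than a confirmed diagnosis: in Parse React, the results of an observed query are typically exposed on this.data as an array of objects, even when only one object matches. If that holds here, this.data.category.objectId is undefined and React renders nothing for it. The plain JavaScript sketch below uses a made-up data object (not real Parse output) to show the difference.

```javascript
// Made-up stand-in for this.data after the query resolves (assumption:
// query results arrive as an array, even for a single match).
const data = { category: [{ objectId: "abc123", name: "Cafes" }] };

// Reading a property directly off the array yields undefined...
const direct = data.category.objectId;

// ...while reading it from the first element yields the id.
// The guard covers the initial render, when the array may still be empty.
const first = data.category.length > 0 ? data.category[0].objectId : "";

console.log(direct);
console.log(first);
```

If the assumption is right, passing the first element's objectId (with a guard for the empty case) to Header would give the h1 a string to render.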
This may come off as a bit newb-ish, but I don't really know how to approach this.
Can anyone recommend a way of delivering an image from a Flask backend in response to an Angular $http.get call?
Brief example of what I am trying to do.
//javascript code
myApp.controller('MyCtrl', function($scope, $http){
$http.get('/get_image/').success(function(data){
$scope.image = data;
});
});
#flask back end
@app.route('/get_image/', methods=['GET', 'POST'])
def serve_image():
image_binary = get_image_binary() #returns a .png in raw bytes
return image_binary
<!-- html -->
<html ng-app= "myApp">
<div ng-controller= "MyCtrl">
{{ image }}
</div>
</html>
So as you can see, I am attempting to serve a raw-byte .png image from the flask backend, to the frontend.
I've tried something like this
<html>
<img src= "/get_image/">
</html>
But the trouble is that 'get_image_binary' takes a while to run, and the page loads before the image is ready to be served. I want the image to load asynchronously, only once it is ready.
Again, I am sure there are many ways to do this, probably something built into angular itself, but it is sort of difficult to phrase this into a google-able search.
Can't speak to the flask stuff, but below is some AngularJS code.
This directive won't replace the source attribute until after Angular manipulates the DOM and the browser renders (AngularJS : $evalAsync vs $timeout).
HTML:
<div ng-controller="MyController">
<img lazy-load ll-src="http://i.imgur.com/WwPPm0p.jpg" />
</div>
JS:
angular.module('myApp', [])
.controller('MyController', function($scope) {})
.directive('lazyLoad', function($timeout) {
return {
restrict:'A',
scope: {},
link: function(scope, elem, attrs) {
$timeout(function(){ elem.attr('src', attrs.llSrc) });
},
}
});
Same code in a working JSFiddle
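On the controller side of the question, binding raw PNG bytes to {{ image }} would print them as text rather than draw a picture. One possible approach, sketched here under assumptions, is to request the bytes as an ArrayBuffer (AngularJS's $http accepts a responseType: 'arraybuffer' config option) and turn them into a data URI that can be bound with ng-src. The helper name toDataUri is hypothetical, and the four sample bytes are only the start of the PNG signature, not a complete image.

```javascript
// Hypothetical helper: encode raw bytes as a data URI for an <img> src.
function toDataUri(buffer, mimeType) {
  const bytes = new Uint8Array(buffer);
  let binary = "";
  // Build a binary string one byte at a time so btoa can encode it.
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return "data:" + mimeType + ";base64," + btoa(binary);
}

// Sample input: the first four bytes of the PNG signature.
const sample = new Uint8Array([0x89, 0x50, 0x4e, 0x47]).buffer;
const uri = toDataUri(sample, "image/png");
console.log(uri);
```

In the controller, the success callback could then assign toDataUri(data, 'image/png') to $scope.image, and the template could use an img element with ng-src="{{ image }}" so that the picture appears only once the response has arrived.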
I've an ajax code like this:
var req = new XMLHttpRequest();
req.open('GET', 'http://www.example.org/', false);
req.send(null);
if (req.status == 200) {
// Read the response from the same request object that was sent.
var response = req.responseText;
document.getElementById('divAttendance').innerHTML = response;
}
When the result arrives, Firefox shows the DOM elements inside 'divAttendance'. But if I want to apply a jQuery effect to the result, I am not able to.
The DOM elements are clearly visible in Firebug. But when I view the source of the page, there is no response text in 'divAttendance'. It is blank, like this:
<html>
....
..
<div id="divAttendance"></div>
..
..
</html>
How can I manipulate that result or apply an effect to it?
Well, if you are using jQuery, then you should be using jQuery's ajax anyway:
http://api.jquery.com/jQuery.ajax/
Regardless, if you populate your div from an AJAX response, the content will not show up in "View Source"; you will have to use a tool like Firebug to see it.
Your div should initially look like this:
<div id="divAttendance" style="display:none"></div>
and your JavaScript should contain the following:
.....
document.getElementById('divAttendance').innerHTML = response;
$("#divAttendance").show("slow");
For such operations, jQuery's load is an easy and useful function; have a look at
http://api.jquery.com/load/
Your specific example can be rewritten as
<html>
....
IMPORT JQUERY.JS
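<script language="javascript">
// Run only once the DOM is ready, because the div is defined after this script.
$(function() {
$('#divAttendance').load('http://www.example.org/', function() {
$("#divAttendance").show("slow");
});
});
</script>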
..
<div id="divAttendance" style="display:none"></div>
..
..
</html>