The data on the webpage is rendered dynamically. Checking for every change in the HTML and extracting the data from it is a daunting task and would force me to rely on very unreliable XPaths, so I would like to extract the data from the XHR packets instead.
I hope to be able to extract information from XHR packets as well as generate XHR packets to be sent to the server.
Extracting the information is the more important part for me, because sending information can be handled easily by triggering HTML elements automatically with CasperJS.
I'm attaching a screenshot of what I mean.
The text in the response tab is the data I need to process afterwards. (This XHR response has been received from the server.)
This is not easily possible, because the resource.received event handler only provides meta data like url, headers or status, but not the actual data. The underlying phantomjs event handler acts the same way.
Stateless AJAX Request
If the AJAX call is stateless, you may simply repeat the request:
casper.on("resource.received", function(resource){
// somehow identify this request, here: if it contains ".json"
// only act when the stage is "end", otherwise this would be executed twice
if (resource.url.indexOf(".json") != -1 && resource.stage == "end") {
var data = casper.evaluate(function(url){
// synchronous GET request
return __utils__.sendAJAX(url, "GET");
}, resource.url);
// do something with data, you might need to JSON.parse(data)
}
});
casper.start(url); // your script
You may want to add the event listener to resource.requested instead. That way you don't need to wait for the call to complete.
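A minimal sketch of the resource.requested variant, assuming the same ".json" URL filter as above. The wrapper watchJsonRequests, its onData callback and the seen guard are illustrative names, not part of the CasperJS API. The guard is included on the assumption that the repeated request may itself fire resource.requested.

```javascript
// Hypothetical helper: repeat every matching request as soon as the page requests it.
function watchJsonRequests(casper, onData) {
    var seen = {}; // URLs already repeated, so our own repeat is not handled again
    casper.on("resource.requested", function (requestData) {
        var url = requestData.url;
        if (url.indexOf(".json") === -1 || seen[url]) {
            return;
        }
        seen[url] = true;
        var data = casper.evaluate(function (url) {
            // synchronous GET request inside the page context
            return __utils__.sendAJAX(url, "GET");
        }, url);
        onData(url, data);
    });
}
```

It would be registered before casper.start(url), for example watchJsonRequests(casper, function (url, data) { /* JSON.parse(data) if needed */ });. Note that the guard also skips later legitimate requests to the same URL, so it would have to be relaxed for pages that poll one endpoint.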
You can also do this right inside of the control flow like this (source: A: CasperJS waitForResource: how to get the resource i've waited for):
casper.start(url);
var res, resData;
casper.waitForResource(function check(resource){
res = resource;
return resource.url.indexOf(".json") != -1;
}, function then(){
resData = casper.evaluate(function(url){
// synchronous GET request
return __utils__.sendAJAX(url, "GET");
}, res.url);
// do something with the data here or in a later step
});
casper.run();
Stateful AJAX Request
If it is not stateless, you would need to replace the implementation of XMLHttpRequest. You will need to inject your own implementation of the onreadystatechange handler, collect the information in the page window object and later collect it in another evaluate call.
You may want to look at the XHR faker in sinon.js or use the following complete proxy for XMLHttpRequest (I modeled it after method 3 from How can I create a XMLHttpRequest wrapper/proxy?):
function replaceXHR(){
(function(window, debug){
function args(a){
var s = "";
for(var i = 0; i < a.length; i++) {
s += "\t\n[" + i + "] => " + a[i];
}
return s;
}
var _XMLHttpRequest = window.XMLHttpRequest;
window.XMLHttpRequest = function() {
this.xhr = new _XMLHttpRequest();
}
// proxy ALL methods/properties
var methods = [
"open",
"abort",
"setRequestHeader",
"send",
"addEventListener",
"removeEventListener",
"getResponseHeader",
"getAllResponseHeaders",
"dispatchEvent",
"overrideMimeType"
];
methods.forEach(function(method){
window.XMLHttpRequest.prototype[method] = function() {
if (debug) console.log("ARGUMENTS", method, args(arguments));
if (method == "open") {
this._url = arguments[1];
}
return this.xhr[method].apply(this.xhr, arguments);
}
});
// proxy change event handler
Object.defineProperty(window.XMLHttpRequest.prototype, "onreadystatechange", {
get: function(){
// this will probably never be called
return this.xhr.onreadystatechange;
},
set: function(onreadystatechange){
var that = this.xhr;
var realThis = this;
that.onreadystatechange = function(){
// request is fully loaded
if (that.readyState == 4) {
if (debug) console.log("RESPONSE RECEIVED:", typeof that.responseText == "string" ? that.responseText.length : "none");
// there is a response and filter execution based on url
if (that.responseText && realThis._url.indexOf("whatever") != -1) {
window.myAwesomeResponse = that.responseText;
}
}
onreadystatechange.call(that);
};
}
});
var otherscalars = [
"onabort",
"onerror",
"onload",
"onloadstart",
"onloadend",
"onprogress",
"readyState",
"responseText",
"responseType",
"responseXML",
"status",
"statusText",
"upload",
"withCredentials",
"DONE",
"UNSENT",
"HEADERS_RECEIVED",
"LOADING",
"OPENED"
];
otherscalars.forEach(function(scalar){
Object.defineProperty(window.XMLHttpRequest.prototype, scalar, {
get: function(){
return this.xhr[scalar];
},
set: function(obj){
this.xhr[scalar] = obj;
}
});
});
})(window, false);
}
If you want to capture the AJAX calls from the very beginning, you need to add this to one of the first event handlers:
casper.on("page.initialized", function(resource){
this.evaluate(replaceXHR);
});
or evaluate(replaceXHR) when you need it.
The control flow would look like this:
function replaceXHR(){ /* from above*/ }
casper.start(yourUrl, function(){
this.evaluate(replaceXHR);
});
function getAwesomeResponse(){
// use casper rather than this: the function is also called directly in then() below
return casper.evaluate(function(){
return window.myAwesomeResponse;
});
}
// stops waiting if window.myAwesomeResponse is something that evaluates to true
casper.waitFor(getAwesomeResponse, function then(){
var data = JSON.parse(getAwesomeResponse());
// Do something with data
});
casper.run();
As described above, I create a proxy for XMLHttpRequest so that every time it is used on the page, I can do something with it. The page that you scrape uses the xhr.onreadystatechange callback to receive data. The proxying is done by defining a setter function for that property which writes the received data to window.myAwesomeResponse in the page context. The only thing left to do is retrieve this text.
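The setter trick on its own can be tried outside a browser. The following is a reduced sketch, not the full proxy from above: FakeXHR stands in for the page's XMLHttpRequest, and makeRecordingXHR and store are illustrative names (store plays the role of window.myAwesomeResponse).

```javascript
// Stand-in for the page's XMLHttpRequest so the sketch runs outside a browser.
function FakeXHR() {
    this.readyState = 0;
    this.responseText = "";
    this.onreadystatechange = null;
}
// Simulates the browser finishing the request.
FakeXHR.prototype.finish = function (text) {
    this.readyState = 4;
    this.responseText = text;
    if (this.onreadystatechange) this.onreadystatechange();
};

// Wrap Original so that assigning onreadystatechange installs a recording handler first.
function makeRecordingXHR(Original, store) {
    function Recording() { this.xhr = new Original(); }
    Object.defineProperty(Recording.prototype, "onreadystatechange", {
        get: function () { return this.xhr.onreadystatechange; },
        set: function (pageHandler) {
            var real = this.xhr;
            real.onreadystatechange = function () {
                if (real.readyState === 4 && real.responseText) {
                    store.lastResponse = real.responseText; // record the body
                }
                pageHandler.call(real); // the page's own handler still runs
            };
        }
    });
    return Recording;
}

var store = {};
var RecordingXHR = makeRecordingXHR(FakeXHR, store);
var request = new RecordingXHR();
request.onreadystatechange = function () { /* page code would read the response here */ };
request.xhr.finish('{"ok":true}');
console.log(store.lastResponse); // prints {"ok":true}
```

The page never notices the difference because its handler is still invoked with the real object as this; the recording happens just before that call.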
JSONP Request
Writing a proxy for JSONP is even easier if you know the prefix (the function to call with the loaded JSON, e.g. insert({"data":["Some", "JSON", "here"],"id":"asdasda"})). You can overwrite insert in the page context
after the page is loaded
casper.start(url).then(function(){
this.evaluate(function(){
var oldInsert = insert;
insert = function(json){
window.myAwesomeResponse = json;
oldInsert.apply(window, arguments);
};
});
}).waitFor(getAwesomeResponse, function then(){
var data = getAwesomeResponse(); // already an object: insert() received parsed JSON, not a string
// Do something with data
}).run();
or before the request is received (if the function is registered just before the request is invoked)
casper.on("resource.requested", function(resource){
// filter on the correct call
if (resource.url.indexOf(".jsonp") != -1) {
this.evaluate(function(){
var oldInsert = insert;
insert = function(json){
window.myAwesomeResponse = json;
oldInsert.apply(window, arguments);
};
});
}
});
casper.start(url).waitFor(getAwesomeResponse, function then(){
var data = getAwesomeResponse(); // already an object: insert() received parsed JSON, not a string
// Do something with data
}).run();
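The wrapping pattern used in both JSONP variants can be sketched in isolation. Here page is a plain object standing in for window, and captureJsonp and store are illustrative names, so this only models what happens inside the evaluate call.

```javascript
// Wrap a global JSONP callback so each payload is recorded before the page sees it.
function captureJsonp(scope, callbackName, store) {
    var original = scope[callbackName];
    scope[callbackName] = function (json) {
        store.lastResponse = json; // already a parsed object, not a string
        return original.apply(scope, arguments);
    };
}

// Stand-ins for the page: `page` plays the role of window, `insert` the JSONP callback.
var page = {
    received: [],
    insert: function (json) { page.received.push(json.id); }
};
var store = {};
captureJsonp(page, "insert", store);

// This is the call the loaded JSONP script would make.
page.insert({ data: ["Some", "JSON", "here"], id: "asdasda" });
console.log(store.lastResponse.id, page.received.length); // prints: asdasda 1
```

Because the JSONP script calls the function with an object literal, the recorded value is an object and does not need to be parsed again.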
I may be late to the party, but this answer may help someone who runs into the same problem in the future.
I started with PhantomJS, then moved to CasperJS, but finally settled on SlimerJS. Slimer offers an API similar to Phantom's (it runs on Gecko rather than WebKit), is compatible with Casper, and can give you the response body through the same onResourceReceived callback, in the response.body property.
Reference: https://docs.slimerjs.org/current/api/webpage.html#webpage-onresourcereceived
@Artjom's answer doesn't work for me in recent Chrome and CasperJS versions.
Based on @Artjom's answer and on gilly3's answer on how to replace XMLHttpRequest, I have composed a new solution that should work in most, if not all, versions of the different browsers. It works for me.
SlimerJS does not work with newer versions of Firefox, so it was not an option for me.
Here is the generic code to add a listener for the load of an XHR (not dependent on CasperJS):
var addXHRListener = function (XHROnStateChange) {
var XHROnLoad = function () {
if (this.readyState == 4) {
XHROnStateChange(this)
}
}
var open_original = XMLHttpRequest.prototype.open;
XMLHttpRequest.prototype.open = function (method, url, async, unk1, unk2) {
this.requestUrl = url
open_original.apply(this, arguments);
};
var xhrSend = XMLHttpRequest.prototype.send;
XMLHttpRequest.prototype.send = function () {
var xhr = this;
if (xhr.addEventListener) {
xhr.removeEventListener("readystatechange", XHROnLoad);
xhr.addEventListener("readystatechange", XHROnLoad, false);
} else {
function readyStateChange() {
if (handler) {
if (handler.handleEvent) {
handler.handleEvent.apply(xhr, arguments);
} else {
handler.apply(xhr, arguments);
}
}
XHROnLoad.apply(xhr, arguments);
setReadyStateChange();
}
function setReadyStateChange() {
setTimeout(function () {
if (xhr.onreadystatechange != readyStateChange) {
handler = xhr.onreadystatechange;
xhr.onreadystatechange = readyStateChange;
}
}, 1);
}
var handler;
setReadyStateChange();
}
xhrSend.apply(xhr, arguments);
};
}
Here is CasperJS code to emit a custom event on load of XHR:
casper.on("page.initialized", function (resource) {
var emitXHRLoad = function (xhr) {
window.callPhantom({eventName: 'xhr.load', eventData: xhr})
}
this.evaluate(addXHRListener, emitXHRLoad);
});
casper.on('remote.callback', function (data) {
casper.emit(data.eventName, data.eventData)
});
Here is code to listen to the "xhr.load" event and get the XHR response body:
casper.on('xhr.load', function (xhr) {
console.log('xhr load', xhr.requestUrl)
console.log('xhr load', xhr.responseText)
});
You can also download the content directly and manipulate it later.
Here is an example of the script I am using to retrieve a JSON file and save it locally:
var casper = require('casper').create({
pageSettings: {
webSecurityEnabled: false
}
});
var url = 'https://twitter.com/users/username_available?username=whatever';
casper.start('about:blank', function() {
this.download(url, "hop.json");
});
casper.run(function() {
this.echo('Done.').exit();
});
I'm using AJAX to pull in content for a lightbox-style content page. Once loaded, it also has next and prev buttons, hence the use of AJAX.
I wanted to use AJAX for the page navigation too. But if someone clicks a page link and then tries to use the lightbox feature, neither the jQuery handlers nor the AJAX requests work within the loaded area any more.
I've read a lot about bind and delegate, but I'm not sure how to use them in this context.
Here's my main pieces of code:
// This gets called on document ready
function clicky() {
$link.click(function(e) {
e.preventDefault();
var linkPage = $(this).attr('href');
if ($(this).hasClass('pages')){
// PAGE specific code
if ($('body').scrollTop() != 0) {
$('body').animate({ scrollTop: 0 }, 500, function(){
pageLoad(linkPage);
});
} else { pageLoad(linkPage); }
console.log('page');
} else {
// PRODUCT specific code
if ($('body').scrollTop() != 0) {
$('body').animate({ scrollTop: 0 }, 500, function(){
productLoad(linkPage);
});
} else {
productLoad(linkPage);
}
}
});
}
Here's my ajax for the two different areas:
// Ajax stuff going on for pages
function pageLoad(linkPage) {
// Page stuff fades out
history.pushState(null, null, linkPage);
$("#page-content").load(linkPage + " #guts", function(){
// Loads in page content
});
}
// Ajax stuff going on for Products
function productLoad(linkPage) {
// Page stuff fades out
history.pushState(null, null, linkPage);
$("#product-content").load(linkPage + " #guts", function(){
// Shows an overlay/lightbox and loads in content
});
}
Edit: This worked for me
$(document).on('click', '.link' , function(){
console.log('this worked');
return false;
});
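The delegated form keeps working because the handler is attached once to document, and the selector is checked at click time against the element that was actually clicked, so links inserted later by .load() are covered. A rough model of that lookup, using plain objects in place of DOM nodes (closestMatch and the node shape are illustrative, not jQuery internals):

```javascript
// Walk from the clicked node up to the root and return the first node that matches.
function closestMatch(node, matches, root) {
    while (node && node !== root) {
        if (matches(node)) return node;
        node = node.parent;
    }
    return null;
}

var root = { name: "document", parent: null };
var container = { name: "#page-content", parent: root };
// This link is "loaded" after the handler was registered on root.
var link = { name: "a.link", className: "link", parent: container };
var span = { name: "span", parent: link };

var hasLinkClass = function (n) { return n.className === "link"; };
var hit = closestMatch(span, hasLinkClass, root);
console.log(hit && hit.name); // prints a.link
```

In the question's code this would mean replacing $link.click(...) inside clicky() with a single $(document).on('click', ...) registration using the links' selector.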
I've got two questions. First: how can I reduce this code?
$('#m').click(function() {
var href = $(this).attr('href');
$('#con').hide().load('inc/main.php').fadeIn('normal');
return false;
});
$('#b').click(function() {
var href = $(this).attr('href');
$('#con').hide().load('inc/blog.php').fadeIn('normal');
return false;
});
$('#p').click(function() {
var href = $(this).attr('href');
$('#con').hide().load('inc/portfolio.php').fadeIn('normal');
return false;
});
$('#l').click(function() {
var href = $(this).attr('href');
$('#con').hide().load('inc/lebenslauf.php').fadeIn('normal');
return false;
});
$('#k').click(function() {
var href = $(this).attr('href');
$('#con').hide().load('inc/kontakt.php').fadeIn('normal');
return false;
});
I'm using a lib called perfect scrollbar. It is included this way:
$(document).ready(function(a){a("#scrollbox").perfectScrollbar({wheelSpeed:20,wheelPropagation:!1})});
When main.php is loaded in with this script, the scrollbar is not there as it should be. That's because the document doesn't refresh as usual. What do I need to write to get it working when the content is loaded in?
Write a helper function and pass each selector and file path to it:
$('#m').click(function() {
    return helperfunction($(this), 'inc/main.php');
});
function helperfunction(selector, phpfilepath) {
    var href = selector.attr('href'); // kept from the question; not used below
    $('#con').hide().load(phpfilepath).fadeIn('normal');
    return false; // returned to the click handler to prevent the default navigation
}
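The five handlers could also be collapsed into one by keeping the id-to-file mapping in an object. This is a sketch under assumptions: bindNavigation and navSelector are made-up names, $ is passed in as a parameter, and the scrollbar is re-created in the load callback on the assumption that #scrollbox is part of the loaded content.

```javascript
// id of each navigation link -> file to load (taken from the question)
var pages = {
    m: "inc/main.php",
    b: "inc/blog.php",
    p: "inc/portfolio.php",
    l: "inc/lebenslauf.php",
    k: "inc/kontakt.php"
};

// Build one selector such as "#m, #b, ..." from the map keys.
function navSelector(pages) {
    return Object.keys(pages).map(function (id) { return "#" + id; }).join(", ");
}

// $ is passed in so the sketch does not depend on a global jQuery.
function bindNavigation($, pages) {
    $(navSelector(pages)).click(function () {
        $("#con").hide().load(pages[this.id], function () {
            // runs after the new content is in the DOM: set up the scrollbar again
            $("#scrollbox").perfectScrollbar({ wheelSpeed: 20, wheelPropagation: false });
        }).fadeIn("normal");
        return false;
    });
}
```

On the page it would be called once as bindNavigation(jQuery, pages) inside the document ready handler.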
On my site I use one core/frame PHP file. When a user clicks one of my links (contact, about, etc.), the content is loaded via AJAX. I use the following snippet to achieve this:
var AjaxContent = function(){
var container_div = '';
var content_div = '';
return {
getContent : function(url){
$(container_div).animate({opacity:0},
function(){ // the callback, loads the content with ajax
$(container_div).load(url, //only loads the selected portion
function(){
$(container_div).animate({opacity:1});
}
);
});
},
ajaxify_links: function(elements){
$(elements).click(function(){
AjaxContent.getContent(this.href);
return false;
});
},
init: function(params){
container_div = params.containerDiv;
content_div = params.contentDiv;
return this;
}
}
}();
I need help integrating a preloader, so that visitors who click one of my links (for example the gallery menu) see a small loading image. At the moment they see a big white nothing for long, long seconds.
Add a loading image before the AJAX call. Once the response arrives, .load() replaces the image with the data from the server:
function(){ // the callback, loads the content with ajax
    $(container_div).html("<img src='loading.gif' />"); // show the image before the ajax call
    $(container_div).load(url, // .load() replaces the image with the server response
        function(){
            $(container_div).animate({opacity:1});
        }
    );
}
Try this
var AjaxContent = function(){
var container_div = '';
var content_div = '';
return {
getContent : function(url){
$(container_div).html('Loading...'); //replace with your loading img html code
$(container_div).load(url, //only loads the selected portion
function(){
$(container_div).css({opacity:0});
$(container_div).animate({opacity:1});
});
},
ajaxify_links: function(elements){
$(elements).click(function(){
AjaxContent.getContent(this.href);
return false;
});
},
init: function(params){
container_div = params.containerDiv;
content_div = params.contentDiv;
return this;
}
}
}();
The variable ajaxdata is set within the success function. If that hasn't happened yet, I would like to wait 2 seconds and then continue without it.
The use case is a jQuery UI autocomplete field. The autocomplete source is an AJAX request, but if the user types quickly and leaves the field before the list loads, the field remains unset. In the autocomplete's 'change' event I check whether the user entered a valid option without selecting it, but this doesn't work if the source hasn't loaded when the change event fires. So I would like to put a delay in the change function that waits only if the source (stored in the variable 'ajaxdata') is empty.
code:
input.autocomplete({
source: function (request, response){
$.ajax(
{
type: "GET",
url: "/some/url",
dataType: "json",
success: function(data){
response($.map(data,function(item){
return{
label: item.label,
value: item.value
}
}));
ajaxdata = data;
}
}
);
// ajaxopts = ajaxsource(request,response,ajaxurl,xtraqry)
},
change: function(event, ui) {
if (!ui.item) {
// user didn't select an option, but what they typed may still match
var enteredString = $(this).val();
var stringMatch = false;
if (ajaxdata.length==0){
/// THIS IS WHERE I NEED A 2 SECOND DELAY
}
var opts = ajaxdata;
for (var i=0; i < opts.length; i++){
if(opts[i].label.toLowerCase() == enteredString.toLowerCase()){
$(this).val(opts[i].label);// corrects any incorrect case
stringMatch = true;
break;
}
}
}
},
});
Edit:
To be more specific about the problem: This delay needs to be conditional. Meaning that if the data is already loaded (either because it came from a static source, or from an earlier ajax call) I do not want to have a delay.
If I'm understanding you properly, I think you just want to check and see if ajaxdata has been populated; but if it hasn't, only wait two more seconds and then just proceed without it.
Try this:
change: function(event, ui) {
if (!ui.item) {
// user didn't select an option, but what they typed may still match
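var field = this; // keep a reference: inside the timer callback, this is no longer the input
if (ajaxdata.length==0){
// data not loaded yet: wait 2 seconds, then try once with whatever has arrived
setTimeout(function() {correctCase(field);}, 2000);
} else {
// data already loaded: no delay needed
correctCase(field);
}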
}
}
. . . . .
function correctCase(inThis){
//I'm not sure what this variable does. do you really need it???
var stringMatch = false;
var enteredString = $(inThis).val();
//you still want to be sure that ajaxdata is not empty here
if (ajaxdata.length!=0){
var opts = ajaxdata;
for (var i=0; i < opts.length; i++){
if(opts[i].label.toLowerCase() == enteredString.toLowerCase()){
$(inThis).val(opts[i].label); // corrects any incorrect case
stringMatch = true; //this variable doesn't seem to do anything after this???
break;
}
}
}
}
I'm not really sure what it is you're trying to do, but I'm pretty sure something like this would be a better way of doing it :
var pending; // jqXHR of the most recent source request
input.autocomplete({
    source: function(request, response) {
        pending = $.ajax({
            type: "GET",
            url: "/some/url",
            dataType: "json"
        }).done(function(data) {
            // hand the options to the autocomplete widget
            response($.map(data, function(item) {
                return {
                    label: item.label,
                    value: item.value
                };
            }));
        });
    },
    change: function(event, ui) {
        if (!ui.item && pending) {
            // user didn't select an option, but what they typed may still match
            var field = this;
            var enteredString = field.value;
            // runs immediately if the request is already complete, otherwise once it completes
            pending.done(function(data) {
                for (var i = 0; i < data.length; i++) {
                    if (data[i].label.toLowerCase() == enteredString.toLowerCase()) {
                        $(field).val(data[i].label); // corrects any incorrect case
                        break;
                    }
                }
            });
        }
    }
});
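By default, JavaScript does not wait for asynchronous work: when it encounters an asynchronous call such as setTimeout or an AJAX request, it queues the callback for later and carries on with the next statement.
If you want a function to pause until that work (an AJAX call or anything else) has finished, you can use promises with async/await.
Case 1: outputs hello (does not wait for setTimeout)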
https://jsfiddle.net/shashankgpt270/h0vr53qy/
// without a promise: alert runs before the timer fires
function myFunction() {
    let result1 = 'hello';
    setTimeout(function(){
        result1 = "done1";
    }, 3000);
    alert(result1); // "hello"
}
myFunction();
Case 2: outputs done1 (waits for setTimeout)
https://jsfiddle.net/shashankgpt270/1o79fudt/
async function myFunction() {
    let result1 = 'hello';
    const promise = new Promise((resolve, reject) => {
        setTimeout(function(){
            resolve("done");
            result1 = "done1";
        }, 3000);
    });
    await promise; // pauses here until resolve() has been called
    alert(result1); // "done1"
}
myFunction();
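Applied to the original question (wait at most two seconds for the data, then carry on without it), the same idea can be combined with Promise.race. This is a sketch under assumptions: withTimeout is a made-up helper, and the promises in demo stand in for an AJAX request that has or has not finished yet.

```javascript
// Resolve with `fallback` if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, fallback) {
    var timer;
    var timeout = new Promise(function (resolve) {
        timer = setTimeout(function () { resolve(fallback); }, ms);
    });
    // whichever settles first wins; the timer is cleared either way
    return Promise.race([promise, timeout]).finally(function () {
        clearTimeout(timer);
    });
}

async function demo() {
    // data that is already there: no waiting happens
    var fast = Promise.resolve(["apple"]);
    // data that arrives too late: the fallback is used instead
    var slow = new Promise(function (resolve) {
        setTimeout(function () { resolve(["late"]); }, 200);
    });
    var a = await withTimeout(fast, 50, []);
    var b = await withTimeout(slow, 50, []);
    console.log(a, b);
    return [a, b];
}
var demoResult = demo();
```

In the change handler this would read roughly as var opts = await withTimeout(dataReady, 2000, []); where dataReady is a promise (for example the jqXHR returned by $.ajax) and the handler is declared async. Data that is already loaded resolves at once, so the delay only applies when the request is still in flight.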