This example in the CasperJS manual shows how to scrape URLs from Google, and it shows the URLs coming out nice and clean. However, when I run this example, my output looks like this:
20 links found:
- /url?q=http://casperjs.org/&sa=U&ei=_-TuU-KBC83-yQSu5YKQAg&ved=0CBQQFjAA&usg=AFQjCNH321k0JXrSx5WZp-fH6JwxX-O75Q
- /url?q=http://code4fun.fr/tutoriel-casperjs/&sa=U&ei=_-TuU-KBC83-yQSu5YKQAg&ved=0CBoQFjAB&usg=AFQjCNHreU-9mg7OZxK3TOl94HDPOnA_aQ
- /url?q=http://casperjs.readthedocs.org/&sa=U&ei=_-TuU-KBC83-yQSu5YKQAg&ved=0CCEQFjAC&usg=AFQjCNGzX6V5ZQtmCwHwZerHR3ftK3pHOw
- /url?q=https://github.com/n1k0/casperjs&sa=U&ei=_-TuU-KBC83-yQSu5YKQAg&ved=0CCcQFjAD&usg=AFQjCNEiGMDpYiPm1qXK7ZxDCwWwKjAStg
- /url?q=http://www.technologies-ebusiness.com/enjeux-et-tendances/casperjs-pour-des-tests-d-integration&sa=U&ei=_-TuU-KBC83-yQSu5YKQAg&ved=0CC4QFjAE&usg=AFQjCNFOGl1p6ApqP8TmAxhtQp33DHpbcQ
- /url?q=https://www.lullabot.com/blog/article/testing-front-end-casperjs&sa=U&ei=_-TuU-KBC83-yQSu5YKQAg&ved=0CDQQFjAF&usg=AFQjCNG53ZxHl8yZ0JGdzNbwKuZmPOLqCg
- /url?q=http://blog.newrelic.com/2013/06/04/simpler-ui-testing-with-casperjs-2/&sa=U&ei=_-TuU-KBC83-yQSu5YKQAg&ved=0CDoQFjAG&usg=AFQjCNFzlDb7R4Uv-jj_3S5IbJUpKF-7fA
- /url?q=https://www.npmjs.org/package/grunt-casperjs&sa=U&ei=_-TuU-KBC83-yQSu5YKQAg&ved=0CEEQFjAH&usg=AFQjCNGn-dwJpkX_XTQv8YnFZTClcLosJA
- /url?q=http://www.phase2technology.com/blog/behavorial-test-for-custom-entity-using-casperjs/&sa=U&ei=_-TuU-KBC83-yQSu5YKQAg&ved=0CEcQFjAI&usg=AFQjCNFG0KDAADmocesrDoqQTHW6PPO8KQ
- /url?q=http://blog.codeship.io/2013/03/07/smoke-testing-with-casperjs.html&sa=U&ei=_-TuU-KBC83-yQSu5YKQAg&ved=0CE4QFjAJ&usg=AFQjCNG5AsT2iKCnN-utrCGsthCZCpYKaQ
- /url?q=http://phantomjs.org/&sa=U&ei=_-TuU9yhG4iZyASb8oK4Dw&ved=0CBQQFjAA&usg=AFQjCNGXz7tw-UkfDOpqvYV89KlcJPGfHQ
- /url?q=http://phantomjs.org/download.html&sa=U&ei=_-TuU9yhG4iZyASb8oK4Dw&ved=0CB8QFjAB&usg=AFQjCNG_czKcYiFKskAvoRl1CceXuTJecA
- /url?q=http://www.mathieurobin.com/2013/04/phantomjs-chargez-et-jouez-avec-vos-sites-en-js-sans-quitter-la-console/&sa=U&ei=_-TuU9yhG4iZyASb8oK4Dw&ved=0CCUQFjAC&usg=AFQjCNEAtYz0zcsRVYy-37U9sJL7e9EqYQ
- /url?q=http://svay.com/blog/paris-js-10-introduction-a-phantomjs-un-navigateur-webkit-headless/&sa=U&ei=_-TuU9yhG4iZyASb8oK4Dw&ved=0CCsQFjAD&usg=AFQjCNE9dUuVQmNpK064a9GPJyOIetUWAA
- /url?q=https://github.com/ariya/phantomjs&sa=U&ei=_-TuU9yhG4iZyASb8oK4Dw&ved=0CDIQFjAE&usg=AFQjCNErqnWYxIVwBwXeUjaSd4SFicQqpw
- /url?q=https://github.com/gruntjs/grunt-lib-phantomjs&sa=U&ei=_-TuU9yhG4iZyASb8oK4Dw&ved=0CDkQFjAF&usg=AFQjCNHkRVx926JJkKhdoKxKsKVcQc-QTg
- /url?q=http://blog.octo.com/seo-spa-angular/&sa=U&ei=_-TuU9yhG4iZyASb8oK4Dw&ved=0CD8QFjAG&usg=AFQjCNFcj-ykUo-rSQKlcEZIy1qjSlW-oQ
- /url?q=https://www.npmjs.org/package/phantomjs&sa=U&ei=_-TuU9yhG4iZyASb8oK4Dw&ved=0CEYQFjAH&usg=AFQjCNGweWRdm8qjqxOOybFgtz5B8CnMDQ
- /url?q=http://code.google.com/p/phantomjs/&sa=U&ei=_-TuU9yhG4iZyASb8oK4Dw&ved=0CEwQFjAI&usg=AFQjCNEwvNx7NNMDAaiqHZ_y-3Bbf62W_w
- /url?q=http://casperjs.org/&sa=U&ei=_-TuU9yhG4iZyASb8oK4Dw&ved=0CE4QFjAJ&usg=AFQjCNGNEKkl1eWaFx9Sz6R7ZFVN9r1Bhw
Here's their code I ran:
var links = [];
var casper = require('casper').create();
function getLinks() {
    var links = document.querySelectorAll('h3.r a');
    return Array.prototype.map.call(links, function(e) {
        return e.getAttribute('href');
    });
}
casper.start('http://google.fr/', function() {
    // search for 'casperjs' from google form
    this.fill('form[action="/search"]', { q: 'casperjs' }, true);
});
casper.then(function() {
    // aggregate results for the 'casperjs' search
    links = this.evaluate(getLinks);
    // now search for 'phantomjs' by filling the form again
    this.fill('form[action="/search"]', { q: 'phantomjs' }, true);
});
casper.then(function() {
    // aggregate results for the 'phantomjs' search
    links = links.concat(this.evaluate(getLinks));
});
casper.run(function() {
    // echo results in some pretty fashion
    this.echo(links.length + ' links found:');
    this.echo(' - ' + links.join('\n - ')).exit();
});
Can someone explain what is going on and why my urls are not clean like theirs?
The example is fine: you are getting the URLs, just with a little bit of noise. It looks like Google changed the hrefs in the meantime, so each result now points to a /url?q=<target>&... redirect. So you can just add
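links = links.map(function(link) {
    // keep the text between "/url?q=" and Google's "&sa=U&ei=" tracking parameters
    var end = link.indexOf("&sa=U&ei=");
    if (end === -1) { end = link.length; } // no tracking parameters: keep the whole href
    return decodeURIComponent(link.substring(0, end).replace("/url?q=", ""));
});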
before joining the links in the last step.
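The fix above relies on "&sa=U&ei=" coming right after the target. A slightly more tolerant sketch reads the q parameter wherever it appears in the query string. It assumes Google keeps the /url?q=<target>&... format seen in the output above; extractGoogleTarget is a hypothetical helper, not part of CasperJS, and it is plain ES5 so it can run in the CasperJS/PhantomJS script context:

```javascript
// Hypothetical helper (not part of CasperJS): returns the real target of a
// Google redirect href such as "/url?q=http://casperjs.org/&sa=U&ei=...".
// Assumes the "/url?q=<target>&..." format shown in the question's output.
function extractGoogleTarget(href) {
    var prefix = '/url?';
    if (typeof href !== 'string' || href.indexOf(prefix) !== 0) {
        return href; // not a Google redirect link: leave it unchanged
    }
    // Look at each name=value pair of the query string, in any order
    var pairs = href.substring(prefix.length).split('&');
    for (var i = 0; i < pairs.length; i++) {
        var eq = pairs[i].indexOf('=');
        if (eq !== -1 && pairs[i].substring(0, eq) === 'q') {
            // the target is percent-encoded inside the q parameter
            return decodeURIComponent(pairs[i].substring(eq + 1));
        }
    }
    return href; // no q parameter: leave it unchanged
}
```

In the example's last step you could then write links = links.map(extractGoogleTarget); before joining. Links that are not redirects pass through untouched.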
I actually found a workaround by changing the getLinks() function. It works great and should be a bit more versatile for what I need: a combination of split() and pop() gets you what you want.
Code:
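var casper = require('casper').create();
var links = [];
// googleSearch holds the Google results URL; its value is not shown in this answer
function getLinks() {
    var links = document.querySelectorAll('h3.r a');
    return Array.prototype.map.call(links, function(e) {
        // keep what follows "/url?q=", cut it at the first "&" (Google's tracking
        // parameters), then decode the percent-encoded target
        return decodeURIComponent(e.getAttribute('href').split('/url?q=').pop().split('&')[0]);
    });
}
casper.start(googleSearch, function() {
    links = this.evaluate(getLinks);
});
casper.run(function() {
    // echo results in some pretty fashion
    this.echo(links.length + ' links found:');
    this.echo(links.join('\n')).exit();
});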
I'm using the raw option to echo formatted data fetched from URLs into the terminal, like this:
$(function() {
    var save_state = [];
    var terminal = $('#term').terminal(function(command, term) {
        term.pause();
        url = ...;
        $.get(url, function(result) {
            term.echo(result, {raw: true}).resume();
        });
    }, { prompt: '>>', name: 'test', outputLimit: 1000 });
});
How can I make it so that, when links in the result are clicked, they load their data into the terminal the same way command output is loaded, rather than opening a new browser tab?
Thanks!
If you have a command that takes a URL or URI (for instance get foo.html or get https://example.com), you can use this:
terminal.on('click', '.terminal-output > div:not(.exception) a', function() {
    // passing true runs the command silently; without it the command is
    // echoed as if you had typed it.
    // Instead of `get` you can use any command you have that fetches
    // the URL and displays it in the terminal.
    terminal.exec('get ' + $(this).attr('href'), true);
    return false; // prevent the browser from following the link
});
If you have different logic for displaying the URLs, you may need to duplicate the code from your interpreter inside the click event handler.
terminal.on('click', '.terminal-output > div:not(.exception) a', function() {
    // code duplicated from your interpreter; `term` only exists inside the
    // interpreter function, so use the `terminal` object here instead
    terminal.pause();
    var url = $(this).attr('href');
    $.get(url, function(result) {
        terminal.echo(result, {raw: true}).resume();
    });
    return false;
});
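If the fetched pages contain relative links (such as foo.html), those hrefs are relative to the page they came from, while $.get resolves relative URLs against the page hosting the terminal. A small sketch, assuming you keep the URL of the last fetch in a variable of your own (lastUrl and resolveLink are hypothetical names, not part of jQuery Terminal), resolves the clicked href with the standard URL constructor before fetching it:

```javascript
// Hypothetical helper: resolve a link found inside fetched output against the
// URL that output was loaded from, not against the terminal page's own URL.
// Requires a browser (or runtime) that supports the standard URL constructor.
function resolveLink(href, baseUrl) {
    try {
        return new URL(href, baseUrl).href;
    } catch (e) {
        return null; // href or baseUrl could not be parsed as a URL
    }
}
```

In the click handler you would then fetch resolveLink($(this).attr('href'), lastUrl) instead of the raw attribute and update lastUrl after each successful $.get; a null result means the link could not be resolved.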
The following is the HTML I want to work with.
I want to submit a query and see the weather report for that city. Here is the code I tried:
var casper = require('casper').create();
var utils = require('utils');
casper.start('http://www.weather.com.cn/weather1d/101210101.shtml');
casper.waitForSelector('input', function()
{
    this.fillXPath('div.search clearfix', {'//input[@id="txtZip"]': 'shijiazhuang'}, true);
});
casper.then(function()
{
    utils.dump(this.getTitle());
});
casper.run();
It did not print the page title on the console. I also tried this:
casper.waitForSelector('input', function()
{
    this.fillSelectors('div.search clearfix', {'input[id="txtZip"]': 'shijiazhuang'}, true);
});
I did not get the page title either. I'm quite confused and don't know what I did wrong. Also, according to CasperJS's official website, the this.fill method seems to work only with name attributes. I need your help.
CasperJS script:
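var casper = require('casper').create(), utils = require('utils');
casper
    .start('http://www.weather.com.cn/weather1d/101210101.shtml', function() {
        this
            // wait() only queues a step: the screenshot and title are taken
            // about 3 s after this callback, i.e. after the search below
            .wait(3000, function() {
                this.capture('search.png');
                utils.dump(this.getTitle());
            })
            // evaluate() runs right away in the page: fill the field and click the button
            .evaluate(function() {
                document.getElementById('txtZip').value = 'shijiazhuang';
                document.querySelector('input[type="button"]').click();
            });
    })
    .run();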
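Result (the page title; in English roughly "[Shijiazhuang weather] Shijiazhuang weather forecast for today, today's weather, 7-day and 15-day forecasts, one-week forecast, 15-day forecast lookup"):
"【石家庄天气】石家庄今天天气预报,今天,今天天气,7天,15天天气预报,天气预报一周,天气预报15天查询"
Screenshot: search.png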
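The evaluate() call in that script hard-codes the city. A sketch of the same page-context code as a reusable function that takes the city as an argument (CasperJS passes extra evaluate() arguments on to the function; they must be serializable values such as strings). The name submitCitySearch and the optional root parameter, which exists only so the function can be exercised outside a browser, are hypothetical; the selectors are the ones from the script above:

```javascript
// Page-context function for this.evaluate(): fills the city field and clicks
// the search button. `root` is optional and only used for testing; inside
// CasperJS it is omitted, so the page's own document is used.
function submitCitySearch(city, root) {
    var doc = root || document;
    var field = doc.getElementById('txtZip');
    var button = doc.querySelector('input[type="button"]');
    if (!field || !button) {
        return false; // the page does not contain the elements this sketch expects
    }
    field.value = city;
    button.click();
    return true;
}
```

Inside the start callback you would call this.evaluate(submitCitySearch, 'shijiazhuang'); a false return value tells you the expected elements were not found.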
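You are doing it correctly; you only have to modify your script a little.
Just change the selector passed as the first argument (the current one doesn't seem to work: 'div.search clearfix' looks for a <clearfix> element inside div.search, whereas both classes together would be written div.search.clearfix), and you should get the correct result.
Here this.fillXPath only fills in the field and does not post the form (its submit argument is false); the search is triggered by clicking the button in a separate step.
...
// fill the form field (submit = false: the click below posts the search)
casper.waitForSelector('input', function() {
    this.fillXPath('#txtZip', {
        '//input[@id="txtZip"]': 'shijiazhuang'
    }, false);
});
// trigger: post the form
casper.then(function() {
    this.click('#btnZip');
});
// ideally wait for a specific change on the page; I can't read Chinese, so a fixed wait is used
casper.wait(5000);
// take a screenshot of it
casper.then(function() {
    this.capture("shijiazhuang_weather.png");
});
...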
The following is a minimal CasperJS script that runs a Google query. I've added a casper.on('click', ...) handler before running the script, but it doesn't appear to get triggered.
What am I missing?
// File: google_click_test.js
"use strict";
var casper = require('casper').create();
casper.on('click', function(css) {
    casper.echo('casper.on received click event ' + css);
});
// ================================================================
// agenda starts here
casper.start('https://google.com', function g01() {
    casper.echo('seeking main page');
});
casper.then(function a02() {
    casper.waitForSelector(
        'form[action="/search"]',
        function() {
            casper.echo("found search form");
        },
        function() {
            casper.echo("failed to find search form");
            casper.exit();
        });
});
casper.then(function a03() {
    casper.fillSelectors('form[action="/search"]', {
        'input[title="Google Search"]' : 'casperjs'
    }, false);
});
casper.then(function a04() {
    casper.click('form[action="/search"] input[name="btnG"]');
    casper.echo('clicked search button');
});
casper.run();
Here's the output. I would expect to see "casper.on received click event" somewhere, but it seems the handler never got triggered:
$ casperjs --ignore-ssl-errors=true --web-security=no google_click_test.js
seeking main page
found search form
clicked search button
$
Although your example runs fine for me using CasperJS 1.1.0-beta3 and PhantomJS 1.9.8, I've been having similar issues with CasperJS over the last few months. Sadly, it seems that the author has stopped maintaining the project. More information here:
https://github.com/n1k0/casperjs/issues/1299
I would suggest moving to a different testing framework. In my case I chose a combination of Mocha + Chai + Nightmare (nightmarejs). This gist is a good starting point:
https://gist.github.com/MikaelSoderstrom/4842a97ec399aae1e024
I have the following JavaScript, which uses the selection in an autocomplete box to fill in a select list.
$(function () {
    $("#bedrijvenauto").each(function () {
        var target = $(this);
        var dest = target.attr("data-autocomplete-destination");
        target.autocomplete({
            source: target.attr("data-autocomplete-source"),
            select: function (event, ui) {
                alert('selected bedrijf');
                event.preventDefault();
                target.val(ui.item.label);
                $(dest).val(ui.item.value);
                $("#projectenauto").val("");
                alert('selected bedrijf');
                alert($('#BEDRIJF_ID').val());
                $.getJSON("/Project/GetListViaJson", { bedrijf: $('#BEDRIJF_ID').val() }, function (data) {
                    alert('selected bedrijf');
                    alert(data);
                    $("#PROJECT_ID").empty();
                    // "Maak een selectie" is Dutch for "Make a selection"
                    $("#PROJECT_ID").append(new Option("Maak een selectie", 0));
                    for (var i = 0; i < data.length; ++i) {
                        alert(data[i].value + ' ' + data[i].label);
                        $("#PROJECT_ID").append(new Option(data[i].label, data[i].value));
                    }
                });
            },
            focus: function (event, ui) {
                event.preventDefault();
                target.val(ui.item.label);
            }
        });
        target.val($("#BEDRIJF_NAAM").val());
    });
});
It works like a charm on my development PC: all the alerts appear and the call returns data. On the web server, however, I get no results after the call to getJSON.
I have the feeling I am missing a detail here.
I am not used to debugging on a web server because I usually create GUI applications in WPF. This is a student's vacation project, and now I have to get it working without him being around anymore. His vacation is over :-(
But not for me.
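The 404 error mentioned in your comments means the URL you're creating is incorrect (for example, if the site is deployed in a virtual directory, a hard-coded path like /Project/GetListViaJson points at the site root instead of at your application). Always use the @Url.Action() method to make sure URLs are generated correctly. In your script: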
var url = '@Url.Action("GetListViaJson", "Project")';
$.getJSON(url, { bedrijf: $('#BEDRIJF_ID').val() }, function (data) {
    ....
});
Or, if this is an external script, add var url = '@Url.Action(...)'; in the main view (Razor code is not evaluated in external script files), or add it as a data- attribute to the element you're handling
data-url="@Url.Action(...)"
and read it back using var url = $(someElement).data('url');
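To see why a path that works locally can 404 on the server: one common cause (an assumption here, since the question doesn't say how the site is deployed) is that the application runs in a virtual directory such as http://server/MyApp/. A root-relative path then resolves to the site root and skips the application. The sketch below uses the standard URL constructor to show how a browser resolves both paths; the host and the MyApp directory are made-up examples:

```javascript
// Illustration only: hypothetical host and virtual directory ("MyApp").
var page = 'https://server.example/MyApp/Home/Index';

// The path hard-coded in the question: the leading "/" jumps to the site
// root, so the MyApp virtual directory is lost.
var hardCoded = new URL('/Project/GetListViaJson', page).href;

// A path that includes the application's base path, which is what
// Url.Action typically generates when the app lives under /MyApp.
var generated = new URL('/MyApp/Project/GetListViaJson', page).href;

console.log(hardCoded); // https://server.example/Project/GetListViaJson
console.log(generated); // https://server.example/MyApp/Project/GetListViaJson
```

Comparing the request URL in the browser's network tab with these two forms is a quick way to confirm whether this is the cause.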
I am trying to run a basic CasperJS script that logs into a website and then shows me the links. However, my script doesn't output anything other than 'done'.
Here is my code
var casper = require('casper').create();
casper.start('http://xxxxxx/Login.aspx', function(){
    //Login
    this.fill('form#form1', {
        'username': 'xxxxx',
        'password': 'xxxxx'
    }, true);
});
casper.then(function(){
    var links = document.getElementsByTagName('a');
    for(var i = 0; i < links.length; ++i) {
        //These should show something
        this.echo(links[i].innerText;
        this.echo(this.getHTML());
    }
});
casper.run(function(){
    this.echo('done').exit();
});
Like I said, the only thing I get back is "done".
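You are calling getElementsByTagName in the CasperJS context, and you can't do that there: the DOM of the loaded page only exists in the page context, which you reach through the evaluate function (see evaluate, thenEvaluate). The document visible in the CasperJS context is not the loaded page, so your loop finds no links, which is why you only see "done".
If you just want to print the text of the links, use this in your casper.then (fetchText concatenates the text of all matching elements):
this.echo(this.fetchText('a'));
You also forgot a closing parenthesis here: this.echo(links[i].innerText;
When you iterate in CasperJS, you should use each, which runs every iteration in its own function scope, much like an IIFE: http://docs.casperjs.org/en/latest/modules/casper.html#each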
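Putting that advice together: the DOM lookup has to happen inside a function passed to this.evaluate(), which runs in the page and returns plain data (here an array of strings) to the CasperJS context. A minimal sketch; collectLinkTexts is a hypothetical name, and its optional root parameter exists only so the function can be exercised outside a browser:

```javascript
// Page-context function for this.evaluate(): returns the trimmed text of every
// <a> element. It uses textContent, the same property fetchText reads.
// `root` is optional and only used for testing; inside CasperJS it is omitted,
// so the loaded page's document is used.
function collectLinkTexts(root) {
    var doc = root || document;
    var anchors = doc.getElementsByTagName('a');
    return Array.prototype.map.call(anchors, function (a) {
        return (a.textContent || '').trim();
    });
}

// Usage inside a casper.then() step (CasperJS context):
//   var texts = this.evaluate(collectLinkTexts);
//   this.echo(texts.join('\n'));
```

Returning an array of strings rather than the elements themselves matters: evaluate() can only hand serializable values back to the CasperJS context.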