Casperjs how to get to next page - casperjs

I'm just beginning with CasperJS and JavaScript, and I'm trying to get a script to navigate to a new page.
There are multiple similar links on the page that look like this:
<font class="IndexLink">2</font>
<font class="IndexLink">2</font>
<font class="IndexLink">2</font>
<font class="IndexLink">2</font>
etc.
I am on page 1 trying to go to page 2. Here is the partial code:
var pageNumber = 2; // hard-coded for now
var target = 'a[onclick="GoToPage('+ pageNumber + ')"]';
this.test.assertSelectorExists(target);
this.click(target);
I run the assertion to confirm that the selector is valid, and it passes. In the debug output I can see that a URL change is requested, but it seems to load the same page I am already on instead of page 2.
For what it's worth here is the debug output for this segment of code:
PASS Found an element matching: a[onclick="GoToPage(2)"]
[debug] [phantom] Mouse event 'click' on selector: a[onclick="GoToPage(2)"]
[debug] [phantom] Navigation requested: url=http://www.clermontauditorrealestate.org/search/advancedsearch.aspx?mode=advanced#, type=LinkClicked, lock=true, isMainFrame=true
[debug] [phantom] url changed to "http://www.clermontauditorrealestate.org/search/advancedsearch.aspx?mode=advanced#"
[info] [phantom] Step 5/6: done in 2880ms.
[info] [phantom] Step 6/6 http://www.clermontauditorrealestate.org/search/advancedsearch.aspx?mode=advanced# (HTTP 200)
[debug] [phantom] Capturing page to /Users/willirl/a-will-1-screenshot.png
This is a public web site I'm scraping so I can post the full 20 or so lines of code if that would help.
Any help is appreciated.

How do you detect whether a new page has loaded? Do you check immediately after the click?
If so, I recommend using thenClick, followed by then for whatever you want to do once the click has been processed.
Note that you have to call run() on your instance, otherwise neither start nor any of the then steps will execute.
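A minimal sketch of that suggestion, assuming the selector from the question is correct and that the script runs under CasperJS rather than plain Node. The helper names pageLinkSelector and queuePageChange are made up for illustration; thenClick, then, echo, getCurrentUrl, capture and run are CasperJS methods.

```javascript
// Builds the selector used in the question for a given page number.
function pageLinkSelector(pageNumber) {
  return 'a[onclick="GoToPage(' + pageNumber + ')"]';
}

// Queues the click as its own step, then a follow-up step that reports
// where the browser ended up. "casper" is a CasperJS instance.
function queuePageChange(casper, pageNumber) {
  casper.thenClick(pageLinkSelector(pageNumber));
  casper.then(function () {
    this.echo('Now at: ' + this.getCurrentUrl());
    this.capture('page-' + pageNumber + '.png');
  });
}

// Inside a CasperJS script this would be wired up roughly as:
//   var casper = require('casper').create();
//   casper.start('http://www.clermontauditorrealestate.org/search/advancedsearch.aspx?mode=advanced');
//   queuePageChange(casper, 2);
//   casper.run();
```

Because thenClick queues the click as a separate step, the following then callback should only run once any navigation triggered by the click has settled, which makes it a convenient place to check which page was actually loaded.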

Related

How can I get an anchor tag with no or empty href in typo3 ckeditor

In TYPO3 8.7, I'm trying to create an anchor tag to open a modal, in a regular text element, like this:
<a class="someclass" data-open="myModal">Click me</a>
But TYPO3 automatically adds an href attribute linking to the current page. When I click the tag, the modal opens, but the page immediately reloads.
I've tried adding href="#", but that turns into href="/mypage/#", and href="#mymodal" becomes href="/mypage/#mymodal"; both trigger a reload.
In my ckeditor setup, I have set allowedContent: true.
How can I make an <a> tag without the href being altered?
If you have a click handler on an <a> tag, you need to return false from the JavaScript to stop further processing. Following the link is the last step of that processing.
Even if you manage to reduce the href to #, your page may still reload or jump to the top.
Maybe you can fool your browser with href="javascript:void(0)" (a bare "return false" is not valid in a javascript: URL, because return is only allowed inside a function).
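As an illustration of cancelling the default action, here is a sketch that assumes the link is wired up from a script rather than through an inline attribute. The handler name onModalLinkClick is hypothetical. Returning false only cancels navigation for inline onclick="..." attributes and on* properties; with addEventListener, event.preventDefault() is the call that does the equivalent job.

```javascript
// Hypothetical click handler for the modal link.
function onModalLinkClick(event) {
  // Cancel the default action (following the href) so the page does not reload.
  event.preventDefault();
  // ... open the modal here ...
}

// In a browser this would be attached roughly like:
//   document.querySelector('a.someclass').addEventListener('click', onModalLinkClick);
```

With the default action cancelled, the value TYPO3 writes into href matters much less, since the browser never follows it when the link is clicked.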

Web scraping from youtube with nokogiri

I want to scrape all the names of the users who commented below a youtube video.
I'm using ruby and nokogiri.
require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "https://www.youtube.com/watch?v=tntOCGkgt98"
doc = Nokogiri::HTML(open(url))
doc.css(".comment-thread-renderer > .comment-renderer").each do |comment|
  name = comment.css("#comment-section-renderer-items .g-hovercard").text
  puts name
end
But it's not working: I get no output and no error either.
I won't be able to give you a solution, but at least I can give you a couple of hints that may help you to move forward.
The code you have is not working because the comments section is loaded via an AJAX call after the page has loaded. If you do a hard reload in your browser, you will see a spinner icon and a "Loading..." text in the comments section while it waits for the content. When Nokogiri fetches the page over HTTP, it gets the HTML as it is before the comments are loaded. In fact, the place where the comments will later be added looks like this:
<div id="watch-discussion" class="branded-page-box yt-card">
  <div id="comment-section-renderer"
       class="comment-section-renderer vve-check"
       data-visibility-tracking="CCsQuy8iEwjr3P3u1uzNAhXIepAKHRV9D8Ao-B0=">
    <div class="action-panel-loading">
      <p class="yt-spinner ">
        <span class="yt-spinner-img yt-sprite" title="Loading icon">
        </span>
        <span class="yt-spinner-message">Loading...</span>
      </p>
    </div>
  </div>
</div>
That is the reason why you won't find the divs you are looking for, because they aren't part of the html you have.
Looking at the network console in the browser, it seems that the ajax request to get the comments data is being sent to https://www.youtube.com/watch_fragments_ajax?v=tntOCGkgt98&tr=time&distiller=1&ctoken=EhYSC3RudE9DR2tndDk4wAEAyAEA4AEBGAY%253D&frags=comments&spf=load. As you can see the v parameter is the video id, however there are a couple of caveats:
There is a ctoken param, which you can get by scraping the original page contents. It is inside a <script> tag, in the form of
'COMMENTS_TOKEN': "<token>".
However, you still need to send a session_token as form data in the body of the AJAX request (which is a POST). I don't know where that token comes from :(.
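The first caveat can be sketched as a plain regular-expression match on the downloaded HTML. This is written in JavaScript for illustration (the same pattern should carry over to Ruby's Regexp); the function name extractCommentsToken and the sample string are made up, and the real page markup may differ.

```javascript
// Pulls the value out of a  'COMMENTS_TOKEN': "<token>"  pair embedded in page HTML.
// Returns null when the pair is not present.
function extractCommentsToken(html) {
  var match = /'COMMENTS_TOKEN':\s*"([^"]+)"/.exec(html);
  return match ? match[1] : null;
}

// Made-up sample, not real YouTube markup.
var sampleHtml = "<script>var cfg = {'COMMENTS_TOKEN': \"abc123\"};</script>";
console.log(extractCommentsToken(sampleHtml));
```

This only covers the ctoken part; the session_token mentioned above would still have to be obtained some other way.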
I think you will be pushing the limits of Nokogiri here, as AFAIK it is not intended to follow AJAX requests or handle JavaScript. Maybe the Ruby Selenium driver is better suited for this.
HTH
I think you need name.css("#comment-section..."
The each statement will iterate over the elements, using the variable name.
You may want to use node instead of name:
doc.css(".comment-thread-renderer > .comment-renderer").each do |node|
  name = node.css("#comment-section-renderer-items .g-hovercard").text
  puts name
end
I wrote this rails app using nokogiri to see all the tags that a page has before any javascript is run in the browser. The source code is here, so you can adjust it if you need to add more info about the node in the view.
That can easily tell you if the particular tag element that you are looking for is something you can retrieve without having to do some JS eval.
Most web crawlers don't support client-side rendering, which gives you an idea that it's not a trivial task to execute JS when scraping content.
YouTube is a dynamically rendered JavaScript website, though it can be parsed with Nokogiri without using Selenium or another package. Try opening the Network tab in dev tools, scroll to the comment section, and see which request is being sent.
You need to make a POST request in order to fetch the comments data. You can preview the output in the "Preview" tab.
[Screenshots not preserved: the preview output and the comment it corresponds to.]
Note: Since this comment brings very little value, this answer will be updated with the attached code once a solution is available.

SyntaxError: illegal character while loading jquery popup in cakephp

I am trying to integrate the popup plugin from https://github.com/webtechnick/CakePHP-Popup-Plugin. I followed the instructions at that link exactly. I am trying to load an element in a popup, so I used $this->Popup->link('click me', array('element' => 'my_element'));
but when I load the page in the browser I get the following error. I have no clue what causes it and I have been trying to fix it for the last two days, so please help me resolve it:
SyntaxError: illegal character
...eldset>\r\n\t<a href=\"#\" onclick=\"$('#popup_1').show(); return false;\">click...
Any other possible solution would also be appreciated.
OK, then see how a popup is implemented at webdesignandsuch.com/how-to-create-a-popup-with-css-and-javascript/
Once you have read the implementation at that link, this is what you have to do in CakePHP:
Step 1: Add these two lines to your layout file. If you are using the default layout, add them inside the body tags:
<div id="blanket" style="display:none"></div>
<div id="popUpDiv" style="display:none"></div>
These two lines create the popup div and the blanket.
Step 2: In the css-pop.js file you will see this function:
function popup(windowname) {
    blanket_size(windowname);
    window_pos(windowname);
    toggle('blanket');
    toggle(windowname);
}
So you have to make an AJAX request to get the HTML of the popup, as in this example:
$.ajax({
    url: "test.html",
    context: document.body
}).done(function(data) {
    $('#popUpDiv').html(data);
    popup('popUpDiv');
});
Here data is the HTML returned in the response to your AJAX request.
Step 3: Create the URL for the AJAX request, let's say ajax_signup.html.
It should echo all of the HTML and then exit.
That should give you an idea of what I am trying to explain.

How to programmatically add script or stylesheet on a per page basis in Docpad

How can I programmatically add script or stylesheet tag to a page specified in page's YAML front matter (meta)?
Assuming there is src/documents/posts/a.html.eco with following contents:
---
layout: default
scripts: ['a.js']
---
Blog post that requires a special javascript
and layout src/layouts/default.html.eco with following contents:
...
@getBlock('scripts').toHTML()
</body>
...
The final result for posts/a.html should be:
...
<!-- some extra stuff that was added when processing script tag -->
<script src="/scripts/a.js"></script>
</body>
...
..while other pages shouldn't have a reference to /scripts/a.js
The comment above the tag is just to show that there may be some processing involved before the tag is injected.
I tried many approaches using different events in docpad.coffee file (including approach taken from docpad-plugin-livereload plugin) but every time I was facing the same problem - script tag was applied to all pages instead of being applied to a.html only. Here is one of my tries:
renderDocument: (opts) ->
  {extension,templateData,file,content} = opts
  if extension == 'html' and scripts = file.get('scripts')
    if typeof scripts != 'undefined'
      scripts.forEach (scriptName) ->
        @docpad.getBlock('scripts').add('<!-- custom script tag here -->')
I've also tried render event, populateCollections (which is not documented however I found it in docpad-plugin-livereload plugin) and even extendTemplateData events and no luck so far.
I know there is a method of doing this right inside a layout:
@getBlock('scripts').add(@document.scripts or [])
..which is totally fine and works as expected. However, it doesn't seem to give me enough freedom to manipulate the content before it is injected into a page. And even if that is possible, I don't like the idea of having heavy logic inside a layout template; I want it to be in a plugin or in docpad.coffee.
Hopefully that makes sense
Try templateData.getBlock('scripts').add instead of docpad.getBlock('scripts').add

Watin - Cannot find link element (Javascript)

I'm using watin to automate a process on a web system (internal).
I can open the website and access some links, but others cannot be found. I think this may be either because they are deeply nested or because the href is javascript. This is the format they are in:
<frame>
  <html>
    <frameset>
      <frame>
        <html>
          <body>
            <div>
              <table>
                <table>
                  <tr>
                    <td>
                      <a id="1_1_1_a" href="javascript:blah"></a>
I've tried various different ways to find by id, element type etc. But I'm stuck on this.
Can anyone help?
Thanks
Elements within a frame have to be searched for through that frame, because each frame is considered a separate namespace in WatiN. So first get the correct frame (by calling the Frame method or through the Frames property; note that in your example you must do it twice, as you have two nested frames) and then search for the link, e.g.:
ie.Frames[0].Frames[0].Link("1_1_1_a")
Try this:
using (var browser = new IE("your_web_site_here"))
{
    try
    {
        // Walk down through both nested frames before looking for the link.
        Frame first_frame = browser.Frame(Find.ById("frame_1_id"));
        Frame second_frame = first_frame.Frame(Find.ById("frame_2_id"));
        var first_div = second_frame.Div(Find.ById("first_div_id"));
        Assert.IsTrue(first_div.Exists);
        var Link_to_click = first_div.Link(Find.ById("1_1_1_a"));
        Assert.IsTrue(Link_to_click.Exists);
        Link_to_click.Click();
    }
    catch (Exception ex)
    {
        // An empty catch would hide why a lookup or assertion failed.
        Console.WriteLine(ex);
        throw;
    }
}
Sometimes WatiN cannot find elements. I'm still trying to find out why, and what the preconditions are.
I had a similar issue where my test would browse to a page that had 40 links on it. Even though all the links were visible on the page, the following statement reported that Links (and the other element collections) had a count of zero:
ie.Links.Count();
I'm still not sure exactly what the underlying issue is but I discovered that if you start Gallio Icarus / Gallio Echo / Visual Studio (or whatever test runner you are using) by Right Click -> Run as administrator the test works as expected and the elements are loaded into the ie browser object correctly.
