Scraping with casperjs -- Not sure how to handle an empty div

I'm using casperjs to scrape a site. I set up a function that stores the image src values in a variable named images (shown below), and it works great:
images = casper.getElementsAttribute('.search-product-image','src');
I then use that variable with fs so I can export it to a CSV, which also works fine:
var fs = require('fs'); // PhantomJS/CasperJS file-system module
casper.then(function() {
    var f = fs.open('e36v10.csv', 'w');
    f.write(imagessplit + String.fromCharCode(13)); // 13 = carriage return
    f.close();
});
The issue I just noticed is that not all products have images, so when the scraper hits a product without an image, it simply skips it. I need it to at least flag this somehow (something as simple as filler text that says "no image here"). I copy that string, along with many other strings, into columns in the CSV, and without some kind of filler text the missing entries throw the order of everything off.
Edit:
Below is the exact source from the website I am trying to pull from.
A product whose image I can get (my code works fine here):
<div class="search-v4-product-image">
<img alt="238692" class="search-product-image" src="http://d5otzd52uv6zz.cloudfront.net/group.jpg">
<p class="image-overlay">Generic</p>
</div>
A product with no image, which my scraper passes right by without alerting me:
<div class="search-v4-product-image"> </div>

First, I would use images = casper.getElementsInfo('.search-product-image'), which gives you an array of info objects for the elements matching .search-product-image. Then you can iterate over this array and extract the src attribute of each one with: var src = image.attributes.src
Once you have the src attribute, you can simply check whether it has a value. If it does not, you can replace it with placeholder text.
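A minimal sketch of that loop, assuming `infos` is the array returned by `casper.getElementsInfo('.search-product-image')`, where each entry carries an `attributes` object. The helper name `srcsWithPlaceholder` is hypothetical, and the filler text comes from the question. Keep in mind that this only sees `img` elements that actually exist: an empty `div.search-v4-product-image` contributes no entry at all, so the placeholder only covers images whose `src` is present but blank.

```javascript
// Hypothetical helper: turn getElementsInfo() results into a list of src values,
// substituting placeholder text when an element's src is missing or blank.
function srcsWithPlaceholder(infos, placeholder) {
  return infos.map(function (info) {
    var src = info.attributes && info.attributes.src;
    // Treat undefined, null and whitespace-only values as "no image".
    return src && String(src).trim() !== "" ? src : placeholder;
  });
}

// In a CasperJS step this would be called as:
// var images = srcsWithPlaceholder(
//   casper.getElementsInfo('.search-product-image'), 'no image here');
```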

You can write this functionality for the page context this way:
var fs = require('fs'); // PhantomJS/CasperJS file-system module
casper.then(function(){
    // Runs inside the page: one entry per product image container
    var imgList = this.evaluate(function(){
        var productImages = document.querySelectorAll("div.search-v4-product-image"),
            imageList = [];
        Array.prototype.forEach.call(productImages, function(div){
            // children ignores text nodes, so a div holding only whitespace counts as empty
            if (div.children.length === 0) {
                imageList.push({empty: true});
            } else {
                var img = div.children[0]; // assumes that the image is the first child
                imageList.push({empty: false, src: img.src});
            }
        });
        return imageList;
    });
    // Back in CasperJS: one field per product, with a marker for missing images
    var csv = "";
    imgList.forEach(function(img){
        if (img.empty) {
            csv += "empty;";
        } else {
            csv += img.src + ";";
        }
    });
    fs.write('e36v10.csv', csv, 'w');
});
This iterates over all product image divs and pushes one object per product into an array; the empty property tells you whether that product has an image.
I suspect the output would be more meaningful if you iterate over all product divs and check each one this way, because then you can also write the product name to the CSV.
You could use CSS selectors instead, but then you would need to make the :nth-child selection much higher in the hierarchy (on the list of product divs). This is because :nth-child only counts an element's position among the children of its own parent, not across the whole tree.
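To get the "no image here" filler the question asks for instead of "empty", the CSV step of the answer's code can be moved into a small function. This is a sketch: `imagesToCsvRow` is a hypothetical name, it assumes the `{empty, src}` objects that the evaluate() call returns, and the caller picks the separator (the answer uses ';'). Quoting every field is an extra precaution so that a separator character inside a URL cannot shift the columns.

```javascript
// Hypothetical helper: build one CSV row from the imgList returned by evaluate().
// Each entry is either {empty: true} or {empty: false, src: "..."}.
function imagesToCsvRow(imgList, placeholder, separator) {
  return imgList
    .map(function (img) {
      var value = img.empty ? placeholder : img.src;
      // Quote every field and double any embedded quotes (standard CSV escaping).
      return '"' + String(value).replace(/"/g, '""') + '"';
    })
    .join(separator);
}

// Usage inside casper.then(), replacing the manual string building:
// fs.write('e36v10.csv', imagesToCsvRow(imgList, 'no image here', ';') + '\n', 'w');
```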

Related

Nightwatch .execute() how to pass parameter to execute function, is it possible?

Please be patient - I am a beginner in programming. I have been a tester for a long time, but programming is not my domain.
My test is:
I get a list of elements from the backend (e.g. 5 text strings)
I click an element on the page, which displays those 5 elements (of course, I don't know whether the listed elements are correct)
I need to check whether the list of elements displayed in the UI is the list received from the backend
Problem:
I cannot access the elements with a Nightwatch API CSS selector; at least, I could not manage to do it with Nightwatch (it's an Angular app)
I found I could do it with .execute()
My code is (failing):
browser
.click(selector.HEADER.APPS_GRID, function () {
for (var app in appsList) {
let appShortName = appsList[app].shortName
let appLongName = appsList[app].longName
let appUrl = appsList[app].url
let appVisibility = appsList[app].visibility
browser.execute(function(app){
var appShortNameDisplayed = document.getElementsByClassName('logo-as-text')[app].innerText
var appLongNameDisplayed = document.getElementsByClassName('app-name')[app].innerText
return [appShortNameDisplayed, appLongNameDisplayed]
}, function(result){
console.log(result.value[0])
})
}
})
It fails in lines:
var appShortNameDisplayed = document.getElementsByClassName('logo-as-text')[app].innerText
var appLongNameDisplayed = document.getElementsByClassName('app-name')[app].innerText
Unfortunately, I have to query with [app], iterating over the elements of the object. If I drop [app].innerText, I get data like element-6066-11e4-a52e-4f735466cecf instead of the text values displayed on the page.
I get this error:
Error while running .executeScript() protocol action: TypeError: document.getElementsByClassName(...)[app] is undefined
Is it possible to pass the "app" param (counter) to the document query?
Or do I have to make one query that returns all the data I need and then handle the returned data in this block:
function(result) {
console.log(result.value[0])
})
The relevant fragment of the HTML page is:
<div _ngcontent-c8="" class="ep-app-icon mt-auto mb-auto text-center logo-as-text"> XXX </div>
... and I need to get this "XXX" text.
As your own comment suggests, .execute takes an args argument, which is an array. Its elements are passed, in order, as the arguments of the function you give to execute.
See https://nightwatchjs.org/api/commands/#execute
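A sketch of the question's loop with the counter passed through args. Assumptions: `appsList` is an array in the same order as the tiles on the page, and the class names `logo-as-text` and `app-name` match the markup (only `logo-as-text` appears in the HTML fragment above). `readAppNamesAt`, `checkAppsGrid` and the `report` callback are hypothetical names; in a real test `report` would be replaced by your assertions.

```javascript
// Runs in the browser: `index` is the first element of the args array.
function readAppNamesAt(index) {
  var shortEl = document.getElementsByClassName("logo-as-text")[index];
  var longEl = document.getElementsByClassName("app-name")[index];
  // Return null instead of throwing when a tile is missing.
  return [
    shortEl ? shortEl.innerText.trim() : null,
    longEl ? longEl.innerText.trim() : null
  ];
}

// Runs in the Nightwatch test: one execute() call per app,
// with the numeric index from forEach passed via the args array.
function checkAppsGrid(browser, appsList, report) {
  appsList.forEach(function (app, i) {
    browser.execute(readAppNamesAt, [i], function (result) {
      var shown = result.value; // whatever readAppNamesAt returned
      report(i, shown[0] === app.shortName && shown[1] === app.longName, shown);
    });
  });
}
```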
.executeAsync(function(done){
    var buttons = document.getElementsByTagName('button');
    buttons[2].click();
    // An async script finishes only when it calls the callback appended after the args
    done(buttons.length);
}, [], function(result){
    console.log('done');
})
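Try executeAsync; it works, as long as the injected function calls the done callback it receives as its last argument (otherwise the command times out).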
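The other option from the question, one query that returns everything and is then checked in the test, could look like this sketch. It assumes the tiles use the classes `logo-as-text` and `app-name` and appear in the same order as `appsList`; `readDisplayedApps` and `findMismatches` are hypothetical helpers.

```javascript
// Runs in the browser: collect the text of every tile in a single call.
function readDisplayedApps() {
  function texts(className) {
    return Array.prototype.map.call(
      document.getElementsByClassName(className),
      function (el) { return el.innerText.trim(); }
    );
  }
  return { shortNames: texts("logo-as-text"), longNames: texts("app-name") };
}

// Runs in the test: compare the backend list with what the page displayed.
function findMismatches(appsList, displayed) {
  var problems = [];
  if (displayed.shortNames.length !== appsList.length) {
    problems.push("expected " + appsList.length + " apps, found " + displayed.shortNames.length);
  }
  appsList.forEach(function (app, i) {
    if (displayed.shortNames[i] !== app.shortName) {
      problems.push("short name #" + i + ": expected " + app.shortName + ", got " + displayed.shortNames[i]);
    }
    if (displayed.longNames[i] !== app.longName) {
      problems.push("long name #" + i + ": expected " + app.longName + ", got " + displayed.longNames[i]);
    }
  });
  return problems;
}

// Usage in Nightwatch (feed the result into your assertion of choice):
// browser.execute(readDisplayedApps, [], function (result) {
//   console.log(findMismatches(appsList, result.value));
// });
```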

Image duplicates itself when using appendParagraph in Google Script

I wrote a script to add an image from my Google Drive and some custom text to a Google Doc. (I got the image insertion code from here).
The resulting document is created ok, but my image is added twice for some reason...
function myFunction(e) {
var doc = DocumentApp.create('fileTest');
var body = doc.getBody();
var matchedFiles = DriveApp.getFilesByName('logo.png');
if (matchedFiles.hasNext()) {
var image = matchedFiles.next().getBlob();
var positionedImage = body.getParagraphs()[0].addPositionedImage(image);
}
body.appendParagraph('Test line of text for testing');
doc.saveAndClose();
}
However, if I remove the appendParagraph line, I get only one image (but, obviously, without the paragraph of text I want).
What's going on here? And how do I add both one picture and my paragraph of text?
I don't have the slightest clue why, but I found a way to make this work.
Switching the order of my code seemed to do the trick: I moved the image-insertion code to the end (i.e., after the appendParagraph code), and it worked fine. No duplicate image!
function myFunction(e) {
var doc = DocumentApp.create('fileTest');
var body = doc.getBody();
body.appendParagraph('Test line of text for testing');
var matchedFiles = DriveApp.getFilesByName('logo.png');
if (matchedFiles.hasNext()) {
var image = matchedFiles.next().getBlob();
var positionedImage = body.getParagraphs()[0].addPositionedImage(image);
}
doc.saveAndClose();
}

ajax append something if condition is verified

How do I make this work?
var makePage = $('<div />').attr('data-role', 'page').attr('id', 'p'+item.id)
.append($('<div>').attr('data-role', 'header')
.append('')
.append('<img src="images/app/logo.png" id="navImg"/>')
.append('<div class="separatore"></div></div>'))
.append($('<div />').attr('data-role', 'main').attr('class', 'ui-content')
.append('<h2 class="hstyle">'+item.name+' '+item.surname+'</h2>')
.append($('<ul />').attr('data-role', 'listview').attr('data-inset', 'true')
if (item.cellulare != '') { .append('<li><img src="images/app/tel.png" class="ui-li-icon">'+item.cellulare+'</li>') }
) // data-role page
); // data-role main
makePage.appendTo($.mobile.pageContainer);
The condition I want is: "if the variable item is not empty, append this...".
Store your new div in a variable if you want to append new elements to it later. The way you wrote it, you were asking JavaScript to call .append on the result of an if statement, which doesn't exist: an if statement can't appear in the middle of a method chain.
var newDiv = $('<div />').attr('data-role', 'page')
    .append($('<div>').attr('data-role', 'header'));
if (item != '') {
    newDiv.append('')
        .append('<img src="images/app/logo.png" id="navImg"/>');
}
If you need to append into specific elements, you need to append at that level and then put that level into the parent. For instance, if you were trying to append the image inside the anchor (note that your code, if it were valid, would append everything into the first div you created):
$('').append('<img src="images/app/logo.png" id="navImg"/>').appendTo(newDiv);
In your first part, it looks like you may have intended to create a div and then append another div with a specific data-role. Written that way, however, it would create a div with data-role page, append another div, and then update the parent div's data-role to header. To do it in order, you should do:
var newDiv = $('<div />').attr('data-role', 'page');
$('<div>').attr('data-role', 'header').appendTo(newDiv);
Write it like this:
if(item != '')
{
    //write your code
}
You are mixing statements into the middle of an expression.
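Another way to keep the if out of the chain is to decide which list items exist first and then append them in one go. A sketch, assuming `item` has the `cellulare` field from the question; `listItemsFor` and `escapeHtml` are hypothetical helpers, and the escaping is an extra precaution because the values come from an AJAX response.

```javascript
// Hypothetical helper: escape text before inserting it into an HTML string.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the <li> markup for one contact, skipping fields that are empty.
function listItemsFor(item) {
  var items = [];
  if (item.cellulare) { // skips falsy values such as '', null and undefined
    items.push('<li><img src="images/app/tel.png" class="ui-li-icon">' +
      escapeHtml(item.cellulare) + "</li>");
  }
  return items;
}

// Usage with jQuery: the chain stays a single expression.
// $('<ul />').attr('data-role', 'listview').attr('data-inset', 'true')
//   .append(listItemsFor(item).join(''));
```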

newb: typo3 access uploaded images in typoscript

I'm trying to build something like the one in this tutorial, a very basic gallery.
In the tutorial's example, they load images from uploads/media/ like so:
page.10.marks.PROJECTTHUMBNAIL = IMG_RESOURCE
page.10.marks.PROJECTTHUMBNAIL {
stdWrap.wrap = <img src="|" />
file {
import = uploads/media/
import.data = levelmedia: -1,slide
import.listNum = 0
}
}
But now I want to load pictures that have been uploaded in an Image content element.
This is an embarrassing question, but I've been trying to figure this out for two days and I can't seem to get it right -.- I'm sure there are lots of answers out there... I just don't know the magic words to put into Google to FIND them T-T
I tried very basic stuff, like doing the same as above but with a different path. I rummaged through the TSRef of IMAGE and IMG_RESOURCE, tried fiddling with CONTENT, and tried to adapt the tt_content.image.20 = USER (?? O.o) description in the TypoScript object browser... but all to no avail, as I know so little about what I'm doing -.-
Any nudge in the right direction would be greatly appreciated!
You have to load the content elements using the CONTENT cObject and define how the content should be rendered. This will load Image content elements on the given page regardless of which column they are in:
page.10.marks.PROJECTTHUMBNAIL = CONTENT
page.10.marks.PROJECTTHUMBNAIL {
table = tt_content
select {
where = CType = 'image' AND image != ''
orderBy = sorting ASC
}
renderObj = IMAGE
renderObj {
file {
import = uploads/pics/
import.field = image
import.listNum = 0
}
}
}
NOTE: The renderObj is just my example and it renders only the first image of the Image element. You can set the rendering as you please, e.g. set the file to be GIFBUILDER which would allow you to resize the image. You can also tweak the select to load content elements with more refined conditions.

jQuery Functions need to run again after ajax is complete

I am developing a website that parses rss feeds and displays them based on category. You can view it here: http://vitaminjdesign.com/adrian
I am using tabs to display each category. The tabs use ajax to display a new set of feeds when they are clicked.
I am also using two other scripts: one called equalheights, which resizes all of the boxes to the height of the tallest item, and one called smart columns, which resizes the columns so they always fill the screen.
The first problem I am having is when you click a new tab (to display feeds within that category). When a new tab is clicked, the console shows a jQuery error:
$(".block").equalHeights is not a function
[Break On This Error] $(".block").equalHeights();
The main problem is that each feed box fills up the entire screen's width (after you click on a tab), even if there are multiple feed boxes in that category.
My guess: although all of the feeds (across all tabs) are loaded on page load, both jQuery scripts need to run again when a new tab is selected. Any ideas on how I can make this work properly?
One thing to note: I used the ajaxSuccess method to make equalHeights work on the first page... but it won't work after a tab is clicked.
My jQuery code for the tabs is below:
$(".tab_content").hide(); //Hide all content
$("ul.tabs li:first").addClass("active").show(); //Activate first tab
$(".tab_content:first").show(); //Show first tab content
$("#cities li:nth-child(1)").addClass('zebra');
$("#column li ul li:nth-child(6)").addClass('zebra1');
//On Click Event
$("ul.tabs li").click(function() {
$("ul.tabs li").removeClass("active"); //Remove any "active" class
$(this).addClass("active"); //Add "active" class to selected tab
$(".tab_content").hide(); //Hide all tab content
var activeTab = $(this).find("a").attr("href"); //Find the href attribute value to identify the active tab + content
$(activeTab).fadeIn(); //Fade in the active ID content
$(".block").equalHeights();
return false;
});
Thanks to Macy (see answer below), I have brought my jQuery script to the following: (still does not work)
$(document).ajaxSuccess(function(){
    var script = document.createElement('script');
    script.src = 'js/equalHeight.js';
    document.body.appendChild(script);
    equalHeight($(".block"));
});
I found some small problems in your code. I am not sure my suggestions will solve all of them, but I decided to describe my first findings here.
1) You should remove the trailing comma before the '}'. Currently the call looks like $("#column").sortable({/**/,});
2) The function equalHeight is not a jQuery plugin. That is why the call $(".block").equalHeights(); inside your 'click' event handler produces the error "$(".block").equalHeights is not a function" that you described. You should replace that call with equalHeight($(".block")); as you do in other places.
3) The script http://vitaminjdesign.com/adrian/js/equalHeight.js only defines the function equalHeight and does not run anything by itself. Once loaded, it stays on the page, so you should not load it again at the end of every ajax request. I suggest reducing the script
$(document).ajaxSuccess(function(){
var script = document.createElement('script');
script.src = 'http://vitaminjdesign.com/adrian/js/equalHeight.js';
document.body.appendChild(script);
equalHeight($(".block"));
$("a[href^='http:']:not([href*='" + window.location.host + "'])").each(function() {
$(this).attr("target", "_blank");
});
});
to
$(document).ajaxSuccess(function(){
equalHeight($(".block"));
$("a[href^='http:']:not([href*='" + window.location.host + "'])").each(function() {
$(this).attr("target", "_blank");
});
});
4) I suggest changing the code of http://vitaminjdesign.com/adrian/js/equalHeight.js from
function equalHeight(group) {
tallest = 0;
group.each(function() {
thisHeight = $(this).height();
if(thisHeight > tallest) {
tallest = thisHeight;
}
});
group.height(tallest);
}
to
function equalHeight(group) {
var tallest = 0;
group.each(function() {
var thisHeight = $(this).height();
if(thisHeight > tallest) {
tallest = thisHeight;
}
});
group.height(tallest);
}
to eliminate the use of the global variables tallest and thisHeight. I recommend using JSLint to verify all your JavaScript code; I find it very helpful.
5) I recommend running your markup through an XHTML validator to find small but sometimes very important errors. Try this, for example, to see some errors. The more closely you follow the XHTML standards, the more likely the page is to render the same way in different web browsers. By the way, you can dramatically reduce the number of errors in your current code if the scripts included in the page take the following form
<script type="text/javascript">
//<![CDATA[
/* here is the JavaScript code */
//]]>
</script>
I haven't analysed the full code, but I hope my suggestions will solve at least some of the problems you described in your question.
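Point 4 is easier to see with the jQuery calls stripped out. Below, the height loop works on a plain array of numbers (an illustration, not the real plugin; `tallestHeight` and `tallestHeightLeaky` are hypothetical names). In strict mode, assigning to an undeclared name throws a ReferenceError, which is the same mistake JSLint reports; in sloppy mode it silently creates a global instead.

```javascript
// Same logic as the corrected equalHeight(): `var` keeps the variables
// local to each call.
function tallestHeight(heights) {
  var tallest = 0;
  heights.forEach(function (thisHeight) {
    if (thisHeight > tallest) {
      tallest = thisHeight;
    }
  });
  return tallest;
}

// The original pattern without `var`. Under "use strict" the first
// assignment to the undeclared `tallest` throws a ReferenceError.
function tallestHeightLeaky(heights) {
  "use strict";
  tallest = 0; // undeclared: ReferenceError in strict mode
  heights.forEach(function (h) { if (h > tallest) tallest = h; });
  return tallest;
}
```

Adding "use strict"; at the top of equalHeight.js would surface the original version's problem the first time equalHeight() runs.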
Essentially, when you add a new element to the document, the equalheights script has not attached its behavior to that new element. So the "quick fix" is probably to re-embed the equalheights script after an ajax request has completed, so that it re-attaches itself to all elements on the page, including the ones you just added.
Before the line $(".block").equalHeights();, add lines that re-embed/re-run your scripts:
$.getScript('<the location of your equalHeightsScript>');
$.getScript('<the location of your smartColumnsScript>');
$(".block").equalHeights();
or
var script = document.createElement('script');
script.src = '<the location of your script>';
document.body.appendChild(script);
A better solution would be to upgrade the plugin so it takes advantage of live. However, I'm not up to that at the moment :)
There is a problem here:
$("ul.tabs li").click(function() {
$("ul.tabs li").removeClass("active"); //Remove any "active" class
$(this).addClass("active"); //Add "active" class to selected tab
$(".tab_content").hide(); //Hide all tab content
// ...
});
It should be rewritten like this:
$("ul.tabs li").click(function() {
    $(this).addClass("active").siblings("li").removeClass("active"); // Add "active" to the selected tab, remove it from the others
    $(".tab_content").hide(); //Hide all tab content
    // ...
});
I don't think you need to run the scripts again after the ajax, or at least that's not the "main" problem.
You seem to have some problems in the script smartColumn.js
Right now it seems to operate only on the ul with the id "column" ('#column'). It does work on the one UL#column you have, but your HTML has many other "columns", all with the class "column" ('.column'), that you want it to work on as well.
As a first step toward what you are trying to do, change all the selectors in smartColumn.js that say 'ul#column' to 'ul.column', and then alter the HTML so that the first "column" has class="column" rather than id="column".
That should at least fix the 100%-wide columns, which is your "main" problem. But there are other problems.
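Since each ul.column needs its own width calculation, it may help to see the arithmetic in isolation. The sketch below assumes smartColumn.js follows the usual smart-columns approach (fit as many columns of a fixed minimum width as the container allows, then stretch them to fill the row); the actual script isn't shown in the question, so `columnLayout` and the 300px minimum are assumptions.

```javascript
// Hypothetical helper: how many columns of at least minWidth px fit in
// containerWidth px, and how wide each one must be to fill the row.
function columnLayout(containerWidth, minWidth) {
  var count = Math.max(1, Math.floor(containerWidth / minWidth));
  return { count: count, width: Math.floor(containerWidth / count) };
}

// Usage with jQuery, run once per list rather than only for #column:
// $('ul.column').each(function () {
//   var layout = columnLayout($(this).width(), 300);
//   $(this).children('li').css('width', layout.width);
// });
```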
