How to scrape images with Cheerio and paste to Google Sheets? - image

This is my first attempt at scraping images from a web page and pasting them into Google Sheets. I want to download the second image from https://ir.eia.gov/ngs/ngs.html and paste it into a Google Sheet. The page has two images, and I want the second one: <img alt="Working Gas in Underground Storage Compared with Five-Year Range" src="ngs.gif" border="0">. I would like to learn how to reference it by its alt text or by src="ngs.gif" in the code, not by its index, so that I can apply the same idea to other HTML pages. Can anyone help fix the following code so that I can learn? Thank you!
function test() {
const url = 'https://ir.eia.gov/ngs/ngs.html';
const res = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
var $ = Cheerio.load(res);
// I want to download the image, <img alt="Working Gas in Underground Storage Compared with Five-Year Range" src="ngs.gif" border="0">
// What should be changed in the following code?
var chart = $('img').attr('src').find('ngs.gif');
SpreadsheetApp.getActiveSheet().insertImage(chart, 1, 1);
}

I believe your goal is as follows.
You want to retrieve the 2nd image among the img tags and put it into the Spreadsheet.
In this HTML, the image URL appears to be https://ir.eia.gov/ngs/ + filename, so I thought that the method insertImage(url, column, row) can be used. When this is reflected in your script, how about the following modified script?
Modified script:
function test() {
const url = 'https://ir.eia.gov/ngs/ngs.html';
const res = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
const $ = Cheerio.load(res);
const urls = [];
$('img').each(function () {
urls.push("https://ir.eia.gov/ngs/" + $(this).attr('src'));
});
if (urls.length > 1) {
SpreadsheetApp.getActiveSheet().insertImage(urls[1], 1, 1); // 2nd image is retrieved.
}
}
When this script is run, the URL of https://ir.eia.gov/ngs/ngs.gif is retrieved and the image is put to the Spreadsheet.
Reference:
insertImage(url, column, row)
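As a side note, the modified script hard-codes the directory https://ir.eia.gov/ngs/. A minimal sketch of deriving it from the page URL instead is shown here. toAbsoluteUrl is a hypothetical helper name (it is not part of Cheerio or Apps Script), and it assumes an http(s) page URL and a src that is absolute, root-relative, or a plain relative filename (no ../ segments).

```javascript
// Hypothetical helper: resolve an <img> src against the URL of the page it came from.
// Assumes simple cases only: absolute, root-relative ("/x.gif"), or plain relative ("x.gif").
function toAbsoluteUrl(pageUrl, src) {
  if (/^https?:\/\//i.test(src)) return src; // already absolute
  const origin = pageUrl.match(/^https?:\/\/[^\/?#]+/i)[0]; // e.g. "https://ir.eia.gov"
  if (src.charAt(0) === '/') return origin + src; // root-relative
  // Drop query string and fragment, then keep everything up to the last "/" of the path.
  const path = pageUrl.slice(origin.length).split(/[?#]/)[0];
  const dir = path.slice(0, path.lastIndexOf('/') + 1) || '/';
  return origin + dir + src;
}
```

With this helper, the line inside the each() callback could become urls.push(toAbsoluteUrl(url, $(this).attr('src')));.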
Added:
About your following new question in the comment,
Thanks a lot! So other than calling the index of the image, is there no method to call either alt="Working Gas in Underground Storage Compared with Five-Year Range" or src="ngs.gif" in the code? I'm just curious to learn a smart way for a potential scenario, for instance, if a web has 20 images and the locations of those images keep changing day by day, so the second image is not always in the second place. Thank you again for any guide!
In this case, how about the following sample script?
Sample script:
function test() {
const url = 'https://ir.eia.gov/ngs/ngs.html';
const res = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
const $ = Cheerio.load(res);
const obj = [];
$('img').each(function () {
const t = $(this);
const src = t.attr('src');
obj.push({ alt: t.attr('alt'), src: src, url: "https://ir.eia.gov/ngs/" + src });
});
const searchAltValue = "Working Gas in Underground Storage Compared with Five-Year Range";
const searchSrcValue = "ngs.gif";
const ar = obj.filter(({alt, src}) => alt == searchAltValue && src == searchSrcValue);
if (ar.length > 0) {
SpreadsheetApp.getActiveSheet().insertImage(ar[0].url, 1, 1);
}
}
In this sample script, when the values of alt and src are Working Gas in Underground Storage Compared with Five-Year Range and ngs.gif, respectively, the URL is retrieved and the image is put into the Spreadsheet.
If you want to select images whose alt is Working Gas in Underground Storage Compared with Five-Year Range OR whose src is ngs.gif, please modify alt == searchAltValue && src == searchSrcValue to alt == searchAltValue || src == searchSrcValue.
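As an alternative to collecting every img and filtering afterwards, Cheerio also accepts CSS attribute selectors such as img[src="ngs.gif"]. The following is only a sketch under assumptions: findImageUrl is a hypothetical helper name, $ is the object returned by Cheerio.load(res), baseUrl is assumed to end with "/", and attrValue is assumed not to contain double quotes.

```javascript
// Hypothetical helper: look up an <img> by one attribute using a CSS attribute selector.
// `$` is the value returned by Cheerio.load(html); `baseUrl` must end with "/".
function findImageUrl($, baseUrl, attrName, attrValue) {
  // Builds a selector such as img[src="ngs.gif"] or img[alt="Some text"].
  const selector = 'img[' + attrName + '="' + attrValue + '"]';
  // .first() keeps only the first match; .attr('src') is undefined when nothing matched.
  const src = $(selector).first().attr('src');
  return src ? baseUrl + src : null;
}
```

In the sample script this could be called as findImageUrl($, "https://ir.eia.gov/ngs/", "src", "ngs.gif"), and the result passed to insertImage when it is not null.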

Related

Image duplicates itself when using appendParagraph in Google Script

I wrote a script to add an image from my Google Drive and some custom text to a Google Doc. (I got the image insertion code from here).
The resulting document is created ok, but my image is added twice for some reason...
function myFunction(e) {
var doc = DocumentApp.create('fileTest');
var body = doc.getBody();
var matchedFiles = DriveApp.getFilesByName('logo.png');
if (matchedFiles.hasNext()) {
var image = matchedFiles.next().getBlob();
var positionedImage = body.getParagraphs()[0].addPositionedImage(image);
}
body.appendParagraph('Test line of text for testing');
doc.saveAndClose();
}
However, if I remove the appendParagraph call (body.appendParagraph('Test line of text for testing');), I get only one image (but obviously without the paragraph of text I want).
What's going on here? And how do I add both one picture and my paragraph of text?
I have not even the slightest clue as to why, but I found a way to make this work.
Switching the order of my code seemed to do the trick. I simply moved the image-insertion code to the end (i.e., after the appendParagraph code), and it worked fine. No duplicate image!
function myFunction(e) {
var doc = DocumentApp.create('fileTest');
var body = doc.getBody();
body.appendParagraph('Test line of text for testing');
var matchedFiles = DriveApp.getFilesByName('logo.png');
if (matchedFiles.hasNext()) {
var image = matchedFiles.next().getBlob();
var positionedImage = body.getParagraphs()[0].addPositionedImage(image);
}
doc.saveAndClose();
}

Squarespace ajax call do not load video and audio

I'm working in Squarespace developer mode and wrote some JavaScript to fetch my blog articles and events. When I use an AJAX call to get my events and display them on a page I created, the data loads correctly except for blocks containing audio and video. I had the same issue with images, but SQS provides a solution for images:
var images = document.querySelectorAll('img[data-src]' );
for (var i = 0; i < images.length; i++) {
ImageLoader.load(images[i], {load: true});
}
But this does not solve the problem for audio and video blocks. I've tested every code snippet I could find on the SQS forums, but none of them work. I've also tested what was suggested here, with no success.
Here is my code to get events:
$(document).ready(function(){
var url = '/test-blog?format=json';
$.getJSON(url).done(function(data) {
var items = data.items;
var $_container = $('#events-container-concert');
var result = "";
var appendText = [];
items.forEach(function(elm) {
var body = elm.body;
var $_body = $(body);
appendText.push("<div class='blog-item'><div id='body'>"+body+"</div>");
});
appendText.join(" ");
$_container.html(appendText);
var images = document.querySelectorAll('img[data-src]' );
for (var i = 0; i < images.length; i++) {
ImageLoader.load(images[i], {load: true});
}
});
});
Has anyone else had this issue on SQS?
No one answered my question on the SQS forums.
Thanks
You may use this to automatically load all the blocks (I believe even the image blocks, but if not then pair it with your call to ImageLoader):
window.Squarespace.AFTER_BODY_LOADED = false;
window.Squarespace.afterBodyLoad();
Of course, you'll want to wait until the content is on the page. You can pull up the page you linked-to and then run those two lines via console as a quick test. Worked well for me in that context.
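One way to package those two lines is sketched below. reinitBlocks is a hypothetical wrapper name, and the win parameter is an assumption added so the function can be exercised outside a browser; in a page you would pass window. It does nothing beyond the two lines above, apart from checking that the Squarespace object is present.

```javascript
// Hypothetical wrapper around the two lines above.
// Call it only after the AJAX-loaded HTML has been inserted into the page.
function reinitBlocks(win) {
  const sqs = win.Squarespace;
  if (!sqs || typeof sqs.afterBodyLoad !== 'function') {
    return false; // Squarespace lifecycle object not available; nothing to do
  }
  sqs.AFTER_BODY_LOADED = false; // reset the flag so the lifecycle hook runs again
  sqs.afterBodyLoad();
  return true;
}
```

In the question's code, the call reinitBlocks(window); would go inside the done() callback, right after $_container.html(...).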
Reference: https://github.com/Squarespace/squarespace-core/blob/master/src/Lifecycle.js

fineuploader - Read file dimensions / Validate by resolution

I would like to validate by file dimensions (resolution).
on the documentation page there is only information regarding file name and size, nothing at all in the docs about dimensions, and I also had no luck on Google.
The purpose of this is that I don't want users to upload low-res photos to my server. Thanks.
As Ray Nicholus suggested, use the getFile method to get the File object, then pass it to the internal qq.ImageValidation object to run Fine Uploader's validation on the file. A promise must be returned because this process is asynchronous.
function onSubmit(e, id, filename){
var promise = validateByDimensions(id, [1024, 600]);
return promise;
}
function validateByDimensions(id, dimensionsArr){
var deferred = new $.Deferred(),
file = uploaderElm.fineUploader('getFile', id),
imageValidator = new qq.ImageValidation(file, function(){}),
result = imageValidator.validate({
minWidth : dimensionsArr[0],
minHeight : dimensionsArr[1]
});
result.done(function(status){
if( status )
deferred.reject();
else
deferred.resolve();
});
return deferred.promise();
}
Remaining question:
Now I wonder how to show the thumbnail of an image that was rejected without uploading it to the server. The UI could mark it in a different color as an "invalid image", so the user could see which images were valid and which weren't...
- Update - (regarding the question above)
I do not see how to keep the default behavior of adding a thumbnail to the uploader without uploading the file, but there is a way to generate a thumbnail manually, like so:
var img = new Image();
uploaderElm.fineUploader("drawThumbnail", id, img, 200, false);
But then I'll have to create an item to insert into qq-upload-list myself and handle it all myself. Still, it's not so hard.
Update (get even more control over dimensions validation)
You will (currently) have to edit the qq.ImageValidation function to expose the private function getWidthHeight. Just change that function declaration to:
this.getWidthHeight = function(){
Also, it would be even better to change the this.validate function to:
this.validate = function(limits) {
var validationEffort = new qq.Promise();
log("Attempting to validate image.");
if (hasNonZeroLimits(limits)) {
this.getWidthHeight().then(function(dimensions){
var failingLimit = getFailingLimit(limits, dimensions);
if (failingLimit) {
validationEffort.failure({ fail:failingLimit, dimensions:dimensions });
}
else {
validationEffort.success({ dimensions:dimensions });
}
}, validationEffort.success);
}
else {
validationEffort.success();
}
return validationEffort;
};
This way you get the failure reason as well as the dimensions. It's always nice to have more control.
Now, we could write the custom validation like this:
function validateFileDimensions(id, dimensionsLimits){
var deferred = new $.Deferred(),
file = this.holderElm.fineUploader('getFile', id),
imageValidator = new qq.ImageValidation(file, function(){});
imageValidator.getWidthHeight().done(function(dimensions){
var minWidth = dimensions.width > dimensionsLimits.width,
minHeight = dimensions.height > dimensionsLimits.height;
// if min-width or min-height satisfied the limits, then approve the image
if( minWidth || minHeight )
deferred.resolve();
else
deferred.reject();
});
return deferred.promise();
}
This approach gives much more flexibility. For example, if you want different validation for portrait images than for landscape ones, you can easily identify the image orientation and run your own custom code.
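To illustrate the orientation idea, here is a minimal sketch of a check that could run inside the getWidthHeight() callback. passesOrientationLimits and the shape of the limits object are assumptions made for this example; only the dimensions object with width and height comes from the code above.

```javascript
// Hypothetical check: apply different minimum sizes to portrait and landscape images.
// `dimensions` is { width, height } as passed to the getWidthHeight() callback.
// `limits` is an assumed shape: { portrait: { width, height }, landscape: { width, height } }.
function passesOrientationLimits(dimensions, limits) {
  const isPortrait = dimensions.height > dimensions.width; // square images count as landscape
  const min = isPortrait ? limits.portrait : limits.landscape;
  return dimensions.width >= min.width && dimensions.height >= min.height;
}
```

Inside the callback, the result would decide between deferred.resolve() and deferred.reject(). Note that this sketch requires both the minimum width and the minimum height to be met, which is stricter than the || condition used above.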

How do I make iScroll5 work when the image is generated from a DB?

I am using iScroll5 in a PhoneGap project. On the index page, user will click on a series of thumbnails generated from a database, then the image ID chosen will be written to localstorage, the page will change, the image ID will be pulled from localstorage and the image displayed.
It works fine if I reference the image directly (not from the DB) this way (as a test):
<body onload="loaded()">
<div id='wrapper'><div id='scroller'>
<ul><li><a id='output' href='index.html' onclick='returnTo()'></a></li></ul>
</div></div>
<script>
var newWP = document.createElement('img');
newWP.src = '0buggies/0118_buggies/wallpaper-18b2.jpg';
document.getElementById('output').appendChild(newWP);
</script>
</body>
I can pinch/zoom to resize the image for the screen (the main function my app requires), and scroll the image on the X and Y axis, then upon tapping the image, I will be returned to the index page. All of this works.
But if I pull the image out of a database and reference it the following way, all other aspects of the page code being the same, pinch/zoom does not work, though the picture is displayed and I can scroll on X and Y:
// ... DB code here ...
function querySuccess(tx, results) {
var path = results.rows.item.category +
"/" + results.rows.item.subcat +
"/" + results.rows.item.filename_lg;
document.getElementById("output").innerHTML = "<img src='" + path +
"'>";
}
// ... more DB code here ...
<body onload="loaded()">
<div id='wrapper'> <ul><li><a id='output' href='index.html'
onclick='returnTo()'></a></li></ul> </div>
How do I make iScroll5 work when the image is generated from a DB? I'm using the same CSS and iScroll JS on both pages. (iScroll4 has the same problem as iScroll 5 above.) I am using the SQLite DB plugin (from http://iphonedevlog.wordpress.com/2014/04/07/installing-chris-brodys-sqlite-database-with-cordova-cli-android/ which is my own site).
Try calling refresh on the scrollbar to get it to recognize the DOM change.
Best to wrap it in a 0-delay setTimeout, like so (stolen from http://iscrolljs.com/#refresh):
setTimeout(function () {
myScroll.refresh();
}, 0);
If it takes time for the image to load, you'll want to wait until it's loaded entirely, unless you know the dimensions up-front.
When dealing with images loaded dynamically things get a little more complicated. The reason is that the image dimensions are known to the browser only when the image itself has been fully loaded (and not when the img tag has been added to the DOM).
Your best bet is to explicitly declare the image width/height. You'd do this like so:
function querySuccess (results) {
var path = results.rows.item.category +
"/" + results.rows.item.subcat +
"/" + results.rows.item.filename_lg;
var img = document.createElement('img');
img.width = 100;
img.height = 100;
img.src = path;
document.getElementById('output').appendChild(img);
// need to refresh iscroll in case the previous img was smaller/bigger than the new one
iScrollInstance.refresh();
}
If width/height are unknown you could save the image dimensions into the database and retrieve them together with the image path.
function querySuccess (results) {
var path = results.rows.item.category +
"/" + results.rows.item.subcat +
"/" + results.rows.item.filename_lg;
var img = document.createElement('img');
img.width = results.width;
img.height = results.height;
img.src = path;
document.getElementById('output').appendChild(img);
// need to refresh iscroll in case the previous img was smaller/bigger than the new one
iScrollInstance.refresh();
}
If you can't evaluate the image dimensions in any way then you have to wait for the image to be fully loaded and at that point you can perform an iScroll.refresh(). Something like this:
function querySuccess (results) {
var path = results.rows.item.category +
"/" + results.rows.item.subcat +
"/" + results.rows.item.filename_lg;
var img = document.createElement('img');
img.onload = function () {
setTimeout(iScrollInstance.refresh.bind(iScrollInstance), 10); // give 10ms rest
}
img.onerror = function () {
// you may want to deal with error404 or connection errors
}
img.src = path;
document.getElementById('output').appendChild(img);
}
Why is the viewport user-scalable prop different on each sample? works=no, broken=yes
Just an observation.
FWIW, here are a few things to look into:
1. Uncomment the deviceReady addListener, as Cordova init really depends on this.
2. Your loaded() method assigns myScroll a new iScroll, then explicitly calls onDeviceReady(), which then declares var myScroll; -- this seems inherently problematic - rework this.
3. If 1 & 2 don't help, then I suggest moving queryDB(tx); from populateDB() to successCB() and commenting out the myScroll.refresh()
And just a note, I find that logging to console is less intrusive than using alerts when trying to track down a symptom that seems to be messing with events firing, or timing concerns.

Jscript image tag creation gives an error

function pushImage () {
var img = new Image();
img.src = '/waroot/chart/chart.png';
document.getElementById("test1").innerHTML = "<img src='/waroot/chart/chart.png'>";
document.getElementById("test2").innerHTML = img;
}
Test 1 works and shows the image, but test 2 doesn't. I am not sure how to solve it, but I will need the approach in test 2 to work further along in my project, since I'm going to have to cycle through a large number of images.
The images are created by JFreeChart, saved as PNG, and then have to be posted to a site. As a side question: is it possible to push the JFreeChart objects straight to the jscript instead of having to save them first? (I could put them into a dictionary and look them up later, but I'm not sure if this works.)
Use .appendChild(img) instead of .innerHTML:
function pushImage () {
var img = new Image();
img.src = '/waroot/chart/chart.png';
document.getElementById("test1").innerHTML = "<img src='/waroot/chart/chart.png'>";
document.getElementById("test2").appendChild(img);
}
This is because img is an image object, not an html string, so you have to append it to the DOM.
P.S., don't forget that the alt attribute is required in the img tag!
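Since the question mentions cycling through many images, the same appendChild approach extends to a loop. This is only a sketch: appendImages is a hypothetical helper name, and doc and container are passed in as parameters (an assumption, so the function does not depend on global state); in a page you would pass document and document.getElementById("test2").

```javascript
// Hypothetical helper: create one <img> element per URL and append each to a container.
// `doc` is the document object; `container` is the element that receives the images.
function appendImages(doc, container, urls, altText) {
  urls.forEach(function (url) {
    const img = doc.createElement('img');
    img.src = url;
    img.alt = altText; // alt is set on every image, as the note above recommends
    container.appendChild(img);
  });
  return container;
}
```

For example: appendImages(document, document.getElementById("test2"), ['/waroot/chart/chart.png'], 'Chart');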
