NodeJS Bluebird Cheerio Web Scraper, Proper Indexing and Parsing of scraped data - promise

Functional Overview
I am scraping thumbnail image URLs from an insecure site (http), downloading the images, and then uploading them to a secure web server (https). I am using cheerio and bluebird to scrape a list of website URLs with a mapped promise request; my code is shown below. I push each thumbnail image URL found on those pages into an array stored in the "json" object and then write that data to a suppImages.json file.
Current Issue I am trying to address
There is a variable number of thumbnail images (around 20 each) on the website URLs I am scraping. Right now, my code aggregates all the thumbnail image URLs into one array. What I would like is for the thumbnail image URLs to be split into separate arrays, one per website URL. So instead of the output being one blob of aggregate data from all the website URLs, I want there to be several arrays, each containing only the thumbnail images displayed on the given website URL.
My code
let fs = require('fs')
const requestPromise = require('request-promise');
const Promise = require('bluebird');
const cheerio = require('cheerio');
const suppURL = require('./output.json');
const urls = suppURL.urli;
console.log("Currently reading URLs from buttons of Realty Warp"+urls)
var json = { pictureThumb: []};
scraper = () => Promise.map(urls, requestPromise)
  .map((htmlOnePage, index) => {
    const $ = cheerio.load(htmlOnePage);
    var linksPic = $(".thumb img");
    $(linksPic).each(function(i, link){
      var sop = $(this).attr('src');
      console.log("sop:" + sop)
      json.pictureThumb.push(sop);
    });
    fs.writeFile('suppImages.json', JSON.stringify(json, null, 6), function(err){
      console.log('wrote file');
    })
    return console.log("URL"+index+':Scrape Complete');
  })
  .then()
  .catch((e) => console.log('We encountered an error' + e));
scraper()
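One possible way to keep the thumbnails separate is to have each mapped step return its own array instead of pushing into the shared `json.pictureThumb`, and to pair that array with its URL. The following is only a minimal sketch: `extractThumbs` and `groupByUrl` are hypothetical names, the example URLs and HTML are made up, and the regular expression is a dependency-free stand-in for the cheerio `.thumb img` loop in the code above.

```javascript
// Hypothetical stand-in for the cheerio step. In the real scraper this could be:
//   $('.thumb img').map((i, el) => $(el).attr('src')).get()
function extractThumbs(html) {
  const srcs = [];
  const re = /<img[^>]*\ssrc="([^"]+)"/g;
  let match;
  while ((match = re.exec(html)) !== null) {
    srcs.push(match[1]);
  }
  return srcs;
}

// Build one entry per page so each URL keeps its own thumbnail array.
function groupByUrl(urls, pages, extract) {
  return urls.map((url, index) => ({
    url: url,
    pictureThumb: extract(pages[index]),
  }));
}

// Made-up sample data so the sketch runs without network access.
const urls = ['http://example.com/a', 'http://example.com/b'];
const pages = [
  '<div class="thumb"><img src="a1.jpg"></div><div class="thumb"><img src="a2.jpg"></div>',
  '<div class="thumb"><img src="b1.jpg"></div>',
];

const grouped = groupByUrl(urls, pages, extractThumbs);
console.log(JSON.stringify(grouped, null, 2));
```

In the original chain, the second `.map` callback would return the per-page array, and a final `.then` would receive the array of results (bluebird's `Promise.map` keeps them in input order) and call `fs.writeFile` once, rather than rewriting the file for every page.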

Related

How to show multiple images at once with a jspsych plugin?

I am coding in Gorilla and trying to show multiple images at once using a jsPsych plugin.
Currently I am using the plugin jsPsychImageButtonResponse as follows:
var trial = {
  type: jsPsychImageButtonResponse,
  stimulus: [aURL, bURL, cURL],
  choices: [aURL, bURL, cURL],
};
I would like to show three cards (named aURL, bURL, cURL) and make the participants click on one of them.
I uploaded the images in the resources tab on Gorilla and referred to them like this:
// URL's for images
var aURL = gorilla.stimuliURL('a.png');
var bURL = gorilla.stimuliURL('b.png');
var cURL = gorilla.stimuliURL('c.png');
// Create an array containing all the URL's for our required stimuli
var images = [];
images.push(aURL);
images.push(bURL);
images.push(cURL);
Any help is welcome.
Thank you in advance!
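The image-button-response plugin takes a single image as its `stimulus`, so passing an array there is unlikely to render three cards. One possible workaround, sketched below, is to use the html-button-response plugin and turn each button into an image. This assumes jsPsych 7.x, where `button_html` is a template string and `%choice%` is replaced by each entry of `choices` (in 8.x `button_html` is a function instead). The helper name `makeCardTrial`, the prompt text, and the placeholder plugin value are all hypothetical.

```javascript
// Sketch only: assumes jsPsych 7.x button_html templating with %choice%.
function makeCardTrial(pluginType, imageUrls) {
  return {
    type: pluginType,                   // e.g. jsPsychHtmlButtonResponse
    stimulus: '<p>Choose a card</p>',   // hypothetical prompt text
    choices: imageUrls,                 // each URL becomes one button
    button_html: '<button class="jspsych-btn"><img src="%choice%" alt="card"></button>',
  };
}

// Placeholder values so the sketch runs outside Gorilla; in the experiment the
// URLs would come from gorilla.stimuliURL(...) as in the question.
const trial = makeCardTrial('plugin-placeholder', ['a.png', 'b.png', 'c.png']);
console.log(trial.choices.length);
```

With this layout the recorded response would be the index of the clicked button, which maps back to the position of the card in the `choices` array.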

Is there any way to get the URL of the image inserted under a Google Classroom form question title?

I want to get the URL of the image that is inserted when creating a question in the classroom form.
Below is the code through which we get the title and the choices, if available, but I am not able to get the URL of the image inserted under the question title.
function getCourse() {
  var form = FormApp.openById(id);
  var formResponses = form.getItems();
  var type = formResponses[0].getType();
  var title = formResponses[0].getTitle();
  var image = formResponses[0].getImage(); // no such method
  Logger.log(image);
}
That image is not available through the Forms Service; it is added through the /viewresponse source code, which Google generates in some way. You could get it by using the URL Fetch Service (UrlFetchApp).
var blob = questionType.getImage();
var b64 = blob.getContentType() + ';base64,'+ Utilities.base64Encode(blob.getBytes());
var html = "data:" + b64 ;

How do you create and then insert an image file from Google Drive into a Google Apps Script UrlFetchApp.fetch call to an External API?

I am using Google Apps Script to call an external API to post both text and an image to a contact in an external system. I have posted the text fine, many times, no problems. I have not worked with sending or even using images in Apps Script before, so I am unsure of how to send the image as a file. I've done quite a bit of research on Stack Overflow and elsewhere, but have not found the answer to this yet.
The API documentation for the external system says that it needs the following:
contactId - Type: String
Message:
text - Type: String... Description: Message text (Media or Text is required).
upload - Type: File... Description: Message image (Media or Text is required). Media must be smaller than 1.5mb. Use a jpg, jpeg, png, or gif.
The "upload", type "File" (a jpg picture/image) is what I cannot figure out how to grab, format, and send. I currently have the image in Google Drive, have shared it for anyone to access via its URL, and it is well under 1.5MB.
Here is most of my test code (marked as JS, but really Google Apps Script), with the identifying info changed, with several different ways I have tried it. At this point, I am just banging my head against the wall! Any help is greatly appreciated!!! Thank you!
function TestAPI() {
  var apiKey2 = '9xxxxx-xxxx2-xxxxx-bxxx-3xxxxxxa'; //API Key for the external system
  var url4 = 'https://www.externalsystem.com/api/v1/[contactID]/send';
  var pic = DriveApp.getFileById("1yyyyyy_Nyyyyxxxxxx-_xxxxxxx_xyyyy_"); // I Tried this
  // var pic = driveService.files().get('1yyyyyy_Nyyyyxxxxxx-_xxxxxxx_xyyyy_'); //And tried this
  // var pic = DriveApp.getFileByID('1yyyyyy_Nyyyyxxxxxx-_xxxxxxx_xyyyy_').getAs(Family.JPG); //And this
  // var pic = { "image" : { "source": {"imageUri": "https://drive.google.com/file/d/1yyyyyy_Nyyyyxxxxxx-_xxxxxxx_xyyyy_" } } }; //And this
  // var pic = { file : DriveApp.getFileById("1yyyyyy_Nyyyyxxxxxx-_xxxxxxx_xyyyy_") }; //And this
  var formData = {
    'contactID': '[contactID]',
    'text': "Text here to send to external system through API", // This works fine every time!
    'upload': pic // Tried this
    // 'upload': DriveApp.getFileByID("1yyyyyy_Nyyyyxxxxxx-_xxxxxxx_xyyyy_").getBlob() // And tried this
    // 'upload': { "image" : { "source": {"imageUri": "https://drive.google.com/file/d/1yyyyyy_Nyyyyxxxxxx-_xxxxxxx_xyyyy_" } } } // And this
  };
  var options4 = {
    "headers":{"x-api-key":apiKey2},
    'method' : 'post',
    'payload': formData
  };
  var response4 = UrlFetchApp.fetch(url4, options4);
}
Once again, everything is working fine (text, etc.) except for the image (the "upload") not coming through. I am pretty sure it is because I don't know how to "package" the image from Google Drive through the UrlFetchApp call to the API.
The official documentation describes both fields with the note "Media or Text is required": Message text and Message image. Based on this, please try the following modification.
Modified request body:
var formData = {
  upload: DriveApp.getFileById("###").getBlob()
};
From the official documentation, I suspect that when both 'upload' and 'text' are sent, only 'text' might be used.
Also, your test results show that the request body has to be sent as form data.
Reference:
Contacts - Send Message To Contact
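The modification above can be sketched as a small helper that builds the options for `UrlFetchApp.fetch`. This is a hedged illustration, not the external system's documented client: `buildUploadOptions` is a hypothetical name, and the `fakeBlob` object exists only so the helper can be exercised outside Apps Script. The one Apps Script behavior relied on is that a payload object containing a blob is sent as multipart form data.

```javascript
// Sketch: builds the options object passed to UrlFetchApp.fetch.
// `blob` would come from DriveApp.getFileById(id).getBlob() in Apps Script.
function buildUploadOptions(apiKey, blob) {
  return {
    method: 'post',
    headers: { 'x-api-key': apiKey },
    // No contentType is set on purpose: when the payload is an object that
    // contains a blob, UrlFetchApp sends it as multipart/form-data.
    payload: { upload: blob },
  };
}

// In Apps Script (not runnable in plain Node):
//   var blob = DriveApp.getFileById('FILE_ID').getBlob();
//   var response = UrlFetchApp.fetch(url, buildUploadOptions(apiKey, blob));
//   Logger.log(response.getContentText());

// Stand-in blob (assumption) so the helper can be checked anywhere.
const fakeBlob = { name: 'photo.jpg' };
const options = buildUploadOptions('KEY', fakeBlob);
console.log(Object.keys(options.payload));
```

Only the `upload` field is included here, following the answer's suspicion that `text` may take precedence when both are present.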

User generated image hosting for angularJS App

I'm building a web app with AngularJS that will allow users to upload their own images. Right now all of my data is text based, so I am storing the text based data in Firebase. As far as I know, Firebase can't store images. What I want to do is store the user generated images somewhere simple (I'm thinking Amazon S3 or even Dropbox) and then reference the images via unique URLs, which I would store as text in Firebase.
My questions:
Does this seem like a valid approach?
Any recommended services for hosting the images?
How to upload an image to the hosting service and get the image's unique URL?
Right now I am allowing users to upload images on the front end with the following code, just not sure what to do with the images once I have them. Would appreciate any help, I'm very new to this!
HTML
<output id="list"></output>
<input type="file" id="files" name="files[]" class="button" multiple />
Upload Pictures</i>
Angular Controller
$scope.getImages = function(){
  $("input[type='file']").trigger('click');
}
function handleFileSelect(evt) {
  var files = evt.target.files; // FileList object
  // Loop through the FileList and render image files as thumbnails.
  for (var i = 0, f; f = files[i]; i++) {
    console.log(f);
    // Only process image files.
    if (!f.type.match('image.*')) {
      continue;
    }
    var reader = new FileReader();
    // Closure to capture the file information.
    reader.onload = (function(theFile) {
      return function(e) {
        // Render thumbnail.
        var span = document.createElement('span');
        span.innerHTML = ['<img class="thumb" src="', e.target.result,
                          '" title="', escape(theFile.name), '"/>'].join('');
        document.getElementById('list').insertBefore(span, null);
      };
    })(f);
    // Read in the image file as a data URL.
    reader.readAsDataURL(f);
  }
}
document.getElementById('files').addEventListener('change', handleFileSelect, false);
You could use a service like Cloudinary to host images uploaded by your users. There are Angular directives that make using the service pretty easy. You will need a small server-side component to encrypt the upload parameters.
Look into Zapier integration with S3. The idea is that you set up a queue collection in Firebase where you would create a new instance with the binary of the file data. Then Zapier listens for child_added on this queue collection and does its magic (that you don't have to worry about) to upload your file to an S3 bucket. After everything is finished, the instance in the queue is deleted. No server side is needed with that, although there might be some fees.
Here is the link https://zapier.com/zapbook/amazon-s3/

Show image saved into mongodb gridfs with node.js

I am saving the images of my application in GridFS. The problem comes when I need to show an image; I don't know how to do it. I'm using node.js, the geddy framework and mongodb.
this.show = function (req, resp, params) {
  var self = this;
  var GridFS = require('GridFS').GridFS;
  var myFS = new GridFS('resources');
  // retrieve the image
  myFS.get(params.id, function (err, data) {
  });
  myFS.close();
};
params.id is the image id. When I do console.log(data) I receive:
Buffer <90 f8 w8 dj 4f....>
How can I send the image to the view in PNG format?
thanks...a lot!
I've never used geddy at all, but you might want to look at this:
Render Image Stored in Mongo (GridFS) with Node + Jade + Express
The basic idea is to set the right "Content-Type" header ("image/png" should work) and simply reply to the request with the image data.
Your browser can render the image if you're using an <img src="/url/to/your/image/request/handler">... tag in the html.
You can't console.log image data with most shells / command lines, sorry. ;)
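The basic idea described above can be sketched as follows. It assumes the response object exposes the Node `http.ServerResponse` methods `writeHead` and `end`; whether Geddy's `resp` exposes them directly is an assumption, not something confirmed here. `sendPng` and `fakeRes` are hypothetical names, and the fake response only exists so the sketch runs on its own.

```javascript
// Sketch: write an image Buffer to a Node-style response object.
function sendPng(res, imageBuffer) {
  res.writeHead(200, {
    'Content-Type': 'image/png',
    'Content-Length': imageBuffer.length,
  });
  res.end(imageBuffer);
}

// Minimal fake response (assumption) that records what would be sent.
const sent = {};
const fakeRes = {
  writeHead(status, headers) {
    sent.status = status;
    sent.headers = headers;
  },
  end(body) {
    sent.body = body;
  },
};

// First four bytes of the PNG signature, used here as stand-in image data.
sendPng(fakeRes, Buffer.from([0x89, 0x50, 0x4e, 0x47]));
console.log(sent.headers['Content-Type']);
```

In the question's code, a call like this would go inside the `myFS.get` callback, with `data` as the buffer, so the response is only written once the image has been read.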
