Using puppeteer to screenshot local files, but it's still making web requests? - caching

I am using Puppeteer to take screenshots of a web page for my company. I need to test multiple people's accounts so that means visiting the page multiple times (150 times in this case). This results in our firewall kicking me out for making too many requests.
My solution is to just fetch the contents of the page and save them locally. Then I use puppeteer on that local file, overriding the function used to get data from our servers to instead just use data already loaded into Node from a CSV.
All of this works, but it looks like it's still making requests to our servers.
I tried giving it a userDataDir so it could cache any resources. In theory, if it's loading the page from file://, caching the resources, and making no Ajax requests, it shouldn't be making any further requests, right?
I also tried installing a debugging proxy but since it's https I can't see what it's trying to request.
This is how I start it:
puppeteer.launch({
  userDataDir: "temp/"
})
  .then(browser => {
    next(browser, links);
  })
  .catch(error => {
    cb(error, null);
  });
next will iterate through any links it needs to visit.
This part saves the page locally:
if (this._linkCache[baseLink] === undefined) {
fetch(baseLink)
.then(resp => resp.text())
.then(contents => {
fs.writeFile(fullFileName, contents, 'utf8', err => {
if (err) {
cb(err, null);
} else {
this._linkCache[baseLink] = fileUrl;
gotoPage(fileUrl);
}
});
})
.catch(error => {
cb(error, null);
});
}
// Go to the cached version
else {
gotoPage(this._linkCache[baseLink] + queryParams);
}
And this gets the screenshots:
const gotoPage = async(url) => {
try {
const page = await browser.newPage();
// Override 'fetchAccountData' function
await page.evaluateOnNewDocument(testData => {
window["fetchAccountData"] = (cb: (err: any, data: any)=>void) => {
cb(null, testData);
};
}, data);
// Go to page and get screenshot
await page.goto(url);
const screenie = `${outputPath}${uuid()}.png`;
await page.screenshot({ fullPage: true, path: screenie, type: "png" });
pageHtml.push(`<img src="file://${screenie}" />`);
next(browser, rest);
} catch (e) {
cb(e, null);
}
};
I was hoping this would be able to only make a few requests at the beginning while it saves the html locally and caches all the resources, but it seems to make a request for every link.
How can I stop it?
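One way to find out what is still being requested, without needing a proxy that can decrypt HTTPS, is Puppeteer's request interception. The sketch below is not from the original post: `isLocalUrl` and `blockRemoteRequests` are hypothetical helper names, and it assumes that everything the page legitimately needs can be served from `file://` or `data:` URLs. Every other request is logged and aborted. Puppeteer's documentation notes that enabling interception disables the page cache, which should not matter here because remote requests are aborted anyway.

```javascript
// Hypothetical helper: treat only file:// and data: URLs as local.
function isLocalUrl(url) {
  return url.startsWith('file://') || url.startsWith('data:');
}

// Hypothetical helper: call once per page, before page.goto().
// Relies on Puppeteer's page.setRequestInterception() and the 'request' event.
async function blockRemoteRequests(page, log = console.warn) {
  await page.setRequestInterception(true);
  page.on('request', request => {
    const url = request.url();
    if (isLocalUrl(url)) {
      request.continue();
    } else {
      // The log shows exactly which resources still point at the server.
      log(`Blocked remote request: ${url}`);
      request.abort();
    }
  });
}
```

In `gotoPage`, this would be called as `await blockRemoteRequests(page)` right after `browser.newPage()`. If the screenshots then look broken, the logged URLs (typically scripts, stylesheets, or images referenced from the saved HTML) are the resources that would also need to be saved locally.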

Related

IndexedDB breaks in Firefox after trying to save autoIncremented Blob

I am trying to implement Blob storage via IndexedDB for long Media recordings.
My code works fine in Chrome and Edge (not tested in Safari yet) - but won't do anything in Firefox. There are no errors, it just doesn't try to fulfill my requests past the initial DB Connection (which is successful). Intuitively, it seems that the processing is blocked by something. But I don't have anything in my code which would be blocking.
Simplified version of the code (without heavy logging and excessive error checks which I have added trying to debug):
const dbName = 'recording'
const storeValue = 'blobs'
let connection = null
const handler = window.indexedDB || window.mozIndexedDB || window.webkitIndexedDB
function connect() {
return new Promise((resolve, reject) => {
const request = handler.open(dbName)
request.onupgradeneeded = (event) => {
const db = event.target.result
if (db.objectStoreNames.contains(storeValue)) {
db.deleteObjectStore(storeValue)
}
db.createObjectStore(storeValue, {
keyPath: 'id',
autoIncrement: true,
})
}
request.onerror = () => {
reject()
}
request.onsuccess = () => {
connection = request.result
connection.onerror = () => {
connection = null
}
connection.onclose = () => {
connection = null
}
resolve()
}
})
}
async function saveChunk(chunk) {
if (!connection) await connect()
return new Promise((resolve, reject) => {
const store = connection.transaction(
storeValue,
'readwrite'
).objectStore(storeValue)
const req = store.add(chunk)
req.onsuccess = () => {
console.warn('DONE!') // Fires in Chrome and Edge - not in Firefox
resolve(req.result)
}
req.onerror = () => {
reject()
}
req.transaction.oncomplete = () => {
console.warn('DONE!') // Fires in Chrome and Edge - not in Firefox
}
})
}
// ... on blob available
await saveChunk(blob)
What I tried so far:
close any other browser windows, anything that could count as an "open connection" that might be blocking execution
refresh Firefox profile
let my colleague test the code on his own machine => same result
Additional information that might be useful:
Running in Nuxt 2.15.8 dev environment (localhost:3000). Code is used in the component as a Mixin. The project is rather large and uses a bunch of different browser APIs. There might be some kind of collision ?! This is the only place where we use IndexedDB, though, so to get to the bottom of this without any errors being thrown seems almost impossible.
Edit:
When I create a brand new Database, there is a brief window in which Transactions complete fine, but after some time has passed/something triggered, it goes back to being queued indefinitely.
I found out this morning when I had this structure:
...
clearDatabase() {
// get the store
const req = store.clear()
req.transaction.oncomplete = () => console.log('all good!')
}
await this.connect()
await this.clearDatabase()
'All good' fired. But any subsequent requests were broken same as before.
On page reload, even the clearDatabase request was broken again.
Something breaks with ongoing usage.
Edit2:
It's clearly connected to saving a Blob instance without an id while using the autoIncrement option. Not only does it fail silently, it basically corrupts the DB completely. If I manually assign an incrementing ID to a Blob object, it works! If I leave out the id field for a regular simple object, it also works! Does anyone know about this? I feel like saving blobs is a common use case, so this should have been found already?!
I've concluded, unless proven otherwise, that it's a Firefox bug and opened a ticket on Bugzilla.
This happens with Blobs but might also be true for other instances. If you find yourself in the same situation there is a workaround. Don't rely on autoIncrement and assign IDs manually before trying to save them to the DB.
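The workaround can be sketched as follows. This is an illustration rather than the code from the bug report: `createChunkWriter` is a hypothetical helper, and wrapping the Blob in a plain `{ id, blob }` record is an assumption made here so that the `id` key path has an ordinary object to read from. The object store is assumed to be created with `keyPath: 'id'` and without `autoIncrement`.

```javascript
// Hypothetical helper: hands out ids itself instead of relying on autoIncrement.
// `getStore` must return an IDBObjectStore from a fresh 'readwrite' transaction.
function createChunkWriter(getStore, firstId = 1) {
  let nextId = firstId;
  return function writeChunk(blob) {
    const record = { id: nextId, blob }; // explicit key, no key generator involved
    nextId += 1;
    return new Promise((resolve, reject) => {
      const req = getStore().add(record);
      req.onsuccess = () => resolve(req.result); // req.result is the stored key
      req.onerror = () => reject(req.error);
    });
  };
}
```

With the names from the question, it could be wired up as `const writeChunk = createChunkWriter(() => connection.transaction(storeValue, 'readwrite').objectStore(storeValue))` and then called with `await writeChunk(blob)`. The counter restarts at `firstId` after a page reload, so the store would need to be cleared first or `firstId` seeded from the existing records; otherwise `add()` fails on the duplicate key.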

How do you store a custom 404 page on the client/browser to show when there's no internet connection? [duplicate]

I have a service worker that is supposed to cache an offline.html page that is displayed if the client has no network connection. However, it sometimes believes the navigator is offline even when it is not. That is, navigator.onLine === false. This means the user may get offline.html instead of the actual content even when online, which is obviously something I'd like to avoid.
This is how I register the service worker in my main.js:
// Install service worker for offline use and caching
if ('serviceWorker' in navigator) {
navigator.serviceWorker.register('/service-worker.js', {scope: '/'});
}
My current service-worker.js:
const OFFLINE_URL = '/mysite/offline';
const CACHE_NAME = 'mysite-static-v1';
self.addEventListener('install', (event) => {
event.waitUntil(
// Cache the offline page when installing the service worker
fetch(OFFLINE_URL, { credentials: 'include' }).then(response =>
caches.open(CACHE_NAME).then(cache => cache.put(OFFLINE_URL, response)),
),
);
});
self.addEventListener('fetch', (event) => {
const requestURL = new URL(event.request.url);
if (requestURL.origin === location.origin) {
// Load static assets from cache if network is down
if (/\.(css|js|woff|woff2|ttf|eot|svg)$/.test(requestURL.pathname)) {
event.respondWith(
caches.open(CACHE_NAME).then(cache =>
caches.match(event.request).then((result) => {
if (navigator.onLine === false) {
// We are offline so return the cached version immediately, null or not.
return result;
}
// We are online so let's run the request to make sure our content
// is up-to-date.
return fetch(event.request).then((response) => {
// Save the result to cache for later use.
cache.put(event.request, response.clone());
return response;
});
}),
),
);
return;
}
}
if (event.request.mode === 'navigate' && navigator.onLine === false) {
// Uh-oh, we navigated to a page while offline. Let's show our default page.
event.respondWith(caches.match(OFFLINE_URL));
return;
}
// Passthrough for everything else
event.respondWith(fetch(event.request));
});
What am I doing wrong?
navigator.onLine and the related events can be useful when you want to update your UI to indicate that you're offline and, for instance, only show content that exists in a cache.
But I'd avoid writing service worker logic that relies on checking navigator.onLine. Instead, attempt to make a fetch() unconditionally, and if it fails, provide a backup response. This will ensure that your web app behaves as expected regardless of whether the fetch() fails due to being offline, due to lie-fi, or due to your web server experiencing issues.
// Other fetch handler code...
if (event.request.mode === 'navigate') {
return event.respondWith(
fetch(event.request).catch(() => caches.match(OFFLINE_URL))
);
}
// Other fetch handler code...
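The same idea can be applied to the static-asset branch of the question's handler, which also checks navigator.onLine. A minimal sketch, assuming a hypothetical helper named `networkThenCache`; the `fetchImpl` parameter exists only so the function can be exercised outside a service worker:

```javascript
// Hypothetical helper: try the network first; fall back to the cache on failure.
// `cache` is a Cache object (from caches.open); `fetchImpl` defaults to the global fetch.
function networkThenCache(request, cache, fetchImpl = fetch) {
  return fetchImpl(request)
    .then(response => {
      // Keep a copy for later offline use, then hand the original to the page.
      cache.put(request, response.clone());
      return response;
    })
    .catch(() => cache.match(request)); // resolves to undefined if nothing is cached
}
```

Inside the fetch handler this would replace the navigator.onLine check, for example `event.respondWith(caches.open(CACHE_NAME).then(cache => networkThenCache(event.request, cache)))`.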

upload multiple files to sftp server using Rxjs

Good day! I would like to implement a convenient method for uploading multiple files to an SFTP server, with a callback invoked for each file once it has been uploaded.
I have already implemented some code that works, but I saw that there is a memory leak that prevents the connection to the SFTP server from closing successfully after all the transfers.
Constantly opening and closing the connection is not critical for me.
I tweaked the code a little bit from here: how do I send (put) multiple files using nodejs ssh2-sftp-client?
code:
function sftpPutFiles(config, files, pathToDir, callbackStep, callbackFinish, callbackError) {
let Client = require('ssh2-sftp-client');
let PromisePool = require('es6-promise-pool');
const sendFile = (config, pathFrom, pathTo) => {
return new Promise(function (resolve, reject) {
let sftp = new Client();
console.log(pathFrom, pathTo);
sftp.on('keyboard-interactive', (name, instructions, instructionsLang, prompts, finish) => { finish([config.password]); });
sftp.connect(config).then(() => {
return sftp.put(pathFrom, pathTo);
}).then(() => {
console.log('finish '+pathTo);
callbackStep(pathTo);
sftp.end();
resolve(pathTo);
}).catch((err) => {
console.log(err, 'catch error');
callbackError(err);
});
});
};
// Create a pool.
let indexFile = 0;
let pool = new PromisePool(() => {
while (indexFile < files.length) {
let file = files[indexFile];
indexFile++;
return sendFile(config, file.path, `${pathToDir}/${file.name}`);
}
return null;
}, 10);
pool.start().then(function () {
console.log({"message":"OK"}); // res.send('{"message":"OK"}');
callbackFinish();
});
}
using
input.addEventListener('change', function (e) {
e.preventDefault();
sftpPutFiles(
{host: '192.168.2.201', username: 'crestron', password: 'ehAdmin'},
this.files,
`./Program01/test/`,
pathTo => {
let tr = document.createElement('tr');
let bodyTable = document.querySelector('.body');
tr.innerHTML = `<td>${bodyTable.children.length+1}</td><td>${pathTo}</td><td>OK</td>`;
bodyTable.appendChild(tr);
}, () => {
alert('All files uploaded');
},
err => {
alert('Error: '+err);
}
);
});
If there is an error uploading a file to the SFTP server, the connection does not close, and I cannot reconnect when I open the custom console. I would like to port the code to RxJS so it is easier to maintain, and I think that would also let me fix the connection-closing problem and improve the responsiveness of the application.
Make sure you're using the latest version of ssh2-sftp-client (v4.1.0). There have been a fair number of updates recently, including fixes that handle errors more consistently and ensure connections are closed correctly.
You are using sftp.on('keyboard-interactive', ...). Nothing in the module emits events of this type, so this listener will never fire.
If you just want to upload files, use the fastPut() method. It is much faster. Make sure the destination path includes the remote file name and not just the remote directory.
Have a look at Promise.all(). You could use it instead of the promise pool, and I think it would be a lot cleaner. Something like this (untested):
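const path = require('path');
const Client = require('ssh2-sftp-client');

let localPath = '/path/to/src-dir';
let remotePath = '/path/to/dst-dir';
let files = ['file1.txt', 'file2.txt', 'file3.txt'];
let client = new Client();
// config: the same connection options ({host, username, password}) as in the question
client.connect(config)
  .then(() => {
    let promises = [];
    files.forEach(f => {
      let from = path.join(localPath, f);
      // Remote paths use forward slashes regardless of the local OS
      let to = path.posix.join(remotePath, f);
      promises.push(client.fastPut(from, to));
    });
    return Promise.all(promises);
  }).then(res => { // res is an array of resolved promise results
    return client.end();
  }).catch(err => {
    // deal with error
  });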
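Since the original problem was a connection left open after a failed upload, a variant with async/await and try/finally may be worth considering. This is a sketch that has not been run against a real server: `uploadAll` and its `onFileDone` callback are hypothetical names, and `client` is assumed to be an ssh2-sftp-client instance (only its connect(), fastPut() and end() methods are used).

```javascript
const path = require('path');

// Hypothetical helper: uploads every file over one connection and always closes it.
async function uploadAll(client, config, files, localDir, remoteDir, onFileDone = () => {}) {
  await client.connect(config);
  try {
    // allSettled waits for every upload, so none is still running when we close.
    const results = await Promise.allSettled(files.map(async name => {
      const from = path.join(localDir, name);
      const to = path.posix.join(remoteDir, name); // remote paths use forward slashes
      await client.fastPut(from, to);
      onFileDone(to); // per-file callback, like callbackStep in the question
    }));
    const failed = results.filter(result => result.status === 'rejected');
    if (failed.length > 0) {
      throw failed[0].reason; // report the first failure to the caller
    }
  } finally {
    // Runs on success and on failure, so the connection is not left open.
    await client.end();
  }
}
```

A call might look like `uploadAll(new Client(), config, ['file1.txt'], localPath, remotePath, to => console.log('finish ' + to))`, with errors handled in a `.catch()` on the returned promise.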

Create React App with Service Workers

I have upgraded my CRA to version 3.10.8 as it has built in support for PWA.
As a next step I have registered my service worker in index.js, and I think it got registered successfully.
Now my main goal is to have some offline caching for our API calls (backend in Rails), so that when there is no network I can serve the cached response.
Is there anything else that I need to do to serve cached API responses?
When I built my app with Create react App, all it did was create a file called
registerServiceWorker.js and then this gets called from the index.js.
Also, the final app we are building is packaged with Cordova, so most of the assets will be local; our main aim is to cache the API calls. Is this the right way to go? We are using Redux for state management, but have not used any persistence as of now.
Any help/tips would be highly appreciated.
registerServiceWorker.js code below...
// In production, we register a service worker to serve assets from local cache.
// This lets the app load faster on subsequent visits in production, and gives
// it offline capabilities. However, it also means that developers (and users)
// will only see deployed updates on the "N+1" visit to a page, since previously
// cached resources are updated in the background.
const isLocalhost = Boolean(
window.location.hostname === 'localhost' ||
// [::1] is the IPv6 localhost address.
window.location.hostname === '[::1]' ||
// 127.0.0.1/8 is considered localhost for IPv4.
window.location.hostname.match(
/^127(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}$/
)
);
export default function register() {
if (process.env.NODE_ENV === 'production' && 'serviceWorker' in navigator) {
// The URL constructor is available in all browsers that support SW.
const publicUrl = new URL(process.env.PUBLIC_URL, window.location);
if (publicUrl.origin !== window.location.origin) {
// Our service worker won't work if PUBLIC_URL is on a different origin
// from what our page is served on. This might happen if a CDN is used to
// serve assets; see https://github.com/facebookincubator/create-react-app/issues/2374
return;
}
window.addEventListener('load', () => {
const swUrl = `${process.env.PUBLIC_URL}/service-worker.js`;
if (!isLocalhost) {
// Is not local host. Just register service worker
registerValidSW(swUrl);
} else {
// This is running on localhost. Lets check if a service worker still exists or not.
checkValidServiceWorker(swUrl);
}
});
}
}
function registerValidSW(swUrl) {
navigator.serviceWorker
.register(swUrl)
.then(registration => {
registration.onupdatefound = () => {
const installingWorker = registration.installing;
installingWorker.onstatechange = () => {
if (installingWorker.state === 'installed') {
if (navigator.serviceWorker.controller) {
// At this point, the old content will have been purged and
// the fresh content will have been added to the cache.
// It's the perfect time to display a "New content is
// available; please refresh." message in your web app.
console.log('New content is available; please refresh.');
} else {
// At this point, everything has been precached.
// It's the perfect time to display a
// "Content is cached for offline use." message.
console.log('Content is cached for offline use.');
}
}
};
};
})
.catch(error => {
console.error('Error during service worker registration:', error);
});
}
function checkValidServiceWorker(swUrl) {
// Check if the service worker can be found. If it can't reload the page.
fetch(swUrl)
.then(response => {
// Ensure service worker exists, and that we really are getting a JS file.
if (
response.status === 404 ||
response.headers.get('content-type').indexOf('javascript') === -1
) {
// No service worker found. Probably a different app. Reload the page.
navigator.serviceWorker.ready.then(registration => {
registration.unregister().then(() => {
window.location.reload();
});
});
} else {
// Service worker found. Proceed as normal.
registerValidSW(swUrl);
}
})
.catch(() => {
console.log(
'No internet connection found. App is running in offline mode.'
);
});
}
export function unregister() {
if ('serviceWorker' in navigator) {
navigator.serviceWorker.ready.then(registration => {
registration.unregister();
});
}
}
I am using the Create-react-app version 3.
Change the condition: remove process.env.NODE_ENV === 'production' && so that it only reads if ('serviceWorker' in navigator).
Create a custom-service-worker.js file in the public folder, and change const swUrl = `${process.env.PUBLIC_URL}/service-worker.js` to const swUrl = './custom-service-worker.js'.
In custom-service-worker.js, add the following code. The external API calls shown are samples; replace them with the API URLs you want cached.
importScripts("https://storage.googleapis.com/workbox-cdn/releases/4.3.1/workbox-sw.js");
if (workbox) {
console.log('workbox loaded', workbox.routing)
}
//to cache the css html js and images files
workbox.routing.registerRoute(
/\.(?:js|html|css|images|svg)$/,
new workbox.strategies.NetworkFirst()
);
workbox.routing.registerRoute(
'http://localhost:3000',
new workbox.strategies.NetworkFirst()
);
//to cache the external api calls
workbox.routing.registerRoute(
new RegExp('https://jsonplaceholder.typicode.com/users'),
new workbox.strategies.StaleWhileRevalidate()
);
//to cache the external api calls
workbox.routing.registerRoute(new RegExp('http://insight.dev.schoolwires.com/HelpAssets/C2Assets/C2Files/C2ImportUsersSample.csv'),
new workbox.strategies.StaleWhileRevalidate()
);

my browser registers the service worker and caches the urls but it doesn't work offline?

This registers and works perfectly fine online. But when the server is turned off, and when the page is refreshed, the registered serviceworker no longer shows up in the console and no caches in the cache storage.
if ("serviceWorker" in navigator){
navigator.serviceWorker.register("/sw.js").then(function(registration){
console.log("service worker reg", registration.scope)
}).catch(function(error){
console.log("Error:", error);
})
}
in sw.js
var CACHE_NAME = 'cache-v1';
var urlsToCache = [
'/index.html'
];
self.addEventListener('install', function(event) {
event.waitUntil(
caches.open(CACHE_NAME).then(function(cache) {
return cache.addAll(urlsToCache);
})
);
//event.waitUntil(self.skipWaiting());
});
You're properly adding your file to cache, but you're missing returning your cached file on request.
Your sw.js should also have following code:
self.addEventListener('fetch', function(event) {
event.respondWith(
caches.match(event.request)
.then(function(response) {
// Cache hit - return response
if (response) {
return response;
}
return fetch(event.request);
}
)
);
});
It's from Introduction to service workers.
Moreover, you should cache / rather than /index.html, since you usually don't request the index.html file directly.
Your service worker doesn't show any console.log when offline, because you don't have any activate code. The offline cookbook article is very useful for understanding the details.
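For reference, an activate handler in the style described by the offline cookbook usually removes caches left over from earlier versions. A minimal sketch, assuming the same cache name as `CACHE_NAME` in the question's sw.js; `CURRENT_CACHE` and `deleteOldCaches` are hypothetical names chosen here:

```javascript
const CURRENT_CACHE = 'cache-v1'; // same value as CACHE_NAME in the question's sw.js

// Hypothetical helper: delete every cache except the current one.
function deleteOldCaches(cacheStorage, keep) {
  return cacheStorage.keys().then(names =>
    Promise.all(
      names.filter(name => name !== keep).map(name => cacheStorage.delete(name))
    )
  );
}

// Only register the listener when running inside a service worker.
if (typeof self !== 'undefined' && typeof caches !== 'undefined') {
  self.addEventListener('activate', event => {
    console.log('service worker activated');
    // Keep the worker alive until the cleanup has finished.
    event.waitUntil(deleteOldCaches(caches, CURRENT_CACHE));
  });
}
```

The helper takes the cache storage as a parameter only so that it can be tried outside a browser; in sw.js the global `caches` object is passed in, as shown in the listener.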
