Directly using require() in Express instead of placing in a variable - performance

I'm building an app with express and using passport's facebook login
The example application is:
https://github.com/passport/express-4.x-facebook-example/blob/master/server.js
And from it has come to my attention that I can skip the const/var=require... format and directly do this if I never have to reference it again:
e.g
const createError = require('http-errors'),
session = require('cookie-session');
...
app.use(session({ secret: process.env.cookie_secret, resave: true, saveUninitialized: true }));
app.use(function(req, res, next) {
next(createError(404));
});
becomes
app.use(require('cookie-session')({ secret: process.env.cookie_secret, resave: true, saveUninitialized: true }));
app.use(function(req, res, next) {
next(require('http-errors')(404));
});
This works, great, my file is half it's length now but.. I'm worried about the performance implication of this?

require() is a synchronous operation and blocks the event loop. As such, you generally do not want to ever be doing the first require() of a module in the middle of an actual request handler in a server because that will momentarily block the event loop.
Now, since modules are cached, only the first time you require() a module will actually take very long. But, never-the-less, it is considered a good coding practice to load your dependencies upon startup when synchronous I/O is no big deal and not during run-time.
If there were any problems with loading dependencies, you probably also want those to be discovered at server startup time, not once the server is already serving customers.
So, I think the answer to your question is yes and no. Yes, it's just fine to directly require() without assigning to variables in your startup code. No, it's not recommended to do so inside a request handler or middleware. Better to load your dependencies upon startup. Now, no great harm comes to your code if you happen to do a require() inside a request handler because only the first time actually loads if from disk and takes very long, but as a general practice, it's not the recommended way of coding just because you're trying to save a variable name somewhere.
Personally, I'd also like to know that once my server has startup, all dependencies have been successfully loaded too so there is no danger of an imperfect install being discovered later after it starts serving requests (where it may not be as obvious what went wrong and where users would see the consequences).
Here's one other thing to consider. Javascript is moving from require() to import over time and you cannot use import except at the top level of a module. You can't use it inside a statement.
Summary:
You want to load dependencies at startup so you don't block the event loop during actual processing of requests.
You want to load dependencies at startup so you discover any missing dependencies at server startup and not during server run-time.
Code is generally considered more reader-friendly if dependencies are obvious and easy to see for anyone who works on this module.
In the future when we all are using import instead of require(), import is only allowed at the top level.

Related

Is there a way to delay cache revalidation in service worker?

I am currently working on performance improvements for a React-based SPA. Most of the more basic stuff is already done so I started looking into more advanced stuff such as service workers.
The app makes quite a lot of requests on each page (most of the calls are not to REST endpoints but to an endpoint that basically makes different SQL queries to the database, hence the amount of calls). The data in the DB is not updated too often so we have a local cache for the responses, but it's obviously getting lost when a user refreshes a page. This is where I wanted to use the service worker - to keep the responses either in cache store or in IndexedDB (I went with the second option). And, of course, the cache-first approach does not fit here too well as there is still a chance that the data may become stale. So I tried to implement the stale-while-revalidate strategy: fetch the data once, then if the response for a given request is already in cache, return it, but make a real request and update the cache just in case.
I tried the approach from Jake Archibald's offline cookbook but it seems like the app is still waiting for real requests to resolve even when there is a cache entry to return from (I see those responses in Network tab).
Basically the sequence seems to be the following: request > cache entry found! > need to update the cache > only then show the data. Doing the update immediately is unnecessary in my case so I was wondering if there is any way to delay that? Or, alternatively, not to wait for the "real" response to be resolved?
Here's the code that I currently have (serializeRequest, cachePut and cacheMatch are helper functions that I have to communicate with IndexedDB):
self.addEventListener('fetch', (event) => {
// some checks to get out of the event handler if certain conditions don't match...
event.respondWith(
serializeRequest(request).then((serializedRequest) => {
return cacheMatch(serializedRequest, db.post_cache).then((response) => {
const fetchPromise = fetch(request).then((networkResponse) => {
cachePut(serializedRequest, response.clone(), db.post_cache);
return networkResponse;
});
return response || fetchPromise;
});
})
);
})
Thanks in advance!
EDIT: Can this be due to the fact that I put stuff into IndexedDB instead of cache? I am sort of forced to use IndexedDB instead of the cache because those "magic endpoints" are POST instead of GET (because of the fact they require the body) and POST cannot be inserted into the cache...

Why $time from $lock=Cache::lock('name', $time) should be greater than the updating Cache time?

I placed this code inside a Route::get() method only to test it quicker. So this is how it looks:
use Illuminate\Support\Facades\Cache;
Route::get('/cache', function(){
$lock = Cache::lock('test', 4);
if($lock->get()){
Cache::put('name', 'SomeName'.now());
dump(Cache::get('name'));
sleep(5);
// dump('inside get');
}else{
dump('locked');
}
// $lock->release();
});
If you reach this route from two browsers (almost)at the same time. They both will respond with the result from dump(Cache::get('name'));. Shouldn't the second browser respond be "locked"? Because when it calls the $lock->get() that is supposed to return false? And that because when the second browser tries to reach this route the lock should be still set.
That same code works just fine if the time required for the code after the $lock = Cache::lock('test', 4) to be executed is less than 4. If you set the sleep($sec) when $sec<4 you will see that the first browser reaching this route will respond with the result from Cache::get('name') and the second browser will respond with "locked" as expected.
Can anyone explain why is this happening? Isn't it suppose that any get() method to that lock, expect the first one, to return false for that amount of time the lock has been set? I used 2 different browsers but it works the same with 2 tabs from the same browser too.
Quote from the 5.6 docs https://laravel.com/docs/5.6/cache#atomic-locks:
To utilize this feature, your application must be using the memcached or redis cache driver as your application's default cache driver. In addition, all servers must be communicating with the same central cache server.
Quote from the 5.8 docs https://laravel.com/docs/5.8/cache#atomic-locks:
To utilize this feature, your application must be using the memcached, dynamodb, or redis cache driver as your application's default cache driver. In addition, all servers must be communicating with the same central cache server.
Quote from the 8.0 docs https://laravel.com/docs/8.x/cache#atomic-locks:
To utilize this feature, your application must be using the memcached, redis, dynamodb, database, file, or array cache driver as your application's default cache driver. In addition, all servers must be communicating with the same central cache server.
Apparently, they have been adding support for more drivers to make use of this lock functionality. Check which Cache driver you are using and if it fits the support list of your Laravel version.
There is likely an atomicity issue here where the cache driver you are using is not able to lock a file atomically. What should happen is that when a process (i.e. a php request) is writing to the lock file, all other processes requiring the lock file should at least wait until the lock file available to be read again. If not, they read the lock file before it has been written to, which obviously causes a race condition.
I saw this question I asked, well now I can say that the problem I was trying to solve here was not because of the atomic lock. The problem here is the sleep method. If the time provided to the sleep method is bigger than the time that a lock will live, it means when the next request it's able to hit the route the lock time will expire(will be released). And that's because let's say you have defined a route like this:
Route::get('case/{value}', function($value){
if($value){
dump('hit-1');
}else{
sleep(5);
dump('hit-0');
}
});
And you open two browser tabs with the same URL that hits this route something like:
127.0.0.1:8000/case/0
and
127.0.0.1:8000/case/1
It will show you that the first route will take 5sec to finish execution and even if the second request is sent almost at the same time with the first request, still it will wait to finish the first one and then run. This means the second request will last 5sec(from the first request) plus the time it took to run.
Back to the asked question the lock time will expire by the time the second request will get it or said differently run the $lock->get() statement.

What could cause Realm to lock up when attempting a write?

My team is currently facing an issue in our Xamarin.Forms app across all platforms(Android, iOS, and UWP). Realm will frequently become unresponsive, where the only way to use it again is to close the app. Over the past few months it's become more frequent and easy to reproduce, yet we have not been able to determine the cause or a workaround.
We have identified a few patterns that may help identify what's happening. We've noticed that whenever something that needs information from the database, we'll see that worker thread stuck on a Realm.Write() call. This behavior seems almost as if there's a deadlock occuring within the Realm system. It's not consistent as to which Write() call it's stuck on, seeming to be random based on when the Realm fails. At that point, any other attempts to access this realm through any method, such as Find(),All(),Remove(), etc also get stuck. We've also confirmed that the code within the Write() is never being run at this point, since we can put a realm independent logging call on the first line and never see it in our logs.
Once this issue occurs, some other issues can happen in addition to this. We have two other Realms in our app that handle completely separate data, and as such have no overlapping code. These Realms are never the cause of this issue, but when the problem Realm gets stuck, it sometimes causes the other Realms to get stuck on their next calls as well. This issue also sometimes persists between uses of the app, causing the very first call to Realm to get stuck and requires a complete reinstall to fix.
Due to our app using Reactive based programming, we've had to structure how we handle our database a bit differently. For the problem Realm, we have a service that keeps a single instance active in an observable stream, which can then be subscribed to for watching changes. I've added some examples of this architecture at the end of this post. We also route all our other non-observable actions through this stream, however during debugging we've been able to move these calls to their own independent realm instances with little issue/no change to functionality.
Currently, we're thinking it's most likely an issue related either to how we're converting Realm to an observable system, or with our Realms crashing/becoming corrupted somehow.
RealmStream declaration:
_realmStream = Observable
.Start(() => Realm.GetInstance(_dbConfig), _scheduler)
.Do(_ => logger.LogTrace("Realm created"), () => logger.LogTrace("Realm stream completed"))
.Replay()
.AutoConnect();
RealmStream use example:
public IObservable<IChangeSet<TResult>> GetChangeSetStream<TSource, TResult>(Func<Realm, IQueryable<TSource>> selector, Func<TSource, TResult> transformFactory) where TSource : RealmObject
{
return _realmStream
.Select(realm =>
selector(realm)
.AsRealmCollection()
.ToObservableChangeSet<IRealmCollection<TSource>, TSource>()
.SubscribeOn(_scheduler)
.Transform(transformFactory)
.DisposeMany())
.Switch()
.Catch<IChangeSet<TResult>, Exception>(ex =>
{
_logger.LogError(ex, "Error getting property change stream");
return Observable.Return<IChangeSet<TResult>>(default);
})
.SubscribeOn(_scheduler);
}
Non-Observable realm methodss:
public async Task Run(Action<Realm> action)
{
await _realmStream
.Do(action)
.SubscribeOn(_scheduler);
}
public async Task<TResult> Run<TResult>(Func<Realm, TResult> action)
{
return await _realmStream
.Select(action)
.SubscribeOn(_scheduler);
}
So far, we've attempted the following:
Made sure Realm and Xamarin are both on the most recent versions
Reducing the number of Realm.Write()s (Minor improvement)
Moving every Realm function into our observable system (No noticable change, most of our functions already do this)
Attempted moving everything that does not require observables to using independent realm instances (increased frequency of locking)
Attempted to move everything away from our single instance of Realm. We weren't able to do this, as we could not determine how to properly handle some observable events, such as a RealmObject being deleted, without causing major performance issues
realm.Write needs to acquire a write lock and based on your description, it appears that you do get a deadlock where a thread with an open write transaction waits for another thread that is stuck on the realm.Write call. If you're able to reproduce the hand with a debugger attached, you can inspect the threads window and try to pinpoint the offending code.
This article provides some tips about debugging deadlocks. Unfortunately, without the whole project and a repro case, it'd be hard to pinpoint the cause.

How to access a python object from a previous HTTP request?

I have some confusion about how to design an asynchronous part of a web app. My setup is simple; a visitor uploads a file, a bunch of computation is done on the file, and the results are returned. Right now I'm doing this all in one request. There is no user model and the file is not stored on disk.
I'd like to change it so that the results are delivered in two parts. The first part comes back with the request response because it's fast. The second part might be heavy computation and a lot of data, so I want it to load asynchronously, whenever it's done. What's a good way to do this?
Here are some things I do know about this. Usually, asynchronicity is done with ajax requests. The request will be to some route, let's say /results. In my controller, there'll be a method written to respond to /results. But this method will no longer have any information from the previous request, because HTTP is stateless. To get around this, people pass info through the request. I could either pass all the data through the request, or I could pass an id which the controller would use to look up the data somewhere else.
My data is a big python object (a pandas DataFrame). I don't want to pass it through the network. If I use an id, the controller will have to look it up somewhere. I'd rather not spin up a database just for these short durations, and I'd also rather not convert it out of python and write to disk. How else can I give the ajax request access to the python object across requests?
My only idea so far is to have the initial request trigger my framework to render a second route, /uuid/slow_results. This would be served until the ajax request hits it. I think this would work, but it feels pretty ad hoc and unnatural.
Is this a reasonable solution? Is there another method I don't know? Or should I bite the bullet and use one of the aforementioned solutions?
(I'm using the web framework Flask, though this question is probably framework agnostic.
PS: I'm trying to get better at writing SO questions, so let me know how to improve it.)
So if your app is only being served by one Python process, you could just have a global object that's a map from ids to DataFrames, but you'd also need some way of expiring them out of the map so you don't leak memory.
So if your app is running on multiple machines, you're screwed. If your app is just running on one machine, it might be sitting behind apache or something and then apache might spawn multiple Python processes and you'd still be screwed? I think you'd find out by doing ps aux and counting instances of python.
Serializing to a temporary file or database are fine choices in general, but if you don't like either in this case and don't want to set up e.g. Celery just for this one thing, then multiprocessing.connection is probably the tool for the job. Copying and lightly modifying from here, the box running your webserver (or another, if you want) would have another process that runs this:
from multiprocessing.connection import Listener
import traceback
RESULTS = dict()
def do_thing(data):
return "your stuff"
def worker_client(conn):
try:
while True:
msg = conn.recv()
if msg['type'] == 'answer': # request for calculated result
answer = RESULTS.get(msg['id'])
conn.send(answer)
if answer:
del RESULTS[msg['id']]
else:
conn.send("doing thing on {}".format(msg['id']))
RESULTS[msg['id']] = do_thing(msg)
except EOFError:
print('Connection closed')
def job_server(address, authkey):
serv = Listener(address, authkey=authkey)
while True:
try:
client = serv.accept()
worker_client(client)
except Exception:
traceback.print_exc()
if __name__ == '__main__':
job_server(('', 25000), authkey=b'Alex Altair')
and then your web app would include:
from multiprocessing.connection import Client
client = Client(('localhost', 25000), authkey=b'Alex Altair')
def respond(request):
client.send(request)
return client.recv()
Design could probably be improved but that's the basic idea.

How to know when a web page is loaded when using QtWebKit?

Both QWebFrame and QWebPage have void loadFinished(bool ok) signal which can be used to detect when a web page is completely loaded. The problem is when a web page has some content loaded asynchronously (ajax). How to know when the page is completely loaded in this case?
I haven't actually done this, but I think you may be able to achieve your solution using QNetworkAccessManager.
You can get the QNetworkAccessManager from your QWebPage using the networkAccessManager() function. QNetworkAccessManager has a signal finished ( QNetworkReply * reply ) which is fired whenever a file is requested by the QWebPage instance.
The finished signal gives you a QNetworkReply instance, from which you can get a copy of the original request made, in order to identify the request.
So, create a slot to attach to the finished signal, use the passed-in QNetworkReply's methods to figure out which file has just finished downloading and if it's your Ajax request, do whatever processing you need to do.
My only caveat is that I've never done this before, so I'm not 100% sure that it would work.
Another alternative might be to use QWebFrame's methods to insert objects into the page's object model and also insert some JavaScript which then notifies your object when the Ajax request is complete. This is a slightly hackier way of doing it, but should definitely work.
EDIT:
The second option seems better to me. The workflow is as follows:
Attach a slot to the QWebFrame::javascriptWindowObjectCleared() signal. At this point, call QWebFrame::evaluateJavascript() to add code similar to the following:
window.onload = function() { // page has fully loaded }
Put whatever code you need in that function. You might want to add a QObject to the page via QWebFrame::addToJavaScriptWindowObject() and then call a function on that object. This code will only execute when the page is fully loaded.
Hopefully this answers the question!
To check the load of specific element you can use a QTimer. Something like this in python:
#pyqtSlot()
def on_webView_loadFinished(self):
self.tObject = QTimer()
self.tObject.setInterval(1000)
self.tObject.setSingleShot(True)
self.tObject.timeout.connect(self.on_tObject_timeout)
self.tObject.start()
#pyqtSlot()
def on_tObject_timeout(self):
dElement = self.webView.page().currentFrame().documentElement()
element = dElement.findFirst("css selector")
if element.isNull():
self.tObject.start()
else:
print "Page loaded"
When your initial html/images/etc finishes loading, that's it. It is completely loaded. This fact doesn't change if you then decide to use some javascript to get some extra data, page views or whatever after the fact.
That said, what I suspect you want to do here is expose a QtScript object/interface to your view that you can invoke from your page's script, effectively providing a "callback" into your C++ once you've decided (from the page script) that you've have "completely loaded".
Hope this helps give you a direction to try...
The OP thought it was due to delayed AJAX requests but there also could be another reason that also explains why a very short time delay fixes the problem. There is a bug that causes the described behaviour:
https://bugreports.qt-project.org/browse/QTBUG-37377
To work around this problem the loadingFinished() signal must be connected using queued connection.

Resources