Why does fetching data from SQLite block Node.js? - performance

I want to fetch a huge amount of archive data (5-12 million rows) from an SQLite database and export it to a CSV file. While doing this, the whole server is blocked: no other connection can be handled (for example, I couldn't open the website in another browser tab).
Node.JS server part:
function exportArchiveData(response, query) {
    response.setHeader('Content-type', 'text/csv');
    response.setHeader('Content-disposition', 'attachment; filename=archive.csv');
    db.fetchAllArchiveData(
        query.ID,
        function (error, data) {
            if (!error)
                response.write(data.A + ';' + data.B + ';' + data.C + '\n');
        },
        function (error, retrievedRows) {
            response.end();
        });
}
Sqlite DB module:
module.exports.SS.prototype.fetchAllArchiveData = function (a, callback, complete) {
    var self = this;
    // self.sensorSqliteDb.all(
    self.sensorSqliteDb.each(
        'SELECT A, B, C FROM AD WHERE A = ? ORDER BY C ASC;',
        a, // bound parameter instead of string concatenation
        callback,
        complete
    );
};
I also created an index on AD with CREATE INDEX IAD ON AD(A, C), and EXPLAIN QUERY PLAN shows that this index is used by the SQLite engine.
Still, when I call exportArchiveData the server sends the data properly, but no other action can be performed in the meantime. With 5-12 million rows to send, this takes ~3 minutes.
How can I prevent this from blocking the whole server?
I thought that using each with callbacks would keep the server more responsive. Memory usage is also huge (about 3 GB and even more). Can I prevent this somehow?
In answer to the comments, I would like to add some clarifications:
I use node-sqlite3 from Developmentseed. It should be asynchronous and non-blocking, and it is: while the statement is being prepared I can still request the main page. But once the server starts serving data, Node.js is blocked. I guess that's because the request for the home page is a single request invoking one callback, while there are millions of queued callback invocations handling the archive data from each.
If I use the sqlite3 tool from the Linux command line, I do not get the rows immediately, but that is not the problem as long as node-sqlite3 is non-blocking.
Yes, I'm hitting max CPU. What is worse, when I request twice as much data, all the memory gets used up, and then the server freezes forever.

OK, I handled the problem this way.
Instead of using Database#each I use Database#prepare with multiple Statement#get calls.
Furthermore, I found that running out of memory was caused by the response buffer filling up. So now I request the next row only once the previous one has arrived and the response buffer has room for new data. It works perfectly, and the server is no longer blocked (only while the statement is being prepared).
Sqlite module:
module.exports.SS.prototype.fetchAllArchiveData = function (a) {
    var self = this;
    var statement = self.Db.prepare(
        'SELECT A, B, C FROM AD WHERE A = ? ORDER BY C ASC;',
        a, // bound parameter instead of string concatenation
        function (error) {
            if (error != null) {
                console.log(error);
            }
        }
    );
    return statement;
};
Server side:
function exportArchiveData(response, query) {
    var statement = db.fetchAllArchiveData(query.ID);
    var getcallback = function (err, row) {
        if (err != null) {
            console.log(err);
            return;
        }
        if (typeof row !== 'undefined') {
            // write() returns false once the response buffer is full
            if (response.write(row.A + ';' + row.B + ';' + row.C + '\n')) {
                statement.get(getcallback);
            } else {
                // once() removes the listener after it fires, so stale
                // 'drain' handlers cannot pile up
                response.once('drain', function () {
                    statement.get(getcallback);
                });
            }
        } else {
            response.end();
        }
    };
    statement.get(function (err, row) {
        response.setHeader('Content-type', 'text/csv');
        response.setHeader('Content-disposition', 'attachment; filename=archive.csv');
        getcallback(err, row);
    });
}

Related

How to trigger Shiny App

I have a Shiny app with a chunk of code that preloads required data. This process takes a long time, but it only needs to run once each day.
The problem is that the shiny_preload_data() function only gets triggered when the first user accesses the app, and this user has to wait a long time for the data to be ready.
Is there a way to trigger shiny_preload_data() before the first user opens a browser to access the app?
Inside my server.R, the code structure looks like this:
shiny_preload_data()
shinyServer(function(input, output, clientData, session) {
    ....
})
rm(list = ls())
library(shiny)
autoInvalidate <- reactiveTimer(10000, session = NULL)
GetData <- function(){
    if (!exists("nextCall")){
        Data <<- mtcars
        # 86400 would be + 1 day; 120 s here for the demo
        nextCall <<- Sys.time() + 120
    }
    else if (Sys.time() >= nextCall){
        Data <<- iris
        # 86400 would be + 1 day; 120 s here for the demo
        nextCall <<- Sys.time() + 120
        message(paste0("Next call at: ", nextCall))
    }
    else{
        return()
    }
}
ui <- fluidPage(mainPanel(tableOutput("table")))
server <- function(input, output, session){
    observeEvent(autoInvalidate(), {
        GetData()
    })
    output$table <- renderTable({
        autoInvalidate()
        Data
    })
}
shinyApp(ui = ui, server = server)
As pointed out by @Taegost, it's best to do this via a separate mechanism such as a cron job; here are some examples of how to do it.
If you want your app to do this every x hours or minutes, or daily, you can write a function similar to mine: I simply check whether nextCall exists yet and compare its timestamp with the current time.
For demonstration purposes I set the check interval (the reactiveTimer) to 10 seconds and fetch new data every 2 minutes.

How to ensure that clients are ready before emitting from socket.io server

I'm trying to make a two player competitive maze game using socket.io. To ensure that both players get the same maze, I want to send a seed to both clients, where the client then generates a maze based on said seed. However, when the second player joins, only the first player (who was already in the room) receives the emission.
Here is the relevant server-side room and seed emission code:
// Find and join an unfilled room
var roomID = 0;
while (typeof io.sockets.adapter.rooms[roomID.toString()] !== 'undefined' &&
       io.sockets.adapter.rooms[roomID.toString()].length >= 2)
    roomID++;
socket.join(roomID.toString());
console.log('A user from ' + socket.conn.remoteAddress + ' has connected to room ' + roomID);
// Seed announcement
if (io.sockets.adapter.rooms[roomID.toString()].length == 2) {
    var seed = Math.random().toString();
    socket.in(roomID).emit('seed', seed);
    console.log("announcing seed " + seed + " to room " + roomID);
}
socket.on('seedAck', function(msg) {
    console.log(msg);
});
On the client side, I have some code to respond back to the server with the seed, to find out if they're receiving the seed properly.
socket.on('seed', function(msg) {
    // Some other code here...
    socket.emit('seedAck', 'client has received seed ' + msg);
});
Here is what the server sees:
A user from ::1 has connected to room 0
A user from ::1 has connected to room 0
announcing seed 0.936373041709885 to room 0
client has received seed 0.936373041709885
To verify that only the client that was already in the room received the seed, I refreshed the first client, and this time only the second client received the seed.
I believe what's happening is that the server is sending the seed before the second client is ready. However, after multiple Google searches, I could not find a solution. I considered adding a button to the client, requiring the user to press it first (thus ensuring that the client is ready), but that involves some tedious bookkeeping. I've also considered some sort of callback function, but I haven't figured out how to properly implement it.
Is my only option to manually keep track when the both clients are ready, or is there a better solution that's integrated within socket.io?
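If you do go the bookkeeping route, the state needed is tiny: one counter per room. A minimal sketch (not a socket.io API; createReadyTracker and markReady are made-up names):

```javascript
// Hypothetical readiness tracker: count 'ready' signals per room and report
// true exactly once, when the required number of players have checked in.
function createReadyTracker(required) {
    var counts = {};
    return function markReady(roomID) {
        counts[roomID] = (counts[roomID] || 0) + 1;
        return counts[roomID] === required;
    };
}
```

Server-side this would be called from the 'ready' handler, emitting the seed to the whole room only when the tracker returns true, so no client can receive the seed before announcing itself.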
Edit: I've modified my code so that it waits for both clients to send a message saying they're ready before sending the seed. Server-side, I have this:
socket.on('ready', function(msg) {
    console.log(msg);
    // Seed announcement
    if (io.sockets.adapter.rooms[roomID.toString()].length == 2) {
        var seed = Math.random().toString();
        socket.to(roomID).emit('seed', seed);
        console.log("announcing seed " + seed + " to room " + roomID);
    }
});
while client-side, I now have this:
socket.on('seed', function(msg) {
    // Some other code here ...
    socket.emit('seedAck', 'client has received seed ' + msg);
});
socket.on('connect', function() {
    socket.emit('ready', 'client is ready!');
});
However, the same problem persists, as shown by the server output:
A user from ::1 has connected to room 0
client is ready!
A user from ::1 has connected to room 0
client is ready!
announcing seed 0.48290129541419446 to room 0
client has received seed 0.48290129541419446
The second client still does not properly receive the seed.
A small modification to my edited server code fixed the problem. The catch is that socket.to(room).emit(...) broadcasts to everyone in the room except the sending socket, so the client that had just sent 'ready' never received the seed; io.sockets.in(room).emit(...) reaches every socket in the room, sender included:
socket.on('ready', function(msg) {
    console.log(msg);
    // Seed announcement
    if (io.sockets.adapter.rooms[roomID.toString()].length == 2) {
        var seed = Math.random().toString();
        io.sockets.in(roomID.toString()).emit('seed', seed); // Modified line
        console.log("announcing seed " + seed + " to room " + roomID);
    }
});
This solution was taken from this SO question.

Parse Cloud "beforeSave" not saving data

I'm using Parse's beforeSave method to save an order; here is the code:
// Before saving an order: if finished, set priority to 0
Parse.Cloud.beforeSave("UserToOrders", function(request, response) {
    Parse.Cloud.useMasterKey();
    var preStatus = request.object.get("OrderStatus");
    if (preStatus == "Finish") {
        request.object.set("Priority", 0);
        console.log("beforeSave(\"UserToOrders\")\t Order (" + request.object.id + ") Status is 'Finish' so Priority set to '0'");
    }
    else {
        console.log("beforeSave(\"UserToOrders\")\t Order (" + request.object.id + ") Status changed to: " + preStatus);
        request.object.set("OrderStatus", preStatus);
    }
    response.success();
});
Here is the log:
I2016-03-09T20:56:05.779Z]v136 before_save triggered for UserToOrders for user pSi0iCGJJe:
Input: {"original":{"ACL":{"*":{"read":true},"vxgEWFQ7eu":{"read":true,"write":true}},"OrderStatus":"Ready","OrderStatusActivity":"Active","ResturantID":"g1bzMQEXoj","TimeToBeReady":{"__type":"Date","iso":"2016-03-08T23:35:23.916Z"},"UserETA":{"__type":"Date","iso":"2016-03-08T23:35:23.916Z"},"UserID":"vxgEWFQ7eu","createdAt":"2016-03-08T21:06:06.605Z","objectId":"t3NoxcSp5z","updatedAt":"2016-03-08T21:40:59.538Z"},"update":{"OrderStatus":"Finish","objectId":"t3NoxcSp5z"}}
Result: Update changed to {"OrderStatus":"Finish","Priority":0}
I2016-03-09T20:56:05.975Z]beforeSave("UserToOrders") Order (t3NoxcSp5z) Status is 'Finish' So Priority set to '0'
but nothing is being changed in the DB.
What am I missing?
Thanks.
The variable preStatus holds the same value you are trying to save:
var preStatus = request.object.get("OrderStatus");
so you are "saving" the value that is already there; you can simply delete this line:
request.object.set("OrderStatus", preStatus);
If that is not what you want, provide the log from a save where OrderStatus is 'Finish'.
I've figured it out: it was an ACL permissions issue.
The order was created by one client, while the change was made by another one.

Is measuring js execution time a way to tell how quickly the app is responding to requests?

I have something like a microtime() function at the very start of my node.js / express app.
function microtime(get_as_float) {
    // Returns either a string or a float containing the current time in seconds and microseconds
    //
    // version: 1109.2015
    // discuss at: http://phpjs.org/functions/microtime
    // + original by: Paulo Freitas
    // * example 1: timeStamp = microtime(true);
    // * results 1: timeStamp > 1000000000 && timeStamp < 2000000000
    var now = new Date().getTime() / 1000;
    var s = parseInt(now, 10);
    return (get_as_float) ? now : (Math.round((now - s) * 1000) / 1000) + ' ' + s;
}
The code of the actual app looks something like this:
application.post('/', function(request, response) {
    var t1 = microtime(true);
    //code
    //code
    response.send(something);
    console.log("Time elapsed: " + (microtime(true) - t1));
});
Time elapsed: 0.00599980354309082
My question is: does this mean that the time from a POST request hitting the server to the response being sent out is, give or take, ~0.005 s?
I've measured it client-side, but my internet connection is pretty slow, so I think there's lag that has nothing to do with the application itself. What's a quick and easy way to check how quickly requests are being processed?
Shameless plug here. I've written an agent that tracks the time usage for every Express request.
http://blog.notifymode.com/blog/2012/07/17/profiling-express-web-framwork-with-notifymode/
In fact when I first started writing the agent, I took the same approach. But I soon realized that it is not accurate. My implementation tracks the time difference between request and the response by substituting the Express router. That allowed me to add tracker functions. Feel free to give it a try.
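A minimal in-process version of that request-to-flush timing can use process.hrtime(), which is monotonic and so immune to system clock adjustments, unlike Date-based timing. The Express wiring below is illustrative, not the agent's actual code:

```javascript
// Start a high-resolution timer; returns a [seconds, nanoseconds] pair.
function startTimer() {
    return process.hrtime();
}

// Milliseconds elapsed since `start`.
function elapsedMs(start) {
    var diff = process.hrtime(start);
    return diff[0] * 1e3 + diff[1] / 1e6;
}

// Hypothetical middleware: log the time from request arrival to the moment
// the response has actually been flushed, not just until send() is called.
// application.use(function (request, response, next) {
//     var start = startTimer();
//     response.on('finish', function () {
//         console.log(request.method + ' ' + request.url + ': ' +
//                     elapsedMs(start).toFixed(3) + ' ms');
//     });
//     next();
// });
```

Hooking the response's 'finish' event rather than timing around response.send() captures the queueing and serialization work that happens after the handler returns.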

Azure Storage Queue very slow from a worker role in the cloud, but not from my machine

I'm doing a very simple test with queues pointing to the real Azure Storage and, I don't know why, executing the test from my computer is much faster than deploying the worker role to Azure and executing it there. I'm not using the Dev Storage when I test locally; my .cscfg has the connection string to the real storage.
The storage account and the roles are in the same affinity group.
The test consists of a web role and a worker role. The page tells the worker which test to run, the worker runs it and returns the time consumed. This specific test measures how long it takes to get 1000 messages from an Azure queue using batches of 32 messages. First I run the test in debug from VS, then I deploy the app to Azure and run it from there.
The results are:
From my computer: 34805.6495 ms.
From Azure role: 7956828.2851 ms.
That would mean it is faster to access queues from outside Azure than from inside, which doesn't make sense.
I'm testing like this:
private TestResult InQueueScopeDo(String test, Guid id, Int64 itemCount)
{
    CloudStorageAccount account = CloudStorageAccount.Parse(_connectionString);
    CloudQueueClient client = account.CreateCloudQueueClient();
    CloudQueue queue = client.GetQueueReference(Guid.NewGuid().ToString());
    try
    {
        queue.Create();
        PreTestExecute(itemCount, queue);
        List<Int64> times = new List<Int64>();
        Stopwatch sw = new Stopwatch();
        for (Int64 i = 0; i < itemCount; i++)
        {
            sw.Start();
            Boolean valid = ItemTest(i, itemCount, queue);
            sw.Stop();
            if (valid)
                times.Add(sw.ElapsedTicks);
            sw.Reset();
        }
        return new TestResult(id, test + " with " + itemCount.ToString() + " elements",
            TimeSpan.FromTicks(times.Min()).TotalMilliseconds,
            TimeSpan.FromTicks(times.Max()).TotalMilliseconds,
            TimeSpan.FromTicks((Int64)Math.Round(times.Average())).TotalMilliseconds);
    }
    finally
    {
        queue.Delete();
    }
}
The PreTestExecute puts the 1000 items on the queue with 2048 bytes each.
And this is what happens in the ItemTest method for this test:
Boolean done = false;
public override bool ItemTest(long itemCurrent, long itemCount, CloudQueue queue)
{
    if (done)
        return false;
    CloudQueueMessage[] messages = null;
    while ((messages = queue.GetMessages((Int32)itemCount).ToArray()).Any())
    {
        foreach (var m in messages)
            queue.DeleteMessage(m);
    }
    done = true;
    return true;
}
I don't know what I'm doing wrong: same code, same connection string, and I get these results.
Any ideas?
UPDATE:
The problem seems to be in the way I calculate the elapsed time.
I have replaced times.Add(sw.ElapsedTicks); with times.Add(sw.ElapsedMilliseconds); and this block:
return new TestResult(id, test + " with " + itemCount.ToString() + " elements",
    TimeSpan.FromTicks(times.Min()).TotalMilliseconds,
    TimeSpan.FromTicks(times.Max()).TotalMilliseconds,
    TimeSpan.FromTicks((Int64)Math.Round(times.Average())).TotalMilliseconds);
with this one:
return new TestResult(id, test + " with " + itemCount.ToString() + " elements",
    times.Min(), times.Max(), times.Average());
Now the results are similar, so apparently there is a difference in how the tick values are interpreted. I will research this later on.
The problem was an issue with the different nature of Stopwatch ticks and TimeSpan ticks, as discussed here. For example, if Stopwatch.Frequency is 3,000,000 ticks per second, one second of elapsed time produces 3,000,000 ElapsedTicks, but TimeSpan.FromTicks(3000000) interprets that as only 0.3 seconds, since a TimeSpan tick is always 100 ns.
Stopwatch.ElapsedTicks Property
Stopwatch ticks are different from DateTime.Ticks. Each tick in the DateTime.Ticks value represents one 100-nanosecond interval. Each tick in the ElapsedTicks value represents the time interval equal to 1 second divided by the Frequency.
How is your CPU utilization? Is it possible that your code is spiking the CPU and your workstation is just much faster than your Azure node?
