Redis Blocking Save - ruby

How can I force Redis to do a blocking save? I am using the Ruby Redis gem, but I believe this question is not specific to that library. It seems like SAVE and a BGSAVE commands seem to flutter away doing stuff in the background, causing "-ERR background save in progress" errors on subsequent calls.
Hopefully this would be a boring, synchronous call that blocks all other Redis commands until the save over "dump.rdb" is finished. And hopefully this will not require actually shutting down the server, mucking around with "/etc/init.d/redis-server". Presumably I should be polling with the LASTSAVE command?

if you call SAVE but you get an error about a background save in progress, this means that there is also a BGSAVE in progress, becuase one of this is true:
1) Somebody called BGSAVE
2) Redis is configured to save from time to time (the default).
So your SAVE fails since there is already a save in progress. You can check if there is a background in progress, and when it is completed, checking the INFO output.

SAVE is a synchronous save; BGSAVE is the asynchronous one.
Why do you think SAVE is running in the background?

Redis#save does just that. What version of Redis and redis gem are you using?

Related

Should I use rails5 ActiveJob default async adapter for small background job in production?

Rails app which handle and activation of a license using an external service, the external service sometime delays the handling of rails request to over 30s, which will then return an error to front end (I'm running heroku, so max is 30s).
I tried using ActiveJobs and the default rails async adapter (Rails 5), and I can see that is working in Heroku out of the box. I keep reading that I should be using another web process and for example redis, but if the background job should just be performed straight after the request is done and if is just hitting another API outside which may be slower, is it so bad to use the default async?
I can see that this is handle in an in-process thread but I don't see a reason for such small job to be having another web process.
I use the async adapter in production for sending emails. This is a very small job. An email could take up to 3 seconds to send.
The doc said it's a poor fit for production because it will drop pending jobs on restart. If I remember correctly, Heroku restarts dynos once a day.
If your job is pending during the restart, the job will be lost. For my case, a pending email during the restart is pretty slim. So far so good.
But if you have jobs taking 30 seconds, I'll use Resque or DelayedJob.
If for small background job in production, which does not require 100% persistence in case of failure/server restart, whose duration is relatively short and thus separate process would be an overkill, I'd recommend using Sucker Punch.
Sucker Punch gem is designed to handle exactly such case. It prepares execution thread pool for each Job you create, using the concurrent-ruby gem, which is (probably) the most robust concurrency library in Ruby. It also hooks on_exit to finish all the pending tasks, so I guess you can expect this gem to be more reliable than the AsyncJob.
One thing to note is that although Sucker Punch is supported on Active Job, the adapter is not well written. Or, at least, when you use Sucker Punch adapter, it's behavior would be just like that of async adapter. So, I'd recommend using bare Sucker Punch if you wanted something just a little more useful/robust than AsyncJob.

Why does my Redis key show up only minutes after being stored?

I have a handler function on AWS Lambda that is connecting to a Redis instance to store a single key in the cache. The function has completed successfully but the key in Redis shows up minutes (or more) after the fact.
This behavior is observable on both Heroku Redis and Redis Cloud, they're both hosted solutions.
I can't for the life of me figure out what's causing this lag. My Redis knowledge is practically zero, I know how to store a list using LPUSH and how to trim that list using LTRIM.
The writer to Redis uses this Node client while I observe the lag using redis-cli on my local machine.
Is it common to experience this kind of lack in the setup I describe? What can I do to debug this?
I'm purposefully ignoring most of the information in the question and would like to refer only to the alleged symptom, namely that
key show up only minutes after being stored
This behavior is impossible with Redis - any change to the data is immediately visible given Redis' design. That said, the only scenario what you're describing could be remotely possible is when you're writing to a Redis master server and reading from a very-badly-lagged slave. I can ensure you that this is not the case with Redis Cloud however.
The main reason is due to the fact that the Lambda container starts to sleep as soon as your function terminates, and the Redis client you are using is all asynchronous APIs.
Note that the API is entire asynchronous. To get data back from the server, you'll need to use a callback.
I'm assuming that the asynchronous SET is the last action performed in your Lambda function. Once that is called, the underlying Lambda container goes to sleep, and most likely, the actual SET action hasn't finished its job yet. Therefore, the record will not show in Redis until the exact same Lambda container was called to execute your function again, and finished the job that it was supposed to finish on the last execution. This is probably the lag that you are experiencing.
To test whether or not this is true, do a sleep action for a couple of seconds at the end of your function to delay the Lambda container going to sleep immediately, and see if the lag is still there.
I would also recommend not to use asynchronous behaviour APIs inside Lambda functions. They'll add state to your Lambda computation, and this is actually not recommended by AWS themselves within the Lambda documentations too.

How to force Event Trace Session to flush data more often?

I have a driver that writes alot of trace logs using WPP.
I have configured an AutoLogger registry key entry to write the events to an .etl file.
The logging session is started successfully and the file is created, but it appears that the data is flushed to disk once in a long while.
Is it possible to make it flush data more often or even in real time?
I tried playing with the "Flush Timer" but based on what is written in MSDN and on the effect it had it is not what I am looking for.
Thanks.
That seems like an error in the documentation, in all other places where you configure the FlushTimer you will get what you want: the events in your buffer flushed to your session (no matter if real-time or file). I think that you somehow missed something in your test while trying it.
You can force the buffer to be flushed using ControlTrace using the EVENT_TRACE_CONTROL_FLUSH control code.

multiple processes writing to a single log file

This is intended to be a lightweight generic solution, although the problem is currently with a IIS CGI application that needs to log the timeline of events (second resolution) for troubleshooting a situation where a later request ends up in the MySQL database BEFORE the earlier request!
So it boils down to a logging debug statements in a single text file.
I could write a service that manages a queue as suggested in this thread:
Issue writing to single file in Web service in .NET
but deploying the service on each machine is a pain
or I could use a global mutex, but this would require each instance to open and close the file for each write
or I could use a database which would handle this for me, but it doesnt make sense to use a database like MySQL to try to trouble shoot a timeline issue with itself. SQLite is another possability, but this thread
http://www.perlmonks.org/?node_id=672403
Suggests that it is not a good choice either.
I am really looking for a simple approach, something as blunt as writing to individual files for each process and consolidating them accasionally with a scheduled app. I do not want to over engineer this, nor spend a week implementing it. It is only needed occassionally.
Suggestions?
Try the simplest solution first - each write to the log opens and closes the file. If you experience problems with this, which you probably won't , look for another solution.
You can use file locking. Lock the file for writing, write the message, unlock.
My suggestion is to preserve performance then think in asynchronous logging. Why not send your data log info using UDP to service listening port and he write to log file.
I would also suggest some kind of a central logger that can be called by each process in an asynchronous way. If the communication is UDP or RPC or whatever would be an implementation detail.
Even thought it's an old post, has anyone got an idea why not using the following concept:
Creating/opening a file with share mode of FILE_SHARE_WRITE.
Having a named global mutex, and opening it.
Whenever a file write is desired, lock the mutex first, then write to the file.
Any input?

Looking for pattern/approach/suggestions for handling long-running operation tied to web app

I'm working on a consumer web app that needs to do a long running background process that is tied to each customer request. By long running, I mean anywhere between 1 and 3 minutes.
Here is an example flow. The object/widget doesn't really matter.
Customer comes to the site and specifies object/widget they are looking for.
We search/clean/filter for widgets matching some initial criteria. <-- long running process
Customer further configures more detail about the widget they are looking for.
When the long running process is complete the customer is able to complete the last few steps before conversion.
Steps 3 and 4 aren't really important. I just mention them because we can buy some time while we are doing the long running process.
The environment we are working in is a LAMP stack-- currently using PHP. It doesn't seem like a good design to have the long running process take up an apache thread in mod_php (or fastcgi process). The apache layer of our app should be focused on serving up content and not data processing IMO.
A few questions:
Is our thinking right in that we should separate this "long running" part out of the apache/web app layer?
Is there a standard/typical way to break this out under Linux/Apache/MySQL/PHP (we're open to using a different language for the processing if appropriate)?
Any suggestions on how to go about breaking it out? E.g. do we create a deamon that churns through a FIFO queue?
Edit: Just to clarify, only about 1/4 of the long running process is database centric. We're working on optimizing that part. There is some work that we could potentially do, but we are limited in the amount we can do right now.
Thanks!
Consider providing the search results via AJAX from a web service instead of your application. Presumably you could offload this to another server and let you web application deal with the content as you desire.
Just curious: 1-3 minutes seems like a long time for a lookup query. Have you looked at indexes on the columns you are querying to improve the speed? Or do you need to do some algorithmic process -- perhaps you could perform some of this offline and prepopulate some common searches with hints?
As Jonnii suggested, you can start a child process to carry out background processing. However, this needs to be done with some care:
Make sure that any parameters passed through are escaped correctly
Ensure that more than one copy of the process does not run at once
If several copies of the process run, there's nothing stopping a (not even malicious, just impatient) user from hitting reload on the page which kicks it off, eventually starting so many copies that the machine runs out of ram and grinds to a halt.
So you can use a subprocess, but do it carefully, in a controlled manner, and test it properly.
Another option is to have a daemon permanently running waiting for requests, which processes them and then records the results somewhere (perhaps in a database)
This is the poor man's solution:
exec ("/usr/bin/php long_running_process.php > /dev/null &");
Alternatively you could:
Insert a row into your database with details of the background request, which a daemon can then read and process.
Write a message to a message queue which a daemon then read and processed.
Here's some discussion on the Java version of this problem.
See java: what are the best techniques for communicating with a batch server
Two important things you might do:
Switch to Java and use JMS.
Read up on JMS but use another queue manager. Unix named pipes, for instance, might be an acceptable implementation.
Java servlets can do background processing. You could do something similar to this technology in a web technology with threading support. I don't know about PHP though.
Not a complete answer but I would think using AJAX and passing the 2nd step to something thats faster then PHP (C, C++, C#) then a PHP function pick the results off of some stack most likely just a database.

Resources