Prioritize reading over writing on the Tarantool slave

Can anyone tell me if there is a way to prioritize reading over writing on a Tarantool slave? There is a task for which reading is more important than writing, but while the master is being written to, the changed records are blocked for some time on the slave. In general, I understand that this is correct behavior for a database, but is there a way to give reads priority over writes on a slave?

All transactions (both reads and writes) are served in a single thread, so there is no prioritization between them.
You can read about this in more detail here: https://www.tarantool.io/en/doc/1.10/book/box/atomic/

Related

Performance, writing multiple files in multiple threads to the same directory

First off I'd like to apologise if this has been answered somewhere else here on SO. Since I couldn't find any questions specific to what I need, I decided to ask.
Problem: I need to receive multiple data packets from a device (a scanner, for example) and store them on disk. My approach was to queue the data packets, wait until there are enough items in the queue, and then start a thread that consumes a batch of data packets and writes them to disk. The directory in which they are stored doesn't change, so multiple threads could, possibly, write to the same directory at the same time.
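For concreteness, here is a rough sketch of that queue-plus-single-writer approach in Java (the class name, output path, and batch size are all made up):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch: packets are queued by the receiving code and written
// to disk in batches by a single consumer thread.
public class PacketWriter {
    private static final int BATCH_SIZE = 64;
    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
    private final Path outputDir = Paths.get("C:/scans");

    // Called for every packet that arrives from the device
    public void enqueue(byte[] packet) {
        queue.add(packet);
    }

    // One consumer thread drains the queue in batches and writes the files
    public void startConsumer() {
        Thread consumer = new Thread(() -> {
            List<byte[]> batch = new ArrayList<>(BATCH_SIZE);
            long fileNo = 0;
            try {
                while (true) {
                    batch.add(queue.take());            // block until at least one packet is available
                    queue.drainTo(batch, BATCH_SIZE - batch.size());
                    for (byte[] packet : batch) {
                        Path file = outputDir.resolve("packet-" + (fileNo++) + ".bin");
                        Files.write(file, packet);      // one file per packet, all in the same directory
                    }
                    batch.clear();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();     // stop when the thread is interrupted
            } catch (java.io.IOException e) {
                e.printStackTrace();
            }
        });
        consumer.setDaemon(true);
        consumer.start();
    }
}

Whether one writer thread or several is faster is exactly what I'm unsure about.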
A colleague of mine mentioned that this was a bad idea, since having multiple threads write to the same disk would actually slow it down. Having googled this, I found no clear answer as to whether this is true: some stated that it isn't a problem on modern computers, while others said it wouldn't be a problem if the different threads were writing to separate disks, as in a RAID setup.
Question: Is writing to the same disk from multiple threads (each writing a batch of data packet files) a bad idea? Or are there actual benefits to doing so?
P.S.: I'd like to add that the solution will be for the Windows OS.
Thanks!

NiFi processor batch insert - handle failure

I am currently in the process of writing an Elasticsearch NiFi processor. Individual inserts / writes to ES are not optimal; batching documents is preferred. What would be considered the optimal approach within a NiFi processor for tracking (batching) documents (FlowFiles) and, once a certain amount has accumulated, sending them in as a batch? The part I am most concerned about is what happens when ES is unavailable (down, network partition, etc.) and the batch fails. The main point of the question: given that NiFi has content storage for queuing / back-pressure, etc., is there a preferred way to use it to ensure no FlowFiles get lost if the destination is down? Maybe there is another processor I should look at for an example?
I have looked at the Mongo processor, Merge, etc. to try to get an idea of the preferred approach to batching inside a processor, but I can't seem to find anything specific. Any suggestions would be appreciated.
There is a good chance I am overlooking some basic functionality baked into NiFi. I am still fairly new to the platform.
Thanks!
Great question and a pretty common pattern. This is why we have the concept of a ProcessSession. It allows you to send zero or more things to an external endpoint and only commit once you know the recipient has acknowledged them. In this sense it offers at-least-once semantics. If the protocol you're using supports two-phase-commit-style semantics, you can get pretty close to the ever-elusive exactly-once semantics. Much of the detail of what you're asking about here will depend on the destination system's API and behavior.
There are some examples in the Apache codebase which show ways to do this. One way is to produce a merged collection of events prior to pushing them to the destination system; whether that works depends on its API. I think PutMongo and PutSolr operate this way (though the experts on those would need to weigh in). An example that might be more like what you're looking for can be found in PutSQL, which operates on batches of FlowFiles sent in a single transaction (on the destination DB).
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutSQL.java
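To make the ProcessSession pattern concrete, here is a hedged sketch, not the actual processor: the class name, relationship wiring, batch size, and the sendBulk() helper are illustrative.

import java.util.List;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

// Sketch: batch FlowFiles inside onTrigger() and commit the session only after
// the destination has acknowledged the bulk request.
public class PutElasticsearchBatch extends AbstractProcessor {

    static final Relationship REL_SUCCESS = new Relationship.Builder().name("success").build();

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
        List<FlowFile> batch = session.get(100);   // pull up to 100 FlowFiles from the incoming queue
        if (batch.isEmpty()) {
            return;
        }
        try {
            sendBulk(batch);                        // placeholder for your Elasticsearch bulk-index call
            session.transfer(batch, REL_SUCCESS);
            session.commit();                       // the FlowFiles leave NiFi's queue only now
        } catch (Exception e) {
            session.rollback(true);                 // keep the FlowFiles queued (penalized) for a retry
            context.yield();                        // back off instead of spinning while ES is down
        }
    }

    private void sendBulk(List<FlowFile> batch) throws Exception {
        // hypothetical: build and execute an Elasticsearch _bulk request from the FlowFile contents
    }
}

The key point is that nothing is removed from NiFi's content/FlowFile repositories until the session commits, so a failed or unavailable destination just leaves the FlowFiles queued.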
I'll keep an eye here, but you can get the attention of the larger NiFi group at users@nifi.apache.org.
Thanks
Joe

The best way to store restart information in spring-batch readers, processors, writers and tasklets

Currently I'm designing my first batch application with Spring Batch, using several tasklets and my own readers, writers and processors, primarily doing input-data checks and TIF-file handling (split, merge, etc.) depending on the input data, i.e. document metadata with the associated image files. I want to store and use restart information persisted in the batch_step_execution_context in the Spring Batch job repository. Unfortunately I did not find many examples of where and how to do this best. I want to make the application restartable, so that after error correction it can continue from the point where it left off.
What I have done so far, checking in each case that the step information is persisted when an exception occurs:
Implemented ItemStream in a CustomItemWriter, using update() and open() to store and regain information to/from the step execution context, e.g. executionContext.putLong("count", count). This works well (see the sketch after this list).
Used StepListeners and found that the context information written in beforeStep() is persisted. This also works.
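For reference, a minimal sketch of the ItemStream approach described above (the class, key and item-type names are illustrative, and the write() signature assumes a Spring Batch 4-style ItemWriter):

import java.util.List;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStream;
import org.springframework.batch.item.ItemStreamException;
import org.springframework.batch.item.ItemWriter;

// A writer that keeps a running count in the step execution context so a
// restarted job can resume where it left off.
public class CustomItemWriter implements ItemWriter<String>, ItemStream {

    private static final String COUNT_KEY = "count";
    private long count;

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        // On a restart, the framework hands back the last persisted context
        count = executionContext.containsKey(COUNT_KEY) ? executionContext.getLong(COUNT_KEY) : 0L;
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        // Called before each chunk commit; whatever is stored here is persisted
        executionContext.putLong(COUNT_KEY, count);
    }

    @Override
    public void close() throws ItemStreamException {
        // nothing to release in this sketch
    }

    @Override
    public void write(List<? extends String> items) throws Exception {
        // ... write the items (e.g. image files), then advance the counter
        count += items.size();
    }
}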
I would appreciate help that gives or points to some examples, a "restart tutorial", or sources explaining how to do this in readers, processors, writers and tasklets. Does it make sense in readers and processors? I'm aware that handling restart information may also depend on the commit interval, restartable flags, etc.
Remark: Maybe I need a deeper understanding of Spring Batch concepts beyond what I have read and tried so far; hints regarding this are also welcome. I consider myself at an intermediate level, lacking the details needed to make my application use some of the comforts of Spring Batch.

How safe is it to store sessions with Redis?

I'm currently using MySQL to store my sessions. It works, but it is a bit slow.
I've been asked to use Redis, but I'm wondering if it is a good idea because I've heard that Redis delays write operations. I'm a bit afraid because sessions need to be real-time.
Has anyone experienced such problems?
Redis is perfect for storing sessions. All operations are performed in memory, and so reads and writes will be fast.
The second aspect is persistence of session state. Redis gives you a lot of flexibility in how you persist session state to disk. You can go through http://redis.io/topics/persistence to learn more, but at a high level, here are your options:
If you cannot afford losing any sessions, set appendfsync always in your configuration file. With this, Redis guarantees that any write operations are saved to the disk. The disadvantage is that write operations will be slower.
If you are okay with losing about 1s worth of data, use appendfsync everysec. This will give great performance with reasonable durability guarantees.
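For reference, those two choices correspond to a couple of directives in redis.conf (the directive names are real; the comments are mine):

appendonly yes          # enable the append-only file (AOF)
appendfsync everysec    # fsync once per second; use "always" if you cannot afford to lose any writes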
This question is really about real-time sessions, and it seems to have arisen partly from a misunderstanding of the phrase 'delayed write operations'. While the details were eventually teased out in the comments, I just wanted to make it super-duper clear...
You will have no problems implementing real-time sessions.
Redis is an in-memory key-value store with optional persistence to disk. 'Delayed write operations' refers to writes to disk, not to the database in general, which lives in memory. If you SET a key/value pair, you can GET it immediately (i.e. in real time). The policy you select for persistence (how much you delay the writes) determines the upper bound on how much data could be lost in a crash.
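A minimal sketch of that, assuming the Java Jedis client (the key name, TTL and payload are illustrative):

import redis.clients.jedis.Jedis;

// Session data is written to and read back from memory immediately;
// persistence to disk happens in the background according to the configured policy.
public class SessionExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Store the session with a 30-minute expiry (in seconds)
            jedis.setex("session:abc123", 1800, "{\"userId\": 42}");

            // The value is readable right away, i.e. in real time
            String session = jedis.get("session:abc123");
            System.out.println(session);
        }
    }
}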
Basically there are two main types available: async snapshots and fsync(). They're called RDB and AOF respectively. More on the persistence modes on the official page.
The daemonized process's signal handling syncs to disk when it receives a SIGTERM, for instance, so the data will still be there after a reboot. I think the daemon or the OS has to crash before you'll see any data corruption, even with the default settings (RDB snapshots).
The AOF setting uses an Append Only File that logs the commands the server receives, and recreates the DB from scratch on cold start, from the saved file. The default disk-sync policy is to flush once every second (IIRC) but can be set to lock and write on every command.
Using both the snapshots and the incremental log seems to combine a long-term, don't-mind-if-I-miss-a-few-seconds-of-data approach with a more secure, but costlier, incremental log. Redis supports clustering out of the box, so replication can be done too, it seems.
I'm using the default RDB setting myself and saving the snapshots to remote FTP. I haven't seen a failure that caused data loss yet. Acute hardware failure or a power outage would most likely cause one, but I'm hosted on a VPS, so there's only a slim chance of that happening :)

How to understand asynchronous I/O in Windows?

1. How should I understand asynchronous I/O in Windows?
2. If I write/read something to a file using asynchronous I/O:
WriteFile();
ReadFile();
WriteFile();
How many threads does the OS generate to accomplish these tasks? Do the 3 tasks run simultaneously in a multi-threaded way, or do they run one after another, just in a different order?
3. Can I use multithreading, with each thread using asynchronous I/O to read or write the same file?
1. How should I understand asynchronous I/O in Windows?
Read the Win32 documentation. Search the web. Don't expect an answer to such a large, broad question here on SO.
2. If I write/read something to a file using asynchronous I/O:
WriteFile();
ReadFile();
WriteFile();
How many threads does the OS generate to accomplish these tasks?
I don't think it generates any. It will re-use existing thread contexts to execute the kernel function calls. Basically the OS schedules the work and borrows a thread to do it - which is fine, since the kernel context is always the same.
3. Can I use multithreading, with each thread using asynchronous I/O to read or write the same file?
I believe so, yes. I don't know whether the order of execution is guaranteed to match the order of submission; if it isn't, you will get unpredictable results if you issue concurrent reads/writes on the same byte ranges.
To your questions:
How many threads does the OS generate to accomplish these tasks?
That depends on whether you are using the Windows thread pools, IOCP, etc. Generally, you decide.
Do the 3 tasks run simultaneously in a multi-threaded way, or do they run one after another, just in a different order?
This depends on your architecture. On a single-core machine, the 3 tasks would run one after another, and the order would be decided by the OS. On a multi-core machine they might run together, depending on how the OS schedules the threads.
3. Can I use multithreading, with each thread using asynchronous I/O to read or write the same file?
That is out of my knowledge so someone else would need to answer that one.
I suggest getting a copy of Windows via C/C++, as it has a very large chapter on asynchronous I/O.
I guess it depends on which operating system you are using, but you shouldn't have to worry about this anyhow; it is transparent and should not affect how you write your code.
If you use the standard read and write calls on Windows, you don't have to care that the system may not write the data immediately, unless you are writing on the command line and waiting for the user to type some input. The OS is responsible for ensuring that what you write will eventually reach the hard drive, and it will do a much better job than you can anyway.
If you are working with some unusual asynchronous I/O, then please clarify your question.
I suggest looking for Jeffrey Richter's books on Win32 programming. They are very well-written guides for just this sort of thing.
I think he has a newer book (or books) on C#, so watch out that you don't buy the wrong one.
