Tarantool WAL vs Redis AOF

Redis with the AOF (append-only file) option enabled can lose at most one second of data. I know that Tarantool has wal_mode = 'write' by default, and that this mode provides data persistence for both vinyl and memtx. In what cases can this result in data loss that cannot be recovered?
The second part of the question:
which wal_mode setting (none/write/fsync) do you recommend for vinyl and memtx in particular?
I would like a constructive answer, because a similar question was asked earlier but it raised even more questions and did not get a clear answer:
Difference between Redis AOF and Tarantool WAL log

wal_mode = 'write' may lose at most one record (in the case of a hard server reboot); wal_mode = 'fsync' never loses data.
You may refer to the Tarantool documentation here: https://www.tarantool.io/en/doc/1.10/reference/configuration/#confval-wal_mode
It implies that in write mode an operation is not acknowledged to the client until the write syscall succeeds, and in fsync mode it is not acknowledged until the subsequent fsync syscall succeeds as well.
Thus, fsync mode "never" loses data, as it is persisted on disk before the reply is sent to the client. write mode may lose data during a system reboot: data that was accepted into the OS buffer but not yet flushed to disk. On typical workloads we have encountered cases where a single operation was lost this way.
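To make the write/fsync distinction concrete, here is a minimal Go sketch (not Tarantool code, just an illustration of the underlying syscalls; the file name is made up): Write hands the record to the OS page cache, while Sync (fsync) forces it to stable storage before the caller continues.

package main

import (
	"log"
	"os"
)

func main() {
	// Open an append-only log file, the way a WAL is typically written.
	f, err := os.OpenFile("wal.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Analogue of wal_mode = 'write': the record lands in the OS page cache.
	// A process crash cannot lose it, but a power failure or hard reboot
	// before the kernel flushes the cache can.
	if _, err := f.Write([]byte("record\n")); err != nil {
		log.Fatal(err)
	}

	// Analogue of wal_mode = 'fsync': additionally force the data to stable
	// storage before acknowledging the client. Slower, but survives a hard reboot.
	if err := f.Sync(); err != nil {
		log.Fatal(err)
	}
}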

Related

Spark save files distributedly

According to the Spark documentation,
All transformations in Spark are lazy, in that they do not compute their results right away. Instead, they just remember the transformations applied to some base dataset (e.g. a file). The transformations are only computed when an action requires a result to be returned to the driver program.
I am currently working on a large dataset that, once processed, outputs an even bigger amount of data, which needs to be stored in text files, as is done with the command saveAsTextFile(path).
So far I have been using this method; however, since it is an action (as stated above) and not a transformation, Spark needs to send data from every partition to the driver node, thus slowing down the process of saving quite a bit.
I was wondering if any distributed file saving method (similar to saveAsTextFile()) exists on Spark, enabling each executor to store its own partition by itself.
I think you're misinterpreting what it means to send a result to the driver. saveAsTextFile does not send the data back to the driver. Rather, it sends the result of the save back to the driver once it's complete. That is, saveAsTextFile is distributed. The only case where it's not distributed is if you only have a single partition, or you've coalesced your RDD back to a single partition before calling saveAsTextFile.
What that documentation is referring to is sending the result of saveAsTextFile (or any other "action") back to the driver. If you call collect() then it will indeed send the data to the driver, but saveAsTextFile only sends a succeed/failed message back to the driver once complete. The save itself is still done on many nodes in the cluster, which is why you'll end up with many files - one per partition.
IO is always expensive. But sometimes it can seem as if saveAsTextFile is even more expensive, precisely because of the lazy behavior described in that excerpt. Essentially, when saveAsTextFile is called, Spark may have to perform many or all of the prior transformations on the data before it can be saved. That is what is meant by laziness.
If you have the Spark UI set up it may give you better insight into what is happening to the data on its way to a save (if you haven't already done that).

2 instances of Redis: as a cache and as a persistent datastore

I want to set up 2 instances of Redis because I have different requirements for the data I want to store. While I sometimes do not mind losing data that is used primarily as cached data, I want to avoid losing data in some cases, such as when I use Python RQ, which stores in Redis the jobs to execute.
I have listed below the main settings to achieve such a goal.
What do you think?
Did I forget anything important?
1) Redis as a cache
# Snapshotting to not rebuild the whole cache if it has to restart
# Be reasonable to not decrease the performances
save 900 1
save 300 10
save 60 10000
# Define a max memory and remove less recently used keys
maxmemory X # To be defined according to needs
maxmemory-policy allkeys-lru
maxmemory-samples 5
# The rdb file name
dbfilename dump.rdb
# The working directory.
dir ./
# Make sure appendonly is disabled
appendonly no
2) Redis as a persistent datastore
# Disable snapshotting since we will save each request, see appendonly
save ""
# No limit in memory
# How to disable it? By not defining it in the config file?
maxmemory
# Enable appendonly
appendonly yes
appendfilename redis-aof.aof
appendfsync always # Save on each request to not lose any data
no-appendfsync-on-rewrite no
# Rewrite the AOF file; choose a good min size based on the approximate size of the DB?
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 32mb
aof-rewrite-incremental-fsync yes
aof-load-truncated yes
Sources:
http://redis.io/topics/persistence
https://raw.githubusercontent.com/antirez/redis/2.8/redis.conf
http://fr.slideshare.net/eugef/redis-persistence-in-practice-1
http://oldblog.antirez.com/post/redis-persistence-demystified.html
How to perform Persistence Store in Redis?
https://www.packtpub.com/books/content/implementing-persistence-redis-intermediate
I think your persistence options are too aggressive - but it mostly depends on the nature and the volume of your data.
For the cache, using RDB is a good idea, but keep in mind that, depending on the volume of data, dumping the content of the memory to disk has a cost. On my system, Redis can write memory data at 400 MB/s, but note that data may (or may not) be compressed, and may (or may not) be using dense data structures, so your mileage will vary. With your settings, a cache under heavy writing will generate a dump every minute. You have to check that, with the volume you have, the dump duration is well below that minute (something like 6-10 seconds would be fine). Actually, I would recommend keeping only save 900 1 and removing the other save lines. And even a dump every 15 min could be considered too frequent, especially if you have SSD hardware that will progressively wear out.
For the persistent store, you also need to define the dir parameter (since it also controls the location of the AOF file). The appendfsync always option is overkill and too slow for most purposes, unless you have very low throughput. You should set it to everysec. If you cannot afford to lose a single bit of data even in case of a system crash, then using Redis as a storage backend is not a good idea. Finally, you will probably have to adjust auto-aof-rewrite-percentage and auto-aof-rewrite-min-size to the level of write throughput the Redis instance has to sustain.
I totally agree with @Didier - this is more of a supplement than a full answer.
First note that Redis offers tunable persistence - you can use RDB and/or AOF. While your choice of using RDB for the cache makes perfect sense, I would recommend considering using both for your persistent store. This will give you both point-in-time recovery based on the snapshots (i.e. backups) and post-crash recovery to the last recorded operation with the AOF.
For the persistent store, you don't want to set maxmemory to 0 (which is the default if it is commented out in the conf file). When set to 0, Redis will use as much memory as the OS will give it so eventually, as your dataset grows, you will run into a situation where the OS will kill it to free memory (this often happens when you least expect it ;)). You should, instead, use a real value that's based on the amount of RAM that your server has with enough padding for the OS. For example, if your server has 16GB of RAM, as a rule of thumb I'd restrict Redis from using more than 14GB.
But there's a catch. Since you've read everything about Redis' persistency, you probably remember that Redis forks to write the data to disk. Forking can more than double the memory consumption (forked copy + changes) during the child process' execution so you need to make sure that your server has enough free memory to accommodate that if you use data persistence. Also note that you should consider in your maxmemory calculation other potential memory-consuming thingies such as replication and client buffers depending on what/how you and the app use Redis.
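Putting the two answers together, a sketch of the persistent-store configuration could look like the following (the dir path and the 14gb cap are illustrative assumptions for a 16 GB server, not recommendations for your workload):

# Keep RDB snapshots as periodic point-in-time backups alongside the AOF
save 900 1
# Location of both the RDB and the AOF files (illustrative path)
dir /var/lib/redis
# Cap memory well below physical RAM, leaving headroom for the fork and the OS
maxmemory 14gb
# A persistent store should refuse writes rather than evict data
maxmemory-policy noeviction
# AOF with a per-second fsync: at most about one second of writes lost on a crash
appendonly yes
appendfsync everysec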

Making a journal file in golang

I have a small project in Go that receives text lines over TCP for processing. However, to ensure robustness, I want to create some sort of journal so that nothing is lost in case of a power failure (e.g. a frame of data has been received by my app but not yet processed).
I have googled for any guides on how a journal file should be implemented, but the search results are heavily polluted by Oracle RDBMS documentation and such.
My thought was something like: immediately after receiving a line, write it to a file with a "not processed" flag. After processing, update the file so that this flag is cleared, opening the slot up for overwrites. At the same time as this flag is cleared, send a "processed ack" to the data sender. Perhaps it's easiest to deal with fixed-size "slots" in the journal, so that I can reuse freed slots rather than having an ever-increasing file, and maintain a "free list" of unused slots.
Is there any "best practice" for implementing such files in custom code, e.g. with regards to file structure, padding and locking? Are there any concerns doing this in Go, as it is cross-platform, rather than using native file-system APIs?
You shouldn't rewrite a journal. Just append the operations to it so that you can recreate them, and then control the strictness level you want.
The logic should simply be:
receive message
write it to journal
optionally do an fsync on the journal now - depending on your consistency requirements.
optionally then send a "received ack" - depends on your needs.
process the message.
optionally write another "processed" record to the file with the id of the record. You don't always need that, but this way you don't rewrite the old record. Alternatively, you can write a separate file with the "top transaction id" you've processed, so you'll automatically know where to begin processing again in case of a failure. This will reduce the journal size.
send a "processed ack" or "processing failure" - again, depends on what you want.
Databases usually let you control the fsync behavior - every write, every N seconds, or whenever the OS decides - it's a matter of speed vs. durability.
A good read on the subject might be this post on redis persistence:
http://oldblog.antirez.com/post/redis-persistence-demystified.html
[EDIT] another great read on the subject - http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
As for the Go aspect of it - there are a few options for writing to files, from a low-level file handle to a buffered writer. Of course, a plain file handle will keep you most in control of what's going on under the hood. I'm not sure how much caching a normal file writer in Go does behind the scenes; I'd suggest you read the code if you intend to use it.
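As a rough Go sketch of the append-only approach described above (the record format and names are made up for illustration): every incoming line is appended as a "received" record, optionally fsynced before the ack, and completion is recorded with a separate "processed" record instead of rewriting the old one.

package journal

import (
	"fmt"
	"os"
)

// Journal appends records to a file and never rewrites existing entries.
type Journal struct {
	f    *os.File
	sync bool // fsync after every append, for stricter durability
}

func OpenJournal(path string, syncEveryWrite bool) (*Journal, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
	if err != nil {
		return nil, err
	}
	return &Journal{f: f, sync: syncEveryWrite}, nil
}

// Append records a received message. Call it before acking the sender.
func (j *Journal) Append(id uint64, line string) error {
	if _, err := fmt.Fprintf(j.f, "RECV %d %s\n", id, line); err != nil {
		return err
	}
	if j.sync {
		return j.f.Sync() // force the record to disk before the ack goes out
	}
	return nil
}

// MarkProcessed appends a "processed" record rather than rewriting the old one.
func (j *Journal) MarkProcessed(id uint64) error {
	if _, err := fmt.Fprintf(j.f, "DONE %d\n", id); err != nil {
		return err
	}
	if j.sync {
		return j.f.Sync()
	}
	return nil
}

func (j *Journal) Close() error { return j.f.Close() }

On startup, replaying the file and collecting RECV records without a matching DONE tells you which messages still need processing.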

How safe is it to store sessions with Redis?

I'm currently using MySQL to store my sessions. It works great, but it is a bit slow.
I've been asked to use Redis, but I'm wondering if it is a good idea because I've heard that Redis delays write operations. I'm a bit afraid because sessions need to be real-time.
Has anyone experienced such problems?
Redis is perfect for storing sessions. All operations are performed in memory, and so reads and writes will be fast.
The second aspect is persistence of session state. Redis gives you a lot of flexibility in how you want to persist session state to your hard-disk. You can go through http://redis.io/topics/persistence to learn more, but at a high level, here are your options -
If you cannot afford losing any sessions, set appendfsync always in your configuration file. With this, Redis guarantees that any write operations are saved to the disk. The disadvantage is that write operations will be slower.
If you are okay with losing about 1s worth of data, use appendfsync everysec. This will give you great performance with reasonable data guarantees.
This question is really about real-time sessions, and seems to have arisen partly from a misunderstanding of the phrase 'delayed write operations'. While the details were eventually teased out in the comments, I just wanted to make it super-duper clear...
You will have no problems implementing real-time sessions.
Redis is an in-memory key-value store with optional persistence to disk. 'Delayed write operations' refers to writes to disk, not to the database in general, which lives in memory. If you SET a key/value pair, you can GET it immediately (i.e. in real time). The persistence policy you select (how much you delay the writes to disk) determines the upper bound on how much data could be lost in a crash.
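For example, with the go-redis client (the address, key, and TTL below are placeholders), a session write is immediately readable; persistence to disk happens in the background according to whichever RDB/AOF policy you configured.

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// Write the session with a 30-minute TTL; this is an in-memory operation.
	if err := rdb.Set(ctx, "session:abc123", `{"user_id":42}`, 30*time.Minute).Err(); err != nil {
		log.Fatal(err)
	}

	// The value is readable immediately - disk persistence (RDB/AOF) runs
	// in the background and does not delay this read.
	val, err := rdb.Get(ctx, "session:abc123").Result()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(val)
}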
Basically there are two main types available: async snapshots and fsync(). They're called RDB and AOF respectively. More on persistence modes on the official page.
The signal handling of the daemonized process syncs to disk when it receives a SIGTERM for instance, so the data will still be there after a reboot. I think the daemon or the OS has to crash before you'll see an integrity corruption, even with the default settings (RDB snapshots).
The AOF setting uses an Append Only File that logs the commands the server receives, and recreates the DB from scratch on cold start, from the saved file. The default disk-sync policy is to flush once every second (IIRC) but can be set to lock and write on every command.
Using both the snapshots and the incremental log seems to combine a long-term, don't-mind-if-I-miss-a-few-seconds-of-data approach with a more secure, but more costly, incremental log. Redis supports clustering out of the box, so replication can be done too, it seems.
I'm using the default RDB setting myself and saving the snapshots to remote FTP. I haven't seen a failure that caused data loss yet. Acute hardware failure or a power outage would be the most likely cause, but I'm hosted on a VPS. Slim chance of that happening :)

Write request flow in Linux from user space to the device?

I'm confused as to what happens when I issue a write from user space in Linux. What is the full flow, down to the storage device? Supposing I use CFQ and a kernel that still uses pdflush.
CFQ is said to place requests into sync and async queues. Writes are async, for example, so they go into an async queue based on the I/O priority. The queue will get CPU according to CFQ time slices, expiration settings, etc. Fine.
At the same time, writes dirty pages. Writing dirty pages back is triggered by VM settings and is done in the context of the pdflush thread, which runs a copy of background_writeout(), which calls writeback_nodes(). When this happens, the writes pdflush does are surely sync.
Does it mean that we have two competing writes going on potentially for the same or similar write requests - one controlled by CFQ queuing mechanisms and another triggered by VM subsystem?
Does it mean the VM subsystem effectively breaks CFQ rules as soon as we hit dirty_background_ratio, since pdflush doesn't carry the same I/O priority as the requesting process?
And as soon as we hit dirty_ratio, do CFQ settings become more or less obsolete, since all writes become sync?
I've read a lot of specific info on the two subsystems, but I'm yet to understand the whole write request flow. The interactive Linux kernel map does not include IO scheduler.
Any pointers to the whole picture would be appreciated.
Thanks,
Alex
I've found the answer since then. Basically, yes, pdflush and the write cache in general do break I/O priorities for writes. Reads can still benefit from CFQ, since they are sync and sync requests get priority, but for writes to be subject to I/O priorities they have to be direct.
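As a loose illustration of such a direct write (a Linux-only Go sketch; the path and the 4096-byte block size are assumptions, and real code should query the device's logical block size): opening with O_DIRECT bypasses the page cache, so the request is submitted by, and accounted to, the calling process rather than a flusher thread.

package main

import (
	"log"
	"syscall"
	"unsafe"
)

const blockSize = 4096 // assumed logical block size; O_DIRECT needs aligned, block-multiple I/O

// alignedBlock returns a blockSize-long slice whose backing array starts on a
// blockSize boundary, as O_DIRECT requires.
func alignedBlock() []byte {
	buf := make([]byte, 2*blockSize)
	rem := int(uintptr(unsafe.Pointer(&buf[0])) % blockSize)
	off := 0
	if rem != 0 {
		off = blockSize - rem
	}
	return buf[off : off+blockSize]
}

func main() {
	// O_DIRECT bypasses the page cache, so the write is submitted to the I/O
	// scheduler in this process's context and with its I/O priority, instead
	// of being deferred to pdflush/flusher threads.
	fd, err := syscall.Open("/tmp/direct.dat", syscall.O_WRONLY|syscall.O_CREAT|syscall.O_DIRECT, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer syscall.Close(fd)

	block := alignedBlock()
	copy(block, "payload")
	if _, err := syscall.Write(fd, block); err != nil {
		log.Fatal(err)
	}
}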
