I am working on a project that contains a fixed-size FIFO (First In, First Out) buffer: clients send their requests to that buffer, and the system handles them. When the buffer is full, I have to apply one of the following overload (drop) policies:
DRPH: drop one request from the head of the buffer.
DRPT: drop one request from the tail of the buffer.
DRPR: drop 25% of the elements in the buffer at random.
BLCK: block new connections until space is available in the buffer.
I made a simulation to measure performance using httperf, sending many requests per second and measuring the response time, but I got unstable response-time values, especially when the number of requests is large, so I cannot determine the best drop policy by simulation. I repeated the simulation many times and got different values each time.
The question is: theoretically, what is the best buffer-management drop policy among the policies mentioned?
It definitely depends on your data and on the order in which it is needed. But usually, with a FIFO, the data at the tail end of the buffer is the oldest and so the least likely to be required again, so DRPT is probably the best choice. But only if you can afford to lose data (e.g. because it can be re-inserted later). If that is not the case, you have to block new connections (BLCK) until buffer space is available again.
Another thing: I would strive for a dynamic buffer. Start with a reasonable default size and watch how quickly it fills up. Above a certain fill rate, increase the buffer size (and below a certain threshold you can lower it again), up to a certain maximum.
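To make the policies concrete, here is a minimal sketch of how they might look on a bounded deque (the function and the head/tail convention are my own illustration, not the original system):

```python
import random
from collections import deque

def offer(buf, item, capacity, policy):
    """Enqueue item into FIFO buf. Convention used here (matching the
    answer above): new requests enter at the head (left), the system
    serves from the tail (right), so the tail holds the oldest request.
    Returns False only when policy is BLCK and the buffer is full."""
    if len(buf) >= capacity:
        if policy == "DRPH":      # drop the newest request (head)
            buf.popleft()
        elif policy == "DRPT":    # drop the oldest request (tail)
            buf.pop()
        elif policy == "DRPR":    # drop 25% of entries at random, keeping order
            victims = set(random.sample(range(len(buf)), max(1, len(buf) // 4)))
            survivors = [x for i, x in enumerate(buf) if i not in victims]
            buf.clear()
            buf.extend(survivors)
        else:                     # BLCK: refuse until space frees up
            return False
    buf.appendleft(item)
    return True
```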
I am writing events into a Redis Stream.
But I would like to keep only the last 48 hours of events.
According to the Redis documentation, I saw that I can manage the stream size only using MAXLEN, which works by record count and not by time frame.
Is there any way I can use XADD but, on insertion, trim away records older than the last 48 hours?
Thanks for the help!
This is not yet clear. I don't like the vanilla way of time-capping a stream, that is, "trim by <seconds>", because it means that if there is a delay in the process XADD-ing items, a later XADD may have to evict items covering potentially many seconds, causing latency spikes. Moreover it does not make a lot of sense semantically. Your real capped resource is memory, so how many items you want to store in the past is less important than how many items you can store, and a limit on the number of items makes more sense. Yet in certain applications with multiple streams whose insertion rates vary a lot between producers, it makes sense to cap by time, to avoid wasting memory on producers that emit very few entries per unit of time. Maybe at some point I'll add some "best effort" time capping that never does more than a bounded amount of work per call, but that eventually trims the stream, given enough XADD calls.
AFAIK not yet. There were discussions about adding a timestamp cap (to XADD, and possibly to XTRIM as well), but it doesn't look like this feature has been implemented in the latest release candidates.
A possible solution in Node.js, based on trimming to a specified key (not on time per se):
https://gist.github.com/jakelowen/22cb8a233ac0cdbb8e77808e17e0e1fc
Proof of concept. Not battle tested.
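For what it's worth, Redis 6.2 later added a MINID trimming strategy to XADD/XTRIM, which makes time-based capping straightforward, since stream IDs begin with a millisecond timestamp. A sketch with redis-py (assuming a redis-py version new enough to expose minid; the stream name and payload are placeholders):

```python
import time
import redis

r = redis.Redis()
STREAM = "events"                   # placeholder stream name
RETENTION_MS = 48 * 60 * 60 * 1000  # 48 hours

# Add the event, then drop everything older than the cutoff ID.
# approximate=True (the '~' flag) lets Redis trim lazily, avoiding
# exactly the latency spikes described in the comment above.
r.xadd(STREAM, {"payload": "..."})
cutoff = int(time.time() * 1000) - RETENTION_MS
r.xtrim(STREAM, minid=cutoff, approximate=True)
```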
OK - Here is what I'm trying to achieve.
I've got an ES cluster with tens of millions of documents (and it can grow linearly). These are raw data (something like an audit log). We will (incrementally) add features that retrospectively transform this audit log into different documents (indices), depending on the feature requirement. That would require reindexing (bulk read and bulk write).
These are my technical requirements:
The "reindexing component" should be horizontally scalable. Linearly scale by spinning up multiple instances of this (to speed up).
The "reindexing component" should be resilient. If one chunk of data fails during read by one worker, some other worker should pick this up.
It should be resumable: able to resume from where it stopped (or crashed) rather than reading through the full index again.
A bit of research showed me that I'd have to build a bespoke solution for my needs.
Now my question is: should I use scroll or from & size?
Scroll is naturally intended for efficient bulk reads, but I also need it to be horizontally scalable. I understand there's a "sliced scroll" feature that allows parallel scrolls, but is that parallelism limited to the number of shards? I.e. if the number of shards is 5, can I only have 5 workers reading from Elasticsearch? (The transformations themselves can still be scaled, though.)
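For reference, this is roughly how I picture one sliced-scroll worker (a sketch with the Python client; the index name and transform_and_write are placeholders):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()

def scroll_slice(slice_id, max_slices, index="audit-log"):
    """One worker scrolls through its own slice of the index."""
    page = es.search(
        index=index,
        scroll="5m",
        size=2000,
        body={
            "slice": {"id": slice_id, "max": max_slices},
            "query": {"match_all": {}},
        },
    )
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            transform_and_write(hit)  # hypothetical transform + bulk-write step
        page = es.scroll(scroll_id=page["_scroll_id"], scroll="5m")
```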
Alternatively, I was wondering if paging (using from and size) would tick all my boxes. The approach: I'd find the total count, compute the offsets, and put them on a queue. A pool of workers would then take offsets from the queue and read the documents using from & size. This way I'd know exactly which offsets have failed or are pending, and the reads can scale.
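Something like this sketch (again, the index name and transform_and_write are placeholders; note that from values beyond index.max_result_window, 10,000 by default, are rejected unless that setting is raised):

```python
import queue

def enqueue_offsets(total, page_size=2000):
    q = queue.Queue()
    for offset in range(0, total, page_size):
        q.put(offset)
    return q

def page_worker(q, index="audit-log", page_size=2000):
    """Pulls offsets from the queue; a failed offset can be re-queued
    so another worker picks it up."""
    while True:
        try:
            offset = q.get_nowait()
        except queue.Empty:
            return
        page = es.search(
            index=index,
            from_=offset,
            size=page_size,
            body={"query": {"match_all": {}}, "sort": ["_doc"]},
        )
        for hit in page["hits"]["hits"]:
            transform_and_write(hit)
```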
However, the important question: does firing ever more large paging requests at Elasticsearch concurrently harm it (assuming a page size of 2000)?
I'd like to hear different views/solutions/pointers/comments on this.
I am currently toying around with the Scroll API of Elasticsearch, and want to use it to obtain a large set of data and do some manual processing on it. The processing is performed by an external library and is not of the type that can easily be included as a script.
While this seems to work nicely at the moment, I was wondering what considerations I should take into account when fine-tuning the scroll size for this kind of processing. A quick observation seems to indicate that increasing the scroll size reduces the latency of the operation. While I suspect that larger scroll sizes will generally reduce throughput, I have no idea whether this hypothesis is correct. I also have no idea whether there are other consequences I am not envisioning right now.
So to summarize, my question is: what impact does changing Elasticsearch's scroll size have, especially on performance, in a scenario where the results are processed for each batch that is obtained?
Thanks in advance!
The one consideration (and the only one I know of) is to process each batch fast enough that the scroll context is not released (the keep-alive is controlled by the ?scroll=X parameter).
Assuming you will consume all the data from the query, the scroll size should be tuned based on network and third-party app performance, i.e. (see the sketch after this list):
if your app can process data in a stream-like manner, bigger chunks are better
if your app processes data in batches (waiting for the full ES response first), the upper limit for the batch size should guarantee that processing time < scroll keep-alive time
if you work in a poor network environment, a smaller batch size is better for handling the overhead of dropped connections/retries
generally, a bigger batch is better, as it eliminates some network/ES CPU overhead
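A minimal sketch of the trade-off (Python client; the index name and process are placeholders):

```python
import time
from elasticsearch import Elasticsearch

es = Elasticsearch()
KEEP_ALIVE = "2m"   # scroll context keep-alive; must exceed per-batch work
BATCH_SIZE = 1000   # the knob being tuned

page = es.search(index="my-index", scroll=KEEP_ALIVE, size=BATCH_SIZE,
                 body={"query": {"match_all": {}}})
while page["hits"]["hits"]:
    start = time.monotonic()
    for hit in page["hits"]["hits"]:
        process(hit)  # placeholder for the external-library processing
    elapsed = time.monotonic() - start
    # If elapsed creeps toward the keep-alive, lower BATCH_SIZE
    # or raise KEEP_ALIVE before the context expires mid-scroll.
    page = es.scroll(scroll_id=page["_scroll_id"], scroll=KEEP_ALIVE)
```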
I am trying to spread out data that is received in bursts: data arrives from some other application in large bursts, and for each data entry I need to make additional requests to a server whose traffic I must limit. So I try to spread out the requests over the time I have until the next burst arrives.
Currently I am using a token bucket to spread out the data. However, because the data I receive is already badly shaped, I am still either filling up the queue of pending requests or getting spikes whenever a burst comes in. So this algorithm does not seem to do the kind of shaping I need.
What other algorithms are available to limit the requests? I know I have times of high load and times of low load, so both should be handled well by the application.
I am not sure if I was really able to explain the problem I am currently having. If you need any clarifications, just let me know.
EDIT:
I'll try to clarify the problem some more and explain why a simple rate limiter does not work.
The problem lies in the bursty nature of the traffic and the fact that bursts have different sizes at different times. What is mostly constant is the delay between bursts. So we get a bunch of data records for processing and need to spread them out as evenly as possible before the next bunch comes in. However, we are not 100% sure when the next bunch will arrive, only approximately, so simply dividing the time by the number of records does not work as well as it should.
Plain rate limiting does not work, because it does not spread the data sufficiently. If we are close to saturating the rate, everything is fine and we spread out evenly (although this should not happen too frequently). Below that threshold, however, the spreading gets much worse.
I'll make an example to make this problem more clear:
Let's say we limit our traffic to 10 requests per second and new data comes in roughly every 10 seconds.
When we get 100 records at the beginning of a time frame, we will send 10 records each second and have a perfectly even spread. However, if we get only 15 records, we'll have one second where we send 10 records, one second where we send 5, and 8 seconds where we send 0, so traffic levels are very uneven over time. It would be better to send just 1.5 records each second. But setting that rate is also problematic, since new data might arrive earlier, so we would not have the full 10 seconds and 1.5 per second would not be enough. With a token bucket, the problem actually gets worse, because token buckets let bursts through at the beginning of the time frame.
This example oversimplifies, though, because in reality we cannot know the exact number of pending requests at any given moment, only an upper limit. So we would have to throttle each time based on that number.
This sounds like a problem within the domain of control theory. Specifically, I'm thinking a PID controller might work.
A first crack at the problem might be dividing the number of records by the estimated time until the next batch. This would be like a P controller - proportional only. But then you run the risk of overestimating the time and building up some unsent records. So try adding in an I term - integral - to account for the built-up error.
I'm not sure you even need a derivative term, if the variation in batch size is random. So try using a PI loop - you might build up some backlog between bursts, but it will be handled by the I term.
If it's unacceptable to have a backlog, then the solution might be more complicated...
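A minimal sketch of what such a PI-style pacer could look like (the gains and the burst-interval estimate are my own assumptions to tune, not values from the question):

```python
class PIPacer:
    """P term: the rate that would drain the current batch exactly by the
    next (estimated) burst. I term: backlog carried over from cycles where
    the interval was underestimated."""

    def __init__(self, kp=1.0, ki=0.3, est_interval=10.0):
        self.kp, self.ki = kp, ki
        self.est_interval = est_interval  # estimated seconds between bursts
        self.backlog = 0.0                # records left unsent from past cycles

    def send_rate(self, pending):
        """Requests per second to send right now."""
        return self.kp * (pending / self.est_interval) + self.ki * self.backlog

    def burst_arrived(self, unsent):
        """Call when the next burst lands; leftovers feed the I term."""
        self.backlog += unsent

    def sent_from_backlog(self, n):
        self.backlog = max(0.0, self.backlog - n)
```

With the question's numbers: 15 records and a 10 s estimate give a P rate of 1.5 requests/s; if the next burst arrives after only 8 s, the ~3 unsent records move into the I term and raise the rate in the following cycle instead of piling up silently.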
If there are no other constraints, figure out the maximum rate at which you are comfortable sending the additional requests, and limit your processing speed accordingly. Then monitor what happens. If that gets through all of your requests quickly, there is no harm. If its sustained level of processing is not fast enough, you need more capacity.
If I have a large dataset and do random updates, then I think the updates are mostly disk-bound (with append-only databases it is not about seeks but about bandwidth, I think). When I update a record slightly, one data page must be updated, so if my disk can write 10 MB/s of data and the page size is 16 KB, then I can have at most 640 random updates per second. In append-only databases, about 320 per second, because one update can take two pages: index and data. In other databases, because of random seeks to update pages in place, it can be even worse, maybe 100 updates per second.
I assume that each page in cache receives only one update before being written (random updates). The same will hold for random inserts spread across all data pages (for example, with non-time-ordered UUIDs), or it will be even worse.
I am referring to the situation where dirty pages (after an update) must be flushed to disk and synced (they can no longer stay in cache). So is the number of updates per second in this situation bounded by disk bandwidth? Are my calculations, like 320 updates per second, plausible? Maybe I am missing something?
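The back-of-the-envelope arithmetic behind those numbers, for reference:

```python
bandwidth = 10 * 1024 * 1024  # 10 MB/s sequential write
page_size = 16 * 1024         # 16 KB pages

one_page_per_update = bandwidth // page_size      # 640 updates/s
two_pages_per_update = one_page_per_update // 2   # 320 updates/s (index + data)
print(one_page_per_update, two_pages_per_update)
```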
"It depends."
To be complete, there are other things to consider.
First, the only thing distinguishing a random update from an append is the head seek involved. A random update will have the head dancing all over the platter, whereas an append will ideally just track like a record player. This also assumes that each disk write is a full write and completely independent of all other writes.
Of course, that's in a perfect world.
With most modern databases, each update will typically involve, at a minimum, 2 writes. One for the actual data, the other for the log.
In a typical scenario, if you update a row, the database makes the change in memory. If you commit that row, the database acknowledges it by making a note in the log while keeping the actual dirty page in memory. Later, when the database checkpoints, it will write the dirty pages to disk. But when it does this, it sorts the blocks and writes them as sequentially as it can. Then it writes a checkpoint to the log.
During recovery, when the DB crashed before it could checkpoint, the database reads the log from the last checkpoint onward, "rolls it forward" by applying those changes to the actual disk pages, marks a final checkpoint, then makes the system available for service.
The log write is sequential, the data writes are mostly sequential.
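A toy sketch of that commit/checkpoint split (pure illustration of the idea, not any particular database's code):

```python
import os

class ToyWAL:
    """Commits append to a sequential log; dirty pages stay in memory
    until a checkpoint writes them out sorted by page id."""

    PAGE_SIZE = 16 * 1024

    def __init__(self, log_path="wal.log"):
        self.log = open(log_path, "ab")
        self.dirty = {}  # page_id -> page bytes

    def update(self, page_id, data):
        self.dirty[page_id] = data       # change happens in memory only

    def commit(self, note):
        self.log.write(note + b"\n")     # sequential append...
        self.log.flush()
        os.fsync(self.log.fileno())      # ...the only sync on the hot path

    def checkpoint(self, datafile):
        mode = "r+b" if os.path.exists(datafile) else "w+b"
        with open(datafile, mode) as f:
            for page_id in sorted(self.dirty):  # as sequential as possible
                f.seek(page_id * self.PAGE_SIZE)
                f.write(self.dirty[page_id])
            f.flush()
            os.fsync(f.fileno())
        self.dirty.clear()
        self.commit(b"checkpoint")
```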
Now, if the log is part of a normal file (typical today), then you write the log record, which appends to the disk file. The FILE SYSTEM will then (likely) append that change to ITS own log so that it can update its local file-system structures. Later, the file system will also commit its dirty pages and make its metadata changes permanent.
So, you can see that even a simple append can invoke multiple writes to the disk.
Now consider an "append only" design like CouchDB. What Couch will do, is when you make a simple write, it does not have a log. The file is its own log. Couch DB files grow without end, and need compaction during maintenance. But when it does the write, it writes not just the data page, but any indexes affected. And when indexes are affected, then Couch will rewrite the entire BRANCH of the index change from root to leaf. So, a simple write in this case can be more expensive than you would first think.
Now, of course, throw in all of the random reads disrupting your random writes and it all gets quite complicated quite quickly. What I've learned, though, is that while streaming bandwidth is an important aspect of IO operations, overall operations per second are even more important. You can have 2 disks with the same bandwidth, but the one with the slower platter and/or head speed will have fewer ops/sec, just from head travel time and rotational latency.
Ideally, your DB uses dedicated raw storage rather than a file system, but most do not do that today. The operational advantages of file-system-based stores typically outweigh the performance benefits of raw storage.
If you're on a file system, then preallocated, sequential files are a benefit, so that your "append only" isn't simply skipping around other files on the file system and thus becoming similar to random updates. Also, with preallocated files, your updates only touch DB data structures during writes, rather than DB AND file-system data structures as the file expands.
Putting logs, indexes, and data on separate disks allows multiple drives to work simultaneously with less interference. Your log can then be truly append-only, for example, instead of fighting with random data reads or index updates.
So, all of those things factor in to throughput on DBs.