Random spikes in usage (CockroachCloud Serverless) - cockroachdb

I recently set up a free CockroachDB Serverless cluster on CockroachCloud. It's been really great so far, but sometimes there are random spikes in Request Units even though the amount of SQL statements doesn't increase at all. Here's a screenshot of the two graphs in the cluster management page, it illustrates pretty well what I mean. I would really appreciate some help on how I could eliminate these spikes because CockroachCloud has some limits on free usage. That being said, I'm still fairly new to CockroachDB, so I might be missing something obvious.

You are likely performing enough mutations on your data to trigger automatic statistics collection as a background process. By default when 20% or more rows are modified in a table, CockroachDB will trigger a statistics refresh. The statistics are used by the optimizer to create more efficient query plans.
Your SQL Statements graph indicates that almost all your operations are inserts. That many inserts is almost certainly triggering stats collection. While you can turn off stats collection, the optimizer will then be using stale data to calculate query plans, potentially causing performance problems.
The occasional spikes in your Request Unit graph are above the 100 RUs per second baseline, but the rest of the time you are well below 100 RUs per second. That means you are accumulating RUs most of the time, and that (plus the initial 10 million RU allocation) should cover the bursts.
I added a FAQ entry to the Serverless docs covering this.

Related

Amazon Elasticsearch - Concurrent Bulk Requests

When I am adding 200 documents to ElasticSearch via one bulk request - it's super fast.
But I am wondering if is there a chance to speed up the process with concurrent executions: 20 concurrent executions with 10 documents each.
I know it's not efficient, but maybe there is a chance to speed up the process with concurrent executions?
Lower concurrency is preferable for bulk document inserts. Some concurrency is helpful in some circumstances — It Depends™ and I'll get into it — but is not a major or automatic win.
There's a lot that can be tuned when it comes to performance of writes to Elasticsearch. One really quick win that you should check: are you using HTTP keep-alive for your connections? That's going to save a lot of the TCP and TLS overhead of setting up each connection. Just that change can make a big performance boost, and also uncover some meaningful architectural considerations for your indexing pipeline.
So check that out and see how it goes. From there, we should go to the bottom, and work our way up.
The index on disk is Lucene. Lucene is a segmented index. The index part is a core reason why you're using Elasticsearch in the first place: a dictionary of sorted terms can be searched in O(log N) time. That's super fast and scalable. The segment part is because inserting into an index is not particularly fast — depending on your implementation, it costs O(log N) or O(N log N) to maintain the sorting.
So Lucene's trick is to buffer those updates and append a new segment; essentially a collection of mini-indices. Searching some relatively small number of segments is still much faster than taking all the time to maintain a sorted index with every update. Over time Lucene takes care of merging these segments to keep them within some sensible size range, expunging deleted and overwritten docs in the process.
In Elasticsearch, every shard is a distinct Lucene index. If you have an index with a single shard, then there is very little benefit to having more than a single concurrent stream of bulk updates. There may be some benefit to concurrency on the application side, depending on the amount of time it takes for your indexing pipeline to collect and assemble each batch of documents. But on the Elasticsearch side, it's all just one set of buffers getting written out to one segment after another.
Sharding makes this a little more interesting.
One of Elasticsearch's strengths is the ability to partition the data of an index across multiple shards. This helps with availability, and it helps workloads scale beyond the resources of a single server.
Alas it's not quite so simple as to say that the concurrency should be equal, or proportional, to the number of primary shards that an index has. Although, as a rough heuristic, that's not a terrible one.
You see, internally, the first Elasticsearch node to handle the request is going to turn that Bulk request into a sequence of individual document update actions. Each document update is sent to the appropriate node that is hosting the shard that this document belongs to. Responses are collected by the bulk action so that it can send a summary of the bulk operation in its response to the client.
So at this point, depending on the document-shard routing, some shards may be busier than others during the course of processing an incoming bulk request. Is that likely to matter? My intuition says not really. It's possible, but it would be unusual.
In most tests and analysis I've seen, and in my experience over ~ten years with Lucene, the slow part of indexing is the transformation of the documents' values into the inverted index format. Parsing the text, analyzing it into terms, and so on, can be very complex and costly. So long as a bulk request has sufficient documents that are sufficiently well distributed across shards, the concurrency is not as meaningful as saturating the work done at the shard and segment level.
When tuning bulk requests, my advice is something like this.
Use HTTP keep-alive. This is not optional. (You are using TLS, right?)
Choose a batch size where each request is taking a modest amount of time. Somewhere around 1 second, probably not more than 10 seconds.
If you can get fancy, measure how much time each bulk request took, and dynamically grow and shrink your batch.
A durable queue unlocks a lot of capabilities. If can fetch and assemble documents and insert them into, say, Kafka, then that process can be run in parallel to saturate the database and parallelize any denormalization or preparation of documents. A different process then pulls from the queue and sends requests to the server, and with some light coordination you can test and tune different concurrencies at different stages. A queue also lets you pause your updates for various migrations and maintenance tasks when it helps to put the cluster into read-only mode for a time.
I've avoided replication throughout this answer because there's only one reason where I'd ever recommend tweaking replication. And that is when you are bulk creating an index that is not serving any production traffic. In that case, it can help save some resources through your server fleet to turn off all replication to the index, and enable replication after the index is essentially done being loaded with data.
To close, what if you crank up the concurrency anyway? What's the risk? Some workloads don't control the concurrency and there isn't the time or resources to put a queue in front of the search engine. In that case, Elasticsearch can avoid a fairly substantial amount of concurrency. It has fairly generous thread pools for handling concurrent document updates. If those thread pools are saturated, it will reject responses with a HTTP 429 error message and a clear message about queue depths being exceeded. Those can impact stability of the cluster, depending on available resources, and number of shards in the index. But those are all pretty noticeable issues.
Bottom line: no, 20 concurrent bulks with 10 documents each will probably not speed up performance relative to 1 bulk with 200 documents. If your bulk operations are fast, you should increase their size until they run for a second or two, or are problematic. Use keep-alive. If there is other app-side overhead, increase your concurrency to 2x or 3x and measure empirically. If indexing is mission critical, use a fast, durable queue.
There is no straight answer to this as it depends on lots of factors. Above the optimal bulk request size, performance no longer improves and may even drop off. The optimal size, however, is not a fixed number.
It depends entirely on your hardware, your document size and complexity, and your indexing and search load.
Try indexing typical documents in batches of increasing size. When performance starts to drop off, your batch size is too big.
Since you are doing it in batches of 200, the chances are high that it should be most optimal way to index. But again it will depend on the factors mentioned above.

Performance issues when pushing data at a constant rate to Elasticsearch on multiple indexes at the same time

We are experiencing some performance issues or anomalies on a elasticsearch specifically on a system we are currently building.
The requirements:
We need to capture data for multiple of our customers, who will query and report on them on a near real time basis. All the documents received are the same format with the same properties and are in a flat structure (all fields are of primary type and no nested objects). We want to keep each customer’s information separate from each other.
Frequency of data received and queried:
We receive data for each customer at a fluctuating rate of 200 to 700 documents per second – with the peak being in the middle of the day.
Queries will be mostly aggregations over around 12 million documents per customer – histogram/percentiles to show patterns over time and the occasional raw document retrieval to find out what happened a particular point in time. We are aiming to serve 50 to 100 customer at varying rates of documents inserted – the smallest one could be 20 docs/sec to the largest one peaking at 1000 docs/sec for some minutes.
How are we storing the data:
Each customer has one index per day. For example, if we have 5 customers, there will be a total of 35 indexes for the whole week. The reason we break it per day is because it is mostly the latest two that get queried with occasionally the remaining others. We also do it that way so we can delete older indexes independently of customers (some may want to keep 7 days, some 14 days’ worth of data)
How we are inserting:
We are sending data in batches of 10 to 2000 – every second. One document is around 900bytes raw.
Environment
AWS C3-Large – 3 nodes
All indexes are created with 10 shards with 2 replica for the test purposes
Both Elasticsearch 1.3.2 and 1.4.1
What we have noticed:
If I push data to one index only, Response time starts at 80 to 100ms for each batch inserted when the rate of insert is around 100 documents per second. I ramp it up and I can reach 1600 before the rate of insert goes to close to 1sec per batch and when I increase it to close to 1700, it will hit a wall at some point because of concurrent insertions and the time will spiral to 4 or 5 seconds. Saying that, if I reduce the rate of inserts, Elasticsearch recovers nicely. CPU usage increases as rate increases.
If I push to 2 indexes concurrently, I can reach a total of 1100 and CPU goes up to 93% around 900 documents per second.
If I push to 3 indexes concurrently, I can reach a total of 150 and CPU goes up to 95 to 97%. I tried it many times. The interesting thing is that response time is around 109ms at the time. I can increase the load to 900 and response time will still be around 400 to 600 but CPU stays up.
Question:
Looking at our requirements and findings above, is the design convenient for what’s asked? Are there any tests that I can do to find out more? Is there any setting that I need to check (and change)?
I've been hosting thousands of Elasticsearch clusters on AWS over at https://bonsai.io for the last few years, and have had many a capacity planning conversation that sound like this.
First off, it sounds to me like you have a pretty good cluster design and test rig going here. My first intuition here is that you are legitimately approaching the limits of your c3.large instances, and will want to bump up to a c3.xlarge (or bigger) fairly soon.
An index per tenant per day could be reasonable, if you have relatively few tenants. You may consider an index per day for all tenants, using filters to focus your searches on specific tenants. And unless there are obvious cost savings to discarding old data, then filters should suffice to enforce data retention windows as well.
The primary benefit of segmenting your indices per tenant would be to move your tenants between different Elasticsearch clusters. This could help if you have some tenants with wildly larger usage than others. Or to reduce the potential for Elasticsearch's cluster state management to be a single point of failure for all tenants.
A few other things to keep in mind that may help explain the performance variance you're seeing.
Most importantly here, indexing is incredibly CPU bottlenecked. This makes sense, because Elasticsearch and Lucene are fundamentally just really fancy string parsers, and you're sending piles of strings. (Piles are a legitimate unit of measurement here, right?) Your primary bottleneck is going to be the number and speed of your CPU cores.
In order to take the best advantage of your CPU resources while indexing, you should consider the number of primary shards you're using. I'd recommend starting with three primary shards to distribute the CPU load evenly across the three nodes in your cluster.
For production, you'll almost certainly end up on larger servers. The goal is for your total CPU load for your peak indexing requirements ends up under 50%, so you have some additional overhead for processing your searches. Aggregations are also fairly CPU hungry. The extra performance overhead is also helpful for gracefully handling any other unforeseen circumstances.
You mention pushing to multiple indices concurrently. I would avoid concurrency when bulk updating into Elasticsearch, in favor of batch updating with the Bulk API. You can bulk load documents for multiple indices with the cluster-level /_bulk endpoint. Let Elasticsearch manage the concurrency internally without adding to the overhead of parsing more HTTP connections.
That's just a quick introduction to the subject of performance benchmarking. The Elasticsearch docs have a good article on Hardware which may also help you plan your cluster size.

Gather statistics for a huge oracle table - worry of collapsing the server

I'm trying to improve the execution time of a view of an Oracle database, which takes ages to load, and it involves a table which has 15,352,595 records. I'm considering to gather statistics on it coz I suspect the poor performance is due to stale stats.
However, I'm worried that this will burden the server a lot and I'm not very confident that its harddisks (or whatever hardware components) can withstand heavy workloads without breaking.
Is that a real thing that should be worried?
You can gather statistics with low value of estimate percentage.

when I try to use Oracle Statistics my cost grows?

I have a big query on for tables and I want to optimize it.
The weird part is that when I get the execution plan without statistics it says something like 1.2M. however if I get statistics for one of the tables involved in the query, my cost lowers to 4k. But if I ask for statistics in the other tables the cost grows to 50k, so I am not sure what's happening.
Can anyone explain a reason why giving more statistics actually increases query cost?
The Cost Based Optimiser uses as much information as you can give it in order to calculate the cost of a plan. If you update (i.e. change) the statistics it uses, then obviously that will change the calculated cost of the plan.
It's not actually the gathering of stats that causes the cost to grow - it's how those stats have changed (whether up or down) that causes the calculated cost to change.
In the absence of statistics, Oracle may use heuristics, guesswork or a quick sample of the data (depending on the settings in your instance).
Generally, the better (more accurate or representative) the statistics, the more accurate the cost calculation.
The cost based optimizer has it's challenges. There are rounding errors that can have quite an impact on decisions that it makes. This is one of the reasons that SQL Plan Stability, introduced in 11g is so nice. Forget about 10g, if you can, or prepare for long debugging sessions.
At the first use, a plan is generated based on the current statistics and executed. If SQL is repeated, the SQL and the plan are stored in a baseline. In the maintenance window, the most expensive plans are re evaluated and in many cases, a better plan can be provided. This is possible because at runtime, the optimizer is limited in the time it is given to search for a plan. In the maintenance window, a lot more time can be spent to find the best plan.
In 11g the peeking is also fixed and a single SQL can now have multiple plans, based on the values of the bind variables.
The query cost is based on many factors, where IO is a very important factor.
How are your tables filled and where are the high water marks located? A table that is filled and emptied constantly can have it's high watermark far away....
There are lots of bugs in the optimizer, lots of options, controlled by hidden parameters. You could try to use them to tweek the behaviour. Upgrading to 11g might be a lot smarter as it solves lots of performance problems for many applications.

Does it make sense to optimize queries for less i/o pressure?

I have a read only database (product) that recides on its own Sql Server 2008.
I already optimized queries by looking at most expensive queries in activity monitor - report. I ordered the report by CPU-cost. I now have something like 50 queries/second and no query is longer than 300ms.
CPU-Time is ok (30%) and Memory is only used by 20% (out of 64GB).
There is one issue: disk time is at steady 100% (I looked at idle time performance counter and used ideras SQL diagnostic manager). I can see that the product db behaves different than my order db which is on a different machine and has smaller tables: If I look at a profiler trace I have queries in product-db that show a value in column "read" higher than 50.000. In my order DB these values are never higher than 1000. The queries in product-db use a lot of Common table expressions, work on large tables (some are around 5 Million entries).
I am not shure if I should invest time in optimizing queries for i/o performance or if I should just add a server. By otimizing for query duration I already added the missing indexes. Is optimizing for i/o something that is usually done?
In short, yes. Optimize for both CPU and IO.
Queries with high CPU tend to be doing unnecessary in-memory sorts, (sometimes inefficient) hash joins, or complex logic.
Queries with high IO (Page Reads) tend to be doing full table scans or working in other inefficient ways.
9 times out of 10, the same queries will be near the top of the list, but if you've worked on the high CPU and you still are unhappy with performance, then by all means, work on the high IO procs next.
There's always a next bottleneck.
they say.
Now that you've tuned CPU usage, it's only natural that I/O load emerges as dominant. Is your performance already acceptable? If yes stop, if no you have to estimate how many hours you will have to invest in further tuning and if buying another server or more hard disks might be cheaper.
Regarding the I/O tuning again, try to see what you can achieve with easy measures. Sometimes you can trade CPU for I/O and vice versa. Compression is an example for this. You would then tune that component that is your current bottlneck.
Before you seek to make the I/O faster try to reduce the I/O that is generated.
Look for obvious IO performance improvements for your query, but more importantly, look at how you can improve your IO performance at the server level.
If your other resources (CPU and memory) aren't overloaded, you probably don't need a new server. Consider adding an SSD for logs and temp files, and/or consider if you can affordably fit your whole DB onto an array of SSDs.
Of course, clearing out your disk IO bottleneck is likely to raise CPU usage, but if your performance is close to acceptable, this will probably improve things to the point that you can stop optimizing for now.
Unless you are using SSDs or a DB optimized SAN then IO is almost always the limit in database applications.
So yes, optimize to get rid of it as much as possible.
Table indexes are the first thing to do.
Then, add as much RAM as you possibly can, up to the complete size of your DB files.
Then partition your data tables (if that is a reasonable thing to do) so that any necessary table or index scans are done on only one or two table partitions.
Then I suppose you either buy bigger machines with even more RAM and/or buy SSDs or a SAN or a SAN with SSDs.
Alternatively you rebuild your entire database application to use something like NoSQL or database sharding, and implement all your relations, joins, constraints, etc in a middle interface layer.

Resources