I am trying to improve the performance of my database, whose simplified set-up is the following:
One table with 3 columns (id_device, timestamp, data) and a composite B-tree index on (id_device, timestamp)
1k devices sending data every minute
Inserts are quite fast, since PostgreSQL simply writes the rows in the order they are received. However, fetching many rows with consecutive timestamps for a given device is not so fast. As I understand it, because of the way the data is collected, there is never more than one row of a given device on each page of the table. Therefore, if I want to get 10k rows with consecutive timestamps for a given device, PostgreSQL has to fetch 10k pages from disk. Moreover, since this operation can be done on any of the 1k devices, those pages are not going to stay in RAM.
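To make the page arithmetic above concrete, here is a toy Python model (not PostgreSQL internals). Rows are appended in arrival order into fixed-size heap pages, with an assumed 50 rows per page, and the code counts how many distinct pages hold 100 consecutive rows of one device. The "clustered" layout sorts the same rows by (id_device, timestamp), which is roughly what CLUSTER produces. The function names and sizes are illustrative assumptions.

```python
# Toy model of heap pages: rows are appended in arrival order and each page
# holds a fixed number of rows. ROWS_PER_PAGE is an assumed value.
ROWS_PER_PAGE = 50


def pages_touched(row_order, device, first_minute, n_rows):
    """Count distinct pages holding `n_rows` consecutive rows of one device."""
    wanted = {(device, m) for m in range(first_minute, first_minute + n_rows)}
    pages = set()
    for position, row in enumerate(row_order):
        if row in wanted:
            pages.add(position // ROWS_PER_PAGE)
    return len(pages)


def arrival_order(n_devices, n_minutes):
    # Every minute each device sends one row, so one device's rows are interleaved.
    return [(d, m) for m in range(n_minutes) for d in range(n_devices)]


def clustered_order(n_devices, n_minutes):
    # Same rows, physically sorted by (id_device, timestamp).
    return sorted(arrival_order(n_devices, n_minutes))


if __name__ == "__main__":
    for name, order in [("arrival", arrival_order(1000, 200)),
                        ("clustered", clustered_order(1000, 200))]:
        print(name, pages_touched(order, device=42, first_minute=0, n_rows=100))
```

With these assumed sizes, the arrival order touches 100 pages for 100 rows (one page per row, as described above), while the clustered order touches only 2.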
I have tried to CLUSTER the table, and it does solve the performance issue, but the operation is incredibly long (~1 day) and locks the entire table, so I discarded this solution.
I have read about partitioning, but that would mean a lot of scripting if I need to add a new table every time a new device is connected, and it seems somewhat bug-prone to me.
I am fairly confident that this set-up is not particularly original, so is there any advice I could use?
I'm guessing your index also has low selectivity, because you're indexing id_device first (which has only 1000 distinct values) rather than timestamp.
It depends on what you do with the data you fetch, but maybe the solution could be to batch the operation: fetch the data for a predetermined period and process the data for all 1000 devices in one go.
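A minimal sketch of the batching idea, assuming the rows for one time window have already been fetched in a single query over that window (in arrival order, so each page is read once for all devices). `process_window` and the `handle_device` callback are hypothetical names used only for illustration.

```python
from collections import defaultdict


def process_window(rows, window_start, window_end, handle_device):
    """Process one time window for all devices in a single pass.

    `rows` stands in for the result of one range query over the window,
    as (id_device, timestamp, data) tuples in arrival order.
    `handle_device` is a hypothetical per-device processing callback.
    """
    per_device = defaultdict(list)
    for id_device, ts, data in rows:
        if window_start <= ts < window_end:
            per_device[id_device].append((ts, data))
    results = {}
    for id_device, samples in per_device.items():
        samples.sort()  # restore timestamp order within each device
        results[id_device] = handle_device(samples)
    return results
```

The design trade-off is that one sequential pass over a time range replaces 1000 separate per-device index scans, which suits jobs that need every device anyway.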


What is the actual use of partitions in ClickHouse?

It is said that partitions make it easier to drop or move data, so that only a limited amount of data is affected. Various blogs suggest using the month as the partitioning key (toYYYYMM(date)). In many places it is also suggested not to have more than a couple of partitions. I am using ClickHouse to store time-series data that does not undergo frequent deletions. What would be the advisable partitioning key for high-volume time-series data? Does there have to be one if I do not want to perform deletes frequently?
In production I noticed that startup was very slow, and I suspected that having too many partitions was the culprit. So I decided to test this by inserting time-series data fresh into a table (which created >2300 partitions for ~20 billion rows), selecting it from another table (so that ClickHouse had no opportunity to optimize the table). I then immediately dropped the original table and tried a restart. It finished quickly, in about 10 s. This is the complete opposite of what I observed in production with 800 GB+ of data (spread over many databases and tables, as opposed to my test node, which had only one table).
Edit: As was pointed out, I mixed up parts and partitions. Regarding ClickHouse startup time, I'd better post a separate question.
This is a pretty common question, and for disclosure, I work at ClickHouse.
Partitions are particularly useful when you have timeseries data, as you noted. When determining the number of partitions, we often recommend a few guidelines:
The use of partitioning should be determined by a couple of questions as to why you're using them:
are you generally going to query only a single partition? For example, if your queries are often for results within a one day or one month period, it could make sense to partition at that period duration
are you wanting to "tier" or set a TTL on your data such that once the partition reaches an age of X (e.g., 91 days old, 7 months old), you want to do something special with it? (e.g., TTL to lower cost tier storage, backup and delete from ClickHouse, etc.)
We often recommend keeping the number of partitions below about 100. Up to 1000 partitions can work, but it is suboptimal and will have some performance impact at the filesystem level and on index/memory sizes, which can affect startup time and insert/query time.
I hope these guidelines help with your question. It is probably most common to partition by day or by month, but since ClickHouse can manage large tables quite easily, you might want to move towards fewer partitions if possible; partitioning by month is probably the most common.
I didn't fully understand your test results so please feel free to expand. 2300 partitions sounds like too many but might work, just with some performance implications. Reducing your number of partitions (and therefore increasing the partition size) seems like a good recommendation.
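The partition count is simple arithmetic over the retention period. A short Python sketch, assuming one partition per distinct key value and a six-year span chosen only for illustration. The helpers mimic ClickHouse's toYYYYMM/toYYYYMMDD on Python dates; they are not ClickHouse code.

```python
from datetime import date, timedelta


def partition_count(start, end, key):
    """Count distinct partition keys produced by one row per day in [start, end)."""
    keys = set()
    day = start
    while day < end:
        keys.add(key(day))
        day += timedelta(days=1)
    return len(keys)


def to_yyyymm(d):
    # Same value as ClickHouse toYYYYMM for this date, e.g. 202201.
    return d.year * 100 + d.month


def to_yyyymmdd(d):
    # Same value as ClickHouse toYYYYMMDD for this date, e.g. 20220131.
    return d.year * 10000 + d.month * 100 + d.day


if __name__ == "__main__":
    start, end = date(2017, 1, 1), date(2023, 1, 1)  # assumed six-year retention
    print("monthly:", partition_count(start, end, to_yyyymm))
    print("daily:  ", partition_count(start, end, to_yyyymmdd))
```

For 2017-01-01 to 2023-01-01 this gives 72 monthly partitions versus 2191 daily ones. That is why a daily key on multi-year data quickly lands in the same range as the >2300 partitions mentioned in the question, far above the ~100 guideline.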

extremely high SSD write rate with multiple concurrent writers

I'm using QuestDB as backend for storing collected data using the same script for different data sources.
My problem is the extremely high disk (SSD) usage. Over 4 days it has written an average of 335 MB per second.
What am I doing wrong?
I insert the data using the ILP (InfluxDB Line Protocol) interface.
I don't know how much data you are ingesting, so I'm not sure whether 335 MB per second is a lot or not. But since you are surprised by it, I am going to assume your throughput is lower than that. It might be that your data is out of order, especially if you are ingesting from multiple data sources.
QuestDB always keeps each table's data in increasing order of the designated timestamp. If data arrives out of order, the whole partition needs to be rewritten. This can lead to write amplification, where you see your data being rewritten very often.
Until a few days ago, fine-tuning this required changing the default config, but since version 6.6.1 it is adjusted dynamically.
You may want to try version 6.6.1. Alternatively, if data from different sources arrives out of order (relative to each other), you might want to create separate tables for the different sources, so that data is always in order within each table.
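The write-amplification effect can be illustrated with a deliberately simplified model. It follows the rule stated above (an out-of-order row forces the whole partition to be rewritten) and is not QuestDB's actual out-of-order algorithm, which is more elaborate. The source count, row count and clock lag are assumptions chosen for illustration.

```python
import bisect


def rows_written(timestamps):
    """Rows physically written when appending timestamps one by one to a
    single partition that must stay sorted.

    Simplified model (assumption): an in-order row costs one write; an
    out-of-order row forces the whole partition to be rewritten.
    """
    partition = []
    written = 0
    for ts in timestamps:
        if not partition or ts >= partition[-1]:
            partition.append(ts)
            written += 1
        else:
            bisect.insort(partition, ts)
            written += len(partition)
    return written


def interleaved_sources(n_sources, n_rows, lag):
    # Each source sends increasing timestamps, but source i lags by i * lag,
    # so rows from different sources arrive out of order relative to each other.
    return [t - i * lag for t in range(n_rows) for i in range(n_sources)]


if __name__ == "__main__":
    shared = rows_written(interleaved_sources(14, 100, 5))
    separate = sum(rows_written([t - i * 5 for t in range(100)]) for i in range(14))
    print("one shared table:", shared, "| one table per source:", separate)
```

With 14 sources, 100 rows each and a 5-unit lag, the shared table performs 911,400 row writes to store 1,400 rows, while 14 separate tables write exactly 1,400. That matches the direction of the fix reported below.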
I have been experimenting a lot, and it seems that you're absolutely right. I was ingesting 14 different clients into a single table. After splitting this into 14 tables, one per client, the problem disappeared.
Another advantage is that I need one symbol column fewer, as I no longer have to distinguish the rows by client.
By the way - thank you and your team for this marvellous tool you gave us! It makes my work so much easier!!

Cassandra lookup query is quite slow after deleting a large batch of data

Currently, I have a Cassandra column family with large rows of data, say more than 100,000. Now I'd like to remove all the data in this column family, and this is where the problem came up:
After all the data is removed, when I execute a lookup query on this column family, Cassandra takes tens of seconds to return an empty result. The time cost also increases linearly with the size of the original data.
It is caused by the tombstone mechanism used when deleting data from Cassandra. The lookup speed doesn't return to normal until the next GC runs. See Cassandra Distributed Deletes.
Because such queries are used frequently in my system, I cannot tolerate latencies of up to a few seconds.
Could you suggest a solution to this problem?
This sounds like a very bad way to use a database: populate it, empty it, repeat. One way to solve your problem is to use a different CF name each time: when you empty the data and start repopulating it, create a new column family, use that, and just drop the old one. However, this is hacky.
I'd suggest using compaction (which gets rid of all the tombstones it can detect) to solve your problem. It is CPU-intensive, but it's better than waiting tens of seconds for queries to respond. You can make the task less intensive on your machine by specifying the keyspace (ks) and column family (cf) you want to compact:
./nodetool compact <ks_name> <cf_name>
Ritchard's point is a good one: gc_grace_seconds is set to 10 days by default, so you will probably have to tweak it to allow compaction to get rid of the tombstones.
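The interplay between tombstones, reads and gc_grace_seconds can be shown with a toy store. This is a simplified single-map model written for illustration (the class name is made up), not Cassandra's storage engine. Deletes write markers, reads still have to walk over them, and compaction may purge only markers older than the grace period.

```python
class TombstoneStore:
    """Toy store where deletes write tombstones instead of removing data."""

    def __init__(self, gc_grace_seconds):
        self.gc_grace_seconds = gc_grace_seconds
        self.entries = {}  # key -> (value, write_time); value None means tombstone

    def put(self, key, value, now):
        self.entries[key] = (value, now)

    def delete(self, key, now):
        self.entries[key] = (None, now)  # write a tombstone, keep the entry

    def scan(self):
        """Return live rows and the number of entries the read had to examine."""
        examined = 0
        live = []
        for key in sorted(self.entries):
            examined += 1
            value, _ = self.entries[key]
            if value is not None:
                live.append((key, value))
        return live, examined

    def compact(self, now):
        # Keep live values and tombstones still inside the grace period;
        # only tombstones older than gc_grace_seconds are purged.
        self.entries = {
            k: (v, t) for k, (v, t) in self.entries.items()
            if v is not None or now - t < self.gc_grace_seconds
        }
```

In this model an "empty" table still costs a full walk over every tombstone until a compaction runs after the grace period. That is why lowering gc_grace_seconds and then compacting makes the lookups fast again.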
If your column family is frequently modified (read, then update, then read the update again...), you should use the leveled compaction strategy.
To have deleted columns removed sooner, lower the gc_grace_seconds property of your column family.

SQL Server - Merging large tables without locking the data

I have a very large set of data (~3 million records) that needs to be merged with updates and new records on a daily schedule. I have a stored procedure that breaks the record set into 1000-record chunks and uses the MERGE command with temp tables, in an attempt to avoid locking the live table while the data is updating. The problem is that it doesn't really help. The table still "locks up", and our website that uses the data receives timeouts when attempting to access it. I even tried splitting it into 100-record chunks, and tried a WAITFOR DELAY '00:00:05' between chunks to see if pausing would help. It's still rather sluggish.
I'm looking for any suggestions, best practices, or examples on how to merge large sets of data without locking the tables.
Change your front end to use NOLOCK or READ UNCOMMITTED when doing the selects.
You can't use NOLOCK with MERGE, INSERT, or UPDATE, as the records must be locked in order to perform the update. However, you can use NOLOCK on the SELECTs.
Note that you should use this with caution. If dirty reads are okay, then go ahead. However, if the reads require the updated data then you need to go down a different path and figure out exactly why merging 3M records is causing an issue.
I'd be willing to bet that most of the time is spent reading data from disk during the MERGE command and/or working around low-memory situations. You might be better off simply adding more RAM to your database server.
Ideally, you would have enough RAM to pull the whole database into memory as needed. For example, if you have a 4 GB database, make sure you have 8 GB of RAM (in an x64 server, of course).
I'm afraid my experience is quite the opposite. We were performing updates and insertions where the source table had only a fraction of the number of rows of the target table, which was in the millions.
When we combined the source table records across the entire operational window and then performed the MERGE just once, we saw a 500% increase in performance. My explanation is that you pay for the up-front analysis of the MERGE command just once instead of over and over again in a tight loop.
Furthermore, I am certain that merging 1.6 million rows (source) into 7 million rows (target), as opposed to 400 rows into 7 million rows over 4000 distinct operations (in our case), leverages the capabilities of the SQL Server engine much better. Again, a fair amount of the work is in the analysis of the two data sets, and this is done only once.
Another question I have to ask as well is whether you are aware that the MERGE command performs much better with indexes on both the source and target tables. I would like to refer you to the following link:
From personal experience, the main problem with MERGE is that, since it takes page locks, it precludes any concurrency in the INSERTs directed at a table. So if you go down this road, it is fundamental that you batch all updates that will hit a table in a single writer.
For example: we had a table on which an INSERT took a crazy 0.2 seconds per entry, most of this time seemingly wasted on transaction latching, so we switched it over to using MERGE. Some quick tests showed that it allowed us to insert 256 entries in 0.4 seconds, or even 512 in 0.5 seconds. We tested this with load generators and all seemed fine, until it hit production and everything blocked to hell on the page locks, resulting in a much lower total throughput than with the individual INSERTs.
The solution was not only to batch the entries from a single producer into one MERGE operation, but also to batch the batches from all producers going to an individual DB into a single MERGE operation, through an additional level of queue. (Previously we also used a single connection per DB, but with MARS interleaving all the producers' calls to the stored procedure doing the actual MERGE transaction.) This way we were able to handle many thousands of INSERTs per second without problems.
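The "batch the batches through one queue" pattern can be sketched with Python threads. Producers only enqueue work, and a single writer drains whatever is waiting into one batch, so only one writer ever touches the table. `write_batch` is a hypothetical callback that would run one MERGE per batch. The batch limit and producer counts are assumptions.

```python
import queue
import threading


def single_writer(q, write_batch, max_batch=512):
    """Drain the shared queue and hand whole batches to one writer.

    A `None` item is a shutdown sentinel: pending items are flushed first.
    """
    while True:
        item = q.get()  # block until there is at least one item
        if item is None:
            return
        batch = [item]
        # Grab whatever else is already waiting, up to max_batch items.
        while len(batch) < max_batch:
            try:
                nxt = q.get_nowait()
            except queue.Empty:
                break
            if nxt is None:
                write_batch(batch)
                return
            batch.append(nxt)
        write_batch(batch)


def run(n_producers=8, items_per_producer=1000):
    q = queue.Queue()
    batches = []  # stands in for the database; only the writer thread appends
    writer = threading.Thread(target=single_writer, args=(q, batches.append))
    writer.start()

    def producer(pid):
        for i in range(items_per_producer):
            q.put((pid, i))

    producers = [threading.Thread(target=producer, args=(p,)) for p in range(n_producers)]
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    q.put(None)  # all producers are done: flush and stop the writer
    writer.join()
    return batches
```

Batch sizes adapt to load. Under light load each batch is small, and under heavy load items pile up and get merged together, which is the behaviour the answer relied on.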
Having the NOLOCK hints on all of your front-end reads is an absolute must, always.

DB Index speed vs caching

We have about 10K rows in a table. We want to have a form where we have a select drop down that contains distinct values of a given column in this table. We have an index on the column in question.
To increase performance, I created a little cache table that contains the distinct values, so that we don't need to run select distinct field from table against 10K rows. Surprisingly, it seems that select * from cachetable (10 rows) is no faster than the select distinct against 10K rows. Why is this? Is the index doing all the work? At what number of rows in our main table will querying the cache table start to improve performance?
For a DB, 10K rows is nothing. You're not seeing much difference because the actual calculation time is minimal, with most of it consumed by other, constant, overhead.
It's difficult to predict when you'd start noticing a difference, but it would probably be at around a million rows.
If you've already set up caching and it's not detrimental, you may as well leave it in.
10k rows is not much... start caring when you reach 500k ~ 1 million rows.
Indexes do a great job, especially if you just have 10 different values for that index.
This depends on numerous factors - the amount of memory your DB has, the size of the rows in the table, use of a parameterised query and so forth, but generally 10K is not a lot of rows and particularly if the table is well indexed then it's not going to cause any modern RDBMS any sweat at all.
As a rule of thumb I would generally only start paying close attention to performance issues on a table when it passes the 100K rows mark, and 500K doesn't usually cause much of a problem if indexed correctly and accessed by such. Performance usually tends to fall off catastrophically on large tables - you may be fine on 500K rows but crawling on 600K - but you have a long way to go before you are at all likely to hit such problems.
Is the index doing all the work?
You can tell how the query is being executed by viewing the execution plan.
For example, try this:
explain plan for select distinct field from table;
select * from table(dbms_xplan.display);
I notice that you didn't include an ORDER BY on that. If you do not include ORDER BY, the order of the result set may be arbitrary, particularly if Oracle uses the HASH algorithm to build the distinct list. You ought to check that.
So I'd look at the execution plans for the original query that you think is using an index, and at the one based on the cache table. Maybe post them and we can comment on what's really going on.
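The Oracle commands above have a close SQLite analogue, EXPLAIN QUERY PLAN, which is handy for a quick local experiment. This sketch uses SQLite, not Oracle, and the table and index names are made up. It compares the plan before and after indexing the column. Once the index exists, the plan typically reports a scan of a covering index, meaning the index alone answers the DISTINCT and the table is never read.

```python
import sqlite3


def distinct_plan(conn, table, column):
    """Return the detail lines of SQLite's plan for SELECT DISTINCT on one column.

    Names are interpolated directly, so only pass trusted identifiers.
    """
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT DISTINCT {column} FROM {table}"
    ).fetchall()
    return [row[-1] for row in rows]  # last field is the human-readable detail


def build_demo_table(conn, n_rows=10_000, n_values=10):
    # Hypothetical table mirroring the question: 10K rows, 10 distinct values.
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, field TEXT, payload TEXT)")
    conn.executemany(
        "INSERT INTO t (field, payload) VALUES (?, ?)",
        ((f"value{i % n_values}", "x" * 100) for i in range(n_rows)),
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_demo_table(conn)
    print("without index:", distinct_plan(conn, "t", "field"))
    conn.execute("CREATE INDEX idx_t_field ON t (field)")
    print("with index:   ", distinct_plan(conn, "t", "field"))
```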
Incidentally, the cache table would usually be implemented as a materialised view, particularly if the master table is generally pretty static.
Serious premature optimization. Just let the database do its job, maybe with some tweaking to the configuration (especially if it's MySQL, which has several cache types and settings).
Your query in 10K rows most probably uses HASH SORT UNIQUE.
As 10K rows most probably fit into db_buffers and hash_area_size, all operations are performed in memory, and you won't notice any difference.
But if the query is used as part of a more complex query, or its data gets swapped out by other data, you may need disk I/O to access the data, which will slow your query down.
Run your query in a loop in several sessions (as many sessions as there will be users connected), and see how it performs in that case.
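A sketch of that test using SQLite and Python threads instead of Oracle sessions. The database file, table `t`, and the default of 30 sessions are illustrative assumptions. Each thread opens its own connection and records per-query latency, so you can compare median and 95th-percentile timings as you raise the number of sessions. Absolute SQLite numbers say nothing about Oracle; only the method carries over.

```python
import os
import sqlite3
import tempfile
import threading
import time


def make_db(path, n_rows=10_000):
    # Hypothetical demo table standing in for the real one.
    conn = sqlite3.connect(path)
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, field TEXT)")
        conn.executemany("INSERT INTO t (field) VALUES (?)",
                         ((f"value{i % 10}",) for i in range(n_rows)))
    conn.close()


def session(db_path, sql, iterations, timings):
    conn = sqlite3.connect(db_path)  # each simulated user has its own connection
    try:
        for _ in range(iterations):
            start = time.perf_counter()
            conn.execute(sql).fetchall()
            timings.append(time.perf_counter() - start)  # list.append is atomic in CPython
    finally:
        conn.close()


def load_test(db_path, sql, sessions=30, iterations=20):
    timings = []
    threads = [threading.Thread(target=session, args=(db_path, sql, iterations, timings))
               for _ in range(sessions)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    timings.sort()
    n = len(timings)
    return {"queries": n,
            "median_s": timings[n // 2],
            "p95_s": timings[min(n - 1, int(n * 0.95))]}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        make_db(path)
        print(load_test(path, "SELECT DISTINCT field FROM t"))
```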
For future plans and for scalability, you may want to look into an indexing service that uses pure memory or something faster than the TCP DB round-trip. A lot of people (including myself) use Lucene to achieve this by normalizing the data into flat files.
Lucene has a built-in RAM-based directory (RAMDirectory), which can build the index entirely in memory, removing the dependency on the file system and greatly increasing speed.
Lately, I've architected systems that have a single RAM-resident index wrapped by a web service. My Ajax-like drop-downs then query that web service for high availability and high speed: no DB layer, no file system, just pure memory and, if remote, TCP packet speed.
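The idea of answering drop-down lookups from memory can be sketched without Lucene. This plain-Python stand-in is not Lucene's API, and the class name is made up. It keeps the values sorted and uses binary search to find entries with a given prefix, so each lookup costs O(log n + k) with no database round-trip.

```python
import bisect


class PrefixIndex:
    """Sorted in-memory index answering case-insensitive prefix lookups."""

    def __init__(self, values):
        # (lower-cased key, original value) pairs, sorted for binary search.
        self._entries = sorted((v.lower(), v) for v in set(values))

    def lookup(self, prefix, limit=10):
        prefix = prefix.lower()
        # Jump to the first entry whose key is >= prefix; matches follow contiguously.
        i = bisect.bisect_left(self._entries, (prefix,))
        matches = []
        while i < len(self._entries) and len(matches) < limit:
            key, original = self._entries[i]
            if not key.startswith(prefix):
                break
            matches.append(original)
            i += 1
        return matches


if __name__ == "__main__":
    idx = PrefixIndex(["Paris", "Parma", "Berlin", "paramaribo", "Rome"])
    print(idx.lookup("par"))
```

Here `idx.lookup("par")` returns `['paramaribo', 'Paris', 'Parma']`. Wrapping such an object in a web service, as described above, is a separate deployment choice.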
If you have an index on the column, then all the values are in the index and the DBMS never has to look in the table. It just reads the index, which contains only 10 distinct values. If this is mostly read-only data, then cache it in memory. Caching helps scalability a lot by relieving the database of work. A query that is quick on a database with no users might perform poorly if 30 queries are running at the same time.
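For mostly read-only data, an in-process cache with an expiry is often enough. A minimal sketch, assuming a hypothetical `loader` function that runs the SELECT DISTINCT query. The TTL is an assumption you would tune to how stale the drop-down may get. As written it is not thread-safe; a real web app might guard `get` with a lock.

```python
import time


class TTLCache:
    """Keep the result of `loader` in memory and reload it after `ttl` seconds."""

    def __init__(self, loader, ttl, clock=time.monotonic):
        self._loader = loader      # hypothetical: runs SELECT DISTINCT field FROM table
        self._ttl = ttl
        self._clock = clock        # injectable so tests can control time
        self._value = None
        self._expires = float("-inf")
        self.loads = 0             # how many times the database was actually queried

    def get(self):
        now = self._clock()
        if now >= self._expires:
            self._value = self._loader()
            self._expires = now + self._ttl
            self.loads += 1
        return self._value
```

With a 60-second TTL, 30 concurrent page views within a minute cost one database query instead of 30, which is the load relief described above.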
