OpenTSDB is super fast. KairosDB is known as a rewrite of OpenTSDB and is claimed to be even faster (see here). However, I did some tests with a pseudo-distributed cluster (1 master, 1 slave, locally) for OpenTSDB and a 1-node cluster for KairosDB on my VirtualBox VM (5 GB RAM, 3 cores). The insertion speed was around 100,000 records/sec for OpenTSDB and 30,000 records/sec for KairosDB. Did I configure something wrong with KairosDB, or is OpenTSDB actually faster?
I don't have measurements on OpenTSDB. We use KairosDB and it's quite fast.
What database did you use for KairosDB? H2 is for testing only and is desperately slow.
What interface did you use for pushing data? And if you used the REST API, how did you build your queries?
On a single bare-metal node we were above 50,000 samples per second using Telnet (limited by the acquisition agent), and about 3 times that speed for gzipped JSON using the REST API (this is for batch inserts of historical data, and the JSON is built to insert data with one array of datapoints per series, and with up to 10,000,000 samples per document).
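For reference, here is a minimal sketch of a batched insert against KairosDB's REST API (host, port, metric name and tags are placeholder assumptions, not what we actually ran):

import json
import time
import requests

KAIROS_URL = "http://localhost:8080/api/v1/datapoints"  # assumed default port

now_ms = int(time.time() * 1000)
batch = [
    {
        "name": "sensor.temperature",          # hypothetical metric
        "tags": {"host": "node1"},             # KairosDB requires at least one tag
        "datapoints": [[now_ms + i * 1000, 20.0 + i * 0.01] for i in range(10_000)],
    }
]

resp = requests.post(KAIROS_URL, data=json.dumps(batch),
                     headers={"Content-Type": "application/json"})
resp.raise_for_status()  # any 2xx status means the batch was accepted

Gzipping the body (as mentioned above) helps a lot for bulk historical loads; check your KairosDB version's documentation for how it expects compressed payloads.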
Maybe VirtualBox slows everything down too much (guest VMs have very poor performance).
On the other hand, the last time I spoke with OpenTSDB they were well under 100,000 points per second for insertion... so they may have improved performance since.
Related
A few days ago I was granted the very interesting project of recoding the data input of a large simulation framework.
Before I accepted that task, it used an ... interesting ... file reader. That sadly didn't work.
The task is easy to describe: get up to 2+ million floats per second from a database (originally planned as PostgreSQL, now looking into ElasticSearch or Cassandra) into a Java proxy. Remap, buffer and feed them to a high-performance data bus.
The second part was easy enough, and the first part was standard, right? Erm, sadly wrong.
As it turned out, I somehow smashed into a brick wall of "not more than 1 Mb/s"... which is only 5% of the transfer rate needed. Yes, yes, the problem could surely be killed with more hardware, but why introduce a built-in brake if there might be better solutions?
A bit of background on the data:
A unit consists of a uuid, a type and a timeseries of values (floats). One float per minute for a whole year -> 60*24*365 = 525,600 floats.
The consumer asks for 150k units at 15 time-steps per second -> 2.25 million floats per second.
We plan to pre-split the timeseries into day-batches, which seems a tad more manageable. Even at this rate that's about 9 kB per unit-day * 150k units = 1.35 GB of data for the buffer.
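Spelled out (just restating my own numbers above; the ~9 kB per unit and day is a rough ballpark, not an exact figure):

floats_per_unit_year = 60 * 24 * 365        # 525,600 one-per-minute samples
floats_per_second    = 150_000 * 15         # 2,250,000 floats/s to the consumer
samples_per_day      = 60 * 24              # 1,440 samples per unit per day-batch
bytes_per_unit_day   = 9_000                # ~9 kB ballpark per day-batch
buffer_bytes         = bytes_per_unit_day * 150_000   # roughly 1.35 GB for one day-batch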
So, I tried several different ways to store and retrieve the data.
In PostgreSQL the most promising approach so far is a table:
uuid uuid,
type text,
t1...t365 float4[]
PK is the combination of uuid and type.
Simple enough, I only need to SELECT tx FROM table.
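As a concrete sketch of that layout (psycopg2; the table name and connection string are placeholders, and only a few of the t1...t365 columns are shown):

import psycopg2

conn = psycopg2.connect("dbname=sim")    # assumed DSN
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS unit_timeseries (
            uuid uuid,
            type text,
            t1   float4[],   -- day-batch 1: 1440 floats
            t2   float4[],   -- day-batch 2
            -- ... and so on up to t365
            PRIMARY KEY (uuid, type)
        )
    """)
    # The access pattern from above: one whole day-batch for every unit.
    cur.execute("SELECT t1 FROM unit_timeseries")
    day_one = cur.fetchall()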
Info on my testing setup: as I can't yet test with a full dataset (even faking the data is painstakingly slow, at 2 sec per unit) I usually test with batches of 2k units.
Under these conditions the SELECT needs 12 seconds, which is roughly 5% of the speed I need and much slower than I assumed it would be. According to my tests that's NOT due to a transmission bottleneck (same result if I use a DB on the same machine). 1 Mb/s???
I haven't found any apparent bottleneck yet, so throwing more hardware at it might not even work at all.
Anyone knowledgeable about PostgreSQL have an idea what's slowing it down so much?
OK, so I tried another system. ElasticSearch might be faster, and might also solve some other problems, like a nasty cluster-access API I might otherwise have to add later on.
mapping: {
_index: timeseries_ID
_type: unit
uuid: text(keyword)
type: text(keyword)
day: integer
data: array[float]
}
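For comparison, this is roughly how that mapping could be expressed against a real index (a sketch assuming Elasticsearch 6.x, where mapping types still exist; the index name and doc type are placeholders):

import requests

index_url = "http://localhost:9200/timeseries_test"   # assumed local node

mapping = {
    "mappings": {
        "unit": {
            "properties": {
                "uuid": {"type": "keyword"},
                "type": {"type": "keyword"},
                "day":  {"type": "integer"},
                # Elasticsearch has no dedicated array type: a float field
                # simply accepts a JSON array of floats per document.
                "data": {"type": "float"},
            }
        }
    }
}

resp = requests.put(index_url, json=mapping)
resp.raise_for_status()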
To my surprise, this was even slower: only 100k floats per second, tops. Only 2% of what I need. Yes, yes, I know: add more nodes. But isn't ElasticSearch supposed to be a tad faster than Postgres?
Does anyone have an idea for an ElasticSearch mapping or any configs that might be more promising?
Now, on Monday I'll get to work on my tests with Cassandra. The planned table is simple:
day int PRIMARY KEY,
uuid text,
type text,
data array[float]
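For what it's worth, one possible CQL rendering of that plan (a sketch using the DataStax Python driver; the keyspace, key layout and replication settings are my assumptions, and CQL has no array type, so the values go into a list<float>):

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])     # assumed local node
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS sim
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")

session.execute("""
    CREATE TABLE IF NOT EXISTS sim.unit_days (
        day  int,
        uuid text,
        type text,
        data list<float>,                  -- CQL has no array type
        PRIMARY KEY ((day), uuid, type)    -- one partition per day, one row per unit and type
    )
""")

# Fetch one whole day-batch for all units:
rows = session.execute("SELECT uuid, type, data FROM sim.unit_days WHERE day = %s", [1])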
Is there a more sensible setup for this kind of data for Cassandra?
Or does anyone have a better idea altogether to solve this problem?
Clarifications
If I do understand what you want I can't see why you are using a database at all.
The reason behind using a DB is that the data comes from outside people with varying IT skills. So we need a central storage for the data, with an API and maybe a simple web frontend to check the integrity, clean it up and reformat it into our own internal system. A database seems to work better on that front than some proprietary file system, which could be more easily clogged and made unusable by inexperienced data contributors.
I'm not sure it's clear on what you're trying to do here or why you seem to be randomly trying different database servers.
It's not random. The Postgres attempt was a typical case of someone going: "oh, but we already use that one for our data, so why should I have to learn something new?"
The ElasticSearch approach is trying to leverage the distributed cluster and replication features. That way we can have a central permanent storage and just spin up a temporary ES cluster on our VMs with replicas of the needed data. ES would then handle all the data transport into the compute cluster.
Cassandra is a suggestion by my boss, as it's supposed to be much more scalable than Postgres. Plus it could comfort those who prefer a more SQL-like API.
But - can you show e.g. the query in PostgreSQL you are trying to run - is it multiple uuids but a short time-period or one uuid and a longer one?
Postgres: The simplest approach is to get all uuids and exactly one daybatch. So a simple:
SELECT t1 FROM table;
Where t1 is the column holding the data for day-batch one.
For testing with my (so far) limited fake data (roughly 2% of the full set) I sadly have to go with: SELECT t1, t2 ... t50 FROM table
Depending on testing, I might also go with splitting that one large select into some/many smaller ones. I'm thinking about going by a uuid-hash based split, with indexes set accordingly of course. It's all a question of balancing overhead and reliability. Nothing is final yet.
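If it comes to that, a uuid-hash slice could look roughly like this (a purely hypothetical sketch: psycopg2, the table name, the 16-way split and Postgres' hashtext() are all assumptions, not a settled design):

import psycopg2

N_SLICES = 16                       # assumed number of slices

def fetch_slice(slice_no, day_col="t1"):
    # day_col must be a trusted column name; it is interpolated, not bound.
    conn = psycopg2.connect("dbname=sim")          # assumed DSN
    with conn, conn.cursor() as cur:
        cur.execute(
            f"SELECT uuid, {day_col} FROM unit_timeseries "
            "WHERE abs(hashtext(uuid::text)) %% %s = %s",
            (N_SLICES, slice_no),
        )
        return cur.fetchall()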
Are we talking multiple consumers or just one?
At the start just one consumer; the plan is to have multiple instances of the simulation later, though much smaller ones. The 150k-unit one is the "single large consumer" assumption.
How often are the queries issued?
As needed; in the full-fledged approach that would be every 96 seconds, more often if I switch towards smaller queries.
Is this all happening on the same machine or is the db networked to this Java proxy and if so by what.
At the moment I'm testing on one or two machines only: preliminary tests are made solely on my workstation, with later tests moving the DB to a second machine. They communicate over a standard gigabit LAN.
In the full-fledged version the simulation(s) will run on VMs on a cluster, with the DB having a dedicated strong server to itself.
add the execution plan generated using explain (analyze, verbose)
I had to jury-rig something with a small batch (281 units) only:
Seq Scan on schema.fake_data  (cost=0.00..283.81 rows=281 width=18) (actual time=0.033..3.335 rows=281 loops=1)
  Output: t1
Planning time: 0.337 ms
Execution time: 1.493 ms
Executing the thing for real: 10 sec for a mere 1.6 Mb.
Now faking a 10k-unit query by selecting t1-t36 (I know, not even close to the real thing):
Seq Scan on opsim.fake_data  (cost=0.00..283.81 rows=281 width=681) (actual time=0.012..1.905 rows=281 loops=1)
  Output: *
Planning time: 0.836 ms
Execution time: 2.040 ms
Executing the thing for real: 2 min for ~60 Mb.
The problem is definitely not the planning or execution. Neither is it the network, as I get the same slow read on my local system. But heck, even a slow HDD manages at LEAST 30 Mb/s, and a cheap network 12.5 MB/s... I know, I know, those are gross figures, but how come I get < 1 Mb/s out of those DBs? Is there some bandwidth limit per connection? Aunt Google at least gave me no indication of anything like that.
I was playing around with the cassandra-stress tool on my own laptop (8 cores, 16 GB) with Cassandra 2.2.3 installed out of the box with its stock configuration. I was doing exactly what is described here:
http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema
And measuring its insert performance.
My observations were:
using the code from https://gist.github.com/tjake/fb166a659e8fe4c8d4a3 without any modifications I had ~7000 inserts/sec.
when modifying line 35 in the code above ("cluster: fixed(1000)") to "cluster: fixed(100)", i.e. configuring my test data distribution to have 100 clustering keys instead of 1000, the performance jumped up to ~11000 inserts/sec
when configuring it to have 5000 clustering keys per partition, the performance dropped to just 700 inserts/sec
The documentation says, however, that Cassandra can support up to 2 billion rows per partition. I don't need that many, but I still don't get how just 5000 records per partition can slow writes down 10 times - or am I missing something?
"Supported" is a little different from "performing best". You can have very wide partitions, but the rule of thumb is to try to keep them under 100 MB, for miscellaneous performance reasons. Some operations can be performed more efficiently when the entirety of the partition can be stored in memory.
As an example (an old one - this is a complete non-issue post-2.0, where everything is single-pass): in some versions compaction used a two-pass process when the partition size was >64 MB, which halved compaction throughput. It still worked with huge partitions; I've seen many multi-GB ones that worked just fine. But systems with huge partitions were difficult to work with operationally (managing compactions/repairs/GCs).
I would say target the 100 MB rule of thumb initially and test from there to find your own optimum. Things will always behave differently based on the use case; to get the most out of a node, the best you can do is run benchmarks as close as possible to what you're actually going to do (true of all systems). This seems like something you're already doing, so you're definitely on the right path.
I am using Apache Cassandra for storing around 100 million records. There is one single node with the following specifications:
RAM: 32 GB, HDD: 2 TB, Intel quad-core processor.
With Cassandra there is a read performance problem. For some queries it takes around 40 minutes to return the output. After searching for how to improve the read performance, I came across the following factors:
compaction strategy, compression techniques, key cache, increasing the heap space, turning off swap for Cassandra.
After doing these optimizations the performance remains the same. After more searching, I came across integrating Hadoop with Cassandra. Is that the correct way to do these queries in Cassandra, or are there other factors I am missing here?
Thanks.
It looks like your data model could be improved. 40 minutes is impossibly long. I download all the data from 6 million records (around 10 GB) within a few minutes, and I think that's only because I convert the data while downloading and storing it. Trivial selects should take milliseconds.
Did you build your data model based on the queries you need to run?
The business scenario requires:
50M key-value pairs, 2 KB each, 100 GB of memory in total.
About 40% of the key-value pairs change every second.
The Java application needs one Get() and one set() for each changed pair, which comes to 50M*40%*2 = 4M qps (queries per second).
We tested memcached - which shows very limited qps.
Our benchmarking is very similar to the results shown here:
http://xmemcached.googlecode.com/svn/trunk/benchmark/benchmark.html
Around 10,000 qps is the limit of one memcached server.
That means we would need 40 partitioned memcached servers in our business scenario, which seems very uneconomical and unrealistic.
In your experience, is the benchmarking accurate in terms of memcached's designed performance?
Any suggestions for tuning the memcached system (client or server)?
Or is there any other alternative in-memory store that can meet the requirement more economically?
Many thanks in advance!
If you look at the graphs in the benchmark you mentioned, you need to realize that in many of those instances the limit was the network and not memcached. For instance, if you have 2 KB values for all of your items, then your maximum throughput on a GigE network is about 65k ops/sec (1024*1024*128/2048 = 65536). Memcached can do a lot more operations per second than this. I have personally hit 200K ops/sec with (I think) 512 B values, and I have heard of others getting much higher throughput than I did. This all depends heavily on the network, though.
Also, memcached is barely doing anything at 10k ops/sec. My guess is you aren't taking advantage of concurrency in your benchmarks.
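To illustrate what "taking advantage of concurrency" means, here is a rough multi-threaded load sketch (pymemcache in Python rather than the Java xmemcached client from the benchmark; host, value size, thread count and duration are all assumptions):

import threading
import time
from pymemcache.client.base import Client

HOST = ("127.0.0.1", 11211)   # assumed local memcached
VALUE = b"x" * 2048           # ~2 KB values, as in the question
THREADS = 16
SECONDS = 10

counts = [0] * THREADS

def worker(i):
    client = Client(HOST)     # one client per thread; the base Client is not thread-safe
    key = "bench:%d" % i
    deadline = time.time() + SECONDS
    while time.time() < deadline:
        client.set(key, VALUE)
        client.get(key)
        counts[i] += 2        # one set + one get

threads = [threading.Thread(target=worker, args=(i,)) for i in range(THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("total ops/sec ~ %d" % (sum(counts) / SECONDS))

With a single synchronous client you mostly measure round-trip latency; with enough concurrent connections the network or the client machine usually becomes the limit long before memcached does.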
I have a server with 4 GB RAM and 2x quad-core CPUs. When I start performing massive writes in Cassandra everything works fine initially, but after a couple of hours at 10K inserts per second the database grows to 25+ GB and performance drops to 500 inserts per second!
I found out this is because compaction is very slow, but I don't understand why. I set 8 concurrent compaction threads, but Cassandra doesn't use 8 threads; only 2 cores are loaded.
Appreciate any help.
We've seen similar problems with Cassandra out of the box, see:
http://www.acunu.com/blogs/richard-low/cassandra-under-heavy-write-load-part-ii/
One solution to this sort of performance degradation (but by no means the only one) is to consider a different storage engine, like Castle, used in the above blog post - it's open source (GPL v2), has much better performance and degrades much more gracefully. The code is here (I've just pushed up a branch for Cassandra 0.8 support):
https://bitbucket.org/acunu/fs.hg
And instructions on how to get started are here:
http://support.acunu.com/entries/20216797-castle-build-instructions
(Full disclosure: I work for Acunu, so may be a little biased ;-)