I'm testing Cassandra performance under simultaneous read and write operations.
Latency for READ operations is important for me; WRITEs are usually done once per day, so I'm investigating whether the WRITE operations would affect READ latency.
I have a table of 10,000 records which I READ from the application at 50 RPS. For example, the latency is 10 ms; REQUEST_SERIAL_CONSISTENCY and REQUEST_CONSISTENCY are QUORUM for READ operations.
Query for READs is: SELECT * from table where id = X
Simultaneously I run another application which inserts the same records into the table (with the same ids, so in fact these are UPDATE operations); REQUEST_SERIAL_CONSISTENCY and REQUEST_CONSISTENCY are ALL for WRITE operations. In several threads, I load my Cassandra cluster with batches of INSERT queries, 60 queries per batch.
Query for WRITEs is: INSERT INTO table (id, ...) VALUES (X, ...);
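(For concreteness, the two workloads look roughly like the following sketch with the DataStax Python driver; the keyspace, table and column names are placeholders, not my actual schema.)
# Rough sketch of the read and write workloads described above; the keyspace,
# table and column names are placeholders, not the real schema.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.query import BatchStatement, BatchType

read_profile = ExecutionProfile(consistency_level=ConsistencyLevel.QUORUM)
cluster = Cluster(["127.0.0.1"],
                  execution_profiles={EXEC_PROFILE_DEFAULT: read_profile})
session = cluster.connect("ks")

# READ side: point lookup by partition key at QUORUM, 50 of these per second
select = session.prepare("SELECT * FROM table1 WHERE id = ?")
row = session.execute(select, (42,)).one()

# WRITE side: 60 upserts packed into one unlogged batch at consistency ALL
insert = session.prepare("INSERT INTO table1 (id, payload) VALUES (?, ?)")
batch = BatchStatement(batch_type=BatchType.UNLOGGED,
                       consistency_level=ConsistencyLevel.ALL)
for i in range(60):
    batch.add(insert, (i, "value"))
session.execute(batch)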
What I expected was an increase in READ latency when WRITEs are enabled. But the latency is stable in general, apart from occasional random spikes.
Why hasn't the latency changed? Was it wrong to expect a uniform latency increase?
What about these spikes, could they be connected with memtable flushes to disk? I can see them in the statistics:
nodetool tpstats
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MemtableFlushWriter               0         0           4272         0                 0
Info of my cluster:
cqlsh> SELECT * FROM system_schema.keyspaces;
keyspace_name | durable_writes | replication
--------------------+----------------+-------------------------------------------------------------------------------------
system_auth | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'}
system_schema | True | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
system_distributed | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}
system | True | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
search | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'}
system_traces | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '2'}
With 10,000 records, I would not expect much change in latency. Latency can take a hit due to a lot of factors: Cassandra could be doing I/O for compactions or repairs, and that could hit your latency even when no writes are being done. So I assume the random spikes you are seeing are because of compactions running in the background. One more thing: when you say you are inserting records in batches of 60, please don't use Cassandra batches for inserting unrelated records, as they will be slow.
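If the writer application is under your control, a batch-free version is straightforward. Here is a minimal sketch with the DataStax Python driver, assuming the same hypothetical keyspace and table as above (the question does not show the actual schema or client):
# Minimal sketch of replacing the 60-statement batch with concurrent single
# inserts; the keyspace, table and column names are assumptions.
from cassandra.cluster import Cluster
from cassandra.concurrent import execute_concurrent_with_args

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("ks")
insert = session.prepare("INSERT INTO table1 (id, payload) VALUES (?, ?)")

params = [(i, "value") for i in range(60)]
# Each insert is routed directly to the replicas owning its partition key,
# instead of funnelling all 60 statements through one coordinator as a batch would.
execute_concurrent_with_args(session, insert, params, concurrency=50)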
I have a Spark job that writes to an S3 bucket, with an Athena table on top of that location.
The table is partitioned. Spark was writing a single 1 GB file per partition. We experimented with the maxRecordsPerFile option so that only 500 MB of data is written per file; in that case we ended up with two files of 500 MB each.
This saved 15 minutes of runtime on the EMR cluster.
However, there was a problem with Athena: Athena query CPU time started getting worse with the new file size limit.
I compared the same data with the same query before and after the change, and this is what I found:
Partition columns = source_system, execution_date, year_month_day
Query we tried:
select *
from dw.table
where source_system = 'SS1'
and year_month_day = '2022-09-14'
and product_vendor = 'PV1'
and execution_date = '2022-09-14'
and product_vendor_commission_amount is null
and order_confirmed_date is not null
and filter = 1
order by product_id
limit 100;
Execution time:
Before: 6.79s
After: 11.102s
Explain analyze showed that the new structure had to scan more data.
Before: CPU: 13.38s, Input: 2619584 rows (75.06MB), Data Scanned: 355.04MB; per task: std.dev.: 77434.54, Output: 18 rows (67.88kB)
After: CPU: 20.23s, Input: 2619586 rows (74.87MB), Data Scanned: 631.62MB; per task: std.dev.: 193849.09, Output: 18 rows (67.76kB)
Can you please guide me on why this takes almost double the time? What are the things to look out for? Is there a sweet spot on file size that would be optimal for the Spark and Athena combination?
One hypothesis is that pushdown filters are more effective with the single file strategy.
From AWS Big Data Blog's post titled Top 10 Performance Tuning Tips for Amazon Athena:
Parquet and ORC file formats both support predicate pushdown (also
called predicate filtering). Both formats have blocks of data that
represent column values. Each block holds statistics for the block,
such as max/min values. When a query is being run, these statistics
determine whether the block should be read or skipped depending on the
filter value used in the query. This helps reduce data scanned and
improves the query runtime. To use this capability, add more filters
in the query (for example, using a WHERE clause).
One way to optimize the number of blocks to be skipped is to identify
and sort by a commonly filtered column before writing your ORC or
Parquet files. This ensures that the range between the min and max of
values within the block are as small as possible within each block.
This gives it a better chance to be pruned and also reduces data
scanned further.
To test it, I would suggest another experiment if possible: change the Spark job to sort the data before persisting it into the two files. Use the following order:
source_system, execution_date, year_month_day, product_vendor, product_vendor_commission_amount, order_confirmed_date, filter and product_id. Then check the query statistics.
That way the dataset would at least be optimised for the presented use case; otherwise, adjust the order according to the heaviest queries.
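For illustration, a minimal PySpark sketch of that experiment could look like this; the input table, output path and the maxRecordsPerFile value are assumptions, only the sort and partition columns come from the question:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sorted-write").getOrCreate()
df = spark.table("dw.table")      # or spark.read.parquet(...) on the source data

(df
 # sort within each task so every output file gets narrow min/max statistics
 # on the columns the query filters on
 .sortWithinPartitions(
     "source_system", "execution_date", "year_month_day",
     "product_vendor", "product_vendor_commission_amount",
     "order_confirmed_date", "filter", "product_id")
 .write
 .option("maxRecordsPerFile", 2_500_000)   # tune until files land near the target size
 .partitionBy("source_system", "execution_date", "year_month_day")
 .mode("overwrite")
 .parquet("s3://bucket/prefix/"))          # assumed output location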
The post comments on optimal file sizes too and gives a general rule of thumb. From my experience, Spark works well with sizes between 128 MB and 2 GB. It should also be fine for other query engines like Presto, which Athena uses.
My suggestion would be to break year_month_day/execution_date (as they are used in most queries) into year, month and day partitions, which would reduce the amount of data scanned and allow more efficient filtering, as sketched below.
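A rough PySpark sketch of that layout change, assuming year_month_day parses as a date and using made-up table and path names:
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-partitions").getOrCreate()
df = spark.table("dw.table")                  # assumed source table

d = F.to_date("year_month_day")               # '2022-09-14' -> date
(df.withColumn("year",  F.year(d))
   .withColumn("month", F.month(d))
   .withColumn("day",   F.dayofmonth(d))
   .write
   .partitionBy("source_system", "year", "month", "day")
   .mode("overwrite")
   .parquet("s3://bucket/prefix_by_day/"))    # assumed output location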
I am trying to fetch data from a Greenplum table using Apache NiFi's QueryDatabaseTableRecord processor. I am seeing a "GC overhead limit exceeded" error and the NiFi web page becomes unresponsive.
I have set the 'Fetch Size' property to 10000, but it seems the property is not being used in this case.
Other settings:
Database Type : Generic
Max Rows Per Flow File : 1000000
Output Batch Size : 2
JVM min/max memory allocation is 4g/8g
Is there an alternative way to avoid the GC errors for this task?
This looks like a clear case of the "Fetch Size" property not being used; see the processor documentation on this.
Try testing the JDBC setFetchSize on its own, outside NiFi, to see whether the Greenplum driver honors it.
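Two things may help narrow it down. First, the stock PostgreSQL JDBC driver only honors a fetch size when autocommit is disabled and the result set is forward-only; whether that applies to the exact Greenplum driver used here is an assumption, but it is a common reason the setting appears to be ignored. Second, as a rough standalone check that the server can stream rows in chunks at all, you could bypass NiFi and JDBC entirely with a psycopg2 server-side cursor (Greenplum speaks the PostgreSQL protocol); the host, credentials and table name below are made up:
# Rough standalone check, outside NiFi, that the server streams rows in chunks
# rather than materializing the full result client-side. This is NOT the JDBC
# path NiFi takes; connection details and table name are assumptions.
import psycopg2

conn = psycopg2.connect(host="greenplum-host", dbname="mydb",
                        user="gpadmin", password="secret")
with conn:
    # A named cursor is a server-side cursor: rows come back in itersize batches.
    with conn.cursor(name="chunked_read") as cur:
        cur.itersize = 10_000
        cur.execute("SELECT * FROM big_table")
        for n, _row in enumerate(cur, 1):
            if n % 100_000 == 0:
                print(f"read {n} rows so far")
conn.close()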
I use a VMware environment to compare the performance of Postgres-XL 9.5 and PostgreSQL 9.5.
I built the Postgres-XL cluster following the instructions in Creating a Postgres-XL cluster.
Physical HW:
M/B: Gigabyte H97M-D3H
CPU: Intel i7-4790 @3.60GHz
RAM: 32GB DDR3 1600
HD: 2.5" Seagate SSHD ST1000LM014 1TB
Infra:
VMWare ESXi 6.0
VM:
DB00~DB05:
CPU: 1 core, limit to 2000Mhz
RAM: 2GB, limit to 2GB
HD: 50GB
Advanced CPU Hyperthread mode: any
OS: Ubuntu 16.04 LTS x64 (all packages upgraded to the current version with apt update; apt upgrade)
PostgreSQL 9.5+173 on DB00
Postgres-XL 9.5r1.2 on DB01~DB05
userver: (for executing pgbench)
CPU: 2 cores,
RAM: 4GB,
HD: 50GB
OS: Ubuntu 14.04 LTS x64
Role:
DB00: Single PostgreSQL
DB01: GTM
DB02: Coordinator Master
DB03~DB05: datanode master dn1~dn3
postgresql.conf in DB01~DB05
shared_buffers = 128MB
dynamic_shared_memory_type = posix
max_connections = 300
max_prepared_transactions = 300
hot_standby = off
# Others are default values
postgresql.conf of DB00 is
max_connections = 300
shared_buffers = 128MB
max_prepared_transactions = 300
dynamic_shared_memory_type = sysv
#Others are default values
On userver:
pgbench -h db00 -U postgres -i -s 10 -F 10 testdb;
pgbench -h db00 -U postgres -c 30 -t 60 -j 10 -r testdb;
pgbench -h db02 -U postgres -i -s 10 -F 10 testdb;
pgbench -h db02 -U postgres -c 30 -t 60 -j 10 -r testdb;
I confirmed that all pgbench_* tables are evenly distributed among dn1~dn3 in Postgres-XL.
pgbench results:
Single PostgreSQL 9.5: (DB00)
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 30
number of threads: 10
number of transactions per client: 60
number of transactions actually processed: 1800/1800
tps = 1263.319245 (including connections establishing)
tps = 1375.811566 (excluding connections establishing)
statement latencies in milliseconds:
0.001084 \set nbranches 1 * :scale
0.000378 \set ntellers 10 * :scale
0.000325 \set naccounts 100000 * :scale
0.000342 \setrandom aid 1 :naccounts
0.000270 \setrandom bid 1 :nbranches
0.000294 \setrandom tid 1 :ntellers
0.000313 \setrandom delta -5000 5000
0.712935 BEGIN;
0.778902 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
3.022301 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
3.244109 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
7.931936 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
1.129092 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
4.159086 END;
Postgres-XL 9.5
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 30
number of threads: 10
number of transactions per client: 60
number of transactions actually processed: 1800/1800
tps = 693.551818 (including connections establishing)
tps = 705.965242 (excluding connections establishing)
statement latencies in milliseconds:
0.003451 \set nbranches 1 * :scale
0.000682 \set ntellers 10 * :scale
0.000656 \set naccounts 100000 * :scale
0.000802 \setrandom aid 1 :naccounts
0.000610 \setrandom bid 1 :nbranches
0.000553 \setrandom tid 1 :ntellers
0.000536 \setrandom delta -5000 5000
0.172587 BEGIN;
3.540136 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.631834 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
6.741206 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
17.539502 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.974308 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
10.475378 END;
My question is: why are Postgres-XL's TPS and other figures (such as INSERT and UPDATE latencies) far worse than those of PostgreSQL? I thought Postgres-XL's performance should be better than PostgreSQL's, shouldn't it?
Postgres-XL is designed to run on multiple physical nodes. Running it on VMWare is a good educational exercise but should not be expected to show any performance gain. You are adding virtualization overhead and the overhead of the clustering software. The web page test from joyeu’s answer used 4 physical machines. Assuming that the performance increase quoted over a single node is based on the same machine you would read this as 4 times the hardware for a 2.3x increase in performance.
Maybe you should try a large "Scale" value.
I got a similar result to yours.
And then I found this webpage from Postgres-XL official site:
http://www.postgres-xl.org/2016/04/postgres-xl-9-5-r1-released/
It says:
Besides proving its mettle on Business Intelligence workloads,
Postgres-XL has performed remarkably well on OLTP workloads when
running pgBench (based on TPC-B) benchmark. In a 4-Node (Scale: 4000)
configuration, compared to PostgreSQL, XL gives up to 230% higher TPS
(-70% latency comparison) for SELECT workloads and up to 130% (-56%
latency comparison) for UPDATE workloads. Yet, it can scale much, much
higher than even the largest single node server.
So I guess Postgres-XL performs well for large data sizes.
I will conduct a test to confirm this right now.
Postgres-XL is a clustered server. Individual transactions will always be slightly slower on it, but because it can scale up to massive clusters it can process much more data concurrently, which lets it work through large data sets much faster.
Also performance varies WIDELY depending on what configuration options you use.
From your test specs:
Physical HW:
M/B: Gigabyte H97M-D3H
CPU: Intel i7-4790 @3.60GHz
RAM: 32GB DDR3 1600
HD: 2.5" Seagate SSHD ST1000LM014 1TB <-----
Using a single disk will likely introduce a bottleneck and slow down your performance. You are effectively dividing the same read/write speed by 4, considering that the GTM, coordinator and data nodes are all going to access/spool data on it.
Despite what people say about performance gaps introduced by the hypervisor, databases are disk-intensive applications, not memory/CPU-intensive ones; this means they are well suited to virtualization, on the condition that the workload is distributed accordingly across disk groups. Obviously, use a preallocated disk or you will really slow down the inserts.
Can the Apache NiFi "ExecuteSQL" processor stream a large SELECT result set in chunks of, say, 'x' MB?
The ExecuteSQL processor can "stream" large numbers of rows in the sense that it streams the data directly to FlowFile content (which is not held in memory/heap), so it is very memory efficient. It does not, at this time, chunk the results, though; there is a ticket, https://issues.apache.org/jira/browse/NIFI-1251, to provide such a capability.
You can now use the QueryDatabaseTable processor, which supports chunking via the "Max Rows Per Flow File" property.
You can also specify a LIMIT clause in the SQL itself (along with an ORDER BY on the id), pull one batch, get the last id, pull everything > max(id), and repeat until done, i.e.
Start
|
UpdateAttr: maxid--------- SQL ... $maxid:isEmpty():ifElse('', 'where id>maxid') order by id limit n
|_____________________________|
|
do sth
It's by number of records and not by size, but knowing the approximate size per record, you can still achieve the same effect; a standalone sketch of the idea is below.
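Here is a minimal, self-contained sketch of that keyset-pagination loop; sqlite3 is used only so it runs as-is, and the table, column and batch size are made up. In the NiFi flow the same query shape would go into the SQL processor, with maxid carried as an attribute.
# Self-contained demo of keyset pagination: LIMIT plus "id > last seen id".
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO big_table VALUES (?, ?)",
                 [(i, f"row-{i}") for i in range(1, 1001)])

BATCH = 250
last_id = 0                       # start below the smallest id
while True:
    rows = conn.execute(
        "SELECT id, payload FROM big_table WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, BATCH)).fetchall()
    if not rows:
        break
    # "do sth" with this chunk; in NiFi this would be one FlowFile per batch
    print(f"got {len(rows)} rows, ids {rows[0][0]}..{rows[-1][0]}")
    last_id = rows[-1][0]         # carry the max id forward, like the maxid attribute
conn.close()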
I am working on a single-node Cassandra setup. The system I am using has a 4-core CPU with 8 GB RAM.
The properties of the column family I am using are:
Keyspace: keyspace1:
Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
Durable Writes: true
Options: [datacenter1:1]
Column Families:
ColumnFamily: colfamily (Super)
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type/org.apache.cassandra.db.marshal.BytesType
Row cache size / save period in seconds / keys to save : 100000.0/0/all
Row Cache Provider: org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
Key cache size / save period in seconds: 200000.0/14400
GC grace seconds: 864000
Compaction min/max thresholds: 4/32
Read repair chance: 1.0
Replicate on write: true
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
I tried to insert 1 million rows into the column family. The throughput for writes is around 2,500 per second, and for reads around 380 per second.
How can I improve both the read and write throughput?
380 per second means that you are reading data from the hard drive with a low cache hit rate, or the OS is swapping. Check the Cassandra statistics to find out the cache usage:
./nodetool -host <IP> cfstats
You have enabled both the row cache and the key cache. The row cache reads the whole row into RAM, meaning all columns for a given row key. In this case you can disable the key cache, but make sure that you have enough free RAM to handle the row caching.
If you have Cassandra with the off-heap cache (the default from 1.x), it is possible that the row cache is very large and the OS has started swapping. Check the swap size; this can decrease performance.