We have a small Greenplum cluster in which some queries abort.
System-related information:
Greenplum version: 6.3
Master hosts: 1
Segment hosts: 2
RAM per segment host: 32GB
Swap per segment host: 32GB
Total segments: 8 primary + 0 mirrors
Segments per host: 4
vm.overcommit_ratio: 95
gp_vmem_protect_limit: 8072MB
statement_mem: 250MB
The queries are executed by a non-superuser role.
Symptom:
The query failed with the following error message:
Canceling query because of high VMEM usage. Used: 7245MB, available 801MB, red zone: 7264MB (runaway_cleaner.c:189)
What we tried:
We calculated the Greenplum parameters using the guidance here: https://gpdb.docs.pivotal.io/6-3/best_practices/sysconfig.html
This helped for some "simple" queries, but for more complicated ones the error happened again.
As a next step we configured max_statement_mem: 2000MB
This didn't have any effect on the memory consumption on the segment hosts. We track it with the following query:
select segid, sum (vmem_mb) from session_state.session_level_memory_consumption
where query like '%<some snippet of the query>%'
group by segid
order by segid;
The memory consumption increases very quickly and the error happened again.
We then tried to restrict the memory consumption by creating the following resource queue for the user:
CREATE RESOURCE QUEUE adhoc with (ACTIVE_STATEMENTS=6, MEMORY_LIMIT=6291);
ALTER ROLE user1 RESOURCE QUEUE adhoc;
The database is configured to use resource queues via the parameter gp_resource_manager: queue
When we execute a statement, we see in gp_toolkit.gp_resqueue_status that rsqmemoryvalue is 1048, but the memory consumption in session_state.session_level_memory_consumption shows higher values for the segments until the error occurs again.
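For reference, a query along these lines can be used to watch the queue while a statement runs (the columns are the ones documented for gp_toolkit; exact names and units may vary by version):
SELECT rsqname,
       rsqcountlimit,    -- ACTIVE_STATEMENTS limit of the queue
       rsqcountvalue,    -- statements currently running
       rsqmemorylimit,   -- MEMORY_LIMIT assigned to the queue
       rsqmemoryvalue,   -- the value we observe as 1048
       rsqwaiters        -- statements waiting for a slot
FROM gp_toolkit.gp_resqueue_status
WHERE rsqname = 'adhoc';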
Has anyone a tip to fix this problem?
Each query will ask for 250MB of memory, and you set gp_vmem_protect_limit to 8GB. In this case you can probably run (8GB - primary process memory) / 250MB ≈ 20-30 queries at the same time. The size of the primary process depends on other settings: shared_buffers, wal_buffers, ...
statement_mem can be set per session. This means some users can set statement_mem higher (up to max_statement_mem), and you will then see fewer queries running concurrently.
When the memory allocated to those concurrent queries reaches 90% (or 95%) of gp_vmem_protect_limit, the runaway detector starts cancelling queries to protect the primary processes from the OS OOM killer.
To "fix" the problem (it is not a problem actually), you can
1) set lower default statement_mem, so you can have more queries running concurrently but slower.
2) increase RAM on segment hosts, such that you can increase gp_vmem_protect_limit.
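For option 1, a minimal sketch of lowering the per-query budget, either for one session or as a default for the role (the 125MB value is only an illustration, not a recommendation):
-- for the current session only
SET statement_mem = '125MB';
-- or as a default for the role (applies to new sessions)
ALTER ROLE user1 SET statement_mem = '125MB';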
I ran into the scenario in the title on this cluster:
5 shards, 5 replicas
Google Cloud Compute
One table only on the cluster (sharded and replicated) with ReplicatedReplacingMergeTree. I can provide the schema if needed, but the issue should not depend on that
ClickHouse 21.8.13.1.altinitystable (but also reproduced on 20.7.2.30)
This is the sequence of events:
I executed an OPTIMIZE TABLE .... PARTITION .... FINAL on one node of each of the shards. The partition is fairly large (120GB), so that process would take longer than an hour.
The optimize started and was visible in system.merges and system.replication_queue as usual.
During the process one of the nodes was restarted because of a GCP maintenance event and came back up a few minutes later.
Once ClickHouse restarted, it restarted the merge as expected. However, three GET_PART operations (I assume parts created during the downtime that had to be replicated) did not start, as they were waiting on the large merge to complete. See the output of system.replication_queue below; part 90-20220530_0_1210623_1731 is indeed the one covered by the merge generated by the OPTIMIZE statement.
SELECT
replica_name,
postpone_reason,
type
FROM system.replication_queue
(formatted)
replica_name: snuba-errors-tiger-4-4
postpone_reason: Not executing log entry queue-0055035589 for part 90-20220530_0_1210420_1730 because it is covered by part 90-20220530_0_1210623_1731 that is currently executing.
type: GET_PART
replica_name: snuba-errors-tiger-4-4
postpone_reason: Not executing log entry queue-0055035590 for part 90-20220530_1210421_1210598_37 because it is covered by part 90-20220530_0_1210623_1731 that is currently executing.
type: GET_PART
replica_name: snuba-errors-tiger-4-4
postpone_reason: Not executing log entry queue-0055035591 for part 90-20220530_1210599_1210623_6 because it is covered by part 90-20220530_0_1210623_1731 that is currently executing.
type: GET_PART
replica_name: snuba-errors-tiger-4-4
postpone_reason:
type: MERGE_PARTS
The replication-delay metrics increased to about 1 minute 30 seconds, and the distributed table did not send any queries to this node until the merge was done (90 minutes later).
Is this normal behavior? If yes, is there a way to prevent a long merge from blocking replication after a restart?
max_replica_delay_for_distributed_queries is set to 300 seconds on the cluster. I was expecting the 1:30 delay to be ignored, but that did not seem to be the case, as no queries were routed to the impacted node. Is there another way to tell ClickHouse to ignore the replication delay?
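For context, this is roughly how I check the delay the distributed layer considers on the replica (the table name here is only a placeholder):
SELECT replica_name, absolute_delay, queue_size, is_readonly
FROM system.replicas
WHERE table = 'errors_local';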
Thank you
Filippo
I have a table with around 2 billion rows from which I try to query max(id). id is not the sorting key of the table, and the table uses the MergeTree engine.
No matter what I try, I get memory errors, and not only for this one query. As soon as I try to query any table fully (vertically) to find data, my 12 GB of RAM is not enough. I know I can just add more, but that is not the point. Is it by design that ClickHouse simply throws an error when it doesn't have enough memory? Is there a setting that tells ClickHouse to use disk instead?
SQL Error [241]: ClickHouse exception, code: 241, host: XXXXXX, port: 8123; Code: 241, e.displayText() = DB::Exception: Memory limit (for query) exceeded: would use 9.32 GiB (attempt to allocate chunk of 9440624 bytes), maximum: 9.31 GiB (version 21.4.6.55 (official build))
Alexey Milovidov does not agree with putting minimum RAM requirements into the CH documentation. But I would say that 32 GB is the minimum for production CH.
At least:
You need to lower the mark cache because it's 5GB!!!! by default (set mark_cache_size to 500MB).
You need to lower max_block_size to 16384.
You need to lower max_threads to 2.
You need to set max_bytes_before_external_group_by to 3GB.
You need to set aggregation_memory_efficient_merge_threads to 1.
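A rough sketch of applying the per-query settings from this list to the query above (the table name is a placeholder; mark_cache_size is a server setting and belongs in config.xml instead):
SELECT max(id)
FROM my_table
SETTINGS
    max_threads = 2,
    max_block_size = 16384,
    max_bytes_before_external_group_by = 3000000000,
    aggregation_memory_efficient_merge_threads = 1;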
For me, what worked was changing the maximum server memory usage ratio from 0.9 to 1.2 in config.xml:
<max_server_memory_usage_to_ram_ratio>1.2</max_server_memory_usage_to_ram_ratio>
Thanks for the reply, as it ultimately led me to this.
I am having a problem with Cassandra. Please help me.
I am executing a SELECT statement against a 500K-row table at 1 millisecond intervals. After some time I get the message "All host(s) tried for query failed. First host tried, 10.1.60.12:9042: Host considered as DOWN. See innerErrors".
The SELECT statement I run is the following:
select * from demo.users
This returns 5K rows to me. There are 500K rows in the users table.
I don't know what is wrong. I have not changed the cassandra.yaml file.
Do I need to adjust settings for the memory cache? There is a lot of disk I/O when I run the SELECT statement.
Please help me
A range query (SELECT * with no primary key or token ranges) can be a very expensive query that has to hit at least one node of every replica set (depending on the size of the dataset). If you're trying to read the entire dataset or do batch processing, it's best to use the Spark connector, or behave like it and query individual token ranges to prevent putting too much load on the coordinators.
If you are going to use inefficient queries (which is fine, just don't expect the same throughput as normal reads), you will probably need more resources or some specialized tuning. You could add more nodes, or look into what's causing the node to go DOWN. Most likely it's GCs from heap load, so check the GC log. If you have the memory available you can increase the heap; it would be a good idea to max out the heap size, since with reading everything the system caches are not going to be as meaningful. Use G1 once you are over 16GB of heap (which you should be), set in jvm.options.
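A minimal sketch of querying by token range, assuming the partition key of demo.users is a column named user_id (adjust to your actual schema):
-- first of 8 equal slices of the Murmur3 token ring; issue the other slices the same way
SELECT * FROM demo.users
WHERE token(user_id) >= -9223372036854775808
  AND token(user_id) < -6917529027641081856;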
We are using SQL*Plus to offload data from Oracle from a large table with 500+ columns and around 15 million records per day.
The query fails because Oracle is not able to allocate the required memory for the result set.
Fine-tuning the Oracle DB server to increase memory allocation is ruled out, since it is shared across teams and is critical.
This is a simple select with a filter on a column.
What options do I have to make it work?
1) Break my query down into multiple chunks and run it in nightly batch mode. If so, how can a SELECT query be broken down?
2) Are there any optimization techniques I can use while running a SELECT query on a large table through SQL*Plus?
3) Is there any Java/OJDBC-based solution that can break a SELECT into chunks and reduce the load on the DB server?
Any pointers are highly appreciated.
Here is the error message thrown:
ORA-04030: out of process memory when trying to allocate 169040 bytes (pga heap,kgh stack)
ORA-04030: out of process memory when trying to allocate 16328 bytes (koh-kghu sessi,pl/sql vc2)
ORA-04030 indicates that the process needs more memory (UGA in the SGA/PGA, depending on the server architecture) to execute the job.
This could be caused by a shortage of RAM (in a dedicated server mode environment), a small PGA size, or an operating system setting that restricts the allocation of enough RAM.
This MOS note describes how to diagnose and resolve the ORA-04030 error:
Diagnosing and Resolving ORA-4030 Errors (Doc ID 233869.1)
Your option 1 seems to be within your control. Breaking down the query requires knowledge of the query/data. A column in the data might work, e.g.:
query1: select ... where col1 <= <value>
query2: select ... where col1 > <value>
... or ... you might have to build more code around the problem.
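For illustration only (the table and column names below are placeholders), the chunks could also be carved out by hashing the ROWID, which needs no knowledge of the data distribution, although each chunk still scans the rows selected by the existing filter:
-- chunk 0 of 4; run again with 1, 2 and 3 to cover the rest
SELECT *
FROM   big_table
WHERE  load_date = DATE '2023-01-01'          -- the existing filter
AND    MOD(ORA_HASH(ROWID), 4) = 0;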
Thought: does the query involve sorting/grouping? Can you live without it? Those operations take up more memory.
I am calling mysqldump for a database containing InnoDB & MyISAM tables.
The dump still runs very fast when it gets to a fat MyISAM table of 11GB.
Fast means iotop shows me more than 70MB/s write performance.
I watch the process in mytop, so I know it happens on the big table.
The dump file grows to 8GB, and then suddenly the I/O is only about 1 MB/s.
Server load is OK, with no other processes running.
I tried changing my.cnf settings, but nothing worked.
Performance depends on a few factors.
I had to create an alternative solution to mysqldump for a client so they could load a 42GB dump file (with more than 1 billion rows).
For reference: originally, mysqldump took 3.9 days on a 16-core server with 64GB of RAM and a 10-disk SSD array.
Using uniVocity we loaded the same data in 90 minutes, using a 3-year-old laptop. You can use it with a 30-day evaluation license to load this.
Other than that, here are a few things that may impact performance:
Check whether you have these statements in your dump file to disable constraints:
SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0;
SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0;
SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO';
If it doesn't, add them, or alter the CREATE TABLE script to remove all constraints. If you have constraints enabled (primary keys, foreign keys, etc.) while running your dump load, the process will get slower over time, as the database validates these constraints on every insert against a growing number of possibilities (more PKs and FKs).
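If you add them by hand, the original values are normally restored at the end of the dump with something along these lines:
SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS;
SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS;
SET SQL_MODE=@OLD_SQL_MODE;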
If you are using InnoDB (not exactly your case, but it may help someone else), add this to your my.cnf file:
innodb_doublewrite = 0
innodb_buffer_pool_size = 8000M
# innodb_log_file_size = 512M - If I enable this one the server won't start. Couldn't identify why.
log-bin = 0
innodb_support_xa = 0
innodb_flush_log_at_trx_commit = 0
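Some of these can also be changed at runtime just for the duration of the load instead of editing my.cnf; this is only a sketch and is version-dependent (innodb_buffer_pool_size is dynamic only from MySQL 5.7.5, and innodb_doublewrite and log-bin generally still require a restart):
-- relax durability only while the dump is loading (dynamic variables)
SET GLOBAL innodb_flush_log_at_trx_commit = 0;
SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;  -- 8GB, MySQL 5.7.5+
-- ... load the dump, then restore the default ...
SET GLOBAL innodb_flush_log_at_trx_commit = 1;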