ClickHouse exception: Memory limit (total) exceeded

I'm attempting to replicate data from PostgreSQL into ClickHouse using https://clickhouse.com/docs/en/engines/database-engines/materialized-postgresql/. Any ideas on how to solve the error below, or what's the best way to replicate PostgreSQL data to ClickHouse?
CREATE DATABASE pg_db
ENGINE = MaterializedPostgreSQL('localhost:5432', 'dbname', 'dbuser', 'dbpass')
SETTINGS materialized_postgresql_schema = 'dbschema'
Then running SHOW TABLES FROM pg_db; doesn't show all tables (it's missing a large table that has 800k rows). When I attempt to attach that large table using ATTACH TABLE pg_db.lgtable;, I get the error below:
Code: 619. DB::Exception: Failed to add table lgtable to replication.
Info: Code: 241. DB::Exception: Memory limit (total) exceeded: would
use 1.75 GiB (attempt to allocate chunk of 4219172 bytes), maximum:
1.75 GiB. (MEMORY_LIMIT_EXCEEDED) (version 22.1.3.7 (official build)). (POSTGRESQL_REPLICATION_INTERNAL_ERROR) (version 22.1.3.7 (official
build))
I've tried increasing the allocated memory and adjusting other settings, but I still hit the same error:
set max_memory_usage = 8000000000;
set max_memory_usage_for_user = 8000000000;
set max_bytes_before_external_group_by = 1000000000;
set max_bytes_before_external_sort = 1000000000;
set max_block_size=512, max_threads=1, max_rows_to_read=512;
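Note that those are query-level settings, while the error above mentions the total limit, which is the server-wide cap (controlled by max_server_memory_usage / max_server_memory_usage_to_ram_ratio in the server config), so they may not help during the initial table snapshot. A minimal sketch of something else worth trying, assuming the same database definition as above: shrink the replication batch size via the materialized_postgresql_max_block_size engine setting so the large table is materialized in smaller chunks (8192 is an illustrative value; the documented default is larger, 65536).
-- Sketch only: recreate the replicated database with a smaller snapshot batch size.
DROP DATABASE IF EXISTS pg_db;

CREATE DATABASE pg_db
ENGINE = MaterializedPostgreSQL('localhost:5432', 'dbname', 'dbuser', 'dbpass')
SETTINGS materialized_postgresql_schema = 'dbschema',
         materialized_postgresql_max_block_size = 8192;  -- smaller batches during the initial load (illustrative value)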

Related

Memory limit exceeded when running a very simple query in ClickHouse

I have a very large table (730M rows) that uses the ReplacingMergeTree engine. I've started getting "Memory limit (for query) exceeded" even when running trivial queries.
For example, SELECT * FROM my_table LIMIT 5 gives:
Code: 241. DB::Exception: Received from localhost:9000. DB::Exception: Memory limit (for query) exceeded: would use 24.50 GiB (attempt to allocate chunk of 26009509376 bytes), maximum: 9.31 GiB: While executing MergeTree.
Why is Clickhouse trying to use 24.5G of memory for a simple SELECT query, and how can I fix it?
Because many parallel threads each read all columns in 65k-row blocks and allocate several MB for each column.
How many columns in the table?
Try
set max_block_size=512, max_threads=1, max_rows_to_read=512;
SELECT * FROM my_table LIMIT 5;
First of all, this looks like a ClickHouse shortcoming... However, there is a way to work around the limitation: query the system.tables table, which has a partition_key field.
SELECT partition_key FROM system.tables
WHERE database = 'your_db' AND name='your_big_table'
It shows which partition key the table has, or rather how parts of the table can be addressed. Since the table is large, it probably has partitions, and we can use them:
SELECT *
FROM your_db.your_big_table
WHERE toYYYYMM(event_time) = toYYYYMM(now())
LIMIT 5;
or, if the partition key is intDiv(id_value, 10000000):
SELECT *
FROM your_db.your_big_table
WHERE intDiv(id_value, 10000000) = 0
LIMIT 5;
This way you reduce the number of rows to iterate and bypass the limitation.
I think the LIMIT is not pushed down to the nodes of the ClickHouse cluster, which is why such overhead occurs and you get the exception.
By the way, I ran into the same situation because of the max_rows_to_read_leaf limit:
DB::Exception: Limit for rows (controlled by 'max_rows_to_read_leaf' setting) exceeded
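For that last error, a small sketch of what I mean, relaxing the leaf-node read limit for the session only (0 disables the limit in the ClickHouse versions I've checked; verify against your version's docs and cluster policy):
SET max_rows_to_read_leaf = 0;  -- 0 = no leaf-node row limit for this session

SELECT *
FROM your_db.your_big_table
WHERE toYYYYMM(event_time) = toYYYYMM(now())
LIMIT 5;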

Oracle updatexml throws PGA memory error on high-volume data

I'm using the updatexml call below, inside a procedure, to scramble an XMLType column across many tables. Some tables have a huge volume of data, and for those the update fails with a PGA memory error.
The command is as follows:
Update table_name
set XMLRECORD = updatexml(xmlrecord,'/row/c1/text()','SCRAMBLE1','/row/c3/text()','SCRAMBLE2')
Error message:
ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT
04036.00000 - “PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT”
*Cause: Private memory across the instance exceeded the limit specified in the PGA_AGGREGATE_LIMIT initialization parameter. The largest sessions using Program Global Area (PGA) memory were interrupted to get under the limit.
*Action: Increase the PGA_AGGREGATE_LIMIT initialization parameter or reduce memory usage.
We tried increasing the PGA limit, but the issue still occurs. I'd appreciate any suggestions on how to handle this.
I would just recreate the table with the new data.
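A minimal sketch of that recreate approach, assuming a one-off scramble and using the table and column names from the question (the column list, NOLOGGING, and the table swap are illustrative; adjust them to your schema):
-- Build a new table with the scrambled XML instead of updating rows in place.
CREATE TABLE table_name_new NOLOGGING AS
SELECT updatexml(t.xmlrecord,
                 '/row/c1/text()', 'SCRAMBLE1',
                 '/row/c3/text()', 'SCRAMBLE2') AS xmlrecord
       -- , t.other_col1, t.other_col2, ...  (list the remaining columns here)
FROM   table_name t;

-- After validating the new table, swap it in and recreate indexes, constraints and grants:
-- DROP TABLE table_name;
-- ALTER TABLE table_name_new RENAME TO table_name;
Building the result in one pass also avoids the undo generated by updating every row in place, though the per-row XML processing cost is the same.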

I am getting error ORA-27125: unable to create shared memory segment

When I try to create a DB instance, I get the error below:
SQL> ORA-27125: unable to create shared memory segment HPUX-ia64
Error: 12: Not enough space
My system has 15.68 GB of physical memory, and my
shmmax value is 1640 MB
sga_max_size = 3540 MB
sga_target = 3540 MB
and opt/oracle has more than 10 GB free. I have tried modifying the SGA values, but I still get the error.
Thanks in advance.

Slow cross-loading from Oracle (oracle-fdw) into PostgreSQL

I've created several posts in the forum about the performance problem I have, but now that I've run some tests and gathered all the necessary information, I'm creating this post.
I have performance issues with two big tables. Those tables are located on a remote Oracle database. I'm running the query:
insert into local_postgresql_table select * from oracle_remote_table;
The first table has 45M records and its size is 23 GB. Importing the data from the remote Oracle database takes 1 hour and 38 minutes. After that I create 13 regular indexes on the table, which takes 10 minutes per index, so 2 hours and 10 minutes in total.
The second table has 29M records and its size is 26 GB. Importing the data from the remote Oracle database takes 2 hours and 30 minutes. Creating the indexes takes 1 hour and 30 minutes (some are single-column indexes that take 5 minutes each, and some are multi-column indexes that take 11 minutes each).
These operations are very problematic for me, and I'm looking for a way to improve the performance. The parameters I have set:
min_parallel_relation_size = 200MB
max_parallel_workers_per_gather = 5
max_worker_processes = 8
effective_cache_size = 2500MB
work_mem = 16MB
maintenance_work_mem = 1500MB
shared_buffers = 2000MB
RAM : 5G
CPU CORES : 8
- I tried running select count(*) on the table in both Oracle and PostgreSQL; the running times are almost equal.
- Before importing the data, I drop the indexes and the constraints.
- I tried copying a 23 GB file from the Oracle server to the PostgreSQL server, and it took 12 minutes.
Please advise: how can I continue? What can I improve in this operation?
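One idea worth testing (a sketch under assumptions, not a measured fix): a single insert into ... select * runs in one PostgreSQL backend, so the foreign scan through oracle-fdw is effectively serial. If the table has a numeric key (the id column below is hypothetical), splitting the load into disjoint ranges and running each piece in its own session lets several Oracle scans proceed concurrently; oracle-fdw can push simple WHERE conditions down to Oracle.
-- Run each statement in a separate session/connection; the ranges are illustrative.
INSERT INTO local_postgresql_table
SELECT * FROM oracle_remote_table WHERE id < 15000000;

INSERT INTO local_postgresql_table
SELECT * FROM oracle_remote_table WHERE id >= 15000000 AND id < 30000000;

INSERT INTO local_postgresql_table
SELECT * FROM oracle_remote_table WHERE id >= 30000000;
The same idea applies to the index builds: creating several of the 13 indexes from separate sessions can overlap their build times, subject to available RAM (each CREATE INDEX can use up to maintenance_work_mem) and disk bandwidth.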

Reading large result sets in JDBC

I am using JDBC and have a stored procedure in DB2 that returns a huge amount of data (around 6000 rows). Because of this volume, the network transfer (from the DB server to the application server) takes a long time.
What I am thinking is to use multiple Java threads to invoke the stored procedure, with each thread returning a different block of data.
Thread1 - row 1 - row 1000;
Thread2 - row 1001 - row 2000;
Thread3 - row 2001 - row 3000 and so on
All these threads can be run in parallel and I can aggregate the results of each thread.
Is there any better way to handle this problem using JDBC or any other means?
Depending upon your JDBC driver, setting the fetch size may help. With the default fetch size of 0, your driver may be reading all rows into memory at once, which could be the cause of the slowness.
