Executing a SELECT statement in Cassandra - caching

I have a problem in Cassandra. Please help me.
I am executing a SELECT statement against a 500K-row table at 1 millisecond intervals. After some time I get the message "All host(s) tried for query failed. First host tried, 10.1.60.12:9042: Host considered as DOWN. See innerErrors".
I run the following SELECT statement:
select * from demo.users
This returns 5K rows to me, yet there are 500K rows in the users table.
I don't know what is wrong. I have not changed the cassandra.yaml file.
Do I need to configure the memory cache? There is too much disk I/O when I run the SELECT statement.
Please help me

A range query (SELECT * with no primary key or token range restriction) can be a very expensive query that has to hit at least one node of every replica set (depending on the size of the dataset). If you are trying to read the entire dataset or do batch processing, it is best to use the Spark connector, or behave like it and query individual token ranges, to prevent putting too much load on the coordinators.
If you are going to use inefficient queries (which is fine, just don't expect the same throughput as normal reads) you will probably need more resources or some specialized tuning. You could add more nodes, or look into what is causing the node to be marked DOWN. Most likely it's GC pauses from heap pressure, so check the GC log. If you have the memory available you can increase the heap. It would be a good idea to max out the heap size, since when you are reading everything the system caches are not going to be as meaningful. Use G1 once you are over 16 GB of heap (which you should be), configured in jvm.options.
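For illustration, a minimal sketch of a token-range read, assuming the partition key of demo.users is a column named user_id (a placeholder; adjust to your schema). Each statement scans only one slice of the ring rather than the whole table, which is roughly what the Spark connector does under the hood:
select * from demo.users
where token(user_id) >= -9223372036854775808   -- start of this token slice
  and token(user_id) <  -4611686018427387904;  -- repeat with the next boundaries for the rest of the ring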

Related

How to maintain cache in ClickHouse?

I use crontab to schedule a SQL query that hits a big table every 2 hours.
select a,b,c,d,e,f,g,h,i,j,k,many_cols from big_table format Null
It takes anywhere from 30 seconds to 5 minutes each time.
What I can see from the query_log is that when the query time is low, the MarkCacheHits value is high; when the time is high, the MarkCacheHits value is low and the MarkCacheMisses value is high.
I'm wondering how to get as many mark cache hits as possible. (This is probably not the only big table that needs to be warmed up.)
Will the mark cache be replaced by other queries, and what is its limit?
Does the warm-up approach of selecting specific columns really work for an aggregate query over those columns? For example, the warm-up SQL is as above, and the aggregate query could be select a,sum(if(b,c,0)) from big_table group by a
My ClickHouse server has been hanging occasionally recently, and I can't see any errors or exceptions at the corresponding time in the log. Could this be related to my regular warm-up query of the big table?
In reality you are placing the data into the Linux disk cache.
Will the mark cache be replaced by other queries, and what is its limit?
Yes, it will be replaced. The limit is 5 GB: <mark_cache_size>5368709120</mark_cache_size>
Does the warm-up approach of selecting specific columns really work for an aggregate query over those columns?
Yes, because you put the files into the Linux cache.
Could this be related to my regular warm-up query of the big table?
No.
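If you want to see how close the mark cache is to that limit, here is a hedged sketch of a check against system.asynchronous_metrics (the exact metric names, such as MarkCacheBytes and MarkCacheFiles, may vary between ClickHouse versions):
select metric, value
from system.asynchronous_metrics
where metric like 'MarkCache%';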

ClickHouse - Inserting more than a hundred entries per query

I cannot figure out how to increase the maximum number of entries per query. I would like to insert a thousand entries per query, and the default value is 100.
According to the docs, the parameter max_partitions_per_insert_block defines the limit on simultaneous entries.
I've tried to modify it from the ClickHouse client, but my insertion still fails:
$ clickhouse-client
my-virtual-machine :) set max_partitions_per_insert_block=1000
SET max_partitions_per_insert_block = 1000
Ok.
0 rows in set. Elapsed: 0.001 sec.
Moreover, there is no max_partitions_per_insert_block field in the /etc/clickhouse-server/config.xml file.
After modifying max_partitions_per_insert_block, I've tried to insert my data, but I'm stuck with this error:
infi.clickhouse_orm.database.ServerError: Code: 252, e.displayText() = DB::Exception: Too many partitions for single INSERT block (more than 100). The limit is controlled by 'max_partitions_per_insert_block' setting. Large number of partitions is a common misconception. It will lead to severe negative performance impact, including slow server startup, slow INSERT queries and slow SELECT queries. Recommended total number of partitions for a table is under 1000..10000. Please note, that partitioning is not intended to speed up SELECT queries (ORDER BY key is sufficient to make range queries fast). Partitions are intended for data manipulation (DROP PARTITION, etc). (version 19.5.3.8 (official build))
EDIT: I'm still stuck with this. I cannot even manually set the parameter to the value I want with SET max_partitions_per_insert_block = 1000: the value is changed but goes back to 100 after exiting and reopening clickhouse-client (even with sudo, so it does not look like a permission problem).
I figured it out when reading the documentation again, especially this document. I recognized in the profile settings the setting I had seen in the system.settings table. I inserted the following into my default profile, reloaded, and my insert of a thousand entries went well: <max_partitions_per_insert_block>1000</max_partitions_per_insert_block>
I guess it was obvious to some, but probably not to inexperienced people.
Most likely you should change the partitioning scheme. Each partition generates several files on the file system, which can put a strain on the OS. In addition, this may be the cause of long merges.
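As an illustration of a coarser partitioning scheme (the table and column names below are made up), partitioning by month instead of by day keeps a single INSERT block well under the default limit of 100 partitions:
CREATE TABLE events
(
    event_date Date,
    user_id    UInt64,
    payload    String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)   -- month granularity instead of toYYYYMMDD(event_date)
ORDER BY (event_date, user_id);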

sqlplus out of process memory when trying to allocate X amount of memory

We are using sqlplus to offload data from Oracle, from a large table with 500+ columns and around 15 million records per day.
The query fails as Oracle is not able to allocate the required memory for the result set.
Fine tuning oracle DB server to increase memory allocation is ruled out since it is used across teams and is critical.
This is a simple select with a filter on a column.
What options do I have to make it work?
1) To break my query down into multiple chunks and run it in nightly batch mode.
If so, how can a select query be broken down?
2) Are there any optimization techniques I can use while using sqlplus for a select query on a large table?
3) Any java/ojdbc based solution which can break a select into chunks and reduce the load on db server?
Any pointers are highly appreciated.
Here is the error message thrown:
ORA-04030: out of process memory when trying to allocate 169040 bytes (pga heap,kgh stack)
ORA-04030: out of process memory when trying to allocate 16328 bytes (koh-kghu sessi,pl/sql vc2)
ORA-04030 indicates that the process needs more memory (UGA in the SGA/PGA, depending upon the server architecture) to execute the job.
This could be caused by a shortage of RAM (in a dedicated server mode environment), a small PGA size, or perhaps an operating system setting that restricts the allocation of enough RAM.
This MOS Note describes how to diagnose and resolve ORA-04030 error.
Diagnosing and Resolving ORA-4030 Errors (Doc ID 233869.1)
Your option 1 seems to be within your control. Breaking down the query will require knowledge of the query/data. A column in the data might work; i.e.
query1: select ... where col1 <= <value>
query2: select ... where col1 > <value>
... or ... you might have to build more code around the problem.
A thought: does the query involve sorting/grouping? Can you live without it? Those operations take up more memory.
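As a rough sketch of option 1 (the table name, column names, and bucket count below are placeholders), ORA_HASH can split the day's extract into fixed buckets so that each sqlplus run fetches only a bounded slice:
-- run once per bucket value 0..3
select *
from   big_table
where  load_date = trunc(sysdate)
and    ora_hash(id_col, 3) = 0;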

Temp tablespace runs out of space and prompts ORA-01652 when a select is executed

I am facing an issue while executing a huge query: the temp tablespace of the Oracle instance runs out of space. The query is at the following link.
https://dl.dropboxusercontent.com/u/96203352/Query/title_block.sql
The size of the temp tablespace is 30 GB, and due to the client's concerns I cannot extend it any further. Therefore, I tried to reduce the sort operations, but it all went in vain. Is there any way to optimize or reduce the sort operations of this query?
The statistics of the PLAN table are at the following link.
https://dl.dropboxusercontent.com/u/96203352/Query/PLAN_TABLE_INFO.txt
As the query and the explain plan are far too large to be posted in this question, I have to share them using links. Sorry for the inconvenience.
One more thing: I cannot remove DISTINCT from the select statement, as there is duplication in the data returned.
Please help.
The query plan says it all at the very top: all the temp space is being used by the DISTINCT operation. The reason that operation requires so much memory is that your query rows are so fat: around 10,000 bytes each!
I'm not sure why your query rows are so fat, but one suggestion I would try would be to change your query to do the DISTINCT operation and then, in a later step, CAST the necessary columns to VARCHAR2(3999). Logically that shouldn't affect it, but I've seen strange behaviour with CAST over the years. I wouldn't trust it enough not to at least try my suggestion.
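A minimal sketch of that restructuring, with placeholder column names: the DISTINCT is done in an inline view, and the CASTs are applied afterwards in the outer query:
select cast(col_a as varchar2(3999)) as col_a,
       cast(col_b as varchar2(3999)) as col_b
from  (select distinct col_a, col_b
       from   some_view);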

(TSQL) INSERT doubling time of the query

I have a quite complex multi-join T-SQL SELECT query that runs for about 8 seconds and returns about 300K records, which is currently acceptable. But I need to reuse the results of that query several times later, so I am inserting the results into a temp table. The table is created in advance with columns that match the output of the SELECT query. But as soon as I do INSERT INTO ... SELECT, execution time more than doubles to over 20 seconds! The execution plan shows that 46% of the query cost goes to "Table Insert" and 38% to Table Spool (Eager Spool).
Any idea why this is happening and how to speed it up?
Thanks!
The "Why" of it hard to say, we'd need a lot more information. (though my SWAG would be that it has to do with logging...)
However, the solution, 9 times out of 10 is to use SELECT INTO to make your temp table.
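A hedged sketch of that approach, with placeholder table and column names; SELECT ... INTO creates the temp table in a single step instead of inserting into a pre-created one:
SELECT t.col1, t.col2, o.col3
INTO   #Results                      -- temp table is created on the fly
FROM   dbo.SomeTable t
JOIN   dbo.OtherTable o ON o.id = t.id;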
I would start by looking at standard tuning items. Is the disk performing? Are there sufficient resources (IOs, RAM, CPU, etc.)? Is there a bottleneck in the RDBMS? It doesn't sound like the issue, but what is happening with locking? Does other code give similar results? Is other code performant?
There are a few things I can suggest based on the information you have provided. If you don't care about dirty reads, you could always change the transaction isolation level (if you're using MS T-SQL):
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
select ...
This may speed things up on your initial query, as locks will not need to be taken on the data you are querying. If you're not using SQL Server, do a Google search for how to do the same thing with the technology you are using.
For the insert portion, you said you are inserting into a temp table. Does your database support adding primary keys or indexes on your temp table? If it does, have a dummy column in there that is an indexed column, as in the sketch below. Also, have you tried using a regular database table for this? Depending on your setup, it is possible that doing so will speed up your insert times.
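For the indexed temp table idea, here is a minimal sketch with placeholder names, using an identity column as a clustered primary key created before the insert:
CREATE TABLE #QueryResults
(
    row_id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,   -- dummy indexed column
    col1   INT,
    col2   NVARCHAR(100)
);

INSERT INTO #QueryResults (col1, col2)
SELECT col1, col2
FROM   dbo.SomeTable;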

Resources