CockroachDb memory budget constraint - cockroachdb

I have a local installation of CockroachDb on my windows PC and when I run a particular select query, I get the following error message:
7143455 bytes requested, 127403581 currently allocated, 134217728 bytes in budget.
I have read a blog post here but I haven't found a solution. I will appreciate a help on how to increase this budget limit.

CockroachDB versions between 2.0 and 2.0.2 have a bug in memory accounting for JSONB columns, leading to this error. The bug will be fixed in version 2.0.3, due in mid-June.
As a workaround, you may be able to rewrite this query to be more efficient (This might reduce the memory usage enough to work even with the bug. Even if it doesn't, it'll speed up the query when 2.0.3 is available). If I'm reading your query correctly, this is equivalent to
SELECT ID, JsonData,PrimaryIDs,IsActive,IsDeleted FROM "TableName"
WHERE LOWER(JsonData->>'Name') LIKE '%transaction%'
ORDER BY ID OFFSET 0 FETCH NEXT 100 ROWS ONLY
The subquery with ROW_NUMBER() was used with older versions of SQL Server, but since SQL Server 2012, the OFFSET 0 FETCH NEXT N ROWS ONLY version has been available and is more efficient.
The syntax OFFSET 0 FETCH NEXT N ROWS ONLY syntax comes from the SQL standard so it should work with most databases. CockroachDB also supports the LIMIT keyword which is used in MySQL and PostgreSQL for the same purpose.

Related

Impala query with LIMIT 0

Being production support team member, I investigate issues with various Impala queries and while researching on an issue , I see a team submits an Impala query with LIMIT 0 which obviously do not return any rows and then again without LIMIT 0 which gives them result. I guess they submit these queries from IBM Datastage. Before I question them why they do so.. wanted to check what could be a reason for someone to run with LIMIT 0. Is it just to check syntax or connection with Impala? I see a similar question discussed here in context of SQL but thought to ask anyway in Impala perspective. Thanks Neel
I think you are partially correct.
Pls note, limit will process all the data and then apply limit clause.
LIMIT 0 is mostly used to -
to check if syntax of SQL is correct. But impala do fetch all the records before applying limit. so SQL is completely validated. Some system may use this to check out the sql they generated automatically before actually applying it in server.
limit fetching lots of rows from a huge table or a data set every time you run a SQL.
sometime you want to create an empty table using structure of some other tables but do not want to copy store format, configurations etc.
dont want to burden the hue/any interface that is interacting with impala. All data will be processed but will not be returned.
performance test - this will somewhat give you an idea of run time of SQL. i used the word somewhat because its not actual time to complete but estimated time to complete a SQL.

How to know how much memory is used by a query in clickhouse

I am trying to test performance of clickhouse to get sense how much memory i need for a dedicated server.
Currently I'm using PostgreSQL in production and now I want to migrate to clickhouse, so I inserted some of production data into a clickhouse server locally and executing the most used queries on production on clickhouse.
But I do not know how much memory does clickhouse use to execute these queries.
After some research I found the answer hope it help others.
clickhouse has table called 'system.query_log' that is used for storing statistics of each executed query like duration or memory usage
system.query_log
also there is table 'system.processes' that has information about current queries
system.processes
I'm using the following query to inspect recent queries. It returns memory use, query duration, number of read rows, used functions, and more:
SELECT * FROM system.query_log
WHERE type != 'QueryStart' AND NOT has(databases, 'system')
ORDER BY event_time_microseconds DESC
LIMIT 20;

Does the number of columns in a Vertica table impact query performance?

We are working with a Vertica 8.1 table containing 500 columns and 100 000 rows.
The following query will take around 1.5 seconds to execute, even when using the vsql client straight on one of the Vertica cluster nodes (to eliminate any network latency issue) :
SELECT COUNT(*) FROM MY_TABLE WHERE COL_132 IS NOT NULL and COL_26 = 'anotherValue'
But when checking the query_requests table, the request_duration_ms is only 98 ms, and the resource_acquisitions table doesn't indicate any delay in resource asquisition. I can't understand where the rest of the time is spent.
If I then export to a new table only the columns used by the query, and run the query on this new, smaller, table, I get a blazing fast response, even though the query_requests table still tells me the request_duration_ms is around 98 ms.
So it seems that the number of columns in the table impacts the execution time of queries, even if most of these columns are not referenced. Am I wrong ? If so, why is it so ?
Thanks by advance
It sounds like your query is running against the (default) superprojection that includes all tables. Even though Vertica is a columnar database (with associated compression and encoding), your query is probably still touching more data than it needs to.
You can create projections to optimize your queries. A projection contains a subset of columns; if one is available that has all the columns your query needs, then the query uses that instead of the superprojection. (It's a little more complicated than that, because physical location is also a factor, but that's the basic idea.) You can use the Database Designer to create some initial projections based on your schema and sample queries, and iteratively improve it over time.
I was running Vertica 8.1.0-1, it seems the issue was a Vertica bug in the Vertica planning phase causing a performance degradation. It was solved in versions >= 8.1.1 :
[https://my.vertica.com/docs/ReleaseNotes/8.1.x/Vertica_8.1.x_Release_Notes.htm]
VER-53602 - Optimizer - This fix improves complex query performance during the query planning phase.

sqlplus out of process memory when trying to allocate X amount of memory

We are using sqlplus to offload data from oracle using sqlplus on a large table with 500+ columns and around 15 million records per day.
The query fails as oracle is not able to allocate the required memory for the result set.
Fine tuning oracle DB server to increase memory allocation is ruled out since it is used across teams and is critical.
This is a simple select with a filter on a column.
What options do I have to make it work?
1) to break my query down into multiple chunks and run it in nightly batch mode.
If so , how can a select query be broken down
2) Are there any optimization techniques I can use while using sqlplus for a select query on a large table?
3) Any java/ojdbc based solution which can break a select into chunks and reduce the load on db server?
Any pointers are highly appreciated.
Here is the errror message thrown:
ORA-04030: out of process memory when trying to allocate 169040 bytes (pga heap,kgh stack)
ORA-04030: out of process memory when trying to allocate 16328 bytes (koh-kghu sessi,pl/sql vc2)
The ORA-4030 indicates the process needs more memory(UGA in SGA/PGA depending upon the server architecture) to execute job.
This could be caused by shortage of RAM(Dedicated server mode environment), a small PGA size, or may be operating system setting to restrict allocation of enough RAM.
This MOS Note describes how to diagnose and resolve ORA-04030 error.
Diagnosing and Resolving ORA-4030 Errors (Doc ID 233869.1)
Your option 1 seems in your control. Breaking down the query will require knowledge of the query/data. Either a column in the data might work; i.e.
query1: select ... where col1 <= <value>
query2: select ... where col1 > <value>
... or ... you might have to build more code around the problem.
Thought: does the query involving sorting/grouping? Can you live without it? Those operations take up more memory.

PSQL = fast, remote sql = v.slow

Okay, I appreciate that the question is a tad vague, but after a day of googling, I'm getting nowhere, any help will be appreciated, and I'm willing to try anything.
The issue is that we have a PostgreSQL db, that has arount 10-15 million rows in a particular table.
We're doing a select for all the columns, based on a DateTime field in the table. No joins, just a standard select with a where clause (time >= x AND time <= y). There is an index on the field as well...
When I perform the sql using psql on the local machine, it runs in around 15-20 seconds, and brings back .5 million rows, one of which is a text field holding a large amount of data per row (a program stack trace). When we use the same sql and run it through Npgsql, or pgadmin III on windows, it takes around 2minutes.
This is leading me to think that it's a network issue. I've checked on the machine when the query is running and it's not using a huge amount of memory or CPU, and the network speed is negligible.
I've gone through the recommendations on the Postgres Site for the memory settings as well. including updating shmmax and shmall.
It's Ubuntu 10.04, PSQL 8.4, 4GB RAM, 2.8GHz Quad Xeon (virtual but dedicated resources). the machine has it's windows counterpart (2008 R2, SS2008) on there as well, but turned off. The Query returns in around 10-15 seconds using SS with the same schema and data, I know this wouldn't be a direct comparison, but wanted to show that it wasn't a disk performance issue.
So the question is... any suggestions? Are there any network settings I should be changing? Anything that I've missed? I can't give too much information about the database, but here is a EXPLAIN ANALYZE that's obfuscated...
Index Scan using "IDX_column1" on "table1" (cost=0.00..45416.20 rows=475130 width=148) (actual time=0.025..170.812 rows=482266 loops=1)
Index Cond: (("column1" >= '2011-03-14 00:00:00'::timestamp without time zone) AND ("column1" <= '2011-03-14 23:59:59'::timestamp without time zone))
Total runtime: 196.898 ms
Try setting cursor_tuple_fraction to 1 in psql and see if it changes the results. If so, then it is likely that the optimiser is picking a better plan based on only getting the top 10% or so of results compared to getting the whole lot. Istr psql uses a cursor to fetch results piece by piece rather than the "firehose" executequery method.
If this is the case, it doesn't point directly to a solution, but you will need to tweak your planner settings, and at least if you can reproduce the behaviour in psql than it may be easier to see the differences and test changes.

Resources