Consider the following query with potentially infinite execution time. It makes no sense that it was ever issued to the ClickHouse server, but it has already been launched and is still running:
SELECT Count("SN".*) FROM (SELECT sleepEachRow(3) FROM system.numbers) "SN"
Okay, let's find the associated query_id (or use the one we already have). For instance, query_id = 'd02f4bdb-8928-4347-8641-4da4b9c0f486'. Let's kill it with the following query:
KILL QUERY WHERE query_id = 'd02f4bdb-8928-4347-8641-4da4b9c0f486'
At first glance, the result of the KILL looks fine:
┌─kill_status─┬─query_id─────────────────────────────┬─user────┬─query────────────────────────────────────────────────────────────────────────┐
│ waiting │ d02f4bdb-8928-4347-8641-4da4b9c0f486 │ default │ SELECT Count("SN".*) FROM (SELECT sleepEachRow(3) FROM system.numbers) "SN"; │
└─────────────┴──────────────────────────────────────┴─────────┴──────────────────────────────────────────────────────────────────────────────┘
Okay, let's wait several seconds and confirm that the original query has been terminated, using the following system-table query:
SELECT "query_id", "query", "is_cancelled" FROM system.processes WHERE query_id = 'd02f4bdb-8928-4347-8641-4da4b9c0f486';
Unfortunately, the original query is still running in some sense: it has moved into the "is_cancelled" state but still hangs:
┌─query_id─────────────────────────────┬─query────────────────────────────────────────────────────────────────────────┬─is_cancelled─┐
│ d02f4bdb-8928-4347-8641-4da4b9c0f486 │ SELECT Count("SN".*) FROM (SELECT sleepEachRow(3) FROM system.numbers) "SN"; │ 1 │
└──────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────┴──────────────┘
Even after waiting for more than an hour, the original query is still stuck in the "is_cancelled" state, and subsequent KILL queries with the same query_id do nothing.
Most likely restarting the server would solve the problem, but I don't want to do that. How can I get rid of a stuck query without restarting the server?
ClickHouse queries can't be killed while they are sleeping.
If you are using a recent ClickHouse release (21.12+), the KILL flag is checked after each block is processed (on older releases it might never be checked). Since the default block size is 65536 rows, this query will sleep for 65536 * 3 seconds ≈ 54 hours before checking anything.
In future releases of ClickHouse it will be impossible to sleep for more than 3 seconds (which is currently the limit for sleep, but not for sleepEachRow). In the meantime you can either wait or restart the server.
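As a preventive measure (this won't unstick the already-cancelled query), you can cap execution time so the server aborts runaway queries itself. A minimal sketch using the standard max_execution_time setting; note that, like the KILL flag, the limit can only be enforced at points where the server actually checks it:

```sql
-- Cap a single query at 60 seconds; ClickHouse raises a
-- TIMEOUT_EXCEEDED error once the limit is detected as exceeded.
SELECT count(*)
FROM (SELECT sleepEachRow(3) FROM system.numbers)
SETTINGS max_execution_time = 60;

-- Or make it a session-wide default:
SET max_execution_time = 60;
```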
Related
I have a maybe slightly weird use case. I have to perform expensive counts on data and build snapshots from them (something like "number of users who can access this entity"). I store these numbers per entity with a timestamp (so basically "at this point in time, x users could access this entity").
Now it might be that the number doesn't change between snapshots, because no access lists have changed and/or no users have been added. This might actually even be the default case. So of course I would like to avoid having tens of thousands of identical rows ("5 users at 10pm", "5 users at 11pm", "5 users at 12pm", and so on). Therefore a ReplacingMergeTree comes to mind, with ORDER BY entity, count.
There is a problem, though. If I understand the documentation correctly, the ReplacingMergeTree would always keep the latest row, so the timestamp would change. I would instead like to keep the oldest timestamp, so I know the first time this count was calculated. That in turn lets me fill the gaps (if the count is 3h old and there is no newer count in between, the same count can obviously be assumed true for 2h ago and 1h ago).
Is there any way to achieve this?
The only workaround that comes to mind is using a UInt as the version, starting at MaxUInt and decrementing. But this feels slightly weird.
The best way I've found so far is to add a version column that inverts the timestamp: UInt32 DEFAULT 4294967295 - toUnixTimestamp(create_time)
CREATE TABLE test
(
`id` UInt8,
`value` UInt8,
`version` UInt32 DEFAULT 4294967295 - toUnixTimestamp(create_time),
`create_time` DateTime DEFAULT now()
)
ENGINE = ReplacingMergeTree(version)
ORDER BY id;
INSERT INTO test (id, value) VALUES (1,1);
INSERT INTO test (id, value) VALUES (1,2);
SELECT * FROM test;
┌─id─┬─value─┬────version─┬─────────create_time─┐
│ 1 │ 1 │ 2670403264 │ 2021-06-25 03:47:11 │
└────┴───────┴────────────┴─────────────────────┘
┌─id─┬─value─┬────version─┬─────────create_time─┐
│ 1 │ 2 │ 2670403251 │ 2021-06-25 03:47:24 │
└────┴───────┴────────────┴─────────────────────┘
SELECT * FROM test FINAL;
┌─id─┬─value─┬────version─┬─────────create_time─┐
│ 1 │ 1 │ 2670403264 │ 2021-06-25 03:47:11 │
└────┴───────┴────────────┴─────────────────────┘
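Note that deduplication in ReplacingMergeTree happens at an unspecified time during background merges, so a plain SELECT (without FINAL) may return both rows for a while. A sketch of forcing the merge on the test table above:

```sql
-- Trigger an unscheduled merge so deduplication is applied on disk,
-- not just at read time via FINAL.
OPTIMIZE TABLE test FINAL;

-- A plain SELECT should now return only the surviving row
-- (the one with the highest version, i.e. the oldest create_time).
SELECT * FROM test;
```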
We are experiencing sporadically long query executions in our application. The database is Oracle 12.1 on RDS. I can see in AppDynamics that a query executed for 13s, but when I run it myself in Oracle SQL Developer it never takes longer than 0.1s. I can't post the queries here: there are three of them that sporadically take longer than 10s, and I can't reproduce the slowness for any of them in SQL Developer.
We've started logging execution plans for long-running queries using /*+ gather_plan_statistics */, and the plan is the same as for a 0.1s execution, except that it lacks the note "1 SQL Plan Directive used for this statement".
I'm looking for any ideas that could help to identify the root cause of this behavior.
One possibility is that you have a cached execution plan that works fine for most parameter values, or combinations of parameter values, but fails badly for certain ones. You can try adding a non-filtering predicate such as 1 = 1 to your WHERE clause. I've read (but haven't tested) that this can be used to force a hard parse, though you may need to change the literal (e.g. 1 = 1, 2 = 2, 3 = 3, etc.) for each execution of your query.
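As a sketch of that idea (table and bind variable are hypothetical; the point is that changing the dummy literal changes the SQL text, so the cursor cannot be shared and a fresh hard parse occurs):

```sql
-- First run: hard-parsed, since this exact text is new to the shared pool.
SELECT *
FROM   orders                -- hypothetical table
WHERE  1 = 1                 -- non-filtering dummy predicate
AND    customer_id = :cust;  -- hypothetical bind variable

-- Later run: bump the literal so the previous (possibly bad)
-- cached plan cannot be reused.
SELECT *
FROM   orders
WHERE  2 = 2
AND    customer_id = :cust;
```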
I'm running queries against a Vertica table with close to 500 columns and only 100 000 rows.
A simple query (like select avg(col1) from mytable) takes 10 seconds, as reported by the Vertica vsql client with the \timing command.
But when checking column query_requests.request_duration_ms for this query, there's no mention of the 10 seconds, it reports less than 100 milliseconds.
The query_requests.start_timestamp column indicates that the beginning of the processing started 10 seconds after I actually executed the command.
The resource_acquisitions table show no delay in resource acquisition, but its queue_entry_timestamp column also shows the queue entry occurred 10 seconds after I actually executed the command.
The same query run on the same data but on a table with only one column returns immediately. And since I'm running the queries directly on a Vertica node, I'm excluding any network latency issue.
It feels like Vertica is doing something before executing the query, which takes most of the time and is related to the number of columns in the table. Any idea what it could be, and what I could try in order to fix it?
I'm using Vertica 8, in a test environment with no load.
I was running Vertica 8.1.0-1; it seems the issue was caused by a Vertica bug in the query planning phase, causing a performance degradation. It was solved in versions >= 8.1.1:
https://my.vertica.com/docs/ReleaseNotes/8.1./Vertica_8.1.x_Release_Notes.htm
VER-53602 - Optimizer - This fix improves complex query performance during the query planning phase.
I was unable to drop a Redshift db because of a connection:
Couldn't drop my_db : #<ActiveRecord::StatementInvalid: PG::ObjectInUse: ERROR: database "my_db" is being accessed by other users
I connected (via psql) to another db in the same cluster, and checked the pid of my pending session:
my_other_db=# select procpid from pg_stat_activity where datname='my_db';
procpid
---------
20457
(1 row)
So I attempted a call to PG_TERMINATE_BACKEND:
my_other_db=# select pg_terminate_backend(20457);
pg_terminate_backend
----------------------
1
(1 row)
But when I checked my pg_stat_activity, my blocking session was still here:
my_other_db=# select procpid from pg_stat_activity where datname='my_db';
procpid
---------
20457
(1 row)
And I was still unable to drop my db.
Any idea? (I had to restart the cluster to get rid of it, which is not a satisfying solution.)
(Of course, I tried with another session, which I managed to terminate)
When you cancel a query or terminate a session, Redshift has to return the database to a safe state by reverting any changes already made. This can take a varying amount of time, depending on what the session was doing and on whether other in-flight queries affect the same table(s).
You can do one of the following
select pg_terminate_backend([pid])
cancel [pid]
Kill the query via the Redshift console
On rare occasions, ghost pids will continue to run. In these instances, you can reboot the cluster.
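Putting the lookup and the kill together, a minimal sketch against the Redshift system tables (using the pid from the question; stv_sessions serves as the Redshift equivalent of pg_stat_activity for this purpose):

```sql
-- 1. List sessions connected to the database you want to drop.
SELECT process, user_name, db_name
FROM   stv_sessions
WHERE  db_name = 'my_db';

-- 2. Try cancelling any running query on that pid first (gentler).
CANCEL 20457;

-- 3. If the session is still there, terminate the backend.
SELECT pg_terminate_backend(20457);
```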
What am I missing here? I am trying to test identifying long running queries.
I have a test table with about 400 million rows called mytest.
I ran select * from mytest in sqlplus
In another window, I ran the script below to see my long running query
select s.username, s.sid, s.serial#, s.schemaname,
s.program, s.osuser, s.status, s.last_call_et
from v$session s
where last_call_et >= 1 -- this is just for testing
My long-running query does not show up in the results of the query above. If I change the criterion to >= 0, I see my query with status INACTIVE and last_call_et of 0, despite the fact that the query is still running. What can I do to see long-running queries like the select * from ... above so that I can kill them?
Thanks
First, you need to understand what a query like select * from mytest is really doing under the covers because that's generally not going to be a long-running query. Oracle doesn't ever need to materialize that result set and isn't going to read all the data as the result of a single call. Instead, what goes on is a series of calls each of which cause Oracle to do a little bit of work. The conversation goes something like this.
Client: Hey Oracle, run this query for me: select * from mytest
Oracle: Sure thing (last_call_et resets to 0 to reflect that a new call started). I've generated a query plan and opened a cursor; here's a handle. (Note that no work has been done yet to actually execute the query.)
Client: Cool, thanks. Using this cursor handle, fetch me the next 50 rows. (The fetch size is a client-side setting.)
Oracle: Will do (last_call_et resets to 0 to reflect that a new call started). I started full-scanning the table, read a couple of blocks, and got 50 rows. Here you go.
Client: OK, I've processed those. Using this cursor handle, fetch the next 50 rows.
Repeat until all the data is fetched.
At no point in this process is Oracle ever really asked to do more than read a handful of blocks to get the next 50 rows (or whatever fetch size the client requests). At any point the client could simply stop requesting the next batch of data, so Oracle never needs to do anything long-running. Oracle doesn't track the application think time between requests for more data; it has no idea whether the client is a GUI in a tight loop fetching data, or is displaying a result to a human and waiting for them to hit the "next" button. The vast majority of the time the session will be INACTIVE, because it is mostly waiting for the client to request the next batch of data (which the client generally won't do until it has formatted the last batch for display and done the work to display it).
When most people talk about a long-running query, they're talking about a query that Oracle is actively processing for a relatively long time with no waits on a client to fetch the data.
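One way to find statements that Oracle is actually spending time on (as opposed to sessions sitting idle between fetches) is to look at cumulative statistics in v$sql. A sketch, assuming you have access to the v$ views:

```sql
-- Top 10 statements by cumulative elapsed time (v$sql reports microseconds).
SELECT *
FROM (
  SELECT sql_id,
         executions,
         ROUND(elapsed_time / 1e6, 1) AS total_elapsed_s,
         SUBSTR(sql_text, 1, 60)      AS sql_text_head
  FROM   v$sql
  ORDER  BY elapsed_time DESC
)
WHERE rownum <= 10;
```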
You can use the script below to find long-running operations (note that v$session_longops only tracks operations that Oracle expects to run for more than 6 seconds):
select *
from (
  select opname,
         start_time,
         target,
         sofar,
         totalwork,
         units,
         elapsed_seconds,
         message
  from v$session_longops
  order by start_time desc
)
where rownum <= 1;