How to kill a process (query) in ClickHouse - clickhouse

Is there any way to kill an idle query in ClickHouse? I have an OPTIMIZE query that will never be completed (as it is running against a ReplicatedMergeTree table) blocking a table that I need to delete.

I usually run
SELECT query_id, query FROM system.processes;
to find running queries. From that list I find the query_id of the query I want to kill and then execute
KILL QUERY WHERE query_id = '<id>';
More info can be found in the documentation on KILL QUERY.
To kill a mutation, there is a similar KILL MUTATION statement.
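The syntax mirrors KILL QUERY. A minimal sketch (the database, table and mutation_id values below are placeholders, not from the question):
-- cancel all unfinished mutations on a table
KILL MUTATION WHERE database = 'default' AND table = 'my_table';
-- or target a single mutation by its id
KILL MUTATION WHERE database = 'default' AND table = 'my_table' AND mutation_id = 'mutation_3.txt';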

Yes, there is a replace_running_query option.
In short, you can add a query_id parameter to your HTTP request, like this:
http://localhost:8123/?query=SELECT * FROM system.numbers LIMIT 100000000&replace_running_query=1&query_id=example
Then do a second HTTP request, with the same query_id:
http://localhost:8123/?query=SELECT 1&replace_running_query=1&query_id=example
The server will cancel the first query and run the second one instead.
The option is disabled by default; you can enable it in your server config (user profile) so you don't have to pass it in the request parameters each time.
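To check that the first query was actually replaced, you can look it up in system.processes; a small sketch reusing the 'example' query_id from above:
-- only the second (replacing) query should still show up for this query_id
SELECT query_id, query, elapsed FROM system.processes WHERE query_id = 'example';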

Related

Update data from MySQL to CH Cluster problem

This problem has me very confused; could anybody help me? Thank you in advance.
I use ReplicatedMergeTree to create the table.
Here is the update SQL:
When I run it the first time, the result is correct, but when I run it again the count is zero.
Did you try to run
SELECT count(*) FROM ods.fb_daily_inventory WHERE uds_load_date >= '2021-05-04'
on the MySQL server, to ensure the data still exists in MySQL?
Mutations are asynchronous. Your delete was sent to ZooKeeper, then your insert ran, then your select, and likely only then did your delete statement execute. If you are updating from MySQL, you should look at binlog replication to replicate only new or changed data. ClickHouse works best with new data, not updates/deletes, as those are asynchronous.
A mutation query returns immediately after the mutation entry is added (in case of replicated tables to ZooKeeper, for non-replicated tables - to the filesystem). The mutation itself executes asynchronously using the system profile settings. To track the progress of mutations you can use the system.mutations table. A mutation that was successfully submitted will continue to execute even if ClickHouse servers are restarted. There is no way to roll back the mutation once it is submitted, but if the mutation is stuck for some reason it can be cancelled with the KILL MUTATION query.
https://clickhouse.com/docs/en/sql-reference/statements/alter/#mutations
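For example, to check whether the DELETE mutation from the question has actually finished before re-running the count, something along these lines (using the ods.fb_daily_inventory table from the question; column names are those of the system.mutations table):
-- unfinished mutations for the table; is_done = 1 means the delete has been applied
SELECT mutation_id, command, parts_to_do, is_done, latest_fail_reason
FROM system.mutations
WHERE database = 'ods' AND table = 'fb_daily_inventory' AND is_done = 0;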

Snowflake queries with CTEs seem not to cache results

When I execute a query containing a CTE (common table expression defined by WITH clause) in Snowflake, the result is not cached.
The question now is: is this how Snowflake works as designed, or do I need to do something to force result caching?
Snowflake does use the result set cache for CTEs. You can confirm that by running this simple one twice. The query history should show that the second run did not use a warehouse, and drilling down into its query profile should show a single-node execution plan: query result reuse.
with
my_cte(L_ORDERKEY) as
(select L_ORDERKEY from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1"."LINEITEM")
select * from MY_CTE limit 10000;
There are certain conditions that make Snowflake not use the result set cache. One of the more common ones is use of a function that can produce different results on multiple runs. For example, if a query includes current_timestamp(), that's going to change each time it runs.
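As a quick illustration (reusing the sample table from above), run each of these twice: the first should be served from the result set cache on the second run, while the second will not be, because current_timestamp() can change between runs:
-- deterministic: eligible for result set reuse on the second run
select L_ORDERKEY from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1"."LINEITEM" limit 10;
-- non-deterministic: current_timestamp() disqualifies it from the result set cache
select L_ORDERKEY, current_timestamp() from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1"."LINEITEM" limit 10;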
Here is a complete list of the criteria that all must be met in order to use the result set cache. Even then, there's a note that meeting all of those criteria does not guarantee use of the result set cache.
https://docs.snowflake.com/en/user-guide/querying-persisted-results.html#retrieval-optimization

Impala query stuck in status Executing

I have a query CREATE TABLE foobar AS SELECT ... that runs successfully in Hue (the returned status is Inserted 986571 row(s)) and takes a couple seconds to complete. However, in Cloudera Manager its status - after more than 10 minutes - still says Executing.
Is it a bug in Cloudera Manager or is this query actually still running?
When Hue executes a query, it leaves the query open so that users can page through results at their own pace. (Of course, this behavior isn't very useful for DDL statements.) That means that Impala still considers the query to be executing, even if it is not actively using CPU cycles (keep in mind it is still holding memory!). Hue will close the query if explicitly told to, or when the page/session is closed, e.g. using the hue command:
> build/env/bin/hue close_queries --help
Note that Impala has a query option to automatically time out queries after a period of time; see query_timeout_s. Hue sets this to 10 minutes by default, but you can override it in the hue.ini settings.
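If you prefer to control this per session rather than through Hue, the same option can be set in impala-shell (assuming your Impala version supports the QUERY_TIMEOUT_S query option; the 600-second value is just an example):
-- cancel this session's idle queries after 600 seconds
SET QUERY_TIMEOUT_S=600;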
One thing to note is that when queries 'time out', they are cancelled but not closed, i.e. the query will remain "in flight" with a CANCELLED status. The reason for this is so that users (or tools) can continue to observe the query metadata (e.g. query profile, status, etc.), which would not be available if the query is fully closed and thus deregistered from the impalad. Unfortunately these cancelled queries may still hold some non-negligible resources, but this will be fixed with IMPALA-1575.
More information: Hive and Impala queries life cycle

Count inserts, deletes and updates in a PowerCenter session

Is there a way in PowerCenter 9.1 to get the number of inserts, deletes and updates after an execution of a session? I can see the data on the log but I would like to see it in a more ordered fashion in a table.
The only way I know requires building the mapping appropriately. You need to have 3 separate instances of the target and use a router to redirect the rows to either TARGET_insert or TARGET_update or TARGET_delete. Workflow Monitor will then show a separate row for the inserted, updated and deleted rows.
There are a few ways:
1. You can use $tgtsuccessrows / $TgtFailedRows and assign them to workflow variables.
2. An Expression transformation can be used with a variable port to keep track of inserts/updates/deletes.
3. You can even query OPB_SESSLOG in a second stream to get the row count inside the same session (see the sketch after this list).
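For option 3, a rough sketch of a repository query; note that the view and column names (REP_SESS_LOG, SUCCESSFUL_ROWS, FAILED_ROWS, ACTUAL_START) are assumptions based on the standard repository views and may differ in your repository version, and the session name is a placeholder. It also returns run totals rather than a per-operation breakdown:
-- assumed repository view with per-run row counts
SELECT subject_area, session_name, successful_rows, failed_rows, actual_start
FROM rep_sess_log
WHERE session_name = 's_my_session'
ORDER BY actual_start DESC;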
Not sure if PowerCenter 9.1 offers a built-in solution to this problem.
You can design your mapping to populate an audit table to track the number of inserts/updates/deletes.
You can download a sample implementation from the Informatica Marketplace block titled "PC Mapping : Custom Audit Table":
https://community.informatica.com/solutions/mapping_custom_audit_table
There are multiple ways. For example, you can create an Assignment task and attach it just after your session; once the session completes its run, the Assignment task will pass the session statistics ($session.status, $session.rowcount, etc.) from the session to workflow variables defined at the workflow level. Then create a worklet containing a mapping, pass the session statistics captured at the workflow level to the worklet, and from the worklet to the mapping. Once the statistics are available at the mapping level, read them (using a SQL or Expression transformation) and write them to the AUDIT table. Attach the combination of Assignment task and worklet after each session, and it will capture the statistics of each session after that session completes its run.

Cancel execution a big query on oracle submitted by SQLDeveloper

Perhaps this is a naive question. When we use SQL Developer to execute a big query against Oracle and then cancel the task, does that actually cancel the execution on the server?
Thanks,
Yes. If the DB server finds the time to handle the protocol message, it cancels the execution of the statement and returns
ORA-01013: user requested cancel of current operation instead of a SQL result set.
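If the cancel does not seem to reach the server, a DBA can also check and terminate the session from the database side; a sketch, where the username filter and the 'sid,serial#' value are placeholders to be replaced with real values from v$session:
-- locate the session running the statement
SELECT sid, serial#, status, sql_id FROM v$session WHERE username = 'APP_USER';
-- terminate it server-side (substitute the real sid and serial#)
ALTER SYSTEM KILL SESSION '123,45678';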
