I use ReplicatedMergeTree and Distributed tables in ClickHouse to build an HA cluster.
My intention is that the cluster stores two replicas of every shard, so it keeps working when one of the nodes has problems.
This is the relevant part of my configuration (config.xml):
...
<logs>
    <shard>
        <weight>1</weight>
        <internal_replication>true</internal_replication>
        <replica>
            <host>node1</host>
            <port>9000</port>
        </replica>
        <replica>
            <host>node2</host>
            <port>9000</port>
        </replica>
    </shard>
    <shard>
        <weight>1</weight>
        <internal_replication>true</internal_replication>
        <replica>
            <host>node2</host>
            <port>9000</port>
        </replica>
        <replica>
            <host>node3</host>
            <port>9000</port>
        </replica>
    </shard>
    <shard>
        <weight>1</weight>
        <internal_replication>true</internal_replication>
        <replica>
            <host>node3</host>
            <port>9000</port>
        </replica>
        <replica>
            <host>node1</host>
            <port>9000</port>
        </replica>
    </shard>
</logs>
...
<!-- each node is different -->
<macros>
    <layer>01</layer>
    <shard>01</shard>
    <replica>node1</replica>
</macros>
<!-- below are the node2 and node3 configurations:
<macros>
    <layer>02</layer>
    <shard>02</shard>
    <replica>node2</replica>
</macros>
<macros>
    <layer>03</layer>
    <shard>03</shard>
    <replica>node3</replica>
</macros>
-->
...
Then I create the table on each node using the clickhouse-client --host command:
create table if not exists game(uid Int32,kid Int32,level Int8,datetime Date)
ENGINE = ReplicatedMergeTree('/clickhouse/data/{shard}/game','{replica}')
PARTITION BY toYYYYMMDD(datetime)
ORDER BY (uid,datetime);
After creating the ReplicatedMergeTree table, I then create the Distributed table on each node (just so every node has this table; in fact it only needs to be created on one node):
CREATE TABLE game_all AS game
ENGINE = Distributed(logs, default, game, rand());
Everything looks fine so far, and inserting data into game_all also seems to work. But when I query the game and game_all tables, something is clearly wrong.
I insert one record into game_all, but querying game_all returns 3 rows when it should be 1, and when I query the local game table on each node, only one of them has the record. Finally I checked the disk usage on each node, and it seems this table has no replicas at all: only one node uses more than 4 KB of disk, the others use just 4 KB.
Related
I am trying to deploy ClickHouse on k8s to use as a Graphite backend. I am new to ClickHouse and have gone through links describing the same issue, but none of them helped. I am trying to create two ClickHouse servers (planning to add one more in the future); each ClickHouse server is deployed as a k8s StatefulSet.
clickhouse1-0.clickhouse1-hs.ns-vaggarwal.svc.cluster.local :) select * from graphite
SELECT *
FROM graphite
Query id: 7ba316b8-bc88-4ab2-83d2-269a990f93b7
Received exception from server (version 20.12.4):
Code: 306. DB::Exception: Received from localhost:9000. DB::Exception: Stack size too large. Stack address: 0x7fd2d75fe000, frame address: 0x7fd2d79fd3f0, stack size: 4197392, maximum stack size: 8388608.
0 rows in set. Elapsed: 0.044 sec.
clickhouse1-0.clickhouse1-hs.ns-vaggarwal.svc.cluster.local :) exit
Here is the init.sql I am using:
CREATE TABLE IF NOT EXISTS default.graphite_index
(
Date Date,
Level UInt32,
Path String,
Version UInt32,
updated DateTime DEFAULT now(),
status Enum8('SIMPLE' = 0, 'BAN' = 1, 'APPROVED' = 2, 'HIDDEN' = 3, 'AUTO_HIDDEN' = 4)
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/single/default.graphite_index', '{replica}', updated)
PARTITION BY toYYYYMM(Date)
ORDER BY (Path)
SETTINGS index_granularity = 1024;
CREATE TABLE IF NOT EXISTS default.graphite (
Path String CODEC(ZSTD(2)),
Value Float64 CODEC(Delta, ZSTD(2)),
Time UInt32 CODEC(Delta, ZSTD(2)),
Date Date CODEC(Delta, ZSTD(2)),
Timestamp UInt32 CODEC(Delta, ZSTD(2))
) ENGINE = Distributed('graphite', '', graphite, xxHash64(Path));
CREATE DATABASE IF NOT EXISTS shard_01;
CREATE TABLE IF NOT EXISTS shard_01.graphite
AS default.graphite
ENGINE = ReplicatedGraphiteMergeTree('/clickhouse/tables/01/graphite', 'clickhouse1-0', 'graphite_rollup')
PARTITION BY toYYYYMM(Date)
ORDER BY (Path, Time);
CREATE DATABASE IF NOT EXISTS shard_02;
CREATE TABLE IF NOT EXISTS shard_02.graphite
AS default.graphite
ENGINE = ReplicatedGraphiteMergeTree('/clickhouse/tables/02/graphite', 'clickhouse1-0', 'graphite_rollup')
PARTITION BY toYYYYMM(Date)
ORDER BY (Path, Time);
Relevant part from config.xml
<remote_servers>
    <graphite>
        <!-- Shard 01 -->
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>clickhouse1-0</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>clickhouse2-0</host>
                <port>9000</port>
            </replica>
        </shard>
        <!-- Shard 02 -->
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>clickhouse1-0</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>clickhouse2-0</host>
                <port>9000</port>
            </replica>
        </shard>
    </graphite>
</remote_servers>
Metric paths have been indexed:
clickhouse1-0.clickhouse1-hs.ns-vaggarwal.svc.cluster.local :) show tables
SHOW TABLES
Query id: 8cc91a1f-0be1-4ae8-98ed-411b06624968
┌─name───────────┐
│ graphite │
│ graphite_index │
└────────────────┘
2 rows in set. Elapsed: 0.002 sec.
clickhouse1-0.clickhouse1-hs.ns-vaggarwal.svc.cluster.local :) select * from graphite_index LIMIT 5;
SELECT *
FROM graphite_index
LIMIT 5
Query id: ced975d1-cd06-49b7-a18f-96e0e62504fd
┌───────Date─┬─Level─┬─Path────────────────────────────────────────────────────────────┬────Version─┬─────────────updated─┬─status─┐
│ 1970-02-12 │ 30005 │ active.pickle.carbon-clickhouse1-6489b8f7c8-mbzpr.agents.carbon │ 1612267498 │ 2021-02-02 12:04:58 │ SIMPLE │
│ 1970-02-12 │ 30005 │ active.pickle.carbon-clickhouse1-6489b8f7c8-ts6kk.agents.carbon │ 1612267558 │ 2021-02-02 12:05:58 │ SIMPLE │
│ 1970-02-12 │ 30005 │ active.pickle.carbon-clickhouse2-7898cd697d-jndms.agents.carbon │ 1612271795 │ 2021-02-02 13:16:35 │ SIMPLE │
│ 1970-02-12 │ 30005 │ active.pickle.carbon-clickhouse2-7898cd697d-pbns7.agents.carbon │ 1612271786 │ 2021-02-02 13:16:26 │ SIMPLE │
│ 1970-02-12 │ 30005 │ active.tcp.carbon-clickhouse1-6489b8f7c8-mbzpr.agents.carbon │ 1612267498 │ 2021-02-02 12:04:58 │ SIMPLE │
└────────────┴───────┴─────────────────────────────────────────────────────────────────┴────────────┴─────────────────────┴────────┘
5 rows in set. Elapsed: 0.002 sec. Processed 1.12 thousand rows, 110.75 KB (613.49 thousand rows/s., 60.45 MB/s.)
You made an infinite loop:
CREATE TABLE IF NOT EXISTS default.graphite (
) ENGINE = Distributed('graphite', '', graphite, xxHash64(Path));
The Distributed table points to itself: with an empty database argument it resolves to the current (default) database on each remote server, where graphite is again this same Distributed table, so the query recurses until the stack overflows.
It must be Distributed('graphite', 'SOMEDATABASE', ...),
or each <replica> in remote_servers should specify a <default_database>.
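A minimal sketch of such a replica entry, reusing the shard_01 database created by the init.sql above (each shard's replicas would point at their own shard database):
<replica>
    <host>clickhouse1-0</host>
    <port>9000</port>
    <default_database>shard_01</default_database>
</replica>
With this in place the Distributed definition can keep the empty database argument, because the remote query now lands in shard_01 (or shard_02) instead of back in default.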
!!!!!! NEVER EVER use circular-replication
!!!!!! NEVER EVER use circular-replication
!!!!!! NEVER EVER use circular-replication
!!!!!! NEVER EVER use circular-replication
This article MUST be deleted: https://altinity.com/blog/2018/5/10/circular-replication-cluster-topology-in-clickhouse
That document has done so much harm to ClickHouse.
I have an entity in Dynamics CRM with attributes that look like this (just a small snippet):
Date (datetime type) UserID (GUID type) Operation Entity
2020-09-24 00:47:08.000 16742A71-ED5F-E611-80EA-005056B53B31 Delete Account
2020-09-24 00:47:08.000 16742A71-ED5F-E611-80EA-005056B53B31 Create Opportunity
2020-10-07 05:37:54.000 16742A71-ED5F-E611-80EA-005056B53B31 Update Contact
2020-10-07 02:34:45.000 16742A71-ED5F-E611-80EA-005056B53B31 Update Contact
2020-10-07 09:39:02.000 16742A71-ED5F-E611-80EA-005056B53B31 Update Contact
What I'm looking to do is get the unique combinations of Date (ignoring the time portion), UserID, Operation and Entity. So based on the data snippet above, I want a FetchXML query to return 3 records, since the last 3 records were created on the same date by the same user doing the same operation against the same entity, and the top 2 were unique.
Date (datetime type) UserID (GUID type) Operation Entity
2020-09-24 16742A71-ED5F-E611-80EA-005056B53B31 Delete Account
2020-09-24 16742A71-ED5F-E611-80EA-005056B53B31 Create Opportunity
2020-10-07 16742A71-ED5F-E611-80EA-005056B53B31 Update Contact
This would be the equivalent of GROUP BY cast(createdon as date). Is this possible to achieve in FetchXML at all?
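A rough SQL sketch of what I mean (the table and column names here are only illustrative):
SELECT CAST(createdon AS date), UserID, Operation, Entity
FROM audit_records
GROUP BY CAST(createdon AS date), UserID, Operation, Entity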
TIA,
-Tony.
Yes, this is possible using dategrouping.
The example below groups by year, but this article explains the other options, including day and month.
<fetch distinct='false' mapping='logical' aggregate='true'>
    <entity name='opportunity'>
        <attribute name='opportunityid' alias='opportunity_count' aggregate='count'/>
        <attribute name='estimatedvalue' alias='estimatedvalue_sum' aggregate='sum'/>
        <attribute name='estimatedvalue' alias='estimatedvalue_avg' aggregate='avg'/>
        <attribute name='actualclosedate' groupby='true' dategrouping='year' alias='year' />
        <filter type='and'>
            <condition attribute='statecode' operator='eq' value='Won' />
        </filter>
    </entity>
</fetch>
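Applied to the scenario in the question, a sketch could look like the one below. Grouping by day, month and year of the same date attribute yields one group per calendar date; the entity and attribute names (audit, createdon, userid, operation, objecttypecode, auditid) are assumptions here, so substitute the actual logical names of your entity.
<fetch distinct='false' mapping='logical' aggregate='true'>
    <entity name='audit'>
        <attribute name='createdon' groupby='true' dategrouping='day' alias='day' />
        <attribute name='createdon' groupby='true' dategrouping='month' alias='month' />
        <attribute name='createdon' groupby='true' dategrouping='year' alias='year' />
        <attribute name='userid' groupby='true' alias='user' />
        <attribute name='operation' groupby='true' alias='operation' />
        <attribute name='objecttypecode' groupby='true' alias='entity' />
        <attribute name='auditid' alias='record_count' aggregate='count' />
    </entity>
</fetch>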
I have created a ReplicatedMergeTree table as below:
CREATE TABLE probe.a on cluster dwh (
instime UInt64,
psn UInt64
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/probe/a', '{replica}') PARTITION BY instime ORDER BY (psn);
Then I created a distributed table as:
CREATE TABLE probe.a_distributed on cluster dwh (
instime UInt64,
psn UInt64
) ENGINE = Distributed(dwh, probe, a, rand());
I have then added macros on each server:
Server 1
<yandex>
    <macros replace="true">
        <shard>1</shard>
        <replica>server1.com</replica>
    </macros>
</yandex>
Server 2
<yandex>
    <macros replace="true">
        <shard>2</shard>
        <replica>server2.com</replica>
    </macros>
</yandex>
Remote Servers:
<dwh>
    <!-- shard 01 -->
    <shard>
        <replica>
            <host>server1.com</host>
            <port>9000</port>
            <user>default</user>
            <password>test12pwd</password>
        </replica>
    </shard>
    <!-- shard 02 -->
    <shard>
        <replica>
            <host>server2.com</host>
            <port>9000</port>
            <user>default</user>
            <password>test12pwd</password>
        </replica>
    </shard>
</dwh>
I have two issues when dropping a partition:
1. When I drop a partition using the distributed table
ALTER TABLE probe.a on cluster dwh DROP PARTITION '2020-03-13';
I get the error:
DB::Exception: Table 'a' is replicated, but shard #4 isn't replicated according to its cluster definition. Possibly <internal_replication>true</internal_replication> is forgotten in the cluster config. (version 19.16.14.65)
2. When I dropped the partition on each node individually, the distributed table still shows half of the rows, but when I check the local tables there are no rows.
How can this issue with the distributed table be resolved for data sharded without replication?
You use Replicated tables, so you MUST mark your shards with <internal_replication>true</internal_replication>. With that flag set, the Distributed table writes each inserted block to only one replica of a shard and lets the Replicated* engine propagate it, instead of the Distributed engine trying to write to every replica itself.
<dwh>
    <!-- shard 01 -->
    <shard>
        <internal_replication>true</internal_replication>
        <replica>
            <host>server1.com</host>
            <port>9000</port>
            <user>default</user>
            <password>test12pwd</password>
        </replica>
    </shard>
    <!-- shard 02 -->
    <shard>
        <internal_replication>true</internal_replication>
        <replica>
            <host>server2.com</host>
            <port>9000</port>
            <user>default</user>
            <password>test12pwd</password>
        </replica>
    </shard>
</dwh>
I have the following JSON on a topic that the JDBC connector publishes to:
{"APP_SETTING_ID":9,"USER_ID":10,"APP_SETTING_NAME":"my_name","SETTING_KEY":"my_setting_key"}
Here's my connector file
name=data.app_setting
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
poll.interval.ms=500
tasks.max=4
mode=timestamp
query=SELECT APP_SETTING_ID, APP_SETTING_NAME, SETTING_KEY FROM MY_TABLE with (nolock)
timestamp.column.name=LAST_MOD_DATE
topic.prefix=data.app_setting
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
I now want to add a key to this message by multiplying the two integer fields APP_SETTING_ID and USER_ID, so the key for this message becomes 9*10 = 90.
Is this transformation possible through Connect, and if so, could someone please shed some light on it?
I would try seeing how far you can get with
query=SELECT APP_SETTING_ID, APP_SETTING_NAME, SETTING_KEY, (APP_SETTING_ID*USER_ID) as _key FROM MY_TABLE with (nolock)
Then add an ExtractKey transform
transforms=AddKeys,ExtractKey
# this makes a map
transforms.AddKeys.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.AddKeys.fields=_key
# this gets one field from the map
transforms.ExtractKey.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.ExtractKey.field=_key
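If you don't want the helper _key column to remain in the message value as well, a ReplaceField transform can drop it afterwards. A sketch, assuming the _key alias used above (newer Connect versions name the blacklist option exclude):
transforms=AddKeys,ExtractKey,DropKeyField
# AddKeys and ExtractKey configured as above
transforms.DropKeyField.type=org.apache.kafka.connect.transforms.ReplaceField$Value
transforms.DropKeyField.blacklist=_key
With the example row above, the record key then becomes 90 and the value keeps only the original columns.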
I can create and drop tables and run queries normally in Presto, but when I use INSERT, it always fails as shown below:
presto:default> create table test.lll (a int);
CREATE TABLE
presto:default> insert into test.lll select 1;
Query 20180104_091933_00007_k8e78, FAILED, 5 nodes
Splits: 84 total, 30 done (35.71%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
Query 20180104_091933_00007_k8e78 failed: No page sink provider for connector 'hive'
What is the reason, and how can I address it? Any help is appreciated.
Error Type: INTERNAL_ERROR
Error Code: GENERIC_INTERNAL_ERROR (65536)
Full stack trace:
java.lang.IllegalArgumentException: No page sink provider for connector 'hive'
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:191)
at com.facebook.presto.split.PageSinkManager.providerFor(PageSinkManager.java:67)
at com.facebook.presto.split.PageSinkManager.createPageSink(PageSinkManager.java:61)
at com.facebook.presto.operator.TableWriterOperator$TableWriterOperatorFactory.createPageSink(TableWriterOperator.java:97)
at com.facebook.presto.operator.TableWriterOperator$TableWriterOperatorFactory.createOperator(TableWriterOperator.java:88)
at com.facebook.presto.operator.DriverFactory.createDriver(DriverFactory.java:92)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunnerFactory.createDriver(SqlTaskExecution.java:515)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunnerFactory.access$1400(SqlTaskExecution.java:490)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:616)
at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
at com.facebook.presto.execution.executor.LegacyPrioritizedSplitRunner.process(LegacyPrioritizedSplitRunner.java:23)
at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:492)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)