This is a ClickHouse cluster with 2 shards × 2 replicas (4 nodes in total). When I run OPTIMIZE TABLE on one node, I get the error below, but the same statement works fine on all the other nodes:
risk-luck2.dg.163.org :) optimize table risk_detect_test.risk_doubtful_user_daily_device_view_lyp;
OPTIMIZE TABLE risk_detect_test.risk_doubtful_user_daily_device_view_lyp
Received exception from server (version 20.4.4):
Code: 999. DB::Exception: Received from localhost:9000. DB::Exception: Can't get data for node /clickhouse/tables/test/01-02/risk_doubtful_user_daily_device_view_lyp/replicas/risk-olap6.dg.163.org (multiple leaders Ok)/host: node doesn't exist (No node).
0 rows in set. Elapsed: 0.002 sec.
risk-luck2.dg.163.org :) show create table risk_detect_test.risk_doubtful_user_daily_device_view_lyp;
SHOW CREATE TABLE risk_detect_test.risk_doubtful_user_daily_device_view_lyp
┌─statement──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE MATERIALIZED VIEW risk_detect_test.risk_doubtful_user_daily_device_view_lyp
(
`app_id` String,
`event_date` Date,
`device_id` UInt32
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/test/{layer}-{shard}/risk_doubtful_user_daily_device_view_lyp', '{replica}')
PARTITION BY toYYYYMM(event_date)
PRIMARY KEY app_id
ORDER BY (app_id, event_date, device_id)
SETTINGS index_granularity = 8192 AS
SELECT
app_id,
event_date,
xxHash32(device_id) AS device_id
FROM risk_detect_online.dwd_risk_doubtful_detail │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
This looks like another bug in ClickHouse:
ENGINE = ReplicatedReplacingMergeTree(
'/clickhouse/tables/test/{layer}-{shard}/risk_doubtful_user_daily_device_view_lyp', '{replica}')
Can't get data for node
/clickhouse/tables/online/01-02/risk_doubtful_user_daily_device_view/replicas/risk-olap6.dg.163.org
ClickHouse tries to use an incorrect ZooKeeper path in the case of a materialized view:
risk_doubtful_user_daily_device_view instead of risk_doubtful_user_daily_device_view_lyp.
The database part of the path is also incorrect: tables/online/01-02/ instead of tables/test/{layer}-{shard}/.
I suggest switching to the "TO" notation: https://den-crane.github.io/Everything_you_should_know_about_materialized_views_commented.pdf
Or run OPTIMIZE against the inner table:
OPTIMIZE TABLE "risk_detect_test".".inner.risk_doubtful_user_daily_device_view_lyp";
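For reference, a sketch of the "TO" form based on the DDL shown above (the target table name risk_doubtful_user_daily_device is hypothetical; engine, path, and columns follow the original statement):

```sql
-- Explicit target table (hypothetical name) instead of a hidden .inner table
CREATE TABLE risk_detect_test.risk_doubtful_user_daily_device
(
    `app_id` String,
    `event_date` Date,
    `device_id` UInt32
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/test/{layer}-{shard}/risk_doubtful_user_daily_device', '{replica}')
PARTITION BY toYYYYMM(event_date)
ORDER BY (app_id, event_date, device_id);

-- The materialized view now only routes rows into the target table
CREATE MATERIALIZED VIEW risk_detect_test.risk_doubtful_user_daily_device_view_lyp
TO risk_detect_test.risk_doubtful_user_daily_device
AS SELECT app_id, event_date, xxHash32(device_id) AS device_id
FROM risk_detect_online.dwd_risk_doubtful_detail;

-- OPTIMIZE is then run against the explicit table, avoiding the MV path bug
OPTIMIZE TABLE risk_detect_test.risk_doubtful_user_daily_device;
```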
The clickhouse-server.log shows the following:
2021.08.18 16:37:11.384434 [ 128614 ] {b6de1d84-a238-4e2f-9af4-3ce0ddf8551d} <Debug> executeQuery: (from 10.200.128.91:40236) insert into dwd_risk_detect_detail(app_id, app_type, app_version, city, created_at, defense_count, defense_result, detect_count, device_code, device_id, id, ip, model, os_version, package_name, phone_brand, platform, province, region, risk_type1, risk_type2, risk_type3, role_account, role_id, sdk_version, sign_hash, ts) FORMAT TabSeparated
2021.08.18 16:37:11.384735 [ 128614 ] {b6de1d84-a238-4e2f-9af4-3ce0ddf8551d} <Trace> ContextAccess (default): Access granted: INSERT(app_id, app_type, app_version, city, created_at, defense_count, defense_result, detect_count, device_code, device_id, id, ip, model, os_version, package_name, phone_brand, platform, province, region, risk_type1, risk_type2, risk_type3, role_account, role_id, sdk_version, sign_hash, ts) ON risk_detect_online.dwd_risk_detect_detail
2021.08.18 16:37:11.385706 [ 128614 ] {b6de1d84-a238-4e2f-9af4-3ce0ddf8551d} <Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "risk_type1 != 0" moved to PREWHERE
2021.08.18 16:37:11.386554 [ 128614 ] {b6de1d84-a238-4e2f-9af4-3ce0ddf8551d} <Trace> ContextAccess (default): Access granted: SELECT(id, app_id, app_type, device_id, role_id, defense_result, risk_type1, risk_type2, risk_type3, defense_count, detect_count, event_date, event_hour, event_minute) ON risk_detect_online.dwd_risk_detect_detail
2021.08.18 16:37:11.386764 [ 128614 ] {b6de1d84-a238-4e2f-9af4-3ce0ddf8551d} <Trace> ContextAccess (default): Access granted: INSERT(app_id, app_type, event_date, event_hour, event_minute, risk_type1, risk_type2, risk_type3, defense_result, defense_count, detect_count, device_id, role_id, id) ON risk_detect_online.`.inner.risk_stat_view`
2021.08.18 16:37:11.387323 [ 128614 ] {b6de1d84-a238-4e2f-9af4-3ce0ddf8551d} <Trace> ContextAccess (default): Access granted: SELECT(app_id, app_type, device_id, role_id, event_date) ON risk_detect_online.dwd_risk_detect_detail
2021.08.18 16:37:11.387434 [ 128614 ] {b6de1d84-a238-4e2f-9af4-3ce0ddf8551d} <Trace> ContextAccess (default): Access granted: INSERT(app_id, app_type, event_date, device_id, role_id) ON risk_detect_online.`.inner.risk_total_user_stat_view`
2021.08.18 16:37:11.578506 [ 128861 ] {819b05a8-5ad0-414f-a0a7-111c765cac57} <Debug> executeQuery: (from 127.0.0.1:40932) OPTIMIZE TABLE risk_detect_online.risk_doubtful_user_daily_device_view
2021.08.18 16:37:11.578659 [ 128861 ] {819b05a8-5ad0-414f-a0a7-111c765cac57} <Trace> ContextAccess (default): Access granted: OPTIMIZE ON risk_detect_online.risk_doubtful_user_daily_device_view
2021.08.18 16:37:11.580097 [ 128861 ] {819b05a8-5ad0-414f-a0a7-111c765cac57} <Error> executeQuery: Code: 999, e.displayText() = Coordination::Exception: Can't get data for node /clickhouse/tables/online/01-02/risk_doubtful_user_daily_device_view/replicas/risk-olap6.dg.163.org (multiple leaders Ok)/host: node doesn't exist (No node) (version 20.4.4.18 (official build)) (from 127.0.0.1:40932) (in query: OPTIMIZE TABLE risk_detect_online.risk_doubtful_user_daily_device_view), Stack trace (when copying this message, always include the lines below):
0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) # 0x104191d0 in /usr/bin/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) # 0x8fff8ad in /usr/bin/clickhouse
2. Coordination::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, int) # 0xdddf7d8 in /usr/bin/clickhouse
3. Coordination::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) # 0xdddfe02 in /usr/bin/clickhouse
4. ? # 0xddf1f60 in /usr/bin/clickhouse
5. DB::StorageReplicatedMergeTree::sendRequestToLeaderReplica(std::__1::shared_ptr<DB::IAST> const&, DB::Context const&) # 0xd76117e in /usr/bin/clickhouse
6. DB::StorageReplicatedMergeTree::optimize(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::IAST> const&, bool, bool, DB::Context const&) # 0xd762546 in /usr/bin/clickhouse
7. DB::StorageMaterializedView::optimize(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::IAST> const&, bool, bool, DB::Context const&) # 0xd6d5a9d in /usr/bin/clickhouse
8. DB::InterpreterOptimizeQuery::execute() # 0xd225346 in /usr/bin/clickhouse
9. ? # 0xd5499f9 in /usr/bin/clickhouse
10. DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool, bool) # 0xd54d025 in /usr/bin/clickhouse
11. DB::TCPHandler::runImpl() # 0x9106678 in /usr/bin/clickhouse
12. DB::TCPHandler::run() # 0x9107650 in /usr/bin/clickhouse
13. Poco::Net::TCPServerConnection::start() # 0x10304f4b in /usr/bin/clickhouse
14. Poco::Net::TCPServerDispatcher::run() # 0x103053db in /usr/bin/clickhouse
15. Poco::PooledThread::run() # 0x104b2fa6 in /usr/bin/clickhouse
16. Poco::ThreadImpl::runnableEntry(void*) # 0x104ae260 in /usr/bin/clickhouse
17. start_thread # 0x74a4 in /lib/x86_64-linux-gnu/libpthread-2.24.so
18. __clone # 0xe8d0f in /lib/x86_64-linux-gnu/libc-2.24.so
2021.08.18 16:37:11.580526 [ 128861 ] {819b05a8-5ad0-414f-a0a7-111c765cac57} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
2021.08.18 16:37:11.580592 [ 128861 ] {} <Information> TCPHandler: Processed in 0.002 sec.
I'm investigating whether ClickHouse is a good option for OLAP purposes. To do so, I replicated some queries I have running on PostgreSQL, using ClickHouse's syntax.
All the queries I have run are much faster than in Postgres, but the ones that perform text search run out of memory. Below are the error message and the stack trace.
clickhouse_driver.errors.ServerException: Code: 241. DB::Exception:
Memory limit (for query) exceeded: would use 9.31 GiB (attempt to
allocate chunk of 524288 bytes), maximum: 9.31 GiB.
The script for the query is:
SELECT COUNT(*)
FROM ObserverNodeOccurrence AS occ
LEFT JOIN
    ObserverNodeOccurrence_NodeElements AS occ_ne
    ON occ._id = occ_ne.occurrenceId
WHERE
    occ_ne.snippet LIKE '%<img>%' -- wildcards are needed for a substring match; LIKE '<img>' matches the exact string only
The query above counts the number of entries of the column snippet that contain an HTML image tag (<img>). This column contains HTML snippets, so searching its text is quite expensive. A close/mid-term goal is to parse this column and convert it into a set of other columns (e.g. contains_img, contains_script, etc.). But, for now, I would like to be able to run such a query without running out of memory.
My questions are:
How can I successfully execute text-search queries on such a column without running out of memory?
Is there a way to force the query planner to spill to disk as soon as it runs out of memory?
I am using the MergeTree engine. Is there another engine that can split the load between RAM and disk?
Full stack trace:
clickhouse_driver.errors.ServerException: Code: 241.
DB::Exception: Memory limit (for query) exceeded: would use 9.31 GiB (attempt to allocate chunk of 524288 bytes), maximum: 9.31 GiB. Stack trace:
0. /usr/bin/clickhouse-server(StackTrace::StackTrace()+0x22) [0x781c272]
1. /usr/bin/clickhouse-server(MemoryTracker::alloc(long)+0x8ba) [0x71bbb4a]
2. /usr/bin/clickhouse-server(MemoryTracker::alloc(long)+0xc5) [0x71bb355]
3. /usr/bin/clickhouse-server() [0x67aeb4e]
4. /usr/bin/clickhouse-server() [0x67af010]
5. /usr/bin/clickhouse-server() [0x67e5af4]
6. /usr/bin/clickhouse-server(void DB::Join::joinBlockImpl<(DB::ASTTableJoin::Kind)1, (DB::ASTTableJoin::Strictness)2, DB::Join::MapsTemplate<DB::JoinStuff::WithFlags<DB::RowRefList, false> > >(DB::Block&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, DB::NamesAndTypesList const&, DB::Block const&, DB::Join::MapsTemplate<DB::JoinStuff::WithFlags<DB::RowRefList, false> > const&) const+0xe1c) [0x68020dc]
7. /usr/bin/clickhouse-server(DB::Join::joinBlock(DB::Block&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, DB::NamesAndTypesList const&) const+0x1a5) [0x67bc415]
8. /usr/bin/clickhouse-server(DB::ExpressionAction::execute(DB::Block&, bool) const+0xa5d) [0x6d961dd]
9. /usr/bin/clickhouse-server(DB::ExpressionActions::execute(DB::Block&, bool) const+0x45) [0x6d97545]
10. /usr/bin/clickhouse-server(DB::ExpressionBlockInputStream::readImpl()+0x48) [0x6c52888]
11. /usr/bin/clickhouse-server(DB::IBlockInputStream::read()+0x188) [0x6635628]
12. /usr/bin/clickhouse-server(DB::FilterBlockInputStream::readImpl()+0xd9) [0x6c538b9]
13. /usr/bin/clickhouse-server(DB::IBlockInputStream::read()+0x188) [0x6635628]
14. /usr/bin/clickhouse-server(DB::ExpressionBlockInputStream::readImpl()+0x2d) [0x6c5286d]
15. /usr/bin/clickhouse-server(DB::IBlockInputStream::read()+0x188) [0x6635628]
16. /usr/bin/clickhouse-server(DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::loop(unsigned long)+0x139) [0x6c7f409]
17. /usr/bin/clickhouse-server(DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::thread(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long)+0x209) [0x6c7fc79]
18. /usr/bin/clickhouse-server(ThreadFromGlobalPool::ThreadFromGlobalPool<void (DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::*)(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long), DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>*, std::shared_ptr<DB::ThreadGroupStatus>, unsigned long&>(void (DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::*&&)(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long), DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>*&&, std::shared_ptr<DB::ThreadGroupStatus>&&, unsigned long&)::{lambda()#1}::operator()() const+0x7f) [0x6c801cf]
19. /usr/bin/clickhouse-server(ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>)+0x1af) [0x71c778f]
20. /usr/bin/clickhouse-server() [0xb2ac5bf]
21. /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7fc5b50826db]
22. /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7fc5b480988f]
Run clickhouse-client in a terminal and set:
SET max_bytes_before_external_group_by = 20000000000; -- 20 GB threshold before GROUP BY spills to disk
SET max_memory_usage = 40000000000;                   -- 40 GB memory limit per query
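The memory here is consumed mostly by the hash table built for the JOIN, not by the LIKE itself. Another option (my own sketch, assuming _id is unique in ObserverNodeOccurrence and the join is only used for filtering, as in the query shown) is to avoid materializing the join entirely:

```sql
-- Equivalent count without a hash join: filter first, then check existence
SELECT count()
FROM ObserverNodeOccurrence_NodeElements AS occ_ne
WHERE occ_ne.snippet LIKE '%<img>%'
  AND occ_ne.occurrenceId IN (SELECT _id FROM ObserverNodeOccurrence);
```

The IN subquery only needs to hold the set of _id values in memory, which is typically much smaller than the joined right-hand table.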
I have a problem with the output of r.cross; I hope you can follow my description without a minimal working example.
I have three rasters I want to cross, with the following characteristics:
GRASS 7.4.0 (Bengue):~ > r.stats soil_t,lcov,watermask -N
100%
4 8 0
4 8 1
4 9 0
[...]
I would expect r.cross to create a raster with a category for each line shown above. However, I get the following:
GRASS 7.4.0 (Bengue):~ > r.cross input=soil_t,lcov,watermask output=svc
GRASS 7.4.0 (Bengue):~ > r.category svc
0
1 category 4; category 8; category 1
2 category 4; category 9; category 0
[...]
Why is the first line just zero when one would rather expect something like: 1 category 4; category 8; category 0?
EDIT: Just noticed that under GRASS version 6.4 it runs as expected:
GRASS 6.4.6 (Bengue):~ > r.category svc
0
1 category 4; category 8; category 0
2 category 4; category 8; category 1
3 category 4; category 9; category 0
So, something must be wrong with the 7.4 version of r.cross?!
Thanks for your help!
System infos:
GRASS version 7.4.0
Ubuntu MATE 16.04 (xenial)
Just in case somebody comes across this post: the same issue was raised on the mailing list shortly after this post by somebody else: https://lists.osgeo.org/pipermail/grass-user/2018-February/077934.html. It seems to be a bug, and it is not yet fixed in the latest release version of GRASS.
I am new to Ruby and trying to use regular expressions.
Basically I want to read a file and check whether each line has the right format.
Requirements for the correct format:
1. The line should start with from:
2. Exactly one space separates the fields; only one space is allowed, unless there is a comma
3. No consecutive commas
4. The from and to values are numbers
5. from and to must be followed by a colon
from: z to: 2
from: 1 to: 3,4
from: 2 to: 3
from:3 to: 5
from: 4 to: 5
from: 4 to: 7
to: 7 from: 6
from: 7 to: 5
0: 7 to: 5
from: 24 to: 5
from: 7 to: ,,,5
from: 8 to: 5,,5
from: 9 to: ,5
If I have the correct regular expression, then the output should be:
from: 1 to: 3,4
from: 2 to: 3
from: 4 to: 5
from: 4 to: 7
from: 7 to: 5
from: 24 to: 5
so in this case these are the invalid ones:
from: z to: 2 # because starts with z
from:3 to: 5 # because there is no space after from:
to: 7 from: 6 # because it starts with to but supposed to start with from
0: 7 to: 5 # starts with 0 instead of from
from: 7 to: ,,,5 # because there are two consecutive commas
from: 8 to: 5,,5 # two consecutive commas
from: 9 to: ,5 # start with comma
OK, the regex you want is something like this:
from: \d+(?:,\d+)* to: \d+(?:,\d+)*
This assumes that multiple numbers are permitted in the from: column as well. If not, you want this one:
from: \d+ to: \d+(?:,\d+)*
To verify that the whole file is valid (assuming all it contains are lines like these), you could use a function like this:
def validFile(filename)
  # File.foreach closes the file automatically and yields line by line
  File.foreach(filename) do |line|
    return false unless line =~ /\Afrom: \d+(?:,\d+)* to: \d+(?:,\d+)*/
  end
  true
end
What you are looking for is called negative lookahead. Specifically, \d+(?!,,) which says: match 1 or more consecutive digits not followed by 2 commas. Here is the whole thing:
str = "from: z to: 2
from: 1 to: 3,4
from: 2 to: 3
from:3 to: 5
from: 4 to: 5
from: 4 to: 7
to: 7 from: 6
from: 7 to: 5
0: 7 to: 5
from: 24 to: 5
from: 7 to: ,,,5
from: 8 to: 5,,5
from: 9 to: ,5
"
str.each_line do |line|
puts(line) if line =~ /\Afrom: \d+ to: \d+(?!,,)/
end
Output:
from: 1 to: 3,4
from: 2 to: 3
from: 4 to: 5
from: 4 to: 7
from: 7 to: 5
from: 24 to: 5
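Note that the lookahead version still accepts a line with trailing garbage after the numbers (e.g. from: 24 to: 5 extra), since nothing anchors the end of the line. A stricter variant (my own suggestion, not part of the answer above) anchors the whole line and spells out the comma-separated list explicitly:

```ruby
# Anchored pattern: the whole line must be "from: <n> to: <n>(,<n>)*".
# \s*\z tolerates the trailing newline from each_line/foreach.
PATTERN = /\Afrom: \d+ to: \d+(?:,\d+)*\s*\z/

samples = ["from: 1 to: 3,4", "from: 8 to: 5,,5", "from: 24 to: 5 extra"]
samples.each do |line|
  puts line if line =~ PATTERN
end
```

Because the pattern is anchored at both ends, consecutive commas and trailing text both fail to match without needing a lookahead.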
I am working on hierarchical panel data using WinBUGS. Assume data on school performance: the dependent variable is logs, with independent variables logp and rank. All schools are divided into three categories (cat), and I need a beta coefficient for each category (hence the hierarchical model). I want to account for time-specific and school-specific effects in the model. One way would be to add dummy variables to the list of variables under mu[i], but that would get messy because I have up to 60 schools. I am sure there must be a better way to handle this.
My data looks like the following:
school time logs logp cat rank
1 1 4.2 8.9 1 1
1 2 4.2 8.1 1 2
1 3 3.5 9.2 1 1
2 1 4.1 7.5 1 2
2 2 4.5 6.5 1 2
3 1 5.1 6.6 2 4
3 2 6.2 6.8 3 7
# logs = log(score)
# logp = log(average hours of inputs)
# rank = rank of school
# cat = section red, section blue, section white in school (hierarchies)
My WinBUGS code is given below.
model {
  # N observations
  for (i in 1:n) {
    logs[i] ~ dnorm(mu[i], tau)
    mu[i] <- bcons + bprice*logp[i] + brank[cat[i]]*rank[i]
  }
  # C categories
  for (c in 1:C) {
    brank[c] ~ dnorm(beta, taub)
  }
  # priors
  bcons ~ dnorm(0, 1.0E-6)
  bprice ~ dnorm(0, 1.0E-6)
  bad ~ dnorm(0, 1.0E-6)
  beta ~ dnorm(0, 1.0E-6)
  tau ~ dgamma(0.001, 0.001)
  taub ~ dgamma(0.001, 0.001)
}
As you can see in the data sample above, I have multiple observations per school over time. How can I modify the code to account for time- and school-specific effects? I have used Stata in the past, where the fe, be, and i.time options take care of fixed effects in panel data, but here I am lost.
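One common pattern in BUGS models (a sketch only, not tested against this data; it assumes school[i] and time[i] index columns plus nschool and ntime constants are supplied in the data list) is to add random intercepts for school and time rather than 60 dummy variables:

```
model {
  for (i in 1:n) {
    logs[i] ~ dnorm(mu[i], tau)
    mu[i] <- bcons + bprice*logp[i] + brank[cat[i]]*rank[i]
             + u[school[i]] + v[time[i]]   # school- and time-specific effects
  }
  for (s in 1:nschool) { u[s] ~ dnorm(0, tau.u) }  # school random intercepts
  for (t in 1:ntime)   { v[t] ~ dnorm(0, tau.v) }  # time random intercepts
  tau.u ~ dgamma(0.001, 0.001)
  tau.v ~ dgamma(0.001, 0.001)
}
```

This shrinks the per-school and per-time effects toward zero instead of estimating them as free fixed effects, which is the usual hierarchical-model analogue of Stata's fe/i.time approach.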