clickhouse :) TRUNCATE TABLE <db_name>.<table_name> ON CLUSTER xxxx
┌─host────────┬─port─┬─status─┬─error─────────────────────────────────────────────────────────┬─num_hosts_remaining─┬─num_hosts_active─┐
│ xxx │ xxx │ 341 │ Cannot execute replicated DDL query, maximum retries exceeded │ 5 │ 5 │
└─────────────┴──────┴────────┴───────────────────────────────────────────────────────────────┴─────────────────────┴──────────────────┘
┌─host────────┬─port─┬─status─┬─error─────────────────────────────────────────────────────────┬─num_hosts_remaining─┬─num_hosts_active─┐
│ xxx │ xxx │ 341 │ Cannot execute replicated DDL query, maximum retries exceeded │ 4 │ 4 │
└─────────────┴──────┴────────┴───────────────────────────────────────────────────────────────┴─────────────────────┴──────────────────┘
┌─host────────┬─port─┬─status─┬─error─────────────────────────────────────────────────────────┬─num_hosts_remaining─┬─num_hosts_active─┐
│ xxx │ xx │ 341 │ Cannot execute replicated DDL query, maximum retries exceeded │ 3 │ 3 │
└─────────────┴──────┴────────┴───────────────────────────────────────────────────────────────┴─────────────────────┴──────────────────┘
┌─host────────┬─port─┬─status─┬─error─────────────────────────────────────────────────────────┬─num_hosts_remaining─┬─num_hosts_active─┐
│ xxx │ xxx │ 341 │ Cannot execute replicated DDL query, maximum retries exceeded │ 2 │ 1 │
│ xxx │ xxx │ 341 │ Cannot execute replicated DDL query, maximum retries exceeded │ 1 │ 1 │
└─────────────┴──────┴────────┴───────────────────────────────────────────────────────────────┴─────────────────────┴──────────────────┘
┌─host────────┬─port─┬─status─┬─error─────────────────────────────────────────────────────────┬─num_hosts_remaining─┬─num_hosts_active─┐
│ xxx │ xxx │ 341 │ Cannot execute replicated DDL query, maximum retries exceeded │ 0 │ 0 │
└─────────────┴──────┴────────┴───────────────────────────────────────────────────────────────┴─────────────────────┴──────────────────┘
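For context: ON CLUSTER statements travel through the replicated DDL queue in ZooKeeper, and the status above means every host exhausted its retries. A minimal sketch for inspecting the queue from ClickHouse, assuming the default distributed_ddl path /clickhouse/task_queue/ddl (system.zookeeper requires an explicit path condition):
SELECT name, value
FROM system.zookeeper
WHERE path = '/clickhouse/task_queue/ddl'
ORDER BY name DESC
LIMIT 10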
The log shows:
2022.01.18 15:08:39.500203 [ 11583 ] {} <Error> mars.logout_local (ed49f1f3-5576-46bb-ad49-f1f35576d6bb): void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999. Coordination::Exception: Can't get data for node /clickhouse/tables/0/xxx/xxx/replicas/xxx/log_pointer: node doesn't exist (No node). (KEEPER_EXCEPTION), Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x944bdda in /usr/lib/debug/.build-id/66/99b86599a2121e78e0d42dd67791abd9ae5265.debug
1. Coordination::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, Coordination::Error, int) @ 0x11cf3795 in /usr/lib/debug/.build-id/66/99b86599a2121e78e0d42dd67791abd9ae5265.debug
2. Coordination::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, Coordination::Error) @ 0x11cf3a22 in /usr/lib/debug/.build-id/66/99b86599a2121e78e0d42dd67791abd9ae5265.debug
3. zkutil::ZooKeeper::get(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, Coordination::Stat*, std::__1::shared_ptr<Poco::Event> const&) @ 0x11cfda12 in /usr/lib/debug/.build-id/66/99b86599a2121e78e0d42dd67791abd9ae5265.debug
4. DB::ReplicatedMergeTreeQueue::pullLogsToQueue(std::__1::shared_ptr<zkutil::ZooKeeper>, std::__1::function<void (Coordination::WatchResponse const&)>, DB::ReplicatedMergeTreeQueue::PullLogsReason) @ 0x117dc451 in /usr/lib/debug/.build-id/66/99b86599a2121e78e0d42dd67791abd9ae5265.debug
5. DB::StorageReplicatedMergeTree::queueUpdatingTask() @ 0x113ab28a in /usr/lib/debug/.build-id/66/99b86599a2121e78e0d42dd67791abd9ae5265.debug
6. DB::BackgroundSchedulePoolTaskInfo::execute() @ 0x10898888 in /usr/lib/debug/.build-id/66/99b86599a2121e78e0d42dd67791abd9ae5265.debug
7. DB::BackgroundSchedulePool::threadFunction() @ 0x1089a8b7 in /usr/lib/debug/.build-id/66/99b86599a2121e78e0d42dd67791abd9ae5265.debug
8. void std::__1::__function::__policy_invoker<void ()>::__call_impl<std::__1::__function::__default_alloc_func<ThreadFromGlobalPool::ThreadFromGlobalPool<DB::BackgroundSchedulePool::BackgroundSchedulePool(unsigned long, unsigned long, char const*)::$_1>(DB::BackgroundSchedulePool::BackgroundSchedulePool(unsigned long, unsigned long, char const*)::$_1&&)::'lambda'(), void ()> >(std::__1::__function::__policy_storage const*) @ 0x1089b632 in /usr/lib/debug/.build-id/66/99b86599a2121e78e0d42dd67791abd9ae5265.debug
9. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x948ce7f in /usr/lib/debug/.build-id/66/99b86599a2121e78e0d42dd67791abd9ae5265.debug
10. void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()> >(void*) @ 0x9490763 in /usr/lib/debug/.build-id/66/99b86599a2121e78e0d42dd67791abd9ae5265.debug
11. start_thread @ 0x7fa3 in /lib/x86_64-linux-gnu/libpthread-2.28.so
12. __clone @ 0xf94cf in /lib/x86_64-linux-gnu/libc-2.28.so
I went to ZooKeeper to look for the node /clickhouse/tables/0/xxx/xxx/replicas/xxx/log_pointer, and indeed it does not exist. But why? The table itself still has data:
select count(*) from <db_name>.<table_name>
┌─count()─┐
│ 347279 │
└─────────┘
1 rows in set. Elapsed: 0.017 sec.
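A sketch of how the broken replica can be confirmed and, on newer servers, repaired from ClickHouse itself. The columns are from system.replicas; SYSTEM RESTORE REPLICA only exists in recent versions (roughly 21.7+) and requires the table to be in read-only mode, so treat this as an assumption to verify against your version:
-- Does the replica consider itself read-only / detached from [Zoo]Keeper?
SELECT database, table, is_readonly, is_session_expired, zookeeper_path, replica_path
FROM system.replicas
WHERE database = '<db_name>' AND table = '<table_name>';
-- If the replica metadata in [Zoo]Keeper is gone, newer servers can recreate it
-- from the parts that are still on local disk:
SYSTEM RESTART REPLICA <db_name>.<table_name>;
SYSTEM RESTORE REPLICA <db_name>.<table_name>;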
On a ClickHouse database, I have an array-typed column and I want to get the distinct elements across all of the arrays.
Running this query:
SELECT DISTINCT errors.message_grouping_fingerprint
FROM views
WHERE (session_date >= toDate('2022-07-21')) and (session_date < toDate('2022-07-22'))
and notEmpty(errors.message) = 1
and project_id = 162
SETTINGS distributed_group_by_no_merge=0
returns arrays like this:
[-8964675922652096680,-8964675922652096680]
[-8964675922652096680]
[-8964675922652096680,-8964675922652096680,-8964675922652096680,-8964675922652096680,-8964675922652096680,-8964675922652096680,-8964675922652096680,-827009490898812590,-8964675922652096680,-8964675922652096680,-8964675922652096680,-8964675922652096680]
[-8964675922652096680,-8964675922652096680,-8964675922652096680]
[-827009490898812590]
[-1660275624223727714,-1660275624223727714]
[1852265010681444046]
[-2552644061611887546]
[-7142229185866234523]
[-7142229185866234523,-7142229185866234523]
Instead, I want to get this:
-8964675922652096680
-827009490898812590
-1660275624223727714
1852265010681444046
-2552644061611887546
-7142229185866234523
and finally, to count them all, which gives 6.
groupUniqArrayArray
SELECT arrayMap(i -> rand() % 10, range((rand() % 3) + 1)) AS arr FROM numbers(10);
┌─arr─────┐
│ [0] │
│ [1] │
│ [7,7,7] │
│ [8,8] │
│ [9,9,9] │
│ [6,6,6] │
│ [2,2] │
│ [8,8,8] │
│ [2] │
│ [8,8,8] │
└─────────┘
SELECT
groupUniqArrayArray(arr) AS uarr,
length(uarr)
FROM
(
SELECT arrayMap(i -> (rand() % 10), range((rand() % 3) + 1)) AS arr
FROM numbers(10)
)
┌─uarr──────────────┬─length(groupUniqArrayArray(arr))─┐
│ [0,5,9,4,2,8,7,3] │ 8 │
└───────────────────┴──────────────────────────────────┘
ARRAY JOIN
SELECT A
FROM
(
SELECT arrayMap(i -> (rand() % 10), range((rand() % 3) + 1)) AS arr
FROM numbers(10)
)
ARRAY JOIN arr AS A
GROUP BY A
┌─A─┐
│ 0 │
│ 1 │
│ 4 │
│ 5 │
│ 6 │
│ 9 │
└───┘
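Applied to the table from the question (an untested sketch; the table, column, and filter values are copied from the asker's query):
SELECT
    groupUniqArrayArray(errors.message_grouping_fingerprint) AS fingerprints,
    length(fingerprints) AS distinct_count
FROM views
WHERE (session_date >= toDate('2022-07-21')) AND (session_date < toDate('2022-07-22'))
AND notEmpty(errors.message) = 1
AND project_id = 162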
I am expecting 300-500k rps with Rust. What I get is 7-40k rps.
I copied the hello-world examples of Rocket and actix_web:
rocket:
#![feature(proc_macro_hygiene, decl_macro)]
#[macro_use] extern crate rocket;
#[get("/hello/<name>/<age>")]
fn hello(name: String, age: u8) -> String {
format!("Hello, {} year old named {}!", age, name)
}
fn main() {
rocket::ignite().mount("/", routes![hello]).launch();
}
actix_web:
use actix_web::{get, post, web, App, HttpResponse, HttpServer, Responder};
#[get("/")]
async fn hello() -> impl Responder {
HttpResponse::Ok().body("Hello world!")
}
#[post("/echo")]
async fn echo(req_body: String) -> impl Responder {
HttpResponse::Ok().body(req_body)
}
async fn manual_hello() -> impl Responder {
HttpResponse::Ok().body("Hey there!")
}
#[actix_web::main]
async fn main() -> std::io::Result<()> {
println!("hi!");
HttpServer::new(|| {
App::new()
.service(hello)
.service(echo)
.route("/hey", web::get().to(manual_hello))
})
.bind("127.0.0.1:8080")?
.run()
.await
}
the results are (for actix_web):
$ autocannon "http://localhost:8080/" -c 10 -d 5
Running 5s test @ http://localhost:8080/
10 connections
┌─────────┬──────┬──────┬───────┬──────┬─────────┬─────────┬───────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼──────┼──────┼───────┼──────┼─────────┼─────────┼───────┤
│ Latency │ 0 ms │ 0 ms │ 0 ms │ 0 ms │ 0.01 ms │ 0.11 ms │ 13 ms │
└─────────┴──────┴──────┴───────┴──────┴─────────┴─────────┴───────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Req/Sec │ 42527 │ 42527 │ 49375 │ 49631 │ 47817.6 │ 2710.34 │ 42503 │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Bytes/Sec │ 3.74 MB │ 3.74 MB │ 4.35 MB │ 4.37 MB │ 4.21 MB │ 238 kB │ 3.74 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
Req/Bytes counts sampled once per second.
239k requests in 5.02s, 21 MB read
With Node.js, I get 30k rps, so why is Rust not much faster?
require('http').createServer(function (req, res) {
res.write('Hello World!');
res.end();
}).listen(8080);
p/singer {stav/research-changes} $ autocannon "http://localhost:8080/" -c 10 -d 5
Running 5s test @ http://localhost:8080/
10 connections
┌─────────┬──────┬──────┬───────┬──────┬─────────┬─────────┬───────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼──────┼──────┼───────┼──────┼─────────┼─────────┼───────┤
│ Latency │ 0 ms │ 0 ms │ 0 ms │ 0 ms │ 0.01 ms │ 0.11 ms │ 11 ms │
└─────────┴──────┴──────┴───────┴──────┴─────────┴─────────┴───────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Req/Sec │ 35647 │ 35647 │ 44095 │ 44351 │ 42102.4 │ 3331.11 │ 35624 │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Bytes/Sec │ 5.45 MB │ 5.45 MB │ 6.75 MB │ 6.78 MB │ 6.44 MB │ 510 kB │ 5.45 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
Req/Bytes counts sampled once per second.
210k requests in 5.01s, 32.2 MB read
macOS 10.15.1
rustc 1.54.0-nightly (5dc8789e3 2021-05-21)
node 14
cpu = 2%
ram = 50%
Note: I use cargo run --release to run the examples.
I've got a question regarding the WITH FILL modifier. I need a query grouped by month, including the empty months, to plot on a graph, so I use WITH FILL.
I have a simple table:
CREATE TABLE IF NOT EXISTS fillwith
(
`event_timestamp` DateTime64(3),
`event_date` Date,
`event_type` String
)
ENGINE = Memory
With some sample data
insert into fillwith (event_timestamp, event_date, event_type) values ('2021-01-07 19:14:33.000', '2021-01-07', 'PRODUCT_VIEW');
insert into fillwith (event_timestamp, event_date, event_type) values ('2021-02-07 19:14:33.000', '2021-02-07', 'PRODUCT_CLICK');
insert into fillwith (event_timestamp, event_date, event_type) values ('2020-11-07 19:14:33.000', '2020-11-07', 'PRODUCT_VIEW');
insert into fillwith (event_timestamp, event_date, event_type) values ('2020-12-07 19:14:33.000', '2020-12-07', 'PRODUCT_VIEW');
insert into fillwith (event_timestamp, event_date, event_type) values ('2020-09-07 19:14:33.000', '2020-09-07', 'PRODUCT_VIEW');
With a day interval, I get a full list of days, but they are not sorted and look like random days:
SELECT
toDate(toStartOfInterval(event_date, toIntervalDay(1))) AS date,
countIf(event_type = 'PRODUCT_VIEW') AS views,
countIf(event_type = 'PRODUCT_CLICK') AS clicks
FROM fillwith
GROUP BY toDate(toStartOfInterval(event_date, toIntervalDay(1)))
ORDER BY date ASC
WITH FILL FROM toDate('2020-01-01') TO toDate('2021-12-01') STEP dateDiff('second', now(), now() + toIntervalDay(1))
Result:
┌───────date─┬─views─┬─clicks─┐
│ 2020-09-07 │ 1 │ 0 │
│ 2020-11-07 │ 1 │ 0 │
│ 2020-12-07 │ 1 │ 0 │
│ 2021-01-07 │ 1 │ 0 │
│ 2021-02-07 │ 0 │ 1 │
└────────────┴───────┴────────┘
┌───────date─┬─views─┬─clicks─┐
│ 2106-02-07 │ 0 │ 0 │
│ 2005-05-25 │ 0 │ 0 │
│ 2062-07-09 │ 0 │ 0 │
│ 2106-02-07 │ 0 │ 0 │
│ 1997-05-03 │ 0 │ 0 │
│ 2054-06-17 │ 0 │ 0 │
│ 2106-02-07 │ 0 │ 0 │
│ 1989-04-11 │ 0 │ 0 │
│ 2046-05-26 │ 0 │ 0 │
│ 2103-07-11 │ 0 │ 0 │
When I try the same for a Month interval:
select
toDate(toStartOfInterval(event_date, INTERVAL 1 month)) as date,
countIf(event_type = 'PRODUCT_VIEW') as views,
countIf(event_type = 'PRODUCT_CLICK') as clicks
from fillwith
GROUP BY toDate(toStartOfInterval(event_date, INTERVAL 1 month))
ORDER BY date ASC WITH FILL
FROM toDate('2020-01-01') TO toDate('2021-04-01') STEP dateDiff('second',
now(),
now() + INTERVAL 1 month)
Result:
┌───────date─┬─views─┬─clicks─┐
│ 2020-01-01 │ 0 │ 0 │
│ 2020-09-01 │ 1 │ 0 │
│ 2020-11-01 │ 1 │ 0 │
│ 2020-12-01 │ 1 │ 0 │
│ 2021-01-01 │ 1 │ 0 │
│ 2021-02-01 │ 0 │ 1 │
└────────────┴───────┴────────┘
But I expect:
┌───────date─┬─views─┬─clicks─┐
│ 2020-01-01 │ 0 │ 0 │
│ 2020-02-01 │ 0 │ 0 │
│ 2020-03-01 │ 0 │ 0 │
│ 2020-04-01 │ 0 │ 0 │
│ 2020-05-01 │ 0 │ 0 │
│ 2020-06-01 │ 0 │ 0 │
│ 2020-07-01 │ 0 │ 0 │
│ 2020-08-01 │ 0 │ 0 │
│ 2020-09-01 │ 1 │ 0 │
│ 2020-10-01 │ 0 │ 0 │
│ 2020-11-01 │ 1 │ 0 │
│ 2020-12-01 │ 1 │ 0 │
│ 2021-01-01 │ 1 │ 0 │
│ 2021-02-01 │ 0 │ 1 │
│ 2021-03-01 │ 0 │ 0 │
│ 2021-04-01 │ 0 │ 0 │
└────────────┴───────┴────────┘
Does someone know why this happens and how I can improve this?
Thanks for your help!
Try this query. (The root cause: the date column has type Date, so the STEP value is measured in days. dateDiff('second', now(), now() + toIntervalDay(1)) evaluates to 86400, i.e. a step of 86400 days, which overflows the Date range and produces the seemingly random dates.)
WITH toDate(0) AS start_date, toRelativeMonthNum(toDate(0)) AS relative_month_of_start_date
SELECT
addMonths(start_date, relative_month - relative_month_of_start_date) AS month,
views,
clicks
FROM
(
SELECT
toRelativeMonthNum(event_date) AS relative_month,
countIf(event_type = 'PRODUCT_VIEW') AS views,
countIf(event_type = 'PRODUCT_CLICK') AS clicks
FROM fillwith
GROUP BY relative_month
ORDER BY relative_month ASC
WITH FILL
FROM toRelativeMonthNum(toDate('2020-01-01'))
TO toRelativeMonthNum(toDate('2021-12-01')) STEP 1
)
ORDER BY month ASC
/*
┌──────month─┬─views─┬─clicks─┐
│ 2020-01-01 │ 0 │ 0 │
│ 2020-02-01 │ 0 │ 0 │
│ 2020-03-01 │ 0 │ 0 │
│ 2020-04-01 │ 0 │ 0 │
│ 2020-05-01 │ 0 │ 0 │
│ 2020-06-01 │ 0 │ 0 │
│ 2020-07-01 │ 0 │ 0 │
│ 2020-08-01 │ 0 │ 0 │
│ 2020-09-01 │ 1 │ 0 │
│ 2020-10-01 │ 0 │ 0 │
│ 2020-11-01 │ 1 │ 0 │
│ 2020-12-01 │ 1 │ 0 │
│ 2021-01-01 │ 1 │ 0 │
│ 2021-02-01 │ 0 │ 1 │
│ 2021-03-01 │ 0 │ 0 │
│ 2021-04-01 │ 0 │ 0 │
│ 2021-05-01 │ 0 │ 0 │
│ 2021-06-01 │ 0 │ 0 │
│ 2021-07-01 │ 0 │ 0 │
│ 2021-08-01 │ 0 │ 0 │
│ 2021-09-01 │ 0 │ 0 │
│ 2021-10-01 │ 0 │ 0 │
│ 2021-11-01 │ 0 │ 0 │
└────────────┴───────┴────────┘
*/
Or an alternative way:
SELECT
toStartOfMonth(date) AS month,
sum(views) AS views,
sum(clicks) AS clicks
FROM
(
SELECT
event_date AS date, /* or: toDate(toStartOfDay(event_timestamp)) AS date */
countIf(event_type = 'PRODUCT_VIEW') AS views,
countIf(event_type = 'PRODUCT_CLICK') AS clicks
FROM fillwith
GROUP BY date
ORDER BY date ASC
WITH FILL
FROM toDate('2020-01-01')
TO toDate('2021-12-01')
/* type of 'date' is Date => '1' means 1 day */
STEP 1
)
GROUP BY month
ORDER BY month ASC
/*
┌──────month─┬─views─┬─clicks─┐
│ 2020-01-01 │ 0 │ 0 │
│ 2020-02-01 │ 0 │ 0 │
│ 2020-03-01 │ 0 │ 0 │
│ 2020-04-01 │ 0 │ 0 │
│ 2020-05-01 │ 0 │ 0 │
│ 2020-06-01 │ 0 │ 0 │
│ 2020-07-01 │ 0 │ 0 │
│ 2020-08-01 │ 0 │ 0 │
│ 2020-09-01 │ 1 │ 0 │
│ 2020-10-01 │ 0 │ 0 │
│ 2020-11-01 │ 1 │ 0 │
│ 2020-12-01 │ 1 │ 0 │
│ 2021-01-01 │ 1 │ 0 │
│ 2021-02-01 │ 0 │ 1 │
│ 2021-03-01 │ 0 │ 0 │
│ 2021-04-01 │ 0 │ 0 │
│ 2021-05-01 │ 0 │ 0 │
│ 2021-06-01 │ 0 │ 0 │
│ 2021-07-01 │ 0 │ 0 │
│ 2021-08-01 │ 0 │ 0 │
│ 2021-09-01 │ 0 │ 0 │
│ 2021-10-01 │ 0 │ 0 │
│ 2021-11-01 │ 0 │ 0 │
└────────────┴───────┴────────┘
*/
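Note that recent ClickHouse versions also accept an INTERVAL as the WITH FILL STEP, which removes the need for the second-based workaround entirely. A sketch under that assumption (check the WITH FILL documentation for your server version):
SELECT
    toStartOfMonth(event_date) AS month,
    countIf(event_type = 'PRODUCT_VIEW') AS views,
    countIf(event_type = 'PRODUCT_CLICK') AS clicks
FROM fillwith
GROUP BY month
ORDER BY month ASC
WITH FILL
    FROM toDate('2020-01-01')
    TO toDate('2021-12-01')
    STEP INTERVAL 1 MONTH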
I have some data:
┌─id──┬─serial┐
│ 1 │ 1 │
│ 2 │ 2 │
│ 3 │ 3 │
│ 4 │ 1 │
│ 5 │ 3 │
│ 6 │ 2 │
│ 7 │ 1 │
│ 8 │ 2 │
│ 9 │ 3 │
│ 10 │ 1 │
│ 11 │ 2 │
│ 12 │ 1 │
│ 13 │ 2 │
│ 14 │ 3 │
└─────┴───────┘
I want to group by column 'serial', where the rule is: every maximal ascending run (like 1 -> 2 -> 3) forms one group.
I expect this result:
┌─id──┬─serial┬─group─┐
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 1 │
│ 3 │ 3 │ 1 │
│ 4 │ 1 │ 2 │
│ 5 │ 3 │ 2 │
│ 6 │ 2 │ 3 │
│ 7 │ 1 │ 4 │
│ 8 │ 2 │ 4 │
│ 9 │ 3 │ 4 │
│ 10 │ 1 │ 5 │
│ 11 │ 2 │ 5 │
│ 12 │ 1 │ 6 │
│ 13 │ 2 │ 6 │
│ 14 │ 3 │ 6 │
└─────┴───────┴───────┘
If I understand correctly, you want to split the set into subsets with an ascending trend.
SELECT r.1 id, r.2 serial, r.3 AS group, arrayJoin(result) r
FROM (
SELECT
groupArray((id, serial)) sourceArray,
/* find indexes where the ascending trend is broken */
arrayFilter(i -> (i = 1 OR sourceArray[i - 1].2 > sourceArray[i].2), arrayEnumerate(sourceArray)) trendBrokenIndexes,
/* select all groups with ascending trend and assign them group-id */
arrayMap(i ->
(i, arraySlice(sourceArray, trendBrokenIndexes[i], i < length(trendBrokenIndexes) ? trendBrokenIndexes[i+1] - trendBrokenIndexes[i] : null)),
arrayEnumerate(trendBrokenIndexes)) groups,
/* prepare the result */
arrayReduce('groupArrayArray', arrayMap(x -> arrayMap(y -> (y.1, y.2, x.1), x.2), groups)) result
FROM (
/* source data */
SELECT arrayJoin([(1 , 1),(2 , 2),(3 , 3),(4 , 1),(5 , 3),(6 , 2),(7 , 1),(8 , 2),(9 , 3),(10, 1),(11, 2),(12, 1),(13, 2),(14, 3)]) a, a.1 id, a.2 serial
ORDER BY id))
/* Result
┌─id─┬─serial─┬─group─┬─r────────┐
│ 1 │ 1 │ 1 │ (1,1,1) │
│ 2 │ 2 │ 1 │ (2,2,1) │
│ 3 │ 3 │ 1 │ (3,3,1) │
│ 4 │ 1 │ 2 │ (4,1,2) │
│ 5 │ 3 │ 2 │ (5,3,2) │
│ 6 │ 2 │ 3 │ (6,2,3) │
│ 7 │ 1 │ 4 │ (7,1,4) │
│ 8 │ 2 │ 4 │ (8,2,4) │
│ 9 │ 3 │ 4 │ (9,3,4) │
│ 10 │ 1 │ 5 │ (10,1,5) │
│ 11 │ 2 │ 5 │ (11,2,5) │
│ 12 │ 1 │ 6 │ (12,1,6) │
│ 13 │ 2 │ 6 │ (13,2,6) │
│ 14 │ 3 │ 6 │ (14,3,6) │
└────┴────────┴───────┴──────────┘
*/
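On servers with window-function support (21.x and later; older 21.x builds may need allow_experimental_window_functions = 1), the same grouping can be written as a cumulative count of trend breaks. A sketch using the same inline data; grp and brk are names introduced here:
SELECT
    id,
    serial,
    /* group id = 1 + number of breaks seen so far */
    1 + sum(brk) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
FROM
(
    SELECT
        id,
        serial,
        /* a break starts a new group: serial did not increase vs. the previous row;
           lagInFrame returns 0 for the first row, so it does not count as a break */
        if(serial <= lagInFrame(serial, 1) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 1, 0) AS brk
    FROM
    (
        SELECT arrayJoin([(1,1),(2,2),(3,3),(4,1),(5,3),(6,2),(7,1),(8,2),(9,3),(10,1),(11,2),(12,1),(13,2),(14,3)]) AS a, a.1 AS id, a.2 AS serial
    )
)
ORDER BY id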
I am searching for a fast way to check if an int is included in a constant, sparse set.
Consider a Unicode whitespace function:
let white_space x = x = 0x0009 || x = 0x000A || x = 0x000B || x = 0x000C || x = 0x000D || x = 0x0020 || x = 0x0085 || x = 0x00A0 || x = 0x1680
|| x = 0x2000 || x = 0x2001 || x = 0x2002 || x = 0x2003 || x = 0x2004 || x = 0x2005 || x = 0x2006 || x = 0x2007 || x = 0x2008
|| x = 0x2009 || x = 0x200A || x = 0x2028 || x = 0x2029 || x = 0x202F || x = 0x205F || x = 0x3000
What ocamlopt generates looks like this:
.L162:
cmpq $19, %rax
jne .L161
movq $3, %rax
ret
.align 4
.L161:
cmpq $21, %rax
jne .L160
movq $3, %rax
ret
.align 4
.L160:
cmpq $23, %rax
jne .L159
movq $3, %rax
ret
.align 4
...
I microbenchmarked this code using the following benchmark:
let white_space x = x = 0x0009 || x = 0x000A || x = 0x000B || x = 0x000C || x = 0x000D || x = 0x0020 || x = 0x0085 || x = 0x00A0 || x = 0x1680
|| x = 0x2000 || x = 0x2001 || x = 0x2002 || x = 0x2003 || x = 0x2004 || x = 0x2005 || x = 0x2006 || x = 0x2007 || x = 0x2008
|| x = 0x2009 || x = 0x200A || x = 0x2028 || x = 0x2029 || x = 0x202F || x = 0x205F || x = 0x3000
open Core.Std
open Core_bench.Std
let ws = [| 0x0009 ;0x000A ;0x000B ;0x000C ;0x000D ;0x0020 ;0x0085 ;0x00A0 ;0x1680
;0x2000 ;0x2001 ;0x2002 ;0x2003 ;0x2004 ;0x2005 ;0x2006 ;0x2007 ;0x2008
;0x2009 ;0x200A ;0x2028 ;0x2029 ;0x202F ;0x205F ;0x3000 |]
let rec range a b =
if a >= b then []
else a :: range (a + 1) b
let bench_space n =
Bench.Test.create (fun() -> ignore ( white_space ws.(n) ) ) ~name:(Printf.sprintf "checking whitespace (%x)" (n))
let tests : Bench.Test.t list =
List.map (range 0 (Array.length ws)) bench_space
let () =
tests
|> Bench.make_command
|> Command.run
The benchmark yields:
Estimated testing time 2.5s (25 benchmarks x 100ms). Change using -quota SECS.
┌──────────────────────────┬──────────┬────────────┐
│ Name │ Time/Run │ Percentage │
├──────────────────────────┼──────────┼────────────┤
│ checking whitespace (0) │ 4.05ns │ 18.79% │
│ checking whitespace (1) │ 4.32ns │ 20.06% │
│ checking whitespace (2) │ 5.40ns │ 25.07% │
│ checking whitespace (3) │ 6.63ns │ 30.81% │
│ checking whitespace (4) │ 6.83ns │ 31.71% │
│ checking whitespace (5) │ 8.13ns │ 37.77% │
│ checking whitespace (6) │ 8.28ns │ 38.46% │
│ checking whitespace (7) │ 8.98ns │ 41.72% │
│ checking whitespace (8) │ 10.08ns │ 46.81% │
│ checking whitespace (9) │ 10.43ns │ 48.44% │
│ checking whitespace (a) │ 11.49ns │ 53.38% │
│ checking whitespace (b) │ 12.71ns │ 59.04% │
│ checking whitespace (c) │ 12.94ns │ 60.08% │
│ checking whitespace (d) │ 14.03ns │ 65.16% │
│ checking whitespace (e) │ 14.38ns │ 66.77% │
│ checking whitespace (f) │ 15.09ns │ 70.06% │
│ checking whitespace (10) │ 16.15ns │ 75.00% │
│ checking whitespace (11) │ 16.67ns │ 77.43% │
│ checking whitespace (12) │ 17.59ns │ 81.69% │
│ checking whitespace (13) │ 18.66ns │ 86.68% │
│ checking whitespace (14) │ 19.02ns │ 88.35% │
│ checking whitespace (15) │ 20.10ns │ 93.36% │
│ checking whitespace (16) │ 20.49ns │ 95.16% │
│ checking whitespace (17) │ 21.42ns │ 99.50% │
│ checking whitespace (18) │ 21.53ns │ 100.00% │
└──────────────────────────┴──────────┴────────────┘
So I am basically limited to around 100 MB/s, which is not too bad, but still around an order of magnitude slower than the lexers of e.g. gcc. Since OCaml is a "you get what you ask for" language, I guess I cannot expect the compiler to optimize this, but is there a general technique that would improve it?
This is shorter and appears to run in nearly constant time:
let white_space2 = function
| 0x0009 | 0x000A | 0x000B | 0x000C | 0x000D | 0x0020 | 0x0085 | 0x00A0 | 0x1680
| 0x2000 | 0x2001 | 0x2002 | 0x2003 | 0x2004 | 0x2005 | 0x2006 | 0x2007 | 0x2008
| 0x2009 | 0x200A | 0x2028 | 0x2029 | 0x202F | 0x205F | 0x3000 -> true
| _ -> false
Gives:
┌──────────────────────────┬──────────┬────────────┐
│ Name │ Time/Run │ Percentage │
├──────────────────────────┼──────────┼────────────┤
│ checking whitespace (0) │ 5.98ns │ 99.76% │
│ checking whitespace (1) │ 5.98ns │ 99.76% │
│ checking whitespace (2) │ 5.98ns │ 99.77% │
│ checking whitespace (3) │ 5.98ns │ 99.78% │
│ checking whitespace (4) │ 6.00ns │ 100.00% │
│ checking whitespace (5) │ 5.44ns │ 90.69% │
│ checking whitespace (6) │ 4.89ns │ 81.62% │
│ checking whitespace (7) │ 4.89ns │ 81.62% │
│ checking whitespace (8) │ 4.90ns │ 81.63% │
│ checking whitespace (9) │ 5.44ns │ 90.68% │
│ checking whitespace (a) │ 5.44ns │ 90.70% │
│ checking whitespace (b) │ 5.44ns │ 90.67% │
│ checking whitespace (c) │ 5.44ns │ 90.67% │
│ checking whitespace (d) │ 5.44ns │ 90.69% │
│ checking whitespace (e) │ 5.44ns │ 90.69% │
│ checking whitespace (f) │ 5.44ns │ 90.69% │
│ checking whitespace (10) │ 5.44ns │ 90.73% │
│ checking whitespace (11) │ 5.44ns │ 90.69% │
│ checking whitespace (12) │ 5.44ns │ 90.71% │
│ checking whitespace (13) │ 5.44ns │ 90.69% │
│ checking whitespace (14) │ 4.90ns │ 81.67% │
│ checking whitespace (15) │ 4.89ns │ 81.61% │
│ checking whitespace (16) │ 4.62ns │ 77.08% │
│ checking whitespace (17) │ 5.17ns │ 86.14% │
│ checking whitespace (18) │ 4.62ns │ 77.09% │
└──────────────────────────┴──────────┴────────────┘