I am trying to boost the performance of a query that has a WHERE clause on a UInt8 column which only contains 0 or 1 as possible values. I tried to break the problem down to make sure that no other factors (partition, primary key, ...) are involved. I created a simple table index_text with a single column and a set of indices like this:
CREATE TABLE default.index_text (
    `columnX` UInt8,
    INDEX indexX1 columnX TYPE minmax GRANULARITY 1,
    INDEX indexX2 columnX TYPE set(0) GRANULARITY 1,
    INDEX indexX3 columnX TYPE set(1) GRANULARITY 1
) ENGINE = MergeTree()
ORDER BY tuple()
SETTINGS index_granularity = 8192
After that, I populate the table with ~25 million random values (0 or 1).
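For reference, something along these lines generates that kind of data (the exact INSERT is not the point here):

INSERT INTO default.index_text
SELECT toUInt8(rand() % 2)
FROM numbers(25000000)

I would expect the indices to drop granules on this query, but this is not the case: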
SELECT COUNT(*) FROM index_text WHERE columnX = 0
SELECT COUNT(*)
FROM index_text
WHERE columnX = 0
[JWDebian] 2020.10.19 07:48:26.511085 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Debug> executeQuery: (from [::1]:40088) SELECT COUNT(*) FROM index_text WHERE columnX = 0
[JWDebian] 2020.10.19 07:48:26.511384 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Trace> ContextAccess (default): Access granted: SELECT(columnX) ON default.index_text
[JWDebian] 2020.10.19 07:48:26.511440 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Debug> default.index_text (SelectExecutor): Key condition: unknown
[JWDebian] 2020.10.19 07:48:26.512611 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Debug> default.index_text (SelectExecutor): Index `indexX1` has dropped 0 / 3050 granules.
[JWDebian] 2020.10.19 07:48:26.522601 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Debug> default.index_text (SelectExecutor): Index `indexX2` has dropped 0 / 3050 granules.
[JWDebian] 2020.10.19 07:48:26.523699 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Debug> default.index_text (SelectExecutor): Index `indexX3` has dropped 0 / 3050 granules.
[JWDebian] 2020.10.19 07:48:26.523722 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Debug> default.index_text (SelectExecutor): Selected 1 parts by date, 1 parts by key, 3050 marks by primary key, 3050 marks to read from 1 ranges
[JWDebian] 2020.10.19 07:48:26.523764 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Trace> default.index_text (SelectExecutor): Reading approx. 24985600 rows with 2 streams
[JWDebian] 2020.10.19 07:48:26.523823 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[JWDebian] 2020.10.19 07:48:26.525061 [ 620 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Trace> AggregatingTransform: Aggregating
[JWDebian] 2020.10.19 07:48:26.525087 [ 620 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Trace> Aggregator: Aggregation method: without_key
[JWDebian] 2020.10.19 07:48:26.530850 [ 621 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Trace> AggregatingTransform: Aggregating
[JWDebian] 2020.10.19 07:48:26.530893 [ 621 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Trace> Aggregator: Aggregation method: without_key
[JWDebian] 2020.10.19 07:48:26.598438 [ 620 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Trace> AggregatingTransform: Aggregated. 6509826 to 1 rows (from 6.21 MiB) in 0.074525217 sec. (87350648.03635526 rows/sec., 83.30 MiB/sec.)
[JWDebian] 2020.10.19 07:48:26.598976 [ 621 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Trace> AggregatingTransform: Aggregated. 6109074 to 1 rows (from 5.83 MiB) in 0.075064427 sec. (81384408.62274216 rows/sec., 77.61 MiB/sec.)
[JWDebian] 2020.10.19 07:48:26.598994 [ 621 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Trace> Aggregator: Merging aggregated data
┌──COUNT()─┐
│ 12618900 │
└──────────┘
[JWDebian] 2020.10.19 07:48:26.599322 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Information> executeQuery: Read 24979658 rows, 23.82 MiB in 0.088181578 sec., 283275243 rows/sec., 270.15 MiB/sec.
[JWDebian] 2020.10.19 07:48:26.599356 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
What am I doing wrong here? Is it a conceptual misunderstanding of INDEX, or the wrong type/parameters for the indices? I am using ClickHouse server version 20.9.2 revision 54439, so I assume the allow_experimental_data_skipping_indices setting no longer matters. In desperation, I set it to 1 anyway and ran OPTIMIZE TABLE index_text FINAL after populating the table, but the results are the same.
I have a path called "qe/performance" and want to copy the "jmx" files in it to an S3 bucket in AWS. Once the files are copied, I want to use a while loop to list the converted files (.jtl) in the results folder of the S3 bucket. If the files are present (success), the script should update the ECS cluster, remove the EC2 instance to decrease the load, and send a message to the group that the files have been uploaded.
On the other hand, if the converted files are not present in the results folder of the S3 bucket, I want a TimeoutThreshold of 10 minutes: iterate, sleeping for 60 seconds each time, while waiting for the files to be uploaded to the S3 bucket. If the iterations exceed the TimeoutThreshold, the script should send an email notification that the files haven't been uploaded.
#/bin/bash
TimeoutThreshold=10
holder="qe/performance/"
echo $holder
for files in $holder; do
aws s3 cp $files s3://$JMeterFilesBucket/scripts/ --recursive
i=0
while [ true ]; do
aws s3 ls s3://$JMeterFilesBucket/results/ --recursive |grep .jtl
if [ "$?" -ne "0" ]; then \
sleep 60
iterator=$i + 1
if [ iterator -eq "$TimeoutThreshold" ]; then
aws sns publish --topic-arn ${Subscription} --message "files (HTML Reports/.jtl) failed to load to the s3 bucket: s3://$JMeterFilesBucket/results/";
else
sleep 60
fi
exit 0
else [ "$?" -eq "0" ]; then \
aws ecs update-service --cluster $Cluster --service $Service --desired-count $CURRENT_DESIRED_COUNT --region ${AWS::Region}
echo "publishing SNS Notification for the files that have been uploaded successfully"
aws sns publish --topic-arn ${Subscription} --message "files (HTML Reports/.jtl) have successfully loaded to the s3 bucket: s3://$JMeterFilesBucket/results/";
break
fi
done
done
Here is the result after running this script: it just keeps showing the list of files like this and then breaks.
Also, no email notifications are going through in either condition.
qe/performance/
21:49:50
Completed 14.7 KiB/14.7 KiB (86.3 KiB/s) with 1 file(s) remaining upload: qe/performance/ContribAccl_Dev.jmx to s3://retqa-jmeterfiles/scripts/ContribAccl_Dev.jmx
21:49:52
2019-09-11 17:41:12 1460 results/ContribAccl_Dev.jmx.jtl
21:49:52
2019-09-06 03:16:32 1653 results/Sample13.jmx.jtl
21:49:52
2019-09-09 15:19:40 163 results/Sample15.jmx.jtl
21:49:52
2019-09-11 17:41:12 1460 results/ContribAccl_Dev.jmx.jtl
21:49:52
2019-09-06 03:16:32 1653 results/Sample13.jmx.jtl
21:49:52
2019-09-09 15:19:40 163 results/Sample15.jmx.jtl
21:49:52
2019-09-11 17:41:12 1460 results/ContribAccl_Dev.jmx.jtl
21:49:52
2019-09-06 03:16:32 1653 results/Sample13.jmx.jtl
21:49:52
2019-09-09 15:19:40 163 results/Sample15.jmx.jtl
21:49:52
2019-09-11 17:41:12 1460 results/ContribAccl_Dev.jmx.jtl
21:49:52
2019-09-06 03:16:32 1653 results/Sample13.jmx.jtl
21:49:52
2019-09-09 15:19:40 163 results/Sample15.jmx.jtl
21:49:54
2019-09-11 17:41:12 1460 results/ContribAccl_Dev.jmx.jtl
21:49:54
2019-09-06 03:16:32 1653 results/Sample13.jmx.jtl
21:49:54
2019-09-09 15:19:40 163 results/Sample15.jmx.jtl
21:49:54
2019-09-11 17:41:12 1460 results/ContribAccl_Dev.jmx.jtl
21:49:54
2019-09-06 03:16:32 1653 results/Sample13.jmx.jtl
21:49:54
2019-09-09 15:19:40 163 results/Sample15.jmx.jtl
You have a typo here:
#/bin/bash
That should begin with #!, not just #.
This is wrong:
else [ "$?" -eq "0" ];
You don't put a condition after else. If you want to do another test, you have to use elif.
But there's no need for another condition. This condition is just the opposite of the first condition, so just write else and it will execute the code.
if [ "$?" -ne "0" ]; then
sleep 60
iterator=$i + 1
if [ iterator -eq "$TimeoutThreshold" ]; then
aws sns publish --topic-arn ${Subscription} --message "files (HTML Reports/.jtl) failed to load to the s3 bucket: s3://$JMeterFilesBucket/results/";
else
sleep 60
fi
exit 0
else
aws ecs update-service --cluster $Cluster --service $Service --desired-count $CURRENT_DESIRED_COUNT --region ${AWS::Region}
echo "publishing SNS Notification for the files that have been uploaded successfully"
aws sns publish --topic-arn ${Subscription} --message "files (HTML Reports/.jtl) have successfully loaded to the s3 bucket: s3://$JMeterFilesBucket/results/";
break
fi
There's also no need to escape the newline after then.
There also doesn't seem to be any point to the while loop. If the condition is true it will use exit to terminate the script, but if the condition is false it will use break to stop the loop. Either way, only the first iteration of the loop runs.
This is not the correct way to increment a variable:
iterator=$i + 1
This tries to execute the statement + 1 with the environment variable iterator set to $i. But + 1 is not a valid command. You should write:
((iterator = $i + 1))
The double parentheses make it an arithmetic expression instead of an ordinary command.
Then you need to put $ before the iterator variable when you test it:
if [ "$iterator" -eq "$TimeoutThreshold" ]; then
However, $i is always 0 because you never increment that. If the loop didn't end immediately, you could use
((i++))
at the end of the loop.
I just want to address the general construct of your if/else statement. If you get the syntax correct, you would write:
if [ "$?" -ne "0" ]; then
command_1
elif [ "$?" -eq "0" ]; then
command_2 # This can never happen
else
command_3
fi
but that is wrong. The second list of commands (the elif portion) can never run. If the command that precedes this fails, command_1 is executed and neither command_2 nor command_3. But if the preceding command does not fail, then [ "$?" -ne "0" ] fails, setting $? to a non-zero value; [ "$?" -eq "0" ] then fails as well, because the test executed by the if portion has already changed the value of $?. Don't write if/else like this.
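Putting all of that together, a cleaned-up sketch of the script could look something like this. It keeps the variable names from the question (JMeterFilesBucket, Cluster, Service, Subscription and CURRENT_DESIRED_COUNT are assumed to be set elsewhere), and it uses a plain $AWS_REGION variable in place of the question's ${AWS::Region}, which only makes sense if the script is embedded in a CloudFormation template:

#!/bin/bash

TimeoutThreshold=10
holder="qe/performance/"

# Upload the .jmx scripts once; --recursive copies the whole directory.
aws s3 cp "$holder" "s3://$JMeterFilesBucket/scripts/" --recursive

i=0
while true; do
    # grep -q only sets the exit status, so the listing isn't printed every time.
    if aws s3 ls "s3://$JMeterFilesBucket/results/" --recursive | grep -q '\.jtl'; then
        aws ecs update-service --cluster "$Cluster" --service "$Service" \
            --desired-count "$CURRENT_DESIRED_COUNT" --region "$AWS_REGION"
        echo "publishing SNS notification for the files that have been uploaded successfully"
        aws sns publish --topic-arn "$Subscription" \
            --message "files (HTML Reports/.jtl) have successfully loaded to the s3 bucket: s3://$JMeterFilesBucket/results/"
        break
    fi

    # Not there yet: wait a minute, then give up after TimeoutThreshold attempts.
    sleep 60
    ((i++))
    if [ "$i" -ge "$TimeoutThreshold" ]; then
        aws sns publish --topic-arn "$Subscription" \
            --message "files (HTML Reports/.jtl) failed to load to the s3 bucket: s3://$JMeterFilesBucket/results/"
        exit 1
    fi
done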
I have a text file/log file in which the values are separated by a pipe symbol "|" and multiple whitespace characters.
Also, I just wanted to try it without gsub.
An example is below.
Does anyone know how to write a grok pattern to extract it for Logstash? I am very new to it. Thanks in advance.
5000| | |applicationLog |ClientLog |SystemLog |Green | |2014-01-07 11:58:48.76948 |12345 (0x1224)|1) Error 2)Sample Log | Configuration Manager
Since the number of | characters is inconsistent between fields, you can match them with .*? and extract the rest of the data with predefined grok patterns:
%{NUMBER:num}.*?%{WORD:2nd}.*?%{WORD:3rd}.*?%{WORD:4th}.*?%{WORD:5th}.*?%{TIMESTAMP_ISO8601}
which will give you,
{
"num": [
[
"5000"
]
],
"BASE10NUM": [
[
"5000"
]
],
"2nd": [
[
"applicationLog"
]
],
"3rd": [
[
"ClientLog"
]
],
"4th": [
[
"SystemLog"
]
],
"5th": [
[
"Green"
]
],
"TIMESTAMP_ISO8601": [
[
"2014-01-07 11:58:48.76948"
]
],
"YEAR": [
[
"2014"
]
],
"MONTHNUM": [
[
"01"
]
],
"MONTHDAY": [
[
"07"
]
],
"HOUR": [
[
"11",
null
]
],
"MINUTE": [
[
"58",
null
]
],
"SECOND": [
[
"48.76948"
]
],
"ISO8601_TIMEZONE": [
[
null
]
]
}
You can test it at an online grok debugger.
Since you are new to grok, you might want to read the grok filter plugin basics.
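For reference, wiring that pattern into a Logstash pipeline looks roughly like this (assuming the raw line arrives in the default message field):

filter {
  grok {
    match => { "message" => "%{NUMBER:num}.*?%{WORD:2nd}.*?%{WORD:3rd}.*?%{WORD:4th}.*?%{WORD:5th}.*?%{TIMESTAMP_ISO8601}" }
  }
}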
If you can, I'd also suggest having a look at the dissect filter, which is faster and more efficient than grok:
The Dissect filter is a kind of split operation. Unlike a regular
split operation where one delimiter is applied to the whole string,
this operation applies a set of delimiters to a string value. Dissect
does not use regular expressions and is very fast. However, if the
structure of your text varies from line to line then Grok is more
suitable. There is a hybrid case where Dissect can be used to
de-structure the section of the line that is reliably repeated and
then Grok can be used on the remaining field values with more regex
predictability and less overall work to do.
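As a rough illustration of the dissect approach, a dissect-based filter for the sample line could look something like the sketch below. The field names are made up, the mapping assumes every line has the same number of | separators as the sample, and a mutate strip is added because dissect keeps the padding whitespace inside each field:

filter {
  dissect {
    # one %{...} per pipe-separated field; %{} skips the empty/unused fields
    mapping => {
      "message" => "%{num}|%{}|%{}|%{app}|%{client}|%{system}|%{status}|%{}|%{timestamp}|%{thread}|%{msg}|%{component}"
    }
  }
  mutate {
    # trim the leading/trailing whitespace that dissect leaves in the fields
    strip => ["app", "client", "system", "status", "timestamp", "thread", "msg", "component"]
  }
}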
I've built an image for the Jetson TX2 module using Yocto. Everything went fine for a few days, but now I get this error when I try to flash the device:
Welcome to Tegra Flash
version 1.0.0
Type ? or help for help and q or quit to exit
Use ! to execute system commands
[ 0.0008 ] tegrasign_v2 --key None --getmode mode.txt
[ 0.0016 ] Assuming zero filled SBK key
[ 0.0016 ]
[ 0.0016 ] Generating RCM messages
[ 0.0023 ] tegrarcm_v2 --listrcm rcm_list.xml --chip 0x18 --download rcm mb1_recovery_prod.bin 0 0
[ 0.0030 ] RCM 0 is saved as rcm_0.rcm
[ 0.0033 ] RCM 1 is saved as rcm_1.rcm
[ 0.0033 ] List of rcm files are saved in rcm_list.xml
[ 0.0033 ]
[ 0.0033 ] Signing RCM messages
[ 0.0040 ] tegrasign_v2 --key None --list rcm_list.xml --pubkeyhash pub_key.key
[ 0.0046 ] Assuming zero filled SBK key
[ 0.0076 ]
[ 0.0076 ] Copying signature to RCM mesages
[ 0.0083 ] tegrarcm_v2 --chip 0x18 --updatesig rcm_list_signed.xml
[ 0.0093 ]
[ 0.0093 ] Parsing partition layout
[ 0.0100 ] tegraparser_v2 --pt flash.xml.tmp
[ 0.0109 ]
[ 0.0109 ] Creating list of images to be signed
[ 0.0116 ] tegrahost_v2 --chip 0x18 --partitionlayout flash.xml.bin --list images_list.xml zerosbk
[ 0.0124 ] Stat for tegra186-quill-p3310-1000-c03-00-base.dtb failed
[ 0.0161 ]
Error: Return value 4
Command tegrahost_v2 --chip 0x18 --partitionlayout flash.xml.bin --list images_list.xml zerosbk
Does this error ring a bell to anyone?
I am able to flash the board with JetPack.
Thanks,
-Damien
Just in case you never figured this out, it looks like
[ 0.0124 ] Stat for tegra186-quill-p3310-1000-c03-00-base.dtb failed
is the real error. Fix that and you should be good.
The problem lies where I do the score edits. $numOrder is equivalent to an ordered array of $c1, $c2, $c3, but I can't seem to use its elements as hooks for (click:). The error message says it expects a ?this or a string. Can a click hook be a number?
(print: "$temp <br/>")
<b>(print: "$c1 $c2 $c3")</b>
(set: $numOrder to (sorted:...(a:$c1,$c2,$c3)))
(if: $temp is $Bigger)
[
(click:$numOrder's 1st)[(set:$score to $score-0.5)]
(click:$numOrder's 2nd)[(set:$score to $score-0.5)]
(click:$numOrder's 3rd)[(set:$score to $score+1)]
]
(elseif:$temp is $Smaller)
[
(click:$numOrder's 1st)[(set:$score to $score+1)]
(click:$numOrder's 2nd)[(set:$score to $score-0.5)]
(click:$numOrder's 3rd)[(set:$score to $score-0.5)]
]
(else:)
[
(click:$numOrder's 1st)[(set:$score to $score-0.5)]
(click:$numOrder's 2nd)[(set:$score to $score+1)]
(click:$numOrder's 3rd)[(set:$score to $score-0.5)]
]
]
(else:)
[
(print:"<br\><br\>Well that was easy!")
(stop:)
]
]}
I am interested in finding symbolic solutions to, and expansions of, matrix products and inversions. Actually, it is something I would like to define myself. Let me explain.
I want to create a "mathematical" object that I will call B4MAT, representing a square matrix whose elements are 4 square half-sized matrices. I then want to define the product of two B4MATs as another B4MAT whose components are calculated by applying the usual product rules, but among matrices rather than scalars.
Furthermore, and this is a very important point, consider blockwise inversion of a matrix. I want to define the inversion of a B4MAT as an operation that returns another B4MAT whose elements are calculated using the blockwise inversion algorithm in the link.
How can I achieve this in Maxima?
Thank you
For the first half of your question, you just need to change matrix_element_mult to non-commutative multiplication and then use a matrix whose elements are the blocks you want. For example:
Maxima branch_5_27_base_248_ge261c5e http://maxima.sourceforge.net
using Lisp SBCL 1.0.57.0.debian
Distributed under the GNU Public License. See the file COPYING.
Dedicated to the memory of William Schelter.
The function bug_report() provides bug reporting information.
(%i1) A: matrix([1,2],[3,4])$ B: matrix([2,1],[3,4])$
(%i3) matrix([A,B], [B,A]);
*** output flushed ***
(%i4) C: matrix([A,B], [B,A]);
[ [ 1 2 ] [ 2 1 ] ]
[ [ ] [ ] ]
[ [ 3 4 ] [ 3 4 ] ]
(%o4) [ ]
[ [ 2 1 ] [ 1 2 ] ]
[ [ ] [ ] ]
[ [ 3 4 ] [ 3 4 ] ]
(%i5) C . C;
[ [ 5 5 ] [ 4 4 ] ]
[ [ ] [ ] ]
[ [ 18 32 ] [ 18 32 ] ]
(%o5) [ ]
[ [ 4 4 ] [ 5 5 ] ]
[ [ ] [ ] ]
[ [ 18 32 ] [ 18 32 ] ]
(%i6) matrix_element_mult: ".";
(%o6) .
(%i7) C . C;
[ [ 14 16 ] [ 13 17 ] ]
[ [ ] [ ] ]
[ [ 33 41 ] [ 33 41 ] ]
(%o7) [ ]
[ [ 13 17 ] [ 14 16 ] ]
[ [ ] [ ] ]
[ [ 33 41 ] [ 33 41 ] ]
I think you have to code up the inversion formula yourself, though. (Don't forget that you can get at the blocks with expressions like C[1][2] for the top right corner, etc.)
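For that part, a minimal sketch might look like the following. The function name block_invert is mine, not a Maxima built-in; it implements the standard 2x2 blockwise inversion via the Schur complement and assumes both the top-left block and the Schur complement are invertible.

/* Blockwise inverse of a 2x2 block matrix M = matrix([A11, A12], [A21, A22]),
   where the four entries are themselves square matrices of the same size.
   Uses the Schur complement S = A22 - A21 . A11^-1 . A12; assumes that
   A11 and S are both invertible. */
block_invert(M) := block(
  [A11, A12, A21, A22, Ai, S, Si],
  A11 : M[1][1],  A12 : M[1][2],
  A21 : M[2][1],  A22 : M[2][2],
  Ai : invert(A11),
  S  : A22 - A21 . Ai . A12,
  Si : invert(S),
  matrix([Ai + Ai . A12 . Si . A21 . Ai, -(Ai . A12 . Si)],
         [-(Si . A21 . Ai),              Si]) )$

The result is again an ordinary Maxima matrix whose entries are matrices, so it can be multiplied with other block matrices as shown above (with matrix_element_mult still set to ".").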