I have the following table and data:
create table sensor_values(
dt DateTime default now(),
value UInt32
)
engine MergeTree()
partition by toYYYYMM(dt)
order by tuple();
insert into sensor_values(value) values (1), (2), (11), (13), (4), (17), (5), (8);
Data:
value
-----
1
2
11
13
4
17
5
8
I would like to select the rows in the range from the first bad value (11) to the last bad value (17). A bad value is any value greater than 10.
Desired range after select:
value
-----
11
13
4
17
My first thought was to flag whether each value is bad or not and then to calculate (somehow) a cumulative sum:
value isBad cumSum
--------------------
1 0 0
2 0 0
11 1 1
13 1 2
4 0 2
17 1 3
5 0 3
8 0 3
Then I would select the rows from min(cumSum) to max(cumSum) - 1, but that misses the last bad value.
How can I get the last bad value included in the select result?
You can try either the window functions (see runningDifference, neighbor) or the array functions:
SELECT arrayJoin(slice) as result
FROM (
SELECT
groupArray(data) AS arr,
arrayFirstIndex(x -> (x > 10), arr) AS first_index,
(length(arr) - arrayFirstIndex(x -> (x > 10), arrayReverse(arr)) + 1) AS last_index,
arraySlice(arr, first_index, last_index - first_index + 1) AS slice
FROM
(
/* test dataset */
SELECT arrayJoin([1, 2, 11, 13, 4, 17, 5, 8]) AS data
)
)
/*
┌─result─┐
│ 11 │
│ 13 │
│ 4 │
│ 17 │
└────────┘
*/
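The same first-to-last slice can be sketched outside ClickHouse. The Python below only illustrates the index arithmetic and assumes the values arrive in insertion order; the function name bad_range and the threshold parameter are made up for this example.

```python
def bad_range(values, threshold=10):
    """Return the slice from the first to the last value above threshold."""
    bad_positions = [i for i, v in enumerate(values) if v > threshold]
    if not bad_positions:
        return []  # no bad value at all, so nothing to select
    first, last = bad_positions[0], bad_positions[-1]
    return values[first:last + 1]  # +1 so the last bad value is included


print(bad_range([1, 2, 11, 13, 4, 17, 5, 8]))  # [11, 13, 4, 17]
```

Taking the last bad position directly avoids the off-by-one of the cumulative-sum idea, where the final bad row and the good rows after it share the same running total.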
I am not clear about these two terms, block and granule.
Does one block have a fixed number of rows?
Is one block the minimum unit to read from disk?
Are different blocks stored in different files?
Is the range of one block bigger than a granule? That would mean one block can span several granules of a skip index.
https://clickhouse.tech/docs/en/operations/table_engines/mergetree/#primary-keys-and-indexes-in-queries
The primary key is sparse. By default it contains 1 value for each 8192 rows (= 1 granule).
Let's disable adaptive granularity for the test by setting index_granularity_bytes=0:
create table X (A Int64)
Engine=MergeTree order by A
settings index_granularity=16,index_granularity_bytes=0;
insert into X select * from numbers(32);
With index_granularity=16, 32 rows = 2 granules, so the primary index has 2 values: 0 and 16.
select marks, primary_key_bytes_in_memory from system.parts where table = 'X';
┌─marks─┬─primary_key_bytes_in_memory─┐
│ 2 │ 16 │
└───────┴─────────────────────────────┘
16 bytes = 2 values of Int64.
Adaptive index granularity means that granule sizes vary, because wide rows (many bytes per row) need fewer than 8192 rows per granule for good performance.
index_granularity_bytes = 10MB ~ 1 KB per row * 8192 rows, so each granule holds up to roughly 10MB. If the rows are about 100 KB each (long Strings), a granule will have about 100 rows (not 8192).
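As a rough illustration of that arithmetic, here is a simplified Python model in which a granule closes at whichever limit is reached first. The function rows_per_granule is hypothetical and fixed-size rows are an assumption; the real logic in ClickHouse works on the blocks being written and differs in detail.

```python
def rows_per_granule(row_bytes, index_granularity=8192,
                     index_granularity_bytes=10 * 1024 * 1024):
    """Simplified model: a granule closes at whichever limit is hit first."""
    if index_granularity_bytes == 0:  # adaptive granularity disabled
        return index_granularity
    rows_by_size = max(1, index_granularity_bytes // row_bytes)
    return min(index_granularity, rows_by_size)


print(rows_per_granule(100))         # narrow rows: capped at 8192
print(rows_per_granule(100 * 1024))  # 100 KiB rows: 102
```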
Skip index granularity: GRANULARITY 3 means that the index stores one value for every 3 table granules.
create table X (A Int64, B Int64, INDEX IX1 (B) TYPE minmax GRANULARITY 4)
Engine=MergeTree order by A
settings index_granularity=16,index_granularity_bytes=0;
insert into X select number, number from numbers(128);
128/16 = 8, so the table has 8 granules, and INDEX IX1 stores 2 minmax values (8/4).
So the minmax index stores 2 values: (0..63) and (64..127).
0..63 points to the first 4 granules of the table.
64..127 points to the second 4 granules of the table.
set send_logs_level='debug'
select * from X where B=77
[ 84 ] <Debug> dw.X (SelectExecutor): **Index `IX1` has dropped 1 granules**
[ 84 ] <Debug> dw.X (SelectExecutor): Selected 1 parts by date, 1 parts by key, **4 marks** to read from 1 ranges
SelectExecutor checked the skip index: 4 table granules can be skipped because 77 is not in 0..63 (the log reports this as 1 dropped index granule).
The other 4 granules must be read (4 marks) because 77 is in 64..127, so any of those 4 granules may contain B=77.
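The pruning above can be reproduced with a small Python sketch. It assumes, as in the test table, that B equals the row number, so each index block covers a contiguous range. The helper marks_to_read is hypothetical and is not a ClickHouse API.

```python
def marks_to_read(n_rows, index_granularity, skip_granularity, wanted):
    """Count table granules that survive a minmax skip index on B = row number."""
    rows_per_index_block = index_granularity * skip_granularity
    surviving = 0
    for start in range(0, n_rows, rows_per_index_block):
        end = min(start + rows_per_index_block, n_rows) - 1  # inclusive max of the block
        if start <= wanted <= end:
            # Ceiling division: number of table granules covered by this index block.
            surviving += -(-(end - start + 1) // index_granularity)
    return surviving


print(marks_to_read(128, 16, 4, 77))  # 4
```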
https://clickhouse.tech/docs/en/development/architecture/#block
A block can contain any number of rows.
For example, blocks of 1 row each:
set max_block_size=1;
SELECT * FROM numbers_mt(1000000000) LIMIT 3;
┌─number─┐
│ 0 │
└────────┘
┌─number─┐
│ 2 │
└────────┘
┌─number─┐
│ 3 │
└────────┘
set max_block_size=100000000000;
create table X (A Int64) Engine=Memory;
insert into X values(1);
insert into X values(2);
insert into X values(3);
SELECT * FROM X;
┌─A─┐
│ 1 │
└───┘
┌─A─┐
│ 3 │
└───┘
┌─A─┐
│ 2 │
└───┘
3 rows in one block:
drop table X;
create table X (A Int64) Engine=Memory;
insert into X values(1)(2)(3);
select * from X
┌─A─┐
│ 1 │
│ 2 │
│ 3 │
└───┘
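As a toy model of the max_block_size cap only, the following Python splits a stream of rows into blocks. The name split_into_blocks is invented for illustration. It does not model how separate INSERTs into a Memory table each produce their own block.

```python
from itertools import islice


def split_into_blocks(rows, max_block_size):
    """Yield lists of at most max_block_size rows, like a stream of blocks."""
    it = iter(rows)
    while True:
        block = list(islice(it, max_block_size))
        if not block:
            return
        yield block


print(list(split_into_blocks(range(3), 1)))    # [[0], [1], [2]]
print(list(split_into_blocks(range(3), 100)))  # [[0, 1, 2]]
```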
I have a table in Oracle with a column that I would like to fill from a set of possible values. I'd like to assign the values in the order of the set, repeatedly, for the entire table.
For example:
If the set of values is {1, 2, 3}, I'd like to assign the values in this pattern until the last row is reached:
rowNum someCol valueCol
1 this 1
2 is 2
3 some 3
4 other 1
5 column 2
6 in 3
7 the 1
8 table 2
I can't figure out how to do this with a traditional update statement. Could anyone help with this problem?
Use the modulo function to achieve the desired result:
UPDATE TableName
SET valueCol = CASE MOD(rowNum, 3)
                 WHEN 1 THEN 1
                 WHEN 2 THEN 2
                 WHEN 0 THEN 3
               END;
update tablename
set valuecol = case mod(rownum, 3) when 0 then 3 else mod(rownum, 3) end
;
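The modulo mapping generalises to any set of values. A minimal Python sketch, assuming rows are numbered from 1 as in the example table (cyclic_values is a hypothetical helper name):

```python
def cyclic_values(row_count, values):
    """Assign values[0], values[1], ... repeatedly to rows 1..row_count."""
    return [values[(row_num - 1) % len(values)]
            for row_num in range(1, row_count + 1)]


print(cyclic_values(8, [1, 2, 3]))  # [1, 2, 3, 1, 2, 3, 1, 2]
```

Subtracting 1 before taking the remainder removes the special case for a remainder of 0 that the SQL versions handle explicitly.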
I have a table in Hive with two columns, session_id and duration_time, like this:
session_id duration
1 14
1 10
1 20
1 10
1 12
1 16
1 8
2 9
2 6
2 30
2 22
I want to add a new column with a unique id that changes when:
the session_id changes or the duration_time > 15
I want the output to be like this:
session_id duration unique_id
1 14 1
1 10 1
1 20 2
1 10 2
1 12 2
1 16 3
1 8 3
2 9 4
2 6 4
2 30 5
2 22 6
Any ideas how to do that in Hive QL?
Thanks!
SQL tables represent unordered sets. You need a column specifying the ordering of the values, because you seem to care about the ordering. This could be an id column or a created-at column, for instance.
You can do this using a cumulative sum:
select t.*,
sum(case when duration > 15 or seqnum = 1 then 1 else 0 end) over
(order by ??) as unique_id
from (select t.*,
row_number() over (partition by session_id order by ??) as seqnum
from t
) t;
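To see what the cumulative sum computes, here is a small Python sketch run on the sample data from the question. It assumes the rows are already in the intended order and that each session's rows are contiguous, so "first row of a session" is the same as "session_id changed". The function name assign_unique_ids is illustrative.

```python
def assign_unique_ids(rows, limit=15):
    """rows: (session_id, duration) tuples, already in the desired order."""
    unique_id = 0
    previous_session = object()  # sentinel that equals no real session_id
    result = []
    for session_id, duration in rows:
        # Same condition as the CASE expression: the first row of a session,
        # or a duration above the limit, starts a new group.
        if session_id != previous_session or duration > limit:
            unique_id += 1
        previous_session = session_id
        result.append((session_id, duration, unique_id))
    return result


rows = [(1, 14), (1, 10), (1, 20), (1, 10), (1, 12), (1, 16), (1, 8),
        (2, 9), (2, 6), (2, 30), (2, 22)]
print([uid for _, _, uid in assign_unique_ids(rows)])
# [1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6]
```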
I'd like to create a function where for an arbitrary integer input value (let's say unsigned 32 bit) and a given number of d digits the return value will be a d digit B base number, B being the smallest base that can be used to represent the given input on d digits.
Here is a sample input - output of what I have in mind for 3 digits:
Input Output
0 0 0 0
1 0 0 1
2 0 1 0
3 0 1 1
4 1 0 0
5 1 0 1
6 1 1 0
7 1 1 1
8 0 0 2
9 0 1 2
10 1 0 2
11 1 1 2
12 0 2 0
13 0 2 1
14 1 2 0
15 1 2 1
16 2 0 0
17 2 0 1
18 2 1 0
19 2 1 1
20 0 2 2
21 1 2 2
22 2 0 2
23 2 1 2
24 2 2 0
25 2 2 1
26 2 2 2
27 0 0 3
28 0 1 3
29 1 0 3
30 1 1 3
.. .....
The assignment should be 1:1, for each input value there should be exactly one, unique output value. Think of it as if the function should return the nth value from the list of strangely sorted B base numbers.
Actually this is the only approach I could come up so far with - given an input value, generate all the numbers in the smallest possible B base to represent the input on d digits, then apply a custom sorting to the results ('penalizing' the higher digit values and putting them further back in the sort), and return the nth value from the sorted array. This would work, but is a spectacularly inefficient implementation - I'd like to do this without generating all the numbers up to the input value.
What would be an efficient approach for implementing this function? Any language or pseudocode is fine.
MBo's answer shows how to find the smallest base that will represent an integer number with a given number of digits.
I'm not quite sure about the ordering in your example. My answer is based on a different ordering: Create all possible n-digit numbers up to base b (e.g. all numbers up to 999 for max. base 10 and 3 digits). Sort them according to their maximum digit first. Numbers are sorted normally within a group with the same maximum digit. This retains the characteristic that all values from 8 to 26 must be base 3, but the internal ordering is different:
8 0 0 2
9 0 1 2
10 0 2 0
11 0 2 1
12 0 2 2
13 1 0 2
14 1 1 2
15 1 2 0
16 1 2 1
17 1 2 2
18 2 0 0
19 2 0 1
20 2 0 2
21 2 1 0
22 2 1 1
23 2 1 2
24 2 2 0
25 2 2 1
26 2 2 2
When your base is two, life is easy: Just generate the appropriate binary number.
For other bases, let's look at the first digit. In the example above, five numbers start with 0, five start with 1 and nine start with 2. When the first digit is 2, the maximum digit is assured to be 2. Therefore, we can combine 2 with all 9 two-digit numbers of base 3.
When the first digit is smaller than the maximum digit in the group, we can combine it with the 9 two-digit numbers of base 3, but we must not use the 4 two-digit numbers that coincide with the 4 two-digit numbers of base 2. That gives us five possibilities for each of the digits 0 and 1. These possibilities (02, 12, 20, 21 and 22) can be described as the unique numbers with two digits according to the same scheme, but with an offset:
4 0 2
5 1 2
6 2 0
7 2 1
8 2 2
That leads to a recursive solution:
for one digit, just return the number itself;
for base two, return the straightforward representation in base 2;
if the first digit is the maximum digit for the determined base, combine it with a straightforward representation in that base;
otherwise combine it with a recursively determined representation of the same algorithm with one fewer digit.
Here's an example in Python. The representation is returned as list of numbers, so that you can represent 2^32 − 1 as [307, 1290, 990].
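def repres(x, ndigit, base):
    """Straightforward representation of x in given base, most significant digit first"""
    s = []
    while ndigit:
        s.append(x % base)
        x //= base              # integer division
        ndigit -= 1
    return s[::-1]              # digits were collected least significant first

def min_base(x, ndigit):
    """Smallest base b with b ** ndigit > x, corrected for float rounding"""
    base = int(round(x ** (1.0 / ndigit)))
    while base ** ndigit <= x:
        base += 1
    while base > 1 and (base - 1) ** ndigit > x:
        base -= 1
    return base

def encode(x, ndigit):
    """Encode according to min-base, fixed-digit order"""
    if ndigit <= 1:
        return [x]
    base = min_base(x, ndigit)
    if base <= 2:
        return repres(x, ndigit, 2)
    x0 = (base - 1) ** ndigit           # first value that needs this base
    nprev = (base - 1) ** (ndigit - 1)  # shorter numbers already used in base - 1
    ncurr = base ** (ndigit - 1)        # shorter numbers available in this base
    ndiff = ncurr - nprev               # new shorter numbers per smaller leading digit
    area = (x - x0) // ndiff            # candidate leading digit
    if area < base - 1:
        xx = x0 // (base - 1) + x - x0 - area * ndiff
        return [area] + encode(xx, ndigit - 1)
    xx0 = x0 + (base - 1) * ndiff
    return [base - 1] + repres(x - xx0, ndigit - 1, base)

for x in range(32):
    r = encode(x, 3)
    print(x, r)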
Assuming that all values are positive, let's do some simple math:
A d-digit number in base B can hold the value N if
B^d > N
so
B > N^(1/d)
So calculate N^(1/d), round it up (increment if it is an integer), and you will get the smallest base B.
(Note that numerical errors might occur.)
Examples:
d=2, N=99 => 9.95 => B=10
d=2, N=100 => 10 => B=11
d=2, N=57 => 7.55 => B=8
d=2, N=33 => 5.74 => B=6
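One way to sidestep the numerical errors is to treat the floating-point root only as a first guess and correct it with exact integer arithmetic. A minimal Python sketch, assuming N >= 0, d >= 1 and a minimum base of 2 (the name smallest_base is made up here):

```python
def smallest_base(n, d):
    """Smallest integer B >= 2 with B**d > n, for n >= 0 and d >= 1."""
    base = max(2, round(n ** (1.0 / d)))     # floating-point first guess
    while base ** d <= n:                    # guess too small: step up
        base += 1
    while base > 2 and (base - 1) ** d > n:  # guess too large: step down
        base -= 1
    return base


for n in (99, 100, 57, 33):
    print(n, smallest_base(n, 2))  # 10, 11, 8, 6 respectively
```

The two correction loops only ever compare exact integer powers, so a root that evaluates slightly below or above the true value does not change the result.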
Delphi code
function GetInSmallestBase(N, d: UInt32): string;
const
Digits = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
var
Base, i: Byte;
begin
Base := Ceil(Power(N, 1/d) + 1.0E-12);
if Base > 36 then
Exit('Big number, few digits...');
SetLength(Result, d);
for i := d downto 1 do begin
Result[i] := Digits[1 + N mod Base]; //Delphi string is 1-based
N := N div Base;
end;
Result := Result + Format(' : base [%d]', [Base]);
end;
begin
Memo1.Lines.Add(GetInSmallestBase(99, 2));
Memo1.Lines.Add(GetInSmallestBase(100, 2));
Memo1.Lines.Add(GetInSmallestBase(987, 2));
Memo1.Lines.Add(GetInSmallestBase(1987, 2));
Memo1.Lines.Add(GetInSmallestBase(87654321, 6));
Memo1.Lines.Add(GetInSmallestBase(57, 2));
Memo1.Lines.Add(GetInSmallestBase(33, 2));
99 : base [10]
91 : base [11]
UR : base [32]
Big number, few digits...
H03LL7 : base [22]
71 : base [8]
53 : base [6]
Hi, what I need to do is create a select statement that outputs, for each letter, the number of rows whose field starts with that letter, so the output would look something like:
A,12
B,0
C,20
D,14
E,0
etc.
The table is called contacts. In the example above there were 12 occurrences of people whose names begin with the letter A.
I hope I have explained this correctly.
Let's understand this with an example on the EMP table.
with
letters
as
(select chr( ascii('A')+level-1 ) letter
from dual
connect by level <= 26
)
SELECT substr(ename, 1, 1) AS init_name,
count(*) cnt
FROM emp
WHERE substr(ename, 1, 1) IN (SELECT letter from letters)
GROUP BY substr(ename, 1, 1)
UNION
SELECT l.letter AS init_name,
0 cnt
FROM letters l
WHERE l.letter NOT IN (SELECT substr(ename, 1, 1) FROM emp)
ORDER BY init_name
/
I CNT
- ----------
A 2
B 1
C 1
D 0
E 0
F 1
G 0
H 0
I 0
J 2
K 1
L 0
M 2
N 0
O 0
P 0
Q 0
R 0
S 2
T 1
U 0
V 0
W 1
X 0
Y 0
Z 0
26 rows selected.
So it gives the count of names starting with each letter, and for the letters that no name starts with, the count is 0.
Generate the 26 letters using CONNECT BY, then left join to the first letter of the name and count them:
select letter, count(name) count
from (select chr(ascii('A')+level-1) letter from dual connect by level < 27) l
left join emp on substr(name, 1, 1) = letter
group by letter order by 1
See SQLFiddle
Attribution: My technique of generating letters uses elements of Lalit's answer.
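For comparison, the "generate all letters, then left join and count" idea can be sketched in Python. Like the SQL, this is case-sensitive and only reports the 26 uppercase letters. The function initial_counts and the sample names are illustrative assumptions.

```python
from collections import Counter
from string import ascii_uppercase


def initial_counts(names):
    """Count names per first letter, reporting 0 for unused letters."""
    counts = Counter(name[0] for name in names if name)
    # Iterating over every letter plays the role of the left join:
    # letters with no matching name still appear, with a count of 0.
    return [(letter, counts[letter]) for letter in ascii_uppercase]


sample_names = ["ALLEN", "ADAMS", "BLAKE"]  # illustrative data only
print(initial_counts(sample_names)[:3])  # [('A', 2), ('B', 1), ('C', 0)]
```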