I have a disconnected table called TableX with two columns and a single row of data:
Date | Amount
4-Jan | 10.5
There will always only be 1 row of data.
The model it is part of is huge and the amount 10.5 will be used extensively in measures.
To extract 10.5 from this table I've used the following:
AmountValue := MIN(TableX[Amount])
Is MIN the fastest way to extract this value?
Here is a test from DAX Studio (5 cold + 5 hot cache executions) for a disconnected table containing only one row, as in your example:
DEFINE
    MEASURE 'TableX'[AmountValue3] = ALL ( TableX[Amount] )
EVALUATE
    { [AmountValue3] }

DEFINE
    MEASURE 'TableX'[AmountValue2] = VALUES ( TableX[Amount] )
EVALUATE
    { [AmountValue2] }

DEFINE
    MEASURE 'TableX'[AmountValue] = MIN ( TableX[Amount] )
EVALUATE
    { [AmountValue] }
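For completeness, another common pattern for reading a single value from a one-row table is SELECTEDVALUE; it was not part of the benchmark above, so treat its timing as untested here:

DEFINE
    // returns BLANK() unless exactly one value is in context
    MEASURE 'TableX'[AmountValue4] = SELECTEDVALUE ( TableX[Amount] )
EVALUATE
    { [AmountValue4] }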
First off, I admit that I am not sure whether what I am trying to achieve is possible (or even logical). Still, I am putting forth this query (and if nothing else, I can at least be told that I need to redesign my table structure / business logic).
In a table (myValueTable) I have the following records:
Item | article | from_date | to_date | myStock
1 | Paper | 01/04/2021 | 31/12/9999 | 100
2 | Tray | 12/04/2021 | 31/12/9999 | 12
3 | Paper | 28/04/2021 | 31/12/9999 | 150
4 | Paper | 06/05/2021 | 31/12/9999 | 130
As part of the underlying process, I need to find the value of the field myStock as on a particular date, say 30/04/2021 (assuming no inward / outward stock movement in the interim).
To that end, I have the following values:
varRefDate = 30/04/2021
varArticle = "Paper"
And my query goes something like this:
get_value = myValueTable.objects.filter(from_date__lte=varRefDate, to_date__gte=varRefDate).get(article=varArticle).myStock
which should translate to:
get_value = SELECT myStock FROM myValueTable WHERE article = varArticle AND varRefDate BETWEEN from_date AND to_date
But with this I am coming up with more than one result (actually THREE!).
How do I restrict the query result to get ONLY the 3rd instance, i.e. the one with the value "150" (for article = "Paper")?
NOTE: The upper limit of date range (to_date) is being kept constant at 31/12/9999.
Edit
Solved it, in a roundabout manner. Instead of .get(), I generated a values_list() with the fields from_date and myStock. Using the count of objects returned, I appended to a list the date difference between from_date and the reference date (30/04/2021) together with the value of myStock, then sorted the list ascending. The first tuple in the sorted list has the smallest date difference and the corresponding myStock value, which is the value I was searching for. Tested and works.
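For reference, the same result can usually be obtained in a single query by ordering the matching rows by from_date descending and taking the first one. A minimal sketch, assuming the model and variable names above:

get_value = (
    myValueTable.objects
    .filter(article=varArticle,
            from_date__lte=varRefDate,
            to_date__gte=varRefDate)
    .order_by('-from_date')            # latest from_date on or before the ref date
    .values_list('myStock', flat=True)
    .first()                           # returns None if nothing matches
)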
I have a dataset which I want to reshape in order to perform a repeated measures ANOVA. My dataset is of the form:
set.seed(32)
library(tibble)
id <- rep(1:2, each = 3)
y_0 <- rep(rnorm(2, mean = 50, sd = 10), each = 3)
time <- rep(c(1, 2, 3), times = 2)
c <- rep(rnorm(2, mean = 10, sd = 12), each = 3)
data <- tibble(id, y_0, time, c)
I want to bring the dataset into the form required for repeated measures ANOVA, meaning I want only one row per id and three more columns: one for y + c at time 1 named y_1, one for y + c at time 2 named y_2, and one for y + c at time 3 named y_3. Can anyone provide some assistance?
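A minimal sketch of one way to do this with dplyr and tidyr (assuming "y" refers to the y_0 column generated above):

library(dplyr)
library(tidyr)

wide <- data %>%
  mutate(y_t = y_0 + c) %>%            # y + c at each time point
  select(id, time, y_t) %>%
  pivot_wider(names_from = time,
              values_from = y_t,
              names_prefix = "y_")
# wide has one row per id, with columns y_1, y_2, y_3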
I see that ClickHouse created multiple directories for each partition key.
The documentation says the directory name format is: partition name, minimum data block number, maximum data block number, and chunk level. For example, the directory name 201901_1_11_1.
I think this means the directory is a part which belongs to partition 201901, contains blocks 1 to 11, and is on level 1. So we can have another part whose directory is named 201901_12_21_1, which means that part belongs to partition 201901, contains blocks 12 to 21, and is on level 1.
So I think a partition is split into different parts.
Am I right?
Parts are pieces of a table which store rows. One part = one folder with column files.
Partitions are virtual entities. They don't have a physical representation, but you can say that certain parts belong to the same partition.
SELECT does not care about partitions.
SELECT is not aware of partitioning keys, because each part has special files minmax_{PARTITIONING_KEY_COLUMN}.idx.
These files contain the min and max values of those columns in that part.
These minmax values are also kept in memory, in a (C++ vector) list of parts.
create table X (A Int64, B Date, K Int64, C String)
Engine = MergeTree partition by (A, toYYYYMM(B)) order by K;
insert into X values (1, today(), 1, '1');
cd /var/lib/clickhouse/data/default/X/1-202002_1_1_0/
ls -1 *.idx
minmax_A.idx <-----
minmax_B.idx <-----
primary.idx
SET send_logs_level = 'debug';
select * from X where A = 555;
(SelectExecutor): MinMax index condition: (column 0 in [555, 555])
(SelectExecutor): Selected 0 parts by date
SelectExecutor checked the in-memory part list and found 0 parts, because minmax_A.idx = (1, 1) while this select needed (555, 555).
CH does not store partitioning key values.
So for example toYYYYMM(today()) = 202002, but this 202002 is not stored in the part or anywhere else.
minmax_B.idx stores (18302, 18302), because Dates are stored as day numbers (2020-02-10 == select toInt16(today())).
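To see the part-to-partition mapping directly, you can query the standard system.parts table (not part of the original answer):

SELECT partition, name, min_block_number, max_block_number, level
FROM system.parts
WHERE table = 'X' AND active;
-- each row is one active part; 'partition' shows which partition it belongs to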
In my case, I used groupArray() and arrayEnumerate() for ranking in a materialized view created with POPULATE. I thought POPULATE would run the query over the new data per partition (in my case: toStartOfDay(Date)); the total sum of the newly inserted data is correct, but groupArray() does not work correctly.
I think this happens because when one part is inserted, CH runs groupArray() and the ranking on that part immediately and only then merges the parts within a partition, so I don't get exactly the final result of the groupArray() and arrayEnumerate() functions.
In summary, after a merge:
[groupArray(part_1) + groupArray(part_2)] is different from
groupArray(Partition)
where Partition = part_1 + part_2
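A minimal, self-contained sketch of the situation (hypothetical table and values, for illustration only):

create table T (d Date, v Int64) Engine = MergeTree partition by d order by v;
insert into T values ('2021-01-01', 1), ('2021-01-01', 2);  -- becomes part_1
insert into T values ('2021-01-01', 3), ('2021-01-01', 4);  -- becomes part_2
-- A materialized view fed by these inserts computes groupArray over
-- [1, 2] and [3, 4] separately, one block at a time; the concatenation of
-- those two arrays is not guaranteed to equal groupArray computed over the
-- merged partition in one pass.
select groupArray(v) from T;  -- computed over all parts at query time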
The solution I tried is to insert the new data as one block, i.e. using groupArray() to reduce the new data to fewer rows than max_insert_block_size = 1048576. That works correctly, but it is hard to insert one day of new data as a single part, because querying one day of data while populating uses too much memory (almost 150M-200M rows).
Do you have another solution for POPULATE with groupArray() on newly inserted data, such as forcing CH to run POPULATE on each partition rather than on each part, after all the parts have been merged into one partition?
I have a situation where I need to find the max value of 3 calculated fields and store it in another field. Is it possible to do this in one SQL query? Below is an example:
SELECT
    Income1, Income1 * 2% AS Personal_Income,
    Income2, Income2 * 10% AS Share_Income,
    Income3, Income3 * 1% AS Job_Income,
    MAX(Personal_Income, Share_Income, Job_Income)
FROM Table
One way I tried is to calculate Personal_Income, Share_Income, and Job_Income in a first pass, and in a second pass I used:
SELECT CASE
           WHEN Personal_Income > Share_Income AND Personal_Income > Job_Income THEN Personal_Income
           WHEN Share_Income > Job_Income THEN Share_Income
           ELSE Job_Income
       END AS greatest_income
but this requires two scans of a billion-row table. How can I avoid that and do it in a single pass? Any help much appreciated.
You can use GREATEST, which returns the greatest value among multiple columns on a given row.
SELECT GREATEST(Income1 * 1.02, Income2 * 1.1, Income3 * 1.01) AS greatest_income
FROM Table
Note that this is not an aggregate function, so other columns can be included in the select as needed.
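For instance, keeping the original columns alongside the computed maximum still takes a single scan (same hypothetical table and rates as above):

SELECT
    Income1,
    Income2,
    Income3,
    GREATEST(Income1 * 1.02, Income2 * 1.1, Income3 * 1.01) AS greatest_income
FROM Table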
I am trying to use power pivot to analyze a stock portfolio at any point in time.
The data model is:
transactions table with buy and sell transactions
historical_prices table with the closing price of each stock
security_lookup table with the symbol and other information about the stock (whether it’s a mutual fund, industry, large cap, etc.).
One-to-many relationships link the symbol column in security_lookup to the transactions and historical_prices tables.
I am able to get the cost basis to work correctly with SUMX(transactions, transactions[quantity] * transactions[price]). However, I'm not able to get the current value of my holdings. I have a measure called "Current Price" which finds the most recent closing price:
Current Price :=
CALCULATE (
    LASTNONBLANK ( Historical_prices[close], MIN ( Historical_prices[close] ) ),
    FILTER (
        Historical_prices,
        Historical_prices[date] = LASTDATE ( Historical_prices[date] )
    )
)
However, when I try to find the current value of a security by using
Current Value = SUMX ( transactions, transactions[quantity] * [Current Price] )
the total is not accurate. I'd appreciate suggestions on a way to find the current value of a position. Preferably using sumx or an iterator function so that the subtotals are accurate.
The problem with your Current Value measure is that you are evaluating [Current Price] within the row context of the transactions table (since SUMX is an iterator), so it only sees the date associated with that row instead of the last date overall. More precisely, that row's date becomes the last date in the measure's filter context.
The simplest solution is probably to calculate the Current Price outside of the iterator using a variable and then pass that constant in so you don't have to worry about row and filter contexts.
Current Value =
VAR CurrentPrice = [Current Price]
RETURN
    SUMX ( transactions, transactions[quantity] * CurrentPrice )