Referencing from table with mixed cells of different categories - filter

I'm trying to build a Google Sheets spreadsheet for comparing and analyzing logistics costs.
I have the following:
A sheet with a database of numbers, organized like this:
A second sheet with a table in which, using the MIN function, I get the price of the cheapest provider for each model, depending on quantity and destination.
And last, in another sheet, I have what I call "The interface". Using an INDEX MATCH MATCH formula, I let the user choose the destination and quantity for each of the available models, and it returns the cheapest price. (I can't post more images, so basically it has this structure):
MODEL A
DESTINATION: DESTINATION 2
NUM. OBJ: 2
PRICE: 59
PROVIDER:
My problem is that I can't figure out how to make it return the name of the provider with the cheapest price, because I'm referencing the second table, where the same row or column contains prices that belong to different providers.

Using min is undesirable in this context, because it doesn't tell you where the minimal value was found, and you need this information.
Here is a formula that returns the minimal cost together with the provider. In my example, the data is in the range A1:E7, as below; destination is in G1 and model is in G2.
=iferror(array_constrain(sort({filter(A1:A7, B1:B7=G2), filter(filter(C1:E7, B1:B7=G2), C1:E1=G1)}, 2, True), 1, 2), "Not found")
The same with linebreaks for readability:
=iferror(
  array_constrain(
    sort(
      {
        filter(A1:A7, B1:B7 = G2),
        filter(filter(C1:E7, B1:B7 = G2), C1:E1 = G1)
      },
      2, True),
    1, 2),
  "Not found")
Explanation:
filtering by B1:B7 = G2 means keeping only the rows with the desired model
filtering by C1:E1 = G1 means keeping only the column with desired destination
{ , } means putting two parts of a filtered table together: column A, and column with destination
sort by 2nd column (price), in ascending order (true)
array_constrain keeps only the first row in this sort; that is, one with lowest price.
iferror handles the case where there is no such destination or model in the table. Then the formula returns "Not found".
Example: with G1 = Destination 1 and G2 = A, the formula returns
Provider 2 2
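The same filter, sort, and take-the-first-row logic can be sketched outside Sheets in plain JavaScript. This is only an illustration: the table below is hypothetical (the original data is not shown here), laid out like A1:E7 with the provider in the first column, the model in the second, and one price column per destination. The function name cheapestProvider is also an assumption.

```javascript
// Hypothetical data laid out like A1:E7: a header row, then provider, model, prices.
const priceTable = [
  ["Provider", "Model", "Destination 1", "Destination 2", "Destination 3"],
  ["Provider 1", "A", 5, 7, 9],
  ["Provider 2", "A", 2, 8, 6],
  ["Provider 3", "A", 4, 3, 10],
  ["Provider 1", "B", 11, 12, 13],
  ["Provider 2", "B", 9, 14, 15],
  ["Provider 3", "B", 10, 8, 16],
];

// Returns [provider, price] for the cheapest offer, or "Not found".
function cheapestProvider(table, destination, model) {
  const [header, ...rows] = table;
  const col = header.indexOf(destination); // plays the role of C1:E1 = G1
  if (col < 2) return "Not found";         // not one of the price columns
  const candidates = rows
    .filter(row => row[1] === model)       // plays the role of B1:B7 = G2
    .map(row => [row[0], row[col]])        // like { column A , destination column }
    .sort((a, b) => a[1] - b[1]);          // sort by price, ascending
  return candidates.length > 0 ? candidates[0] : "Not found"; // like array_constrain(..., 1, 2)
}
```

With this sample data, cheapestProvider(priceTable, "Destination 1", "A") gives ["Provider 2", 2].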

PowerQuery choose values based on a key column

I have very large files, which PowerQuery seems to handle nicely. I need to do some mathematical operations using column d and the value from column a, b, or c, depending on the value of the key column. My first thought is to isolate the relevant value in a new column called Salient, which selects the value I need, and then go from there. In Excel, this might be: =INDEX($A:$E, ROW(F2), MATCH(A2,$A$1:$D$1,0)).
In reality, I have between 50 and 100 columns as well as millions of rows, so extra points for computational efficiency.
You can define a custom column Salient with just this as the definition:
Record.Field(_, [Key])
The M code for the whole step looks like this:
= Table.AddColumn(#"Prev Step Name", "Salient", each Record.Field(_, [Key]), Int64.Type)
The _ represents the current row, which is a record data type that can be expressed as e.g.
[Key = "a", a = 17, b = 99, c = 21, d = 12]
and you use Record.Field to pick the field corresponding to the Key.
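For readers more at home outside M, the lookup can be sketched in plain JavaScript, where each row is an object and the value of Key names the property to read. This is an analogy only, not Power Query code; addSalient and the second sample row are made up for illustration.

```javascript
// Adds a Salient property to every row, read from the property named by row.Key.
function addSalient(rows) {
  return rows.map(row => ({ ...row, Salient: row[row.Key] }));
}

// The first row mirrors the record shown above; the second is hypothetical.
const sampleRows = [
  { Key: "a", a: 17, b: 99, c: 21, d: 12 },
  { Key: "c", a: 1, b: 2, c: 3, d: 4 },
];

const withSalient = addSalient(sampleRows);
```

Here withSalient[0].Salient is 17 (the value of a) and withSalient[1].Salient is 3 (the value of c).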

Google Apps Script: Activate and sort rows when cells in given column are not blank

I'm extremely new to Apps Script and trying to make my first thing. It's a shopping list.
I want to create a function that will activate and then sort (by Column 1, 'Aisle #') all rows where there are values in a given other column (Column 3, 'Qty'). The idea is to sort the items on the list for that week (i.e., with a value filled in for Qty) by aisle to give me the order I should be looking for things. I do not want to sort items which are in the spreadsheet but without a value for Qty.
Here is what I've got so far:
var sheet = ss.getActiveSheet()
var range = sheet.getDataRange();
var rangeVals = range.getValues()
function orderList2(){
  if(rangeVals[3] != ""){
    sheet.activate().sort(1, ascending=true);
  };
};
I'm trying to use "if" to define which rows to activate before doing the sort (as I don't want to sort the entire sheet—I only want to sort the items I will be buying that week, i.e., the items with a value in Column 3). The script runs but ends up sorting the entire sheet.
The closest thing I could find was an iteration, but when I did it, it ended up only activating the top-left cell.
Any help you can provide would be greatly appreciated!
Cheers,
Nick
Answer:
Use Range.sort() instead of Sheet.sort() if you don't want to sort the entire sheet.
Explanation:
You want to sort the data according to the value in column A (Aisle #), if the corresponding value in C (Qty) is not empty.
If my assumption is correct, the rows where Qty is empty should go below the rest of data, and they should not be sorted according to their Aisle #.
In this case, I'd suggest the following:
Sort the full range of data (headers excluded) according to Qty, so that the rows without a Qty are placed at the bottom, using Range.sort() (if you don't need to exclude the headers, you can use Sheet.sort() instead).
Use SpreadsheetApp.flush() to apply the sort to the spreadsheet.
Use getValues(), filter() and length to know how many rows in the initial range have their column C populated (variable QtyElements in the sample below).
Using QtyElements, retrieve the range of rows with a non-empty column C, and sort it according to column 1, using Range.sort().
Code sample:
function orderList2() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var firstRow = 2; // Range starts at row 2, header row excluded
  var numRows = sheet.getLastRow() - firstRow + 1;
  if (numRows < 1) return; // Nothing below the header row
  var fullRange = sheet.getRange(firstRow, 1, numRows, sheet.getLastColumn());
  fullRange.sort(3); // Sort full range according to Qty; rows without a Qty go to the bottom
  SpreadsheetApp.flush(); // Apply the pending sort before reading values back
  // Count the rows whose Qty (column C, array index 2) is populated
  var QtyElements = fullRange.getValues().filter(row => row[2] !== "").length;
  if (QtyElements === 0) return; // getRange() does not accept a zero-row range
  sheet.getRange(firstRow, 1, QtyElements, sheet.getLastColumn())
    .sort(1); // If not specified, default ascending: true
    //.sort({column: 1, ascending: false}); // Uncomment if you want descending sort
}
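As an alternative sketch (not the approach used in the sample above), the same ordering can be computed in memory on the array returned by getValues() and then written back with setValues(). The helper name sortListedItems and the sample rows are assumptions. Unlike the two-step Range.sort(), rows without a Qty keep their original relative order here.

```javascript
// rows: array of [aisle, item, qty] arrays, as getValues() would return them.
// Rows with a Qty come first, sorted by aisle; the rest follow unsorted.
function sortListedItems(rows) {
  const hasQty = row => row[2] !== "";
  const listed = rows.filter(hasQty).sort((a, b) => a[0] - b[0]);
  const unlisted = rows.filter(row => !hasQty(row));
  return listed.concat(unlisted);
}

// Hypothetical shopping list: [Aisle #, Item, Qty]
const shoppingRows = [
  [5, "Milk", 2],
  [1, "Apples", ""],
  [3, "Bread", 1],
  [2, "Eggs", ""],
];

const orderedRows = sortListedItems(shoppingRows);
```

In Apps Script, the result could then be written back to the same range with range.setValues(orderedRows).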
Reference:
Range.sort(sortSpecObj)

How to understand part and partition of ClickHouse?

I see that ClickHouse creates multiple directories for each partition key.
The documentation says the directory name format is: partition name, minimum data block number, maximum data block number, and chunk level. For example, the directory name is 201901_1_11_1.
I think it means that the directory is a part which belongs to partition 201901, has the blocks from 1 to 11 and is on level 1. So we can have another part whose directory is like 201901_12_21_1, which means this part belongs to partition 201901, has the blocks from 12 to 21 and is on level 1.
So I think partition is split into different parts.
Am I right?
Parts are pieces of a table that store rows. One part = one folder with columns.
Partitions are virtual entities. They don't have a physical representation, but you can say that a given set of parts belongs to the same partition.
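To make the relationship concrete, here is a small JavaScript sketch that splits part directory names into their fields and groups them by partition. It assumes the four-field form partition_minBlock_maxBlock_level described in the question; the helper names are made up, and directory names with additional fields are not handled.

```javascript
// Splits a part directory name such as "201901_1_11_1" into its fields.
function parsePartName(name) {
  const fields = name.split("_");
  if (fields.length !== 4) {
    throw new Error("Unexpected part name format: " + name);
  }
  const [partition, minBlock, maxBlock, level] = fields;
  return {
    partition,
    minBlock: Number(minBlock),
    maxBlock: Number(maxBlock),
    level: Number(level),
  };
}

// Groups part directory names by the partition they belong to.
function groupByPartition(names) {
  const groups = {};
  for (const name of names) {
    const { partition } = parsePartName(name);
    if (!groups[partition]) groups[partition] = [];
    groups[partition].push(name);
  }
  return groups;
}

const partGroups = groupByPartition([
  "201901_1_11_1",
  "201901_12_21_1",
  "201902_22_22_0",
]);
```

Here partGroups["201901"] holds two part directories, which is the sense in which several parts belong to one partition.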
SELECT does not care about partitions and is not aware of partitioning keys, because each part has special files named minmax_{PARTITIONING_KEY_COLUMN}.idx.
These files contain the min and max values of these columns in this part.
These minmax values are also kept in memory, in the list of parts (a C++ vector).
create table X (A Int64, B Date, K Int64,C String)
Engine=MergeTree partition by (A, toYYYYMM(B)) order by K;
insert into X values (1, today(), 1, '1');
cd /var/lib/clickhouse/data/default/X/1-202002_1_1_0/
ls -1 *.idx
minmax_A.idx <-----
minmax_B.idx <-----
primary.idx
SET send_logs_level = 'debug';
select * from X where A = 555;
(SelectExecutor): MinMax index condition: (column 0 in [555, 555])
(SelectExecutor): Selected 0 parts by date
SelectExecutor checked the in-memory part list and found 0 parts, because minmax_A.idx = (1, 1) and this select needed (555, 555).
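The pruning idea can be sketched in a few lines of JavaScript. This is a conceptual model only, not ClickHouse's implementation: the part list is hypothetical and mirrors the example above, with one [min, max] pair per partitioning-key column.

```javascript
// Hypothetical in-memory part list: each part carries [min, max] per key column.
const partList = [
  { name: "1-202002_1_1_0", minmax: { A: [1, 1], B: [18302, 18302] } },
];

// Keeps only the parts whose [min, max] range for the column can contain the value.
function selectParts(parts, column, value) {
  return parts.filter(part => {
    const [min, max] = part.minmax[column];
    return min <= value && value <= max;
  });
}

const forA555 = selectParts(partList, "A", 555); // no part can contain A = 555
const forA1 = selectParts(partList, "A", 1);     // the single part qualifies
```

No partition value is consulted anywhere: the decision is made from the per-part min and max alone.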
CH does not store partitioning key values.
So for example toYYYYMM(today()) = 202002 but this 202002 is not stored in a part or anywhere.
minmax_B.idx stores (18302, 18302) (2020-02-10 == select toInt16(today()))
In my case, I used groupArray() and arrayEnumerate() for ranking in Populate. I assumed that Populate would run the query on the new data per partition (in my case: toStartOfDay(Date)). The total sum of the newly inserted data is correct, but groupArray() doesn't work correctly.
I think this happens because, when a Part is inserted, CH applies groupArray() and the ranking to that Part immediately and only later merges the Parts into one Partition, so I don't get the same final result as running groupArray() and arrayEnumerate() over the whole Partition.
In summary:
Merge[groupArray(part_1) + groupArray(part_2)]
is different from
groupArray(Partition)
where Partition = part_1 + part_2
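A tiny JavaScript model shows why the two differ for ranking. Here arrayEnumerate mimics the ClickHouse function of the same name by returning [1, 2, ..., length]; the part contents are made up.

```javascript
// Mimics ClickHouse arrayEnumerate: returns [1, 2, ..., arr.length].
function arrayEnumerate(arr) {
  return arr.map((_, i) => i + 1);
}

// Hypothetical rows that arrived in two separate parts of one partition.
const part1 = ["a", "b"];
const part2 = ["c", "d", "e"];

// Ranking each part on its own and then putting the results together...
const rankedPerPart = arrayEnumerate(part1).concat(arrayEnumerate(part2));

// ...is not the same as ranking the whole partition at once.
const rankedWholePartition = arrayEnumerate(part1.concat(part2));
```

rankedPerPart is [1, 2, 1, 2, 3], while rankedWholePartition is [1, 2, 3, 4, 5]: the ranks restart at every part boundary.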
The solution I tried is to insert the new data as a single block, for example by using groupArray() to reduce the new data to fewer rows than max_insert_block_size=1048576. That worked correctly, but it's hard to insert one day of new data as one Part, because populating one day of data (almost 150Mn-200Mn rows) uses too much memory for the query.
Do you have another solution for Populate with groupArray() on newly inserted data, such as forcing CH to run POPULATE on each Partition, after all the Parts have been merged into one Partition, rather than on each Part?

Simpler alternative to simultaneously Sort and Filter by column in Google Spreadsheets

I have a spreadsheet (here's a copy) with the following (headered) columns:
A: Indices for a list of groceries;
B: Names for the groceries to be indexed by column A;
C: Check column with "x" for inactive items in column B, empty otherwise;
D: Sorting indices that I want to apply to column B;
Currently, I am getting the sorted AND filtered result with this formula:
=SORT(FILTER(B2:B; C2:C = ""); FILTER(D2:D; C2:C = ""); TRUE)
The problem is that I need to apply the filter two times: one for the items and one for the indices, otherwise I get a mismatch between elements for the Sort function.
I feel that this doesn't scale well since it creates duplication.
Is there a way to get the same results with a simpler formula or another arrangement of columns?
=SORT(FILTER({Itens!B2:B\Itens!G2:G}; Itens!D2:D=""))
=SORT(FILTER({Itens!B2:B\Itens!G2:G}; Itens!D2:D="");2;1)
or maybe: =SORT(FILTER(Itens!B2:B; Itens!D2:D="");2;1)
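The idea behind filtering a combined array once and then sorting it can be sketched in plain JavaScript. The rows below are hypothetical and follow the column layout from the question (A: index, B: name, C: check, D: sorting index), not the Itens sheet referenced in the formulas; activeItemsSorted is a made-up name.

```javascript
// Each row: [index, name, check, sortIndex], like columns A to D in the question.
const groceryRows = [
  [1, "Apples", "", 3],
  [2, "Bread", "x", 1],
  [3, "Cheese", "", 2],
  [4, "Dates", "", 1],
];

// Filters whole rows once, then sorts the surviving rows by their sorting index.
function activeItemsSorted(rows) {
  return rows
    .filter(row => row[2] === "")   // keep items without an "x" in the check column
    .sort((a, b) => a[3] - b[3])    // ascending by sorting index
    .map(row => row[1]);            // return just the names
}

const activeNames = activeItemsSorted(groceryRows);
```

Because name and sorting index travel together in one row, a single filter keeps them aligned, which is what the {B\G} array literal achieves in the formula.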

olap4J - calculations on member grouping

I'm trying to write an olap4j (Mondrian) query that will group the rows by ranges.
Assume we have counts of cards per child and the children ages.
I want to sum the card counts by age range, so that I get totals for ages 0-5, 5-10, 10-15, and so on.
Can this be done with olap4j?
You need to define calculated members for that:
WITH MEMBER [Age].[0-4] AS Aggregate({[Age].[0] : [Age].[4]})
MEMBER [Age].[5-9] AS Aggregate({[Age].[5] : [Age].[9]})
etc.
Alternatively, you may want to re-design your dimension table. I'm guessing you have age as a degenerate dimension in the fact table. I suggest creating a separate dimension dim_age with a structure like this:
age_id, age, age_group
0, null, null
1, 0, 0-4
2, 1, 0-4
(...)
Then it's easy to define a first level on the dimension based on the age_group.
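The age_group column amounts to a simple bucketing rule. A JavaScript sketch of that rule and of the per-group sum might look like this, assuming five-year buckets as in the 0-4 and 5-9 members above; the function names and sample data are hypothetical.

```javascript
// Maps an age to its five-year bucket label, e.g. 3 -> "0-4", 7 -> "5-9".
function ageGroup(age) {
  const lower = Math.floor(age / 5) * 5;
  return lower + "-" + (lower + 4);
}

// Sums card counts per age group.
function sumCardsByAgeGroup(children) {
  const totals = {};
  for (const child of children) {
    const group = ageGroup(child.age);
    totals[group] = (totals[group] || 0) + child.cards;
  }
  return totals;
}

// Hypothetical fact rows: one entry per child.
const cardTotals = sumCardsByAgeGroup([
  { age: 3, cards: 2 },
  { age: 4, cards: 5 },
  { age: 7, cards: 1 },
]);
```

With this data, cardTotals["0-4"] is 7 and cardTotals["5-9"] is 1. In the cube itself, the same grouping comes from the age_group level of the dimension rather than from code.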
