Performance-wise for Lua table selection

I'm a bit new to Lua. I have a game where I need to capture entities and insert them into a table. The maximum number of entities that can exist at the same time is 14, so I read that an array-based solution is good.
But I noticed that the table size keeps growing even if we delete some values: for example, starting from a table with 10 values and deleting the value at index 9, the size doesn't automatically shift down when I go to insert value number 11.
Example:
local Table = {"hello", "hello", "hello", "hello", "hello", "hello", "hello", "hello", "hello", "hello"}
-- Current Table size = 10
-- Perform delete at index 9
Table[9] = nil
-- Have new Entity to insert
Table[#Table + 1] = "New Value"
-- The table keeps growing as the game runs.
So for this kind of situation, will an array-based table with nil holes inside, growing as new values are inserted, have better performance, or should I move to a table with keys?
Or should I just stick with an array-based table and perform a full cleanup when the table isn't in use?

If you set an element in a table to nil, then that just stays there as a "hole" in your array.
tab = {1, 2, 3, 4}
tab[2] = nil
-- tab == {1, nil, 3, 4}
-- #tab is actually undefined and could be either 1 or 4 (or something completely unexpected)!
What you need to do is set the field to nil, then shift all the following fields to fill that hole. Luckily, Lua has a function for that, which is table.remove(table, index).
tab = {1, 2, 3, 4}
table.remove(tab, 2)
-- tab == {1, 3, 4}
-- #tab == 3
Keep in mind that this can get slow for large arrays, since every element after the removed index has to be shifted down by one, so don't go applying this solution when you have a few million elements someday :)
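If the order of your entities doesn't matter, a well-known alternative (my addition, not part of the answer above) is to swap the last element into the hole, which removes in O(1) with no shifting. A minimal sketch, with a function name of my own choosing:
local function swapRemove(t, i)
  local n = #t
  t[i] = t[n]  -- move the last element into the hole
  t[n] = nil   -- shrink the array by one
end
With at most 14 entities either approach is plenty fast; this mainly matters once tables grow large.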

While table.remove(Table, 9) will do the job in your case (removing a field from the "array" table and shifting the remaining fields to fill the hole), you should first consider using a "set" table instead.
If you:
- often remove/add elements
- don't care about their order
- often check if table contains a certain element
then the "set" table is your choice. Use it like so
local tab = {
["John"] = true,
["Jane"] = true,
["Bob"] = true,
}
Your elements will be stored as keys in the table.
Remove an element with
tab["Jane"] = nil
Test whether the table contains an element with
if tab["John"] then
  -- tab contains "John"
end
Advantages compared to an array table:
- removing an element has no performance overhead, because the other elements remain intact and no shifting is required
- checking whether an element exists in the table (which I assume is its main purpose) is also faster than with an array table, because it no longer requires iterating over all the elements to find a match; a hash lookup is used instead
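One caveat worth adding (my note, not from the answer above): you iterate a set table with pairs rather than a numeric for loop, and the iteration order is unspecified:
for name in pairs(tab) do
  print(name)  -- visits every key, in no guaranteed order
end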
Note however that this approach doesn't let you have duplicate values as your elements, because tables can't contain duplicate keys. In that case you can use numbers as values to store how many times an element is duplicated in your set, e.g.
local tab = {
["John"] = 1,
["Jane"] = 2,
["Bob"] = 35,
}
Now you have 1 John, 2 Janes, and 35 Bobs.
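A minimal sketch of add/remove helpers for that counted variant (the helper names are mine, not from the linked chapter):
local function addEntity(set, name)
  set[name] = (set[name] or 0) + 1
end
local function removeEntity(set, name)
  local n = set[name]
  if n then
    -- drop the key entirely once the count reaches zero
    set[name] = (n > 1) and (n - 1) or nil
  end
end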
https://www.lua.org/pil/11.5.html

Related

PowerQuery choose values based on a key column

I have very large files which PowerQuery seems to handle nicely. I need to do some mathematical operations using column d and the value from column a, b, or c, chosen based on the value of the key column. My first thought is to isolate the salient value by making a column called Salient which selects the value I need, and then go from there. In Excel, this might be: =INDEX($A:$E, ROW(F2), MATCH(A2,$A$1:$D$1)).
In reality, I have between 50 and 100 columns as well as millions of rows, so extra points for computational efficiency.
You can define a custom column Salient with just this as the definition:
Record.Field(_, [Key])
The M code for the whole step looks like this:
= Table.AddColumn(#"Prev Step Name", "Salient", each Record.Field(_, [Key]), Int64.Type)
The _ represents the current row, which is a record data type that can be expressed as e.g.
[Key = "a", a = 17, b = 99, c = 21, d = 12]
and you use Record.Field to pick the field corresponding to the Key.
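For instance, applied to the sample record above (an illustrative call of mine, not part of the original answer):
// picks the field whose name matches the second argument
Record.Field([Key = "a", a = 17, b = 99, c = 21, d = 12], "a") // returns 17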

Google Apps Script: Activate and sort rows when cells in given column are not blank

I'm extremely new to Apps Script and trying to make my first thing. It's a shopping list.
I want to create a function that will activate and then sort (by Column 1, 'Aisle #') all rows where there are values in a given other column (Column 3, 'Qty'). The idea is to sort the items on the list for that week (i.e., with a value filled in for Qty) by aisle, to give me the order I should be looking for things. I do not want to sort items which are in the spreadsheet but without a value for Qty.
Here is what I've got so far:
var sheet = ss.getActiveSheet()
var range = sheet.getDataRange();
var rangeVals = range.getValues()
function orderList2(){
if(rangeVals[3] != ""){
sheet.activate().sort(1, ascending=true);
};
};
I'm trying to use "if" to define which rows to activate before doing the sort (as I don't want to sort the entire sheet—I only want to sort the items I will be buying that week, i.e., the items with a value in Column 3). The script runs but ends up sorting the entire sheet.
The closest thing I could find was an iteration, but when I did it, it ended up only activating the top-left cell.
Any help you can provide would be greatly appreciated!
Cheers,
Nick
Answer:
Use Range.sort() instead of Sheet.sort() if you don't want to sort the entire sheet.
Explanation:
You want to sort the data according to the value in column A (Aisle #), if the corresponding value in C (Qty) is not empty.
If my assumption is correct, the rows where Qty is empty should go below the rest of the data, and they should not be sorted according to their Aisle #.
In this case, I'd suggest the following:
Sort the full range of data (headers excluded) according to Qty, so that the rows without a Qty are placed at the bottom, using Range.sort() (if you don't need to exclude the headers, you can use Sheet.sort() instead).
Use SpreadsheetApp.flush() to apply the sort to the spreadsheet.
Use getValues(), filter() and length to know how many rows in the initial range have their column C populated (variable QtyElements in the sample below).
Using QtyElements, retrieve the range of rows with a non-empty column C, and sort it according to column 1, using Range.sort().
Code sample:
function orderList2() {
var sheet = SpreadsheetApp.getActiveSheet();
var firstRow = 2; // Range starts at row 2, header row excluded
var fullRange = sheet.getRange(firstRow, 1, sheet.getLastRow() - firstRow + 1, sheet.getLastColumn());
fullRange.sort(3); // Sort full range according to Qty
SpreadsheetApp.flush(); // Refresh spreadsheet
var QtyElements = fullRange.getValues().filter(row => row[2] !== "").length;
sheet.getRange(firstRow, 1, QtyElements, sheet.getLastColumn())
.sort(1); // If not specified, default ascending: true
//.sort({column: 1, ascending: false}); // Uncomment if you want descending sort
}
Reference:
Range.sort(sortSpecObj)

How to understand part and partition of ClickHouse?

I see that ClickHouse creates multiple directories for each partition key.
The documentation says the directory name format is: partition name, minimum data block number, maximum data block number, and chunk level. For example, the directory name is 201901_1_11_1.
I think it means that the directory is a part which belongs to partition 201901, has the blocks from 1 to 11 and is on level 1. So we can have another part whose directory is like 201901_12_21_1, which means this part belongs to partition 201901, has the blocks from 12 to 21 and is on level 1.
So I think partition is split into different parts.
Am I right?
Parts are pieces of a table that store rows. One part = one folder with column files.
Partitions are virtual entities. They don't have a physical representation, but you can say that certain parts belong to the same partition.
SELECT does not care about partitions and is not aware of partitioning keys, because each part has special files minmax_{PARTITIONING_KEY_COLUMN}.idx. These files contain the min and max values of those columns within the part.
These minmax_ values are also kept in memory, in the (C++ vector) list of parts.
create table X (A Int64, B Date, K Int64, C String)
Engine=MergeTree partition by (A, toYYYYMM(B)) order by K;
insert into X values (1, today(), 1, '1');
cd /var/lib/clickhouse/data/default/X/1-202002_1_1_0/
ls -1 *.idx
minmax_A.idx <-----
minmax_B.idx <-----
primary.idx
SET send_logs_level = 'debug';
select * from X where A = 555;
(SelectExecutor): MinMax index condition: (column 0 in [555, 555])
(SelectExecutor): Selected 0 parts by date
SelectExecutor checked the in-memory part list and found 0 matching parts, because minmax_A.idx = (1, 1) and this select needed (555, 555).
ClickHouse does not store the partitioning key values themselves.
So for example toYYYYMM(today()) = 202002, but this 202002 is not stored in the part or anywhere else.
minmax_B.idx stores (18302, 18302), because Date values are stored as day numbers since 1970-01-01 (2020-02-10 == select toInt16(today())).
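You can reproduce that day-number encoding with the same conversion the answer mentions (an illustrative query of mine):
SELECT toInt16(toDate('2020-02-10')); -- returns 18302, the number of days since 1970-01-01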
In my case, I had used groupArray() and arrayEnumerate() for ranking in POPULATE. I thought POPULATE could run the query over the new data on the partition (in my case: toStartOfDay(Date)); the total sum of the newly inserted data is correct, but the groupArray() function doesn't work correctly.
I think this happens because when one part is inserted, ClickHouse runs groupArray() and the ranking on each part immediately and only then merges the parts within one partition, so I don't get the exact final result of groupArray() and arrayEnumerate().
In summary: merging [groupArray(part_1) + groupArray(part_2)] is different from groupArray(Partition), with Partition = part_1 + part_2.
The solution I tried is to insert the new data as one block, i.e., using groupArray() to reduce the new data to fewer rows than max_insert_block_size = 1048576. That works correctly, but it's hard to insert one day of new data as a single part, because querying while populating one day of data (almost 150M-200M rows) uses too much memory.
But do you have another solution for POPULATE with groupArray() over newly inserted data, such as forcing ClickHouse to run POPULATE on each partition rather than on each part, after merging all the parts into one partition?

Data structure with STL: I don't understand how to declare it

Your program should be able to perform the following functions:
1- To add a new node at the beginning, at a specific position, or at the end
2- To display list items
3- To display list items in reverse order
4- To count the number of items
5- To insert a new item at the beginning
6- To insert a new item at the end
7- To insert a new item at the middle
8- To delete the first item
9- To delete an item from the middle
10- To delete the last item
11- To search for an existing item and return its node position
In Swift, the syntax is pretty much straightforward when you need to do this sort of thing with arrays.
Here are examples of the operations you were looking for.
var numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
// Insert a number at the beginning of the array
numbers.insert(0, at: 0)
// Append a number to the end of the array
numbers.append(11)
// Display the array of numbers in reverse order
// (reversed() returns a reversed copy; reverse() would mutate in place and return nothing)
print(Array(numbers.reversed()))
// Count the number of items in the array
print(numbers.count)
// Remove the first item from the array
numbers.removeFirst()
// Remove the last item from the array
numbers.removeLast()
// Remove the item in the middle of the array
// (with an even count this removes the upper of the two middle items)
numbers.remove(at: numbers.count / 2)
// Find the index of the number 3 (returns an Optional, nil if not found)
print(numbers.firstIndex(of: 3) as Any)
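The requirements also ask for insertion at the middle; that uses the same insert(_:at:) API (an extra example of mine, not from the original answer):
// Insert a number at (roughly) the middle of the array
numbers.insert(99, at: numbers.count / 2)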

How to filter clickhouse table by array column contents?

I have a ClickHouse table that has one Array(UInt16) column. I want to be able to filter results from this table to only get rows where the values in the array column are above a threshold value. I've been trying to achieve this using some of the array functions (arrayFilter and arrayExists), but I'm not familiar enough with the SQL/ClickHouse query syntax to get this working.
I've created the table using:
CREATE TABLE IF NOT EXISTS ArrayTest (
date Date,
sessionSecond UInt16,
distance Array(UInt16)
) Engine = MergeTree(date, (date, sessionSecond), 8192);
Where the distance values will be distances from a certain point at a certain number of seconds (sessionSecond) after the date. I've added three sample rows of data.
Now I want to get all rows which contain distances greater than 7. I found the array operators documentation here and tried the arrayExists function, but it's not working how I'd expect. The documentation says that this function "Returns 1 if there is at least one element in 'arr' for which 'func' returns something other than 0. Otherwise, it returns 0". But when I run the query below I get three zeros returned where I should get a 0 and two ones:
SELECT arrayExists(
val -> val > 7,
arrayEnumerate(distance))
FROM ArrayTest;
Eventually I want to perform this select and then join it with the table contents to only return rows that have an exists = 1 but I need this first step to work before that. Am I using the arrayExists wrong? What I found more confusing is that when I change the comparison value to 2 I get all 1s back. Can this kind of filtering be achieved using the array functions?
Thanks
You can use arrayExists in the WHERE clause.
SELECT *
FROM ArrayTest
WHERE arrayExists(x -> x > 7, distance) = 1;
Another way is to use ARRAY JOIN, if you need to know which values are greater than 7:
SELECT d, distance, sessionSecond
FROM ArrayTest
ARRAY JOIN distance as d
WHERE d > 7
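Since the question also mentions arrayFilter, here is a variant of the same query (my addition, not from the answers above) that keeps only the matching values inside each array:
SELECT sessionSecond, arrayFilter(x -> x > 7, distance) AS far
FROM ArrayTest
WHERE arrayExists(x -> x > 7, distance)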
I think the reason why you get three zeros is that arrayEnumerate enumerates the array indexes, not the array values, and since none of your rows have more than 7 elements, the check over arrayEnumerate results in 0 for every row.
To make that version work, index into the array with the enumerated positions:
SELECT arrayExists(
val -> distance[val] > 7,
arrayEnumerate(distance))
FROM ArrayTest;
