Hashing methodology for collection of strings and integer ranges - algorithm

I have data like the following:
I need to match the input provided for the Content and Range fields against this data and return the matching rows. As you can see, the Content field is a collection of strings and the Range field is a range between two numbers. I am looking at hashing the data so that it can be matched against the hashed input. I was thinking about iterating through the collection, computing the hash code of each individual string, and storing those for the Content field. For the Range field I was looking at using interval trees. The challenge is this: when I hash the Content input and the Range input, how do I find out whether that hash code is present among the hash codes generated for the strings in the Content field, and likewise for the Range field?
Please let me know if there are any alternative ways to achieve this. Thanks.

There is a simple solution to your problem: an inverted index.
For each item in Content, create an inverted index that maps 'Content' to 'RowID', i.e. create another table with two columns: Content (string) and RowIDs (comma-separated strings).
For your first row, add the entries {Azd, 1}, {Zax, 1}, {Gfd, 1}..., {Mni, 1} to that table. For the second row, add entries for the new Content strings. For a Content string that already appeared in the first row ('Gfd', for example), just append the new row id to the entry you created for the first row. So Gfd's row will look like {Gfd, 1,2}.
When done processing, you will have a table that maps each 'Content' string to all the rows in which it is present.
Do the same inverted indexing to map 'Range' to 'RowID', creating another table with columns Range (int) and RowIDs (comma-separated strings).
Now you will have a table whose rows tell you which row ids each range value is present in.
Finally, for each query that you have to process, get the corresponding Content and Range rows from the inverted index tables and intersect their comma-separated lists. The intersection is your answer.
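The steps above can be sketched in Python as follows. This is only an illustration under some assumptions: the sample rows and their ranges are made up (only the strings Azd, Zax, Gfd and Mni come from the description above), sets stand in for the comma-separated RowIDs column, and the name lookup is hypothetical. It also assumes the ranges are narrow enough that listing every integer in them is affordable.

```python
from collections import defaultdict

# Hypothetical sample rows: (row_id, content strings, inclusive integer range).
rows = [
    (1, ["Azd", "Zax", "Gfd", "Mni"], (10, 20)),
    (2, ["Gfd", "Qwe"], (15, 30)),
]

content_index = defaultdict(set)  # content string -> row ids containing it
range_index = defaultdict(set)    # integer -> row ids whose range covers it

for row_id, contents, (low, high) in rows:
    for item in contents:
        content_index[item].add(row_id)
    # Assumption: ranges are small, so every covered integer gets an entry.
    for value in range(low, high + 1):
        range_index[value].add(row_id)


def lookup(content, value):
    """Return the sorted row ids that contain `content` and cover `value`."""
    content_rows = content_index.get(content, set())
    range_rows = range_index.get(value, set())
    return sorted(content_rows & range_rows)
```

With this sample data, lookup("Gfd", 18) matches both rows, while lookup("Azd", 25) matches none because row 1 does not cover 25. For very wide ranges the per-integer table grows quickly, and an interval tree such as the one mentioned in the question may be a better fit for the Range side.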

List.Distinct and List.Count Challenge

Within PQ, I have a table of data (below) in which I am trying to determine whether all the columns titled Plan Status-# are the same, excluding blanks. If they are all the same, I want to display that value; if not, display "Varies across plans".
The PQ code is below, where I use List.Distinct to create a list of all "unique values".
I then use List.Count to count the items in this list and, if the count is 1, set the column equal to the distinct value.
If List.Count({List.Distinct({[#"Plan Status-H"],[#"Plan Status-D"],[#"Plan Status-S"],[#"Plan Status-M"],[#"Plan Status-C"],[#"Plan Status-U"]})})=1 then List.Distinct({[#"Plan Status-H"],[#"Plan Status-D"],[#"Plan Status-S"],[#"Plan Status-M"],[#"Plan Status-C"],[#"Plan Status-U"]}) else "Varies across plans"
As per the table above, List.Count does not seem to be working correctly: some of the records show a merged value of the items in the list, which means List.Count for a list with multiple values is being calculated as 1.
You have an extra pair of braces around List.Distinct(...). They wrap its result in a one-element list, so List.Count always returns 1. Try this:
if List.Count(List.Distinct({[#"Plan Status-H"],[#"Plan Status-D"],[#"Plan Status-S"],[#"Plan Status-M"],[#"Plan Status-C"],[#"Plan Status-U"]}))=1
then List.Distinct({[#"Plan Status-H"],[#"Plan Status-D"],[#"Plan Status-S"],[#"Plan Status-M"],[#"Plan Status-C"],[#"Plan Status-U"]})
else "Varies across plans"
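The effect of the extra braces can be illustrated with a small Python analogy. This is not Power Query code; the status values are made up, None stands for a blank cell, and the helper names distinct and summarize are hypothetical.

```python
# Hypothetical status values for one record; None stands for a blank cell.
statuses = ["Active", "Closed", None, "Active"]


def distinct(values):
    """Return the unique values in first-seen order."""
    return list(dict.fromkeys(values))


non_blank = [s for s in statuses if s is not None]

# Wrapping the distinct list in another list gives a list with one element
# (the inner list), so its length is 1 no matter how many values differ.
wrapped_count = len([distinct(non_blank)])

# Counting the distinct list directly gives the number of distinct values.
direct_count = len(distinct(non_blank))


def summarize(values):
    """Return the single shared non-blank value, or the 'varies' message."""
    unique = distinct([v for v in values if v is not None])
    return unique[0] if len(unique) == 1 else "Varies across plans"
```

Here wrapped_count is 1 while direct_count is 2, which mirrors the difference between the two versions of the formula. Note that summarize returns the single value rather than a one-element list and skips blanks first; whether the M formula needs the same two adjustments depends on the data.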

Simpler alternative to simultaneously Sort and Filter by column in Google Spreadsheets

I have a spreadsheet (here's a copy) with the following (headered) columns:
A: Indices for a list of groceries;
B: Names for the groceries to be indexed by column A;
C: Check column with "x" for inactive items in column B, empty otherwise;
D: Sorting indices that I want to apply to column B;
Currently, I am getting the sorted AND filtered result with this formula:
=SORT(FILTER(B2:B; C2:C = ""); FILTER(D2:D; C2:C = ""); TRUE)
The problem is that I need to apply the filter two times: one for the items and one for the indices, otherwise I get a mismatch between elements for the Sort function.
I feel that this doesn't scale well since it creates duplication.
Is there a way to get the same results with a simpler formula or another arrangement of columns?
=SORT(FILTER({Itens!B2:B\Itens!G2:G}; Itens!D2:D=""))
=SORT(FILTER({Itens!B2:B\Itens!G2:G}; Itens!D2:D="");2;1)
or maybe: =SORT(FILTER(Itens!B2:B; Itens!D2:D="");2;1)

How to split a Webix datatable column into multiple columns?

In my webix datatable, I am showing multiple values in the cells for some columns.
To identify which values belong to which header, I have separated the column headers by a '|' (pipe) and similarly the values under them as well.
Now, in place of delimiting the columns by '|' , I need to split the columns into some editable columns with the same name.
Please refer to this snippet : https://webix.com/snippet/8ce1148e
In this above snippet, for example the Scores column will be split into two more editable columns as Rank and Vote. Similarly for Place column into Type and Name.
The way the values of the first array elements are shown under each of them should remain as is.
How can this be done?
Thanks
When creating the column configuration for Webix, you can pass an array to the header field of the first column, along with a colspan, like below:
var columns = [];
columns[0] = {"id":"From", "header":[{"text":"Date","colspan":2},{"text":"From"}]};
columns[1] = {"id":"To", "header":[null, {"text":"To"}]};
columns[0] creates the Date and From headers, and columns[1] creates the To header.

Explode function returning single row

I used the field type as Array. "Select col as sample_table" returns the below output.
["[-80.86598534884,35.53423185253291],[-80.86598789514547,35.53423048990488],[-80.86598794307857,35.53423046392442]"]
When I run
select explode(col)
from sample_table
I get the output below, which is a single row.
[-80.86598534884,35.53423185253291],[-80.86598789514547,35.53423048990488],[-80.86598794307857,35.53423046392442]
I want the output in 3 rows as below.
[-80.86598534884655,35.53423185253291]
[-80.86598789514547,35.53423048990488]
[-80.86598794307857,35.53423046392442]
As I understand from the Hive tutorial, the explode function should return multiple rows, but I don't see that happening.
The input you have given appears to be an array field with only one value: a single string. The explode function treats that entire value as an array of size one and therefore returns the result in a single row.
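To make the point concrete, here is a small Python sketch (not Hive code) that uses the value shown in the question. It assumes the column really holds one string element, as the quoted output suggests. Exploding yields one row per array element, so the string first has to be parsed into a three-element array before three rows can come out.

```python
import json

# The column value from the question: an array holding ONE string element.
col = ["[-80.86598534884,35.53423185253291],[-80.86598789514547,35.53423048990488],[-80.86598794307857,35.53423046392442]"]

# "Exploding" produces one row per array element, so this gives a single row.
exploded_rows = list(col)

# Wrapping the string in brackets makes it valid JSON for an array of pairs,
# which can then be parsed into three separate elements.
points = json.loads("[" + col[0] + "]")
```

After parsing, points has three elements, one per coordinate pair, and exploding that array would give the three rows asked for. In Hive the equivalent step would be to turn the string into a real array before calling explode.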

I would like to know whether label column accepts such sub zero values or empty values

I would like to apply a natural-number sort order to the attribute representing members' age, which includes sub-zero and empty values in addition to natural human ages.
I would like to know whether a label column accepts such sub-zero or empty values, which inevitably flow in from manually entered source data such as logs.
Yes!
You have to change the data type of the label from Varchar(128) to Integer.
There are two ways to do it:
Run MAQL: "ALTER DATATYPE {f_dataset_name.nm_label_name} INT;"
Go to the CloudConnect LDM modeler and click Dataset => Edit => Show DataTypes => change the data type on the label to Integer.
This data type also accepts sub-zero values. For "null" or "empty" values, the source data has to contain the upper-case string "NULL".
