Uber H3 Hex - Search database of res 10 indexes for their res 4-9 parents - h3

I have a large database of location points and their corresponding res 10 hexes.
I need to query this database and identify how many points are in a certain res 4, 5, 6, 7, 8, and 9 hex.
Is this possible without adding additional res indexes in the database? Is there a certain format/pattern in the hex naming convention I could use?

All of the children of a res N index at res M fall within a range, so you can do a range query to find them. This takes a little wrangling, but only to construct the query, not to run it.
To find all the res 10 children of a res 4 index, e.g. 841e001ffffffff:
Take cellToCenterChild('841e001ffffffff', 10), which evaluates to 8a1e00000007fff. This is the bottom of the range.
The top of the range is a little trickier. We don't currently expose a function for it, but you can construct it by swapping the resolution bits of the parent from 4 to 10. In hexadecimal, this is conveniently just the second character, so you can swap 4 for a, yielding 8a1e001ffffffff. This is not a valid index, but will work for a range query.
Use a range query to find child indexes:
select * from my_data
where h3_index between '8a1e00000007fff' and '8a1e001ffffffff';
Assuming you have an appropriate index on h3_index, this should be fairly fast.
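The two bounds above can also be built with plain bit arithmetic. Here is a minimal Python sketch, not part of the H3 library: child_range is a hypothetical helper, and it assumes the standard 64-bit H3 cell layout (four resolution bits at bit offset 52, and fifteen 3-bit digits in the low 45 bits).

```python
RES_OFFSET = 52          # assumed position of the 4 resolution bits
RES_MASK = 0xF << RES_OFFSET
MAX_RES = 15             # finest H3 resolution
DIGIT_BITS = 3           # each resolution digit takes 3 bits


def child_range(parent_hex, child_res):
    """Return (bottom, top) hex strings bounding all children at child_res."""
    parent = int(parent_hex, 16)
    parent_res = (parent >> RES_OFFSET) & 0xF
    if not parent_res <= child_res <= MAX_RES:
        raise ValueError("child_res must be between the parent's resolution and 15")

    # Top of the range: only the resolution bits change. The unused digits
    # of the parent are all 7, which is larger than any real child digit.
    top = (parent & ~RES_MASK) | (child_res << RES_OFFSET)

    # Bottom of the range: the center child, i.e. every digit between the
    # parent resolution and the child resolution set to 0.
    bottom = top
    for res in range(parent_res + 1, child_res + 1):
        shift = (MAX_RES - res) * DIGIT_BITS
        bottom &= ~(0x7 << shift)

    return format(bottom, "x"), format(top, "x")


bottom, top = child_range("841e001ffffffff", 10)
print(bottom, top)
```

Comparing the hex strings directly in SQL works here because every index is the same length and uses lowercase hex digits, so string order matches numeric order.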

Related

Hashing with division remainder method

I don't understand this exercise.
Hash the keys: (13,17,39,27,1,20,4,40,25,9,2,37) into a hash table of size 13 using the division-remainder method.
a) find a suitable value for m.
b) handle collisions using linked lists and visualize the result in a table like this
0→
1→
2→
3→
4→
5→
6→
...
c) Handle collisions with linear probing using the sequence s(j) = j and illustrate the development in a table by starting a new column for every insert (don’t forget to copy the cells already filled to the right) and by using downwards arrows to show the probing steps in case of collisions.
my attempt:
a) If the table size is 13, m also has to be 13 because of the residue classes.
b) for example 0→ 39 -> 13 ....
c) I have no idea
It would be really great if someone could help me solve it. :)
Let me give a brief overview of all topics which will be used here.
Hash-map is a data structure that uses a hash function to map identifying values, known as keys, to their associated values. It contains “key-value” pairs and allows retrieving value by key.
Just as you can get any element of an array using its index, you can get any value in a hash-map using its key.
Basically, something like this happens: you are given a key, which is a string here; it is hashed, and the value is put at that index in an array.
In our example image, if you want the value for "Billy", we hash "Billy" again and get 03. Now we just check the value at index 3, and that's the stored value for the key "Billy".
In your case you have to hash integers not strings.
Now how to hash keys?
There are several methods: you may sum the ASCII values of the characters of a string, or use anything else you can think of.
Let's say you have this array: [100, 1, 3, 56, 80]
and you have to store it in a bucket array of size 13.
We can't use those array values directly as indexes, because we would need both index 1 and index 100, which would force the bucket array to have at least 101 slots.
But if you take the remainder of each number after division by 13, the remainder is always guaranteed to be from 0 to 12, so you can use a bucket array of size 13 if you hash keys using the division method.
[100, 1, 3, 56, 80] remainder with 13 -> [9, 1, 3, 4, 2]
Thus you store 100's value at index 9, and so on.
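As a quick check of the arithmetic, here is a tiny Python sketch of the division method (division_hash is just an illustrative name, not something from your exercise):

```python
def division_hash(key, m):
    """Division-remainder method: the index is the remainder of key / m."""
    return key % m


keys = [100, 1, 3, 56, 80]
indexes = [division_hash(key, 13) for key in keys]
print(indexes)
```

Every result is between 0 and 12, so a table with 13 slots is enough.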
Collision:
But what if the array contains both 2 and 80? Both give remainder 2. What do we store at index 2 now?
In our example image,
let's say "SACHU" also gives 03 after hashing. Now two keys gave the same index; this is called a collision, and it can be resolved using two methods:
linked-list-like storage (store both values at the same index using a linked list)
linear probing: in simple words, index 03 is already occupied, so we try to find the next empty index. With the simplest probing in our image example, 06 is empty, so we store the "SACHU" value at 06, not 03.
(now this is a little hard so I highly suggest you to read hashing and collisions on internet)
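To make the linked-list idea concrete, here is a small Python sketch using the keys from your exercise. It is only one possible reading: Python lists stand in for the linked lists, and new keys are appended at the end of a chain (your course may insert at the front instead).

```python
def build_chained_table(keys, m):
    """Hash every key with key % m and chain colliding keys in one slot."""
    table = [[] for _ in range(m)]
    for key in keys:
        table[key % m].append(key)  # assumption: append at the tail of the chain
    return table


keys = [13, 17, 39, 27, 1, 20, 4, 40, 25, 9, 2, 37]
chained = build_chained_table(keys, 13)
for index, chain in enumerate(chained):
    print(f"{index} -> " + " -> ".join(str(key) for key in chain))
```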
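Now, about the probing sequence. Let h(x) denote the hash of an integer x.
For a number x, the first index tried is h1 = h(x).
If index h1 is not empty, we try another index derived from it; with s(j) = j, the j-th retry is j slots away from h1, i.e. (h1 + j) mod 13 (some textbooks step downwards instead, (h1 - j) mod 13).
And so on. I am not sure, but I guess this is what is meant by the s(j) = j method.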
THESE ARE THE METHODS WHICH YOU HAVE TO USE IN YOUR PROBLEM.
I'd prefer that you give it a try first.
You can read more about it online, and comment if you are still not able to solve it.
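If you want to check your own table afterwards, here is a minimal Python sketch of linear probing with s(j) = j. The probing direction is an assumption: direction=1 tries (h + j) mod m, and direction=-1 gives the (h - j) mod m variant that some textbooks use.

```python
def insert_linear_probing(table, key, direction=1):
    """Insert key, probing slots (home + direction * j) % m for j = 0, 1, 2, ..."""
    m = len(table)
    home = key % m
    for j in range(m):
        slot = (home + direction * j) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


keys = [13, 17, 39, 27, 1, 20, 4, 40, 25, 9, 2, 37]
table = [None] * 13
for key in keys:
    insert_linear_probing(table, key)
print(table)
```

Printing the table after every insert gives you the columns the exercise asks for.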

How to offset limit by sorted index with AQL?

I have a document collection of members which have two relevant properties: _key and score. I've also created a persistent index on the score field, as that should make sorting significantly faster. I want to write an AQL query that returns different results based on the sorted index of a specific member (referred to as A):
Always returns at least the top 5 members by score. (LIMIT 5)
If A is in the top 10, return the 6 - 10 ranked members. (LIMIT 5, 5)
Otherwise, return the members directly above and below A in rank. (LIMIT x - 2, 3, x = A's rank)
I was unable to do this in a single query, however I was able to fetch the rank of a member by doing something along the lines of
RETURN LENGTH(
  FOR m IN members
    FILTER m.score > DOCUMENT("members", "ID").score
    RETURN 1
) + 1
and then use a second query to fetch the ranked data I wanted, something like
FOR m IN members
  SORT m.score DESC LIMIT 10
  RETURN m
or joining two sub-queries with LIMIT 5 and LIMIT rank - 2, 3 depending on the rank.
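To pin down the selection rule itself, independent of AQL, here is a rough Python sketch over an in-memory dict of scores. leaderboard_view and example_scores are made-up names for illustration, and the sketch assumes scores are distinct (with ties, the computed rank and the position in the sorted list can disagree).

```python
def leaderboard_view(scores, member_key):
    """Return (rank, keys): the top 5 plus a rank-dependent window."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Same idea as the rank query: count members with a higher score.
    rank = 1 + sum(1 for s in scores.values() if s > scores[member_key])

    top_five = ranked[:5]                       # LIMIT 5
    if rank <= 10:
        window = ranked[5:10]                   # LIMIT 5, 5
    else:
        window = ranked[rank - 2:rank + 1]      # LIMIT rank - 2, 3
    return rank, top_five + window


example_scores = {f"m{i}": 100 - i for i in range(1, 21)}
print(leaderboard_view(example_scores, "m15"))
```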

How do I sort a table while keeping a record of the original positions?

I have a Lua table of integers which I sort in ascending order. Later on, my script needs to take the index of the smallest number (that is, the position it was at originally, before sorting), try something, then move on to the index of the second smallest number, try something, etc. until the end of the sorted list.
My problem is that I obviously lose the original positions (or indexes) of my original table when I sort it. Is there a way, maybe with a nested table holding both numbers and indexes, to keep a record of the original indexes, and then perform a sort on the integers?
To make the question a little bit clearer:
Original table: 4 6 2
Sorted table: 2 4 6
In this sorted table, I need to know that 2 was at position 3 before sorting, that 4 was at position 1, and that 6 was at position 2.
Something like the following (untested code written in the answer box) should work regardless of uniqueness (though Lua's table.sort isn't stable, so you can't guarantee which of two equal elements will be sorted first, in case that matters):
local origtab = {4, 6, 2}
local sorttab = {}
for i, v in ipairs(origtab) do
  sorttab[i] = {index = i, value = v}
end
table.sort(sorttab, function(a, b) return a.value < b.value end)
for i, t in ipairs(sorttab) do
  -- t.index is the original index
  -- t.value is the value
end
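For comparison, the same pairing idea in Python is a sketch of a few lines. Positions are numbered from 1 here to match Lua, and Python's sort is stable, so equal values keep their original relative order.

```python
values = [4, 6, 2]

# Pair every value with its original (1-based) position, then sort by value.
order = sorted(enumerate(values, start=1), key=lambda pair: pair[1])

for original_index, value in order:
    print(value, "was at position", original_index)
```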
If all of your values are guaranteed to be unique, then this is pretty simple.
local function remember_sort(t)
  local map = {}
  for i = 1, #t do map[t[i]] = i end
  table.sort(t)
  return map -- map of values to old indices
end
local t = {4, 6, 2}
local r = remember_sort(t)
for k, v in pairs(r) do print(k, v) end
Output (pairs does not guarantee iteration order, so the lines may come out in a different order):
2 3
4 1
6 2

Data Structure / Hash Function to link Sets of Ints to Value

Given n integer id's, I wish to link all possible sets of up to k id's to a constant value. What I'm looking for is a way to translate sets (e.g. {1, 5}, {1, 3, 5} and {1, 2, 3, 4, 5, 6, 7}) to unique values.
Guarantees:
n < 100 and k < 10 (again: set sizes will range in [1, k]).
The order of id's doesn't matter: {1, 5} == {5, 1}.
All combinations are possible, but some may be excluded.
All sets and values are constant and made only once. No deletes or inserts, no value updates.
Once generated, the only operations taking place will be look-ups.
Look-ups will be frequent and one-directional (given set, look up value).
There is no need to sort (or otherwise organize) the values.
Additionally, it would be nice (but not obligatory) if "neighboring" sets (drop one id, add one id, swap one id, etc) are easy to reach, as well as "all sets that include at least this set".
Any ideas?
Enumerate using the product of primes.
a -> 2
b -> 3
c -> 5
d -> 7
et cetera
Now hash(ab) := 6, and hash (abc) := 30
And a nice side effect is that, if "ab" is a subset of "abc", then:
hash(abc) % hash(ab) == 0
and
hash(abc) / hash(ab) == hash(c)
The bad news: you might run into overflow. The 100th prime is 541, and 64 bits cannot accommodate a product of up to nine factors of that size (541**9 is far above 2**64). This will not affect the functioning as a hash function; only the subset thingy will fail to work. (See also: the same method applied to anagrams.)
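A minimal Python sketch of the prime-product idea follows. The names first_primes and prime_hash are made up for illustration, ids are assumed to be numbered from 1, and Python integers do not overflow, so the subset property holds here even where a 64-bit implementation would wrap around.

```python
from math import prod


def first_primes(count):
    """Return the first `count` primes by trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


PRIMES = first_primes(100)


def prime_hash(ids):
    """Map a set of ids (1-based) to the product of their primes."""
    return prod(PRIMES[i - 1] for i in set(ids))


ab = prime_hash({1, 2})
abc = prime_hash({1, 2, 3})
print(ab, abc, abc % ab == 0, abc // ab == prime_hash({3}))
```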
The other option is Zobrist hashing. It is analogous to the primes method, but instead of primes you use a fixed set of (random) numbers, and instead of multiplying you use XOR.
For a fixed small (it needs << ~70 bits) set like yours, it might be possible to tune the zobrist tables to totally avoid collisions (yielding a perfect hash).
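Here is a rough Python sketch of the Zobrist variant. The table size, the 64-bit width and the fixed seed are arbitrary choices for illustration, and nothing here tunes the table to avoid collisions.

```python
import random

rng = random.Random(12345)  # fixed seed so the table is reproducible
ZOBRIST = [rng.getrandbits(64) for _ in range(100)]  # one random word per id 0..99


def zobrist_hash(ids):
    """XOR together the random words of all ids in the set."""
    h = 0
    for i in set(ids):
        h ^= ZOBRIST[i]
    return h


full = zobrist_hash({1, 3, 5})
# Removing (or adding) one id is a single XOR with that id's word.
without_3 = full ^ ZOBRIST[3]
print(without_3 == zobrist_hash({1, 5}))
```

Because XOR is its own inverse, "neighboring" sets that differ by one id are one XOR away from each other.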
And the final (and simplest) way is to use a (100-bit) bitmap and treat that as a hash value (maybe after modulo table size).
And a totally unrelated method is to just build a decision tree on the bits of the bitmap (the tree would have a maximal depth of k). (See also: a related kD tree on bit values.)
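The bitmap idea is easy to sketch in Python, where an int can hold 100 bits directly and can be used as a dict key. to_bitmap, contains and the sample values are illustrative names only.

```python
def to_bitmap(ids):
    """Set bit i for every id i in the set."""
    bitmap = 0
    for i in ids:
        bitmap |= 1 << i
    return bitmap


def contains(superset_bitmap, subset_bitmap):
    """True if every id in the subset is also in the superset."""
    return (superset_bitmap & subset_bitmap) == subset_bitmap


values = {to_bitmap({1, 5}): "A", to_bitmap({1, 3, 5}): "B"}

key = to_bitmap([5, 1])            # order of ids does not matter
print(values[key])

neighbor = key ^ (1 << 3)          # toggle id 3: a "neighboring" set
print(values[neighbor], contains(neighbor, key))
```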
Maybe not the best solution, but you can do the following:
Sort the set from lowest to highest with a simple IntegerComparator.
Append each item of the set to a String, with a separator between items (without one, {2,45,9} and {2,4,5,9} would both give "2459").
So if you have {2,5,9,4}: first step -> {2,4,5,9}; second -> "2,4,5,9".
This way you will get a unique String from a unique set. If you really need to map them to an integer value, you can hash the string after that.
A second way I can think of is to store them in a Java Set and simply use a HashMap with the sets as keys.
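Both suggestions have a short Python counterpart, sketched here with made-up sample values: a canonical string built from the sorted ids, and a frozenset used directly as a dictionary key.

```python
def canonical_key(ids):
    """Sorted ids joined with a separator, so multi-digit ids stay unambiguous."""
    return ",".join(str(i) for i in sorted(set(ids)))


values = {frozenset({1, 5}): "A", frozenset({1, 3, 5}): "B"}


def lookup(ids):
    """Look up the value for a set of ids, ignoring their order."""
    return values.get(frozenset(ids))


print(canonical_key({2, 5, 9, 4}), lookup([5, 1]))
```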
Calculate a 'diff' from each set {1, 6, 87, 89} = {1,5,81,2,0,0,...}
{1,2,3,4} = { 1,1,1,1,0,0,0,0... };
Then binary encode each number with a variable length encoding and concatenate the bits.
It's hard to compare the sets (except for the first few equal bits), but because there can't be many large intervals in a set, all possible values just might fit into 64 bits. (slack of 16 bits at least...)

Storing a bucket of numbers in an efficient data structure

I have buckets of numbers, e.g. 1 to 4, 5 to 15, 16 to 21, 22 to 34, ...
I have roughly 600,000 such buckets. The range of numbers that fall in each of the bucket varies. I need to store these buckets in a suitable data structure so that the lookups for a number is as fast as possible.
So my question is what is the suitable data structure and a sorting mechanism for this type of problem.
Thanks in advance
If the buckets are contiguous and disjoint, as in your example, you need to store in a vector just the left bound of each bucket (i.e. 1, 5, 16, 22) plus, as the last element, the first number that doesn't fall in any bucket (35). (I assume, of course, that you are talking about integer numbers.)
Keep the vector sorted.
You can search for the bucket in O(log n), with a kind-of-binary search. To find which bucket a number x belongs to, just look for the only index i such that vector[i] <= x < vector[i+1]. If x is strictly less than vector[0], or if it is greater than or equal to the last element of vector, then no bucket contains it.
EDIT. Here is what I mean:
#include <stdio.h>

// ~ Binary search. Should be O(log n)
int findBucket(int aNumber, int *leftBounds, int left, int right)
{
    int middle;

    if(aNumber < leftBounds[left] || leftBounds[right] <= aNumber) // cannot find
        return -1;
    if(left + 1 == right) // found
        return left;

    middle = left + (right - left)/2;

    if( leftBounds[left] <= aNumber && aNumber < leftBounds[middle] )
        return findBucket(aNumber, leftBounds, left, middle);
    else
        return findBucket(aNumber, leftBounds, middle, right);
}

#define NBUCKETS 12

int main(void)
{
    int leftBounds[NBUCKETS+1] = {1, 4, 7, 15, 32, 36, 44, 55, 67, 68, 79, 99, 101};
    // The buckets are 1-3, 4-6, 7-14, 15-31, ...
    int aNumber;

    for(aNumber = -3; aNumber < 103; aNumber++)
    {
        int index = findBucket(aNumber, leftBounds, 0, NBUCKETS);
        if(index < 0)
            printf("%d: Bucket not found\n", aNumber);
        else
            printf("%d belongs to the bucket %d-%d\n", aNumber, leftBounds[index], leftBounds[index+1]-1);
    }
    return 0;
}
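For comparison, the same left-bounds lookup can be sketched in Python with the standard bisect module. find_bucket is an illustrative name; the bounds are the ones from the C example, with the last element being the first number outside every bucket.

```python
from bisect import bisect_right

LEFT_BOUNDS = [1, 4, 7, 15, 32, 36, 44, 55, 67, 68, 79, 99, 101]


def find_bucket(x, left_bounds=LEFT_BOUNDS):
    """Return the index i with left_bounds[i] <= x < left_bounds[i+1], or None."""
    if x < left_bounds[0] or x >= left_bounds[-1]:
        return None
    # bisect_right finds the first bound strictly greater than x.
    return bisect_right(left_bounds, x) - 1


for x in (0, 1, 5, 100, 101):
    print(x, find_bucket(x))
```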
You will probably want some kind of sorted tree, like a B-Tree, B+ Tree, or Binary Search tree.
If I understand you correctly, you have a list of buckets and you want, given an arbitrary integer, to find out which bucket it goes in.
Assuming that none of the bucket ranges overlap, I think you could implement this in a binary search tree. That would make the lookup possible in O(log n) (where n = number of buckets).
It would be simple to do this: just define the left branch to be less than the low end of the bucket, and the right branch to be greater than the high end. So in your example we'd end up with a tree something like:
      16-21
     /     \
  5-15     22-34
  /
1-4
To search for, say, 7, you just check the root. Less than 16? Yes, go left. Less than 5? No. Greater than 15? No, you're done.
You just have to be careful to balance your tree (or use a self-balancing tree) in order to keep your worst-case performance down. This is really important if your input (the bucket list) is already sorted.
+1 to the kind-of binary search idea. It's simple and gives good performance for 600000 buckets. That being said, if it's not good enough, you could create an array with MAX BUCKET VALUE - MIN BUCKET VALUE = RANGE elements, and have each element in this array reference the appropriate bucket. Then, you get a lookup in guaranteed constant [O(1)] time, at the cost of using a huge amount of memory.
If A) the probability of accessing buckets is not uniform and B) you knew / could figure out how likely a given set of buckets were to be accessed, you could probably combine these two approaches to create a kind of cache. For example, say bucket {0, 3} were accessed all the time, as was {7, 13}, then you can create an array CACHE. . .
int cache_low_value = 0;
int cache_hi_value = 13;
CACHE[0] = BUCKET_1
CACHE[1] = BUCKET_1
...
CACHE[6] = BUCKET_2
CACHE[7] = BUCKET_3
CACHE[8] = BUCKET_3
...
CACHE[13] = BUCKET_3
. . . which will allow you to find a bucket in O(1) time, assuming the value you're trying to associate with a bucket is between cache_low_value and cache_hi_value (if Y <= cache_hi_value && Y >= cache_low_value; then BUCKET = CACHE[Y]). On the upside, this approach wouldn't use all the memory on your machine; on the downside, it would add the equivalent of an additional operation or two to your bsearch when you can't find your number / bucket pair in the cache (since you had to check the cache in the first place).
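A small Python sketch of this cache-plus-fallback idea, under made-up assumptions: the buckets are 0-3, 4-6, 7-13 and 14-19, the hot range is 0 to 13, and all names are illustrative.

```python
from bisect import bisect_right

LEFT_BOUNDS = [0, 4, 7, 14, 20]   # buckets 0-3, 4-6, 7-13, 14-19
CACHE_LOW, CACHE_HIGH = 0, 13     # assumed hot range


def slow_lookup(x):
    """Binary search over the left bounds; None if x is in no bucket."""
    if x < LEFT_BOUNDS[0] or x >= LEFT_BOUNDS[-1]:
        return None
    return bisect_right(LEFT_BOUNDS, x) - 1


# One precomputed entry per value in the hot range.
CACHE = [slow_lookup(x) for x in range(CACHE_LOW, CACHE_HIGH + 1)]


def find_bucket(x):
    """O(1) inside the cached range, binary search outside it."""
    if CACHE_LOW <= x <= CACHE_HIGH:
        return CACHE[x - CACHE_LOW]
    return slow_lookup(x)


print([find_bucket(x) for x in (0, 6, 13, 15, 20)])
```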
A simple way to store and sort these in C++ is to use a pair of sorted arrays that represent the lower and upper bounds on each bucket. Then, you can use int bucket_index = std::distance(lower_bounds.begin(), std::upper_bound(lower_bounds.begin(), lower_bounds.end(), value)) - 1 to find the last bucket whose lower bound is not greater than the value, and if (bucket_index >= 0 && upper_bounds[bucket_index] >= value), bucket_index is the bucket you want.
You can replace that with a single struct holding the bucket, but the principle will be the same.
Let me see if I can restate your requirement. It's analogous to having, say, the day of the year, and wanting to know which month a given day falls in? So, given a year with 600,000 days(an interesting planet), you want to return a string that is either "Jan","Feb","Mar"... "Dec"?
Let me focus on the retrieval end first, and I think you can figure out how to arrange the data when initializing the data structures, given what has already been posted above.
Create a data structure...
typedef struct {
    unsigned int DayOfYear : 20; // a bit-field int, donating some bits for other uses (unsigned, so it can hold up to 600,000)
    unsigned int MonthSS   : 4;  // subscript to select months (unsigned, so it can hold 0-11)
    unsigned int Unused    : 8;  // can be used to make MonthSS 12 bits
} BUCKET_LIST;
const char *MonthStr[12] = {"Jan","Feb","Mar"... "Dec"};
To initialize, use a for{} loop to set BUCKET_LIST.MonthSS to one of the 12 months in MonthStr.
On retrieval, do a binary search on a vector of BUCKET_LIST.DayOfYear (you'll need to write a trivial compare function for BUCKET_LIST.DayOfYear). Your result can be obtained by using the return from bsearch() as the subscript into MonthStr...
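pBucket = (BUCKET_LIST *)bsearch(&key, v_bucket_list, n_entries, sizeof(BUCKET_LIST), compare_day); // key, n_entries and compare_day are placeholder names: the entry to look for, the entry count, and the compare function mentioned above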
MonthString = MonthStr[pBucket->MonthSS];
The general approach here is to have collections of "pointers" to the strings attached to the 600,000 entries. All of the pointers in a bucket point to the same string. I used a bit int as a subscript here, instead of 600k 4 byte pointers, because it takes less memory (4 bits vs 4 bytes), and BUCKET_LIST sorts and searches as a species of int.
Using this scheme you'll use no more memory or storage than storing a simple int key, get the same performance as a simple int key, and do away with all the range checking on retrieval. IE: no if{ } testing. Save those if{ }s for initializing the BUCKET_LIST data structure, and then forget about them on retrieval.
I refer to this technique as subscript aliasing, as it resolves a many-to-one relationship by converting the subscript of the many to the subscript of the one - very efficiently I might add.
My application was to use an array of many UCHARs to index a much smaller array of double floats. The size reduction was enough to keep all of the hot-spot's data in L1 cache on the processor. 3X performance gain just from this one little change.
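To illustrate the aliasing idea on its own, here is a small Python sketch using an ordinary 365-day year instead of 600,000 entries. It indexes the alias array directly by day rather than going through bsearch, and ALIAS, MONTHS and month_of are illustrative names.

```python
from array import array

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

# One unsigned byte per day: the subscript of the month that day falls in.
ALIAS = array("B")
for month_index, days in enumerate(DAYS_IN_MONTH):
    ALIAS.extend([month_index] * days)


def month_of(day_of_year):
    """Map a 1-based day of the year to its month name with no range tests."""
    return MONTHS[ALIAS[day_of_year - 1]]


print(month_of(1), month_of(32), month_of(365))
```

All the range checking happens once, while ALIAS is filled; each lookup afterwards is two subscript operations.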

Resources