I have a document collection of members which have two relevant properties: _key and score. I've also created a persistent index on the score field, as that should make sorting significantly faster. I want to write an AQL query that returns different results based on the sorted index of a specific member (referred to as A):
Always return at least the top 5 members by score. (LIMIT 5)
If A is in the top 10, also return the members ranked 6 to 10. (LIMIT 5, 5)
Otherwise, return the members directly above and below A in rank. (LIMIT x - 2, 3, where x is A's 1-based rank)
I was unable to do this in a single query. However, I was able to fetch the rank of a member by doing something along the lines of
RETURN LENGTH(
  FOR m IN members
    FILTER m.score > DOCUMENT("members", "ID").score
    RETURN 1
) + 1
and then use a second query to fetch the ranked data I wanted, something like
FOR m IN members
SORT m.score DESC LIMIT 10
RETURN m
or joining two sub-queries with LIMIT 5 and LIMIT rank - 2, 3 depending on the rank.
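The selection rules above can be sketched outside AQL to make the offsets explicit. This is only an illustration in Python, not an ArangoDB API: `rank_of` and `limit_windows` are hypothetical helper names, ranks are assumed to be 1-based (as produced by the LENGTH(...) + 1 query), and the (offset, count) pairs correspond to the two arguments of LIMIT.

```python
def rank_of(scores, key):
    """1-based rank of `key`: one more than the number of strictly higher scores."""
    return sum(1 for s in scores.values() if s > scores[key]) + 1


def limit_windows(rank, top=5, near=10):
    """Return the (offset, count) pairs to pass to LIMIT for a 1-based rank."""
    windows = [(0, top)]  # always the top 5
    if rank <= near:
        # A is in the top 10: also fetch ranks 6..10
        windows.append((top, near - top))
    else:
        # otherwise fetch ranks rank-1, rank, rank+1 (offset is 0-based)
        windows.append((rank - 2, 3))
    return windows


# Hypothetical data: m0 has the highest score, m19 the lowest.
scores = {"m%d" % i: 100 - i for i in range(20)}
print(limit_windows(rank_of(scores, "m3")))
print(limit_windows(rank_of(scores, "m11")))
```

Under these assumptions the second query only needs the rank from the first one to pick its LIMIT arguments.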
I need to rank values so that the lowest value is ranked 1 and the highest value is ranked last. For instance, if I have five customers, their sales values can be ranked using the RANKX function (highest as 1 and lowest as 5), but in my case I need the ranks in reverse order: the highest value should be ranked last and the lowest value should be ranked 1.
I tried using the RANKX function with the SWITCH function, and I tried reversing the values, but if I apply a filter then the rank values are not correct.
I have a large database of location points and their corresponding res 10 hexes.
I need to query this database and identify how many points are in a certain res 4, 5, 6, 7, 8, and 9 hex.
Is this possible without adding additional res indexes in the database? Is there a certain format/pattern in the hex naming convention I could use?
All of the children of a res N index at res M fall within a range, so you can do a range query to find them. This takes a little wrangling, but only to construct the query, not to run it.
To find all the res 10 children of a res 4 index, e.g. 841e001ffffffff:
Take cellToCenterChild('841e001ffffffff', 10), which evaluates to 8a1e00000007fff. This is the bottom of the range.
The top of the range is a little trickier. We don't currently expose a function for it, but you can construct it by swapping the resolution bits of the parent from 4 to 10. In hexadecimal, this is conveniently just the second character, so you can swap 4 for a, yielding 8a1e001ffffffff. This is not a valid index, but will work for a range query.
Use a range query to find child indexes:
select * from my_data
where h3_index between '8a1e00000007fff' and '8a1e001ffffffff';
Assuming you have an appropriate index on h3_index, this should be fairly fast.
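The two bounds can also be built with plain integer arithmetic. The sketch below does not call the H3 library. It assumes the documented H3 index layout (resolution stored in bits 52 to 55, followed by the base cell and fifteen 3-bit digits in the low 45 bits) and that a center child always uses digit 0. `child_range` is a hypothetical helper name, not part of any H3 binding.

```python
RES_SHIFT = 52                 # assumed position of the 4 resolution bits
RES_MASK = 0xF << RES_SHIFT


def child_range(parent_hex, child_res):
    """Return (lowest, highest) hex strings bounding the children at child_res."""
    h = int(parent_hex, 16)
    parent_res = (h & RES_MASK) >> RES_SHIFT
    if not parent_res <= child_res <= 15:
        raise ValueError("child_res must be between the parent resolution and 15")

    # Top of the range: same bits as the parent, with only the resolution swapped.
    hi = (h & ~RES_MASK) | (child_res << RES_SHIFT)

    # Bottom of the range: the center child, i.e. digits parent_res+1..child_res
    # set to 0 while the unused finer digits stay at 7.
    lo = hi
    for r in range(parent_res + 1, child_res + 1):
        lo &= ~(0b111 << (3 * (15 - r)))

    return format(lo, "x"), format(hi, "x")


low, high = child_range("841e001ffffffff", 10)
print(low, high)
```

For the example parent this reproduces the two strings used in the range query above.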
I found here that I can select random nodes from Neo4j using the following queries:
MATCH (a:Person) RETURN a ORDER BY rand() limit 10
MATCH (a:Person) with a, rand() as rnd RETURN a ORDER BY rnd limit 10
Both queries seem to do the same thing, but when I try to match random nodes that are in a relationship with a given node, I get different results:
The next query always returns the same nodes (the nodes are not randomly selected)
MATCH (p:Person{user_id: '1'})-[r:REVIEW]->(m:Movie)
return m order by rand() limit 10
...but when I use rand() in a with clause I get indeed random nodes:
MATCH (p:Person{user_id: '1'})-[r:REVIEW]->(m:Movie)
with m, rand() as rnd
return m order by rnd limit 10
Any idea why rand() behaves differently in a WITH clause in the second query but not in the first?
It's important to understand that using rand() in the ORDER BY like this isn't doing what you think it's doing. It's not picking a random number per row, it's ordering by a single number.
It's similar to a query like:
MATCH (p:Person)
RETURN p
ORDER BY 5
Feel free to switch up the number. In any case, the ordering doesn't change, because sorting every row by the same number leaves the rows in their original order.
But when you project out a random number in a WITH clause per row, then you're no longer ordering by a single number for all rows, but by a variable which is different per row.
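As a rough analogy only (this is Python, not Neo4j, and it says nothing about how the Cypher planner is implemented), the difference between sorting by one constant and sorting by a per-row value looks like this. The row names are made up for illustration.

```python
import random

rows = ["m1", "m2", "m3", "m4", "m5", "m6"]

# Sorting by a single constant: every row ties, so a stable sort
# hands the rows back in their original order.
by_constant = sorted(rows, key=lambda row: 5)

# Projecting one random number per row first (like WITH m, rand() AS rnd),
# then sorting by that per-row value.
rng = random.Random(42)  # seeded only to make the sketch repeatable
with_rnd = [(row, rng.random()) for row in rows]
by_rnd = [row for row, rnd in sorted(with_rnd, key=lambda pair: pair[1])]

print(by_constant)
print(by_rnd)
```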
Please recommend the optimal algorithm or solution for such a task:
There are several arrays with fractional numbers
a = [1.5, 2, 3, 4.5, 7, 10, ...(up to 100 numbers)]
b = [5, 6, 8, 14, ...]
c = [1, 2, 4, 6.25, 8.15 ...] (up to 7 arrays)
Arrays can be of arbitrary length and contain a different count of numbers.
It is required to select one number from each array in such a way that their product falls within a given range.
For the example data, the required product should be between 40 and 50.
Solution can be:
a[2] * b[2] * c[1] = 3 * 8 * 2 = 48
a[0] * b[3] * c[1] = 1.5 * 14 * 2 = 42
If there can be several solutions (different combinations), then how can you find them all in the optimal way?
This is doable, but barely. This will require combining pairs of things over and over again using a variety of strategies.
First of all, if you have 2 arrays of no more than 100 things, you can create an array of all pairs, sorted by product either ascending or descending, and it only has 10,000 things in it.
Next, we can use a heap to implement a priority queue.
With a priority queue, we can combine 2 ordered arrays of size at most 10,000 to stream out the products in either ascending or descending order while not keeping track of more than 10,000 things. How? First we create a data structure like this:
Create priority queue
For every entry a of array A:
    Put (a, B[0], 0) into our queue using the product as a priority
return a data structure which contains B and the priority queue
And now we can get values out like this:
If the priority queue is empty:
    We're done
else:
    Take the first element of the queue
    if not at the end of B:
        insert (a, B[next_index], next_index) into the queue
    return that first element
And we can peek at them by just looking at the first element of the queue without touching the data structure.
This strategy can stream through 2 arrays of size 10,000 with total work just a few billion operations.
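The priority queue merge described above can be sketched with Python's heapq. This is a minimal illustration, assuming all numbers are positive (so that a * b grows with b). `stream_products` is a hypothetical name, and the heap stores indexes rather than the values themselves.

```python
import heapq


def stream_products(a_values, b_values, descending=False):
    """Lazily yield (product, a, b) for every pair, in sorted order of product.

    Assumes positive numbers. The heap never holds more than len(a_values) entries.
    """
    b_sorted = sorted(b_values, reverse=descending)
    if not b_sorted:
        return
    sign = -1 if descending else 1  # heapq is a min-heap, so negate for descending

    # One entry per a, paired with the first (best) b.
    heap = [(sign * a * b_sorted[0], i, 0) for i, a in enumerate(a_values)]
    heapq.heapify(heap)

    while heap:
        priority, i, j = heapq.heappop(heap)
        yield sign * priority, a_values[i], b_sorted[j]
        if j + 1 < len(b_sorted):
            # Advance this a to its next b and put it back in the queue.
            heapq.heappush(heap, (sign * a_values[i] * b_sorted[j + 1], i, j + 1))


for product, a, b in stream_products([1.5, 2, 3], [5, 6, 8]):
    print(product, a, b)
```

Because it is a generator, the caller can stop early or peek at one value at a time without materializing every pair.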
OK, so now we can arrange to always have 7 arrays. (Some may simply be a trivial [1].) We can start as follows with the brute force strategy.
Combine the first 2 ascending.
Combine the second 2 ascending.
Combine the third 2 descending.
Arrange the last descending.
Next we can use the priority queue merge strategy as follows:
Combine (first 2) with (second 2) ascending
Combine (third 2) with last descending
We just need the generators at the moment.
Now our strategy will look like this:
For each combination (in ascending order) from first 4:
    For each combination that lands in window from last 3:
        emit final combination
But how do we do the window? Well, as the combination from the first 4 goes up, the window that the last 3 has to fall in goes down. So adjusting the window looks like this:
while there is a next value and next value is large enough to fit in the window:
    Extract next value
    Add next value to end of window
while first value is too large for the window:
    remove first value from the window
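(A double-ended queue, such as Python's collections.deque, can do both of these operations in O(1) each. A plain Python list appends in amortized O(1), but removing its first element is O(n) unless you track a start index instead of actually deleting.)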
So our actual way to finish is:
For each combination (in ascending order) from first 4:
    adjust entries in window from last 3
    For each in window from last 3:
        emit final combination
This has a fixed overhead of a few billion operations plus O(number of answers) to actually emit the combinations. This includes a number of data structures with around 10k items, plus a window whose maximum size is 1 million items for a maximum memory usage of a few hundred MB.
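The window step can be sketched on a small scale as follows. This is an illustration under stated assumptions, not the full 7-array pipeline: all numbers are positive, the left side is already sorted ascending and the right side descending, and each item is a (product, combination) tuple. `products_in_range` is a hypothetical name, and the sample data extends the arrays from the question.

```python
from collections import deque


def products_in_range(left_asc, right_desc, lo, hi):
    """Yield (product, combo) for every left/right pairing with lo <= product <= hi."""
    right = iter(right_desc)
    pending = next(right, None)  # next value not yet in the window
    window = deque()             # right-side values, largest first

    for p, left_combo in left_asc:
        # As p grows, smaller right-side values become large enough: pull them in.
        while pending is not None and p * pending[0] >= lo:
            window.append(pending)
            pending = next(right, None)
        # The largest right-side values are now too large: drop them from the front.
        while window and p * window[0][0] > hi:
            window.popleft()
        for q, right_combo in window:
            yield p * q, left_combo + right_combo


a = [1.5, 2, 3, 4.5, 7, 10]
b = [5, 6, 8, 14]
c = [1, 2, 4, 6.25, 8.15]

left = sorted((x, (x,)) for x in a)
right = sorted(((y * z, (y, z)) for y in b for z in c), reverse=True)
hits = list(products_in_range(left, right, 40, 50))
print(len(hits), "combinations found")
```

In the full strategy the two sorted inputs would be the lazy streams built from the first 4 and the last 3 arrays rather than pre-sorted lists.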
I have been able to convert a csv file to list format using a function. In doing so I was able to assign a name to a class number and thereafter 3 additional numbers e.g:
In the csv file:
Hussain 1 7 8 0
Alexandra 1 0 0 2
Became :
['Alexandra', 2],['Hussain', 8]
As the sorting method asked for the name in alphabetical order and the person's highest score. I used tuples to complete the code above and would like to carry on using tuples.
Now I wish to be able to sort this so that it becomes highest averages to lowest, e.g, the sorting method for averages would result in:
[Hussain, 1.66666666667],[Alexandra, 0.6666666666]
These numbers are what I expect as they are the averages of the last three numbers in the csv file as the 2nd column is being ignored here. As Hussain has the highest average he is placed first. I would appreciate any possible help.
What I would like to be done is the following:
I would like to be able to print out all the students in order of highest averages to lowest. As Hussain has a higher average of 1.6, he is printed out first then Alexandra is printed as she has a lower average. These two students are from the same class (shown in the second column of the csv file) and they are to be printed when the user chooses class 1 to be sorted.
TIA
Suppose you have a list for class 1 like:
class1 = [['Alexandra', 1, 0, 0, 2], ['Hussain', 1, 7, 8, 0]]
Then compute each student's average and sort the result by its second element:
# Keep only class 1 students (i[1] is the class number) and average the scores in i[2:].
avg_list = [[i[0], float(sum(i[2:])) / len(i[2:])] for i in class1 if i[1] == 1]
# Sort by average, highest first.
dd = sorted(avg_list, key=lambda x: x[1], reverse=True)
print(dd)
Output:
[['Hussain', 5.0], ['Alexandra', 0.6666666666666666]]
Use the key argument of Python's sort functions. Read about it here:
https://docs.python.org/3/howto/sorting.html
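Since the question asks to keep using tuples, here is a small Python 3 sketch of the same idea that starts from CSV text. The file layout is an assumption (comma-separated, name first, class number second, then the scores), the sample text is made up to match the two rows shown, and `averages_for_class` is a hypothetical helper name.

```python
import csv
import io

# Assumed CSV layout: name, class number, then the scores.
SAMPLE = "Hussain,1,7,8,0\nAlexandra,1,0,0,2\n"


def averages_for_class(rows, class_number):
    """Return (name, average) tuples for one class, highest average first."""
    result = []
    for name, cls, *scores in rows:
        if int(cls) == class_number:
            numbers = [int(s) for s in scores]
            result.append((name, sum(numbers) / len(numbers)))
    # key picks the average out of each tuple; reverse=True sorts high to low.
    return sorted(result, key=lambda pair: pair[1], reverse=True)


rows = list(csv.reader(io.StringIO(SAMPLE)))
ranked = averages_for_class(rows, 1)
for name, average in ranked:
    print(name, average)
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by an opened file object.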