Histogram based search in SOLR (relevancy by position in multivalued field) - algorithm

I'm trying to add histogram-based search to SOLR. For instance, we need to find the distribution closest to (or exactly matching) [1, 2, 3, 4]. So, most likely, we can use a multivalued field of ints.
The question is: how can we make the relevance of a result depend on the position of each value in the multivalued field?
For example,
[1, 2, 3, 5]
is more relevant to [1, 2, 3, 4], since only the last element differs,
than
[1, 4, 3, 2]: although the numbers in this example are exactly the same, two of the elements are in different positions.
On the other hand, we don't need an exact match on the elements, because we only need to find the closest one. All elements have the same weight.
Any thoughts?
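One way to make "closest" concrete is a position-wise distance, where each bin is compared only with the bin at the same index. The sketch below is plain Python, not a SOLR feature; the function names and the choice of the sum of absolute differences (L1 distance) are assumptions made for illustration.

```python
def positional_distance(query, candidate):
    """Sum of absolute differences between bins at the same position (L1 distance)."""
    if len(query) != len(candidate):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(q - c) for q, c in zip(query, candidate))


def rank_by_distance(query, candidates):
    """Return candidates ordered from closest to farthest (hypothetical helper)."""
    return sorted(candidates, key=lambda c: positional_distance(query, c))


query = [1, 2, 3, 4]
print(positional_distance(query, [1, 2, 3, 5]))  # only the last bin differs
print(positional_distance(query, [1, 4, 3, 2]))  # same values, two bins swapped
```

With equal weights per bin, this ranks [1, 2, 3, 5] ahead of [1, 4, 3, 2], which is the ordering the question asks for.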

Related

What is the most efficient way to split an array of numbers, such that sum of each subset is as close to a target as possible, without exceeding it?

I am faced with this optimization challenge:
Take, for example, the array [1, 2, 4, 3, 3, 6, 2, 1, 6, 7, 4, 2].
I want to split this into multiple sub-arrays such that their sums are as close as possible to a target sum, say 7.
The only condition I have is that the sums cannot be more than the target sum.
Using a greedy approach, I can split them as
[1, 2, 4], [3, 3, 1], [6], [2, 4], [6], [7], [2]
The subset sums are 7, 7, 6, 6, 6, 7 and 2.
Another approach I tried is as follows:
1. Sort the array in reverse (descending) order.
2. Set up a running total initialized to 0, and an empty subset.
3. If the list is empty, proceed to Step 6.
4. Going down the list, pick the first number which, when added to the running total, does not exceed the target sum. If no such element is found, proceed to Step 6; otherwise proceed to Step 5.
5. Remove this element from the list, add it to the subset, and update the running total. Repeat from Step 3.
6. Print the current subset, then clear the running total and the subset. If the list isn't empty, repeat from Step 3. Otherwise proceed to Step 7.
7. You're done!
This approach produced the following split:
[7], [6, 1], [6, 1], [4, 3], [4, 3], [2, 2, 2]
The subset sums were much more even: 7, 7, 7, 7, 7 and 6.
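The sorted procedure described above can be sketched in Python roughly as follows. The function name is hypothetical, and the sketch assumes non-negative numbers, which lets the scan continue from the current position instead of restarting at the head of the list after each pick.

```python
def split_sorted_greedy(numbers, target):
    """Fill one subset at a time from a descending list without exceeding target."""
    remaining = sorted(numbers, reverse=True)
    subsets = []
    while remaining:
        subset, total = [], 0
        i = 0
        while i < len(remaining):
            if total + remaining[i] <= target:
                # Take the first element that still fits into the current subset.
                total += remaining[i]
                subset.append(remaining.pop(i))
            else:
                i += 1
        if not subset:
            raise ValueError("an element is larger than the target sum")
        subsets.append(subset)
    return subsets


print(split_sorted_greedy([1, 2, 4, 3, 3, 6, 2, 1, 6, 7, 4, 2], 7))
# [[7], [6, 1], [6, 1], [4, 3], [4, 3], [2, 2, 2]]
```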
Is this the best strategy?
Any help is greatly appreciated!
I think you should use the terms "subset" and "sub-array" carefully: a sub-array is a contiguous run of elements, while a subset can take elements from anywhere. What you are looking for is a "subset".
The best strategy here would be to write a recursive solution that tries each possible way of forming a subset such that its sum remains <= the maximum allowed sum.
If you look carefully at what the recursion does, you'll see that some sub-problems are solved again and again. So you can store (memoize) the solutions to the sub-problems and re-use them. Reading about dynamic programming will help you here.
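As a rough illustration of the memoized recursion, the sketch below takes one particular reading of the goal: use as few subsets as possible, with no subset exceeding the target. That objective, the function name, and the bitmask encoding of "which elements are already placed" are assumptions; the running time is exponential in the number of elements, so this only suits short lists.

```python
from functools import lru_cache


def min_subsets(numbers, target):
    """Smallest number of subsets, each with sum <= target, that cover all numbers."""
    n = len(numbers)
    if n == 0:
        return 0
    if any(x > target for x in numbers):
        raise ValueError("an element is larger than the target sum")

    @lru_cache(maxsize=None)
    def best(mask):
        # Returns (subsets used, sum of the subset currently being filled)
        # for the elements whose bits are set in mask; smaller tuples are better.
        if mask == 0:
            return (1, 0)
        result = (n + 1, 0)
        for i in range(n):
            if mask & (1 << i):
                count, load = best(mask ^ (1 << i))
                if load + numbers[i] <= target:
                    candidate = (count, load + numbers[i])
                else:
                    candidate = (count + 1, numbers[i])
                result = min(result, candidate)
        return result

    return best((1 << n) - 1)[0]


print(min_subsets([1, 2, 4, 3, 3, 6, 2, 1, 6, 7, 4, 2], 7))  # 6
```

For the example array the total is 41, so at least 6 subsets are needed with a target of 7, which matches the result of the sorted greedy split in the question.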

How to find the largest number in an array made of an ascending sorted array followed by a descending sorted array

For an array like [1, 2, 4, 6, 8, 7, 5], how do we efficiently find the largest number in it?
We know that the first part of the array is 1, 2, 4, 6, which is sorted in ascending order, and the second part is 8, 7, 5, which is sorted in descending order.
The simple solution would be to iterate through the array, but given that the array is made of two sorted arrays, I would imagine the search can be done with some variation of binary search to achieve O(log n) runtime complexity. However, I cannot seem to come up with the solution.
What you are asking for is equivalent to finding the "peak" of an array, which can be done in logarithmic time.
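A minimal sketch of such a peak search in Python, assuming the array is non-empty and strictly increasing up to the peak and strictly decreasing after it (the function name is hypothetical):

```python
def find_peak(values):
    """Binary search for the maximum of an increasing-then-decreasing array."""
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = 0, len(values) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if values[mid] < values[mid + 1]:
            # Still on the ascending side: the peak lies to the right of mid.
            lo = mid + 1
        else:
            # On the descending side (or at the peak): the peak is at mid or to the left.
            hi = mid
    return values[lo]


print(find_peak([1, 2, 4, 6, 8, 7, 5]))  # 8
```

Each iteration halves the search range by comparing one element with its right neighbour, which gives the O(log n) bound. Equal adjacent values would break the assumption and can lead the search to the wrong side.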

BST: Name of the ordering property

Assume you have the arrays
[10, 2, 3, 11, 4]
and
[4, 2, 11, 10, 3]
If you construct BSTs from these arrays,
the two BSTs are different even though the data is the same. If both lists are sorted before building the BSTs, you get the same BST. Is there a specific algorithm which lets you arrange two or more BSTs so that, given the same data, they always end up with the same layout?
Is there a term for this property?
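The observation in the question can be checked with a small sketch. It assumes plain, unbalanced BST insertion and represents each node as a (key, left, right) tuple; both choices are made only for illustration.

```python
def insert(node, key):
    """Insert key into an unbalanced BST stored as nested (key, left, right) tuples."""
    if node is None:
        return (key, None, None)
    value, left, right = node
    if key < value:
        return (value, insert(left, key), right)
    return (value, left, insert(right, key))


def build(keys):
    """Build a BST by inserting keys in the given order."""
    root = None
    for key in keys:
        root = insert(root, key)
    return root


a = [10, 2, 3, 11, 4]
b = [4, 2, 11, 10, 3]
print(build(a) == build(b))                  # False: insertion order changes the shape
print(build(sorted(a)) == build(sorted(b)))  # True: same order, same shape
```

Note that inserting already sorted keys this way yields a degenerate, list-like tree, so sorting makes the layout identical but not balanced.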

Genetic algorithm, cross over without duplicate data

I'm creating a genetic algorithm and I've just run into a problem. Let's take an example: I have a list of numbers, [2, 3, 6, 8, 9, 1, 4], which represents my data.
The best solution to my problem depends on the order of the numbers in the list. So I have two solutions: S1 [2, 3, 9, 8, 1, 6, 4] and S2 [1, 6, 4, 3, 9, 2, 8].
If I do a basic crossover of S1 and S2, I may obtain a child like [2, 3, 9, 8, 9, 2, 8], which is a bad solution because it duplicates data.
The question is: how can I perform the evolution step (the crossover) without duplicating data?
Thanks.
You will need a crossover operator like Ordered Crossover (OX1), which can perform crossover without duplicating genes:
OX1:
A randomly selected portion of one parent is mapped to a portion
of the other parent. From the replaced portion on, the rest is filled
up by the remaining genes, where already present genes are omitted and
the order is preserved.
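A minimal sketch of one common formulation of OX1 in Python, assuming both parents are permutations of the same unique, hashable genes. The function name and the optional start/end parameters (added so the cut points can be fixed for a reproducible example) are not part of the quoted description.

```python
import random


def ordered_crossover(parent1, parent2, start=None, end=None, rng=random):
    """Copy parent1[start:end] into the child, then fill the remaining positions
    with parent2's genes in order (starting after the slice and wrapping around),
    skipping genes that are already present."""
    size = len(parent1)
    if start is None or end is None:
        start, end = sorted(rng.sample(range(size + 1), 2))
    child = [None] * size
    child[start:end] = parent1[start:end]
    kept = set(parent1[start:end])
    # Genes of parent2 in order, beginning right after the copied slice.
    fill = [parent2[(end + k) % size] for k in range(size)
            if parent2[(end + k) % size] not in kept]
    # Free positions in the child, also beginning right after the copied slice.
    positions = [(end + k) % size for k in range(size - (end - start))]
    for pos, gene in zip(positions, fill):
        child[pos] = gene
    return child


s1 = [2, 3, 9, 8, 1, 6, 4]
s2 = [1, 6, 4, 3, 9, 2, 8]
print(ordered_crossover(s1, s2, start=2, end=5))  # [4, 3, 9, 8, 1, 2, 6]
```

The child keeps the slice [9, 8, 1] from S1 and takes the remaining genes in the order they appear in S2, so every number occurs exactly once.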
You should take care with mutation too, because it can change the order of the genes. In this case you can use a mutation operator like Reverse Sequence Mutation (RSM).
In the reverse sequence mutation operator, we take a sequence S
limited by two positions i and j randomly chosen, such that i<j.
The gene order in this sequence will be reversed by the same way as
what has been covered in the previous operation.
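The reversal itself can be sketched as follows, assuming i and j are inclusive positions with i < j. The function name and the optional parameters for fixing the positions are illustrative only.

```python
import random


def reverse_sequence_mutation(genes, i=None, j=None, rng=random):
    """Return a copy of genes with the segment from i to j (inclusive) reversed."""
    if i is None or j is None:
        i, j = sorted(rng.sample(range(len(genes)), 2))
    mutated = list(genes)
    mutated[i:j + 1] = reversed(mutated[i:j + 1])
    return mutated


print(reverse_sequence_mutation([2, 3, 9, 8, 1, 6, 4], i=1, j=4))
# [2, 1, 8, 9, 3, 6, 4]
```

Because the operator only reorders existing genes, the result is still a permutation of the same values.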
You are using permutation encoding; see this explanation: http://www.obitko.com/tutorials/genetic-algorithms/crossover-mutation.php
In general, you take some elements of the first parent in the order in which they appear in the first parent, and you take the rest of the elements in the order in which they appear in the second parent.

Finding permutations for balanced distribution of values in lists

Sorry for the bad title, but I don't know what to call this.
I have K lists with N elements in each, for example:
[8, 5, 6]
[4, 3, 2]
[6, 5, 0]
and I want to find a permutation of each list's elements such that the sums of the elements in the first column, second column, etc. are as close to each other as possible (so the distribution is "fair").
In my example that would be (probably):
[8, 5, 6]
[4, 2, 3] -- the lists contain the same values
[0, 6, 5] just in different order
sums: 12, 13, 14
Is there a more elegant way than generating all the permutations of each list and brute-forcing the "ideal" combination of them?
I'm not asking for code; just give me a hint on how to do it, if you know.
Thanks!
P.S. The lists can be quite large, and there can be more of them: think ~20x~20 max.
If you can accept an approximation, I would do it iteratively:
Sort the matrix rows by descending weight (the sum of the row's elements).
Edit: sorting first by the maximum element in each row could be better.
Each time you add a new row to your result matrix, put its smaller elements into the columns with the higher current sums.
Restore the rows of your result matrix to their initial order (if you have to).
It works with your example, but it will obviously not always be perfect.
Here is an example (javascript)
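As a rough Python sketch of the same steps (an independent reading of the description above, with a hypothetical function name): "higher columns" is taken to mean the columns whose running sums are currently largest, and rows are ordered by their sums rather than by their maximum element.

```python
def balance_columns(rows):
    """Greedy approximation: process rows from heaviest to lightest and give the
    smallest values of each row to the columns with the largest running sums."""
    n = len(rows[0])
    order = sorted(range(len(rows)), key=lambda r: sum(rows[r]), reverse=True)
    col_sums = [0] * n
    result = [None] * len(rows)
    for r in order:
        # Columns from largest to smallest running sum.
        cols = sorted(range(n), key=lambda c: col_sums[c], reverse=True)
        # Values of this row from smallest to largest.
        values = sorted(rows[r])
        new_row = [0] * n
        for c, v in zip(cols, values):
            new_row[c] = v
            col_sums[c] += v
        # Writing back by original index keeps the rows in their initial order.
        result[r] = new_row
    return result, col_sums


balanced, sums = balance_columns([[8, 5, 6], [4, 3, 2], [6, 5, 0]])
print(balanced)
print(sums)
```

On the example from the question this gives column sums of 13, 14 and 12, the same spread as the hand-made arrangement. As the answer notes, the approach is a heuristic and will not always find the best arrangement.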
