ElasticSearch: minimum_should_match and length of terms list - elasticsearch

Using ElasticSearch I'm trying to use the minimum_should_match option on a Terms Query to find documents that have a list of longs that is X% similar to the list of longs I'm querying with.
e.g:
{
"filter": {
"fquery": {
"query": {
"terms": {
"mynum": [1, 2, 3, 4, 5, 6, 7, 8, 9, 13],
"minimum_should_match": "90%",
"disable_coord": False
}
}
}
}
}
will match two documents with a mynum list of:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
and:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]
This works and is correct since the first document has a 10 at the end while the query contained a 13 and the second document contained an 11 where again the query contained a 13.
Which means that 1 ou of 10 numbers in my query's list is different in the returned document and amounts to the allowed 90% similarity (minimum_should_match) value in the query.
Now the issue that I have is that I would like the behaviour to be different in the sense that since the second document is longer and has 11 numbers in place of 10, the difference level should ideally have been higher since it has actually two values 11 and 12 that are not in the query's list. e.g:
Instead of computing the intersection of:
(list1) [1, 2, 3, 4, 5, 6, 7, 8, 9, 13]
with:
(list2) [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]
which is a 10% difference
it should say that since list2 is longer than list1, the intersection should be:
(list2) [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]
with:
(list1) [1, 2, 3, 4, 5, 6, 7, 8, 9, 13]
which is a 12% difference
Is this possible ?
If not, how could I weight in the length of the list besides using a dense vector rather than a sparse one ? e.g:
using
[1, 2, 3, 4, 5, 6, 7, 8, 9, , , , 13]
rather than:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 13]

Related

Replace multiple indexes with one value

I have an array that has the value
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
I want to replace indexes 1 to 3, [2, 3, 4], with the single value of 123.
Is there a way to do this in golang?
Rather than use an array you should be working with slices.
a := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
b := append(append(a[:1:1], 123), a[4:]...)
Or if you don't need to keep the original elements then copy in place.
a := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
a[1] = 123
copy(a[2:8], a[4:10])
a = a[:8]

Transform one list into another [duplicate]

This question already has answers here:
Algorithm: optimal way to rearrange a list from one order to another?
(4 answers)
Closed 4 years ago.
Given two lists, for example:
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [2, 4, 6, 7, 0, 1, 3, 5, 8, 9]
I wish to find a series of moves which will transform list a into list b, where each move is an operation:
move(from_index, to_index)
which moves the element at location from_index and places it at location to_index. So if:
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
then the operation move(3,1) on the list a will transform a into:
a = [0, 3, 1, 2, 4, 5, 6, 7, 8, 9]
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [2, 4, 6, 7, 0, 1, 3, 5, 8, 9]
move(0, 8)
a = [1, 2, 3, 4, 5, 6, 7, 0, 8, 9]
move(0, 8)
a = [2, 3, 4, 5, 6, 7, 0, 1, 8, 9]
move(1, 8)
a = [2, 4, 5, 6, 7, 0, 1, 3, 8, 9]
move(2, 8)
a = [2, 4, 6, 7, 0, 1, 3, 5, 8, 9]
a==b
Hopefully that's what you're looking for.
Basically, start with the left- most element and move it to where it should be. For example, I took 0 and placed it right after the value that it is supposed to eventually end up behind, which is 7. I continued moving from left to right until all of the elements were in the desired order.
I'd iterate over the second sequence (the sorted list) and swap items in the first. I wrote this pseudo-code in python:
>>> a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> b = [2, 4, 6, 7, 0, 1, 3, 5, 8, 9]
>>> def swap(seq, i, j):
... a = seq[i]
... seq[i] = seq[j]
... seq[j] = a
...
>>> for index_in_b, value in enumerate(b):
... index_in_a = a.index(value)
... if index_in_b != index_in_a:
... swap(a, index_in_a, index_in_b)
... print('move {} to {}'.format(index_in_a, index_in_b))
move 0 to 2
move 1 to 4
move 2 to 6
move 3 to 7
move 4 to 6
move 5 to 6
move 6 to 7
In this case I'm moving the items in the first sequence by swapping them.
Update
We can slightly improve the performance in python by removing the move inside swap function and also removing the function call. Here is a performance comparison:
import timeit
s1 = """
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [2, 4, 6, 7, 0, 1, 3, 5, 8, 9]
def swap(seq, i, j):
a = seq[i]
seq[i] = seq[j]
seq[j] = a
for index_in_b, value in enumerate(b):
index_in_a = a.index(value)
if index_in_b != index_in_a:
swap(a, index_in_a, index_in_b)"""
s2 = """
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [2, 4, 6, 7, 0, 1, 3, 5, 8, 9]
for index_in_b, value in enumerate(b):
index_in_a = a.index(value)
if index_in_b != index_in_a:
a[index_in_a], a[index_in_b] = a[index_in_b], a[index_in_a]"""
# on an i7 macbook pro
timeit.timeit(s1)
4.087386846542358
timeit.timeit(s2)
3.5381240844726562
Slightly better, but for sure there are better ways to achieve this.

python appending issues, function keeps changing values of list

I was trying to visualize bubblesort by making an animated plot on some unsorted list, say np.random.permutation(10)
so naturally I would append the list every time it's altered within the bubblesort function until it's completely sorted. Here's the code
def bubblesort(A):
instant = []
for i in range(len(A)-1):
lindex=0
while lindex+1<len(A):
if A[lindex]> A[lindex+1]:
swap(A,lindex,lindex+1)
lindex+=1
else:
lindex+=1
instant.append(A)
return instant
The problem is though, instant only returns
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])]
which is obviously not right. What has gone wrong? Thanks!
A is being operated on in-place, and bubblesort is returning a list of references to this array. Notice that if you check A now, it is also sorted.
Changing
if A[lindex]> A[lindex+1]:
swap(A,lindex,lindex+1)
to
if A[lindex]> A[lindex+1]:
A = A.copy()
swap(A,lindex,lindex+1)
making a copy before changing anything, should show the progress of the sort.

recursive removal of elements in array

Given an array of n elements, remove any adjacent pair of elements which are equal. Repeat this operation until there are no more adjacent pairs to remove; that will be the final array.
For e.g 1 2 2 3 4 should return the array 1 3 4.
please note array need not to be sorted.
check this test case also: 1,2,2,3,4,4,3,5 o/p should be 1,5.
(2,2) and (4,4) gets removed, then (3,3) which became adjacent after the removal of (4,4)
Any time you remove a pair of elements, you also need to see if you generated another pair that you want to remove.
The algorithm should follow naturally from that observation.
In Python:
>>> l=[1,2,2,3,4,4,3,5]
>>> [x for x in l if not l.count(x) > 1]
[1, 5]
This removes all integers that occur more than once in the list. This is a correct result for your example but I think that you are really trying to state something different. I think you are saying:
list:=(an unsorted list of integers)
while adjacent_pairs(list) is True:
remove_adjacent_pairs(list)
Once again, in Python:
#!/usr/bin/env python
def dedupe_adjacent(l):
for i in xrange(len(l) - 1, 0, -1):
if l[i] == l[i-1]:
del l[i-1:i+1]
return True
return False
def process_list(l):
print "input list: ",l
i=1
while(dedupe_adjacent(l)):
print " loop ",i,":",l
i+=1
print "processed list=",l
print
process_list([1,2,2,3,4,4,3,5])
process_list([1,2,2,3,4,4,6,3,5])
Output:
input list: [1, 2, 2, 3, 4, 4, 3, 5]
loop 1 : [1, 2, 2, 3, 3, 5]
loop 2 : [1, 2, 2, 5]
loop 3 : [1, 5]
processed list= [1, 5]
input list: [1, 2, 2, 3, 4, 4, 6, 3, 5]
loop 1 : [1, 2, 2, 3, 6, 3, 5]
loop 2 : [1, 3, 6, 3, 5]
processed list= [1, 3, 6, 3, 5]
The following:
function compress(arr) {
var prev, res = [];
for (var i in arr) {
if (i == 0 || (arr[i] != arr[i - 1]))
res.push(arr[i]);
}
return res;
}
compress([1, 2, 2, 3, 3, 3, 3, 4, 3, 3, 5, 6, 7, 8, 8]);
Returns:
[1, 2, 3, 4, 3, 5, 6, 7, 8]
Also (JavaScript 1.6 solution):
[1, 2, 2, 3, 3, 3, 3, 4, 3, 3, 5, 6, 7, 8, 8].filter(function(el, i, arr) {
return i == 0 || (el != arr[i - 1]);
})
Edit: Removing any item that appears in the array more than once requires a different solution:
function dedup(arr) {
var res = [], seen = {};
for (var i in arr)
seen[arr[i]] = seen[arr[i]] ? ++seen[arr[i]] : 1;
for (var j in arr) {
if (seen[arr[j]] == 1)
res.push(arr[j]);
}
return res;
}
The following:
dedup([1, 2, 2, 3, 4, 4, 3, 5]);
Produces:
[1, 5]
I have a solution to this in Java. You need to use replaceAll method in String class in Java. You can use regular expession to remove such adjacent redundant characters:
public class MyString {
public static void main(String[] args) {
String str = "12234435";
while(!str.replaceAll("(\\w)\\1+", "").equalsIgnoreCase(str))
str = str.replaceAll("(\\w)\\1+", "");
System.out.println(str);
}
}
You can find how to give a regular expression here
I would:
Sort the array.
From the start of the array, until you are at the last element of the array do:
`count` = count the number of array[i] elements.
remove the first `count` elements of the array if `count` > 1.
The following Python 3 code will remove duplicates from a list (array). It does this by scanning the array from start towards end and compares the target element with the element one larger. If they are the same they are removed. If the element pointer is not pointing at 0, then it is reduced by 1 in order to catch nested pairs. If the two compared elements are different then the pointer is incremented.
I'm sure there's a more pythonic way to remove two adjacent elements from a list, but I'm new to Python and haven't figured that out yet. Also, you'll want to get rid of the print(indx, SampleArray) statement--I left it in there to let you follow the progress in the output listing below.
# Algorithm to remove duplicates in a semi-sorted list
def CompressArray(SampleArray):
indx=0
while(indx < len(SampleArray)-1):
print(indx, SampleArray)
if(SampleArray[indx]==SampleArray[indx+1]):
del(SampleArray[indx])
del(SampleArray[indx])
if(indx>0):
indx-=1
else:
indx+=1
return SampleArray
Here are sample runs for:
[1, 2, 2, 3, 4]
[1, 2, 2, 3, 4, 4, 3, 5]
[1, 2, 2, 3, 3, 3, 3, 4, 3, 3, 5, 6, 7, 8, 8]
[1, 2, 2, 3, 4, 6, 7, 7, 6, 4, 3, 8, 8, 5, 9, 10, 10, 9, 11]
[1, 1, 2, 3, 3, 2, 4, 5, 6, 6, 5, 7, 8, 8, 7, 4, 9]
================================
0 [1, 2, 2, 3, 4]
1 [1, 2, 2, 3, 4]
0 [1, 3, 4]
1 [1, 3, 4]
[1, 3, 4]
================================
0 [1, 2, 2, 3, 4, 4, 3, 5]
1 [1, 2, 2, 3, 4, 4, 3, 5]
0 [1, 3, 4, 4, 3, 5]
1 [1, 3, 4, 4, 3, 5]
2 [1, 3, 4, 4, 3, 5]
1 [1, 3, 3, 5]
0 [1, 5]
[1, 5]
================================
0 [1, 2, 2, 3, 3, 3, 3, 4, 3, 3, 5, 6, 7, 8, 8]
1 [1, 2, 2, 3, 3, 3, 3, 4, 3, 3, 5, 6, 7, 8, 8]
0 [1, 3, 3, 3, 3, 4, 3, 3, 5, 6, 7, 8, 8]
1 [1, 3, 3, 3, 3, 4, 3, 3, 5, 6, 7, 8, 8]
0 [1, 3, 3, 4, 3, 3, 5, 6, 7, 8, 8]
1 [1, 3, 3, 4, 3, 3, 5, 6, 7, 8, 8]
0 [1, 4, 3, 3, 5, 6, 7, 8, 8]
1 [1, 4, 3, 3, 5, 6, 7, 8, 8]
2 [1, 4, 3, 3, 5, 6, 7, 8, 8]
1 [1, 4, 5, 6, 7, 8, 8]
2 [1, 4, 5, 6, 7, 8, 8]
3 [1, 4, 5, 6, 7, 8, 8]
4 [1, 4, 5, 6, 7, 8, 8]
5 [1, 4, 5, 6, 7, 8, 8]
[1, 4, 5, 6, 7]
================================
0 [1, 2, 2, 3, 4, 6, 7, 7, 6, 4, 3, 8, 8, 5, 9, 10, 10, 9, 11]
1 [1, 2, 2, 3, 4, 6, 7, 7, 6, 4, 3, 8, 8, 5, 9, 10, 10, 9, 11]
0 [1, 3, 4, 6, 7, 7, 6, 4, 3, 8, 8, 5, 9, 10, 10, 9, 11]
1 [1, 3, 4, 6, 7, 7, 6, 4, 3, 8, 8, 5, 9, 10, 10, 9, 11]
2 [1, 3, 4, 6, 7, 7, 6, 4, 3, 8, 8, 5, 9, 10, 10, 9, 11]
3 [1, 3, 4, 6, 7, 7, 6, 4, 3, 8, 8, 5, 9, 10, 10, 9, 11]
4 [1, 3, 4, 6, 7, 7, 6, 4, 3, 8, 8, 5, 9, 10, 10, 9, 11]
3 [1, 3, 4, 6, 6, 4, 3, 8, 8, 5, 9, 10, 10, 9, 11]
2 [1, 3, 4, 4, 3, 8, 8, 5, 9, 10, 10, 9, 11]
1 [1, 3, 3, 8, 8, 5, 9, 10, 10, 9, 11]
0 [1, 8, 8, 5, 9, 10, 10, 9, 11]
1 [1, 8, 8, 5, 9, 10, 10, 9, 11]
0 [1, 5, 9, 10, 10, 9, 11]
1 [1, 5, 9, 10, 10, 9, 11]
2 [1, 5, 9, 10, 10, 9, 11]
3 [1, 5, 9, 10, 10, 9, 11]
2 [1, 5, 9, 9, 11]
1 [1, 5, 11]
[1, 5, 11]
================================
0 [1, 1, 2, 3, 3, 2, 4, 5, 6, 6, 5, 7, 8, 8, 7, 4, 9]
0 [2, 3, 3, 2, 4, 5, 6, 6, 5, 7, 8, 8, 7, 4, 9]
1 [2, 3, 3, 2, 4, 5, 6, 6, 5, 7, 8, 8, 7, 4, 9]
0 [2, 2, 4, 5, 6, 6, 5, 7, 8, 8, 7, 4, 9]
0 [4, 5, 6, 6, 5, 7, 8, 8, 7, 4, 9]
1 [4, 5, 6, 6, 5, 7, 8, 8, 7, 4, 9]
2 [4, 5, 6, 6, 5, 7, 8, 8, 7, 4, 9]
1 [4, 5, 5, 7, 8, 8, 7, 4, 9]
0 [4, 7, 8, 8, 7, 4, 9]
1 [4, 7, 8, 8, 7, 4, 9]
2 [4, 7, 8, 8, 7, 4, 9]
1 [4, 7, 7, 4, 9]
0 [4, 4, 9]
[9]
================================
I love Java, but functional solutions should get more time on this site.
In Haskell, doing things the way the question asks:
compress lst = if (length lst == length b) then lst else (compress b) where
b = helper lst
helper [] = []
helper [x] = [x]
helper (x:y:xs) = if (x == y) then (helper xs) else (x:helper (y:xs))
You can solve this problem in O(n) time, although it is a bit more complicated
compress' lst = reverse (helper [] lst) where
helper xs [] = xs
helper [] (x:xs) = helper [x] xs
helper (a:as) (x:xs)
| a == x = helper as xs
| otherwise = helper (x:a:as) xs
I think we could use a stack to check adjacent duplicated elements.
Scan the array. For each new element, if it is equal to the top element in the stack, drop it and pop the top element from the stack. Otherwise, push it into the stack.
Here is the stack based algorithm based upon the edited question.
// pseudo code, not tested
void RemoveDupp(vector<int> & vin, vector<int> & vout)
{
int i = 0, int j = -1;
vout.resize(vin.size());
while (i < vin.size())
{
if (j == -1 || vout[j] != vin[i])
vout[++j] = vin[i++]; //push
else
j--, i++; //pop
}
vout.resize(j + 1);
}

Create an array with evenly-spaced values

What is an easy way to generate an array that has values with a fixed distance between them?
For example:
1, 4, 7, 10,... etc
I need to be able to set start, end, and step distance.
Try using Range.step:
> (1..19).step(3).to_a
=> [1, 4, 7, 10, 13, 16, 19]
In Ruby 1.9:
1.step(12).to_a #=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
1.step(12,3).to_a #=> [1, 4, 7, 10]
Or you can splat instead of to_a:
a = *1.step(12,3) #=> [1, 4, 7, 10]

Resources