Searching for multiple values in a string array in Elasticsearch

I have a field that I am indexing into Elasticsearch that is an array of strings. So, for example, here is what the string array will look like in two records:
Record 1: {"str1", str2", str3", "str4", "str5"}
Record 2: {"str1", str2", str6", "str7", "str8"}
Question 1: I want to be able to query for multiple strings in this array. For example, my query has "str1", "str2", "str3" as the search parameters, and I want to find records where the string array contains any of these three strings.
Question 2: In the scenario above, will Record 1 return with a higher score than Record 2 (since all three strings are in the array for Record 1 but only two are in Record 2)?
Is this possible at all? Can you please help with what the query should look like, and confirm whether the scoring works the way I described?

You can index them as an array, such as:
{
  "myArrayField": [ "str1", "str2", "str3", "str4", "str5" ],
  ...
}
You would then be able to query in a number of ways; the simplest for your case is a match query (which is analyzed):
{
  "match" : {
    "myArrayField" : "str1 str2 str3"
  }
}
Or a terms query (which is not analyzed):
{
  "terms" : {
    "myArrayField" : [ "str1", "str2", "str3" ]
  }
}
And yes, with the match query, documents that match more of the query terms receive a higher score, so Record 1 would be scored higher than Record 2.
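If you want the same any-of semantics with explicit per-term clauses, one option (a sketch, not from the original answer) is a bool query whose should clauses are term queries against the same hypothetical myArrayField:
{
  "query" : {
    "bool" : {
      "should" : [
        { "term" : { "myArrayField" : "str1" } },
        { "term" : { "myArrayField" : "str2" } },
        { "term" : { "myArrayField" : "str3" } }
      ],
      "minimum_should_match" : 1
    }
  }
}
Each matching should clause contributes to _score, so documents containing more of the three strings sort first, which matches the behavior described in Question 2.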

Related

Python OpenSearch retrieve records based on element in a list

So I need to retrieve records based on a field called "cash_transfer_ids", which is a Python list.
I want to retrieve all records whose cash_transfer_ids contains a specific id value (a string).
What should the query look like? Should I use a match or a term query?
Example: I want to retrieve any record whose cash_transfer_ids field contains 'abc'.
Then I may get records such as
record 1: cash_transfer_ids: ['abc']
record 2: cash_transfer_ids: ['dfdfd', 'abc']
etc...
Thanks very much for any help!
If cash_transfer_ids is of type keyword, you can filter with a term query:
term = "abc"
query = {
"query": {
"term": {
"cash_transfer_ids": {
"value": term
}
}
}
}
response = get_client_es().search(index="idx_test", body=query)
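If you later need to retrieve records whose cash_transfer_ids contains any of several ids, a terms query is the natural extension; this sketch reuses the same index name and the get_client_es() helper from above:
ids = ["abc", "another_id"]  # hypothetical list of ids to match
query = {
    "query": {
        "terms": {
            "cash_transfer_ids": ids
        }
    }
}
response = get_client_es().search(index="idx_test", body=query)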

How to search for an exact value in a keyword array field in ElasticSearch?

I have a keyword field that is an array, and I am trying to search for documents that contain one of the values in the array.
A document contains a field called allow_groups:
"allow_groups" : [
  "c4e3f246-0b1f-43cc-831e-37ca620bf083"
],
I have tried a match query
...
"must" : [
  {
    "match" : {
      "allow_groups" : {
        "query" : "c4e3f246-0b1f-43cc-831e-37ca620bf083"
      }
    }
  },
...
This returns results, but as soon as I change a single character (the 3 at the end of the value to a 4), the documents still return. I have to change the value to something much different for the documents to stop returning.
I have also tried a term query in its place, but I can't get documents to come back at all.
I should also mention that in the end I am trying to pass an array of values to be matched against the allow_groups keyword array, and return a document when there is at least one exact match. I'm just testing with one value currently.
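What you are describing is consistent with allow_groups being analyzed as text: the standard analyzer splits the UUID on hyphens, so changing one character only alters one token and the match query still finds the document on the remaining tokens, while a term query with the full UUID finds nothing. Along the lines of the terms-query answer above, an exact any-of lookup might look like this (a sketch; it assumes the default dynamic mapping, which adds a not-analyzed allow_groups.keyword sub-field, and the second UUID is a hypothetical placeholder):
{
  "query" : {
    "terms" : {
      "allow_groups.keyword" : [
        "c4e3f246-0b1f-43cc-831e-37ca620bf083",
        "00000000-0000-0000-0000-000000000000"
      ]
    }
  }
}
If allow_groups itself is mapped as keyword, drop the .keyword suffix; terms returns a document when at least one of the supplied values matches exactly.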

Project the sum of all fields in a document that match a regular expression, in elasticsearch

In Elasticsearch, I know I can specify the fields I want to return from documents that match my query using {"fields":["fieldA", "fieldB", ..]}.
But how do I return the sum of all fields that match a particular regular expression (as a new field)?
For example, if my documents look like this:
{"documentid":1,
"documentStats":{
"foo_1_1":1,
"foo_2_1":5,
"boo_1_1:3
}
}
and I want, per document, the sum of all stats whose name matches _1_?
You can define an artificial field, called a script field, that contains a small Groovy script which will do the job for you.
So after your query, you can add a script_fields section like this:
{
  "query" : {
    ...
  },
  "script_fields" : {
    "sum" : {
      "script" : "_source.documentStats.findAll{ it.key =~ '_1_' }.collect{ it.value }.sum()"
    }
  }
}
What the script does is simply retrieve all the fields in documentStats whose name matches _1_ and sum their values; in this example you'll get 4 (1 from foo_1_1 plus 3 from boo_1_1, since foo_2_1 does not contain _1_).
Make sure to enable dynamic scripting in elasticsearch.yml and restart your ES node before trying this out.
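Note that dynamic Groovy scripting only exists on older Elasticsearch releases; from 5.x onward the default scripting language is Painless, where script fields read the document through params['_source']. A roughly equivalent script_fields section might look like this (a sketch, not tested against your mapping):
{
  "script_fields" : {
    "sum" : {
      "script" : {
        "lang" : "painless",
        "source" : "int sum = 0; for (def entry : params['_source'].documentStats.entrySet()) { if (entry.getKey().contains('_1_')) { sum += entry.getValue(); } } return sum;"
      }
    }
  }
}
Here contains('_1_') stands in for the Groovy =~ '_1_' match, which is equivalent in this case because the pattern has no regex metacharacters.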

MongoDB compound indexes vs single field indexes in terms of space consumption

According to this post, compound indexes are bigger in size (I could not find much info in the docs, so if you could point me there I would be grateful).
Suppose I have to search for the whole address (we can assume I will always have all the fields available both in collection and in the query) through a collection of addresses like
{
  name: String,
  street: String,
  postcode: String,
  city: String,
  country: String
}
My question is: how much bigger would a compound index be?
If a compound index is bigger than a single-field one, wouldn't it be better to add a hash of the concatenation of all values to each object, add a single index on the hash field, and search by that (although it does not sound like good practice)?
These accomplish different things. A compound index has an order, and that order has an effect. For instance, the index { 'country' : 1, 'city' : 1, 'postcode' : 1 } would allow you to search for all addresses in a specific city of a specific country. A hash can't do that; hashes only support exact matches.
I don't see how this is bad practice at all; it's just a very narrow use case. Remember that every slight difference in spelling, additional whitespace, etc. will result in different hash values, and that you can't even answer a simple question like "how many addresses in country X do we store?". But if you don't need that, why not?
By the way, MongoDB has built-in support for this. If the address is embedded, using a hashed index on the entire subdocument will accomplish what you need:
MongoDB supports hashed indexes of any single field. The hashing function collapses embedded documents and computes the hash for the entire value.
e.g.:
> db.hash.insert( { "name" : "john", "address" : { "city" : "Chicago", "state" : "IL", "country" : "US" } } );
WriteResult({ "nInserted" : 1 })
> db.hash.createIndex( { "address" : "hashed" } );
...
>
> // this query uses the index and finds the document:
> db.hash.find( { "address" : { "city" : "Chicago", "state" : "IL", "country" : "US" } } );
>
> // this query won't find the document because of the missing state, but is still fast (IXSCAN):
> db.hash.find( { "address" : { "city" : "Chicago", "country" : "US" } } );

Sorting by value in multivalued field in elasticsearch

I have a multivalued field with integers in the document, for example:
{
  "values": [1, 2, 3, 4, 5]
}
I apply a range filter, for example from 2 to 4, and get a list of documents whose values contain 2, 3, or 4.
Now I'd like to sort the results so that documents containing 3 are returned first.
I could do it using script-based sorting:
{
  "sort": {
    "_script": {
      "script": "doc['values'].getValues().contains(3) ? 0 : 1",
      "type": "number"
    }
  }
}
But I don't like its performance, because getValues() actually returns a List, and its contains method is O(n).
Are there any better ways?
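One script-free alternative (a sketch, not from the original question): keep the range condition as a non-scoring filter and add a should clause for the preferred value, then rely on the default _score ordering:
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "values": { "gte": 2, "lte": 4 } } }
      ],
      "should": [
        { "term": { "values": 3 } }
      ]
    }
  }
}
Documents containing 3 receive a score contribution from the should clause and sort ahead of the rest, and the term lookup is an inverted-index operation rather than a per-document scan of the list.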
