Elasticsearch match query on each separate value in a multi value field without nested - elasticsearch

For a multi-valued field, like this:
PUT match_test
{
"mappings": {
"properties": {
"companies": {
"type": "text"
}
}
}
}
POST match_test/_doc/1
{
"companies": ["bank of canada", "japan games and movies", "microsoft canada"]
}
This query returns the document we inserted above:
GET match_test/_search
{
"query": {
"match": {
"companies": {
"query": "canada games",
"operator": "and"
}
}
}
}
Is there any way to tell elastic to match to each item in the list separately?
I want the doc to match "bank of", "of America", "bank", "games", "Canada", but not "Microsoft games"
I do not want to use nested documents or scripts

If you want to find words that are far apart from each other but are still on the same array index , then you can use position_increment_gap.
When creating a mapping, set position_increment_gap of the field to 100. Elasticsearch will automatically index array data at each position with +100 in position for the data at the next index.
Then write a match_phrase query with slop 99.
PUT match_test
{
"mappings": {
"properties": {
"companies": {
"type": "text",
"position_increment_gap": 100
}
}
}
}
GET match_test/_search
{
"query": {
"match_phrase": {
"companies": {
"query": "japan movies",
"slop":99
}
}
}
}
Read more about it here https://www.elastic.co/guide/en/elasticsearch/reference/current/position-increment-gap.html

Related

Number of nested objects in Elasticsearch

Looking for a way to get the number of nested objects, for querying, sorting etc.
For example, given this index:
PUT my-index-000001
{
"mappings": {
"properties": {
"some_id": {"type": "long"},
"user": {
"type": "nested",
"properties": {
"first": {
"type": "keyword"
},
"last": {
"type": "keyword"
}
}
}
}
}
}
PUT my-index-000001/_doc/1
{
"some_id": 111,
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
}
How to filter by the number of users (e.g. query fetching all documents with more than XX users).
I was thinking to using a runtime_field but this gives an error:
GET my-index-000001/_search
{
"runtime_mappings": {
"num": {
"type": "long",
"script": {
"source": "emit(doc['some_id'].value)"
}
},
"num1": {
"type": "long",
"script": {
"source": "emit(doc['user'].size())" // <- this breaks with "No field found for [user] in mapping"
}
}
}
,"fields": [
"num","num1"
]
}
Is it possible perhaps using aggregations?
Would also be nice to know if I can sort the results (e.g. all documents with more than XX and sorted desc by XX).
Thanks.
You cannot query this efficiently
It is possible to use this hack for it, but I would only do it if you need to do some one-time fetching, not for a regular use case as it uses params._source and is therefore really slow when you have a lot of docs
{
"query": {
"function_score": {
"min_score": 1, # -> min number of nested docs to filter by
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": "params._source['user'].size()"
}
}
],
"boost_mode": "replace"
}
}
}
It basically calculates a new score for each doc, where the score is equal to the length of the users array, and then removes all docs under min_score from returning
The best way to do this is to add a userCount field at indexing time (since you know how many elements there are) and then query that field using a range query. Very simple, efficient and fast.
Each element of the nested array is a document in itself, and thus, not queryable via the root-level document.
If you cannot re-create your index, you can leverage the _update_by_query endpoint in order to add that field:
POST my-index-000001/_update_by_query?wait_for_completion=false
{
"script": {
"source": """
ctx._source.userCount = ctx._source.user.size()
"""
}
}

How to store terms to search on those terms in the ElasticSearch?

I have a requirement where the search should be made on some specific terms. For example,
{
"_index":"idx_name",
"_type":"_doc",
"_id":"82000223323",
"_score":1,
"_source":{
"title":"where is my card?"
}
}
Assuming the above document is in Elasticsearch, I need to fetch this document whenever there is a query on either debit or credit keywords. So, How do I go about solving this in ES? What would be the mapping for that new field and what would be the right query?
You can create a new field called card_type to index types debit and/or credit.
So, you can use Term Query to filter results by each type.
Mapping
{
"mappings": {
"properties": {
"title": {
"type": "text"
},
"card_type": {
"type": "keyword"
}
}
}
}
POST my-index-000001/_doc
{
"title": "where is my card?",
"card_type": "debit"
}
OR
POST my-index-000001/_doc
{
"title": "any value here",
"card_type": ["debit", "credit"]
}
The new query filter by type debit.
GET my-index-000001/_search
{
"query": {
"bool": {
"filter": [
{
"term": {
"card_type": "debit"
}
}
],
"must": [
{
"match": {
"title": "where is my card?"
}
}
]
}
}
}

Elasticsearch 5.X Percolate: How to autogenerate copy_to fields?

In ES 2.3.3, many queries in the system I'm working on use the _all field. Sometimes these are registered to a percolate index, and when running percolator on the doc, _all is generated automatically.
In converting to ES 5.X _all is being deprecated and so _all has been replaced with a copy_to field that contains the components that we actually care about, and it works great for those searches.
Registering the same query to a percolate index with the same document mapping including copy_to fields works fine. Sending a percolate query with the document never results in a hit for a copy_to field however.
Manually building the copy_to field via simple string concatenation seems to work, it's just that I'd expect to be able to Query -> DocIndex and get the same result as Doc -> PercolateQuery... So I'm just looking for a way to have ES generate the copy_to fields automatically on a document being percolated.
Ended up there was nothing wrong with ES of course, posting here in case it helps someone else. Figured it out while attempting to generate a simpler example to post here with details... Basically the issue came down to the fact that attempting to percolate a document of a type that doesn't exist in the percolate index doesn't give any errors back, but seems to apply all percolate queries without applying any mappings which was just confusing as it worked for simple test cases, but not complex ones. Here's an example:
From the copy_to docs, generate an index with a copy_to mapping. See that a query to the copy_to field works.
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"first_name": {
"type": "text",
"copy_to": "full_name"
},
"last_name": {
"type": "text",
"copy_to": "full_name"
},
"full_name": {
"type": "text"
}
}
}
}
}
PUT my_index/my_type/1
{
"first_name": "John",
"last_name": "Smith"
}
GET my_index/_search
{
"query": {
"match": {
"full_name": {
"query": "John Smith",
"operator": "and"
}
}
}
}
Create a percolate index with the same type
PUT /my_percolate_index
{
"mappings": {
"my_type": {
"properties": {
"first_name": {
"type": "text",
"copy_to": "full_name"
},
"last_name": {
"type": "text",
"copy_to": "full_name"
},
"full_name": {
"type": "text"
}
}
},
"queries": {
"properties": {
"query": {
"type": "percolator"
}
}
}
}
}
Create a percolate query that matches our other percolate query on the copy_to field, and a second query that just queries on a basic unmodified field
PUT /my_percolate_index/queries/1?refresh
{
"query": {
"match": {
"full_name": {
"query": "John Smith",
"operator": "and"
}
}
}
}
PUT /my_percolate_index/queries/2?refresh
{
"query": {
"match": {
"first_name": {
"query": "John"
}
}
}
}
Search, but with the wrong type... there will be a hit on the basic field (first_name: John) even though no document mappings match the request
GET /my_percolate_index/_search
{
"query" : {
"percolate" : {
"field" : "query",
"document_type" : "non_type",
"document" : {
"first_name": "John",
"last_name": "Smith"
}
}
}
}
{"took":7,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":1,"max_score":0.2876821,"hits":[{"_index":"my_percolate_index","_type":"queries","_id":"2","_score":0.2876821,"_source":{
"query": {
"match": {
"first_name": {
"query": "John"
}
}
}
}}]}}
Send in the correct document_type and see both matches as expected
GET /my_percolate_index/_search
{
"query" : {
"percolate" : {
"field" : "query",
"document_type" : "my_type",
"document" : {
"first_name": "John",
"last_name": "Smith"
}
}
}
}
{"took":7,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":2,"max_score":0.51623213,"hits":[{"_index":"my_percolate_index","_type":"queries","_id":"1","_score":0.51623213,"_source":{
"query": {
"match": {
"full_name": {
"query": "John Smith",
"operator": "and"
}
}
}
}},{"_index":"my_percolate_index","_type":"queries","_id":"2","_score":0.2876821,"_source":{
"query": {
"match": {
"first_name": {
"query": "John"
}
}
}
}}]}}

Elasticsearch Query Filter for Word Count

I am currently looking for a way to return documents with a maximum of n words in a certain field.
The query could look like this for a resultset that contains documents with less than three words in the "name" field but there is nothing like word_count as far as I know.
Does anyone know how to handle this, maybe even in a different way?
GET myindex/myobject/_search
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"word_count": {
"name": {
"lte": 3
}
}
}
]
}
},
"query": {
"match_all" : { }
}
}
}
}
You can use the token_count data type in order to index the number of tokens in a given field and then search on that field.
# 1. create the index/mapping with a token_count field
PUT myindex
{
"mappings": {
"myobject": {
"properties": {
"name": {
"type": "string",
"fields": {
"word_count": {
"type": "token_count",
"analyzer": "standard"
}
}
}
}
}
}
}
# 2. index some documents
PUT index/myobject/1
{
"name": "The quick brown fox"
}
PUT index/myobject/2
{
"name": "brown fox"
}
# 3. the following query will only return document 2
POST myindex/_search
{
"query": {
"range": {
"name.word_count": { 
"lt": 3
}
}
}
}

Elasticsearch sorting by matching array item

I have a following structure in indexed documents:
document1: "customLists":[{"id":8,"position":8},{"id":26,"position":2}]
document2: "customLists":[{"id":26,"position":1}]
document3: "customLists":[{"id":8,"position":1},{"id":26,"position":3}]
I am able to search matching documents that belong to a given list with match query "customLists.id = 26". But I need to sort the documents based on the position value within that list and ignore positions of the other lists.
So the expected results would be in order of document2, document1, document3
Is the data structure suitable for this kind of sorting and how to handle this?
One way to achieve this would be to set mapping type of customLists as nested and then use sorting by nested fields
Example :
1) Create Index & Mapping
put test
put test/test/_mapping
{
"properties": {
"customLists": {
"type": "nested",
"properties": {
"id": {
"type": "integer"
},
"position": {
"type": "integer"
}
}
}
}
}
2) Index Documents :
put test/test/1
{
"customLists":[{"id":8,"position":8},{"id":26,"position":2}]
}
put test/test/2
{
"customLists":[{"id":26,"position":1}]
}
put test/test/3
{
"customLists":[{"id":8,"position":1},{"id":26,"position":3}]
}
3) Query to sort by positon for given id
post test/_search
{
"filter": {
"nested": {
"path": "customLists",
"query": {
"term": {
"customLists.id": {
"value": "26"
}
}
}
}
},
"sort": [
{
"customLists.position": {
"order": "asc",
"mode": "min",
"nested_filter": {
"term": {
"customLists.id": {
"value": "26"
}
}
}
}
}
]
}

Resources