Elasticsearch Query Filter for Word Count

I am looking for a way to return documents with at most n words in a certain field.
For a result set of documents with at most three words in the "name" field, the query could look like the one below, but as far as I know there is nothing like word_count.
Does anyone know how to handle this, perhaps in a different way?
GET myindex/myobject/_search
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"word_count": {
"name": {
"lte": 3
}
}
}
]
}
},
"query": {
"match_all" : { }
}
}
}
}

You can use the token_count data type in order to index the number of tokens in a given field and then search on that field.
# 1. create the index/mapping with a token_count field
PUT myindex
{
"mappings": {
"myobject": {
"properties": {
"name": {
"type": "string",
"fields": {
"word_count": {
"type": "token_count",
"analyzer": "standard"
}
}
}
}
}
}
}
# 2. index some documents
PUT myindex/myobject/1
{
"name": "The quick brown fox"
}
PUT myindex/myobject/2
{
"name": "brown fox"
}
# 3. the following query will only return document 2
POST myindex/_search
{
"query": {
"range": {
"name.word_count": { 
"lt": 3
}
}
}
}
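As a rough illustration of the idea (not Elasticsearch code), the sketch below counts tokens once at "index time" and then filters on the stored integer. The `token_count` helper and its `\w+` regex are assumptions that only approximate the standard analyzer.

```python
import re

def token_count(text):
    # Rough stand-in for the standard analyzer: count runs of word characters.
    return len(re.findall(r"\w+", text))

docs = {1: "The quick brown fox", 2: "brown fox"}

# Index time: store the count next to each document, like name.word_count.
word_counts = {doc_id: token_count(name) for doc_id, name in docs.items()}

# Query time: the range filter "lt": 3 only compares the stored integers.
matching_ids = [doc_id for doc_id, count in word_counts.items() if count < 3]
print(matching_ids)
```

Because the count is computed when the document is indexed, the query itself never has to re-tokenize the text.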

Related

How can I search for documents whose field contains only the search term in Elasticsearch?

Part of my document mapping is below:
"character_cut": {
"type": "keyword"
}
And here is some sample data:
doc1
character_cut: ["John"]
doc2
character_cut: ["John", "Smith"]
doc3
character_cut: ["Smith", "Jessica", "Anna"]
doc4
character_cut: ["John"]
If I search for "John", I retrieve doc1, doc2 and doc4.
How can I retrieve only doc1 and doc4 with a "John" query?
There are 2 ways to do it.
1. Token_count
A field of type token_count is really an integer field which accepts string values, analyzes them, then indexes the number of tokens in the string.
PUT index-name
{
"mappings": {
"properties": {
"character_cut":{
"type": "text",
"fields": {
"keyword":{
"type":"keyword"
},
"length":{
"type":"token_count", // number of keyword tokens
"analyzer":"keyword"
}
}
}
}
}
}
Query
{
"query": {
"bool": {
"must": [
{
"term": {
"character_cut.keyword": {
"value": "John"
}
}
},
{
"term": {
"character_cut.length": {
"value": 1 // replace with the number of values required
}
}
}
]
}
}
}
2. Using script query
{
"query": {
"bool": {
"must": [
{
"term": {
"character_cut.keyword": {
"value": "John"
}
}
},
{
"script": {
"script": "doc['character_cut.keyword'].size()==1" // replace 1 with the number of values required
}
}
]
}
}
}
token_count calculates the count at index time, so it will be faster than the script query, which computes it at query time.
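The intent of both queries can be sketched in plain Python (not Elasticsearch code): a document matches only if the term is present and the field holds exactly the required number of values. The `only_contains` helper is a hypothetical name used for illustration.

```python
docs = {
    "doc1": ["John"],
    "doc2": ["John", "Smith"],
    "doc3": ["Smith", "Jessica", "Anna"],
    "doc4": ["John"],
}

def only_contains(values, term, required=1):
    # Mirrors the two must clauses: the term is present AND the field
    # holds exactly `required` values.
    return term in values and len(values) == required

hits = [doc_id for doc_id, values in docs.items() if only_contains(values, "John")]
print(hits)
```

It is probably worth running both variants against a few multi-valued documents to confirm they return the same hits on your Elasticsearch version.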

Not able to search in a compound query using an analyzer

I have a problem index which has multiple fields e.g tags (comma separated string of tags), author, tester. I am creating a global search where problems can be searched by all these fields at once.
I am using a bool query, e.g.:
{
"query": {
"bool": {
"must": [{
"match": {
"author": "author_username"
}
},
{
"match": {
"tester": "tester_username"
}
},
{
"match": {
"tags": "<tag1,tag2>"
}
}
]
}
}
}
Without an analyzer I am able to get results, but the text is split on whitespace, e.g. python 3 is searched as python or 3.
I want to search for Python 3 as a single term, so I have created an analyzer for tags so that every comma-separated tag is treated as one token instead of being split on whitespace.
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "pattern",
"pattern": ","
}
}
}
},
"mappings": {
"properties": {
"tags": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "standard"
}
}
}
}
But now I am not getting any results. Please let me know what I am missing here. I am not able to find the use of analyzer in compound queries in the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/compound-queries.html
Adding an example:
{
"query": {
"bool": {
"must": [{
"match": {
"author": "test1"
}
},
{
"match": {
"tester": "test2"
}
},
{
"match": {
"tags": "test3, abc 4"
}
}
]
}
}
}
Results should match all the fields, but for the tags field there should be a union of tags, and the query should be split on commas, not on spaces, i.e. the query should match test3 and abc 4, but the above query searches for test3, abc and 4.
You need to either remove search_analyzer from your mapping or pass my_analyzer in the match query:
GET tags/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"tags": {
"query": "python 3",
"analyzer": "my_analyzer" // by default the search analyzer is used
}
}
}
]
}
}
}
By default, queries will use the analyzer defined in the field mapping, but this can be overridden with the search_analyzer setting.
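To see why the mismatch between index-time and search-time analysis yields no hits, here is a small Python sketch (not Elasticsearch code). `comma_tokens` stands in for my_analyzer and `standard_tokens` is only a rough approximation of the standard analyzer; both helper names are assumptions.

```python
import re

def comma_tokens(text):
    # Stand-in for my_analyzer: split on "," only, nothing else is changed.
    return [tok for tok in text.split(",") if tok]

def standard_tokens(text):
    # Rough stand-in for the standard analyzer: lowercase word tokens.
    return re.findall(r"\w+", text.lower())

indexed = set(comma_tokens("python 3,java"))  # tokens stored in the index

with_standard = indexed & set(standard_tokens("python 3"))  # search_analyzer: standard
with_custom = indexed & set(comma_tokens("python 3"))       # analyzer: my_analyzer
print(with_standard, with_custom)
```

In this sketch, splitting "test3, abc 4" on commas leaves a leading space on the second token, so it may be worth checking with the _analyze API whether the real tokenizer behaves the same way for your input.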

Is it possible to use a term query with asciifolding?

I would need to match the whole field but using lowercase and asciifolding token filters. Is this possible in Elasticsearch?
For example, if I have a "Title" field for products and the product title is "Potovalni Kovček". And the user search query is "potovalni kovcek" then I need to return this product as the result. But only if the whole title matches the search query. If the user search query is "potovalni" or "Potovalni" or "kovcek" no results should be returned.
Can I create a term query with lowercase and asciifolding token filters? I couldn't figure out how to do that.
What I would do is to define the title field as a keyword and use a custom normalizer to do the job.
First let's create the index:
PUT test
{
"settings": {
"analysis": {
"normalizer": {
"exact": {
"type": "custom",
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"doc": {
"properties": {
"title": {
"type": "keyword",
"normalizer": "exact"
}
}
}
}
}
Then, we index a sample document:
PUT test/doc/1
{
"title": "Potovalni Kovček"
}
Finally, we can search:
# Record 1 is returned
POST test/_search
{
"query": {
"term": {
"title": "Potovalni Kovček"
}
}
}
# Record 1 is returned
POST test/_search
{
"query": {
"term": {
"title": "potovalni kovcek"
}
}
}
# No record is returned
POST test/_search
{
"query": {
"term": {
"title": "potovalni"
}
}
}
# No record is returned
POST test/_search
{
"query": {
"term": {
"title": "kovcek"
}
}
}
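The effect of the normalizer can be approximated in plain Python (not Elasticsearch code). In this sketch, asciifolding is imitated by Unicode decomposition and dropping combining marks, which is an assumption and covers fewer characters than the real filter; `normalize_exact` and `term_matches` are hypothetical helper names.

```python
import unicodedata

def normalize_exact(value):
    # Lowercase, then approximate asciifolding by dropping combining marks.
    decomposed = unicodedata.normalize("NFKD", value.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# The whole title is kept as one normalized value, like a keyword field.
indexed_title = normalize_exact("Potovalni Kovček")

def term_matches(query):
    # The query goes through the same normalization, then must match exactly.
    return normalize_exact(query) == indexed_title

print(indexed_title, term_matches("potovalni kovcek"), term_matches("potovalni"))
```

Because the value is never split into words, a partial title such as "potovalni" cannot match.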

Elasticsearch discard documents that contain superset of query

Let's say I have 3 documents:
{ "cities": "Paris Zurich Milan" }
{ "cities": "Paris Zurich" }
{ "cities": "Zurich"}
cities is just text, I'm not using any custom analyzer.
I want to query for documents that have in cities both Paris and Zurich, in this order, and do not have any other city. So I want to get only the second document.
This is what I'm trying so far:
{
"query": {
"match_phrase": {
"cities": "Paris Zurich"
}
}
}
But this returns also the first document.
What should I do instead?
If you do not need case-insensitive matching, just use a term query on the keyword sub-field:
{
"query": {
"term": {
"cities.keyword": "Paris Zurich"
}
}
}
It will only match the exact value of the field.
Alternatively, you can create a custom analyzer that still indexes the whole field value as a single token (just like keyword), with one exception: the indexed value is converted to lowercase, so documents containing Paris Zurich as well as paris Zurich can be found. Here is an example:
{
"settings": {
"analysis": {
"analyzer": {
"lowercase_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"char_filter": [],
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"doc": {
"properties": {
"cities": {
"type": "text",
"fields": {
"lowercased": {
"type": "text",
"analyzer": "lowercase_analyzer"
}
}
}
}
}
}
}
{
"query": {
"term": {
"cities.lowercased": "paris zurich" // Query string should also be in lowercase
}
}
}

elasticsearch - find document by exactly matching a nested object

I have documents that contain multiple role/right definitions as an array of nested objects:
{
...
'roleRights': [
{'roleId':1, 'right':1},
{'roleId':2, 'right':1},
{'roleId':3, 'right':2},
]
}
I am trying to filter for documents with specific roleRights, but my query seems to mix up combinations. Here is my filter query as pseudocode:
boolFilter > must > termQuery >roleRights.roleId: 1
boolFilter > must > termQuery >roleRights.right: 2
The above should only return
documents that have role 1 assigned with right 2.
But it looks like I get
all document that have role 1 assigned disregarding the right
and all documents that have right 2 assigned disregarding the role.
Any hints?
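You need to map roleRights as nested. Without a nested mapping, an array of objects is flattened into one list of values per field, so the link between a given roleId and its right is lost. The mapping looks like this: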
PUT your_index
{
"mappings": {
"your_type": {
"properties": {
"roleRights": {
"type": "nested",
"properties": {
"roleId": { "type": "integer" },
"right": { "type": "integer" }
}
}
}
}
}
}
Make sure to delete your index first, recreate it and re-populate it.
Then you'll be able to query like this, with both conditions inside a single nested query so that they must match the same roleRights object:
POST your_index/_search
{
"query": {
"nested": {
"path": "roleRights",
"query": {
"bool": {
"must": [
{
"term": { "roleRights.roleId": 1 }
},
{
"term": { "roleRights.right": 2 }
}
]
}
}
}
}
}
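The difference between the default object mapping and a nested mapping can be illustrated with a short Python sketch (not Elasticsearch code). `flattened_match` and `nested_match` are hypothetical helpers that model how values are compared in each case.

```python
role_rights = [
    {"roleId": 1, "right": 1},
    {"roleId": 2, "right": 1},
    {"roleId": 3, "right": 2},
]

def flattened_match(objects, role_id, right):
    # Default object mapping: values are pooled per field, pairs are lost.
    role_ids = {o["roleId"] for o in objects}
    rights = {o["right"] for o in objects}
    return role_id in role_ids and right in rights

def nested_match(objects, role_id, right):
    # Nested mapping: both conditions must hold within the same object.
    return any(o["roleId"] == role_id and o["right"] == right for o in objects)

print(flattened_match(role_rights, 1, 2), nested_match(role_rights, 1, 2))
```

The flattened check reports a match for role 1 with right 2 even though no single object has that pair, which is the mix-up described in the question.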
