How can I search for documents whose field contains only the search term in Elasticsearch?

Part of my document mapping is below:
"character_cut": {
"type": "keyword"
}
Sample data:
doc1
character_cut: ["John"]
doc2
character_cut: ["John", "Smith"]
doc3
character_cut: ["Smith", "Jessica", "Anna"]
doc4
character_cut: ["John"]
If I search for "John", it retrieves doc1, doc2, and doc4.
How can I retrieve only doc1 and doc4 with a "John" query?

There are two ways to do it.
1. token_count
A field of type token_count is really an integer field that accepts string values, analyzes them, and then indexes the number of tokens in the string. In the mapping below, the length sub-field stores the number of keyword tokens.
PUT index-name
{
"mappings": {
"properties": {
"character_cut":{
"type": "text",
"fields": {
"keyword":{
"type":"keyword"
},
"length":{
"type":"token_count",
"analyzer":"keyword"
}
}
}
}
}
}
Query (set the value in the character_cut.length clause to the number of values required)
{
"query": {
"bool": {
"must": [
{
"term": {
"character_cut.keyword": {
"value": "John"
}
}
},
{
"term": {
"character_cut.length": {
"value": 1
}
}
}
]
}
}
}
2. Using a script query (replace 1 in the script with the number of values required)
{
"query": {
"bool": {
"must": [
{
"term": {
"character_cut.keyword": {
"value": "John"
}
}
},
{
"script": {
"script": "doc['character_cut.keyword'].size()==1"
}
}
]
}
}
}
token_count calculates the count at index time, so it will be faster than the script query, which computes the count at query time.
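As a rough illustration of what both approaches are meant to select, here is a minimal Python sketch that applies the same condition to the sample documents in memory. It does not talk to Elasticsearch; only_value_matches is a hypothetical helper, and the use of set() reflects an assumption that the script sees each distinct keyword value once.

```python
# Sample documents from the question, keyed by document name.
docs = {
    "doc1": ["John"],
    "doc2": ["John", "Smith"],
    "doc3": ["Smith", "Jessica", "Anna"],
    "doc4": ["John"],
}


def only_value_matches(values, term, required_count=1):
    """Return True when `term` is present and the field holds exactly
    `required_count` distinct values (hypothetical helper, not an ES API)."""
    distinct = set(values)
    return term in distinct and len(distinct) == required_count


matching = sorted(name for name, values in docs.items()
                  if only_value_matches(values, "John"))
print(matching)  # ['doc1', 'doc4']
```

Raising required_count to 2 would select doc2 instead, which corresponds to changing the value 1 in either query.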

Related

Proximity-Relevance in Elasticsearch

I have a JSON record in Elasticsearch with the fields
"streetName": "5 Street",
"name": ["Shivam Apartments"]
I tried the query below, but it returns nothing once I add the streetName bool clause:
{
"query": {
"bool": {
"must": [
{
"bool": {
"must": {
"match": {
"name": {
"query": "shivam apartments",
"minimum_should_match": "80%"
}
}
}
}
},
{
"bool": {
"must": {
"match": {
"streetName": {
"query": "5 street",
"minimum_should_match": "80%"
}
}
}
}
}
]
}
}
}
Document Mapping
{
"rabc_documents": {
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "autocomplete_analyzer",
"position_increment_gap": 0
},
"streetName": {
"type": "keyword"
}
}
}
}
}
According to the Elasticsearch documentation on keyword fields:
"Keyword fields are only searchable by their exact value".
Keyword fields are also case sensitive.
Taking this into account:
Searching for "5 street" will not match "5 Street" ('s' vs 'S') on a keyword field.
minimum_should_match will not work on a keyword field.
Suggestion: for partial matches, use a "text" mapping instead of "keyword". Keyword fields are meant for filtering, term-based aggregations, and so on.
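The difference can be sketched in a few lines of Python. This is a simplified model, not Elasticsearch's implementation: keyword_match and text_match are hypothetical helpers, and the text model assumes an analyzer that lowercases and splits on whitespace.

```python
def keyword_match(indexed_value, query_text):
    """Model of a keyword field: the whole value must match exactly."""
    return indexed_value == query_text


def text_match(indexed_value, query_text):
    """Simplified model of an analyzed text field: lowercase, split on
    whitespace, and match if any query token is an indexed token."""
    indexed_tokens = set(indexed_value.lower().split())
    query_tokens = query_text.lower().split()
    return any(token in indexed_tokens for token in query_tokens)


print(keyword_match("5 Street", "5 street"))  # False: case differs
print(keyword_match("5 Street", "5 Street"))  # True: exact value
print(text_match("5 Street", "5 street"))     # True: tokens are lowercased
```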

Can you reference other queries in Elasticsearch percolator?

Can percolator queries reference other stored query docs in a percolator index? For example, suppose I have the following Boolean query, with _id=1, already indexed in the percolator:
{
"query": {
"bool": {
"must": [
{ "term": { "tag": "wow" } }
]
}
}
}
Could I have another query, with _id=2, indexed (note that I'm making up the _percolator_ref_id terms query key):
{
"query": {
"bool": {
"should": [
{ "term": { "tag": "elasticsearch" } },
{ "terms" : { "_percolator_ref_id": [1] } }
]
}
}
}
If I percolated the following document:
{ "tag": "wow" }
I would expect both _id=1 and _id=2 queries to match. Does some functionality like _percolator_ref_id exist?
Thanks!
Edit: To clarify, I do not know beforehand how many query references appear in a given query (e.g., the _id=2 query could reference 10 other queries potentially).
You can do something like the following.
Two percolator fields (query and query1) are registered in the index below:
PUT myindex
{
"mappings": {
"properties": {
"query1": {
"type": "percolator"
},
"query": {
"type": "percolator"
},
"field": {
"type": "text"
}
}
}
}
You can use bool with must/should to combine the percolate queries:
GET /myindex/_search
{
"query": {
"bool": {
"must": [
{
"percolate": {
"field": "query",
"document": {
"field": "fox jumps over the lazy dog"
}
}
},
{
"percolate": {
"field": "query1",
"document": {
"field": "fox jumps over the lazy dog"
}
}
}
]
}
}
}
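If the number of percolator fields varies, the request body above can be assembled programmatically. The following Python sketch only builds the JSON body; build_percolate_search is a hypothetical helper, and sending the body to the cluster is left out.

```python
import json


def build_percolate_search(percolator_fields, document):
    """Build a search body with one `percolate` clause per percolator field,
    combined under bool/must as in the request above."""
    clauses = [
        {"percolate": {"field": field, "document": document}}
        for field in percolator_fields
    ]
    return {"query": {"bool": {"must": clauses}}}


body = build_percolate_search(
    ["query", "query1"],
    {"field": "fox jumps over the lazy dog"},
)
print(json.dumps(body, indent=2))
```

Switching "must" to "should" in the helper would return documents whose stored query matches in either field rather than in both.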

Querying Nested JSON based on 1 term value

I have indexed JSON documents in the format below:
JSON:
{"work":[{"organization":"abc", "end":"present"},{"organization":"edf", "end":"old"}]}
{"work":[{"organization":"edf", "end":"present"},{"organization":"abc", "end":"old"}]}
I want to query records where organization is "abc" and end is "present",
but the query below does not work:
work.0.organization: "abc" AND work.0.end:"present"
No records are matched.
If I use the query below instead:
work.organization: "abc" AND work.end:"present"
both records are matched, whereas I only want the first one.
The only matched record should be:
{"work":[{"organization":"abc", "end":"present"},{"organization":"edf", "end":"old"}]}
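You have to use the nested type. First, map work as a nested field with the following mapping (the document_type level is a mapping type, which applies to Elasticsearch 6.x and earlier; from 7.x on, properties goes directly under mappings):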
PUT index_name_3
{
"mappings": {
"document_type" : {
"properties": {
"work" : {
"type": "nested",
"properties": {
"organization" : {
"type" : "text"
},
"end" : {
"type" : "text"
}
}
}
}
}
}
}
Use the following nested query to match both conditions within the same work entry; inner_hits returns the matching nested objects:
{
"query": {
"nested": {
"path": "work",
"inner_hits": {},
"query": {
"bool": {
"must": [{
"term": {
"work.organization": {
"value": "abc"
}
}
},
{
"term": {
"work.end": {
"value": "present"
}
}
}
]
}
}
}
}
}
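Why the plain object mapping matches both records while the nested mapping matches only the first can be sketched in Python. This is a simplified in-memory model of the two behaviours, not Elasticsearch code; flattened_match and nested_match are hypothetical helpers.

```python
docs = [
    {"work": [{"organization": "abc", "end": "present"},
              {"organization": "edf", "end": "old"}]},
    {"work": [{"organization": "edf", "end": "present"},
              {"organization": "abc", "end": "old"}]},
]


def flattened_match(doc, organization, end):
    """Object-array model: each field's values are pooled across all
    entries, so the two conditions may be met by different entries."""
    organizations = {w["organization"] for w in doc["work"]}
    ends = {w["end"] for w in doc["work"]}
    return organization in organizations and end in ends


def nested_match(doc, organization, end):
    """Nested model: both conditions must hold within the same entry."""
    return any(w["organization"] == organization and w["end"] == end
               for w in doc["work"])


flat_hits = [i for i, d in enumerate(docs) if flattened_match(d, "abc", "present")]
nested_hits = [i for i, d in enumerate(docs) if nested_match(d, "abc", "present")]
print(flat_hits)    # [0, 1]
print(nested_hits)  # [0]
```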

Elasticsearch Query Filter for Word Count

I am looking for a way to return documents with at most n words in a certain field.
For a result set of documents with fewer than three words in the "name" field, the query could look like the one below, but as far as I know there is nothing like word_count.
Does anyone know how to handle this, maybe in a different way?
GET myindex/myobject/_search
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"word_count": {
"name": {
"lte": 3
}
}
}
]
}
},
"query": {
"match_all" : { }
}
}
}
}
You can use the token_count data type in order to index the number of tokens in a given field and then search on that field.
# 1. create the index/mapping with a token_count field
PUT myindex
{
"mappings": {
"myobject": {
"properties": {
"name": {
"type": "string",
"fields": {
"word_count": {
"type": "token_count",
"analyzer": "standard"
}
}
}
}
}
}
}
# 2. index some documents
PUT myindex/myobject/1
{
"name": "The quick brown fox"
}
PUT myindex/myobject/2
{
"name": "brown fox"
}
# 3. the following query will only return document 2
POST myindex/_search
{
"query": {
"range": {
"name.word_count": { 
"lt": 3
}
}
}
}
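The effect of the word_count sub-field and the range query can be approximated in plain Python. This is only a sketch: count_tokens is a hypothetical stand-in that counts runs of word characters, which is an assumption about how the standard analyzer splits these simple strings.

```python
import re


def count_tokens(text):
    """Rough stand-in for the standard analyzer: count runs of word characters."""
    return len(re.findall(r"\w+", text))


docs = {1: "The quick brown fox", 2: "brown fox"}

# Equivalent of indexing name.word_count for each document.
word_counts = {doc_id: count_tokens(name) for doc_id, name in docs.items()}

# Equivalent of the range query with "lt": 3.
hits = [doc_id for doc_id, count in word_counts.items() if count < 3]
print(word_counts)  # {1: 4, 2: 2}
print(hits)         # [2]
```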

Elasticsearch: Match query not working in nested bool filters

I am able to get data for the following Elasticsearch query:
{
"query": {
"filtered": {
"query": [],
"filter": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"term": {
"gender": "malE"
}
},
{
"term": {
"sentiment": "positive"
}
}
]
}
}
]
}
}
}
}
}
However, if I query using "match", I get an error response with status 400:
{
"query": {
"filtered": {
"query": [],
"filter": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match": {
"gender": "malE"
}
},
{
"term": {
"sentiment": "positive"
}
}
]
}
}
]
}
}
}
}
}
Is the match query not supported in nested bool filters?
Since the term query looks for the exact term in the field’s inverted index and I want to query the gender field case-insensitively, which approach should I try?
Settings of the index:
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"analyzer_keyword": {
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
}
}
}
Mapping for the gender field:
{"type":"string","analyzer":"analyzer_keyword"}
The reason you're getting a 400 error is that there is no match filter, only a match query, whereas term exists as both a query and a filter.
Your query can be as simple as this: there is no need for a filtered query, simply put your term and match queries into a bool/should:
{
"query": {
"bool": {
"should": [
{
"match": {
"gender": "male"
}
},
{
"term": {
"sentiment": "positive"
}
}
]
}
}
}
This answer is for Elasticsearch 7.x. As I understand the question, you would like to use a match query for the gender field and a term query for the sentiment field. The mappings for these fields should look like this:
"sentiment": {
"type": "keyword"
},
"gender": {
"type": "text"
}
The corresponding search request body would be:
{
"query": {
"bool": {
"must": [
{
"terms": {
"sentiment": [
"very positive", "positive"
]
}
},
{
"match": {
"gender": "malE"
}
}
]
}
}
}
This search returns all documents whose sentiment is "very positive" or "positive" and whose gender is "Male", "MALE", "mALe", and so on. Even if the gender field was indexed with the value "mALe", the match query for "gender": "malE" will still retrieve it. The reason is that a match query analyzes its query text with the field's analyzer, and the standard analyzer that text fields use by default lowercases both the indexed value and the query text. That said, it is not hard for a client of the API to pass a lowercase value to the match query in the first place. As for the sentiment field: since it is a keyword field, you can also search for values that contain spaces, such as "very positive".
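The difference between term and match on a lowercased field can be modelled in a few lines of Python. This is a simplified sketch, not Elasticsearch code: analyzer_keyword mimics the custom analyzer from the question (keyword tokenizer plus lowercase filter), and term_query and match_query are hypothetical helpers.

```python
def analyzer_keyword(text):
    """Model of the custom analyzer: the keyword tokenizer keeps the whole
    input as one token, and the lowercase filter lowercases it."""
    return [text.lower()]


def term_query(indexed_tokens, value):
    """A term query compares the value as given, without analysis."""
    return value in indexed_tokens


def match_query(indexed_tokens, text, analyzer):
    """A match query analyzes the text first, then looks up the tokens."""
    return any(token in indexed_tokens for token in analyzer(text))


indexed = analyzer_keyword("Male")  # stored in the index as ["male"]
print(term_query(indexed, "malE"))                     # False
print(match_query(indexed, "malE", analyzer_keyword))  # True
```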
