Elasticsearch: Unable to search with word forms

I am trying to set up Elasticsearch. I created an index and added some records, but I cannot make it return results for word forms (for example, records containing "dreams" when I search for "dream").
My records look like this (index "myindex/movies"):
{
"id": 1,
"title": "What Dreams May Come",
... other fields
}
The configuration I tried to use:
{
"settings": {
"analysis": {
"analyzer": {
"stem": {
"tokenizer": "standard",
"filter": [
"standard",
"lowercase",
"stop",
"porter_stem"
]
}
}
}
},
"mappings": {
"movies": {
"dynamic": true,
"properties": {
"title": {
"type": "string",
"analyzer": "stem"
}
}
}
}
}
And the query looks like this:
{
"query": {
"query_string": {
"query": "Dream"
}
}
}
I can get a result back using the word "dreams" but not "dream".
Am I doing something wrong?
Should I install porter_stem somehow first?

You haven't done anything wrong; you are just searching in the wrong field.
query_string searches the _all field by default, and _all has its own analyzer.
So either apply the same analyzer to _all or point your query at the title field, like below:
{
"query": {
"query_string": {
"query": "dream",
"default_field": "title"
}
}
}
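
To see why the field matters, here is a minimal Python sketch of the two analysis paths. It is only an imitation, not Elasticsearch code: standard_tokens stands in for the default analyzer that _all would use, and crude_stem is a hypothetical placeholder for porter_stem that merely strips a plural "s".

```python
import re

def standard_tokens(text):
    # Rough stand-in for the standard tokenizer plus lowercase filter.
    return [t.lower() for t in re.findall(r"\w+", text)]

def crude_stem(token):
    # Hypothetical placeholder for porter_stem: only strips a plural "s".
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

def stem_tokens(text):
    # Imitates the "stem" analyzer from the question (stop filter left out).
    return [crude_stem(t) for t in standard_tokens(text)]

title = "What Dreams May Come"
all_terms = set(standard_tokens(title))   # what _all would index (no stemming)
title_terms = set(stem_tokens(title))     # what the title field would index

def search(query, index_terms, analyze):
    # A query hits when any analyzed query token is an indexed term.
    return any(t in index_terms for t in analyze(query))

hit_on_all = search("Dream", all_terms, standard_tokens)
hit_on_title = search("Dream", title_terms, stem_tokens)
print(hit_on_all, hit_on_title)
```

In this sketch "Dream" misses on the unstemmed terms but hits on the stemmed ones, while "dreams" still hits on the unstemmed terms, which is the behaviour described in the question.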

Related

Querying an analysed field doesn't work without specifying the analyser in the query

I'm using elasticsearch 7.14 and I want to perform a query using a custom analyzer. This is the index:
PUT /my-index-001
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 0
},
"analysis": {
"analyzer": {
"alphanumeric_only_analyzer": {
"type": "custom",
"tokenizer": "standard",
"char_filter": [
"alphanumeric_only_filter"
],
"filter": [
"lowercase"
]
}
},
"char_filter": {
"alphanumeric_only_filter": {
"type": "pattern_replace",
"pattern": "[^A-Za-z0-9]",
"replacement": ""
}
}
}
},
"mappings": {
"properties": {
"myField": {
"type": "text",
"analyzer": "alphanumeric_only_analyzer",
"search_analyzer": "alphanumeric_only_analyzer"
}
}
}
}
And 2 documents to test the queries:
POST /my-index-001/_doc
{
"myField":"asd-9887"
}
POST /my-index-001/_doc
{
"myField":"asd 9887"
}
Checking the analyzer shows that it works as expected, producing the token "asd9887":
POST my-index-001/_analyze
{
"analyzer": "alphanumeric_only_analyzer",
"text": "aSd 9887"
}
Since everything is there and looks fine, let's start querying:
Query1: This finds both documents:
GET /my-index-001/_search
{
"query": {
"term": {
"myField": "asd9887"
}
}
}
Query2: This doesn't find any document
GET /my-index-001/_search
{
"query": {
"term": {
"myField": "asd 9887"
}
}
}
Query3: This finds both documents, but I had to specify which analyzer to use:
GET /my-index-001/_search
{
"query": {
"match": {
"myField": {
"query": "asd 9887",
"analyzer": "alphanumeric_only_analyzer"
}
}
}
}
Why am I required to do it this way, given that the mapping already sets search_analyzer to alphanumeric_only_analyzer?
Is there a way to make Query2 work as is? I don't want my users to have to know analyzer names, and I want them to find both documents when querying any value that, once analyzed, matches the analyzed document value.
Use a match query instead of a term query.
The term query does not analyze the search term; it only searches for the exact term you provide. So it is looking for "asd 9887" among your tokens.
The match query analyzes the search term with the same analyzer as the field, which produces the same tokens. So "asd 9887" is converted to "asd9887" when searching.
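
The difference can be imitated in a few lines of Python. This is a rough sketch, not Elasticsearch internals: alphanumeric_only_analyzer below mirrors the char filter pattern and lowercase filter from the question, while term_query and match_query are hypothetical helpers that only model "compare as given" versus "analyze first, then compare".

```python
import re

def alphanumeric_only_analyzer(text):
    # Char filter: drop everything that is not a letter or digit.
    stripped = re.sub(r"[^A-Za-z0-9]", "", text)
    # Tokenize what is left and lowercase each token.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", stripped)]

# Index-time analysis of the two test documents.
index_terms = set()
for doc in ["asd-9887", "asd 9887"]:
    index_terms.update(alphanumeric_only_analyzer(doc))

def term_query(value):
    # No analysis: the value must already equal an indexed term.
    return value in index_terms

def match_query(value):
    # The value is analyzed first, then its tokens are looked up.
    return any(t in index_terms for t in alphanumeric_only_analyzer(value))

print(term_query("asd9887"), term_query("asd 9887"), match_query("asd 9887"))
```

Both documents collapse to the single term "asd9887", so only input that is already in that form, or that gets analyzed into it, can match.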

Not able to search in compound query using analyzer

I have a problem index which has multiple fields, e.g. tags (a comma-separated string of tags), author, tester. I am creating a global search where problems can be searched by all these fields at once.
I am using a bool query, e.g.:
{
"query": {
"bool": {
"must": [{
"match": {
"author": "author_username"
}
},
{
"match": {
"tester": "tester_username"
}
},
{
"match": {
"tags": "<tag1,tag2>"
}
}
]
}
}
}
Without an analyzer I am able to get results, but whitespace is used as the separator, e.g. "python 3" is searched as "python" or "3".
I want to search "Python 3" as a single term, so I created an analyzer for tags that treats every comma-separated tag as one token instead of splitting on whitespace.
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "pattern",
"pattern": ","
}
}
}
},
"mappings": {
"properties": {
"tags": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "standard"
}
}
}
}
But now I am not getting any results. Please let me know what I am missing here. I am not able to find the use of analyzer in compound queries in the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/compound-queries.html
Adding an example:
{
"query": {
"bool": {
"must": [{
"match": {
"author": "test1"
}
},
{
"match": {
"tester": "test2"
}
},
{
"match": {
"tags": "test3, abc 4"
}
}
]
}
}
}
Results should match all the fields, but for the tags field there should be a union of tags, and the query should be split on commas, not on spaces. I.e. the query should match "test3" and "abc 4", but the query above searches for "test3", "abc" and "4".
You need to either remove search_analyzer from your mapping or pass my_analyzer in the match query:
GET tags/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"tags": {
"query": "python 3",
"analyzer": "my_analyzer" // overrides the search_analyzer that would be used by default
}
}
}
]
}
}
}
By default, queries will use the analyzer defined in the field mapping, but this can be overridden with the search_analyzer setting.
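
A small Python imitation shows why the mapping from the question returns nothing. It is a sketch under assumptions: my_analyzer models the comma pattern tokenizer as a plain split, standard_analyzer is a rough stand-in for the standard analyzer, and the indexed value "python 3,django" is a made-up sample.

```python
import re

def my_analyzer(text):
    # Imitates a pattern tokenizer with pattern ",": split on commas only,
    # with no lowercasing and no trimming of whitespace.
    return [t for t in re.split(",", text) if t]

def standard_analyzer(text):
    # Rough stand-in for the standard analyzer: word tokens, lowercased.
    return [t.lower() for t in re.findall(r"\w+", text)]

# Index-time tokens for a hypothetical tags value.
indexed = set(my_analyzer("python 3,django"))

def match(query, search_analyzer):
    # A match needs at least one search-time token among the indexed tokens.
    return any(t in indexed for t in search_analyzer(query))

with_standard = match("python 3", standard_analyzer)
with_my_analyzer = match("python 3", my_analyzer)
spaced = my_analyzer("test3, abc 4")
print(with_standard, with_my_analyzer, spaced)
```

With the standard search analyzer the query becomes "python" and "3", neither of which equals the indexed token "python 3". Note also that, if the tokenizer behaves like this split, a space after a comma stays inside the token, so "test3, abc 4" would yield " abc 4" with a leading space.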

Elasticsearch discard documents that contain superset of query

Let's say I have 3 documents:
{ "cities": "Paris Zurich Milan" }
{ "cities": "Paris Zurich" }
{ "cities": "Zurich"}
cities is just text, I'm not using any custom analyzer.
I want to query for documents that have in cities both Paris and Zurich, in this order, and do not have any other city. So I want to get only the second document.
This is what I'm trying so far:
{
"query": {
"match_phrase": {
"cities": "Paris Zurich"
}
}
}
But this returns also the first document.
What should I do instead?
If you do not need case-insensitive matching, just use a term query on the keyword sub-field:
{
"query": {
"term": {
"cities.keyword": "Paris Zurich"
}
}
}
It will only match the exact value of the field.
Alternatively, you can create a custom analyzer that still keeps the whole value of the field as a single term (just like keyword), with one exception: the value is converted to lowercase, so a document containing "Paris Zurich" or "paris Zurich" is indexed as "paris zurich". Here is an example:
{
"settings": {
"analysis": {
"analyzer": {
"lowercase_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"char_filter": [],
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"doc": {
"properties": {
"cities": {
"type": "text",
"fields": {
"lowercased": {
"type": "text",
"analyzer": "lowercase_analyzer"
}
}
}
}
}
}
}
{
"query": {
"term": {
"cities.lowercased": "paris zurich" // Query string should also be in lowercase
}
}
}
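
The contrast between the phrase query and the whole-value queries can be sketched in Python. This is an illustration only: words is a simplified stand-in for the default text analysis, and match_phrase is a hypothetical helper that checks for the phrase as a contiguous run of words.

```python
docs = ["Paris Zurich Milan", "Paris Zurich", "Zurich"]

def words(text):
    # Simplified analysis: lowercase and split on whitespace.
    return text.lower().split()

def match_phrase(doc, phrase):
    # True when the phrase words appear consecutively anywhere in the doc.
    d, p = words(doc), words(phrase)
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))

# Phrase query on the analyzed text field.
phrase_hits = [d for d in docs if match_phrase(d, "Paris Zurich")]

# term query on cities.keyword: the whole value must be identical.
keyword_hits = [d for d in docs if d == "Paris Zurich"]

# term query on cities.lowercased: whole value, compared in lowercase.
lowercased_hits = [d for d in docs if d.lower() == "paris zurich"]

print(phrase_hits, keyword_hits, lowercased_hits)
```

The phrase check accepts any document that contains the phrase, which is why the first document comes back, whereas the two whole-value comparisons keep only the second document.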

elasticsearch aggregation on field containing spaces

I have a field called "CompanyName" whose values contain spaces, for example "ABC Client", "BCD CLIENT 123", "EFG CLIENT HIJ".
When I index the data I set the mapping to "index" : "not_analyzed". When I run an aggregation without any other queries, it appears to work fine.
The issue is that if I first run another query and then aggregate over those results, the aggregation breaks because it splits the company names on spaces. It looks as if the aggregation is run over the output of the first query and not over the field that I set up.
The query:
{
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"Stuff": "1"
}
},
{
"term": {
"filename": "FileOfData.sourcedata"
}
}
]
}
}
}
},
"aggs": {
"users": {
"terms": {
"field": "CompanyName"
}
}
}
}
I have also tried adding a custom analyzer using:
"analysis": {
"analyzer": {
"companynamestring": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
}
And it is still not working. Does anyone know how I can run a query and then get an aggregation that returns only the full CompanyName field and is not tokenized?
Thanks!

Exact match in elastic search query

I want to exactly match the string ":Feed:" in a message field, go back one day, and pull all such records. The JSON I have also seems to match the plain word "feed". I am not sure where I am going wrong. Do I need to add "constant_score" to this query JSON? The JSON I currently have is shown below:
{
"query": {
"bool": {
"must": {
"query_string": {
"fields": ["message"],
"query": "\\:Feed\\:"
}
},
"must": {
"range": {
"timestamp": {
"gte": "now-1d",
"lte": "now"
}
}
}
}
}
}
As stated in "Finding Exact Values", since the field was analyzed when it was indexed, you have no way of exact-matching its tokens (":"). Whenever the tokens should be searchable, the mapping should be "not_analyzed" and the data needs to be re-indexed.
If you want to be able to easily match only ":feed:" inside the message field, you might want to customize an analyzer that doesn't tokenize on ":", so you can query the field with a simple "match" query instead of wildcards.
I was not able to do this with query_string, but I managed to do so by creating a custom normalizer and then using a "match" or "term" query.
The following steps worked for me.
Create a custom normalizer (available since version 5.2):
"settings": {
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom",
"filter": ["lowercase"]
}
}
}
}
Create a mapping with type "keyword"
{
"mappings": {
"default": {
"properties": {
"title": {
"type": "text",
"fields": {
"normalize": {
"type": "keyword",
"normalizer": "my_normalizer"
},
"keyword" : {
"type": "keyword"
}
}
}
}
}
}
Use a match or term query:
{
"query": {
"bool": {
"must": [
{
"match": {
"title.normalize": "string to match"
}
}
]
}
}
}
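
The effect of the steps above can be imitated in Python. This is a sketch under assumptions, not Elasticsearch code: standard_tokens is a rough stand-in for the default text analysis, my_normalizer mirrors the lowercase-only normalizer, and the field is assumed to hold exactly the value ":Feed:".

```python
import re

def standard_tokens(text):
    # Rough stand-in for the standard analyzer: punctuation such as ":"
    # is dropped and tokens are lowercased.
    return [t.lower() for t in re.findall(r"\w+", text)]

def my_normalizer(value):
    # Mirrors the lowercase-only normalizer: the value stays one term.
    return value.lower()

analyzed_terms = standard_tokens(":Feed:")   # what the text field indexes
keyword_term = my_normalizer(":Feed:")       # what the normalized keyword indexes

def text_match(query):
    # Query against the analyzed text field.
    return any(t in analyzed_terms for t in standard_tokens(query))

def keyword_match(query):
    # Query against the normalized keyword sub-field.
    return my_normalizer(query) == keyword_term

print(text_match("feed"), keyword_match("feed"), keyword_match(":FEED:"))
```

On the analyzed field ":Feed:" and "feed" are indistinguishable, while the normalized keyword keeps the colons and ignores case. The trade-off is that a keyword sub-field compares the whole field value, so it does not find ":Feed:" inside a longer message.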
Use match phrase
GET /_search
{
"query": {
"match_phrase": {
"message": "7000-8900"
}
}
}
In Java, use matchPhraseQuery from QueryBuilders:
QueryBuilders.matchPhraseQuery(fieldName, searchText);
A simple solution: use a term query.
GET /_search
{
"query": {
"term": {
"message.keyword": "7000-8900"
}
}
}
Use a term query instead of match_phrase.
match_phrase matches against the analyzed words of the stored text, so it is not an exact match of the whole value; it only requires that sequence of words to appear.
