Elasticsearch aggregation on field containing spaces

I have a field called "CompanyName" whose values contain spaces, for example "ABC Client", "BCD CLIENT 123", "EFG CLIENT HIJ".
When I index the data I set the mapping to "index" : "not_analyzed". When I run an aggregation on its own, without any other query, it appears to work fine.
The problem appears when I first run another query and then aggregate over its results: the aggregation breaks the company names apart at the spaces. It looks as if the aggregation is run over the output of the first query rather than over the field as I mapped it.
The query:
{
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"Stuff": "1"
}
},
{
"term": {
"filename": "FileOfData.sourcedata"
}
}
]
}
}
}
},
"aggs": {
"users": {
"terms": {
"field": "CompanyName"
}
}
}
}
I have also tried adding a custom analyzer using:
"settings": {
"analysis": {
"analyzer": {
"companynamestring": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
}
And it is still not working. Does anyone know how I can run a query and then get an aggregation that returns only the full CompanyName field and is not tokenized?
Thanks!

Related

not able to search in compounding query using analyzer

I have a problem index with multiple fields, e.g. tags (a comma-separated string of tags), author and tester. I am building a global search where problems can be searched by all of these fields at once.
I am using a bool query, e.g.:
{
"query": {
"bool": {
"must": [{
"match": {
"author": "author_username"
}
},
{
"match": {
"tester": "tester_username"
}
},
{
"match": {
"tags": "<tag1,tag2>"
}
}
]
}
}
}
Without a custom analyzer I get results, but whitespace is used as the separator, e.g. "python 3" is searched as "python" or "3".
I want "Python 3" to be searched as a single term, so I created an analyzer for tags that treats each comma-separated tag as one token instead of splitting on whitespace.
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "pattern",
"pattern": ","
}
}
}
},
"mappings": {
"properties": {
"tags": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "standard"
}
}
}
}
But now I am not getting any results. Please let me know what I am missing here. I could not find how analyzers are used in compound queries in the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/compound-queries.html
Adding an example:
{
"query": {
"bool": {
"must": [{
"match": {
"author": "test1"
}
},
{
"match": {
"tester": "test2"
}
},
{
"match": {
"tags": "test3, abc 4"
}
}
]
}
}
}
Results should match all the fields, but for the tags field there should be a union of tags, and the query should be split on commas, not on spaces. That is, the query should match "test3" and "abc 4", but the query above searches for "test3", "abc" and "4".
You need to either remove search_analyzer from your mapping or pass my_analyzer in the match query. Otherwise the search analyzer from the mapping (standard) is applied to the query text by default:
GET tags/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"tags": {
"query": "python 3",
"analyzer": "my_analyzer"
}
}
}
]
}
}
}
By default, queries will use the analyzer defined in the field mapping, but this can be overridden with the search_analyzer setting.
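The mismatch between the index-time and search-time analyzers can be illustrated outside Elasticsearch. The following is a minimal pure-Python sketch, not Elasticsearch code: `comma_tokens` and `standard_like_tokens` are hypothetical stand-ins that only approximate the pattern tokenizer and the standard analyzer, and `matches` imitates a match query with the default "or" operator.

```python
import re

def comma_tokens(text):
    # Hypothetical stand-in for the "pattern" tokenizer with pattern ",":
    # split on commas only, keep everything else (including spaces) intact.
    return [t for t in text.split(",") if t]

def standard_like_tokens(text):
    # Rough approximation of the standard analyzer:
    # split on non-word characters and lowercase each token.
    return [t.lower() for t in re.findall(r"\w+", text)]

def matches(indexed_tokens, query_tokens):
    # A match query with the default "or" operator needs at least one
    # query token to be present among the indexed tokens.
    return any(t in indexed_tokens for t in query_tokens)

# Tokens stored for the tags field at index time (my_analyzer).
indexed = comma_tokens("python 3,java")

# Query text analyzed with the mapping's search_analyzer (standard).
with_standard = standard_like_tokens("python 3")

# Query text analyzed with my_analyzer, as in the answer above.
with_custom = comma_tokens("python 3")

print(matches(indexed, with_standard))  # False
print(matches(indexed, with_custom))    # True
```

With the standard search analyzer the query produces "python" and "3", neither of which equals the stored token "python 3", which would explain the empty result set.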

Elasticsearch filter to match array value of analyzed index

Mapping of the field that I'm trying to build a filter for:
"genres": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
There's an entry with these values:
"genres": [
"Animation",
"History"
],
I am trying to build a filter where I would input "Animation" and it would return all entries that have animation as their genre.
I tried using terms:
GET /test/_search
{
"query": {
"bool": {
"filter": {
"terms": {
"genres": [
"Animation",
"History"
]
}
}
}
}
}
This returned no entries. Reading further, I see that I would need to remap my database and set "index": "not_analyzed"; then it would return some entries.
However, I can get these results without a filter, using something like this:
GET /tmdb/_search
{
"query": {
"bool": {
"must" : [
{
"match": {
"genres": "history"
}
},
{
"match": {
"genres": "animation"
}
}
]
}
}
}
This does give me some results, but it only returns entries that have both "animation" AND "history" as their genre.
So my question: do I need to remap my database and add "index": "not_analyzed" to the columns that I will filter on, or should I go with the second option (not using filters)?
Edit:
I thought something like this would work, but it does not behave as I expected (the "and" operator does not seem to work for me):
GET /test/_search
{
"query": {
"match": {
"genres": {
"query": "animation",
"query": "history",
"operator": "and"
}
}
}
}
Your first query is almost correct. If you query the genres field (i.e. the analyzed one), you should use a match query instead:
POST /test/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"genres": "Animation"
}
},
{
"match": {
"genres": "History"
}
}
]
}
}
}
If you query the genres.keyword field (i.e. not analyzed) then you can use the terms query
POST /test/_search
{
"query": {
"bool": {
"filter": {
"terms": {
"genres.keyword": [
"Animation",
"History"
]
}
}
}
}
}
Note: not_analyzed was used in ES 2.x and earlier; starting with ES 5, using the keyword type is the equivalent.
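Why the terms filter on genres finds nothing while the one on genres.keyword works can be sketched in plain Python. This is an illustrative model under stated assumptions, not Elasticsearch itself: `standard_like_tokens` only approximates the default analyzer of a text field, and `terms_query` is a hypothetical helper that mimics an exact, unanalyzed term lookup.

```python
import re

def standard_like_tokens(text):
    # Rough approximation of the default analyzer of a text field:
    # split on non-word characters and lowercase.
    return [t.lower() for t in re.findall(r"\w+", text)]

genres = ["Animation", "History"]

# text field: every array value is analyzed, so lowercase tokens are stored.
text_terms = {tok for value in genres for tok in standard_like_tokens(value)}

# keyword sub-field: every array value is stored unchanged.
keyword_terms = set(genres)

def terms_query(index_terms, wanted):
    # terms query: the inputs are NOT analyzed; a document matches
    # if any of them equals a stored term exactly.
    return any(w in index_terms for w in wanted)

print(terms_query(text_terms, ["Animation", "History"]))     # False
print(terms_query(keyword_terms, ["Animation", "History"]))  # True
print(terms_query(text_terms, ["animation"]))                # True
```

The last line shows that a terms query against the analyzed field can still hit if the input happens to equal the lowercased token, which is fragile; querying the keyword sub-field avoids depending on that.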

Autocomplete functionality using elastic search

I have an elastic search index with following documents and I want to have an autocomplete functionality over the specified fields:
mapping: https://gist.github.com/anonymous/0609b1d110d91dceb9a90faa76d1d5d4
Usecase:
My queries are typed prefixes, e.g. "sta", "star", "star w" ... "star war", with an additional filter such as tags = "science fiction". The queries could also match other fields such as description or actors (in the cast field; note that this is nested). I also want to know which field was matched.
I investigated two ways of doing this, but neither seems to address the use case above:
1) Suggester autocomplete:
https://www.elastic.co/guide/en/elasticsearch/reference/1.7/search-suggesters-completion.html
With this it seems I have to add another field called "suggest" replicating the data which is not desirable.
2) using a prefix filter/query:
https://www.elastic.co/guide/en/elasticsearch/reference/1.7/query-dsl-prefix-filter.html
This returns the whole document, not the exact matching terms.
Is there a clean way of achieving this? Please advise.
Don't create the mapping separately; insert the data directly into the index and a default mapping will be created for it. Use the query below for autocomplete.
GET /netflix/movie/_search
{
"query": {
"query_string": {
"query": "sta*"
}
}
}
I think the completion suggester would be the cleanest way, but if that is undesirable you could use aggregations on the name field.
This is a sample index (I am assuming from your question that you are using ES 1.7):
PUT netflix
{
"settings": {
"analysis": {
"analyzer": {
"prefix_analyzer": {
"tokenizer": "keyword",
"filter": [
"lowercase",
"trim",
"edge_filter"
]
},
"keyword_analyzer": {
"tokenizer": "keyword",
"filter": [
"lowercase",
"trim"
]
}
},
"filter": {
"edge_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
}
}
}
},
"mappings": {
"movie":{
"properties": {
"name":{
"type": "string",
"fields": {
"prefix":{
"type":"string",
"index_analyzer" : "prefix_analyzer",
"search_analyzer" : "keyword_analyzer"
},
"raw":{
"type": "string",
"analyzer": "keyword_analyzer"
}
}
},
"tags":{
"type": "string", "index": "not_analyzed"
}
}
}
}
}
Using multi-fields, the name field is analyzed in different ways. name.prefix uses the keyword tokenizer with an edge n-gram filter, so that the string "star wars" is broken into s, st, sta, etc. While searching, however, keyword_analyzer is used so that the search query does not get broken into multiple small tokens. name.raw is used for the aggregation.
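The behaviour of the two analyzers can be imitated in a few lines of Python. This is a sketch only: `edge_ngrams`, `prefix_analyzer` and `keyword_analyzer` are illustrative functions that mirror the settings above (keyword tokenizer, lowercase, trim, edge n-grams from 1 to 20 characters); they are not part of any Elasticsearch client.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    # Emit every prefix of the token whose length lies between
    # min_gram and max_gram (capped at the token length).
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]

def prefix_analyzer(text):
    # Index-time chain for name.prefix: the keyword tokenizer keeps the whole
    # input as one token, then lowercase, trim and edge n-grams are applied.
    return edge_ngrams(text.lower().strip())

def keyword_analyzer(text):
    # Search-time chain: one lowercased, trimmed token, no n-grams.
    return [text.lower().strip()]

index_terms = prefix_analyzer("Star Wars")
print(index_terms[:4])  # ['s', 'st', 'sta', 'star']

# A typed prefix matches if its single search token is one of the stored grams.
print(keyword_analyzer("sta")[0] in index_terms)     # True
print(keyword_analyzer("star w")[0] in index_terms)  # True
print(keyword_analyzer("wars")[0] in index_terms)    # False
```

Because the keyword tokenizer never splits the title, only prefixes of the whole name match; a word in the middle of the title, such as "wars", does not.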
The following query will give top 10 suggestions.
GET netflix/movie/_search
{
"query": {
"filtered": {
"filter": {
"term": {
"tags": "sci-fi"
}
},
"query": {
"match": {
"name.prefix": "sta"
}
}
}
},
"size": 0,
"aggs": {
"unique_movie_name": {
"terms": {
"field": "name.raw",
"size": 10
}
}
}
}
Results will be something like
"aggregations": {
"unique_movie_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "star trek",
"doc_count": 1
},
{
"key": "star wars",
"doc_count": 1
}
]
}
}
UPDATE :
You could use highlighting for this purpose, I think. The highlight section gives you the whole word and the field it matched. You can also use inner hits, with highlighting inside them, to get the nested docs.
{
"query": {
"query_string": {
"query": "sta*"
}
},
"_source": false,
"highlight": {
"fields": {
"*": {}
}
}
}

Exact match in elastic search query

I want to exactly match the string ":Feed:" in a message field and pull all such records from the past day. The JSON I have also seems to match the plain word "feed". I am not sure where I am going wrong. Do I need to add "constant_score" to this query JSON? The JSON I currently have is shown below:
{
"query": {
"bool": {
"must": {
"query_string": {
"fields": ["message"],
"query": "\\:Feed\\:"
}
},
"must": {
"range": {
"timestamp": {
"gte": "now-1d",
"lte": "now"
}
}
}
}
}
}
As stated in "Finding Exact Values": since the field was analyzed when it was indexed, you have no way of exact-matching the ":" characters, because they are not kept in its tokens. If they need to be searchable, the mapping should be "not_analyzed" and the data needs to be re-indexed.
If you want to be able to easily match only ":feed:" inside the message field, you might want to customize an analyzer that does not split on ":", so that you can query the field with a simple "match" query instead of wildcards.
I was not able to do this with query_string, but managed it by creating a custom normalizer and then using a "match" or "term" query.
The following steps worked for me.
Create a custom normalizer (available since version 5.2):
"settings": {
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom",
"filter": ["lowercase"]
}
}
}
}
Create a mapping with type "keyword"
{
"mappings": {
"default": {
"properties": {
"title": {
"type": "text",
"fields": {
"normalize": {
"type": "keyword",
"normalizer": "my_normalizer"
},
"keyword" : {
"type": "keyword"
}
}
}
}
}
}
}
Use a match or term query:
{
"query": {
"bool": {
"must": [
{
"match": {
"title.normalize": "string to match"
}
}
]
}
}
}
Use match_phrase:
GET /_search
{
"query": {
"match_phrase": {
"message": "7000-8900"
}
}
}
In Java, use matchPhraseQuery from QueryBuilders:
QueryBuilders.matchPhraseQuery(fieldName, searchText);
A simple solution: use a term query.
GET /_search
{
"query": {
"term": {
"message.keyword": "7000-8900"
}
}
}
Use a term query instead of match_phrase.
match_phrase matches against the words of the sentence stored in the ES document; it does not match the whole value exactly.
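The difference between the two answers above can be sketched in plain Python. This is a rough model, not Elasticsearch code: `standard_like_tokens` is a hypothetical approximation that splits on non-word characters (so it is assumed, as this regex does, that "7000-8900" becomes two tokens), `match_phrase` checks for the query tokens in consecutive positions, and `term_on_keyword` compares the whole stored value.

```python
import re

def standard_like_tokens(text):
    # Rough approximation of an analyzed text field:
    # split on non-word characters and lowercase.
    return [t.lower() for t in re.findall(r"\w+", text)]

def match_phrase(doc, phrase):
    # Phrase matching: the query tokens must appear in the document
    # in the same order and in adjacent positions.
    doc_toks = standard_like_tokens(doc)
    q = standard_like_tokens(phrase)
    return any(doc_toks[i:i + len(q)] == q
               for i in range(len(doc_toks) - len(q) + 1))

def term_on_keyword(doc, value):
    # term query on a keyword sub-field: the whole stored value must be equal.
    return doc == value

doc = "ports 7000 8900 open"

print(match_phrase(doc, "7000-8900"))             # True
print(term_on_keyword(doc, "7000-8900"))          # False
print(term_on_keyword("7000-8900", "7000-8900"))  # True
```

In this model the phrase query also accepts a message in which the two numbers are separated by a space rather than a hyphen, whereas the term query on the keyword value only accepts the identical string.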

Elasticsearch: Unable to search with wordforms

I am trying to setup Elasticsearch, created index, added some records but can not make it return results with word forms (for example: records with substring "dreams" when I search for "dream").
My records look like this (index "myindex/movies"):
{
"id": 1,
"title": "What Dreams May Come",
... other fields
}
The configuration I tried to use:
{
"settings": {
"analysis": {
"analyzer": {
"stem": {
"tokenizer": "standard",
"filter": [
"standard",
"lowercase",
"stop",
"porter_stem"
]
}
}
}
},
"mappings": {
"movies": {
"dynamic": true,
"properties": {
"title": {
"type": "string",
"analyzer": "stem"
}
}
}
}
}
And the query looks like this:
{
"query": {
"query_string": {
"query": "Dream"
}
}
}
I get results back when I search for the word "dreams" but not for "dream".
Am I doing something wrong?
Should I install porter_stem somehow first?
You haven't done anything wrong; you are just searching in the wrong field.
query_string searches the _all field by default, and _all has its own analyzer.
So you need to either apply the same analyzer to _all or point your query at the title field, as below:
{
"query": {
"query_string": {
"query": "dream",
"default_field": "title"
}
}
}
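The effect described in this answer can be imitated with a toy model. Everything here is an assumption made for illustration: `toy_stem` merely strips a trailing "s" and is NOT the Porter algorithm, `standard_like_tokens` only approximates the default analyzer, and the stop filter from the question's settings is left out for brevity.

```python
import re

def standard_like_tokens(text):
    # Rough approximation of the default analyzer (used here for _all):
    # split on non-word characters and lowercase.
    return [t.lower() for t in re.findall(r"\w+", text)]

def toy_stem(token):
    # NOT the Porter stemmer: strips one trailing "s" purely for illustration.
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

def stem_analyzer(text):
    # Stand-in for the custom "stem" analyzer applied to the title field.
    return [toy_stem(t) for t in standard_like_tokens(text)]

title = "What Dreams May Come"
title_terms = set(stem_analyzer(title))       # stored for title
all_terms = set(standard_like_tokens(title))  # stored for _all

def search(index_terms, analyzer, query):
    # The query text is analyzed with the analyzer of the field being searched.
    return any(t in index_terms for t in analyzer(query))

print(search(all_terms, standard_like_tokens, "Dream"))   # False
print(search(all_terms, standard_like_tokens, "Dreams"))  # True
print(search(title_terms, stem_analyzer, "Dream"))        # True
```

In this model _all holds the unstemmed token "dreams", so only the exact word form matches there, while the title field holds the stemmed form and matches either spelling.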
