I have an index whose documents have 3 fields: name, summary and tags.
name is a short text field containing a small phrase, e.g. "Japanese Handmade Sword".
summary is a long text field describing a product; it may be more than 200 words.
tags is an array of keyword strings, e.g. ["Japanese", "Antiquity", "Weapon", "Katana"].
I need to combine these fields into one search query to get the desired results. For example, when a user searches for "Japan", this item should be returned. However, a match query always gives me an empty result, although I have data and can see all documents when searching without a query.
Here are my mapping and index settings, which perform some tokenization for the fields.
PUT lessons
{
"settings": {
"index": {
"number_of_shards": 1
},
"refresh_interval": "5s",
"similarity": {
"string_similarity": {
"type": "BM25"
}
},
"analysis": {
"analyzer": {
"autocomplete": {
"filter": [
"lowercase"
],
"tokenizer": "standard"
},
"autocomplete_search": {
"type": "custom",
"filter": "lowercase",
"tokenizer": "standard"
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "autocomplete_search",
"fielddata": true,
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"summary": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "autocomplete_search",
"fielddata": true,
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"tags": {
"type": "text",
"search_analyzer": "autocomplete_search",
"fielddata": true,
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
I am using Kibana, and when I run the query below I get no results:
GET lessons/_search
{
"query": {
"match": {
"summary": "Japan"
}
}
}
What is wrong with my index settings or mapping?
You can use a multi_match query to search multiple fields with the same query text:
{
"query": {
"multi_match" : {
"query": "Japan",
"fields": [ "summary", "tags", "name" ]
}
}
}
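Note that a multi_match alone will not make "Japan" match "Japanese": both autocomplete analyzers in the question only lowercase standard tokens, so the indexed terms are japanese, handmade and sword, and the term japan never exists in the index. One possible fix, assuming prefix matching is the intent, is to add an edge_ngram token filter to the index-time analyzer (the filter name autocomplete_filter and the gram sizes here are illustrative, not from the question):

```json
"analysis": {
  "filter": {
    "autocomplete_filter": {
      "type": "edge_ngram",
      "min_gram": 2,
      "max_gram": 20
    }
  },
  "analyzer": {
    "autocomplete": {
      "tokenizer": "standard",
      "filter": [ "lowercase", "autocomplete_filter" ]
    },
    "autocomplete_search": {
      "type": "custom",
      "filter": "lowercase",
      "tokenizer": "standard"
    }
  }
}
```

You can verify which terms a field actually emits with GET lessons/_analyze, passing the analyzer name and a sample text.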
I have an Elasticsearch project whose aggregation and filter were working correctly before I added a synonym analyzer to the mapping.
Current working mapping:
"settings": {
"analysis": {
"normalizer": {
"lowercase": {
"type": "custom",
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"doc": {
"dynamic": "false",
"properties": {
"primarytrades": {
"type": "nested",
"properties" :{
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256,
"normalizer": "lowercase"
}
}
}
}
}
}
}
}
This is the request and the response with the expected bucketed values:
Request:
{"aggs":{"filter_trades":{"aggs":{"nested_trades":{"aggs":{"autocomplete_trades":{"terms":{"field":"primarytrades.name.keyword","include":".*p.*l.*u.*m.b.","size":10}}},"nested":{"path":"primarytrades"}}},"filter":{"nested":{"path":"primarytrades","query":{"bool":{"should":[{"match":{"primarytrades.name":{"fuzziness":2,"query":"plumb"}}},{"match_phrase_prefix":{"primarytrades.name":{"query":"plumb"}}}]}}}}}},"query":{"bool":{"filter":[{"nested":{"path":"primarytrades","query":{"bool":{"should":[{"match":{"primarytrades.name":{"fuzziness":2,"query":"plumb"}}},{"match_phrase_prefix":{"primarytrades.name":{"query":"plumb"}}}]}}}}]}},"size":0}
Response:
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":7216,"max_score":0.0,"hits":[]},"aggregations":{"filter#filter_trades":{"doc_count":7216,"nested#nested_trades":{"doc_count":48496,"sterms#autocomplete_trades":{"doc_count_error_upper_bound":0,"sum_other_doc_count":0,"buckets":[{"key":"plumbing","doc_count":7192},{"key":"plumbing parts","doc_count":179}]}}}}}
To add a synonym search feature, I changed the mapping to use a synonym analyzer, like this:
"settings": {
"analysis": {
"normalizer": {
"lowercase": {
"type": "custom",
"filter": [ "lowercase" ]
}
},
"analyzer": {
"synonym_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [ "lowercase", "my_synonyms" ]
}
},
"filter": {
"my_synonyms": {
"type": "synonym",
"synonyms": [ "piping, sink, plumbing" ],
"updateable": true
}
}
}
},
"mappings": {
"doc": {
"dynamic": "false",
"properties": {
"primarytrades": {
"type": "nested",
"properties" :{
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
},
"analyzed": {
"type": "text",
"analyzer": "standard",
"search_analyzer": "synonym_analyzer"
}
}
}
}
}
}
}
}
I also changed my query to use search_analyzer, as below:
{"aggs":{"filter_trades":{"aggs":{"nested_trades":{"aggs":{"autocomplete_trades":{"match":{"field":"primarytrades.name.analyzed","include":".*p.*l.*u.*m.b.","size":10}}},"nested":{"path":"primarytrades"}}},"filter":{"nested":{"path":"primarytrades","query":{"bool":{"should":[{"match":{"primarytrades.name":{"fuzziness":2,"query":"plumb","search_analyzer":"synonym_analyzer"}}},{"match_phrase_prefix":{"primarytrades.name":{"query":"plumb","search_analyzer":"synonym_analyzer"}}}]}}}}}},"query":{"bool":{"filter":[{"nested":{"path":"primarytrades","query":{"bool":{"should":[{"match":{"primarytrades.name":{"fuzziness":2,"query":"plumb","search_analyzer":"synonym_analyzer"}}},{"match_phrase_prefix":{"primarytrades.name":{"query":"plumb","search_analyzer":"synonym_analyzer"}}}]}}}}]}}}
I am getting this error:
"type": "named_object_not_found_exception",
"reason": "[8:24] unable to parse BaseAggregationBuilder with name [match]: parser not found"
Can someone help me correct the query?
Thanks in advance!
In your match queries, you need to specify analyzer, not search_analyzer. search_analyzer is only a valid keyword in the mapping section.
{
"match": {
"primarytrades.name": {
"fuzziness": 2,
"query": "plumb",
"analyzer": "synonym_analyzer" <--- change this
}
}
},
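Separately, the exception itself ("unable to parse BaseAggregationBuilder with name [match]") comes from the aggregation section: match is a query type, not an aggregation, so autocomplete_trades must remain a terms aggregation as in the original working request, for example:

```json
"autocomplete_trades": {
  "terms": {
    "field": "primarytrades.name.keyword",
    "include": ".*p.*l.*u.*m.b.",
    "size": 10
  }
}
```

A terms aggregation needs a keyword field; the analyzed sub-field is for full-text matching, not for aggregating.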
I am currently using Elasticsearch's phonetic analyzer. I want the query to give a higher score to exact matches than to phonetic ones. Here is the query I am using:
{
"query": {
"multi_match" : {
"query" : "Abhijeet",
"fields" : ["content", "title"]
}
},
"size": 10,
"_source": [ "title", "bench", "court", "id_" ],
"highlight": {
"fields" : {
"title" : {},
"content":{}
}
}
}
When I search for Abhijeet, the top results are Abhijit, and Abhijeet only comes later. I want the exact matches to always appear first, followed by the phonetic ones. Can this be done?
Edit:
Mappings
{
"courts_2": {
"mappings": {
"properties": {
"author": {
"type": "text",
"analyzer": "my_analyzer"
},
"bench": {
"type": "text",
"analyzer": "my_analyzer"
},
"citation": {
"type": "text"
},
"content": {
"type": "text",
"analyzer": "my_analyzer"
},
"court": {
"type": "text"
},
"date": {
"type": "text"
},
"id_": {
"type": "text"
},
"title": {
"type": "text",
"analyzer": "my_analyzer"
},
"verdict": {
"type": "text"
}
}
}
}
}
Here is the code I used to set up the phonetic analyzer:
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"my_metaphone"
]
}
},
"filter": {
"my_metaphone": {
"type": "phonetic",
"encoder": "metaphone",
"replace": true
}
}
}
}
},
"mappings": {
"properties": {
"author": {
"type": "text",
"analyzer": "my_analyzer"
},
"bench": {
"type": "text",
"analyzer": "my_analyzer"
},
"citation": {
"type": "text"
},
"content": {
"type": "text",
"analyzer": "my_analyzer"
},
"court": {
"type": "text"
},
"date": {
"type": "text"
},
"id_": {
"type": "text"
},
"title": {
"type": "text",
"analyzer": "my_analyzer"
},
"verdict": {
"type": "text"
}
}
}
}
Now, I want to query only the title and the content field. Here, I want the exact matches to appear first and then the phonetic ones.
The general solution approach is:
to use a bool-query,
with your phonetic query/queries in the must clause,
and the non-phonetic query/queries in the should clause
I can update the answer if you include the mappings and settings of your index to your question.
Update: Solution Approach
A. Expand your mapping to use multi-fields for title and content:
"title": {
"type": "text",
"analyzer": "my_analyzer",
"fields" : {
"standard" : {
"type" : "text"
}
}
},
...
"content": {
"type": "text",
"analyzer": "my_analyzer",
"fields" : {
"standard" : {
"type" : "text"
}
}
},
B. Get the fields populated (e.g. by re-indexing everything):
POST courts_2/_update_by_query
C. Adjust your query to leverage the newly introduced fields:
GET courts_2/_search
{
"_source": ["title","bench","court","id_"],
"size": 10,
"query": {
"bool": {
"must": {
"multi_match": {
"query": "Abhijeet",
"fields": ["title", "content"]
}
},
"should": {
"multi_match": {
"query": "Abhijeet",
"fields": ["title.standard", "content.standard"]
}
}
}
},
"highlight": {
"fields": {
"title": {},
"content": {}
}
}
}
I want a simple pie chart based on my index. However, the fields in the result seem to be embedded within the _source field, which cannot be used in a terms aggregation in Kibana.
Sample Result is shown below:
Now if I disable the _source field in the mapping:
I don't get any of the fields:
However, the Kibana Discover page is listing the available fields, which are never returned by the ES results - when _source was enabled.
The Index Mapping is as shown below:
{
"settings": {
"analysis": {
"filter": {
"filter_stemmer": {
"type": "stemmer",
"language": "english"
}
},
"analyzer": {
"tags_analyzer": {
"type": "custom",
"filter": [
"standard",
"lowercase",
"filter_stemmer"
],
"tokenizer": "standard"
}
}
}
},
"mappings": {
"schemav1": {
"properties": {
"user_id": {
"type": "text"
},
"technician_query": {
"analyzer": "tags_analyzer",
"type": "text"
},
"staffer_queries": {
"analyzer": "tags_analyzer",
"type": "text"
},
"status":{
"type":"text"
}
}
}
}
}
OK, the reason is simple: in order for your fields to be usable in aggregations, you need a keyword version of them. You cannot aggregate on text fields.
Transform your mapping to this:
"mappings": {
"schemav1": {
"properties": {
"user_id": {
"type": "keyword"
},
"technician_query": {
"analyzer": "tags_analyzer",
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"staffer_queries": {
"analyzer": "tags_analyzer",
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"status":{
"type":"keyword"
}
}
}
}
So user_id and status are now keyword fields, and technician_query.raw and staffer_queries.raw are also keyword fields, which you can use in terms aggregations, and hence in pie charts as well.
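After reindexing with that mapping, a terms aggregation feeding a pie chart could be sketched like this (the index name my_index and the aggregation names are placeholders):

```json
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "by_status": {
      "terms": { "field": "status" }
    },
    "top_technician_queries": {
      "terms": { "field": "technician_query.raw", "size": 10 }
    }
  }
}
```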
I used the following mapping.
I have modified the english analyzer to use an ngram analyzer as follows, so that I am able to search under the following scenarios:
1] partial search and special character search
2] to get the advantage of language analyzers
{
"settings": {
"analysis": {
"analyzer": {
"english_ngram": {
"type": "custom",
"filter": [
"english_possessive_stemmer",
"lowercase",
"english_stop",
"english_stemmer",
"ngram_filter"
],
"tokenizer": "whitespace"
}
},
"filter": {
"english_stop": {
"type": "stop"
},
"english_stemmer": {
"type": "stemmer",
"language": "english"
},
"english_possessive_stemmer": {
"type": "stemmer",
"language": "possessive_english"
},
"ngram_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 25
}
}
}
},
"mappings": {
"movie": {
"properties": {
"title": {
"type": "string",
"fields": {
"en": {
"type": "string",
"analyzer": "english_ngram"
}
}
}
}
}
}
}
Indexed my data as follows:
PUT http://localhost:9200/movies/movie/1
{
"title" : "$peci#l movie"
}
Query as follows:
{
"query": {
"multi_match": {
"query": "$peci#44 m11ov",
"fields": ["title.en"],
"operator":"and",
"type": "most_fields",
"minimum_should_match": "75%"
}
}
}
In the query I am searching for the string "$peci#44 m11ov"; ideally, I should not get any results for this.
Is anything wrong here?
This is a result of ngram tokenization. When you tokenize the string $peci#l movie, your analyzer produces tokens like $, $p, $pe, etc. Your query also produces most of these tokens, though such matches will have a lower score than a complete match. If it's critical for you to exclude these false-positive matches, you can try setting a threshold using the min_score option: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-min-score.html
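For example, the original query could be wrapped with min_score (the threshold value 5 is purely illustrative and would need tuning against the scores you actually observe):

```json
{
  "min_score": 5,
  "query": {
    "multi_match": {
      "query": "$peci#44 m11ov",
      "fields": [ "title.en" ],
      "operator": "and",
      "type": "most_fields",
      "minimum_should_match": "75%"
    }
  }
}
```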
I've got an Elasticsearch v5 index set up for mapping config hashes to URLs.
{
"settings": {
"analysis": {
"analyzer": {
"url-analyzer": {
"type": "custom",
"tokenizer": "url-tokenizer"
}
},
"tokenizer": {
"url-tokenizer": {
"type": "path_hierarchy",
"delimiter": "/"
}
}
}
},
"mappings": {
"route": {
"properties": {
"uri": {
"type": "string",
"index": "analyzed",
"analyzer": "url-analyzer"
},
"config": {
"type": "object"
}}}}}
I would like to match the longest path prefix with the highest score, so that given the documents
{ "uri": "/trousers/", "config": { "foo": 1 }}
{ "uri": "/trousers/grey", "config": { "foo": 2 }}
{ "uri": "/trousers/grey/lengthy", "config": { "foo": 3 }}
when I search for /trousers, the top result should be /trousers, and when I search for /trousers/grey/short the top result should be /trousers/grey.
Instead, I'm finding that the top result for /trousers is /trousers/grey/lengthy.
How can I index and query my documents to achieve this?
I have one solution, after drinking on it: what if we treat the URI in the index as a keyword, but still use the PathHierarchyTokenizer on the search input?
Now we store the following docs:
/trousers
/trousers/grey
/trousers/grey/lengthy
When we submit a query for /trousers/grey/short, the search_analyzer can build the input [trousers, trousers/grey, trousers/grey/short].
The first two of our documents will match, and we can trivially select the longest match using a custom sort.
Now our mapping document looks like this:
{
"settings": {
"analysis": {
"analyzer": {
"uri-analyzer": {
"type": "custom",
"tokenizer": "keyword"
},
"uri-query": {
"type": "custom",
"tokenizer": "uri-tokenizer"
}
},
"tokenizer": {
"uri-tokenizer": {
"type": "path_hierarchy",
"delimiter": "/"
}
}
}},
"mappings": {
"route": {
"properties": {
"uri": {
"type": "text",
"fielddata": true,
"analyzer": "uri-analyzer",
"search_analyzer": "uri-query"
},
"config": {
"type": "object"
}
}
}
}
}
and our query looks like this:
{
"sort": {
"_script": {
"script": "doc.uri.length",
"order": "asc",
"type": "number"
}
},
"query": {
"match": {
"uri": {
"query": "/trousers/grey/lengthy",
"type": "boolean"
}
}
}
}
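As a sanity check, the _analyze API can confirm that the two analyzers behave as described (the index name my_routes is a placeholder):

```json
GET my_routes/_analyze
{
  "analyzer": "uri-query",
  "text": "/trousers/grey/short"
}
```

With the path_hierarchy tokenizer this should return the tokens /trousers, /trousers/grey and /trousers/grey/short, while uri-analyzer (keyword tokenizer) keeps each stored URI as a single token, so only the stored prefixes of the query can match.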