How can I do a nested search in Elasticsearch? - elasticsearch

I am having trouble constructing a search query in ES 7.4.
Here is my mapping:
[
'settings' => [
'number_of_shards' => 1,
'number_of_replicas' => 1,
'analysis' => [
'filter' => [
'filter_stemmer' => [
'type' => 'stemmer',
'language' => 'english'
]
],
'analyzer' => [
'g_analyzer' => [
'type' => 'custom',
'filter' => ['lowercase', 'stemmer'],
'tokenizer' => 'standard'
],
"no_stopwords" => [
"type" => "standard",
"stopwords" => []
],
]
]
],
'mappings' => [
'_source' => [
'enabled' => true
],
'properties' => [
'id' => [
'type' => 'integer'
],
'title' => [
'type' => 'text',
"analyzer" => "g_analyzer",
],
'description' => [
'type' => 'text',
"analyzer" => "g_analyzer",
],
'jobStatus' => [
'type' => 'text'
],
'videoId' => [
'type' => 'text',
],
'thumbnail' => [
'type' => 'text'
],
'playlistId' => [
'type' => 'text'
],
'channelId' => [
'type' => 'text'
],
'publishedDate' => [
"type" => "date",
],
'created_at' => [ // date the video was created
"type" => "date",
],
'updated_at' => [ //date video was updated
"type" => "date",
],
'url' => [
'type' => 'text'
],
'subtitles' => [
'type' => 'nested',
'properties' => [
'id' => [
'type' => 'integer'
],
'start_time' => [
'type' => 'float'
],
'end_time' => [
'type' => 'float'
],
'text' => [
'type' => 'text',
"analyzer" => "g_analyzer",
],
'langcode' => [
'type' => 'text'
],
]
]
]
]
];
What query do I need to search for the text "bill gates" in the subtitles and return the subtitle in which "bill gates" was found, as well as the subtitles immediately before and after the hit?

I don't have your sample documents or expected results, so I can't try this locally and give you a complete query. But since you are using the nested datatype, you need to use nested queries.
Nested queries are how you query a nested datatype, and the official documentation has some examples. See if you can follow them and post what you tried; we can help from there.

I figured out how to do the nested query:
$body = [
'query' => [
'nested' => [
'inner_hits'=>[
'size'=>3
],
'path' => 'subtitles',
'query' => [
'bool' => [
'must'=>[
[
'match'=>[ 'subtitles.text'=>$searchTerm ]
]
]
]
]
]
],
];
Doing this adds an inner_hits section to each hit, containing up to three of the subtitles in which the search term was actually found.
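The question also asked for the subtitle before and after each hit. One way to get them, sketched here under assumptions, is to read the `_nested.offset` that Elasticsearch attaches to every inner hit and slice the parent document's `_source.subtitles` around it. The helper name `subtitlesWithContext`, the sample response, and the `abc` video ID are made up for illustration; the sketch also assumes the full `_source` is returned and that the inner hits use their default name, which is the nested path.

```php
<?php
// Hypothetical helper: returns each matching subtitle together with its neighbours.
function subtitlesWithContext(array $response, int $context = 1): array
{
    $results = [];
    foreach ($response['hits']['hits'] as $hit) {
        $all = $hit['_source']['subtitles'] ?? [];
        $innerHits = $hit['inner_hits']['subtitles']['hits']['hits'] ?? [];
        foreach ($innerHits as $inner) {
            // _nested.offset is the position of the matched object in _source.subtitles.
            $offset = $inner['_nested']['offset'];
            $start = max(0, $offset - $context);
            $length = $offset - $start + 1 + $context; // array_slice stops at the end of the array
            $results[] = [
                'videoId' => $hit['_source']['videoId'] ?? null,
                'match' => $inner['_source'],
                'window' => array_slice($all, $start, $length),
            ];
        }
    }
    return $results;
}

// Hand-written stand-in for a search response, trimmed to the fields used above.
$response = [
    'hits' => ['hits' => [[
        '_source' => [
            'videoId' => 'abc',
            'subtitles' => [
                ['id' => 1, 'text' => 'welcome back'],
                ['id' => 2, 'text' => 'today we talk about bill gates'],
                ['id' => 3, 'text' => 'and his foundation'],
            ],
        ],
        'inner_hits' => ['subtitles' => ['hits' => ['hits' => [[
            '_nested' => ['field' => 'subtitles', 'offset' => 1],
            '_source' => ['id' => 2, 'text' => 'today we talk about bill gates'],
        ]]]]],
    ]]],
];

$found = subtitlesWithContext($response);
```

This only works if the subtitles are stored in playback order in the indexed document; if they are not, sorting by `start_time` first would be needed.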

Related

elasticsearch 7, boost by integer value

I'm trying to boost a search by the "created" field (an integer / timestamp) but always run into
"{"error":{"root_cause":[{"type":"parsing_exception","reason":"Unknown key for a START_OBJECT in [script].","line":1,"col":181}],"type":"parsing_exception","reason":"Unknown key for a START_OBJECT in [script].","line":1,"col":181},"status":400}"
Without the 'script' the query works fine. But I'm running out of ideas how to write this script correctly. Any ideas?
return [
'index' => 'articles_' . $this->system,
'body' => [
'size' => $this->size,
'from' => $this->start,
'sort' => [
$this->order => 'desc',
],
'query' => [
'query_string' => [
'query' => $this->term,
'fields' => ['title^5', 'caption^3', 'teaser^2', 'content'],
'analyze_wildcard' => true,
],
'script' => [
'script' => [
'lang' => 'painless',
'source' => "doc['#created'].value / 100000",
],
],
],
],
];
EDIT: Updated query, but still running into "{"error":{"root_cause":[{"type":"parsing_exception","reason":"[query_string] malformed query, expected [END_OBJECT] but found [FIELD_NAME]","line":1,"col":171}],"type":"parsing_exception","reason":"[query_string] malformed query, expected [END_OBJECT] but found [FIELD_NAME]","line":1,"col":171},"status":400}"
script is not a standalone attribute of query. It should be part of a bool query: when you have multiple clauses, they go under must, should, or filter inside bool.
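'body' => [
    'size' => $this->size,
    'from' => $this->start,
    'sort' => [
        $this->order => 'desc'
    ],
    'query' => [
        'bool' => [
            // Each clause is its own array element, so must encodes to a JSON array.
            'must' => [
                [
                    'query_string' => [
                        'query' => $this->term,
                        'fields' => ['title^5', 'caption^3', 'teaser^2', 'content'],
                        'analyze_wildcard' => true
                    ]
                ],
                [
                    // A script query acts as a filter, so its script must return a boolean;
                    // this expression is copied from the question and would need adapting.
                    'script' => [
                        'script' => [
                            'lang' => 'painless',
                            'source' => "doc['#created'].value / 100000"
                        ]
                    ]
                ]
            ]
        ]
    ]
]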
The brackets above may contain syntax errors (I couldn't test it), but the query structure is correct.
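The "expected [END_OBJECT] but found [FIELD_NAME]" error is what you would expect when two clauses end up as keys of one JSON object. A minimal sketch of how PHP encodes the two shapes; the query text `foo` and the script source `true` are placeholders, not taken from the question:

```php
<?php
// Two clauses in one associative array: encodes to a single JSON object with two keys.
$wrong = ['must' => [
    'query_string' => ['query' => 'foo'],
    'script' => ['script' => ['source' => 'true']],
]];

// One array element per clause: encodes to a JSON array of single-key objects.
$right = ['must' => [
    ['query_string' => ['query' => 'foo']],
    ['script' => ['script' => ['source' => 'true']]],
]];

$wrongJson = json_encode($wrong);
$rightJson = json_encode($right);

echo $wrongJson, PHP_EOL;
echo $rightJson, PHP_EOL;
```

Only the second form gives Elasticsearch a list of separate clauses under must.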
...
'query' => [
'function_score' => [
'query' => [
'query_string' => [
'query' => $this->term,
'fields' => ['title^10', 'caption^8', 'teaser^5', 'content'],
'analyze_wildcard' => true,
],
],
'script_score' => [
'script' => [
'lang' => 'expression',
'source' => "_score + (doc['created'] / 10000000000000)",
],
],
],
],
This was my solution in the end. Sadly, I only found it in the Elasticsearch documentation later. Note that you really have to divide the timestamp by a large number so that it doesn't completely overpower the best matches.
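To illustrate why the divisor matters, here is a rough calculation in plain PHP that mirrors the `_score + created / divisor` formula. The scores and timestamps are invented, and `created` is assumed to be in epoch milliseconds, which the question does not state.

```php
<?php
// Mirrors the script above: _score + created / divisor.
function boostedScore(float $score, float $created, float $divisor): float
{
    return $score + $created / $divisor;
}

// Made-up example values; `created` is assumed to be epoch milliseconds.
$older = ['score' => 8.0, 'created' => 1500000000000]; // strong match, older
$newer = ['score' => 2.0, 'created' => 1600000000000]; // weak match, newer

// Small divisor (as in the first attempt): the timestamp swamps relevance.
$olderSmall = boostedScore($older['score'], $older['created'], 100000);
$newerSmall = boostedScore($newer['score'], $newer['created'], 100000);

// Large divisor (as in the final query): recency only nudges the score.
$olderLarge = boostedScore($older['score'], $older['created'], 10000000000000);
$newerLarge = boostedScore($newer['score'], $newer['created'], 10000000000000);

var_dump($newerSmall > $olderSmall); // bool(true): the weak but newer match wins
var_dump($olderLarge > $newerLarge); // bool(true): the strong match wins again
```

With the small divisor the boost runs into the millions and relevance no longer matters; with the large one it stays well below 1, so it mostly breaks ties between similar matches.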

How to include a mapped field's subfield in results in Elasticsearch?

For aggregation, I have a raw value of my field, but I can't access this value in my query. For example, I have the brand Tommy Hilfiger, and its raw value tommy-hilfiger is stored as brand.keyword. How can I include this value in the search results?
'body' => [
'settings' => [
'analysis' => [
'filter' => [
'remove_spaces_inside' => [
'type' => 'pattern_replace',
'pattern' => '\\s+',
'replacement' => ' '
],
'convert_spaces' => [
'type' => 'pattern_replace',
'pattern' => '\\s+',
'replacement' => '-'
],
],
'char_filter' => [
'convert_amp' => [
'type' => 'pattern_replace',
'pattern' => '&',
'replacement' => 'and'
]
],
'analyzer' => [
'slug' => [
'char_filter' => ['convert_amp'],
'tokenizer' => 'keyword',
'filter' => ['trim', 'lowercase', 'asciifolding', 'remove_spaces_inside', 'convert_spaces']
],
'format' => [
'char_filter' => ['convert_amp'],
'tokenizer' => 'keyword',
'filter' => ['trim', 'remove_spaces_inside']
]
]
]
],
'mappings' => [
'my_type' => [
'properties' => [
'brand' => [
'type' => 'string',
'fields' => [
'keyword' => [
'type' => 'string',
'analyzer' => 'slug',
'index_options' => 'docs',
]
]
]
]
]
]
]
Update:
In my case, I store the brand in two fields: the default "Tommy Hilfiger" for full-text search, and a formatted keyword (slug) "tommy-hilfiger" for exact search. I can aggregate data by slug, but I can't get this field back from my query. For example, this query returns all records with the brand Tommy Hilfiger, but only with the default values, not the slug.
'body' => [
'_source' => [
'brand',
'brand.keyword'
],
'query' => [
'bool' => [
'must' => [
[
'terms' => [
'brand.keyword' => [
'tommy-hilfiger',
]
]
]
]
]
]
]

Multi match and highlighting in elasticsearch

When I match a single field in the query, highlighting works fine in Elasticsearch.
For example, when I use:
$params = [
'index' => 'my_index',
'type' => 'articles',
'body' => [
'from' => '0',
'size' => '10',
'query' => [
'bool' => [
'must' => [
'match' => [ 'content' => 'what I want to search' ]
]
]
],
'highlight' => [
'pre_tags' => ['<mark>'],
'post_tags' => ['</mark>'],
'fields' => [
'content' => [ 'fragment_size' => 150, 'number_of_fragments' => 3 ]
]
],
]
];
everything works. But when I try to match multiple fields, the search works correctly but the highlighting disappears:
'match' => [ 'content' => 'what I want to search' ],
'match' => [ 'type' => 1 ]
Do you know how to get highlighting to work when I want to search two different fields with two different queries?
Try this:
$params = [
    'index' => 'my_index',
    'type' => 'articles',
    'body' => [
        'from' => '0',
        'size' => '10',
        'query' => [
            'bool' => [
                'must' => [
                    'match' => [ 'content' => 'what I want to search' ]
                ],
                // The filter belongs inside bool and needs a query type such as term.
                'filter' => [
                    'term' => [ 'type' => 1 ]
                ]
            ]
        ],
        'highlight' => [
            'pre_tags' => ['<mark>'],
            'post_tags' => ['</mark>'],
            'fields' => [
                'content' => [ 'fragment_size' => 150, 'number_of_fragments' => 3 ]
            ]
        ],
    ]
];
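A possible explanation for the lost highlighting in the two-`match` attempt: a PHP array cannot hold the same string key twice, so the second `'match'` silently replaces the first and the `content` clause never reaches Elasticsearch. A small sketch that shows the effect, using the values from the question:

```php
<?php
// Duplicate string keys: the later entry overwrites the earlier one.
$must = [
    'match' => ['content' => 'what I want to search'],
    'match' => ['type' => 1],
];

// One array element per clause keeps both.
$mustList = [
    ['match' => ['content' => 'what I want to search']],
    ['match' => ['type' => 1]],
];

var_dump(count($must));     // int(1)
var_dump(count($mustList)); // int(2)
```

If only the `type` clause survives, nothing in the query targets `content`, which would leave the highlighter with nothing to mark there.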

Conversion of Elasticsearch query from 2.0 to 5.0

I am new to Elasticsearch and am trying to migrate from 2.* to 5.*.
I have a query that I cannot convert. I searched Google and read the documentation, but the examples there are very basic and don't really help here. Could someone help me convert the following query into its modern form? I am using the PHP client.
$query = [
'index' => 'index_name',
'type' => 'table',
'body' => [
'query' => [
'filtered' => [
'query' => [
'bool' => [
'must' => [
[
'match' => [
'_all' => [
'query' => 'zonda',
'fuzziness' => 1,
],
],
],
],
],
],
'filter' => [
[
'term' => [
'foo' => 1,
],
],
],
],
],
],
'size' => 10000,
];
I'm not much into PHP, so I could not check it, but it should be something like this:
$query = [
'index' => 'index_name',
'type' => 'table',
'body' => [
'query' => [
'bool' => [
'must' => [
[
'match' => [
'_all' => [
'query' => 'zonda',
'fuzziness' => 1,
],
],
],
],
'filter' => [
[
'term' => [
'foo' => 1,
],
],
],
],
],
],
'size' => 10000,
];
The filtered query is gone in 5.0; use the filter clause of the bool query instead.
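If there are many such queries to migrate, the same rewrite can be done mechanically. The function below is a hypothetical helper, not part of the PHP client, and only a sketch: it assumes the inner bool query has no filter clause of its own, since it would overwrite one.

```php
<?php
// Hypothetical helper: rewrites a 2.x `filtered` query into a `bool` query.
function filteredToBool(array $query): array
{
    if (!isset($query['filtered'])) {
        return $query; // nothing to convert
    }

    // A filtered query without an inner query matches everything.
    $inner = $query['filtered']['query'] ?? ['match_all' => new stdClass()];
    $filter = $query['filtered']['filter'] ?? [];

    // Reuse an existing bool query; otherwise wrap the inner query in must.
    $isBool = isset($inner['bool']) && count($inner) === 1;
    $bool = $isBool ? $inner['bool'] : ['must' => [$inner]];

    if ($filter !== []) {
        $bool['filter'] = $filter;
    }
    return ['bool' => $bool];
}

// The query body from the question.
$old = [
    'filtered' => [
        'query' => ['bool' => ['must' => [
            ['match' => ['_all' => ['query' => 'zonda', 'fuzziness' => 1]]],
        ]]],
        'filter' => [
            ['term' => ['foo' => 1]],
        ],
    ],
];

$new = filteredToBool($old);
echo json_encode($new, JSON_PRETTY_PRINT), PHP_EOL;
```

Applied to the question's query, this produces the same bool structure as the hand-converted version above.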

elasticsearch mapping dynamic template

What is wrong with my mapping? I receive:
{"error":"ClassCastException[java.util.LinkedHashMap cannot be cast to java.util.List]","status":500}
I use Elasticsearch 1.1.1 on an Ubuntu server.
If I delete the dynamic templates, it works.
$this->mapping = [
"dynamic_templates" => [
'all_fields' => [
'match' => "*",
'match_mapping_type' => 'string',
'mapping' => [
'index' => 'not_analyzed',
],
],
],
'properties' => [
'state' => [
'type' => 'boolean',
],
...
];
}
The mapping should look like this:
"dynamic_templates" => [
['all_fields' => [
'match' => "*",
'match_mapping_type' => 'string',
'mapping' => [
'index' => 'not_analyzed',
],
]],
],
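Note that each template definition should be an associative array inside a numerically indexed array (a list), so that dynamic_templates is sent as a JSON array rather than a JSON object.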
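The ClassCastException (a map where a list was expected) lines up with how PHP serializes the two shapes. A minimal sketch with `json_encode`, reusing the template from the question:

```php
<?php
$template = [
    'match' => '*',
    'match_mapping_type' => 'string',
    'mapping' => ['index' => 'not_analyzed'],
];

// Associative array directly under dynamic_templates: becomes a JSON object (a map).
$asObject = json_encode(['dynamic_templates' => ['all_fields' => $template]]);

// Wrapped in a list: becomes a JSON array holding one named template.
$asList = json_encode(['dynamic_templates' => [['all_fields' => $template]]]);

echo $asObject, PHP_EOL;
echo $asList, PHP_EOL;
```

The first output starts with `{"dynamic_templates":{`, the second with `{"dynamic_templates":[{`, and only the second gives Elasticsearch the list it expects.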
