Multi match and highlighting in Elasticsearch

When I match a single field in my query, highlighting in Elasticsearch works fine.
When I use:
$params = [
    'index' => 'my_index',
    'type' => 'articles',
    'body' => [
        'from' => '0',
        'size' => '10',
        'query' => [
            'bool' => [
                'must' => [
                    'match' => [ 'content' => 'what I want to search' ]
                ]
            ]
        ],
        'highlight' => [
            'pre_tags' => ['<mark>'],
            'post_tags' => ['</mark>'],
            'fields' => [
                'content' => [ 'fragment_size' => 150, 'number_of_fragments' => 3 ]
            ]
        ],
    ]
];
everything works, but when I try to match on multiple fields, the search itself works correctly while the highlighting disappears:
'match' => [ 'content' => 'what I want to search' ],
'match' => [ 'type' => 1 ]
Do you know how to get highlighting to work when I want to search two different fields with two different queries?

Try this: keep the full-text match in must and move the type condition into filter as a term query.
$params = [
    'index' => 'my_index',
    'type' => 'articles',
    'body' => [
        'from' => '0',
        'size' => '10',
        'query' => [
            'bool' => [
                'must' => [
                    'match' => [ 'content' => 'what I want to search' ]
                ],
                // Exact-value condition: restricts the results without affecting the score
                'filter' => [
                    'term' => [ 'type' => 1 ]
                ]
            ]
        ],
        'highlight' => [
            'pre_tags' => ['<mark>'],
            'post_tags' => ['</mark>'],
            'fields' => [
                'content' => [ 'fragment_size' => 150, 'number_of_fragments' => 3 ]
            ]
        ],
    ]
];
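A likely cause of the missing highlights is PHP itself: an array literal cannot hold two entries under the same 'match' key, so the second entry silently replaces the first and the content field is never queried. The sketch below only uses the two clauses from the question; the variable names $must and $mustFixed are made up for illustration.

```php
<?php
// Two entries with the same key in one PHP array literal: the last one wins.
$must = [
    'match' => [ 'content' => 'what I want to search' ],
    'match' => [ 'type' => 1 ],
];

// Only the 'type' clause survives, so 'content' is no longer part of the query.
echo json_encode($must), PHP_EOL;
// prints {"match":{"type":1}}

// Giving each clause its own list element keeps both.
$mustFixed = [
    [ 'match' => [ 'content' => 'what I want to search' ] ],
    [ 'match' => [ 'type' => 1 ] ],
];
echo json_encode($mustFixed), PHP_EOL;
// prints [{"match":{"content":"what I want to search"}},{"match":{"type":1}}]
```

Either form of the fix (a list of clauses under must, or a separate filter clause) avoids the key collision.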

Related

How can I do a nested search on elastic search?

I am having trouble constructing a search query in ES 7.4.
Here is my mapping:
[
'settings' => [
'number_of_shards' => 1,
'number_of_replicas' => 1,
'analysis' => [
'filter' => [
'filter_stemmer' => [
'type' => 'stemmer',
'language' => 'english'
]
],
'analyzer' => [
'g_analyzer' => [
'type' => 'custom',
'filter' => ['lowercase', 'stemmer'],
'tokenizer' => 'standard'
],
"no_stopwords" => [
"type" => "standard",
"stopwords" => []
],
]
]
],
'mappings' => [
'_source' => [
'enabled' => true
],
'properties' => [
'id' => [
'type' => 'integer'
],
'title' => [
'type' => 'text',
"analyzer" => "g_analyzer",
],
'description' => [
'type' => 'text',
"analyzer" => "g_analyzer",
],
'jobStatus' => [
'type' => 'text'
],
'videoId' => [
'type' => 'text',
],
'thumbnail' => [
'type' => 'text'
],
'playlistId' => [
'type' => 'text'
],
'channelId' => [
'type' => 'text'
],
'publishedDate' => [
"type" => "date",
],
'created_at' => [ //date video was created
"type" => "date",
],
'updated_at' => [ //date video was updated
"type" => "date",
],
'url' => [
'type' => 'text'
],
'subtitles' => [
'type' => 'nested',
'properties' => [
'id' => [
'type' => 'integer'
],
'start_time' => [
'type' => 'float'
],
'end_time' => [
'type' => 'float'
],
'text' => [
'type' => 'text',
"analyzer" => "g_analyzer",
],
'langcode' => [
'type' => 'text'
],
]
]
]
]
];
What query do I need to search for the text "bill gates" in the subtitles, and return the subtitle "bill gates" was found in, as well as the subtitle above and below the hit?
I don't have your sample documents or expected results, so I can't try this locally and give you a complete query. But since you are using the nested datatype, you need to use nested queries.
Nested queries are used to query nested fields, and the official documentation has some examples. See if you can follow them, post what you tried, and we can help from there.
I figured out how to do the nested query:
$body = [
    'query' => [
        'nested' => [
            'inner_hits' => [
                'size' => 3
            ],
            'path' => 'subtitles',
            'query' => [
                'bool' => [
                    'must' => [
                        [
                            'match' => [ 'subtitles.text' => $searchTerm ]
                        ]
                    ]
                ]
            ]
        ]
    ],
];
This adds inner hits to each result, containing the subtitles in which the search terms were actually found.
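The inner hits identify the matching subtitles, but not the ones before and after them. One possible approach, sketched below, is to read each inner hit's _nested.offset (its position in the subtitles array) and pick the neighbours from the parent document's _source. The helper name subtitlesWithContext and the sample hit are hypothetical, and the sketch assumes _source.subtitles is returned in the order it was indexed.

```php
<?php
/**
 * Hypothetical helper: for one search hit, return each matching subtitle
 * together with the subtitle before and after it (null at the array edges).
 */
function subtitlesWithContext(array $hit): array
{
    $all = $hit['_source']['subtitles'] ?? [];
    $result = [];
    foreach ($hit['inner_hits']['subtitles']['hits']['hits'] ?? [] as $inner) {
        // Position of the matching subtitle inside the parent's subtitles array
        $i = $inner['_nested']['offset'];
        $result[] = [
            'before' => $all[$i - 1] ?? null,
            'match'  => $all[$i] ?? null,
            'after'  => $all[$i + 1] ?? null,
        ];
    }
    return $result;
}

// Hand-written sample shaped like a search hit (not real Elasticsearch output).
$hit = [
    '_source' => ['subtitles' => [
        ['id' => 1, 'text' => 'welcome back'],
        ['id' => 2, 'text' => 'today we talk about bill gates'],
        ['id' => 3, 'text' => 'and his foundation'],
    ]],
    'inner_hits' => ['subtitles' => ['hits' => ['hits' => [
        ['_nested' => ['field' => 'subtitles', 'offset' => 1]],
    ]]]],
];

$context = subtitlesWithContext($hit);
echo $context[0]['before']['text'], ' | ', $context[0]['match']['text'], ' | ', $context[0]['after']['text'], PHP_EOL;
```

This does the neighbour lookup on the client, so it only works when the full subtitles array is included in _source.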

elasticsearch 7, boost by integer value

I'm trying to boost a search by the "created" field (an integer / timestamp) but always run into
"{"error":{"root_cause":[{"type":"parsing_exception","reason":"Unknown key for a START_OBJECT in [script].","line":1,"col":181}],"type":"parsing_exception","reason":"Unknown key for a START_OBJECT in [script].","line":1,"col":181},"status":400}"
Without the 'script' the query works fine. But I'm running out of ideas how to write this script correctly. Any ideas?
return [
'index' => 'articles_' . $this->system,
'body' => [
'size' => $this->size,
'from' => $this->start,
'sort' => [
$this->order => 'desc',
],
'query' => [
'query_string' => [
'query' => $this->term,
'fields' => ['title^5', 'caption^3', 'teaser^2', 'content'],
'analyze_wildcard' => true,
],
'script' => [
'script' => [
'lang' => 'painless',
'source' => "doc['#created'].value / 100000",
],
],
],
],
];
EDIT: Updated query, but still running into "{"error":{"root_cause":[{"type":"parsing_exception","reason":"[query_string] malformed query, expected [END_OBJECT] but found [FIELD_NAME]","line":1,"col":171}],"type":"parsing_exception","reason":"[query_string] malformed query, expected [END_OBJECT] but found [FIELD_NAME]","line":1,"col":171},"status":400}"
Script is not a standalone attribute; it should be part of a bool query. When you have multiple clauses, they should go in must/should/filter under bool:
'body' => [
    'size' => $this->size,
    'from' => $this->start,
    'sort' => [
        $this->order => 'desc'
    ],
    'query' => [
        'bool' => [
            // Each clause is its own element of the must list
            'must' => [
                [
                    'query_string' => [
                        'query' => $this->term,
                        'fields' => ['title^5', 'caption^3', 'teaser^2', 'content'],
                        'analyze_wildcard' => true
                    ]
                ],
                [
                    'script' => [
                        'script' => [
                            'lang' => 'painless',
                            'source' => "doc['#created'].value / 100000"
                        ]
                    ]
                ]
            ]
        ]
    ]
]
The above may have bracket syntax issues (I couldn't test it), but the query structure is correct.
...
'query' => [
    'function_score' => [
        'query' => [
            'query_string' => [
                'query' => $this->term,
                'fields' => ['title^10', 'caption^8', 'teaser^5', 'content'],
                'analyze_wildcard' => true,
            ],
        ],
        'script_score' => [
            'script' => [
                'lang' => 'expression',
                'source' => "_score + (doc['created'] / 10000000000000)",
            ],
        ],
    ],
],
This was my solution in the end. Sadly, I only found it in the Elasticsearch documentation later. Note that you really have to divide the timestamp by a large factor so that it doesn't completely overpower the best matches.
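To see why the divisor matters, the arithmetic can be checked outside Elasticsearch. The sketch below assumes that created holds epoch milliseconds and uses made-up relevance scores; both are illustrative assumptions, and the function name boostedScore is hypothetical.

```php
<?php
// Mirrors the expression "_score + (doc['created'] / divisor)" in plain PHP.
function boostedScore(float $score, float $created, float $divisor): float
{
    return $score + $created / $divisor;
}

$older = 1500000000000.0; // hypothetical epoch-millisecond timestamps
$newer = 1600000000000.0;

// Small divisor: the timestamp term dwarfs the relevance score.
$a = boostedScore(12.0, $older, 100000.0); // strong match, older document
$b = boostedScore(2.0, $newer, 100000.0);  // weak match, newer document
var_dump($b > $a); // bool(true): recency alone decides the order

// Large divisor: the timestamp only nudges the score.
$c = boostedScore(12.0, $older, 1e13);     // 12 + 0.15
$d = boostedScore(2.0, $newer, 1e13);      // 2 + 0.16
var_dump($c > $d); // bool(true): relevance still dominates
```

If created were stored in seconds instead, the same divisor would make the boost roughly a thousand times smaller, so the factor has to be chosen for the actual unit.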

How to include a mapped field's subfield in the results in Elasticsearch?

For aggregation, I keep a raw value of my field, but I can't access this value in my query results. For example, I have the brand Tommy Hilfiger, and its raw value tommy-hilfiger is stored as brand.keyword. How can I include this value in the search results?
'body' => [
'settings' => [
'analysis' => [
'filter' => [
'remove_spaces_inside' => [
'type' => 'pattern_replace',
'pattern' => '\\s+',
'replacement' => ' '
],
'convert_spaces' => [
'type' => 'pattern_replace',
'pattern' => '\\s+',
'replacement' => '-'
],
],
'char_filter' => [
'convert_amp' => [
'type' => 'pattern_replace',
'pattern' => '&',
'replacement' => 'and'
]
],
'analyzer' => [
'slug' => [
'char_filter' => ['convert_amp'],
'tokenizer' => 'keyword',
'filter' => ['trim', 'lowercase', 'asciifolding', 'remove_spaces_inside', 'convert_spaces']
],
'format' => [
'char_filter' => ['convert_amp'],
'tokenizer' => 'keyword',
'filter' => ['trim', 'remove_spaces_inside']
]
]
]
],
'mappings' => [
'my_type' => [
'properties' => [
'brand' => [
'type' => 'string',
'fields' => [
'keyword' => [
'type' => 'string',
'analyzer' => 'slug',
'index_options' => 'docs',
]
]
]
]
]
]
]
Update:
In my case, I store the brand in two fields: the default "Tommy Hilfiger" for full-text search, and a formatted keyword (slug) "tommy-hilfiger" for exact search. I can aggregate data by slug, but I can't get this field back in my query results. For example, this query returns all records with the brand Tommy Hilfiger, but only the default values, not the slug.
'body' => [
'_source' => [
'brand',
'brand.keyword'
],
'query' => [
'bool' => [
'must' => [
[
'terms' => [
'brand.keyword' => [
'tommy-hilfiger',
]
]
]
]
]
]
]

Elastic Search: ES-PHP Client how to create GEOPOINT Index

I am trying to create an index that contains a geo_point using the Elasticsearch PHP client: https://www.elastic.co/guide/en/elasticsearch/client/php-api/5.0/index.html
My code is as follows:
$params = [
'index' => 'sweden_codes',
'type' => 'sweden_c',
'body' => [
'mappings' => [
'location' => [
'properties' => [
'pin' => [
'properties' => [
'location' => [
'type' => 'geo_point'
]
]
]
]
],
'text' => $code->City,
'pin' => [
'location' => [
'lat' => $code->Latitude,
'lon' => $code->Longitude
]
]
]
]
];
$client = ClientBuilder::create()
->setSSLVerification(false)
->setHosts(["elasticsearch:9200"])->build();
The problem is that when I go into Kibana, it says "No Compatible Fields: The "sweden_codes" index pattern does not contain any of the following field types: geo_point".
Can anyone please take a look and let me know what's wrong with my mapping and index creation?
Here is the mapping code that works for me:
$params = [
'index' => 'sweden_postal_codes',
'body' => [
'mappings' => [
'codes' => [
'properties' => [
'location' => [
'type' => 'geo_point'
],
'city' => [
'type' => 'string'
]
]
]
]
]
];
$client = ClientBuilder::create()
->setSSLVerification(false)
->setHosts(["elasticsearch:9200"])->build();
// Adding the index into the ES Cluster
$response = $client->indices()->create($params);
Here is the document indexing code that worked for me:
$params = [
'index' => 'sweden_postal_codes',
'type' => 'codes',
'id' => 2,
'body' => [
'location' => [
'lat' => 30.5268956,
'lon' => 79.2289643
],
'city' => 'Stockholm'
]
];
$client = ClientBuilder::create()
->setSSLVerification(false)
->setHosts(["elasticsearch:9200"])->build();
// Indexing the document into the ES cluster
$response = $client->index($params);

Ignore Results Outside of Distance Range

I am working with ElasticSearch for an application which deals with "posts". I currently have it working with a geo_point so that it will return all posts ordered by distance from the end-user. While this is working I also need to work in one more aspect for the system.
Posts can be paid for. For instance, if I pay for my post and choose "Local" as the area range, the post should only be shown to end-users who are 20 miles away or less.
I have a column on my index named spotlight_range, is there a way I can create a query to say ignore all records if the spotlight_range = 'Local' and the distance is > 20 miles? I need to do this for several different spotlight ranges. For instance Regional may be 100 miles or less, etc.
My current query looks like this
$params = [
'index' => 'my_index',
'type' => 'posts',
'size' => 25,
'from' => 0,
'body' => [
'sort' => [
'_geo_distance' => [
'post_location' => [
'lat' => '44.4759',
'lon' => '-73.2121'
],
'order' => 'asc',
'unit' => 'mi'
]
],
'query' => [
'filtered' => [
'query' => [
'match_all' => []
],
'filter' => [
'geo_distance' => [
'distance' => '100mi',
'post_location' => [
'lat' => '44.4759',
'lon' => '-73.2121'
]
]
]
]
]
]
];
My index is set up with the following fields.
'id' => ['type' => 'integer'],
'title' => ['type' => 'string'],
'description' => ['type' => 'string'],
'price' => ['type' => 'integer'],
'shippable' => ['type' => 'boolean'],
'username' => ['type' => 'string'],
'post_location' => ['type' => 'geo_point'],
'post_location_string' => ['type' => 'string'],
'is_spotlight' => ['type' => 'boolean'],
'spotlight_range' => ['type' => 'string'],
'created_at' => ['type' => 'date', 'format' => 'yyyy-MM-dd HH:mm:ss'],
'updated_at' => ['type' => 'date', 'format' => 'yyyy-MM-dd HH:mm:ss']
My end goal is not specifically to search for distance < X and range = Y, but rather to filter posts of all range types based on the distances I specify. The search should return ALL range types, but filter out anything beyond my specified distance for each range type, based on the user's lat/lon passed into the query.
I have been looking for a solution to this online without much luck.
I would add a circle geo_shape to the document, centered on post_location and with a radius corresponding to the spotlight_range, since you know both pieces of information at indexing time. That way you can encode into each post its corresponding "reach".
...
'post_location' => ['type' => 'geo_point'],
'spotlight_range' => ['type' => 'string'],
'reach' => ['type' => 'geo_shape'], // <---- add this
So a "local" document would look something like this once indexed:
{
"spotlight_range": "local",
"post_location": {
"lat": 42.1526,
"lon": -71.7378
},
"reach" : {
"type" : "circle",
"coordinates" : [-71.7378, 42.1526],
"radius" : "20mi"
}
}
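Building the reach shape at indexing time could look like the sketch below. The function name buildReach and the range-to-radius table are assumptions for illustration (the 20 and 100 mile values come from the question). Note that geo_shape coordinates are ordered longitude first, then latitude, as in the document above.

```php
<?php
// Assumed mapping from spotlight_range to radius; adjust to the real ranges.
const REACH_RADIUS = [
    'Local'    => '20mi',
    'Regional' => '100mi',
];

/**
 * Hypothetical helper: build the circle stored in the "reach" field.
 * Returns null when the range has no configured radius.
 */
function buildReach(float $lat, float $lon, string $spotlightRange): ?array
{
    if (!array_key_exists($spotlightRange, REACH_RADIUS)) {
        return null;
    }
    return [
        'type' => 'circle',
        'coordinates' => [$lon, $lat], // longitude first, then latitude
        'radius' => REACH_RADIUS[$spotlightRange],
    ];
}

// Body of a post document, ready to be passed to the client's index() call.
$doc = [
    'spotlight_range' => 'Local',
    'post_location' => ['lat' => 42.1526, 'lon' => -71.7378],
    'reach' => buildReach(42.1526, -71.7378, 'Local'),
];
echo json_encode($doc['reach']), PHP_EOL;
```

Keeping the radius table in one place means a change to a range's distance only requires reindexing the affected posts, not changing the query.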
Then the query would feature another geo_shape centered on the user's location with the chosen radius and would only retrieve documents whose reach intersects the circle shape in the query.
$params = [
'index' => 'my_index',
'type' => 'posts',
'size' => 25,
'from' => 0,
'body' => [
'sort' => [
'_geo_distance' => [
'post_location' => [
'lat' => '44.4759',
'lon' => '-73.2121'
],
'order' => 'asc',
'unit' => 'mi'
]
],
'query' => [
'filtered' => [
'query' => [
'match_all' => []
],
'filter' => [
'geo_shape' => [
'reach' => [
'relation' => 'INTERSECTS',
'shape' => [
'type' => 'circle',
'coordinates' => [-73.2121, 44.4759],
'radius' => '20mi'
]
]
]
]
]
]
]
];
