Elasticsearch highlight with nested objects - elasticsearch

I have a question about highlighting nested object fields.
Consider a record like this:
_source: {
    id: 286
    translations: [
        {
            id: 568
            language: lang1
            value: foo1 bar1
        }
        {
            id: 569
            language: lang2
            value: foo2 bar2
        }
    ]
}
If translations.value has an ngram filter, is it possible to highlight matches in a nested object such as this one?
And what would the highlight query look like?
Thanks a lot for any response.

Same problem over here. It seems that there is no way to do it in Elasticsearch, and there won't be in the near future.
Developer Shay Banon wrote:
In order to do highlighting based on the nested query, the nested
documents needs to be extracted as well in order to highlight it,
which is more problematic (and less performant).
Also:
His explanation was that this would take a good amount of memory as
there can be a large number of children. And it looks genuine to me as
adding this feature will violate the basic concept of processing only
N number of feeds at a time.
So the only way is to process the result of a query manually in your own program to add the highlights.
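The manual post-processing mentioned above could look roughly like the sketch below. It assumes the hit's _source has the shape shown in the question; the helper names (escapeRegExp, highlightTranslations) and the <em> wrapper are made up for illustration. Plain substring matching is only an approximation of what an ngram filter matches.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns the nested translations whose value contains
// `term` (case-insensitive), with every occurrence wrapped in <em> tags.
function highlightTranslations(source, term) {
  if (!term) {
    return [];
  }
  const pattern = new RegExp(escapeRegExp(term), "gi");
  const result = [];
  for (const translation of source.translations || []) {
    const highlighted = translation.value.replace(
      pattern,
      (match) => "<em>" + match + "</em>"
    );
    // Keep only the nested objects in which something was highlighted.
    if (highlighted !== translation.value) {
      result.push({ id: translation.id, highlight: highlighted });
    }
  }
  return result;
}
```

Calling highlightTranslations(hit._source, "foo1") on the record from the question would return a single entry for the nested object with id 568.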
Update
I don't know about Tire or ngram filters, but I found a way to retrieve all nested documents that match a filter by using nested facets and facet filters. You need a separate query for highlighting, but it's much faster than browsing through _source, in my case at least.
{
    "query": {"match_all": {}},
    "facets": {
        "matching_translations": {
            "nested": "translations",
            "terms": {"field": "translations.value"},
            "facet_filter": {
                "bool": {"must": [{"terms": {"translations.value": ["foo1"]}}]}
            }
        }
    }
}
You can use the resulting facet terms for highlighting in your program.
For example, I want to highlight links to nested documents (in jQuery; here the facet is named docIDs):
// Collect the terms returned by the "docIDs" facet and mark the matching links.
var setHighlights = function (sdata) {
    var highlightDocs = [];
    var terms = sdata.facets && sdata.facets.docIDs && sdata.facets.docIDs.terms;
    if (terms && terms.length > 0) {
        for (var i = 0; i < terms.length; i++) {
            // Element ids are strings, so compare against string terms.
            highlightDocs.push(String(terms[i].term));
        }
    }
    $('li.document_link').each(function () {
        if ($.inArray($(this).attr('id'), highlightDocs) !== -1) {
            $(this).addClass('document_selected');
        }
    });
};
I hope that helps a little.

You can set "force_source": true on the highlighted fields so that highlighting is based on _source, which causes the document to be highlighted once the nested fields are joined.

Related

How elasticsearch updateByQuery syntax works

I've been working with Elasticsearch for a few days. While building a CRUD application, I came across the updateByQuery method. I'm working with NestJS, and this is how I'm updating a field:
await this.elasticSearch.updateByQuery(
    {
        index: 'my_index_user',
        body: {
            query: {
                match: {
                    name: 'user_name',
                }
            },
            script: {
                inline: 'ctx._source.name = "new_user_name"'
            }
        }
    }
);
My question is:
Why does Elasticsearch need the syntax 'ctx._source.name = "new_user_name"' to specify what the new value of the name field should be? What is ctx._source in this context?
ctx is the context object that Elasticsearch passes to the update script, and ctx._source is the _source of the document currently being updated. As described in the official documentation on source filtering, _source holds the field values exactly as they were sent to Elasticsearch; it is stored as is and does not go through the analysis process.
For example, take a text field that uses the standard analyzer (the default) and store the value foo bar in it. During analysis Elasticsearch breaks the value into two tokens, foo and bar, and stores them in the inverted index; if you want to see the original value, foo bar, you can read it from _source.
It is therefore useful to keep the original, unanalyzed value in _source, and that is the value this API updates. Keeping it also helps when you later reindex into a new index or change how the field is analyzed, because the original value is still available in _source.
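As an illustration, here is a sketch of the same request with the new value passed through script params instead of being hard-coded in the script text. The helper name buildRenameRequest is hypothetical, and the sketch assumes a reasonably recent Elasticsearch version, where the script text is given under source (the question uses the older inline key).

```javascript
// Hypothetical helper: builds the argument for an update-by-query call that
// renames every document whose `name` field matches `oldName`.
function buildRenameRequest(index, oldName, newName) {
  return {
    index: index,
    body: {
      query: {
        match: { name: oldName },
      },
      script: {
        lang: "painless",
        // ctx._source is the stored JSON of the document being updated;
        // assigning to ctx._source.name changes that document's name field.
        source: "ctx._source.name = params.newName",
        // Values in params are available to the script as params.<key>.
        params: { newName: newName },
      },
    },
  };
}
```

With the client from the question, the result would be passed along as this.elasticSearch.updateByQuery(buildRenameRequest('my_index_user', 'user_name', 'new_user_name')).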

Type of field for prefix search in Elastic Search

I'm confused about which field type I should use for prefix search. Many examples suggest search_as_you_type, but I don't think autocomplete is what I'm going for.
I have a UUID field:
id: 34y72ca1-3739-41ff-bbec-f6d17479384c
The following terms should return the doc above:
3
34
34y72ca1
34y72ca1-3739
34y72ca1-3739-41ff-bbec-f6d17479384c
Searching for 3739 should not return it, as the ID doesn't start with 3739. Initially I wanted partial search, but the wildcard field type is not supported by Amazon AWS, so I'm settling for prefix search instead.
I tried a search_as_you_type field, but it doesn't return the result when I use the whole ID. In my use case the results are shown when the user presses Enter rather than in real time as they type, so it's OK if speed is compromised; I just hope for something that works well with many rows of data.
Thanks
If you have not explicitly defined an index mapping, use the id.keyword field instead of the id field for the prefix query to return the expected results. With dynamic mapping, id is a text field analyzed by the standard analyzer, while id.keyword is a keyword sub-field that stores the whole value as a single unanalyzed term.
{
    "query": {
        "prefix": {
            "id.keyword": {
                "value": "34y72ca1"
            }
        }
    }
}
Otherwise, you can modify your index mapping by adding multi-fields to the id field.
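A minimal sketch of such a multi-field mapping, together with a small builder for the prefix query above. It assumes a typeless mapping (Elasticsearch 7.x style); the names idMapping, buildPrefixQuery and matchesPrefix are made up for illustration. matchesPrefix only mimics locally what a prefix query does on an unanalyzed keyword value.

```javascript
// Assumed index body: `id` stays a text field and gains a keyword sub-field
// that indexes the whole value as one term.
const idMapping = {
  mappings: {
    properties: {
      id: {
        type: "text",
        fields: {
          keyword: { type: "keyword" },
        },
      },
    },
  },
};

// Builds the search body for a prefix query on the keyword sub-field.
function buildPrefixQuery(prefix) {
  return {
    query: {
      prefix: {
        "id.keyword": { value: prefix },
      },
    },
  };
}

// Local illustration of prefix semantics on the unanalyzed value.
function matchesPrefix(storedId, prefix) {
  return storedId.startsWith(prefix);
}
```

For the ID in the question, matchesPrefix returns true for 34y72ca1-3739 and false for 3739, which is the behavior the question asks for.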

ArangoSearch support for multiple fields search

Does ArangoSearch support searching multiple or all fields of a collection? I want to be able to search for a text in all fields of a given collection. Does ArangoSearch support such a thing?
You can let a View index all fields (attributes) of your documents very easily:
{
    "links": {
        "yourCollection": {
            "includeAllFields": true
        }
    },
    …
}
In queries you need to be explicit about which fields to search in however:
FOR doc IN yourView
    SEARCH doc.field1 == "foo" OR doc.field2 == "foo" OR doc.nested.field == "foo"
    RETURN doc
It is not possible (yet) to express this using a wildcard, like SEARCH doc.* == "foo". Possible workarounds are to maintain a separate attribute which combines the content of all the individual fields you want to search in (but you need to make sure that it stays in sync with the source attributes), or to use a query builder of sorts to generate a disjunction like above.
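The query-builder workaround could be sketched as follows. The function name buildSearchQuery and the identifier check are assumptions made for this example; the search value is passed as a bind variable, while the View name and attribute paths are validated and then written into the query text.

```javascript
// Accepts simple identifiers, optionally nested with dots (e.g. nested.field).
const ATTRIBUTE_PATH = /^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$/;

// Hypothetical builder: generates one equality condition per field and joins
// them with OR, returning the AQL text plus its bind variables.
function buildSearchQuery(viewName, fieldPaths, value) {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(viewName)) {
    throw new Error("invalid view name: " + viewName);
  }
  if (fieldPaths.length === 0) {
    throw new Error("at least one field is required");
  }
  const conditions = fieldPaths.map((path) => {
    if (!ATTRIBUTE_PATH.test(path)) {
      throw new Error("invalid attribute path: " + path);
    }
    return "doc." + path + " == @value";
  });
  return {
    query:
      "FOR doc IN " + viewName +
      " SEARCH " + conditions.join(" OR ") +
      " RETURN doc",
    bindVars: { value: value },
  };
}
```

buildSearchQuery("yourView", ["field1", "field2", "nested.field"], "foo") produces the same disjunction as the hand-written query above, with "foo" supplied through the @value bind variable.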

Can I get outer document fields from nested top hits aggregation?

I am making use of the functionality that was added in Elasticsearch 1.5 to allow a top hits aggregation inside a nested aggregation. The problem I have is that once I have my top nested documents I want to be able to also get fields from their outer documents.
My pseudo aggregation structure is:
nested: {
    some_other_aggregation: {
        "top_hits": {
        }
    }
}
The top nested hits include the index, type and id of the outer document, so I could perform a secondary search, but I'd like to avoid that. My other option is to return all of the hits from the query (currently I only return the results of the aggregations) and then match up the documents with the events in my code, but that seems bad from a performance point of view.
Can anyone suggest something better? Thanks.
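For reference, the secondary search mentioned above could be kept to a single extra request by collecting the outer document ids first. This is only a sketch: it assumes each nested top hit carries the outer document's id in _id, as described in the question, and the helper name buildParentLookup is made up.

```javascript
// Hypothetical helper: takes the hits collected from the top_hits
// aggregations and builds one `ids` query that fetches every outer document.
function buildParentLookup(topHits) {
  const seen = new Set();
  const ids = [];
  for (const hit of topHits) {
    // Several nested hits can belong to the same outer document.
    if (!seen.has(hit._id)) {
      seen.add(hit._id);
      ids.push(hit._id);
    }
  }
  return {
    size: ids.length,
    query: {
      ids: { values: ids },
    },
  };
}
```

The outer documents returned by that request can then be matched to the nested hits by id in application code.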

Elasticsearch: field "title" was indexed without position data; cannot run PhraseQuery

I have an index in Elasticsearch with the following mapping:
mappings: {
    feed: {
        properties: {
            html_url: {
                index: not_analyzed
                omit_norms: true
                index_options: docs
                type: string
            }
            title: {
                index_options: offsets
                type: string
            }
            created: {
                store: true
                format: yyyy-MM-dd HH:mm:ss
                type: date
            }
            description: {
                type: string
            }
        }
    }
}
I get the following error when performing a phrase search ("video games"):
IllegalStateException[field \"title\" was indexed without position data; cannot run PhraseQuery (term=video)];
Single-word searches work fine. I tried "index_options: positions" as well, but with no luck. The title field contains text in multiple languages and is sometimes empty. Interestingly, it seems to fail randomly: for example, it would fail at 200K documents or at 800K using the same dataset. Is there a reason some titles wouldn't get indexed with positions?
Elasticsearch version 0.90.5
Just in case someone else has the same issue. There was another type/table (feed2) in the same index with the same "title" field that was set to "not_analyzed".
For some reason, even if you specify the type in the URL (http://elasticsearchhost.com:9200/index_name/feed/_search), the other type is still searched as well. Changing the mapping for the feed2 type fixed the problem.
You probably have another field named 'title' with a different mapping in another type but in the same index.
Basically, if you have two fields with the same name in the same index, even in different types, they cannot have different mappings. To be more precise, even if they have the same type (e.g. "string") but one of them is "analyzed" and the other is "not_analyzed", problems will arise.
I mean, yes, you can try to set up two different mappings, and Elasticsearch will not complain, but when searching you get strange results and everything goes bananas.
You can read more about this issue here where they say:
[...] In the end, we opted to enforce the rule that all fields with the same name in the same index must have the same mapping [...]
And yeah, considering how the promise of ElasticSearch has always been "it just works" this little detail took a lot of people by surprise.
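To spot this situation before it bites, the mappings of all types in an index can be compared field by field. The sketch below is an assumption-laden illustration: findConflictingFields is a made-up helper, it expects an object shaped like the mapping above (type name to properties), and it only compares top-level fields with a shallow comparison of their settings.

```javascript
// Serialize a field mapping with sorted keys so that key order is ignored.
function canonical(fieldMapping) {
  return JSON.stringify(
    Object.keys(fieldMapping).sort().map((key) => [key, fieldMapping[key]])
  );
}

// Hypothetical checker: returns the names of fields that appear in more than
// one type of the same index with different mappings.
function findConflictingFields(typeMappings) {
  const firstSeen = new Map(); // field name -> canonical mapping
  const conflicts = new Set();
  for (const typeMapping of Object.values(typeMappings)) {
    const properties = typeMapping.properties || {};
    for (const [field, fieldMapping] of Object.entries(properties)) {
      const serialized = canonical(fieldMapping);
      if (!firstSeen.has(field)) {
        firstSeen.set(field, serialized);
      } else if (firstSeen.get(field) !== serialized) {
        conflicts.add(field);
      }
    }
  }
  return Array.from(conflicts);
}
```

For the case described above, passing the mappings of feed and feed2 would report title, because one type analyzes it and the other declares it not_analyzed.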
