How can I perform an Elasticsearch Multi Search with only suggesters?

I need to return suggestions from 4 separate suggesters, across two separate indices.
I am currently doing this by sending two separate requests to Elasticsearch (one for each index) and combining the results in my application. Obviously this does not seem ideal when the Multisearch API is available.
From playing with the Multisearch API I am able to combine these suggestion requests into one and it correctly retrieves results from all 4 completion suggesters from both indexes.
However, it also automatically performs a match_all query on the chosen indices. I can minimize the impact of this by setting searchType to count, but the results are still worse than the two separate curl requests.
It seems that no matter what I try I cannot prevent the Multisearch API from performing some sort of query over each index.
e.g.
{ index: 'users', type: 'user' },
{
  suggest: {
    users_suggest: {
      text: term,
      completion: {
        size: 5,
        field: 'users_suggest'
      }
    }
  }
},
{ index: 'photos', type: 'photo' },
{
  suggest: {
    photos_suggest: {
      text: term,
      completion: {
        size: 5,
        field: 'photos_suggest'
      }
    }
  }
}
A request like the above, which clearly omits the query: {} part of the multisearch request, still performs a match_all query and returns everything in the index.
Is there any way to prevent the query taking place so that I can simply get the combined completion suggesters results? Or is there another way to search multiple suggesters on multiple indices in one query?
Thanks in advance

Set size to 0 in every request body so that no hits are returned, only suggestions:
{
  "size": 0,
  "suggest": {}
}
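Putting it together, a minimal _msearch body following this advice might look like the one below (index, type, and field names taken from the question; the suggest text "jo" is a placeholder). Each body line sets "size": 0 so only the suggesters run:

```
GET _msearch
{"index": "users", "type": "user"}
{"size": 0, "suggest": {"users_suggest": {"text": "jo", "completion": {"size": 5, "field": "users_suggest"}}}}
{"index": "photos", "type": "photo"}
{"size": 0, "suggest": {"photos_suggest": {"text": "jo", "completion": {"size": 5, "field": "photos_suggest"}}}}
```

Note that _msearch expects newline-delimited JSON: one header line and one body line per sub-request.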

Related

Indexing strategy for hierarchical structures on ElasticSearch

Let's say I have hierarchical types such as in the example below:
base_type
  child_type1
    child_type3
  child_type2
child_type1 and child_type2 inherit metadata properties from base_type. child_type3 has all properties inherited from both child_type1 and base_type.
To add to the example, here's several objects with their properties:
base_type_object: {
  base_type_property: "bto_prop_value_1"
},
child_type1_object: {
  base_type_property: "ct1o_prop_value_1",
  child_type1_property: "ct1o_prop_value_2"
},
child_type2_object: {
  base_type_property: "ct2o_prop_value_1",
  child_type2_property: "ct2o_prop_value_2"
},
child_type3_object: {
  base_type_property: "ct3o_prop_value_1",
  child_type1_property: "ct3o_prop_value_2",
  child_type3_property: "ct3o_prop_value_3"
}
When I query for base_type_object, I expect to search base_type_property values in each and every one of the child types as well. Likewise, if I query for child_type1_property, I expect to search through all types that have such property, meaning objects of type child_type1 and child_type3.
I see that mapping types have been removed. What I'm wondering is whether this use case warrants indexing under separate indices.
My current line of thinking using example above would be to create 4 indices: base_type_index, child_type1_index, child_type2_index and child_type3_index. Each index would only have mappings of their own properties, so base_type_index would only have base_type_property, child_type1_index would have child_type1_property etc. Indexing child_type1_object would create an entry on both base_type_index and child_type1_index indices.
This seems convenient because, as far as I can see, it's possible to search multiple indices using GET /my-index-000001,my-index-000002/_search. So I would theoretically just need to list hierarchy of my types in GET request: GET /base_type_index,child_type1_index/_search.
To make it easier to understand, here is how it would be indexed:
base_type_index
base_type_object: {
  base_type_property: "bto_prop_value_1"
},
child_type1_object: {
  base_type_property: "ct1o_prop_value_1"
},
child_type2_object: {
  base_type_property: "ct2o_prop_value_1"
},
child_type3_object: {
  base_type_property: "ct3o_prop_value_1"
}
child_type1_index
child_type1_object: {
  child_type1_property: "ct1o_prop_value_2"
},
child_type3_object: {
  child_type1_property: "ct3o_prop_value_2"
}
I think values for child_type2_index and child_type3_index are apparent, so I won't list them in order to keep the post length at a more reasonable level.
Does this make sense and is there a better way of indexing for my use case?
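As a concrete sketch of the approach described in the question (index names from the question; document IDs and the query value are assumptions for illustration), indexing child_type1_object would mean one write per index in its hierarchy, and querying would hit all relevant indices at once:

```
PUT base_type_index/_doc/child_type1_object
{ "base_type_property": "ct1o_prop_value_1" }

PUT child_type1_index/_doc/child_type1_object
{ "child_type1_property": "ct1o_prop_value_2" }

GET /base_type_index,child_type1_index/_search
{
  "query": { "match": { "base_type_property": "ct1o_prop_value_1" } }
}
```

One trade-off of this layout is that a single logical object is split across several physical documents, so the application has to keep the copies in sync on updates and deletes.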

Most performant way to update a single document in Elasticsearch via an alias

I have an Elasticsearch setup with an alias that points to many indices. I need to update a single document, but I don't know which index it resides in.
There are two ways I can accomplish this as far as I can see:
_update_by_query:
POST my-alias/_update_by_query
{
  "query": {
    "terms": {
      "_id": ["my-id-to-update"]
    }
  },
  "script": {
    "source": "ctx._source['Field'] = 'new value'"
  }
}
read (which returns the specific index) then write:
GET my-alias/_search
{
  "query": {
    "terms": {
      "_id": ["my-id-to-update"]
    }
  }
}
POST my-index-returned-from-the-get/_update/my-id-to-update
{
  "doc": {
    "Field": "new value"
  }
}
Which method is more performant?
Which method is preferred?
Is there a better way than either of these two?
The performance of both approaches will be roughly the same, with one difference: the first approach only needs to send one request, compared to two for the second, so the first approach is better as it cuts your API calls in half.
Also, in my opinion the first approach is much cleaner and fits better with the concept of aliases in Elasticsearch, because it keeps the exact index name encapsulated away from your application; the application doesn't need any clue about which index its documents are in.
An important note about updating a document in Elasticsearch: documents in Elasticsearch are never updated in place. The existing document is flagged as deleted and a new document is created (this is due to the Lucene implementation); the flagged document is only physically removed later, during Lucene segment merging.
You can find a good blog post about segment merging here.

Can inclusion of specific fields change the elasticsearch result set?

I have an ES query that returns 414 documents if I exclude a specific field from results.
If I include this field, the document count drops to 328.
The documents that get dropped are consistent and this happens whether I scroll results or query directly.
The field map for the field that reduces the result set looks like this:
"completion": {
"type": "object",
"enabled": false
}
Nothing special to it and I have other "enabled": false object type fields that return just fine in this query.
I tested against multiple indexes with the same data to rule out corruption (I hope).
This 'completion' object is a nested and ignored object that has 4 or 5 levels of nesting but once again, I have other similarly nested objects that return just fine for this query.
The query is a simple terms match for 414 terms (yes, this is terrible, we are rethinking our strategy on this):
var { _scroll_id, hits } = await elastic.search({
  index: index,
  type: type,
  body: shaQuery,
  scroll: '10s',
  _source_exclude: 'account,layout,surveydata,verificationdata,accounts,scores'
});

while (hits && hits.hits.length) {
  // Append all new hits
  allRecords.push(...hits.hits);
  var { _scroll_id, hits } = await elastic.scroll({
    scrollId: _scroll_id,
    scroll: '10s'
  });
}
The query is:
"query": {
"terms": {
"_id": [
"....",
"....",
"...."
}
}
}
In this example, I will only get back 328 results. If I add 'completion' to the _source_exclude then I get the full set back.
So, my question is: What are the scenarios where including a field in the result could limit the search when that field is totally unrelated to the search.
The #'s are specific to this example but consistent across queries. I just include them for context on the overall problem.
Also important is that this completion field has the same data and format across both included and excluded records, I can't see anything that would cause a problem.
The problem was found and it was obscure. What we saw was that it was always failing at the same point and when it was examined a little more closely, the same error was coming out:
{
  took: 158,
  timed_out: false,
  _shards: {
    total: 5,
    successful: 4,
    skipped: 0,
    failed: 1,
    failures: [
      {
        shard: 0,
        index: 'theindexname',
        node: '4X2vwouIRriYbQTQcHQ_sw',
        reason: {
          type: 'illegal_argument_exception',
          reason: 'cannot write xcontent for unknown value of type class java.math.BigInteger'
        }
      }
    ]
  }
}
OK, well that's strange; we are not using BigIntegers at all. But thanks to the power of Google, this issue in the Elasticsearch issue tracker was revealed:
https://github.com/elastic/elasticsearch/pull/32888
"XContentBuilder to handle BigInteger and BigDecimal" which is a bug in 6.3 where fields that used BigInteger and BigDecimal would fail to serialize and thus break when source filtering was applied. We were running 6.3.
It is unclear why our systems are triggering this issue but upgrading to 6.5 solved it entirely.
Obscure obscure obscure but solved thanks to Javier's persistence.
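A takeaway from the debugging above: partial shard failures do not make the request throw, so the error only surfaced once _shards was inspected. A small guard like the following (plain JavaScript; the response shape mirrors the one shown above) can be called on every search and scroll response so such failures stop a job immediately instead of silently dropping documents:

```javascript
// Throw if a search/scroll response reports partial shard failures,
// so errors like the BigInteger serialization bug surface immediately
// instead of silently shrinking the result set.
function assertNoShardFailures(response) {
  const shards = response._shards || {};
  if (shards.failed > 0) {
    const reasons = (shards.failures || [])
      .map(f => `${f.index}[${f.shard}]: ${f.reason && f.reason.reason}`)
      .join('; ');
    throw new Error(`${shards.failed} shard(s) failed: ${reasons}`);
  }
  return response; // unchanged response, for easy chaining
}
```

In the scroll loop from the question, this would wrap each `elastic.search(...)` and `elastic.scroll(...)` result before `allRecords.push(...)`.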

ElasticSearch - Delete documents by specific field

This seemingly simple task is not well-documented in the ElasticSearch documentation:
We have an ElasticSearch instance with an index that has a field in it called sourceId. What API call would I make to first, GET all documents with 100 in the sourceId field (to verify the results before deletion) and then to DELETE same documents?
You probably need to make two API calls here: first to view the count of matching documents, then a second to perform the deletion.
The query is the same for both; only the endpoints differ. I'm also assuming sourceId is of type keyword.
Query to Verify
POST <your_index_name>/_search
{
  "size": 0,
  "query": {
    "term": {
      "sourceId": "100"
    }
  }
}
Execute the above term query and take note of hits.total in the response.
Remove the "size": 0 from the query if you want to view the full documents in the response.
Once you have the details, you can go ahead and perform the deletion using the same query; notice the different endpoint, though.
Query to Delete
POST <your_index_name>/_delete_by_query
{
  "query": {
    "term": {
      "sourceId": "100"
    }
  }
}
Once you execute the delete by query, check the deleted field in the response; it should show you the same number.
I've used term queries however you can also make use of any Match or any complex Bool Query. Just make sure that the query is correct.
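For instance, the same deletion could be restricted further with a bool query; the timestamp field below is hypothetical, just to illustrate combining conditions:

```
POST <your_index_name>/_delete_by_query
{
  "query": {
    "bool": {
      "must": [
        { "term": { "sourceId": "100" } },
        { "range": { "timestamp": { "lt": "now-30d" } } }
      ]
    }
  }
}
```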
Hope it helps!
Delete all the documents of an index without deleting the mapping and settings:
POST /my_index/_delete_by_query?conflicts=proceed&pretty
{
  "query": {
    "match_all": {}
  }
}
See: https://opster.com/guides/elasticsearch/search-apis/elasticsearch-delete-by-query/

Increasing the 'views' counter of a document every time it is explicitly queried by _id via the _search endpoint

Say I have an index called blog which has 10 documents called article. Each article is a JSON document, one of whose properties is views, initialized to 0.
I was wondering if there's a good way of updating the views counter every time the document gets explicitly fetched via the _search endpoint using its document id, so that I can sort by views in my other queries.
Or would that be something that will have to be taken care of at the application layer?
My feeble attempt at the query DSL so far:
let options = {
  index: 'blog',
  body: {
    query: {
      function_score: {
        query: {
          match: { _id: req.params.articleID }
        },
        weight: 2,
        score_mode: "sum",
        script_score: {
          script: {
            inline: "(2 + doc['view'].value)"
          }
        }
      }
    }
  }
};
I have been trying an inline script, but that would require me to send two separate requests: first a search, then an update if found. I was wondering if I could do it in a single query, i.e. trigger the views counter to increase by one automatically every time I query via _search.
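_search itself has no side effects, so the counter bump generally has to happen at the application layer: after serving the article, issue an update that increments the field. A minimal sketch, assuming the index name from the question, a views field in the source, and a recent Elasticsearch version (the <article_id> placeholder stands for the real document id; older versions use "inline" instead of "source" for the script):

```
POST blog/_update/<article_id>
{
  "script": {
    "source": "ctx._source.views += 1"
  }
}
```

Since this runs on every page view, it inherits the update cost described in the answer above (each update flags the old document as deleted and writes a new one), so for high-traffic articles it is common to batch the increments in the application and flush them periodically.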
