Elasticsearch 7: mapping types

I came across the following phrase, and I am under the impression that a valid 6.x query with a type might give an error. I am using an ES 7.10 cluster.
Note that in 7.0, _doc is a permanent part of the path, and represents
the endpoint name rather than the document type.
But, to my surprise, I am able to run the following query. Does this mean _doc is NOT a permanent part of the path? Specifically, which queries do I need to modify when moving from 6.x to 7.x?
PUT ecommercesite/product/1
{
  "product_name": "Men High Performance Fleece Jacket",
  "description": "Best Value. All season fleece jacket",
  "unit_price": 79.99,
  "reviews": 250,
  "release_date": "2016-08-16"
}
The only 6.x query that I am not able to run on 7.10 is the one below; it gives me an error with respect to the type.
GET ecommercesite/product/_mapping

The PUT request currently (end of 2020) just throws a deprecation warning, but it will fail in 8.x.
For now, you could start replacing product with _doc:
PUT ecommercesite/product/1 --> PUT ecommercesite/_doc/1
GET ecommercesite/product/_mapping --> GET ecommercesite/_doc/_mapping?include_type_name
but it'd be best to ditch the types completely and adhere to the standards (typeless examples follow the list below):
important: instead of PUT ecommercesite/1 either keep using PUT ecommercesite/_doc/1 or use PUT /ecommercesite/_create/1 (docs here)
GET ecommercesite/_mapping (docs here)
no significant changes in GET ecommercesite/_search
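A minimal sketch of the typeless 7.x requests, reusing the index and document from the question:
PUT ecommercesite/_doc/1
{
  "product_name": "Men High Performance Fleece Jacket",
  "description": "Best Value. All season fleece jacket",
  "unit_price": 79.99,
  "reviews": 250,
  "release_date": "2016-08-16"
}

GET ecommercesite/_mapping

GET ecommercesite/_search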

Elasticsearch issue types removal

I am trying to run the code below in Python using Elasticsearch version 7.1; however, the following deprecation warnings come up:
ElasticsearchDeprecationWarning: [types removal] Using include_type_name in put mapping requests is deprecated. The parameter will be removed in the next major version.
client.indices.put_mapping(index=indexName,doc_type='diseases', body=diseaseMapping, include_type_name=True)
followed by:
ElasticsearchDeprecationWarning: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
client.index(index=indexName,doc_type=docType, body={"name": disease,"title":currentPage.title,"fulltext":currentPage.content})
How am I supposed to amend my code to make it (see here) work with Elasticsearch 7.x? Any kind of help would be much appreciated.
This is just a warning right now, but it will become an error in Elasticsearch 8.
Over the last few versions, Elasticsearch has been phasing out mapping types inside an index:
In ES5 - setting index.mapping.single_type: true on an index enables the single-type-per-index behavior, which is enforced in 6.0.
In ES6 - you can't have more than one mapping type inside an index.
In ES7 - the concept of types inside an index is deprecated.
In ES8 - types will be removed entirely; you won't be able to use them when querying or when indexing documents.
My suggestion would be to design your application and mappings in such a way that they don't use the type parameter at all; a rough typeless sketch follows below.
To understand why Elasticsearch removed mapping types, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html#_why_are_mapping_types_being_removed
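As a rough sketch, the typeless REST equivalents of the two deprecated calls look like this (the diseases index name and the mapping body are hypothetical; the field names are taken from the question):
# hypothetical index name; PUT {index}/_mapping replaces the typed put-mapping call
PUT diseases/_mapping
{
  "properties": {
    "name": { "type": "text" },
    "title": { "type": "text" },
    "fulltext": { "type": "text" }
  }
}

# typeless document indexing via the _doc endpoint
POST diseases/_doc
{
  "name": "some disease",
  "title": "Page title",
  "fulltext": "Page content"
}
In the Python client this amounts to dropping the doc_type and include_type_name arguments from the put_mapping and index calls.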
A common (and difficult to spot) cause of this error message is misspelling the endpoint, e.g.:
Misspelled:
/search
Correct:
/_search
Double-check that your endpoint is correct, as Elasticsearch may think you are trying to manipulate (add, update, remove) a document and are passing a type, which is not the case (you are trying to call an endpoint).

Elasticsearch Dynamic Field Mapping and JSON Dot Notation

I'm trying to write logs to an Elasticsearch index from a Kubernetes cluster. Fluent-bit is being used to read stdout and it enriches the logs with metadata including pod labels. A simplified example log object is
{
  "log": "This is a log message.",
  "kubernetes": {
    "labels": {
      "app": "application-1"
    }
  }
}
The problem is that a few other applications deployed to the cluster have labels of the following format:
{
  "log": "This is another log message.",
  "kubernetes": {
    "labels": {
      "app.kubernetes.io/name": "application-2"
    }
  }
}
These applications are installed via Helm charts and the newer ones are following the label and selector conventions as laid out here. The naming convention for labels and selectors was updated in Dec 2018, seen here, and not all charts have been updated to reflect this.
The end result of this is that depending on which type of label format makes it into an Elastic index first, trying to send the other type in will throw a mapping exception. If I create a new empty index and send in the namespaced label first, attempting to log the simple app label will throw this exception:
object mapping for [kubernetes.labels.app] tried to parse field [kubernetes.labels.app] as object, but found a concrete value
The opposite situation, posting the namespaced label second, results in this exception:
Could not dynamically add mapping for field [kubernetes.labels.app.kubernetes.io/name]. Existing mapping for [kubernetes.labels.app] must be of type object but found [text].
What I suspect is happening is that Elasticsearch sees the periods in the field name as JSON dot notation and is trying to flesh it out as an object. I was able to find this PR from 2015 which explicitly disallows periods in field names however it seems to have been reversed in 2016 with this PR. There is also this multi-year thread from 2015-2017 discussing this issue but I was unable to find anything recent involving the latest versions.
My current thought on moving forward is to standardize the Helm charts we are using so that all of the labels follow the same convention. This feels like a band-aid on the underlying issue, though; I suspect I'm missing something obvious in the configuration of Elasticsearch and dynamic field mappings.
Any help here would be appreciated.
I opted to use the Logstash mutate filter with the rename option as described here:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-rename
The end result looked something like this:
filter {
  mutate {
    rename => {
      '[kubernetes][labels][app]' => '[kubernetes][labels][app.kubernetes.io/name]'
      '[kubernetes][labels][chart]' => '[kubernetes][labels][helm.sh/chart]'
    }
  }
}
Although personally I've never encountered the exact same issue, I had similar problems when I indexed some test data and afterwards changed the structure of the document that should have been indexed (especially when "unflattening" data structures).
Your interpretation of the error message is correct. When you first index the document
{
  "log": "This is another log message.",
  "kubernetes": {
    "labels": {
      "app.kubernetes.io/name": "application-2"
    }
  }
}
Elasticsearch will, due to dynamic mapping, recognize app as an object/structure with further objects nested beneath it.
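Roughly speaking, the dynamically generated mapping then contains something like the following under kubernetes.labels (a sketch; the exact sub-fields depend on your dynamic mapping settings):
"labels": {
  "properties": {
    "app": {
      "properties": {
        "kubernetes": {
          "properties": {
            "io/name": {
              "type": "text",
              "fields": { "keyword": { "type": "keyword" } }
            }
          }
        }
      }
    }
  }
}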
When you then try to index the document
{
  "log": "This is a log message.",
  "kubernetes": {
    "labels": {
      "app": "application-1"
    }
  }
}
the previously (dynamically) created mapping defines the field app as an object with sub-fields, but Elasticsearch now encounters a concrete value, namely "application-1".
I suggest that you set up an index template to define the correct mappings. For the 'outdated' logging versions, I suggest pre-processing the affected documents, either through an Elasticsearch ingest pipeline or with e.g. Logstash, to get them into the correct format.
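One possible way to do that (a sketch, not the only option): if you are on Elasticsearch 7.3 or later, a legacy index template can map kubernetes.labels as a single flattened field, which sidesteps the object-vs-concrete-value conflict entirely. The template name and the logs-* index pattern below are made up:
# hypothetical template name and index pattern
PUT _template/k8s-logs
{
  "index_patterns": ["logs-*"],
  "mappings": {
    "properties": {
      "kubernetes": {
        "properties": {
          "labels": {
            "type": "flattened"
          }
        }
      }
    }
  }
}
With flattened, all label values are indexed as keywords under kubernetes.labels, regardless of whether the label keys contain dots.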
Hope that helps.

What are aliases in elasticsearch for?

I recently started working in a company that uses Elasticsearch. While most of its concepts are somewhat similar to relational databases and I am able to understand them, I still don't quite get the concept of aliases.
I did not find any such question here and the information provided on the Elasticsearch website did not help much either.
Can someone explain what aliases are for and ideally include an example of a situation where they are needed?
Aliases are like soft links or shortcuts to actual indices.
The advantage is being able to have an alias pointing to index1a while building or re-indexing index2b; the moment of swapping them is atomic thanks to the alias, which is what all code should point to.
Renaming an alias is a simple remove then add operation within the same API. This operation is atomic, no need to worry about a short period of time where the alias does not point to an index:
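A sketch of such an atomic swap (the alias name is made up; the index names are the ones from above):
POST /_aliases
{
  "actions": [
    { "remove": { "index": "index1a", "alias": "my-alias" } },
    { "add": { "index": "index2b", "alias": "my-alias" } }
  ]
}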
[EDIT] As pointed out by @wholevinski, aliases have other functionalities, like:
Multiple indices can be specified for an action ...
All the info is in the page you have linked.
[EDIT2] More on why the atomicity is needed/beneficial:
The key is "zero downtime": https://en.wikipedia.org/wiki/Zero_unscheduled_downtime or https://en.wikipedia.org/wiki/High_availability
https://www.elastic.co/guide/en/elasticsearch/guide/current/index-aliases.html
We will talk more about the other uses for aliases later in the book. For now we will explain how to use them to switch from an old index to a new index with zero downtime.
@arhak covered the topic pretty well.
One use case that (at least for me) made the value of aliases clear was the need to remove out-of-date documents, more specifically when using time-based indices.
For example, say you need to keep the logs of an application for at least one year. You decide to use time-based indices, meaning you write into indices with the following format: 2018-02-logs, 2018-03-logs, etc. In order to be able to search across every index, you create the following alias:
POST /_aliases
{
  "actions": [
    {
      "add": {
        "alias": "current-logs",
        "indices": [ "2018-02-logs", "2018-03-logs" ]
      }
    }
  ]
}
And query like:
GET /current-logs/_search
Another advantage is that you can delete the out-of-date values very easily:
POST /_aliases
{
  "actions": [
    { "remove": { "alias": "current-logs", "index": "2018-01-logs" } }
  ]
}
and then DELETE /2018-01-logs
Aliases are basically created to group a set of indices and make them accessible regardless of their names; an alias is a pointer to a set of indices. You can also apply a query/condition to all of these indices, which is very useful when running queries or building dashboards over the same group of indices all the time. In addition, if in the future you change the names of the indices behind an alias, the end users will not notice the change, since it is transparent to them and you only update the pointer.
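For example, a filtered alias (a sketch; the alias name and field are made up, and the index name is borrowed from the time-based example above) restricts every search that goes through the alias to matching documents:
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "2018-02-logs",
        "alias": "application-1-logs",
        "filter": { "term": { "app": "application-1" } }
      }
    }
  ]
}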

Getting all the indices while trying to get only those indices which are associated with a specific alias

I have recently upgraded an application from Elasticsearch 5.3.1 to 6.0.
My requirement is to get all the indices which are associated with a specific alias.
I used the snippet below to fetch all indices associated with a specific alias. This snippet works fine in 5.3.1 and returns only those indices which are associated with that specific alias.
GetAliasesResponse r = client.admin().indices()
    .getAliases(new GetAliasesRequest("givenalias")).actionGet();
But after upgrading to ES 6.0, the same snippet returns all the indices in the system.
Ideally it should only return those indices which are associated with the given alias, not the others. This was working in Elasticsearch 5.3.1.
TL;DR: It is an intended breaking change in the Java API of Elasticsearch (though it is not explicitly mentioned in the "Breaking changes in 6.0 » Java API changes" page).
The following is the story of discovering this fact. (Note: the original answer was heavily edited, hence comments might be out of date.)
Breaking changes in the REST API in 6.0
First I noticed that this part of the REST API changed in Elasticsearch 6.0. There are two reported breaking changes concerning aliases:
GET /_aliases,_mappings syntax was removed in favor of GET /_aliases and GET /_mappings
Indices aliases api resolves indices expressions only against indices
However, nothing is mentioned there concerning the OP's case.
From what I have seen doing queries, this query works in Elasticsearch 5:
GET /alias1/_aliases
And does not work in Elasticsearch 6, giving the following error:
{
  "error": "Incorrect HTTP method for uri [/alias1/_aliases] and method [GET], allowed: [PUT]",
  "status": 405
}
Interestingly, GET /alias1/_alias works in both versions and returns the same result.
Moreover, I didn't manage to find an example of GET /alias1/_aliases in the documentation of either 5.6 or 6.0.
Reproducing the bug
After having realised that OP is actually using the Java API, I managed to reproduce the exact same behavior.
The following code:
GetAliasesResponse alias1 = client.admin().indices()
.getAliases(new GetAliasesRequest("alias1")).actionGet();
In ES 5 produces this in the IntelliJ debugger:
And for ES 6 I've got the following:
As you can see, there are extra keys in the second output, which have empty values.
Diving into the source code
A quick search over the Elasticsearch codebase gave me the final explanation. In ES 5 there was a test, testIndicesGetAliases, which checked that the list of indices returned for a test alias has exactly one element (IndexAliasesIT.java#L554):
logger.info("--> getting alias1");
GetAliasesResponse getResponse = admin().indices().prepareGetAliases("alias1").get();
assertThat(getResponse, notNullValue());
assertThat(getResponse.getAliases().size(), equalTo(1));
And in 6.0 it checks that the size is 5! (IndexAliasesIT.java#L573)
logger.info("--> getting alias1");
GetAliasesResponse getResponse = admin().indices().prepareGetAliases("alias1").get();
assertThat(getResponse, notNullValue());
assertThat(getResponse.getAliases().size(), equalTo(5));
This change was introduced in this commit, which is related to these issues:
Remove comma-separated feature parsing for GetIndicesAction #24723
_alias API no longer accepts index wildcards #25090
This is actually interesting, because one of the reported REST API breaking changes that we have seen above also broke the compatibility of some Java API calls.
What you can do
In the short term, you just need to filter out the keys with empty values.
In the longer term, I think it makes sense to migrate to the Java High Level REST Client, since Elastic plans to deprecate the TransportClient in version 7.0:
We plan on deprecating the TransportClient in Elasticsearch 7.0 and
removing it completely in 8.0. Instead, you should be using the Java
High Level REST Client, which executes HTTP requests rather than
serialized Java requests.
In general, Elasticsearch breaks compatibility quite often, so it's better to stay away from its dark corners, like the Java API.
Thanks for reading.
Hope that helps!

Elasticsearch NEST: ordering terms aggregation with multiple criteria

Using NEST, I need to be able to order a terms aggregation with multiple criteria (requires ElasticSearch 1.5 or later). For example:
"order": [{"avg_rank": "desc"}, {"avg_score": "desc"}]
This is working great using the raw JSON that I created to verify I was getting the expected behavior. Now, in trying to translate that over to code using the NEST library, I'm not seeing how that would be accomplished.
The OrderDescending() method has only one implementation, which takes a string for the key. I need a C# "params"-style method that can take a list of OrderDescending() and/or OrderAscending() elements.
Is there a way to do this in NEST that I'm overlooking?
Is there a way in NEST to work around this where I can inject a little raw JSON where I need it?
FWIW, I've been using the "fluent" style to create my queries.
EDIT:
I see that, using "object initializer" syntax, I could manually create the dictionary and add my criteria elements. The problem is, I have large amounts of code written in "fluent" syntax. So,
Is there a way to use an "object initializer" object and convert it to a "fluent" descriptor? In this case, a TermsAggregator to a TermsAggregationDescriptor?
EDIT 2:
I should have mentioned originally that I had already tried .OrderDescending("avg_rank").OrderDescending("avg_score"). That simply takes the last one in the chain. Looking at the code, I can see why: each call to OrderDescending blindly news up the dictionary instead of checking whether one already exists and adding the new key to it.
Based on this, I believe this is a bug for which I have entered a report here:
OrderDescending and OrderAscending cannot be chained for multi-criteria ordering
EDIT 3:
I appreciate all the answers (some of which are getting deleted) because they're helping drive this along and are responsible for these edits. I should also have mentioned originally that I discovered that:
"order": { "avg_rank": "desc", "avg_score": "desc" }
does not work. I don't know exactly why, but ES will only use the last one in that case. It has to be a list of dictionaries, as shown in my example at the top. I've verified that this correctly sub-orders the aggregation on the second element. So, the underlying object cannot be typed as a simple dictionary. I've also added this information to the bug report I created (as noted in EDIT 2).
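For reference, the raw JSON request that does work looks roughly like this (a sketch; the index, terms field, and avg fields are made up, while the avg_rank/avg_score order criteria are the ones from above):
GET myindex/_search
{
  "size": 0,
  "aggs": {
    "by_category": {
      "terms": {
        "field": "category",
        "order": [
          { "avg_rank": "desc" },
          { "avg_score": "desc" }
        ]
      },
      "aggs": {
        "avg_rank": { "avg": { "field": "rank" } },
        "avg_score": { "avg": { "field": "score" } }
      }
    }
  }
}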
If you're using the fluent syntax you can just chain the sorts together.
Sample:
var esClient = ninjectKernel.Get<IElasticClient>();
var query = esClient.Search<RedemptionES>(s=> s
.SortAscending(a=>a.Date)
.SortDescending(d=>d.Input.User.Name)
);
Response:
{
  "sort": [
    {
      "#timestamp": {
        "order": "asc"
      }
    },
    {
      "input.user.name": {
        "order": "desc"
      }
    }
  ]
}
Martijn Laarman of the NEST team was very responsive and kind enough to provide a workaround for the bug I reported in EDIT 2 above. The fix can be found in the comments of that same bug report: Work around for NEST library multi-criteria aggregation ordering.
Note that he provided workarounds for both the object initializer and fluent syntaxes (the latter being the one I needed).
