Elasticsearch faking index per user - how are routing values inferred when updating?

I am using the "fake index per user" pattern (filtered, routed aliases) as suggested by the docs. ES version 1.6.0 sometimes fails to behave as expected.
Checking the alias:
curl localhost:9200/testbig/_alias/<userId>
{"<indexname>":{"aliases":{"<userId>":{"filter":{"term":
{"userId":"<userId>"}},"index_routing":"<userId>","search_routing":"<userId>"}}
}}
But trying to update a document:
curl -XPOST localhost:9200/<userId>/<type>/<id>/_update -d
'{"doc":{"userId":"<userId>","field1":"val1"}}'
I get
{ "error": "ElasticsearchIllegalArgumentException[Alias [<userId>] has
index routing associated with it [<userId>], and was provided with
routing value [<DIFFERENTuserId>], rejecting operation]",
"status": 400 }

In case anyone else hits a similar issue, here is what causes it:
If you start by using actual separate indexes for each user, it's OK to have records with the same id, i.e. paths like
localhost:9200/userid1/type/id1
localhost:9200/userid2/type/id1
but when the userids are just aliases, these correspond, of course, to the same document. Hence the routing clash on subsequent updates.
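The collision above can be sketched in a few lines of Python. This is purely an illustration of the resolution logic, not Elasticsearch source; the function name resolve_write and the data layout are hypothetical:

```python
# Illustration: why two "per-user" aliases over ONE concrete index collide
# on the same document id, while two real per-user indices would not.

def resolve_write(aliases, alias, doc_type, doc_id):
    """Map a write against an alias to the concrete (index, type, id) key."""
    concrete_index, routing = aliases[alias]
    # A document's identity in the cluster is (index, type, id); the routing
    # value only selects the shard, it is not part of the identity.
    return (concrete_index, doc_type, doc_id), routing

# Both user aliases are filtered/routed views over the SAME backing index.
aliases = {
    "userid1": ("testbig", "userid1"),
    "userid2": ("testbig", "userid2"),
}

key1, routing1 = resolve_write(aliases, "userid1", "type", "id1")
key2, routing2 = resolve_write(aliases, "userid2", "type", "id1")
assert key1 == key2          # same document -> routing clash on update
assert routing1 != routing2  # but each alias forces a different routing
```

With real indices per user, `concrete_index` would differ and the two keys would be distinct, which is why the clash only appears after switching to aliases.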

Related

Elasticsearch 7: mapping types

I came across the following phrase and I am under the impression that a valid 6.x query with a type might give an error. I am using an ES 7.10 cluster.
Note that in 7.0, _doc is a permanent part of the path, and represents
the endpoint name rather than the document type.
But, to my surprise, I am able to run the following query. Does that mean _doc is NOT a permanent part of the path? Specifically, what kinds of queries do I need to modify when moving from 6.x to 7.x?
PUT ecommercesite/product/1
{
"product_name": "Men High Performance Fleece Jacket",
"description": "Best Value. All season fleece jacket",
"unit_price": 79.99,
"reviews": 250,
"release_date": "2016-08-16"
}
Only the following 6.x query fails on 7.10, with an error about the type:
GET ecommercesite/product/_mapping
The PUT request currently (end of 2020) just throws a deprecation warning but will fail in 8.x.
For now, you could start replacing product with _doc:
PUT ecommercesite/product/1 --> PUT ecommercesite/_doc/1
GET ecommercesite/product/_mapping --> GET ecommercesite/_doc/_mapping?include_type_name
but it'd be best to ditch the types completely and adhere to the standards:
important: instead of PUT ecommercesite/product/1, either keep using PUT ecommercesite/_doc/1 or use PUT ecommercesite/_create/1 (docs here)
GET ecommercesite/_mapping (docs here)
no significant changes in GET ecommercesite/_search
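The path rewrites above are mechanical, so they can be expressed as a small helper. A minimal sketch, not an official migration tool; it only covers the two shapes shown (document PUT and mapping GET):

```python
# Rewrite typed 6.x request paths to their typeless 7.x equivalents.

def to_typeless(path):
    """'index/{type}/{id}' -> 'index/_doc/{id}';
    'index/{type}/_mapping' -> 'index/_mapping'."""
    parts = path.strip("/").split("/")
    if len(parts) == 3 and parts[2] == "_mapping":
        return f"{parts[0]}/_mapping"           # GET index/type/_mapping
    if len(parts) == 3:
        return f"{parts[0]}/_doc/{parts[2]}"    # PUT index/type/id
    return path                                  # e.g. index/_search: unchanged

assert to_typeless("ecommercesite/product/1") == "ecommercesite/_doc/1"
assert to_typeless("ecommercesite/product/_mapping") == "ecommercesite/_mapping"
assert to_typeless("ecommercesite/_search") == "ecommercesite/_search"
```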

Elasticsearch issue types removal

I am trying to run the below code in Python using Elasticsearch Ver 7.1, however the following deprecation warnings come up:
ElasticsearchDeprecationWarning: [types removal] Using include_type_name in put mapping requests is deprecated. The parameter will be removed in the next major version.
client.indices.put_mapping(index=indexName,doc_type='diseases', body=diseaseMapping, include_type_name=True)
followed by:
ElasticsearchDeprecationWarning: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
client.index(index=indexName,doc_type=docType, body={"name": disease,"title":currentPage.title,"fulltext":currentPage.content})
How am I supposed to amend my code (see here) to make it work in line with Elasticsearch 7.x? Any help would be much appreciated.
This is just a warning right now, but it will become an error in Elasticsearch 8.
Over the last few versions, Elasticsearch has been phasing out mapping types inside an index:
ES5 - Setting index.mapping.single_type: true on an index enables the single-type-per-index behaviour that is enforced from 6.0 onwards.
ES6 - you can't have more than one mapping type inside an index.
ES7 - the concept of types inside an index is deprecated.
ES8 - types will be removed; you won't be able to use them in queries or when indexing documents.
My suggestion would be to design your application and mappings so that they don't include the type parameter at all.
The reasoning behind the removal is explained here: https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html#_why_are_mapping_types_being_removed
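Concretely, the amended calls just drop doc_type and include_type_name and let the typeless endpoints do the work. A minimal sketch, with a hypothetical stub standing in for the real elasticsearch.Elasticsearch client so the example runs without a cluster:

```python
# Stub mimicking the two elasticsearch-py 7.x calls used in the question,
# so the amended (typeless) call shapes can be shown without a live cluster.

class StubIndices:
    def put_mapping(self, index, body):
        # 7.x style: no doc_type=..., no include_type_name=True
        return {"acknowledged": True, "index": index}

class StubClient:
    indices = StubIndices()

    def index(self, index, body):
        # 7.x style: no doc_type=...; the document goes to /{index}/_doc
        return {"_index": index, "result": "created"}

client = StubClient()
disease_mapping = {"properties": {"name": {"type": "text"}}}

ack = client.indices.put_mapping(index="diseases", body=disease_mapping)
doc = client.index(index="diseases", body={"name": "flu", "title": "Flu"})

assert ack["acknowledged"]
assert doc["result"] == "created"
```

With the real client the call shapes are the same: keep index= and body=, and simply delete the doc_type and include_type_name arguments.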
A common issue (and difficult to spot) for this error message could be misspelling the endpoint, e.g.:
Misspelled:
/search
Correct:
/_search
Double-check that your endpoint is correct: Elasticsearch may think you are trying to manipulate (add, update, remove) a document and that you are supplying a type, when in fact you are trying to call an endpoint.

Delete Document with empty document id - elasticsearch TCP client

I am using the transport (TCP) client to perform a delete operation.
sample code:
DeleteRequestBuilder builder = client.prepareDelete(indexName, indexType,indexDocumentId);
ListenableActionFuture<DeleteResponse> deleteResponse = builder.setOperationThreaded(false).execute();
deleteResponse.actionGet(ESTemplateHelper.INDEX_STATE_ACK_TIMEOUT);
deleteStatus = deleteResponse.isDone();
I am passing an empty value ("") for indexDocumentId.
deleteStatus is always true for the empty documentId, but no document is deleted. Am I missing something? Isn't it expected to throw an error?
The prepareDelete command is for deleting a single document by its ID. For more information: https://www.elastic.co/guide/en/elasticsearch/client/java-api/1.7/delete.html
Now, the ID of a document cannot be an empty string, so there is no such document. deleteStatus is true because it indicates whether the request completed, not whether the document was deleted. If you drill down into the response, I believe you will find: found = false.
In case you are passing an empty string in the hope of deleting all documents of type indexType in the index indexName, prepareDelete is not the right API for that.
Maybe you can execute a query over all documents in your type and delete them one by one. There is also the delete-by-query API, but it was deprecated in 1.5 and removed in 2.0 because it can potentially cause OOM errors. More details here: https://www.elastic.co/guide/en/elasticsearch/reference/1.6/docs-delete-by-query.html
If you don't care about this index at all, deleting the whole index is the quickest and cleanest way to go: https://www.elastic.co/guide/en/elasticsearch/reference/1.7/indices-delete-index.html. I believe you can delete your type in a similar fashion.
Eg: curl -XDELETE http://localhost:9200/indexName/indexType

How do I add an attribute to an Elasticsearch node for the purpose of Shard Allocation Filtering?

I'm attempting to follow the reference guide to make sure certain indexes end up on certain machines. I'm attempting to give 2 of my nodes an attribute named "storage_type", where one node gets "long_term" and one gets "short_term".
I understand that I need to add the attribute of "storage_type" to each of the nodes, and then set each index to have {"index.routing.allocation.require.tag" : "short"} or {"index.routing.allocation.require.tag" : "long"} respectively.
I've attempted to add these settings via curl calls, like most ES things, but it does not appear that I can PUT node settings, i.e.:
curl -XPUT localhost:9200/_nodes/my_node_name/_settings -d '{"storage_term" : "short_term"}'
So how do I add attributes such as "storage_type" to nodes? Is it a config file? A command-line argument? An HTTP call that I'm missing?
Since version 5.0, node attributes are set via the node.attr. prefix in elasticsearch.yml:
node.attr.storage_term: short_term
See Shard Allocation Filtering section of the official reference.
It's not done through curl calls; you need to use elasticsearch.yml.
in elasticsearch.yml:
node.storage_term: short_term
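Shard allocation filtering has two halves: the node-side attribute lives in elasticsearch.yml (as above), while the index-side requirement is a settings body you PUT to /<index>/_settings. A sketch of building that body (allocation_filter is a hypothetical helper; the attribute name follows the question's "storage_term"):

```python
# Build the index-settings body for routing.allocation filtering.

def allocation_filter(attr, value, rule="require"):
    """rule is one of 'require', 'include', 'exclude'."""
    return {f"index.routing.allocation.{rule}.{attr}": value}

body = allocation_filter("storage_term", "short_term")
assert body == {"index.routing.allocation.require.storage_term": "short_term"}

# The 'exclude' rule keys differ only in the rule segment:
assert "exclude" in next(iter(allocation_filter("storage_term", "x", "exclude")))
```

You would then send that dict as the JSON body of PUT /<index>/_settings; the index only allocates shards to nodes whose attribute matches.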

Carrot2+ElasticSearch Basic Flow of Information

I am using Carrot2 and Elasticsearch. I had an Elasticsearch server running with a lot of data when I installed the carrot2 plugin.
Wanted to get answers to a few basic questions:
Will clustering work only on newly indexed documents or even old documents?
How can I specify which fields to look at for clustering?
The curl command is working and giving some results. How can I issue the same request, which takes JSON as input, from my code against a REST API URL of the form localhost:9200/article-index/article/_search_with_clusters?.....
Appreciate any help.
Yes, if you want to use the plugin straight off the ES installation, you need to make REST calls of your own. I believe you are using Python. Take a look at requests. It is a delightful REST tool for Python.
To make POST requests you can do the following :
import json
import requests  # pip install requests

# requests needs the scheme on the URL
url = 'http://localhost:9200/article-index/article/_search_with_clusters'
payload = {'some': 'data'}
r = requests.post(url, data=json.dumps(payload))
print(r.text)
Find more information at requests documentation.
Will clustering work only on newly indexed documents or even old
documents?
It will work even on old documents.
How can I specify which fields to look at for clustering?
Here's an example using the Shakespeare dataset. The query is: which of Shakespeare's plays are about war?
$ curl -XPOST http://localhost:9200/shakespeare/_search_with_clusters?pretty -d '
{
  "search_request": {
    "query": {"match": {"_all": "war"}},
    "size": 100
  },
  "max_hits": 0,
  "query_hint": "war",
  "field_mapping": {
    "title": ["_source.play_name"],
    "content": ["_source.text_entry"]
  },
  "algorithm": "lingo"
}'
Running this, you'll get back plays like Richard, Henry... The title field is what carrot2 uses to derive the cluster names, and the text entry is what it uses to form the clusters.
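The same request body can be built in Python instead of a raw curl string. A minimal sketch; clustering_request is a hypothetical helper, and the field names mirror the Shakespeare example:

```python
import json

def clustering_request(query_text, title_field, content_field, size=100):
    """Build the JSON body for a _search_with_clusters request."""
    return {
        "search_request": {
            "query": {"match": {"_all": query_text}},
            "size": size,
        },
        "max_hits": 0,
        "query_hint": query_text,
        "field_mapping": {
            "title": [f"_source.{title_field}"],
            "content": [f"_source.{content_field}"],
        },
        "algorithm": "lingo",
    }

body = clustering_request("war", "play_name", "text_entry")
# The body round-trips through JSON, so it can be POSTed as-is.
assert json.loads(json.dumps(body))["query_hint"] == "war"
assert body["field_mapping"]["title"] == ["_source.play_name"]
```

You would POST this body (serialized with json.dumps) to /shakespeare/_search_with_clusters, e.g. with the requests snippet shown earlier in this thread.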
The curl command is working and giving some results. How can I issue the same request, which takes JSON as input, against a REST API URL of the form
localhost:9200/article-index/article/_search_with_clusters?.....
Typically, you would use the Elasticsearch client libraries for your language of choice.
