Elasticsearch 2.x index mapping _id

I ran ElasticSearch 1.x (happily) for over a year. Now it's time to upgrade to 2.1.x. The nodes should be turned off and then (one by one) turned on again. Seems easy enough.
But then I ran into trouble. The major problem is the field _uid, which I set myself so that I could derive the exact location of a document from a random other one (by hashing a value). This way I knew that only that exact document would be returned. During the upgrade I got
MapperParsingException[Field [_uid] is a metadata field and cannot be added inside a document. Use the index API request parameters.]
But when I try to map my former _uid to _id (which should also be good enough) I get something similar.
The reason I used the _uid parameter is that the lookup time is much lower than that of a termsQuery (or the like).
How can I still use the _uid or _id field in each document for fast (and exact) lookup of certain documents? Note that I have to fetch thousands of exact documents at a time, so I need an ID-like query. Also, the _uid or _id of a document may not exist (in that case I want, as now, a 'false-like' result).
Note: The upgrade from 1.x to 2.x is pretty big (Filters gone, no dots in names, no default access to _xxx)
Update (no avail):
Updating the mapping of _uid or _id using:
final XContentBuilder mappingBuilder = XContentFactory.jsonBuilder()
        .startObject().startObject(type).startObject("_id")
        .field("enabled", "true").field("default", "xxxx")
        .endObject().endObject().endObject();
CLIENT.admin().indices().prepareCreate(index).addMapping(type, mappingBuilder)
        .setSettings(Settings.settingsBuilder()
                .put("number_of_shards", nShards)
                .put("number_of_replicas", nReplicas))
        .execute().actionGet();
results in:
MapperParsingException[Failed to parse mapping [XXXX]: _id is not configurable]; nested: MapperParsingException[_id is not configurable];
Update: Changed the name into _id instead of _uid, since the latter is built out of _type#_id. So then I'd need to be able to write to _id.

Since there appears to be no way around setting the _uid and _id fields directly, I'll post my solution. I mapped all documents that had a _uid to uid (for internal referencing). At some point it came to me: you can set the relevant ID yourself.
To bulk insert documents with IDs you can:
final BulkRequestBuilder builder = client.prepareBulk();
for (final Doc doc : docs) {
    builder.add(client.prepareIndex(index, type, doc.getId()).setSource(doc.toJson()));
}
final BulkResponse bulkResponse = builder.execute().actionGet();
Notice the third argument; it may be null (or you can use the two-argument variant, in which case the ID will be generated by ES).
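For illustration, a minimal sketch of that two-argument variant (ES generates the IDs), plus the failure check that is easy to forget; the name autoIdBuilder is mine:
// Two-argument prepareIndex: no explicit ID, so ES generates one per document.
final BulkRequestBuilder autoIdBuilder = client.prepareBulk();
for (final Doc doc : docs) {
    autoIdBuilder.add(client.prepareIndex(index, type).setSource(doc.toJson()));
}
final BulkResponse response = autoIdBuilder.execute().actionGet();
if (response.hasFailures()) {
    // inspect or log response.buildFailureMessage()
}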
To then get some documents by id you can:
final List<String> uids = getUidsFromSomeMethod(); // IDs of the documents to get
final MultiGetRequestBuilder builder = CLIENT.prepareMultiGet();
builder.add(index_name, type, uids);
final MultiGetResponse multiResponse = builder.execute().actionGet();
// in this case I simply want to know whether the doc exists
if (only_want_to_know_whether_it_exists) {
    for (final MultiGetItemResponse response : multiResponse.getResponses()) {
        final boolean exists = response.getResponse().isExists();
        exist.add(exists);
    }
} else {
    // retrieve each doc as JSON; note the source comes from the response item, not the builder
    for (final MultiGetItemResponse response : multiResponse.getResponses()) {
        final String json = response.getResponse().getSourceAsString();
        // handle JSON
    }
}
If you only want 1:
client.prepareGet().setIndex(index).setType(type).setId(id);
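A minimal sketch (same 2.x client API) of executing that request and checking whether the document exists; variable names are illustrative:
final GetResponse getResponse = client.prepareGet()
        .setIndex(index).setType(type).setId(id)
        .execute().actionGet();
if (getResponse.isExists()) {
    final String json = getResponse.getSourceAsString(); // handle JSON
}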
Doing the single lookup using curl follows mapping-id-field in the docs (note: exact copy):
# Example documents
PUT my_index/my_type/1
{
  "text": "Document with ID 1"
}

PUT my_index/my_type/2
{
  "text": "Document with ID 2"
}

GET my_index/_search
{
  "query": {
    "terms": {
      "_id": [ "1", "2" ]
    }
  },
  "script_fields": {
    "UID": {
      "script": "doc['_id']"
    }
  }
}
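As an aside, the same lookup can also be written with the ids query, which exists in 2.x as well; a minimal sketch (the type clause is optional):
GET my_index/_search
{
  "query": {
    "ids": {
      "type": "my_type",
      "values": [ "1", "2" ]
    }
  }
}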

Related

Setting a hardcoded value on an Elastic document with Painless

I'm trying to learn Painless so that I could use it while trying to enrich and manipulate incoming documents. However, every way I've seen for accessing the document just results in errors.
Having input this in the Painless Lab in Kibana, these are the errors I'm getting:
def paths = new String[3];
paths[0]= '.com';
paths[1] = 'bar.com';
paths[2] = 'foo.bar.com';
doc['my_field'] = paths; // does not work: '[Ljava.lang.String; cannot be cast to org.elasticsearch.index.fielddata.ScriptDocValues'
ctx.my_field = paths; // does not compile: 'cannot resolve symbol [ctx.my_field]'
return doc['my_field'] == 'field_value'; // does not work: 'No field found for [my_field] in mapping'
doc['my_field'] == 'field_value' complains despite the field being present in the test document, though doc.containsKey('my_field') does return false.
How should I actually be accessing and manipulating the incoming document? I'm using ElasticSearch 7.12.
You can create an ingest pipeline with a set processor to add a hardcoded value to incoming documents.
{
  "description" : "sets the value of count to 1",
  "set": {
    "field": "count",
    "value": 1
  }
}
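That snippet is only the processor itself; for context, a minimal sketch of a complete pipeline definition and its use at index time (the pipeline name set-count and the index name are hypothetical):
PUT _ingest/pipeline/set-count
{
  "description": "sets the value of count to 1",
  "processors": [
    {
      "set": {
        "field": "count",
        "value": 1
      }
    }
  ]
}

PUT my-index/_doc/1?pipeline=set-count
{
  "message": "some incoming document"
}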
There are very specific contexts available for the Painless API. You are using String[], which may be causing the issue, so you need to use either an array or an ArrayList. You can check an example for the Painless Lab here.
Below is the script I tried in the Painless Lab, and it works as expected:
def ctx = params.ctx;
ArrayList paths = new ArrayList();
paths.add('.com');
paths.add('bar.com');
paths.add('foo.bar.com');
ctx['my_field'] = paths;
return ctx
Add the below in the Parameters tab; I missed adding this in the answer initially. It is required because in the actual implementation you will get the value from the context and update the context.
{
  "ctx": {
    "my_field": ["test"]
  }
}
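To connect this back to enriching incoming documents: in an ingest pipeline, the script processor provides the same ctx for you, so the equivalent would look roughly like this sketch (the pipeline name set-paths is hypothetical):
PUT _ingest/pipeline/set-paths
{
  "processors": [
    {
      "script": {
        "source": "ctx.my_field = ['.com', 'bar.com', 'foo.bar.com'];"
      }
    }
  ]
}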

Format reading ElasticSearch dates

This is my mapping for one of the properties in my ElasticSearch model:
"timestamp":{
"type":"date",
"format":"dd-MM-yyyy||yyyy-MM-dd'T'HH:mm:ss.SSSZ||epoch_millis"
}
I'm not sure if I'm misunderstanding the documentation. It clearly says:
The first format will also act as the one that converts back from milliseconds to a string representation.
And that is exactly what I want. I would like to be able to read directly (if possible) the dates as dd-MM-yyyy.
Unfortunately, when I go to the document itself (so, accessing to the ElasticSearch's endpoint directly, not via the application layer) I still get:
"timestamp" : "2014-01-13T15:48:25.000Z",
What am I missing here?
As @Val mentioned, you'd get the value back in the format in which it was indexed.
However, if you want to view the date in a particular format regardless of the format in which it was indexed, you can make use of Script Fields. Note that they are applied at query time.
The query below would be your solution.
POST <your_index_name>/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "timestamp": {
      "script": {
        "inline": "def sf = new SimpleDateFormat(\"dd-MM-yyyy\"); def dt = new Date(doc['timestamp'].value); def mydate = sf.format(dt); return mydate;"
      }
    }
  }
}
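As a side note for readers on more recent versions (6.4+): you can avoid scripting entirely by asking for a formatted docvalue field; a minimal sketch:
POST <your_index_name>/_search
{
  "query": {
    "match_all": {}
  },
  "docvalue_fields": [
    { "field": "timestamp", "format": "dd-MM-yyyy" }
  ]
}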
Let me know how it goes.

Converting stringified float to float in Elasticsearch

I have a mapping in an Elasticsearch index with a certain string field called duration. However, duration is actually a float, but it's passed in as a string from my provisioning chain, so it will always look something like this: "0.12". So now I'd like to create a new index with a new mapping, where the duration field is a float. Here's what I've done, which isn't working at all, either for old entries or for new incoming ones.
First, I create my new index with my new mapping by doing the following:
PUT new_index
{
  "mappings": {
    "new_mapping": {
      "properties": {
        "duration": { "type": "float" },
        ...
      }
    }
  }
}
I then check that the new mapping is really in place using:
GET new_index/_mapping
I then copy the contents of the old index into the new one:
POST _reindex
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}
However, when I look at the entries in new_index, be it the ones I've added with that last POST or the new ones that have come in since through my provisioning chain, the duration entry is still a string, even when its _type is new_mapping.
What am I doing wrong here? Or is there simply no way to convert a string to a float within Elasticsearch?
The duration field in the new index will be indexed as float (as per your mapping). However, if the duration field in the source document is still a string, it will stay a string in the _source, but it will still be indexed as float.
You can do a range query "from 1.00 to 3.00" on the new index and compare with what you get in the old index. Since the old index will run a lexical range (because of the string type) you might get results with a duration of 22.3, while in the new index you'll only get durations that are really between 1.00 and 3.00.
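If you also want the _source itself to hold a real float, one option is to reindex with a conversion script. A minimal sketch, assuming duration is always parseable (older versions use "inline" instead of "source" for the script key):
POST _reindex
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  },
  "script": {
    "source": "ctx._source.duration = Float.parseFloat(ctx._source.duration)"
  }
}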

Separate indices or use type field in Elasticsearch

I'm developing an Elasticsearch service and we have multiple sources, like our support ticket portal and a forum. Currently, I'm segregating each source into its own index, as each will have a child type. The ticket portal will of course search tickets (with nested replies) but also users and such, so there are multiple types under the portal index. Simple stuff so far.
However, I'm starting to think of merging the indices and prefixing the type (portalTicket, portalUser, forumThread, forumUser, etc.), since I want to search across both sources, but maybe there is a way to query them separately and bring it all back together. I'm just working with tickets and threads at the moment to start small; here are the two simple mappings I'm using thus far:
{
  ticket : {
    properties : {
      replies : {
        type : 'nested'
      }
    }
  }
}
{
  thread : {
    properties : {
      posts : {
        type : 'nested'
      }
    }
  }
}
I wanted to show that to make clear I'm using nested objects with different names. I could of course use the same names, but there will also be other metadata attached to the ticket and thread mappings that will be nested types too, and that takes me to the issue. When I search without specifying the index, I get issues with some documents not having the nested type, as expected: the thread mapping doesn't have a replies property, it has posts. I can get around it using an indices filter like so:
{
  filter : {
    indices : {
      index : 'portal',
      no_match_query : 'none',
      query : {
        bool : {
          should : [
            {
              match : {
                title : 'help'
              }
            },
            {
              nested : {
                path : 'replies',
                query : {
                  match : {
                    'replies.text' : 'help'
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}
OK, that works for the portal index, but extending it to include the forum index makes me feel like I'm fighting Elasticsearch rather than using it properly.
So should I keep them in separate indices and build a filter that returns both indices' results, or should I merge them into a single index, use a field to hold the source, and likely normalize the nested properties? Or is there a way to work with multiple indices in a faceted way? (I know, aggregations in ES 2)
Reading these two posts (thanks to the commenters for pointing these out):
Elastic search, multiple indexes vs one index and types for different data sets?
https://www.elastic.co/blog/index-vs-type
I have decided that my data is too different, and the number of documents I anticipate (plus future additions) means that I should go with different indices.
Now to learn how to search across the different indices, but this post was more about which strategy to use, so I'm going to open a new question for that.
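For readers landing here: the simplest way to search across the resulting indices is to name them both in one request. A minimal sketch, assuming indices named portal and forum:
GET portal,forum/_search
{
  "query": {
    "match": {
      "title": "help"
    }
  }
}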

Elasticsearch document id type integer vs string: Is there any performance difference?

I am using Elasticsearch 2.3.1. Currently all the document IDs are integers, but I have a situation where a document ID can be a numeric value or sometimes an alphanumeric string, so I need to make the field type 'string'.
So I need to know whether there is any performance difference based on the type of the ID. Please help.
Elasticsearch will store the id as a String even if your mapping says otherwise:
"mappings": {
"properties": {
"id": {
"type": "integer"
},
That is my mapping, but when I do a sort on _id I get documents ordered as:
10489, 10499, 105, 10514...
i.e. in String order.
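A common workaround (a sketch, with a hypothetical field name id_numeric) is to also index the numeric value as a regular field and sort on that instead of _id:
GET my_index/_search
{
  "sort": [
    { "id_numeric": { "order": "asc" } }
  ]
}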
The latest version of ES (7.14) mandates that the document's _id be a String. You can see this in the documentation for org.elasticsearch.action.index.IndexRequest: IndexRequest only allows the _id to be set as a String; no other types are supported. Example usage of IndexRequest can be found here: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-document-index.html
In case the above link stops working later, here is the snippet from the link:
IndexRequest request = new IndexRequest("posts");
request.id("1"); // This is the only method available to set the document's _id.
String jsonString = "{" +
        "\"user\":\"kimchy\"," +
        "\"postDate\":\"2013-01-30\"," +
        "\"message\":\"trying out Elasticsearch\"" +
        "}";
request.source(jsonString, XContentType.JSON);
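For completeness, a minimal sketch of executing that request with the 7.x high-level REST client (assumes an existing RestHighLevelClient named client):
IndexResponse indexResponse = client.index(request, RequestOptions.DEFAULT);
String returnedId = indexResponse.getId(); // the _id always comes back as a String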
