Change _type of a document in Elasticsearch

I have two types in my Elasticsearch index, both with the same mapping. I use one for active documents and the other for archived ones.
Now I want to archive a document, i.e. change its _type from active to archived. Both types live in the same index, so I cannot simply reindex from one index to another.
Is there a way to do this in Elasticsearch 5.0?

Changing the type is tricky: you would have to delete the document and then index it again under the new type.
Why not instead have a field in your document indicating "activeness"? Then you can use a bool query to filter on it:
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status": "active" } }
      ],
      "must": { /* your query object here */ }
    }
  }
}
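With such a status field, archiving becomes a partial update instead of a delete-and-reindex. A minimal sketch using the Update API (index name myindex, type active, and document id 1 are illustrative):
POST myindex/active/1/_update
{
  "doc": { "status": "archived" }
}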

Agree with having a field which indicates the activeness of the document.
Alternatively:
Use two different indices for "active" and "inactive" documents.
Use aliases which map to these indices.
Aliases give you the flexibility to change your indices without downtime.
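For the two-indices approach, the aliases could be set up roughly like this (index and alias names are illustrative):
POST _aliases
{
  "actions": [
    { "add": { "index": "docs_active_v1", "alias": "active" } },
    { "add": { "index": "docs_inactive_v1", "alias": "inactive" } }
  ]
}
Archiving a document then means indexing it into the inactive index and deleting the original, but search clients only ever see the stable alias names.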

Related

What is data structure used for Elasticsearch flattened type

I was trying to find out how the flattened type in Elasticsearch works under the hood. The documentation specifies that all leaf values are indexed into a single field as keywords, so there is a dedicated index for all those flattened keywords.
From documentation:
By default, Elasticsearch indexes all data in every field and each indexed field has a dedicated, optimized data structure. For example, text fields are stored in inverted indices, and numeric and geo fields are stored in BKD trees.
The specific case that I am trying to understand:
If I have a flattened field and index an object with nested objects, it is possible to query a specific nested key within the flattened object. See how labels.release is queried:
PUT bug_reports
{
  "mappings": {
    "properties": {
      "labels": {
        "type": "flattened"
      }
    }
  }
}

POST bug_reports/_doc/1
{
  "labels": {
    "priority": "urgent",
    "release": ["v1.2.5", "v1.3.0"]
  }
}

POST bug_reports/_search
{
  "query": {
    "term": { "labels.release": "v1.3.0" }
  }
}
Would a flattened field have the same index structure as a keyword field, and how is it able to reference a specific child key of the flattened object?
The initial design and implementation of the flattened field type are described in this issue. The leaf keys are indexed along with the leaf values, which is how the search for a specific sub-field is possible.
There are ongoing improvements to the flattened field type, and Elastic would also like to support numeric values, but that is not yet released.

Querying ElasticSearch document based on particular value without knowing field name

I need to query the entire index for a particular text value, but I don't have a field name to query on. Is it possible to search documents for a particular piece of text?
You can use a query_string query.
You can specify multiple fields; if no field is specified, it will search the entire document:
{
  "query": {
    "query_string": {
      "query": "text"
    }
  }
}
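If you later learn which fields are relevant, the same query can be restricted to them with the fields parameter (the field names here are illustrative):
{
  "query": {
    "query_string": {
      "query": "text",
      "fields": ["title", "description"]
    }
  }
}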

Filtering collapsed results in Elasticsearch

I have an elasticsearch index containing documents that represent entities at a given point in time. When an entity changes state, a new document is created with a timestamp. When I need to get the current state of all entities, I can do the following:
GET https://127.0.0.1:9200/myindex/_search
{
"collapse": {
"field": "entity_id"
},
"sort" : [{
"timestamp": {
"order": "desc"
}
}]
}
However, I would like to further filter the result of the collapse. When entities are deleted, I create a new document that includes an is_deleted flag along with the timestamp in a nested metadata field. I would like to extend the above query to entirely filter out those entities that have been deleted. Using a term filter on entity_metadata.is_deleted: true obviously does not work, because then my result just includes the last document with that entity_id before it got marked as deleted. How can I filter my results after the collapse is done to exclude any tombstoned entities?
What I would suggest is that instead of adding an is_deleted flag to the latest document, you add a date_deleted field with the date of the deletion to all documents of that entity. Then, given a document's timestamp and its date_deleted, you know whether the entity was live or deleted at that date.
In addition, it would allow you to distinguish:
all documents that don't have a date_deleted field (i.e. not deleted), and
all documents that have a date_deleted before/after a given date.
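With that modeling, the collapse query from the question can drop deleted entities with a must_not/exists clause before collapsing (a sketch, assuming the date_deleted field is written to every document of a deleted entity as suggested):
GET myindex/_search
{
  "query": {
    "bool": {
      "must_not": { "exists": { "field": "date_deleted" } }
    }
  },
  "collapse": { "field": "entity_id" },
  "sort": [{ "timestamp": { "order": "desc" } }]
}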

Can I add a field automatically to an elastic search index when the data is being indexed?

I have two loggers from two different clusters logging into my Elasticsearch. logger1 uses indices mydata-cluster1-YYYY.MM.DD and logger2 uses indices mydata-cluster2-YYYY.MM.DD.
I have no way of touching the loggers, so I would like to add a field on the ES side when the data is indexed, to show which cluster the data belongs to. Can I use mappings to do this?
Thanks
What if you use the PUT mapping API to add the field to your index:
PUT mydata-cluster1-YYYY.MM.DD/_mapping/mappingtype <-- change the mapping type according to yours
{
  "properties": {
    "your_field": {
      "type": "text" <--- type of the field
    }
  }
}
This SO question could come in handy. Hope it helps!
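Note that a mapping only declares the field; it does not fill in a value per document. To have Elasticsearch itself stamp each document at index time, one option is an ingest pipeline with a set processor, sketched here with illustrative names (the index.default_pipeline setting requires ES 6.5+; on older versions the pipeline must be passed on the index request):
PUT _ingest/pipeline/tag-cluster1
{
  "processors": [
    { "set": { "field": "cluster", "value": "cluster1" } }
  ]
}

PUT mydata-cluster1-*/_settings
{
  "index.default_pipeline": "tag-cluster1"
}
For the daily indices created later, the same setting would go into an index template so new indices pick it up automatically.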

Exclude setting on integer field in term query

My documents contain an integer array field storing the ids of tags describing them. Given a specific tag id, I want to extract a list of the tags that most frequently occur together with the provided one.
I can solve this by pairing a terms aggregation over the tag id field with a term filter over the same field, but the list I get back obviously always starts with the tag id I provide: all documents matching my filter have that tag, so it is the first in the list.
I thought of using the exclude setting to avoid creating the problematic bucket, but as I'm dealing with an integer field, that seems not to be possible: this query
{
  "size": 0,
  "query": {
    "term": {
      "tag_ids": "00001"
    }
  },
  "aggs": {
    "tags": {
      "terms": {
        "size": 3,
        "field": "tag_ids",
        "exclude": "00001"
      }
    }
  }
}
returns an error saying that Aggregation [tags] cannot support the include/exclude settings as it can only be applied to string values.
Is it possible to avoid getting back this bucket?
This was, as of Elasticsearch 1.4, a shortcoming of ES itself.
After the community proposed this change, the functionality was added and is included in Elasticsearch 1.5.0.
Look at this: https://github.com/elasticsearch/elasticsearch/pull/7727
While waiting for the fix: my workaround is to have the aggregation use a script instead of accessing the field directly, and let that script treat the value as a string.
This works well and without measurable performance loss.
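On 1.4 the workaround can be sketched as a value script on the terms aggregation (Groovy-era 1.x syntax, written from memory, so treat the exact script form as an assumption; note the stringified integer has no leading zeros, hence exclude "1" rather than "00001"):
{
  "size": 0,
  "query": { "term": { "tag_ids": "00001" } },
  "aggs": {
    "tags": {
      "terms": {
        "size": 3,
        "field": "tag_ids",
        "script": "_value.toString()",
        "exclude": "1"
      }
    }
  }
}
From 1.5.0 onward, exclude on a numeric field can instead be given the exact values to drop directly, with no script needed.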
