My documents contain an integer array field storing the ids of the tags describing them. Given a specific tag id, I want to extract a list of the top tags that occur most frequently together with the provided one.
I can solve this problem by combining a terms aggregation over the tag id field with a term filter over the same field, but the list I get back obviously always starts with the tag id I provide: all documents matching my filter have that tag, so it is the first in the list.
I thought of using the exclude option to avoid creating the problematic bucket, but as I'm dealing with an integer field, that seems not to be possible: this query
{
  "size": 0,
  "query": {
    "term": {
      "tag_ids": "00001"
    }
  },
  "aggs": {
    "tags": {
      "terms": {
        "size": 3,
        "field": "tag_ids",
        "exclude": "00001"
      }
    }
  }
}
returns an error saying that Aggregation [tags] cannot support the include/exclude settings as it can only be applied to string values.
Is it possible to avoid getting back this bucket?
This is, as of Elasticsearch 1.4, a shortcoming of ES itself.
After the community proposed this change, the functionality was added and is scheduled for release in Elasticsearch 1.5.0.
Look at this: https://github.com/elasticsearch/elasticsearch/pull/7727
While it is en route to being fixed: my workaround is to have the aggregation use a script instead of accessing the field directly, and let that script use the value as a string.
Works well and without measurable performance loss.
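A minimal sketch of that workaround, assuming Groovy scripting is available (the default script language in 1.4) and that the tag ids are indexed as plain integers such as 1; the exact string values depend on your data:
{
  "size": 0,
  "query": {
    "term": {
      "tag_ids": 1
    }
  },
  "aggs": {
    "tags": {
      "terms": {
        "size": 3,
        "script": "doc['tag_ids'].values.collect { it.toString() }",
        "exclude": "1"
      }
    }
  }
}
Since the script emits strings, the include/exclude settings apply again.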
I have an elasticsearch index containing documents that represent entities at a given point in time. When an entity changes state, a new document is created with a timestamp. When I need to get the current state of all entities, I can do the following:
GET https://127.0.0.1:9200/myindex/_search
{
  "collapse": {
    "field": "entity_id"
  },
  "sort": [{
    "timestamp": {
      "order": "desc"
    }
  }]
}
However, I would like to further filter the result of the collapse. When entities are deleted, I create a new document that includes an is_deleted flag along with the timestamp in a nested metadata field. I would like to extend the above query to entirely filter out those entities that have been deleted. Using a term filter on entity_metadata.is_deleted: true obviously does not work, because then my result just includes the last document with that entity_id before it got marked as deleted. How can I filter my results after the collapse is done to exclude any tombstoned entities?
What I would suggest is that instead of adding an is_deleted flag, you add a date_deleted field with the date of the deletion to all documents of that entity. Then, when you look at a document, comparing its timestamp with date_deleted tells you whether the document was live or deleted at that date.
In addition, it would allow you to consider:
all documents that don't have a date_deleted field (i.e. not deleted), and
all documents that have a date_deleted before/after a given date.
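Combined with the collapse, a sketch of the resulting query (assuming the date_deleted field from the suggestion above is added to every document of a deleted entity):
GET https://127.0.0.1:9200/myindex/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "exists": { "field": "date_deleted" } }
      ]
    }
  },
  "collapse": {
    "field": "entity_id"
  },
  "sort": [{
    "timestamp": {
      "order": "desc"
    }
  }]
}
Because every document of a deleted entity carries date_deleted, the filter removes the whole entity before the collapse runs, so no tombstoned entities show up.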
I am able to sort data by _score, like this:
{
  "version": true,
  "_source": false,
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ],
  "query": {
    "match_all": {}
  }
}
Please let me know how I can do the same with _version. By default, fielddata is not supported on the _version field, so maybe I am missing something.
Is there any specific setting to query with version?
Please help!
You can't do this, and usually you don't have to.
See this thread:
https://discuss.elastic.co/t/filter-by--version-and-show--version-in-elasticsearch-query/22024/2
While using the _version might seem to work in certain cases, I would recommend to never use it for anything else than optimistic locking of updates. In particular, versions do not carry any meaning: they might look like the number of times a document has been modified, but it is not always the case (for instance, if you create a new document which has the same ID as a document that you just deleted, the version number of the new document will not be 1), and more importantly it is an implementation detail; this behaviour might change in the future.
The _version field is not indexed, so you can't use it in queries.
You can create your own custom version field and handle it manually.
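A minimal sketch of that approach, using a hypothetical my_version numeric field that your application increments on every update; since it is a regular indexed field, sorting works as usual:
{
  "sort": [
    {
      "my_version": {
        "order": "desc"
      }
    }
  ],
  "query": {
    "match_all": {}
  }
}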
I have two types in my Elasticsearch index. Both have the same mapping. I am using one for active documents, and the other for archived ones.
Now, I want to archive a document, i.e. change its _type from active to archived. Both are in the same index, so I cannot reindex them either.
Is there a way to do this in Elasticsearch 5.0?
Changing the type is tricky. You would have to remove the document and then index it again with the new type.
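A sketch of that remove-and-reindex sequence (document id 1 is just an example):
GET myindex/active/1

DELETE myindex/active/1

PUT myindex/archived/1
{ /* the original _source returned by the GET above */ }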
Why not have a field in your document indicating "activeness"? Then you can use a bool query to filter by what you want:
{"query": {
"bool": {
"filter": [{"term": {"status", "active"}}],
"query": { /* your query object here */ }
}
}
}
Agree with having a field which indicates the activeness of the document.
(Or)
Use two different indices for "active" and "inactive" types.
Use aliases which map to these indices.
Aliases will give you flexibility to change your indices without downtimes.
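For example, a sketch of the alias setup (the index and alias names here are made up):
POST /_aliases
{
  "actions": [
    { "add": { "index": "documents_active", "alias": "active_docs" } },
    { "add": { "index": "documents_archived", "alias": "archived_docs" } }
  ]
}
Archiving a document then means reindexing it from one index into the other, and the aliases let you swap or rebuild the underlying indices later without changing your clients.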
For example, I have the following two documents with fields Id and Name (the Name field is analyzed):
1,jack-in-box
2,box
When my query is "box", I get both documents back, but actually I only want document 2, or at least document 2 ranked above document 1.
How can I query this, please?
I know that doc1 was tokenized to jack, in and box, so when I search for box I get doc1 as well. My current solution is creating another field called name_not_analyzed which is not analyzed. But I have been wondering whether there is a better way, via the query alone, so that I don't have to reindex. Thanks in advance!
As @jgr pointed out in a comment, doc2 should be ranked above doc1 by default, unless you have your own ranking algorithm, are using a constant_score query, or are only using a filter (which gives a score of 1 to all documents).
Now, if you only want doc2, you could use scripting:
{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "_source.name.toLowerCase() == 'box'"
        }
      }
    }
  }
}
I am accessing the source itself to check against, and lowercasing the value so that BOX, Box, etc. also match.
Hope this helps!!
I am looking for a way to create a facet such that I can essentially return two values for one key.
For instance, I am attempting to retrieve both the amount and schedule properties of an object. I attempted to use a computed value script, but the calculations that have to be done using the two values are date based and require an external library to perform them.
Basically, something along the lines of:
"theFacet": {
"terms_stats": {
"key_field": "someKeyProbablyADate",
"value_field": "amount",
"value_field": "simpleSchedule"
}
}
Workarounds are also appreciated. Perhaps some way to return a new dynamic object with both fields?
Sounds like you want to pre-process your data before you index it into a single field, then facet on that.
Something along the lines of a single string containing key#amount#schedule.
Then when you get the faceting results back you can split it up again and run whatever logic you want.
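A sketch of that approach, faceting on a hypothetical combined field (it should be mapped as not_analyzed so each facet term is the whole key#amount#schedule string):
"facets": {
  "theFacet": {
    "terms": {
      "field": "combined"
    }
  }
}
Each returned term can then be split on # client-side to recover the key, the amount, and the schedule.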
Try combining different fields with a script element. For example:
"facets": {
"facet-name": {
"terms": {
"field": "some-field",
"script": "_source['another-field'] + '/' + term
}
}
}