Checking whether items from an array have been indexed - Elasticsearch

I have a list of items to be added to Elasticsearch, but when I check the count, I find that Elasticsearch holds fewer items than the database.
So I created an array with all the IDs from the database. How can I compare it with Elasticsearch to find out which IDs are missing?
{
"size": 100,
"query": {
"bool": {
"should": [
{
"bool": {
"must_not": {
"terms": {
"ID": [
10400,
11024,
10401,
11026,
11053,
11061
]
}
}
}
}
]
}
}
}

You could use an aggregation query to list buckets for document IDs.
The following query will not include buckets for IDs that are not present in your index.
If you also want buckets for IDs that are not in the index, then you may want to use a filter aggregation per ID (or a single filters aggregation with one named filter per ID), so that every ID you are searching for gets its own bucket.
POST test_index/_search
{
"size": 0,
"aggs":{
"matching_values_field": {
"filter": {
"terms" : { "id" : [
10400,
11024,
10401,
11026,
11053,
11061
]}
},
"aggs": {
"myfield" : {
"terms" : {
"field" : "id"
}
}
}
}
}
}
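A minimal Python sketch of the comparison step, assuming the id field is mapped as a number (with a keyword mapping the bucket keys come back as strings and would need converting first). `build_request` reproduces the answer's body but raises the terms size to the number of IDs, because a terms aggregation returns only 10 buckets by default; `missing_ids` then diffs the database IDs against the bucket keys. Both function names and `SAMPLE_RESPONSE` are hypothetical, and the response is trimmed and made up for illustration. Very long ID lists may hit server-side limits, so batching is left to the caller.

```python
# Hypothetical sketch: find database IDs that produced no bucket in the
# answer's terms aggregation. Assumes "id" is mapped as a numeric field,
# so bucket keys come back as numbers.

def build_request(ids):
    """Return the answer's search body with the terms size raised to len(ids)."""
    return {
        "size": 0,
        "aggs": {
            "matching_values_field": {
                "filter": {"terms": {"id": ids}},
                "aggs": {
                    # terms returns only 10 buckets by default; ask for one per ID.
                    "myfield": {"terms": {"field": "id", "size": len(ids)}}
                },
            }
        },
    }


def missing_ids(db_ids, response):
    """IDs from the database without a bucket, i.e. not present in the index."""
    buckets = response["aggregations"]["matching_values_field"]["myfield"]["buckets"]
    found = {bucket["key"] for bucket in buckets}
    return sorted(set(db_ids) - found)


DB_IDS = [10400, 11024, 10401, 11026, 11053, 11061]

# Made-up, trimmed response; in practice this is the JSON returned by
# POST test_index/_search with build_request(DB_IDS) as the body.
SAMPLE_RESPONSE = {
    "aggregations": {
        "matching_values_field": {
            "doc_count": 4,
            "myfield": {
                "buckets": [
                    {"key": 10400, "doc_count": 1},
                    {"key": 11024, "doc_count": 1},
                    {"key": 11026, "doc_count": 1},
                    {"key": 11061, "doc_count": 1},
                ]
            },
        }
    }
}

print(missing_ids(DB_IDS, SAMPLE_RESPONSE))  # [10401, 11053]
```

Keeping the diff on the client side means the query itself stays a plain aggregation, and the same `missing_ids` helper works for any batch of IDs.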

Related

Deduplicate and perform composite aggregation on the deduplicated result

I have an index in Elasticsearch that contains data on daily transactions. Each doc has mainly four fields, as below:
TxnId, Status, TxnType, userId
Two documents can have the same TxnId.
I'm looking for a query that aggregates over status and TxnType for unique txnIds. Basically I'm looking for something like: select unique txnIds from user_table group by status, txnType.
I have one ES query that deduplicates on TxnId and another that performs a composite aggregation on status and txnType. I want to do both things in a single query.
I tried the collapse feature. I also tried the cardinality aggregation and deduplication, but the query below is not giving the correct output:
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"streamSource": 3
}
}
]
}
},
"collapse": {
"field": "txnId"
},
"aggs": {
"buckets": {
"composite": {
"size": 30,
"sources": [
{
"status": {
"terms": {
"field": "status"
}
}
},
{
"txnType": {
"terms": {
"field": "txnType"
}
}
}
]
}
}
}
}
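One likely reason the query above does not deduplicate is that collapse applies only to the returned hits and does not affect aggregations. To pin down the intended result, here is a client-side Python sketch under one interpretation of the SQL in the question: count distinct txnIds per (status, txnType) pair, with a txnId counted once in every group it appears in. The function name and sample documents are hypothetical. Server-side, a cardinality sub-aggregation on txnId inside each composite bucket may compute the same numbers, keeping in mind that cardinality is approximate.

```python
# Hypothetical sketch of the intended semantics, computed client-side:
# SELECT status, txnType, COUNT(DISTINCT txnId) ... GROUP BY status, txnType

def unique_txn_counts(docs):
    """Map each (status, txnType) pair to the number of distinct txnIds in it."""
    groups = {}
    for doc in docs:
        key = (doc["status"], doc["txnType"])
        # A set drops repeated txnIds, which is the deduplication step.
        groups.setdefault(key, set()).add(doc["txnId"])
    return {key: len(txn_ids) for key, txn_ids in groups.items()}


# Made-up documents; "t1" appears twice and should be counted once.
docs = [
    {"txnId": "t1", "status": "SUCCESS", "txnType": "PAYMENT"},
    {"txnId": "t1", "status": "SUCCESS", "txnType": "PAYMENT"},
    {"txnId": "t2", "status": "SUCCESS", "txnType": "PAYMENT"},
    {"txnId": "t3", "status": "FAILED", "txnType": "REFUND"},
]

print(unique_txn_counts(docs))
# {('SUCCESS', 'PAYMENT'): 2, ('FAILED', 'REFUND'): 1}
```

If duplicate documents for the same txnId can carry different statuses, this interpretation counts that txnId in each group; a different rule (for example, keeping only the latest document per txnId) would need its own logic.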

Elasticsearch: return unique strings from an array field after a given filter

How can I get all the values in the ids field that start with a given prefix, across all Elasticsearch records, and make them unique?
Records
PUT items/1
{ "ids" : [ "apple_A", "orange_B" ] }
PUT items/2
{ "ids" : [ "apple_A", "apple_B" ] }
PUT items/3
{ "ids" : [ "apple_C", "banana_A" ] }
What I need is to find all the unique ids for a given prefix. For example, if the input is apple, the output should be ["apple_A", "apple_B", "apple_C"].
So far I have tried the terms aggregation. With the following query I was able to filter the documents that have ids with the given prefix, but the aggregation returns all the ids in those documents.
{
"aggregations": {
"filterIds": {
"filter": {
"bool": {
"filter": [
{
"prefix": {
"ids.keyword": {
"value": "apple"
}
}
}
]
}
},
"aggregations": {
"uniqueIds": {
"terms": {
"field": "ids.keyword"
}
}
}
}
}
}
With the prefix input apple, it returns the aggregation list [ "apple_A", "orange_B", "apple_B", "apple_C", "banana_A" ]. Basically, it returns all the ids of every document that matches the filter.
Is there a way to get only the ids in the array that match the prefix, rather than all the ids in each matching document?
You can limit the returned values using the include parameter of the terms aggregation:
POST items/_search
{
"size": 0,
"aggregations": {
"filterIds": {
"filter": {
"bool": {
"filter": [
{
"prefix": {
"ids.keyword": {
"value": "apple"
}
}
}
]
}
},
"aggregations": {
"uniqueIds": {
"terms": {
"field": "ids.keyword",
"include": "apple.*"
}
}
}
}
}
}
Also check this other thread, which deals with using a regex within include; it's very similar to your use case.
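Why the filter alone is not enough: the prefix filter selects whole documents, and the terms aggregation then buckets every value of the multi-valued ids field in those documents. The include regex filters the bucket keys themselves, and it has to match the entire term, which is why the pattern is apple.* rather than apple. Below is a small in-memory Python emulation of those three steps using the question's documents; `unique_ids` is a hypothetical helper, Python's re stands in for Lucene's regex syntax (identical for this pattern), and results are sorted alphabetically, whereas Elasticsearch orders terms buckets by document count by default.

```python
import re

# In-memory copy of the three documents from the question.
DOCS = [
    {"ids": ["apple_A", "orange_B"]},
    {"ids": ["apple_A", "apple_B"]},
    {"ids": ["apple_C", "banana_A"]},
]


def unique_ids(docs, prefix, include=None):
    """Emulate a prefix filter plus a terms aggregation on a multi-valued field."""
    # Filter step: keep documents with at least one value starting with prefix.
    matching_docs = [d for d in docs if any(v.startswith(prefix) for v in d["ids"])]
    # Terms step: every value of every matching document becomes a bucket.
    terms = {v for d in matching_docs for v in d["ids"]}
    # "include" step: keep only terms the regex matches in full (anchored).
    if include is not None:
        pattern = re.compile(include)
        terms = {t for t in terms if pattern.fullmatch(t)}
    return sorted(terms)


print(unique_ids(DOCS, "apple"))
# ['apple_A', 'apple_B', 'apple_C', 'banana_A', 'orange_B']
print(unique_ids(DOCS, "apple", "apple.*"))
# ['apple_A', 'apple_B', 'apple_C']
```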

Elasticsearch: how do I query (search) within a single document?

Assume the index is named index and the id of document 1 is "1".
How can I query within that single document?
Something like this:
GET index/_search
{
"query": {
"id": "1",
"terms": ["is this text in document 1?"]
}
}
or
GET index/_doc/1/_search
{
...
}
As far as I can tell, this query:
GET test/_doc/_search
{
"query": {
"terms" : {
"_id" : ["1"]
}
}
}
retrieves the document with id "1", but I cannot perform any further queries on it.
The reason I want to query inside a single document is that my app has a live news view: once a news item is retrieved from the server, I want to search it in Elasticsearch for keyword highlighting and spam filtering.
You have to compose your query with a bool query.
The best approach is to put the id query under filter, because filter clauses have no effect on scoring. You can then add queries under must, must_not and should, according to your needs:
GET index/_search
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"term": {
"field": "value"
}
}
],
"must_not": [],
"should": [],
"filter": [
{
"terms": {"_id": ["1"]}
}
]
}
}
}
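For the live-news use case, the same bool structure can be generated per news item. This is a Python sketch that only builds the request body. The field name content, the function name `single_document_query`, and the highlight section are hypothetical additions for the keyword-highlighting goal in the question. If the search returns zero hits, the text does not match document 1; otherwise the highlight fragments show where it matched.

```python
import json


def single_document_query(doc_id, field, text):
    """Build a search body that checks whether `text` matches `field` in one document.

    `field` is a hypothetical field name; the highlight block is an
    illustrative addition for keyword highlighting.
    """
    return {
        "query": {
            "bool": {
                # Scored part: full-text match of the news text.
                "must": [{"match": {field: text}}],
                # Unscored part: restrict the search to a single _id.
                "filter": [{"terms": {"_id": [doc_id]}}],
            }
        },
        # Ask for highlighted fragments of `field` in the matching document.
        "highlight": {"fields": {field: {}}},
    }


body = single_document_query("1", "content", "is this text in document 1?")
# Send this as the body of GET index/_search.
print(json.dumps(body, indent=2))
```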

Elasticsearch bool query

My objective is to find the 10 most recent documents whose message ID is MSG-1013 and whose Severity field is info. Both conditions must be satisfied, and the match must be exact. I tried the search query below, but it does not give me the expected results. What am I doing wrong?
{
"size": 10,
"query": {
"bool": {
"must": [
{
"match": { "messageId": "MSG-1013" }
},
{
"match": { "Severity": "Info" }
}
]
}
}
}
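If I have understood you correctly, you want the 10 most recent documents whose "messageId" and "Severity" fields match exactly. A match query is a full-text query: the query text is analyzed, so with the standard analyzer "MSG-1013" becomes the tokens msg and 1013, and a document containing either token can match. Without a sort, hits are also ordered by relevance score rather than by date. I assume you don't need a score, because your ordering seems to be the document timestamp or a similar date field. For this purpose, you could use a bool filter with term queries (which look for the exact value, so they should target keyword fields) in combination with a sort: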
{
"query": {
"bool": {
"filter": [
{ "term": { "messageId": "MSG-1013" } },
{ "term": { "Severity": "Info" } }
]
}
},
"sort" : [
{ "documentTimestamp" : {"order" : "desc"}}
],
"size": 10
}
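To make the difference concrete, here is a small client-side Python approximation, not Elasticsearch's actual analyzer. `analyzed_tokens` is a rough stand-in for the standard analyzer (lowercased alphanumeric tokens), `match_like` approximates a match query with the default OR operator, and `latest_exact` mimics the answer's term filters plus the descending sort on documentTimestamp. The helper names and the sample documents are hypothetical.

```python
import re
from datetime import datetime


def analyzed_tokens(value):
    """Rough stand-in for the standard analyzer: lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", value.lower()))


def match_like(field_value, query_text):
    """Approximate a match query with the default OR operator: any shared token."""
    return bool(analyzed_tokens(field_value) & analyzed_tokens(query_text))


def latest_exact(docs, message_id, severity, size=10):
    """Approximate the bool filter (term + term), the desc sort, and "size"."""
    hits = [d for d in docs if d["messageId"] == message_id and d["Severity"] == severity]
    hits.sort(key=lambda d: d["documentTimestamp"], reverse=True)
    return hits[:size]


# Made-up documents for illustration.
docs = [
    {"messageId": "MSG-1013", "Severity": "Info", "documentTimestamp": datetime(2021, 5, 1, 10)},
    {"messageId": "MSG-1014", "Severity": "Info", "documentTimestamp": datetime(2021, 5, 1, 12)},
    {"messageId": "MSG-1013", "Severity": "Info", "documentTimestamp": datetime(2021, 5, 1, 13)},
    {"messageId": "MSG-1013", "Severity": "Error", "documentTimestamp": datetime(2021, 5, 1, 14)},
]

# "MSG-1014" shares the token "msg" with "MSG-1013", so a match query also hits it.
print(match_like("MSG-1014", "MSG-1013"))  # True

# Exact filtering keeps only the two MSG-1013/Info documents, newest first.
print([d["documentTimestamp"].hour for d in latest_exact(docs, "MSG-1013", "Info")])  # [13, 10]
```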

Filtering nested aggregation result on number of buckets

I have this query that does a nested aggregation, giving me the unique machineid values per unique key. What I want Elasticsearch to return is only those keys with two or more unique machineid values. I can of course solve this problem application-side, but is there a way to solve it directly in the query? Or maybe I am going about this the wrong way?
My query:
{
"query": {
"filtered": {
"filter": {
"bool": {
"must_not": {
"term" : { "key" : "" }
}
}
}
}
},
"aggs": {
"keys": {
"terms": {
"field": "key",
"size" : 0
},
"aggs": {
"machines": {
"terms": {
"field": "machineid",
"size" : 0
}
}
}
}
}
}
Example document:
{
"timestamp":"2014-05-23T08:21:51+00:00",
"machineid":"1444056739053156926",
"hash":"77f595dee5ffacea72b135b1fce1312e",
"key":"XXXXXX-XXXXXX-XXXXXX-XXXXXX"
}
I have been looking at scripted metric aggregation but it doesn't seem to be what I'm looking for.
Issue #4404 and issue #8110 on Elasticsearch GitHub seem to describe my problem but they are both closed.
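The application-side route mentioned in the question can be sketched in Python as a post-processing step over the aggregation response: keep a key only if its machines sub-aggregation returned at least two buckets. The helper name `keys_with_multiple_machines` and the sample response are hypothetical, and the bucket count is only as complete as the sub-aggregation's size allows. In Elasticsearch versions with pipeline aggregations (not available in the 1.x era of this question), a bucket_selector on the keys aggregation with a buckets_path of machines._bucket_count may do the same filtering server-side; check the documentation for your version.

```python
def keys_with_multiple_machines(response, min_machines=2):
    """Return "keys" buckets whose "machines" sub-aggregation has >= min_machines buckets."""
    result = []
    for bucket in response["aggregations"]["keys"]["buckets"]:
        machine_ids = [m["key"] for m in bucket["machines"]["buckets"]]
        if len(machine_ids) >= min_machines:
            result.append(bucket["key"])
    return result


# Made-up response in the shape returned by the query above.
response = {
    "aggregations": {
        "keys": {
            "buckets": [
                {"key": "KEY-1", "doc_count": 3, "machines": {"buckets": [
                    {"key": "1444056739053156926", "doc_count": 2},
                    {"key": "1444056739053156927", "doc_count": 1},
                ]}},
                {"key": "KEY-2", "doc_count": 4, "machines": {"buckets": [
                    {"key": "1444056739053156926", "doc_count": 4},
                ]}},
            ]
        }
    }
}

print(keys_with_multiple_machines(response))  # ['KEY-1']
```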
