Count of elements on kibana visualization - elasticsearch

I have inserted the JSON records below into my Elasticsearch index. How do I get the count of all the elements in the "devices" array so that the count can be visualized on a Kibana dashboard?
With a filter condition: the device count needs to be displayed as "4" for the SAMPLE application and "2" for the SAMPLE2 application in Kibana.
Without a filter condition: the device count should be displayed as "6" devices.
{
"status" : "SUCCESS",
"request" : ["ABC"],
"applicationName" : "SAMPLE",
"endTime" : 1478772517736,
"devices" : ["d1","d2","d3","d4"]
}
,
{
"status" : "FAILED",
"request" : ["EDF"],
"applicationName" : "SAMPLE2",
"endTime" : 1478772517736,
"devices" : ["d5","d12"]
}

You should create a scripted field in Kibana to get the length of the array. Your script could look something like this:
doc['devices'].values.size()
OR
doc['devices'].values.length
You can then build a Data Table visualization that shows the array count per applicationName by using a terms aggregation. Or you could apply filters such as:
applicationName:"SAMPLE"
applicationName:"SAMPLE2"
which will display the array count for the given filter criteria. This SO answer could be helpful.
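As a sanity check on the numbers the visualization should show, here is a minimal Python sketch that mimics a terms aggregation on applicationName with a sum over the scripted array length. It runs on the two sample records from the question only; the function name `device_counts` is made up for illustration and is not part of Kibana or Elasticsearch.

```python
from collections import defaultdict

# Sample records copied from the question (only the relevant fields).
records = [
    {"applicationName": "SAMPLE", "devices": ["d1", "d2", "d3", "d4"]},
    {"applicationName": "SAMPLE2", "devices": ["d5", "d12"]},
]


def device_counts(docs):
    """Sum the length of the devices array per applicationName."""
    counts = defaultdict(int)
    for doc in docs:
        counts[doc["applicationName"]] += len(doc["devices"])
    return dict(counts)


per_app = device_counts(records)   # what a filter per application should show
total = sum(per_app.values())      # what the unfiltered view should show
print(per_app, total)
```

If the Data Table in Kibana shows different numbers, the scripted field or the chosen metric (it should be a sum of the scripted field, not a document count) is the first thing to check.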

Related

How could I remove items from another search?

In Elasticsearch we run two searches, one for exact items and another for non-exact items.
We search for input = dev, and in the exact results we get this item:
{"_id" : "users-USER#1-name",
"_source" : {
"pk" : "USER#1",
"entity" : "users",
"field" : "name",
"input" : "dev"
}}
Then we run a second search for the non-exact results and get this item:
{"_id" : "users-USER#1-description",
"_source" : {
"pk" : "USER#1",
"entity" : "users",
"field" : "name",
"input" : "Dev1"
}}
We want to remove the exact results of the first search from the second, non-exact search by pk: any item in the second search whose pk also appears in the first search should be dropped.
I'd greatly appreciate any ideas.
For example, in the first search we got this item:
"_id" : "users-USER#1-name"
"pk" : "USER#1"
Since we got this item on the first search, we want to remove all the items with the pks from the second search.
So the second search would be empty:
empty
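One way to sketch the exclusion described above, in Python: collect the pk values from the first search, then either filter the second result set in application code or wrap the second query in a bool query with a must_not terms clause. This is only a sketch under assumptions: the helper names are hypothetical, the `match` query stands in for whatever the real non-exact query is, and the terms clause assumes pk is indexed as an exact (keyword-like) field.

```python
def collect_pks(hits):
    """Return the distinct pk values found in a list of search hits."""
    return sorted({hit["_source"]["pk"] for hit in hits})


def exclude_pks(query, pks):
    """Wrap a query so that documents whose pk is in pks are excluded."""
    return {"bool": {"must": [query], "must_not": [{"terms": {"pk": pks}}]}}


def drop_hits_with_pks(hits, pks):
    """Client-side alternative: filter hits that were already returned."""
    blocked = set(pks)
    return [hit for hit in hits if hit["_source"]["pk"] not in blocked]


# Hits copied from the question.
exact_hits = [
    {"_id": "users-USER#1-name",
     "_source": {"pk": "USER#1", "entity": "users", "field": "name", "input": "dev"}},
]
non_exact_hits = [
    {"_id": "users-USER#1-description",
     "_source": {"pk": "USER#1", "entity": "users", "field": "name", "input": "Dev1"}},
]

pks = collect_pks(exact_hits)
# Hypothetical placeholder for the real non-exact query.
second_query = exclude_pks({"match": {"input": "dev"}}, pks)
remaining = drop_hits_with_pks(non_exact_hits, pks)
print(pks, remaining)
```

Pushing the exclusion into the query keeps paging and hit counts consistent; filtering client-side is simpler but can leave a page of results shorter than requested.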

elasticsearch - query between document types

I have a production_order document_type
i.e.
{
part_number: "abc123",
start_date: "2018-01-20"
},
{
part_number: "1234",
start_date: "2018-04-16"
}
I want to create a commodity document type
i.e.
{
part_number: "abc123",
commodity: "1 meter machining"
},
{
part_number: "1234",
commodity: "small flat & form"
}
Production orders are datawarehoused every week and are immutable.
Commodities, on the other hand, could change over time, e.g. abc123 could change from 1 meter machining to 5 meter machining, so I don't want to store this data with the production_order records.
If a user searches for "small flat & form" in the commodity document type, I want to pull all matching records from the production_order document type, matching on part number.
Obviously I can do this in a relational database with a join. Is it possible to do the same in elasticsearch?
If it helps, we have about 500k part numbers that will be commoditized and our production order data warehouse currently holds 20 million records.
I have found that you can indeed now query between indices in Elasticsearch; however, you have to ensure your data is stored correctly. Here is an example from the 6.3 Elasticsearch docs:
Terms lookup twitter example: At first we index the information for user with id 2, specifically, its followers, then index a tweet from user with id 1. Finally we search on all the tweets that match the followers of user 2.
PUT /users/user/2
{
"followers" : ["1", "3"]
}
PUT /tweets/tweet/1
{
"user" : "1"
}
GET /tweets/_search
{
"query" : {
"terms" : {
"user" : {
"index" : "users",
"type" : "user",
"id" : "2",
"path" : "followers"
}
}
}
}
Here is the link to the original page:
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/query-dsl-terms-query.html
In my case above I need to set up my storage so that the commodity is a field and its value is an array of part numbers.
i.e.
{
"1 meter machining": ["abc1234", "1234"]
}
I can then look up the 1 meter machining part numbers against my production_order documents.
I have tested this and it works.
Joins are not supported in Elasticsearch.
You can query twice: first get all the part numbers for "small flat & form", then use those part numbers to query the other index.
Otherwise, try to find a way to merge these into a single index. That would be better. Updating the commodities would not cause you any problems if you combine the two.
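The two-step approach can be sketched in Python as follows. The data is the sample from the question held in memory; the helper names are hypothetical, and `production_order_query` only builds the request body for a terms query on part_number rather than calling a cluster.

```python
# Sample documents copied from the question.
commodities = [
    {"part_number": "abc123", "commodity": "1 meter machining"},
    {"part_number": "1234", "commodity": "small flat & form"},
]
production_orders = [
    {"part_number": "abc123", "start_date": "2018-01-20"},
    {"part_number": "1234", "start_date": "2018-04-16"},
]


def part_numbers_for(commodity, commodity_docs):
    """Step 1: find every part number that carries the given commodity."""
    return [d["part_number"] for d in commodity_docs if d["commodity"] == commodity]


def production_order_query(part_numbers):
    """Step 2: build a terms query body for the production_order index."""
    return {"query": {"terms": {"part_number": part_numbers}}}


def matching_orders(part_numbers, orders):
    """In-memory stand-in for running the step 2 query."""
    wanted = set(part_numbers)
    return [o for o in orders if o["part_number"] in wanted]


parts = part_numbers_for("small flat & form", commodities)
body = production_order_query(parts)
orders = matching_orders(parts, production_orders)
print(parts, body, orders)
```

Because the commodity lookup happens at query time, a part that moves from one commodity to another is picked up on the next search without touching the immutable production orders.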

How to use multiple query strings with aggregation in elasticsearch

How to use multiple query strings with aggregate functions in elasticsearch?
For example:
if a>0 AND a<1, then {"low":count(aggregate count of records within 0 to 1)}
else if a > 1 AND a < 100, then {"normal":count(aggregate count of records within 1 to 100)}
else {"high":count(aggregate count of records after 100)}
How to achieve this using Request Body Query string?
Thank you in advance.
Assuming that a is a field that you search on, I think the easiest way for you to do that is using the range aggregation with buckets for each of your use-cases (low, normal, high).
You cannot bind aggregations to conditions of your query; you would have to do that in your own code. But if you use the range aggregation, you could define your buckets like this:
POST /_search
{
"aggs" : {
"a_ranges" : {
"range" : {
"field" : "a",
"ranges" : [
{ "to" : 1 },
{ "from" : 1, "to" : 100 },
{ "from" : 100 }
]
}
}
}
}
Depending on your query, two of these buckets may remain empty, but this should give you the result you want.
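To make the bucket boundaries concrete, here is a small Python sketch that counts values the way a range aggregation does, with `from` inclusive and `to` exclusive. It uses the low / normal / high thresholds from the question (1 and 100); the sample values and the function name are made up for illustration.

```python
def range_buckets(values, ranges):
    """Count values per bucket; 'from' is inclusive, 'to' is exclusive."""
    counts = {r["key"]: 0 for r in ranges}
    for value in values:
        for r in ranges:
            lower = r.get("from", float("-inf"))
            upper = r.get("to", float("inf"))
            if lower <= value < upper:
                counts[r["key"]] += 1
    return counts


# Thresholds taken from the question: below 1, 1 to 100, 100 and above.
ranges = [
    {"key": "low", "to": 1},
    {"key": "normal", "from": 1, "to": 100},
    {"key": "high", "from": 100},
]

sample_values = [0.5, 1, 42, 100, 250]  # illustrative data only
buckets = range_buckets(sample_values, ranges)
print(buckets)
```

Note that a value exactly equal to a boundary (1 or 100 here) lands in the upper bucket, which may or may not match the strict inequalities in the question.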

Finding duplicate documents

I have some documents whose ids are randomly generated. I need to find the duplicates among these documents. There are three fields whose combination should never be identical in two documents. So how can I check for duplicates based on multiple fields?
Sample documents
document 1 = {
"process" : "business",
"processId" : 5433321,
"country" : "US"
}
document 2 = {
"process" : "operations",
"processId" : 334233,
"country" : "UK"
}
document 3 = {
"process" : "business",
"processId" : 5433321,
"country" : "US"
}
As you can see, document 1 and document 3 are the same, but they have different ids in my database, so they exist as separate documents. When this runs I need to detect them as duplicates and, if possible, keep only one.
The best option here is to model your documents around the document ID. For each document, create a document ID that is a hash of the document's content. This ensures that only one copy of each unique document exists in the index. Then use the _create API to index documents; any request that would overwrite an existing document with the same ID will fail.
You can read further about other duplication issues and their solutions here.
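A minimal Python sketch of the content-hash idea, assuming the three fields from the question (process, processId, country) define a duplicate. The choice of SHA-1 over a canonical JSON encoding and the helper name `content_id` are illustrative assumptions, not something prescribed by Elasticsearch.

```python
import hashlib
import json

# Fields whose combination identifies a duplicate (taken from the question).
KEY_FIELDS = ("process", "processId", "country")


def content_id(doc, fields=KEY_FIELDS):
    """Derive a deterministic document ID from the identifying fields."""
    payload = json.dumps(
        {field: doc[field] for field in fields},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


docs = [
    {"process": "business", "processId": 5433321, "country": "US"},
    {"process": "operations", "processId": 334233, "country": "UK"},
    {"process": "business", "processId": 5433321, "country": "US"},
]

ids = [content_id(d) for d in docs]

# Keep the first document seen for each ID, mirroring create-only indexing.
unique = {}
for d in docs:
    unique.setdefault(content_id(d), d)

print(ids[0] == ids[2], len(unique))
```

Sorting the keys before hashing matters: the same fields serialized in a different order would otherwise produce a different ID for what is logically the same document.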

MongoDB complex find

I need to grab the top 3 results for each of 8 users. Currently I am looping over the users and making 8 calls to the db. Is there a way to structure the query to pull the same 8x3 dataset in a single db call?
selected_users = users.sample(8)
selected_users.each do |user|
  # One query per user: the 3 highest-scoring statuses
  cursor = status_store.find({'user' => user},{:fields =>params}).sort('score', -1).limit(3)
  # do something with cursor
end
The collection I am pulling from looks like the below. Each user can have an unbounded number of tweets, so I have not embedded them within a user document.
{
"_id" : ObjectId("51e92cc8e1ce7219e40003eb"),
"id_str" : "57915476419948544",
"score" : 904,
"text" : "Yesterday we had a bald eagle on the show. Oddly enough, he was in the country illegally.",
"timestamp" : "19/07/2013 08:10",
"user" : {
"id_str" : "115485051",
"name" : "Conan O'Brien",
"screen_name" : "ConanOBrien",
"description" : "The voice of the people. Sorry, people."
}
}
Thanks in advance.
Yes, you can do this using the aggregation framework.
Another way would be to keep track of the top 3 scores in the user documents. Whether this is faster depends on how often you write scores versus how often you read the top scores by user.
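As a rough illustration of the aggregation-framework route, the Python sketch below builds one possible pipeline (match the selected users, sort by score, group per user, keep the first n) and includes an in-memory equivalent so the expected shape of the result is clear. Treat it as an assumption-laden sketch: the helper names are hypothetical, the pipeline groups on user.id_str, and the `$slice` expression inside `$project` needs a reasonably recent MongoDB server (3.2 or later).

```python
from collections import defaultdict


def top_n_pipeline(user_ids, n=3):
    """Build an aggregation pipeline returning the top n statuses per user."""
    return [
        {"$match": {"user.id_str": {"$in": list(user_ids)}}},
        {"$sort": {"score": -1}},
        {"$group": {"_id": "$user.id_str", "top": {"$push": "$$ROOT"}}},
        {"$project": {"top": {"$slice": ["$top", n]}}},
    ]


def top_n_per_user(statuses, user_ids, n=3):
    """In-memory equivalent: highest-scoring n statuses for each selected user."""
    by_user = defaultdict(list)
    for status in statuses:
        uid = status["user"]["id_str"]
        if uid in user_ids:
            by_user[uid].append(status)
    return {
        uid: sorted(items, key=lambda s: s["score"], reverse=True)[:n]
        for uid, items in by_user.items()
    }


# Illustrative data only; real documents carry more fields.
statuses = [
    {"score": 5, "user": {"id_str": "a"}},
    {"score": 9, "user": {"id_str": "a"}},
    {"score": 1, "user": {"id_str": "a"}},
    {"score": 7, "user": {"id_str": "a"}},
    {"score": 3, "user": {"id_str": "b"}},
    {"score": 10, "user": {"id_str": "c"}},
]

top = top_n_per_user(statuses, {"a", "b"})
pipeline = top_n_pipeline(["a", "b"])
print({uid: [s["score"] for s in items] for uid, items in top.items()})
```

The pipeline pushes every matching status into the group before slicing, so for users with very many tweets it may be worth comparing against the per-user queries before switching.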
