I have a requirement to find the number of mobile applications registered by a customer. The Elastic Search indices are designed as below (mobile apps in one index, customers in another, and the association between the two in a third). When I created a single Kibana index pattern covering these 3 indices, it did not provide a meaningful/valid set of fields to query them.
mobile_users
{
"_index": "mobile_users",
"_type": "_doc",
"_id": "mobileuser_id1",
"_score": 1,
"_source": {
"userid": "mobileuser_id1",
"name": "jack",
"username": "jtest",
"identifiers": [ ],
"contactEmails": [ ],
"creationDate": "2020-09-29 09:18:36 GMT",
"lastUpdated": 1601371117354,
"isSuspended": false,
"authStrategyIds": [ ],
"subscription": false
}
}
mobile_applications
{
"_index": "mobile_applications",
"_type": "_doc",
"_id": "mobileapp_id1",
"_source": {
"appDefinition": {
"info": {
"version": "1.0",
"title": "TEST.MobileAPP"
},
"AppDisplayName": "TEST.MobileAPP1.0",
"appName": "TEST.MobileAPP",
"appVersion": "1.0",
"maturityState": "Test",
"isActive": false,
"owner": "mobileappowner",
"creationDate": "2020-09-24 11:21:44 GMT",
"lastModified": "2020-10-13 11:58:22 GMT",
"id": "mobileapp_id1"
}
}
}
registered_mobile_applications
{
"_index": "registered_mobile_applications",
"_type": "_doc",
"_id": "mobileuser_id1",
"_version": 1,
"_score": 1,
"_source": {
"applicationId": "mobileuser_id1",
"mobileappIds": [
"mobileapp_id1", "mobileapp_id2"
],
"lastUpdated": 1601371117929
}
}
Can you advise if there is any way to get the count of registered applications for the given customer?
It's Elasticsearch, not Elastic Search :)
Given that your document structures are all dramatically different, it's not surprising you can't get much meaning from a single index pattern.
However, there's no native way in Kibana to count the values of an array in a document. You could create a scripted field that should do it, or add the count as a separate field during ingestion.
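A minimal sketch of the second suggestion (adding the count during ingestion), assuming the association document is assembled in application code before it is indexed. The field name `mobileappCount` and the helper `with_app_count` are hypothetical and not part of the original mapping.

```python
def with_app_count(source: dict) -> dict:
    """Return a copy of a registered_mobile_applications document with a
    precomputed count of its mobileappIds array."""
    doc = dict(source)
    # "mobileappCount" is a hypothetical field name; use whatever fits the mapping.
    doc["mobileappCount"] = len(doc.get("mobileappIds") or [])
    return doc


registration = {
    "applicationId": "mobileuser_id1",
    "mobileappIds": ["mobileapp_id1", "mobileapp_id2"],
    "lastUpdated": 1601371117929,
}

enriched = with_app_count(registration)
print(enriched["mobileappCount"])
```

Once the count is a plain numeric field, Kibana can filter, sort, or aggregate on it per customer without a script.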
Related
I'm new to Elasticsearch, and I cannot find a Delete query.
Here is an example of a document in myIndex:
{
"_index": "myIndex",
"_type": "_doc",
"_id": "IPc5kn8Bq7SuVr5qM9dq",
"_score": 1,
"_source": {
"code": "1234567",
"matches": [
{
"hostname": "hostnameA.com",
"url": "https://www.hostnameA.com/...."
},
{
"hostname": "hostnameB.com",
"url": "https://www.hostnameB.com/...."
},
{
"hostname": "hostnameC.com",
"url": "https://www.hostnameC.com/...."
},
{
"hostname": "hostnameD.com",
"url": "https://www.hostnameD.com/...."
}
]
}
}
Let's say this index contains 10k documents.
I would like a query that removes every item from the matches array whose hostname is equal to hostnameC.com, while keeping all the others.
Does anyone have an idea that could help me?
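One possible approach is the Update By Query API with a Painless script that filters the array in place. The sketch below only builds the request body and mirrors the script's intent in plain Python; it has not been run against a cluster. It assumes `matches` uses the default dynamic mapping, so that a `matches.hostname.keyword` sub-field exists. The helper names are hypothetical.

```python
import copy


def build_remove_matches_request(hostname: str) -> dict:
    """Body for POST /myIndex/_update_by_query (a sketch under the assumptions above)."""
    return {
        # Only touch documents that contain the hostname at all.
        # Assumption: default dynamic mapping; a nested mapping would need a nested query.
        "query": {"term": {"matches.hostname.keyword": hostname}},
        "script": {
            "lang": "painless",
            "source": "ctx._source.matches.removeIf(m -> m.hostname == params.hostname)",
            "params": {"hostname": hostname},
        },
    }


def remove_matches_locally(source: dict, hostname: str) -> dict:
    """Pure-Python mirror of what the script is meant to do to one _source."""
    updated = copy.deepcopy(source)
    updated["matches"] = [
        m for m in updated.get("matches", []) if m.get("hostname") != hostname
    ]
    return updated


doc = {
    "code": "1234567",
    "matches": [
        {"hostname": "hostnameA.com", "url": "https://www.hostnameA.com/...."},
        {"hostname": "hostnameC.com", "url": "https://www.hostnameC.com/...."},
        {"hostname": "hostnameD.com", "url": "https://www.hostnameD.com/...."},
    ],
}

body = build_remove_matches_request("hostnameC.com")
cleaned = remove_matches_locally(doc, "hostnameC.com")
print([m["hostname"] for m in cleaned["matches"]])
```

Passing the hostname through `params` rather than concatenating it into the script source keeps the script text constant across calls.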
I have documents in Elasticsearch with the following structure:
{
"_index": "logstash-2018.05.11",
"_type": "doc",
"_id": "LSg_T2MB-uso043FSCvT",
"_version": 1,
"_source": {
"@version": "1",
"@timestamp": "2018-05-11T12:48:57.447Z",
"filename": "VARIABLEPART_COMMONPART"
},
"fields": {
"@timestamp": [
"2018-05-11T12:48:57.447Z"
]
}
}
I want to write some queries that count the documents grouped by VARIABLEPART.
I would also like to make sure that the results of my queries can be viewed in Kibana, but I do not know where to start!
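As a starting point, one option is a terms aggregation whose key is computed by a script from `filename`. The sketch below builds such a request body and mirrors the grouping in plain Python. It assumes the variable part ends at the first underscore and that a `filename.keyword` sub-field exists; the sample filenames are made up for illustration.

```python
from collections import Counter

SEPARATOR = "_"  # assumption: the first underscore ends VARIABLEPART


def build_count_request() -> dict:
    """Body for a _search request that buckets documents by the filename prefix."""
    return {
        "size": 0,
        "aggs": {
            "by_variable_part": {
                "terms": {
                    "script": {
                        "lang": "painless",
                        "source": (
                            "def f = doc['filename.keyword'].value; "
                            "int i = f.indexOf('_'); "
                            "return i < 0 ? f : f.substring(0, i);"
                        ),
                    }
                }
            }
        },
    }


def variable_part(filename: str) -> str:
    """Same prefix rule as the script: everything before the first separator."""
    head, _, _ = filename.partition(SEPARATOR)
    return head


# Hypothetical filenames, used only to show the grouping.
filenames = ["alpha_COMMONPART", "beta_COMMONPART", "alpha_COMMONPART"]
counts = Counter(variable_part(name) for name in filenames)
print(counts)
```

For Kibana, it may be simpler to extract the prefix into its own field at ingestion time and build a regular terms visualization on that field.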
I have data of this format in Elasticsearch, each one in a separate document:
{ 'pid': 1, 'nm' : 'tom'}, { 'pid': 1, 'nm' : 'dick'}, { 'pid': 1, 'nm' : 'harry'}, { 'pid': 2, 'nm' : 'tom'}, { 'pid': 2, 'nm' : 'harry'}, { 'pid': 3, 'nm' : 'dick'}, { 'pid': 3, 'nm' : 'harry'}, { 'pid': 4, 'nm' : 'harry'}
{
"took": 137,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 8,
"max_score": null,
"hits": [
{
"_index": "query_test",
"_type": "user",
"_id": "AVj9KS86AaDUbQTYUmwY",
"_score": null,
"_source": {
"pid": 1,
"nm": "Harry"
}
},
{
"_index": "query_test",
"_type": "user",
"_id": "AVj9KJ9BAaDUbQTYUmwW",
"_score": null,
"_source": {
"pid": 1,
"nm": "Tom"
}
},
{
"_index": "query_test",
"_type": "user",
"_id": "AVj9KRlbAaDUbQTYUmwX",
"_score": null,
"_source": {
"pid": 1,
"nm": "Dick"
}
},
{
"_index": "query_test",
"_type": "user",
"_id": "AVj9KYnKAaDUbQTYUmwa",
"_score": null,
"_source": {
"pid": 2,
"nm": "Harry"
}
},
{
"_index": "query_test",
"_type": "user",
"_id": "AVj9KXL5AaDUbQTYUmwZ",
"_score": null,
"_source": {
"pid": 2,
"nm": "Tom"
}
},
{
"_index": "query_test",
"_type": "user",
"_id": "AVj9KbcpAaDUbQTYUmwb",
"_score": null,
"_source": {
"pid": 3,
"nm": "Dick"
}
},
{
"_index": "query_test",
"_type": "user",
"_id": "AVj9Kdy5AaDUbQTYUmwc",
"_score": null,
"_source": {
"pid": 3,
"nm": "Harry"
}
},
{
"_index": "query_test",
"_type": "user",
"_id": "AVj9KetLAaDUbQTYUmwd",
"_score": null,
"_source": {
"pid": 4,
"nm": "Harry"
}
}
]
}
}
I need to find the pids which have 'harry' and do not have 'tom', which in the above example are 3 and 4. In other words, group the documents by pid and keep the pids where none of the documents has nm 'tom' but at least one has nm 'harry'.
How do I query that?
EDIT: Using Elasticsearch version 5
What about a POST request body like the one below, using a bool query:
POST _search
{
"query": {
"bool" : {
"must" : {
"term" : { "nm" : "harry" }
},
"must_not" : {
"term" : { "nm" : "tom" }
}
}
}
}
I am fairly new to Elasticsearch, so I might be wrong, but I have never seen such a query. Simple filters cannot be used here, because they are applied per document (not to aggregations), which is not what you want. What I see is that you want a "Group by" query with a "Having" clause (in SQL terms). But group-by queries involve some aggregation (like avg, max, or min of a field) that is then used in the "Having" clause. Basically you use a reducer to post-process the aggregation results. For queries like this, the Bucket Selector Aggregation can be used. Read this
But your case is different. You do not want to apply the "Having" clause to a metric aggregation; you want to check whether some value is present in a field (or column) of your "group by" data. In SQL terms, you want a "where" query inside a "group by". This is what I have never seen. You can also read this
However, at the application level, you can easily do this by splitting the query in two. First find the unique pids where nm = harry using a terms aggregation. Then, among those pids, find the ones that also have a doc with nm = tom and drop them.
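A rough sketch of that two-request idea follows. It builds the two request bodies and mirrors them in plain Python over the sample data from the question. The term values are lowercase on the assumption that `nm` is an analyzed text field; the aggregation size of 1000 and all helper names are arbitrary choices for illustration.

```python
DOCS = [
    {"pid": 1, "nm": "Harry"}, {"pid": 1, "nm": "Tom"}, {"pid": 1, "nm": "Dick"},
    {"pid": 2, "nm": "Harry"}, {"pid": 2, "nm": "Tom"},
    {"pid": 3, "nm": "Dick"}, {"pid": 3, "nm": "Harry"},
    {"pid": 4, "nm": "Harry"},
]


def step_one_request(wanted: str) -> dict:
    """Unique pids that have at least one doc with nm = wanted."""
    return {
        "size": 0,
        "query": {"term": {"nm": wanted}},
        "aggs": {"pids": {"terms": {"field": "pid", "size": 1000}}},
    }


def step_two_request(pids: list, unwanted: str) -> dict:
    """Among the pids from step one, those that also have nm = unwanted."""
    return {
        "size": 0,
        "query": {"bool": {"filter": [
            {"terms": {"pid": pids}},
            {"term": {"nm": unwanted}},
        ]}},
        "aggs": {"pids": {"terms": {"field": "pid", "size": 1000}}},
    }


def pids_with(docs, name, among=None):
    """Local stand-in for running one of the requests above."""
    return {
        d["pid"] for d in docs
        if d["nm"].lower() == name and (among is None or d["pid"] in among)
    }


harry_pids = pids_with(DOCS, "harry")
tom_pids = pids_with(DOCS, "tom", among=harry_pids)
result = sorted(harry_pids - tom_pids)
print(result)
```

The subtraction happens in the application, so the cost is two round trips plus a set difference.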
P.S. I am very new to ES, and I will be very happy if anyone contradicts me and shows a way to do this in one query. I will learn from that too.
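Taking up the P.S., one shape that might work in a single request is a terms aggregation on pid with two filter sub-aggregations, followed by a bucket selector on their document counts. This is a sketch and has not been run against Elasticsearch 5. The aggregation names and the size of 1000 are hypothetical, and the `>_count` buckets path is used on the understanding that it addresses the doc count of a single-bucket aggregation.

```python
from collections import defaultdict

DOCS = [
    {"pid": 1, "nm": "Harry"}, {"pid": 1, "nm": "Tom"}, {"pid": 1, "nm": "Dick"},
    {"pid": 2, "nm": "Harry"}, {"pid": 2, "nm": "Tom"},
    {"pid": 3, "nm": "Dick"}, {"pid": 3, "nm": "Harry"},
    {"pid": 4, "nm": "Harry"},
]


def build_single_request(wanted: str, unwanted: str) -> dict:
    """One _search body: bucket by pid, keep buckets with wanted and without unwanted."""
    return {
        "size": 0,
        "aggs": {
            "by_pid": {
                "terms": {"field": "pid", "size": 1000},
                "aggs": {
                    "has_wanted": {"filter": {"term": {"nm": wanted}}},
                    "has_unwanted": {"filter": {"term": {"nm": unwanted}}},
                    "keep": {
                        "bucket_selector": {
                            "buckets_path": {
                                "wanted": "has_wanted>_count",
                                "unwanted": "has_unwanted>_count",
                            },
                            "script": "params.wanted > 0 && params.unwanted == 0",
                        }
                    },
                },
            }
        },
    }


def select_pids(docs, wanted: str, unwanted: str) -> list:
    """Local mirror of the aggregation: count per pid, then apply the selector."""
    counts = defaultdict(lambda: {wanted: 0, unwanted: 0})
    for d in docs:
        bucket = counts[d["pid"]]  # every pid gets a bucket, as in the terms agg
        name = d["nm"].lower()
        if name in bucket:
            bucket[name] += 1
    return sorted(
        pid for pid, c in counts.items() if c[wanted] > 0 and c[unwanted] == 0
    )


selected = select_pids(DOCS, "harry", "tom")
print(selected)
```

The surviving bucket keys in the aggregation response would then be the pids of interest.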
We have one index with one type, clients/client, and clients can be self-referential in a parent-child hierarchy (but not using ES parent-child, as that doesn't support a self-referential structure).
We are considering using nested for this, but the hierarchy is potentially endless, which makes nested queries a bit of a hassle, or maybe even impossible.
What we primarily want to find is all top-level parents, so we build our search query by filtering/searching for all elements that don't have a reference to a parent (a simple term value with the parent ID). We also save a reference to each element's children inside that element, as a list of child IDs, so that the frontend can make subsequent requests when the user views that element, for a hierarchical visualization.
However, the thing that gives us a headache is: how do we, without post-processing, find child elements whose parent WASN'T found, i.e. orphaned children, so that they don't get lost in the process? The query described above, which finds top-level parents that each find their own children, doesn't work if the search query matches ONLY a child element. The only idea we have is doing a second request for this, but that destroys the score sorting. We have been toying with many ideas, but have fallen short of finding a one-request Elasticsearch solution for this issue. Is there such a thing?
Our data looks something like the example below, but of course we could save the entire tree in each element. The question is which approach is best.
"hits": {
"total": 5,
"max_score": 1,
"hits": [
{
"_index": "clientsv3",
"_type": "client",
"_id": "5",
"_score": 1,
"_source": {
"name": "Client 2 sub2",
"country": "Belgium",
"parentId": 2
}
},
{
"_index": "clientsv3",
"_type": "client",
"_id": "2",
"_score": 1,
"_source": {
"name": "Client 2",
"country": "France",
"children": [
3,
5
]
}
},
{
"_index": "clientsv3",
"_type": "client",
"_id": "4",
"_score": 1,
"_source": {
"name": "Client 2 sub sub",
"country": "Germany",
"parentId": 3
}
},
{
"_index": "clientsv3",
"_type": "client",
"_id": "1",
"_score": 1,
"_source": {
"name": "Client 1",
"country": "Germany"
}
},
{
"_index": "clientsv3",
"_type": "client",
"_id": "3",
"_score": 1,
"_source": {
"name": "Client 2 sub",
"country": "Germany",
"children": [
4
],
"parentId": 2
}
}
]
}
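To make the orphan condition concrete, here is a small sketch. It builds the top-level filter described above (no `parentId` field) and then classifies a set of hits client-side. The classification is the kind of post-processing the question hopes to avoid, so it only illustrates the condition and is not a one-request solution. The helper names are hypothetical.

```python
def top_level_request(user_query: dict) -> dict:
    """Wrap a user query so that only elements without a parentId match."""
    return {
        "query": {
            "bool": {
                "must": [user_query],
                "must_not": [{"exists": {"field": "parentId"}}],
            }
        }
    }


def split_roots_and_orphans(hits: list) -> tuple:
    """Roots have no parentId; orphans have a parentId that is not among the hits.

    Hits are visited in their original order, so score ordering is preserved.
    """
    hit_ids = {h["_id"] for h in hits}
    roots, orphans = [], []
    for h in hits:
        parent = h["_source"].get("parentId")
        if parent is None:
            roots.append(h["_id"])
        elif str(parent) not in hit_ids:  # _id is a string, parentId a number
            orphans.append(h["_id"])
    return roots, orphans


# A search that matched only a deep child and one unrelated root.
hits = [
    {"_id": "4", "_source": {"name": "Client 2 sub sub", "country": "Germany", "parentId": 3}},
    {"_id": "1", "_source": {"name": "Client 1", "country": "Germany"}},
]

roots, orphans = split_roots_and_orphans(hits)
print(roots, orphans)
```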
Is there a way to get only the matched keywords when searching on an analysed field? In my case I have a 'content' field (an analysed string) against which a query is run like this:
GET /posts/post/_search?pretty=true
{
"query": {
"query_string": {
"query": "content:(obama or hilary)"
}
},
"fields": ["id", "interaction_id", "sentiment", "tweet_created_at", "content"]
}
I get output like this:
"hits": [
{
"_index": "posts_v1",
"_type": "post",
"_id": "51764639fdccca097f03d095",
"_score": 2.024847,
"fields": {
"content": "UGANDA HILARY",
"id": "51764639fdccca097f03d095",
"sentiment": 0,
"tweet_created_at": "2012-11-24T14:59:25Z",
"interaction_id": "1e236478961ca480e0744001f05ca8b8"
}
},
{
"_index": "posts_v1",
"_type": "post",
"_id": "51c2bae26c8f1806cb000001",
"_score": 1.9791828,
"fields": {
"content": "Obama in Berlin — looking back",
"id": "51c2bae26c8f1806cb000001",
"sentiment": 0,
"tweet_created_at": "2013-06-20T08:18:39Z",
"interaction_id": "1e2d98202c55a980e07493a024172cb6"
}
},
{
"_index": "posts_v1",
"_type": "post",
"_id": "51c3a6b06c8f185fcb000001",
"_score": 1.7071226,
"fields": {
"content": "Knowing Barack Obama, Hilary Clintonr",
"id": "51c3a6b06c8f185fcb000001",
"sentiment": 0,
"tweet_created_at": "2013-06-21T01:04:45Z",
"interaction_id": "1e2da0e8fb5fa480e07407b3fa87ab72"
}
}
]
So, I need to have something like this:
"hits": [
{
"_index": "posts_v1",
"_type": "post",
"_id": "51764639fdccca097f03d095",
"_score": 2.024847,
"fields": {
"content": "UGANDA HILARY",
"id": "51764639fdccca097f03d095",
"sentiment": 0,
"tweet_created_at": "2012-11-24T14:59:25Z",
"interaction_id": "1e236478961ca480e0744001f05ca8b8",
"content_tags": ["hilary"]
}
},
{
"_index": "posts_v1",
"_type": "post",
"_id": "51c2bae26c8f1806cb000001",
"_score": 1.9791828,
"fields": {
"content": "Obama in Berlin — looking back",
"id": "51c2bae26c8f1806cb000001",
"sentiment": 0,
"tweet_created_at": "2013-06-20T08:18:39Z",
"interaction_id": "1e2d98202c55a980e07493a024172cb6",
"content_tags": ["obama"]
}
},
{
"_index": "posts_v1",
"_type": "post",
"_id": "51c3a6b06c8f185fcb000001",
"_score": 1.7071226,
"fields": {
"content": "Knowing Barack Obama, Hilary Clintonr",
"id": "51c3a6b06c8f185fcb000001",
"sentiment": 0,
"tweet_created_at": "2013-06-21T01:04:45Z",
"interaction_id": "1e2da0e8fb5fa480e07407b3fa87ab72",
"content_tags": ["obama", "hilary"]
}
}
]
Please note the content_tags field in the second hits structure. Is there a way to achieve this?
Elasticsearch doesn't directly support returning which terms matched which field, though I think it could be implemented reasonably easily as an additional "highlighter". I think you have two options at this point:
1. Do something hacky with highlighting: ask for the text length to be max(all_strings.map(strlen).max, min_highlight_length), strip the text that isn't highlighted, and dedupe. I believe min_highlight_length is 13 characters or something. That might only apply to the FVH, which I don't suggest you use, so maybe you can ignore that.
2. Do two searches, either via multisearch or sequentially.
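A minimal sketch of the highlighting hack, under a few assumptions. It requests the whole `content` field as one highlighted fragment (`number_of_fragments: 0`) with `<em>` tags, then keeps only the tagged words, lowercased and deduplicated. The sample hit is hand-written to show the assumed response shape rather than copied from a real cluster, and `content_tags` is computed client-side, not returned by Elasticsearch.

```python
import re

EM_TAG = re.compile(r"<em>(.*?)</em>")


def build_highlight_request() -> dict:
    """Search body that asks for the full content field with matches wrapped in <em>."""
    return {
        # Uppercase OR is the operator form in query_string syntax.
        "query": {"query_string": {"query": "content:(obama OR hilary)"}},
        "highlight": {
            "pre_tags": ["<em>"],
            "post_tags": ["</em>"],
            "fields": {"content": {"number_of_fragments": 0}},
        },
    }


def matched_terms(hit: dict, field: str = "content") -> list:
    """Strip everything that isn't highlighted, lowercase, and dedupe in order."""
    seen = []
    for fragment in hit.get("highlight", {}).get(field, []):
        for term in EM_TAG.findall(fragment):
            lowered = term.lower()
            if lowered not in seen:
                seen.append(lowered)
    return seen


# Hand-written example of the assumed response shape for one hit.
hit = {
    "_id": "51c3a6b06c8f185fcb000001",
    "highlight": {
        "content": ["Knowing Barack <em>Obama</em>, <em>Hilary</em> Clintonr"]
    },
}

content_tags = matched_terms(hit)
print(content_tags)
```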