I'm new to Elasticsearch, and I cannot find a delete query for this.
Here is an example of a document in myIndex:
{
  "_index": "myIndex",
  "_type": "_doc",
  "_id": "IPc5kn8Bq7SuVr5qM9dq",
  "_score": 1,
  "_source": {
    "code": "1234567",
    "matches": [
      {
        "hostname": "hostnameA.com",
        "url": "https://www.hostnameA.com/...."
      },
      {
        "hostname": "hostnameB.com",
        "url": "https://www.hostnameB.com/...."
      },
      {
        "hostname": "hostnameC.com",
        "url": "https://www.hostnameC.com/...."
      },
      {
        "hostname": "hostnameD.com",
        "url": "https://www.hostnameD.com/...."
      }
    ]
  }
}
Let's say this index contains 10k documents.
I would like a query that removes every item from the matches array where the hostname equals hostnameC.com, while keeping all the others.
Does anyone have an idea to help me?
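A minimal sketch of one way to do this, assuming the index is named myIndex as above and that scripting is enabled: _update_by_query with a Painless removeIf, which deletes the matching array entries in place while leaving the rest of each document untouched.

```
POST myIndex/_update_by_query
{
  "query": {
    "match": { "matches.hostname": "hostnameC.com" }
  },
  "script": {
    "lang": "painless",
    "source": "ctx._source.matches.removeIf(m -> m.hostname == params.host)",
    "params": { "host": "hostnameC.com" }
  }
}
```

The query clause limits the update to documents that actually contain hostnameC.com, so the rest of the 10k documents are not rewritten.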
I have a requirement to find the number of mobile applications registered by a customer. The Elastic Search index is designed as below (mobile apps in one index, customers in one index, and the association between both in a 3rd index). When I created a Kibana index pattern for these 3 indices together, it did not provide a meaningful/valid set of fields to query them.
mobile_users
{
"_index": "mobile_users",
"_type": "_doc",
"_id": "mobileuser_id1",
"_score": 1,
"_source": {
"userid": "mobileuser_id1",
"name": "jack",
"username": "jtest",
"identifiers": [ ],
"contactEmails": [ ],
"creationDate": "2020-09-29 09:18:36 GMT",
"lastUpdated": 1601371117354,
"isSuspended": false,
"authStrategyIds": [ ],
"subscription": false
}
}
mobile_applications
{
"_index": "mobile_applications",
"_type": "_doc",
"_id": "mobileapp_id1",
"_source": {
"appDefinition": {
"info": {
"version": "1.0",
"title": "TEST.MobileAPP"
},
"AppDisplayName": "TEST.MobileAPP1.0",
"appName": "TEST.MobileAPP",
"appVersion": "1.0",
"maturityState": "Test",
"isActive": false,
"owner": "mobileappowner",
"creationDate": "2020-09-24 11:21:44 GMT",
"lastModified": "2020-10-13 11:58:22 GMT",
"id": "mobileapp_id1"
}
}
}
registered_mobile_applications
{
"_index": "registered_mobile_applications",
"_type": "_doc",
"_id": "mobileuser_id1",
"_version": 1,
"_score": 1,
"_source": {
"applicationId": "mobileuser_id1",
"mobileappIds": [
"mobileapp_id1", "mobileapp_id2"
],
"lastUpdated": 1601371117929
}
}
Can you advise if there is any way to get the count of registered applications for the given customer?
it's Elasticsearch, not Elastic Search :)
Given that each of your document structures is dramatically different, it's not surprising you can't get much meaning from a single index pattern.
However, there's no way to natively count the values of an array in a document in Kibana. You could create a scripted field that should do it, or add the count as a separate field during ingestion.
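As a sketch of the scripted-field route, a one-line Painless expression (the field name and the .keyword subfield are assumptions based on default dynamic mapping):

```
doc['mobileappIds.keyword'].size()
```

Added as a scripted field on the registered_mobile_applications index pattern, this gives the number of registered application ids per document; filtering on applicationId then gives the count for one customer. Note that doc values are deduplicated, which is fine here as long as the ids in the array are unique.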
I would like to remove member2 from members. I saw the script
ctx._source.list_data.removeIf{list_item -> list_item.list_id == remove_id}
for a list, but in my case it's not working. Is that possible?
{
"_index": "test",
"_type": "test",
"_id": "5",
"_score": 1.0,
"_source": {
"id": "1",
"description": "desc",
"name": "ss",
"members": {
"member1": {
"id": "2",
"role": "owner"
},
"member2": {
"role": "owner",
"id": "3"
}
}
}
}
You can use the update API:
POST test/_update/5
{
"script": "ctx._source.members.remove('member2')"
}
removeIf is for lists. Your member2 is of type object, so you need to use remove:
{
  "script": "if (ctx._source.members.member2.id == '3') { ctx._source.members.remove('member2') }"
}
I am using ES 6.5. When I fetch the required messages, I have to transpose and aggregate them. See the example for more details.
Messages retrieved (2 messages, for example):
{
"_index": "index_name",
"_type": "data",
"_id": "data_id",
"_score": 5.0851293,
"_source": {
"header": {
"id": "System_20190729152502239_57246_16667",
"creationTimestamp": "2019-07-29T15:25:02.239Z"
},
"messageData": {
"messageHeader": {
"date": "2019-06-03",
"mId": "1000",
"mDescription": "TEST"
},
"messageBreakDown": [
{
"category": "New",
"subCategory": "Sub",
"messageDetails": [
{
"Amount": 5.30
}
]
}
]
}
}
},
{
"_index": "index_name",
"_type": "data",
"_id": "data_id",
"_score": 5.09512,
"_source": {
"header": {
"id": "System_20190729152502239_57246_16667",
"creationTimestamp": "2019-07-29T15:25:02.239Z"
},
"messageData": {
"messageHeader": {
"date": "2019-06-03",
"mId": "1000",
"mDescription": "TEST"
},
"messageBreakDown": [
{
"category": "Old",
"subCategory": "Sub",
"messageDetails": [
{
"Amount": 4.30
}
]
}
]
}
}
}
Now I am looking for a query to post to ES which will transpose the data and group by category and subCategory.
Basically, if you check the messages, they have the same header.id (which is the main search criterion). Within this header.id, one message is for category New and the other for Old (messageData.messageBreakDown is an array that contains the category value).
So ideally, as you can see in the output, both messages belong to the same mId, and it has a New price and an Old price.
How do I aggregate for the desired results?
The final output message can have the desired fields only, e.g. date, mId, mDescription, New price and Old price (both in one output).
UPDATE:
Below is the mapping,
{
  "index_name": {
    "mappings": {
      "data": {
        "properties": {
          "header": {
            "properties": {
              "id": {
                "type": "text",
                "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
              },
              "creationTimestamp": { "type": "date" }
            }
          },
          "messageData": {
            "properties": {
              "messageBreakDown": {
                "properties": {
                  "category": {
                    "type": "text",
                    "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
                  },
                  "messageDetails": {
                    "properties": { "Amount": { "type": "float" } }
                  },
                  "subCategory": {
                    "type": "text",
                    "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
                  }
                }
              },
              "messageHeader": {
                "properties": {
                  "mDescription": {
                    "type": "text",
                    "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
                  },
                  "mId": {
                    "type": "text",
                    "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
                  },
                  "date": { "type": "date" }
                }
              }
            }
          }
        }
      }
    }
  }
}
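Under that mapping, one hedged sketch of the aggregation side (the header.id value is taken from the sample messages; note that messageBreakDown is mapped as a plain object rather than nested, so values from different array elements can cross-match, and a nested mapping would be needed for fully reliable pairing):

```
GET index_name/_search
{
  "size": 0,
  "query": {
    "term": { "header.id.keyword": "System_20190729152502239_57246_16667" }
  },
  "aggs": {
    "by_category": {
      "terms": { "field": "messageData.messageBreakDown.category.keyword" },
      "aggs": {
        "by_subcategory": {
          "terms": { "field": "messageData.messageBreakDown.subCategory.keyword" },
          "aggs": {
            "amount": { "sum": { "field": "messageData.messageBreakDown.messageDetails.Amount" } }
          }
        }
      }
    }
  }
}
```

This returns one bucket per category (New, Old) with the summed Amount; the client can then merge the buckets into a single output message with date, mId, and mDescription.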
I would like to update the logdate field for ALL records in a specific index. From what I have read so far, it seems that this is not possible? Am I correct?
Here's a sample of a document:
{
"_index": "logstash-01-2015",
"_type": "ufdb",
"_id": "AU__EvrALg15uxY1Wxf9",
"_score": 1,
"_source": {
"message": "2015-08-14 06:50:05 [31946] PASS level2 10.249.10.70 level2 ads http://ad.360yield.com/unpixel.... GET",
"#version": "1",
"#timestamp": "2015-09-24T11:17:57.389Z",
"type": "ufdb",
"file": "/usr/local/ufdbguard/logs/ufdbguardd.log",
"host": "PROXY-DEV",
"offset": "3983281700",
"logdate": "2015-08-14T04:50:05.000Z",
"status": "PASS",
"group": "level2",
"clientip": "10.249.10.70",
"category": "ads",
"url": "http://ad.360yield.com/unpixel....",
"method": "GET",
"tags": [
"_grokparsefailure"
]
}
}
You are correct, that is not possible.
There's been an open issue asking for Update by Query for a long time, and I'm not sure it's going to be implemented anytime soon, since it is very problematic for the underlying Lucene engine: it requires deleting all documents and reindexing them.
An Update by Query plugin is available on GitHub, but it's experimental and I never tried it.
UPDATE 2018-05-02
The original answer is quite old. Update By Query is now supported.
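For the original question, an Update By Query sketch (assuming a recent Elasticsearch version; the new date value is a placeholder). Omitting the query clause applies the script to every document in the index:

```
POST logstash-01-2015/_update_by_query
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.logdate = params.newdate",
    "params": { "newdate": "2015-09-25T12:20:00.000Z" }
  }
}
```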
You can use the partial update API.
To test it, I created a trivial index:
PUT /test_index
Then created a document:
PUT /test_index/doc/1
{
"message": "2015-08-14 06:50:05 [31946] PASS level2 10.249.10.70 level2 ads http://ad.360yield.com/unpixel.... GET",
"#version": "1",
"#timestamp": "2015-09-24T11:17:57.389Z",
"type": "ufdb",
"file": "/usr/local/ufdbguard/logs/ufdbguardd.log",
"host": "PROXY-DEV",
"offset": "3983281700",
"logdate": "2015-08-14T04:50:05.000Z",
"status": "PASS",
"group": "level2",
"clientip": "10.249.10.70",
"category": "ads",
"url": "http://ad.360yield.com/unpixel....",
"method": "GET",
"tags": [
"_grokparsefailure"
]
}
Now I can do a partial update on the document with:
POST /test_index/doc/1/_update
{
"doc": {
"logdate": "2015-09-25T12:20:00.000Z"
}
}
If I retrieve the document:
GET /test_index/doc/1
I will see that the logdate property has been updated:
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_version": 2,
"found": true,
"_source": {
"message": "2015-08-14 06:50:05 [31946] PASS level2 10.249.10.70 level2 ads http://ad.360yield.com/unpixel.... GET",
"#version": "1",
"#timestamp": "2015-09-24T11:17:57.389Z",
"type": "ufdb",
"file": "/usr/local/ufdbguard/logs/ufdbguardd.log",
"host": "PROXY-DEV",
"offset": "3983281700",
"logdate": "2015-09-25T12:20:00.000Z",
"status": "PASS",
"group": "level2",
"clientip": "10.249.10.70",
"category": "ads",
"url": "http://ad.360yield.com/unpixel....",
"method": "GET",
"tags": [
"_grokparsefailure"
]
}
}
Here is the code I used to test it:
http://sense.qbox.io/gist/236bf271df6d867f5f0c87eacab592e41d3095cf
Is there a way to get only the matched keywords while searching on an analysed field? My case is: I have a 'content' field (string, analysed) against which a query is run like this:
GET /posts/post/_search?pretty=true
{
"query": {
"query_string": {
"query": "content:(obama or hilary)"
}
},
"fields": ["id", "interaction_id", "sentiment", "tweet_created_at", "content"]
}
I get output like this:
"hits": [
{
"_index": "posts_v1",
"_type": "post",
"_id": "51764639fdccca097f03d095",
"_score": 2.024847,
"fields": {
"content": "UGANDA HILARY",
"id": "51764639fdccca097f03d095",
"sentiment": 0,
"tweet_created_at": "2012-11-24T14:59:25Z",
"interaction_id": "1e236478961ca480e0744001f05ca8b8"
}
},
{
"_index": "posts_v1",
"_type": "post",
"_id": "51c2bae26c8f1806cb000001",
"_score": 1.9791828,
"fields": {
"content": "Obama in Berlin — looking back",
"id": "51c2bae26c8f1806cb000001",
"sentiment": 0,
"tweet_created_at": "2013-06-20T08:18:39Z",
"interaction_id": "1e2d98202c55a980e07493a024172cb6"
}
},
{
"_index": "posts_v1",
"_type": "post",
"_id": "51c3a6b06c8f185fcb000001",
"_score": 1.7071226,
"fields": {
"content": "Knowing Barack Obama, Hilary Clintonr",
"id": "51c3a6b06c8f185fcb000001",
"sentiment": 0,
"tweet_created_at": "2013-06-21T01:04:45Z",
"interaction_id": "1e2da0e8fb5fa480e07407b3fa87ab72"
}
}
]
So, I need to have something like this:
"hits": [
{
"_index": "posts_v1",
"_type": "post",
"_id": "51764639fdccca097f03d095",
"_score": 2.024847,
"fields": {
"content": "UGANDA HILARY",
"id": "51764639fdccca097f03d095",
"sentiment": 0,
"tweet_created_at": "2012-11-24T14:59:25Z",
"interaction_id": "1e236478961ca480e0744001f05ca8b8",
"content_tags": ["hilary"]
}
},
{
"_index": "posts_v1",
"_type": "post",
"_id": "51c2bae26c8f1806cb000001",
"_score": 1.9791828,
"fields": {
"content": "Obama in Berlin — looking back",
"id": "51c2bae26c8f1806cb000001",
"sentiment": 0,
"tweet_created_at": "2013-06-20T08:18:39Z",
"interaction_id": "1e2d98202c55a980e07493a024172cb6",
"content_tags": ["obama"]
}
},
{
"_index": "posts_v1",
"_type": "post",
"_id": "51c3a6b06c8f185fcb000001",
"_score": 1.7071226,
"fields": {
"content": "Knowing Barack Obama, Hilary Clintonr",
"id": "51c3a6b06c8f185fcb000001",
"sentiment": 0,
"tweet_created_at": "2013-06-21T01:04:45Z",
"interaction_id": "1e2da0e8fb5fa480e07407b3fa87ab72",
"content_tags": ["obama", "hilary"]
}
}
]
Please note the content_tags field in the second hits structure. Is there a way to achieve this?
Elasticsearch doesn't support returning which terms matched which field directly, though I think it could be implemented reasonably easily as an additional "highlighter". I think you have two options at this point:
Do something hacky with highlighting, like asking for the fragment length to be max(all_strings.map(strlen).max, min_highlight_length), stripping the text that isn't highlighted, and deduping. I believe min_highlight_length is 13 characters or something. That might only apply to the FVH, which I don't suggest you use, so maybe you can ignore that.
Do two searches, either via multisearch or sequentially.
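The strip-and-dedupe step of the first option can be sketched client-side (plain Python; assumes the search request asked for highlighting on content and that the response fragments wrap matches in the default <em> tags):

```python
import re

def matched_terms(highlight_fragments):
    """Extract the highlighted (i.e. matched) terms from a hit's highlight
    fragments, lowercased and deduplicated, preserving first-seen order."""
    seen = []
    for fragment in highlight_fragments:
        for term in re.findall(r"<em>(.*?)</em>", fragment):
            term = term.lower()
            if term not in seen:
                seen.append(term)
    return seen

# e.g. a fragment returned for the "Knowing Barack Obama, Hilary Clintonr" hit
print(matched_terms(["Knowing Barack <em>Obama</em>, <em>Hilary</em> Clintonr"]))
# ['obama', 'hilary']
```

Run over each hit's highlight fragments, this yields a content_tags-style list per hit.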