Elasticsearch delete debug syslog messages older than x days and with a certain term - elasticsearch

I'm trying to get rid of debug syslog messages after a certain amount of time.
The query runs without errors but isn't deleting any data:
curl -XPOST 'localhost:9200/logstash-syslog-vmware/_delete_by_query?pretty' -H 'Content-Type: application/json' -d '
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "syslog_severity": "debug"
        }
      },
      "filter": {
        "range": {
          "#timestamp": {
            "gt": "2017-10-13T09:00:00",
            "lt": "2017-10-13T11:30:00"
          }
        }
      }
    }
  }
}'

This isn't the fix that actually solved the problem (per the comments, using a wildcard in the index name), but let me add something.
If you intend to purge the old debug-level documents every day, then I'd recommend using separate indices:
one for the trace and debug levels, and another for everything else.
Then when you need to purge old data, just drop the index, e.g. DELETE logstash-syslog-vmware-debug-2017-10-13.
It will be much more efficient.
If it was only a one-off operation, then feel free to ignore me :)
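For reference, the fix from the comments was to put a wildcard in the index name so the query also hits the date-suffixed Logstash indices. A minimal sketch of that, assuming indices named logstash-syslog-vmware-* and the standard Logstash @timestamp field (the field name and the now-30d cutoff are assumptions; adjust them to your data):
# assumes date-suffixed indices and the standard Logstash @timestamp field
curl -XPOST 'localhost:9200/logstash-syslog-vmware-*/_delete_by_query?pretty' -H 'Content-Type: application/json' -d '
{
  "query": {
    "bool": {
      "must": {
        "match": { "syslog_severity": "debug" }
      },
      "filter": {
        "range": {
          "@timestamp": {
            "lt": "now-30d"
          }
        }
      }
    }
  }
}'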

Related

Elasticsearch Query "must match" in log

I have the following in my log that I would like to find using an Elasticsearch query:
2014-07-02 20:52:39 INFO home.helloworld: LOGGER/LOG:ID1234 has successfully been received, {"uuid"="abc123"}
2014-07-02 20:52:39 INFO home.helloworld: LOGGER/LOG:ID1234 has successfully been transferred, {"uuid"="abc123"}
2014-07-02 20:52:39 INFO home.byebyeworld: LOGGER/LOG:ID1234 has successfully been processed, {"uuid"="abc123"}
2014-07-02 20:52:39 INFO home.byebyeworld: LOGGER/LOG:ID1234 has exited, {"uuid"="abc123"}
2014-07-02 20:53:00 INFO home.helloworld: LOGGER/LOG:ID1234 has successfully been received, {"uuid"="def123"}
2014-07-02 20:53:00 INFO home.helloworld: LOGGER/LOG:ID1234 has successfully been transferred, {"uuid"="def123"}
2014-07-02 20:53:00 INFO home.byebyeworld: LOGGER/LOG:ID1234 has successfully been processed, {"uuid"="def123"}
2014-07-02 20:53:00 INFO home.byebyeworld: LOGGER/LOG:ID1234 has exited, {"uuid"="def123"}
Since each of the above lines is represented as a single "message" in Elasticsearch, I have a hard time querying it using POST REST calls. I tried using "must match" as below to get only line 1 of my log, but it is not consistent; sometimes it returns multiple hits instead of just 1:
{
  "query" : {
    "constant_score" : {
      "filter" : {
        "bool" : {
          "must" : [
            { "match_phrase_prefix" : { "message" : "home.helloworld:" } },
            { "match_phrase_prefix" : { "message" : "LOGGER/LOG:ID1234" } },
            { "match" : { "message" : "received, {\"uuid\"=\"abc123\"}" } }
          ]
        }
      }
    }
  }
}
Am I doing anything wrong with the above Elasticsearch query? I thought "must" is equal to AND, "match" is more like CONTAINS, and "match_phrase_prefix" is like STARTSWITH. Can someone please show me how to properly query a log filled with entries like the above, with different uuid values, and return only a single hit? Originally I thought I had the query down; it first returned just 1 hit, but then it returned 2, then a lot more, which to me is not consistent. Thank you in advance!
The problem is with the third clause of your bool query. Let me give you a couple of queries that will work for you, and I'll explain why they do the job.
First Query
curl -XGET http://localhost:9200/my_logs/_search -d '
{
  "query" : {
    "constant_score" : {
      "filter" : {
        "bool" : {
          "must" : [
            { "match_phrase_prefix" : { "message" : "home.helloworld:" } },
            { "match_phrase_prefix" : { "message" : "LOGGER/LOG:ID1234" } },
            { "match" : {
                "message" : {
                  "query": "received, {\"uuid\"=\"abc123\"",
                  "operator": "and"
                }
              }
            }
          ]
        }
      }
    }
  }
}'
Explanation
Let's make sure we are on the same page about indexing. By default, the indexer passes your data through the standard analysis chain: splitting on whitespace, stripping special characters, lower-casing, etc. So the index ends up containing just the tokens and their positions.
The match query, being a full-text query, takes your query text "received, {\"uuid\"=\"abc123\"" and passes it through the same analysis. By default this analysis splits the text on whitespace, strips special characters, lower-cases it, etc. The result would look similar to this (simplified): received, uuid, abc123.
What the match query then does is combine those tokens against the message field using the default operator (which is or). So, as a logical expression, the very last clause (the match query) looks like this: message:received OR message:uuid OR message:abc123.
This is why the first 4 log entries match. I was able to reproduce it.
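If you want to check those tokens yourself, the _analyze API shows what the analyzer produces. A minimal sketch, assuming the field uses the standard analyzer (the JSON body shown here is the newer request format; older versions take the text as a query parameter instead):
# assumes the standard analyzer; request format varies slightly between versions
curl -XGET 'localhost:9200/_analyze?pretty' -H 'Content-Type: application/json' -d '
{
  "analyzer": "standard",
  "text": "received, {\"uuid\"=\"abc123\"}"
}'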
Second Query
curl -XGET http://localhost:9200/my_logs/_search -d '
{
  "query" : {
    "constant_score" : {
      "filter" : {
        "bool" : {
          "must" : [
            { "match_phrase_prefix" : { "message" : "home.helloworld:" } },
            { "match_phrase_prefix" : { "message" : "LOGGER/LOG:ID1234" } },
            { "match_phrase_prefix" : { "message" : "received, {\"uuid\"=\"abc123\"" } }
          ]
        }
      }
    }
  }
}'
Explanation
This one is a little simpler. Remember: our indexing process left tokens and their positions in the index.
What the match_phrase_prefix query actually does is take the input query (let's take "received, {\"uuid\"=\"abc123\"" as an example), perform exactly the same query-text analysis, and then try to find the tokens received, uuid, abc123 at neighboring positions in the index, in exactly the same order: received -> uuid -> abc123 (almost).
The exception is the very last token, which in our case is abc123. To be precise, it turns the last token into a wildcard, i.e. received -> uuid -> abc123*.
To be a perfectionist, I'd add that received -> uuid -> abc123 (i.e. without the wildcard at the end) is what the match_phrase query does. It also takes the positions in the index into account, i.e. it tries to match the 'phrase', not just separate tokens at arbitrary positions.
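So if you already know the exact uuid, a plain match_phrase clause expresses the same intent without the trailing wildcard. A minimal sketch against the same hypothetical my_logs index:
# my_logs is the same hypothetical index used above
curl -XGET http://localhost:9200/my_logs/_search -d '
{
  "query" : {
    "match_phrase" : {
      "message" : "received, {\"uuid\"=\"abc123\"}"
    }
  }
}'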

how to copy ElasticSearch field to another field

I have a 100GB ES index now. I need to change one field to a multi-field, such as username to username.username and username.raw (not_analyzed). I know this will apply to incoming data, but how can I make the change affect the old data as well? Should I use scan/scroll to copy the whole index to a new one, or is there a better solution to just copy one field?
There's a way to achieve this without reindexing all your data by using the update by query plugin.
Basically, after installing the plugin, you can run the following query and all your documents will get the multi-field re-populated.
curl -XPOST 'localhost:9200/your_index/_update_by_query' -d '{
  "query" : {
    "match_all" : {}
  },
  "script" : "ctx._source.username = ctx._source.username;"
}'
It might take a while to run over 100GB of docs, but after it completes, the username.raw field will be populated.
Note: for this plugin to work, one needs to have scripting enabled.
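For completeness, the username.raw multi-field has to exist in the mapping before the re-population pass, otherwise there is nothing for it to fill. A hedged sketch of that mapping change, assuming a pre-5.x string field and hypothetical your_index/your_type names:
# your_index / your_type are placeholders; pre-5.x string mapping syntax
curl -XPUT 'localhost:9200/your_index/_mapping/your_type' -d '{
  "properties": {
    "username": {
      "type": "string",
      "fields": {
        "raw": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}'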
POST index/type/_update_by_query
{
  "query" : {
    "match_all" : {}
  },
  "script" : {
    "inline" : "ctx._source.username = ctx._source.username;",
    "lang" : "painless"
  }
}
This worked for me on ES 5.6; the one above did not!

Elasticsearch has_child query/filter in Kibana 4

I cannot seem to get the has_child query (or filter) to function in Kibana 4. My code works in elasticsearch directly as a curl script, but not in Kibana 4, yet I understood this was a key feature of the upgrade. Can anybody shed any light?
The curl script as follows works in elasticsearch, returning all of the parents where they have a child object:
curl -XPOST localhost:port/indexname/_search?pretty -d '{
  "query" : {
    "has_child" : {
      "type" : "object",
      "query" : {
        "match_all" : {}
      }
    }
  }
}'
The above runs fine. Then, to convert it to the JSON query to submit within Kibana, I've followed the general formatting rules: I've dropped the curl line and added the index name (and sometimes a blank filter [], but it doesn't seem to make much difference); no error is thrown, but the whole dataset is returned.
{
  "index" : "indexname",
  "query" : {
    "has_child" : {
      "type" : "object",
      "query" : {
        "match_all" : {}
      }
    }
  }
}
Am I missing something? Has anybody else got a has_child query to run in Kibana 4?
Many thanks in advance
Toby

Delete all documents from index/type without deleting type

I know one can delete all documents from a certain type via deleteByQuery.
Example:
curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
  "query" : {
    "term" : { "user" : "kimchy" }
  }
}'
But I have NO term and simply want to delete all documents from that type, regardless of any term. What is the best practice to achieve this? An empty term does not work.
Link to deleteByQuery
I believe if you combine the delete by query with a match all it should do what you are looking for, something like this (using your example):
curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
  "query" : {
    "match_all" : {}
  }
}'
Or you could just delete the type:
curl -XDELETE http://localhost:9200/twitter/tweet
Note: XDELETE is deprecated for later versions of ElasticSearch
The Delete-By-Query plugin has been removed in favor of a new Delete By Query API implementation in core. Read here
curl -XPOST 'localhost:9200/twitter/tweet/_delete_by_query?conflicts=proceed&pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_all": {}
  }
}'
From Elasticsearch 5.x, the delete_by_query API is available by default:
POST http://localhost:9200/index/type/_delete_by_query
{
  "query": {
    "match_all": {}
  }
}
You can delete documents from a type with the following query:
POST /index/type/_delete_by_query
{
  "query" : {
    "match_all" : {}
  }
}
I tested this query in Kibana with Elasticsearch 5.5.2.
Torsten Engelbrecht's comment on John Petrone's answer, expanded:
curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
  "query" : {
    "match_all" : {}
  }
}'
(I did not want to edit John's reply, since it got upvotes and is set as answer, and I might have introduced an error)
Starting from Elasticsearch 2.x, delete-by-query is no longer allowed in core, since documents would remain in the index and could cause index corruption.
As of Elasticsearch 7.x, the delete-by-query plugin has been removed in favor of the new Delete By Query API.
The curl option:
curl -X POST "localhost:9200/my-index/_delete_by_query" -H 'Content-Type: application/json' -d' { "query": { "match_all":{} } } '
Or in Kibana
POST /my-index/_delete_by_query
{
  "query": {
    "match_all": {}
  }
}
The above answers no longer work with ES 6.2.2 because of Strict Content-Type Checking for Elasticsearch REST Requests. The curl command which I ended up using is this:
curl -H'Content-Type: application/json' -XPOST 'localhost:9200/yourindex/_doc/_delete_by_query?conflicts=proceed' -d' { "query": { "match_all": {} }}'
In Kibana Console:
POST calls-xin-test-2/_delete_by_query
{
  "query": {
    "match_all": {}
  }
}
(Reputation not high enough to comment)
The second part of John Petrone's answer works, no query needed. It will delete the type and all documents contained in that type, but the type can simply be re-created whenever you index a new document to it.
Just to clarify:
$ curl -XDELETE 'http://localhost:9200/twitter/tweet'
Note: this does delete the mapping! But as mentioned before, it can be easily re-mapped by creating a new document.
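A minimal sketch of that re-creation, using the same twitter/tweet example and assuming dynamic mapping is enabled (note that the dynamically generated mapping may differ from the one you deleted):
# re-creates the tweet type via dynamic mapping; document body is a placeholder
curl -XPOST 'http://localhost:9200/twitter/tweet' -d '{
  "user": "kimchy",
  "message": "first tweet after the type was dropped"
}'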
Note for ES2+
Starting with ES 1.5.3 the delete-by-query API is deprecated, and it was completely removed in ES 2.0.
Instead of being part of the core API, Delete By Query is now a plugin.
In order to use the Delete By Query plugin you must install the plugin on all nodes of the cluster:
sudo bin/plugin install delete-by-query
All of the nodes must be restarted after the installation.
The usage of the plugin is the same as the old API. You don't need to change anything in your queries - this plugin will just make them work.
*For complete information regarding WHY the API was removed you can read more here.
You have these alternatives:
1) Delete a whole index:
curl -XDELETE 'http://localhost:9200/indexName'
example:
curl -XDELETE 'http://localhost:9200/mentorz'
More details can be found here: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-delete-index.html
2) Delete by query for the documents that match:
curl -XDELETE 'http://localhost:9200/mentorz/users/_query' -d '{
  "query": {
    "match_all": {}
  }
}'
*Here mentorz is an index name and users is a type
I'm using elasticsearch 7.5 and when I use
curl -XPOST 'localhost:9200/materials/_delete_by_query?conflicts=proceed&pretty' -d'
{
  "query": {
    "match_all": {}
  }
}'
it throws the error below:
{
  "error" : "Content-Type header [application/x-www-form-urlencoded] is not supported",
  "status" : 406
}
I also needed to add an extra -H 'Content-Type: application/json' header to the request to make it work:
curl -XPOST 'localhost:9200/materials/_delete_by_query?conflicts=proceed&pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_all": {}
  }
}'
{
  "took" : 465,
  "timed_out" : false,
  "total" : 2275,
  "deleted" : 2275,
  "batches" : 3,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}
Just to add a couple of cents to this.
The "delete_by_query" mentioned at the top is still available as a plugin in Elasticsearch 2.x.
Although in the upcoming version 5.x it will be replaced by the "delete by query API".
In Elasticsearch 2.3 the option
action.destructive_requires_name: true
in elasticsearch.yml does the trick:
curl -XDELETE http://localhost:9200/twitter/tweet
For future readers:
in Elasticsearch 7.x there's effectively one type per index - types are hidden
you can delete by query, but if you want to remove everything you'll be much better off removing and re-creating the index. That's because deletes are only soft deletes under the hood until they trigger Lucene segment merges*, which can be expensive if the index is large. Meanwhile, removing an index is almost instant: it removes some files on disk and a reference in the cluster state.
* The video/slides are about Solr, but things work exactly the same way in Elasticsearch; this is Lucene-level functionality.
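A minimal sketch of the drop-and-recreate approach, assuming a hypothetical index named my-index (any settings and mappings have to be re-applied when the index is created again):
# my-index and the settings shown are placeholders
curl -XDELETE 'localhost:9200/my-index?pretty'
curl -XPUT 'localhost:9200/my-index?pretty' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}'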
If you want to delete documents according to a date, you can use the Kibana console (v6.1.2):
POST index_name/_delete_by_query
{
  "query" : {
    "range" : {
      "sendDate" : {
        "lte" : "2018-03-06"
      }
    }
  }
}

What's the reason for specifying only the 'field' option for the Term & Phrase suggesters in elasticsearch

When using the suggester API, we are forced to specify the field option:
"suggest" : {
"text" : "val",
"sug_name" : {
"term" : {
"field" : "field_name"
}
}
}
Is this field supposed to be a valid field name of some type?
If so, fields can exist only in the context of types, AFAIK.
Why isn't it possible to also specify (at least optionally) the type the field belongs to?
Is your question whether "field" has to be a valid field?
YES, it does if you want it to find anything; you are welcome to search for fields that don't exist, although that seems an odd thing to do.
As for your second question, the answer, I believe, is NO, you cannot specify a _type using the _suggest API. You can, however, use a suggest block with the _search API, as shown here:
curl -s -XPOST 'localhost:9200/_search' -d '{
  "query" : {
    ...
  },
  "suggest" : {
    ...
  }
}'
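A minimal sketch of such a combined request, with the elided parts filled in by placeholders: my_index, the match clause, and the suggest text are assumptions, not part of the original answer:
# my_index, the match clause, and the suggest text are placeholders
curl -s -XPOST 'localhost:9200/my_index/_search?pretty' -d '{
  "query" : {
    "match" : {
      "field_name" : "val"
    }
  },
  "suggest" : {
    "text" : "val",
    "sug_name" : {
      "term" : {
        "field" : "field_name"
      }
    }
  }
}'
Pointing the request at a specific index at least scopes the suggester to that index's fields, even though a type still cannot be specified.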
