How do we delete only certain events in an index in elasticsearch? - elasticsearch

I wanted to know if there is any command that can be used to delete a specific event after indexing. I'm using windows.

You can delete a specific document in your Elasticsearch index. You just need to know the index name in which it resides, its mapping type, and its id (e.g. AVEf9). Then you can use the delete API in order to achieve it.
curl -XDELETE http://localhost:9200/index_name/type_name/AVEf9

I guess marvel-sense is helpful when you want to run large queries. I used the following to delete data which existed before 10 minutes from now.
DELETE /movie1_indexer/movie/_query
{
"query": {
"range": {
"#timestamp":{
"lt":"now-10m"
}
}
}
}
lt-less than.
https://www.elastic.co/guide/en/elasticsearch/guide/current/time-based.html

You can use CURL or WGET (you might want to install CYGWIN or something similar):
The following example shows how to delete tweets from the twitter index that have the user value of "kimchy". Read about the elasticsearch query structure to understand the JSON in the 2nd example
Search API -> https://www.elastic.co/guide/en/elasticsearch/reference/1.4/search-search.html
Delete via Query -> https://www.elastic.co/guide/en/elasticsearch/reference/1.7/docs-delete-by-query.html
$ curl -XDELETE 'http://localhost:9200/twitter/tweet/_query?q=user:kimchy'
$ curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
"query" : {
"term" : { "user" : "kimchy" }
}
}
You could also install REST Easy or some similar browser plugin:
https://chrome.google.com/webstore/detail/resteasy/nojelkgnnpdmhpankkiikipkmhgafoch?hl=en-US

Related

How to create document in elasticsearch to save data and to search it?

Here it is my requirement, This is my 3levels of data which I am gettting from DB , my requirement is when I search for Developer I should get all the values of Developer such as Geo and Graph from Data2 in a list and while coming to support my values should contain Server and Data in a list and then on the basis of selection of Data1 . Data3 should be able to do the search , like suppose when we select developer then Geopos and Graphpos...
the logic which i need to use here is of elasticsearch
data1 data2 data3
Developer GEO GeoPos
Developer GRAPH GraphPos
Support SERVER ServerPos
Support Data DataPos
this is what I have done to crete the index and to get the values
curl -X PUT http://localhost:9200/mapping_log
{ "mappings":{ "properties":{"data1:{"type": "text","fields":{"keyword":{"type":"keyword"}}}, {"data2":{"type": "text","fields":{"keyword":{"type":"keyword"}}}, {"data3":{"type": "text","fields":{"keyword":{"type":"keyword"}}}, } } } 
searching values , I am not sure what I am going to get can u pls help with search dsl query too
curl -X GET "localhost:9200/mapping_log/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"match": {
"data1.data2": "product"
}
}
}
How to create document for such type of Data can we create json and post it through postman or curl ?
If your documents are not indexed in elastic search first you need to ingest them to an existing index in elastic with the aid of Logstah , you can find many configuration file related to you input database.
Before transforming your documents create and index in elastic with multi fields mapping, you can use dynamic mapping(elastic default mapping) also and change your Dsl query but I recommend to use multi fields mapping as follow
PUT /mapping{
"mappings":
{"properties": {"rating":{"type": "float"},
"content":{"type": "text"},
"author":{"properties": {
"name":{"type": "text"},
"email":{"type": "keyword"}
}}
}}
}
The result will be
Mapping result
then you can query the fields in kibana Dev tools with DSL query like below
GET /mapping/_search{
"query": {"match":
{ "author.email": "SOMEMAIL"}}
}

How to delete data from a specific index in elasticsearch after a certain period?

I have an index in elasticsearch with is occupied by some json files with respected to timestamp.
I want to delete data from that index.
curl -XDELETE http://localhost:9200/index_name
Above code deletes the whole index. My requirement is to delete certain data after a time period(for example after 1 week). Could I automate the deletion process?
I tried to delete by using curator.
But I think it deletes the indexes created by timestamp, not data with in an index. Can we use curator for delete data within an index?
It will be pleasure if I get to know that either of following would work:
Can Curl Automate to delete data from an index after a period?
Can curator Automate to delete data from an index after a period?
Is there any other way like python scripting to do the job?
References are taken from the official site of elasticsearch.
Thanks a lot in advance.
You can use the DELETE BY QUERY API: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
Basically it will delete all the documents matching the provided query:
POST twitter/_delete_by_query
{
"query": {
"match": {
"message": "some message"
}
}
}
But the suggested way is to implement indexes for different periods (days for example) and use curator to drop them periodically, based on the age:
...
logs_2019.03.11
logs_2019.03.12
logs_2019.03.13
logs_2019.03.14
Simple example using Delete By Query API:
POST index_name/_delete_by_query
{
"query": {
"bool": {
"filter": {
"range": {
"timestamp": {
"lte": "2019-06-01 00:00:00.0",
"format": "yyyy-MM-dd HH:mm:ss.S"
}
}
}
}
}
}
This will delete records which have a field "timestamp" which is the date/time (within the record) at which they occured. One can run the query to get a count for what will be deleted.
GET index_name/_search
{
"size": 1,
"query: {
-- as above --
Also it is nice to use offset dates
"lte": "now-30d",
which would delete all records older than 30 days.
You can always delete single documents by using the HTTP request method DELETE.
To know which are the id's you want to delete you need to query your data. Probably by using a range filter/query on your timestamp.
As you are interacting with the REST api you can do this with python or any other language. There is also a Java client if you prefer a more direct api.

Elasticsearch check if script from file system is loaded

I use Groovy scripts in elasticsearch's config/script directory to update/upsert.
As there are several nodes I would like to add a guard in the application code that the script is loaded/available and I need to document these steps with curl for admins.
I tried
curl -XPOST localhost:9200/_scripts/groovy/<script_name>
which will just return indexed scripts - not the file system ones.
The closest I got is
curl localhost:9200/_search -d '
{
"script_fields": {
"<some_string_as_fieldname>": {
"script_file": "<script_name>",
"params": {
"count": 1
}
}
}
}'
which basically renders an exception MissingPropertyException[No such property: ctx for class: Script2]] because my script is an update-script requiring the ctx-variable. (During the update it works like charm.)
If it's not there, I get nested: ElasticsearchIllegalArgumentException[Unable to find on disk script decrement_entity_e33]
However, I guess there will be a more straight forward approach to check the script loaded. I just cannot find it.
Any help appreciated.

how to copy ElasticSearch field to another field

I have 100GB ES index now. Right now I need to change one field to multi-fields, such as: username to username.username and username.raw (not_analyzed). I know it will apply to the incoming data. But how can I make this change affect on the old data? Should I using index scroll to copy the whole index to a new one, Or there is a better solution to just copy one filed please.
There's a way to achieve this without reindexing all your data by using the update by query plugin.
Basically, after installing the plugin, you can run the following query and all your documents will get the multi-field re-populated.
curl -XPOST 'localhost:9200/your_index/_update_by_query' -d '{
"query" : {
"match_all" : {}
},
"script" : "ctx._source.username = ctx._source.username;"
}'
It might take a while to run on 100GB docs, but after this runs, the username.raw field will be populated.
Note: for this plugin to work, one needs to have scripting enabled.
POST index/type/_update_by_query
{
"query" : {
"match_all" : {}
},
"script" :{
"inline" : "ctx._source.username = ctx._source.username;",
"lang" : "painless"
}
}
This worked for me on es 5.6, above one did not!

How to update multiple documents that match a query in elasticsearch

I have documents which contains only "url"(analyzed) and "respsize"(not_analyzed) fields at first. I want to update documents that match the url and add new field "category"
I mean;
at first doc1:
{
"url":"http://stackoverflow.com/users/4005632/mehmet-yener-yilmaz",
"respsize":"500"
}
I have an external data and I know "stackoverflow.com" belongs to category 10,
And I need to update the doc, and make it like:
{
"url":"http://stackoverflow.com/users/4005632/mehmet-yener-yilmaz",
"respsize":"500",
"category":"10"
}
Of course I will do this all documents which url fields has "stackoverflow.com"
and I need the update each doc oly once.. Because category data of url is not changeable, no need to update again.
I need to use _update api with _version number to check it but cant compose the dsl query.
EDIT
I run this and looks works fine:
But documents not changed..
Although query result looks true, new field not added to docs, need refresh or etc?
You could use the update by query plugin in order to do just that. The idea is to select all document without a category and whose url matches a certain string and add the category you wish.
curl -XPOST 'localhost:9200/webproxylog/_update_by_query' -H "Content-Type: application/json" -d '
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"url": "stackoverflow.com"
}
},
{
"missing": {
"field": "category"
}
}
]
}
}
}
},
"script" : "ctx._source.category = \"10\";"
}'
After running this, all your documents with url: stackoverflow.com that don't have a category, will get category: 10. You can run the same query again later to fix new stackoverflow.com documents that have been indexed in the meantime.
Also make sure to enable scripting in elasticsearch.yml and restart ES:
script.inline: on
script.indexed: on
In the script, you're free to add as many fields as you want, e.g.
...
"script" : "ctx._source.category1 = \"10\"; ctx._source.category2 = \"20\";"
UPDATE
ES 2.3 now features the update by query functionality. You can still use the above query exactly as is and it will work (except that filtered and missing are deprecated, but still working ;).
That all sounds great but just to add to #Val answer, Update By Query is available form ElasticSearch 2.x but not for earlier versions. In our case we're using 1.4 for legacy reasons and there is no chance of upgrading in forseeable future so another solution is using the Update by query plugin provided here: https://github.com/yakaz/elasticsearch-action-updatebyquery

Resources