I have 100GB ES index now. Right now I need to change one field to multi-fields, such as: username to username.username and username.raw (not_analyzed). I know it will apply to the incoming data. But how can I make this change affect on the old data? Should I using index scroll to copy the whole index to a new one, Or there is a better solution to just copy one filed please.
There's a way to achieve this without reindexing all your data by using the update by query plugin.
Basically, after installing the plugin, you can run the following query and all your documents will get the multi-field re-populated.
curl -XPOST 'localhost:9200/your_index/_update_by_query' -d '{
"query" : {
"match_all" : {}
},
"script" : "ctx._source.username = ctx._source.username;"
}'
It might take a while to run on 100GB docs, but after this runs, the username.raw field will be populated.
Note: for this plugin to work, one needs to have scripting enabled.
POST index/type/_update_by_query
{
"query" : {
"match_all" : {}
},
"script" :{
"inline" : "ctx._source.username = ctx._source.username;",
"lang" : "painless"
}
}
This worked for me on es 5.6, above one did not!
Related
For the app i created, the indices are generated once in a week. And the type and nature of the data is not varying and that implies, I need the same mapping type for these indices. Is it possible in elasticsearch to apply the same mapping to all the indices as they are created?. This could avoid me the overhead of defining mapping each time the index is created.
Definitely, you can use what is called an index template. Since your mapping type is stable, that's the perfect condition for using index templates.
It's as easy as creating an index. See below, whenever you want to index a document in an index whose name matches my_*, ES will select that template and create the index for you using the given mappings, settings and aliases:
curl -XPUT localhost:9200/_template/template_1 -d '{
"template" : "my_*",
"settings" : {
"number_of_shards" : 1
},
"aliases" : {
"my_alias" : {}
},
"mappings" : {
"my_type" : {
"properties" : {
"my_field": { "type": "string" }
}
}
}
}'
It's basically the technique used by Logstash when it needs to index new logs for each new day in a new daily index.
You can employ index template to address your problem. The official documentation can be found here.
A use case of how to apply the same with examples can be found in this blog
I wanted to know if there is any command that can be used to delete a specific event after indexing. I'm using windows.
You can delete a specific document in your Elasticsearch index. You just need to know the index name in which it resides, its mapping type, and its id (e.g. AVEf9). Then you can use the delete API in order to achieve it.
curl -XDELETE http://localhost:9200/index_name/type_name/AVEf9
I guess marvel-sense is helpful when you want to run large queries. I used the following to delete data which existed before 10 minutes from now.
DELETE /movie1_indexer/movie/_query
{
"query": {
"range": {
"#timestamp":{
"lt":"now-10m"
}
}
}
}
lt-less than.
https://www.elastic.co/guide/en/elasticsearch/guide/current/time-based.html
You can use CURL or WGET (you might want to install CYGWIN or something similar):
The following example shows how to delete tweets from the twitter index that have the user value of "kimchy". Read about the elasticsearch query structure to understand the JSON in the 2nd example
Search API -> https://www.elastic.co/guide/en/elasticsearch/reference/1.4/search-search.html
Delete via Query -> https://www.elastic.co/guide/en/elasticsearch/reference/1.7/docs-delete-by-query.html
$ curl -XDELETE 'http://localhost:9200/twitter/tweet/_query?q=user:kimchy'
$ curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
"query" : {
"term" : { "user" : "kimchy" }
}
}
You could also install REST Easy or some similar browser plugin:
https://chrome.google.com/webstore/detail/resteasy/nojelkgnnpdmhpankkiikipkmhgafoch?hl=en-US
When using the suggester API, we are forced to specify the field option :
"suggest" : {
"text" : "val",
"sug_name" : {
"term" : {
"field" : "field_name"
}
}
}
Is this field supposed to be a valid field name of some type ?
If so, fields can exist only in the context of types AFAIK.
Why isn't possible to also specify (at least optionally) the type the field belongs to ?
Is your question if "field" has to be a valid field?
YES it does if you want it to find anything, you are welcome to search for fields that dont exist, although that seems an odd thing to do.
Your second question, the answer, I believe, is NO, you can not specify a _type using the _suggest api, you can use a suggest block with the _search api as shown here
curl -s -XPOST 'localhost:9200/_search' -d '{
"query" : {
...
},
"suggest" : {
...
}
}'
I have a situation where I want to filter the results not when performing a search but rather a GET using elasticsearch. Basically I have a Document that has a status field indicating that the entity has a state of discarded. When performing the GET I need to check the value of this field thereby excluding it if the status is indeed one of "discarded".
I know i can do this using a search with a term query, but what about when using a GET against the index based on Document ID?
Update: Upon further investigation, it seems the only way to do this is to use percolation or a search. I hope I am wrong if anyone has any suggestions I am all ears.
Just to clarify I am using the Java API.
thanks
Try something like this:
curl http://domain/my_index/_search -d '{
"filter": {
"and": [
{
"ids" : {
"type" : "my_type",
"values" : ["123"]
}
},
{
"term" : {
"discarded" : "false"
}
}
]
}
}
NOTE: you can also use a missing filter if the discarded field does not exist on some docs.
NOTE 2: I don't think this will be markedly slower than a normal get request either...
Is it possible to register query (like the percolate process) and call them by name to execute them.
I am building an application that will let the user save search query associated with a label. I would like to save the query generated by the filter in ES.
If I save the query in an index, I have to call ES first to retrieve the query, extract the field containing the query and then call ES again to execute it. Can I do it in one call ?
The other solution is to register queries (labels with _percolator with an identifier of the user:
/_percolate/transaction/user1_label1
{
"userId": "user1",
"query":{
"term":{"field1":"foo" }
}
}
and when there is a new document use the percolator in a non indexing mode (filtered per userId) to retrieve which query match, and then update the document by adding a field "label":["user1_label1", "user1_label2"] and finaly index the document. SO the labelling is done at indexing time.
What do you think ?
Thanks in advance.
Try Filter Aliases.
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
"actions" : [
{
"add" : {
"index" : "the_real_index",
"alias" : "user1",
"filter" : { "term" : { "field1" : "foo" } }
}
}
]
}'