Delete indexes by index name and type using Elasticsearch 2.3.3 in Java - Spring

I have a Java project in which I index data using Elasticsearch 2.3.3. The indexed documents are of two types.
My indexed data, as returned by a search, looks like:
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "movies",
"_id": "uReb0g9KSLKS18sTATdr3A",
"_score": 1,
"_source": {
"genre": "Thriller"
}
},
{
"_index": "test_index",
"_type": "drama",
"_id": "cReb0g9KSKLS18sTATdr3B",
"_score": 1,
"_source": {
"genre": "SuperNatural"
}
},
{
"_index": "index1",
"_type": "drama",
"_id": "cReb0g9KSKLS18sT76ng3B",
"_score": 1,
"_source": {
"genre": "Romance"
}
}
]
}
}
I need to delete only the documents with a particular index name and type.
For example, from the response above, I want to delete the entries with index name "test_index" and type "drama".
So the result should look like:
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "movies",
"_id": "uReb0g9KSLKS18sTATdr3A",
"_score": 1,
"_source": {
"genre": "Thriller"
}
},
{
"_index": "index1",
"_type": "drama",
"_id": "cReb0g9KSKLS18sT76ng3B",
"_score": 1,
"_source": {
"genre": "Romance"
}
}
]
}
}
Solutions tried:
client.admin().indices().delete(new DeleteIndexRequest("test_index")).actionGet();
But that deletes the whole "test_index" index, i.e. the documents of both types.
I have also tried various queries in the Sense beta plugin, such as:
DELETE /test_index/drama
It gives the error: No handler found for uri [/test_index/drama] and method [DELETE]
DELETE /test_index/drama/_query?q=_id:*&analyze_wildcard=true
It also doesn't work.
When I fire the delete request, the IDs of the documents are unknown to us, so I have to delete them by index name and type only.
How can I delete the required documents using the Java API?

This used to be possible up to ES 2.0 via the Delete Mapping API; however, since 2.0 the Delete Mapping API no longer exists.
To do this you will have to install the Delete By Query plugin. Then you can simply run a match_all query restricted to your index and type and delete all the matching documents.
The query will look something like this:
DELETE /test_index/drama/_query
{
"query": {
"match_all": {}
}
}
Also keep in mind that this will delete the documents in the mapping and not the mapping itself. If you want to remove the mapping too you'll have to reindex without the mapping.
The Delete By Query plugin also exposes a Java API, which should help with the Java implementation; a minimal sketch follows.
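Here is that sketch. It assumes the Delete By Query plugin (matching your 2.3.3 version) is installed on every node, that the cluster is reachable on localhost:9300, and it uses the class and method names of the 2.x plugin API, so double-check them against the exact plugin version you install:

import java.net.InetAddress;
import org.elasticsearch.action.deletebyquery.DeleteByQueryAction;
import org.elasticsearch.action.deletebyquery.DeleteByQueryRequestBuilder;
import org.elasticsearch.action.deletebyquery.DeleteByQueryResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.plugin.deletebyquery.DeleteByQueryPlugin;

// Register the delete-by-query plugin on the transport client.
TransportClient client = TransportClient.builder()
        .addPlugin(DeleteByQueryPlugin.class)
        .build()
        .addTransportAddress(new InetSocketTransportAddress(
                InetAddress.getByName("localhost"), 9300));

// Delete every document of type "drama" in the index "test_index".
DeleteByQueryResponse response = new DeleteByQueryRequestBuilder(client, DeleteByQueryAction.INSTANCE)
        .setIndices("test_index")
        .setTypes("drama")
        .setQuery(QueryBuilders.matchAllQuery())
        .get();

This removes only the "drama" documents from "test_index"; the "movies" documents there and the "drama" documents in "index1" are untouched.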

Related

Reduce data returned by Elasticsearch

I have the following query.
GET sales/_search
{
"query": {
"terms": {
"ean": ["8719092410766", "8719092444716"]
}
},
"_source": ["ean"],
"size": 10000
}
Which gives me the following result.
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "sales",
"_type": "doc",
"_id": "CuDvcGIBmw7bqEEVBvZq",
"_score": 1,
"_source": {
"ean": "8719092444716"
}
},
{
"_index": "sales",
"_type": "doc",
"_id": "DeDvcGIBmw7bqEEVBvZq",
"_score": 1,
"_source": {
"ean": "8719092410766"
}
},
{
"_index": "sales",
"_type": "doc",
"_id": "9yHvcGIBbx4s3M8zD9_u",
"_score": 1,
"_source": {
"ean": "8719092410766"
}
}
]
}
}
This is a lot of data, and I am actually only interested in the sources. What I would like it to return is this:
["8719092444716", "8719092410766"]
Or as close to that as possible. Is there any trick I can use to reduce the amount of data fetched? I read about filter_path, but Elasticsearch 6.0 doesn't seem to recognize this keyword.
As you mentioned, you could use filter_path (docs), a parameter you add to the request URL to specify (comma separated) which parts of the response to include. For example, if you are interested only in the hits and none of the ES metrics, you could do (curl example)
curl http://localhost:9200/index01/type01/_search?filter_path=hits.hits
and get the following response:
{
"hits" : {
"hits" : [
{
"_index" : "index01",
"_id" : "6PHE_WIBts_g9zk4nzM5",
"_type" : "type01",
"_source" : {
"title" : "Radioactive Honeycomb"
},
"_score" : 1
}
]
}
}
Hope that helps (I'm using ES 6.0 btw).
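To get even closer to the bare list of EAN values from the question, filter_path can, as far as I know, also point into _source, so something like the following (using the sales index and ean field from the question) should return only the ean of each hit:

curl -XGET "http://localhost:9200/sales/_search?filter_path=hits.hits._source.ean" -H 'Content-Type: application/json' -d'
{
"query": {
"terms": {
"ean": ["8719092410766", "8719092444716"]
}
},
"size": 10000
}'

You would still have to flatten the result into a plain array on the client side.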

Custom scoring function in Elasticsearch does not return expected field value

I create a custom scoring function for my documents that just returns the value of the field a for each document. But for some reason, in the example below, the last digits of the _score in the results differ from the last digits of the value of a for each document. What is happening here?
PUT test/doc/1
{
"a": 851459198
}
PUT test/doc/2
{
"a": 984968088
}
GET test/_search
{
"query": {
"function_score": {
"script_score": {
"script": {
"inline": "doc[\"a\"].value"
}
}
}
}
}
That will return the following:
{
"took": 16,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 984968060,
"hits": [
{
"_index": "test",
"_type": "doc",
"_id": "2",
"_score": 984968060,
"_source": {
"a": 984968088
}
},
{
"_index": "test",
"_type": "doc",
"_id": "1",
"_score": 851459200,
"_source": {
"a": 851459198
}
}
]
}
}
Why is the _score different than the value of the field a?
I'm using Elasticsearch 2.1.1
The _score value is internally represented as a float, which can only represent integers exactly up to 2^24 (16,777,216). Therefore, if the field you use in the scoring function holds a larger integer, the score will be rounded to the nearest representable float value. See this github issue.
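As a quick plain-Java illustration (not Elasticsearch code), casting the two values from the example to float shows the same loss of precision:

public class FloatScoreDemo {
    public static void main(String[] args) {
        // Both values are above 2^24 (16,777,216), so they cannot be stored exactly in a float.
        float score1 = 851459198f;
        float score2 = 984968088f;
        System.out.println((long) score1); // 851459200
        System.out.println((long) score2); // 984968064 (the response shows 984968060, which parses back to this same float)
    }
}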

MLT (More Like This) elasticsearch query

I'm trying to use the Elasticsearch MLT (More Like This) query.
There is only one doc in the store:
{
"_index": "monitors",
"_type": "monitor",
"_id": "AVTnvJ8SancUpEdFLMiq",
"_score": 1,
"_source": {
"ProcessGroup": "test",
"ProcessName": "test",
"OpName": "test",
"Domain": "test",
"LogLevel": "Info",
"StartDateTime": "2016-05-04 04:46:47",
"EndDateTime": "2016-05-04 04:47:47",
"MessageDateTime": "2016-05-04 04:46:47",
"ApplicationCode": "test",
"Status": "10",
}
}
Query:
POST /_search
{
"query": {
"more_like_this" : {
"fields" : ["ProcessName"],
"like" : "test",
"min_term_freq" : 1,
"max_query_terms" : 12
}
}
}
ProcessName is a not_analyzed field.
I expected to get this document back in the response, but instead I got nothing:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
Why is that?
Another question: suppose I have search-engine docs and I search for "stph". I expect to get "Stephan Curry" as a suggestion because it is commonly searched for. Fuzzy search doesn't fit because the edit distance is greater than 2, so is the MLT query a good option for this scenario?

Finding multiple Elasticsearch documents with same ids, different types

I need to find out if any document with a certain id was already indexed in my ES database, so that I can delete them before indexing a new document.
The trouble is I do not know a priori the type it was indexed as.
I found the _mget API, which sounds like it could be what I need, but then this quote in the documentation says I would only get one matching document:
If you don’t set the type and have many documents sharing the same
_id, you will end up getting only the first matching document.
How can I get this behaviour, i.e. find all documents sharing an _id (possibly more than one, with different _type values) in the same index, without an expensive _search query?
Thanks!
A simple term query on "_id" worked for me.
So I created a trivial index and added two documents for each of two different types:
PUT /test_index
POST /test_index/_bulk
{"index":{"_type":"type1","_id":1}}
{"name":"type1 doc1"}
{"index":{"_type":"type1","_id":2}}
{"name":"type1 doc2"}
{"index":{"_type":"type2","_id":1}}
{"name":"type2 doc1"}
{"index":{"_type":"type2","_id":2}}
{"name":"type2 doc2"}
And this query will return both documents with id 1:
POST /test_index/_search
{
"query": {
"constant_score": {
"filter": {
"term": {
"_id": "1"
}
}
}
}
}
...
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "type1",
"_id": "1",
"_score": 1,
"_source": {
"name": "type1 doc1"
}
},
{
"_index": "test_index",
"_type": "type2",
"_id": "1",
"_score": 1,
"_source": {
"name": "type2 doc1"
}
}
]
}
}
Here's the code I used:
http://sense.qbox.io/gist/a8085b57c22631148dd4c67769307caf6425fd95
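If you need the same lookup from the Java API, a minimal sketch against a 2.x client might look like the following (client is assumed to be an already-built Client or TransportClient; the query builders are the standard ones):

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

// Find every document in "test_index" whose _id is "1", whatever its type.
SearchResponse response = client.prepareSearch("test_index")
        .setQuery(QueryBuilders.constantScoreQuery(QueryBuilders.termQuery("_id", "1")))
        .get();

for (SearchHit hit : response.getHits().getHits()) {
    System.out.println(hit.getType() + "/" + hit.getId());
}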

Does the elasticsearch ID have to be unique to a type or to the index?

Elasticsearch allows you to store a _type along with the _index. I was wondering: if I provide my own _id, does it have to be unique across the index?
It only has to be unique together with the type, i.e. the same _id can be reused under different types within one index:
PUT so
PUT /so/t1/1
{}
PUT /so/t2/1
{}
GET /so/_search
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "so",
"_type": "t2",
"_id": "1",
"_score": 1,
"_source": {}
},
{
"_index": "so",
"_type": "t1",
"_id": "1",
"_score": 1,
"_source": {}
}
]
}
}
And the reason this is enough: you never fetch documents from an index without knowing the doc type, and an index-wide query returns documents together with their types and indexes.
Absolutely, there are a few ways of doing it.
The first is using the PUT API, which lets us specify an ID for a document. So, for an index called index and a type called type:
curl -XPUT "http://localhost:9200/index/type/1/" -d'
{
"test":"test"
}
Which gives me this document:
{
"_index": "index",
"_type": "type",
"_id": "1",
"_score": 1,
"_source": {
"test": "test"
}
}
Another way is to take the ID from a unique field in your document, for example an md5 hash. So, for an index called index with a type called type, we can specify the following mapping:
curl -XPUT "http://localhost:9200/index/_mapping/type" -d'
{
"type": {
"_id":{
"path" : "md5"
},
"properties": {
"md5": {
"type":"string"
}
}
}
}
This time I'm going to use the POST API, which generates an ID automatically. If you haven't specified an _id path in your mapping, a random ID will be generated for you.
curl -XPOST "http://localhost:9200/index/type/" -d'
{
"md5":"00000000000011111111222222223333"
}'
Which gives me the following document in a search:
{
"_index": "index",
"_type": "type",
"_id": "00000000000011111111222222223333",
"_score": 1,
"_source": {
"md5": "00000000000011111111222222223333"
}
}
The second method is generally preferred because it provides consistency across the index: with manually chosen IDs, a perfectly valid id could be 1, as in the first example, or dog in another case.
