how to filter data from elasticsearch

I have created several indexes in Elasticsearch. The result of http://localhost:9200/_cat/indices?v is as follows.
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open idx_post:user:5881bc8eb46a870d958d007a 5 1 6 0 31.3kb 31.3kb
yellow open idx_usr 5 1 2 0 14.8kb 14.8kb
yellow open idx_post:group:587dc2b57a8bee13bdab6592 5 1 1 0 6.1kb 6.1kb
I am new to elasticsearch and I tried to filter data by using the index idx_usr.
http://localhost:9200/idx_usr/_search?pretty=1
It works fine and returns the expected values. But when I try to fetch data using the index idx_post, it returns an error saying the index is not found.
http://localhost:9200/idx_post/_search?pretty=1
This gives the result:
{
  "error" : "IndexMissingException[[idx_post] missing]",
  "status" : 404
}
If anyone can identify the reason, I would be really thankful.
Update:
This is how I create the index. Assume that there is an object called data, and I am using an ES client called this.esClient:
// Builds a per-group index name such as idx_post:group:587dc2b57a8bee13bdab6592
var _esData = {
    index: "idx_post:group:" + data.group_id,
    type: 'posts',
    id: data.post_id.toString()
};
_esData['body'] = data;
this.esClient.index(_esData, function (error, response) {
    if (error)
        console.log(error);
    callBack(response);
});

You can get the results you want by querying your indexes with a wildcard, like this:
http://localhost:9200/idx_post*/_search?pretty=1
Note the * added after idx_post.

Because the index does not exist:
{
  "error" : "IndexMissingException[[idx_post] missing]",
  "status" : 404
}
You are trying to search in idx_post, while _cat returns the names idx_post:user:5881bc8eb46a870d958d007a and idx_post:group:587dc2b57a8bee13bdab6592; there is no index named exactly idx_post, only indexes whose names start with that prefix.
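A minimal sketch of the same wildcard search using the official Python client (elasticsearch-py), for illustration only; the host and the exact wildcard pattern are assumptions based on the index names shown above.
from elasticsearch import Elasticsearch

# Assumed local node; adjust the host/port to your setup.
es = Elasticsearch("http://localhost:9200")

# Matches every index whose name starts with "idx_post", e.g.
# idx_post:user:5881bc8eb46a870d958d007a and idx_post:group:587dc2b57a8bee13bdab6592.
response = es.search(index="idx_post*", body={"query": {"match_all": {}}})

for hit in response["hits"]["hits"]:
    print(hit["_index"], hit["_id"], hit["_source"])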

Related

Individually update a large amount of documents with the Python DSL Elasticsearch UpdateByQuery

I'm trying to use UpdateByQuery to update a property of a large number of documents. But as each document will have a different value, I need to execute it one by one. I'm traversing a large collection of documents, and for each document I call this function:
def update_references(self, query, script_source):
    try:
        ubq = UpdateByQuery(using=self.client, index=self.index).update_from_dict(query).script(source=script_source)
        ubq.execute()
    except Exception as err:
        return False
    return True
Some example values are:
query = {'query': {'match': {'_id': 'VpKI1msBNuDimFsyxxm4'}}}
script_source = 'ctx._source.refs = [\'python\', \'java\']'
The problem is that when I do that, I got an error: "Too many dynamic script compilations within, max: [75/5m]; please use indexed, or scripts with parameters instead; this limit can be changed by the [script.max_compilations_rate] setting".
If I change the max_compilations_rate using Kibana, it has no effect:
PUT _cluster/settings
{
  "transient": {
    "script.max_compilations_rate": "1500/1m"
  }
}
Anyway, it would be better to use a parametrized script. I tried:
def update_references(self, query, script_source, script_params):
    try:
        ubq = UpdateByQuery(using=self.client, index=self.index).update_from_dict(query).script(source=script_source, params=script_params)
        ubq.execute()
    except Exception as err:
        return False
    return True
So, this time:
script_source = 'ctx._source.refs = params.value'
script_params = {'value': ['python', 'java']}
But as I have to update the query and the parameters each time, I need to create a new instance of the UpdateByQuery for each document in the large collection, and the result is the same error.
I also tried to traverse and update the large collection with:
es.update(
    index=kwargs["index"],
    doc_type="paper",
    id=paper["_id"],
    body={"doc": {
        "refs": paper["refs"]  # e.g. ['python', 'java']
    }}
)
But I'm getting the following error: "Failed to establish a new connection: [Errno 99] Cannot assign requested address juil. 10 18:07:14 bib gunicorn[20891]: POST http://localhost:9200/papers/paper/OZKI1msBNuDimFsy0SM9/_update [status:N/A request:0.005s"
So, please, if you have any idea on how to solve this it will be really appreciated.
Best,
You can try it like this.
PUT _cluster/settings
{
  "persistent" : {
    "script.max_compilations_rate" : "1500/1m"
  }
}
The version update is causing these errors.
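If it helps, here is a minimal sketch of both ideas through the plain elasticsearch-py client: raising the limit persistently, and keeping the script source constant while passing the per-document values as params so the script is compiled once and then served from the cache. The host is an assumption, and the papers index name is taken from the error message above.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# Raise the compilation limit persistently (survives restarts), as in the answer above.
es.cluster.put_settings(body={
    "persistent": {"script.max_compilations_rate": "1500/1m"}
})

# Constant script source + per-call params: the source compiles once and is reused.
def update_refs(doc_id, refs, index="papers"):
    return es.update_by_query(index=index, body={
        "query": {"match": {"_id": doc_id}},
        "script": {
            "source": "ctx._source.refs = params.value",
            "params": {"value": refs},
        },
    })

update_refs("VpKI1msBNuDimFsyxxm4", ["python", "java"])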

Deleting index from ElasticSearch (via Kibana) automatically being recreated?

I have created an ElasticSearch instance via AWS and have pushed some test data into it in order to play around with Kibana. I'm done playing around now and want to delete all my data and start again. I have run a delete command on my index:
Command
DELETE /uniqueindex
Response
{
  "acknowledged" : true
}
However, almost immediately my index seems to reappear, and documents start showing up in the document count as well.
Command
GET /_cat/indices?v
Response:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .kibana_1 e3LQWRvgSvqSL8CFTyw_SA 1 0 3 0 15.2kb 15.2kb
yellow open uniqueindex Y4tlNxAXQVKUs_DjVQLNnA 5 1 713 0 421.7kb 421.7kb
It's like it's auto generating after the delete. Clearly a setting or something, but being new to ElasticSearch/Kibana I'm not sure what I'm missing.
By default indices in Elasticsearch can be created automatically just by PUTing or POSTing a document.
You can change this behavior with action.auto_create_index where you can disable this entirely (indices need to be created with a PUT command) or just whitelist specific indices.
Quoting from the linked docs:
PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "twitter,index10,-index1*,+ind*"
  }
}
PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "false"
  }
}
A leading + allows automatic index creation, while - forbids it.
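As a sketch, the same idea through the Python client: forbid automatic creation of the test index, then delete it so that whatever keeps pushing documents can no longer recreate it implicitly. The pattern list and the host are assumptions; adjust them to the indices you actually want to allow.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # replace with your AWS endpoint

# First match wins: forbid uniqueindex, keep Kibana's indices and everything else allowed.
es.cluster.put_settings(body={
    "persistent": {"action.auto_create_index": "-uniqueindex,+.kibana*,+*"}
})

# The delete should now stick until the index is created explicitly again.
es.indices.delete(index="uniqueindex")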

How to delete data from a particular shard

I have got an index with 5 primary shards and no replicas.
One of my shards (shard 1) is in an unassigned state. When I checked the log file, I found the error below:
2obv65.nvd, _2vfjgt.fdx, _3e3109.si, _3dwgm5_Lucene45_0.dvm, _3aks2g_Lucene45_0.dvd, _3d9u9f_76.del, _3e30gm.cfs, _3cvkyl_es090_0.tim, _3e309p.nvd, _3cvkyl_es090_0.blm]]; nested: FileNotFoundException[_101a65.si]; ]]
When I checked the index, I could not find the _101a65.si file for shard 1.
I am unable to locate the missing .si file. I tried a lot but could not get shard 1 assigned again.
Is there any other way to get shard 1 assigned again, or do I need to delete the entire shard 1 data?
Please suggest.
Normally in the stack trace you should see the path to the corrupted shard, something like MMapIndexInput(path="path/to/es/db/nodes/node_number/indices/name_of_index/1/index/some_file") (here the 1 is the shard number)
Normally deleting path/to/es/db/nodes/node_number/indices/name_of_index/1 should help the shard recover. If you still see it unassigned try sending this command to your cluster (normally as per the documentation, it should work, though I'm not sure about ES 1.x syntax and commands):
POST _cluster/reroute
{
  "commands" : [
    {
      "allocate" : {
        "index" : "myIndexName",
        "shard" : 1,
        "node" : "myNodeName",
        "allow_primary": true
      }
    }
  ]
}
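For reference, a minimal sketch of the same reroute call through the Python client; the index name, node name and host are placeholders from the snippet above, and on ES 5.x and later the command is allocate_empty_primary (with "accept_data_loss": true) rather than allocate/allow_primary.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder host

# ES 1.x style allocation; expect data loss for whatever was on the broken shard.
es.cluster.reroute(body={
    "commands": [
        {
            "allocate": {
                "index": "myIndexName",   # placeholder index name
                "shard": 1,
                "node": "myNodeName",     # placeholder node name
                "allow_primary": True
            }
        }
    ]
})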

how to fill a scripted field value by condition in kibana

I am using Kibana 4 and my document contains two integer fields called 'x' and 'y'. I would like to create a scripted field in Kibana that returns 'x' divided by 'y' if 'y' != 0, and otherwise returns the value of 'x'.
I have tried to add this script to a new scripted field:
doc['x'].value > 0 ? doc['x'].value/doc['y'].value : doc['x'].value;
but got a parsing error when trying to visualize it:
Error: Request to Elasticsearch failed:
{"error":"SearchPhaseExecutionException[Failed to execute phase [query],
all shards failed; shardFailures
How can I create a scripted field with a condition in Kibana, step by step?
What you are seeing is not a parsing error; shardFailures just means that the underlying Elasticsearch cluster is not ready yet. When starting Kibana/Elasticsearch, make sure your ES cluster is ready before diving into Kibana, i.e. run curl -XGET localhost:9200/_cluster/health and in the response you should see something similar to this:
{
  cluster_name: your_cluster_name
  status: yellow <----- this must be either yellow or green
  timed_out: false
  number_of_nodes: 2
  number_of_data_nodes: 2
  active_primary_shards: 227
  active_shards: 454
  relocating_shards: 0 <----- this must be 0
  initializing_shards: 0 <----- this must be 0
  unassigned_shards: 25
}
As for your script, it is written correctly; however, the condition you mentioned is not right, since you wanted y != 0 and not x > 0, so it should be
doc['y'].value != 0 ? doc['x'].value / doc['y'].value : doc['x'].value
Please give it a try
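If you want to sanity-check the expression outside Kibana first, here is a minimal sketch that runs it as a script field through the Python client. The index name my_index and the host are placeholders, and it assumes dynamic scripting is enabled; on ES 1.x (the Kibana 4 era) the script is passed as a plain string, while newer versions expect "script": {"source": ...}.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed host

response = es.search(index="my_index", body={  # "my_index" is a placeholder
    "script_fields": {
        "x_div_y": {
            "script": "doc['y'].value != 0 ? doc['x'].value / doc['y'].value : doc['x'].value"
        }
    }
})

for hit in response["hits"]["hits"]:
    print(hit["fields"]["x_div_y"])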

Can anyone give a list of REST APIs to query elasticsearch?

I am trying to push my logs to elasticsearch through logstash.
My logstash.conf has 2 log files as input, elasticsearch as output, and grok as filter. Here is my grok match:
grok {
    match => [ "message", "(?<timestamp>[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}) (?:\[%{GREEDYDATA:caller_thread}\]) (?:%{LOGLEVEL:level}) (?:%{DATA:caller_class})(?:\-%{GREEDYDATA:message})" ]
}
When elasticsearch is started, all my logs are added to the elasticsearch server with separate index names as mentioned in logstash.conf.
My doubt is: how are my logs stored in elasticsearch? I only know that they are stored under the index names mentioned in logstash.
The 'http://164.99.178.18:9200/_cat/indices?v' API gives me the following:
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open tomcat-log 5 1 6478 0 1.9mb 1.9mb
yellow open apache-log 5 1 212 0 137kb 137kb
But how are 'documents' and 'fields' created in elasticsearch for my logs?
I read that elasticsearch is a REST-based search engine. So, are there any REST APIs that I could use to analyze my data in elasticsearch?
Indeed.
curl localhost:9200/tomcat-log/_search
Will give you back the first 10 documents but also the total number of docs in your index.
curl localhost:9200/tomcat-log/_search -d '{
  "query": {
    "match": {
      "level" : "error"
    }
  }
}'
Might give you all docs in tomcat-log which have level equal to error.
Have a look at this section of the book. It will help.
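For completeness, a minimal sketch of the same two requests through the Python client; the host is assumed to be the node used above.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed host

# First 10 documents plus the total hit count for the index.
all_docs = es.search(index="tomcat-log")

# Only the documents whose "level" field matches "error".
errors = es.search(index="tomcat-log", body={
    "query": {"match": {"level": "error"}}
})

print(all_docs["hits"]["total"], errors["hits"]["total"])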
