Fetch all the rows using elasticsearch_dsl - elasticsearch

Currently I am using the following program to extract each id and its severity from Elasticsearch.
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q
client = Elasticsearch(
    [
        #'http://user:secret#10.x.x.11:9200/',
        'http://10.x.x.11:9200/',
    ],
    verify_certs=True
)
s = Search(using=client, index="test")
response = s.execute()
for hit in response:
    print(hit.message_id, hit.severity, "\n\n")
I believe the query returns 10 rows by default. I have more than 10,000 rows in Elasticsearch and need to fetch all of them.
Can someone guide me on how to run the same query to fetch all records?

You can use the scan() helper function in order to retrieve all docs from your test index:
from elasticsearch import Elasticsearch, helpers
client = Elasticsearch(
    [
        #'http://user:secret#10.x.x.11:9200/',
        'http://10.x.x.11:9200/',
    ],
    verify_certs=True
)
docs = list(helpers.scan(client, index="test", query={"query": {"match_all": {}}}))
for hit in docs:
    # scan() yields raw hits as dicts; the document fields live under "_source"
    source = hit["_source"]
    print(source["message_id"], source["severity"], "\n\n")
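Since the question asks about elasticsearch_dsl specifically, the same result can probably be had through the Search object's scan() method, which streams hits through the scroll API instead of building a list first. A minimal sketch, assuming the index and field names from the question; iter_rows and print_all_rows are hypothetical helper names, and the library imports are deferred so that iter_rows itself has no dependencies:

```python
def iter_rows(search, fields=("message_id", "severity")):
    """Yield one tuple per document from any object exposing scan().

    `search` is expected to behave like an elasticsearch_dsl Search;
    fields missing from a hit come back as None.
    """
    for hit in search.scan():
        yield tuple(getattr(hit, name, None) for name in fields)


def print_all_rows(hosts, index="test"):
    # Imported lazily so iter_rows can be used without the libraries installed.
    from elasticsearch import Elasticsearch
    from elasticsearch_dsl import Search

    client = Elasticsearch(hosts)
    for message_id, severity in iter_rows(Search(using=client, index=index)):
        print(message_id, severity)
```

Because scan() is a generator, rows are processed as they arrive, which matters once the index holds far more documents than fit comfortably in memory.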

Related

python elasticsearch get field from given doc_id

My input is <index_name>, <doc_id>, <field_name>; I want the value of the field.
I am looking for the Python client equivalent of
GET <index_name>/_doc/<doc_id>/?_source_includes=<field_name>
I figured it out
from elasticsearch import Elasticsearch
es = Elasticsearch()
result = es.get(
    index=<index_name>,
    id=<doc_id>,
    _source_includes=<field_name>
)
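The call above returns the whole response; the field value itself sits under "_source". A minimal sketch of pulling it out, assuming a top-level field and a response that can be indexed like a dict (true for the plain dict returned by older clients and, as far as I know, for the response object in newer ones). get_field is a hypothetical helper name:

```python
def get_field(es, index_name, doc_id, field_name):
    """Return the value of one top-level field of one document, or None.

    `es` is expected to be an elasticsearch.Elasticsearch client, or anything
    with a compatible get() method.
    """
    result = es.get(index=index_name, id=doc_id, _source_includes=field_name)
    # Only the requested field is returned under "_source".
    return result["_source"].get(field_name)
```

For example, get_field(es, "test", "42", "severity") would return the severity of document 42, or None if the document has no such field.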

QueryDSL _id query within script

I am using Elasticsearch and Kibana v5.6. In Dev Tools, I can use a script within the query DSL to find docs where a field equals a given value, i.e.:
GET indexA/_search
{
"query":{ "script":{ "script": """
def a = doc['field1'].value;
return a == 'value1';
"""}}
}
The above returns all docs that have 'value1' as the value of the field called 'field1'. But I am unable to search on _id. The official docs say that prior to v6 we should use _uid instead, so I tried that, with no luck. I am using a script because, once I can use _uid to get the value of _id, I want to do a value comparison similar to the one below:
GET indexA/_search
{
"query":{ "script":{ "script": """
def a = doc['field1'].value;
def b = doc['_uid'].value;
return a == b;
"""}}
}
I think Dev Tools is where I want to execute this, rather than anywhere else. Any pointers are appreciated.
You are following the ids query documentation (or this one), but applying it in the wrong context. You need to specify the IDs in a separate ids query, or put them in a term query on the _id field.
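As a rough illustration of the two options named in this answer, the request bodies could be built as follows. This is a sketch only: the helper names are hypothetical, the bodies are plain Python dicts meant to be sent to indexA/_search (or pasted into Dev Tools as JSON), and neither has been checked against a 5.6 cluster.

```python
import json


def ids_query(doc_ids):
    """Body for an `ids` query matching any of the given document IDs."""
    return {"query": {"ids": {"values": list(doc_ids)}}}


def term_id_query(doc_id):
    """Body for a `term` query on the _id metadata field."""
    return {"query": {"term": {"_id": doc_id}}}


# Print the JSON form, e.g. for pasting into Dev Tools.
print(json.dumps(ids_query(["1", "2"]), indent=2))
```

As far as I know, before v6 the _uid value combines the type and the id (type#id), so a script that compares it directly with a plain field value would likely need to strip the type prefix first.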

ElasticSearch get only document ids, _id field, using search query on index

For a given query, I want to get only the list of _id values without any other information (no _source, _index, _type, ...).
I noticed that using _source and requesting non-existent fields returns only minimal data, but can I get even less data in return?
Some answers suggest using the hits part of the response, but I do not want the other info.
It is better to use scroll and scan to get the result list, so Elasticsearch doesn't have to rank and sort the results.
With the elasticsearch-dsl Python library this can be accomplished by:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
es = Elasticsearch()
s = Search(using=es, index=ES_INDEX, doc_type=DOC_TYPE)
s = s.fields([]) # only get ids, otherwise `fields` takes a list of field names
ids = [h.meta.id for h in s.scan()]
I suggest using elasticsearch_dsl for Python. It has a nice API.
from typing import Union
from elasticsearch_dsl import Document
# `s` is an existing Search object
# don't return any fields, just the metadata
s = s.source(False)
results = list(s)
Afterwards you can get the id with:
first_result: Document = results[0]
id: Union[str,int] = first_result.meta.id
Here is the official documentation to get some extra information: https://elasticsearch-dsl.readthedocs.io/en/latest/search_dsl.html#extra-properties-and-parameters
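One caveat: iterating a Search directly executes it once, so list(s) is limited to the default page of results (10 hits, per the first question above). To collect every id, source(False) can presumably be combined with scan(). A minimal sketch, where collect_ids is a hypothetical helper and `search` is any elasticsearch_dsl Search:

```python
def collect_ids(search):
    """Return the _id of every matching document, without fetching _source."""
    # source(False) drops the document body; scan() pages through all hits.
    return [hit.meta.id for hit in search.source(False).scan()]
```

A call such as collect_ids(Search(using=es, index=ES_INDEX)) would then return a plain list of ids.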

Elasticsearch DSL: Bucket not working

Running the code,
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q, A
client = Elasticsearch(timeout=100)
s = Search(using=client, index="cms*")
s.aggs.bucket('ExitCode', 'terms', field='ExitCode').metric('avgCpuEff', 'avg', field='CpuEff')
for hit in s[0:20].execute():
    print(hit['ExitCode'])
yields several ExitCode = 0. I thought a terms bucket is supposed to group all the results that have the same exit code, in this case. What is actually going on?
You're iterating over the hits; you need to iterate over the aggregated buckets instead:
response = s.execute()
for code in response.aggregations.ExitCode.buckets:
    print(code.key, code.avgCpuEff.value)
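If only the aggregation is of interest, the buckets can be folded into a dictionary. A minimal sketch, assuming the aggregation and metric names from the question; bucket_averages is a hypothetical helper that relies only on attribute access, which elasticsearch_dsl responses support:

```python
def bucket_averages(response, agg_name="ExitCode", metric_name="avgCpuEff"):
    """Map each bucket key to the value of its nested metric."""
    buckets = getattr(response.aggregations, agg_name).buckets
    return {bucket.key: getattr(bucket, metric_name).value for bucket in buckets}
```

Slicing the search as s[0:0] before execute() should skip the hits entirely when only the buckets are needed. Note also that a terms aggregation returns only the top 10 buckets unless a size is given.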

Implement Equivalent Elasticsearch titan query using Elasticsearch java driver

I asked a question about paginating Elasticsearch results fetched by Titan here:
Pagination with Elastic Search in Titan
and concluded that it is not supported right now, so I decided to search the Titan index directly using the ES Java client.
Here is the Titan way of fetching ES records:
Iterable<Result<Vertex>> vertices = g.indexQuery("search","v.testTitle:(mytext)")
    .addParameter(new Parameter("from", 0))
    .addParameter(new Parameter("size", 2)).vertices();
for (Result<Vertex> result : vertices) {
    Vertex tv = result.getElement();
    System.out.println(tv.getProperty("testTitle")+ ": " + result.getScore());
}
It returns 1 record.
But addParameter() is not supported, so pagination is not possible. So I wanted to do the same thing directly from the ES Java client, as below:
Node node = nodeBuilder().node();
Client client = new TransportClient().addTransportAddress(new InetSocketTransportAddress("127.0.0.1", 9300));
SearchResponse response = client.prepareSearch("titan")
    .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
    .setQuery(QueryBuilders.fieldQuery("testTitle", "mytext")) // Query
    .execute()
    .actionGet();
System.out.println(response.getHits().totalHits());
node.close();
It prints zero in my case, but the same query in the Titan code (above) returns 1 record. Am I missing some Titan-specific parameters or options in this ES Java code?
I think that Titan is sending a QueryStringQuery.
That said, I would recommend using a MatchQuery.
QueryBuilders.matchQuery("offerTitle", "your text whatever you want")
