How to retrieve all document ids (_id) in a specific index - elasticsearch

I am trying to retrieve all documents in an index, while getting only the _id field back.
Basically I want to retrieve all the document ids I have.
While using:
{
"query": {
"match_all": {}
},
"fields": []
}
The hits I get contain "_index", "_type", "_id", "_score", and "_source",
which is way more than I need.
Edit(Answer):
So my problem was that I used KOPF to run the queries, and the results it displayed were not accurate (they included _source and more). When I used curl I got the correct results!
So the above query actually achieves what I needed.
You can also use:
{
"query": {
"match_all": {}
},
"_source": false
}
or
{
"query": {
"match_all": {}
},
"fields": ["_id"]
}

In Elasticsearch, the fields array only controls which _source fields are returned.
_index, _type, _id, and _score are always returned by Elasticsearch.
There is no option in the query body to remove them from the response.
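Since the metadata keys come back with every hit anyway, the ids can be picked out on the client side. A minimal Python sketch, assuming the search response has already been parsed into a dict; the helper name extract_ids and the sample response are made up for illustration:

```python
def extract_ids(response):
    """Return the _id of every hit in a parsed search response."""
    hits = response.get("hits", {}).get("hits", [])
    return [hit["_id"] for hit in hits]


# Hypothetical response, trimmed to the keys this sketch reads.
sample_response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_index": "movies", "_type": "movie", "_id": "1", "_score": 1.0},
            {"_index": "movies", "_type": "movie", "_id": "2", "_score": 1.0},
        ],
    }
}

print(extract_ids(sample_response))
```

The other metadata keys are simply ignored, so it does not matter that they cannot be switched off in the query.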

I am assuming you mean the _id of the documents in the index, not of the index itself.
In newer versions of Elasticsearch, "_source" is used to retrieve selected fields of a document, because the _source field holds the whole document as it was indexed.
Example:
Let's say the index name is "movies", the type is "movie", and you want to retrieve the movieName and movieTitle of all records.
curl -XGET 'http://localhost:9200/movies/movie/_search?pretty=true' -d '
{
"query" : {
"match_all" : {}
},
"_source": ["movieName","movieTitle"]
}'
OR http://localhost:9200/movies/movie/_search?pretty=true&_source=movieName,movieTitle
By default it returns 10 results. To get n records, add &size=n to the URL.
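The URL form above can also be assembled programmatically. This is only a sketch using the Python standard library; build_search_url is a hypothetical helper, and the host, index, and type are the ones from the example:

```python
from urllib.parse import urlencode


def build_search_url(base, index, doc_type, source_fields, size=10):
    """Build a _search URL that limits _source to the given fields."""
    params = {
        "pretty": "true",
        "_source": ",".join(source_fields),
        "size": size,
    }
    # safe="," keeps the comma-separated field list unescaped in the URL.
    return f"{base}/{index}/{doc_type}/_search?" + urlencode(params, safe=",")


url = build_search_url(
    "http://localhost:9200", "movies", "movie", ["movieName", "movieTitle"], size=25
)
print(url)
```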

Related

ElasticSearch : Aggregations on one field not working

I have a few documents in one index in Elasticsearch. When I aggregate by one of its fields, I do not get any results. The field's mapping is
{
"type": "string",
"index": "not_analyzed"
}
I have another field that is indexed in the same manner, and aggregations on it work. What could cause this, and how do I narrow down the issue?
Edit: The Elasticsearch version is 1.6.0, and I am running the following aggregation query:
{
"aggregations": {
"aggr_name": {
"terms": {
"field": "storeId",
"size": 100
}
}
}
}
where "storeId" is the field I am aggregating on. The same aggregation works on another field with the same mapping.

Elasticsearch: document size and query performance

I have an ES index with medium-sized documents (roughly 15-30 MB each).
Each document has a boolean field, and most of the time users just want to know whether a specific document ID has that field set to true.
Will document size affect the performance of this query?
{
"size": 1,
"query": {
"term": {
"my_field": true
}
},
"_source": [
"my_field"
]
}
And will a "size": 0 query result in better performance?
Adding "size": 0 to your query avoids some network transfer, which will improve your response time.
But as I understand your use case, you can use the count API.
An example query:
curl -XPOST 'http://localhost:9200/test/_count' -d '{
"query": {
"bool": {
"must": [
{
"term": {
"id": xxxxx
}
},
{
"term": {
"bool_field": true
}
}
]
}
}
}'
With this query you only need to check whether the returned total is greater than zero: that tells you whether the doc with the given id has the bool field set to true or false, depending on the value you specify for bool_field in the query. This will be quite fast.
Since Elasticsearch indexes your fields, the document size will not be a big problem for query performance. Using size 0 doesn't affect query performance inside Elasticsearch, but it does speed up retrieval because less data is transferred over the network.
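The count-based check can be sketched in Python as follows. The helper names are hypothetical, the field names "id" and "bool_field" are taken from the curl example, and the only part of the response that is read is its "count" key:

```python
import json


def build_count_body(doc_id, flag=True):
    """Body for the _count API: match one id with the given boolean value."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"id": doc_id}},
                    {"term": {"bool_field": flag}},
                ]
            }
        }
    }


def flag_matches(count_response):
    """True if at least one document matched the count query."""
    return count_response.get("count", 0) > 0


# json.dumps turns Python's True into JSON's lowercase true.
print(json.dumps(build_count_body(42)))
```

Serializing the body with json.dumps also avoids hand-writing the boolean literal, which must be lowercase in JSON.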
If you just want to check one boolean field for a specific document, you can simply use the Get API and retrieve only the field you want to check, like this:
curl -XGET 'http://localhost:9200/my_index/my_type/1000?fields=my_field'
In this case Elasticsearch will just retrieve the document with _id = 1000 and the field my_field. So you can check the boolean value.
{
"_index": "my_index",
"_type": "my_type",
"_id": "1000",
"_version": 9,
"found": true,
"fields": {
"my_field": [
true
]
}
}
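Reading the flag out of a Get API response shaped like the one above might look like this in Python. It is a sketch that assumes the response has already been parsed into a dict; read_flag is a hypothetical helper:

```python
def read_flag(get_response, field="my_field"):
    """Return the boolean stored in `field`, or None if it is unavailable."""
    if not get_response.get("found"):
        return None
    values = get_response.get("fields", {}).get(field)
    # Values under "fields" are returned as arrays, so take the first entry.
    return values[0] if values else None


sample = {
    "_index": "my_index",
    "_type": "my_type",
    "_id": "1000",
    "_version": 9,
    "found": True,
    "fields": {"my_field": [True]},
}

print(read_flag(sample))
```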
Looking at your question, I see that you haven't mentioned which Elasticsearch version you are using. Many factors affect the performance of an Elasticsearch cluster.
However, assuming it is the latest Elasticsearch and considering that you are after a single value, the best approach is to change your query into a non-scoring, filtering query. Filters are quite fast in Elasticsearch and are easily cached. Making a query non-scoring avoids the scoring phase entirely (calculating relevance, etc.).
To do this:
GET localhost:9200/test_index/test_partition/_search
{
"query" : {
"constant_score" : {
"filter" : {
"term" : {
"my_field" : true
}
}
}
}
}
Note that we are using the search API. The constant_score query is used to convert the term query into a filter, which should be inherently fast.
For more information, please refer to "Finding exact values".

Is it possible to return a specific field when running a query in sense for elasticsearch

I have loaded some data into Elasticsearch and written a query against it, but the results contain all of the data for the matching documents. Is it possible to filter the results to show only a particular field?
Example
A query to find all records for a specific country, but returning only a list of registration numbers.
All the data is available in Elasticsearch; however, I get a full JSON record back for each match.
I'm running this query in Sense (within Kibana 4.5.0).
The query is:
GET _search
{
filter_path=reg_no.*,
"fields" : ["reg_no"],
"query" : {
"fields" : ["country_cd", "oprg_stat"],
"query" : "956 AND 9074"
}
}
If I remove the two lines
filter_path=reg_no.*,
"fields" : ["reg_no"],
the query runs but brings back all the data.
Try this query:
POST _search
{
"_source": [
"reg_no"
],
"query": {
"bool": {
"filter": [
{
"term": {
"country_cd": "956"
}
},{
"term": {
"oprg_stat": "9074"
}
}
]
}
}
}

ElasticSearch terms aggregation by entire field

How can I write an Elasticsearch terms aggregation query that takes into account the entire field value, rather than individual tokens? For example, I would like to aggregate by city name, but the following returns new, york, san and francisco as individual buckets, instead of the expected new york and san francisco.
curl -XPOST "http://localhost:9200/cities/_search" -d'
{
"size": 0,
"aggs" : {
"cities" : {
"terms" : {
"field" : "city",
"min_doc_count": 10
}
}
}
}'
You should fix this in your mapping by adding a not_analyzed field. You can make it a multi-field if you also need the analyzed version.
"album": {
"properties": {
"city": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Now run your aggregation on city.raw
Update at 2018-02-11
We can now append .keyword to the name of the field being grouped by:
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword"
}
}
}
}
The Elastic documentation suggests fixing this in the mapping (as in the accepted answer): either make the field not_analyzed, or add a raw sub-field that is not_analyzed and use it in aggregations.
There is no other way. Aggregations operate on the inverted index, and if the field is analyzed, the inverted index holds only tokens, not the original values of the field.
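The effect on the buckets can be imitated with a small Python sketch. This is not how Elasticsearch is implemented: lowercasing and splitting on whitespace merely stand in for the analyzer, and terms_buckets is a made-up name:

```python
from collections import Counter


def terms_buckets(values, analyzed):
    """Count documents per term, roughly as a terms aggregation would."""
    counts = Counter()
    for value in values:
        if analyzed:
            # An analyzed field is indexed as separate lowercased tokens;
            # each document counts at most once per token.
            counts.update(set(value.lower().split()))
        else:
            # A not_analyzed field is indexed as one unmodified term.
            counts[value] += 1
    return dict(counts)


cities = ["New York", "San Francisco", "New York"]
print(terms_buckets(cities, analyzed=True))
print(terms_buckets(cities, analyzed=False))
```

With analyzed=True the buckets are the individual tokens; with analyzed=False they are the whole city names, which is what aggregating on the raw sub-field gives you.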

retrieving the multivalued array from elasticsearch

I am having trouble getting a facet from my index.
Basically, I want to get the details of a particular facet, say "Company", in a separate array.
I tried many queries, but they all return every facet under the facet array. How can I get only a particular facet in the facet array?
My index is https://gist.github.com/4015817
Please help me. I am badly stuck here.
Considering how complex your data structure is, the simplest way to extract this information might be to use script fields:
curl "localhost:9200/index/doc/_search?pretty=true" -d '{
"query" : {
"match_all" : {
}
},
"script_fields": {
"entity_facets": {
"script": "result=[];foreach(facet : _source.Categories.Types.Facets) {if(facet.entity==entity) result.add(facet);} result",
"params": {
"entity": "Country"
}
},
"first_facet": {
"script": "_source.Categories.Types.Facets[0]"
}
}
}'
