I need to send a large number of ids in a terms query. I tried with approximately 2000 GUIDs, but found that the data was not being posted to Elasticsearch: the JSON array was empty. Is there a limit on the maximum number of values in a terms query? And is there a config setting that can increase the maximum query length for a terms query?
I first tried to find out on the web whether it is the json_encode function that cannot encode such a large array, but that is not the case. The second thing that came to mind is whether the Elasticsearch terms query supports this at all.
Any help or guidance will be highly appreciated.
If you are using a bool filter or query, it looks like there is a limit of 1024 clauses. See this thread:
https://groups.google.com/forum/#!topic/elasticsearch/LqywKHKWbeI
Based on that same link, it also appears that you can set the option in your elasticsearch.yml.
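For reference, here is a minimal sketch of the kind of request in question, using the official Python client (the client setup, index name, and field name are assumptions, not from the question):

import uuid
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Roughly 2000 GUIDs in a single terms query, mirroring the scenario above.
# "my_index" and "doc_id" are placeholder names.
guids = [str(uuid.uuid4()) for _ in range(2000)]

resp = es.search(
    index="my_index",
    body={"query": {"terms": {"doc_id": guids}}},
)
print(resp["hits"]["total"])

Depending on the version, a terms query may be rewritten into boolean clauses internally, which is how a value list like this can run into the 1024-clause limit.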
Related
Does ElasticSearch have a size limit on its JSON queries?
e.g. If I filter using ids, and I build up a list of 1 million+ ids, will the request be rejected?
In theory there must be a ceiling. But I can't find any documentation on it.
indices.query.bool.max_clause_count
(Static, integer) Maximum number of clauses a Lucene BooleanQuery can contain. Defaults to 1024.
Refer to this official documentation to learn more about this setting.
Add the following setting to the elasticsearch.yml file to increase the maximum number of clauses:
indices.query.bool.max_clause_count: 4096
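If raising the limit is not an option, a workaround of my own (not from the answer above) is to split the ID list into chunks that stay under the clause limit and merge the hits client-side. A rough Python sketch, with placeholder index and field names:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

CHUNK = 1000  # stay under the default 1024-clause limit

def search_ids_in_chunks(ids, index="my_index", field="doc_id"):
    # "my_index" and "doc_id" are placeholder names.
    # Run one terms query per chunk and merge the hits client-side.
    hits = []
    for i in range(0, len(ids), CHUNK):
        resp = es.search(
            index=index,
            body={
                "query": {"terms": {field: ids[i:i + CHUNK]}},
                "size": CHUNK,
            },
        )
        hits.extend(resp["hits"]["hits"])
    return hits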
Is there a way to limit a field to a certain number of characters when getting results from Elasticsearch? I know how to limit my results to a specific set of fields, but I don't see how to get just a piece of the data. I would like to receive just the first 100 characters to display a preview of data and limit I/O.
I have seen that highlighting gives the option of setting a fragment size, but I am not necessarily querying for anything from the field I want a substring of.
Elasticsearch doesn't provide such an option. The ideal way to achieve this kind of scenario is to change the way you index your data, and perhaps store a snippet alongside the long fields. (@foresightyj has provided good links for excluding/including fields at indexing/querying time.)
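A minimal sketch of that suggestion in Python (the index name, the "body" field, and the preview field are all hypothetical): store a truncated preview next to the full field at index time, then request only the preview when displaying results.

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

full_text = "a very long article body " * 50  # stand-in for long content

# Index the full field for search plus a 100-character preview for display.
# "my_index", "body", and "body_preview" are placeholder names.
es.index(
    index="my_index",
    id="1",
    body={
        "body": full_text,
        "body_preview": full_text[:100],
    },
)

# At query time, fetch only the preview to limit I/O.
resp = es.search(
    index="my_index",
    body={"query": {"match_all": {}}, "_source": ["body_preview"]},
)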
In order to load all the documents indexed by Elasticsearch, I am using the following query through Tire.
def all
  max = total  # total number of indexed documents, from a count query
  Tire.search 'my_documents' do
    query { all }  # match_all query
    size max       # request every document in a single response
  end.results.map { |entry| entry.to_hash }
end
where max (i.e. total) is the result of a count query returning the number of documents present. I have indexed about 10,000 documents, and currently the request takes too long.
I am aware that I should not query all documents like this. What is the best alternative? Pagination? If so, on which metric would I base the number of documents per page?
I am also planning to extend the corpus to 100,000 or even 1,000,000 documents, and I don't yet see how this approach can scale.
I appreciate every comment.
Rationale: I do this because I am running calculations over this data. Hence, I need all the data, run the computations, and save the results back into the documents.
Have a look at the scroll API, which is highly optimized for fetching a large number of results. It uses the scan search type and doesn't support sorting, but lets you provide a query to filter the documents you want to fetch. Have a look at the reference to learn more about it. Remember that the size you define in the request is per shard; that means that if you have 5 primary shards, setting 10 would return up to 50 results per request.
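Since the question uses Tire, here is only a rough equivalent sketch in Python with the official client's scan helper, which drives the scroll API under the hood (the index name and batch size are assumptions):

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

# Stream every matching document instead of requesting one huge response;
# scan() pages through the results via the scroll API.
docs = helpers.scan(
    es,
    index="my_documents",  # placeholder index name
    query={"query": {"match_all": {}}},
    size=500,  # batch size per request
)

results = [hit["_source"] for hit in docs]

Because each batch is bounded, this scales to the 100,000 or 1,000,000 documents mentioned above, where a single size=total request would not.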
When I apply the ToFacets("facets/CameraFacets") extension to the IQueryable that comes from my query, I find that the count of one of the IEnumerable collections against a facet in the dictionary is 1024. I know for sure there are more, but how do I retrieve them? Will increasing the safe limit automatically give me all the values, and is there another way of doing this without having to increase that limit?
Yes, if you change the safe limit it will pull in more facets; take a look at HandleTermsFacet(..) in the code.
However, I wouldn't recommend it. It's a performance issue, because 1024 facets means you are doing 1024 separate queries.
If you need to deal with this many facets, you are better off using a Map/Reduce index; also see this blog post.
How do I get all the rows returned from Solr instead of getting only 10 rows?
You can define how many rows you want (see Pagination in SolrNet), but you can't get all documents. Solr is not a database; it doesn't make much sense to get all documents from Solr. If you feel you need to, you might be using the wrong tool for the job.
This is also explained in detail in the Solr FAQ.
As per the Solr Wiki, regarding the rows a query returns:
The default value is "10", which is used if the parameter is not specified. If you want to tell Solr to return all possible results from the query without an upper bound, specify rows to be 10000000 or some other ridiculously large value that is higher than the possible number of rows that are expected.
Refer to https://wiki.apache.org/solr/CommonQueryParameters for details.
You can set rows=x in the query URL, where x is the desired number of documents.
You can also fetch the documents in groups of 10 by looping over the found docs, incrementing the start value and leaving rows=10.
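A rough sketch of that loop in Python (the local URL and the core name "mycore" are assumptions):

import requests

SOLR = "http://localhost:8983/solr/mycore/select"  # placeholder core name
ROWS = 10

def fetch_all():
    # Page through all results, ROWS documents at a time.
    start, docs = 0, []
    while True:
        resp = requests.get(
            SOLR,
            params={"q": "*:*", "start": start, "rows": ROWS, "wt": "json"},
        ).json()
        docs.extend(resp["response"]["docs"])
        start += ROWS
        if start >= resp["response"]["numFound"]:
            break
    return docs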
Technically it is possible to get all results from a Solr search: all you need to do is specify the limit as -1.