Does Solr support sorting while creating the index? - performance

In my test environment, there are nearly 130,000,000 documents on each server. Searches are fast without sorting by date, but extremely slow when sorting is enabled.
I think that if Solr could sort an indexed field while creating the index, searching would be more efficient. So how do I configure Solr to sort certain fields at indexing time?

The initial query would be slower but all the subsequent queries should be fast.
Solr should be able to use the Filter Query Cache for sorting.
You can also warm the sort fields.
Also check whether the overhead really comes just from sorting, with no querying or scoring involved.
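The warming suggestion above can be wired up in the `<query>` section of `solrconfig.xml`: a `newSearcher` listener fires a sorted query each time a new searcher opens, so the sort structures are populated before user queries arrive. A minimal sketch (the field name `date` is an assumption; use your actual sort field):

```xml
<!-- solrconfig.xml: warm the sort structures when a new searcher opens.
     The field name "date" is hypothetical; substitute your sort field. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">date desc</str>
      <str name="rows">0</str>
    </lst>
  </arr>
</listener>
```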

Related

How does Elasticsearch/Lucene achieve such performance when querying multiple fields?

According to the answer given here, Elasticsearch doesn't seem to use compound indexes for querying multiple fields, and instead queries multiple indexes and then intersects the results.
My question is: how does it achieve such high performance? Surely a composite index is faster, since it leads you straight to the desired data, rather than querying multiple indexes, each of which returns more data, and then intersecting the results?
I get the advantages of the multiple indexes regarding field order, etc., but in terms of performance, surely it's inferior...
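For intuition on why per-field indexes plus intersection can still be fast: each field's index yields a sorted list of matching document ids, and two sorted lists can be intersected in a single linear pass (real engines also skip ahead using index structures), so the engine never materializes "all the data" from each index. A minimal sketch in plain Python, illustrative only and not Lucene's actual implementation:

```python
def intersect_sorted(a, b):
    """Intersect two ascending lists of doc ids in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

# Made-up postings: doc ids matching title:foo and doc ids matching body:bar
title_hits = [2, 5, 9, 14, 21]
body_hits = [5, 9, 13, 21, 40]
print(intersect_sorted(title_hits, body_hits))  # [5, 9, 21]
```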

ElasticSearch Search Queries Count

We have a use case for aggregating the count of Elasticsearch search queries/operations. Initially we've decided to make use of the /_stats endpoint for aggregating results on a per-index basis. However, we would also like to explore the option of filtering search operations so we can distinguish operations by origin/source. I was wondering how we can do this efficiently. Any references to documentation or implementations would be highly appreciated.
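If you go the /_stats route, the per-index search counters live under `indices.<name>.total.search.query_total` in the response body. A minimal sketch of aggregating them once the JSON is parsed (the sample response fragment below is made up; the field paths follow the stats response shape):

```python
def search_counts(stats):
    """Extract per-index search query counts from an Elasticsearch
    /_stats response body (already parsed from JSON into a dict)."""
    return {
        name: data["total"]["search"]["query_total"]
        for name, data in stats.get("indices", {}).items()
    }

# Made-up sample /_stats response fragment
sample = {
    "indices": {
        "logs-2023": {"total": {"search": {"query_total": 1200}}},
        "products": {"total": {"search": {"query_total": 345}}},
    }
}
print(search_counts(sample))  # {'logs-2023': 1200, 'products': 345}
```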

How important is it to use separate indices for percolator queries and their documents?

The ElasticSearch documentation on the Percolate query recommends using separate indices for the query and the document being percolated:
Given the design of percolation, it often makes sense to use separate indices for the percolate queries and documents being percolated, as opposed to a single index as we do in examples. There are a few benefits to this approach:
Because percolate queries contain a different set of fields from the percolated documents, using two separate indices allows for fields to be stored in a denser, more efficient way.
Percolate queries do not scale in the same way as other queries, so percolation performance may benefit from using a different index configuration, like the number of primary shards.
At the bottom of the page here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-percolate-query.html
I understand this in theory, but I'd like to know more about how necessary this is for a large index (say, 1 million registered queries).
The tradeoff in my case is that creating a separate index for the document is quite a bit of extra work to maintain, mainly because both indices need to stay "in sync". This is difficult to guarantee without transactions, so I'm wondering if the effort is worth it for the scale I need.
In general I'm interested in any advice regarding the design of the index/mapping so that it can be queried efficiently. Thanks!

Searching for data in DynamoDB or using a search service

I would like to know the pros and cons of the options for searching my data (basically full-text search on a limited set of fields).
My data is currently in DynamoDB, and I realize that is not well suited to full-text search. Are there ways of doing a full-text search in DynamoDB? What are the pros and cons of doing that?
I can also use a Search cluster (like ElasticSearch). Any reasons that you would not go with a search cluster?
Are there other ways to do a full-text search? Other solutions?
DynamoDB is best suited for key-value insert and retrieval.
It does not support search functionality; if you try to do a Scan with some condition, that will be O(n) and very costly, since you are consuming lots of read capacity.
Now, coming to the options:
If your use case is not full-text search but only key-value matching, you can try to come up with a composite key, but it will have drawbacks like:
a. You cannot change the schema afterwards, and it may require huge effort if you need to search on a new field.
b. Designing this kind of key is tricky, considering that a few keys will always be hot, which may result in hot partitions.
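As an illustration of the composite-key idea (the attribute names here are hypothetical): several fields are packed into one sort key so that a single Query with a `begins_with` condition can stand in for a multi-field match. The drawback follows directly: adding a new searchable field means rewriting every key.

```python
def make_sort_key(status, created_date):
    """Build a composite sort key like 'ACTIVE#2023-06-01', so a DynamoDB
    Query with begins_with(sk, 'ACTIVE#') matches on status, and the date
    suffix keeps items ordered within a status. Hypothetical schema."""
    return f"{status}#{created_date}"

print(make_sort_key("ACTIVE", "2023-06-01"))  # ACTIVE#2023-06-01
```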
The ideal solution is to use Elasticsearch or Solr indexing. You can have a Lambda function listening to the DynamoDB stream, transforming the data and putting it into Elasticsearch. But this has limitations like:
a. An Elasticsearch cluster is costly.
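A sketch of the transformation step such a Lambda would perform (pure Python; the record shape follows the DynamoDB Streams event format, and the actual indexing call to Elasticsearch is omitted):

```python
def deserialize(image):
    """Flatten a DynamoDB-typed item image ({'S': ..., 'N': ...}) into a
    plain dict, covering the common scalar types only."""
    out = {}
    for field, typed in image.items():
        (t, v), = typed.items()
        if t == "N":  # numbers arrive as strings in stream records
            out[field] = float(v) if "." in v else int(v)
        else:  # "S", "BOOL", etc. passed through as-is for brevity
            out[field] = v
    return out

def handler(event, context=None):
    """Turn INSERT/MODIFY stream records into documents to index."""
    docs = []
    for record in event.get("Records", []):
        if record["eventName"] in ("INSERT", "MODIFY"):
            docs.append(deserialize(record["dynamodb"]["NewImage"]))
    return docs

# Made-up stream event
event = {"Records": [{
    "eventName": "INSERT",
    "dynamodb": {"NewImage": {"id": {"S": "42"}, "price": {"N": "9.99"}}},
}]}
print(handler(event))  # [{'id': '42', 'price': 9.99}]
```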

Filtering the results of a sorted query in Lucene.NET

I'm using Lucene.NET, which is currently up to date with Lucene 2.9. I'm trying to implement a kind of select distinct, but without the need to drill down into any groups. I know that Lucene 3.2 has a faceted search that may solve this, but I don't have the time to port it to 2.9 yet.
I figure in any event, when you perform a paged query with a sort operator, Lucene has to find all the documents that match the query, sort them, then take the top N results, where N is the page size. I'd like to build something that is also applied after the sorted query has completed, but takes the top N unique results and returns them. I'm thinking of using a HashSet and one of the indexed fields to determine uniqueness. I'd rather find a way to extend something in Lucene than try and do this once the results are already returned for performance reasons.
Custom filters seem to run before the main query is even applied, and custom collectors run before sorting is applied, unless you are sorting by Lucene's document id. So what is the best approach to this problem? A pointer to the right component to extend will get you the answer on this one; an example implementation will most definitely get you the answer. Thanks in advance.
I'd run the search without sorting and, in a custom collector, collect the results into a sorted list of size N based on "uniqueness".
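The answer's idea can be sketched language-agnostically (plain Python here rather than Lucene.NET's Collector API): stream hits in whatever order the search yields them, keep only the best hit per distinct key, then take the top N winners. The field names and key/score functions are illustrative:

```python
import heapq

def top_n_distinct(hits, n, key, score):
    """Collect the top-n hits by score, keeping at most one hit per
    distinct key value. Hits may arrive in any order, mimicking a
    collector that sees documents as the search runs."""
    best = {}  # key value -> (score, hit), best hit seen for that key
    for hit in hits:
        k = key(hit)
        if k not in best or score(hit) > best[k][0]:
            best[k] = (score(hit), hit)
    # Rank the per-key winners by score and keep the top n.
    return [h for _, h in heapq.nlargest(n, best.values(), key=lambda p: p[0])]

hits = [
    {"id": 1, "group": "a", "score": 0.9},
    {"id": 2, "group": "a", "score": 0.7},
    {"id": 3, "group": "b", "score": 0.8},
    {"id": 4, "group": "c", "score": 0.5},
]
top = top_n_distinct(hits, 2, key=lambda h: h["group"], score=lambda h: h["score"])
print([h["id"] for h in top])  # [1, 3]
```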
