I'm building a product search platform. I used the Solr search engine before, and I found its performance fine, but it doesn't generate a user interface. Recently I found that Algolia has more features, an easy setup, and generates a user interface.
So, if someone has used Algolia before:
Is Algolia's performance better than Solr's?
Is there any difference between Algolia and Websolr?
I'm using Algolia and Solr in production for an e-commerce website.
You're right about what you say on Algolia. It's fast (really) and has a lot of powerful features.
You have a complete dashboard to manage your search engine.
For Solr, it's OK, but it's also a black box. You can fine-tune your search engine, but it exhibits poor performance for semantic searches (I tested it).
If you have to make a choice, it depends on a lot of things.
With Algolia, there are no servers to manage, and configuration and integration are easy. It's fast with 20 million records for me (less than 15 ms per search).
With Solr, you can customise a little bit more, but it's a lot of work. If I had to make a choice, it would be between Algolia and Elasticsearch. Solr is losing velocity; it's hard to imagine it growing again in the next few years.
To sum up: if you want to be fast and efficient, choose Algolia. If you want to dive deep into search engine architecture and you have a lot of time (count it in months), you can try Elasticsearch.
I hope my answer was helpful; ask me if you have more questions.
Speed is a critical part of keeping users happy. Algolia is aggressively designed to reduce latency. In a benchmarking test, Algolia returned results up to 200x faster than Elasticsearch.
Out-of-the-box, Algolia provides prefix matching for as-you-type search, typo-tolerance with intelligent result highlighting, and a flexible, powerful ranking formula. The ranking formula makes it easy to combine textual relevance with business data like prices and popularity metrics. With Lucene-based search tools like Solr and Elasticsearch, the ranking formula must be designed and built from scratch, which can be very difficult for teams without deep search experience to get right.
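To give a feel for how little setup that ranking takes, here is a minimal sketch using the algoliasearch Python client (v2-style API); the app ID, API key, index name, and attributes are all hypothetical:

    from algoliasearch.search_client import SearchClient

    # Hypothetical credentials and index name.
    client = SearchClient.create("YourAppID", "YourAdminAPIKey")
    index = client.init_index("products")

    # The ranking formula: textual relevance comes first, then
    # business signals like popularity and price as tie-breakers.
    index.set_settings({
        "searchableAttributes": ["name", "description"],
        "customRanking": ["desc(popularity)", "asc(price)"],
    })

    # Prefix matching and typo tolerance work out of the box, so a
    # partial, misspelled query can still match "running shoes".
    results = index.search("runnig sh")
    for hit in results["hits"]:
        print(hit["name"])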
Algolia's highly optimized infrastructure is distributed across the world in 15 regions and 47 datacenters. Algolia provides a 99.99% reliability guarantee and can deliver fast search to users wherever in the world they're connecting from. Elasticsearch and Solr do not automatically distribute to multiple regions, and doing so can incur significant server costs and DevOps resources.
I'm learning about Elasticsearch and enjoying every minute of it. However, there are some practical issues that confuse me, and of course a lack of experience, which I think some good real-life examples might clear up.
Now I am working on a website where I have accounts and a product catalog, and I want to return the best product matches when an end user searches for products based on distance, relevance, and many other criteria.
Particularly interested in:
Relevance Scoring and ranking strategies
Analyzing data of products catalog
Filtering
I would appreciate any references.
P.S. I am using NEST for .NET to communicate with the Elasticsearch cluster.
Well, those three subjects are quite wide, and a lot of work has been done on them. You should take some time to look at the Elasticsearch documentation; for your problem, I would recommend you have a look at the following pages first:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html (For the scoring of your document based on the distance)
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html (For the filtering)
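To make those two pages concrete, here is a minimal sketch of a query that filters products and then scores them by text relevance combined with a Gaussian distance decay. The index and field names are hypothetical, and while you use NEST, the query body is the same JSON; it's shown here with the official Python client for brevity:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # assumed local cluster

    query = {
        "query": {
            "function_score": {
                # The bool filter narrows candidates cheaply (filter
                # context is not scored and is cacheable).
                "query": {
                    "bool": {
                        "must": {"match": {"name": "running shoes"}},
                        "filter": [
                            {"term": {"in_stock": True}},
                            {"range": {"price": {"lte": 100}}},
                        ],
                    }
                },
                # Gaussian decay: the score roughly halves at "scale"
                # distance from the user's location (a geo_point field).
                "functions": [{
                    "gauss": {
                        "location": {
                            "origin": {"lat": 52.52, "lon": 13.40},
                            "scale": "10km",
                        }
                    }
                }],
                "boost_mode": "multiply",  # text score * distance decay
            }
        }
    }

    for hit in es.search(index="products", body=query)["hits"]["hits"]:
        print(hit["_score"], hit["_source"]["name"])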
Concerning your last point, the analysing part, I would recommend that you have a look at Kibana:
https://www.elastic.co/products/kibana
First, I'd recommend this article by Alexander Reelsen — Implementing A Modern E-Commerce Search. Great content on e-commerce product catalogues, filtering, and relevance in general (hint — there's no single optimal approach to achieve "relevance").
Secondly, I recently published a handbook for people just like you who need some good real-life examples — you can purchase it at https://elasticsearchbook.com. It contains concise guides on topics like faceted search, filtering, deduplication, autocomplete, etc.
I am picking one of the two search engines above (Solr and Elasticsearch) for a project, and so far both have shown similar functionality.
At least for the requirements that I have:
Proximity Search
Boolean queries
Query over all fields
Retrieval of original indexed document
Real-time search: as soon as I index a document, it should be available
Besides that, I'll have a single document type, with around 40 million documents (roughly 2 TB of data).
That's basically what I need. My questions are:
Does one search engine perform better than the other for my dataset, for example by offering better indexing or search rates?
Am I losing anything by going with Solr (considering my requirements)?
Solr is my choice at the moment.
Some thoughts:
Nobody can tell you which one would perform best for you unless you benchmark under your own realistic conditions.
For 99% of users, either of the two would work perfectly.
If you want to go with one of them (for any reason: you like it, your devs want to try it, you like the logo, whatever), then don't sweat it; both are very capable.
Our company has several products and several teams. One team is in charge of search, and is standardizing on Elasticsearch as a NoSQL DB to store all their data, with plans to use Neo4j later to complement their searches with relationship data.
My team is responsible for the product side of a social app (people have friends, work for companies, and will be colleagues with everyone working at their companies, etc.). We're looking at graph DBs as a solution (after abandoning the burning ship that is n^2 relationships in an RDBMS), specifically Neo4j (the Cypher query language is a beautiful thing).
A subset of our data is similar to the data used by the search team, and we will need to make sure search can query their data and our data simultaneously. The search team is pushing us to standardize on Elasticsearch for our DB instead of Neo4j or any graph DB. I believe this is for the sake of standardization and consistency.
We're obviously coming from very different places here, search concerns vs. product concerns. He asserts that Elasticsearch can cover all our use cases, including graph-like queries to find suggestions. While that's probably true, I'm really looking to stick with Neo4j and use an Elasticsearch plugin to integrate with their search.
In this situation, are there any major gotchas to choosing ElasticSearch over Neo4j for a product db (or vice versa)? Any guidelines or anecdotes from those who have been in similar situations?
We are heavy users of both technologies, and in our experience you'd do better to use each for what it is good at.
Elasticsearch is a very good piece of software when it comes to search functionality, log management, and facets.
Despite its Graph plugin, if you want to model a lot of social-network-style relationships in Elasticsearch indices, you will have two problems:
You will have to update documents every time a relationship changes, which can add up to a lot of writes when a single entity changes. For example, say you have organizations with users who make contributions on GitHub, and you want to search for the organizations with the top contributors in a certain language: every time a user makes a contribution on GitHub, you will have to reindex the whole organization, recompute the percentage of contributions per language for all users, and so on. And this is a simple example. (A sketch of this reindexing burden follows after the quote below.)
If you intend to use nested fields and parent/child mappings, you will lose search performance; see this quote from the "Tune for search speed" documentation: https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-search-speed.html#_document_modeling
"Documents should be modeled so that search-time operations are as cheap as possible. In particular, joins should be avoided. nested can make queries several times slower and parent-child relations can make queries hundreds of times slower. So if the same questions can be answered without joins by denormalizing documents, significant speedups can be expected."
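Here is the promised sketch of the reindexing burden from the first point, assuming a hypothetical denormalized "organizations" index: one new contribution forces the whole aggregate to be recomputed and rewritten.

    from collections import Counter
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    def reindex_organization(org):
        """Rebuild and reindex the entire denormalized org document
        because a single user's contributions changed."""
        langs = Counter()
        for user in org["users"]:
            for contribution in user["contributions"]:
                langs[contribution["language"]] += 1
        total = sum(langs.values()) or 1
        es.index(index="organizations", id=org["id"], body={
            "name": org["name"],
            # Recomputed from scratch on every single contribution.
            "language_percentages": {l: c / total for l, c in langs.items()},
        })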
Relationships are handled very well in a graph database like Neo4j. Neo4j, on the other hand, lacks the search features Elasticsearch provides: full-text search is possible but not as performant, and it introduces some burden in your application.
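By contrast, that kind of question is a one-line traversal in a graph database. A minimal sketch with the official neo4j Python driver, assuming a hypothetical (:Person)-[:WORKS_AT]->(:Company) schema and local credentials:

    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "password"))

    # Colleagues are one short hop away; nothing to reindex when a
    # relationship changes, you just create or delete the edge.
    CYPHER = """
    MATCH (p:Person {name: $name})-[:WORKS_AT]->(c:Company)<-[:WORKS_AT]-(colleague:Person)
    RETURN DISTINCT colleague.name AS name
    """

    with driver.session() as session:
        for record in session.run(CYPHER, name="Alice"):
            print(record["name"])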
A side note: when you talk about a "store", Elasticsearch is a search engine, not a database (although it is used a lot as one), while Neo4j is a fully transactional database.
However, combining both is the winning approach. We have actually written an article describing this process, which we call Graph-Aided Search, with a set of open-source plugins for both Elasticsearch and Neo4j that give you a powerful two-way integration out of the box. You can read more about it here: http://graphaware.com/neo4j/2016/04/20/graph-aided-search-the-rise-of-personalised-content.html
I'm pondering a strategy to maintain an index for Elasticsearch. I've found a plugin which may handle maintenance quite well; however, I would like to get a little more intimate with Elasticsearch, since I really like her, and the plugin would make playtime a little less intimate, if you know what I mean.
So anyway, if I have a data set with fairly frequent updates (say ~1 update / 10 s), would I run into performance problems with Elasticsearch? Can partial index updates be done when a single row changes, or is a full rebuild of the index necessary? The strategy I plan on implementing involves modifying the index whenever I do CRUD in my application (Python/PostgreSQL), so there will be some overhead in the code, which I'm not overly concerned about; it's just the performance. Is my strategy common?
I've used Sphinx, which did have partial re-indexing that was run with a cron job to keep in sync; it had the mapping between indexes and MySQL tables defined in its config. This was the recommended approach for Sphinx. Is there a recommended approach with Elasticsearch?
There are a number of different strategies for handling this; there's no simple one-size-fits-all solution.
To answer some of your questions: first, there is no such thing as a partial update in Elasticsearch/Lucene. If you update a single field in a document, the whole document is rewritten. Be aware of the performance implications of this when designing your schema. If you update a single document, however, it should be available almost instantly. Elasticsearch is a near-realtime search engine; you don't have to worry about regenerating the index constantly.
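A minimal sketch of that behaviour with the Python client (hypothetical index, id, and field): the Update API accepts only the changed fields, but Elasticsearch still rewrites the whole document internally, and the change becomes searchable at the next refresh.

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Send only the changed field; internally ES fetches the current
    # document, applies the change, and rewrites the whole document.
    es.update(index="products", id="42", body={"doc": {"price": 19.99}})

    # Near-realtime: visible after the next refresh (1 s by default).
    # An explicit refresh like this is for tests, not production code.
    es.indices.refresh(index="products")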
For your write load of one update per 10 s, the default performance settings should be fine. That's a very low write load for ES; it can scale much higher. Netflix, for instance, performs 7 million updates per minute in one of their clusters.
As far as syncing strategies go, I've written an in-depth article on this: "Keeping Elasticsearch in Sync".
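For your plan of indexing inside the application's CRUD path, a common write-through pattern looks like the sketch below (all names hypothetical; Python with psycopg2, since your stack is Python/PostgreSQL):

    from elasticsearch import Elasticsearch
    import psycopg2

    es = Elasticsearch("http://localhost:9200")
    conn = psycopg2.connect("dbname=app")

    def update_product(product_id, name, price):
        # Write to the system of record first.
        with conn.cursor() as cur:
            cur.execute(
                "UPDATE products SET name = %s, price = %s WHERE id = %s",
                (name, price, product_id),
            )
        conn.commit()
        # Index only after the commit, so ES never sees uncommitted data.
        # If this call fails, ES is briefly stale; a periodic
        # reconciliation job (or a queue between the two writes)
        # covers that gap.
        es.index(index="products", id=product_id,
                 body={"name": name, "price": price})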
I am building a site at the moment which requires realtime indexing of results (not 10,000 docs per second; I mean millisecond updates). I went about researching the different techs and originally came up with dozens of different platforms. I have been able to narrow my choices down to about three through a process of elimination (doc complexity, different types of support, etc.):
Lucene
Xapian
Sphinx
I originally tried to choose between these by looking at the sites using them, but then, to my surprise, I found that lots and lots of high-profile sites trust all three. I also found that all three allow millisecond updates.
I thought about Sphinx originally, because it is the only one of the three to claim full realtime indexing instead of only near-realtime indexing, only to find that this feature is still in beta (I'm not sure how reliable it would be for realtime indexing, to be honest).
I am leaning towards Lucene, since when Solr gets realtime indexing, moving my schema to Solr will be insanely easy.
I am also leaning towards Xapian because a number of sites I know implement it very well.
I am having huge problems deciding which of these techs would be best suited.
I am looking at a site with millions, maybe even tens of millions, of records that needs an index that can be appended to, deleted from, and updated in realtime.
Can anyone share their experiences on working with realtime search platforms to help me choose the right one for me? I am open to suggestions that are not here :).
P.S. I use MongoDB, so please don't post SQL-only search platforms :).
I am answering this question with what I found, after a couple of weeks, was the best option.
I actually found Lucene the best, since Zoie's user base was, is.....**. I wanted to post a topic on the Google group (the only form of support), and a couple of weeks later it still has not been moderated and approved for display.
That really put me off Zoie, so in the end I decided to give Lucene a try.
Thanks anyway :).
I would recommend Zoie, which is based on Lucene.