Per store indexing in Solr - elasticsearch

We have a requirement where we have say 500 stores and skus in each store is having different prices and they change everyday. the inventory status for each also changes everyday. We want to index data from all these stores in Solr and elastic search both. What is most effective way in which we can achieve this. Also I need help for querying too when i want to display this on website.

your question is a bit unclear, but if you are looking on how to index diff price/inventory per store, there is a very recent Lucene Solr Revolution presentation by Erik Hatcher showing how to this using Payloads (Solr recently got support for using payload stuff by Erik himself). He is actually using the same example in his presentation.

Related

How to get most retrieved items from an elasticsearch index?

I have uploaded some data to elasticsearch and I would like to keep track of how many times a data point is returned by past searches, that is to say, the most popular searched items.
Does elasticsearch provide such functionality to achieve this without implementing and updating a counter myself?
Cheers.

Best way to set up ElasticSearch for searching in each customer's data only

We have a SAAS product where companies create accounts and populate their own private data. We are thinking about using ElasticSearch to allow the customer to search all their own data in our system.
As an example we would have a free text search where the user can type anything and the API would return multiple different types of objects. E.g. they type John and the API returns the user object for users matching a first name containing John, or an email containing John. Or it might also return a team object where the team name matches John (e.g. John's Team) etc.
So my questions are:
Is ElasticSearch a sensible choice for what we want to do from a
concept perspective?
If we did use ElasticSearch what would be the
best way to index the data so we can search all data for a
particular customer? Does each customer have its own index?
Are there any hints on how we keep ElasticSearch in sync with the data in the database (DynamoDB)? If we index the data for a customer and then update the data as it changes is it sensible to then also reindex the data on a scheduled basis too?
Thanks!
I will try to provide general answers from my own experience with splitted customer data with elastic search:
If you want to search through a lot of data really fast, ES is always a really good solution for this - it comes with the cost of an secondary data storage that you will have to keep in sync with your database.
You cant have diffrent data types in one index, so the case would be either to create one index per data type and customer (carefull, indices come with an overhead - avoid creating too much with little data in it) - or you create one index per data type and add a property to your data where you then can filter it with e.g. a customer number.
You will have to denormalize your data as much as possible to benefit from elastic search.
As mentioned in 1 you will need to keep both in sync - there are plenty ways too do that. As an example we use a an event driven approach to push critical updates into elasticsearch as soon as possible (carefull: its not SQL - so you will always have some concurrency issues when u need read and write safety). For data that is not highly critical we use jobs that update them regulary. When you index a document with the same id it will get completely updated.
Hope this helps, feel free to asy questions.

Comparison of Handling Logs and PDFs in Solr & Elasticsearch and Data Visualization in Banana & Kibana

How do Elasticsearch and Solr compare in respect to the following:
Indexing logs.
Indexing events.
Indexing PDF documents.
Ease of creating and distributing visualizations. Kibana vs Banana.
Support and documentation for developers.
Any help is appreciated.
EDIT
More specifically, i am trying to figure out how exactly a PDF document or an event can be indexed at all. I have worked a little bit on Elasticsearch and since i am a fan of JSON, i found it quite useful when i tried to index structured data.
For example logs are mostly structured and thus i guess easier to index and search. Now what if i want to index the whole log file itself?
Follow up
Is Kibana the only visualization tool available for Elasticsearch?
Is Banana the only visualization tool available for Solr?
Here is an answer to try to address just the Elasticsearch aspect of the post.
Take a look at https://github.com/elastic/elasticsearch-mapper-attachments for handling PDFs
For events/logs, you would need to transform those into structured data to index in Elasticsearch. You can have a field in there for the source (the log file the data came from and other information like that) - you will have all the data in the whole log file indexed in that fashion. You can take advantage of ES aggregations to group results based on log file, calculate statistics, etc.
The ELK stack is definitely worth a look.
I don't know if Kibana is the only visualization tool but it is probably the most popular and likely to offer more than something else.

Summarization in Elasticsearch

I am a newbie to Elasticsearch. We are currently using Splunk platform for our analytics application and looking to migrate to ELK. Splunk provides options to schedule searches to run in background periodically and to store the search results in a separate summary index. Is similar functionality available in Elasticsearch? If so, please point me to the documentation containing the process.
Thanks,
Keerthana
This is a great use case. Of course Elasticsearch can perform such tasks, but it is more manual. You have to write your own script. So for example, if you want to summarize data, you can use ElasticSearch aggregations, and take the result (which comes in JSON format) and store it back into an index where you keep summary data. This way, even if you delete your raw data, your summary data lives on.
Elasticsearch comes with different clients. I like to use the Python Elasticsearch DSL library.

What does the document cap mean in Websolr?

I'm using the Websolr addon in Heroku. What does it mean by "250,000 documents"? What number of DB records or size is that?
Nick from Websolr here.
In this case, 'documents' would be all the distinct 'things' that you want to search.
A Solr index is comprised of many documents. Each document has many fields. Typically each document is analogous to a row in a table, or an instance of a model for your particular ORM.
Typically, a Solr client for your preferred language will help you integrate that concept into your own application and the tools you have used to create it.
In Solr a document is an indexed 'item'.

Resources