How to maintain index on incrementing counters in Elasticsearch? - elasticsearch

What is the best way to implement counters and be able to sort on them?
These counters are updated quite frequently, and I do not want to reindex the entire document each time. The approaches I know of are:
1) Maintain the counter values in some form of cache, query Elasticsearch, and sort in memory to return the results.
2) Maintain two indices in Elasticsearch, one for the documents and the other for the counters. Issue two separate queries to Elasticsearch and merge the results.
Please help.
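For illustration, approach 1 could look roughly like the sketch below. It assumes the search hits have already been fetched and that the counters live in some cache exposed as a plain dict; the function name and the sample data are hypothetical.

```python
from typing import Dict, List


def sort_hits_by_counter(hits: List[dict], counters: Dict[str, int]) -> List[dict]:
    """Order search hits by a counter that is held outside Elasticsearch.

    hits: the `hits.hits` array of a search response (each hit has an `_id`).
    counters: hypothetical cache contents, mapping document id to its count.
    """
    # Documents missing from the cache are treated as having a count of 0.
    return sorted(hits, key=lambda hit: counters.get(hit["_id"], 0), reverse=True)


hits = [{"_id": "a"}, {"_id": "b"}, {"_id": "c"}]
counters = {"a": 3, "b": 10}
ordered = sort_hits_by_counter(hits, counters)
print([hit["_id"] for hit in ordered])  # ['b', 'a', 'c']
```

Note that this only orders the hits that were actually fetched, so every candidate document has to be retrieved before sorting, which gets expensive for large result sets.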

It seems that updating the index too frequently is not an ideal use of Elasticsearch.
Based on this blog post from Elastic, eventual consistency is the way to go:
https://www.elastic.co/blog/found-keeping-elasticsearch-in-sync
I will be updating my implementation based on the approach suggested in the blog.
Closing the question.
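One way to read "eventual consistency" here is to buffer increments in the application and flush them in batches. The sketch below is not taken from the blog post; it is a minimal illustration that accumulates increments in memory and renders them as the action/payload line pairs accepted by the `_bulk` API. The class name, index name, and field name are assumptions.

```python
import json
from collections import Counter
from typing import List


class CounterBuffer:
    """Accumulates counter increments and renders them as _bulk update lines."""

    def __init__(self, index: str, field: str) -> None:
        self.index = index  # hypothetical index name
        self.field = field  # hypothetical numeric field holding the counter
        self.pending: Counter = Counter()

    def increment(self, doc_id: str, amount: int = 1) -> None:
        # Many increments for the same document collapse into one pending total.
        self.pending[doc_id] += amount

    def flush(self) -> List[str]:
        """Return NDJSON lines for the _bulk API and reset the buffer."""
        lines = []
        for doc_id, amount in self.pending.items():
            lines.append(json.dumps({"update": {"_index": self.index, "_id": doc_id}}))
            lines.append(json.dumps({
                "script": {
                    "source": f"ctx._source.{self.field} += params.amount",
                    "params": {"amount": amount},
                }
            }))
        self.pending.clear()
        return lines


buffer = CounterBuffer(index="articles", field="views")
buffer.increment("a")
buffer.increment("a")
buffer.increment("b")
print("\n".join(buffer.flush()))
```

Each update still reindexes the affected document internally, so the saving comes from issuing one update per document per flush interval rather than one per increment.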

Related

ElasticSearch Search Queries Count

We have a use case for aggregating the count of Elasticsearch search queries/operations. Initially we decided to use the /_stats endpoint to aggregate results on a per-index basis. However, we would also like to explore the option of filtering search operations so that we can distinguish them by origin/source. I was wondering how we can do this efficiently. Any references to documentation or implementations would be highly appreciated.
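As a rough sketch of the per-index part: the `_stats` response reports a cumulative `search.query_total` for each index, so the number of queries in a time window is the difference between two snapshots. The helper names below are hypothetical, and the snapshots are trimmed to the fields the code reads.

```python
from typing import Dict


def search_query_totals(stats: dict) -> Dict[str, int]:
    """Extract the cumulative search query count per index from a _stats response."""
    return {
        name: body["total"]["search"]["query_total"]
        for name, body in stats.get("indices", {}).items()
    }


def queries_between(before: dict, after: dict) -> Dict[str, int]:
    """Per-index number of query operations between two _stats snapshots."""
    old = search_query_totals(before)
    new = search_query_totals(after)
    # An index that only appears in the later snapshot is counted from zero.
    return {name: count - old.get(name, 0) for name, count in new.items()}


before = {"indices": {"logs": {"total": {"search": {"query_total": 100}}}}}
after = {"indices": {
    "logs": {"total": {"search": {"query_total": 130}}},
    "users": {"total": {"search": {"query_total": 7}}},
}}
print(queries_between(before, after))  # {'logs': 30, 'users': 7}
```

These totals count shard-level query operations, so one search that touches several shards adds more than one. For distinguishing origins, it may be worth checking whether your version supports search stats groups (tagging a search request with a `stats` list and reading the matching groups back from `_stats`).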

How important is it to use separate indices for percolator queries and their documents?

The ElasticSearch documentation on the Percolate query recommends using separate indices for the query and the document being percolated:
Given the design of percolation, it often makes sense to use separate indices for the percolate queries and documents being percolated, as opposed to a single index as we do in examples. There are a few benefits to this approach:
Because percolate queries contain a different set of fields from the percolated documents, using two separate indices allows for fields to be stored in a denser, more efficient way.
Percolate queries do not scale in the same way as other queries, so percolation performance may benefit from using a different index configuration, like the number of primary shards.
At the bottom of the page here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-percolate-query.html
I understand this in theory, but I'd like to know more about how necessary this is for a large index (say, 1 million registered queries).
The tradeoff in my case is that creating a separate index for the document is quite a bit of extra work to maintain, mainly because both indices need to stay "in sync". This is difficult to guarantee without transactions, so I'm wondering if the effort is worth it for the scale I need.
In general I'm interested in any advice regarding the design of the index/mapping so that it can be queried efficiently. Thanks!

Searching for data in DynamoDB or using a search service

I would like to know the pros and cons of trying to search for data (basically full text search on a limited set of fields).
My data is currently in DynamoDB, and I realize that is not well suited to full-text search. Are there ways of doing a full-text search in DynamoDB? What are the pros and cons of doing that?
I can also use a Search cluster (like ElasticSearch). Any reasons that you would not go with a search cluster?
Are there other ways to do a full-text search? Other solutions?
DynamoDB is best suited for key-value insert and retrieval.
It does not support search functionality. If you try to do a scan with a condition, it will be O(n) and very costly, since it consumes a lot of read capacity.
Now, coming to the options:
If the use case is not full-text search but only key-value matching, you can try to come up with composite keys, but this has drawbacks:
a. You cannot change the schema afterwards, and it may require a huge effort if you need to search on a new field.
b. Designing this kind of key is tricky, considering that a few keys will always be hot, which may result in a hot partition.
The ideal solution is to use Elasticsearch or Solr indexing. You can have a Lambda function listening to the DynamoDB stream, transforming the data and putting it into Elasticsearch. But this has limitations:
a. An Elasticsearch cluster is costly.
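The transformation step of such a Lambda could be sketched as follows. This is only an illustration: it covers a subset of DynamoDB attribute types, assumes the stream is configured to include new images, and returns a neutral action dict (a hypothetical shape) instead of calling any Elasticsearch client.

```python
from typing import Any, Dict


def plain_value(attr: Dict[str, Any]) -> Any:
    """Convert one DynamoDB attribute value, e.g. {"N": "3"}, to a Python value."""
    (tag, value), = attr.items()
    if tag in ("S", "BOOL"):
        return value
    if tag == "N":
        # Numbers arrive as strings in stream records.
        try:
            return int(value)
        except ValueError:
            return float(value)
    if tag == "NULL":
        return None
    if tag == "L":
        return [plain_value(item) for item in value]
    if tag == "M":
        return {key: plain_value(item) for key, item in value.items()}
    raise ValueError(f"unsupported attribute type: {tag}")


def to_action(record: Dict[str, Any]) -> Dict[str, Any]:
    """Map one stream record to an index or delete action for the search index."""
    keys = record["dynamodb"]["Keys"]
    # Join the key attributes into a single document id (an assumed convention).
    doc_id = "|".join(str(plain_value(keys[name])) for name in sorted(keys))
    if record["eventName"] == "REMOVE":
        return {"op": "delete", "id": doc_id}
    image = record["dynamodb"]["NewImage"]
    doc = {name: plain_value(value) for name, value in image.items()}
    return {"op": "index", "id": doc_id, "doc": doc}


record = {
    "eventName": "INSERT",
    "dynamodb": {
        "Keys": {"pk": {"S": "user#1"}},
        "NewImage": {"pk": {"S": "user#1"}, "age": {"N": "30"}},
    },
}
print(to_action(record))
```

A handler would apply `to_action` to each entry in the event's `Records` list and send the resulting actions to the search cluster in a batch.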

Best way to store votes in elasticsearch for a reddit like system

I am building a site similar to reddit using Elasticsearch and am trying to decide where best to store the up/down votes. I can think of a couple of options.
Store as part of the document.
In this case, any vote will trigger an update on the document. According to the Elasticsearch documentation, this is essentially a replacement of the whole document. That seems to be a very expensive operation.
Store in another database.
Store votes in other database like SQL/MongoDB and update elasticsearch periodically. In this case, we have to tolerate some delay for the new votes to affect search result which is not so ideal and will also increase complexity and maintenance cost.
Store in another index in elasticsearch
This can separate the concern by index - one mostly RO, one RW. Is there an efficient way to merge the two indices so that I can order by votes at query time?
Any suggestions on those options or other better way to handle this?
There is a fourth option: store votes in a separate document with a different type but in the same index as the original document. The votes type can be made a child of the article type. This setup will enable you to perform queries against articles and votes at the same time using has_child filters and queries. It will also require reindexing only a small votes document every time a vote occurs, instead of the large article document. On the negative side, the has_child and has_parent queries require loading the parent/child map into memory, so this approach has a non-trivial memory footprint compared to all the other options that you have described.
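A query along these lines might look like the sketch below. It assumes each vote is its own child document of type `vote` with a `direction` field, and that articles have a `title` field; all of these names are hypothetical. Each matching upvote contributes a constant score of 1, and `score_mode: sum` adds them up, so articles come back ordered by upvote count.

```python
def articles_by_upvotes(text: str) -> dict:
    """Build a search body that ranks matching articles by their number of upvotes."""
    return {
        "query": {
            "bool": {
                # The filter clause restricts the articles without affecting the score.
                "filter": [{"match": {"title": text}}],
                "must": [
                    {
                        "has_child": {
                            "type": "vote",
                            "score_mode": "sum",
                            "query": {
                                "constant_score": {
                                    "filter": {"term": {"direction": "up"}}
                                }
                            },
                        }
                    }
                ],
            }
        }
    }


body = articles_by_upvotes("elasticsearch")
print(body["query"]["bool"]["must"][0]["has_child"]["score_mode"])  # sum
```

With this shape, articles that have no upvotes at all do not match the has_child clause and are left out of the results, so it would need adjusting if unvoted articles must also appear.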

elasticsearch - tips on how to organize my data

I'm trying out Elasticsearch by getting some data from Facebook and Twitter.
The question is: how can I organize this data in an index?
/objects/posts
/objects/twits
or
/posts/post
/twits/twit
I'm trying queries such as, get posts by author_id = X
You need to think about the long term when deciding how to structure your data in Elasticsearch. How much data are you planning on capturing? Are search requests going to look into both Facebook and Twitter data? Also consider the volume of requests, the types of queries, and so on.
Personally I would start off with the first approach, localhost:9200/social/twitter,facebook/, as this avoids creating another index when it isn't necessarily required. You can easily search across both types, which has less overhead than searching across two indices. There is quite an interesting article here about how to grow with intelligence.
Elasticsearch has many configuration options; essentially, it's about finding a balance that fits your data.
The first one is the better approach, because creating two indices will create two Lucene instances, which will affect the response time.
