I have read the article about failure scenarios in a RavenDB cluster (http://ayende.com/blog/155937/api-design-sharding-status-for-failure-scenarios-solving-at-the-right-granularity), which was very helpful. However, the solution only works with SequentialShardAccessStrategy, or at least I was not able to get it to work with ParallelShardAccessStrategy. :o) Is there a way to make it work with ParallelShardAccessStrategy?
I'm trying to process a document and store many documents into RavenDB, which I have running locally.
I'm getting this error:
Tried to send *ravendb.BatchCommand request via POST http://127.0.0.1:8080/databases/mydb/bulk_docs to all configured nodes in the topology, all of them seem to be down or not responding. I've tried to access the following nodes: http://127.0.0.1:8080
I was able to fetch mydb topology from http://127.0.0.1:8080.
Fetched topology: ( url: http://127.0.0.1:8080, clusterTag: A, serverRole: Member)
exit status 1
To me, it sounds like my local cluster might be running out of compute to process the large amount of data I'm trying to store.
RavenDB says I'm using 3 of 12 available cores, and I'd also like to make sure it's using a reasonable amount of the RAM I have available on the machine (I'd even be happy to let it use swap).
But reading around online, I'm not finding much helpful information on making sure RavenDB can use what it needs. I found settings.json, so I can add configuration that should be picked up by the server, but I'm not making much progress.
I also found some settings and changed "reassign cores" to 12, but it still says that 3/12 cores and 6/31.1 GB of memory are being used.
If an alternative solution is recommended, I'm all ears. I just need to run things locally, and storing everything as JSON files doesn't enable fast enough retrieval for my use case.
Update
I was able to install MongoDB and set up a local database. It hasn't given me any problems yet. RavenDB would look appealing if I understood it better, but I guess I'll stick with the tried and true for this project.
It is highly unlikely that you managed to run out of resources on the server with 3 cores / 6 GB unless you are pushing hundreds of millions of documents and doing very heavy work.
Do you get any error on the server? There should be more details in the error message or in the server log.
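If it turns out a single huge write is the culprit, one common workaround is to split the work into smaller batches so each bulk_docs request stays small. Below is a minimal sketch, assuming the community ravendb-go-client (the store/session method names are from memory, so treat them as assumptions) and a hypothetical Item type:

```go
package main

import (
	"log"

	ravendb "github.com/ravendb/ravendb-go-client"
)

// Item is a hypothetical document type; replace it with your own.
type Item struct {
	ID   string
	Name string
}

func main() {
	// Point the store at the same node the error message lists.
	store := ravendb.NewDocumentStore([]string{"http://127.0.0.1:8080"}, "mydb")
	if err := store.Initialize(); err != nil {
		log.Fatal(err)
	}
	defer store.Close()

	// Pretend these came from processing the input document.
	docs := make([]*Item, 100000)
	for i := range docs {
		docs[i] = &Item{Name: "example"}
	}

	const batchSize = 1000 // keep each bulk_docs request small
	for start := 0; start < len(docs); start += batchSize {
		end := start + batchSize
		if end > len(docs) {
			end = len(docs)
		}

		session, err := store.OpenSession("")
		if err != nil {
			log.Fatal(err)
		}
		for _, d := range docs[start:end] {
			if err := session.Store(d); err != nil {
				log.Fatal(err)
			}
		}
		// Each SaveChanges sends one bulk_docs request covering this batch only.
		if err := session.SaveChanges(); err != nil {
			log.Fatalf("batch %d-%d failed: %v", start, end, err)
		}
		session.Close()
	}
}
```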
A Storm topology has been deployed using the storm command on machine X. The worker nodes are running on machine Y.
Once the topology has been deployed, it is ready to process tuples, and the workers are processing requests and responses.
Can anyone please explain how a worker node identifies its work and data? I am not sure how a worker node has access to code that was never deployed to it by the developer.
If the topology code is accessible to the worker nodes, can you please tell me where it is located, and also explain how the worker nodes execute it?
One, you're asking a fairly complex question. I've been using Storm for a while and don't understand much about how it works internally. Here is a good article talking about the internals of Storm. It's over two years old but should still be highly relevant. I believe that Netty is now used as the internal messaging transport; it's mentioned as being experimental in the article.
As far as code being run on worker nodes, there is a configuration setting in storm.yaml:
storm.local.dir
When uploading the topology, I believe it copies the jar to that location. So every worker machine will have the necessary jar in its configured storm.local.dir. So even though you only upload to the one machine, Storm will distribute it to the necessary workers. (That's from memory and I'm not in a spot to test it at the moment.)
New Relic newbie here. I have an API service hosted on Heroku and monitored with New Relic.
While studying how to use New Relic, I found out my 2 workers were being underutilised, with very low RPM and low transaction times. So I decided to cut down to one worker, which saves me $36 a month. =]
Shortly after that I received tonnes of Logentries emails reporting request timeouts on one of my web dynos. Looking into New Relic, I found that one of my actions was being called a suspiciously high number of times over a 2-3 minute period.
The action is V1::CarsController#Index, which basically shows a collection of cars.
While I am not sure whether removing the worker dyno has caused memcached to do something, I also suspect that someone may be trying to scrape the data off the database. I am not too sure how to investigate the issue further. Can I track down the request IPs and see whether they are the same? Or how else can I investigate?
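One rough way to check whether the traffic comes from a single client would be to count the fwd= client IPs in the Heroku router logs. A minimal sketch (the router.log path and the use of Go here are just for illustration; the fwd="..." field is what Heroku's router emits):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

func main() {
	// Assumes router lines were saved via `heroku logs --source router > router.log`;
	// each router line includes a fwd="<client ip>" field.
	f, err := os.Open("router.log")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	fwd := regexp.MustCompile(`fwd="([^"]+)"`)
	counts := map[string]int{}

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		if m := fwd.FindStringSubmatch(scanner.Text()); m != nil {
			counts[m[1]]++
		}
	}

	// Print how many requests each client IP made.
	for ip, n := range counts {
		fmt.Printf("%s\t%d\n", ip, n)
	}
}
```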
If further information is needed, I am happy to provide it in edits!
Thanks
The question is a little broad, but I feel there is no one place that helps systematically diagnose Elasticsearch issues. The broad categories could be:
1) Client
   a) Query errors
   b) Incorrect query results
   c) Unexplained behaviors
2) Server
   a) Setup issues
   b) Performance issues
   c) Critical errors
   d) Unexplained behaviors
An example for 1)a) would be: log the query string on the server (a reference to how to enable logging would be nice), install the Inquisitor plugin (link to GitHub), and run the query string yourself, etc.
Your question is very broad and, to be honest, I am not sure I can fully answer it; however, I will tell you how we monitor and manage our cluster.
1 - We log query logs and slow query logs to Graylog2 (it uses ES under the hood) so we can easily see, report, and alert on all logging from our cluster. We can also view slow queries that have occurred.
2 - We send ES stats to statsd and then graph that information in Graphite. This way we can see things like cluster state, query counts, indexing counts, JVM stats, disk I/O, etc., all parsed from the ES stats API and sent to statsd (see the sketch after this list).
3 - We use Fabric scripts to deploy/upgrade the cluster and manage plugin installation.
4 - We use Jenkins and JMeter to run occasional performance tests against the cluster (are we getting slower over time? does the cluster deployment work?).
5 - We use the bigdesk and head plugins to keep an eye on the cluster and explore how it is doing.
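To make item 2 a bit more concrete, here is a minimal sketch of the idea: poll the nodes stats API and push one gauge per node to statsd over its plain-text UDP protocol. The metric names, the 10-second interval, and the statsd address are illustrative assumptions, not a description of our actual setup:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net"
	"net/http"
	"time"
)

// nodesStats mirrors just the few fields we pull out of /_nodes/stats.
type nodesStats struct {
	Nodes map[string]struct {
		JVM struct {
			Mem struct {
				HeapUsedInBytes int64 `json:"heap_used_in_bytes"`
			} `json:"mem"`
		} `json:"jvm"`
	} `json:"nodes"`
}

func main() {
	// statsd speaks a simple "name:value|type" text protocol over UDP.
	statsd, err := net.Dial("udp", "127.0.0.1:8125") // assumed statsd address
	if err != nil {
		log.Fatal(err)
	}
	defer statsd.Close()

	for {
		resp, err := http.Get("http://127.0.0.1:9200/_nodes/stats/jvm")
		if err != nil {
			log.Print(err)
			time.Sleep(10 * time.Second)
			continue
		}
		var stats nodesStats
		if err := json.NewDecoder(resp.Body).Decode(&stats); err != nil {
			log.Print(err)
		}
		resp.Body.Close()

		for id, node := range stats.Nodes {
			// Send heap usage as a gauge, one datagram per node.
			fmt.Fprintf(statsd, "es.%s.jvm.heap_used_bytes:%d|g", id, node.JVM.Mem.HeapUsedInBytes)
		}
		time.Sleep(10 * time.Second)
	}
}
```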
I'm starting to work with NEST.
I've seen in a previous question that I should use TryConnect only once at the beginning of the program and then use Connect.
But that seems a bit too naive for a long-running system.
What if I have a cluster of say 3 machines and I want to make sure I can connect to any of the 3 machines?
What should be the recommended way of doing that?
Should I:
- Use TryConnect each time and use a different host + port if it fails (downside - an additional roundtrip each time)?
- Try to work with a client and have some retry mechanism to handle failures due to connectivity issues? Maybe implement a connection pool on top of that?
Any other option?
Any suggestions/recommendations?
Sample code?
Thanks for your help,
Ron
Connection pooling is an often-requested feature, but due to the many heuristics and different approaches involved, NEST does not come with this out of the box. You will have to implement it yourself.
I would not recommend calling TryConnect() before each call, as you would then be doing two calls instead of one.
Each NEST call returns an IResponse which you can check for IsValid; ConnectionStatus will hold the request and response details.
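To illustrate the kind of thing "implement this yourself" means in practice, here is a rough sketch of a round-robin failover loop over a fixed list of nodes. It is deliberately not NEST/C# code, the node URLs are made up, and a real implementation would wrap the NEST calls and their IsValid checks rather than raw HTTP:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// failoverClient keeps a list of cluster nodes and rotates the starting
// point so calls are spread across the cluster.
type failoverClient struct {
	nodes []string
	next  int
	hc    http.Client
}

// get tries each node in turn and returns the first successful response.
func (c *failoverClient) get(path string) (*http.Response, error) {
	var lastErr error
	for i := 0; i < len(c.nodes); i++ {
		node := c.nodes[(c.next+i)%len(c.nodes)]
		resp, err := c.hc.Get(node + path)
		if err == nil {
			c.next = (c.next + i + 1) % len(c.nodes) // round-robin for the next call
			return resp, nil
		}
		lastErr = err // this node looks down; fall through to the next one
	}
	return nil, fmt.Errorf("all nodes failed, last error: %v", lastErr)
}

func main() {
	client := &failoverClient{
		nodes: []string{ // hypothetical cluster addresses
			"http://es1:9200",
			"http://es2:9200",
			"http://es3:9200",
		},
		hc: http.Client{Timeout: 5 * time.Second},
	}

	resp, err := client.get("/_cluster/health")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```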
See also the documentation on handling responses
In 1.0, NEST will start to throw an exception in case of TCP-level errors, so more generic approaches to connection pooling can be implemented, and NEST might come with a separate NuGet package implementing one (if anything, as a reference). See also this discussion: https://github.com/Mpdreamz/NEST/pull/224#issuecomment-16347889
Hope this helps for now.
UPDATE: this answer is outdated. NEST 1.0 ships with connection pooling and cluster failover support out of the box: http://nest.azurewebsites.net/elasticsearch-net/cluster-failover.html