Rails - searching and configuration of Solr - ruby

I use Solr for searching content in my app. What I don't like is that every time I restart my computer, I have to start Solr manually, and whenever new content is added to the app, I have to reindex it, because otherwise Solr wouldn't find the new data.
This is not very comfortable. How does working with Solr look on a server, e.g. on Heroku? Do I have to keep starting Solr there, or reindex the data over and over again, as I do on my localhost?
Also, is there a better solution for searching than Solr?

You are using the included server, right?
You can choose to deploy it in Tomcat. You just have to copy your files to Tomcat and register your Solr application in the Tomcat configuration. Tomcat runs as a service. Or, you can use a script to start Jetty on startup.
A professional Solr service also tries to keep your Solr application alive and your data safe against failures such as crashed software, a failed server or even a datacenter that went down.
Check what Heroku (or other hosted Solr solutions) promises you in their terms. They would do a much better job than an individual (no restarting Solr instances frequently!).
When you add something to Solr, it is persisted to disk. Once committed, it is available to search. If a document changes, you reindex it to reflect the new changes.
When you restart Solr, the same persisted data is available. What is your exact trouble?
There is the DIH (Data Import Handler) if you want to index automatically from a DB.
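To make the add-then-commit step concrete, here is a minimal Ruby sketch that builds an update request for Solr's JSON update endpoint. The URL, the core name `mycore` and the helper names are assumptions for illustration, not something from the question; 8983 is only Solr's usual default port, and your Solr version may expose the update handler under a different path.

```ruby
require "json"
require "net/http"
require "uri"

# Assumed local Solr URL and core name; adjust to your own setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"

# Build (but do not send) a POST request that adds the given documents.
# With commit: true, Solr is asked to commit right away, so the new
# documents become searchable without a separate commit call.
def build_update_request(docs, commit: true)
  uri = URI(SOLR_UPDATE_URL)
  uri.query = URI.encode_www_form(commit: commit)
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(docs)
  [uri, request]
end

# Send the request; this part needs a running Solr instance.
def send_update(docs)
  uri, request = build_update_request(docs)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end
```

Calling `send_update([{ "id" => "1", "title" => "hello" }])` would add one document and commit it. After a restart the document should still be there, since it was persisted to disk.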

I'm happy with Solr so far.
As for starting the Solr instance after restarting your computer, you can write a bash script that does it for you, or declare an alias that starts both Solr and your app server.
As for re-indexing: new and updated records should be re-indexed automatically, unless you manipulate your data from the console.
For alternative solutions, check out Thinking Sphinx.

Related

elasticsearch temporary crash when heavily adding/updating

I am running Elasticsearch on a dedicated server on a SaaS platform. The problem is that when cron jobs execute and massively update/insert new values in Elasticsearch, the front office (the site) fails when it tries to connect to Elasticsearch (the connection returns false).
Does anyone know what the problem could be and how it can be fixed? We are running the latest stable Elasticsearch version.
This happens on and off: when I refresh the page in the front office, sometimes it cannot connect to Elasticsearch, after another refresh it works again, and so on, until the heavy load passes.
We have NVMe drives, and Elasticsearch runs only on that one server, not on multiple nodes.
When I say heavily, I mean 1000-2000 updates per second.

How to test Elasticsearch without any possible harm for data?

I need to test the functionality of a project which is built on Elasticsearch. I already have lots of data stored in the DB, but I want to test it without any possible harm. Is there any way or technology (something that works like H2, for example) to do that?
H2 is an in-memory database, also embeddable in your application. In past versions of ES there was an option to bootstrap an embedded Elasticsearch node with your application.
In the current version this feature is no longer available. So you'll need to bring up at least one node using a second machine, VM or container running Elasticsearch. Running Elasticsearch on your own machine is still an option, too.
But maybe you're still on an old version of Elasticsearch? Then have a look at this SO question (assuming you're using Java): How to start elasticsearch 5.1 embedded in my java application?

Running multiple elasticsearch instances

I need to set up 2 Elasticsearch instances:
one for kibana logs (my separate application will throw logs at it)
one for search for my production application
My plan is to create separate folders, each with its own Elasticsearch in it. They don't talk to each other, which means they are separate databases, and if one goes down, the other still runs. Is this a good solution, or should I use only one Elasticsearch folder with multiple elasticsearch.yml configuration files? What is the best practice for multiple Elasticsearch instances?
The best practice is to NOT run two Elasticsearch instances on the SAME server.
Your production search will probably need a lot of RAM to work fast and stay responsive. You don't want your logging system to interfere with that.

Apache Solr requires regular restart

I have Solr installed and set up on my Drupal 7 site. Most of the time it works as expected. However, every so often, maybe every other day at least, the search will suddenly stop working and according to the Drupal error log I get:
"0" Status: Request failed: Connection refused.
The Type column says Apache Solr. To fix this, I just restart the Solr service. Is there something I can do to prevent this issue from occurring again? I suspect some Solr configuration setting needs adjusting.
I'm kind of new to Solr, so any tips would be appreciated.
Thanks
How busy is the Solr server? If it is not very busy, check whether you have a firewall between your Drupal and Solr servers. Some firewalls kill connections if there is no traffic going through.
One way to test would be to access the Solr admin interface. If you can, the server itself is fine and only Drupal's connection died.
I am assuming that the Solr client library in Drupal tries to maintain a persistent connection. If that's not the case, the above does not apply.
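If opening the admin interface in a browser is inconvenient, a quick scripted check can tell you whether anything is accepting connections on the Solr port at all. This is only a sketch using Ruby's standard library: `port_open?` is a made-up helper name, and the host and port you pass in depend on your own installation (8983 is just Solr's usual default).

```ruby
require "socket"

# Returns true if something accepts TCP connections on host:port,
# false if the connection is refused, times out or the host is unknown.
def port_open?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  false
end
```

For example, `port_open?("localhost", 8983)` returning false while Drupal logs "Connection refused" would point at Solr itself being down rather than at a dropped idle connection. Run it from the Drupal host so that any firewall in between is part of the test.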
I ended up reducing the number of documents to be indexed during cron from 200 to 50. That seemed to resolve the issue, as I have not had any Solr outages over the last couple of weeks.
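The idea behind that setting can be pictured as a per-run cap on how much gets sent to Solr. The Ruby sketch below is only an illustration of the principle, not Drupal's actual code; `run_cron_indexing` and the `pending` queue are hypothetical names, and the block stands in for whatever really sends documents to Solr.

```ruby
# Hypothetical helper: index at most `limit` pending documents per cron run.
# Whatever is left in `pending` waits for the next run.
def run_cron_indexing(pending, limit: 50)
  batch = pending.shift(limit) # removes and returns up to `limit` items
  yield batch unless batch.empty?
  batch.size
end
```

With 120 documents waiting and the default limit, three consecutive runs would hand over 50, 50 and 20 documents. A smaller cap means each run puts less load on Solr, at the cost of new content taking more runs to become searchable.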

BigCouch IDs and Backup data on EC2

I have a few questions about BigCouch that I'd like answered before I start using it.
Do I need to choose my shard key carefully, or can I just use an auto-generated GUID? I'm starting with a single server with 1 replication, but I want to be ready when I need to add another shard.
Is there any GUI for managing the cluster like Couchbase has, something similar for administering the DB?
How can I back up the data when hosting BigCouch on EC2 (i.e. snapshots)?
Thanks
Since you have not started to use BigCouch yet, and it looks like you need some features that are available out of the box in Couchbase (auto-sharding, administration console ...), why not go with Couchbase?

Resources