How to test Elasticsearch without any possible harm to the data?

I need to test the functionality of a project built on Elasticsearch. I already have lots of data stored in the database, but I want to test it without any risk of harming that data. Is there a way or technology to do this (something that works like H2, for example)?

H2 is an in-memory database that can also be embedded in your application. Past versions of Elasticsearch offered an option to bootstrap an embedded Elasticsearch node with your application.
In the current version this feature is no longer available, so you'll need to bring up at least one node on a second machine, VM, or container running Elasticsearch. Running Elasticsearch directly on your own machine is also still an option.
But maybe you're still on an old version of Elasticsearch? Then have a look at this SO question (assuming you're using Java): How to start elasticsearch 5.1 embedded in my java application?
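If your tests run on the JVM, one common option (a sketch, not the only way) is Testcontainers, which starts a disposable Elasticsearch container per test run, so your real data is never touched. The image tag and index name below are placeholder assumptions:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;
import org.testcontainers.elasticsearch.ElasticsearchContainer;

public class DisposableEsExample {
    public static void main(String[] args) throws Exception {
        // Starts a throwaway single-node cluster in Docker; its data vanishes with the container.
        try (ElasticsearchContainer es =
                 new ElasticsearchContainer("docker.elastic.co/elasticsearch/elasticsearch:7.17.0")) {
            es.start();

            try (RestClient client = RestClient.builder(
                     HttpHost.create("http://" + es.getHttpHostAddress())).build()) {
                // Index a test document into a scratch index (the name is arbitrary).
                Request index = new Request("PUT", "/test-index/_doc/1");
                index.setJsonEntity("{\"title\":\"hello\"}");
                Response response = client.performRequest(index);
                System.out.println(response.getStatusLine());
            }
        }
    }
}
```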

Related

Designing ElasticSearch Migration from 6.8 to 7.16 along with App Deployment

I have a Spring Boot application that uses Elasticsearch 6.8, and I would like to migrate it to Elasticsearch 7.16 with the least downtime. I can do a rolling update, but the problem is that when I migrate my ES cluster from version 6 to 7, some features in my application fail because of breaking changes (for example, the total-hits response change).
I also upgraded my Elasticsearch client to version 7 in a separate branch and could deploy it as well, but that client doesn't work with ES version 6, so I cannot first release the application and then do the ES migration. I thought about doing the application deployment and the ES migration at the same time with a few hours of downtime, but if something goes wrong, a rollback may take too much time (we have >10 TB of data in PROD).
I still haven't found a good solution to this problem. I'm thinking of migrating only the ES data nodes to 7.16 and keeping the master nodes on 6.8, then doing the application deployment and migrating the Elasticsearch master nodes together, with a small downtime. Has anyone tried this? Would running the data and master nodes of my Elasticsearch cluster on different versions (6.8 and 7.16) cause problems?
Any help / suggestion is much appreciated.
The breaking change you mention can be alleviated by using the query string parameter rest_total_hits_as_int=true in your client code, in order to keep getting the total hit count as in version 6 (mentioned in the same link you shared).
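For illustration, with the Elasticsearch low-level Java REST client you can attach that parameter per request; the host and index name here are placeholders:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class TotalHitsCompatExample {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            Request search = new Request("GET", "/my-index/_search");
            // Ask the 7.x cluster to report hits.total as a plain integer, as 6.x did.
            search.addParameter("rest_total_hits_as_int", "true");
            search.setJsonEntity("{\"query\":{\"match_all\":{}}}");
            Response response = client.performRequest(search);
            System.out.println(response.getStatusLine());
        }
    }
}
```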
Running master and data nodes with different versions is not supported, and I would not venture into it. If you have a staging environment where you can test this upgrade procedure, all the better.
Since 6.8 clients are compatible with 7.16 clusters, you can add that small bit to your 6.8 client code, and then you should be able to upgrade your cluster to 7.16.
Once your ES server is upgraded, you can upgrade your application code to use the 7.16 client and you'll be good.
As usual with upgrades, since you cannot revert them once started, you should test this in a test environment first.

How can I run Tarantool 2.3.1 with a snapshot from Tarantool 1.10.3?

Circumstances:
For 3 years there has been an application that uses Tarantool (currently 1.10.3), and at some point we decided to move it to Kubernetes and replace the old and ugly Jessie-based Dockerfile with the official image tarantool/tarantool:2.3.1. I don't know whether the data will come through intact.
Given this, I have two questions, and I will be really happy to read the correct answers from you:
Does Tarantool 2.3.1-2-g92750c828 support recovering from snapshots made by Tarantool 1.10.3?
How can I reliably load the snapshot data with the new version? It would be great to do it without restarting Tarantool, because I have a PVC like "emptyDir" that can't preserve any data when the pod restarts.
Yes, Tarantool 2.3 is compatible with Tarantool 1.10 in terms of binary protocol and snapshot format. If you can't simply run Tarantool 2.3 from a snapshot made by 1.10, please file an issue: https://github.com/tarantool/tarantool/issues
One thing to note: after the upgrade to 2.3 it may be impossible to go back to 1.10 (some incompatible system spaces/records will be created).
Sorry, I can't answer the Kubernetes part of this question. But with a single instance you can't avoid simply stopping it and starting it again. Upgrade without downtime is only available for a replication cluster: https://www.tarantool.io/en/doc/2.4/book/admin/upgrades/#upgrading-tarantool-in-a-replication-cluster

Automatically remove older Zipkin entries in Elasticsearch

This is specifically for Zipkin's Elasticsearch storage connector, which does not manage index retention itself (that is normally what Curator is used for).
Is there a way to automatically remove old traces as part of the Elasticsearch configuration (rather than building yet another service or cron job)? Since I am using it for a development server, I just need it wiped every hour or so.
From the Zipkin docs:
There is no support for TTL through this SpanStore. It is recommended instead to use Elastic Curator to remove indices older than the point you are interested in.
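If Curator feels too heavy for a dev box, one hedged alternative is a tiny scheduled task in your own tooling that deletes the Zipkin indices over the Elasticsearch REST API. The zipkin* index pattern and host are assumptions (and wildcard deletes must be allowed by the cluster):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class ZipkinIndexWiper {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build();

        // Every hour, drop all indices matching the Zipkin naming pattern.
        scheduler.scheduleAtFixedRate(() -> {
            try {
                client.performRequest(new Request("DELETE", "/zipkin*"));
            } catch (Exception e) {
                e.printStackTrace(); // dev-only helper; a failed wipe is non-fatal
            }
        }, 1, 1, TimeUnit.HOURS);
    }
}
```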

Remote data store processing with ElasticSearch 7.1 and log4j2.11.1

I am using Elasticsearch 7.1, which ships with log4j-2.11.1.jar. The problem comes when I try to set up a remote data store with log4j2 running as a TcpSocketServer. I would then use the log4j logging API in different Java applications to transmit logs to the remote data store for analysis. However, from the log4j2 Java documentation, I found out that TcpSocketServer has been removed.
How did you manage to configure a remote data store with the latest log4j2 library? Is there a working architecture layout that still fits my use case?
Elasticsearch is not a great log shipper; also, what happens if the network is down? We're generally going more down the route that the Beats should take that part over, so Filebeat with the Elasticsearch module here: https://www.elastic.co/guide/en/beats/filebeat/7.1/filebeat-module-elasticsearch.html
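Under that Beats-based architecture, the application side stays trivial: your Java services log to local files with the standard log4j2 API, and Filebeat tails and ships those files. A minimal sketch of the application side (the file appender is assumed to be configured in log4j2.xml):

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class AppLogging {
    private static final Logger log = LogManager.getLogger(AppLogging.class);

    public static void main(String[] args) {
        // With a RollingFile appender configured in log4j2.xml, these lines land
        // in a local log file that Filebeat tails and ships to Elasticsearch.
        log.info("service started");
        log.error("something went wrong", new RuntimeException("example"));
    }
}
```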

Rails - searching and configuration of Solr

I use Solr to search content in my app. What I don't like is that every time I restart my computer, I have to start Solr manually, and whenever there is new content in the app, I have to reindex it, because otherwise Solr won't find the new data.
This is not very comfortable. How does working with Solr look on a server, e.g. on Heroku? Do I have to keep starting Solr all the time, or reindex the data over and over again, as I do on my localhost?
Finally, is there a better solution for searching than Solr?
You are using the included server, right?
You can choose to deploy it in Tomcat: just copy your files to Tomcat and register your Solr application in the Tomcat configuration. Tomcat runs as a service. Alternatively, you can use a script to start Jetty on startup.
A professional Solr service also tries to keep your Solr application alive and your data safe against any cause, such as crashed software, a failed server, or even a datacenter going down.
Check what Heroku (or other hosted Solr solutions) promise you in their terms. They will do a much better job than an individual (no restarting Solr instances frequently!).
When you add something to Solr, it is persisted to disk. Once committed, it becomes available to search. If a document changes, you reindex it to reflect the new changes.
When you restart Solr, the same persisted data is available. What is your exact trouble?
There is also the DIH (Data Import Handler) if you want to index automatically from a database.
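To illustrate the add-then-commit model described above, here is a minimal SolrJ sketch; the core URL and field names are placeholders:

```java
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class SolrAddCommitExample {
    public static void main(String[] args) throws Exception {
        // Point at a local core; adjust the URL for your setup.
        try (HttpSolrClient solr =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "42");
            doc.addField("title", "hello solr");

            solr.add(doc);   // persisted to the index, but not yet searchable
            solr.commit();   // now visible to queries
        }
    }
}
```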
I'm happy with Solr so far.
As for starting the Solr instance after restarting your computer, you can write a bash script that does it for you, or declare an alias that starts both Solr and your app server.
As for re-indexing: new and updated records should be re-indexed automatically, unless you manipulate your data from the console.
For alternative solutions, check out Thinking Sphinx.
