How to move an Elasticsearch index using the file system?

Use case:
I have created the ES indexes mywebsiteindex-yyyymmdd and mysharepointindex-yyyymmdd on my laptop/dev machine. I want to export/zip each index as a file. The file may then be carried over by someone who has credentials to the target machine, and the zip/file imported into the target Elasticsearch folder.
You can treat the words 'machine', 'folder' and 'zip' above loosely. The focus is: transfer the index as a file and re-import it at a target that I may not be able to reach over HTTP/TCP/FTP/SSH.
Is there any Python or other script out there that can export from the source and import to the target? A script that hides the internal complexities of node/cluster count differences between dev and prod etc., and just moves the index.
Note: I have already read the page below, so there is no need to repeat it:
https://www.elastic.co/guide/en/cloud/current/ec-migrate-data.html

There are a few options:
You can use the snapshot and restore API to create a snapshot of your index and restore it in your new instance (the recommended way).
You can use the reindex API in your new instance to reindex your index from remote.
You can use Logstash with your old instance as the input and your new instance as the output.
Or you can write a script/application using one of the supported clients that queries your index, exports it to a file, reads that file, and imports it into your new instance (Logstash can also do that); a minimal sketch of this approach is shown below.
But you can't simply move your data files; this is neither supported nor recommended by Elastic.
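As a rough illustration of that last option, here is a minimal Python sketch using the official elasticsearch client. The hosts, index name, and dump file path are placeholders, and it assumes the index mappings/settings are created on the target separately: it scans the source index into a newline-delimited JSON file that can be handed over offline, then bulk-loads that file into the target.

```python
# Hypothetical export/import helper -- hosts, index name and file path are placeholders.
import json

from elasticsearch import Elasticsearch, helpers

INDEX = "mywebsiteindex-20230101"     # placeholder index name
DUMP_FILE = "mywebsiteindex.jsonl"    # the file you hand over to whoever has target access

def export_index(source_url: str) -> None:
    """Stream every document of the index into a newline-delimited JSON file."""
    source = Elasticsearch(source_url)
    with open(DUMP_FILE, "w", encoding="utf-8") as out:
        for hit in helpers.scan(source, index=INDEX, query={"query": {"match_all": {}}}):
            out.write(json.dumps({"_id": hit["_id"], "_source": hit["_source"]}) + "\n")

def import_index(target_url: str) -> None:
    """Bulk-load the dumped file into the target cluster."""
    target = Elasticsearch(target_url)

    def actions():
        with open(DUMP_FILE, encoding="utf-8") as f:
            for line in f:
                doc = json.loads(line)
                yield {"_index": INDEX, "_id": doc["_id"], "_source": doc["_source"]}

    helpers.bulk(target, actions())

if __name__ == "__main__":
    export_index("http://localhost:9200")      # run this against the dev cluster
    # import_index("http://target-host:9200")  # run this wherever the target is reachable
```

This only moves documents, not cluster-level details, so it is indifferent to node/cluster count differences between dev and prod.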

Related

Azure cognitive search indexer blob storage

I am stuck in a complicated situation and would appreciate it if somebody could help.
I was testing indexing blob storage (PDF files) and indexed a copy of my storage in a QA environment, which cost me some money.
My question is:
Is there any way to use this index in production without indexing again?
I found a solution to copy the index and that works fine, but when I add an indexer that is connected to the production blob storage, it starts indexing from scratch again (as I expected). Is there any way to avoid this? Is there any way to ask the indexer to index only from now on?
I tried to use the index and the indexer that I already have by changing the subscription to prod, but I have to change the data source for the indexer to point at the production blob storage, and in that case I get an error:
Indexer 'filesIndexer' currently references data source 'qafilesds' and cannot be updated to reference a different datasource 'prodfilesds' because it has a non-empty change tracking state, or it is currently in progress. You can use Reset API to reset the indexer's change tracking state when it is no longer in progress, and retry this call.
A simple answer to your first question is to simply use the QA index you built.
A more complicated answer is to switch from the pull model you are using now to a push model. From your explanation above I assume all of your content comes from blob storage and you have configured an indexer to do the indexing for you; that is the pull model.
The alternative is to use the Azure Cognitive Search SDK to write your own application that submits content to the index instead (the push model). In this case you do not use the built-in indexer, only the index itself. You are then free to use whatever logic you want to determine what to index and what to skip. You can even enable your storage accounts to notify your application with events when content is updated.
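As a rough sketch of that push model, uploading documents with the azure-search-documents Python SDK looks something like the following; the endpoint, API key, index name, and field names are placeholders rather than values from the question.

```python
# A hypothetical push-model uploader; endpoint, key, index name and fields are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="files-index",
    credential=AzureKeyCredential("<admin-api-key>"),
)

# Each dict must match the fields defined in your index schema.
documents = [
    {"id": "doc-1", "fileName": "report.pdf", "content": "text extracted from the PDF ..."},
]

result = search_client.upload_documents(documents=documents)
for item in result:
    # IndexingResult reports per-document whether the upload succeeded.
    print(item.key, item.succeeded)
```

Because your own code decides which blobs to submit, nothing is re-indexed unless you choose to send it.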

Neo4j in memory db

I saw that Neo4j can run as an impermanent DB for unit testing purposes, but I'm not sure if this fits my needs. I have my data stored in Neo4j the usual way (persistent), but, starting from that data, I want to let each user begin an "experimental session": the users add/delete nodes and relationships, but NOT in a permanent way, just experimenting with the data (after the session the edits should be lost). The edits shouldn't be saved and obviously they shouldn't be visible to the others. What's the best way to accomplish that?
Using an impermanent database should work, with some caveats:
you would need to import the data into each new database;
spring-data-neo4j is not able to connect to multiple databases (in the current release), so you would need to start multiple instances of your application, e.g. in a Tomcat container;
when your application stops (or crashes) you would obviously lose the data.
Alternatively, you could use a single database with the base data being public (i.e. visible to everyone) and add an owner property to all new nodes/relationships.
When querying the data you would check that the owner is either public or the current user.
At the end of the session you would simply delete all nodes and relationships with the given owner.
If you also want to edit existing data it gets more complicated: you could create a copy of the node/relationship and handle that somehow, or, if the dataset is not too large, copy the whole dataset per session.
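A minimal sketch of that owner-property idea with the official Neo4j Python driver follows; the label, property names, and credentials are illustrative assumptions rather than anything from the question.

```python
from neo4j import GraphDatabase  # official Neo4j Python driver

# Placeholder connection details.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def add_experimental_node(tx, owner, name):
    # Nodes created during an experimental session are tagged with their owner.
    tx.run("CREATE (n:Item {name: $name, owner: $owner})", name=name, owner=owner)

def visible_items(tx, owner):
    # Base data is marked owner = 'public'; users additionally see their own edits.
    result = tx.run(
        "MATCH (n:Item) WHERE n.owner IN ['public', $owner] RETURN n.name AS name",
        owner=owner,
    )
    return [record["name"] for record in result]

def end_session(tx, owner):
    # Throw away everything the user created during the session.
    tx.run("MATCH (n {owner: $owner}) DETACH DELETE n", owner=owner)

with driver.session() as session:
    session.execute_write(add_experimental_node, "alice", "what-if node")
    print(session.execute_read(visible_items, "alice"))
    session.execute_write(end_session, "alice")
```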
You can build a docker image from the neo4j base image (or build your own) and copy your graph.db into it.
Then you can have every user start a docker container from said image.
If that doesn't answer your question, more info is needed.

Clear the cache of FetchDistributedMapCache processor

How to clear the cache of FetchDistributedMapCache processor in Apache NiFi?
I tried deleting the persisted directory and also tried giving a new directory altogether, but it still fetches old data. Thanks for your help.
You should be able to stop the DistributedMapCacheClient and DistributedMapCacheServer, delete the existing DistributedMapCacheServer, create a new one on the same port as the previous one, and then start them back up.
Inside NiFi, you could create a new DistributedMapCacheServer and point your processor at that instead. Outside of NiFi, I've written a Groovy script where you can interact with the DistributedMapCacheServer from the command line. The API only allows you to remove entries you know about; in the upcoming NiFi 1.2.0 release, you will be able to remove entries using a regular expression for the keys (implemented in NIFI-3627). At that point I will update the Groovy script to enable that feature.

Is it possible to configure properties like jcr:PrimaryType from a maven install

I'm following the steps from the Adobe instructions on How to Build AEM Projects using Maven, and I'm not seeing how to populate or configure the metadata for the content.
I can edit and configure the actual files, but when I push the zip file to the CQ instance, the installed component has a jcr:primaryType of nt:folder, while the item I'm trying to duplicate has a jcr:primaryType of cq:Component (as well as many other properties). So is there a way to populate that data without needing to manually interact with CQ?
I'm very new to AEM, so it's entirely possible I've overlooked something very simple.
Yes, it is possible to configure JCR node types without manually changing them in CQ.
Make sure you have a .content.xml file in the component folder and that it contains the correct jcr:primaryType (e.g. jcr:primaryType="cq:Component").
This file contains the metadata for mapping the JCR node onto the file system.
For beginners it may be useful to take a look at VLT, which is used to import/export JCR content to and from the file system. The component's files in your project should end up looking similar to what a VLT export of the component from the JCR produces.
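For illustration, a minimal .content.xml for a component might look like the following; the title and group values are placeholders, not taken from the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:cq="http://www.day.com/jcr/cq/1.0"
          xmlns:jcr="http://www.jcp.org/jcr/1.0"
    jcr:primaryType="cq:Component"
    jcr:title="My Component"
    componentGroup="My Project"/>
```

With this file sitting in the component folder of the package, the node is created as cq:Component (with the listed properties) instead of falling back to nt:folder on install.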

Elasticsearch - how to store scripts in config/scripts directory

I'm trying to experiment with using scripts in the config/scripts directory. The Elasticsearch docs here say this:
Save the contents of the script as a file called config/scripts/my_script.groovy on every data node in the cluster:
This seems like it's probably really easy, but I'm afraid I don't understand how exactly to put a Groovy file "on every data node in the cluster". Would this normally be done through the command line somehow, or can it be done by manually moving the Groovy file (in Finder on OS X, for example)? I have a test index, but when I look at the file structure on the nodes I'm confused about where to put the Groovy file. Help, pretty please.
You just need to copy the file to each server running Elasticsearch. If you're just running Elasticsearch on your computer, go to the folder you've installed Elasticsearch into and copy the file into config/scripts there (you may have to create the folder first). It doesn't matter how the file gets there.
You should see an entry in the logs (or the console if you are running in the foreground) along the lines of:
compiling script file [/path/to/elasticsearch/config/scripts/my_script.groovy]
This won't show up straight away; by default Elasticsearch checks for new/updated scripts every 60 seconds (you can change this with the watcher.interval setting).
Since file scripts are deprecated (elastic/elasticsearch#24552 & elastic/elasticsearch#24555), this approach is not going to work anymore.
The API is the only way.
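As a hedged example of the API route, here is a minimal sketch of storing and using a script with the elasticsearch Python client (7.x-style calls); the host, script id, field name, and index name are placeholders, and note that stored scripts use Painless rather than Groovy.

```python
# A minimal stored-script example; host, script id, field and index names are placeholders.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Store the script in the cluster state instead of a file on each node.
es.put_script(
    id="my_script",
    body={
        "script": {
            "lang": "painless",
            "source": "doc['my_field'].value * params.factor",
        }
    },
)

# Reference the stored script by id in a query.
resp = es.search(
    index="test-index",
    body={
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {"id": "my_script", "params": {"factor": 2}},
            }
        }
    },
)
print(resp["hits"]["total"])
```

Because the script lives in the cluster state, there is no longer any need to copy a file onto every data node.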
