Greenplum - migrating segment data directories to new hardware location

I currently have my Greenplum database installed and running on a server. I have attached a new hard disk and simply copied the master directory and all of the segment directories over, because I want to point my database at the data on the new hard disk.
I have changed the environment variable MASTER_DATA_DIRECTORY to point to the new master directory; however, I cannot figure out how to point to the new segment data directories. How can I point to the new directories so that when I run gpstart, my database starts up using the data on the new hardware?
Thanks

Out of the box, Greenplum does not support moving its data directories, but it can be done in one of two ways:
Move the directories and create symbolic links at their old locations pointing to the new ones. For instance, if you previously used the "/data/master" directory and its contents now live in "/data2/master", you can remove "/data/master" and replace it with a symbolic link "/data/master -> /data2/master" (a sketch follows this answer).
The more complicated and not recommended approach: Greenplum stores filespace locations in the pg_filespace_entry table. You would start Greenplum in restricted mode, edit this table ("set allow_system_table_mods=DML; update pg_filespace_entry set ..."), stop Greenplum (the stop might fail, in which case you must stop each segment manually with "pg_ctl -D <segment data directory> stop"), and then move the directories.
Whichever approach you take, back up the database first. If this is a test environment, I'd recommend simply removing the old system with "gpdeletesystem" and initializing it anew in the new directories.
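For the symlink route, here is a minimal Python sketch, assuming the cluster is stopped; the paths (/data/master, /data/primary/gpseg0) are illustrative and you would substitute your actual master and segment data directories:

```python
#!/usr/bin/env python3
"""Move Greenplum data directories to a new disk and leave symlinks behind.
Run only while the cluster is stopped; paths below are illustrative."""
import shutil
from pathlib import Path

# Hypothetical old -> new locations; replace with your real directories.
MOVES = {
    "/data/master": "/data2/master",
    "/data/primary/gpseg0": "/data2/primary/gpseg0",
}

for old, new in MOVES.items():
    old_path, new_path = Path(old), Path(new)
    new_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(old_path), str(new_path))                 # relocate the data
    old_path.symlink_to(new_path, target_is_directory=True)   # old path -> new path

# With the links in place, the old paths Greenplum has recorded still resolve
# to the data on the new disk, so gpstart should come up without catalog edits.
```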

Related

How to migrate a PostgreSQL 10 database from Windows C drive to another drive

I have an almost identical problem to the one in this post:
How to migrate a Windows 10 installation of PostgreSQL 9.5.7 to a larger disk
I have a PostgreSQL database on my C drive which is running out of space. I want to move my database to my larger F drive. I'm running into the same issue as the user in the post I mentioned:
The "path to executable" for the Windows service that starts my server is
C:\PostgreSQL\pg10\pgservice.exe "//RS//PostgreSQL 10 Server"
There's no path to the data directory explicitly written there, and I'm not sure how to change where PostgreSQL looks for its data since there's no -D argument defined.
I think if I just copy my data over to the larger drive and pass the new data directory as an argument on startup, my issue would be solved. Any ideas on how to do this given my current configuration?
I wouldn't call it a migration so much as transferring files from one location to another.
It can be done by (a sketch of these steps follows the list):
Stopping the database server
Cutting/pasting the data to the new drive location
Reconfiguring the database server to use the new location
Starting the server again, or restarting the system if needed
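Here is a rough Python sketch of those four steps on Windows; the service name, install path, and F:\pgdata target are assumptions to adjust for your installation, and note that re-registering with pg_ctl swaps the installer's pgservice.exe wrapper for a plain pg_ctl-registered service carrying the explicit -D argument:

```python
"""Sketch: move a PostgreSQL 10 data directory to another drive on Windows.
Service name and paths are assumptions; adjust to your installation."""
import shutil
import subprocess

SERVICE = "postgresql-10"                      # assumed service name (check services.msc)
OLD_DATA = r"C:\PostgreSQL\pg10\data"          # assumed current data directory
NEW_DATA = r"F:\pgdata"                        # target location on the larger drive
PG_CTL = r"C:\PostgreSQL\pg10\bin\pg_ctl.exe"  # assumed install path

# 1. Stop the database server.
subprocess.run(["net", "stop", SERVICE], check=True)

# 2. Copy the data to the new drive (keep the old copy until the server starts cleanly).
shutil.copytree(OLD_DATA, NEW_DATA)

# 3. Reconfigure the service: re-register it so the definition carries -D <new data dir>.
subprocess.run([PG_CTL, "unregister", "-N", SERVICE], check=True)
subprocess.run([PG_CTL, "register", "-N", SERVICE, "-D", NEW_DATA], check=True)

# 4. Start the server again.
subprocess.run(["net", "start", SERVICE], check=True)
```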

How to move elasticsearch index using file system?

Use case:
I have created the ES indexes mywebsiteindex-yyyymmdd and mysharepointindex-yyyymmdd on my laptop/dev machine. I want to export/zip an index as a file. The file may then be carried over by someone who has credentials to the target machine, and the zip/file imported into the target Elastic folder.
You can abstract away the words 'machine', 'folder', and 'zip' above. The focus is: 'transfer an index as a file and re-import it at a target that I may not be able to reach over http/tcp/ftp/ssh'.
Is there any Python/other script out there that can export from the source and import to the target? A script that hides the internal complexities of node/cluster count differences between dev/prod etc., and just moves the index.
Note: I have already read the page below, so there is no need to repeat it:
https://www.elastic.co/guide/en/cloud/current/ec-migrate-data.html
There are some options:
You can use the snapshot and restore API to create a snapshot of your index and restore it into your new instance (the recommended way; see the sketch after this list).
You can use the reindex API in your new instance to reindex your index from the remote (old) instance.
You can use Logstash with your old instance as the input and your new instance as the output.
You can also write a script/application using one of the supported clients to query your index, export it to a file, read that file, and import it into your new instance (Logstash can also do that).
But you can't simply move the data files; that is neither supported nor recommended by Elastic.
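As an illustration of the snapshot-and-restore option, here is a hedged sketch using the plain REST API through the requests library; the repository name, snapshot name, hosts, and filesystem location are assumptions, and an fs repository only works if its location is listed under path.repo in elasticsearch.yml on the nodes:

```python
"""Sketch: snapshot an index to a filesystem repository, carry it over, restore it.
Hosts, repository/snapshot names, and paths are illustrative."""
import requests

SOURCE = "http://localhost:9200"        # dev/laptop cluster
TARGET = "http://target-host:9200"      # target cluster (reachable from the target side only)
REPO = "file_repo"                      # hypothetical repository name
SNAPSHOT = "site-indexes-20240101"      # hypothetical snapshot name
LOCATION = "/tmp/es-backups"            # must appear under path.repo in elasticsearch.yml

# 1. Register a shared-filesystem repository on the source cluster.
requests.put(f"{SOURCE}/_snapshot/{REPO}",
             json={"type": "fs", "settings": {"location": LOCATION}}).raise_for_status()

# 2. Snapshot the indexes to move; the repository directory is the "file" you zip
#    and hand to whoever has access to the target machine.
requests.put(f"{SOURCE}/_snapshot/{REPO}/{SNAPSHOT}",
             params={"wait_for_completion": "true"},
             json={"indices": "mywebsiteindex-*,mysharepointindex-*"}).raise_for_status()

# 3. On the target: unzip into a path.repo directory, register the repository there,
#    and restore the snapshot.
requests.put(f"{TARGET}/_snapshot/{REPO}",
             json={"type": "fs", "settings": {"location": LOCATION}}).raise_for_status()
requests.post(f"{TARGET}/_snapshot/{REPO}/{SNAPSHOT}/_restore").raise_for_status()
```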

How to copy windows folder structure recursively from one ec2 instance to another?

We have a Windows 2012 R2 AWS EC2 instance with a particular folder structure created for a COTS application. We have built fault tolerance, so any time this instance goes down and another one comes up, the new instance installs everything from scratch. The challenge is copying the folder structure onto the new instance. The structure is quite deep (5 levels), and I would like to avoid manually creating these hundreds of folders when the new instance comes up.
To Illustrate, my current ec2 has:
C:\ABC
C:\ABC\sub1
C:\ABC\sub2
...
C:\ABC\subn
C:\ABC\sub1\child1-sub1
C:\ABC\sub1\child2-sub1
...
C:\ABC\sub2\child1-sub2
C:\ABC\sub2\child2-sub2
...
so on..
My idea is that if I can capture the folder structure (without files) in a variable, I can write the variable to a file and copy the file to S3. When the new instance comes up, it reads this file from S3, gets the structure, and re-creates it.
I tried "robocopy $source $dest /e /xf *.*", but $dest has to be a directory. I need to capture the result in some kind of variable that can be stored somewhere.
Any suggestions/thoughts?
You can use the tree command to get the directory structure.
In PowerShell, you can use the tree command to print a tree structure, starting from the current directory down to its deepest descendants. You can also pass a custom path as an argument.
Then, as you said, you can store that output in S3.
Finally, you can run commands at startup to read the file from S3 and re-create the folder hierarchy on the new EC2 instance (a sketch follows).
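If you prefer a script over parsing tree output, here is a small sketch that records only the sub-directory paths and re-creates them on the new instance; the bucket and key names are made up, and boto3 is assumed to have credentials (for example via the instance role):

```python
"""Sketch: capture a folder skeleton (no files) to a text file in S3, then rebuild it.
Bucket/key names are hypothetical."""
import os
from pathlib import Path

import boto3

BUCKET = "my-config-bucket"          # hypothetical bucket
KEY = "abc-folder-structure.txt"     # hypothetical object key


def capture_structure(root: str, out_file: str) -> None:
    """Write every sub-directory of root (as a relative path, one per line) and upload it."""
    root_path = Path(root)
    with open(out_file, "w", encoding="utf-8") as fh:
        for dirpath, dirnames, _files in os.walk(root_path):
            for d in dirnames:
                fh.write(str(Path(dirpath, d).relative_to(root_path)) + "\n")
    boto3.client("s3").upload_file(out_file, BUCKET, KEY)


def recreate_structure(root: str, in_file: str) -> None:
    """Download the listing and re-create the empty folder tree under root."""
    boto3.client("s3").download_file(BUCKET, KEY, in_file)
    with open(in_file, encoding="utf-8") as fh:
        for line in fh:
            rel = line.strip()
            if rel:
                Path(root, rel).mkdir(parents=True, exist_ok=True)


# On the current instance:   capture_structure(r"C:\ABC", "structure.txt")
# At new-instance startup:   recreate_structure(r"C:\ABC", "structure.txt")
```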

MongoDB running on windows drive setup

Looking to set up a high-performance environment running Mongo 3.4 on Windows 2016 in Azure. I come from a SQL/Windows background and was wondering if there are any options with Mongo to spread out the I/O workload of mongod. It seems odd that there is only a dbPath option and that you cannot configure separate locations for the database(s), oplog, and journal. Am I missing something?
Thanks for any assistance
This is indeed possible, using a couple of different techniques:
The oplog is stored in the local database, so you can keep it in a separate folder by using the storage.directoryPerDB config option.
The journal is stored in a subfolder of the data directory; you can make MongoDB save its journal files in a separate directory by preparing a symbolic link called journal in the data directory, pointing to your other folder (a sketch of both techniques follows).
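As a rough sketch of both techniques (the paths are illustrative, mongod must be stopped first, and creating symlinks on Windows may require elevated rights or a directory junction via mklink /J):

```python
"""Sketch: split MongoDB oplog and journal I/O onto other drives (illustrative paths)."""
from pathlib import Path

DB_PATH = Path(r"D:\mongo\data")           # storage.dbPath in mongod.cfg
JOURNAL_DIR = Path(r"E:\mongo\journal")    # separate drive for the journal

# 1. Oplog: with "storage.directoryPerDB: true" in mongod.cfg, every database --
#    including 'local', which holds the oplog -- gets its own subfolder under dbPath,
#    and that 'local' folder can then sit on a mount/drive of its own.

# 2. Journal: create a link named 'journal' inside dbPath pointing at the other drive.
JOURNAL_DIR.mkdir(parents=True, exist_ok=True)
link = DB_PATH / "journal"
if not link.exists():
    link.symlink_to(JOURNAL_DIR, target_is_directory=True)  # needs admin rights on Windows
```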

What is the algorithm that dropbox uses to identify list of files/folders changed locally when the app was not running?

I understand that we can identify changes in the file system while the app is running by using OS events. I am wondering what algorithm Dropbox uses to identify changes made while the app was not running, for example if I add/modify/delete/rename lots of files and folders. One approach I can think of is comparing the last modified time (LMT) of each file on the file system against the LMT value stored while the app was running; in that case we have to loop through all the files anyway, and the LMT doesn't change on a rename. Is there a better approach, given that relying on LMT has its own problems?
Any comments?
I don't know if this is how Dropbox handles it, but here is a strategy that may be useful:
You have a root directory handled by Dropbox. If I were Dropbox, I'd keep hashes for each file I have on the server. Starting from the root, the app would scan the file tree (directories + files) and compute the hashes for each file.
The scan would produce a double-index hashtable. Each file and directory would be indexed by its relative path (from the root Dropbox directory); a second index would be built from the hash(es) of each file.
Once the app has scanned and built the double-index hashtable, the server sends the tuples (relative path, hashes of the file). Let (f, h) be such a tuple:
The app tries to look up the file through the path index using f:
If there is a result, compare the hashes. If they don't match, update the file on the remote server.
If there is no result, the file may have been deleted OR moved/renamed. The app then tries to look it up through the hash index using h: if there is a match, the file is still there, only under a different path (hence moved or renamed). The app sends this info and the file is moved/renamed accordingly on the server.
If the file is found through neither the hash nor the path, it has been deleted from the Dropbox file tree, so it is deleted accordingly on the server.
Note that this strategy needs a synchronization mechanism to know, when a match is found, whether the file has to be updated on the client or on the server. This could be achieved by storing the time of the last update run by Dropbox (on both client and server) and who performed that last update (on the server). A sketch of the reconciliation step follows.
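Here is a small Python sketch of that reconciliation, with made-up helper names; it covers only the server-to-client comparison described above and leaves conflict resolution aside:

```python
"""Sketch of the double-indexed reconciliation described above (hypothetical helpers)."""
import hashlib
from pathlib import Path


def scan(root: str):
    """Build both indexes: relative path -> hash, and hash -> set of relative paths."""
    by_path, by_hash = {}, {}
    root_path = Path(root)
    for p in root_path.rglob("*"):
        if p.is_file():
            digest = hashlib.sha256(p.read_bytes()).hexdigest()
            rel = str(p.relative_to(root_path))
            by_path[rel] = digest
            by_hash.setdefault(digest, set()).add(rel)
    return by_path, by_hash


def reconcile(server_files, by_path, by_hash):
    """Compare the server's (path, hash) tuples against the local scan and emit actions."""
    actions = []
    for f, h in server_files:
        if f in by_path:                       # path still exists locally
            if by_path[f] != h:
                actions.append(("upload", f))  # contents changed while the app was off
        elif h in by_hash:                     # same content under another path: moved/renamed
            actions.append(("move", f, next(iter(by_hash[h]))))
        else:                                  # not found by path or hash: deleted locally
            actions.append(("delete", f))
    # Local paths that never appear in server_files are new files to upload.
    return actions
```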
