Shell Script for file monitoring

I have 2 AWS EC2 LAMP servers and I want to replicate the data in one of the folders to the other. I know I can try EFS, but for some reason it is not a viable option at the moment. So here is what I need help with:
Server A and Server B have the same file structure, but the files inside are mismatched. So I want a script on Server A that looks in, for example, the /var/www/html/../file/ folder, compares it with /var/www/html/../file/ on Server B, and copies all new files from Server A to B.
Any help on how to write it?

Well, I used S3FS, which is a lot easier than breaking my head over the script. It readily copies the files from one server to the other.
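For reference, copying only the new files from Server A to Server B can also be done with a single rsync over ssh; a minimal sketch, with placeholder path and hostname, assuming key-based ssh from A to B is already set up (--ignore-existing skips anything already present on B):
rsync -avz --ignore-existing /var/www/html/files/ user@server-b:/var/www/html/files/
Put that in a cron job on Server A if it needs to run periodically.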

Related

How to upload a file to a server that's not in the inventory?

Sometimes we need to upload logs of an application that's distributed among multiple local Unix machines to the vendor's server. The machines are all part of the same inventory and can archive the logs and upload the archives directly.
The vendor's server runs Unix and accepts only SCP and SFTP, so the synchronize module (which uses rsync) will not work.
There is a net_put module, but that seems intended for uploads to special network appliances -- trying to use it, I get cryptic errors about ansible_network_os...
I can, of course, use the command module, but isn't there something specifically targeted at SCP and/or SFTP servers?
No, there is no module for scp or sftp, and I don't really see that it would provide a lot of value. sftp and scp are straightforward to use with command, and the underlying commands don't really support the things you might want a module to do, like skipping an upload if the file on the remote wouldn't change.
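For example, a minimal sketch of such a task with the command module (hostname, paths, and user are placeholders, and it assumes non-interactive key-based authentication to the vendor's server is already in place):
- name: Upload the log archive to the vendor's SCP/SFTP-only server
  command: scp -o BatchMode=yes /tmp/app-logs.tar.gz upload@vendor.example.com:/incoming/
BatchMode=yes just makes scp fail fast instead of prompting for a password.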

Need an Ansible playbook to compare files from one server to another server on Linux

I have a folder with a lot of files, and I need to compare it between one server and another. If a file is not on the destination server, it needs to be copied there. Can you please help me write an Ansible playbook to do this?
For example: server A has 100 files, and I need to compare them against server B; if any file is missing, copy it to server B. I need to write this in Ansible, please help.
Thanks,
Raghu
You should use the synchronize module for that kind of task.
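A minimal sketch of such a play, assuming serverA and serverB are the inventory names of the source and destination hosts and the paths are placeholders; the --ignore-existing option keeps rsync from touching files that already exist on B:
- hosts: serverB
  tasks:
    - name: Copy files that are missing on the destination
      synchronize:
        src: /data/files/
        dest: /data/files/
        rsync_opts:
          - "--ignore-existing"
      delegate_to: serverA
With delegate_to, the synchronize module runs rsync on serverA and pushes to the host the play is targeting.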

Hosts File for Greenplum Installation

I am setting up a 3-node Greenplum cluster for a POC. While checking the installation steps, I found that the hostfile_exkeys file has to be on the master node.
Can anyone tell me where I should create this file (location, node, etc.)?
And most importantly, what do I put in it?
You create hostfile_exkeys on the Master. It isn't needed on the other hosts. You can put it in /home/gpadmin or anywhere that is convenient for you.
You put the three hostnames for your POC in this file. Example:
mdw
sdw1
sdw2
This is documented pretty well here: https://gpdb.docs.pivotal.io/5120/install_guide/prep_os_install_gpdb.html
You can also run a POC in the cloud. Greenplum is available in AWS, Azure, and GCP, and it does all of the configuration for you. You can even use the BYOL product listings free for 90 days to evaluate the product, or use the hourly billed products to get support while you evaluate.
There are examples in the utility reference documentation for gpssh-exkeys but, in general, you should put in all the hostnames in your cluster. If there are multiple network interfaces, those can go in instead.
I generally put this file either in /home/gpadmin or /home/gpadmin/gpconfigs (a good place to keep all the files for initial setup and initialization).
Your file will look something like (one name per line):
mdw
sdw1
sdw2
If there are 2 network interfaces, it might look something like:
mdw
mdw-1
mdw-2
sdw1
sdw1-1
sdw1-2
sdw2
sdw2-1
sdw2-2
Your /etc/hosts file (on all servers) should include the IP addresses for all the interfaces and their names, and this hostfile should match the names listed in /etc/hosts.
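For example, a matching /etc/hosts (with made-up private addresses, two extra interfaces per host as above) might start like:
10.0.0.10   mdw
10.0.1.10   mdw-1
10.0.2.10   mdw-2
10.0.0.11   sdw1
10.0.1.11   sdw1-1
10.0.2.11   sdw1-2
and so on for sdw2.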
This is primarily to allow the master to exchange ssh keys with all hosts so that login to the hosts is always password-less. After you have this file set up, you will run (for example):
gpssh-exkeys -f /home/gpadmin/gpconfigs/yourhostfilename
I hope this helps.

Golang file and folder replication / mirroring across multiple servers

Consider this scenario. In a load-balanced environment, I have 3 separate instances of a CMS running on 3 different physical servers. These 3 separate running instances of the application share the same database.
On each server, the CMS has a /media folder where all media subfolders and files reside. My question is how I'd implement/code a file replication service/functionality in Golang, so when a subfolder or file is added/changed/deleted on one of the servers, it'll get copied/replicated/deleted on all other servers?
What packages would I need to look in to, or perhaps you have a small code snippet to help me get started? That would be awesome.
Edit:
This question has been marked as a "duplicate", but it is not. It is, however, an alternative to setting up a shared network file system. I'm thinking that keeping a copy of the same files on all servers, and synchronizing them and keeping them updated, might be better than sharing them.
You probably shouldn't do this. Use a distributed file system, object storage (à la S3 or GCS), or a syncing program like btsync or syncthing.
If you still want to do this yourself, it will be challenging. You are basically building a distributed database and they are difficult to get right.
At first blush you could check out something like etcd or raft, but unfortunately etcd doesn't work well with large files.
You could, on upload, also copy the file to every other server using ssh. But then what happens when a server goes down? Or what happens when two people update the same file at the same time?
Maybe you could design it such that every file gets a unique id (perhaps based on the hash of its contents so you can safely dedupe) and those files can never be updated or deleted, only added. That would solve the simultaneous update problem, but you'd still have the downtime problem.
One approach would be for each server to maintain an append-only version log, appending an entry each time a file is added:
VERSION | FILE HASH
1 | abcd123
2 | efgh456
3 | ijkl789
With that you can pull every file from a server and a single number would be sufficient to know when a file is added. (For example if you think Server A is on version 5, and you get informed it is now on version 7, you know you need to sync 2 files)
You could do this with a database table:
ID | LOCAL_SERVER_ID | REMOTE_SERVER_ID | VERSION | FILE HASH
You could periodically poll this table and do your syncing via ssh or http between machines. If a server was down, you could just retry until it works.
Or, if you didn't want to have a centralized database for this, you could use a library like memberlist. The local metadata for each node could be its version.
Either way, there will be some amount of delay between when a file is uploaded to a single server and when it's available on all of them. Handling that well is hard, which is why you probably shouldn't do this.
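To make the version-log idea concrete, here is a minimal Go sketch (the types and function names are made up for illustration, not from any particular library): an append-only log of version/hash pairs, a content-hash helper that gives each file the immutable ID mentioned above, and the check for which entries a peer is missing.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// VersionEntry is one row of the append-only log: a monotonically
// increasing version number plus the content hash of the file added
// at that version.
type VersionEntry struct {
	Version  int
	FileHash string
}

// hashFile returns the hex-encoded SHA-256 of a file's contents; the
// hash doubles as the file's immutable ID, so identical uploads dedupe.
func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// missingSince returns the log entries a peer still needs, given the
// last version number that peer reported having.
func missingSince(log []VersionEntry, peerVersion int) []VersionEntry {
	var missing []VersionEntry
	for _, e := range log {
		if e.Version > peerVersion {
			missing = append(missing, e)
		}
	}
	return missing
}

func main() {
	log := []VersionEntry{
		{Version: 1, FileHash: "abcd123"},
		{Version: 2, FileHash: "efgh456"},
		{Version: 3, FileHash: "ijkl789"},
	}
	// A peer that last reported version 1 still needs entries 2 and 3.
	fmt.Println(missingSince(log, 1))
}
The actual transfer of the missing files (ssh, http, etc.) and appending to the log on upload would sit on top of this.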

Trouble Uploading Large Files to RStudio using Louis Aslett's AMI on EC2

After following this simple tutorial http://www.louisaslett.com/RStudio_AMI/ and video guide http://www.louisaslett.com/RStudio_AMI/video_guide.html, I have set up an RStudio environment on EC2.
The only problem is, I can't upload large files (> 1GB).
I can upload small files just fine.
When I try to upload a file via RStudio, it gives me the following error:
Unexpected empty response from server
Does anyone know how I can upload these large files for use in RStudio? This is the whole reason I am using EC2 in the first place (to work with big data).
OK, so I had the same problem myself and it was incredibly frustrating, but eventually I realised what was going on here. The default home directory size for AWS is less than 8-10GB regardless of the size of your instance. Since the upload was going to the home directory, there was not enough room. An experienced Linux user would not have fallen into this trap, but hopefully any other Windows users new to this who come across this problem will see this. If you upload to a different drive on the instance, the problem is solved. Because the Louis Aslett RStudio AMI is based in this 8-10GB space, you will have to set your working directory outside the home directory. This is not intuitively apparent from the RStudio Server interface. While this is an advanced forum and this is a rookie error, I am hoping no one deletes this question, as I spent months on this and I think someone else will too. I hope this makes sense to you.
Don't you have shell access to your Amazon server? Don't rely on RStudio's upload (which may reasonably have a 2GB limit) and use proper Unix dev tools:
rsync -avz myHugeFile.dat amazonusername@my.amazon.host.ip:
Running this on your local PC command line (install Cygwin or another unixy compatibility layer) will transfer your huge file to your Amazon server; if interrupted, it will resume from that point, and it compresses the data for transfer too.
For a Windows GUI for something like this, WinSCP is what we used in the bad old days before Linux.
This could have something to do with your web server. Are you using nginx or Apache as your web server? If so, you can modify the upload limit on your nginx server. If you are running nginx on the front end of the web server, I would recommend the following fix in your nginx.conf file:
http {
...
client_max_body_size 100M;
}
https://www.tecmint.com/limit-file-upload-size-in-nginx/
I had a similar problem with a 5GB file. What worked for me was to use SQLite to create a database from the CSV file that I needed: use SQLite code to create the database, then use a function in RStudio to communicate with the local database. In that way, I was able to bring in the CSV file. I can track down the R code that I used if you like.