scp: inconsistency in file structure preservation - bash

My task: collect log files from several servers.
Server file structure: "/remote/path/dir/sub-dirs/files.log", which
is the same on all servers. (All servers have the same set of
"sub-dirs", though some may be absent, and of course the "files.log"
names differ.)
Local file structure: "/local/path/logs"
After copying I would like to have
"/local/path/logs/dir/sub-dirs/files.log"
Method (in a while loop over the servers): scp -r
$SERVERS:/remote/path/dir /local/path/logs
Problem: For reasons I don't understand, the first scp command
ignores the "dir" folder and I get "/local/path/logs/sub-dirs/files.log",
but the following scp commands give me what I intended:
"/local/path/logs/dir/sub-dirs/files.log"
Why is this happening and how should I fix/get around it?
Thanks!

Why is this happening [...]
In the command scp -r path/to/source dest:
If dest doesn't exist, the dest directory will be created, and path/to/source/* will be copied into it. For example, if you have path/to/source/X then dest/X will be created.
If dest is an existing directory, then dest/source will be created, and path/to/source/* will be copied into it. For example, if you have path/to/source/X then dest/source/X will be created.
[...] and how should I fix/get around it?
Create dest in advance, for example:
mkdir -p /local/path/logs
scp -r $SERVERS:/remote/path/dir /local/path/logs
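A sketch of the full loop with that fix applied (assuming, as in the question, that a variable like SERVERS holds the server names, here one per line; the exact variable handling is hypothetical):

# Create the destination once, before the first copy
mkdir -p /local/path/logs

# Copy "dir" and everything under it from each server
while IFS= read -r server; do
    scp -r "$server":/remote/path/dir /local/path/logs
done <<< "$SERVERS"

Because /local/path/logs already exists before the first copy, every scp now creates /local/path/logs/dir/... instead of dumping the sub-dirs directly into the destination.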

Related

How to copy specific files from directories whose names are extracted from an Excel file, using a Bash script

I'm new to Bash and I have a list of directory names stored in an Excel file. I'd like to find those directories (they are located in different locations on the computer) and copy specific files from each directory (a list of 4 files that end with specific endings) to a remote computer.
For example:
For a directory name in the Excel sheet - "NA123" - I'd like to find it and copy part of its content to a remote computer, for example the files samples-sheet.csv, toInfo.xml, newfiles.gz and todo.csv, under a folder named "NA123" on the remote computer.
How do I begin to do that?
**** Editing to give an example of how it needs to be ****
A short example of the csv is as below:
A
1 14RD00129_TS1_01
2 SD-2015-06_01
3 US-005
4 RA99
All the names in the csv are directories that can be found under /home/bella/samples in 3 different folders: some will be at /home/bella/samples/gruop_1, some at /home/bella/samples/gruop_2, and some at /home/bella/samples/gruop_3.
So first I need to iterate through the csv file to locate the matching directory on my computer, then I need to copy 4 specific files to a remote computer, into a directory with the same name. Hope this is clearer...
I guess your CSV file should only consist of directory names then, since there's only one column. I assume there is no header line in the CSV (the A in your example) and no line numbers. You can take this as a starting point:
samples='/home/bella/samples'
# Read one directory name per line from the CSV
while IFS= read -r line; do
    # Locate the matching directory in one of the three group folders
    dir=$(find "$samples"/gruop_{1..3} -type d -name "$line")
    # Copy the four files into a remote directory named after the CSV entry
    scp "$dir"/{samples-sheet.csv,toInfo.xml,newfiles.gz,todo.csv} \
        user@host.com:"/path/to/$line"
done < 'file.csv'
Basically, you could do something like:
# create the directory on the remote:
ssh remote-ip 'mkdir -p NA123'
# copy the files to the remote, into the directory just created:
for f in samples-sheet.csv toInfo.xml newfiles.gz todo.csv; do
    scp "$f" remote-ip:NA123/
done
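Putting the two answers together, a rough combined sketch (same hypothetical names as above: the user@host.com remote, file.csv with one name per line, and a remote base path /path/to), which also creates the remote directory first, since scp will not create it for you:

samples='/home/bella/samples'
remote='user@host.com'

while IFS= read -r line; do
    # Locate the matching directory in one of the three group folders
    dir=$(find "$samples"/gruop_{1..3} -type d -name "$line")
    # Make sure the destination directory exists on the remote
    ssh "$remote" "mkdir -p /path/to/$line"
    # Copy the four files into it
    scp "$dir"/{samples-sheet.csv,toInfo.xml,newfiles.gz,todo.csv} \
        "$remote:/path/to/$line/"
done < 'file.csv'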

Move files under GCS with renaming

I want to write the following bash script which copies files from one GCS bucket to another with renaming options.
My input folder is gs://test-rtt-integration/result/frd/*.orc
and my destination folder is gs://test-rtt-integration/recent_files/frd
The renaming of the copied file should be done based on the name provided from gs://test-rtt-integration/complex-files/TAN/recent_files/today/frd
Once the copy with renaming is done, I need to clean gs://test-rtt-integration/result/frd
I tested the following commands, but they are not working properly
NAME = "$(gsutil ls gs://test-rtt-integration/complex-files/TAN/recent_files/today/frd)"
gsutil mv gs://test-rtt-integration/result/frd/*.orc gs://test-rtt-integration/recent_files/frd/$NAME
gsutil rm -rf gs://test-rtt-integration/result/frd
(all .orc files and other files should be deleted)
But this is not working properly, as I have to split NAME on "/" and take the last part; if the result of that split is called SPLIT, I have to do gsutil mv gs://test-rtt-integration/result/frd/*.orc gs://test-rtt-integration/recent_files/frd/$SPLIT
Any idea on how to do this?
The question is a little bit confusing. You say that you want to move files from one Google Cloud Storage bucket to another, but all the operations are performed within a single bucket called test-rtt-integration.
However, once you get the file location with the command gsutil ls gs://[BUCKET_NAME]/folder, e.g. gs://[BUCKET_NAME]/folder/[FILENAME].orc, note that the gs://[BUCKET_NAME]/folder/ part is always the same for all the objects in the folder, so you can simply strip it and you are left with only the object name, e.g. [FILENAME].orc.
I am not sure if this is exactly what you are looking for, but I did a little bit of coding myself and I have created a bash script that:
Gets the name of each object from the gs://[BUCKET_NAME]/from bucket folder
Copies all objects from the gs://[BUCKET_NAME]/from bucket folder to the gs://[BUCKET_NAME]/to/ bucket folder
Deletes all objects from the gs://[BUCKET_NAME]/from bucket folder
Inside there are comments that explain how every operation works in detail. If that is not exactly what you are looking for, you can get the basic idea of how it works and implement it in a different way that suits you better. I have tested the script myself in Google Cloud Shell and it is working. The example code can be found on GitHub.
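For the renaming step from the question itself, a minimal sketch of that idea (assuming gsutil ls returns exactly one object path under .../today/frd and there is a single .orc file to rename; adjust if either assumption does not hold):

# Full object path, e.g. gs://test-rtt-integration/complex-files/TAN/recent_files/today/frd/[FILENAME]
NAME="$(gsutil ls gs://test-rtt-integration/complex-files/TAN/recent_files/today/frd)"

# Keep only the last path component, i.e. the object name
SPLIT="$(basename "$NAME")"

# Move the .orc file under the new name, then clean up the source folder
gsutil mv gs://test-rtt-integration/result/frd/*.orc "gs://test-rtt-integration/recent_files/frd/$SPLIT"
gsutil rm -rf gs://test-rtt-integration/result/frd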

Comparing files using Ansible

I need to find a way to compare Elasticsearch template folders across a given Elasticsearch data host group. Meaning, if the directory is /usr/local/elasticsearch/config/templates/, I need to make sure all the files inside that directory are the same on every host in that Ansible host group.
No extra template files and no template files with differing versions. I haven't been able to figure out how to do this.
Try combining ansible with rsync dry-run using the shell module:
ansible -i production data_hosts -l '!~host1' -f 1 -m shell \
-a 'rsync --checksum --delete --dry-run -r -v host1.example.com:/usr/local/elasticsearch/config/templates/ /usr/local/elasticsearch/config/templates'
Explanation
Compares the /usr/local/elasticsearch/config/templates directory on all hosts in the [data_hosts] group to host1
Excludes host1.example.com using the -l limit argument: -l '!~host1'
Uses -f 1 to only run one compare at a time. Optional, but helped in my case because the directories contained large numbers of files (>10K)
Uses --dry-run to prevent rsync from actually sync'ing the directories
Uses --delete to list extraneous files in the destination directory
Uses --checksum to compare files based on checksum rather than mod-time and size
Notes
You could modify this one-liner to perform the sync by removing --dry-run. Consider adding -z to compress file data during transfer, and -a for archive mode.
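For example, the syncing variant that note describes might look like this (--dry-run removed, -a and -z added in place of -r; note that --delete will now actually remove extraneous files on the target hosts):

ansible -i production data_hosts -l '!~host1' -f 1 -m shell \
    -a 'rsync --checksum --delete -a -z -v host1.example.com:/usr/local/elasticsearch/config/templates/ /usr/local/elasticsearch/config/templates'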
The production inventory file would look like this:
[data_hosts]
host1.example.com
host2.example.com
host3.example.com
host4.example.com
I did it by first comparing the number of files under the template folder on all hosts in that group, then getting the list of files and their respective md5sum values and exporting them to the current playbook using include_vars. Then I compared each file's md5sum against the exported one, using include_vars and with_items.
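A rough ad-hoc sketch of the gathering part of that approach (file counts first, then path/md5sum pairs), reusing the same inventory and shell-module pattern as the rsync answer above:

# Count the template files on every host in the group
ansible -i production data_hosts -m shell \
    -a 'find /usr/local/elasticsearch/config/templates -type f | wc -l'

# List each file with its md5sum so the values can be compared
ansible -i production data_hosts -m shell \
    -a 'find /usr/local/elasticsearch/config/templates -type f -exec md5sum {} +'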

s3cmd sync is remote copying the wrong files to the wrong locations

I've got the following as part of a shell script to copy site files up to a S3 CDN:
for i in "${S3_ASSET_FOLDERS[@]}"; do
    s3cmd sync -c /path/to/.s3cfg --recursive --acl-public --no-check-md5 --guess-mime-type \
        --verbose --exclude-from=sync_ignore.txt /path/to/local/${i} s3://my.cdn/path/to/remote/${i}
done
Say S3_ASSET_FOLDERS is:
("one/" "two/")
and say both of those folders contain a file called... "script.js"
and say I've made a change to two/script.js - but not touched one/script.js
Running the above command will first copy the file from /one/ to the correct location, although I've no idea why it thinks it needs to:
INFO: Sending file '/path/to/local/one/script.js', please wait...
File '/path/to/local/one/script.js' stored as 's3://my.cdn/path/to/remote/one/script.js' (13551 bytes in 0.1 seconds, 168.22 kB/s) [1 of 0]
... and then a remote copy operation for the second folder:
remote copy: two/script.js -> script.js
What's it doing? Why?? Those files aren't even similar. Different modified times, different checksums. No relation.
And I end up with an S3 bucket containing two incorrect files. The file in /two/ that should have been updated hasn't been. And the file in /one/ that shouldn't have changed is now overwritten with the contents of /two/script.js.
Clearly I'm doing something bizarrely stupid because I don't see anyone else having the same issue. But I've no idea what??
First of all, try running it without the --no-check-md5 option.
Second, I suggest you pay attention to directory names, specifically trailing slashes.
s3cmd documentation says:
With directories there is one thing to watch out for – you can either upload the directory and its contents or just the contents. It all depends on how you specify the source.
To upload a directory and keep its name on the remote side, specify the source without the trailing slash.
On the other hand, to upload just the contents, specify the directory with a trailing slash.
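Applied to the loop from the question, a sketch that follows both suggestions (drop --no-check-md5 and keep the trailing slashes consistent between source and destination, as the array entries "one/" and "two/" already do):

for i in "${S3_ASSET_FOLDERS[@]}"; do
    # Trailing slash on the source uploads only the folder's contents,
    # mirrored into the matching folder on the bucket; MD5 checks stay enabled.
    s3cmd sync -c /path/to/.s3cfg --recursive --acl-public --guess-mime-type --verbose \
        --exclude-from=sync_ignore.txt \
        "/path/to/local/${i}" "s3://my.cdn/path/to/remote/${i}"
done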

Unable to create the md5sum file I need. Doing it manually would be far too labour-intensive

I need to create/recreate an md5sum file for all files in a directory and all files in all sub-directories of that directory.
I am using a rockettheme template that requires a valid md5sum document and I have made changes to the files, so the originally included md5sum file is no longer valid.
There are over 300 files that need to be checksummed, and the md5 hash of each added to a single file.
The basic structure of the file is as follows:
1555599f85c7cd6b3d8f1047db42200b admin/forms/fields/imagepicker.php
8a3edb0428f11a404535d9134c90063f admin/forms/fields/index.html
8a3edb0428f11a404535d9134c90063f admin/forms/index.html
8a3edb0428f11a404535d9134c90063f admin/index.html
8a3edb0428f11a404535d9134c90063f admin/presets/index.html
b6609f823ffa5cb52fc2f8a49618757f admin/presets/preset1.png
7d84b8d140e68c0eaf0b3ee6d7b676c8 admin/presets/preset2.png
0de9472357279d64771a9af4f8657c2a admin/presets/preset3.png
5bda28157fe18bffe11cad1e4c8a78fa admin/presets/preset4.png
2ff2c5c22e531df390d2a4adb1700678 admin/presets/preset5.png
4b3561659633476f1fd0b88034ae1815 admin/presets/preset6.png
8a3edb0428f11a404535d9134c90063f admin/tips/index.html
2afd5df9f103032d5055019dbd72da38 admin/tips/overview.xml
79f1beb0ce5170a8120ba65369503bdc component.php
caf4a31db542ca8ee63501b364821d9d css/grid-responsive.css
8a3edb0428f11a404535d9134c90063f css/index.html
8697baa2e31e784c8612e2c56a1cd472 css/master-gecko.css
0857bc517aa15592eb796553fd57668b css/master-ie10.css
a4625ce5b8e23790eacb7704742bf735 css/master-ie8.css
This is just a snippet, but the logic is there.
hash path/to/file/relative/to/MD5SUM_file
Can anyone help me write a shell script (bash shell) that I can add to my path that will execute and generate a file called "MD5SUM_new"? I want the output file name to be "MD5SUM_new" so I can review the content before issuing a mv MD5SUM_new MD5SUM
FYI, the MD5SUM_new file needs to be saved in the root level of the template.
Thanks
This is quite easy, really. To hash all files under the current directory:
find . -type f | xargs md5sum > md5sums
Then, you can make sure it's correct:
md5sum -c md5sums
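A slightly more defensive variant for the setup in the question: run it from the template root, write to MD5SUM_new for later review, skip the checksum files themselves, and handle filenames containing spaces; the leading ./ is stripped so the paths match the relative format shown above:

# Hash every file under the current directory except the checksum files,
# writing "hash  relative/path" lines to MD5SUM_new for review
find . -type f ! -name 'MD5SUM' ! -name 'MD5SUM_new' -print0 \
    | xargs -0 md5sum \
    | sed 's|  \./|  |' > MD5SUM_new

# Verify afterwards with:
# md5sum -c MD5SUM_new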