I want to get the latest file names under each directory of gcs - bash

I want to know the path to the latest file under each directory using gsutil ls.
Shell script:
for dir in "${dir_list[@]}"; do
    file+=$(gsutil ls -R "${dir}" | tail -n 1)$'\n'
done
Running the command in a loop like this is very slow.
I want the final output to be the list below. Is there another way?
Desired result:
gs://bucket/dir_a/latest.txt
gs://bucket/dir_b/latest.txt
gs://bucket/dir_c/latest.txt
gs://bucket/dir_d/latest.txt

There is no other strategy, for a good reason: directories don't exist in GCS. So you have to scan all the files, read their metadata, keep the latest one, and repeat that for each "similar prefix".
A prefix is what you are calling a directory ("/path/to/prefix/"). That's why you can only search by prefix in GCS, not by file pattern.
So you could build a custom app which, for each distinct prefix (directory), creates a concurrent process (fork) dedicated to that prefix. That way you get parallelization. It's not trivial to write, but you can!
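As a rough sketch of that idea in plain bash (assuming dir_list is an array holding the gs:// prefixes, as in the question), each gsutil call can run as a background job and wait collects them:

# One background job per prefix; dir_list is assumed to hold the gs:// prefixes.
for dir in "${dir_list[@]}"; do
    gsutil ls -R "${dir}" | tail -n 1 &
done
wait    # block until every job has printed its last entry

Note that the lines may come back in any order, since the jobs finish independently.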

Related

Bash union of two directories in one statement

I'm trying to run a command that takes one location input (intended for a single directory of files), but I need to run it on files in several locations. While I'd normally run it on */*.type, I'm looking for some way to run the command over (*/dirA/*.type AND dirB/*.type).
I basically need all files of *.type within a directory structure, but they're not all at the same directory level (or I'd just do */*/*.type or something to grab them all). Unfortunately they're in a particular layout for a reason, which I can't just reorganize to make this command run.
Is there any bash shortcut/command/whatever-it's-called that I could use to get both sets of files at once?
You can say:
dir{A,B}/*.type
For example, running this with the ls command:
root@do:/tmp# ls dir{A,B}/*.type
dirA/test.type  dirB/test.type
If the command works when you pass one wildcard in, that means it is expecting a list of file names. In that case you can pass it two wildcards just as easily.
command */dirA/*.type dirB/*.type
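If the matching files can sit at arbitrary depths instead of two fixed layouts, one alternative sketch (an assumption, not part of the original answer: it needs bash 4+ for the globstar option) matches them recursively:

shopt -s globstar    # make ** match any number of directory levels
command **/*.type    # every .type file anywhere under the current directory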

OS X bash For loop only processes one file in a directory

I'm trying to get this code to process all files in a directory : https://github.com/kieranjol/ifi-ffv1/blob/master/ifi-ffv1.sh
I run it in the terminal and add the path to a file: ./ifi-ffv1.sh /path/to/file.mov. How can I get it to move on to the next file? I'll also need to make sure that it only processes AV files, such as *.avi/*.mkv/*.mov etc.
I've tried using while loops with shift but I can't get that to work either.
I've tried adding a specific path like here but I'm failing http://www.cyberciti.biz/faq/unix-loop-through-files-in-a-directory/
I've tried this https://askubuntu.com/a/315338 and it keeps looping over the same file rather than moving on to the next one. http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-7.html didn't help me either.
I know this is going to be a horribly simple solution but I'm very new to this.
You don't actually have any kind of loop in your code. You need to do something like
for file in path/to/*.avi path/to/*.mkv path/to/*.mov
do
    ./ifi-ffv1.sh "$file"
done
which will loop through all the specified files and substitute each one for $1.
You can put whatever file patterns you want instead of path/to/*.avi path/to/*.mkv path/to/*.mov. If you cd to the directory first, you can leave out the paths and just use *.avi *.mkv *.mov.
To do it all in one script, do something like this:
cd <your directory>
for file in *.avi *.mkv *.mov
do
    <your existing script here>
done
replacing all the $1's in your script with "$file" (not duplicating any quotes you already have, of course)
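Putting it together, a minimal wrapper sketch (the wrapper itself and the directory argument are illustrative, not part of the original script; nullglob is a bash option):

#!/usr/bin/env bash
# Hypothetical wrapper: pass the directory to scan as the first argument.
shopt -s nullglob                    # unmatched patterns expand to nothing
for file in "$1"/*.avi "$1"/*.mkv "$1"/*.mov; do
    ./ifi-ffv1.sh "$file"            # hand each AV file to the existing script
done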

Making checks before rsyncing external drive on OSX

I have the following issue on OSX though I guess this could equally be filed under bash. I have several encrypted portable drives that I use to sync an offsite data store or as an on-the-go data store etc. I keep these updated using rsync with several options including --del and an includes file.
This is currently done very statically, i.e.
rsync <options> --include-from=... /Volumes /Volumes/PortableData
where the includes file would read something like
+ /Abc/
+ /Def/
...
- *
I would like to do the following:
1. Check the correct drive is mounted and find its mount-point
2. Check that all the + /...../ entries are mounted under /Volumes
3. rsync
To achieve 1 I was intending to store the UUIDs of the drives in variables in my profile so that I could search for them and find the relevant mount point: a bash function in .bashrc that takes a UUID and returns a mount point. I have seen some web entries for achieving this.
On 2 I am a little more stuck. What is the best way of retrieving only those entries that are both + and top-level folder designations in the include file, then iterating to check they are mounted and readable? Again, I'm thinking of putting some of this logic in functions for re-usability.
Is there a better way of achieving this? I have thought of CCC, but like the idea of scripting in bash and using rsync as it is a good way of getting to know the command line.
rsync can read in a file that is a list of exclusions.
I would write a script that dumps to a text file the directories that are NOT + top-level folder designations in the include files.
You are going to want the exclusion file to look like this (you can use wildcards if it helps):
dirtoexclude1
dirtoexclude2
dirtoexclude3
Then just point rsync at that exclusion file.
Your rsync command will be something like this:
rsync -aP --exclude-from=rsyncexclusion.txt /Volumes /Volumes/PortableData
-a is essentially recursive (it is archive mode, which implies -r among other things) and -P shows progress and keeps partially transferred files.
good luck.
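For steps 1 and 2 from the question, a hedged sketch (assuming macOS's diskutil accepts a volume UUID, and that the include file, called rsyncinclude.txt here purely for illustration, follows the + /Name/ layout shown above):

# Return the mount point for a given volume UUID (assumes diskutil's output format).
mount_point_for_uuid() {
    diskutil info "$1" | awk -F': *' '/Mount Point/ {print $2}'
}

# Pull the top-level "+ /Name/" entries out of the include file and
# warn about any that are not mounted under /Volumes.
grep '^+ /' rsyncinclude.txt | sed 's|^+ /||; s|/$||' | while read -r vol; do
    [ -d "/Volumes/$vol" ] || echo "missing: /Volumes/$vol" >&2
done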

regex/wildcard in scp

Is it possible to use a wildcard in scp?
I am trying to achieve:
loop
{
    substitute_host (scp path/file.jar user@host:path1/foo*/path2/jar/)
}
I keep getting "scp: ambiguous target".
Actually I am calling an API with source and dest that uses scp underneath and loops over different hosts to put files.
Thanks!
In general, yes, it is certainly possible to use a wildcard in scp.
But in your scp command, the second argument is the target and the first argument is the source. You certainly cannot copy one source into multiple targets.
If you were trying to copy multiple jars, for example, then the following would certainly work:
scp path/*.jar user@host:path2/jar/
"ambiguous target" in this case is specifically complaining that the wildcard you're using results in multiple possible target directories on the remote host.
--- EDIT:
If you want to copy to multiple directories on a remote system and have to determine them dynamically, a script like the following should work:
dir_list=$(ssh user@host ls -d '/path1/foo*/path2/jar/')
for dir in $dir_list; do
    scp path/file.jar user@host:$dir
done
The dir_list variable will hold the results of the execution of the ls on the remote system. The -d is so that you get the directory names, not their contents. The single quotes are to ensure that wildcard expansion waits to execute on the remote system, not on the local system.
And then you'll loop through each dir to do the remote copy into that directory.
(All this is ksh syntax, btw.)
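An equivalent bash-friendly variation (a sketch under the same assumption that the remote paths contain no whitespace) streams the directory list instead of storing it in a variable:

ssh user@host 'ls -d /path1/foo*/path2/jar/' | while read -r dir; do
    scp path/file.jar "user@host:$dir"
done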

Bash script to find specific files in a hierarchy of files

I have a folder in which there are many, many folders, and in each of these I have lots and lots of files. I have no idea which folder each file might be located in. I will periodically receive a list of files I need to copy to a predefined destination.
The script will run on a Unix machine.
So, my little script should:
1. read the received list
2. find all files in the list
3. copy each file to a predefined destination via SCP
Steps 1 and 3 I think I'll manage on my own, but how do I do step 2?
I was thinking about using find to locate each file and, when found, writing the location into an array. When all files are found, I would loop through the array, running the scp command for each file location.
I think this should work, but I've never written a bash script before, so could anyone help me a little to get started? I just need a basic find command which finds a filename and returns the file location if the file is found.
find "$dir" -name "$name" -exec scp {} "$destination" \;
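Tying the three steps together, a minimal sketch (files.txt, search_root, and destination are illustrative names; the list is assumed to hold one filename per line):

#!/usr/bin/env bash
search_root=/path/to/folder              # root of the hierarchy to search
destination=user@host:/predefined/dest/  # scp target
while read -r name; do
    # -exec hands every match straight to scp
    find "$search_root" -name "$name" -exec scp {} "$destination" \;
done < files.txt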
