Selecting a single directory that satisfies a certain pattern - bash

I would like to be able to get the name of the first directory that matches a certain pattern, say:
~/dir-a/dir-b/dir-*
That is, if the directory dir-b contained directories dir-1, dir-2, and dir-3, I would get dir-1 (or, alternatively, dir-3).
The option listed above works if there is only one subdirectory in dir-b, but obviously fails when there are more of them.

You can use bash arrays, like:
content=(~/dir-a/dir-b/dir-*) # stores the matching paths in the array "content"
echo "${content[0]}" # echoes the first element
echo "${content[${#content[@]}-1]}" # echoes the last element of the array "content"
# or, according to @konsolebox's comment
echo "${content[@]:(-1)}"
Another method, make a bash function like:
first() { set -- "$@"; echo "$1"; }
#and call it
first ~/dir-a/dir-b/dir-*
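A minimal sketch that combines the array approach with two guards the snippets above leave out: nullglob, so that an unmatched pattern gives an empty array instead of the literal pattern, and a trailing slash on the pattern, so that only directories match. The function name first_last_dir and the temporary directory layout are hypothetical, for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical demo layout; replace with your own path.
base=$(mktemp -d)
mkdir -p "$base/dir-b/dir-1" "$base/dir-b/dir-2" "$base/dir-b/dir-3"
touch "$base/dir-b/dir-0-file"   # a regular file that also matches dir-*

# Print the first and the last directory whose path starts with the prefix.
first_last_dir() {
    local prefix=$1 matches
    shopt -s nullglob              # unmatched glob expands to nothing
    matches=("$prefix"*/)          # trailing slash: directories only
    shopt -u nullglob
    (( ${#matches[@]} )) || return 1
    # Strip the trailing slash that the pattern added.
    printf '%s\n' "${matches[0]%/}" "${matches[${#matches[@]}-1]%/}"
}

result=$(first_last_dir "$base/dir-b/dir-")
echo "$result"
```

The function returns a non-zero status when nothing matches, so the caller can tell "no directory" apart from an empty name.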
If you want to sort files not by name but by modification time, you can use the following script:
where=~/dir-a/dir-b # no quotes here: a quoted ~ is not expanded
find "$where" -type f -print0 | xargs -0 stat -f "%m %N" | sort -rn | head -1 | cut -f2- -d" "
Decomposed:
the find finds files matching the given criteria
the xargs runs the stat command for every found file and prints the result as "modification_time filename"
the sort sorts the result by time, newest first
the head gets the first of them
and the cut removes the unwanted time field
You can add -mindepth 1 -maxdepth 1 to the find so that it does not descend deeper.
On Linux it can be shorter (using the -printf format of GNU find), but this version works on OS X too. Note that stat -f "%m %N" is the BSD/OS X syntax; GNU stat uses -c with different format specifiers.
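For reference, the shorter Linux variant could look roughly like this. It is a sketch that assumes GNU find, sort, head, cut and touch; the demo directory and file names are made up.

```shell
#!/usr/bin/env bash
# Assumes GNU tools (typical on Linux). Demo directory is hypothetical.
where=$(mktemp -d)
touch -d '2020-01-01' "$where/old.txt"
touch -d '2021-01-01' "$where/new.txt"

# %T@ = modification time in seconds since the epoch, %p = path.
newest=$(find "$where" -mindepth 1 -maxdepth 1 -type f -printf '%T@ %p\n' |
    sort -rn | head -n 1 | cut -d' ' -f2-)
echo "$newest"
```

Swap -type f for -type d if you are after the most recently modified directory rather than a file. Like the original pipeline, this breaks on names that contain newlines.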

Related

How to grab the last result of a find command?

The result of my find command produces the following result
./alex20_0
./alex20_1
./alex20_2
./alex20_3
I saved this result as a variable. Now the only part I really need is the last entry, essentially the highest number or "latest version".
So from the above string all I need to extract is ./alex20_3 and save that as a variable. Is there a way to extract just the last directory output by the find command?
I could extract the last n characters, since the output is already in order, but the number of characters will change once we get to version ./alex20_10 etc.
Try this:
your_find_command | tail -n 1
find can list your files in any order. To extract the latest version you have to sort the output of find. The safest way to do this is
find . -maxdepth 1 -name "string" -print0 | sort -zV | tail -zn1
If your implementation of sort or tail does not support -z and you are sure that the filenames are free of line-breaks you can also use
find . -maxdepth 1 -name "string" -print | sort -V | tail -n1
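To see why -V matters here, a small sketch comparing plain and version sort on names like the ones in the question. It assumes GNU sort for -V; the extra names (alex20_9, alex20_10) are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Assumes GNU sort for -V (version sort). The names are illustrative.
names=$(printf '%s\n' ./alex20_0 ./alex20_9 ./alex20_10 ./alex20_2)

# Plain byte-wise sort compares character by character.
lexical=$(printf '%s\n' "$names" | LC_ALL=C sort | tail -n 1)
# Version sort compares embedded numbers by value.
version=$(printf '%s\n' "$names" | sort -V | tail -n 1)

echo "plain sort picks:   $lexical"
echo "version sort picks: $version"
```

Plain sort places alex20_10 before alex20_2, so tail would return the wrong "latest version" once the numbers reach two digits.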
There are multiple ways to achieve this (the pattern is quoted so that the shell passes it to find unexpanded):
Using the 'tail' command (as suggested by @Roadowl)
find branches -name 'alex*' | tail -n1
Using the 'awk' command
find branches -name 'alex*' | awk 'END{print}'
Using the 'sed' command
find branches -name 'alex*' | sed -e '$!d'
Other possible options are to use a bash script, perl or any other language. Your best bet is whichever one you find most convenient.
Since you want the file name with the highest version, you can sort the output and take the last line. Use sort -V (GNU) so that alex20_10 sorts after alex20_9:
$ ls
alex20_0 alex20_1 alex20_2 alex20_3
$ find . -iname "*alex*" -print | sort -V | tail -n 1
./alex20_3

How do I filter down a subset of files based upon time?

Let's assume I have done lots of work whittling down a list of files in a directory down to the 10 files that I am interested in. There were hundreds of files, and I have finally found the ones I need.
I can either pipe out the results of this (piping from ls), or I can say I have an array of those values (doing this inside a script). Doesn't matter either way.
Now, of that list, I want to find only the files that were created yesterday.
We can use tools like find -mtime 1 which are fine. But how would we do that with a subset of files in a directory? Can we pass a subset to find via xargs?
I can do this pretty easily with a for loop. But I was curious if you smart people knew of a one-liner.
If they're in an array:
files=(...)
find "${files[@]}" -mtime 1
If they're being piped in:
... | xargs -d'\n' -I{} find {} -mtime 1
Note that the second one will run a separate find command for each file which is a bit inefficient.
If any of the items are directories and you don't want to search inside of them, add -maxdepth 0 to disable find's recursion.
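A small sketch of the array form with -maxdepth 0, assuming GNU find and GNU touch (for -d with a relative date); the directory and file names are hypothetical. Keep in mind that -mtime 1 matches files modified between 24 and 48 hours ago, which is not quite the same as "yesterday" by calendar date, and that it looks at modification time, not creation time.

```shell
#!/usr/bin/env bash
# Assumes GNU touch and GNU find. Demo directory is hypothetical.
dir=$(mktemp -d)
touch "$dir/today.log"
touch -d '36 hours ago' "$dir/yesterday.log"
touch -d '5 days ago' "$dir/old.log"

files=("$dir/today.log" "$dir/yesterday.log" "$dir/old.log")

# -maxdepth 0 limits find to the listed paths themselves.
# -mtime 1 matches files last modified between 24 and 48 hours ago.
matched=$(find "${files[@]}" -maxdepth 0 -mtime 1)
echo "$matched"
```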
Another option that won't recurse, though I'd just use John's find solution if I were you.
$: stat -c "%n %w" "${files[@]}" | sed -n "
/ $(date +'%Y-%m-%d' --date=yesterday) /{ s/ .*//; p; }"
The stat will print the name and creation date of files in the array.
The sed "greps" for the date you want and strips the date info before printing the filename.

How to read CSV file stored in variable

I want to read a CSV file using the shell,
but for some reason it doesn't work.
I use this to locate the latest added csv file in my csv folder
lastCSV=$(ls -t csv-output/ | head -1)
and this to count the lines.
wc -l $lastCSV
Output
wc: drupal_site_livinglab.csv: No such file or directory
If I echo the file it says: drupal_site_livinglab.csv
Your issue is that you're one directory up from the path you are trying to read. The quick fix would be wc -l "csv-output/$lastCSV".
Bear in mind that parsing ls -t, though convenient, isn't completely robust, so you should consider something like this to protect you from awkward file names:
last_csv=$(find csv-output/ -mindepth 1 -maxdepth 1 -printf '%T@\t%p\0' |
sort -znr | head -zn1 | cut -zf2-)
wc -l "$last_csv"
GNU find lists all files along with their last modification time, separating the output using null bytes to avoid problems with awkward filenames.
If you remove -maxdepth 1, this becomes a recursive search.
GNU sort arranges the files from newest to oldest, with -z to accept null byte-delimited input.
GNU head -z returns the first record from the sorted list.
GNU cut -z at the end discards the timestamp, leaving you with only the filename.
You can also replace find with stat (again, this assumes that you have GNU coreutils):
last_csv=$(stat csv-output/* --printf '%Y\t%n\0' | sort -znr | head -zn1 | cut -zf2-)
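If GNU find or stat is not available, a plain bash loop with the -nt ("newer than") test is another option. This is only a sketch: the directory and file names are hypothetical, and GNU touch is used just to set up the demo timestamps.

```shell
#!/usr/bin/env bash
# Demo setup (GNU touch for -d); names are hypothetical.
dir=$(mktemp -d)
touch -d '2020-01-01' "$dir/old report.csv"
touch -d '2022-01-01' "$dir/new report.csv"
touch -d '2021-01-01' "$dir/mid.csv"

newest=
for f in "$dir"/*.csv; do
    [[ -f $f ]] || continue      # skips the literal pattern if nothing matches
    # -nt is true when $f was modified more recently than $newest.
    if [[ -z $newest || $f -nt $newest ]]; then
        newest=$f
    fi
done

echo "$newest"
wc -l < "$newest"
```

Because the glob keeps every path as one array-like word, names with spaces or newlines are handled without any null-byte plumbing.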

Using 'find' to select unknown patterns in file names with bash

Let's say I have a directory with 4 files in it.
path/to/files/1_A
path/to/files/1_B
path/to/files/2_A
path/to/files/2_B
I want to create a loop, which on each iteration, does something with two files, a matching X_A and X_B. I need to know how to find these files, which sounds simple enough using pattern matching. The problem is, there are too many files, and I do not know the prefixes aka patterns (1_ and 2_ in the example). Is there some way to group files in a directory based on the first few characters in the filename? (Ultimately to store as a variable to be used in a loop)
You could get all the 3-character prefixes by printing out all the file names, trimming them to three characters, and then getting the unique strings.
find -printf '%f\n' | cut -c -3 | sort -u
Then if you wanted to loop over each prefix, you could write a loop like:
find -printf '%f\n' | cut -c -3 | sort -u | while IFS= read -r prefix; do
echo "Looking for $prefix*..."
find -name "$prefix*"
done
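If the suffixes are known (_A and _B, as in the question) but the prefixes are not, you could also derive each prefix from the _A file with parameter expansion and skip find entirely. A rough sketch; the demo files, including the unpaired 3_A, are made up.

```shell
#!/usr/bin/env bash
# Hypothetical demo files mirroring the question's layout.
dir=$(mktemp -d)
touch "$dir/1_A" "$dir/1_B" "$dir/2_A" "$dir/2_B" "$dir/3_A"

pairs=()
for a in "$dir"/*_A; do
    [[ -e $a ]] || continue      # no *_A files at all
    prefix=${a%_A}               # strip the _A suffix, keep path + prefix
    b=${prefix}_B
    if [[ -e $b ]]; then
        pairs+=("${a##*/} ${b##*/}")
        echo "pair: $a $b"
    else
        echo "no partner for $a" >&2
    fi
done
```

This works for prefixes of any length, whereas cut -c -3 assumes every prefix is exactly three characters wide.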

Using find within a for loop to extract portion of file names as a variable (bash)

I have a number of files with a piece of useful information in their names that I want to extract as a variable and use in a subsequent step. The structure of the file names is samplename_usefulbit_junk. I'm attempting to loop through these files using a predictable portion of the file name (samplename), store the whole name in a variable, and use sed to extract the useful bit. It does not work.
samples="sample1 sample2 sample3"
for i in $samples; do
filename="$(find ./$FILE_DIR -maxdepth 1 -name '$(i)*' -printf '%f\n')"
usefulbit="$(find ./$FILE_DIR -maxdepth 1 -name '$(i)*' -printf '%f\n' | sed 's/.*samplename//g' | sed 's/junk.*//g')"
(More steps using $usefulbit or $(usefulbit) or ${usefulbit} or something)
done
find ./$FILE_DIR -maxdepth 1 -name 'sample1*' -printf '%f\n' and find ./$FILE_DIR -maxdepth 1 -name "sample1*" -printf '%f\n' both work, but no combination of parentheses, curly brackets, or single-, double-, or backquotes has got the loop to work. Where is this going wrong?
Try this:
for file in `ls *_*_*.*`
do
echo "Full file name is: $file"
predictable_portion_filename=${file%%_*}
echo "predictable portion in the filename is: ${predictable_portion_filename}"
echo "---"
done
PS: $variable, ${variable}, "${variable}" and "$variable" are different from $(variable): in the last case, $( ... ) starts a sub-shell and treats anything inside as a command, i.e. $(variable) makes the sub-shell execute a command named variable.
In place of ls *_*_*.*, you can also use (to recursively find all files with that standard file name): ls -1R *_*_*.*
In place of using ${file%%_*} you can also use: echo ${file} | cut -d'_' -f1 to get the predictable value. You can use various other ways as well (awk, sed, etc.).
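For completeness, the loop from the question should work once the pattern is written with double quotes and ${i} instead of '$(i)*': single quotes stop all expansion, and $(i) would try to run a command named i. A sketch assuming GNU find for -printf; FILE_DIR and the file names are placeholders.

```shell
#!/usr/bin/env bash
# Assumes GNU find for -printf; FILE_DIR and file names are hypothetical.
FILE_DIR=$(mktemp -d)
touch "$FILE_DIR/sample1_red_junk" "$FILE_DIR/sample2_green_junk"

samples="sample1 sample2"
found=()
for i in $samples; do
    # Double quotes let ${i} expand while still hiding * from the shell,
    # so find receives the pattern sample1* and does the matching itself.
    filename=$(find "$FILE_DIR" -maxdepth 1 -name "${i}*" -printf '%f\n')
    found+=("$filename")
    echo "$i -> $filename"
done
```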
Excuse me, I can't do it with bash; may I show you another approach? Here is a shell (lua-shell) I am developing, and a demo as a solution for your case:
wws$ ls ./demo/2
sample1_red_xx.png sample2_green_xx.png sample3_blue_xx.png
wws$ source ./demo/2.lua
sample1_red_xx.png: red
sample2_green_xx.png: green
sample3_blue_xx.png: blue
wws$
I really want to know your whole plan, unless you need bash as the only tool...
Er, I forgot to paste the script:
samples={"sample1", "sample2", "sample3"}
files = lfs.collect("./demo/2")
function get_filename(prefix)
for i, file in pairs(files) do
if string.match(file.name, prefix) then return file.name end
end
end
for i = 1, #samples do
local filename = get_filename(samples[i])
vim:set(filename)
:f_lvf_hy
print(filename ..": ".. vim:clipboard())
end
The 'get_filename()' seems a little verbose... I haven't finished the lfs component.
I'm not sure whether answering my own question with my final solution is proper stackoverflow etiquette, but this is what ultimately worked for me:
for i in directory/*.ext; do
myfile="$i"
name="$(echo "$i" | sed 's!.*/!!g' | sed 's/_junk*.ext//g')"
some other steps
done
This way I start with the file name already a variable (in a variable?) and don't have to struggle with find and its strong opinions. It also spares me from having to make a list of sample names.
The first sed removes the directory/ and the second removes the end of the file name and extension, leaving a variable $name that I use as a prefix when generating other files in subsequent steps. So much simpler!
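The same extraction can probably be done without sed at all, using only parameter expansion. A sketch under the assumption that names follow samplename_usefulbit_junk.ext with underscores as separators; the concrete file names are made up.

```shell
#!/usr/bin/env bash
# Hypothetical file names following samplename_usefulbit_junk.ext.
dir=$(mktemp -d)
touch "$dir/sample1_red_junk.ext" "$dir/sample2_green_junk.ext"

bits=()
for path in "$dir"/*.ext; do
    [[ -e $path ]] || continue   # nothing matched the pattern
    file=${path##*/}             # drop the directory part
    sample=${file%%_*}           # text before the first underscore
    rest=${file#*_}              # text after the first underscore
    usefulbit=${rest%%_*}        # text up to the next underscore
    bits+=("$sample:$usefulbit")
    echo "$file -> sample=$sample usefulbit=$usefulbit"
done
```

This avoids spawning two sed processes per file and copes with spaces in names, but it does assume that the sample name itself contains no underscore.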

Resources