For loop, wildcard and conditional statement - bash

I don't really know what I am supposed to do with this.
For each file in the /etc directory whose name starts with o or l and whose second letter is t or r, display its name, size and type ('file'/'directory'/'link'). Use: a wildcard, a for loop and a conditional statement for the type.
#!/bin/bash
etc_dir=$(ls -a /etc/ | grep '^o|^l|^.t|^.r')
for file in $etc_dir
do
stat -c '%s-%n' "$file"
done
I was thinking about something like that, but I have to use an if statement.

You can reach the goal with the find command.
By default it searches through all subdirectories.
#!/bin/bash
_dir='/etc'
find "${_dir}" -name "[ol][tr]*" -exec stat -c '%s-%n' {} \; 2>/dev/null
To control whether subdirectories are searched, you can use the -maxdepth option. In the example below, find looks only at the files and directories directly inside /etc and does not descend into subdirectories.
#!/bin/bash
_dir='/etc'
find "${_dir}" -maxdepth 1 -name "[ol][tr]*" -exec stat -c '%s-%n' {} \; 2>/dev/null
You may also use the -type f or -type d tests to restrict the results to files or directories respectively (if needed).
#!/bin/bash
_dir='/etc'
find "${_dir}" -name "[ol][tr]*" -type f -exec stat -c '%s-%n' {} \; 2>/dev/null
Update #1
As requested in the comments, here is a longer way that uses a for loop and an if statement.
Note: I'd strongly recommend reviewing and practicing the commands used in this script instead of just copying and pasting them to get the score ;)
#!/bin/bash
# Set the main directory path.
_mainDir='/etc'
# Find everything under $_mainDir (ignoring errors, if any) and store the paths in the $_files variable.
_files=$(find "${_mainDir}" 2>/dev/null)
# In this for loop we will
# loop over all paths (note: the unquoted expansion splits on whitespace, so names containing spaces will break)
# extract the bare file name from the full path
# and IF the bare file name matches the pattern, run `stat` on that file and print its output.
for _file in ${_files} ;do
_fileName=$(basename "${_file}")
if [[ "${_fileName}" =~ ^[ol][tr] ]] ;then
stat -c 'Name: %n , Size: %s , Type: %F' "${_file}"
fi
done
fi
done
exit 0

You should break your problem down into multiple pieces and tackle them one by one.
First, try to build an expression that finds the right files. If you were to execute your regular expression in a shell:
ls -a /etc/ | grep '^o|^l|^.t|^.r'
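You would immediately see that you don't get the right output: without -E, grep treats | as a literal character, and even with -E the alternation would accept names that satisfy only one of the two conditions. So the first step would be to understand how grep works and fix the expression to:
ls -a /etc/ | grep '^[ol][tr]'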
Then, you have the file name, and you need the size and a textual file type. The size is easy to obtain using a stat call.
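But you soon realize that stat does not print the file type in the exact wording the exercise asks for ('file'/'directory'/'link'); GNU stat's %F format prints 'regular file', 'directory' or 'symbolic link'. So you probably have to use an if clause to produce those labels.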
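The pieces above can be put together roughly as follows. This is only a minimal sketch: the function name list_matching and its directory argument are made up for illustration, and it assumes GNU stat (for -c '%s').

```shell
#!/bin/bash
# list_matching DIR: hypothetical helper; prints "name size type" for every
# entry in DIR whose name starts with o or l followed by t or r.
list_matching() {
    local dir=$1 path type size
    for path in "$dir"/[ol][tr]*; do        # wildcard
        # An unmatched glob stays literal; skip it (-L also keeps broken links).
        [ -e "$path" ] || [ -L "$path" ] || continue
        if [ -L "$path" ]; then             # test links first: -d and -f follow them
            type=link
        elif [ -d "$path" ]; then
            type=directory
        elif [ -f "$path" ]; then
            type=file
        else
            type=other
        fi
        size=$(stat -c '%s' "$path")        # GNU stat; size in bytes
        printf '%s %s %s\n' "${path##*/}" "$size" "$type"
    done
}

# Example call: list_matching /etc
```

Testing for a symbolic link before the other two cases matters, because -d and -f both follow links and would otherwise report the type of the link target.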

How about this:
shopt -s extglob
ls -dp /etc/@(o|l)@(t|r)* | grep -v '/$'
Explanation:
shopt extglob - enable extended globbing (https://www.google.com/search?q=bash+extglob)
ls -d - list directories names, not their content
ls -dp - and add / at the end of each directory name
@(o|l)@(t|r) - o or l once (@), and then t or r once
grep -v '/$' - remove all lines containing / at the end
Of course, Vab's find solution is better than this ls:
find /etc -maxdepth 1 -name "[ol][tr]*" -type f -exec stat {} \;
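To check which names an extended-glob pattern accepts without touching /etc, the pattern can be tried against a few sample strings. The sketch below uses bash's @(a|b) operator, which matches exactly one of the alternatives; the helper name matches_pattern and the sample names are invented for illustration.

```shell
#!/bin/bash
shopt -s extglob   # enable extended globbing before the pattern is parsed

# matches_pattern NAME: hypothetical helper; succeeds if NAME starts with
# o or l followed by t or r.
matches_pattern() {
    [[ $1 == @(o|l)@(t|r)* ]]
}

for name in otter lrc os-release trace; do
    if matches_pattern "$name"; then
        echo "$name: match"
    else
        echo "$name: no match"
    fi
done
```

For single characters like these, the plain bracket pattern [ol][tr]* used in the find command accepts the same names and needs no shell option.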

Related

Create archive from difference of two folders

I have the following problem.
There are two nested folders A and B. They are mostly identical, but B has a few files that A does not. (These are two mounted rootfs images).
I want to create a shell script that does the following:
1. Find out which files are contained in B but not in A.
2. Copy the files found in 1. from B and create a tar.gz that contains these files, keeping the folder structure.
The goal is to import the additional data from image B afterwards on an embedded system that contains the contents of image A.
For the first step I put together the following code snippet. Note on grep "Nur": "Nur in" = "Only in" (German):
diff -rq <A> <B>/ 2>/dev/null | grep Nur | awk '{print substr($3, 1, length($3)-1) "/" substr($4, 1, length($4)-1)}'
The result is the output of the paths relative to folder B.
I have no idea how to implement the second step. Can someone give me some help?
Using diff for finding files which don't exist is severe overkill; you are doing a lot of calculations to compare the contents of the files, where clearly all you care about is whether a file name exists or not.
Maybe try this instead.
tar zcf newfiles.tar.gz $(comm -13 <(cd A && find . -type f | sort) <(cd B && find . -type f | sort) | sed 's/^\./B/')
The find commands produce a listing of the file name hierarchies; comm -13 extracts the elements which are unique to the second input file (which here isn't really a file at all; we are using the shell's process substitution facility to provide the input) and the sed command adds the path into B back to the beginning.
Passing a command substitution $(...) as the argument to tar is problematic; if there are a lot of file names, you will run into "command line too long", and if your file names contain whitespace or other irregularities in them, the shell will mess them up. The standard solution is to use xargs but using xargs tar cf will overwrite the output file if xargs ends up calling tar more than once; though perhaps your tar has an option to read the file names from standard input.
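If your tar is GNU tar, one way to avoid both problems is to pipe the list into tar instead of putting it on the command line. The following is a sketch under that assumption; the function name archive_new_files and its three arguments are hypothetical, and file names containing newlines are not handled.

```shell
#!/bin/bash
# archive_new_files A B OUT: hypothetical helper; archives the regular files
# that exist under B but not under A into OUT, with paths relative to B.
# Assumes GNU tar, where "-T -" reads the list of names from standard input.
archive_new_files() {
    local a=$1 b=$2 out=$3
    comm -13 <(cd "$a" && find . -type f | sort) \
             <(cd "$b" && find . -type f | sort) |
        tar -czf "$out" -C "$b" -T -    # -C makes the listed names relative to B
}

# Example call: archive_new_files /mnt/A /mnt/B /tmp/newfiles.tar.gz
```

Because the names arrive one per line on standard input, they are never word-split by the shell and there is no limit on how many there can be.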
With find:
$ mkdir -p A B
$ touch A/a A/b
$ touch B/a B/b B/c B/d
$ cd B
$ find . -type f -exec sh -c '[ ! -f ../A/"$1" ]' _ {} \; -print
./c
./d
The idea is to use the exec action with a shell script that tests the existence of the current file in the other directory. There are a few subtleties:
The first argument of sh -c is the script to execute, the second (here _ but could be anything else) corresponds to the $0 positional parameter of the script and the third ({}) is the current file name as set by find and passed to the script as positional parameter $1.
The -print action at the end is needed, even if it is normally the default with find, because the use of -exec cancels this default.
Example of use to generate your tarball with GNU tar:
$ cd B
$ find . -type f -exec sh -c '[ ! -f ../A/"$1" ]' _ {} \; -print > ../list.txt
$ tar -c -v -f ../diff.tar --files-from=../list.txt
./c
./d
Note: if you have unusual file names the --verbatim-files-from GNU tar option can help. Or a combination of the -print0 action of find and the --null option of GNU tar.
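The -print0 and --null combination mentioned in the note could look like the sketch below. It assumes GNU find and GNU tar; the function name pack_missing and its arguments are made up for illustration, and OUT should be an absolute path because the function changes directory.

```shell
#!/bin/bash
# pack_missing A B OUT: hypothetical helper; stores in the tar file OUT the
# regular files found under B that have no counterpart under A.
pack_missing() {
    local a out
    a=$(cd "$1" && pwd) || return      # absolute path, still valid after cd
    out=$3
    (
        cd "$2" || exit
        # Inside sh -c, $1 is the absolute path of A and $2 the current file.
        find . -type f -exec sh -c '[ ! -f "$1/$2" ]' _ "$a" {} \; -print0 |
            tar --null -c -f "$out" --files-from=-
    )
}

# Example call: pack_missing /mnt/A /mnt/B /tmp/diff.tar
```

With NUL as the separator, even names that contain newlines reach tar intact.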
Note: if the shell is POSIX (e.g., bash) you can also run find from the parent directory and get the path of the files relative from there, if you prefer:
$ mkdir -p A B
$ touch A/a A/b
$ touch B/a B/b B/c B/d
$ find B -type f -exec sh -c '[ ! -f A"${1#B}" ]' _ {} \; -print
B/c
B/d

How to remove files from a directory if their names are not in a text file? Bash script

I am writing a bash script and want it to tell me if the names of the files in a directory appear in a text file and if not, remove them.
Something like this:
counter = 1
numFiles = ls -1 TestDir/ | wc -l
while [$counter -lt $numFiles]
do
if [file in TestDir/ not in fileNames.txt]
then
rm file
fi
((counter++))
done
So what I need help with is the if statement, which is still pseudo-code.
You can simplify your script logic a lot:
#!/bin/bash
# for loop to iterate over all files in TestDir
for file in TestDir/*
do
# if grep exits non-zero (the bare file name is not a line of the text document), we delete the file
grep -qxF -- "${file##*/}" fileNames.txt || rm -- "$file"
done
It looks like you've got a solution that works, but I thought I'd offer this one as well, as it might still be of help to you or someone else.
find /Path/To/TestDir -type f ! -name '.*' -exec basename {} + | grep -xvF -f /Path/To/filenames.txt
Breakdown
find: This gets file paths in the specified directory (which would be TestDir) that match the given criteria. In this case, I've specified that it return only regular files (-type f) whose names don't start with a period (! -name '.*'). It then uses its -exec action to run the next command:
basename: Given a file path (which is what find spits out), it will return the base filename only, or, more specifically, everything after the last /.
|: This is a command pipe, that takes the output of the previous command to use as input in the next command.
grep: This is a regular-expression matching utility that, in this case, is given two lists of files: one fed in through the pipe from find—the files of your TestDir directory; and the files listed in filenames.txt. Ordinarily, the filenames in the text file would be used to match against filenames returned by find, and those that match would be given as the output. However, the -v flag inverts the matching process, so that grep returns those filenames that do not match.
What results is a list of files that exist in the directory TestDir, but do not appear in the filenames.txt file. These are the files you wish to delete, so you can simply use this line of code inside a command substitution $(...) to supply rm with the files to delete.
The full command chain—after you cd into TestDir—looks like this:
rm $(find . -type f ! -name '.*' -exec basename {} + | grep -xvF -f filenames.txt)
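One caveat with rm $(...) is that the shell splits the command's output on whitespace, so a name such as "my file.txt" would be passed to rm as two arguments. A loop avoids that. The sketch below is one possible variant, with the hypothetical helper name prune_unlisted; it assumes the list contains bare file names, one per line.

```shell
#!/bin/bash
# prune_unlisted DIR LIST: hypothetical helper; deletes every regular,
# non-hidden file directly inside DIR whose bare name is not a line of LIST.
prune_unlisted() {
    local dir=$1 list=$2 path
    for path in "$dir"/*; do
        [ -f "$path" ] || continue      # skip directories and an unmatched glob
        # -q quiet, -x whole-line match, -F literal string instead of a regex
        if ! grep -qxF -- "${path##*/}" "$list"; then
            rm -- "$path"
        fi
    done
}

# Example call: prune_unlisted TestDir fileNames.txt
```

Replacing rm with echo for a first run shows what would be deleted without removing anything.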

How to cd into grep output?

I have a shell script which basically searches all folders inside a location and I use grep to find the exact folder I want to target.
for dir in /root/*; do
grep "Apples" "${dir}"/*.* || continue
While grep successfully finds my target directory, I'm stuck on how I can move the folders I want to move in my target directory. An idea I had was to cd into grep output but that's where I got stuck. Tried some Google results, none helped with my case.
Example grep output: Binary file /root/ant/containers/secret/Documents/2FD412E0/file.extension matches
I want to cd into 2FD412E0 and move two folders inside that directory.
dirname is the key to that:
cd "$(dirname "$(grep -l "...." ...)")"
will let you enter the directory.
As people mentioned, dirname is the right tool to strip off the file name from the path.
I would use find for this kind of task:
while IFS= read -r file
do
target_dir=$(dirname "$file")
# do something with "$target_dir"
done < <(find /root/ -type f \
-exec grep --files-with-matches "Apples" {} \;)
Consider using find's -maxdepth option. See the man page for find.
Well, there is actually a simpler solution :) I just like to write bash scripts. You might simply use a single find command like this:
find /root/ -type f -exec grep Apples {} ';' -exec ls -l {} ';'
Note the second -exec. It will be executed, if the previous -exec command exited with status 0 (success). From the man page:
-exec command ;
Execute command; true if 0 status is returned. All following arguments to find are taken to be arguments to the command until an argument consisting of ; is encountered. The string {} is replaced by the current file name being processed everywhere it occurs in the arguments to the command, not just in arguments where it is alone, as in some versions of find.
Replace the ls -l command with your stuff.
And if you want to execute dirname within the -exec command, you can use the following trick:
find /root/ -type f -exec grep -q Apples {} ';' \
-exec sh -c 'cd "$(dirname "$1")" && pwd' _ {} ';'
Replace pwd with your stuff.
When find is not available
In the comments you write that find is not available on your system. The following solution works without find:
grep -R --files-with-matches Apples "${dir}" | while IFS= read -r file
do
target_dir=$(dirname "$file")
# do something with "$target_dir"
echo "$target_dir"
done
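When several matching files sit in the same directory, the loops above visit that directory once per file. A possible refinement, sketched here with the invented helper name match_dirs and assuming GNU grep (for -Z), prints each directory only once:

```shell
#!/bin/bash
# match_dirs PATTERN ROOT: hypothetical helper; prints each directory under
# ROOT that contains at least one file matching PATTERN, once per directory.
match_dirs() {
    local pattern=$1 root=$2 file
    # -r recurse, -l print only file names, -Z end each name with a NUL byte
    grep -rlZ -- "$pattern" "$root" |
        while IFS= read -r -d '' file; do
            dirname -- "$file"
        done | sort -u
}

# Example use: enter each directory in a subshell and work there.
# while IFS= read -r dir; do
#     ( cd "$dir" && pwd )
# done < <(match_dirs Apples /root)
```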

Using bash I need to perform a find of 0 byte files but report on their existence before deletion

The history of this problem is:
I have millions of files and directories on a NAS system. I found a count of 1,095,601 empty (0 byte) files. These files used to have data but were destroyed by a predecessor not using the correct toolsets to migrate data between an XSAN and this Isilon NAS.
The files were media production data, like fonts, PDFs and image files. They are no longer useful beyond the history of their existence. Before I proceed to delete them, the production users need a record of which files used to exist, so that when they browse a project folder, they can use the unaffected files but then refer to a text file in the same directory which records which files used to be there as well, and thus explains why certain reference files are broken.
So how do I find files across multiple directories and delete them but first output their filename to a text file which would be saved to each relevant path location?
I am thinking along the lines of:
for file in $(find . -type f -size 0); do
echo "$file" >> /PATH/TO/FOUND/FILE/PARENT/DIR/deletedFiles.txt -print0 |
xargs -0 rm ;
done
To delete each empty file while leaving behind a file called deletedFiles.txt which contains the names of the deleted files, try:
PATH=/bin:/usr/bin find . -empty -type f -execdir bash -c 'printf "%s\n" "$@" >>deletedFiles.txt' none {} + -delete
How it works
PATH=/bin:/usr/bin
This sets a temporary but secure path.
find .
This starts find looking in the current directory
-empty
This tells find to only look for empty files
-type f
This restricts find to looking for regular files.
-execdir bash -c 'printf "%s\n" "$@" >>deletedFiles.txt' none {} +
In each directory that contains an empty file, this adds the name of each empty file to the file deletedFiles.txt.
Notice the peculiar use of none in the command:
bash -c 'printf "%s\n" "$@" >>deletedFiles.txt' none {} +
When this command is run, bash executes the string printf "%s\n" "$@" >>deletedFiles.txt, and the arguments that follow that string are assigned to the positional parameters: $0, $1, $2, etc. When we use $@, it does not include $0. It, as is usual, expands to $1, $2, .... Thus, we add the placeholder none so that the placeholder is assigned to $0, which we ignore, and the complete list of file names is assigned to "$@".
-delete
This deletes each empty file.
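The role of the placeholder is easy to try in isolation. In this small sketch the file names first.txt and second.txt are made-up sample arguments; nothing is read or deleted.

```shell
#!/bin/bash
# With the placeholder, $0 is "none" and "$@" holds both names.
bash -c 'printf "%s\n" "$@"' none first.txt second.txt

# Without it, the first name lands in $0 and is missing from "$@",
# so only second.txt is printed.
bash -c 'printf "%s\n" "$@"' first.txt second.txt
```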
Why not simply
find . -type f -size 0 -exec rm -v {} + |
sed -e 's%^removed .\./%%' -e 's/.$//' >deletedFiles.txt
If your find is too old to support -exec ... + you'll need to revert to -exec rm -v {} \; or refactor to
find . -type f -size 0 -print0 |
xargs -r -0 rm -v |
sed -e 's%^removed .\./%%' -e 's/.$//' >deletedFiles.txt
The brief sed script is to postprocess the output from rm -v which looks like
removed ‘./bar’
removed ‘./foo’
(with some funny quote characters around the file name) on my system. If you are fine with that output, of course, just omit the sed script from the pipeline.
If you know in advance which directories contain empty files, you can run the above snippet individually in those directories. Assuming you saved the snippet above as a script (with a proper shebang and execute permissions) named find-empty, you could simply use
for path in /path/to/first /path/to/second/directory /path/to/etc; do
cd "$path" && find-empty
done
This will only work if you have absolute paths (if not, you can run the body of the loop in a subshell by adding parentheses around it).
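The subshell variant mentioned in parentheses might look like this. It is only a sketch: run_in_each is a hypothetical wrapper, and the directory names in the example call are placeholders.

```shell
#!/bin/bash
# run_in_each CMD DIR...: hypothetical helper; runs CMD inside every DIR.
run_in_each() {
    local cmd=$1 dir
    shift
    for dir in "$@"; do
        # The parentheses start a subshell, so the cd is undone after each
        # iteration and the next relative path resolves from the start directory.
        ( cd "$dir" && "$cmd" )
    done
}

# Example call, with find-empty being the script described above:
# run_in_each find-empty first second/directory
```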
If you want to inspect all the directories in a tree, change the script to print to standard output instead (remove >deletedFiles.txt from the script) and try something like
find /path/to/tree -type d -exec sh -c '
t=$(mktemp -t find-emptyXXXXXXXX)
cd "$1" &&
find-empty | grep . >"$t" &&
mv "$t" deletedFiles.txt ||
rm "$t"' _ {} \;
This uses a temporary file so as to avoid updating the timestamp of directories which do not contain any empty files. The grep . is used purely for side effect; if any (non-empty) lines are printed, it will return success, whereas otherwise, it will report failure; this way, we know whether or not to move the temporary file to the target directory.
With prompting from @JonathanLeffler I have succeeded with the following:
#!/bin/bash
## call this script with: find . -type f -empty -exec handleEmpty.sh {} +
for file in "$@"
do
file2="$(basename "$file")"
echo "$file2" >> "$(dirname "$file")"/deletedFiles.txt
rm "$file"
done
This means I retain a trace of the removed files in a deletedFiles.txt flag file in each respective directory for the users to see when files are missing. That way, they can go back to the archive CDs to retrieve these deleted files, which are hopefully not 0 byte files.
Thanks to @John1024 for the suggestion of using the -empty flag rather than the size.

Get first file of given extension from a folder

I need to get the first file in a folder which has the .tar.gz extension. I came up with:
FILE=/path/to/folder/$(ls /path/to/folder | grep ".tar.gz$" | head -1)
but I feel it can be done simpler and more elegant. Is there a better solution?
You could get all the files in an array, and then get the desired one:
files=( /path/to/folder/*.tar.gz )
Getting the first file:
echo "${files[0]}"
Getting the last file:
echo "${files[${#files[@]}-1]}"
You might want to set the shell option nullglob to handle cases when there are no matching files:
shopt -s nullglob
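Put together, the array approach with nullglob could be wrapped as below. The function name first_match is invented for this sketch; its body runs in a subshell (note the parentheses instead of braces) so that the nullglob setting does not leak into the calling shell.

```shell
#!/bin/bash
# first_match DIR EXT: hypothetical helper; prints the first file in DIR whose
# name ends in EXT (in glob sort order), or returns 1 if there is none.
first_match() (
    shopt -s nullglob                  # an unmatched glob expands to nothing
    files=( "$1"/*"$2" )
    (( ${#files[@]} > 0 )) || exit 1   # exit only leaves the subshell
    printf '%s\n' "${files[0]}"
)

# Example use:
# if FILE=$(first_match /path/to/folder .tar.gz); then
#     echo "first archive: $FILE"
# fi
```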
Here is a shorter version of your own idea.
FILE=$(ls /path/to/folder/*.tar.gz| head -1)
You can use set as shown below. The shell will expand the wildcard and set will assign the files as positional parameters which can be accessed using $1, $2 etc.
# set nullglob so that if no matching files are found, the wildcard expands to a null string
shopt -s nullglob
set -- /path/to/folder/*.tar.gz
# print the name of the first file
echo "$1"
It is not good practice to parse ls as you are doing, because it will not handle filenames containing newline characters. Also, the grep is unnecessary because you could simply do ls /path/to/folder/*.tar.gz | head -1.
Here's a way to accomplish it:
for FILE in *.tar.gz; do break; done
You tell bash to break the loop in the first iteration, just when the first filename is assigned to FILE.
Another way to do the same:
first() { FILE=$1; } && first *.tar.gz
Here you are using the positional parameters of the function first, which is better than setting the positional parameters of your entire bash process (as with set --).
Here's a find based solution:
$ find . -maxdepth 1 -type f -iname "*.tar.gz" | head -1
where:
. is the current directory
-maxdepth 1 means only check the current directory
-type f means only look at files
-iname "*.tar.gz" means do a case-insensitive search for any file with the .tar.gz extension
| head -1 takes the results of find and only returns the first line
You could get rid of the | head -1 by doing something like:
$ find . -maxdepth 1 -type f -iname "*.tar.gz" -print -quit
But I'm actually not sure how portable -print -quit is across environments (it works on MacOS and Ubuntu though).

Resources