Add file name to txt file if it is not zero bytes - bash

I would like to add files that meet a set of conditions to a txt file for easy transfer later. I currently do this with:
ls -1 > AllFilesPresent.txt
value=$(<AllFilesPresent.txt)
rm AllFilesPresent.txt
for val in $value; do
case $val in
(Result*.RData) echo "$val" >> CompletedJobs.txt ;;
esac
done
I've run into a situation where some of the files are corrupted and show up as zero-byte files, which I can find manually with:
find . -size 0 -maxdepth 1
How do I adjust my loop to reject files that are zero bytes?

The code
ls -1 > AllFilesPresent.txt
value=$(<AllFilesPresent.txt)
rm AllFilesPresent.txt
for val in $value; do
has essentially identical functionality to
for val in $(ls -1); do
That doesn't work in general. It breaks if filenames have whitespace or glob characters in them, at least. See Bash Pitfalls #1 (for f in $(ls *.mp3)). In addition, there are particular problems with using the output of ls in programs. It's only suitable for interactive use. See Why you shouldn't parse the output of ls(1).
A correct, completely safe, and much shorter and faster alternative is:
for val in *; do
A full solution for your question is:
shopt -s nullglob
for file in Result*.RData; do
[[ -f $file && -s $file ]] && printf '%s\n' "$file"
done >CompletedJobs.txt
shopt -s nullglob prevents glob patterns expanding to (what amounts to) garbage if they don't match any files.
I've replaced val with the more meaningful (to me anyway) file.
The Result*.RData pattern causes the loop to process only files that match it.
I've added a -f $file test to avoid processing any non-file things (directories, fifos, ...) that might be lying around. It still allows symlinks to files through. You might not want that. You can add a ! -L $file && at the start of the test expression if you want to rule out symlinks.
I've replaced echo "$val" with printf '%s\n' "$file" because the original code doesn't work in general. See the accepted, and excellent, answer to Why is printf better than echo?.
I've moved the redirection to CompletedJobs.txt outside the loop, as suggested by @CharlesDuffy in a comment.
Note that this code won't work if any of the files have newlines in their names (e.g. create one with echo data > $'Result\n1.RData'). That's very uncommon, but possible. The only way to safely store general unquoted filenames in files is to separate them with ASCII NUL characters (which can't appear in file names). To do that, replace the printf ... with printf '%s\0' "$file". That would mean that CompletedJobs.txt is no longer a text file though. It would also require modifications to any tools that read the file.
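For illustration, here is a sketch of how such a NUL-delimited list could be read back with bash's read, using an empty -d to mean "delimit on NUL" (the loop body is a placeholder):
while IFS= read -r -d '' file; do
    printf 'got: %s\n' "$file"    # placeholder for the real per-file action
done < CompletedJobs.txt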
You could also do this just with find:
find . -maxdepth 1 -type f -name 'Result*.RData' -not -size 0 -printf '%P\n' >CompletedJobs.txt
The %P format with -printf removes the leading ./ from outputs so you get Result2.RData instead of ./Result2.RData (which find would print by default, or with the -print option).
Replace \n with \0 to make the output safe for any possible filename.
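Since the point of the list is transferring the files later, the NUL-delimited form can feed straight into NUL-aware tools; for example, GNU tar can read such a list from stdin (a sketch; completed.tar.gz is just a name chosen here):
find . -maxdepth 1 -type f -name 'Result*.RData' -not -size 0 -printf '%P\0' |
tar --null -T - -czf completed.tar.gz   # --null makes -T read NUL-separated names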

-s file True if file exists and has a size greater than zero.
ls -1 > AllFilesPresent.txt
value=$(<AllFilesPresent.txt)
rm AllFilesPresent.txt
for val in $value; do
if [[ -s "${val}" ]]; then
case $val in
(Result*.RData) echo "$val" >> CompletedJobs.txt ;;
esac
fi
done

Related

How to create a txt file with a list of directory names if directories have a certain file

I have a parent directory with over 800 directories, each of which has a unique name. Some of these directories house a sub-directory called y, in which a file called z (if it exists) can be found.
I need to script a loop that will check each of the 800+ directories for z and, if it's there, append the name of the directory (the one before y) to a text file. I'm not sure how to do this.
This is what I have
#!/bin/bash
for d in *; do
if [ -d "y"]; then
for f in *; do
if [ -f "x"]
echo $d >> IDlist.txt
fi
fi
done
Let's assume that any foo/y/z is a file (that is, you do not have directories with such names). If you had a really large number of such files, storing all the paths in a bash variable could lead to memory issues and would argue for another solution, but about 800 paths is not large. So something like this should be OK:
declare -a names=(*/y/z)
printf '%s\n' "${names[#]%%/*}" > IDlist.txt
Explanation: the paths of all z files are first stored in array names, thanks to a glob pattern: */y/z. Then, a pattern substitution is applied to each array element to suppress the /y/z part: "${names[@]%%/*}". The result is printed, one name per line: printf '%s\n'.
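To see the suffix removal in isolation, here is a tiny sketch with made-up directory names:
names=( alpha/y/z beta/y/z )    # alpha and beta are hypothetical names
printf '%s\n' "${names[@]%%/*}"
# prints:
# alpha
# beta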
If you also had directories named z, or if you had millions of files, find could be used, instead, with a bit of awk to retain only the leading directory name:
find . -mindepth 3 -maxdepth 3 -path './*/y/z' -type f |
awk -F/ '{print $2}' > IDlist.txt
If you prefer sed for the post-processing:
find . -mindepth 3 -maxdepth 3 -path './*/y/z' -type f |
sed 's|^\./\(.*\)/y/z|\1|' > IDlist.txt
These two are probably also more efficient (faster).
Note: your initial attempt could also work, even if using bash loops is far less efficient, but it needs several changes:
#!/bin/bash
for d in *; do
if [ -d "$d/y" ]; then
for f in "$d"/y/*; do
if [ "$f" = "$d/y/z" ]; then
printf '%s\n' "$d" >> IDlist.txt
fi
done
fi
done
As noted by @LéaGris, printf is better than echo because if d is, for instance, the string -e, echo "$d" treats it as an option of the echo command and does not print it.
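A quick demonstration of the difference:
d='-e'
echo "$d"              # bash's echo takes -e as an option and prints only a newline
printf '%s\n' "$d"     # prints -e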
But a simpler and more efficient version (even if not as efficient as the first proposal or the find-based ones) would be:
#!/bin/bash
for d in *; do
if [ -f "$d/y/z" ]; then
printf '%s\n' "$d"
fi
done > IDlist.txt
As you can see, there is another improvement (also suggested by @LéaGris), which consists of redirecting the output of the entire loop to the IDlist.txt file. This opens and closes the file only once, instead of once per iteration.
This should solve it:
for f in */y/z; do
[ -f "$f" ] && echo ${f%%/*}
done
Note:
If there is a possibility of a weird top-level directory name like "-e", use printf instead of echo, as in the comment below.
This should do it:
shopt -s nullglob
outfile=IDlist.txt
>$outfile
for found in */y/x
do
[[ -f $found ]] && echo "${found%%/*}" >>$outfile # Drop the /y/x part
done
The nullglob ensures that the loop is skipped if there is no match, and the quotes in the echo ensure that the directory name is output correctly even if it contains two successive spaces.
You can first do some filtering using find. The command below will list all z files recursively within the current directory:
find . -type f | grep z
Let's say one of the output lines was
./dir001/y/z
You can then extract the required part in several ways: grep, sed, awk, etc. For example, with grep:
find . -type f | grep z | grep -E -o "y.*$"
will give
y/z
The first example doesn't check that z is a file, but I think it's worth showing compgen:
#!/bin/bash
compgen -G '*/y/z' | sed 's|/.*||' > IDlist.txt
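A nice property of compgen -G is that it prints nothing and returns a non-zero exit status when the glob matches nothing, so no nullglob handling is needed; for instance:
if compgen -G '*/y/z' > /dev/null; then
    compgen -G '*/y/z' | sed 's|/.*||' > IDlist.txt
fi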
Doing glob expansion, file check and path splitting with perl only:
perl -E 'foreach $p (glob "*/y/z") {say substr($p, 0, index($p, "/")) if -f $p}' > IDlist.txt

Identifying folder with name as largest number in the directory

There is a directory which contains folders named with numbers, and I have to find the folder with the largest number in that directory.
This is the script I've written to find that folder:
files='ls path/'
var=0
for file in $files
do
echo $file
tmp=$((file-"0"))
if [ $tmp -gt $var ]
then
var=$tmp
fi
done
echo $var
But it's not working. It gives the error below when the script is invoked with sudo ./restore2.sh.
ls
path/
./restore2.sh: line 6: path/: syntax error: operand expected (error token is "/")
0
Try this:
#!/bin/bash
files=`ls path/`
var=0
for file in $files
do
echo $file
tmp=$((file-"0"))
if [ $tmp -gt $var ]
then
var=$tmp
fi
done
echo $var
There are backticks here: ls path/ instead of single or double quotes.
I've only corrected this statement and it worked. Also notice the #!/bin/bash added at the top of the script; it tells your system to run the script in a bash shell.
You're using single quotes instead of backticks in files='ls path/', so it's treated as a literal string instead of being evaluated as a command.
Also, for that specific task, you can just do:
ls test | awk '{if($1 > largest){largest = $1}} END{print largest}'
which keeps it a bit simpler.
Use find instead:
find . -maxdepth 1 -type d -regextype "posix-extended" -regex "^.*[[:digit:]]+.*$" | sort -n | tail -1
Set the maxdepth to 1 to check for directories within this directory only and no deeper. Set the regular expression type to posix-extended and search for all directories that have one or more digits. Print the result and order through sort before taking the largest one with tail -1.
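One caveat: sort -n sees the leading ./ on every path, so the numeric comparison degenerates to byte order (./9 would sort after ./89). Printing bare names first avoids that; a variant assuming GNU find's -printf:
find . -maxdepth 1 -type d -printf '%f\n' | grep -E '^[0-9]+$' | sort -n | tail -1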
Does path/ have any files in it? It looks like it's empty.
You should be getting a completely different complaint...
You don't want the path info in the filename. Rather than strip it with ${file##*/}, just go there and use non-path'd names.
An adaptation using your own logic as its base -
cd /whatever/path/ # go where the files are
var=-1 # initialize comparator
for file in [0-9]* # each entry that starts with a digit
do [[ "$file" =~ [^0-9] ]] && continue # skip any file with nondigit contents
[[ -f "$file" ]] || continue # only process plain files
(( file > var )) && var=$file # remember largest seen
done
echo $var # report largest
If you are sure there will be no negative numbered filenames, this should do it.
If there can be valid negatives, then your initialization needs to be appropriately lower, and the exclusion of nondigits should include the minus sign, as well as the list of files to select.
Note that this doesn't parse ls and doesn't require piping through a sort or spawning any other processes -- it's all handled in the bash interpreter and should be pretty efficient.
If you are sure of your data, and know there aren't any negatives or files named just 0 or non-plain-file entries in the directory that match the [0-9]* pattern, you can simplify it to just
cd /whatever/path/ # go where the files are
for file in [0-9]*; do (( file > var )) && var=$file; done
echo $var # report largest
As an aside, if you wanted to preserve the "make a list first" logic, you should still NOT use ls. Use an array.
cd /wherever/your/files/are/
files=( [0-9]* )
for file in "${files[#]}"
do : ...
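A minimal completion of that skeleton, reusing the comparison logic from above:
cd /wherever/your/files/are/
files=( [0-9]* )                          # build the list once; no ls involved
var=-1
for file in "${files[@]}"
do [[ "$file" =~ [^0-9] ]] && continue    # skip anything with nondigit characters
   (( file > var )) && var=$file
done
echo $var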

Bash rename last underscore in string

I have a directory with files, some of which end with an underscore.
I would like to test each file to see whether it ends with an underscore and, if so, strip the underscore off.
I am currently running the following code:
for file in *;do
echo $file;
if [[ "${file:$length:1}" == "_" ]];then
mv $file $(echo $file | sed "s/.$//g");
fi
done
But it does not seem to rename the files with an underscore. For example, if I have a file called all_indoors_, I expect it to give me all_indoors.
You could use built-in string substitution:
for file in *_; do
mv "$file" "${file%_}"
done
Just use a regex to check the string:
for file in *
do
[[ $file =~ "_$" ]] && echo mv "$file" "${file%%_}"
done
Once you are sure it works as intended, remove the echo so that the mv command executes!
It may even be cleaner to use *_ so that the for will just loop over the files with a name ending with _, as hek2mgl suggests in comments.
for file in *_
do
echo mv "$file" "${file%%_}"
done
You can use find, which will be recursive:
while read f; do
mv "$f" "${f:0:-1}"; # Remove last character from $f
done < <(find . -type f -name '*_')
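Note that while read splits on newlines, so that loop would mangle a filename containing one; a NUL-delimited variant (assuming GNU find's -print0) copes with any filename:
while IFS= read -r -d '' f; do
    mv "$f" "${f%_}"    # strip the trailing underscore
done < <(find . -type f -name '*_' -print0)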
Although not a pure bash approach, you can use the Perl-based rename (written by Larry Wall, the person behind Perl). It is not part of the default environment on every Linux distribution; some ship only the much simpler rename from util-linux (rename.ul).
You use rename with:
rename perlexpr files
(some flags omitted).
So you could use:
rename 's/_$//' *
to remove a trailing underscore from each file name that has one.
As @hek2mgl points out, there are multiple rename commands (see here), so first test if you have picked the right one.

change file names in a directory with a certain pattern at the beginning

I want to remove the numbers from the file names in one directory:
file names:
89_Ajohn_text_phones
3_jpegs_directory
..
What I would like to have
Ajohn_text_phones
jpegs_directory
I tried:
rename 's/^\([0-9]|[0-9][0-9]\)_//g' *
but it did not work.
There are two rename tools. The one you appear to be trying to use is based on Perl, and as such uses Perl-style regular expressions. The escaping rules are a little different from sed; in particular, parentheses for grouping aren't escaped (escaped parentheses are literal parentheses). Use
rename 's/^([0-9]|[0-9][0-9])_//' *
or, somewhat more concisely,
rename 's/^[0-9]{1,2}_//' *
This rename is the default on Debian-based distributions such as Ubuntu.
The other rename tool is from util-linux, and it is a very simple tool that does not work with regexes and cannot handle this particular requirement. It is the default on, for example, Fedora-based distributions, and what's worse, those often don't even package the Perl rename. You can find the Perl rename here on CPAN and put it in /usr/local/bin if you want, but otherwise, your simplest option is probably a shell loop with mv and sed:
for f in *; do
# set dest to the target file name
# Note: using sed's extended regex syntax (-r) because it is nice.
# Much less escaping of metacharacters is needed. Note that
# sed on FreeBSD and Mac OS X uses -E instead of -r, so there
# you'd have to change that or use regular syntax, in which
# the regex would have to be written as ^[0-9]\{1,2\}_
dest="$(echo "$f" | sed -r 's/^[0-9]{1,2}_//')"
if [ "$f" = "$dest" ]; then
# $f did not match pattern; do nothing
true;
elif [ -e "$dest" ]; then
# avoid losing files.
echo "$dest already exists!"
else
mv "$f" "$dest"
fi
done
You could put this into a shell script, say rename.sh:
#!/bin/sh
transform="$1"
shift
for f in "$#"; do
dest="$(echo "$f" | sed -r "$transform")"
if [ "$f" = "$dest" ]; then
## do nothing
true;
elif [ -e "$dest" ]; then
echo "$dest already exists!"
else
mv "$f" "$dest"
fi
done
and call rename.sh 's/^[0-9]{1,2}_//' *.
One caveat remains: in
dest="$(echo "$f" | sed -r "$transform")"
there is a possibility that "$f" could be something that echo considers a command line option, such as -n or -e. I do not know a way to solve this portably (if you read this and know one, please leave a comment), but if you are in a position where tethering yourself to bash is not a problem, you can use bash's here strings instead:
dest="$(sed -r "$transform" <<< "$f")"
(and replace the #!/bin/sh shebang with #!/bin/bash). Such files are rare, though, so as timebombs go, this one is likely to be a dud.
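For what it's worth, printf offers a portable way around the echo problem here, since operands after its format string are never treated as options:
dest="$(printf '%s\n' "$f" | sed -r "$transform")"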
#!/bin/bash
for f in *
do
mv "$f" "${f#*_}"
done
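As written, that loop strips the leading component from every name containing an underscore (a hypothetical Ajohn_text_phones would become text_phones); if only the numbered files should be touched, the glob can be tightened:
#!/bin/bash
for f in [0-9]*_*
do
    mv "$f" "${f#*_}"
done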

How can I grep contents of files with bash only without using find or grep -r?

I have an assignment to write a bash program which if I type in the following:
-bash-4.1$ ./sample.sh path regex keyword
that will result something like that:
path/sample.txt:12
path/sample.txt:34
path/dir/sample1.txt:56
path/dir/sample2.txt:78
The numbers are the line number of the search results. I have absolutely no idea how can I achieve this in bash, without using find or grep -r. I am allowed to use grep, sed, awk, …
Break the problem into parts.
First, you need to obtain the file names to search in. How can you list the files in a directory and its subdirectories? (Hint: there's a glob pattern for that.)
You need to iterate over the files. What form of loop should this be?
For each file, you need to read each line from the file in turn. There's a builtin for that.
For each line, you need to test whether the line matches the specified regexp. There's a construct for that.
You need to maintain a counter of the number of lines read in a file to be able to print the line number.
Search for globstar in the bash manual.
See https://unix.stackexchange.com/questions/18886/why-is-while-ifs-read-used-so-often-instead-of-ifs-while-read/18936#18936 regarding while read loops.
shopt -s globstar # to enable **/
GLOBIGNORE=.:.. # to match dot files
dir=$1; regex=$2
for file in "$dir"/**/*; do
[[ -f $file ]] || continue
n=1
while IFS= read -r line; do
if [[ $line =~ $regex ]]; then
echo "$file:$n"
fi
((++n))
done <"$file"
done
It's possible that your teacher didn't intend you to use the globstar feature, which is a relatively recent addition to bash (appeared in version 4.0). If so, you'll need to write a recursive function to recurse into subdirectories.
traverse_directory () {
for x in "$1"/*; do
if [ -d "$x" ]; then
traverse_directory "$x"
elif [ -f "$x" ]; then
grep "$regexp" "$x"
fi
done
}
Putting this into practice:
#!/bin/sh
regexp="$2"
# definition of traverse_directory from above goes here
traverse_directory "$1"
Follow-up exercise: the glob pattern * omits files whose name begins with a . (dot files). You can easily match dot files as well by adding looping over .* as well, i.e. for x in .* *; do …. However, this throws the function into an infinite loop as it recurses forever into . (and also ..). How can you change the function to work with dot files as well?
while read -r
do
[[ $REPLY =~ foo ]] && echo "$REPLY"
done < file.txt
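To produce the file:line format the assignment asks for, the same loop can carry a counter (a sketch; file.txt stands in for a real path):
n=1
while read -r
do
    [[ $REPLY =~ foo ]] && printf '%s:%d\n' file.txt "$n"
    ((n++))
done < file.txt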
