Finding files in list using bash array loop - bash

I'm trying to write a script that reads a file with filenames, and outputs whether or not those files were found in a directory.
Logically I'm thinking it goes like this:
$filelist = prompt for file with filenames
$directory = prompt for directory path where find command is performed
new Array[] = read $filelist line by line
for i, i < numberoflines, i++
if find Array[i] in $directory is false
echo "$i not found"
export to result.txt
I've been having a hard time getting Bash to do this, any ideas?

First, I would just assume that all the file-names are supplied on standard input. E.g., if the file names.txt contains the file-names and script.sh is the script, you can invoke it like
cat names.txt | ./script.sh
to obtain the desired behaviour (i.e., using the file-names from names.txt).
Second, inside script.sh you can loop as follows over all lines of the standard input
while IFS= read -r line
do
... # do your checks on $line here
done
Edit: I adapted my answer to use standard input instead of command line arguments, due to the problem indicated by @rici.
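Putting the two pieces together, a minimal sketch of what script.sh could look like (the directory argument, the result.txt output and GNU find's -print -quit are assumptions filled in from the question, not part of the answer above):
#!/bin/bash
# script.sh - read file names from standard input and report the ones
# that are not found anywhere under the directory given as $1
dir=${1:-.}                    # directory to search; defaults to the current directory
while IFS= read -r name; do
  # -print -quit makes find stop at the first match (GNU find)
  if [ -z "$(find "$dir" -type f -name "$name" -print -quit)" ]; then
    echo "$name not found"
  fi
done > result.txt
Invoked as cat names.txt | ./script.sh /some/directory, the misses end up in result.txt.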

while IFS= read -r dirname
do
echo "$dirname" >> result.txt
while IFS= read -r filename
do
find "$dirname" -type f -name "$filename" >> result.txt
done <filenames.txt
done <dirnames.txt

Related

Identifying folder with name as largest number in the directory

There is a directory that contains folders named with numbers, and I have to find the folder with the largest number in that directory.
This is the script I've written to find that folder:
files='ls path/'
var=0
for file in $files
do
echo $file
tmp=$((file-"0"))
if [ $tmp -gt $var ]
then
var=$tmp
fi
done
echo $var
But it's not working. It gives the error below when the script is invoked with sudo ./restore2.sh.
ls
path/
./restore2.sh: line 6: path/: syntax error: operand expected (error token is "/")
0
Try this:
#!/bin/bash
files=`ls path/`
var=0
for file in $files
do
echo $file
tmp=$((file-"0"))
if [ $tmp -gt $var ]
then
var=$tmp
fi
done
echo $var
Note the backticks around ls path/ instead of single or double quotes.
I've only corrected that one statement and it worked. Also notice the #!/bin/bash added at the top of the script; this tells your system to run the script in a bash shell.
You're using single quotes instead of backticks in files='ls path/', so it's treated as a literal string instead of being evaluated as a command.
Also, for that specific task, you can just do:
ls path/ | awk '{if($1 > largest){largest = $1}} END{print largest}'
To have it a bit simpler.
Use find instead:
find . -maxdepth 1 -type d -regextype "posix-extended" -regex "^.*[[:digit:]]+.*$" | sort -n | tail -1
Set maxdepth to 1 to check for directories within this directory only and no deeper. Set the regular expression type to posix-extended and search for all directories that have one or more digits. Sort the results numerically with sort -n before taking the largest one with tail -1.
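One caveat: sort -n sees the leading ./ that find prints, so the numeric sort may not behave as intended. A possible variation (assuming GNU find, and directory names made up of digits only) prints just the bare names:
find . -maxdepth 1 -type d -regextype posix-extended -regex '\./[0-9]+' -printf '%f\n' | sort -n | tail -1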
Does path/ have any files in it? It looks like it's empty.
You should be getting a completely different complaint...
You don't want the path info in the filename. Rather than strip it with ${file##*/}, just go there and use non-path'd names.
An adaptation using your own logic as its base -
cd /whatever/path/ # go where the files are
var=-1 # initialize comparator
for file in [0-9]* # each entry that starts with a digit
do [[ "$file" =~ [^0-9] ]] && continue # skip any file with nondigit contents
[[ -f "$file" ]] || continue # only process plain files
(( file > var )) && var=$file # remember largest seen
done
echo $var # report largest
If you are sure there will be no negative numbered filenames, this should do it.
If there can be valid negatives, then your initialization needs to be appropriately lower, and the exclusion of nondigits should include the minus sign, as well as the list of files to select.
Note that this doesn't parse ls and doesn't require piping through a sort or spawning any other processes -- it's all handled in the bash interpreter and should be pretty efficient.
If you are sure of your data, and know there aren't any negatives or files named just 0 or non-plain-file entries in the directory that match the [0-9]* pattern, you can simplify it to just
cd /whatever/path/ # go where the files are
for file in [0-9]*; do (( file > var )) && var=$file; done
echo $var # report largest
As an aside, if you wanted to preserve the "make a list first" logic, you should still NOT use ls. Use an array.
cd /wherever/your/files/are/
files=( [0-9]* )
for file in "${files[@]}"
do : ...
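A filled-in version of that sketch, reusing the same largest-number logic (purely illustrative; assumes non-negative numeric names without leading zeros, as in the answer above):
cd /wherever/your/files/are/ || exit
files=( [0-9]* )                          # build the list once, without ls
var=-1
for file in "${files[@]}"
do
  [[ "$file" =~ ^[0-9]+$ ]] || continue   # skip anything that is not all digits
  (( file > var )) && var=$file
done
echo "$var"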

What does each line of this bash script do?

I found an old past paper question with content not covered in my course. I hope I don't get examined on that, but what does this bash script do? I know grep takes user input and outputs the lines containing that input, echo just repeats its input, and cat just displays a file's contents, but I have no idea what this does as a whole. Any help please?
#!/bin/bash
outputFile=$1
for file in $(find -name '*txt' | grep data)
do
echo $file >> $outputFile
cat $file >> $outputFile
done
Each line:
#!/bin/bash
Hash-bang the script to use bash
outputFile=$1
Set the variable named "outputFile" to the first parameter passed into the script. Running the script would look like bash myScript.sh "/some/file/to/output.txt"
for file in $(find -name '*txt' | grep data)
do
Loop through every file in this directory and its subdirectories, looking for files whose names end with the characters "txt" and contain the characters "data" somewhere in the name. For each iteration of the for loop (each file found), set the file name to the variable "file".
echo $file >> $outputFile
Echo out/print the file name stored in the variable "file" to the outputFile
cat $file >> $outputFile
Take the contents of the file and stick it in the outputFile.
done
End the For Loop
There are some issues with this script though. If $outputFile or $file has a space in its name or path, it will fail. It's good practice to put double quotes around variables, like:
cat "$file" >> "$outputFile"
#!/bin/bash
The shebang. If this script is executable and invoked directly, as in ./this_script, or found in the PATH, it will be invoked with /bin/bash.
outputFile=$1
Assign the first argument to the name outputFile.
find -name '*txt'
Recursively list all files with a name ending in "txt". It would be more standard to include the path and write this as find . -name '*.txt'.
… | grep data
Filter the previous list of file names. Only list those containing the string "data" in their names. This pipe could be eliminated by writing find . -name '*data*txt'.
for file in $(find -name '*txt' | grep data)
For every word in the output of the find | grep pipeline, assign that word to the name file and run the loop. This can break down if any of the found names have whitespace or glob characters in them. It would be better to use find's native -exec flag to handle this.
echo $file >> $outputFile
Append the expansion of the variable "file" to a new or existing file at the path found by expanding $outputFile. If the former expansion starts with a dash, it could cause echo to treat it as an argument. If the latter expansion has whitespace or a glob character in it, this may cause an "ambiguous redirect" error. It would be better to quote the expansions, and use printf to avoid the argument edge-case to echo, as in printf '%s\n' "$file" >> "$outputFile".
cat $file >> $outputFile
Append the contents of the file found at the expansion of the variable "file" to the path found by expanding $outputFile, or cause another ambiguous redirect error. It would be better to quote the expansions, like cat "$file" >> "$outputFile".
Assuming that none of the aforementioned expansion edge-cases were expected, it would be better to write this entire script like this:
find . -name '*data*txt' -print -exec cat {} \; >> "$1"

bash scripting and conditional statements

I am trying to run a simple bash script but I am struggling with how to incorporate a condition. Any pointers? I would like to incorporate a condition such that when gdalinfo cannot open the image, it copies that particular file to another location. The loop
for file in `cat path.txt`; do gdalinfo $file;done
works fine in opening the images and also shows which ones cannot be opened.
The code that doesn't work is:
for file in `cat path.txt`; do gdalinfo $file && echo $file; else cp $file /data/temp
Again, and again, and again - zillionth again...
Don't use constructions like
for file in `cat path.txt`
or
for file in `find .....`
for file in `any command that produces filenames`
Because the code will BREAK immediately when the filename or path contains a space. Never use it for any command that produces filenames. Bad practice. Very bad. It is incorrect, mistaken, erroneous, inaccurate, inexact, imprecise, faulty, WRONG.
The correct form is:
for file in some/* # if you want to/can use filenames directly from the filesystem
or
find . -print0 | while IFS= read -r -d '' file
or (if you are sure that no filename contains a newline) you can use
cat path.txt | while read -r file
but here the cat is useless (really - a command that only copies a file to STDOUT is useless). You should instead use
while read -r file
do
#whatever
done < path.txt
It is also faster (it doesn't fork a new process, as happens with every pipe).
The above while loops will put the correct filename into the variable file even when the filename contains a space. The for will not. Period. Uff. Omg.
And use "$variable_with_filename" instead of pure $variable_with_filename for the same reason. If the filename contains a white-space any command will misunderstand it as two filenames. This probably not, what you want too..
So, enclose in double quotes any shell variable that contains a filename (not only filenames, but anything that can contain a space). "$variable" is correct.
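A tiny illustration of what goes wrong without the quotes (the filename here is made up):
file="My Photo.tif"
gdalinfo $file       # word splitting: gdalinfo gets two arguments, "My" and "Photo.tif"
gdalinfo "$file"     # quoted: gdalinfo gets one argument, "My Photo.tif"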
If I understand correctly, you want to copy files to /data/temp when gdalinfo returns an error.
while read -r file
do
gdalinfo "$file" || cp "$file" /data/temp
done < path.txt
Nice, short and safe (at least if your path.txt really contains one filename per line).
And maybe you want to use your script multiple times, so don't hard-code the filename inside it, but save the script in this form:
while read -r file
do
gdalinfo "$file" || cp "$file" /data/temp
done
and use it like:
mygdalinfo < path.txt
more universal...
and maybe you only want to show the filenames for which gdalinfo returns an error:
while read -r file
do
gdalinfo "$file" || printf "$file\n"
done
and if you change the printf '%s\n' "$file" to printf '%s\0' "$file" (and silence gdalinfo's own output, so only the filenames reach the pipe) you can use the script in a pipe safely, so:
while read -r file
do
gdalinfo "$file" || printf "$file\0"
done
and use it for example as:
mygdalinfo < path.txt | xargs -0 -J% mv % /tmp/somewhere
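Note that -J here is BSD xargs syntax; with GNU xargs the equivalent placeholder option is -I, for example:
mygdalinfo < path.txt | xargs -0 -I% mv % /tmp/somewhere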
Howgh.
You can say:
for file in `cat path.txt`; do gdalinfo $file || cp $file /data/temp; done
This would copy the file to /data/temp if gdalinfo cannot open the image.
If you want to print the filename in addition to copying it in case of failure, say:
for file in `cat path.txt`; do gdalinfo $file || (echo $file && cp $file /data/temp); done
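Combining this with the safer while read loop from the other answer, a version that both prints and copies the files gdalinfo cannot open might look like this (silencing gdalinfo's own output is an assumption about what you want):
while IFS= read -r file
do
  if ! gdalinfo "$file" > /dev/null 2>&1; then
    echo "$file"
    cp "$file" /data/temp
  fi
done < path.txt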

convert a file path into string

I'm getting an error while trying to replace a string in a directory path with another string:
sed: Error tryning to read from {directory_path}: It's a directory
The shell script
#!/bin/sh
R2K_SOURCE="source/"
R2K_PROCESSED="processed/"
R2K_TEMP_DIR=""
echo " Procesando archivos desde $R2K_SOURCE "
for file in $(find $R2K_SOURCE )
do
if [ -d $file ]
then
R2K_TEMP_DIR=$( sed 's/"$R2K_SOURCE"/"$R2K_PROCESSED"/g' $file )
echo "directorio $R2K_TEMP_DIR"
else
# some code executes
:
fi
done
# find $R2K_PROCCESED -type f -size -200c -delete
I understand that the error is in this line:
R2K_TEMP_DIR=$( sed 's/"$R2K_SOURCE"/"$R2K_PROCESSED"/g' $file )
but I don't know how to tell sh to treat the $file variable as a string and not as a directory.
If you want to replace part of a path name, you can echo the path name and pipe it to sed.
Also, you must allow the variables to expand by placing the sed command in double quotes instead of single quotes, and change the separator of the 's' command, like this:
R2K_TEMP_DIR=$(echo "$file" | sed "s:$R2K_SOURCE:$R2K_PROCESSED:g")
That way the slashes in the values don't clash with the 's' command's delimiter.
Update:
Even better is to remove the useless echo and use a here-string instead:
R2K_TEMP_DIR=$(sed "s:$R2K_SOURCE:$R2K_PROCESSED:g" <<< "$file")
First, don't use:
for item in $(find ...)
because you might overload the command line. Besides, the for loop cannot start until the process in $(...) finishes. Instead:
find ... | while read item
You also need to watch out for funky file names. The for loop will choke on any file name containing spaces. The find | while will keep each name in a single variable, but a plain read still trims leading and trailing whitespace and mangles backslashes. Better:
find ... -print0 | while read -d '' -r item
This will put nulls between file names, and read will break on those nulls. This way, files with spaces, tabs, newlines, or anything else that could cause trouble can be read safely.
Your sed line is:
R2K_TEMP_DIR=$( sed 's/"$R2K_SOURCE"/"$R2K_PROCESSED"/g' $file )
What this is attempting to do is edit your $file, which is a directory. What you want to do is munge the directory name itself. Therefore, you have to echo the name into sed through a pipe, using double quotes so the variables expand and : as the sed delimiter since the values contain slashes:
R2K_TEMP_DIR=$(echo "$file" | sed "s:$R2K_SOURCE:$R2K_PROCESSED:g")
However, you might be better off using environment variable parameters to filter your environment variable.
Basically, you have a directory called source/ and all of the files you're looking for are under that directory. You simply want to change:
source/foo/bar
to
processed/foo/bar
You could do something like this: ${file#source/}. The # says this is a left-side filter, and it will remove the smallest amount of text that matches the glob expression after the #. Check the bash manpage and look under Parameter Expansion.
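For example, with illustrative values:
file="source/foo/bar"
echo "processed/${file#source/}"   # prints processed/foo/bar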
Thus, you could do something like this:
#!/bin/sh
R2K_SOURCE="source/"
R2K_PROCESSED="processed/"
R2K_TEMP_DIR=""
echo " Procesando archivos desde $R2K_SOURCE "
find "$R2K_SOURCE" -print0 | while read -d '' -r file
do
if [ -d "$file" ]
then
R2K_TEMP_DIR="processed/${file#source/}"
echo "directorio $R2K_TEMP_DIR"
else
# some code executes
:
fi
done
R2K_TEMP_DIR="processed/${file#source/}" removes the source/ from the start of $file and you merely prepend processed/ in its place.
Even better, it's way more efficient. In your original script, the $(...) creates another shell process to run your echo, which then pipes out to another process to run sed (assuming you use loentar's solution). With parameter expansion you no longer have any subprocesses running; the whole modification of the directory name happens inside the shell.
By the way, this should also work:
R2K_TEMP_DIR="${R2K_PROCESSED}${file#$R2K_SOURCE}"
I just didn't test that.

bash to print certain file names to text

I have spent a lot of time the past few weeks learning and posting on here. I finally think I am much closer with learning bash, but I have one problem with my code: I cannot for the life of me figure out why it will not run. I can run each line in the terminal and it returns a result, but for some reason when I run the script it does nothing. I get a syntax error: word unexpected (expecting "do").
#!/bin/bash
image="/Home/Desktop/epubs/images"
for f in $(ls "$image"*.jpg); do
fsize=$(stat --printf= '%s' "$f");
if [ "$fsize" -eq "40318" ]; then
echo "$(basename $f)" >> results.txt
fi
done
What am I missing???
The problem might be in line endings. Make sure your script file has unix line endings, not the Windows ones.
Also, do not iterate over output of ls. Use globbing right in the shell:
for f in "$file"/*.jpg ; do
Your for loop appears to be missing a list of values to iterate over:
image="/Home/Desktop/epubs/images"
for f in $(ls "$image"*.jpg); do
Because $image does not end with a /, your ls command expands to
for f in $(ls /Home/Desktop/epubs/images*.jpg); do
which probably results in
for f in ; do
causing the syntax error. The simplest fix is
for f in $(ls "$image"/*.jpg); do
but you should take the advice in the other answers and skip ls:
for f in "$image"/*.jpg; do
Here's how I would do that.
#!/bin/bash -e
image="/Home/Desktop/epubs/images"
(cd "$image"
for f in *.jpg; do
let fsize=$(stat -c %s "$f")
if (( fsize == 40318 )); then
echo "$f"
fi
done) >results.txt
The -e means the script will exit if anything goes wrong (can't cd into the directory, for instance). Saves a lot of error checking when you're happy with that behavior.
The parentheses mean that the cd command is in a subshell; the surrounding script (including the redirection into results.txt) is still in whatever directory you started in.
Now that we're in the directory, we can just look for *.jpg, no directory prefix, and no need to call basename on anything.
Using let and (( == )) treats the size value as a number instead of a string, so we won't get tripped up by any wonkiness in the way stat chooses to format the value.
We just redirect the output of the entire loop into the result file instead of appending every time through; it's more efficient. If you have existing contents in results.txt that you want to keep, you can change the > back to a >>, but putting it around the whole loop is still more efficient than opening the file and appending to it on every iteration.
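If you don't need a loop at all, GNU find can do the size test itself; a possible one-liner under that assumption:
find "$image" -maxdepth 1 -type f -name '*.jpg' -size 40318c -printf '%f\n' > results.txt
Here -size 40318c matches files of exactly 40318 bytes and -printf '%f\n' prints just the base name.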
