Find a file in unix using shell - bash

I am new to Unix and I have an assignment where I'm struggling to find the solution:
Find a given file in a given directory and its subdirectories recursively.
If found, print the full file path along with its size, word count, and last modified time.
I tried using
find . -name "*.xlsx"
and I'm getting some values, but I want to do it in an sh script.
In a shell script I tried:
#!/bin/sh
file="/home/sample"
if[[$(find . -name "*.xlsx") -gt 0]]
then
echo"files are there"
fi
I need to get the values, but instead I'm getting an error. How do I run find in an if statement?
Thanks

The first issue is that you need a space between if and [[ ... ]], and between echo and the string to print; also add a ; after ]].
Second, use -n inside [[ ]] with the find output: it tests for a non-empty string, which makes your if statement and find work together.
if [[ -n $(find . -name "*.txt") ]];
then
    echo "$PWD"/*.txt # This will print the full path of each file.
    wc *.txt | awk '{ print "Word Count:", $2, "File Size:", $3, "Bytes" }' # Prints word count and size
fi
For the word count and file size we can use the wc command: its second column gives the word count and its third column gives the file size in bytes. You can use awk to print just those columns.
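To print everything the assignment asks for (full path, size, word count, last modified time), here is a minimal sketch; it assumes GNU find and GNU date's -r option, with wc supplying the counts:

```shell
#!/bin/sh
# Print full path, size, word count, and last-modified time for every
# .xlsx under the current directory.
find . -type f -name '*.xlsx' | while IFS= read -r f; do
    size=$(wc -c < "$f")                            # bytes
    words=$(wc -w < "$f")                           # word count
    mtime=$(date -r "$f" '+%Y-%m-%d %H:%M:%S')      # last modification time
    printf '%s  size: %s bytes  words: %s  modified: %s\n' \
        "$f" "$size" "$words" "$mtime"
done
```

The while read loop keeps each path intact even if it contains spaces, which a plain for loop over find's output would not.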

Related

grep files against a list only containing numbers

I have several files (~70000) with numbers in their names; a couple of examples would be 991000_Metatissue.qsub.file and 828000_Metatissue.qsub.file. Then I have another file (files_failed.txt) with a bunch of numbers that I want to grep for. The list looks like this:
4578000
458000
4582000
527000
5288000
5733000
653000
6548000
6663000
I have tried with: ls -1 *.qsub.file | grep -F -f files_failed.txt - and even doing this:
ls -1 *.qsub.file > files_to_submit.txt
grep -F -f files_failed.txt files_to_submit.txt
But I always got all the qsub.files...
grep -f scales badly (see GNU bug 16305), so I recommend using awk instead:
find . -name '*_*.qsub.file' | awk -F_ '
NR == FNR { failed[$1] = 1; next }
{ name = $1; sub(/^.*\//, "", name) }
name in failed
' files_failed.txt /dev/stdin
This uses find to locate the files in question, piping them into awk. awk reads files_failed.txt first and stores its values in an associative array (aka dictionary or hash); that happens while NR (the number of records read so far overall) equals FNR (the number of records read so far from the current file), i.e. while the first file is being read. For each line of find's output, the leading path is stripped from the first _-delimited field, leaving the file's number; if that number is in the array, the file was a failure. awk's default action for a bare pattern is to print the record, so you get the list of failed files.
Note that the lookup is a hash lookup, not a regular-expression match! On a big directory this is much faster than grep -F -f …, which itself is much faster than grep -f …, even assuming the aforementioned bug is fixed.
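Here is a toy run of this approach in a scratch directory (the file names are made up):

```shell
# Build a scratch directory with two qsub files, one of which "failed".
tmp=$(mktemp -d) && cd "$tmp"
touch 991000_Metatissue.qsub.file 828000_Metatissue.qsub.file
printf '991000\n' > files_failed.txt

find . -name '*_*.qsub.file' | awk -F_ '
NR == FNR { failed[$1] = 1; next }      # first file: remember failed numbers
{ name = $1; sub(/^.*\//, "", name) }   # strip the leading ./ from the path
name in failed                          # hash lookup; default action prints
' files_failed.txt /dev/stdin
# -> ./991000_Metatissue.qsub.file
```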
You should be using find, and you need to modify your patterns. Here is one way that should work:
# List all files ending in "qsub.file"
find . -name '*.qsub.file' |
# Add ./ and _ to each number to make the match exact
grep -F -f <(sed -e 's:^:./:' -e 's/$/_/' files_failed.txt)
70000 files is too many for ls; you should use find instead.
And I prefer to invert the logic: list just what I want instead of listing everything and then filtering.
Something like
while read line; do find . -iname "${line}_Metatissue.qsub.file"; done < files_failed.txt
If you need the output in another file:
while read line; do find . -iname "${line}_Metatissue.qsub.file"; done < files_failed.txt >> files_to_submit.txt
You can use the below script:
ls -1 *.qsub.file > filelist.txt
while read pattern
do
    filefound=$(grep "$pattern" filelist.txt)
    if [ "$filefound" != "" ]; then
        echo "File Found : $filefound"
    fi
done < files_failed.txt
Second option:
while read pattern
do
    find . -name "${pattern}*.qsub.file" >> filefound.txt
done < files_failed.txt
All the matching files will be stored in filefound.txt.

bash: get a file name with a first line matching a string pattern [duplicate]

This question already has an answer here:
Print names of files with proper head
(1 answer)
Closed 5 years ago.
How to find a file name with the first line matching a certain string?
So far I have come up with
find . -type f -exec grep -l "MAGIC_WORD" {} \;
but that searches for MAGIC_WORD across all lines of the files, while I'm only looking for files with that pattern on the first line.
I probably should use head -1 somehow, but I don't know how to combine it with find -exec and return a proper file name.
This sounds expensive, because you have to read (at least the first line of) every file under the directory.
find . -type f | while IFS= read -r filename
do
    line=$(head -n 1 "$filename")
    if grep -q "MAGIC_WORD" <<< "$line"
    then
        printf '%s\n' "$filename"
    fi
done
The -q option makes grep quiet and only return an exit status: 0 (which means true) if it finds the pattern, a non-zero value (false) if it does not.
What I did was walk the list of all file paths and, for each one, check whether its first line matches the pattern; if it does, print the file path. I don't know if you need only the file name without the path.
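A loop isn't strictly required, though: find can run the first-line test itself via -exec and print only the files where it succeeds. A sketch, assuming a POSIX sh and grep:

```shell
# Print only files whose FIRST line contains MAGIC_WORD.
# -exec runs the head|grep test once per file; -print fires only
# when that test exits 0.
find . -type f \
    -exec sh -c 'head -n 1 "$1" | grep -q "MAGIC_WORD"' sh {} \; \
    -print
```

Because -exec acts as a test in find's expression, no intermediate list of file names is needed, and paths with spaces are handled safely.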

Renames numbered files using names from list in other file

I have a folder where there are books and I have a file with the real name of each file. I renamed them in a way that I can easily see if they are ordered, say "00.pdf", "01.pdf" and so on.
I want to know if there is a way, using the shell, to match each line of the file, say names, with each file. That is, match line i of the file with the book at position i in sort order.
<name-of-the-book-in-the-1-line> -> <book-in-the-1-position>
<name-of-the-book-in-the-2-line> -> <book-in-the-2-position>
.
.
.
<name-of-the-book-in-the-i-line> -> <book-in-the-i-position>
.
.
.
I'm doing this in Windows, using Total Commander, but I want to do it in Ubuntu, so I don't have to reboot.
I know about mv and rename, but I'm not as good as I want with regular expressions...
renamer.sh:
#!/bin/bash
for i in `ls -v | grep -Ev '(renamer.sh|names.txt)'`; do
read name
mv "$i" "$name.pdf"
echo "$i" renamed to "$name.pdf"
done < names.txt
names.txt: (line count must be exactly equal to the numbered-file count)
name of first book
second-great-book
...
explanation:
ls -v returns naturally sorted file list
grep excludes this script name and input file to not be renamed
we cycle through the found file names, read a value from names.txt, and rename each target file to that value
For testing purposes, you can comment out the mv command:
#mv "$i" "$name.pdf"
And now, simply run the script:
bash renamer.sh
This loops through names.txt, creates a filename based on a counter (padded to two digits with printf, assigned to a variable using -v), then renames it with mv. ((++i)) increments the counter for the next filename.
#!/bin/bash
i=0
while IFS= read -r line; do
printf -v fname "%02d.pdf" "$i"
mv "$fname" "$line"
((++i))
done < names.txt
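An alternative sketch: pair the naturally sorted file list with names.txt explicitly using paste, assuming GNU ls -v, that the counts match, and that the titles contain no tab characters:

```shell
# Pair the naturally sorted numbered PDFs with the lines of names.txt,
# then rename each pair. /tmp/filelist.$$ is a scratch file.
ls -v -- *.pdf > /tmp/filelist.$$
paste /tmp/filelist.$$ names.txt |
while IFS="$(printf '\t')" read -r old new; do
    mv -- "$old" "$new.pdf"
done
rm -f /tmp/filelist.$$
```

paste joins the two lists line by line with a tab, so the loop sees the old and new names as two tab-separated fields.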

Using find within a for loop to extract portion of file names as a variable (bash)

I have a number of files with a piece of useful information in their names that I want to extract as a variable and use in a subsequent step. The structure of the file names is samplename_usefulbit_junk. I'm attempting to loop through these files using a predictable portion of the file name (samplename), store the whole name in a variable, and use sed to extract the useful bit. It does not work.
samples="sample1 sample2 sample3"
for i in $samples; do
filename="$(find ./$FILE_DIR -maxdepth 1 -name '$(i)*' -printf '%f\n')"
usefulbit="$(find ./$FILE_DIR -maxdepth 1 -name '$(i)*' -printf '%f\n' | sed 's/.*samplename//g' | sed 's/junk.*//g')"
(More steps using $usefulbit or $(usefulbit) or ${usefulbit} or something)
done
find ./$FILE_DIR -maxdepth 1 -name 'sample1*' -printf '%f\n' and find ./$FILE_DIR -maxdepth 1 -name "sample1*" -printf '%f\n' both work, but no combination of parentheses, curly brackets, or single-, double-, or backquotes has got the loop to work. Where is this going wrong?
Try this:
for file in `ls *_*_*.*`
do
echo "Full file name is: $file"
predictable_portion_filename=${file%%_*}
echo "predictable portion in the filename is: ${predictable_portion_filename}"
echo "---"
done
PS: $variable, ${variable}, "${variable}" and "$variable" are different from $(variable): in the last case, $( ... ) spawns a subshell and treats whatever is inside as a command, i.e. $(variable) makes the subshell try to execute a command named variable.
In place of ls *_*_*.*, you can also use (to list files with that name pattern recursively): ls -1R *_*_*.*
In place of using ${file%%_*} you can also use echo ${file} | cut -d'_' -f1 to get the predictable value. Various other tools work as well (awk, sed, etc.).
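As a quick illustration of these expansions (the file name below is hypothetical):

```shell
# samplename_usefulbit_junk layout, made-up name:
file="sample1_usefulbit_junk.txt"

echo "${file%%_*}"   # strip longest suffix from the first "_"  -> sample1
mid=${file#*_}       # strip shortest prefix through the first "_" -> usefulbit_junk.txt
echo "${mid%%_*}"    # -> usefulbit
```

Combining a prefix strip (#) with a suffix strip (%%) extracts the middle field without spawning any external process.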
Excuse me, I can't do it with bash; may I show you another approach? Here is a shell (lua-shell) I am developing, and a demo as a solution for your case:
wws$ `ls ./demo/2
sample1_red_xx.png sample2_green_xx.png sample3_blue_xx.png
wws$ source ./demo/2.lua
sample1_red_xx.png: red
sample2_green_xx.png: green
sample3_blue_xx.png: blue
wws$
I really want to know your whole plan, unless you need bash as the only tool...
Er, I forgot to paste the script:
samples={"sample1", "sample2", "sample3"}
files = lfs.collect("./demo/2")
function get_filename(prefix)
for i, file in pairs(files) do
if string.match(file.name, prefix) then return file.name end
end
end
for i = 1, #samples do
local filename = get_filename(samples[i])
vim:set(filename)
:f_lvf_hy
print(filename ..": ".. vim:clipboard())
end
The get_filename() function seems a little verbose... I haven't finished the lfs component.
I'm not sure whether answering my own question with my final solution is proper Stack Overflow etiquette, but this is what ultimately worked for me:
for i in directory/*.ext; do
    myfile="$i"
    name="$(echo "$i" | sed 's!.*/!!' | sed 's/_junk.*\.ext//')"
    # ... some other steps ...
done
This way I start with the file name already in a variable and don't have to struggle with find and its strong opinions. It also spares me from having to make a list of sample names.
The first sed removes the directory/ and the second removes the end of the file name and extension, leaving a variable $name that I use as a prefix when generating other files in subsequent steps. So much simpler!
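For what it's worth, the same stripping can be done with parameter expansions alone, avoiding the echo | sed pipeline; the path and _junk suffix below are hypothetical:

```shell
# Made-up path in the directory/samplename_usefulbit_junk.ext layout:
i="directory/sample1_usefulbit_junk.ext"

name=${i##*/}        # drop the leading directory/  -> sample1_usefulbit_junk.ext
name=${name%_junk*}  # drop the trailing _junk...    -> sample1_usefulbit
echo "$name"
```

This stays entirely inside the shell, so it is both faster in a loop and immune to word-splitting surprises.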

Unix for loop skipping one value in script, works properly in command line

I am running the following commands in command line:
for DATAFILE in `find dir_name -type f -mtime +10 | egrep -v -e 'archive/'`
do
echo 'Data file name- ' "$DATAFILE"
echo 'Base name '
BASENAME=`basename "${DATAFILE}"`
DESTFILE="${BASENAME}"_`date +"%Y%m%d%H%M%S"`
echo "Dest file - "$DESTFILE
done
I get the following result for this:
Data file name- DIR_PATH_1/file_1.txt
Base name
Dest file - file_1.txt_20120719041239
Data file name- DIR_PATH_2/file_2.txt
Base name
Dest file - file_2.txt_20120719041239
When I put the same commands in a shell script and execute, I get the following result:
Data file name- DIR_PATH_1/file_1.txt
DIR_PATH_2/file_2.txt
Base name
Dest file - file_2.txt_20120719040956
I have checked the script for Control-M and other junk characters. Also, I don't have any extra steps in the shell script (no parameters and all).
Can someone point me in the right direction?
Thanks.
Update 1:
I made the following change to the loop:
Earlier:
for DATAFILE in `find ${ROOT_DIR} -type f -mtime +$DAYS_ARCH |
egrep -v -e 'archive/'`
Now:
find ${ROOT_DIR} -type f -mtime +$DAYS_ARCH |
egrep -v -e 'archive/' | while read DATAFILE
It seems to be working properly now. I am still testing to confirm this.
Update 2:
Changing from the for loop to the while loop has fixed the issue, but I still don't understand why this was happening. Anyone?
Capturing find's output in backticks and word-splitting it effectively joins all the lines into one whitespace-separated list. It will also break file names with embedded spaces.
for DATAFILE in `find ${ROOT_DIR} -type f -mtime +$DAYS_ARCH |
egrep -v -e 'archive/'`
The while read ... loop will read exactly one pathname at a time, even if they contain white space or other special characters.
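The difference is easy to reproduce: a for loop over a command substitution splits the output on $IFS (spaces, tabs, and newlines), while read -r consumes exactly one line per iteration. A minimal sketch with a space-containing name:

```shell
tmp=$(mktemp -d)
touch "$tmp/plain.txt" "$tmp/with space.txt"

# Word splitting: "with space.txt" is broken into two words.
for f in $(find "$tmp" -type f); do
    echo "for: $f"
done

# Line at a time: each path stays intact.
find "$tmp" -type f | while IFS= read -r f; do
    echo "while: $f"
done
```

The for version prints three mangled entries for two files; the while version prints exactly two intact paths.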
Add to your script:
echo "My shell is: " $0
echo "My IFS is:" $IFS
and compare it with results from interactive shell.
Make sure your script is executed by the shell you want, by adding a hashbang line.
According to man sh, IFS is defined as:
Input Field Separators. This is normally set to ⟨space⟩,
⟨tab⟩, and ⟨newline⟩. See the White Space Splitting section
for more details.
