Select python files based on matching content inside file - bash

I have recently written some code that searches a directory and all of its sub-directories for Python files, excluding the paths I specify.
I now want to filter the results further by selecting only those Python files that contain a given keyword. For example, I want to return all Python files whose content contains the keyword 'url'.
Here is what I have attempted:
find ML*/* -not -path "*/venv/**" -not -path "*/ENV_DIR/**" -print0 | \
while read -d $'\0' file; do if [[ "$file" != */venv && "$file" != */ENV_DIR ]]; \
then echo ~/$file | grep '.py$'; fi; done | while IFS= read -r lines; \
do if [ $lines == 'url' ]; then echo $lines; fi ; done
However, this prints nothing, because the last while loop does not search the file content; it only compares the path of each file against 'url'.
I have attempted the following to read the content (by using gawk):
find ML*/* -not -path "*/venv/**" -not -path "*/ENV_DIR/**" -print0 | \
while read -d $'\0' file; do if [[ "$file" != */venv && "$file" != */ENV_DIR ]]; \
then echo ~/$file | grep '.py$'; fi; done | while IFS= read -r lines; \
do gawk '{ print }' $lines; done
However, this will only print out the first file content.

You can do it with a single command:
find ./ -not -path "*/venv/**" -not -path "*/ENV_DIR/**" -name "*.py" -exec grep -H 'url' {} \;
-not -path excludes the matching paths
-name "*.py" matches only files with the .py extension
-exec executes a command for each file found
grep -H prints each matching line, prefixed with the file name
{} is replaced by the name of the file currently being processed
\; marks the end of the command passed to -exec
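If you only need the names of the matching files rather than the matching lines, a variation along these lines may help. It is a minimal sketch, not the only way to do it: find_py_with_keyword is a hypothetical helper name, and it swaps in -prune (so find never descends into venv or ENV_DIR) and grep -l (print file names only) for the options used above.

```shell
# find_py_with_keyword DIR PATTERN  (hypothetical helper name)
# Lists the .py files under DIR whose content matches PATTERN.
find_py_with_keyword() {
    # -prune skips directories named venv or ENV_DIR entirely;
    # "{} +" passes many files to each grep call instead of one at a time.
    find "$1" \
        -type d \( -name venv -o -name ENV_DIR \) -prune \
        -o -type f -name '*.py' -exec grep -l -e "$2" {} +
}
```

For example, find_py_with_keyword ML1 url would print one path per matching file.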

Related

traverse through folder and do something with specific file types

I'm working on a bash script that should go through a directory and print all files, and if it hits a folder it should call itself and do it again. I believe my problem lies with if [[ $file =~ \.yml?yaml$ ]]; when I remove the tilde it runs, but not correctly: if [[ $file = \.yml?yaml$ ]];
It returns "this a file isn't need -> $file" even though it's a yaml.
#!/bin/bash
print_files_and_dirs() {
for file in $1/*;
do
if [ -f "$file" ];
then
if [[ $file =~ \.yml?yaml$ ]];
then
echo "this is a yaml file! -> $file"
else
echo "this a file isn't need -> $file"
fi
else
print_files_and_dirs $file
fi
done
}
print_files_and_dirs .
Maybe you can use find to find the yaml files and do something with them.
find "$PWD" \
-type f \( -name "*.yaml" -or -name "*.yml" \) \
-exec echo found {} \;
If you only want the file name without the path, you could use -printf (a GNU find action) to print the names and pipe them to xargs.
find "$PWD" \
-type f \( -name "*.yaml" -or -name "*.yml" \) \
-printf '%f\n' \
| xargs -I{} echo found {}
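As for the test in the question: the regex \.yml?yaml$ requires a literal "ym", an optional "l", and then "yaml", so it only matches names ending in .ymyaml or .ymlyaml. Without the tilde, = does glob matching rather than regex matching, which is why that variant misbehaves too. A pattern such as \.ya?ml$ is probably what was intended. A minimal sketch (is_yaml is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# is_yaml NAME  (hypothetical helper name)
# Succeeds when NAME ends in .yml or .yaml.
is_yaml() {
    # \.     a literal dot
    # ya?ml  "yml" or "yaml" (the "a" is optional)
    # $      end of the string
    [[ $1 =~ \.ya?ml$ ]]
}

for f in config.yml data.yaml notes.txt; do
    if is_yaml "$f"; then
        echo "this is a yaml file! -> $f"
    else
        echo "not a yaml file -> $f"
    fi
done
```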

iterate over lines in file then find in directory

I am having trouble looping and searching. It seems that the loop is not waiting for the find to finish. What am I doing wrong?
I made a loop that reads a file line by line. I then want to use each line as a name and search a directory to see whether it contains a folder with that name. If it exists, copy it to a drive.
#!/bin/bash
DIRFIND="$2"
DIRCOPY="$3"
if [ -d $DIRFIND ]; then
while IFS='' read -r line || [[ -n "$line" ]]; do
echo "$line"
FILE=`find "$DIRFIND" -type d -name "$line"`
if [ -n "$FILE" ]; then
echo "Found $FILE"
cp -a "$FILE" "$DIRCOPY"
else
echo "$line not found."
fi
done < "$1"
else
echo "No such file or directory"
fi
Have you tried xargs?
Proposed Solution
cat filenamelist | xargs -I {} find . -type d -name {} -print | xargs -I {} mv {} .
The above pipes the list of names into find, one at a time. When find locates a matching directory it prints the path, which is passed to the second xargs, which moves it into the current directory. (Use cp -a instead of mv if you want to copy, as in the question.)
Expansion
file = yogo
yogo -> | xargs -I yogo find . -type d -name yogo -print | xargs -I {} mv ./<path>/yogo .
I hope the above helps. Note that xargs has the advantage that you do not run out of command-line buffer space.
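If you would rather keep the loop from the question, something like the following may also work. It is only a sketch under a few assumptions: copy_named_dirs is a hypothetical helper name, the list is read on file descriptor 3 so nothing inside the loop can consume it, and find copies each match itself via -exec instead of returning the paths in a variable.

```shell
# copy_named_dirs LIST SRC DEST  (hypothetical helper name)
# For every name in LIST, copy each directory of that name found under SRC into DEST.
copy_named_dirs() {
    list=$1 src=$2 dest=$3
    # "|| [ -n "$name" ]" also handles a last line without a trailing newline.
    while IFS= read -r name <&3 || [ -n "$name" ]; do
        [ -n "$name" ] || continue          # skip empty lines
        # -prune stops find from descending into a directory it has just matched.
        find "$src" -type d -name "$name" -prune -exec cp -a -- {} "$dest"/ \;
    done 3< "$list"
}
```

Because cp runs once per match, several directories with the same name are each handled separately, which a single "$FILE" variable holding all matches cannot do.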

Unexpected Termination of While Loop in Bash

The below code snippet is for searching files recursively and iterating them.
find . -type f -not -name '*.ini' -print0 | while IFS= read -r -d '' filename; do
echo "$filename"
done
It gives this result:
1.jpg
2.jpg
3.jpg
But if I want to process the file somehow like this
find . -type f -not -name '*.ini' -print0 | while IFS= read -r -d '' filename; do
echo "$filename"
echo "$(${ExternalApp} -someparams $filename 2> /dev/null| cut -f 2- -d: | cut -f 2- -d ' ' )"
done
The loop terminates after the first iteration and the result becomes:
1.jpg
I have recently updated bash (I'm on windows with MSYS). What is the problem here?
The command inside the loop inherits the loop's stdin, so it reads (and consumes) the rest of find's output. This is an especially common problem when using ssh, ffmpeg or mplayer.
If the command doesn't need input at all, you can redirect its stdin from /dev/null:
find . -type f -not -name '*.ini' -print0 | while IFS= read -r -d '' filename; do
echo "$filename"
# "< /dev/null" below stops the command from reading the loop's input
echo "$(${ExternalApp} -someparams "$filename" < /dev/null 2> /dev/null |
cut -f 2- -d: | cut -f 2- -d ' ' )"
done
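The effect is easy to reproduce without the external program. In this minimal sketch, process_all is a hypothetical name and cat stands in for any command that reads stdin:

```shell
# process_all  (hypothetical helper name): echo each line read from stdin.
process_all() {
    while IFS= read -r filename; do
        echo "$filename"
        # cat stands in for the external program. Without "< /dev/null" it
        # would swallow the remaining lines and the loop would end after
        # the first iteration.
        cat < /dev/null > /dev/null
    done
}

printf '1.jpg\n2.jpg\n3.jpg\n' | process_all
```

Removing the < /dev/null redirection from the cat line should make the loop print only 1.jpg.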

Bash Script interactive mv issues

I'm working on a bash script to help organize files and I want to use mv -i to make sure I don't write over something important.
The script is working right now except for the -i for the mv.
It shows the (y/n [n]) not overwritten message, but then carries on without letting me interact with it.
createList()
{
ls *.epub | sed 's/-.*//' |uniq >> list.txt
ls *.mobi | sed 's/-.*//' |uniq >> list2.txt
}
atag()
{
find /Users/j/Desktop/Source -maxdepth 1 -iname "*.epub" -type f -print0 | xargs -0 -I '{}' tag -a Purple {}
find /Users/j/Desktop/Source -maxdepth 1 -iname "*.mobi" -type f -print0 | xargs -0 -I '{}' tag -a Purple {}
}
moveEpub()
{
while read -r line; do
if [ -d "/Users/j/Desktop/Dest/$line" ]; then
if [ -d "/Users/j/Desktop/Dest/$line/EPUB" ]; then
find /Users/j/Desktop/Source/ -maxdepth 1 -iname "*$line*" -and ! -iname ".*$line*" -type f -print0 | xargs -0 -I '{}' mv -i {} /Users/j/Desktop/Dest/"$line"/EPUB/
else
mkdir "/Users/j/Desktop/Dest/$line/EPUB"
find /Users/j/Desktop/Source/ -maxdepth 1 -iname "*$line*" -and ! -iname ".*$line*" -type f -print0 | xargs -0 -I '{}' mv -i {} /Users/j/Desktop/Dest/"$line"/EPUB/
fi
fi
done < "list.txt"
}
moveMobi()
{
while read -r line; do
if [ -d "/Users/j/Desktop/Dest/$line" ]; then
if [ -d "/Users/j/Desktop/Dest/$line/MOBI" ]; then
find /Users/j/Desktop/Source/ -maxdepth 1 -iname "*$line*" -and ! -iname ".*$line*" -type f -print0 | xargs -0 -I '{}' mv -i {} /Users/j/Desktop/Dest/"$line"/MOBI/
else
mkdir "/Users/j/Desktop/Dest/$line/MOBI"
find /Users/j/Desktop/Source/ -maxdepth 1 -iname "*$line*" -and ! -iname ".*$line*" -type f -print0 | xargs -0 -I '{}' mv --interactive {} /Users/j/Desktop/Dest/"$line"/MOBI/
fi
fi
done < "list2.txt"
}
clear
createList
atag
moveEpub
moveMobi
rm list.txt
rm list2.txt
If you want mv -i to interact with the terminal, its stdin needs to be attached to that terminal. There are several places here where you're overriding stdin.
For instance:
# THIS LOOP OVERRIDES STDIN
while read -r line
...
done <list.txt
...redirects stdin for the entire duration of the loop, so instead of reading from the user, mv reads from list.txt. To change this, use a different file descriptor:
# This loop uses FD 3 for stdin
while read -r line <&3
...
done 3<list.txt
Another place is in calling xargs. Instead of:
# Overrides stdin for xargs and mv to contain output from find
find ... -print0 | xargs -0 -I '{}' mv -i '{}' "$dest"
...use:
# directly executes mv from find, stdin not modified
find ... -exec mv -i '{}' "$dest" ';'
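To see the file descriptor technique in isolation, here is a small sketch. The name ask_each is hypothetical, and a plain read stands in for mv -i as the command that needs the real stdin:

```shell
# ask_each LIST  (hypothetical helper name)
# Names are read from LIST on file descriptor 3; the answer for each name
# is read from the caller's stdin, which the loop leaves untouched.
ask_each() {
    while IFS= read -r name <&3; do
        IFS= read -r answer || answer=
        printf '%s -> %s\n' "$name" "$answer"
    done 3< "$1"
}
```

Run as ask_each list.txt from a terminal, each answer is typed interactively; if the loop used done < list.txt instead, the inner read would consume lines of the list.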
That said, I would suggest ditching list.txt and list2.txt altogether; you simply don't need them; for that matter, you don't need find either.
dest=/Users/j/Desktop/Dest
source=/Users/j/Desktop/Source
moveEpub() {
local -A finished=( ) # WARNING: This requires bash 4.0 or newer.
for name in *.epub; do
prefix=${name%%-*} # remove everything past the first dash
[[ ${finished[$prefix]} ]] && continue # skip if already done with this prefix
finished[$prefix]=1 # set flag to skip other files w/ this prefix
[[ -d $dest/$prefix ]] || continue # skip if no directory exists for this prefix
mkdir -p "$dest/$prefix/EPUB" # create destination if not existing
mv -i "$source"/*"$prefix"* "$dest/$prefix/EPUB"
done
}
You can use find's built-in -exec action instead of piping to xargs:
find /Users/j/Desktop/Source/ -maxdepth 1 \
-iname "*$line*" -and ! -iname ".*$line*" -type f \
-exec mv -i {} /Users/j/Desktop/Dest/"$line"/EPUB/ \;

How to loop through a directory recursively to delete files with certain extensions

I need to loop through a directory recursively and remove all files with extension .pdf and .doc. I'm managing to loop through a directory recursively but not managing to filter the files with the above mentioned file extensions.
My code so far
#!/bin/sh
SEARCH_FOLDER="/tmp/*"
for f in $SEARCH_FOLDER
do
if [ -d "$f" ]
then
for ff in $f/*
do
echo "Processing $ff"
done
else
echo "Processing file $f"
fi
done
I need help to complete the code, since I'm not getting anywhere.
As a followup to mouviciel's answer, you could also do this as a for loop, instead of using xargs. I often find xargs cumbersome, especially if I need to do something more complicated in each iteration.
for f in $(find /tmp -name '*.pdf' -or -name '*.doc'); do rm $f; done
As a number of people have commented, this will fail if there are spaces in filenames. You can work around this by temporarily setting the IFS (internal field separator) to the newline character. This also fails if there are wildcard characters [?* in the file names. You can work around that by temporarily disabling wildcard expansion (globbing).
IFS=$'\n'; set -f
for f in $(find /tmp -name '*.pdf' -or -name '*.doc'); do rm "$f"; done
unset IFS; set +f
If you have newlines in your filenames, then that won't work either. You're better off with an xargs based solution:
find /tmp \( -name '*.pdf' -or -name '*.doc' \) -print0 | xargs -0 rm
(The escaped parentheses are required here so that -print0 applies to both -or clauses.)
GNU and *BSD find also have a -delete action, which would look like this:
find /tmp \( -name '*.pdf' -or -name '*.doc' \) -delete
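If you prefer a loop because each file needs more than a single command, find's -print0 can feed a while read loop instead of xargs. This is a minimal sketch (delete_docs is a hypothetical helper name); it copes with spaces, wildcard characters and newlines in file names:

```shell
#!/usr/bin/env bash
# delete_docs DIR  (hypothetical helper name)
# Removes every .pdf and .doc file below DIR, one file per loop iteration.
delete_docs() {
    find "$1" -type f \( -name '*.pdf' -o -name '*.doc' \) -print0 |
    while IFS= read -r -d '' f; do
        # Per-file logic (logging, extra checks, ...) can go here.
        rm -- "$f"
    done
}
```

read -d '' is a bash feature; it makes read split on the NUL bytes that -print0 emits.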
find is just made for that.
find /tmp -name '*.pdf' -or -name '*.doc' | xargs rm
Without find:
for f in /tmp/**/* ; do
...
done;
With the globstar option enabled (shopt -s globstar), /tmp/**/* matches the files directly in /tmp as well as the files in all of its subfolders. Without it, ** behaves like a single * and only one level of subfolders is matched.
So for the question the code should look like this:
shopt -s globstar nullglob  # nullglob: a pattern that matches nothing expands to nothing
for f in /tmp/**/*.pdf /tmp/**/*.doc ; do
rm "$f"
done
Note that this requires bash ≥4.0 (or zsh without shopt -s globstar, or ksh with set -o globstar instead of shopt -s globstar). Furthermore, in bash <4.3, this traverses symbolic links to directories as well as directories, which is usually not desirable.
If you want to do something recursively, I suggest you use recursion (yes, you can do it using stacks and so on, but hey).
recursiverm() {
# descend into every subdirectory first
for d in *; do
if [ -d "$d" ]; then
(cd -- "$d" && recursiverm)
fi
done
# then remove the matching files in the current directory
rm -f -- *.pdf *.doc
}
(cd /tmp && recursiverm)
That said, find is probably a better choice as has already been suggested.
Here is an example using shell (bash):
#!/bin/bash
# loop through a folder recursively and print its contents
print_folder_recurse() {
for i in "$1"/*;do
if [ -d "$i" ];then
echo "dir: $i"
print_folder_recurse "$i"
elif [ -f "$i" ]; then
echo "file: $i"
fi
done
}
# try to get the path from the first parameter; fall back to /tmp
if [ -d "$1" ]; then
path=$1
else
path="/tmp"
fi
echo "base path: $path"
print_folder_recurse "$path"
This doesn't answer your question directly, but you can solve your problem with a one-liner:
find /tmp \( -name "*.pdf" -o -name "*.doc" \) -type f -exec rm {} +
Some versions of find (GNU, BSD) have a -delete action which you can use instead of calling rm:
find /tmp \( -name "*.pdf" -o -name "*.doc" \) -type f -delete
For bash (since version 4.0):
shopt -s globstar nullglob dotglob
echo **/*".ext"
That's all.
The trailing extension ".ext" is there to select files (or dirs) with that extension.
Option globstar activates ** (recursive matching).
Option nullglob makes a pattern that matches no file/dir expand to nothing instead of to itself.
Option dotglob includes files that start with a dot (hidden files).
Beware that before bash 4.3, **/ also traverses symbolic links to directories, which is usually not desirable.
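Applied to the question, the same options could be used roughly like this. It is a sketch rather than a drop-in solution: remove_docs_glob is a hypothetical helper name, and it assumes bash 4.0 or newer.

```shell
#!/usr/bin/env bash
# remove_docs_glob DIR  (hypothetical helper name)
# The function body is a subshell ( ... ), so the shopt settings and the cd
# do not leak into the calling shell.
remove_docs_glob() (
    shopt -s globstar nullglob dotglob
    cd -- "$1" || exit 1
    files=( **/*.pdf **/*.doc )
    # With nullglob the array is empty when nothing matches, so rm is skipped.
    if (( ${#files[@]} > 0 )); then
        rm -- "${files[@]}"
    fi
)
```

Collecting the matches in an array first keeps names with spaces intact and avoids calling rm with no arguments.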
This method handles spaces well.
files="$(find -L "$dir" -type f)"
echo "Count: $(echo -n "$files" | wc -l)"
echo "$files" | while IFS= read -r file; do
echo "$file"
done
Edit: this version fixes the off-by-one error in the count:
function count() {
files="$(find -L "$1" -type f)";
if [[ "$files" == "" ]]; then
echo "No files";
return 0;
fi
file_count=$(echo "$files" | wc -l)
echo "Count: $file_count"
echo "$files" | while IFS= read -r file; do
echo "$file"
done
}
This is the simplest way I know to do this (in bash it needs shopt -s globstar extglob):
rm **/@(*.doc|*.pdf)
** makes this work recursively
@(*.doc|*.pdf) matches a file name ending in .pdf OR .doc
It is easy to test safely by replacing rm with ls
The following function recursively iterates through every directory under /home/ubuntu (the whole directory structure beneath it) and applies the necessary check in the else block.
function check {
for file in "$1"/*
do
if [ -d "$file" ]
then
check "$file"
else
## check whether the file starts with the PDF signature
if [ "$(head -c 4 "$file")" = "%PDF" ]; then
rm -- "$file"
fi
fi
done
}
domain=/home/ubuntu
check "$domain"
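There is no reason to pipe the output of find into another utility. find (GNU and BSD) has a -delete action built into it. Note that the -name tests must be grouped in parentheses; otherwise -delete applies only to the '*.doc' test.
find /tmp \( -name '*.pdf' -or -name '*.doc' \) -delete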
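Operator precedence matters here: in find, the implicit "and" between a test and the action that follows it binds more tightly than -or. A safe way to check what a command would act on is to swap -delete for -print. A minimal sketch (both function names are hypothetical):

```shell
# Hypothetical demo helpers; -print is used instead of -delete so nothing is removed.

# Parsed as:  -name '*.pdf'  -o  ( -name '*.doc' -a -print )
list_ungrouped() { find "$1" -name '*.pdf' -o -name '*.doc' -print; }

# Parentheses make the action apply to both name tests.
list_grouped() { find "$1" \( -name '*.pdf' -o -name '*.doc' \) -print; }
```

With an explicit action at the end, the ungrouped form only reports the .doc files; the grouped form reports both kinds.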
The other answers provided will not include files or directories that start with a dot (.). The following worked for me:
#!/bin/sh
getAll()
{
local fl1="$1"/*;
local fl2="$1"/.[!.]*;
local fl3="$1"/..?*;
for inpath in "$1"/* "$1"/.[!.]* "$1"/..?*; do
if [ "$inpath" != "$fl1" -a "$inpath" != "$fl2" -a "$inpath" != "$fl3" ]; then
stat --printf="%F\0%n\0\n" -- "$inpath";
if [ -d "$inpath" ]; then
getAll "$inpath"
#elif [ -f $inpath ]; then
fi;
fi;
done;
}
I think the most straightforward solution is to use recursion. The following example prints the names of all files in the directory and its subdirectories.
You can modify it according to your needs.
#!/bin/bash
printAll() {
for i in "$1"/*;do # for every entry in the directory
if [ -f "$i" ]; then # if it is a regular file
echo "$i" # print the file name
elif [ -d "$i" ];then # if it is a directory
printAll "$i" # call printAll inside it (recursion)
fi
done
}
printAll "$1" # e.g.: ./printAll.sh .
OUTPUT:
> ./printAll.sh .
./demoDir/4
./demoDir/mo st/1
./demoDir/m2/1557/5
./demoDir/Me/nna/7
./TEST
It works fine with spaces as well!
Note:
You can use echo "$(basename "$i")" instead to print the file name without its path.
Or use echo "${i##*/}", which runs much faster because it does not have to call the external basename.
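As a quick illustration of the parameter expansion (name_of is a hypothetical helper name; unlike basename, this sketch does not strip a trailing slash first):

```shell
# name_of PATH  (hypothetical helper name)
# Prints the last component of PATH without starting an external process.
name_of() {
    # ${1##*/} removes the longest prefix that matches "*/".
    printf '%s\n' "${1##*/}"
}

name_of "./demoDir/mo st/1"
```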
Just do
find . -name '*.pdf'|xargs rm
If you can change the shell used to run the command, you can use ZSH to do the job.
#!/usr/bin/zsh
for file in /tmp/**/*
do
echo $file
done
This will recursively loop through all files/folders.
The following loops over the entries of the given directory and lists the contents of each one (it goes one level deep rather than fully recursing):
for d in /home/ubuntu/*;
do
echo "listing contents of dir: $d";
ls -l "$d"/;
done
