Shell Programming File Search and Append - shell

I am trying to write a shell program that will search my current directory (say, my folder containing C code), read all files for the keywords "printf" or "fprintf", and append the include statement to the file if it isn't already done.
I have tried to write the search portion already (for now, all it does is search files and print the list of matching files), but it is not working. Included below is my code. What am I doing wrong?
EDIT: New code.
#!/bin/sh
#processes files ending in .c and appends statements if necessary
#search for files that meet criteria
for file in $( find . -type f )
do
echo $file
if grep -q printf "$file"
then
echo "File $file contains command"
fi
done

To execute commands in a subshell you need $( command ). Notice the $ before the parenthesis.
You don't need to store the list of files in a temporary variable, you can directly use
for file in $( find . ) ; do
echo "$file"
done
And with
find . -type f | grep somestring
you are not searching the file content but the file name (in my example all the files which name contains "somestring")
To grep the content of the files:
for file in $( find . -type f ) ; do
if grep -q printf "$file" ; then
echo "File $file contains printf"
fi
done
Note that if you match printf it will also match fprintf (as it contains printf)
If you want to search just files ending with .c you can use the -name option
find . -name "*.c" -type f
Use the -type f option to list only files.
In any case check if your grep has the -r option to search recursively
grep -r --include "*.c" printf .

You can do this sort of thing with sed -i, but I find that distasteful. Instead, it seems reasonable to use ed (sed is ed for streams, so it makes sense to use ed when you're not working with a stream).
#!/bin/sh
for i in *.c; do
grep -Fq '#include <stdio.h>' $i && continue
grep -Fq printf $i && ed -s $i << EOF > /dev/null
1
i
#include <stdio.h>
.
w
EOF
done

Related

Automator/Apple Script: Move files with same prefix on a new folder. The folder name must be the files prefix

I'm a photographer and I have multiple jpg files of clothings in one folder. The files name structure is:
TYPE_FABRIC_COLOR (Example: BU23W02CA_CNU_RED, BU23W02CA_CNU_BLUE, BU23W23MG_LINO_WHITE)
I have to move files of same TYPE (BU23W02CA) on one folder named as TYPE.
For example:
MAIN FOLDER>
BU23W02CA_CNU_RED.jpg, BU23W02CA_CNU_BLUE.jpg, BU23W23MG_LINO_WHITE.jpg
Became:
MAIN FOLDER>
BU23W02CA_CNU > BU23W02CA_CNU_RED.jpg, BU23W02CA_CNU_BLUE.jpg
BU23W23MG_LINO > BU23W23MG_LINO_WHITE.jpg
Here are some scripts.
V1
#!/bin/bash
find . -maxdepth 1 -type f -name "*.jpg" -print0 | while IFS= read -r -d '' file
do
# Extract the directory name
dirname=$(echo "$file" | cut -d'_' -f1-2 | sed 's#\./\(.*\)#\1#')
#DEBUG echo "$file --> $dirname"
# Create it if not already existing
if [[ ! -d "$dirname" ]]
then
mkdir "$dirname"
fi
# Move the file into it
mv "$file" "$dirname"
done
it assumes all files that the find lists are of the format you described in your question, i.e. TYPE_FABRIC_COLOR.ext.
dirname is the extraction of the first two words delimited by _ in the file name.
since find lists the files with a ./ prefix, it is removed from the dirname as well (that is what the sed command does).
the find specifies the name of the files to consider as *.jpg. You can change this to something else, if you want to restrict which files are considered in the move.
this version loops through each file, creates a directory with it's first two sections (if it does not exists already), and moves the file into it.
if you want to see what the script is doing to each file, you can add option -v to the mv command. I used it to debug.
However, since it loops though each file one by one, this might take time with a large number of files, hence this next version.
V2
#!/bin/bash
while IFS= read -r dirname
do
echo ">$dirname"
# Create it if not already existing
if [[ ! -d "$dirname" ]]
then
mkdir "$dirname"
fi
# Move the file into it
find . -maxdepth 1 -type f -name "${dirname}_*" -exec mv {} "$dirname" \;
done < <(find . -maxdepth 1 -type f -name "*.jpg" -print | sed 's#^\./\(.*\)_\(.*\)_.*\..*$#\1_\2#' | sort | uniq)
this version loops on the directory names instead of on each file.
the last line does the "magic". It finds all files, and extracts the first two words (with sed) right away. Then these words are sorted and "uniqued".
the while loop then creates each directory one by one.
the find inside the while loop moves all files that match the directory being processed into it. Why did I not simply do mv ${dirname}_* ${dirname}? Since the expansion of the * wildcard could result in a too long arguments list for the mv command. Doing it with the find ensures that it will work even on LARGE number of files.
Suggesting oneliner awk script:
echo "$(ls -1 *.jpg)"| awk '{system("mkdir -p "$1 OFS $2);system("mv "$0" "$1 OFS $2)}' FS=_ OFS=_
Explanation:
echo "$(ls -1 *.jpg)": List all jpg files in current directory one file per line
FS=_ : Set awk field separator to _ $1=type $2=fabric $3=color.jpg
OFS=_ : Set awk output field separator to _
awk script explanation
{ # for each file name from list
system ("mkdir -p "$1 OFS $2); # execute "mkdir -p type_fabric"
system ("mv " $0 " " $1 OFS $2); # execute "mv current-file to type_fabric"
}

How to use bash string formatting to reverse date format?

I have a lot of files that are named as: MM-DD-YYYY.pdf. I want to rename them as YYYY-MM-DD.pdf I’m sure there is some bash magic to do this. What is it?
For files in the current directory:
for name in ./??-??-????.pdf; do
if [[ "$name" =~ (.*)/([0-9]{2})-([0-9]{2})-([0-9]{4})\.pdf ]]; then
echo mv "$name" "${BASH_REMATCH[1]}/${BASH_REMATCH[4]}-${BASH_REMATCH[3]}-${BASH_REMATCH[2]}.pdf"
fi
done
Recursively, in or under the current directory:
find . -type f -name '??-??-????.pdf' -exec bash -c '
for name do
if [[ "$name" =~ (.*)/([0-9]{2})-([0-9]{2})-([0-9]{4})\.pdf ]]; then
echo mv "$name" "${BASH_REMATCH[1]}/${BASH_REMATCH[4]}-${BASH_REMATCH[3]}-${BASH_REMATCH[2]}.pdf"
fi
done' bash {} +
Enabling the globstar shell option in bash lets us do the following (will also, like the above solution, handle all files in or below the current directory):
shopt -s globstar
for name in **/??-??-????.pdf; do
if [[ "$name" =~ (.*)/([0-9]{2})-([0-9]{2})-([0-9]{4})\.pdf ]]; then
echo mv "$name" "${BASH_REMATCH[1]}/${BASH_REMATCH[4]}-${BASH_REMATCH[3]}-${BASH_REMATCH[2]}.pdf"
fi
done
All three of these solutions uses a regular expression to pick out the relevant parts of the filenames, and then rearranges these parts into the new name. The only difference between them is how the list of pathnames is generated.
The code prefixes mv with echo for safety. To actually rename files, remove the echo (but run at least once with echo to see that it does what you want).
A direct approach example from the command line:
$ ls
10-01-2018.pdf 11-01-2018.pdf 12-01-2018.pdf
$ ls [0-9]*-[0-9]*-[0-9]*.pdf|sed -r 'p;s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3-\1-\2/'|xargs -n2 mv
$ ls
2018-10-01.pdf 2018-11-01.pdf 2018-12-01.pdf
The ls output is piped to sed , then we use the p flag to print the argument without modifications, in other words, the original name of the file, and s to perform and output the conversion.
The ls + sed result is a combined output that consist of a sequence of old_file_name and new_file_name.
Finally we pipe the resulting feed through xargs to get the effective rename of the files.
From xargs man:
-n number Execute command using as many standard input arguments as possible, up to number arguments maximum.
You can use the following command very close to the one of klashxx:
for f in *.pdf; do echo "$f"; mv "$f" "$(echo "$f" | sed 's#\(..\)-\(..\)-\(....\)#\3-\2-\1#')"; done
before:
ls *.pdf
12-01-1998.pdf 12-03-2018.pdf
after:
ls *.pdf
1998-01-12.pdf 2018-03-12.pdf
Also if you have other pdf files that does not respect this format in your folder, what you can do is to select only the files that respect the format: MM-DD-YYYY.pdf to do so use the following command:
for f in `find . -maxdepth 1 -type f -regextype sed -regex './[0-9]\{2\}-[0-9]\{2\}-[0-9]\{4\}.pdf' | xargs -n1 basename`; do echo "$f"; mv "$f" "$(echo "$f" | sed 's#\(..\)-\(..\)-\(....\)#\3-\2-\1#')"; done
Explanations:
find . -maxdepth 1 -type f -regextype sed -regex './[0-9]\{2\}-[0-9]\{2\}-[0-9]\{4\}.pdf this find command will look only for files in the current working directory that respect your syntax and extract their basename (remove the ./ at the beginning, folders and other type of files that would have the same name are not taken into account, other *.pdf files are also ignored.
for each file you do a move and the resulting file name is computed using sed and back reference to the 3 groups for MM,DD and YYYY
For these simple filenames, using a more verbose pattern, you can simplify the body of the loop a bit:
twodigit=[[:digit:]][[:digit:]]
fourdigit="$twodigit$twodigit"
for f in $twodigit-$twodigit-$fourdigit.pdf; do
IFS=- read month day year <<< "${f%.pdf}"
mv "$f" "$year-$month-$day.pdf"
done
This is basically #Kusalananda's answer, but without the verbosity of regular-expression matching.

Making a file out of all the files with given a string

Create a file that includes the content of all the files in the current folder that has a given string (in say argument 1), the data will be in it one after the other (each file appended to the end). The name of the file will be the given string.
I thought of the following but it doesn't work:
grep $1 * >> fnames #places all the names of the right files in a file
for x in fnames
do
cat x >> $1 #concat the files from the list
done
rm fnames
On the same note, is there a site that has solved exercises like this or examples?
You can do something like this using process substitution:
shopt -s nullglob
while read -r file; do
cat "$file"
done < <(grep -l "search-pattern" *) > /path/to/newfile
This is assuming your directory only has files and no sub-directories.
You will need to use find with grep if there are sub-directories as well:
find . -maxdepth 1 -type f -exec grep -q "search-pattern" {} \; -print0 |
xargs -0 cat > /path/to/newfile
How about (assuming you aren't worried about files with spaces or newlines or shell globs/etc. in their names since those will not work here correctly):
for O in $(grep -l $1 *)
do
cat "$O" >> $1
done

find command with filename coming from bash printf builtin not working

I'm trying to do a script which lists files on a directory and then searchs one by one every file in other directory. For dealing with spaces and special characters like "[" or "]" I'm using $(printf %q "$FILENAME") as input for the find command: find /directory/to/search -type f -name $(printf %q "$FILENAME").
It works like a charm for every filename except in one case: when there's multibyte characters (UTF-8). In that case the output of printf is an external quoted string, i.e.: $'file name with blank spaces and quoted characters in the form of \NNN\NNN', and that string is not being expanded without the $'' quoting, so find searchs for a file with a name including that quote: «$'filename'».
Is there an alternative solution in order to be able to pass to find any kind of filename?
My script is like follows (I know some lines can be deleted, like the "RESNAME="):
#!/bin/bash
if [ -d $1 ] && [ -d $2 ]; then
IFSS=$IFS
IFS=$'\n'
FILES=$(find $1 -type f )
for FILE in $FILES; do
BASEFILE=$(printf '%q' "$(basename "$FILE")")
RES=$(find $2 -type f -name "$BASEFILE" -print )
if [ ${#RES} -gt 1 ]; then
RESNAME=$(printf '%q' "$(basename "$RES")")
else
RESNAME=
fi
if [ "$RESNAME" != "$BASEFILE" ]; then
echo "FILE NOT FOUND: $FILE"
fi
done
else
echo "Directories do not exist"
fi
IFS=$IFSS
As an answer said, I've used associative arrays, but with no luck, maybe I'm not using correctly the arrays, but echoing it (array[#]) returns nothing. This is the script I've written:
#!/bin/bash
if [ -d "$1" ] && [ -d "$2" ]; then
declare -A files
find "$2" -type f -print0 | while read -r -d $'\0' FILE;
do
BN2="$(basename "$FILE")"
files["$BN2"]="$BN2"
done
echo "${files[#]}"
find "$1" -type f -print0 | while read -r -d $'\0' FILE;
do
BN1="$(basename "$FILE")"
if [ "${files["$BN1"]}" != "$BN1" ]; then
echo "File not found: "$BN1""
fi
done
fi
Don't use for loops. First, it is slower. Your find has to complete before the rest of your program can run. Second, it is possible to overload the command line. The enter for command must fit in the command line buffer.
Most importantly of all, for sucks at handling funky file names. You're running conniptions trying to get around this. However:
find $1 -type f -print0 | while read -r -d $'\0' FILE
will work much better. It handles file names -- even file names that contain \n characters. The -print0 tells find to separate file names with the NUL character. The while read -r -d $'\0 FILE will read each file name (separate by the NUL character) into $FILE.
If you put quotes around the file name in the find command, you don't have to worry about special characters in the file names.
Your script is running find once for each file found. If you have 100 files in your first directory, you're running find 100 times.
Do you know about associative (hash) arrays in BASH? You are probably better off using associative arrays. Run find on the first directory, and store those files names in an associative array.
Then, run find (again using the find | while read syntax) for your second directory. For each file you find in the second directory, see if you have a matching entry in your associative array. If you do, you know that file is in both arrays.
Addendum
I've been looking at the find command. It appears there's no real way to prevent it from using pattern matching except through a lot of work (like you were doing with printf. I've tried using the -regex matching and using \Q and \E to remove the special meaning of pattern characters. I haven't been successful.
There comes a time that you need something a bit more powerful and flexible than shell to implement your script, and I believe this is the time.
Perl, Python, and Ruby are three fairly ubiquitous scripting languages found on almost all Unix systems and are available on other non-POSIX platforms (cough! ...Windows!... cough!).
Below is a Perl script that takes two directories, and searches them for matching files. It uses the find command once and uses associative arrays (called hashes in Perl). I key the hash to the name of my file. In the value portion of the hash, I store an array of the directories where I found this file.
I only need to run the find command once per directory. Once that is done, I can print out all the entries in the hash that contain more than one directory.
I know it's not shell, but this is one of the cases where you can spend a lot more time trying to figure out how to get shell to do what you want than its worth.
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use File::Find;
use constant DIRECTORIES => qw( dir1 dir2 );
my %files;
#
# Perl version of the find command. You give it a list of
# directories and a subroutine for filtering what you find.
# I am basically rejecting all non-file entires, then pushing
# them into my %files hash as an array.
#
find (
sub {
return unless -f;
$files{$_} = [] if not exists $files{$_};
push #{ $files{$_} }, $File::Find::dir;
}, DIRECTORIES
);
#
# All files are found and in %files hash. I can then go
# through all the entries in my hash, and look for ones
# with more than one directory in the array reference.
# IF there is more than one, the file is located in multiple
# directories, and I print them.
#
for my $file ( sort keys %files ) {
if ( #{ $files{$file} } > 1 ) {
say "File: $file: " . join ", ", #{ $files{$file} };
}
}
Try something like this:
find "$DIR1" -printf "%f\0" | xargs -0 -i find "$DIR2" -name \{\}
How about this one-liner?
find dir1 -type f -exec bash -c 'read < <(find dir2 -name "${1##*/}" -type f)' _ {} \; -printf "File %f is in dir2\n" -o -printf "File %f is not in dir2\n"
Absolutely 100% safe regarding files with funny symbols, newlines and spaces in their name.
How does it work?
find (the main one) will scan through directory dir1 and for each file (-type f) will execute
read < <(find dir2 -name "${1##*/} -type f")
with argument the name of the current file given by the main find. This argument is at position $1. The ${1##*/} removes everything before the last / so that if $1 is path/to/found/file the find statement is:
find dir2 -name "file" -type f
This outputs something if file is found, otherwise has no output. That's what is read by the read bash command. read's exit status is true if it was able to read something, and false if there wasn't anything read (i.e., in case nothing is found). This exit status becomes bash's exit status which becomes -exec's status. If true, the next -printf statement is executed, and if false, the -o -printf part will be executed.
If your dirs are given in variables $dir1 and $dir2 do this, so as to be safe regarding spaces and funny symbols that could occur in $dir2:
find "$dir1" -type f -exec bash -c 'read < <(find "$0" -name "${1##*/}" -type f)' "$dir2" {} \; -printf "File %f is in $dir2\n" -o -printf "File %f is not in $dir2\n"
Regarding efficiency: this is of course not an efficient method at all! the inner find will be executed as many times as there are found files in dir1. This is terrible, especially if the directory tree under dir2 is deep and has many branches (you can rely a little bit on caching, but there are limits!).
Regarding usability: you have fine-grained control on how both find's work and on the output, and it's very easy to add many more tests.
So, hey, tell me how to compare files from two directories? Well, if you agree on loosing a little bit of control, this will be the shortest and most efficient answer:
diff dir1 dir2
Try it, you'll be amazed!
Since you are only using find for its recursive directory following, it will be easier to simply use the globstar option in bash. (You're using associative arrays, so your bash is new enough).
#!/bin/bash
shopt -s globstar
declare -A files
if [[ -d $1 && -d $2 ]]; then
for f in "$2"/**/*; do
[[ -f "$f" ]] || continue
BN2=$(basename "$f")
files["$BN2"]=$BN2
done
echo "${files[#]}"
for f in "$1"/**/*; do
[[ -f "$f" ]] || continue
BN1=$(basename $f)
if [[ ${files[$BN1]} != $BN1 ]]; then
echo "File not found: $BN1"
fi
done
fi
** will match zero or more directories, so $1/**/* will match all the files and directories in $1, all the files and directories in those directories, and so forth all the way down the tree.
If you want to use associative arrays, here's one possibility that will work well with files with all sorts of funny symbols in their names (this script has too much to just show the point, but it is usable as is – just remove the parts you don't want and adapt to your needs):
#!/bin/bash
die() {
printf "%s\n" "$#"
exit 1
}
[[ -n $1 ]] || die "Must give two arguments (none found)"
[[ -n $2 ]] || die "Must give two arguments (only one given)"
dir1=$1
dir2=$2
[[ -d $dir1 ]] || die "$dir1 is not a directory"
[[ -d $dir2 ]] || die "$dir2 is not a directory"
declare -A dir1files
declare -A dir2files
while IFS=$'\0' read -r -d '' file; do
dir1files[${file##*/}]=1
done < <(find "$dir1" -type f -print0)
while IFS=$'\0' read -r -d '' file; do
dir2files[${file##*/}]=1
done < <(find "$dir2" -type f -print0)
# Which files in dir1 are in dir2?
for i in "${!dir1files[#]}"; do
if [[ -n ${dir2files[$i]} ]]; then
printf "File %s is both in %s and in %s\n" "$i" "$dir1" "$dir2"
# Remove it from dir2 has
unset dir2files["$i"]
else
printf "File %s is in %s but not in %s\n" "$i" "$dir1" "$dir2"
fi
done
# Which files in dir2 are not in dir1?
# Since I unset them from dir2files hash table, the only keys remaining
# correspond to files in dir2 but not in dir1
if [[ -n "${!dir2files[#]}" ]]; then
printf "File %s is in %s but not in %s\n" "$dir2" "$dir1" "${!dir2files[#]}"
fi
Remark. The identification of files is only based on their filenames, not their contents.

Suppress output to StdOut when piping echo

I'm making a bash script that crawls through a directory and outputs all files of a certain type into a text file. I've got that working, it just also writes out a bunch of output to console I don't want (the names of the files)
Here's the relevant code so far, tmpFile is the file I'm writing to:
for DIR in `find . -type d` # Find problem directories
do
for FILE in `ls "$DIR"` # Loop through problems in directory
do
if [[ `echo ${FILE} | grep -e prob[0-9]*_` ]]; then
`echo ${FILE} >> ${tmpFile}`
fi
done
done
The files I'm putting into the text file are in the format described by the regex prob[0-9]*_ (something like prob12345_01)
Where I pipe the output from echo ${FILE} into grep, it still outputs to stdout, something I want to avoid. I think it's a simple fix, but it's escaping me.
All this can be done in one single find command. Consider this:
find . -type f -name "prob[0-9]*_*" -exec echo {} >> ${tmpFile} \;
EDIT:
Even simpler: (Thanks to #GlennJackman)
find . -type f -name "prob[0-9]*_*" >> $tmpFile
To answer your specific question, you can pass -q to grep for silent output.
if echo "hello" | grep -q el; then
echo "found"
fi
But since you're already using find, this can be done with just one command:
find . -regex ".*prob[0-9]*_.*" -printf '%f\n' >> ${tmpFile}
find's regex is a match on the whole path, which is why the leading and trailing .* is needed.
The -printf '%f\n' prints the file name without directory, to match what your script is doing.
what you want to do is, read the output of the find command,
for every entry find returned, you want to get all (*) the files under that location
and then you want to check whether that filename matches the pattern you want
if it matches then add it to the tmpfile
while read -r dir; do
for file in "$dir"/*; do # will not match hidden files, unless dotglob is set
if [[ "$file" =~ prob[0-9]*_ ]]; then
echo "$file" >> "$tmpfile"
fi
done < <(find . -type d)
however find can do that alone
anubhava got me there ;)
so look his answer on how that's done

Resources