Making a file out of all the files containing a given string - bash

Create a file that includes the content of all the files in the current folder that contain a given string (say, in argument 1), with the data appearing one file after the other (each file appended to the end). The name of the new file should be the given string.
I thought of the following but it doesn't work:
grep $1 * >> fnames #places all the names of the right files in a file
for x in fnames
do
cat x >> $1 #concat the files from the list
done
rm fnames
On the same note, is there a site that has solved exercises like this or examples?

You can do something like this using process substitution:
shopt -s nullglob
while read -r file; do
    cat "$file"
done < <(grep -l "search-pattern" *) > /path/to/newfile
This is assuming your directory only has files and no sub-directories.
You will need to use find with grep if there are sub-directories as well:
find . -maxdepth 1 -type f -exec grep -q "search-pattern" {} \; -print0 |
xargs -0 cat > /path/to/newfile
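Alternatively, if your grep supports -Z (GNU grep's --null), the loop can be replaced by a shorter NUL-safe pipeline; a minimal sketch:
# -Z terminates each file name printed by -l with a NUL byte, which xargs -0 consumes safely
grep -lZ "search-pattern" * | xargs -0 cat > /path/to/newfile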

How about the following (assuming you aren't worried about files with spaces, newlines, shell globs, etc. in their names, since those will not be handled correctly here):
for O in $(grep -l "$1" *)
do
    cat "$O" >> "$1"
done

Related

Automator/AppleScript: Move files with the same prefix into a new folder. The folder name must be the file prefix

I'm a photographer and I have multiple jpg files of clothing in one folder. The file name structure is:
TYPE_FABRIC_COLOR (Example: BU23W02CA_CNU_RED, BU23W02CA_CNU_BLUE, BU23W23MG_LINO_WHITE)
I have to move files of the same TYPE (BU23W02CA) into one folder named after the TYPE.
For example:
MAIN FOLDER>
BU23W02CA_CNU_RED.jpg, BU23W02CA_CNU_BLUE.jpg, BU23W23MG_LINO_WHITE.jpg
Becomes:
MAIN FOLDER>
BU23W02CA_CNU > BU23W02CA_CNU_RED.jpg, BU23W02CA_CNU_BLUE.jpg
BU23W23MG_LINO > BU23W23MG_LINO_WHITE.jpg
Here are some scripts.
V1
#!/bin/bash
find . -maxdepth 1 -type f -name "*.jpg" -print0 | while IFS= read -r -d '' file
do
    # Extract the directory name
    dirname=$(echo "$file" | cut -d'_' -f1-2 | sed 's#\./\(.*\)#\1#')
    #DEBUG echo "$file --> $dirname"
    # Create it if not already existing
    if [[ ! -d "$dirname" ]]
    then
        mkdir "$dirname"
    fi
    # Move the file into it
    mv "$file" "$dirname"
done
It assumes all files that the find lists are of the format you described in your question, i.e. TYPE_FABRIC_COLOR.ext.
dirname is the extraction of the first two words delimited by _ in the file name.
Since find lists the files with a ./ prefix, it is removed from dirname as well (that is what the sed command does).
The find specifies the name of the files to consider as *.jpg. You can change this to something else if you want to restrict which files are considered in the move.
This version loops through each file, creates a directory from its first two sections (if it does not exist already), and moves the file into it.
If you want to see what the script is doing to each file, you can add the -v option to the mv command. I used it to debug.
However, since it loops through each file one by one, this might take time with a large number of files; hence this next version.
V2
#!/bin/bash
while IFS= read -r dirname
do
    echo ">$dirname"
    # Create it if not already existing
    if [[ ! -d "$dirname" ]]
    then
        mkdir "$dirname"
    fi
    # Move the file into it
    find . -maxdepth 1 -type f -name "${dirname}_*" -exec mv {} "$dirname" \;
done < <(find . -maxdepth 1 -type f -name "*.jpg" -print | sed 's#^\./\(.*\)_\(.*\)_.*\..*$#\1_\2#' | sort | uniq)
This version loops over the directory names instead of over each file.
The last line does the "magic". It finds all files and extracts the first two words (with sed) right away. Then these words are sorted and de-duplicated with uniq.
The while loop then creates each directory one by one.
The find inside the while loop moves all files that match the directory being processed into it. Why did I not simply do mv ${dirname}_* ${dirname}? Because the expansion of the * wildcard could result in an argument list that is too long for the mv command. Doing it with find ensures that it will work even on a LARGE number of files.
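To see the limit in question on your system (the kernel's cap on the combined size of a command's arguments and environment), you can query ARG_MAX:
getconf ARG_MAX    # prints the limit in bytes, e.g. 2097152 on many Linux systems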
Suggesting a one-liner awk script:
echo "$(ls -1 *.jpg)"| awk '{system("mkdir -p "$1 OFS $2);system("mv "$0" "$1 OFS $2)}' FS=_ OFS=_
Explanation:
echo "$(ls -1 *.jpg)": List all jpg files in current directory one file per line
FS=_ : Set awk field separator to _ $1=type $2=fabric $3=color.jpg
OFS=_ : Set awk output field separator to _
awk script explanation
{ # for each file name from the list
    system("mkdir -p "$1 OFS $2);  # execute "mkdir -p type_fabric"
    system("mv "$0" "$1 OFS $2);   # execute "mv current-file type_fabric"
}
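To preview what the script will do without touching any files, a dry-run variant replaces the system() calls with print, so the generated commands are shown instead of executed:
echo "$(ls -1 *.jpg)" | awk '{print "mkdir -p "$1 OFS $2; print "mv "$0" "$1 OFS $2}' FS=_ OFS=_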

Shell Script: How to copy files with specific string from big corpus

I have a small bug and don't know how to solve it. I want to copy files that contain a specific string from a big folder with many files. For this I use grep, ack or (in this example) ag. When I'm inside the folder it matches without a problem, but when I try to do it with a loop over the files in the following script, it doesn't loop over the matches. Here is my script:
ag -l "${SEARCH_QUERY}" "${INPUT_DIR}" | while read -d $'\0' file; do
echo "$file"
cp "${file}" "${OUTPUT_DIR}/${file}"
done
SEARCH_QUERY holds the string I want to find inside the files, INPUT_DIR is the folder where the files are located, and OUTPUT_DIR is the folder the found files should be copied to. Is there something wrong with the while loop?
EDIT:
Thanks for the suggestions! I took this one now, because it also looks for files in subfolders and saves a list with all the files.
ag -l "${SEARCH_QUERY}" "${INPUT_DIR}" > "output_list.txt"
while read file
do
echo "${file##*/}"
cp "${file}" "${OUTPUT_DIR}/${file##*/}"
done < "output_list.txt"
It is better to implement it with a find command, like below:
find "${INPUT_DIR}" -name "*.*" | xargs grep -l "${SEARCH_QUERY}" > /tmp/file_list.txt
while read file
do
echo "$file"
cp "${file}" "${OUTPUT_DIR}/${file}"
done < /tmp/file_list.txt
rm /tmp/file_list.txt
or another option:
grep -l "${SEARCH_QUERY}" "${INPUT_DIR}/*.*" > /tmp/file_list.txt
while read file
do
echo "$file"
cp "${file}" "${OUTPUT_DIR}/${file}"
done < /tmp/file_list.txt
rm /tmp/file_list.txt
If you do not mind doing it in just one line, then:
grep -lr 'ONE\|TWO\|THREE' | xargs -I xxx -P 0 cp xxx dist/
guide:
-l just print the file name and nothing else
-r search recursively through the CWD and all sub-directories
match these words alternatively: 'ONE' or 'TWO' or 'THREE'
| pipe the output of grep to xargs
-I xxx the name of each file is saved in xxx; it is just a placeholder
-P 0 run all the commands (cp) in parallel (as fast as possible)
cp copy each file xxx to the dist directory
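If the matched file names may contain newlines, a NUL-delimited variant of the same one-liner should be safer (assuming GNU grep's -Z and GNU xargs):
# -Z NUL-terminates each file name printed by -l, and -0 tells xargs to read them that way
grep -lrZ 'ONE\|TWO\|THREE' | xargs -0 -I xxx -P 0 cp xxx dist/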
If I understand the behavior of ag correctly, then you have to
adjust the read delimiter to '\n' or
use ag -0 -l to force delimiting by '\0'
to solve the problem in your loop.
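For example, the loop from the question with the delimiter fixed (a sketch; like the edit above, it uses ${file##*/} to strip the directory part so the copy lands directly in OUTPUT_DIR):
ag -0 -l "${SEARCH_QUERY}" "${INPUT_DIR}" | while IFS= read -r -d '' file; do
    echo "$file"
    cp "$file" "${OUTPUT_DIR}/${file##*/}"
done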
Alternatively, you can use the following script, that is based on find instead of ag.
while read file; do
echo "$file"
cp "$file" "$OUTPUT_DIR/$file"
done < <(find "$INPUT_DIR" -name "*$SEARCH_QUERY*" -print)

How to use bash string formatting to reverse date format?

I have a lot of files that are named as: MM-DD-YYYY.pdf. I want to rename them as YYYY-MM-DD.pdf. I’m sure there is some bash magic to do this. What is it?
For files in the current directory:
for name in ./??-??-????.pdf; do
    if [[ "$name" =~ (.*)/([0-9]{2})-([0-9]{2})-([0-9]{4})\.pdf ]]; then
        echo mv "$name" "${BASH_REMATCH[1]}/${BASH_REMATCH[4]}-${BASH_REMATCH[3]}-${BASH_REMATCH[2]}.pdf"
    fi
done
Recursively, in or under the current directory:
find . -type f -name '??-??-????.pdf' -exec bash -c '
    for name do
        if [[ "$name" =~ (.*)/([0-9]{2})-([0-9]{2})-([0-9]{4})\.pdf ]]; then
            echo mv "$name" "${BASH_REMATCH[1]}/${BASH_REMATCH[4]}-${BASH_REMATCH[3]}-${BASH_REMATCH[2]}.pdf"
        fi
    done' bash {} +
Enabling the globstar shell option in bash lets us do the following (will also, like the above solution, handle all files in or below the current directory):
shopt -s globstar
for name in **/??-??-????.pdf; do
    if [[ "$name" =~ (.*)/([0-9]{2})-([0-9]{2})-([0-9]{4})\.pdf ]]; then
        echo mv "$name" "${BASH_REMATCH[1]}/${BASH_REMATCH[4]}-${BASH_REMATCH[3]}-${BASH_REMATCH[2]}.pdf"
    fi
done
All three of these solutions use a regular expression to pick out the relevant parts of the filenames, and then rearrange these parts into the new name. The only difference between them is how the list of pathnames is generated.
The code prefixes mv with echo for safety. To actually rename files, remove the echo (but run at least once with echo to see that it does what you want).
A direct approach example from the command line:
$ ls
10-01-2018.pdf 11-01-2018.pdf 12-01-2018.pdf
$ ls [0-9]*-[0-9]*-[0-9]*.pdf|sed -r 'p;s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3-\1-\2/'|xargs -n2 mv
$ ls
2018-10-01.pdf 2018-11-01.pdf 2018-12-01.pdf
The ls output is piped to sed; then we use the p flag to print the argument without modifications, in other words the original name of the file, and the s command to perform and output the conversion.
The ls + sed result is a combined output that consists of a sequence of old_file_name and new_file_name pairs.
Finally we pipe the resulting feed through xargs to get the effective rename of the files.
From xargs man:
-n number Execute command using as many standard input arguments as possible, up to number arguments maximum.
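To see exactly what xargs receives, you can run the pipeline without the final stage; each original name is immediately followed by its converted name:
$ ls [0-9]*-[0-9]*-[0-9]*.pdf | sed -r 'p;s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3-\1-\2/'
10-01-2018.pdf
2018-10-01.pdf
11-01-2018.pdf
2018-11-01.pdf
12-01-2018.pdf
2018-12-01.pdf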
You can use the following command very close to the one of klashxx:
for f in *.pdf; do echo "$f"; mv "$f" "$(echo "$f" | sed 's#\(..\)-\(..\)-\(....\)#\3-\2-\1#')"; done
before:
ls *.pdf
12-01-1998.pdf 12-03-2018.pdf
after:
ls *.pdf
1998-01-12.pdf 2018-03-12.pdf
Also, if your folder contains other pdf files that do not respect this format, you can select only the files that respect the format MM-DD-YYYY.pdf. To do so, use the following command:
for f in `find . -maxdepth 1 -type f -regextype sed -regex './[0-9]\{2\}-[0-9]\{2\}-[0-9]\{4\}.pdf' | xargs -n1 basename`; do echo "$f"; mv "$f" "$(echo "$f" | sed 's#\(..\)-\(..\)-\(....\)#\3-\2-\1#')"; done
Explanations:
find . -maxdepth 1 -type f -regextype sed -regex './[0-9]\{2\}-[0-9]\{2\}-[0-9]\{4\}.pdf': this find command will look only for files in the current working directory that respect your syntax and extract their basename (the ./ at the beginning is removed; folders and other types of files that would have the same name are not taken into account, and other *.pdf files are also ignored).
For each file you do a move, and the resulting file name is computed using sed and back references to the 3 groups for MM, DD and YYYY.
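You can test the sed substitution on a single name before running the whole loop:
$ echo "12-01-1998.pdf" | sed 's#\(..\)-\(..\)-\(....\)#\3-\2-\1#'
1998-01-12.pdf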
For these simple filenames, using a more verbose pattern, you can simplify the body of the loop a bit:
twodigit=[[:digit:]][[:digit:]]
fourdigit="$twodigit$twodigit"
for f in $twodigit-$twodigit-$fourdigit.pdf; do
    IFS=- read month day year <<< "${f%.pdf}"
    mv "$f" "$year-$month-$day.pdf"
done
This is basically @Kusalananda's answer, but without the verbosity of regular-expression matching.

Search using reference file and print matching lines

I have a folder structure as shown below in ./all_files:
-rwxrwxrwx reference_file.txt
drwxrwxrwx file1.txt
drwxrwxrwx file2.txt
drwxrwxrwx file3.txt
reference_file.txt has filenames as shown below
$cat reference_file.txt
file1.txt
file2.txt
The data in file1.txt and file2.txt is as shown below:
$cat file1.txt
step_1
step_2
step_3
Now, I have to take a particular step, say step_2, from each file.
Note1: the file name must be present in reference_file.txt.
Note2: step_2 is not always on line no. 2.
Note3: the search should be performed recursively.
I have used the script below:
#!/bin/sh
for i in cat reference_file.txt;
do
find . -type f -name $i | grep -v 'FS*' | xargs grep -F 'step_2'
done<reference_file.txt
After using the above code I got no output.
# bash -x script.sh
+ for i in cat reference_file.txt
+ find . -type f -name cat
+ xargs grep -F 'step_2'
+ for i in cat reference_file.txt
+ find . -type f -name reference_file.txt
+ xargs grep -F 'step_2'
Added new requirement:
target=step_XX_2, where XX can be anything and should be skipped in the search, so that the desired output will be: step_ab_2 step_cd_2 step_ef_2
I think this is what you are trying to achieve. Please let me know:
EDIT: my previous version did not search recursively.
Further edits: Note that using process substitution for find means that this script MUST be run under bash and not sh.
Further edit for change in specification: note the change to target and the -E option to grep instead of -F.
#!/bin/bash
target='step_(.*_)?2'
while read -r name
do
    # EDIT: exclude certain directories
    if [[ $name == "old1" || $name == "old2" ]]
    then
        # do the next iteration of the loop
        continue
    fi
    while read -r fname
    do
        if [[ $fname != FS* ]]
        then
            # Display the filename (grep -H is not in POSIX)
            if out=$(grep -E "$target" "$fname")
            then
                echo "$fname: $out"
            fi
        fi
    done < <(find . -type f -name "$name")
done < reference_file.txt
Note that your trace (bash -x) uses bash but your #! line uses sh. They are different; you should be consistent about which shell you are using.
I have dropped the xargs, which reads strings from standard input and executes a program using those strings as arguments. Since we already have the argument strings for grep, we don't need it.
Your grep -v 'FS*' probably doesn't do what you expect. The regular expression FS* means "F followed by zero or more S's", which is not the same as shell pattern matching (globbing). In my solution I have used FS* as a shell pattern, because I am matching it in the shell, not in grep.
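A quick demonstration of the difference: since the regular expression FS* matches any line containing an F (followed by zero or more S's), grep -v 'FS*' filters out more than intended:
$ printf '%s\n' FSfile Ffile other | grep -v 'FS*'
other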
I believe this question is a duplicate of this one.
What you need is
#!/bin/sh
for i in `cat reference_file.txt`
do find . -type f -name "$i" | grep -v 'FS*' | xargs grep -F 'step_2'
done
Note the backticks, and do not read the file reference_file.txt twice (your version both iterated over it and redirected it into the loop).
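Equivalently, with the more readable $( ) form of command substitution:
#!/bin/sh
for i in $(cat reference_file.txt)
do find . -type f -name "$i" | grep -v 'FS*' | xargs grep -F 'step_2'
done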

Shell Programming File Search and Append

I am trying to write a shell program that will search my current directory (say, my folder containing C code), read all files for the keywords "printf" or "fprintf", and append the include statement to the file if it isn't already done.
I have tried to write the search portion already (for now, all it does is search files and print the list of matching files), but it is not working. Included below is my code. What am I doing wrong?
EDIT: New code.
#!/bin/sh
#processes files ending in .c and appends statements if necessary
#search for files that meet criteria
for file in $( find . -type f )
do
    echo "$file"
    if grep -q printf "$file"
    then
        echo "File $file contains command"
    fi
done
To execute commands in a subshell you need $( command ). Notice the $ before the parenthesis.
You don't need to store the list of files in a temporary variable, you can directly use
for file in $( find . ) ; do
    echo "$file"
done
And with
find . -type f | grep somestring
you are not searching the file contents but the file names (in my example, all the files whose names contain "somestring")
To grep the content of the files:
for file in $( find . -type f ) ; do
    if grep -q printf "$file" ; then
        echo "File $file contains printf"
    fi
done
Note that if you match printf it will also match fprintf (as it contains printf)
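If you ever need the opposite, matching printf but not fprintf, grep -w restricts matches to whole words (in fprintf the match is preceded by the word character f, so it does not count):
grep -qw printf "$file"   # matches printf( but not fprintf(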
If you want to search just files ending with .c you can use the -name option
find . -name "*.c" -type f
Use the -type f option to list only files.
In any case check if your grep has the -r option to search recursively
grep -r --include "*.c" printf .
You can do this sort of thing with sed -i, but I find that distasteful. Instead, it seems reasonable to use ed (sed is ed for streams, so it makes sense to use ed when you're not working with a stream).
#!/bin/sh
for i in *.c; do
    grep -Fq '#include <stdio.h>' "$i" && continue
    grep -Fq printf "$i" && ed -s "$i" << EOF > /dev/null
1
i
#include <stdio.h>
.
w
EOF
done
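For comparison, a sketch of the sed -i approach alluded to above (this assumes GNU sed, whose i command accepts the text on the same line; same behavior, different tool):
#!/bin/sh
for i in *.c; do
    grep -Fq '#include <stdio.h>' "$i" && continue
    grep -Fq printf "$i" && sed -i '1i #include <stdio.h>' "$i"
done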
