Shell Script: How to copy files with specific string from big corpus - bash

I have a small bug and don't know how to solve it. I want to copy files that contain a specific string out of a big folder with many files. For this I use grep, ack or (in this example) ag. When I'm inside the folder it matches without problem, but when I run it in a loop over the files in the following script, it doesn't loop over the matches. Here is my script:
ag -l "${SEARCH_QUERY}" "${INPUT_DIR}" | while read -d $'\0' file; do
    echo "$file"
    cp "${file}" "${OUTPUT_DIR}/${file}"
done
SEARCH_QUERY holds the string I want to find inside the files, INPUT_DIR is the folder where the files are located, and OUTPUT_DIR is the folder where the found files should be copied to. Is there something wrong with the while loop?
EDIT:
Thanks for the suggestions! I went with the following, because it also finds files in subfolders and saves a list of all the files.
ag -l "${SEARCH_QUERY}" "${INPUT_DIR}" > "output_list.txt"
while IFS= read -r file
do
    echo "${file##*/}"
    cp "${file}" "${OUTPUT_DIR}/${file##*/}"
done < "output_list.txt"

It is better to implement it with a find command, like below:
find "${INPUT_DIR}" -name "*.*" | xargs grep -l "${SEARCH_QUERY}" > /tmp/file_list.txt
while read file
do
echo "$file"
cp "${file}" "${OUTPUT_DIR}/${file}"
done < /tmp/file_list.txt
rm /tmp/file_list.txt
or another option:
grep -l "${SEARCH_QUERY}" "${INPUT_DIR}/*.*" > /tmp/file_list.txt
while read file
do
echo "$file"
cp "${file}" "${OUTPUT_DIR}/${file}"
done < /tmp/file_list.txt
rm /tmp/file_list.txt

If you do not mind doing it in just one line, then:
grep -lr 'ONE\|TWO\|THREE' | xargs -I xxx -P 0 cp xxx dist/
guide:
-l   print only the file name and nothing else
-r   search recursively through the CWD and all sub-directories
'ONE\|TWO\|THREE'   match any one of the alternatives 'ONE', 'TWO' or 'THREE'
|   pipe the output of grep to xargs
-I xxx   use xxx as a placeholder for each incoming file name
-P 0   run the cp commands in parallel, as many at once as possible
cp xxx dist/   copy each matched file to the dist directory
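If file names may contain spaces, a NUL-safe variant of the same one-liner should work (a sketch, assuming GNU grep and GNU xargs):
# -Z (--null) makes grep -l terminate each file name with a NUL byte,
# which xargs -0 reads safely; -P 0 still parallelizes the cp calls
grep -lrZ 'ONE\|TWO\|THREE' . | xargs -0 -I xxx -P 0 cp xxx dist/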

If I understand the behavior of ag correctly, then you have to
adjust the read delimiter to '\n', or
use ag -0 -l to force delimiting by '\0',
to solve the problem in your loop. By default ag prints the matching file names separated by newlines, so your read -d $'\0' waits for a NUL byte that never arrives and the loop body never runs.
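For illustration, the NUL-delimited version of the original loop might look like this (a sketch, assuming your ag build supports -0; the copy keeps only the base name so the target lands directly in OUTPUT_DIR instead of a nested path):
ag -0 -l "${SEARCH_QUERY}" "${INPUT_DIR}" | while IFS= read -r -d '' file; do
    echo "$file"
    # ${file##*/} strips the directory part of the matched path
    cp "$file" "${OUTPUT_DIR}/${file##*/}"
done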
Alternatively, you can use the following script, which is based on find instead of ag (note that this matches SEARCH_QUERY against file names rather than file contents):
while IFS= read -r file; do
    echo "$file"
    cp "$file" "$OUTPUT_DIR/$file"
done < <(find "$INPUT_DIR" -name "*$SEARCH_QUERY*" -print)

Related

Trying to create a folder and cp the file using text file

I'm trying to create folders using a txt file and then copy files into them. I have two file listings:
try.txt
Changes/EMAIL/header-20-percent-off.gif
Changes/EMAIL/header-50-percent-off.gif
contents of the existing folder named zip2:
zip2/EMAIL/header-20-percent-off.gif
zip2/EMAIL/header-50-percent-off.gif
Code:
mkdir -p dirname `xargs -a try.txt`
cp -R {Dont know how this will work :( }
Actual output:
Changes/EMAIL/header-20-percent-off.gif/
/header-50-percent-off.gif/
Expected output:
Changes/EMAIL/header-20-percent-off.gif
/header-50-percent-off.gif
As you can see for some reason it thinks header-20-percent-off.gif and header-50-percent-off.gif are directories.
Once Changes/EMAIL/ is created I would like to copy the two gif files header-20-percent-off.gif and header-50-percent-off.gif there.
First create folders:
<try.txt xargs -d$'\n' dirname | xargs -d$'\n' mkdir -p
Then copy files. First prepare the stream with proper source and destination directories with sed and then pass to xargs:
sed 's#^Changes/\(.*\)#zip2/\1\n&#' try.txt |
xargs -d$'\n' -n2 cp
But if you are not proficient in bash, just read the stream line by line:
while IFS= read -r dest; do
    dir=$(dirname "$dest")
    mkdir -p "$dir"
    src=$(sed 's#^Changes#zip2#' <<<"$dest")
    cp "$src" "$dest"
done < try.txt
Don't use backticks `...`; they are highly discouraged. Use $(...) for command substitution instead.
Just doing xargs -a try.txt without a command makes little sense; use $(cat try.txt), or better $(<try.txt), instead.
Use the -t option with xargs to see what it is doing.
Explicitly specify the delimiter with xargs -d$'\n'; otherwise xargs parses ", ' and \ specially.
I believe that with some luck and work you could just use rsync, with something along the lines of rsync --include-from=try.txt Changes/ zip2/.
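Alternatively, rsync's --files-from may be a better fit, since it expects paths relative to the source directory and creates the needed directories itself (a sketch, untested against this exact layout; the sed strips the leading Changes/ so the paths resolve under zip2/):
rsync -av --files-from=<(sed 's#^Changes/##' try.txt) zip2/ Changes/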

How can I recursively replace file and directory names using Terminal?

Using the Terminal on macOS, I want to recursively replace a word that appears in both directory names and file names. For instance, I have an angular app whose module name is article, and all of the file names and directory names contain the word article. I've already done a find and replace to change articles to apples in the code. Now I want to do the same with the file structure so that both the file names and the directories share the same convention.
Just for information, I've already tried to use the newest Yeoman generator to create new files, but there seems to be an issue with it. The alternative is to duplicate a directory and rename all of its files, which is quite time consuming.
Got it to work with the following script:
var=$1
if [ -n "$var" ]; then
    CRUDNAME=$1
    # Capitalize the first letter, e.g. "apple" -> "Apple"
    CRUDNAMEUPPERCASE=$(echo "${CRUDNAME:0:1}" | tr '[a-z]' '[A-Z]')${CRUDNAME:1}
    FOLDERNAME=${CRUDNAME}s
    # Create new folder
    cp -R modules/articles "modules/$FOLDERNAME"
    # Do the find/replace in all the files
    find "modules/$FOLDERNAME" -type f -print0 | xargs -0 sed -i -e "s/Article/$CRUDNAMEUPPERCASE/g"
    find "modules/$FOLDERNAME" -type f -print0 | xargs -0 sed -i -e "s/article/$CRUDNAME/g"
    # Delete the backup files that macOS sed leaves behind when invoked as "sed -i -e"
    rm modules/"$FOLDERNAME"/**/*-e
    rm modules/"$FOLDERNAME"/**/**/*-e
    rm modules/"$FOLDERNAME"/**/**/**/*-e
    # Rename all the files
    for file in modules/"$FOLDERNAME"/**/*article*; do mv "$file" "${file//article/$CRUDNAME}"; done
    for file in modules/"$FOLDERNAME"/**/**/*article*; do mv "$file" "${file//article/$CRUDNAME}"; done
    for file in modules/"$FOLDERNAME"/**/**/**/*article*; do mv "$file" "${file//article/$CRUDNAME}"; done
else
    echo "Usage: sh rename-module.sh [crud-name]"
fi
Apparently I'm not the only one to encounter this issue:
https://github.com/meanjs/generator-meanjs/issues/79
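For what it's worth, a depth-first rename could replace the three hard-coded ** levels (a sketch, assuming the same CRUDNAME and FOLDERNAME variables as above; -depth makes find process children before their parents, and only the base name is rewritten so the parent path stays valid):
find "modules/$FOLDERNAME" -depth -name '*article*' | while IFS= read -r f; do
    mv "$f" "$(dirname "$f")/$(basename "$f" | sed "s/article/$CRUDNAME/g")"
done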

Making a file out of all the files containing a given string

Create a file that includes the content of all the files in the current folder that contain a given string (say, in argument 1). The data should appear one file after the other (each file appended to the end), and the name of the new file should be the given string.
I thought of the following but it doesn't work:
grep $1 * >> fnames # places all the names of the right files in a file
for x in fnames
do
    cat x >> $1 # concat the files from the list
done
rm fnames
On the same note, is there a site that has solved exercises like this or examples?
You can do something like this using process substitution:
shopt -s nullglob
while read -r file; do
cat "$file"
done < <(grep -l "search-pattern" *) > /path/to/newfile
This is assuming your directory only has files and no sub-directories.
You will need to use find with grep if there are sub-directories as well:
find . -maxdepth 1 -type f -exec grep -q "search-pattern" {} \; -print0 |
xargs -0 cat > /path/to/newfile
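If your grep supports it, a NUL-safe pipeline without find does the same for a flat directory (a sketch, assuming GNU grep and GNU xargs):
# -Z terminates each file name printed by -l with a NUL byte; xargs -0 matches it
grep -lZ "search-pattern" * | xargs -0 cat > /path/to/newfile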
How about this (assuming you aren't worried about files with spaces, newlines, or shell globs in their names, since those will not work correctly here):
for O in $(grep -l $1 *)
do
    cat "$O" >> $1
done

Display filename of tar file

I would like to know how to display the filename along with the lines matching a specific word in a tar file.
Command-wise:
zcat file | grep "stuff" -r # shows what I want
zcat *.gz | grep "stuff" -ar # this fails
You can use zgrep:
For a single file, you can use the following command to display the filename. Passing /dev/null as a second argument works because grep only prints file names when it is given more than one file:
zgrep "stuff" file.gz /dev/null
For multiple files:
zgrep "stuff" *.gz
Maybe this related answer can help. It uses tar to untar (you would need to add -z) and pipes each file of the archive to awk for "grepping" inside it.
I'm not quite sure what the question is, but if you are looking for tar files on your system then just do something like this. This will recursively search your current directory and any child directories for .tar files. Hope this helps.
find . -name "*.tar"
If zcat file | grep "stuff" -r shows what you want, you can do this for multiple files:
for name in *.gz ; do zcat "$name" | grep -a "stuff" | sed -e "s/^/${name}: /" ; done
This command uses globbing (*) to expand to a list of .gz files in your working directory, then calls zcat for extraction, grep for the search and sed for prefixing with the filename on each of the files.
Note that if you are working with gzipped tarballs, most people give them a .tgz or .tar.gz instead of just .gz extension.
This will output nameOfFileInTar:LineNumber:Match. Invoke with greptar.sh tarfile.tar pattern
If you don't want the line number, remove the -n option. If you only want the line number, add |cut -f1 -d: after the grep
#!/bin/bash
TARFILE=$1
PATTERN=$2
tar ztf "$TARFILE" | while read -r FILE
do
    res=$(tar zxf "$TARFILE" "$FILE" -O | grep -n "$PATTERN")
    if [[ $? == 0 ]]; then
        echo "$res" | while read -r line; do
            echo "$FILE:$line"
        done
    fi
done
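Invocation might look like this (assuming the script is saved as greptar.sh; note the z flag in the tar calls means the archive is expected to be gzip-compressed):
./greptar.sh archive.tar.gz stuff
# prints lines like: path/inside/archive.txt:42:a line containing stuff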

How can I manipulate file names using bash and sed?

I am trying to loop through all the files in a directory.
I want to do some stuff on each file (convert it to xml, not included in example), then write the file to a new directory structure.
for file in `find /home/devel/stuff/static/ -iname "*.pdf"`;
do
    echo $file;
    sed -e 's/static/changethis/' $file > newfile +".xml";
    echo $newfile;
done
I want the results to be:
$file => /home/devel/stuff/static/2002/hello.txt
$newfile => /home/devel/stuff/changethis/2002/hello.txt.xml
How do I have to change my sed line?
If you need to rename multiple files, I would suggest using the rename command:
# remove "-n" after you verify it is what you need
rename -n 's/hello/hi/g' $(find /home/devel/stuff/static/ -type f)
or, if you don't have rename try this:
find /home/devel/stuff/static/ -type f | while read -r FILE
do
    # modify the line below to do what you need, then remove the leading "echo"
    echo mv "$FILE" "$(echo "$FILE" | sed 's/hello/hi/g')"
done
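If file names may contain spaces, a NUL-safe variant of the rename call might be (a sketch, assuming the Perl rename and GNU find/xargs):
# -print0 / -0 keep odd file names intact; drop -n once verified
find /home/devel/stuff/static/ -type f -print0 | xargs -0 rename -n 's/hello/hi/g'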
Are you trying to change the filename? Then
for file in /home/devel/stuff/static/*/*.txt
do
    echo "Moving $file"
    mv "$file" "${file/static/changethis}.xml"
done
Please make sure /home/devel/stuff/static/*/*.txt is what you want before using the script.
First, you have to create the name of the new file based on the name of the initial file. The obvious solution is:
newfile=${file/static/changethis}.xml
Second you have to make sure that the new directory exists or create it if not:
mkdir -p "$(dirname "$newfile")"
Then you can do something with your file:
doSomething < "$file" > "$newfile"
I wouldn't do the for loop because of the possibility of overloading your command line. Command lines have a limited length, and if you overload it, it'll simply drop the excess without giving you any warning. It might work if your find returns 100 files. It might work if it returns 1000 files, but it might fail with more, and you'll never know.
The best way to handle this is to pipe the find into a while read statement, as glenn jackman does.
The sed command only works on STDIN and on files, but not on file names, so if you want to munge your file name, you'll have to do something like this:
newname="$(echo "$oldname" | sed 's/old/new/')"
to get the new name of the file. The $() construct executes the command and puts the results of the command on STDOUT.
So, your script will look something like this:
find /home/devel/stuff/static/ -name "*.pdf" | while read -r file
do
    echo "$file"
    newfile="$(echo "$file" | sed -e 's/static/changethis/')"
    newfile="$newfile.xml"
    echo "$newfile"
done
Now, since you're renaming the file directory, you'll have to make sure the directory exists before you do your move or copy:
find /home/devel/stuff/static/ -name "*.pdf" | while read -r file
do
    echo "$file"
    newfile="$(echo "$file" | sed -e 's/static/changethis/')"
    newfile="$newfile.xml"
    echo "$newfile"
    # Check for the directory and create it if it doesn't exist
    dirname=$(dirname "$newfile")
    if [ ! -d "$dirname" ]
    then
        mkdir -p "$dirname"
    fi
    # Directory now exists, so you can do the move
    mv "$file" "$newfile"
done
Note the quotation marks to handle the case there's a space in the file name.
By the way, instead of doing this:
if [ ! -d "$dirname" ]
then
mkdir -p "$dirname"
fi
You can do this:
[ -d "$dirname"] || mkdir -p "$dirname"
The || means to execute the following command only if the test isn't true. Thus, if [ -d "$dirname" ] is a false statement (the directory doesn't exist), you run mkdir.
It's a fairly common shortcut when you see shell scripts.
find ... | while read file; do
    newfile=$(basename "$file").xml;
    do something to "$file" > "$somedir/$newfile"
done
OUTPUT="$(pwd)"
for file in $(find . -iname "*.pdf")
do
    echo "$file"
    cp "$file" "$file.xml"
    echo "file created in directory = {$OUTPUT}"
done
This will create a new file named whatyourfilename.xml; for hello.pdf the new file created would be hello.pdf.xml. Basically it creates a copy of each file with .xml appended to the name.
Remember that the above script finds files under your present working directory whose names match the pattern given to the find command (in this case *.pdf) and creates the copy next to the original.
The find command in this particular script only finds files with names ending in .pdf. If you wanted to run this script on files whose names end in .txt, you would change the find command to find . -iname "*.txt".
Once I wanted to remove the trailing -min from my files, i.e. turn alg-min.jpg into alg.jpg. After some struggle, I managed to figure out something like this:
for f in *; do echo "$f"; mv "$f" "$(echo "$f" | sed 's/-min//g')"; done
Hope this helps someone who wants to REMOVE or SUBSTITUTE some part of their file names.
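A slightly safer variant (a sketch) globs only the names that actually contain -min and uses bash parameter expansion instead of sed, so mv is never asked to rename a file onto itself:
for f in *-min*; do
    mv -- "$f" "${f//-min/}"
done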
