Script to move all files except the one with the lowest sequence number - shell

Could you help me write a shell script?
I have the following set of files:
Folder path: abc/xyz
abc122.1001.csv
abc122.1002.csv
abc122.1003.csv
I want to search for the abc122.* files in a folder and move all of them to another location, except the file with the lowest sequence number (1001, 1002, ...).
For example, abc122.1001.csv has the lowest sequence number, so all the other abc122* files have to be moved to another location.
I've tried the command below, but it doesn't work:
find /abc/xyz -name 'abc122*' |sort -r | tail -n1 | mv {} u05/BACKUP/OLD

This can be a starting point:
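move_all_but_one.sh
#!/bin/bash
# Drop the first argument: the shell sorts glob matches, so it is the lowest-numbered file
shift
# Move whatever is left, if anything
if [ $# -gt 0 ]; then
mv -- "$@" u05/BACKUP/OLD/
fi
Then run it:
./move_all_but_one.sh abc122*
This is still not a final solution, as it can fail for a very long list of files ("Argument list too long").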

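The {} syntax belongs to find's -exec action; it means nothing to mv or to the shell.
Replacing | mv ... with
| while IFS= read -r f; do mv "$f" u05/BACKUP/OLD; done
should help.
Also, | sort -r | tail -n 1 selects only the lowest file, which is the one you want to keep. Replace it with | sort | tail -n +2 to select every file except the lowest.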
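Putting the two fixes together gives something like the following. It is a sketch, not the answerer's exact code: the function name move_all_but_lowest is hypothetical, and it assumes the sequence numbers all have the same width (1001, 1002, ...) so that a plain lexical sort puts the lowest first. Because find writes the names to a pipe rather than onto a command line, it also sidesteps the "Argument list too long" limit.

```shell
#!/bin/bash
# Hypothetical helper: move every PREFIX.* file in SRC_DIR to DEST_DIR,
# except the one with the lowest sequence number.
# Assumes equal-width sequence numbers (1001, 1002, ...) and no newlines in names.
move_all_but_lowest() {
    local src_dir=$1 prefix=$2 dest_dir=$3
    find "$src_dir" -maxdepth 1 -type f -name "$prefix.*" |
        sort |          # ascending, so the lowest sequence comes first
        tail -n +2 |    # skip that first (lowest) file
        while IFS= read -r f; do
            mv -- "$f" "$dest_dir"/
        done
}

# Usage, with the paths from the question:
# move_all_but_lowest /abc/xyz abc122 u05/BACKUP/OLD
```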

Related

Shell Script: How to copy files with specific string from big corpus

I have a small bug and don't know how to solve it. I want to copy files from a big folder with many files, where the files contain a specific string. For this I use grep, ack or (in this example) ag. When I'm inside the folder it matches without problem, but when I loop over the files in the following script, it doesn't loop over the matches. Here is my script:
ag -l "${SEARCH_QUERY}" "${INPUT_DIR}" | while read -d $'\0' file; do
echo "$file"
cp "${file}" "${OUTPUT_DIR}/${file}"
done
SEARCH_QUERY holds the string I want to find inside the files, INPUT_DIR is the folder where the files are located, and OUTPUT_DIR is the folder the found files should be copied to. Is there something wrong with the while loop?
EDIT:
Thanks for the suggestions! I went with this one, because it also looks for files in subfolders and saves a list of all the files.
ag -l "${SEARCH_QUERY}" "${INPUT_DIR}" > "output_list.txt"
while read file
do
echo "${file##*/}"
cp "${file}" "${OUTPUT_DIR}/${file##*/}"
done < "output_list.txt"
A better approach is to build the list with find:
find "${INPUT_DIR}" -type f -name "*.*" -exec grep -l "${SEARCH_QUERY}" {} + > /tmp/file_list.txt
while IFS= read -r file
do
echo "$file"
cp "${file}" "${OUTPUT_DIR}/${file##*/}"
done < /tmp/file_list.txt
rm /tmp/file_list.txt
Or another option (the glob must stay outside the quotes, or it will not expand):
grep -l "${SEARCH_QUERY}" "${INPUT_DIR}"/*.* > /tmp/file_list.txt
while IFS= read -r file
do
echo "$file"
cp "${file}" "${OUTPUT_DIR}/${file##*/}"
done < /tmp/file_list.txt
rm /tmp/file_list.txt
If you don't mind doing it in just one line:
grep -lr 'ONE\|TWO\|THREE' | xargs -I xxx -P 0 cp xxx dist/
Guide:
-l print only the file name and nothing else
-r search the current directory and all subdirectories recursively
'ONE\|TWO\|THREE' match any of these words: 'ONE' or 'TWO' or 'THREE'
| pipe the output of grep to xargs
-I xxx substitute each file name for xxx (it is just a placeholder)
-P 0 run as many cp processes in parallel as possible
cp copy each file xxx to the dist directory
If I understand the behavior of ag correctly, you have to either
adjust the read delimiter to '\n', or
use ag -0 -l to force delimiting by '\0'
to solve the problem in your loop.
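A minimal sketch of the NUL-delimited loop. It swaps in grep -rl --null for ag -0 -l so the example runs without ag installed; with ag you would pipe ag -0 -l "$query" "$input_dir" into the same loop. The function name copy_matching_files is hypothetical, and files are copied by base name, as in the question's EDIT, so same-named files from different subfolders overwrite each other.

```shell
#!/bin/bash
# Hypothetical helper: copy every file under INPUT_DIR whose contents match
# QUERY into OUTPUT_DIR. Names are NUL-terminated, so spaces survive.
copy_matching_files() {
    local query=$1 input_dir=$2 output_dir=$3
    grep -rl --null -- "$query" "$input_dir" |   # each name ends with '\0' (stand-in for ag -0 -l)
        while IFS= read -r -d '' file; do         # read -d '' splits on NUL
            echo "$file"
            cp -- "$file" "$output_dir/${file##*/}"
        done
}

# Usage: copy_matching_files "$SEARCH_QUERY" "$INPUT_DIR" "$OUTPUT_DIR"
```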
Alternatively, you can use the following script, which is based on find instead of ag. Note that -name matches SEARCH_QUERY against file names, not against file contents.
while IFS= read -r file; do
echo "$file"
cp "$file" "$OUTPUT_DIR/${file##*/}"
done < <(find "$INPUT_DIR" -name "*$SEARCH_QUERY*" -print)

Faster way to list files with similar names (using bash)?

I have a directory with more than 20K files, all with a random number prefix (e.g. 12345--name.jpg). I want to find files with similar names and remove all but one. I don't care which one, because they are duplicates.
To find duplicated names I've used
find . -type f \( -name "*.jpg" \) | | sed -e 's/^[0-9]*--//g' | sort | uniq -d
as the list for a for loop.
To find all but one to delete, I'm currently using
rm $(ls -1 *name.jpg | tail -n +2)
This operation is pretty slow. I want to speed this up. Any suggestions?
I would do it like this.
*Note that you are dealing with the rm command, so make sure you have a backup of the existing directory in case something goes wrong.
Create a backup directory and back up the existing files. Once done, check that all the files are there.
mkdir bkp_dir; cp *.jpg bkp_dir/
Create another temporary directory where we will keep only one file for each similar name, so all the unique file names end up here.
$ mkdir tmp
$ for i in $(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done
*The command is explained at the end. Once it has run, check the tmp/ directory to confirm you got one instance of each file.
Remove all *.jpg files from the main directory. Again, verify that all files have been backed up before running the rm command.
rm *.jpg
Copy the unique instances back from the temp directory.
cp tmp/*.jpg .
Explanation of command in step 2.
The command that gets the unique file names in step 2 is
for i in $(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done
$(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq) gets the unique file names, like file1.jpg, file2.jpg
for i in $(...);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done will copy one file for each filename to tmp/ directory.
You should not parse ls in scripts, and there is no reason to use a separate file list as in userunknown's reply.
keepone () {
# drop the first match and remove the rest
shift
rm -- "$@"
}
keepone *name.jpg
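If you are running find to identify the files you want to isolate anyway, traversing the directory twice is inefficient. Filter the output from find directly. Because find prints paths like ./12345--name.jpg, the numeric prefix has to be stripped right after a slash, not at the start of the line.
find . -type f -name "*.jpg" |
awk '{ f=$0; sub(/\/[0-9]*--/, "/", f); if (a[f]++) print }' |
xargs echo rm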
Take out the echo if the results look like what you expect.
As an aside, the /g flag to sed is useless for a regex which can only match once. The flag says to replace all occurrences on a line instead of the first occurrence on a line, but if there can be only one, the first is equivalent to all.
Assuming no subdirectories and no whitespace in file names:
find . -type f -name "*.jpg" | sed -e 's/^\.\/[0-9]*--//' | sort | uniq -d > namelist
removebutone () { shift; echo rm -- "$@"; }; while IFS= read -r n; do removebutone *--"$n"; done < namelist
or, more readably:
removebutone () {
shift
echo rm -- "$@"
}
while IFS= read -r n; do removebutone *--"$n"; done < namelist
shift drops the first positional parameter, so "$@" holds every match except the first. The * must stay outside the quotes so the shell expands it.
Note that the parentheses around the -name parameter are superfluous, and that there shouldn't be two pipes before sed. Maybe you had something else there that needed to be covered.
If it looks promising, you have, of course, to remove the 'echo' in front of 'rm'.
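For the 20K-file case, here is a sketch that does the whole job in one pass over the directory, with no ls and no second traversal. It assumes bash 4 or later for associative arrays; keep_one_per_name and the DRY_RUN variable are hypothetical names. Like the answers above, it only prints the rm commands unless DRY_RUN=0 is set, and it keeps the first match in glob (sorted) order for each name.

```shell
#!/bin/bash
# Hypothetical helper (bash 4+): in DIR, keep one *.jpg per name once the numeric
# "12345--" prefix is stripped, and remove the others.
# Prints the rm commands only, unless DRY_RUN=0 is set.
keep_one_per_name() {
    local dir=$1 f base
    local -A seen=()
    for f in "$dir"/*.jpg; do
        [ -f "$f" ] || continue                  # skip the literal pattern if nothing matched
        base=${f##*/}                            # drop the directory part
        if [[ $base =~ ^[0-9]*--(.*)$ ]]; then   # drop the numeric prefix
            base=${BASH_REMATCH[1]}
        fi
        if [[ -n ${seen[$base]+x} ]]; then       # this name was already kept once
            if [ "${DRY_RUN:-1}" = 0 ]; then
                rm -- "$f"
            else
                echo rm -- "$f"
            fi
        else
            seen[$base]=1
        fi
    done
}

# Usage: keep_one_per_name .              (dry run)
#        DRY_RUN=0 keep_one_per_name .    (really delete)
```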

Trying to write a shell script to move most recent file from downloads to another folder

I am trying to write a script that will find the most recently added item in my downloads folder and move it to another folder. I'm close but stuck on the final part. I'm doing this as an exercise to better learn iTerm2, not for practical reasons. I realize there are simpler ways to do this in the browser.
ls -t1 /Users/name/downloads | head -n 1 | > Users/name/targetfolder
If the item is a file, you can pipe the output of head to cp. Since ls prints bare names, put the directory in front of each one:
ls -t1 /Users/name/downloads | head -n 1 | xargs -I{} cp /Users/name/downloads/{} /Users/name/targetfolder
You can also add a test to check whether the item is a file or a directory:
last=$(ls -t1 . | head -n 1)
todir=/Users/name/targetfolder
if [ -d "$last" ]; then cp -r "$last" "$todir"; else cp "$last" "$todir"; fi
You are correctly finding the most recent item with:
ls -t1 /Users/name/downloads | head -n 1
However, you make a mistake after that.
What you can do is:
mv "/Users/name/downloads/$(ls -t1 /Users/name/downloads | head -n 1)" /Users/name/targetfolder
Above is a standard mv command whose syntax is:
mv filename target_filename # if you are renaming a file. Or,
mv filename target_dirname # if moving the file to a different directory.
Any command inside $() is replaced by its output.
So $(ls -t1 /Users/name/downloads | head -n 1) is replaced by the name of the most recent file. Because ls prints only the name, the downloads path has to be put in front of it.
Hence, the command basically means mv /Users/name/downloads/most_recent_file /Users/name/targetfolder
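A sketch that avoids parsing ls altogether: it compares modification times with the shell's -nt test (the same key ls -t sorts by) and moves the winner by its full path, so it works from any current directory. move_newest is a hypothetical name; hidden files are ignored, and mv handles files and directories alike, so no separate -d test is needed.

```shell
#!/bin/bash
# Hypothetical helper: move the most recently modified entry of SRC_DIR into DEST_DIR.
# Returns 1 if SRC_DIR has no (non-hidden) entries.
move_newest() {
    local src_dir=$1 dest_dir=$2 f newest=
    for f in "$src_dir"/*; do
        [ -e "$f" ] || continue                  # empty directory: glob stays literal
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f                            # newer modification time wins
        fi
    done
    [ -n "$newest" ] || return 1                 # nothing to move
    mv -- "$newest" "$dest_dir"/
}

# Usage: move_newest /Users/name/downloads /Users/name/targetfolder
```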

Unexpected Error while Executing Simple grep Script

I'm trying to collect a line from a series of very long files. Unfortunately, I need to extract the same line from an identically named file in 1600 distinct directories. The directory structure is like this.
Directory jan10 contains both the executed bash script, and directories named 18-109. The directories 18-109 each contain directories named 18A, 18B, ..., 18H. Inside each of these directories is the file "target.out" that we want the information from. Here is the code that I wrote to access this information:
for i in $(cat ~/jan10/list.txt);
do
cd $i
cd *A
grep E-SUM-OVERALL target.out | cut -c 17-24 > ../overallenergy.out
cd ../*B
grep E-SUM-OVERALL target.out | cut -c 17-24 >> ../overallenergy.out
cd ../*C
grep E-SUM-OVERALL target.out | cut -c 17-24 >> ../overallenergy.out
cd ../*D
grep E-SUM-OVERALL target.out | cut -c 17-24 >> ../overallenergy.out
cd ../*E
grep E-SUM-OVERALL target.out | cut -c 17-24 >> ../overallenergy.out
cd ../*F
grep E-SUM-OVERALL target.out | cut -c 17-24 >> ../overallenergy.out
cd ../*G
grep E-SUM-OVERALL target.out | cut -c 17-24 >> ../overallenergy.out
cd ../*H
done
In this example, list.txt contains the numbers 18-109 each on a different line. An example of the "list.txt" is shown below:
17
18
19
20
21
22
23
24
25
Unexpectedly, this code doesn't work; it returns these errors:
./testscript.sh: line 8: cd: 18: No such file or directory
./testscript.sh: line 11: cd: *A: No such file or directory
It returns this error for every numbered directory and every lettered sub-directory. Does anyone have any insight on what I've done wrong? I'll answer any questions, and I apologize again if this is unclear. The grep command by itself does work, so I imagine it's a problem with one of the "cd" commands, but I'm unsure. The code is being executed in the jan10 directory.
for Dir in $(cat ~/jan10/list.txt)
do
find "$Dir" -type f -name target.out |
while IFS= read -r File
do
grep E-SUM-OVERALL "$File" | cut -c 17-24
done > "$Dir"/overallenergy.out
done
Now that I understand your requirement better (my fault), here's a more fleshed out solution.
prompt$ cat simpleGrepScript.sh
#!/bin/bash
if ${testMode:-true} ; then
echo "processing file $1 into outfile ${1%/*}/../overallenergy.out" 1>&2
else
# >> because the eight lettered subdirectories share one output file
[[ -f "$1" ]] && grep 'E-SUM-OVERALL' "$1" | cut -c 17-24 >> "${1%/*}/../overallenergy.out" || echo "no file $1 found" 1>&2
fi
Run it one file at a time (the script only looks at $1):
prompt$ find /starting/path -name target.out | xargs -n 1 /path/to/simpleGrepScript.sh
If the output from testMode
"processing file $1 into outfile ${1%/*}/../overallenergy.out"
looks OK, then change the default to ${testMode:-false}.
If it doesn't look right, post minimal examples of the errors as a comment and I'll see if I can fix it.
If there are spaces in your path name, we'll have to circle back and add some more options to find and xargs.
IHTH.
Define a shell function that, for a given directory, finds all the underlying targets and, for each target, prints a suitable command on stdout.
% gen_greps () {
find "$1" -name target.out | while IFS= read -r fname ; do
printf 'grep E-SUM-OVERALL %s | cut -c 17-24 > %s/overallenergy.out\n' "$fname" "$(dirname "$fname")"
done
}
%
make a dry run
% gen_greps jan10
...
grep E-SUM-OVERALL jan10/29/29H/target.out | cut -c 17-24 > jan10/29/29H/overallenergy.out
...
%
if what we see is what we want, pass the commands to a shell for execution
% gen_greps jan10 | sh
%
That's all (?)
Don't use for in this way. For the for loop to run, the shell must first run the cat command and split its output, and if there is whitespace in a file name, the for will fail. Plus, it's very possible to overload your command line when executing the for.
Instead use a while read loop which is more efficient and more tolerant of file name issues:
while read dir
do
....
done < ~/jan10/list.txt
It is also very dangerous to use glob patterns in the cd command, because more than one directory could match the pattern, and that could cause cd to fail.
Also, if you find yourself piping to a series of grep, cut, sed commands, you can usually replace that with a single awk command.
If all the files you need are called target.out, and there are no other files called target.out that you want to skip, you can use find to locate them without changing into each directory.
Note how much shorter and simpler the entire program is:
while IFS= read -r dir
do
find "$dir" -name "target.out" -type f \
-exec awk '/E-SUM-OVERALL/ {print substr($0, 17, 8)}' {} \;
done < ~/jan10/list.txt > overallenergy.out
I don't have any data, so it's hard to actually test this. It may be possible to simply use a field in the awk program rather than substr. Or my substr call could be off.
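A sketch of the whole job as one function, assuming (as in the question) that the names in list.txt are directories relative to the current directory and that the wanted value sits in characters 17-24 of the E-SUM-OVERALL line (substr($0, 17, 8) is the awk equivalent of cut -c 17-24). collect_energies is a hypothetical name. Stripping a trailing carriage return is only a precaution in case list.txt was saved with Windows line endings. The glob expands in sorted order, so the values come out in A..H order, like the question's chain of cd commands.

```shell
#!/bin/bash
# Hypothetical helper: for each directory named in LIST_FILE, write characters
# 17-24 of every E-SUM-OVERALL line in DIR/*/target.out to DIR/overallenergy.out.
collect_energies() {
    local list_file=$1 dir
    local -a files
    while IFS= read -r dir; do
        dir=${dir%$'\r'}                   # tolerate CRLF line endings
        [ -n "$dir" ] || continue          # skip blank lines
        if [ ! -d "$dir" ]; then
            echo "skipping missing directory: $dir" >&2
            continue
        fi
        files=("$dir"/*/target.out)        # sorted: 18A, 18B, ..., 18H
        [ -e "${files[0]}" ] || continue   # no target.out files here
        awk '/E-SUM-OVERALL/ { print substr($0, 17, 8) }' "${files[@]}" \
            > "$dir/overallenergy.out"
    done < "$list_file"
}

# Usage, from inside jan10: collect_energies ~/jan10/list.txt
```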

bash continue execution on command failure

#! /bin/bash
while :
do
filenames=$(ls -rt *.log | tail -n 2)
echo $filenames
cat $filenames > jive_log.txt
sleep 0.1
done
I am trying to read the latest two files from a directory and join them using bash.
However, when there are no files with a .log extension in the current directory, the command ls -rt *.log fails with the error "ls: cannot access *.log: No such file or directory". After the error, it looks like the while loop does not execute.
What do I do so that the infinite loop continues even if one command fails?
I'm not sure what you mean, but perhaps:
for (( ;; )); do
while IFS= read -r FILE; do
cat "$FILE"
done < <(exec ls -rt1 *.log | tail -n 2) >> jive_log.txt
sleep 1
done
Note the ls option -1 which prints out files line by line.
Anyhow, you can join the last two files into jive_log.txt with:
while IFS= read -r FILE; do
cat "$FILE"
done < <(exec ls -rt1 *.log | tail -n 2) >> jive_log.txt
Another way is to save it to an array (e.g. with readarray) then pass the last 2 elements to cat.
readarray -t FILES < <(exec ls -rt1 *.log)
cat "${FILES[@]:(-2)}" > jive_log.txt ## Or perhaps you mean to append it? (>>)
If you want to sort the output of find, you have to add a sort key at the beginning, which can be removed later on.
find . -name \*.log -printf '%T+\t%p\n' |
sort -r |
head -2 |
cut -f 2-
Using head instead of tail is a bit cheaper.
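A sketch of the loop body that keeps working when there are no .log files: with nullglob, an unmatched *.log expands to nothing instead of the literal pattern, and the two newest files are picked with -nt rather than by parsing ls. join_latest_logs is a hypothetical name; it writes the older of the two files first, matching ls -rt | tail -n 2, and leaves the output file alone when there is nothing to join.

```shell
#!/bin/bash
# Hypothetical helper: concatenate the two most recently modified *.log files
# in the current directory into OUT_FILE. Does nothing if there are none.
join_latest_logs() {
    local out_file=$1 f newest= second= restore
    restore=$(shopt -p nullglob) || true   # shopt -p exits 1 when the option is off
    shopt -s nullglob                      # no matches -> empty list, not "*.log"
    for f in *.log; do
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            second=$newest
            newest=$f
        elif [ -z "$second" ] || [ "$f" -nt "$second" ]; then
            second=$f
        fi
    done
    eval "$restore"                        # put nullglob back as it was
    local -a to_join=()
    if [ -n "$second" ]; then to_join+=("$second"); fi   # older file first
    if [ -n "$newest" ]; then to_join+=("$newest"); fi
    (( ${#to_join[@]} )) || return 0       # nothing to join yet
    cat -- "${to_join[@]}" > "$out_file"
}

# The question's infinite loop then becomes:
# while :; do join_latest_logs jive_log.txt; sleep 0.1; done
```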
