Sequentially numbering of files in different folders while keeping the name after the number


I have a lot of ogg or wave files in different folders that I want to number sequentially while keeping everything that follows the prefixed number. The input may look like this:
Folder1/01 Insbruck.ogg
Folder1/02 From Milan to Rome.ogg
Folder1/03 From Rome to Naples.ogg
Folder2/01 From Naples to Palermo.ogg
Folder2/02 From Palermo to Syracrus.ogg
Folder2/03 From Syracrus to Tropea.ogg
The output should be:
Folder1/01 Insbruck.ogg
Folder1/02 From Milan to Rome.ogg
Folder1/03 From Rome to Naples.ogg
Folder2/04 From Naples to Palermo.ogg
Folder2/05 From Palermo to Syracrus.ogg
Folder2/06 From Syracrus to Tropea.ogg
The sequential numbering across folders can be done with this BASH script that I found here:
find . | (i=0; while read f; do
let i+=1; mv "$f" "${f%/*}/$(printf %04d "$i").${f##*.}";
done)
But this script removes the title that I would like to keep.

TL;DR
Like this, using Perl's rename:
rename -n 's#/\d+#sprintf "/%0.2d", ++$::c#e' Folder*/*
Drop the -n switch if the output looks good.
With -n, rename only lists the files it would actually rename; here that is just the three files in Folder2, because the Folder1 names already carry the right numbers.
Going further
The variable $::c (short for $main::c, a package variable) is a little hack that avoids the more verbose forms you would otherwise need, such as:
rename -n 's#/\d+#sprintf "/%0.2d", ++our $c#e' Folder*/*
or
rename -n '{ no strict; s#/\d+#sprintf "/%0.2d", ++$c#e; }' Folder*/*
or
rename -n '
do {
use 5.012;
state $c = 0;
s#/\d+#sprintf "/%0.2d", ++$c#e
}
' Folder*/*
Thanks to go|dfish and Grinnz on freenode.

A bash script for this job would be:
#!/bin/bash
argc=$#
width=${#argc}
n=0
for src; do
base=$(basename "$src")
dir=$(dirname "$src")
if ! [[ $base =~ ^[0-9]+\ .*\.(ogg|wav)$ ]]; then
echo "$src: Unexpected file name. Skipping..." >&2
continue
fi
printf -v num "%0${width}d" $((++n))
dest="$dir/$num ${base#* }"
echo "moving '$src' to '$dest'"
# mv -n "$src" "$dest"
done
and could be run as
./renum Folder*/*
assuming the script is saved as renum. As written, it only prints the source and destination file names. To actually move the files, remove the # at the beginning of the line # mv -n "$src" "$dest" once you have checked that the output is what you expect. Note that mv will not overwrite an existing file because of the -n option, which may or may not be desirable. The script prints a warning and skips unexpected file names, that is, names that do not fit the pattern given in the question. The padding width is the number of digits in the file count, so with fewer than ten files the numbers come out as 1, 2, 3 rather than 01, 02, 03.
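The heart of the script is turning one source path into its destination name. A minimal sketch of just that step as a standalone function follows; renum_name is a hypothetical helper, not part of the answer, and it assumes each basename starts with the old number followed by a single space. It pads the number with a separate printf so that a % in a title cannot be misread as a format directive.

```bash
#!/bin/bash
# Hypothetical helper (not from the answer above): build the new path for one
# file from its current path, its new sequence number and a pad width.
# Assumes the basename starts with digits followed by a single space.
renum_name() {
    local src=$1 num=$2 width=$3
    local dir base padded
    dir=$(dirname -- "$src")
    base=$(basename -- "$src")
    # Pad in a separate printf so a '%' in the title is never a format directive.
    printf -v padded "%0${width}d" "$num"
    # ${base#* } drops the shortest prefix ending in a space, i.e. the old number.
    printf '%s/%s %s\n' "$dir" "$padded" "${base#* }"
}

renum_name "Folder2/01 From Naples to Palermo.ogg" 4 2
# prints: Folder2/04 From Naples to Palermo.ogg
```

Passing a width of 2 gives the 01, 02, ... numbering shown in the question.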

Not as robust as the accepted answer, but here is an improved version of your script, in case rename is not available.
#!/usr/bin/env bash
[[ -n $1 ]] || {
printf >&2 'Needs a directory as an argument!\n'
exit 1
}
n=1
directory=("$@")
while IFS= read -r files; do
if [[ $files =~ ^(.+)?\/([[:digit:]]+)([^/]+)$ ]]; then
printf -v int '%02d' "$((n++))"
[[ -e "${BASH_REMATCH[1]}/$int${BASH_REMATCH[3]}" ]] && {
printf '%s is already in sequential order, skipping!\n' "$files"
continue
}
echo mv -v "$files" "${BASH_REMATCH[1]}/$int${BASH_REMATCH[3]}"
fi
done < <(find "${directory[@]}" -type f | sort )
Now run the script with the directory (or directories) in question as arguments:
./myscript Folder*/
or
./myscript Folder1/
or
./myscript Folder2/
or . for the current directory:
./myscript .
and so on...
Remove the echo if you're satisfied with the output.

Related

Search file of directories and find file names, save to new file - bash

I'm trying to find the paths for some fastq.gz files in a mess of a system.
I have some folder paths in a file called temp (subset):
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG167/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG178/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG213/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG230/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG234/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG250/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG251/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG257/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG265/temp/
Let's assume 2 fastq.gz files are found in each directory in temp except for /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG265/temp/.
I want to find the fastq.gz files and print them (if found) next to the directory I'm searching in.
Ideal output:
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG167/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG167/NG167_S19_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG178/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG178/NG178_S1_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG178/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG178/NG178_S1_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG213/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG213/NG213_S20_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG213/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG213/NG213_S20_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG230/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG230/NG230_S23_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG230/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG230/NG230_S23_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG234/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG234/NG234_S18_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG234/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG234/NG234_S18_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG250/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG250/NG250_S2_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG250/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG250/NG250_S2_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG251/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG251/NG251_S3_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG251/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG251/NG251_S3_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG257/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG257/NG257_S4_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG257/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG257/NG257_S4_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/NG263_S22_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/NG263_S22_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG265/temp/ not_found
I'm partway there:
wc -l temp
while read -r line; do cd $line; echo ${line} >> ~/tmp; find `pwd -P` -name "*fastq.gz" >> ~/tmp; done < temp
cd ~
less tmp
Current output:
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG167/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG167/NG167_S19_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG167/NG167_S19_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG178/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG178/NG178_S1_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG178/NG178_S1_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG213/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG213/NG213_S20_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG213/NG213_S20_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG230/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG230/NG230_S23_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG230/NG230_S23_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG234/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG234/NG234_S18_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG234/NG234_S18_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG250/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG250/NG250_S2_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG250/NG250_S2_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG251/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG251/NG251_S3_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG251/NG251_S3_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG257/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG257/NG257_S4_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG257/NG257_S4_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/NG263_S22_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/NG263_S22_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG265/temp/
My code places the directory searched for first, then any matching files on subsequent lines. I'm not sure how to get the output I desire...
Any help, gratefully received!
Thanks,

This is not your original script: instead of running cd and find once per line (that is, once per directory), it runs find once over all the listed directories and does the parsing inside the while read loop.
#!/usr/bin/env bash
mapfile -t to_search < temp.txt
while IFS= read -rd '' files; do
if [[ $files == *.fastq.gz ]]; then
printf '%s found %s\n' "${files%/*}/" "$files"
else
printf '%s not_found!\n' "$files" >&2
fi
done < <(find "${to_search[@]%/*.fastq.gz*}" -print0) | column -t
This is how I would rewrite your script, using cd in a subshell:
#!/usr/bin/env bash
while IFS= read -r line; do
if [[ -d "$line" ]]; then
(
cd "$line" || exit
found=
# find . lists matches relative to $line, e.g. ./NG167_S19_R1_001.fastq.gz
while IFS= read -r file; do
printf '%s found %s\n' "$line" "$line${file#./}"
found=1
done < <(find . -name '*fastq.gz')
[[ -n $found ]] || printf '%s not_found!\n' "$line"
)
fi
done < temp.txt | column -t
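If the fastq.gz files always sit directly inside the listed directories (an assumption; find also searches subdirectories), a glob can replace find entirely. A sketch, with report_fastq as a hypothetical function name that takes the list file as its argument:

```bash
#!/bin/bash
# Sketch: for every directory listed in a file, report each *.fastq.gz found
# directly inside it, or "not_found" if there is none.
# Assumptions: one directory per line, no recursion (unlike find).
shopt -s nullglob    # an unmatched glob expands to nothing instead of itself

report_fastq() {
    local list=$1 dir f
    local -a matches
    while IFS= read -r dir; do
        [[ -n $dir ]] || continue           # skip blank lines
        matches=( "${dir%/}"/*.fastq.gz )   # tolerate a missing trailing slash
        if (( ${#matches[@]} )); then
            for f in "${matches[@]}"; do
                printf '%s found %s\n' "$dir" "$f"
            done
        else
            printf '%s not_found\n' "$dir"
        fi
    done < "$list"
}
```

Usage would be report_fastq temp | column -t; column -t will misalign paths that contain spaces.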

Given a line such as
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/NG263_S22_R2_001.fastq.gz
you can get what you want for the found lines quite easily with sed; just feed the lines to it:
... | sed -e 's#^\(.*/\)\([^/][^/]*\)$#\1 found \1\2#'
However, that doesn't eliminate the line before.
To do that, either use something like awk (with a simple state machine) or do something like this in sed (general idea here: https://stackoverflow.com/a/25203093):
... | sed -e '\#/$#{$!N;\#\n.*gz$#!P;D;}'
(When an address regex uses a delimiter other than /, it has to be introduced with a backslash, as in \#...#. Some sed implementations, such as the BSD sed on macOS, also need a ; before the closing }.)
That leaves the .gz lines already converted, plus the lines ending in / for directories with no matches; a final sed appends the "not found" to those:
... | sed -e 's#/$#/ not found#'
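Putting the three steps together as one pipeline, in a slightly different order from above: the directory lines are dropped first, and the found substitution is restricted to names ending in .gz so it cannot touch directory lines. This is a sketch for GNU sed; annotate is a hypothetical function name, and it prints not_found to match the question's desired output.

```bash
#!/bin/bash
# Sketch: the three sed steps as one filter (GNU sed assumed).
# Input format: a directory line ending in "/", followed by its matches.
#   1. drop a directory line when the next line is a .gz match
#   2. turn "dir/file.gz" into "dir/ found dir/file.gz"
#   3. any directory line still left had no matches
annotate() {
    sed -e '\#/$#{$!N;\#\n.*gz$#!P;D;}' |
    sed -e 's#^\(.*/\)\([^/]*\.gz\)$#\1 found \1\2#' |
    sed -e 's#/$#/ not_found#'
}
```

It can be fed the file produced by the question's loop, for example annotate < ~/tmp.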

Bash command does not work in script but in console

I'm running these two commands in a script where I want to check whether all files in a directory are media files:
1 All_LINES=$(ls -1 | wc -l)
2 echo "Number of lines: ${All_LINES}"
3
4 REACHED_LINES=$(ls -1 *(*.JPG|*.jpg|*.PNG|*.png|*.JPEG|*.jpeg) | wc -l)
5 echo "Number of reached lines: ${REACHED_LINES}"
if[...]
Running lines 4 and 5 one after the other in an interactive shell works as expected, counting all files ending in .jpg, .JPG, and so on.
Running everything together in a script, though, gives the following error:
Number of lines: 12
/home/andreas/.bash_scripts/rnimgs: command substitution: line 17: syntax error near unexpected token `('
/home/andreas/.bash_scripts/rnimgs: command substitution: line 17: `ls -1 *(*.JPG|*.jpg|*.PNG|*.png|*.JPEG|*.jpeg) | wc -l)'
Number of reached lines:
Could somebody explain this to me, please?
EDIT: This is as far as I got:
#!/bin/bash
# script to rename images, sorted by "type" and "date modified" and named by current folder
#get/set basename for files
CURRENT_BASENAME=$(basename "${PWD}")
echo -e "Current directory/basename is: ${CURRENT_BASENAME}\n"
read -e -p "Please enter basename: " -i "${CURRENT_BASENAME}" BASENAME
echo -e "\nNew basename is: ${BASENAME}\n"
#start
echo -e "START RENAMING"
#get nr of all files in directory
All_LINES=$(ls -1 | wc -l)
echo "Number of lines: ${All_LINES}"
#get nr of media files in directory
REACHED_LINES=$(ls -1 *(*.JPG|*.jpg|*.PNG|*.png|*.JPEG|*.jpeg) | wc -l)
echo "Number of reached lines: ${REACHED_LINES}"
EDIT1: Thanks again, guys. This is my result so far; there is still room for improvement, but it's a start and ready to test.
#!/bin/bash
#script to rename media files to a choosable name (default: ${basename} of current directory) and sorted by date modified
#config
media_file_extensions="(*.JPG|*.jpg|*.PNG|*.png|*.JPEG|*.jpeg)"
#enable option extglob (extended globbing): If set, the extended pattern matching features described above under Pathname Expansion are enabled.
#more info: https://askubuntu.com/questions/889744/what-is-the-purpose-of-shopt-s-extglob
#used for the extended glob pattern in media_file_extensions
shopt -s extglob
#save and set IFS (The Internal Field Separator): IFS is used for word splitting after expansion and to split lines into words with the read builtin command.
#more info: https://bash.cyberciti.biz/guide/$IFS
#used to keep blanks in filenames
SAVEIFS=$IFS;
IFS=$(echo -en "\n\b");
#get and print current directory
basedir=$PWD
echo "Directory:" $basedir
#get and print nr of files in current directory
all_files=( "$basedir"/* )
echo "Number of files in directory: ${#all_files[@]}"
#get and print nr of media files in current directory
media_files=( "$basedir"/*${media_file_extensions} )
echo -e "Number of media files in directory: ${#media_files[@]}\n"
#validation if #all_files = #media_files
if [ ${#all_files[@]} -ne ${#media_files[@]} ]
then
echo "ABORT - YOU DID NOT REACH ALL FILES, PLEASE CHECK YOUR FILE ENDINGS"
exit
fi
#make a copy
backup_dir="backup_95f528fd438ef6fa5dd38808cdb10f"
backup_path="${basedir}/${backup_dir}"
mkdir "${backup_path}"
rsync -r "${basedir}/" "${backup_path}" --exclude "${backup_dir}"
echo "BACKUP MADE"
echo -e "START RENAMING"
#set new basename
basename=$(basename "${PWD}")
read -e -p "Please enter file basename: " -i "$basename" basename
echo -e "New basename is: ${basename}\n"
#variables
counter=1;
new_name="";
file_extension="";
#iterate over files
for f in $(ls -1 -t -r *${media_file_extensions})
do
#catch file name
echo "Current file is: $f"
#catch file extension
file_extension="${f##*.}";
echo "Current file extension is: ${file_extension}"
#create new name
new_name="${basename}_${counter}.${file_extension}"
echo "New name is: ${new_name}";
#rename file
mv "$f" "${new_name}";
echo -e "Counter is: ${counter}\n"
((counter++))
done
#get and print nr of media files before
echo "Number of media files before: ${#media_files[@]}"
#get and print nr of media files after
media_files=( "$basedir"/*${media_file_extensions} )
echo -e "Number of media files after: ${#media_files[@]}\n"
#delete backup?
while true; do
read -p "Do you wish to keep the result? " yn
case $yn in
[Yy]* ) rm -r "${backup_path}"; echo "BACKUP DELETED"; break ;;
[Nn]* ) rm -r !(${backup_dir}); rsync -r "${backup_path}/" "${basedir}"; rm -r "${backup_path}"; echo "BACKUP RESTORED THEN DELETED"; break;;
* ) echo "Please answer yes or no.";;
esac
done
#reverse IFS to default
IFS=$SAVEIFS;
echo -e "END RENAMING"

You don't need to and don't want to use ls at all here. See https://mywiki.wooledge.org/ParsingLs
Also, don't use uppercase for your private variables. See Correct Bash and shell script variable capitalization
#!/bin/bash
shopt -s extglob
read -e -p "Please enter directory: " -i "$PWD" basedir
all_files=( "$basedir"/* )
echo "Number of files: ${#all_files[@]}"
media_files=( "$basedir"/*(*.JPG|*.jpg|*.PNG|*.png|*.JPEG|*.jpeg) )
echo "Number of media files: ${#media_files[@]}"

As @chepner already pointed out in a comment, you likely need to explicitly enable extended globbing in your script with shopt -s extglob; cf. Greg's Wiki.
It's also possible to condense that pattern to eliminate some redundancy and add mixed case if you like:
$: ls -1 *.*([Jj][Pp]*([Ee])[Gg]|[Pp][Nn][Gg])
a.jpg
b.JPG
c.jpeg
d.JPEG
mixed.jPeG
mixed.pNg
x.png
y.PNG
You can also accomplish this without ls, which is error-prone. Try this:
$: all_lines=(*)
$: echo ${#all_lines[@]}
55
$: reached_lines=( *.*([Jj][Pp]*([Ee])[Gg]|[Pp][Nn][Gg]) )
$: echo ${#reached_lines[@]}
8
c.f. this breakdown
If all you want is counts, but you prefer not to include directories:
all_dirs=( */ )
num_files=$(( ${#all_lines[@]} - ${#all_dirs[@]} ))
If there's a chance you will have a directory with a name that matches your jpg/png pattern, then it gets trickier. At that point it's probably easier to just use @markp-fuso's solution.
One last thing - avoid all-caps variable names. Those are generally reserved for system stuff.
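To see the extglob point in isolation, the sketch below (count_media is a hypothetical name) enables extglob on its own line at the top of the script, before any line using @(...) is parsed. In an interactive shell extglob is often already on (bash-completion, for instance, turns it on), which would explain why the same command works at the prompt. It prints the number of directory entries followed by the number of media files.

```bash
#!/bin/bash
# Sketch: count all entries and media files in a directory with extended globbing.
# extglob must already be on when bash parses a line that uses @(...),
# so it is enabled on its own line before the function is defined.
shopt -s extglob nullglob   # nullglob: no match gives an empty array, not the pattern

count_media() {
    local dir=${1:-.}
    local -a all media
    all=( "$dir"/* )
    media=( "$dir"/*.@(jpg|JPG|jpeg|JPEG|png|PNG) )
    printf '%d %d\n' "${#all[@]}" "${#media[@]}"
}
```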

Assuming the OP wants to limit the counts to regular files (i.e., exclude other file types such as directories, pipes, symbolic links, etc.), a solution based on find may provide more accurate counts.
Updating OP's original code to use find (ignoring dot files for now):
ALL_LINES=$(find . -maxdepth 1 -type f | wc -l)
echo "Number of lines: ${ALL_LINES}"
REACHED_LINES=$(find . -maxdepth 1 -type f \( -iname '*.jpg' -o -iname '*.png' -o -iname '*.jpeg' \) | wc -l)
echo "Number of reached lines: ${REACHED_LINES}"
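If you would rather stay in pure Bash, a loop with [[ -f ]] gives similar counts. Note the differences from find -type f: [[ -f ]] also accepts symbolic links that point to regular files, and the glob skips dot files. This sketch assumes Bash 4 or later for the ${name,,} lowercase expansion; count_regular is a hypothetical name. A directory named like an image (dir.jpg) is not counted, which covers the trickier case mentioned in the previous answer.

```bash
#!/bin/bash
# Sketch: count regular files and media files in one directory without find.
# Unlike find -type f, [[ -f ]] also accepts symlinks to regular files.
# Requires bash 4+ for the ${name,,} lowercase expansion.
shopt -s nullglob

count_regular() {
    local dir=${1:-.} f name
    local -i files=0 media=0
    for f in "$dir"/*; do
        [[ -f $f ]] || continue          # skip directories, pipes, broken links, ...
        files+=1                         # integer attribute: += adds arithmetically
        name=${f##*/}
        case ${name,,} in                # compare case-insensitively
            *.jpg|*.jpeg|*.png) media+=1 ;;
        esac
    done
    printf '%d %d\n' "$files" "$media"
}
```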

Cannot escape path in bash file

I'm trying to run a command on every file in a directory by looping over them. The code is:
#!/bin/bash
shopt -s nullglob
INPUT_DIR=$1
OUTPUT_DIR=$2
: ${INPUT_DIR:="."}
: ${OUTPUT_DIR:="."}
files="$INPUT_DIR/*.ttf"
for file in $files
do
base_file=${file##*/}
output="$OUTPUT_DIR/${base_file%.*}.woff"
ttf2woff "$file" "$output" || exit 1
done
I'd expect the double quotes around $INPUT_DIR/*.ttf to do the magic, but apparently they don't:
$> ttf2woff_multi "/Users/ozan/Dropbox/Graphic Library/Google Fonts/fonts-master/ofl/raleway"
Can't open input file (/Users/ozan/Dropbox/Graphic)
and when I print out $files I get: /Users/ozan/Dropbox/Graphic Library/Google
What am I missing here?
Edit: files="$INPUT_DIR"/*.ttf instead of files="$INPUT_DIR/*.ttf" doesn't work either...

In addition to the array solution (which is a good one), you can also use read with process substitution (note that find, unlike the glob, also descends into subdirectories):
INPUT_DIR=${1:-.}
OUTPUT_DIR=${2:-.}
[ -d "$INPUT_DIR" ] && [ -d "$OUTPUT_DIR" ] || {
printf "error: invalid directory specified (INPUT_DIR or OUTPUT_DIR)\n" >&2
exit 1
}
while IFS= read -r file; do
base_file=${file##*/}
output="$OUTPUT_DIR/${base_file%.*}.woff"
ttf2woff "$file" "$output" || exit 1
done < <(find "$INPUT_DIR" -type f -iname "*.ttf")

Since you want to loop through a list of files, it is better to store them in an array:
files=("$INPUT_DIR"/*.ttf)
for file in "${files[@]}"
do
base_file=${file##*/}
output="$OUTPUT_DIR/${base_file%.*}.woff"
ttf2woff "$file" "$output" || exit 1
done
Note that you wrote "$INPUT_DIR/*.ttf" whereas I am suggesting "$INPUT_DIR"/*.ttf. This lets the glob expand as intended.
The key point here, as Cyrus mentions in the comments, is to leave the glob unquoted, since quotes prevent globbing.
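The same rule explains the edit in the question: files="$INPUT_DIR"/*.ttf still fails because pathname expansion is never performed on the value of a plain variable assignment, so the pattern is stored literally; the later unquoted $files in the for loop is then split at the space before it is globbed. A sketch contrasting the two (show_scalar and show_array are hypothetical names; a path containing a space stands in for the Dropbox folder):

```bash
#!/bin/bash
# Sketch: a scalar assignment keeps the glob literal; an array assignment expands it.
shopt -s nullglob    # no match -> empty array instead of the literal pattern

show_scalar() {
    local files
    files="$1"/*.ttf         # plain assignment: no pathname expansion happens
    printf '%s\n' "$files"   # prints the pattern itself
}

show_array() {
    local -a files
    local f
    files=( "$1"/*.ttf )     # quoted dir keeps its spaces, the unquoted * expands
    for f in "${files[@]}"; do
        printf '%s\n' "${f##*/}"
    done
}
```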
See an example with some files.
$ ls f*
f1 f2 f3
Stored with double quotes, it is just the string itself:
$ files=("f*")
$ for f in "${files[@]}"; do echo "$f"; done
f*
See how it is expanded if we do not quote:
$ files=(f*)
$ for f in "${files[@]}"; do echo "$f"; done
f1
f2
f3

create and rename multiple copies of files

I have a file input.txt that looks as follows.
abas_1.txt
abas_2.txt
abas_3.txt
1fgh.txt
3ghl_1.txt
3ghl_2.txt
I have a folder ff. The files in this folder are abas.txt, 1fgh.txt and 3ghl.txt. Based on the input file, I would like to create the corresponding copies in the ff folder.
For example, abas has three copies in the input file, so in the ff folder I need to create three copies of abas.txt named abas_1.txt, abas_2.txt and abas_3.txt. There is no need to copy or rename 1fgh.txt in the ff folder.
Your valuable suggestions would be appreciated.

You can try something like this (to be run from within your folder ff):
#!/bin/bash
while IFS= read -r fn; do
[[ $fn =~ ^(.+)_[[:digit:]]+\.([^\.]+)$ ]] || continue
fn_orig=${BASH_REMATCH[1]}.${BASH_REMATCH[2]}
echo cp -nv -- "$fn_orig" "$fn"
done < input.txt
Remove the echo if you're happy with it.
If you don't want to run from within the folder ff, just replace the line
echo cp -nv -- "$fn_orig" "$fn"
with
echo cp -nv -- "ff/$fn_orig" "ff/$fn"
The -n option tells cp not to overwrite existing files, and the -v option makes it verbose. The -- tells cp that there are no more options beyond this point, so it will not be confused if one of the file names starts with a hyphen.
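The mapping from a copy name back to its original can be pulled out into a function, which makes it easy to test on its own. orig_name is a hypothetical helper using the same regex as the loop above; it returns a non-zero status for names without a _<digits> suffix, such as 1fgh.txt.

```bash
#!/bin/bash
# Hypothetical helper: map "abas_2.txt" -> "abas.txt".
# Fails (status 1, no output) for names without a "_<digits>" suffix.
orig_name() {
    [[ $1 =~ ^(.+)_[[:digit:]]+\.([^.]+)$ ]] || return 1
    printf '%s.%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
}

# Usage inside the loop (still a dry run):
# while IFS= read -r fn; do
#     src=$(orig_name "$fn") || continue
#     echo cp -nv -- "$src" "$fn"
# done < input.txt
```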

Using for and grep:
#!/bin/bash
for i in *
do
x="${i%.*}_"
for j in $(grep "^$x" input.txt)
do
cp -n "$i" "$j"
done
done

Try this one:
#!/bin/bash
while IFS= read -r newFileName;do
#split the string by _ delimiter
arr=(${newFileName//_/ })
extension="${newFileName##*.}"
fileToCopy="${arr[0]}.$extension"
#check for empty : '1fgh.txt' case
if [ -n "${arr[1]}" ]; then
#check if file exists
if [ -f "$fileToCopy" ];then
echo "copying $fileToCopy -> $newFileName"
cp "$fileToCopy" "$newFileName"
#else
# echo "File $fileToCopy does not exist, so it can't be copied"
fi
fi
done
You can call your script like this:
cat input.txt | ./script.sh

If you could change the format of input.txt, I suggest you adjust it in order to make your task easier. If not, here is my solution:
#!/bin/bash
SRC_DIR=/path/to/ff
INPUT=/path/to/input.txt
BACKUP_DIR=/path/to/backup
for path in "$SRC_DIR"/*; do
cand=${path##*/}
grep "^${cand%.*}_" "$INPUT" | while IFS= read -r new
do
cp -fv "$path" "$BACKUP_DIR/$new"
done
done

Why is while not working?

AIM: To find files with a word count of less than 1000 and move them to another folder, looping until all files under 1k have been moved.
STATUS: It only moves one file, then errors with "Unable to move file as it doesn't exist." For some reason $INPUT_SMALL doesn't seem to update with the new file name.
What am I doing wrong?
Current Script:
# Check for input files already under 1k and move to Split folder
INPUT_SMALL=$( ls -S /folder1/ | grep -i reply | tail -1 )
INPUT_COUNT=$( cat /folder1/$INPUT_SMALL 2>/dev/null | wc -l )
function moveSmallInput() {
while [[ $INPUT_SMALL != "" ]] && [[ $INPUT_COUNT -le 1003 ]]
do
echo "Files smaller than 1k have been found in input folder, these will be moved to the split folder to be processed."
mv /folder1/$INPUT_SMALL /folder2/
done
}

I assume you are looking for files that have the word reply somewhere in the path. My solution is:
wc -w $(find /folder1 -type f -path '*reply*') | \
while read wordcount filename
do
if [[ $wordcount -lt 1003 ]]
then
printf "%4d %s\n" $wordcount $filename
#mv "$filename" /folder2
fi
done
Run the script once; if the output looks correct, uncomment the mv command and run it for real.
Update
The above solution has trouble with file names containing spaces. The problem occurs when the find command hands its output to the wc command. After a little thinking, here is my revised solution:
find /folder1 -type f -path '*reply*' | \
while IFS= read -r filename
do
wordcount=$(wc -w < "$filename") # reading stdin makes wc print only the count
if [[ $wordcount -lt 1003 ]]
then
printf "%4d %s\n" "$wordcount" "$filename"
#mv "$filename" /folder2
fi
done

A somewhat shorter version:
#!/bin/bash
find ./folder1 -type f | while IFS= read -r f
do
(( $(wc -w "$f" | awk '{print $1}' ) < 1000 )) && cp "$f" folder2
done
I used cp instead of mv for safety reasons; change it to mv once you have validated the results.
If you also want to filter on reply, use @Hai's version of the find command.

Your variables INPUT_SMALL and INPUT_COUNT are not functions, they're just values you assigned once. You either need to move them inside your while loop or turn them into functions and evaluate them each time (rather than just expanding the variable values, as you are now).
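A sketch of that fix, keeping the question's own commands but recomputing both values at the top of every iteration. move_small_inputs and its parameters are hypothetical; it still parses ls output (which breaks on unusual file names) and, like the question, counts lines with wc -l even though the stated aim is words. It stops as soon as the smallest remaining reply file is over the limit, since size order only approximates line order.

```bash
#!/bin/bash
# Sketch: recompute the smallest "reply" file and its line count on every
# pass, instead of once before the loop. Hypothetical parameters:
#   $1 = source folder, $2 = destination folder, $3 = line limit (default 1003)
move_small_inputs() {
    local src=$1 dest=$2 limit=${3:-1003}
    local smallest count
    while :; do
        # ls -S sorts largest first, so tail -1 picks the smallest match
        smallest=$(ls -S "$src" | grep -i reply | tail -1)
        [[ -n $smallest ]] || break            # nothing left to move
        count=$(wc -l < "$src/$smallest")
        (( count <= limit )) || break          # smallest file is too big: stop
        echo "Moving $smallest ($count lines) to $dest"
        mv -- "$src/$smallest" "$dest/"
    done
}
```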
