Replace numbers in a file name in Unix (bash)

I have multiple files, approximately 150, and their names do not match a vendor's requirement. Example file names:
company_red001.p12
company_red002.p12
.
.
.
.
company_red150.p12
I need to rename all of the files so that 24 is added to each number, there are no leading zeros, and the company_ component is removed:
red25.p12
red26.p12
red27.p12
.
.
.
red174.p12
I have used a for loop in bash to remove the company_ component, but would like something that executes all of the changes in one pass, as I have to perform this at a moment's notice.
example:
#!/bin/bash
n=24
for file in company_red*
do
new_name=$file$n
n=$((n+1))
mv -i "$file" "$new_name"
done
example 2
#!/bin/bash
for f in company_red*
do mv "$f" "${f/company_red/red}";
done

Most probably this one could be fine :)
# printf is used to emulate a lot of files
for f in $( printf "company_red%03d.p12\n" {1..150} )
do
# get the filename
n="$f"
# remove extension
n="${n%.*}"
# remove leading letters
n="${n##*[[:alpha:]]}"
# add 24, 10# is used to consider the 10-based number
n="$(( 10#$n + 24 ))"
# construct new filename
g="red${n}.p12"
echo mv "$f" "$g"
done
And this could be simplified a bit
for f in $( printf "company_red%03d.p12\n" {1..150} )
do
# take the number from the specific, fixed position
n="${f:11:3}"
# everything below is the same as in the previous example
n="$(( 10#$n + 24 ))"
g="red${n}.p12"
echo mv "$f" "$g"
done
And finally, this could be simplified once more -- just skip the intermediate $n and $g:
for f in $( printf "company_red%03d.p12\n" {1..150} )
do
echo mv "$f" "red$(( 10#${f:11:3} + 24 )).p12"
done
But this could make the code harder to understand and maintain.
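The 10# prefix matters because arithmetic expansion treats a literal with a leading zero as octal, so numbers such as 008 or 009 would otherwise raise an error. A quick illustration (hypothetical shell session, not part of the answer above):
echo $(( 009 + 1 ))     # fails: the leading zero makes bash read 009 as an (invalid) octal number
echo $(( 10#009 + 1 ))  # prints 10: 10# forces base-10 interpretation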

Do:
for file in *.p12; do
name=${file#*_} ## Extracts the portion after `_` from filename, save as variable "name"
pre=${name%.*} ## Extracts the portion before extension, save as "pre"
num=${pre##*[[:alpha:]]} ## Extracts number from variable "pre"
pre=${pre%%[0-9]*} ## Extracts the alphabetic portion from variable "pre"
suf=${name##*.} ## Extracts the extension from variable "name"
echo mv -i "$file" "${pre}""$(printf '%d' $((10#$num+24)))"."${suf}" ## Doing arithmetic expansion for addition, and necessary formatting to get desired name
done
Outputs:
mv -i company_red001.p12 red25.p12
mv -i company_red002.p12 red26.p12
The above is a dry run; remove echo once you are satisfied with the renames it would perform:
for file in *.p12; do
name=${file#*_}
pre=${name%.*}
num=${pre##*[[:alpha:]]}
pre=${pre%%[0-9]*}
suf=${name##*.}
mv -i "$file" "${pre}""$(printf '%d' $((10#$num+24)))"."${suf}"
done
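For completeness: if the Perl-based rename utility (File::Rename) happens to be installed, both changes can be sketched as a single substitution with an evaluated replacement, and -n previews the renames without performing them. This is an assumption about your environment, not something the answers above rely on; the rename shipped on some systems (util-linux) does not accept this syntax:
rename -n 's/^company_red(\d+)\.p12$/"red" . ($1 + 24) . ".p12"/e' company_red*.p12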

Related

How to split large *.csv files with headers in Bash?

I need to split a big *.csv file into several smaller ones. There are currently 661497 rows, and I need each file to hold at most 40000. I've tried a solution that I found on GitHub, but with no success:
FILENAME=/home/cnf/domains/cnf.com.pl/public_html/sklep/dropshipping-pliki/products-files/my_file.csv
HDR=$(head -1 ${FILENAME})
split -l 40000 ${FILENAME} xyz
n=1
for f in xyz*
do
if [[ ${n} -ne 1 ]]; then
echo ${HDR} > part-${n}-${FILENAME}.csv
fi
cat ${f} >> part-${n}-${FILENAME}.csv
rm ${f}
((n++))
done
The error I get:
/home/cnf/domains/cnf.com.pl/public_html/sklep/dropshipping-pliki/download.sh: line 23: part-1-/home/cnf/domains/cnf.com.pl/public_html/sklep/dropshipping-pliki/products-files/my_file.csv.csv: No such file or directory
Thanks for the help!
Keep in mind that FILENAME contains both a directory and a file name, so later in the script, when you build the new filename, you get something like:
part-1-/home/cnf/domains/cnf.com.pl/public_html/sklep/dropshipping-pliki/products-files/tyre_8.csv.csv
One quick and easy fix would be to split the directory and filename into two separate variables, e.g.:
srcdir='/home/cnf/domains/cnf.com.pl/public_html/sklep/dropshipping-pliki/products-files'
filename='tyre_8.csv'
hdr=$(head -1 "${srcdir}/${filename}")
split -l 40000 "${srcdir}/${filename}" xyz
n=1
for f in xyz*
do
if [[ ${n} -ne 1 ]]; then
echo "${hdr}" > "${srcdir}/part-${n}-${filename}"
fi
cat "${f}" >> "${srcdir}/part-${n}-${filename}"
rm "${f}"
((n++))
done
NOTES:
consider using lowercase variables (using uppercase variables raises the possibility of problems if there's an OS variable of the same name)
wrap variable references in double quotes in case the string contains spaces
don't need to add a .csv extension on the new filename since it's already part of $filename
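If you'd rather keep passing the full path in a single variable, the same split can be derived with dirname and basename rather than hard-coding the two values. A sketch, reusing the path from the question:
fullpath='/home/cnf/domains/cnf.com.pl/public_html/sklep/dropshipping-pliki/products-files/my_file.csv'
srcdir=$(dirname "$fullpath")      # directory part
filename=$(basename "$fullpath")   # file part, i.e. my_file.csv
# the rest of the script can then use "${srcdir}" and "${filename}" exactly as above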

bash script not filtering

I'm hoping this is a simple question, since I've never done shell scripting before. I'm trying to filter certain files out of a list of results. While the script executes and prints out a list of files, it's not filtering out the ones I don't want. Thanks for any help you can provide!
#!/bin/bash
# Purpose: Identify all *md files in H2 repo where there is no audit date
#
#
#
# Example call: no_audits.sh
#
# If that call doesn't work, try ./no_audits.sh
#
# NOTE: Script assumes you are executing from within the scripts directory of
# your local H2 git repo.
#
# Process:
# 1) Go to H2 repo content directory (assumption is you are in the scripts dir)
# 2) Use for loop to go through all *md files in each content sub dir
# and list all file names and directories where audit date is null
#
#set counter
count=0
# Go to content directory and loop through all 'md' files in sub dirs
cd ../content
FILES=`find . -type f -name '*md' -print`
for f in $FILES
do
if [[ $f == "*all*" ]] || [[ $f == "*index*" ]] ;
then
# code to skip
echo " Skipping file: " $f
continue
else
# find audit_date in file metadata
adate=`grep audit_date $f`
# separate actual dates from rest of the grepped line
aadate=`echo $adate | awk -F\' '{print $2}'`
# if create date is null - proceed
if [[ -z "$aadate" ]] ;
then
# print a list of all files without audit dates
echo "Audit date: " $aadate " " $f;
count=$((count+1));
fi
fi
done
echo $count " files without audit dates "
First, to address the immediate issue:
[[ $f == "*all*" ]]
is only true if the exact contents of f is the string *all* -- with the wildcards as literal characters. If you want to check for a substring, then the asterisks shouldn't be quoted:
[[ $f = *all* ]]
...is a better-practice solution. (Note the use of = rather than == -- this isn't essential, but is a good habit to be in, as the POSIX test command is only specified to permit = as a string comparison operator; if one writes [ "$f" == foo ] by habit, one can get unexpected failures on platforms with a strictly compliant /bin/sh).
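A quick way to see the difference, using a hypothetical path that contains "all":
f=./content/all-topics.md                    # hypothetical filename
[[ $f == "*all*" ]] && echo "quoted match"   # prints nothing: $f is compared to the literal string *all*
[[ $f = *all* ]] && echo "unquoted match"    # prints "unquoted match": the asterisks act as glob wildcards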
That said, a ground-up implementation of this script intended to follow best practices might look more like the following:
#!/usr/bin/env bash
count=0
while IFS= read -r -d '' filename; do
aadate=$(awk -F"'" '/audit_date/ { print $2; exit; }' <"$filename")
if [[ -z $aadate ]]; then
(( ++count ))
printf 'File %q has no audit date\n' "$filename"
else
printf 'File %q has audit date %s\n' "$filename" "$aadate"
fi
done < <(find . -not '(' -name '*all*' -o -name '*index*' ')' -type f -name '*md' -print0)
echo "Found $count files without audit dates" >&2
Note:
An arbitrary list of filenames cannot be stored in a single bash string (because all characters that might otherwise be used to determine where the first name ends and the next name begins could be present in the name itself). Instead, read one NUL-delimited filename at a time -- emitted with find -print0, read with IFS= read -r -d ''; this is discussed in BashFAQ #1, and a short demonstration of the breakage follows these notes.
Filtering out unwanted names can be done internal to find.
There's no need to preprocess input to awk using grep, as awk is capable of searching through input files itself.
< <(...) is used to avoid the behavior in BashFAQ #24, wherein content piped to a while loop causes variables set or modified within that loop to become unavailable after its exit.
printf '...%q...\n' "$name" is safer than echo "...$name..." when handling unknown filenames, as printf will emit printable content that accurately represents those names even if they contain unprintable characters or characters which, when emitted directly to a terminal, act to modify that terminal's configuration.
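As promised above, here is what the whitespace-splitting loop does to a (hypothetical) filename containing a space:
touch 'release notes.md'                   # hypothetical file with a space in its name
FILES=`find . -type f -name '*md' -print`
for f in $FILES
do
echo "$f"                                  # prints ./release and notes.md as two separate "files"
done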
Never mind, I found the answer here:
bash script to check file name begins with expected string
I tried various versions of the wildcard/filename and ended up with:
if [[ "$f" == *all.md ]] || [[ "$f" == *index.md ]] ;
The link above said not to put those in quotes, and removing the quotes did the trick!

How can I use multiple Bash arguments in loop dynamically without using long regex strings?

I have a directory with the following files:
file1.jpg
file2.jpg
file3.jpg
file1.png
file2.png
file3.png
I have a bash function named filelist and it looks like this:
filelist() {
if [ "$1" ]
then
shopt -s nullglob
for filelist in *."$@" ; do
echo "$filelist" >> created-file-list.txt;
done
echo "file created listing: " $#;
else
filelist=`find . -type f -name "*.*" -exec basename \{} \;`
echo "$filelist" >> created-file-list.txt
echo "file created listing: All Files";
fi
}
Goal: Be able to type as many arguments as I want for example filelist jpg png and create a file with a list of files of only the extensions I used as arguments. So if I type filelist jpg it would only show a list of files that have .jpg.
Currently: My code works great with one argument thanks to $@, but when I use both jpg and png it creates the following list
file1.jpg
file2.jpg
file3.jpg
png
It looks like my for loop is only running once and only using the first argument. My suspicion is I need to count how many arguments and run the loop on each one.
An obvious fix for this is to create a long regex check like (jpg|png|jpeg|html|css) and all of the different extensions one could ever think to type. This is not ideal because I want other people to be free to type their file extensions without breaking it if they type one that I don't have identified in my regex. Dynamic is key.
You can rewrite your function as shown below - just loop through each extension and append the list of matching files to the output file:
filelist() {
if [ $# -gt 0 ]; then
shopt -s nullglob
for ext in "$@"; do
printf '%s\n' *."$ext" >> created-file-list.txt
echo "created listing for extension $ext"
done
else
find . -type f -name "*.*" -exec basename \{} \; >> created-file-list.txt
echo "created listing for all files"
fi
}
And you can invoke your function as:
filelist jpg png
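With the example directory from the question, created-file-list.txt would then contain something like:
file1.jpg
file2.jpg
file3.jpg
file1.png
file2.png
file3.png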
Try this
#!/bin/bash
while [ -n "$1" ]
do
echo "Current Parameter: $1 , Remaining $#"
#Pass $1 to some bash function or do whatever
shift
done
Using shift, you shift the arguments to the left and read the next one via the $1 variable.
See man bash on what shift does.
shift [n]
       The positional parameters from n+1 ... are renamed to $1 .... Parameters represented by the numbers $# down to $#-n+1 are unset. n must be a non-negative number less than or equal to $#. If n is 0, no parameters are changed. If n is not given, it is assumed to be 1. If n is greater than $#, the positional parameters are not changed. The return status is greater than zero if n is greater than $# or less than zero; otherwise 0.
Or you can iterate as follows:
for this in "$@"
do
echo "Param = $this";
done
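Applied to the original filelist function, the shift-based approach might look roughly like this -- only a sketch, with the same behaviour as the "$@" loop shown earlier, consuming one extension per iteration:
filelist() {
    shopt -s nullglob
    if [ $# -eq 0 ]; then
        # no arguments: list every file, as in the original else branch
        find . -type f -name "*.*" -exec basename {} \; >> created-file-list.txt
        echo "created listing for all files"
        return
    fi
    while [ -n "$1" ]; do
        printf '%s\n' *."$1" >> created-file-list.txt
        echo "created listing for extension $1"
        shift
    done
}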

rename numbering within filename using shell

My files have the following pattern:
a0015_random_name.txt
a0016_some_completely_different_name.txt
a0017_and_so_on.txt
...
I would like to rename only the numbering using the shell, so that each number goes down by two:
a0015_random_name.txt ---> a0013_random_name.txt
a0016_some_completely_different_name.txt ---> a0014_some_completely_different_name.txt
a0017_and_so_on.txt ---> a0015_and_so_on.txt
I've tried already this:
let n=15; for i in *.txt; do let n=n-2; b=`printf a00`$n'*'.txt; echo "mv $i $b"; done
(I use echo first, in order to see what would happen)
but this gave me:
mv a0015_random_name.txt a0013*.txt
mv a0016_some_completely_different_name.txt a0014*.txt
mv a0017_and_so_on.txt a0015*.txt
I've also tried to find a command that would keep the rest of the name intact, but I couldn't find one. Does someone know of one, or have a better idea of how to do this?
Your code is almost correct. Try this:
let n=15; for i in *.txt; do let n=n-2; b=`echo $i | sed "s/a[0-9]*/a$n/g"`; echo "mv $i $b"; done
Better yet, to make it more robust, use the following modification:
let n=15; for i in *.txt; do let t=n-2; b=`echo $i | sed "s/a00$n/a00$t/g"`; echo "mv $i $b"; let n=n+1; done
If you have the Perl rename.pl script, this is a one-liner:
rename 's/\d+/sprintf "%0${\(length $&)}d", $&-2/e' *.txt
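If your copy of the Perl rename supports a dry-run flag (File::Rename's does, as -n), you can preview the result before committing to it:
rename -n 's/\d+/sprintf "%0${\(length $&)}d", $&-2/e' *.txt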
Otherwise, it's a bit wordier. Here's one way:
for f in *.txt; do
number=$(expr "$f" : '^[^0-9]*\([0-9]*\)') # extract the first number from the filename
prefix=${f%%$number*} # remember the part before
suffix=${f#*$number} # and after the number
let n=10#$number-2 # subtract 2
nf=$(printf "%s%0${#number}d%s" \
"$prefix" "$n" "$suffix") # build new filename
echo "mv '$f' '$nf'" # echo the rename command
# mv "$f" "$nf" # uncomment to actually do the rename
done
Note the 10# on the let line - that forces the number to be interpreted in base 10 even if it has leading zeroes, which would otherwise cause it to be interpreted in base 8. Also, the %0${#number}d format tells printf to format the new number with enough leading zeroes to be the same length as the original number.
On your example, the above script produces this output:
mv 'a0015_random_name.txt' 'a0013_random_name.txt'
mv 'a0016_some_completely_different_name.txt' 'a0014_some_completely_different_name.txt'
mv 'a0017_and_so_on.txt' 'a0015_and_so_on.txt'
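To see what the %0${#number}d format does in isolation, here is a hypothetical snippet using the number from the first example file:
number=0015
printf "%0${#number}d\n" $(( 10#$number - 2 ))   # prints 0013: padded to the same 4-character width as the original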

delete a numbered range of files

I have a range of files in the format name<n>.txt and I'd like to remove all but the 3 whose n immediately precedes a selected file.
i.e. if I input file24.txt I want to delete any file numbered lower than file21.txt
The following works, but is there a simpler way? Perhaps using find -name -delete or similar?
file=file24.txt
num=$(sed 's/[^0-9]//g' <<< $file)
((num-=3))
while :
do
files=($(find $dir -name "*txt"))
count=${#files[@]}
if ((count < 1 ))
then
break
fi
rm file"$num".txt
((num--))
done
Here is one way of doing it:
#!/bin/bash
# Grab the number from the file name passed to the script; $1 holds that value
num="${1//[!0-9]}"
# Enable nullglob so the glob expands to nothing instead of itself when there are no matches
shopt -s nullglob
# Iterate over the desired path where files are
for file in *; do
# Capture the number from file in loop
n=${file//[!0-9]}
# If the file name contains a number and that number is below the cutoff, delete the file
[[ ! -z $n ]] && (( n < num - 3)) && rm "$file"
done
Run it as:
./script.sh file24.txt
I'd probably do this with a Perl script:
#!/usr/bin/env perl
use strict;
use warnings;
for my $arg (@ARGV)
{
my($prefix, $number, $suffix) = ($arg =~ m/^ (\D+) (\d+) (\D.*) $/x);
foreach my $i (1..$number-4)
{
my $file = "$prefix$i$suffix";
unlink $file;
print "$file\n";
}
}
For each of the arguments specified on the command line, the name is split into 3 bits: a non-empty prefix of non-digits, a non-empty number of digits, and a suffix consisting of a non-digit followed by any sequence of characters (so file1.bz2 is split into file, 1 and .bz2). Then, for each number from 1 to 4 less than the given number, generate a file name from the prefix, the current number, and the suffix. With that file name, unlink the file and print the name. You can tweak it to remove only files that exist, or not report the names, or whatever. There's no fixed limit on the maximum number of files.
You could omit the unlink and simply print the file names and send those to xargs rm -f or an equivalent. You could ensure that the names were terminated with a null byte so that names with newlines could be handled correctly by GNU xargs and the -0 option. Etc.
You could code this in pure Bash if you wished to, though the splitting into prefix, number, suffix will be messier than in Perl. I wouldn't use awk for this, though it could probably be forced to do the job if you chose to make it do so.
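For reference, here is a rough pure-Bash sketch of that split, using the same parameter-expansion tricks as the other answers. It assumes a single run of digits in each name and that, like the Perl version, it lives in a script taking the selected file names as arguments; echo is left in as a dry run:
for arg in "$@"; do
    num=${arg//[!0-9]/}        # the digits, e.g. 24 from file24.txt
    prefix=${arg%%[0-9]*}      # everything before the digits, e.g. file
    suffix=${arg##*[0-9]}      # everything after the digits, e.g. .txt
    for (( i = 1; i <= 10#$num - 4; i++ )); do
        echo rm -f "${prefix}${i}${suffix}"   # drop echo to actually delete
    done
done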
I think this might be one of the easiest ways of doing it:
shopt -s extglob
rm !(file21.txt|file22.txt|file23.txt)
Here's a simple function that does this in a more generic way:
function rmbut3() {
for ((n=0 ; n < $(($1 - 3)) ; n++))
do
rm file${n}.txt
done
}
rmbut3 24 # deletes files up to file20.txt
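A slightly more defensive sketch might accept the filename itself as the argument (as in the question) and skip numbers that don't correspond to an existing file:
rmbut3() {
    local num=${1//[!0-9]/}    # e.g. 24 from file24.txt
    local n
    for (( n = 0; n < 10#$num - 3; n++ )); do
        [[ -e "file${n}.txt" ]] && rm "file${n}.txt"
    done
}
rmbut3 file24.txt    # removes file0.txt .. file20.txt, skipping any that don't exist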
