I've been trying to find an efficient way to rename lots of files in the bash shell on Linux, by removing a specific component of the filename. Filenames look like:
DATA_X3.A2022086.40e50s.231.2022087023101.csv
I want to remove the 2nd-to-last element entirely, resulting in:
DATA_X3.A2022086.40e50s.231.csv
I've seen suggestions to use perl-rename, which might handle this (I'm not sure), but this system does not have perl-rename available. (It has GNU bash 4.2, and rename from util-linux 2.23.)
I like extended globbing and parameter parsing for things like this.
$: shopt -s extglob
$: n=DATA_X3.A2022086.40e50s.231.2022087023101.csv
$: echo ${n/.+([0-9]).csv/.csv}
DATA_X3.A2022086.40e50s.231.csv
So ...
for f in *.csv; do mv "$f" "${f/.+([0-9]).csv/.csv}"; done
This assumes all the files in the local directory, and no other CSV files with similar formatting you don't want to rename, etc.
edit
In the more general case where the .csv does not immediately follow the component to be removed, is there a way to drop the nth dot-separated component of the filename, without a more complicated sequence to string-split in bash (which always seems cumbersome) and rebuild the filename?
There is usually a way. If you know which field needs to be removed -
$: ( IFS=. read -ra line <<< "$n"; unset 'line[4]'; IFS=".$IFS"; echo "${line[*]}" )
DATA_X3.A2022086.40e50s.231.csv
Breaking that out:
( # open a subshell to localize IFS
IFS=. read -ra line <<< "$n"; # inline set IFS to . to parse to fields
unset 'line[4]'; # unset the desired field from the array (quoted so the subscript isn't glob-expanded)
IFS=".$IFS"; # prepend . as the OUTPUT separator
echo "${line[*]}" # reference with * to reinsert
) # closing the subshell restores IFS
I will confess I am not certain why the inline setting of IFS doesn't work on the reassembly. /shrug
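Wrapped up as a reusable function (a sketch; drop_field is a hypothetical name, and the field index is zero-based):

```shell
drop_field() {  # usage: drop_field FILENAME INDEX  -- drop the INDEXth dot-separated field
    ( IFS=. read -ra parts <<< "$1"   # split on dots into an array
      unset "parts[$2]"               # drop the requested field
      IFS=.                           # dot as the output separator
      echo "${parts[*]}" )            # subshell keeps the IFS changes local
}
drop_field DATA_X3.A2022086.40e50s.231.2022087023101.csv 4
# DATA_X3.A2022086.40e50s.231.csv
```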
This is a simple split/drop-field/reassemble, but I think it may be an X/Y Problem
If what you are doing is dropping the one field that has the date/timestamp info, then as long as the format of that field is consistent and unique, it's probably easier to use a version of the first approach.
Is it possible you meant for DATA_X3.A2022086.40e50s.231.2022087023101.csv's 5th field to be 20220807023101, i.e., August 7th of 2022 @ 02:31:01 AM? Because if that's what you meant, and it's supposed to be 14 digits instead of 13, and that is the only field that is always exactly 14 digits, then you don't need shopt and can leave the field position floating -
$: n=DATA_X3.A2022086.40e50s.231.20220807023101.csv
$: echo ${n/.[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]./.}
DATA_X3.A2022086.40e50s.231.csv
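Dropped into a loop, that looks like the following (a sketch; leave the echo in place until the preview looks right):

```shell
pat='.[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].'
for f in *.csv; do
    echo mv -- "$f" "${f/$pat/.}"   # remove echo to actually rename
done
```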
Related
I need to verify that all images mentioned in a csv are present inside a folder. I wrote a small shell script for that
#!/bin/zsh
red='\033[0;31m'
color_Off='\033[0m'
csvfile=$1
imgpath=$2
cat $csvfile | while IFS=, read -r filename rurl
do
if [ -f "${imgpath}/${filename}" ]
then
echo -n
else
echo -e "$filename ${red}MISSING${color_Off}"
fi
done
My CSV looks something like
Image1.jpg,detail-1
Image2.jpg,detail-1
Image3.jpg,detail-1
The CSV was created by Excel.
Now all 3 images are present in imgpath but for some reason my output says
Image1.jpg MISSING
Upon using zsh -x to run the script, I found that my CSV file has a BOM at the very beginning, making the image name \ufeffImage1.jpg, which is causing the whole issue.
How can I ignore a BOM(byte-order marker) in a while read operation?
zsh provides a parameter expansion (also available in POSIX shells) to remove a prefix: ${var#prefix} will expand to $var with prefix removed from the front of the string.
zsh also, like ksh93 and bash, supports ANSI C-like string syntax: $'\ufeff' refers to the Unicode sequence for a BOM.
Combining these, ${filename#$'\ufeff'} expands to the content of $filename with the BOM removed if one is present at the front.
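A quick illustration (bash 4.2+ or zsh for the $'\uXXXX' escape):

```shell
filename=$'\ufeffImage1.jpg'       # simulate the first line read from a BOM-prefixed CSV
filename=${filename#$'\ufeff'}     # strip the BOM
echo "$filename"
# Image1.jpg
echo "${filename#$'\ufeff'}"       # a no-op when there is no BOM to strip
# Image1.jpg
```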
The below also makes some changes for better performance, more reliable behavior with odd filenames, and compatibility with non-zsh shells.
#!/bin/zsh
red='\033[0;31m'
color_Off='\033[0m'
csvfile=$1
imgpath=$2
while IFS=, read -r filename rurl; do
filename=${filename#$'\ufeff'}
if ! [ -f "${imgpath}/${filename}" ]; then
printf '%s %bMISSING%b\n' "$filename" "$red" "$color_Off"
fi
done <"$csvfile"
Notes on changes unrelated to the specific fix:
Replacing echo -e with printf lets us pick which specific variables get escape sequences expanded: %s for filenames means backslashes and other escapes in them are unmodified, whereas %b for $red and $color_Off ensures that we do process highlighting for them.
Replacing cat $csvfile | with < "$csvfile" avoids the overhead of starting up a separate cat process, and ensures that your while read loop is run in the same shell as the rest of your script rather than a subshell (which may or may not be an issue for zsh, but is a problem with bash when run without the non-default lastpipe flag).
echo -n isn't reliable as a noop: some shells print -n as output, and the POSIX echo standard, by marking behavior when -n is present as undefined, permits this. If you need a noop, : or true is a better choice; but in this case we can just invert the test and move the else path into the truth path.
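To see the %s/%b distinction in isolation (red here is the same escape string as in the script):

```shell
red='\033[0;31m'
printf '%s\n' "$red"                 # prints the literal text \033[0;31m
printf '%b\n' "$red"                 # emits the actual ESC byte, switching the terminal to red
printf '%s\n' 'back\slash file.jpg'  # %s leaves backslashes in data untouched
```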
I have been trying to do this all afternoon and cannot figure out how. I'm running MX Linux, and from the command line I am trying (unsuccessfully) to batch-edit a bunch of filenames (I've about 500, so I don't want to do this by hand) from:
2020-August-15.pdf
2021-October-15.pdf
To:
2020-08-15.pdf
2021-10-15.pdf
I cannot find anything that does this (in a way I understand) so am wondering. Is this possible or am I to do this by hand?
Admittedly I'm not very good with Bash but I can use sed, awk, rename, date, etc. I just can't seem to find a way to combine them to rename my files.
I cannot find anything on here that has been of any help in doing this.
Many thanks.
EDIT:
I'm looking for a way to combine commands and ideally not have to explicitly for-loop through the files and the months. What I mean is I would have preferred, and was trying, to pipe ls into a command combination to convert as specified above. Sorry for the confusion.
EDIT 2:
Thank you to everyone who came up with answers, and for your patience with my lack of ability. I don't think I'm qualified to decide on the best answer; however, I have settled, for my use-case, on the following:
declare -A months=( [January]=01 [February]=02 [March]=03 [April]=04 [May]=05\
[June]=06 [July]=07 [August]=08 [September]=09 [October]=10 [November]=11 [December]=12 )
for oldname in 202[01]-[A-Za-z]*-15.pdf
do
IFS=-. read y m d ext <<< "${oldname}"
mv "$oldname" "$y-${months[$m]}-$d.$ext"
done
I think this offers the best flexibility. I would have liked to use the date command but don't know how to avoid hard-coding the file extension. I was unaware of the read command, or that you could use patterns in the for-loop.
I have learned a lot from this thread so again thank you all. Really my solution is a cross of most of the solutions below as I've taken from them all.
With just Bash built-ins, try
months=(\
January February March April May June \
July August September October November December)
for file in ./*; do
dst=$file
for ((i=1; i<=${#months[@]}; ++i)); do
printf -v m '%02d' "$i"
dst=${dst//${months[i-1]}/$m}
done
mv -- "$file" "$dst"
done
This builds up an array of month names, and loops over it to find the correct substitution.
The printf -v m '%02d' "$i" line zero-pads single-digit month numbers into a separate variable m (a separate variable so the loop counter isn't clobbered, which would break the arithmetic once a leading zero made it look octal); use '%d' instead if padding is undesired. The array is zero-indexed, hence ${months[i-1]} for month $i.
As an aside, you should basically never use ls in scripts.
The explicit loop could be avoided if you had a command which already knows how to rename files, but this implements that command. If you want to save it in a file, replace the hard-coded ./* with "$#", add a #!/bin/bash shebang up top, save it as monthrenamer somewhere in your PATH, and chmod a+x monthrenamer. Then you can run it like
monthrenamer ./*
to rename all the files in the current directory without an explicit loop, or a more restricted wildcard argument to only select a smaller number of files, like
monthrenamer /path/to/files/2020*.pdf
You could run date twelve times to populate the array, but it's not like hard-coding the month names is going to be a problem. We don't expect them to change (and calling twelve subprocesses at startup just to avoid that seems quite excessive in this context).
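If you did want to generate the names with date instead of hard-coding them, a sketch (GNU date assumed; LC_ALL=C pins the English names regardless of locale):

```shell
months=()
for m in {1..12}; do
    months+=( "$(LC_ALL=C date -d "$m/01/2000" +%B)" )  # month m of an arbitrary year
done
printf '%s\n' "${months[@]}"
```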
As an aside, probably try to fix the process which creates these files to produce machine-readable file names. It's fairly obvious to a human, too, that 2021-07 refers to the month of July, whereas going the other way is always cumbersome (you will need to work around it in every tool or piece of code which wants to order the files by name).
Assuming you have the GNU version of date(1), you could use date -d to map the month names to numbers:
for f in *.pdf; do
IFS=- read y m d <<<"${f%.pdf}"
mv "$f" "$(date -d "$m $d, $y" +%F.pdf)"
done
I doubt it's any more efficient than your sed -e 's/January/01/' -e 's/February/02/' etc, but it does feel less tedious to type. :)
Explanation:
Loop over the .pdf files, setting f to each filename in turn.
The read line is best explained right to left:
a. "${f%.pdf}" expands to the filename without the .pdf part, e.g. "2020-August-15".
b. <<< turns that value into a here-string, which is a mechanism for feeding a string as standard input to some command. Essentially, x <<<y does the same thing as echo y | x, with the important difference that the x command is run in the current shell instead of a subshell, so it can have side effects like setting variables.
c. read is a shell builtin that by default reads a single line of input and assigns it to one or more shell variables.
d. IFS is a parameter that tells the shell how to split lines up into words. Here we're setting it – only for the duration of the read command – to -. That tells read to split the line it reads on hyphens instead of whitespace; IFS=- read y m d <<<"2020-August-15" assigns "2020" to y, "August" to m, and "15" to d.
e. The GNU version of date(1) has a -d parameter that tells it to display another date instead of the current one. It accepts a number of different formats itself, sadly not including "yyyy-Mon-dd", which is why I had to split the filename up with read. But it does accept "Mon dd, yyyy", so that's what I pass to it. +%F.pdf tells it that when it prints the date back out it should do so ISO-style as "yyyy-mm-dd", and append ".pdf" to the result. ("%F" is short for "%Y-%m-%d"; I could also have used -I instead of +anything and moved the .pdf outside the command expansion.)
f. The call to date is wrapped in $(...) to capture its output, and that result is used as the second parameter to mv to rename the files.
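The mapping step on its own, for one name (GNU date assumed; LC_ALL=C so the English month name parses in any locale):

```shell
LC_ALL=C date -d "August 15, 2020" +%F.pdf
# 2020-08-15.pdf
```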
Another way with POSIX shell:
# Iterate over pattern that will exclude already renamed pdf files
for file in [0-9][0-9][0-9][0-9]-[^0-9]*.pdf
do
# Remove echo if result match expectations
echo mv -- "$file" "$(
# Set field separator to - or . to split filename components
IFS=-.
# Transfer filename components into arguments using IFS
set -- $file
# Format numeric date string
date --date "$3 $2 $1" '+%Y-%m-%d.pdf'
)"
done
If you are using GNU utilities and the Perl version of rename (not the util-linux version), you can build a one-liner quite easily:
rename "$(
seq -w 1 12 |
LC_ALL=C xargs -I# date -d 1970-#-01 +'s/^(\d{4}-)%B(-\d{2}\.pdf)$/$1%m$2/;'
)" *.pdf
You can shorten it if you don't care about safety (or legibility)... :-)
rename "$(seq -f%.f/1 12|date -f- +'s/%B/%m/;')" *.pdf
What I mean is I would prefer, and was trying to, pipe ls into a command combination to convert as specified above.
Well, you may need to implement that command combination then. Here's one consisting of a single "command", in pure Bash without external processes. Pipe your ls output into it and, once satisfied with the output, remove the final echo…
#!/bin/bash
declare -Ar MONTHS=(
[January]=01
[February]=02
[March]=03
[April]=04
[May]=05
[June]=06
[July]=07
[August]=08
[September]=09
[October]=10
[November]=11
[December]=12)
while IFS= read -r path; do
IFS=- read -ra segments <<<"$path"
segments[-2]="${MONTHS["${segments[-2]}"]}"
IFS=- new_path="${segments[*]}"
echo mv "$path" "$new_path"
done
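The per-line transformation in isolation (bash 4.3+ for the negative array index; MONTHS shortened here for the demo):

```shell
declare -A MONTHS=([August]=08 [October]=10)
path=2020-August-15.pdf
IFS=- read -ra segments <<<"$path"          # split on dashes
segments[-2]=${MONTHS[${segments[-2]}]}     # swap the month name for its number
( IFS=-; echo "${segments[*]}" )            # re-join with dashes
# 2020-08-15.pdf
```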
What is working for me in Mac OS 12.5 with GNU bash, version 3.2.57(1)-release (arm64-apple-darwin21)
is the following :
for f in *.pdf; do mv "$f" "$(echo "$f" | sed -e 's/Jan/-01-/gi' -e 's/Feb/-02-/gi' -e 's/Mar/-03-/gi' -e 's/Apr/-04-/gi' -e 's/May/-05-/gi' -e 's/jun/-06-/gi' -e 's/Jul/-07-/gi' -e 's/Aug/-08-/gi' -e 's/Sep/-09-/gi' -e 's/Oct/-10-/gi' -e 's/Nov/-11-/gi' -e 's/Dec/-12-/gi' )"; done
Note that in my case the original files had the month expressed in three letters:
./04351XXX73435-2021Mar08-2021Apr08.pdf
I am new to bash, and to stackoverflow, so please excuse me if my question is missing some elements.
I am looking for a way where I can read the files in the directory, and if the file name is in a specific format, set it to a certain variable to be processed.
For example: in /test, I have multiple files with the format: number_date_typeOfFile.fileType.
1_210720_TypeOne.txt
1_210721_TypeOne.txt
1_210722_TypeOne.txt
1_210720_TypeTwo.txt
1_210721_TypeTwo.txt
1_210722_TypeTwo.txt
+ other files
They are not going to be .txt files, its just as an example. There will be other files so I need a way that reads the front number, and the type correctly, with the varying dates.
The result I want is to set those 6 files into variables:
TypeOneA = 1_210720_TypeOne.txt
TypeOneB = 1_210721_TypeOne.txt
TypeOneC = 1_210722_TypeOne.txt
TypeTwoA = 1_210720_TypeTwo.txt
TypeTwoB = 1_210721_TypeTwo.txt
etc.
Other questions that are already answered seems to just read all the files in the directory and echo the name.
Thanks in advance.
Extended globbing can really help refine file sets.
If you set
shopt -s extglob
then you can match exactly that pattern with
[0-9]_[0-9][0-9][0-9][0-9][0-9][0-9]_Type+(One|Two).txt
or if you don't feel the need to specify exactly six digits in that "date" zone,
[0-9]_+([0-9])_Type+(One|Two).txt
This means you can collect them into an array safely with something like
shopt -s extglob
lst=( [0-9]_[0-9][0-9][0-9][0-9][0-9][0-9]_Type+(One|Two).txt )
The above will put them all into a single array. If you need separate arrays by type -
type1=( [0-9]_[0-9][0-9][0-9][0-9][0-9][0-9]_TypeOne.txt )
type2=( [0-9]_[0-9][0-9][0-9][0-9][0-9][0-9]_TypeTwo.txt )
Note that these don't even require extended globbing.
You can then access them like any array.
echo "${type1[5]}"
for t2 in "${type2[@]}"; do echo "$t2"; done
There are a lot of ways to tweak this.
$: shopt -s extglob
$: lst=( [0-9]_+([0-9])_*Type+(One|Two)*.txt )
Now it's one list with a more varied set of filenames.
$: cat <<< "${lst[@]}"
1_210720_TypeOne.txt 1_210720_TypeOne_210720123.txt 1_210720_TypeTwo.txt
1_210721_TypeOne.txt 1_210721_TypeOne_210721456.txt 1_210721_TypeTwo.txt
1_210722_TypeOne.txt 1_210722_TypeOne_210731789.txt 1_210722_TypeTwo.txt
1_210723_RANDOMWORD_TypeOne_210723951.txt
$: printf "%s\n" "${lst[@]}" | grep TypeOne
1_210720_TypeOne.txt
1_210720_TypeOne_210720123.txt
1_210721_TypeOne.txt
1_210721_TypeOne_210721456.txt
1_210722_TypeOne.txt
1_210722_TypeOne_210731789.txt
1_210723_RANDOMWORD_TypeOne_210723951.txt
$: printf "%s\n" "${lst[@]}" | grep TypeTwo
1_210720_TypeTwo.txt
1_210721_TypeTwo.txt
1_210722_TypeTwo.txt
$: printf "%s\n" "${lst[@]}" | grep '[^0-9]_TypeOne'
1_210723_RANDOMWORD_TypeOne_210723951.txt
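If you'd rather skip the grep subprocesses, the same filtering can be done with shell pattern matches while building the arrays — a sketch with a few hard-coded sample names standing in for the glob result:

```shell
lst=( 1_210720_TypeOne.txt 1_210720_TypeTwo.txt 1_210723_RANDOMWORD_TypeOne_210723951.txt )
type1=() type2=()
for f in "${lst[@]}"; do
    if [[ $f == *TypeOne* ]]; then type1+=( "$f" )
    elif [[ $f == *TypeTwo* ]]; then type2+=( "$f" )
    fi
done
printf '%s\n' "${type1[@]}"
```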
This may be a very specific case, but I know very little about bash and I need to remove "duplicate" files. I've been downloading totally legal videogame roms these past few days, and I noticed that a lot of packs have many different versions of the same game, like this:
Awesome Golf (1991).lnx
Awesome Golf (1991) [b1].lnx
Baseball Heroes (1991).lnx
Baseball Heroes (1991) [b1].lnx
Basketbrawl (1992).lnx
Basketbrawl (1992) [a1].lnx
Basketbrawl (1992) [b1].lnx
Batman Returns (1992).lnx
Batman Returns (1992) [b1].lnx
How can I make a bash script that removes the duplicates? A duplicate would be any file that has the same name, and the name would be the string before the first parenthesis. The script should parse all the files and grab their names, see which names match to detect duplicates, and remove all files except the first one (first being the first that comes up in alphabetical order).
Would you please try the following:
#!/bin/bash
dir="dir" # the directory where the rom files are located
declare -A seen # associative array to detect the duplicates
while IFS= read -r -d "" f; do # loop over filenames by assigning "f" to it
name=${f%(*} # extract the "name" by removing left paren and following characters
name=${name%.*} # remove the extension considering the case the filename doesn't have parens
name=${name%[*} # remove the left square bracket and following characters considering the case as above
name=${name%% } # remove the single trailing space left before the paren, if any
if (( seen[$name]++ )); then # if the name duplicates...
# remove "echo" if the output looks good
echo rm -- "$f" # then remove the file
fi
done < <(find "$dir" -type f -name "*.lnx" -print0 | sort -z -t "." -k1,1)
# sort the list of filenames in alphabetical order
Please modify the first dir= line to your directory path which holds the rom files.
The echo command just prints the filenames to be removed as a rehearsal. If the output looks good, then remove echo and execute the real one.
[Explanation]
An associative array seen associates the extracted "name" with a
counter of appearance. If the counter is non-zero, the file is a duplicated
one and can be removed (as long as the files are properly sorted).
The -print0 option to find, the -z option to sort and the -d ""
option to read make a null character as a delimiter of filenames to
accept filenames which contain special characters such as a whitespace,
tab, newline, etc.
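The null-delimited read loop in isolation, without find (here printf '%s\0' stands in for -print0):

```shell
# each filename is terminated by a NUL byte, so spaces and newlines pass through intact
printf '%s\0' 'Awesome Golf (1991).lnx' $'odd\nname.lnx' |
while IFS= read -r -d '' f; do
    printf '<%s>\n' "$f"
done
```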
I'm working with a hand-filled file and I am having issues parsing it.
My input file cannot be altered, and the language of my code can't change from bash script.
I made a simple example to make it easy for you ^^
var="hey","i'm","happy, like","you"
IFS="," read -r one two tree for five <<<"$var"
echo $one:$two:$tree:$for:$five
Now I think you already saw the problem here. I would like to get
hey:i'm:happy, like:you:
but I get
hey:i'm:happy: like:you
I need a way to tell read that the quotes are more important than the IFS. I have read about the eval command, but I can't take that risk.
Finally, this is a directory file and the troublesome field is the description one, so it could have basically anything in it.
original file looking like that
"type","cn","uid","gid","gecos","description","timestamp","disabled"
"type","cn","uid","gid","gecos","description","timestamp","disabled"
"type","cn","uid","gid","gecos","description","timestamp","disabled"
Edit #1
I will give a better example; the one I used above is too simple and @StefanHegny found it causes another error.
while read -r ldapLine
do
IFS=',' read -r objectClass dumy1 uidNumber gidNumber username description modifyTimestamp nsAccountLock gecos homeDirectory loginShell createTimestamp dumy2 <<<"$ldapLine"
isANetuser=0
while IFS=":" read -r -a class
do
for i in "${class[@]}"
do
if [ "$i" == "account" ]
then
isANetuser=1
break
fi
done
done <<< "$objectClass"
if [ $isANetuser == 0 ]
then
continue
fi
#MORE STUFF APPEND#
done < file.csv
So this is a small part of the code but it should explain what I do. The file.csv is a lot of lines like this:
"top:shadowAccount:account:posixAccount","Jdupon","12345","6789","Jdupon","Jean Mark, Dupon","20140511083750Z","","Jean Mark, Dupon","/home/user/Jdupon","/bin/ksh","20120512083750Z","",""
If the various bash versions you will use are all more recent than v3.0, when regexes and BASH_REMATCH were introduced, you could use something like the following function: [Note 1]
each_field () {
local v=,$1;
while [[ $v =~ ^,(([^\",]*)|\"[^\"]*\") ]]; do
printf "%s\n" "${BASH_REMATCH[2]:-${BASH_REMATCH[1]:1:-1}}";
v=${v:${#BASH_REMATCH[0]}};
done
}
Its argument is a single line (remember to quote it!), and it prints each comma-separated field on a separate line. As written, it assumes that no field has an enclosed newline; that's legal in CSV, but it makes dividing the file into lines a lot more complicated. If you actually needed to deal with that scenario, you could change the \n in the printf statement to a \0 and then use something like xargs -0 to process the output. (Or you could insert whatever processing you need to do to the field in place of the printf statement.)
It goes to some trouble to dequote quoted fields without modifying unquoted fields. However, it will fail on fields with embedded double quotes. That's fixable, if necessary. [Note 2]
Here's a sample, in case that wasn't obvious:
while IFS= read -r line; do
each_field "$line"
printf "%s\n" "-----"
done <<EOF
type,cn,uid,gid,gecos,"description",timestamp,disabled
"top:shadowAccount:account:posixAccount","Jdupon","12345","6789","Jdupon","Jean Mark, Dupon","20140511083750Z","","Jean Mark, Dupon","/home/user/Jdupon","/bin/ksh","20120512083750Z","",""
EOF
Output:
type
cn
uid
gid
gecos
description
timestamp
disabled
-----
top:shadowAccount:account:posixAccount
Jdupon
12345
6789
Jdupon
Jean Mark, Dupon
20140511083750Z
Jean Mark, Dupon
/home/user/Jdupon
/bin/ksh
20120512083750Z
-----
Notes:
I'm not saying you should use this function. You should use a CSV parser, or a language which includes a good CSV parsing library, like python. But I believe this bash function will work, albeit slowly, on correctly-formatted CSV files of a certain common CSV dialect.
Here's a version which handles doubled quotes inside quoted fields, which is the classic CSV syntax for interior quotes:
each_field () {
local v=,$1;
while [[ $v =~ ^,(([^\",]*)|\"(([^\"]|\"\")*)\") ]]; do
echo "${BASH_REMATCH[2]:-${BASH_REMATCH[3]//\"\"/\"}}";
v=${v:${#BASH_REMATCH[0]}};
done
}
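For instance, with a doubled interior quote (repeating the definition so the snippet stands alone):

```shell
each_field () {
  local v=,$1
  # each iteration peels off one leading ,field (bare, or quoted with "" for interior quotes)
  while [[ $v =~ ^,(([^\",]*)|\"(([^\"]|\"\")*)\") ]]; do
    echo "${BASH_REMATCH[2]:-${BASH_REMATCH[3]//\"\"/\"}}"   # dequote and un-double
    v=${v:${#BASH_REMATCH[0]}}
  done
}
each_field 'a,"he said ""hi""",c'
# a
# he said "hi"
# c
```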
My suggestion, as in some previous answers (see below), is to switch the separator to | (and use IFS="|" instead):
sed -r 's/,([^,"]*|"[^"]*")/|\1/g'
This requires a sed that has extended regular expressions (-r) however.
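Applied to a line with literal quotes, as in the actual file (GNU sed for -r; use sed -E on BSD/macOS). Note the surrounding quotes stay on each field:

```shell
var='"hey","i'\''m","happy, like","you"'
line=$(printf '%s\n' "$var" | sed -r 's/,([^,"]*|"[^"]*")/|\1/g')  # commas between fields become |
IFS='|' read -r one two three four <<<"$line"
echo "$three"
# "happy, like"
```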