Renaming a large number of files - bash

I need a script that will rename a large number of files. I have a folder with a lot of files, each named by an ID. Then I have a CSV file like this:
oldID;newID
oldID;newID
etc...
Every old and new ID is unique. I'd like to ask what the best way to do this would be, or for a little help in bash/batch.

The solution for batch is very similar to e0k's solution for bash; you read the file one line at a time, split the line on semicolons, and rename the file accordingly.
for /f "tokens=1,2 delims=;" %%A in (ids.csv) do ren "%%A" "%%B"
This assumes that your IDs are in a file called ids.csv.
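Note that the doubled %%A and %%B are for use inside a batch file; if you type the command directly at the prompt, use single %A and %B instead.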

If you are using bash (the shell used in the world of Linux, UNIX, etc.), you can use the following short script based on this internal field separator answer. This assumes that you are using a semicolon (;) as the delimiter of your "CSV" file and that there is only one such delimiter per line.
#!/bin/bash
while IFS=';' read -ra names; do
    mv "${names[0]}" "${names[1]}"
done < translation.csv
where translation.csv is your file containing the name translations with an oldname;newname format.
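If the CSV was saved on Windows, each line may end in a carriage return, which would silently become part of the new file name. A minimal defensive variant (a sketch, assuming the same translation.csv format as above) strips the CR and previews the renames first:
#!/bin/bash
# Dry run: strip any trailing carriage return, then print the mv commands.
while IFS=';' read -r old new; do
    new=${new%$'\r'}           # drop a CR left over from Windows line endings
    echo mv -- "$old" "$new"   # remove "echo" once the output looks right
done < translation.csv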
If you are instead asking for a batch file (i.e. for Windows, DOS, etc.) then that is a different animal in a different world.

Given that your OS is some Unix (like Linux), and given that the use of CSV files has been your own choice, there might be an easier way to go: mmv can rename many files in one go, using patterns to match the original files and allowing the matched strings to be used in the target file names. See http://ss64.com/bash/mmv.html.
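Note that mmv helps when the renames follow a pattern rather than an arbitrary ID mapping like the CSV above. For illustration (the patterns here are made up), #1 in the target name refers back to whatever the first wildcard matched:
# Rename every file matching "old*" to "new*", reusing the matched suffix (#1)
mmv 'old*' 'new#1'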

Related

Script to move all files starting with the same 7 characters into a folder named after those 7 characters

All files are in a directory (over 500,000 files), named in the following pattern:
AR00001_1
AR00001_2
AR00001_3
AR00002_1
AR00002_2
AR00002_3
I need a script (batch or Unix shell, either works) that takes everything starting with AR00001 and moves it into a new folder called AR00001, does the same for AR00002 files, and so on.
Here's what I've been trying to figure out so far:
for f in *_*; do
    DIR="$( echo ${f%.*} | tr '_' '/')"
    mkdir -p "./$DIR"
    mv "$f" "$DIR"
done
Thanks
// Update
I ran this in CMD:
for %F in (c:\test\*) do (md "d:\destination\%~nF"&move "%F" "d:\destination\%~nF\") >nul
It's almost what I wanted, except that it does not take the first 7 characters as a substring but instead creates a folder for each file :/ I'm trying to combine it with your solutions.
@echo off
setlocal enabledelayedexpansion
for %%a in (???????_*) do (
    set "x=%%a"
    set "x=!x:~0,7!"
    md "!x!" 2>nul
    move "!x!*" "!x!\" 2>nul
)
For every matching file:
- get the first 7 characters
- create a folder with that name (ignoring the error message if it already exists)
- move all files that start with those 7 characters (ignoring error messages if the files don't exist because they were already moved)
The following achieves the desired effect and checks for non-existence of the target directory each time before creating it.
@echo off
setlocal ENABLEDELAYEDEXPANSION
set "TOBASE=c:\target\"
set "MATCHFILESPEC=AR*"
for %%F in ("%MATCHFILESPEC%") do (
    set "FILENAME=%%~nF"
    set "TOFOLDER=%TOBASE%!FILENAME:~0,7!"
    if not exist "!TOFOLDER!\" md "!TOFOLDER!"
    move "%%F" "!TOFOLDER!" >nul
)
endlocal
In the move command, by moving only the current file rather than including a wildcard, we ensure that we're not eating up file names that might be about to appear the next time around the loop. Keeping it simple, assuming that efficiency is not of prime importance.
I'd recommend prototyping by creating batch files (with a .bat or .cmd extension) rather than trying to do complex tasks interactively with one-liners. The behaviour can differ, and there are more things you can do in a batch file, such as using setlocal to turn on delayed expansion of variables. It's also just a pain writing for loops with %F interactively, only to have to remember to convert all of those to %%F, %%~nF, etc. when pasting into a batch file for posterity.
One word of caution: with 500,000 files in the folder, and all of the files having very similar prefixes, if your file system has 8.3 directory naming turned on (which is often the default) it is possible to run into problems using wildcards. This happens as the 8.3 namespace gets more and more busy and there are fewer and fewer options for ways the file name can be encoded in 8 characters. (The hash table fills up and starts overflowing into unexpected file names).
One solution is to turn that feature off on the server, but that may have severe implications for legacy applications. To see what a file looks like in the 8.3 naming scheme, you can do, e.g.:
dir /x /p AR*
... which might give you something like (where the left hand name is the one converted to 8.3):
ARB900~1.TST AR15467_RW322.tst
AR85E3~1.TST AR15468_RW322.tst
ARDDFE~1.TST AR15469_RW322.tst
AR1547~1.TST AR15470_RW322.tst
AR1547~2.TST AR15471_RW322.tst
...
In this example, since the first two characters seem to be maintained, there should be no conflict.
So for example if I say for %a in (AR8*) do @echo %a I get what might at first seem to be incorrect:
AR15468_RW322.tst
AR18565_RW322.tst
AR20376_RW322.tst
AR14569_RW322.tst
AR17278_RW322.tst
...
But this is actually correct; it is all the files that match AR8* in both the long file name and short file name formats.
Edit: I am aware in retrospect that this solution looks very similar to Stephan's, and I had browsed through the existing answers before starting work on my own, so I should credit him. I will try and save face by pointing out a benefit of Stephan's solution. Its use of wildcards should circumvent any 8.3 naming issue: by specifying the wildcard as ???????_*, it only catches the long file names and won't match any of the converted 8.3 file names (all of which are devoid of underscores in that position). Similarly, a wildcard such as AR?????_* would do the same.
With bash, you'd write:
for f in *; do
    [[ -d $f ]] && continue   # skip existing directories
    prefix=${f:0:7}           # substring of first 7 characters
    mkdir -p "$prefix"        # create the directory if it does not exist
    mv "$f" "$prefix"         # and move the file
done
For the substring expansion, see https://www.gnu.org/software/bash/manual/bash.html#Shell-Parameter-Expansion -- this is probably the bit you're missing.
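For example, substring expansion on one of the names above behaves like this:
$ f=AR00001_2
$ echo "${f:0:7}"
AR00001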

Copying files with specific extension from a list (text file) of directories

I have a text file with a list of directories, and I want to copy the *.xlsx files from each of them to another directory.
This is how the text file (list.txt) is arranged:
PT_NAK01, PT_NAK04, PT_NAK05, PT_JAR03
What I have so far:
@echo off
set "main_folder=\\internal.company.com\project folder"
set "my_folder=C:\_M__\files"
for /f "tokens=*" %%i in (list.txt) do (
    xcopy "%main_folder%\%%i" "%my_folder%"
)
So the folders that I want to look into would be \\internal.company.com\project folder\PT_NAK01 etc.
What I don't know is how to pass the specific extension *.xlsx to this command.
Note: I haven't used /S switch with xcopy deliberately because I do not want the files in the sub-directories.
P.S. Solutions in powershell or cygwin work for me as well.
This is a Cygwin shell answer (bash is an advanced shell that should be reserved for when the standard POSIX shell (/bin/sh) is insufficient). Note that the slashes are reversed intentionally.
I see the format in your list.txt is delimited with commas and whitespace. I am going to assume that this is literal and the reason none of what you've tried so far works. Therefore, I am parsing it with the explicit assumption that comma and then space (, ) is a delimiter and that there is no way to escape them (e.g. if you have a file named apples, oranges.txt then my code would erroneously parse files named apples and oranges.txt).
#!/bin/sh
main_folder="${1:-//internal.company.com/project folder}"
my_folder="${2:-c:/_Masoud/files}"
cd "$main_folder" || exit $?
IFS=', '   # set here so it applies when the shell splits $(cat list.txt) below
find $(cat list.txt) -maxdepth 1 -name \*.xlsx | while IFS= read -r xlsx; do
    mkdir -p "$my_folder/${xlsx%/*}"
    cp -a "$xlsx" "$my_folder/$xlsx"
done
I've done some extra work for you to make this more abstract. $main_folder is taken from your first argument (a missing argument will default to //internal.company.com/project folder) and $my_folder is taken from your second argument (if missing, it defaults to c:/_Masoud/files). Don't forget to quote your command-line arguments if they contain spaces or interpretable characters.
After determining your source and destination, I then try to change directories to the source directory. If this fails, the script will stop with the same exit code.
Now for the loop. I've set the input field separator ($IFS) to the comma and space (, ) we talked about earlier (each character in $IFS acts as a delimiter, and it must be set on its own line before the expansion of $(cat list.txt), not as a prefix to find, or the shell would split with the old value) and then glued the contents of list.txt into find's arguments, followed by the requirement of being at most one level deep (to include PT_NAK05/foobar/baz.xlsx, use -maxdepth 2, or remove that clause altogether to search the file tree recursively), followed by the requirement of a name matching *.xlsx (this is escaped because your shell would otherwise expand it against the local directory). The output of this is then read into a loop line by line as $xlsx. We recreate the file's parent directory under the target destination if it's not already present, then copy the file to that location. cp -a preserves permissions and time stamps.
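Saved as, say, copy_xlsx.sh (the script name here is illustrative), it might be invoked like this:
sh copy_xlsx.sh "//internal.company.com/project folder" "c:/_Masoud/files"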
One thing that caused an error in my example was how I had set up the text file with the folder names. It should have one folder name per line instead of comma-separated entries:
PT_NAK01
PT_NAK04
PT_NAK05
etc.
With that, this batch-file (in reference to MatSnow's and shellter's comments) works fine for the purpose of the question.
@echo off
set "main_folder=\\internal.company.com\project folder"
set "my_folder=C:\_M__\files"
for /f "tokens=*" %%i in (list.txt) do (
    xcopy "%main_folder%\%%i\*.xlsx" "%my_folder%"
)
Note: If you type this directly at the command line, you use a single % instead of the doubled %% for the loop variable.

Windows delete with wildcards deleting erratically

This is driving me crazy. Basically, I have a program that outputs tables to a flat file for multiple databases with the same structure. These files get named in the format tablename_####.dat, where #### is the 4 digit company number. After these are all created, the program then combines all of the files by tablename, and adds a timestamp on the end. So, the final file name is in the format tablename_YYYYMMDD_HHmmSS.dat. Finally, I want to delete all of the individual .dat files, leaving only the combined, time stamped files.
This works just fine for all of the tables, except for the table VEX. For example, I have files:
VEX_1234.dat
VEX_5678.dat
VEX_0987.dat
which combine to form VEX_20150414_144352.dat. After this, I run the command:
del *_????.dat
This deletes all of the tables' individual files (V_1234.dat, PAT_9534.dat, etc.), while leaving the combined files (V_20150414_142311.dat, PAT_20150413_132113.dat) ...except for VEX. It deletes both the individual files and the combined file. Shouldn't this only delete files that end with an underscore, 4 characters, and ".dat"?
I know this has to be something really simple that I'm missing. What is going on?
Most likely your issue is caused by short 8.3 file names.
The ? wildcard can match 0 or 1 character if it precedes a dot. Your file mask of *_????.dat will match any name that has any number of characters, followed by a _, followed by 0 to 4 characters, followed by the .dat extension. The tricky thing is it will attempt to match both the long file name, and any short 8.3 name, if it exists.
Try issuing dir /x *.dat, and look at the short name of the problem file. I suspect it will match your file mask.
There are patterns with how short names are derived, but there is no way to predict the short name of any given file unless you are aware of all existing short names within the folder, and then you would be relying on undocumented behavior.
This is a fairly common problem. If your files are on an NTFS drive and you have admin rights, then you can disable short file name generation. But this does not remove already existing short names.
The best general solution is to pipe DIR /B through FINDSTR to filter the file names (DIR /B lists only the long names, and DEL then receives an exact name rather than a wildcard, so 8.3 aliases can no longer match), and process the result with FOR /F to delete each file individually. The FINDSTR /V below excludes file names that contain two or more _ characters, leaving only the individual files to be deleted.
for /f "delims=" %%F in ('dir /b *.dat^|findstr /v "_.*_"') do del "%%F"
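For what it's worth, under a Unix-like shell (Cygwin, Git Bash, WSL) the same cleanup can be written so that only the exact tablename_####.dat shape matches, sidestepping 8.3 names entirely; a sketch, assuming the files are in the current directory:
# Preview first: list exactly what the glob matches.
ls -- *_[0-9][0-9][0-9][0-9].dat
# Then delete. Combined NAME_YYYYMMDD_HHMMSS.dat files never match,
# because they end in six digits after the last underscore, not four.
rm -- *_[0-9][0-9][0-9][0-9].dat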

renaming files in windows...perhaps dos command prompt (For)

This kind of question has been asked a few times before on here and I have tried to use the answers in previous posts for my problem but I'm still struggling.
I have a directory with hundreds of files along the lines of
ab00123456.stp
ab00123457.stp
ab00123458.stp
...and so on
I would like to rename all of these by adding text before and after each file name.
So the end result would be...
CDE_AB00123456_A.stp
CDE_AB00123457_A.stp
CDE_AB00123458_A.stp
...and so on
(Note the upper and lowercase text change also......as if this wasn't difficult enough already!)
Any clues would be much appreciated.....along the lines of some DOS command perhaps....
Andy
for /? is extremely helpful. In particular, it contains the following substitutions:
%~nI - expands %I to a file name only
%~xI - expands %I to a file extension only
Thus, you create a for loop that iterates through your files with iteration variable %I and renames %I to CDE_%~nI_A%~xI.
Ready-to-use example:
for %i in (*) DO echo rename %i CDE_%~ni_A%~xi
Try this in a directory of your choice, fine-tune it and remove the echo once you are satisfied.
Note that translation to upper-case is much harder, but since Windows is not case sensitive anyway, I'd just double-check if this is really required.
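If a bash shell happens to be available (Cygwin, Git Bash, WSL), the upper-casing is easy too; a sketch assuming the files sit in the current directory:
# ${base^^} upper-cases the expansion (requires bash 4+)
for f in ab*.stp; do
    base=${f%.stp}
    mv -- "$f" "CDE_${base^^}_A.stp"
done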
You could write a batch script to do this, but if you don't know how to script, there are hundreds of free file-renaming tools.
Here is a list of some:
http://listoffreeware.com/list-of-best-free-file-rename-software/

concatenating .txt files into a csv file with a tab delimiter

I am trying to concatenate a set of .txt files into a CSV file, using the Windows command line.
So I use
type *.txt > me_new_file.csv
but the fields of a given row, which are tab delimited, end up in one column. How do I take advantage of the tab separation in the original text files to create a CSV file in which the fields are aligned in columns correctly, using one or more command lines? I am thinking there might be something like...
type *.txt > me_new_file.csv delim= ' '
but haven't been able to find anything yet.
Thank you for your help. I would also appreciate it if someone could direct me to a related answer.
From the command line you'd have a fairly complicated time of it. The Windows cmd.exe command processor is much, much simpler than dash, ash, or bash, et al.
The best thing would be to concatenate all of your files into the .csv file, open it in a text editor, and do a global find-and-replace, replacing each tab character with a comma (,).
Be careful that your other data doesn't have any commas in it.
If the source files are tab delimited, then the output file is also tab delimited. Depending on the software you are using, you should be able to load the tab-delimited data properly.
Suppose you are using Excel. If the output file has a .csv extension, then Excel will default to comma delimited columns when it opens the file. Of course that does not work for you. But if you rename the file to have some other extension like .txt, then when you open it with Excel, it will open a series of dialog boxes where you can specify the format, including tab delimited.
If you want to keep the .csv extension and have Excel automatically open it properly, then you need to transform the data. This can be done very easily with JREPL.BAT - a hybrid JScript/batch utility that performs a regular expression search and replace on text data. JREPL.BAT is pure script that runs natively on any Windows machine from XP onward.
The following encloses each value in quotes, just in case a value contains a comma literal.
type *.txt 2>nul | jrepl "\t" "\q,\q" /x /jendln "$txt='\x22'+$txt+'\x22'" /o output.csv
Beware: Your use of type *.txt will fail if the last line in any of your source .txt files does not end with a newline. In such a case, the first line of the next file will be appended to the last line of the previous file. Not good.
You can solve that problem by processing each file individually in a FOR loop.
(for %F in (*.txt) do jrepl "\t" "\q,\q" /x /jendln "$txt='\x22'+$txt+'\x22'" /f "%F") >output.csv
The above is designed to run on the command line. If used in a batch script, then a few changes are needed:
(for %%F in (*.txt) do call jrepl "\t" "\q,\q" /x /jendln "$txt='\x22'+$txt+'\x22'" /f "%%F") >output.csv
Note: My answer assumes none of the source files contain quotes. If they do contain quotes, then a more complicated search and replace is required. But it still can be done efficiently with JREPL.
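If a Unix-like shell happens to be available (Cygwin, WSL), the same transformation is a one-line awk sketch, under the same assumption that no field contains a quote:
# Wrap each tab-separated field in quotes and join the fields with commas
awk 'BEGIN { FS="\t"; OFS="\",\"" } { $1=$1; print "\"" $0 "\"" }' *.txt > output.csv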
