This is driving me crazy. Basically, I have a program that outputs tables to a flat file for multiple databases with the same structure. These files get named in the format tablename_####.dat, where #### is the 4 digit company number. After these are all created, the program then combines all of the files by tablename, and adds a timestamp on the end. So, the final file name is in the format tablename_YYYYMMDD_HHmmSS.dat. Finally, I want to delete all of the individual .dat files, leaving only the combined, time stamped files.
This works just fine for all of the tables, except for the table VEX. For example, I have files:
VEX_1234.dat
VEX_5678.dat
VEX_0987.dat
which combine to form VEX_20150414_144352.dat. After this, I run the command:
`del *_????.dat`
This deletes all of the tables' individual files (V_1234.dat, PAT_9534.dat, etc.), while leaving the combined files (V_20150414_142311.dat, PAT_20150413_132113.dat) ...except for VEX. It deletes both the individual files and the combined file. Shouldn't this only delete files that end with an underscore, 4 characters, and ".dat"?
I know this has to be something really simple that I'm missing. What is going on?
Most likely your issue is caused by short 8.3 file names.
The ? wildcard can match 0 or 1 character if it precedes a dot. Your file mask of *_????.dat will match any name that has any number of characters, followed by a _, followed by 0 to 4 characters, followed by the .dat extension. The tricky thing is it will attempt to match both the long file name, and any short 8.3 name, if it exists.
Try issuing dir /x *.dat, and look at the short name of the problem file. I suspect it will match your file mask.
There are patterns with how short names are derived, but there is no way to predict the short name of any given file unless you are aware of all existing short names within the folder, and then you would be relying on undocumented behavior.
This is a fairly common problem. If your files are on an NTFS drive and you have admin rights, then you can disable short file name generation. But this does not remove already existing short names.
The best general solution is to pipe DIR /B through FINDSTR to remove the unwanted files, and process the result with FOR /F to delete each file individually. The FINDSTR below will exclude file names that contain two or more _ characters.
for /f "delims=" %%F in ('dir /b *.dat^|findstr /v "_.*_"') do del "%%F"
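The same list, filter, then delete-one-by-one idea can be sketched in bash for anyone running the cleanup from a Unix-like shell. This is only an illustration: the function name purge_company_files is hypothetical, and it assumes table names contain no underscore and company numbers are exactly four digits. Instead of relying on a wildcard, it checks each name against an anchored regular expression.

```bash
#!/usr/bin/env bash
# Sketch: delete only files named <table>_<4 digits>.dat in a directory.
# purge_company_files is a hypothetical helper, not part of any tool.
# Assumes table names contain no underscore.
purge_company_files() {
    local dir=${1:-.} f
    local re='^[^_]+_[0-9]{4}\.dat$'      # anchored: one underscore, 4 digits, .dat
    for f in "$dir"/*.dat; do
        [[ -f $f ]] || continue           # skip when the glob matched nothing
        if [[ ${f##*/} =~ $re ]]; then    # test the bare file name only
            rm -- "$f"
        fi
    done
}
```

Calling purge_company_files . in the output folder should leave the time-stamped files alone, because their names contain two underscores and so fail the anchored pattern.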
All files are in a directory (over 500 000 files), named in the following pattern
AR00001_1
AR00001_2
AR00001_3
AR00002_1
AR00002_2
AR00002_3
I need a script, either batch or Unix shell, that takes every file starting with AR00001 and moves it into a new folder called AR00001, then does the same for the AR00002 files, and so on.
Here's what I've been trying to figure out until now
for f in *_*; do
DIR="$( echo ${f%.*} | tr '_' '/')"
mkdir -p "./$DIR"
mv "$f" "$DIR"
done
Thanks
// Update
Ran this in the CMD
for %F in (c:\test\*) do (md "d:\destination\%~nF"&move "%F" "d:\destination\%~nF\") >nul
Seems to be almost what I wanted, except that it does not take the first 7 characters as a substring but instead creates a folder for each file :/ I'm trying to mix it with your solutions
@echo off
setlocal enabledelayedexpansion
for %%a in (???????_*) do (
set "x=%%a"
set "x=!x:~0,7!"
md "!x!" 2>nul
move "!x!*" "!x!\" 2>nul
)
For every matching file:
- get the first 7 characters
- create a folder with that name (ignore the error message if it already exists)
- move all files that start with those 7 characters (ignore the error messages if the files don't exist because they were already moved)
The following achieves the desired effect and checks for non-existence of the target directory each time before creating it.
@echo off
setlocal ENABLEDELAYEDEXPANSION
set "TOBASE=c:\target\"
set "MATCHFILESPEC=AR*"
for %%F in ("%MATCHFILESPEC%") do (
set "FILENAME=%%~nF"
set "TOFOLDER=%TOBASE%!FILENAME:~0,7!"
if not exist "!TOFOLDER!\" md "!TOFOLDER!"
move "%%F" "!TOFOLDER!" >nul
)
endlocal
In the move command, by moving only the current file rather than including a wildcard, we ensure that we're not eating up file names that might be about to appear the next time around the loop. Keeping it simple, assuming that efficiency is not of prime importance.
I'd recommend prototyping by creating batch files (with a .bat or .cmd extension) rather than trying to do complex tasks interactively using one-liners. The behaviour can be different and there are more things you can do in a batch file, such as using setlocal to turn on delayed expansion of variables. It's also just a pain writing for loops using %F interactively, only to have to remember to convert all of those to %%F, %%~nF, etc. when pasting into a batch file for posterity.
One word of caution: with 500,000 files in the folder, and all of the files having very similar prefixes, if your file system has 8.3 directory naming turned on (which is often the default) it is possible to run into problems using wildcards. This happens as the 8.3 namespace gets more and more busy and there are fewer and fewer options for ways the file name can be encoded in 8 characters. (The hash table fills up and starts overflowing into unexpected file names).
One solution is to turn that feature off on the server but that may have severe implications for any legacy applications. To see what the file looks like in 8.3 naming scheme, you can do, e.g.:
dir /x /p AR*
... which might give you something like (where the left hand name is the one converted to 8.3):
ARB900~1.TST AR15467_RW322.tst
AR85E3~1.TST AR15468_RW322.tst
ARDDFE~1.TST AR15469_RW322.tst
AR1547~1.TST AR15470_RW322.tst
AR1547~2.TST AR15471_RW322.tst
...
In this example, since the first two characters seem to be maintained, there should be no conflict.
So for example if I say for %a in (AR8*) do @echo %a I get what might at first seem to be incorrect:
AR15468_RW322.tst
AR18565_RW322.tst
AR20376_RW322.tst
AR14569_RW322.tst
AR17278_RW322.tst
...
But this is actually correct; it is all the files that match AR8* in both the long file name and short file name formats.
Edit: I am aware in retrospect that this solution looks very similar to Stephan's, and I had browsed through the existing answers before starting work on my own, so I should credit him. I will try and save face by pointing out a benefit of Stephan's solution. Its use of wildcards should circumvent any 8.3 naming issue: by specifying the wildcard as ???????_*, it only catches the long file names and won't match any of the converted 8.3 file names (all of which are devoid of underscores in that position). Similarly, a wildcard such as AR?????_* would do the same.
With bash, you'd write:
for f in *; do
[[ -d $f ]] && continue # skip existing directories
prefix=${f:0:7} # substring of first 7 characters
mkdir -p "$prefix" # create the directory if it does not exist
mv "$f" "$prefix" # and move the file
done
For the substring expansion, see https://www.gnu.org/software/bash/manual/bash.html#Shell-Parameter-Expansion -- this is probably the bit you're missing.
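If the prefix length is not always 7 characters, one variation is to take everything before the first underscore instead. The sketch below assumes every file to be moved has an underscore in its name, as in the AR00001_1 examples; the function name group_by_prefix is made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: move each file into a folder named after the text before the
# first underscore. group_by_prefix is a hypothetical helper name.
group_by_prefix() {
    local dir=${1:-.} f name prefix
    for f in "$dir"/*_*; do
        [[ -f $f ]] || continue          # skip directories and unmatched globs
        name=${f##*/}                    # strip the leading path
        prefix=${name%%_*}               # everything before the first underscore
        [[ -n $prefix ]] || continue     # ignore names that start with "_"
        mkdir -p "$dir/$prefix"          # create the folder if it is missing
        mv -- "$f" "$dir/$prefix/"       # move only the current file
    done
}
```

Like the loop above, it moves one file per iteration, so with 500 000 files it will be slow but predictable.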
I have filenames that come with the date at the beginning of their names, e.g. 20171015….txt.
I currently use a batch file to strip off the first 8 characters and this has worked well. However I'd like to now keep the date but move it to the end of the file name.
With the file name without extension set as a variable, %Variable%:
Ren "%Variable%.ext" "%Variable:~8%%Variable:~,8%.ext"
Or, if performed within some sort of loop with delayed expansion enabled, i.e. SetLocal EnableDelayedExpansion:
Ren "!Variable!.ext" "!Variable:~8!!Variable:~,8!.ext"
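For comparison, the same substring idea can be sketched in bash. This is not a replacement for the batch commands above, just the equivalent slicing; the function name move_date_to_end is hypothetical, and it assumes the name has an extension and starts with an 8-character date.

```bash
#!/usr/bin/env bash
# Sketch: compute a new name with the leading 8-character date moved to
# the end of the stem. move_date_to_end is a hypothetical helper name.
# Assumes the input has an extension and at least 8 leading characters.
move_date_to_end() {
    local name=$1 stem ext
    stem=${name%.*}                      # name without the last extension
    ext=${name##*.}                      # the extension itself
    # ${stem:8} is the stem minus its first 8 characters,
    # ${stem:0:8} is those first 8 characters (the date).
    printf '%s%s.%s\n' "${stem:8}" "${stem:0:8}" "$ext"
}
```

The function only prints the new name, so it can be checked by eye before being combined with mv, for example mv -- "$f" "$(move_date_to_end "$f")".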
I need a script that will rename a large number of files. I have a folder with a lot of files, and every file is named by ID. I also have a CSV file like this:
oldID;newID
oldID;newID
etc...
Every old and new ID is unique. What would be the best way to do this? A little help in bash or batch would be appreciated.
The solution for batch is very similar to e0k's solution for bash; you read the file in one line at a time, split the line on semicolons, and rename the file accordingly.
for /f "tokens=1,2 delims=;" %%A in (ids.csv) do ren "%%A" "%%B"
This assumes that your IDs are in a file called ids.csv
If you are using bash (the shell used in the world of Linux, UNIX, etc.), you can use the following short script based on this internal field separator answer. This assumes that you are using a semicolon (;) as the delimiter of your "CSV" file and that there is only one such delimiter per line.
#!/bin/bash
while IFS=';' read -ra names; do
mv "${names[0]}" "${names[1]}";
done < translation.csv
where translation.csv is your file containing the name translations with an oldname;newname format.
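A slightly more defensive variant is sketched below. It is an illustration rather than a drop-in tool: the function name rename_from_csv is hypothetical, and it assumes the same oldname;newname layout with names relative to the current directory. It skips lines whose source file is missing or whose target already exists, tolerates Windows line endings, and still handles a last line without a trailing newline.

```bash
#!/usr/bin/env bash
# Sketch: rename files listed as old;new in a CSV, skipping unsafe lines.
# rename_from_csv is a hypothetical helper name.
rename_from_csv() {
    local csv=$1 old new
    # "|| [[ -n $old ]]" keeps a final line that lacks a trailing newline.
    while IFS=';' read -r old new || [[ -n $old ]]; do
        new=${new%$'\r'}                       # drop a trailing CR, if any
        [[ -n $old && -n $new ]] || continue   # ignore blank or partial lines
        if [[ ! -e $old ]]; then
            echo "skip: $old not found" >&2
            continue
        fi
        if [[ -e $new ]]; then
            echo "skip: $new already exists" >&2
            continue
        fi
        mv -- "$old" "$new"
    done < "$csv"
}
```

Run it from the folder that holds the files, for example rename_from_csv translation.csv.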
If you are instead asking for a batch file (i.e. for Windows, DOS, etc.) then that is a different animal in a different world.
Given that your OS is some Unix (like Linux), and given that the use of CSV files has been your own choice, there might be an easier way to go: mmv can rename many files in one go, using patterns to match the original files and letting you reuse the matched strings in the target file names. See http://ss64.com/bash/mmv.html.
I need to rename a lot of folders with a batch script file.
The folder names have this format:
2013.03.12.08.05.06_Debug_Test1
2013.03.12.08.04.09_Debug_Test2
...
I need to change them to this:
2013.12.03.08.05.06_Debug_Test1
2013.12.03.08.04.09_Debug_Test2
That is, swap the number 12 with the number 03.
Is this possible using a Windows batch file?
@echo off
for /f "tokens=1,2,3*delims=." %%a in ('dir /b /ad "*.*.*.*"') do if not %%b==%%c echo ren "%%a.%%b.%%c.%%d" "%%a-%%c.%%b.%%d"
for /f "tokens=1,2,3*delims=.-" %%a in ('dir /b /ad "*-*.*.*"') do if not %%b==%%c echo ren "%%a-%%b.%%c.%%d" "%%a.%%b.%%c.%%d"
Should get you started.
The first FOR selects directories of the format *.*.*.* and renames them *-*.*.* with the 2nd and 3rd elements swapped.
The second renames the renamed directories to change the - to .
Consider directories 2013.03.12.08.05.06_Debug_Test1 and 2013.12.03.08.05.06_Debug_Test1 - attempting to rename the one will fail because the other exists, hence need to rename twice.
(I've assumed that '-' does not exist in your directory names - you may wish to substitute some other character - @, #, $, q suggest themselves)
Note that I've simply ECHOed the rename. Since the second rename depends on the first, the second set wouldn't be produced until the echo is removed from the first after careful checking.
I'd suggest you create a sample subdirectory to test first, including such names as I've highlighted.
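The name transformation itself can also be sketched in bash, for anyone testing the renaming logic from a Unix-like shell. The function name swap_month_day is hypothetical, and it assumes names begin with a four-digit year followed by two two-digit fields separated by dots. It only prints the new name, so the collision problem described above still has to be handled (for example with the same two-pass approach) before anything is actually renamed.

```bash
#!/usr/bin/env bash
# Sketch: print a directory name with its 2nd and 3rd dot-separated
# fields swapped. swap_month_day is a hypothetical helper name.
# Names that do not match the expected layout are printed unchanged.
swap_month_day() {
    local name=$1
    local re='^([0-9]{4})\.([0-9]{2})\.([0-9]{2})(\..*)$'
    if [[ $name =~ $re ]]; then
        # BASH_REMATCH[2] and [3] are the two fields to exchange;
        # [4] is the rest of the name, including its leading dot.
        printf '%s.%s.%s%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[3]}" \
            "${BASH_REMATCH[2]}" "${BASH_REMATCH[4]}"
    else
        printf '%s\n' "$name"
    fi
}
```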
StringSolver, which requires a valid JRE installation and sbt, lets you use a semi-automatic version of move:
move 2013.03.12.08.05.06_Debug_Test1 2013.12.03.08.05.06_Debug_Test1
Then check the transformation:
move --explain
concatenates for all a>=0 (a 2-digit number from the substring starting at the a+1-th number ending at the end of the a+1-th AlphaNumeric token in first input + the substring starting at the 2*a+2-th token not containing 0-9a-zA-Z ending at the end of the a+3-th non-number in first input) + the first input starting at the 4th AlphaNumeric token.
which means that it decomposed the transformation by:
2013.12.03.08.05.06_Debug_Test1
AAAABBBBAABCCCCCCCCCCCCCCCCCCCC
where
A is "a 2-digit number from the substring starting at the a+1-th number ending at the end of the a+1-th AlphaNumeric token in first input"
B is "the substring starting at the 2*a+2-th token not containing 0-9a-zA-Z ending at the end of the a+3-th non-number in first input"
C is "the first input starting at the 4th AlphaNumeric token."
which corresponds to what you expected for folders of this type.
If you do not trust it, you can have a dry run:
move --test
which displays what the mapping would do on all folders.
Then perform the transformation for all folders using move --auto or the abbreviated command
move
Alternative
Using the Monitor.ps1 modified and run in Powershell -Sta, you can do it yourself in Windows like in this Youtube video.
DISCLAIMER: I am a co-author of this software, which was developed for academic purposes.
I discovered this quite by accident while looking for a file with a number in the name. When I type:
dir *number*
(where number represents any number from 0 to 9 and with no spaces between the asterisks and the number)
at the cmd.exe command prompt, it returns various files that do not appear to fit the search criteria in any way. What's weird is that, depending on the directory, some numbers will work and not others. For example, in a directory associated with a website, I type the following:
dir *4*
and what is returned is:
Directory of C:\Ampps\www\includes\pages
04/30/2012 03:55 PM 153 inventory_list_retrieve.php
06/18/2012 11:17 AM 6,756 ix.html
06/19/2012 01:47 PM 257,501 jquery.1.7.1.js
3 File(s) 264,410 bytes
0 Dir(s) 362,280,906,752 bytes free
That just doesn't make any sense to me. Any clue?
The question is posed on Stack Overflow because the DIR command is often combined with FOR in batch programs. The strange DIR behavior would seem to make batch programs potentially unreliable if they use the DIR command.
Edit: (additional note). Though much time has passed, I discovered another quirk with this that almost cost me a lot of work. I wanted to delete all .htm files in a particular directory tree. I realized just before doing it that *.htm matches .html files as well. Also, *.man matches .manifest, and there are probably others. Deleting all .html files in that particular directory would have been upsetting to say the least.
Wild cards at the command prompt are matched against both the long file name and the short "8.3" name if one is present. This can produce surprises.
To see the short names, use the /X option to the DIR command.
Note that this behavior is not in any way specific to the DIR command, and can lead to other (often unpleasant) surprises when a wild card matches more than expected on any command, such as DEL.
Unlike in *nix shells, replacement of a file pattern with the list of matching names is implemented within each command and not implemented by the shell itself. This can mean that different commands could implement different wild card pattern rules, but in practice this is quite rare as Windows provides API calls to search a directory for files that match a pattern and most programs use those calls in the obvious way. For programs written in C or C++ using the "usual" tools, that expansion is provided "for free" by the C runtime library, using the Windows API.
The Windows API in question is FindFirstFile() and its close relatives FindFirstFileEx(), FindNextFile(), and FindClose().
Oddly, although the documentation for FindFirstFile() describes its lpFileName parameter as "directory or path, and the file name, which can include wildcard characters, for example, an asterisk (*) or a question mark (?)" it never actually defines what the * and ? characters mean.
The exact meaning of the file pattern has history in the CP/M operating system dating from the early 1970s that strongly influenced (some might say "was directly copied" in place of "influenced" here) the design of MS-DOS. This has resulted in a number of "interesting" artifacts and behaviors. Some of this at the DOS end of the spectrum is described at this blog post from 2007 where Raymond describes exactly how file patterns were implemented in DOS.
Yep. You'll see that it also searches through short names if you try this:
dir /x *4*
(/x switch is for short names)
To filter file names reliably, use:
dir /b | find "4"
A quote from RBerteig's answer:
Note that this behavior is not in any way specific to the DIR command,
and can lead to other (often unpleasant) surprises when a wild card
matches more than expected on any command, such as DEL.
The above is true even for the FOR command, which is very nasty.
for %A in (*4*) do @echo %A contains a 4
will also search the short names. The solution again would be to use FIND or FINDSTR to filter out the names in a more reliable manner.
for %A in (*) do @echo %A | >nul findstr 4 && echo %A contains a 4
Note - change %A to %%A if using the command within a batch file.
Combining FOR with FINDSTR can be a general purpose method to safely use any command that runs into problems with short file names. Simply replace ECHO with the problem command such as COPY or DEL.
It seems the dir command also searches the short (8.3-style) file names under the hood.
When I call dir *1* this is what I get:
Volume in drive C is System
Volume Serial Number is F061-0B78
Directory of C:\Users\Piotrek\Desktop\Downloads
2012-05-20 17:33 23 639 040 gDEBugger-5_8.msi
2012-05-20 17:30 761 942 glew-1.7.0.zip
2012-05-20 17:11 9 330 176 irfanview_plugins_433_setup.exe
2012-05-24 20:17 4 419 192 SumatraPDF-2.1.1-install.exe
2012-05-15 22:55 3 466 248 TrueCrypt Setup 7.1a.exe
5 File(s) 1 127 302 494 bytes
There is a gDEBugger-5_8.msi file amongst listed ones, which apparently does not have any 1 character in it.
Everything becomes clear when I use the /X switch with the dir command, which makes dir also display the 8.3 file names. Output from a dir /X *1* command:
Volume in drive C is System
Volume Serial Number is F061-0B78
Directory of C:\Users\Piotrek\Desktop\Downloads
2012-05-20 17:33 23 639 040 GDEBUG~1.MSI gDEBugger-5_8.msi
2012-05-20 17:30 761 942 GLEW-1~1.ZIP glew-1.7.0.zip
2012-05-20 17:11 9 330 176 IRFANV~1.EXE irfanview_plugins_433_setup.exe
2012-05-24 20:17 4 419 192 SUMATR~1.EXE SumatraPDF-2.1.1-install.exe
2012-05-15 22:55 3 466 248 TRUECR~1.EXE TrueCrypt Setup 7.1a.exe
5 File(s) 1 127 302 494 bytes
Quote from dir's help:
/X This displays the short names generated for non-8dot3 file
names. The format is that of /N with the short name inserted
before the long name. If no short name is present, blanks are
displayed in its place.