Remove duplicates from comma separated list in batch file - windows

I have a batch file that (among other things) turns a list like this:
'foo_ph1-1.tif', 'foo_ph2-1', 'foo_ph2-2'
into a list like this, in a local variable called INVNOS:
'fooph1', 'fooph2', 'fooph2'
I want to remove the duplicates from the second list. I've been trying to do this when I create the list, from the answers to this question, to no avail.
Here's how I make the list.
@echo off
setlocal ENABLEDELAYEDEXPANSION
for %%f in ("*.tif") do @echo %%~nf>>list.lst
set FNAMES=
set INVNOS=
for /f %%i in ('type list.lst') do (
set FNAMES=!FNAMES!'%%i.jpg',
for /f "tokens=1 delims=-" %%a in ("%%i") do (
set BEFORE_HYPHEN=%%a
set INVNOS=!INVNOS!'!BEFORE_HYPHEN:_=!',
)
)
set "FNAMES=%FNAMES:~0,-2%"
set "INVNOS=%INVNOS:~0,-2%"
echo %INVNOS%
endlocal
Solutions with findstr won't work because I need to initialize INVNOS with an empty string, and I get stuck with the difference between % and '!', and slicing, inside the for loop.
I know this is easy in Python, however I'd like to do it with what's native (Windows 10/Windows Server), so CMD or Powershell.
Any suggestions?
Just to sketch the bigger picture, INVNOS (inventory numbers) is derived from directories full of tif's, so we can check whether or not they exist in some sql database.

I would approach the problem differently:
@echo off
setlocal ENABLEDELAYEDEXPANSION
for %%f in (*.tif) do (
for /f "delims=-" %%g in ("%%~nf") do set "~%%g=."
)
for /f "delims=~=" %%a in ('set ~') do set "INVNOS='%%a', !INVNOS!"
set "INVNOS=%INVNOS:~0,-2%"
echo %INVNOS:_=%
The trick is to define one variable per file name, with the file name stored in the variable's name. A variable name can exist only once, so by definition there are no duplicates.
A second for loop extracts the names from the defined variables and joins them. The underscores can be removed in one go at the end instead of from each substring.
If needed, you can delete the variables with for /f "delims==" %%a in ('set ~') do set "%%a=", but they are destroyed anyway when the script ends. (Run the same line before setting them if you want to be sure that no variable starting with ~ is already defined by accident.)
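To make the idea concrete outside of cmd, here is a minimal Python sketch of the same logic, given purely as an illustration. The function name inventory_numbers and the sample file names are assumptions, not part of the answer. Dictionary keys play the role of the ~name variables: assigning the same key twice leaves a single entry.

```python
def inventory_numbers(filenames):
    """Cut each base name at the first '-', use the result as a
    dictionary key (keys are unique, like variable names), then
    join the keys and strip the underscores."""
    seen = {}
    for name in filenames:
        base = name.rsplit(".", 1)[0]     # drop the extension, like %%~nf
        prefix = base.split("-", 1)[0]    # text before the first '-'
        seen[prefix] = "."                # same key twice -> one entry
    joined = ", ".join("'" + key + "'" for key in sorted(seen))
    return joined.replace("_", "")        # remove underscores in one go


print(inventory_numbers(["foo_ph1-1.tif", "foo_ph2-1.tif", "foo_ph2-2.tif"]))
```

The sketch sorts the keys for a predictable result; the order of the batch output may differ, because it depends on the order in which set lists the variables.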

@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
:: The values assigned to these variables suit my system and test environment
SET "sourcedir=u:\your files"
SET "tempfile=%temp%\tempfile.txt"
:: remove variables starting :
FOR /F "delims==" %%a In ('set : 2^>Nul') DO SET "%%a="
(for %%f in ("%sourcedir%\*.tif") do echo %%~nf)>"%tempfile%"
set "FNAMES="
set "INVNOS="
for /f "usebackqdelims=" %%i in ("%tempfile%") do (
set "FNAMES=!FNAMES!'%%i.jpg', "
for /f "tokens=1 delims=-" %%a in ("%%i") do (
set "BEFORE_HYPHEN=%%a"
SET "before_hyphen=!BEFORE_HYPHEN:_=!"
IF NOT DEFINED :!BEFORE_HYPHEN! set "INVNOS=!INVNOS!'!BEFORE_HYPHEN:_=!', "&SET ":!BEFORE_HYPHEN!=Y"
)
)
set "FNAMES=%FNAMES:~0,-2%"
set "INVNOS=%INVNOS:~0,-2%"
echo %INVNOS%
IF DEFINED tempfile DEL "%tempfile%"
GOTO :EOF
You would need to change the value assigned to sourcedir to suit your circumstances. The listing uses a setting that suits my system.
I deliberately include spaces in names to ensure that the spaces are processed correctly.
%tempfile% is used temporarily and is a filename of your choosing.
The usebackq option is only required because I chose to add quotes around the source filename.
It is standard practice on SO to use the syntax set "var=value" for string assignments, as this ensures that stray trailing spaces on the line are ignored.
Note the evil trailing space in OP's code on the set INVNOS... line within the for ... %%a loop.
Given OP's original filename list, foo_ph1-1.tif foo_ph2_1 foo_ph2-2, the processing should produce fooph1 fooph21 fooph2, not fooph1 fooph2 fooph2 as claimed.
My testing included foo_ph2-2.tif
The code is essentially the same, but first clearing any environment variables that start :, on the Irish principle.
The temporary file nominated is recreated avoiding the (unfulfilled) requirement to first delete it.
BEFORE_HYPHEN is explicitly expunged of underscores before the if not defined test is applied. I selected : because : can't be part of a filename. Once the name is applied to the invnos list, the :!BEFORE_HYPHEN! variable is established to prevent further accumulation of repeat BEFORE_HYPHEN values into invnos.
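The marker-variable technique can be pictured with a short Python sketch, offered only as an illustration of the logic. The function name unique_in_order and the sample names are assumptions. A set stands in for the :name=Y variables, and the list keeps the order in which each value was first seen.

```python
def unique_in_order(filenames):
    """Append each cleaned-up prefix once, in first-seen order."""
    seen = set()      # plays the role of the ":name=Y" marker variables
    result = []
    for name in filenames:
        base = name.rsplit(".", 1)[0]                  # like %%~nf
        key = base.split("-", 1)[0].replace("_", "")   # before '-', no '_'
        if key not in seen:                            # IF NOT DEFINED :key
            result.append("'" + key + "'")
            seen.add(key)                              # SET ":key=Y"
    return ", ".join(result)


print(unique_in_order(["foo_ph2-1.tif", "foo_ph1-1.tif", "foo_ph2-2.tif"]))
```

Unlike a sort-based approach, this keeps the entries in the order the files were listed.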

If you wanted to step up to PowerShell, something like this could be done in a .bat file script. Of course, it would be easier to write and maintain if it were all written in PowerShell.
=== doit.bat
@ECHO OFF
FOR /F "delims=" %%A IN ('powershell -NoLogo -NoProfile -Command ^
"(Get-ChildItem -File -Filter '*.tif' |" ^
"ForEach-Object { '''' + $($_.Name.Split('-')[0].Replace('_','')) + '''' } |" ^
"Sort-Object -Unique) -join ','"') DO (
SET "INVNOS=%%~A"
)
ECHO INVNOS is set to %INVNOS%
EXIT /B
Get-ChildItem produces a list of all the *.tif files in the directory. Split() does what "delims=-" does in a FOR loop. The [0] subscript chooses everything up to the first '-' character in the file name. Replace() removes the '_' characters. Sort-Object -Unique removes duplicates to produce a unique list. The -join operator converts the list of names to a single, comma-delimited string. The resulting string is stored in the INVNOS variable.
Do you really want APOSTROPHE characters around each name in the list?

Related

For command token doesn't work when there are empty fields in the record

I am trying to extract the values from the third field of a file which has data records.
The fields are separated by vertical bar characters:
9001||10454145||60|60
9001|234467|10454145||60|60
9001|234457|10454145||60|60
Command is -
for /f "tokens=3 delims=|" %%A IN ('Findstr /i "9001" .\itemloc\%%~nf.dat') do (
echo %%A >> log.txt
)
But the output I am getting is
60
10454145
10454145
The empty fields are messing up my output. Any suggestions how to make the for token work with empty fields in the record?
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
rem The following settings for the directories and filenames are names
rem that I use for testing and deliberately includes spaces to make sure
rem that the process works using such names. These will need to be changed to suit your situation.
SET "sourcedir=u:\your files"
SET "destdir=u:\your results"
SET "filename1=%sourcedir%\q75199035.txt"
SET "outfile=%destdir%\outfile.txt"
(
FOR /f "usebackq delims=" %%e IN ("%filename1%") DO (
SET "line=%%e"
FOR /f "tokens=3 delims=|" %%y IN ("!line:||=|(missing)|!") DO ECHO %%y
)
)>"%outfile%"
TYPE "%outfile%"
GOTO :EOF
Always verify against a test directory before applying to real data.
Note that if the filename does not contain separators like spaces, then both usebackq and the quotes around %filename1% can be omitted.
The magic is that for each line, || is replaced by |(missing)|.
This simple solution has its faults, for instance if there is ||| in the source data, or if the data contains the usual suspects (punctuation symbols such as !), but it should be quite happy with alphanumeric source text.
Another way would be to use a third-party utility like sed to pre-process the source data.
The fundamental reason for this phenomenon is that for/f parses the line as [delimiters]token1[delimiters]token2..., where [delimiters] is any sequence of any of the delimiter characters.
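The parsing rule described above can be modelled in a few lines of Python. This is only a sketch of the tokenising behaviour, not of cmd's full parser (eol, quoting and so on are ignored), and the function name for_f_tokens is an assumption. It shows why the third token shifts when a field is empty and how the || replacement restores the expected position.

```python
def for_f_tokens(line, delims="|"):
    """Split the way for /f does: any run of delimiter characters
    counts as a single separator, so empty fields disappear."""
    tokens, current = [], ""
    for ch in line:
        if ch in delims:
            if current:
                tokens.append(current)
            current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


records = ["9001||10454145||60|60", "9001|234467|10454145||60|60"]
for record in records:
    naive = for_f_tokens(record)[2]
    patched = for_f_tokens(record.replace("||", "|(missing)|"))[2]
    print(naive, patched)
```

In this model a run of three bars still leaves one empty field after a single replacement pass, which matches the ||| caveat mentioned above.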

copying most recent file set from txt in unknown directory

I have a notepad.txt document that lists the files that need to be copied to the folder that holds the batch file. The files are located in several subdirectories, and with my code all of the files with the specified name are copied.
for /f "delims=" %%i in (testlist.txt) do echo |robocopy "%dir1%." "C:\temporary" "%%i.*" /s /ndl /njs /njh /nc /ts /ns
How do I set this up properly so it will search for the most recent file, and copy only the file, not the folder and subfolder?
How to get file's last modified date on Windows command line?
for %a in (MyFile.txt) do set FileDate=%~ta
Compare 2 dates in a Windows batch file
set "sdate1=%olddate:~-4%%olddate:~3,2%%olddate:~0,2%"
set "sdate2=%newdate:~-4%%newdate:~3,2%%newdate:~0,2%"
if %sdate1% GTR %sdate2% (goto there) else echo here
So given that you can already read the file and do the copy, here is pseudo code of the logic I would write for putting it all together:
set newestTimeStamp = "1901-01-01" //so the first comparison wins and doesn't throw a null error
for each filename in list.txt
    set currentTimeStamp = getTimeStamp(filename)
    if currentTimeStamp > newestTimeStamp then
        set fileToCopy = filename
        set newestTimeStamp = currentTimeStamp
    end if
next
doCopy(fileToCopy)
Basically, loop through each filename and get its timestamp. Keep the newest timestamp seen so far and compare each file's timestamp against it. If the current one is newer, save the filename to a variable that you will use for the copy and remember its timestamp. At the end of the loop, fileToCopy contains the name of the file with the most recent modified time.
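The loop described above can be sketched in Python to check the logic before translating it to batch. This is an illustration under assumptions: the function name most_recent is hypothetical, and the sketch remembers the newest timestamp seen so far and replaces it only when a file is strictly newer.

```python
import os


def most_recent(paths):
    """Return the existing path with the latest modification time,
    or None if none of the listed paths exists."""
    newest_path, newest_time = None, float("-inf")
    for path in paths:
        if not os.path.isfile(path):
            continue                     # skip entries that do not exist
        mtime = os.path.getmtime(path)
        if mtime > newest_time:          # update only when strictly newer
            newest_path, newest_time = path, mtime
    return newest_path
```

Calling most_recent with the paths read from the list file returns the single file to copy, or None when nothing in the list exists.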
The following code snippet retrieves the most recent file and echoes its path. The wmic command is used to get standardised, locale-independent timestamps, which can be compared directly as strings, so it is not necessary to convert them to numbers:
@echo off
setlocal EnableExtensions EnableDelayedExpansion
set "RECENT=00000000000000.000000+000"
set "RECENTFILE="
for /F "usebackq eol=| delims=" %%L in ("testlist.txt") do (
setlocal DisableDelayedExpansion
set "CURRFILE=%%~fL"
if exist "%%~fL" (
setlocal EnableDelayedExpansion
for /F "skip=1 tokens=1 delims= " %%T in ('
wmic DATAFILE ^
WHERE Name^="!CURRFILE:\=\\!" ^
GET LastModified ^
/FORMAT:TABLE
') do (
for /F "delims=" %%S in ("%%T") do (
if %%S GTR !RECENT! (
endlocal
endlocal
set "RECENT=%%S"
set "RECENTFILE=%%~fL"
) else (
endlocal
endlocal
)
)
)
) else (
endlocal
)
)
if defined RECENTFILE (
rem Perform your action here:
echo(!RECENTFILE!
)
endlocal
exit /B
What happens:
there are two variables RECENT and RECENTFILE which hold the timestamp of and the path to most recent file, respectively;
the outer for /F loop walks through the items in the list file testlist.txt;
for each existing item, a wmic query is executed to get the last modify date, and its output is parsed by two nested for /F loops, each iterating once only; since wmic returns Unicode strings, a single for /F loop is not enough because it leaves some orphaned carriage-return characters, which may impact the remaining code, but a second loop removes them;
the retrieved file date is compared to the buffered one in RECENT, and if it is greater, meaning that the file is newer, it is stored in RECENT and the respective file path is stored in RECENTFILE;
if variable RECENTFILE is finally not defined, the list testlist.txt does not point to existing files, or it is empty;
the toggling of delayed expansion is necessary to avoid trouble with any special characters;
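Why the timestamps can be compared as plain strings is easy to check with a small Python sketch. The sample values below are invented for illustration and assume every timestamp carries the same UTC offset. The fields run from most to least significant (year, month, day, hour, minute, second, fraction), so character-by-character comparison agrees with chronological order.

```python
def newest_stamp(stamps):
    """Pick the latest timestamp using plain string comparison."""
    recent = "00000000000000.000000+000"   # same initial value as RECENT
    for stamp in stamps:
        if stamp > recent:                 # string comparison, no conversion
            recent = stamp
    return recent


samples = [
    "20150312101500.000000+060",
    "20160101000000.000000+060",
    "20151231235959.999999+060",
]
print(newest_stamp(samples))
```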
Besides the fact that the wmic queries perform worse than getting the timestamps using for (for instance for %F in ("*.*") do echo %~tF), the following restriction applies:
The , character must not occur in any of the listed file paths!
According to this answer, there is a way to overcome this, but then the ) character is disallowed: to replace the clause WHERE Name^="!CURRFILE:\=\\!" by WHERE ^(Name^="!CURRFILE:\=\\!"^) (the escaping ^ of the parenthesis is only required as the wmic command line is placed within a for /F set). So you can either have , or ) within a wmic command line, but not both of these characters.

loop through file saving to variable

I now have the following bat file working (which allows one to add text to the end of each line of a file) -- please see also:
bat file: Use of if, for and a variable all together
@echo off
setLocal EnableDelayedExpansion
IF EXIST "%FileToModify1%" (
for /f "tokens=* delims= " %%a in (%FileToModify1%) do (
echo %%a Note: certain conditions apply >> "%SaveFile1%"
)
)
However, I would like to save each line to a variable (including the new line symbol(s)) and then echo the variable to a file at the end. Since there are several lines in the file it is really inefficient to save to a file with each line.
I tried googling this, but the answers do not fit my situation...
essentially I need the syntax for concatenating and saving to a variable (cumulatively like "+=" in C#), and also using the new lines...
Actually you do not need to put everything into a variable, you just need to place the redirection at another position.
Try this:
@echo off
setlocal EnableDelayedExpansion
if exist "%FileToModify1%" (
for /F "usebackq delims=" %%a in ("%FileToModify1%") do (
echo %%a Note: certain conditions apply
)
) > "%SaveFile1%"
endlocal
Note that empty lines in the original file are ignored by for /F, so they are not transferred to the new file. Also lines starting with ; are ignored by for /F (unless you change the eol option -- see for /?).
I modified the for /F options:
no delimiters are defined, so each line is output as is (with "tokens=* delims= ", leading spaces are removed from each line if present);
usebackq allows the file specification to be surrounded by "", which is helpful if it contains spaces;
Appendix A
If you still want to store the file content into a variable, you can do this:
@echo off
setlocal EnableDelayedExpansion
rem the two empty lines after the following command are mandatory:
set LF=^


if exist "%FileToModify1%" (
set "FileContent="
for /F "usebackq delims=" %%a in ("%FileToModify1%") do (
set "FileContent=!FileContent!%%a Note: certain conditions apply!LF!"
)
(echo !FileContent!) > "%SaveFile1%"
)
endlocal
The file content is stored in variable FileContent, including the appendix Note: certain conditions apply. LF holds the new-line symbol.
Note:
The length of a variable is very limited (as far as I know, 8191 bytes since Windows XP and 2047 bytes earlier)!
[References:
Store file output into variable (last code fragment);
Explain how dos-batch newline variable hack works]
Appendix B
Alternatively, you could store the file content in an array, like this:
@echo off
setlocal EnableDelayedExpansion
if exist "%FileToModify1%" (
set /A cnt=0
for /F "usebackq delims=" %%a in ("%FileToModify1%") do (
set /A cnt+=1
set "Line[!cnt!]=%%a Note: certain conditions apply"
)
(for /L %%i in (1,1,!cnt!) do (
echo !Line[%%i]!
)) > "%SaveFile1%"
)
endlocal
Each line of the file is stored in an array Line[1], Line[2], Line[3], etc., including the appendix Note: certain conditions apply. cnt contains the total number of lines, which is the array size.
Note:
Actually this is not a true array data type as such does not exist in batch, it is a collection of scalar variables with an array-style naming (Line[1], Line[2],...); therefore one might call it pseudo-array.
[References:
Store file output into variable (first code fragment);
How to create an array from txt file within a batch file?]
You can write the output file in one shot:
(
for /l %%i in (0,1,10) do (
echo line %%i
)
)>outfile.txt
(much quicker than appending each line separately)

cmd for loop variable set issue

I have a file named like
HelfTool.txt
Code1=Value1
Code2=Value2
I am trying to get the variable named as Code1 and code 2 in cmd batch file with corresponding values. I have written below code but it gives me error stated below.
for /f tokens^=1^,^2^ delims^=^*^=^" %%b in (C:\HelfTool.txt) do if not defined "%%b" set "%%b"=%%c
Environment variable Code1 not defined
Environment variable Code2 not defined
I tried to define these variable at the beginning of the batch file but no use. Can anyone help here.
Your if not defined is wrong - the variable name should not be quoted. It should be
if not defined %%b
Your set command is wrong - it creates a variable with quotes in the name. It should be
set %%b=%%c
or better yet, enclose the entire assignment within one set of quotes:
set "%%b=%%c"
Your FOR /F options are mostly correct, but I do not understand why you took the difficult route of escaping a bunch of characters instead of simply using quotes. Also, I don't think you want to include * as a delimiter. You could have used
for /f "tokens=1,2 delims=="
or better yet (just in case the value contains an =, though it will not preserve a leading = in the value)
for /f "tokens=1* delims=="
But I don't see why you are parsing the line at all, or why you think you must test if the variable is defined yet. It seems to me you could simply use:
for /f "delims=" %%A in (C:\HelfTool.txt) do set "%%A"
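As a rough model of what that last loop does with each line, here is a small Python sketch. It is an illustration only: the function name parse_assignments is hypothetical, and a dictionary stands in for the environment. The line is split at the first = only, so any later = remains part of the value.

```python
def parse_assignments(lines):
    """Split each line at the first '=' only; later '=' stay in the value."""
    variables = {}
    for line in lines:
        name, sep, value = line.partition("=")
        if sep and name:          # ignore lines without a name or an '='
            variables[name] = value
    return variables


print(parse_assignments(["Code1=Value1", "Code2=a=b"]))
```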
Next CLI output could help:
==>for /f tokens^=1^,^2^ delims^=^*^=^" %b in (HelfTool.txt) do @echo set "%b=%c"
set "Code1=Value1"
set "Code2=Value2"
Another approach:
==>for /f "tokens=*" %b in (HelfTool.txt) do @echo set "%b"
set "Code1=Value1"
set "Code2=Value2"
Double the % (percent sign) to use in a .bat batch script, e.g. last command should be
for /f "tokens=*" %%b in (HelfTool.txt) do @echo set "%%b"

Find files and sort by size in a Windows batch file

I have as command-line parameters to my batch script a list of filenames and a folder. For each filename, I need to print all subfolders of the folder where the file is found (the path of that file). The subfolder names should be sorted in descending order of the file sizes (the file can have various sizes in different subfolders).
I have done this so far, but it doesn't work:
::verify if the first parameter is the directory
@echo off
REM check the numbers of parameters
if "%2"=="" goto err1
REM check: is first parameter a directory?
if NOT EXIST %1\NUL goto err2
set d=%1
shift
REM iterate the rest of the parameters
for %%i in %dir do (
find %dir /name %i > temp
if EXIST du /b temp | cut /f 1 goto err3
myvar=TYPE temp
echo "file " %i "is in: "
for %%j in %myvar do
echo %j
echo after sort
du /b %myvar | sort /nr
)
:err1
echo Two parameters are necessary
goto end
:err2
echo First parameter must be a directory.
goto end
:err3
echo file does not exist.
goto end
:end
I don't feel guilty answering this homework question now that the semester is long past. Print folders and files recursively using Windows Batch is a closed duplicate question that discusses the assignment.
My initial solution is fairly straightforward. There are a few tricks to make sure it properly handles paths with special characters in them, but nothing too fancy. The only other trick is left padding the file size with spaces so that SORT works properly.
Just as in the original question, the 1st parameter should be a folder path (.\ works just fine), and subsequent arguments represent file names (wildcards are OK).
@echo off
setlocal disableDelayedExpansion
set tempfile="%temp%\_mysort%random%.txt"
set "root="
for %%F in (%*) do (
if not defined root (
pushd %%F || exit /b
set root=1
) else (
echo(
echo %%~nxF
echo --------------------------------------------
(
@echo off
for /f "eol=: delims=" %%A in ('dir /s /b "%%~nxF"') do (
set "mypath=%%~dpA"
set "size=            %%~zA"
setlocal enableDelayedExpansion
set "size=!size:~-12!"
echo !size! !mypath!
endlocal
)
) >%tempfile%
sort /r %tempfile%
)
)
if exist %tempfile% del %tempfile%
if defined root popd
I had hoped to avoid creation of a temporary file by replacing the redirect and subsequent sort with a pipe directly to sort. But this does not work. (see my related question: Why does delayed expansion fail when inside a piped block of code?)
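The left-padding trick used in this first solution is worth a quick illustration, since SORT compares text rather than numbers. The Python sketch below uses made-up sizes and a hypothetical function name, sort_sizes_as_text. It right-aligns each size in a 12-character field, as the batch code does with !size:~-12!, so that a text sort agrees with numeric order.

```python
def sort_sizes_as_text(sizes, width=12):
    """Right-align each size in a fixed-width field, then sort the
    resulting strings in descending order, as a text sort would."""
    padded = [str(size).rjust(width) for size in sizes]
    return sorted(padded, reverse=True)


sizes = [900, 12000, 45]
print(sorted((str(s) for s in sizes), reverse=True))     # unpadded text sort
print([s.strip() for s in sort_sizes_as_text(sizes)])    # padded text sort
```

Without the padding, "900" sorts above "12000" because the comparison stops at the first differing character.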
My first solution works well, except there is the potential for duplicate output depending on what input is provided. I decided I would write a version that weeds out duplicate file reports.
The basic premise was simple - save all output to one temp file with the file name added to the front of the sorted strings. Then I need to loop through the results and only print information when the file and/or the path changes.
The last loop is the tricky part, because file names can contain special characters like ! ^ & and % that can cause problems depending on what type of expansion is used. I need to set and compare variables within a loop, which usually requires delayed expansion. But delayed expansion causes problems with FOR variable expansion when ! is found. I can avoid delayed expansion by calling outside the loop, but then the FOR variables become unavailable. I can pass the variables as arguments to a CALLed routine without delayed expansion, but then I run into problems with % ^ and &. I can play games with SETLOCAL/ENDLOCAL, but then I need to worry about passing values across the ENDLOCAL barrier, which requires a fairly complex escape process. The problem becomes a big vicious circle.
One other self imposed constraint is I don't want to enclose the file and path output in quotes, so that means I must use delayed expansion, FOR variables, or escaped values.
I found an interesting solution that exploits an odd feature of FOR variables.
Normally the scope of FOR variables is strictly within the loop. If you CALL outside the loop, then the FOR variable values are no longer available. But if you then issue a FOR statement in the called procedure - the caller FOR variables become visible again! Problem solved!
@echo off
setlocal disableDelayedExpansion
set tempfile="%temp%\_mysort%random%.txt"
if exist %tempfile% del %tempfile%
set "root="
(
for %%F in (%*) do (
if not defined root (
pushd %%F || exit /b
set root=1
) else (
set "file=%%~nxF"
for /f "eol=: delims=" %%A in ('dir /s /b "%%~nxF"') do (
set "mypath=%%~dpA"
set "size=            %%~zA"
setlocal enableDelayedExpansion
set "size=!size:~-12!"
echo(!file!/!size!/!mypath!
endlocal
)
)
)
)>%tempfile%
set "file="
set "mypath="
for /f "tokens=1-3 eol=/ delims=/" %%A in ('sort /r %tempfile%') do call :proc
if exist %tempfile% del %tempfile%
if defined root popd
exit /b
:proc
for %%Z in (1) do (
if "%file%" neq "%%A" (
set "file=%%A"
set "mypath="
echo(
echo %%A
echo --------------------------------------------
)
)
for %%Z in (1) do (
if "%mypath%" neq "%%C" (
set "mypath=%%C"
echo %%B %%C
)
)
exit /b

Resources