Loop through folders in subdirectories and combine text files - windows

I want to loop through the folders within a directory and combine all of their text files into one file. I found some answers online, but none of them seems to work. Any help is much appreciated. I have provided what I've found below. In the example below, DummyFolder has multiple subdirectories that contain .txt files that need to be merged into one file. I got Code 3 to work yesterday, but I must have changed something, because it is no longer working.
Code 1:
@echo off
set "header=C:\Users\user\Desktop\DummyFolder\Headings.txt"
set "folder=C:\Users\user\Desktop\DummyFolder\"
set "tempFile=%folder%\temp.txt"
for %%F in ("%folder%\*.txt") do (
type "%header%" >"%tempFile%"
type "%%F" >>"%tempFile%"
move /y "%tempFile%" "%%F" >nul
)
Also found this code (Code 2):
$startingDir = 'C:\Users\user\Desktop\DummyFolder\'
$combinedDir = 'C:\Users\user\Desktop\DummyFolder\CombinedTextFiles'
Get-ChildItem $startingDir -Recurse | Where-Object {
$txtfiles = Join-Path $_.FullName '*.txt'
$_.PSIsContainer -and (Test-Path $txtfiles)
} | ForEach-Object {
$merged = Join-Path $combinedDir ($_.Name + '_Merged.txt')
Get-Content $txtfiles | Set-Content $merged
}
Also found this code (Code 3):
@echo on
set folder="C:\Users\user\Desktop\DummyFolder\"
for /F %%a in ('dir /b /s %folder%') do (
if "%%~xa" == ".txt" (
(echo/------------------------------
type %%~a
echo/)>>"%~dp0list.txt"
)
)

In CMD you'd do something like this:
@echo off
set "basedir=C:\some\folder"
set "outfile=C:\path\to\output.txt"
(for /r "%basedir%" %f in (*.txt) do type "%~ff") > "%outfile%"
For use in batch files you need to change %f to %%f and %~ff to %%~ff.
In PowerShell you'd do something like this:
$basedir = 'C:\some\folder'
$outfile = 'C:\path\to\output.txt'
Get-ChildItem $basedir -Include *.txt -Recurse | Get-Content |
Set-Content $outfile

There are so many ways to do this. For example, using the Wolfram Language you can:
StringJoin @@
FileSystemMap[
If[FileExtension[#] == "txt", Import[#, "Text"]] &,
"C:\\Users\\user\\Desktop\\DummyFolder\\", Infinity, 1]
And then write the result using
Export["C:\\Users\\user\\Desktop\\", %, "Text"]
You can also do this with Python, Perl, etc. Use PowerShell only if you need to share your solution and want to avoid installers. I would not spend too much time learning 1981 technology (CMD).
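As a rough illustration of the Python route mentioned above, here is a minimal sketch rather than a definitive implementation. The function name merge_text_files and its parameters are hypothetical; it assumes UTF-8 text files and skips the output file and the optional header file so that neither is merged into itself.

```python
from pathlib import Path
from typing import Optional


def merge_text_files(root: Path, out_file: Path, header: Optional[Path] = None) -> int:
    """Concatenate every .txt file under root into out_file; return the file count."""
    # Never merge the result file or the header file into the result.
    skip = {out_file.resolve()}
    if header is not None:
        skip.add(header.resolve())
    # Collect the sources first so the output file is not picked up mid-walk.
    sources = sorted(
        p for p in root.rglob("*.txt") if p.is_file() and p.resolve() not in skip
    )
    with out_file.open("w", encoding="utf-8") as out:
        if header is not None:
            # The header is written once, at the top of the result.
            out.write(header.read_text(encoding="utf-8"))
        for src in sources:
            out.write(src.read_text(encoding="utf-8"))
    return len(sources)
```

Calling merge_text_files(Path(r"C:\Users\user\Desktop\DummyFolder"), Path(r"C:\Users\user\Desktop\DummyFolder\merged.txt")) would mirror the batch examples; the merged.txt name is only an assumption borrowed from another answer in this thread.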

Assuming that your source files are located in immediate sub-directories of the root directory DummyFolder and that you want the content of Headings.txt to occur once only on top of the resulting file, you could accomplish your task using the following script:
@echo off
rem // Define constants here:
set "folder=C:\Users\user\Desktop\DummyFolder"
set "header=%folder%\Headings.txt"
set "result=%folder%\merged.txt"
rem // Prepare result file, copy content of header file:
copy "%header%" "%result%" > nul
rem // Enumerate immediate sub-directories of the given root directory:
for /D %%D in ("%folder%\*") do (
rem // Enumerate matching files per sub-directory:
for %%F in ("%%~D\*.txt") do (
rem // Append content of current file to result file:
copy /Y "%result%" + "%%~F" "%result%" /B > nul
)
)
In case your source files are located anywhere in the directory tree DummyFolder, you need to make sure that the header file Headings.txt and the result file merged.txt are not iterated:
@echo off
rem // Define constants here:
set "folder=C:\Users\user\Desktop\DummyFolder"
set "header=Headings.txt"
set "result=merged.txt"
rem // Prepare result file, copy content of header file:
copy "%folder%\%header%" "%folder%\%result%" > nul
rem // Enumerate matching files in the whole given directory tree:
for /R "%folder%" %%F in ("*.txt") do (
rem // Exclude the header file to be re-processed:
if /I not "%%~nxF"=="%header%" (
rem // Exclude the result file to be processed:
if /I not "%%~nxF"=="%result%" (
rem // Append content of current file to result file:
copy /Y "%folder%\%result%" + "%%~F" "%folder%\%result%" /B > nul
)
)
)

This may be a simple answer to what you are looking for. The usebackq option is important because it allows double quotes around paths, and tokens=* keeps each whole line. To use this in a console instead of a batch file, change %% to %.
for /f "tokens=*" %%a in ('dir /s /b C:\testpath\*.txt') do (for /f "usebackq tokens=*" %%b in ("%%a") do (echo %%b >> C:\test.txt))

Code 3 is not bad, but it won't work with spaces in a path: you don't provide a delims option, so the default delimiters split each line. There are also several other errors related to spaces in paths.
The following code works and combines all .txt files in all subdirectories. It creates a new file list.txt in the folder where the batch file is located; an existing list.txt will be overwritten. Note that it's a batch file:
@echo off
set "folder=C:\Users\user\Desktop\DummyFolder\"
rem create new empty file: list.txt in directory of batch file: %~dp0
break>"%~dp0list.txt"
rem loop through all output lines of the dir command, unset delims
rem so that a space will not split the line
for /F "delims=" %%a in ('dir /b /s "%folder%"') do (
rem just look for txt files
if "%%~xa" == ".txt" (
rem don't use the list.txt
if not "%%a" == "%~dp0list.txt" (
rem append the output of the whole block into the file
(echo/------------------------------
type "%%a"
echo/)>>"%~dp0list.txt"
)
)
)
If you don't understand something, it's quite easy to find good material on the internet, because there are several great batch scripting sites. You can also always use echo This is a message visible on the command prompt to display something useful, e.g. the value of a variable. That way you can "debug" and see what happens.
Some explanations beyond the comments (rem This is a comment) in the code:
1. break command:
To clear a file I use the break command, which produces no output at all. I redirect that empty output to a file; read about it here: https://stackoverflow.com/a/19633987/8051589.
2. General variables:
You set variables via set varname=Content. I prefer the quoted form set "varname=Content", as it also works with redirection characters. Use the variable with one leading % and one trailing %, e.g. echo %varname%. You can read a lot about it at https://ss64.com/nt/set.html. I think ss64 is probably the best site for batch scripting out there.
3. Redirection > and >>:
You can redirect the output of a command with > or >>, where > creates a new file or overwrites an existing one, and >> appends to a file or creates it if it does not exist. A lot more is possible: https://ss64.com/nt/syntax-redirection.html.
4. for /f loop:
In a batch file you loop through the lines of a command's output with a for /f loop. The loop variable is written with two % signs in front of it, here %%a. I also set the delimiters (delims) to nothing so that the command output is not split into several tokens.
You can read a lot of details about the for /f loop at: https://ss64.com/nt/for_cmd.html.
5. Special variable syntax %%~xa and %~dp0:
The variable %%a, which holds one line of the dir command's output, can be expanded to the file extension only via %%~xa, as explained here: https://stackoverflow.com/a/5034119/8051589. The %~dp0 variable contains the path where the batch file is located; see here: https://stackoverflow.com/a/10290765/8051589.
6. Block redirection ( ... )>>:
To redirect multiple commands at once you can open a block with (, execute the commands, close the block with ), and then apply a redirection. You could also redirect every command individually; that would have the same effect.

Related

Trying to write a script to change the end of file names with a .bat

I have a bunch of files that I need to rename. They are variable length. Like this:
A1B2C3D4.en.fr.pdf
A1B2C3D4S8.it.fr.pdf
A1B2C3.de.fr.pdf
A1B2C3D4E5.zn.fr.pdf
I want to change them so that I can run a .bat file to make 2 changes:
prefix them all with a static prefix, XYZ10;
replace the variable .*.fr.pdf ending with the static FRFR.pdf.
So they'll look like this:
XYZ10A1B2C3D4FRFR.pdf
XYZ10A1B2C3D4S8FRFR.pdf
XYZ10A1B2C3FRFR.pdf
XYZ10A1B2C3D4E5FRFR.pdf
I've been doing it in individual steps each time with PowerShell, but it's a pain to keep doing it, and sometimes it does it improperly.
I've tried this:
@echo off
ren *.??.fr.pdf *.FRFR.pdf
but it just makes them look like this:
A1B2C3D4E5.zn.fr.FRFR.pdf
I don't know where to begin with the prefix, I don't really understand any of the things I've been reading about it...
EDIT:
This is what I've been doing to prefix in PowerShell.
Dir *.pdf | rename-item -newname {"XYZ10"+ $_.Name}
There is no simple ren command line to rename as you desire (walk through the thorough post How does the Windows RENAME command interpret wildcards?). I would do it the following way:
rem // Loop through all relevant files:
for /F "delims= eol=|" %%K in ('dir /B /A:-D-H-S "*.??.fr.pdf"') do (
rem // `%%K` is the full file name, `%%~nK` has got `.pdf` removed, `%%~xK` is `.pdf`.
for %%J in ("%%~nK") do for %%I in ("%%~nJ") do (
rem // `%%~nJ` has got `.fr.pdf` removed, `%%~nI` has got `.??.fr.pdf` removed.
rem // Actually rename the file (complaints in case of conflicts):
ren "%%K" "XYZ10%%~nIFRFR%%~xK"
)
)
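The name mapping performed by the nested loops above can also be sketched in Python. This is only an illustration under assumptions: the function new_name and its default prefix and suffix arguments are hypothetical, and it expects names of the exact form base.xx.fr.pdf, rejecting anything else instead of guessing.

```python
def new_name(old: str, prefix: str = "XYZ10", suffix: str = "FRFR") -> str:
    """Map 'A1B2C3D4.en.fr.pdf' to 'XYZ10A1B2C3D4FRFR.pdf'."""
    parts = old.split(".")
    # Expect exactly: base, two-character language code, 'fr', 'pdf'.
    well_formed = (
        len(parts) == 4
        and len(parts[1]) == 2
        and parts[2].lower() == "fr"
        and parts[3].lower() == "pdf"
    )
    if not well_formed:
        raise ValueError(f"unexpected file name: {old!r}")
    # Keep the base, drop the two middle parts, keep the original extension.
    return f"{prefix}{parts[0]}{suffix}.{parts[3]}"
```

To apply it, one could loop over the directory listing and pass each old and new name to os.rename, ideally after printing the pairs once as a dry run.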

Loop through files in a folder and check if they have different extensions

I have a folder that contains files; each document should have .pdf and .xml format. I need to write a BAT file to run from a scheduled task to verify that both documents exist for each.
My logic is:
loop through files in the folder
strip each file to its name without extension
check that same name files exist for both .xml and pdf.
if not mark a flag variable as problem
when done, if the flag variable is marked, send an Email notification
I know how to use blat to send email, but I'm having trouble getting the loop to work. I found a way to get the path and the file name without extension, but I can't merge them.
I've used batch files a few times before, but I'm far from an expert. What am I missing?
Here's the code I have so far:
set "FolderPath=E:\TestBat\Test\"
echo %FolderPath%
for %%f in (%FolderPath%*) do (
set /p val=<%%f
For %%A in ("%%f") do (
Set Folder=%%~dpA
Set Name=%%~nxA
)
echo Folder is: %Folder%
echo Name is: %Name%
if NOT EXIST %FolderPath%%name%.xml
set flag=MISSING
if NOT EXIST %FolderPath%%name%.pdf
set flag=MISSING
)
echo %Flag%
pause
There is no need for fancy code for a task such as this:
@Echo Off
Set "FolderPath=E:\TestBat\Test"
If /I Not "%CD%"=="%FolderPath%" PushD "%FolderPath%" 2>Nul||Exit/B
Set "flag="
For %%A In (*.pdf *.xml) Do (
If /I "%%~xA"==".pdf" (If Not Exist "%%~nA.xml" Set "flag=MISSING")
If /I "%%~xA"==".xml" (If Not Exist "%%~nA.pdf" Set "flag=MISSING")
)
If Defined flag Echo=%flag%
Timeout -1
Something like this :
set "FolderPath=E:\TestBat\Test\"
pushd "%FolderPath%"
for %%a in (*.xml) do (
if exist "%%~na.pdf" (
echo ok
) else (
rem do what you want here
echo Missing
)
)
popd
Is this what you want?
@echo off
setlocal enabledelayedexpansion
set "FolderPath=E:\TestBat\Test\"
echo !FolderPath!
for /f "usebackq delims=" %%f in (`dir !FolderPath! /B`) do (
set /p val=<%%f
For %%A in ("%%f") do (
Set Folder=%%~dpA
Set name=%%~nxA
)
echo Folder is: !Folder!
echo Name is: !name!
if NOT EXIST !FolderPath!!name!.xml set flag=MISSING
if NOT EXIST !FolderPath!!name!.pdf set flag=MISSING
)
echo Flag: !flag!
pause
endlocal
You should reformat your code, and keep in mind that syntax is critical in batch files. By the way, if you update a variable inside a block and read it later in the same block, you should enable delayed expansion (setlocal enabledelayedexpansion) and use ! instead of %.
Keep it simple:
@echo off
pushd "E:\TestBat\Test" || exit /B 1
for %%F in ("*.pdf") do if not exist "%%~nF.xml" echo %%~nxF
for %%F in ("*.xml") do if not exist "%%~nF.pdf" echo %%~nxF
popd
This returns all files that appear orphaned, that is, where the file with the same name but the other extension (.pdf, .xml) is missing. To implement a variable FLAG to indicate there are missing files, simply append & set "FLAG=missing" to each for line and ensure FLAG is empty initially. Then you can check it later by simply using if defined FLAG.
Note: This does not cover the e-mail notification issue. Since I do not know the BLAT tool you mentioned, I have no clue how you want to transfer the listed files to it (command line arguments, temporary file, or STDIN stream?).
In case there is a huge number of files in the target directory, another approach might be better in terms of performance, provided that the number of file system accesses is reduced drastically (note that the above script accesses the file system within the for loop body by if exist, hence for every iterated file individually). So here is an attempt relying on a temporary file and the findstr command:
@echo off
pushd "E:\TestBat\Test" || exit /B 1
rem // Return all orphaned `.pdf` files:
call :SUB "*.pdf" "*.xml"
rem // Return all orphaned `.xml` files:
call :SUB "*.xml" "*.pdf"
popd
exit /B
:SUB val_pattern_orphaned val_pattern_missing
set "LIST=%TEMP%\%~n0_%RANDOM%.tmp"
> "%LIST%" (
rem // Retrieve list of files with one extension:
for %%F in ("%~2") do (
rem /* Replace the extension by the other one,
rem then write the list to a temporary file;
rem this constitutes a list of expected files: */
echo(%%~nF%~x1
)
)
rem /* Search actual list of files with the other extension
rem for occurrences of the list of expected files and
rem return each item that does not match: */
dir /B /A:-D "%~1" | findstr /L /I /X /V /G:"%LIST%"
rem // Clean up the temporary file:
del "%LIST%"
exit /B
To understand how it works, let us concentrate on the first sub-routine call call :SUB "*.pdf" "*.xml" using an example; let us assume the target directory contains the following files:
AlOnE.xml
ExtrA.pdf
sAmplE.pdf
sAmplE.xml
So in the for loop a list of .xml files is gathered:
AlOnE.xml
sAmplE.xml
This is written to a temporary file but with the extensions .xml replaced by .pdf:
AlOnE.pdf
sAmplE.pdf
The next step is to generate a list of actually existing .pdf files:
ExtrA.pdf
sAmplE.pdf
This is piped into a findstr command line, that searches this list for search strings that are gathered from the temporary file, returning non-matching lines only. In other words, findstr returns only those lines of the input list that do not occur in the temporary file:
ExtrA.pdf
To finally get also orphaned .xml files, the second sub-routine call is needed.
Since this script uses a temporary file containing a file list which is processed once by findstr to find any orphaned files per extension, the overall number of file system access operations is lower. The weakest part however is the for loop (containing string concatenation operations).
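The same idea, building the list of expected names once and comparing it against the actual names, can be sketched in Python with a set lookup. The function find_orphans and its parameters are hypothetical names used for illustration; the comparison is case-insensitive, like the /I switch of findstr.

```python
from pathlib import PurePath


def find_orphans(names, ext_a=".pdf", ext_b=".xml"):
    """Return the names with ext_a or ext_b whose counterpart is missing."""
    names = list(names)
    # Lower-cased set of all names, for case-insensitive membership tests.
    present = {n.lower() for n in names}
    partner = {ext_a: ext_b, ext_b: ext_a}
    orphans = []
    for name in names:
        p = PurePath(name)
        ext = p.suffix.lower()
        # Build the expected counterpart name and look it up once.
        if ext in partner and (p.stem + partner[ext]).lower() not in present:
            orphans.append(name)
    return orphans
```

With the four example files above, find_orphans(["AlOnE.xml", "ExtrA.pdf", "sAmplE.pdf", "sAmplE.xml"]) returns ["AlOnE.xml", "ExtrA.pdf"]. In practice the names could come from os.listdir on the target directory.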

Creating input subfolder structure inside output folder

I have a batch script that:
reads input files from a folder
processes them
stores the output files in another folder
Example code:
set pathTmp=D:\a\b\c
set pathIn=%pathTmp%\in
set pathOut=%pathTmp%\out
for /f %%i in ('dir /b %pathIn%') do (
java XXX.jar %pathIn%\%%i >> %pathOut%\%%i
)
Now I'd like to modify it to read files from all subfolders of pathIn and put the output file in the same subfolder but under pathOut.
Example: if input file is in pathIn\zzz, the output file must be in pathOut\zzz.
How can I recreate the input subfolder structure inside output folder?
I would use xcopy together with the /L switch (to list files that would be copied) to retrieve the relative paths. For this to work, you need to change to the directory %pathIn% first and specify a relative source path (for this purpose, the commands pushd and popd can be used).
For example, when the current working directory is D:\a\b\c\in and its content is...:
D:\a\b\c\in
|   data.bin
+---subdir1
|       sample.txt
|       sample.xml
\---subdir2
        anything.txt
...the command line xcopy /L /I /S /E "." "D:\a\b\c\out" would return:
.\data.bin
.\subdir1\sample.txt
.\subdir1\sample.xml
.\subdir2\anything.txt
4 File(s)
As you can see, the paths are relative to the current directory. To get rid of the summary line 4 File(s), the find ".\" command line is used to return only those lines containing .\.
So here is the modified script:
set "pathTmp=D:\a\b\c"
set "pathIn=%pathTmp%\in"
set "pathOut=%pathTmp%\out"
pushd "%pathIn%"
for /F "delims=" %%I in ('xcopy /L /I /S /E "." "%pathOut%" ^| find ".\"') do (
md "%pathOut%\%%I\.." > nul 2>&1
java "XXX.jar" "%%I" > "%pathOut%\%%I"
)
popd
Additionally, I placed md "%pathOut%\%%I\.." > nul 2>&1 before the java command line so that the directory is created in advance; I am not sure whether this is needed, though. The redirection > nul 2>&1 suppresses all output, including error messages.
I put quotation marks around all paths in order to avoid trouble with white-spaces or any special characters in them. I also quoted the assignment expressions in the set command lines.
You need to specify the option string "delims=" in the for /F command line, because the default options tokens=1 and delims set to tab and space would split your paths unintentionally at the first white-space.
Note that the redirection operator >> means to append to a file if it already exists. To overwrite, use the > operator (which I used).
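For comparison, the relative-path technique can be sketched in Python, where relative_to does the job that xcopy /L does above. This is a sketch under assumptions, not the author's method: the generator name mirrored_outputs is hypothetical, and it assumes the output folder is not located inside the input folder.

```python
from pathlib import Path


def mirrored_outputs(path_in: Path, path_out: Path):
    """Yield (source, target) pairs, creating the target directories as needed."""
    for src in sorted(path_in.rglob("*")):
        if not src.is_file():
            continue
        # Path of the file relative to the input root, e.g. subdir1/sample.txt.
        rel = src.relative_to(path_in)
        dst = path_out / rel
        # Recreate the input subfolder structure below the output root.
        dst.parent.mkdir(parents=True, exist_ok=True)
        yield src, dst
```

Each yielded pair could then be handed to the processing step, for example by running the Java command with subprocess.run and redirecting its standard output to the target file.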
You could do something like this:
@setlocal EnableDelayedExpansion
@echo off
set pathTmp=D:\a\b\c
set pathIn=%pathTmp%\in
set pathOut=%pathTmp%\out
REM set inLength=ADD FUNCTION TO CALCULATE LENGTH OF PATHIN
for /f %%i in ('dir /b /s %pathIn%') do (
set var=%%i
java XXX.jar %%i >> %pathOut%\!var:~%inLength%!
)
This will strip the pathIn directory prefix from the absolute path, leaving only the relative path, and then append that relative path to the pathOut variable.
You would need to find or write a function to get the length of the pathIn string. Check out some solutions here.

Write script to search the solution

I'm working on removing a large number of old and unused images from our website. We run ASP.NET with C# code behind, and do our work out of Visual Studio (2013). Right now I'm just going through our images directory and searching the solution for the image file name. While we have some filenames that follow a pattern and can be done in a group using regex, this is still rather tedious. Is there a way that I can write a batch script (or anything) to search the solution for every file in this directory? I can imagine pseudocode like
for file in images_directory
if file not in solution
delete file
but is this possible?
Technically we're just moving the files into another folder to be safe, so I guess the actual pseudocode would be more like
for file in images_directory
if file not in solution
move file to backup_directory
Within your solution file, find all references to .csproj files. Within each .csproj file, find all include lines. Within each included file, find all lines containing references to images. Copy each relevant line to a temporary list. This will make searching faster than searching every .cs file multiple times for every image.
For each graphic file, use findstr to perform a regexp search for /\bfilename\b/i within the temporary list. If not found, use conditional execution to initiate a move of the orphaned image to backup.
Save this with a .bat extension, modify the first three set lines to appropriate values, and give it a shot. By default, it only pretends to move. If you're satisfied that the simulations will produce correct results, remove echo from the move line near the bottom to let the script off its leash.
@echo off
setlocal
set "image_dir=c:\path\to\images"
set "sln_file=c:\path\to\solution\Project1.sln"
set "backup_dir=c:\path\to\backup"
set "remember=%temp%\proj_images.txt"
for %%I in ("%sln_file%") do pushd "%%~dpI"
rem // .sln -> .csproj -> .cs -> images. Find image references and remember.
del "%remember%" >NUL 2>NUL
for /f "delims=" %%I in ('findstr /i ".csproj\>" "%sln_file%"') do (
rem // %%I contains lines matching /.csproj\b/ig
for %%p in (%%I) do if exist "%%~p" (
rem // %%p contains a .csproj filename
for /f "delims=" %%J in ('findstr /i "\<include\>" "%%~p"') do (
rem // %%J contains lines matching /\binclude\b/ig
for %%c in (%%J) do if exist "%%~c" (
rem // %%c contains the filename of an include
findstr /i ".png\> .jpg\> .gif\> .bmp\> .tif\>" "%%~c" >>"%remember%" && (
echo Images referenced within %%~nxc. I'll remember this.
)
)
)
)
)
rem // for each image file in image_dir (recursive)
for /r "%image_dir%" %%I in (*.png *.jpg *.gif *.bmp *.tif) do (
rem // regexp test for /\bfilename.ext\b/i
findstr /i "\<%%~nxI\>" "%remember%" >NUL || (
rem // non-zero exit status of findstr means not found
echo %%~nxI is not referenced by any files included in the solution's projects.
rem // *********************************************************
rem // REMOVE "ECHO" FROM THE FOLLOWING LINE TO ENABLE THE MOVES
rem // *********************************************************
echo move "%%~fI" "%backup_dir%"
)
)
del "%remember%" >NUL 2>NUL
echo Press any key to exit.
pause >NUL
Is this what you had in mind?
If I understand correctly, you first want to obtain all image files from the directory. Using PowerShell:
$imageFiles = Get-ChildItem 'path/to/image/directory' -Recurse | Where-Object { !($_.PSIsContainer) }
This grabs all files, excluding directories. Then:
$solutionText = Get-Content 'path/to/solution/file.csproj' | Out-String
ForEach ($file in $imageFiles ) {
if ($solutionText -notmatch [regex]::Escape($file.Name)) {
# Not referenced in the project file: move to another folder
}
}
The only issue is that you'd need to make sure that the filenames don't have a chance of matching elsewhere on the file giving false positives.
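One way to reduce such false positives is to match each file name as a whole token. The following Python sketch illustrates the idea under assumptions: the function unreferenced is a hypothetical name, and the text argument stands for whatever project or source text has already been read into a string.

```python
import re


def unreferenced(image_names, text):
    """Return the image names that never appear in text as a whole token."""
    missing = []
    for name in image_names:
        # re.escape makes '.' literal; the lookarounds reject matches that are
        # embedded in a longer name such as 'biglogo.png'.
        pattern = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
        if not re.search(pattern, text, flags=re.IGNORECASE):
            missing.append(name)
    return missing
```

The returned names are the candidates for moving to the backup folder; checking the list by eye before moving anything is a sensible first run.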

CMD delete files

Perhaps someone can be of help; I have several files with the following naming convention:
fooR1.txt, fooR2.txt, fooR3.txt, . . . , fooR1000.txt
I wish to delete all the files greater than R500. I have several folders and I know how to pass through each folder, but I am not sure how to capture and delete the files with replication 501 and greater. How can I do this?
How about simply:
ren fooR500.txt fooR499bis.txt
del fooR5??.txt fooR6??.txt fooR7??.txt fooR8??.txt fooR9??.txt fooR10??.txt
ren fooR499bis.txt fooR500.txt
Not elegant, but efficient.
This will delete all files fooR###.txt where ### is greater than 500.
@echo off
setlocal EnableDelayedExpansion
for %%f in (fooR*.txt) do (
set num=%%~f
set num=!num:~4,-4!
if !num! gtr 500 del /q "%%~f"
)
endlocal
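The numeric test in the loop above can also be sketched in Python, which makes the parsing of the number explicit. The function should_delete and its limit parameter are hypothetical names for illustration; names that do not have the exact form fooR followed by digits and .txt are never selected.

```python
import re

# fooR, then one or more digits, then .txt; nothing before or after.
_PATTERN = re.compile(r"fooR(\d+)\.txt", re.IGNORECASE)


def should_delete(name: str, limit: int = 500) -> bool:
    """True when name is fooR<number>.txt and <number> exceeds limit."""
    m = _PATTERN.fullmatch(name)
    # int() reads the digits as a decimal number, leading zeros included.
    return m is not None and int(m.group(1)) > limit
```

A caller could filter os.listdir output with this predicate and print the selected names first, in the same dry-run spirit as the echo del trick used elsewhere in this thread.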
Because your range is open, I've reversed your criteria: delete anything that is not in the range 1-499. Please be aware that this is not exactly equivalent to yours, for example it will also delete a file named fooR001.txt or fooR_something_else.txt
It's also pretty slow.
@echo off
for %%F in (fooR*.txt) do (
echo %%F | findstr /v /r "fooR[1-9]\.txt fooR[1-9][0-9]\.txt fooR[1-4][0-9][0-9]\.txt" >nul && echo del %%F
)
The first line (for) enumerates files starting with fooR; then, for each file, findstr checks whether it does not match the pattern (/v option); finally, a command is executed if the check (i.e. does not match) is positive (&& means execute only if the previous command was successful).
Code above will just echo commands, not execute them, so you may safely run it to verify it actually behaves as it should. To actually run delete, just remove echo in front of it.
note: you could actually run this directly from command line in a form of:
@for %F in (fooR*.txt) do @echo %F | findstr /v /r "fooR[1-9]\.txt fooR[1-9][0-9]\.txt fooR[1-4][0-9][0-9]\.txt" >nul && echo del %F
You would need to make a Batch script for this. Then in the Batch file you could write.
DEL "fooR500.txt"
To delete all files with a .txt ending you would just write:
DEL "*.txt"
That's all I know, but if you want it to handle files 501 and higher, you would have to create a variable in batch that holds the value 501 using:
set "Value=501"
and then have it delete the file "fooR" + Value + ".txt"; to do that you would have to do:
set "FilePre=fooR"
set "FileW=%FilePre%%Value%"
set "Ex=.txt"
set "FileX=%FileW%%Ex%"
del "%FileX%"
Then you will have to make Value go up by one and repeat the process until it reaches 1000.
