Windows scripting string manipulation to remove alpha characters - windows

I am a Linux guy and trying to learn batch scripting.
I have the following requirement to manipulate the string.
set string=1.23.10xxxx2
I want to remove the alpha characters from the above string.
I need the output to be 1.23.102 or 1.23.10; either is fine. Could anyone please help me?

@echo off
set remove=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
set string=1.23.10xxxx2
for /F "tokens=1,2 delims=%remove%" %%a in ("%string%") do (
echo Part before removed chars: %%a
echo Part after removed chars: %%b
echo Both parts: %%a%%b
)

If the format has a consistent length, then you can simply use sub-string operations.
To get 1.23.10:
set "string=%string:~0,7%"
To get 1.23.102:
set "string=%string:~0,7%%string:~-1%"
If you are simply removing the character x, then use search and replace (always case insensitive):
set "string=%string:x=%"
All of the above are described in the help, accessed by help set or set /?.
But I suspect that none of the above will meet your needs. There is nothing built into batch to conveniently search and replace ranges of characters. You could use a FOR loop to iteratively search and replace each letter. This requires delayed expansion because normal expansion occurs at parse time, and the entire FOR construct is parsed in one pass.
setlocal enableDelayedExpansion
for %%C in (
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
) do set "string=!string:%%C=!"
The above works, but it is relatively inefficient.
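As a minimal, self-contained sketch of the loop above, the following script applies it to the sample value from the question. The sample value and the final echo are the only additions; everything else follows the snippet above.

```batch
@echo off
rem Sketch: strip every letter from a sample value, one letter per pass.
setlocal enableDelayedExpansion
set "string=1.23.10xxxx2"
for %%C in (A B C D E F G H I J K L M N O P Q R S T U V W X Y Z) do (
    rem Search and replace is case insensitive, so X also removes x.
    set "string=!string:%%C=!"
)
rem Prints 1.23.102
echo !string!
```

Because the replacement is case insensitive, 26 passes cover both upper and lower case letters.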
There are any number of 3rd party tools that could efficiently solve the problem. But non-standard executables are forbidden in some environments. I've written a hybrid batch/JScript utility called REPL.BAT that works extremely well for this problem. It works on any modern Windows machine from XP onward. Click the link to get the script. Full documentation is built into the script.
Assuming REPL.BAT is either in your current directory, or better yet, somewhere in your PATH, then the following will work:
for /f "eol=a delims=" %%S in ('repl "[a-zA-Z]" "" s string') do set "string=%%S"

You can use GNUWin32 sed:
@ECHO OFF &SETLOCAL
set "string=1.23.10xxxx2"
FOR /f %%a IN ('echo %string% ^| sed "s/[a-zA-Z]\+//"') DO set "newstring=%%a"
ECHO %newstring%

Related

List of attributes for For Loop in cmd

I am looking for a list of attributes and what they do in a for loop inside the command prompt.
Specifically I have a .bat file that copies a file from the root of the C:\ drive and pastes it inside all folders found in a pre-specified directory (i.e. C:\Users\John\Test Directory).
This is the command:
@echo off
for /D %%a in (C:\Users\John\Test Directory\*.*) do xcopy /y /d C:\test_file.txt "%%a\"
The .bat does exactly what I need it to do, but I do not understand what the "%%a" does in the command. I see similar commands that use %%g, %%f, etc, but nothing that defines why those were chosen or what they specifically do. Are those attributes arbitrary or do they have a defined function? I seemingly can't find any information on the attributes so any insight is appreciated!
Arbitrary. You can use any letter, upper or lower, and even symbols.
for %%# in... do command %%#
would work just as well. But when working with multiple tokens per iteration, it's better to use the alphabet. Here's an example why:
for /f "usebackq tokens=1* delims==" %%I in ("textfile.txt") do (
set "config[%%~I]=%%~J"
)
This is because %%I contains the text matched prior to the first equal sign, and %%J contains everything after the first equal sign. This answer shows that example in context.
The answer to your question is hinted at in the for command's documentation. Run help for in the cmd console for full details. Specifically:
Some examples might help:
FOR /F "eol=; tokens=2,3* delims=, " %i in (myfile.txt) do @echo %i %j %k
would parse each line in myfile.txt, ignoring lines that begin with
a semicolon, passing the 2nd and 3rd token from each line to the for
body, with tokens delimited by commas and/or spaces. Notice the for
body statements reference %i to get the 2nd token, %j to get the
3rd token, and %k to get all remaining tokens after the 3rd.
This page explains further:
FOR Parameters
The first parameter has to be defined using a single character, for example the letter G.
FOR %%G IN ...
In each iteration of a FOR loop, the IN ( ....) clause is evaluated and %%G set to a different value
If this clause results in a single value then %%G is set equal to that value and the command is performed.
If the clause results in a multiple values then extra parameters are implicitly defined to hold each. These are automatically assigned in alphabetical order %%H %%I %%J ...(implicit parameter definition)
If the parameter refers to a file, then enhanced variable reference can be used to extract the filename/path/date/size.
You can of course pick any letter of the alphabet other than %%G.
%%G is a good choice because it does not conflict with any of the pathname format letters (a, d, f, n, p, s, t, x) and provides the longest run of non-conflicting letters for use as implicit parameters.
G > H > I > J > K > L > M
Format letters are case sensitive, so using a capital letter is also a good way to avoid conflicts %%A rather than %%a.
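The implicit parameter definition described above can be seen in a small sketch. The sample string is made up for illustration; only %%G is declared, and the following letters are assigned automatically.

```batch
@echo off
rem Only the first parameter is declared in the FOR statement.
rem Asking for three tokens makes cmd define the next two letters implicitly.
for /f "tokens=1-3 delims=," %%G in ("red,green,blue") do (
    echo First : %%G
    echo Second: %%H
    echo Third : %%I
)
```

Run as a batch file, this should print red, green and blue on three labelled lines.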
Just in the interest of thoroughness, it should be pointed out that:
when using for in a cmd console, use single percents.
when using for in a bat script, use double percents.
Use a tilde when retrieving the iterative variable to strip surrounding quotation marks from the value. (e.g. "Hello world!" becomes Hello world!). It's convenient to use this to force a desired format. "%%~G" would always be quoted, whether the captured value was quoted or not. You should always do this when capturing file names with a for loop.
Tilde notation also allows expanding paths, retrieving file size, and other conversions. See the last couple of pages of help for in a cmd console for more details.
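A short sketch of the tilde modifiers mentioned above, assuming a made-up path that does not need to exist for the drive, path, and name modifiers to be parsed:

```batch
@echo off
rem The path below is hypothetical and used only for illustration.
for %%G in ("C:\Some Folder\report.txt") do (
    echo Raw value : %%G
    echo Unquoted  : %%~G
    echo Name+ext  : %%~nxG
    echo Drive+path: %%~dpG
)
```

The first line keeps the surrounding quotes, the second strips them, and the last two pull out pieces of the path.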

Batch wildcard strange behavior

I have a list of files that I'm looping through that match a certain size letter (A,B,C,D). The files are of the form ###T#####A###_# rev 1.dxf, where the rev 1 is only there some of the time, and A refers to the size, which is A, B, C, or D. When I try to loop through these in a set D.dxf or B.dxf, some A files are also found. I currently use the pattern ?????????A*.dxf, but would like to expand this to more file types without having to make multiple batch files. Interestingly, if I use the pattern TA*.dxf, the wildcard behaves normally.
Why does this happen, and how can I fix it while still being able to catch files where the A may be at the beginning, end, middle, etc.? If you need any clarification or extra information, feel free to ask.
Here is my relevant code:
FOR %%S IN (A,B,C,D) DO (
echo Converting size %%S. . .
FOR %%F in ("%filepath%\?????????%%S*.dxf") DO (
echo Converting %%~nxF to PDF, size %%S
SET %%S=!%%S! "%%~pF%%~nF.pdf"
"C:\Program Files\AutoDWG\AutoDWG DWG to PDF Converter\d2p.exe" /InFile %%~fF /OutFile %%~nF.pdf /Watermark %~dp0%%Swatermark.wdf /InConfigFile %~dp0%%S.ddp
)
echo:
echo Combining %%Ss. . .
pdftk !%%S! cat output "%filepath%\print\%%Ss.pdf"
echo Combined
echo:
)
EDIT: I'm running this on 32-bit Windows XP. Does this have anything to do with this thread? I will investigate when I get home.
EDIT 2: I've now figured out what the problem is. When I have several files with the same beginning characters, the 8.3 short names contain a hexadecimal number, which may match one of the letters I'm searching for. How can I discard short name matches in my for loop?
Your link to the Strange Windows DIR command behavior thread seems to be a good thought. From RBerteig's thorough answer: Wild cards at the command prompt are matched against both the long file name and the short "8.3" name if one is present.... Try the following approach:
SETLOCAL enableextensions enabledelayedexpansion
:::
pushd %filepath%
FOR %%F in ("*.dxf") DO (
set "fname=%%~nF"
set "fmatch="
set "char04=!fname:~3,1!"
set "char10=!fname:~9,1!"
if /I "!char04!"=="T" (
FOR %%S in (A B C D) do if /I "!char10!"=="%%S" set "fmatch=!fname!"
)
if defined fmatch (
echo Converting %%~nxF to PDF, size !char10!
rem other processing goes here
)
)
popd

dos command findstr search string within specified bytes

Could I use findstr to search for a string between specified bytes/positions only?
For example, I have a text file in which each line has a maximum of 1000 bytes.
I want to find lines that contain a string between byte numbers 50 and 100 only.
Your problem would be relatively easy to solve using regular expressions (regex). But unfortunately, FINDSTR support for regex is extremely limited. It does not have the features needed to solve your problem.
You could use grep for Windows instead of FINDSTR, but that requires a download.
Assuming you want to find my string somewhere between positions 50 and 100 on any line within "file.txt":
grep "^.\{49,91\}my string" file.txt
Another option is to switch to another scripting language with full support for regex. JScript, VBScript, and PowerShell can all be used to easily solve this problem.
A pure native batch solution requires a non-trivial script, and is much slower. Here is one possible solution:
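@echo off
setlocal disableDelayedExpansion
rem Read each line verbatim: no delimiters and no EOL character
for /f delims^=^ eol^= %%L in (file.txt) do (
set "ln=%%L"
setlocal enableDelayedExpansion
rem Test only positions 50 through 100: offset 49, length 51
set "test=!ln:~49,51!"
if defined test if "!test:my string=!" neq "!test!" echo !ln!
endlocal
)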

File rewriting: One line is greater than variable's max size. Workaround?

I need to replace a single line in a file.
Generally, this code works fine:
(The actual specifics of what this block is doing are not necessary for this question).
for /F "tokens=1* delims=:" %%a in ('findstr /N "^" %DATA%') do (
if %%a equ %TargetLine% (
echo !insert!>>%filepath%cc.tmp
) else (
if [%%b]==[] (echo.>>%filepath%cc.tmp) else (echo %%b>>%filepath%cc.tmp)
)
)
Unfortunately, each line's content is assigned to %%b, which like any other variable can only store a maximum length of 8,192 characters (thanks dbenham for that tidbit, it comes in useful now).
So what options do I have when the line is greater than 8,192 characters (23,708 in this case)?
Before you ask: no, it cannot be split across lines; it is a 10k JSON array encoded in Base64, which is then written into a ByteArray.
I assume that the way to go is regular expressions. Is that a correct assumption, or is there another workaround?
Thanks.
You could solve this with pure batch!
:readLongLine
< longline.tmp (
for /L %%n in (1 1 20) do set /p part[%%n]=
)
After this, your line is split into the variables part[1] .. part[20].
To write this to a new file, you could use:
:writeLongLine
<nul (
for /L %%n in (1 1 19) do set /p ".=!part[%%n]!"
(echo !part[20]!)
) > longLine2.tmp
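The write step relies on set /p printing its prompt without a trailing newline when input comes from nul. A minimal sketch of just that behaviour, using a hypothetical file name demo.tmp and made-up fragments:

```batch
@echo off
rem demo.tmp is a hypothetical scratch file used only for this sketch.
<nul (
    rem Each set /p writes its prompt text and no line break.
    set /p ".=first "
    set /p ".=second "
    rem The final echo supplies the single line terminator.
    echo third
) > demo.tmp
type demo.tmp
```

The three fragments should end up on one line of demo.tmp, which is the same mechanism the :writeLongLine block uses to rejoin the parts.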
You could use some other scripting language like VBScript, JScript, or PowerShell.
If you want to remain in the batch world, you can use a handy hybrid JScript/batch utility called REPL.BAT that performs regex search and replace on stdin and writes result to stdout. It is quite efficient, and works on any Windows machine from XP onward. It is pure script, so no exe download required. You can get REPL.BAT here. Full documentation is embedded within the script.
Simply use sed, awk or Perl for the job.

Windows bat file equivalent of bash string manipulation

How would I achieve this:
for i in *.e; do mv $i ${i%-b*.e}.e; done
in a Windows batch file? (It renames files containing "-b" to the part before "-b". Note that this is not necessarily the end of the string! e.g. "file-b-4.e" will become "file.e")
If you really want to do this in batch, this should work
@echo off
setlocal disableDelayedExpansion
for %%F in (*.e) do (
set "var=%%~F"
setlocal enableDelayedExpansion
set "var=!var:-b=.e:!"
for /f "eol=: delims=:" %%A in ("!var!") do (
endlocal
echo ren "%%F" "%%A"
)
)
Edit
The comment by panda-34 alluded to the fact that the original posted code failed if the file name begins with -b. The code above was fixed by incorporating the extension into the replacement string. (thanks panda-34 for alerting me to the problem)
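To make the replacement trick in the first script easier to follow, here is a sketch that runs it on a single hard-coded name instead of real files. The name file-b-4.e comes from the question; the echo labels are illustrative.

```batch
@echo off
setlocal enableDelayedExpansion
rem Sample name taken from the question; no file is touched.
set "var=file-b-4.e"
rem Replace every -b with .e followed by a colon.
set "var=!var:-b=.e:!"
echo After replace: !var!
rem A colon cannot appear in a file name, so it is a safe delimiter.
rem The first token is the new name, extension included.
for /f "eol=: delims=:" %%A in ("!var!") do echo First token  : %%A
```

This prints file.e:-4.e after the replacement and file.e as the first token.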
panda-34 also provided an alternate solution that uses command injection with search and replace. The injected command is the REM statement.
The panda-34 solution works as long as the file name does not contain & or ^ characters, but fails if it does.
Below is a modified version of the command injection technique that should work with all valid Windows file names. There are 2 critical mods, 1) make sure the special chars in the file name are always quoted, and 2) do not pass the value as a CALL argument, otherwise ^ will be doubled to ^^.
@echo off
setlocal disableDelayedExpansion
for %%i in (*-b*.e) do (
set old="%%~ni"
call :ren_b
)
exit /b
:ren_b
set v=%old:-b=.e"&rem "%
echo ren "%old:~1,-1%.e" %v%
exit /b
Final Edit (I hope):
As baruch indicates in his comment, the solutions above remove starting with the 1st occurrence, whereas the original bash command removes starting with the last occurrence.
Below is a version that should be an exact equivalent of the original bash command.
@echo off
setlocal disableDelayedExpansion
set "search=-b"
for %%A in (*%search%*.e) do (
set "old=%%A"
setlocal enableDelayedExpansion
set "new=\_!old:%search%=\_!"
for %%B in ("!new!") do (
endlocal
set "new=%%~pB"
setlocal enableDelayedExpansion
set "new=!new:~2,-1!.e"
echo ren "!old!" "!new:\_=%search%!"
endlocal
)
)
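The script above works by disguising the name as a path: each -b becomes \_, so the %%~p modifier returns everything up to the last occurrence. A simplified sketch on one made-up name, assuming the name contains no ! characters (delayed expansion stays enabled throughout, unlike in the full script):

```batch
@echo off
setlocal enableDelayedExpansion
rem Hypothetical sample name with two occurrences of -b.
set "old=a-b1-b2.e"
rem Turn each -b into \_ and add a leading \_ so the value looks like a path.
set "new=\_!old:-b=\_!"
echo Disguised: !new!
rem The p modifier keeps only the path part, up to the last backslash.
for %%B in ("!new!") do set "new=%%~pB"
echo Path part: !new!
rem Drop the leading \_ and the trailing backslash, then restore the extension.
set "new=!new:~2,-1!.e"
rem Convert any remaining \_ markers back into -b.
echo Result   : !new:\_=-b!
```

Only the text after the last -b is dropped, which is what the bash pattern ${i%-b*.e} does.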
Simple, really
for %%i in (*-b*.e) do call :ren_b %%~ni
goto :eof
:ren_b
set v=%*
set v="%v:-b=.e" ^& rem %
ren "%*.e" %v%
Here's a variant to keep the name up to the last -b occurrence
setlocal enabledelayedexpansion
for %%i in (*-b*.e) do (
set v=%%~ni
set v=!v:-b=\!
for %%j in ("\!v!") do (
set v=%%~pj
set v=!v:~1,-1!
set v=!v:\=-b!
ren "%%i" "!v!.e"
)
)
It will fail for names containing ! and starting with -b.
P.S. I didn't see that dbenham had already provided the equivalent solution, probably with more provisions for edge cases in file names.
Forget it, some convenient things cannot be done in NT scripting. What you are asking here is not possible to my knowledge. And I've written and maintained complex NT scripts bigger than 50 KiB, using all kinds of tricks. The book "Windows NT Shell Scripting" points out many of these, for the same and more see Rob van der Woude's scripting pages.
I reckon you could do part of this, but certainly not in a one-liner due to how variable expansion works in NT scripting. For example you could extract the part of the string that you expect to be -b and check whether it is -b, then extract the other parts and rename from the original name to the one that is comprised of only the extracted parts.
But you'll likely need ten to fifteen lines to achieve that. In that light, consider using a different scripting language for the purpose. Especially if this is a modern Windows version.
I realize this is not the desired answer (i.e. that this is possible and a sample), but cmd.exe is very limited compared to Bash, albeit by far not as limited as some opponents of traditional batch scripting are pointing out.

Resources