Batch string replace after special character - windows

In batch, how can I replace the substring after a special character?
@echo off
set var1=abc_123
set var2=%var1:*_=%
echo %var2%
set var1=abc_123
set var3=%var1:_*=%
echo %var3%
output:
123
abc_123
In set var2=%var1:*_=%, the * matches the abc, but in set var3=%var1:_*=% it doesn't work!
What's the difference between the two usages?
How can I use * or something similar to remove the _123?

With the search/replace syntax this is not possible, because the asterisk acts as a wildcard only when it is the first character of the search expression; at any other position it is treated as a normal character.
For a single delimiter character, however, it can be done with a FOR /F loop.
The loop splits the text into tokens at the delimiter character(s).
FOR /F "tokens=1,* delims=_" %%L in ("%var1%") DO (
    set var3=%%L
)
echo %var3%
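To see the two positions of the asterisk side by side, here is a minimal sketch. The sample value abc_*123 is an assumption chosen for illustration, so that a literal * is actually present in the string:

```batch
@echo off
setlocal
rem Hypothetical sample value containing a literal asterisk.
set "var1=abc_*123"
rem Leading asterisk: wildcard; removes everything up to and including the first "_".
echo %var1:*_=%
rem Asterisk not in first position: literal; only the exact text "_*" is removed.
echo %var1:_*=%
endlocal
```

The first echo should print *123 and the second abc123. This is also why %var1:_*=% leaves abc_123 unchanged: that value contains no literal _* to replace.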

If you're looking for simpler, then perhaps this is what you're looking to achieve:
Set "var1=abc_123"
Echo(%var1%
Set "var2=%var1:*_=%"
Echo(%var2%
Set "var3=%var1:_="&:"%"
Echo(%var3%
Simpler of course refers only to using expansion and replacement, not the understanding of the technique. If you want to learn more about the technique, please take a look at the examples in this external site thread.

Related

How can I get a string between two quotes in a batch file?

I have a string in a batch file, of the structure
[[status]]:{"01bcd123-1234-5678-0000-abcdefghijkl": "11"}
I need to get just the 01bcd123-1234-5678-0000-abcdefghijkl out of it, but trying to use " as a delimiter doesn't turn out well. \ and ^ don't seem to escape it properly.
set i=1
set "x!i!=%x:"=" & set /A i+=1 & set "x!i!=%"
Is what I have with x being the whole string, attempting to parse it into x1, x2 etc with " as the delimiter.
What is a proper way to split this string, using " as the delimiter?
Edit: Powershell tag is because I am running the script as part of a larger orchestration in Powershell and could export the functionality of the batch script into it if necessary.
Here are two approaches. The first one doesn't mess with the for syntax format, but it's risky - too much dependence on the string (the quotes are actually stripped by %%~). The second one is an ugly non-intuitive syntax, but actually delimits by quotes:
set "string=[[status]]:{"01bcd123-1234-5678-0000-abcdefghijkl": "11"}"
for /f "tokens=2 delims=:{" %%a in ("%string%") do @echo %%~a
for /f tokens^=2delims^=^" %%a in ("%string%") do @echo %%a
Well, the self-expanding code you have posted works fine, provided that delayed expansion is enabled by placing the statement setlocal EnableDelayedExpansion before it. The string of interest is then stored in variable x2. Note that when the script terminates, x2 (like all the other x# variables) is no longer available, since an implicit endlocal is executed then. To avoid that, place endlocal & set "x2=%x2%" in the last line:
@echo off
rem // Define string to parse:
set "x=[[status]]:{"01bcd123-1234-5678-0000-abcdefghijkl": "11"}"
rem // Enable delayed expansion:
setlocal EnableDelayedExpansion
rem // Initialise index counter:
set i=1
rem // Split string using self-expanding code:
set "x!i!=%x:"=" & set /A i+=1 & set "x!i!=%" & rem // (unbalanced `"`!)
rem // Display all `x#` variables:
set x
rem // Make `x2` survive the `endlocal` barrier:
endlocal & set "x2=%x2%"
rem // Return the retrieved value:
echo(%x2%
However, I would most probably use a for /F loop, but not with " as delimiter since the syntax appears quite odd then; rather I would use :, {, } and SPACE as delimiters. But I would remove the prefix [[status]] in advance:
@echo off
rem // Define string to parse:
set "x=[[status]]:{"01bcd123-1234-5678-0000-abcdefghijkl": "11"}"
rem /* At first, split off everything up to the first occurrence of `]]`;
rem if there is no such prefix, there is no harm, because nothing happens;
rem then extract the first token that is delimited by `:`, `{`, `}` or space;
rem that way there may even be spaces around the `:` or around `{` or `}`;
rem then return it with surrounding quotation marks removed (`~`-modifier): */
for /F "tokens=1 eol=: delims=:{} " %%I in ("%x:*]]=%") do echo(%%~I
N. B.:
The odd-looking syntax echo( is not a typo, it is actually the only safe way to echo an arbitrary string (even on, off or /?); take a look at this external thread for more details.
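As a small illustration of the echo( remark, the following sketch prints two values that a plain echo would misinterpret. The variable names word and empty are hypothetical and exist only for this example:

```batch
@echo off
setlocal
rem Hypothetical values that a plain "echo" would misinterpret.
set "word=on"
set "empty="
rem Prints the word "on" instead of switching command echoing on.
echo(%word%
rem Prints an empty line instead of the "ECHO is off." status message.
echo(%empty%
endlocal
```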
Since you tagged PowerShell, you can use the following regex, but I am not sure you want PowerShell based on the question.
[regex]::Match('[[status]]:{"01bcd123-1234-5678-0000-abcdefghijkl": "11"}','(?<=")[^"]+(?=")').Value
Split regex can also work:
('[[status]]:{"01bcd123-1234-5678-0000-abcdefghijkl": "11"}' -split '"')[1]
If you stick with a batch file, Stephan's helpful answer is definitely the simplest and fastest solution.
Needless to say, if you port your batch file to PowerShell, you'll have vastly more functionality at your disposal.
You can even harness that functionality from a batch file via PowerShell's CLI, by calling powershell.exe (Windows PowerShell) or pwsh.exe (PowerShell Core), but that comes with two caveats:
Doing so creates a PowerShell child process, whose startup time is not insignificant.
Getting nested quoting right can be a challenge, as shown below.
Here's a solution that calls PowerShell's CLI from a batch file, applying the -split technique from AdminOfThings' helpful answer; again, this solution would be overkill in the case at hand, but the approach may be of interest if you need to perform tasks that simply cannot be done in the batch language or would be too cumbersome.
@echo off
setlocal
:: # The input text.
set txt=[[status]]:{"01bcd123-1234-5678-0000-abcdefghijkl": "11"}
:: # Call the PowerShell CLI to extract the token of interest and save the
:: # result in variable %id%.
:: # In PowerShell code, the equivalent would be:
:: # $id = ($txt -split '"')[1]
for /f %%i in ('powershell -noprofile -c "('%txt:"=\"%' -split '\""')[1]"') do set id=%%i
:: # Echo the result.
echo %id%
Note the need to \-escape the " chars. embedded in %txt%, via substitution %txt:"=\"%, and the need for an additional " char. after \" in '\""' so as to prevent the for command from breaking.

"delims=#+#" - more then 1 character as delimiter

Is it possible to define a delimiter which is not limited to 1 character? Based on the title's example, I would like to define my separator as e.g. '#+#'. Textfiles/lines can contain both characters, but there is very little chance you'll come across that particular substring/text combo.
No, you can not use a string as a delimiter in the delims= clause. Of course you can include the string, but it will be handled as a set of separate characters that will be used as delimiters, not as a delimiter string.
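A minimal sketch of that behaviour, using a made-up input string: the text contains no #+# sequence at all, yet it is still split, because # and + each act as a delimiter on their own.

```batch
@echo off
setlocal
rem "delims=#+#" is read as the character set {#, +}, not as the string "#+#".
rem The input string is a hypothetical example.
for /f "tokens=1-3 delims=#+#" %%a in ("one#two+three") do (
    echo [%%a] [%%b] [%%c]
)
endlocal
```

This should print [one] [two] [three].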
If you really need to split on a string, the fastest approach could be to replace the delimiter string with a character that does not occur in the data and use this character as the delimiter:
@echo off
setlocal enableextensions disabledelayedexpansion
for /f "delims=" %%a in ("this is a +test!! #+# of string #splitting#") do (
    set "buffer=%%a"
    setlocal enabledelayedexpansion
    (for /f "tokens=1,2 delims=¬" %%b in ("!buffer:#+#=¬!") do (
        endlocal
        echo full line : [%%a]
        echo first token : [%%b]
        echo second token : [%%c]
    )) || if "!!"=="" endlocal
)
Note: The setlocal enabledelayedexpansion is needed to be able to read the variable changed inside the for loop retrieving the data (here simulated by directly including a string). Then, inside the for loop that tokenizes the line that was read, delayed expansion is disabled to avoid problems with the ! characters (if delayed expansion is active, they will be consumed by the parser). This is the reason for the endlocal inside the loop.
As we are doing a string replacement and it is possible to end with a string composed of only delimiters, it is possible that the do clause of the inner for will not be executed, so the final if is included to ensure that the enabledelayedexpansion is cancelled.
I recently discovered an interesting trick that allows a multi-character string to be used as the delimiter for splitting a larger string in a very simple way; the method does not use any for command, but performs the split in just one line! Here it is:
@echo off
set "str=this is a +test!! #+# of string #splitting#"
set "first=%str:#+#=" & set "last=%"
echo full line : [%str%]
echo first token : [%first%]
echo last token : [%last%]
This method also allows splitting a large string into several parts and storing all of them in an array. Further details at dos batch iterate through a delimited string
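A minimal sketch of the array variant, combining the one-line replacement with a counter and delayed expansion. The variable names piece and i and the sample string are assumptions for illustration, and the sample deliberately avoids ! characters, which delayed expansion would consume:

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical input with two occurrences of the "#+#" delimiter.
set "str=one#+#two#+#three"
set i=1
rem Each "#+#" is replaced by code that closes the current assignment,
rem increments the counter and opens the next assignment.
set "piece!i!=%str:#+#=" & set /A i+=1 & set "piece!i!=%"
rem List every variable whose name starts with "piece".
set piece
endlocal
```

This should list piece1=one, piece2=two and piece3=three.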

How can I make a batch file that will tell me which lines of a text file are NOT in another file?

What I'm trying to do is take a text file with a bunch of strings to search for, each on its own line, and search for each one of these strings in a file (check.txt). I want the output to be a text file with a list of all the strings that COULDN'T be found.
I've tried a few things so far.
for /F "tokens=*" %%A in search.txt do (
    @echo on
    FINDSTR %%A check.txt
    IF ERRORLEVEL 1 echo %%A FAIL > fail_match.txt
)
Another attempt I made (this one was just to tell me if the whole list was good or not) was
@echo on
FINDSTR /g:search.txt check.txt > a_match.txt
IF ERRORLEVEL 1 echo bad > a_match.txt
I realize that these are incredibly basic, and I'm sure there's some easy answer that I just don't understand. I'm not a programmer; I just want to make my job a lot easier (and faster).
To clarify, my list of things to search for is in search.txt, my list of things to check them against is check.txt. Check.txt is a json file, so it's all one enormous line. I don't know if that will make a difference or not. I want a list of all lines in search.txt that are not in check.txt.
Your search scheme seems naive on two fronts:
1) JSON is not guaranteed to be a single line. Valid JSON may have any amount of whitespace, including newlines. This could cause problems if your search string logically matches across multiple lines.
2) What about substring matches? Suppose one search string is bat, and your JSON contains bath. I doubt you would want to consider that a match.
It is possible that neither of the above concerns is a problem in your case. Assuming they aren't, there may be a fairly simple solution using FINDSTR.
You were close on your first try, except
A) - Your FOR /F IN() clause is missing parentheses
B) - You want to force each search string to be interpreted as a string literal, possibly with spaces. That requires the /C: option.
C) - You assume leading spaces are not significant in your search string ("tokens=*" strips leading spaces)
D) - You assume no search lines begin with a semicolon. (The default EOL character is the semicolon, and FOR /F skips all lines that begin with the EOL character)
E) - Quotes and backslashes must be escaped within a search string: \" -> \\\\\", \ -> \\, " -> \". See What are the undocumented features and limitations of the Windows FINDSTR command? for more information.
Points C) and D) may be fixed by disabling EOL and DELIMS using the following odd syntax:
for /f delims^=^ eol^= %%A in ...
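The effect of points C) and D) can be observed with a small sketch that writes a scratch file and reads it back twice. The file name eol_demo.txt and its two lines are assumptions for illustration; note that the FOR /F option itself is spelled eol:

```batch
@echo off
setlocal
rem Hypothetical scratch file used only for this demonstration.
set "demo=%temp%\eol_demo.txt"
>"%demo%" (
    echo ;starts with a semicolon
    echo    has leading spaces
)
echo Default options with "tokens=*":
for /f "usebackq tokens=*" %%A in ("%demo%") do echo [%%A]
echo With DELIMS and EOL disabled:
for /f usebackq^ delims^=^ eol^= %%A in ("%demo%") do echo [%%A]
del "%demo%"
endlocal
```

The first loop should skip the semicolon line and strip the leading spaces of the other line, while the second loop should return both lines unchanged.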
Point E) can be addressed by defining a variable and adding escape sequences via search and replace. This requires delayed expansion, but delayed expansion will corrupt FOR /F variables upon expansion if they contain !. So delayed expansion must be strategically toggled on and off within the loop.
Instead of using IF ERRORLEVEL n, you can use conditional command concatenation || to take action if the previous command failed.
You don't need to see the output of the FINDSTR command, so that can be redirected to NUL.
You can improve performance by redirecting just once, outside the loop.
@echo off
setlocal disableDelayedExpansion
>fail_match.txt (
    for /f delims^=^ eol^= %%A in (search.txt) do (
        set "search=%%A"
        setlocal enableDelayedExpansion
        set "search2=!search:\"=\\"!"
        set "search2=!search2:\=\\!"
        set "search2=!search2:"=\"!"
        findstr /c:"!search2!" check.txt >nul || echo !search!
        endlocal
    )
)
If none of your search strings begin with ;, and no search string contains " or \, then the solution can be as simple as:
@echo off
setlocal disableDelayedExpansion
>fail_match.txt (
    for /f "delims=" %%A in (search.txt) do findstr /c:"%%A" check.txt >nul || echo %%A
)
If I read your question right (output all lines of check.txt that are not in search.txt), this single line should do:
findstr /v /x /g:search.txt check.txt > nomatch.txt

Find & copy a string in a file using only Windows batch

I call the file I want to search in input.txt and the string I want to find mystring.
Example content of input.txt (real input.txt)
randomstring1<>"\/=:
randomstring2<ORIGINAL>mystring</ORIGINAL>randomstring3
mystring is surrounded by the strings <ORIGINAL> and </ORIGINAL> that must be searched for
The string between both ORIGINAL-tags should be copied to clipboard (using | clip)
mystring and the tags occur only once. But they have no fixed position
all strings can contain special characters (<, >, ", \, /, =, :)
I read a lot of other SO questions but to be honest: the FOR-loop and SET-command syntax was too awkward for me. I guess my best shot will be the FINDSTR command. But maybe it is also possible with some help of RegEx expressions.
I do not want to use VBscript, Powershell, SED, FART, AWK, grep or any other additional tool.
Please be so kind and explain the difficult parts if you post a solution.
I want to understand it and maybe its helpful for others too.
My last attempt before I've given up was this test.cmd
@echo off
set "x=randomstring1<>"\/=:randomstring2<ORIGINAL>mystring</ORIGINAL>randomstring3"
set "x=%x:*<ORIGINAL>=%"
set "x=%x:</ORIGINAL>*=%"
set x=%x:~2%
echo %x%
pause
@echo off
rem Let findstr find the LINE you want (only once):
for /F "delims=" %%a in ('findstr "<ORIGINAL>" input.txt') do set "line=%%a"
ECHO LINE: "%line%"
rem Replace the left delimiter with {
set "line=%line:<ORIGINAL>={%"
rem Replace the right delimiter with }
set "line=%line:</ORIGINAL>=}%"
ECHO STRING DELIMITED: "%LINE%"
rem Get second token delimited by { and }
for /F "tokens=2 delims={}" %%a in ("%line%") do set string=%%a
ECHO STRING: "%STRING%"
rem Copy string to clipboard
REM echo %string%| clip
Output:
LINE: "randomstring2<ORIGINAL>mystring</ORIGINAL>randomstring3"
STRING DELIMITED: "randomstring2{mystring}randomstring3"
STRING: "mystring"
As an option, you may delete from beginning of line until left delimiter:
set "line=%line:*<ORIGINAL>=%"
... and get the FIRST token separated by any delimiter you wish (ie: }):
for /F "delims=}" %%a in ("%line%") do set string=%%a

Remove trailing spaces from a file using Windows batch?

How could I trim all trailing spaces from a text file using the Windows command prompt?
The DosTips RTRIM function that Ben Hocking cites can be used to create a script that can right trim each line in a text file. However, the function is relatively slow.
DosTips user (and moderator) aGerman developed a very efficient right trim algorithm. He implemented the algorithm as a batch "macro" - an interesting concept of storing complex mini scripts in environment variables that can be executed from memory. The macros with arguments are a major discussion topic in and of themselves that is not relevant to this question.
I have extracted aGerman's algorithm and put it in the following batch script. The script expects the name of a text file as the only parameter and proceeds to right trim the spaces off each line in the file.
@echo off
setlocal enableDelayedExpansion
set "spcs= "
for /l %%n in (1 1 12) do set "spcs=!spcs!!spcs!"
findstr /n "^" "%~1" >"%~1.tmp"
setlocal disableDelayedExpansion
(
    for /f "usebackq delims=" %%L in ("%~1.tmp") do (
        set "ln=%%L"
        setlocal enableDelayedExpansion
        set "ln=!ln:*:=!"
        set /a "n=4096"
        for /l %%i in (1 1 13) do (
            if defined ln for %%n in (!n!) do (
                if "!ln:~-%%n!"=="!spcs:~-%%n!" set "ln=!ln:~0,-%%n!"
                set /a "n/=2"
            )
        )
        echo(!ln!
        endlocal
    )
) >"%~1"
del "%~1.tmp" 2>nul
Assuming the script is called rtrimFile.bat, then it can be called from the command line as follows:
rtrimFile "fileName.txt"
A note about performance
The original DosTips rtrim function performs a linear search and defaults to trimming a maximum of 32 spaces. It has to iterate once per space.
aGerman's algorithm uses a binary search and it is able to trim the maximum string size allowed by batch (up to ~8k spaces) in 13 iterations.
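To make the binary search easier to follow, here is the same idea applied to a single variable instead of a file; it is a sketch distilled from the script above, and the variable str and its sample value are assumptions for illustration. Each pass tests whether the last n characters are all spaces, removes them if so, and then halves n:

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical sample value with six trailing spaces.
set "str=some text      "
rem Build a run of 4096 spaces by doubling a single space 12 times.
set "spcs= "
for /l %%n in (1 1 12) do set "spcs=!spcs!!spcs!"
rem Test chunk sizes 4096, 2048, ..., 2, 1 (13 passes in total).
set "n=4096"
for /l %%i in (1 1 13) do (
    if defined str for %%n in (!n!) do (
        if "!str:~-%%n!"=="!spcs:~-%%n!" set "str=!str:~0,-%%n!"
        set /a "n/=2"
    )
)
echo [!str!]
endlocal
```

For the sample value, only the passes with n=4 and n=2 remove anything, and the script should print [some text].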
Unfortunately, batch is very SLOW when it comes to processing text. Even with the efficient rtrim function, it takes ~70 seconds to trim a 1MB file on my machine. The problem is, just reading and writing the file without any modification takes significant time. This answer uses a FOR loop to read the file, coupled with FINDSTR to prefix each line with the line number so that blank lines are preserved. It toggles delayed expansion to prevent ! from being corrupted, and uses a search and replace operation to remove the line number prefix from each line. All that before it even begins to do the rtrim.
Performance could be nearly doubled by using an alternate file read mechanism that uses set /p. However, the set /p method is limited to ~1k bytes per line, and it strips trailing control characters from each line.
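A rough sketch of what such a set /p read loop could look like is shown below. It is not taken from the answer: the variable names cnt and ln are hypothetical, the script only copies the lines of the file given as its first argument to standard output, and it inherits the set /p limitations mentioned above.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Count the lines first (blank lines included) so the loop knows when to stop.
for /f %%N in ('find /c /v "" ^<"%~1"') do set "cnt=%%N"
rem Redirect the file to stdin once, then read one line per iteration.
<"%~1" (
    for /l %%I in (1 1 %cnt%) do (
        set "ln="
        set /p "ln="
        echo(!ln!
    )
)
endlocal
```

Because the value is read by set /p and only ever expanded with !ln!, exclamation marks in the data are not consumed, and no line-number prefix is needed to preserve blank lines.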
If you need to regularly trim large files, then even a doubling of performance is probably not adequate. Time to download (if possible) any one of many utilities that could process the file in the blink of an eye.
If you can't use non-native software, then you can try VBScript or JScript executed via the CSCRIPT batch command. Either one would be MUCH faster.
UPDATE - Fast solution with JREPL.BAT
JREPL.BAT is a regular expression find/replace utility that can very efficiently solve the problem. It is pure script (hybrid batch/JScript) that runs natively on any Windows machine from XP onward. No 3rd party exe files are needed.
With JREPL.BAT somewhere within your PATH, you can strip trailing spaces from file "test.txt" with this simple command:
jrepl " +$" "" /f test.txt /o -
If you put the command within a batch script, then you must precede the command with CALL:
call jrepl " +$" "" /f test.txt /o -
Go get yourself a copy of Cygwin or the sed package from GnuWin32.
Then use that with the command:
sed "s/ *$//" inputFile >outputFile
Dos Tips has an implementation of RTrim that works for batch files:
:rTrim string char max -- strips white spaces (or other characters) from the end of a string
:: -- string [in,out] - string variable to be trimmed
:: -- char [in,opt] - character to be trimmed, default is space
:: -- max [in,opt] - maximum number of characters to be trimmed from the end, default is 32
:$created 20060101 :$changed 20080219 :$categories StringManipulation
:$source http://www.dostips.com
SETLOCAL ENABLEDELAYEDEXPANSION
call set string=%%%~1%%
set char=%~2
set max=%~3
if "%char%"=="" set char= &rem one space
if "%max%"=="" set max=32
for /l %%a in (1,1,%max%) do if "!string:~-1!"=="%char%" set string=!string:~0,-1!
( ENDLOCAL & REM RETURN VALUES
IF "%~1" NEQ "" SET %~1=%string%
)
EXIT /b
If you're not used to using functions in batch files, read this.
There is a nice trick to remove trailing spaces based on this answer of user Aacini; I modified it so that all other spaces occurring in the string are preserved. So here is the code:
@echo off
setlocal EnableDelayedExpansion
rem // This is the input string:
set "x= This is a text string containing many spaces. "
rem // Ensure there is at least one trailing space; then initialise auxiliary variables:
set "y=%x% " & set "wd=" & set "sp="
rem // Now here is the algorithm:
set "y=%y: =" & (if defined wd (set "y=!y!!sp!!wd!" & set "sp= ") else (set "sp=!sp! ")) & set "wd=%"
rem // Return messages:
echo input: "%x%"
echo output: "%y%"
endlocal
However, this approach fails when a character of the set ^, !, " occurs in the string.
Good tool for removing trailing spaces in files in windows:
http://mountwhite.net/en/spaces.html
I just found a very nice solution for trimming off white-spaces of a string:
Have you ever called a sub-routine using call and expanded all arguments using %*? You will notice that any leading and/or trailing white-spaces are removed. Any white-spaces occurring in between other characters are preserved; so are all the other command token separators ,, ;, = and also the non-break space (character code 0xFF). This effect I am going to utilise for my script:
@echo off
set "STR="
set /P STR="Enter string: "
rem /* Enable Delayed Expansion to avoid trouble with
rem special characters: `&`, `<`, `>`, `|`, `^` */
setlocal EnableDelayedExpansion
echo You entered: `!STR!`
call :TRIM !STR!
echo And trimmed: `!RES!`
endlocal
exit /B
:TRIM
set "RES=%*"
exit /B
This script expects a string entered by the user which is then trimmed. This can of course also be applied on lines of a file (which the original question is about, but reading such line by line using for /F is shown in other answers anyway, so I skip this herein). To trim the string on one side only, add a single character to the opposite side prior to trimming and remove it afterwards.
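The one-sided variant described in the last sentence could be sketched as follows for a right trim. The guard character # and the sample value are assumptions for illustration, and the sketch relies on call and %* behaving as described above:

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical sample value with leading and trailing spaces.
set "STR=   padded text   "
rem Put a guard character in front so that only the right side gets trimmed.
call :TRIM #!STR!
rem Remove the guard character again.
set "RES=!RES:~1!"
echo [!RES!]
endlocal
exit /B

:TRIM
set "RES=%*"
exit /B
```

For a left trim, the guard character would be appended instead and removed afterwards with !RES:~0,-1!.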
This approach has got some limitations though: it does not handle characters %, !, ^ and " properly. To overcome this, several intermediate string manipulation operations become required:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "STR="
set /P STR="Enter string: "
setlocal EnableDelayedExpansion
echo You entered: `!STR!`
set "STR=!STR:%%=%%%%!"
set "STR=!STR:"=""!^"
if not "%STR%"=="%STR:!=%" set "STR=!STR:^=^^^^!"
set "STR=%STR:!=^^^!%"
call :TRIM !STR!
set "RES=!RES:""="!^"
echo And trimmed: `!RES!`
endlocal
endlocal
exit /B
:TRIM
set "RES=%*"
exit /B
Update
Both of the above scripts cannot handle the characters &, <, > and |, because call seems to become aborted as soon as such a character appears in an unquoted and unescaped manner.
However, I finally found a way to fix that and come up with an approach that can successfully deal with all characters (except perhaps some control characters, which I did not test):
@echo off
setlocal EnableExtensions EnableDelayedExpansion
rem // The last white-space in `STRING` is a tabulator:
set "RESULT=" & set "STRING= (<&>"^|)^^!^^^^;,= ^"
echo Input string: `!STRING!`
rem // Double quotes to avoid troubles with unbalanced ones:
if defined STRING set "STRING=!STRING:"=""!^"
rem // Particularly handle carets and exclamation marks as delayed expansion is enabled:
if defined STRING set "STRING=!STRING:^=^^^^!"
if defined STRING set "STRING=%STRING:!=^^^!%" !
if defined STRING (
rem // Escape all characters that `call` has got troubles with:
set "STRING=!STRING:^=^^!"
set "STRING=!STRING:&=^&!"
set "STRING=!STRING:<=^<!"
set "STRING=!STRING:>=^>!"
set "STRING=!STRING:|=^|!"
)
rem /* Call the sub-routine here; the strings `!=!` constitute undefined dummy variables
rem with an illegal name, which eventually become removed; the purpose of them is to
rem enable usage of that `call` inside of a `for` loop with the meta-variable `%%S`,
rem which would otherwise become unintentionally expanded rather than `%%STRING%%`,
rem which literally contained `%%S`; the `!=!` at the end is just there in case you
rem want to append another string that could also match another `for` meta-variable;
rem note that `!!` is not possible as this would be collapsed to a single `!`, so
rem a (most probably undefined) variable `!STRING%!` would then become expanded: */
call :TRIM %%!=!STRING%%!=!
rem /* The caret doubling done by `call` does not need to be reverted, because due to
rem doubling of the quotes carets appear unquoted, so implicit reversion occurs here;
rem of course the doubling of the quotes must eventually be undone: */
if defined RESULT set "RESULT=!RESULT:""="!^"
echo Now trimmed: `!RESULT!`
endlocal
exit /B
:TRIM
rem // This is the effective line that does the left- and right-trimming:
set "RESULT=%*" !
exit /B
I use this Python 2 script to print lines with trailing whitespace and remove them manually:
#!/usr/bin/env python2
import sys
if not sys.argv[1:]:
    sys.exit('usage: whitespace.py <filename>')
for no, line in enumerate(open(sys.argv[1], 'rb').read().splitlines()):
    if line.endswith(' '):
        print no+1, line
I know that Python is not preinstalled for Windows, but at least it works cross-platform.

Resources