How to extract certain text inside a file using findstr - windows

I'm trying to scan a text file and extract only the version.
I've tried running this:
>for /f "usebackq tokens=2 delims=, " %i in (`findstr /l "version" "C:\Test\myfiles\package.text"`) do echo %i
However, it returns the entry twice:
echo "4.2.20"
"4.2.20"
The text file has this format
"version": "4.2.20",
How can I use findstr to return only the version itself, in the format 4.2.20?
Thank you!
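A possible fix, as a minimal sketch. At the prompt the entry probably appears twice because cmd echoes each command of the loop before running it, and the quotes remain because %i is used unmodified. Assuming the file contains a single line of the form shown above, the ~ modifier strips the surrounding quotes. Written as a batch file (hence %%i rather than %i):

```batch
@echo off
setlocal
rem Assumption: package.text holds one line like   "version": "4.2.20",
rem Splitting on comma and space makes "4.2.20" (with quotes) the second token.
for /f "usebackq tokens=2 delims=, " %%i in (`findstr /l "version" "C:\Test\myfiles\package.text"`) do set "version=%%~i"
rem %%~i drops the surrounding quotes from the token.
echo %version%
```

If several lines contain the word version, the last match wins in this sketch.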

Q: Is there a way to return it in the format 4.220 (removing the last period)?
A: There is. Split the version string by handling it like a filename: the first part (%%~ni, the "filename") gets everything before the last dot, and the second part (%%~xi, the "extension") gets the last dot and everything after it. Then simply remove the dot from the "extension" and merge the two substrings:
@echo off
setlocal
for /f "usebackq tokens=2 delims=, " %%i in (`findstr /l "version" "C:\Test\myfiles\package.text"`) do (
set "major=%%~ni"
set "minor=%%~xi"
)
set "version=%major%%minor:.=%"
echo method 1: %version%
set "version=%major%%minor:~1%"
echo method 2: %version%
It is possible to do it in a single command line, but since that needs delayed expansion, it gets ugly and hard to read and maintain. IMHO it's not worth the effort unless you have a special requirement for it. So (because you also tagged batch-file) I stuck to a batch file.
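To see how the filename modifiers split the version string, here is a small stand-alone sketch. The literal "4.2.20" stands in for the value read from the file; no file of that name has to exist.

```batch
@echo off
rem Treat the version string like a file name; no file has to exist.
for %%i in ("4.2.20") do (
    echo name part: %%~ni
    echo extension: %%~xi
)
```

This should print 4.2 as the name part and .20 as the extension, which is why stripping the dot from the extension and joining the two parts gives 4.220.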

Related

In Windows cmd, how to replace the " special character with a line break?

Just to be thorough, I'll state here my whole project and what I'm aiming at.
I intend to adapt a shell script to work in Windows cmd, as this is intended for people who are not going to have some sophisticated language available.
for g in $(curl -Ls https://api.chess.com/pub/player/hikaru/games/archives | jq -rc ".archives[]") ; do curl -Ls "$g" | jq -rc ".games[].pgn" ; done >> games.pgn
For some reason, Chess.com's API doesn't have a very important feature that Lichess' does, to export all games of a single player, so what I can do manually is to use https://api.chess.com/pub/player/hikaru/games/archives to export all available monthly archives and then hit the API for each one of them. (hikaru inside this will be a set variable, it's the nickname of the desired player to export).
The result for this command is something like
{"archives":["https://api.chess.com/pub/player/hikaru/games/2015/11","https://api.chess.com/pub/player/hikaru/games/2015/12","https://api.chess.com/pub/player/hikaru/games/2016/02","https://api.chess.com/pub/player/hikaru/games/2016/03","https://api.chess.com/pub/player/hikaru/games/2016/04","https://api.chess.com/pub/player/hikaru/games/2016/05"]}
to which I only have to append /pgn to get the desired result.
Obviously, cmd doesn't have jq available, so this involves "parsing" the string inside a batch file.
I figured if I just could replace every occurrence of " with a linebreak and echo the results, I could then use find (or findstr) to easily get a list of lines that only would need to be prefaced with curl and appended with /pgn to get my final result.
The big question is: how do I replace " with a linebreak in cmd? I found a few answers, but none of them seems to work with a special character, part of the problem is that I also didn't understand these answers enough to try and adapt them.
A second way of perhaps achieving the same result would be replacing [, ] and , with line breaks, but then I would also have to worry with deleting the final " to append /pgn, so if I'm able to do the former, it would be cleaner.
In batch/cmd, a for loop is used to process a list (separated by default delimiters like space, tab, comma). So just replace [ and ] with a space or comma, and you have a nice list to split. Finally, use find to filter the output to the relevant parts and you're done:
@Echo off
setlocal
set "string={"archives":["https://api.chess.com/pub/player/hikaru/games/2015/11","https://api.chess.com/pub/player/hikaru/games/2015/12","https://api.chess.com/pub/player/hikaru/games/2016/02","https://api.chess.com/pub/player/hikaru/games/2016/03","https://api.chess.com/pub/player/hikaru/games/2016/04","https://api.chess.com/pub/player/hikaru/games/2016/05"]}"
set "string=%string:[= %"
set "string=%string:]= %"
for %%a in (%string%) do echo %%~a|find "/"
Output:
https://api.chess.com/pub/player/hikaru/games/2015/11
https://api.chess.com/pub/player/hikaru/games/2015/12
https://api.chess.com/pub/player/hikaru/games/2016/02
https://api.chess.com/pub/player/hikaru/games/2016/03
https://api.chess.com/pub/player/hikaru/games/2016/04
https://api.chess.com/pub/player/hikaru/games/2016/05
(in case you wonder: the tilde in echo %%~a removes surrounding quotes)
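As a minimal illustration of both points (list splitting on default delimiters and the tilde removing quotes), using made-up list items rather than the real URLs:

```batch
@echo off
rem A plain FOR splits its list on spaces, commas, semicolons and equal signs.
rem %%~a removes the surrounding quotes from quoted items.
for %%a in ("one","two" three;four) do echo %%~a
```

This should print one, two, three and four, each on its own line and without quotes.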
Stephan's answer gave me the directions I needed to research more and build my own solution. This is not the final script to my project, but it does solve every problem presented in my original question:
@echo off
setLocal enabledelayedexpansion
for /f "delims=" %%a in (input.txt) do (
for %%b in (%%a) do (
set string=%%b
set "string=!string:[=,!"
set "string=!string:]=,!"
echo !string!>>replaced.txt
)
)
for /f "delims=" %%c in (replaced.txt) do (
for %%d in (%%c) do (
echo %%~d>>echo.txt
)
)
for /f %%e in (echo.txt) do echo curl %%~e/pgn|find ".">>list.txt
I basically run 3 sets of loops. The first one loads my input (this could not be done via set because there's a size limit; using a nested loop works around that) and replaces [ and ] with commas.
The second loop splits the output again. This is done basically to trim unwanted characters from the first and last line.
The last loop generates a list of curl commands that will later be executed to produce a PGN file (which is a chess file).
This ends the scope of the question, but since my project wasn't that complex, I'll present its final version, which improves on Compo's answer, in case someone else stumbles upon this question:
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: Chess.com and Lichess API Scraper ::
:: Author: fabiorzfreitas ::
:: Extract all games from a player from Chess.com and Lichess ::
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: This tool uses Chess.com and Lichess APIs to extract all games from a given player. ::
@echo off
setLocal enabledelayedexpansion
echo.
echo.
echo.
echo All input must be lowercase!
echo.
echo You can skip the input below by pressing Enter
echo.
echo.
echo.
set /p lichess="Input Lichess nickname and press Enter: "
set /p chess="Input Chess.com nickname and press Enter: "
echo.
:Lichess
if not defined lichess goto :Chess
curl https://lichess.org/api/games/user/%lichess% >> Games.pgn
:Chess
if not defined chess goto :End
(for /f "usebackq tokens=2 delims=[]" %%g in (`curl https://api.chess.com/pub/player/%chess%/games/archives`) do (
for %%h In (%%g) do curl "%%~h/pgn" >> Games.pgn
)
)
:End
exit
Based upon your own answer, it seems as if you could remove at least one of those steps by using the brackets [ and ], as delimiters.
You could also nest a for loop within another instead of having individual ones and writing to files.
Here it is as a single line batch-file:
@(For /F "UseBackQ Tokens=2 Delims=[]" %%G In ("input.txt") Do @For %%H In (%%G) Do @Echo curl.exe "%%~H/pgn") 1>"list.txt"
To do it directly in cmd:
(For /F "UseBackQ Tokens=2 Delims=[]" %G In ("input.txt") Do @For %H In (%G) Do @Echo curl.exe "%~H/pgn") 1>"list.txt"

FINDSTR to find text START END of string

I have the string photo="999" price="10" category="1". I want to get only 10. This means I need the text between price=" and the closing ".
@For /F "Tokens=1*Delims==" %%A In ('FindStr /I "^price=" "C:\price.txt" 2^>NUL')Do @Set "Ver=%%~B"
@Echo(%%Ver%% = %Ver% & Pause
findstr always returns the complete line, if successful. So it's not the right tool for this task (actually, there is no tool in cmd at all that could do that this way).
But with a bit of logic, you can work around it: remove the part from the start until (including) the triggerword price (a task, the set command is happy to do), then process the rest with a for /f loop to get the desired substring:
set "string=photo="999" price="10" category="1""
echo check: %string%
echo debug: %string:*price=%
for /f tokens^=2^ delims^=^" %%a in ("%string:*price=%") do set "ver=%%~a"
echo ver=%ver%
If you are sure of the exact format of your string (in your example the searched substring is the second quoted argument, so the fourth token when split by ") it gets as easy as:
for /f tokens^=4^ delims^=^" %%a in ("%string%") do echo ver=%%~a
or
for /f tokens^=4^ delims^=^" %%a in (file.txt) do echo ver=%%~a
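The asterisk form of substitution used in %string:*price=% can be tried on its own. A minimal sketch with a made-up string:

```batch
@echo off
setlocal
set "string=alpha beta gamma"
rem Remove everything up to and including the first occurrence of "beta".
echo [%string:*beta=%]
```

This should print [ gamma], with the brackets making the leading space visible.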
@ECHO OFF
SETLOCAL
set "string=photo="999" price="10" category="1""
:: remove quotes
set "string=%string:"=%"
for /f %%a in ("%string:* price=%") do set /a pricefound%%a
set pri
goto :eof
Since we don't have a representative sample of the file in question, we're forced to the conclusion that the requirement is to find the one and only appearance of price="anumber" in the file.
So, since findstr output, properly framed, would select this line, all we need do is process the string.
This is kind of a quick-and-dirty method; it may be adequate for OP's purpose.
First, remove the quotes from the string as they have a habit of interfering.
Next, use for /f in string-processing mode where it does its magic on the quoted string in parentheses. The string is the original string, minus quotes, so replace all characters up to "Spaceprice" with nothing and take the first token of the result, resulting in =10 assigned to %%a in the example case.
Then execute "set /a somevariablename=10" by simply concatenating the two strings.
Note that if the file contains a line like ... pricelastweek="9" ... then other measures may need to be taken.
Here's an example which tries to follow a similar methodology as your example code.
It uses FindStr to isolate any line in C:\price.txt, which includes the word price="<OneOrMoreDigits>". That line is saved as a variable named price, which is split under delayed expansion in a nested For loop, to remove everything up to, and including the first instance of the string price, leaving, in this case, ="10" category="1". The nested loop further splits that, to take the second token, using a doublequote character as the delimiter, (which should be your required value).
@For /F Delims^=^ EOL^= %%G In ('%__AppDir__%findstr.exe /IR "\<price=\"[0123456789]*\"\>" "C:\price.txt"') Do @(Set "price=%%G" & SetLocal EnableDelayedExpansion
For /F Tokens^=2^ Delims^=^" %%H In ("!price:* price=!") Do @EndLocal & Set "price=%%H")
@Echo %%price%% = %price% & Pause
Well clearly you need to match lines that contain price=" as there may be other lines.
What's unclear is if you need match 10 exactly, or just want that to be any number.
It seems likely you just want to match any number and grab it.
This is done easily with:
@For /F "Tokens=4 Delims=^= " %%A In ('
TYPE "C:\price.txt" ^| FIND /I "price="""') Do @(
Set "Ver=%%~A" & CALL SET Ver &Pause )
If instead you need to match Price="10" exactly, which seems less useful, at least one person took that meaning, and your wording is a little unclear, so I will add that as well:
@For /F "Tokens=4 Delims=^= " %%A In ('
TYPE "C:\price.txt" ^| FIND /I "price=""10"""') Do @(
Set "Ver=%%~A" & CALL SET Ver &Pause )
Note in all examples I left in the @ symbols since I assume this is you being clever, and leaving ECHO ON and only removing the @ symbols when you want to debug some specific thing you are doing.
However, in case not, it's worth pointing out that in a script it's usually easiest to place ECHO OFF at the start of the script instead of putting an @ at the beginning of each statement to stop it from echoing.
Cheers! :)

Extracting URL from text file in Batch

I have a script that needs to extract a YouTube URL from a text file.
Here's what I have in the text file (output.txt):
---------- NUMBER11.TXT
<link itemprop="url" href="http://www.youtube.com/channel/UCnxGkOGNMqQEUMvroOWps6Q">
Note the text file has a line of empty space to start, which is annoying, and the URL is on line 3. Something that doesn't show up in the formatting for this site is the 11 spaces before the actual href starting as well. I'd like to separate it from the mass of other junk.
I've tried something like this:
set /p long= < output.txt
echo %long%
set short1=%long:^<link itemprop^="url" href^="=%
echo %short1% > o1.txt
I thought this would remove the selected text from the file, but I think this is a little over my head.
I'm getting the output.txt from firstly a curl of a youtube video page, and secondly from a find command here:
find "href=""http://www.youtube.com/channel/" %vd% > output.txt
Maybe I'm making this more complicated than it is?
Using batch files to process files that contain special characters, such as redirection characters, can cause problems, so it is not recommended. I felt like posting an answer anyway, so given your exact example, here is one way. If your example is not as per your post, which I highly expect it to be, then this probably would not work.
@echo off
setlocal enabledelayedexpansion
for /f "usebackq delims=" %%i in ("output.txt") do for %%a in (%%i) do (
set "var=%%~a"
set "var=!var:>=!"
set "var=!var:"=!"
if "!var:~0,4!" == "http" echo !var!
)
@ECHO OFF
SETLOCAL
SET "sourcedir=U:\sourcedir"
SET "filename1=%sourcedir%\q64572433.txt"
set "url="
FOR /f "tokens=4,5delims=>= " %%a IN (%filename1%) DO if "%%~a"=="href" set "url=%%~b"
echo URL=%url%
GOTO :EOF
You would need to change the setting of sourcedir to suit your circumstances. The listing uses a setting that suits my system.
I used a file named q64572433.txt containing your data for my testing.
The for command tokenises each line of the file, using =, > and space as delimiters (the 3 characters between delims= and ")
On the line of interest, token 4 would be href and token 5 the url - and this is the only line where href is the fourth token. When that is detected, assign the 5th token (in %%b) to the variable, removing the quotes with ~ for good measure.
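To check how a given file is actually tokenised before relying on token positions, a small diagnostic sketch can help. It assumes the data is in output.txt in the current directory; the bracket markers are only there to make the token boundaries visible.

```batch
@echo off
rem Print tokens 4 and 5 of each line that has at least four tokens,
rem using =, > and space as delimiters.
for /f "usebackq tokens=4,5 delims=>= " %%a in ("output.txt") do echo token4=[%%a] token5=[%%b]
```

For the sample line from the question, token 4 should be href and token 5 the quoted URL, as described above.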
I would suggest you parse the results directly from your curl command instead of outputting them to a text file, and then using find against that output.
However, instead of using find.exe, I would suggest you use the following method using findstr.exe instead, to get the URL assigned to any line containing href= followed by "http: or "https and subsequently followed by youtube.com.
@Echo Off
SetLocal EnableExtensions DisableDelayedExpansion
For /F Tokens^=*EOL^= %%G In (
'%__APPDIR__%findstr.exe /IR "href=\"http[s:].*youtube\.com" "output.txt"'
) Do (Set "Line=%%G" & SetLocal EnableDelayedExpansion
For /F Tokens^=2Delims^=^" %%H In ("!Line:*href=!") Do EndLocal & Echo %%H)
Pause
If you want the output stored as a variable, instead of Echoing it, change Echo %%H to Set "URL=%%H". You could then use %URL%, (or "%URL%" if you need it doublequoted), elsewhere in your script.

Windows Batch file - strip leading characters

I have a batch file which copies some local files up to a google storage area using the gsutil tool. The gsutil tool produces a nice log file showing the details of the files that were uploaded and if it was OK or not.
Source,Destination,Start,End,Md5,UploadId,Source Size,Bytes Transferred,Result,Description
file://C:\TEMP\file_1.xlsx,gs://app1/backups/file_1.xlsx,2018-12-04T15:25:48.428000Z,2018-12-04T15:25:48.804000Z,CPHHZfdlt6AePAPz6JO2KQ==,,18753,18753,OK,
file://C:\TEMP\file_2.xlsx,gs://app1/backups/file_2.xlsx,2018-12-04T15:25:48.428000Z,2018-12-04T15:25:48.813000Z,aTKCOQSPVwDycM9+NGO28Q==,,18753,18753,OK,
What I would like to do is to
check the status result in column 8 (OK or FAIL)
If the status is OK then move the source file to another folder (so that it is not uploaded again).
The problem is that the source filename is prefixed with "file://", which I can't seem to remove. For example,
file://C:\TEMP\file_1.xlsx
needs to be changed into this
C:\TEMP\file_1.xlsx
I am using a for /f loop and I am not sure if the manipulation of the variables %%A is different within a for /f loop.
@echo off
rem copy the gsutil log file into a temp file and remove the header row using the 'more' command.
more +1 raw_results.log > .\upload_results.log
rem get the source file name (column 1) and the upload result (OK) from column 8
for /f "tokens=1,8 delims=," %%A in (.\upload_results.log) do (
echo The source file is %%A , the upload status was %%B
set line=%%A
set line=!line:file://:=! >> output2.txt echo !line!
echo !line!
)
The output is like this.
The source file is file://C:\TEMP\file_1.xlsx , the upload status was OK
The source file is file://C:\TEMP\file_2.xlsx , the upload status was OK
I'm expecting it to dump the altered values out into a new file but it is not producing anything at the moment.
Normally I would extract from a specific character to the end of the string with something like this but it doesn't work with my For/f loop.
%var:~7%
Any pointers or a different way of doing it greatly appreciated.
Since the part to remove seems fixed, it is easier to use substrings.
Also, using for /f "skip=1" avoids the necessity of the external command more +1 and another intermediate file.
@echo off & setlocal EnableDelayedExpansion
type NUL>output2.txt
for /f "skip=1 eol=| tokens=1,8 delims=," %%A in (.\upload_results.log) do (
echo The source file is %%A , the upload status was %%B
set "line=%%A"
set "line=!line:~7!"
echo(!line!>>output2.txt
echo(!line!
)
File names and paths can contain also one or more exclamation marks. The line set line=%%A is parsed by Windows command processor a second time before execution with enabled delayed expansion. See How does the Windows Command Interpreter (CMD.EXE) parse scripts? Every ! inside the string assigned to loop variable A is on this line interpreted as begin or end of a delayed expanded environment variable reference. So the string of loop variable A is assigned to environment variable line with an unwanted modification if file path/name contains one or more exclamation marks.
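The effect can be reproduced with a short sketch. The path below is hypothetical and no such file has to exist; delayed expansion is switched on only for the second echo.

```batch
@echo off
setlocal DisableDelayedExpansion
for %%A in ("C:\TEMP\wow!file.xlsx") do (
    rem Delayed expansion is off here, so the exclamation mark survives.
    echo disabled: %%~A
    setlocal EnableDelayedExpansion
    rem Now the line is parsed a second time and the lone ! is dropped.
    echo enabled:  %%~A
    endlocal
)
```

The first line should show the path unchanged, while the second should show it without the exclamation mark.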
For that reason it is best to avoid usage of delayed expansion. The fastest solution is for this task using a second FOR to get file:// removed from string assigned to loop variable A.
@echo off
del output2.txt 2>nul
for /F "skip=1 tokens=1,8 delims=," %%A in (upload_results.log) do (
echo The source file is %%A , the upload status was %%B.
for /F "tokens=1* delims=/" %%C in ("%%~A") do echo %%D>>output2.txt
)
Even faster would be without the first echo command line inside the loop:
@echo off
(for /F "skip=1 delims=," %%A in (upload_results.log) do (
for /F "tokens=1* delims=/" %%B in ("%%~A") do echo %%C
))>output2.txt
The second solution can be written also as single command line:
@(for /F "skip=1 delims=," %%A in (upload_results.log) do @for /F "tokens=1* delims=/" %%B in ("%%~A") do @echo %%C)>output2.txt
All solutions do the following:
The outer FOR processes ANSI (fixed one byte per character) or UTF-8 (one to four bytes per character) encoded text file upload_results.log line by line with skipping the first line and ignoring always empty lines and lines starting with a semicolon which do not occur here.
The line is split up on every occurrence of one or more commas into substrings (tokens) with assigning first comma delimited string to specified loop variable A. The first solution additionally assigns eighth comma delimited string to next loop variable B according to ASCII table.
The inner FOR processes the string assigned to loop variable A, using / as the delimiter, so that file: is assigned to the specified loop variable and the rest of the string after the first sequence of forward slashes, which is the fully qualified file name, is assigned to the next loop variable according to the ASCII table.
The full qualified file name is output with command echo and appended either directly to file output2.txt (first solution) or first to a memory buffer which is finally at once written into file output2.txt overwriting a perhaps already existing file with that file name in current directory.
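The inner FOR can be tried in isolation. A minimal sketch using one of the sample values from the log as a literal string:

```batch
@echo off
rem Split on "/": the run of slashes after "file:" counts as one delimiter,
rem and the asterisk assigns the untouched remainder to the next variable.
for /f "tokens=1* delims=/" %%B in ("file://C:\TEMP\file_1.xlsx") do (
    echo first token: %%B
    echo remainder:   %%C
)
```

This should print file: as the first token and C:\TEMP\file_1.xlsx as the remainder.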
For understanding the used commands and how they work, open a command prompt window, execute there the following commands, and read entirely all help pages displayed for each command very carefully.
del /?
echo /?
for /?
See also the Microsoft article about Using command redirection operators for an explanation of the redirections >, >> and 2>nul
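The original goal, moving each successfully uploaded file, could be built on the same two loops. The following is only a sketch under assumptions: the destination folder C:\TEMP\uploaded is hypothetical, the gsutil log is read directly from raw_results.log with its header skipped, and the move is merely echoed as a dry run.

```batch
@echo off
setlocal
rem Hypothetical destination folder; adjust before use.
set "done=C:\TEMP\uploaded"
for /F "skip=1 tokens=1,8 delims=," %%A in (raw_results.log) do (
    rem Token 8 is the Result column only while UploadId is empty, because
    rem FOR /F treats consecutive commas as a single delimiter.
    if "%%B"=="OK" for /F "tokens=1* delims=/" %%C in ("%%~A") do echo move "%%D" "%done%\"
)
```

Once the echoed commands look right, removing the echo in front of move would perform the actual move.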

Get file name and append to beginning of line

I'm trying to get a side-by-side file path and file name in a text file so I can make inserting into a database easier. I've taken a look at other examples around SO, but I haven't been able to understand what is going on. For instance, I saw this batch file to append file names to end of lines but figured that I shouldn't ask for clarification because it's 1.5 years old.
What I have is a text file of file paths. They look like this:
\\proe\igi_files\TIFFS\AD\1_SIZE_AD\1AD0019.tif
What I want it to look like is this:
1AD0019.tif \\proe\igi_files\TIFFS\AD\1_SIZE_AD\1AD0019.tif
so that I can insert it into a database. Is there an easy way to do this on Windows via Batch files?
No batch file required. From the command line:
>"outputFile.txt" (for /f "usebackq eol=: delims=" %F in ("inputFile.txt") do @echo %~nxF %~dpF)
But that output format is risky because file and folder names can contain spaces, so it may be difficult to determine where the file name ends and the path begins. Better to enclose the file and path within quotes.
>"outputFile.txt" (for /f "usebackq eol=: delims=" %F in ("inputFile.txt") do echo "%~nxF" "%~dpF")
if done within a batch file, then percents must be doubled.
@echo off
>"outputFile.txt" (
for /f "usebackq eol=: delims=" %%F in ("inputFile.txt") do echo "%%~nxF" "%%~dpF"
)
You should read the built-in help for the FOR command. Type help for or for /? from a command prompt to get help. That strategy works for pretty much all commands.
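As a quick way to experiment with the modifiers, the sample path from the question can be fed to a plain FOR as a literal string. This is only a sketch; the file does not need to exist.

```batch
@echo off
rem The sample path from the question, used as a literal string.
for %%F in ("\\proe\igi_files\TIFFS\AD\1_SIZE_AD\1AD0019.tif") do (
    echo name and extension: %%~nxF
    echo original string:    %%~F
)
```

The first line should show 1AD0019.tif and the second the untouched path, so echoing %%~nxF followed by %%~F gives the "name, then full path" layout asked for in the question.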
In powershell, this little script should do the trick. In the first line, just specify the name of the text file that contains all the file paths.
$filelist="c:\temp\filelist.txt"
foreach($L in Get-Content $filelist) {
$i = $L.length - $L.lastindexof('\') -1
$fname=$L.substring($L.length - $i, $i)
echo ($fname + ' ' + $L)
}
If you don't have powershell installed on your machine, check out http://technet.microsoft.com/en-us/library/hh847837.aspx.
@ECHO OFF
SETLOCAL
(
FOR /f "delims=" %%i IN (yourfile.txt) DO ECHO %%~nxi %%i
)>newfile.txt
GOTO :EOF
No big drama - all on one active line, but spaced for clarity

Resources