Extracting URL from text file in Batch - windows

I have a script that needs to extract a YouTube URL from a text file.
Here's what I have in the text file (output.txt):
---------- NUMBER11.TXT
<link itemprop="url" href="http://www.youtube.com/channel/UCnxGkOGNMqQEUMvroOWps6Q">
Note the text file has a line of empty space to start, which is annoying, and the URL is on line 3. Something that doesn't show up in the formatting for this site is the 11 spaces before the actual href starting as well. I'd like to separate it from the mass of other junk.
I've tried something like this:
set /p long= < output.txt
echo %long%
set short1=%long:^<link itemprop^="url" href^="=%
echo %short1% > o1.txt
I thought this would remove the selected text from the file, but I think this is a little over my head.
I'm getting the output.txt from firstly a curl of a youtube video page, and secondly from a find command here:
find "href=""http://www.youtube.com/channel/" %vd% > output.txt
Maybe I'm making this more complicated than it is?

Using batch files to process text that contains special characters, such as redirection operators, can cause problems, so it is not recommended. That said, given your exact example, here is one way. If your real data differs from what you posted, which I strongly suspect it does, this probably will not work.
@echo off
setlocal enabledelayedexpansion
for /f "usebackq delims=" %%i in ("output.txt") do for %%a in (%%i) do (
set "var=%%~a"
set "var=!var:>=!"
set "var=!var:"=!"
if "!var:~0,4!" == "http" echo !var!
)

@ECHO OFF
SETLOCAL
SET "sourcedir=U:\sourcedir"
SET "filename1=%sourcedir%\q64572433.txt"
set "url="
FOR /f "tokens=4,5delims=>= " %%a IN (%filename1%) DO if "%%~a"=="href" set "url=%%~b"
echo URL=%url%
GOTO :EOF
You would need to change the setting of sourcedir to suit your circumstances. The listing uses a setting that suits my system.
I used a file named q64572433.txt containing your data for my testing.
The for command tokenises each line of the file, using =, > and space as delimiters (the 3 characters between delims= and ")
On the line of interest, token 4 would be href and token 5 the url - and this is the only line where href is the fourth token. When that is detected, assign the 5th token (in %%b) to the variable, removing the quotes with ~ for good measure.
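As a rough illustration of that tokenising, the following sketch prints the first five tokens of every line, so you can check which token holds href. It assumes the data is in a file named output.txt in the current directory, as in the question; the bracketed labels are only there to make empty tokens visible.

```batch
@echo off
setlocal
rem Split each line on ">", "=" and space, then show tokens 1 to 5.
rem Leading spaces are delimiters too, so they do not produce empty tokens.
for /f "usebackq tokens=1-5 delims=>= " %%a in ("output.txt") do (
    echo 1=[%%a] 2=[%%b] 3=[%%c] 4=[%%d] 5=[%%e]
)
endlocal
```

On the link line, token 4 should show as href and token 5 as the quoted URL, which is why the answer tests %%~a and stores %%~b.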

I would suggest you parse the results directly from your curl command instead of outputting them to a text file, and then using find against that output.
However, instead of using find.exe, I would suggest you use the following method using findstr.exe instead, to get the URL assigned to any line containing href= followed by "http: or "https and subsequently followed by youtube.com.
@Echo Off
SetLocal EnableExtensions DisableDelayedExpansion
For /F Tokens^=*EOL^= %%G In (
'%__APPDIR__%findstr.exe /IR "href=\"http[s:].*youtube\.com" "output.txt"'
) Do (Set "Line=%%G" & SetLocal EnableDelayedExpansion
For /F Tokens^=2Delims^=^" %%H In ("!Line:*href=!") Do EndLocal & Echo %%H)
Pause
If you want the output stored as a variable, instead of Echoing it, change Echo %%H to Set "URL=%%H". You could then use %URL%, (or "%URL%" if you need it doublequoted), elsewhere in your script.

Related

Batch : ECHO outputs chinese character instead of numbers

I'm trying to work around some logs to extract the data I want and to push it into another simplified .txt file (before going for the next step).
Here's the bit of code I've been trying to use to reach my goal:
setlocal ENABLEDELAYEDEXPANSION
for %%i in (C:\Test_Analyse\*files*.txt) do (
SET va=%%i
SET va=!va:~16,-31!
find /v /c "" %%i | FINDSTR /V /R /C:"^$">>C:\test_results\!va!log3.txt
set /p var=<C:\test_results\!va!log3.txt
set var=!var:~68,10!
echo !var!>>C:\test_results\!va!log2.txt
)
endlocal
The C:\test_results\!va!log3.txt file content is : ---------- C:\TEST_ANALYSE\1K43782_TEST_RENAMED_FILES_20210915.TXT: 223856.
As far as I know, it does its job except for the echo !var!>>C:\test_results\!va!log2.txt part. It prints Chinese characters in my output file instead of 223856. On a side note, when I remove the @echo OFF, I notice the ECHO line working properly in CMD, so I guess it's maybe about encoding? But I tried a few things around that, sadly without success.

Windows Batch file - strip leading characters

I have a batch file which copies some local files up to a google storage area using the gsutil tool. The gsutil tool produces a nice log file showing the details of the files that were uploaded and if it was OK or not.
Source,Destination,Start,End,Md5,UploadId,Source Size,Bytes Transferred,Result,Description
file://C:\TEMP\file_1.xlsx,gs://app1/backups/file_1.xlsx,2018-12-04T15:25:48.428000Z,2018-12-04T15:25:48.804000Z,CPHHZfdlt6AePAPz6JO2KQ==,,18753,18753,OK,
file://C:\TEMP\file_2.xlsx,gs://app1/backups/file_2.xlsx,2018-12-04T15:25:48.428000Z,2018-12-04T15:25:48.813000Z,aTKCOQSPVwDycM9+NGO28Q==,,18753,18753,OK,
What I would like to do is to
check the status result in column 8 (OK or FAIL)
If the status is OK then move the source file to another folder (so that it is not uploaded again).
The problem is that the source filename is prefixed with "file://", which I can't seem to remove. For example,
file://C:\TEMP\file_1.xlsx
needs to be changed into this
C:\TEMP\file_1.xlsx
I am using a for /f loop and I am not sure if the manipulation of the variables %%A is different within a for /f loop.
@echo off
rem copy the gsutil log file into a temp file and remove the header row using the 'more' command.
more +1 raw_results.log > .\upload_results.log
rem get the source file name (column 1) and the upload result (OK) from column 8
for /f "tokens=1,8 delims=," %%A in (.\upload_results.log) do (
echo The source file is %%A , the upload status was %%B
set line=%%A
set line=!line:file://:=! >> output2.txt echo !line!
echo !line!
)
The output is like this.
The source file is file://C:\TEMP\file_1.xlsx , the upload status was OK
The source file is file://C:\TEMP\file_2.xlsx , the upload status was OK
I'm expecting it to dump the altered values out into a new file but it is not producing anything at the moment.
Normally I would extract from a specific character to the end of the string with something like this but it doesn't work with my For/f loop.
%var:~7%
Any pointers or a different way of doing it greatly appreciated.
Since the part to remove seems to be fixed, it is easier to use substrings.
Using for /f "skip=1" also avoids the need for the external command more +1 and another intermediate file.
@echo off & setlocal EnableDelayedExpansion
type NUL>output2.txt
for /f "skip=1 eol=| tokens=1,8 delims=," %%A in (.\upload_results.log) do (
echo The source file is %%A , the upload status was %%B
set "line=%%A"
set "line=!line:~7!"
echo(!line!>>output2.txt
echo(!line!
)
File names and paths can contain also one or more exclamation marks. The line set line=%%A is parsed by Windows command processor a second time before execution with enabled delayed expansion. See How does the Windows Command Interpreter (CMD.EXE) parse scripts? Every ! inside the string assigned to loop variable A is on this line interpreted as begin or end of a delayed expanded environment variable reference. So the string of loop variable A is assigned to environment variable line with an unwanted modification if file path/name contains one or more exclamation marks.
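The effect can be reproduced with a small experiment. This is only a sketch: excl_demo.txt and the file name Done!.xlsx are hypothetical, and the script writes and then deletes excl_demo.txt in the current directory.

```batch
@echo off
setlocal DisableDelayedExpansion
rem Write one sample log line whose file name contains an exclamation mark.
>"excl_demo.txt" echo file://C:\TEMP\Done!.xlsx,OK

setlocal EnableDelayedExpansion
for /f "usebackq tokens=1 delims=," %%A in ("excl_demo.txt") do (
    rem The ! coming from %%A is consumed when this line is parsed again.
    set "line=%%A"
    echo Delayed expansion enabled:  !line:~7!
)
endlocal

rem Back in the outer environment, delayed expansion is disabled again,
rem so the exclamation mark survives.
for /f "usebackq tokens=1 delims=," %%A in ("excl_demo.txt") do (
    for /f "tokens=1* delims=/" %%C in ("%%A") do echo Delayed expansion disabled: %%D
)
del "excl_demo.txt"
endlocal
```

The first loop prints the path without the exclamation mark, while the second prints it intact.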
For that reason it is best to avoid the usage of delayed expansion. The fastest solution for this task is to use a second FOR to remove file:// from the string assigned to loop variable A.
@echo off
del output2.txt 2>nul
for /F "skip=1 tokens=1,8 delims=," %%A in (upload_results.log) do (
echo The source file is %%A , the upload status was %%B.
for /F "tokens=1* delims=/" %%C in ("%%~A") do echo %%D>>output2.txt
)
Even faster would be without the first echo command line inside the loop:
@echo off
(for /F "skip=1 delims=," %%A in (upload_results.log) do (
for /F "tokens=1* delims=/" %%B in ("%%~A") do echo %%C
))>output2.txt
The second solution can be written also as single command line:
@(for /F "skip=1 delims=," %%A in (upload_results.log) do @for /F "tokens=1* delims=/" %%B in ("%%~A") do @echo %%C)>output2.txt
All solutions do following:
The outer FOR processes ANSI (fixed one byte per character) or UTF-8 (one to four bytes per character) encoded text file upload_results.log line by line with skipping the first line and ignoring always empty lines and lines starting with a semicolon which do not occur here.
The line is split up on every occurrence of one or more commas into substrings (tokens) with assigning first comma delimited string to specified loop variable A. The first solution additionally assigns eighth comma delimited string to next loop variable B according to ASCII table.
The inner FOR processes the string assigned to loop variable A using / as the string delimiter. The part file: is assigned to the specified loop variable, and the rest of the string after the first sequence of forward slashes, which is the fully qualified file name, is assigned to the next loop variable according to the ASCII table.
The full qualified file name is output with command echo and appended either directly to file output2.txt (first solution) or first to a memory buffer which is finally at once written into file output2.txt overwriting a perhaps already existing file with that file name in current directory.
For understanding the used commands and how they work, open a command prompt window, execute there the following commands, and read entirely all help pages displayed for each command very carefully.
del /?
echo /?
for /?
See also the Microsoft article about Using command redirection operators for an explanation of the redirections >, >> and 2>nul.

Edit text file with batch file under Windows

I've got an issue regarding a text file I'd like to change with a batch file. I was able to trim it to this point.
3539
78060031
523 )
What I need now is to get the numbers onto the same line. By the way, the text file is not written by my program. I need to join the lines so that it looks like this:
353978060031523
I know there is a simple solution, but since I'm very bad at scripting I can't find it.
Sorry for my bad english and the bad post!
It's the first time I post something here.
Thank you in advance.
This is a duplicate question, but never mind, I'll answer it anyway.
I don't know the purpose of the ")" after the "523", but since you just want to concatenate the strings, try the following script:
@echo off
setlocal EnableDelayedExpansion
for /f "tokens=*" %%a in (hxh-chp.txt) do (
set "concatenate_string=!concatenate_string!%%a"
)
echo !concatenate_string!
pause >nul
The following should do what you expect, that is, concatenate the numerical characters and remove all spaces and the ):
setlocal EnableDelayedExpansion
for /F "usebackq" %%L in ("\path\to\your\text_file.txt") do (
set "CONCAT=!CONCAT!%%L"
)
endlocal & set "CONCAT=%CONCAT%"
echo %CONCAT%
This code makes use of the default behaviour of for /F, where tokens defaults to 1 and delims defaults to tab and space, so only the first space- or tab-separated token of each line is returned.
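A quick way to see that default behaviour is to feed one of the sample lines to for /F as a literal string rather than a file. This is just a sketch for experimenting, not part of the solution above:

```batch
@echo off
rem With no options, for /F splits on spaces and tabs and returns only
rem the first token, so the trailing " )" is dropped.
for /F %%L in ("523 )") do echo [%%L]
rem prints [523]
```

Leading spaces or tabs on a line are skipped in the same way, which is why the indented numbers come out clean.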

for loop doesn't iterate through the lines of a text file

I have a for loop that is supposed to print each line of a text file. Instead it's printing the logPath.
This is the code:
set enabledelayedexpansion
for %%G in (C:\ExecutionSDKTest_10.2.2\*.properties) DO (
Set fileName=%%~nxG
...
set logPath="C:/ExecutionSDKTest_10.2.2/Logs/!fileName!.log"
...
For /f "tokens=*" %%B in (!logPath!) Do (
echo Inside the for loop for printing each line!!
set logLine=%%B
print !logLine! REM this prints the logPath instead of each logLine and jumps out of this for loop after the 1st iteration!
)
)
Any help?
echo off
For %%G in (C:\ExecutionSDKTest_10.2.2\*.properties) DO (
FOR /F "tokens=*" %%i in (%%G) do @echo %%i
)
Use backslashes instead of forward slashes.
set "logPath=C:\ExecutionSDKTest_10.2.2\Logs\!fileName!.log"
While usually you can use them interchangeably in Windows, cmd is a special case as the forward slash is used for switches and options to built-in commands. And its parser often stumbles over forward slashes. You usually can safely pass such paths to external commands, though.
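If a path arrives with forward slashes, for example from a configuration value, one option is to convert it with a substring replacement before handing it to for /F. A minimal sketch, where sample.log is a hypothetical file name standing in for the real one:

```batch
@echo off
setlocal
rem Hypothetical path written with forward slashes.
set "logPath=C:/ExecutionSDKTest_10.2.2/Logs/sample.log"
rem Replace every / with \ using variable substring substitution.
set "logPath=%logPath:/=\%"
echo %logPath%
endlocal
```

Inside a parenthesised block such as the loop in the question, the same replacement would need delayed expansion, that is !logPath:/=\! instead of %logPath:/=\%.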
You don't tell us which line is issuing the "invalid switch" error message, but I see several potential problems:
to use !variables! you need to enable delayed expansion
SetLocal EnableDelayedExpansion
don't use '/' in filenames, change to '\'
set logPath="C:\ExecutionSDKTest_10.2.2\Logs\!fileName!.log"
print command sends a text file to the printer. Change it to echo
echo !logLine!

Batch command to find/replace text inside file

I have a template file (say myTemplate.txt) and I need to make some edits to create my own file (say myFile.txt) from this template.
So the template contains lines like
env.name=
env.prop=
product.images.dir=/opt/web-content/product-images
Now I want this to be replaced as follows;
env.name=abc
env.prop=xyz
product.images.dir=D:/opt/web-content/product-images
So I am looking for batch commands to do the following;
1. Open the template file.
2. Do a kind of find/replace for the string/text
3. Save the updates as a new file
How do I achieve this ?
The easiest route is to modify your template to look something like this:
env.name=!env.name!
env.prop=!env.prop!
product.images.dir=/opt/web-content/product-images
And then use a FOR loop to read and write the file while delayed expansion is enabled:
@echo off
setlocal enableDelayedExpansion
set "env.name=abc"
set "env.prop=xyz"
(
for /f "usebackq delims=" %%A in ("template.txt") do echo %%A
) >"myFile.txt"
Note it is much faster to use one overwrite redirection > for the entire loop than it is to use append redirection >> within the loop.
The above assumes that no lines in template begin with ;. If they do, then you need to change the FOR EOL option to a character that will never start a line. Perhaps equal - for /f "usebackq eol== delims="
Also the above assumes the template doesn't contain any blank lines that you need preserved. If there are, then you can modify the above as follows (this also eliminates any potential EOL issue)
@echo off
setlocal enableDelayedExpansion
set "env.name=abc"
set "env.prop=xyz"
(
for /f "delims=" %%A in ('findstr /n "^" "template.txt"') do (
set "ln=%%A"
echo(!ln:*:=!
)
) >"myFile.txt"
There is one last potential complicating issue: you could have problems if the template contains ! and ^ literals. You could either escape the chars in the template, or you could use some additional substitution.
template.txt
Exclamation must be escaped^!
Caret ^^ must be escaped if line also contains exclamation^^^!
Caret ^ should not be escaped if line does not contain exclamation point.
Caret !C! and exclamation !X! could also be preserved using additional substitution.
extract from templateProcessor.bat
setlocal enableDelayedExpansion
...
set "X=^!"
set "C=^"
...

Resources