if my file conatins below text :
sampleA1xxx sampleA2yyyy sampleA3zzzzz ... sampleA4hhhhh
I want to find sampleA4 and display sampleA4hhhh using windows batch script.
Thats is my output should be: sampleA4hhhhh
Could anyone please help me.
take a batch or have a look at GNUWin sed:
>type file
^^sampleA1xxx ^^sampleA2yyyy ^^sampleA3zzzzz ^^sampleA4hhhhh
>sed -r "s/.*(\b\w+4\w+)/\1/" file
sampleA4hhhhh
#echo off
setlocal EnableDelayedExpansion
set target=sampleA4
set len=8
for /F "delims=" %%a in ('findstr "%target%" theFile.txt') do (
for %%b in (%%a) do (
set word=%%b
if "!word:~0,%len%!" equ "%target%" (
echo !word:~%len%!
)
)
)
#ECHO OFF
SETLOCAL
FOR /f "delims=" %%i IN (q17316008.txt) DO SET line=%%i
SET line=%line:^= %
SET line=%line:*sampleA4=sampleA4%
FOR %%i IN (%line:^^= %) DO SET line=%%i&GOTO :done
:done
ECHO %line%
GOTO :EOF
This should do what I gather to be the task. It assumes that the "samplea4" string exists in the file's single line, is case-insensitive and the line doesn't exceed the ~8K limit on line length.
Simply replace the carets with spaces, lop off the leading characters to the first occurrence of the target string, and process that target string as a list; the first element will be the required string, so stop the processing when it's available.
OK, I tried this myself with a similar case and it worked fine:
for /f "tokens=4 delims=^^" %%a in (seperate.txt) do (echo %%a)
pause
Note: This is designed for a batch file, and replace the 4 after "tokens=" with whichever separated text you want to stop at.
Hope this helped,
Yours Mona.
Related
I wan to search for a specific string in the file-path and double that particular string.
Example: if my filepath is C:\Users\XXXX\Desktop\g%h.txt
I want to search and get the % symbol and I need to double it like this C:\Users\XXXX\Desktop\g%%h.txt
How to do this with batch file?
in batchfiles, you have to escape some special chars, so cmd knows NOT to treat them as "special". A percent-sign is escaped with another percent-sign:
#echo off
setlocal enabledelayedexpansion
for %%i in (*.txt) do (
set name=%%i
echo old name: !name!
set name=!name:%%=%%%%!
echo new name: !name!
)
Here is sample to match any txt that contain one or more %
#echo off
setlocal enabledelayedexpansion
for /f "delims=" %%i in ('dir /b /a-d *.txt ^|findstr /i "%%"') do ( set file=%%i
rem Here to remove every % in the name.
echo ren "!file!" "!file:%%=!"
)
exit /b 0
In batch files as it was mentioned above you have to escape % sign with another % sign and file names are no exception. So instead of set file=C:\Users\XXX\Desktop\r%h.txt you must use set file=C:\Users\XXX\Desktop\r%%h.txt.
I wrote a separate batch file that accept two arguments: the first one is naming pattern and the second one is a symbol that needs to be doubled. So, for instance, if you want to double character "r" in all .txt files in the current directory you type
<nameOfTheBatchFile> *.txt r in the command line. Or in your case you can go with
<nameOfTheBatchFile> C:\Users\XXX\Desktop\r%h.txt %
#echo off
setlocal enabledelayedexpansion
set "namingPattern=%1"
set "charToDouble=%2"
for %%i in (%namingPattern%) do (
set "curName=%%i"
set newName=!curName:%charToDouble%=%charToDouble%%charToDouble%!
if "!curName!" neq "!newName!" (
echo Renaming !curName! -^> !newName!
ren "!curName!" "!newName!"
)
)
I have these files that contain a name in the first line I wish to extract. All the methods I have approach either result in error or give me nothing.
An example file can be found here: https://www.dropbox.com/s/0kcgj8vr3wii33s/FORS2.2011-10-23T23_59_20.355_out.fits?dl=0
I wish to extract "PN-G013.3+01.1", which lies on the first line (column 2091 in this case, the name is unique to each file) and is classed as "OBJECT".
Thank you in advance!
Peter
Interesting problem! Long lines may be read via chunks of 1023 characters long using set /P command; this method works as long as the line be less than 8193 characters long. This method also allows you to search for any string in the whole file, not just in the first line, and does not require any hard-coded lenghts.
#echo off
setlocal EnableDelayedExpansion
set "file=FORS2.2011-10-23T23_59_20.355_out.fits"
call :ReadLongLine < "%file%"
goto :EOF
:ReadLongLine
set "line="
:nextPart
set "part="
set /P part=
set "line=!line!!part!"
if "!part:~1022!" neq "" goto nextPart
set "line=!line:*OBJECT=!"
for /F "tokens=2 delims='" %%a in ("!line:~0,80!") do set "result=%%a"
echo Result: "%result%"
I suspect you have problems with it because it has Unix line endings instead of DOS style line endings.
This appears to work, but it's a bit brittle - tied to the position of the contents of the example you linked:
for /f "usebackq delims== tokens=28" %%a in (`findstr /n "OBJECT" FORS2.2011-10-23T23_59_20.355_out.fits`) do (
set x=%%a
)
set x=%x:'=%
for /f "tokens=1" %%a in ("%x%") do (set x=%%a)
echo %x%
If it's always a 14 character name field, you can skip the last for loop and try:
for /f "usebackq delims== tokens=28" %%a in (`findstr /n "OBJECT" FORS2.2011-10-23T23_59_20.355_out.fits`) do (
set x=%%a
)
set x=%x:~2,14%
echo %x%
I have a text file that is generated daily that has between 20 and 50 lines, each ending with a set delimiter, currently ~ but that can be changed to anything. I want to take this text file and split it into 20 to 50 different files, with one line per file. Ideally I'd like to do this from a windows batch file, or at least a program that I can script to run daily. Appreciate any help you can provide, thanks!
Sample Input File:
GS*SH*CSMZ9*20131209*0908*000000001*X*004010|ST*856*0001|BSN*00*000000001~
GS*SH*CSMZ9*20131209*0912*003400004*X*004010|SG*834*0001|TSN*00*000000004~
GS*SH*CSMZ5*20131209*1001*000040007*X*004010|ST*855*0001|BSN*00*000000007~
GS*SH*CSMZ9*20131209*1019*000000010*X*004010|SG*856*0001|BGN*00*000000010~
for /f "tokens=1*delims=:" %%a in ('findstr /n "^" "file.txt"') do for /f "delims=~" %%c in ("%%~b") do >"text%%a.txt" echo(%%c
I propose python here. Much more readable that any shell script.
f=open("filename")
lines = f.read().split('~')
for (n, line) in enumerate(lines):
open( "newname{}".format(n), "w" ).write(line)
This should create files called 00000001.txt and increment the number for each line.
If any line contains ! then it needs an extra modification.
#echo off
setlocal enabledelayedexpansion
set c=0
for /f "delims=" %%a in (file.txt) do (
set /a c+=1
set name=0000000!c!
set name=!name:~-8!.txt
>!name! echo %%a
)
#ECHO OFF
SETLOCAL
SET "destdir=c:\destdir"
FOR /f "tokens=1*delims=:" %%a IN ('findstr /n "." q21239387.txt') DO (
>"%destdir%\filename%%a.txt" ECHO(%%b
)
GOTO :EOF
where q21239387.txt is the name of the file I used as input for testing.
Files filename1.txt..filename99.txt created in the destination directory (99 is # lines in the file)
Note - will not correctly output source lines that begin with : if that is a problem.
Is it possible to remove duplicate rows from a text file? If yes, how?
Sure can, but like most text file processing with batch, it is not pretty, and it is not particularly fast.
This solution ignores case when looking for duplicates, and it sorts the lines. The name of the file is passed in as the 1st and only argument to the batch script.
#echo off
setlocal disableDelayedExpansion
set "file=%~1"
set "sorted=%file%.sorted"
set "deduped=%file%.deduped"
::Define a variable containing a linefeed character
set LF=^
::The 2 blank lines above are critical, do not remove
sort "%file%" >"%sorted%"
>"%deduped%" (
set "prev="
for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%A in ("%sorted%") do (
set "ln=%%A"
setlocal enableDelayedExpansion
if /i "!ln!" neq "!prev!" (
endlocal
(echo %%A)
set "prev=%%A"
) else endlocal
)
)
>nul move /y "%deduped%" "%file%"
del "%sorted%"
This solution is case sensitive and it leaves the lines in the original order (except for duplicates of course). Again the name of the file is passed in as the 1st and only argument.
#echo off
setlocal disableDelayedExpansion
set "file=%~1"
set "line=%file%.line"
set "deduped=%file%.deduped"
::Define a variable containing a linefeed character
set LF=^
::The 2 blank lines above are critical, do not remove
>"%deduped%" (
for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%A in ("%file%") do (
set "ln=%%A"
setlocal enableDelayedExpansion
>"%line%" (echo !ln:\=\\!)
>nul findstr /xlg:"%line%" "%deduped%" || (echo !ln!)
endlocal
)
)
>nul move /y "%deduped%" "%file%"
2>nul del "%line%"
EDIT
Both solutions above strip blank lines. I didn't think blank lines were worth preserving when talking about distinct values.
I've modified both solutions to disable the FOR /F "EOL" option so that all non-blank lines are preserved, regardless what the 1st character is. The modified code sets the EOL option to a linefeed character.
New solution 2016-04-13: JSORT.BAT
You can use my JSORT.BAT hybrid JScript/batch utility to efficiently sort and remove duplicate lines with a simple one liner (plus a MOVE to overwrite the original file with the final result). JSORT is pure script that runs natively on any Windows machine from XP onward.
#jsort file.txt /u >file.txt.new
#move /y file.txt.new file.txt >nul
you may use uniq http://en.wikipedia.org/wiki/Uniq from UnxUtils http://sourceforge.net/projects/unxutils/
Some time ago I found an unexpectly simple solution, but this unfortunately only works on Windows 10: the sort command features some undocumented options that can be adopted:
/UNIQ[UE] to output only unique lines;
/C[ASE_SENSITIVE] to sort case-sensitively;
So use the following line of code to remove duplicate lines (remove /C to do that in a case-insensitive manner):
sort /C /UNIQUE "incoming.txt" /O "outgoing.txt"
This removes duplicate lines from the text in incoming.txt and provides the result in outgoing.txt. Regard that the original order is of course not going to be preserved (because, well, this is the main purpose of sort).
However, you sould use these options with care as there might be some (un)known issues with them, because there is possibly a good reason for them not to be documented (so far).
The Batch file below do what you want:
#echo off
setlocal EnableDelayedExpansion
set "prevLine="
for /F "delims=" %%a in (theFile.txt) do (
if "%%a" neq "!prevLine!" (
echo %%a
set "prevLine=%%a"
)
)
If you need a more efficient method, try this Batch-JScript hybrid script that is developed as a filter, that is, similar to Unix uniq program. Save it with .bat extension, like uniq.bat:
#if (#CodeSection == #Batch) #then
#CScript //nologo //E:JScript "%~F0" & goto :EOF
#end
var line, prevLine = "";
while ( ! WScript.Stdin.AtEndOfStream ) {
line = WScript.Stdin.ReadLine();
if ( line != prevLine ) {
WScript.Stdout.WriteLine(line);
prevLine = line;
}
}
Both programs were copied from this post.
set "file=%CD%\%1"
sort "%file%">"%file%.sorted"
del /q "%file%"
FOR /F "tokens=*" %%A IN (%file%.sorted) DO (
SETLOCAL EnableDelayedExpansion
if not [%%A]==[!LN!] (
set "ln=%%A"
echo %%A>>"%file%"
)
)
ENDLOCAL
del /q "%file%.sorted"
This should work exactly the same. That dbenham example seemed way too hardcore for me, so, tested my own solution. usage ex.: filedup.cmd filename.ext
Pure batch - 3 effective lines.
#ECHO OFF
SETLOCAL
:: remove variables starting $
FOR /F "delims==" %%a In ('set $ 2^>Nul') DO SET "%%a="
FOR /f "delims=" %%a IN (q34223624.txt) DO SET $%%a=Y
(FOR /F "delims=$=" %%a In ('set $ 2^>Nul') DO ECHO %%a)>u:\resultfile.txt
GOTO :EOF
Works happily if the data does not contain characters to which batch has a sensitivity.
"q34223624.txt" because question 34223624 contained this data
1.1.1.1
1.1.1.1
1.1.1.1
1.2.1.2
1.2.1.2
1.2.1.2
1.3.1.3
1.3.1.3
1.3.1.3
on which it works perfectly.
Did come across this issue and had to resolve it myself because the use was particulate to my need.
I needed to find duplicate URL's and order of lines was relevant so it needed to be preserved. The lines of text should not contain any double quotes, should not be very long and sorting cannot be used.
Thus I did this:
setlocal enabledelayedexpansion
type nul>unique.txt
for /F "tokens=*" %%i in (list.txt) do (
find "%%i" unique.txt 1>nul
if !errorlevel! NEQ 0 (
echo %%i>>unique.txt
)
)
Auxiliary: if the text does contain double quotes then the FIND needs to use a filtered set variable as described in this post: Escape double quotes in parameter
So instead of:
find "%%i" unique.txt 1>nul
it would be more like:
set test=%%i
set test=!test:"=""!
find "!test!" unique.txt 1>nul
Thus find will look like find """what""" file and %%i will be unchanged.
I have used a fake "array" to accomplish this
#echo off
:: filter out all duplicate ip addresses
REM you file would take place of %1
set file=%1%
if [%1]==[] goto :EOF
setlocal EnableDelayedExpansion
set size=0
set cond=false
set max=0
for /F %%a IN ('type %file%') do (
if [!size!]==[0] (
set cond=true
set /a size="size+1"
set arr[!size!]=%%a
) ELSE (
call :inner
if [!cond!]==[true] (
set /a size="size+1"
set arr[!size!]=%%a&& ECHO > NUL
)
)
)
break> %file%
:: destroys old output
for /L %%b in (1,1,!size!) do echo !arr[%%b]!>> %file%
endlocal
goto :eof
:inner
for /L %%b in (1,1,!size!) do (
if "%%a" neq "!arr[%%b]!" (set cond=true) ELSE (set cond=false&&goto :break)
)
:break
the use of the label for the inner loop is something specific to cmd.exe and is the only way I have been successful nesting for loops within each other. Basically this compares each new value that is being passed as a delimiter and if there is no match then the program will add the value into memory. When it is done it will destroy the target files contents and replace them with the unique strings
Task in CMD.
1) How can I compare if string is in string? I checked manual here for "Boolean Test "does string exist ?"" But I can't understand the example or it does not work for me. This piece of code, it is just a try. I try to make a string compare of filter some sting if there is a tag <a> in a line.
FOR /f "tokens=* delims= usebackq" %%c in ("%source%") DO (
echo %%c
IF %%c == "<a" (pause)
)
So while I read a file, it should be paused if there is a link on a line.
2) I have one more ask. I would need to filter the line if there is a specific file in the link, and get content of the link. My original idea was to try to use findstr with regex, but it seems not to use sub-patterns. And next problem would be how to get the result to variable.
set "pdf=0_1_en.pdf"
type "%source%" | grep "%pdf%" | findstr /r /c:"%pdf%.*>(.*).*</a>"
So in summary, I want to go through file and if there is a link like this: REPAIRED: *
<b>GEN 0.1 Preface</b>
I forgot to style this as a code, so the inside of code was not displayed. Sorry.
Warnning: we don't know the path, only the basic filename.
Get the title GEN 0.1 Preface. But you should know, that there are also similar links with same link, which contain image, not a text inside a tag.
Code according Aacini to be changed a little bit:
#echo off
setlocal EnableDelayedExpansion
set "source=GEN 0 GENERAL.html"
set "pdf=0_1_en.pdf"
echo In file:%source%
echo Look for anchor:%pdf%
rem Process each line in %source% file:
for /F "usebackq delims=" %%c in ("%source%") do (
set "line=%%c"
rem Test if the line contain a "tag" that start with "<a" string:
set "tag=!line:*<a=!"
if not "!tag!" == "!line!" (
rem Take the string in tag that end in ">"
for /F "delims=^>" %%a in ("!tag!") do set "link=%%a"
echo Link found: !link!
if "!link!" == "GEN 0.1 Preface" echo Seeked link found
)
)
pause
Still not finished
Although your question is extensive it does not provide to much details, so I assumed several points because I don't know too much about .PDF files, tags, etc.
#echo off
setlocal EnableDelayedExpansion
set "source=GEN 0 GENERAL.html"
set "pdf=0_1_en.pdf"
echo In file: "%source%"
echo Look for anchor: "%pdf%"
rem Process each line in %source% file:
for /F "usebackq delims=" %%c in ("%source%") do (
set "line=%%c"
rem Test if the line contain "<a>" tag:
set "tag=!line:*<a>=!"
if not "!tag!" == "!line!" (
rem Test if "<a>" tag contain the anchor pdf file:
if not "!tag:%pdf%=!" == "!tag!" (
rem Get the value of "<b>" sub-tag
set "tag=!tag:<b>=$!"
set "tag=!tag:</b>=$!"
for /F "tokens=2 delims=$" %%b in ("!tag!") do set title=%%b
echo Title found: "!title!"
)
)
)
pause
Any missing point can be added or fixed, if you give me precise details about them.
EDIT: I fixed the program above after last indications from the OP. I used $ character to get the Title value; if this character may exist in original Tag, it must be changed by another unused one.
I tested this program with this "GEN 0 GENERAL.html" example file:
Line one
<a>href="/Dokumenter/EK_GEN_0_X_en.pdf" class="uline"><b>GEN 0.X Preface</b></a>
Line three
<a>href="/Dokumenter/EK_GEN_0_1_en.pdf" class="uline"><b>GEN 0.1 Preface</b></a>
Line five
and get this result:
In file: "GEN 0 GENERAL.html"
Look for anchor: "0_1_en.pdf"
Title found: "GEN 0.1 Preface"
EDIT: New faster method added
There is a simpler and faster method to solve this problem that, however, may fail if a line contains more than one tag:
#echo off
setlocal EnableDelayedExpansion
set "source=GEN 0 GENERAL.html"
set "pdf=0_1_en.pdf"
echo In file: "%source%"
echo Look for anchor: "%pdf%"
for /F "delims=" %%c in ('findstr /C:"<a>" "%source%" ^| findstr /C:"%pdf%"') do (
set "tag=%%c"
rem Get the value of "<b>" sub-tag
set "tag=!tag:<b>=$!"
set "tag=!tag:</b>=$!"
for /F "tokens=2 delims=$" %%b in ("!tag!") do set title=%%b
echo Title found: "!title!"
)
pause
First, one important question: does this really have to be implemented via a CMD script? Would you be able to go with VBScript, PowerShell, C#, or some other scripting/programming language? CMD is a notoriously painful scripting environment.
Secondly, I'm not sure if this answers your question--it's a bit unclear--but here's a quick trick you can use to see in CMD to see if a given string contains another substring:
setlocal enableextensions enabledelayedexpansion
set PATTERN=somepattern
for /f "delims=" %%f in (somefile.txt) do (
set CURRENT_LINE=%%f
if "!CURRENT_LINE:%PATTERN%=!" neq "!TEMP!" (
echo Found pattern in line: %%f
)
)
The idea is that you try to perform string replacement and see if anything was changed. This is certainly a hack, and it would be preferable if you could instead use a tool like findstr or grep, but if you're limited in your options, something like the above should work.
NOTE: I haven't actually run the above script excerpt, so let me know if you have any difficulty with it.
I have modified the way to do it. I realized that it is better to find name of pdf document first. This is my almost completed solution, but I ask you if you could help me with the last point. The last replacing statement does not work because I need to remove closing tag b. Just to get the title.
#echo off
setlocal EnableDelayedExpansion
set "source=GEN 0 GENERAL.html"
set "pdf=0_1_en.pdf"
echo In file:%source%
echo Look for anchor:%pdf%
rem Process each line in %source% file:
for /F "usebackq delims=" %%c in ("%source%") do (
set "line=%%c"
REM Test if the line contains pdf file I look for:
SET "pdfline=!line:%pdf%=!"
if not "!pdfline!" == "!line!" (
cls
echo Line: !line!
REM Test if the pdfline contains tag b
SET "tagline=!pdfline:*><b>=!"
if not "!tagline!" == "!pdfline!" (
cls
echo ACTUAL LINE: !tagline!
REM Remove closing tag b
SET "title=!tagline:</b*=!"
echo TITLE: !title!
pause
)
)
)
pause
BTW:
The html page I work with is here.
So I ask you to help complete/repair line SET "title=!tagline:</b*=!"