windows batch file script to sort URLs

How do I write a Windows batch script that reorders the URLs in a text file so that each run of lines contains one URL per unique file name? I don't know how to describe what I want any better, but I hope the example below explains it:
I want this text
http://example.com/5235/Guava.jpg
http://example.com/2725/Guava.jpg
http://example.com/4627/Guava.jpg
http://example.com/8385/Guava.jpg
http://example.com/3886/Lemon.jpg
http://example.com/5896/Lemon.jpg
http://example.com/2788/Lemon.jpg
http://example.com/1758/Lemon.jpg
http://example.com/1788/Apple.jpg
http://example.com/1567/Apple.jpg
http://example.com/8065/Apple.jpg
http://example.com/6467/Apple.jpg
http://example.com/1464/Banana.jpg
http://example.com/6581/Banana.jpg
http://example.com/4642/Banana.jpg
http://example.com/8635/Banana.jpg
http://example.com/2578/Pineapple.jpg
http://example.com/1452/Pineapple.jpg
http://example.com/8652/Pineapple.jpg
http://example.com/9463/Pineapple.jpg
http://example.com/9765/Peach.jpg
http://example.com/3578/Peach.jpg
http://example.com/3583/Peach.jpg
http://example.com/9467/Peach.jpg
http://example.com/3683/Mango.jpg
http://example.com/3479/Mango.jpg
http://example.com/1795/Mango.jpg
http://example.com/7345/Mango.jpg
sorted this way
http://example.com/5235/Guava.jpg
http://example.com/3886/Lemon.jpg
http://example.com/1788/Apple.jpg
http://example.com/1464/Banana.jpg
http://example.com/2578/Pineapple.jpg
http://example.com/9765/Peach.jpg
http://example.com/3683/Mango.jpg
http://example.com/2725/Guava.jpg
http://example.com/5896/Lemon.jpg
http://example.com/1567/Apple.jpg
http://example.com/6581/Banana.jpg
http://example.com/1452/Pineapple.jpg
http://example.com/3578/Peach.jpg
http://example.com/3479/Mango.jpg
http://example.com/4627/Guava.jpg
http://example.com/2788/Lemon.jpg
http://example.com/8065/Apple.jpg
http://example.com/4642/Banana.jpg
http://example.com/8652/Pineapple.jpg
http://example.com/3583/Peach.jpg
http://example.com/1795/Mango.jpg
http://example.com/8385/Guava.jpg
http://example.com/1758/Lemon.jpg
http://example.com/6467/Apple.jpg
http://example.com/8635/Banana.jpg
http://example.com/9463/Pineapple.jpg
http://example.com/9467/Peach.jpg
http://example.com/7345/Mango.jpg
In other words, for this particular example (with four of each fruit JPEG) I want to reorder the lines like this: 1, 5, 9, 13, 17, 21, 25, 2, 6, 10, 14, 18, 22, 26, and so on.
The text file always contains the same number of URLs for every "fruit" picture. There can't be six lemon jpg files and four guava jpg files. I hope you get what I mean.

Maybe something like this:
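@ECHO OFF
SET origfile=urls.txt
SET c=1
SET skip=4
REM Remove temporary files left over from a previous run.
FOR /L %%c IN (1,1,%skip%) DO IF EXIST %origfile%.%%c DEL %origfile%.%%c
REM Send each line to one of the temporary files in turn.
FOR /F "tokens=*" %%L IN (%origfile%) DO CALL :process "%%L"
DEL %origfile%
REM Concatenate the temporary files under the original name.
FOR /L %%c IN (1,1,%skip%) DO (
    TYPE %origfile%.%%c >> %origfile%
    DEL %origfile%.%%c
)
GOTO :EOF

:process
ECHO %~1>>%origfile%.%c%
SET /A c=c%%skip+1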
The idea is to output subsequent lines to different files, repeating the sequence every 4 lines (and 4 is parametrised here actually, so you can easily change it), then concatenate those files under the original name.
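As a cross-check, the same round-robin idea can be sketched in Python, assuming the whole file fits in memory. The names `interleave` and `group_size` are made up for this illustration; `group_size` plays the role of `skip` in the batch script.

```python
def interleave(lines, group_size):
    """Reorder lines as 1, 1+g, 1+2g, ..., then 2, 2+g, ... (g = group_size)."""
    # Bucket k collects every line whose zero-based index i satisfies
    # i % group_size == k, mirroring the temporary files urls.txt.1 .. urls.txt.4.
    buckets = [[] for _ in range(group_size)]
    for i, line in enumerate(lines):
        buckets[i % group_size].append(line)
    # Concatenate the buckets in order, like the final TYPE loop.
    return [line for bucket in buckets for line in bucket]
```

Calling `interleave(lines, 4)` on the 28 URLs from the question should put lines 1, 5, 9, ... first, followed by lines 2, 6, 10, ... and so on.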

Run this on your file. Algorithm as described in my comment above.
#!/bin/bash
FILE=$1
FIRST=$(head -1 $FILE)
COUNT=$(grep $FIRST $FILE | wc -l)
LINES=$(uniq $FILE)
for i in $(seq 1 $COUNT); do
echo $LINES | tr " " "\n"
done

You can tell sort where to start comparing:
/+n Specifies the character number, n, to
begin each comparison. /+3 indicates that
each comparison should begin at the 3rd
character in each line. Lines with fewer
than n characters collate before other lines.
By default comparisons start at the first
character in each line.
So if your URI prefix is always the same (which your comments indicated) you can just run the file through
sort /+25 list.txt /O:list_new.txt
which should sort it by file name, then.
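To see what comparing from a fixed character position does, here is a minimal Python sketch. The function name `sort_from_column` is hypothetical, and the comparison is a plain case-sensitive string comparison, which is an assumption and may not match the collation the Windows `sort` command uses.

```python
def sort_from_column(lines, start):
    """Sort lines, comparing from the 1-based character position `start`."""
    # Lines shorter than `start` get an empty key, so they collate first,
    # as the help text above describes.
    return sorted(lines, key=lambda line: line[start - 1:])
```

With the URLs from the question, `start=25` skips the 24 characters of a prefix such as `http://example.com/5235/`, so the comparison begins at the file name.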

Related

How to rename multiple files in bash by moving/replacing characters and converting date format in file name? [closed]

In my log directory I have 600+ log files that have a naming scheme as:
abc.log.DDMMMYYYY
For example:
abc.log.01Nov2017
abc.log.02Nov2017
abc.log.10Dec2017
abc.log.21Jan2018
abc.log.22Jan2018
abc.log.23Jan2018
I am looking for a way to rename all these files as...
YYYY-MM-DD.abc.log
The month name in file name must convert to month number. (Jan = 01, Feb = 02 ...)
For example:
2017-11-01.abc.log
2017-11-02.abc.log
2017-12-10.abc.log
2018-01-21.abc.log
2018-01-22.abc.log
2018-01-23.abc.log
How can I rename all these files in bash?

#!/bin/bash -e
# Create kludged associative array (for bash versions prior to 4 -- 4 has
# built-in associative arrays).
i=0
for Month in Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec; do
    let i=i+1
    # Pad with leading zero and then take last two characters.
    Padded=0$i
    eval Key$Month=${Padded: -2:2}
done
# Iterate on all files whose names match *.log.*.
for File in *.log.*; do
    # Match to pattern with expected date format.
    if [[ ! $File =~ ^(.*)\.log\.([0-3][0-9])([A-Z][a-z][a-z])([0-9]{4})$ ]]; then
        echo "$File does not match pattern."
    else
        # Extract matched name and date.
        Name=${BASH_REMATCH[1]}
        Day=${BASH_REMATCH[2]}
        InMonth=${BASH_REMATCH[3]}
        Year=${BASH_REMATCH[4]}
        # Convert month name abbreviation to month number.
        eval OutMonth=\$Key$InMonth
        NewName="$Year-$OutMonth-$Day.$Name.log"
        # Inform user.
        echo "Will rename $File to $NewName."
        # Rename.
        mv "$File" "$NewName"
    fi
done
This is locale sensitive, of course. And it expects four-digit years, so it will break in the year 10,000. And you could add various error checks.

As you have requested a batch-file solution, here is a possible one:
@echo off
setlocal EnableDelayedExpansion
rem Set Month Numbers:
set "Jan=01"
set "Feb=02"
set "Mar=03"
set "Apr=04"
set "May=05"
set "Jun=06"
set "Jul=07"
set "Aug=08"
set "Sep=09"
set "Oct=10"
set "Nov=11"
set "Dec=12"
rem Main Loop to rename files:
for %%A IN (*.log.*) do (
    for /f "delims=." %%B IN ("%%~xA") do (
        set "extension=%%B"
        call ren "%%A" "!extension:~5!-%%!extension:~2,-4!%%-!extension:~0,-7!.abc.log"
    )
)
Let me break it down:
First we set the MMM variables to the requested format: MM.
Now, we come to the main loop.
We loop through all files in the current folder (%cd%) whose names contain ".log.", using the * wildcard.
Then, we loop over the extension of each file found.
We store the extension (without the dot (.)) in the variable extension.
After that, we rename the file found in the first loop (%%A) using substrings of that variable.
To better understand how these commands work, I suggest you open a cmd window and type the following commands:
set /?
rem /?
for /?
ren /?
Some interesting references for further reading:
https://ss64.com/nt/for.html
https://ss64.com/nt/ren.html
What does %date:~-4,4%%date:~-10,2%%date:~-7,2%_%time:~0,2%%time:~3,2% mean?
https://www.dostips.com/DtTipsStringManipulation.php#Snippets.MidString
https://www.robvanderwoude.com/battech_wildcards.php
What does "&&" in this batch file?
https://ss64.com/nt/syntax-redirection.html

If gawk is available, please try:
#!/bin/bash
for f in abc.log.*; do echo "$f"; done | awk 'BEGIN {
str="JanFebMarAprMayJunJulAugSepOctNovDec"
for (i=1; i<=12; i++) s2n[substr(str, i*3-2, 3)] = sprintf("%02d", i)
}
{if (match($0, "(.+)\\.([0-9]{2})([A-Z][a-z]{2})([0-9]{4})", a))
printf("%s%c%04d-%02d-%02d.%s%c", $0, 0, a[4], s2n[a[3]], a[2], a[1], 0)
}
' | xargs -0 -n 2 mv
It maps the month name to month number via an array s2n.
The awk script outputs pairs of filenames like: abc.log.01Nov2017, 2017-11-01.abc.log... The filenames are separated by a null character.
The xargs reads the passed filenames two by two and performs mv command on
them.
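
The month-name-to-number mapping that the answers above rely on can also be sketched in Python. This is only an illustration under assumptions: the names `MONTH_NUMBER`, `PATTERN` and `new_name` are invented here, English month abbreviations are assumed, and the function only computes the target name without renaming anything.

```python
import re

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Map "Jan" -> "01", ..., "Dec" -> "12".
MONTH_NUMBER = {name: f"{i:02d}" for i, name in enumerate(MONTHS, start=1)}

# <base>.<DD><Mon><YYYY>, for example abc.log.01Nov2017
PATTERN = re.compile(r"^(.+)\.(\d{2})([A-Z][a-z]{2})(\d{4})$")


def new_name(old_name):
    """Return the YYYY-MM-DD.<base> name, or None if old_name does not match."""
    match = PATTERN.match(old_name)
    if match is None or match.group(3) not in MONTH_NUMBER:
        return None
    base, day, month, year = match.groups()
    return f"{year}-{MONTH_NUMBER[month]}-{day}.{base}"
```

A caller could loop over a directory listing and pass each matching pair to `os.rename`; that step is left out here so the sketch has no side effects.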

Batch file to read a txt with special characters and replace a word in it

I'm trying to make a batch file that reads a txt file "ayylmao.txt", finds a specific word "hello" and replaces it with "xello".
The problem is that "ayylmao.txt" contains special characters.
Ayylmao.txt looks something like this:
‹‹R‹Ę‹/M‹;Ču‹č˙˙˙‹‹#‰‹‹#CëC;Đu‹čq˙˙˙‹‹#C‹D$‰;7u®‹Ó‹Ćčúţ˙˙„Ŕu3Ŕ‰YZ]_^[ĂŤ# SVWUÄđ‰$‹ô‹‰D$‹
‹‹#;Č‚† ‹Ř‹>_‹ůz;ßrv;Ču!‹B‹A‹B‹)B‹x uV‹čđţ˙˙ëM‹Ř‹>_‹ůz;ßu
‹B‹)Bë3‹Z‰\$‹>‹‹.}+ű‰|$+Č‹‰HŤT$‹čMţ˙˙„Ŕu3 hello Ŕë°ë‹‹ ‰‹;D$…Y˙˙˙3ŔÄ]_^[ĂSVW‹Ú‹đţ }ľ ëĆ˙˙ ć ˙˙‰sjh Vj
You can see the word "hello" in the last line. I want the batch file to process it and give me an ayylmao1.txt that looks like this:
‹‹R‹Ę‹/M‹;Ču‹č˙˙˙‹‹#‰‹‹#CëC;Đu‹čq˙˙˙‹‹#C‹D$‰;7u®‹Ó‹Ćčúţ˙˙„Ŕu3Ŕ‰YZ]_^[ĂŤ# SVWUÄđ‰$‹ô‹‰D$‹
‹‹#;Č‚† ‹Ř‹>_‹ůz;ßrv;Ču!‹B‹A‹B‹)B‹x uV‹čđţ˙˙ëM‹Ř‹>_‹ůz;ßu
‹B‹)Bë3‹Z‰\$‹>‹‹.}+ű‰|$+Č‹‰HŤT$‹čMţ˙˙„Ŕu3 xello Ŕë°ë‹‹ ‰‹;D$…Y˙˙˙3ŔÄ]_^[ĂSVW‹Ú‹đţ }ľ ëĆ˙˙ ć ˙˙‰sjh Vj
You can see that "hello" is now "xello".
I found this batch file that replaces a word from a text file:
@echo off
REM -- Prepare the Command Processor --
SETLOCAL ENABLEEXTENSIONS
SETLOCAL DISABLEDELAYEDEXPANSION
if "%~1"=="" findstr "^::" "%~f0"&GOTO:EOF
for /f "tokens=1,* delims=]" %%A in ('"type %3|find /n /v """') do (
    set "line=%%B"
    if defined line (
        call set "line=echo.%%line:%~1=%~2%%"
        for /f "delims=" %%X in ('"echo."%%line%%""') do %%~X
    ) ELSE echo.
)
This code works very well for files that don't contain special characters if I use it like this:
code.bat "hello" "xello" "ayylmao.txt">"ayylmao1.txt"
With my file, it replaces hello but writes only a few of the special characters to ayylmao1.txt. I want all the special characters written there.
I tried it like this:
chcp 1252
code.bat "hello" "xello" "ayylmao.txt">"ayylmao1.txt"
But it didn't work. It behaved just like the first version.
If there is a way in PowerShell to do this I'd be glad to hear it.

What you have there looks like a binary file, not a text file, despite the extension. Batch is no good for editing binary files. In PowerShell it's doable, but you need to resort to working with the data bytes instead of simple text.
This is a basic example that will find the first occurrence of the string "hello" in your file and replace it with "xhello":
$f = 'C:\path\to\ayylmao.txt'
$stext = 'hello'
$rtext = [char[]]'xhello'
$len = $stext.Length
$offset = $len - 1
$data = [IO.File]::ReadAllBytes($f)
# find first occurrence of $stext in byte array
for ($i=0; $i -lt $data.Count - $offset; $i++) {
    $slice = $data[$i..($i+$offset)]
    if (-join [char[]]$slice -eq $stext) { break }
}
# Once you know the beginning ($i) and length ($len) of the array slice
# containing $stext you can "cut up" $data and concatenate the slices before
# and after $stext to the byte sequence you want to insert ($rtext):
#
#        |<-- $stext -->|
# [...]['h','e','l','l','o'][...]  <-- $data
#     ^  ^               ^  ^
#     |  |               |  |
#     |  $i              |  $i+$len
#     $i-1               $i+$offset (== $i+$len-1)
#
$rdata = $data[0..($i-1)] + [byte[]]$rtext + $data[($i+$len)..($data.Count-1)]
[IO.File]::WriteAllBytes($f, $rdata)
You'll need to adjust this code if you want the replacement to work differently (replace other occurrences as well, replace a different occurrence, …).
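For comparison, the same cut-and-concatenate idea on raw bytes can be sketched in Python. The function name `replace_first` is made up for this example, and unlike the PowerShell code it returns the data unchanged when the search bytes are not found, which is a design choice of this sketch.

```python
def replace_first(data, search, replacement):
    """Replace the first occurrence of `search` in `data`; all arguments are bytes."""
    index = data.find(search)
    if index < 0:
        return data  # nothing to replace, return the input unchanged
    # Same slicing as in the diagram: before + replacement + after.
    return data[:index] + replacement + data[index + len(search):]
```

To apply it to a file, one would read the content with `open(path, "rb")` and write the result back with `open(path, "wb")`, so that no text decoding touches the other bytes.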

This batch code comes from this site, where there is a link to a discussion of why it doesn't work with special characters.

Yes, the PowerShell replace command can replace the string and keep the special characters. To call it from within your batch script, use the following line
powershell -command "(get-content Ayylmao.txt) -replace 'hello','xello' | set-content Ayylmao.txt"
If you want to enter your parameters from the command line, then the line would be
powershell -command "(get-content %3) -replace '%1','%2' | set-content %4"
And if you want to use variables defined in the batch script, it is the same as you would for any batch script
set file=Ayylmao.txt
set Search_criteria=hello
set Replace_criteria=xello
powershell -command "(get-content %file%) -replace '%Search_criteria%','%Replace_criteria%' | set-content %file%"

How to remove comments from text file

My text file contains one-line comments that all begin with "// " (two forward slashes and a space). These may either take up the whole line or just the last part of a line. No comment extends beyond the line that it's on, so there are no /* */ type comments crossing multiple lines.
In simple terms, all comments start with "//space" anywhere on the line. Anything starting with "//space" should be removed and trailing spaces on that line should also be removed. Leading spaces should stay. Any blank lines should be removed.
Sample file:
// This is a comment
x = 1 // This is also a comment after the double slash
x = 2
x = 3 // The above is a blank line
// Comment on this record but nothing precedes it, so should be deleted.
y = 4 // A line with leading spaces that should be kept.
z = "//path"; // The first double slashes are not a comment since the space is missing after the "//"
// Last comment line.
Result file (no trailing spaces, but keep leading spaces):
x = 1
x = 2
x = 3
y = 4
z = "//path";
I can remove the blank lines using gc file.txt | Where-Object { $_ -ne ''} > result.txt. However I'm having trouble with reading just the beginning part of a line up to the "//" comment part.
I also tried findstr but haven't found how to read each line up to the "//" and then trim spaces out.
I could write a script to loop through the file and do this, but it seems like there should be a way to accomplish it using a simple one- or two-line PowerShell or bat file command.
What is the easiest way (shortest amount of code) to remove these comments while keeping the uncommented contents of the file?

Since you seem to equate "easy" with "short", here's a fairly simple solution:
gc .\samplefile.txt|%{$_-replace"(.*)(// .*)",'$1'}|?{$_}
if it's really that important to you :-)
A bit more verbose version (still using regex):
Get-Content .\samplefile.txt | Where-Object {
-not ([String]::IsNullOrEmpty($_.Trim()) -or $_-match"^\s*// ")
} |ForEach-Object { $_ -replace "(.*)(// .*)",'$1' }
That being said, I would (personally) go for a more verbose and easier-to-read/maintain solution:
To remove everything after //, the easiest way is to find the first occurrence of // with String.IndexOf() and then grab the first part with String.Substring():
PS C:\> $CommentedString = "Content // this is a comment"
PS C:\> $CommentIndex = $CommentedString.IndexOf('// ')
PS C:\> $CommentedString.Substring(0,$CommentIndex)
Content
For the indented comments you can also use String.Trim() to remove whitespace from the beginning and end of the string:
PS C:\> "    // Indented comment".Trim() -match '^//'
True
You can use the ForEach-Object cmdlet to go through every line and apply the above:
function Remove-Comments {
    param(
        [string]$Path,
        [string]$OutFile
    )
    # Read file, remove comments and blank lines
    $CleanLines = Get-Content $Path |ForEach-Object {
        $Line = $_
        # Trim() removes whitespace from both ends of string
        $TrimmedLine = $Line.Trim()
        # Check if what's left is either nothing or a comment
        if([string]::IsNullOrEmpty($TrimmedLine) -or $TrimmedLine -match "^// ") {
            # if so, return nothing (inside foreach-object "return" acts like "continue")
            return
        }
        # See if non-empty line contains comment
        $CommentIndex = $Line.IndexOf("// ")
        if($CommentIndex -ge 0) {
            # if so, remove the comment and the spaces in front of it
            $Line = $Line.Substring(0,$CommentIndex).TrimEnd()
        }
        # return $Line to $CleanLines
        return $Line
    }
    if($OutFile -and (Test-Path $OutFile)){
        [System.IO.File]::WriteAllLines($OutFile, $CleanLines)
    } else {
        # No OutFile was specified, write lines to pipeline
        Write-Output $CleanLines
    }
}
Applied to your sample:
PS C:\> Remove-Comments D:\samplefile.txt
x = 1
x = 2
x = 3
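
The rules from the question (cut at the first "// ", strip trailing whitespace, drop lines that end up empty, keep leading spaces) can be restated as a short Python sketch. The function name `remove_comments` is invented for this illustration, and the input is assumed to be a list of lines already read from the file.

```python
def remove_comments(lines):
    """Strip '// ' comments and trailing whitespace; drop lines that end up empty."""
    cleaned = []
    for line in lines:
        # Only "//" followed by a space starts a comment, so "//path" is kept.
        index = line.find("// ")
        if index >= 0:
            line = line[:index]
        line = line.rstrip()  # trailing whitespace goes, leading whitespace stays
        if line:
            cleaned.append(line)
    return cleaned
```

Because the check is done after cutting and stripping, whole-line comments, indented comments and blank lines are all dropped by the same `if line` test.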

Like a great many text processing problems, there is a simple solution using JREPL.BAT - a powerful regex text processing utility for the Windows command line. It is pure script (hybrid JScript/batch) that runs natively on any Windows machine from XP onward. Full documentation is embedded within the script.
jrepl "^(.*?)\s*// " "$1!=''?$1:false" /jmatch /f test.txt /o out.txt
You can overwrite the original file by specifying - as the output file:
jrepl "^(.*?)\s*// " "$1!=''?$1:false" /jmatch /f test.txt /o -
I've tested, and it gives the exact output you are looking for.
If you put the command within a batch script, then you must use call jrepl

The Batch file below does what you want. Sorry, but there is no "easy short code" way to do this...
@echo off
setlocal EnableDelayedExpansion
rem Set the maximum number of trailing spaces as a power_of_2-1 value. For example, for 15 spaces:
set spcPow2=4
set "spaces= "
for /L %%i in (1,1,%spcPow2%) do set "spaces=!spaces!!spaces!"
set /A spcPow2-=1
rem Process all lines, excepting empty ones and lines that start with "/"
setlocal DisableDelayedExpansion
for /F "eol=/ delims=" %%a in (test.txt) do (
    set "line=%%a"
    rem Split line at "// " and get the first part
    setlocal EnableDelayedExpansion
    for /F "delims=¡" %%b in ("!line:// =¡!") do (
        endlocal
        set "line=%%b"
    )
    rem Eliminate trailing spaces
    setlocal EnableDelayedExpansion
    set spc=0
    for /L %%b in (%spcPow2%,-1,0) do (
        set /A "newSpc=spc+(1<<%%b)"
        for %%n in (!newSpc!) do if "!line:~-%%n!" equ "!spaces:~-%%n!" set "spc=%%n"
    )
    if !spc! gtr 0 for %%n in (!spc!) do set "line=!line:~0,-%%n!"
    rem Show resulting line
    if defined line echo !line!
    endlocal
)
EDIT: New solution added
@set @x=1 // & CScript //nologo //E:JScript "%~F0" < samplefile.txt & goto :EOF
WScript.Stdout.Write(WScript.Stdin.ReadAll().replace(/(.*)\/\/ .*/g,"$1"))
Copy the code above into a file with a .BAT extension; that is, it is a Batch file!

How to combine multiple lines in a single text file into one line, in Windows?

I have multiple standard text files that follow this format, with varying numbers of lines in each file:
Line1
Line2
Line3
Line4
I want to merge every line into one, with a space in between each set of characters, so the text file would look as such:
Line1 Line2 Line3 Line4
...and so on. This needs to work with any given number of lines, due to the fact that each text file contains a different number of lines. My intention is not to merge the lines in the text files; I want each text file to remain separate. All the solutions I have found online either don't quite fit this or work exclusively with UNIX. I am running Windows 7. This can be done in Powershell, VBS, Batch, a particular program, doesn't matter, it just needs to work with Windows.
Much appreciated!

@ECHO OFF
setlocal
(SET var=)
FOR /f "delims=" %%x IN (list.txt) DO (
    CALL SET var=%%var%% %%x
)
SET var=%var:~1%
echo var=%var%=
Where list.txt is the file containing your lines and var is the variable into which you want the lines concatenated.

Using batch:
for /f "usebackqdelims=" %%i in ("infile.txt") do @<nul set /p"=%%i ">>"outfile.txt"
>>"outfile.txt" echo.

Using PowerShell give this a try and see if it's what you want:
$my_file = "C:\file.txt"
$out_file = "C:\out.txt"
(Get-Content -Path $my_file) -join " " | Set-Content -Path $out_file

For the sake of completeness, here's another solution in VBScript:
Set fso = CreateObject("Scripting.FileSystemObject")
Set infile = fso.OpenTextFile("C:\infile.txt")
Set outfile = fso.OpenTextFile("C:\outfile.txt", 2, True)
If Not infile.AtEndOfStream Then outfile.Write infile.ReadLine
Do Until infile.AtEndOfStream
outfile.Write " " & infile.ReadLine
Loop
infile.Close
outfile.Close

Install git-scm, cygwin or something else that contains bash, then you can do
cat *.txt | tr "\n" " "

Something like this?
(gc C:\test.txt) -join " "
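
The join itself is small enough to sketch in Python as well. The function name `join_lines` is hypothetical, and dropping empty lines is an assumption of this sketch (the batch `for /f` loops above skip them too, while the PowerShell `-join` keeps them).

```python
def join_lines(text, separator=" "):
    """Join the non-empty lines of `text` with `separator`."""
    # splitlines() handles both Windows (CRLF) and Unix (LF) line endings.
    return separator.join(line for line in text.splitlines() if line)
```

To keep every text file separate, as the question asks, the function would be applied to the content of one file at a time and the result written to that file's own output.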

Filtering file using reg exp and concatenate certain lines together (command-prompt)

I have to filter a text file filter.tmp containing two types of lines, this shows the difference:
findstr /r "^[0-9][0-9]*.*$" filter.tmp > filter-numbers.tmp
findstr /r "^[^0-9][^0-9]*.*$" filter.tmp > filter-text.tmp
What I need to do is to append lines containing text together like this and if line does contain number just put it to output file:
IF "current line" contains text THEN
previous line = concatenate "previous line" + "/" + "current line"
ELSE
echo "previous line" >> filter.out
echo "current line" >> filter.out
filter.tmp contains something like:
Hello
World
Foo
Bar
45: this is some line
Trouble
with code
66: another line
filter.out should look like:
Hello/World/Foo/Bar
45: this is some line
Trouble/with code
66: another line
I realize, this is very simple, but I just can not get it working. As I am thinking about it, it would be much easier to use C++....

This is a fairly literal translation of your pseudocode and your regexes, based on the assumption that »contains numbers« really means »starts with two digits« (which is what your regexes show):
@echo off
setlocal enabledelayedexpansion
set Prev=
for /f "delims=" %%x in (filter.tmp) do (
    set "Line=%%x"
    if "!Line:~0,2!" GEQ "00" if "!Line:~0,2!" LEQ "99" (
        if not "!Prev!"=="" (>>filter.out echo !Prev!)
        >>filter.out echo !Line!
        set Prev=
    ) else (
        if "!Prev!"=="" (set "Prev=!Line!") else (set "Prev=!Prev!/!Line!")
    )
)
if not "!Prev!"=="" (>>filter.out echo !Prev!)
This uses several things. First of all, we need delayed expansion which enables us to manipulate environment variables within the loop. Then we iterate over the lines in the file with for /f. Note that this will skip empty lines in the file, but you cannot avoid that. Inside the for /f loop the variable Line holds the current line and Prev the previous one (if there has been a previous one). I swapped the then and else branches of the condition since numbers at the start of the line are easier to check for than non-numbers.
With the echo you'll notice that I moved the redirection to the start of the line; this is to prevent trailing numbers in Prev or Line from having an effect on the redirection (and also to avoid trailing spaces).
If you're not averse to PowerShell, you can use the following:
$(switch -Regex -File filter.tmp {
    '^\D' { if ($prev) { $prev += "/$_" } else { $prev = $_ } }
    '^\d{2}' { if ($prev) {$prev}; $_; $prev = '' }
  }
  if ($prev) { $prev }
) | Set-Content filter.out
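
The pseudocode from the question can also be written as a small Python function, shown here as a sketch rather than a drop-in replacement. The name `merge_text_lines` is invented, a "numbered" line is assumed to be any line that starts with a digit (slightly looser than the two-digit check used above), and empty lines are skipped.

```python
import re


def merge_text_lines(lines):
    """Join runs of text lines with '/', emitting numbered lines as they are."""
    output = []
    pending = []  # text lines collected since the last numbered line
    for line in lines:
        if not line:
            continue  # assumption: empty lines are ignored
        if re.match(r"\d", line):
            # A numbered line ends the current run of text lines.
            if pending:
                output.append("/".join(pending))
                pending = []
            output.append(line)
        else:
            pending.append(line)
    if pending:
        output.append("/".join(pending))  # flush a trailing run of text lines
    return output
```

Collecting the run in a list and joining it once avoids the special case for the first text line that the batch and PowerShell versions need.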
