Batch file to process csv document to add space in postcode field - windows

I have a CSV file populated with name, address, and postcode. A large number of the postcodes are missing the required space, e.g. LU79GH should be LU7 9GH and W13TP should be W1 3TP. I need to add a space to each postcode field if it is not there already; the space should always go before the last 3 characters.
What is the best way to solve this from the Windows command line?
Many Thanks

You can do this with for /f as follows:
@echo off
setlocal enabledelayedexpansion
if "%~1" equ "" (echo.%~0: usage: missing file name.& exit /b 1)
if "%~2" neq "" (echo.%~0: usage: too many arguments.& exit /b 1)
for /f %%i in (%~1) do (echo.%%i& goto :afterheader)
:afterheader
for /f "skip=1 tokens=1-3 delims=," %%i in (%~1) do (
set name=%%i
set address=%%j
set postcode=%%k
set postcode=!postcode: =!
echo.!name!,!address!,!postcode:~0,-3! !postcode:~-3!
)
exit /b 0
Demo:
> type data.csv
name,address,postcode
n1,a1,LU79GH
n2,a2,W13TP
n1,a1,LU7 9GH
n2,a2,W1 3TP
> .\add-space.bat data.csv
name,address,postcode
n1,a1,LU7 9GH
n2,a2,W1 3TP
n1,a1,LU7 9GH
n2,a2,W1 3TP
You can redirect the output to a file to capture it. (But you can't redirect to the same file as the input, because then the redirection will overwrite the input file before it can be read by the script. If you want to overwrite the original file, you can redirect the output to a new file, and then move the new file over the original after the script has finished.)

On Windows you could do this with PowerShell.
$document = Get-Content '\doc.csv'
foreach ($line in $document) {
    # Split the line into columns; the postcode is the last one
    $list = $line -split ','
    # Case-sensitive match for a postcode with no space in it
    # (a lowercase header such as "postcode" does not match)
    if ($list[-1] -cmatch '^[A-Z0-9]+$') {
        # No space yet: insert one before the last three characters
        $list[-1] = $list[-1] -replace '(.{3})$', ' $1'
    }
    # Emit the line; pipe the loop output to Set-Content to save it to a new file
    $list -join ','
}
There is pretty good documentation online; check out http://ss64.com/ps/ for help with PowerShell.

Parsing CSV can be tricky because a comma may be a column delimiter, or it may be a literal character within a quoted field.
Since your postcode is always the last field, I would simply look at the 4th character from the end of the entire line, and if it is not already a space, then insert a space before the last 3 characters in the line. I will also assume that the first line of the file lists the field names, so you don't want to modify that one.
Using pure batch (assuming no values contain !):
@echo off
setlocal enableDelayedExpansion
set "skip=true"
>"test.csv.new" (
for /f "usebackq delims=" %%A in ("test.csv") do (
set "line=%%A"
if "!line:~-4,1!" equ " " set "skip=true"
if defined skip (echo !line!) else (echo !line:~0,-3! !line:~-3!)
set "skip="
)
)
move /y "test.csv.new" "test.csv" >nul
The solution is simpler if you use my JREPL.BAT regular expression text processor. It is a pure script (hybrid JScript/batch) that runs natively on any Windows machine from XP onward. The following one liner will do the trick:
jrepl "[^ ](?=...$)" "$& " /jbegln "skip=(ln==1)" /f test.csv /o -
Use CALL JREPL ... if you use the command within another script.
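For readers who want to check the rule itself (look at the fourth character from the end, skip the header), here is a minimal sketch in Python rather than batch. It is not part of the original answer; the function name add_postcode_space is hypothetical, and the length guard for very short lines is an added assumption that the batch version does not have.

```python
def add_postcode_space(lines):
    """Insert a space before the last three characters of each data line.

    Assumes the postcode is the last field and the first line is a header.
    """
    result = []
    for number, line in enumerate(lines, start=1):
        # Leave the header alone, and any line that already has the space.
        # The len() check is an extra guard for lines shorter than 4 characters.
        if number == 1 or len(line) < 4 or line[-4] == " ":
            result.append(line)
        else:
            result.append(line[:-3] + " " + line[-3:])
    return result


rows = ["name,address,postcode", "n1,a1,LU79GH", "n2,a2,W1 3TP"]
print("\n".join(add_postcode_space(rows)))
```

Like the batch version, this works on the whole line, so it does not need to parse quoted CSV fields.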

Related

Windows Batch file - strip leading characters

I have a batch file which copies some local files up to a google storage area using the gsutil tool. The gsutil tool produces a nice log file showing the details of the files that were uploaded and if it was OK or not.
Source,Destination,Start,End,Md5,UploadId,Source Size,Bytes Transferred,Result,Description
file://C:\TEMP\file_1.xlsx,gs://app1/backups/file_1.xlsx,2018-12-04T15:25:48.428000Z,2018-12-04T15:25:48.804000Z,CPHHZfdlt6AePAPz6JO2KQ==,,18753,18753,OK,
file://C:\TEMP\file_2.xlsx,gs://app1/backups/file_2.xlsx,2018-12-04T15:25:48.428000Z,2018-12-04T15:25:48.813000Z,aTKCOQSPVwDycM9+NGO28Q==,,18753,18753,OK,
What I would like to do is to
check the status result in column 8 (OK or FAIL)
If the status is OK then move the source file to another folder (so that it is not uploaded again).
The problem is that the source filename is prefixed with "file://", which I can't seem to remove. For example,
file://C:\TEMP\file_1.xlsx
needs to be changed into this
C:\TEMP\file_1.xlsx
I am using a for /f loop and I am not sure if the manipulation of the variables %%A is different within a for /f loop.
@echo off
rem copy the gsutil log file into a temp file and remove the header row using the 'more' command.
more +1 raw_results.log > .\upload_results.log
rem get the source file name (column 1) and the upload result (OK) from column 8
for /f "tokens=1,8 delims=," %%A in (.\upload_results.log) do (
echo The source file is %%A , the upload status was %%B
set line=%%A
set line=!line:file://:=! >> output2.txt echo !line!
echo !line!
)
The output is like this.
The source file is file://C:\TEMP\file_1.xlsx , the upload status was OK
The source file is file://C:\TEMP\file_2.xlsx , the upload status was OK
I'm expecting it to dump the altered values out into a new file but it is not producing anything at the moment.
Normally I would extract from a specific character to the end of the string with something like this but it doesn't work with my For/f loop.
%var:~7%
Any pointers or a different way of doing it greatly appreciated.
Since the part to remove seems fixed, it is easier to use substrings.
Using for /f "skip=1" also avoids the need for the external command more +1 and another intermediate file.
@echo off & setlocal EnableDelayedExpansion
type NUL>output2.txt
for /f "skip=1 eol=| tokens=1,8 delims=," %%A in (.\upload_results.log) do (
echo The source file is %%A , the upload status was %%B
set "line=%%A"
set "line=!line:~7!"
echo(!line!>>output2.txt
echo(!line!
)
File names and paths can also contain one or more exclamation marks. With delayed expansion enabled, the line set line=%%A is parsed by the Windows command processor a second time before execution. See How does the Windows Command Interpreter (CMD.EXE) parse scripts? On this line, every ! inside the string assigned to loop variable A is interpreted as the beginning or end of a delayed-expanded environment variable reference. So the string of loop variable A is assigned to the environment variable line with an unwanted modification if the file path or name contains one or more exclamation marks.
For that reason it is best to avoid delayed expansion. The fastest solution for this task is to use a second FOR to remove file:// from the string assigned to loop variable A.
@echo off
del output2.txt 2>nul
for /F "skip=1 tokens=1,8 delims=," %%A in (upload_results.log) do (
echo The source file is %%A , the upload status was %%B.
for /F "tokens=1* delims=/" %%C in ("%%~A") do echo %%D>>output2.txt
)
Even faster would be without the first echo command line inside the loop:
@echo off
(for /F "skip=1 delims=," %%A in (upload_results.log) do (
for /F "tokens=1* delims=/" %%B in ("%%~A") do echo %%C
))>output2.txt
The second solution can be written also as single command line:
@(for /F "skip=1 delims=," %%A in (upload_results.log) do @for /F "tokens=1* delims=/" %%B in ("%%~A") do @echo %%C)>output2.txt
All solutions do following:
The outer FOR processes the ANSI (fixed one byte per character) or UTF-8 (one to four bytes per character) encoded text file upload_results.log line by line, skipping the first line and always ignoring empty lines and lines starting with a semicolon, which do not occur here.
Each line is split on every run of one or more commas into substrings (tokens), and the first comma-delimited string is assigned to the specified loop variable A. The first solution additionally assigns the eighth comma-delimited string to B, the next loop variable according to the ASCII table.
The inner FOR processes the string assigned to loop variable A using / as the delimiter: file: is assigned to the specified loop variable, and the rest of the string after the first sequence of forward slashes, which is the fully qualified file name, is assigned to the next loop variable according to the ASCII table.
The fully qualified file name is output with the command echo and either appended directly to the file output2.txt (first solution) or written first to a memory buffer, which is finally written at once into the file output2.txt, overwriting any existing file of that name in the current directory.
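The tokenizing behaviour described in the list above can be illustrated with a small Python sketch. This is only an emulation written for this explanation, not how cmd.exe is implemented; the helper names for_f_tokens and strip_scheme are hypothetical.

```python
import re


def for_f_tokens(line, delims):
    """Split a line the way FOR /F does: a run of delimiters is one separator."""
    pattern = "[" + re.escape(delims) + "]+"
    return [token for token in re.split(pattern, line) if token]


def strip_scheme(source):
    """Mimic tokens=1* delims=/ and return the rest after the first run of slashes."""
    match = re.match(r"/*[^/]+/+(.*)", source)
    return match.group(1) if match else ""


log_line = (
    r"file://C:\TEMP\file_1.xlsx,gs://app1/backups/file_1.xlsx,"
    r"2018-12-04T15:25:48.428000Z,2018-12-04T15:25:48.804000Z,"
    r"CPHHZfdlt6AePAPz6JO2KQ==,,18753,18753,OK,"
)
tokens = for_f_tokens(log_line, ",")
print(tokens[7])                # eighth token: OK
print(strip_scheme(tokens[0]))  # C:\TEMP\file_1.xlsx
```

Because a run of commas counts as a single separator, the empty UploadId column disappears, which is why the Result value is token 8 here even though it is the ninth column of the log.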
To understand the commands used and how they work, open a command prompt window, execute the following commands, and carefully read the help pages displayed for each command.
del /?
echo /?
for /?
See also the Microsoft article Using command redirection operators for an explanation of the redirection operators >, >> and 2>nul.

How to split a file based on a group of Header, Details etc in Batch script?

I have a file that look like below.
File Date Source Target
HD|Field1|Field2|Field3
ITEM1|Other fields1
ITEM2|Other fields2
HD|Field1|Field2|Field3
ITEM1|Other fields
ITEM2|Other fields
ITEM3|Other fields
I need to create separate files based on the occurrence of HD. First file will contain lines starting from HD and will have everything till the next HD segment starts.
There can be N number of HD segments. The files also need to be renamed based on Field1 value of HD segment.
So file 1 will be as File-Field1 and would contain
HD|Field1|Field2|Field3
ITEM1|Other fields1
ITEM2|Other fields2
File 2 will be File-Field1(of 2nd HD segment) and would contain
HD|Field1|Field2|Field3
ITEM1|Other fields
ITEM2|Other fields
ITEM3|Other fields
I need some help in getting the batch script. I have done some basic code and it looks like below.
setLocal EnableDelayedExpansion
set limit=1
set file=Sample.txt
set lineCounter=1
set filenameCounter=1
set name=
set extension=
for %%a in (%file%) do (
    set "name=%%~na"
    set "extension=%%~xa"
)
for /f "skip=1 delims=," %%a in (%file%) do (
    set splitFile=Load-!name!!filenameCounter!!extension!
    if "%%a"=="HD|" (
        set /a filenameCounter=!filenameCounter! + 1
        set lineCounter=1
        echo Created !splitFile!.
    )
    echo %%a>> !splitFile!
    set /a lineCounter=!lineCounter! + 1
)
With this I get only 1 file containing the HD| line, although its name is fine (Load-Sample1.txt), and there is a huge loss of data. What I tried to do is write a loop that skips the first line and then creates a new file every time an HD| is encountered.
Here is a brittle pure batch solution (lots of ways the code can break depending on the content of the source file)
@echo off
setlocal enableDelayedExpansion
set "outfile="
for /f "delims=" %%A in (sample.txt) do (
for /f "delims=| tokens=1,2" %%a in ("%%A") do if "%%a"=="HD" set "outfile=%%b"
if defined outfile echo(%%A>>"!outfile!"
)
Here are some of the ways the above code could fail:
Empty lines will be stripped
Lines beginning with ; will be stripped
Lines containing ! will be corrupted
The code could be made more robust, but it will become significantly more complicated. I would not bother. Pure batch is a terrible language for text file manipulation, except for the simplest of tasks. It is slow, and requires loads of arcane knowledge.
I have added a new feature (v6.8) to my JREPL.BAT regular expression text processor that makes it trivial to create a fast and robust solution for this problem.
JREPL.BAT is pure script (hybrid JScript/batch) that runs natively on any Windows machine from XP onward - no 3rd party exe file required.
I use a regular expression to locate HD lines and extract the file name. I use custom JScript to open a new output file at each HD line.
jrepl "^HD\|([^|]+)" "openOutput($1);$txt=$0" /jq /f "sample.txt" >nul
Be sure to use CALL JREPL if you use the command within another batch script. However, CALL will double the quoted caret, and a caret could technically be part of a file name. So you should also use another new feature of version 6.8 - the new \c caret escape sequence. This will hide the caret from CALL so it does not get doubled.
call jrepl "\cHD\|([\c|]+)" "openOutput($1);$txt=$0" /x /jq /f "sample.txt" >nul
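For comparison only, the grouping logic of the pure batch version (start a new output at every HD line, name it after Field1, drop anything before the first HD line) might be sketched in Python as follows. The function name split_by_header is hypothetical, and the sketch returns the groups in a dictionary instead of writing files so that it stays self-contained.

```python
def split_by_header(lines):
    """Group lines under the Field1 value of the most recent HD line."""
    groups = {}
    current = None
    for line in lines:
        fields = line.split("|")
        # An HD line switches the current output name to its second field.
        if fields[0] == "HD" and len(fields) > 1:
            current = fields[1]
        # Lines before the first HD line have no output and are skipped.
        if current is not None:
            groups.setdefault(current, []).append(line)
    return groups


sample = [
    "File Date Source Target",
    "HD|A|Field2|Field3",
    "ITEM1|Other fields1",
    "HD|B|Field2|Field3",
    "ITEM1|Other fields",
    "ITEM2|Other fields",
]
for name, group in split_by_header(sample).items():
    print(name, len(group))
```

The header values A and B are made-up stand-ins for the Field1 placeholder in the question. Unlike the FOR /F version, this sketch does not drop empty lines or lines starting with a semicolon.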

How to sort a file on multiple positions using Windows command line?

I have an index text file, and I'm having trouble sorting it. I've been looking online for an answer, but Google hasn't pulled up anything with multi-positional searches.
On Unix this would be easy; something like
sort inputfile -k1.1 -k3.3 -o outputfile
should accomplish the task, but running it gives me Cygwin errors about the input being specified twice (so Unix sorts are out!).
I need to sort this index file, either with Windows console applications or Perl on both positions.
Here is the input data:
1925699|0003352_0050003895.pdf|00500003895|0003352
1682628|0003352_0050003894.pdf|00500003894|0003352
1682628|0003352_0050003893.pdf|00500003893|0003352
The desired output is:
1682628|0003352_0050003893.pdf|00500003893|0003352
1682628|0003352_0050003894.pdf|00500003894|0003352
1925699|0003352_0050003895.pdf|00500003895|0003352
I'm currently trying to use:
sort/+1,7 /+32,11 < inputfile > outputfile
But I've failed to get this to be successful. (It only sorts the first parameter.) Again Unix is out of the question, and I can do it in Perl, but can this be done in Windows command line?
@ECHO Off
SETLOCAL
SET "sourcedir=U:\sourcedir"
SET "destdir=U:\destdir"
SET "filename1=%sourcedir%\q45575219.txt"
SET "outfile=%destdir%\outfile.txt"
SET "tempfile=%destdir%\tempfile.txt"
(
FOR /f "usebackqdelims=" %%a IN ("%filename1%") DO (
FOR /f "tokens=1,3delims=|" %%s IN ("%%a") DO (
ECHO(%%s%%t^|%%a
)
)
)>"%tempfile%"
(
FOR /f "tokens=1*delims=|" %%a IN (' sort "%tempfile%" ' ) DO ECHO(%%b
)>"%outfile%"
DEL "%tempfile%"
GOTO :EOF
You would need to change the settings of sourcedir and destdir to suit your circumstances.
I used a file named q45575219.txt containing your data for my testing.
Produces the file defined as %outfile%
Uses a temporary file defined as %tempfile%
Read the source file, assigning each line to %%a. Analyse %%a using pipe as a delimiter and select the first and third tokens. Prefix the first and third tokens to the entire line, separated by a pipe and echo into a temporary file.
sort the temporary file, tokenise again on pipe, selecting the first token (before the first pipe) and the rest of the line; output only the rest to the destination file.
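The approach above is a decorate, sort, undecorate pattern. A minimal Python sketch of the same idea, with the hypothetical function name sort_by_fields_1_and_3 and an in-memory list standing in for the temporary file, could look like this. It assumes every line has at least three pipe-separated fields and uses plain string comparison, which may order some characters differently from the Windows sort command.

```python
def sort_by_fields_1_and_3(lines):
    """Sort lines on field 1, then field 3, using a temporary key prefix."""
    decorated = []
    for line in lines:
        fields = line.split("|")
        # Prefix the key (field 1 followed by field 3) and a pipe to the line.
        decorated.append(fields[0] + fields[2] + "|" + line)
    decorated.sort()
    # Drop everything up to and including the first pipe to recover the line.
    return [entry.split("|", 1)[1] for entry in decorated]


data = [
    "1925699|0003352_0050003895.pdf|00500003895|0003352",
    "1682628|0003352_0050003894.pdf|00500003894|0003352",
    "1682628|0003352_0050003893.pdf|00500003893|0003352",
]
print("\n".join(sort_by_fields_1_and_3(data)))
```

As in the batch version, concatenating the two fields into one key only sorts correctly when field 1 has the same width on every line, which holds for the sample data.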

Batch Script Find String in String with a twist

I am trying to do this in a batch script, which should be simple, but after spending a couple of hours on it I am no closer to a solution.
If the CMD parameter contains a series of letters, I want to surround each letter with single quotes and separate them with commas. For example, if the user enters this:
MYTEST.CMD ABCDEF
I want to create a string that looks like this:
'A','B','C','D','E','F'
The same as if they had entered this in the CMD line:
MYTEST.CMD "'A','B','C','D','E','F'"
Fairly easy, actually:
@echo off
set "LETTERS=%~1"
set OUTPUT=
if not defined LETTERS goto usage
:loop
if defined OUTPUT set "OUTPUT=%OUTPUT%,"
set "OUTPUT=%OUTPUT%'%LETTERS:~0,1%'"
set "LETTERS=%LETTERS:~1%"
if defined LETTERS goto loop
echo.%OUTPUT%
goto :eof
:usage
echo Please pass a few letters as argument, e.g.
echo. %~0 ABC
goto :eof
Let's dissect it a little:
We first store the argument in the variable LETTERS.
Then we initialise our output string to an empty string.
Then follows a loop that appends the first letter from LETTERS to OUTPUT in the proper format (with a comma before if OUTPUT is not empty) and removes that letter from LETTERS.
When LETTERS is empty, we exit the loop and print the result.
And just for the fun of it, the same as a PowerShell function:
function Get-LetterList([string]$Letters) {
([char[]]$Letters | ForEach-Object { "'$_'" }) -join ','
}
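The loop dissected above can also be traced step by step in a short Python sketch. It deliberately mirrors the batch structure (peel off the first letter, append it, repeat) rather than using a one-line join; the function name quote_letters is hypothetical.

```python
def quote_letters(letters):
    """Build 'A','B',... by consuming one letter per iteration."""
    output = ""
    while letters:
        # Add a separating comma only when something is already in the output.
        if output:
            output += ","
        # Append the first letter in single quotes, then remove it from the input.
        output += "'" + letters[0] + "'"
        letters = letters[1:]
    return output


print(quote_letters("ABCDEF"))
```

For the input ABCDEF this prints 'A','B','C','D','E','F', matching the string requested in the question.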
The batch file below uses an interesting trick I borrowed from this post: it converts the ASCII (1-byte) characters into 2-byte Unicode characters via cmd /U (inserting a zero byte between characters), and then splits the text at the zero bytes into individual lines via the find command:
@echo off
setlocal EnableDelayedExpansion
set "output="
for /F "delims=" %%a in ('cmd /D /U /C echo %~1^| find /V ""') do (
set "output=!output!,'%%a'"
)
set output="%output:~1%"
echo %output%

Find & copy a string in a file using only Windows batch

I call the file I want to search in input.txt and the string I want to find mystring.
Example content of input.txt (real input.txt)
randomstring1<>"\/=:
randomstring2<ORIGINAL>mystring</ORIGINAL>randomstring3
mystring is surrounded by the strings <ORIGINAL> and </ORIGINAL> that must be searched for
The string between both ORIGINAL-tags should be copied to clipboard (using | clip)
mystring and the tags occur only once. But they have no fixed position
all strings can contain special characters (<, >, ", \, /, =, :)
I read a lot of other SO questions but to be honest: the FOR-loop and SET-command syntax was too awkward for me. I guess my best shot will be the FINDSTR command. But maybe it is also possible with some help of RegEx expressions.
I do not want to use VBscript, Powershell, SED, FART, AWK, grep or any other additional tool.
Please be so kind and explain the difficult parts if you post a solution.
I want to understand it and maybe its helpful for others too.
My last attempt before I gave up was this test.cmd:
@echo off
set "x=randomstring1<>"\/=:randomstring2<ORIGINAL>mystring</ORIGINAL>randomstring3"
set "x=%x:*<ORIGINAL>=%"
set "x=%x:</ORIGINAL>*=%"
set x=%x:~2%
echo %x%
pause
@echo off
rem Let findstr find the LINE you want (only once):
for /F "delims=" %%a in ('findstr "<ORIGINAL>" input.txt') do set "line=%%a"
ECHO LINE: "%line%"
rem Replace the left delimiter with {
set "line=%line:<ORIGINAL>={%"
rem Replace the right delimiter with }
set "line=%line:</ORIGINAL>=}%"
ECHO STRING DELIMITED: "%LINE%"
rem Get second token delimited by { and }
for /F "tokens=2 delims={}" %%a in ("%line%") do set string=%%a
ECHO STRING: "%STRING%"
rem Copy string to clipboard
REM echo %string%| clip
Output:
LINE: "randomstring2<ORIGINAL>mystring</ORIGINAL>randomstring3"
STRING DELIMITED: "randomstring2{mystring}randomstring3"
STRING: "mystring"
As an option, you may delete from beginning of line until left delimiter:
set "line=%line:*<ORIGINAL>=%"
... and get the FIRST token separated by any delimiter you wish (e.g. }):
for /F "delims=}" %%a in ("%line%") do set string=%%a

Resources