VB Script to extract text from file using delimiters - vbscript

I tried with batch to extract required code, but this doesn't work especially for big files. I'm wondering if this is possible with VB script. So,
I need to extract text from file between 2 delimiters and copy it to TXT file. This text looks like XML code, instead delimiters <string> text... </string>, I have :::SOURCE text .... ::::SOURCE. As you see in first delimiter are 3x of ':' and in second are 4x of ':'
Most important is that there are multiple lines between these 2 delimiters.
Example of text:
text&compiled unreadable characters
text&compiled unreadable characters
:::SOURCE
just this code
just this code
...
just this code
::::SOURCE text&compiled unreadable characters
text&compiled unreadable characters
Desired output:
just this code
just this code
...
just this code

Maybe you can try somethig like this:
filePath = "D:\Temp\test.txt"
Set fso = CreateObject("Scripting.FileSystemObject")
Set f = fso.OpenTextFile(filePath)
startTag = ":::SOURCE"
endTag = "::::SOURCE"
startTagFound = false
endTagFound = false
outputStr = ""
Do Until f.AtEndOfStream
lineStr = f.ReadLine
startTagPosition = InStr(lineStr, startTag)
endTagPosition = InStr(lineStr, endTag)
If (startTagFound) Then
If (endTagPosition >= 1) Then
outputStr = outputStr + Mid(lineStr, 1, endTagPosition - 1)
Exit Do
Else
outputStr = outputStr + lineStr + vbCrlf
End If
ElseIf (startTagPosition >= 1) Then
If (endTagPosition >= 1) Then
outputStr = Mid(lineStr, startTagPosition + Len(startTag), endTagPosition - startTagPosition - Len(startTag) - 1)
Exit Do
Else
startTagFound = true
outputStr = Mid(lineStr, startTagPosition + Len(startTag)) + vbCrlf
End If
End If
Loop
WScript.Echo outputStr
f.Close
I've made the assumption that start and end tag can be anywhere inside the file, not only at start of lines. Maybe you can simplify the code if you have more information on the "encoding".

Related

Count Items in file using VB

Kind of new to VBS. I'm trying to count the fields in the file and have this code.
Col()
Function Col()
Const FSpec = "C:\test.txt"
Const del = ","
dim fs : Set fs = CreateObject("Scripting.FileSystemObject")
dim f : Set f = fs.OpenTextFile(FSpec, 1)
Dim L, C
Do Until f.AtEndOfStream
L = f.ReadLine()
C = UBound(Split(L, del))
C = C +1
WScript.Echo "Items:", C
Loop
f.Close
End Function
It works however, I don't want to count the delim inside " ".
Here's file content:
1,"2,999",3
So basically, I'm getting 4 items for now but I wanted to get 3. Kind of stuck here.
For an example of my second suggestion, a very simple example could be something like this. Not saying it is perfect, but it illustrates the idea:
Dim WeAreInsideQuotes 'global flag
Function RemoveQuotedCommas(ByVal line)
Dim i
Dim result
Dim current
For i = 1 To Len(line)
current = Mid(line, i, 1) 'examine character
'check if we encountered a quote
If current = Chr(34) Then
WeAreInsideQuotes = Not WeAreInsideQuotes 'toggle flag
End If
'process the character
If Not (current = Chr(44) And WeAreInsideQuotes) Then 'skip if comma and insode quotes
result = result & current
End If
Next
RemoveQuotedCommas = result
End Function

Read a file's data within a specified range using VB script. Is it possible?

This is the middle of the code I'm trying to work with. Is there a way to make the file it's reading open and read from line 2 to line 97? Where I need the correction is starred (****). What I'm trying to do is get the data from lines 2 through 97 to compare to another file I'll have to open from the same lines. The beginning and ends of each file are different but the middle information should match thus I need these specific lines.
' Build Aliquot file name
strFile = aBarcodeExportDir & "A-" & yearStr & "-" & splitStr2(0) & ".csv"
'msgbox("open file: " & strFile)
If (objFS.FileExists(strFile)) Then
' Open A file
Set objFile = objFS.OpenTextFile(strFile)
' Build string with file name minus extension - used later to determine EOF
strFileNameNoExtension = "A-" & yearStr & "-" & splitStr2(0)
' Create dictionary to hold key/value pairs - key = position; value = barcode
Set dictA = CreateObject("Scripting.Dictionary")
' Begin processing A file
Do Until objFile.AtEndOfStream(*****)
' Read a line
strLine = objFile.ReadLine(*****)
' Split on semi-colons
splitStr = Split(strLine, ";")
' If splitStr array contains more than 1 element then continue
If(UBound(splitStr) > 0) Then
' If barcode field is equal to file name then EOF
If(splitStr(6) = strFileNameNoExtension) Then
' End of file - exit loop
Exit Do
Else
' Add to dictionary
' To calculate position
' A = element(2) = position in row (1-16)
compA = splitStr(2)
' B = element(4) = row
compB = splitStr(4)
' C = element(5.1) = number of max positions in row
splitElement5 = Split(splitStr(5), "/")
compC = splitElement5(0)
' position = C * (B - 1) + A
position = compC * (compB - 1) + compA
barcode = splitStr(6) & ";" & splitStr(0) & ";" & splitStr(1) & ";" & splitStr(2)
'msgbox(position & ":" & barcode)
' Add to dictionary
dictA.Add CStr(position), barcode
End If
End If
Loop
' Close A file
objFile.Close
To give the exact answer, we may have to look at your text files(I mean with all the split functions you are using). But, If you just want to compare lines 2-97 of two text files, you can get a hint from the following piece of code:
strPath1 = "C:\Users\gr.singh\Desktop\abc\file1.txt" 'Replace with your File1 Path
strPath2 = "C:\Users\gr.singh\Desktop\abc\file2.txt" 'Replace with your File2 Path
Set objFso = CreateObject("Scripting.FileSystemObject")
Set objFile1 = objFso.OpenTextFile(strPath1,1)
Set objFile2 = objFso.OpenTextFile(strPath2,1)
blnMatchFailed = False
Do Until objFile1.AtEndOfStream
If objFile1.Line=1 Then
objFile1.SkipLine() 'Skips the 1st line of both the files
objFile2.SkipLine()
ElseIf objFile1.Line>=2 And objFile1.Line<=97 Then
strFile1 = objFile1.ReadLine()
strFile2 = objFile2.ReadLine()
If StrComp(strFile1,strFile2,1)<>0 Then 'textual comparison. Change 1 to 0, if you want binary comparison of both lines
blnMatchFailed = True
intFailedLine = objFile1.Line
Exit Do 'As soon as match fails, exit the Do while Loop
Else
blnMatchFailed = False
End If
Else
Exit Do
End If
Loop
If blnMatchFailed Then
MsgBox "Comparison Failed at line "&intFailedLine
Else
MsgBox "Comparison Passed"
End If
objFile1.Close
objFile2.Close
Set objFile1 = Nothing
Set objFile2 = Nothing
Set objFso = Nothing

VBscript - select file based on priority

I have a folder that I will be looping through to process files differently based on their filenames. Doing good on my script (first one!), until I realized there will be filenames that have also have numbers representing priority. For example in the folder there may be:
'NV_CX67_mainx.dxf'
'NV_CX67_mainx1.dxf'
'NV_CX67_mainx2.dxf '
'NV_CX67_mainxroad.dxf'
'NV_CX67_motx.dxf'
'NV_CX67_resxroad.dxf'
The mainx, mainx1 and mainx2 are the same file type but mainx2 has priority and should be the only one processed. Currently, my statement is:
If Instr(1,FileRef, "mainx",1) then
How might I add a 2nd filter to process only the file with the highest number before moving onto the next file?
You are going to have run through the following process
Sort your input files
Loop through each file one by one
Compare the current file to the previous one you looked at minus the numbers to see if it greater.
Only process an item you have scanned all the similar items to ensure this one has the largest number
I wrote up an example below. Notice only NV_CX67_mainx4.dxf, and NV_CX67_mainxroad.dxf get processed:
Option Explicit
Dim i, sBaseFileName, sPrevFileName, prevBaseFile
sPrevFileName = "~"
prevBaseFile = "~"
Dim arr(5)
'Initialize test array. This will need to be sorted for this code to work properly
arr(0) = "NV_CX67_mainx.dxf"
arr(1) = "NV_CX67_mainx4.dxf"
arr(2) = "NV_CX67_mainx2.dxf"
arr(3) = "NV_CX67_mainxroad.dxf"
arr(4) = "NV_CX67_motx.dxf"
arr(5) = "NV_CX67_resxroad.dxf"
'Loop through the array
For i = LBound(arr) to UBound(arr)
If Instr(1, arr(i), "mainx",1) Then 'Check prev qualifier
sBaseFileName = getsBaseFileName(arr(i))
'First Case
If prevBaseFile = "~" Then
prevBaseFile = sBaseFileName
sPrevFileName = arr(i)
'Tie - Figure out which one to keep based on number at end of file name
ElseIf prevBaseFile = sBaseFileName Then
sPrevFileName = GetMaxFile(sPrevFileName, arr(i))
prevBaseFile = getsBaseFileName(sPrevFileName)
'New Case - Process prev case
Else
'Process File
MsgBox ("Processing " + sPrevFileName)
'Capture new current file for future processing
sPrevFileName = arr(i)
prevBaseFile = getsBaseFileName(sPrevFileName)
End If
End If
Next
'If last file was valid process it
If sPrevFileName <> "~" Then
MsgBox ("Processing " + sPrevFileName)
End If
'Return the larger of the two files based on numbers at end.
'Note "file9.txt" > "file10.txt" in this code
Function GetMaxFile(sFile1, sFile2)
GetMaxFile = sFile1
If sFile2 > sFile1 Then
GetMaxFile = sFile2
End If
End Function
'Return the file without extension and trailing numbers
'getsBaseFileName("hello123.txt") returns "hello"
Function getsBaseFileName(sFile)
Dim sFileRev
Dim iPos
getsBaseFileName = sFile
sFileRev = StrReverse(sFile)
'Get rid of the extension
iPos = Instr(1, sFileRev, ".",1)
If iPos < 1 Then
Exit Function
End If
sFileRev = Right(sFileRev, Len(sFileRev)-iPos)
'Get rid of trailing numbers
Do
If InStr(1, "1234567890", Left(sFileRev, 1), 1) Then
sFileRev = Right(sFileRev, Len(sFileRev)-1)
Else
Exit Do
End If
Loop While(Len(sFileRev) > 0)
getsBaseFileName = StrReverse(sFileRev)
End Function

VBA Performance issue - Iteration

I am reading a text file with 5000 strings. Each string contains Date+Time and then 3 values. The delimiter between Date and Time is a space, and then the three values are tab delimited. First string (strData(0)) is just a header, so I do not need that. Last string is just a simple "End".
The below code works, but it takes 1 minute to import into the worksheet! What can I do to improve this, and what is taking time?
Screen updating is off.
'open the file and read the contents
Open strPpName For Binary As #1
MyData = Space$(LOF(1))
Get #1, , MyData
Close #1
strData() = Split(MyData, vbCrLf)
'split the data and write into the correct columns
Row = 3
i = 0
For Each wrd In strData()
If i > 0 Then 'first string is only header
tmpData() = Split(wrd, vbTab)
DateString() = Split(tmpData(0), " ")
If DateString(0) <> "End" Then
ActiveSheet.Cells(Row, 5) = DateString(0) 'Date
ActiveSheet.Cells(Row, 6) = DateString(1) 'Time
ActiveSheet.Cells(Row, 2) = tmpData(1) 'Value1
ActiveSheet.Cells(Row, 3) = tmpData(2) 'Value2
ActiveSheet.Cells(Row, 4) = tmpData(3) 'Value3
Row = Row + 1
Else
GoTo Done
End If
End If
i = i + 1
Next wrd
Done:
Try with something like this:
Dim Values(), N, I
N = 100
ReDim Values(6, N)
...
Do While Not EOF(1)
I = I + 1
If I > N Then
N = N + 100
ReDim Preserve Values(6, N)
End If
Values(0, I) = ...
...
Loop
Range("A1:F" & i) = Values
The loop will work with arrays that in VBA are much faster than working with the sheet.
Excel can handle multiple types of delimiters (tab and space) with get data from text. This is what I have from macro recorder
Sub Macro1()
'
' Macro1 Macro
'
'
With ActiveSheet.QueryTables.Add(Connection:= _
"TEXT;C:\Users\jeanno\Documents\random.txt", Destination:=Range("$A$1"))
.CommandType = 0
.Name = "random_1"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
.TextFilePlatform = 437
.TextFileStartRow = 1
.TextFileParseType = xlDelimited
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileConsecutiveDelimiter = False
.TextFileTabDelimiter = True
.TextFileSemicolonDelimiter = False
.TextFileCommaDelimiter = False
.TextFileSpaceDelimiter = True
.TextFileColumnDataTypes = Array(1)
.TextFileTrailingMinusNumbers = True
.Refresh BackgroundQuery:=False
End With
End Sub
This will be much faster than string manipulation in VBA.
I think the problem is you might be reading the file in Binary. Try the following approach. I ran 5100+ records and it parsed it in under a second.
Public Sub ReadFileToExcel(filePath As String, rowNum As Long)
'******************************************************************************
' Opens a large TXT File, reads the data until EOF on the Source,
' adds the data in a EXCEL File, based on the row number.
' Arguments:
' ``````````
' 1. The Source File Path - "C:\Users\SO\FileName.Txt" (or) D:\Data.txt
' 2. The Row number you wish to start adding data.
'*******************************************************************************
Dim strIn As String, lineCtr As Long
Dim tmpData, DateString
'Open the SOURCE file for Read.
Open filePath For Input As #1
'Loop the SOURCE till the last line.
Do While Not EOF(1)
'Read one line at a time.
Line Input #1, strIn
lineCtr = lineCtr + 1
If lineCtr <> 1 Then
If InStr(strIn, "END") = 0 Then
tmpData = Split(strIn, vbTab)
DateString = Split(tmpData(0), " ")
ActiveSheet.Cells(rowNum, 5) = DateString(0) 'Date
ActiveSheet.Cells(rowNum, 6) = DateString(1) 'Time
ActiveSheet.Cells(rowNum, 2) = tmpData(1) 'Value1
ActiveSheet.Cells(rowNum, 3) = tmpData(2) 'Value2
ActiveSheet.Cells(rowNum, 4) = tmpData(3) 'Value3
rowNum = rowNum + 1
End If
End If
Loop
Debug.Print "Total number of records - " & lineCtr 'Print the last line
'Close the files.
Close #1
End Sub

vbs using .Read ( ) with a variable not an interger

I have a problem in that I need to read a specified quantity of characters from a text file, but the specified quantity varies so I cannot use a constant EG:
variable = WhateverIsSpecified
strText = objFile.Read (variable) ' 1 ~ n+1
objOutfile.write strText
NOT
strText = objFile.Read (n) ' n = any constant (interger)
When using the first way, the output is blank (no characters in the output file)
Thanks in advance
UPDATE
These are the main snippets in a bit longer code
Set file1 = fso.OpenTextFile(file)
Do Until file1.AtEndOfStream
line = file1.ReadLine
If (Instr(line,"/_N_") =1) then
line0 = replace(line, "/", "%")
filename = file1.Readline
filename = Left(filename, len(filename)-3) & "arc"
Set objOutFile = fso.CreateTextFile(destfolder & "\" & filename)
For i = 1 to 5
line = file1.Readline
next
nBytes = line 'this line contains the quantity needed to be read eg 1234
Do until Instr(line,"\") > 0
line = file1.ReadLine
Loop
StrData = ObjFile.Read (nBytes)
objOutFile.Write StrData
objOutFile.close
End if
Loop
WScript.quit
My own stupid error,
StrData = ObjFile.Read (nBytes)
should be
StrData = file1.Read (nBytes)

Resources