VBScript: RegExp Replace, turn a capture group into a variable

RegExp in VBScript has three methods (Test, Execute and Replace), but I can only seem to turn capture groups from Execute into variables.
What I want is to use the capture groups from Replace as variables. I can get RegExp.Replace working with no problems using $1, $2, etc. for the capture groups, but I want to multiply one of the capture groups.
In an XML file, I want to extract a value, multiply it by 15, and insert it back in. In this example it is the <desc> tag.
e.g.
strText = "
<rte>
<name>gpx.studio church 2 reduced.gpx</name>
<rtept lat='-33.482652' lon='150.159134'>
<ele>938.4</ele>
<desc>076</desc>
</rtept>
<rtept lat='-33.4825698175265' lon='150.159515440464'>
<ele>942.3</ele>
<desc>162</desc>
</rtept>
<rtept lat='-33.4828785376496' lon='150.159633457661'>
<ele>943.4</ele>
<desc>098</desc>
</rtept>
</rte>
</gpx>"
Dim oRegExp
Set oRegExp = New RegExp
oRegExp.Global=True
oRegExp.Multiline = True
strPattern = "(<rtept(?:(?:.|\n|\r)*?))<desc>(.*?)<\/desc>((?:(?:.|\n|\r)*?)<\/rtept>)"
oRegExp.Pattern = strPattern ' assign the pattern only after strPattern has a value
strReplace = "$1<desc>$2</desc>$3"
' On the line above, I want to turn the $2 into an integer and multiply it by 15 before putting it back into the replacement.
' I have not done it here because I know something like "$2" * 15 doesn't work.
strNewText = oRegExp.Replace(strText, strReplace)
In short, I want to turn $2 into an integer and multiply it by 15 before putting it back into the replacement.
I have tried to get the capture groups as SubMatches(1), which works with the RegExp.Execute method, but it doesn't seem to work with the RegExp.Replace method, unless I am missing something.
Help appreciated.
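One possible way to get at the captured value, sketched here rather than taken from the thread, is to skip Replace and rebuild the string from the matches that Execute returns. The helper name ScaleDesc and the simplified pattern (digits only inside <desc>) are assumptions made for illustration.

```vbscript
Option Explicit

' Hypothetical helper: multiplies the number inside every <desc> tag by factor.
Function ScaleDesc(strText, factor)
    Dim re, matches, m, result, pos
    Set re = New RegExp
    re.Global = True
    re.Pattern = "<desc>(\d+)</desc>"   ' assumes <desc> holds only digits

    Set matches = re.Execute(strText)
    result = ""
    pos = 1                             ' 1-based position of the next uncopied character
    For Each m In matches
        ' Copy the untouched text before this match (FirstIndex is 0-based).
        result = result & Mid(strText, pos, m.FirstIndex + 1 - pos)
        ' Re-emit the tag with the scaled value.
        result = result & "<desc>" & (CLng(m.SubMatches(0)) * factor) & "</desc>"
        pos = m.FirstIndex + m.Length + 1
    Next
    ' Copy whatever follows the last match.
    ScaleDesc = result & Mid(strText, pos)
End Function

WScript.Echo ScaleDesc("<ele>938.4</ele><desc>076</desc><ele>942.3</ele><desc>162</desc>", 15)
' <ele>938.4</ele><desc>1140</desc><ele>942.3</ele><desc>2430</desc>
```

Note that CLng drops the leading zero of "076", so the value would need re-padding if the zero matters.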

Related

Fetching the pattern matched value from text file in VBScript

I have a text file which has a single line of text containing one of these strings:
--Result=PASS:Passed
--Result=FAIL:Failed
I am trying to fetch the value PASS or FAIL using pattern matching in VBScript, but so far I have only been able to match the string and retrieve the entire line. Below is the code I am using:
Dim oRE, oMatches, oMatch, objFileToRead, strline, result
Set oRE = New RegExp
oRE.Pattern = "--Result=(PASS|FAIL).*"
Set objFileToRead = CreateObject("Scripting.FileSystemObject").OpenTextFile("C:\tmp\resultfile.txt",1)
Do While Not objFileToRead.AtEndOfStream
    strline = objFileToRead.ReadLine()
    Set oMatches = oRE.Execute(strline)
    For Each oMatch In oMatches
        result = oMatch.Value ' the whole match, i.e. the entire line
    Next
Loop
objFileToRead.Close
The result that I get now is the entire matching line. Is it possible to fetch just the PASS or FAIL substring from the text file instead of the entire line?
The Match is always the entire string that matched the pattern. What you are looking for are the groups, which VBScript exposes through SubMatches.
if oMatches.Count > 0 then
result = oMatches(0).SubMatches(0)
end if
If the pattern contains several pairs of parentheses (capture groups), you can reach the others through .SubMatches(1) and so on.
Btw., your pattern does not have to match the entire input string (you don't use the anchors ^ and $ anyway); you could just use (PASS|FAIL) as the pattern, or maybe =(PASS|FAIL):.
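Put together, the SubMatches approach might look like the following sketch. The function name GetResult is made up for illustration, and it uses the shorter =(PASS|FAIL): pattern suggested above.

```vbscript
Option Explicit

' Hypothetical helper: returns "PASS" or "FAIL", or "" when the line has no result marker.
Function GetResult(strLine)
    Dim re, matches
    Set re = New RegExp
    re.Pattern = "=(PASS|FAIL):"
    Set matches = re.Execute(strLine)
    GetResult = ""
    ' SubMatches(0) is the first capture group, not the whole match.
    If matches.Count > 0 Then GetResult = matches(0).SubMatches(0)
End Function

WScript.Echo GetResult("--Result=PASS:Passed")   ' PASS
WScript.Echo GetResult("--Result=FAIL:Failed")   ' FAIL
```

In the file-reading loop, each line returned by ReadLine would be passed to the helper in the same way.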

Clean up lines from log file

I have a file, "D:\test.log", whose lines come in one of two styles. A line looks like this if the user was offline when the message was received:
[02:19:47] Brother Aimbot (adama900): (Saved Thu Mar 31 05:15:09 2016)This is a test line
It looks like this if the user was online when the message was received:
[02:19:47] Brother Aimbot (adama900): This is a test line
What I would like to do is cut out the excess parts so that a line in either style ends up looking like this:
Brother Aimbot (adama900) This is a test line
then place it into a message box.
Here is my code:
Sub main()
filename = "D:\Test.txt"
Set fso = CreateObject("Scripting.FileSystemObject")
Set f = fso.OpenTextFile(filename)
LNEVAL = f.ReadLine
LNENUM = 0
Do Until f.AtEndOfStream
For i = 1 To LNENUMs
f.ReadLine
Next
If InStr(LNEVAL, "(S") Then
LNEVAL = Left(LNEVAL, (Len("(S")+4))
MsgBox = LNEVAL
End If
Loop
f.Close
End Sub
This is what I have so far.
It's fairly simple to do what you want with a regular expression replacement. Basically what you want to do is remove three things from each line:
a substring between square brackets from the beginning of the string,
the colon separating the name from the message, and
an optional substring between parentheses after that colon.
The regular expression ^\[.*?\] matches an opening square bracket at the beginning of a string and the fewest possible characters up to a closing square bracket.
The regular expression \(Saved.*?\) matches an opening parenthesis followed by the word Saved and the fewest possible characters up to a closing parenthesis. However, since this part is optional, you need to indicate that the expression can occur zero or one time by putting it in a non-capturing group and appending the ? quantifier to it ((?:...)?).
Put the submatches that you do want to preserve in parentheses to create capturing groups
^\[.*?\] (.*?): (?:\(Saved.*?\))?(.*)
and replace each matching line with just the captured groups:
Set re = New RegExp
re.Pattern = ...
Set f = fso.OpenTextFile(filename)
Do Until f.AtEndOfStream
MsgBox re.Replace(f.ReadLine, "$1 $2")
Loop
f.Close
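As a quick check, the pattern given above can be applied to the two sample lines from the question without touching a file. This is only a sketch; the function name CleanLine is an assumption for illustration.

```vbscript
Option Explicit

' Hypothetical helper: strips the timestamp, the colon and the optional "(Saved ...)" part.
Function CleanLine(strLine)
    Dim re
    Set re = New RegExp
    re.Pattern = "^\[.*?\] (.*?): (?:\(Saved.*?\))?(.*)"
    ' $1 is the name, $2 is the message text.
    CleanLine = re.Replace(strLine, "$1 $2")
End Function

WScript.Echo CleanLine("[02:19:47] Brother Aimbot (adama900): (Saved Thu Mar 31 05:15:09 2016)This is a test line")
WScript.Echo CleanLine("[02:19:47] Brother Aimbot (adama900): This is a test line")
' Both calls print: Brother Aimbot (adama900) This is a test line
```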
Some comments on your existing code:
For i = 1 To LNENUMs: this loop is always skipped, because LNENUMs is never assigned (you set LNENUM, without the trailing s, to 0) and an unassigned variable evaluates to 0 here. Since you only do f.ReadLine inside that For loop, your outer Do loop becomes an infinite loop, because you never read the file to the end.
Len("(S")+4 always evaluates to 6, because the length of the string (S is not going to change, so you could just replace the expression with the numeric value.
MsgBox = LNEVAL: The MsgBox function doesn't work that way. Remove the = between function name and message.

RegEx to remove new line characters and replace with comma

I scraped a website using Nokogiri, and after using XPath I was left with the following string (which is a few td's pushed into one string).
"Total First Downs\n\t\t\t\t\t\t\t\t359\n\t\t\t\t\t\t\t\t274\n\t\t\t\t\t\t\t"
My goal is to make this into an array that looks like the following (it will be a nested array):
["Total First Downs", "359", "274"]
The issue is creating a regex that removes the escaped characters and substitutes a single "," for each run, but does not add a "," after the last set of integers. If the comma after the last set of integers is unavoidable, I could use #compact to get rid of the nil that occurs in the array. If you need the code I used to scrape the website, here it is (please note that I saved the web page for testing so that my IP address would not get burned during the trial phase):
f = File.open('page')
doc = Nokogiri::HTML(f)
f.close
number = doc.xpath('//tr[@class="tbdy1"]').count
stats = Array.new(number) { Array.new }
i = 0
doc.xpath('//tr[@class="tbdy1"]').each do |tr|
  stats[i] << tr.text
  i += 1
end
Thanks for your help
I don't fully understand your problem, but the result can be easily achieved with this:
"Total First Downs\n\t\t\t\t\t\t\t\t359\n\t\t\t\t\t\t\t\t274\n\t\t\t\t\t\t\t"
.split(/[\n\t]+/)
# => ["Total First Downs", "359", "274"]
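Try gsub with a regex literal (a quoted "/.../" string would be searched for literally and never match). Note that this returns a string with a trailing comma, not an array:
"Total First Downs\n\t\t\t\t\t\t\t\t359\n\t\t\t\t\t\t\t\t274\n\t\t\t\t\t\t\t".gsub(/[\n\t]+/, ",")
# => "Total First Downs,359,274,"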

Extracting strings and writing to a new file

I was googling around but didn't find the right answer; perhaps people here are willing and able to help me.
I'm very new to VBS and WSH and would like a solution for this problem:
I'm searching for text strings within a file that has no line breaks (only one line). The strings I'm looking for always start with the same content, "jpgline", and end with the three letters "qbm". How can we extract each string (the strings are always 64 chars long) containing "jpgline....qbm" into a separate file?
I'm looking for a solution in VBScript, as I use Windows 7.
Thanks in advance
M i k e
Use a regular expression:
Set re = New RegExp
re.Pattern = "^jpgline.*qbm$"
re.IgnoreCase = True
Set fso = CreateObject("Scripting.FileSystemObject")
Set inFile = fso.OpenTextFile("C:\path\to\input.txt")
Set outFile = fso.OpenTextFile("C:\path\to\output.txt", 2, True)
Do Until inFile.AtEndOfStream
line = inFile.ReadLine
If re.Test(line) Then outFile.WriteLine line
Loop
inFile.Close
outFile.Close
As your input file has no line breaks, use .ReadAll() to load its entire content into a string variable. Apply a RegExp to get all parts (Matches) defined by the pattern "jpgline.{N}qbm", where N is either 64 (if the 64 characters sit between prefix and suffix) or 54 (if the 64 characters include the 7-character prefix "jpgline" and the 3-character suffix "qbm"). Ansgar's answer above shows how to open and write to the output file.
Use the RegExp Docs to learn about .Execute and how to loop over the resulting match collection. The docs will tell you about .Test too.
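The Execute-based approach could be sketched as follows. The helper name ExtractStrings is hypothetical, and the value 54 assumes that the 64 characters include the prefix and suffix; the demo builds its input in memory instead of calling ReadAll on a real file.

```vbscript
Option Explicit

' Hypothetical helper: returns every "jpgline...qbm" string found in content, one per line.
' middleLen is the number of characters between "jpgline" and "qbm".
Function ExtractStrings(content, middleLen)
    Dim re, m, result
    Set re = New RegExp
    re.Global = True          ' find every occurrence, not just the first
    re.IgnoreCase = True
    re.Pattern = "jpgline.{" & middleLen & "}qbm"
    result = ""
    For Each m In re.Execute(content)
        result = result & m.Value & vbCrLf
    Next
    ExtractStrings = result
End Function

' In-memory stand-in for the single-line file content (normally obtained via ReadAll).
Dim content
content = "abc" & "jpgline" & String(54, "A") & "qbm" & "zzz" & "jpgline" & String(54, "B") & "qbm"
WScript.Echo ExtractStrings(content, 54)
```

The returned text can then be written to the output file with a single Write call on the opened text stream.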

vbscript - Replace all spaces

I have 6400+ records which I am looping through. For each of these: I check that the address is valid by testing it against something similar to what the Post Office uses (find address). I need to double check that the postcode I have pulled back matches.
The only problem is that the postcode may have been inputted in a number of different formats for example:
OP6 6YH
OP66YH
OP6  6YH (more than one space)
If Replace(strPostcode," ","") = Replace(xmlAddress.selectSingleNode("//postcode").text," ","") Then
I want to remove all spaces from the string. If I do the Replace above, it removes the space in the first example but leaves one in the third.
I know that I can remove these using a loop, but I believe this will make the script run really slowly, as it will have to loop through 6400+ records to remove the spaces.
Is there another way?
I didn't realise you had to add -1 to remove all spaces
Replace(strPostcode," ","",1,-1)
Personally I've just done a loop like this:
Dim sLast
Do
sLast = strPostcode
strPostcode = Replace(strPostcode, " ", "")
If sLast = strPostcode Then Exit Do
Loop
However you may want to use a regular expression replace instead:
Dim re : Set re = New RegExp
re.Global = True
re.Pattern = " +" ' Match one or more spaces
WScript.Echo re.Replace("OP6 6YH.", "")
WScript.Echo re.Replace("OP6   6YH.", "")
WScript.Echo re.Replace("O P 6 6 Y H.", "")
Set re = Nothing
The output of the latter is:
D:\Development>cscript replace.vbs
OP66YH.
OP66YH.
OP66YH.
D:\Development>
The syntax is Replace(expression, find, replacewith[, start[, count[, compare]]]).
start defaults to 1 and count defaults to -1 (replace every occurrence), so a plain Replace(strPostcode, " ", "") should already remove all spaces. Maybe some DLL is corrupt and changing the defaults of the Replace function.
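A small sketch of those defaults in action, with a made-up helper name (NormalizePostcode) and an upper-casing step that is an extra assumption, not something the question asked for:

```vbscript
Option Explicit

' Hypothetical helper: strips every space and upper-cases the postcode for comparison.
Function NormalizePostcode(s)
    ' With start and count omitted, Replace removes all occurrences.
    NormalizePostcode = UCase(Replace(s, " ", ""))
End Function

WScript.Echo NormalizePostcode("OP6 6YH")      ' OP66YH
WScript.Echo NormalizePostcode("OP6   6YH")    ' OP66YH
WScript.Echo NormalizePostcode("op66yh")       ' OP66YH
```

If a "space" still survives this, one possibility worth checking is that it is a different whitespace character (a tab or a non-breaking space, for example), which a plain " " search does not match.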
In VB.NET (this does not work in VBScript) you could also write:
String.Join("", YourString.Split({" "}, StringSplitOptions.RemoveEmptyEntries))
This splits the string on spaces, drops the empty pieces, and joins the remaining parts with the separator "".
