How to extract data from tag or HTML class using vbscript? Only tags or classes I choose - vbscript

Can someone help me? I need to extract the texts that are between tags or HTML classes using VBScript and save to a single text file. I need to save the tags or classes I define on different lines.
I've found a lot of code on the internet, but none worked as expected.
For example, I have the code below, but I can't extract classes through it, and it's not possible for more than one tag either. In many cases the code doesn't even work.
myURL = "http://rss.cnn.com/rss/edition.rss"
Set oXMLHttp = CreateObject("MSXML2.XMLHTTP")
Set ohtmlFile = CreateObject("htmlfile")
Set objFSO = CreateObject("Scripting.FileSystemObject")
Const ForReading = 1, ForWriting = 2, ForAppending = 8
oXMLHttp.Open "GET", myURL, False
oXMLHttp.send
If oXMLHttp.Status = 200 Then
ohtmlFile.Write oXMLHttp.responseText
ohtmlFile.Close
Set oTable = ohtmlFile.getElementsByTagName("description")
sFileName = "c:\users\user\desktop\News.txt"
Set objFile = objFSO.OpenTextFile(sFileName, ForAppending, True)
For Each oTab In oTable
objFile.Write oTab.Innertext & vbCrLf
Next
objFile.Close
End If
WScript.Quit
thanks!

You could use a regular expression instead, though:
Option Explicit
Dim myURL,oXMLHttp,objFSO,Description,write2File,ws
myURL = "http://rss.cnn.com/rss/edition.rss"
set ws = CreateObject("wscript.shell")
Set oXMLHttp = CreateObject("MSXML2.XMLHTTP")
Set objFSO = CreateObject("Scripting.FileSystemObject")
oXMLHttp.Open "GET", myURL, False
oXMLHttp.send
If oXMLHttp.Status = 200 Then
Description = Extract(oXMLHttp.responseText)
Set write2File = objFSO.CreateTextFile(".\News.txt",True)
write2File.WriteLine(Description)
write2File.Close
ws.run ".\News.txt"
End If
'-------------------------------------------------------------------------
Function Extract(Data)
Dim re,Match,Matches
Set re = New RegExp
re.Global = True
re.IgnoreCase = True
re.Pattern = "<description><!\[CDATA\[([\s\S]*?)\]\]><\/description>"
Set Matches = re.Execute(Data)
For Each Match in Matches
Description = Description & Match.SubMatches(0) & vbCrlf & vbCrlf
Next
Extract = Description
End Function
'-------------------------------------------------------------------------
EDIT :
For your second request about how to get news from google :
Option Explicit
Dim myURL,oXMLHttp,objFSO,GoogleNews,write2File,ws
myURL = "https://news.google.com/?hl=en-US&gl=US&ceid=US:en"
set ws = CreateObject("wscript.shell")
Set oXMLHttp = CreateObject("MSXML2.XMLHTTP")
Set objFSO = CreateObject("Scripting.FileSystemObject")
oXMLHttp.Open "GET", myURL, False
oXMLHttp.send
If oXMLHttp.Status = 200 Then
GoogleNews = Extract(oXMLHttp.responseText)
Set write2File = objFSO.CreateTextFile(".\GoogleNews.txt",True,-1)
write2File.WriteLine(GoogleNews)
write2File.Close
ws.run ".\GoogleNews.txt"
End If
'-------------------------------------------------------------------------
Function Extract(Data)
Dim re,Match,Matches
Set re = New RegExp
re.Global = True
re.IgnoreCase = True
re.MultiLine = True
re.Pattern = "(\bclass=""DY5T1d"" >)(.+?)<\/a>"
Set Matches = re.Execute(Data)
For Each Match in Matches
GoogleNews = GoogleNews & Match.SubMatches(1) & vbCrlf & vbCrlf
Next
GoogleNews = Replace(GoogleNews,"'","'")
GoogleNews = Replace(GoogleNews,""",chr(34))
Extract = GoogleNews
End Function
'-------------------------------------------------------------------------

You are on the right track with using the getElementsByTagName Method with your ohtmlFile object. You can specify the tag types you want. For example:
Set objAnchors = ohtmlFile.getElementsByTagName("a")
This returns all <a> tags in the HTML document.

Related

How can I use two patterns and show in the same file using VBScript?

I have the code below:
Option Explicit
Dim myURL,oXMLHttp,objFSO,Description,write2File,ws
myURL = "https://www.cbsnews.com/latest/rss/main"
Set ws = CreateObject("wscript.shell")
Set oXMLHttp = CreateObject("MSXML2.XMLHTTP")
Set objFSO = CreateObject("Scripting.FileSystemObject")
oXMLHttp.Open "GET", myURL, False
oXMLHttp.Send
If oXMLHttp.Status = 200 Then
Description = Extract(oXMLHttp.responseText)
Set write2File = objFSO.CreateTextFile(".\Description.txt", True)
write2File.WriteLine(Description)
write2File.Close
End If
Function Extract(Data)
Dim re, Match, Matches
Set re = New RegExp
re.Global = True
re.IgnoreCase = True
re.MultiLine = True
re.Pattern = "<description>([\s\S]*?)<\/description>"
Set Matches = re.Execute(Data)
For Each Match in Matches
Description = Description & Match.SubMatches(0) & vbCrLf & vbCrLf
Next
Extract = Description
End Function
Now I need to save the title and description with two different patterns in the same text file. For exemple:
re.Pattern = "<title>([\s\S]*?)<\/title>" 'pattern 01
re.Pattern = "<description>([\s\S]*?)<\/description>" 'pattern 02
How it should be saved in the text file (exemple):
line 01: Text between tag "title"
line 02: Text between tag "description"
line 03: Text between tag "title"
line 04: Text between tag "description"
etc.
I tried a Forinside another For, but the result was not as expected, because I think I'm missing something.
Use an alternation:
re.Pattern = "<title>([\s\S]*?)</title>|<description>([\s\S]*?)</description>"
and append the respective submatch:
If Not IsEmpty(Match.SubMatches(0)) Then
Description = Description & Match.SubMatches(0)
ElseIf Not IsEmpty(Match.SubMatches(1)) Then
Description = Description & Match.SubMatches(1)
End If
Do you need read and parse xml ?
function Extract(Data)
Set doc = CreateObject("MSXML2.DOMDocument")
doc.loadXML(Data)
If doc.parseError <> 0 Then
response.write doc.parseError.reason
response.end
end if
Description = ""
For Each node In doc.selectNodes("/rss/channel/item")
if not node.selectSingleNode("title") is Nothing then
Description = Description & node.selectSingleNode("title").text & vbCrlf & vbCrlf
end if
if not node.selectSingleNode("description") is Nothing then
Description = Description & node.selectSingleNode("description").text & vbCrlf & vbCrlf
end if
Description = Description & vbCrlf & vbCrlf
Next
Extract = Description
end function

vbscript adding qoutes to base64 converted string

I got some script to convert string to base64 and write encoded data to text file that's all goes fine and result stored in encoded.txt but I need each line to be double quoted at the start and '& _' with double quotes at the end so how to make this script do that automatically for each line? here is my script
Option Explicit
Const fsDoOverwrite = true
Const fsAsASCII = false
Const adTypeBinary = 1
Dim objFSO
Dim objFileOut
Dim objXML
Dim objDocElem
Dim objStream
Set objStream = CreateObject("ADODB.Stream")
objStream.Type = adTypeBinary
objStream.Open()
objStream.LoadFromFile(Wscript.scriptfullname)
Set objXML = CreateObject("MSXml2.DOMDocument")
Set objDocElem = objXML.createElement("Base64Data")
objDocElem.dataType = "bin.base64"
objDocElem.nodeTypedValue = objStream.Read()
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFileOut = objFSO.CreateTextFile("encoded.txt", fsDoOverwrite, fsAsASCII)
objFileOut.Write objDocElem.text
objFileOut.Close()
Set objFSO = Nothing
Set objFileOut = Nothing
Set objXML = Nothing
Set objDocElem = Nothing
Set objStream = Nothing
Sub encodeFileBase64( inputFile, outputFile )
Const fsDoOverwrite = True
Const fsAsASCII = False
Const adTypeBinary = 1
Dim stream, strBuffer
Set stream = WScript.CreateObject("ADODB.Stream")
With stream
.Type = adTypeBinary
.Open
.LoadFromFile inputFile
End With
With WScript.CreateObject("MSXML2.DOMDocument").CreateElement("Base64Data")
.dataType = "bin.base64"
.nodeTypedValue = stream.Read()
strBuffer = .Text
End With
stream.Close
With New RegExp
.Multiline = True
.Global = True
.IgnoreCase = False
.Pattern = "[\r\n]+(?=[0-9A-Za-z])"
strBuffer = Chr(34) & .Replace(strBuffer, Chr(34) & " & _" & vbCrLf & Chr(34) ) & Chr(34)
End With
WScript.CreateObject("Scripting.FileSystemObject").CreateTextFile( outputFile, fsDoOverwrite, fsAsASCII ).Write strBuffer
End Sub
encodeFileBase64 WScript.ScriptFullName, WScript.ScriptFullName & ".b64"
This will use a regular expression object to replace all intermediate line endings with the adecuated line termination/start and two aditional quotes, one at the start and one at the end.
Split the text by a line break, then loop through and put it all back together while you add the quotes. (not tested, but probably close, code follows; there is a strong chance you'll need to find the right type of line feed for your situation - I picked vbCr as an example):
allTextArray = SPLIT(originalText, vbCr)
For i=0 to UBound(allTextArray)
allTextWithQuotes = allTextWithQuotes & """" & allTextArray(i) & """"
Next

VB Script to change or strip out special characters

I have a small VB script (see below) which i found via google, what it does is find a this string (H*699557/1A) in an XML file and renames the filename to that string, This works brilliantly until it encounters a special character (As in the string example) where it then stops.
Could someone please help me to remove the special characters, any help appreciated.
Set objFS = CreateObject("Scripting.FileSystemObject")
strFolder = "C:\vbs"
Set objFolder = objFS.GetFolder(strFolder)
Set regEx = New RegExp
regEx.Pattern = ".*<SuppliersInvoiceNumber>(.*?)</SuppliersInvoiceNumber>.*"
regEx.IgnoreCase = True
regEx.MultiLine = True
For Each strFile In objFolder.Files
strFileName = strFile.Name
If InStr(strFileName ,"USI") > 0 Then
Set objFile=objFS.OpenTextFile(strFileName)
strData = objFile.ReadAll
Set objFile=Nothing
Set colMatches = regEx.Execute(strData)
For Each objMatch In colMatches
strNew = Split(objMatch.Submatches(0),"\")
strNewFile = strNew(0)
strFile.Name = strNewFile & ".xml"
Next
End If
Next
Set objFS = CreateObject("Scripting.FileSystemObject")
strFolder = "C:\vbs"
Set objFolder = objFS.GetFolder(strFolder)
Set regEx = New RegExp
regEx.Pattern = ".*<SuppliersInvoiceNumber>(.*?)</SuppliersInvoiceNumber>.*"
regEx.IgnoreCase = True
regEx.MultiLine = True
For Each strFile In objFolder.Files
strFileName = strFile.Name
If InStr(strFileName ,"USI") > 0 Then
Set objFile=objFS.OpenTextFile(strFileName)
strData = objFile.ReadAll
Set objFile=Nothing
Set colMatches = regEx.Execute(strData)
For Each objMatch In colMatches
strNew = Split(objMatch.Submatches(0),"\")
strNewFile = strNew(0)
strFile.Name = Replace(Replace(strNewFile,"*",""),"/","") & ".xml"
Next
End If
Next

vbscript search string in multiple files

Please advice how changes the current single incoming log file to search multiple files.
Dim strTextToFind, strInputFile, strOutputFile, boolMatchCaseSensitive
Dim objFSO, objInputFile, strFoundText, strLine, objOutputFile
strTextToFind = Inputbox("Enter the text you would like to search for.")
strInputFile = "C:\Users\mmmanima\Desktop\mani\Day_16.txt"
iF YOU CAN NOTICED, IM ONLY ACCESS THE day_16 FILE
strOutputFile = "C:\Users\mmmanima\Desktop\texting As\result.txt"
Set objFSO = CreateObject("Scripting.FilesystemObject")
Const intForReading = 1
Const intForWriting = 2
Const intForAppending = 8
Set objInputFile = objFSO.OpenTextFile(strInputFile, intForReading, False)
Do until objInputFile.atEndOfStream
strLine = objInputFile.ReadLine
If InStr(strLine,strTextToFind) > 0 Then
strFoundText = strLine
If strFoundText <> "" Then
Set objOutputFile = objFSO.OpenTextFile(strOutputFile,intForAppending, True)
objOutputFile.WriteLine strFoundText
objOutputFile.Close
Set objOutputFile = Nothing
End If
End If
loop
objInputFile.Close
Set objInputFile = Nothing
WScript.Quit
VBScript required to search userinput string into the share folder and there is 60 files.
As I believe you want to search through the all files in a particular folder. Then I suggest you to loop you action while all files are read
to do that it's easier to maintain sub or function
pseudo:
var inputFolder = ".\myfolder"
foreach file in the inputFolder
{
inputFile = file
searchIn(inputFile)
}
sub searchIn(inputFile)
{
'do your current works here
}
code:
This part will give you the all file names
Set fso = CreateObject("Scripting.FileSystemObject")
inputFldr = Replace(wscript.scriptfullname,wscript.scriptname,".\")
Set fldr = fso.getFolder(inputFldr)
For Each file In fldr.Files
'call to your function
Next
----------plese aware of typos------
Dim strTextToFind, strInputFile, strOutputFile, boolMatchCaseSensitive
Dim objFSO, objInputFile, strFoundText, strLine, objOutputFile
Set objFSO = CreateObject("Scripting.FileSystemObject")
inputFldr = Replace(wscript.scriptfullname,wscript.scriptname,".\")
Set fldr = objFSO.getFolder(inputFldr)
strTextToFind = Inputbox("Enter the text you would like to search for.")
For Each file In fldr.Files
yourFunctionName(file )
Next
sub yourFunctionName(inputFile)
strInputFile = inputFile
strOutputFile = ".\result.txt"
Const intForReading = 1
Const intForWriting = 2
Const intForAppending = 8
Set objInputFile = objFSO.OpenTextFile(strInputFile, intForReading, False)
Do until objInputFile.atEndOfStream
strLine = objInputFile.ReadLine
If InStr(strLine,strTextToFind) > 0 Then
strFoundText = strLine
If strFoundText <> "" Then
Set objOutputFile = objFSO.OpenTextFile(strOutputFile,intForAppending, True)
objOutputFile.WriteLine strFoundText
objOutputFile.Close
Set objOutputFile = Nothing
End If
End If
loop
objInputFile.Close
Set objInputFile = Nothing
end sub
WScript.echo "done"
WScript.Quit
You can try this vbscript, i added a function BrowseForFolder()
Option Explicit
Dim strTextToFind,inputFldr,strInputFile,strOutputFile,path,fldr
Dim objFSO, objInputFile,strFoundText,strLine,objOutputFile,file,ws
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set ws = CreateObject("wscript.Shell")
path = objFSO.GetParentFolderName(wscript.ScriptFullName)
strOutputFile = path & "\result.txt"
If objFSO.FileExists(strOutputFile) Then
objFSO.DeleteFile(strOutputFile)
End if
inputFldr = BrowseForFolder()
Set fldr = objFSO.getFolder(inputFldr)
strTextToFind = Inputbox("Enter the text you would like to search for it !","Enter the text you would like to search for it !","wscript")
For Each file In fldr.Files
Call Search(file,strTextToFind)
Next
ws.run strOutputFile
'***************************************************************************************************************
Sub Search(inputFile,strTextToFind)
strInputFile = inputFile
Const intForReading = 1
Const intForWriting = 2
Const intForAppending = 8
Set objInputFile = objFSO.OpenTextFile(strInputFile,intForReading, False)
Do until objInputFile.atEndOfStream
strLine = objInputFile.ReadLine
If InStr(strLine,strTextToFind) > 0 Then
strFoundText = strLine
If strFoundText <> "" Then
Set objOutputFile = objFSO.OpenTextFile(strOutputFile,intForAppending, True)
objOutputFile.WriteLine "The Path of file ===> "& DblQuote(strInputFile) & VbCRLF &_
"String found "& DblQuote(strTextToFind) & " ===> "& DblQuote(strFoundText) & VbCRLF & String(100,"*")
objOutputFile.Close
Set objOutputFile = Nothing
End If
End If
loop
objInputFile.Close
Set objInputFile = Nothing
End sub
'***************************************************************************************************************
Function BrowseForFolder()
Dim ws,objFolder,Copyright
Set ws = CreateObject("Shell.Application")
Set objFolder = ws.BrowseForFolder(0,"Choose the folder to search on it ",1,"c:\Programs")
If objFolder Is Nothing Then
Wscript.Quit
End If
BrowseForFolder = objFolder.self.path
end Function
'****************************************************************************************************************
Function DblQuote(Str)
DblQuote = Chr(34) & Str & Chr(34)
End Function
'*****************************************************
A bit late in the day after such a long time gap to address Mara Raj's problem with Hackoo's script but here it is for any others who may be interested. On starting the script it automatically deletes any existing result.txt file. Should the script subsequently go on to find "no match" it fails to generate a results.txt file as it would normally do if there were a match. The simplest way to correct this is to insert:
If objFSO.FileExists(strOutputFile) Then
else
wscript.echo "No Matches Found"
wscript.Quit
end if
between "next" and "ws.run strOutputFile"

Reading data from a file with a variable name - VBScript

I'm trying to count the number of lines in a text file using VBScript. I have managed to do this without a problem for a text file with a fixed name. EG: "C:\Orig\sample.txt"
However, our filenames change daily, EG: "C:\Orig\sample*todaysdate*.txt"
I have looked high and low for a way to 'read' a file with a variable name and have had no luck.
What I have so far for a fixed file name is:
Dim oFso, oReg, sData, lCount, linesum
Const ForReading = 1, sPath = "C:\Orig\sample.txt"
Set oReg = New RegExp
Set oFso = CreateObject("Scripting.FileSystemObject")
sData = oFso.OpenTextFile(sPath, ForReading).ReadAll
With oReg
.Global = True
.Pattern = "\r\n" 'vbCrLf
lCount = .Execute(sData).Count + 1
End With
WScript.Echo("The total number of lines including the header is " & lCount)
Set oFso = Nothing
Set oReg = Nothing
This works perfectly well, but I just cannot find the correct syntax for a variable file name.
If it is of any help, the file I'm looking to interrogate will be the ONLY file in the containing folder.
Is anybody able to offer any assistance? Many thanks.
I have now tried the following:
Dim objFso, objReg, sData, lCount
Const ForReading = 1
sPath = "C:\Orig"
Set objFso = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFso.GetFolder(sPath)
For Each objFile in objFolder.Files
Set objReg = New RegExp
sData = objFso.OpenTextFile(sPath, ForReading).ReadAll
With objReg
.Global = True
.Pattern = "\r\n" 'vbCrLf
lCount = .Execute(sData).Count + 1
End With
WScript.Echo("The total number of lines including the header is " & lCount)
Set objFso = Nothing
Set objReg = Nothing
Set objFolder = Nothing
set sData = Nothing
Next
But on line 9 I am getting a 'Permission denied' error. I have checked folder permissions and file permissions and I have full rights.
Does anybody have any ideas?
Thanks in advance.
Loop through the files in the folder instead. There's no need to name the file directly.
Dim oFso, oReg, sData, lCount, linesum
Const ForReading = 1
sPath = "C:\Orig\sample\"
Set oFso = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFso.GetFolder(sPath)
For Each objFile in objFolder.Files
Set oReg = New RegExp
sData = oFso.OpenTextFile(sPath, ForReading).ReadAll
With oReg
.Global = True
.Pattern = "\r\n" 'vbCrLf
lCount = .Execute(sData).Count + 1
End With
WScript.Echo("The total number of lines including the header is " & lCount)
Set oFso = Nothing
Set oReg = Nothing
Next

Resources