I am looking for a way to read a big binary file (about 1 GB) using VBScript. I can't read it in one go with the ReadAll function because the file is too big, so I am looking for a way to read it in a loop, as in C. That is, I want to read X bytes, process them (I don't need the whole file at once), then read the next X bytes, and so on.
The problem is that I can't find a way to do that. I know how to start reading from an offset, but I can't find a way to read X bytes; there seem to be only the ReadAll and ReadLine functions.
Is there a way to read X bytes?
When in doubt, read the documentation:
Read Method
Reads a specified number of characters from a TextStream file and returns the resulting string.
Syntax
object.Read(characters)
Arguments
object
Required. Always the name of a TextStream object.
characters
Required. Number of characters you want to read from the file.
filename = "C:\path\to\your.file"
Set fso = CreateObject("Scripting.FileSystemObject")
Set f = fso.OpenTextFile(filename)
Do Until f.AtEndOfStream
buf = f.Read(10)
'...
Loop
f.Close
Note, however, that the Read() method reads characters, not bytes. The two are roughly the same as long as you open the file in ANSI mode (the default).
I am trying to read an Excel file in VBScript, but when I call file.ReadLine I get strange characters. Do you have any idea how I could get the cell values correctly, without using the Excel libraries?
Dim fso,file
Set fso = Server.CreateObject("Scripting.FileSystemObject")
Set dict = CreateObject("Scripting.Dictionary")
Set file = fso.OpenTextFile ("C:\myFile.xlsx",1)
row = 0
Do Until file.AtEndOfStream
line = file.Readline
dict.Add row, line
row = row + 1
Loop
file.Close
If you are writing a macro in Excel (which also uses Visual Basic), there is more than one way to get a cell value.
There is the Range function
(example from the web:) Worksheets("Sheet1").Range("A5").Value
There is also the Cells function (example from the web:) Cells(1, 1).
An Excel file (.xlsx) is actually a ZIP archive in which the data is stored as XML.
I think that's why you can't read it as plain text from VBScript.
You most likely need to set the encoding for the page to UTF-8. See the links below for a simple description:
Classic ASP text substitution and UTF-8 encoding
https://www.w3schools.com/asp/prop_charset.asp
So it would look something like below, located near the top of the page:
Response.Charset = "UTF-8"
I solved my problem by saving the file with the .CSV extension. This lets me read the information as plain text, like a .txt file, with the columns separated by commas by default, so my code works normally.
I am reverse engineering some old database files. It's going pretty well. All the files I have worked with so far have fixed-width records, and the width is defined in the header. Pretty straightforward. I know the header length, so I can start reading the file right after the header, and I know that X bytes later I get to the end of the record. If the record is 30 bytes and the header is 100, I can do something like this:
file = IO.binread(path + file_name, end_of_header, end_of_file)
read_file(file[0, 30]) #This calls a function that parses the data..
However, there are several tables with dynamic-width records. So one record can be 100 bytes and the next could be 20 bytes. The records are as big as the amount of text the user saved. There does not seem to be anything in the record that notes its length.
Each record is separated by a delimiter (FEFE). I am scanning for the next delimiter and pulling the record that way, but it takes forever to read the entire file byte by byte looking for matches. Is there a better way than scanning to find the next match OR get a list of all the indexes of each occurrence of the byte array?
RUBY...
You can specify a separator for readline:
file.readline("FEFE")
or, if you mean the two bytes 0xFE 0xFE rather than the four-character string:
file.readline("\xFE\xFE")
This gets you one record (including the delimiter).
Or you can read all the records and pass each one to a code block:
file.readlines("\xFE\xFE").each { |line| ... }
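To avoid loading every record at once, the same idea can be written as a streaming loop. This is only a minimal sketch: the method name each_record and the DELIMITER constant are made up for illustration, and it assumes the file is opened in binary mode ("rb"). The separator is given a binary encoding with String#b so that it matches the encoding of a binary stream.

```ruby
# Assumed record separator: the two bytes 0xFE 0xFE, as a binary string.
DELIMITER = "\xFE\xFE".b

# Hypothetical helper: yields each record from an IO, one at a time,
# with the trailing delimiter stripped. Returns an Enumerator when
# called without a block.
def each_record(io, delimiter = DELIMITER)
  return enum_for(:each_record, io, delimiter) unless block_given?

  # each_line accepts a custom separator, so Ruby does the scanning
  # instead of a byte-by-byte loop in user code.
  io.each_line(delimiter) do |chunk|
    yield chunk.delete_suffix(delimiter)
  end
end
```

Usage would look something like File.open(path, "rb") { |f| each_record(f) { |record| read_file(record) } }, where path and read_file are the names from the question. If the file has a header, seek past it (f.seek(header_length)) before iterating.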
I'm writing a Matlab script which begins by reading a space-delimited .log file into a cell array. Column headers in the file are all strings, but data types throughout the file are mixed, so for simplicity I've been treating every value as a string for now.
This is what I have so far, and it works just fine with small files.
fileID = fopen('file');
ImportData = char.empty; % create empty array to add on to
while ~feof(fileID)
tLines = fgetl(fileID); % reads line into string
raw = strsplit(tLines, ' '); %splits line into array for that line
ImportData = cat(1, ImportData, raw); %adds line to rest of array
end
fclose(fileID);
However the actual files this script will need to read are very unwieldy (30,000+ rows, 200+ columns) and I'm finding this procedure very slow for that. I've done some research and I'm sure that vectorization is the answer, but I'm very unfamiliar in this area.
What are the ways in which I could alter this procedure to dramatically increase speed?
EDIT: Column types are inconsistent, so the importdata function doesn't work. The file has a .log extension, so the readtable function doesn't work. Ideally a faster method of using textscan would be perfect.
readtable(filename,'FileType','text','Delimiter',' ')
should work fine. The file extension ".log" is irrelevant as long as your file is delimited with ' '.
You can further specify a format string if you have prior knowledge of the column formats. Specifying a format string can make the operation a lot quicker. If you don't specify a format, each column is returned as numeric if the entire column is numeric, or as cellstrings if it's mixed.
I am a new programmer in Ruby. Can someone give an example of opening a file with the r+, w+, and a+ modes in Ruby? What is the difference between them and r, w, and a?
Please explain, and provide an example.
The file open modes are not really specific to Ruby - they are part of IEEE Std 1003.1 (Single UNIX Specification). You can read more about it here:
http://pubs.opengroup.org/onlinepubs/009695399/functions/fopen.html
r or rb
Open file for reading.
w or wb
Truncate to zero length or create file for writing.
a or ab
Append; open or create file for writing at end-of-file.
r+ or rb+ or r+b
Open file for update (reading and writing).
w+ or wb+ or w+b
Truncate to zero length or create file for update.
a+ or ab+ or a+b
Append; open or create file for update, writing at end-of-file.
Any mode that contains the letter 'b' opens the file as a binary file. If the 'b' is not present, the file is treated as a 'plain text' file.
The difference between 'open' and 'open for update' is indicated as:
When a file is opened with update mode ( '+' as the second or third character in the mode argument), both input and output may be performed on the associated stream. However, the application shall ensure that output is not directly followed by input without an intervening call to fflush() or to a file positioning function ( fseek(), fsetpos(), or rewind()), and input is not directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file.
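Since the question asked for an example, here is a small sketch of the three update modes in Ruby. The method name open_mode_demo and the scratch file are made up for illustration. Note the rewind between writing and reading, which is the "file positioning function" the quoted text asks for.

```ruby
require "tmpdir"

# Hypothetical demo: runs the three update modes against one file and
# returns what the file contained after each step.
def open_mode_demo(path)
  results = {}

  # "w+" truncates (or creates) the file, then allows reading it back.
  File.open(path, "w+") do |f|
    f.write("hello")
    f.rewind                      # reposition before switching to reading
    results["w+"] = f.read
  end

  # "r+" keeps the existing content; writing starts at the beginning
  # and overwrites what is there. The file must already exist.
  File.open(path, "r+") { |f| f.write("J") }
  results["r+"] = File.read(path)

  # "a+" keeps the existing content; every write goes to the end.
  File.open(path, "a+") do |f|
    f.write("!")
    f.rewind
    results["a+"] = f.read
  end

  results
end

Dir.mktmpdir do |dir|
  p open_mode_demo(File.join(dir, "demo.txt"))
end
```

The plain modes differ only in direction: "r" can only read, while "w" and "a" can only write, so calling read on a file opened with "w" or "a" raises an IOError.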
So basically I have a record that looks like this
modulis = record
kodas : string[4];
pavadinimas : string[30];
skaicius : integer;
kiti : array[1..50] of string;
end;
And I'm trying to read it from the text file like this :
ReadLn(f1,N);
for i := 1 to N do
begin
Read(f1,moduliai[i].kodas);
Read(f1,moduliai[i].pavadinimas);
Read(f1,moduliai[i].skaicius);
for j := 1 to moduliai[i].skaicius do
Read(f1,moduliai[i].kiti[j]);
ReadLn(f1);
end;
And the file looks like this :
9
IF01 Programavimo ivadas 0
IF02 Diskrecioji matematika 1 IF01
IF03 Duomenu strukturos 2 IF01 IF02
IF04 Skaitmenine logika 0
IF05 Matematine logika 1 IF04
IF06 Operaciju optimizavimas 1 IF05
IF07 Algoritmu analize 2 IF03 IF06
IF08 Asemblerio kalba 1 IF03
IF09 Operacines sistemos 2 IF07 IF08
And I'm getting runtime error 106 (bad numeric format). I can't figure out how to fix this. I'm not sure, but I think it has something to do with the text file; however, I copied the text file from the internet, so it should be good :|
Reading string data is different from reading numeric data in Pascal.
With numbers, the Read instruction consumes data until it hits white space or the end of the file. White space in this case can be the space character, the tab character, or the EOL 'character'. So if there are two numbers on one line of text, you can read them one by one using two consecutive Reads.
I believe you already know that.
And I believe you thought it would work the same way with strings. But it won't: you cannot read two string values from one line of text simply by using two consecutive Read instructions. Read consumes all the text up to EOL or EOF. After reading, the string variable is assigned however many characters it can hold, and the rest of the data is discarded. It is essentially equivalent to ReadLn in this respect.
Solution? Put each value in the input file on its own line, and preferably use ReadLn instead of all the Reads. (The latter might be unnecessary, though; rearranging the input data might be enough.)
Alternatively you would need to read the whole line of text into a temporary string variable, then split it manually and assign the parts to the corresponding record fields, not forgetting also to convert the numeric values from string to integer.
You choose what suits you better.
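Because you have declared pavadinimas as string[30], Read takes up to 30 characters (stopping only at the end of the line), no matter where the name actually ends. For example, for the following line
IF04 Skaitmenine logika 0
pavadinimas will be " Skaitmenine logika 0" instead of just "Skaitmenine logika". The number has then already been swallowed by the string, so the next Read for skaicius runs into the text of the following line, which is what produces the numeric format error.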
I'm not a Pascal programmer, but it looks like the fields within your text file are not fixed length. How would you expect your program to delimit each field during read back?