I've got some time-domain data in a file I need to read with Fortran that looks like repetitions of this:
0.0000E+000
2 % number of particles
4 % number of values
0.00000E+000
0.00000E+000
0.00000E+000
0.00000E+000
4 % number of values
0.00000E+000
0.00000E+000
0.00000E+000
0.00000E+000
where the first line is the current time and I need the other values in an array sized by the number of particles & values. Ideally, each call to a read_values() subroutine would grab one chunk of this data (at the next time), but I'm not sure how to skip the comments. Is there an easy way to simply advance to the next line after a read?
Fortran I/O is normally record based (the new stream access method is not). For formatted files this means lines. A Fortran read normally reads from one line, and the next read will read the next line, unless you explicitly use the non-advancing option. If you know which lines have integers, read the integers with a format that uses just that part of the line. That appears to be I8 with your file. The read won't process the rest of the line with the comment, and the next read will start from the next line. If you don't know which lines contain which type of data, you can read each line into a string and analyze the string to decide how to read from it.
Related
I have a large CSV with a large number of columns. I am trying to count the number of lines using
File.open(file).readlines.to_a.compact.count.to_i
It displays 57 although there are only 56 rows. Upon close examination I found that a part of one line is wrapped to form the next line. How to get the correct count?
You need to show an example of the incoming data if you want us to help beyond generic answers.
To fix the problem, you have to be able to identify the line. We can't help you there because it could look like anything. Making a wild guess, I'd say that one of the columns had an embedded new-line in it, which forces the line to wrap.
If the file is a true CSV file, that column should be wrapped in double-quotes, so you could search the file for lines that do NOT end with whatever data type should be in the last column, then read the next line, join them, then rewrite the file. But, again, we have nothing to work with, because your file's format could be a huge number of different things.
Your best bet is to use the CSV class that comes with Ruby, and let it read the file, instead of trying to treat it like a text file. CSV files are text, but they are formatted to maintain the columns and rows, so using the CSV class will give you a better chance of getting at the data.
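As a rough illustration of why the CSV class gives a different count, here is a minimal sketch. The sample text and the two helper names are made up for the example; it assumes the wrapped line comes from a double-quoted field with an embedded newline, which is only a guess about the real file.

```ruby
require 'csv'

# Counts physical lines, the way readlines or wc -l would see them.
def count_physical_lines(text)
  text.lines.count
end

# Counts parsed CSV records; a quoted field may span several physical lines.
def count_csv_rows(text)
  CSV.parse(text).count
end

# Hypothetical sample: the second record has a quoted field containing a newline.
sample = "id,note\n1,\"first part\nsecond part\"\n2,plain\n"

puts "physical lines: #{count_physical_lines(sample)}"
puts "CSV rows: #{count_csv_rows(sample)}"
```

For a file on disk, CSV.foreach(path).count should give the record count without loading the whole file into memory.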
Looking at your code:
There are a number of ways to count the number of lines in a file, including the easiest which is:
`wc -l /path/to/file`.to_i
if you're using *nix.
Using File.open(file).readlines.to_a is horribly redundant and not fast or scalable if your file is big.
readlines returns an array.
to_a returns an array.
Why turn the array into an array?
readlines loads an entire file into memory, then splits it on line ends into an array. That process can be a lot slower than simply reading the file line-by-line and incrementing a counter, plus "slurping" can make your program crawl if the file is larger than available memory.
See "Why is "slurping" a file not a good practice?" for more information.
compact removes nils from an array. readlines should never return any nils so compact will iterate over the array looking for something that shouldn't exist.
count returns an integer.
to_i converts the receiver to an integer.
In other words, to_i is turning an integer into an integer. Why?
If you want to do it in Ruby instead of using wc -l, do something simple and fast:
lines_in_file = 0
File.foreach(some_file) { lines_in_file += 1 }
After running that, lines_in_file will contain the number of lines read. Memory won't be impacted and it'll run like blue blazes on huge files.
VBA question
There is a large log file (around 500,000 lines), I need to read it line by line in reverse order, i.e. from the last line to the first line.
I know I can use FileSystemObject in the Microsoft Scripting Runtime reference, but the ReadLine method in TextStream has no option to read in reverse.
Right now, the only way I can think of is to keep a counter and skip all the previous lines for each line I read, but this is definitely not good enough. Any suggested code or algorithm will be much appreciated.
If your log is structured like a database table and has a field that determines the order (a date field or a line number field, say), you could try an ADO solution with an SQL query that reads the log in reverse order (ORDER BY ... DESC). That way you can read from the last line to the first. More generally, try ADO.
A file is not line based, or even character based; it's just bytes, so there is no built-in way to read its lines in reverse order. How the text is separated into lines is determined only by where the line break characters fall in the text.
You can read lines from the beginning and store them in a rotating buffer, so that you have for example the last 1000 lines in the buffer when you reach the end of the file. That way you have a certain number of lines that you can access from your buffer without having to read the entire file for every single line.
After that you know how many lines there are in the file, so when you need to refill the buffer you can just skip a certain number of lines and read the following lines into the buffer.
Help me brainstorm how I would solve this problem.
I have a file of dates with corresponding data, the format looks like this:
Date,data,data,data,data,data
Date,data,data,data,data,data
It's a plain csv file, only commas being used.
I need to be able to select a beginning date. And then get the data for the next 20 days beginning with the date selected.
Date format:
2007.05.21 (y,m,d)
So I think it would be best to search for the date. Either loading the entire file first into memory or read line by line. The file is only 1 megabyte, however I might want to do this with a 100 megabyte file as well. Is that still little?
Also I will want to do this very many times. I think I may want to keep the file in memory for the entire run of the program. So I can repeatedly access it.
After finding the date, I need to be able to get column 2 of day 1, column 4 of day 4, etc. However, there is always the same number of columns for each day. So I guess if this is loaded into some kind of array, I can always know at which array index each following day starts.
Any help would be greatly appreciated. Also any code examples provided would really help. This is not a homework problem or anything like that and I'm really new to programming.
You can use the csv library to parse your file line by line, like this:
require 'csv'
require 'date'

date_to_search = Date.new(2009, 10, 10)
rows_left = 0 # rows still to process, counting the matching one

# CSV.foreach yields one parsed row at a time to the block
CSV.foreach('yourfilename.txt', col_sep: ',') do |row|
  # row is an array of strings; the dates look like 2007.05.21
  cur_date = Date.strptime(row[0], '%Y.%m.%d')
  # on a match, take this row and the next 19
  rows_left = 20 if cur_date == date_to_search
  next if rows_left.zero?
  # other calculations on this row go here
  rows_left -= 1
end
As your data is sorted, Binary Search is what you want to use.
Simply put, you look up an element near the middle of your CSV, compare its date to the one you're looking for, and continue recursively in the matching half of the file (See the Wikipedia link for details).
Binary search has a runtime complexity of O(log n), which means that the number of read operations on a file containing 1,000,000 lines (Reasonable estimation for 100 MB) will never (under normal circumstances, that is, lines of different length are equally distributed) exceed 20.
Therefore, there is no need to keep the file in memory, quite the contrary. The operating system's disk cache will do the task of accelerating consecutive operations for you without running into memory shortage.
To read and process a line, you first need to find its first character, which is either the first character after a newline character (\n) or the beginning of the file. Reading multiple lines can be achieved similarly.
To parse a line, I suggest you split the line at the separation characters and/or the date's dots. This is, of course, only appropriate if the CSV comes from a trustworthy source and never changes its layout.
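The approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the method names and sample data are invented for the example, the lines are sorted by date, the first line is not empty, and the dates are zero-padded YYYY.MM.DD strings so that plain string comparison matches date order.

```ruby
require 'stringio'

# Returns the byte offset of the first line whose leading date field is
# >= target, or the end-of-file offset if no such line exists.
# Assumes sorted lines and zero-padded "YYYY.MM.DD" dates (see above).
def offset_of_first_date_at_or_after(io, target)
  lo = 0
  hi = io.size
  while lo < hi
    mid = (lo + hi) / 2
    io.seek(mid)
    io.gets if mid > 0   # discard the rest of the line we landed in
    line = io.gets       # first complete line starting after mid
    if line.nil? || line.split(',', 2).first >= target
      hi = mid
    else
      lo = mid + 1
    end
  end
  io.seek(lo)
  io.gets if lo > 0      # move to the start of the line that was found
  io.pos
end

# Reads up to `count` rows starting at the first date >= target.
def rows_from(io, target, count)
  io.seek(offset_of_first_date_at_or_after(io, target))
  rows = []
  count.times do
    line = io.gets
    break if line.nil?
    rows << line.chomp.split(',')
  end
  rows
end

# Hypothetical data; with a real file use File.open(path, 'rb') instead.
data = StringIO.new("2007.05.20,1,2\n2007.05.21,3,4\n2007.05.23,5,6\n")
p rows_from(data, '2007.05.21', 2)
```

The search works on byte offsets rather than line numbers, which is why each probe skips forward to the next newline before reading a complete line. If the requested date is missing, the sketch returns rows from the next later date; whether that is the right behavior depends on the data.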
Using the Read from Text File function, I am able to easily read the first line of my file. However, I now want it to read the second line. It would be great to just use a for loop or something if I could specify the line number somewhere. Is there a way to do so? Thanks!
First, you can read the entire file as lines by right-clicking on the Read From Text File node and selecting "Read Lines". One read will return an array containing one element for each line and you can work with the lines with regular array handling methods. If you want to read each line individually, you can by wiring a 1 into the Count input and looping. Each iteration will return an array with one element (the current line read). You can get/set the offset (in bytes) to specify where in the file you want to read, but that's not necessary if I read your question correctly.
So basically I have a record that looks like this
modulis = record
kodas : string[4];
pavadinimas : string[30];
skaicius : integer;
kiti : array[1..50] of string;
end;
And I'm trying to read it from the text file like this :
ReadLn(f1,N);
for i := 1 to N do
begin
Read(f1,moduliai[i].kodas);
Read(f1,moduliai[i].pavadinimas);
Read(f1,moduliai[i].skaicius);
for j := 1 to moduliai[i].skaicius do
Read(f1,moduliai[i].kiti[j]);
ReadLn(f1);
end;
And the file looks like this :
9
IF01 Programavimo ivadas 0
IF02 Diskrecioji matematika 1 IF01
IF03 Duomenu strukturos 2 IF01 IF02
IF04 Skaitmenine logika 0
IF05 Matematine logika 1 IF04
IF06 Operaciju optimizavimas 1 IF05
IF07 Algoritmu analize 2 IF03 IF06
IF08 Asemblerio kalba 1 IF03
IF09 Operacines sistemos 2 IF07 IF08
And I'm getting 106 bad numeric format. Can't figure out how to fix this, I'm not sure, but I think it has something to do with the text file, however I copied the text file from the internet so it has to be good :|
Reading string data is different from reading numeric data in Pascal.
With numbers the Read instruction consumes data until it hits white space or the end of file. Now white space in this case can be the space character, the tab character, the EOL 'character'. So if there are 2 numbers on one line of text, you could read them one by one using two consecutive Reads.
I believe you already know that.
And I believe you thought it would work the same way with strings. But it won't: you cannot read two string values from one line of text simply by using two consecutive Read instructions. Read would consume all the text up to EOL or EOF. After the read, the string variable is assigned however many characters it can hold, and the rest of the data is thrown away. It is essentially equivalent to ReadLn in this respect.
Solution? Arrange all the data in the input file on separate lines and better use ReadLns instead of all the Reads. (But I think the latter might be unnecessary, and rearranging the input data might be enough.)
Alternatively you would need to read the whole line of text into a temporary string variable, then split it manually and assign the parts to the corresponding record fields, not forgetting also to convert the numeric values from string to integer.
You choose what suits you better.
Because you have declared pavadinimas as string[30], it reads 30 characters regardless of the actual length of the name. For example, for the line shown below, pavadinimas will be
" Skaitmenine logika 0" instead of just "Skaitmenine logika"
IF04 Skaitmenine logika 0
I'm not a Pascal programmer, but it looks like the fields within your text file are not fixed length. How would you expect your program to delimit each field during read back?