Read and write tab-delimited text data

I have an Excel output in tab-delimited format:
temperature H2O CO2 N2 NH3
10 2.71539E+12 44374931376 7410673406 2570.560804
20 2.34216E+12 38494172272 6429230649 3148.699673
30 2.04242E+12 33759520581 5639029060 3856.866413
40 1.75491E+12 29172949817 4882467457 4724.305292
...
I need to convert these numbers to FORMAT(1X,F7.0,2X,1P4E11.3) so they are readable by another code.
This is what I've come up with:
program fixformat
real temp, neuts(4)
integer i,j
character header
open(11,file='./unformatted.txt',status='old')
open(12,file='./formatted.txt',status='unknown')
read(11,*) header
write(12,*) header
do i = 1, 200
read(11,*) temp, (neuts(j),j=1,4)
write(12,23) temp, (neuts(j),j=1,4)
end do
23 FORMAT(1X,F7.0,2X,1P4E11.3)
close(11)
close(12)
return
end
I keep getting this error:
Fortran runtime error: Bad real number in item 1 of list input
Is there any other way to convert the data to that format?

You need a character string, not a single character for the header
character(80) header
Other than that, your program works for me. Make sure you have the right number of lines in your loop:
do i = 1, 200
Adjust 200 to the actual number of your data lines.
If for some reason you still cannot read even a single line, you can also use the format:
read(11,'(f2.0,4(1x,f11.0))') temp, (neuts(j),j=1,4)
because the tab is just a character you can easily skip.
Notes:
Unformatted and formatted mean something completely different in Fortran. Unformatted is what you may know as "binary".
Use some indentation and blank lines for your programs to make them readable.
There is no reason to explicitly use status=unknown. Just don't put anything there. In your case status=replace may be more appropriate.
The FORMAT statement is quite obsolete, in modern Fortran we use format strings:
write(12,'(1X,F7.0,2X,1P4E11.3)') temp, (neuts(j),j=1,4)
There is absolutely no reason for your return before the end. return is for early returns from a procedure. Some put stop before end program, but it is superfluous.
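Since you ask whether there is any other way: if this is a one-off conversion, a short script outside Fortran works too. A minimal Python sketch (my addition; file names taken from your program, and the format expressions mirror 1X,F7.0,2X,1P4E11.3):
with open("unformatted.txt") as src, open("formatted.txt", "w") as dst:
    dst.write(next(src))                              # pass the header line through
    for line in src:
        fields = line.split()                         # splits on tabs and spaces alike
        if not fields:
            continue                                  # skip blank lines
        temp = float(fields[0])
        vals = [float(x) for x in fields[1:5]]
        # F7.0 prints the value with a trailing '.'; Python's '11.3E'
        # matches 1PE11.3 (one digit before the decimal point)
        dst.write(" {:6.0f}.  ".format(temp)
                  + "".join("{:11.3E}".format(v) for v in vals) + "\n")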

To read tab delimited data, I'd use a simple algorithm like the one below. NOTE: This is assuming that there is no tab character in any of your fields.
integer :: delim_index, line_index, line_length
character*500 :: data_line, field_data_string
double precision :: dp_value

Open(Unit=1001, File="C:\MY\PATH\Data.txt")
DO
    Read(UNIT=1001, End=105, FMT='(A)') data_line
    line_length = LEN(TRIM(data_line))
    delim_index = SCAN(data_line, achar(9))
    line_index = 0
    DO WHILE ( delim_index .NE. 0 )
        line_index = line_index + delim_index
        IF (delim_index .EQ. 1) THEN ! found a NULL (no value), so skip
            GOTO 101
        END IF
        field_data_string = data_line( (line_index-delim_index+1) : line_index-1 )
        READ( field_data_string, FMT=*, ERR=100) dp_value
        PRINT *, "Is a double precision ", dp_value
        GOTO 101
100     Continue
        PRINT *, "Not a double precision"
101     Continue
        IF ( (line_index+1) .GT. line_length ) THEN
            GOTO 104 ! found end of line prematurely
        END IF
        delim_index = SCAN( data_line( line_index + 1 : ), achar(9) )
    END DO
    ! last field on the line, after the final tab
    field_data_string = data_line( line_index + 1 : )
    READ( field_data_string, FMT=*, ERR=102) dp_value
    PRINT *, "Is a double precision ", dp_value
    GOTO 103
102 Continue
    PRINT *, "Not a double precision"
103 Continue
104 Continue
END DO
105 Continue ! end of file reached
Close(1001)
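For comparison, the same idea in a few lines of Python (my addition, not part of the Fortran answer above): split each line on the tab character and try to parse each field.
with open("Data.txt") as fh:                          # same hypothetical input file
    for data_line in fh:
        for field in data_line.rstrip("\n").split("\t"):
            if not field:                             # two adjacent tabs: no value
                continue
            try:
                print("Is a double precision", float(field))
            except ValueError:
                print("Not a double precision")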

Related

How to Enable Scroll Bar of QBasic Output Window?

I'm trying to display a statement 1000 times in QBASIC (using a FOR statement). I think the program works properly, but I cannot see all 1000 statements because I cannot scroll up and down in the output window of QBASIC. I can see only the last part of the 1000 statements.
FOR x = 1 TO 1000
PRINT "maydie";
PRINT
NEXT x
That will be very hard. For QBasic you have to know how PRINT works. Then, with luck, you could write a TSR program in some other language that does what you want. The alternative is to store everything in an array and create your own display routine with scrolling, but with 1000 lines you will run into memory restrictions.
In short, unless you're using a modern take on QBasic, you can't.
What you can do is print the output to a text file:
OPEN "C:\somefile.txt" FOR OUTPUT AS #1
FOR x = 1 TO 1000
PRINT #1, "maydie";
PRINT #1,
NEXT x
This will write "maydie" to C:\somefile.txt 1000 times. Then use some text editor to view the output. You could even use a program to count the lines of text, something like OPEN "C:\somefile.txt" FOR INPUT AS #1: WHILE NOT EOF(1): INPUT #1, junk$: i = i + 1: WEND: PRINT "There were " + STR$(i) + " lines."
Though the other answerers are correct in saying that it is not inbuilt and hence not possible, I agree that this is very desirable! Consequently, I have time and time again devised scripts based on the following:
DIM text(1 TO 1000) AS STRING

'Define text below: Here I've just defined it as every line being
'"maydie" with the value of the line number, but it could be whatever.
FOR i = 1 TO 1000
    text(i) = STR$(i) + "maydie"
NEXT i

CLS
position% = 0
FOR i = 1 TO 25
    LOCATE i, 1: PRINT text(i); SPACE$(80 - LEN(text(i)));
NEXT i

DO
    x$ = INKEY$
    IF x$ <> "" THEN
        SELECT CASE x$
            CASE CHR$(0) + CHR$(72) 'Up arrow
                position% = position% - 1
                IF position% < 0 THEN position% = 0
            CASE CHR$(0) + CHR$(80) 'Down arrow
                position% = position% + 1
                IF position% > 975 THEN position% = 975
            CASE CHR$(0) + "I" 'Page Up
                position% = position% - 24
                IF position% < 0 THEN position% = 0
            CASE CHR$(0) + "Q" 'Page Down
                position% = position% + 24
                IF position% > 975 THEN position% = 975
            CASE CHR$(27) 'ENDS the program on ESC key.
                END
        END SELECT
        FOR i = 1 TO 25
            LOCATE i, 1: PRINT text(i + position%); SPACE$(80 - LEN(text(i + position%)));
        NEXT i
    END IF
LOOP
Tested and works! If you want to use it multiple times in your program for multiple different text blocks, you can just turn it into a function and pass it the variables you want.

Debugging <<loop>> error message in Haskell

Hello, I am encountering this error message in a Haskell program and I do not know where the loop is coming from. There are almost no IO methods, so I cannot hook into them and print partial results in the terminal.
I start with a file, I read it, and then there are only pure methods. How can I debug this?
Is there a way to attach to methods, or to create a helper that can do the following:
Given a method method::a->b, how can I somehow wrap it in an iomethod::(a->b)->IO (a->b) to be able to test it in GHCi (I want to insert some putStrLn-s, etc.)?
P.S. My data suffers transformations IO a(->b->c->d->......)->IO x and I do not know how to debug the part in the parentheses (that is the code that contains the pure methods).
Types and typeclass definitions and implementations
data TCPFile=Rfile (Maybe Readme) | Dfile Samples | Empty
data Header=Header { ftype::Char}
newtype Samples=Samples{values::[Maybe Double]}deriving(Show)
data Readme=Readme{ maxClients::Int, minClients::Int,stepClients::Int,maxDelay::Int,minDelay::Int,stepDelay::Int}deriving(Show)
data FileData=FileData{ header::Header,rawContent::Text}
(>>?)::Maybe a->(a->Maybe b)->Maybe b
(Just t) >>? f=f t
Nothing >>? _=Nothing
class TextEncode a where
fromText::Text-> a
getHeader::TCPFile->Header
getHeader (Rfile _ ) = Header { ftype='r'}
getHeader (Dfile _ )= Header{ftype='d'}
getHeader _ = Header {ftype='e'}
instance Show TCPFile where
show (Rfile t)="Rfile " ++"{"++content++"}" where
content=case t of
Nothing->""
Just c -> show c
show (Dfile c)="Dfile " ++"{"++show c ++ "}"
instance TextEncode Samples where
fromText text=Samples (map (readMaybe.unpack) cols) where
cols=splitOn (pack ",") text
instance TextEncode Readme where
fromText txt =let len= length dat
dat= case len of
6 ->Prelude.take 6 .readData $ txt
_ ->[0,0,0,0,0,0] in
Readme{maxClients=Prelude.head dat,minClients=dat!!1,stepClients=dat!!2,maxDelay=dat!!3,minDelay=dat!!4,stepDelay=dat!!5} where
instance TextEncode TCPFile where
fromText = textToFile
Main
module Main where
import Data.Text(Text,pack,unpack)
import Data.Text.IO(readFile,writeFile)
import TCPFile(TCPFile)
main::IO()
main=do
dat<-readTcpFile "test.txt"
print dat
readTcpFile::FilePath->IO TCPFile
readTcpFile path =fromText <$> Data.Text.IO.readFile path
textToFile::Text->TCPFile
textToFile input=case readHeader input >>? (\h -> Just (FileData h input)) >>? makeFile of
Just r -> r
Nothing ->Empty
readHeader::Text->Maybe Header
readHeader txt=case Data.Text.head txt of
'r' ->Just (Header{ ftype='r'})
'd' ->Just (Header {ftype ='d'})
_ -> Nothing
makeFile::FileData->Maybe TCPFile
makeFile fd= case ftype.header $ fd of
'r'->Just (Rfile (Just (fromText . rawContent $ fd)))
'd'->Just (Dfile (fromText . rawContent $ fd))
_ ->Nothing
readData::Text->[Int]
readData =catMaybes . maybeValues where
maybeValues=mvalues.split.filterText "{}"
--all the methods under this line are used in the above method
mvalues::[Text]->[Maybe Int]
mvalues arr=map (\x->(readMaybe::String->Maybe Int).unpack $ x) arr
split::Text->[Text]
split =splitOn (pack ",")
filterText::[Char]->Text->Text
filterText chars tx=Data.Text.filter (\x -> not (x `elem` chars)) tx
I want first to clean the Text of the given characters, in our case {}, then split it by commas. After the text is split, I want to parse the fields and create either an Rfile, which contains 6 integers, or a Dfile (data file), which contains any given number of integers.
Input
I have a file with the following content: r,1.22,3.45,6.66,5.55,6.33,2.32} and I am running runghc main 2>err.hs
Expected output: Rfile (Just (Readme 1.22 3.45 6.66 5.55 6.33 2.32))
In the TextEncode Readme instance, len and dat depend on each other:
instance TextEncode Readme where
fromText txt =let len= length dat
dat= case len of
To debug this kind of thing, other than staring at the code, one thing you can do is compile with -prof -fprof-auto -rtsopts and run your program with the command-line options +RTS -xc. This should print a trace when the <<loop>> exception is raised (or, if the program loops instead, when you kill it with Ctrl+C). See the GHC manual: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/runtime_control.html#rts-flag--xc
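Concretely, that is (assuming the program is built from a single Main.hs):
ghc -prof -fprof-auto -rtsopts Main.hs
./Main +RTS -xc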
As Li-yao Xia said, part of the problem is the infinite recursion, but even if you tried the following code, the problem would still remain.
instance TextEncode Readme where
fromText txt =let len= length [1,2,3,4,5,6] --dat
dat= case len of
The second issue is that the file contains decimal numbers, but all the conversion functions expect Maybe Int. Changing the definitions of the following functions should give the expected results; on the other hand, the correct fix is probably for the file to contain integers and not decimal numbers.
readData::Text->[Double]
--readData xs = [1,2,3,4,5,6,6]
readData =catMaybes . maybeValues where
maybeValues = mvalues . split . filterText "{}"
--all the methods under this line are used in the above method
mvalues::[Text]->[Maybe Double]
mvalues arr=map (\x->(readMaybe::String->Maybe Double).unpack $ x) arr
data Readme=Readme{ maxClients::Double, minClients::Double,stepClients::Double,maxDelay::Double,minDelay::Double,stepDelay::Double}deriving(Show)

bash merging tables on unique id

I have two similar, 'table format' text files, each several million records long. In inputfile1, the unique identifier is a merger of the values in two other columns (neither of which is a unique identifier on its own). In inputfile2, the unique identifier is two letters followed by a random four-digit number.
How can I replace the unique identifiers in inputfile1 with the corresponding unique identifiers from inputfile2? All of the records in the first table are present in the second, though not vice versa. Below are toy examples of the files.
Input file 1:
Grp Len ident data
A 20 A_20 3k3bj52
A 102 A_102 3k32rf2
A 352 A_352 3w3bj52
B 60 B_60 3k3qwrg
B 42 B_42 3kerj52
C 89 C_89 3kftj55
C 445 C_445 fy5763b
Input file 2:
Grp Len ident
A 20 fz2525
A 102 fz5367
A 352 fz4678
A 356 fz1543
B 60 fz5732
B 11 fz2121
B 42 fz3563
C 89 fz8744
C 245 fz2653
C 445 fz2985
C 536 fz8983
Desired output:
Grp Len ident data
A 20 fz2525 3k3bj52
A 102 fz5367 3k32rf2
A 352 fz4678 3w3bj52
B 60 fz5732 3k3qwrg
B 42 fz3563 3kerj52
C 89 fz8744 3kftj55
C 445 fz2985 fy5763b
My provisional plan is:
Generate extra identifiers for input2, in the style of input1 (easy)
Filter out lines from input2 that don't occur in input1 (hardish)
Then stick on the data from input1 (easy)
I might be able to do this in R but the data is large and complex, and I was wondering if there was a way in bash or perl. Any tips in the right direction would be good.
This should work for you, assuming the Grp and Len values are in the same order in both files, as per my comment
Essentially it reads a line from the first file and then reads from the second file, forming the Grp_Len key from each record until it finds an entry that matches. Then it's just a matter of building the new output record
use strict;
use warnings;

open my $f1, '<', 'file1.txt';
print scalar <$f1>;

open my $f2, '<', 'file2.txt';
<$f2>;

while ( <$f1> ) {
    my @f1 = split;
    my @f2;
    while ( 1 ) {
        @f2 = split ' ', <$f2>;
        last if join('_', @f2[0,1]) eq $f1[2];
    }
    print "@f2 $f1[3]\n";
}
output
Grp Len ident data
A 20 fz2525 3k3bj52
A 102 fz5367 3k32rf2
A 352 fz4678 3w3bj52
B 60 fz5732 3k3qwrg
B 42 fz3563 3kerj52
C 89 fz8744 3kftj55
C 445 fz2985 fy5763b
Update
Here's another version which is identical except that it builds a printf format string from the spacing of the column headers in the first file. That results in a much neater output
use strict;
use warnings;

open my $f1, '<', 'file1.txt';
my $head = <$f1>;
print $head;
my $format = create_format($head);

open my $f2, '<', 'file2.txt';
<$f2>;

while ( <$f1> ) {
    my @f1 = split;
    my @f2;
    while ( 1 ) {
        @f2 = split ' ', <$f2>;
        last if join('_', @f2[0,1]) eq $f1[2];
    }
    printf $format, @f2, $f1[3];
}

sub create_format {
    my ($head) = @_;
    my ($format, $pos);
    while ( $head =~ /\b\S/g ) {
        $format .= sprintf("%%-%ds", $-[0] - $pos) if defined $pos;
        $pos = $-[0];
    }
    $format . "%s\n";
}
output
Grp Len ident  data
A   20  fz2525 3k3bj52
A   102 fz5367 3k32rf2
A   352 fz4678 3w3bj52
B   60  fz5732 3k3qwrg
B   42  fz3563 3kerj52
C   89  fz8744 3kftj55
C   445 fz2985 fy5763b
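If the two files were not in the same order, the same join could be done with a lookup table keyed on the Grp_Len pair. A sketch in Python (my addition, not part of the answer above; file names as in the Perl version):
ident = {}
with open("file2.txt") as f2:
    next(f2)                                  # skip the header
    for line in f2:
        grp, length, new_id = line.split()
        ident[grp + "_" + length] = new_id    # key matches file1's ident column

with open("file1.txt") as f1:
    print(next(f1), end="")                   # copy the header through
    for line in f1:
        grp, length, key, data = line.split()
        print(grp, length, ident[key], data)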

Getting fortran runtime error: end of file

I have recently learned how to work with basic files in Fortran
and I assumed it was as simple as:
open(unit=10,file="data.dat")
read(10,*) some_variable, somevar2
close(10)
So I can't understand why this function I wrote is not working. It compiles fine, but when I run it, it prints:
Fortran runtime error: End of file
Code:
Function Load_Names()
    character(len=30) :: Staff_Name(65)
    integer :: i = 1

    open(unit=10, file="Staff_Names.txt")
    do while(i < 65)
        read(10,*) Staff_Name(i)
        print*, Staff_Name(i)
        i = i + 1
    end do
    close(10)
end Function Load_Names
I am using Fortran 2008 with gfortran.
A common reason for the error you report is that the program doesn't find the file it is trying to open. Sometimes your assumptions about the directory in which the program looks for files at run-time will be wrong.
Try:
using the err= option in the open statement to write code to deal gracefully with a missing file; without this the program crashes, as you have observed;
or
using the inquire statement to figure out whether the file exists where your program is looking for it.
You can check whether the end of the file has been reached. It is done with the IOSTAT option of the read statement.
Try:
Function Load_Names()
    character(len=30) :: Staff_Name(65)
    integer :: i = 1
    integer :: iostat

    open(unit=10, file="Staff_Names.txt")
    do while(i < 65)
        read(10,*, IOSTAT=iostat) Staff_Name(i)
        if( iostat < 0 )then
            write(6,'(A)') 'Warning: file contains fewer than 65 entries'
            exit
        else if( iostat > 0 )then
            write(6,'(A)') 'Error: error reading file'
            stop
        end if
        print*, Staff_Name(i)
        i = i + 1
    end do
    close(10)
end Function Load_Names
Using Fortran 2003 standard, one can do the following to check if the end of file is reached:
use :: iso_fortran_env
character(len=1024) :: line
integer :: u1,stat
open (newunit=u1,action='read',file='input.dat',status='old')
ef: do
    read(u1,'(A)',iostat=stat) line
    if (stat == iostat_end) exit ef ! end of file
    ...
end do ef
close(u1)
Thanks for all your help, I did fix the code:
Function Load_Names(Staff_Name) ! Loads staff names
    character(len=30) :: Staff_Name(65)
    integer :: i = 1

    open(unit=10, file="Staff_Names.txt", status='old', action='read') ! opens file for reading
    do while(i < 66) ! sets Staff_Name() equal to the file, one string at a time
        read(10,*,end=100) Staff_Name(i)
        i = i + 1
    end do
100 close(10) ! closes file
    return ! returns value
end Function Load_Names
I needed to change read(10,*) to read(10,*,END=100) so that it knew what to do when it came to the end of the file, as it was in a loop, I assume.
Then your problem was that your file was a row vector, and it was likely giving you this error immediately after reading the first element, as @M.S.B. was suggesting.
If you have a file with a NxM matrix and you read it in this way (F77):
DO i=1,N
    DO j=1,M
        READ(UNIT,*) Matrix(i,j)
    ENDDO
ENDDO
it will load the first column of your file into the first row of your matrix and will give you an error as soon as it reaches the end of the file's first column, because the loop forces it to read further lines and there are no more lines (if N<M, when j=N+1 for example). To read the different columns you should use an implied do loop, which is why your solution worked:
DO i=1,N
    READ(UNIT,*) (Matrix(i,j), j=1,M)
ENDDO
I am using GNU Fortran 5.4.0 on Ubuntu 16.04. Check that the file is really the one you are looking for, because files with the same name are easily confused, and one of them may be blank. Also check the file path, i.e. whether the file is in the working directory.

Parallel Reading/Writing File in C

The problem is to read a file of about 20 GB simultaneously with n processes. The file contains one string on each line, and the string lengths may or may not be the same. A string can be at most 10 bytes long.
I have a cluster of 16 nodes. Each node is a uni-processor with 6 GB RAM. I am using MPI to write the parallel code.
What is an efficient way to partition this big file so that all resources can be utilized?
Note: The constraint on the partitions is that each process must read the file as a chunk of a fixed number of lines.
Assume the file contains 1600 lines (e.g. 1600 strings). Then the first process should read lines 1 to 100, the second process lines 101 to 200, and so on.
I think that one can't read a file with more than one process at a time, because we have only one file handle, which points to only one string. So how can other processes read in parallel from different chunks?
So as you're discovering, text file formats are poor for dealing with large amounts of data; not only are they larger than binary formats, but you run into formatting problems like here (searching for newlines), and everything is much slower (data must be converted into strings). There can easily be a 10x difference in IO speeds between text-based formats and binary formats for numerical data. But we'll assume for now you're stuck with the text file format.
Presumably, you're doing this partitioning for speed. But unless you have a parallel filesystem -- that is, multiple servers serving from multiple disks, and a FS that can keep those coordinated -- it's unlikely you're going to get a significant speedup from having multiple MPI tasks reading from the same file, as ultimately these requests are all going to get serialized anyway at the server/controller/disk level.
Further, reading in large blocks of data is going to be much faster than fseek()ing around and doing small reads looking for newlines.
So my suggestion would be to have one process (perhaps the last) read all the data in as few chunks as it can and send the relevant lines to each task (including, finally, itself). If you know how many lines the file has at the start, this is fairly simple; read in say 2 GB of data, search through memory for the end of the N/Pth line, and send that to task 0, send task 0 a "completed your data" message, and continue.
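For illustration, here is a toy sketch of that reader/distributor scheme using mpi4py (my addition; the discussion above assumes plain MPI in C). It reads everything at once for brevity, where a real version would work through the file in large chunks as described:
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    with open("big.txt") as fh:              # "big.txt" is a placeholder name
        lines = fh.readlines()
    n = len(lines) // size                   # lines per task (remainder ignored here)
    my_lines = lines[:n]
    for dest in range(1, size):              # ship each task its contiguous block
        comm.send(lines[dest * n : (dest + 1) * n], dest=dest)
else:
    my_lines = comm.recv(source=0)

print("rank", rank, "got", len(my_lines), "lines")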
You don't specify if there are any constraints on the partitions, so I'll assume there are none. I'll also assume that you want the partitions to be as close to equal in size as possible.
The naïve approach would be to split the file into chunks of size 20GB/n. The starting position of chunk i would be i*20GB/n for i=0..n-1.
The problem with that is, of course, that there's no guarantee that chunk boundaries would fall between the lines of the input file. In general, they won't.
Fortunately, there's an easy way to correct for this. Having established the boundaries as above, shift them slightly so that each of them (except i=0) is placed after the following newline.
That'll involve reading 15 small fragments of the file, but will result in a very even partition.
In fact, the correction can be done by each node individually, but it's probably not worth complicating the explanation with that.
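As a small sketch of that correction (my own illustration, in Python): compute equal byte offsets, then move each interior boundary just past the next newline:
import os

def partition(path, n):
    size = os.path.getsize(path)
    bounds = [i * size // n for i in range(n)] + [size]
    with open(path, "rb") as fh:
        for i in range(1, n):                # boundary 0 stays at the file start
            fh.seek(bounds[i])
            fh.readline()                    # consume the partial line
            bounds[i] = fh.tell()            # boundary now sits after a newline
    return [(bounds[i], bounds[i + 1]) for i in range(n)]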
I think it would be better to write a piece of code that gets the line lengths and distributes lines to processes. That distributing function would work not with the strings themselves but only with their lengths.
Finding an algorithm for even distribution of sources of fixed size is not a problem.
After that, the distributing function will tell the other processes which pieces they have to work on. Process 0 (the distributor) reads a line. It already knows that line no. 1 should be worked on by process 1. ... P.0 reads line no. N and knows which process has to work with it.
Oh! We needn't optimize the distribution from the start. Simply have the distributor process read a new line from the input and give it to a free process. That's all.
So you have two solutions: a heavily optimized one and an easy one.
We could achieve even more optimization if the distributor process re-optimizes the as yet unread strings from time to time.
Here is a Python function that uses MPI, via the pypar extension, to count the number of lines in a big file, splitting the duty up amongst a number of hosts.
def getFileLineCount( file1 ):
    """
    uses pypar and mpi to speed up counting lines
    parameters:
        file1 - the file name to count lines
    returns:
        (line count)
    """
    import pypar, mmap, os, logging
    p1 = open( file1, "r" )
    f1 = mmap.mmap( p1.fileno(), 0, None, mmap.ACCESS_READ )
    #work out file size
    fSize = os.stat( file1 ).st_size
    #divide up to farm out line counting
    chunk = ( fSize / pypar.size() ) + 1
    lines = 0
    #set start and end locations
    seekStart = chunk * ( pypar.rank() )
    seekEnd = chunk * ( pypar.rank() + 1 )
    if seekEnd > fSize:
        seekEnd = fSize
    #find start of next line after chunk
    if pypar.rank() > 0:
        f1.seek( seekStart )
        l1 = f1.readline()
        seekStart = f1.tell()
    #tell previous rank my seek start to make their seek end
    if pypar.rank() > 0:
        # logging.info( 'Sending to %d, seek start %d' % ( pypar.rank() - 1, seekStart ) )
        pypar.send( seekStart, pypar.rank() - 1 )
    if pypar.rank() < pypar.size() - 1:
        seekEnd = pypar.receive( pypar.rank() + 1 )
        # logging.info( 'Receiving from %d, seek end %d' % ( pypar.rank() + 1, seekEnd ) )
    f1.seek( seekStart )
    logging.info( 'Calculating line lengths and positions from file byte %d to %d' % ( seekStart, seekEnd ) )
    l1 = f1.readline()
    prevLine = l1
    while len( l1 ) > 0:
        lines += 1
        l1 = f1.readline()
        if f1.tell() > seekEnd or len( l1 ) == 0:
            break
        prevLine = l1
    #while
    f1.close()
    p1.close()
    if pypar.rank() == 0:
        logging.info( 'Receiving line info' )
        for p in range( 1, pypar.size() ):
            lines += pypar.receive( p )
    else:
        logging.info( 'Sending my line info' )
        pypar.send( lines, 0 )
    lines = pypar.broadcast( lines )
    return ( lines )
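A note on usage (my addition): pypar programs are launched under MPI, so you would run something like mpirun -np 16 python countlines.py, with the script ending in a call such as:
print( getFileLineCount( "big.txt" ) )    # "big.txt" is a placeholder file name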
