How to transform one column into many columns in a matrix using Fortran 90

I have a single column of data (im = 160648 rows, jm = 1 column). I want to transform it into a matrix with im = 344 rows and jm = 467 columns.
My program code is:
program matrix
parameter (im=160648, jm=1)
dimension h(im,jm)
integer::h
open (1,file="Hasil.txt", status='old')
open (2,file="HasilNN.txt", status='unknown')
do i=1,jm
read(1,*)(h(i,j)),j=1,jm)
end do
do i=1,im
write(2,33)(h(i,j),j=1,jm)
end do
33 format(1x, 344f10.6)
end program matrix
The error appears at the line read(1,*)(h(i,j)),j=1,jm). The data type is floating point.

Your read loop is:
do i=1,jm
read(1,*)(h(i,j)),j=1,jm)
end do
Shouldn't do i=1,jm be do i=1,im ?
This would imply there are "im" records (lines) in the formatted text file Hasil.txt, which your question suggests.
read(1,*)(h(i,j)),j=1,jm) implies each record (line of text) has "jm" values, which is 1 value per line. Is this what the file looks like ? (An unknown number of blank lines will be skipped with this read (lu,*) ... statement.)
You appear to be wanting to write this information to another file; HasilNN.txt using 33 format (1x, 344f10.6) which suggests 3441 characters per line, although your write statement will write only 1 value per line (as jm=1). This would be a very long line for a text file and probably difficult to manage outside the program. If you did wish to do this, you could achieve this with an implied do loop, such as:
write(2,33) ((h(i,j),j=1,jm),i=1,im)
A few comments:
using jm = 1 implies each row has only one value, which could be equivalently represented as a 1d vector "dimension h(im)", negating the need for j
File unit numbers 1 and 2 are typically reserved unit numbers for screen/keyboard. You would be better using units 11 and 12.
When devising this code, you need to address the record structure in the 2 files, as a simple vector could be used. You can control the line length with the format. A format of (1x,8f10.6) would create a record of 81 characters, which would be much easier to manage.
Format descriptor f10.6 also limits the range of values you can manage in the files. Values >= 1000 or <= -100 will overflow this format, while values smaller than 1.e-6 will be zero.
As #francescalus has noted, you have declared "h" as integer, but use a real format descriptor. This will produce an "Error : format-data mismatch" and has to be changed to what is expected in the file.
You should consider what you wish to achieve and adjust the code.
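If doing the reshape outside Fortran is acceptable, the transformation can be sketched in Python with NumPy (note 344 * 467 = 160648; whether the data should fill row-first or column-first is an assumption you would need to check against your file):

```python
import numpy as np

im, jm = 344, 467  # target shape; 344 * 467 == 160648

# Stand-in for reading Hasil.txt (one float per line), e.g.:
#   values = np.loadtxt("Hasil.txt")
values = np.arange(im * jm, dtype=float)

# reshape fills row by row with order="C"; use order="F" to fill
# column by column, depending on how the original data is laid out.
matrix = values.reshape(im, jm)

# Write jm values per line, analogous to a (1x,467f10.6) edit descriptor:
# np.savetxt("HasilNN.txt", matrix, fmt="%10.6f")
```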

Most efficient data structure for a nested loop?

I am iterating through each line in the first file (3000 lines total) to find its corresponding label in the second file, line by line (the second file is ~2 million lines; 47 MB).
Currently, I have a nested loop structure with the outer loop grabbing a line (converting into a list) and the inner loop iterating through the 2 million lines (line by line):
for row in read_FIMO: #read_FIMO is first file; 3000 lines long
    with open("chr8labels.txt") as label: #2 million lines long
        for line in csv.reader(label, delimiter="\t"): #list
            for i in range(int(row[3]),int(row[4])):
                if i in range((int(line[1])-50),int(line[1])): #compare the ranges in each list
                    line1=str(line)
                    row1=str(row)
                    outF.append(row1+"\t"+line1)
- I realize this is horribly inefficient, but I need to find all instances where the first range overlaps with the ranges of the other file
- Is reading in each file line by line the fastest way? If not, what would the best data structure be for the entire file?
- Should the lines be in a different data structure other than lists?
THANK YOU if you have any feedback!
aside: the purpose is to label a range of numbers if the numbers are found in the ranges of the other file(long story; maybe not relevant?)
Your goal seems to be to find whether one range (row[3] to row[4]) overlaps with another (line[1]-50 to line[1]). For this, it is sufficient to check that either line[1]-50 or row[3] lies inside the other range. This eliminates the third nested loop.
Also, take the 2-million-line file and sort it once, and then use the sorted list inside your 3000-line loop to do a binary search, cutting an O(nm) algorithm down to an O(n log m) one.
(My Python is far from perfect, but this should get you going in the right direction.)
with open("...") as label:
    reader = csv.reader(label, delimiter="\t")
    lines = list(reader)
lines.sort(key=lambda line: int(line[1]))
for row in read_FIMO:
    # Find the nearest, lesser value in lines smaller than row[3]
    line = binary_search(lines, int(row[3]))
    # If what you're after is multiple matches, then
    # instead of getting a single line, get the smallest and
    # largest indexes whose ranges overlap (which you can do with a
    # binary search for row[3] and row[4]+50)
    # e.g.,
    # smallestIndex = binary_search(lines, int(row[3]))
    # largestIndex = binary_search(lines, int(row[4])+50)
    # for index in range(smallestIndex, largestIndex+1):
    lower1 = int(line[1]) - 50
    lower2 = int(row[3])
    upper1 = int(line[1])
    upper2 = int(row[4])
    if (lower2 <= lower1 <= upper2) or (lower1 <= lower2 <= upper1):
        line1 = str(line)
        row1 = str(row)
        outF.append(row1 + "\t" + line1)
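Note that binary_search is not a Python built-in; here is a minimal sketch of such a helper using the standard bisect module (the key int(line[1]) and the return convention are assumptions chosen to match the pseudocode):

```python
import bisect

def binary_search(lines, value):
    """Return the entry with the largest key int(line[1]) <= value
    (the first entry if every key is larger).
    Assumes lines is already sorted by int(line[1])."""
    keys = [int(line[1]) for line in lines]  # hoist this out of the loop in real use
    idx = bisect.bisect_right(keys, value)
    return lines[max(idx - 1, 0)]
```

For the index-based variant in the comments (smallestIndex/largestIndex), return max(idx - 1, 0) instead of the entry.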
I think an efficient way would be:
1. Go through the destination file first and record all the labels in a dictionary hash: [LabelName] -> [Line number].
2. Go through each line of the source, look up the label in the dictionary, and print it (or print something else if not found).
Notes / Tips
I think the above would be O(n)
You could also go through the source file first and record it, then go through the destination file. This would create a smaller dictionary (and use less memory), but the output may not be in the order you would like. (Smaller data sets often have better cache hit ratios, which is why I bring this up.)
Also, only because you mentioned most efficient: if you want to get crazy, I would try skipping the line-by-line processing and just go through the input as if it were one big string. Search for a newline char + whitespace + a label name. Whether this tip helps depends on what the input data looks like; if csv.reader already parses by line, it would not be good. You may also need a lower-level language with more control to do this. (Caution: 90% chance this tip just leads you down a rabbit hole that goes nowhere.)
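For illustration, a minimal sketch in Python of that two-pass dictionary approach (the file contents are stand-ins; how a label is extracted from each line is an assumption):

```python
# Pass 1: record every label and its line number in a dict (O(m)).
destination = ["geneA", "geneB", "geneC"]  # stand-in for the 2-million-line file
label_to_lineno = {}
for lineno, label in enumerate(destination):
    label_to_lineno.setdefault(label, lineno)  # keep the first occurrence

# Pass 2: one O(1) lookup per source line (O(n) overall).
source = ["geneB", "geneX"]  # stand-in for the 3000-line file
results = []
for label in source:
    if label in label_to_lineno:
        results.append((label, label_to_lineno[label]))
    else:
        results.append((label, None))  # "print something else if not found"
```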

Extracting data from text file in AMPL without adding indexes

I'm new to AMPL and I have data in a text file in matrix form from which I need to use certain values. However, I don't know how to use the matrices directly without having to manually add column and row indexes to them. Is there a way around this?
So the data I need to use looks something like this, with hundreds of rows and columns (and several more matrices like this), and I would like to use it as a parameter with index i for rows and j for columns.
t=1
0.0 40.95 40.36 38.14 44.87 29.7 26.85 28.61 29.73 39.15 41.49 32.37 33.13 59.63 38.72 42.34 40.59 33.77 44.69 38.14 33.45 47.27 38.93 56.43 44.74 35.38 58.27 31.57 55.76 35.83 51.01 59.29 39.11 30.91 58.24 52.83 42.65 32.25 41.13 41.88 46.94 30.72 46.69 55.5 45.15 42.28 47.86 54.6 42.25 48.57 32.83 37.52 58.18 46.27 43.98 33.43 39.41 34.0 57.23 32.98 33.4 47.8 40.36 53.84 51.66 47.76 30.95 50.34 ...
I'm not aware of an easy way to do this. The closest thing is probably the table format given in section 9.3 of the AMPL Book. This avoids needing to give indices for every term individually, but it still requires explicitly stating row and column indices.
AMPL doesn't seem to do a lot with position-based input formats, probably because it defaults to treating index sets as unordered so the concept of "first row" etc. isn't meaningful.
If you really wanted to do it within AMPL, you could probably put together a work-around along these lines:
declare a single-index param with length equal to the total size of your matrix (e.g. if your matrix is 10 x 100, this param has length 1000)
edit the beginning and end of your "matrix" data file to turn it into appropriate format for a single-index parameter indexed from 1 to n
then define your matrix something like this:
param m{i in 1..nrows, j in 1..ncols} := x[(i-1)*ncols + j];
(not tested, I won't promise that I have rows and columns the right way around there!)
But you're probably better off editing the input file into one of the standard AMPL matrix formats. AMPL isn't really designed for data wrangling - you can do it in a pinch but if you're doing this kind of thing repeatedly it may be less trouble to code it in a general-purpose language e.g. Python.
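That "edit the data file" step is easy to script outside AMPL; a sketch in Python that flattens a whitespace-separated matrix (row-major) into the single-index 1..n param format (the parameter name x matches the work-around above; check the result against the data syntax your AMPL version expects):

```python
def matrix_to_ampl_param(text, name="x"):
    """Flatten a whitespace-separated matrix (row-major) into an
    AMPL 'param name := index value ... ;' data fragment."""
    values = text.split()  # row breaks don't matter once flattened
    pairs = " ".join(f"{k} {v}" for k, v in enumerate(values, start=1))
    return f"param {name} := {pairs};"

# e.g. a 2 x 3 matrix:
fragment = matrix_to_ampl_param("0.0 40.95 40.36\n38.14 44.87 29.7")
```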

How to initialize a simple matrix in SAS?

I am new to SAS and have been using R most of the time. I am stuck with a simple and frustrating issue. All I want to do is to create a simple 3 X 3 matrix in SAS. But it throws an error. I need some help in understanding what's going on. The SAS documentation is not very helpful.
data matrixTest;
input Y $ X;
cards;
4 0
3 1
1 1
;
run;
/*Convert X to a categorical variable*/
data matrixTest;
set matrixTest;
if X = 0 then X = "0";
else X = "1";
run;
/*Get design matrix from the regression model*/
proc transreg data=matrixTest design;
model class(X/ zero=last);
output out=input_mcmc(drop=_: Int:);
run;
mX = {5 4 3, 4 0 4, 7 10 3};
And I get the following error when creating the matrix mX:
ERROR 180-322: Statement is not valid or it is used out of proper order.
The underlying error is that SAS is not a matrix language. SAS is more like a database language; the unit of operation is the dataset, analogous to a SQL table or a dataframe in R or Python.
SAS does have a matrix language built into the system, SAS/IML (interactive matrix language), but it's not part of base SAS and isn't really what you use in the context you're showing. The way you enter data as part of your program is how you did it in the first data step, with datalines.
Side note: You're also showing some R tendencies in the second data step; you cannot convert a variable's type that way. SAS has only 'numeric' and 'character', so you don't have 'categorical' data type anyway; just leave it as is.
Do not use the same data set name in the SET and DATA statements. This makes it hard to debug because you've destroyed your initial data set.
You cannot change types on the fly in SAS. If a variable is character, it stays character.
If a variable is numeric, you assign values without quotes; quotes are used for character variables.
Your attempt to create a categorical variable doesn't make sense given the fact that it's already 0/1. Make sure your test data is reflective of your actual situation.
I'm not familiar with PROC TRANSREG so I cannot comment on that portion but those are the issues you're facing now.
As someone else mentioned, SAS is not a matrix language; it processes data line by line instead, which means it can handle really, really large data sets because it doesn't have to load them into memory.
Your data set matrixTest is essentially ready to go. You don't need to convert it to a matrix or 'initialize' it.
If you want a data set with those values then create that as a data set:
data mx;
input var1-var3;
cards;
5 4 3
4 0 4
7 10 3
;
run;

How to copy only selected column of input file to output file in jcl sort

I am trying to copy data at position (50,10) of my input file to an output file,
but I am having problems.
My input file size is 100; the needed data is from the 50th position for next 10 bytes.
I have used the following options, but each of them causes an abend.
I have taken the output file as length 10 only, as I only need 10 bytes.
But the abend says:
SORTIN : RECFM=VB ; LRECL= 100; BLKSIZE= 1000
SORTIN : DSNAME=MNV.TESTS.DF.CPR810S1.EZ2OP
OUTREC RECORD LENGTH = 10
SORTOUT RECFM INCOMPATIBLE
SORTOUT : RECFM=FB ; LRECL= ; BLKSIZE=
I have used the below options:
OUTREC FIELDS(50,10)
SORT FIELDS(1,4,CH,A)
--------didn't work------------
SORT FIELDS=COPY
OUTREC FIELDS=(115,9,125,10)
--------didn't work------------
SORT FIELDS=COPY
BUILD=(50,10)
--------didn't work------------
INREC FIELDS=(50,10)
SORT FIELDS=(1,3,CH,A)
--------didn't work------------
I know it's pointless to mention that you rarely Accept or provide feedback, and are not that much of a voter either.
For some reason you cut them off, but all those messages you posted come with a WER prefix and a message number. If you consult your SyncSORT manual, you'll find all the messages documented.
Forget that for a moment. You have posted SORTOUT RECFM INCOMPATIBLE. Why go on about the record-length? The RECFM. The RECFM. You have included the text of the message which shows the RECFM of the SORTIN, and also the one which shows the RECFM of SORTOUT. They are VB and FB respectively. If you look at the message in the manual, you'll discover that you haven't done anything explicit to make them different.
You have two choices. VTOF or CONVERT. You can use them on OUTREC (I believe) and OUTFIL (for sure).
OPTION COPY
OUTFIL VTOF,
BUILD=(50,10)
Why you'd want to try SORTing the file, I don't know, and you should be aware by now that just making up syntax does not work.
For SORT, by default, the output file has the same RECFM as the input. A variable-length record must always contain an RDW in positions 1,4, and the data itself starts at position 5.
If you need an output file of a different RECFM, then you must be explicit about it (with CONVERT, FTOV or VTOF).
When creating an F record there is no RDW, so your BUILD=(50,10) is the correct format. If you are four bytes out, remember that for a V record the data starts at position five, so you need to add four to all start positions which don't take account of the RDW (like a COBOL record-layout).
When creating a V record from an F record, the input has no RDW; the FTOV/CONVERT will create it.
With V input and V output, always specify (1,4 at the start of your BUILD statement.
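If it helps to see the byte manipulation concretely, here is a rough Python sketch of what OUTFIL VTOF,BUILD=(50,10) does to each record (purely an illustration of SORT's 1-based positions, not of how SORT works internally):

```python
def vtof_build(records, start, length):
    """Copy only the columns (start, length) of each record,
    1-based as in SORT's BUILD=(start,length)."""
    return [rec[start - 1:start - 1 + length] for rec in records]

# A 100-byte record whose bytes 50-59 hold the payload:
rec = "x" * 49 + "ABCDEFGHIJ" + "x" * 41
out = vtof_build([rec], 50, 10)
```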

Format statement with unknown columns

I am attempting to use fortran to write out a comma-delimited file for import into another commercial package. The issue is that I have an unknown number of data columns. My output needs to look like this:
a_string,a_float,a_different_float,float_array_elem1,float_array_elem2,...,float_array_elemn
which would result in something that might look like this:
L1080,546876.23,4325678.21,300.2,150.125,...,0.125
L1090,563245.1,2356345.21,27.1245,...,0.00983
I have three issues. One, I would prefer the elements to be tightly grouped (variable column width), two, I do not know how to define a variable number of array elements in the format statement, and three, the array elements can span a large range--maybe 12 orders of magnitude. The following code conceptually does what I want, but the variable 'n' and the lack of column-width definition throws an error (of course):
WRITE(50,900) linenames(ii),loc(ii,1:2),recon(ii,1:n)
900 FORMAT(A,',',F,',',F,n(',',F))
(I should note that n is fixed at run-time.) The write statement does what I want it to when I do WRITE(50,*), except that it's width-delimited.
I think this thread almost answered my question, but I got quite confused: SO. Right now I have a shell script with awk fixing the issue, but that solution is...inelegant. I could do some manipulation to make the output a string, and then just write it, but I would rather like to avoid that option if at all possible.
I'm doing this in Fortran 90 but I like to try to keep my code as backwards-compatible as possible.
The format closest to what you want is f0.3; this will give no spaces and a fixed number of decimal places. I think if you also want to lop off trailing zeros you'll need to do a good bit of work.
The 'n' repeat count in your format statement can be larger than the number of data values, so one (old-school) approach is to put a big number there, e.g. 100000. Modern Fortran does have some syntax to specify indefinite repeat; I'm sure someone will offer that up.
----edit
the unlimited repeat is, as you might guess, an asterisk, and is evidently brand new in Fortran 2008
In order to make sure that no space occurs between the entries in your line, you can write them separately into character variables and then print them out using the adjustl() function in Fortran:
program csv
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nn = 3
  real(dp), parameter :: floatarray(nn) = [ -1.0_dp, -2.0_dp, -3.0_dp ]
  integer :: ii
  character(30) :: buffer(nn+2), myformat

  ! Create format string with appropriate number of fields.
  write(myformat, "(A,I0,A)") "(A,", nn + 2, "(',',A))"

  ! You should execute the following lines in a loop for every line you want to output
  write(buffer(1), "(F20.2)") 1.0_dp ! a_float
  write(buffer(2), "(F20.2)") 2.0_dp ! a_different_float
  do ii = 1, nn
    write(buffer(2+ii), "(F20.3)") floatarray(ii)
  end do
  write(*, myformat) "a_string", (trim(adjustl(buffer(ii))), ii = 1, nn + 2)
end program csv
The demonstration above is only for one output line, but you can easily write a loop around the appropriate block to execute it for all your output lines. Also, you can choose different numerical format for the different entries, if you wish.
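For comparison, the same trim-and-join idea sketched in Python (the function name and the fixed decimal count are assumptions; the Fortran version above achieves the trimming with trim(adjustl(...))):

```python
def csv_line(name, floats, decimals=3):
    """Join a label and an arbitrary number of floats into one
    tightly packed comma-delimited line (like f0.3: no padding)."""
    return ",".join([name] + [f"{v:.{decimals}f}" for v in floats])

line = csv_line("L1080", [546876.23, 4325678.21, 300.2], decimals=2)
```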