READ statement doesn't read last number from file - matrix

Suppose you have a subroutine readDIM that reads the dimensions (rows, columns) of a matrix from a text file. (To keep things simple, let it be an INTEGER.) readDIM works using tokens, and we can assume it works correctly.
For example, a text file containing:
1 2 3 4
1 2 20 5
3 0 333 3
returns nrow = 3, ncol = 4.
Since readDIM has given the true dimensions of the matrix, I want to allocate space for:
REAL, DIMENSION (:,:), ALLOCATABLE :: vMatrix
and then read the matrix from the text file and store it in the 2D array. So I've written the following:
SUBROUTINE buildVMatrix
  OPEN(UNIT=1, FILE = filename, STATUS ='OLD',IOSTAT=ios);
  ALLOCATE(vMatrix(nrow,ncol));
  WRITE(*,*) "Register matrix from file:", filename;
  WRITE(*,*) "-------------------------------------------------------";
  DO i = 1, UBOUND(vMatrix,1)
    READ(1,*, IOSTAT = ios) (vMatrix(i,j),j=1,UBOUND(vMatrix,2));
    !IF(ios /= 0 ) EXIT
  END DO
  CLOSE(1)
END SUBROUTINE
When I print vMatrix the output is:
matrix.txt      buildVMatrix output (once printed)
1 2 3 4         1 2 3 4
1 2 20 5        1 2 20 5
3 0 333 3       3 0 333 0
It doesn't read the last number. I know it's caused by the DO loop inside buildVMatrix, but I can't explain why, and I have no idea how to fix it by writing the code differently.

It's because the last line of your text file has no line ending. Try typing a return after the last number.
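If editing the file by hand is not practical, the same fix can be scripted. The following is only a minimal sketch in Python, assuming the missing final newline really is the cause; the helper name ensure_trailing_newline is hypothetical and not part of the original program.

```python
import os


def ensure_trailing_newline(path):
    """Append a newline to the file at `path` if its last byte is not one.

    Returns True if the file was modified, False otherwise.
    (Hypothetical helper, written for illustration only.)
    """
    if os.path.getsize(path) == 0:
        return False  # empty file: there is no last line to terminate
    with open(path, "rb+") as f:
        f.seek(-1, os.SEEK_END)      # jump to the last byte
        if f.read(1) == b"\n":
            return False             # the last line is already terminated
        f.seek(0, os.SEEK_END)       # reposition explicitly before writing
        f.write(b"\n")
        return True
```

Calling ensure_trailing_newline("matrix.txt") once before running the Fortran program leaves an already well-formed file untouched.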

Related

How to extract vectors from a given condition matrix in Octave

I'm trying to extract a matrix with two columns. The first column is the data that I want to group into a vector, while the second column is information about the group.
A =
1 1
2 1
7 2
9 2
7 3
10 3
13 3
1 4
5 4
17 4
1 5
6 5
The results I'm looking for are:
A1 =
1
2
A2 =
7
9
A3 =
7
10
13
A4=
1
5
17
A5 =
1
6
I tried using the eval function, but it didn't give the results I wanted.
Assuming that you don't actually need individually named, separate variables, the following will put the values into separate cells of a cell array. Each cell can be an arbitrary size and can then be retrieved using cell index syntax. It makes use of logical indexing, so that each iteration of the for loop assigns to that cell in B just the values from the first column of A that have the correct number in the second column of A.
num_cells = max (A(:,2));
B = cell (num_cells,1);
for idx = 1:num_cells
  B{idx} = A((A(:,2)==idx),1);
end
B =
{
[1,1] =
1
2
[2,1] =
7
9
[3,1] =
7
10
13
[4,1] =
1
5
17
[5,1] =
1
6
}
Cell arrays are accessed a bit differently than normal numeric arrays. Array indexing (with ()) will return another cell, e.g.:
>> B(1)
ans =
{
[1,1] =
1
2
}
To get the contents of the cell so that you can work with them like any other variable, index them using {}.
>> B{1}
ans =
1
2
How it works:
Use max(A(:,2)) to find out how many array elements are going to be needed. A(:,2) uses subscript notation to indicate every value of A in column 2.
Create an empty cell array B with the right number of cells to contain the separated parts of A. This isn't strictly necessary, but with large amounts of data, things can slow down a lot if you keep adding on to the end of an array. Pre-allocating is usually better.
For each iteration of the for loop, it determines which elements in the 2nd column of A have the value matching the value of idx. This returns a logical array. For example, for the third time through the for loop, idx = 3, and:
>> A_index3 = A(:,2)==3
A_index3 =
0
0
0
0
1
1
1
0
0
0
0
0
That is a logical array of true/false values indicating which elements equal 3. You are allowed to mix logical and subscript indexing, so we can use it to retrieve just those values from the first column:
>> A(A_index3, 1)
ans =
7
10
13
We get the same result if we do it in a single line, without the A_index3 intermediate placeholder:
>> A(A(:,2)==3, 1)
ans =
7
10
13
Putting it in a for loop where 3 is replaced by the loop variable idx, and we assign the answer to the idx location in B, we get all of the values separated into different cells.
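For readers who want to cross-check the logic outside Octave, here is a rough Python sketch of the same steps: pre-allocate, build a logical mask per group, then select. The function name group_by_label is made up for this illustration, and plain lists stand in for the matrix and the cell array.

```python
def group_by_label(rows):
    """Group first-column values by the integer label in the second column."""
    num_groups = max(label for _, label in rows)
    groups = [[] for _ in range(num_groups)]        # pre-allocate, like cell(num_cells, 1)
    for idx in range(1, num_groups + 1):
        mask = [label == idx for _, label in rows]  # logical array, like A(:,2) == idx
        groups[idx - 1] = [value for (value, _), keep in zip(rows, mask) if keep]
    return groups


A = [(1, 1), (2, 1), (7, 2), (9, 2), (7, 3), (10, 3), (13, 3),
     (1, 4), (5, 4), (17, 4), (1, 5), (6, 5)]
B = group_by_label(A)
print(B[2])  # third group: [7, 10, 13]
```

Note that Python lists are indexed from 0, so group 3 lives at B[2].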

Count characters starting at zero?

I need to write a for-each loop that lists each character in mystery_string with its index. For example, with mystery_string = "Olivia", the output would be:
0 O
1 l
2 i
3 v
4 i
5 a
I cannot use the range function on this problem.
This is my code, but the number starts at 1. What am I doing wrong?
mystery_string = "CS1301"
count = 0
for current_letter in mystery_string:
    count = count + 1
    print (count , current_letter)
I have been getting this as output:
1 C
2 S
3 1
4 3
5 0
6 1
but it needs to start at zero.
Just increment the count (count += 1) after the print call inside the for loop, not before it.
Note: Please also format your code as a code block, using a single backtick (`) for inline code or three backticks (```) for multi-line code.
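A minimal sketch of that reordering, wrapped in a small function so it can be checked; the name index_characters is only for illustration and does not appear in the question.

```python
def index_characters(text):
    """Return (index, character) pairs, counting from zero, without range()."""
    pairs = []
    count = 0
    for current_letter in text:
        pairs.append((count, current_letter))  # use the counter first...
        count += 1                             # ...then advance it
    return pairs


for count, letter in index_characters("CS1301"):
    print(count, letter)
```

Because the counter is read before it is incremented, the first character is paired with 0.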
The pythonic way is to use enumerate() in such a case. This way you'll get both the index and the content of your string.
mystery_string = "CS1301"
for count, current_letter in enumerate(mystery_string):
    print (count , current_letter)

Subsetting Data with GREP

I have a very large text file (16GB) that I want to subset as fast as possible.
Here is a sample of the data involved
0 M 4 0
0 0 Q 0 10047345 3080290,4098689 50504886,4217515 9848058,1084315 50534229,4217515 50591618,4217515 26242582,2597528 34623075,3279130 68893581,5149883 50628761,4217517 32262001,3142702 35443881,3339757
0 108 C 0 50628761
0 1080 C 0 50628761
1 M 7 0
1 0 Q 0 17143989
2 M 15 1
2 0 Q 0 17143989 4219157,1841361,853923,1720163,1912374,1755325,4454730 65548702,4975721 197782,39086 54375043,4396765 31589696,3091097 6876504,851594 3374640,455375 13274885,1354902 31585771,3091016 61234218,4723345 31583582,3091014
2 27 C 0 31589696
The first number on every line is a sessionID, and any line with an 'M' denotes the start of a session (data is grouped by session). The number following an M is a Day and the second number is a userID; a user can have multiple sessions.
I want to extract all lines related to a specific user which for each session include all of the lines up until the next 'M' line is encountered (can be any number of lines). As a second task I also want to extract all session lines related to a specific day.
For example with the above data, to extract the records for userid '0' the output would be:
0 M 4 0
0 0 Q 0 10047345 3080290,4098689 50504886,4217515 9848058,1084315 50534229,4217515 50591618,4217515 26242582,2597528 34623075,3279130 68893581,5149883 50628761,4217517 32262001,3142702 35443881,3339757
0 108 C 0 50628761
0 1080 C 0 50628761
1 M 7 0
1 0 Q 0 17143989
To extract the records for day 7 the output would be:
1 M 7 0
1 0 Q 0 17143989
I believe there is a much more elegant and simple solution to what I have achieved so far and it would be great to get some feedback and suggestions. Thank you.
What I have tried
I tried to use pcregrep -M to apply this pattern directly (matching data between two M's) but struggled to get this working across the line breaks. I still suspect this may be the fastest option, so any guidance on whether this is possible would be great.
The next part is quite scattered and it is not necessary to read on if you already have an idea for a better solution!
Failing the above, I split the problem into two parts:
Part 1: Isolating all 'M' lines to obtain a list of sessions belonging to that user/day
grep method is fast (then need to figure out how to use this data)
time grep -c "M\t.*\t$user_id" trainSample.txt >> sessions.txt
awk method to create an array is slow
time myarr=$(awk '/M\t.*\t$user_id/ {print $1}' trainSample.txt)
Part 2: Extracting all lines belonging to a session on the list created in part 1
Continuing from the awk method, I ran grep for each but this is WAY too slow (days to complete 16GB)
for i in "${!myarr[@]}";
do
  grep "^${myarr[$i]}\t" trainSample.txt >> sessions.txt
  echo -ne "Session $i\r"
done
Instead of running grep once per session ID as above using them all in the one grep command is MUCH faster (I ran it with 8 sessionIDs in a [1|2|3|..|8] format and it took the same time as each did separately i.e. 8X faster). However I need then to figure out how to do this dynamically
Update
I have actually established a working solution which only takes seconds to complete, but it is somewhat messy and inflexible bash code which I have yet to extend to the second (isolating by days) case.
I want to extract all lines related to a specific user which for each session include all of the lines up until the next 'M' line is encountered (can be any number of lines).
$ awk '$2=="M"{p=$4==0}p' file
0 M 4 0
0 0 Q 0 10047345 3080290,4098689 50504886,4217515 9848058,1084315 50534229,4217515 50591618,4217515 26242582,2597528 34623075,3279130 68893581,5149883 50628761,4217517 32262001,3142702 35443881,3339757
0 108 C 0 50628761
0 1080 C 0 50628761
1 M 7 0
1 0 Q 0 17143989
As a second task I also want to extract all session lines related to a specific day.
$ awk '$2=="M"{p=$3==7}p' file
1 M 7 0
1 0 Q 0 17143989
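Both awk one-liners use the same technique: on every 'M' line a flag p is set to whether the session matches, and each line is printed while p is true. As a rough illustration of that logic only, here is a Python sketch; the function name filter_sessions, the zero-based field constants, and the shortened sample lines are assumptions made for this example. Unlike awk, it compares fields as strings rather than numbers.

```python
def filter_sessions(lines, field_index, wanted):
    """Yield every line of the sessions whose 'M' header has `wanted` at `field_index`.

    `field_index` is zero-based, so awk's $3 (day) is 2 and $4 (userID) is 3.
    """
    keep = False
    for line in lines:
        fields = line.split()
        if len(fields) > 1 and fields[1] == "M":
            # New session header: decide once whether to keep the whole session.
            keep = len(fields) > field_index and fields[field_index] == wanted
        if keep:
            yield line


DAY_FIELD, USER_FIELD = 2, 3

# Abbreviated version of the sample data, for illustration only.
sample = [
    "0 M 4 0",
    "0 0 Q 0 10047345",
    "0 108 C 0 50628761",
    "1 M 7 0",
    "1 0 Q 0 17143989",
    "2 M 15 1",
    "2 0 Q 0 17143989",
    "2 27 C 0 31589696",
]

for line in filter_sessions(sample, DAY_FIELD, "7"):
    print(line)
```

Since the function is a generator, an open file object can be passed in place of the list, so the file is streamed line by line rather than loaded into memory.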

Variable format

I wrote a program to calculate a square finite difference matrix, where you can enter the number of rows (equals the number of columns) -> this is stored in the variable matrix. The program works fine:
program fin_diff_matrix
implicit none
integer, dimension(:,:), allocatable :: A
integer :: matrix,i,j
print *,'Enter elements:'
read *, matrix
allocate(A(matrix,matrix))
A = 0
A(1,1) = 2
A(1,2) = -1
A(matrix,matrix) = 2
A(matrix,matrix-1) = -1
do j=2,matrix-1
A(j,j-1) = -1
A(j,j) = 2
A(j,j+1) = -1
end do
print *, 'Matrix A: '
write(*,1) A
1 format(6i10)
end program fin_diff_matrix
I want the output to be laid out as a matrix, e.g. if the user enters 6 rows the output should look like:
2 -1 0 0 0 0
-1 2 -1 0 0 0
0 -1 2 -1 0 0
0 0 -1 2 -1 0
0 0 0 -1 2 -1
0 0 0 0 -1 2
The output of the format should also be variable, e.g. if the user enters 10, the output should also be formatted in 10 columns. Research on the Internet gave the following solution for the format statement with angle brackets:
1 format(<matrix>i10)
If I compile with gfortran in Linux I always get the following error in the terminal:
fin_diff_matrix.f95:37.12:
1 format(<matrix>i10)
1
Error: Unexpected element '<' in format string at (1)
fin_diff_matrix.f95:35.11:
write(*,1) A
1
Error: FORMAT label 1 at (1) not defined
Why doesn't that work and what is my mistake?
The syntax you are trying to use is non-standard, it works only in some compilers and I discourage using it.
Also, forget the FORMAT() statements for good, they are obsolete.
You can put your own number into the format string by constructing it yourself from several parts:
character(80) :: form
form = '(          (i10,1x))'   ! positions 2:11 are ten blanks reserved for the repeat count
write(form(2:11),'(i10)') matrix
write(*,form) A
You can also write your matrix line by line in a loop. Then you can use either an arbitrarily large repeat count:
do i = 1, matrix
  write(*,'(999(i10,1x))') A(:,i)
end do
or, in Fortran 2008, the unlimited repeat count *:
do i = 1, matrix
  write(*,'(*(i10,1x))') A(:,i)
end do
Note that A(:,i) is column i, so each output line is a column of A; use A(i,:) if you want the rows. For this symmetric matrix both give the same output.

How to iterate through a one-column matrix in MATLAB

I'm new to MATLAB and I'm trying to figure out how I would iterate over a matrix with only one column to count the occurrences of some number, n. For example, I would like to count how many times '1' appears in the matrix:
1
4
1
88
6
22
1
How could I make a loop that returns '3'? How would I create a loop that counts how many times some loop counter occurs (i.e. start at 0 and increment by one each loop to count how many times the counter occurs in the matrix)?
Thanks
Just use sum
>> a=[1 4 1 88 6 22 1]';
>> n=1;
>> sum(a==n)
ans =
3
Or count the elements selected by logical indexing:
a = [1 4 1 88 6 22 1];
n = 1;
count_n = numel(a(a==n));
You wouldn't need to run a loop. You could just do it like this:
a = [ 1 4 1 88 6 22 1];
n = 1;
length(find(a(:)==n))
