Multiplication algorithm for n×m and m×p matrices in Scala

I am wondering why this matrix multiplication gives a different result in my Scala program than in Python. I am using the standard matrix multiplication algorithm, c[i][j] = Σ_k a[i][k] * b[k][j], for two matrices a (n x m) and b (m x p). The code I have written for this algorithm is (each matrix is a 2D array of doubles):
def dot(other: Matrix2D): Matrix2D = {
  if (this.shape(1) != other.shape(0)) {
    throw new IndexOutOfBoundsException("Matrices were not the right shape! [" + this.shape(1) + " != " + other.shape(0) + "]")
  }
  val n = this.shape(1) // number of columns; shape(0) returns the number of rows
  val a = matrix.clone()
  val b = other.matrix.clone()
  val c = Array.ofDim[Double](this.shape(0), other.shape(1))
  for (i <- 0 until c.length) {
    for (j <- 0 until c(0).length) {
      for (k <- 0 until n) {
        c(i)(j) += a(i)(k) * b(k)(j)
      }
    }
  }
  Matrix2D(c)
}
The input I put into both the Scala and Python code is:
a = [[1.0 1.0 1.0 1.0 0.0 0.0 0.0]
[1.0 1.0 0.0 1.0 0.0 0.0 0.0 ]
[1.0 1.0 1.0 1.0 1.0 1.0 1.0 ]
[1.0 0.0 0.0 0.0 1.0 1.0 1.0 ]
[1.0 0.0 0.0 0.0 1.0 0.0 1.0 ]
[1.0 0.0 0.0 0.0 0.0 0.0 0.0 ]]
b = [[0.0 0.0 0.0 ]
[0.0 -0.053430398509053074 0.021149859549078387 ]
[0.0 -0.010785871994186721 0.04942555653681449 ]
[0.0 0.04849323245519227 -0.0393881161667335 ]
[0.0 -0.03871752673999099 0.05228579488821056 ]
[0.0 0.07935206375269452 0.06511344235965408 ]
[0.0 -0.02462677123918247 1.723607966539059E-4 ]]
The output I receive from this function is:
[[0.0 -0.015723038048047533 0.031187299919159375]
[0.0 -0.0049371660538608045 -0.018238256617655116]
[0.0 2.84727725473527E-4 0.14875889796367792 ]
[0.0 0.01600776577352106 0.11757159804451854 ]
[0.0 -0.06334429797917346 0.05245815568486446 ]
[0.0 0.0 0.0 ]]
compared to Python's numpy.dot:
[[ 0. -0.01572304 0.0311873 ]
[ 0. -0.00493717 -0.01823826]
[ 0. -0.01572304 0.0311873 ]
[ 0. 0.08912777 0.07801112]
[ 0. 0.00977571 0.01289768]
[ 0. 0.08912777 0.07801112]]
I am wondering why this algorithm doesn't correctly fill the output matrix. I've been messing with the for loops and have not been able to figure out what's wrong.

Can you show your Python code?
I tried this in NumPy and get the same result as your Scala code:
import numpy as np
a = np.array([[1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
              [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
              [1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
              [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
              [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
b = np.array([[0.0, 0.0, 0.0],
              [0.0, -0.053430398509053074, 0.021149859549078387],
              [0.0, -0.010785871994186721, 0.04942555653681449],
              [0.0, 0.04849323245519227, -0.0393881161667335],
              [0.0, -0.03871752673999099, 0.05228579488821056],
              [0.0, 0.07935206375269452, 0.06511344235965408],
              [0.0, -0.02462677123918247, 1.723607966539059E-4]])
print(a.dot(b))
prints:
[[ 0. -0.01572304 0.0311873 ]
[ 0. -0.00493717 -0.01823826]
[ 0. 0.00028473 0.1487589 ]
[ 0. 0.01600777 0.1175716 ]
[ 0. -0.0633443 0.05245816]
[ 0. 0. 0. ]]
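For reference, here is a plain-Python triple loop (a sketch, independent of NumPy) that computes the same sums c[i][j] = Σ_k a[i][k] * b[k][j] as the Scala function; since it is the identical algorithm, it also agrees with NumPy on these inputs, which points to the original Python run being fed different data:
def dot(a, b):
    n, m, p = len(a), len(b), len(b[0])
    assert len(a[0]) == m, "inner dimensions must match"
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):          # rows of a
        for j in range(p):      # columns of b
            for k in range(m):  # shared inner dimension
                c[i][j] += a[i][k] * b[k][j]
    return c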


Epipolar Geometry, Pure Translation: Implementing Equation 9.6 from the book Multiple View Geometry

We want to calculate how each pixel in the image will move when we know the camera translation and the depth of each pixel.
The book Multiple View Geometry gives a solution in Chapter 9, Section 9.5:
height = 512
width = 512
f = 711.11127387
#Camera intrinsic parameters
K = np.array([[f, 0.0, width/2],
[0.0, f, height/2],
[0.0, 0.0, 1.0]])
Kinv= np.array([[1, 0, -width/2],
[0, 1, -height/2],
[0, 0, f ]])
Kinv = np.linalg.inv(K)
# Translation matrix on the Z axis; changing the distance will change the height
T = np.array([[ 0.0],
[0.0],
[-0.1]])
plt.figure(figsize=(10, 10))
ax = plt.subplot(1, 1, 1)
plt.imshow(old_seg)
for row, col in [(150, 100), (450, 350)]:
    ppp = np.array([[col], [row], [0]])
    print(" Point ", ppp)
    plt.scatter(ppp[0][0], ppp[1][0])
    # Equation 9.6
    new_pt = ppp + K.dot(T / old_depth[row][col])
    print(K)
    print(T / old_depth[row][col])
    print(K.dot(T / old_depth[row][col]))
    plt.scatter(new_pt[0][0], new_pt[1][0], c='c', marker=">")
    ax.plot([ppp[0][0], new_pt[0][0]], [ppp[1][0], new_pt[1][0]], c='g', alpha=0.5)
Output
Point [[100]
[150]
[ 0]]
[[711.11127387 0. 256. ]
[ 0. 711.11127387 256. ]
[ 0. 0. 1. ]]
[[ 0. ]
[ 0. ]
[-0.16262454]]
[[-41.63188234]
[-41.63188234]
[ -0.16262454]]
Point [[350]
[450]
[ 0]]
[[711.11127387 0. 256. ]
[ 0. 711.11127387 256. ]
[ 0. 0. 1. ]]
[[ 0. ]
[ 0. ]
[-0.19715078]]
[[-50.47059987]
[-50.47059987]
[ -0.19715078]]
I expect the bottom point to move in the opposite direction.
What am I doing wrong?
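One thing worth checking (an assumption on my part, not something stated above): Equation 9.6 in Hartley and Zisserman operates on homogeneous points, so ppp should have a third coordinate of 1 rather than 0, and new_pt has to be divided by its third coordinate before plotting. A minimal sketch:
import numpy as np

def move_point(col, row, Z, K, T):
    x = np.array([col, row, 1.0])     # homogeneous pixel, w = 1 (not 0)
    xp = x + K.dot(T).ravel() / Z     # Equation 9.6: x' = x + K t / Z
    return xp[:2] / xp[2]             # dehomogenize before plotting
Without that division, the added term K.dot(T/Z) shifts every point the same way (scaled only by its depth), which matches the output above; after dehomogenizing, points on opposite sides of the principal point move in opposite directions.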

Julia: How to execute functions in parallel?

I want to run functions in parallel. These functions are executed many times in a loop.
coordSys = SharedArray{Bool}([true,false,true,true]);
dir = SharedArray{Int8}([1,2,3,2]);
load = SharedArray{Float64}([8,-7.5,7,-8.5]);
L = SharedArray{Float64}([400,450,600,500]);
r = SharedArray{Float64}([0.0 0.0 1.0; 0.0 -1.0 0.0; 1.0 0.0 0.0
0.0 0.0 1.0; 0.0 -1.0 0.0; 1.0 0.0 0.0
0.0 0.0 1.0; 0.0 -1.0 0.0; 1.0 0.0 0.0
0.0 0.0 1.0; 0.0 -1.0 0.0; 1.0 0.0 0.0]);
Obviously these arrays will be huge; for simplicity I am keeping them this small.
Operation without parallel computing:
function unifLoad(coordSys,dir,load,L,ri)
    if coordSys == true
        if dir == 1
            Q = [load;0;0];
        elseif dir == 2
            Q = [0;load;0];
        elseif dir == 3
            Q = [0;0;load];
        end
        q = ri*Q; # matrix multiplication
        P = q[1]*L/2;
        V = q[2]*L/2;
        M = -q[3]*L*L/12;
        f = [P;V;M];
    else
        f = [1.0;1.0;1.0];
    end
    return f
end
Running the loop:
var = zeros(12)
for i = 1:length(L)
    var[3*(i-1)+1:3*i] = unifLoad(coordSys[i],dir[i],load[i],L[i],r[3*(i-1)+1:3*i,:]);
end
The returned value is:
var
12-element Array{Float64,1}:
0.0
0.0
-1.06667e5
1.0
1.0
1.0
2100.0
0.0
-0.0
0.0
2125.0
-0.0
Operation with parallel computing
I've been trying to implement the same function in parallel, but without getting the same results.
# addprocs(3)
@everywhere function unifLoad_Parallel(coordSys,dir,load,L,ri)
    if coordSys == true
        if dir == 1
            Q = [load;0;0];
        elseif dir == 2
            Q = [0;load;0];
        elseif dir == 3
            Q = [0;0;load];
        end
        q = ri*Q; # matrix multiplication (ri -> 3x3 array)
        P = q[1]*L/2;
        V = q[2]*L/2;
        M = -q[3]*L*L/12;
        f = [P;V;M];
    else
        f = [1.0;1.0;1.0];
    end
    return f
end
Running the parallel loop:
var_parallel = SharedArray{Float64}(12);
@parallel for i = 1:length(L)
    var_parallel[3*(i-1)+1:3*i] = unifLoad_Parallel(coordSys[i],dir[i],load[i],L[i],r[3*(i-1)+1:3*i,:]);
end
The returned value is:
var_parallel
12-element SharedArray{Float64,1}:
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
On my Julia 0.6.3 the parallel code returns the same result, so I am unable to reproduce the problem (I also do not encounter the issue @SalchiPapa reports).
However, I would like to note that this code should actually work faster with threads (I assume that the real problem is much larger). Here is the code you could use (I used an implementation equivalent to yours which is a bit shorter; the only significantly relevant change is that I wrap it in a function, which provides dramatic performance gains). The crucial point is that all arrays except var are shared but only read, while var is written, only once at each entry, and never read from. This is the case where it is safe to use threading, which has a lower overhead.
Here is an example (you have to define the JULIA_NUM_THREADS environment variable before starting Julia and set it to the number of threads you want; most probably 4):
using Base.Threads
function experiment()
    coordSys = [true,false,true,true];
    dir = [1,2,3,2];
    load = [8,-7.5,7,-8.5];
    L = [400,450,600,500];
    r = [0.0 0.0 1.0; 0.0 -1.0 0.0; 1.0 0.0 0.0
         0.0 0.0 1.0; 0.0 -1.0 0.0; 1.0 0.0 0.0
         0.0 0.0 1.0; 0.0 -1.0 0.0; 1.0 0.0 0.0
         0.0 0.0 1.0; 0.0 -1.0 0.0; 1.0 0.0 0.0];
    unifLoad(coordSys,dir,load,L,r,i) =
        coordSys ? load * L * r[3*(i-1)+1:3*i, dir] .* [0.5, 0.5, -L/12] : [1.0, 1.0, 1.0]
    var = zeros(12)
    @threads for i = 1:length(L)
        var[3*(i-1)+1:3*i] = unifLoad(coordSys[i],dir[i],load[i],L[i],r,i);
    end
    var
end
Also, here is slightly simplified code for parallel processing using similar ideas:
coordSys = SharedArray{Bool}([true,false,true,true]);
dir = SharedArray{Int8}([1,2,3,2]);
load = SharedArray{Float64}([8,-7.5,7,-8.5]);
L = SharedArray{Float64}([400,450,600,500]);
r = SharedArray{Float64}([0.0 0.0 1.0; 0.0 -1.0 0.0; 1.0 0.0 0.0
                          0.0 0.0 1.0; 0.0 -1.0 0.0; 1.0 0.0 0.0
                          0.0 0.0 1.0; 0.0 -1.0 0.0; 1.0 0.0 0.0
                          0.0 0.0 1.0; 0.0 -1.0 0.0; 1.0 0.0 0.0]);
@everywhere unifLoad(coordSys,dir,load,L,r,i) =
    coordSys ? load * L * r[3*(i-1)+1:3*i, dir] .* [0.5, 0.5, -L/12] : [1.0, 1.0, 1.0]
vcat(pmap(i -> unifLoad(coordSys[i],dir[i],load[i],L[i],r,i), 1:length(L))...)
Here pmap is mostly used to simplify the code so that you do not need @sync.
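The same disjoint-write pattern carries over to other ecosystems; as a rough sketch in Python (an analogy only, not a port of the Julia code above), each task reads shared inputs and writes only its own slice of a preallocated output, so no locks are needed:
import numpy as np
from concurrent.futures import ThreadPoolExecutor

L = np.array([400.0, 450.0, 600.0, 500.0])  # shared, read-only input
out = np.zeros(3 * len(L))                  # each task owns one disjoint slice

def task(i):
    # writes only out[3*i : 3*i+3]; slices never overlap across tasks
    out[3*i:3*(i+1)] = [L[i] / 2, L[i] / 2, -L[i] ** 2 / 12]

with ThreadPoolExecutor() as ex:
    list(ex.map(task, range(len(L))))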

Parallel Computing with Julia #parallel and SharedArray

I have been trying to implement some parallel programming in Julia using @parallel and SharedArrays.
Xi = Array{Float64}([0.0, 450.0, 450.0, 0.0, 0.0, 450.0, 450.0, 0.0])
Yi = Array{Float64}([0.0, 0.0, 600.0, 600.0, 0.0, 0.0, 600.0, 600.0])
Zi = Array{Float64}([0.0, 0.0, 0.0, 0.0, 400.0, 400.0, 400.0, 400.0])
Xj = Array{Float64}([0.0, 450.0, 450.0, 0.0, 0.0, 450.0, 450.0, 0.0])
Yj = Array{Float64}([0.0, 0.0, 600.0, 600.0, 0.0, 0.0, 600.0, 600.0])
Zj = Array{Float64}([0.0, 0.0, 0.0, 0.0, 400.0, 400.0, 400.0, 400.0])
L = Array{Float64}([400.0, 400.0, 400.0, 400.0, 450.0, 600.0, 450.0, 600.0])
Rot = Array{Float64}([90.0, 90.0, 90.0, 90.0, 0.0, 0.0, 0.0, 0.0])
Obviously these arrays will be huge; for simplicity I am keeping them this small.
This is the operation without parallel computing:
function jt_transcoord(Xi, Yi, Zi, Xj, Yj, Zj, Rot, L)
    r = Vector(length(Xi))
    for i in 1:length(Xi)
        rxX = (Xj[i] - Xi[i]) / L[i]
        rxY = (Yj[i] - Yi[i]) / L[i]
        rxZ = (Zj[i] - Zi[i]) / L[i]
        if rxX == 0 && rxY == 0
            r[i] = [0 0 rxZ; cosd(Rot[i]) -rxZ*sind(Rot[i]) 0; sind(Rot[i]) rxZ*cosd(Rot[i]) 0]
        else
            R = sqrt(rxX^2+rxY^2)
            r21 = (-rxX*rxZ*cosd(Rot[i])+rxY*sind(Rot[i]))/R
            r22 = (-rxY*rxZ*cosd(Rot[i])-rxX*sind(Rot[i]))/R
            r23 = R*cosd(Rot[i])
            r31 = (rxX*rxZ*sind(Rot[i])+rxY*cosd(Rot[i]))/R
            r32 = (rxY*rxZ*sind(Rot[i])-rxX*cosd(Rot[i]))/R
            r33 = -R*sind(Rot[i])
            r[i] = [rxX rxY rxZ;r21 r22 r23;r31 r32 r33]
        end
    end
    return r
end
The returned value is basically an array that contains a matrix in each element. It looks something like this:
r =
[[0.0 0.0 1.0; 0.0 -1.0 0.0; 1.0 0.0 0.0],
[0.0 0.0 1.0; 0.0 -1.0 0.0; 1.0 0.0 0.0],
[0.0 0.0 1.0; 0.0 -1.0 0.0; 1.0 0.0 0.0],
[0.0 0.0 1.0; 0.0 -1.0 0.0; 1.0 0.0 0.0],
[1.0 0.0 0.0; 0.0 -0.0 1.0; 0.0 -1.0 -0.0],
[0.0 1.0 0.0; 0.0 -0.0 1.0; 1.0 0.0 -0.0],
[-1.0 0.0 0.0; 0.0 0.0 1.0; 0.0 1.0 -0.0],
[0.0 -1.0 0.0; -0.0 0.0 1.0; -1.0 -0.0 -0.0]]
This is my function using @parallel. First of all I need to convert the vectors to SharedArrays:
Xi = convert(SharedArray, Xi)
Yi = convert(SharedArray, Yi)
Zi = convert(SharedArray, Zi)
Xj = convert(SharedArray, Xj)
Yj = convert(SharedArray, Yj)
Zj = convert(SharedArray, Zj)
L = convert(SharedArray, L)
Rot = convert(SharedArray, Rot)
This is the same code but using @parallel:
function jt_transcoord_parallel(Xi, Yi, Zi, Xj, Yj, Zj, Rot, L)
    r = SharedArray{Float64}(zeros((length(Xi),1)))
    @parallel for i in 1:length(Xi)
        rxX = (Xj[i] - Xi[i]) / L[i]
        rxY = (Yj[i] - Yi[i]) / L[i]
        rxZ = (Zj[i] - Zi[i]) / L[i]
        if rxX == 0 && rxY == 0
            r[i] = [0 0 rxZ; cosd(Rot[i]) -rxZ*sind(Rot[i]) 0; sind(Rot[i]) rxZ*cosd(Rot[i]) 0]
        else
            R = sqrt(rxX^2+rxY^2)
            r21 = (-rxX*rxZ*cosd(Rot[i])+rxY*sind(Rot[i]))/R
            r22 = (-rxY*rxZ*cosd(Rot[i])-rxX*sind(Rot[i]))/R
            r23 = R*cosd(Rot[i])
            r31 = (rxX*rxZ*sind(Rot[i])+rxY*cosd(Rot[i]))/R
            r32 = (rxY*rxZ*sind(Rot[i])-rxX*cosd(Rot[i]))/R
            r33 = -R*sind(Rot[i])
            r[i] = [rxX rxY rxZ;r21 r22 r23;r31 r32 r33]
        end
    end
    return r
end
I just got a vector of zeros. My question is: is there a way to implement this function using @parallel in Julia and get the same results as my original function?
The functions jt_transcoord and jt_transcoord_parallel have major coding flaws.
In jt_transcoord, you are assigning an array to a vector element position. For example, you write r = Vector(length(Xi)) and then assign r[i] = [rxX rxY rxZ;r21 r22 r23;r31 r32 r33]. But r[i] should be a number, and you instead assign it a 3x3 matrix. I suspect that Julia is quietly changing types for you.
SharedArray objects will not admit this lax type conversion behavior. The components of a SharedArray must be of a single primitive type such as Float64, and Vector{Matrix} is not a primitive type. Open a Julia v0.6 REPL and copy/paste the following code:
r = SharedArray{Float64}(length(Xi))
for i in 1:length(Xi)
    rxX = (Xj[i] - Xi[i]) / L[i]
    rxY = (Yj[i] - Yi[i]) / L[i]
    rxZ = (Zj[i] - Zi[i]) / L[i]
    if rxX == 0 && rxY == 0
        r[i] = [0 0 rxZ; cosd(Rot[i]) -rxZ*sind(Rot[i]) 0; sind(Rot[i]) rxZ*cosd(Rot[i]) 0]
    else
        R = sqrt(rxX^2+rxY^2)
        r21 = (-rxX*rxZ*cosd(Rot[i])+rxY*sind(Rot[i]))/R
        r22 = (-rxY*rxZ*cosd(Rot[i])-rxX*sind(Rot[i]))/R
        r23 = R*cosd(Rot[i])
        r31 = (rxX*rxZ*sind(Rot[i])+rxY*cosd(Rot[i]))/R
        r32 = (rxY*rxZ*sind(Rot[i])-rxX*cosd(Rot[i]))/R
        r33 = -R*sind(Rot[i])
        r[i] = [rxX rxY rxZ;r21 r22 r23;r31 r32 r33]
    end
end
On my end, I get:
ERROR: MethodError: Cannot `convert` an object of type Array{Float64,2} to an object of type Float64
This may have arisen from a call to the constructor Float64(...),
since type constructors fall back to convert methods.
Stacktrace:
[1] setindex!(::SharedArray{Float64,2}, ::Array{Float64,2}, ::Int64) at ./sharedarray.jl:483
[2] macro expansion at ./REPL[26]:6 [inlined]
[3] anonymous at ./<missing>:?
Essentially, Julia is telling you that it cannot assign a matrix to a SharedArray vector.
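The same distinction exists in NumPy, for what it's worth (an analogy, not part of the Julia code): a fixed-dtype array cannot hold a matrix per element, while an object array can:
import numpy as np

r_obj = np.empty(4, dtype=object)  # like Vector{Any}: each slot may hold a 3x3 matrix
r_obj[0] = np.eye(3)               # fine

r_f64 = np.zeros(4)                # like SharedArray{Float64}: scalar slots only
try:
    r_f64[0] = np.eye(3)           # fails: a matrix cannot go into a scalar slot
except ValueError as e:
    print(e)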
What are your options?
If you insist on having a Vector{Matrix} return type, then use r = Vector{Matrix{Float64}}(length(Xi)) in jt_transcoord. But you cannot use SharedArrays for this since Vector{Matrix} is not an admissible primitive type.
Alternatively, if you are willing to operate with tensors (i.e. 3-way arrays) then you can use pseudocode A below. But SharedArray computing will only help you if you carefully account for which process owns which portion of the tensor. Otherwise, the processes will need to communicate with each other, and your parallelized function could execute very slowly.
If you are willing to lay your 3x3 matrices in a 3n x 3 columnwise fashion, then you can use pseudocode B below.
Pseudocode A
function jt_transcoord_tensor(Xi, Yi, Zi, Xj, Yj, Zj, Rot, L)
    # initialize array
    r = Array{Float64}(3,3,length(Xi))
    # r = SharedArray{Float64,3}((3,3,length(Xi))) # for SharedArrays
    for i in 1:length(Xi)
    # @parallel for i in 1:length(Xi) # for SharedArrays
        # other code...
        r[:,:,i] = [0 0 rxZ; cosd(Rot[i]) -rxZ*sind(Rot[i]) 0; sind(Rot[i]) rxZ*cosd(Rot[i]) 0]
        # other code...
        r[:,:,i] = [rxX rxY rxZ;r21 r22 r23;r31 r32 r33]
    end
    return r
end
Pseudocode B
function jt_transcoord_parallel(Xi, Yi, Zi, Xj, Yj, Zj, Rot, L)
    n = length(Xi)
    r = SharedArray{Float64}((3*n,3))
    @parallel for i in 1:length(Xi)
        # other code...
        r[(3*(i-1)+1):3*i,:] = [0 0 rxZ; cosd(Rot[i]) -rxZ*sind(Rot[i]) 0; sind(Rot[i]) rxZ*cosd(Rot[i]) 0]
        # other code...
        r[(3*(i-1)+1):3*i,:] = [rxX rxY rxZ;r21 r22 r23;r31 r32 r33]
    end
    return r
end
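For what it's worth, both layouts are easy to mirror in NumPy (a sketch with illustrative names, not part of the Julia pseudocode above):
import numpy as np

n = 4

# Pseudocode A's layout: a 3x3xn tensor, one 3x3 slice per element
r_tensor = np.zeros((3, 3, n))
r_tensor[:, :, 0] = np.eye(3)          # write block 0

# Pseudocode B's layout: 3n x 3, blocks stacked columnwise
r_stacked = np.zeros((3 * n, 3))
i = 0
r_stacked[3*i:3*(i+1), :] = np.eye(3)  # block i occupies rows 3i..3i+2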

Matrix Translation in GLSL is infinitely stretched

I work with WebGL and modify the shaders (vs.glsl and fs.glsl) to understand GLSL and graphics programming. I have a model that I want to scale, rotate and translate. Scaling and rotation work fine, but when I multiply by the translation matrix, the result is weird. I know this is a very basic question, but I am missing something and I need to find out what.
My model is infinitely stretched along the y axis.
The white area is supposed to be the eye of the model.
This is my vertex shader code:
mat4 rX = mat4 (
1.0, 0.0, 0.0, 0.0,
0.0, 0.0, -1.0, 0.0,
0.0, 1.0, 0.0, 0.0,
0.0, 0.0, 0.0, 1.0
);
mat4 rZ = mat4 (
0.0, 1.0, 0.0, 0.0,
-1.0, 0.0, 0.0, 0.0,
0.0, 0.0, 1.0, 0.0,
0.0, 0.0, 0.0, 1.0
);
mat4 eyeScale = mat4 (
.50,0.0,0.0,0.0,
0.0,.50,0.0,0.0,
0.0,0.0,.50,0.0,
0.0,0.0,0.0,1.0
);
mat4 eyeTrans = mat4(
1.0,0.0,0.0,0.0,
0.0,1.0,0.0,4.0,
0.0,0.0,1.0,0.0,
0.0,0.0,0.0,1.0
);
mat4 iR = eyeTrans*rZ*rX*eyeScale;
gl_Position = projectionMatrix * modelViewMatrix *iR* vec4(position, 1.0);
}
You swapped rows and columns when you set up the translation matrix.
Change it to:
mat4 eyeTrans = mat4(
1.0, 0.0, 0.0, 0.0,
0.0, 1.0, 0.0, 0.0,
0.0, 0.0, 1.0, 0.0,
0.0, 4.0, 0.0, 1.0
);
A 4*4 matrix looks like this (symbolically, and with the corresponding column-major indices):
     c0  c1  c2  c3          c0  c1  c2  c3
   [ Xx  Yx  Zx  Tx ]      [  0   4   8  12 ]
   [ Xy  Yy  Zy  Ty ]      [  1   5   9  13 ]
   [ Xz  Yz  Zz  Tz ]      [  2   6  10  14 ]
   [  0   0   0   1 ]      [  3   7  11  15 ]
In GLSL the columns are addressed like this:
vec4 c0 = eyeTrans[0].xyzw;
vec4 c1 = eyeTrans[1].xyzw;
vec4 c2 = eyeTrans[2].xyzw;
vec4 c3 = eyeTrans[3].xyzw;
And the memory image of a 4*4 matrix looks like this:
[ Xx, Xy, Xz, 0, Yx, Yy, Yz, 0, Zx, Zy, Zz, 0, Tx, Ty, Tz, 1 ]
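A quick way to check this layout (a NumPy sketch; it only assumes that GLSL's mat4 constructor consumes its 16 values in column-major order, as described above):
import numpy as np

# Translation by (0, 4, 0) written in conventional row-major math notation
T = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 4.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

# Column-major (Fortran-order) flattening yields the values in the order
# the mat4 constructor expects: Tx, Ty, Tz land in slots 12, 13, 14
print(T.flatten(order="F"))
# [1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 4. 0. 1.]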
See further:
GLSL 4×4 Matrix Fields

Line too long in PGI 16.9. How to solve?

Use the following dummy code to replicate the issue:
program pp
implicit none
real*8,dimension(45) :: refPoints
refPoints(:) = (/ -1.0 , 1.0 , 1.0 , -1.0 , -1.0 , 1.0 , 1.0 , -1.0 , 0.0 , 1.0 , 0.0 , -1.0 , 0.0 , 1.0 , 0.0 , -1.0 , -1.0 , 1.0 , 1.0 , -1.0 , 0.0 , 1.0 , 0.0 ,-1.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 1.0, 1.0, 2.0 , 3.0, 34.0, 35.0, 25.0, 1.0, 50.0, 5.0, 55.0 , 1.0 , 2.0, 3.0, 4.0, 5.0/)
end program pp
PGF90-S-0285-Source line too long (pp.f90: 6)
PGF90-S-0023-Syntax error - unbalanced parentheses (pp.f90: 6)
0 inform, 0 warnings, 2 severes, 0 fatal for pp
132 columns is the limit for free-form source in the F90 standard, and going beyond it is non-conforming (hence the PGI error). While a pain, you'll be better off in the long run by bringing your code into compliance, e.g. by splitting the array constructor across several lines with & continuations.
