Pascal: check multiple things in one if (like || in other languages) - pascal

I have recently started learning Pascal, but I have a problem. Is it possible to do something like this (n is an array of integers):
if (n[1] + n[2] = n[3] + n[4]) || (n[1] + n[3] = n[2] + n[4])
In other languages you can use ||, but I can't find a way in Pascal...

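The C-like logical OR operator || corresponds to "or" in Pascal-based languages. Keep the parentheses around each comparison, because "or" binds more tightly than = in Pascal. So write: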
if (n[1] + n[2] = n[3] + n[4]) or (n[1] + n[3] = n[2] + n[4]) then

Related

What is the "Regular Expression" for this given machine?

Given Machine
I'm unsure which of these two options is correct:
cxa(bcxa+d)x
cxa(bc+d)x
Here "x" stands for the * (Kleene star) applied to the preceding letter or bracket.
Write out our equations:
(q1) = (q1)c + (q2)b + e
(q2) = (q2)d + (q1)a
Simplify the expression for (q1) using the rule (q) = (q)x + y <=> (q) = yx*:
(q1) = ((q2)b + e)c*
(q2) = (q2)d + (q1)a
Substitute the expression for (q1) into the expression for (q2), factor (q2) out of the RHS, and apply the rule from above:
(q1) = ((q2)b + e)c*
(q2) = (q2)d + ((q2)b + e)c*a
= (q2)d + (q2)bc*a + c*a
= (q2)(d + bc*a) + c*a
= (c*a)(d + bc*a)*
This appears to be what you have down for option 1.
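The result can be sanity-checked by brute force. The following is a minimal Python sketch, not part of the original answer: the transition table is read off the equations above, and taking q1 as the start state and q2 as the only accepting state is an assumption about the pictured machine. The names TRANSITIONS, accepts and disagreements are made up for illustration.

```python
import re
from itertools import product

# Transitions read off the equations (an assumption about the pictured machine):
# (q1) = (q1)c + (q2)b + e  ->  q1 is the start state, q1 --c--> q1, q2 --b--> q1
# (q2) = (q2)d + (q1)a      ->  q2 --d--> q2, q1 --a--> q2
TRANSITIONS = {
    ("q1", "c"): "q1",
    ("q1", "a"): "q2",
    ("q2", "d"): "q2",
    ("q2", "b"): "q1",
}
START, ACCEPT = "q1", "q2"  # assumed: the derivation solves for (q2)

# The two options from the question, with | for + and * for "x".
OPTION_1 = re.compile(r"c*a(bc*a|d)*")
OPTION_2 = re.compile(r"c*a(bc|d)*")


def accepts(word):
    """Run the machine on word; a missing transition rejects the word."""
    state = START
    for symbol in word:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False
    return state == ACCEPT


def disagreements(pattern, max_len=6):
    """All words up to max_len on which the machine and the pattern differ."""
    bad = []
    for n in range(max_len + 1):
        for letters in product("abcd", repeat=n):
            word = "".join(letters)
            if accepts(word) != (pattern.fullmatch(word) is not None):
                bad.append(word)
    return bad
```

Under these assumptions, disagreements(OPTION_1) comes back empty, while option 2 already fails on a short word such as aba, which the machine accepts but the pattern does not match.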

Mathematica: How to Parametric Plot a function in several time intervals using solutions of set of time dependent equations

I have a set of time-dependent equations: 4 equations with 4 time-dependent variables {r[t], c[t], Uo[t], U1[t]}.
Those 4 variables need to be used in a parametric transformation function
zJouko[o_] := r[t]*Exp[o*I] + (Uo[t]/(Exp[o*I] - c[t])) + U1[t]
where o has nothing to do with the time parameter.
I need to plot this parametric function zJouko[o] for a few time intervals on the same figure.
I have initial conditions for the 4 variables.
I have tried to use NSolve and then use its results in the plot, but without success.
Another problem is that after I launch Mathematica, NSolve works a few times and after that returns an empty solution.
The code below is what I tried. I also don't know where to put the time intervals in the code.
some constants:
q2 = 0.5; mu1 = 1; mu2 = 1; tau = 1.0;
NSolve with the 4 equations and initial conditions
setEquation = NSolve[
{Uo[t]/c[t] == U1[t],
q2 == (r[t]/c[t]) + (Uo[t]*c[t]/(1 - ((c[t])^2))) + U1[t],
mu1*Exp[-t/tau] == r[t]*(r[t] - (Uo[t]/((c[t])^2))),
mu2*Exp[-t/tau] ==
Uo[t]*(((Uo[t])/((((c[t])^2) - 1)^2)) - r[t]/((c[t])^2)),
r[0] == 1/100, Uo[0] == -1/2, U1[0] == -5/12, c[0] == 6/5}, {r[t],
c[t], Uo[t], U1[t]}]
the function and the ParametricPlot:
zJouko[o_] := r[t]*Exp[o*I] + (Uo[t]/(Exp[o*I] - c[t])) + U1[t];
ParametricPlot[{Re[zJouko[o]], Im[zJouko[o]]}, {o, 0, 2 Pi}]
In a fresh start of Mathematica I give it
q2 = 1/2; mu1 = 1; mu2 = 1; tau = 1;
setEquation = Reduce[{Uo[t]/c[t] == U1[t],
q2 == (r[t]/c[t]) + (Uo[t]*c[t]/(1 - ((c[t])^2))) + U1[t],
mu1*Exp[-t/tau] == r[t]*(r[t] - (Uo[t]/((c[t])^2))),
mu2*Exp[-t/tau] == Uo[t]*(((Uo[t])/((((c[t])^2) - 1)^2)) - r[t]/((c[t])^2)),
r[0] == 1/100, Uo[0] == -1/2, U1[0] == -5/12, c[0] == 6/5},
{r[t], c[t], Uo[t], U1[t]}]
and it tells me that r[t] is either plus or minus Sqrt[16 + E^t]/(4*Sqrt[2]*Sqrt[E^t]) and, given that choice of r[t],
c[t] == 4*r[t]
Uo[t] == r[t] - 16*r[t]^3
U1[t] == (1 - 16*r[t]^2)/4
Test the solution to see if it is correct.
q2 = 1/2; mu1 = 1; mu2 = 1; tau = 1;
r[t_]:=Sqrt[16 + E^t]/(4*Sqrt[2]*Sqrt[E^t]);
c[t_]:= 4*r[t];
Uo[t_]:=r[t] - 16*r[t]^3;
U1[t_]:= (1 - 16*r[t]^2)/4;
Simplify[ {Uo[t]/c[t] == U1[t],
q2 == (r[t]/c[t]) + (Uo[t]*c[t]/(1 - ((c[t])^2))) + U1[t],
mu1*Exp[-t/tau] == r[t]*(r[t] - (Uo[t]/((c[t])^2))),
mu2*Exp[-t/tau] == Uo[t]*(((Uo[t])/((((c[t])^2) - 1)^2)) - r[t]/((c[t])^2))}]
and this returns
{True, True, True, True}
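Check your initial conditions at t == 0
{r[0],c[0],Uo[0],U1[0]}//N
and it returns
{0.728869, 2.91548, -5.46651, -1.875}
These values do not match the initial conditions in your question (r[0] == 1/100, c[0] == 6/5, Uo[0] == -1/2, U1[0] == -5/12).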
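The same check can be repeated outside Mathematica. This is a minimal Python sketch, assuming the positive branch of r[t] and the constants q2 = 1/2, mu1 = mu2 = tau = 1 used above; the function names solution and residuals are made up for illustration.

```python
import math

Q2, MU1, MU2, TAU = 0.5, 1.0, 1.0, 1.0


def solution(t):
    """Positive branch of the closed form reported by Reduce."""
    r = math.sqrt(16 + math.exp(t)) / (4 * math.sqrt(2) * math.sqrt(math.exp(t)))
    c = 4 * r
    uo = r - 16 * r**3
    u1 = (1 - 16 * r**2) / 4
    return r, c, uo, u1


def residuals(t):
    """Left side minus right side of each of the four equations."""
    r, c, uo, u1 = solution(t)
    decay = math.exp(-t / TAU)
    return (
        uo / c - u1,
        Q2 - (r / c + uo * c / (1 - c**2) + u1),
        MU1 * decay - r * (r - uo / c**2),
        MU2 * decay - uo * (uo / (c**2 - 1) ** 2 - r / c**2),
    )
```

For t between 0 and 2 the residuals should vanish up to rounding error. Note that the expressions contain 1 - c^2 in a denominator, which becomes zero at t = Log[16] (about 2.77), so the check is only meaningful away from that point.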
Plot your function
zJouko[o_]:= r[t]*Exp[o*I] + (Uo[t]/(Exp[o*I] - c[t])) + U1[t];
ParametricPlot[Evaluate[Table[{Re[zJouko[o]], Im[zJouko[o]]}, {t, 0, 2}]], {o, 0, 2 Pi}]
Please try to check all this carefully and see if you can find that I have made a mistake anywhere.

Loop optimisation

I am trying to understand what cache or other optimizations could be applied to the source code to make this loop faster. I think it is quite cache-friendly, but are there any experts out there who could squeeze out a bit more performance by tuning this code?
DO K = 1, NZ
DO J = 1, NY
DO I = 1, NX
SIDEBACK = STEN(I-1,J-1,K-1) + STEN(I-1,J,K-1) + STEN(I-1,J+1,K-1) + &
STEN(I ,J-1,K-1) + STEN(I ,J,K-1) + STEN(I ,J+1,K-1) + &
STEN(I+1,J-1,K-1) + STEN(I+1,J,K-1) + STEN(I+1,J+1,K-1)
SIDEOWN = STEN(I-1,J-1,K) + STEN(I-1,J,K) + STEN(I-1,J+1,K) + &
STEN(I ,J-1,K) + STEN(I ,J,K) + STEN(I ,J+1,K) + &
STEN(I+1,J-1,K) + STEN(I+1,J,K) + STEN(I+1,J+1,K)
SIDEFRONT = STEN(I-1,J-1,K+1) + STEN(I-1,J,K+1) + STEN(I-1,J+1,K+1) + &
STEN(I ,J-1,K+1) + STEN(I ,J,K+1) + STEN(I ,J+1,K+1) + &
STEN(I+1,J-1,K+1) + STEN(I+1,J,K+1) + STEN(I+1,J+1,K+1)
RES(I,J,K) = ( SIDEBACK + SIDEOWN + SIDEFRONT ) / 27.0
END DO
END DO
END DO
Ok, I think I've tried everything I reasonably could, and my conclusion unfortunately is that there is not too much room for optimizations, unless you are willing to go into parallelization. Let's see why, let's see what you can and can't do.
Compiler optimizations
Compilers nowadays are extremely good at optimizing code, often much better than humans are. Relying on the compiler's optimizations also has the added benefit that they don't ruin the readability of your source code. Whatever you do (when optimizing for speed), always try it with every reasonable combination of compiler flags. You can even go as far as trying multiple compilers. Personally I only used gfortran (included in GCC, on 64-bit Windows), which I trust to have efficient and correct optimization techniques.
-O2 almost always improves the speed drastically, but even -O3 is a safe bet (among other things, it includes delicious loop unrolling). For this problem, I also tried -ffast-math and -fexpensive-optimizations; they didn't have any measurable effect, but -march=corei7 (CPU architecture-specific tuning, specific to Core i7) did, so I did the measurements with -O3 -march=corei7.
So how fast is it actually?
I wrote the following code to test your solution and compiled it with -O3 -march=corei7. It usually ran in 0.78-0.82 seconds.
program benchmark
implicit none
real :: start, finish
integer :: I, J, K
real :: SIDEBACK, SIDEOWN, SIDEFRONT
integer, parameter :: NX = 600
integer, parameter :: NY = 600
integer, parameter :: NZ = 600
real, dimension (0 : NX + 2, 0 : NY + 2, 0 : NZ + 2) :: STEN
real, dimension (0 : NX + 2, 0 : NY + 2, 0 : NZ + 2) :: RES
call random_number(STEN)
call cpu_time(start)
DO K = 1, NZ
DO J = 1, NY
DO I = 1, NX
SIDEBACK = STEN(I-1,J-1,K-1) + STEN(I-1,J,K-1) + STEN(I-1,J+1,K-1) + &
STEN(I ,J-1,K-1) + STEN(I ,J,K-1) + STEN(I ,J+1,K-1) + &
STEN(I+1,J-1,K-1) + STEN(I+1,J,K-1) + STEN(I+1,J+1,K-1)
SIDEOWN = STEN(I-1,J-1,K) + STEN(I-1,J,K) + STEN(I-1,J+1,K) + &
STEN(I ,J-1,K) + STEN(I ,J,K) + STEN(I ,J+1,K) + &
STEN(I+1,J-1,K) + STEN(I+1,J,K) + STEN(I+1,J+1,K)
SIDEFRONT = STEN(I-1,J-1,K+1) + STEN(I-1,J,K+1) + STEN(I-1,J+1,K+1) + &
STEN(I ,J-1,K+1) + STEN(I ,J,K+1) + STEN(I ,J+1,K+1) + &
STEN(I+1,J-1,K+1) + STEN(I+1,J,K+1) + STEN(I+1,J+1,K+1)
RES(I,J,K) = ( SIDEBACK + SIDEOWN + SIDEFRONT ) / 27.0
END DO
END DO
END DO
call cpu_time(finish)
!Use the calculated value, so the compiler doesn't optimize away everything.
!Print the original value as well, because one can never be too paranoid.
print *, STEN(1,1,1), RES(1,1,1)
print '(f6.3," seconds.")',finish-start
end program
Ok, so this is as far as the compiler can take us. What's next?
Store intermediate results?
As you might suspect from the question mark, this one didn't really work. Sorry. But let's not get ahead of ourselves.
As mentioned in the comments, your current code calculates every partial sum multiple times, meaning one iteration's STEN(I+1,J-1,K-1) + STEN(I+1,J,K-1) + STEN(I+1,J+1,K-1) will be the next iteration's STEN(I,J-1,K-1) + STEN(I,J,K-1) + STEN(I,J+1,K-1), so no need to fetch and calculate again, you can store those partial results.
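To make the reuse idea concrete, here is a minimal Python sketch on a tiny grid. It is only meant to check the arithmetic, not to say anything about speed, and it is not the Fortran code measured in this answer. The grid size, the nested-list layout and the helper names naive, plane_sum and row_with_reuse are all assumptions made for illustration.

```python
import random

NX = NY = NZ = 4  # tiny illustrative grid; the question uses much larger arrays

random.seed(0)
# sten[i][j][k] with one layer of padding on every side (indices 0..N+1)
sten = [[[random.random() for _ in range(NZ + 2)]
         for _ in range(NY + 2)]
        for _ in range(NX + 2)]


def naive(i, j, k):
    """Direct 27-point average, as in the original loop."""
    total = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            for dk in (-1, 0, 1):
                total += sten[i + di][j + dj][k + dk]
    return total / 27.0


def plane_sum(i, j, k):
    """Sum of the 9 values with fixed i and neighbours in j and k."""
    return sum(sten[i][j + dj][k + dk]
               for dj in (-1, 0, 1) for dk in (-1, 0, 1))


def row_with_reuse(j, k):
    """Averages for i = 1..NX, computing each 9-point plane sum only once."""
    left, mid = plane_sum(0, j, k), plane_sum(1, j, k)
    out = []
    for i in range(1, NX + 1):
        right = plane_sum(i + 1, j, k)
        out.append((left + mid + right) / 27.0)
        left, mid = mid, right  # slide the window one step along i
    return out
```

Each step along i then needs one new 9-point sum instead of three, and the results agree with the direct 27-point average up to floating-point rounding.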
The problem is that we cannot store too many partial results. As you said, your code is already quite cache-friendly, and every partial sum you store means one less array element that fits in L1 cache. We could store a few values from the last few iterations of I (values for index I-2, I-3, etc.), but the compiler almost certainly does that already. I have two pieces of evidence for this suspicion. First, my manual loop unrolling made the program about 5% slower:
DO K = 1, NZ
DO J = 1, NY
DO I = 1, NX, 8
SIDEBACK(0) = STEN(I-1,J-1,K-1) + STEN(I-1,J,K-1) + STEN(I-1,J+1,K-1)
SIDEBACK(1) = STEN(I ,J-1,K-1) + STEN(I ,J,K-1) + STEN(I ,J+1,K-1)
SIDEBACK(2) = STEN(I+1,J-1,K-1) + STEN(I+1,J,K-1) + STEN(I+1,J+1,K-1)
SIDEBACK(3) = STEN(I+2,J-1,K-1) + STEN(I+2,J,K-1) + STEN(I+2,J+1,K-1)
SIDEBACK(4) = STEN(I+3,J-1,K-1) + STEN(I+3,J,K-1) + STEN(I+3,J+1,K-1)
SIDEBACK(5) = STEN(I+4,J-1,K-1) + STEN(I+4,J,K-1) + STEN(I+4,J+1,K-1)
SIDEBACK(6) = STEN(I+5,J-1,K-1) + STEN(I+5,J,K-1) + STEN(I+5,J+1,K-1)
SIDEBACK(7) = STEN(I+6,J-1,K-1) + STEN(I+6,J,K-1) + STEN(I+6,J+1,K-1)
SIDEBACK(8) = STEN(I+7,J-1,K-1) + STEN(I+7,J,K-1) + STEN(I+7,J+1,K-1)
SIDEBACK(9) = STEN(I+8,J-1,K-1) + STEN(I+8,J,K-1) + STEN(I+8,J+1,K-1)
SIDEOWN(0) = STEN(I-1,J-1,K) + STEN(I-1,J,K) + STEN(I-1,J+1,K)
SIDEOWN(1) = STEN(I ,J-1,K) + STEN(I ,J,K) + STEN(I ,J+1,K)
SIDEOWN(2) = STEN(I+1,J-1,K) + STEN(I+1,J,K) + STEN(I+1,J+1,K)
SIDEOWN(3) = STEN(I+2,J-1,K) + STEN(I+2,J,K) + STEN(I+2,J+1,K)
SIDEOWN(4) = STEN(I+3,J-1,K) + STEN(I+3,J,K) + STEN(I+3,J+1,K)
SIDEOWN(5) = STEN(I+4,J-1,K) + STEN(I+4,J,K) + STEN(I+4,J+1,K)
SIDEOWN(6) = STEN(I+5,J-1,K) + STEN(I+5,J,K) + STEN(I+5,J+1,K)
SIDEOWN(7) = STEN(I+6,J-1,K) + STEN(I+6,J,K) + STEN(I+6,J+1,K)
SIDEOWN(8) = STEN(I+7,J-1,K) + STEN(I+7,J,K) + STEN(I+7,J+1,K)
SIDEOWN(9) = STEN(I+8,J-1,K) + STEN(I+8,J,K) + STEN(I+8,J+1,K)
SIDEFRONT(0) = STEN(I-1,J-1,K+1) + STEN(I-1,J,K+1) + STEN(I-1,J+1,K+1)
SIDEFRONT(1) = STEN(I ,J-1,K+1) + STEN(I ,J,K+1) + STEN(I ,J+1,K+1)
SIDEFRONT(2) = STEN(I+1,J-1,K+1) + STEN(I+1,J,K+1) + STEN(I+1,J+1,K+1)
SIDEFRONT(3) = STEN(I+2,J-1,K+1) + STEN(I+2,J,K+1) + STEN(I+2,J+1,K+1)
SIDEFRONT(4) = STEN(I+3,J-1,K+1) + STEN(I+3,J,K+1) + STEN(I+3,J+1,K+1)
SIDEFRONT(5) = STEN(I+4,J-1,K+1) + STEN(I+4,J,K+1) + STEN(I+4,J+1,K+1)
SIDEFRONT(6) = STEN(I+5,J-1,K+1) + STEN(I+5,J,K+1) + STEN(I+5,J+1,K+1)
SIDEFRONT(7) = STEN(I+6,J-1,K+1) + STEN(I+6,J,K+1) + STEN(I+6,J+1,K+1)
SIDEFRONT(8) = STEN(I+7,J-1,K+1) + STEN(I+7,J,K+1) + STEN(I+7,J+1,K+1)
SIDEFRONT(9) = STEN(I+8,J-1,K+1) + STEN(I+8,J,K+1) + STEN(I+8,J+1,K+1)
RES(I ,J,K) = ( SIDEBACK(0) + SIDEOWN(0) + SIDEFRONT(0) + &
SIDEBACK(1) + SIDEOWN(1) + SIDEFRONT(1) + &
SIDEBACK(2) + SIDEOWN(2) + SIDEFRONT(2) ) / 27.0
RES(I + 1,J,K) = ( SIDEBACK(1) + SIDEOWN(1) + SIDEFRONT(1) + &
SIDEBACK(2) + SIDEOWN(2) + SIDEFRONT(2) + &
SIDEBACK(3) + SIDEOWN(3) + SIDEFRONT(3) ) / 27.0
RES(I + 2,J,K) = ( SIDEBACK(2) + SIDEOWN(2) + SIDEFRONT(2) + &
SIDEBACK(3) + SIDEOWN(3) + SIDEFRONT(3) + &
SIDEBACK(4) + SIDEOWN(4) + SIDEFRONT(4) ) / 27.0
RES(I + 3,J,K) = ( SIDEBACK(3) + SIDEOWN(3) + SIDEFRONT(3) + &
SIDEBACK(4) + SIDEOWN(4) + SIDEFRONT(4) + &
SIDEBACK(5) + SIDEOWN(5) + SIDEFRONT(5) ) / 27.0
RES(I + 4,J,K) = ( SIDEBACK(4) + SIDEOWN(4) + SIDEFRONT(4) + &
SIDEBACK(5) + SIDEOWN(5) + SIDEFRONT(5) + &
SIDEBACK(6) + SIDEOWN(6) + SIDEFRONT(6) ) / 27.0
RES(I + 5,J,K) = ( SIDEBACK(5) + SIDEOWN(5) + SIDEFRONT(5) + &
SIDEBACK(6) + SIDEOWN(6) + SIDEFRONT(6) + &
SIDEBACK(7) + SIDEOWN(7) + SIDEFRONT(7) ) / 27.0
RES(I + 6,J,K) = ( SIDEBACK(6) + SIDEOWN(6) + SIDEFRONT(6) + &
SIDEBACK(7) + SIDEOWN(7) + SIDEFRONT(7) + &
SIDEBACK(8) + SIDEOWN(8) + SIDEFRONT(8) ) / 27.0
RES(I + 7,J,K) = ( SIDEBACK(7) + SIDEOWN(7) + SIDEFRONT(7) + &
SIDEBACK(8) + SIDEOWN(8) + SIDEFRONT(8) + &
SIDEBACK(9) + SIDEOWN(9) + SIDEFRONT(9) ) / 27.0
END DO
END DO
END DO
Second, and what's worse, it's easy to show that we are already pretty close to the theoretical minimum execution time. To calculate all these averages, the absolute minimum we need to do is access every element at least once and divide by 27.0. So you can never get faster than the following code, which executes in 0.48-0.5 seconds on my machine.
program benchmark
implicit none
real :: start, finish
integer :: I, J, K
integer, parameter :: NX = 600
integer, parameter :: NY = 600
integer, parameter :: NZ = 600
real, dimension (0 : NX + 2, 0 : NY + 2, 0 : NZ + 2) :: STEN
real, dimension (0 : NX + 2, 0 : NY + 2, 0 : NZ + 2) :: RES
call random_number(STEN)
call cpu_time(start)
DO K = 1, NZ
DO J = 1, NY
DO I = 1, NX
!This of course does not do what you want to do,
!this is just an example of a speed limit we can never surpass.
RES(I, J, K) = STEN(I, J, K) / 27.0
END DO
END DO
END DO
call cpu_time(finish)
!Use the calculated value, so the compiler doesn't optimize away everything.
print *, STEN(1,1,1), RES(1,1,1)
print '(f6.3," seconds.")',finish-start
end program
But hey, even a negative result is a result. If just accessing every element once (and dividing by 27.0) takes more than half of the execution time, that just means memory access is the bottleneck. Then maybe you can optimize that.
Less data
If you don't need the full precision of 64-bit doubles, you can declare your array as real(kind=4). But maybe your reals are already 4 bytes. In that case, I believe some Fortran implementations support non-standard 16-bit reals, or, depending on your data, you can just use integers (for example, floats multiplied by a scale factor and then rounded to an integer). The smaller your base type is, the more elements fit into the cache. The ideal would be integer(kind=1), of course; it gave more than a 2x speed-up on my machine compared to real(kind=4). But it depends on the precision you need.
Better locality
Column-major arrays are slow when you need data from a neighbouring column, and row-major ones are slow for neighbouring rows.
Fortunately there is a funky way to store data, called a Z-order curve, which does have applications similar to your use case in computer graphics.
I can't promise it will help, maybe it will be terribly counterproductive, but maybe not. Sorry, I didn't feel like implementing it myself, to be honest.
Parallelization
Speaking of computer graphics, this problem is trivially and extremely well parallelizable, maybe even on a GPU, but if you don't want to go that far, you can just use a normal multicore CPU. The Fortran Wiki seems like a good place to search for Fortran parallelization libraries.

How to print an equation in Ruby

I've got a question.
1.0X + 1.0Y + -7.0 = 0
How can I print an equation like this more cleanly?
For example, instead of + -7.0 I'd like to print - 7.0, and I'd like terms with zero coefficients to be handled sensibly.
Thanks
This prints an equation with full control of formatting:
a = b = 1
c = -7
puts "%0.1fX + %0.1fY %s %0.1f = %d"%[ a, b, c < 0 ? '-' : '+', c.abs, 0 ]
output:
1.0X + 1.0Y - 7.0 = 0
Documentation: Ruby's % string operator / sprintf
With a few substitutions, you could achieve a much cleaner equation:
equation = "1.0X + 1.0Y - -0.0Z + -7.0 = 0"
new_equation = equation.gsub('+ -', '- ')
.gsub('- -', '+ ')
.gsub(/^\s*\+/, '') # Remove leading +
.gsub(/(?<=\d)\.0+(?=\D)/, '') # Remove trailing zeroes
.gsub(/\b1(?=[a-z])/i, '') # Remove 1 in 1X
.gsub(/[+-]? ?0[a-z] ?/i, '') # Remove 0Z
p new_equation
# "X + Y - 7 = 0"
By the way, as much as I love Ruby, I must say that SymPy is an awesome project. This library alone makes it worthwhile to learn the basic Python syntax.
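For comparison, here is a minimal plain-Python sketch (no SymPy) that builds the string term by term instead of patching it afterwards with substitutions. It is only one possible approach; the function names format_number and format_equation and the (coefficient, name) input format are assumptions made for illustration.

```python
def format_number(value):
    """Render 7.0 as '7' and 2.5 as '2.5'."""
    return str(int(value)) if float(value).is_integer() else str(value)


def format_equation(terms, constant):
    """terms is a list of (coefficient, variable_name) pairs."""
    parts = []  # list of (sign, text) pairs
    for coeff, name in terms:
        if coeff == 0:
            continue  # drop zero terms entirely
        magnitude = abs(coeff)
        # A coefficient of 1 is left implicit: X rather than 1X.
        text = name if magnitude == 1 else format_number(magnitude) + name
        parts.append(("-" if coeff < 0 else "+", text))
    # Keep the constant unless it is zero and something else is printed.
    if constant != 0 or not parts:
        parts.append(("-" if constant < 0 else "+", format_number(abs(constant))))
    sign, text = parts[0]
    result = ("-" if sign == "-" else "") + text  # no leading "+"
    for sign, text in parts[1:]:
        result += f" {sign} {text}"
    return result + " = 0"
```

With the coefficients from the question, format_equation([(1.0, "X"), (1.0, "Y")], -7.0) gives "X + Y - 7 = 0", and terms with a zero coefficient are skipped.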

Reorganizing a formula containing Modulo

I have a formula that looks like
a = (b + 1 + c)%d
I want to express c in terms of the rest, i.e. have c on the LHS.
Any suggestions?
a = (b + 1 + c)%d
a + n*d = b + 1 + c
a - 1 - b + n*d = c
for any integer n (assuming 0 <= a < d), so c is only determined up to a multiple of d.
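A quick way to convince yourself is to plug the result back in. This is a minimal Python sketch, assuming d > 0 and 0 <= a < d, and relying on Python's % returning a non-negative result for a positive d; the names solve_for_c and check are made up for illustration.

```python
def solve_for_c(a, b, d, n=0):
    """One solution of a == (b + 1 + c) % d; other n shift c by multiples of d."""
    return a - 1 - b + n * d


def check(a, b, d, n):
    """Plug c back into the original formula and compare with a."""
    c = solve_for_c(a, b, d, n)
    return (b + 1 + c) % d == a
```

In languages where % can return a negative value for a negative operand (C or Java, for example), the check would need adjusting when b + 1 + c is negative.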
