CUDA.jl: What is the most time-efficient way of copying data from CuArrays? - performance

I'm working on code to solve the restricted n-body problem (where the gravitational field is solved for analytically), involving the Sun, Jupiter and many asteroids/rockets in a 2D solar system. Using CUDA.jl's higher-order abstractions, I parallelised the propagation of my asteroids. My problem is that after each time step I want to save the X and Y positions of all my asteroids into two 2D arrays (one for X, one for Y), such that the first row of each array holds the initial positions of my asteroids, the second row their positions after one time step, and so on. I found that copying this data over and storing it was the slowest part of my program. Is there an easy way to speed this up?
This is my current code.
function MassCoordsCircle(rotation::AbstractFloat,radius::AbstractFloat)
    return radius*cos(rotation),radius*sin(rotation)
end
function FindForcesArr(rocketCoord::AbstractArray,mass1Coord::AbstractFloat,mass2Coord::AbstractFloat,rocketOtherCoord::AbstractArray,mass1OtherCoord::AbstractFloat,mass2OtherCoord::AbstractFloat,mass1::AbstractFloat,mass2::AbstractFloat,G::AbstractFloat)
    return -G*mass1*(rocketCoord.-mass1Coord) ./ sqrt.((rocketCoord.-mass1Coord) .^ 2 .+ (rocketOtherCoord.-mass1OtherCoord) .^ 2) .^ 3 .-
           G*mass2*(rocketCoord.-mass2Coord) ./ sqrt.((rocketCoord.-mass2Coord) .^ 2 .+ (rocketOtherCoord.-mass2OtherCoord) .^ 2) .^ 3
end
function TaylorArr(x::AbstractArray,y::AbstractArray,vx::AbstractArray,vy::AbstractArray,mass1::AbstractFloat,mass2::AbstractFloat,dMass1::AbstractFloat,dMass2::AbstractFloat,G::AbstractFloat,mass1Rotation::AbstractFloat,mass2Rotation::AbstractFloat,period::AbstractFloat,dt::AbstractFloat)
    mass1x,mass1y=MassCoords(mass1Rotation,dMass1)
    mass2x,mass2y=MassCoords(mass2Rotation,dMass2)
    ax=FindForcesArr(x,mass1x,mass2x,y,mass1y,mass2y,mass1,mass2,G)
    ay=FindForcesArr(y,mass1y,mass2y,x,mass1x,mass2x,mass1,mass2,G)
    return x .+ dt .* vx .+ dt^2/2 .* ax, y .+ dt .* vy .+ dt^2/2 .* ay, vx .+ dt .* ax, vy .+ dt .* ay
end
using CUDA
using Plots
using PyPlot
using PyCall
np=pyimport("numpy")
G=6.6726e-11
#Mass of Sun
mass1=1.989e30
#Mass of Jupiter
mass2=1.898e27
#Sun-Jupiter distance
d_mass1_mass2=7.7834082e11
timeStep=100000.0
numberParticles=2000
d_COM_mass1=d_mass1_mass2*mass2/(mass1+mass2)
d_COM_mass2=d_mass1_mass2*mass1/(mass1+mass2)
MassCoords=MassCoordsCircle
PropagateParticle=TaylorArr
period=sqrt(4*pi^2/(G*(mass1+mass2))*d_mass1_mass2^3)
steps=trunc(Int64,div(period,timeStep)+1)
mass1xHistory=zeros(steps)
mass1yHistory=zeros(steps)
mass2xHistory=zeros(steps)
mass2yHistory=zeros(steps)
particleXs=CUDA.randn(Float64,numberParticles).*1e12
particleYs=CUDA.randn(Float64,numberParticles).*1e12
particleVXs=CUDA.randn(Float64,numberParticles).*15000
particleVYs=CUDA.randn(Float64,numberParticles).*15000
particleXHistories=Array{Float64}(undef,steps,numberParticles)
particleYHistories=Array{Float64}(undef,steps,numberParticles)
time=0
step=1
while step<=steps
    #Sun is 180 degrees out of sync with Jupiter.
    mass1Rotation=time/period*2*pi+pi
    mass2Rotation=time/period*2*pi
    mass1x,mass1y=MassCoords(mass1Rotation,d_COM_mass1)
    mass2x,mass2y=MassCoords(mass2Rotation,d_COM_mass2)
    mass1xHistory[step]=mass1x
    mass1yHistory[step]=mass1y
    mass2xHistory[step]=mass2x
    mass2yHistory[step]=mass2y
    particleXsRAM=Array(particleXs)
    particleYsRAM=Array(particleYs)
    for count in 1:numberParticles
        particleXHistories[step,count]=particleXsRAM[count]
        particleYHistories[step,count]=particleYsRAM[count]
    end
    particleXs,particleYs,particleVXs,particleVYs=PropagateParticle(particleXs,particleYs,particleVXs,particleVYs,mass1,mass2,d_COM_mass1,d_COM_mass2,G,mass1Rotation,mass2Rotation,period,timeStep)
    time=time+timeStep
    step=step+1
end
plt.figure(1,figsize=[7,7])
plt.plot(mass1xHistory,mass1yHistory)
plt.plot(mass2xHistory,mass2yHistory)
plt.plot(np.array(particleXHistories),np.array(particleYHistories))
plt.xlim([-9e11,9e11])
plt.ylim([-9e11,9e11])
plt.show()
The asteroids are given random starting locations and velocities. I used matplotlib over Plots because it was significantly faster when generating the first graph.
I apologise for the big block of code; I wanted to give a working example. In particular though, I want to draw your attention to the block from the line particleXsRAM=Array(particleXs) to the end of the for loop.
particleXsRAM=Array(particleXs)
particleYsRAM=Array(particleYs)
for count in 1:numberParticles
    particleXHistories[step,count]=particleXsRAM[count]
    particleYHistories[step,count]=particleYsRAM[count]
end
particleXs and particleYs are CuArrays.
This is the part where I am saving the data after each time-step, and I am slightly concerned about the performance of this block. Currently the fastest way I have found to save the data is by translating the CuArray into a normal Julia array, and putting that into a Julia 2D array element by element. However, I was wondering if there was a way to parallelise this using the higher-order abstractions?
So far, I've also tried creating a 2D CuArray for each asteroid's Xs and Ys and copying the data over element by element, but this seemed to be executed by the CPU instead of the GPU. I've also tried creating an array of CuArrays and, while this executed quickly, I wasn't able to make it plot in an amount of time that made the trade-off worthwhile.
This code does provide a performance boost over my non-parallelised code, but not as much as I had hoped.

I thought I had already tried this, but it turns out that Julia's normal array syntax works. You can just initialise two 2D CuArrays before the loop:
particleXHistories=CuArray{Float64}(undef,steps,numberParticles)
particleYHistories=CuArray{Float64}(undef,steps,numberParticles)
and then inside the loop update them like so:
particleXHistories[step,:]=particleXs
particleYHistories[step,:]=particleYs
This definitely speeds up the propagation part of my program. It does slow down the plotting part, as the resulting data still needs to be converted back to a normal Julia array using Array(particleXHistories) for both X and Y, but from my quick preliminary tests I think it's an overall speed increase.
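The same pattern, preallocating the full history once and assigning a whole row per step instead of copying element by element, can be sketched in Python with NumPy standing in for the device array (the sizes and the trivial "propagation" step here are illustrative only, not from the original code):

```python
import numpy as np

n_steps, n_particles = 100, 2000  # illustrative sizes

# Preallocate the full history once, outside the time loop
x_hist = np.empty((n_steps, n_particles))

xs = np.random.randn(n_particles) * 1e12  # stand-in for particleXs

for step in range(n_steps):
    # One vectorized row assignment replaces the per-element copy loop;
    # with a GPU array library this keeps the data device-side until
    # plotting time, when a single transfer back to the host suffices.
    x_hist[step, :] = xs
    xs = xs + 1.0  # stand-in for the real propagation step
```

The key point is that the data crosses the host/device boundary once, at the end, rather than once per time step.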

Related

Optimizing MATLAB work on N dim array(512,512,400)

I am working on images that are 512x512 pixels; I have written code that analyzes my images and gives me the values I need in matrices of dimensions (512,512,400) in roughly 10 minutes, using pre-allocation.
My problem is when I want to work with these matrices: it takes hours to see results, and I would like a script that does what I want in much less time. Can you help me?
% meanm is a matrix (512,512,400) that contains the mean of every inputmatrix
% sigmam is a matrix (512,512,400) that contains the std of every inputmatrix
% Basically what I want is that, for every inputmatrix (512x512) stored inside
% an array of dimensions (512,512,400),
% if a value is higher than meanm + sigmam it has to be replaced with
% the corresponding value of the meanm matrix.
p = 400;
for h = 1:p
    if (inputmatrix(:,:,h) > meanm(:,:,h) + sigmam(:,:,h))
        inputmatrix(:,:,h) = meanm(:,:,h);
    end
end
I know that MATLAB performs better on matrix calculations, but I have no idea how to translate this for loop over my 400 images into something easier for it.
Try using the condition of your for loop to make a logical matrix:
logical_mask = (meanm + sigmam) < inputmatrix;
inputmatrix(logical_mask) = meanm(logical_mask);
This should improve your performance by using two features of MATLAB:
Vectorization uses matrix operations instead of loops. To quote the linked site, "Vectorized code often runs much faster than the corresponding code containing loops."
Logical Indexing allows you to access all elements in your array that meet a condition simultaneously.
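For readers more at home in Python, the same mask-then-assign idiom looks like this in NumPy (small array sizes are used here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
inputmatrix = rng.standard_normal((8, 8, 4))  # small stand-in for (512,512,400)
meanm = np.zeros_like(inputmatrix)
sigmam = np.full_like(inputmatrix, 0.5)

# Logical mask: True wherever a value exceeds mean + std
logical_mask = (meanm + sigmam) < inputmatrix

# Logical (boolean) indexing: replace every flagged element in one step
inputmatrix[logical_mask] = meanm[logical_mask]
```

After this, no element of `inputmatrix` exceeds `meanm + sigmam`, with no explicit loop over the 400 slices.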

Fortran: Matrix value changing when using CGESV from LAPACK

I am solving a system of linear equations using the LAPACK routine CGESV with very large matrices. So far everything works OK, but there is an issue with computation time. Since I use each matrix repeatedly, I copy it to a temporary matrix before passing it into CGESV. That copying takes a very long time, considering the routine is called 1000+ times in a loop. Here is a rough illustration of what I am currently doing:
do 1 i=1,1000
Atemp = A(:,:,i) !takes about 2.5 sec to compute
CGESV(x,x,Atemp,x,x,b,x,x)
1 continue
where 'b' is a vector and Atemp is 10,000x10,000. I'd like to do something like this:
do 1 i=1,1000
CGESV(x,x,A(:,:,i),x,x,b,x,x)
1 continue
but the values in 'A' get changed and I can no longer reuse it. I need to increase the efficiency, since it is the difference between 1+ hours versus ~4 minutes of computation time.
My question is: is there a way to copy matrices quickly? If not, is there a way I can tell CGESV to return the same matrix? I only need the 'b' vector anyway. Or is there a better way to do this?
Changing this
do 1 i=1,1000
Atemp = A(:,:,i) !takes about 2.5 sec to compute
CGESV(x,x,Atemp,x,x,b,x,x)
1 continue
to this
atemp = a
do i=1,1000
CGESV(x,x,atemp(:,:,i),x,x,b,x,x)
end do
may well show a slight improvement in execution speed. Not due to the modernisation of the loop statements, but due to the single large copy rather than the repeated smaller copies. The modernisation of the loop statements is just for my fun.
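The same idea, one large up-front copy followed by in-place work on slices, can be sketched in Python/NumPy; `destructive_solve` below is a hypothetical stand-in for CGESV, which likewise overwrites its matrix argument:

```python
import numpy as np

def destructive_solve(a_slice, b):
    """Hypothetical stand-in for CGESV: solves a_slice @ x = b,
    then clobbers a_slice the way the LAPACK routine does."""
    x = np.linalg.solve(a_slice, b)
    a_slice[...] = 0.0  # simulate LAPACK overwriting the input matrix
    return x

n, m = 4, 10
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n, m)) + 3 * np.eye(n)[:, :, None]
b = rng.standard_normal(n)

Atemp = A.copy()  # one large copy, done once before the loop

for i in range(m):
    # Atemp[:, :, i] is a view, so the solver destroys only the copy
    # while the original A stays intact for reuse.
    x = destructive_solve(Atemp[:, :, i], b)
```

Only the copy is consumed; the original `A` survives untouched for the next pass.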

Vectorization of matlab code

I'm kind of new to vectorization. I have tried myself but couldn't manage it. Can somebody help me vectorize this code, and also give a short explanation of how you do it, so that I can adapt the thinking process too? Thanks.
function [result] = newHitTest (point,Polygon,r,tol,stepSize)
%This function calculates whether a point is allowed.
%First, a quick test is done by calculating the distance from the point to
%each point of the polygon. If that distance is smaller than range "r",
%the point is not allowed. This will slow down the algorithm at some
%points, but will greatly speed it up at others because fewer calls to the
%circleTest routine are needed.
polySize = size(Polygon,1);
testCounter = 0;
for i = 1:polySize
    d = sqrt(sum((Polygon(i,:)-point).^2));
    if d < tol*r
        testCounter = 1;
        break
    end
end
if testCounter == 0
    circleTestResult = circleTest(point,Polygon,r,tol,stepSize);
    testCounter = circleTestResult;
end
result = testCounter;
Given the information that Polygon is 2 dimensional, point is a row vector and the other variables are scalars, here is the first version of your new function (scroll down to see that there are lots of ways to skin this cat):
function [result] = newHitTest (point,Polygon,r,tol,stepSize)
result = 0;
linDiff = Polygon-repmat(point,size(Polygon,1),1);
testLogicals = sqrt( sum( ( linDiff ).^2 ,2 )) < tol*r;
if any(testLogicals); result = circleTest (point,Polygon,r,tol,stepSize); end
The thought process for vectorization in Matlab involves trying to operate on as much data as possible using a single command. Most of the basic builtin Matlab functions operate very efficiently on multi-dimensional data. Using a for loop is the reverse of this, as you are breaking your data down into smaller segments for processing, each of which must be interpreted individually. By resorting to data decomposition using for loops, you potentially lose some of the massive performance benefits associated with the highly optimised code behind the Matlab builtin functions.
The first thing to think about in your example is the conditional break in your main loop. You cannot break from a vectorized process. Instead, calculate all possibilities, make an array of the outcome for each row of your data, then use the any keyword to see if any of your rows have signalled that the circleTest function should be called.
NOTE: It is not easy to efficiently conditionally break out of a calculation in Matlab. However, as you are just computing a form of Euclidean distance in the loop, you'll probably see a performance boost by using the vectorized version and calculating all possibilities. If the computation in your loop were more expensive, the input data were large, and you wanted to break out as soon as you hit a certain condition, then a matlab extension made with a compiled language could potentially be much faster than a vectorized version where you might be performing needless calculation. However this is assuming that you know how to program code that matches the performance of the Matlab builtins in a language that compiles to native code.
Back on topic ...
The first thing to do is to take the linear difference (linDiff in the code example) between Polygon and your row vector point. To do this in a vectorized manner, the dimensions of the 2 variables must be identical. One way to achieve this is to use repmat to copy each row of point to make it the same size as Polygon. However, bsxfun is usually a superior alternative to repmat (as described in this recent SO question), making the code ...
function [result] = newHitTest (point,Polygon,r,tol,stepSize)
result = 0;
linDiff = bsxfun(@minus, Polygon, point);
testLogicals = sqrt( sum( ( linDiff ).^2 ,2 )) < tol*r;
if any(testLogicals); result = circleTest (point,Polygon,r,tol,stepSize); end
I rolled your d values into a column vector by summing across the 2nd axis (note the removal of the array index from Polygon and the addition of ,2 in the sum command). I then went further and evaluated the logical array testLogicals inline with the calculation of the distance measure. You will quickly see that a downside of heavy vectorisation is that it can make the code less readable to those not familiar with Matlab, but the performance gains are worth it. Comments are pretty necessary.
Now, if you want to go completely crazy, you could argue that the test function is so simple now that it warrants use of an 'anonymous function' or 'lambda' rather than a complete function definition. The test for whether or not it is worth doing the circleTest does not require the stepSize argument either, which is another reason for perhaps using an anonymous function. You can roll your test into an anonymous function and then just use circleTest in your calling script, making the code self-documenting to some extent . . .
doCircleTest = @(point,Polygon,r,tol) any(sqrt( sum( bsxfun(@minus, Polygon, point).^2, 2 )) < tol*r);
if doCircleTest(point,Polygon,r,tol)
    result = circleTest(point,Polygon,r,tol,stepSize);
else
    result = 0;
end
Now everything is vectorised, the use of function handles gives me another idea . . .
If you plan on performing this test at multiple points in the code, the repetition of the if statements would get a bit ugly. To stay DRY, it seems sensible to put the test together with the conditional function into a single function, just as you did in your original post. However, the utility of that function would be very narrow: it would only test whether the circleTest function should be executed, and then execute it if need be.
Now imagine that after a while, you have some other conditional functions, just like circleTest, with their own equivalent of doCircleTest. It would be nice to reuse the conditional switching code maybe. For this, make a function like your original that takes a default value, the boolean result of the computationally cheap test function, and the function handle of the expensive conditional function with its associated arguments ...
function result = conditionalFun( default, cheapFunResult, expensiveFun, varargin )
    if cheapFunResult
        result = expensiveFun(varargin{:});
    else
        result = default;
    end
end %//of function
You could call this function from your main script with the following . . .
result = conditionalFun(0, doCircleTest(point,Polygon,r,tol), @circleTest, point,Polygon,r,tol,stepSize);
...and the beauty of it is you can use any test, default value, and expensive function. Perhaps a little overkill for this simple example, but it is where my mind wandered when I brought up the idea of using function handles.
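For comparison, the core vectorized test translates naturally to Python, where NumPy broadcasting plays the role of repmat/bsxfun (names mirror the MATLAB version; the square and test points are illustrative):

```python
import numpy as np

def do_circle_test(point, polygon, r, tol):
    """True if any polygon vertex lies within tol*r of point."""
    diffs = polygon - point                    # (n,2) - (2,): broadcast per row
    dists = np.sqrt((diffs ** 2).sum(axis=1))  # Euclidean distance per vertex
    return bool((dists < tol * r).any())

square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
near = do_circle_test(np.array([0.1, 0.1]), square, r=1.0, tol=0.5)
far = do_circle_test(np.array([5.0, 5.0]), square, r=1.0, tol=0.5)
```

As in the MATLAB version, the loop, the break, and the counter all collapse into one array expression plus an `any`.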

Lua: Code optimization vector length calculation

I have a script in a game with a function that gets called every second, in which distances between player objects and other game objects are calculated. The problem is that there can theoretically be 800 function calls in 1 second (max 40 players * 2 main objects, each with 1 up to 10 sub-objects). I have to optimize this function to use less processing. This is my current function:
local square = math.sqrt;
local getDistance = function(a, b)
    local x, y, z = a.x-b.x, a.y-b.y, a.z-b.z;
    return square(x*x+y*y+z*z);
end;
-- for example followed by: for i = 800, 1 do getDistance(posA, posB); end
I found out that the localization of the math.sqrt function through
local square = math.sqrt;
is a big optimization with regard to speed, and that the code
x*x+y*y+z*z
is faster than this code:
x^2+y^2+z^2
I don't know whether localizing x, y and z is better than using the "." accessor twice, so maybe square((a.x-b.x)*(a.x-b.x)+(a.y-b.y)*(a.y-b.y)+(a.z-b.z)*(a.z-b.z)) is better than the code
local x, y, z = a.x-b.x, a.y-b.y, a.z-b.z;
square(x*x+y*y+z*z);
Is there a better way in maths to calculate the vector length or are there more performance tips in Lua?
You should read Roberto Ierusalimschy's Lua Performance Tips (Roberto is the chief architect of Lua). It touches on some of the small optimizations you're asking about (such as localizing library functions and replacing exponents with their multiplicative equivalents). Most importantly, it conveys one of the most important and overlooked ideas in engineering: sometimes the best solution involves changing your problem. You're not going to fix a 30-million-calculation leak by reducing the number of CPU cycles each calculation takes.
In your specific case of distance calculation, you'll find it's best to make your primitive calculation return the intermediate sum representing squared distance and allow the use case to call the final Pythagorean step only if they need it, which they often don't (for instance, you don't need to perform the square root to compare which of two squared lengths is longer).
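A minimal Python sketch of that advice: make the squared distance the primitive, and apply the Pythagorean square root only in the call sites that actually need a true length:

```python
import math

def dist_squared(a, b):
    """Cheap primitive: squared Euclidean distance, no square root."""
    dx, dy, dz = a[0] - b[0], a[1] - b[1], a[2] - b[2]
    return dx * dx + dy * dy + dz * dz

def is_within(a, b, radius):
    """Range check with no square root: compare squared quantities."""
    return dist_squared(a, b) < radius * radius

def distance(a, b):
    """Full Pythagorean step, only for callers that need the real length."""
    return math.sqrt(dist_squared(a, b))
```

Range checks and "which is nearer" comparisons go through `is_within`/`dist_squared`; only code that displays or accumulates actual lengths pays for the sqrt.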
This really should come before any discussion of optimization, though: don't worry about problems that aren't the problem. Rather than scouring your code for every possible issue, jump directly to fixing the biggest one. Unless performance is your most glaring issue, ahead of missing functionality, bugs and/or UX shortcomings, it's nigh-impossible for micro-inefficiencies to have piled up to the point of outpacing a single bottleneck statement.
Or, as the opening of the cited article states:
In Lua, as in any other programming language, we should always follow the two
maxims of program optimization:
Rule #1: Don’t do it.
Rule #2: Don’t do it yet. (for experts only)
I honestly doubt these kinds of micro-optimizations help much.
You should be focusing on your algorithms instead: for example, get rid of some distance calculations through pruning, stop calculating square roots when you only need a comparison (tip: if a^2 < b^2 and a > 0 and b > 0, then a < b), etc.
Your "brute force" approach doesn't scale well.
What I mean by that is that every new object/player included in the system increases the number of operations significantly:
+---------+--------------+
| objects | calculations |
+---------+--------------+
| 40 | 1600 |
| 45 | 2025 |
| 50 | 2500 |
| 55 | 3025 |
| 60 | 3600 |
... ... ...
| 100 | 10000 |
+---------+--------------+
If you keep comparing "everything with everything", your algorithm will start taking more and more CPU cycles, quadratically.
The best option you have for optimizing your code isn't "fine-tuning" the math operations or using local variables instead of references.
What will really boost your algorithm is eliminating calculations that you don't need.
The most obvious example would be not calculating the distance between Player1 and Player2 if you have already calculated the distance between Player2 and Player1. This simple optimization alone should cut your time roughly in half.
Another very common implementation consists of dividing the space into "zones". When two objects are in the same zone, you calculate the distance between them normally. When they are in different zones, you use an approximation. The ideal way of dividing the space depends on your context; one example is dividing the space into a grid and, for players in different squares, using the distance between the centers of their squares, which you have computed in advance.
There's a whole branch of programming dealing with this issue; it's called Space Partitioning. Give this a look:
http://en.wikipedia.org/wiki/Space_partitioning
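A small Python sketch of the grid flavour of space partitioning: points are bucketed into cells whose side equals the interaction radius, so each point is only tested against points in its own and neighbouring cells instead of against everything:

```python
from collections import defaultdict

def close_pairs_grid(points, radius):
    """Return index pairs (i, j), i < j, closer than radius, using a grid."""
    cell = radius  # cell side >= radius ensures neighbour cells always suffice
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(int(x // cell), int(y // cell))].append(idx)
    r2 = radius * radius
    pairs = set()
    for (cx, cy), members in grid.items():
        # Only this cell and its 8 neighbours can contain close partners
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j:
                            (x1, y1), (x2, y2) = points[i], points[j]
                            if (x1 - x2) ** 2 + (y1 - y2) ** 2 < r2:
                                pairs.add((i, j))
    return pairs
```

Rebuilding the grid each frame costs O(n); the pair search then scales with local density rather than with n^2 as in the all-pairs approach.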
Seriously?
Running 800 of those calculations should not take more than 0.001 seconds, even in Lua on a phone.
Did you do some profiling to see if it's really slowing you down? Did you replace that function with "return 0" to verify that performance improves (yes, the functionality will be lost)?
Are you sure it runs every second and not every millisecond?
I haven't seen an issue with running 800 of anything simple in 1 second since about 1987.
If you want to calculate sqrt for a positive number a, use the recurrence
x_0 = a
x_(n+1) = 1/2 * (x_n + a / x_n)
x_n converges to sqrt(a) as n -> infinity; the first several iterations should be fast enough.
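That recurrence is Newton's method (the Babylonian method) for x^2 = a; a quick Python sketch, iterating until successive estimates agree to a relative tolerance:

```python
def babylonian_sqrt(a, tol=1e-12):
    """Approximate sqrt(a) for a > 0 via x_(n+1) = (x_n + a/x_n) / 2."""
    x = a  # x_0 = a, as in the recurrence above
    while True:
        nxt = 0.5 * (x + a / x)
        if abs(nxt - x) < tol * nxt:  # relative stopping criterion
            return nxt
        x = nxt
```

Convergence is quadratic once the estimate is close, so only a handful of iterations are needed in practice.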
BTW, maybe you could try the following formula for the length of a vector instead of the standard one:
local getDistance = function(a, b)
    local x, y, z = a.x-b.x, a.y-b.y, a.z-b.z;
    return math.abs(x)+math.abs(y)+math.abs(z);
end;
It's much easier to compute, and in some cases (e.g. if the distance is only needed to decide whether two objects are close) it may be adequate.

AI: selecting immediate acceleration/rotation to get to a final point

I'm working on a game where, on each update of the game loop, the AI is run. During this update, I have the chance to turn the AI-controlled entity and/or make it accelerate in the direction it is facing. I want it to reach a final location (within reasonable range) and, at that location, have a specific velocity and direction (again, it doesn't need to be exact). That is, given a current:
P0(x, y) = Current position vector
V0(x, y) = Current velocity vector (units/second)
θ0 = Current direction (radians)
τmax = Max turn speed (radians/second)
αmax = Max acceleration (units/second^2)
|V|max = Absolute max speed (units/second)
Pf(x, y) = Target position vector
Vf(x, y) = Target velocity vector (units/second)
θf = Target rotation (radians)
Select an immediate:
τ = A turn speed within [-τmax, τmax]
α = An acceleration scalar within [0, αmax] (must accelerate in direction it's currently facing)
Such that these are minimized:
t = Total time to move to the destination
|Pt-Pf| = Distance from target position at end
|Vt-Vf| = Deviation from target velocity at end
|θt-θf| = Deviation from target rotation at end (wrapped to (-π,π))
The parameters can be re-computed during each iteration of the game loop. A picture says 1000 words so for example given the current state as the blue dude, reach approximately the state of the red dude within as short a time as possible (arrows are velocity):
Pic http://public.blu.livefilestore.com/y1p6zWlGWeATDQCM80G6gaDaX43BUik0DbFukbwE9I4rMk8axYpKwVS5-43rbwG9aZQmttJXd68NDAtYpYL6ugQXg/words.gif
Assuming a constant α and τ for Δt (Δt → 0 for an ideal solution) and splitting position/velocity into components, this gives (I think, my math is probably off):
Equations http://public.blu.livefilestore.com/y1p6zWlGWeATDTF9DZsTdHiio4dAKGrvSzg904W9cOeaeLpAE3MJzGZFokcZ-ZY21d0RGQ7VTxHIS88uC8-iDAV7g/equations.gif
(EDIT: that last one should be θ = θ0 + τΔt)
So, how do I select an immediate α and τ (remember these will be recomputed every iteration of the game loop, usually > 100 fps)? The simplest, most naive way I can think of is:
Select a Δt equal to the average of the last few Δts between updates of the game loop (i.e. very small)
Compute the above 5 equations of the next step for all combinations of (α, τ) = {0, αmax} x {-τmax, 0, τmax} (only 6 combinations and 5 equations for each, so it shouldn't take too long, and since they are run so often, the rather restrictive ranges will be amortized in the end)
Assign weights to position, velocity and rotation. Perhaps these weights could be dynamic (i.e. the further from position the entity is, the more position is weighted).
Greedily choose the one that minimizes these for the location Δt from now.
It's potentially fast and simple; however, there are a few glaring problems with this:
Arbitrary selection of weights
It's a greedy algorithm that (by its very nature) can't backtrack
It doesn't really take into account the problem space
If it frequently changes acceleration or turns, the animation could look "jerky".
Note that while the algorithm can (and probably should) save state between iterations, Pf, Vf and θf can change every iteration (i.e. if the entity is trying to follow/position itself near another), so the algorithm needs to be able to adapt to changing conditions.
Any ideas? Is there a simple solution for this I'm missing?
Thanks,
Robert
Sounds like you want a PD controller. Basically, draw a line from the current position to the target. Take the line's direction in radians; that's your target heading. The current error in radians is current heading - line heading. Call it Eh (heading error). Then you set the current turn rate to Kp*Eh + Kd*(d/dt)Eh. Do this every step with a new line.
That's for heading.
Acceleration is "accelerate until I've reached max speed or I won't be able to stop in time". You threw up a bunch of integrals, so I'm sure you'll be fine with that calculation.
In case you're wondering: yes, I've solved this problem before, and a PD controller works. Don't bother with PID; you don't need the integral term in this case. Prototype in Matlab. There is one thing I've left out: you need a trigger, like "I'm getting really close now, so I should start turning onto the target." I just read your clarification about only accelerating in the direction you're heading. That changes things a bit, but not too much: it means you need to approach the target "from behind", so the line target will have to be behind the real target; when you get near the behind-target, follow a new line that guides you to the real target. You'll also want to follow the lines, rather than just pick a heading and try to stick with it. So don't update the line each frame; instead, say the error is equal to the SIGNED DISTANCE FROM THE CURRENT TARGET LINE. The PD will give you a turn rate, acceleration is trivial, so you're set. You'll need to tweak Kp and Kd by hand; that's why I said Matlab first. (Octave is good too.)
Good luck; hope this points you in the right direction ;)
Pun intended.
EDIT: I just read that... lots of stuff, written real quick. This is a line-following solution to your problem; google "line following" to accompany this answer if you want to take this solution as a basis for solving the problem.
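A toy Python sketch of the PD heading loop described above; kp, kd, tau_max and the step count are illustrative values that would be tuned by hand, as the answer suggests:

```python
def simulate_heading(theta0, theta_target, kp=2.0, kd=0.2, tau_max=1.0,
                     dt=0.01, steps=2000):
    """Drive a heading toward theta_target with a PD controller,
    clamping the commanded turn rate to the entity's max turn speed."""
    theta, prev_err = theta0, None
    for _ in range(steps):
        err = theta_target - theta                 # Eh, the heading error
        deriv = 0.0 if prev_err is None else (err - prev_err) / dt
        turn = kp * err + kd * deriv               # PD law: Kp*Eh + Kd*dEh/dt
        turn = max(-tau_max, min(tau_max, turn))   # respect the turn-rate limit
        theta += turn * dt
        prev_err = err
    return theta
```

The clamp models the τmax constraint from the question: the controller saturates while the error is large, then settles smoothly in the linear region.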
I would like to suggest that you consider bang-bang control (http://en.wikipedia.org/wiki/Bang%E2%80%93bang_control) as well as a PID or PD controller. The quantities you are trying to minimise don't seem to produce any penalty for pushing the accelerator down as far as it will go until it comes time to push the brake down as far as it will go, except for your point about how jerky this will look. At the very least, this provides some justification for your initial guess.
