Why is a useless if statement improving performance? - performance

I'm writing a small library in Rust as a 'collision detection' module. In trying to make a line-segment/line-segment intersection test as fast as possible, I've run into a strange discrepancy where seemingly useless code is making the function run faster.
My goal is to optimise this method as much as possible, and I imagine that knowing why the generated code changes performance as significantly as it does would help in achieving that goal. Whether it's a quirk of the Rust compiler or something else, if anyone is knowledgeable enough to understand this, here's what I have so far:
The two functions:
fn line_line(a1x: f64, a1y: f64, a2x: f64, a2y: f64, b1x: f64, b1y: f64, b2x: f64, b2y: f64) -> bool {
    let dax = a2x - a1x;
    let day = a2y - a1y;
    let dbx = b2x - b1x;
    let dby = b2y - b1y;
    let dot = dax * dby - dbx * day;
    let dd = dot * dot;
    let nd1x = a1x - b1x;
    let nd1y = a1y - b1y;
    let t = (dax * nd1y - day * nd1x) * dot;
    let u = (dbx * nd1y - dby * nd1x) * dot;
    u >= 0.0 && u <= dd && t >= 0.0 && t <= dd
}

fn line_line_if(a1x: f64, a1y: f64, a2x: f64, a2y: f64, b1x: f64, b1y: f64, b2x: f64, b2y: f64) -> bool {
    let dax = a2x - a1x;
    let day = a2y - a1y;
    let dbx = b2x - b1x;
    let dby = b2y - b1y;
    let dot = dax * dby - dbx * day;
    if dot == 0.0 { return false } // useless branch if dot isn't 0 ?
    let dd = dot * dot;
    let nd1x = a1x - b1x;
    let nd1y = a1y - b1y;
    let t = (dax * nd1y - day * nd1x) * dot;
    let u = (dbx * nd1y - dby * nd1x) * dot;
    u >= 0.0 && u <= dd && t >= 0.0 && t <= dd
}
Criterion-rs bench code:
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("line-line", |b| b.iter(|| line_line(
        black_box(0.0), black_box(1.0), black_box(2.0), black_box(0.0),
        black_box(1.0), black_box(0.0), black_box(2.0), black_box(4.0))));
    c.bench_function("line-line-if", |b| b.iter(|| line_line_if(
        black_box(0.0), black_box(1.0), black_box(2.0), black_box(0.0),
        black_box(1.0), black_box(0.0), black_box(2.0), black_box(4.0))));
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
Note that with these inputs, the if statement will not evaluate to true.
assert_eq!(line_line_if(0.0, 1.0, 2.0, 0.0, 1.0, 0.0, 2.0, 4.0), true); // does not panic
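For reference, working through the arithmetic with these arguments shows why:
dax = 2.0, day = -1.0, dbx = 1.0, dby = 4.0
dot = dax*dby - dbx*day = 2.0*4.0 - 1.0*(-1.0) = 9.0   (non-zero, so the early return is skipped)
dd = 81.0, t = 9.0, u = 45.0                            (both t and u lie in [0, dd], so the function returns true)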
Criterion-rs bench results:
line-line time: [5.6718 ns 5.7203 ns 5.7640 ns]
line-line-if time: [5.1215 ns 5.1791 ns 5.2312 ns]
Godbolt.org Rust compilation results for these two pieces of code.
Unfortunately I can't read asm as well as I'd like. I also don't know how to ask rustc on the website to compile this without inlining the entire calculation, but if you can do that, I hope this may be of help.
My current environment is:
- Windows 10 (latest standard)
- Ryzen 3200G CPU
- Standard Rust install (no nightly/beta), compiling with znver1 (last I checked)
Testing is being done with the cargo bench command, which I assume compiles with optimizations, though I haven't found confirmation of this anywhere. Criterion-rs is doing all of the benchmark handling.
Thank you in advance if you're able to answer this.
EDIT:
Comparing the code in the godbolt link with -C opt-level=3 --target x86_64-pc-windows-msvc (the toolchain currently used by my installation of rustc) reveals that the compiler has inserted many unpcklpd and unpckhpd instructions into the function, which may be slowing the code down rather than optimizing it. Setting the opt-level parameter to 2 does remove the extra unpck instructions without otherwise significantly changing the compilation result. However, adding
[profile.release]
opt-level = 2
to Cargo.toml, while causing a recompilation, did not remove the backwards performance; the same holds at opt-level = 1. I do not have target-cpu=native specified.

Related

Basic F# / Rust performance comparison

This is a simplistic performance test based on https://www.youtube.com/watch?v=QlMLB2-G25c, which compares the performance of Rust vs WASM vs Python vs Go.
The original Rust program (from https://github.com/masmullin2000/random-sort-examples) is:
use rand::prelude::*;

fn main() {
    let vec = make_random_vec(1_000_000, 100);
    for _ in 0..250 {
        let mut v = vec.clone();
        // v.sort_unstable();
        v.sort(); // using stable sort as f# sort is a stable sort
    }
}

pub fn make_random_vec(sz: usize, modulus: i64) -> Vec<i64> {
    let mut v: Vec<i64> = Vec::with_capacity(sz);
    for _ in 0..sz {
        let x: i64 = random();
        v.push(x % modulus);
    }
    v
}
So I created the following F# program to compare against Rust:
open System

let rec cls (arr:int64 array) count =
    if count > 0 then
        let v1 = Array.copy arr
        let v2 = Array.sort v1
        cls arr (count-1)
    else
        ()

let rnd = Random()
let rndArray = Array.init 1000000 (fun _ -> int64 (rnd.Next(100)))
cls rndArray 250 |> ignore
I was expecting F# to be slower (both running on WSL2), but got the following times on my 8th-gen Core i7 laptop:
Rust - around 17 seconds
Rust (unstable sort) - around 2.7 seconds
F# - around 11 seconds
My questions:
Is the dotnet compiler doing some sort of optimisation that throws away some of the processing because the return values are not being used, resulting in the F# code running faster, or am I doing something wrong?
Does F# have an unstable sort function that I can use to compare against the Rust unstable sort?
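One thing I could try for the second question (a sketch, not yet benchmarked): as far as I know, F#'s Array.sortInPlace delegates to System.Array.Sort, which is an unstable introspective sort, so it should be the closest analogue to Rust's sort_unstable. The snippet also consumes the sorted data so the work can't simply be thrown away:
open System

let rnd = Random()
let source = Array.init 1_000_000 (fun _ -> int64 (rnd.Next(100)))

let mutable checksum = 0L
for _ in 1 .. 250 do
    let v = Array.copy source
    Array.sortInPlace v                       // in-place, unstable sort
    checksum <- checksum + v.[v.Length - 1]   // consume the result so the work cannot be discarded
printfn "checksum: %d" checksum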

Int32 vs Float64 performances in Crystal

I ran this benchmark and was very surprised to see that Crystal's performance is almost the same for Int32 and Float64 operations.
$ crystal benchmarks/int_vs_float.cr --release
int32 414.96M ( 2.41ns) (±14.81%) 0.0B/op fastest
float64 354.27M ( 2.82ns) (±12.46%) 0.0B/op 1.17× slower
Do I have some weird side effects in my benchmark code?
require "benchmark"
res = 0
res2 = 0.0
Benchmark.ips do |x|
x.report("int32") do
a = 128973 / 119236
b = 119236 - 128973
d = 117232 > 123462 ? 117232 * 123462 : 123462 / 117232
res = a + b + d
end
x.report("float64") do
a = 1.28973 / 1.19236
b = 1.19236 - 1.28973
d = 1.17232 > 1.23462 ? 1.17232 * 1.23462 : 1.23462 / 1.17232
res = a + b + d
end
end
puts res
puts res2
First of all / in Crystal is float division, so this is largely comparing floats:
typeof(a) # => Float64
typeof(b) # => Int32
typeof(d) # => (Float64 | Int32)
If we fix the benchmark to use integer division, //, I get:
int32 631.35M ( 1.58ns) (± 5.53%) 0.0B/op 1.23× slower
float64 773.57M ( 1.29ns) (± 3.21%) 0.0B/op fastest
Still no real difference, within the error margin. Why's that? Let's dig deeper. First we can extract the examples into a non-inlinable function and make sure to call it so Crystal doesn't just ignore it:
@[NoInline]
def calc
  a = 128973 // 119236
  b = 119236 - 128973
  d = 117232 > 123462 ? 117232 * 123462 : 123462 // 117232
  a + b + d
end

p calc
Then we can build this with crystal build --release --no-debug --emit llvm-ir to obtain an .ll file with the optimized LLVM-IR. We dig out our calc function and see something like this:
define i32 #"*calc:Int32"() local_unnamed_addr #19 {
alloca:
%0 = tail call i1 #llvm.expect.i1(i1 false, i1 false)
br i1 %0, label %overflow, label %normal6
overflow: ; preds = %alloca
tail call void #__crystal_raise_overflow()
unreachable
normal6: ; preds = %alloca
ret i32 -9735
}
Where's all our calculations gone? LLVM did them at compile time because it was all constants! We can repeat the experiment with the Float64 example:
define double #"*calc:Float64"() local_unnamed_addr #11 {
alloca:
ret double 0x40004CAA3B35919C
}
A little less boilerplate, hence it being slightly faster, but again, all precomputed!
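As a quick sanity check, both constants match what the expressions work out to by hand:
a = 128973 // 119236 = 1
b = 119236 - 128973  = -9737
d = 123462 // 117232 = 1        (117232 > 123462 is false, so the division branch is taken)
a + b + d            = -9735    (exactly the ret i32 -9735 above)
and 0x40004CAA3B35919C is roughly 2.0374, which is what the Float64 expression evaluates to.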
I'll end the exercise here. Further research for the reader:
What happens if we try to introduce non-constant terms into all expressions?
Is the premise that 32-bit integer operations should be any faster or slower than 64-bit IEEE 754 floating-point operations on a modern 64-bit CPU sane?

What is the simplest method to achieve 100% CPU usage in F#?

Sometimes I write code that I expect to be able to fully saturate the CPU.
For example, to calculate the Mandelbrot set, you might use something like this:
open System.Numerics

type MandelbrotPoint =
    | Escaped of int
    | NotEscaped

let getIterationCount maxIters (c:Complex) =
    let rec innerFunction iters (z:Complex) =
        match z.Magnitude, iters with
        | m, i when m > 2.0 -> Escaped i
        | _, i when i > maxIters -> NotEscaped
        | _ -> innerFunction (iters + 1) (z * z + c)
    innerFunction 0 c

let getIterationCounts (topLeft:Complex) pixelWidth pixelHeight realWidth =
    let xGap = realWidth / ((pixelWidth - 1) |> float)
    [| for iY in 0 .. (pixelHeight - 1) do
         for iX in 0 .. (pixelWidth - 1) do
             yield Complex(topLeft.Real + xGap * (float iX), topLeft.Imaginary - xGap * (float iY)) |]
    |> Array.Parallel.map (getIterationCount 1000)
Naively, I would expect this to run at close to 100% until complete, but it bounces around between 25% and 60%.
I get that calculations are often constrained by how long it takes data to move into and out of the CPU cache, but that shouldn't be a problem here, right? There isn't much data to move; it's just a simple iterative calculation, no?
On my 4-core 8-thread CPU the naive snippet below achieves pretty close to 100% CPU saturation by the FSI process:
let consume (x: int) =
    while true do
        let _ = x*x
        ()

[|0..7|] |> Array.Parallel.iter consume
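In the same spirit, one way to check whether the Mandelbrot kernel itself can keep the CPU busy is to spin every logical core on a single point (a sketch only, assuming the getIterationCount function from the question and 8 logical cores):

open System.Numerics

// run the kernel forever on one fixed point from each of 8 parallel workers;
// if this pegs the CPU, the per-point calculation is not the bottleneck
let spin (_: int) =
    let c = Complex(-0.5, 0.5)
    while true do
        getIterationCount 1000 c |> ignore

[|0..7|] |> Array.Parallel.iter spin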

How to improve performance with F# idioms

I'm using this course on Machine-Learning to learn F# at the same time. I've done the following homework exercise which is the first exercise of the second week:
Run a computer simulation for flipping 1,000 virtual fair coins. Flip
each coin independently 10 times. Focus on 3 coins as follows: c1
is the first coin flipped, crand is a coin chosen randomly from
the 1,000, and cmin is the coin which had the minimum frequency of
heads (pick the earlier one in case of a tie).
Let ν1, νrand, and νmin be the fraction of heads obtained for the 3 respective coins out of the 10 tosses. Run the experiment 100,000 times in order to get a full distribution of ν1, νrand, and νmin (note that crand and cmin will change from run to run).
What is the average value of νmin?
I have produced the following code, which works fine and gives the correct answer:
let private rnd = System.Random()

let FlipCoin() = rnd.NextDouble() > 0.5

let FlipCoinNTimes N = List.init N (fun _ -> FlipCoin())

let FlipMCoinsNTimes M N = List.init M (fun _ -> FlipCoinNTimes N)

let ObtainFrequencyOfHeads tosses =
    let heads = tosses |> List.filter (fun toss -> toss = true)
    float (List.length (heads)) / float (List.length (tosses))

let GetFirstRandMinHeadsFraction allCoinsLaunchs =
    let first = ObtainFrequencyOfHeads(List.head (allCoinsLaunchs))
    let randomCoin = List.item (rnd.Next(List.length (allCoinsLaunchs))) allCoinsLaunchs
    let random = ObtainFrequencyOfHeads(randomCoin)
    let min =
        allCoinsLaunchs
        |> List.map (fun coin -> ObtainFrequencyOfHeads coin)
        |> List.min
    (first, random, min)

module Exercice1 =
    let GetResult() =
        Seq.init 100000 (fun _ -> FlipMCoinsNTimes 1000 10)
        |> Seq.map (fun oneExperiment -> GetFirstRandMinHeadsFraction oneExperiment)
        |> Seq.map (fun (first, random, min) -> min)
        |> Seq.average
However, it takes roughly 4 minutes to run on my machine. I know that it is doing a lot of work, but I'm wondering if there are some modifications that could be made to optimize it.
As I'm trying to learn F#, I'm asking for optimizations that use F# idioms, not for changes that make the code C-style.
Feel free to suggest any kind of improvement, in style, good practices, etc.
[UPDATE]
I have written some code to compare the proposed solutions; it is accessible here.
These are the results:
Base - result: 0.037510, time elapsed: 00:00:55.1274883, improvement: 0.99 x
Matthew Mcveigh - result: 0.037497, time elapsed: 00:00:15.1682052, improvement: 3.61 x
Fyodor Soikin - result: 0.037524, time elapsed: 00:01:29.7168787, improvement: 0.61 x
GuyCoder - result: 0.037645, time elapsed: 00:00:02.0883482, improvement: 26.25 x
GuyCoder MathNet - result: 0.037666, time elapsed: 00:00:24.7596117, improvement: 2.21 x
TheQuickBrownFox - result: 0.037494, time elapsed: 00:00:34.2831239, improvement: 1.60 x
The winner concerning the improvement in time is GuyCoder, so I will accept his answer. However, I find his code more difficult to understand.
Allocating a large number of lists up front is heavy work; the algorithm can be processed online, e.g. via sequences or recursion. I transformed all the work into tail-recursive functions for some raw speed (they will be turned into loops by the compiler).
Not guaranteed to be 100% correct, but hopefully it gives you a gist of where I was going with it:
let private rnd = System.Random()

let flipCoin () = rnd.NextDouble() > 0.5

let frequencyOfHeads flipsPerCoin =
    let rec countHeads numHeads i =
        if i < flipsPerCoin then
            let isHead = flipCoin ()
            countHeads (if isHead then numHeads + 1 else numHeads) (i + 1)
        else
            float numHeads
    countHeads 0 0 / float flipsPerCoin

let getFirstRandMinHeadsFraction numCoins flipsPerCoin =
    let randomCoinI = rnd.Next numCoins
    let rec run first random min i =
        if i < numCoins then
            let frequency = frequencyOfHeads flipsPerCoin
            let first = if i = 0 then frequency else first
            let random = if i = randomCoinI then frequency else random
            let min = if min > frequency then frequency else min
            run first random min (i + 1)
        else
            (first, random, min)
    run 0.0 0.0 System.Double.MaxValue 0

module Exercice1 =
    let getResult () =
        let iterations, numCoins, numFlips = 100000, 1000, 10

        let getMinFromExperiment () =
            let (_, _, min) = getFirstRandMinHeadsFraction numCoins numFlips
            min

        let rec sumMinFromExperiments i sumOfMin =
            if i < iterations then
                sumMinFromExperiments (i + 1) (sumOfMin + getMinFromExperiment ())
            else
                sumOfMin

        let sum = sumMinFromExperiments 0 0.0
        sum / float iterations
Running your code on my computer and timing I get:
seconds: 68.481918
result: 0.47570994
Running my code on my computer and timing I get:
seconds: 14.003861
vOne: 0.498963
vRnd: 0.499793
vMin: 0.037675
with vMin being closest to the correct answer, (b) 0.01.
That is almost 5x faster.
I did not tinker with each method and data structure to figure out why and what worked; I just used many decades of experience to guide me. Clearly, not storing the intermediate values but just the results is a big improvement. Specifically, coinTest just returns the number of heads, which is an int, and not a list of the results. Also, instead of getting a random number for each coin flip, getting one random number for each coin and then using each bit of that number as a coin flip is advantageous. That saves (number of flips - 1) calls to a function. Also, I avoided using float values until the very end; I don't consider that saving time on the CPU, but it did simplify the thought process of thinking only in int, which allowed me to concentrate on other efficiencies. I know that may sound weird, but the less I have to think about, the better the answers I get. I also only ran coinTest when it was necessary, e.g. only the first coin, only the random coin, and looked for all tails as an exit condition.
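In isolation, the one-random-number-per-coin trick looks roughly like this (my own simplified sketch of the idea; the full program follows below):

let rnd = System.Random()

// draw one random int and treat its low bits as the coin flips,
// counting the 1 bits as heads
let headsForOneCoin numberOfFlips =
    let rec countBits bits remaining heads =
        if remaining = 0 then heads
        else countBits (bits >>> 1) (remaining - 1) (heads + (bits &&& 1))
    countBits (rnd.Next(1 <<< numberOfFlips)) numberOfFlips 0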
namespace Workspace

module main =

    [<EntryPoint>]
    let main argv =
        let rnd = System.Random()
        let randomPick (limit : int) : int = rnd.Next(limit) // [0 .. limit) it's a Python habit

        let numberOfCoins = 1000
        let numberOfFlips = 10
        let numberOfExperiements = 100000

        let coinTest (numberOfFlips : int) : int =
            let rec countHeads (flips : int) bitIndex (headCount : int) : int =
                if bitIndex < 0 then headCount
                else countHeads (flips >>> 1) (bitIndex-1) (headCount + (flips &&& 0x01))
            countHeads (randomPick ((pown 2 numberOfFlips) - 1)) numberOfFlips 0

        let runExperiement (numberOfCoins : int) (numberOfFlips : int) : (int * int * int) =
            let (randomCoin : int) = randomPick numberOfCoins
            let rec testCoin coinIndex (cFirst, cRnd, cMin, cFirstDone, cRanDone, cMinDone) : (int * int * int) =
                if (coinIndex < numberOfCoins) then
                    if (not cFirstDone || not cRanDone || not cMinDone) then
                        if (cFirstDone && cMinDone && (coinIndex <> randomCoin)) then
                            testCoin (coinIndex+1) (cFirst, cRnd, cMin, cFirstDone, cRanDone, cMinDone)
                        else
                            let headsTotal = coinTest numberOfFlips
                            let (cFirst, cRnd, cMin, cFirstDone, cRanDone, cMinDone) =
                                let cFirst = if coinIndex = 0 then headsTotal else cFirst
                                let cRnd = if coinIndex = randomCoin then headsTotal else cRnd
                                let cMin = if headsTotal < cMin then headsTotal else cMin
                                let cRanDone = if (coinIndex >= randomCoin) then true else cRanDone
                                let cMinDone = if (headsTotal = 0) then true else cMinDone
                                (cFirst, cRnd, cMin, true, cRanDone, cMinDone)
                            testCoin (coinIndex+1) (cFirst, cRnd, cMin, cFirstDone, cRanDone, cMinDone)
                    else
                        (cFirst, cRnd, cMin)
                else
                    (cFirst, cRnd, cMin)
            testCoin 0 (-1, -1, 10, false, false, false)

        let runExperiements (numberOfExperiements : int) (numberOfCoins : int) (numberOfFlips : int) =
            let rec accumateExperiements index aOne aRnd aMin : (int * int * int) =
                let (cOne, cRnd, cMin) = runExperiement numberOfCoins numberOfFlips
                if index > numberOfExperiements then (aOne, aRnd, aMin)
                else accumateExperiements (index + 1) (aOne + cOne) (aRnd + cRnd) (aMin + cMin)
            let (aOne, aRnd, aMin) = accumateExperiements 0 0 0 0
            let (vOne : double) = (double)(aOne) / (double)numberOfExperiements / (double)numberOfFlips
            let (vRnd : double) = (double)(aRnd) / (double)numberOfExperiements / (double)numberOfFlips
            let (vMin : double) = (double)(aMin) / (double)numberOfExperiements / (double)numberOfFlips
            (vOne, vRnd, vMin)

        let timeIt () =
            let stopWatch = System.Diagnostics.Stopwatch.StartNew()
            let (vOne, vRnd, vMin) = runExperiements numberOfExperiements numberOfCoins numberOfFlips
            stopWatch.Stop()
            printfn "seconds: %f" (stopWatch.Elapsed.TotalMilliseconds / 1000.0)
            printfn "vOne: %A" vOne
            printfn "vRnd: %A" vRnd
            printfn "vMin: %A" vMin

        timeIt ()

        printf "Press any key to exit: "
        System.Console.ReadKey() |> ignore
        printfn ""
        0 // return an integer exit code
========================================================================
This is just an intermediate answer: I asked whether the OP had considered using MathNet Numerics as idiomatic F#, and the OP wanted to see what that looked like. After running his version and this first-cut version on my machine, the OP's version is faster. OP: 75 secs, mine: 84 secs.
namespace Workspace

open MathNet.Numerics.LinearAlgebra

module main =

    [<EntryPoint>]
    let main argv =
        let rnd = System.Random()
        let flipCoin() =
            let head = rnd.NextDouble() > 0.5
            if head then 1.0 else 0.0

        let numberOfCoins = 1000
        let numberOfFlips = 10
        let numberOfExperiements = 100000
        let numberOfValues = 3
        let randomPick (limit : int) : int = rnd.Next(limit) // [0 .. limit) it's a Python habit

        let headCount (m : Matrix<float>) (coinIndex : int) : int =
            System.Convert.ToInt32((m.Row coinIndex).Sum())

        let minHeads (m : Matrix<float>) (numberOfCoins : int) (numberOfFlips : int) : int =
            let rec findMinHeads currentCoinIndex minHeadsCount minHeadsIndex =
                match currentCoinIndex, minHeadsCount with
                | -1, _ -> minHeadsCount
                | _, 0 -> minHeadsCount // Can't get less than zero so stop searching.
                | _ ->
                    let currentMinHeadCount = (headCount m currentCoinIndex)
                    let nextIndex = currentCoinIndex - 1
                    if currentMinHeadCount < minHeadsCount
                    then findMinHeads nextIndex currentMinHeadCount currentCoinIndex
                    else findMinHeads nextIndex minHeadsCount minHeadsIndex
            findMinHeads (numberOfCoins - 1) numberOfFlips -1

        // Return the values for cOne, cRnd, and cMin as int values.
        // Will do division on final sum of experiments instead of after each experiment.
        let runExperiement (numberOfCoins : int) (numberOfFlips : int) : (int * int * int) =
            let (flips : Matrix<float>) = DenseMatrix.init numberOfCoins numberOfFlips (fun i j -> flipCoin())
            let cOne = headCount flips 0
            let cRnd = headCount flips (randomPick numberOfCoins)
            let cMin = minHeads flips numberOfCoins numberOfFlips
            (cOne, cRnd, cMin)

        let runExperiements (numberOfExperiements : int) (numberOfCoins : int) (numberOfFlips : int) : (int [] * int [] * int []) =
            let (cOneArray : int[]) = Array.create numberOfExperiements 0
            let (cRndArray : int[]) = Array.create numberOfExperiements 0
            let (cMinArray : int[]) = Array.create numberOfExperiements 0
            for i = 0 to (numberOfExperiements - 1) do
                let (cOne, cRnd, cMin) = runExperiement numberOfCoins numberOfFlips
                cOneArray.[i] <- cOne
                cRndArray.[i] <- cRnd
                cMinArray.[i] <- cMin
            (cOneArray, cRndArray, cMinArray)

        let (cOneArray, cRndArray, cMinArray) = runExperiements numberOfExperiements numberOfCoins numberOfFlips
        let (vOne : double) = (double)(Array.sum cOneArray) / (double)numberOfExperiements / (double)numberOfFlips
        let (vRnd : double) = (double)(Array.sum cRndArray) / (double)numberOfExperiements / (double)numberOfFlips
        let (vMin : double) = (double)(Array.sum cMinArray) / (double)numberOfExperiements / (double)numberOfFlips
        printfn "vOne: %A" vOne
        printfn "vRnd: %A" vRnd
        printfn "vMin: %A" vMin
        0 // return an integer exit code
Halfway through the coding I realized I could do all of the calculations using just int; it was only the last calculations generating the percentages that needed to be a float or double, and even then only because the list of answers is a percentage; in theory the numbers can be compared as int to get the same understanding. If I used only int then I would have to create an int Matrix type, and that is more work than I want to do. When I get time I will switch the MathNet Matrix to an F# Array2D or something similar and check that. Note: if you tag this with MathNet then the maintainer of MathNet might answer (Christoph Rüegg).
I made a change to this method and it is faster by 5 seconds.
// faster
let minHeads (m : Matrix<float>) (numberOfCoins : int) (numberOfFlips : int) : int =
    let (mins : float[]) = m.FoldByRow((fun (x : float) y -> x + y), 0.0)
    let (minHead : float) = Array.min mins
    System.Convert.ToInt32(minHead)
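For reference, the Array2D idea mentioned above might look roughly like this (a hypothetical sketch of my own, not benchmarked):

let rnd = System.Random()

// hypothetical Array2D-based variant of runExperiement: flips stored as 0/1 ints,
// no MathNet matrix, and no floats until the final report
let runExperiementArray2D (numberOfCoins : int) (numberOfFlips : int) : int * int * int =
    let flips = Array2D.init numberOfCoins numberOfFlips (fun _ _ -> if rnd.NextDouble() > 0.5 then 1 else 0)
    let headCount coinIndex =
        let mutable total = 0
        for j in 0 .. numberOfFlips - 1 do
            total <- total + flips.[coinIndex, j]
        total
    let cOne = headCount 0
    let cRnd = headCount (rnd.Next numberOfCoins)
    let mutable cMin = numberOfFlips
    for i in 0 .. numberOfCoins - 1 do
        let h = headCount i
        if h < cMin then cMin <- h
    (cOne, cRnd, cMin)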
I tried to find the smallest possible changes to your code to make it faster.
The biggest performance improvement I found was by changing the ObtainFrequencyOfHeads function so that it counts true values in the collection instead of creating an intermediate filtered collection and then counting that. I did this by using fold:
let ObtainFrequencyOfHeads tosses =
    let heads = tosses |> List.fold (fun state t -> if t then state + 1 else state) 0
    float heads / float (List.length (tosses))
Another improvement came from changing all of the lists into arrays. This was as simple as replacing every instance of List. with Array. (including the new function above).
Some might say this is less functional, because it's using a mutable collection instead of an immutable one. However, we're not mutating any arrays, just using the fact that they are cheap to create, check the length of, and look up by index. We have removed a restriction on mutation but we are still not using mutation. It is certainly idiomatic F# to use arrays for performance if required.
With both of these changes I got almost a 2x performance improvement in FSI.
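For illustration, the array-based version of the hot function might look like this (a sketch of the change described above, assuming FlipCoin from the question and the same List.-to-Array. substitution everywhere else):

let ObtainFrequencyOfHeads (tosses: bool[]) =
    let heads = tosses |> Array.fold (fun state t -> if t then state + 1 else state) 0
    float heads / float (Array.length tosses)

let FlipCoinNTimes N = Array.init N (fun _ -> FlipCoin())
let FlipMCoinsNTimes M N = Array.init M (fun _ -> FlipCoinNTimes N)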

Solving ODE in Scilab

I am trying to do some electric circuit analysis in Scilab by solving an ODE, but I need to switch between two ODEs depending on the current value of the function. I have implemented the solution in Scala using the RK4 method and it works perfectly. Now I am trying to do the same using the standard functions in Scilab, and it is not working. I have tried to solve these two ODEs separately and that works fine.
clear
state = 0 // state 0 is charging, 1 is discharging
vb = 300.0; vt = 500.0;
r = 100.0; rd = 10.0;
vcc = 600;
c = 48.0e-6;

function dudx = curfunc(t, uu)
    if uu < vb then state = 0
    elseif uu > vt then state = 1
    end
    select state
    case 0 then // charging
        dudx = (vcc - uu) / (r * c)
    case 1 then // discharging
        dudx = - uu / (rd * c) + (vcc - uu) / (r * c)
    end
endfunction

y0 = 0
t0 = 0
t = 0:1e-6:10e-3
%ODEOPTIONS = [1, 0, 0, 1e-6, 1e-12, 2, 500, 12, 5, 0, -1, -1]
y = ode(y0, t0, t, 1e-3, 1e-6, curfunc)
clear %ODEOPTIONS
plot(t, y)
So here I am solving for a node voltage: if the node voltage exceeds the top threshold (vt), the discharging ODE is used; if it drops below the bottom voltage (vb), the charging ODE is used. I have tried to play with %ODEOPTIONS, but no luck.
You can also use the ode("root", ...) option. The code will resemble:
clear
state = 0 // state 0 is charging, 1 is discharging
vb = 300.0; vt = 500.0;
r = 100.0; rd = 10.0;
vcc = 600;
c = 48.0e-6;

function dudx = charging(t, uu)
    //uu<vt
    dudx = (vcc - uu) / (r * c)
endfunction

function e = chargingssurf(t, uu)
    e = uu - vt
endfunction

function dudx = discharging(t, uu)
    //uu<vb
    dudx = - uu / (rd * c) + (vcc - uu) / (r * c)
endfunction

function e = dischargingssurf(t, uu)
    e = uu - vb
endfunction

y0 = 0
t0 = 0
t = 0:1e-6:10e-3
Y = [];
T = [];

[y, rt] = ode("root", y0, t0, t, 1e-3, 1e-6, charging, 1, chargingssurf);
disp(rt)
k = find(t(1:$-1) < rt(1) & t(2:$) >= rt(1))
Y = [Y y];
T = [T t(1:k) rt(1)];

[y, rt] = ode("root", y($), rd(1), t(k+1:$), 1e-3, 1e-6, discharging, 1, dischargingssurf);
Y = [Y y];
T = [T t(k+1:$)];

plot(T, Y)
The discharging code seems to be wrong...
I think it boils down to a standard problem: the standard ODE solvers have step-size control, in contrast to the fixed step size of RK4. For step-size control to work, the ODE function needs to be differentiable at least to the same order as the order of the method. Jumps like the ones in your function are thus extremely unfortunate.
Another point to consider is that the internal steps of the method are not always in increasing time; they may jump backwards. See the coefficient table for Dormand-Prince for an example. Thus event-based model changes, as in your problem, may lead to strange effects even if the first problem were circumvented.
Workaround (not really a solution)
Declare state as a global variable, since as a local variable its value always gets reset to the global constant 0.
function dudx = curfunc(t, uu)
    global state
    ...
endfunction
...
y = ode("fix", y0, t0, t, rtol, atol, curfunc)
