Short Question
Is there a common way to handle very large anomalies (order of magnitude) within an otherwise uniform control region?
Background
I am working on a control algorithm that drives a motor across a generally uniform control region. With no / minimal loading the PID control works great (fast response, little to no overshoot). The issue I'm running into is there will usually be at least one high load location. The position is determined by the user during installation, so there is no reasonable way for me to know when / where to expect it.
When I tune the PID to handle the high-load location, it causes large overshoots in the non-loaded areas (which I fully expected). While it is OK to overshoot mid-travel, there are no mechanical hard stops on the enclosure. The lack of hard stops means that any significant overshoot at the limits of travel can (and does) cause the control arm to be disconnected from the motor (yielding a dead unit).
Things I'm Prototyping
Nested PIDs (very aggressive when far away from the target, conservative when close)
Fixed gain when far away, PID when close (a sketch follows this list)
Conservative PID (works with no load) + an external control that looks for the PID to stall and applies additional energy until either the target is achieved or a rapid rate of change is detected (i.e. leaving the high-load area)
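For illustration, a rough sketch of the second option; NEAR_BAND, MAX_PWM and pid_update() are placeholder names, not my actual implementation:
#include <stdlib.h> /* abs() */
#define MAX_PWM 10000                 /* placeholder saturation value */
extern int pid_update(int error);     /* placeholder conservative PID step */
int control_step(int error)
{
    const int NEAR_BAND = 200;        /* illustrative switchover distance */
    if (abs(error) > NEAR_BAND)
        return (error > 0) ? MAX_PWM : -MAX_PWM; /* fixed aggressive drive far out */
    return pid_update(error);                    /* conservative PID near the target */
}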
Hardware Limitations
Full travel defined
Hard stops cannot be added (at this point in time)
Update
My answer below is not necessarily the best solution. It's just my current solution, which I thought I would share.
Initial Solution
stalled_pwm_output = PWM / | ΔE |
PWM = Max PWM value
ΔE = last_error - new_error
The initial relationship successfully ramps up the PWM output based on the lack of change in the motor position. See the graph below for the sample output.
This approach makes sense for the situation where the non-aggressive PID has stalled. However, it has the unfortunate (and obvious) issue that when the non-aggressive PID is capable of achieving the setpoint and attempts to slow down, the stalled_pwm_output ramps up. This ramp-up causes a large overshoot when traveling to a non-loaded position.
Current Solution
Theory
stalled_pwm_output = (kE * PID_PWM) / | ΔE |
kE = Scaling Constant
PID_PWM = Current PWM request from the non-aggressive PID
ΔE = last_error - new_error
My current relationship still uses the 1/ΔE concept, but uses the non-aggressive PID's PWM output to determine the stalled_pwm_output. This allows the PID to throttle back the stalled_pwm_output when it starts getting close to the target setpoint, yet still allows 100% PWM output when stalled. The scaling constant kE is needed to ensure the PWM reaches the saturation point (above 10,000 in the graphs below).
Pseudo Code
Note that the result from calc_stall_pwm is added to the PID PWM output in my current control logic.
int calc_stall_pwm(int pid_pwm, int new_error)
{
    int ret = 0;
    int dE = 0;
    static int last_error = 0;
    const int kE = 1;

    // Allow the stall control until the setpoint is achieved
    if (FALSE == motor_has_reached_target())
    {
        // Determine the error delta
        dE = abs(last_error - new_error);
        last_error = new_error;

        // Protect from divide by zero
        dE = (dE == 0) ? 1 : dE;

        // Determine the stall PWM output
        ret = (kE * pid_pwm) / dE;
    }

    return ret;
}
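For context, the combination described in the note above might look like this on each control tick (pid_update() is a placeholder for the conservative PID step, not my actual function name):
int pid_pwm = pid_update(new_error);                      // conservative PID request
int pwm_out = pid_pwm + calc_stall_pwm(pid_pwm, new_error); // stall boost rides on top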
Output Data
Stalled PWM Output
Note that in the stalled PWM output graph the sudden PWM drop at ~3400 is a built in safety feature activated because the motor was unable to reach position within a given time.
Non-Loaded PWM Output
Related
I am trying to generate a short delay between two calls writing to HW-based registers in GNU C on ARM (Linux).
The system latency appears to be too high when I use the usleep() or nanosleep() functions.
The following code fragment
struct timespec ts;
ts.tv_sec = 0;
ts.tv_nsec = 1; // 1 nanosecond
//...
do{ } while (nanosleep(&ts, &ts));
results in an over 100 us delay (comparing timings with the fragment present and commented out).
What is the way around this? Since my desired delay is approximately 2 us, I can possibly live even with a blocking function.
As #Lubo hinted, I cannot rely on a delay generated within my code, since it may be interrupted.
The HW register I am writing to needs ~1 us between two consecutive writes.
If I can generate a delay of at least 2 us, and don't mind getting a longer delay in the cases where I am interrupted, I may still be fine. In total I would accumulate less delay than in the current state, where every sleep costs 100 us more than intended.
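For reference, the kind of blocking fallback I could live with is a plain busy-wait on CLOCK_MONOTONIC; a rough sketch (preemption can still stretch the delay, which is acceptable per the above):
#define _POSIX_C_SOURCE 199309L
#include <time.h>

// Spin until at least `ns` nanoseconds have elapsed on CLOCK_MONOTONIC.
// Blocking by design; an interruption only lengthens the delay.
static void delay_ns(long ns)
{
    struct timespec start, now;
    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while ((now.tv_sec - start.tv_sec) * 1000000000L
           + (now.tv_nsec - start.tv_nsec) < ns);
}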
I have a little trouble and am asking for a hint. I am on the Windows platform, doing calculations in the following manner:
int input = 0;
int output; // junk bytes here
while(true) {
async_enqueue_upload(input); // completes instantly, but transfer will take 10us
async_enqueue_calculate(); // completes instantly, but computation will take 80us
async_enqueue_download(output); // completes instantly, but transfer will take 10us
sync_wait_finish(); // must wait while output is fully calculated, and there is no junk
input = process(output); // I cannot launch the next step without doing this on the host.
}
I am asking about the wait_finish() part. I must wait for all devices to finish, combine the results, process the data, and upload a new portion that is based on the previous computation step. I need to sync data between steps, so I can't parallelize the steps. I know this is not a very performant arrangement. So let's proceed to the question.
I have two ways of checking completion within wait_finish(). The first is to put the thread to sleep until it is woken by a completion event:
while( !is_completed() )
Sleep(1);
This has very low performance, because the actual calculation takes, say, 100 us, while the minimal Windows scheduler timestep is 1 ms, so it gives an unacceptable 10x performance loss.
The second way is to check completion in an empty infinite loop:
while( !is_completed() )
{} // do_nothing();
This restores the 10x computation performance. But it is also an unsuitable solution, because it fully occupies a CPU core with absolutely useless work. How can I make the CPU "sleep" for exactly the time I need? (Each step has an equal amount of work.)
How is this case usually solved, when the calculation time is too long for an active spin-wait, but too short compared to the scheduler timestep? A related subquestion: how do I do this on Linux?
Fortunately, I succeeded in finding the answer on my own. In short: I should use Linux for that.
My investigation shows the following. On Windows there is a hidden function in ntdll, NtDelayExecution(). It is not exposed through the SDK, but can be loaded in the following manner:
typedef int (__stdcall *NtDelayExecution_t)(BOOL Alertable, PLARGE_INTEGER DelayInterval);
static NtDelayExecution_t NtDelayExecution =
    (NtDelayExecution_t)GetProcAddress(GetModuleHandleW(L"ntdll.dll"), "NtDelayExecution");
It allows setting sleep intervals in 100 ns units. However, even that did not work well, as shown in the following benchmark:
SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS); // requires Admin privileges
SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
uint64_t hpf = qpf(); // QueryPerformanceFrequency()
uint64_t s0  = qpc(); // QueryPerformanceCounter()
uint64_t n   = 0;
while (1) {
    sleep_precise(1); // NtDelayExecution(-1); waits one 100-nanosecond interval
    auto s1 = qpc();
    n++;
    auto passed = s1 - s0;
    if (passed >= hpf) { // report once per second
        std::cout << "freq=" << (n * hpf / passed) << " hz\n";
        s0 = s1;
        n = 0;
    }
}
That yields a loop rate of somewhat less than 2000 Hz, and the result varies from one printed line to the next. That led me to the Windows thread-switching scheduler, which is totally unsuited to real-time tasks, with its minimum interval of 0.5 ms (plus overhead). By the way, does anyone know how to tune that value?
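A side note: Windows 10 1803+ also exposes high-resolution waitable timers, which might do better than NtDelayExecution; a rough sketch (I have not measured this in the benchmark above):
#include <windows.h>
#ifndef CREATE_WAITABLE_TIMER_HIGH_RESOLUTION
#define CREATE_WAITABLE_TIMER_HIGH_RESOLUTION 0x00000002 // needs Windows 10 1803+ SDK
#endif

// Best-effort sleep for `us` microseconds on a high-resolution waitable timer;
// still subject to the scheduler, so not hard real time.
static void sleep_hr_us(LONGLONG us)
{
    HANDLE timer = CreateWaitableTimerExW(NULL, NULL,
        CREATE_WAITABLE_TIMER_HIGH_RESOLUTION, TIMER_ALL_ACCESS);
    if (!timer) return;
    LARGE_INTEGER due;
    due.QuadPart = -10 * us; // negative = relative time, in 100 ns units
    SetWaitableTimer(timer, &due, 0, NULL, NULL, FALSE);
    WaitForSingleObject(timer, INFINITE);
    CloseHandle(timer);
}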
Next was the Linux question: what can it do? I built a custom tiny 4.14 kernel with Buildroot and ran that benchmark code there. I replaced qpc() to return clock_gettime() data with the CLOCK_MONOTONIC clock, qpf() just returns the number of nanoseconds in a second, and sleep_precise() just calls clock_nanosleep(). I failed to find out what the difference between CLOCK_MONOTONIC and CLOCK_REALTIME is.
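For concreteness, the stand-ins could look roughly like this (a sketch of the shapes described above, not my exact code):
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <time.h>

// qpc() reads CLOCK_MONOTONIC in nanoseconds, qpf() returns the tick rate,
// sleep_precise() forwards to clock_nanosleep(), keeping the benchmark's
// 100 ns units.
static uint64_t qpc(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static uint64_t qpf(void) { return 1000000000ull; } // "ticks" per second

static void sleep_precise(uint64_t hundred_ns)
{
    struct timespec ts = { 0, (long)(hundred_ns * 100) }; // relative sleep
    clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);
}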
And I was quite surprised to get a whopping 18.4 kHz frequency just out of the box, and it was quite stable. While testing several intervals, I found that I can set the loop to almost any frequency up to 18.4 kHz, but also that the actual measured wait time differs from what I asked for by a factor of about 1.6. For example, if I ask to sleep 100 us it actually sleeps for ~160 us, giving ~6.25 kHz. Nothing else is running on the system, just the kernel, BusyBox and this test. I am not an experienced Linux user, and I am still wondering how I can tune this to be more real-time and deterministic. Can I push that maximum frequency even higher?
I'm experimenting with writing a simplistic, single-AU, play-through based, (almost) no-latency tracking phase vocoder prototype in C. It's a standalone program. I want to find out how much processing load a single render callback can safely bear, so I prefer keeping off async DSP.
My concept is to have only one pre-determined value, the window step (also called hop size or decimation factor in different literature sources). This number would equal inNumberFrames, which somehow depends on the device sampling rate (and what else?). All other parameters, such as window size and FFT size, would be set in relation to the window step. This seems the simplest method for keeping everything inside one callback.
Is there a safe method to machine-independently guess or query what inNumberFrames will be before the actual rendering starts, i.e. before calling AudioOutputUnitStart()?
The phase vocoder algorithm is mostly standard and very simple, using vDSP functions for the FFT plus custom phase integration, and I have no problems with that part.
Additional debugging info
This code is monitoring timings within the input callback:
static Float64 prev_stime; //prev. sample time
static UInt64 prev_htime; //prev. host time
printf("inBus: %d\tframes: %d\tHtime: %lld\tStime: %7.2lf\n",
(unsigned int)inBusNumber,
(unsigned int)inNumberFrames,
inTimeStamp->mHostTime - prev_htime,
inTimeStamp->mSampleTime - prev_stime);
prev_htime = inTimeStamp->mHostTime;
prev_stime = inTimeStamp->mSampleTime;
Curiously enough, the difference in inTimeStamp->mSampleTime actually shows the number of rendered frames (the name of the field seems somewhat misleading). This number is always 512, no matter if another sampling rate has been set through AudioMIDISetup.app at runtime, as if the value had been programmatically hard-coded. On one hand, the
inTimeStamp->mHostTime - prev_htime
interval changes dynamically with the sampling rate in a mathematically clear way. As long as the sampling rate is a multiple of 44100 Hz, actual rendering goes on. On the other hand, 48 kHz multiples produce the rendering error -10863 (= kAudioUnitErr_CannotDoInCurrentContext). I must have missed a very important point.
The number of frames is usually the sample rate times the buffer duration. There is an Audio Unit API to request a sample rate and a preferred buffer duration (for example, 44100 Hz and 5.8 ms resulting in 256 frames), but not all hardware on all OS versions honors all requested buffer durations or sample rates.
Assuming audioUnit is an input audio unit:
UInt32 inNumberFrames = 0;
UInt32 propSize = sizeof(UInt32);
AudioUnitGetProperty(audioUnit,
kAudioDevicePropertyBufferFrameSize,
kAudioUnitScope_Global,
0,
&inNumberFrames,
&propSize);
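Conversely, you can attempt to set that property before starting the unit; a rough sketch (the hardware may round or override the value, so read it back afterwards as above):
UInt32 preferredFrames = 512; // illustrative request
AudioUnitSetProperty(audioUnit,
                     kAudioDevicePropertyBufferFrameSize,
                     kAudioUnitScope_Global,
                     0,
                     &preferredFrames,
                     sizeof(preferredFrames));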
This number would equal inNumberFrames, which somehow depends on the device sampling rate (and what else?)
It depends on what you attempt to set it to. You can set it.
// attempt to set duration
NSTimeInterval _preferredDuration = ...
NSError* err;
[[AVAudioSession sharedInstance] setPreferredIOBufferDuration:_preferredDuration error:&err];
// now get the actual duration it uses
NSTimeInterval _actualBufferDuration;
_actualBufferDuration = [[AVAudioSession sharedInstance] IOBufferDuration];
It will use a value roughly around the preferred value you set. The actual value used is a time interval based on a power of two and the current sample rate; for example, requesting 10 ms at 48 kHz will typically be granted 512 frames, about 10.7 ms.
If you are looking for consistency across devices, choose a value around 10 ms. The worst-performing reasonable modern device is the 16 GB iPod touch without the rear-facing camera, and even it can do roughly 10 ms callbacks with no problem. On some devices you can set the duration quite low and get very fast callbacks, but often the audio will crackle because the processing does not finish before the next callback happens.
I have to read 5 different frequencies (square waves) up to 20 kHz by polling 5 different pins.
I'm using a single timer interrupt only, firing every 1 millisecond.
Polling of the pins is done in the ISR.
The algorithm I have thought of so far is:
1. Count the number of HIGH samples.
2. Count the number of LOW samples.
3. Check if the sum HIGH + LOW equals the time period.
This algorithm seems slow and is not practical.
Are there any filter functions I could use to check the frequency at a pin, so that all I would have to do is call that function?
Any other algorithms for frequency detection would also be good.
I am restricted to only one interrupt in my code (the timer interrupt).
you need to bear in mind what the input signal properties are
as you are limited to just a single interrupt (which is pretty odd)
and the timer runs at only 1 kHz
the input signal must not be above 0.5 kHz
if the signal is noisy, its frequency can easily rise above that limit many times over
speed
you wrote that the simple period-counting approach is slow
what CPU and IO power do you have?
I am used to Atmel AVR32 AT32UC3 chips, which are two generations before the ARM Cortex chips
and I have around 96 MIPS and 2-5 MHz pin R/W frequency there without DMA or interrupts
so what exactly is slow about that approach?
I would code it with your constraints like this (it is just C++-style pseudo code, not using your platform):
const int n=5;
volatile int T[n],t[n],s[n],r[n]; // T last measured period, t period counter, s which edge are we waiting for?, r number of completed measurements
int T0[n]={?,?,?,...?};           // periods to compare with

void main() // main process
{
    int i;
    for (i=0;i<n;i++) { T[i]=0; t[i]=0; s[i]=0; r[i]=0; }
    // config pins
    // config and start timer
    for (;;) // inf loop
    {
        for (i=0;i<n;i++) // test all pins
            if (r[i]>=2)  // ignore pins that are not fully measured yet
            {
                r[i]=2; // avoid counter overflow
                if (abs(T[i]-T0[i])>1) // compare with +/- 1T of the timer; can use an even bigger tolerance than 1
                {
                    // frequency of pin(i) is not what it should be
                }
            }
    }
}

void ISR_timer() // timer interrupt
{
    // test_out_pin=H
    int i;
    bool p;
    for (i=0;i<n;i++)
    {
        p=get_pin_state(i); // just read pin as true/false H/L
        t[i]++;             // inc period counter
        if (s[i]==0) { if ( p) s[i]=1; }                                // edge L->H
        else         { if (!p) { s[i]=0; T[i]=t[i]; t[i]=0; r[i]++; } } // edge H->L
    }
    // test_out_pin=L
}
you can also scan by comparing the last pin state with the actual one
that would eliminate the need for s[]
something like p0=p1; p1=get_pin_state(i); if ((p1)&&(p0!=p1)) { T[i]=t[i]; t[i]=0; }
this way you can also more easily implement SW glitch filters
but I think the MCU should have HW filters too (like most MCUs do)
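a rough sketch of that s[]-free variant, using the same pseudo-code conventions as above (p_prev[] is new):
volatile bool p_prev[n]; // previous pin states
void ISR_timer_edges()   // timer interrupt, edge-compare variant
{
    int i;
    for (i=0;i<n;i++)
    {
        bool p=get_pin_state(i);
        t[i]++;                // inc period counter
        if ((p)&&(!p_prev[i])) // edge L->H -> one full period has elapsed
        { T[i]=t[i]; t[i]=0; r[i]++; }
        p_prev[i]=p;
    }
}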
How would I do this without your odd constraints?
I would use external interrupts
usually they can be configured to trigger on a specific edge of the signal
also including HW filtering of noise
on each interrupt take the internal CPU clock counter value
and if that is not at your disposal, then the timer/counter state
subtract the last measured value from it
recover from overflow if it occurred
convert to seconds or Hz if needed
this way I can scan pins on an MCU with a 30 MHz clock reliably with frequencies up to 15 MHz (I use this for an IRC decoder)
and yes, IRC can give you above 1 MHz frequencies on occasion (on the edge between states)
if you also want the duty ratio then you can:
have 2 interrupts, one for the positive and a second for the negative edge
or use just one and reconfigure the edge after each hit (the Atmel UC3L chips I used some time ago had problems with this one due to internal bugs)
[notes]
it is essential that the pins you are accessing are on the same IO port
so you can read them all at once and just decode the individual pins afterwards
also the GPIO module is usually configurable, so check what clock it is powered with
there are usually 2 clocks, one for interfacing the GPIO module with the CPU core
and a second for the GPIO itself, so check both
you can also use DMA instead of an external interrupt
if you can configure DMA to read the IO port ... to memory somewhere
then you can inspect it in a background process independent of the IO
I have a piece of code that should produce the same analog (PWM) output voltage from PB4 and PB5, using fast PWM in output compare mode. However, the voltage from them is different. What could possibly be the reason for this? Also, the voltage from neither of the pins is close to 1.23 V, which is what the output voltage should be.
Here is the code.
#include <avr/io.h>
#include <avr/interrupt.h>

ISR(TIMER0_COMP_vect)
{
    cli();
    PORTB &= ~(1<<PB5);
    sei();
}

ISR(TIMER0_OVF_vect)
{
    cli();
    PORTB |= (1<<PB5);
    sei();
}

void init(void)
{
    TCCR0 |= (0<<FOC0)|(1<<WGM01)|(1<<WGM00)|(1<<COM01)|(1<<COM00)|(1<<CS02)|(1<<CS01)|(1<<CS00);
    OCR0 = 63;
    TIMSK |= (1<<OCIE0)|(1<<TOIE0);
}

int main(void)
{
    DDRB = 0xFF;
    PORTB = 0xFF;
    init();
    sei();
    while(1);
}
Firstly, if you're using something like an ATMega328p, having all three CS bits set will enable an external clock source, rather than using the internal clock, and so the timer won't run (unless you do actually have an external timer clock source). Depending on what microcontroller you are using, make sure that those bits are enabling a particular prescaler value instead.
Secondly, you might also encounter problems due to your measurement method and the way PWM actually works. Though it is often listed as an analog output when dealing with Arduinos, pulse width modulation actually does exactly what it says - it rapidly switches a digital output between ground and VCC (likely 5V), with a varying duty cycle. If one of those output pins is viewed on an oscilloscope, it will probably show some form of square wave.
When measured with a multimeter, the value you are seeing will be a combination of samples taken while the output is either high or low, and possibly an average of these randomly timed samples, hence the unexpected reading.
To get your desired result, you really need to smooth out the digital output. In short, this is often done with a low pass filter, composed of a resistor and capacitor attached to the output pin.
This works by using the square wave to charge the capacitor through the resistor while it is high, and discharge it while it is low. By having more time high than low (a longer duty cycle), the capacitor stabilises at a higher voltage (and vice versa). The resistor limits the current drawn from the AVR output pin (as if the capacitor was at 0V and the output gets driven high, you're effectively shorting the output to ground momentarily).
For your case, a resistor somewhere around 4.7 kΩ and a capacitor around 2 µF will probably suit. Increase the capacitance or the resistance to reduce ripple.
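As a sanity check on those values: the filter cutoff is f = 1/(2πRC) = 1/(2π × 4.7 kΩ × 2 µF) ≈ 17 Hz, far below any sensible PWM frequency from the timer, so the output should settle to a fairly smooth average voltage.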