Auto-scheduler support for Hexagon - Halide

How can I use the auto-scheduler for the Hexagon DSP, or for any other target that is not supported? Also, are you planning to support auto-scheduling for Hexagon soon?

The auto-scheduler might already work for standalone Hexagon use, but it will not automatically insert Hexagon offload calls. GPU scheduling support is planned for the auto-scheduler, and that may also address this, but the timeframe is open because some work on making the auto-scheduler more powerful has to come first.

Related

Lightweight 2D library for embedded Linux

I'm developing an application on a relatively restricted embedded Linux platform: it has only 256 MB of flash, although RAM is not a problem. The application uses an SPI TFT screen exposed through a framebuffer driver. The only thing required from the UI is text presentation in various fonts and sizes, including text animations (fade, slide, etc.). On the prototype, which ran on an RPi 3, I used libcairo and it went well. Now, given the tight space constraints on the real platform, it doesn't seem feasible to use libcairo anymore, since from what I've seen it requires more than 100 MB of space with all its dependencies. Note, however, that I come from the bare-metal world and have never dealt with complex UIs, so I might be completely wrong about libcairo and its size. So, please suggest a 2D library I could pick for my case (C++ is preferred, but C is also OK), and if there is a way to use libcairo with a footprint of a few megabytes, please point me in the right direction.
Regards

Real-time capability comparison of single board computers

In my thesis, I plan to write a section comparing the real-time capabilities of single-board computers, covering:
the relevant factors (whether they really have a real-time clock and, even if they don't, whether and how real-time frameworks or an RTOS can be used to give them real-time properties)
which scheduling policy their out-of-the-box kernel uses (for example, if round-robin is used, then AFAIK real-time scheduling cannot be achieved)
a comparison between the PandaBoard, BeagleBoard, BeagleBone, and especially the Raspberry Pi
If you have any resources or ideas regarding this, I would really appreciate them. If I have left out any information, please say so and I'd be happy to provide it.
Thanks in advance.
EDIT:
I found a good answer here, but I would still appreciate any better guidance.
What makes a kernel/OS real-time?
First, an observation: scheduling is an OS concept. Why would it matter which scheduler the out-of-the-box kernel uses, if indeed there is such a thing as an out-of-the-box kernel? Having said that, real-time behaviour is affected by both the scheduler and the hardware. But when comparing boards, I would keep the scheduler constant (or maybe pick a few) and then compare the boards. Choosing the scheduler(s) is a separate topic of its own; a couple of things to take into account are that it should be preemptive and able to deal with issues like priority inversion.
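As a small illustration of the priority-inversion point, here is a sketch assuming a POSIX system such as Linux running on one of these boards. It creates a mutex with the PTHREAD_PRIO_INHERIT protocol, so a low-priority thread holding the lock is temporarily boosted while a higher-priority thread waits for it. The helper names init_pi_mutexattr and create_pi_mutex are hypothetical; the pthread calls themselves are standard POSIX. Selecting a real-time policy such as SCHED_FIFO for the threads is a separate step and is not shown.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

/* Hypothetical helper: initialise attr so that mutexes created from it use
   priority inheritance. Returns 0 or a pthread error code. */
static int init_pi_mutexattr(pthread_mutexattr_t *attr) {
    assert(attr != NULL);
    int rc = pthread_mutexattr_init(attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_setprotocol(attr, PTHREAD_PRIO_INHERIT);
    if (rc != 0) pthread_mutexattr_destroy(attr);
    return rc;
}

/* Hypothetical helper: create a priority-inheritance mutex. Returns 0 or an
   error code (e.g. ENOTSUP where the platform lacks PI support). */
static int create_pi_mutex(pthread_mutex_t *m) {
    pthread_mutexattr_t attr;
    int rc = init_pi_mutexattr(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutex_init(m, &attr);
    /* Destroying the attribute object does not affect the initialised mutex. */
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

Every lock shared between threads of different priorities would be created this way, so a medium-priority thread cannot indefinitely delay a high-priority one that is waiting on a low-priority lock holder.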
Note that all these boards have an MMU, which will add latency. That shouldn't really matter, though, as long as the latency is bounded. I'd also compare the accuracy of the crystals on which the clocks are based. Note also that SoCs have low-power (LP) modes, and they tend to switch clocks: whenever they come out of LP mode, they switch from some internal oscillator to a more accurate clock source such as an external crystal. The crystal needs time to stabilise before the SoC can continue normal operation, so comparing the latency involved in switching between power modes will also be a useful determinant.
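To check whether latency is bounded on each board, one common approach is to measure timer wake-up latency, similar in spirit to the cyclictest tool from the rt-tests suite. The following is a minimal sketch, assuming a POSIX system with clock_nanosleep and CLOCK_MONOTONIC; the function names are hypothetical. For meaningful numbers it would normally run at a real-time priority (e.g. SCHED_FIFO), under load, and for far more iterations than a quick test uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance t by ns nanoseconds, keeping tv_nsec in [0, 1e9). */
static void timespec_add_ns(struct timespec *t, long ns) {
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec += 1;
    }
}

/* Returns a - b in nanoseconds. */
static long long timespec_diff_ns(const struct timespec *a, const struct timespec *b) {
    return (long long)(a->tv_sec - b->tv_sec) * NSEC_PER_SEC + (a->tv_nsec - b->tv_nsec);
}

/* Sleep until absolute deadlines spaced period_ns apart and return the worst
   observed wake-up latency (actual wake time minus deadline) in ns, or -1 on error. */
static long long max_wakeup_latency_ns(long period_ns, int iterations) {
    struct timespec next, now;
    long long worst = 0;
    assert(period_ns > 0 && period_ns < NSEC_PER_SEC);
    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0) return -1;
    for (int i = 0; i < iterations; ++i) {
        timespec_add_ns(&next, period_ns);
        int rc;
        /* Absolute deadlines avoid cumulative drift; retry if a signal interrupts the sleep. */
        while ((rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL)) == EINTR) {
        }
        if (rc != 0) return -1;
        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0) return -1;
        long long lat = timespec_diff_ns(&now, &next);
        if (lat > worst) worst = lat;
    }
    return worst;
}
```

Comparing the worst case rather than the average fits the point above: what matters is that latency stays bounded, so the same kernel, scheduler and load should be used on every board.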

Struggling to find a way to monitor CPU and GPU, either third party or using code

I'm currently working on a method to evaluate some graphics programming techniques in DirectX 10, specifically custom shader files and instancing, but I need a way of measuring just how efficient they are. I've been trying to evaluate them using draw speed, CPU load and GPU load since, in theory, draw speed should increase considerably and CPU and GPU load should decrease as the program becomes more efficient.
My question: is there a decent third-party way to monitor the GPU and CPU, or is it better to code it manually? I'm currently using the RasterTek framework.
DirectX already has profiling tools available,
e.g. here, according to Google ;-)
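If draw speed is measured in code, frame time is usually more useful than FPS, because frame times add up and the worst frame reveals stutter. The sketch below only does the bookkeeping; it assumes the caller times each frame with a high-resolution clock (on Windows, QueryPerformanceCounter is the usual choice) and passes the duration in milliseconds. FrameStats and its functions are hypothetical names, not part of DirectX or the RasterTek framework.

```c
#include <assert.h>

/* Hypothetical accumulator for per-frame durations in milliseconds. */
typedef struct {
    double total_ms;
    double worst_ms;
    long frames;
} FrameStats;

static void frame_stats_reset(FrameStats *s) {
    s->total_ms = 0.0;
    s->worst_ms = 0.0;
    s->frames = 0;
}

static void frame_stats_add(FrameStats *s, double frame_ms) {
    assert(frame_ms >= 0.0);
    s->total_ms += frame_ms;
    if (frame_ms > s->worst_ms) s->worst_ms = frame_ms;
    s->frames += 1;
}

/* Mean frame time; 0 if no frames were recorded. */
static double frame_stats_avg_ms(const FrameStats *s) {
    return s->frames ? s->total_ms / (double)s->frames : 0.0;
}

/* Average FPS derived from total elapsed time, rather than by averaging per-frame FPS values. */
static double frame_stats_fps(const FrameStats *s) {
    return s->total_ms > 0.0 ? 1000.0 * (double)s->frames / s->total_ms : 0.0;
}
```

Running the same scene with and without instancing and comparing average and worst frame times gives a simple before/after number. This is CPU-side wall-clock time per frame, so it does not isolate GPU cost; that is where the dedicated profiling tools help.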

EGL/OpenGL ES/switching context is slow

I am developing an OpenGL ES 2.0 application (using angleproject on Windows for development) that is made up of multiple 'frames'.
Each frame is an isolated application that should not interfere with the surrounding frames. The frames are drawn using OpenGL ES 2.0 by the code running inside each frame.
My first attempt was to assign a framebuffer to each frame. But there was a problem: OpenGL's internal state is changed while one frame is drawing, and if the next frame doesn't comprehensively reset every known piece of OpenGL state, there could be side effects. This defeats my requirement that the frames be isolated and not affect one another.
My next attempt was to use a context per frame: I created a unique context for each frame, with resource sharing enabled, so that I can call eglMakeCurrent for each frame, render each one to its own framebuffer/texture, then call eglMakeCurrent to switch back to the global context and composite each texture onto the final screen.
This does a great job of isolating the instances; however, eglMakeCurrent is very slow. As few as 4 of them can make it take a second or more to render the screen.
What approach can I take? Is there a way I can either speed up context switching, or avoid context switching by somehow saving the OpenGL state per frame?
I have a suggestion that may eliminate the overhead of eglMakeCurrent while allowing you to use your current approach.
The concept of the current EGLContext is thread-local. I suggest creating all contexts in your process's master thread, then creating one thread per context and passing one context to each thread. During its initialization, each thread calls eglMakeCurrent on the context it owns and never calls eglMakeCurrent again. Hopefully, ANGLE's implementation stores the per-thread current context efficiently, without unnecessary synchronization overhead.
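The thread-per-context idea can be sketched without EGL itself. In this minimal C sketch, FakeContext and fake_make_current are hypothetical stand-ins for EGLContext and eglMakeCurrent (they are not the real EGL API), and a C11 _Thread_local variable models the per-thread "current context". Each worker binds its context once at start-up and never again.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16

/* Hypothetical stand-in for an EGLContext handle. */
typedef struct { int id; } FakeContext;

/* Models EGL's rule that the current context is per-thread state. */
static _Thread_local FakeContext *current_ctx = NULL;

/* Hypothetical stub playing the role of eglMakeCurrent. */
static void fake_make_current(FakeContext *ctx) { current_ctx = ctx; }

typedef struct {
    FakeContext *ctx;                 /* context owned by this worker */
    int frames_to_render;
    int make_current_calls;           /* how many times this worker bound a context */
    int frames_rendered_with_own_ctx; /* frames where its own context was current */
} Worker;

static void *worker_main(void *arg) {
    Worker *w = arg;
    fake_make_current(w->ctx);        /* bind exactly once, at thread start */
    w->make_current_calls = 1;
    for (int i = 0; i < w->frames_to_render; ++i) {
        /* Real code would issue GL calls here, rendering into the frame's FBO/texture. */
        if (current_ctx == w->ctx) w->frames_rendered_with_own_ctx++;
    }
    return NULL;
}

/* Starts one thread per worker and joins them all; returns 0 on success. */
static int run_workers(Worker *workers, int n) {
    pthread_t threads[MAX_WORKERS];
    int created = 0, rc = 0;
    assert(n >= 0);
    if (n > MAX_WORKERS) return -1;
    for (; created < n; ++created) {
        if (pthread_create(&threads[created], NULL, worker_main, &workers[created]) != 0) {
            rc = -1;
            break;
        }
    }
    for (int i = 0; i < created; ++i) pthread_join(threads[i], NULL);
    return rc;
}
```

A real version would also need per-frame synchronisation between the workers and the compositing thread, so that each texture is finished (for example via glFinish, or a fence where the implementation supports one) before it is composited; that part is omitted here.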
The problem here is trying to do this in a generic, platform- and OS-independent way. If you choose a specific platform, there are good solutions. On Windows, the wgl and GLUT libraries will give you multiple windows with completely independent OpenGL contexts running concurrently. They are called windows, not frames. You could also use DirectX instead of OpenGL; ANGLE uses DirectX. On Linux, the solution is X11 for OpenGL. In either case, it's critical to have quality OpenGL drivers, so no Intel Extreme chipset drivers. If you want to do this on Android or iOS, those require different solutions. There was a recent thread on the Khronos.org OpenGL ES forum about the Android case.

From X server to XDirectFB

Hi
Is it possible to uninstall the X server and use XDirectFB with a tiny window manager, like awesome?
Do I need to compile every application I want to use with XDirectFB from source?
From these links, it isn't clear to me:
http://en.wikipedia.org/wiki/DirectFB
http://directfb.org/index.php?path=Projects%2FXDirectFB
Pretty much: yes, you can, and no, you don't have to. I'm not sure you'll save anything, though.
A normal X server contains both raw hardware access support (framebuffer) and the X server abstraction layer for windowed apps and the window manager.
The X abstraction layer is quite heavyweight due to its support for multiple displays on multiple hosts, window geometry, ordering, palettes and so on, plus a generally overly complex API. Running it uses up lots of resources but (arguably) makes programming easier.
OTOH, using a framebuffer is very simple: change a byte in memory, call one function, and the corresponding pixel is set; that's all. There is no overhead on the API side, but it's up to your application to draw every single pixel, manage cooperation with other applications, create windows and so on.
DirectFB is a raw framebuffer access API that is fast and simple, with minimal overhead, but it provides no extras.
XDirectFB is an app that runs on top of DirectFB, providing all the complexity of an X server without a hardware layer of its own.
You can then run any WM and app on top of XDirectFB, just as on any other X server.
Now, while DirectFB alone is of course much more lightweight than any X server, it is far from certain that the combination of DirectFB + XDirectFB is lighter than a dedicated X server.
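To make "change a byte in memory" concrete: with the Linux fbdev interface, a program typically reads the stride (line_length, via FBIOGET_FSCREENINFO) and bits_per_pixel (via FBIOGET_VSCREENINFO), mmaps /dev/fb0, and writes pixels directly. The sketch below uses an ordinary buffer in place of the mapped framebuffer and assumes a 32-bpp XRGB8888 layout in native byte order; fb_pixel_offset and fb_put_pixel_xrgb8888 are hypothetical helpers, and the xoffset/yoffset panning fields are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of pixel (x, y). line_length is the stride in bytes, which may
   be larger than width * bytes_per_pixel because of row padding. */
static size_t fb_pixel_offset(unsigned x, unsigned y, size_t line_length, unsigned bytes_per_pixel) {
    return (size_t)y * line_length + (size_t)x * bytes_per_pixel;
}

/* Writes one pixel, assuming a 32-bpp XRGB8888 layout in native byte order. */
static void fb_put_pixel_xrgb8888(uint8_t *fb, size_t line_length, unsigned x, unsigned y,
                                  uint8_t r, uint8_t g, uint8_t b) {
    assert(fb != NULL);
    uint32_t px = ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
    /* memcpy avoids unaligned 32-bit stores if the stride is not a multiple of 4. */
    memcpy(fb + fb_pixel_offset(x, y, line_length, 4), &px, sizeof px);
}
```

Everything above this level, such as glyph rendering, windows and sharing the screen with other programs, is what the application (or a layer like DirectFB or X) has to provide itself.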
