PIC18F XC8 compiler - objects not initialized?

I have to use a Microchip PIC for a new project (I needed a high pin count in a TQFP64 package with 5V operation).
I have a huge problem; I might be missing something (sorry for that in advance).
IDE: MPLAB X 3.51
Compiler: XC8 1.41
The issue is that if I initialize an object to anything other than 0, it is not initialized and is always zero by the time main() is reached.
In the simulator it works, and the object has the proper value.
Simple example:
#include <xc.h>

static int x = 0x78;

void main(void) {
    while (x) {
        x++;
    }
    return;
}
In the simulator, x is 0x78 and while(x) is true.
BUT when I load the code onto the PIC18F67K40 using a PICkit 3, x is 0.
This happens even with a simple sprintf, which does nothing because the format string (a char array) is full of zeros:
sprintf(buf, "Number is %u", x);
I cannot initialize any object to anything other than zero.
What is going on? Any help appreciated!

Found the problem: the chip has errata issues, and I got a silicon revision that is affected. Strangely, Farnell still sells it. Stranger still, the compiler is not prepared for it and does not even give a warning to say to be careful!
Errata note:
Module: PIC18 Core
3.1 TBLRD requires NVMREG value to point to
appropriate memory
The affected silicon revisions of the PIC18FXXK40
devices improperly require the NVMREG<1:0>
bits in the NVMCON register to be set for TBLRD
access of the various memory regions. The issue
is most apparent in compiled C programs when the
user defines a const type and the compiler uses
TBLRD instructions to retrieve the data from
program Flash memory (PFM). The issue is also
apparent when the user defines an array in RAM
for which the compiler creates start-up code,
executed before main(), that uses TBLRD
instructions to initialize RAM from PFM.
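For reads you issue yourself, a minimal sketch of the manual workaround, assuming the NVMCON1 register and NVMREG bit names from the PIC18F67K40 data sheet (the RAM-initialization code that runs before main() can only be fixed by the compiler itself, and later XC8 releases reportedly include an errata workaround, so updating XC8 is the cleaner fix):

#include <xc.h>

const unsigned char table[] = { 1, 2, 3 };

unsigned char read_flash_byte(const unsigned char *p)
{
    NVMCON1bits.NVMREG = 2;  /* 0b10 points TBLRD at program Flash memory */
    return *p;               /* XC8 emits TBLRD* instructions for this read */
}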


IRQ 8 request_irq, Operation not permitted

I'm new to kernel module development, and in the course of my studies I moved on to interrupts. My task is to write an interrupt handler module for IRQ 8 which simply counts the number of interrupts that occur on this line and stores the value in a kobject. The task sounds relatively easy at first glance, but I've encountered strange behaviour. I wrote a handler function that simply increments the counter and returns the interrupt as handled:
static int ir = 0;

static irq_handler_t my_handler(int irq_no, void *dev_id, struct pt_regs *regs)
{
    ir++;
    return (irq_handler_t) IRQ_HANDLED;
}
To hook the interrupt handler, I call request_irq() inside my __init function with the first argument being 8, so that interrupts on the IRQ 8 line (which is reserved by the rtc) are handled:
#define RTC_IRQ 8
[...]
int err;
err = request_irq(RTC_IRQ, (irq_handler_t) my_handler, IRQF_SHARED, "rtc0", NULL);
if (err != 0)
    return -1;
With the implementation shown above, loading the kernel module gives me an err of -22, which is EINVAL. After googling, I discovered that with the IRQF_SHARED flag the last parameter can't be NULL. I tried to find a way to obtain rtc->dev_id within the module, but in some examples the handler was simply cast to (void *), so I tried passing (void *) my_handler. This gives me a flag-mismatch warning on insmod:
genirq: Flags mismatch irq 8. 00000080 (rtc0) vs. 00000000 (rtc0)
and err set to -16, which is EBUSY. While trying to find a way to obtain a device id, I found out that the interrupt is sent by the rtc0 device, which is "inherited" from the rtc-cmos parent.
I found contradictory clues on this matter in different sources across the internet. Some state that the kernel disables the rtc after synchronizing the software clock, but this can't be the case: running sudo bash -c 'echo +20 > /sys/class/rtc/rtc0/wakealarm' and then reading the IRQ 8 line in /proc/interrupts shows that interrupts are working as intended.
Other sources state that every request_irq() for a line must pass the IRQF_SHARED flag for the interrupt line to be shareable. Reading the source of rtc-cmos gave me nothing, since it sets up interrupts by reading and writing the CMOS directly.
I spent a lot of time trying to figure out a solution, but it seems that RTC interrupts aren't commonly used in kernel module development, so finding relevant and recent information is difficult. Most discussions and examples date from when SA_SHIRQ-like flags were used and a /drivers/examples folder was present in the kernel sources, i.e. around kernel version 2.6, and both the interrupt and rtc implementations have changed since then.
Any hints or suggestions that may help resolve this issue will be greatly appreciated. This is my first Stack Overflow question, so if anything about its format is wrong or disturbing, you are welcome to point that out in the comments as well.
Thanks in advance for any help
I solved the problem quite a while ago, but here's some explanation for newbies like me. @stark provided a good hint for the problem.
The main thing to understand is that irresponsible actions in kernel space quickly lead to a "disaster" of sorts. Seemingly, this is the main reason the Linux developers are closing more and more regions off from users/developers.
Here is the solution.
In modern kernel versions, you can't just tie a handler to a line and mark interrupts as resolved. But you can still "listen" to them by using the IRQF_SHARED flag and, at the end of your handler, leaving the interrupt untouched by returning IRQ_NONE, so you don't break the correct operation of the rest of the kernel if the interrupt is crucial for something else.
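A minimal sketch of that approach (module and counter names are illustrative; a shared line requires a unique, non-NULL dev_id cookie, and THIS_MODULE serves as one):

#include <linux/init.h>
#include <linux/module.h>
#include <linux/interrupt.h>
#include <linux/atomic.h>

static atomic_t ir = ATOMIC_INIT(0);

static irqreturn_t rtc_listen_handler(int irq, void *dev_id)
{
    atomic_inc(&ir);    /* count the interrupt */
    return IRQ_NONE;    /* leave it untouched for the real RTC driver */
}

static int __init listener_init(void)
{
    return request_irq(8, rtc_listen_handler, IRQF_SHARED,
                       "rtc-listener", THIS_MODULE);
}

static void __exit listener_exit(void)
{
    free_irq(8, THIS_MODULE);
}

module_init(listener_init);
module_exit(listener_exit);
MODULE_LICENSE("GPL");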
End of the solution; some extra advice on kernel development follows.
At the very start, it is important to understand that this is not userspace, where your actions will at most lead to a memory leak or the corruption of some files. Here, your actions can easily damage your kernel. If a similar scenario happened on Windows, you'd have no choice but to completely reinstall the OS, but in GNU/Linux this is not really the case: you can swap to a different kernel without the tedious process of restoring everything to how it was before. So if you are a hardcore enthusiast who is too lazy to use VMs, learning to swap kernels will come in handy real soon :)

Sharing memory with the kernel and compiler optimizations

A frame is shared with the kernel.
User-space code:
read frame       // read the frame contents
_mm_mfence       // make sure everything is read before "releasing" the frame
frame.status = 0 // "release" the frame

Kernel code:

poll frame.status // reads the frame's status
_mm_lfence

The kernel can poll it asynchronously, in another thread, so there is no syscall between the user-space and kernel-space code.
Is it correctly synchronized?
I doubt it, because of the following situation: the compiler has a weak memory model, and we have to assume it can make any transformation it likes, as long as the optimized program remains consistent within a single thread.
So, in my view, we need a second barrier, because it is possible that the compiler optimizes out the store frame.status = 0.
Yes, it would be a very wild optimization, but if the compiler could prove that no one in the context (within the thread) reads that field, it could optimize it out.
I believe that is theoretically possible, isn't it?
So, to prevent that we can put the second barrier:
User-space code:
read frame       // read the frame contents
_mm_mfence       // make sure everything is read before "releasing" the frame
frame.status = 0 // "release" the frame
_mm_mfence

OK, now the compiler restrains itself from that optimization.
What do you think?
EDIT
The question was prompted by the claim that _mm_mfence does not prevent stores from being optimized out.
@PeterCordes, to make sure I understand: _mm_mfence does not prevent stores from being optimized out (it is just an x86 memory barrier, not a compiler barrier)? However, atomic_thread_fence(any_order) prevents reorderings (depending on any_order, obviously), but does it also prevent stores from being optimized out?
For example:
// x is an int pointer
*x = 5;
*(x + 4) = 6;
std::atomic_thread_fence(std::memory_order_release);

Does this prevent the stores to x from being optimized out? It seems that it must; otherwise every store to x would have to be volatile.
However, I have seen a lot of lock-free code, and none of it marks fields as volatile.
_mm_mfence is also a compiler barrier. (See When should I use _mm_sfence _mm_lfence and _mm_mfence, and also BeeOnRope's answer there).
atomic_thread_fence with release, acq_rel, or seq_cst stops earlier stores from merging with later stores, but mo_acquire doesn't have to.
Writes to non-atomic global variables can only be optimized out by merging with other writes to the same non-atomic variables, not by removing them entirely. So the real question is which reorderings can happen that let two non-atomic assignments come together.
There has to be an assignment to an atomic variable in there somewhere for another thread to have anything to synchronize with. Some compilers might give atomic_thread_fence stronger behaviour wrt. non-atomic variables, but in C++11 there's no way for another thread to legally observe anything about the ordering of *x and x[4] in
#include <atomic>

std::atomic<int> shared_flag {0};
int x[8];

void writer() {
    *x = 0;
    x[4] = 0;
    std::atomic_thread_fence(std::memory_order_release);
    x[4] = 1;
    std::atomic_thread_fence(std::memory_order_release);
    shared_flag.store(1, std::memory_order_relaxed);
}
The store to shared_flag has to appear after the stores to x[0] and x[4], but it's only an implementation detail what order the stores to x[0] and x[4] happen in, and whether there are 2 stores to x[4].
For example, on the Godbolt compiler explorer gcc7 and earlier merge the stores to x[4], but gcc8 doesn't, and neither do clang or ICC. The old gcc behaviour does not violate the ISO C++ standard, but I think they strengthened gcc's thread_fence because it wasn't strong enough to prevent bugs in other cases.
For example,
void writer_gcc_bug() {
    *x = 0;
    std::atomic_thread_fence(std::memory_order_release);
    shared_flag.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    *x = 2; // gcc7 and earlier merge this with *x = 0, which is arguably a bug
}
gcc only does shared_flag = 1; *x = 2;, in that order. You could argue that there's no way for another thread to safely observe *x after seeing shared_flag == 1, because this thread writes it again right away with no synchronization (i.e. data-race UB in any potential observer makes this reordering arguably legal).
But the gcc developers don't think that's enough reason (it may violate the guarantees of the builtin __atomic functions that the <atomic> header uses to implement the API). And there may be other cases with a real bug, where even a standards-conforming program could observe aggressive reordering that violates the standard.
Apparently this changed in 2017-09 with the fix for gcc bug 80640.
Alexander Monakov wrote:
I think the bug is that on x86 __atomic_thread_fence(x) is expanded into nothing for x!=__ATOMIC_SEQ_CST, it should place a compiler barrier similar to expansion of __atomic_signal_fence.
(__atomic_signal_fence includes something as strong as asm("" ::: "memory" ).)
Yup, that would definitely be a bug. So it's not that gcc was being really clever and doing allowed reorderings; it was mostly just failing at thread_fence, and any correctness that did happen was due to other factors, like non-inline function boundaries! (And the fact that it doesn't optimize atomics, only non-atomics.)
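Applied back to the original frame-sharing question, the practical upshot is to make the status field itself atomic and release it with a release store; that both orders the preceding frame reads and guarantees the store cannot be optimized away. A minimal C11 sketch, where the frame layout is an assumption for illustration:

#include <stdatomic.h>
#include <stdint.h>
#include <stddef.h>

struct frame {
    unsigned char data[4096];
    _Atomic uint32_t status;    /* 1 = owned by user space, 0 = released */
};

void consume(struct frame *f, unsigned char *out)
{
    for (size_t i = 0; i < sizeof f->data; i++)
        out[i] = f->data[i];                /* read the frame contents */
    /* release the frame: no read above may be moved after this store,
       and the store itself must become visible to the kernel */
    atomic_store_explicit(&f->status, 0, memory_order_release);
}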

How can I enable UnAligned Access for ARM NEON in LLVM compiler?

What is the flag to enable unaligned memory access for ARM NEON in the LLVM compiler?
I was testing my ARM NEON intrinsic program in Xcode. I am accessing data from unaligned memory:
#include <arm_neon.h>

char TempMemory[32] = {0};
char *pTempMem = TempMemory;
pTempMem += 7;
int32x2_t i32x2_offset = vdup_n_s32(0); // destination vector for the lane load
int32x2_t i32x2_value = vld1_lane_s32((int32_t const *)pTempMem, i32x2_offset, 0);
The equivalent assembly for the intrinsic should be VLD1.32 {d0[0]}, [pTempMem], but the compiler aligns the access to the next multiple of 32 bits and reads data from there. Because of that, my program does not work correctly.
So, how can I enable unaligned access in the LLVM compiler?
This isn't actually a NEON problem, it's a C problem, and the issue is:
vld1_lane_s32((int32_t const *) pTempMem , i32x2_offset, 0);
Casting a pointer is a message to the compiler saying "hey, I know this looks bad, but trust me, I really know what I'm doing". Converting a pointer to type A into a pointer to type B, when the pointer does not have suitable alignment for type B, gives undefined behaviour. Therefore the compiler is free to assume that the argument to vld1_lane_s32 is always 32-bit aligned, because there's no valid way it couldn't be (and you've promised you know what you're doing), so it emits the instruction with the alignment hint.
Now, you could fiddle around with options in an attempt to get a different kind of undefined behaviour that matches what you want, but that's just bodging around the problem rather than fixing it. That the underlying NEON instruction set can support unaligned accesses doesn't affect the C language's definition of and restrictions around data alignment.
I'm not familiar with how clever LLVM is, so I'm not sure if simply omitting the pointer cast would work (technically, C permits converting char * to any other type of data pointer, so it should be able to sort out the alignment itself). Otherwise, the solution is to use an appropriate vld*_u8 operation to load the data into the vector via the correct type, then cast that with vreinterpret_s32_u8 once it's in the register.
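A minimal sketch of that last suggestion (function names are illustrative):

#include <arm_neon.h>
#include <string.h>

/* Load 8 bytes through the correct (byte) type, then reinterpret the
   register as two s32 lanes; byte loads carry no alignment hint. */
int32x2_t load_unaligned_s32x2(const char *p)
{
    uint8x8_t bytes = vld1_u8((const uint8_t *)p);
    return vreinterpret_s32_u8(bytes);
}

/* If only one lane must be replaced (as with vld1_lane_s32), an
   alignment-safe scalar load plus vset_lane_s32 does the same job. */
int32x2_t load_unaligned_lane0(const char *p, int32x2_t v)
{
    int32_t tmp;
    memcpy(&tmp, p, sizeof tmp);
    return vset_lane_s32(tmp, v, 0);
}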

Can I assume sizeof(GUID)==16 at all times?

The definition of GUID in the Windows headers is this:
typedef struct _GUID {
    unsigned long  Data1;
    unsigned short Data2;
    unsigned short Data3;
    unsigned char  Data4[8];
} GUID;
However, no packing is defined. Since the alignment of structure members depends on the compiler implementation, one could think this structure might be longer than 16 bytes.
If I can assume it is always 16 bytes, my code using GUIDs is more efficient and simpler.
However, it would be completely unsafe if a compiler added padding between the members for some reason.
My question: do potential reasons exist, or is the probability of sizeof(GUID) != 16 actually zero?
It's not official documentation, but perhaps this article can ease some of your fears. I think there was another one on a similar topic, but I cannot find it now.
What I want to say is that Windows structures do have a packing specifier, but it's a global setting which is somewhere inside the header files. It's a #pragma or something. And it is mandatory, because otherwise programs compiled by different compilers couldn't interact with each other - or even with Windows itself.
It's not zero; it depends on your system. If the alignment is word (4-byte) based, you'll have padding between the shorts, and the size will be more than 16.
If you want to be sure that it's 16, manually disable the padding; otherwise use sizeof, and don't assume the value.
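For illustration, "manually disable the padding" would look something like this (an illustrative copy, not the SDK's own definition; the real Windows headers manage packing themselves):

#pragma pack(push, 1)
typedef struct _MY_GUID {
    unsigned long  Data1;
    unsigned short Data2;
    unsigned short Data3;
    unsigned char  Data4[8];
} MY_GUID;
#pragma pack(pop)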
If I feel I need to make an assumption like this, I'll put a 'compile time assertion' in the code. That way, the compiler will let me know if and when I'm wrong.
If you have or are willing to use Boost, there's a BOOST_STATIC_ASSERT macro that does this.
For my own purposes, I've cobbled together my own (that works in C or C++ with MSVC, GCC and an embedded compiler or two) that uses techniques similar to those described in this article:
http://www.pixelbeat.org/programming/gcc/static_assert.html
The real trick to getting the compile-time assertion to work cleanly is dealing with the fact that some compilers don't like declarations mixed with code (MSVC in C mode), and that the techniques often generate warnings that you'd rather not have clogging up an otherwise-working build. Coming up with techniques that avoid the warnings is sometimes a challenge.
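For illustration, a minimal version of such an assertion: C11's _Static_assert where available, with the classic negative-array-size trick as a fallback (include whichever header defines GUID in your project):

#include <guiddef.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
_Static_assert(sizeof(GUID) == 16, "GUID must be 16 bytes");
#else
/* pre-C11: a negative array size forces a compile-time error */
#define STATIC_ASSERT(cond, name) typedef char name[(cond) ? 1 : -1]
STATIC_ASSERT(sizeof(GUID) == 16, guid_must_be_16_bytes);
#endif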
Yes, on any Windows compiler. Otherwise IsEqualGUID would not work: it compares only the first 16 bytes. Similarly, any other WinAPI function that takes a GUID* just checks the first 16 bytes.
Note that you must not assume generic C or C++ rules for windows.h. For instance, a byte is always 8 bits on Windows, even though ISO C only requires CHAR_BIT >= 8.
Anytime you write code dependent on the size of someone else's structure, warning bells should go off.
Could you give an example of some of the simplified code you want to use?
Most people would just use sizeof(GUID) if the size of the structure was needed.
With that said -- I can't see the size of GUID ever changing.
#include <stdio.h>
#include <rpc.h>

int main() {
    GUID myGUID;
    printf("size of GUID is %zu\n", sizeof(myGUID));
    return 0;
}
Got 16. This is useful to know if you need to manually allocate on the heap.

How can I override malloc(), calloc(), free() etc. under OS X?

Assuming the latest Xcode and GCC, what is the proper way to override the memory allocation functions (I guess operator new/delete as well)? The debugging memory allocators are too slow for a game; I just need some basic stats I can gather myself with minimal impact.
I know it's easy on Linux due to the hooks, and this was trivial under CodeWarrior ten years ago when I wrote HeapManager.
Sadly, SmartHeap no longer has a Mac version.
I would use library preloading for this task, because it does not require modification of the running program. If you're familiar with the usual Unix way to do this, it's almost a matter of replacing LD_PRELOAD with DYLD_INSERT_LIBRARIES.
First step is to create a library with code such as this, then build it using regular shared library linking options (gcc -dynamiclib):
#include <stddef.h>
#include <stdio.h>
#include <dlfcn.h>

void *malloc(size_t size)
{
    void *(*real_malloc)(size_t);

    real_malloc = dlsym(RTLD_NEXT, "malloc");
    fprintf(stderr, "allocating %lu bytes\n", (unsigned long)size);
    /* Do your stuff here */
    return real_malloc(size);
}
Note that if you also divert calloc() and its implementation calls malloc(), you may need additional code to check how you're being called. C++ programs should be pretty safe because the new operator calls malloc() anyway, but be aware that no standard enforces that. I have never encountered an implementation that didn't use malloc(), though.
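For instance, a sketch of one way to guard against that re-entrancy (the guard-flag idea is my illustration, not part of the technique above):

#include <stddef.h>
#include <dlfcn.h>

/* dlsym() itself may allocate while we resolve the real calloc;
   detect the re-entry and fail that inner allocation gracefully. */
static __thread int resolving;

void *calloc(size_t n, size_t size)
{
    static void *(*real_calloc)(size_t, size_t);

    if (!real_calloc) {
        if (resolving)
            return NULL;    /* re-entered from dlsym: refuse politely */
        resolving = 1;
        real_calloc = dlsym(RTLD_NEXT, "calloc");
        resolving = 0;
    }
    return real_calloc(n, size);
}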
Finally, set up the running environment for your program and launch it (might require adjustments depending on how your shell handles environment variables):
export DYLD_INSERT_LIBRARIES=./yourlibrary.dylib
export DYLD_FORCE_FLAT_NAMESPACE=1
yourprogram --yourargs
See the dyld manual page for more information about the dynamic linker environment variables.
This method is pretty generic. There are limitations, however:
You won't be able to divert direct system calls
If the application itself tricks you by using dlsym() to load malloc's address, the call won't be diverted. Unless, however, you trick it back by also diverting dlsym!
The malloc_default_zone technique mentioned at http://lists.apple.com/archives/darwin-dev/2005/Apr/msg00050.html appears to still work, see e.g. http://code.google.com/p/fileview/source/browse/trunk/fileview/fv_zone.cpp?spec=svn354&r=354 for an example use that seems to be similar to what you intend.
After much searching (here included) and issues with 10.7 I decided to write a blog post about this topic: How to set malloc hooks in OSX Lion
You'll find a few good links at the end of the post with more information on this topic.
The basic solution:
/* protect_size, malloc_zones, original_free and my_free are defined in the
   full post; malloc_zones is the zone array exported by the system malloc. */
malloc_zone_t *dz = malloc_default_zone();
if (dz->version >= 8)
{
    /* remove the write protection */
    vm_protect(mach_task_self(), (uintptr_t)malloc_zones, protect_size, 0,
               VM_PROT_READ | VM_PROT_WRITE);
}
original_free = dz->free;
dz->free = &my_free; /* this line throws a bad-pointer exception without calling vm_protect first */
if (dz->version == 8)
{
    /* put the write protection back */
    vm_protect(mach_task_self(), (uintptr_t)malloc_zones, protect_size, 0,
               VM_PROT_READ);
}
This is an old question, but I came across it while trying to do this myself. I got curious about this topic for a personal project I was working on, mainly to make sure that what I thought was automatically deallocated was being properly deallocated. I ended up writing a C++ implementation to allow me to track the amount of allocated heap and report it out if I so chose.
https://gist.github.com/monitorjbl/3dc6d62cf5514892d5ab22a59ff34861
As the name notes, this is OSX-specific. However, I was able to do the same on Linux using malloc_usable_size.
Example
#define MALLOC_DEBUG_OUTPUT
#include "malloc_override_osx.hpp"

int main(){
    int* ip = (int*)malloc(sizeof(int));
    double* dp = (double*)malloc(sizeof(double));
    free(ip);
    free(dp);
}
Building
$ clang++ -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk \
-pipe -stdlib=libc++ -std=gnu++11 -g -o test test.cpp
$ ./test
0x7fa28a403230 -> malloc(16) -> 16
0x7fa28a403240 -> malloc(16) -> 32
0x7fa28a403230 -> free(16) -> 16
0x7fa28a403240 -> free(16) -> 0
Hope this helps someone else out in the future!
If the basic stats you need can be collected in a simple wrapper, a quick (and kinda dirty) trick is just using some #define macro replacement.
void* _mymalloc(size_t size)
{
    void* ptr = malloc(size);
    /* do your stat work? */
    return ptr;
}
and
#define malloc(sz_) _mymalloc(sz_)
Note: if the macro is defined before the _mymalloc definition, it will end up replacing the malloc call inside that function, leaving you with infinite recursion... so make sure this isn't the case. You might want to explicitly #undef it before that function definition and simply (re)define it afterward, depending on where you end up including it, to avoid this situation. Concretely, that ordering looks something like the sketch below.
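#include <stdlib.h>

#undef malloc                        /* make sure the wrapper calls the real malloc */

void* _mymalloc(size_t size)
{
    void* ptr = malloc(size);        /* real malloc */
    /* do your stat work? */
    return ptr;
}

#define malloc(sz_) _mymalloc(sz_)   /* divert every later malloc call */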
I think that if you define your own malloc() and free() in a .c file included in the project, the linker will resolve to that version.
Now then, how do you intend to implement malloc?
Check out the approach of Emery Berger, the author of the Hoard memory allocator, for replacing the allocator on OSX at https://github.com/emeryberger/Heap-Layers/blob/master/wrappers/macwrapper.cpp (and a few other files you can trace yourself by following the includes).
This is complementary to Alex's answer, but I thought this example was more to-the-point of replacing the system provided allocator.
