I am trying to practice a bit using FreeRTOS on my Arduino. I believe I installed the libraries correctly as well as executed the code. When I try to verify on my IDE Arduino, I get the following error. At first, I thought
I needed to update the IDE on my macOS and all the libraries to help with storage, but I am still getting the error.
"text section exceeds available space in boardSketch uses 46768 bytes (144%) of program storage space. Maximum is 32256 bytes.
Global variables use 1572 bytes (76%) of dynamic memory, leaving 476 bytes for local variables. Maximum is 2048 bytes.
Sketch too big; see https://support.arduino.cc/hc/en-us/articles/360013825179 for tips on reducing it.
Error compiling for board Arduino Uno."
Related
I ran valgrind to one of my open-source OpenCL codes (https://github.com/fangq/mmc), and it detected a lot of memory leaks in the OpenCL host code. Most of those pointed back to the line where I created the context object using clCreateContextFromType.
I double checked all my OpenCL variables, command queues, kernels and programs, and made sure that they are all properly released, but still, when testing on sample programs, every call to the mmc_run_cl() function bumps up memory by 300MB-400MB and won't release at return.
you can reproduce the valgrind report by running the below commands in a terminal:
git clone https://github.com/fangq/mmc.git
cd mmc/src
make clean
make all
cd ../examples/validation
valgrind --show-leak-kinds=all --leak-check=full ../../src/bin/mmc -f cube2.inp -G 1 -s cube2 -n 1e4 -b 0 -D TP -M G -F bin
assuming you system has gcc/git/libOpenCL and valgrind installed. Change the -G 1 input to a different number if you want to run it on other OpenCL devices (add -L to list).
In the below table, I list the repeated count of each valgrind detected leaks on an NVIDIA GPU (TitanV) on a Linux box (Ubuntu 16.04) with the latest driver+cuda 9.
Again, most leaks are associated with the clCreateContextFromType line, which I assume some GPU memories are not released, but I did released all GPU resources at the end of the host code.
do you notice anything that I missed in my host code? your input is much appreciated
counts | error message
------------------------------------------------------------------------------------
380 ==27828== by 0x402C77: main (mmc.c:67)
Code: entry point to the below errors
64 ==27828== by 0x41CF02: mcx_list_gpu (mmc_cl_utils.c:135)
Code: OCL_ASSERT((clGetPlatformIDs(0, NULL, &numPlatforms)));
4 ==27828== by 0x41D032: mcx_list_gpu (mmc_cl_utils.c:154)
Code: context=clCreateContextFromType(cps,devtype[j],NULL,NULL,&status);
58 ==27828== by 0x41DF8A: mmc_run_cl (mmc_cl_host.c:111)
Code: entry point to the below errors
438 ==27828== by 0x41E006: mmc_run_cl (mmc_cl_host.c:124)
Code: OCL_ASSERT(((mcxcontext=clCreateContextFromType(cprops,CL_DEVICE_TYPE_ALL,...));
13 ==27828== by 0x41E238: mmc_run_cl (mmc_cl_host.c:144)
Code: OCL_ASSERT(((mcxqueue[i]=clCreateCommandQueue(mcxcontext,devices[i],prop,&status),status)));
1 ==27828== by 0x41E7A6: mmc_run_cl (mmc_cl_host.c:224)
Code: OCL_ASSERT(((gprogress[0]=clCreateBufferNV(mcxcontext,CL_MEM_READ_WRITE, NV_PIN, ...);
1 ==27828== by 0x41E7F9: mmc_run_cl (mmc_cl_host.c:225)
Code: progress = (cl_uint *)clEnqueueMapBuffer(mcxqueue[0], gprogress[0], CL_TRUE, ...);
10 ==27828== by 0x41EDFA: mmc_run_cl (mmc_cl_host.c:290)
Code: status=clBuildProgram(mcxprogram, 0, NULL, opt, NULL, NULL);
7 ==27828== by 0x41F95C: mmc_run_cl (mmc_cl_host.c:417)
Code: OCL_ASSERT((clEnqueueReadBuffer(mcxqueue[devid],greporter[devid],CL_TRUE,0,...));
Update [04/11/2020]:
Reading #doqtor's comment, I did the following test on 5 difference devices, 2 NVIDIA GPUs, 2 AMD GPUs and 1 Intel CPU. What he said was correct - the memory leak does not happen on the Intel OpenCL library, I also found that AMD OpenCL driver is fine too. The only problem is that NVIDIA OpenCL library seems to have a leak on both GPUs I tested (Titan V and RTX2080).
My test results are below. Memory/CPU profiling using psrecord introduced in this post.
I will open a new question and bounty on how to reduce this memory leak with NVIDIA OpenCL. If you have any experience in this, please share. will post the link below. thanks
I double checked all my OpenCL variables, command queues, kernels and
programs, and made sure that they are all properly released...
Well I still found one (tiny) memory leak in mmc code:
==15320== 8 bytes in 1 blocks are definitely lost in loss record 14 of 1,905
==15320== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15320== by 0x128D48: mmc_run_cl (mmc_cl_host.c:137)
==15320== by 0x11E71E: main (mmc.c:67)
Memory allocated by greporter isn't freed. So that's to be fixed by you.
The rest are potential memory leaks in OpenCL library. They may or may not to be a memory leaks as for example the library may use custom memory allocators which valgrind does not recognizes or does some other tricks. There is a lot threads about that:
clGetPlatformIDs Memory Leak
https://software.intel.com/en-us/forums/opencl/topic/753786
https://github.com/KhronosGroup/OpenCL-ICD-Loader/issues/13
OpenCL clGetPlatformIDs gives around 230 valgrind memcheck errors
In general you can't do much about that unless you want to dive into the library code and do something about that.
I would suggest to carefully suppress those reported which are coming from the library. The suppression file can be generated as described in the valgrind manual: https://valgrind.org/docs/manual/manual-core.html#manual-core.suppress
... but still, when testing on sample programs, every call to the
mmc_run_cl() function bumps up memory by 300MB-400MB and won't release
at return
How did you checked that? I haven't seen memory suspiciously growing. I set -n 1000e4 and it made it to run for like 2 minutes where the memory allocated stayed still for all the time at ~0.6% of my RAM size. Note that I didn't use nvidia CUDA but POCL on Intel GPU and CPU and linked with libOpenCL installed from ocl-icd-libopencl1:amd64 package on Ubuntu 18.04. So you may try to give that a go and check if that changes anything.
======== Update ================================
I've re-run it as you described in the comment and after first iteration the memory usage was 0.6% then after 2nd iteration it increased to 0.9% and after that the next iterations didn't increase memory usage. Valgrind also didn't report anything newer besides what I observed earlier. So I would suggest to link with different than nvidia-cuda libOpenCL and retest.
I have an Intel FPGA PCIe endpoint. It shows up correctly in lspci and all of the lspci -vv information looks correct (memory map, IRQ, BAR0 size all look OK). I want to stream some data over BAR0 and read/write status registers inside of my IP. My host machine has an Intel x86_64 CPU and running a Debian OS.
What I"m currently doing is this:
open() call to /sys/class/uio/uio0/device/resource0 -> returns a file descriptor.
mmap 4 KB on that file descriptor with PROT_READ + PROT_WRITE protection, and MAP_SHARED flags. Offset is 0. --> returns a pointer.
In a loop, set offsets relative to the pointer to random numbers. Do this for ~1000 bytes, 4 bytes at a time. After each write to the pointer, call msync.
Read back the pointer one DWORD at a time.
Read back the pointer in bulk using memcpy.
The outcome of step #4 is that most of the data looks correct, but some is not (which is strange). The outcome of number 5 is that I get 0xffff_ffff for everything, which is even stranger.
If I try to replace step #3 with a single memset / msync sequence, the program hangs for a little while and then returns a bus error. After this, lspci states that BAR0 is disabled and I can no longer interact with it.
Any ideas what I'm doing wrong? It could be a HW issue, but the HW is really simple right now. I configured the FPGA to act purely as a slave device, with its read response being the registered write data coming across the BAR0 interface. The IP I am working with only has a read-response-valid line (no write response valid, somehow) which I have hard-coded to 1. Seems that burst sizes more than 1 cause some kinds of issues with the PCIe core as seen in the memcpy/memset issues, but I don't see why that would be the case.
EDIT/UPDATE:
I was able to work around this. Supposedly the MMIO is only for 32b, so writing a loop that access the pointer 32b at a time is the solution here.
I'm trying to find current flag count in KMines by using gdb. I know that I should look for memory mappings first to avoid non-existent memory locations. So I ran info proc mappings command to see the memory segments. I picked up a random memory gap (0xd27000-0x168b000) from the result and executed the find command like this: find 0x00d27000, 0x0168b000, 10
But I got the warning: Unable to access 1458 bytes of target memory at 0x168aa4f, halting search. error. Although the address 0x168aa4f is between 0xd27000 and 0x168b000, gdb says that it can't access to it. Why does this happen? What can I do to avoid this situation? Or is there a way to ignore unmapped/unaccessible memory locations?
Edit: I tried to set the value of the address 0x168aa4f to 1 and it works, so gdb can actually access that address but gives error when used with the find command. But why?
I guess I have solved my own problem, I can't believe how simple the solution was. The only thing I did was to decrease the 2nd parameter's value by one. So the code should be find 0x00d27000, 0x0168afff, 10 because linux allocates the memory by using maps in [x,y) format, so if the line in root/proc/pid/maps says something like this;
01a03000-0222a000 rw-p
The memory allocated includes 0x01a03000 but not 0x0222a000. Hope this silly mistake of mine helps someone :D
Edit: The root of the problem is the algorithm implemented in target.c(gdb's source code I mean) the algorithm reads and searches the memory as chunks at the size of 16000 bytes. So even if the last byte of the chunk is invalid, gdb will throw the entire chunk into the trash and won't even give any proper information about the invalid byte, it only reports the beginning of the current chunk.
Situation:
I'm trying to manipulate/hack a Debian kernel to be able to use 9-bit UART via the parity hack that you can find reference to everywhere on the NET. Now for some reason Tx is working just fine, but when we look at Rx we're dropping bytes. We're expecting 9 bytes back, including the crc, but we get a variable amount back between 4-7. It will vary with the exact same code run a few times in a row, so it almost appears to be buffer related. Hooking it up to the logic analyzer the response is proper(9 expected bytes) from the slave device, but it is getting "mangled" somewhere low level that I can't track down.
Attempted:
So I started dumping all the bytes(char) that come in here:
static void serial_omap_rdi(struct uart_omap_port *up, unsigned int lsr)
Which still results in only seeing 4-7 bytes of bad data. I'm assuming this is due to parity/framing/FIFO errors but I just can't seem to get any deeper to where I can see exactly where it's failing.
Questions:
Has anyone seen anything like this type of discarded data on the UART line with a linux kernel? If so would you know any possible avenues to take in tracking it down?
Is there anyplace in the kernel I can look to actually dump the bit-by-bit data that is coming into the UART Rx line?
Any help is greatly appreciated as I'm at my wits end understanding how/where exactly the kernel is discarding these bytes...
Platform:
BBB Debian Linux 3.8.13-bone53 armv7l GNU/Linux
16750 UART (I Believe)
I'm disassembling "Test Drive III". It's a 1990 DOS game. The *.EXE has MZ format.
I've never dealt with segmentation or DOS, so I would be grateful if you answered some of my questions.
1) The game's system requirements mention 286 CPU, which has protected mode. As far as I know, DOS was 90% real mode software, yet some applications could enter protected mode. Can I be sure that the app uses the CPU in real mode only? IOW, is it guaranteed that the segment registers contain actual offset of the segment instead of an index to segment descriptor?
2) Said system requirements mention 1 MB of RAM. How is this amount of RAM even meant to be accessed if the uppermost 384 KB of the address space are reserved for stuff like MMIO and ROM? I've heard about UMBs (using holes in UMA to access RAM) and about HMA, but it still doesn't allow to access the whole 1 MB of physical RAM. So, was precious RAM just wasted because its physical address happened to be reserved for UMA? Or maybe the game uses some crutches like LIM EMS or XMS?
3) Is CS incremented automatically when the code crosses segment boundaries? Say, the IP reaches 0xFFFF, and what then? Does CS switch to the next segment before next instruction is executed? Same goes for SS. What happens when SP goes all the way down to 0x0000?
4) The MZ header of the executable looks like this:
signature 23117 "0x5a4d"
bytes_in_last_block 117
blocks_in_file 270
num_relocs 0
header_paragraphs 32
min_extra_paragraphs 3349
max_extra_paragraphs 65535
ss 11422
sp 128
checksum 0
ip 16
cs 8385
reloc_table_offset 30
overlay_number 0
Why does it have no relocation information? How is it even meant to run without address fixups? Or is it built as completely position-independent code consisting from program-counter-relative instructions? The game comes with a cheat utility which is also an MZ executable. Despite being much smaller (8448 bytes - so small that it fits into a single segment), it still has relocation information:
offset 1
segment 0
offset 222
segment 0
offset 272
segment 0
This allows IDA to properly disassemble the cheat's code. But the game EXE has nothing, even though it clearly has lots of far pointers.
5) Is there even such thing as 'sections' in DOS? I mean, data section, code (text) section etc? The MZ header points to the stack section, but it has no information about data section. Is data and code completely mixed in DOS programs?
6) Why even having a stack section in EXE file at all? It has nothing but zeroes. Why wasting disk space instead of just saying, "start stack from here"? Like it is done with BSS section?
7) MZ header contains information about initial values of SS and CS. What about DS? What's its initial value?
8) What does an MZ executable have after the exe data? The cheat utility has whole 3507 bytes in the end of the executable file which look like
__exitclean.__exit.__restorezero._abort.DGROUP#.__MMODEL._main._access.
_atexit._close._exit._fclose._fflush._flushall._fopen._freopen._fdopen
._fseek._ftell._printf.__fputc._fputc._fputchar.__FPUTN.__setupio._setvbuf
._tell.__MKNAME._tmpnam._write.__xfclose.__xfflush.___brk.___sbrk._brk._sbrk
.__chmod.__close._ioctl.__IOERROR._isatty._lseek.__LONGTOA._itoa._ultoa.
_ltoa._memcpy._open.__open._strcat._unlink.__VPRINTER.__write._free._malloc
._realloc.__REALCVT.DATASEG#.__Int0Vector.__Int4Vector.__Int5Vector.
__Int6Vector.__C0argc.__C0argv.__C0environ.__envLng.__envseg.__envSize
Is this some kind of debugging symbol information?
Thank you in advance for your help.
Re. 1. No, you can't be sure until you prove otherwise to yourself. One giveaway would be the presence of MOV CR0, ... in the code.
Re. 2. While marketing materials aren't to be confused with an engineering specification, there's a technical reason for this. A 286 CPU could address more than 1M of physical address space. The RAM was only "wasted" in real mode, and only if an EMM (or EMS) driver wasn't used. On 286 systems, the RAM past 640kb was usually "pushed up" to start at the 1088kb mark. The ISA and on-board peripherals' memory address space was mapped 1:1 into the 640-1024kb window. To use the RAM from the real mode needed an EMM or EMS driver. From protected mode, it was simply "there" as soon as you set up the segment descriptor correctly.
If the game actually needed the extra 384kb of RAM over the 640kb available in the real mode, it's a strong indication that it either switched to protected mode or required the services or an EMM or EMS driver.
Re. 3. I wish I remembered that. On reflection, I wish not :) Someone else please edit or answer separately. Hah, I did know it at some point in time :)
Re. 4. You say "[the code] has lots of instructions like call far ptr 18DCh:78Ch". This implies one of three things:
Protected mode is used and the segment part of the address is a selector into the segment descriptor table.
There is code there that relocates those instructions without DOS having to do it.
There is code there that forcibly relocates the game to a constant position in the address space. If the game doesn't use DOS to access on-disk files, it can remove DOS completely and take over, gaining lots of memory in the process. I don't recall whether you could exit from the game back to the command prompt. Some games where "play until you reboot".
Re. 5. The .EXE header does not "point" to any stack, there is no stack section you imply, the concept of sections doesn't exist as far as the .EXE file is concerned. The SS register value is obtained by adding the segment the executable was loaded at with the SS value from the header.
It's true that the linker can arrange sections contiguously in the .EXE file, but such sections' properties are not included in the .EXE header. They often can be reverse-engineered by inspecting the executable.
Re. 6. The SS and SP values in the .EXE header are not file pointers. The EXE file might have a part that maps to the stack, but that's entirely optional.
Re. 7. This is already asked and answered here.
Re. 8. This looks like a debug symbol list. The cheat utility was linked with the debugging information left in. You can have completely arbitrary data there - often it'd various resources (graphics, music, etc.).