There are a lot of logs and other stuff below, so to skip to the punchline: I have a linker script and am setting variables within it, and using these variables to set up memory sections. But it seems like these variables are always being set to 0 no matter what I set them to in the script.
I am writing a linker script, trying to add two new memory sections to my output elf for an embedded systems application. I am trying to chop off the first bit of the preexisting RAM memory section, change the size of RAM accordingly, and place my new sections there.
Previously, using the nrf52840 development board and this library, I can modify the linker script for one of the example applications as so:
SEARCH_DIR(.)
GROUP(-lgcc -lc -lnosys)
__NewSection1Start = 0x20000000;
__TotalLength = 0x1000;
__NewSection2Length = __TotalLength / 2;
__NewSection1Length = __TotalLength / 2;
__NewSection2Start = __NewSection1Start + __NewSection1Length;
MEMORY
{
FLASH (rx) : ORIGIN = 0x0, LENGTH = 0xff000
NEWSECTION1 (rwx) : ORIGIN = __NewSection1Start, LENGTH = __NewSection1Length
NEWSECTION2 (rwx) : ORIGIN = __NewSection2Start, LENGTH = __NewSection2Length
RAM (rwx) : ORIGIN = 0x20000000 + __TotalLength, LENGTH = 0x40000
}
FLASH_PAGE_SIZE = 4096;
FLASH_DATA_PAGES_USED = 4;
SECTIONS
{
.newsection1 :
{
KEEP (*(.newsection1));
} > NEWSECTION1
.newsection2 :
{
KEEP (*(.newsection2));
} > NEWSECTION2
}
...
And this works perfectly. Recently, I have needed to port to the nrf9160 using the Zephyr RTOS, so I began using the NordicPlayground nrf library built on top of Zephyr at this link, modifying one of the sample apps in there. In trying to rewrite the program for that board and environment, using a technique similar to this, I did a similar thing with the Zephyr linker script, modifying it as before. The build process somehow takes what I have done and generates the file build/spm/zephyr/linker.cmd in the build output folder:
__TotalLength = 0x1000;
__NewSection1Start = 0x20000000;
__NewSection2Length = __TotalLength / 2;
__NewSection1Length = __TotalLength / 2;
__NewSection2Start = __NewSection1Start + __NewSection1Length;
MEMORY
{
FLASH (rx) : ORIGIN = 0x0, LENGTH = 0xc000
SRAM (wx) : ORIGIN = 0x20000000 + __TotalLength, LENGTH = (64 * 1K) - __TotalLength
NEWSECTION1 (rwx) : ORIGIN = __NewSection1Start, LENGTH = __NewSection1Length
NEWSECTION2 (rwx) : ORIGIN = __NewSection2Start, LENGTH = __NewSection2Length
IDT_LIST (wx) : ORIGIN = (0x20000000 + (64 * 1K)), LENGTH = 2K
}
ENTRY("__start")
SECTIONS
{
.newsection1 :
{
KEEP (*(.newsection1));
} > NEWSECTION1
.newsection2 :
{
KEEP (*(.newsection2));
} > NEWSECTION2
}
...
However, although I thought these two scripts should behave the same, trying to compile and link with the nrf9160 version with the Zephyr RTOS throws this error:
riley#riley-Blade:~<code_base>/nrf/samples/nrf9160/example_app$ west build -b nrf9160_pca10090ns
source directory: /home/riley<code_base>/nrf/samples/nrf9160/example_app
build directory: /home/riley<code_base>/nrf/samples/nrf9160/example_app/build
BOARD: nrf9160_pca10090ns (origin: CMakeCache.txt)
[6/17] Linking C executable spm/zephyr/spm_zephyr_prebuilt.elf
Memory region Used Size Region Size %age Used
FLASH: 32 KB 48 KB 66.67%
SRAM: 10000 B 60 KB 16.28%
NEWSECTION1: 0 GB 2 KB 0.00%
NEWSECTION2: 0 GB 2 KB 0.00%
IDT_LIST: 40 B 2 KB 1.95%
[12/17] Linking C executable zephyr/zephyr_prebuilt.elf
FAILED: zephyr/zephyr_prebuilt.elf
: && ccache /home/riley/gcc-arm-none-eabi-9-2019-q4-major/bin/arm-none-eabi-gcc zephyr/CMakeFiles/zephyr_prebuilt.dir/misc/empty_file.c.obj -o zephyr/zephyr_prebuilt.elf -Wl,-T zephyr/linker.cmd -Wl,-Map=/home/riley<code_base>/nrf/samples/nrf9160/example_app/build/zephyr/zephyr.map -u_OffsetAbsSyms -u_ConfigAbsSyms -Wl,--whole-archive app/libapp.a zephyr/libzephyr.a zephyr/arch/arm/core/libarch__arm__core.a zephyr/arch/arm/core/cortex_m/libarch__arm__core__cortex_m.a zephyr/arch/arm/core/cortex_m/mpu/libarch__arm__core__cortex_m__mpu.a zephyr/lib/libc/minimal/liblib__libc__minimal.a zephyr/subsys/net/libsubsys__net.a zephyr/subsys/net/ip/libsubsys__net__ip.a zephyr/drivers/gpio/libdrivers__gpio.a zephyr/drivers/serial/libdrivers__serial.a zephyr/modules/nrf/lib/bsdlib/lib..__nrf__lib__bsdlib.a zephyr/modules/nrf/lib/at_host/lib..__nrf__lib__at_host.a zephyr/modules/nrf/drivers/at_cmd/lib..__nrf__drivers__at_cmd.a /home/riley<code_base>/nrfxlib/bsdlib/lib/cortex-m33/hard-float/libbsd_nrf9160_xxaa.a -Wl,--no-whole-archive zephyr/kernel/libkernel.a zephyr/CMakeFiles/offsets.dir/arch/arm/core/offsets/offsets.c.obj -L"/home/riley/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/thumb/v8-m.main+fp/hard" -L/home/riley<code_base>/nrf/samples/nrf9160/example_app/build/zephyr -lgcc -Wl,--print-memory-usage /home/riley<code_base>/nrfxlib/crypto/nrf_oberon/lib/cortex-m33/hard-float/liboberon_3.0.0.a -lc -mthumb -mcpu=cortex-m33 -mfpu=fpv5-sp-d16 -Wl,--gc-sections -Wl,--build-id=none -Wl,--sort-common=descending -Wl,--sort-section=alignment -nostdlib -static -no-pie -Wl,-X -Wl,-N -Wl,--orphan-handling=warn -mabi=aapcs -march=armv8-m.main+dsp libspmsecureentries.a && :
Memory region Used Size Region Size %age Used
FLASH: 110428 B 976 KB 11.05%
SRAM: 24608 B 124 KB 19.38%
NEWSECTION1: 264 B 0 GB inf%
NEWSECTION2: 1 B 0 GB inf%
IDT_LIST: 120 B 2 KB 5.86/home/riley/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/ld: zephyr/zephyr_prebuilt.elf section `.newsection1' will not fit in region `NEWSECTION1'
/home/riley/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/ld: zephyr/zephyr_prebuilt.elf section `.newsection2' will not fit in region `NEWSECTION2'
/home/riley/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/ld: section .newsection2 LMA [0000000000000000,0000000000000000] overlaps section .newsection1 LMA [0000000000000000,0000000000000107]
/home/riley/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/ld: region `NEWSECTION1' overflowed by 264 bytes
/home/riley/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/ld: region `NEWSECTION2' overflowed by 1 byte
collect2: error: ld returned 1 exit status
%
ninja: build stopped: subcommand failed.
ERROR: command exited with status 1: /home/riley/.local/bin/cmake --build /home/riley<code_base>/nrf/samples/nrf9160/example_app/build
This is really strange for me, because this ouptut makes it seem like NEWSECTION1 and NEWSECTION2 have 0 length and both start at address 0x0, but this definitely should not be the case looking at the used linker script. To further make this confusing, if I replace __NewSection1Length with 0x500, or something like that, then the memory section will be 0x500B long, meaning that it seems like the error is that using variables in this linker script are being ignored or set to 0? Any help would be very much appreciated.
Some of the more complicated features of Zephyr like bootloaders and SPM (secure partition manager) actually build using independent board definitions, linker scripts, and project configurations. Even though they all happen from one call to west build or ninja or make or whatever build tool you are configured with, they can sometimes end up mismatched.
You can see this if you go into the build output directory and notice that there will be multiple sub-directories with names like spm and mcuboot and zephyr. Each of these sub-directories will contain their own object files, map files, linker scripts, etc. The zephyr directory is the top level one that contains the build/link work for your final application and will merge in the pre-compiled binaries from the others. What I think is happening is that the prj.conf changes you made to create the custom sections in your application are only affecting one of these sub-builds. You likely need to find the prj.conf files for the other ones and modify them similarly.
Related
I have a problem debugging an stm32f407vet6 board and rust code.
The point of the problem is that GDB ignores breakpoints.
After setting breakpoints and executing the "continue" command in gdb, the program continues to ignore all breakpoints.
The only way to stop the program running is to cause an interrupt using the "ctrl + c" command.
After this command, the board stops its execution on the line currently being executed.
I have tried to set breakpoints on all lines where I can set them, but all the attempts are unsuccessful.
$ openocd
Open On-Chip Debugger 0.10.0 (2020-07-01) [https://github.com/sysprogs/openocd]
Licensed under GNU GPL v2
libusb1 09e75e98b4d9ea7909e8837b7a3f00dda4589dc3
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : auto-selecting first available session transport "hla_swd". To override use 'transport select <transport>'.
Info : The selected transport took over low-level target control. The results might differ compared to plain JTAG/SWD
Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
Info : clock speed 2000 kHz
Error: libusb_open() failed with LIBUSB_ERROR_NOT_SUPPORTED
Info : STLINK V2J35S7 (API v2) VID:PID 0483:3748
Info : Target voltage: 6.436364
Info : stm32f4x.cpu: hardware has 6 breakpoints, 4 watchpoints
Info : starting gdb server for stm32f4x.cpu on 3333
Info : Listening on port 3333 for gdb connections
$ arm-none-eabi-gdb -q target\thumbv7em-none-eabihf\debug\test_blink
Reading symbols from target\thumbv7em-none-eabihf\debug\test_blink...
(gdb) target remote :3333
Remote debugging using :3333
0x00004070 in core::ptr::read_volatile (src=0xe000e010) at C:\Users\User\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib/rustlib/src/rust\src/libcore/ptr/mod.rs:1005
1005 pub unsafe fn read_volatile<T>(src: *const T) -> T {
(gdb) load
Loading section .vector_table, size 0x1a8 lma 0x0
Loading section .text, size 0x47bc lma 0x1a8
Loading section .rodata, size 0xbf0 lma 0x4970
Start address 0x47a2, load size 21844
Transfer rate: 100 KB/sec, 5461 bytes/write.
(gdb) b main
Breakpoint 1 at 0x1f2: file src\main.rs, line 15.
(gdb) continue
Continuing.
Program received signal SIGINT, Interrupt.
0x00001530 in cortex_m::peripheral::syst::<impl cortex_m::peripheral::SYST>::has_wrapped (self=0x1000fc6c)
at C:\Users\User\.cargo\registry\src\github.com-1ecc6299db9ec823\cortex-m-0.6.3\src\peripheral/syst.rs:135
135 pub fn has_wrapped(&mut self) -> bool {
(gdb) bt
#0 0x00001530 in cortex_m::peripheral::syst::<impl cortex_m::peripheral::SYST>::has_wrapped (self=0x1000fc6c)
at C:\Users\User\.cargo\registry\src\github.com-1ecc6299db9ec823\cortex-m-0.6.3\src\peripheral/syst.rs:135
#1 0x00003450 in <stm32f4xx_hal::delay::Delay as embedded_hal::blocking::delay::DelayUs<u32>>::delay_us (self=0x1000fc6c, us=500000)
at C:\Users\User\.cargo\registry\src\github.com-1ecc6299db9ec823\stm32f4xx-hal-0.8.3\src/delay.rs:69
#2 0x0000339e in <stm32f4xx_hal::delay::Delay as embedded_hal::blocking::delay::DelayMs<u32>>::delay_ms (self=0x1000fc6c, ms=500)
at C:\Users\User\.cargo\registry\src\github.com-1ecc6299db9ec823\stm32f4xx-hal-0.8.3\src/delay.rs:32
#3 0x00000318 in test_blink::__cortex_m_rt_main () at src\main.rs:40
#4 0x000001f6 in main () at src\main.rs:15
memory.x file:
MEMORY
{
/* NOTE 1 K = 1 KiBi = 1024 bytes */
/* TODO Adjust these memory regions to match your device memory layout */
/* These values correspond to the LM3S6965, one of the few devices QEMU can emulate */
CCMRAM : ORIGIN = 0x10000000, LENGTH = 64K
RAM : ORIGIN = 0x20000000, LENGTH = 128K
FLASH : ORIGIN = 0x00000000, LENGTH = 512K
}
/* This is where the call stack will be allocated. */
/* The stack is of the full descending type. */
/* You may want to use this variable to locate the call stack and static
variables in different memory regions. Below is shown the default value */
_stack_start = ORIGIN(CCMRAM) + LENGTH(CCMRAM);
/* You can use this symbol to customize the location of the .text section */
/* If omitted the .text section will be placed right after the .vector_table
section */
/* This is required only on microcontrollers that store some configuration right
after the vector table */
/* _stext = ORIGIN(FLASH) + 0x400; */
/* Example of putting non-initialized variables into custom RAM locations. */
/* This assumes you have defined a region RAM2 above, and in the Rust
sources added the attribute `#[link_section = ".ram2bss"]` to the data
you want to place there. */
/* Note that the section will not be zero-initialized by the runtime! */
/* SECTIONS {
.ram2bss (NOLOAD) : ALIGN(4) {
*(.ram2bss);
. = ALIGN(4);
} > RAM2
} INSERT AFTER .bss;
*/
openocd.cfg file:
# Sample OpenOCD configuration for the STM32F3DISCOVERY development board
# Depending on the hardware revision you got you'll have to pick ONE of these
# interfaces. At any time only one interface should be commented out.
# Revision C (newer revision)
source [find interface/stlink.cfg]
# Revision A and B (older revisions)
# source [find interface/stlink-v2.cfg]
source [find target/stm32f4x.cfg]
# use hardware reset, connect under reset
# reset_config none separate
main.rs file:
#![no_main]
#![no_std]
#![allow(unsafe_code)]
// Halt on panic
#[allow(unused_extern_crates)] // NOTE(allow) bug rust-lang/rust#53964
extern crate panic_halt; // panic handler
use cortex_m;
use cortex_m_rt::entry;
use stm32f4xx_hal as hal;
use crate::hal::{prelude::*, stm32};
#[entry]
fn main() -> ! {
if let (Some(dp), Some(cp)) = (
stm32::Peripherals::take(),
cortex_m::peripheral::Peripherals::take(),
) {
let rcc = dp.RCC.constrain();
let clocks = rcc
.cfgr
.sysclk(168.mhz())
.freeze();
let mut delay = hal::delay::Delay::new(cp.SYST, clocks);
let gpioa = dp.GPIOA.split();
let mut l1 = gpioa.pa6.into_push_pull_output();
let mut l2 = gpioa.pa7.into_push_pull_output();
loop {
l1.set_low().unwrap();
l2.set_high().unwrap();
delay.delay_ms(500u32);
l1.set_high().unwrap();
l2.set_low().unwrap();
delay.delay_ms(500u32);
}
}
loop {}
}
Cargo.toml file:
[package]
name = "test_blink"
version = "0.1.0"
authors = ["Alex"]
edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
embedded-hal = "0.2"
nb = "0.1.2"
cortex-m = "0.6"
cortex-m-rt = "0.6"
# Panic behaviour, see https://crates.io/keywords/panic-impl for alternatives
panic-halt = "0.2"
cortex-m-log="0.6.2"
[dependencies.stm32f4xx-hal]
version = "0.8.3"
features = ["rt", "stm32f407"]
I am new to rust embedded and maybe I have done something wrong, but I have already tried all the options I can find on the Internet.
At first I thought it was a problem with the cortex-debug plugin for vscode and even created the issue, but the guys couldn't help me because the problem is obviously not on their side.
Debugging "C" code in cubeIDE works, so I dare to assume that the problem is somewhere in rust--gdb--openocd. Perhaps I am missing something, but unfortunately I cannot find it myself yet.
I would appreciate any resources or ideas to solve this problem.
I'm hoping you checked out this resources:
Discovery - debug
From your screen-grab of arm-none-eabi-gdb it does indeed look it it did not hit the break point.
you should have seen this message afterwards:
Note: automatically using hardware breakpoints for read-only addresses.
Breakpoint 1, main () at ...
Did you compile your source with symbols, and unoptimised?
Your config all looks right to me otherwise.
I am trying to understand the GCC link script and create a small demo to practice , however I got the "syntax error" from ld. I appreciate any comments or suggestions. Thank you so much!
hello.c
__attribute__((section(".testsection"))) volatile int testVariable;
hello.ld
MEMORY {
TEST_SECTION: ORIGIN = 0x43840000 , LENGTH = 0x50
}
SECTIONS{
.testsection: > TEST_SECTION /* Syntax error here*/
}
compile command
gcc -T hello.ld -o hello hello.o
c:/gnu/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/ld.exe:../hello.ld:20: syntax error
collect2.exe: error: ld returned 1 exit status
Original post
MEMORY {
TEST_SECTION: ORIGIN = 0x43840000 LENGTH = 0x50
}
SECTIONS{
.testsection: > TEST_SECTION
}
Solution
MEMORY {
TEST_SECTION: ORIGIN = 0x43840000, LENGTH = 0x50
}
SECTIONS{
.testsection: { *(.testsection*) } > TEST_SECTION
}
I think it was missing two things.
First the comma between the origin value and LENGTH.
Second a section command was missing. I don't think the GNU linker script syntax allows for that, you probably need to put something in there anyway to have this be useful. You could try
.testsection: { } > TEST_SECTION
but my guess without trying it is that you wouldn't end up with anything in that memory.
Edit
After doing an experiment:
so.s
.text
add r0,r0,r0
add r1,r2,r3
add r1,r1,r1
add r2,r2,r2
so.ld
MEMORY
{
bob : ORIGIN = 0x30000000, LENGTH = 0x1000
ted : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.hello : { *(.text*) } > bob
.world : { } > ted
.text : { } > ted
}
Disassembly of section .hello:
30000000 <.hello>:
30000000: e0800000 add r0, r0, r0
30000004: e0821003 add r1, r2, r3
30000008: e0811001 add r1, r1, r1
3000000c: e0822002 add r2, r2, r2
As I suspected, first I already new the names in MEMORY are whatever you want, as you saw as well, RAM, ROM, etc are not special names just can't use reserved names if any. I suspected that SECTIONS worked the same way the thing on the left is a new name you are creating for the output. This built and linked just fine (not a real program obviously just enough to get the tools to do a demonstration).
The linker did not complain, but at the same time since there was nothing actually placed in ted then nothing was placed in that memory space in the output binary. .hello and .text don't have to match, one is an output name one is an input name, very often for simple stuff folks make them match, why create a new name. Only did it here for demonstration purposes to show that by using the same name on the left does not automatically include the input section to the output.
I highly recommend doing experiments like this and using tools like readelf and objdump and others to examine what is going on (as well as reading the documentation for the scripting language).
I have read https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-bar which details about PTX synchronization function.
It says there are 16 "barrier logical resource", and you can specify which barrier to use with the parameter "a". What is a barrier logical resource?
I have a piece of code from an outside source, which I know works. However, I cannot understand the syntax used inside "asm" and what "memory" does. I assume "name" replaces "%0" and "numThreads" replace "%1", but what is "memory" and what are the colons doing?
__device__ __forceinline__ void namedBarrierSync(int name, int numThreads) {
asm volatile("bar.sync %0, %1;" : : "r"(name), "r"(numThreads) : "memory");}
In a block of 256 threads, I only want threads 64 ~ 127 to synchronize. Is this possible with barrier.sync
function? ( for an example, say I have a grid of 1 block, block of 256 threads. we split the block into 3 conditional branches s.t. threads 0 ~ 63 go into kernel1, threads 64 ~ 127 go into kernel 2, and threads 128 ~ 255 go into kernel 3. I want threads in kernel 2 to only synchronize among themselves. So if I use the "namedBarrierSync" function defied above: "namedBarrierSync( 1, 64)". Then does it synchronize only threads 64 ~ 127, or threads 0 ~ 63?
I have tested with below code ( assume that gpuAssert is an error checking function defined somewhere in the file ).
Here is the code:
__global__ void test(int num_threads)
{
if (threadIdx.x >= 64 && threadIdx.x < 128)
{
namedBarrierSync(0, num_threads) ;
}
__syncthreads();
}
int main(void)
{
test<<<1, 1, 256>>>(128);
gpuAssert(cudaDeviceSynchronize(), __FILE__, __LINE_);
printf("complete\n");
return 1;
}
"barrier logical resource" are the hardware necessary to synchronize threads/warps in a thread block (probably atomic counters etc.). You don't need to know the actual hardware implementation to program them, it is sufficient to know there are 16 instances of them available.
As Robert Crovella has pointed out in your cross-post on the Nvidia forum, the documentation for inline PTX is at https://docs.nvidia.com/cuda/inline-ptx-assembly/index.html.
barrier.sync with a named barrier and thread count of 64 synchronizes the first two warps arriving at the named barrier (for compute capability up to 6.x) or the first 64 threads arriving at the named barrier (for compute capability 7.0 onwards).
Your test only launches a single thread (with 256 bytes of shared memory allocated to it), which makes tests of synchronisation instructions moot. You want to launch the test kernel as test<<<1, 256>>>(128); instead.
rom1 and rom2 have different address map and are not continuous!
some objects have to be placed into rom2.
every time rom1 is linked, objects in rom2 should be the fixed address(rom2). In other words, rom1 should know rom2's symbols' address when linking.
Can I link an elf(rom2) into rom1?
If I understand well :
your system has 2 memories rom1 and rom2
some objects have to be located in rom1, others in rom2
Your link script shall look like the following :
MEMORY
{
rom1 : org=0x10000 len=1024
rom2 : org=0x40000 len=1024
}
SECTIONS
{
.text1 0x10000 : {foo.o(.text) } > rom1
.text2 0x40000 : {bar.o(.text) } > rom2
}
In the part SECTIONS the linker collects the .text sections from foo.o and puts this .text section within the output section .text1 starting at address 0x10000 in rom1.
Similarly, it collects the .text section from bar.o and puts it in rom2.
I cannot get what size parameters are for fftwf_plan_dft_r2c_2d
Input: a N rows by M cols matrix
Output: a N rows by floor(M/2) + 1 cols matrix ?
Are parameters input or output size?
Tried to give input size. This is what GDB sais
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1676.0x768]
0x637eed72 in n1fv_8 () from C:\devfiles\bin\libfftw3f-3.dll
(gdb) backtrace
#0 0x637eed72 in n1fv_8 () from C:\devfiles\bin\libfftw3f-3.dll
#1 0x7c91a000 in ntdll!RtlpUnWaitCriticalSection ()
from C:\WINDOWS\system32\ntdll.dll
No other thread uses fftw at the time.
The context:
Herbs::MatrixStorage<float> spectrum_in(frame_a.nRowsGet(),frame_a.nColsGet());
Herbs::MatrixStorage<std::complex<float>>
spectrum_out(frame_a.nRowsGet(),frame_a.nColsGet()/2+1);
Check for heap corruption issued by possible alllocation mistakes in MatrixStorage class . The rows in the matrix is one large block. rowGet returns the pointer to the given row.
memset(spectrum_in.rowGet(0),0
,sizeof(float)*spectrum_in.nRowsGet()*spectrum_in.nColsGet());
memset(spectrum_out.rowGet(0),0
,sizeof(float)*spectrum_out.nRowsGet()*spectrum_out.nColsGet());
heapdump(); //Heap seams to be fine after these
Causes sigsevg
FFT::PlanFloat_2dR2C plan(spectrum_in,spectrum_out);
The plan constructor does the following
plan=FFT::PlanFloat_2dR2C::PlanFloat_2dR2C(Herbs::MatrixStorage<InputType>& buffer_in
,Herbs::MatrixStorage<OutputType>& buffer_out)
{
plan=fftwf_plan_dft_r2c_2d
(
buffer_in.nRowsGet()
,buffer_in.nColsGet()
,buffer_in.rowGet(0)
,(fftwf_complex*)buffer_out.rowGet(0)
,FFTW_MEASURE
);
}
EDIT:
I used a precompiled DLL instead. GCC may produce bad code on 32-bit Windows (From release notes):
Removed an archaic stack-alignment hack that was failing with gcc-4.7/i386. Added stack-alignment hack necessary for gcc on Windows/i386. We will regret this in ten years (see previous change).
EDIT 2:
The dll became bad due to wrong options given to the configure script. The documentation is now updated.
You are correct. For an NxM float or double input you should allocate Nx(M/2+1) fftwf_complex or fftw_complex output.
The parameters are the input size. For example,
#include "fftw3.h"
#define Width 1024
#define Height 768
int main(){
float input[Width*Height];
fftwf_complex *fft = new fftwf_complex[((Width/2)+1)*Height];
fftwf_plan fplan = fftwf_plan_dft_r2c_2d(Height, Width,
(float*)input, fft, FFTW_ESTIMATE);
fftwf_execute(fplan);
fftwf_destroy_plan(fplan);
}
See Chapter 4.3 Basic Interface of the FFTW User Manual