What does the attribute section mean in UEFI memmap? - linux-kernel

When I run memmap in the UEFI shell, I see two different attribute values:
Shell> memmap
Type Start End # Pages Attributes
Available 0000000080000000-00000000CFFFFFFF 0000000000050000 000000000000000E
Available 0000002000000000-000000237FFFFFFF 0000000000380000 000000000000000F
The problem is that I cannot use the second memory region, whose attribute is shown as 000000000000000F. I have already mapped that region in my page table, but my OS panics when I convert a physical address from that region to a virtual address.
So, my questions are:
What does the attribute mean?
How can I change the attribute so that I can use that part of memory?

The memmap command displays the memory map that is maintained by the UEFI environment by listing the contents of EFI_MEMORY_DESCRIPTOR for each memory region.
typedef struct {
UINT32 Type;
EFI_PHYSICAL_ADDRESS PhysicalStart;
EFI_VIRTUAL_ADDRESS VirtualStart;
UINT64 NumberOfPages;
UINT64 Attribute;
} EFI_MEMORY_DESCRIPTOR;
The Attribute field of a memory region describes the bit mask of capabilities for that memory region, and not necessarily the current settings for that memory region.
//*******************************************************
// Memory Attribute Definitions
//*******************************************************
// These types can be “ORed” together as needed.
#define EFI_MEMORY_UC 0x0000000000000001
#define EFI_MEMORY_WC 0x0000000000000002
#define EFI_MEMORY_WT 0x0000000000000004
#define EFI_MEMORY_WB 0x0000000000000008
#define EFI_MEMORY_UCE 0x0000000000000010
#define EFI_MEMORY_WP 0x0000000000001000
#define EFI_MEMORY_RP 0x0000000000002000
#define EFI_MEMORY_XP 0x0000000000004000
#define EFI_MEMORY_NV 0x0000000000008000
#define EFI_MEMORY_MORE_RELIABLE 0x0000000000010000
#define EFI_MEMORY_RO 0x0000000000020000
#define EFI_MEMORY_SP 0x0000000000040000
#define EFI_MEMORY_CPU_CRYPTO 0x0000000000080000
#define EFI_MEMORY_RUNTIME 0x8000000000000000
Full details about each of these attributes can be found in the UEFI Specification Section 7.2 under GetMemoryMap.
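The only difference between the attribute values of your two memory regions is bit 0, EFI_MEMORY_UC: 0x0E is WC | WT | WB, while 0x0F is UC | WC | WT | WB. In other words, the second region can additionally be configured as uncacheable. Because the field lists capabilities rather than current settings, this difference does not by itself indicate that the region is unusable.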
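To make the comparison concrete, here is a minimal C sketch that decodes the low cacheability bits of an Attribute value. The helper name describe_cache_attrs is hypothetical (it is not part of the UEFI API), and only the four bits relevant to the two regions in the question are handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Cacheability capability bits, copied from the definitions listed above. */
#define EFI_MEMORY_UC 0x0000000000000001ULL
#define EFI_MEMORY_WC 0x0000000000000002ULL
#define EFI_MEMORY_WT 0x0000000000000004ULL
#define EFI_MEMORY_WB 0x0000000000000008ULL

/* Hypothetical helper: writes the names of the cacheability bits set in
 * attr into out, separated by single spaces. */
static void describe_cache_attrs(uint64_t attr, char *out, size_t out_size)
{
    static const struct { uint64_t bit; const char *name; } table[] = {
        { EFI_MEMORY_UC, "UC" },
        { EFI_MEMORY_WC, "WC" },
        { EFI_MEMORY_WT, "WT" },
        { EFI_MEMORY_WB, "WB" },
    };
    size_t used = 0;

    if (out_size == 0)
        return;
    out[0] = '\0';

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if ((attr & table[i].bit) && used < out_size) {
            int n = snprintf(out + used, out_size - used, "%s%s",
                             used ? " " : "", table[i].name);
            if (n < 0)
                break;
            used += (size_t)n;
        }
    }
}
```

Called with 0xE the buffer holds "WC WT WB"; called with 0xF it holds "UC WC WT WB". XOR-ing the two values leaves only EFI_MEMORY_UC.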

Related

Rust debugging doesn't stop at the breakpoints when debugging stm32f407 via openocd and gdb

I have a problem debugging an stm32f407vet6 board and rust code.
The point of the problem is that GDB ignores breakpoints.
After setting breakpoints and executing the "continue" command in gdb, the program continues to ignore all breakpoints.
The only way to stop the program running is to cause an interrupt using the "ctrl + c" command.
After this command, the board stops its execution on the line currently being executed.
I have tried to set breakpoints on all lines where I can set them, but all the attempts are unsuccessful.
$ openocd
Open On-Chip Debugger 0.10.0 (2020-07-01) [https://github.com/sysprogs/openocd]
Licensed under GNU GPL v2
libusb1 09e75e98b4d9ea7909e8837b7a3f00dda4589dc3
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : auto-selecting first available session transport "hla_swd". To override use 'transport select <transport>'.
Info : The selected transport took over low-level target control. The results might differ compared to plain JTAG/SWD
Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
Info : clock speed 2000 kHz
Error: libusb_open() failed with LIBUSB_ERROR_NOT_SUPPORTED
Info : STLINK V2J35S7 (API v2) VID:PID 0483:3748
Info : Target voltage: 6.436364
Info : stm32f4x.cpu: hardware has 6 breakpoints, 4 watchpoints
Info : starting gdb server for stm32f4x.cpu on 3333
Info : Listening on port 3333 for gdb connections
$ arm-none-eabi-gdb -q target\thumbv7em-none-eabihf\debug\test_blink
Reading symbols from target\thumbv7em-none-eabihf\debug\test_blink...
(gdb) target remote :3333
Remote debugging using :3333
0x00004070 in core::ptr::read_volatile (src=0xe000e010) at C:\Users\User\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib/rustlib/src/rust\src/libcore/ptr/mod.rs:1005
1005 pub unsafe fn read_volatile<T>(src: *const T) -> T {
(gdb) load
Loading section .vector_table, size 0x1a8 lma 0x0
Loading section .text, size 0x47bc lma 0x1a8
Loading section .rodata, size 0xbf0 lma 0x4970
Start address 0x47a2, load size 21844
Transfer rate: 100 KB/sec, 5461 bytes/write.
(gdb) b main
Breakpoint 1 at 0x1f2: file src\main.rs, line 15.
(gdb) continue
Continuing.
Program received signal SIGINT, Interrupt.
0x00001530 in cortex_m::peripheral::syst::<impl cortex_m::peripheral::SYST>::has_wrapped (self=0x1000fc6c)
at C:\Users\User\.cargo\registry\src\github.com-1ecc6299db9ec823\cortex-m-0.6.3\src\peripheral/syst.rs:135
135 pub fn has_wrapped(&mut self) -> bool {
(gdb) bt
#0 0x00001530 in cortex_m::peripheral::syst::<impl cortex_m::peripheral::SYST>::has_wrapped (self=0x1000fc6c)
at C:\Users\User\.cargo\registry\src\github.com-1ecc6299db9ec823\cortex-m-0.6.3\src\peripheral/syst.rs:135
#1 0x00003450 in <stm32f4xx_hal::delay::Delay as embedded_hal::blocking::delay::DelayUs<u32>>::delay_us (self=0x1000fc6c, us=500000)
at C:\Users\User\.cargo\registry\src\github.com-1ecc6299db9ec823\stm32f4xx-hal-0.8.3\src/delay.rs:69
#2 0x0000339e in <stm32f4xx_hal::delay::Delay as embedded_hal::blocking::delay::DelayMs<u32>>::delay_ms (self=0x1000fc6c, ms=500)
at C:\Users\User\.cargo\registry\src\github.com-1ecc6299db9ec823\stm32f4xx-hal-0.8.3\src/delay.rs:32
#3 0x00000318 in test_blink::__cortex_m_rt_main () at src\main.rs:40
#4 0x000001f6 in main () at src\main.rs:15
memory.x file:
MEMORY
{
/* NOTE 1 K = 1 KiBi = 1024 bytes */
/* TODO Adjust these memory regions to match your device memory layout */
/* These values correspond to the LM3S6965, one of the few devices QEMU can emulate */
CCMRAM : ORIGIN = 0x10000000, LENGTH = 64K
RAM : ORIGIN = 0x20000000, LENGTH = 128K
FLASH : ORIGIN = 0x00000000, LENGTH = 512K
}
/* This is where the call stack will be allocated. */
/* The stack is of the full descending type. */
/* You may want to use this variable to locate the call stack and static
variables in different memory regions. Below is shown the default value */
_stack_start = ORIGIN(CCMRAM) + LENGTH(CCMRAM);
/* You can use this symbol to customize the location of the .text section */
/* If omitted the .text section will be placed right after the .vector_table
section */
/* This is required only on microcontrollers that store some configuration right
after the vector table */
/* _stext = ORIGIN(FLASH) + 0x400; */
/* Example of putting non-initialized variables into custom RAM locations. */
/* This assumes you have defined a region RAM2 above, and in the Rust
sources added the attribute `#[link_section = ".ram2bss"]` to the data
you want to place there. */
/* Note that the section will not be zero-initialized by the runtime! */
/* SECTIONS {
.ram2bss (NOLOAD) : ALIGN(4) {
*(.ram2bss);
. = ALIGN(4);
} > RAM2
} INSERT AFTER .bss;
*/
openocd.cfg file:
# Sample OpenOCD configuration for the STM32F3DISCOVERY development board
# Depending on the hardware revision you got you'll have to pick ONE of these
# interfaces. At any time only one interface should be commented out.
# Revision C (newer revision)
source [find interface/stlink.cfg]
# Revision A and B (older revisions)
# source [find interface/stlink-v2.cfg]
source [find target/stm32f4x.cfg]
# use hardware reset, connect under reset
# reset_config none separate
main.rs file:
#![no_main]
#![no_std]
#![allow(unsafe_code)]
// Halt on panic
#[allow(unused_extern_crates)] // NOTE(allow) bug rust-lang/rust#53964
extern crate panic_halt; // panic handler
use cortex_m;
use cortex_m_rt::entry;
use stm32f4xx_hal as hal;
use crate::hal::{prelude::*, stm32};
#[entry]
fn main() -> ! {
if let (Some(dp), Some(cp)) = (
stm32::Peripherals::take(),
cortex_m::peripheral::Peripherals::take(),
) {
let rcc = dp.RCC.constrain();
let clocks = rcc
.cfgr
.sysclk(168.mhz())
.freeze();
let mut delay = hal::delay::Delay::new(cp.SYST, clocks);
let gpioa = dp.GPIOA.split();
let mut l1 = gpioa.pa6.into_push_pull_output();
let mut l2 = gpioa.pa7.into_push_pull_output();
loop {
l1.set_low().unwrap();
l2.set_high().unwrap();
delay.delay_ms(500u32);
l1.set_high().unwrap();
l2.set_low().unwrap();
delay.delay_ms(500u32);
}
}
loop {}
}
Cargo.toml file:
[package]
name = "test_blink"
version = "0.1.0"
authors = ["Alex"]
edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
embedded-hal = "0.2"
nb = "0.1.2"
cortex-m = "0.6"
cortex-m-rt = "0.6"
# Panic behaviour, see https://crates.io/keywords/panic-impl for alternatives
panic-halt = "0.2"
cortex-m-log="0.6.2"
[dependencies.stm32f4xx-hal]
version = "0.8.3"
features = ["rt", "stm32f407"]
I am new to rust embedded and maybe I have done something wrong, but I have already tried all the options I can find on the Internet.
At first I thought it was a problem with the cortex-debug plugin for vscode and even created the issue, but the guys couldn't help me because the problem is obviously not on their side.
Debugging "C" code in cubeIDE works, so I dare to assume that the problem is somewhere in rust--gdb--openocd. Perhaps I am missing something, but unfortunately I cannot find it myself yet.
I would appreciate any resources or ideas to solve this problem.
I'm hoping you have already checked out this resource:
Discovery - debug
From your capture of arm-none-eabi-gdb, it does indeed look like it did not hit the breakpoint.
you should have seen this message afterwards:
Note: automatically using hardware breakpoints for read-only addresses.
Breakpoint 1, main () at ...
Did you compile your source with symbols, and unoptimised?
Your config all looks right to me otherwise.

CUDA: how to use barrier.sync

I have read https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-bar which describes the PTX synchronization instructions.
It says there are 16 "barrier logical resources", and you can specify which barrier to use with the parameter "a". What is a barrier logical resource?
I have a piece of code from an outside source, which I know works. However, I cannot understand the syntax used inside "asm" and what "memory" does. I assume "name" replaces "%0" and "numThreads" replace "%1", but what is "memory" and what are the colons doing?
__device__ __forceinline__ void namedBarrierSync(int name, int numThreads) {
asm volatile("bar.sync %0, %1;" : : "r"(name), "r"(numThreads) : "memory");
}
In a block of 256 threads, I only want threads 64 ~ 127 to synchronize. Is this possible with the barrier.sync instruction? For example, say I have a grid of 1 block and a block of 256 threads. We split the block into 3 conditional branches such that threads 0 ~ 63 go into kernel 1, threads 64 ~ 127 go into kernel 2, and threads 128 ~ 255 go into kernel 3. I want the threads in kernel 2 to synchronize only among themselves. If I use the "namedBarrierSync" function defined above as "namedBarrierSync(1, 64)", does it synchronize only threads 64 ~ 127, or threads 0 ~ 63?
I have tested with below code ( assume that gpuAssert is an error checking function defined somewhere in the file ).
Here is the code:
__global__ void test(int num_threads)
{
if (threadIdx.x >= 64 && threadIdx.x < 128)
{
namedBarrierSync(0, num_threads) ;
}
__syncthreads();
}
int main(void)
{
test<<<1, 1, 256>>>(128);
gpuAssert(cudaDeviceSynchronize(), __FILE__, __LINE__);
printf("complete\n");
return 1;
}
"Barrier logical resources" are the hardware needed to synchronize threads/warps in a thread block (probably atomic counters etc.). You don't need to know the actual hardware implementation to program them; it is sufficient to know that 16 instances of them are available.
As Robert Crovella has pointed out in your cross-post on the Nvidia forum, the documentation for inline PTX is at https://docs.nvidia.com/cuda/inline-ptx-assembly/index.html.
barrier.sync with a named barrier and thread count of 64 synchronizes the first two warps arriving at the named barrier (for compute capability up to 6.x) or the first 64 threads arriving at the named barrier (for compute capability 7.0 onwards).
Your test only launches a single thread (with 256 bytes of shared memory allocated to it), which makes tests of synchronisation instructions moot. You want to launch the test kernel as test<<<1, 256>>>(128); instead.
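On the syntax part of the question: inline PTX uses the same layout as GCC-style extended asm, where the colons separate the instruction template, the output operands, the input operands, and the clobber list. The sketch below shows that layout in plain C, assuming a GCC or Clang host compiler; the function pass_through is a hypothetical illustration with an empty instruction template, not part of CUDA.

```c
/* Hypothetical illustration of the extended-asm layout:
 *   asm volatile(template : outputs : inputs : clobbers);
 * Assumes a GCC- or Clang-compatible compiler. */
static inline int pass_through(int value)
{
    int result;

    __asm__ volatile(
        ""              /* template: empty here, "bar.sync %0, %1;" in the question */
        : "=r"(result)  /* after the 1st colon: output operands (operand %0)        */
        : "0"(value)    /* after the 2nd colon: inputs; "0" reuses operand 0's register */
        : "memory"      /* after the 3rd colon: clobbers                             */
    );
    return result;
}
```

In the namedBarrierSync function from the question the output list is empty, so "r"(name) becomes %0 and "r"(numThreads) becomes %1. The "memory" clobber tells the compiler that the statement may read or write memory, so it should not keep memory values cached in registers or move memory accesses across the barrier.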

BLOCK and ALIGN in linker script

What exactly do the BLOCK and ALIGN commands do in a linker script? The names seem to speak for themselves; but, I don't see exactly how they are affecting the resulting OS image.
The linker script below is based on the example given by the OSDev Bare Bones example (http://wiki.osdev.org/Bare_Bones).
(1) In the resulting image, the .rodata section begins at address 0x1400, and the .data section begins at 0x2400. Shouldn't the BLOCK(4K) command place .rodata at 0x1000 (or some other multiple of 4KB)?
(2) What does ALIGN(4K) do? The documentation (https://www.math.utah.edu/docs/info/ld_3.html) indicates that it is just a calculation that returns the current location adjusted to the nearest 4K boundary. However, the documentation for SECTION doesn't indicate a place for a parameter after the colon (other than an AT command, which isn't present here).
(3) In either case, why have both BLOCK and ALIGN?
SECTIONS
{
. = 0x7c00;
__start = .;
.text :
{
*(.boot)
/*
Magic bytes. 0x1FE == 510.
We could add this on each Gas file separately with `.word`,
but this is the perfect place to DRY that out.
*/
. = 0x1FE;
SHORT(0xAA55)
/*
This is only needed if we are going to use a 2 stage boot process,
e.g. by reading more disk than the default 512 bytes with BIOS `int 0x13`.
*/
*(.stage2)
*(.text)
}
/* Read-only data. */
.rodata BLOCK(4K) : ALIGN(4K)
{
LONG(0x11223344) /* just a marker so I can see where the linker places this section */
*(.rodata)
}
/* Read-write data (initialized) */
.data BLOCK(4K) : ALIGN(4K)
{
LONG(0x44332211)
*(.data)
}
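The rounding that ALIGN(4K) performs, and how an aligned address relates to an offset inside the image, can be sketched in C. This assumes that the image is a flat binary whose first byte corresponds to address 0x7c00 (the value assigned to the location counter at the top of the script); the helper names align_up and image_offset are hypothetical.

```c
#include <stdint.h>

/* Round addr up to the next multiple of alignment (a power of two),
 * which is the calculation ALIGN(alignment) applies to the location counter. */
static uint64_t align_up(uint64_t addr, uint64_t alignment)
{
    return (addr + alignment - 1) & ~(alignment - 1);
}

/* Offset of an address inside a flat image that starts at base
 * (assumption: base is 0x7c00, as set by ". = 0x7c00;" in the script). */
static uint64_t image_offset(uint64_t addr, uint64_t base)
{
    return addr - base;
}
```

Under that assumption, an address of 0x9000 (a 4K boundary) sits at image offset 0x9000 - 0x7c00 = 0x1400, and 0xA000 sits at offset 0x2400. The offsets reported in question (1) would therefore be consistent with sections that are 4K-aligned in the address space even though their offsets within the image are not multiples of 4K.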

How the Barebox boots up for Beaglebone Black?

I want to know the step-by-step boot sequence of Barebox for Beaglebone Black.
Which function executes first, and how does Barebox hand over control to the kernel?
I would recommend checking this presentation first. Pages 3 and 4 show the boot sequence as a diagram.
If you want to build a barebox binary for the Beaglebone board, enable 'CONFIG_MACH_BEAGLEBONE'.
In file 'images/Makefile.am33xx' you find entry function named 'start_am33xx_beaglebone_sdram' for this config option (SDRAM)
pblx-$(CONFIG_MACH_BEAGLEBONE) += start_am33xx_beaglebone_sdram
FILE_barebox-am33xx-beaglebone.img = start_am33xx_beaglebone_sdram.pblx
am33xx-barebox-$(CONFIG_MACH_BEAGLEBONE) += barebox-am33xx-beaglebone.img
This entry function is the "first step" (low level HW init) defined in 'arch/arm/boards/beaglebone/lowlevel.c' file.
Then the call chain is like 'barebox_arm_entry' ('arch/arm/include/asm/barebox-arm.h') -> 'barebox_*_pbl_start' ('arch/arm/cpu/entry.c') -> ...
After that, the initcalls are called in order of their level:
#define core_initcall(fn) __define_initcall("1",fn,1)
#define postcore_initcall(fn) __define_initcall("2",fn,2)
#define console_initcall(fn) __define_initcall("3",fn,3)
#define postconsole_initcall(fn) __define_initcall("4",fn,4)
#define mem_initcall(fn) __define_initcall("5",fn,5)
#define mmu_initcall(fn) __define_initcall("6",fn,6)
#define postmmu_initcall(fn) __define_initcall("7",fn,7)
#define coredevice_initcall(fn) __define_initcall("8",fn,8)
#define fs_initcall(fn) __define_initcall("9",fn,9)
#define device_initcall(fn) __define_initcall("10",fn,10)
#define crypto_initcall(fn) __define_initcall("11",fn,11)
#define of_populate_initcall(fn) __define_initcall("12",fn,12)
#define late_initcall(fn) __define_initcall("13",fn,13)
#define environment_initcall(fn) __define_initcall("14",fn,14)
#define postenvironment_initcall(fn) __define_initcall("15",fn,15)
See these definitions.
The last (environment) initcalls load the environment and run the 'init' script(s). With barebox commands such as boot and bootm you can then load a 'zImage', 'dtb' and 'initrd' and pass command-line arguments to the Linux kernel.
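The idea of leveled initcalls can be modelled in a few lines of C. This is a simplified sketch, not barebox's actual mechanism: the struct initcall_entry, the run_initcalls function, the trace array and the three sample functions are all hypothetical, and the ordering is done with an explicit sort purely for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: each registered initcall has a level and a function. */
typedef int (*initcall_t)(void);

struct initcall_entry {
    int level;      /* lower levels run first, e.g. 1 = core, 14 = environment */
    initcall_t fn;
};

/* Records which levels ran, in order, so the sequence can be inspected. */
static int trace[16];
static size_t trace_len;

static int core_init(void)        { trace[trace_len++] = 1;  return 0; }
static int console_init(void)     { trace[trace_len++] = 3;  return 0; }
static int environment_init(void) { trace[trace_len++] = 14; return 0; }

static int compare_level(const void *a, const void *b)
{
    const struct initcall_entry *x = a;
    const struct initcall_entry *y = b;
    return (x->level > y->level) - (x->level < y->level);
}

/* Sort the entries by level and call each one; returns how many were called. */
static size_t run_initcalls(struct initcall_entry *calls, size_t count)
{
    qsort(calls, count, sizeof calls[0], compare_level);
    for (size_t i = 0; i < count; i++)
        calls[i].fn();
    return count;
}
```

Registering the three sample functions in the order environment, core, console and then calling run_initcalls runs them as core (1), console (3), environment (14), mirroring the level numbers in the macro list above.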

FFTW R2C two-dimensional size parameters

I cannot get what size parameters are for fftwf_plan_dft_r2c_2d
Input: a N rows by M cols matrix
Output: a N rows by floor(M/2) + 1 cols matrix ?
Are parameters input or output size?
I tried giving the input size. This is what GDB says:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1676.0x768]
0x637eed72 in n1fv_8 () from C:\devfiles\bin\libfftw3f-3.dll
(gdb) backtrace
#0 0x637eed72 in n1fv_8 () from C:\devfiles\bin\libfftw3f-3.dll
#1 0x7c91a000 in ntdll!RtlpUnWaitCriticalSection ()
from C:\WINDOWS\system32\ntdll.dll
No other thread uses fftw at the time.
The context:
Herbs::MatrixStorage<float> spectrum_in(frame_a.nRowsGet(),frame_a.nColsGet());
Herbs::MatrixStorage<std::complex<float>>
spectrum_out(frame_a.nRowsGet(),frame_a.nColsGet()/2+1);
Check for heap corruption caused by possible allocation mistakes in the MatrixStorage class. The rows of the matrix are stored as one large block. rowGet returns the pointer to the given row.
memset(spectrum_in.rowGet(0),0
,sizeof(float)*spectrum_in.nRowsGet()*spectrum_in.nColsGet());
memset(spectrum_out.rowGet(0),0
,sizeof(std::complex<float>)*spectrum_out.nRowsGet()*spectrum_out.nColsGet());
heapdump(); //Heap seems to be fine after these
This causes the SIGSEGV:
FFT::PlanFloat_2dR2C plan(spectrum_in,spectrum_out);
The plan constructor does the following
FFT::PlanFloat_2dR2C::PlanFloat_2dR2C(Herbs::MatrixStorage<InputType>& buffer_in
,Herbs::MatrixStorage<OutputType>& buffer_out)
{
plan=fftwf_plan_dft_r2c_2d
(
buffer_in.nRowsGet()
,buffer_in.nColsGet()
,buffer_in.rowGet(0)
,(fftwf_complex*)buffer_out.rowGet(0)
,FFTW_MEASURE
);
}
EDIT:
I used a precompiled DLL instead. GCC may produce bad code on 32-bit Windows (From release notes):
Removed an archaic stack-alignment hack that was failing with gcc-4.7/i386. Added stack-alignment hack necessary for gcc on Windows/i386. We will regret this in ten years (see previous change).
EDIT 2:
The dll became bad due to wrong options given to the configure script. The documentation is now updated.
You are correct. For an NxM float or double input you should allocate Nx(M/2+1) fftwf_complex or fftw_complex output.
The parameters are the input size. For example,
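#include "fftw3.h"
#define Width 1024
#define Height 768
int main(){
    // Allocate on the heap: a 3 MB array on the stack can overflow it,
    // and fftwf_malloc returns suitably aligned memory.
    float *input = (float*)fftwf_malloc(sizeof(float)*Width*Height);
    // The output has Height rows of (Width/2 + 1) complex values.
    fftwf_complex *fft = (fftwf_complex*)fftwf_malloc(sizeof(fftwf_complex)*((Width/2)+1)*Height);
    // The plan takes the input (logical) size: rows first, then columns.
    fftwf_plan fplan = fftwf_plan_dft_r2c_2d(Height, Width,
        input, fft, FFTW_ESTIMATE);
    // ... fill input here before executing ...
    fftwf_execute(fplan);
    fftwf_destroy_plan(fplan);
    fftwf_free(fft);
    fftwf_free(input);
}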
See Chapter 4.3 Basic Interface of the FFTW User Manual
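The size rule from this answer can be written down as plain arithmetic. The helper names r2c_output_cols and r2c_output_count below are hypothetical and do not call into FFTW; they only compute how many complex values the output buffer needs for a given real input size.

```c
#include <stddef.h>

/* Number of complex columns produced by a real-to-complex transform
 * whose real input has `cols` columns: floor(cols / 2) + 1. */
static size_t r2c_output_cols(size_t cols)
{
    return cols / 2 + 1;
}

/* Total number of complex elements to allocate for a rows x cols real input. */
static size_t r2c_output_count(size_t rows, size_t cols)
{
    return rows * r2c_output_cols(cols);
}
```

For the 768 x 1024 input used above, this gives 513 complex columns and 768 * 513 = 393984 complex elements, while the plan itself is still created with the input dimensions 768 and 1024.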
