Device Tree - Overlap in memory

One of the device tree source files I came across has the following entry. If I understand it correctly, it describes two memory regions starting at address 0x40000000, since the node is named memory@40000000. The two ranges appear to be 0x00 to 0x40000000 and 0x00 to 0x20000000. Aren't they overlapping? And why does the node name say the memory starts at 0x40000000 when the 'reg' entry begins with 0x00?
What is the correct interpretation of this memory node? (I went through the device tree specification, but I could not clearly understand this aspect.)
memory@40000000 {
reg = <0x00 0x40000000 0x00 0x20000000>;
device_type = "memory";
};
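How 'reg' is read depends on the parent node's #address-cells and #size-cells, which the excerpt does not show. The following is a minimal sketch in C, assuming both are 2 (an assumption, not something stated in the excerpt). Under that assumption each address and each size occupies two 32-bit cells, so the property would describe a single region with base 0x40000000 and size 0x20000000, which agrees with the unit address in the node name. The helper names dt_read_cells and dt_decode_reg are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct dt_region {
    uint64_t base;
    uint64_t size;
};

/* Fold n 32-bit cells (most significant cell first) into one value.
 * Only n = 1 or n = 2 is meaningful for a 64-bit result. */
uint64_t dt_read_cells(const uint32_t *cells, unsigned n)
{
    uint64_t v = 0;
    for (unsigned i = 0; i < n; i++)
        v = (v << 32) | cells[i];
    return v;
}

/* Decode entry number idx of a 'reg' property, given the parent's
 * #address-cells and #size-cells values. */
struct dt_region dt_decode_reg(const uint32_t *reg, unsigned addr_cells,
                               unsigned size_cells, size_t idx)
{
    const uint32_t *p = reg + idx * (addr_cells + size_cells);
    struct dt_region r;
    r.base = dt_read_cells(p, addr_cells);
    r.size = dt_read_cells(p + addr_cells, size_cells);
    return r;
}
```

With the cells from the question, dt_decode_reg(reg, 2, 2, 0) gives one region (base 0x40000000, size 0x20000000). Reading the same cells with one address cell and one size cell would instead give the two overlapping entries described in the question, which is why the cell counts matter.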

Related

ARMv8A hypervisor - PCI MMU fault

I am trying to implement a minimal hypervisor on ARMv8-A (Cortex-A53 on QEMU version 6.2.0). I have written minimal hypervisor code that runs at EL2, and Linux boots successfully at EL1. Now I want to enable the stage-2 MMU. I have written basic stage-2 page tables (only the entries necessary to map 1GB of RAM). If I disable PCI in the DTB, the kernel boots successfully. The QEMU command line is given below.
qemu-system-aarch64 -machine virt,gic-version=2,virtualization=on -cpu cortex-a53 -nographic -smp 1 -m 4096 -kernel hypvisor/bin/hypervisor.elf -device loader,file=linux-5.10.155/arch/arm64/boot/Image,addr=0x80200000 -device loader,file=1gb_1core.dtb,addr=0x88000000
When the PCI is enabled in DTB, I am getting a kernel panic as shown below.
[ 0.646801] pci_bus 0000:00: root bus resource [mem 0x8000000000-0xffffffffff]
[ 0.647909] Unable to handle kernel paging request at virtual address 0000000093810004
[ 0.648109] Mem abort info:
[ 0.648183] ESR = 0x96000004
[ 0.648282] EC = 0x25: DABT (current EL), IL = 32 bits
[ 0.648403] SET = 0, FnV = 0
[ 0.648484] EA = 0, S1PTW = 0
[ 0.648568] Data abort info:
[ 0.648647] ISV = 0, ISS = 0x00000004
[ 0.648743] CM = 0, WnR = 0
[ 0.648885] [0000000093810004] user address but active_mm is swapper
[ 0.653399] Call trace:
[ 0.653598] pci_generic_config_read+0x38/0xe0
[ 0.653729] pci_bus_read_config_dword+0x80/0xe0
[ 0.653845] pci_bus_generic_read_dev_vendor_id+0x34/0x1b0
[ 0.653974] pci_bus_read_dev_vendor_id+0x4c/0x70
[ 0.654090] pci_scan_single_device+0x80/0x100
I set a GDB breakpoint in 'pci_generic_config_read' and observed that the faulting instruction is
>0xffff80001055d5c8 <pci_generic_config_read+56> ldr w1, [x0]
The value of register X0 is given below
(gdb) p /x $x0
$4 = 0xffff800020000000
The hardware (host) is configured to have 4GB in total and the Linux (guest) is supplied 1GB through command line and DTB. This is a single core system with 'kaslr' disabled.
Excerpt from the DTB containing PCI part is given below.
pcie@10000000 {
interrupt-map-mask = <0x1800 0x00 0x00 0x07>;
interrupt-map = <0x00 0x00 0x00 0x01 0x8001 0x00 0x00 0x00 0x03 0x04 0x00 0x00 0x00 0x02 0x8001 0x00 0x00 0x00 0x04 0x04 0x00 0x00 0x00 0x03 0x8001 0x00 0x00 0x00 0x05 0x04 0x00 0x00 0x00 0x04 0x8001 0x00 0x00 0x00 0x06 0x04 0x800 0x00 0x00 0x01 0x8001 0x00 0x00 0x00 0x04 0x04 0x800 0x00 0x00 0x02 0x8001 0x00 0x00 0x00 0x05 0x04 0x800 0x00 0x00 0x03 0x8001 0x00 0x00 0x00 0x06 0x04 0x800 0x00 0x00 0x04 0x8001 0x00 0x00 0x00 0x03 0x04 0x1000 0x00 0x00 0x01 0x8001 0x00 0x00 0x00 0x05 0x04 0x1000 0x00 0x00 0x02 0x8001 0x00 0x00 0x00 0x06 0x04 0x1000 0x00 0x00 0x03 0x8001 0x00 0x00 0x00 0x03 0x04 0x1000 0x00 0x00 0x04 0x8001 0x00 0x00 0x00 0x04 0x04 0x1800 0x00 0x00 0x01 0x8001 0x00 0x00 0x00 0x06 0x04 0x1800 0x00 0x00 0x02 0x8001 0x00 0x00 0x00 0x03 0x04 0x1800 0x00 0x00 0x03 0x8001 0x00 0x00 0x00 0x04 0x04 0x1800 0x00 0x00 0x04 0x8001 0x00 0x00 0x00 0x05 0x04>;
#interrupt-cells = <0x01>;
ranges = <0x1000000 0x00 0x00 0x00 0x3eff0000 0x00 0x10000 0x2000000 0x00 0x10000000 0x00 0x10000000 0x00 0x2eff0000 0x3000000 0x80 0x00 0x80 0x00 0x80 0x00>;
reg = <0x40 0x10000000 0x00 0x10000000>;
msi-parent = <0x8002>;
dma-coherent;
bus-range = <0x00 0xff>;
linux,pci-domain = <0x00>;
#size-cells = <0x02>;
#address-cells = <0x03>;
device_type = "pci";
compatible = "pci-host-ecam-generic";
};
If my interpretation of the DTB is right, the PCI ECAM region is mapped at base address 0x40_1000_0000 with size 0x1000_0000 (256MB); that is, it starts at 256GB + 256MB in the physical address space.
I have written a page table entry mapping this physical address as well (as device memory).
Is it right for PCI to be mapped at such a high address in the physical address space? Any hints on debugging this issue are greatly appreciated.

Yes, for a 64-bit CPU this is the expected place to find the PCI controller ECAM region. The virt board puts some "large" device memory regions beyond the 4GB mark (specifically, the PCIe ECAM region, a second PCIe MMIO window, and the redistributors for CPUs above 123). (You can turn this off with -machine highmem=off if you like, though that will limit the amount of RAM you can give the VM to 3GB.)
Depending on what your hypervisor is doing, you might or might not want it to be talking directly to the host PCI controller anyway.
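To make the address arithmetic concrete, here is a small sketch in C of how the 'reg' cells combine into the ECAM base and how a configuration-space address is derived from it. It assumes the standard ECAM layout used by pci-host-ecam-generic hosts (one 4KB window per function, indexed by bus, device and function); the function names are hypothetical and not taken from the kernel.

```c
#include <stdint.h>

/* Combine two 32-bit device tree cells (high cell first) into a 64-bit value. */
uint64_t dt_cells_to_u64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Standard ECAM layout: bus in bits 27:20, device in bits 19:15,
 * function in bits 14:12, register offset in bits 11:0. */
uint64_t ecam_config_addr(uint64_t ecam_base, unsigned bus, unsigned dev,
                          unsigned fn, unsigned reg)
{
    uint64_t offset = ((uint64_t)(bus & 0xffu) << 20) |
                      ((uint64_t)(dev & 0x1fu) << 15) |
                      ((uint64_t)(fn & 0x7u) << 12) |
                      (uint64_t)(reg & 0xfffu);
    return ecam_base + offset;
}
```

For reg = <0x40 0x10000000 0x00 0x10000000> the base works out to 0x4010000000, and the 256 buses allowed by bus-range cover exactly the 0x10000000 (256MB) size, so a stage-2 mapping for this window has to span 0x4010000000 through 0x401fffffff.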

Linker error setting loading GDT register with LGDT instruction using Inline assembly

I was compiling my prototype of a prototype of a kernel (sounds weird, but it really doesn't matter), and as part of the build I need to link the ASM file with a C file compiled with gcc to get an executable that can be used as the kernel.
The problem is that, after implementing a switch from real mode to protected mode, I get this error when linking kernel.c and loader.asm:
Code:
kernel.c:(.text+0x1e1): undefined reference to `gdtr'
I will explain the whole build process and include the code below.
Installation steps:
1: Compile asm:
Code:
nasm -f elf32 loader.asm -o kasm.o
2: Compile .c :
Code:
gcc -m32 -ffreestanding -c kernel.c -o kc.o
3: Link both:
Code:
ld -m elf_i386 -T linker.ld -o kernel kasm.o kc.o
The complete error output is:
Code:
kc.o: In function `k_enter_protected_mode':
kernel.c:(.text+0x1e1): undefined reference to `gdtr'
The code looks like:
Code:
/*
*
* kernel.c - version 0.0.1
* This script is under the license of the distributed package, this license
* can be found in the package itself
* Script coded by Cristian Simón for the CKA Proyect
* ----
* License: GNU GPL v3
* Coder: Cristian Simón
* Proyect: CKA
*
*/
/* Output defines */
#define BLACK_BGROUND 0X07 /* black background */
#define WHITE_TXT 0x07 /* light gray on black text */
#define GREEN_TXT 0x02 /* light green on black text */
#define RED_TXT 0x04 /* light red on black text*/
#define CYAN_TXT 0x03 /*light cyan on black text */
#include <stddef.h>
#include <stdint.h>
#include <cpuid.h>
void k_clear_screen();
void k_sleep_3sec();
unsigned int k_printf(char *message, unsigned int line, float color);
void k_malloc(size_t sz);
void k_free(void *mem);
/* k_clear_screen : to clear the entire text screen */
void k_clear_screen()
{
char *vidmem = (char *) 0xC00B8000;
unsigned int i=0;
while(i < (80*25*2))
{
vidmem[i]=' ';
i++;
vidmem[i]=BLACK_BGROUND;
i++;
};
}
/* k_printf : the message and the line # */
unsigned int k_printf(char *message, unsigned int line, float color)
{
char *vidmem = (char *) 0xC00B8000;
unsigned int i=0;
i=(line*80*2);
while(*message!=0)
{
if(*message=='\n') /* check for a new line */
{
line++;
i=(line*80*2);
*message++;
} else {
vidmem[i]=*message;
*message++;
i++;
vidmem[i]=color;
i++;
};
};
return(1);
}
/*
* k_sleep_3sec : to make a simple delay of aprox 3 sec, since is a nasty sleep,
* duration will vary
* from system to system
*/
void k_sleep_3sec()
{
int c = 1, d = 1;
for ( c = 1 ; c <= 20000 ; c++ )
for ( d = 1 ; d <= 20000 ; d++ )
{}
}
/*
* Malloc and free functions for this kernel
* Maybe change in the future, sure
*/
static unsigned char our_memory[1024 * 1024]; /* reserve 1 MB for malloc */
static size_t next_index = 0;
int k_malloc_err;
void k_malloc(size_t sz)
{
void *mem;
if(sizeof our_memory - next_index < sz){
k_malloc_err = 1; /* record the failure before returning */
return NULL;
}
mem = &our_memory[next_index];
next_index += sz;
return mem;
}
void k_free(void *mem)
{
/* we cheat, and don't free anything. */
}
/* Schreduler */
/*---*/
/*
* Our schreduler is a RTC (Run to Completion)
* In the future we will add more schredulers or change the type
* but for now this is what we got
*/
int proc_number_count = 0;
void k_schreduler(char *proc_name, unsigned int proc_prior)
{
proc_number_count = proc_number_count + 1;
int proc_number = proc_number_count;
}
void k_enter_protected_mode()
{
__asm__ volatile ("cli;"
"lgdt (gdtr);"
"mov %eax, cr0;"
"or %al, 1;"
"mov cr0, %eax;"
"jmp 0x8,PModeMain;"
"PModeMain:");
}
/*main function*/
void k_main()
{
k_clear_screen();
k_printf(" Wellcome to", 0, WHITE_TXT);
k_printf(" CKA!", 1, GREEN_TXT);
k_printf("==============>", 2, WHITE_TXT);
k_printf(" CKA stands for C Kernel with Assembly", 3, WHITE_TXT);
k_printf(" Version 0.0.1, => based in the job of Debashis Barman", 4, WHITE_TXT);
k_printf(" Contact => assemblyislaw#gmail.com / blueshell#mail2tor.com", 5, WHITE_TXT);
k_printf(" or in the github repository page", 6, WHITE_TXT);
k_sleep_3sec();
k_clear_screen();
/* here start the magic */
k_printf(" !===> Starting Checkup <===!", 0, WHITE_TXT);
k_printf(" =-=-=-=-=-=-=-=-=-=-=-=-=-=-", 1, WHITE_TXT);
k_printf("[KernelInfo] Woah! No Kernel Panic for now! Well, lets fix that...", 2, CYAN_TXT);
k_printf("[Proc1] Checking for k_malloc() and k_free() kernel functions", 3, WHITE_TXT);
k_malloc(15);
if (k_malloc_err == 1){
k_printf("[F-ERROR] Unable to use k_malloc, do you have enough memory?", 4, RED_TXT);
while(1){
int error_stayer = 1;
}
} else{
k_printf("[Proc1] k_malloc and k_free found, resuming boot...", 4, GREEN_TXT);
}
k_enter_protected_mode();
k_printf("[KernelInfo] Switched to protected mode successfully", 5, CYAN_TXT);
}
This was kernel.c
Code:
ENTRY(loader)
OUTPUT_FORMAT(elf32-i386)
SECTIONS {
/* The kernel will live at 3GB + 1MB in the virtual
address space, which will be mapped to 1MB in the
physical address space. */
. = 0xC0100000;
.text : AT(ADDR(.text) - 0xC0000000) {
*(.text)
*(.rodata*)
}
.data ALIGN (0x1000) : AT(ADDR(.data) - 0xC0000000) {
*(.data)
}
.bss : AT(ADDR(.bss) - 0xC0000000) {
_sbss = .;
*(COMMON)
*(.bss)
_ebss = .;
}
}
This was the linker.ld
Code:
global _loader ; Make entry point visible to linker.
extern k_main ; _main is defined elsewhere
; setting up the Multiboot header - see GRUB docs for details
MODULEALIGN equ 1<<0 ; align loaded modules on page boundaries
MEMINFO equ 1<<1 ; provide memory map
FLAGS equ MODULEALIGN | MEMINFO ; this is the Multiboot 'flag' field
MAGIC equ 0x1BADB002 ; 'magic number' lets bootloader find the header
CHECKSUM equ -(MAGIC + FLAGS) ; checksum required
; This is the virtual base address of kernel space. It must be used to convert virtual
; addresses into physical addresses until paging is enabled. Note that this is not
; the virtual address where the kernel image itself is loaded -- just the amount that must
; be subtracted from a virtual address to get a physical address.
KERNEL_VIRTUAL_BASE equ 0xC0000000 ; 3GB
KERNEL_PAGE_NUMBER equ (KERNEL_VIRTUAL_BASE >> 22) ; Page directory index of kernel's 4MB PTE.
section .data
align 0x1000
BootPageDirectory:
; This page directory entry identity-maps the first 4MB of the 32-bit physical address space.
; All bits are clear except the following:
; bit 7: PS The kernel page is 4MB.
; bit 1: RW The kernel page is read/write.
; bit 0: P The kernel page is present.
; This entry must be here -- otherwise the kernel will crash immediately after paging is
; enabled because it can't fetch the next instruction! It's ok to unmap this page later.
dd 0x00000083
times (KERNEL_PAGE_NUMBER - 1) dd 0 ; Pages before kernel space.
; This page directory entry defines a 4MB page containing the kernel.
dd 0x00000083
times (1024 - KERNEL_PAGE_NUMBER - 1) dd 0 ; Pages after the kernel image.
section .text
align 4
MultiBootHeader:
dd MAGIC
dd FLAGS
dd CHECKSUM
; reserve initial kernel stack space -- that's 16k.
STACKSIZE equ 0x4000
; setting up entry point for linker
loader equ (_loader - 0xC0000000)
global loader
_loader:
; NOTE: Until paging is set up, the code must be position-independent and use physical
; addresses, not virtual ones!
mov ecx, (BootPageDirectory - KERNEL_VIRTUAL_BASE)
mov cr3, ecx ; Load Page Directory Base Register.
mov ecx, cr4
or ecx, 0x00000010 ; Set PSE bit in CR4 to enable 4MB pages.
mov cr4, ecx
mov ecx, cr0
or ecx, 0x80000000 ; Set PG bit in CR0 to enable paging.
mov cr0, ecx
; Start fetching instructions in kernel space.
; Since eip at this point holds the physical address of this command (approximately 0x00100000)
; we need to do a long jump to the correct virtual address of StartInHigherHalf which is
; approximately 0xC0100000.
lea ecx, [StartInHigherHalf]
jmp ecx ; NOTE: Must be absolute jump!
StartInHigherHalf:
; Unmap the identity-mapped first 4MB of physical address space. It should not be needed
; anymore.
mov dword [BootPageDirectory], 0
invlpg [0]
; NOTE: From now on, paging should be enabled. The first 4MB of physical address space is
; mapped starting at KERNEL_VIRTUAL_BASE. Everything is linked to this address, so no more
; position-independent code or funny business with virtual-to-physical address translation
; should be necessary. We now have a higher-half kernel.
mov esp, stack+STACKSIZE ; set up the stack
push eax ; pass Multiboot magic number
; pass Multiboot info structure -- WARNING: This is a physical address and may not be
; in the first 4MB!
push ebx
call k_main ; call kernel proper
hlt ; halt machine should kernel return
section .bss
align 32
stack:
resb STACKSIZE ; reserve 16k stack on a uint64_t boundary
This was loader.asm
I tried to solve this by converting the asm block into an extended asm block and passing gdtr as an operand, but I don't understand that approach.
How can I solve the error?

Your error:
kc.o: In function `k_enter_protected_mode':
kernel.c:(.text+0x1e1): undefined reference to `gdtr'
is being generated because of this line of assembly code:
"lgdt (gdtr);"
gdtr is a memory operand: a label for the memory address where a GDT record can be found. You don't have a structure defined with that name, which causes the undefined reference.
You need to create a GDT record that contains the size and address of a GDT. This record is what gets loaded into the GDT register by the LGDT instruction. You also haven't created a GDT. gdtr should be a 6-byte structure consisting of the length of the GDT minus 1 (stored in a 16-bit word) followed by the 32-bit linear address where the GDT can be found.
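As an illustration of that 6-byte record, here is a minimal sketch in C, assuming a GCC-compatible compiler (for the packed attribute) and 8-byte descriptors. The names gdt_pseudo_descriptor and make_gdtr are hypothetical; they only show the layout and the "size minus 1" rule, not code to drop into the kernel as is.

```c
#include <stdint.h>

/* The operand of LGDT in 32-bit mode: a 16-bit limit followed by a
 * 32-bit linear base address, with no padding in between. */
struct gdt_pseudo_descriptor {
    uint16_t limit; /* size of the GDT in bytes, minus 1 */
    uint32_t base;  /* linear address of the first descriptor */
} __attribute__((packed));

/* Build the record for a GDT with the given number of 8-byte entries. */
struct gdt_pseudo_descriptor make_gdtr(uint32_t gdt_base, unsigned entries)
{
    struct gdt_pseudo_descriptor r;
    r.limit = (uint16_t)(entries * 8u - 1u);
    r.base = gdt_base;
    return r;
}
```

For a three-entry GDT (null, code, data) the limit is 3 * 8 - 1 = 23. Without the packed attribute the compiler would pad the structure to 8 bytes and LGDT would read the wrong base.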
Rather than doing this in C, I recommend doing it in your assembly code before the call to k_main but after paging is set up.
Remove the k_enter_protected_mode function (and the call to it in k_main) from the C code altogether. Then, in loader.asm, place this code at the start of your StartInHigherHalf code to load a new GDT. It would look like this:
StartInHigherHalf:
; Set our own GDT; we can't rely on the GDT register being valid after the bootloader
; transfers control to our entry point
lgdt [gdtr] ; Load GDT Register with GDT record
mov eax, DATA_SEG
mov ds, eax ; Reload all the data descriptors with Data selector (2nd argument)
mov es, eax
mov gs, eax
mov fs, eax
mov ss, eax
jmp CODE_SEG:.setcs
; Do the FAR JMP to next instruction to set CS with Code selector, and
; set the EIP (instruction pointer) to offset of setcs
.setcs:
The only thing left is to define the GDT table. A simple one with a required NULL descriptor and a flat 32-bit code and data descriptor can be placed in your .data section by changing it to this:
section .data
align 0x1000
BootPageDirectory:
; This page directory entry identity-maps the first 4MB of the 32-bit physical address space.
; All bits are clear except the following:
; bit 7: PS The kernel page is 4MB.
; bit 1: RW The kernel page is read/write.
; bit 0: P The kernel page is present.
; This entry must be here -- otherwise the kernel will crash immediately after paging is
; enabled because it can't fetch the next instruction! It's ok to unmap this page later.
dd 0x00000083
times (KERNEL_PAGE_NUMBER - 1) dd 0 ; Pages before kernel space.
; This page directory entry defines a 4MB page containing the kernel.
dd 0x00000083
times (1024 - KERNEL_PAGE_NUMBER - 1) dd 0 ; Pages after the kernel image.
; 32-bit GDT to replace one created by multiboot loader
; Per the multiboot specification we can't rely on GDTR
; being valid so we need our own if we ever intend to
; reload any of the segment registers (this may be an
; issue with protected mode interrupts).
align 8
gdt_start:
dd 0 ; null descriptor
dd 0
gdt32_code:
dw 0FFFFh ; limit low
dw 0 ; base low
db 0 ; base middle
db 10011010b ; access
db 11001111b ; 32-bit size, 4kb granularity, limit 0xfffff pages
db 0 ; base high
gdt32_data:
dw 0FFFFh ; limit low (Same as code)
dw 0 ; base low
db 0 ; base middle
db 10010010b ; access
db 11001111b ; 32-bit size, 4kb granularity, limit 0xfffff pages
db 0 ; base high
end_of_gdt:
gdtr:
dw end_of_gdt - gdt_start - 1 ; limit (Size of GDT - 1)
dd gdt_start ; base of GDT
CODE_SEG equ gdt32_code - gdt_start
DATA_SEG equ gdt32_data - gdt_start
We've now added the required GDT structure and created a record called gdtr that can be loaded with the LGDT instruction.
Since you are using OSDev as a resource, I recommend looking at the GDT tutorial for information on creating a GDT. The Intel manuals are also an excellent source of information.
Other Observations
Your loader.asm sets up a Multiboot header, so it is a good bet you are using a Multiboot-compliant bootloader. When you use a Multiboot-compliant bootloader, your CPU will already be in 32-bit protected mode when it starts running your code at _loader. Your question suggests that you think you are in real mode, but you are actually already in protected mode. With a Multiboot loader it isn't necessary to set bit 0 of CR0 to 1; it is guaranteed to be set already. In my code above I have left that step out and only set up the GDT.

How can I multiply two hex 128 bit numbers in assembly

I have two 128 bit numbers in memory in hexadecimal, for example (little endian):
x:0x12 0x45 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
y:0x36 0xa1 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
I've to perform the unsigned multiplication between these two numbers so my new number will be:
z:0xcc 0xe3 0x7e 0x2b 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Now, I'm aware that I can move half of x and half of y into the rax and rbx registers and, for example, do the mul operation, and then do the same with the other halves. The problem is that by doing so I lose the carry, and I have no idea how to avoid that. I've been stuck on this problem for about 4 hours, and the only solution I can see is to work in binary (and <-> shl,1).
Can you give me some input on this problem?
I think the best solution is to take one byte at a time.

Let $\mu = 2^{64}$; then we can decompose your 128-bit numbers $a$ and $b$ into $a = a_1\mu + a_2$ and $b = b_1\mu + b_2$. Then we can compute $c = ab$ with 64 · 64 → 128 bit multiplications by first computing the partial products:
$q_1\mu + q_2 = a_2 b_2$
$r_1\mu + r_2 = a_1 b_2$
$s_1\mu + s_2 = a_2 b_1$
$t_1\mu + t_2 = a_1 b_1$
and then accumulating them into a 256-bit result (watch the overflow when doing the additions!):
$c = t_1\mu^3 + (t_2 + s_1 + r_1)\mu^2 + (s_2 + r_2 + q_1)\mu + q_2$
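The decomposition above can be sketched in C as follows, assuming a compiler that provides unsigned __int128 (GCC or Clang on a 64-bit target) to stand in for the 64 · 64 → 128 bit multiply. The function name mul128x128 and the limb ordering (least significant 64-bit limb first) are choices made for this illustration. The carries out of each column are kept by doing the additions in 128-bit arithmetic.

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* Multiply two 128-bit numbers given as {low, high} 64-bit limbs.
 * The 256-bit product is written to c[0..3], least significant limb first. */
void mul128x128(const uint64_t a[2], const uint64_t b[2], uint64_t c[4])
{
    u128 q = (u128)a[0] * b[0]; /* a2 * b2 */
    u128 r = (u128)a[1] * b[0]; /* a1 * b2 */
    u128 s = (u128)a[0] * b[1]; /* a2 * b1 */
    u128 t = (u128)a[1] * b[1]; /* a1 * b1 */

    c[0] = (uint64_t)q;                                /* q2 */

    /* mu column: q1 + r2 + s2; the sum fits easily in 128 bits. */
    u128 mid = (q >> 64) + (uint64_t)r + (uint64_t)s;
    c[1] = (uint64_t)mid;

    /* mu^2 column: carry + r1 + s1 + t2. */
    u128 hi = (mid >> 64) + (r >> 64) + (s >> 64) + (uint64_t)t;
    c[2] = (uint64_t)hi;

    /* mu^3 column: t1 plus the carry; cannot overflow because the
     * full product is always below 2^256. */
    c[3] = (uint64_t)(t >> 64) + (uint64_t)(hi >> 64);
}
```

With the values from the question (x = 0x4512, y = 0xa136, upper limbs zero) the lowest limb of the result is 0x2b7ee3cc and the other limbs are zero, which matches the expected bytes 0xcc 0xe3 0x7e 0x2b in little-endian order.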

As usual, ask a compiler how to do something efficiently: GNU C on 64-bit platforms supports __int128_t and __uint128_t.
__uint128_t mul128(__uint128_t a, __uint128_t b) { return a*b; }
compiles to (gcc6.2 -O3 on Godbolt)
imul rsi, rdx # a_hi * b_lo
mov rax, rdi
imul rcx, rdi # b_hi * a_lo
mul rdx # a_lo * b_lo widening multiply
add rcx, rsi # add the cross products ...
add rdx, rcx # ... into the high 64 bits.
ret
Since this is targeting the x86-64 System V calling convention, a is in RSI:RDI, while b is in RCX:RDX. The result is returned in RDX:RAX.
Pretty nifty that it only takes one MOV instruction, since gcc doesn't need the high-half result of a_upper * b_lower or vice versa. It can destroy the high halves of the inputs with the faster 2-operand form of IMUL since they're only used once.
With -march=haswell to enable BMI2, gcc uses MULX to avoid even the one MOV.
Sometimes compiler output isn't perfect, but very often the general strategy is a good starting point for optimizing by hand.
Of course, if what you really wanted in the first place was 128-bit multiplies in C, just use the compiler's built-in support for it. That lets the optimizer do its job, often giving better results than if you'd written a couple parts in inline-asm. (https://gcc.gnu.org/wiki/DontUseInlineAsm).
Is there a 128 bit integer in gcc? for GNU C unsigned __int128
https://learn.microsoft.com/en-us/cpp/intrinsics/umul128?view=msvc-170 MSVC's _umul128 that does 64x64 => 128-bit multiply (on 64-bit CPUs only). Takes args as 64-bit halves, returns two halves.
Getting the high part of 64 bit integer multiplication - Including with MSVC intrinsics, but still only for 64-bit CPUs.
An efficient way to do basic 128 bit integer calculations in C++?

Ruby creating binary data from human readable

I am creating a variable that is the payload of an IPv6 packet, and I need to have multiple data formats concatenated to it, and am having some trouble.
Specifically, I have:
64 - unsigned int 1 byte (prefix length)
1100 0000 - binary 1 byte (flags)
86400 - unsigned int, left padded/4 bytes (lifetime)
14400 - unsigned int, left padded/4 bytes (preferred lifetime)
0x00 0x00 0x00 0x00 - reserved/unused 4 bytes
New to ruby - anything will help.

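Are you familiar with Array#pack? That's probably what you'll need to build your packets. For the fields you list, the C directive packs an 8-bit unsigned integer and N packs a 32-bit unsigned integer in network (big-endian) byte order, so something like [64, 0b11000000, 86400, 14400, 0].pack('CCNNN') should produce the 14 bytes you describe.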

Access specific bit in embedded X86 assembly

I am trying to access a specific bit and modify it.
I have moved 0x01ABCDEF (hex value) into ecx and want to be able to check bit values at specific positions.
For example, I must take byte 0 of 0x01ABCDEF (0xEF),
check whether the bit at position 7 is 1,
and set the middle 4 bits to 1 and the rest to 0.

On x86 the simplest solution is to use the bit manipulation instructions BT (bit test), BTC (bit test and complement), BTR (bit test and reset) and BTS (bit test and set).
Bit test example:
mov dl, 7 ; bit index to test
bt ecx, edx ; test bit 7 of register ecx
Remember: with a register operand, only the low 5 bits of edx are used as the bit index.
or
bt ecx, 7
In both cases the result is stored in the carry flag.

It's been years since I've done asm, but you want to AND your value with 0x80; if the result is zero, the bit is not set, so you jump out. Otherwise continue and set your eax to the value you want (I assume the four bits you mean are the 00111100 in the lowest byte).
For example (treat this as pseudocode, as it's been far too long; the question keeps the value in ecx):
test ecx, 0x80 ; like AND, but only sets the flags and leaves ecx unchanged
jz exit
mov eax, 0x3C
exit:

Most CPUs do not support bit-wise access, so you have to use OR to set and AND to clear bits.
As I'm not really familiar with assembly I will just give you C-ish pseudocode, but you should easily be able to transform that to assembly instructions.
value = 0x01ABCDEF;
last_byte = value & 0xFF; // = 0xEF
if (last_byte & 0x40) { // is the 7th bit set? (0x01 = 1st, 0x02 = 2nd, 0x04 = 3rd, 0x08 = 4th, 0x10 = 5th, 0x20 = 6th, 0x40 = 7th, 0x80 = 8th)
value = value & 0xFFFFFF00; // clear last byte
value = value | 0x3C; // set the byte with 00111100 bits (0x3C is the hex representation of these bits)
}
Note that you can remove the last_byte assignment and directly check value & 0x40. However, if you want to check something which is not the least significant part, you have to do shifting first. For example, to extract ABCD you would use the following:
middle_bytes = (value & 0xFFFF00) >> 8;
value & 0xFFFF00 gets rid of the more significant byte (0x01), and >> 8 shifts the result right by one byte and thus gets rid of the last byte (0xEF).
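The same steps as a compilable C sketch, under one stated assumption: bit positions are counted from 0, so "position 7" is the mask 0x80 (the pseudocode above counts from 1 and therefore uses 0x40). The function names are hypothetical.

```c
#include <stdint.h>

/* If bit 7 of the lowest byte is set, replace that byte with 00111100b
 * (middle four bits set, the rest clear); otherwise return value unchanged. */
uint32_t set_middle_bits_if_bit7(uint32_t value)
{
    if (value & (1u << 7)) {
        value &= 0xFFFFFF00u; /* clear the lowest byte */
        value |= 0x3Cu;       /* set bits 2..5 of that byte */
    }
    return value;
}

/* Extract the two middle bytes, e.g. 0xABCD from 0x01ABCDEF. */
uint32_t middle_bytes(uint32_t value)
{
    return (value & 0x00FFFF00u) >> 8;
}
```

For 0x01ABCDEF the lowest byte is 0xEF, bit 7 is set, and the result is 0x01ABCD3C.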
