Identifying `struct resource` associated with PCI region - linux-kernel

I'm iterating over iomem_resource children:
struct resource *p;
for (p = iomem_resource.child; p ; p = p->sibling)
printk(KERN_NOTICE ":: %s %lx %lx-%lx", p->name, p->flags, p->start, p->end);
The output is as follows:
reserved 80000200 0-fff
System RAM 80000200 1000-9fbff
reserved 80000200 9fc00-9ffff
Video ROM 80002200 c0000-c7fff
Adapter ROM 80002200 e2000-eebff
reserved 80000200 f0000-fffff
System RAM 80000200 100000-3ffeffff
ACPI Tables 80000200 3fff0000-3fffffff
0000:00:02.0 21208 e0000000-e7ffffff
0000:00:03.0 20200 f0000000-f001ffff
0000:00:04.0 20200 f0400000-f07fffff
0000:00:04.0 21208 f0800000-f0803fff
0000:00:06.0 20200 f0804000-f0804fff
0000:00:0d.0 20200 f0806000-f0807fff
IOAPIC 0 80000200 fec00000-fec003ff
Local APIC 80000200 fee00000-fee00fff
reserved 200 fffc0000-ffffffff
I'd like to identify PCI regions (like 0000:00:02.0 etc). As far as I see, flags won't help much.
In kernel/resource.c they identify "System RAM" area just by name. What would be the appropriate approach for PCI regions?

It seems that the right way to identify PCI address regions is to iterate pci resources directly, instead of traversing iomem_resource:
struct pci_dev *dev = 0;
struct resource *p;
for_each_pci_dev(dev)
{
int i;
for (i = 0; i < DEVICE_COUNT_RESOURCE; ++i)
if ((p=dev->resource + i))
printk(KERN_NOTICE "%lx-%lx %x", p->start, p->end, p->flags);
// etc...
}

Related

How to reserve physical memory in kernel (arm64)

I want to reserve some memory to save kernel information. I copied reserve_crashkernel function to arm64 and modified it:
/* 16M alignment for crash kernel regions */
#define CRASH_ALIGN (16 << 20)
/* Location of the reserved area for the crash kernel */
struct resource crashk_res = {
.name = "Crash kernel",
.start = 0,
.end = 0,
.flags = IORESOURCE_MEM
};
static void __init reserve_crashkernel(void)
{
unsigned long long crash_size, crash_base, total_mem;
int ret;
crash_size = CRASH_ALIGN;
total_mem = memblock_phys_mem_size();
pr_info("crashkernel find memory %x - %llx.\n", CRASH_ALIGN, memblock_end_of_DRAM());
crash_base = memblock_find_in_range(CRASH_ALIGN, memblock_end_of_DRAM(),
crash_size, CRASH_ALIGN);
if (!crash_base) {
pr_info("crashkernel reservation failed - No suitable area found.\n");
return;
}
ret = memblock_reserve(crash_base, crash_size);
if (ret) {
pr_err("%s: Error reserving crashkernel memblock.\n", __func__);
return;
}
pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
(unsigned long)(crash_size >> 20),
(unsigned long)(crash_base >> 20),
(unsigned long)(total_mem >> 20));
crashk_res.start = crash_base;
crashk_res.end = crash_base + crash_size - 1;
insert_resource(&iomem_resource, &crashk_res);
}
When the kernel started, I can find kernel print like this:
[ 0.000000] crashkernel find memory 1000000 - 210000000.
[ 0.000000] Reserving 16MB of memory at 8272MB for crashkernel (System RAM: 8190MB)
But the /proc/iomem doesn't seem right. Without my code there is a 'System RAM' region:
100000000-20fffffff : System RAM
Now with reserve_crashkernel, the region changed to:
205000000-205ffffff : Crash kernel
I don't why the 'System RAM' region disappeared and I'm not sure that my code is correct.

Kernel address poising by clearing upper bits?

Is there some mechanism in Linux which is poisoning addresses by zeroing upper 16 bits?
I am debugging a kernel crash on an Intel x86-64 machine. The instruction which is causing the crash tries to access an address of 0x880139f3da00:
crash> bt
R10: 0000000000000001 R11: 0000000000000001 R12: 0000880139f3da00
~~~~~~~~~~~~~~~~~~~~~
crash> p arp_tbl.nht->hash_buckets[255]
$66 = (struct neighbour *) 0x880139f3da00
crash> p *arp_tbl.nht->hash_buckets[255]
Cannot access memory at address 0x880139f3da00
The hash_buckets table is valid:
crash> p arp_tbl.nht->hash_buckets[253]
$70 = (struct neighbour *) 0xffff88007325ae00
$71 = {
next = 0x0,
tbl = 0xffffffff81abbf20 <arp_tbl>,
Setting upper word to 0xffff makes the address valid and returns a valid data structure:
crash> p *((struct neighbour *)0xffff880139f3da00)
$73 = {
next = 0xffff88006de69a00,
tbl = 0xffffffff81abbf20 <arp_tbl>,
... rest looks reasonable too ...
Structure is updated by RCU operations (e.g. very likely by these in neigh_flush_dev()). So, what could be the reason that the address becomes invalid in such a way?
I can exclude hardware defects (seen on two machines and with different addresses). Systems are running CentOS 7 with kernel 3.10.0-514.6.1.el7.centos.plus.x86_64 till 3.10.0-514.21.2.el7.centos.plus.x86_64.
Update
From another crash dump, I see an skb of an IPv6 packet with
crash> p *((struct sk_buff *)0xffff880070e25e00)
$57 = {
transport_header = 54,
network_header = 14,
mac_header = 0,
...
head = 0xffff880138e28000 "",
data = 0xffff880138e2800e "`",
...
}
This crashes when writing the first 0x8 bytes in
#define HH_DATA_MOD 16
static inline int neigh_hh_output(const struct hh_cache *hh, struct sk_buff *skb)
{
if (likely(hh_len <= HH_DATA_MOD)) {
memcpy(skb->data - HH_DATA_MOD, hh->hh_data, HH_DATA_MOD); <<<<<
This would explain why two bytes are overridden (16 - 14).
can you inspect the memory location this address was read from? typically such a "partial zero" read is a result of memset being run on the area. after this cpu triggered a crash there was possibly enough time for whoever else was modifying the area to finish zeroing and possibly even fill it with other data.
so far there is no reason to suspect rcu plays any role here
this is most definitely not "poisoning" done by the kernel (it would be quite weird to do it in this way). however, if the crash is reproducible (you say it occurred on at least 2 different machines?) then running a debug kernel may be of help, especially with slab debug enabled.

How to work with reserved CMA memory?

I would like to allocate piece of physically contiguous reserved memory (in predefined physical addresses) for my device with DMA support.
As I see CMA has three options:
1. To reserve memory via kernel config file. 2. To reserve memory via kernel cmdline. 3. To reserve memory via device-tree memory node.
In the first case: size and number of areas could be reserved.
CONFIG_DMA_CMA=y
CONFIG_CMA_AREAS=7
CONFIG_CMA_SIZE_MBYTES=8
So I could use:
start_cma_virt = dma_alloc_coherent(dev->cmadev, (size_t)size_cma, &start_cma_dma, GFP_KERNEL);
in my driver to allocate contiguous memory. I could use it max 7 times and it will be possible allocate up to 8M. But unfortunately
dma_contiguous_reserve(min(arm_dma_limit, arm_lowmem_limit));
from arch/arm/mm/init.c:
void __init arm_memblock_init(struct meminfo *mi,const struct machine_desc *mdesc)
it is impossible to set predefined physical addresses for contiguous allocation.
Of Course I could use kernel cmdline:
mem=cma=cmadevlabel=8M#32M cma_map=mydevname=cmadevlabel
//struct device *dev = cmadev->dev; /*dev->name is mydevname*/
After that dma_alloc_coherent() should alloc memory in physical memory area from 32M + 8M (0x2000000 + 0x800000) up to 0x27FFFFF.
But unfortunately I have problem with this solution. Maybe my cmdline has error?
Next one try was device tree implementation.
cmadev_region: mycma {
/*no-map;*/ /*DMA coherent memory*/
/*reusable;*/
reg = <0x02000000 0x00100000>;
};
And phandle in some node:
memory-region = <&cmadev_region>;
As I saw in kernel usual it should be used like:
of_find_node_by_name(); //find needed node
of_parse_phandle(); //resolve a phandle property to a device_node pointer
of_get_address(); //get DT __be32 physical addresses
of_translate_address(); //DT represent local (bus, device) addresses so translate it to CPU physical addresses
request_mem_region(); //reserve IOMAP memory (cat /proc/iomem)
ioremap(); //alloc entry in page table for reserved memory and return kernel logical addresses.
But I want use DMA via (as I know only one external API function dma_alloc_coherent) dma_alloc_coherent() instead IO-MAP ioremap(). But how call
start_cma_virt = dma_alloc_coherent(dev->cmadev, (size_t)size_cma, &start_cma_dma, GFP_KERNEL);
associate memory from device-tree (reg = <0x02000000 0x00100000>;) to dev->cmadev ? In case with cmdline it is clear it has device name and addresses region.
Does reserved memory after call of_parse_phandle() automatically should be booked for your special driver (which parse DT). And next call dma_alloc_coherent will allocate dma area inside memory from cmadev_region: mycma?
To use dma_alloc_coherent() on reserved memory node, you need to declare that area as dma_coherent. You can do some thing like:
In dt:
cmadev_region: mycma {
compatible = "compatible-name"
no-map;
reg = <0x02000000 0x00100000>;
};
In your driver:
struct device *cma_dev;
static int rmem_dma_device_init(struct reserved_mem *rmem, struct device *dev)
{
int ret;
if (!mem) {
ret = dma_declare_coherent_memory(cma_dev, rmem->base, rmem->base,
rmem->size, DMA_MEMORY_EXCLUSIVE);
if (ret) {
pr_err("Error");
return ret;
}
}
return 0;
}
static void rmem_dma_device_release(struct reserved_mem *rmem,
struct device *dev)
{
if (dev)
dev->dma_mem = NULL;
}
static const struct reserved_mem_ops rmem_dma_ops = {
.device_init = rmem_dma_device_init,
.device_release = rmem_dma_device_release,
};
int __init cma_setup(struct reserved_mem *rmem)
{
rmem->ops = &rmem_dma_ops;
return 0;
}
RESERVEDMEM_OF_DECLARE(some-name, "compatible-name", cma_setup);
Now on this cma_dev you can perform dma_alloc_coherent and get memory.

Implementing PCIe Linux device driver (want to access my card registers from kernel driver)

I'm writing a device driver to access the memory in a FPGA on a PCIe card.
The card boots and is probed/found :-
/proc/iomem
80000000-840fffff : PCI Bus #03
80000000-83ffffff : 0000:03:00.0
84000000-840fffff : 0000:03:00.0
So reading ldd/etc I coded up a call to request_mem_region at the 80000000, and requested a pointer to it via ioremap_nocache
1) Do I need to request_mem_region as well as a ioremap_nocache, cant I use just the latter?
/proc/iomem After insmod my device driver :-
80000000-840fffff : PCI Bus #03
80000000-83ffffff : 0000:03:00.0
80000000-8003ffff : fp2
84000000-840fffff : 0000:03:00.0
2) Doesnt look quite right to me...?
Anyway, reads don't work (its not coded like below, it has checks etc):-
#define BAR_ADDR 0x80000000
void *base = ioremap_nocache(BAR_ADDR, 0x40000);
void *address = base + KNOWN_REG_LOCATION;
int data = ioread32(address);
printk("fp2: address:0x%08x, data:0x%08x\n", address, data);
Outputs :-
address:0xfd500000, data:0xffffffff
I can read the x80000000+KNOWN_REG_LOCATION from mmap userspace.
3) I've tried __raw_readl/readl with no luck as well.
4) Can I just read at the currently mapped address x80000000?
Ian,
I wrote a PCI driver for a device (full source). The mapping of the register space should be the same though. Here is how I do it.
dm7820_device->pci[region].virt_addr = ioremap_nocache(address, length);
if (dm7820_device->pci[region].virt_addr == NULL) {
printk(KERN_ERR "%s: ERROR: BAR%u remapping FAILED\n",
&((dm7820_device->device_name)[0]), region);
dm7820_release_resources();
return -ENOMEM;
}
if (request_mem_region(address, length, &((dm7820_device->device_name)[0])) == NULL) {
printk(KERN_ERR "%s: ERROR: I/O memory range %#lx-%#lx allocation FAILED\n",
&((dm7820_device->device_name)[0]), address, (address + length - 1));
dm7820_release_resources();
return -EBUSY;
}
The address and length values are returned from pci_resource_start() and pci_resource_length() calls.
Then you can access it using ioread32() using dm7820_device->pci[region].virt_addr + <register offset>
Let me know if you have any questions.

Printing the physical address space of shared memory

In C program I got 2 programs one to store a string in a shared memory and the other program is to print the same string via accessing the shared memory.
program 1->
printf("\n SENDER ADDRESS");
printf("\nTHE ADDRESS IS %p",addr1);
printf("\nENTER THE MESSAGE:");
scanf("%s",addr1);
printf("\nMESSAGE STORED IN %p IS %s",addr1,addr1);
program->2
I am having the same text body in program 2 . **BUT IT IS PRINTING THE LOGICAL ADDRESS. HENCE IN THE OUTPUT IAM GETTING 2 DIFFERENT LOGICAL ADRESS. BUT INSTEAD OF THAT IF I WANT TO HAVE SAME PHYSICAL ADDRESS(TO SATISFY THE COLLEGE PROFFESSOR)HOW CAN I DO.**
PLEASE HELP....
PROGRAM 1:
#include<sys/types.h>
#include<string.h>
#include<sys/ipc.h>
#include<sys/shm.h>
#include<stdio.h>
main()
{
key_t key;
int shmid;
char* addr1;
key = ftok("/home/tamil/myc/pws.c",'T');
shmid = shmget(key,128*1024,IPC_CREAT|SHM_R|SHM_W);
addr1 = shmat(shmid,0,0);
printf("\nIPC SHARED MEMORY");
printf("\n SENDER ADDRESS");
printf("\nTHE ADDRESS IS %p",addr1);
printf("\nENTER THE MESSAGE:");
scanf("%s",addr1);
printf("\nMESSAGE STORED IN %p IS %s",addr1,addr1);
}
PROGRAM 2:
#include<sys/types.h>
#include<string.h>
#include<sys/ipc.h>
#include<sys/shm.h>
#include<stdio.h>
main()
{
int shmid;
char* addr1;
key_t key;
key = ftok("/home/tamil/myc/pws.c",'T');
shmid = shmget(key,128*1024,SHM_R|SHM_W);
addr1 = shmat(shmid,0,0);
printf("\nIPC SHARED MEMORY");
printf("\n SENDER ADDRESS");
printf("\nTHE ADDRESSS IS %p",addr1);
printf("\nMESSAGE STORED IN %p IS %s",addr1,addr1);
}
output:
tamil#ubuntu:~/myc$ cc shmget.c
tamil#ubuntu:~/myc$ ./a.out
IPC SHARED MEMORY
SENDER ADDRESS
**THE ADDRESS IS **0xb786c000****
ENTER THE MESSAGE:helloworld
MESSAGE STORED IN **0xb786c000** IS helloworldtamil#ubuntu:~/myc$ cc shmget2.c
tamil#ubuntu:~/myc$ ./a.out
IPC SHARED MEMORY
SENDER ADDRESS
**THE ADDRESSS IS **0xb7706000****
MESSAGE STORED IN **0xb7706000** IS helloworldtamil#ubuntu:~/myc$
Dumping 2 MB of shared mem on Linux.
dd if=/dev/shm/stuff_here of=myshm.dmp bs=1024 count=2048
Might have to mount it first in fstab.

Resources