Why was the AMD64 RSP register wrongly decremented by 8 bytes? - linux-kernel

I've hit a kernel panic on linux-3.0 in which the RSP register appears to have been wrongly decremented by 8 bytes, and I cannot tell whether it is a CPU bug or a kernel bug. I went through the assembly of do_page_fault and found no code that subtracts 8 from RSP. I hope you can give me some ideas. Thanks!
BTW: this issue is hard to reproduce, and has only been seen on one x86 machine.
(1) On AMD64, r12-r15, rbx and rbp are callee-saved registers; when do_page_fault is called, it saves them on entry.
The stack like below:
00007f48c91c1000(r11)
0000000000000000(rbx)
00007ffc0f907bb0(rbp)
00007f48c9558000(r12)
00007f48c91c9708(r13)
00007f48ca168500(r14)
00007f48ca168500(r15)(caller save)
ffffffff81461fc5(page_fault+0x25/0x30)* (return address)
00007f48ca168500(r15) (callee save)
00007f48ca168500(r14)
00007f48c91c9708(r13)
00007f48c9558000(r12)
00007ffc0f907bb0(rbp)
00007f48c91e8598(rbx)
(2) But when do_page_fault finished, I got a wrong return address: the return should have popped "page_fault+0x25/0x30" into RIP, but instead it appears to have popped the "00007f48ca168500 (r15)" slot, causing the oops below. It looks as if RSP was decremented by 8 bytes somewhere inside do_page_fault:
<6>[29205.617769] ovs-vsctl[33927]: segfault at 7f48c9558000 ip 00007f48c9f62285 sp 00007ffc0f907ad0 error 6 in ld-2.11.3.so[7f48c9f57000+1f000]
<1>[29205.617808] BUG: unable to handle kernel paging request at 00007f48ca168500
<1>[29205.621539] IP: [<00007f48ca168500>] 0x7f48ca1684ff
<4>[29205.621539] PGD 3f76860067 PUD 32cf7f7067 PMD 2afce53067 PTE 800000375422e067
<1>[29205.621539] Thread overran stack, or stack corrupted
<0>[29205.621539] Oops: 0011 [#1] SMP
<4>[29205.621539] Inexact backtrace:
<4>[29205.621539]
<4>[29205.621539] CPU 43
<4>[29205.621539] Supported: No, Unsupported modules are loaded
<4>[29205.621539]
<4>[29205.621539] Pid: 33927, comm: ovs-vsctl Tainted: GF NX 3.0.93-0.8-default #1 xxxxx
<4>[29205.621539] RIP: 0010:[<00007f48ca168500>] [<00007f48ca168500>] 0x7f48ca1684ff
<4>[29205.621539] RSP: 0000:ffff882b370adf50 EFLAGS: 00010286
<4>[29205.621539] RAX: 0000000000000000 RBX: 00007f48c91c0000 RCX: ffff883fb99c03c0
<4>[29205.621539] RDX: 0000000000000000 RSI: 0000000000000286 RDI: 0000000000000286
<4>[29205.621539] RBP: 0000000000000000 R08: 0000000000000020 R09: 0000000000000000
<4>[29205.621539] R10: 0000000000000006 R11: 000000000000004a R12: 00007ffc0f907bb0
<4>[29205.621539] R13: 00007f48c9558000 R14: 00007f48c91c9708 R15: 00007f48ca168500
<4>[29205.621539] FS: 00007f48ca163c00(0000) GS:ffff88407f3e0000(0000) knlGS:0000000000000000
<4>[29205.621539] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[29205.621539] CR2: 00007f48ca168500 CR3: 0000002b3ce96000 CR4: 00000000001427e0
<4>[29205.621539] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[29205.621539] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[29205.621539] Process ovs-vsctl (pid: 33927, threadinfo ffff882b370ac000, task ffff883fb99c03c0)
<0>[29205.621539] Stack:
<4>[29205.621539] ffffffff81461fc5 00007f48ca168500(r15) 00007f48ca168500(r14) 00007f48c91c9708(r13)
<4>[29205.621539] 00007f48c9558000(r12) 00007ffc0f907bb0(rbp) 00007f48c91e8598(rbx) 00007f48c91c1000(r11)
<4>[29205.621539] 00007f48c955ed60(r10) 0000000000000001(r9) 00007f48c91c5cb8(r8) 0000000000000007
<0>[29205.621539] Call Trace:
<0>[29205.621539] Inexact backtrace:
<0>[29205.621539]
<4>[29205.621539] [<ffffffff81461fc5>] ? page_fault+0x25/0x30
<0>[29205.621539] Code: Bad RIP value.
<1>[29205.621539] RIP [<00007f48ca168500>] 0x7f48ca1684ff
<4>[29205.621539] RSP <ffff882b370adf50>
<0>[29205.621539] CR2: 00007f48ca168500
Page_fault call sequence:
ffffffff81461fa0 <page_fault>:
ffffffff81461fa0: ff 15 ca aa 5b 00 callq *0x5baaca(%rip) # ffffffff81a1ca70 <pv_irq_ops+0x30>
ffffffff81461fa6: 48 83 ec 78 sub $0x78,%rsp
ffffffff81461faa: e8 b1 01 00 00 callq ffffffff81462160 <error_entry>
ffffffff81461faf: 48 89 e7 mov %rsp,%rdi
ffffffff81461fb2: 48 8b 74 24 78 mov 0x78(%rsp),%rsi
ffffffff81461fb7: 48 c7 44 24 78 ff ff movq $0xffffffffffffffff,0x78(%rsp)
ffffffff81461fbe: ff ff
ffffffff81461fc0: e8 6b 32 00 00 callq ffffffff81465230 <do_page_fault>
ffffffff81461fc5: e9 46 02 00 00 jmpq ffffffff81462210 <error_exit>  <- return address: ffffffff81461fc5 (page_fault+0x25/0x30)
ffffffff81461fca: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
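Since no answer was posted: one way to narrow this down is to disassemble do_page_fault from a debug-symbol vmlinux matching the running kernel and audit every instruction that touches %rsp (a sketch; the vmlinux path is illustrative for this SUSE kernel):
$ gdb -batch -ex 'disassemble do_page_fault' /usr/lib/debug/boot/vmlinux-3.0.93-0.8-default.debug | grep -n rsp
If the prologue and epilogue adjustments pair up exactly, a miscompile can be ruled out, and the remaining suspects are stack corruption by some other context or flaky hardware, which would fit a fault seen on a single machine only.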

Related

Why did it dump the stack twice when my Linux driver failed on the second insmod, while the first insmod ran normally?

I started learning Linux driver development a few days ago and wrote a simple driver. The first time I insmod it, it runs well, and rmmod is also normal. But when I insmod it again, the console shows "Killed", and dmesg shows two stack dumps, which surprised me; I don't know how to debug this beyond printk.
I've searched many times and found nothing, so I'm asking here, very much in need of your help. I'd really like to know why it fails the second time, why the stack dump happened twice, and how I can fix this driver. Thanks very much!
My VM's Linux kernel version is: 5.13.0
The driver code is here:
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/printk.h>
#include <linux/fs.h>
#include <linux/kdev_t.h>
#include <linux/device.h>
#include <linux/export.h>
#include <linux/types.h>
#include <linux/kobject.h>

static ssize_t my_file_show(struct device *dev,
                            struct device_attribute *attr, char *buf)
{
        return snprintf(buf, 64, "%s", __func__);
}

static ssize_t my_file_store(struct device *dev,
                             struct device_attribute *attr, const char *buf, size_t count)
{
        pr_info("going to my_file_store\n");
        return count;
}

static DEVICE_ATTR(my_file, 0664, my_file_show, my_file_store);

static int my_devid = -1;
static struct class *my_class = NULL;
static struct device *my_device = NULL;

static int __init my_init(void)
{
        int ret = 0;

        pr_info("going to %s\n", __func__);
        ret = alloc_chrdev_region(&my_devid, 0, 1, "my_devid");
        if (ret < 0) {
                my_devid = -1;
                pr_err("[%s,%d]alloc_chrdev_region failed\n", __func__, __LINE__);
                goto FAULT;
        }
        pr_info("my devid %d\n", my_devid);
        my_class = class_create(THIS_MODULE, "my_class");
        if (my_class == NULL) {
                pr_err("[%s,%d]class_create failed\n", __func__, __LINE__);
                goto FAULT;
        }
        pr_info("[%s,%d]goes here\n", __func__, __LINE__);
        my_device = device_create(my_class, NULL, my_devid, "%s", "my_dev");
        if (my_device == NULL) {
                pr_err("[%s,%d] device_create failed\n", __func__, __LINE__);
                goto FAULT;
        }
        pr_info("[%s,%d]goes here\n", __func__, __LINE__);
        ret = device_create_file(my_device, &dev_attr_my_file);
        if (ret < 0) {
                pr_err("sysfs_create_file failed\n");
                goto FAULT;
        }
        pr_info("go to init tail now\n");
        return 0;

FAULT:
        if (my_devid != -1) {
                unregister_chrdev_region(my_devid, "my_devid");
                my_devid = -1;
        }
        if (my_device != NULL) {
                device_destroy(my_class, my_devid);
                my_device = NULL;
        }
        if (my_class != NULL) {
                class_destroy(my_class);
                my_class = NULL;
        }
        return 0;
}

static void __exit my_exit(void)
{
        pr_info("going to %s\n", __func__);
        device_remove_file(my_device, &dev_attr_my_file);
        if (my_devid != -1) {
                unregister_chrdev_region(my_devid, "my_devid");
                my_devid = -1;
        }
        if (my_device != NULL) {
                device_destroy(my_class, my_devid);
                my_device = NULL;
        }
        if (my_class != NULL) {
                class_destroy(my_class);
                my_class = NULL;
        }
}

module_init(my_init);
module_exit(my_exit);
MODULE_AUTHOR("tid");
MODULE_LICENSE("GPL");
This is the dmesg output:
going to my_init
[87682.699433] my devid 247463936
[87682.700041] [my_init,47]goes here
[87682.706933] [my_init,54]goes here
[87682.706937] go to init tail now
[87704.903499] going to my_exit
[87747.424115] going to my_init
[87747.424385] my devid 262144000
[87747.424418] [my_init,47]goes here
[87747.424784] sysfs: cannot create duplicate filename '/devices/virtual/my_class'
[87747.424989] CPU: 1 PID: 462167 Comm: insmod Tainted: G OE 5.13.0-27-generic #29~20.04.1-Ubuntu
[87747.424992] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[87747.425172] Call Trace:
[87747.426055] dump_stack+0x7d/0x9c
[87747.427617] sysfs_warn_dup.cold+0x17/0x27
[87747.427889] sysfs_create_dir_ns+0xb8/0xd0
[87747.428703] kobject_add_internal+0xbd/0x2b0
[87747.429021] kobject_add+0x7e/0xb0
[87747.429023] ? kmem_cache_alloc_trace+0x37c/0x440
[87747.429671] get_device_parent.isra.0+0x179/0x1b0
[87747.429943] device_add+0xe3/0x8e0
[87747.429945] device_create_groups_vargs+0xd4/0xf0
[87747.429946] ? 0xffffffffc09b1000
[87747.429948] device_create+0x49/0x60
[87747.429950] my_init+0xf0/0x1000 [test]
[87747.430241] do_one_initcall+0x46/0x1d0
[87747.430632] ? __cond_resched+0x19/0x30
[87747.430866] ? kmem_cache_alloc_trace+0x37c/0x440
[87747.430869] do_init_module+0x62/0x260
[87747.430898] load_module+0x125d/0x1440
[87747.431183] __do_sys_finit_module+0xc2/0x120
[87747.431185] ? __do_sys_finit_module+0xc2/0x120
[87747.431186] __x64_sys_finit_module+0x1a/0x20
[87747.431188] do_syscall_64+0x61/0xb0
[87747.431260] ? __x64_sys_newfstat+0x16/0x20
[87747.431361] ? do_syscall_64+0x6e/0xb0
[87747.431363] ? __x64_sys_lseek+0x1a/0x20
[87747.431380] ? do_syscall_64+0x6e/0xb0
[87747.431382] ? exc_page_fault+0x8f/0x170
[87747.431383] ? asm_exc_page_fault+0x8/0x30
[87747.431385] entry_SYSCALL_64_after_hwframe+0x44/0xae
[87747.431386] RIP: 0033:0x7fd6b8d3789d
[87747.431388] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 f5 0c 00 f7 d8 64 89 01 48
[87747.431390] RSP: 002b:00007ffe09073bd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[87747.431405] RAX: ffffffffffffffda RBX: 0000557d4fa68760 RCX: 00007fd6b8d3789d
[87747.431405] RDX: 0000000000000000 RSI: 0000557d4db48358 RDI: 0000000000000003
[87747.431406] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007fd6b8e0b260
[87747.431407] R10: 0000000000000003 R11: 0000000000000246 R12: 0000557d4db48358
[87747.431407] R13: 0000000000000000 R14: 0000557d4fa683d0 R15: 0000000000000000
[87747.431503] kobject_add_internal failed for my_class with -EEXIST, don't try to register things with the same name in the same directory.
[87747.431713] [my_init,54]goes here
[87747.431749] BUG: kernel NULL pointer dereference, address: 000000000000001f
[87747.431765] #PF: supervisor read access in kernel mode
[87747.431780] #PF: error_code(0x0000) - not-present page
[87747.431819] PGD 0 P4D 0
[87747.431821] Oops: 0000 [#1] SMP NOPTI
[87747.431823] CPU: 1 PID: 462167 Comm: insmod Tainted: G OE 5.13.0-27-generic #29~20.04.1-Ubuntu
[87747.431825] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[87747.431826] RIP: 0010:sysfs_create_file_ns+0x26/0x90
[87747.431829] Code: 9c 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 83 ec 10 65 48 8b 04 25 28 00 00 00 48 89 45 e0 31 c0 48 85 ff 74 5b <48> 83 7f 30 00 48 89 fb 74 51 49 89 f4 48 85 f6 74 49 49 89 d5 48
[87747.431831] RSP: 0018:ffffa39a0406fbe0 EFLAGS: 00010282
[87747.431832] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
[87747.431833] RDX: 0000000000000000 RSI: ffffffffc09ae020 RDI: ffffffffffffffef
[87747.431834] RBP: ffffa39a0406fc08 R08: ffff8e77b9e589c0 R09: ffffa39a0406fa18
[87747.431835] R10: 0000000000000001 R11: 0000000000000001 R12: ffffffffc09ae020
[87747.431836] R13: ffffffffffffffef R14: ffffffffc09ae040 R15: 0000000000000000
[87747.431837] FS: 00007fd6b8bf2540(0000) GS:ffff8e77b9e40000(0000) knlGS:0000000000000000
[87747.431838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[87747.432290] CR2: 000000000000001f CR3: 000000001f4c8005 CR4: 00000000003706e0
[87747.432721] Call Trace:
[87747.432724] device_create_file+0x42/0x80
[87747.432726] ? 0xffffffffc09b1000
[87747.432728] my_init+0x141/0x1000 [test]
[87747.432730] do_one_initcall+0x46/0x1d0
[87747.432732] ? __cond_resched+0x19/0x30
[87747.432734] ? kmem_cache_alloc_trace+0x37c/0x440
[87747.432737] do_init_module+0x62/0x260
[87747.432739] load_module+0x125d/0x1440
[87747.432741] __do_sys_finit_module+0xc2/0x120
[87747.432742] ? __do_sys_finit_module+0xc2/0x120
[87747.432743] __x64_sys_finit_module+0x1a/0x20
[87747.432745] do_syscall_64+0x61/0xb0
[87747.432747] ? __x64_sys_newfstat+0x16/0x20
[87747.432749] ? do_syscall_64+0x6e/0xb0
[87747.432750] ? __x64_sys_lseek+0x1a/0x20
[87747.432752] ? do_syscall_64+0x6e/0xb0
[87747.432754] ? exc_page_fault+0x8f/0x170
[87747.432755] ? asm_exc_page_fault+0x8/0x30
[87747.432756] entry_SYSCALL_64_after_hwframe+0x44/0xae
[87747.432758] RIP: 0033:0x7fd6b8d3789d
[87747.432759] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 f5 0c 00 f7 d8 64 89 01 48
[87747.432760] RSP: 002b:00007ffe09073bd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[87747.432762] RAX: ffffffffffffffda RBX: 0000557d4fa68760 RCX: 00007fd6b8d3789d
[87747.433030] RDX: 0000000000000000 RSI: 0000557d4db48358 RDI: 0000000000000003
[87747.433032] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007fd6b8e0b260
[87747.433033] R10: 0000000000000003 R11: 0000000000000246 R12: 0000557d4db48358
[87747.433033] R13: 0000000000000000 R14: 0000557d4fa683d0 R15: 0000000000000000
[87747.433036] Modules linked in: test(OE+) vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock nls_iso8859_1 intel_rapl_msr intel_rapl_common crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd rapl vmw_balloon snd_ens1371 snd_ac97_codec gameport ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi joydev input_leds serio_raw snd_seq snd_seq_device snd_timer snd soundcore vmw_vmci mac_hid sch_fq_codel vmwgfx ttm drm_kms_helper cec rc_core fb_sys_fops syscopyarea sysfillrect sysimgblt msr nfsd parport_pc auth_rpcgss ppdev nfs_acl lockd lp grace parport drm sunrpc ip_tables x_tables autofs4 hid_generic ahci e1000 libahci usbhid hid mptspi mptscsih mptbase crc32_pclmul psmouse scsi_transport_spi i2c_piix4 pata_acpi [last unloaded: test]
[87747.433869] CR2: 000000000000001f
[87747.434327] ---[ end trace d7785aaa07b44309 ]---
[87747.434352] RIP: 0010:sysfs_create_file_ns+0x26/0x90
[87747.434357] Code: 9c 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 83 ec 10 65 48 8b 04 25 28 00 00 00 48 89 45 e0 31 c0 48 85 ff 74 5b <48> 83 7f 30 00 48 89 fb 74 51 49 89 f4 48 85 f6 74 49 49 89 d5 48
[87747.434359] RSP: 0018:ffffa39a0406fbe0 EFLAGS: 00010282
[87747.434361] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
[87747.434362] RDX: 0000000000000000 RSI: ffffffffc09ae020 RDI: ffffffffffffffef
[87747.434363] RBP: ffffa39a0406fc08 R08: ffff8e77b9e589c0 R09: ffffa39a0406fa18
[87747.434363] R10: 0000000000000001 R11: 0000000000000001 R12: ffffffffc09ae020
[87747.434364] R13: ffffffffffffffef R14: ffffffffc09ae040 R15: 0000000000000000
[87747.434365] FS: 00007fd6b8bf2540(0000) GS:ffff8e77b9e40000(0000) knlGS:0000000000000000
[87747.434366] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[87747.434368] CR2: 000000000000001f CR3: 000000001f4c8005 CR4: 00000000003706e0
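No answer was posted, but the logs point at two separate problems (a hedged reading). The first stack dump is only sysfs_warn_dup()'s warning: '/devices/virtual/my_class' already exists because my_exit resets my_devid to -1 before calling device_destroy(my_class, my_devid), so the device created by the first insmod is looked up with the wrong devt, never destroyed, and its sysfs directory survives the rmmod. The second dump is the real oops: RDI is 0xffffffffffffffef, which is ERR_PTR(-17), i.e. -EEXIST, and the faulting address 0x1f is that pointer plus 0x30. class_create() and device_create() report failure as ERR_PTR values, never NULL, so the NULL checks in my_init pass and the error pointer flows into device_create_file(). A minimal sketch of the checks with the standard helpers (illustrative, not a complete fix):

#include <linux/err.h>  /* IS_ERR, PTR_ERR */

        my_class = class_create(THIS_MODULE, "my_class");
        if (IS_ERR(my_class)) {
                ret = PTR_ERR(my_class);
                my_class = NULL;
                goto FAULT;
        }
        my_device = device_create(my_class, NULL, my_devid, NULL, "my_dev");
        if (IS_ERR(my_device)) {
                ret = PTR_ERR(my_device);
                my_device = NULL;
                goto FAULT;
        }

In my_exit, device_destroy() would also have to run before my_devid is overwritten (or use a saved copy of the original devt).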

Where does the text segment actually start? [duplicate]

There is a remote 64-bit *nix server that compiles user-provided code (which should be written in Rust, but I don't think that matters, since it uses LLVM). I don't know which compiler/linker flags it uses, but the compiled ELF executable looks odd: it has 4 LOAD segments:
$ readelf -e executable
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
...
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000004138 0x0000000000004138 R 0x1000
LOAD 0x0000000000005000 0x0000000000005000 0x0000000000005000
0x00000000000305e9 0x00000000000305e9 R E 0x1000
LOAD 0x0000000000036000 0x0000000000036000 0x0000000000036000
0x000000000000d808 0x000000000000d808 R 0x1000
LOAD 0x0000000000043da0 0x0000000000044da0 0x0000000000044da0
0x0000000000002290 0x00000000000024a0 RW 0x1000
...
On my own system, all the executables I've looked at have only 2 LOAD segments:
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
...
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000003000c0 0x00000000003000c0 R E 0x200000
LOAD 0x00000000003002b0 0x00000000005002b0 0x00000000005002b0
0x00000000000776c8 0x000000000009b200 RW 0x200000
...
What are the circumstances (compiler/linker versions, flags etc) under which a compiler might build an ELF with 4 LOAD segments?
What is the point of having 4 LOAD segments? I imagine that having a segment with read but not execute permission might help against certain exploits, but why have two such segments?
A typical BFD-ld or Gold linked Linux executable has 2 loadable segments, with the ELF header merged with .text and .rodata into the first RE segment, and .data, .bss and other writable sections merged into the second RW segment.
Here is the typical section to segment mapping:
$ echo "int foo; int main() { return 0;}" | clang -xc - -o a.out-gold -fuse-ld=gold
$ readelf -Wl a.out-gold
Elf file type is EXEC (Executable file)
Entry point 0x400420
There are 9 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000400040 0x0000000000400040 0x0001f8 0x0001f8 R 0x8
INTERP 0x000238 0x0000000000400238 0x0000000000400238 0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x0006b0 0x0006b0 R E 0x1000
LOAD 0x000e18 0x0000000000401e18 0x0000000000401e18 0x0001f8 0x000200 RW 0x1000
DYNAMIC 0x000e28 0x0000000000401e28 0x0000000000401e28 0x0001b0 0x0001b0 RW 0x8
NOTE 0x000254 0x0000000000400254 0x0000000000400254 0x000020 0x000020 R 0x4
GNU_EH_FRAME 0x00067c 0x000000000040067c 0x000000000040067c 0x000034 0x000034 R 0x4
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10
GNU_RELRO 0x000e18 0x0000000000401e18 0x0000000000401e18 0x0001e8 0x0001e8 RW 0x8
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .dynsym .dynstr .gnu.hash .hash .gnu.version .gnu.version_r .rela.dyn .init .text .fini .rodata .eh_frame .eh_frame_hdr
03 .fini_array .init_array .dynamic .got .got.plt .data .bss
04 .dynamic
05 .note.ABI-tag
06 .eh_frame_hdr
07
08 .fini_array .init_array .dynamic .got .got.plt
This minimizes the number of mmaps the kernel must perform to load such an executable, but at a security cost: the data in .rodata shouldn't be executable, but it is (because it's merged with .text, which must be executable). This can significantly increase the attack surface for someone trying to hijack the process.
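To make that cost concrete, here is a small experiment (a sketch: the object-to-function-pointer cast triggers a compiler warning by design, and hardening such as CET/IBT on recent CPUs can change the outcome):
/* rodata_exec.c - 0xc3 is the x86-64 'ret' opcode */
static const unsigned char code[] = { 0xc3 };

int main(void)
{
    /* If .rodata was merged into the R+E segment, this call simply
     * returns; if .rodata sits in a read-only segment, it SIGSEGVs. */
    void (*fn)(void) = (void (*)(void))(unsigned long)code;
    fn();
    return 0;
}
Linked with gold as above, the program exits cleanly; linked the LLD way shown next, it should crash, because the page holding code is mapped readable but not executable.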
Newer Linux systems, in particular those using LLD to link binaries, prioritize security over speed and put the ELF header and .rodata into a first read-only segment, resulting in 3 LOAD segments and improved security. Here is a typical mapping:
$ echo "int foo; int main() { return 0;}" | clang -xc - -o a.out-lld -fuse-ld=lld
$ readelf -Wl a.out-lld
Elf file type is EXEC (Executable file)
Entry point 0x201000
There are 10 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000200040 0x0000000000200040 0x000230 0x000230 R 0x8
INTERP 0x000270 0x0000000000200270 0x0000000000200270 0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x0000000000200000 0x0000000000200000 0x000558 0x000558 R 0x1000
LOAD 0x001000 0x0000000000201000 0x0000000000201000 0x000185 0x000185 R E 0x1000
LOAD 0x002000 0x0000000000202000 0x0000000000202000 0x001170 0x002005 RW 0x1000
DYNAMIC 0x003010 0x0000000000203010 0x0000000000203010 0x000150 0x000150 RW 0x8
GNU_RELRO 0x003000 0x0000000000203000 0x0000000000203000 0x000170 0x001000 R 0x1
GNU_EH_FRAME 0x000440 0x0000000000200440 0x0000000000200440 0x000034 0x000034 R 0x1
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0
NOTE 0x00028c 0x000000000020028c 0x000000000020028c 0x000020 0x000020 R 0x4
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .rodata .dynsym .gnu.version .gnu.version_r .gnu.hash .hash .dynstr .rela.dyn .eh_frame_hdr .eh_frame
03 .text .init .fini
04 .data .tm_clone_table .fini_array .init_array .dynamic .got .bss
05 .dynamic
06 .fini_array .init_array .dynamic .got
07 .eh_frame_hdr
08
09 .note.ABI-tag
Not to be left behind, newer BFD-ld (my version is 2.31.1) also makes the ELF header and .rodata read-only, but does not merge the two read-only segments into one, resulting in 4 loadable segments:
$ echo "int foo; int main() { return 0;}" | clang -xc - -o a.out-bfd -fuse-ld=bfd
$ readelf -Wl a.out-bfd
Elf file type is EXEC (Executable file)
Entry point 0x401020
There are 11 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000400040 0x0000000000400040 0x000268 0x000268 R 0x8
INTERP 0x0002a8 0x00000000004002a8 0x00000000004002a8 0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x0003f8 0x0003f8 R 0x1000
LOAD 0x001000 0x0000000000401000 0x0000000000401000 0x00018d 0x00018d R E 0x1000
LOAD 0x002000 0x0000000000402000 0x0000000000402000 0x000110 0x000110 R 0x1000
LOAD 0x002e40 0x0000000000403e40 0x0000000000403e40 0x0001e8 0x0001f0 RW 0x1000
DYNAMIC 0x002e50 0x0000000000403e50 0x0000000000403e50 0x0001a0 0x0001a0 RW 0x8
NOTE 0x0002c4 0x00000000004002c4 0x00000000004002c4 0x000020 0x000020 R 0x4
GNU_EH_FRAME 0x002004 0x0000000000402004 0x0000000000402004 0x000034 0x000034 R 0x4
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10
GNU_RELRO 0x002e40 0x0000000000403e40 0x0000000000403e40 0x0001c0 0x0001c0 R 0x1
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn
03 .init .text .fini
04 .rodata .eh_frame_hdr .eh_frame
05 .init_array .fini_array .dynamic .got .got.plt .data .bss
06 .dynamic
07 .note.ABI-tag
08 .eh_frame_hdr
09
10 .init_array .fini_array .dynamic .got
Finally, some of these choices are affected by linker options: --rosegment/--no-rosegment for LLD, or -Wl,-z,separate-code / -Wl,-z,noseparate-code for BFD ld.
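For example, the earlier experiments can be rerun with the relevant flag flipped (a sketch; flag spellings vary a little across linker versions):
$ echo "int foo; int main() { return 0;}" | clang -xc - -o a.out-noro -fuse-ld=lld -Wl,--no-rosegment
$ echo "int foo; int main() { return 0;}" | clang -xc - -o a.out-merged -fuse-ld=bfd -Wl,-z,noseparate-code
$ readelf -Wl a.out-noro | grep LOAD
Dropping the read-only segment should take LLD back to the classic two-LOAD layout, and -z noseparate-code does the same for newer BFD ld.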

Kernel crash with the following code to clear a process's cache in the Linux kernel

I am writing a Linux kernel module to clean up a process's cache.
Below is the code I am using to do that.
static void clear_process_cache(struct task_struct *p)
{
        struct mm_struct *mm;
        struct vm_area_struct *vma;
        struct page *page;
        char *my_page_address;
        unsigned long uaddr, paddr;
        long res;
        unsigned int level;
        pte_t *pte;

        mm = p->mm;
        for (vma = mm->mmap; vma; vma = vma->vm_next) {
                for (uaddr = vma->vm_start; uaddr < vma->vm_end; uaddr += PAGE_SIZE) {
                        down_read(&p->mm->mmap_sem);
                        res = get_user_pages(p, mm, uaddr, 1, 0, 1, &page, NULL);
                        if (res == 1) {
                                my_page_address = kmap(page);
                                paddr = (unsigned long)page_address(page);
                                pte = lookup_address(paddr, &level);
                                if (pte && (pte_val(*pte) & _PAGE_PRESENT)) {
                                        clflush_cache_range(my_page_address, PAGE_SIZE);
                                }
                                kunmap(page);
                                put_page(page);
                        }
                        up_read(&p->mm->mmap_sem);
                }
        }
}
When the code is called intensively, the Linux kernel crashes.
I checked my code but could not find why it causes the crash.
Could you help with it, or is there another high-performance way to do this?
Here is the crash dump.
[ 391.693385] general protection fault: 0000 [#1] SMP
[ 391.694435] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables vmw_vsock_vmci_transport vsock kvm_intel kvm irqbypass vmw_balloon input_leds joydev serio_raw shpchp vmw_vmci i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper vmwgfx ablk_helper
[ 391.702930] cryptd ttm drm_kms_helper syscopyarea psmouse sysfillrect pata_acpi sysimgblt mptspi fb_sys_fops mptscsih drm mptbase vmxnet3 scsi_transport_spi floppy fjes
[ 391.705034] CPU: 3 PID: 1716 Comm: java Not tainted 4.4.131 #4
[ 391.706080] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/28/2017
[ 391.708180] task: ffff88042607c600 ti: ffff8804292b8000 task.ti: ffff8804292b8000
[ 391.709244] RIP: 0010:[<ffffffff811a34dc>] [<ffffffff811a34dc>] put_compound_page+0x5c/0x1b0
[ 391.710358] RSP: 0000:ffff8804292bbcc8 EFLAGS: 00210202
[ 391.711439] RAX: 00d0a78b4c535441 RBX: ffffffff810dc4f9 RCX: 000507e043713000
[ 391.712523] RDX: ffff8804292bbd44 RSI: 000507e043713000 RDI: ffffffff810dc4f9
[ 391.713586] RBP: ffff8804292bbcd8 R08: ffff880002213cf0 R09: 00003ffffffff000
[ 391.714653] R10: 0000000000000080 R11: 0000000000000000 R12: 00d0a78b4c535440
[ 391.715712] R13: 0000160000000000 R14: ffff8804292bbd88 R15: ffffffff810dc4f9
[ 391.716764] FS: 00007fb138d5b700(0000) GS:ffff88042d6c0000(0000) knlGS:0000000000000000
[ 391.717829] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 391.718877] CR2: 0000000000000000 CR3: 00000000351d1000 CR4: 00000000001606f0
[ 391.719972] Stack:
[ 391.720993] ffffffff810dc4f9 ffff880000000000 ffff8804292bbcf0 ffffffff811a364d
[ 391.722055] ffff8804292bbdc8 ffff8804292bbdf8 ffffffff8102e21e ffff8804292bbd48
[ 391.723122] 0000000000000000 ffff88042607c600 ffff880429e6ac00 ffff880425e9f388
[ 391.724165] Call Trace:
[ 391.725190] [<ffffffff810dc4f9>] ? vprintk_default+0x29/0x40
[ 391.726222] [<ffffffff811a364d>] put_page+0x1d/0x50
[ 391.727259] [<ffffffff8102e21e>] clear_process_cache+0x11e/0x1f0
[ 391.728298] [<ffffffff810dc4f9>] ? vprintk_default+0x29/0x40
[ 391.729318] [<ffffffff811918d0>] ? printk+0x5a/0x76
[ 391.730328] [<ffffffff8102e93d>] do_signal+0x20d/0x770
[ 391.731310] [<ffffffff81193459>] ? unlock_page+0x69/0x70
[ 391.732297] [<ffffffff811972c0>] ? __probe_kernel_read+0x40/0x90
[ 391.733271] [<ffffffff8106d3c3>] ? bad_area+0x43/0x50
[ 391.734220] [<ffffffff810034fc>] exit_to_usermode_loop+0x8c/0xd0
[ 391.735143] [<ffffffff81003c26>] prepare_exit_to_usermode+0x26/0x30
[ 391.736062] [<ffffffff8185184e>] retint_user+0x8/0x34
[ 391.736941] Code: ff 5b 41 5c 5d c3 48 89 df e8 01 f6 ff ff 48 89 df 31 f6 e8 17 76 ff ff 5b 41 5c 5d c3 48 8b 47 20 4c 8d 60 ff a8 01 4c 0f 44 e7 <41> f6 44 24 08 01 74 08 49 8b 04 24 a8 80 74 1a 48 8b 43 20 a8
[ 391.739698] RIP [<ffffffff811a34dc>] put_compound_page+0x5c/0x1b0
[ 391.740571] RSP <ffff8804292bbcc8>
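No answer was posted, but two things stand out in clear_process_cache() (hedged observations, not a verified fix). The vma list is walked via mm->mmap before mmap_sem is taken, so the walk can race with the process mapping and unmapping memory, and lookup_address() is handed a kernel linear-map address, whose pte says nothing about the user mapping that get_user_pages() just pinned. A general protection fault in put_compound_page() is consistent with a page refcount race, e.g. against a transparent huge page being split. A sketch of a loop that at least holds the lock across the whole walk:

        down_read(&mm->mmap_sem);
        for (vma = mm->mmap; vma; vma = vma->vm_next) {
                for (uaddr = vma->vm_start; uaddr < vma->vm_end; uaddr += PAGE_SIZE) {
                        res = get_user_pages(p, mm, uaddr, 1, 0, 1, &page, NULL);
                        if (res == 1) {
                                my_page_address = kmap(page);
                                clflush_cache_range(my_page_address, PAGE_SIZE);
                                kunmap(page);
                                put_page(page);
                        }
                }
        }
        up_read(&mm->mmap_sem);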

Debugging a page allocation failure on ColdFire uClinux

I sometimes get the crash output below on my ColdFire uClinux system. How do I work out what's causing the problem?
Apr 4 10:44:33 (none) user.debug syslog: starting NTP
sh: page allocation failure. order:8, mode:0xd0
Stack from 41da5dcc:
4005b0f2 400553b6 40207431 406131f8 00000008 000000d0 00000008 00000000
000000a2 000a2000 000a2000 0000000c 40544a14 00000000 405434fc 00000077
41da5eac 00000000 00000010 00000000 41da5008 41da5000 00000000 00000100
00000000 41da5000 00000000 000200d0 4024eecc 00000080 00000000 00000000
4005de52 000000d0 00000008 4024eec8 00000000 00000001 00004d09 00079100
00000004 00003f20 00013424 41cd7000 41da5fcc 41da5f2a 00015790 00000000
Call Trace with CONFIG_FRAME_POINTER disabled:
[4005b0f2] [400553b6] [40207431] [4005de52] [40067d64]
[40093892] [4004b15e] [400390d8] [40020e70] [400677d8]
[40020e70] [401f0c92] [40068468] [4006aa4e] [40020ea0]
[4002386c]
Mem-Info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Active_anon:0 active_file:0 inactive_anon:0
inactive_file:4484 dirty:0 writeback:0 unstable:0
free:8806 slab:565 mapped:0 pagetables:0 bounce:0
DMA free:35216kB min:1016kB low:1268kB high:1524kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:17936kB present:65024kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 0*4kB 0*8kB 1*16kB 4*32kB 6*64kB 3*128kB 46*256kB 44*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 35216kB
4484 total pagecache pages
0 pages RAM
0 pages reserved
0 pages shared
0 pages non-shared
Allocation of length 663552 from process 476 (sh) failed
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Active_anon:0 active_file:0 inactive_anon:0
inactive_file:4484 dirty:0 writeback:0 unstable:0
free:8804 slab:567 mapped:0 pagetables:0 bounce:0
DMA free:35216kB min:1016kB low:1268kB high:1524kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:17936kB present:65024kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 0*4kB 0*8kB 1*16kB 4*32kB 6*64kB 3*128kB 46*256kB 44*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 35216kB
4484 total pagecache pages
Unable to allocate RAM for process text/data, errno 12
sh: page allocation failure. order:8, mode:0xd0
Stack from 41ea6dcc:
4005b0f2 400553b6 40207431 40645848 00000008 000000d0 00000008 00000000
000000a2 000a2000 000a2000 0000000c 40544a6c 00000000 405434fc 00000077
41ea6eac 00000000 00000010 00000000 41ea6008 41ea6000 00000000 00000100
00000000 41ea6000 00000000 000200d0 4024eecc 00000080 00000000 00000000
4005de52 000000d0 00000008 4024eec8 00000000 00000001 00004d09 00079100
00000004 00003f20 00013424 410ae600 41ea6fcc 41ea6f2a 00015790 00000000
Call Trace with CONFIG_FRAME_POINTER disabled:
[4005b0f2] [400553b6] [40207431] [4005de52] [40067d64]
[40093892] [4004b15e] [400390d8] [40020e70] [400677d8]
[40020e70] [401f0c92] [40068468] [4006aa4e] [40020ea0]
[400239c2] [4002386c]
Mem-Info:
Your system has run out of 1 MB contiguous free blocks. With the power-of-two (buddy) allocator, you need a free block of 1 MB to satisfy a 663552-byte allocation. This is caused by memory fragmentation; on a system with an MMU, scattered free pages could be mapped so that they appear contiguous to new allocations, but here there is none.
You can only deal with the problem through prevention. If the 663552 bytes are the sh binary, you will have to prevent it from being continuously re-loaded into memory. This might be done by putting it into an XIP (execute-in-place) file system.
It might also be a heap allocation made by the shell. In that case, you will have to change whatever processing is causing such a large malloc.
At the system level, you will also have to see which programs are large or cause large mallocs, and change their behavior so that they don't cause more fragmentation.
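The arithmetic behind "order:8 = 1 MB" can be checked with a small userspace mirror of the kernel's get_order() helper (a sketch; PAGE_SIZE assumed to be 4 KiB as on this system):

#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Smallest order such that (PAGE_SIZE << order) >= size,
 * mirroring the kernel's get_order(). */
static unsigned int order_for(unsigned long size)
{
    unsigned int order = 0;

    while ((PAGE_SIZE << order) < size)
        order++;
    return order;
}

int main(void)
{
    unsigned long size = 663552; /* the failing allocation from the log */
    unsigned int order = order_for(size);

    /* Prints: order 8 -> 1048576-byte block */
    printf("order %u -> %lu-byte block\n", order, PAGE_SIZE << order);
    return 0;
}

The buddy listing above confirms the diagnosis: the largest free blocks are 44*512kB, and there is no free 1024kB block to satisfy an order-8 request.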

How do debug symbols affect performance of a Linux executable compiled by GCC?

All other factors being equal (eg optimisation level), how does having debug symbols in an ELF or SO affect:
Load time.
Runtime memory footprint.
Runtime performance?
And what could be done to mitigate any negative effects?
EDIT
I've seen the question below, but I find its discussion unhelpful, as the code-optimization factor confuses the issue there: Why does my code run slower with multiple threads than with a single thread when it is compiled for profiling (-pg)?
The debug symbols are located in totally different sections from the code/data sections. You can check it with objdump:
$ objdump -h a.out
a.out: file format elf64-x86-64
Sections:
Idx Name Size VMA LMA File off Algn
0 .interp 0000001c 0000000000400200 0000000000400200 00000200 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .note.ABI-tag 00000020 000000000040021c 000000000040021c 0000021c 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .note.gnu.build-id 00000024 000000000040023c 000000000040023c 0000023c 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
3 .hash 00000018 0000000000400260 0000000000400260 00000260 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .gnu.hash 0000001c 0000000000400278 0000000000400278 00000278 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .dynsym 00000048 0000000000400298 0000000000400298 00000298 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
6 .dynstr 00000038 00000000004002e0 00000000004002e0 000002e0 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
7 .gnu.version 00000006 0000000000400318 0000000000400318 00000318 2**1
CONTENTS, ALLOC, LOAD, READONLY, DATA
8 .gnu.version_r 00000020 0000000000400320 0000000000400320 00000320 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
9 .rela.dyn 00000018 0000000000400340 0000000000400340 00000340 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
10 .rela.plt 00000018 0000000000400358 0000000000400358 00000358 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
11 .init 00000018 0000000000400370 0000000000400370 00000370 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
12 .plt 00000020 0000000000400388 0000000000400388 00000388 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
13 .text 000001c8 00000000004003b0 00000000004003b0 000003b0 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
14 .fini 0000000e 0000000000400578 0000000000400578 00000578 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
15 .rodata 00000004 0000000000400588 0000000000400588 00000588 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
16 .eh_frame_hdr 00000024 000000000040058c 000000000040058c 0000058c 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
17 .eh_frame 0000007c 00000000004005b0 00000000004005b0 000005b0 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
18 .ctors 00000010 0000000000600630 0000000000600630 00000630 2**3
CONTENTS, ALLOC, LOAD, DATA
19 .dtors 00000010 0000000000600640 0000000000600640 00000640 2**3
CONTENTS, ALLOC, LOAD, DATA
20 .jcr 00000008 0000000000600650 0000000000600650 00000650 2**3
CONTENTS, ALLOC, LOAD, DATA
21 .dynamic 000001a0 0000000000600658 0000000000600658 00000658 2**3
CONTENTS, ALLOC, LOAD, DATA
22 .got 00000008 00000000006007f8 00000000006007f8 000007f8 2**3
CONTENTS, ALLOC, LOAD, DATA
23 .got.plt 00000020 0000000000600800 0000000000600800 00000800 2**3
CONTENTS, ALLOC, LOAD, DATA
24 .data 00000010 0000000000600820 0000000000600820 00000820 2**3
CONTENTS, ALLOC, LOAD, DATA
25 .bss 00000010 0000000000600830 0000000000600830 00000830 2**3
ALLOC
26 .comment 00000039 0000000000000000 0000000000000000 00000830 2**0
CONTENTS, READONLY
27 .debug_aranges 00000030 0000000000000000 0000000000000000 00000869 2**0
CONTENTS, READONLY, DEBUGGING
28 .debug_pubnames 0000001b 0000000000000000 0000000000000000 00000899 2**0
CONTENTS, READONLY, DEBUGGING
29 .debug_info 00000055 0000000000000000 0000000000000000 000008b4 2**0
CONTENTS, READONLY, DEBUGGING
30 .debug_abbrev 00000034 0000000000000000 0000000000000000 00000909 2**0
CONTENTS, READONLY, DEBUGGING
31 .debug_line 0000003b 0000000000000000 0000000000000000 0000093d 2**0
CONTENTS, READONLY, DEBUGGING
32 .debug_str 00000026 0000000000000000 0000000000000000 00000978 2**0
CONTENTS, READONLY, DEBUGGING
33 .debug_loc 0000004c 0000000000000000 0000000000000000 0000099e 2**0
CONTENTS, READONLY, DEBUGGING
You can see the extra sections (27 through 33). These sections won't be loaded at runtime, so there won't be any performance penalty. Using gdb, you can also examine them at runtime:
$ gdb ./a.out
(gdb) break main
(gdb) run
(gdb) info files
// blah blah ....
Local exec file:
`/home/kghost/a.out', file type elf64-x86-64.
Entry point: 0x4003b0
0x0000000000400200 - 0x000000000040021c is .interp
0x000000000040021c - 0x000000000040023c is .note.ABI-tag
0x000000000040023c - 0x0000000000400260 is .note.gnu.build-id
0x0000000000400260 - 0x0000000000400278 is .hash
0x0000000000400278 - 0x0000000000400294 is .gnu.hash
0x0000000000400298 - 0x00000000004002e0 is .dynsym
0x00000000004002e0 - 0x0000000000400318 is .dynstr
0x0000000000400318 - 0x000000000040031e is .gnu.version
0x0000000000400320 - 0x0000000000400340 is .gnu.version_r
0x0000000000400340 - 0x0000000000400358 is .rela.dyn
0x0000000000400358 - 0x0000000000400370 is .rela.plt
0x0000000000400370 - 0x0000000000400388 is .init
0x0000000000400388 - 0x00000000004003a8 is .plt
0x00000000004003b0 - 0x0000000000400578 is .text
0x0000000000400578 - 0x0000000000400586 is .fini
0x0000000000400588 - 0x000000000040058c is .rodata
0x000000000040058c - 0x00000000004005b0 is .eh_frame_hdr
0x00000000004005b0 - 0x000000000040062c is .eh_frame
0x0000000000600630 - 0x0000000000600640 is .ctors
0x0000000000600640 - 0x0000000000600650 is .dtors
0x0000000000600650 - 0x0000000000600658 is .jcr
0x0000000000600658 - 0x00000000006007f8 is .dynamic
0x00000000006007f8 - 0x0000000000600800 is .got
0x0000000000600800 - 0x0000000000600820 is .got.plt
0x0000000000600820 - 0x0000000000600830 is .data
0x0000000000600830 - 0x0000000000600840 is .bss
// blah blah ....
So the only penalty is that you need extra disk space to store this information. You can also use strip to remove the debug information:
$ strip a.out
Use objdump to check it again, you'll see the difference.
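To mitigate the disk-space cost while keeping the ability to debug, the usual workflow is to split the debug info into a separate file and point the binary at it with a debuglink (standard binutils commands; file names are just an example):
$ objcopy --only-keep-debug a.out a.out.debug
$ strip --strip-debug a.out
$ objcopy --add-gnu-debuglink=a.out.debug a.out
gdb then picks up a.out.debug automatically when you debug the stripped a.out.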
EDIT:
Instead of looking at sections, the loader actually maps an ELF file according to its program headers, which can be seen with objdump -p (the following example uses a different ELF binary):
$ objdump -p /bin/cat
/bin/cat: file format elf64-x86-64
Program Header:
PHDR off 0x0000000000000040 vaddr 0x0000000000000040 paddr 0x0000000000000040 align 2**3
filesz 0x00000000000001f8 memsz 0x00000000000001f8 flags r-x
INTERP off 0x0000000000000238 vaddr 0x0000000000000238 paddr 0x0000000000000238 align 2**0
filesz 0x000000000000001c memsz 0x000000000000001c flags r--
LOAD off 0x0000000000000000 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**21
filesz 0x00000000000078bc memsz 0x00000000000078bc flags r-x
LOAD off 0x0000000000007c28 vaddr 0x0000000000207c28 paddr 0x0000000000207c28 align 2**21
filesz 0x0000000000000678 memsz 0x0000000000000818 flags rw-
DYNAMIC off 0x0000000000007dd8 vaddr 0x0000000000207dd8 paddr 0x0000000000207dd8 align 2**3
filesz 0x00000000000001e0 memsz 0x00000000000001e0 flags rw-
NOTE off 0x0000000000000254 vaddr 0x0000000000000254 paddr 0x0000000000000254 align 2**2
filesz 0x0000000000000044 memsz 0x0000000000000044 flags r--
EH_FRAME off 0x0000000000006980 vaddr 0x0000000000006980 paddr 0x0000000000006980 align 2**2
filesz 0x0000000000000274 memsz 0x0000000000000274 flags r--
STACK off 0x0000000000000000 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**4
filesz 0x0000000000000000 memsz 0x0000000000000000 flags rw-
RELRO off 0x0000000000007c28 vaddr 0x0000000000207c28 paddr 0x0000000000207c28 align 2**0
filesz 0x00000000000003d8 memsz 0x00000000000003d8 flags r--
The program headers tell which segments will be loaded and with what rwx flags; multiple sections with the same flags are merged into a single segment.
BTW:
The loader doesn't care about sections when loading an ELF file, but the dynamic linker will look at several symbol-related sections to resolve symbols when needed.
You might want to look at Why does my code run slower with multiple threads than with a single thread when it is compiled for profiling (-pg)? for a quick explanations of how the debug symbols could affect optimization.
To answer your 3 questions:
Load time is essentially unchanged: the debug sections are not part of any LOAD segment, so the loader never maps them.
The on-disk footprint is larger; the runtime memory footprint is not.
With GCC, -g does not change the generated code, so runtime performance is unaffected even when optimization is enabled; it is -pg-style instrumentation (as in the linked question) that actually alters the code.
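If in doubt, this is straightforward to verify for your own compiler: build the same file with and without -g at the same optimization level and compare the disassembly (a sketch; foo.c stands for any source file of yours):
$ gcc -O2 -c foo.c -o foo-nog.o
$ gcc -O2 -g -c foo.c -o foo-g.o
$ diff <(objdump -d foo-nog.o | tail -n +3) <(objdump -d foo-g.o | tail -n +3)
The tail -n +3 just skips objdump's file-name header; an empty diff means the debug build generated identical code.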
