Out of memory: kill process - ruby

Does anyone have any ideas on why these logs are giving these errors - Out of memory: kill process ... nginx invoked oom-killer?
Lately, our CMS has been going down and we have to manually restart the server in AWS. We're not sure what is causing this behavior.
Here are the exact log lines that repeated 33 times while the server was down:
Out of memory: kill process 15654 (ruby) score 338490 or a child
Killed process 15654 (ruby) vsz:1353960kB, anon-rss:210704kB, file-rss:0kB
nginx invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
nginx cpuset=/ mems_allowed=0
Pid: 8729, comm: nginx Tainted: G W 2.6.35.14-97.44.amzn1.x86_64 #1
Call Trace:
[<ffffffff8108e638>] ? cpuset_print_task_mems_allowed+0x98/0xa0
[<ffffffff810bb157>] dump_header.clone.1+0x77/0x1a0
[<ffffffff81318d49>] ? _raw_spin_unlock_irqrestore+0x19/0x20
[<ffffffff811ab3af>] ? ___ratelimit+0x9f/0x120
[<ffffffff810bb2f6>] oom_kill_process.clone.0+0x76/0x140
[<ffffffff810bb4d8>] __out_of_memory+0x118/0x190
[<ffffffff810bb5d2>] out_of_memory+0x82/0x1c0
[<ffffffff810beb89>] __alloc_pages_nodemask+0x689/0x6a0
[<ffffffff810e7864>] alloc_pages_current+0x94/0xf0
[<ffffffff810b87ef>] __page_cache_alloc+0x7f/0x90
[<ffffffff810c15e0>] __do_page_cache_readahead+0xc0/0x200
[<ffffffff810c173c>] ra_submit+0x1c/0x20
[<ffffffff810b9f63>] filemap_fault+0x3e3/0x430
[<ffffffff810d023f>] __do_fault+0x4f/0x4b0
[<ffffffff810d2774>] handle_mm_fault+0x1b4/0xb40
[<ffffffff81007682>] ? check_events+0x12/0x20
[<ffffffff81006f1d>] ? xen_force_evtchn_callback+0xd/0x10
[<ffffffff81007682>] ? check_events+0x12/0x20
[<ffffffff8131c752>] do_page_fault+0x112/0x310
[<ffffffff813194b5>] page_fault+0x25/0x30
Mem-Info:
Node 0 DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
CPU 1: hi: 0, btch: 1 usd: 0
CPU 2: hi: 0, btch: 1 usd: 0
CPU 3: hi: 0, btch: 1 usd: 0
Node 0 DMA32 per-cpu:
CPU 0: hi: 186, btch: 31 usd: 35
CPU 1: hi: 186, btch: 31 usd: 0
CPU 2: hi: 186, btch: 31 usd: 30
CPU 3: hi: 186, btch: 31 usd: 0
Node 0 Normal per-cpu:
CPU 0: hi: 186, btch: 31 usd: 202
CPU 1: hi: 186, btch: 31 usd: 30
CPU 2: hi: 186, btch: 31 usd: 59
CPU 3: hi: 186, btch: 31 usd: 140
active_anon:3438873 inactive_anon:284496 isolated_anon:0
active_file:0 inactive_file:62 isolated_file:64
unevictable:0 dirty:0 writeback:0 unstable:0
free:16763 slab_reclaimable:1340 slab_unreclaimable:2956
mapped:29 shmem:12 pagetables:11130 bounce:0
Node 0 DMA free:7892kB min:16kB low:20kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15772kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 4024 14836 14836
Node 0 DMA32 free:47464kB min:4224kB low:5280kB high:6336kB active_anon:3848564kB inactive_anon:147080kB active_file:0kB inactive_file:8kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:4120800kB mlocked:0kB dirty:0kB writeback:0kB mapped:60kB shmem:0kB slab_reclaimable:28kB slab_unreclaimable:268kB kernel_stack:48kB pagetables:8604kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:82 all_unreclaimable? yes
lowmem_reserve[]: 0 0 10811 10811
Node 0 Normal free:11184kB min:11352kB low:14188kB high:17028kB active_anon:9906928kB inactive_anon:990904kB active_file:0kB inactive_file:1116kB unevictable:0kB isolated(anon):0kB isolated(file):256kB present:11071436kB mlocked:0kB dirty:0kB writeback:0kB mapped:56kB shmem:48kB slab_reclaimable:5332kB slab_unreclaimable:11556kB kernel_stack:2400kB pagetables:35916kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:32 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 3*4kB 1*8kB 2*16kB 1*32kB 0*64kB 3*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 1*4096kB = 7892kB
Node 0 DMA32: 62*4kB 104*8kB 53*16kB 29*32kB 27*64kB 7*128kB 2*256kB 3*512kB 1*1024kB 3*2048kB 8*4096kB = 47464kB
Node 0 Normal: 963*4kB 0*8kB 5*16kB 4*32kB 7*64kB 5*128kB 6*256kB 1*512kB 2*1024kB 1*2048kB 0*4096kB = 11292kB
318 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
3854801 pages RAM
86406 pages reserved
14574 pages shared
3738264 pages non-shared

This happens because your server is running out of memory. You have two options:
1. Add RAM or enable swap (upgrading physical RAM is recommended over relying on swap).
2. Limit nginx's RAM use.
To limit nginx's RAM use, open /etc/nginx/nginx.conf and add client_max_body_size <your_value_here> inside the http block. This caps the size of the request body nginx will accept from a client. For example:
worker_processes 1;
http {
client_max_body_size 10M;
...
}
Note: use K for kilobytes and M for megabytes; this directive also accepts G for gigabytes. nginx has no T suffix.
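The Mem-Info dump in the question bears out the out-of-memory diagnosis. As a rough check, assuming 4 kB pages (the buddy-allocator lines in the dump count free memory in 4kB units), the page counters can be converted to a share of RAM. The variable and function names below are only illustrative:

```python
# Counters copied from the Mem-Info section of the OOM report above.
PAGE_KIB = 4  # assumption: 4 kB pages, as on typical x86_64 kernels

active_anon = 3438873    # pages
inactive_anon = 284496   # pages
total_ram = 3854801      # "pages RAM"
pagecache = 318          # "total pagecache pages"
total_swap_kib = 0       # "Total swap = 0kB"


def pages_to_gib(pages):
    """Convert a page count to GiB using the assumed page size."""
    return pages * PAGE_KIB / (1024 * 1024)


anon_pages = active_anon + inactive_anon
anon_share = anon_pages / total_ram

print(f"RAM:        {pages_to_gib(total_ram):.1f} GiB")
print(f"anonymous:  {pages_to_gib(anon_pages):.1f} GiB ({anon_share:.1%} of RAM)")
print(f"page cache: {pagecache * PAGE_KIB / 1024:.1f} MiB")
print(f"swap:       {total_swap_kib} kB")
```

Roughly 14.2 of 14.7 GiB is anonymous (process heap and stack) memory, the page cache has been squeezed to about a megabyte, and there is no swap, so the kernel has almost nothing left to reclaim. Watching the resident size of the ruby processes over time (for example with ps or top) should show which of them hold that memory.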


Raspberry Pi 3B+ with multiple CAN buses (MCP2515)

I'm trying to connect six MCP2515 controllers over spi0. I have adapted an SPI overlay to add the necessary chip select lines. My new SPI overlay looks like this:
/ {
compatible = "brcm,bcm2835", "brcm,bcm2836", "brcm,bcm2708", "brcm,bcm2709";
fragment@0 {
target = <&spi0>;
frag0: __overlay__ {
#address-cells = <1>;
#size-cells = <0>;
pinctrl-0 = <&spi0_pins &spi0_cs_pins>;
status = "okay";
cs-gpios = <&gpio 8 1>, <&gpio 7 1>, <&gpio 22 1>, <&gpio 23 1>, <&gpio 24 1>, <&gpio 25 1>;
spidev@0{
compatible = "spidev";
reg = <0>; /* CE0 */
#address-cells = <1>;
#size-cells = <0>;
spi-max-frequency = <500000>;
};
spidev@1{
compatible = "spidev";
reg = <1>; /* CE1 */
#address-cells = <1>;
#size-cells = <0>;
spi-max-frequency = <500000>;
};
spidev@2{
compatible = "spidev";
reg = <2>; /* CE2 */
#address-cells = <1>;
#size-cells = <0>;
spi-max-frequency = <500000>;
};
spidev@3{
compatible = "spidev";
reg = <3>; /* CE3 */
#address-cells = <1>;
#size-cells = <0>;
spi-max-frequency = <500000>;
};
spidev@4{
compatible = "spidev";
reg = <4>; /* CE4 */
#address-cells = <1>;
#size-cells = <0>;
spi-max-frequency = <500000>;
};
spidev@5{
compatible = "spidev";
reg = <5>; /* CE5 */
#address-cells = <1>;
#size-cells = <0>;
spi-max-frequency = <500000>;
};
};
};
fragment@1 {
target = <&gpio>;
__overlay__ {
spi0_cs_pins: spi0_cs_pins {
brcm,pins = <7 8 22 23 24 25>;
brcm,function = <1>; /* out */
};
};
};
};
With this SPI overlay I get six SPI devices in /sys/bus/spi/devices/:
spi0.0 spi0.1 spi0.2 spi0.3 spi0.4 spi0.5
I have also made new overlays for the mcp2515 (can0 to can5) in order to bind them with the new chip select lines of spi0.
My /boot/config.txt looks like this:
dtoverlay=spi-gpio-cs-new
dtoverlay=mcp2515-can0,oscillator=8000000,interrupt=5
dtoverlay=mcp2515-can4,oscillator=8000000,interrupt=26
dtoverlay=mcp2515-can5,oscillator=8000000,interrupt=27
dmesg | grep mcp
[ 7.870207] mcp251x spi0.5 can0: MCP2515 successfully initialized.
[ 7.892886] mcp251x spi0.4 can1: MCP2515 successfully initialized.
[ 7.908725] mcp251x spi0.0 can2: MCP2515 successfully initialized.
ifconfig
can0: flags=193<UP,RUNNING,NOARP> mtu 16
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 10 (UNSPEC)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
can1: flags=193<UP,RUNNING,NOARP> mtu 16
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 10 (UNSPEC)
RX packets 36 bytes 180 (180.0 B)
RX errors 0 dropped 36 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
can2: flags=193<UP,RUNNING,NOARP> mtu 16
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 10 (UNSPEC)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
I only have three MCP2515 boards at my disposal for the moment. I have modified their voltage supply (5 V for the CAN transceiver, 3 V for the CAN controller) so as not to damage the Raspberry Pi GPIOs. The boards have been tested individually and I was able to send and receive CAN frames with each of them. They are connected to the Raspberry Pi like this:
Out of these three interfaces only can1 (spi0.4) is working! Using candump I can see CAN frames on the network.
My question is: why are can0 and can2 silent when I try to send or receive CAN messages (candump and cansend)?
Kernel interrupt table
CPU0 CPU1 CPU2 CPU3
17: 217 0 0 0 ARMCTRL-level 1 Edge 3f00b880.mailbox
18: 47 0 0 0 ARMCTRL-level 2 Edge VCHIQ doorbell
40: 0 0 0 0 ARMCTRL-level 48 Edge bcm2708_fb DMA
42: 352 0 0 0 ARMCTRL-level 50 Edge DMA IRQ
44: 3062 0 0 0 ARMCTRL-level 52 Edge DMA IRQ
45: 0 0 0 0 ARMCTRL-level 53 Edge DMA IRQ
48: 0 0 0 0 ARMCTRL-level 56 Edge DMA IRQ
56: 14104 0 0 0 ARMCTRL-level 64 Edge dwc_otg, dwc_otg_pcd, dwc_otg_hcd:usb1
78: 0 0 0 0 ARMCTRL-level 86 Edge 3f204000.spi
80: 158 0 0 0 ARMCTRL-level 88 Edge mmc0
81: 7450 0 0 0 ARMCTRL-level 89 Edge uart-pl011
86: 4207 0 0 0 ARMCTRL-level 94 Edge mmc1
161: 0 0 0 0 bcm2836-timer 0 Edge arch_timer
162: 1813 1900 2264 1528 bcm2836-timer 1 Edge arch_timer
165: 0 0 0 0 bcm2836-pmu 9 Edge arm-pmu
166: 0 0 0 0 lan78xx-irqs 17 Edge usb-001:004:01
167: 0 0 0 0 pinctrl-bcm2835 26 Edge spi0.5
168: 6 0 0 0 pinctrl-bcm2835 27 Edge spi0.4
169: 0 0 0 0 pinctrl-bcm2835 5 Level spi0.0
FIQ: usb_fiq
IPI0: 0 0 0 0 CPU wakeup interrupts
IPI1: 0 0 0 0 Timer broadcast interrupts
IPI2: 1469 2966 3711 4460 Rescheduling interrupts
IPI3: 203 798 542 445 Function call interrupts
IPI4: 0 0 0 0 CPU stop interrupts
IPI5: 55 85 41 24 IRQ work interrupts
IPI6: 0 0 0 0 completion interrupts
Err: 0
I can see from this table that the SPI devices have interrupts assigned, but only the one for spi0.4 is actually firing. How can I activate the other two interrupts, for spi0.0 and spi0.5?
It's working!!!
As I mentioned in my first post, only one board was working (can1, spi0.4). After I rechecked the two non-working boards, I discovered that one had hardware damage that also kept the other board from working. In conclusion, my SPI and MCP2515 overlays are fully functional!
Regards
Antmar
[ 7.846788] mcp251x spi0.0 can0: MCP2515 successfully initialized.
[ 7.888039] mcp251x spi0.1 can1: MCP2515 successfully initialized.
[ 7.924747] mcp251x spi0.2 can2: MCP2515 successfully initialized.
[ 7.936608] mcp251x spi0.3 can3: MCP2515 successfully initialized.
can0 241 [5] 67 A4 31 F0 C7
can1 2A0 [2] 02 93
can2 241 [5] 67 A4 31 F0 CB
can3 240 [2] 02 6A

Managing interrupt in Linux-based OS

Context
I have a device tree in which one of the nodes is:
gpio@41210000 {
#gpio-cells = <0x2>;
#interrupt-cells = <0x2>;
compatible = "xlnx,xps-gpio-1.00.a,generic-uio,BTandSW";
gpio-controller;
interrupt-controller;
interrupt-parent = <0x4>;
//interrupt-parent =<&gic>;
interrupts = <0x0 0x1d 0x4>;
reg = <0x41210000 0x10000>;
xlnx,all-inputs = <0x1>;
xlnx,all-inputs-2 = <0x1>;
xlnx,all-outputs = <0x0>;
xlnx,all-outputs-2 = <0x0>;
xlnx,dout-default = <0x0>;
xlnx,dout-default-2 = <0x0>;
xlnx,gpio-width = <0x4>;
xlnx,gpio2-width = <0x2>;
xlnx,interrupt-present = <0x1>;
xlnx,is-dual = <0x1>;
xlnx,tri-default = <0xffffffff>;
xlnx,tri-default-2 = <0xffffffff>;
};
Once the kernel is running, the command
cat /proc/interrupts
gives these results:
root@linaro-developer:~# cat /proc/interrupts
CPU0 CPU1
16: 0 0 GIC-0 27 Edge gt
17: 0 0 GIC-0 43 Level ttc_clockevent
18: 1588 1064 GIC-0 29 Edge twd
21: 43 0 GIC-0 39 Level f8007100.adc
24: 0 0 GIC-0 35 Level f800c000.ocmc
25: 506 0 GIC-0 59 Level xuartps
26: 0 0 GIC-0 51 Level e000d000.spi
27: 0 0 GIC-0 54 Level eth5
28: 4444 0 GIC-0 56 Level mmc0
29: 0 0 GIC-0 45 Level f8003000.dmac
30: 0 0 GIC-0 46 Level f8003000.dmac
31: 0 0 GIC-0 47 Level f8003000.dmac
32: 0 0 GIC-0 48 Level f8003000.dmac
33: 0 0 GIC-0 49 Level f8003000.dmac
34: 0 0 GIC-0 72 Level f8003000.dmac
35: 0 0 GIC-0 73 Level f8003000.dmac
36: 0 0 GIC-0 74 Level f8003000.dmac
37: 0 0 GIC-0 75 Level f8003000.dmac
38: 0 0 GIC-0 40 Level f8007000.devcfg
45: 0 0 GIC-0 41 Edge f8005000.watchdog
IPI1: 0 0 Timer broadcast interrupts
IPI2: 1731 2206 Rescheduling interrupts
IPI3: 29 36 Function call interrupts
IPI4: 0 0 CPU stop interrupts
IPI5: 0 0 IRQ work interrupts
IPI6: 0 0 completion interrupts
Err: 0
Questions
Once the kernel is running, should it automatically recognize the interrupt and update the data in /proc/interrupts?
I wrote a .probe function like this:
static int SWITCH_of_probe(struct platform_device *pdev)
{
int ret=0;
struct irq_data data_tmp;
SWITCH_01_devices->temp_res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (!(SWITCH_01_devices->temp_res)) {
dev_err(&pdev->dev, "TRY could not get IO memory\n");
return -ENXIO;
}
PDEBUG("resource : regs.start=%#x,regs.end=%#x\n",SWITCH_01_devices->temp_res->start,SWITCH_01_devices->temp_res->end);
//SWITCH_01_devices->irq_line = platform_get_irq(pdev, 0);
SWITCH_01_devices->irq_line = irq_of_parse_and_map(pdev->dev.of_node, 0);
if (SWITCH_01_devices->irq_line < 0) {
dev_err(&pdev->dev, "could not get IRQ\n");
printk(KERN_ALERT "could not get IRQ\n");
return SWITCH_01_devices->irq_line;
}
PDEBUG(" resource VIRTUAL IRQ NUMBER : irq=%#x\n",SWITCH_01_devices->irq_line);
ret = request_irq((SWITCH_01_devices->irq_line), SWITCH_01_interrupt, IRQF_SHARED , DRIVER_NAME, NULL);
if (ret) {
printk(KERN_ALERT "NEW SWITCH_01: can't get assigned irq, ret= %d\n", SWITCH_01_devices->irq_line, ret);
SWITCH_01_devices->irq_line = -1;
}
SWITCH_01_devices->mem_region_requested = request_mem_region((SWITCH_01_devices->temp_res->start),resource_size(SWITCH_01_devices->temp_res),"SWITCH_01");
if(SWITCH_01_devices->mem_region_requested == NULL){
printk(KERN_WARNING "[LEO] SWITCH: FaiSWITCH request_mem_region(res.start,resource_size(&(SWITCH_01_devices->res)),...);\n");
}
else
PDEBUG(" [+] request_mem_region\n");
return 0; /* Success */
}
When inserting the module in the kernel I have this output from dmesg:
[ 1249.777189] SWITCH_01: loading out-of-tree module taints kernel.
[ 1249.777787] [LEO] SWITCH_01: dinamic allocation of major number
[ 1249.777801] [LEO] cdev initialized
[ 1249.777988] [LEO] resource : regs.start=0x41210000,regs.end=0x4121ffff
[ 1249.777994] [LEO] resource : irq=0x2e
[ 1249.778000] NEW SWITCH_01: can't get assigned irq, ret= -22
[ 1249.782531] [LEO] [+] request_mem_region
What am I doing wrong? Why can't I get request_irq to succeed?
Note: the interrupts = <0x0 0x1d 0x4> field of the device tree and the IRQ number detected are different. As was pointed out elsewhere, I replaced platform_get_irq(pdev, 0); with irq_of_parse_and_map(pdev->dev.of_node, 0); but the result is the same.
Once the kernel is running, should it automatically recognize the interrupt and update the data in /proc/interrupts?
Yes, it is updated once the interrupt is registered.
[ 1249.777189] SWITCH_01: loading out-of-tree module taints kernel.
[ 1249.777787] [LEO] SWITCH_01: dinamic allocation of major number
[ 1249.777801] [LEO] cdev initialized
[ 1249.777988] [LEO] resource : regs.start=0x41210000,regs.end=0x4121ffff
[ 1249.777994] [LEO] resource : irq=0x2e
[ 1249.778000] NEW SWITCH_01: can't get assigned irq, ret= -22
[ 1249.782531] [LEO] [+] request_mem_region
What am I doing wrong? Why can't I get request_irq to succeed?
request_irq() rejects a shared interrupt (IRQF_SHARED) whose dev_id is NULL and returns -EINVAL (-22), which is what your log shows. Pass a valid, non-NULL dev_id that is unique to your device, for example the SWITCH_01_devices pointer, and pass the same pointer to free_irq() later.
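On the note in the question about the device-tree number differing from the IRQ the driver sees: the two are different number spaces. Here is a minimal sketch of how the three interrupts cells decode, assuming the standard ARM GIC binding (cell 0 is 0 for a shared peripheral interrupt or 1 for a per-processor one, cell 1 is the number within that class, cell 2 holds the trigger flags). The function and constant names are only illustrative:

```python
# Decode a 3-cell GIC interrupt specifier as written in a device tree.
# Assumption: the standard ARM GIC binding; names here are illustrative.
GIC_SPI = 0  # shared peripheral interrupt, hardware IDs start at 32
GIC_PPI = 1  # private peripheral interrupt, hardware IDs start at 16

TRIGGERS = {
    1: "edge, rising",
    2: "edge, falling",
    4: "level, active high",
    8: "level, active low",
}


def decode_gic_interrupt(cells):
    """Return (hardware interrupt ID, trigger description) for a specifier."""
    kind, number, flags = cells
    if kind == GIC_SPI:
        hwirq = number + 32
    elif kind == GIC_PPI:
        hwirq = number + 16
    else:
        raise ValueError(f"unknown interrupt type {kind}")
    return hwirq, TRIGGERS.get(flags & 0xF, "unknown")


hwirq, trigger = decode_gic_interrupt((0x0, 0x1D, 0x4))
print(f"hardware IRQ {hwirq}, {trigger}")
```

So <0x0 0x1d 0x4> is SPI 29, which is hardware interrupt 61, level-triggered and active high. The 0x2e (46) printed by the driver is the Linux virtual IRQ number the kernel mapped it to, and that is the number request_irq() expects. /proc/interrupts lists the virtual number in the first column and the hardware number after GIC-0, as the existing entries show (for example 25 and 59 for xuartps).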

EC2 high stolen time without load

I see a very high percentage of stolen time on an EC2 web server (t2.micro) that is under no load (one current user), together with a high page load time. Is there a correlation between high page load time and high stolen time? I see the same symptoms on another server of class t2.medium.
Do you have an explanation?
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 0 79824 7428 479172 0 0 0 0 52 49 18 0 0 0 82
1 0 0 79792 7436 479172 0 0 0 6 54 49 18 0 0 0 82
1 0 0 79824 7444 479172 0 0 0 5 54 51 18 0 0 0 82

Rebalancing a Cassandra Ring with Vnodes

We had a system with a 3-node Cassandra 2.0.6 ring. Over time, the application load on that system increased until a limit where the ring could not handle it anymore, causing the typical node overload failures.
We doubled the size of the ring, and recently added one more node, to try to handle the load. Still, only 3 nodes are taking all the load, and they are not the original 3 nodes of the initial ring.
We did the bootstrap + cleanup process described in the adding nodes guide. We also tried repairs on each node after not seeing much improvements in the ring load. Our load is 99.99% writes on this system.
Here's a chart of the cluster load illustrating the issue:
The highest load tables have a high cardinality on the partition key that I'd expect distributes well over vnodes.
Edit: nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN x.y.z.92 56.83 GB 256 13.8% x-y-z-b53e8ab55e0a rack1
UN x.y.z.253 136.87 GB 256 15.2% x-y-z-bd3cf08449c8 rack1
UN x.y.z.70 69.84 GB 256 14.2% x-y-z-39e63dd017cd rack1
UN x.y.z.251 74.03 GB 256 14.4% x-y-z-36a6c8e4a8e8 rack1
UN x.y.z.240 51.77 GB 256 13.0% x-y-z-ea239f65794d rack1
UN x.y.z.189 128.49 GB 256 14.3% x-y-z-7c36c93e0022 rack1
UN x.y.z.99 53.65 GB 256 15.2% x-y-z-746477dc5db9 rack1
Edit: tpstats (node highly loaded)
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 0 0 11591287 0 0
RequestResponseStage 0 0 283211224 0 0
MutationStage 32 405875 349531549 0 0
ReadRepairStage 0 0 3591 0 0
ReplicateOnWriteStage 0 0 0 0 0
GossipStage 0 0 3246983 0 0
AntiEntropyStage 0 0 72055 0 0
MigrationStage 0 0 133 0 0
MemoryMeter 0 0 205 0 0
MemtablePostFlusher 0 0 94915 0 0
FlushWriter 0 0 12521 0 0
MiscStage 0 0 34680 0 0
PendingRangeCalculator 0 0 14 0 0
commitlog_archiver 0 0 0 0 0
AntiEntropySessions 1 1 1 0 0
InternalResponseStage 0 0 30 0 0
HintedHandoff 0 0 1957 0 0
Message type Dropped
RANGE_SLICE 0
READ_REPAIR 196
PAGED_RANGE 0
BINARY 0
READ 0
MUTATION 31663792
_TRACE 24409
REQUEST_RESPONSE 4
COUNTER_MUTATION 0
How could I further troubleshoot this issue?
You need to run nodetool cleanup on the nodes that were already part of the ring. Cleanup removes the partition keys that a node no longer owns.
It seems that after the new nodes were added, those keys were not deleted, which keeps the load higher on the previous nodes.
Try running
nodetool cleanup
on the previous nodes
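A rough way to see which nodes hold more data than their token ownership explains is to divide each node's Load by its Owns percentage. The sketch below uses the figures from the nodetool output in the question. The 1.5x-median cutoff is an arbitrary assumption for illustration, not a Cassandra convention:

```python
from statistics import median

# (address, load in GB, ownership in %) copied from the question's output.
NODES = [
    ("x.y.z.92", 56.83, 13.8),
    ("x.y.z.253", 136.87, 15.2),
    ("x.y.z.70", 69.84, 14.2),
    ("x.y.z.251", 74.03, 14.4),
    ("x.y.z.240", 51.77, 13.0),
    ("x.y.z.189", 128.49, 14.3),
    ("x.y.z.99", 53.65, 15.2),
]


def overloaded(nodes, factor=1.5):
    """Return addresses whose GB per percent owned exceeds factor x median."""
    density = {addr: load / owns for addr, load, owns in nodes}
    cutoff = factor * median(density.values())
    return [addr for addr, _, _ in nodes if density[addr] > cutoff]


for addr, load, owns in NODES:
    print(f"{addr:10s} {load / owns:5.2f} GB per % owned")
print("outliers:", overloaded(NODES))
```

With these figures x.y.z.253 and x.y.z.189 stand out at about 9 GB per percent owned, against roughly 3.5 to 5 for the rest. If those are original nodes, leftover data that cleanup has not yet removed is one plausible explanation. Note that Load measures data on disk, not write traffic, so it does not by itself explain the pending MutationStage tasks.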

madvise system call with MADV_SEQUENTIAL call takes too long to finish

In my code I use an external C library, and the library calls madvise with the MADV_SEQUENTIAL option, which takes too long to finish. In my opinion, a single madvise call with MADV_SEQUENTIAL should be enough for our job. My first question: why are multiple madvise system calls made, and is there any logic in calling madvise with different options one after another? My second question: do you have any idea why madvise with MADV_SEQUENTIAL takes so long, sometimes about 1-2 minutes?
[root@mymachine ~]# strace -ttT my_compiled_code
...
13:11:35.358982 open("/some/big/file", O_RDONLY) = 8 <0.000010>
13:11:35.359060 fstat64(8, {st_mode=S_IFREG|0644, st_size=953360384, ...}) = 0 <0.000006>
13:11:35.359155 mmap2(NULL, 1073741824, PROT_READ, MAP_SHARED, 8, 0) = 0x7755e000 <0.000007>
13:11:35.359223 madvise(0x7755e000, 1073741824, MADV_NORMAL) = 0 <0.000006>
13:11:35.359266 madvise(0x7755e000, 1073741824, MADV_RANDOM) = 0 <0.000006>
13:11:35.359886 madvise(0x7755e000, 1073741824, MADV_SEQUENTIAL) = 0 <0.000006>
13:11:53.730549 madvise(0x7755e000, 1073741824, MADV_RANDOM) = 0 <0.000013>
...
I am using a 32-bit Linux kernel, version 3.4.52-9.
[root@mymachine ~]# free -lk
total used free shared buffers cached
Mem: 4034412 3419344 615068 0 55712 767824
Low: 853572 495436 358136
High: 3180840 2923908 256932
-/+ buffers/cache: 2595808 1438604
Swap: 4192960 218624 3974336
[root@mymachine ~]# cat /proc/buddyinfo
Node 0, zone DMA 89 23 9 4 5 4 4 1 0 2 0
Node 0, zone Normal 9615 7099 3997 1723 931 397 78 0 0 1 1
Node 0, zone HighMem 7313 8089 2187 420 206 92 41 15 8 3 6
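One detail in the strace output is worth checking before blaming madvise itself. With -tt each line starts with the wall-clock time at which the call began, and with -T the trailing <...> is the time spent inside the call. Here is a short sketch that separates the two for the madvise lines copied from the trace above. The regular expression and helper names are illustrative, and it assumes no calls were elided between these four lines:

```python
import re
from datetime import datetime

# The four madvise lines from the strace -ttT output in the question.
TRACE = """\
13:11:35.359223 madvise(0x7755e000, 1073741824, MADV_NORMAL) = 0 <0.000006>
13:11:35.359266 madvise(0x7755e000, 1073741824, MADV_RANDOM) = 0 <0.000006>
13:11:35.359886 madvise(0x7755e000, 1073741824, MADV_SEQUENTIAL) = 0 <0.000006>
13:11:53.730549 madvise(0x7755e000, 1073741824, MADV_RANDOM) = 0 <0.000013>
"""

LINE = re.compile(r"^(\d\d:\d\d:\d\d\.\d+) (\w+)\((.*)\) = \S+ <([\d.]+)>$")


def parse(trace):
    """Return (start time, last argument, seconds inside the call) per line."""
    rows = []
    for line in trace.splitlines():
        m = LINE.match(line)
        if m:
            start = datetime.strptime(m.group(1), "%H:%M:%S.%f")
            advice = m.group(3).split(", ")[-1]
            rows.append((start, advice, float(m.group(4))))
    return rows


def gaps_after(rows):
    """Return (advice, seconds in call, seconds until the next call starts)."""
    result = []
    for (start, advice, in_call), (next_start, _, _) in zip(rows, rows[1:]):
        gap = (next_start - start).total_seconds() - in_call
        result.append((advice, in_call, gap))
    return result


for advice, in_call, gap in gaps_after(parse(TRACE)):
    print(f"{advice:16s} in call: {in_call:.6f} s   until next call: {gap:.6f} s")
```

By this reading each madvise call, including the MADV_SEQUENTIAL one, returned within microseconds; the 18 seconds elapsed between its return and the start of the next system call. strace only shows system calls, so time spent in user space or in page faults on the mapping, which are not system calls, would appear as exactly this kind of gap.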
