Solaris libumem: why does it not show a memory leak for the first dynamic allocation? - debugging

Say:

#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    void *buff;
    buff = malloc(128);
    buff = malloc(60);
    buff = malloc(30);
    buff = malloc(16);
    free(buff);
    sleep(180);
    return 0;
}

libumem on Solaris 10 shows only the 60-byte and 30-byte buffers as leaks. Why does it not show the 128 bytes as leaked too?

Three memory leaks are detected by mdb and libumem with your code here:
cat > leak.c <<%
int main() { void *buff; buff = malloc(128); buff = malloc(60); buff = malloc(30); buff = malloc(16); free(buff); sleep(180); }
%
gcc -g leak.c -o leak
pkill leak
UMEM_DEBUG=default UMEM_LOGGING=transaction LD_PRELOAD=libumem.so.1 leak &
sleep 5
rm -f core.leak.*
gcore -o core.leak $(pgrep leak)
mdb leak core.leak.* <<%
::findleaks -d
%
gcore: core.leak.1815 dumped
CACHE LEAKED BUFCTL CALLER
0807f010 1 0808b4b8 main+0x29
0807d810 1 0808d088 main+0x39
0807d010 1 08092cd0 main+0x49
------------------------------------------------------------------------
Total 3 buffers, 280 bytes
umem_alloc_160 leak: 1 buffer, 160 bytes
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
808b4b8 8088f28 1af921ab662 1
807f010 806c000 0
libumem.so.1`umem_cache_alloc_debug+0x144
libumem.so.1`umem_cache_alloc+0x19a
libumem.so.1`umem_alloc+0xcd
libumem.so.1`malloc+0x2a
main+0x29
_start+0x83
umem_alloc_80 leak: 1 buffer, 80 bytes
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
808d088 808cf68 1af921c11eb 1
807d810 806c064 0
libumem.so.1`umem_cache_alloc_debug+0x144
libumem.so.1`umem_cache_alloc+0x19a
libumem.so.1`umem_alloc+0xcd
libumem.so.1`malloc+0x2a
main+0x39
_start+0x83
umem_alloc_40 leak: 1 buffer, 40 bytes
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
8092cd0 808efc8 1af921f2bd2 1
807d010 806c0c8 0
libumem.so.1`umem_cache_alloc_debug+0x144
libumem.so.1`umem_cache_alloc+0x19a
libumem.so.1`umem_alloc+0xcd
libumem.so.1`malloc+0x2a
main+0x49
_start+0x83
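The leaks above all come from overwriting buff before freeing it. For comparison, here is a minimal sketch (not from the original post) in which each allocation keeps its own pointer and is freed, so ::findleaks has nothing to report:

```c
#include <assert.h>
#include <stdlib.h>

/* Keep a separate pointer per allocation so no buffer is orphaned
 * by a pointer reassignment; free every one before returning. */
static int alloc_and_free_all(void)
{
    void *a = malloc(128);
    void *b = malloc(60);
    void *c = malloc(30);
    void *d = malloc(16);
    if (!a || !b || !c || !d) {
        free(a); free(b); free(c); free(d);
        return -1;
    }
    free(a);
    free(b);
    free(c);
    free(d);
    return 0;
}
```

Running this under the same UMEM_DEBUG/LD_PRELOAD setup, ::findleaks reports no leaked buffers for these allocations.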

Related

Will Buffering Data Improve the Write Performance in IStream?

I am trying to create an IStream from a disk file, then write many blocks of data into it. Each block is 4096 bytes. Since there are many blocks to be written, I was thinking of buffering several blocks into one large buffer, and when the large buffer is full, performing the actual write to the IStream. Will that improve write performance?
I made a small app to test that:
CFile SrcFile;
LPSTREAM lpStreamFile = NULL;
BYTE lpBuf[4096];
BYTE lpLBuf[65536];
BYTE* lpCurBuf;
UINT uRead;
DWORD uStart, uStop;

uStart = ::GetTickCount();
if (SrcFile.Open(_T("D:\\1GB.dat"), CFile::modeRead | CFile::shareExclusive | CFile::typeBinary))
{
    SrcFile.SeekToBegin();
    ::DeleteFile(_T("F:\\1GB.dat"));
    if (SUCCEEDED(SHCreateStreamOnFileEx(_T("F:\\1GB.dat"), STGM_READWRITE | STGM_SHARE_EXCLUSIVE | STGM_CREATE | STGM_DIRECT,
                                         FILE_ATTRIBUTE_NORMAL, TRUE, NULL, &lpStreamFile)) && (lpStreamFile != NULL))
    {
        lpCurBuf = lpLBuf;
        while (TRUE)
        {
            uRead = SrcFile.Read(lpBuf, 4096);
            if ((lpCurBuf + uRead) > (lpLBuf + 65536))
            {
                lpStreamFile->Write(lpLBuf, lpCurBuf - lpLBuf, NULL);
                lpCurBuf = lpLBuf;
            }
            ::memcpy(lpCurBuf, lpBuf, uRead);
            lpCurBuf += uRead;
            // lpStreamFile->Write(lpBuf, 4096, NULL);
            if (uRead < 4096)
                break;
        }
        if (lpCurBuf > lpLBuf)
        {
            lpStreamFile->Write(lpLBuf, lpCurBuf - lpLBuf, NULL);
        }
        lpStreamFile->Commit(STGC_DEFAULT);
        if (lpStreamFile != NULL)
        {
            // Release the stream
            lpStreamFile->Release();
            lpStreamFile = NULL;
        }
    }
    SrcFile.Close();
}
uStop = ::GetTickCount();

CString strMsg;
strMsg.Format(_T("Total tick count = %u."), uStop - uStart);
AfxMessageBox(strMsg);
The large buffer size is 65536 bytes, 16 times 4096. I use a 1 GB data file for the test.
To my surprise, when the large buffer is not used, the total time is always about 10 seconds.
However, when the large buffer is used, the first run takes 30 seconds, then the remaining runs take 10 seconds.
So it seems that IStream already has an internal write buffer, so my large buffer provides no benefit. However, I cannot find any documentation saying that, nor any documentation telling me how to control it, for example to set the size of the buffer.
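The coalescing pattern in the loop above can be sketched outside COM with plain POSIX read(2)/write(2) calls; buffered_copy and the 64 KiB staging size here are illustrative choices, not part of the original code:

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define BLOCK  4096
#define BIGBUF (16 * BLOCK)   /* 64 KiB staging buffer */

/* Copy src_fd to dst_fd, staging 4 KiB reads in a 64 KiB buffer
 * so the kernel sees one large write per 16 blocks. */
static int buffered_copy(int src_fd, int dst_fd)
{
    char block[BLOCK], big[BIGBUF];
    size_t fill = 0;
    ssize_t n;

    while ((n = read(src_fd, block, BLOCK)) > 0) {
        if (fill + (size_t)n > BIGBUF) {        /* staging buffer full: flush */
            if (write(dst_fd, big, fill) != (ssize_t)fill)
                return -1;
            fill = 0;
        }
        memcpy(big + fill, block, (size_t)n);
        fill += (size_t)n;
    }
    if (n < 0)
        return -1;
    if (fill > 0 && write(dst_fd, big, fill) != (ssize_t)fill)
        return -1;                              /* flush the partial tail */
    return 0;
}
```

Whether this helps depends on how much buffering the layer underneath already does; as observed above, if the stream (or the OS page cache) already coalesces writes, the extra staging buffer mostly adds a memcpy.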

How to get physical page in Solaris 11

I am using mdb on Solaris 11. I opened a file using the "tail -f file_name" command in another ssh session. I got the pid of the tail command and the vnode of the file it opened. After getting the vnode, I ran "::walk page" on this file. Unfortunately, I am not getting any pages in the walk. How do I get the virtual pages and physical pages?
I have done the following:
1) Opened the file with 'tail -f'.
2) Got the pid in mdb, then got the vnode of the opened file.
3) Got the page_t from the vnode_t of the opened file.
4) Left-shifted the page number by 0xD, which gives the pp2pa (page to physical address) effect.
Here are the dcmds in mdb:
> ::pgrep tail
S PID PPID PGID SID UID FLAGS ADDR NAME
R 2889 2882 2889 2850 0 0x4a004000 0000060013f29890 tail
> 0000060013f29890::pfiles
FD TYPE VNODE INFO
0 REG 00000600162f6740 /export/home/chaitanya/OpenSolaris/README.opensolaris
1 CHR 000006001a290400 /devices/pseudo/pts@0:2
2 CHR 000006001a290400 /devices/pseudo/pts@0:2
> 00000600162f6740::walk page
70004781480
700040b0400
> 70004781480::print -at page_t
{
70004781480 u_offset_t p_offset = 0x2000
70004781488 struct vnode *p_vnode = 0x600162f6740
70004781490 selock_t p_selock = 0
70004781494 uint_t p_vpmref = 0x11d9d
70004781498 struct page *p_hash = 0x70002f79b00
700047814a0 struct page *p_vpnext = 0x700040b0400
700047814a8 struct page *p_vpprev = 0x700040b0400
700047814b0 struct page *p_next = 0x70004781480
700047814b8 struct page *p_prev = 0x70004781480
700047814c0 ushort_t p_lckcnt = 0
700047814c2 ushort_t p_cowcnt = 0
700047814c4 kcondvar_t p_cv = {
700047814c4 ushort_t _opaque = 0
}
700047814c6 kcondvar_t p_io_cv = {
700047814c6 ushort_t _opaque = 0
}
700047814c8 uchar_t p_iolock_state = 0
700047814c9 volatile uchar_t p_szc = 0
700047814ca uchar_t p_fsdata = 0
700047814cb uchar_t p_state = 0x40
700047814cc uchar_t p_nrm = 0x2
700047814cd uchar_t p_vcolor = 0x2
700047814ce uchar_t p_index = 0
700047814cf uchar_t p_toxic = 0
700047814d0 void *p_mapping = 0
700047814d8 pfn_t p_pagenum = 0x80f029
700047814e0 uint_t p_share = 0
700047814e4 uint_t p_sharepad = 0
700047814e8 uint_t p_slckcnt = 0
700047814ec uint_t p_kpmref = 0
700047814f0 struct kpme *p_kpmelist = 0
700047814f8 kmutex_t p_ilock = {
700047814f8 void *[1] _opaque = [ 0 ]
}
}
Left-shifting the page number 0x80f029 by 0xD gives the pp2pa result, 0x101E052000:
> 101E052000,100::dump -p
\/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
101e052000: 75742069 742e2020 4e6f7220 646f2079 ut it. Nor do y
101e052010: 6f752068 61766520 746f206b 65657020 ou have to keep
101e052020: 7468650a 20202020 206e616d 65206f70 the. name op
101e052030: 656e736f 6c617269 732e7368 2c206275 ensolaris.sh, bu
101e052040: 74207468 61742773 20746865 206e616d t that's the nam
101e052050: 65207765 276c6c20 75736520 696e2074 e we'll use in t
101e052060: 68657365 206e6f74 65732e0a 0a202020 hese notes...
101e052070: 20205468 656e206d 616b6520 74686520 Then make the
101e052080: 666f6c6c 6f77696e 67206368 616e6765 following change
101e052090: 7320696e 20796f75 72206f70 656e736f s in your openso
101e0520a0: 6c617269 732e7368 3a0a0a20 20202d20 laris.sh:.. -
101e0520b0: 6368616e 67652047 41544520 746f2074 change GATE to t
101e0520c0: 6865206e 616d6520 6f662074 68652074 he name of the t
101e0520d0: 6f702d6c 6576656c 20646972 6563746f op-level directo
101e0520e0: 72792028 652e672e 2c0a2020 20202022 ry (e.g.,. "
101e0520f0: 74657374 77732229 2e0a0a20 20202d20 testws")... -
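The shift above can be sanity-checked on any host. A trivial C sketch of the pp2pa arithmetic, using the 0xD (13-bit) shift from the steps above, i.e. the 8 KB base page size:

```c
#include <assert.h>
#include <stdint.h>

/* pp2pa: physical address = page frame number << PAGESHIFT.
 * 0xD (13) corresponds to the 8 KB base page size used here. */
#define PAGESHIFT 0xD

static uint64_t pfn_to_pa(uint64_t pfn)
{
    return pfn << PAGESHIFT;
}
```

Applied to the p_pagenum from the page_t dump, pfn_to_pa(0x80f029) yields 0x101E052000, the address passed to ::dump -p above.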

LPC4357 jumping to external flash unstable

I have two programs: a custom bootloader located in BANK A, and the main program located in external SPIFI flash.
I load and debug both programs from Eclipse.
Currently I can jump from the bootloader to the main program, and I can start debugging with JTAG from the reset handler in the main program, but sometimes it continues to main and sometimes it breaks after the first step.
From the bootloader I do the following to jump to the main program:
halInit();      // init my hal functions
chSysInit();    // init my OS
chThdSleepMilliseconds(1000);
flashInit();    // init the spifi driver

uint32_t AddressOfNewVectorTable = 0x14000000;
__disable_irq();
/** Load new Vector Table **/
SCB->VTOR = AddressOfNewVectorTable;
__set_MSP(*(uint32_t*)(AddressOfNewVectorTable));
LPC_CREG->M4MEMMAP = AddressOfNewVectorTable;
/**
 * Jump to Reset Vector of new Vector Table
 * -> that's the second word (byte offset 4).
 */
((user_code_pointer_t)(*((uint32_t*)(AddressOfNewVectorTable + 4))))();
This seems to work: it jumps to the 0x14000000 address.
Then, when I start debugging the app with JTAG, it first stops in the reset handler.
I had to disable __early_init() for the main app, which configures the clocks; otherwise everything would crash.
Is this correct?
Sometimes I can then step all the way into the main() function.
But regularly it crashes after the first step with JTAG, with the following output:
SEGGER J-Link GDB Server V5.02l Command Line Version
JLinkARM.dll V5.02l (DLL compiled Nov 24 2015 09:36:30)
-----GDB Server start settings-----
GDBInit file: none
GDB Server Listening port: 2331
SWO raw output listening port: 2332
Terminal I/O port: 2333
Accept remote connection: localhost only
Generate logfile: off
Verify download: on
Init regs on start: on
Silent mode: off
Single run mode: on
Target connection timeout: 0 ms
------J-Link related settings------
J-Link Host interface: USB
J-Link script: none
J-Link settings file: none
------Target related settings------
Target device: LPC4357_M4
Target interface: SWD
Target interface speed: 1000kHz
Target endian: little
Connecting to J-Link...
J-Link is connected.
Firmware: J-Link V9 compiled Oct 9 2015 20:34:47
Hardware: V9.30
S/N: 269301849
OEM: SEGGER-EDU
Feature(s): FlashBP, GDB
Checking target voltage...
Target voltage: 3.31 V
Listening on TCP/IP port 2331
Connecting to target...Connected to target
Waiting for GDB connection...Connected to 127.0.0.1
Reading all registers
WARNING: Failed to read memory @ address 0x00000000
Target interface speed set to 1000 kHz
Resetting target
Halting target CPU...
...Target halted (PC = 0x1A000140)
R0 = 1A000000, R1 = 1A000000, R2 = 400F1FC0, R3 = 12345678
R4 = 40045000, R5 = 1008000C, R6 = 00000000, R7 = 00000000
R8 = 00000000, R9 = 00000000, R10= 00000000, R11= 00000000
R12= 00000000, R13= 10000200, MSP= 10000200, PSP= 00000000
R14(LR) = 10405B25, R15(PC) = 1A000140
XPSR 41000000, APSR 40000000, EPSR 01000000, IPSR 00000000
CFBP 00000000, CONTROL 00, FAULTMASK 00, BASEPRI 00, PRIMASK 00
Reading all registers
Select auto target interface speed (2000 kHz)
Flash breakpoints enabled
Semi-hosting enabled (Handle on BKPT)
Semihosting I/O set to TELNET Client
SWO disabled succesfully.
SWO enabled succesfully.
Downloading 276 bytes @ address 0x14000000 - Verified OK
Downloading 16192 bytes @ address 0x14000120 - Verified OK
Downloading 16080 bytes @ address 0x14004060 - Verified OK
Downloading 16176 bytes @ address 0x14007F30 - Verified OK
Downloading 16064 bytes @ address 0x1400BE60 - Verified OK
Downloading 16096 bytes @ address 0x1400FD20 - Verified OK
Downloading 16096 bytes @ address 0x14013C00 - Verified OK
Downloading 16112 bytes @ address 0x14017AE0 - Verified OK
Downloading 8492 bytes @ address 0x1401B9D0 - Verified OK
Downloading 8 bytes @ address 0x1401DAFC - Verified OK
Downloading 2308 bytes @ address 0x1401DB08 - Verified OK
Comparing flash [....................] Done.
Verifying flash [....................] Done.
Writing register (PC = 0x14000120)
Read 4 bytes @ address 0x14000120 (Data = 0x4100F081)
Read 2 bytes @ address 0x14015764 (Data = 0x2300)
Read 2 bytes @ address 0x140157C8 (Data = 0x2300)
Read 2 bytes @ address 0x140166D8 (Data = 0x2300)
Read 2 bytes @ address 0x140166DC (Data = 0x4818)
Read 2 bytes @ address 0x14001370 (Data = 0xB672)
Read 2 bytes @ address 0x140013E8 (Data = 0xF015)
Read 2 bytes @ address 0x140166D8 (Data = 0x2300)
Read 2 bytes @ address 0x1401670E (Data = 0xAB03)
Read 2 bytes @ address 0x14001370 (Data = 0xB672)
Read 2 bytes @ address 0x14001370 (Data = 0xB672)
Read 2 bytes @ address 0x14016764 (Data = 0xF7EC)
Read 2 bytes @ address 0x14016764 (Data = 0xF7EC)
Read 2 bytes @ address 0x14001370 (Data = 0xB672)
Read 2 bytes @ address 0x140013D4 (Data = 0xF7FF)
Read 2 bytes @ address 0x14015764 (Data = 0x2300)
Read 2 bytes @ address 0x140157BA (Data = 0x4850)
Read 2 bytes @ address 0x1400ACB8 (Data = 0x4806)
Read 2 bytes @ address 0x1400ACBA (Data = 0x4907)
Read 2 bytes @ address 0x1400ACC4 (Data = 0x4B05)
Resetting target
Halting target CPU...
...Target halted (PC = 0x1A000140)
Read 2 bytes @ address 0x14016764 (Data = 0xF7EC)
Read 2 bytes @ address 0x14016764 (Data = 0xF7EC)
Read 2 bytes @ address 0x14016764 (Data = 0xF7EC)
R0 = 1A000000, R1 = 1A000000, R2 = 400F1FC0, R3 = 12345678
R4 = 40045000, R5 = 1008000C, R6 = 00000000, R7 = 00000000
R8 = 00000000, R9 = 00000000, R10= 00000000, R11= 00000000
R12= 00000000, R13= 10000200, MSP= 10000200, PSP= 00000000
R14(LR) = 10405B25, R15(PC) = 1A000140
XPSR 41000000, APSR 40000000, EPSR 01000000, IPSR 00000000
CFBP 00000000, CONTROL 00, FAULTMASK 00, BASEPRI 00, PRIMASK 00
Reading all registers
Read 4 bytes @ address 0x1A000140 (Data = 0x4C24B672)
Setting breakpoint @ address 0x14001370, Size = 2, BPHandle = 0x0001
Setting breakpoint @ address 0x140013D4, Size = 2, BPHandle = 0x0002
Setting breakpoint @ address 0x140013E8, Size = 2, BPHandle = 0x0003
Setting breakpoint @ address 0x1400ACC4, Size = 2, BPHandle = 0x0004
Setting breakpoint @ address 0x140166DC, Size = 2, BPHandle = 0x0005
Setting breakpoint @ address 0x1401670E, Size = 2, BPHandle = 0x0006
Setting breakpoint @ address 0x14016764, Size = 2, BPHandle = 0x0007
Starting target CPU...
Erasing flash [....................] Done.
Programming flash [....................] Done.
Verifying flash [....................] Done.
...Breakpoint reached @ address 0x14001370
Reading all registers
Removing breakpoint @ address 0x14001370, Size = 2
Removing breakpoint @ address 0x140013D4, Size = 2
Removing breakpoint @ address 0x140013E8, Size = 2
Removing breakpoint @ address 0x1400ACC4, Size = 2
Removing breakpoint @ address 0x140166DC, Size = 2
Removing breakpoint @ address 0x1401670E, Size = 2
Removing breakpoint @ address 0x14016764, Size = 2
Read 4 bytes @ address 0x14001370 (Data = 0x4C23B672)
Read 4 bytes @ address 0x1A0025B4 (Data = 0xF8D39B03)
Read 4 bytes @ address 0x1A0025B4 (Data = 0xF8D39B03)
Performing single step...
...Target halted (PC = 0xFFFFFFFE)
Reading all registers
WARNING: Failed to read memory @ address 0xFFFFFFFE
Read 4 bytes @ address 0x1000041C (Data = 0x00000000)
Reading 64 bytes @ address 0x10000400
Read 4 bytes @ address 0x00000000 (Data = 0x10000400)
Read 4 bytes @ address 0x00000000 (Data = 0x10000400)
Am I missing something that is needed after the jump to external flash?
Or can anyone give me a pointer as to why this does not work reliably?
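For what it's worth, the vector-table layout the jump relies on (word 0 = initial MSP, word 1 = reset handler) can be modeled off-target. This is a host-side sketch, not the device code; app_entry_t and read_vector_table are hypothetical names, and the barrier note in the comment is a common Cortex-M recommendation, not something confirmed by the logs above:

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model: a Cortex-M vector table starts with the initial
 * main stack pointer (word 0), followed by the reset handler address
 * (word 1, byte offset 4 -- what the bootloader jumps through). */
typedef struct {
    uint32_t initial_msp;
    uint32_t reset_handler;
} app_entry_t;

static app_entry_t read_vector_table(const uint32_t *table)
{
    app_entry_t e;
    e.initial_msp   = table[0];
    e.reset_handler = table[1];
    /* On target, after SCB->VTOR = table, many references suggest:
     *   __DSB(); __ISB();            -- barriers before using the table
     *   __set_MSP(e.initial_msp);
     *   ((void (*)(void))e.reset_handler)();
     * Note the reset handler address has the Thumb bit (bit 0) set. */
    return e;
}
```

If the instability is timing-related, a missing __DSB()/__ISB() pair after the VTOR and M4MEMMAP writes is one of the usual suspects worth checking.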

CentOS 7: HugePages_Rsvd equals 18446744073709551615

I have an application which uses a large number of huge pages, for DPDK. I allocate the pages at system start and then load/unload the application several times.
After some reloads, the program cannot allocate huge pages any more. When I look at /proc/meminfo, I see:
HugePages_Total: 2656
HugePages_Free: 1504
HugePages_Rsvd: 18446744073709551615
HugePages_Surp: 0
It stays this way, and no application can allocate huge pages, until I reboot the machine.
Any idea?
The decrement_hugepage_resv_vma() function decrements resv_huge_pages, but resv_huge_pages is an unsigned long, so decrementing it past zero wraps around to ULONG_MAX, which is 18446744073709551615 on 64-bit systems.
alloc_huge_page()
    page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve)
        decrement_hugepage_resv_vma() {
            h->resv_huge_pages--;
        }

static void return_unused_surplus_pages(struct hstate *h,
                                        unsigned long unused_resv_pages)
{
    h->resv_huge_pages -= unused_resv_pages;
}
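The wraparound itself is ordinary C unsigned arithmetic and can be reproduced on any host; decrement_resv below is just a model of the kernel's unsigned long counter:

```c
#include <assert.h>
#include <limits.h>

/* Model of resv_huge_pages: an unsigned long counter, as in the kernel. */
static unsigned long decrement_resv(unsigned long resv_huge_pages)
{
    resv_huge_pages--;   /* decrementing past zero wraps around */
    return resv_huge_pages;
}
```

decrement_resv(0) yields ULONG_MAX, i.e. 18446744073709551615 on a 64-bit system, exactly the HugePages_Rsvd value shown in /proc/meminfo above.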
Another, less likely, reason is that the gather_surplus_pages() function can overflow resv_huge_pages.
Try running the following while you test:
while [ 1 ] ; do date | awk '{print $4}' >> HugePages_Rsvd.log; cat /proc/meminfo | grep HugePages_Rsvd >> HugePages_Rsvd.log; sleep 1; done
My guess: if resv_huge_pages increments slowly, the problem is in h->resv_huge_pages += delta;.
But if resv_huge_pages suddenly becomes -1 (which as an unsigned long is 18446744073709551615), the problem is in h->resv_huge_pages--; (resv_huge_pages was 0 and wrapped after the decrement).
Depending on your kernel, you can check the patch
mm: numa: disable change protection for vma(VM_HUGETLB)
6b79c57b92cdd90853002980609af516d14c4f9c
and the bug report "large value for HugePages_Rsvd".

optimization of sequential i/o operations on large file sizes

Compiler: Microsoft C++ 2005. Hardware: AMD 64-bit (16 GB).

Sequential, read-only access from an 18 GB file is committed with the following timing, file access, and file structure characteristics:
18,184,359,164 (file length)
11,240,476,672 (NTFS compressed file length)
Time File Method Disk
14:33? compressed fstream fixed disk
14:06 normal fstream fixed disk
12:22 normal winapi fixed disk
11:47 compressed winapi fixed disk
11:29 compressed fstream ram disk
10:37 compressed winapi ram disk
7:18 compressed 7z stored decompression to ntfs 12gb ram disk
6:37 normal copy to same volume fixed disk
The fstream constructor and access:
#define BUFFERSIZE 524288
unsigned int mbytes = BUFFERSIZE;
char * databuffer0; databuffer0 = (char*) malloc (mbytes);
datafile.open("drv:/file.ext", ios::in | ios::binary );
datafile.read (databuffer0, mbytes);
The winapi constructor and access:
#define BUFFERSIZE 524288
unsigned int mbytes = BUFFERSIZE;
const TCHAR* const filex = _T("drv:/file.ext");
char ReadBuffer[BUFFERSIZE] = {0};
hFile = CreateFile(filex, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if( FALSE == ReadFile(hFile, ReadBuffer, BUFFERSIZE-1, &dwBytesRead, NULL))
{ ...
For the fstream method, buffer sizes up to 16 MB do not decrease processing time. All buffer sizes beyond 0.5 MB fail for the winapi method. What methods would optimize this implementation versus processing time?
Did you try memory-mapping the file? In my test this was always the fastest way to read large files.
Update: Here's an old, but still accurate description of memory mapped files:
http://msdn.microsoft.com/en-us/library/ms810613.aspx
Try this.
hf = CreateFile(..... FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED ...)
Then the reading loop. Minor details omitted as typing on iPad...
int bufsize = 4*1024*1024;
CEvent e1;
CEvent e2;
CEvent e3;
CEvent e4;
unsigned char* pbuffer1 = (unsigned char*)malloc(bufsize);
unsigned char* pbuffer2 = (unsigned char*)malloc(bufsize);
unsigned char* pbuffer3 = (unsigned char*)malloc(bufsize);
unsigned char* pbuffer4 = (unsigned char*)malloc(bufsize);
DWORD CurOffset = 0;
do {
    OVERLAPPED r1;
    memset(&r1, 0, sizeof(OVERLAPPED));
    r1.Offset = CurOffset;
    CurOffset += bufsize;
    r1.hEvent = e1;
    // lpNumberOfBytesRead must be NULL for overlapped reads
    if (!ReadFile(hf, pbuffer1, bufsize, NULL, &r1)) {
        // check for ERROR_IO_PENDING AND ERROR_HANDLE_EOF (important)
    }
    OVERLAPPED r2;
    memset(&r2, 0, sizeof(OVERLAPPED));
    r2.Offset = CurOffset;
    CurOffset += bufsize;
    r2.hEvent = e2;
    if (!ReadFile(hf, pbuffer2, bufsize, NULL, &r2)) {
        // check for ERROR_IO_PENDING AND ERROR_HANDLE_EOF (important)
    }
    OVERLAPPED r3;
    memset(&r3, 0, sizeof(OVERLAPPED));
    r3.Offset = CurOffset;
    CurOffset += bufsize;
    r3.hEvent = e3;
    if (!ReadFile(hf, pbuffer3, bufsize, NULL, &r3)) {
        // check for ERROR_IO_PENDING AND ERROR_HANDLE_EOF (important)
    }
    OVERLAPPED r4;
    memset(&r4, 0, sizeof(OVERLAPPED));
    r4.Offset = CurOffset;
    CurOffset += bufsize;
    r4.hEvent = e4;
    if (!ReadFile(hf, pbuffer4, bufsize, NULL, &r4)) {
        // check for ERROR_IO_PENDING AND ERROR_HANDLE_EOF (important)
    }
    // wait for events to indicate data present
    // send data to consuming threads
    // allocate new buffer
} while ( not eof, etc )
The above is the bones of what you need. We use this and achieve high I/O throughput rates, but you will perhaps need to refine it slightly for ultimate performance. We found 4 outstanding I/Os was best for our use, but this will vary by platform. Reading less than 1 MB per I/O hurt performance. Once a buffer has been read, don't try to consume it in the reading loop; post it to another thread and allocate another buffer (but get buffers from a reuse queue, don't keep calling malloc). The overall intent of the above is to keep 4 outstanding I/Os open to the disk; as soon as you don't have this, overall performance will drop.
Also, this works best on a disk that is only Reading your file. If you start reading/writing different files on the same disk at same time, performance drops quickly, unless you have SSD disks!
Not sure why your ReadFile is failing for 0.5 MB buffers; I just double-checked and our live prod code uses 4 MB buffers.
