I've been working on image processing with OpenCV 2.2.0 for a year.
I get a memory allocation error ONLY if I try to allocate a >2GB IplImage, whereas the same allocation with CvMat works. I can allocate whatever I want using CvMat; I have even tried >10 GB.
Both OpenCV and this simple application were compiled as 64-bit, and I'm sure the application runs in 64-bit mode, as I can see from the Task Manager. The OS (Windows 7) is 64-bit too.
#include <opencv2/core/core_c.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char* argv[])
{
    printf("trying to allocate >2GB matrix...\n");
    CvMat *huge_matrix = cvCreateMat(40000, 30000, CV_16UC1);  // works
    cvSet(huge_matrix, cvScalar(5));
    printf("...done!\n\n");
    system("PAUSE");

    printf("trying to allocate >2GB image...\n");
    IplImage *huge_img = cvCreateImage(cvSize(40000, 30000), IPL_DEPTH_16U, 1);  // fails
    cvSet(huge_img, cvScalar(5));
    printf("...done!\n\n");
    system("PAUSE");

    cvReleaseMat(&huge_matrix);
    cvReleaseImage(&huge_img);
    return 0;
}
The error message is "Insufficient memory: in unknown function...". Could it be a bug?
The IplImage structure does not support images bigger than 2 GB because it stores the total image size in an int field. Even if you allocate an IplImage bigger than 2 GB with some hack, other functions will not be able to process it correctly. OpenCV inherited the IplImage structure from the Intel Image Processing Library, so there is no chance that the format will be changed.
You should use the newer structures (CvMat in the C interface or cv::Mat in the C++ interface) to work with huge images.
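For example, a minimal sketch with the C++ interface (my example, not from the question) that allocates the same image on a 64-bit build:

#include <opencv2/core/core.hpp>
#include <iostream>

int main()
{
    // 40000 x 30000 x 16-bit = ~2.4 GB: too big for IplImage's int-sized
    // imageSize field, but fine for cv::Mat, whose total size is size_t-based.
    cv::Mat huge(40000, 30000, CV_16UC1, cv::Scalar(5));
    std::cout << "allocated " << huge.total() * huge.elemSize() << " bytes" << std::endl;
    return 0;
}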
I'm trying to use the VK_EXT_external_memory_host extension (https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VK_EXT_external_memory_host.html). I'm not sure what the difference is between vk::ExternalMemoryHandleTypeFlagBits::eHostAllocationEXT and eHostMappedForeignMemoryEXT, but I've been unable to get either to work. (I'm using VulkanHpp.)
void* data_ptr = getTorchDataPtr();   // pointer to the tensor's storage
uint32_t MEMORY_TYPE_INDEX;           // I've tried every value (see below)
vk::DeviceSize SIZE_BYTES;            // size of the tensor's storage in bytes

auto EXTERNAL_MEMORY_TYPE = vk::ExternalMemoryHandleTypeFlagBits::eHostAllocationEXT;
// or vk::ExternalMemoryHandleTypeFlagBits::eHostMappedForeignMemoryEXT;

vk::MemoryAllocateInfo memoryAllocateInfo(SIZE_BYTES, MEMORY_TYPE_INDEX);
vk::ImportMemoryHostPointerInfoEXT importMemoryHostPointerInfoEXT(
    EXTERNAL_MEMORY_TYPE,
    data_ptr);
memoryAllocateInfo.pNext = &importMemoryHostPointerInfoEXT;

vk::raii::DeviceMemory deviceMemory(device, memoryAllocateInfo);
I'm getting Result::eErrorOutOfDeviceMemory when the DeviceMemory constructor calls vkAllocateMemory if EXTERNAL_MEMORY_TYPE = eHostAllocationEXT, and all zeros in the memory if EXTERNAL_MEMORY_TYPE = eHostMappedForeignMemoryEXT (I've checked that the py/libtorch tensor I'm importing is non-zero, and that my code successfully copies and reads back a different buffer).
All values of MEMORY_TYPE_INDEX produce the same behaviour (except when MEMORY_TYPE_INDEX overflows).
The set bits of the bitmask returned by getMemoryHostPointerPropertiesEXT are supposed to give the valid values for MEMORY_TYPE_INDEX.
auto pointerProperties = device.getMemoryHostPointerPropertiesEXT(
    EXTERNAL_MEMORY_TYPE,
    data_ptr);
std::cout << "memoryTypeBits " << std::bitset<32>(pointerProperties.memoryTypeBits) << std::endl;
But if EXTERNAL_MEMORY_TYPE = eHostMappedForeignMemoryEXT, then vkGetMemoryHostPointerPropertiesEXT returns Result::eErrorInitializationFailed, and if EXTERNAL_MEMORY_TYPE = eHostAllocationEXT, then the 8th and 9th bits are set. This is the same regardless of whether data_ptr is a CUDA pointer (0x7ffecf400000) or a CPU pointer (0x2be7c80), so I suspect something has gone wrong.
I'm also unable to enable the extension VK_KHR_external_memory_capabilities, which is required by VK_KHR_external_memory, which in turn is required by the extension we are using, VK_EXT_external_memory_host. I'm using Vulkan version 1.2.162.0.
The eErrorOutOfDeviceMemory is strange, as we are not supposed to be allocating any device memory; I'd be glad if someone could speculate about this.
I believe that host memory is CPU memory, thus:
vk::ExternalMemoryHandleTypeFlagBits::eHostAllocationEXT won't work because the pointer is to device (GPU) memory.
vk::ExternalMemoryHandleTypeFlagBits::eHostMappedForeignMemoryEXT won't work because the memory is not mapped by the host (CPU).
Is there any way to import device-local memory into Vulkan? Does it have to be host-mapped?
Probably not; see https://stackoverflow.com/a/54801938/11998382.
I think the best option, for me, is to map some Vulkan host-visible memory and copy the PyTorch CPU tensor across. The same data ends up being uploaded to the GPU twice, but that doesn't really matter, I suppose.
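Roughly what I have in mind, as a sketch only (uploadTensor is a hypothetical helper; device and physicalDevice are assumed to be the usual Vulkan-Hpp objects):

#include <cstring>
#include <vulkan/vulkan_raii.hpp>

// Hypothetical helper: allocate host-visible/host-coherent memory, map it,
// and copy the CPU-side tensor bytes in. A staging buffer would then be
// bound to this memory and copied into a device-local buffer as usual.
vk::raii::DeviceMemory uploadTensor(vk::raii::Device const& device,
                                    vk::PhysicalDevice physicalDevice,
                                    void const* cpuTensorData,
                                    vk::DeviceSize sizeBytes)
{
    // Pick any memory type that is HOST_VISIBLE | HOST_COHERENT
    // (the spec guarantees at least one exists).
    auto memProps = physicalDevice.getMemoryProperties();
    uint32_t typeIndex = 0;
    for (; typeIndex < memProps.memoryTypeCount; ++typeIndex) {
        auto flags = memProps.memoryTypes[typeIndex].propertyFlags;
        if ((flags & vk::MemoryPropertyFlagBits::eHostVisible) &&
            (flags & vk::MemoryPropertyFlagBits::eHostCoherent))
            break;
    }

    vk::MemoryAllocateInfo allocInfo(sizeBytes, typeIndex);
    vk::raii::DeviceMemory memory(device, allocInfo);

    // Map, copy, unmap -- coherent memory needs no explicit flush.
    void* mapped = memory.mapMemory(0, sizeBytes);
    std::memcpy(mapped, cpuTensorData, sizeBytes);
    memory.unmapMemory();
    return memory;
}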
These two pages, "Windows - Commit Size vs Virtual Size" and "What's the difference between working set and commit size?", do an excellent job of explaining what the commit size of a program is. However, I'm looking at a program in Process Explorer (Syncthing.exe from https://syncthing.net/) and seeing something that has me curious.
According to Process Explorer, the virtual size is between 34 and 35 GB, yet my page file is only 15.5 GB in size. Therefore there must be at least 19 GB in that program that is part of the virtual address space but not yet committed.
What Win32 API could I call to determine the actual commit size of the program? Or is there a way to get this from Process Explorer, since none of the options on the Process Memory tab of the Select Columns dialog has the word "commit" in it?
You need to use NtQueryInformationProcess with the ProcessVmCounters information class.
On return you get a VM_COUNTERS structure; look in ntddk.h (from the Windows WDK) for its definition.
typedef struct _VM_COUNTERS {
SIZE_T PeakVirtualSize;
SIZE_T VirtualSize;
ULONG PageFaultCount;
SIZE_T PeakWorkingSetSize;
SIZE_T WorkingSetSize;
SIZE_T QuotaPeakPagedPoolUsage;
SIZE_T QuotaPagedPoolUsage;
SIZE_T QuotaPeakNonPagedPoolUsage;
SIZE_T QuotaNonPagedPoolUsage;
SIZE_T PagefileUsage;
SIZE_T PeakPagefileUsage;
} VM_COUNTERS;
You can also use VM_COUNTERS_EX instead of VM_COUNTERS; the kernel understands which structure you requested by checking the size of the output buffer. A typical usage example:
HANDLE hProcess;   // open with PROCESS_QUERY_LIMITED_INFORMATION access, or use GetCurrentProcess()
VM_COUNTERS_EX vmc;
if (0 <= ZwQueryInformationProcess(hProcess, ProcessVmCounters, &vmc, sizeof(vmc), 0))
{
    // vmc.PagefileUsage is the commit charge (commit size) of the process, in bytes.
}
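If you call this from normal user mode without linking against ntdll.lib, here is a fuller sketch (my example, not part of the original answer: it resolves the export from ntdll.dll at run time, reuses the VM_COUNTERS definition above, and assumes ProcessVmCounters is information class 3):

#include <windows.h>
#include <stdio.h>

// The VM_COUNTERS structure shown above (copied from ntddk.h) must be declared here.

typedef LONG (NTAPI *PFN_NtQueryInformationProcess)(HANDLE, ULONG, PVOID, ULONG, PULONG);

int main()
{
    VM_COUNTERS vmc = {0};
    PFN_NtQueryInformationProcess NtQIP = (PFN_NtQueryInformationProcess)GetProcAddress(
        GetModuleHandleW(L"ntdll.dll"), "NtQueryInformationProcess");

    if (NtQIP && 0 <= NtQIP(GetCurrentProcess(), 3 /* ProcessVmCounters */,
                            &vmc, sizeof(vmc), NULL))
    {
        // PagefileUsage is the commit charge of the current process, in bytes.
        printf("commit size: %llu bytes\n", (unsigned long long)vmc.PagefileUsage);
    }
    return 0;
}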
I'm a newbie in OpenCL programming.
My very first program is giving me a hard time. I wanted to query the device name and vendor name of every device on each platform. My system has two platforms: the first is an AMD platform and the second is the NVIDIA CUDA platform. I've written the following code to get the desired info.
#define __CL_ENABLE_EXCEPTIONS
#include <CL/cl.hpp>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// printErrorString() is my own helper that turns a cl_int error code into text.

int main(int argc, char **argv) {
    try {
        vector<cl::Platform> platforms;
        cl::Platform::get(&platforms);

        cl_context_properties properties[] =
            {CL_CONTEXT_PLATFORM, (cl_context_properties)(platforms[0])(), 0};
        cl::Context context(CL_DEVICE_TYPE_ALL, properties);

        vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();
        string dName(devices[0].getInfo<CL_DEVICE_NAME>());
        string vendor(devices[0].getInfo<CL_DEVICE_VENDOR>());
        cout << "\tDevice Name: " << dName << endl;
        cout << "\tDevice Vendor: " << vendor << endl;
    } catch (cl::Error err) {
        cerr << err.what() << " error: " << printErrorString(err.err()) << endl;
        return 0;
    }
}
When I change the platform index from 0 to 1 in
cl_context_properties properties[] = {CL_CONTEXT_PLATFORM, (cl_context_properties)(platforms[0])(), 0};
my program crashes with 'Segmentation fault'.
I really appreciate your help.
Thanks!
I suspect that you are using the cl.hpp header file from the AMD APP SDK? If that is the case then the problem is that the header file calls an OpenCL 1.2 function (can't remember which one) that is supplied by the AMD devices in your system but not by the Nvidia GPU. Your Nvidia GPU only supports OpenCL 1.1. The best solution I know is to use the header files for OpenCL 1.1 from the Khronos website.
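For what it's worth, you don't even need a context just to read names and vendors. Here is a minimal sketch (my example, using only OpenCL 1.1 calls with the Khronos cl.hpp) that walks every platform and every device:

#include <CL/cl.hpp>   // OpenCL 1.1 C++ bindings from the Khronos registry
#include <iostream>
#include <vector>

int main()
{
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);

    for (size_t p = 0; p < platforms.size(); ++p) {
        std::cout << "Platform " << p << ": "
                  << platforms[p].getInfo<CL_PLATFORM_NAME>() << std::endl;

        // Query the devices directly from the platform instead of via a context.
        std::vector<cl::Device> devices;
        platforms[p].getDevices(CL_DEVICE_TYPE_ALL, &devices);

        for (size_t d = 0; d < devices.size(); ++d) {
            std::cout << "\tDevice Name: "
                      << devices[d].getInfo<CL_DEVICE_NAME>() << std::endl;
            std::cout << "\tDevice Vendor: "
                      << devices[d].getInfo<CL_DEVICE_VENDOR>() << std::endl;
        }
    }
    return 0;
}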
I am developing an OS in C (and some assembly, of course), and now I want to allow it to load and run external programs placed in the RAM disk. I have assembled a test program as raw machine code with nasm using '-f bin'. Everything else I found on the subject is about loading code while running Windows or Linux. I load the program into memory using the following code:
#define BIN_ADDR 0xFF000

int run_bin(char *file) // Too many hacks at the moment
{
    u32int size = 0;
    char *bin = open_file(file, &size);
    printf("Loaded [%d] bytes of [%s] into [%X]\n", size, file, bin);

    char *reloc = (char *)BIN_ADDR; // no malloc because of the org statement in the prog
    memset(reloc, 0, size);
    memcpy(reloc, bin, size);

    jmp_to_bin();
    return 0;
}
and the code to jump to it:
[global jmp_to_bin]
jmp_to_bin:
jmp [bin_loc] ;also tried a plain jump
bin_loc dd 0xFF000
This caused a GPF when I ran it. I could give you the registers at the GPF and/or a screenshot if needed.
Code for my OS is at https://github.com/farlepet/retro-os
Any help would be greatly appreciated.
You use identity mapping and a flat memory space, so address 0xFF000 falls inside the BIOS ROM range (0xF0000-0xFFFFF); no wonder you can't copy anything there. Better change that address ;)
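For example (assuming the region at 1 MiB is free in your memory map and identity-mapped, which you'd want to verify first):

/* Hypothetical fix: place the flat binary above the legacy video/ROM area
 * (0xA0000-0xFFFFF is reserved), e.g. at 1 MiB. The `org` in the test
 * program and the bin_loc constant in jmp_to_bin must match the new address. */
#define BIN_ADDR 0x100000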
My program has a custom allocator which gets memory from the OS using mmap(MAP_ANON | MAP_PRIVATE). When it no longer needs memory, the allocator calls either munmap or madvise(MADV_FREE). MADV_FREE keeps the mapping around, but tells the OS that it can throw away the physical pages associated with the mapping.
Calling MADV_FREE on pages you're going to need again eventually is much faster than calling munmap and later calling mmap again.
This almost works perfectly for me. The only problem is that, on MacOS, MADV_FREE is very lazy about getting rid of the pages I've asked it to free. In fact, it only gets rid of them when there's memory pressure from another application. Until it gets rid of the pages I've freed, MacOS reports that my program is still using that memory; in the Activity Monitor, its "Real Memory" column doesn't reflect the freed memory.
This makes it difficult for me to measure how much memory my program is actually using. (This difficulty in measuring RSS is keeping us from landing the custom allocator on 10.5.)
I could allocate a whole bunch of memory to force the OS to free up these pages, but in addition to taking a long time, that could have other side-effects, such as causing parts of my program to be paged out to disk.
On a lark, I tried the purge command, but that has no effect.
How can I force MacOS to clean out these MADV_FREE'd pages? Or, how can I ask MacOS how many MADV_FREE'd pages my process has in memory?
Here's a test program, if it helps. The Activity Monitor's "Real Memory" column shows 512MB after the program goes to sleep. On my Linux box, top shows 256MB of RSS, as desired.
#include <sys/mman.h>
#include <stdio.h>
#include <unistd.h>
#define SIZE (512 * 1024 * 1024)
// We use MADV_FREE on Mac and MADV_DONTNEED on Linux.
#ifndef MADV_FREE
#define MADV_FREE MADV_DONTNEED
#endif
int main()
{
    char *x = mmap(0, SIZE, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);

    // Touch each page we mmap'ed so it gets a physical page.
    int i;
    for (i = 0; i < SIZE; i += 1024) {
        x[i] = i;
    }

    madvise(x, SIZE / 2, MADV_FREE);
    fprintf(stderr, "Sleeping. Now check my RSS. Hopefully it's %dMB.\n", SIZE / (2 * 1024 * 1024));
    sleep(1024);
    return 0;
}
// Forcing the issue: revoking all access and then restoring it makes the
// kernel discard the physical pages backing the range.
mprotect(addr, length, PROT_NONE);
mprotect(addr, length, PROT_READ | PROT_WRITE);

Note that, as you say, madvise is lazier, and that is probably better for performance (just in case anyone is tempted to use this trick for performance rather than for measurement).
Use MADV_FREE_REUSABLE on macOS. According to Apple's magazine_malloc implementation:
On OS X we use MADV_FREE_REUSABLE, which signals the kernel to remove the given pages from the memory statistics for our process. However, on returning that memory to use we have to signal that it has been reused.
https://opensource.apple.com/source/libmalloc/libmalloc-53.1.1/src/magazine_malloc.c.auto.html
Chromium, for example, also uses it:
MADV_FREE_REUSABLE is similar to MADV_FREE, but also marks the pages with the reusable bit, which allows both Activity Monitor and memory-infra to correctly track the pages.
https://github.com/chromium/chromium/blob/master/base/memory/discardable_shared_memory.cc#L377
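In allocator terms that means something like this (a minimal sketch, assuming macOS; release_pages/reuse_pages are just illustrative names):

#include <stddef.h>
#include <sys/mman.h>

/* When the allocator gives a range back: the pages drop out of the
   process's memory statistics immediately. */
static void release_pages(void *addr, size_t length)
{
    madvise(addr, length, MADV_FREE_REUSABLE);
}

/* Before the allocator hands the range out again: signal that the pages
   are back in use, as the magazine_malloc comment describes. */
static void reuse_pages(void *addr, size_t length)
{
    madvise(addr, length, MADV_FREE_REUSE);
}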
I've looked and looked, and I don't think this is possible. :\
We're solving the problem by adding code to the allocator which explicitly decommits MADV_FREE'd pages when we ask it to.
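One way to do that explicit decommit (my sketch of the general idea, not the actual allocator code) is to re-map the range in place, which drops the physical pages while keeping the address range reserved:

#include <stddef.h>
#include <sys/mman.h>

static void decommit(void *addr, size_t length)
{
    /* MAP_FIXED atomically replaces the old mapping; the fresh anonymous
       mapping has no physical pages until it is touched again. */
    mmap(addr, length, PROT_READ | PROT_WRITE,
         MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
}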