UPC Runtime error: out-of-range size for UPC_SHARED_HEAP_SIZE - upc

I tried running code (xcorupc_alaska) compiled with Berkeley UPC:
upcrun -n 3 -shared-heap=18GB xcorupc_alaska inputpgas0.txt
Total memory on my computer is 64 GB and I want to allot 18 GB to each of 3 processes (it is a quad-core processor), so it should be doable (usage 18x3 = 54 GB). However, I get this error:
UPC Runtime error: out-of-range size for UPC_SHARED_HEAP_SIZE: 18 GB
NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in the environment to generate a backtrace.
UPC Runtime error: out-of-range size for UPC_SHARED_HEAP_SIZE: 18 GB
NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in the environment to generate a backtrace.
UPC Runtime error: out-of-range size for UPC_SHARED_HEAP_SIZE: 18 GB
NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in the environment to generate a backtrace.
Any idea what is causing this error and how to fix it? Thanks for your help.
EDIT: Even on a 64-bit system, the default maximum shared memory per thread is 16 GB. Following the information in INSTALL.TXT, I recompiled with the flag --with-sptr-packed-bits=20,9,35. This limits the maximum number of possible threads to 2^9 = 512, but allows 2^35 bytes = 32 GB of shared memory per thread. This solved my problem.
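Roughly, that reconfigure looks like the following; the build directory name is just an example, and any other configure arguments should match your original build (this is a sketch, not the exact commands):
$ mkdir bupc-build-bigheap && cd bupc-build-bigheap   # fresh, empty build directory
$ $(srcdir)/configure --with-sptr-packed-bits=20,9,35
$ make && make install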

The most common cause of this error is that your build of Berkeley UPC is configured to target 32-bit application executables, which can't reliably handle more than about 2 GB of shared heap per process. You can confirm this by checking the "Architecture" line in the output of this command:
upcrun -i xcorupc_alaska (or swap in the name of any BUPC executable)
Given your hardware configuration I'd highly recommend rebuilding Berkeley UPC for the LP64 ABI, assuming your OS supports it (most modern OSs do).
The details for doing this depend on your translator and compiler. Assuming you are using the default online Berkeley UPC translator and a gcc-like compiler suite, you probably want a configure line something like:
$(srcdir)/configure CC='gcc -m64' CXX='g++ -m64' MPI_CC='mpicc -m64'
Make sure to run this in a fresh build directory so you start from a clean slate. Then build and install as usual (details in $(srcdir)/INSTALL.TXT).
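As a rough end-to-end sketch (the build directory name is illustrative, and you will also need to recompile the application with the rebuilt upcc before re-testing):
$ mkdir bupc-build64 && cd bupc-build64
$ $(srcdir)/configure CC='gcc -m64' CXX='g++ -m64' MPI_CC='mpicc -m64'
$ make && make install
$ upcrun -i xcorupc_alaska    # the "Architecture" line should now report a 64-bit target
$ upcrun -n 3 -shared-heap=18GB xcorupc_alaska inputpgas0.txt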

Related

Speed up embedded linux compilation process

I have an embedded Linux (OpenWrt) project for custom hardware. Any change in the kernel or an application requires recompiling the full image or the application, and recompiling is painfully slow.
To reduce this pain I bought an AMD Threadripper 3970X based workstation with 128 GB RAM and a 1 TB SSD. Benchmarks for this CPU show about 120 seconds for a Linux kernel compilation.
But my compilation times turned out to be longer than that.
Compared with my previous machine, the first full image compilation, a repeated image compilation, and a single package recompilation ($ time make package/tensorflow/compile) all got faster (the exact before/after timings were posted as screenshots); overall, compilation time was reduced about 2-7x.
During the first image compilation all necessary source code has to be downloaded from the network; I have a fast Ethernet (100 Mb/s) connection, so no time is wasted on that.
I use RAMDISK:
$ sudo mkdir /mnt/ramdisk
$ sudo mount -t tmpfs -o rw,size=64G tmpfs /mnt/ramdisk
to store all source, object and temporary files, so I believe there are no I/O losses.
make -j64 is used to compile. I see that all 64 cores are fully loaded only rarely during compilation; most of the time only a few cores are busy, sometimes even fewer (the CPU load graphs were posted as screenshots), so I can't believe that faster compilation can't be achieved. Could someone give me hints/advice on how to speed up the GCC C/C++ cross-compilation process? Some searching points me to distcc and Parallel GCC, but I have no experience with them, so I am not sure whether that is what I need; the OpenWrt manuals say almost nothing about speeding up the build process.
In Linux there is the concept of an incremental build: the first build takes time, but subsequent builds only need to rebuild the parts that changed or were added. There is no need to rebuild everything, so later builds are faster.
All the cores of the CPU will not be loaded all the time; it depends on how many tasks are currently running. Suppose your system has 8 cores but only 6 tasks are running; in that case not all cores will be fully loaded.
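If you want to experiment with the distcc route mentioned in the question, basic usage is roughly the following; the host names are placeholders, and whether OpenWrt's build system honors an overridden CC/CXX for every package is something you would need to verify:
$ export DISTCC_HOSTS="localhost/32 buildbox1/16 buildbox2/16"   # placeholder machines running distccd
$ make -j64 CC="distcc gcc" CXX="distcc g++"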

OSX ld: why does pagezero_size default to 4GB on 64b OSX?

This is an OSX linker question. I don't think OSX (BSD or Mach layers) cares how large the zero page is or indeed whether it even exists. I think this is a tools thing. But that's my opinion and that's why I'm asking.
-pagezero_size size: By default the linker creates an unreadable segment starting at address zero named __PAGEZERO. Its existence will cause a bus error if a NULL pointer is dereferenced.
This is clear; it's for trapping NULL pointers. On a 32b OSX system, the size of the segment is 4KB, which is the system page size. But on a current 64b system, the size of this segment increases to 4GB. Why doesn't it remain at the system page size of 4KB, or the architecture's maximum page size, 2MB? This means I can't use 32b absolute addressing at all.
Are there any problems with using this flag and overriding the default? Apple Store rules, ...?
(This feature is specific to the OSX ld64 linker. The feature dates at least to ld64-47.2 March 2006. Address Space Layout Randomization and 64b support start with Leopard in October 2007.)
The -pagezero_size option is a linker option, not a compiler option. So, when you use the compiler to drive linking, you need to pass it as -Wl,-pagezero_size,0x1000 (or whatever size you want). This works fine. The Wine project, to which I'm a contributor, relies on this for compatibility of its 64-bit build.
My understanding as to why the default page-zero size is 4GB for 64-bit is to catch cases where a pointer was inadvertently stored in a 32-bit variable and thus truncated. When it's eventually cast back to a pointer, it will be in the low 4GB and therefore invalid. Any attempt to dereference it will cause an access violation.
Update:
It seems that -pagezero_size is recognized as a compiler option, too, and works just fine in my testing. Either way, I get a functioning executable and otool shows a __PAGEZERO segment of the desired size.
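As a rough sketch of that check (file names are placeholders):
$ cc -o myprog main.c -Wl,-pagezero_size,0x1000
$ otool -l myprog | grep -A 3 __PAGEZERO    # vmsize should show the requested 0x1000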
What versions of the tools are you using? I'm using Xcode 8 on Sierra (10.12.6):
$ cc --version
Apple LLVM version 8.1.0 (clang-802.0.41)
...

Golang - why do compilations on similar machines result in significantly different binary file sizes?

I have a gorilla/mux based web service written in Golang.
I've observed that the exact same code produces a binary of more than 10 MB on my Windows 10 Pro machine, while it's about 7 MB on my colleague's Windows 10 Pro machine.
On yet another colleague's MacBook Pro running OS X Yosemite, the binary is just a bit over 11 MB in size.
What does this binary actually contain?!
It may be due to different architectures (the GOARCH env variable); run go env to verify. A binary compiled for 386 differs significantly in size from one compiled for amd64 (the amd64 binary is significantly bigger), but sizes should be close if the architecture is the same, even across different OSes.
Also the Go version itself matters a lot, Go 1.7 reduced the compiled binary size. See blog post Smaller Go 1.7 binaries for details.
Also, I assume it's the same in your case, but whether debug info is excluded can change the compiled binary size significantly.
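A quick way to compare the relevant settings on each machine (a sketch; the output names are placeholders, and -s/-w strip the symbol table and DWARF debug info respectively):
$ go version
$ go env GOOS GOARCH
$ go build -o service .
$ go build -ldflags "-s -w" -o service_stripped .   # same code, debug info excluded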

djgpp 2.03 gives cc1.exe "out of memory allocating" error while compiling on 32-bit Windows 7

I am using djgpp version 2.03 on 32-bit Windows 7 with 3 GB RAM. While compiling my C source code, I get the error "cc1.exe: out of memory allocating 65536 bytes after a total of 52828808 bytes". The same source code compiles correctly on a Windows XP system with the same utility (djgpp 2.03). I tried increasing the virtual memory by a few GB, but it didn't help. Please help me resolve this issue.
Thanks and best regards,
Rupesh Thakur
In general, if you want to run DOS programs on Windows, you should consider virtualization.
Yes, some DOS programs will work even without virtualization, but some won't. This seems to be an example of the latter.
Try with the (unfortunately unreleased) version 2.04 of DJGPP. Follow these instructions. This version has much better compatibility with recent versions of Windows, where "recent" means Windows 2000 and above. (I cannot believe I am saying this in 2010.)

Process sizes and differences in behaviour on 32bit vs. 64bit Windows versions

I am investigating a strange problem with my application, where the behaviour is different on 2 versions of Windows:
Windows XP (32-bit)
Windows Server 2008 (64-bit)
My findings are as follows.
Windows XP (32-bit)
When running my test scenario, the XML parser fails at a certain point during the parsing of a very large configuration file (see this question for more information).
At the time of failure, the process size is approximately 2.3GB. Note that a registry key has been set to allow the process to exceed the default maximum process size of 2GB (on 32-bit operating systems).
The symptom of the failure is a call to IXMLDOMDocument::load() failing, as described in the question linked above.
Windows Server 2008 (64-bit)
I run exactly the same test scenario in Windows Server 2008 -- the only variable is the operating system. When I look at my process under Task Manager, it has a * 32 next to it, which I am assuming means it is running in 32-bit compatibility mode.
What I am noticing is that at the point where the XML parsing fails on Windows XP, the process size on Windows Server 2008 is only about 1GB (IOW, approximately half the process size as on Windows XP).
The XML parsing does not fail on Windows Server 2008, it all works as it should.
My questions are:
Why would a 32-bit application (running in 32-bit mode) consume half the amount of memory on a 64-bit operating system? Is it really using half the memory, is it using virtual memory differently, or is it something else?
Acknowledging that my application seems to be using half the amount of memory on Windows Server 2008, does anyone have any ideas as to why the XML parsing would be failing on Windows XP? Every time I run the test case, the error accessed via IXMLDOMParseError (see this answer) is different. Because this appears to be non-deterministic, it suggests to me that I am running into a memory usage problem rather than dealing with malformed XML.
You didn't say how you observed the process. I'll assume you used Taskmgr.exe. Beware that its default view gives very misleading values in the Memory column: it shows the working set size, the amount of RAM being used by the process. That has nothing to do with the source of your problem, which is running out of virtual memory space. There is not much reason to assume that Windows 2008 would show the same value as XP; it has a significantly different memory manager.
You can see the virtual memory size as well, use View + Columns.
The reason your program doesn't bomb on a 64-bit operating system is because 32-bit processes have close to 4 gigabytes of addressable virtual memory. On a 32-bit operating system, it needs to share the address space with the operating system and gets only 2 gigabytes. More if you use the /3GB boot option.
Use the SAX parser to avoid consuming so much memory.
Not only are there differences in available memory between 32-bit and 64-bit (as discussed in previous answers), but it is the availability of contiguous memory that may be killing your app on 32-bit.
On a 32-bit machine your app's DLLs will be littering the memory landscape in the first 2GB of memory (app at 0x00400000, OS DLLs up at 0x7xxx0000, other DLLs elsewhere). Most likely the largest contiguous block you have available is about 1.1GB.
On a 64-bit machine (which gives you the 4GB address space with /LARGEADDRESSAWARE) you'll have at least one block in that 4GB space that is 2GB or more in size.
So there is your difference. If your XML parser is relying on a large blob of memory rather than many small blobs it may be that your XML parser is running out of contiguous usable space on 32 bit but is not running out of contiguous usable space on 64 bit.
If you want to visualize this on the 32 bit OS, grab a copy of VMValidator (free) and look at the Virtual view for a visualization of your memory and the Pages and Paragraphs views to see the data for each memory page/paragraph.
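If you want to check or change the large-address-aware flag mentioned above, the Visual Studio command-line tools can do it roughly like this (the executable name is a placeholder):
> dumpbin /headers MyApp.exe | findstr /i "large"
REM a binary linked with /LARGEADDRESSAWARE reports "Application can handle large (>2GB) addresses"
> editbin /LARGEADDRESSAWARE MyApp.exe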
