Running GridSearchCV() in parallel - Windows

What are the computer hardware requirements to run a GridSearchCV() or RandomizedSearchCV() in parallel ( having either n_jobs > 1 or n_jobs == -1 )?
Do all of today's computers support that?

Q: Do all of today's computers support that?
A: Yes, they certainly do.
The problem lies with process-instantiation costs and with the indirect effects of replicating the whole Python interpreter. For further details, refer to the explanation here.
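Any multi-core machine can run the search in parallel; the catch is the overhead. A minimal standard-library sketch ( plain multiprocessing, not scikit-learn itself ) of the process-instantiation cost mentioned above:

```python
# Sketch: n_jobs > 1 works by spawning extra Python processes, and each
# spawn pays a setup cost before any useful work starts.
import multiprocessing as mp
import time

def square(x):
    return x * x

if __name__ == "__main__":
    data = list(range(1000))

    t0 = time.perf_counter()
    serial = [square(x) for x in data]      # no process overhead
    t_serial = time.perf_counter() - t0

    t0 = time.perf_counter()
    with mp.Pool(processes=2) as pool:      # pays process start-up cost
        parallel = pool.map(square, data)
    t_parallel = time.perf_counter() - t0

    assert serial == parallel
    # For such a tiny task the parallel run is usually SLOWER: the
    # interpreter replicas cost more than the work they carry out.
    print(f"serial {t_serial:.4f}s vs 2 processes {t_parallel:.4f}s")
```

The same trade-off applies to GridSearchCV(): parallelism only pays off once each fit is expensive enough to amortise the per-process setup.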

Related

Creating many Sockets in ZMQ - too many files error

I am trying to create sockets with inproc:// transport class from the same context in C.
I can create 2036 sockets; when I try to create more, zmq_socket() returns NULL and zmq_errno() says 24, 'Too many open files'.
How can I create more than 2036 sockets? Especially as inproc forces me to use only one context.
There are several things I don't understand:
- the sockets eventually use the inproc transport, so why do they take up file descriptors?
- Increasing ZMQ_MAX_SOCKETS does not help, the system file limit appears to be the limiting factor
- I am unable to increase the file limit with ulimit on my Mac, no workaround helped.
// the code is actually in cython and can be found here:
https://github.com/DavoudTaghawiNejad/ABsinthCE
Use zmq_ctx_set():
zmq_ctx_set (context, ZMQ_MAX_SOCKETS, 256);
You can change these using sysctl ( tried on Yosemite and El Capitan ), but the problem is knowing what to change. Here is a post on this topic: Increasing the maximum number of tcp/ip connections in linux
That's on Linux, and the Mac is based on BSD 4.x, but man pages for sysctl on BSD are available online.
Note: sysctl is a private interface on iOS.
The solution is multi-layered:
inproc:// does not force you to have a common Context() instance, but it is handy to have one, as the signalling / messaging goes without any data-transfers, just by zero-copy pointer manipulation on in-RAM blocks of memory, which is extremely fast.
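A tiny sketch of that point using the Python pyzmq bindings ( assumed installed; `pip install pyzmq` ): two sockets from one Context() exchange a message over inproc:// with no network stack involved.

```python
# Sketch assuming the pyzmq bindings are installed.
import zmq

ctx = zmq.Context.instance()

# For inproc:// the bind() side must exist before connect() ( pre-4.2 libzmq ).
server = ctx.socket(zmq.PAIR)
server.bind("inproc://demo")

client = ctx.socket(zmq.PAIR)
client.connect("inproc://demo")

client.send(b"ping")          # no data-pump: an in-memory hand-off
msg = server.recv()

server.close()
client.close()
ctx.term()
```

Note that even inproc:// sockets still consume file descriptors for internal signalling, which is exactly why the per-process FD limits below matter.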
I started to assemble ZeroMQ-related facts about having some 70.000 ~ 200.000 file-descriptors available for "sockets", as supported by O/S kernel settings, but your published aims are higher. Much higher.
Given that your git-published multi-agent ABCE Project paper refers to nanosecond shaving and an HPC-domain grade solution to handle ( cit. / emphasis added: )
the whopping number of 1.073.545.225, many more agents than fit into the memory of even the most sophisticated supercomputer, some small hundreds of thousands of file-descriptors are not worth spending much time on.
Your Project faces multiple troubles at the same time.
Let's peel the problem layers off, step by step:
File Descriptors (FD) -- Linux O/S level -- System-wide Limits:
To raise the system-wide limit, edit the /etc/sysctl.conf file:
# vi /etc/sysctl.conf
Append a config directive as follows:
fs.file-max = 100000
Save and close the file.
Users need to log out and log back in again for the changes to take effect, or just type the following command:
# sysctl -p
Verify your settings with command:
# cat /proc/sys/fs/file-max
( Max ) User-specific File Descriptors (FD) Limits:
Each user has additionally a set of ( soft-limit, hard-limit ):
# su - ABsinthCE
$ ulimit -Hn
$ ulimit -Sn
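The same pair of limits can be read from inside a running Python process via the standard-library resource module ( Unix only ); a small sketch:

```python
# Standard-library sketch ( Unix only ): the per-process counterpart of
# ulimit -Sn / ulimit -Hn.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit ( ulimit -Sn ): {soft}")
print(f"hard limit ( ulimit -Hn ): {hard}")

# An unprivileged process may move its own soft limit anywhere up to the
# hard limit; re-applying the current pair is a safe no-op illustration:
resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```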
However, you can restrict your ABsinthCE user ( or any other ) to specific limits by editing the /etc/security/limits.conf file; enter:
# vi /etc/security/limits.conf
Where you set ABsinthCE user the respective soft- and hard-limit as needed:
ABsinthCE soft nofile 123456
ABsinthCE hard nofile 234567
All that is not for free: each file descriptor takes up some kernel memory, so at some point you may, and will, exhaust it. A few hundred thousand file descriptors are no trouble for server deployments, where event-based ( epoll on Linux ) server architectures are used. But simply forget about trying to grow this anywhere near the said 1.073.545.225 level.
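That exhaustion is easy to reproduce with a few lines of standard-library Python; a sketch that lowers the soft limit for the current process only and opens descriptors until errno 24 ( EMFILE ) appears, just as zmq_socket() reported above:

```python
# Standard-library sketch ( Unix only ): provoke 'Too many open files'.
import os
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (256, hard))   # tiny soft limit

opened, err = [], None
try:
    while True:
        opened.append(os.open("/dev/null", os.O_RDONLY))
except OSError as e:
    err = e.errno                       # EMFILE, errno 24
finally:
    for fd in opened:
        os.close(fd)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))   # restore

print(f"stopped after {len(opened)} extra descriptors, errno {err}")
```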
Today, one can have a private HPC machine ( not a Cloud illusion ) with ~ 50-500 TB RAM.
But still, the multi-agent Project application architecture ought to be re-defined so as not to fail on extreme resource allocations ( just due to a forgiving syntax simplicity ).
Professional multi-agent simulators are, precisely because of their extreme scaling, very, VERY CONSERVATIVE on per-Agent instance resource-locking.
So the best results are to be expected ( both performance-wise and latency-wise ) when using direct memory-mapped operations. The ZeroMQ inproc:// transport-class is fine and does not require the Context() instance to allocate an IO-thread ( as there is no data-pump at all if using just the inproc:// transport-class ), which is very efficient for a fast prototyping phase. The same approach will become risky when growing the scales much higher, towards the levels expected in production.
Latency-shaving and accelerated-time simulator throughput scaling are the next set of targets, both for raising the static scales of the multi-agent simulations and for increasing the simulator performance.
For serious nanosecond hunting, follow the excellent insights on HPC from Bloomberg's guru, John Lakos.
Either pre-allocate ( a common best practice in the RTOS domain ) and do not allocate at all at runtime, or follow John's fabulous testing-supported insights presented at ACCU 2017.

How to print kernel call stack in Mac OS X

In Linux, I can use echo t > /proc/sysrq-trigger to dump the kernel call stack of all threads in system.
Is there any method in Mac OS X for the same purpose? Or any method to dump the kernel stack of one process?
Short answer: procexp 0 threads (as root) will do the trick, where procexp is "Process Explorer" from http://newosxbook.com/tools/procexp.html .
Slightly Longer answer:
- DTrace is overkill and needs SIP disablement
- stackshot is deprecated since its underlying syscall (#365) was removed
- A replacement, stack_snapshot_with_config (#491), can be used programmatically as well (this is what drives the above tool)
The answer is probably dtrace. I know Instruments.app (or iprofiler) can do probe-based profiling, so it takes periodic stack traces (user or kernel; your choice). As far as I'm aware this is all based on dtrace, although I don't know it well enough to tell you a way to take a one-off trace.
Hmm... I haven't coded on Mac OS X for several years, but a tool named 'stackshot' can help you do this. Try googling it for the usage. :-)
From http://www.brendangregg.com/DTrace/DTrace-cheatsheet.pdf:
sudo dtrace -n 'fbt:::entry { stack(10); ustack(5) }'
prints 10 kernel frames, 5 userland frames

getting system time in Vxworks

Is there any way to get the system time in VxWorks besides tickGet() and tickAnnounce()? I want to measure the time between the task switches of a specified task, but I think the precision of tickGet() is not good enough, because the two tickGet() values at the beginning and the end of the taskSwitchHookAdd function are always the same!
If you are looking to try and time task switches, I would assume you need a timer at least at the microsecond (us) level.
Usually, timers/clocks this fine-grained are only provided by the platform you are running on. If you are working on an embedded system, you can try reading through the manuals for your board support package (if there is one) to see if there are any functions provided to access the various timers on the board.
A more low level solution would be to figure out the processor that is running on your system and then write some simple assembly code to poll the processor's internal timebase register (TBR). This might require a bit of research on the processor you are running on, but could be easily done.
If you are running on a PPC based processor, you can use the code below to read the TBR:
loop: mftbu rx #load most significant half from TBU
mftbl ry #load least significant half from TBL
mftbu rz #load from TBU again
cmpw rz,rx #see if 'old' = 'new'
bne loop #repeat if two values read from TBU are unequal
On an x86 based processor, you might consider using the RDTSC assembly instruction to read the Time Stamp Counter (TSC). On vxWorks, pentiumALib has some library functions (pentiumTscGet64() and pentiumTscGet32()) that will make reading the TSC easier using C.
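The resolution problem itself is easy to demonstrate from any environment that has both a coarse and a fine clock. A host-side Python sketch ( illustration only, not VxWorks code; tick_get() is a hypothetical stand-in emulating a 60 Hz tickGet() ):

```python
# Illustration only: emulate a 60 Hz system tick next to a nanosecond
# counter, to show why two adjacent reads of a coarse clock match.
import time

def tick_get(hz=60):
    # hypothetical stand-in for VxWorks tickGet() at a 60 Hz tick rate
    return time.monotonic_ns() * hz // 1_000_000_000

t1, t2 = tick_get(), tick_get()
n1, n2 = time.perf_counter_ns(), time.perf_counter_ns()

print(f"tick reads: {t1} vs {t2}  ( almost always equal )")
print(f"ns reads:   {n1} vs {n2}  ( almost always different )")
```

Back-to-back reads of the 60 Hz tick nearly always return the same value, exactly like the two tickGet() calls in the question, while the nanosecond counter moves between reads; that is the gap the TBR/TSC approaches above close.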
source: http://www-inteng.fnal.gov/Integrated_Eng/GoodwinDocs/pdf/Sys%20docs/PowerPC/PowerPC%20Elapsed%20Time.pdf
Good luck!
It depends on what platform you are on, but if it is x86 then you can use:
pentiumTscGet64();

ATXmega change fuse

I have an ATxmega A3BU processor and I use CrossPack on my macOS. I would like to use my old USBasp programmer, which is "configured" to program the CPU through the PDI interface - that is not a problem.
The problem is that I do not know how to setup the FUSES on this ATXmega.
For an ordinary CPU like the ATmega8, the sequence in the Makefile was simple. Just use this: FUSES = -U hfuse:w:0xd9:m -U lfuse:w:0x24:m
But the XMEGA has five FUSEBYTES and I have a problem with them... so the simple question is "how to change e.g. JTAGEN from 0 to 1"? It is located in the FUSEBYTE4 as bit 0. How do I tell CrossPack (avrdude) to change this or another bit, e.g. in FUSEBYTE0?
Thank you...
Maybe this is related to Robotics StackExchange, but I will try to answer here.
If it's possible for you to switch to Windows, changing the fuse bits is done very easily with CodeVisionAVR: it takes just a few clicks and avoids the headaches of these terminal commands.
Please refer to the datasheet for xmega a3bu at : http://www.atmel.com/Images/Atmel-8331-8-and-16-bit-AVR-Microcontroller-XMEGA-AU_Manual.pdf
The names of the fuse bytes are FUSEBYTE0, FUSEBYTE1, ... FUSEBYTE5; there is no FUSEBYTE3. Have you tried
-U fusebyte0:w:0xd9:m -U fusebyte1:w:0x24:m -U fusebyte2:w:0x24:m and so on? You could give it a shot, but exercise precaution while calculating the fuse bits and the lock bits.
I know this is probably too late for OP, but for others (like me) who come across this question, you can also add
FUSES =
{
    0x00, // fuse byte 0: sets the JTAG user ID
    0xAA, // fuse byte 1
    0x9D, // fuse byte 2
    0x00, // unused (there is no fuse byte 3)
    0xDE, // fuse byte 4
    0x1E  // fuse byte 5
};
to the top of your main.c file and the compiler / programmer will take care of flashing them.
Tested on xmegaA4.

State of Registers After Bootup

I'm working on a boot loader on an x86 machine.
When the BIOS copies the contents of the MBR to 0x7c00 and jumps to that address, is there a standard meaning to the contents of the registers? Do the registers have standard values?
I know that the segment registers are typically set to 0, but will sometimes be 0x7c0. What about the other hardware registers?
This early execution environment is highly implementation defined, meaning the implementation of your particular BIOS. Never make any assumptions on the contents of registers. They might be initialized to 0, but they might contain a random value just as well.
From the OSDev Wiki, which is where I get information when I'm playing with my toy OSes.
Best option would be to assume nothing. If they have meaning, you will find that from the other side when you need the information they provide.
Undefined, I believe? I think it depends on the mainboard and CPU, and should be treated as random for your own good.
Safest bet is to assume undefined.
Always assume undefined, otherwise you'll hit bad problems if you ever try to port architectures.
There is nothing quite like the pain of porting code that assumes everything uninitialized will be set to zero.
The only thing that I know to be well defined is the processor state immediately after reset.
For the record, you can find that in Intel's Software Developer's Manual Vol. 3, chapter 8, "Processor Management and Initialization", in the table titled "IA-32 Processor States Following Power-up, Reset, or INIT".
You can always initialize them yourself to start with a known state.
