MPI CPU binding to specific core range - parallel-processing

I am running a desktop with 32 physical cores on Windows 10 and launch various MPI jobs. I want to avoid a scheduler such as PBS, which is overkill for a single user; instead, I am looking for a way to specify which cores each MPI task is launched on, for example:
mpiexec -n 8 [core 0 to core 7] task.1
mpiexec -n 8 [core 8 to core 15] task.2 # etc...
I have seen various methods posted that use PowerShell to set processor affinity, but these do not appear to work. There are also options in Intel MPI:
Processor topology options:
-bind-to process binding
-map-by process mapping
-membind memory binding policy
But -map-by does not seem to lock to a specific core.
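One thing I have come across (untested here, and the core ranges below are purely illustrative) is Intel MPI's I_MPI_PIN_PROCESSOR_LIST environment variable, which mpiexec can pass per job via -env, e.g.:
mpiexec -n 8 -env I_MPI_PIN_PROCESSOR_LIST 0-7 task.1    # pin the 8 ranks of task.1 to cores 0-7
mpiexec -n 8 -env I_MPI_PIN_PROCESSOR_LIST 8-15 task.2   # pin the 8 ranks of task.2 to cores 8-15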

Related

Is there a way to scale up RabbitMQ RPC?

Hi, I'm using RabbitMQ RPC to distribute tasks to workers (request/response).
I studied rpc_client.py and rpc_server.py from the RabbitMQ tutorial, and I saw that to scale up we must run another process (rpc_server).
I want to dispatch about 1k tasks per second, but it's too slow.
Can you help me solve this problem?
System spec:
RAM: 8 GB
CPU: Intel® Core™ i7-7700 CPU @ 3.60GHz × 8
To scale, you can either run multiple Python server processes or use multithreading to handle the requests. Of course, it is important to use a thread-safe RabbitMQ client library, such as AMQPStorm.
Here is another example of multithreading using the Pika library (note that Pika itself is not thread-safe).
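As a rough sketch of the multi-process route (queue name, host, and worker count are illustrative, and each process opens its own connection because Pika connections must not be shared across processes or threads):

import multiprocessing
import pika

def handle_request(body):
    # placeholder for the real work done per RPC request
    return body

def run_server():
    # each worker process gets its own connection and channel
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.queue_declare(queue='rpc_queue')
    channel.basic_qos(prefetch_count=1)

    def on_request(ch, method, props, body):
        response = handle_request(body)
        # reply on the queue named in reply_to, echoing the correlation_id
        ch.basic_publish(exchange='',
                         routing_key=props.reply_to,
                         properties=pika.BasicProperties(correlation_id=props.correlation_id),
                         body=response)
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue='rpc_queue', on_message_callback=on_request)
    channel.start_consuming()

if __name__ == '__main__':
    # one consumer process per core
    for _ in range(multiprocessing.cpu_count()):
        multiprocessing.Process(target=run_server).start()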

Does the Erlang VM create a single thread for each hardware core of the CPU?

Does the Erlang VM run a single system thread for each hardware core? If not, then what is the advantage of multiple cores?
No. BEAM (the Erlang VM) creates one scheduler thread per core (or the value set by the +S switch), one dirty CPU scheduler thread per core (the default since OTP 20; change with +SDcpu), 10 async I/O pool threads by default (change with +A), and 10 dirty I/O threads (since OTP 20; change with +SDio). You can bind scheduler threads to physical cores using the +sbt flag (use +sbt db for the default binding), and you can set the CPU topology with the +sct flag if it is detected incorrectly.
You can find this out yourself with a quick look at the documentation for erl.
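For example, a node started like this (the numbers are only illustrative) runs 4 schedulers, all online, bound to cores with the default binding, plus a 20-thread async I/O pool:
erl +S 4:4 +sbt db +A 20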

Does Mesos really treat all your resources as a single pool?

Mesos is advertised as a system that lets you program against your datacenter as if it were a single pool of resources (see the Mesos website). But is it really true that you don't need to consider the configuration of the individual machines? Using Mesos, can you request more resources for a task than are available on a single machine?
For example, if you have 10 machines each with 2 cores and 2g of RAM and 20g HD, can you really request 10 cores, 15g of RAM and 100g of disk space for a single task?
If so, how does this work? Is Mesos able to address memory across machines for you, and use other CPUs as local threads and create a single filesystem from a number of distributed nodes?
How does it accomplish this without suffering from the Fallacies of distributed computing, especially those related to network latency and transport cost?
According to this Mesos architecture, you can't aggregate resources from different slaves (agents/machines) to use them for one task.
As you can see, it is strictly a "tasks per agent" situation.
Their example says pretty much the same thing:
Let's walk through the events in the figure. Agent 1 reports to the master that it has 4 CPUs and 4 GB of memory free. The master then invokes the allocation policy module, which tells it that framework 1 should be offered all available resources. The master sends a resource offer describing what is available on agent 1 to framework 1. The framework's scheduler replies to the master with information about two tasks to run on the agent, using <2 CPUs, 1 GB RAM> for the first task, and <1 CPUs, 2 GB RAM> for the second task. Finally, the master sends the tasks to the agent, which allocates appropriate resources to the framework's executor, which in turn launches the two tasks (depicted with dotted-line borders in the figure). Because 1 CPU and 1 GB of RAM are still unallocated, the allocation module may now offer them to framework 2.
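To make the per-agent arithmetic in that walkthrough explicit, here is a tiny Python check (the numbers come straight from the quoted example); each task has to fit within a single agent's offer, and only the leftover is re-offered:

# resources advertised by agent 1 and the two tasks from the walkthrough above
offer = {"cpus": 4, "mem_gb": 4}
tasks = [{"cpus": 2, "mem_gb": 1},   # first task
         {"cpus": 1, "mem_gb": 2}]   # second task

used = {k: sum(t[k] for t in tasks) for k in offer}
left = {k: offer[k] - used[k] for k in offer}
print(left)   # {'cpus': 1, 'mem_gb': 1} -> may now be offered to framework 2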

Supervisor: Why am I able to run more than 4 processes on a 4-core machine?

I'm working on a 4-core machine and using a Vagrant box to host my application. I've configured Supervisor to spawn 5 processes on different ports, and I can easily run all the processes independently. Does this mean that each process spawned by Supervisor is not tied to an individual core?
Scheduling processes is the OS's job: it decides what to run and when to run it.
Having 4 cores means you can execute 4 different streams of instructions in parallel (4 different threads, not necessarily from the same process).
That means that if you put 4 processes on a machine with 4 cores, chances are they will all run in parallel.
If you have 5 processes, then at any given moment only 4 are running, but it will seem like they are all running in parallel because the OS performs context switches: your cores will periodically stop running one process and switch to another.
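For reference, the kind of Supervisor configuration described in the question (program name, command, and ports are hypothetical) would look roughly like the following; nothing in it pins any of the five processes to a particular core, so the OS spreads them across the four cores as it sees fit:

; supervisord.conf fragment: five worker processes on hypothetical ports 8000-8004
[program:myapp]
command=python /srv/myapp/app.py --port=80%(process_num)02d
process_name=%(program_name)s_%(process_num)02d
numprocs=5
autostart=true
autorestart=true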

Hadoop and map-reduce on multicore machines

I have read a lot about Hadoop and MapReduce running on clusters of machines. Does anyone know if the Apache distribution can be run on an SMP machine with several cores? In particular, can multiple MapReduce processes be run on the same machine, with the scheduler taking care of spreading them across the cores? Thanks. - KG
Yes. You have multiple map and reduce slots on each machine, determined by the RAM and CPU (each JVM instance needs 1 GB by default, so an 8 GB machine with 16 cores would still only have about 7 task slots).
From the Hadoop wiki:
Use the configuration knob: mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum to control the number of maps/reduces spawned simultaneously on a TaskTracker. By default, it is set to 2, hence one sees a maximum of 2 maps and 2 reduces at a given instance on a TaskTracker. You can set those on a per-tasktracker basis to accurately reflect your hardware (i.e. set those to higher nos. on a beefier tasktracker etc.).
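As an illustration, those knobs go in mapred-site.xml (classic MRv1); the values here are only an example for a box with spare cores and RAM:

<!-- mapred-site.xml: example slot counts, tune to your own cores/RAM -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>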
You can also use lightweight MapReduce frameworks designed for multicore computers.
For example
LeoTask: A lightweight, productive, and reliable mapreduce framework for multicore computers
https://github.com/mleoking/LeoTask
For Apache Hadoop 2.7.3, my experience has been that enabling YARN will also enable multi-core support. Here is a simple guide for enabling YARN on a single node:
https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_a_Single_Node
The default configuration seems to work pretty well. If you want to tune your core usage, then perhaps look into setting 'yarn.scheduler.minimum-allocation-vcores' and 'yarn.scheduler.maximum-allocation-vcores' within yarn-site.xml (https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml)
Also, see here for instructions on how to configure a simple Hadoop sandbox with multicore support: https://bitbucket.org/aperezrathke/hadoop-aee
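For example, a yarn-site.xml fragment along these lines (the values are illustrative) bounds how many vcores any single container may be allocated:

<!-- inside <configuration> in yarn-site.xml; illustrative values -->
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>1</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>16</value>
</property>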
