What happens if I use more cores in QEMU than are available on the host? - parallel-processing

I am running the Dhrystone benchmarking tool to measure the performance of qemu-system-riscv64 running the Ubuntu 22.04 pre-installed image. The host machine has 2 cores with 1 thread each. I ran tests on qemu-system-riscv64 with 1, 2 and 4 cores (specified with the -smp flag). I observed that going from 1 core to 2 cores increases the Dhrystone score, but going from 2 cores to 4 cores makes the score lower than with 2 cores. What can be the reason for this behavior? I am using the following command to boot Ubuntu 22.04:
qemu-system-riscv64 \
-machine virt -nographic -m 2048 -smp 4 \
-kernel $UBOOTPATH/u-boot.bin \
-device virtio-net-device,netdev=eth0 -netdev user,id=eth0,hostfwd=::<host_port>-:<VM_port> \
-drive file=ubuntu-22.04.1-preinstalled-server-riscv64+unmatched.img,format=raw,if=virtio
I also tried running make with the -j flag; the same behavior occurs with -j4 versus -j2, as described above.

The QEMU target riscv64-softmmu supports MTTCG, so every emulated guest core runs in a separate host thread, and guest performance is therefore bounded by the total host processing power. That is, with a guest capable of using all of its cores on an otherwise idle host, adding a guest core increases overall guest performance as long as the total number of guest cores does not exceed the number of host cores. Beyond that point, host CPU load approaches 100% and adding guest cores only increases contention for host CPU time.
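In practice that means capping -smp at the host's core count. A minimal sketch reusing the command from the question (only the -smp value changes):
# size -smp to what the host can actually give you (2 cores here)
HOST_CORES=$(nproc)
qemu-system-riscv64 \
-machine virt -nographic -m 2048 -smp "$HOST_CORES" \
-kernel $UBOOTPATH/u-boot.bin \
-device virtio-net-device,netdev=eth0 -netdev user,id=eth0,hostfwd=::<host_port>-:<VM_port> \
-drive file=ubuntu-22.04.1-preinstalled-server-riscv64+unmatched.img,format=raw,if=virtio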

Related

Unexpected supervisor processes limitation

I’m using Beanstalkd queues in Laravel, controlled by Supervisord.
Laravel v7.30.6
Beanstalkd v1.10
Supervisord v3.3.1
Ubuntu 18.04.6 LTS (125G RAM)
PHP 7.4
I have 19 tubes (queues) and around 1000 processes in total.
When I run Supervisor via systemd (service supervisor start), I hit some process limitation: Supervisor runs only around 360 processes in total across some tubes, while the rest of the tubes just wait and don't run any processes at all.
[screenshot: beanstalk console example]
But when I run supervisord from the command line as root (/usr/bin/supervisord -c /etc/supervisor/supervisord.conf), all processes in all tubes run normally.
So why do I have this limitation in systemd mode?
P.S.: of course I know about the system ulimit, and I have increased the limits for root and for the user that owns the tube processes.
ulimit -Hu: 655350
ulimit -Su: 655350
supervisord's minfds parameter raises the open-files limit for the beanstalkd processes, so make sure it is set:
cat /etc/supervisord.conf
[supervisord]
...
minfds=1024
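To confirm which limits the systemd-started supervisord actually received, compare the limits of the running process in both launch modes; a quick check (standard /proc and procps tooling, nothing Beanstalkd-specific):
# show the process and open-file limits of the running supervisord
cat /proc/$(pgrep -o supervisord)/limits | grep -E 'processes|open files'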

Heterogeneous nodes in OpenMPI

I am new to OpenMPI. I heard that it supports heterogeneous nodes.
I have a couple of Raspberry Pis and an i7 machine. I have installed OpenMPI on all of them, and I have also configured password-less SSH so that the master (the i7 PC) can launch processes on the Raspberry Pis.
When I run a simple hello_MPI.exe using the following command from the i7 machine,
mpiexec -machinefile machinefile -n 2 hello_MPI.exe
nothing happens! It hangs. However, hello_MPI.exe executes properly when I work with only the 2 Raspberry Pis (one of the Pis is the master in that case; the i7 machine is not used as a compute node).
Additional information:
hello_MPI.exe is in the same directory on all the nodes (the 2 Raspberry Pis and the i7 machine). The machinefile contains the IP addresses of the 2 Raspberry Pis. The .exe on the i7 machine and the one on the Pis are not the same binary, i.e. the one on the Pi is compiled on the Pi and the one on the i7 machine is compiled on the i7 PC.
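For reference, the machinefile is just the two Pi addresses, one per line (the IPs below are placeholders):
# machinefile - placeholder IPs for the two Raspberry Pis
192.168.1.101 slots=1
192.168.1.102 slots=1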
It would be very helpful if anyone could tell me what's happening here.
Thanks!

running spark-ec2 with --worker-instances

Right, an absolute Spark noob talking here.
This is the command I'm running, expecting 3 workers:
./spark-ec2 --worker-instances=3 --key-pair=my.key --identity-file=mykey.pem --region=us-east-1 --zone=us-east-1a launch my-spark-cluster-G
However, in the AWS console only two servers are created (a master and one slave).
On the other hand, at:
http://myMasterSparkURL:8080/
I get the following info, which just does not add up:
Workers: 3
Cores: 3 Total, 3 Used
Memory: 18.8 GB Total, 18.0 GB Used
Applications: 1 Running, 0 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE
and under workers it shows:
worker1 (port 8081) worker1IP:43595 ALIVE 1 (1 Used) 6.3 GB (6.0 GB Used)
worker1 (port 8082) worker1IP:53195 ALIVE 1 (1 Used) 6.3 GB (6.0 GB Used)
worker1 (port 8083) worker1IP:41683 ALIVE 1 (1 Used) 6.3 GB (6.0 GB Used)
Now if I click on the first one (the worker on port 8081), it redirects me to that worker's page; however, if I click on the other two (the workers on ports 8082 and 8083), it basically says page not found.
With high probability I assume this is a bug in spark-ec2, but I'm not quite sure since I'm a noob here.
I've searched all over the place for someone with a similar issue, so I'd appreciate any suggestion that gives me an idea of why this is happening and how to fix it. Thanks.
The Spark version is spark-1.3.0.
You might want to change that invocation a little; note that --worker-instances sets the number of worker daemons per slave machine (which is why a single slave shows three workers on ports 8081-8083), while -s sets the number of slave machines. This is how I have been creating clusters so far:
./spark-ec2 -k MyKey \
  -i MyKey.pem \
  -s 3 \
  --instance-type=m3.medium \
  --region=eu-west-1 \
  --spark-version=1.2.0 \
  launch MyCluster

Installing Hadoop over 5 hard drives on a desktop

I have been working on installing Hadoop. I followed the instructions in a Udemy course and installed Hadoop in pseudo-distributed mode on my laptop. It was fairly straightforward.
After that, I started to wonder if I could set up Hadoop on a desktop computer. So I went out and bought an empty case and put in a 64-bit, 8-core AMD processor, along with a 50 GB SSD and 4 inexpensive 500 GB hard drives. I installed Ubuntu 14.04 on the SSD and put virtual machines on the other drives.
I'm envisioning using my SSD as the master and using my 4 hard drives as nodes. Again, everything is living in the same case.
Unfortunately, I've been searching everywhere and I can't find any tutorials, guides, books, etc. that describe setting up Hadoop in this manner. Almost everything I've found that details installing Hadoop is either a simple pseudo-distributed setup (which I've already done), or the instructions jump straight to large-scale commercial deployments. I'm still learning the basics, clearly, but I'd like to play in this sort of in-between place.
Has anyone done this before, and/or come across any documentation / tutorials / etc that describe how to set Hadoop up in this way? Many thanks in advance for the help.
You can run Hadoop in different VMs located on different drives in the same system,
but you need to use the same configuration on all of the master and slave nodes.
Also ensure that all of the VMs have different IP addresses.
You can get different IP addresses by connecting your master computer to the LAN, or by switching the VMs' network adapters from NAT to bridged mode so that each VM gets its own address.
If you have done the Hadoop installation in pseudo-distributed mode, then the steps below may help you.
MULTINODE :
Configure the hosts in the network using the following entries in the hosts file. This has to be done on every machine [on the namenode too].
sudo vi /etc/hosts
add the following lines in the file:
yourip1 master
yourip2 slave01
yourip3 slave02
yourip4 slave03
yourip5 slave04
[Save and exit – type ESC then :wq ]
Change the hostname for the namenode and datanodes.
sudo vi /etc/hostname
For the master machine [namenode] – master
For the other machines – slave01, slave02, slave03 and slave04
Restart the machines so that the network settings take effect.
sudo shutdown -r now
Copy the SSH key from the master node to all datanodes, so the machines can be accessed without asking for a password every time.
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@slave01
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@slave02
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@slave03
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@slave04
Now we are about to edit the Hadoop configuration settings, so navigate to the configuration folder.
cd ~/hadoop/etc/hadoop
Edit the slaves file within that directory.
vi ~/hadoop/etc/hadoop/slaves
And add the below :
master
slave01
slave02
slave03
slave04
Now update localhost to master in core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml.
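For example, the fs.defaultFS entry in core-site.xml would end up pointing at master instead of localhost on every node (port 9000 is just a common choice; keep whatever your pseudo-distributed setup already used):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>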
Now copy the files in the hadoop/etc/hadoop folder from the master to the slave machines,
then format the NameNode on the master,
and start the Hadoop services.
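Roughly, those last steps look like this (hduser and the ~/hadoop paths follow the conventions used above; adjust to your install):
# copy the configuration to each slave (repeat for slave02..slave04)
scp ~/hadoop/etc/hadoop/* hduser@slave01:~/hadoop/etc/hadoop/
# format HDFS once, on the master only
hdfs namenode -format
# start HDFS and YARN from the master
start-dfs.sh
start-yarn.sh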
Those are some clues on how to configure a Hadoop multinode cluster.
Never tried it, but if you type ifconfig it gives you the same IP address on all the VMs on the different hard drives, so this may not be the best option to go with.
You can try creating a Hadoop cluster on Amazon EC2 for free using this step-by-step guide HERE
or the video guide HERE.
Hope it helps!

EC2 micro instance memory issue

I am running a micro instance in EC2 with 592 MB of available RAM.
Jenkins was crashing with Out Of Memory build errors while running an UPDATE on a big SQL table in the backend.
Disk utilisation is 83%, with 6 GB of the 8 GB EBS volume used.
Running the following from /:
sudo du -hsx * | sort -rh | head -10
2.7G    opt
1.5G    var
1.2G    usr
I found that only 6 MB was free (with the command "free -m") while these services were running:
(i) LAMPP
(ii) Jenkins
(iii) Mysql 5.6
I stopped LAMPP and that freed up 70 MB.
Then I stopped Jenkins, which brought it to 320 MB free.
Stopping MySQL 5.6 brought it up to 390 MB free.
So about 200 MB of RAM is still in use with none of my services running.
Is 200 MB the minimum RAM required for an Ubuntu micro instance running on Amazon EC2?
Nope, I believe it can run until RAM is 100% used.
If a task requires more memory than is available, the task gets killed.
To free up some space, you can run this from your terminal:
sudo apt-get autoremove
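If you want to see what is still holding that last ~200 MB, standard tools are enough, for example:
# list the biggest memory consumers by resident set size
ps aux --sort=-rss | head -15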
