What causes apic_timer_interrupt()? - linux-kernel

I am on CentOS 7 (kernel 3.10.0-1062.1.2.el7.x86_64).
For some reason, my application, which runs a tight loop on an isolated core (one on the nohz_full list), gets an apic_timer_interrupt about once per second. This isn't the local timer interrupt.
My dummy application:
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <cstdint>
int main(int argc, char** argv) {
    srand(time(nullptr));
    int32_t i = 0;
    while (i != rand()) i = rand() * -1;  // effectively an endless busy loop
    printf("%d", i);
}
/proc/sched_debug showed that the core only had one task:
cpu#5, 2900.000 MHz
.nr_running : 1
dmesg showed that the core is using dynamic tick:
[Sun Mar 15 23:11:30 2020] NO_HZ: Full dynticks CPUs: 2-11.
Any idea?
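One way to narrow this down is to watch how the per-CPU interrupt counters evolve while the loop runs. A minimal sketch (not from the original report) that prints the LOC (local APIC timer) row of /proc/interrupts once per second, so the count for CPU 5 can be compared against the other cores:
#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main(void) {
    char line[4096];
    for (;;) {
        FILE *f = fopen("/proc/interrupts", "r");
        if (!f)
            return 1;
        while (fgets(line, sizeof(line), f)) {
            if (strstr(line, "LOC:"))   /* local timer interrupt counters */
                fputs(line, stdout);
        }
        fclose(f);
        sleep(1);
    }
}
Running it alongside the tight loop shows whether the once-per-second interrupt is accounted to LOC or to some other row.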

Related

How to parallelize a for loop inside a method with pointers to objects

I am trying to parallelize a for loop inside a class method using pointers to objects.
However, the loop always takes about 3.7 seconds regardless of the number of threads used.
Here is the part of the code:
#include <iostream>
#include <string>
#include <vector>
#include <omp.h>
#include "data.hpp"
#include "model.hpp"
#include "sub.hpp"
void Data::apply(Data* data, Model* model, Sub* sub, unsigned i)
{
    omp_set_num_threads(omp_get_max_threads()); // max threads is 8
    #pragma omp parallel for
    for (int j = 0; j < model->n_max; ++j)      // model->n_max is 20 000 000
    {
        sub->cur[j] = sub->old[i * model->n_old + j] * sub->i + sub->predict[j];
    }
}
Can you explain what I am doing wrong here?
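One common cause of identical timings for every thread count is that OpenMP was never enabled at build time (e.g. missing -fopenmp), in which case the pragma is silently ignored; and even when it is enabled, a streaming loop like this is usually limited by memory bandwidth rather than compute, so the speedup flattens out well below the core count. A hypothetical diagnostic sketch (in C, not taken from the original post) that reports how many threads actually run and times a comparable loop with omp_get_wtime():
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
int main(void) {
    const int n = 20000000;                  /* same order as model->n_max */
    double *cur     = calloc(n, sizeof *cur);
    double *predict = calloc(n, sizeof *predict);
    int threads = 1;
    #pragma omp parallel
    {
        #pragma omp master
        threads = omp_get_num_threads();     /* threads that actually run */
    }
    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (int j = 0; j < n; ++j)
        cur[j] = 2.0 * j + predict[j];       /* bandwidth-bound, like the original */
    double t1 = omp_get_wtime();
    printf("%d threads, %.3f s\n", threads, t1 - t0);
    free(cur);
    free(predict);
    return 0;
}
If this prints 1 thread, the build flags are the problem rather than the loop itself.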

How to assign multiple cores of a single node to single job/process in MPI cluster?

I have an MPI program which I want to run on 30 nodes (each node having 32 cores). How can I assign all cores of a node to a single job/process?
I am using slots to restrict the number of jobs on a particular node.
node001 slots=1 max_slots=20
node002 slots=1 max_slots=20
Is there any parameter I can use to achieve this?
Thanks in advance.
With Open MPI, you can use the --rankfile option to explicitly pin ranks to cores.
The syntax of the file is documented here: https://www.open-mpi.org/doc/v2.0/man1/mpirun.1.php
Here is a very simple MPI+OpenMP program:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>
#include <mpi.h>
#include <omp.h>
int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    #pragma omp parallel
    {
        /* each OpenMP thread of this rank reports the CPU it runs on */
        printf("[%d:%d] %d\n", rank, omp_get_thread_num(), sched_getcpu());
    }
    MPI_Finalize();
    return 0;
}
It prints [MPI_rank:OMP_rank] cpu for each OpenMP thread.
The basic format of a rankfile is:
rank <rank>=<host> slot=<slot>:<cores>
With this basic rankfile (host Marvin, 2 CPUs on one slot):
>cat ./rankfile
rank 0=Marvin slot=0:0
rank 1=Marvin slot=0:0
rank 2=Marvin slot=0:0
rank 3=Marvin slot=0:1
rank 4=Marvin slot=0:0-1
rank 5=Marvin slot=0:0
Here is the output:
>mpirun -n 6 --rankfile ./rankfile ./main
[0:0] 0
[1:0] 0
[2:0] 0
[3:0] 1
[4:0] 1
[4:1] 0
[5:0] 0
I didn't set the OMP_NUM_THREADS environment variable, in order to let OpenMP detect how many cores are available for each rank.
Hope this helps.
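Applied to the setup in the question (30 nodes, one rank per node, all 32 cores handed to that rank), a rankfile along these lines might work; the hostnames follow the question's hostfile, and slot=0:0-31 assumes all 32 cores sit on socket 0, so adjust the slot and core ranges to the real topology:
rank 0=node001 slot=0:0-31
rank 1=node002 slot=0:0-31
(and so on, one line per node, up to rank 29)
You would then launch it with something like mpirun -n 30 --rankfile ./rankfile ./main.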

Why is C++ so much faster than C in this code?

My C code is:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void){
    char* a = (char*)malloc(200000);
    a[0] = '\0';  /* strcat needs an initialized, NUL-terminated destination */
    for (int i = 0; i < 100000; i++){
        strcat(a, "b");
    }
    printf("%s", a);
}
My C++ code is:
#include <iostream>
int main(void){
    std::string a = "";
    for (int i = 0; i < 100000; i++){
        a += "b";
    }
    std::cout << a;
}
On my machine, the C code runs in about 5 seconds, while the C++ code runs in 0.025 seconds!
Now, the C code doesn't check for overflows and has none of the C++ overhead or classes, yet it is several orders of magnitude slower than my C++ code.
Compiled with gcc/g++ 6.2.0 at -O3 on a Raspberry Pi.
@erwin was correct.
When I change my code to
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* append the first character of src at offset lenDest of dest */
void mystrcat(char* dest, const char* src, int lenDest){
    dest[lenDest] = src[0];
}
int main(void){
    char* a = (char*)malloc(200000);
    for (int i = 0; i < 100000; i++){
        mystrcat(a, "b", i);
    }
    a[100000] = 0;
    printf("%s\n", a);
}
It takes about 0.012 s to run (most of that is printing the long string to the screen).
Shlemiel the painter's algorithm at work!
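For reference, a hedged sketch of the usual linear-time fix in plain C: keep a pointer to the current end of the string and append there, instead of letting strcat() re-scan the whole string from the start on every call (that re-scan is what makes the original loop quadratic):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void){
    char* a = malloc(200001);          /* 100000 characters plus the NUL */
    char* end = a;                     /* always points at the terminating NUL */
    *end = '\0';
    for (int i = 0; i < 100000; i++){
        const char* s = "b";
        size_t len = strlen(s);
        memcpy(end, s, len + 1);       /* copy including the NUL */
        end += len;                    /* advance instead of re-scanning */
    }
    printf("%s\n", a);
    free(a);
}
This is essentially what std::string does internally: it remembers its own length, so each append is amortized O(1).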

Implement a random-number generator using only getpid() and gettimeofday()?

I am using the gcc compiler to implement a random-number generator using only getpid() and gettimeofday(). Here is my code:
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char **argv)
{
    struct timeval tv;
    int count;
    int i;
    int INPUT_MAX = 10;
    int NO_OF_SAMPLES = 10;
    gettimeofday(&tv, NULL);
    printf("Enter Max: \n");
    scanf("%d", &INPUT_MAX);
    printf("Enter No. of samples needed: \n");
    scanf("%d", &NO_OF_SAMPLES);
    /*printf("%ld\n", tv.tv_usec);
    printf("PID :%d\n", getpid());*/
    for (count = 0; count < NO_OF_SAMPLES; count++) {
        printf("%ld\n", (getpid() * tv.tv_usec) % INPUT_MAX + 1);
        for (i = 0; i < 1000000; ++i)
        {
            /* empty delay loop */
        }
    }
    return 0;
}
I added an inner for loop for delay purposes, but the result I get is always the same number, like this:
./a.out
Enter Max:
10
Enter No. of samples needed:
10
1
1
1
1
1
1
1
1
1
1
Please tell me what I am doing wrong.
getpid() is constant during the program's execution, so you get constant values, too.
But even if you use gettimeofday() inside the loop, this likely won't help:
gcc will likely optimize away your delay loop.
Even if it's not optimized away, the delays will be very similar and your values won't be very random.
I'd suggest you look up "linear congruential generator" for a simple way to generate more random-looking numbers.
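For illustration, a minimal sketch of such a generator (the multiplier and increment are the widely used Numerical Recipes constants; they are not from the original answer): seed once from getpid() and gettimeofday(), then iterate the state, so no delay loop is needed:
#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>
int main(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    /* seed from the two allowed sources */
    unsigned int state = (unsigned int)(tv.tv_sec ^ tv.tv_usec ^ getpid());
    int max = 10, samples = 10;
    for (int i = 0; i < samples; i++) {
        /* linear congruential step: the state advances on every call,
           so consecutive values differ even within the same microsecond */
        state = state * 1664525u + 1013904223u;
        printf("%u\n", state % max + 1);
    }
    return 0;
}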
Put gettimeofday() inside the loop. Also note that if getpid() is divisible by INPUT_MAX, then (getpid() * tv.tv_usec) % INPUT_MAX is always 0 and you will always get 1. Instead of multiplying, you can add getpid() to tv.tv_usec (not that it adds much randomness).

Detect current CPU Clock Speed Programmatically on OS X?

I just bought a nifty MBA 13" Core i7. I'm told the CPU speed varies automatically, and pretty wildly, too. I'd really like to be able to monitor this with a simple app.
Are there any Cocoa or C calls to find the current clock speed, without actually affecting it?
Edit: I'm OK with answers using Terminal calls, as well as programmatic.
Thanks!
Try this tool called "Intel Power Gadget". It displays IA frequency and IA power in real time.
http://software.intel.com/sites/default/files/article/184535/intel-power-gadget-2.zip
You can query the CPU speed easily via sysctl, either by command line:
sysctl hw.cpufrequency
Or via C:
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysctl.h>
int main() {
    int mib[2];
    unsigned int freq;
    size_t len;
    mib[0] = CTL_HW;
    mib[1] = HW_CPU_FREQ;
    len = sizeof(freq);
    sysctl(mib, 2, &freq, &len, NULL, 0);
    printf("%u\n", freq);
    return 0;
}
Since it's an Intel processor, you could always use RDTSC. That's an assembler instruction that returns the current cycle counter, a 64-bit value that increments every cycle. It'd be a little approximate, but e.g.:
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
uint64_t rdtsc(void)
{
    uint32_t ret0[2];
    __asm__ __volatile__("rdtsc" : "=a"(ret0[0]), "=d"(ret0[1]));
    return ((uint64_t)ret0[1] << 32) | ret0[0];
}
int main(int argc, const char * argv[])
{
    uint64_t startCount = rdtsc();
    sleep(1);
    uint64_t endCount = rdtsc();
    printf("Clocks per second: %llu\n", endCount - startCount);
    return 0;
}
Output is 'Clocks per second: 2002120630' on my 2 GHz MacBook Pro.
There is a kernel extension written by "flAked" which logs the CPU P-state to the kernel log.
http://www.insanelymac.com/forum/index.php?showtopic=258612
Maybe you could contact him for the code.
This seems to work correctly on OS X.
However, it doesn't work on Linux, where the sysctl() system call is deprecated and KERN_CLOCKRATE is undefined.
#include <assert.h>
#include <stdio.h>
#include <sys/sysctl.h>
#include <sys/time.h>
int main(void) {
    int mib[2];
    size_t len;
    struct clockinfo clockinfo;
    mib[0] = CTL_KERN;
    mib[1] = KERN_CLOCKRATE;
    len = sizeof(clockinfo);
    int result = sysctl(mib, 2, &clockinfo, &len, NULL, 0);
    assert(result != -1);
    /* log_trace() in the original; plain printf here so the snippet compiles standalone */
    printf("clockinfo.hz: %d\n", clockinfo.hz);
    printf("clockinfo.tick: %d\n", clockinfo.tick);
    return 0;
}
