I am running perf test on an i.MX6DL ARM target, and two of the subtests are failing:
perf test -v 15
15: Test breakpoint overflow signal handler :
--- start ---
count1 0, count2 0, overflow 0
failed: wrong count for bp10
failed: wrong overflow hit
failed: wrong count for bp2
---- end ----
Test breakpoint overflow signal handler: FAILED!
perf test -v 16
16: Test breakpoint overflow sampling :
--- start ---
count 0, overflow 0
Wrong number of executions 0 != 10000
Wrong number of overflows 0 != 100
---- end ----
Test breakpoint overflow sampling: FAILED!
Please help me understand why all the values are showing zero.
Thanks.
Your i.MX6DL may have no hardware performance counters, the counters may not support an interrupt-on-overflow mode, or your kernel may lack support for that hardware. You should check the exact core name and configuration of your chip (the i.MX6DL uses Cortex-A9 cores) and ARM's documentation on the hardware performance counters implemented in it.
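If you have console access to the target, a quick sanity check is to see whether the kernel registered a PMU driver at boot and whether perf sees hardware events at all (a sketch; the exact boot-log wording varies with kernel version):
dmesg | grep -i pmu                        # look for something like "hw perfevents: enabled with ... PMU driver"
perf list hw                               # are hardware events listed as supported?
perf stat -e cycles,instructions sleep 1   # do the counters actually count?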
I am facing a fatal error when trying to simulate, in ModelSim, a design that instantiates a RAM IP for the Lattice Semiconductor MACHXO3L target device. I have compiled the Lattice libraries for use in ModelSim, but the simulation always stops with the following fatal error:
# ** Fatal: (vsim-3483) Delay in signal assignment is not ascending.
# Time: 20 ns Iteration: 1 Process: /fft_tb/fft_i/RAM_i1/RAM_0_0_0/P107 File: C:/lscc/diamond/3.11_x64/cae_library/simulation/script/../vhdl/machxo3l/src/MACHXO3L_MISC.vhd Line: 541
# Fatal error in Process P107 at C:/lscc/diamond/3.11_x64/cae_library/simulation/script/../vhdl/machxo3l/src/MACHXO3L_MISC.vhd line 541
Any ideas? The problem seems to be in the Lattice library MACHXO3L_MISC.vhd at line 541.
As correctly suggested by user1155120, the problem is solved by changing the simulation time resolution. I changed it to picoseconds by modifying the modelsim.ini file. The parameter to be modified is Resolution (not ScTimeUnit, which only controls the SystemC default time unit):
; Simulator resolution
; Set to fs, ps, ns, us, ms, or sec with optional
; prefix of 1, 10, or 100.
Resolution = ps
You may also want or need to change the same parameter in the .mpf file, in your project folder.
If that doesn't change the simulation resolution, you can set it explicitly in the vsim command:
vsim work.<your_test_bench> -t ps
I'm able to read DS18B20 sensors using the example code provided in this repository.
It works well using a standard Espressif ESP32-WROOM-32 (aka ESP32-DevKitC), which uses a 40 MHz XTAL.
I'm not able to run the same example using an Allnet-IOT-WLAN, which uses a 26 MHz XTAL.
I suspect that the problem is related to the RMT initialization, which uses:
rmt_tx.clk_div = 80;
I've tried different settings for clk_div with no luck.
Does anyone know how to use the DS18B20 sensor with ESP-IDF on a board with a 26 MHz XTAL instead of the more standard 40 MHz one?
ESP32-WROOM-32 output (working)
I (0) cpu_start: Starting scheduler on APP CPU.
Find devices:
0 : d4000008e40d7428
1 : f8000008e3632528
Found 2 devices
Device 1502162ca5b2ee28 is not present
Temperature readings (degrees C): sample 1
0: 22.3 0 errors
1: 21.8 0 errors
Temperature readings (degrees C): sample 2
0: 22.3 0 errors
1: 21.9 0 errors
Allnet-IOT-WLAN output (not working)
I (0) cpu_start: Starting scheduler on APP CPU.
Find devices:
Found 0 devices
E (6780) owb_rmt: rx_items == 0
E (6880) owb_rmt: rx_items == 0
E (6980) owb_rmt: rx_items == 0
There are no differences in the RMT initialization between the two XTAL clock frequencies:
D (2319) rmt: Rmt Tx Channel 1|Gpio 25|Sclk_Hz 80000000|Div 80|Carrier_Hz 0|Duty 35
D (2319) intr_alloc: Connected src 47 to int 13 (cpu 0)
D (2319) rmt: Rmt Rx Channel 0|Gpio 25|Sclk_Hz 80000000|Div 80|Thresold 77|Filter 30
Both use the same 80 MHz source clock.
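For reference, the TX side of that initialization looks roughly like this in the legacy RMT driver (a sketch: the function name owb_rmt_tx_init is made up, RMT_DEFAULT_CONFIG_TX exists in recent ESP-IDF releases, and the pin/channel values are taken from the log above). The RMT peripheral is fed by the 80 MHz APB clock, not by the XTAL, so clk_div = 80 gives the same 1 us tick on both boards:
#include "driver/rmt.h"

// Sketch: RMT TX setup on GPIO 25 / channel 1, as in the log above.
static void owb_rmt_tx_init(void)
{
    rmt_config_t rmt_tx = RMT_DEFAULT_CONFIG_TX(GPIO_NUM_25, RMT_CHANNEL_1);
    rmt_tx.clk_div = 80;  // 80 MHz APB clock / 80 = 1 MHz -> 1 us per RMT tick
    ESP_ERROR_CHECK(rmt_config(&rmt_tx));
    ESP_ERROR_CHECK(rmt_driver_install(rmt_tx.channel, 0, 0));
}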
I was using a wrong pinout diagram. I tested the RMT with a simpler example and found out that the pinout was wrong.
The DS18B20 sensors work fine with a 26 MHz XTAL when using the esp32-ds18b20 library.
I want to disable the timer interrupt on some of the cores (1-2) of my machine, an x86 box running CentOS 7 with the RT patch. Both cores are isolated with nohz_full (you can see the cmdline below), but the timer interrupt keeps interrupting the real-time processes running on core 1 and core 2.
1. uname -r
3.10.0-693.11.1.rt56.632.el7.x86_64
2. cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-693.11.1.rt56.632.el7.x86_64 \
root=/dev/mapper/centos-root ro crashkernel=auto \
rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet \
default_hugepagesz=2M hugepagesz=2M hugepages=1024 \
intel_iommu=on isolcpus=1-2 irqaffinity=0 intel_idle.max_cstate=0 \
processor.max_cstate=0 idle=mwait tsc=perfect rcu_nocbs=1-2 rcu_nocb_poll \
nohz_full=1-2 nmi_watchdog=0
3. cat /proc/interrupts
CPU0 CPU1 CPU2
0: 29 0 0 IO-APIC-edge timer
.....
......
NMI: 0 0 0 Non-maskable interrupts
LOC: 835205157 308723100 308384525 Local timer interrupts
SPU: 0 0 0 Spurious interrupts
PMI: 0 0 0 Performance monitoring interrupts
IWI: 0 0 0 IRQ work interrupts
RTR: 0 0 0 APIC ICR read retries
RES: 347330843 309191325 308417790 Rescheduling interrupts
CAL: 0 935 935 Function call interrupts
TLB: 320 22 58 TLB shootdowns
TRM: 0 0 0 Thermal event interrupts
THR: 0 0 0 Threshold APIC interrupts
DFR: 0 0 0 Deferred Error APIC interrupts
MCE: 0 0 0 Machine check exceptions
MCP: 2 2 2 Machine check polls
CPUs/Clocksource:
4. lscpu | grep CPU.s
CPU(s): 3
On-line CPU(s) list: 0-2
NUMA node0 CPU(s): 0-2
5. cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
Thanks a lot for any help.
Moses
Even with nohz_full= you get some ticks on the isolated CPUs:
Some process-handling operations still require the occasional scheduling-clock tick. These operations include calculating CPU load, maintaining sched average, computing CFS entity vruntime, computing avenrun, and carrying out load balancing. They are currently accommodated by scheduling-clock tick every second or so. On-going work will eliminate the need even for these infrequent scheduling-clock ticks.
(Documentation/timers/NO_HZ.txt; cf. "(Nearly) full tickless operation in 3.10", LWN, 2013)
Thus, you have to check the rate of the local timer, e.g.:
$ perf stat -a -A -e irq_vectors:local_timer_entry sleep 120
(while your isolated threads/processes are running)
Also, nohz_full= is only effective if there is just one runnable task on each isolated core. You can check that with e.g. ps -L -e -o pid,tid,user,state,psr,cmd and cat /proc/sched_debug.
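For example, to list everything scheduled on the isolated cores (psr is the CPU a thread last ran on; a sketch that filters on that column):
$ ps -L -e -o pid,tid,user,state,psr,cmd | awk '$5 == 1 || $5 == 2'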
Perhaps you need to move some (kernel) tasks to your housekeeping core, e.g.:
# tuna -U -t '*' -c 0 -m
You can get more insights into what timers are still active by looking at /proc/timer_list.
Another method to investigate causes for possible interruption is to use the functional tracer (ftrace). See also Reducing OS jitter due to per-cpu kthreads for some examples.
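For example, a minimal ftrace session to see which timers still fire on CPU 1 could look like this (a sketch using the standard tracefs interface under /sys/kernel/debug/tracing; run as root):
cd /sys/kernel/debug/tracing
echo 0 > tracing_on
echo > trace
echo 2 > tracing_cpumask                            # hex CPU mask: trace CPU 1 only
echo 1 > events/timer/hrtimer_expire_entry/enable   # record expiring hrtimers
echo 1 > tracing_on; sleep 10; echo 0 > tracing_on
cat trace                                           # the function field shows which handler fired, e.g. tick_sched_timer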
I see nmi_watchdog=0 in your kernel parameters, but you don't disable the soft watchdog. Perhaps this is another timer tick source that would show up with ftrace.
You can disable all watchdogs with nowatchdog.
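If your kernel exposes it, the soft lockup watchdog can also be toggled at runtime via sysctl, without a reboot (a sketch; the knob is kernel.watchdog on 3.10-era kernels):
# sysctl kernel.watchdog=0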
Btw, some of your kernel parameters seem to be off:
tsc=perfect - do you mean tsc=reliable? The 'perfect' value isn't documented in the kernel docs
idle=mwait - do you mean idle=poll? Again, I can't find the 'mwait' value in the kernel docs
intel_iommu=on - what's the purpose of this?
I have this little nonsense script here which I am executing in MATLAB R2013b:
clear all;
n = 2000;
times = 50;
i = 0;
tCPU = tic;
disp 'CPU::'
A = rand(n, n);
B = rand(n, n);
disp '::Go'
for i = 0:times
CPU = A * B;
end
tCPU = toc(tCPU);
tGPU = tic;
disp 'GPU::'
A = gpuArray(A);
B = gpuArray(B);
disp '::Go'
for i = 0:times
GPU = A * B ;
end
tGPU = toc(tGPU);
fprintf('On CPU: %.2f sec\nOn GPU: %.2f sec\n', tCPU, tGPU);
Unfortunately, after execution I receive a message from Windows saying: "Display driver stopped working and has recovered."
I assume this means that Windows did not get a response from my graphics card driver or something similar. The script itself returned without errors:
>> test
CPU::
::Go
GPU::
::Go
On CPU: 11.01 sec
On GPU: 2.97 sec
But whether or not the GPU runs out of memory, MATLAB cannot use the GPU device again until I restart it. If I don't restart MATLAB, I just receive warnings from CUDA:
>> test
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT
> In test at 1
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT
> In test at 1
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT
> In test at 1
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT
> In test at 1
CPU::
::Go
GPU::
Error using gpuArray
An unexpected error occurred during CUDA execution.
The CUDA error was:
the launch timed out and was terminated
Error in test (line 21)
A = gpuArray(A);
Does anybody know how to avoid this issue or what I am doing wrong here?
If needed, my GPU Device:
>> gpuDevice
ans =
CUDADevice with properties:
Name: 'GeForce GTX 660M'
Index: 1
ComputeCapability: '3.0'
SupportsDouble: 1
DriverVersion: 6
ToolkitVersion: 5
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 2.1475e+09
FreeMemory: 1.9037e+09
MultiprocessorCount: 2
ClockRateKHz: 950000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
The key piece of information is this part of the gpuDevice output:
KernelExecutionTimeout: 1
This means that the host display driver is active on the GPU you are running the compute jobs on. The NVIDIA display driver contains a watchdog timer which kills any task that takes more than a predefined amount of time without yielding control back to the driver for screen refresh. This is intended to prevent a long-running or stuck compute job from making the machine unresponsive by freezing the display. The runtime of your MATLAB script is clearly exceeding the display driver's watchdog limit. Once that happens, the compute context held on the device is destroyed and MATLAB can no longer operate with the device. You might be able to reinitialise the context by calling reset, which I guess runs cudaDeviceReset() under the hood.
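For example (a sketch; reset is the documented Parallel Computing Toolbox function, though whether it recovers the device after a watchdog kill depends on the driver state):
d = gpuDevice;           % handle to the currently selected device
reset(d);                % destroy the CUDA context and create a fresh one
A = gpuArray.rand(10);   % the device should be usable again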
There is a lot of information about this watchdog timer on the web - for example, this Stack Overflow question. How to modify the timeout depends on your OS and hardware. The simplest ways to avoid the problem are to not run CUDA code on a display GPU, or to break your compute jobs into finer-grained operations so that no single one exceeds the timeout limit. Or just write faster code...
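One more benchmarking note: gpuArray arithmetic is launched asynchronously, so stopping the timer right after the loop can under-report the GPU time. Synchronizing before reading the timer gives an honest number (a sketch; wait(gpuDevice) is the documented call in later MATLAB releases, and on R2013b gathering a result back to the host forces the same synchronization):
tGPU = tic;
for i = 0:times
    GPU = A * B;
end
wait(gpuDevice);     % block until all queued GPU work has finished
tGPU = toc(tGPU);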
I installed Linpack on a 2-node cluster with Xeon processors. When I start Linpack with this command:
mpiexec -np 28 -print-rank-map -f /root/machines.HOSTS ./xhpl_intel64
Linpack sometimes starts and prints its output, and sometimes I only see the MPI rank mappings printed and then nothing more. This seems like random behaviour: I don't change anything between the calls, yet Linpack sometimes starts and sometimes doesn't.
In top I can see that xhpl_intel64 processes have been created and are heavily using the CPU, but iftop shows that nothing is being sent between the nodes.
I am using MPICH2 as the MPI implementation. This is my HPL.dat:
# cat HPL.dat
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
1 # of problems sizes (N)
10000 Ns
1 # of NBs
250 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)
2 Ps
14 Qs
16.0 threshold
1 # of panel fact
2 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
4 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
1 # of recursive panel fact.
1 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
1 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 DEPTHs (>=0)
2 SWAP (0=bin-exch,1=long,2=mix)
64 swapping threshold
0 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
Edit 2:
I just let the program run for a while, and after 30 minutes it reported:
# mpiexec -np 32 -print-rank-map -f /root/machines.HOSTS ./xhpl_intel64
(node-0:0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
(node-1:16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31)
Assertion failed in file ../../socksm.c at line 2577: (it_plfd->revents & 0x008) == 0
internal ABORT - process 0
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
Is this an MPI problem? Does anyone know what type of problem this could be?
I figured out what the problem was: MPICH2 uses different random ports each time it starts, and if these are blocked, your application won't start up correctly.
The solution for MPICH2 is to set the environment variable MPICH_PORT_RANGE to START:END, like this:
export MPICH_PORT_RANGE=50000:51000
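If a firewall is what is blocking those ports, the same range also has to be opened on every node, e.g. with iptables (a sketch; adapt it to whatever firewall your nodes actually run):
iptables -A INPUT -p tcp --dport 50000:51000 -j ACCEPT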
Best,
heinrich