Reading DS18B20 sensors using ESP-IDF on an ESP32 with a 26 MHz XTAL - esp32

I'm able to read DS18B20 sensors using the example code provided in this repository.
It works well using a standard Espressif ESP32-WROOM-32 (aka ESP32-DevKitC), which uses a 40 MHz XTAL.
I'm not able to run the same example using an Allnet-IOT-WLAN, which uses a 26 MHz XTAL.
I suspect that the problem is related to RMT initialization. The initialization uses:
rmt_tx.clk_div = 80;
I've tried different settings for clk_div with no luck.
Does anyone know how to use the DS18B20 sensor with ESP-IDF on a board with a 26 MHz XTAL, instead of the more standard 40 MHz one?
ESP32-WROOM-32 output (working)
I (0) cpu_start: Starting scheduler on APP CPU.
Find devices:
0 : d4000008e40d7428
1 : f8000008e3632528
Found 2 devices
Device 1502162ca5b2ee28 is not present
Temperature readings (degrees C): sample 1
0: 22.3 0 errors
1: 21.8 0 errors
Temperature readings (degrees C): sample 2
0: 22.3 0 errors
1: 21.9 0 errors
Allnet-IOT-WLAN output (not working)
I (0) cpu_start: Starting scheduler on APP CPU.
Find devices:
Found 0 devices
E (6780) owb_rmt: rx_items == 0
E (6880) owb_rmt: rx_items == 0
E (6980) owb_rmt: rx_items == 0

There are no differences in the RMT initialization using different XTAL clock frequencies.
D (2319) rmt: Rmt Tx Channel 1|Gpio 25|Sclk_Hz 80000000|Div 80|Carrier_Hz 0|Duty 35
D (2319) intr_alloc: Connected src 47 to int 13 (cpu 0)
D (2319) rmt: Rmt Rx Channel 0|Gpio 25|Sclk_Hz 80000000|Div 80|Thresold 77|Filter 30
Both use the same 80 MHz source.
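A quick back-of-the-envelope check (shell arithmetic, values from the debug log above) confirms why the divider is XTAL-independent: the ESP32's RMT peripheral is clocked from the 80 MHz APB clock, not the crystal, so clk_div = 80 gives the same tick on both boards.

```shell
# RMT source clock on the ESP32 is the 80 MHz APB clock (Sclk_Hz in the log),
# which the PLL produces regardless of whether the XTAL is 26 or 40 MHz.
apb_hz=80000000
clk_div=80          # value from rmt_tx.clk_div in the example code
tick_hz=$((apb_hz / clk_div))
echo "RMT tick period: $((1000000000 / tick_hz)) ns"   # 1 us tick on both boards
```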
I was using a wrong pinout diagram. I tested the RMT with a simpler example and found out that the pinout was wrong.
The DS18B20 sensors work well with a 26 MHz XTAL using the esp32-ds18b20 library.

Related

Peculiar behaviour with Mellanox ConnectX-5 and DPDK in rxonly mode

Recently I observed peculiar behaviour with a Mellanox ConnectX-5 100 Gbps NIC while working in DPDK rxonly mode. I was able to receive 142 Mpps using 12 queues; however, with 11 queues it was only 96 Mpps, with 10 queues 94 Mpps, and with 9 queues 92 Mpps. Can anyone explain why there is a sudden/abrupt jump in capture performance from 11 queues to 12 queues?
The details of the setup are mentioned below.
I have connected two servers back to back. One of them (server-1) is used for traffic generation and the other (server-2) for traffic reception. In both servers I am using a Mellanox ConnectX-5 NIC.
The performance-tuning parameters mentioned in section 3 of https://fast.dpdk.org/doc/perf/DPDK_19_08_Mellanox_NIC_performance_report.pdf [pp. 11-12] have been followed.
Both servers have the same configuration.
Server configuration
Processor: Intel Xeon Scalable 6148 series, 20 cores with HT, 2.4 GHz, 27.5 MB L3 cache
No. of processors: 4
RAM: 256 GB, 2666 MHz speed
DPDK version used is dpdk-19.11 and OS is RHEL-8.0
For traffic generation, testpmd with --forward=txonly and --txonly-multi-flow is used. The command is below.
Packet generation testpmd command in server-1
./testpmd -l 4,5,6,7,8,9,10,11,12,13,14,15,16 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=4096 --rxd=4096 --mbcache=512 --rxq=12 --txq=12 --nb-cores=12 -i -a --rss-ip --no-numa --forward=txonly --txonly-multi-flow
testpmd> set txpkts 64
It was able to generate 64-byte packets at a sustained rate of 142.2 Mpps. This is used as input to the second server, which works in rxonly mode. The command for reception is mentioned below.
Packet Reception command with 12 cores in server-2
./testpmd -l 4,5,6,7,8,9,10,11,12,13,14,15,16 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=4096 --rxd=4096 --mbcache=512 --rxq=12 --txq=12 --nb-cores=12 -i -a --rss-ip --no-numa
testpmd> set fwd rxonly
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 1363328297 RX-missed: 0 RX-bytes: 87253027549
RX-errors: 0
RX-nombuf: 0
TX-packets: 19 TX-errors: 0 TX-bytes: 3493
Throughput (since last show)
Rx-pps: 142235725 Rx-bps: 20719963768
Tx-pps: 0 Tx-bps: 0
############################################################################
Packet Reception command with 11 cores in server-2
./testpmd -l 4,5,6,7,8,9,10,11,12,13,14,15 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=4096 --rxd=4096 --mbcache=512 --rxq=11 --txq=11 --nb-cores=11 -i -a --rss-ip --no-numa
testpmd> set fwd rxonly
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 1507398174 RX-missed: 112937160 RX-bytes: 96473484013
RX-errors: 0
RX-nombuf: 0
TX-packets: 867061720 TX-errors: 0 TX-bytes: 55491950935
Throughput (since last show)
Rx-pps: 96718960 Rx-bps: 49520107600
Tx-pps: 0 Tx-bps: 0
############################################################################
As you can see, there is a sudden jump in Rx-pps from 11 cores to 12 cores. This variation was not observed elsewhere, e.g. from 8 to 9, 9 to 10, or 10 to 11 cores.
Can anyone explain the reason for this sudden jump in performance?
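For context, the 142 Mpps ceiling seen above is essentially 64-byte line rate: on the wire each 64B frame also occupies 8B of preamble/SFD and a 12B inter-frame gap, so 84 bytes (672 bits) in total. A quick calculation:

```shell
# Theoretical maximum frame rate at 100 Gbps with minimum-size (64B) frames:
# 64B frame + 8B preamble/SFD + 12B inter-frame gap = 84B = 672 bits per frame
link_bps=100000000000
frame_bits=$(( (64 + 8 + 12) * 8 ))
echo "Theoretical max: $((link_bps / frame_bits)) pps"   # ~148.8 Mpps
```

The observed 142.2 Mpps is therefore about 95% of line rate, which is why the 12-queue run looks like "full" capture while the 11-queue run (96 Mpps) drops roughly a third of the traffic.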
The same experiment was conducted, this time using 11 cores for traffic generation.
./testpmd -l 4,5,6,7,8,9,10,11,12,13,14,15 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=4096 --rxd=4096 --mbcache=512 --rxq=11 --txq=11 --nb-cores=11 -i -a --rss-ip --no-numa --forward=txonly --txonly-multi-flow
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 0 RX-missed: 0 RX-bytes: 0
RX-errors: 0
RX-nombuf: 0
TX-packets: 2473087484 TX-errors: 0 TX-bytes: 158277600384
Throughput (since last show)
Rx-pps: 0 Rx-bps: 0
Tx-pps: 142227777 Tx-bps: 72820621904
############################################################################
On the capture side with 11 cores
./testpmd -l 1,2,3,4,5,6,10,11,12,13,14,15 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=1024 --rxd=1024 --mbcache=512 --rxq=11 --txq=11 --nb-cores=11 -i -a --rss-ip --no-numa
testpmd> set fwd rxonly
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 8411445440 RX-missed: 9685 RX-bytes: 538332508206
RX-errors: 0
RX-nombuf: 0
TX-packets: 0 TX-errors: 0 TX-bytes: 0
Throughput (since last show)
Rx-pps: 97597509 Rx-bps: 234643872
Tx-pps: 0 Tx-bps: 0
############################################################################
On the capture side with 12 cores
./testpmd -l 1,2,3,4,5,6,10,11,12,13,14,15,16 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=1024 --rxd=1024 --mbcache=512 --rxq=12 --txq=12 --nb-cores=12 -i -a --rss-ip --no-numa
testpmd> set fwd rxonly
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 9370629638 RX-missed: 6124 RX-bytes: 554429504128
RX-errors: 0
RX-nombuf: 0
TX-packets: 0 TX-errors: 0 TX-bytes: 0
Throughput (since last show)
Rx-pps: 140664658 Rx-bps: 123982640
Tx-pps: 0 Tx-bps: 0
############################################################################
The sudden jump in performance from 11 to 12 cores still remains.
With the DPDK LTS releases 19.11, 20.11, and 21.11, running in plain vector mode (the default) on Mellanox CX-5 and CX-6 does not reproduce the problem mentioned above.
[EDIT-1] Retested with rxqs_min_mprq=1 at 2 × 100 Gbps with 64B packets: 16 RXTX queues on 16T16C resulted in a degradation of 9~10 Mpps, and every RX queue count from 1 to 7 showed a degradation of about 6 Mpps with rxqs_min_mprq=1.
Following is the capture for RXTX-to-core scaling.
Investigating the MPRQ claim, the following are some unique observations:
For both MLX CX-5 and CX-6, the maximum each RX queue can attain is around 36 to 38 Mpps.
A single core can achieve up to 90 Mpps (64B) with 3 RXTX queues in IO mode on AMD EPYC Milan, on both CX-5 and CX-6.
100 Gbps at 64B can be achieved with 14 logical cores (7 physical cores) with testpmd in IO mode.
For both CX-5 and CX-6, 2 × 100 Gbps at 64B requires MPRQ and compression to move more packets into and out of the system.
A multitude of configuration tuning is required to achieve these high numbers; please refer to the Stack Overflow question and the DPDK MLX tuning parameters for more information.
PCIe Gen4 bandwidth is not the limiting factor; rather, the NIC ASIC with its internal embedded switch produces the behaviour mentioned above. Hence, to overcome this limitation one needs to use PMD arguments to activate the hardware features, which in turn increases the CPU overhead of PMD processing: extra work (needing more CPU) is required to unpack the compressed, inlined multi-packet entries into single DPDK mbufs. This is why more threads are required when using these PMD arguments.
note:
Test application: testpmd
EAL Args: --in-memory --no-telemetry --no-shconf --single-file-segments --file-prefix=2 -l 7,8-31
PMD args vector: none
PMD args for 2 * 100Gbps line rate: txq_inline_mpw=204,txqs_min_inline=1,mprq_en=1,rxqs_min_mprq=1,mprq_log_stride_num=12,rxq_pkt_pad_en=1,rxq_cqe_comp_en=4

Bad Disk performance after moving from Ubuntu to Centos 7

A relatively old Dell R620 server (32 cores / 128 GB RAM) had been working perfectly for years with Ubuntu. Plain OS install, no virtualization.
2 system disks in mirror (XFS)
6 RAID 5 disks for /var (XFS)
The server is used for a nightly check of a MySQL Xtrabackup file.
Before the format and move to CentOS 7 the process would finish by 08:00; now it runs until well past noon.
99% of the job is opening a large tar.gz file.
htop: there are only two processes doing anything:
1. gzip -d : about 20% CPU
2. tar zxf Xtrabackup.tar.gz : about 4-7% CPU
iotop: steady at around 3 MB/s read / 20-25 MB/s write, which is about 25% of the minimum I would expect.
Memory : Used : 1GB of 128GB
Server is fully updated both OS / HW / Firmware including the disks firmware.
IDRAC shows no problems.
Bottom line : Server is not working hard (to say the least) but performance is way off.
Any ideas would be appreciated.
vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 2 0 469072 0 130362040 0 0 57 341 0 0 0 0 98 2 0
0 2 0 456916 0 130374568 0 0 3328 24576 1176 3241 2 1 94 4 0
You have blocked processes and also I/O (around 20 MB/s), which suggests to me that a few processes are concurrently accessing the disk. What you can do to improve the performance is, instead of
tar zxf Xtrabackup.tar.gz
use
gzip -dc Xtrabackup.tar.gz | tar xvf -
The second form adds parallelism and can benefit from multiple processors. You can also benefit from increasing the pipe (FIFO) buffer; check this answer for some ideas.
Also consider tuning the filesystem where the output files of tar are stored.
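A minimal end-to-end sketch of the suggested pipeline (file and directory names here are hypothetical, just for demonstration). Note the -c flag: without it, gzip -d decompresses the file in place and writes nothing to the pipe.

```shell
# Build a small sample archive to extract (hypothetical paths)
mkdir -p /tmp/demo_src /tmp/demo_out
echo "backup payload" > /tmp/demo_src/data.txt
tar czf /tmp/demo.tar.gz -C /tmp demo_src

# Decompress and extract as two separate processes joined by a pipe,
# so gzip and tar can run on different CPUs; -dc streams the
# decompressed bytes to stdout and leaves the archive intact.
gzip -dc /tmp/demo.tar.gz | tar xf - -C /tmp/demo_out

cat /tmp/demo_out/demo_src/data.txt
```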

Disable timer interrupt on Linux kernel

I want to disable the timer interrupt on some of the cores (1-2) of my machine, an x86 box running CentOS 7 with the RT patch. Both cores are isolated with nohz_full (see the cmdline below), but the timer interrupt continues to interrupt the real-time processes running on cores 1 and 2.
1. uname -r
3.10.0-693.11.1.rt56.632.el7.x86_64
2. cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-693.11.1.rt56.632.el7.x86_64 \
root=/dev/mapper/centos-root ro crashkernel=auto \
rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet \
default_hugepagesz=2M hugepagesz=2M hugepages=1024 \
intel_iommu=on isolcpus=1-2 irqaffinity=0 intel_idle.max_cstate=0 \
processor.max_cstate=0 idle=mwait tsc=perfect rcu_nocbs=1-2 rcu_nocb_poll \
nohz_full=1-2 nmi_watchdog=0
3. cat /proc/interrupts
CPU0 CPU1 CPU2
0: 29 0 0 IO-APIC-edge timer
.....
......
NMI: 0 0 0 Non-maskable interrupts
LOC: 835205157 308723100 308384525 Local timer interrupts
SPU: 0 0 0 Spurious interrupts
PMI: 0 0 0 Performance monitoring interrupts
IWI: 0 0 0 IRQ work interrupts
RTR: 0 0 0 APIC ICR read retries
RES: 347330843 309191325 308417790 Rescheduling interrupts
CAL: 0 935 935 Function call interrupts
TLB: 320 22 58 TLB shootdowns
TRM: 0 0 0 Thermal event interrupts
THR: 0 0 0 Threshold APIC interrupts
DFR: 0 0 0 Deferred Error APIC interrupts
MCE: 0 0 0 Machine check exceptions
MCP: 2 2 2 Machine check polls
CPUs/Clocksource:
4. lscpu | grep CPU.s
CPU(s): 3
On-line CPU(s) list: 0-2
NUMA node0 CPU(s): 0-2
5. cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
Thanks a lot for any help.
Moses
Even with nohz_full= you get some ticks on the isolated CPUs:
Some process-handling operations still require the occasional scheduling-clock tick. These operations include calculating CPU load, maintaining sched average, computing CFS entity vruntime, computing avenrun, and carrying out load balancing. They are currently accommodated by scheduling-clock tick every second or so. On-going work will eliminate the need even for these infrequent scheduling-clock ticks.
(Documentation/timers/NO_HZ.txt, cf. (Nearly) full tickless operation in 3.10 LWN, 2013)
Thus, you have to check the rate of the local timer, e.g.:
$ perf stat -a -A -e irq_vectors:local_timer_entry sleep 120
(while your isolated threads/processes are running)
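If perf is unavailable, the tick rate can also be estimated from the LOC row of /proc/interrupts. A rough Linux-only sketch (the awk column picks the CPU: $2 is CPU0, $3 would be CPU1, and so on):

```shell
# Sample the per-CPU local-timer counters twice, one second apart.
# The "LOC:" row of /proc/interrupts counts local APIC timer interrupts.
before=$(awk '/LOC:/ {print $2}' /proc/interrupts)   # column $2 = CPU0
sleep 1
after=$(awk '/LOC:/ {print $2}' /proc/interrupts)
echo "CPU0 local timer ticks in ~1s: $((after - before))"
```

On a properly isolated nohz_full core with a single runnable task, the delta should drop to roughly one tick per second.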
Also, nohz_full= is only effective if there is just one runnable task on each isolated core. You can check that with e.g. ps -L -e -o pid,tid,user,state,psr,cmd and cat /proc/sched_debug.
Perhaps you need to move some (kernel) tasks to your house-keeping core, e.g.:
# tuna -U -t '*' -c 0-4 -m
You can get more insights into what timers are still active by looking at /proc/timer_list.
Another method to investigate causes for possible interruption is to use the functional tracer (ftrace). See also Reducing OS jitter due to per-cpu kthreads for some examples.
I see nmi_watchdog=0 in your kernel parameters, but you don't disable the soft watchdog. Perhaps this is another timer tick source that would show up with ftrace.
You can disable all watchdogs with nowatchdog.
Btw, some of your kernel parameters seem to be off:
tsc=perfect - do you mean tsc=reliable? The 'perfect' value isn't documented in the kernel docs
idle=mwait - do you mean idle=poll? Again, I can't find the 'mwait' value in the kernel docs
intel_iommu=on - what's the purpose of this?

Linpack sometimes starting, sometimes not, but nothing changed

I installed Linpack on a 2-Node cluster with Xeon processors. Sometimes if I start Linpack with this command:
mpiexec -np 28 -print-rank-map -f /root/machines.HOSTS ./xhpl_intel64
Linpack starts and prints the output; sometimes I only see the MPI mappings printed and then nothing follows. To me this seems like random behaviour, because I don't change anything between the calls and, as already mentioned, Linpack sometimes starts, sometimes not.
In top I can see that xhpl_intel64 processes have been created and are heavily using the CPU, but when watching the traffic between the nodes, iftop tells me that nothing is sent.
I am using MPICH2 as MPI implementation. This is my HPL.dat:
# cat HPL.dat
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
1 # of problems sizes (N)
10000 Ns
1 # of NBs
250 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)
2 Ps
14 Qs
16.0 threshold
1 # of panel fact
2 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
4 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
1 # of recursive panel fact.
1 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
1 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 DEPTHs (>=0)
2 SWAP (0=bin-exch,1=long,2=mix)
64 swapping threshold
0 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
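One quick sanity check on the HPL.dat above: the process grid P × Q must be covered by the rank count passed to -np (ranks beyond P × Q sit idle), which is why the original run uses 28 ranks. A trivial sketch with the values from the file:

```shell
# P and Q from the "process grids" section of HPL.dat above
P=2
Q=14
echo "HPL process grid needs $((P * Q)) MPI ranks (-np $((P * Q)))"
```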
edit2:
I just let the program run for a while, and after 30 minutes it tells me:
# mpiexec -np 32 -print-rank-map -f /root/machines.HOSTS ./xhpl_intel64
(node-0:0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
(node-1:16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31)
Assertion failed in file ../../socksm.c at line 2577: (it_plfd->revents & 0x008) == 0
internal ABORT - process 0
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
Is this an MPI problem?
Do you know what type of problem this could be?
I figured out what the problem was: MPICH2 uses different random ports each time it starts, and if these are blocked your application won't start up correctly.
The solution for MPICH2 is to set the environment variable MPICH_PORT_RANGE to START:END, like this:
export MPICH_PORT_RANGE=50000:51000
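The same range then has to be reachable between the nodes. A small sketch that parses the variable (the firewall rule in the comment is illustrative and distribution-specific):

```shell
export MPICH_PORT_RANGE=50000:51000
start=${MPICH_PORT_RANGE%%:*}   # text before the first ':'
end=${MPICH_PORT_RANGE##*:}     # text after the last ':'
echo "MPICH2 will bind listening sockets to ports $start-$end"
# The same range must be open on every node, e.g. with iptables:
#   iptables -A INPUT -p tcp --dport 50000:51000 -j ACCEPT
```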
Best,
heinrich

Asynchronous Read/Writing with libraw1394

I'm trying to get two computers to communicate with each other over firewire. Both of the computers are running Ubuntu 9.10 and both have read/write access to the /dev/raw1394 node. I'm using firecontrol to quickly test sending read/write requests. If I can get it to work with firecontrol, I should be able to figure out how to do the same in my code.
On computer A, I do this:
computerA $ ./commander
Work Now
Copyright (C) 2002-2007 by Manfred Weihs
This software comes with absolutely no warranty.
No adapter specified!
successfully got handle
current generation number (driver): 1
1 card(s) found
nodes on bus: 2, card name: ohci1394
using adapter 0
found: 2 nodes on bus, local ID is 1, IRM is 1
current generation number (adapter): 7
entering command mode
Type 'help' for more information!
Command: w . 0 0 0xDE
insufficient arguments for operation!
Command: w . 0 0 2 0xDe
writing to node 0, bus 1023, offset 000000000000 2 bytes:
00 DE
write succeeded.
Ack code: complete
Since computer A is on node 1, I send to node 0. Then I go to computer B and read from node 0 and get this:
computerB $ ./commander
Copyright (C) 2002-2007 by Manfred Weihs
This software comes with absolutely no warranty.
No adapter specified!
successfully got handle
current generation number (driver): 1
1 card(s) found
nodes on bus: 2, card name: ohci1394
using adapter 0
found: 2 nodes on bus, local ID is 0, IRM is 1
current generation number (adapter): 9
entering command mode
Type 'help' for more information!
Command: r . 0 0 1
reading from node 0, bus 1023, offset 000000000000 1 bytes
read failed.
Ack code: pending; Response code: address error
I'm using the same offset for both of them. What am I doing wrong and how am I supposed to read/write from/to firewire nodes?
I have these same problems when I try to use raw1394 in my own code.
