ESP32 WiFi not functioning - esp32

I have a Frankenstein ESP32 setup using my dead Wemos Lolin32 board and an external Ai-Thinker ESP32 chip.
It had been working alright until recently, when it started failing to connect to any WiFi and dumping garbage data on serial for anything beyond the WiFi.begin() call. It occasionally connects, in roughly 1 out of 15 reboots. It also sometimes seems to stall the UART before triggering a reboot, after which it works.
I have been using it with a GC9A01 1.28-inch TFT display. I may have damaged the chip while making the connections, but I can't be sure. If I don't use WiFi, most other functionality seems to work fine.
This is a sample of the many sketches I have tried:
#include <Arduino.h>
#include <WiFi.h>
#include "time.h"

const char *ssid = "VUMA FIBER ";
const char *password = "mysecurepassword";
const char *ntpServer = "pool.ntp.org";
const long gmtOffset_sec = 3 * 3600;
const int daylightOffset_sec = 0;
hw_timer_t *My_timer = NULL;

class Timehandler
{
private:
    hw_timer_t *My_timer = NULL;
    uint8_t counter = 0;

public:
    uint8_t hour;
    uint8_t minutes;
    uint8_t seconds;
    unsigned int year;
    uint8_t date;
    uint8_t day;
    struct tm timeinfo;

    bool fetchtime(uint8_t gmtOffset = 0, uint8_t daylightOffset_sec = 0, const char *ntpServer = "pool.ntp.org")
    {
        // configTime(gmtOffset * 3600, daylightOffset_sec, ntpServer);
        // struct tm timeinfo;
        // if (!getLocalTime(&timeinfo)) {
        //     return false;
        // } else {
        Serial.println(&timeinfo, "%A, %B %d %Y %H:%M:%S");
        return true;
        // }
    }

    void maintainTime()
    {
        seconds++;
        if (seconds >= 60)
        {
            minutes++;
            seconds = 0;
        }
        if (minutes >= 60)
        {
            hour++;
            minutes = 0;
        }
        if (hour >= 24)
        {
            date++;
            hour = 0;
        }
        Serial.printf("%02d", hour);
        Serial.print(":");
        Serial.printf("%02d", minutes);
        Serial.print(":");
        Serial.printf("%02d", seconds);
        Serial.println("");
    }

    void getTime()
    {
        struct tm timeinfo;
        if (!getLocalTime(&timeinfo))
        {
            Serial.println("Failed to obtain time");
            return;
        }
        hour = timeinfo.tm_hour;
        minutes = timeinfo.tm_min;
        seconds = timeinfo.tm_sec;
        year = timeinfo.tm_year;
        date = timeinfo.tm_mday;
        day = timeinfo.tm_wday;
    }
};

bool connected = false;
Timehandler t;

void setup()
{
    Serial.begin(115200);
    // connect to WiFi
    Serial.print("Connecting to ");
    Serial.println(ssid);
    WiFi.begin(ssid, password);
    t.fetchtime();
    t.getTime();
}

void loop()
{
    if (!connected)
    {
        Serial.println("failed...retrying");
        if (WiFi.status() == WL_CONNECTED)
        {
            Serial.println(" CONNECTED");
            while (!t.fetchtime(3))
            {
                Serial.println("failed...retrying");
                delay(500);
                t.fetchtime(3);
                WiFi.disconnect(true);
                WiFi.mode(WIFI_OFF);
            }
            t.getTime();
            connected = true;
        }
    }
    else
    {
        t.maintainTime();
    }
    t.maintainTime();
    delay(1000);
}
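For comparison, here is a minimal connect-and-sync sketch of the flow the code above is attempting (an illustration for reference, not the code actually flashed; it assumes the same Arduino-ESP32 core and credentials). It waits for WL_CONNECTED before calling configTime, so NTP is only queried once the link is actually up:

#include <Arduino.h>
#include <WiFi.h>
#include "time.h"

const char *ssid = "VUMA FIBER ";           // same credentials as above
const char *password = "mysecurepassword";

void setup()
{
    Serial.begin(115200);
    WiFi.begin(ssid, password);
    // Block until association actually succeeds.
    while (WiFi.status() != WL_CONNECTED)
    {
        delay(500);
        Serial.print(".");
    }
    Serial.println(" CONNECTED");
    // Configure NTP only after the link is up (UTC+3, no DST).
    configTime(3 * 3600, 0, "pool.ntp.org");
    struct tm timeinfo;
    if (getLocalTime(&timeinfo))
    {
        Serial.println(&timeinfo, "%A, %B %d %Y %H:%M:%S");
    }
    else
    {
        Serial.println("Failed to obtain time");
    }
}

void loop()
{
    delay(1000);
}

A sketch like this only separates association failures from NTP failures; it cannot rule out the hardware damage suspected above.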
Here is a sample of the serial output.
CURRENT: upload_protocol = esptool
Looking for upload port...
Auto-detected: COM5
Uploading .pio\build\lolin32\firmware.bin
esptool.py v3.1
Serial port COM5
Connecting....
Chip is ESP32-D0WD (revision 1)
Features: WiFi, BT, Dual Core, 240MHz, VRef calibration in efuse, Coding Scheme None
Crystal is 40MHz
MAC: 94:3c:c6:10:4b:30
Uploading stub...
Running stub...
Stub running...
Changing baud rate to 460800
Changed.
Configuring flash size...
Auto-detected Flash size: 4MB
Flash will be erased from 0x00001000 to 0x00005fff...
Flash will be erased from 0x00008000 to 0x00008fff...
Flash will be erased from 0x0000e000 to 0x0000ffff...
Flash will be erased from 0x00010000 to 0x000abfff...
Compressed 17120 bytes to 11164...
Writing at 0x00001000... (100 %)
Wrote 17120 bytes (11164 compressed) at 0x00001000 in 0.6 seconds (effective 213.4 kbit/s)...
Hash of data verified.
Compressed 3072 bytes to 128...
Writing at 0x00008000... (100 %)
Wrote 3072 bytes (128 compressed) at 0x00008000 in 0.1 seconds (effective 247.8 kbit/s)...
Hash of data verified.
Compressed 8192 bytes to 47...
Writing at 0x0000e000... (100 %)
Wrote 8192 bytes (47 compressed) at 0x0000e000 in 0.2 seconds (effective 397.7 kbit/s)...
Hash of data verified.
Compressed 638784 bytes to 391514...
Writing at 0x00010000... (4 %)
Writing at 0x0001bce6... (8 %)
Writing at 0x00029453... (12 %)
Writing at 0x00031fdd... (16 %)
Writing at 0x00037279... (20 %)
Writing at 0x0003c526... (25 %)
Writing at 0x000419cd... (29 %)
Writing at 0x00046f5c... (33 %)
Writing at 0x0004c242... (37 %)
Writing at 0x000533e8... (41 %)
Writing at 0x0005ba7c... (45 %)
Writing at 0x00061195... (50 %)
Writing at 0x00066a45... (54 %)
Writing at 0x0006bdb0... (58 %)
Writing at 0x000715e8... (62 %)
Writing at 0x00077550... (66 %)
Writing at 0x0007cdc5... (70 %)
Writing at 0x000830c2... (75 %)
Writing at 0x00088d20... (79 %)
Writing at 0x0008ebd9... (83 %)
Writing at 0x00094b83... (87 %)
Writing at 0x0009ab4a... (91 %)
Writing at 0x000a0842... (95 %)
Writing at 0x000a6686... (100 %)
Wrote 638784 bytes (391514 compressed) at 0x00010000 in 10.5 seconds (effective 486.4 kbit/s)...
Hash of data verified.
Leaving...
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:10124
load:0x40080400,len:5856
entry 0x400806a8
Connecting to VUMA FIBER
���n���b�l�|�␒b␒␂␌␌␌ll␌␌␌␌␌�␌␌�␌␌l`␃␜␒␒nn�␐␂␌�n�np�␒␒nn␌��r��`␃␜␒␒no�␐�bbb|␒�b␒␒on�␎l�␌�␌␌�␌ll`␃␜␒␒oo�␐�bbc|␒�b␓␒nn�␎l�␌�␌␌�␌�l`␃␜␒␒no�␐�bbb|␒�b␒␒nn�␎l�␌�␌␌�␌�l
[... several screenfuls of the same repeating garbage trimmed ...]
ets Jun 8 2016 00:22:57
rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:10124
load:0x40080400,len:5856
entry 0x400806a8
ets Jun 8 2016 00:22:57
rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:10124
load:0x40080400,len:5856
entry 0x400806a8
Connecting to VUMA FIBER
Sunday, January 00 1900 00:00:00
Failed to obtain time
failed...retrying
00:00:01
failed...retrying
00:00:02
failed...retrying
00:00:03
failed...retrying
CONNECTED
Sunday, January 00 1900 00:00:00
Failed to obtain time
00:00:04
00:00:05
00:00:06
[... ticks up one second at a time ...]
00:00:41
00:00:42
Any help would be appreciated.


SIGXCPU raised by setrlimit RLIMIT_CPU later than expected in a virtual machine

[EDIT: added MCVE in the text, clarifications]
I have the following program that sets RLIMIT_CPU to 2 seconds using setrlimit() and catches the signal. RLIMIT_CPU limits CPU time. «When the process reaches the soft limit, it is sent a SIGXCPU signal. The default action for this signal is to terminate the process. However, the signal can be caught, and the handler can return control to the main program.» (man)
The following program sets RLIMIT_CPU and a signal handler for SIGXCPU, then it generates random numbers until SIGXCPU gets raised, the signal handler simply exits the program.
test_signal.cpp
/*
 * Test program for signal handling on CMS.
 *
 * Compile with:
 *   /usr/bin/g++ [-DDEBUG] -Wall -std=c++11 -O2 -pipe -static -s \
 *       -o test_signal test_signal.cpp
 *
 * The option -DDEBUG activates some debug logging in the helpers library.
 */
#include <iostream>
#include <fstream>
#include <random>
#include <chrono>

#include <unistd.h>
#include <csignal>
#include <sys/time.h>
#include <sys/resource.h>

using namespace std;

namespace helpers {
    long long start_time = -1;
    volatile sig_atomic_t timeout_flag = false;
    unsigned const timelimit = 2; // soft limit on CPU time (in seconds)

    void setup_signal(void);
    void setup_time_limit(void);
    static void signal_handler(int signum);
    long long get_elapsed_time(void);
    bool has_reached_timeout(void);
    void setup(void);
}

namespace {
    unsigned const minrand = 5;
    unsigned const maxrand = 20;
    int const numcycles = 5000000;
};

/*
 * Very simple debugger, enabled at compile time with -DDEBUG.
 * If enabled, it prints on stderr, otherwise it does nothing (it does not
 * even evaluate the expression on its right-hand side).
 *
 * Main ideas taken from:
 * - C++ enable/disable debug messages of std::couts on the fly
 *   (https://stackoverflow.com/q/3371540/2377454)
 * - Standard no-op output stream
 *   (https://stackoverflow.com/a/11826787/2377454)
 */
#ifdef DEBUG
#define debug true
#else
#define debug false
#endif

#define debug_logger if (!debug) \
                         {} \
                     else \
                         cerr << "[DEBUG] helpers::"

// conversion factor between seconds and nanoseconds
#define NANOS 1000000000

// signal to handle
#define SIGNAL SIGXCPU
#define TIMELIMIT RLIMIT_CPU

/*
 * This could be a function factory returning a closure over the
 * signal-handling function, so that we could explicitly pass the output
 * ofstream and close it. Alas, C++ supports closures only for lambdas and,
 * at the moment, we also need the signal-handling function to be a pointer
 * to a function; lambdas are a different kind of object and cannot be
 * converted. See:
 * - Passing lambda as function pointer
 *   (https://stackoverflow.com/a/28746827/2377454)
 */
void helpers::signal_handler(int signum) {
    helpers::timeout_flag = true;

    debug_logger << "signal_handler:\t" << "signal " << signum \
                 << " received" << endl;
    debug_logger << "signal_handler:\t" << "exiting after " \
                 << helpers::get_elapsed_time() << " microseconds" << endl;

    exit(0);
}

/*
 * Set function signal_handler() as handler for SIGXCPU using sigaction. See
 * - https://stackoverflow.com/q/4863420/2377454
 * - https://stackoverflow.com/a/17572787/2377454
 */
void helpers::setup_signal() {
    debug_logger << "set_signal:\t" << "set_signal() called" << endl;

    struct sigaction new_action;

    // Set the handler in the new_action struct
    new_action.sa_handler = signal_handler;

    // Set sa_mask to empty: no signal is blocked while the handler runs.
    sigemptyset(&new_action.sa_mask);

    // Block the SIGXCPU signal: while the handler runs, SIGXCPU is ignored.
    sigaddset(&new_action.sa_mask, SIGNAL);

    // Remove any flag from sa_flags
    new_action.sa_flags = 0;

    // Set new action
    sigaction(SIGNAL, &new_action, NULL);

    if(debug) {
        struct sigaction tmp;

        // read the old action associated to SIGXCPU
        sigaction(SIGNAL, NULL, &tmp);

        debug_logger << "set_signal:\t" << "action.sa_handler: " \
                     << tmp.sa_handler << endl;
    }

    return;
}

/*
 * Set soft CPU time limit.
 * RLIMIT_CPU sets the CPU time limit in seconds.
 * See:
 * - https://www.go4expert.com/articles/
 *       getrlimit-setrlimit-control-resources-t27477/
 * - https://gist.github.com/Leporacanthicus/11086960
 */
void helpers::setup_time_limit(void) {
    debug_logger << "set_limit:\t\t" << "set_limit() called" << endl;

    struct rlimit limit;

    if(getrlimit(TIMELIMIT, &limit) != 0) {
        perror("error calling getrlimit()");
        exit(EXIT_FAILURE);
    }

    limit.rlim_cur = helpers::timelimit;
    if(setrlimit(TIMELIMIT, &limit) != 0) {
        perror("error calling setrlimit()");
        exit(EXIT_FAILURE);
    }

    if (debug) {
        struct rlimit tmp;
        getrlimit(TIMELIMIT, &tmp);

        debug_logger << "set_limit:\t\t" << "current limit: " << tmp.rlim_cur \
                     << " seconds" << endl;
    }

    return;
}

void helpers::setup(void) {
    struct timespec start;
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start)) {
        exit(EXIT_FAILURE);
    }
    start_time = start.tv_sec*NANOS + start.tv_nsec;

    setup_signal();
    setup_time_limit();

    return;
}

long long helpers::get_elapsed_time(void) {
    struct timespec current;
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &current)) {
        exit(EXIT_FAILURE);
    }
    long long current_time = current.tv_sec*NANOS + current.tv_nsec;

    long long elapsed_micro = (current_time - start_time)/1000 + \
                              ((current_time - start_time) % 1000 >= 500);

    return elapsed_micro;
}

bool helpers::has_reached_timeout(void) {
    return helpers::timeout_flag;
}

int main() {
    helpers::setup();

    ifstream in("input.txt");
    in.close();

    ofstream out("output.txt");

    random_device rd;
    mt19937 eng(rd());
    uniform_int_distribution<> distr(minrand, maxrand);

    int i = 0;
    while(!helpers::has_reached_timeout()) {
        int nmsec;
        for(int n=0; n<numcycles; n++) {
            nmsec = distr(eng);
        }

        cout << "i: " << i << "\t- nmsec: " << nmsec << "\t- ";
        out << "i: " << i << "\t- nmsec: " << nmsec << "\t- ";

        cout << "program has been running for " << \
                helpers::get_elapsed_time() << " microseconds" << endl;
        out << "program has been running for " << \
               helpers::get_elapsed_time() << " microseconds" << endl;

        i++;
    }

    return 0;
}
I compile it as follows:
/usr/bin/g++ -DDEBUG -Wall -std=c++11 -O2 -pipe -static -s -o test_signal test_signal.cpp
On my laptop it correctly gets a SIGXCPU after 2 seconds, see the output:
$ /usr/bin/time -v ./test_signal
[DEBUG] helpers::set_signal: set_signal() called
[DEBUG] helpers::set_signal: action.sa_handler: 1
[DEBUG] helpers::set_limit: set_limit() called
[DEBUG] helpers::set_limit: current limit: 2 seconds
i: 0 - nmsec: 11 - program has been running for 150184 microseconds
i: 1 - nmsec: 18 - program has been running for 294497 microseconds
i: 2 - nmsec: 9 - program has been running for 422220 microseconds
i: 3 - nmsec: 5 - program has been running for 551882 microseconds
i: 4 - nmsec: 20 - program has been running for 685373 microseconds
i: 5 - nmsec: 16 - program has been running for 816642 microseconds
i: 6 - nmsec: 9 - program has been running for 951208 microseconds
i: 7 - nmsec: 20 - program has been running for 1085614 microseconds
i: 8 - nmsec: 20 - program has been running for 1217199 microseconds
i: 9 - nmsec: 12 - program has been running for 1350183 microseconds
i: 10 - nmsec: 17 - program has been running for 1486431 microseconds
i: 11 - nmsec: 13 - program has been running for 1619845 microseconds
i: 12 - nmsec: 20 - program has been running for 1758074 microseconds
i: 13 - nmsec: 11 - program has been running for 1895408 microseconds
[DEBUG] helpers::signal_handler: signal 24 received
[DEBUG] helpers::signal_handler: exiting after 2003326 microseconds
Command being timed: "./test_signal"
User time (seconds): 1.99
System time (seconds): 0.00
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.01
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1644
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 59
Voluntary context switches: 1
Involuntary context switches: 109
Swaps: 0
File system inputs: 0
File system outputs: 16
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
If I compile and run in a virtual machine (VirtualBox, running Ubuntu), I get this:
$ /usr/bin/time -v ./test_signal
[DEBUG] helpers::set_signal: set_signal() called
[DEBUG] helpers::set_signal: action.sa_handler: 1
[DEBUG] helpers::set_limit: set_limit() called
[DEBUG] helpers::set_limit: current limit: 2 seconds
i: 0 - nmsec: 12 - program has been running for 148651 microseconds
i: 1 - nmsec: 13 - program has been running for 280494 microseconds
i: 2 - nmsec: 7 - program has been running for 428390 microseconds
i: 3 - nmsec: 5 - program has been running for 580805 microseconds
i: 4 - nmsec: 10 - program has been running for 714362 microseconds
i: 5 - nmsec: 19 - program has been running for 846853 microseconds
i: 6 - nmsec: 20 - program has been running for 981253 microseconds
i: 7 - nmsec: 7 - program has been running for 1114686 microseconds
i: 8 - nmsec: 7 - program has been running for 1249530 microseconds
i: 9 - nmsec: 12 - program has been running for 1392096 microseconds
i: 10 - nmsec: 20 - program has been running for 1531859 microseconds
i: 11 - nmsec: 19 - program has been running for 1667021 microseconds
i: 12 - nmsec: 13 - program has been running for 1818431 microseconds
i: 13 - nmsec: 17 - program has been running for 1973182 microseconds
i: 14 - nmsec: 7 - program has been running for 2115423 microseconds
i: 15 - nmsec: 20 - program has been running for 2255140 microseconds
i: 16 - nmsec: 13 - program has been running for 2394162 microseconds
i: 17 - nmsec: 10 - program has been running for 2528274 microseconds
i: 18 - nmsec: 15 - program has been running for 2667978 microseconds
i: 19 - nmsec: 8 - program has been running for 2803725 microseconds
i: 20 - nmsec: 9 - program has been running for 2940610 microseconds
i: 21 - nmsec: 19 - program has been running for 3075349 microseconds
i: 22 - nmsec: 14 - program has been running for 3215255 microseconds
i: 23 - nmsec: 5 - program has been running for 3356515 microseconds
i: 24 - nmsec: 5 - program has been running for 3497369 microseconds
[DEBUG] helpers::signal_handler: signal 24 received
[DEBUG] helpers::signal_handler: exiting after 3503271 microseconds
Command being timed: "./test_signal"
User time (seconds): 3.50
System time (seconds): 0.00
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.52
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1636
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 59
Voluntary context switches: 0
Involuntary context switches: 106
Swaps: 0
File system inputs: 0
File system outputs: 16
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Even running the binary compiled on my laptop, the process gets killed after around 3 seconds of elapsed user time.
Any idea of what could be causing this? For broader context, see this thread: https://github.com/cms-dev/cms/issues/851
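As a diagnostic aside (an added sketch, not part of the original post): RLIMIT_CPU is enforced from the kernel's own CPU accounting, which inside a VM need not track CLOCK_PROCESS_CPUTIME_ID one-to-one. Printing both side by side shows whether the two clocks diverge:

// cpu_clocks.cpp - compare getrusage() (the accounting RLIMIT_CPU uses)
// with CLOCK_PROCESS_CPUTIME_ID.
// Compile with: g++ -std=c++11 -O2 -o cpu_clocks cpu_clocks.cpp
#include <cstdio>
#include <ctime>
#include <sys/time.h>
#include <sys/resource.h>

int main() {
    // Burn some CPU time first.
    volatile unsigned long long x = 0;
    for (unsigned long long i = 0; i < 500000000ULL; i++) { x += i; }

    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);

    struct timespec ts;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);

    // If these figures diverge noticeably, SIGXCPU (driven by the former)
    // will arrive at a different moment than the latter suggests.
    printf("getrusage user+sys:       %ld.%06ld + %ld.%06ld s\n",
           (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
           (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
    printf("CLOCK_PROCESS_CPUTIME_ID: %ld.%09ld s\n",
           (long)ts.tv_sec, (long)ts.tv_nsec);
    return 0;
}

If the hypervisor's CPU accounting runs slow relative to the process CPU clock, the SIGXCPU delivery point shifts accordingly.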

High latency on Elasticsearch requests

I am using the Transport Client to index data. Here's a small test program:
@Test
public void LatencyTest() throws Exception {
    // warmup
    for (int i = 0; i < 10; i++) {
        client.prepareGet("ts_head", "doc", "latency_test_id").get();
    }

    List<Long> times = Lists.newArrayList();
    for (int i = 0; i < 100; i++) {
        final long t1 = System.nanoTime();
        final IndexResponse response = client.prepareIndex("ts_head", "doc", "latency_test_id")
                .setSource("{}", XContentType.JSON).get();
        final long t2 = System.nanoTime();
        times.add(t2 - t1);
    }

    final double average = times.stream().mapToLong(v -> v).average().getAsDouble();
    System.out.println("Average time: " + (average / 1_000_000f) + "ms");
}
The output is Average time: 79.3905588ms, which seems awfully slow. It accesses a cluster consisting of 2 nodes (both master) over a local network with a gigabit connection. The network itself doesn't seem to have high latency:
$ ping -U 172.***
PING 172.*** (172.***) 56(84) bytes of data.
64 bytes from 172.***: icmp_seq=1 ttl=128 time=0.324 ms
64 bytes from 172.***: icmp_seq=2 ttl=128 time=0.280 ms
64 bytes from 172.***: icmp_seq=3 ttl=128 time=0.348 ms
64 bytes from 172.***: icmp_seq=4 ttl=128 time=0.306 ms
Indexing via HTTP is pretty slow as well:
$ curl -w "@curl-format.txt" -X POST -s "http://172.***:9200/ts_head/doc/latency_test_id" --data '{}'
{"_index":"ts_head","_type":"doc","_id":"latency_test_id","_version":211,"result":"updated","_shards":{"total":2,"successful":2,"failed":0},"created":false}
   time_namelookup: 0,001381
      time_connect: 0,135607
   time_appconnect: 0,000000
  time_pretransfer: 0,136056
     time_redirect: 0,000000
time_starttransfer: 0,283840
        ----------
        time_total: 0,284044
Now my question is: What am I doing wrong? Elasticsearch must surely be capable of indexing faster, right?
For testing, I also read data; Elasticsearch reported "took": 1, but it still consistently took over 100 ms to respond via curl.
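For reference, a curl-format.txt matching the timing output above (the post does not include the file; this is an assumed reconstruction using standard curl --write-out variables) would look like:

     time_namelookup:  %{time_namelookup}\n
        time_connect:  %{time_connect}\n
     time_appconnect:  %{time_appconnect}\n
    time_pretransfer:  %{time_pretransfer}\n
       time_redirect:  %{time_redirect}\n
  time_starttransfer:  %{time_starttransfer}\n
                       ----------\n
          time_total:  %{time_total}\n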

Reading a file and output selected data

I'm trying to read a CSV file and output only selected fields (date, time and S (speed)). My program runs, but it loops in a weird way: it prints line 1, then lines 1 and 2, then lines 1, 2 and 3, and so on. The final record shown is also incorrect (0/3/2016 23:50). I need some guidance to rectify this problem.
Sample data (as seen in Notepad) is below. The file contains thousands of lines, but I picked only the last 3 to illustrate my problem. Ignore the "Line N:" prefixes; they are just there to tell you which line is which. Line 1 is the header, which I skip, followed by the data.
Line 1: WAST,DP,Dta,Dts,EV,QFE,QFF,QNH,RF,RH,S,SR,ST1,ST2,ST3,ST4,Sx,T
Line 2: 31/03/2016 23:30,11.4,257,11,0,1012.9,1016.4,1016.5,0,63.5,5,13,26.4,25.8,25.6,26.1,7,18.48
Line 3: 31/03/2016 23:40,11.4,250,15,0,1012.9,1016.4,1016.5,0,64.7,5,10,26.4,25.8,25.6,26.1,6,18.19
Line 4: 31/03/2016 23:50,11.4,243,5,0,1012.7,1016.2,1016.3,0,65.8,3,11,26.3,25.8,25.6,26.1,5,17.95
Current Output:
31/3/2016 23:30
Speed: 5
31/3/2016 23:30
Speed: 5
31/3/2016 23:40
Speed: 5
31/3/2016 23:30
Speed: 5
31/3/2016 23:40
Speed: 5
31/3/2016 23:50
Speed: 3
31/3/2016 23:30
Speed: 5
31/3/2016 23:40
Speed: 5
31/3/2016 23:50
Speed: 3
0/3/2016 23:50
Speed: 3
Max Speed: 5
Done!
Intended Output
31/3/2016 23:30
Speed: 5
31/3/2016 23:40
Speed: 5
31/3/2016 23:50
Speed: 3
Max Speed: 5
Done!
using namespace std;

typedef struct
{
    Date d;
    Time t;
    float speed;
} WindLogType;

//declare speed max function
ostream & operator <<(ostream &osObject, const WindLogType &w1);
istream & operator >>(istream &input, WindLogType &w1);

int main()
{
    string filename;
    ifstream input;
    filename = "Data.csv";
    input.open(filename.c_str());
    input.ignore(500, '\n');

    Vector<WindLogType> windlog;
    string line, line2, readDay, readMonth, readYear, readHour, readMinute;
    float sp;

    while (!input.eof())
    {
        getline(input, readDay, '/');
        getline(input, readMonth, '/');
        getline(input, readYear, ' ');
        getline(input, readHour, ':');
        getline(input, readMinute, ',');

        int day1 = atoi(readDay.c_str());
        int month1 = atoi(readMonth.c_str());
        int year1 = atoi(readYear.c_str());
        int hour1 = atoi(readHour.c_str());
        int minute1 = atoi(readMinute.c_str());

        float s1;
        for (int i = 0; i < 10; i++)
        {
            input >> s1;
            input.ignore(50, ',');
        }

        WindLogType T1;                 //create a record
        T1.d.setDate(day1, month1, year1);
        T1.t.setTime(hour1, minute1);
        T1.speed = s1;
        windlog.push_back(T1);          //push inside vector
        windlog.print();
        getline(input, line2, '\n');
    }

    float maxSpeed;
    WindLogType H1;
    H1 = windlog.at(0);
    maxSpeed = H1.speed;
    for (int i = 0; i < windlog.size(); i++)
    {
        if (windlog.at(i).speed > maxSpeed)
        {
            maxSpeed = windlog.at(i).speed;
        }
    }
    cout << "Max Speed: " << maxSpeed << endl;
    cout << "Done!" << endl;
    return 0;
}

ostream & operator <<(ostream &osObject, const WindLogType &w1)
{
    osObject << w1.d << w1.t << endl << "Speed: " << w1.speed;
    return osObject;
}

istream & operator >>(istream &input, WindLogType &w1)
{
    input >> w1.d >> w1.t >> w1.speed;
    return input;
}
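For what it's worth, two issues are visible in the loop above: windlog.print() is called inside the read loop, so the whole log is reprinted after every record (line 1, then lines 1-2, then 1-2-3, ...), and the while (!input.eof()) pattern runs one extra iteration at end of file, producing the bogus 0/3/2016 record. A sketch of a replacement read loop avoiding both (it assumes the same Date/Time helpers and custom Vector as above, plus #include <sstream>) might look like:

// Read line-by-line, parse each line from a stringstream, print once.
string rawline;
while (getline(input, rawline))                // stops cleanly at end of file
{
    if (rawline.empty()) continue;             // skip blank trailing lines

    stringstream ss(rawline);
    getline(ss, readDay, '/');
    getline(ss, readMonth, '/');
    getline(ss, readYear, ' ');
    getline(ss, readHour, ':');
    getline(ss, readMinute, ',');

    float s1 = 0;
    for (int i = 0; i < 10; i++)               // S is the 10th field after the date
    {
        ss >> s1;
        ss.ignore(50, ',');
    }

    WindLogType T1;
    T1.d.setDate(atoi(readDay.c_str()), atoi(readMonth.c_str()),
                 atoi(readYear.c_str()));
    T1.t.setTime(atoi(readHour.c_str()), atoi(readMinute.c_str()));
    T1.speed = s1;
    windlog.push_back(T1);
    cout << T1 << endl;                        // print only the new record
}

Parsing from a per-line stringstream also means one malformed line cannot desynchronize the field-by-field getline calls for the rest of the file.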

Minimize cudaDeviceSynchronize launch overhead

I'm currently doing a project with CUDA where a pipeline is refreshed with 200-10000 new events every 1 ms. Each time, I want to call one (or two) kernels which compute a small list of outputs, then feed those outputs to the next element of the pipeline.
The theoretical flow is:
receive data in an std::vector
cudaMemcpy the vector to GPU
processing
generate small list of outputs
cudaMemcpy to the output std::vector
But when I call cudaDeviceSynchronize on a 1-block/1-thread empty kernel with no processing, it already takes on average 0.7 to 1.4 ms, which is higher than my 1 ms timeframe.
I could eventually change the timeframe of the pipeline to receive events every 5 ms, but with 5x more events each time. That wouldn't be ideal, though.
What would be the best way to minimize the overhead of cudaDeviceSynchronize? Could streams be helpful in this situation? Or is there another solution to run the pipeline efficiently?
(Jetson TK1, compute capabilities 3.2)
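For reference, the stream-based variant the question asks about would look roughly like the sketch below (an illustration with a placeholder kernel, not the project's code; the host buffers must be pinned with cudaHostAlloc for the async copies to actually be asynchronous). cudaStreamSynchronize waits only for the work queued on one stream rather than the whole device:

#include <cstdio>
#include <cuda_runtime.h>

__global__ void process(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;           // placeholder processing
}

int main(void) {
    const int n = 10000;                        // upper bound of events per 1 ms tick
    float *h_in, *h_out, *d_in, *d_out;

    // Pinned host memory: required for cudaMemcpyAsync to overlap/async.
    cudaHostAlloc((void**)&h_in,  n * sizeof(float), cudaHostAllocDefault);
    cudaHostAlloc((void**)&h_out, n * sizeof(float), cudaHostAllocDefault);
    cudaMalloc((void**)&d_in,  n * sizeof(float));
    cudaMalloc((void**)&d_out, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    for (int tick = 0; tick < 5; ++tick) {      // one iteration per 1 ms batch
        cudaMemcpyAsync(d_in, h_in, n * sizeof(float),
                        cudaMemcpyHostToDevice, stream);
        process<<<(n + 255) / 256, 256, 0, stream>>>(d_in, d_out, n);
        cudaMemcpyAsync(h_out, d_out, n * sizeof(float),
                        cudaMemcpyDeviceToHost, stream);
        cudaStreamSynchronize(stream);          // wait for this batch only
    }

    cudaStreamDestroy(stream);
    cudaFreeHost(h_in); cudaFreeHost(h_out);
    cudaFree(d_in); cudaFree(d_out);
    return 0;
}

On a single stream the synchronization cost itself may not shrink much; the gain comes when several batches are in flight on different streams, or when a per-stream sync replaces a device-wide one that would otherwise wait on unrelated work.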
Here's an nvprof log of the application:
==8285== NVPROF is profiling process 8285, command: python player.py test.rec
==8285== Profiling application: python player.py test.rec
==8285== Profiling result:
Time(%) Time Calls Avg Min Max Name
94.92% 47.697ms 5005 9.5290us 1.7500us 13.083us reset_timesurface(__int64, __int64*, __int64*, __int64*, __int64*, float*, float*, bool*, bool*, Event*)
5.08% 2.5538ms 8 319.23us 99.750us 413.42us [CUDA memset]
==8285== API calls:
Time(%) Time Calls Avg Min Max Name
75.00% 5.03966s 5005 1.0069ms 25.083us 11.143ms cudaDeviceSynchronize
17.44% 1.17181s 5005 234.13us 83.750us 3.1391ms cudaLaunch
4.71% 316.62ms 9 35.180ms 23.083us 314.99ms cudaMalloc
2.30% 154.31ms 50050 3.0830us 1.0000us 2.6866ms cudaSetupArgument
0.52% 34.857ms 5005 6.9640us 2.5000us 464.67us cudaConfigureCall
0.02% 1.2048ms 8 150.60us 71.917us 183.33us cudaMemset
0.01% 643.25us 83 7.7490us 1.3330us 287.42us cuDeviceGetAttribute
0.00% 12.916us 2 6.4580us 2.0000us 10.916us cuDeviceGetCount
0.00% 5.3330us 1 5.3330us 5.3330us 5.3330us cuDeviceTotalMem
0.00% 4.0830us 1 4.0830us 4.0830us 4.0830us cuDeviceGetName
0.00% 3.4160us 2 1.7080us 1.5830us 1.8330us cuDeviceGet
A small reconstruction of the program (nvprof log at the end). For some reason, the average of cudaDeviceSynchronize is 4 times lower here, but it's still really high for an empty 1-thread kernel:
/* Compile with `nvcc test.cu -I.`
 * with -I pointing to "helper_cuda.h" and "helper_string.h" from CUDA samples
 **/
#include <iostream>
#include <cuda.h>
#include <helper_cuda.h>

#define MAX_INPUT_BUFFER_SIZE 131072

typedef struct {
    unsigned short x;
    unsigned short y;
    short a;
    long long b;
} Event;

long long *d_a_[2], *d_b_[2];
float *d_as_, *d_bs_;
bool *d_some_bool_[2];
Event *d_data_;
int width_ = 320;
int height_ = 240;

__global__ void reset_timesurface(long long ts,
                                  long long *d_a_0, long long *d_a_1,
                                  long long *d_b_0, long long *d_b_1,
                                  float *d_as, float *d_bs,
                                  bool *d_some_bool_0, bool *d_some_bool_1,
                                  Event *d_data) {
    // nothing here
}

void reset_errors(long long ts) {
    static const int n = 1024;
    static const dim3 grid_size(width_ * height_ / n
                                + (width_ * height_ % n != 0), 1, 1);
    static const dim3 block_dim(n, 1, 1);

    reset_timesurface<<<1, 1>>>(ts, d_a_[0], d_a_[1],
                                d_b_[0], d_b_[1],
                                d_as_, d_bs_,
                                d_some_bool_[0], d_some_bool_[1], d_data_);
    cudaDeviceSynchronize();
    // static long long *h_holder = (long long*)malloc(sizeof(long long) * 2000);
    // cudaMemcpy(h_holder, d_a_[0], 0, cudaMemcpyDeviceToHost);
}

int main(void) {
    checkCudaErrors(cudaMalloc(&(d_a_[0]), sizeof(long long)*width_*height_*2));
    checkCudaErrors(cudaMemset(d_a_[0], 0, sizeof(long long)*width_*height_*2));
    checkCudaErrors(cudaMalloc(&(d_a_[1]), sizeof(long long)*width_*height_*2));
    checkCudaErrors(cudaMemset(d_a_[1], 0, sizeof(long long)*width_*height_*2));
    checkCudaErrors(cudaMalloc(&(d_b_[0]), sizeof(long long)*width_*height_*2));
    checkCudaErrors(cudaMemset(d_b_[0], 0, sizeof(long long)*width_*height_*2));
    checkCudaErrors(cudaMalloc(&(d_b_[1]), sizeof(long long)*width_*height_*2));
    checkCudaErrors(cudaMemset(d_b_[1], 0, sizeof(long long)*width_*height_*2));
    checkCudaErrors(cudaMalloc(&d_as_, sizeof(float)*width_*height_*2));
    checkCudaErrors(cudaMemset(d_as_, 0, sizeof(float)*width_*height_*2));
    checkCudaErrors(cudaMalloc(&d_bs_, sizeof(float)*width_*height_*2));
    checkCudaErrors(cudaMemset(d_bs_, 0, sizeof(float)*width_*height_*2));
    checkCudaErrors(cudaMalloc(&(d_some_bool_[0]), sizeof(bool)*width_*height_*2));
    checkCudaErrors(cudaMemset(d_some_bool_[0], 0, sizeof(bool)*width_*height_*2));
    checkCudaErrors(cudaMalloc(&(d_some_bool_[1]), sizeof(bool)*width_*height_*2));
    checkCudaErrors(cudaMemset(d_some_bool_[1], 0, sizeof(bool)*width_*height_*2));
    checkCudaErrors(cudaMalloc(&d_data_, sizeof(Event)*MAX_INPUT_BUFFER_SIZE));

    for (int i = 0; i < 5005; ++i)
        reset_errors(16487L);

    cudaFree(d_a_[0]);
    cudaFree(d_a_[1]);
    cudaFree(d_b_[0]);
    cudaFree(d_b_[1]);
    cudaFree(d_as_);
    cudaFree(d_bs_);
    cudaFree(d_some_bool_[0]);
    cudaFree(d_some_bool_[1]);
    cudaFree(d_data_);
    cudaDeviceReset();
}
/* nvprof ./a.out
==9258== NVPROF is profiling process 9258, command: ./a.out
==9258== Profiling application: ./a.out
==9258== Profiling result:
Time(%) Time Calls Avg Min Max Name
92.64% 48.161ms 5005 9.6220us 6.4160us 13.250us reset_timesurface(__int64, __int64*, __int64*, __int64*, __int64*, float*, float*, bool*, bool*, Event*)
7.36% 3.8239ms 8 477.99us 148.92us 620.17us [CUDA memset]
==9258== API calls:
Time(%) Time Calls Avg Min Max Name
53.12% 1.22036s 5005 243.83us 9.6670us 8.5762ms cudaDeviceSynchronize
25.10% 576.78ms 5005 115.24us 44.250us 11.888ms cudaLaunch
9.13% 209.77ms 9 23.308ms 16.667us 208.54ms cudaMalloc
6.56% 150.65ms 1 150.65ms 150.65ms 150.65ms cudaDeviceReset
5.33% 122.39ms 50050 2.4450us 833ns 6.1167ms cudaSetupArgument
0.60% 13.808ms 5005 2.7580us 1.0830us 104.25us cudaConfigureCall
0.10% 2.3845ms 9 264.94us 22.333us 537.75us cudaFree
0.04% 938.75us 8 117.34us 58.917us 169.08us cudaMemset
0.02% 461.33us 83 5.5580us 1.4160us 197.58us cuDeviceGetAttribute
0.00% 15.500us 2 7.7500us 3.6670us 11.833us cuDeviceGetCount
0.00% 7.6670us 1 7.6670us 7.6670us 7.6670us cuDeviceTotalMem
0.00% 4.8340us 1 4.8340us 4.8340us 4.8340us cuDeviceGetName
0.00% 3.6670us 2 1.8330us 1.6670us 2.0000us cuDeviceGet
*/
As detailed in the comments on the original message, my problem was entirely related to the GPU I'm using (Tegra K1). Here's an answer I found for this particular problem; it might be useful for other GPUs as well. The average for cudaDeviceSynchronize on my Jetson TK1 went from 250us to 10us.
The GPU clock (gbus) of the Tegra was 72000 kHz by default; we have to set it to 852000 kHz using these commands:
$ echo 852000000 > /sys/kernel/debug/clock/override.gbus/rate
$ echo 1 > /sys/kernel/debug/clock/override.gbus/state
We can find the list of available frequencies using this command:
$ cat /sys/kernel/debug/clock/gbus/possible_rates
72000 108000 180000 252000 324000 396000 468000 540000 612000 648000 684000 708000 756000 804000 852000 (kHz)
More performance can be obtained (again, in exchange for a higher power draw) on both the CPU and GPU; check this link for more information.

Struggling - yet another memory corruption problem, bad alloc (C++, VS 2008)

I've read a lot of posts on memory corruption and it seems like it can be a considerably difficult problem to solve. When I run my code on my Linux machine it executes fine, and valgrind doesn't report any leaks or errors. However, when I run the code on my lab's Windows machine with VS2008, I get a bad_alloc error, stopping at _RAISE(nomem). This seems strange to me because I would have expected valgrind to catch it.
void *__CRTDECL operator new(size_t size) _THROW1(_STD bad_alloc)
{       // try to allocate size bytes
    void *p;
    while ((p = malloc(size)) == 0)
        if (_callnewh(size) == 0)
        {       // report no memory
            static const std::bad_alloc nomem;
            _RAISE(nomem);
        }
    return (p);
}
From what I've read it seems like this problem often comes from writing past the end of an allocated block of memory or after it has been freed but I haven't had any luck identifying where that might be happening. Here's my call stack from after trying to run a Release build.
KernelBase.dll!_RaiseException@16() + 0x58 bytes
msvcr90.dll!_CxxThrowException(void * pExceptionObject=0x0040f6d4, const _s__ThrowInfo * pThrowInfo=0x6f63d604) Line 161 C++
msvcr90.dll!operator new(unsigned int size=4) Line 63 + 0x17 bytes C++
tempMem.exe!std::vector<unsigned char,std::allocator<unsigned char> >::vector<unsigned char,std::allocator<unsigned char> >(const std::vector<unsigned char,std::allocator<unsigned char> > & _Right=[...]()) Line 500 + 0x31 bytes C++
tempMem.exe!DendriteSegment::DendriteSegment(const DendriteSegment & __that={...}) + 0x4a bytes C++
tempMem.exe!std::list<DendriteSegment,std::allocator<DendriteSegment> >::_Buynode(std::_List_nod<DendriteSegment,std::allocator<DendriteSegment> >::_Node * _Next=0x005d84d0, std::_List_nod<DendriteSegment,std::allocator<DendriteSegment> >::_Node * _Prev=0x0093af50, const DendriteSegment & _Val={...}) Line 1208 C++
tempMem.exe!TP::adaptSegments(Cell * & cellToAdapt=0x005d8450, std::list<segUpdate,std::allocator<segUpdate> > & segUpdateList=[1]({segToUpdate=0x00000000 synapseChanges=[0]() newSynapsesToAdd=[2](0x005bc8f8 {DSlist=[19]({sequenceSegment=true synapse=[3](0x005d79a0 {DSlist={...} connected=0x005d7af8 currentState=0x005d79c0 ...},0x005d7ef8 {DSlist={...} connected=0x005d8050 currentState=0x005d7f18 ...},0x005d8450 {DSlist={...} connected=0x005d85a8 currentState=0x005d8470 ...}) permanence=[3](80 'P',80 'P',80 'P') ...},. ), bool posReinforce=true) Line 701 + 0x1b bytes C++
tempMem.exe!Level::TPlearning() Line 236 + 0x26 bytes C++
tempMem.exe!main(int argc=, char * * argv=) Line 96 C++
msvcr90.dll!_encode_pointer(void * ptr=0x6f5d3607) Line 114 + 0x5 bytes C
0069ee20()
msvcr90.dll!_initterm(void (void)* * pfbegin=0x00000001, void (void)* * pfend=0x000a1ef8) Line 903 C
tempMem.exe!__tmainCRTStartup() Line 582 + 0x17 bytes C
kernel32.dll!@BaseThreadInitThunk@12() + 0x12 bytes
ntdll.dll!___RtlUserThreadStart@8() + 0x27 bytes
ntdll.dll!__RtlUserThreadStart@8() + 0x1b bytes
When running a debugging session I get a different error (it doesn't seem to happen all the time though...)
return HeapAlloc(_crtheap, 0, size ? size : 1);
from here
#ifdef _WIN64
        return HeapAlloc(_crtheap, 0, size ? size : 1);
#else  /* _WIN64 */
        if (__active_heap == __SYSTEM_HEAP) {
            return HeapAlloc(_crtheap, 0, size ? size : 1);
        } else
        if ( __active_heap == __V6_HEAP ) {
            if (pvReturn = V6_HeapAlloc(size)) {
                return pvReturn;
            }
        }
In this case the call stack is
ntdll.dll!_RtlpBreakPointHeap@4() + 0x23 bytes
ntdll.dll!@RtlpAllocateHeap@24() + 0x57dbc bytes
ntdll.dll!_RtlAllocateHeap@12() + 0x502a bytes
ntdll.dll!_RtlDebugAllocateHeap@12() + 0xb5 bytes
ntdll.dll!@RtlpAllocateHeap@24() + 0x57c17 bytes
ntdll.dll!_RtlAllocateHeap@12() + 0x502a bytes
msvcr90d.dll!_heap_alloc_base(unsigned int size=38) Line 105 + 0x28 bytes C
msvcr90d.dll!_heap_alloc_dbg_impl(unsigned int nSize=2, int nBlockUse=1, const char * szFileName=0x00000000, int nLine=0, int * errno_tmp=0x0052f284) Line 427 + 0x9 bytes C++
msvcr90d.dll!_nh_malloc_dbg_impl(unsigned int nSize=2, int nhFlag=0, int nBlockUse=1, const char * szFileName=0x00000000, int nLine=0, int * errno_tmp=0x0052f284) Line 239 + 0x19 bytes C++
msvcr90d.dll!_nh_malloc_dbg(unsigned int nSize=2, int nhFlag=0, int nBlockUse=1, const char * szFileName=0x00000000, int nLine=0) Line 296 + 0x1d bytes C++
msvcr90d.dll!malloc(unsigned int nSize=2) Line 56 + 0x15 bytes C++
msvcr90d.dll!operator new(unsigned int size=2) Line 59 + 0x9 bytes C++
tempMem.exe!std::_Allocate(unsigned int _Count=2) Line 43 + 0xc bytes C++
tempMem.exe!std::allocator<uint8_t>::allocate(unsigned int _Count=2) Line 145 + 0x13 bytes C++
tempMem.exe!std::vector<uint8_t,std::allocator<uint8_t> >::_Buy(unsigned int _Capacity=2) Line 1115 + 0x14 bytes C++
tempMem.exe!std::vector<uint8_t,std::allocator<uint8_t> >::vector(const std::vector<uint8_t, std::allocator<uint8_t> > & _Right=[2](80 'P',80 'P')) Line 501 + 0x2b bytes C++
tempMem.exe!DendriteSegment::DendriteSegment() + 0x8b bytes C++
tempMem.exe!std::_Construct(DendriteSegment * _Ptr=0x007e7490, const DendriteSegment & _Val={...}) Line 52 + 0x97 bytes C++
tempMem.exe!std::allocator<DendriteSegment>::construct(DendriteSegment * _Ptr=0x007e7490, const DendriteSegment & _Val={...}) Line 155 + 0x15 bytes C++
tempMem.exe!std::list<DendriteSegment,std::allocator<DendriteSegment> >::_Buynode(std::_List_nod<DendriteSegment, std::allocator<DendriteSegment> >::_Node * _Next=0x00637f60, std::_List_nod<DendriteSegment, std::allocator<DendriteSegment> >::_Node * _Prev=0x00bfcb50, const DendriteSegment & _Val={...}) Line 1199 + 0x47 bytes C++
tempMem.exe!std::list<DendriteSegment,std::allocator<DendriteSegment> >::_Insert(std::list<DendriteSegment, std::allocator<DendriteSegment> >::_Const_iterator<1> _Where={sequenceSegment=true synapse=[0]() permanence=[0]() ...}, const DendriteSegment & _Val={...}) Line 718 + 0x65 bytes C++
tempMem.exe!std::list<DendriteSegment,std::allocator<DendriteSegment> >::push_back(const DendriteSegment & _Val={...}) Line 670 + 0x6f bytes C++
tempMem.exe!TP::adaptSegments(Cell * & cellToAdapt=0x00637ee8, std::list<segUpdate, std::allocator<segUpdate> > & segUpdateList=[1](...), bool posReinforce=true) Line 701 + 0x16 bytes C++
tempMem.exe!TP::phase3() Line 949 + 0x3e bytes C++
tempMem.exe!Col::TPphase3() Line 398 + 0xd bytes C++
tempMem.exe!Level::TPlearning() Line 236 + 0x4a bytes C++
tempMem.exe!Network::runTPlearning() Line 93 + 0xd bytes C++
tempMem.exe!main(int argc=1, char * * argv=0x006f62a0) Line 93 + 0xd bytes C++
tempMem.exe!__tmainCRTStartup() Line 582 + 0x19 bytes C
tempMem.exe!mainCRTStartup() Line 399 C
kernel32.dll!@BaseThreadInitThunk@12() + 0x12 bytes
ntdll.dll!___RtlUserThreadStart@8() + 0x27 bytes
ntdll.dll!__RtlUserThreadStart@8() + 0x1b bytes
This is my first time debugging this type of problem and I hope I'm just overlooking/misunderstanding something obvious...
Here's the code for the adaptSegments function that corresponds to this line from the call stack (Release)
tempMem.exe!TP::adaptSegments(Cell * & cellToAdapt=0x005d8450, std::list<segUpdate,std::allocator<segUpdate> > & segUpdateList=[1]({segToUpdate=0x00000000 synapseChanges=[0]() newSynapsesToAdd=[2](0x005bc8f8 {DSlist=[19]({sequenceSegment=true synapse=[3](0x005d79a0 {DSlist={...} connected=0x005d7af8 currentState=0x005d79c0 ...},0x005d7ef8 {DSlist={...} connected=0x005d8050 currentState=0x005d7f18 ...},0x005d8450 {DSlist={...} connected=0x005d85a8 currentState=0x005d8470 ...}) permanence=[3](80 'P',80 'P',80 'P') ...},. ), bool posReinforce=true) Line 701 + 0x1b bytes
bool TP::adaptSegments(Cell *& cellToAdapt,
                       std::list<segUpdate> & segUpdateList, bool posReinforce)
{
    std::list<segUpdate>::iterator curSegUpdate;
    std::list<activeSynapsePair>::iterator curSyn;
    std::list<Cell *>::iterator synToAdd;
    int size = 0;

    //for each segUpdate element in the cell's segUpdateList
    for (curSegUpdate = segUpdateList.begin();
         curSegUpdate != segUpdateList.end(); ++curSegUpdate)
    {
        //if the segment already exists
        if (curSegUpdate->segToUpdate != NULL)
        {
            //if sequence segment flag is true, set it on DS
            if (curSegUpdate->sequenceSegment == true)
            {
                curSegUpdate->segToUpdate->sequenceSegment = true;
            }
            if (posReinforce == true)
            {
                //for each synapse/permanence pair in segUpdate
                for (curSyn = curSegUpdate->synapseChanges.begin();
                     curSyn != curSegUpdate->synapseChanges.end(); ++curSyn)
                {
                    //decrement inactive synapses
                    if (curSyn->second == false)
                    {
                        if (*(curSyn->first) - permanenceDec < 0)
                            { *(curSyn->first) = 0; }
                        else
                            { *(curSyn->first) -= permanenceDec; }
                    }
                    //increment active synapses
                    else if (curSyn->second == true)
                    {
                        if (*(curSyn->first) + permanenceInc > 100)
                            { *(curSyn->first) = 100; }
                        else
                            { *(curSyn->first) += permanenceInc; }
                    }
                }
            }
            else if (posReinforce == false)
            {
                //for each synapse/permanence pair in segUpdate
                for (curSyn = curSegUpdate->synapseChanges.begin();
                     curSyn != curSegUpdate->synapseChanges.end(); ++curSyn)
                {
                    //decrement active synapses
                    if (curSyn->second == true)
                    {
                        if (*(curSyn->first) - permanenceDec < 0)
                            { *(curSyn->first) = 0; }
                        else
                            { *(curSyn->first) -= permanenceDec; }
                    }
                }
            }
            //if adding synapses to an existing segment
            if (curSegUpdate->newSynapsesToAdd.empty() == false)
            {
                if (curSegUpdate->segToUpdate->synapse.size() < MAX_NUM_SYN)
                {
                    //for each synapse in newSynapses
                    for (synToAdd = curSegUpdate->newSynapsesToAdd.begin();
                         synToAdd != curSegUpdate->newSynapsesToAdd.end();
                         ++synToAdd)
                    {
                        //add new synapse to list
                        curSegUpdate->segToUpdate->synapse.push_back(*synToAdd);
                        //and permanence with initialPerm
                        curSegUpdate->segToUpdate->permanence.push_back(initialPerm);
                    }
                } //if less than MAX_NUM_SYN
            }
        } //end if segment already exists
        //if segment doesn't exist, create a new segment & add synapses
        else if (curSegUpdate->segToUpdate == NULL)
        {
            size = curSegUpdate->newSynapsesToAdd.size();
            if (size != 0)
            {
                DendriteSegment myNewSeg; //create a new DS
                //set sequenceSegment flag if it is true
                if (curSegUpdate->sequenceSegment == true)
                {
                    myNewSeg.sequenceSegment = true;
                }
                std::copy(curSegUpdate->newSynapsesToAdd.begin(),
                          curSegUpdate->newSynapsesToAdd.end(),
                          std::back_inserter(myNewSeg.synapse));
                myNewSeg.permanence.resize(size, initialPerm);
                //then add it to the cell's list of DS
                cellToAdapt->DSlist.push_back(myNewSeg);
            } //if size not 0
        }
    }
    return true;
}
My next step is to try Application Verifier. I tried Intel Inspector XE 2011 but it doesn't seem to detect any relevant memory problems.
Update: Using gflags I found that the cause of my woes was having pointers to elements of a std::vector. myVector.push_back(newElem) was used to add elements; when the vector reallocates its storage, those pointers are invalidated. I replaced the vectors with std::list, which does not have this problem (see here).
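To make the failure mode concrete (an added sketch, not the project's code): push_back may reallocate a vector's backing storage, leaving earlier pointers dangling, while std::list nodes stay where they are:

#include <iostream>
#include <list>
#include <vector>

int main()
{
    std::vector<int> vec;
    vec.push_back(1);
    int *pv = &vec[0];              // pointer into the vector's buffer
    for (int i = 0; i < 1000; ++i)
        vec.push_back(i);           // growth may reallocate; pv can now dangle
    // reading *pv here is undefined behavior if a reallocation occurred

    std::list<int> lst;
    lst.push_back(1);
    int *pl = &lst.front();         // pointer to a list node
    for (int i = 0; i < 1000; ++i)
        lst.push_back(i);           // list nodes never move
    std::cout << *pl << std::endl;  // still valid: prints 1
    return 0;
}

Reserving capacity up front (vec.reserve(N)) is another common fix, since no reallocation happens until the size exceeds the reserved capacity.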
Try gflags from the Microsoft Debugging Tools for Windows (http://www.google.ca/search?sourceid=chrome&ie=UTF-8&q=debugging+tools+for+windows) toolkit. It lets you run with a debug heap that catches problems as they occur, rather than leaving you with random failures long after the corruption happened.
