Reading a file and outputting selected data - C++11

I'm trying to read a CSV file and output only selected fields: date, time and S (speed). My program runs, but it loops in a strange way: it prints line 1, then lines 1 and 2, then lines 1, 2 and 3, and so on. The final record shown is also incorrect (0/3/2016 23:50). I need some guidance to fix this problem.
Sample data in Notepad format is below. The file contains thousands of lines, but I picked only the last 3 to illustrate my problem. Ignore the "Line" prefixes below; they only indicate the row numbers in Excel. Line 1 is the header, which I ignore, followed by the data.
Line 1:WAST,DP,Dta,Dts,EV,QFE,QFF,QNH,RF,RH,S,SR,ST1,ST2,ST3,ST4,Sx,T
Line 2: 31/03/2016 23:30,11.4,257,11,0,1012.9,1016.4,1016.5,0,63.5,5,13,26.4,25.8,25.6,26.1,7,18.48
Line 3: 31/03/2016 23:40,11.4,250,15,0,1012.9,1016.4,1016.5,0,64.7,5,10,26.4,25.8,25.6,26.1,6,18.19
Line 4:31/03/2016 23:50,11.4,243,5,0,1012.7,1016.2,1016.3,0,65.8,3,11,26.3,25.8,25.6,26.1,5,17.95
Current Output:
31/3/2016 23:30
Speed: 5
31/3/2016 23:30
Speed: 5
31/3/2016 23:40
Speed: 5
31/3/2016 23:30
Speed: 5
31/3/2016 23:40
Speed: 5
31/3/2016 23:50
Speed: 3
31/3/2016 23:30
Speed: 5
31/3/2016 23:40
Speed: 5
31/3/2016 23:50
Speed: 3
0/3/2016 23:50
Speed: 3
Max Speed: 5
Done!
Intended Output
31/3/2016 23:30
Speed: 5
31/3/2016 23:40
Speed: 5
31/3/2016 23:50
Speed: 3
Max Speed: 5
Done!
using namespace std;
typedef struct
{
Date d;
Time t;
float speed;
}
WindLogType;
// declare stream insertion and extraction operators for WindLogType
ostream & operator <<(ostream &osObject, const WindLogType & w1);
istream & operator >>(istream &input, WindLogType &w1);
int main()
{
string filename;
ifstream input;
filename = "Data.csv";
input.open(filename.c_str());
input.ignore(500,'\n');
Vector<WindLogType> windlog;
string line,line2, readDay, readMonth, readYear, readHour, readMinute;
float sp;
while(!input.eof())
{
getline(input,readDay,'/');
getline(input,readMonth,'/');
getline(input,readYear,' ');
getline(input,readHour,':');
getline(input,readMinute,',');
int day1 =atoi(readDay.c_str());
int month1=atoi(readMonth.c_str());
int year1=atoi(readYear.c_str());
int hour1=atoi(readHour.c_str());
int minute1=atoi(readMinute.c_str());
float s1;
for(int i = 0;i<10;i++)
{
input>>s1;
input.ignore(50,',');
}
WindLogType T1;//create a record
T1.d.setDate(day1,month1,year1);
T1.t.setTime(hour1,minute1);
T1.speed = s1;
windlog.push_back(T1);//push inside vector
windlog.print();
getline(input,line2,'\n');
}
float maxSpeed;
WindLogType H1;
H1=windlog.at(0);
maxSpeed=H1.speed;
for(int i=0;i<windlog.size();i++)
{
if(windlog.at(i).speed>maxSpeed)
{
maxSpeed=windlog.at(i).speed;
}
}
cout<<"Max Speed: "<<maxSpeed<<endl;
cout<<"Done!"<<endl;
return 0;
}
ostream & operator <<(ostream &osObject, const WindLogType &w1)
{
osObject<<w1.d<<w1.t<<endl<<"Speed: "<<w1.speed;
return osObject;
}
istream &operator >>(istream & input, WindLogType &w1)
{
input>>w1.d>>w1.t>>w1.speed;
return input;
}
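A possible way to restructure the reading loop is sketched below. It is only a sketch under assumptions: WindRecord, parse_line and read_records are hypothetical names standing in for the WindLogType, Date, Time and Vector classes, which are not shown above. Two things in the posted code seem to explain the output. First, windlog.print() is called inside the loop, so the whole vector is printed again after every record. Second, while(!input.eof()) runs one extra time after the last row: the first getline fails and leaves readDay empty (giving day 0), while the other strings keep their previous values, which would produce 0/3/2016 23:50. The sketch reads one whole line at a time and uses the result of getline as the loop condition.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the question's WindLogType (Date/Time flattened).
struct WindRecord {
    int day, month, year, hour, minute;
    float speed;
};

// Parses one data row such as "31/03/2016 23:30,11.4,257,...".
// Returns false if the row is malformed, so the caller can skip it.
bool parse_line(const std::string& line, WindRecord& rec) {
    std::istringstream row(line);
    std::string field;
    if (!std::getline(row, field, ',')) return false;   // column WAST

    // The timestamp has the form dd/mm/yyyy hh:mm.
    std::istringstream stamp(field);
    char s1 = 0, s2 = 0, s3 = 0;
    if (!(stamp >> rec.day >> s1 >> rec.month >> s2 >> rec.year
                >> rec.hour >> s3 >> rec.minute)) return false;
    if (s1 != '/' || s2 != '/' || s3 != ':') return false;

    // S is the 10th column after the timestamp (DP, Dta, ..., RH, S).
    for (int i = 0; i < 10; ++i)
        if (!std::getline(row, field, ',')) return false;

    std::istringstream speed(field);
    return static_cast<bool>(speed >> rec.speed);
}

// Reads all records from a stream whose first line is the header.
std::vector<WindRecord> read_records(std::istream& in) {
    std::vector<WindRecord> records;
    std::string line;
    std::getline(in, line);                  // skip the header row
    while (std::getline(in, line)) {         // false at end of file, no extra pass
        WindRecord rec;
        if (parse_line(line, rec)) records.push_back(rec);
    }
    return records;
}
```

With this layout the records would be printed once, after read_records returns, rather than inside the reading loop. An std::ifstream opened on Data.csv can be passed directly as the stream.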

ESP32 not functioning WIFI

I have a Frankenstein ESP32 setup using my dead Wemos Lolin32 board and an external Ai-Thinker ESP32 chip.
It had been working fine until recently. Now it fails to connect to any WiFi network and dumps garbage data on the serial port for everything after the WiFi.begin() call. It connects only occasionally, in roughly 1 of 15 reboots. It also sometimes seems to stall the UART before triggering a reboot and then working.
I have been using it with a GC9A01 1.28-inch TFT display. I may have damaged it while making the connections, but I can't be sure. If I don't use WiFi, most other functionality seems to work OK.
This is a sample of the many versions of the code I have tried.
#include <Arduino.h>
#include <WiFi.h>
#include "time.h"
const char *ssid = "VUMA FIBER ";
const char *password = "mysecurepassword";
const char *ntpServer = "pool.ntp.org";
const long gmtOffset_sec = 3 * 3600;
const int daylightOffset_sec = 0;
hw_timer_t *My_timer = NULL;
class Timehandler
{
private:
hw_timer_t *My_timer = NULL;
uint8_t counter = 0;
public:
uint8_t hour;
uint8_t minutes;
uint8_t seconds;
unsigned int year;
uint8_t date;
uint8_t day;
struct tm timeinfo;
bool fetchtime(uint8_t gmtOffset = 0, uint8_t daylightOffset_sec = 0, const char *ntpServer = "pool.ntp.org")
{
// configTime(gmtOffset * 3600, daylightOffset_sec, ntpServer);
// struct tm timeinfo;
// if (!getLocalTime(&timeinfo)) {
// return false;
// } else {
Serial.println(&timeinfo, "%A, %B %d %Y %H:%M:%S");
return true;
// }
}
void maintainTime()
{
seconds++;
if (seconds >= 60)
{
minutes++;
seconds = 0;
}
if (minutes >= 60)
{
hour++;
minutes = 0;
}
if (hour >= 24)
{
date++;
hour = 0;
}
Serial.printf("%02d", hour);
Serial.print(":");
Serial.printf("%02d", minutes);
Serial.print(":");
Serial.printf("%02d", seconds);
Serial.println("");
}
void getTime()
{
struct tm timeinfo;
if (!getLocalTime(&timeinfo))
{
Serial.println("Failed to obtain time");
return;
}
hour = timeinfo.tm_hour;
minutes = timeinfo.tm_min;
seconds = timeinfo.tm_sec;
year = timeinfo.tm_year;
date = timeinfo.tm_mday;
day = timeinfo.tm_wday;
}
};
bool connected = false;
Timehandler t;
void setup()
{
Serial.begin(115200);
// connect to WiFi
Serial.print("Connecting to ");
Serial.println(ssid);
WiFi.begin(ssid, password);
t.fetchtime();
t.getTime();
}
void loop()
{
if (!connected)
{
Serial.println("failed...retrying");
if (WiFi.status() == WL_CONNECTED)
{
Serial.println(" CONNECTED");
while (!t.fetchtime(3))
{
Serial.println("failed...retrying");
delay(500);
t.fetchtime(3);
WiFi.disconnect(true);
WiFi.mode(WIFI_OFF);
}
t.getTime();
connected = true;
}
}else{
t.maintainTime();
}
t.maintainTime();
delay(1000);
}
Here is a sample of the serial output.
CURRENT: upload_protocol = esptool
Looking for upload port...
Auto-detected: COM5
Uploading .pio\build\lolin32\firmware.bin
esptool.py v3.1
Serial port COM5
Connecting....
Chip is ESP32-D0WD (revision 1)
Features: WiFi, BT, Dual Core, 240MHz, VRef calibration in efuse, Coding Scheme None
Crystal is 40MHz
MAC: 94:3c:c6:10:4b:30
Uploading stub...
Running stub...
Stub running...
Changing baud rate to 460800
Changed.
Configuring flash size...
Auto-detected Flash size: 4MB
Flash will be erased from 0x00001000 to 0x00005fff...
Flash will be erased from 0x00008000 to 0x00008fff...
Flash will be erased from 0x0000e000 to 0x0000ffff...
Flash will be erased from 0x00010000 to 0x000abfff...
Compressed 17120 bytes to 11164...
Writing at 0x00001000... (100 %)
Wrote 17120 bytes (11164 compressed) at 0x00001000 in 0.6 seconds (effective 213.4 kbit/s)...
Hash of data verified.
Compressed 3072 bytes to 128...
Writing at 0x00008000... (100 %)
Wrote 3072 bytes (128 compressed) at 0x00008000 in 0.1 seconds (effective 247.8 kbit/s)...
Hash of data verified.
Compressed 8192 bytes to 47...
Writing at 0x0000e000... (100 %)
Wrote 8192 bytes (47 compressed) at 0x0000e000 in 0.2 seconds (effective 397.7 kbit/s)...
Hash of data verified.
Compressed 638784 bytes to 391514...
Writing at 0x00010000... (4 %)
Writing at 0x0001bce6... (8 %)
Writing at 0x00029453... (12 %)
Writing at 0x00031fdd... (16 %)
Writing at 0x00037279... (20 %)
Writing at 0x0003c526... (25 %)
Writing at 0x000419cd... (29 %)
Writing at 0x00046f5c... (33 %)
Writing at 0x0004c242... (37 %)
Writing at 0x000533e8... (41 %)
Writing at 0x0005ba7c... (45 %)
Writing at 0x00061195... (50 %)
Writing at 0x00066a45... (54 %)
Writing at 0x0006bdb0... (58 %)
Writing at 0x000715e8... (62 %)
Writing at 0x00077550... (66 %)
Writing at 0x0007cdc5... (70 %)
Writing at 0x000830c2... (75 %)
Writing at 0x00088d20... (79 %)
Writing at 0x0008ebd9... (83 %)
Writing at 0x00094b83... (87 %)
Writing at 0x0009ab4a... (91 %)
Writing at 0x000a0842... (95 %)
Writing at 0x000a6686... (100 %)
Wrote 638784 bytes (391514 compressed) at 0x00010000 in 10.5 seconds (effective 486.4 kbit/s)...
Hash of data verified.
Leaving...
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:10124
load:0x40080400,len:5856
entry 0x400806a8
Connecting to VUMA FIBER
���n���b�l�|�␒b␒␂␌␌␌ll␌␌␌␌␌�␌␌�␌␌l`␃␜␒␒nn�␐␂␌�n�np�␒␒nn␌��r��`␃␜␒␒no�␐�bbb|␒�b␒␒on�␎l�␌�␌␌�␌ll`␃␜␒␒oo�␐�bbc|␒�b␓␒nn�␎l�␌�␌␌�␌�l`␃␜␒␒no�␐�bbb|␒�b␒␒nn�␎l�␌�␌␌�␌�l`␃␒␒nn�␐�ccc|␒�b␒␒nn�␎l�␌�␌␌�␌␌�␎l␜␒␒on�␐�cbc|␒�b␒␒on�␎l�␌�␌␌�␌l�␎l␜␒␒oo�␐�ccc|␒�b␓␒oo�␎l�␌�␌␌�␌��␏l␜␒␒oo�␐�ccc|␒�b␓␒nn�␎l�␌�␌␌�␌�␎l␜␒␒nn�␐�ccc|␒�b␒␒on�␎l�␌�␌␌�␌␌l`␃␜␒␒nn�␐�bbb|␒�b␒␒nn�␎l�␌�␌␌�␌ll`␂␜␒␒nn�␐�bbb|␒�b␓␒on�␎l�␌�␌␌�l␌l`␂␜␒␒on�␐�cbb|␒�b␓␒oo�␎l�␌�␌␌�lll`␂␜␒␒nn�␐�bcc|␒�b␓␒on�␎l�␌�␌␌�l�l`␂␜␓␒oo�␐�ccc|␒�b␛␒og�␎l�␌�␌␌�l�d`␃␜␒␒on�␐�ccc|␒�b␓␒oo�␎l�␌�␌␌�l␌�␏l␜␒␒nn�␐�bbb|␒�b␒␒nn�␎l�␌�␌␌�ll�␎l␜␒␒oo�␐�bbb|␒�b␒␒nn�␎l�␌�␌␌�l��␎l␜␒␒on�␐�bbb|␒�b␒␒on�␎l�␌�␌␌�l�␏l␜␒␒nn�␐�bbb|␒�b␒␒nn�␎l�␌�␌␌�l␌l`␂␜␒␒oo�␐�ccc|␒�b␓␒no�␎l�␌�␌␌�lll`␂␜␒␒nn�␐�ccb|␒�b␒␒oo�␎l�␌�␌␌��␌l`␃␜␒␒oo�␐�cbb|␒�b␒␒oo�␎l�␌�␌␌��ll`␂␜␒␒on�␐�bbb|␒�b␒␒no�␎l�␌�␌␌���l`␂␜␒␒oo�␐�bcc|␒�b␒␒on�␎l�␌�␌␌���l`␂␜␒␒on�␐�bbc|␒�b␒␒no�␎l�␌�␌␌��␌�␏l␜␒␒nn�␐�ccc|␒�b␒␒nn�␎l�␌�␌␌��l�␏l␜␒␒on�␐�cbb|␒�b␓␒on�␎l�␌�␌␌����␎l␜␒␒nn�␐�bbb|␒�b␒␒nn�␎l�␌�␌␌���␎l␜␒␒on�␐�ccc|␒�b␓␒no�␎l�␌�␌␌��␌l`␃␜␒␒nn�␐�bcb|␒�b␒␒nn�␎l�␌�␌␌��ll`␂␜␒␒no�␐�bcb|␒�b␒␒nn�␎l�␌�␌␌��␌l`␃␜␒␒nn�␐�bbb|␒�b␓␒nn�␎l�␌�␌␌��ll`␂␜␒␒nn�␐�bbb|␒�b␓␒no�␎l�␌�␌␌��l`␃␜␒␒no�␐�bbb|␒�b␒␒no�␎l�␌�␌␌���l`␂␜␒␒nn�␐�bcb|␒�b␒␒nn�␎l�␌�␌␌��␌�␏l␜␒␒oo�␐�ccc|␒�b␓␒go�␎l�␄�␄␌��l�␏l␜␒␒oo�␐�ccc|␒�b␓␒oo�␎l�␌�␌␌�쌎␎l␜␒␒nn�␐�bbb|␒�b␒␒nn�␎l�␌�␌␌���␎l␜␒␒nn�␐�bbb|␒�b␒␒nn�␎l�␌�␌␌��␌l`␂␜␒␒oo�␐�bbb|␒�b␒␒oo�␎l�␌�␌␌��ll`␃␜␒␒no�␐�bcb|␒�b␓␒no�␎l�␌�␌␌�␌�l`␂␜␒␒nn�␐�bbb|␒�b␒␒nn�␎l�␌�␌␌�␌�prl␜␒␒on�␐�ccc|␒�b␓␒no�␎l�␌�␌␌�␌�␜rl␜␒␒no�␐�cbb|␒�b␓␒on�␎l�␌�␌␌�␌�␜rl␜␒␒on�␐�ccb|␒�b␒␒no�␎l�␌�␌␌�␌��␎l␜␒␒no�␐�bcb|␒�b␒␒on�␎l�␌�␌␌�␌�rrl␜␒␒no�␐�ccc|␒�b␒␒no�␎l�␌�␌␌�␌��␎l␜␒␒nn�␐�bbb|␒�b␒␒oo�␎l�␌�␌␌�␌��␎l␜␒␒oo�␐�ccc|␒�b␒␒oo�␎l�␌�␌␌�␌�l`␃␜␒␒on�␐�bcb|␒�b␒␒on�␎l�␌�␌␌�␌�|rl␜␒␒oo�␐�ccc|␒�b␓␒oo�␎l�␌�␌␌�l�l`␃␜␒␒no�␐�bcc|␒�b␓␒oo�␎l�␌�␌␌�l�prl␜␒␒nn�␐�bbb|␒�b␒␒nn�␎l�␌�␌␌�l�␜rl␜␒␒no�␐�bbb|␒�b␓␒on�␎l�␌�␌␌�l�␜rl␜␒␒on�␐�ccc|␒�b␓␒oo�␎l�␌�␌␌�l��␎l␜␒␒nn�␐�ccb|␒�b␒␒nn�␎l�␌�␌␌�l�rrl␜␒␒nn�␐�cbc|␒�b␓␒no�␎l�␌�␌␌�l��␎l␜␒␒no�␐�ccc|␒�b␓␒no�␎l�␌�␌␌�l��␏l␜␒␒on�␐�ccb|␒�b␒␒oo�␎l�␌�␌␌�l�l`␂␜␒␒oo�␐�ccc|␒�b␓␒no�␎l�␌�␌␄�d�|
rl␜␒␒no�␐�ccb|␒�b␓␒nn�␎l�␌�␌l�␌␌l`␂␜␒␒on�␐�bcb|␒�b␒␒nn�␎l�␌�␌l�␌ll`␂␜␒␒nn�␐�bbb|␒�b␒␒no�␎l�␌�␌l�␌�l`␂␜␒␒nn�␐�bbb|␒�b␒␒nn�␎l�␌�␌l�␌�l`␃␜␒␒nn�␐�bbb|␒�b␒␒nn�␎l�␌�␌l�␌␌�␎l␜␒␒nn�␐�bbc|␒�b␒␒nn�␎l�␌�␌l�␌l�␎l␜␒␒no�␐�bbb|␒�b␒␒nn�␎l�␌�␌l�␌��␏l␜␒␒nn�␐�bbb|␒�b␒␒nn�␎l�␌�␌l�␌�␎l␜␒␒no�␐�ccc|␒�b␓␒on�␎l�␌�␌l�␌␌l`␃␜␒␒on�␐�bcb|␒�b␒␒on�␎l�␌�␌l�␌ll`␃␜␒␒on�␐�bcb|␒�b␓␒on�␎l�␌�␌l�l␌l`␃␜␒␒oo�␐�cbb|␒�b␓␒oo�␎l�␌�␌l�lll`␂␜␒␒no�␐�bcb|␒�b␓␒on�␎l�␌�␌l�l�l`␂␜␒␒on�␐�bcb|␒�b␒␒no�␎l�␌�␌l�l�l`␂␜␒␒on�␐�bcc|␒�b␓␒oo�␎l�␌�␌l�l␌�␎l␜␒␒oo�␐�bcc|␒�b␓␒nn�␎l�␌�␌l�ll�␏l␜␒␒oo�␐�bcc|␒�b␒␒oo�␎l�␌�␌l�l��␎l␜␒␒no�␐�bbc|␒�b␒␒nn�␎l�␌�␌l�l�␎l␜␒␒oo�␐�bbb|␒�b␓␒oo�␎l�␌�␌l�l␌l`␃␜␒␒no�␐�cbc|␒�b␓␒no�␎l�␌�␌l�lll`␃␜␒␒no�␐�bcb|␒�b␓␒nn�␎l�␌�␌l��␌l`␂␜␒␒on�␐�bcb|␒�b␒␒nn�␎l�␌�␌l��ll`␃␜␒␒on�␐�ccc|␒�b␓␒oo�␎l�␌�␌l���l`␂␜␒␒nn�␐�bbb|␒�b␒␒nn�␎l�␌�␌l���l`␂␜␒␒on�␐�bbb|␒�b␓␒oo�␎l�␌�␌l��␌�␎l␜␒␒on�␐�bcb|␒�b␒␒oo�␎l�␌�␌l��l�␏l␜␒␒on�␐�bcc|␒�b␒␒oo�␎l�␌�␌l����␎l␜␒␒oo�␐�ccb|␒�b␓␒oo�␎l�␌�␌l���␏l␜␒␒'g�␐�#c#<␒�b␛␒'o�␏l�␄�␌d��␌l`␃␜␒␒no�␐�ccb|␒�b␓␒oo�␎l�␌�␌l��ll`␃␜␒␒oo�␐�bbc|␒�b␒␒on�␎l�␌�␌l��␌l`␂␜␒␒nn�␐�cbc|␒�b␒␒nn�␎l�␌�␌l��ll`␃␜␒␒oo�␐�ccc|␒�b␓␒oo�␎l�␌�␌l��l`␃␜␒␒nn�␐�cbb|␒�b␒␒nn�␎l�␌�␌l���l`␂␜␒␒on�␐�cbc|␒�b␓␒oo�␎l�␌�␌l��␌�␏l␜␒␒nn�␐�bbc|␒�b␓␒no�␎l�␌�␌l��l�␎l␜␒␒nn�␐�bbb|␒�b␒␒nn�␎l�␌�␌l�쌎␏l␜␓␒''�␐�###<␒�#␛␒''�␇$�␄�␄$���␇l␜␒␒no�␐�bcb|␒�b␒␒oo�␎l�␌�␌l��␌l`␃␜␒␒no�␐�ccb|␒�b␓␒no�␎l�␌�␌l��ll`␃␜␒␒oo�␐�ccc|␒�b␓␒oo�␎l�␌�␌l�␌�l`␃␜␒␒nn�␐�bbb|␒�b␒␒on�␎l�␌�␌l�␌�prl␜␒␒no�␐�bbb|␒�b␒␒no�␎l�␌�␌l�␌�␜rl␜␒␒oo�␐�bcb|␒�b␒␒no�␎l�␌�␌l�␌�␜rl␜␒␒nn�␐�bbc|␒�b␒␒on�␎l�␌�␌l�␌��␎l␜␒␒on�␐�bbb|␒�b␒␒nn�␎l�␌�␌l�␌�rrl␜␒␒on�␐�cbb|␒�b␒␒nn�␎l�␌�␌l�␌��␎l␜␒␒oo�␐�cbc|␒�b␒␒no�␎l�␌�␌l�␌��␏l␜␒␛''�␐�###<␓�b␛␛''�␎$�␄�␄$�␄�$ ␃␜␒␒oo�␐�cbc|␒�b␓␒on�␎l�␌�␌l�␌�|rl␜␒␒nn�␐�cbc|␒�b␒␒on�␎l�␌�␌l�l�l`␂␜␒␒on�␐�ccb|␒�b␒␒no�␎l�␌�␌l�l�prlets Jun 8 2016 00:22:57
rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:10124
load:0x40080400,len:5856
entry 0x400806a8
ets Jun 8 2016 00:22:57
rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:10124
load:0x40080400,len:5856
entry 0x400806a8
Connecting to VUMA FIBER
Sunday, January 00 1900 00:00:00
Failed to obtain time
failed...retrying
00:00:01
failed...retrying
00:00:02
failed...retrying
00:00:03
failed...retrying
CONNECTED
Sunday, January 00 1900 00:00:00
Failed to obtain time
00:00:04
00:00:05
00:00:06
00:00:07
00:00:08
00:00:09
00:00:10
00:00:11
00:00:12
00:00:13
00:00:14
00:00:15
00:00:16
00:00:17
00:00:18
00:00:19
00:00:20
00:00:21
00:00:22
00:00:23
00:00:24
00:00:25
00:00:26
00:00:27
00:00:28
00:00:29
00:00:30
00:00:31
00:00:32
00:00:33
00:00:34
00:00:35
00:00:36
00:00:37
00:00:38
00:00:39
00:00:40
00:00:41
00:00:42
Any help would be appreciated

Spiral in sampled x-y plane

Let’s say I have the following 3D discretized space, in which the indexes of the samples/nodes are sequential as it is shown in the picture.
Now consider only the horizontal middle layer.
My objective is to find a programmatic, iterative rule (or rules) that lets me run a spiral (like the one in the image or similar; it can start in any direction) over the mid-layer, starting from node 254, as shown in the image:
As you can see in the picture, the yellow crosses mark the nodes to be explored. In the first lap these nodes are consecutive, in the second they are separated by 1 node, and so on.
I started to solve the problem as follows (pseudocode):
I considered size(y) = y = 13
Size(z) = z = 3
Lap 1:
254 – z * y = 215
254 – z * (y + 1) = 212
254 – z = 251
254 + z * (y - 1) = 290
254 + z * y = 293
254 + z * (y + 1) = 296
254 + z = 257
254 – z * (y – 1) = 218
Lap 2:
254 – 3 * z * y = 137
254 – 3 * z * (y + 2/3) = 131
…
But I think there may be a simpler, more general rule.
Each direction has a constant index increment:
const int dx = 39;
const int dy = 3;
const int dz = 1;
So to make a spiral, start from the start index and step i times in the current direction, then rotate by 90 degrees and do the same again. Increment i after every second turn and repeat until the desired size is reached.
You should also add range checking so the spiral does not leave the array, as that would break things. Do this by checking the actual x, y, z coordinates: either track them in parallel or derive them from ix using modular arithmetic, for example something like this (C++):
const int dx = 39;
const int dy = 3;
const int dz = 1;
int cw[4]={-dx,-dy,+dx,+dy}; // CW rotation
int ix=254; // start point (center of spiral)
int dir=0; // direction cw[dir]
int n=5; // size
int i,j,k,x,y,z,a; // temp
for (k=0,i=1;i<=n;i+=k,k^=1,dir++,dir&=3)
for (j=1;j<=i;j++)
{
int a=ix-1;
z = a% 3; a/= 3; // 3 is z-resolution
y = a%13; a/=13; // 13 is y-resolution
x = a;
if ((x>=0)&&(x<13)&&(y>=0)&&(y<13)&&(z>=0)&&(z<3))
{
// here use point ix
// Form1->mm_log->Lines->Add(AnsiString().sprintf("%i (%i,%i,%i) %i",ix,x,y,z,i));
}
ix+=cw[dir];
}
producing this output
ix x,y,z i
254 (6,6,1) 1
215 (5,6,1) 1
212 (5,5,1) 2
251 (6,5,1) 2
290 (7,5,1) 2
293 (7,6,1) 2
296 (7,7,1) 3
257 (6,7,1) 3
218 (5,7,1) 3
179 (4,7,1) 3
176 (4,6,1) 3
173 (4,5,1) 3
170 (4,4,1) 4
209 (5,4,1) 4
248 (6,4,1) 4
287 (7,4,1) 4
326 (8,4,1) 4
329 (8,5,1) 4
332 (8,6,1) 4
335 (8,7,1) 4
338 (8,8,1) 5
299 (7,8,1) 5
260 (6,8,1) 5
221 (5,8,1) 5
182 (4,8,1) 5
143 (3,8,1) 5
140 (3,7,1) 5
137 (3,6,1) 5
134 (3,5,1) 5
131 (3,4,1) 5
If you want a CCW spiral, either reverse the order of cw[] or do dir-- instead of dir++.
If you want an adjustable screw width, increment i by the desired width instead of by one.
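For reference, the same walk can be sketched as a standalone function. This is only a sketch under assumptions: the grid is 13 x 13 x 3 with 1-based indices as in the question, and the name spiral_indices is made up for illustration. Unlike the snippet above, it tracks x and y in parallel with the walk instead of deriving them from ix, so points that fall outside the layer are simply skipped.

```cpp
#include <vector>

// Grid resolution from the question: 13 x 13 x 3 nodes, 1-based indices,
// z changing fastest, then y, then x.
constexpr int X_RES = 13;
constexpr int Y_RES = 13;
constexpr int Z_RES = 3;

// Returns the node indices visited by a square spiral that starts at
// `start` and grows until the side length exceeds `n`.
// Hypothetical helper, not part of any library.
std::vector<int> spiral_indices(int start, int n) {
    // Coordinate steps for the four directions, in the same order as
    // cw[] = {-dx, -dy, +dx, +dy}.
    const int sx[4] = {-1, 0, +1, 0};
    const int sy[4] = {0, -1, 0, +1};

    int a = start - 1;                 // switch to a 0-based index
    const int z = a % Z_RES; a /= Z_RES;
    int y = a % Y_RES;
    int x = a / Y_RES;

    std::vector<int> out;
    int dir = 0;
    // The side length grows by one after every second turn: 1,1,2,2,3,3,...
    for (int len = 1, grow = 0; len <= n; len += grow, grow ^= 1, dir = (dir + 1) & 3) {
        for (int j = 0; j < len; ++j) {
            if (x >= 0 && x < X_RES && y >= 0 && y < Y_RES)
                out.push_back((x * Y_RES + y) * Z_RES + z + 1);
            x += sx[dir];
            y += sy[dir];
        }
    }
    return out;
}
```

Calling spiral_indices(254, 5) should reproduce the 30 indices listed in the output above, from 254 to 131.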
Based on @Spektre's answer, this code worked for me:
const int x_res = 13;
const int y_res = 13;
const int z_res = 3;
const int dx = 39;
const int dy = 3;
const int dz = 1;
int cw[4]={-dx,-dy,+dx,+dy}; // CW rotation
int ix=254; // start point (center of spiral)
int dir=0; // direction cw[dir]
int n=30; // size
int i,j,k;
cout << ix << endl;
// first "lap" (consecutive nodes)
for (k=0,i=1;i<=2;i+=k,k^=1,dir++,dir&=3)
for (j=1;j<=i;j++)
{
ix+=cw[dir];
cout << ix << endl;
}
i-=1;
int width = 2; //screw width
i+=width;
int dist = 1; //nodes separation
int node_count = 0; //nodes counter
for (k=k,i=i;i<=n;i+=k,k^=width,dir++,dir&=3)
{
if (dir==1)
{
dist+=1;
}
for (j=1;j<=i;j++)
{
ix+=cw[dir];
node_count +=1;
if ((0 < ix) && (ix <= x_res*y_res*z_res))
{
if (node_count == dist)
{
cout << ix << endl;
node_count = 0;
}
}
else return 0;
}
}
return 0;
with this output:
254 215 212 251 290 293 296 257 218 179 140 134 128 206 284 362 368 374 380 302
224 146 68 59 50 83 200 317 434 443 452 461 386 269 152 35

SIGXCPU raised by setrlimit RLIMIT_CPU later than expected in a virtual machine

[EDIT: added MCVE in the text, clarifications]
I have the following program that sets RLIMIT_CPU to 2 seconds using setrlimit() and catches the signal. RLIMIT_CPU limits CPU time. «When the process reaches the soft limit, it is sent a SIGXCPU signal. The default action for this signal is to terminate the process. However, the signal can be caught, and the handler can return control to the main program.» (man)
The following program sets RLIMIT_CPU and installs a signal handler for SIGXCPU, then generates random numbers until SIGXCPU is raised. The signal handler simply exits the program.
test_signal.cpp
/*
* Test program for signal handling on CMS.
*
* Compile with:
* /usr/bin/g++ [-DDEBUG] -Wall -std=c++11 -O2 -pipe -static -s \
* -o test_signal test_signal.cpp
*
* The option -DDEBUG activates some debug logging in the helpers library.
*/
#include <iostream>
#include <fstream>
#include <random>
#include <chrono>
#include <unistd.h>
#include <csignal>
#include <sys/time.h>
#include <sys/resource.h>
using namespace std;
namespace helpers {
long long start_time = -1;
volatile sig_atomic_t timeout_flag = false;
unsigned const timelimit = 2; // soft limit on CPU time (in seconds)
void setup_signal(void);
void setup_time_limit(void);
static void signal_handler(int signum);
long long get_elapsed_time(void);
bool has_reached_timeout(void);
void setup(void);
}
namespace {
unsigned const minrand = 5;
unsigned const maxrand = 20;
int const numcycles = 5000000;
};
/*
* Very simple debugger, enabled at compile time with -DDEBUG.
* If enabled, it prints on stderr, otherwise it does nothing (it does not
* even evaluate the expression on its right-hand side).
*
* Main ideas taken from:
* - C++ enable/disable debug messages of std::couts on the fly
* (https://stackoverflow.com/q/3371540/2377454)
* - Standard no-op output stream
* (https://stackoverflow.com/a/11826787/2377454)
*/
#ifdef DEBUG
#define debug true
#else
#define debug false
#endif
#define debug_logger if (!debug) \
{} \
else \
cerr << "[DEBUG] helpers::"
// conversion factor between seconds and nanoseconds
#define NANOS 1000000000
// signal to handle
#define SIGNAL SIGXCPU
#define TIMELIMIT RLIMIT_CPU
/*
* This could be a function factory returning a closure of the signal-handling
* function, so that we could explicitly pass the output ofstream and close it.
* Alas, C++ supports closures only through lambdas, and we need the
* signal-handling function to be a plain function pointer; a capturing lambda
* is a different kind of object that cannot be converted to one. See:
* a different object that can not be converted. See:
* - Passing lambda as function pointer
* (https://stackoverflow.com/a/28746827/2377454)
*/
void helpers::signal_handler(int signum) {
helpers::timeout_flag = true;
debug_logger << "signal_handler:\t" << "signal " << signum \
<< " received" << endl;
debug_logger << "signal_handler:\t" << "exiting after " \
<< helpers::get_elapsed_time() << " microseconds" << endl;
exit(0);
}
/*
* Set function signal_handler() as handler for SIGXCPU using sigaction. See
* - https://stackoverflow.com/q/4863420/2377454
* - https://stackoverflow.com/a/17572787/2377454
*/
void helpers::setup_signal() {
debug_logger << "set_signal:\t" << "set_signal() called" << endl;
struct sigaction new_action;
//Set the handler in the new_action struct
new_action.sa_handler = signal_handler;
// Start with an empty sa_mask, i.e. no additional signals are blocked
// while the handler runs.
sigemptyset(&new_action.sa_mask);
// Add SIGXCPU to the mask so that it is blocked while the handler runs.
sigaddset(&new_action.sa_mask, SIGNAL);
// Remove any flag from sa_flag
new_action.sa_flags = 0;
// Set new action
sigaction(SIGNAL,&new_action,NULL);
if(debug) {
struct sigaction tmp;
// read the old signal associated to SIGXCPU
sigaction(SIGNAL, NULL, &tmp);
debug_logger << "set_signal:\t" << "action.sa_handler: " \
<< tmp.sa_handler << endl;
}
return;
}
/*
* Set soft CPU time limit.
* RLIMIT_CPU sets the CPU time limit in seconds.
* See:
* - https://www.go4expert.com/articles/
* getrlimit-setrlimit-control-resources-t27477/
* - https://gist.github.com/Leporacanthicus/11086960
*/
void helpers::setup_time_limit(void) {
debug_logger << "set_limit:\t\t" << "set_limit() called" << endl;
struct rlimit limit;
if(getrlimit(TIMELIMIT, &limit) != 0) {
perror("error calling getrlimit()");
exit(EXIT_FAILURE);
}
limit.rlim_cur = helpers::timelimit;
if(setrlimit(TIMELIMIT, &limit) != 0) {
perror("error calling setrlimit()");
exit(EXIT_FAILURE);
}
if (debug) {
struct rlimit tmp;
getrlimit(TIMELIMIT, &tmp);
debug_logger << "set_limit:\t\t" << "current limit: " << tmp.rlim_cur \
<< " seconds" << endl;
}
return;
}
void helpers::setup(void) {
struct timespec start;
if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start)) {
exit(EXIT_FAILURE);
}
start_time = start.tv_sec*NANOS + start.tv_nsec;
setup_signal();
setup_time_limit();
return;
}
long long helpers::get_elapsed_time(void) {
struct timespec current;
if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &current)) {
exit(EXIT_FAILURE);
}
long long current_time = current.tv_sec*NANOS + current.tv_nsec;
long long elapsed_micro = (current_time - start_time)/1000 + \
((current_time - start_time) % 1000 >= 500);
return elapsed_micro;
}
bool helpers::has_reached_timeout(void) {
return helpers::timeout_flag;
}
int main() {
helpers::setup();
ifstream in("input.txt");
in.close();
ofstream out("output.txt");
random_device rd;
mt19937 eng(rd());
uniform_int_distribution<> distr(minrand, maxrand);
int i = 0;
while(!helpers::has_reached_timeout()) {
int nmsec;
for(int n=0; n<numcycles; n++) {
nmsec = distr(eng);
}
cout << "i: " << i << "\t- nmsec: " << nmsec << "\t- ";
out << "i: " << i << "\t- nmsec: " << nmsec << "\t- ";
cout << "program has been running for " << \
helpers::get_elapsed_time() << " microseconds" << endl;
out << "program has been running for " << \
helpers::get_elapsed_time() << " microseconds" << endl;
i++;
}
return 0;
}
I compile it as follows:
/usr/bin/g++ -DDEBUG -Wall -std=c++11 -O2 -pipe -static -s -o test_signal test_signal.cpp
On my laptop it correctly gets a SIGXCPU after 2 seconds, see the output:
$ /usr/bin/time -v ./test_signal
[DEBUG] helpers::set_signal: set_signal() called
[DEBUG] helpers::set_signal: action.sa_handler: 1
[DEBUG] helpers::set_limit: set_limit() called
[DEBUG] helpers::set_limit: current limit: 2 seconds
i: 0 - nmsec: 11 - program has been running for 150184 microseconds
i: 1 - nmsec: 18 - program has been running for 294497 microseconds
i: 2 - nmsec: 9 - program has been running for 422220 microseconds
i: 3 - nmsec: 5 - program has been running for 551882 microseconds
i: 4 - nmsec: 20 - program has been running for 685373 microseconds
i: 5 - nmsec: 16 - program has been running for 816642 microseconds
i: 6 - nmsec: 9 - program has been running for 951208 microseconds
i: 7 - nmsec: 20 - program has been running for 1085614 microseconds
i: 8 - nmsec: 20 - program has been running for 1217199 microseconds
i: 9 - nmsec: 12 - program has been running for 1350183 microseconds
i: 10 - nmsec: 17 - program has been running for 1486431 microseconds
i: 11 - nmsec: 13 - program has been running for 1619845 microseconds
i: 12 - nmsec: 20 - program has been running for 1758074 microseconds
i: 13 - nmsec: 11 - program has been running for 1895408 microseconds
[DEBUG] helpers::signal_handler: signal 24 received
[DEBUG] helpers::signal_handler: exiting after 2003326 microseconds
Command being timed: "./test_signal"
User time (seconds): 1.99
System time (seconds): 0.00
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.01
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1644
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 59
Voluntary context switches: 1
Involuntary context switches: 109
Swaps: 0
File system inputs: 0
File system outputs: 16
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
If I compile and run in a virtual machine (VirtualBox, running Ubuntu), I get this:
$ /usr/bin/time -v ./test_signal
[DEBUG] helpers::set_signal: set_signal() called
[DEBUG] helpers::set_signal: action.sa_handler: 1
[DEBUG] helpers::set_limit: set_limit() called
[DEBUG] helpers::set_limit: current limit: 2 seconds
i: 0 - nmsec: 12 - program has been running for 148651 microseconds
i: 1 - nmsec: 13 - program has been running for 280494 microseconds
i: 2 - nmsec: 7 - program has been running for 428390 microseconds
i: 3 - nmsec: 5 - program has been running for 580805 microseconds
i: 4 - nmsec: 10 - program has been running for 714362 microseconds
i: 5 - nmsec: 19 - program has been running for 846853 microseconds
i: 6 - nmsec: 20 - program has been running for 981253 microseconds
i: 7 - nmsec: 7 - program has been running for 1114686 microseconds
i: 8 - nmsec: 7 - program has been running for 1249530 microseconds
i: 9 - nmsec: 12 - program has been running for 1392096 microseconds
i: 10 - nmsec: 20 - program has been running for 1531859 microseconds
i: 11 - nmsec: 19 - program has been running for 1667021 microseconds
i: 12 - nmsec: 13 - program has been running for 1818431 microseconds
i: 13 - nmsec: 17 - program has been running for 1973182 microseconds
i: 14 - nmsec: 7 - program has been running for 2115423 microseconds
i: 15 - nmsec: 20 - program has been running for 2255140 microseconds
i: 16 - nmsec: 13 - program has been running for 2394162 microseconds
i: 17 - nmsec: 10 - program has been running for 2528274 microseconds
i: 18 - nmsec: 15 - program has been running for 2667978 microseconds
i: 19 - nmsec: 8 - program has been running for 2803725 microseconds
i: 20 - nmsec: 9 - program has been running for 2940610 microseconds
i: 21 - nmsec: 19 - program has been running for 3075349 microseconds
i: 22 - nmsec: 14 - program has been running for 3215255 microseconds
i: 23 - nmsec: 5 - program has been running for 3356515 microseconds
i: 24 - nmsec: 5 - program has been running for 3497369 microseconds
[DEBUG] helpers::signal_handler: signal 24 received
[DEBUG] helpers::signal_handler: exiting after 3503271 microseconds
Command being timed: "./test_signal"
User time (seconds): 3.50
System time (seconds): 0.00
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.52
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1636
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 59
Voluntary context switches: 0
Involuntary context switches: 106
Swaps: 0
File system inputs: 0
File system outputs: 16
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Even when running the binary compiled on my laptop, the process gets killed after around 3 seconds of elapsed user time.
Any idea what could be causing this? For broader context, see this thread: https://github.com/cms-dev/cms/issues/851

How do I make this program work for input >10 for the USACO Training Pages Square Palindromes?

Problem Statement -
Given a number base B (2 <= B <= 20 base 10), print all the integers N (1 <= N <= 300 base 10) such that the square of N is palindromic when expressed in base B; also print the value of that palindromic square. Use the letters 'A', 'B', and so on to represent the digits 10, 11, and so on.
Print both the number and its square in base B.
INPUT FORMAT
A single line with B, the base (specified in base 10).
SAMPLE INPUT
10
OUTPUT FORMAT
Lines with two integers represented in base B. The first integer is the number whose square is palindromic; the second integer is the square itself. NOTE WELL THAT BOTH INTEGERS ARE IN BASE B!
SAMPLE OUTPUT
1 1
2 4
3 9
11 121
22 484
26 676
101 10201
111 12321
121 14641
202 40804
212 44944
264 69696
My code works for all inputs <= 10, but it gives me some weird output for inputs > 10.
My code:
#include<iostream>
#include<cstdio>
#include<cmath>
using namespace std;
int baseToBase(int num, int base) //accepts a number in base 10 and the base to be converted into as arguments
{
int result=0, temp=0, i=1;
while(num>0)
{
result = result + (num%base)*pow(10, i);
i++;
num = num/base;
}
result/=10;
return result;
}
long long int isPalin(int n, int base) //checks the palindrome
{
long long int result=0, temp, num=n*n, x=n*n;
num = baseToBase(num, base);
x = baseToBase(x, base);
while(num)
{
temp=num%10;
result = result*10 + temp;
num/=10;
}
if(x==result)
return x;
else
return 0;
}
int main()
{
int base, i, temp;
long long int sq;
cin >> base;
for(i=1; i<=300; i++)
{
temp=baseToBase(i, base);
sq=isPalin(i, base);
if(sq!=0)
cout << temp << " " << sq << endl;
}
return 0;
}
For input = 11, the answer should be
1 1
2 4
3 9
6 33
11 121
22 484
24 565
66 3993
77 5335
101 10201
111 12321
121 14641
202 40804
212 44944
234 53535
While my answer is
1 1
2 4
3 9
6 33
11 121
22 484
24 565
66 3993
77 5335
110 10901
101 10201
111 12321
121 14641
209 40304
202 40804
212 44944
227 50205
234 53535
My output differs from the required one: 209 shows up before 202 and 110 shows up before 101.
Help appreciated, thanks!
A simple example that shows the error in your base conversion for B = 11: for i = 10, temp should be A, but your code calculates temp = 10. The ten symbols 0-9 are enough to write every number in base 10 or lower, but for larger bases you need additional symbols such as 'A', 'B' and so on to represent the extra digits. The problem description states this clearly. Hopefully you will be able to fix your code now by modifying your int baseToBase(int num, int base) function.
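One way to do that is to build the converted number as a string instead of an int, so that digits above 9 can be written as letters. The following is only a sketch: the names to_base and is_palindrome are made up for illustration, and the digit table assumes bases up to 20 as in the problem statement.

```cpp
#include <algorithm>
#include <string>

// Converts a non-negative base-10 number to its representation in `base`
// (2 <= base <= 20), using 'A', 'B', ... for the digits 10, 11, ...
std::string to_base(int num, int base) {
    const char digits[] = "0123456789ABCDEFGHIJ";
    if (num == 0) return "0";
    std::string s;
    while (num > 0) {
        s.push_back(digits[num % base]);   // least significant digit first
        num /= base;
    }
    std::reverse(s.begin(), s.end());
    return s;
}

// True if the string reads the same forwards and backwards.
bool is_palindrome(const std::string& s) {
    return std::equal(s.begin(), s.begin() + s.size() / 2, s.rbegin());
}
```

The main loop would then go over N from 1 to 300, compute to_base(N * N, base), and print to_base(N, base) together with the square whenever is_palindrome returns true. Working with strings also avoids the floating-point pow() call in the original conversion.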

CUDA - Multiple Threads

I am trying to make an LCG random number generator run in parallel using CUDA and GPUs. However, I am having trouble actually getting multiple threads running at the same time. Here is a copy of the code:
#include <iostream>
#include <math.h>
__global__ void rng(long *cont)
{
int a=9, c=3, F, X=1;
long M=524288, Y;
printf("\nKernel X is %d\n", X[0]);
F=X;
Y=X;
printf("Kernel F is %d\nKernel Y is %d\n", F, Y);
Y=(a*Y+c)%M;
printf("%ld\t", Y);
while(Y!=F)
{
Y=(a*Y+c)%M;
printf("%ld\t", Y);
cont[0]++;
}
}
int main()
{
long cont[1]={1};
int X[1];
long *dev_cont;
int *dev_X;
cudaEvent_t beginEvent;
cudaEvent_t endEvent;
cudaEventCreate( &beginEvent );
cudaEventCreate( &endEvent );
printf("Please give the value of the seed X ");
scanf("%d", &X[0]);
printf("Host X is: %d", *X);
cudaEventRecord( beginEvent, 0);
cudaMalloc( (void**)&dev_cont, sizeof(long) );
cudaMalloc( (void**)&dev_X, sizeof(int) );
cudaMemcpy(dev_cont, cont, 1 * sizeof(long), cudaMemcpyHostToDevice);
cudaMemcpy(dev_X, X, 1 * sizeof(int), cudaMemcpyHostToDevice);
rng<<<1,1>>>(dev_cont);
cudaMemcpy(cont, dev_cont, 1 * sizeof(long), cudaMemcpyDeviceToHost);
cudaEventRecord( endEvent, 0);
cudaEventSynchronize (endEvent );
float timevalue;
cudaEventElapsedTime (&timevalue, beginEvent, endEvent);
printf("\n\nYou generated a total of %ld numbers", cont[0]);
printf("\nCUDA Kernel Time: %.2f ms\n", timevalue);
cudaFree(dev_cont);
cudaFree(dev_X);
cudaEventDestroy( endEvent );
cudaEventDestroy( beginEvent );
return 0;
}
Right now I am launching only one block with one thread. However, if I launch 100 threads, every thread produces the same number, so each number appears 100 times before the sequence proceeds to the next one. That is the expected behavior, but repeating every number defeats the purpose of "random numbers".
The idea I want to implement is to have multiple threads. One thread will use the formula
Y=(a*Y+c)%M with an initial value of Y=1, another thread will use the same formula with an initial value of Y=1000, and so on. However, once the first thread has produced 1000 numbers, it needs to stop making more calculations, because if it continues it will interfere with the second thread, which produces numbers starting from Y=1000.
If anyone can point me in the right direction, at least on how to create multiple threads that run different functions or instructions in parallel, I will try to figure out the rest.
Thanks!
UPDATE: July 31, 8:14PM EST
I updated my code to the following. Basically, I am trying to produce 256 random numbers. I created the array where those 256 numbers will be stored, and an array with 10 different seed values for Y in the threads. I also changed the code to request 10 threads on the device and to save the generated numbers in an array. The code is not working as it should. Please advise on how to fix it or how to make it achieve what I want.
Thanks!
#include <iostream>
#include <math.h>
__global__ void rng(long *cont, int *L, int *N)
{
int Y=threadIdx.x;
Y=N[threadIdx.x];
int a=9, c=3, i;
long M=256;
for(i=0;i<256;i++)
{
Y=(a*Y+c)%M;
N[i]=Y;
cont[0]++;
}
}
int main()
{
long cont[1]={1};
int i;
int L[10]={1,25,50,75,100,125,150,175,200,225}, N[256];
long *dev_cont;
int *dev_L, *dev_N;
cudaEvent_t beginEvent;
cudaEvent_t endEvent;
cudaEventCreate( &beginEvent );
cudaEventCreate( &endEvent );
cudaEventRecord( beginEvent, 0);
cudaMalloc( (void**)&dev_cont, sizeof(long) );
cudaMalloc( (void**)&dev_L, sizeof(int) );
cudaMalloc( (void**)&dev_N, sizeof(int) );
cudaMemcpy(dev_cont, cont, 1 * sizeof(long), cudaMemcpyHostToDevice);
cudaMemcpy(dev_L, L, 10 * sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy(dev_N, N, 256 * sizeof(int), cudaMemcpyHostToDevice);
rng<<<1,10>>>(dev_cont, dev_L, dev_N);
cudaMemcpy(cont, dev_cont, 1 * sizeof(long), cudaMemcpyDeviceToHost);
cudaMemcpy(N, dev_N, 256 * sizeof(int), cudaMemcpyDeviceToHost);
cudaEventRecord( endEvent, 0);
cudaEventSynchronize (endEvent );
float timevalue;
cudaEventElapsedTime (&timevalue, beginEvent, endEvent);
printf("\n\nYou generated a total of %ld numbers", cont[0]);
printf("\nCUDA Kernel Time: %.2f ms\n", timevalue);
printf("Your numbers are:");
for(i=0;i<256;i++)
{
printf("%d\t", N[i]);
}
cudaFree(dev_cont);
cudaFree(dev_L);
cudaFree(dev_N);
cudaEventDestroy( endEvent );
cudaEventDestroy( beginEvent );
return 0;
}
@Bardia - Please let me know how I can change my code to accommodate my needs.
UPDATE: August 1, 5:39PM EST
I edited my code to incorporate @Bardia's modifications to the kernel code. However, a few errors are showing up in the generation of numbers. First, the counter that I created in the kernel to count how many numbers are generated is not working: at the end it reports that only "1" number was generated. The timer that I created to measure how long the kernel takes to execute is also not working, because it keeps displaying 0.00 ms. And based on the parameters that I have set for the formula, the numbers that are generated, copied into the array and printed on the screen are not the numbers that are meant to appear (or even close). All of this used to work before.
Here is the new code:
#include <iostream>
#include <math.h>
__global__ void rng(long *cont, int *L, int *N)
{
int Y=threadIdx.x;
Y=L[threadIdx.x];
int a=9, c=3, i;
long M=256;
int length=ceil((float)M/10); //256 divided by the number of threads.
for(i=(threadIdx.x*length);i<length;i++)
{
Y=(a*Y+c)%M;
N[i]=Y;
cont[0]++;
}
}
int main()
{
long cont[1]={1};
int i;
int L[10]={1,25,50,75,100,125,150,175,200,225}, N[256];
long *dev_cont;
int *dev_L, *dev_N;
cudaEvent_t beginEvent;
cudaEvent_t endEvent;
cudaEventCreate( &beginEvent );
cudaEventCreate( &endEvent );
cudaEventRecord( beginEvent, 0);
cudaMalloc( (void**)&dev_cont, sizeof(long) );
cudaMalloc( (void**)&dev_L, sizeof(int) );
cudaMalloc( (void**)&dev_N, sizeof(int) );
cudaMemcpy(dev_cont, cont, 1 * sizeof(long), cudaMemcpyHostToDevice);
cudaMemcpy(dev_L, L, 10 * sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy(dev_N, N, 256 * sizeof(int), cudaMemcpyHostToDevice);
rng<<<1,10>>>(dev_cont, dev_L, dev_N);
cudaMemcpy(cont, dev_cont, 1 * sizeof(long), cudaMemcpyDeviceToHost);
cudaMemcpy(N, dev_N, 256 * sizeof(int), cudaMemcpyDeviceToHost);
cudaEventRecord( endEvent, 0);
cudaEventSynchronize (endEvent );
float timevalue;
cudaEventElapsedTime (&timevalue, beginEvent, endEvent);
printf("\n\nYou generated a total of %ld numbers", cont[0]);
printf("\nCUDA Kernel Time: %.2f ms\n", timevalue);
printf("Your numbers are:");
for(i=0;i<256;i++)
{
printf("%d\t", N[i]);
}
cudaFree(dev_cont);
cudaFree(dev_L);
cudaFree(dev_N);
cudaEventDestroy( endEvent );
cudaEventDestroy( beginEvent );
return 0;
}
This is the output I receive:
[wigberto#client2 CUDA]$ ./RNG8
You generated a total of 1 numbers
CUDA Kernel Time: 0.00 ms
Your numbers are:614350480 32767 1132936976 11079 2 0 10 0 1293351837 0 -161443660 48 0 0 614350336 32767 1293351836 0 -161444681 48 614350760 32767 1132936976 11079 2 0 10 0 1057178751 0 -161443660 48 155289096 49 614350416 32767 1057178750 0 614350816 32767 614350840 32767 155210544 49 0 0 1132937352 11079 1130370784 11079 1130382061 11079 155289096 49 1130376992 11079 0 1 1610 1 1 1 1130370408 11079 614350896 32767 614350816 32767 1057178751 0 614350840 32767 0 0 -161443150 48 0 0 1132937352 11079 1 11079 0 0 1 0 614351008 32767 614351032 32767 0 0 0 0 0 0 1130369536 1 1132937352 11079 1130370400 11079 614350944 32767 1130369536 11079 1130382061 11079 1130370784 11079 1130365792 11079 6143510880 614351008 32767 -920274837 0 614351032 32767 0 0 -161443150 48 0 0 0 0 1 0 128 0-153802168 48 614350896 32767 1132839104 11079 97 0 88 0 1 0 155249184 49 1130370784 11079 0 0-1 0 1130364928 11079 2464624 0 4198536 0 4198536 0 4197546 0 372297808 0 1130373120 11079 -161427611 48 111079 0 0 1 0 -153802272 48 155249184 49 372297840 0 -1 0 -161404446 48 0 0 0 0372298000 0 372297896 0 372297984 0 0 0 0 0 1130369536 11079 84 0 1130471067 11079 6303744 0614351656 32767 0 0 -1 0 4198536 0 4198536 0 4197546 0 1130397880 11079 0 0 0 0 0 0 00 0 0 -161404446 48 0 0 4198536 0 4198536 0 6303744 0 614351280 32767 6303744 0 614351656 32767 614351640 32767 1 0 4197371 0 0 0 0 0 [wigberto#client2 CUDA]$
@Bardia - Please advise on what is the best thing to do here.
Thanks!
You can address threads within a block through the threadIdx variable,
i.e., in your case you should probably set
Y = threadIdx.x and then use Y=(a*Y+c)%M
But in general, implementing a good RNG on CUDA can be really difficult,
so I don't know whether you want to implement your own generator just for practice.
Otherwise, there is the CURAND library, which provides a number of pseudo- and quasi-random generators, e.g. XORWOW, Mersenne Twister, Sobol, etc.
All threads do the same work because that is what the code tells them to do. You should always distinguish threads from one another by addressing them.
For example, you tell thread #1 to do this job and save its work here, and thread #2 to do that job and save its work there; then you go back to the host and use that data.
For a two-dimensional grid of blocks with two-dimensional threads in each block, I use this code for addressing:
int X = blockIdx.x*blockDim.x+threadIdx.x;
int Y = blockIdx.y*blockDim.y+threadIdx.y;
The X and Y in the code above are the global address of your thread (I think a one-dimensional grid and one-dimensional blocks are sufficient for you).
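The addressing formula can be tried out on the host without a GPU. The following is a minimal C++ sketch for the one-dimensional case only; `globalIndex` and `emulateLaunch` are hypothetical names, and the function parameters merely stand in for CUDA's built-in blockIdx.x, blockDim.x and threadIdx.x.

```cpp
#include <vector>

// Same arithmetic as blockIdx.x*blockDim.x+threadIdx.x, with the CUDA
// built-ins replaced by ordinary parameters (an assumption for illustration).
int globalIndex(int blockIdxX, int blockDimX, int threadIdxX)
{
    return blockIdxX * blockDimX + threadIdxX;
}

// Sequentially emulates a 1-D launch of gridDimX blocks with blockDimX
// threads each: every "thread" writes its global index into its own slot.
std::vector<int> emulateLaunch(int gridDimX, int blockDimX)
{
    std::vector<int> slots(gridDimX * blockDimX, -1);
    for (int b = 0; b < gridDimX; ++b) {
        for (int t = 0; t < blockDimX; ++t) {
            int gid = globalIndex(b, blockDimX, t);
            slots[gid] = gid; // no two (b, t) pairs share a slot
        }
    }
    return slots;
}
```

Because every (block, thread) pair maps to a different index, each thread can use that index to pick its own input and its own place to store the result.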
Also remember that printf inside a kernel is only supported on devices of compute capability 2.0 or higher. On older GPUs you can use the cuPrintf function, which is one of the CUDA SDK's samples, but read its instructions to use it correctly.
This part of the answer relates to the edited part of the question.
I didn't notice that it is a recursive algorithm, and unfortunately I don't know how to parallelize a recursive algorithm.
My only idea for generating these 256 numbers is to generate them separately, i.e. generate 26 of them in the first thread, 26 of them in the second thread and so on.
This code will do this (this is only the kernel part):
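#include <iostream>
#include <math.h>
__global__ void rng(long *cont, int *L, int *N)
{
// Each thread starts from its own seed.
int Y=L[threadIdx.x];
int a=9, c=3, i;
long M=256;
int length=ceil((float)M/10); //256 divided by the number of threads, rounded up.
int start=threadIdx.x*length; //first index owned by this thread.
// Stop at the end of this thread's slice and never run past the 256 entries of N.
for(i=start;i<start+length && i<M;i++)
{
Y=(a*Y+c)%M;
N[i]=Y;
cont[0]++; //not atomic: concurrent increments from different threads can be lost.
}
}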
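The slicing idea above can be checked on the host without a GPU. The following is a minimal C++ sketch, not the kernel itself: `fillSlice` is a hypothetical helper that plays the role of one thread (with `tid` standing in for threadIdx.x), and `fillAll` runs the slices one after another. It assumes each thread should write only the indices from tid*length up to (tid+1)*length, clipped to the size of the array.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for one GPU thread: fills its own slice of
// `out` with successive values of Y = (a*Y + c) % M, starting from `seed`.
// Returns how many values this "thread" wrote.
std::size_t fillSlice(int tid, int numThreads, int seed, std::vector<int> &out)
{
    const int a = 9, c = 3, M = 256;
    const std::size_t total = out.size();
    const std::size_t n = static_cast<std::size_t>(numThreads);
    const std::size_t length = (total + n - 1) / n; // slice length, rounded up
    const std::size_t begin = static_cast<std::size_t>(tid) * length;
    std::size_t end = begin + length;
    if (end > total) end = total;                   // clip the last slice
    int y = seed;
    std::size_t written = 0;
    for (std::size_t i = begin; i < end; ++i) {
        y = (a * y + c) % M;
        out[i] = y;
        ++written;
    }
    return written;
}

// Sequential emulation of launching one "thread" per seed.
std::size_t fillAll(const std::vector<int> &seeds, std::vector<int> &out)
{
    std::size_t count = 0;
    const int numThreads = static_cast<int>(seeds.size());
    for (int t = 0; t < numThreads; ++t)
        count += fillSlice(t, numThreads, seeds[t], out);
    return count;
}
```

With the ten seeds from the question and a 256-element array, the first nine slices hold 26 values each and the last one holds the remaining 22, so all 256 entries are written exactly once.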

Resources