Confusion in understanding thread::join() in "The C++ Programming Language" - c++11

When reading "The C++ Programming Language, 4th Edition" by Bjarne Stroustrup, section 42.2.4 join() of the new thread class in STL. It has an example code, that confuses me.
void run(int i, int n) // warning: really poor code
{
thread t1 {f};
thread t2;
vector<Foo> v;
// ...
if (i<n) {
thread t3 {g};
// ...
t2 = move(t3); // move t3 to outer scope
}
v[i] = Foo{}; // might throw
// ...
t1.join();
t2.join();
}
It has these comments after the code snippet:
Here, I have made several bad mistakes. In particular:
We may never reach the two join()s at the end. In that case, the
destructor for t1 will terminate the program.
We may reach the two join()s at the end without the move t2=move(t3) having executed. In that case, t2.join() will terminate the program.
Why we may never reach the two joins(), why the destructor for t1 is called before t1.join()?

Related

C++11 redeclaration in range-based for loop

This code compiles in Visual Studio 2015 update 3 (and here: visual C++ compiler online , and does not in other compilers I have tried online (GCC and CLANG), giving a redeclaration error
vector<int> v = {1,2,3};
for (auto i : v) {
printf("%d ", i);
int i = 99;
printf("%d ", i);
}
output: 1 99 2 99 3 99
VS C++ online compiler (version: 19.10.24903.0) warns about this:
warning C4456: declaration of 'i' hides previous local declaration
Is there some space in the C++11 spec to allow for both implementations to be valid?
Seems to me that VS2015 is creating a scope for the "auto i", and an inner scope for the loop body.
Adding an extra scope, as a colleague suggested, compiles fine in the other compilers I have tested (not that I wanted this, it's just for curiosity):
vector<int> v = {1,2,3};
for (auto i : v) {{
printf("%d ", i);
int i = 99;
printf("%d ", i);
}}
thanks
EDIT:
Ok, after reading this other question Redeclaration of variable in range-based for loops and the answer from "Angew", I believe that VS is actually correct.
I am reading here: cpp reference
Given this grammar description:
for ( range_declaration : range_expression ) loop_statement
and what this is equivalent to:
{
auto && __range = range_expression ;
for (auto __begin = begin_expr, __end = end_expr;
__begin != __end; ++__begin) {
range_declaration = *__begin;
loop_statement
}
}
I understand that loop_statement is actually my entire block including the brackets, so the redefinition is indeed in an inner block, hence valid.
EDIT 2:
My last edit, for future reference, reading the traditional for loop grammar is a similar situation (cpp for loop) as the range-based:
for ( init-statement condition(optional); iteration_expression(optional) ) statement
"The above syntax produces code equivalent to:"
{
init_statement
while ( condition ) {
statement
iteration_expression ;
}
}
So looking back, I could also interpret/parse statement as my inner block, including the braces, for which I would at least expect a consistent behavior in which ever compiler I am. But ALL compilers will bail out with a redeclaration error for the traditional for-loop.
N4606 (C++17 draft) 3.3.3 basic.scope.block, section 4 says
Names declared in the
init-statement
, the
for-range-declaration
, and in the
condition
of
if
,
while
,
for
, and
switch
statements are local to the
if
,
while
,
for
, or
switch
statement (including the controlled statement),
and shall not be redeclared in a subsequent condition of that statement nor in the outermost block (or, for
the
if
statement, any of the outermost blocks) of the controlled statement; see 6.4
shortened:
Names declared in the ... for-range-declaration ... are local to the ... for ... and shall not be redeclared in a subsequent condition of that statement nor in the outermost block
I read this as saying it should not be allowed.
Is there some space in the C++11 spec to allow for both implementations to be valid?
With one exception the only answer to that is no.
The exception is for global variables, where you can use the scoping operator :: to reach them. Otherwise, if you shadow the name of a variable in an outer scope, you no longer have access to it.

MQL4 Function pointer / function callback solution

As far as i have seen function pointers do not exist in MQL4.
As a workaround i use:
// included for both caller as callee side
class Callback{
public: virtual void callback(){ return; }
}
Then in the source where a callback is passed from:
class mycb : Callback{
public: virtual void callback(){
// call to whatever function needs to be called back in this source
}mcbi;
now mcbi can be passed as follows:
afunction(){
fie_to_receive_callback((Callback *)mycbi);
}
and the receiver can callback as:
fie_to_receive_callback(mycb *mcbi){
mcbi.callback(); // call the callback function
}
is there a simpler way to pass a function callback in mql4 ?
Actually there is a way, using function pointers in MQL4.
Here is an example:
typedef int(*MyFuncType)(int,int);
int addition (int a, int b)
{ return (a+b); }
int subtraction (int a, int b)
{ return (a-b); }
int operation (int x, int y, MyFuncType myfunc)
{
int g;
g = myfunc(x,y);
return (g);
}
int OnInit()
{
int m,n;
m = operation (7, 5, addition);
n = operation (20, m, subtraction);
Print(n);
return(INIT_FAILED); //just to close the expert
}
No. Fortunately there is not. ( . . . . . . . however MQL4 language syntax creeps * )
MQL4 Runtime Execution Engine ( MT4 ) has rather fragile process/thread handling and adding more ( and smarter ) constructs ( beyond rudimentary { OnTimer() | OnTick() | OnCalculate() } event-bound callbacks ) constitutes rather a threat to the already unguaranteed RealTime Execution of the main MT4-duties. While "New"-MQL4.56789 may provide hacks into doing so, there might be safer rather an off-loading strategy to go distributed and let MT4-legacy handlers receive "pre-baked" results from external processing Cluster, rather than trying to hang more and more and more flittering gadgets on a-years-old-poor-Xmas-tree.
To realise how brute this danger-avoidance is, just notice that original OnTimer() used 1 second resolution ( yes 1.000.000.000 ns steps in the world, where stream-providers label events in nano-seconds ... )
* ): Yes, since "new"-MQL4 introduction, there were many stealth-mode changes in the original MQL4-language. After each update it is more than recommendable to review "new"-Help file, as there might be both new options & nasty surprises. Maintaining an MQL4 Code-Base with more than a few hundreds man*years, this is indeed a very devastating experience.

Synchronized Block takes more time after instrumenting with ASM

I am trying to instrument java synchronized block using ASM. The problem is that after instrumenting, the execution time of the synchronized block takes more time. Here it increases from 2 msecs to 200 msecs on Linux box.
I am implementing this by identifying the MonitorEnter and MonitorExit opcode.
I try to instrument at three level 1. just before the MonitorEnter 2. after MonitorEnter 3. Before MonitorExit.
1 and 3 together works fine, but when i do 2, the execution time increase dramatically.
Even if we instrument another single SOP statement, which is intended to be executed just once, it give higher values.
Here the sample code (prime number, 10 loops):
for(int w=0;w<10;w++){
synchronized(s){
long t1 = System.currentTimeMillis();
long num = 2000;
for (long i = 1; i < num; i++) {
long p = i;
int j;
for (j = 2; j < p; j++) {
long n = p % i;
}
}
long t2 = System.currentTimeMillis();
System.out.println("Time>>>>>>>>>>>> " + (t2-t1) );
}
Here the code for instrumention (here System.currentMilliSeconds() gives the time at which instrumention happened, its no the measure of execution time, the excecution time is from obove SOP statement):
public void visitInsn(int opcode)
{
switch(opcode)
{
// Scenario 1
case 194:
visitFieldInsn(Opcodes.GETSTATIC, "java/lang/System", "out", "Ljava/io /PrintStream;");
visitLdcInsn("TIME Arrive: "+System.currentTimeMillis());
visitMethodInsn(Opcodes.INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V");
break;
// scenario 3
case 195:
visitFieldInsn(Opcodes.GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
visitLdcInsn("TIME exit : "+System.currentTimeMillis());
visitMethodInsn(Opcodes.INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V");
break;
}
super.visitInsn(opcode);
// scenario 2
if(opcode==194)
{
visitFieldInsn(Opcodes.GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
visitLdcInsn("TIME enter: "+System.currentTimeMillis());
visitMethodInsn(Opcodes.INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V");
}
}
I am not able to find the reason why it is happening and how t correct it.
Thanks in advance.
The reason lies in the internals of the JVM that you were using for running the code. I assume that this was a HotSpot JVM but the answers below are equally right for most other implementations.
If you trigger the following code:
int result = 0;
for(int i = 0; i < 1000; i++) {
result += i;
}
This will be translated directly into Java byte code by the Java compiler but at run time the JVM will easily see that this code is not doing anything. Executing this code will have no effect on the outside (application) world, so why should the JVM execute it? This consideration is exactly what compiler optimization does for you.
If you however trigger the following code:
int result = 0;
for(int i = 0; i < 1000; i++) {
System.out.println(result);
}
the Java runtime cannot optimize away your code anymore. The whole loop must always run since the System.out.println(int) method is always doing something real such that your code will run slower.
Now let's look at your example. In your first example, you basically write this code:
synchronized(s) {
// do nothing useful
}
This entire code block can easily be removed by the Java run time. This means: There will be no synchronization! In the second example, you are writing this instead:
synchronized(s) {
long t1 = System.currentTimeMillis();
// do nothing useful
long t2 = System.currentTimeMillis();
System.out.println("Time>>>>>>>>>>>> " + (t2-t1));
}
This means that the effective code might be look like this:
synchronized(s) {
long t1 = System.currentTimeMillis();
long t2 = System.currentTimeMillis();
System.out.println("Time>>>>>>>>>>>> " + (t2-t1));
}
What is important here is that this optimized code will be effectively synchronized what is an important difference with respect to execution time. Basically, you are measuring the time it costs to synchronize something (and even that might be optimized away after a couple of runs if the JVM realized that the s is not locked elsewhere in your code (buzzword: temporary optimization with the possibility of deoptimization if loaded code in the future will also synchronize on s).
You should really read this:
http://www.ibm.com/developerworks/java/library/j-jtp02225/
http://www.ibm.com/developerworks/library/j-jtp12214/
Your test for example misses a warm-up, such that you are also measuring how much time the JVM will use for byte code to machine code optimization.
On a side note: Synchronizing on a String is almost always a bad idea. Your strings might be or might not be interned what means that you cannot be absolutely sure about their identity. This means, that synchronization might or might not work and you might even inflict synchronization of other parts of your code.

pthread condition variables vs win32 events (linux vs windows-ce)

I am doing a performance evaluation between Windows CE and Linux on an arm imx27 board. The code has already been written for CE and measures the time it takes to do different kernel calls like using OS primitives like mutex and semaphores, opening and closing files and networking.
During my porting of this application to Linux (pthreads) I stumbled upon a problem which I cannot explain. Almost all tests showed a performance increase from 5 to 10 times but not my version of win32 events (SetEvent and WaitForSingleObject), CE actually "won" this test.
To emulate the behaviour I was using pthreads condition variables (I know that my implementation doesn't fully emulate the CE version but it's enough for the evaluation).
The test code uses two threads that "ping-pong" each other using events.
Windows code:
Thread 1: (the thread I measure)
HANDLE hEvt1, hEvt2;
hEvt1 = CreateEvent(NULL, FALSE, FALSE, TEXT("MyLocEvt1"));
hEvt2 = CreateEvent(NULL, FALSE, FALSE, TEXT("MyLocEvt2"));
ResetEvent(hEvt1);
ResetEvent(hEvt2);
for (i = 0; i < 10000; i++)
{
SetEvent (hEvt1);
WaitForSingleObject(hEvt2, INFINITE);
}
Thread 2: (just "responding")
while (1)
{
WaitForSingleObject(hEvt1, INFINITE);
SetEvent(hEvt2);
}
Linux code:
Thread 1: (the thread I measure)
struct event_flag *event1, *event2;
event1 = eventflag_create();
event2 = eventflag_create();
for (i = 0; i < 10000; i++)
{
eventflag_set(event1);
eventflag_wait(event2);
}
Thread 2: (just "responding")
while (1)
{
eventflag_wait(event1);
eventflag_set(event2);
}
My implementation of eventflag_*:
struct event_flag* eventflag_create()
{
struct event_flag* ev;
ev = (struct event_flag*) malloc(sizeof(struct event_flag));
pthread_mutex_init(&ev->mutex, NULL);
pthread_cond_init(&ev->condition, NULL);
ev->flag = 0;
return ev;
}
void eventflag_wait(struct event_flag* ev)
{
pthread_mutex_lock(&ev->mutex);
while (!ev->flag)
pthread_cond_wait(&ev->condition, &ev->mutex);
ev->flag = 0;
pthread_mutex_unlock(&ev->mutex);
}
void eventflag_set(struct event_flag* ev)
{
pthread_mutex_lock(&ev->mutex);
ev->flag = 1;
pthread_cond_signal(&ev->condition);
pthread_mutex_unlock(&ev->mutex);
}
And the struct:
struct event_flag
{
pthread_mutex_t mutex;
pthread_cond_t condition;
unsigned int flag;
};
Questions:
Why doesn't I see the performance boost here?
What can be done to improve performance (e.g are there faster ways to implement CEs behaviour)?
I'm not used to coding pthreads, are there bugs in my implementation maybe resulting in performance loss?
Are there any alternative libraries for this?
Note that you don't need to be holding the mutex when calling pthread_cond_signal(), so you might be able to increase the performance of your condition variable 'event' implementation by releasing the mutex before signaling the condition:
void eventflag_set(struct event_flag* ev)
{
pthread_mutex_lock(&ev->mutex);
ev->flag = 1;
pthread_mutex_unlock(&ev->mutex);
pthread_cond_signal(&ev->condition);
}
This might prevent the awakened thread from immediately blocking on the mutex.
This type of implementation only works if you can afford to miss an event. I just tested it and ran into many deadlocks. The main reason for this is that the condition variables only wake up a thread that is already waiting. Signals issued before are lost.
No counter is associated with a condition that allows a waiting thread to simply continue if the condition has already been signalled. Windows Events support this type of use.
I can think of no better solution than taking a semaphore (the POSIX version is very easy to use) that is initialized to zero, using sem_post() for set() and sem_wait() for wait(). You can surely think of a way to have the semaphore count to a maximum of 1 using sem_getvalue()
That said I have no idea whether the POSIX semaphores are just a neat interface to the Linux semaphores or what the performance penalties are.

Producer-consumer with sempahores

Producer-consumer problem taken from Wikipedia:
semaphore mutex = 1
semaphore fillCount = 0
semaphore emptyCount = BUFFER_SIZE
procedure producer() {
while (true) {
item = produceItem()
down(emptyCount)
down(mutex)
putItemIntoBuffer(item)
up(mutex)
up(fillCount)
}
up(fillCount) //the consumer may not finish before the producer.
}
procedure consumer() {
while (true) {
down(fillCount)
down(mutex)
item = removeItemFromBuffer()
up(mutex)
up(emptyCount)
consumeItem(item)
}
}
My question - why does the producer have up(fillCount) //the consumer may not finish before the producer after the while loop. When will the program get there and why is it needed?
I think the code doesn't make sense this way. The loop never ends, so the line in question can be never reached.
The code didn't originally contain that line, and it was added by an anonymous editor in March 2009. I removed that line now.
In general, code on Wikipedia is often edited by many people over a long period of time, so it's quite easy to introduce bugs into it.

Resources