It is well known that ios_base::sync_with_stdio(false) helps the performance of cin and cout in <iostream> by preventing synchronization between C and C++ I/O. However, I am curious whether it makes any difference at all for <fstream>.
I ran some tests with GNU C++11 and the following code (with and without the ios_base::sync_with_stdio(false) line):
#include <fstream>
#include <iostream>
#include <chrono>
using namespace std;

ofstream out("out.txt");

int main() {
    auto start = chrono::high_resolution_clock::now();
    long long val = 2;
    long long x = 1 << 22;
    ios_base::sync_with_stdio(false);
    while (x--) {
        val += x % 666;
        out << val << "\n";
    }
    auto end = chrono::high_resolution_clock::now();
    chrono::duration<double> diff = end - start;
    cout << diff.count() << " seconds\n";
    return 0;
}
The results are as follows:
With sync_with_stdio(false): 0.677863 seconds (average 3 trials)
Without sync_with_stdio(false): 0.653789 seconds (average 3 trials)
Is this to be expected? Is there a reason the speed is nearly identical, if not slower, with sync_with_stdio(false)?
Thank you for your help.
The idea of sync_with_stdio() is to allow mixing input and output on the standard stream objects (stdin, stdout, and stderr in C; std::cin, std::cout, std::cerr, and std::clog as well as their wide-character counterparts in C++) without any need to worry about characters being buffered in any of the buffers of the involved objects. Effectively, with std::ios_base::sync_with_stdio(true) the C++ IOStreams can't use their own buffers; in practice that normally means buffering at the std::streambuf level is entirely disabled. Without a buffer, IOStreams are rather expensive, as they process individual characters with potentially multiple virtual function calls per character. Essentially, the speed-up you get from std::ios_base::sync_with_stdio(false) comes from allowing both the C and the C++ library to use their own buffers.
An alternative approach would be to share the buffer between the C and C++ library facilities, e.g., by building the C library facilities on top of the more powerful C++ library facilities (before people complain that this would be a terrible idea, making C I/O slower: that is actually not true at all with a proper implementation of the standard C++ library IOStreams). I'm not aware of any non-experimental implementation which uses that approach. With such a setup, std::ios_base::sync_with_stdio(value) wouldn't have any effect at all.
Typical implementations of IOStreams use different stream buffers for the standard stream objects than for file streams. Part of the reason is probably that the standard stream objects are normally not opened using a name but some other entity identifying them, e.g., a file descriptor on UNIX systems, and it would require a "back door" interface to allow using a std::filebuf for the standard stream objects. However, at least early implementations of Dinkumware's standard C++ library, which shipped (ships?), e.g., with MSVC++, used std::filebuf for the standard stream objects. This std::filebuf implementation was just a wrapper around FILE*, i.e., it literally implemented what the C++ standard says rather than implementing its semantics. That was already a terrible idea to start with, but it was made worse by std::ios_base::sync_with_stdio(true) inhibiting std::streambuf-level buffering for all file streams, since that setting also affected file streams. I do not know whether this [performance] problem has since been fixed. Old issues of the C/C++ Users Journal and/or P.J. Plauger's "The [Draft] Standard C++ Library" should contain a discussion of this implementation.
tl;dr: According to the standard, std::ios_base::sync_with_stdio(false) only changes the constraints for the standard stream objects to make their use faster. Whether it has other effects depends on the IOStreams implementation, and there was at least one (Dinkumware) where it made a difference.
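If the goal is to speed up the file stream itself rather than std::cout, the knob that actually applies to an std::ofstream is the size of its own stream buffer; whether a larger buffer helps depends on the implementation and the OS. A minimal sketch (the 1 MiB size is an arbitrary choice, and on most implementations pubsetbuf() only takes effect if called before the file is opened):
#include <fstream>
#include <vector>

int main() {
    std::vector<char> buf(1 << 20);  // 1 MiB buffer; size chosen arbitrarily for illustration
    std::ofstream out;
    out.rdbuf()->pubsetbuf(buf.data(), static_cast<std::streamsize>(buf.size())); // install before open()
    out.open("out.txt");
    for (int i = 0; i < 1000; ++i)
        out << i << '\n';
}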
Related
I'm learning how to use microcontrollers without a bunch of abstractions. I've read somewhere that it's better to use PUT32() and GET32() instead of volatile pointers and stuff. Why is that?
With a basic pin-wiggle "benchmark," GPIO->ODR = 0xFFFFFFFF seems to be about four times faster than PUT32(GPIO_ODR, 0xFFFFFFFF), as shown on the scope:
(The one with lower frequency is PUT32)
This is my code using PUT32
PUT32(0x40021034, 0x00000002); // RCC IOPENR B
PUT32(0x50000400, 0x00555555); // PB MODER
while (1) {
    PUT32(0x50000414, 0x0000FFFF); // PB ODR
    PUT32(0x50000414, 0x00000000);
}
This is my code using the arrow thing
*(volatile uint32_t *) 0x40021034 = 0x00000002; // RCC IOPENR B
GPIOB->MODER = 0x00555555; // PB MODER
while (1) {
    GPIOB->ODR = 0x00000000; // PB ODR
    GPIOB->ODR = 0x0000FFFF;
}
I shamelessly adapted the assembly for PUT32 from somewhere
PUT32 PROC
EXPORT PUT32
STR R1,[R0]
BX LR
ENDP
My questions are:
Why is one method slower when it looks like they're doing the same thing?
What's the proper or best way to interact with GPIO? (Or rather what are the pros and cons of different methods?)
Additional information:
Chip is STM32G031G8Ux, using Keil uVision IDE.
I didn't configure the clock to go as fast as it can, but it should be consistent for the two tests.
Here's my hardware setup: (Scope probe connected to the LEDs. The extra wires should have no effect here)
Thank you for your time, sorry for any misunderstandings
PUT32 is a totally non-standard method that the poster in that other question made up. They have done this to avoid the complication and possible mistakes in defining the register access methods.
When you use the standard CMSIS header files and assign to the registers in the standard way, then all the complication has already been taken care of for you by someone who has specific knowledge of the target that you are using. They have designed it in a way that makes it hard for you to make the mistakes that the PUT32 is trying to avoid, and in a way that makes the final syntax look cleaner.
The reason that writing to the registers directly is quicker is that a register write can take as little as a single cycle of the processor clock, whereas calling a function, writing to the register, and then returning takes about four times as long in the context of your experiment.
By using this generic access method you also risk introducing bugs that are not possible if you use the manufacturer-provided header files: for example, using a 32-bit access when the register is 16 or 8 bits wide.
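If you want to keep the explicit-address style of PUT32 without paying for a function call, a static inline wrapper is a common compromise. This is only a sketch (put32/get32 here are hypothetical names, not the original poster's routines); once inlined, the compiler reduces each call to a single store or load:
#include <cstdint>

// Sketch of PUT32/GET32-style helpers that inline down to a plain volatile access.
static inline void put32(std::uintptr_t addr, std::uint32_t val)
{
    *reinterpret_cast<volatile std::uint32_t *>(addr) = val;
}

static inline std::uint32_t get32(std::uintptr_t addr)
{
    return *reinterpret_cast<volatile std::uint32_t *>(addr);
}
You still lose the type and width checking that the CMSIS register definitions give you, which is the trade-off described above.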
This question is not about AU plugins, but about integrating Audio Units as building blocks of standalone applications. After much trying I can't figure out the simplest "graphless" connection of two AudioUnits that would function as a play-through.
I understand how powerful and sufficient a single audio unit of subtype kAudioUnitSubType_HALOutput can be for capturing, rendering, live-processing and forwarding any audio input data. However, play-through only seems to work with full-duplex audio hardware, or when an aggregate I/O device is created from the built-in devices at user level.
However, the built-in devices are not full duplex, and "aggregating" them also has certain disadvantages. Therefore I've decided to study a hard-coded two-unit connection (without plunging into the Graph API) and test its behavior with non-full-duplex hardware.
Unfortunately, I have found neither comprehensive documentation nor example code for creating the simplest two-unit play-through using only the straightforward connection paradigm suggested in Apple Technical Note TN2091:
AudioUnitElement halUnitOutputBus = 1; //1 suggested by TN2091 (else 0)
AudioUnitElement outUnitInputElement = 1; //1 suggested by TN2091 (else 0)
AudioUnitConnection halOutToOutUnitIn;
halOutToOutUnitIn.sourceAudioUnit = halAudioUnit;
halOutToOutUnitIn.sourceOutputNumber = halUnitOutputBus;
halOutToOutUnitIn.destInputNumber = outUnitInputElement;
AudioUnitSetProperty (outAudioUnit, // connection destination
kAudioUnitProperty_MakeConnection, // property key
kAudioUnitScope_Input, // destination scope
outUnitInputElement, // destination element
&halOutToOutUnitIn, // connection definition
sizeof (halOutToOutUnitIn)
);
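For reference, the preliminary step TN2091 describes before making such a connection is enabling input on element 1 and disabling output on element 0 of the HAL unit; roughly (a sketch, error handling omitted):
UInt32 enableInput   = 1;
UInt32 disableOutput = 0;
// Turn on capture on the HAL unit's input element (1)...
AudioUnitSetProperty(halAudioUnit, kAudioOutputUnitProperty_EnableIO,
                     kAudioUnitScope_Input, 1, &enableInput, sizeof(enableInput));
// ...and turn off its output element (0), since rendering goes to the second unit.
AudioUnitSetProperty(halAudioUnit, kAudioOutputUnitProperty_EnableIO,
                     kAudioUnitScope_Output, 0, &disableOutput, sizeof(disableOutput));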
My goal is to avoid involving the Graph API if possible, or, even worse, CARingBuffer from the so-called PublicUtility, which used to be plagued by bugs and latency issues for years and involves some ambitious assumptions, such as:
#if TARGET_OS_WIN32
#include <windows.h>
#include <intrin.h>
#pragma intrinsic(_InterlockedOr)
#pragma intrinsic(_InterlockedAnd)
#else
#include <CoreFoundation/CFBase.h>
#include <libkern/OSAtomic.h>
#endif
Thanks in advance for any hint which may point me in the right direction.
I was watching Bjarne Stroustrup's talk "The Essence of C++".
At 44:26 he mentions that "C++11 specifies a GC Interface".
May I ask what this interface is, and how it is implemented?
Is there a more detailed introduction online, or some sample code that demonstrates it?
Stroustrup expands on this discussion in his C++ FAQ; the point is that GC usage is optional, and library vendors are free to implement one or not:
Garbage collection (automatic recycling of unreferenced regions of
memory) is optional in C++; that is, a garbage collector is not a
compulsory part of an implementation. However, C++11 provides a
definition of what a GC can do if one is used and an ABI (Application
Binary Interface) to help control its actions.
The rules for pointers and lifetimes are expressed in terms of "safely
derived pointer" (3.7.4.3); roughly: "pointer to something allocated
by new or to a sub-object thereof."
to ordinary mortals: [...]
The functions in the C++ standard that support this (the "interface" Stroustrup is referring to) are:
std::declare_reachable
std::undeclare_reachable
std::declare_no_pointers
std::undeclare_no_pointers
These functions are presented in the N2670 proposal:
Its purpose is to support both garbage collected implementations and
reachability-based leak detectors. This is done by giving undefined
behavior to programs that "hide a pointer" by, for example, xor-ing it
with another value, and then later turn it back into an ordinary
pointer and dereference it. Such programs may currently produce
incorrect results with conservative garbage collectors, since an
object referenced only by such a "hidden pointer" may be prematurely
collected. For the same reason, reachability-based leak detectors may
erroneously report that such programs leak memory.
Either your implementation supports "strict pointer safety", in which case implementing a GC is possible, or it has "relaxed pointer safety" (the default), in which case it is not. You can determine which by looking at the result of std::get_pointer_safety(), if it is available.
I don't know of any actual standard C++ GC implementation, but at least the standard is preparing the ground for it to happen.
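To illustrate what the interface is for, here is a minimal sketch (not taken from the proposal) of the pointer-hiding scenario N2670 describes; on an implementation with relaxed pointer safety these calls are effectively no-ops:
#include <cstdint>
#include <memory>

int main()
{
    int* p = new int(42);
    std::declare_reachable(p);   // promise: the object stays reachable even with no ordinary pointer to it
    std::uintptr_t hidden = reinterpret_cast<std::uintptr_t>(p) ^ 0xDEADBEEF; // "hide" the pointer
    p = nullptr;                 // a conservative GC could otherwise collect the int now

    // ... later: recover the pointer and tell the implementation it is ordinary again
    int* q = std::undeclare_reachable(reinterpret_cast<int*>(hidden ^ 0xDEADBEEF));
    delete q;
}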
In addition to the good answer by quantdev, which I've upvoted, I wanted to provide a little more information here (which would not fit in a comment).
Here is a C++11 conforming program which demonstrates whether or not an implementation supports the GC interface:
#include <iostream>
#include <memory>
int
main()
{
#ifdef __STDCPP_STRICT_POINTER_SAFETY__
std::cout << __STDCPP_STRICT_POINTER_SAFETY__ << '\n';
#endif
switch (std::get_pointer_safety())
{
case std::pointer_safety::relaxed:
std::cout << "relaxed\n";
break;
case std::pointer_safety::preferred:
std::cout << "preferred\n";
break;
case std::pointer_safety::strict:
std::cout << "strict\n";
break;
}
}
An output of:
relaxed
means that the implementation's support is trivial: the GC interface functions do nothing at all.
libc++ outputs:
relaxed
VS-2015 outputs:
relaxed
gcc 5.0 outputs:
prog.cc: In function 'int main()':
prog.cc:10:13: error: 'get_pointer_safety' is not a member of 'std'
switch (std::get_pointer_safety())
^
Is there a way to protect an area of memory?
I have this struct:
#include <stdio.h>

#define BUFFER 4

struct {
    char s[BUFFER-1];
    const char zc;
} str = {'\0'};

int main(void) {
    printf("'%s', zc=%d\n", str.s, str.zc);
    return 0;
}
It is supposed to hold strings of length BUFFER-1 and guarantee that they end in '\0'.
But the compiler gives an error only for:
str.zc='e'; /*error */
Not if:
str.s[3]='e'; /*no error */
If compiling with gcc and some flag would do it, that is good as well.
Thanks,
Beco
To detect errors at runtime, take a look at the -fstack-protector-all option in gcc. It may be of limited use when attempting to detect very small overflows like the one you described.
Unfortunately you aren't going to find a lot of info on detecting buffer overflow scenarios like the one you described at compile-time. From a C language perspective the syntax is totally correct, and the language gives you just enough rope to hang yourself with. If you really want to protect your buffers from yourself you can write a front-end to array accesses that validates the index before it allows access to the memory you want.
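A minimal sketch of such a front-end (set_s is a hypothetical helper, and the check is a runtime assert rather than a compile-time error):
#include <assert.h>

#define BUFFER 4

struct padded_str {
    char s[BUFFER-1];
    const char zc;
};

// Hypothetical front-end: validates the index before touching the array,
// so an out-of-range write aborts instead of silently clobbering zc.
static void set_s(struct padded_str *p, int i, char c)
{
    assert(i >= 0 && i < BUFFER - 1);
    p->s[i] = c;
}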
The definition of GUID in the Windows headers is like this:
typedef struct _GUID {
unsigned long Data1;
unsigned short Data2;
unsigned short Data3;
unsigned char Data4[ 8 ];
} GUID;
However, no packing is defined. Since the alignment of structure members depends on the compiler implementation, one could think this structure could be longer than 16 bytes.
If I can assume it is always 16 bytes, my code using GUIDs is more efficient and simpler.
However, it would be completely unsafe if a compiler added some padding between the members for some reason.
My question: do potential reasons for that exist? Or is the probability of the scenario that sizeof(GUID) != 16 actually zero?
It's not official documentation, but perhaps this article can ease some of your fears. I think there was another one on a similar topic, but I cannot find it now.
What I want to say is that the Windows structures do have a packing specifier, but it's a global setting applied inside the header files with #pragma pack (via the pshpackN.h/poppack.h includes). And it is mandatory, because otherwise programs compiled by different compilers couldn't interact with each other, or even with Windows itself.
It's not zero; it depends on your compiler's alignment rules. If every member were aligned to a 4-byte word boundary, you'd get padding around the shorts and the size would be more than 16.
If you want to be sure that it's 16, manually disable the padding; otherwise use sizeof and don't assume the value.
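For illustration, "manually disabling the padding" usually means wrapping the definition in a packing pragma. A sketch using a hypothetical copy of the struct (you would not normally redefine the SDK's GUID):
#pragma pack(push, 1)
typedef struct _PackedGuid {
    unsigned long  Data1;
    unsigned short Data2;
    unsigned short Data3;
    unsigned char  Data4[8];
} PackedGuid;   // always 16 bytes: no padding can be inserted
#pragma pack(pop)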
If I feel I need to make an assumption like this, I'll put a 'compile time assertion' in the code. That way, the compiler will let me know if and when I'm wrong.
If you have or are willing to use Boost, there's a BOOST_STATIC_ASSERT macro that does this.
For my own purposes, I've cobbled together my own version (one that works in C or C++ with MSVC, GCC, and an embedded compiler or two) using techniques similar to those described in this article:
http://www.pixelbeat.org/programming/gcc/static_assert.html
The real trick to getting the compile-time assertion to work cleanly is dealing with the fact that some compilers don't like declarations mixed with code (MSVC in C mode), and that the techniques often generate warnings that you'd rather not have clogging up an otherwise working build. Coming up with techniques that avoid the warnings is sometimes a challenge.
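As a concrete illustration (assuming a Windows build where <guiddef.h> is available), the check can be as simple as:
#include <guiddef.h>   // brings in the GUID definition; <windows.h> works too

// C++11 (or C11 with _Static_assert): the language has this built in.
static_assert(sizeof(GUID) == 16, "GUID is expected to be exactly 16 bytes");

// Pre-C++11 fallback in the spirit of the pixelbeat article:
// the typedef is ill-formed (negative array size) if the condition is false.
typedef char guid_size_check[(sizeof(GUID) == 16) ? 1 : -1];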
Yes, on any Windows compiler. Otherwise IsEqualGUID would not work: it compares only the first 16 bytes. Similarly, any other WinAPI function that takes a GUID* just checks the first 16 bytes.
Note that you must not assume generic C or C++ rules for windows.h. For instance, a byte is always 8 bits on Windows, even though ISO C allows CHAR_BIT to be larger than 8.
Anytime you write code dependent on the size of someone else's structure, warning bells should go off.
Could you give an example of some of the simplified code you want to use?
Most people would just use sizeof(GUID) if the size of the structure was needed.
With that said -- I can't see the size of GUID ever changing.
#include <stdio.h>
#include <rpc.h>

int main() {
    GUID myGUID;
    printf("size of GUID is %u\n", (unsigned)sizeof(myGUID)); // sizeof yields size_t, so cast for printf
    return 0;
}
Got 16. This is useful to know if you need to manually allocate on the heap.