A particular C programming style problem? - coding-style

I have heard a lot about the importance of programming style. In my opinion, indention is easy to deal with. But other things frustrated me a lot. Considering a particular example to demonstrate the use of inet_makeaddr.
/* demonstrate the use of host address functions */
#include <stdio.h>
#include <arpa/inet.h>
#include <netinet/in.h>
int
main(void)
{
/* inet_makeaddr demo */
uint32_t neta = 0x0a3e5500;
uint32_t hosta = 0x0c;
struct in_addr alla = inet_makeaddr(neta, hosta);
printf("makeaddr of net: %08x and host: %08x = %08x\n",
neta, hosta, alla);
return 0;
}
Somebody may want to write as follows:
uint32_t neta;
uint32_t hosta;
struct in_addr alla;
neta = 0x0a3e5500;
hosta = 0x0c;
alla = inet_makeaddr(neta, hosta);
Then others may always initialize the variable when defined:
uint32_t neta = 0;
uint32_t hosta = 0;
struct in_addr alla = {0};
neta = 0x0a3e5500;
hosta = 0x0c;
alla = inet_makeaddr(neta, hosta);
Is any one of these better than the other ones, or it is just a personal taste?

I think the first of the three examples is the best: the second example one has uninitialized variables, and the third example has variables initialized to a meaningless (zero) value. I prefer to initialize variables (with a meaningful value) as soon as I define them (so that I don't have uninitialized variables). See also Should Local Variable Initialisation Be Mandatory?

I like to initialize values when defining. At least you know you won't have any "silly" NULL reference errors.

The bottom one has the small advantage that even if you change the code so that the initialization is no longer performed, you'll never have garbage in the variables, but only zeros. The top one has the same advantage though. My own preference in C is for functions to be extremely short, so that you never have to worry about those kinds of things, so I use the top form or the second form. But if your functions are long-winded, initializing everything to zero might be the way to go.

Personally I define the variables close to the function call if they are interesting for the function call. If it is an uninteresting variable I usually define it it in the declaration.

It is usually always better to initialise variables in any language. Somehow it's just oen of those little things that make your life easier, just like leaving a trailing comma.
Although if you are going to initialise your variables it's probably best to do it with a value that means something to your algorithm, otherwise you're not solving anything, just changing the way everything behaves when you create a bug.

Related

(Solved User error) c++11 make_shared<t>(new class) memory leak

I've been looking into smart pointers, unit testing how they manage memory and am finding and unexpected issue that all the examples recommend doing, but it creates a huge memory leak for me.
This seems to occur when I use a class that has a constructor that builds from another copy of the same class. I'll give an example.
If I have a class like:
Class foo{
public:
//Ignore unsafe practices here
HeavyInMemory* variable;
foo(){
variable = new HeavyInMemory();
}
foo(foo* copyThis){
variable = nullptr;
if(copyThis){
variable = new HeavyInMemory(copyThis->variable);
}
}
~foo(){
delete variable;
}
}
I find that I will get a huge memory leak because std::make_shared has no way to tell the difference between make_shared(args) and make_shared(new T)
Main(){
for(int i =0; i < 100; i++{
//Should not leak, if I follow examples of how to use make_shared
auto test = make_shared<foo>(new foo());
}
//Checking memory addresses these do not match, checking total program memory use, leaks like a
//sieve.
}
Am I misunderstanding something?
Do the examples just not consider this as most use primitive types as examples rather than classes.
Does c++11 just not support the make_shared(new T) format even though I see old books like scott meyers books from 1992. It just doesn't make sense.
Also why would you use make_shared(new T) over make_shared(args)? I've seen a couple threads where people have asked this on here, but neither seemed to actually answer the question with a code example.
//As they mainly say code compiler order causes the leak but in my example this would still leak:
auto originalObject = new foo();
auto expectedDestructorWhenOutofScope = make_shared<foo>(originalObject);
//I have found if I give if the object instead it doesn't leak, but this is getting into the realms of
//hacks that may sometimes work
auto originalObject = new foo();
auto expectedDestructorWhenOutofScope = make_shared<foo>(*originalObject);
EDIT:
Thanks to Igor Tandetnik I now see I am using make_shared entirely wrong. It should be used as a constructor. Thanks again Igor I appreciate it.
//Create new
auto expectedDestructorWhenOutofScope = make_shared<foo>();
//Use object already created
std::shared_ptr<Object> p2(new foo())

What exactly is the in-class-initializer?

I've read many text mentioned the in-class-initializer and I've searched many question on stackoverflow, However I didn't found any precise explanation on what is the in-class-initializer. And as far as I understood the variable of build-in type declared outside any function will be default initialized by the compiler, does the in-class-initilizer doing the same action for a declared variable?
Here is a simple example for in-class initialization. It's useful for less typing, especially when more than one constructor signatures are available. It's recommend in the core guidelines, too.
class Foo {
public:
Foo() = default; // No need to initialize data members in the initializer list.
Foo(bool) { /* Do stuff here. */ } // Again, data member already have values.
private:
int bar = 42;
// ^^^^ in-class initialization
int baz{};
// ^^ same, but requests zero initialization
};
As the data members are explicitly initialized, the second part of your questions doesn't really apply to to in-class initialization.

How can flex return multiple terminals at one time

In order to make my question easy to understand I want to use the following example:
The following code is called nonblock do-loop in fortran language
DO 20 I=1, N ! line 1
DO 20 J=1, N ! line 2
! more codes
20 CONTINUE ! line 4
Pay attention that the label 20 at line 4 means the end of both the inner do-loop and the outer do-loop.
I want my flex program to parse the feature correctly: when flex reads the label 20, it will return ENDDO terminal twice.
Firstly, because I also use bison, so every time bison calls yylex() to get one terminal. If I can ask bison to get terminals from yylex() in some cases, and from another function in other cases, maybe I could solve this problem, however, I got no idea here then.
Of course there are some workarounds, for eample, I can use flex's start condition but I don't think it is a good solution. So I ask if there's any way to solve my question without a workaround?
It is easy enough to modify the lexical scanner produced by (f)lex to implement a token queue, but that is not necessarily the optimal solution. (See below for a better solution.) (Also, it is really not clear to me that for your particular problem, fabricating the extra token in the lexer is truly appropriate.)
The general approach is to insert code at the top of the yylex function, which you can do by placing the code immediately after the %% line and before the first rule. (The code must be indented so that it is not interpreted as a rule.) For non-reentrant scanners, this will typically involve the use of a local static variable to hold the queue. For a simple but dumb example, using the C API but compiling with C++ so as to have access to the C++ standard library:
%%
/* This code will be executed each time `yylex` is called, before
* any generated code. It may include declarations, even if compiled
* with C89.
*/
static std::deque<int> tokenq;
if (!tokenq.empty()) {
int token = tokenq.front();
tokenq.pop_front();
return token;
}
[[:digit:]]+ { /* match a number and return that many HELLO tokens */
int n = atoi(yytext);
for (int i = 0; i < n; ++i)
tokenq.push_back(HELLO);
}
The above code makes no attempt to provide a semantic value for the queued tokens; you could achieve that using something like a std::queue<std::pair<int, YYSTYPE>> for the token queue, but the fact that YYSTYPE is typically a union will make for some complications. Also, if that were the only reason to use the token queue, it is obvious that it could be replaced with a simple counter, which would be much more efficient. See, for example, this answer which does something vaguely similar to your question (and take note of the suggestions in Note 1 of that answer).
Better alternative: Use a push parser
Although the token queue solution is attractive and simple, it is rarely the best solution. In most cases, code will be clearer and easier to write if you request bison to produce a "push parser". With a push parser, the parser is called by the lexer every time a token is available. This makes it trivial to return multiple tokens from a lexer action; you just call the parser for each token. Similarly, if a rule doesn't produce any tokens, it simply fails to call the parser. In this model, the only lexer action which actually returns is the <<EOF>> rule, and it only does so after calling the parser with the END token to indicate that parsing is complete.
Unfortunately, the interface for push parsers is not only subject to change, as that manual link indicates; it is also very badly documented. So here is a simple but complete example which shows how it is done.
The push parser keeps its state in a yypstate structure, which needs to be passed to the parser on each call. Since the lexer is called only once for each input file, it is reasonable for the lexer to own that structure, which can be done as above with a local static variable [Note 1]: the parser state is initialized when yylex is called, and the EOF rule deletes the parser state in order to reclaim whatever memory it is using.
It is usually most convenient to build a reentrant push parser, which means that the parser does not rely on the global yylval variable [Note 2]. Instead, a pointer to the semantic value must be provided as an additional argument to yypush_parse. If your parser doesn't refer to the semantic value for the particular token type, you can provide NULL for this argument. Or, as in the code below, you can use a local semantic value variable in the lexer. It is not necessary that every call to the push parser provide the same pointer. In all, the changes to the scanner definition are minimal:
%%
/* Initialize a parser state object */
yypstate* pstate = yypstate_new();
/* A semantic value which can be sent to the parser on each call */
YYSTYPE yylval;
/* Some example scanner actions */
"keyword" { /* Simple keyword which just sends a value-less token */
yypush_parse(pstate, TK_KEYWORD, NULL); /* See Note 3 */
}
[[:digit:]]+ { /* Token with a semantic value */
yylval.num = atoi(yytext);
yypush_parse(pstate, TK_NUMBER, &yylval);
}
"dice-roll" { /* sends three random numbers */
for (int i = 0; i < 2; ++i) {
yylval.num = rand() % 6;
yypush_parse(pstate, TK_NUMBER, &yylval);
}
<<EOF>> { /* Obligatory EOF rule */
/* Send the parser the end token (0) */
int status = yypush_parse(pstate, 0, NULL);
/* Free the pstate */
yypstate_delete(pstate);
/* return the parser status; 0 is success */
return status;
}
In the parser, not much needs to be changed at all, other than adding the necessary declarations: [Note 4]
%define api.pure full
%define api.push-pull push
Notes
If you were building a reentrant lexer as well, you would use the extra data section of the lexer state object instead of static variables.
If you are using location objects in your parser to track source code locations, this also applies to yylloc.
The example code does not do a good job of detecting errors, since it doesn't check return codes from the calls to yypush_parse. One solution I commonly use is some variant on the macro SEND:
#define SEND(token) do { \
int status = yypush_parse(pstate, token, &yylval); \
if (status != YYPUSH_MORE) { \
yypstate_delete(pstate); \
return status; \
} \
} while (0)
It's also possible to use a goto to avoid the multiple instances of the yypstate_delete and return. YMMV.
You may have to modify the prototype of yyerror. If you are using locations and/or providing extra parameters to the push_parser, the location object and/or the extra parameters will also be present in the yyerror call. (The error string is always the last parameter.) For whatever reason, the parser state object is not provided to yyerror, which means that the yyerror function no longer has access to variables such as yych, which are now members of the yypstate structure rather than being global variables, so if you use these variables in your error reporting (which is not really recommended practice), then you will have to find an alternative solution.
Thanks to one of my friends, he provide a way to achieve
If I can ask bison to get terminals from yylex() in some cases, and from another function in other cases
In flex generated flex.cpp code, there is a macro
/* Default declaration of generated scanner - a define so the user can
* easily add parameters.
*/
#ifndef YY_DECL
#define YY_DECL_IS_OURS 1
extern int yylex (void);
#define YY_DECL int yylex (void)
#endif /* !YY_DECL */
so I can "rename" flex's yylex() function to another function like pure_yylex().
So my problem is solved by:
push all terminals I want to give bison to a global vector<int>
implement a yylex() function by myself, when bison call yylex(), this function will firstly try to get terminals from a that global vector<int>
if vector<int> is empty, yylex() calls pure_yylex(), and flex starts to work

When to use ostream_iterator

As I know, we can use ostream_iterator in c++11 to print a container.
For example,
std::vector<int> myvector;
for (int i=1; i<10; ++i) myvector.push_back(i*10);
std::copy ( myvector.begin(), myvector.end(), std::ostream_iterator<int>{std::cout, " "} );
I don't know when and why we use the code above, instead of traditional way, such as:
for(const auto & i : myvector) std::cout<<i<<" ";
In my opinion, the traditional way is faster because there is no copy, am I right?
std::ostream_iterator is a single-pass OutputIterator, so it can be used in any algorithms which accept such iterator. The use of it for outputing vector of int-s is just for presenting its capabilities.
In my opinion, the traditional way is faster because there is no copy, am I right?
You may find here: http://en.cppreference.com/w/cpp/algorithm/copy that copy is implemented quite similarly to your for-auto loop. It is also specialized for various types to work as efficient as possible. On the other hand writing to std::ostream_iterator is done by assignment to it, and you can read here : http://en.cppreference.com/w/cpp/iterator/ostream_iterator/operator%3D that it resolves to *out_stream << value; operation (if delimiter is ignored).
You may also find that this iterator suffers from the problem of extra trailing delimiter which is inserted at the end. To fix this there will be (possibly in C++17) a new is a single-pass OutputIterator std::experimental::ostream_joiner
A short (and maybe silly) example where using iterator is usefull. The point is that you can direct your data to any sink - a file, console output, memory buffer. Whatever output you choose, MyData::serialize does not needs changes, you only need to provide OutputIterator.
struct MyData {
std::vector<int> data = {1,2,3,4};
template<typename T>
void serialize(T iterator) {
std::copy(data.begin(), data.end(), iterator);
}
};
int main()
{
MyData data;
// Output to stream
data.serialize(std::ostream_iterator<int>(std::cout, ","));
// Output to memory
std::vector<int> copy;
data.serialize(std::back_inserter(copy));
// Other uses with different iterator adaptors:
// std::front_insert_iterator
// other, maybe custom ones
}
The difference is polymorphism vs. hardcoded stream.
std::ostream_iterator builds itself from any class which inherits from std::ostream, so in runtime, you can change or wire the iterator to write to difference output stream type based on the context on which the functions runs.
the second snippet uses a hardcoded std::cout which cannot change in runtime.

Is there a way to make a moved object "invalid"?

I've some code that moves an object into another object. I won't need the original, moved object anymore in the upper level. Thus move is the right choice I think.
However, thinking about safety I wonder if there is a way to invalidate the moved object and thus preventing undefined behaviour if someone accesses it.
Here is a nice example:
// move example
#include <utility> // std::move
#include <vector> // std::vector
#include <string> // std::string
int main () {
std::string foo = "foo-string";
std::string bar = "bar-string";
std::vector<std::string> myvector;
myvector.push_back (foo); // copies
myvector.push_back (std::move(bar)); // moves
return 0;
}
The description says:
The first call to myvector.push_back copies the value of foo into the
vector (foo keeps the value it had before the call). The second call
moves the value of bar into the vector. This transfers its content
into the vector (while bar loses its value, and now is in a valid but
unspecified state).
Is there a way to invalidate bar, such that access to it will cause a compiler error? Something like:
myvector.push_back (std::move(bar)); // moves
invalidate(bar); //something like bar.end() will then result in a compiler error
Edit: And if there is no such thing, why?
Accessing the moved object is not undefined behavior. The moved object is still a valid object, and the program may very well want to continue using said object. For example,
template< typename T >
void swap_by_move(T &a, T &b)
{
using std::move;
T c = move(b);
b = move(a);
a = move(c);
}
The bigger picture answer is because moving or not moving is a decision made at runtime, and giving a compile-time error is a decision made at compile time.
foo(bar); // foo might move or not
bar.baz(); // compile time error or not?
It's not going to work.. you can approximate in compile time analysis, but then it's going to be really difficult for developers to either not get an error or making anything useful in order to keep a valid program or the developer has to make annoying and fragile annotations on functions called to promise not to move the argument.
To put it a different way, you are asking about having a compile time error if you use an integer variable that contains the value 42. Or if you use a pointer that contains a null pointer value. You might be succcessful in implementing an approximate build-time code convention checker using clang the analysis API, however, working on the CFG of the C++ AST and erroring out if you can't prove that std::move has not been called till a given use of a variable.
Move semantics works like that so you get an object in any it's correct state. Correct state means that all fields have correct value, and all internal invariants are still good. That was done because after move you don't actually care about contents of moved object, but stuff like resource management, assignments and destructors should work OK.
All STL classes (and all classed with default move constructor/assignment) just swap it's content with new one, so both states are correct, and it's very easy to implement, fast, and convinient enough.
You can define your class that has isValid field that's generally true and on move (i. e. in move constructor / move assignment) sets that to false. Then your object will have correct state I am invalid. Just don't forget to check it where needed (destructor, assignment etc).
That isValid field can be either one pointer having null value. The point is: you know, that object is in predictable state after move, not just random bytes in memory.
Edit: example of String:
class String {
public:
string data;
private:
bool m_isValid;
public:
String(string const& b): data(b.data), isValid(true) {}
String(String &&b): data(move(b.data)) {
b.m_isValid = false;
}
String const& operator =(String &&b) {
data = move(b.data);
b.m_isValid = false;
return &this;
}
bool isValid() {
return m_isValid;
}
}

Resources