Context
Assessment piece for a data structures and algorithms course, an exercise in using an AVL tree and hash table to parse input to create a dictionary file and then use that file to perform cursory spell checking.
N.B.: I am not asking for help in solving this problem; that's not what I'm having difficulty with. I am asking for help understanding an aspect of C++ function object passing/usage that is causing me considerable frustration. This aspect of C++ is not part of the assessment and there are no marks attached to it; I simply have a personal issue with submitting code whose design I dislike.
Problem
Passing a functor to a recursive function results in compiler error, "attempt to use a deleted function." I thought this was an issue with passing the functor by value, so I changed the parameter to pass by reference which yields a, "no matching member function for call to <public member function of AVL tree that kicks off the recursion>," in which case I don't know how to alter the function declaration so it does match. I have also tried making the parameter: const UnaryFunction& action (a constant function-object reference), but this yields the compiler error, "no matching function for call to object of type 'const std::__1::__mem_fn<void (DictGen::*)(std::__1::basic_string<char> &)>'," in which case I can't understand why it wouldn't be matching to the DictGen::output signature.
Code
Relevant parts of AVL tree class:
template <class T>
struct AVLNode
{ // simple data carrier node for AVL tree
AVLNode<T>* lChild;
AVLNode<T>* rChild;
AVLBalance balFac;
T data;
};
template <class T>
class AVLTree
{
...
AVLNode<T>* root;
template <class UnaryFunction>
void inorderAction( AVLNode<T>* node, UnaryFunction action )
{
if ( node != NULL )
{
inorderAction( node->lChild, action );
action( node->data ); // << problem line
inorderAction( node->rChild, action );
}
}
public:
template <class UnaryFunction>
void inorder( UnaryFunction action )
{
inorderAction( root, action );
}
};
Relevant parts of DictGen class:
class DictGen
{
...
FILE* outStream;
AVLTree<std::string> dict;
void output( std::string& word )
{
fprintf( outStream, "%s\n", word.c_str() );
}
public:
void goGoGadgetDictionaryGenerator()
{
...
dict.inorder( std::mem_fn( &DictGen::output ) ); // << also problem line
}
};
Interpretation/Translation
AVL tree class has a flexible inorder traversal that allows me to action the node however I want with the given UnaryFunction action. A DictGen object is initialised with a FILE* so DictGen instances may output to different files, hence the need to pass a member function object in the dict.inorder( ... ) call.
Efforts/research so far
My initial solution was to follow the functions as parameters example given in our textbook which involved using C function pointers and polluting global space. Although this worked I was unsatisfied with this design; I wished to bundle this behaviour in a DictGen class.
After I consulted both my lecturer and lab tutor, they suggested using C++ functors but weren't able to help with implementation as neither had used functors in a while.
I forged ahead finding very handy material on SO (helping me reference a member function), several functor tutorials via Google and an excellent PDF from a Stanford course regarding functor implementation and usage. However, while all these resources have carried me this far, none have been able to shed any light on my current predicament. I was really hoping making the parameter a const UnaryFunction& would solve it but can't understand why the signature doesn't match.
I have also tried using an inline lambda but require the object context to access outStream.
I have spent the last four days ploughing away at this issue and the only remaining lead I have is an SO post that casually remarked that the C++ spec contains information about the implicit deletion of function objects but I haven't been able to make any further progress. If there is an SO post that solves my issue, I haven't been able to find it.
Questions
Does the recursion really have anything to do with this issue?
Is there some novice aspect of functor passing/usage I'm not grasping?
What is causing the function to be deleted?
What am I missing about getting the function signatures to match when it appears that function deletion isn't the issue?
This is my very first SO post, and I have done my best to keep the question-asking suggestions in mind. I welcome any constructive criticism to help me improve this post so that it can both solve my issue and serve as a future resource for similar issues.
You need to have an instance of DictGen bound to the member function:
// ...
void gen()
{
dict.inorder(
std::bind( std::mem_fn( &DictGen::output ),
this, std::placeholders::_1) );
}
// ...
You are coding in C++11. While there are uses for std::mem_fn and std::bind, they are a very awkward way to generate this kind of functor.
void gen()
{
dict.inorder(
[this]( std::string& word ) { this->output(word); }
);
}
While the lambda syntax might be somewhat new to you, it is far less backwards than the std::bind( std::mem_fn( &T::method ), this, std::placeholders::_1) version.
The basic syntax of a lambda is:
[capture-list]( arguments )->return value { code }
where capture-list is [=] (auto-capture by value) or [&] (auto-capture by reference) or [var1, var2] (capture var1 and var2 by value) or [&var1, &var2] (capture var1 and var2 by reference) or a mixture of same. (C++1y adds new syntax, like [x = std::move(y)])
(arguments) is just the usual function parameter list. It is actually optional, but required if you want to specify a return type.
-> return value is optional for single-statement lambdas, or lambdas that return void. (In C++1y, it is optional even with multiple returns)
Then the code.
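To make the difference concrete, here is a minimal sketch of both approaches side by side. The names Collector and visitAll are hypothetical stand-ins for DictGen and AVLTree::inorder (a std::vector replaces the tree and a std::string replaces the FILE*); they are assumptions for illustration, not the asker's code. The wrapper returned by std::mem_fn( &DictGen::output ) expects the object as its first argument, which is why calling it with only node->data finds no matching call; both forms below supply the object.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for AVLTree<T>::inorder: applies action to each item.
template <class T, class UnaryFunction>
void visitAll( std::vector<T>& items, UnaryFunction action )
{
    for ( T& item : items )
        action( item );
}

// Hypothetical stand-in for DictGen; collects into a string instead of a FILE*.
class Collector
{
    std::vector<std::string> words;
    std::string out;

    void output( std::string& word ) { out += word; out += '\n'; }

public:
    explicit Collector( std::vector<std::string> w ) : words( std::move( w ) ) {}

    std::string viaBind()
    {
        out.clear();
        // Bind the object (this) so the result is callable with one argument.
        visitAll( words, std::bind( &Collector::output, this, std::placeholders::_1 ) );
        return out;
    }

    std::string viaLambda()
    {
        out.clear();
        // Capturing this gives the lambda access to the object's members.
        visitAll( words, [this]( std::string& word ) { output( word ); } );
        return out;
    }
};

// Usage: both forms visit the same words in the same order.
void demoCollector()
{
    Collector c( std::vector<std::string>{ "apple", "pear" } );
    assert( c.viaBind() == "apple\npear\n" );
    assert( c.viaLambda() == "apple\npear\n" );
}
```

Because the functor is passed by value through the recursion in the original code, a small lambda that captures only this is cheap to copy at every level.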
I'm building a publish-subscribe class (called SystemInterface), which is responsible for receiving updates from its instances and publishing them to subscribers.
Adding a subscriber callback function is trivial and has no issues, but removing it yields an error, because std::function is not equality-comparable in C++.
std::vector<std::function<void()>> subs;
void subscribe(std::function<void()> f)
{
subs.push_back(f);
}
void unsubscribe(std::function<void()> f)
{
std::remove(subs.begin(), subs.end(), f); // Error
}
I've come up with five solutions to this error:
1. Register the function using a weak_ptr, where the subscriber must keep the returned shared_ptr alive. Solution example at this link.
2. Instead of registering in a vector, map the callback function by a custom key, unique per callback function. Solution example at this link.
3. Use a vector of function pointers. Example
4. Make the callback function comparable by utilizing the address.
5. Use an interface class (parent class) to call a virtual function. In my design, all intended classes inherit from a parent class called ServiceCore, so instead of registering a callback function, just register a ServiceCore reference in the vector.
Note that the SystemInterface class has a per-instance ID field (which is managed by ServiceCore, and supplied to SystemInterface by constructing a ServiceCore child instance).
From my perspective, the first solution is neat and would work, but it requires handling on the subscriber side, which is something I don't really prefer.
The second solution would make my implementation more complex. My implementation currently looks like this:
using namespace std;
enum INFO_SUB_IMPORTANCE : uint8_t
{
INFO_SUB_PRIMARY, // Only gets the important updates.
INFO_SUB_COMPLEMENTARY, // Gets more.
INFO_SUB_ALL // Gets all updates
};
using CBF = function<void(string,string)>;
using INFO_SUBTREE = map<INFO_SUB_IMPORTANCE, vector<CBF>>;
using REQINF_SUBS = map<string, INFO_SUBTREE>; // It's keyed by an iterator, explaining it goes out of the question scope.
using INFSRC_SUBS = map<string, INFO_SUBTREE>;
using WILD_SUBS = INFO_SUBTREE;
REQINF_SUBS infoSubrs;
INFSRC_SUBS sourceSubrs;
WILD_SUBS wildSubrs;
void subscribeInfo(string info, INFO_SUB_IMPORTANCE imp, CBF f) {
infoSubrs[info][imp].push_back(f);
}
void subscribeSource(string source, INFO_SUB_IMPORTANCE imp, CBF f) {
sourceSubrs[source][imp].push_back(f);
}
void subscribeWild(INFO_SUB_IMPORTANCE imp, CBF f) {
wildSubrs[imp].push_back(f);
}
The second solution would require INFO_SUBTREE to be an extended map, but can be keyed by an ID:
using KEY_T = uint32_t; // or string...
using INFO_SUBTREE = map<INFO_SUB_IMPORTANCE, map<KEY_T,CBF>>;
For the third solution, I'm not aware of the limitations of using function pointers, nor am I aware of the consequences of the fourth solution.
The fifth solution would eliminate the need to deal with CBFs, but it would be more complex on the subscriber side: a subscriber is required to override the virtual function and so receives all updates in one place, which in turn requires filtering on the message ID and directing the payload to the intended routines using multiple if/else blocks, which will grow as subscriptions are added.
What I'm looking for is advice on the best available option.
Regarding your proposed solutions:
1. That would work. It can be made easy for the caller: have subscribe() create the shared_ptr and corresponding weak_ptr objects, and let it return the shared_ptr.
2. Then the caller must not lose the key. In a way this is similar to the above.
3. This of course is less generic, and then you can no longer have (the equivalent of) captures.
4. You can't: there is no way to get the address of the function stored inside a std::function. You can do &f inside subscribe() but that will only give you the address of the local variable f, which will go out of scope as soon as you return.
5. That works, and is in a way similar to 1 and 2, although now the "key" is provided by the caller.
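As a rough illustration of how option 1 could be made easy for the caller, the sketch below has subscribe() create the shared_ptr and keep only a weak_ptr. The class name Publisher and the publish() method are hypothetical, and the sketch assumes callbacks do not subscribe or unsubscribe while publish() is running.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical publisher: holds weak references, the subscriber owns the token.
class Publisher {
    typedef std::function<void()> Callback;
    std::vector<std::weak_ptr<Callback>> subs;

public:
    // Returns the owning token; destroying or resetting it unsubscribes.
    std::shared_ptr<Callback> subscribe(Callback f) {
        auto token = std::make_shared<Callback>(std::move(f));
        subs.push_back(token);
        return token;
    }

    // Calls every live subscriber, drops expired ones, returns how many ran.
    std::size_t publish() {
        std::size_t called = 0;
        for (auto it = subs.begin(); it != subs.end();) {
            if (auto f = it->lock()) {
                (*f)();
                ++called;
                ++it;
            } else {
                it = subs.erase(it);
            }
        }
        return called;
    }
};

// Usage: the callback stops being called once the token is released.
void demoPublisher() {
    Publisher p;
    int hits = 0;
    auto token = p.subscribe([&hits] { ++hits; });
    assert(p.publish() == 1);
    token.reset();
    assert(p.publish() == 0);
    assert(hits == 1);
}
```

The cost the answer calls "heavy-weight" is visible here: one extra allocation per subscription and a lock() per call.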
Options 1, 2 and 5 are similar in that there is some other data stored in subs that refers to the actual std::function: either a std::shared_ptr, a key or a pointer to a base class. I'll present option 6 here, which is kind of similar in spirit but avoids storing any extra data:
Store a std::function<void()> directly, and return the index in the vector where it was stored. When removing an item, don't std::remove() it, but just set it to nullptr. The next time subscribe() is called, it checks whether there is an empty element in the vector and reuses it:
std::vector<std::function<void()>> subs;
std::size_t subscribe(std::function<void()> f) {
    if (auto it = std::find(subs.begin(), subs.end(), nullptr); it != subs.end()) {
        *it = f;
        return std::distance(subs.begin(), it);
    } else {
        subs.push_back(f);
        return subs.size() - 1;
    }
}
void unsubscribe(std::size_t index) {
    subs[index] = nullptr;
}
The code that actually calls the functions stored in subs must now of course first check for empty entries. The above works because nullptr is treated as the "empty" function, and there is an operator==() overload that can compare a std::function against nullptr, thus making std::find() work.
One drawback of option 6 as shown above is that a std::size_t is a rather generic type. To make it safer, you might wrap it in a class SubscriptionHandle or something like that.
As for the best solution: option 1 is quite heavy-weight. Options 2 and 5 are very reasonable, but 6 is, I think, the most efficient.
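The SubscriptionHandle wrapper mentioned above might look something like the following sketch. The class names Subscribers and SubscriptionHandle, and the publish() and slots() methods, are assumptions made for illustration; only the slot-reuse idea comes from the answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical opaque handle: only Subscribers can create one or read the index.
class SubscriptionHandle {
    friend class Subscribers;
    std::size_t index;
    explicit SubscriptionHandle(std::size_t i) : index(i) {}
};

class Subscribers {
    std::vector<std::function<void()>> subs;

public:
    // Reuses the first empty slot if there is one, otherwise appends.
    SubscriptionHandle subscribe(std::function<void()> f) {
        auto it = std::find(subs.begin(), subs.end(), nullptr);
        if (it == subs.end())
            it = subs.insert(subs.end(), std::move(f));
        else
            *it = std::move(f);
        return SubscriptionHandle(static_cast<std::size_t>(it - subs.begin()));
    }

    // Empties the slot; the vector never shrinks, so other handles stay valid.
    void unsubscribe(SubscriptionHandle h) {
        assert(h.index < subs.size());
        subs[h.index] = nullptr;
    }

    // Calls every non-empty slot and returns how many callbacks ran.
    std::size_t publish() const {
        std::size_t called = 0;
        for (const auto& f : subs) {
            if (f) {
                f();
                ++called;
            }
        }
        return called;
    }

    std::size_t slots() const { return subs.size(); }
};
```

One caveat with this sketch: a handle kept after unsubscribe() refers to whichever callback later reuses the slot, so a stale handle can remove the wrong subscriber. Adding a generation counter to the handle would be one way to detect that.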
I've some code that moves an object into another object. I won't need the original, moved object anymore in the upper level. Thus move is the right choice I think.
However, thinking about safety I wonder if there is a way to invalidate the moved object and thus preventing undefined behaviour if someone accesses it.
Here is a nice example:
// move example
#include <utility> // std::move
#include <vector> // std::vector
#include <string> // std::string
int main () {
std::string foo = "foo-string";
std::string bar = "bar-string";
std::vector<std::string> myvector;
myvector.push_back (foo); // copies
myvector.push_back (std::move(bar)); // moves
return 0;
}
The description says:
The first call to myvector.push_back copies the value of foo into the
vector (foo keeps the value it had before the call). The second call
moves the value of bar into the vector. This transfers its content
into the vector (while bar loses its value, and now is in a valid but
unspecified state).
Is there a way to invalidate bar, such that access to it will cause a compiler error? Something like:
myvector.push_back (std::move(bar)); // moves
invalidate(bar); //something like bar.end() will then result in a compiler error
Edit: And if there is no such thing, why?
Accessing the moved object is not undefined behavior. The moved object is still a valid object, and the program may very well want to continue using said object. For example,
template< typename T >
void swap_by_move(T &a, T &b)
{
using std::move;
T c = move(b);
b = move(a);
a = move(c);
}
The bigger-picture answer is that moving or not moving is a decision made at runtime, whereas giving a compile-time error is a decision made at compile time.
foo(bar); // foo might move or not
bar.baz(); // compile time error or not?
It's not going to work. You can approximate this with compile-time analysis, but then either developers will find it really difficult to avoid the error while still writing anything useful, or they will have to put annoying and fragile annotations on the called functions to promise not to move the argument.
To put it a different way, you are asking for a compile-time error if you use an integer variable that contains the value 42, or if you use a pointer that contains a null pointer value. You might, however, be successful in implementing an approximate build-time code-convention checker using the clang analysis API, working on the CFG of the C++ AST and erroring out if you can't prove that std::move has not been called before a given use of a variable.
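A small sketch of why this is a runtime matter: std::move is only a cast, and whether anything is actually moved depends on what the callee does with the argument. The function storeIf below is a hypothetical example, not part of any library.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Takes the contents of text only when keep is true.
// The compiler cannot know at the call site which branch will run.
bool storeIf(bool keep, std::string&& text, std::vector<std::string>& sink) {
    if (keep)
        sink.push_back(std::move(text));
    return keep;
}

void demoStoreIf() {
    std::vector<std::string> sink;
    std::string bar = "bar-string";

    // std::move(bar) was written, but nothing was moved: bar is untouched.
    storeIf(false, std::move(bar), sink);
    assert(bar == "bar-string");

    // Now the contents really are moved into the vector.
    storeIf(true, std::move(bar), sink);
    assert(sink.size() == 1 && sink[0] == "bar-string");

    // bar is valid but unspecified here; assigning a new value is always fine.
    bar = "reused";
    assert(bar == "reused");
}
```

A compile-time invalidate(bar) after the first call would reject a program in which bar still holds its original value.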
Move semantics work like that so that the moved-from object is left in one of its correct states. A correct state means that all fields have valid values and all internal invariants still hold. It was designed this way because after a move you don't actually care about the contents of the moved-from object, but things like resource management, assignment and destructors should still work.
Standard library classes (and classes with a default move constructor/assignment) typically just swap or hand over their contents to the new object, so both states are correct, and it's very easy to implement, fast, and convenient enough.
You can define your own class with an isValid field that is normally true and is set to false on move (i.e. in the move constructor / move assignment). Your object will then have the correct state "I am invalid". Just don't forget to check it where needed (destructor, assignment etc).
The isValid flag can also be a pointer that holds a null value. The point is: you know that the object is in a predictable state after a move, not just random bytes in memory.
Edit: example of String:
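#include <string>
#include <utility>

class String {
public:
    std::string data;
private:
    bool m_isValid;
public:
    String(std::string const& b): data(b), m_isValid(true) {}
    String(String &&b): data(std::move(b.data)), m_isValid(b.m_isValid) {
        b.m_isValid = false; // mark the moved-from object as invalid
    }
    String& operator =(String &&b) {
        data = std::move(b.data);
        m_isValid = b.m_isValid;
        b.m_isValid = false; // mark the moved-from object as invalid
        return *this;
    }
    bool isValid() const {
        return m_isValid;
    }
};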
In C#, you can define a custom enumeration very trivially, eg:
public IEnumerable<Foo> GetNestedFoos()
{
foreach (var child in _SomeCollection)
{
foreach (var foo in child.FooCollection)
{
yield return foo;
}
foreach (var bar in child.BarCollection)
{
foreach (var foo in bar.MoreFoos)
{
yield return foo;
}
}
}
foreach (var baz in _SomeOtherCollection)
{
foreach (var foo in baz.GetNestedFoos())
{
yield return foo;
}
}
}
(This can be simplified using LINQ and better encapsulation but that's not the point of the question.)
In C++11, you can do similar enumerations but AFAIK it requires a visitor pattern instead:
template<typename Action>
void VisitAllFoos(const Action& action)
{
for (auto& child : m_SomeCollection)
{
for (auto& foo : child.FooCollection)
{
action(foo);
}
for (auto& bar : child.BarCollection)
{
for (auto& foo : bar.MoreFoos)
{
action(foo);
}
}
}
for (auto& baz : m_SomeOtherCollection)
{
baz.VisitAllFoos(action);
}
}
Is there a way to do something more like the first, where the function returns a range that can be iterated externally rather than calling a visitor internally?
(And I don't mean by constructing a std::vector<Foo> and returning it -- it should be an in-place enumeration.)
I am aware of the Boost.Range library, which I suspect would be involved in the solution, but I'm not particularly familiar with it.
I'm also aware that it's possible to define custom iterators to do this sort of thing (which I also suspect might be involved in the answer) but I'm looking for something that's easy to write, ideally no more complicated than the examples shown here, and composable (like with _SomeOtherCollection).
I would prefer something that does not require the caller to use lambdas or other functors (since that just makes it a visitor again), although I don't mind using lambdas internally if needed (but would still prefer to avoid them there too).
If I'm understanding your question correctly, you want to perform some action over all elements of a collection.
C++ has an extensive set of iterator operations, defined in the iterator header. Most collection structures, including the std::vector that you reference, have .begin and .end methods which take no arguments and return iterators to the beginning and the end of the structure. These iterators have some operations that can be performed on them manually, but their primary use comes in the form of the algorithm header, which defines several very useful iteration functions.
In your specific case, I believe you want the for_each function, which takes a range (as a beginning to end iterator) and a function to apply. So if you had a function (or function object) called action and you wanted to apply it to a vector called data, the following code would be correct (assuming all necessary headers are included appropriately):
std::for_each(data.begin(), data.end(), action);
Note that for_each is just one of many functions provided by the algorithm header. It also provides functions to search a collection, copy a set of data, sort a list, find a minimum/maximum, and much more, all generalized to work over any structure that has an iterator. And if even these aren't enough, you can write your own by reading up on the operations supported on iterators. Simply define a template function that takes iterators of varying types and document what kind of iterator you want.
template <typename BidirectionalIterator>
void function(BidirectionalIterator begin, BidirectionalIterator end) {
// Do something
}
One final note is that all of the operations mentioned so far also operate correctly on arrays, provided you know the size. Instead of writing .begin and .end, you write + 0 and + n, where n is the size of the array. The trivial zero addition is often necessary in order to decay the type of the array into a pointer to make it a valid iterator, but array pointers are indeed random access iterators just like any other container iterator.
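As a brief illustration of the two call forms described above, the following sketch applies the same function object to a std::vector and to a plain array. The Summer functor and the two helper functions are hypothetical names chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical function object that accumulates the values it is called with.
struct Summer {
    int total;
    Summer() : total(0) {}
    void operator()(int value) { total += value; }
};

// Container form: iterate from .begin() to .end().
int sumVector(const std::vector<int>& data) {
    // std::for_each returns the function object after the last call.
    return std::for_each(data.begin(), data.end(), Summer()).total;
}

// Array form: raw + 0 and raw + n play the roles of begin and end.
int sumArray() {
    int raw[4] = {1, 2, 3, 4};
    return std::for_each(raw + 0, raw + 4, Summer()).total;
}

void demoForEach() {
    assert(sumVector(std::vector<int>{1, 2, 3}) == 6);
    assert(sumArray() == 10);
}
```

Note that this is still the visitor style the question wanted to avoid; it only shows how the standard algorithm is invoked.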
What you can do is write your own adapter function and call it with different ranges of elements of the same type.
This is an untested solution that will probably need some tweaking to make it compile, but it will give you an idea. It uses variadic templates to move from one collection to the next.
// Base case: no more collections to visit.
inline void visitAllFoos() {}

template<typename Iterator, typename... Args>
void visitAllFoos(std::pair<Iterator, Iterator> collection, Args&&... args)
{
    typedef typename std::iterator_traits<Iterator>::reference Ref;
    std::for_each(collection.first, collection.second, [](Ref foo){ /* apply action to foo */ });
    visitAllFoos(std::forward<Args>(args)...);
}
// You can call it with a sequence of begin/end iterator pairs:
visitAllFoos(std::make_pair(c1.begin(), c1.end()), std::make_pair(c2.begin(), c2.end()));
I believe, what you're trying to do can be done with Boost.Range, in particular with join and any_range (the latter would be needed if you want to hide the types of the containers and remove joined_range from the interface).
However, the resulting solution would not be very practical both in complexity and performance - mostly because of the nested joined_ranges and type erasure overhead incurred by any_range. Personally, I would just construct std::vector<Foo*> or use visitation.
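For comparison, the std::vector<Foo*> fallback could be sketched as follows. The types Foo, Bar, Child and Owner are hypothetical and only mirror the member names used in the question; the sketch assumes the underlying collections are not modified while the returned pointers are in use.

```cpp
#include <cassert>
#include <vector>

// Hypothetical types mirroring the shape of the question's data.
struct Foo { int id; };
struct Bar { std::vector<Foo> MoreFoos; };
struct Child {
    std::vector<Foo> FooCollection;
    std::vector<Bar> BarCollection;
};

struct Owner {
    std::vector<Child> m_SomeCollection;

    // Gathers pointers to every nested Foo in traversal order; no Foo is copied.
    std::vector<Foo*> CollectFoos() {
        std::vector<Foo*> result;
        for (auto& child : m_SomeCollection) {
            for (auto& foo : child.FooCollection)
                result.push_back(&foo);
            for (auto& bar : child.BarCollection)
                for (auto& foo : bar.MoreFoos)
                    result.push_back(&foo);
        }
        return result;
    }
};

void demoCollectFoos() {
    Owner owner;
    Child child;
    child.FooCollection.push_back(Foo{1});
    Bar bar;
    bar.MoreFoos.push_back(Foo{2});
    child.BarCollection.push_back(bar);
    owner.m_SomeCollection.push_back(child);

    std::vector<Foo*> foos = owner.CollectFoos();
    assert(foos.size() == 2);
    assert(foos[0]->id == 1 && foos[1]->id == 2);
}
```

The caller can then use an ordinary range-based for loop over the result, at the price of one allocation for the pointer vector.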
You can do this with the help of boost::asio::coroutine; see examples at https://pubby8.wordpress.com/2014/03/16/multi-step-iterators-using-coroutines/ and http://www.boost.org/doc/libs/1_55_0/doc/html/boost_asio/overview/core/coroutine.html.
struct STest : public boost::noncopyable {
STest(STest && test) : m_n( std::move(test.m_n) ) {}
explicit STest(int n) : m_n(n) {}
int m_n;
};
STest FuncUsingConst(int n) {
STest const a(n);
return a;
}
STest FuncWithoutConst(int n) {
STest a(n);
return a;
}
void Caller() {
// 1. compiles just fine and uses move ctor
STest s1( FuncWithoutConst(17) );
// 2. does not compile (cannot use move ctor, tries to use copy ctor)
STest s2( FuncUsingConst(17) );
}
The above example illustrates how in C++11, as implemented in Microsoft Visual C++ 2012, the internal details of a function can modify its return type. Up until today, it was my understanding that the declaration of the return type is all a programmer needs to know to understand how the return value will be treated, e.g., when passed as a parameter to a subsequent function call. Not so.
I like making local variables const where appropriate. It helps me clean up my train of thought and clearly structure an algorithm. But beware of returning a variable that was declared const! Even though the variable will no longer be accessed (a return statement was executed, after all), and even though the variable that was declared const has long gone out of scope (evaluation of the parameter expression is complete), it cannot be moved and thus will be copied (or fail to compile if copying is not possible).
This question is related to another question, Move semantics & returning const values. The difference is that in the latter, the function is declared to return a const value. In my example, FuncUsingConst is declared to return a non-const temporary. Yet, the implementation details of the function body affect the type of the return value, and determine whether or not the returned value can be used as a parameter to other functions.
Is this behavior intended by the standard?
How can this be regarded useful?
Bonus question: How can the compiler know the difference at compile time, given that the call and the implementation may be in different translation units?
EDIT: An attempt to rephrase the question.
How is it possible that there is more to the result of a function than the declared return type? How does it even seem acceptable at all that the function declaration is not sufficient to determine the behavior of the function's returned value? To me that seems to be a case of FUBAR and I'm just not sure whether to blame the standard or Microsoft's implementation thereof.
As the implementer of the called function, I cannot be expected to even know all callers, let alone monitor every little change in the calling code. On the other hand, as the implementer of the calling function, I cannot rely on the called function to not return a variable that happens to be declared const within the scope of the function implementation.
A function declaration is a contract. What is it worth now? We are not talking about a semantically equivalent compiler optimization here, like copy elision, which is nice to have but does not change the meaning of code. Whether or not the copy ctor is called does change the meaning of code (and can even break the code to a degree that it cannot be compiled, as illustrated above). To appreciate the awkwardness of what I am discussing here, consider the "bonus question" above.
I like making local variables const where appropriate. It helps me clean up my train of thought and clearly structure an algorithm.
That is indeed a good practice. Use const wherever you can. Here, however, you cannot (if you expect your const object to be moved from).
The fact that you declare a const object inside your function is a promise that your object's state won't ever be altered as long as the object is alive - in other words, never before its destructor is invoked. Not even immediately before its destructor is invoked. As long as it is alive, the state of a const object shall not change.
However, here you are somehow expecting this object to be moved from right before it gets destroyed by falling out of scope, and moving is altering state. You cannot move from a const object - not even if you are not going to use that object anymore.
What you can do, however, is to create a non-const object and access it in your function only through a reference to const bound to that object:
STest FuncUsingConst(int n) {
STest object_not_to_be_touched_if_not_through_reference(n);
STest const& a = object_not_to_be_touched_if_not_through_reference;
// Now work only with a
return object_not_to_be_touched_if_not_through_reference;
}
With a bit of discipline, you can easily enforce the semantics that the function should not modify that object after its creation - except for being allowed to move from it when returning.
UPDATE:
As suggested by balki in the comments, another possibility would be to bind a constant reference to a non-const temporary object (whose lifetime would be prolonged as per § 12.2/5), and perform a const_cast when returning it:
STest FuncUsingConst(int n) {
STest const& a = STest(n);
// Now work only with a
return const_cast<STest&&>(std::move(a));
}
A program is ill-formed if the copy/move constructor [...] for an object is implicitly odr-used and the special member function is not accessible
-- n3485 C++ draft standard [class.copy]/30
I suspect your problem is with MSVC 2012, and not with C++11.
This code, even without calling it, is not legal C++11:
struct STest {
STest(STest const&) = delete;
STest(STest && test) : m_n( std::move(test.m_n) ) {}
explicit STest(int n) : m_n(n) {}
int m_n;
};
STest FuncUsingConst(int n) {
STest const a(n);
return a;
}
because there is no legal way to turn a into a return value. While the return can be elided, eliding the return value does not remove the requirement that the copy constructor exist.
If MSVC2012 is allowing FuncUsingConst to compile, it is doing so in violation of the C++11 standard.
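One way to see which constructor is available for a const object, without relying on any particular compiler's return-statement handling, is to ask std::is_constructible. The sketch below reuses the STest definition from above; the function makeWithoutConst and the static_assert messages are additions for illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct STest {
    STest(STest const&) = delete;
    STest(STest&& test) : m_n(std::move(test.m_n)) {}
    explicit STest(int n) : m_n(n) {}
    int m_n;
};

// A non-const rvalue can use the move constructor.
static_assert(std::is_constructible<STest, STest&&>::value,
              "non-const rvalue: move constructor is usable");

// A const rvalue cannot bind to STest&&, so overload resolution picks the
// deleted copy constructor and the construction is ill-formed.
static_assert(!std::is_constructible<STest, STest const&&>::value,
              "const rvalue: only the deleted copy constructor matches");
static_assert(!std::is_constructible<STest, STest const&>::value,
              "const lvalue: copy constructor is deleted");

// Returning a non-const local works; it is treated as an rvalue.
STest makeWithoutConst(int n) {
    STest a(n);
    return a;
}

void demoSTest() {
    assert(makeWithoutConst(17).m_n == 17);
}
```

This matches the answer's point: a const local offers no usable constructor for producing the return value, whether or not the copy would have been elided.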
We have people who run code for simulations, testing etc. on some supercomputers that we have. It would be nice if, as part of a build process, we could check not only that the code compiles but also that the output matches some pattern indicating that we are getting meaningful results.
For example, the researcher may know that the value of x must be within some bounds. If it is not, then a logical error has been made in the code (assuming it compiles and there is no compile-time error).
Are there any pre-written packages for this kind of thing? The code is written in FORTRAN, C, C++ etc.
Any specific or general advice would be appreciated.
I expect most unit testing frameworks could do this; supply a toy test data set and see that the answer is sane in various different ways.
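Even without a framework, the kind of sanity check described here can be sketched in a few lines. Everything below (Bounds, withinBounds, countOutOfBounds) is a hypothetical illustration; a real setup would feed it the output of the toy data set and fail the build when the count is non-zero.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical inclusive range that a researcher expects a result to lie in.
struct Bounds {
    double lo;
    double hi;
};

// True when x lies in [lo, hi]. NaN fails, since comparisons with NaN are false.
bool withinBounds(double x, Bounds b) {
    return x >= b.lo && x <= b.hi;
}

// Counts results outside the expected range; 0 means the run looks sane.
std::size_t countOutOfBounds(const std::vector<double>& results, Bounds b) {
    std::size_t bad = 0;
    for (double x : results)
        if (!withinBounds(x, b))
            ++bad;
    return bad;
}

void demoBounds() {
    Bounds unit = {0.0, 1.0};
    assert(countOutOfBounds(std::vector<double>{0.25, 0.5, 1.0}, unit) == 0);
    assert(countOutOfBounds(std::vector<double>{-0.1, 0.5, 1.5}, unit) == 2);
    assert(!withinBounds(std::numeric_limits<double>::quiet_NaN(), unit));
}
```

The same function could be called from a test case in whichever unit testing framework the project already uses.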
A good way to ensure that the resulting value of any computation (whether final or intermediate) meets certain constraints, is to use an object oriented programming language like C++, and define data-types that internally enforce the conditions that you are checking for. You can then use those data-types as the return value of any computation to ensure that said conditions are met for the value returned.
Let's look at a simple example. Assume that you have a member function inside an Airplane class, as part of a flight control system, that estimates the mass of the airplane instance as a function of the number of passengers and the amount of fuel that the plane has at that moment. One way to declare the Airplane class and an airplaneMass() member function is the following:
class Airplane {
public:
...
int airplaneMass() const; // note the plain int return type
...
private:
...
};
However, a better way to implement the above would be to define a type AirplaneMass that can be used as the function's return type instead of int. AirplaneMass can internally ensure (in its constructor and any overloaded operators) that the value it encapsulates meets certain constraints. An example implementation of the AirplaneMass datatype could be the following:
class AirplaneMass {
public:
// AirplaneMass constructor
AirplaneMass(int m) {
if (m < MIN || m > MAX) {
// throw exception or log constraint violation
}
// if the value of m meets the constraints,
// assign it to the internal value.
mass_ = m;
}
...
/* range checking should also be done in the implementation
of overloaded operators. For instance, you may want to
make sure that the resultant of the ++ operation for
any instance of AirplaneMass also lies within the
specified constraints. */
private:
int mass_;
};
Thereafter, you can redeclare class Airplane and its airplaneMass() member function as follows:
class Airplane {
public:
...
AirplaneMass airplaneMass() const;
// note the more specific AirplaneMass return type
...
private:
...
};
The above will ensure that the value returned by airplaneMass() is between MIN and MAX. Otherwise, an exception will be thrown, or the error condition will be logged.
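A compilable version of this idea might look like the sketch below. The MIN and MAX values, the choice of std::out_of_range, the value() accessor and the isRejected() helper are all assumptions made for illustration; the outline above leaves them open.

```cpp
#include <cassert>
#include <stdexcept>

class AirplaneMass {
public:
    // Hypothetical limits in kilograms; real bounds would come from the domain.
    static const int MIN = 0;
    static const int MAX = 600000;

    // Rejects out-of-range values at construction time.
    explicit AirplaneMass(int m) : mass_(checked(m)) {}

    // Overloaded operators re-check the constraint, as suggested above.
    AirplaneMass& operator++() {
        mass_ = checked(mass_ + 1);
        return *this;
    }

    int value() const { return mass_; }

private:
    // Returns m unchanged if it is within [MIN, MAX], otherwise throws.
    static int checked(int m) {
        if (m < MIN || m > MAX)
            throw std::out_of_range("AirplaneMass outside [MIN, MAX]");
        return m;
    }

    int mass_;
};

// Helper for the usage example: reports whether construction is rejected.
bool isRejected(int m) {
    try {
        AirplaneMass mass(m);
        (void)mass;
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

void demoAirplaneMass() {
    assert(AirplaneMass(1000).value() == 1000);
    assert(isRejected(-1));
    assert(isRejected(AirplaneMass::MAX + 1));
}
```

Throwing is only one option; the logging alternative mentioned above would replace the throw with a call to whatever logger the project uses.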
I had to do that for conversions this month. I don't know if that might help you, but it appeared quite simple a solution to me.
First, I defined a tolerance level. (Java-ish example code...)
private static final double TOLERANCE = 0.000000000001D;
Then I defined a new "areEqual" method which checks if the difference between both values is lower than the tolerance level or not.
private static boolean areEqual(double a, double b) {
return (Math.abs(a - b) < TOLERANCE);
}
If I get a false somewhere, it means the check has probably failed. I can adjust the tolerance to see if it's just a precision problem or really a bad result. Works quite well in my situation.
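Since the code in question is written in FORTRAN, C and C++, here is roughly the same check as a C++ sketch. The tolerance value is copied from the Java-ish example above; the use of std::fabs and the demo function are the only additions.

```cpp
#include <cassert>
#include <cmath>

// Absolute tolerance, same value as in the example above.
const double TOLERANCE = 0.000000000001;

// True when a and b differ by less than TOLERANCE.
bool areEqual(double a, double b) {
    return std::fabs(a - b) < TOLERANCE;
}

void demoAreEqual() {
    // 0.1 + 0.2 is not exactly 0.3 in binary floating point, but it is close.
    assert(areEqual(0.1 + 0.2, 0.3));
    assert(!areEqual(1.0, 1.1));
}
```

An absolute tolerance like this suits values of moderate magnitude; for very large or very small results, a tolerance scaled by the size of the operands may be a better fit.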