Functional programming and memory management

As I understand it, one of the characteristics of functional programming is the way we deal with mutable objects. For example:
var notFunctionalFilter = function(objectArray) {
    for (var i = 0; i < objectArray.length; i++) {
        if (objectArray[i].active) {
            objectArray.splice(i, 1);
            i--;
        }
    }
    return objectArray;
};
var functionalFilter = function(objectArray) {
    var filtered = [];
    for (var i = 0; i < objectArray.length; i++) {
        if (objectArray[i].active) {
            filtered.push(objectArray[i]);
        }
    }
    return filtered;
};
I tend to write more and more code the "functional way", as it feels much cleaner (especially in JS using the beautiful LoDash library, but that's not the topic).
There have actually been quite a few articles about this topic going around recently, like this very good one: A practical introduction to functional programming
But there's something that is never discussed in them: memory management. Here are my questions:
Do we agree that functionalFilter uses more memory than notFunctionalFilter?
Should this be taken into account when deciding how to write the filter function?
Or does the garbage collector handle this perfectly (in most languages) because it is written the functional way?
Thanks

This is a slight aside but your functional filter should look like this:
var functionalFilter = function (item) {
    return item.active;
};
and used like this:
var filtered = objectArray.filter(functionalFilter);
The only "functional" thing about your "functionalFilter" is that it has no side effects. There is a lot more to functional programming and functional JS than that.
As for memory: yes, it uses more... maybe... sort of. I am going on the assumption that you are passing in an array of objects, based on the name. Using the built-in Array.filter is going to minimize this, but in your code the extra memory footprint is tiny.
Objects in JS are passed by reference, which means that your new array is merely an array of pointers to the original objects. (Warning: this means changing them in filtered changes them in objectArray as well, unless you do a deep clone.) That array wrapper is relatively small and probably not even worth talking about in memory terms.
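For readers coming from other languages, roughly the same point in a minimal C++ sketch (Item is a made-up type standing in for the objects in objectArray): filtering a container of pointers copies only the pointers, and a change made through the filtered container is visible through the original:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <memory>
#include <vector>

// Hypothetical payload type standing in for the objects in objectArray.
struct Item {
    bool active;
    int value;
};

int main() {
    // The "array of objects" is effectively an array of references (here: shared_ptr).
    std::vector<std::shared_ptr<Item>> objects = {
        std::make_shared<Item>(Item{true, 1}),
        std::make_shared<Item>(Item{false, 2}),
    };

    // Filtering copies only the pointers, not the Items themselves.
    std::vector<std::shared_ptr<Item>> filtered;
    std::copy_if(objects.begin(), objects.end(), std::back_inserter(filtered),
                 [](const std::shared_ptr<Item>& p) { return p->active; });

    // Mutating through 'filtered' is visible through 'objects' as well.
    filtered[0]->value = 42;
    std::cout << objects[0]->value << "\n"; // prints 42
}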

Related

(Solved: user error) C++11 make_shared<T>(new class) memory leak

I've been looking into smart pointers and unit testing how they manage memory, and I'm finding an unexpected issue with something all the examples recommend doing, but which creates a huge memory leak for me.
This seems to occur when I use a class that has a constructor that builds from another copy of the same class. I'll give an example.
If I have a class like:
class foo {
public:
    // Ignore unsafe practices here
    HeavyInMemory* variable;

    foo() {
        variable = new HeavyInMemory();
    }

    foo(foo* copyThis) {
        variable = nullptr;
        if (copyThis) {
            variable = new HeavyInMemory(copyThis->variable);
        }
    }

    ~foo() {
        delete variable;
    }
};
I find that I get a huge memory leak, because std::make_shared has no way to tell the difference between make_shared(args) and make_shared(new T):
int main() {
    for (int i = 0; i < 100; i++) {
        // Should not leak, if I follow examples of how to use make_shared
        auto test = make_shared<foo>(new foo());
    }
    // Checking memory addresses, these do not match; checking total program
    // memory use, it leaks like a sieve.
}
Am I misunderstanding something?
Do the examples just not consider this, as most use primitive types as examples rather than classes?
Does C++11 just not support the make_shared(new T) format, even though I see old books like Scott Meyers' books from 1992? It just doesn't make sense.
Also, why would you use make_shared(new T) over make_shared(args)? I've seen a couple of threads where people have asked this on here, but neither seemed to actually answer the question with a code example.
// They mainly say that compiler evaluation order causes the leak, but in my example this would still leak:
auto originalObject = new foo();
auto expectedDestructorWhenOutofScope = make_shared<foo>(originalObject);
// I have found that if I give it the object instead, it doesn't leak, but this is getting into the realm of
// hacks that may sometimes work:
auto originalObject = new foo();
auto expectedDestructorWhenOutofScope = make_shared<foo>(*originalObject);
EDIT:
Thanks to Igor Tandetnik, I now see I am using make_shared entirely wrong: it constructs the object itself from the arguments you pass it. Thanks again Igor, I appreciate it.
// Create a new object
auto expectedDestructorWhenOutofScope = make_shared<foo>();
// Take ownership of an object that was already created
std::shared_ptr<foo> p2(new foo());
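For completeness, a minimal sketch (not the original code; HeavyInMemory is treated as a hypothetical copyable type) of how foo could be written so that make_shared<foo>() and make_shared<foo>(*other) cover both cases without any leaks:

#include <memory>

// Hypothetical stand-in for the expensive member from the question.
struct HeavyInMemory {
    HeavyInMemory() = default;
};

class foo {
public:
    // The resource is owned by a smart pointer, so no manual delete is needed.
    std::unique_ptr<HeavyInMemory> variable;

    foo() : variable(new HeavyInMemory()) {}

    // Copy from another foo by deep-copying the owned resource.
    foo(const foo& other)
        : variable(other.variable ? new HeavyInMemory(*other.variable) : nullptr) {}
};

int main() {
    for (int i = 0; i < 100; ++i) {
        // make_shared forwards its arguments to foo's constructors:
        auto a = std::make_shared<foo>();    // default-construct
        auto b = std::make_shared<foo>(*a);  // copy-construct from *a
        // No raw 'new foo()' is ever handed to make_shared, so nothing leaks.
    }
}

The key design point is that make_shared is never given a pointer to an already-constructed object; it always builds the foo itself from constructor arguments.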

Equivalent of enumerators in C++11?

In C#, you can define a custom enumeration very trivially, eg:
public IEnumerable<Foo> GetNestedFoos()
{
    foreach (var child in _SomeCollection)
    {
        foreach (var foo in child.FooCollection)
        {
            yield return foo;
        }
        foreach (var bar in child.BarCollection)
        {
            foreach (var foo in bar.MoreFoos)
            {
                yield return foo;
            }
        }
    }
    foreach (var baz in _SomeOtherCollection)
    {
        foreach (var foo in baz.GetNestedFoos())
        {
            yield return foo;
        }
    }
}
(This can be simplified using LINQ and better encapsulation but that's not the point of the question.)
In C++11, you can do similar enumerations but AFAIK it requires a visitor pattern instead:
template<typename Action>
void VisitAllFoos(const Action& action)
{
    for (auto& child : m_SomeCollection)
    {
        for (auto& foo : child.FooCollection)
        {
            action(foo);
        }
        for (auto& bar : child.BarCollection)
        {
            for (auto& foo : bar.MoreFoos)
            {
                action(foo);
            }
        }
    }
    for (auto& baz : m_SomeOtherCollection)
    {
        baz.VisitAllFoos(action);
    }
}
Is there a way to do something more like the first, where the function returns a range that can be iterated externally rather than calling a visitor internally?
(And I don't mean by constructing a std::vector<Foo> and returning it -- it should be an in-place enumeration.)
I am aware of the Boost.Range library, which I suspect would be involved in the solution, but I'm not particularly familiar with it.
I'm also aware that it's possible to define custom iterators to do this sort of thing (which I also suspect might be involved in the answer) but I'm looking for something that's easy to write, ideally no more complicated than the examples shown here, and composable (like with _SomeOtherCollection).
I would prefer something that does not require the caller to use lambdas or other functors (since that just makes it a visitor again), although I don't mind using lambdas internally if needed (but would still prefer to avoid them there too).
If I'm understanding your question correctly, you want to perform some action over all elements of a collection.
C++ has an extensive set of iterator operations, defined in the iterator header. Most collection structures, including the std::vector that you reference, have .begin and .end methods which take no arguments and return iterators to the beginning and the end of the structure. These iterators have some operations that can be performed on them manually, but their primary use comes in the form of the algorithm header, which defines several very useful iteration functions.
In your specific case, I believe you want the for_each function, which takes a range (as a beginning to end iterator) and a function to apply. So if you had a function (or function object) called action and you wanted to apply it to a vector called data, the following code would be correct (assuming all necessary headers are included appropriately):
std::for_each(data.begin(), data.end(), action);
Note that for_each is just one of many functions provided by the algorithm header. It also provides functions to search a collection, copy a set of data, sort a list, find a minimum/maximum, and much more, all generalized to work over any structure that has an iterator. And if even these aren't enough, you can write your own by reading up on the operations supported on iterators. Simply define a template function that takes iterators of varying types and document what kind of iterator you want.
template <typename BidirectionalIterator>
void function(BidirectionalIterator begin, BidirectionalIterator end) {
// Do something
}
One final note is that all of the operations mentioned so far also operate correctly on arrays, provided you know the size. Instead of writing arr.begin() and arr.end(), you write arr + 0 and arr + n, where n is the size of the array. The trivial zero addition is often necessary in order to decay the type of the array into a pointer to make it a valid iterator, but array pointers are indeed random access iterators just like any other container iterator.
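To make those last two paragraphs concrete, here is a small self-contained sketch (print and the data are made up for illustration) applying std::for_each both to a std::vector and to a raw array:

#include <algorithm>
#include <iostream>
#include <vector>

// Hypothetical action applied to each element.
void print(int x) { std::cout << x << ' '; }

int main() {
    std::vector<int> data = {1, 2, 3};
    std::for_each(data.begin(), data.end(), print);

    int arr[] = {4, 5, 6};
    const std::size_t n = sizeof(arr) / sizeof(arr[0]);
    // Pointers into the array serve as random access iterators.
    std::for_each(arr + 0, arr + n, print);
    std::cout << '\n';
}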
What you can do is write your own adapter function and call it with different ranges of elements of the same type.
This is an untested solution that will probably need some tweaking to make it compile, but it will give you an idea. It uses variadic templates to move from one collection to the next.
// requires <algorithm>, <iterator> and <utility>
// base case that ends the recursion
void visitAllFoos() {}

template<typename Iterator, typename... Args>
void visitAllFoos(std::pair<Iterator, Iterator> collection, Args&&... args)
{
    std::for_each(collection.first, collection.second,
                  [](typename std::iterator_traits<Iterator>::value_type& item) { /* apply action */ });
    visitAllFoos(std::forward<Args>(args)...);
}
// you can call it with a sequence of begin/end iterator pairs
visitAllFoos(std::make_pair(c1.begin(), c1.end()), std::make_pair(c2.begin(), c2.end()));
I believe, what you're trying to do can be done with Boost.Range, in particular with join and any_range (the latter would be needed if you want to hide the types of the containers and remove joined_range from the interface).
However, the resulting solution would not be very practical both in complexity and performance - mostly because of the nested joined_ranges and type erasure overhead incurred by any_range. Personally, I would just construct std::vector<Foo*> or use visitation.
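As a rough sketch of that last suggestion (the Foo, Bar, and Child types below are hypothetical stand-ins shaped like the question's collections), collecting pointers keeps the enumeration cheap because the Foos themselves are never copied:

#include <vector>

// Hypothetical types mirroring the shape of the question's collections.
struct Foo {};
struct Bar { std::vector<Foo> MoreFoos; };
struct Child {
    std::vector<Foo> FooCollection;
    std::vector<Bar> BarCollection;
};

// Collect pointers to every nested Foo; callers then iterate the vector normally.
std::vector<Foo*> GetNestedFoos(std::vector<Child>& children) {
    std::vector<Foo*> result;
    for (auto& child : children) {
        for (auto& foo : child.FooCollection) result.push_back(&foo);
        for (auto& bar : child.BarCollection)
            for (auto& foo : bar.MoreFoos) result.push_back(&foo);
    }
    return result;
}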
You can do this with the help of boost::asio::coroutine; see examples at https://pubby8.wordpress.com/2014/03/16/multi-step-iterators-using-coroutines/ and http://www.boost.org/doc/libs/1_55_0/doc/html/boost_asio/overview/core/coroutine.html.

Efficient Independent Synchronized Blocks?

I have a scenario where, at certain points in my program, a thread needs to update several shared data structures. Each data structure can be safely updated in parallel with any other data structure, but each data structure can only be updated by one thread at a time. The simple, naive way I've expressed this in my code is:
synchronized updateStructure1();
synchronized updateStructure2();
// ...
This seems inefficient because if multiple threads are trying to update structure 1, but no thread is trying to update structure 2, they'll all block waiting for the lock that protects structure 1, while the lock for structure 2 sits untaken.
Is there a "standard" way of remedying this? In other words, is there a standard threading primitive that tries to update all structures in a round-robin fashion, blocks only if all locks are taken, and returns when all structures are updated?
This is a somewhat language agnostic question, but in case it helps, the language I'm using is D.
If your language supported lightweight threads or Actors, you could always have the updating thread spawn a new thread to change each object, where each thread just locks, modifies, and unlocks its object. Then have your updating thread join on all its child threads before returning. This punts the problem to the runtime's scheduler, which is free to schedule those child threads any way it can for best performance.
You could do this in languages with heavier threads, but the spawn and join might have too much overhead (though thread pooling might mitigate some of this).
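Since the question is largely language-agnostic, a minimal C++11 sketch of this spawn-and-join idea might look like the following (updateStructure1/2 and the mutexes are placeholders, not code from the question):

#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared structures, each protected by its own lock.
std::mutex mutex1, mutex2;
void updateStructure1() { /* ... */ }
void updateStructure2() { /* ... */ }

void updateAll() {
    std::vector<std::thread> workers;
    // Each child thread locks, modifies, and unlocks one structure.
    workers.emplace_back([] { std::lock_guard<std::mutex> g(mutex1); updateStructure1(); });
    workers.emplace_back([] { std::lock_guard<std::mutex> g(mutex2); updateStructure2(); });
    // Join on all children before returning; the scheduler decides the order.
    for (auto& t : workers) t.join();
}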
I don't know if there's a standard way to do this. However, I would implement this something like the following:
do
{
    if (!updatedA && mutexA.tryLock())
    {
        scope(exit) mutexA.unlock();
        updateA();
        updatedA = true;
    }
    if (!updatedB && mutexB.tryLock())
    {
        scope(exit) mutexB.unlock();
        updateB();
        updatedB = true;
    }
}
while (!(updatedA && updatedB));
Some clever metaprogramming could probably cut down the repetition, but I leave that as an exercise for you.
Sorry if I'm being naive, but do you not just synchronize on separate objects to make the concerns independent?
e.g.
public Object lock1 = new Object(); // access to resource 1
public Object lock2 = new Object(); // access to resource 2

updateStructure1() {
    synchronized( lock1 ) {
        ...
    }
}

updateStructure2() {
    synchronized( lock2 ) {
        ...
    }
}
To my knowledge, there is not a standard way to accomplish this, and you'll have to get your hands dirty.
To paraphrase your requirements, you have a set of data structures, and you need to do work on them, but not in any particular order. You only want to block waiting on a data structure if all other objects are blocked. Here's the pseudocode I would base my solution on:
work = unshared list of objects that need updating
while work is not empty:
    found = false
    for each obj in work:
        try locking obj
        if successful:
            remove obj from work
            found = true
            obj.update()
            unlock obj
    if !found:
        // Everything is locked, so we have to wait
        obj = randomly pick an object from work
        remove obj from work
        lock obj
        obj.update()
        unlock obj
An updating thread will only block if it finds that all objects it needs to use are locked. Then it must wait on something, so it just picks one and locks it. Ideally, it would pick the object that will be unlocked earliest, but there's no simple way of telling that.
Also, it's conceivable that an object might become free while the updater is in the try loop and so the updater would skip it. But if the amount of work you're doing is large enough, relative to the cost of iterating through that loop, the false conflict should be rare, and it would only matter in cases of extremely high contention.
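For illustration, a C++11 rendering of the pseudocode above could look roughly like this (Structure and update() are hypothetical, and the "pick randomly" step is simplified to picking the last remaining object):

#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical lockable structure.
struct Structure {
    std::mutex m;
    void update() { /* ... */ }
};

void updateAll(std::vector<Structure*> work) {
    while (!work.empty()) {
        bool found = false;
        for (std::size_t i = 0; i < work.size(); ++i) {
            if (work[i]->m.try_lock()) {  // non-blocking attempt
                std::lock_guard<std::mutex> g(work[i]->m, std::adopt_lock);
                work[i]->update();
                work.erase(work.begin() + i);
                found = true;
                break;
            }
        }
        if (!found) {
            // Everything is locked, so block on one of the remaining objects.
            Structure* obj = work.back();
            work.pop_back();
            std::lock_guard<std::mutex> g(obj->m);
            obj->update();
        }
    }
}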
I don't know any "standard" way of doing this, sorry. So this below is just a ThreadGroup, abstracted by a Swarm class, that hacks away at a job list until all jobs are done, round-robin style, and makes sure that as many threads as possible are used. I don't know how to do this without a job list.
Disclaimer: I'm very new to D and to concurrency programming, so the code is rather amateurish. I saw this more as a fun exercise. (I'm also dealing with some concurrency stuff.) I also understand that this isn't quite what you're looking for. If anyone has any pointers I'd love to hear them!
import core.thread,
       core.sync.mutex,
       std.c.stdio,
       std.stdio;

class Swarm {
    ThreadGroup group;
    Mutex mutex;
    auto numThreads = 1;
    void delegate()[int] jobs;

    this(void delegate()[int] aJobs, int aNumThreads) {
        jobs = aJobs;
        numThreads = aNumThreads;
        group = new ThreadGroup;
        mutex = new Mutex();
    }

    void runBlocking() {
        run();
        group.joinAll();
    }

    void run() {
        foreach (c; 0 .. numThreads)
            group.create(&swarmJobs);
    }

    void swarmJobs() {
        void delegate() myJob;
        do {
            myJob = null;
            synchronized (mutex) {
                if (jobs.length > 0)
                    foreach (i, job; jobs) {
                        myJob = job;
                        jobs.remove(i);
                        break;
                    }
            }
            if (myJob)
                myJob();
        } while (myJob);
    }
}

class Jobs {
    void job1() {
        foreach (c; 0 .. 1000) {
            foreach (j; 0 .. 2_000_000) {}
            writef("1");
            fflush(core.stdc.stdio.stdout);
        }
    }

    void job2() {
        foreach (c; 0 .. 1000) {
            foreach (j; 0 .. 1_000_000) {}
            writef("2");
            fflush(core.stdc.stdio.stdout);
        }
    }
}

void main() {
    auto jobs = new Jobs();
    void delegate()[int] jobsList =
        [1:&jobs.job1, 2:&jobs.job2, 3:&jobs.job1, 4:&jobs.job2];
    int numThreads = 2;
    auto swarm = new Swarm(jobsList, numThreads);
    swarm.runBlocking();
    writefln("end");
}
There's no standard solution but rather a class of standard solutions depending on your needs.
http://en.wikipedia.org/wiki/Scheduling_algorithm

How much information hiding is necessary when doing code refactoring?

How much information hiding is necessary? I have boilerplate code before I delete a record; it looks like this:
public override void OrderProcessing_Delete(Dictionary<string, object> pkColumns)
{
    var c = Connect();

    using (var cmd = new NpgsqlCommand("SELECT COUNT(*) FROM orders WHERE order_id = :_order_id", c)
                     { Parameters = { {"_order_id", pkColumns["order_id"]} } })
    {
        var count = (long)cmd.ExecuteScalar();

        // deletion's boilerplate code...
        if (count == 0) throw new RecordNotFoundException();
        else if (count > 1) throw new DatabaseStructureChangedException();
        // ...boilerplate code
    }

    // deleting of table(s) goes here...
}
NOTE: boilerplate code is code-generated, including the "using (var cmd = new NpgsqlCommand( ... )"
But I'm seriously thinking of refactoring the boilerplate code; I want more succinct code. This is how I envision refactoring it (made nicer with an extension method, though that's not the sole reason ;)):
using (var cmd = new NpgsqlCommand("SELECT COUNT(*) FROM orders WHERE order_id = :_order_id", c)
                 { Parameters = { {"_order_id", pkColumns["order_id"]} } })
{
    cmd.VerifyDeletion(); // [EDIT: was ExecuteWithVerification before]
}
I want the ExecuteScalar call and the boilerplate code to go inside the extension method.
For my code above, does it warrant code refactoring / information hiding? Does my refactored operation look too opaque?
I would say that your refactor is extremely good, if your new single line of code replaces a handful of lines of code in many places in your program. Especially since the functionality is going to be the same in all of those places.
The programmer coming after you and looking at your code will simply look at the definition of the extension method to find out what it does, and now he knows that this code is defined in one place, so there is no possibility of it differing from place to place.
Try it if you must, but my feeling is it's not about succinctness but whether or not you want to enforce the behavior every time or most of the time. And, by extension, if the verify-condition changes, whether it would likely change across the board.
Basically, reducing a small chunk of boiler-plate code doesn't necessarily make things more succinct; it's just one more bit of abstractness the developer has to wade through and understand.
As a developer, I'd have no idea what "ExecuteWithVerify" means. What exactly are we verifying? I'd have to look it up and remember it. But with the boiler-plate code, I can look at the code and understand exactly what's going on.
And by NOT reducing it to a separate method I can also tune the boiler-plate code for cases where exceptions need to be thrown for differing conditions.
It's not information-hiding when you extract or refactor your code. It's only information-hiding when you start restricting access to your extension definition after refactoring.
"new" operator within a Class (except for the Constructor) should be Avoided at all costs. This is what you need to refactor here.

Tree traversal without recursive / stack usage (C#)?

I'm working with WPF and I'm developing a complex usercontrol, which is composed of a tree with rich functionality etc.
For this purpose I used a View-Model design pattern, because some operations couldn't be achieved directly in WPF. So I take the IHierarchyItem (which is a node) and pass it to this constructor to create a tree structure:
private IHierarchyItemViewModel(IHierarchyItem hierarchyItem, IHierarchyItemViewModel parent)
{
    this.hierarchyItem = hierarchyItem;
    this.parent = parent;

    List<IHierarchyItemViewModel> l = new List<IHierarchyItemViewModel>();
    foreach (IHierarchyItem item in hierarchyItem.Children)
    {
        l.Add(new IHierarchyItemViewModel(item, this));
    }
    children = new ReadOnlyCollection<IHierarchyItemViewModel>(l);
}
The problem is that this constructor takes about 3 seconds(!) for 200 items on my dual-core.
Am I doing anything wrong, or are recursive constructor calls just that slow?
Thank you very much!
OK, I found a non-recursive version myself, although it still uses a Stack.
It traverses the whole tree:
Stack<MyItem> stack = new Stack<MyItem>();
stack.Push(root);
while (stack.Count > 0)
{
    MyItem taken = stack.Pop();
    foreach (MyItem child in taken.Children)
        stack.Push(child);
}
There should be nothing wrong with a recursive implementation of a tree, especially for such a small number of items. A recursive implementation is sometimes less space efficient, and slightly less time efficient, but the code clarity often makes up for it.
It would be useful for you to perform some simple profiling on your constructor. Using one of the suggestions from: http://en.csharp-online.net/Measure_execution_time you could indicate for yourself how long each piece is taking.
There is a possibility that one piece in particular is taking a long time. In any case, that might help you narrow down where you are really spending the time.
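As a sketch of that profiling suggestion (the linked page is C#-oriented, but the idea carries over; buildTree below is a hypothetical stand-in for the recursive view-model construction), wrapping the suspect code in a simple timer is usually enough to narrow things down:

#include <chrono>
#include <iostream>

// Hypothetical stand-in for the expensive recursive construction.
void buildTree() { /* ... */ }

int main() {
    auto start = std::chrono::steady_clock::now();
    buildTree();
    auto end = std::chrono::steady_clock::now();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    std::cout << "buildTree took " << ms << " ms\n";
}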
