V8's garbage collector seems to clean up a Local<T> value (whatever T is stored in the Local) as it goes, but if you create an ObjectTemplate and then create an instance of that template, V8 appears to hold on to the memory much longer. Consider the following example, where the resident set size remains stable throughout program execution:
Isolate* isolate = Isolate::New(create_params);
Persistent<Context> *context = ContextNew(isolate); // creates a persistent context
for (int i = 1; i <= 1000000; i++) {
    isolate->Enter();
    EnterContext(isolate, context); // enters the context
    {
        HandleScope handle_scope(isolate);
        Local<Object> result = Object::New(isolate);
    }
    ExitContext(isolate, context);
    isolate->Exit();
}
Above, all we do is create a new Object in a loop; when handle_scope goes out of scope, the Local values appear to be garbage collected right away, since the resident set size remains steady. However, there is an issue when the object is created through an ObjectTemplate that is also created in the loop:
Isolate* isolate = Isolate::New(create_params);
Persistent<Context> *context = ContextNew(isolate); // creates a persistent context
for (int i = 1; i <= 1000000; i++) {
    isolate->Enter();
    EnterContext(isolate, context); // enters the context
    {
        HandleScope handle_scope(isolate);
        Local<Object> result;
        Local<ObjectTemplate> templ = ObjectTemplate::New(isolate);
        if (!templ->NewInstance(context->Get(isolate)).ToLocal(&result)) { exit(1); }
    }
    ExitContext(isolate, context);
    isolate->Exit();
}
Here, the resident set size grows linearly until an unreasonable amount of RAM is used for such a small program. I'm just looking to understand what is happening here. Sorry for the long explanation, I tried to keep it short and to the point :p. Thanks in advance!
V8 assumes that ObjectTemplates are long-lived and hence allocates them in the "old generation" part of the heap, where it takes longer for them to get collected by a (comparatively slow and rare) full GC cycle -- if the assumption was right and they actually are long-lived, this is an overall performance win. Objects themselves, on the other hand, are allocated in the "young generation", where they are quick and easy to collect by the (comparatively frequent) young-generation GC cycles.
If you run with --trace-gc you should see this explanation confirmed.
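One way to act on that assumption (a rough sketch built on the question's own code and its ContextNew/EnterContext helpers, not something from the answer above): create the ObjectTemplate once, before the loop, so only the short-lived Objects are allocated per iteration and the young-generation GC can reclaim them.
Isolate* isolate = Isolate::New(create_params);
Persistent<Context> *context = ContextNew(isolate); // creates a persistent context
isolate->Enter();
EnterContext(isolate, context);
HandleScope outer_scope(isolate);                           // keeps templ alive across iterations
Local<ObjectTemplate> templ = ObjectTemplate::New(isolate); // one long-lived template
for (int i = 1; i <= 1000000; i++) {
    HandleScope handle_scope(isolate);                      // per-iteration scope for the short-lived Object
    Local<Object> result;
    if (!templ->NewInstance(context->Get(isolate)).ToLocal(&result)) { exit(1); }
}
ExitContext(isolate, context);
isolate->Exit();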
Related
I have two classes, let's call them A and B:
class A
{
    // ...some function definitions...

    void specialFunc1(int count, B b)
    {
        // stores data from the class B object in A's own containers,
        // and uses count as an integer key to map each iteration's data
        // from b into a std::map
    }
};
class B
{
public:
    // containers which process and store data;
    // class A can access these containers because they have the
    // public access specifier
};
int main()
{
    A a;
    for (int i = 0; i < 10000; i++)
    {
        B b;
        // b calls its methods to store data in the desired containers
        a.specialFunc1(i, b);
    }
    return 0;
}
My code is as described above: it has two classes, let's call them A and B, which interact with each other.
Class B loads some data inside the loop in the main function, does some processing on it, and then passes it to class A, which maps each iteration's data into a std::map.
Every successive iteration defines object b anew.
My understanding is that, since object b's scope is just the loop body, it goes away as soon as the iteration completes, and hence a new definition of b should not mess up the memory on the stack.
Am I right in thinking this, or can it potentially cause memory corruption due to the successive allocation and deallocation of the class object b?
Am I right in thinking this, or can it potentially cause memory corruption due to the successive allocation and deallocation of the class object b?
You are right. Successive allocation and deallocation is allowed in C++, and as such doesn't in general cause memory corruption.
Of course, any code that has undefined behaviour could corrupt memory. The shown (incomplete) code doesn't demonstrate undefined behaviour.
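To see the lifetime concretely, here is a tiny standalone sketch (not the asker's real classes, just an illustration of the same pattern): b is constructed at the top of each iteration and destroyed at the end of it, so every iteration starts with a fresh object on the stack.
#include <iostream>

struct B {
    B()  { std::cout << "B constructed\n"; }
    ~B() { std::cout << "B destroyed\n"; }
};

int main() {
    for (int i = 0; i < 3; ++i) {
        B b;   // constructed at the top of each iteration
    }          // destroyed here, before the next iteration begins
    return 0;
}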
I am migrating a project that used to run on bare metal to Linux, and need to eliminate some {disable,enable}_scheduler calls. :)
So I need a lock-free sync solution for a single-writer, multiple-readers scenario, where the writer thread cannot be blocked. I came up with the following solution, which does not fit the usual acquire-release ordering:
class RWSync {
    std::atomic<int> version;   // incremented after every modification
    std::atomic_bool invalid;   // true during write
public:
    RWSync() : version(0), invalid(false) {}

    template<typename F> void sync(F lambda) {
        int currentVersion;
        do {
            do { // wait until the object is valid
                currentVersion = version.load(std::memory_order_acquire);
            } while (invalid.load(std::memory_order_acquire));
            lambda();
            std::atomic_thread_fence(std::memory_order_seq_cst);
            // check if something changed
        } while (version.load(std::memory_order_acquire) != currentVersion
                 || invalid.load(std::memory_order_acquire));
    }

    void beginWrite() {
        invalid.store(true, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }

    void endWrite() {
        std::atomic_thread_fence(std::memory_order_seq_cst);
        version.fetch_add(1, std::memory_order_release);
        invalid.store(false, std::memory_order_release);
    }
};
I hope the intent is clear: I wrap the modification of a (non-atomic) payload between beginWrite/endWrite, and read the payload only inside the lambda function passed to sync().
As you can see, here I have an atomic store in beginWrite() where no writes after the store operation can be reordered before the store. I did not find suitable examples, and I am not experienced in this field at all, so I'd like some confirmation that it is OK (verification through testing is not easy either).
Is this code race-free and work as I expect?
If I use std::memory_order_seq_cst in every atomic operation, can I omit the fences? (Even if yes, I guess the performance would be worse)
Can I drop the fence in endWrite()?
Can I use memory_order_acq_rel in the fences? I don't really get the difference -- the single total order concept is not clear to me.
Is there any simplification / optimization opportunity?
+1. I happily accept any better idea for the name of this class :)
The code is basically correct.
Instead of having two atomic variables (version and invalid) you may use a single version variable with the semantics "odd values are invalid". This is known as the "sequential lock" (seqlock) mechanism.
Reducing the number of atomic variables simplifies things a lot:
class RWSync {
    // Incremented before and after every modification.
    // Odd values mean that the object is in an invalid state.
    std::atomic<int> version;
public:
    RWSync() : version(0) {}

    template<typename F> void sync(F lambda) {
        int currentVersion;
        do {
            currentVersion = version.load(std::memory_order_seq_cst);
            // This may reduce calls to lambda(), nothing more
            if (currentVersion | 1) continue;
            lambda();
            // Repeat until something changed or the object is in an invalid state.
        } while ((currentVersion | 1) ||
                 version.load(std::memory_order_seq_cst) != currentVersion);
    }

    void beginWrite() {
        // The writer may read version with relaxed memory order
        int currentVersion = version.load(std::memory_order_relaxed);
        // Invalidation requires sequential order
        version.store(currentVersion + 1, std::memory_order_seq_cst);
    }

    void endWrite() {
        // The writer may read version with relaxed memory order
        int currentVersion = version.load(std::memory_order_relaxed);
        // Release order is sufficient to mark the object as valid
        version.store(currentVersion + 1, std::memory_order_release);
    }
};
Note the difference in memory orders in beginWrite() and endWrite():
endWrite() makes sure that all of the previous modifications to the object have been completed. It is sufficient to use release memory order for that.
beginWrite() makes sure that the reader will detect that the object is in an invalid state before any further modification of the object is started. Such a guarantee requires seq_cst memory order. Because of that, the reader uses seq_cst memory order too.
As for the fences, it is better to incorporate them into the neighbouring atomic operations: the compiler knows how to make the result fast.
Explanations of some modifications to the original code:
1) An atomic read-modify-write like fetch_add() is intended for cases when concurrent modifications (like another fetch_add()) are possible. For correctness, such modifications use memory locking or other very time-costly architecture-specific mechanisms.
An atomic assignment (store()) does not use memory locking, so it is cheaper than fetch_add(). You may use such an assignment because concurrent modifications are not possible in your case (the reader does not modify version).
2) Unlike release-acquire semantics, which differentiate load and store operations, sequential consistency (memory_order_seq_cst) applies to every atomic access and provides a total order between these accesses.
The accepted answer is not correct. I guess the code should be something like currentVersion & 1 instead of currentVersion | 1. A subtler mistake is that the reader thread can go into lambda(), and after that the writer thread can run beginWrite() and write a value to the non-atomic payload. In this situation the write to the payload and the read of the payload have no happens-before relationship, and concurrent access (without a happens-before relationship) to a non-atomic variable is a data race. Note that the single total order of memory_order_seq_cst does not imply a happens-before relationship; the two are consistent with each other, but they are different things.
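For reference, here is a rough sketch of what this comment is describing (my own reading, not the accepted answer's code), with the & 1 test and deliberately conservative seq_cst ordering everywhere. Note that the data-race concern above still stands for the non-atomic payload read inside lambda() unless those payload accesses are themselves made atomic.
#include <atomic>

class RWSync {
    std::atomic<int> version{0};   // odd while a write is in progress
public:
    template<typename F> void sync(F lambda) {
        int v;
        do {
            do {   // wait while the writer is active (odd version)
                v = version.load(std::memory_order_seq_cst);
            } while (v & 1);
            lambda();              // read the payload
            // retry if the version moved while we were reading
        } while (version.load(std::memory_order_seq_cst) != v);
    }
    void beginWrite() { version.fetch_add(1, std::memory_order_seq_cst); } // even -> odd
    void endWrite()   { version.fetch_add(1, std::memory_order_seq_cst); } // odd -> even
};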
I have a consumer as part of the producer consumer pattern:
simplified:
public class MessageFileLogger : ILogger
{
    private BlockingCollection<ILogItem> _messageQueue;
    private Thread _worker;
    private bool _enabled = true;

    public MessageFileLogger()
    {
        _messageQueue = new BlockingCollection<ILogItem>();
        _worker = new Thread(LogMessage);
        _worker.IsBackground = true;
        _worker.Start();
    }

    private void LogMessage()
    {
        while (_enabled)
        {
            if (_messageQueue.Count > 0)
            {
                ILogItem itm = _messageQueue.Take();
                processItem(itm);
            }
            else
            {
                Thread.Sleep(1000);
            }
        }
    }
}
If I remove the
Thread.Sleep(1000);
the CPU usage climbs to something extremely high (13%), as opposed to 0% when the thread is set to sleep.
Also, if I instantiate multiple instances of the class, the CPU usage climbs in 13% increments with each instance.
A new LogItem is added to the BlockingCollection about every minute or so (maybe every 30 seconds), and an applicable message is written to a file.
Is it possible that the thread is somehow blocking other threads from running, and the system somehow needs to compensate?
Update:
Updated code to better reflect actual code
You gave the thread code to run, so by default it runs that code (the while loop) as fast as it possibly can on a single logical core. Since that's about 13%, I'd imagine your CPU has 4 hyperthreaded cores, resulting in 8 logical cores. Each thread runs its while loop as fast as it possibly can on its core, adding another 13% of usage. Pretty straightforward really.
Side effects of not sleeping are that the whole system runs slower, and it uses/produces SIGNIFICANTLY more battery/heat.
Generally, the proper way is to give the _messageQueue another method like
bool BlockingCollection::TryTake(type& item, std::chrono::milliseconds time)
{
    DWORD ret = WaitForSingleObject(event, static_cast<DWORD>(time.count()));
    if (ret != WAIT_OBJECT_0)
        return false;
    item = Take(); // might need to use a shared function instead of calling Take directly
    return true;
}
Then your loop is easy:
private void LogMessage()
{
    type item;
    while (_enabled)
    {
        // roughly the same as the original loop, but blocking instead of polling
        if (_messageQueue.TryTake(item, std::chrono::seconds(1)))
            processItem(item);
    }
}
It also means that if an item is added at any point during the blocking part, it's acted on immediately instead of up to a full second later.
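For what it's worth, here is the same idea as a small self-contained C++ sketch (this is not the .NET BlockingCollection API; the TimedQueue name and its methods are made up for illustration): the consumer blocks on a condition variable with a timeout instead of polling, so it wakes as soon as an item arrives and otherwise uses essentially no CPU.
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>

template <typename T>
class TimedQueue {
    std::mutex m;
    std::condition_variable cv;
    std::queue<T> items;
public:
    void add(T item) {
        {
            std::lock_guard<std::mutex> lock(m);
            items.push(std::move(item));
        }
        cv.notify_one();   // wake the consumer immediately
    }

    bool tryTake(T& out, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m);
        if (!cv.wait_for(lock, timeout, [this] { return !items.empty(); }))
            return false;  // timed out with nothing to take
        out = std::move(items.front());
        items.pop();
        return true;
    }
};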
Let's talk theory for a bit. We have one container, let's call it TMyObj, that looks like this:
struct TMyObj {
    bool bUpdated;
    bool bUnderUpdate;
};
Let a class named TMyClass have an array of the container above plus 3 helper functions: one for getting an object to be updated, one for adding update info to a certain object, and one for getting an updated object. They are also called in that order. Here's the class:
class TMyClass {
    TMyObj entries[];

    TMyObj GetObjToUpdate()
    {
        //Enter critical section
        for (int i = 0; i < Length(entries); i++)
            if (!entries[i].bUnderUpdate)
            {
                entries[i].bUnderUpdate = true;
                return entries[i];
            }
        //Leave critical section
    }

    //the parameter here is always contained in the entries array above
    void AddUpdateInfo(TMyObj obj)
    {
        //Do something...
        //Enter critical section
        if (updateInfoOver) obj.bUpdated = true; //left bUnderUpdate as true so it doesn't bother us
        //Leave critical section
    }

    TMyObj GetUpdatedObj()
    {
        //<-------- here
        for (int i = 0; i < Length(entries); i++)
            if (entries[i].bUpdated) return entries[i];
        //<-------- and here?
    }
};
Now imagine 5+ threads using the first two functions, and another thread using the last function (GetUpdatedObj), all on one instance of the class above.
Question: Will it be thread-safe if there's no critical section in the last function?
Given your sample code, it appears that it would be thread-safe for a read. This is assuming entries[] is a fixed size. If you are simply iterating over a fixed collection, there is no reason that the size of the collection should be modified, therefore making a thread-safe read ok.
The only thing I could see is that the result might be out of date. The problem comes from a call to GetUpdatedObj -- Thread A might not see an update to entries[0] during the life-cycle of
for (int i = 0; i < Length(entries); i++)
    if (entries[i].bUpdated) return entries[i];
if Thread B comes along and updates entries[0] while i > 0 -- it all depends on whether that is considered acceptable or not.
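If prompt visibility of those flags to the scanning thread matters, one option (my own addition, not something the answer proposes) is to declare them as std::atomic<bool>, which also gives the unsynchronized reads well-defined behavior under the C++ memory model:
#include <atomic>

// Hypothetical variant of TMyObj; the rest of the class stays as in the question.
struct TMyObj {
    std::atomic<bool> bUpdated{false};
    std::atomic<bool> bUnderUpdate{false};
};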
I have a scenario where, at certain points in my program, a thread needs to update several shared data structures. Each data structure can be safely updated in parallel with any other data structure, but each data structure can only be updated by one thread at a time. The simple, naive way I've expressed this in my code is:
synchronized updateStructure1();
synchronized updateStructure2();
// ...
This seems inefficient because if multiple threads are trying to update structure 1, but no thread is trying to update structure 2, they'll all block waiting for the lock that protects structure 1, while the lock for structure 2 sits untaken.
Is there a "standard" way of remedying this? In other words, is there a standard threading primitive that tries to update all structures in a round-robin fashion, blocks only if all locks are taken, and returns when all structures are updated?
This is a somewhat language agnostic question, but in case it helps, the language I'm using is D.
If your language supported lightweight threads or Actors, you could always have the updating thread spawn a new thread to change each object, where each thread just locks, modifies, and unlocks its object. Then have your updating thread join on all its child threads before returning. This punts the problem to the runtime's scheduler, which is free to schedule those child threads any way it can for best performance.
You could do this in languages with heavier threads, but the spawn and join might have too much overhead (though thread pooling might mitigate some of this).
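As a rough sketch of that idea in C++ (the Structure type and its members are made up for illustration): spawn one worker per structure, have each worker lock, update and unlock its own structure, and join them all before returning.
#include <mutex>
#include <thread>
#include <vector>

struct Structure {
    std::mutex mutex;
    void update() { /* modify the shared data */ }
};

void updateAll(std::vector<Structure*>& structures) {
    std::vector<std::thread> workers;
    for (Structure* s : structures)
        workers.emplace_back([s] {
            std::lock_guard<std::mutex> lock(s->mutex);  // each worker only touches its own structure
            s->update();
        });
    for (std::thread& t : workers)
        t.join();   // the scheduler decides the order; the caller just waits for all of them
}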
I don't know if there's a standard way to do this. However, I would implement this something like the following:
bool updatedA = false;
bool updatedB = false;
do
{
    if (!updatedA && mutexA.tryLock())
    {
        scope(exit) mutexA.unlock();
        updateA();
        updatedA = true;
    }
    if (!updatedB && mutexB.tryLock())
    {
        scope(exit) mutexB.unlock();
        updateB();
        updatedB = true;
    }
}
while (!(updatedA && updatedB));
Some clever metaprogramming could probably cut down the repetition, but I leave that as an exercise for you.
Sorry if I'm being naive, but do you not just synchronize on separate lock objects to make the concerns independent?
e.g.
public Object lock1 = new Object(); // access to resource 1
public Object lock2 = new Object(); // access to resource 2

updateStructure1() {
    synchronized( lock1 ) {
        ...
    }
}

updateStructure2() {
    synchronized( lock2 ) {
        ...
    }
}
To my knowledge, there is not a standard way to accomplish this, and you'll have to get your hands dirty.
To paraphrase your requirements, you have a set of data structures, and you need to do work on them, but not in any particular order. You only want to block waiting on a data structure if all other objects are blocked. Here's the pseudocode I would base my solution on:
work = unshared list of objects that need updating
while work is not empty:
    found = false
    for each obj in work:
        try locking obj
        if successful:
            remove obj from work
            found = true
            obj.update()
            unlock obj
    if !found:
        // Everything is locked, so we have to wait
        obj = randomly pick an object from work
        remove obj from work
        lock obj
        obj.update()
        unlock obj
An updating thread will only block if it finds that all objects it needs to use are locked. Then it must wait on something, so it just picks one and locks it. Ideally, it would pick the object that will be unlocked earliest, but there's no simple way of telling that.
Also, it's conceivable that an object might become free while the updater is in the try loop and so the updater would skip it. But if the amount of work you're doing is large enough, relative to the cost of iterating through that loop, the false conflict should be rare, and it would only matter in cases of extremely high contention.
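Here is roughly what that pseudocode looks like in C++ (illustrative only; the question is about D, and the Structure type here is made up): keep try-locking whatever is still pending, and only block on one structure once every attempt in a pass has failed.
#include <mutex>
#include <vector>

struct Structure {
    std::mutex mutex;
    void update() { /* modify the shared data */ }
};

void updateAll(std::vector<Structure*> work) {
    while (!work.empty()) {
        bool found = false;
        for (auto it = work.begin(); it != work.end(); ) {
            if ((*it)->mutex.try_lock()) {
                std::lock_guard<std::mutex> lock((*it)->mutex, std::adopt_lock);
                (*it)->update();
                it = work.erase(it);   // done with this structure
                found = true;
            } else {
                ++it;                  // someone else holds it; try the rest first
            }
        }
        if (!found) {
            // Everything is locked, so just pick one and wait for it
            Structure* s = work.back();
            work.pop_back();
            std::lock_guard<std::mutex> lock(s->mutex);
            s->update();
        }
    }
}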
I don't know any "standard" way of doing this, sorry. So what follows is just a ThreadGroup, abstracted by a Swarm class, that hacks away at a job list until all jobs are done, round-robin style, and makes sure that as many threads as possible are used. I don't know how to do this without a job list.
Disclaimer: I'm very new to D and to concurrency programming, so the code is rather amateurish. I saw this more as a fun exercise (I'm dealing with some concurrency stuff myself). I also understand that this isn't quite what you're looking for. If anyone has any pointers I'd love to hear them!
import core.thread,
core.sync.mutex,
std.c.stdio,
std.stdio;
class Swarm{
    ThreadGroup group;
    Mutex mutex;
    auto numThreads = 1;
    void delegate ()[int] jobs;

    this(void delegate()[int] aJobs, int aNumThreads){
        jobs = aJobs;
        numThreads = aNumThreads;
        group = new ThreadGroup;
        mutex = new Mutex();
    }

    void runBlocking(){
        run();
        group.joinAll();
    }

    void run(){
        foreach(c;0..numThreads)
            group.create( &swarmJobs );
    }

    void swarmJobs(){
        void delegate () myJob;
        do{
            myJob = null;
            synchronized(mutex){
                if(jobs.length > 0)
                    foreach(i,job;jobs){
                        myJob = job;
                        jobs.remove(i);
                        break;
                    }
            }
            if(myJob)
                myJob();
        }while(myJob);
    }
}
class Jobs{
    void job1(){
        foreach(c;0..1000){
            foreach(j;0..2_000_000){}
            writef("1");
            fflush(core.stdc.stdio.stdout);
        }
    }

    void job2(){
        foreach(c;0..1000){
            foreach(j;0..1_000_000){}
            writef("2");
            fflush(core.stdc.stdio.stdout);
        }
    }
}
void main(){
    auto jobs = new Jobs();
    void delegate ()[int] jobsList =
        [1:&jobs.job1, 2:&jobs.job2, 3:&jobs.job1, 4:&jobs.job2];
    int numThreads = 2;
    auto swarm = new Swarm(jobsList, numThreads);
    swarm.runBlocking();
    writefln("end");
}
There's no standard solution but rather a class of standard solutions depending on your needs.
http://en.wikipedia.org/wiki/Scheduling_algorithm