how do I allocate memory for some of the structure elements - memory-management

I want to allocate memory for some elements of a structure, which are pointers to other small structs. What is the best way to allocate and deallocate this memory?
Ex:
typedef struct _SOME_STRUCT {
PDATATYPE1 PDatatype1;
PDATATYPE2 PDatatype2;
PDATATYPE3 PDatatype3;
.......
PDATATYPE12 PDatatype12;
} SOME_STRUCT, *PSOME_STRUCT;
I want to allocate memory for PDatatype1, 3, 4, 6, 7, 9 and 11. Can I allocate it with a single malloc? If not, what is the best way to allocate memory for only these elements, and how do I free everything that was allocated?

There is a trick that allows a single malloc, but it has to be weighed against the more standard approach of multiple malloc calls.
If [and only if], once the DatatypeN elements of SOME_STRUCT are allocated, they never need to be reallocated in any way, and no other code calls free on any of them, you can do the following [assuming that PDATATYPEn points to DATATYPEn]:
PSOME_STRUCT
alloc_some_struct(void)
{
    size_t siz;
    char *vptr; // char *, because arithmetic on void * is only a GNU extension
    PSOME_STRUCT sptr;
    // NOTE: this optimizes down to a single assignment
    siz = 0;
    siz += sizeof(DATATYPE1);
    siz += sizeof(DATATYPE2);
    siz += sizeof(DATATYPE3);
    ...
    siz += sizeof(DATATYPE12);
    sptr = malloc(sizeof(SOME_STRUCT) + siz);
    if (sptr == NULL)
        return NULL;
    vptr = (char *) sptr;
    vptr += sizeof(SOME_STRUCT);
    sptr->PDatatype1 = (PDATATYPE1) vptr;
    // either initialize the struct pointed to by sptr->PDatatype1 here or
    // caller should do it -- likewise for the others ...
    vptr += sizeof(DATATYPE1);
    sptr->PDatatype2 = (PDATATYPE2) vptr;
    vptr += sizeof(DATATYPE2);
    sptr->PDatatype3 = (PDATATYPE3) vptr;
    vptr += sizeof(DATATYPE3);
    ...
    sptr->PDatatype12 = (PDATATYPE12) vptr;
    return sptr;
}
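Then, when you're done, just do free(sptr).
The sizeof above is often sufficient to provide proper alignment for the sub-structs, because sizeof(T) is always a multiple of T's own alignment. It is not guaranteed, though: a sub-struct with a stricter alignment than the one before it can end up misaligned. If that can happen, replace sizeof with a macro (e.g. SIZEOF) that rounds up to the necessary alignment. For 8-byte alignment, something like: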
#define SIZEOF(_siz) (((_siz) + 7) & ~0x07)
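As a minimal sketch of that rounding applied to the single-malloc layout: the two sub-structs below are hypothetical stand-ins for the real DATATYPEn types, ALIGN_UP8 is just an illustrative name for the same idea as SIZEOF, and the code assumes that 8-byte alignment is enough for every sub-struct.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round a size up to the next multiple of 8 (same idea as SIZEOF above). */
#define ALIGN_UP8(siz) (((size_t)(siz) + 7u) & ~(size_t)7u)

/* Hypothetical sub-structs, standing in for the real DATATYPEn types. */
typedef struct { char tag[3]; } DATATYPE1;
typedef struct { double value; } DATATYPE2;

typedef struct {
    DATATYPE1 *PDatatype1;
    DATATYPE2 *PDatatype2;
} SOME_STRUCT;

/* One malloc for the header and both sub-structs; release with one free(). */
SOME_STRUCT *alloc_some_struct(void)
{
    size_t off1 = ALIGN_UP8(sizeof(SOME_STRUCT));
    size_t off2 = off1 + ALIGN_UP8(sizeof(DATATYPE1));
    size_t total = off2 + ALIGN_UP8(sizeof(DATATYPE2));

    char *base = malloc(total);
    if (base == NULL)
        return NULL;

    SOME_STRUCT *sptr = (SOME_STRUCT *)base;
    sptr->PDatatype1 = (DATATYPE1 *)(base + off1);
    sptr->PDatatype2 = (DATATYPE2 *)(base + off2);
    return sptr;
}
```

Because every offset is a multiple of 8 and malloc returns suitably aligned memory, the 3-byte DATATYPE1 no longer pushes DATATYPE2 onto an odd address.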
Note: While it is possible to do all this, the technique is more common for things like variable-length string structs, such as:
struct mystring {
    int my_strlen;
    char my_strbuf[0];
};
instead of:
struct mystring {
    int my_strlen;
    char *my_strbuf;
};
It is debatable whether it's worth the [potential] fragility (i.e. somebody forgets and does a realloc/free on the individual elements). If a single malloc is a high priority for you, the cleaner way is to embed the actual structs rather than pointers to them.
Otherwise, just do it the [more] standard way: 12 individual malloc calls and, later, 12 free calls.
Still, it is a viable technique, particularly on small memory constrained systems.
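The variable-length string case can be sketched as follows. This is an illustration, not code from the question: mystring_new is a hypothetical helper, the length field is a size_t rather than an int, and the buffer uses the C99 flexible array member syntax ([]) instead of the GNU zero-length array.

```c
#include <stdlib.h>
#include <string.h>

/* Length-prefixed string whose bytes live in the same allocation. */
struct mystring {
    size_t my_strlen;
    char my_strbuf[];   /* C99 flexible array member */
};

/* Copy src into a single heap block; the caller releases it with free(). */
struct mystring *mystring_new(const char *src)
{
    size_t len = strlen(src);
    struct mystring *s = malloc(sizeof(*s) + len + 1);
    if (s == NULL)
        return NULL;
    s->my_strlen = len;
    memcpy(s->my_strbuf, src, len + 1);   /* include the terminating NUL */
    return s;
}
```

A single free(s) releases both the header and the characters, which is exactly the property the single-malloc trick is after.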
Here is the [more] usual way involving per-element allocations:
PSOME_STRUCT
alloc_some_struct(void)
{
    PSOME_STRUCT sptr;
    sptr = malloc(sizeof(SOME_STRUCT));
    // either initialize the struct pointed to by sptr->PDatatype1 here or
    // caller should do it -- likewise for the others ...
    sptr->PDatatype1 = malloc(sizeof(DATATYPE1));
    sptr->PDatatype2 = malloc(sizeof(DATATYPE2));
    sptr->PDatatype3 = malloc(sizeof(DATATYPE3));
    ...
    sptr->PDatatype12 = malloc(sizeof(DATATYPE12));
    return sptr;
}
void
free_some_struct(PSOME_STRUCT sptr)
{
    free(sptr->PDatatype1);
    free(sptr->PDatatype2);
    free(sptr->PDatatype3);
    ...
    free(sptr->PDatatype12);
    free(sptr);
}
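The per-element version above does not check any malloc result. A hedged sketch of one way to handle failure, using two hypothetical sub-structs in place of the real twelve: set every pointer first, then let a single free routine clean up whatever was allocated. It relies on free(NULL) being a no-op, which also covers members that are intentionally left unallocated.

```c
#include <stdlib.h>

/* Hypothetical sub-structs, standing in for the real DATATYPEn types. */
typedef struct { int a; } DATATYPE1;
typedef struct { int b; } DATATYPE2;

typedef struct {
    DATATYPE1 *PDatatype1;
    DATATYPE2 *PDatatype2;
} SOME_STRUCT;

/* Free every member, then the container. free(NULL) does nothing, so
   members that were never allocated are handled without special cases. */
void free_some_struct(SOME_STRUCT *sptr)
{
    if (sptr == NULL)
        return;
    free(sptr->PDatatype1);
    free(sptr->PDatatype2);
    free(sptr);
}

/* Allocate the container and its members; on any failure release what
   was obtained so far and return NULL. */
SOME_STRUCT *alloc_some_struct(void)
{
    SOME_STRUCT *sptr = malloc(sizeof(*sptr));
    if (sptr == NULL)
        return NULL;
    sptr->PDatatype1 = malloc(sizeof(DATATYPE1));
    sptr->PDatatype2 = malloc(sizeof(DATATYPE2));
    if (sptr->PDatatype1 == NULL || sptr->PDatatype2 == NULL) {
        free_some_struct(sptr);
        return NULL;
    }
    return sptr;
}
```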

If your structure contains the other structures as elements instead of pointers to them, you can allocate memory for the combined structure in one shot:
typedef struct _SOME_STRUCT {
DATATYPE1 Datatype1;
DATATYPE2 Datatype2;
DATATYPE3 Datatype3;
.......
DATATYPE12 Datatype12;
} SOME_STRUCT, *PSOME_STRUCT;
PSOME_STRUCT p = (PSOME_STRUCT)malloc(sizeof(SOME_STRUCT));
// Or, in C++, with new instead of malloc (release with delete, not free):
PSOME_STRUCT p = new SOME_STRUCT();


node.js c++ addon - afraid of memory leak

First of all, I admit I'm a newbie in C++ addons for node.js.
I'm writing my first addon and I reached a good result: the addon does what I want. I copied from various examples I found on the internet to exchange complex data between the two languages, but I understood almost nothing of what I wrote.
The first thing that scares me is that I wrote nothing that seems to free any memory. Another thing that seriously worries me is that I don't know whether what I wrote helps or confuses the V8 garbage collector. I also don't know if there are better ways to do what I did (iterating over js Object keys in C++, creating js Objects in C++, creating Strings in C++ to be used as properties of js Objects, and whatever else you can find wrong in my code).
So, before going on with my job writing the real math of my addon, I would like to share with the community the nan and V8 part of it to ask if you see something wrong or that can be done in a better way.
Thank you everybody for your help,
iCC
#include <map>
#include <nan.h>
using v8::Array;
using v8::Function;
using v8::FunctionTemplate;
using v8::Local;
using v8::Number;
using v8::Object;
using v8::Value;
using v8::String;
using Nan::AsyncQueueWorker;
using Nan::AsyncWorker;
using Nan::Callback;
using Nan::GetFunction;
using Nan::HandleScope;
using Nan::New;
using Nan::Null;
using Nan::Set;
using Nan::To;
using namespace std;
class Data {
public:
int dt1;
int dt2;
int dt3;
int dt4;
};
class Result {
public:
int x1;
int x2;
};
class Stats {
public:
int stat1;
int stat2;
};
typedef map<int, Data> DataSet;
typedef map<int, DataSet> DataMap;
typedef map<float, Result> ResultSet;
typedef map<int, ResultSet> ResultMap;
class MyAddOn: public AsyncWorker {
private:
DataMap *datas;
ResultMap results;
Stats stats;
public:
MyAddOn(Callback *callback, DataMap *set): AsyncWorker(callback), datas(set) {}
~MyAddOn() { delete datas; }
void Execute () {
for(DataMap::iterator i = datas->begin(); i != datas->end(); ++i) {
int res = i->first;
DataSet *datas = &i->second;
for(DataSet::iterator l = datas->begin(); l != datas->end(); ++l) {
int dt4 = l->first;
Data *data = &l->second;
// TODO: real population of stats and result
}
// test result population
results[res][res].x1 = res;
results[res][res].x2 = res;
}
// test stats population
stats.stat1 = 23;
stats.stat2 = 42;
}
void HandleOKCallback () {
Local<Object> obj;
Local<Object> res = New<Object>();
Local<Array> rslt = New<Array>();
Local<Object> sts = New<Object>();
Local<String> x1K = New<String>("x1").ToLocalChecked();
Local<String> x2K = New<String>("x2").ToLocalChecked();
uint32_t idx = 0;
for(ResultMap::iterator i = results.begin(); i != results.end(); ++i) {
ResultSet *set = &i->second;
for(ResultSet::iterator l = set->begin(); l != set->end(); ++l) {
Result *result = &l->second;
// is it ok to declare obj just once outside the cycles?
obj = New<Object>();
// is it ok to use same x1K and x2K instances for all objects?
Set(obj, x1K, New<Number>(result->x1));
Set(obj, x2K, New<Number>(result->x2));
Set(rslt, idx++, obj);
}
}
Set(sts, New<String>("stat1").ToLocalChecked(), New<Number>(stats.stat1));
Set(sts, New<String>("stat2").ToLocalChecked(), New<Number>(stats.stat2));
Set(res, New<String>("result").ToLocalChecked(), rslt);
Set(res, New<String>("stats" ).ToLocalChecked(), sts);
Local<Value> argv[] = { Null(), res };
callback->Call(2, argv);
}
};
NAN_METHOD(AddOn) {
Local<Object> datas = info[0].As<Object>();
Callback *callback = new Callback(info[1].As<Function>());
Local<Array> props = datas->GetOwnPropertyNames();
Local<String> dt1K = Nan::New("dt1").ToLocalChecked();
Local<String> dt2K = Nan::New("dt2").ToLocalChecked();
Local<String> dt3K = Nan::New("dt3").ToLocalChecked();
Local<Array> props2;
Local<Value> key;
Local<Object> value;
Local<Object> data;
DataMap *set = new DataMap();
int res;
int dt4;
DataSet *dts;
Data *dt;
for(uint32_t i = 0; i < props->Length(); i++) {
// is it ok to declare key, value, props2 and res just once outside the cycle?
key = props->Get(i);
value = datas->Get(key)->ToObject();
props2 = value->GetOwnPropertyNames();
res = To<int>(key).FromJust();
dts = &((*set)[res]);
for(uint32_t l = 0; l < props2->Length(); l++) {
// is it ok to declare key, data and dt4 just once outside the cycles?
key = props2->Get(l);
data = value->Get(key)->ToObject();
dt4 = To<int>(key).FromJust();
dt = &((*dts)[dt4]);
int dt1 = To<int>(data->Get(dt1K)).FromJust();
int dt2 = To<int>(data->Get(dt2K)).FromJust();
int dt3 = To<int>(data->Get(dt3K)).FromJust();
dt->dt1 = dt1;
dt->dt2 = dt2;
dt->dt3 = dt3;
dt->dt4 = dt4;
}
}
AsyncQueueWorker(new MyAddOn(callback, set));
}
NAN_MODULE_INIT(Init) {
Set(target, New<String>("myaddon").ToLocalChecked(), GetFunction(New<FunctionTemplate>(AddOn)).ToLocalChecked());
}
NODE_MODULE(myaddon, Init)
A year and a half later...
If somebody is interested, my server has been up and running since I asked my question, and the amount of memory it requires is stable.
I can't say whether the code I wrote really has no memory leaks or whether lost memory is freed at the end of each thread execution, but if you are as afraid as I was, I can say that using the same structure and calls does not cause any real problem.
You do actually free up some of the memory you use, with the line of code:
~MyAddOn() { delete datas; }
In essence, C++ memory management boils down to always calling delete for every object created by new. There are also many additional architecture-specific and legacy 'C' memory management functions, but it is not strictly necessary to use these when you do not require the performance benefits.
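As an example of what could potentially be a memory leak: you create the Callback with new and pass the pointer to the MyAddOn constructor, which hands it to the AsyncWorker base class. Nowhere in your own code is this pointer freed, so unless the worker frees it for you, there is a memory leak here. As far as I know, Nan::AsyncWorker deletes its callback in its destructor, and the worker itself is destroyed once the completion callback has run, but check the nan.h you build against to confirm.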
You can use a memory tool such as valgrind to test your program further. It will spot most memory problems for you and comes highly recommended.
One thing I've observed is that you often ask (paraphrased):
Is it okay to declare X outside my loop?
The answer is that declaring variables inside your loops is better whenever you can do it. Declare variables as deep inside as you can, unless you have to re-use them. A variable's scope is restricted to the innermost pair of {} braces that encloses its declaration. You can read more about this in this question.
is it ok to use same x1K and x2K instances for all objects?
In essence, when you do this, if one of these objects modifies its 'x1K' string, then it will change for all of them. The advantage is that you free up memory. If the string is the same for all these objects anyway, instead of having to store say 1,000,000 copies of it, your computer will only keep a single one in memory and have 1,000,000 pointers to it instead. If the string is 9 ASCII characters long or longer under amd64, then that amounts to significant memory savings.
By the way, if you don't intend to modify a variable after its declaration, you can declare it as const, a keyword short for constant which forces the compiler to check that your variable is not modified after declaration. You may have to deal with quite a few compiler errors about functions accepting only non-const versions of things they don't modify, some of which may not be your own code, in which case you're out of luck. Being as conservative as possible with non-const variables can help spot problems.

Is there an easier way to set/get values in a gSOAP request/response?

I am using gSOAP to configure an ONVIF compatible camera.
Currently, I am manually setting all the parameters in the request by doing something like this. This is for SetVideoEncoderConfiguration:
MediaBindingProxy mediaDevice (uri);
AUTHENTICATE (mediaDevice);
_trt__SetVideoEncoderConfiguration req;
_trt__SetVideoEncoderConfigurationResponse resp;
struct tt__VideoEncoderConfiguration encoderConfig;
struct tt__VideoResolution resolutionConfig;
encoderConfig.Name = strdup (name);
encoderConfig.UseCount = 1;
encoderConfig.Quality = 50;
if (strcmp (encoding, "H264") == 0)
encoderConfig.Encoding = tt__VideoEncoding__H264;
else if (strcmp (encoding, "JPEG") == 0)
encoderConfig.Encoding = tt__VideoEncoding__JPEG;
encoderConfig.token = strdup (profileToken);
encoderConfig.SessionTimeout = (LONG64)"PT0S";
resolutionConfig.Width=1280;
resolutionConfig.Height=720;
encoderConfig.Resolution = &resolutionConfig;
tt__VideoRateControl rateControl;
rateControl.FrameRateLimit = 15;
rateControl.EncodingInterval = 1;
rateControl.BitrateLimit = 4500;
encoderConfig.RateControl = &rateControl;
struct tt__H264Configuration h264;
h264.GovLength = 30;
h264.H264Profile = tt__H264Profile__Baseline;
encoderConfig.H264 = &h264;
struct tt__MulticastConfiguration multicast;
struct tt__IPAddress address;
address.IPv4Address = strdup ("0.0.0.0");
multicast.Address = &address;
encoderConfig.Multicast = &multicast;
req.Configuration = &encoderConfig;
req.ForcePersistence = true;
int ret = mediaDevice.SetVideoEncoderConfiguration (&req, resp);
qDebug () << "Set Encoder: " << ret;
Is there an easier way to do this? Maybe some function calls that set the request parameters? Another way I found with GetMediaUri was to use something like
soap_new_req__trt__GetStreamUri (mediaDevice.soap,soap_new_req_tt__StreamSetup (mediaDevice.soap, (enum tt__StreamType)0, soap_new_tt__Transport(mediaDevice.soap), 1, NULL), "profile1");
Are these the only two ways for client side code with gSOAP?
-Mandar Joshi
There are four variations of soap_new_T() to allocate data of type T in C++ with gSOAP:
T * soap_new_T(struct soap*) returns a new instance of T that is default initialized and allocated on the heap managed by the soap context.
T * soap_new_T(struct soap*, int n) returns an array of n new instances of T on the managed heap. The instances in the array are default initialized as described above.
T * soap_new_req_T(struct soap*, ...) (structs and classes only) returns a new instance of T allocated on the managed heap and sets the required data members to the values specified in the other arguments ....
T * soap_new_set_T(struct soap*, ...) (structs and classes only) returns a new instance of T on the managed heap and sets the public/serializable data members to the values specified in the other arguments ....
Use soap_strdup(struct soap*, const char*) instead of strdup to dup strings onto the managed heap.
All data on the managed heap is mass-deleted with soap_destroy(soap) and soap_end(soap) (call these in that order), which must be called before soap_done(soap) or soap_free(soap).
To allocate pointers to data, use templates:
template<class T>
T * soap_make(struct soap *soap, T val)
{
T *p = (T*)soap_malloc(soap, sizeof(T));
if (p)
*p = val;
return p;
}
template<class T>
T **soap_make_array(struct soap *soap, T* array, int n)
{
T **p = (T**)soap_malloc(soap, n * sizeof(T*));
for (int i = 0; i < n; ++i)
p[i] = &array[i];
return p;
}
Then use soap_make<int>(soap, 123) to create a pointer to the value 123 on the managed heap and soap_make_array(soap, soap_new_CLASSNAME(soap, 100), 100) to create 100 pointers to 100 instances of CLASSNAME.
The gSOAP tools also generate deep copy operations for you: CLASSNAME::soap_dup(struct soap*) creates a deep copy of the object and allocates it in another soap context that you provide as argument. Use NULL as this argument to allocate unmanaged deep copies (but these cannot have pointer cycles!). Then delete unmanaged copies with CLASSNAME::soap_del() for deep deletion of all members and then delete the object itself.
See Memory management in C++ for more details. Use gSOAP 2.8.39 and greater.

Swift Dictionary Memory Consumption is Astronomical

Can anyone help shed some light on why the below code consumes well over 100 MB of RAM during runtime?
public struct Trie<Element : Hashable> {
private var children: [Element:Trie<Element>]
private var endHere : Bool
public init() {
children = [:]
endHere = false
}
public init<S : SequenceType where S.Generator.Element == Element>(_ seq: S) {
self.init(gen: seq.generate())
}
private init<G : GeneratorType where G.Element == Element>(var gen: G) {
if let head = gen.next() {
(children, endHere) = ([head:Trie(gen:gen)], false)
} else {
(children, endHere) = ([:], true)
}
}
private mutating func insert<G : GeneratorType where G.Element == Element>(var gen: G) {
if let head = gen.next() {
let _ = children[head]?.insert(gen) ?? { children[head] = Trie(gen: gen) }()
} else {
endHere = true
}
}
public mutating func insert<S : SequenceType where S.Generator.Element == Element>(seq: S) {
insert(seq.generate())
}
}
var trie = Trie<UInt32>()
for i in 0..<300000 {
trie.insert([UInt32(i), UInt32(i+1), UInt32(i+2)])
}
Based on my calculations total memory consumption for the above data structure should be somewhere around the following...
3 * count * sizeof(Trie<UInt32>)
Or –
3 * 300,000 * 9 = 8,100,000 bytes = ~8 MB
How is it that this data structure consumes well over 100 MB during runtime?
sizeof reports only the static, inline footprint of a value. For a Dictionary, that is little more than a wrapper around a reference to its internal reference-type storage, which also provides the copy-on-write support. In other words, the key-value pairs and the hash table of your dictionary are allocated on the heap, which is not covered by sizeof. This applies to all other Swift collection types.
In your case, you are creating three Trie values, and indirectly three dictionaries, on each of the 300,000 iterations. I wouldn't be surprised if the 96-byte allocations mentioned by @Macmade are the minimum overhead of a dictionary (e.g. its hash buckets).
There might also be cost related to growing storage. So you may try to see if setting a minimumCapacity on the dictionary would help. On the other hand, if you do not need a divergent path generated per iteration, you may consider an indirect enum as an alternative, e.g.
public enum Trie<Element> {
indirect case Next(Element, Trie<Element>)
case End
}
which should use less memory.
Size of your struct is 9 bytes, not 5.
You can check it with sizeof:
let size = sizeof( Trie< UInt32 > );
Also, you iterate 300'000 times, but insert 3 values (of course, it's a trie). So that's 900'000.
Anyway, that does not explain by itself the memory consumption you are observing.
I'm not really fluent in Swift, and I don't understand your code.
Maybe there's also some error in it, making it allocate more memory than needed.
But anyway, in order to understand what's happening, you need to run your code in Instruments (command-i).
On my machine, I can see 900'000 allocations of 96 bytes each, made by swift_slowAlloc.
That's more like it...
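To put numbers on that, here is a small arithmetic sketch (total_bytes is a hypothetical helper, and it counts only the allocations named above, not any other overhead):

```c
/* Total bytes for a number of equally sized heap allocations. */
unsigned long long total_bytes(unsigned long long count, unsigned long long size)
{
    return count * size;
}
```

total_bytes(900000, 96) is 86,400,000 bytes, roughly 86 MB, against the 8,100,000 bytes obtained from 900,000 times the 9-byte sizeof. The heap allocations alone are therefore more than ten times the sizeof-based estimate.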
Why 96 bytes, assuming there's no error in your code?
Well, it might be because of the way memory is allocated for your elements.
When satisfying a request, the memory allocator may allocate more memory than requested. That may be because it needs some internal metadata, because of paging, because of alignment, ...
Even so, it still seems really excessive, so use Instruments and double-check what your code is doing.

How to create a string on the heap in D?

I'm writing a trie in D and I want each trie object have a pointer to some data, which has a non-NULL value if the node is a terminal node in the trie, and NULL otherwise. The type of the data is undetermined until the trie is created (in C this would be done with a void *, but I plan to do it with a template), which is one of the reasons why pointers to heap objects are desirable.
This requires me to eventually create my data on the heap, at which point it can be pointed to by the trie node. Experimenting, it seems like new performs this task, much as it does in C++. However for some reason, this fails with strings. The following code works:
import std.stdio;
void main() {
string *a;
string b = "hello";
a = &b;
writefln("b = %s, a = %s, *a = %s", b, a, *a);
}
/* OUTPUT:
b = hello, a = 7FFF5C60D8B0, *a = hello
*/
However, this fails:
import std.stdio;
void main() {
string *a;
a = new string();
writefln("a = %s, *a = %s", a, *a);
}
/* COMPILER FAILS WITH:
test.d(5): Error: new can only create structs, dynamic arrays or class objects, not string's
*/
What gives? How can I create strings on the heap?
P.S. If anyone writing the D compiler is reading this, the apostrophe in "string's" is a grammatical error.
Strings are always allocated on the heap. This is the same for any other dynamic array (T[], string is only an alias to type immutable(char)[]).
If you need only one pointer there are two ways to do it:
auto str = "some immutable(char) array";
auto ptr1 = &str; // return pointer to reference to string (immutable(char)[]*)
auto ptr2 = str.ptr; // return pointer to first element in string (immutable(char)*)
If you need a pointer to an empty string, use this:
auto ptr = &"";
Remember that you can't change value of any single character in string (because they are immutable). If you want to operate on characters in string use this:
auto mutableString1 = cast(char[])"Convert to mutable."; // shouldn't be used
// or
auto mutableString2 = "Convert to mutable.".dup; // T[].dup returns mutable duplicate of array
Generally you should avoid pointers unless you know exactly what you are doing.
From a memory point of view, any pointer takes 4 B (8 B on x64 machines). If you use a pointer to an array and the pointer is not null, 12 B (plus the data in the array) are in use: 4 B for the pointer and 8 B for the array reference, because an array reference is a pair of a length and a pointer to the first element.
Remember that string is just immutable(char)[]. So you don't need pointers, since string is already a dynamic array.
As for creating them, you just do new char[X], not new string.
The string contents are on the heap already because strings are dynamic arrays. However, in your case, it is better to use a char dynamic array instead as you require mutability.
import std.stdio;
void main() {
char[] a = null; // redundant as dynamic arrays are initialized to null
writefln("a = \"%s\", a.ptr = %s", a, a.ptr); // prints: a = "", a.ptr = null
a = "hello".dup; // dup is required because a is mutable
writefln("a = \"%s\", a.ptr = %s", a, a.ptr); // prints: a = "hello", a.ptr = 7F3146469FF0
}
Note that you don't actually hold the array's contents, but a slice of it. The array is handled by the runtime and it is allocated on the heap.
A good reading on the subject is this article http://dlang.org/d-array-article.html
If you can only use exactly one pointer and you don't want to use the suggestions in Marmyst's answer (&str in his example creates a reference to the stack, which you might not want, and str.ptr loses information about the string's length, as D strings are not always zero terminated), you can do this:
Remember that you can think of D arrays (and therefore strings) as a struct with a data pointer and a length member:
struct ArraySlice(T)
{
T* ptr;
size_t length;
}
So when dealing with an array the array's content is always on the heap, but the ptr/length combined type is a value type and therefore usually kept on the stack. I don't know why the compiler doesn't allow you to create that value type on the heap using new, but you can always do it manually:
import core.memory;
import std.stdio;
string* ptr;
void alloc()
{
ptr = cast(string*)GC.malloc(string.sizeof);
*ptr = "Hello World!";
}
void main()
{
alloc();
writefln("ptr=%s, ptr.ptr=%s, ptr.length=%s, *ptr=%s", ptr, ptr.ptr, ptr.length, *ptr);
}
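For readers more at home in C, the same idea can be modelled as follows. This is an analogy under stated assumptions, not D's actual runtime layout: struct slice and slice_new are hypothetical names, and the text is referenced rather than copied.

```c
#include <stdlib.h>
#include <string.h>

/* C model of a D slice: a pointer to the data plus a length. */
struct slice {
    const char *ptr;
    size_t length;
};

/* Put the pointer/length pair itself on the heap. The bytes it refers to
   are not copied and must outlive the returned slice. */
struct slice *slice_new(const char *text)
{
    struct slice *s = malloc(sizeof(*s));
    if (s == NULL)
        return NULL;
    s->ptr = text;
    s->length = strlen(text);
    return s;
}
```

A trie node can then hold a single struct slice * that is NULL for non-terminal nodes, which is what the question asks for. Unlike the GC.malloc version, the C version must be released with free.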

How to put my structure variable into CPU caches to eliminate main memory page access time?

It's clear that there is no explicit way or certain system calls that
help programmers to put a variable into the CPU cache.
But I think that a certain programming style or well designed
algorithm can make it possible to increase the possibilities that the
variable can be cached into the CPU caches.
Here is my example:
I want to append an 8 byte structure at the end of an array consisting
of the same type of structures, declared in the global main memory
region.
This process is continuously repeated for 4 million operations. This process takes 6 seconds, 1.5 us for each operation. I think this result tells that the two memory areas have not been cached.
I got some clues from a cache-oblivious algorithm, so I tried several
ways to enhance this. Until now, no enhancement.
I think some clever codes can reduce the elapsed time, up to 10 to 100
times. Please show me the way.
-------------------------------------------------------------------------
Appended (2011-04-01)
Damon~ thank you for your comment!
After reading your comment, I analyzed my code again, and found several things
that I missed. The following code that I attached is the abbreviated version of my original code.
To accurately measure each operation's execution time (in the original code, there are several different types of operations), I inserted the time measuring code using clock_gettime() function. I thought if I measure each operation's execution time and accumulate them, the additional cost by the main loop can be avoided.
In the original code, the time measuring code was hidden by a macro function, so I totally forgot about it.
The running time of this code is almost 6 seconds. But if I get rid of the time measuring function in the main loop, it becomes 0.1 seconds.
Since the clock_gettime() function supports very high precision (up to 1 nanosecond), is executed on the basis of an independent thread, and also requires a very big structure,
I think the function caused the main memory area where the consecutive insertions are performed to be evicted from the cache.
Thank you again for your comment. Any further suggestions for optimizing my code would be very helpful.
I think the hierarchically defined structure variable might cause unnecessary time cost,
but first I want to know how much it would be, before I change it to more C-style code.
typedef struct t_ptr {
uint32 isleaf :1, isNextLeaf :1, ptr :30;
t_ptr(void) {
isleaf = false;
isNextLeaf = false;
ptr = NIL;
}
} PTR;
typedef struct t_key {
uint32 op :1, key :31;
t_key(void) {
op = OP_INS;
key = 0;
}
} KEY;
typedef struct t_key_pair {
KEY key;
PTR ptr;
t_key_pair() {
}
t_key_pair(KEY k, PTR p) {
key = k;
ptr = p;
}
} KeyPair;
typedef struct t_op {
KeyPair keyPair;
uint seq;
t_op() {
seq = 0;
}
} OP;
#define MAX_OP_LEN 4000000
typedef struct t_opq {
OP ops[MAX_OP_LEN];
int freeOffset;
int globalSeq;
bool queueOp(register KeyPair keyPair);
} OpQueue;
bool OpQueue::queueOp(register KeyPair keyPair) {
    bool isFull = false;
    if (freeOffset == (int) (MAX_OP_LEN - 1)) {
        isFull = true;
    }
    ops[freeOffset].keyPair = keyPair;
    ops[freeOffset].seq = globalSeq++;
    freeOffset++;
    // a bool function must return a value: report whether this append used the last free slot
    return isFull;
}
OpQueue opQueue;
#include <stdio.h> // printf()
#include <time.h> // clock_gettime() and struct timespec
int main() {
    struct timespec startTime, endTime;
    struct timespec totalTime = {0, 0}; // must start at zero before accumulating
    for(int i = 0; i < 4000000; i++) {
        clock_gettime(CLOCK_REALTIME, &startTime);
        opQueue.queueOp(KeyPair());
        clock_gettime(CLOCK_REALTIME, &endTime);
        totalTime.tv_sec += (endTime.tv_sec - startTime.tv_sec);
        totalTime.tv_nsec += (endTime.tv_nsec - startTime.tv_nsec);
    }
    // total elapsed time in microseconds
    printf("\n elapsed time: %lld", totalTime.tv_sec * 1000000LL + totalTime.tv_nsec / 1000L);
}
YOU don't put the structure into any cache. The CPU does that automatically for you. The CPU is even more clever than that; if you access sequential memory, it will start putting things from memory into the cache before you read them.
And really, it should be common sense that for a simple bit of code like this, the time you spend on measuring is ten times more than the time to perform the code (apparently 60 times in your case).
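One way to keep the measurement from dominating is to read the clock once before the loop and once after it. A minimal sketch, with assumptions labelled: elapsed_ns and time_appends are hypothetical helpers, the small records array merely stands in for the real queue, and CLOCK_MONOTONIC is swapped in for CLOCK_REALTIME so that the result cannot go negative if the wall clock is adjusted.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds between two timespec readings. */
int64_t elapsed_ns(struct timespec start, struct timespec end)
{
    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}

/* Hypothetical stand-in for the queue: n stores of 8-byte records.
   volatile keeps the compiler from removing the stores. */
static volatile uint64_t records[1024];

/* Time the whole loop with exactly two clock reads. */
int64_t time_appends(size_t n)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t i = 0; i < n; i++)
        records[i % 1024] = i;
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_ns(start, end);
}
```

Dividing the result of time_appends(4000000) by 4000000 gives an average cost per operation that includes the loop overhead but only two clock reads in total.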
Since you put so much confidence in clock_gettime (): I suggest you call it five times in a row and store the results, then print the differences. There's resolution, there's precision, and there's how long it takes to return the current time, which is pretty damned long.
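That experiment could look roughly like this. The function names are hypothetical, and CLOCK_MONOTONIC is used instead of CLOCK_REALTIME so that the differences are never negative.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <stdio.h>
#include <time.h>

#define SAMPLES 5

/* Read the clock SAMPLES times back to back and store the gaps between
   consecutive readings, in nanoseconds, in gaps[0..SAMPLES-2]. */
void sample_clock_gaps(long long gaps[SAMPLES - 1])
{
    struct timespec t[SAMPLES];
    for (int i = 0; i < SAMPLES; i++)
        clock_gettime(CLOCK_MONOTONIC, &t[i]);
    for (int i = 1; i < SAMPLES; i++)
        gaps[i - 1] = (long long)(t[i].tv_sec - t[i - 1].tv_sec) * 1000000000LL
                    + (t[i].tv_nsec - t[i - 1].tv_nsec);
}

/* Print each gap between consecutive readings. */
void print_clock_gaps(void)
{
    long long gaps[SAMPLES - 1];
    sample_clock_gaps(gaps);
    for (int i = 0; i < SAMPLES - 1; i++)
        printf("gap %d: %lld ns\n", i, gaps[i]);
}
```

Since nothing happens between the readings, each printed gap is an estimate of what one clock_gettime() call costs on the machine, and that cost is included in every per-operation measurement taken around a single queueOp call.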
I have been unable to force caching, but you can force memory to be uncacheable. If you have other large data structures, you might exclude them so that they will not pollute your caches. This can be done by specifying PAGE_NOCACHE for the Windows VirtualAllocXXX functions.
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366786(v=vs.85).aspx
