arm-gcc mktime binary size

I need to perform simple arithmetic on struct tm from time.h. I need to add or subtract seconds or minutes, and be able to normalize the structure. Normally, I'd use mktime(3) which performs this normalization as a side effect:
struct tm t = {.tm_hour=0, .tm_min=59, .tm_sec=40};
t.tm_sec += 30;
mktime(&t);
// t.tm_hour is now 1
// t.tm_min is now 0
// t.tm_sec is now 10
I'm doing this on an STM32 with 32 kB of flash, and the binary gets very big. mktime(3) and the other stuff it pulls in take up 16 kB of flash, half the available space.
Is there a function in newlib that is specifically responsible for struct tm normalization? I realize that linking to a private function like that would make the code less portable.

There is a validate_structure() function in newlib/libc/time/mktime.c which does part of the job: it normalizes the month, day-of-month, hour, minute, and second fields, but leaves day-of-week and day-of-year alone.
It's declared static, so you can't simply call it, but you can copy the function from the sources (there might be licensing issues, though). Or you can just reimplement it; it's quite straightforward.
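For illustration, here is a minimal normalization sketch (my own reimplementation, not the newlib code, so no licensing worries; like validate_structure(), it leaves tm_wday and tm_yday alone):

#include <time.h>

static const int days_per_month[12] =
    { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };

/* month is 0-11 and year counts from 1900, as in struct tm */
static int days_in_month(int year, int month)
{
    int y = year + 1900;
    if (month == 1 && ((y % 4 == 0 && y % 100 != 0) || y % 400 == 0))
        return 29;  /* leap-year February */
    return days_per_month[month];
}

/* Move *field into [0, base) and carry the overflow into *next. */
static void carry(int *field, int *next, int base)
{
    *next  += *field / base;
    *field %= base;
    if (*field < 0) {   /* C division truncates toward zero */
        *field += base;
        (*next)--;
    }
}

void tm_normalize(struct tm *t)
{
    carry(&t->tm_sec,  &t->tm_min,  60);
    carry(&t->tm_min,  &t->tm_hour, 60);
    carry(&t->tm_hour, &t->tm_mday, 24);
    carry(&t->tm_mon,  &t->tm_year, 12);  /* fix the month before the day */

    /* tm_mday is 1-based, so walk it into range month by month */
    while (t->tm_mday < 1) {
        if (--t->tm_mon < 0) { t->tm_mon = 11; t->tm_year--; }
        t->tm_mday += days_in_month(t->tm_year, t->tm_mon);
    }
    while (t->tm_mday > days_in_month(t->tm_year, t->tm_mon)) {
        t->tm_mday -= days_in_month(t->tm_year, t->tm_mon);
        if (++t->tm_mon > 11) { t->tm_mon = 0; t->tm_year++; }
    }
}

The example from the question then works without pulling in mktime():

struct tm t = {.tm_hour = 0, .tm_min = 59, .tm_sec = 40};
t.tm_sec += 30;
tm_normalize(&t);  /* 01:00:10, as before */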
The tm_wday and tm_yday fields are calculated later in mktime(), so you'd need the whole mess, including the timezone stuff, in order to have these two normalized.
The bulk of that 16 kB of code is related to a call to siscanf(), a variant of sscanf() without floating point support, which is (I believe) used to parse timezone and DST information in environment variables.
You can cut lots of unnecessary code by using --specs=nano.specs when linking, which switches to the simplified printf/scanf code and saves about 10 kB of code in your case.
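For example (the object file names here are made up), the flag just goes on the final link line:

arm-none-eabi-gcc main.o rtc.o -o firmware.elf --specs=nano.specs

This links against newlib-nano, whose simplified printf/scanf family replaces the full-featured code that siscanf() drags in.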

Related

Stimulate code-inlining

Unlike languages like C++, where you can explicitly state inline, in Go the compiler dynamically detects functions that are candidates for inlining (which C++ compilers can do too, but Go cannot do both: there is no explicit keyword). There is also a debug option to see possible inlining happening, yet there is very little documented online about the exact logic of the Go compiler(s) doing this.
Let's say I need to rerun some big loop over a set of data every n periods:
func Encrypt(password []byte) ([]byte, error) {
    return bcrypt.GenerateFromPassword(password, 13)
}

for id, data := range someDataSet {
    newPassword, _ := Encrypt([]byte("generatedSomething"))
    data["password"] = newPassword
    someSaveCall(id, data)
}
Aiming, for example, for Encrypt to be inlined properly, what logic do I need to take into consideration for the compiler?
I know from C++ that passing by reference increases the likelihood of automatic inlining without the explicit inline keyword, but it's not very easy to understand what exactly the Go compiler does to decide whether or not to inline. Scripting languages like PHP, for example, suffer immensely if you run a loop over a constant addSomething($a, $b): benchmarking a billion such cycles, its cost versus $a + $b (inlined) is almost ridiculous.
Until you have performance problems, you shouldn't care. Inlined or not, it will do the same.
If performance does matter and it makes a noticeable and significant difference, then don't rely on current (or past) inlining conditions; "inline" it yourself (do not put it in a separate function).
The rules can be found in the $GOROOT/src/cmd/compile/internal/inline/inl.go file. You may control its aggressiveness with the 'l' debug flag.
// The inlining facility makes 2 passes: first caninl determines which
// functions are suitable for inlining, and for those that are it
// saves a copy of the body. Then InlineCalls walks each function body to
// expand calls to inlinable functions.
//
// The Debug.l flag controls the aggressiveness. Note that main() swaps level 0 and 1,
// making 1 the default and -l disable. Additional levels (beyond -l) may be buggy and
// are not supported.
// 0: disabled
// 1: 80-nodes leaf functions, oneliners, panic, lazy typechecking (default)
// 2: (unassigned)
// 3: (unassigned)
// 4: allow non-leaf functions
//
// At some point this may get another default and become switch-offable with -N.
//
// The -d typcheckinl flag enables early typechecking of all imported bodies,
// which is useful to flush out bugs.
//
// The Debug.m flag enables diagnostic output. a single -m is useful for verifying
// which calls get inlined or not, more is for debugging, and may go away at any point.
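To see these decisions for your own code, ask the compiler to report them. A minimal sketch (the add function is made up for illustration):

// main.go
package main

func add(a, b int) int { return a + b } // small leaf function, a typical inlining candidate

func main() {
    println(add(1, 2))
}

Building with go build -gcflags='-m' prints diagnostics along the lines of "can inline add" and "inlining call to add", so you can check whether something like Encrypt qualifies instead of guessing.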
Also check out the blog post Dave Cheney: Five things that make Go fast (2014-06-07), which writes about inlining (it's a long post; the relevant part is about in the middle, search for the word "inline").
Also see this interesting discussion about inlining improvements (maybe for Go 1.9?): cmd/compile: improve inlining cost model #17566
Better still, don’t guess, measure!
You should trust the compiler and avoid trying to guess its inner workings as it will change from one version to the next.
There are far too many tricks the compiler, the CPU or the cache can play to be able to predict performance from source code.
What if inlining makes your code bigger to the point that it doesn’t fit in the cache line anymore, making it much slower than the non-inlined version? Cache locality can have a much bigger impact on performance than branching.

Writing small amounts of data to a large number of files on GlusterFS 3.7

I'm experimenting with 2 Gluster 3.7 servers in a 1x2 configuration. The servers are connected over a 1 Gbit network. I'm using Debian Jessie.
My use case is as follows: open file -> append 64 bytes -> close file, and do this in a loop for about 5000 different files. The execution time for such a loop is roughly 10 seconds if I access the files through a mounted glusterfs drive. If I use libgfapi directly, the execution time is about 5 seconds (2 times faster).
However, the same loop executes in 50 ms on a plain ext4 disk.
There is a huge performance difference between Gluster 3.7 and earlier versions, which is, I believe, due to the cluster.eager-lock setting.
My target is to execute the loop in less than 1 second.
I've tried to experiment with lots of Gluster settings, but without success. dd tests with various block sizes behave as if the TCP no-delay option were not set, although from the Gluster source code it seems that no-delay is the default.
Any idea how to improve the performance?
Edit:
I've found a solution that works in my case so I'd like to share it in case anyone else faces the same issue.
The root cause of the problem is the number of round trips between the client and the Gluster server during execution of the open/write/close sequence. I don't know exactly what is happening behind the scenes, but timing measurements show exactly that pattern. Now, the obvious idea would be to "pack" the open/write/close sequence into a single write function. Roughly, the C prototype of such a function would be:
int write(const char* fname, const void *buf, size_t nbyte, off_t offset)
But there is already such an API function, glfs_h_anonymous_write, in libgfapi (thanks go to Suomya from the Gluster mailing list). The kind-of-hidden thing there is the file identifier, which is not a plain file name but something of type struct glfs_object. Clients obtain an instance of such an object through the API calls glfs_h_lookupat/glfs_h_creat. The point here is that the glfs_object representing a file name is "stateless" in the sense that the corresponding inode is left intact (not ref-counted). One should think of glfs_object as a plain file-name identifier and use it as you would use a file name (actually, glfs_object stores a plain pointer to the corresponding inode without ref-counting it).
Finally, we should use glfs_h_lookupat/glfs_h_creat once and write many times to the file using glfs_h_anonymous_write.
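A minimal sketch of that pattern (the glfs_h_* signatures here are as I read them from the 3.7-era glfs-handles.h, so double-check them against your headers; error handling omitted):

#include <glusterfs/api/glfs.h>
#include <glusterfs/api/glfs-handles.h>

/* Append nbyte bytes to each of n files, one write round trip per file. */
void append_to_files(struct glfs *fs, const char *names[], int n,
                     const void *buf, size_t nbyte)
{
    for (int i = 0; i < n; i++) {
        struct stat st;
        /* one lookup resolves the name to a glfs_object handle */
        struct glfs_object *obj = glfs_h_lookupat(fs, NULL, names[i], &st);
        if (obj == NULL)
            continue;
        /* no open/close: write directly at the current end of file */
        glfs_h_anonymous_write(fs, obj, buf, nbyte, st.st_size);
        glfs_h_close(obj);
    }
}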
That way I was able to append 64 bytes to 5000 files in 0.5 seconds, which is 20 times faster than using the mounted volume and the open/write/close sequence.

Best way to measure elapsed time in Scheme

I have some kind of "main loop" using GLUT. I'd like to be able to measure how much time it takes to render a frame. The time taken to render a frame might be used for other calculations. The function time isn't adequate:
(time (procedure))
I found out that there is a function called current-time. I had to import some package to get it.
(define ct (current-time))
which defines ct as a time object. Unfortunately, I couldn't find any date-arithmetic packages in Scheme. I saw that in Racket there is something called current-inexact-milliseconds, which is exactly what I'm looking for because of its sub-millisecond precision.
Using the time object, there is a way to convert it to nanoseconds using
(time->nanoseconds ct)
This lets me do something like this
(let ((newTime (current-time)))
  (block)
  (print (- (time->nanoseconds newTime) (time->nanoseconds oldTime)))
  (set! oldTime newTime))
This seems good enough for me, except that for some reason it was printing things like this:
0
10000
0
0
10000
0
10000
I'm rendering things using OpenGL, and I find it hard to believe that some rendering loops take 0 nanoseconds, or that each loop is stable enough to always take the same number of nanoseconds.
Your results are not so surprising, because you have to consider the limited timer resolution of each system. In fact, there are limits that depend in general on the processor and on the OS; these cannot count as accurately as we might expect, even though a quartz oscillator can reach and exceed a period of a nanosecond. You are also limited by the accuracy and resolution of the functions you used. I had a look at the documentation of CHICKEN Scheme, but there is nothing similar to Racket's (current-inexact-milliseconds) → real?.
After digging around, I came up with the solution: write it in C and bind it to Scheme using bindings.
(require-extension bind)

(bind-rename "getTime" "current-microseconds")

(bind* #<<EOF
uint64_t getTime();

#ifndef CHICKEN
#include <sys/time.h>
uint64_t getTime() {
    struct timeval tim;
    gettimeofday(&tim, NULL);
    /* widen before multiplying so a 32-bit time_t cannot overflow */
    return 1000000ULL * tim.tv_sec + tim.tv_usec;
}
#endif
EOF
)
Unfortunately this solution isn't the best, because it is CHICKEN Scheme-only. It could be implemented as a library, but a library wrapping only one function that doesn't exist in any other Scheme doesn't make much sense.
Since nanoseconds don't actually make much sense after all, I get microseconds instead.
Note the trick here: the function to be wrapped is declared above, and the #ifndef CHICKEN guard prevents the include and the definition from being parsed by bind. When the file is compiled by gcc, it is built with the include and the function definition.
CHICKEN has current-milliseconds: http://api.call-cc.org/doc/library/current-milliseconds
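With that, the loop from the question needs no C at all; a sketch (block again stands in for the rendering work):

(define old-time (current-milliseconds))

(define (frame!)
  (block)                               ; render one frame
  (let ((new-time (current-milliseconds)))
    (print (- new-time old-time))       ; elapsed milliseconds for this frame
    (set! old-time new-time)))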

How do you allocate memory at a predetermined location?

How do I allocate memory using new at a fixed location? My book says to do this:
char *buf = new char[sizeof(sample)];
sample *p = new (buf) sample(10, 20);
Here new is allocating memory at buf's address, and (10, 20) are values being passed. But what is sample? Is it an address or a data type?
Let me explain this code to you...
char *buf = new char[sizeof(sample)];
sample *p = new (buf) sample(10, 20);
This is really four lines of code, written as two for your convenience. Let me just expand them:
char *buf;                       // 1
buf = new char[sizeof(sample)];  // 2
sample *p;                       // 3
p = new (buf) sample(10, 20);    // 4
Lines 1 and 3 are simple to explain: they both declare pointers. buf is a pointer to a char, p is a pointer to a sample. Now, we cannot see what sample is, but we can assume that it is either a class defined elsewhere or some data type that has been typedefed (more or less just given a new name); either way, sample can be thought of as a data type just like int or string.
Line 2 allocates a block of memory and assigns it to our char pointer called buf. Let's say sample is a class that contains 2 ints; this means it is (under most compilers) going to be 8 bytes (4 per int). So buf points to the start of a block of memory that has been set aside to hold chars.
Line 4 is where it gets a bit complex. If it were just p = new sample(10,20), it would be a simple case of creating a new object of type sample, passing it the two ints, and storing the address of this new object in the pointer p. The addition of (buf) is basically telling new to make use of the memory pointed to by buf (this is known as "placement new").
The end effect is that you have one block of memory allocated (more or less 8 bytes) and two pointers pointing to it. One of the pointers, buf, looks at that memory as 8 chars; the other, p, looks at it as a single sample.
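Putting it all together, here is a self-contained sketch (the two-int sample class is invented to match the assumption above). The important extra detail is that an object created with placement new must have its destructor called by hand:

#include <new>  // declares the placement form of operator new

struct sample {
    int a, b;
    sample(int a, int b) : a(a), b(b) {}
};

int main() {
    char *buf = new char[sizeof(sample)];  // raw storage, viewed as chars
    sample *p = new (buf) sample(10, 20);  // construct a sample inside buf

    // ... use p->a and p->b ...

    p->~sample();  // placement new means an explicit destructor call
    delete[] buf;  // then release the raw storage as it was allocated
    return 0;
}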
Why would you do this?
Normally, you wouldn't. Modern C++ has made the use of raw new rather redundant; there are many better ways to deal with objects. I guess the main reason for using this method is if, for some reason, you want to keep a pool of memory allocated: it can take time to get large blocks of memory, and you might be able to save yourself some time.
For the most part, if you think you need to do something like this, you are probably trying to solve the wrong problem.
A Bit Extra
I do not have much experience with embedded or mobile devices, but I have never seen this used.
The code you posted is basically the same as just doing sample *p = new sample(10,20); neither method controls where the sample object is created.
Also consider that you do not always need to create objects dynamically using new.
void myFunction() {
    sample p = sample(10, 20);
}
This automatically creates a sample object for you. This method is much preferable, as it is easier to read and understand, and you do not need to worry about deleting the object; it will be cleaned up for you when the function returns.
If you really do need to make use of dynamic objects, consider using smart pointers, something like unique_ptr<sample>. This will give you the ability to use dynamic object creation but save you the hassle of manually deleting the object of type sample (I can point you towards more info on this if you like).
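For instance, a sketch reusing the same sample class:

#include <memory>

void myFunction() {
    auto p = std::make_unique<sample>(10, 20);  // C++14
    // ... use p->a and p->b ...
}   // the sample is destroyed automatically here, no delete needed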
It is a data type or a typedef.

Can I assume sizeof(GUID)==16 at all times?

The definition of GUID in the Windows headers is like this:
typedef struct _GUID {
    unsigned long  Data1;
    unsigned short Data2;
    unsigned short Data3;
    unsigned char  Data4[8];
} GUID;
However, no packing is defined. Since the alignment of structure members is dependent on the compiler implementation, one could think this structure could be longer than 16 bytes in size.
If I can assume it is always 16 bytes, my code using GUIDs is more efficient and simpler.
However, it would be completely unsafe if a compiler added some padding between the members for some reason.
My question: do potential reasons for that exist, or is the probability of the scenario that sizeof(GUID) != 16 actually zero?
It's not official documentation, but perhaps this article can ease some of your fears. I think there was another one on a similar topic, but I cannot find it now.
What I want to say is that Windows structures do have a packing specifier, but it's a global setting somewhere inside the header files (a #pragma or something). And it is mandatory, because otherwise programs compiled by different compilers couldn't interact with each other, or even with Windows itself.
It's not zero; it depends on your system. If the alignment is word (4-byte) based, you'll have padding between the shorts, and the size will be more than 16.
If you want to be sure that it's 16, manually disable the padding; otherwise use sizeof, and don't assume the value.
If I feel I need to make an assumption like this, I'll put a 'compile time assertion' in the code. That way, the compiler will let me know if and when I'm wrong.
If you have or are willing to use Boost, there's a BOOST_STATIC_ASSERT macro that does this.
For my own purposes, I've cobbled together my own (that works in C or C++ with MSVC, GCC and an embedded compiler or two) that uses techniques similar to those described in this article:
http://www.pixelbeat.org/programming/gcc/static_assert.html
The real trick to getting the compile-time assertion to work cleanly is dealing with the fact that some compilers don't like declarations mixed with code (MSVC in C mode), and that the techniques often generate warnings that you'd rather not have clogging up an otherwise working build. Coming up with techniques that avoid the warnings is sometimes a challenge.
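On a C++11 (or C11) compiler the language does this natively, so no homegrown macro is needed; a minimal sketch:

#include <rpc.h>  // brings in the GUID definition, as in the test program below

// C++11 syntax; in C11, spell it _Static_assert.
static_assert(sizeof(GUID) == 16, "GUID must be exactly 16 bytes");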
Yes, on any Windows compiler. Otherwise IsEqualGUID would not work: it compares only the first 16 bytes. Similarly, any other WinAPI function that takes a GUID* just checks the first 16 bytes.
Note that you must not assume generic C or C++ rules for windows.h. For instance, a byte is always 8 bits on Windows, even though ISO C allows it to be wider (e.g. 9 bits).
Anytime you write code dependent on the size of someone else's structure, warning bells should go off.
Could you give an example of some of the simplified code you want to use?
Most people would just use sizeof(GUID) if the size of the structure was needed.
With that said -- I can't see the size of GUID ever changing.
#include <stdio.h>
#include <rpc.h>

int main() {
    GUID myGUID;
    /* sizeof yields a size_t, so use %zu rather than %d */
    printf("size of GUID is %zu\n", sizeof(myGUID));
    return 0;
}
Got 16. This is useful to know if you need to manually allocate on the heap.
