I have been developing a script on my Linux box for quite some time, and wanted to run it on my Mac as well.
I thought that the commands on the Mac were the same as those on Linux, but today I realized I was wrong. I knew that fewer commands existed on the Mac, but I thought the ones that did exist had the same implementation.
This problem is specifically with the date command.
When I run the command on my Linux machine with the parameter that gives the time in nanoseconds, I get the correct result, but on my Mac that option is not supported.
Linux-Machine> date +%N
55555555555 #Current time in nanoseconds
Mac-Machine> date +%N
N
How do I go about getting the current time in nanoseconds as a bash command on the Mac?
Worst case, I can write a small piece of code that calls a system function in C or something, and then call that from my script.
Any help is much appreciated!
This is because OS X and Linux use two different sets of tools. Linux uses the GNU version of the date command (hence "GNU/Linux"), while OS X uses the BSD version. Remember that Linux is Linux and OS X is Unix. They're different.
You can install the GNU date command, which is included in the "coreutils" package from MacPorts. It will be installed on your system as gdate. You can either use that directly, or link date to the new gdate binary; your choice.
man date indicates that its resolution does not go below one second. I would recommend trying another language (Python 2):
$ python -c 'import time; print repr(time.time())'
1332334298.898616
For Python 3, use:
$ python -c 'import time; print(repr(time.time()))'
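On Python 3.7 or later there is also time.time_ns(), which returns the current time as an integer number of nanoseconds since the epoch, so nothing is lost to float rounding the way it is with time.time() multiplied up. A minimal sketch, assuming a python3 of at least 3.7 is available; the helper name current_time_ns is hypothetical:

```python
import time


def current_time_ns() -> int:
    """Current wall-clock time as integer nanoseconds since the Unix epoch.

    Requires Python 3.7+ (time.time_ns). The real resolution still depends
    on the OS clock, so the last digits may not be meaningful.
    """
    return time.time_ns()


if __name__ == "__main__":
    print(current_time_ns())
```

From a shell script the same thing is a one-liner: python3 -c 'import time; print(time.time_ns())'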
There are "Linux specifications", but they do not regulate the behavior of the date command much. What you have is really the opposite: Linux (or, more specifically, the GNU user-space tools) has a large number of extensions that are not compatible with Unix by any reasonable definition.
There are a number of standards that do regulate these things. The one you should be looking at is POSIX, which requires
date [-u] [+format]
and nothing more to be supported by conforming implementations. (There are other standards, like XPG and SUS, which you might want to look at as well, but at the very least you should require and expect POSIX these days... finally.)
The POSIX document contains a number of examples, but nothing for date conversion, even though that is a practical problem for which many scripts turn to date. Also, for your concrete problem, POSIX has nothing for reporting times with sub-second accuracy.
Anyway, griping that *BSD isn't Linux isn't really helpful here; you just have to understand what the differences are and code defensively. If your requirements are complex or unusual, perhaps turn to a scripting language like Perl or Python, which can perform these types of date formatting operations more or less out of the box in a standard installation (though neither Perl nor Python has a quick and elegant way to do date conversion out of the box, either; solutions tend to be somewhat tortured).
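One way to code defensively here is to probe for the feature instead of assuming it. A sketch, assuming Python 3.7+ is available as the fallback: it checks whether the system date expands %N (GNU date prints nine digits, while the BSD date on the Mac prints just "N", as in the question) and otherwise falls back to Python. Both function names are hypothetical.

```python
import shutil
import subprocess
import time


def date_supports_nanoseconds() -> bool:
    """Return True if the system `date` expands %N to nine digits."""
    if shutil.which("date") is None:
        return False
    out = subprocess.run(["date", "+%N"], capture_output=True, text=True).stdout.strip()
    # GNU date: e.g. "123456789"; BSD/macOS date: the literal "N".
    return len(out) == 9 and out.isdigit()


def epoch_nanoseconds() -> int:
    """Nanoseconds since the epoch via `date +%s%N` when supported, else Python."""
    if date_supports_nanoseconds():
        out = subprocess.run(["date", "+%s%N"], capture_output=True, text=True).stdout
        return int(out.strip())
    return time.time_ns()  # fallback: Python 3.7+
```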
In practical terms, you can compare the macOS date man page with the Linux one and try to reconcile your requirements.
For your specific requirement, macOS date does not support any format string with nanosecond accuracy, but you are also unlikely to get useful results at that scale, since executing the command itself takes a significant number of nanoseconds. I would settle for millisecond-level accuracy (and even that will be thrown off in the final digits by the execution time) and multiply to get the number on a nanosecond scale.
nanoseconds () {
    # Print the current time as integer nanoseconds since the epoch (derived from a float).
    python -c 'import time; print(int(time.time()*1000*1000*1000))'
}
(Notice the parentheses around the argument to print() for Python 3.) You will notice that Python reports a value with nanosecond-scale digits (the last digits are often not zeros), but time.time() returns a float, so those trailing digits are limited by floating-point precision; and by the time time.time() has returned, the value will obviously no longer be correct anyway.
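To see how much of that nanosecond-looking value is real, you can ask Python for the gap between adjacent floats at the current epoch time. A sketch assuming Python 3.9+ (for math.ulp); for present-day timestamps (between 2^30 and 2^31 seconds) the gap is 2^-22 seconds, roughly 238 ns, so the last few digits of int(time.time()*1e9) are rounding rather than measurement. The helper name is hypothetical.

```python
import math
import time


def float_time_resolution_ns(t: float) -> float:
    """Gap, in nanoseconds, between t and the next representable float."""
    return math.ulp(t) * 1e9


now = time.time()
print(f"time.time()      = {now!r}")
print(f"float spacing    = {float_time_resolution_ns(now):.1f} ns")
print(f"time.time_ns()   = {time.time_ns()}")  # integer, no float rounding
```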
To get an idea of the error this introduces, here is how long the call itself takes:
bash#macos-high-sierra$ python3
Python 3.5.1 (default, Dec 26 2015, 18:08:53)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> import timeit
>>> def nanoseconds():
...     return int(time.time()*1000*1000*1000)
...
>>> timeit.timeit(nanoseconds, number=10000)
0.0066173350023746025
>>> timeit.timeit('int(time.time()*1000*1000*1000)', number=10000)
0.00557799199668807
Realistically, starting Python and printing the value will probably add a few orders of magnitude more overhead, but I haven't attempted to quantify that. (timeit reports the total time, in seconds, for all 10,000 calls.)
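From the session above, 0.0066 s for 10,000 calls works out to roughly 0.66 µs (about 660 ns) per call. The interpreter startup cost can be measured the same way. A rough sketch, with no numbers claimed since they vary by machine: it compares the per-call cost of time.time_ns() with the cost of launching a fresh interpreter to print a timestamp, which is what the shell function pays every time. It uses sys.executable as a stand-in for whatever python the shell would run; the helper names are hypothetical.

```python
import subprocess
import sys
import time
import timeit


def per_call_ns(n: int = 100_000) -> float:
    """Average cost of one time.time_ns() call, in nanoseconds."""
    total_s = timeit.timeit(time.time_ns, number=n)  # total seconds for n calls
    return total_s / n * 1e9


def interpreter_startup_ns(runs: int = 5) -> float:
    """Average wall time to start Python and print a timestamp, in nanoseconds."""
    cmd = [sys.executable, "-c", "import time; print(time.time_ns())"]
    start = time.perf_counter_ns()
    for _ in range(runs):
        subprocess.run(cmd, capture_output=True, check=True)
    return (time.perf_counter_ns() - start) / runs


if __name__ == "__main__":
    call, startup = per_call_ns(), interpreter_startup_ns()
    print(f"time.time_ns() call:    ~{call:.0f} ns")
    print(f"python startup + print: ~{startup:.0f} ns ({startup / call:.0f}x)")
```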
I've got a tool in production which calls git rev-list on a single, large repo 10-30 times per minute. I'm seeing git response times vary widely, from around 1 second to as much as 50 seconds (before a timeout mechanism abandons the git request).
git rev-list --pretty=raw 2ef9fa0d0fa4c34d57103a0545b3cc96c2552e6f..f5daa48ebcd3cc95a0df683f8c3a3ad64def4a6e
The goal is to see whether one of the two commits is an ancestor of the other and, if so, which of the two is the ancestor. I make this call once; if I get output, I have my answer. If there is no output, I swap the commit positions and run it again. If there is still no output, then neither commit is an ancestor of the other.
Is there another, more efficient way, to find this information? If it comes to it, even suggestions on modeling the commit tree in some structure outside of git are appreciated.
Thanks.
Look at git merge-base first-sha1 second-sha1. If the result is a third SHA-1, neither commit is a descendant of the other. Otherwise, the result is whichever of the two is the ancestor.
However, if you could describe your workflow at a higher level, there's probably an easier way, and you may not have to rely on this. I wrote this article about Branch per Feature: http://dymitruk.com/blog/2012/02/05/branch-per-feature/ It may give you some ideas.
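A sketch of the merge-base check described above, for a script that already shells out to git. It assumes full 40-character SHA-1s (as in the question), because the comparison is a plain string match against the full hash that git merge-base prints; merge_base and classify are hypothetical helper names.

```python
import subprocess
from typing import Optional


def merge_base(repo: str, a: str, b: str) -> Optional[str]:
    """Full SHA-1 printed by `git merge-base a b`, or None if there is none."""
    proc = subprocess.run(
        ["git", "-C", repo, "merge-base", a, b],
        capture_output=True,
        text=True,
    )
    # Non-zero exit: no common ancestor, or an error such as an unknown revision.
    if proc.returncode != 0:
        return None
    return proc.stdout.strip()


def classify(a: str, b: str, base: Optional[str]) -> str:
    """Interpret a merge-base result for commits a and b (full SHA-1s assumed)."""
    if base == a:
        return "a is an ancestor of b"
    if base == b:
        return "b is an ancestor of a"
    # A third commit (or no merge base at all).
    return "neither is an ancestor of the other"
```

Usage would be classify(a, b, merge_base("/path/to/repo", a, b)), a single git call per pair instead of up to two rev-list calls.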
git merge-base (and even git rev-list) will be even faster with:
- commit graphs (mentioned here)
- Git 2.20 (Q4 2018)
That is because both are based on the concept of "commits being reachable".
A recent update had broken the reachability algorithm when refs (e.g. tags) pointing at objects that are not commits were involved; that has been fixed.
See commit 4067a64, commit b67f6b2 (21 Sep 2018), commit 6621c83 (28 Aug 2018), and commit 6cc01743 (20 Jul 2018) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 0f7ac90, 24 Sep 2018)
commit-reach: use can_all_from_reach
The is_descendant_of method previously used in_merge_bases() to check if the commit can reach any of the commits in the provided list.
This had two performance problems:
1. The performance is quadratic in worst-case.
2. A single in_merge_bases() call requires walking beyond the target commit in order to find the full set of boundary commits that may be merge-bases.
The can_all_from_reach method avoids this quadratic behavior and can limit the search beyond the target commits using generation numbers.
It requires a small prototype adjustment to stop using commit-date as a cutoff, as that optimization is no longer appropriate here.
Since in_merge_bases() uses paint_down_to_common(), is_descendant_of() naturally found cutoffs to avoid walking the entire commit graph.
Since we want to always return the correct result, we cannot use the min_commit_date cutoff in can_all_from_reach. We then rely on generation numbers to provide the cutoff.
Since not all repos will have a commit-graph file, nor will we always have generation numbers computed for a commit-graph file, create a new method, generation_numbers_enabled(), that checks for a commit-graph file and sees if the first commit in the file has a non-zero generation number.
In the case that we do not have generation numbers, use the old logic for is_descendant_of().
Performance was measured on a copy of the Linux repository using the 'test-tool reach is_descendant_of' command using this input:
A:v4.9
X:v4.10
X:v4.11
X:v4.12
X:v4.13
X:v4.14
X:v4.15
X:v4.16
X:v4.17
X:v3.0
Note that this input is tailored to demonstrate the quadratic nature of the previous method, as it will compute merge-bases for v4.9 versus all of the later versions before checking against v4.1.
Before: 0.26 s
After: 0.21 s
Since we previously used the is_descendant_of method in the ref_newer method, we also measured performance there using 'test-tool reach ref_newer' with this input:
A:v4.9
B:v3.19
Before: 0.10 s
After: 0.08 s
By adding a new commit with parent v3.19, we test the non-reachable case of ref_newer:
Before: 0.09 s
After: 0.08 s
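The cutoff in the commit message above relies on generation numbers: a root commit has generation 1 and every other commit has 1 plus the largest generation among its parents, so an ancestor always has a strictly smaller generation than its descendants. A toy sketch over an in-memory parent map (hypothetical names, not Git's code, which reads these numbers from the commit-graph file):

```python
from typing import Dict, List


def generation_numbers(parents: Dict[str, List[str]]) -> Dict[str, int]:
    """Roots get 1; every other commit gets 1 + the max generation of its parents."""
    gen: Dict[str, int] = {}
    for start in parents:
        stack = [start]
        while stack:
            commit = stack[-1]
            if commit in gen:
                stack.pop()
                continue
            pending = [p for p in parents[commit] if p not in gen]
            if pending:
                stack.extend(pending)  # compute the parents first
            else:
                gen[commit] = 1 + max((gen[p] for p in parents[commit]), default=0)
                stack.pop()
    return gen


def is_ancestor(a: str, b: str, parents: Dict[str, List[str]], gen: Dict[str, int]) -> bool:
    """True if a is reachable from b (a is an ancestor of b, or a == b)."""
    if gen[a] > gen[b]:
        return False  # an ancestor never has a larger generation: answer at once
    stack, seen = [b], set()
    while stack:
        commit = stack.pop()
        if commit == a:
            return True
        # Cutoff: a commit other than a with gen <= gen(a) cannot have a as an ancestor.
        if commit in seen or gen[commit] <= gen[a]:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return False
```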
And before Git 2.20 (Q4 2018), generating the (experimental) commit-graph file was fairly silent, even though it takes a noticeable amount of time in a meaningfully large repository.
Users will now see progress output.
See commit 6b89a34 (19 Sep 2018), and commit 1f7f557, commit 7b0f229 (17 Sep 2018) by Ævar Arnfjörð Bjarmason (avar).
Helped-by: Martin Ågren martin.agren#gmail.com.
(Merged by Junio C Hamano -- gitster -- in commit 36d767d, 16 Oct 2018)
With Git 2.21 (Q1 2019), that progress will be more accurate:
See commit 01ca387 (19 Nov 2018) by Ævar Arnfjörð Bjarmason (avar).
Helped-by: Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit d4c9027, 14 Jan 2019)
commit-graph: split up close_reachable() progress output
Amend the progress output added in 7b0f229 ("commit-graph write: add progress output", 2018-09-17) so that the total numbers it reports aren't higher than the total number of commits anymore.
See this thread for a bug report pointing that out.
When I added this I wasn't intending to provide an accurate count, but just have some progress output to show the user the command wasn't hanging. But since we are showing numbers, let's make them accurate. The progress descriptions were suggested by Derrick Stolee.
As noted in the original thread, we are unlikely to show anything except the "Expanding reachable..." message even on fairly large repositories such as linux.git.
On a test repository I have with north of 7 million commits all of these are displayed. Two of them don't show up for long, but as noted in future-proofing this fo
And (still in Git 2.21, Q1 2019), the code path that shows the progress meter while writing out the commit-graph file has been improved.
See commit 49bbc57, commit 890226c, commit e59c615, commit 7c7b8a7, commit d9b1b30, commit 2894473, commit 53035c4 (19 Jan 2019) by Ævar Arnfjörð Bjarmason (avar).
See commit 857ba92 (23 Jan 2019), and commit 5af7417 (19 Jan 2019) by SZEDER Gábor (szeder).
(Merged by Junio C Hamano -- gitster -- in commit e5eac57, 05 Feb 2019)
commit-graph write: add intermediate progress
Add progress output to sections of code between "Annotating[...]" and "Computing[...]generation numbers".
This can collectively take 5-10 seconds on a large enough repository.
On a test repository I have, with ~7 million commits and ~50 million objects, we'll now emit:
$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph among packed objects: 100% (124763727/124763727), done.
Loading known commits in commit graph: 100% (18989461/18989461), done.
Expanding reachable commits in commit graph: 100% (18989507/18989461), done.
Clearing commit marks in commit graph: 100% (18989507/18989507), done.
Counting distinct commits in commit graph: 100% (18989507/18989507), done.
Finding extra edges in commit graph: 100% (18989507/18989507), done.
Computing commit graph generation numbers: 100% (7250302/7250302), done.
Writing out commit graph in 4 passes: 100% (29001208/29001208), done.
Whereas on a medium-sized repository such as linux.git, these new progress bars won't have time to kick in, and as before we'll still emit output like:
$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph among packed objects: 100% (6529159/6529159), done.
Expanding reachable commits in commit graph: 815990, done.
Computing commit graph generation numbers: 100% (815983/815983), done.
Writing out commit graph in 4 passes: 100% (3263932/3263932), done.
Git 2.24 (Q4 2019) adds a few fixes to make it faster.
See commit dd2e50a (07 Sep 2019) by Jeff King (peff).
See commit 67fa6aa (07 Sep 2019) by SZEDER Gábor (szeder).
(Merged by Junio C Hamano -- gitster -- in commit cda8faa, 07 Oct 2019)
commit-graph: turn off save_commit_buffer
The commit-graph tool may read a lot of commits, but it only cares about parsing their metadata (parents, trees, etc) and doesn't ever show the messages to the user.
And so it should not need save_commit_buffer, which is meant for holding onto the object data of parsed commits so that we can show them later. In fact, it's quite harmful to do so.
According to massif, the max heap of "git commit-graph write --reachable" in linux.git before/after this patch (removing the commit graph file in between) goes from ~1.1GB to ~270MB.
Which isn't surprising, since the difference is about the sum of the uncompressed sizes of all commits in the repository, and this was equivalent to leaking them.
This obviously helps if you're under memory pressure, but even without it, things go faster.
My before/after times for that command (without massif) went from 12.521s to 11.874s, a speedup of ~5%.
And:
commit-graph: don't show progress percentages while expanding reachable commits
Commit 49bbc57 (commit-graph write: emit a percentage for all progress, 2019-01-19, Git v2.21.0-rc0) was a bit overeager when it added progress percentages to the "Expanding reachable commits in commit graph" phase as well, because most of the time the number of commits that phase has to iterate over is not known in advance and grows significantly, and, consequently, we end up with nonsensical numbers:
$ git commit-graph write --reachable
Expanding reachable commits in commit graph: 138606% (824706/595), done.
[...]
$ git rev-parse v5.0 | git commit-graph write --stdin-commits
Expanding reachable commits in commit graph: 81264400% (812644/1), done.
[...]
Even worse, because the percentage grows so quickly, the progress code outputs much more often than it should (because it ticks every second, or every 1%), slowing the whole process down.
My time for "git commit-graph write --reachable" on linux.git went from 13.463s to 12.521s with this patch, ~7% savings.
Therefore, don't show progress percentages in the "Expanding reachable commits in commit graph" phase.
Git 2.24 (Q4 2019) also adds a way to opt out of that progress output.
See commit 7371612 (26 Aug 2019) by Garima Singh (singhgarima).
(Merged by Junio C Hamano -- gitster -- in commit caf150c, 07 Oct 2019)
commit-graph: add --[no-]progress to write and verify
Add --[no-]progress to git commit-graph write and verify.
The progress feature was introduced in 7b0f229 ("commit-graph write: add progress output", 2018-09-17, Git v2.20.0-rc0) but the ability to opt out was overlooked.
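For a tool like the asker's, which calls git unattended, a sketch of (re)writing the commit-graph file from a script. It assumes Git 2.24 or later for --no-progress; commit_graph_write_cmd and write_commit_graph are hypothetical helper names. Running it periodically matters because, as noted further down, commits that are not yet in the graph file fall back to the slower code paths.

```python
import subprocess
from typing import List


def commit_graph_write_cmd(repo: str, progress: bool = False) -> List[str]:
    """Build the `git commit-graph write` command line for repo.

    --reachable walks all refs; --no-progress (Git 2.24+) keeps the output
    quiet when no one is watching.
    """
    cmd = ["git", "-C", repo, "commit-graph", "write", "--reachable"]
    cmd.append("--progress" if progress else "--no-progress")
    return cmd


def write_commit_graph(repo: str) -> None:
    """Run the command, raising CalledProcessError if git fails."""
    subprocess.run(commit_graph_write_cmd(repo), check=True)
```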
With Git 2.28 (Q3 2020), "git merge-base --is-ancestor" is taught to take advantage of the commit graph.
See commit 80b8ada, commit d91d6fb (17 Jun 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit dc4b3cf, 25 Jun 2020)
commit-reach: use fast logic in repo_in_merge_base
Reported-by: Ævar Arnfjörð Bjarmason
Reported-by: SZEDER Gábor
Signed-off-by: Derrick Stolee
The repo_is_descendant_of() method is aware of the existence of the commit-graph file. It checks for generation_numbers_enabled() before deciding on using can_all_from_reach() or repo_in_merge_bases() depending on the situation. The reason here is that can_all_from_reach() uses a depth-first search that is limited by the minimum generation number of the target commits, and that algorithm can be very slow when generation numbers are not present. The alternative uses paint_down_to_common() which will walk the entire merge-base boundary, which is typically slower.
This method is used by commands like "git tag --contains" and "git branch --contains" for very fast results when a commit-graph file exists.
Unfortunately, it is not used in commands like "git merge-base --is-ancestor" which is doing an even simpler request.
This issue was raised recently with respect to a change to how generation numbers are stored, but was also reported much earlier before commit-reach.c existed to simplify these reachability queries.
The root cause is that builtin/merge-base.c has a method handle_is_ancestor() that calls in_merge_bases(), an older version of repo_in_merge_bases().
It would be better if every caller of in_merge_bases() used the logic in can_all_from_reach() when possible.
This is where things get a little tricky: repo_is_descendant_of() calls repo_in_merge_bases() in the non-generation numbers enabled case! If we simply update repo_in_merge_bases() to call repo_is_descendant_of() instead of repo_in_merge_bases_many(), then we will get a recursive call loop. Thankfully, this is caught by the test suite in the default mode (i.e. GIT_TEST_COMMIT_GRAPH=0).
The trick, then, is to make the non-generation number case for repo_is_descendant_of() call repo_in_merge_bases_many() directly, skipping the non-_many version. This allows us to take advantage of this faster code path, when possible.
The easiest way to measure the performance impact is to test the following command on the Linux kernel repository:
git merge-base --is-ancestor <A> <B>
| A | B | Time Before | Time After |
|------|------|-------------|------------|
| v3.0 | v5.7 | 0.459s | 0.028s |
| v4.0 | v5.7 | 0.267s | 0.021s |
| v5.0 | v5.7 | 0.074s | 0.013s |
Note that each of these samples returns success. The old code performed the same operation when <A> and <B> are swapped.
However, can_all_from_reach() will return immediately if the generation numbers show that <A> has larger generation number than <B>.
Thus, the time for the swapped case is universally 0.004s in each case.
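With Git 2.28 or later, the asker's two-call rev-list approach maps directly onto git merge-base --is-ancestor, whose exit status is 0 when the first commit is an ancestor of (or the same as) the second, 1 when it is not, and some other non-zero value on error. A sketch with hypothetical helper names; the checker is passed in so the decision logic can be tested without a repository.

```python
import subprocess
from typing import Callable


def git_is_ancestor(repo: str, a: str, b: str) -> bool:
    """True if commit a is an ancestor of (or equal to) commit b in repo."""
    proc = subprocess.run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", a, b],
        capture_output=True,
        text=True,
    )
    # Exit status 0: a is an ancestor of b; 1: it is not; anything else: an error.
    if proc.returncode == 0:
        return True
    if proc.returncode == 1:
        return False
    raise RuntimeError(f"git merge-base failed: {proc.stderr.strip()}")


def relationship(a: str, b: str, is_ancestor: Callable[[str, str], bool]) -> str:
    """At most two ancestry checks, mirroring the 'swap and retry' idea."""
    if is_ancestor(a, b):
        return "a is an ancestor of b"
    if is_ancestor(b, a):
        return "b is an ancestor of a"
    return "neither is an ancestor of the other"
```

Against a real repository: relationship(a, b, lambda x, y: git_is_ancestor("/path/to/repo", x, y)).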
With Git 2.28 (Q3 2020), is_descendant_of() is no longer used:
See commit c1ea625 (23 Jun 2020) by Carlo Marcelo Arenas Belón (carenas).
(Merged by Junio C Hamano -- gitster -- in commit 0258ed1, 06 Jul 2020)
commit-reach: avoid is_descendant_of() shim
Helped-by: Derrick Stolee
Signed-off-by: Carlo Marcelo Arenas Belón
Reviewed-by: Derrick Stolee
d91d6fbf26 ("commit-reach: create repo_is_descendant_of()", 2020-06-17, Git v2.28.0-rc0 -- merge listed in batch #5) adds a repository aware version of is_descendant_of() and a backward compatibility shim that is barely used.
Update all callers to directly use the new repo_is_descendant_of() function instead; making the codebase simpler and pushing more the_repository references higher up the stack.
Also, before Git 2.31 (Q1 2021), the code implementing "git merge-base --independent" was poorly done and had been kept unchanged since the very beginning of the feature.
See commit 41f3c99, commit 3677773, commit c8d693e, commit fbc21e3 (19 Feb 2021), and commit 0fac156 (01 Feb 2021) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 48923e8, 25 Feb 2021)
commit-reach: use heuristic in remove_redundant()
Signed-off-by: Derrick Stolee
Reachability algorithms in commit-reach.c frequently benefit from using the first-parent history as a heuristic for satisfying reachability queries.
The most obvious example was implemented in 4fbcca4 ("commit-reach: make can_all_from_reach... linear", 2018-07-20, Git v2.20.0-rc0 -- merge listed in batch #1).
Update the walk in remove_redundant() to use this same heuristic.
Here, we are walking starting at the parents of the input commits.
Sort those parents and walk from the highest generation to lower.
Each time, use the heuristic of searching the first parent history before continuing to expand the walk.
The order in which we explore the commits matters, so update compare_commits_by_gen to break generation number ties with commit date.
This has no effect when the commits are in a commit-graph file with corrected commit dates computed, but it will assist when the commits are in the region "above" the commit-graph with "infinite" generation number.
Note that we cannot shift to use compare_commits_by_gen_then_commit_date as the method prototype is different.
We use compare_commits_by_gen for QSORT() as opposed to as a priority function.
The important piece is to ensure we short-circuit the walk when we find that there is a single non-redundant commit.
This happens frequently when looking for merge-bases or comparing several tags with 'git merge-base --independent'.
Use a new count 'count_still_independent' and if that hits 1 we can stop walking.
To update 'count_still_independent' properly, we add use of the RESULT flag on the input commits.
Then we can detect when we reach one of these commits and decrease the count.
We need to remove the RESULT flag at that moment because we might re-visit that commit when popping the stack.
We use the STALE flag to mark parents that have been added to the new walk_start list, but we need to clear that flag before we start walking so those flags don't halt our depth-first-search walk.
On my copy of the Linux kernel repository, the performance of 'git merge-base --independent' <all-tags> goes from 1.1 seconds to 0.11 seconds.
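To make the remove_redundant() idea concrete, here is a toy sketch of what git merge-base --independent computes, over an in-memory parent map. It walks depth-first from the parents of the inputs, exploring first parents first, drops any input it reaches (that input is reachable from another one), and stops as soon as only one input is still independent, like the count_still_independent short-circuit. It omits the generation-number ordering and cutoffs the real code uses; the function name is hypothetical.

```python
from typing import Dict, List, Set


def independent(parents: Dict[str, List[str]], inputs: List[str]) -> Set[str]:
    """Return the inputs that are not reachable from any other input.

    parents maps each commit to its parent list, first parent first.
    """
    remaining = set(inputs)  # inputs not yet proven redundant
    seen: Set[str] = set()
    # Start at the parents of every input; push in reverse so the
    # first parent is popped (explored) first.
    stack = [p for c in inputs for p in reversed(parents.get(c, []))]
    while stack and len(remaining) > 1:  # short-circuit at one survivor
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        remaining.discard(commit)  # reached from another input: redundant
        stack.extend(reversed(parents.get(commit, [])))
    return remaining
```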
With Git 2.31 (Q1 2021), the common code dealing with the "chunked file format" shared by the multi-pack-index and commit-graph files has been factored out, to help the code paths for both file types become more robust.
See commit c4ff24b (24 Feb 2021) by Taylor Blau (ttaylorr).
See commit a43a2e6, commit 5387fef, commit 329fac3, commit 6ab3b8b, commit 2692c2f, commit 5f0879f, commit 63a8f0e, commit c144241, commit 0ccd713, commit 980f525, commit 7a3ada1, commit 31bda9a, commit b4d9414, commit 577dc49, commit 47410aa, commit 570df42 (18 Feb 2021), and commit eb90719 (05 Feb 2021) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 660dd97, 01 Mar 2021)
commit-graph.c: display correct number of chunks when writing
Reported-by: SZEDER Gábor
Signed-off-by: Taylor Blau
Acked-by: Derrick Stolee
When writing a commit-graph, a progress meter is shown which indicates the number of pieces of data to write (one per commit in each chunk).
In 47410aa ("commit-graph: use chunk-format write API", 2021-02-18, Git v2.32.0 -- merge), the number of chunks became tracked by the new chunk-format API.
But a stray local variable was left behind from when write_commit_graph_file() used to keep track of the same.
Since this was no longer updated after 47410aa, the progress meter appeared broken:
$ git commit-graph write --reachable
Expanding reachable commits in commit graph: 837569, done.
Writing out commit graph in 3 passes: 166% (4187845/2512707), done.
Drop the local variable and rely instead on the chunk-format API to tell us the correct number of chunks.
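For a feel of the "chunked file format", here is a sketch that lists the chunk table of contents of a commit-graph file. The layout is an assumption based on my reading of Git's commit-graph format documentation (an 8-byte header with the chunk count, then 12-byte table-of-contents entries of a 4-byte ID and an 8-byte offset, ending with a zero ID); treat it as illustrative rather than a reference parser. read_chunk_toc is a hypothetical name.

```python
import struct
from typing import List, Tuple


def read_chunk_toc(data: bytes) -> List[Tuple[str, int, int]]:
    """Return (chunk_id, offset, size) for each chunk in a commit-graph file.

    Assumed header: "CGPH", version, hash version, number of chunks,
    number of base graphs (one byte each after the signature).
    """
    if data[:4] != b"CGPH":
        raise ValueError("not a commit-graph file")
    _version, _hash_version, num_chunks, _num_bases = struct.unpack_from(">4B", data, 4)
    # num_chunks entries plus one terminating entry whose offset marks the end.
    entries = [
        struct.unpack_from(">4sQ", data, 8 + 12 * i) for i in range(num_chunks + 1)
    ]
    # Each chunk runs from its own offset up to the next entry's offset.
    return [
        (cid.decode("ascii"), start, end - start)
        for (cid, start), (_, end) in zip(entries, entries[1:])
    ]
```

On a repository with a single (non-split) graph file, reading .git/objects/info/commit-graph in binary mode and passing its bytes in should list chunk IDs such as OIDF, OIDL and CDAT, among others.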
Git 2.38 (Q3 2022) offers another approach (besides the original commit-graph mentioned in my previous answer).
See commit 359b01c (11 Jul 2022) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 40ab711, 19 Jul 2022)
ref-filter: disable save_commit_buffer while traversing
Signed-off-by: Jeff King
Various ref-filter options like "--contains" or "--merged" may cause us to traverse large segments of the history graph.
It's counter-productive to have save_commit_buffer turned on, as that will instruct the commit code to cache in-memory the object contents for each commit we traverse.
This increases the amount of heap memory used while providing little or no benefit, since we're not actually planning to display those commits (which is the usual reason that tools like git-log want to keep them around).
We can easily disable this feature while ref-filter is running.
This lowers peak heap (as measured by massif) for running:
git tag --contains 1da177e4c3
in linux.git from ~100MB to ~20MB.
It also seems to improve runtime by 4-5% (600ms vs 630ms).
A few points to note:
- It should be safe to temporarily disable save_commit_buffer like this. The saved buffers are accessed through get_commit_buffer(), which treats the saved ones like a cache, and loads on-demand from the object database on a cache miss. So any code that was using this would not be wrong, it might just incur an extra object lookup for some objects. But...
- I don't think any ref-filter related code is using the cache. While it's true that an option like "--format=%(*contents:subject)" or "--sort=*authordate" will need to look at the commit contents, ref-filter doesn't use get_commit_buffer() to do so! It always reads the objects directly via read_object_file(), though it does avoid re-reading objects if the format can be satisfied without them.
  Timing "git tag --format=%(*authordate)" shows that we're the same before and after, as expected.
- Note that all of this assumes you don't have a commit-graph file. If you do, then the heap usage is even lower, and the runtime is 10x faster. So in that sense this is not urgent, as there's a much better solution. But since it's such an obvious and easy win for fallback cases (including commits which aren't yet in the graph file), there's no reason not to.
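The cache behavior described in the first point (saved buffers treated as a cache, with an on-demand load on a miss) can be modeled in a few lines. This is a toy model with hypothetical names, not Git's implementation; it only shows why turning caching off keeps results correct while trading memory for an occasional extra object read.

```python
from typing import Callable, Dict


class CommitBufferCache:
    """Toy model of optional buffer caching with on-demand fallback.

    `load` stands in for reading an object from the object database.
    """

    def __init__(self, load: Callable[[str], bytes], save: bool = True) -> None:
        self.load = load
        self.save = save
        self.cache: Dict[str, bytes] = {}
        self.db_reads = 0  # how many times we went to the "object database"

    def _read(self, oid: str) -> bytes:
        self.db_reads += 1
        return self.load(oid)

    def parse(self, oid: str) -> None:
        """Traversal step: read the object for its metadata, optionally keep it."""
        data = self._read(oid)
        if self.save:
            self.cache[oid] = data

    def get_buffer(self, oid: str) -> bytes:
        """Return the contents, loading on demand on a cache miss."""
        if oid in self.cache:
            return self.cache[oid]
        return self._read(oid)  # still correct, just one extra lookup
```

With save=False, a traversal keeps nothing in memory, and a later get_buffer() call still returns the right bytes at the cost of one more read.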
I have been developing a script on my linux box for quite some time, and wanted to run it on my Mac as well.
I thought that the functions on the Mac were the same as the functions on linux, but today I realized it was wrong. I knew that fewer functions existed on the Mac, but I thought that the functions that did exist, had the same implementation.
This problem is specifically in regards to the date command.
When I run the command on my linux machine with the parameter to provide some time in nanoseconds, I get the correct result, but when I run it on my mac, it does not have that option.
Linux-Machine> date +%N
55555555555 #Current time in nanoseconds
Mac-Machine> date +%N
N
How do I go about getting the current time in nanoseconds as a bash command on the Mac?
Worst case is I create a small piece of code that calls a system function in C or something and then call it within my script.
Any help is much appreciated!
This is because OSX and Linux use two different sets of tools. Linux uses the GNU version of the date command (hence, GNU/Linux). Remember that Linux is Linux and OS X is Unix. They're different.
You can install the GNU date command which is included in the "coreutils" package from MacPorts. It will be installed on your system as gdate. You can either use that, or link the date binary with the new gdate binary; your choice.
man date indicates that it doesn't go beyond one second. I would recommend trying another language (Python 2):
$ python -c 'import time; print repr(time.time())'
1332334298.898616
For Python 3, use:
$ python -c 'import time; print(repr(time.time()))'
There are "Linux specifications" but they do not regulate the behavior of the date command much. What you have is really the opposite -- Linux (or more specifically the GNU user-space tools) has a large number of extensions which are not compatible with Unix by any reasonable definition.
There is a large number of standards which do regulate these things. The one you should be looking at is POSIX which requires
date [-u] [+format]
and nothing more to be supported by conforming implementations. (There are other standards like XPG and SUS that you might want to look at as well, but at the very least, you should require and expect POSIX these days ... finally.)
The POSIX document contains a number of examples, but nothing for date conversion, even though that is a practical problem for which many scripts turn to date. Also, for your concrete problem, POSIX provides nothing for reporting times with sub-second accuracy.
Anyway, griping that *BSD isn't Linux isn't really helpful here; you just have to understand what the differences are and code defensively. If your requirements are complex or unusual, consider a scripting language like Perl or Python, which can do these kinds of date formatting operations more or less out of the box in a standard installation (though neither Perl nor Python has a quick and elegant way to do date conversion out of the box, either; solutions tend to be somewhat tortured).
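For instance, Python's standard datetime module covers both a conversion and a sub-second format that BSD date lacks. This is a sketch that assumes the timestamps are UTC; the helper names are hypothetical. The sample values reuse the timestamp printed earlier in this thread.

```python
from datetime import datetime, timezone


def to_epoch(stamp: str) -> int:
    """Convert 'YYYY-MM-DD HH:MM:SS', interpreted as UTC, to epoch seconds."""
    parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def format_with_micros(epoch: float) -> str:
    """Format epoch seconds as UTC including microseconds via %f."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f")


print(to_epoch("2012-03-21 12:51:38"))
print(format_with_micros(1332334298.898616))
```

Note that %f stops at microseconds; there is no nanosecond conversion in strftime.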
In practical terms, you can compare the macOS date man page with the Linux one and try to reconcile your requirements.
For your practical requirement, macOS date does not support any format string with nanosecond accuracy, but you are also unlikely to get useful results at that scale when executing the command itself takes a significant number of nanoseconds. I would settle for millisecond-level accuracy (and even that is going to be thrown off in the final digits by the execution time) and multiply to get a number on the nanosecond scale.
nanoseconds () {
    python -c 'import time; print(int(time.time()*1000*1000*1000))'
}
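(Notice the parentheses around the argument to print(), which Python 3 requires.) Python prints a value with nanosecond digits (the last digits are often not zeros), but do not read too much into them: time.time() returns a floating-point number, which at current epoch values can only resolve about a quarter of a microsecond, so the trailing digits come from the multiplication rather than from the clock. And by the time you have run time.time(), the value will obviously no longer be correct anyway.
To get an idea of the overhead of the call itself: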
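To see how coarse the float really is, math.ulp (Python 3.9+) gives the gap between a double and the next representable one. The sketch below measures that gap for the timestamp printed earlier, then implements the millisecond-then-multiply approach recommended above. The function name is hypothetical.

```python
import math
import time

# time.time() returns a double. For any epoch value between 2**30 and 2**31
# seconds (roughly 2004 to 2038), adjacent doubles are 2**-22 s apart.
sample = 1332334298.898616          # the timestamp printed earlier in this thread
gap = math.ulp(sample)              # math.ulp needs Python 3.9+
print(gap, gap * 1e9)               # 2**-22 s, about 238 ns

# Multiplying by 1e9 therefore produces trailing digits that are rounding
# artifacts. Truncating to milliseconds first keeps only meaningful digits.


def nanoseconds_from_millis() -> int:
    millis = int(time.time() * 1000)   # truncate to whole milliseconds
    return millis * 1_000_000          # 1 ms = 1,000,000 ns; last six digits are 0


print(nanoseconds_from_millis())
```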
bash#macos-high-sierra$ python3
Python 3.5.1 (default, Dec 26 2015, 18:08:53)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> import timeit
>>> def nanoseconds ():
...     return int(time.time()*1000*1000*1000)
...
>>> timeit.timeit(nanoseconds, number=10000)
0.0066173350023746025
>>> timeit.timeit('int(time.time()*1000*1000*1000)', number=10000)
0.00557799199668807
(The output from timeit is the total time in seconds for all 10,000 calls, so each call costs well under a microsecond here.) Starting Python and printing the value is realistically going to add a few orders of magnitude on top of that, but I haven't attempted to quantify it.
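If you want to quantify that startup cost on your own machine, a rough sketch is to time repeated invocations of the same kind of one-liner. It uses sys.executable so the same interpreter is measured, and the figure includes the cost of spawning the process from subprocess itself; results will vary by machine. The helper name is hypothetical.

```python
import subprocess
import sys
import time


def measure_startup(runs: int = 20) -> float:
    """Average wall-clock seconds to spawn Python, read the clock, and print it."""
    cmd = [sys.executable, "-c", "import time; print(time.time_ns())"]
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(cmd, capture_output=True, check=True)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    print(f"{measure_startup() * 1e3:.1f} ms per invocation")
```

Comparing this per-invocation figure with the sub-microsecond cost measured by timeit shows how little the nanosecond digits mean once a shell function has to launch Python.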
I am compiling some benchmarks, and the instructions say that I can try the option gcc-serial instead of just gcc. Can anyone please explain the difference between gcc and gcc-serial?
It appears here, for example on slide 71. It is mentioned in other places too, but none of them says what gcc-serial is.
Thank you.
The slides refer to PARSEC, a benchmark suite from Princeton meant for multithreaded shared-memory programs, a.k.a. parallel programs. In many cases, "serial" is the opposite of "parallel", and that is what it means here:
$ cat config/gcc-serial.bldconf
#!/bin/bash
#
# gcc-serial.bldconf - file containing global information necessary to build
# the serial versions of the PARSEC programs with gcc
#
# Copyright (C) 2006, 2007 Christian Bienia
# Global configuration is identical to multi-threaded version
source ${PARSECDIR}/config/gcc.bldconf
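Because a .bldconf file can pull in another one with source, it can help to trace which files a configuration actually reads. The following is a hypothetical helper sketched under the assumption that includes look like the line above; it only expands ${PARSECDIR} and does not handle the "." builtin or other variables. The demo uses placeholder files, not the real PARSEC configs.

```python
import os
import re
import tempfile
from typing import List

SOURCE_LINE = re.compile(r"^\s*source\s+(\S+)")


def sourced_chain(path: str, parsecdir: str) -> List[str]:
    """Follow `source` lines starting at a .bldconf file; return the files visited."""
    chain: List[str] = []
    pending = [path]
    while pending:
        current = pending.pop(0)
        if current in chain:
            continue  # guard against include cycles
        chain.append(current)
        with open(current) as fh:
            for line in fh:
                match = SOURCE_LINE.match(line)
                if match:
                    # Only ${PARSECDIR} is expanded; other variables are left alone.
                    pending.append(match.group(1).replace("${PARSECDIR}", parsecdir))
    return chain


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "config"))
    with open(os.path.join(root, "config", "gcc.bldconf"), "w") as fh:
        fh.write("#!/bin/bash\n# placeholder for the multithreaded config\n")
    serial = os.path.join(root, "config", "gcc-serial.bldconf")
    with open(serial, "w") as fh:
        fh.write("#!/bin/bash\nsource ${PARSECDIR}/config/gcc.bldconf\n")
    print([os.path.basename(p) for p in sourced_chain(serial, root)])
```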
I've never heard of gcc-serial, and I've used gcc for quite a while. Can you clarify more precisely what your benchmarks are telling you? Maybe you meant "gcc -serial" (with a space after gcc and before -serial)? Even in that case, though, I still don't know, since I can't find any mention of a -serial option in my gcc manual.
One version of gcc I'm using has the -mserialize-volatile and -mno-serialize-volatile options, which enable and disable respectively the generation of code that ensures the sequential consistency of volatile memory accesses.
From the slides, it seems to be a configuration name for the benchmarking tool, not a command you should run. It probably selects a particular way of building with gcc when the tool is used.