TortoiseGit: Why do some of my branches not have revision numbers?

We just started using the revision number feature in TortoiseGit. Earlier today I noticed that all commits had revision numbers on them, up to number 310. However, after committing on a new branch later, I noticed that the latest commit has the rev number 284, and none of the earlier commits have numbers unless they are on the same flow line of the graph.
In short, why do many of these commits have no revision number associated with them? Is there a branch view that will number all of these together?

Git has no (incremental) revision numbers. You can only try to emulate them, e.g. by counting all preceding commits; this, however, will not provide unique revision numbers.
The branch revision number is calculated by calling git rev-list --count --first-parent [SHA1] and represents the number of commits from the beginning of history to the selected commit, following first parents only. This number is NOT guaranteed to be unique, especially if you alter the history (e.g., using rebase) or use several branches at the same time. It can be considered roughly unique per branch as long as you don't alter its history (e.g. by rebasing or resetting) and only commit on it or merge other branches into it. The number is only displayed for first-parent commits and not for commits brought in by non-fast-forward merges (where duplicate numbers could occur). See https://gcc.gnu.org/ml/gcc/2015-08/msg00148.html and https://gitlab.com/tortoisegit/tortoisegit/merge_requests/1 for more details.
https://tortoisegit.org/docs/tortoisegit/tgit-dug-settings.html#tgit-dug-settings-dialogs
To reduce confusion, TortoiseGit does not show them for branches other than the current one, nor for non-fast-forward merges.
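For illustration only (this is not TortoiseGit's code), here is a small Java sketch that reproduces the number shown in the log dialog by shelling out to the same command; it assumes git is on the PATH and that it is run inside a working copy:
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class FirstParentCount {
    public static void main(String[] args) throws Exception {
        // Same command TortoiseGit uses: count commits reachable via first parents only
        Process p = new ProcessBuilder("git", "rev-list", "--count", "--first-parent", "HEAD")
                .redirectErrorStream(true)
                .start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            System.out.println("First-parent commit count for HEAD: " + r.readLine());
        }
        p.waitFor();
    }
}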

I just discovered the answer: I needed to adjust the branch view. Right-click on the link below and click Browse to select a new perspective.

Related

Merge two lists based on deltas

I think it's similar to offline data synchronization, but it doesn't have to be nearly as extreme as that.
So I'm looking for a way to merge two, likely similar, sets of data that are aware of their version (i.e. where they diverged) and of the set of CRUD actions that got them to their most recent versions. The difference is that the child set is probably off by one or many actions within a single version, while the authority set has multiple versions with roughly one delta per version.
Say you have two lists, List A and List B. You have 3 versions: version ab, version abcde, and version abk.
In version ab it is 1, 2, 3.
List A is version abcde.
In version abc it appended item 4.
In version abcd it moved item 1 to last place.
In version abcde it deleted item 3.
It looks like 2, 4, 1 in the latest version.
List B is version abk.
In version abk it appended item 4k.
It looks like 1, 2, 3, 4k in the latest version.
The goal is to synchronize List B with the authority, A, by sending what it did in version abk and getting back the deltas it needs to go from version abk to abcdef, where abcdef is List A merged with List B.
Given the information above, how might the logic for merging two lists using deltas based on their versions look? Is there additional information needed to do a merge between deltas efficiently? Or what would be a good direction on this? I'm hoping to synchronize the two to a new version by having one side send its deltas to the other and get back the deltas that bring the old list up to date.
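As a starting point, here is a minimal Java sketch of the kind of delta representation described above. The Delta/Op types and the replay logic are hypothetical, just enough to reproduce the ab -> abcde and ab -> abk steps and to show that a naive merge amounts to replaying the other side's deltas on top of the common ancestor version ab:
import java.util.ArrayList;
import java.util.List;

class Delta {
    enum Op { APPEND, MOVE_TO_END, DELETE }
    final Op op;
    final String item;           // the item this delta refers to
    Delta(Op op, String item) { this.op = op; this.item = item; }

    // Apply this delta to a list, mutating it in place.
    void apply(List<String> list) {
        switch (op) {
            case APPEND:      list.add(item); break;
            case MOVE_TO_END: list.remove(item); list.add(item); break;
            case DELETE:      list.remove(item); break;
        }
    }
}

public class DeltaMergeSketch {
    public static void main(String[] args) {
        // Common ancestor version "ab": 1, 2, 3
        List<String> base = new ArrayList<>(List.of("1", "2", "3"));

        // Deltas that take "ab" to "abcde" on the authority (List A)
        List<Delta> aDeltas = List.of(
                new Delta(Delta.Op.APPEND, "4"),        // abc
                new Delta(Delta.Op.MOVE_TO_END, "1"),   // abcd
                new Delta(Delta.Op.DELETE, "3"));       // abcde

        // Delta that takes "ab" to "abk" on the client (List B)
        List<Delta> bDeltas = List.of(new Delta(Delta.Op.APPEND, "4k"));

        // Naive merge: replay A's deltas, then B's, on top of the shared ancestor.
        // B would receive aDeltas (plus any conflict adjustments) to catch up.
        List<String> merged = new ArrayList<>(base);
        aDeltas.forEach(d -> d.apply(merged));
        bDeltas.forEach(d -> d.apply(merged));
        System.out.println(merged);   // [2, 4, 1, 4k]
    }
}
A real implementation would also need to detect conflicting deltas (e.g. both sides moving or deleting the same item) and pick a resolution policy; that is where most of the complexity of this kind of synchronization lives.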

git - Calculate the mean number of characters per commit message per author

I'd like to calculate the mean number of characters per commit message per author in a git repo. I'm not very good with bash, unfortunately, so I really have no idea how to start!
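One possible sketch, assuming Java rather than bash, counting only the subject line of each commit, and assuming git is available on the PATH: read git log --format=%an%x09%s (author name, a tab, then the subject) and average the lengths per author.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

public class MeanCommitMessageLength {
    public static void main(String[] args) throws Exception {
        // One line per commit: "<author>\t<subject>"
        Process p = new ProcessBuilder("git", "log", "--format=%an%x09%s").start();
        Map<String, long[]> stats = new HashMap<>();   // author -> {total chars, commit count}
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) continue;
                String author = line.substring(0, tab);
                int length = line.length() - tab - 1;
                long[] s = stats.computeIfAbsent(author, k -> new long[2]);
                s[0] += length;
                s[1]++;
            }
        }
        p.waitFor();
        stats.forEach((author, s) ->
                System.out.printf("%s: %.1f chars/commit over %d commits%n",
                        author, (double) s[0] / s[1], s[1]));
    }
}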

Why is branch prediction quite accurate?

Why is branch prediction accurate? Can we generally think of it at a high level in terms of certain branches of our code executing 99% of the time, while the rest is special cases and exception handling?
My question may be a little vague, but I am only interested in a high-level view of this. Let me give you an example.
Say you have a function with a parameter:
void execute(Input param) {
    assertNotEmpty(param);
    // ...
}
I execute my function conditionally, given that the parameter isn't empty. 99% of the time this parameter will indeed be non-empty. Can I then think of neural-network-based branch prediction, for example, like this: since it has seen such an instruction flow countless times (such assertions are quite common), it will simply learn that most of the time the parameter is non-empty and take the branch accordingly?
Can we then think of our code in these terms: the cleaner and more predictable it is, or the more common its paths are, the easier we make things for the branch predictor?
Thanks!
A short history of how branches are predicted:
When Great-Granny was programming
there was no prediction and no prefetch; soon she started prefetching the next instruction while executing the current instruction. Most of the time this was correct and improved the clocks per instruction in most cases by one, and otherwise nothing was lost. This already had an average misprediction rate of only 34% (9%-59%, H&P AQA p. 81).
When Granny was programming
There was the problem that CPUs were getting faster, and a Decode stage was added to the pipeline, making it Fetch -> Decode -> Execute -> Write back. With 5 instructions between branches, 2 fetches were lost every 5 instructions whenever the prefetched path turned out to be wrong. A quick investigation showed that most conditional backward branches were loops and were mostly taken, while most forward branches were not taken, as they mostly guarded bad cases. Predicting statically on that basis, with profiling, gets the misprediction rate down to 3%-24%.
The advent of the dynamic branch predictor with the saturation counter
made life easier for the programmer. From the observation that most branches do what they did last time, a table of counters indexed by the low bits of a branch's address records whether that branch was taken or not, and the Branch Target Buffer provides the address to be fetched. This local predictor lowers the misprediction rate to 1%-18%.
This is all good and fine, but some branches depend on how other, previous branches acted. So if we keep a history of the last H branches, taken or not taken, as 1s and 0s, we have 2^H different predictors depending on the history. In practice the history bits are XORed with the lower address bits of the branch, using the same array as in the previous version.
The pro of this is that the predictor can quickly learn patterns; the con is that, if there is no pattern, the branch will overwrite the bits of previous branches. The pro outweighs the con, as locality is more important than branches that are not in the current (inner) loop. This global predictor improves the misprediction rate down to 1%-11%.
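A minimal Java sketch of that XOR ("gshare"-style) indexing into a table of 2-bit saturating counters; the 12-bit history/index width is an arbitrary choice for the sketch, not taken from any particular CPU:
public class GshareSketch {
    private static final int BITS = 12;                 // index width (arbitrary for this sketch)
    private static final int SIZE = 1 << BITS;
    private final byte[] counters = new byte[SIZE];     // 2-bit saturating counters, values 0..3
    private int history = 0;                            // global taken/not-taken history

    // Predict: counter values 2 and 3 mean "taken".
    public boolean predict(int branchAddress) {
        return counters[index(branchAddress)] >= 2;
    }

    // Update after the real outcome is known: saturate the counter, shift the history.
    public void update(int branchAddress, boolean taken) {
        int i = index(branchAddress);
        if (taken)  { if (counters[i] < 3) counters[i]++; }
        else        { if (counters[i] > 0) counters[i]--; }
        history = ((history << 1) | (taken ? 1 : 0)) & (SIZE - 1);
    }

    // The key idea from the text: XOR the global history with the low address bits of the branch.
    private int index(int branchAddress) {
        return (branchAddress ^ history) & (SIZE - 1);
    }
}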
That is great, but in some cases the local predictor beats the global predictor, so we want both. XORing the local branch history with the address improves on the local branch prediction, making it a 2-level predictor as well, just with local instead of global branch history. Adding a third saturating counter for each branch that counts which of the two was right lets us select between them. This tournament predictor improves the misprediction rate by around 1 percentage point compared with the global predictor.
Now to your case: one branch in 100 goes in the other direction.
Let's examine the local two-level predictor. When we get to the odd case, the last H executions of this branch have all gone in the same direction, let's say taken, making the history all 1s, so the branch predictor will have settled on a single entry in the local predictor table and it will be saturated to taken. This means it will mis-predict the odd case every time, and the next call, where the branch is taken again, will most likely be predicted correctly (barring aliasing of the branch table entry). So the local branch predictor can't catch this; holding a 100-bit-long history would require a 2^100-entry predictor.
Maybe the global predictor catches the case then: in the last 99 cases the branch was taken, so the predictor entries for those 99 histories will have been updated according to the varying behaviour of the last H branches, moving them towards predicting taken. So if the last H branches behave independently of the current branch, then all the relevant entries in the global branch prediction table will predict taken, and you will get a mis-predict.
But if some combination of previous branches, say the 3rd, 7th and 12th, acts so that the right combination of taken/not-taken foreshadows the opposite behaviour, then the branch prediction entry for that combination will correctly predict the behaviour of the branch. The problem here is that if that entry is only seldom updated over the runtime of the program, and other branches alias to it with their own behaviour, it might fail to predict anyway.
Let's assume the global branch history actually predicts the right outcome based on the pattern of previous branches. Then you will most likely be misled by the tournament predictor, which says the local predictor is "always" right, and the local predictor will always mis-predict your case.
Note 1: The "always" should be taken with a grain of salt, as other branches might pollute your branch table entries by aliasing to the same entry. The designers have tried to make this less likely by having 8K different entries and creatively rearranging the bits of the lower address of the branch.
Note 2: Other schemes might be able to solve this, but it's unlikely, as it's 1 in 100.
There are a couple of reasons that allow us to develop good branch predictors:
Bimodal distribution - the outcome of branches is often bimodally distributed, i.e. an individual branch is often highly biased towards taken or untaken. If the distribution of most branches were uniform, it would be impossible to devise a good prediction algorithm.
Dependency between branches - in real-world programs, there is a significant amount of dependency between distinct branches, that is the outcome of one branch affects the outcome of another branch. For example:
if (var1 == 3) // b1
var1 = 0;
if (var2 == 3) // b2
var2 = 0;
if (var1 != var2) // b3
...
The outcome of branch b3 here depends on the outcomes of branches b1 and b2. If both b1 and b2 are untaken (that is, their conditions evaluate to true and var1 and var2 are both assigned 0), then branch b3 will be taken (its condition var1 != var2 evaluates to false). A predictor that looks at a single branch in isolation has no way to capture this behavior. Algorithms that examine this inter-branch behavior are called two-level predictors.
You didn't ask for any particular algorithms, so I won't describe any of them, but I'll mention the 2-bit prediction buffer scheme, which works reasonably well and is quite simple to implement (essentially, one keeps track of the outcomes of a particular branch in a cache and makes a decision based on the current state in the cache). This scheme was implemented in the MIPS R10000 processor and the results showed a prediction accuracy of ~90%.
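For concreteness, a minimal Java sketch of such a 2-bit saturating counter for a single branch (a toy model, not the R10000 implementation). Because the counter only changes its prediction after two wrong outcomes in a row, a branch that is taken 99 times out of 100, like the assertion in the question, is mispredicted only on roughly those 1-in-100 occurrences:
public class TwoBitCounter {
    // States: 0 = strongly not taken, 1 = weakly not taken, 2 = weakly taken, 3 = strongly taken
    private int state = 2;

    public boolean predictTaken() {
        return state >= 2;
    }

    public void update(boolean taken) {
        if (taken)  { if (state < 3) state++; }
        else        { if (state > 0) state--; }
    }

    public static void main(String[] args) {
        TwoBitCounter predictor = new TwoBitCounter();
        int mispredictions = 0;
        // A branch that is taken 99 times out of every 100, like the assertion in the question.
        for (int i = 0; i < 10_000; i++) {
            boolean taken = (i % 100) != 0;
            if (predictor.predictTaken() != taken) mispredictions++;
            predictor.update(taken);
        }
        System.out.println("Mispredictions: " + mispredictions + " / 10000");   // roughly 1%
    }
}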
I'm not sure about the application of NNs to branch prediction - it does seem possible to design an algorithm based on NNs. However, I believe it wouldn't have much practical use, since: a) it would be too complex to implement in hardware (it would take too many gates and introduce a lot of delay); b) it wouldn't significantly improve the predictor's performance compared to traditional algorithms that are much easier to implement.
Many languages provide mechanisms to tell the compiler which branch outcome is the most expected, which helps it organise the code to maximise correct branch predictions. Examples are gcc's __builtin_expect and the likely/unlikely macros.

Efficient synchronization algorithm

Let's say I have a large sorted dataset (10+ MB, 650k+ rows) on node_a and a different dataset on node_b. There is no master version of the dataset, meaning that either node can have some pieces which are not available to the other node. My goal is to have the content of node_a synchronized with the content of node_b. What is the most efficient way to do so?
Common sense solution would be:
node_a: Here's everything I have... (sends entire dataset)
node_b: Here's what you don't have... (sends missing parts)
But this solution is not efficient at all. It requires node_a to send the entire dataset (10+ MB) every time it attempts to synchronize.
So this time, using a little brainpower, I could introduce partitioning of the dataset, sending only a part of the entire content and expecting the differences to be found between the first and last row of that part.
Can you think of any better solutions?
For a single synchronization:
Break the dataset up into arbitrary parts, hash each (with MD5, for example), and only send through the hash values instead of the whole data set. Then use a comparison of the hash values on the other side to determine what's not the same on each side, and send this through as appropriate.
If each part doesn't have a globally unique ID (i.e. a primary key that's guaranteed to be the same for the corresponding row on each side), you may need some metadata sent across as well, or send hashes of parts incrementally, determining the difference as you go and changing what you send if required (e.g. send the hash of 10 rows at a time; if you find a missing row, there will be a mismatch of the rows - either cater for this on the receiver side, or offset the sender by one row). How exactly this should be done will depend on what your data looks like.
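A minimal Java sketch of the hashing step, assuming the dataset is a list of rows and fixed-size chunks of consecutive rows are hashed; only the digests need to be exchanged, and chunk indices whose hashes differ identify the rows to send:
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

public class ChunkHashes {
    // Hash every 'chunkSize' consecutive rows; only these digests need to be exchanged.
    public static List<String> chunkHashes(List<String> rows, int chunkSize) throws Exception {
        List<String> hashes = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += chunkSize) {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (int i = start; i < Math.min(start + chunkSize, rows.size()); i++) {
                md5.update(rows.get(i).getBytes(StandardCharsets.UTF_8));
                md5.update((byte) '\n');                      // row separator
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest()) hex.append(String.format("%02x", b));
            hashes.add(hex.toString());
        }
        return hashes;
    }

    public static void main(String[] args) throws Exception {
        List<String> nodeA = List.of("row1", "row2", "row3", "row4");
        List<String> nodeB = List.of("row1", "row2", "rowX", "row4");
        List<String> hashesA = chunkHashes(nodeA, 2);
        List<String> hashesB = chunkHashes(nodeB, 2);
        for (int i = 0; i < hashesA.size(); i++) {
            if (!hashesA.get(i).equals(hashesB.get(i)))
                System.out.println("Chunk " + i + " differs; send those rows");
        }
    }
}
Note the caveat above: if rows can be inserted or deleted rather than just modified, fixed-position chunks will mismatch from the first difference onward, so you would need stable keys per chunk or the incremental, resynchronizing approach described in the answer.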
For repeated synchronization:
A good idea might be to create a master version, and store this separately on one of the nodes, although this probably isn't necessary if you don't care about conflicts or being able to revert mistakes.
With or without a master version, you can use versioning here. Store the version at the last synchronization, and store a version on each part. When synchronizing, just send the parts with a version higher than the last synchronization version.
As an alternative to a globally auto-incremented version, you could either use a timestamp as the version, or just have a modified flag on each part, setting it when modified, sending all parts with their flag set, and resetting the flags once synchronized.
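A minimal Java sketch of that version-based selection, assuming each part carries a version number and the node remembers the version at which it last synchronized (the names here are made up for illustration):
import java.util.List;
import java.util.stream.Collectors;

public class VersionSync {
    record Part(String id, long version, String payload) {}

    // Send only the parts modified since the last synchronization.
    static List<Part> partsToSend(List<Part> parts, long lastSyncVersion) {
        return parts.stream()
                .filter(p -> p.version() > lastSyncVersion)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Part> parts = List.of(
                new Part("a", 3, "..."),
                new Part("b", 7, "..."),
                new Part("c", 5, "..."));
        long lastSyncVersion = 4;
        System.out.println(partsToSend(parts, lastSyncVersion));   // parts b and c
    }
}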

optimal algorithm for adding chosen table rows to the database

I am trying to write a save method in a backing Java bean which will take the table rows that are selected and save them in the database. However, let's say the user changes his choices a little (changes 1 out of his 5 choices). I am wondering whether the algorithm I am going to apply matters for efficiency in the long term or not.
Here it goes:
1. Every time the user clicks the Save button, delete all of his previous choices and insert all the current choices into the database.
2. Once the button is clicked, see which rows the user de-selected, delete those rows from the database, and add the newly selected ones.
Is choice number 2 better than choice number 1, or does it not really matter for a number of choices that will not exceed 15?
Thanks
I would definitely go for option 2: try to figure out the minimum number of operations you need to perform.
It is, however, fairly normal to fall back to option 1 under deadline pressure etc., since it is a bit easier to implement.
It shouldn't, however, be that much harder to figure out what the changes are, since it doesn't seem to me that you're changing the rows themselves: either you delete the ones that had their checkmark cleared, or you insert the ones that had their checkmark set.
Simply store a list of the primary key values of whatever is in the database, then compare against that list as you iterate through the new list when the user wants to persist the changes.
A minimal-work solution here would also mean you would be a bit more future-proof in terms of refactoring, changes, or additions. For instance, what if in the future there is data attached to any of those rows? You would need to keep that as well. Generally I'm a bit opposed to writing code just for the sake of "what if", but here it feels more like "why wouldn't you..." than that.
So my advice is to go for option 2; it's not much more work.
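A minimal Java sketch of the diffing step for option 2, assuming the selections are identified by primary key values as described above; the actual DELETE/INSERT statements are left as prints:
import java.util.HashSet;
import java.util.Set;

public class SelectionDiff {
    public static void main(String[] args) {
        // Primary keys currently stored in the database and the user's new selection.
        Set<Long> stored = new HashSet<>(Set.of(1L, 2L, 3L, 4L, 5L));
        Set<Long> selected = new HashSet<>(Set.of(1L, 2L, 3L, 4L, 6L));

        // Rows to delete: stored but no longer selected.
        Set<Long> toDelete = new HashSet<>(stored);
        toDelete.removeAll(selected);

        // Rows to insert: selected but not yet stored.
        Set<Long> toInsert = new HashSet<>(selected);
        toInsert.removeAll(stored);

        System.out.println("DELETE where id in " + toDelete);   // [5]
        System.out.println("INSERT ids " + toInsert);           // [6]
    }
}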
