What is the difference between Compare with BASE vs. WORKING TREE? - tortoisegit

#1 and #3 in this screenshot.
What is BASE vs. WORKING TREE?

Compare with Working Tree compares the selected file of the selected version against the currently checked-out files (this is called the working tree in Git).
Compare with base compares the selected file against the previous commit (i.e., it shows the changes of the current commit).

Related

Algorithm to recursively search Git repo for a string

I am working on a project to automate the code review process for a team of engineers. Basically, every time an engineer makes a change to a file, before those changes are pushed to GitHub, they need to figure out what other files are impacted by that change and add the engineers in charge of those files to view and approve it. Right now, the person who made the change manually does the following: check which function the change occurred in, use the text search feature of an IDE (such as VS Code) to see where that function is used in the entire repo, go through all those search results and check which functions in other files are calling the original function, and then do a search for those functions. They recursively search for functions until one of a group of designated files called "base files" appears in the search results. Separate engineers are in charge of separate base files, so once a base file appears in the search process, the person who made the change needs to add the engineer in charge of that base file to approve the change, because the functionality of that file is potentially impacted by it. We are trying to find a way to automate these manual steps.
I was wondering if there are any known algorithms that can be used to accomplish something like this. I am thinking of using graphs or trees, but I am not sure which specific graph or tree algorithms I should use.
Hmm, searching for strings alone is not good enough. Instead:
Mark all base files.
Build a call graph, which is a directed graph (and might not be acyclic).
Do a BFS from the changed file and log every base file it reaches.
Doxygen can generate some call graphs, or maybe there already is some Clang/LLVM call graph builder.
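The steps above can be sketched in Python. This is only an illustration under assumptions: the `used_by` mapping (file to the set of files that call into it) is assumed to come from some call graph tool, and all file names are hypothetical.

```python
from collections import deque

def impacted_base_files(used_by, changed, base_files):
    """Return the base files reachable from `changed` via 'is used by' edges."""
    seen = {changed}            # the visited set keeps cycles from looping forever
    queue = deque([changed])
    found = []
    while queue:
        node = queue.popleft()
        if node in base_files:
            found.append(node)
        # Sorted only to make the visiting order deterministic.
        for caller in sorted(used_by.get(node, ())):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return found

# Hypothetical data: util.py is used by parser.py and report.py, and so on.
used_by = {
    "util.py": {"parser.py", "report.py"},
    "parser.py": {"base_ingest.py", "util.py"},   # cycle back to util.py
    "report.py": {"base_export.py"},
}
base_files = {"base_ingest.py", "base_export.py"}
print(impacted_base_files(used_by, "util.py", base_files))
```

The owners of the returned base files would then be the reviewers to add. The same traversal works at function granularity if the graph nodes are functions and each node is mapped back to its file.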

Config spec for UCM components

This is kind of a follow-up to this question: ClearCase UCM: Get latest version from Dev-stream
I need a dynamic view to have the LATEST (or CHECKED_OUT) version of certain components of a UCM VOB and at the same time specific baselines for other components.
For components where I want to include a specific baseline I can just include them with
element component_1/... BASELINE
where BASELINE is just the name of a baseline (without the need to specify a stream or anything).
The folder of the component is later included by an element * /main/LATEST directive (at least to my knowledge).
As mentioned in the question linked above, I can also add a line like
element component_2/... /main/INT-STREAM/DEV-STREAM/LATEST
which should give me the latest version of DEV-STREAM.
Now I found out that sometimes (when the DEV-STREAM was 'branched' from version /main/0) I need a
element component_2/... /main/DEV-STREAM/LATEST
to get the latest version of this component. And in other cases there is no DEV-STREAM (because the file was obviously never changed and therefore the DEV-STREAM branch was never created), so I need a third line
element component_2/... /main/INT-STREAM/LATEST
And the same applies to version CHECKED_OUT.
As I want to create the config spec by script I would either need to find out where the component_2 is located (in the INT-STREAM or the DEV-STREAM) and where the DEV-STREAM was branched off or I would have to include 6 lines (one of them should match) for each component - in comparison to the one line for the baseline. Obviously I wouldn't want to include each file (there are VERY many) but would like to be able to simply specify the component with all its subfolders, just like for the baseline.
Thanks for reading - and obviously for any answers
You can avoid all those multiple rules with:
element component_2/... .../DEV-STREAM/LATEST
element component_2/... .../INT-STREAM/LATEST -mkbranch DEV-STREAM
The order is important, and the '...' allows you to select a branch without knowing of its exact parent branch.
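Since the question asks about creating the config spec by script, here is a minimal Python sketch that emits the rules shown above. The function name and its parameters are hypothetical; it only formats the rule lines from this answer and the baseline rule from the question, and does not cover anything else a complete config spec may need.

```python
def component_rules(latest_components, baselines, dev_stream, int_stream):
    """Build config spec rule lines for the given components.

    latest_components: components that should follow the dev stream.
    baselines: mapping of component name to baseline name.
    """
    rules = []
    for comp in latest_components:
        # Order matters: the dev-stream rule must come before the fallback.
        rules.append(f"element {comp}/... .../{dev_stream}/LATEST")
        rules.append(
            f"element {comp}/... .../{int_stream}/LATEST -mkbranch {dev_stream}"
        )
    for comp, baseline in baselines.items():
        rules.append(f"element {comp}/... {baseline}")
    return rules

for line in component_rules(
    ["component_2"], {"component_1": "BASELINE"}, "DEV-STREAM", "INT-STREAM"
):
    print(line)
```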

ClearCase UCM: Get latest version from Dev-stream

I'm stuck over the config spec for a dynamic view.
I try to get the latest version of a folder of a UCM stream from the Dev-stream into another (Base) dynamic view.
My idea would have been to do a
element PathToFolder/... .../DEV-STREAM-NAME/LATEST
but that won't give me anything in the view.
The config spec that is automatically generated by UCM does not help me as it specifies a specific baseline and creates a branch once you check out a file (which I of course do not want, I only need read-access to the version).
Is it possible to access the latest version from the Dev-stream, or do I have to skip the folder altogether in the config spec and just copy it using the operating system (which would be possible but takes quite long as the folder might be quite large)?
Thanks for any answers and Best Regards
You need to make sure the parent folders are selected as well:
element vob/... .../DEV-STREAM-NAME/LATEST
element vob/path/... .../DEV-STREAM-NAME/LATEST
element vob/path/to/... .../DEV-STREAM-NAME/LATEST
element vob/path/to/folder/... .../DEV-STREAM-NAME/LATEST
element * /main/LATEST
If one of the parent folders has no version in DEV-STREAM-NAME, but has a version in the parent stream (like INT-STREAM-NAME), you would need to select that as well:
element vob/path/to/... .../INT-STREAM-NAME/LATEST
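Writing one rule per parent folder by hand gets tedious for deep paths. As a sketch only, the following Python helper (a hypothetical name, not part of any ClearCase tooling) generates the same list of rules from a folder path and a stream name.

```python
def parent_folder_rules(path, stream):
    """Return one 'element' rule per path prefix, then the /main/LATEST fallback."""
    parts = [p for p in path.split("/") if p]
    rules = []
    for depth in range(1, len(parts) + 1):
        prefix = "/".join(parts[:depth])
        rules.append(f"element {prefix}/... .../{stream}/LATEST")
    rules.append("element * /main/LATEST")
    return rules

for line in parent_folder_rules("vob/path/to/folder", "DEV-STREAM-NAME"):
    print(line)
```

Extra rules for a parent stream, as described above, would still have to be added for any folder that has no version on the dev stream.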

How do I get a file manifest for each revision in a git repository?

I have a git repository that was created on Microsoft Windows. Microsoft Windows has a case insensitive file system. The people checking into this repository have not been careful about the case of their filenames. This means that the same directory or file sometimes shows up under two different names.
I mean to fix this problem. But in order to really fix it, I have to get a handle on it.
Is there a quick and simple way to get a list of the files at each revision?
I need this in order to figure out which revisions (if any) have the same file under two different names so I can decide on a strategy for fixing such cases. This means I need to get this information en masse as quickly as possible so the analysis consumes a reasonable amount of time.
One way to get this is with ls-tree:
git ls-tree -r --name-only <commit>
(Note that this looks at the portion of the tree corresponding to your current directory, so you should either run it from the top level of your repo, or give the --full-tree option.)
This is essentially instantaneous, since all Git has to do is recursively examine the tree; it doesn't even have to look at the contents of files.
I'm not sure how you're going to use a list of filenames to detect the same file under two different names. If you just mean that you want to look for filenames that would be the same on a case-insensitive filesystem, then the list of filenames is all you needed.
However, if you think the files might actually have the same content, you could drop the --name-only, so that you'll also see the SHA1s of all the files, and can find identical files by looking for duplicate hashes.
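For the case-insensitive check, a small Python sketch could group the listed paths by their case-folded form. It assumes the path list was already captured from the ls-tree command above, one path per line; the function name and the sample paths are made up for illustration. Every directory prefix is checked as well, so a directory that appears under two spellings is reported even when the files inside differ.

```python
from collections import defaultdict

def case_collisions(paths):
    """Map each case-folded path or directory prefix to its distinct spellings."""
    spellings = defaultdict(set)
    for path in paths:
        parts = path.split("/")
        for depth in range(1, len(parts) + 1):
            prefix = "/".join(parts[:depth])
            spellings[prefix.casefold()].add(prefix)
    # Keep only entries that occur with more than one spelling.
    return {key: sorted(names) for key, names in spellings.items() if len(names) > 1}

# Hypothetical listing for one commit.
listing = ["Src/a.txt", "src/b.txt", "README", "readme"]
print(case_collisions(listing))
```

Running this once per commit (for example over the commits printed by git rev-list) would show which revisions contain colliding names.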
You could run something like this:
git log --name-only --pretty="format:%H"
This command will show the SHA-1 and the list of changed files for every revision.

What is a sensible data-structure for allowing efficient synchronisation between two root paths?

I am working on an application that involves maintaining consistency between two local directories. Specifically, the directories should be identical, with the exception that all files in one of the directories are modified in some particular way (this part is not important to my question).
While running, my application runs two processes that listen for changes occurring under each of the paths, and performs relevant operations to bring them back in sync when necessary.
In terms of my specific question: I'm looking for advice on the trickier situation of when one starts the application. At this point, each process needs to check all files/folders under the path that it is looking after, to see if anything has changed in any way whilst the application was not running. (Let us assume that the application cannot be notified by the OS of anything that happened while it was shut down, and thus will need to directly check every file/folder.)
Each process will have access to (and maintain) a persistent data-structure of all files/folders under its designated path. I was thinking that the following should be held within the data-structure for each of the files and folders:
File/folder name;
File hash (CRC32);
File/folder last modification date; and
File/folder size.
These pieces of information will obviously help to check for any changes to files/folders, but what is the best way to store them?
It seems to me that one sensible way to approach the situation of an application start is for each process to recursively scan through all files/folders under its designated path, and compare the metadata for each file scanned to the metadata stored in its data-structure. Then the processes should also iterate through the data-structures to look for things that have been removed from the paths. Some cases that may be encountered during this process are:
file modified (file name found in data-structure, but hash differs);
file added (no identical filename or hash found in data-structure);
file renamed (file with same hash exists in data-structure, but not with same filename);
folder added (no folder name in data-structure);
folder removed (folder name in data-structure, but not under path);
folder renamed (tricky one).
So, what's the best data-structure to use for this task? In my head I'm thinking of some form of sorted associative array, e.g., a red-black tree, which stores file and folder objects. Each file object contains name, hash and mod-date attributes, while each folder object contains name and children attributes, where children stores another associative array with everything underneath. Given the path to an arbitrary file, e.g., /foo/bar/file.txt, you begin at the root (foo), check for bar and so on until you get to file.txt's parent object.
Another alternative I can think of is to merely store everything flatly, such that there is one red-black tree where each key is the full path to each file/folder, and the value is the file/folder object. This would probably be quicker for retrieval, but it won't be possible to detect renamed files/folders without iterating through all values anyway, which sounds expensive. In the first approach, it may be the case that identifying a rename would only involve checking a portion of the data-structure rather than all of it.
Sorry the above ideas aren't terribly well thought out. What's the state of the art in this area, and are there any well-trodden approaches to these types of problems?
You're modelling a filesystem, so it's quite natural to use a hierarchical data structure. After all, you don't need to compare the file at dir1\dir2\foo.txt to dir3\bar.txt, right? You didn't mention file moves between directories as something you're tracking.
So, the data structure could be:
interface IFSEntry {
    string name
    datetime creationDate
    pure virtual bool Compare(IFSEntry other)
    pure virtual void UpdateFrom(IFSEntry other)
    pure virtual bool WasRenamed(Dictionary<string,IFSEntry> possibleOriginals, out string oldName)
    ...
}
class File : IFSEntry {
    ...
}
class Directory : IFSEntry {
    private Dictionary<string,IFSEntry> children;
    ...
}
The Directory implementations of UpdateFrom and Compare would recurse down their children.
File renames would be relatively easy to detect by comparing CRCs. You'd miss files that both changed and were renamed, since their CRCs no longer match. You could add a CRC dictionary to the Directory class if the time to run the comparisons proves to be a performance problem.
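To illustrate the CRC-based rename check for a single directory, here is a rough Python sketch. It assumes each snapshot is a plain dict of file name to CRC32 (computed elsewhere, for example with zlib.crc32); the function name and result layout are hypothetical and not taken from the interface above.

```python
def diff_directory(stored, scanned):
    """Classify changes between the stored snapshot and a fresh scan.

    Both arguments map file name -> CRC32 for one directory.
    """
    changes = {"modified": [], "renamed": [], "added": [], "removed": []}
    # Names that vanished; some of them may turn out to be renames.
    gone = {name: crc for name, crc in stored.items() if name not in scanned}
    by_crc = {}
    for name, crc in gone.items():
        by_crc.setdefault(crc, []).append(name)
    for name, crc in sorted(scanned.items()):
        if name in stored:
            if stored[name] != crc:
                changes["modified"].append(name)
        elif by_crc.get(crc):
            # A vanished file with the same CRC: treat it as a rename.
            old = by_crc[crc].pop(0)
            changes["renamed"].append((old, name))
            del gone[old]
        else:
            changes["added"].append(name)
    changes["removed"] = sorted(gone)
    return changes

stored = {"a.txt": 1, "b.txt": 2, "c.txt": 3}
scanned = {"a.txt": 1, "b.txt": 9, "d.txt": 3, "e.txt": 5}
print(diff_directory(stored, scanned))
```

The `by_crc` lookup plays the role of the CRC dictionary mentioned above: it avoids comparing every new file against every vanished one. A file that was both renamed and edited shows up as one removal plus one addition, which is the miss the answer describes.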
For directory moves, if the child files also changed, then you've got a fuzzy logic situation. It would be best to have a merge tool that the user would operate for that situation.
If a file changes in both places, you also need a user-facing merge strategy if conflicting changes occur. I'd argue that is always a good idea, just to let the user eyeball that the document didn't lose coherence.
