Get triple dot diff with gitpython

I am getting the diff between two commits using gitpython in the following way:
from git import Repo

def get_inbetween_commit_diff(repo_path, commit_a, commit_b):
    repo = Repo(repo_path)
    uni_diff_text = repo.git.diff(
        "{}".format(commit_a), "{}".format(commit_b), ignore_blank_lines=True, ignore_space_at_eol=True
    )
    return uni_diff_text
However, by default repo.git.diff shows the double-dot diff. Is there a way to get a triple-dot diff using gitpython?
Reference on double dot and triple dot diff: https://matthew-brett.github.io/pydagogue/git_diff_dots.html

repo.git.diff calls git directly, so I think you can just do this:
repo.git.diff(
    "{}...{}".format(commit_a, commit_b), ignore_blank_lines=True, ignore_space_at_eol=True
)
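For reference, git documents git diff A...B as equivalent to diffing the merge base of A and B against B. A minimal sketch of doing that step explicitly with GitPython, assuming repo is a git.Repo and relying on Repo.merge_base (which returns a list of Commit objects); the helper name three_dot_diff is hypothetical:

```python
def three_dot_diff(repo, commit_a, commit_b, **diff_options):
    """Return the diff of B against merge-base(A, B), i.e. what `git diff A...B` shows.

    Hypothetical helper; `repo` is assumed to be a GitPython git.Repo.
    """
    # Repo.merge_base runs `git merge-base` and returns a list of Commit objects;
    # the list is empty when the two histories share no ancestor.
    bases = repo.merge_base(commit_a, commit_b)
    if not bases:
        raise ValueError(f"{commit_a} and {commit_b} have no common ancestor")
    # If there are several merge bases, this sketch simply takes the first one.
    return repo.git.diff(bases[0].hexsha, commit_b, **diff_options)
```

Usage would mirror the question: three_dot_diff(Repo(repo_path), commit_a, commit_b, ignore_blank_lines=True, ignore_space_at_eol=True). Passing "{}...{}".format(commit_a, commit_b) is simpler; the explicit version mainly helps when you also need the merge-base commit itself.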

Related

Jenkins Groovy: how to extract specific number from parameter in method?

I am quite new to Groovy and am trying to find the best way to write a method that extracts the third number from a branch name passed as a parameter.
A standard gitBranchName looks like this: release-1.2.3
Below is my method and I am wondering what to do with newestTagNumber:
#!/usr/bin/env groovy
def call(gitRepoName, gitBranchName) {
    withCredentials([
            string(credentialsId: 'jenkinsuser', variable: 'JT_USER'),
            string(credentialsId: 'jenkinssecret', variable: 'JT_SECRET')]) {
        def commitHash = sh(
                script: 'git rev-parse --short HEAD',
                /*script: 'git rev-parse --symbolic-full-name @{-1} && git rev-parse --abbrev-ref @{-1}' */
                returnStdout: true).trim()
        def repoUrl = "bitbucket.org/cos/${gitRepoName}"
        /* newestTagNumber - this is 3rd number taken from gitBranchName */
        def newestTagNumber = <what_to_add_here?>
        /* nextTagNumber - incremented by 1 */
        def nextTagNumber = newestTagNumber + 1
        sh """
            git config user.email "example@example.com"
            git config user.name "jenkins"
            git tag -a release-${nextTagNumber}.t -m 'Create tag release-${nextTagNumber}.t'
        """
        sh('git push --tags https://${JT_USER}:${JT_SECRET}@' + repoUrl)
    }
}
This is how it would probably work with a regex, but is there a prettier way to do it in Groovy?
[\d]*(\d$)
Thank you guys!
You can simply split the String on "." and take the last part.
def parts = gitBranchName.split("\\.")
// split() returns Strings; convert before adding 1, otherwise "3" + 1 gives "31"
def newestTagNumber = parts[parts.size()-1].toInteger()
If you are sure you will always get the branch name in this format, with three dot-separated numbers (release-1.2.3), here is a one-liner:
def newestTagNumber = gitBranchName.split("\\.")[2].toInteger()
@ycr's solution is right, but I found an even better option if you always want to change the last number (which is my case):
def newestTagNumber = gitBranchName.split("\\.").last().toInteger()
Thanks!
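The same parsing logic, sketched in Python rather than Groovy for illustration (the function names are hypothetical). It also shows why the regex from the question is fragile: [\d]*(\d$) captures only the final digit, so release-1.2.13 would yield 3, whereas (\d+)$ captures the whole trailing number. Either way, the extracted text has to be converted to an integer before adding 1.

```python
import re

def next_tag_number(branch_name):
    """Last dot-separated number of e.g. 'release-1.2.3', incremented by one."""
    # Same idea as Groovy's split("\\.").last(): take the final component,
    # then convert it to an int before doing arithmetic.
    return int(branch_name.split(".")[-1]) + 1

def trailing_number(branch_name):
    """Regex alternative: capture all trailing digits, not just the last one."""
    match = re.search(r"(\d+)$", branch_name)
    if match is None:
        raise ValueError(f"no trailing number in {branch_name!r}")
    return int(match.group(1))
```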

Snakemake, how to change output filename when using wildcards

I think I have a simple problem, but I don't know how to solve it.
My input folder contains files like this:
AAAAA_S1_R1_001.fastq
AAAAA_S1_R2_001.fastq
BBBBB_S2_R1_001.fastq
BBBBB_S2_R2_001.fastq
My snakemake code:
import os
import glob

samples = [os.path.basename(x) for x in sorted(glob.glob("input/*.fastq"))]
name = []
for x in samples:
    if "_R1_" in x:
        name.append(x.split("_R1_")[0])
NAME = name

rule all:
    input:
        expand("output/{sp}_mapped.bam", sp=NAME),

rule bwa:
    input:
        R1 = "input/{sample}_R1_001.fastq",
        R2 = "input/{sample}_R2_001.fastq"
    output:
        mapped = "output/{sample}_mapped.bam"
    params:
        ref = "refs/AF086833.fa"
    run:
        shell("bwa mem {params.ref} {input.R1} {input.R2} | samtools sort > {output.mapped}")
The output file names are:
AAAAA_S1_mapped.bam
BBBBB_S2_mapped.bam
I want the output file to be:
AAAAA_mapped.bam
BBBBB_mapped.bam
How can I either change the output name, or rename the files before or after the bwa rule?
Try this:
import pathlib

indir = pathlib.Path("input")
paths = indir.glob("*_S?_R?_001.fastq")
samples = set([x.stem.split("_")[0] for x in paths])

rule all:
    input:
        expand("output/{sample}_mapped.bam", sample=samples)

def find_fastqs(wildcards):
    fastqs = [str(x) for x in indir.glob(f"{wildcards.sample}_*.fastq")]
    return sorted(fastqs)

rule bwa:
    input:
        fastqs = find_fastqs
    output:
        mapped = "output/{sample}_mapped.bam"
    params:
        ref = "refs/AF086833.fa"
    shell:
        "bwa mem {params.ref} {input.fastqs} | samtools sort > {output.mapped}"
This uses an input function to find the correct FASTQ files for rule bwa. There might be a more elegant solution, but I can't see it right now; I think this should work, though.
(Edited to reflect OP's edit.)
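If the sample prefix should drop the _S<n> part, the naming scheme can also be parsed with a regular expression in plain Python at the top of the Snakefile. A minimal sketch, assuming every file follows the <sample>_S<n>_R<1|2>_001.fastq pattern from the question and sample names contain no underscores; FASTQ_RE and pair_fastqs are hypothetical names:

```python
import re
from collections import defaultdict

# Assumed naming scheme: <sample>_S<n>_R<1|2>_001.fastq, e.g. AAAAA_S1_R1_001.fastq
FASTQ_RE = re.compile(r"^(?P<sample>[^_]+)_S\d+_R(?P<read>[12])_001\.fastq$")

def pair_fastqs(filenames):
    """Group FASTQ file names by sample prefix, dropping the _S<n> part."""
    pairs = defaultdict(dict)
    for name in filenames:
        match = FASTQ_RE.match(name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        pairs[match.group("sample")]["R" + match.group("read")] = name
    return dict(pairs)
```

In a Snakefile, the keys could be passed to expand() in rule all, and an input function could return the R1/R2 names for wildcards.sample, so that outputs are named AAAAA_mapped.bam rather than AAAAA_S1_mapped.bam.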
Unfortunately, I've also had this problem, with filenames that follow this pattern: {batch}/{seq_run}_{index}_{flowcell}_{lane}_{read_orientation}.fastq.gz.
I think the core problem is that none of the individual wildcards is unique. Also, not every value of one wildcard can be combined with every value of the others; seq_run1 was run on lane1, not lane2. Therefore, expand() does not work.
After multiple attempts in Snakemake (see below), my solution was to standardize input with mv / sed / rename. Removing {batch}, {flowcell} and {lane} made it possible to use {sample}, a unique combination of {seq_run} and {index}.
What did not work (but might be worth trying for others in the same situation):
Adding the zip argument to expand()
Renaming output using the following syntax:
output: "_".join(re.split("[/_]", "{full_filename}")[1,2]+".fastq.gz"
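The standardization step described above could be planned in Python before moving any files. A minimal sketch, assuming the {batch}/{seq_run}_{index}_{flowcell}_{lane}_{read_orientation}.fastq.gz layout, no underscores inside individual fields, and that each seq_run/index pair sits on a single lane; standardized_name and rename_plan are hypothetical helpers. The plan is refused if dropping batch, flowcell and lane would make two files collide.

```python
from pathlib import PurePosixPath

SUFFIX = ".fastq.gz"

def standardized_name(path):
    """'{batch}/{seq_run}_{index}_{flowcell}_{lane}_{read}.fastq.gz'
    -> '{seq_run}_{index}_{read}.fastq.gz' (sample = seq_run + index)."""
    name = PurePosixPath(path).name  # drops the {batch}/ directory
    if not name.endswith(SUFFIX):
        raise ValueError(f"unexpected file name: {name}")
    fields = name[: -len(SUFFIX)].split("_")
    if len(fields) != 5:
        raise ValueError(f"expected 5 underscore-separated fields in {name}")
    seq_run, index, _flowcell, _lane, read_orientation = fields
    return f"{seq_run}_{index}_{read_orientation}{SUFFIX}"

def rename_plan(paths):
    """Map each original path to its standardized name, refusing collisions."""
    plan = {}
    seen = {}  # new name -> original path that produced it
    for path in paths:
        new_name = standardized_name(path)
        if new_name in seen:
            raise ValueError(f"{path} and {seen[new_name]} would both become {new_name}")
        seen[new_name] = path
        plan[path] = new_name
    return plan
```

The plan can be reviewed and then applied with os.rename or shutil.move, after which Snakemake only needs a single {sample} wildcard.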

Rugged merge commit from origin does not update working tree

Similar to this question, but instead of creating a new file, I'm trying to merge from origin. After creating a new index with Rugged::Repository's merge_commits and a new merge commit, git reports the new file (coming from origin) as deleted.
Create a merge index,
> origin_target = repo.references['refs/remotes/origin/master'].target
> merge_index = repo.merge_commits(repo.head.target, origin_target)
and a new merge commit,
> options = {
update_ref: 'refs/heads/master',
committer: {name: 'user', email: 'user@foo.com', time: Time.now},
author: {name: 'user', email: 'user@foo.com', time: Time.now},
parents: [repo.head.target, origin_target],
message: "merge `origin/master` into `master`"}
and make sure to use the tree from the merge index.
> options[:tree] = merge_index.write_tree(repo)
Create the commit
> merge_commit = Rugged::Commit.create(repo, options)
Check that our HEAD has been updated:
> repo.head.target.tree
=> #<Rugged::Tree:16816500 {oid: 16c147f358a095bdca52a462376d7b5730e1978e}>
<"first_file.txt" 9d096847743f97ba44edf00a910f24bac13f36e2>
<"second_file.txt" 8178c76d627cade75005b40711b92f4177bc6cfc>
<"newfile.txt" e69de29bb2d1d6434b8b29ae775ad8c2e48c5391>
Looks good. I see the new file in the index. Write it to disk.
> repo.index.write
=> nil
...but git reports the new file as deleted:
$ git st
## master...origin/master [ahead 2]
D newfile.txt
How can I properly update my index and working tree?
There is an important distinction between the Git repository and the working directory. While most common command-line git commands operate on the working directory as well as the repository, the lower-level operations of libgit2 / Rugged mostly operate only on the repository. This includes writing the index, as in your example.
To update the working directory to match the index, the following command should work (after writing the index):
options = { strategy: :force }
repo.checkout_head(options)
Docs for checkout_head: http://www.rubydoc.info/github/libgit2/rugged/Rugged/Repository#checkout_head-instance_method
Note: I tested with update_ref: 'HEAD' for the commit. I'm not sure if update_ref: 'refs/heads/master' will have the same effect.

Checking if an object is in a repo in gitpython

I'm working on a program that will be adding and updating files in a git repo. Since I can't be sure if a file that I am working with is currently in the repo, I need to check its existence - an action that seems to be harder than I thought it would be.
The 'in' comparison doesn't seem to work on non-root levels of trees in gitpython. For example:
>>> repo = Repo(path)
>>> hct = repo.head.commit.tree
>>> 'A' in hct['documents']
False
>>> hct['documents']['A']
<git.Tree "8c74cba527a814a3700a96d8b168715684013857">
So I'm left to wonder, how do people check that a given file is in a git tree before trying to work on it? Trying to access an object for a file that is not in the tree will throw a KeyError, so I can do try-catches. But that feels like a poor use of exception handling for a routine existence check.
Have I missed something really obvious? How does one check for the existence of a file in a commit tree using gitpython (or really any library/method in Python)?
Self Answer
OK, I dug around in the Tree class to see what __contains__ does. It turns out that when searching in subfolders, one has to check for the existence of a file using its full path relative to the repo root. So a working version of the check I did above is:
>>> 'documents/A' in hct['documents']
True
EricP's answer has a bug. Here's a fixed version:
import os

def fileInRepo(repo, filePath):
    '''
    repo is a gitPython Repo object
    filePath is the full path to the file from the repository root
    returns true if file is found in the repo at the specified path, false otherwise
    '''
    pathdir = os.path.dirname(filePath)
    # Build up reference to desired repo path
    rsub = repo.head.commit.tree
    # A file at the repository root has no directory elements to walk
    path_elements = pathdir.split(os.path.sep) if pathdir else []
    for path_element in path_elements:
        # If dir on file path is not in repo, neither is file.
        try:
            rsub = rsub[path_element]
        except KeyError:
            return False
    return filePath in rsub
Usage:
file_found = fileInRepo(repo, 'documents/A')
This is very similar to EricP's code, but handles the case where the folder containing the file is not in the repo. EricP's function raises a KeyError in that case. This function returns False.
(I offered to edit EricP's code but was rejected.)
Expanding on Bill's solution, here is a function that determines whether a file is in a repo:
import os

def fileInRepo(repo, path_to_file):
    '''
    repo is a gitPython Repo object
    path_to_file is the full path to the file from the repository root
    returns true if file is found in the repo at the specified path, false otherwise
    '''
    pathdir = os.path.dirname(path_to_file)
    # Build up reference to desired repo path
    rsub = repo.head.commit.tree
    for path_element in pathdir.split(os.path.sep):
        rsub = rsub[path_element]
    return(path_to_file in rsub)
Example usage:
file_found = fileInRepo(repo, 'documents/A')
If you want to avoid try/except, you can check whether the object is in the repo with:
import os

def fileInRepo(repo, path_to_file):
    dir_path = os.path.dirname(path_to_file)
    rsub = repo.head.commit.tree
    # Files at the repository root have no directory elements to check
    path_elements = dir_path.split(os.path.sep) if dir_path else []
    for el_id, element in enumerate(path_elements):
        sub_path = os.path.join(*path_elements[:el_id + 1])
        if sub_path in rsub:
            rsub = rsub[element]
        else:
            return False
    return path_to_file in rsub
Alternatively, you can iterate through all items in the tree, but that will certainly be slower:
def isFileInRepo(repo, path_to_file):
    rsub = repo.head.commit.tree
    for element in rsub.traverse():
        if element.path == path_to_file:
            return True
    return False
There is already a Tree method that does what fileInRepo re-implements in Lucidity's answer.
The method is Tree.join:
https://gitpython.readthedocs.io/en/3.1.29/reference.html#git.objects.tree.Tree.join
A less redundant implementation of fileInRepo is:
def fileInRepo(repo, filePath):
    try:
        repo.head.commit.tree.join(filePath)
        return True
    except KeyError:
        return False
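Since the question is about adding and updating files, note that a file may already be staged without being in any commit. A minimal sketch, assuming repo is a git.Repo: it uses Tree.join as in the answer above, plus IndexFile.entries, which GitPython keys by (path, stage) tuples. The helper names file_in_commit and file_in_index are hypothetical, and unlike the answers above, file_in_commit returns False for directories.

```python
def file_in_commit(repo, path, rev="HEAD"):
    """True if `path` (relative to the repo root, '/'-separated) is a file in `rev`."""
    try:
        # Tree.join walks nested directories and raises KeyError if any part is missing.
        obj = repo.commit(rev).tree.join(path)
    except KeyError:
        return False
    # join() also returns Tree objects for directories; only count blobs as files.
    return obj.type == "blob"

def file_in_index(repo, path):
    """True if `path` is staged (stage 0) in the index, even if not committed yet."""
    # IndexFile.entries is a dict keyed by (path, stage) tuples.
    return (path, 0) in repo.index.entries
```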

Grit commit_diff shows reverse diff

I'm trying to do a very simple thing: Read a diff from a git repo via the ruby gem Grit. I'm creating a file and adding the line "This is me changing the first file". Now I do this to get the diff:
r = Grit::Repo.new("myrepo")
c = r.commits.first
d = r.commit_diff(c.id).first
puts d.first.diff
The output of this is:
--- a/First-File.asciidoc
+++ b/First-File.asciidoc
@@ -1,2 +1 @@
-This is me changing the first file
See that minus in front of the added line? Why would commit_diff show the diff in reverse? I know that git reverses the diff if I swap the commit SHAs, but this is a Grit library call that takes only a single commit.
Any clues?
Let me answer my own question. The diff shows up in the correct direction if you do this instead:
r = Grit::Repo.new("myrepo")
c = r.commits.first
d = c.diffs.first
puts d.first.diff
Not sure what the difference would be between Commit.diff and Repo.commit_diff.
