Get all commits of a file/path with Rugged - Ruby

I would like to get the list of all the commits for a file/path, but I don't know how to do it.
For example, I want all the commits for the file "test", so that I can get the oid of each commit and, using that oid, retrieve the blob of the file at every revision.
Is this possible?
Thanks!

We can get all the commits that touch a given path like this:
tab = []
walker = Rugged::Walker.new(repo)
walker.sorting(Rugged::SORT_DATE)
walker.push(repo.head.target)
walker.each do |commit|
  # Keep the commit if its diff, restricted to the path, is non-empty
  if commit.diff(paths: ["path_of_file"]).size > 0
    tab.push(commit)
  end
end
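From there, each commit's oid identifies the revision, and the blob of the file at that revision can be looked up through the commit's tree. A minimal sketch using the same repo, tab, and path as above; it assumes a missing path raises Rugged::TreeError, which current Rugged versions do:
tab.each do |commit|
  begin
    entry = commit.tree.path("path_of_file") # tree entry hash: :name, :oid, :filemode, ...
    blob  = repo.lookup(entry[:oid])         # Rugged::Blob holding this revision's content
    puts "#{commit.oid}: #{blob.content.bytesize} bytes"
  rescue Rugged::TreeError
    # the file does not exist in this revision
  end
end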

Related

How to detect a file rename using Rugged?

I'm a novice Rugged user, and I'm attempting to detect file renames in the commit history. I'm diffing each commit against its first parent, as follows:
repo = Rugged::Repository.discover("foo")
walker = Rugged::Walker.new(repo)
walker.sorting(Rugged::SORT_TOPO)
walker.push("master")
walker.each.take(200).each do |commit|
  puts commit.oid
  puts commit.message
  # Handle the root commit, which has no parent to diff against
  diffs = if commit.parents.count > 0
    commit.parents[0].diff(commit)
  else
    commit.diff(nil)
  end
  (files, additions, deletions) = diffs.stat
  puts "Files changed: #{files}, Additions: #{additions}, Deletions: #{deletions}"
  paths = []
  diffs.each_delta do |delta|
    old_file_path = delta.old_file[:path]
    new_file_path = delta.new_file[:path]
    puts delta.status
    puts delta.renamed?
    puts delta.similarity
    paths += [delta]
  end
  puts "Paths:"
  puts paths
  puts "===================================="
end
walker.reset
However, when I do have a rename, the program will output an addition and a removal (A and D status). This matches the output of git log --name-status.
On the other hand, I found out that using git log --name-status --format='%H' --follow -- b.txt correctly shows the rename as R100.
The repo history and the outputs of git can be seen in the following gist: https://gist.github.com/ifigueroap/60716bbf4aa2f205b9c9
My question is how to use the Diff or Delta objects of Rugged to detect such a file rename.
Thanks
Before accessing diffs.stat, you should call diffs.find_similar! with :renames => true. That will modify the diffs object to include rename information. This is not done by default, as the underlying operation is quite complex and not needed in most cases.
Check the documentation for find_similar! here: https://github.com/libgit2/rugged/blob/e96d26174b2bf763e9dd5dd2370e79f5e29077c9/ext/rugged/rugged_diff.c#L310-L366 for more options.
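Applied to the loop above, that is one extra call on the diff before reading its stats (a minimal sketch; find_similar! mutates the diff in place):
diffs = commit.parents[0].diff(commit)
diffs.find_similar!(renames: true) # rewrites matching A/D pairs into R (renamed) deltas
(files, additions, deletions) = diffs.stat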

Why does the file get removed when committing a second time via Rugged?

I want to store text files in a Git repo. I am using Ruby rugged gem 0.19.0 for this. The problem is that adding a second file f2 seems to automatically delete the first one f1. I have isolated the code to reproduce this (basically code straight from rugged gem docs):
require 'rugged'

def commit(file_name, message, content)
  user = { email: 'email', name: 'name', time: Time.now }
  repo = Rugged::Repository.new('repo')
  oid = repo.write(content, :blob)
  index = repo.index
  index.add(path: file_name, oid: oid, mode: 0100644)
  options = {}
  options[:tree] = index.write_tree(repo)
  options[:author] = user
  options[:committer] = user
  options[:message] = message
  options[:parents] = repo.empty? ? [] : [ repo.head.target ].compact
  options[:update_ref] = 'HEAD'
  Rugged::Commit.create(repo, options)
end

Rugged::Repository.init_at('repo', :bare)
commit('f1', 'create f1', 'f1 content')
commit('f2', 'create f2', 'f2 content')
After running the above code and cloning the bare repo it creates, git log --name-status shows that the second commit removes the f1 file.
How do I fix this so that files stored previously in the repo are not affected?
The Rugged README does this:
oid = repo.write("This is a blob.", :blob)
index = repo.index
index.read_tree(repo.head.target.tree) # notice this line
index.add(:path => "README.md", :oid => oid, :mode => 0100644)
The read_tree call is the key: the in-memory index starts out empty, so writing a tree from it yields a tree containing only the newly added file. Reading the previous commit's tree into the index first preserves the files that are already in the repo.
However, repo.head.target is a string (the commit OID) in 0.19.0, so look the commit up first:
oid = repo.write(content, :blob)
index = repo.index
begin
  # Load the current HEAD tree into the index so existing files are kept;
  # on the very first commit there is no HEAD yet, hence the rescue
  commit = repo.lookup(repo.head.target)
  tree = commit.tree
  index.read_tree(tree)
rescue
end
index.add(path: file_name, oid: oid, mode: 0100644)
It works.
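As an aside, on newer Rugged releases (0.21 and later), repo.head.target returns a Rugged::Commit object rather than an OID string, so the README pattern works directly. A one-line sketch, guarded for the empty-repo case:
index.read_tree(repo.head.target.tree) unless repo.empty?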

Access git log data using the Ruby Rugged gem?

For a given file in a git repo, I'd like to look up the SHA of the last commit in which the file was modified, along with the timestamp.
At the command line, this data is visible with git log for a particular file path, e.g.
git log -n 1 path/to/file
Using the "git" gem for ruby I can also do this:
require 'git'
g = Git.open("/path/to/repo")
modified = g.log(1).object("relative/path/to/file").first.date
sha = g.log(1).object("relative/path/to/file").first.sha
This works, but it runs too slowly when I loop over many paths. As Rugged uses C libraries instead, I was hoping it would be faster, but I cannot see how to construct the right query in Rugged's syntax. Any suggestions?
This should work:
repo = Rugged::Repository.new("/path/to/repo")
walker = Rugged::Walker.new(repo)
walker.sorting(Rugged::SORT_DATE)
walker.push(repo.head.target)
# Find the newest non-merge commit whose diff touches the file
commit = walker.find do |commit|
  commit.parents.size == 1 && commit.diff(paths: ["relative/path/to/file"]).size > 0
end
sha = commit.oid
Taken and adapted from https://github.com/libgit2/pygit2/issues/200#issuecomment-15899713
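The question also asks for the timestamp; once the walker has found the commit, Rugged exposes it directly on the commit object:
modified = commit.time # committer timestamp as a Ruby Time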
As an aside: Just because rugged is written in C does not mean that costly operations suddenly become cheap and quick. Obviously, you save a lot of string parsing and stuff like that, but this is not always the bottleneck.
As you're not interested in the actual textual diff here, the libgit2 GIT_DIFF_FORCE_BINARY flag might also help increase the performance of this lookup; unfortunately, it is not yet available in Rugged (but will be soon).
Testing this with the Rugged repo itself, it works correctly:
repo = Rugged::Repository.new(".")
walker = Rugged::Walker.new(repo)
walker.sorting(Rugged::SORT_DATE)
walker.push(repo.head.target)
commit = walker.find do |commit|
  commit.parents.size == 1 && commit.diff(paths: ["Gemfile"]).size > 0
end
sha = commit.oid # => "8f5c763377f5bf0fb88d196b7c45a7d715264ad4"

walker = Rugged::Walker.new(repo)
walker.sorting(Rugged::SORT_DATE)
walker.push(repo.head.target)
commit = walker.find do |commit|
  commit.parents.size == 1 && commit.diff(paths: [".travis.yml"]).size > 0
end
sha = commit.oid # => "4e18e05944daa2ba8d63a2c6b149900e3b93a88f"

Get all open pull requests from an organisation using the Github API Ruby gem

For our organisation's dashboard, I'd like to keep a count of all the open PRs on all our repositories. At the moment, all I've got is a loop over all the repos that counts the open PRs on each one, like so (which often results in a rate limit error):
connection = Github.new oauth_token: MY_OAUTH_TOKEN
pulls = 0
connection.repos.list(:org => GITHUB_ORGANISATION).each do |repo|
  pulls += connection.pull_requests.list(:user => repo['owner']['login'], :repo => repo['name']).count
end
I know there must be a nicer way round this. Any ideas? (short of screen scraping!)
OK, so I think I've cracked this now. Pull requests are issues, so I can get all issues, and loop through the issues like so:
pulls = 0
issues = connection.issues.list(:org => GITHUB_ORGANISATION, :filter => 'all', :auto_pagination => true)
issues.each do |issue|
  if issue["pull_request"]
    pulls += 1
  end
end
Once you remember that pull requests are issues too, everything just falls into place.
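Since pull requests show up as issues carrying a "pull_request" key, the counting loop above collapses to a one-liner (assuming the issues response is enumerable, as it is used above):
pulls = issues.count { |issue| issue["pull_request"] }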

PG::ERROR: another command is already in progress

I have an importer which takes a list of emails and saves them into a Postgres database. Here is a snippet of code from a tableless importer class:
query_temporary_table = "CREATE TEMPORARY TABLE subscriber_imports (email CHARACTER VARYING(255)) ON COMMIT DROP;"
query_copy = "COPY subscriber_imports(email) FROM STDIN WITH CSV;"
query_delete = "DELETE FROM subscriber_imports WHERE email IN (SELECT email FROM subscribers WHERE suppressed_at IS NOT NULL OR list_id = #{list.id}) RETURNING email;"
query_insert = "INSERT INTO subscribers(email, list_id, created_at, updated_at) SELECT email, #{list.id}, NOW(), NOW() FROM subscriber_imports RETURNING id;"

conn = ActiveRecord::Base.connection_pool.checkout
conn.transaction do
  raw = conn.raw_connection
  raw.exec(query_temporary_table)
  raw.exec(query_copy)
  CSV.read(csv.path, headers: true).each do |row|
    raw.put_copy_data row['email'] + "\n" unless row.nil?
  end
  raw.put_copy_end
  while res = raw.get_result do; end # very important to do this after a copy
  result_delete = raw.exec(query_delete)
  result_insert = raw.exec(query_insert)
  ActiveRecord::Base.connection_pool.checkin(conn)
  {
    deleted: result_delete.count,
    inserted: result_insert.count,
    updated: 0
  }
end
The issue I am having is that when I try to upload I get an exception:
PG::ERROR: another command is already in progress: ROLLBACK
This is all done in one action; the only other queries I am making are for user validation, and I have a DB mutex preventing overlapping imports. This worked fine until my latest push, which included updating my pg gem from 0.13.2 to 0.14.1 (along with other "unrelated" code).
The error initially started on our staging server, but I was then able to reproduce it locally and am out of ideas.
If I need to be more clear with my question, let me know.
Thanks
Found my own answer; this might be useful if anyone hits the same issue when importing loads of data using COPY.
An exception was being thrown within the CSV.read() block, and I was catching it, but I was not ending the COPY process correctly.
begin
  CSV.read(csv.path, headers: true).each do |row|
    raw.put_copy_data row['email'] + "\n" unless row.nil?
  end
ensure
  raw.put_copy_end
  while res = raw.get_result do; end # very important to do this after a copy
end
This block ensures that the COPY command is completed even when a row raises. I also added this at the end to release the connection back into the pool without disrupting the flow in the case of a successful import:
rescue
  ActiveRecord::Base.connection_pool.checkin(conn)
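Putting the pieces together, the import might be structured like this (a sketch under the same query_* and csv definitions as above, not the exact production code; checking the connection back in from an ensure block releases it on both success and failure):
conn = ActiveRecord::Base.connection_pool.checkout
begin
  conn.transaction do
    raw = conn.raw_connection
    raw.exec(query_temporary_table)
    raw.exec(query_copy)
    begin
      CSV.read(csv.path, headers: true).each do |row|
        raw.put_copy_data row['email'] + "\n" unless row.nil?
      end
    ensure
      raw.put_copy_end                   # finish the COPY even if a row raised
      while res = raw.get_result do; end # drain results after a COPY
    end
    result_delete = raw.exec(query_delete)
    result_insert = raw.exec(query_insert)
    { deleted: result_delete.count, inserted: result_insert.count, updated: 0 }
  end
ensure
  ActiveRecord::Base.connection_pool.checkin(conn)
end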
