Why does the file get removed when committing a second time via rugged? - ruby

I want to store text files in a Git repo. I am using the Ruby rugged gem 0.19.0 for this. The problem is that adding a second file f2 seems to automatically delete the first one, f1. I have isolated the code to reproduce this (it is essentially the code straight from the rugged gem docs):
require 'rugged'

def commit(file_name, message, content)
  user = { email: 'email', name: 'name', time: Time.now }
  repo = Rugged::Repository.new('repo')
  oid = repo.write(content, :blob)
  index = repo.index
  index.add(path: file_name, oid: oid, mode: 0100644)
  options = {}
  options[:tree] = index.write_tree(repo)
  options[:author] = user
  options[:committer] = user
  options[:message] = message
  options[:parents] = repo.empty? ? [] : [repo.head.target].compact
  options[:update_ref] = 'HEAD'
  Rugged::Commit.create(repo, options)
end

Rugged::Repository.init_at('repo', :bare)
commit('f1', 'create f1', 'f1 content')
commit('f2', 'create f2', 'f2 content')
After running the above code and cloning the bare repo it creates, git log --name-status shows that the second commit removes the f1 file.
How do I fix this so that it doesn't remove files previously stored in the repo?

The Rugged README reads the existing HEAD tree into the index before adding the new blob:
oid = repo.write("This is a blob.", :blob)
index = repo.index
index.read_tree(repo.head.target.tree) # notice this line
index.add(:path => "README.md", :oid => oid, :mode => 0100644)
but repo.head.target is a string (the commit SHA) in 0.19.0, so it has no tree method. Looking the commit up first works:
oid = repo.write(content, :blob)
index = repo.index
begin
  # In 0.19.0 repo.head.target is a SHA string, so look the commit up first
  commit = repo.lookup(repo.head.target)
  tree = commit.tree
  # Load the existing tree into the index so previous files are preserved
  index.read_tree(tree)
rescue
  # No HEAD yet (first commit in the repo): start from an empty index
end
index.add(path: file_name, oid: oid, mode: 0100644)
It works
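For reference, here is the complete helper with that fix folded in — a sketch assuming rugged 0.19.0, built only from the calls shown above:
require 'rugged'

def commit(file_name, message, content)
  user = { email: 'email', name: 'name', time: Time.now }
  repo = Rugged::Repository.new('repo')
  oid = repo.write(content, :blob)
  index = repo.index
  begin
    # Preserve files from the current HEAD commit in the new tree
    index.read_tree(repo.lookup(repo.head.target).tree)
  rescue
    # First commit: no HEAD to read a tree from yet
  end
  index.add(path: file_name, oid: oid, mode: 0100644)
  options = {
    tree: index.write_tree(repo),
    author: user,
    committer: user,
    message: message,
    parents: repo.empty? ? [] : [repo.head.target].compact,
    update_ref: 'HEAD'
  }
  Rugged::Commit.create(repo, options)
end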

Related

How to detect a file rename using Rugged?

I'm a novice Rugged user, and I'm attempting to detect file renames in the commit history. I'm diffing each commit against its first parent, as follows:
repo = Rugged::Repository.discover("foo")
walker = Rugged::Walker.new(repo)
walker.sorting(Rugged::SORT_TOPO)
walker.push("master")
walker.each.take(200).each do |commit|
  puts commit.oid
  puts commit.message
  diffs = nil
  # Handle root commit
  if commit.parents.count > 0
    diffs = commit.parents[0].diff(commit)
  else
    diffs = commit.diff(nil)
  end
  (files, additions, deletions) = diffs.stat
  puts "Files changed: #{files}, Additions: #{additions}, Deletions: #{deletions}"
  paths = []
  diffs.each_delta do |delta|
    old_file_path = delta.old_file[:path]
    new_file_path = delta.new_file[:path]
    puts delta.status
    puts delta.renamed?
    puts delta.similarity
    paths += [delta]
  end
  puts "Paths:"
  puts paths
  puts "===================================="
end
walker.reset
However, when I do have a rename, the program will output an addition and a removal (A and D status). This matches the output of git log --name-status.
On the other hand, I found out that using git log --name-status --format='%H' --follow -- b.txt correctly shows the rename as R100.
The repo history and the outputs of git can be seen in the following gist: https://gist.github.com/ifigueroap/60716bbf4aa2f205b9c9
My question is how to use the Diff or Delta objects of Rugged to detect such a file rename.
Thanks
Before accessing diffs.stat, you should call diffs.find_similar! with :renames => true. That will modify the diffs object to include rename information. This is not done by default, as the underlying operation is quite complex and not needed in most cases.
Check the documentation for find_similar! here: https://github.com/libgit2/rugged/blob/e96d26174b2bf763e9dd5dd2370e79f5e29077c9/ext/rugged/rugged_diff.c#L310-L366 for more options.
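Applied to the code in the question, it is essentially a one-line addition before reading the deltas — a minimal sketch, reusing the variables from the loop above:
diffs = commit.parents[0].diff(commit)
# Run libgit2's rename detection: a renamed file then appears as one
# delta with status :renamed instead of an add/delete (A/D) pair.
diffs.find_similar!(renames: true)
diffs.each_delta do |delta|
  puts "#{delta.status}: #{delta.old_file[:path]} -> #{delta.new_file[:path]}"
  puts "similarity: #{delta.similarity}" if delta.renamed?
end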

Correct way of calling mongodb cloneCollection from ruby

I am trying to call the mongodb cloneCollection command from ruby. Based on the first question at https://github.com/mongodb/mongo-ruby-driver/wiki/FAQ I made this test script:
require "mongo"
include Mongo
db = MongoClient.new("localhost", 27017).db("test")
coll = "users2"
cmd = {}
cmd['cloneCollection'] = coll
cmd['from'] = "test.example.com:27017"
db.command(cmd)
However, instead of cloning the users2 collection, it creates an empty database with the same name. I can't figure out what I'm doing wrong. Any ideas? Thanks!
The following Ruby program is a self-contained working example that should answer your question, complete with startup of source and destination mongod servers.
It appears that the arg to cloneCollection must be a fully qualified name, e.g., "test.users" for the "users" collection in the "test" database.
require "mongo"
# startup source and destination mongod servers
source = { 'port' => 27018, 'dbpath' => 'data27018' }
dest = { 'port' => 27019, 'dbpath' => 'data27019' }
[ source, dest ].each do |server|
dbpath = server['dbpath']
system("rm -fr #{dbpath} && mkdir #{dbpath} && mongod --port #{server['port']} --dbpath #{dbpath} &")
end
sleep 10 # delay for mongod startup
db_name = 'test'
coll_name = 'users'
db_coll_name = "#{db_name}.#{coll_name}"
# create source collection
db = Mongo::MongoClient.new("localhost", source['port']).db(db_name)
coll = db[coll_name]
coll.insert({'name' => 'Gary'})
# cloneCollection from source to dest
db = Mongo::MongoClient.new("localhost", dest['port']).db(db_name)
cmd = BSON::OrderedHash.new
cmd['cloneCollection'] = db_coll_name
cmd['from'] = "localhost:#{source['port']}"
db.command(cmd)
# verify cloneCollection
p db[coll_name].find.to_a
# kill mongod servers
pids = `pgrep mongod`.split(/\n/).sort.slice(-2,2)
system("kill #{pids.join(' ')}")
Please let me know if you have any further questions.
After being surprised that Mongo::DB does not have a cloneCollection method, and reading @Gary Murakami's excellent answer above, I wrote this little monkey patch that I thought may be useful to others:
class Mongo::DB
  def cloneCollection(remote_host, collection_name)
    cmd = BSON::OrderedHash.new
    cmd['cloneCollection'] = name + "." + collection_name
    cmd['from'] = remote_host
    self.command(cmd)
  end
end
Just drop it anywhere in your source file to get a Mongo::DB#cloneCollection implementation.
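Usage then looks like this — a short sketch, reusing the host and collection names from the question:
db = Mongo::MongoClient.new("localhost", 27017).db("test")
# The patch prepends the current database name, so this clones test.users2
db.cloneCollection("test.example.com:27017", "users2")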

PG::ERROR: another command is already in progress

I have an importer which takes a list of emails and saves them into a postgres database. Here is a snippet of code within a tableless importer class:
query_temporary_table = "CREATE TEMPORARY TABLE subscriber_imports (email CHARACTER VARYING(255)) ON COMMIT DROP;"
query_copy = "COPY subscriber_imports(email) FROM STDIN WITH CSV;"
query_delete = "DELETE FROM subscriber_imports WHERE email IN (SELECT email FROM subscribers WHERE suppressed_at IS NOT NULL OR list_id = #{list.id}) RETURNING email;"
query_insert = "INSERT INTO subscribers(email, list_id, created_at, updated_at) SELECT email, #{list.id}, NOW(), NOW() FROM subscriber_imports RETURNING id;"

conn = ActiveRecord::Base.connection_pool.checkout
conn.transaction do
  raw = conn.raw_connection
  raw.exec(query_temporary_table)
  raw.exec(query_copy)
  CSV.read(csv.path, headers: true).each do |row|
    raw.put_copy_data row['email'] + "\n" unless row.nil?
  end
  raw.put_copy_end
  while res = raw.get_result do; end # very important to do this after a copy
  result_delete = raw.exec(query_delete)
  result_insert = raw.exec(query_insert)
  ActiveRecord::Base.connection_pool.checkin(conn)
  {
    deleted: result_delete.count,
    inserted: result_insert.count,
    updated: 0
  }
end
The issue I am having is that when I try to upload I get an exception:
PG::ERROR: another command is already in progress: ROLLBACK
This is all done in one action; the only other queries I am making are for user validation, and I have a DB mutex preventing overlapping imports. This query worked fine up until my latest push, which included updating my pg gem from 0.13.2 to 0.14.1 (along with other "unrelated" code).
The error initially started on our staging server, but I was then able to reproduce it locally and am out of ideas.
If I need to be more clear with my question, let me know.
Thanks
Found my own answer, and this might be useful if anyone hits the same issue when importing loads of data using COPY.
An exception was being thrown within the CSV.read() block. I do catch it, but I was not ending the COPY command correctly.
begin
  CSV.read(csv.path, headers: true).each do |row|
    raw.put_copy_data row['email'] + "\n" unless row.nil?
  end
ensure
  raw.put_copy_end
  while res = raw.get_result do; end # very important to do this after a copy
end
This block ensures that the COPY command is completed. I also added this at the end to release the connection back into the pool, without disrupting the flow in the case of a successful import:
rescue
  ActiveRecord::Base.connection_pool.checkin(conn)
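Putting the fragments together, the overall shape of the fixed importer is roughly the following — a sketch, since the rest of the importer class isn't shown, and using ensure for the checkin as a variation on the rescue above:
conn = ActiveRecord::Base.connection_pool.checkout
begin
  conn.transaction do
    raw = conn.raw_connection
    raw.exec(query_temporary_table)
    raw.exec(query_copy)
    begin
      CSV.read(csv.path, headers: true).each do |row|
        raw.put_copy_data row['email'] + "\n" unless row.nil?
      end
    ensure
      # Always finish the COPY, even if a row raised, so the connection is
      # not left mid-command (the cause of "another command is already in
      # progress" when the transaction tries to ROLLBACK).
      raw.put_copy_end
      while res = raw.get_result do; end
    end
    raw.exec(query_delete)
    raw.exec(query_insert)
  end
ensure
  # Return the connection to the pool on success or failure
  ActiveRecord::Base.connection_pool.checkin(conn)
end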

Directly Downloading a File From an RSS feed Using Ruby - Handling Redirects

I'm writing a program in Ruby that downloads a file from an RSS feed to my local hard drive. Previously, I'd written this application in Perl and figured a great way to learn Ruby would be to recreate this program using Ruby code.
In the Perl program (which works), I was able to download the original file directly from the server it was hosted on (keeping the original file name) and it worked great. In the Ruby program (which isn't working), I have to sort of "stream" the data from the file I want into a new file that I've created on my hard drive. Unfortunately, this isn't working and the "streamed" data is always coming back empty. My assumption is that there is some sort of redirect that Perl can handle to retrieve the file directly that Ruby cannot.
I'm going to post both programs (they're relatively small) and hope that this helps solve my issue. If you have questions, please let me know. As a side note, I pointed this program at a more static URL (a jpeg) and it downloaded the file just fine. This is why I'm theorizing that some sort of redirect is causing issues.
The Ruby Code (That Doesn't Work)
require 'net/http';
require 'open-uri';
require 'rexml/document';
require 'sqlite3';

# Create new SQLite3 database connection
db_connection = SQLite3::Database.new('fiend.db');

# Make sure I can reference records in the query result by column name instead of index number
db_connection.results_as_hash = true;

# Grab all TV shows from the shows table
query = '
  SELECT
    id,
    name,
    current_season,
    last_episode
  FROM
    shows
  ORDER BY
    name
';

# Run through each record in the result set
db_connection.execute(query) { |show|
  # Pad the current season number with a zero for later use in a search query
  season = '%02d' % show['current_season'].to_s;
  # Calculate the next episode number and pad with a zero
  next_episode = '%02d' % (Integer(show['last_episode']) + 1).to_s;
  # Store the name of the show
  name = show['name'];
  # Generate the URL of the RSS feed that will hold the list of torrents
  feed_url = URI.encode("http://btjunkie.org/rss.xml?query=#{name} S#{season}E#{next_episode}&o=52");
  # Generate a simple string that denotes the show, season and episode number being retrieved
  episode_id = "#{name} S#{season}E#{next_episode}";
  puts "Loading feed for #{name}..";
  # Store the response from the download of the feed
  feed_download_response = Net::HTTP.get_response(URI.parse(feed_url));
  # Store the contents of the response (in this case, XML data)
  xml_data = feed_download_response.body;
  puts "Feed Loaded. Parsing items.."
  # Create a new REXML Document and pass in the XML from the Net::HTTP response
  doc = REXML::Document.new(xml_data);
  # Loop through each <item> in the feed
  doc.root.each_element('//item') { |item|
    # Find and store the URL of the torrent we wish to download
    torrent_url = item.elements['link'].text + '/download.torrent';
    puts "Downloading #{episode_id} from #{torrent_url}";
    ## This is where crap stops working
    # Open connection to the host
    Net::HTTP.start(URI.parse(torrent_url).host, 80) { |http|
      # Create a torrent file to dump the data into
      File.open("#{episode_id}.torrent", 'wb') { |torrent_file|
        # Try to grab the torrent data
        data = http.get(torrent_url[19..torrent_url.size], "User-Agent" => "Mozilla/4.0").body;
        # Write the data to the torrent file (the data is always coming back blank)
        torrent_file.write(data);
        # Close the torrent file
        torrent_file.close();
      }
    }
    break;
  }
}
The Perl Code (That Does Work)
use strict;
use XML::Parser;
use LWP::UserAgent;
use HTTP::Status;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=fiend.db", "", "", { RaiseError => 1, AutoCommit => 1 });
my $userAgent = new LWP::UserAgent;  # Create new user agent
$userAgent->agent("Mozilla/4.0");    # Spoof our user agent as Mozilla
$userAgent->timeout(20);             # Set timeout limit for request

my $currentTag = "";   # Stores what tag is currently being parsed
my $torrentUrl = "";   # Stores the data found in any node
my $isDownloaded = 0;  # 1 or zero that states whether or not we've downloaded a particular episode

my $shows = $dbh->selectall_arrayref("SELECT id, name, current_season, last_episode FROM shows ORDER BY name");

my $id = 0;
my $name = "";
my $season = 0;
my $last_episode = 0;

foreach my $show (@$shows) {
    $isDownloaded = 0;
    ($id, $name, $season, $last_episode) = (@$show);
    $season = sprintf("%02d", $season);                    # Pad the season with a zero (e.g. 6 becomes 06)
    $last_episode = sprintf("%02d", ($last_episode + 1));  # Increment the last episode and pad it with a zero
    print("Checking $name S" . $season . "E" . "$last_episode \n");
    my $request = new HTTP::Request(GET => "http://btjunkie.org/rss.xml?query=$name S" . $season . "E" . $last_episode . "&o=52");  # Retrieve the torrent feed
    my $rssFeed = $userAgent->request($request);  # Store the feed in a variable for later access
    if ($rssFeed->is_success) {           # We retrieved the feed
        my $parser = new XML::Parser();   # Make a new instance of XML::Parser
        $parser->setHandlers  # Set the functions that will be called when the parser encounters different kinds of data within the XML file.
        (
            Start => \&startHandler,  # Handles start tags (e.g. <item>)
            End   => \&endHandler,    # Handles end tags (e.g. </item>)
            Char  => \&DataHandler    # Handles data inside of start and end tags
        );
        $parser->parsestring($rssFeed->content);  # Parse the feed
    }
}

#
# Called every time XML::Parser encounters a start tag
# @param: $parseInstance {object} | Instance of the XML::Parser. Passed automatically when feed is parsed.
# @param: $element {string} | The name of the XML element being parsed (e.g. "title"). Passed automatically when feed is parsed.
# @param: %attributes {array} | An array of all of the attributes of $element
# @returns: void
#
sub startHandler {
    my ($parseInstance, $element, %attributes) = @_;
    $currentTag = $element;
}

#
# Called every time XML::Parser encounters anything that is not a start or end tag (i.e., all the data in between tags)
# @param: $parseInstance {object} | Instance of the XML::Parser. Passed automatically when feed is parsed.
# @param: $element {string} | The name of the XML element being parsed (e.g. "title"). Passed automatically when feed is parsed.
# @param: %attributes {array} | An array of all of the attributes of $element
# @returns: void
#
sub DataHandler {
    my ($parseInstance, $element, %attributes) = @_;
    if ($currentTag eq "link" && $element ne "\n") {
        $torrentUrl = $element;
    }
}

#
# Called every time XML::Parser encounters an end tag
# @param: $parseInstance {object} | Instance of the XML::Parser. Passed automatically when feed is parsed.
# @param: $element {string} | The name of the XML element being parsed (e.g. "title"). Passed automatically when feed is parsed.
# @param: %attributes {array} | An array of all of the attributes of $element
# @returns: void
#
sub endHandler {
    my ($parseInstance, $element, %attributes) = @_;
    if ($element eq "item" && $isDownloaded == 0) {  # We just finished parsing an <item> so let's attempt to download a torrent
        print("DOWNLOADING: $torrentUrl" . "/download.torrent \n");
        system("echo.|lwp-download " . $torrentUrl . "/download.torrent");  # We echo the "return" key into the command to force it to skip any file-overwrite prompts
        if (unlink("download.torrent.html")) {  # We tried to download a 'locked' torrent
            $isDownloaded = 0;  # Forces program to download next torrent on list from current show
        }
        else {
            $isDownloaded = 1;
            $dbh->do("UPDATE shows SET last_episode = '$last_episode' WHERE id = '$id'");  # Update DB with new show information
        }
    }
}
Yes, the URLs you are retrieving appear to be returning a 302 (redirect). Net::HTTP requires/allows you to handle the redirect yourself. You typically use a recursive technique like AboutRuby mentioned (although this thread, http://www.ruby-forum.com/topic/142745, suggests you should look not only at the 'Location' field but also for META REFRESH in the response).
open-uri will handle redirects for you if you're not interested in the low-level interaction:
require 'open-uri'
File.open("#{episode_id}.torrent", 'wb') do |torrent_file|
  torrent_file.write open(torrent_url).read
end
get_response will return a class from the HTTPResponse hierarchy. It's usually HTTPSuccess, but if there's a redirect, it will be HTTPRedirection. A simple recursive method that follows redirects can solve this. How to handle it correctly is in the Net::HTTP docs under the heading "Following Redirection."
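A minimal sketch of that documented pattern (the fetch helper name and the redirect limit are illustrative, not from the asker's code):
require 'net/http'
require 'uri'

def fetch(uri_str, limit = 10)
  # Guard against redirect loops
  raise ArgumentError, 'too many HTTP redirects' if limit == 0

  response = Net::HTTP.get_response(URI.parse(uri_str))
  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    # Follow the Location header recursively
    fetch(response['location'], limit - 1)
  else
    response.value  # raises for other response classes
  end
end

data = fetch(torrent_url)
File.open("#{episode_id}.torrent", 'wb') { |f| f.write(data) }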

Why must I use local path rather than 'svn://' with SVN bindings?

I'm using the Ruby SVN bindings built with SWIG. Here's a little tutorial.
When I do this
@repository = Svn::Repos.open('/path/to/repository')
I can access the repository fine. But when I do this
@repository = Svn::Repos.open('svn://localhost/some/path')
It fails with
/SourceCache/subversion/subversion-35/subversion/subversion/libsvn_subr/io.c:2710: 2: Can't open file 'svn://localhost/format': No such file or directory
When I do this from the command line, I do get output
svn ls svn://localhost/some/path
Any ideas why I can't use the svn:// protocol?
EDIT
Here's what I ended up doing, and it works.
require 'svn/ra'

class SvnWrapper
  def initialize(repository_uri, repository_username, repository_password)
    # Remove any trailing slashes from the path, as the SVN library will choke
    # if it finds any.
    @repository_uri = repository_uri.gsub(/[\/]+$/, '')

    # Initialize repository session.
    @context = Svn::Client::Context.new
    @context.add_simple_prompt_provider(0) do |cred, realm, username, may_save|
      cred.username = repository_username
      cred.password = repository_password
      cred.may_save = true
    end
    config = {}
    callbacks = Svn::Ra::Callbacks.new(@context.auth_baton)
    @session = Svn::Ra::Session.open(@repository_uri, config, callbacks)
  end

  def ls(relative_path, revision = nil)
    relative_path = relative_path.gsub(/^[\/]+/, '').gsub(/[\/]+$/, '')
    entries, properties = @session.dir(relative_path, revision)
    return entries.keys.sort
  end

  def info(relative_path, revision = nil)
    path = File.join(@repository_uri, relative_path)
    data = {}
    @context.info(path, revision) do |dummy, infoStruct|
      # These values are enumerated at http://svn.collab.net/svn-doxygen/structsvn__info__t.html.
      data['url'] = infoStruct.URL
      data['revision'] = infoStruct.rev
      data['kind'] = infoStruct.kind
      data['repository_root_url'] = infoStruct.repos_root_url
      data['repository_uuid'] = infoStruct.repos_UUID
      data['last_changed_revision'] = infoStruct.last_changed_rev
      data['last_changed_date'] = infoStruct.last_changed_date
      data['last_changed_author'] = infoStruct.last_changed_author
      data['lock'] = infoStruct.lock
    end
    return data
  end
end
Enjoy.
The svn command is a client. It communicates with the Subversion server using several protocols (http(s)://, svn://, and file:///).
Repos.open is a repository function (much like svnadmin, for instance). It operates directly on the repository database on local disk, and doesn't use a client protocol to communicate with a server, which is why it only accepts a local filesystem path.
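In other words, the two snippets in the question go through entirely different layers — a short sketch restating the distinction using the calls already shown above:
require 'svn/repos'
require 'svn/ra'

# Repository-level access: operates on the database directly,
# so it only works with a local filesystem path.
repository = Svn::Repos.open('/path/to/repository')

# Client-level access over svn:// must go through the RA layer instead,
# as in the SvnWrapper class above (Svn::Ra::Session.open).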
