Optimising reading ID3 tags of MP3 files - Ruby

I'm reading MP3 files with the 'mp3info' gem: I walk a directory with Dir.chdir(), recurse into subdirectories, open every file whose name ends in .mp3, and store the tags in a database. My music collection is about 30 GB, and a full scan takes around 6-10 minutes. Is there any way I can optimise this scan?
def self.gen_list(dir)
  prev_pwd = Dir.pwd
  begin
    Dir.chdir(dir)
  rescue Errno::EACCES
  end
  counter = 0
  Dir[Dir.pwd + '/*'].each { |x|
    #puts Dir.pwd
    if File.directory?(x) then
      self.gen_list(x) do |y|
        yield y
      end
    elsif File.basename(x).match('.mp3') then
      begin
        Mp3Info.open(x) do |y|
          yield [x, y.tag.title, y.tag.album, y.tag.artist]
        end
      rescue Mp3InfoError
      end
    end
  }
  Dir.chdir(prev_pwd)
end
This is the method that generates the list; it yields each file's tags to the block, which stores the data in the database.

Have you tried setting the parse_mp3 flag to false? By default it is on, which means you pull in the entire file for each scan when all you care about is the tag info. I don't know how much time this will save you. See the GitHub source for more info.
https://github.com/moumar/ruby-mp3info/blob/master/lib/mp3info.rb#L214
# Specify :parse_mp3 => false to disable processing of the mp3
def initialize(filename_or_io, options = {})
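If that option behaves as the comment says, passing it through Mp3Info.open might look roughly like this (a sketch based only on the signature quoted above; I haven't measured the saving):

# Sketch: skip MP3 frame parsing and only read the tags.
# Assumes Mp3Info.open forwards the options hash to initialize, as quoted above.
Mp3Info.open(x, :parse_mp3 => false) do |y|
  yield [x, y.tag.title, y.tag.album, y.tag.artist]
end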

You can:
- Run several processes (one for each directory in the base dir, for example) - see the sketch after this list
- Use threads with Rubinius or JRuby
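A rough sketch of the first idea, assuming a scan_directory method that wraps the Mp3Info and database work (the method name is hypothetical; fork is MRI-on-Unix only, JRuby would use threads instead):

base_dir = ARGV[0] || '.'

# Fork one worker process per top-level directory.
Dir.entries(base_dir).each do |entry|
  next if entry == '.' || entry == '..'
  path = File.join(base_dir, entry)
  next unless File.directory?(path)

  fork { scan_directory(path) } # hypothetical: recursive scan + DB insert for one subtree
end

Process.waitall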

You can try the taglib-ruby gem, which, unlike mp3info, is a wrapper around a C library and could give you a bit more performance. Otherwise you have to stick to JRuby and run multiple threads (4 if you have 4 cores).
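Reading the same three tags with taglib-ruby might look roughly like this (a sketch assuming the gem's FileRef block API; untested):

require 'taglib'

def read_tags(filepath)
  TagLib::FileRef.open(filepath) do |fileref|
    next if fileref.null?   # unreadable or unsupported file
    tag = fileref.tag
    [filepath, tag.title, tag.album, tag.artist]
  end
end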

You may also benefit from a more direct way of retrieving the mp3 files.
Dir['**/*.mp3'].each do |filepath|
  begin
    Mp3Info.open(filepath) do |mp3|
      ...
    end
  rescue Mp3InfoError
    ...
  end
end
This will find all .mp3 files at any depth from the current directory and yield the relative path to the block. It is approximately equivalent to find . -name '*.mp3' -print
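Putting that together with the original gen_list, a glob-based version of the whole method might look like this (a sketch; it keeps the same yield shape as the question's code and skips unreadable files the same way):

def self.gen_list(dir)
  Dir.glob(File.join(dir, '**', '*.mp3')).each do |filepath|
    begin
      Mp3Info.open(filepath) do |mp3|
        yield [filepath, mp3.tag.title, mp3.tag.album, mp3.tag.artist]
      end
    rescue Mp3InfoError
      # skip files that mp3info cannot parse, as the original code does
    end
  end
end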

Related

Fact file was parsed but returned an empty data set

For my current module, I need to check whether PHP version 5 or 7 is installed, so I created a fact for this. The fact file is stored in the module directory as facts.d/packageversion.rb and has the following content:
#!/usr/bin/ruby
require 'facter'
Facter.add(:php_version) do
  setcode do
    if File.directory? '/etc/php5'
      5
    else
      if File.directory? '/etc/php7'
        7
      else
        0
      end
    end
  end
end
But I can't use it in my module. In the Puppet agent log, I get this error:
Fact file /var/lib/puppet/facts.d/packageversion.rb was parsed but
returned an empty data set
How can I solve this?
facts.d is the module directory for external facts. You could place this file into the external facts directory, but then the expected output would need to be key-value pairs. That is not happening here, so Puppet is not finding a data set for the fact. https://docs.puppet.com/facter/3.6/custom_facts.html#executable-facts-----unix
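For comparison, an executable external fact in facts.d would have to print key-value pairs to stdout, roughly like this (a sketch based on the executable-facts documentation linked above):

#!/usr/bin/ruby
# External (executable) fact: print key=value pairs, one per line.
if File.directory?('/etc/php5')
  puts 'php_version=5'
elsif File.directory?('/etc/php7')
  puts 'php_version=7'
else
  puts 'php_version=0'
end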
You have written this fact as a custom fact, not an external fact. Therefore it needs to be placed inside the lib/facter directory in your module instead; then it will function correctly. I notice this information seems to have been removed from the latest Facter documentation, which probably adds to your confusion.
Also, consider using an elsif in your code for clarity and optimization:
if File.directory? '/etc/php5'
  5
elsif File.directory? '/etc/php7'
  7
else
  0
end
What Matt Schuchard said.
Also, you might consider that the approved Vox Pupuli Puppet module uses this code for the PHP version:
Facter.add(:phpversion) do
  setcode do
    output = Facter::Util::Resolution.exec('php -v')
    unless output.nil?
      output.split("\n").first.split(' ').
        select { |x| x =~ %r{^(?:(\d+)\.)(?:(\d+)\.)?(\*|\d+)} }.first
    end
  end
end
Note that Facter::Util::Resolution.exec is deprecated in favour of Facter::Core::Execution.exec.
Aside from that, you might consider this a better way of getting the PHP version.
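If you adapt that snippet, a version using the non-deprecated call might look like this (a sketch, with the version matching simplified to a plain regex):

Facter.add(:phpversion) do
  setcode do
    output = Facter::Core::Execution.exec('php -v')
    # The first line of `php -v` looks like "PHP 7.0.33 (cli) ..."; grab the version token.
    output[/\d+\.\d+\.\d+/] unless output.nil? || output.empty?
  end
end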

How to unit test a "disk full" scenario with Ruby RSpec?

I need to unit test scenarios like the following:
The disk has 1MB free space. I try to copy 2MB of file(s) to the disk.
What's the best way to do this with Ruby RSpec?
For further information, I need to unit test the following file cache method, since it appears to have some issue:
def set first_key, second_key='', files=[]
  # If cache exists already, overwrite it.
  content_dir = get first_key, second_key
  second_key_file = nil
  begin
    if (content_dir.nil?)
      # Check the size of cache, and evict entries if too large
      check_cache_size if (rand(100) < check_size_percent)
      # Make sure cache dir doesn't exist already
      first_cache_dir = File.join(dir, first_key)
      if (File.exist?(first_cache_dir))
        raise "BuildCache directory #{first_cache_dir} should be a directory" unless File.directory?(first_cache_dir)
      else
        FileUtils.mkpath(first_cache_dir)
      end
      num_second_dirs = Dir[first_cache_dir + '/*'].length
      cache_dir = File.join(first_cache_dir, num_second_dirs.to_s)
      # If cache directory already exists, then a directory must have been evicted here, so we pick another name
      while File.directory?(cache_dir)
        cache_dir = File.join(first_cache_dir, rand(num_second_dirs).to_s)
      end
      content_dir = File.join(cache_dir, '/content')
      FileUtils.mkpath(content_dir)
      # Create 'last_used' file
      last_used_filename = File.join(cache_dir, 'last_used')
      FileUtils.touch last_used_filename
      FileUtils.chmod(permissions, last_used_filename)
      # Copy second key
      second_key_file = File.open(cache_dir + '/second_key', 'w+')
      second_key_file.flock(File::LOCK_EX)
      second_key_file.write(second_key)
    else
      log "overwriting cache #{content_dir}"
      FileUtils.touch content_dir + '/../last_used'
      second_key_file = File.open(content_dir + '/../second_key', 'r')
      second_key_file.flock(File::LOCK_EX)
      # Clear any existing files out of cache directory
      FileUtils.rm_rf(content_dir + '/.')
    end
    # Copy files into content_dir
    files.each do |filename|
      FileUtils.cp(filename, content_dir)
    end
    FileUtils.chmod(permissions, Dir[content_dir + '/*'])
    # Release the lock
    second_key_file.close
    return content_dir
  rescue => e
    # Something went wrong, like a full disk or some other error.
    # Delete any work so we don't leave cache in corrupted state
    unless content_dir.nil?
      # Delete parent of content directory
      FileUtils.rm_rf(File.expand_path('..', content_dir))
    end
    log "ERROR: Could not set cache entry. #{e.to_s}"
    return 'ERROR: !NOT CACHED!'
  end
end
One solution is to stub out methods that write to disk to raise an error. For example, for the specs that test disk space errors, you could try:
before do
  allow_any_instance_of(File).to receive(:open) { raise Errno::ENOSPC }
  # or maybe # allow(File).to receive(:write) { raise Errno::ENOSPC }
  # or # allow(FileUtils).to receive(:cp) { raise Errno::ENOSPC }
  # or some combination of these 3...
end

it 'handles an out of disk space error' do
  expect { my_disk_cache.set('key1', 'key2', [...]) }.to # your logic for how BuildCache::DiskCache should handle the error here.
end
There are two problems with this however:
1) Errno::ENOSPC may not be the error you actually see getting raised. That error fits the description in your question, but depending on the peculiarities of your lib and the systems it runs on, you might not really be getting an Errno::ENOSPC error. Maybe you run out of RAM first and are getting Errno::ENOMEM, or maybe you have too many file descriptors open and are getting Errno::EMFILE. Of course if you want to be rigorous you could handle all of these, but this is time consuming and you'll get diminishing returns for handling the more obscure errors.
See this for more information on Errno errors.
2) This solution involves stubbing a specific method on a specific class (File.open). This isn't ideal, because it couples the setup for your test to the implementation in your code. That is to say, if you refactor BuildCache::DiskCache#set to not use File.open, then this test might start failing even though the method might be correct.
That said, File.open is fairly low level. I know that some FileUtils methods use File.open (notably FileUtils.cp), so I would suggest just using that first allow_any_instance_of line. I'd expect that to handle most of your use cases.
Alternatively, there is a tool called fakefs that may be able to help you with this. I am not familiar with it, but it may well have functionality that helps with testing such errors. You may want to look into it.
You could make use of any of the method calls you know are happening inside of the method you need to test, and stub them so they raise an error. E.g. FileUtils.touch is called a number of times, so we could do:
it 'handles file write error gracefully' do
  allow(FileUtils).to receive(:touch).and_raise('oh no')
  # your expectations
  # your test trigger
end

Trouble Creating Directories with mkdir

New to Ruby, probably something silly
Trying to make a directory in order to store files in it. Here's my code to do so
def generateParsedEmailFile
  apath = File.expand_path($textFile)
  filepath = Pathname.new(apath + '/' + @subject + ' ' + @date)
  if filepath.exist?
    filepath = Pathname.new(filepath + '.1')
  end
  directory = Dir.mkdir(filepath)
  Dir.chdir directory
  emailText = File.new("emailtext.txt", "w+")
  emailText.write(self.generateText)
  emailText.close
  for attachment in @attachments
    self.generateAttachment(attachment, directory)
  end
end
Here's the error that I get
My-Name-MacBook-2:emails myname$ ruby etext.rb email4.txt
etext.rb:196:in `mkdir': Not a directory - /Users/anthonydreessen/Developer/Ruby/emails/email4.txt/Re: Make it Brief Report Wed 8 May 2013 (Errno::ENOTDIR)
from etext.rb:196:in `generateParsedEmailFile'
from etext.rb:235:in `<main>'
I was able to recreate the error - it looks like email4.txt is a regular file, not a directory, so you can't use it as part of your directory path.
If you switch to mkdir_p and get the same error, perhaps one of the parents named in '/Users/anthonydreessen/Developer/Ruby/emails/email4.txt/Re: Make it Brief Report Wed 8 May 2013' already exists as a regular file and can't be treated like a directory. Probably that last one named email4.txt.
You've got the right idea, but should be more specific about the files you're opening. Changing the current working directory is really messy as it changes it across the entire process and could screw up other parts of your application.
require 'fileutils'

def generate_parsed_email_file(text_file)
  path = File.expand_path("#{@subject} #{@date}", text_file)
  while (File.exist?(path))
    path.sub!(/(\.\d+)?$/) do |m|
      ".#{m[1].to_i + 1}"
    end
  end
  directory = File.dirname(path)
  unless (File.exist?(directory))
    FileUtils.mkdir_p(directory)
  end
  File.open(path, "w+") do |email|
    email.write(self.generateText)
  end
  @attachments.each do |attachment|
    self.generateAttachment(attachment, directory)
  end
end
I've taken the liberty of making this example significantly more Ruby-like:
Using mixed-case names in methods is highly irregular, and global variables are frowned on.
It's extremely rare to see for used, each is much more flexible.
The File.open method yields to a block if the file could be opened, and closes automatically when the block is done.
The ".1" part has been extended to keep looping until it finds an un-used name.
FileUtils is employed to make sure the complete path is created.
The global variable has been converted to an argument.

Is there a better way to check file size using SCP before downloading?

I have this code to download a file from a remote machine, but I want to limit it to files that are less than 5MB.
So far the code works, but is there a better way to check the file size before downloading?
Net::SCP.start(hname, uname, :password => pw) do |scp|
  fsize = scp.download!("#{remdir}/#{filname}").size; puts fsize
  scp.download!("#{remdir}/#{filname}", "#{filname}") do |ch, name, sent, total|
    File.open(lfile, 'a') { |f| f.puts "#{name}: #{(sent.to_f * 100 / total.to_f).to_i}% complete" }
    #puts "Sending : #{(sent.to_f * 100 / total.to_f).to_i}% complete"
    #puts "#{name}: #{sent}/#{total}"
    #print "\r#{name}: #{(sent.to_f * 100 / total.to_f).to_i}%"
  end
end
Does this cause any problem if I use it for large files?
fsize = scp.download!("#{remdir}/#{filname}").size; puts fsize
This page says the file will be returned as a string:
http://ruby.about.com/od/ssh/ss/netscp_6.htm
Update:
I tried SFTP as well. First, it did not work with the full path to the file, and second, it did not do what I wanted, so I was using scp.download!().size. I know I am doing the download twice :(
require 'net/sftp'

# did not take full path "/user/myname/filename"
remote_path = "somefile.txt"

Net::SFTP.start(hname, uname, :password => pw) do |sftp|
  attrs = sftp.stat!("rome_desc.txt"); puts attrs # printed unreadable output: ☼ ↨?% (? '→ ??Q{ ?Qt;?
  sftp.stat!(remote_path) do |response|
    puts response # returned "no such file (2)"
    # but did not do below upload operation.
    unless response.ok?
      sftp.upload!("#{filname}", remdir)
    end
  end
end
Update 2: Solution
I found the solution using the comments provided by the users below and after searching the net.
Net::SFTP.start(hname, uname, :password => pw) do |sftp| #, :verbose => Logger::DEBUG
  sftp.dir.glob("./#{remdir}", "#{filname}") do |entry|
    p entry.name
    file_size = entry.attributes.size.to_f / 2**20   # bytes -> MB, kept numeric for the comparison below
    File.open(lfile, 'a') { |f| f.puts "File size is #{'%.2f' % file_size} mb" }
    if file_size < file_size_param
      sftp.download!("#{remdir}/#{filname}", filname)
    else
      File.open(lfile, 'a') { |f| f.puts "File size is greater than #{file_size_param} mb. so can not Download File" }
    end
  end
end
I used entry.attributes.size to obtain the file size and only perform the download after checking it:
sftp.dir.glob("./#{remdir}", "#{filname}") do |entry|
  file_size = entry.attributes.size
Does this cause any problem if I am use it for large files?
We don't know, because we don't know how fast your internet connection is, how much RAM you have, or how fast the pipe is from the host you're downloading the file from.
Basically though, you are reading the file twice, once into memory to see how big it is, then again if it meets your requirement, which seems really... silly.
You're doubling the traffic to the host you're reading from and on your network connection, and, if the file is larger than RAM on your local machine, it is going to go nuts.
As Darshan says, look at using Net::SFTP. It will give you the ability to query the file's size before you try to load it, without pulling the entire thing down. It's a bit more complicated to use, but that complexity translates into flexibility.
"/user/myname/filename"
(S)FTP might not necessarily have its base path where it can see that file. To probe the system and figure it out, ask the system, via the SFTP connection, what its current directory is when you first log in, then ask it for the files it can see using something like this example from the Net::SFTP docs:
sftp.dir.glob("/base/path", "*/**/*.rb") do |entry|
  p entry.name
end
That will recursively look through the "/base/path" hierarchy, searching for all "*.rb" files.
Your current code downloads the file, checks the size of the downloaded file (presumably to check if it is less than 5MB, but you don't actually do that), and then downloads it again. Even if you did something with fsize, it's too late to have not downloaded it.
I'd look into the sftp gem rather than scp; it should be pretty straightforward to do what you want with sftp, but not with scp.
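For example, a minimal sketch with Net::SFTP that checks the remote size via stat! before deciding to download (variable names follow the question; the 5 MB limit is the one from the question):

require 'net/sftp'

Net::SFTP.start(hname, uname, :password => pw) do |sftp|
  remote = "#{remdir}/#{filname}"
  size = sftp.stat!(remote).size          # size in bytes, no download needed
  if size < 5 * 1024 * 1024
    sftp.download!(remote, filname)
  else
    puts "#{remote} is #{size} bytes, skipping download"
  end
end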

Good Way to Handle Many Different Files?

I'm building a specialized pipeline, and basically, every step in the pipeline involves taking one file as input and creating a different file as output. Not all files are in the same directory, all output files are of a different format, and because I'm using several different programs, different actions have to be taken to appease the different programs.
This has led to some complicated file management in my code, and the more I try to organize the file directories, the more ugly it's getting. Just about every class involves some sort of code like the following:
@fileName = File.basename(file)
@dataPath = "#{$path}/../data/"
MzmlToOther.new("mgf", "#{@dataPath}/spectra/#{@fileName}.mzML", 1, false).convert

system("wine readw.exe --mzXML #{@file}.raw #{$path}../data/spectra/#{File.basename(@file + ".raw", ".raw")}.mzXML 2>/dev/null")

fileName = "#{$path}../data/" + parts[0] + parts[1][6..parts[1].length - 1].chomp(".pep.xml")
Is there some sort of design pattern, or ruby gem, or something to clean this up? I like writing clean code, so this is really starting to bother me.
You could use a Makefile.
Make is essentially a DSL designed for handling the conversion of one type of file to another by running an external program. As an added bonus, it will only perform the steps necessary to incrementally update your output when some set of source files changes.
If you really want to use Ruby, try a rakefile. Rake will do this, and it's still Ruby.
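A minimal sketch of what that could look like with a Rake file rule (the readw.exe command is borrowed from the question; the data/spectra paths are made up for the example):

# Rakefile
# Rule: any .mzXML target can be built from the .raw file with the same basename.
rule '.mzXML' => '.raw' do |t|
  sh "wine readw.exe --mzXML #{t.source} #{t.name} 2>/dev/null"
end

# Ask for one .mzXML per .raw file found; Rake only rebuilds what is out of date.
task :default => Dir['data/spectra/*.raw'].map { |f| f.sub(/\.raw$/, '.mzXML') }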
You can make this as sophisticated as you want but this basic script will match a file suffix to a method which you can then call with the file path.
# a conversion method can be used for each file type if you want to
# make the code more readable or if you need to rearrange filenames.
def htm_convert file
  "HTML #{file}"
end

# file suffix as key, lambda as value, the last uses an external method
routines = {
  :log  => lambda { |file| puts "LOG #{file}" },
  :rb   => lambda { |file| puts "RUBY #{file}" },
  :haml => lambda { |file| puts "HAML #{file}" },
  :htm  => lambda { |file| puts htm_convert(file) }
}

# this loops recursively through the directory and sub folders
Dir['**/*.*'].each do |f|
  suffix = f.split(".")[-1]
  if routine = routines[suffix.to_sym]
    routine.call(f)
  else
    puts "UNPROCESSED -- #{f}"
  end
end
