I'm trying to use the gem charlock_holmes (https://github.com/brianmario/charlock_holmes) to detect and correct character encoding errors. However, the program doesn't print anything.
My code is:
require 'charlock_holmes'
contents = File.read('./myfile.csv')
detection = CharlockHolmes::EncodingDetector.detect(contents)
# => {:encoding => 'UTF-8', :confidence => 100, :type => :text}
as specified in the documentation.
When I run this in the directory, I just get nothing at all:
user$ ruby detector.rb
user$
Expected behavior is that it returns the detected encoding (and, if desired, can change it as well). I've got all the gems installed, I think, and I've tried under both 1.9.2 and 2.0.0.
Any ideas what I'm doing wrong or how to find out? I'm afraid I'm new to ruby, but I have tried to do a pretty comprehensive search before asking and have come up blank.
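Your script computes the detection result but never prints it; the # => line in the documentation only shows the return value as a comment. Add p detection to detector.rb.
Save your code as below: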
require 'charlock_holmes'
contents = File.read('./myfile.csv')
detection = CharlockHolmes::EncodingDetector.detect(contents)
p detection
Now run it as you ran earlier.
I'm experiencing an issue wherein awesome_print is not displaying output in its gorgeous colorized multiline format. What I find most curious is that while the gem is installed:
$ gem install awesome_print
Successfully installed awesome_print-1.6.1
1 gem installed
It returns a false upon require in IRB:
>> require 'awesome_print'
false
Any idea as to what may be causing this? I am not quite sure how to tackle this since gem installation seems to work fine and I can even use ap "test" in IRB with no error, except there is no colorization or proper printing with multiple lines and seems to simply fall back to some other method for printing.
No ~/.aprc changes evoke any changes either.
Pass the options directly, as in ap object, options = {:plain => false, :multiline => true}, or add them to the config file.
Create an ~/.irbrc file with the following content:
require "awesome_print"
AwesomePrint.irb!
and set the defaults in ~/.aprc:
AwesomePrint.defaults = {
  :multiline => true, # Display in multiple lines.
  :plain => false
}
I saw the same behaviour: require returned false, but awesome_print was working anyway. A false return value only means the library was already loaded, not that loading failed. Try printing something with awesome_print (ap), like
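To see what the return value of require actually means, here is a minimal sketch using a throwaway file (the name require_demo_lib.rb is made up for this demonstration and has nothing to do with awesome_print):

```ruby
require 'tmpdir'

first_result = nil
second_result = nil

Dir.mktmpdir do |dir|
  # Hypothetical one-line library, created only for this demonstration.
  path = File.join(dir, 'require_demo_lib.rb')
  File.write(path, "REQUIRE_DEMO_LOADED = true\n")

  first_result  = require path # the file is loaded now
  second_result = require path # already loaded, so nothing happens
end

puts "first require:  #{first_result.inspect}"
puts "second require: #{second_result.inspect}"
```

The second call returns false without raising, which is the same thing IRB shows when a gem has already been pulled in by a startup file.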
ap data = {foo: "bar"}
I am trying to parse the following string called result:
{
"status":0,
"id":"faxxxxx-1",
"hypotheses":[
{"utterance":"skateboard","confidence":0.90466744},
{"utterance":"skate board"},
{"utterance":"skateboarding"},
{"utterance":"skateboards"},
{"utterance":"skate bored"}
]
}
Using obj = JSON.parse(result) in Ruby 1.8 with the json gem.
The command in question is:
puts "#{obj['hypotheses'][0]}"
My old workstation (whose harddrive died) gave me:
{"utterance" => "skateboard", "confidence" => 0.90466744}
My current workstation gives me:
confidence0.90466744utteranceskateboard
The old workstation was not set up by me, so I don't know what kind of packages were installed, while this current one was.
Why is there a difference in the output of the exact same script?
How can I make the current one look like the old one?
I am completely new to this btw.
In Ruby 1.8, Hash#to_s simply joins all of the elements together without spaces, equivalent to to_a.flatten.join('').
In Ruby 1.9, Hash#to_s is an alias to inspect and produces well-formatted output.
To get the equivalent thing in both cases:
puts obj['hypotheses'][0].inspect
The same thing applies to Array.
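A small sketch of the version-independent approach, using a shortened copy of the JSON from the question. The exact spacing of the inspect output differs between Ruby versions, so no expected output is shown for those lines:

```ruby
require 'json'

# Shortened version of the response string from the question.
result = '{"status":0,"hypotheses":[{"utterance":"skateboard","confidence":0.90466744},{"utterance":"skate board"}]}'

obj = JSON.parse(result)
first = obj['hypotheses'][0]

# inspect gives a readable representation regardless of how to_s behaves.
puts first.inspect
# p is shorthand for printing the inspect form.
p first

# For output that must look identical everywhere, format the fields yourself.
line = "#{first['utterance']} (#{first['confidence']})"
puts line
```

Formatting the individual fields, as in the last line, avoids depending on Hash#to_s or Hash#inspect at all.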
I'm having trouble getting I18n to work outside a Rails environment:
irb> require 'i18n'
=> true
irb> I18n.load_path=Dir['/usr/lib/ruby/gems/1.9.1/gems/rails-i18n-0.6.6/rails/locale/en.yml']
=> ["/usr/lib/ruby/gems/1.9.1/gems/rails-i18n-0.6.6/rails/locale/en.yml"]
irb> I18n.load_path+=Dir['/usr/lib/ruby/gems/1.9.1/gems/rails-i18n-0.6.6/rails/locale/sk.yml']
=> ["/usr/lib/ruby/gems/1.9.1/gems/rails-i18n-0.6.6/rails/locale/en.yml", "/usr/lib/ruby/gems/1.9.1/gems/rails-i18n-0.6.6/rails/locale/sk.yml"]
irb> I18n.locale=:sk
=> :sk
irb> I18n.default_locale=:sk
=> :sk
irb> I18n.l Time.now
I18n::MissingTranslationData: translation missing: sk.time.formats.default
from /usr/lib/ruby/gems/1.9.1/gems/i18n-0.6.1/lib/i18n.rb:289:in `handle_exception'
from /usr/lib/ruby/gems/1.9.1/gems/i18n-0.6.1/lib/i18n.rb:159:in `translate'
from /usr/lib/ruby/gems/1.9.1/gems/i18n-0.6.1/lib/i18n/backend/base.rb:55:in `localize'
from /usr/lib/ruby/gems/1.9.1/gems/i18n-0.6.1/lib/i18n.rb:236:in `localize'
from (irb):11
from /usr/bin/irb:12:in `<main>'
irb>
What am I doing wrong? The sk.yml file DOES contain the sk.time.formats.default element!
In addition, what are I18n's default load paths, so that I don't have to supply the full path to every translation YAML/Ruby file?
Thanks.
You already set the search path for the language definitions with I18n.load_path.
That seems to be enough when using Rails. Without Rails, you must also load the language definitions with I18n.backend.load_translations.
In summary, you need two steps:
I18n.load_path = Dir['*.yml']
I18n.backend.load_translations
The dictionaries are defined with a language key, e.g.:
en:
  hello: "Hello world"
If you prefer to define your en.yml without the language key, you may load it via
I18n.backend.store_translations(:en, YAML.load(File.read('en.yml')))
(You may also pass a here-document or a Ruby hash directly.)
It seems like your load_path is not being set correctly.
Try including the whole directory and if it's successful, you should see your :sk and :en files by calling I18n.load_path.
I18n.load_path = Dir['/usr/lib/ruby/gems/1.9.1/gems/rails-i18n-0.6.6/rails/locale/*.yml']
Setting the file paths directly can be a bit confusing, since I18n won't raise an error if a file doesn't exist.
As a side note, I'd advise against including translations from the rails-i18n gem, as the path may differ from one machine to another (different Ruby versions and so on). A file local to the project would be better.
You'll need to install the rails-i18n gem just to get the localization data.
With this gem installed, one can e.g. print month names in the sk locale with:
require 'rails-i18n'
I18n.load_path += $LOADED_FEATURES
  .select { |f| f.include?('rails-i18n.rb') }
  .collect { |f| f.sub('lib/rails-i18n.rb', 'rails/locale/sk.yml') }
I18n.locale = :sk
puts I18n.t('date.month_names').compact
This yields:
Január
Február
Marec
Apríl
Máj
Jún
Júl
August
September
Október
November
December
I've one file, main.rb with the following content:
require "tokenizer.rb"
The tokenizer.rb file is in the same directory and its content is:
class Tokenizer
  def self.tokenize(string)
    return string.split(" ")
  end
end
If I try to run main.rb I get the following error:
C:\Documents and Settings\my\src\folder>ruby main.rb
C:/Ruby193/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require': cannot load such file -- tokenizer.rb (LoadError)
from C:/Ruby193/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require '
from main.rb:1:in `<main>'
I just noticed that if I use load instead of require everything works fine. What may the problem be here?
I just tried and it works with require "./tokenizer".
Just do this:
require_relative 'tokenizer'
If you put this in a Ruby file that is in the same directory as tokenizer.rb, it will work fine no matter what your current working directory (CWD) is.
Explanation of why this is the best way
The other answers claim you should use require './tokenizer', but that is the wrong answer, because it will only work if you run your Ruby process in the same directory that tokenizer.rb is in. Pretty much the only reason to consider using require like that would be if you need to support Ruby 1.8, which doesn't have require_relative.
The require './tokenizer' answer might work for you today, but it unnecessarily limits the ways in which you can run your Ruby code. Tomorrow, if you want to move your files to a different directory, or just want to start your Ruby process from a different directory, you'll have to rethink all of those require statements.
Using require to access files that are on the load path is a fine thing and Ruby gems do it all the time. But you shouldn't start the argument to require with a . unless you are doing something very special and know what you are doing.
When you write code that makes assumptions about its environment, you should think carefully about what assumptions to make. In this case, there are up to three different ways to require the tokenizer file, and each makes a different assumption:
1. require_relative 'path/to/tokenizer': Assumes that the relative path between the two Ruby source files will stay the same.
2. require 'path/to/tokenizer': Assumes that path/to/tokenizer is inside one of the directories on the load path ($LOAD_PATH). This generally requires extra setup, since you have to add something to the load path.
3. require './path/to/tokenizer': Assumes that the relative path from the Ruby process's current working directory to tokenizer.rb is going to stay the same.
I think that for most people and most situations, the assumptions made in options #1 and #2 are more likely to hold true over time.
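Here is a rough sketch of the difference between these assumptions. It writes a tokenizer.rb and a main.rb into a temporary directory (a stand-in for the asker's project folder), then changes into an unrelated directory before loading anything:

```ruby
require 'tmpdir'

cwd_require_failed = false

Dir.mktmpdir do |project|
  # Recreate the two files from the question in a temporary project directory.
  File.write(File.join(project, 'tokenizer.rb'), <<~RUBY)
    class Tokenizer
      def self.tokenize(string)
        string.split(' ')
      end
    end
  RUBY
  File.write(File.join(project, 'main.rb'), "require_relative 'tokenizer'\n")

  Dir.mktmpdir do |elsewhere|
    Dir.chdir(elsewhere) do
      # './tokenizer' is resolved against the current working directory,
      # which no longer contains tokenizer.rb.
      begin
        require './tokenizer'
      rescue LoadError
        cwd_require_failed = true
      end

      # main.rb uses require_relative, so it finds tokenizer.rb next to itself.
      require File.join(project, 'main.rb')
    end
  end
end

puts "require './tokenizer' failed: #{cwd_require_failed}"
puts Tokenizer.tokenize('skate board').inspect
```

The require './tokenizer' call fails as soon as the process is started from another directory, while the require_relative inside main.rb keeps working.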
Ruby 1.9 has removed the current directory from the load path, and so you will need to do a relative require on this file, as David Grayson says:
require_relative 'tokenizer'
There's no need to suffix it with .rb, as Ruby's smart enough to know that's what you mean anyway.
require loads a file from the $LOAD_PATH. If you want to require a file relative to the currently executing file instead of from the $LOAD_PATH, use require_relative.
I would recommend
load './tokenizer.rb'
given that you know the file is in the current working directory.
If you're trying to require it relative to the file, you can use
require_relative 'tokenizer'
I hope this helps.
Another nice little method is to include the current directory in your load path with
$:.unshift('.')
You could push it onto the $: ($LOAD_PATH) array but unshift will force it to load your current working directory before the rest of the load path.
Once you've added your current directory in your load path you don't need to keep specifying
require './tokenizer'
and can just go back to using
require 'tokenizer'
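A minimal sketch of the load path effect. It uses a temporary directory instead of '.', and a made-up file name (load_path_demo_lib.rb), so that it does not depend on where you run it:

```ruby
require 'tmpdir'

bare_require_failed = false
loaded = nil

Dir.mktmpdir do |dir|
  # Hypothetical library file, created only for this demonstration.
  File.write(File.join(dir, 'load_path_demo_lib.rb'), "LOAD_PATH_DEMO = :loaded\n")

  # The directory is not on the load path yet, so a bare name is not found.
  begin
    require 'load_path_demo_lib'
  rescue LoadError
    bare_require_failed = true
  end

  # unshift puts the directory at the front, so it is searched first.
  $LOAD_PATH.unshift(dir)
  loaded = require 'load_path_demo_lib'

  # Clean up so the temporary directory does not linger on the load path.
  $LOAD_PATH.delete(dir)
end

puts "bare require failed before unshift: #{bare_require_failed}"
puts "require after unshift returned: #{loaded.inspect}"
```

Keep in mind that '.' on the load path still refers to the process's working directory, so it carries the same assumption as require './tokenizer'.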
This works nicely when the code lives in a gem's lib directory and tokenizer.rb itself contains:
require_relative 'tokenizer/main'
For those who are absolutely sure their relative path is correct, my problem was that my files did not have the .rb extension! (Even though I used RubyMine to create the files and selected that they were Ruby files on creation.)
Double check the file extensions on your file!
What about including the current directory in the search path?
ruby -I. main.rb
I used jruby-1.7.4 to run my Ruby code.
require 'roman-numerals.rb'
is the line that raised the error below:
LoadError: no such file to load -- roman-numerals
require at org/jruby/RubyKernel.java:1054
require at /Users/amanoharan/.rvm/rubies/jruby-1.7.4/lib/ruby/shared/rubygems/custom_require.rb:36
(root) at /Users/amanoharan/Documents/Aptana Studio 3 Workspace/RubyApplication/RubyApplication1/Ruby2.rb:2
I removed the .rb extension from the require call:
require 'roman-numerals'
It worked fine.
The problem is that require does not load from the current directory. This is what I thought, too, but then I found this thread. For example, I tried the following code:
irb> f = File.new('blabla.rb')
=> #<File:blabla.rb>
irb> f.read
=> "class Tokenizer\n def self.tokenize(string)\n return string.split(\" \")\n end\nend\n"
irb> require f
LoadError: cannot load such file -- blabla.rb
from D:/dev/Ruby193/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
from D:/dev/Ruby193/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
from (irb):24
from D:/dev/Ruby193/bin/irb:12:in `<main>'
As you can see, the file was read without problems, but I could not require it (the path was not recognized). Here is code that works:
irb> f = File.new('D://blabla.rb')
=> #<File:D://blabla.rb>
irb> f.read
=> "class Tokenizer\n def self.tokenize(string)\n return string.split(\" \")\n end\nend\n"
irb> require f
=> true
If you specify the full path, the file loads correctly.
First:
$ sudo gem install colored2
Enter your password when prompted.
Then:
$ sudo gem update --system
This printed:
Updating rubygems-update
ERROR: While executing gem ... (OpenSSL::SSL::SSLError)
hostname "gems.ruby-china.org" does not match the server certificate
Then:
$ rvm -v
$ rvm get head
Finally:
What language do you want to use?? [ Swift / ObjC ]
ObjC
Would you like to include a demo application with your library? [ Yes / No ]
Yes
Which testing frameworks will you use? [ Specta / Kiwi / None ]
None
Would you like to do view based testing? [ Yes / No ]
No
What is your class prefix?
XMG
Running pod install on your new library.
You need to give the path, at least relative to the current directory. That should work:
./filename
How does one reliably determine a file's type? File extension analysis is not acceptable. There must be a rubyesque tool similar to the UNIX file(1) command?
This is regarding MIME or content type, not file system classifications, such as directory, file, or socket.
There is a ruby binding to libmagic that does what you need. It is available as a gem named ruby-filemagic:
gem install ruby-filemagic
It requires libmagic-dev.
The documentation seems a little thin, but this should get you started:
$ irb
irb(main):001:0> require 'filemagic'
=> true
irb(main):002:0> fm = FileMagic.new
=> #<FileMagic:0x7fd4afb0>
irb(main):003:0> fm.file('foo.zip')
=> "Zip archive data, at least v2.0 to extract"
irb(main):004:0>
If you're on a Unix machine try this:
mimetype = `file -Ib #{path}`.gsub(/\n/,"")
I'm not aware of any pure Ruby solutions that work as reliably as 'file'.
Edited to add: depending what OS you are running you may need to use 'i' instead of 'I' to get file to return a mime-type.
I found shelling out to be the most reliable. For compatibility on both Mac OS X and Ubuntu Linux I used:
file --mime -b myvideo.mp4
video/mp4; charset=binary
Ubuntu also prints video codec information if it can which is pretty cool:
file -b myvideo.mp4
ISO Media, MPEG v4 system, version 2
You can use this reliable method based on the magic header of the file:
def get_image_extension(local_file_path)
  png = Regexp.new("\x89PNG".force_encoding("binary"))
  jpg = Regexp.new("\xff\xd8\xff\xe0\x00\x10JFIF".force_encoding("binary"))
  jpg2 = Regexp.new("\xff\xd8\xff\xe1(.*){2}Exif".force_encoding("binary"))
  # Compare the first 10 bytes against known image signatures.
  case IO.read(local_file_path, 10)
  when /^GIF8/
    'gif'
  when /^#{png}/
    'png'
  when /^#{jpg}/
    'jpg'
  when /^#{jpg2}/
    'jpg'
  else
    # Fall back to the file(1) command (works on Linux and Mac).
    mime_type = `file #{local_file_path} --mime-type`.gsub("\n", '')
    # UnprocessableEntity must be defined by your application.
    raise UnprocessableEntity, "unknown file type" if mime_type.empty?
    mime_type.split(':')[1].split('/')[1].gsub('x-', '').gsub(/jpeg/, 'jpg').gsub(/text/, 'txt')
  end
end
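For comparison, here is a rough standard-library-only sketch of the same magic-header idea that does not shell out at all. The MAGIC_NUMBERS table and the sniff_mime_type helper are names invented for this example, and the table covers only a handful of well-known signatures, so treat it as a starting point rather than a replacement for file(1):

```ruby
require 'tempfile'

# Hypothetical lookup table: a few well-known file signatures and their MIME types.
MAGIC_NUMBERS = {
  "\x89PNG\r\n\x1A\n".b => 'image/png',
  'GIF87a'.b            => 'image/gif',
  'GIF89a'.b            => 'image/gif',
  "\xFF\xD8\xFF".b      => 'image/jpeg',
  '%PDF-'.b             => 'application/pdf',
  "PK\x03\x04".b        => 'application/zip'
}.freeze

# Returns the MIME type for a known signature, or nil when nothing matches.
def sniff_mime_type(path)
  # binread returns nil for an empty file when a length is given.
  header = File.binread(path, 8) || ''.b
  MAGIC_NUMBERS.each do |magic, mime|
    return mime if header.start_with?(magic)
  end
  nil
end

# Usage: write a fake PNG header to a temporary file and sniff it.
Tempfile.create('sample') do |f|
  f.binmode
  f.write("\x89PNG\r\n\x1A\n".b + 'rest of file')
  f.flush
  puts sniff_mime_type(f.path) # prints image/png
end
```

Returning nil for unknown content leaves the decision about a fallback (for example shelling out to file) to the caller.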
This was added as a comment on this answer but should really be its own answer:
path = 'path/to/your/file' # replace with the path to your file
IO.popen(
  ["file", "--brief", "--mime-type", path],
  in: :close, err: :close
) { |io| io.read.chomp }
I can confirm that it worked for me.
If you're using the File class, you can augment it with the following functions based on #PatrickRichie's answer:
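class File
  # MIME type as reported by file(1), e.g. "text/plain".
  def mime_type
    `file --brief --mime-type #{self.path}`.strip
  end

  # Character set as reported by file(1), e.g. "us-ascii".
  def charset
    `file --brief --mime #{self.path}`.split(';')[1].split('=')[1].strip
  end
end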
And, if you're using Ruby on Rails, you can drop this into config/initializers/file.rb and have it available throughout your project.
For those who came here via a search engine, a modern approach to finding the MIME type in pure Ruby is to use the mimemagic gem.
require 'mimemagic'
MimeMagic.by_magic(File.open('tux.jpg')).type # => "image/jpeg"
If you feel it is safe to rely on the file extension alone, you can use the mime-types gem:
require 'mime/types'
MIME::Types.type_for('tux.jpg') # => [#<MIME::Type: image/jpeg>]
You could give shared-mime a try (gem install shared-mime-info). It requires the Freedesktop shared-mime-info library, but does both filename/extension checks and "magic" checks. I tried giving it a whirl myself just now, but I don't have the Freedesktop shared-mime-info database installed and unfortunately have to do "real work". It might be what you're looking for, though.
Pure Ruby solution using magic bytes and returning a symbol for the matching type:
https://github.com/SixArm/sixarm_ruby_magic_number_type
I wrote it, so if you have suggestions, let me know.
I recently found mimetype-fu.
It seems to be the easiest reliable solution to get a file's MIME type.
The only caveat is that on a Windows machine it only uses the file extension, whereas on *Nix based systems it works great.
The best I found so far:
http://bogomips.org/mahoro.git/
The Ruby gem works well.
mime-types for ruby
You could give MIME::Types for Ruby a go.
This library allows for the identification of a file’s likely MIME content type. The identification of MIME content type is based on a file’s filename extensions.