I need to read an external file in ruby.
Running file -i locally shows
text/plain; charset=utf-16le
I open it with Ruby's CSV using the separator '\t', and a row shows as:
<CSV::Row "\xFF\xFEC\x00a\x00n\x00d\x00i\x00d\x00a\x00t\x00e\x00 \x00n\x00u\...
row.to_s produces \x000\x000\x000\x001\x00\t\x00E\x00D\x00O
Running puts row shows the data correctly:
0001 EDOARDO A...
(the values also show legibly in vim and LibreOffice Calc)
Any suggestions on how to get the data in Ruby? I've tried various combinations of opening the CSV with external_encoding: 'utf-16le', internal_encoding: "utf-8" etc., but puts is the only thing that gives legible values.
The CSV object also reports its encoding as ASCII-8BIT:
<#CSV io_type:StringIO encoding:ASCII-8BIT lineno:0 col_sep:"\\t" row_sep:"\n" quote_char:"\"" headers:true>
The file itself was produced as an XLS file. I have uploaded an edited version here (edited in gvim).
This is working fine for me:
require 'csv'
CSV.foreach("file.xls", encoding: "UTF-16LE:UTF-8", col_sep: "\t") do |row|
puts row.inspect
end
this will produce the following output:
["Candidate number", "First name", "Last name", "Date of birth", "Preparation centre", "Result", "Score", "Reading and Writing", "Listening", "Speaking", "Result enquiry", "Raised on", "Raised by", "Enquiry status", "Withdrawn on", "Withdrawn by", nil]
["0001", "EDOARDO", "AGNEW", "20/01/2001", "Fondazione Istituto Massimo", "RY5-G8-Y2", "-", nil, nil, nil, "-", "00000000", nil, nil, "00000000", nil, nil]
As you can see, each row is an array of strings, one per column in the document.
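If the data arrives as a binary (ASCII-8BIT) string rather than as a file on disk, as the StringIO in the question suggests, the same transcoding can be done by hand. This is only a sketch: the sample bytes are made up to mimic the file in the question, and the variable names are hypothetical.

```ruby
require 'csv'

# Hypothetical sample: UTF-16LE bytes with a BOM, tagged as binary (ASCII-8BIT),
# which mimics what the question shows in the CSV::Row inspect output.
raw = ("\uFEFF" + "Candidate number\tFirst name\n0001\tEDOARDO\n").encode("UTF-16LE").b

# Reinterpret the bytes as UTF-16LE, transcode to UTF-8, then drop the BOM.
utf8 = raw.force_encoding("UTF-16LE").encode("UTF-8").sub(/\A\uFEFF/, "")

table = CSV.parse(utf8, col_sep: "\t", headers: true)
first_name = table[0]["First name"]
candidate_number = table[0]["Candidate number"]
```

force_encoding only relabels the bytes, while encode actually converts them, so both steps are needed here.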
The issue was that I was reading from a Paperclip attachment, which needed to have the encoding set (overridden) before saving.
Adding s3_headers in the model worked:
has_attached_file :attachment, s3_headers: lambda { |attachment|
{
'content-Type' => 'text/csv; charset=utf-16le'
}
}
Thanks to Julien for tipping me off that the issue was related to the paperclip attachment (that solution works to read the file directly)
I have an array of JSON files. This is a sample of a single JSON file:
{
"job": [
"admin",
"developer"
],
"name": "dave"
}
I need to get the "name" value if "admin" exists in "job", and I need to do the same for the other JSON files in the array.
Any help would be appreciated.
I am assuming that if hash["job"] is present, it's an Array.
require 'json'
str = '{ "job": [ "admin", "developer" ], "name": "dave"}'
hash = JSON::parse(str)
# => {"job"=>["admin", "developer"], "name"=>"dave"}
name = hash["name"] if hash["job"] && hash["job"].include?("admin")
# => "dave"
Read the JSON file into a hash as follows.
1) You need to require JSON before JSON parse.
require 'json'
If the above step raises a LoadError, you probably don't have the json gem installed on your machine (a return value of false only means the library was already loaded). Install the JSON gem using the following command.
gem install json
2) Read the JSON file: load its contents into a string.
file = File.read('file-name-to-be-read.json')
The above command reads the entire file into a string.
3) Now parse the data from the file.
data_hash = JSON.parse(file)
The above command parses the string held in the variable 'file', and the variable data_hash will hold the resulting hash.
4) Now if we take the example mentioned in the question.
{
"job": [
"admin",
"developer"
],
"name": "dave"
}
require 'json'
file = File.read('file-name-to-be-read.json')
data_hash = JSON.parse(file)
The data_hash will contain {"job"=>["admin", "developer"], "name"=>"dave"}
Now the key "job" in the above hash holds the array ["admin","developer"]. You can use the following ternary expression to get the name if one of the jobs is "admin".
data_hash["job"].include?('admin') ? data_hash["name"] : "not found"
include? checks whether 'admin' is in the job array; if it is, the expression returns the name.
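To apply the same check across several JSON documents, as the question asks, something like the following could work. It is a minimal sketch: the documents array is a hypothetical stand-in for the contents of the files (in practice each string would come from File.read), and Array(...) is used so that a missing "job" key is treated as an empty list.

```ruby
require 'json'

# Hypothetical stand-ins for the contents of several JSON files.
documents = [
  '{ "job": ["admin", "developer"], "name": "dave" }',
  '{ "job": ["developer"], "name": "erin" }',
  '{ "name": "sam" }'
]

admin_names = documents
  .map { |doc| JSON.parse(doc) }                               # string -> hash
  .select { |hash| Array(hash["job"]).include?("admin") }      # keep admins only
  .map { |hash| hash["name"] }                                 # pull out the name
```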
I'm not a Ruby coder, but I do need to read a Json file and access to nodes using such a language.
I did my homework, but I can't finish because of my lack of knowledge, which I hope you can compensate for.
Let's start with my sample Json file.
{
"app": [{
"name":"test",
"ip_address": "172.90.90.90"
}],
"mysql": [{
"server_password": "root",
"dbName":"dbname"
}],
"phpmyadmin": [{
"app_username": "root",
"app_password": "root"
}]
}
And this is the little code fragment I put together to read the file.
require 'json'
data = JSON.parse(File.read("data.json"))
Now, as long as I do something like
print data[0]
or
print data["app"]
everything is fine, but if I try to access the subnode "app"."name", no matter the format or the brackets I use, I always get a system exception. I was expecting the most reasonable way to do this to be something like data["app"]["name"], but that is clearly not the case.
I'm testing this with the Ruby interpreter on Mac OS X, and the Ruby version should be the latest as far as I can tell (ruby 2.0.0p247).
Can you please help me out?
Thanks, and have a happy start to the new year.
The reason is that data["app"] is an array:
1.9.3p484 :001 > require 'json'
=> true
1.9.3p484 :002 > data = JSON.parse(File.read("/Users/example/Desktop/json.json"))
=> {"app"=>[{"name"=>"test", "ip_address"=>"172.90.90.90"}], "mysql"=>[{"server_password"=>"root", "dbName"=>"dbname"}], "phpmyadmin"=>[{"app_username"=>"root", "app_password"=>"root"}]}
1.9.3p484 :003 > print data["app"]
[{"name"=>"test", "ip_address"=>"172.90.90.90"}]
If you do data["app"].first["name"], you'll get what you want:
1.9.3p484 :004 > print data["app"].first["name"]
test
In your sample data, app contains an Array, so you need to access it as such:
data["app"][0]["name"]
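As a small illustration of the point above, here is a sketch using an inline copy of part of the sample data (the variable names are hypothetical). It also shows Hash#dig, available from Ruby 2.3, which returns nil instead of raising when an intermediate key is missing.

```ruby
require 'json'

# Inline copy of part of the sample file from the question.
doc = <<~DOC
  {
    "app": [{ "name": "test", "ip_address": "172.90.90.90" }],
    "mysql": [{ "server_password": "root", "dbName": "dbname" }]
  }
DOC

data = JSON.parse(doc)

name      = data["app"].first["name"]      # index into the array first
same_name = data.dig("app", 0, "name")     # Hash#dig walks hash and array levels
missing   = data.dig("redis", 0, "name")   # nil, because "redis" is absent
```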
I have some code written in Ruby 1.9.2 patch level 136, and I'm having an issue: when I perform a find by _id with the raw Ruby mongo driver, I get nil when the value comes from a CSV file. Here's the code:
require 'mongo'
require 'csv'
require 'bson'
# Games database
gamedb = Mongo::Connection.new("localhost", 27017).db("gamedb")
@games = gamedb.collection("games")
# Loop over CSV data.
CSV.foreach("/tmp/somedata.csv") do |row|
puts row[0] # Puts the ObjectId
@game = @games.find( { "_id" => row[0] } ).first
puts @game.inspect
end
end
The CSV file looks like this:
_id,game_title,platform,upc_db_match,upc
4ecdacc339c7d7a2a6000002,TMNT,PSP,TMNT,085391157663
4ecdacc339c7d7a2a6000004,Super Mario Galaxy,Wii,Super Mario Galaxy,045496900434
4ecdacc339c7d7a2a6000005,Beowulf,PSP,Beowulf,097363473046
The first column is the ObjectId in Mongo that I already have. If I perform a local find from the mongo command line with the values in the first column, I get the data I want. However, the code above returns nil on the @game.inspect call.
I've tried the following variations, which all produce nil:
@game = @games.find( { "_id" => row[0].to_s } ).first
@game = @games.find( { "_id" => row[0].to_s.strip } ).first
I've even tried building the ObjectId with the BSON classes as such:
@game = @games.find( { "_id" => BSON::ObjectId(row[0]) } ).first
or
@game = @games.find( { "_id" => BSON::ObjectId("#{row[0]}") } ).first
Both of which output the following error:
/Users/donnfelker/.rvm/gems/ruby-1.9.2-p136@upc-etl/gems/bson-1.4.0/lib/bson/types/object_id.rb:126:in `from_string': illegal ObjectId format: _id (BSON::InvalidObjectId)
from /Users/donnfelker/.rvm/gems/ruby-1.9.2-p136@upc-etl/gems/bson-1.4.0/lib/bson/types/object_id.rb:26:in `ObjectId'
from migrate_upc_from_csv.rb:14:in `block in <main>'
from /Users/donnfelker/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1768:in `each'
from /Users/donnfelker/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1202:in `block in foreach'
from /Users/donnfelker/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1340:in `open'
from /Users/donnfelker/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1201:in `foreach'
from migrate_upc_from_csv.rb:10:in `<main>'
The crazy thing is, if I manually create the BSON ObjectId by hand it works (as shown below):
@game = @games.find( { "_id" => BSON::ObjectId("4ecdacc339c7d7a2a6000004") } ).first
When I run @game.inspect I get my data back, as I would expect. However, if I change this to use row[0], I get nil.
Why? What am I doing wrong?
System Details
$ gem list
*** LOCAL GEMS ***
bson (1.4.0)
bson_ext (1.4.0)
mongo (1.4.0)
RVM Version: rvm 1.6.9
Ruby Version: ruby 1.9.2p136 (2010-12-25 revision 30365) [x86_64-darwin10.6.0]
Mongo Version:
[initandlisten] db version v1.8.2, pdfile version 4.5
[initandlisten] git version: 433bbaa14aaba6860da15bd4de8edf600f56501b
Again, why? What am I doing wrong here? Thanks!
The first row is not being read as a header. To do that, pass in :headers => true like this:
require 'csv'
# Loop over CSV data.
CSV.foreach("/tmp/somedata.csv", :headers => true) do |row|
puts row[0] # Puts the ObjectId
end
If you do not pass the :headers parameter in you can see the first row[0] object is the string "_id":
_id
4ecdacc339c7d7a2a6000002
4ecdacc339c7d7a2a6000004
4ecdacc339c7d7a2a6000005
When you include it, you are golden:
4ecdacc339c7d7a2a6000002
4ecdacc339c7d7a2a6000004
4ecdacc339c7d7a2a6000005
Are you sure your CSV parsing code isn't treating the headers as a first line of data and actually tries to do BSON::ObjectId("_id")? The error message kinda looks like it. Try with FasterCSV.foreach('/tmp/somedata.csv', :headers => true) and using row['_id'] (IIRC you'll still have to use BSON::ObjectID).
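The difference the :headers option makes can be seen without a database. The sketch below parses an inline copy of the sample CSV both ways; the 24-hex-digit pattern is only an assumed stand-in for the driver's own ObjectId validation, not part of the bson gem.

```ruby
require 'csv'

# Inline copy of the first rows of the CSV from the question.
csv_text = <<~DATA
  _id,game_title,platform,upc_db_match,upc
  4ecdacc339c7d7a2a6000002,TMNT,PSP,TMNT,085391157663
  4ecdacc339c7d7a2a6000004,Super Mario Galaxy,Wii,Super Mario Galaxy,045496900434
DATA

# Assumption: a legal ObjectId string is exactly 24 hex digits.
object_id_pattern = /\A\h{24}\z/

# Without headers, the first "row" is the header line itself.
first_cell_without_headers = CSV.parse(csv_text).first[0]

# With headers, rows are addressable by column name and the header is skipped.
ids = CSV.parse(csv_text, headers: true).map { |row| row["_id"] }
all_legal = ids.all? { |id| id.match?(object_id_pattern) }
```

Here first_cell_without_headers is the literal string "_id", which is exactly the value the error message complains about.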
This is purely an experiment, but I'm wondering if it's possible to get a list of the require'd gems at runtime via some kind of metaprogramming. For example, say I have:
require 'rubygems'
require 'sinatra'
require 'nokogiri'
# don't know what to do here
How can I print out the following at runtime?
this app needs rubygems, sinatra, nokogiri
You can't do this exactly, because requiring one file may require others, and Ruby can't tell the difference between the file that you required and the file that someone else required.
You can check out $LOADED_FEATURES for a list of every single thing that's been required. But you should use Bundler if you want to specify dependencies explicitly.
Here's a thoroughly imperfect way to guess at the gem names and enumerate everything:
ruby-1.9.2-p180 :001 > $LOADED_FEATURES.
select { |feature| feature.include? 'gems' }.
map { |feature| File.dirname(feature) }.
map { |feature| feature.split('/').last }.
uniq.sort
=> ["1.9.1", "action_dispatch", "action_pack", "action_view", "actions", "active_model", "active_record", "active_support", "addressable", "agent", "array", "aws", "builder", "bundler", "cache_stores", "cancan", "cdn", "class", "client", "common", "compute", "connection", "control", "controllers", "core", "core_ext", "core_extensions", "css", "data_mapper", "decorators", "dependencies", "dependency_detection", "deprecation", "devise", "digest", "dns", "encodings", "encryptor", "engine", "errors", "excon", "ext", "failure", "faraday", "fields", "fog", "formatador", "geographer", "haml", "hash", "helpers", "heroku_san", "hmac", "hooks", "hoptoad_notifier", "html", "http", "i18n", "idna", "importers", "inflector", "initializers", "instrumentation", "integrations", "interpolate", "interval_skip_list", "jquery-rails", "json", "kaminari", "kernel", "lib", "mail", "metric_parser", "mime", "mixins", "model_adapters", "models", "module", "mongo_mapper", "mongoid", "multibyte", "new_relic", "node", "nokogiri", "numeric", "oauth", "object", "omniauth", "orm_adapter", "package", "parser", "parsers", "plugin", "pp", "providers", "queued", "rack", "rails", "railtie", "redis", "request", "request_proxy", "response", "resque", "retriever_methods", "routing", "ruby_extensions", "ruby_flipper", "rubygems", "runtime", "samplers", "sass", "sax", "script", "scss", "selector", "sequel", "ses", "shell", "signature", "simple_geo", "state_machine", "stats_engine", "storage", "strategies", "string", "tar_reader", "template", "terremark", "thor", "tokens", "tree", "treetop", "twitter", "us", "util", "vendor", "version_specific", "visitors", "warden", "xml", "xml_mini", "xpath", "xslt"]
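A less noisy guess, assuming RubyGems is loaded (the default), is to ask RubyGems which gems it has activated via Gem.loaded_specs. This is only a sketch: it lists activated gems, including dependencies pulled in indirectly, so it still cannot tell which ones the app required itself.

```ruby
require 'json'

# Names of gems RubyGems has activated so far in this process.
activated_gems = Gem.loaded_specs.keys.sort

# Every file path recorded by require that mentions "json".
json_features = $LOADED_FEATURES.grep(/json/)

puts "this app has activated: #{activated_gems.join(', ')}"
```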
Here's a way to get all the calls to require. Create this file: show_requires.rb
alias :orig_require :require
def require s
print "Requires #{s}\n" if orig_require(s)
end
Then start your app with
ruby -r show_requires.rb myapp.rb
This produces something like:
C:\code\test>ruby -r show_requires.rb test.rb
Requires stringio
Requires yaml/error
Requires syck
Requires yaml/ypath
Requires yaml/basenode
Requires yaml/syck
Requires yaml/tag
Requires yaml/stream
Requires yaml/constants
Requires date/format
Requires date
Requires yaml/rubytypes
Requires yaml/types
Requires yaml
Requires etc
Requires dl
Requires rbreadline
Requires readline
If you want only the top-level requires, add a global to track the nesting level:
$_rq_lvl = 0
alias :orig_require :require
def require s
$_rq_lvl+=1
print "Requires #{s}\n" if orig_require(s) and $_rq_lvl == 1
$_rq_lvl -=1
end
Then you get:
C:\code\test>ruby -r require_test.rb test.rb
Requires yaml
Requires readline
Just a slight touch to add to the previous answer: to precisely replace the behaviour of #require, the override must also return a boolean value, so this is a more faithful version:
module Kernel
alias :orig_require :require
def require(name)
print "Requiring #{name}"
is_okay = orig_require(name)
puts " - #{is_okay ? 'Yup!' : 'Nope :('}"
is_okay
end
end
Interestingly, this became necessary during some testing I was doing, while tracking down a chain of failures triggered by requiring a module!
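The two ideas above (tracking the nesting level and preserving the return value) can be combined. The following is a sketch under assumptions: the global names and the require_without_trace alias are made up for illustration, and it records names in an array instead of printing them.

```ruby
# Hypothetical globals used to track nesting depth and collect results.
$require_depth = 0
$top_level_requires = []

module Kernel
  alias_method :require_without_trace, :require

  def require(name)
    $require_depth += 1
    loaded = require_without_trace(name)
    # Record only requires issued at the outermost level that loaded something new.
    $top_level_requires << name if loaded && $require_depth == 1
    loaded # keep the original boolean return value
  ensure
    $require_depth -= 1
  end
  private :require
end

result = require 'shellwords'
```

The ensure clause keeps the depth counter balanced even when a require raises LoadError.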