I'm using Anemone to store crawled pages into MongoDB. It mostly works, except for accessing the page headers when I retrieve a page from MongoDB.
When I call collection.find_one("http://stackoverflow.com") I'll get the correct object from the data store, but I can't access the headers.
Anemone stores the headers as a hash, so theoretically, after retrieving the document, I should be able to do something like
document["headers"]["content-type"]
but that doesn't work because document["headers"] is a BSON::Binary.
puts document["headers"]
displays a mixture of text and binary characters.
How can I create a usable ruby hash object from the binary data that comes back from MongoDB?
EDIT: I haven't solved the original problem, but was able to modify Anemone so that I can have it load the data for me, which seems to work:
class NewMongo < Anemone::Storage::MongoDB
  def initialize(mongo_db, collection_name)
    @db = mongo_db
    @collection = @db[collection_name]
    # Do not delete the collection! I need it!
    # @collection.remove
    @collection.create_index 'url'
  end
end
And then later on...
repo = NewMongo.new(db, "pages")
repo.each do |url, page|
  puts page.content_type
end
If the data was stored in a binary format by the Anemone storage backend, there isn't much you can do unless you know the format or they provide a deserializer. It sounds like a bad choice for storing the headers, since a hash would be a more natural form for them.
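That said, if the blob is a Ruby Marshal dump (Anemone's Page#to_hash serializes the headers with Marshal.dump in the versions I've looked at), you can try unmarshalling it yourself. A sketch, untested against your data:

document = collection.find_one("url" => "http://stackoverflow.com")
raw = document["headers"].to_s # unwrap the BSON::Binary into a byte string
headers = Marshal.load(raw)    # => a plain Ruby hash, if it really is a Marshal dump
puts headers["content-type"]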
I am building a command line app that will generate metadata files amongst other things. I have a series of values that I want included, and I would like to insert those values into JSON format and then write the result to a .txt file.
The complicated part (to me at least) is that some of the values are dynamic (i.e. they may change every time a file is created), while other parts of the JSON file need to be static. Is there any sort of templating that may help with this? (json erb)
If I were to use a JSON ERB template, how would I write the result of the template (after it has been populated) to a .txt file, since this is not a Rails app and I thus would not be rendering a view?
Thank you in advance for any help.
It seems like two things could be helpful to you, but your question is pretty open-ended ...
First, if your JSON templates are complex (static and dynamic parts?), I suggest you look at a tool like RABL ...
https://github.com/nesquena/rabl
There is a railscast on RABL here:
http://railscasts.com/episodes/322-rabl
RABL lets you create templates for generating custom JSON output.
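To give a flavor of the syntax, a minimal RABL template might look like this (the show.rabl file name and the attributes are hypothetical):

# app/views/posts/show.rabl
object @post
attributes :id, :title               # static structure
node(:generated_at) { Time.now.utc } # dynamic value computed at render time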
Regarding writing to a file, you may or may not need to call the controller first. But the flow would be something like:
# sample_controller.rb
require 'json'

def get_sample
  @x = {:a => "apple", :b => "baker"}
  render json: @x
end
You can then call the controller action and write the rendered JSON to a file:
z = get_sample
File.open(yourfile, 'w') { |file| file.write(z) }
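Since this is not a Rails app, you can also skip controllers entirely and render a plain ERB template yourself. A sketch, with the template contents and the metadata.txt file name made up for illustration:

require 'erb'
require 'json'

# Static keys stay fixed in the template; dynamic values are
# interpolated from local variables at render time.
template = ERB.new <<-TEMPLATE
{
  "format_version": "1.0",
  "generated_at": "<%= Time.now.utc %>",
  "values": <%= values.to_json %>
}
TEMPLATE

values = { :a => "apple", :b => "baker" } # the dynamic part
File.open("metadata.txt", 'w') { |file| file.write(template.result(binding)) }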
I have written a Jekyll plugin to display the number of pageviews on a page by calling the Google Analytics API using the garb gem. The only trouble with my approach is that it makes a call to the API for each page, slowing down build time and also potentially hitting the user call limits on the API.
It would be possible to return all the data in a single call and store it locally, and then look up the pageview count from each page, but my Jekyll/Ruby-fu isn't up to scratch. I do not know how to write the plugin to run once to get all the data and store it locally where my current function could then access it, rather than calling the API page by page.
Basically my code is written as a liquid block that can be put into my page layout:
class GoogleAnalytics < Liquid::Block
  def initialize(tag_name, markup, tokens)
    super # options that appear in block (between tag and endtag)
    @options = markup # optional options passed in by opening tag
  end

  def render(context)
    path = super

    # Read in credentials and authenticate
    cred = YAML.load_file("/home/cboettig/.garb_auth.yaml")
    Garb::Session.api_key = cred[:api_key]
    token = Garb::Session.login(cred[:username], cred[:password])
    profile = Garb::Management::Profile.all.detect {|p| p.web_property_id == cred[:ua]}

    # place query, customize to modify results
    data = Exits.results(profile,
                         :filters => {:page_path.eql => path},
                         :start_date => Chronic.parse("2011-01-01"))
    data.first.pageviews
  end
end
The full version of my plugin is here.
How can I move all the calls to the API to some other function and make sure jekyll runs that once at the start, and then adjust the tag above to read that local data?
EDIT: It looks like this can be done with a Generator that writes the data to a file. See the example on this branch. Now I just need to figure out how to subset the results: https://github.com/Sija/garb/issues/22
To store the data, I had to:

1. Write a Generator class (see the Jekyll wiki on plugins) to call the API.
2. Convert the data to a hash (for easy lookup by path, see step 5):
   result = Hash[data.collect{|row| [row.page_path, [row.exits, row.pageviews]]}]
3. Write the data hash to a JSON file.
4. Read the data in from the file in my existing Liquid block class. Note that the block tag works from the _includes dir, while the generator works from the root directory.
5. Match the page path, which is easy once the data is converted to a hash:
   result[path][1]
Code for the full plugin, showing how to create the generator and write files, etc., is here.
And thanks to Sija on GitHub for help on this.
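For reference, here is a minimal sketch of what the Generator half can look like, reusing the credentials and garb query from the block above; the AnalyticsGenerator class name and the pageviews.json file name are my own choices, not from the actual plugin:

require 'json'

module Jekyll
  class AnalyticsGenerator < Generator
    def generate(site)
      # Authenticate once per build, not once per page
      cred = YAML.load_file("/home/cboettig/.garb_auth.yaml")
      Garb::Session.api_key = cred[:api_key]
      Garb::Session.login(cred[:username], cred[:password])
      profile = Garb::Management::Profile.all.detect {|p| p.web_property_id == cred[:ua]}

      # One query for all pages (no :page_path filter)
      data = Exits.results(profile, :start_date => Chronic.parse("2011-01-01"))
      result = Hash[data.collect{|row| [row.page_path, [row.exits, row.pageviews]]}]

      File.open("pageviews.json", "w") {|f| f.write(result.to_json) }
    end
  end
end

The render method in the Liquid block then shrinks to a file read plus a hash lookup: JSON.parse(File.read("pageviews.json"))[path][1].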
I have been looking at Padrino for a project I am working on, and it seems a great fit, as I would ideally be wanting to support data being sent and received as json.
However, I am wondering if there is any automated helper or built-in functionality to take data from a POST request (or other request) and put it into the model, without having to write custom logic for each model to process the data.
In the Blog example they briefly skim over this, but just seem to pass the parameter data into the initializer of their Post model, making me assume that it just magically knows what to do with everything... I am not sure if this is the case, and if so, whether it is Padrino functionality or ActiveRecord (as that's what they seem to use in the example).
I know I can use ActiveSupport for JSON-based encoding/decoding, but this just gives me a raw object, and since the storage concerns for each model reside within the main model class, I would need to use a mixin or something to achieve this, which seems nasty.
Are there any good patterns/functionality around doing this already?
Yep, you can use provides, and each response object will have to_json called on it, i.e.:

get :action, :provides => :json do
  @collection = MyCollection.all
  render @collection # will call @collection.to_json
end
Here's an example of some ugly code that updates certain models:

# Gemfile
gem 'json' # note that there are better and faster gems like yajl

# controller
post "/update/:model/:id", :provides => :json do
  if %w(Account Post Category).include?(params[:model])
    klass = params[:model].constantize
    record = klass.find(params[:id])
    record.update_attributes(JSON.parse(params[:attributes]))
  end
end
Finally, if you POST a request like:

attributes = { :name => "Foo", :category_id => 2 }.to_json
http://localhost:3000/update/Account/12?attributes=#{attributes}

you'll be able to update record 12 of the Account model.
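The same idea works for creating records, since ActiveRecord mass assignment accepts a plain attributes hash; a sketch, with the /create route invented for illustration:

post "/create/:model", :provides => :json do
  if %w(Account Post Category).include?(params[:model])
    klass = params[:model].constantize
    record = klass.create(JSON.parse(params[:attributes]))
    render record # will call record.to_json
  end
end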
Is there any way that I can fire a raw mongo query directly in Ruby instead of converting them to the native Ruby objects?
I went through the Ruby Mongo Tutorial, but I cannot find such a method anywhere.
If it were mysql, I would have fired a query something like this.
ActiveRecord::Base.connection.execute("Select * from foo")
My mongo query is a bit large and it is properly executing in the MongoDB console. What I want is to directly execute the same inside Ruby code.
Here's a (possibly) better mini-tutorial on how to get directly into the guts of your MongoDB. This might not solve your specific problem but it should get you as far as the MongoDB version of SELECT * FROM table.
First of all, you'll want a Mongo::Connection object. If
you're using MongoMapper then you can call the connection
class method on any of your MongoMapper models to get a connection
or ask MongoMapper for it directly:
connection = YourMongoModel.connection
connection = MongoMapper.connection
Otherwise I guess you'd use the from_uri constructor to build
your own connection.
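For example, assuming a local mongod on the default port:

connection = Mongo::Connection.from_uri("mongodb://localhost:27017")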
Then you need to get your hands on a database. You can do this
using array access notation, the db method, or by getting
the current one straight from MongoMapper:
db = connection['database_name'] # This does not support options.
db = connection.db('database_name') # This does support options.
db = MongoMapper.database # This should be configured like
# the rest of your app.
Now you have a nice shiny Mongo::DB instance in your hands.
But, you probably want a Collection to do anything interesting
and you can get that using either array access notation or the
collection method:
collection = db['collection_name']
collection = db.collection('collection_name')
Now you have something that behaves sort of like an SQL table so
you can count how many things it has or query it using find:
cursor = collection.find(:key => 'value')
cursor = collection.find({:key => 'value'}, :fields => ['just', 'these', 'fields'])
# etc.
And now you have what you're really after: a hot out of the oven Mongo::Cursor
that points at the data you're interested in. Mongo::Cursor is
an Enumerable so you have access to all your usual iterating
friends such as each, first, map, and one of my personal
favorites, each_with_object:
a = cursor.each_with_object([]) { |x, a| a.push(mangle(x)) }
There are also command and eval methods on Mongo::DB that might do what you want.
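For example, command takes the same document you would pass to db.runCommand in the mongo shell; a sketch with a made-up collection name:

stats = db.command(:collstats => 'collection_name')
n = db.command(:count => 'collection_name', :query => {:key => 'value'})['n']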
In case you are using Mongoid, you will find the answer to your question here.
If you're using Mongoid 3, it provides easy access to its MongoDB driver: Moped. Here's an example of accessing some raw data without using Models to access the data:
db = Mongoid::Sessions.default
# inserting a new document
collection = db[:collection_name]
collection.insert(name: 'my new document')
# finding a document
doc = collection.find(name: 'my new document').first
# "select * from collection"
collection.find.each do |document|
  puts document.inspect
end
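Moped sessions can also run raw database commands, much like db.runCommand in the shell; a sketch with a made-up collection name:

db.command(ping: 1)
db.command(collstats: 'collection_name')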
Does anyone have any insights into using CarrierWave with an ActiveResource model (in Rails 3)? I've got an ActiveResource model with field for the filename, and I want to save the file to the remote filesystem.
I've tried a few things without much success (or conviction that I was doing anything remotely correctly), so I'd appreciate suggestions from anyone who's successfully implemented CarrierWave without using the ORM modules already included in the gem.
I'm probably late for this as the original author has moved on, but this question comes up at the top when someone searches for "carrierwave activeresource", so I thought it was still worth answering.
For the sake of discussion, let's assume we have a model named Artist with a picture named artist_picture mounted as a CarrierWave uploader. With ActiveRecord, you would assign this picture to a File:
artist.artist_picture = File.open('ravello.jpg')
And when you save artist:
artist.save!
the picture will be saved, also.
Now, let's say I create a resource based on this:
class Artist < ActiveResource::Base
end
If I subsequently read in an artist:
artist = Artist.find(1)
and look at it, I'll find this in there:
#<Artist:0x39432039 @attributes={"id"=>1, "name"=>"Ravello", "artist_picture"=>#<ArtistPicture:0x282347249243 @attributes={"url"=>"/uploads/artists/artist_picture/1/ravello.jpg"}, @prefix_options={}, @persisted=false>}, @prefix_options={}, @persisted=false>
Interestingly, artist_picture is itself a model and we could declare it and play around with it if we wanted. As it is, you can use the url to grab the picture if you want. But let's talk instead about uploading another picture.
We can add this little bit of code to the Artist model on the server side:
def artist_picture_as_base64=(picsource)
  tmpfile = Tempfile.new(['artist', '.jpg'], Rails.root.join('tmp'), :encoding => 'BINARY')
  begin
    tmpfile.write(Base64.decode64(picsource.force_encoding("BINARY")))
    file = CarrierWave::SanitizedFile.new(tmpfile)
    file.content_type = 'image/jpg'
    self.artist_picture = file
  ensure
    tmpfile.close!
  end
end
I'm just showing a simple example - you should probably pass the original filename, also. Anyway, on the resource side:
class Artist < ActiveResource::Base
  def artist_picture=(filename)
    self.artist_picture_as_base64 = Base64.encode64(File.read(filename))
  end
end
At this point, on the resource side you need only set artist_picture to a filename and it will be encoded and sent when the resource is saved. On the server side, the file will be decoded and saved. Presumably you could skip the base64 encoding by just forcing the string to binary encoding, but it craps out when I do that and I don't have the patience to track it down. Encoding as base64 works.
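Putting the two halves together, usage on the resource side then looks like this (a sketch):

artist = Artist.find(1)
artist.artist_picture = 'ravello.jpg' # reads and base64-encodes the file
artist.save                           # the server decodes it and hands it to CarrierWave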