Upload to S3 with progress in plain Ruby script

This question is related to this one: Tracking Upload Progress of File to S3 Using Ruby aws-sdk.
However, since there is no clear solution there, I was wondering if there's a better/easier way (if one exists) of getting file upload progress with S3 using Ruby in 2018?
In my current setup I'm basically creating a new Resource, fetching my bucket and calling upload_file, but I haven't yet found any option for passing a block which would help in yielding some sort of progress.
...
@connection = Aws::S3::Resource.new
@s3_bucket = @connection.bucket(bucket)
@s3_bucket.object(path).upload_file(data, { acl: 'public-read' })
...
Is there a way to do this using the newest sdk-for-ruby v3?
Any help (or even better a small example) would be great.

The example Trevor gives in https://stackoverflow.com/a/12147709/153886 is not hacky from what I can see - just wiring things together. The SDK simply does not provide a feature for passing progress details on all operations. Plus, Trevor is the maintainer of the Ruby SDK at AWS so I trust his judgement.
Expanding on his example:
require 'ruby-progressbar'

bar = ProgressBar.create(:title => "Uploading action", :starting_at => 0, :total => file.size)
obj = s3.buckets['my-bucket'].objects['object-key'] # v1-style SDK API
obj.write(:content_length => file.size) do |writable, n_bytes|
  writable.write(file.read(n_bytes))
  bar.progress += n_bytes
end
If you want a progress block right in the upload_file method, I believe you will need to open a PR to the SDK. It is not that strange that this is missing in Ruby (or in any other runtime) because, for example, there could be an optimisation in the HTTP client library that uses IO.copy_stream from your source body argument to the destination socket, which does not relay progress anywhere.
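If you are on the v3 SDK and willing to drop down to put_object (which streams any IO-like body), the same wiring idea still works: wrap the source IO so every read reports progress. This is a minimal sketch under that assumption; the ProgressIO class is hypothetical glue code, not part of the SDK:
require 'aws-sdk-s3'
require 'ruby-progressbar'
require 'delegate'

# Hypothetical glue: report bytes to the progress bar as the SDK reads them.
class ProgressIO < SimpleDelegator
  def initialize(io, bar)
    super(io)
    @bar = bar
  end

  def read(*args)
    chunk = __getobj__.read(*args)
    @bar.progress += chunk.bytesize if chunk
    chunk
  end
end

File.open('large_file.bin', 'rb') do |file|
  bar = ProgressBar.create(:title => 'Uploading', :total => file.size)
  client = Aws::S3::Client.new
  client.put_object(bucket: 'my-bucket', key: 'object-key',
                    acl: 'public-read', body: ProgressIO.new(file, bar))
end
The trade-off versus upload_file is that put_object issues a single PUT, so you give up the automatic multipart handling for large files.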

Related

Download a GitHub file with Octokit::Client in Ruby

I am trying to download an XML file from GitHub via Octokit::Client in Ruby. This is in a Dangerfile, so I have access to the client via github.api.
I have got something working with the following code:
listing = client.contents('erik-allen/RepoName', :path => 'Root/Path')
download = open(listing[0].download_url)
I can then call Nokogiri::XML(download) and parse the XML with no issues.
However, it only works because it is the only file in the directory. It also does not feel like the correct way to do things.
I have tried a few other ways:
download = client.contents('erik-allen/RepoName', :path => 'Root/Path/File.xml')
That returned a Sawyer::Resource, but I have yet to find a way to get any data out of it. I tried combinations of .get.data and .data, but neither worked. Calling Base64.decode64() on the result did not yield anything either.
I suspect I may need an "accept" header, but I am not sure how I would set that with Octokit::Client.
Does anyone have any suggestions? I would have assumed this would be a common task, but I can find no examples.
I was eventually able to figure things out. There is a content property on the Sawyer::Resource that gives the Base64 data, so the final solution is:
contents = client.contents('erik-allen/RepoName', :path => 'Root/Path/File.xml')
download = Base64.decode64(contents.content)
I can then call Nokogiri::XML(download) and parse the XML with no issues.
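On the "accept" header hunch from the question: Octokit methods take an :accept option for GitHub's custom media types, and assuming the contents endpoint honors the raw media type, this should return the file body directly as a string, skipping the Base64 step:
raw = client.contents('erik-allen/RepoName',
                      :path   => 'Root/Path/File.xml',
                      :accept => 'application/vnd.github.v3.raw')
doc = Nokogiri::XML(raw)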

How to upload large archive to Amazon Glacier using ruby and aws-sdk?

The Amazon documentation for Glacier doesn't seem to include any Ruby examples, and the documentation itself is rather sparse.
I gather I need to instantiate a Glacier Client object and then use the upload_multipart_part method to access the Glacier API, but I'm not sure how to massage the arguments to pass to upload_multipart_part. How do I calculate the checksums that AWS is looking for using Ruby? And what is upload_id?
UPDATE
Figured out most of this by reading Amazon's documentation. They don't appear to have any code samples for Ruby, and the GitHub repo of code examples does not cover Glacier. But by looking at the raw API documentation and some Java and PHP examples, it looks like I'd do this:
client = AWS::Glacier::Client.new(
  access_key_id: ACCESS_KEY_ID,
  secret_access_key: SECRET_ACCESS_KEY
)
resp = client.initiate_multipart_upload(
  account_id: ACCOUNT_ID,
  vault_name: 'My Vault',
  archive_description: "Backup of some stuff",
  part_size: PART_SIZE_IN_BYTES
)
And if all goes well, the Amazon API response should include a unique upload_id which I would then use in subsequent calls using client.upload_multipart_part().
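Presumably something like this, going by the raw REST API (the range is a Content-Range style string; this is untested guesswork in the same spirit as the rest of this update):
resp2 = client.upload_multipart_part(
  account_id: ACCOUNT_ID,
  vault_name: 'My Vault',
  upload_id: resp[:upload_id],
  range: "bytes 0-#{PART_SIZE_IN_BYTES - 1}/*",
  checksum: part_checksum, # see the note on tree hashes below
  body: part_data
)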
I'm guessing the checksums can be calculated like this:
Digest::SHA256.file(file_to_upload).hexdigest
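One caveat on that guess: Glacier's checksums are SHA-256 tree hashes (digest each 1 MB chunk, then repeatedly hash adjacent digest pairs until a single root remains), not a plain digest of the whole file. A sketch of that algorithm:
require 'digest'

# SHA-256 tree hash as Glacier defines it: digest 1MB chunks,
# then hash adjacent digest pairs until one root digest remains.
def tree_hash(path)
  digests = []
  File.open(path, 'rb') do |f|
    while (chunk = f.read(1024 * 1024))
      digests << Digest::SHA256.digest(chunk)
    end
  end
  while digests.size > 1
    digests = digests.each_slice(2).map { |a, b| b ? Digest::SHA256.digest(a + b) : a }
  end
  digests.first.unpack1('H*')
end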
UPDATE 2
Seems like this has already been solved:
https://github.com/fog/fog
For anyone else interested, this link is helpful, pretty much covers most of what you need:
http://www.spacevatican.org/2012/9/4/using-glacier-with-fog/
It doesn't spell out how to instantiate a Glacier object, though, which looks like this:
glacier = Fog::AWS::Glacier.new(
  :aws_access_key_id => YOUR_ACCESS_KEY_ID,
  :aws_secret_access_key => YOUR_SECRET_ACCESS_KEY
)
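The linked post then boils the upload itself down to roughly this (vault name, file name and chunk size are examples; fog takes care of splitting parts and computing the tree hashes, assuming I'm reading its Glacier model correctly):
vault = glacier.vaults.create(:id => 'my-vault')
vault.archives.create(
  :body => File.new('backup.tar.gz'),
  :multipart_chunk_size => 32 * 1024 * 1024, # must be a power-of-two multiple of 1MB
  :description => 'Backup of some stuff'
)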

Render a view's output later via a delayed_job

If I render HTML I get HTML to the browser, which works great. However, how can I get a route's response (the HTML) when calling it from a module or class?
I need to do this because I'm sending documents to DocRaptor, and rather than store the markup/HTML in a DB column, I would like to store record IDs and create the markup when the job executes.
A possible solution is to use Ruby's HTTP library (HTTParty, wget, or similar) to open up the route and use the response body. Before doing so I thought I'd ask around.
Thanks!
-- Update --
Here's something like what I ended up doing. Quick tip: in case anyone does this and needs their helper methods, you have to extend ActionView with ApplicationHelper, as in the third line below:
av = ActionView::Base.new
av.view_paths = ActionController::Base.view_paths
av.extend ApplicationHelper # or any other helpers your template may need
body = av.render(:template => "orders/receipt.html.erb", :locals => { :order => order })
Link:
http://www.rigelgroupllc.com/blog/2011/09/22/render-rails3-views-outside-of-your-controllers/
Also check this question out; it contains the code you probably want in an answer:
Rails 3 > Rendering views in rake task
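To tie this back to the delayed_job part of the question, here is a sketch of how the pieces could fit together (ReceiptRenderJob and the Order model are hypothetical examples): enqueue only the record ID and rebuild the HTML at perform time.
class ReceiptRenderJob < Struct.new(:order_id)
  def perform
    order = Order.find(order_id)
    av = ActionView::Base.new
    av.view_paths = ActionController::Base.view_paths
    av.extend ApplicationHelper
    html = av.render(:template => "orders/receipt.html.erb",
                     :locals => { :order => order })
    # hand `html` off to DocRaptor here
  end
end

Delayed::Job.enqueue(ReceiptRenderJob.new(order.id))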

Ruby -- Using facebook's Graph API Explorer in conjunction with the koala gem

I've found Facebook's Graph API Explorer tool (https://developers.facebook.com/tools/explorer/) to be an incredibly easy, beginner-friendly, and effective way to use Facebook's Graph API via its GUI.
I'd like to be able to use the koala gem to pass these generated URLs to facebook's api.
Right now, let's say I have a query like this:
url = "me?fields=id,name,posts.fields(likes.fields(id,name),comments.fields(parent,likes.fields(id,name)),message)"
I'd like to be able to pass that directly into koala as a single string.
@graph.get_connections(url)
It doesn't like that, so I separate out the UID and the ? operator, as the gem seems to want:
url = "fields=id,name,posts.fields(likes.fields(id,name),comments.fields(parent,likes.fields(id,name)),message)"
@graph.get_connections("me", url)
This, however, returns an error as well:
Koala::Facebook::AuthenticationError:
type: OAuthException, code: 2500,
message: Unknown path components: /fields=id,name,posts.fields(likes.fields(id,name),comments.fields(parent,likes.fields(id,name)),message) [HTTP 400]
Currently this is where I am stuck. I'd like to continue using koala because I like the gem approach to working with APIs, especially when it comes to OAuth and OAuth2.
UPDATE:
I'm starting to break down the request into pieces which the koala gem can handle, for example:
posts = @graph.get_connections("me", "posts")
postids = posts.map { |p| p['id'] }
likes = postids.inject([]) { |ary, id| ary << @graph.get_connections(id, "likes") }
So that's a long way of getting two arrays, one of posts, one of like data.
But I'd burn up my API request limit in no time using this kind of approach.
I was kind of hoping I'd just be able to pass the whole string from the Graph API Explorer and just get what I wanted rather than having to manually parse all this stuff.
I don't really know about your posts.fields(likes.fields(id,name)) syntax (it does not work in the Graph API Explorer) and stuff like that, but I know you can do this:
fb_api = Koala::Facebook::API.new(access_token)
fb_api.api("/me?fields=id,name,posts")
# => {"id"=>"71170", "name"=>"My Name", "posts"=>{"paging"=>{"next"=>"https://graph.facebook.com/71170/posts?access_token=CAAEO&limit=25&until=13705022", "previous"=>"https://graph.facebook.com/711737070/posts?access_token=CAAEOTYMZD&limit=25&since=1370723&__previous=1"}, "data"=>[{"id"=>"71170_1013572471", "comments"=>{"count"=>0}, "created_time"=>"2013-06-09T08:03:43+0000", "from"=>{"id"=>"71170", "name"=>"My Name"}, "updated_time"=>"2013-06-09T08:03:43+0000", "privacy"=>{"value"=>""}, "type"=>"status", "story_tags"=>{"0"=>[{"id"=>"71170", "name"=>" ", "length"=>8, "type"=>"user", "offset"=>0}]}, "story"=>" likes a photo."}]}}
And you will receive what you asked for in a hash.
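In the same vein, the query string the Graph API Explorer generates does not need to be dismantled by hand: koala's low-level api method takes a path plus a params hash, so (assuming your API version accepts the nested field syntax) something like this should pass the whole thing through:
fields = "id,name,posts.fields(likes.fields(id,name),message)"
result = fb_api.api("/me", :fields => fields)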
Also note that for some calls you must pass nil as the connection name and hand koala the fields as params:
result += graph_api.batch do |batch_api|
  facebook_page_ids.each do |facebook_page_id|
    batch_api.get_connections(facebook_page_id, nil, { "fields" => "posts" })
  end
end

Avoid repeated calls to an API in Jekyll Ruby plugin

I have written a Jekyll plugin to display the number of pageviews on a page by calling the Google Analytics API using the garb gem. The only trouble with my approach is that it makes a call to the API for each page, slowing down build time and also potentially hitting the user call limits on the API.
It would be possible to return all the data in a single call and store it locally, and then look up the pageview count for each page, but my Jekyll/Ruby-fu isn't up to scratch: I do not know how to make the plugin run once to get all the data and store it locally where my current function could then access it, rather than calling the API page by page.
Basically my code is written as a Liquid block that can be put into my page layout:
class GoogleAnalytics < Liquid::Block
  def initialize(tag_name, markup, tokens)
    super # options that appear in block (between tag and endtag)
    @options = markup # optional options passed in by opening tag
  end

  def render(context)
    path = super
    # Read in credentials and authenticate
    cred = YAML.load_file("/home/cboettig/.garb_auth.yaml")
    Garb::Session.api_key = cred[:api_key]
    token = Garb::Session.login(cred[:username], cred[:password])
    profile = Garb::Management::Profile.all.detect { |p| p.web_property_id == cred[:ua] }
    # Place query; customize to modify results
    data = Exits.results(profile,
                         :filters => { :page_path.eql => path },
                         :start_date => Chronic.parse("2011-01-01"))
    data.first.pageviews
  end
end
The full version of my plugin is here.
How can I move all the calls to the API into some other function, make sure Jekyll runs it once at the start, and then adjust the tag above to read that local data?
EDIT: Looks like this can be done with a Generator that writes the data to a file; see the example on this branch. Now I just need to figure out how to subset the results: https://github.com/Sija/garb/issues/22
To store the data, I had to:
1. Write a Generator class (see the Jekyll wiki on plugins) to call the API.
2. Convert the data to a hash for easy lookup by path (see step 5):
result = Hash[data.collect { |row| [row.page_path, [row.exits, row.pageviews]] }]
3. Write the data hash to a JSON file.
4. Read the data back in from the file in my existing Liquid block class. Note that the block tag works from the _includes dir, while the generator works from the root directory.
5. Match the page path, which is easy once the data is converted to a hash:
result[path][1]
Code for the full plugin, showing how to create the generator and write the files, etc., is here.
And thanks to Sija on GitHub for help on this.
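For anyone who wants the shape of that generator without reading the full plugin, here is a stripped-down sketch; fetch_all_rows is a hypothetical stand-in for the garb query shown above:
require 'json'

module Jekyll
  # Runs once per build, before any Liquid tags render.
  class AnalyticsGenerator < Generator
    def generate(site)
      data = fetch_all_rows # hypothetical helper wrapping the garb query above
      result = Hash[data.collect { |row| [row.page_path, [row.exits, row.pageviews]] }]
      File.write('_includes/analytics.json', JSON.generate(result))
    end
  end
end
The Liquid block from the question then only has to JSON.parse the file and index the resulting hash by path.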
