Media data upload and performance

I need to create a RESTful API for uploading media data. I need to be able to handle hundreds (or thousands) of simultaneous requests. Once the data is uploaded to my server, we are going to store it on Amazon S3 and populate some metadata into a database. Could you advise on a few questions:
1) Which language is better suited for this kind of task? (I'm familiar with PHP and Perl.)
2) What about the web server? (nginx?)
3) We need to be able to scale easily if there are a lot of requests.
4) Anything else you could point out or advise on?
Thank you

use feature ":5.16";
use warnings FATAL => qw(all);
use strict;
use Data::Dump qw(dump);
use Amazon::S3;
my $s3 = Amazon::S3->new
({aws_access_key_id => "...",
aws_secret_access_key => "...",
retry => 1
}
);
my $b = $s3->bucket("Your bucket name");
my $f = "test.data";
$b->add_key_filename($f, "test.data",
{"x-amz-storage-class"=>"REDUCED_REDUNDANCY", 'x-amz-meta-version'=>'12.11.22', acl_short=>'public-read'
});
say dump($b->errstr) if $b->errstr;
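For the PHP side of question 1, here is a hedged sketch of the same upload using the AWS SDK for PHP v3; the region, credentials, bucket name and file paths are placeholders:
require 'vendor/autoload.php';

use Aws\S3\S3Client;
use Aws\Exception\AwsException;

// Placeholder region and credentials.
$s3 = new S3Client([
    'version'     => 'latest',
    'region'      => 'us-east-1',
    'credentials' => ['key' => '...', 'secret' => '...'],
]);

try {
    // Upload a local file with the same options as the Perl example above.
    $s3->putObject([
        'Bucket'       => 'your-bucket-name',
        'Key'          => 'test.data',
        'SourceFile'   => 'test.data',
        'StorageClass' => 'REDUCED_REDUNDANCY',
        'ACL'          => 'public-read',
        'Metadata'     => ['version' => '12.11.22'],
    ]);
} catch (AwsException $e) {
    echo $e->getMessage();
}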

Related

Is it possible to stream an MP3/MP4 from the Dropbox API v2 with PHP?

Yesterday I set it up so I can serve MP3 files stored in my Dropbox using https://github.com/spatie/dropbox-api and Laravel. However, this only works for smallish files: the way it's working now, it has to load the entire file first and then serve it from Laravel. This doesn't work at all for movies or long tracks, as it takes forever and runs out of memory.
Here's the code I'm currently using:
$authorizationToken = 'my-api-token';
$client = new \Spatie\Dropbox\Client($authorizationToken);

$path = "/offline/a-very-long-song.mp3"; // path in Dropbox

// Download the entire file into memory, then serve it from Laravel.
$stream = $client->download($path);
$file = stream_get_contents($stream);
fclose($stream);
unset($stream);

$file_info = new \finfo(FILEINFO_MIME_TYPE);

return response($file, 200)->withHeaders([
    'Content-Type'        => $file_info->buffer($file),
    'Content-Disposition' => 'inline; filename="' . basename($path) . '"',
]);
I was wondering if there's a way to stream it so it doesn't have to load the entire file first. I guess this happens naturally when you load a media file in the browser, but since there are no direct links to the physical file with Dropbox, I'm not sure if it's possible.
The Dropbox API does offer the ability to retrieve temporary direct links that can be used for streaming files like this, via the /2/files/get_temporary_link endpoint:
https://www.dropbox.com/developers/documentation/http/documentation#files-get_temporary_link
In the library you're using, that appears to be available as the getTemporaryLink method, as shown in the example here:
https://github.com/spatie/dropbox-api#a-minimal-implementation-of-dropbox-api-v2
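For example, a hedged sketch of using that method from the Laravel controller above (same placeholder path; this assumes getTemporaryLink() returns the link as a string):
$authorizationToken = 'my-api-token';
$client = new \Spatie\Dropbox\Client($authorizationToken);

$path = "/offline/a-very-long-song.mp3"; // path in Dropbox

// Ask Dropbox for a short-lived direct link and redirect the browser to it,
// so the media streams straight from Dropbox instead of being buffered in PHP.
$temporaryLink = $client->getTemporaryLink($path);

return redirect()->away($temporaryLink);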

Upload to S3 with progress in plain Ruby script

This question is related to this one: Tracking Upload Progress of File to S3 Using Ruby aws-sdk.
However, since there is no clear solution to this, I was wondering if there's a better/easier way (if one exists) of getting file upload progress with S3 using Ruby in 2018?
In my current setup I'm basically creating a new Resource, fetching my bucket and calling upload_file, but I haven't yet found any options for passing blocks which would help in yielding some sort of progress.
...
@connection = Aws::S3::Resource.new
@s3_bucket = @connection.bucket(bucket)
@s3_bucket.object(path).upload_file(data, { acl: 'public-read' })
...
Is there a way to do this using the newest sdk-for-ruby v3?
Any help (or even better a small example) would be great.
The example Trevor gives in https://stackoverflow.com/a/12147709/153886 is not hacky from what I can see - just wiring things together. The SDK simply does not provide a feature for passing progress details on all operations. Plus, Trevor is the maintainer of the Ruby SDK at AWS so I trust his judgement.
Expanding on his example:
bar = ProgressBar.create(:title => "Uploading action", :starting_at => 0, :total => file.size)
obj = s3.buckets['my-bucket'].objects['object-key']
obj.write(:content_length => file.size) do |writable, n_bytes|
  writable.write(file.read(n_bytes))
  bar.progress += n_bytes
end
If you want to have a progress block right in the upload_file method, I believe you will need to open a PR against the SDK. It is not that strange that this is not the case for Ruby (or for any other runtime) because, for example, there could be an optimisation in the HTTP client library that uses IO.copy_stream from your source body argument to the destination socket, which does not relay progress anywhere.

Generating Drupal image derivatives via a curl call

We have a content type that uses a number of image styles to re-purpose images for a variety of different sections of our website, and we have a large number of derivatives that need to be generated.
I want to use a script to pre-generate the necessary image derivatives before we go live after a major upgrade.
My thought was to write a script that uses curl to request the URLs at which image derivatives will be created.
If, in a browser, I go to a specific URL that triggers generation of a derivative, the image gets generated as expected. This is default Drupal behavior.
However, if I request another such URL with curl on the command line, the image does not get generated.
I suspect this is because curl is not actually downloading the images. I also tried with Lynx and the result was the same.
Can anyone advise whether there is a way to force curl or Lynx to download the images so that the derivatives get created?
Thanks,
Pablo
You want to download all of the <img src="..." /> images?
Easy: parse out the src attributes with DOMDocument and make an individual curl request for each image, like this:
function downloadAllImagesFromUrl(string $url): int {
    $imagesDownloaded = 0;
    $ch = curl_init();
    if (!curl_setopt_array($ch, array(
        CURLOPT_AUTOREFERER    => true,
        CURLOPT_BINARYTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_HTTPGET        => true,
        CURLOPT_SSL_VERIFYPEER => false,
        CURLOPT_CONNECTTIMEOUT => 4,
        CURLOPT_TIMEOUT        => 8,
        CURLOPT_COOKIEFILE     => "", // << makes curl save/load cookies across requests
        CURLOPT_ENCODING       => "", // << makes curl accept all supported encodings (gzip/deflate/etc.), which makes transfers faster
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
    ))) {
        throw new Exception(curl_error($ch));
    }
    $html = curl_exec($ch);
    $domd = new DOMDocument();
    @$domd->loadHTML($html); // suppress warnings from malformed HTML
    foreach ($domd->getElementsByTagName("img") as $img) {
        $src = $img->getAttribute("src");
        if (!$src) {
            continue;
        }
        // Warning: you might want to parse_url PHP_URL_HOST / PHP_URL_PORT / PHP_URL_PATH
        // if the urls are not absolute but relative.
        curl_setopt($ch, CURLOPT_URL, $src);
        curl_exec($ch);
        ++$imagesDownloaded;
    }
    curl_close($ch);
    return $imagesDownloaded;
}
It is probably much faster to use get_headers() instead of curl_exec(), but since PHP does not ignore client aborts by default (see the ignore_user_abort setting), Drupal may abort image generation if you only fetch the headers instead of actually downloading the images. Warning: the code above assumes all image src attributes are absolute; you need additional coding with parse_url and PHP_URL_HOST / PHP_URL_PORT / PHP_URL_PATH if you want to handle relative URLs (a sketch of that follows below). Also note that this can be made much faster by using multithreading with the curl_multi interface, but that requires much more complex coding.
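A hedged sketch of that relative-URL handling (the helper name is made up for illustration):
// Hypothetical helper: resolve a possibly-relative src against the page URL it was found on.
function resolveImageUrl(string $pageUrl, string $src): string {
    if (parse_url($src, PHP_URL_SCHEME)) {
        return $src; // already absolute
    }
    $base = parse_url($pageUrl);
    if (strpos($src, '//') === 0) {
        return $base['scheme'] . ':' . $src; // protocol-relative
    }
    $origin = $base['scheme'] . '://' . $base['host']
        . (isset($base['port']) ? ':' . $base['port'] : '');
    if (strpos($src, '/') === 0) {
        return $origin . $src; // root-relative
    }
    // Path-relative: append to the directory of the current page path.
    $dir = rtrim(dirname($base['path'] ?? '/'), '/');
    return $origin . $dir . '/' . $src;
}
Inside the loop you would then call curl_setopt($ch, CURLOPT_URL, resolveImageUrl($url, $src)); instead of passing $src directly.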

Direct (and simple!) AJAX upload to AWS S3 from (AngularJS) Single Page App

I know there's been a lot of coverage on upload to AWS S3. However, I've been struggling with this for about 24 hours now and I have not found any answer that fits my situation.
What I'm trying to do
Upload a file to AWS S3 directly from my client to my S3 bucket. The situation is:
It's a Single Page App, so upload request must be in AJAX
My server and my client are not on the same domain
The S3 bucket is of the newest sort (Frankfurt), for which some signature-generating libraries don't work (see below)
Client is in AngularJS
Server is in ExpressJS
What I've tried
Heroku's article on direct upload to S3. Doesn't fit my client/server configuration (plus it really does not fit harmoniously with Angular)
ready-made directives like ng-s3upload. Does not work because their signature-generating algorithm is not accepted by recent s3 buckets.
Manually creating a file upload directive and logic on the client like in this article (using FormData and Angular's $http). It consisted of getting a signed URL from AWS on the server (and that part worked), then AJAX-uploading to that URL. It failed with some mysterious CORS-related message (although I did set a CORS config on Heroku)
It seems I'm facing 2 difficulties: having a file input that works in my Single Page App, and getting AWS's workflow right.
The kind of solution I'm looking for
If possible, I'd like to avoid 'all-included' solutions that manage the whole process while hiding all of the complexity, making it hard to adapt to special cases. I'd much rather have a simple explanation breaking down the flow of data between the various components involved, even if it requires some more plumbing from me.
I finally managed. The key points were:
Let go of Angular's $http, and use native XMLHttpRequest instead.
Use the getSignedUrl feature of AWS's SDK, instead of implementing my own signature-generating workflow like many libraries do.
Set the AWS configuration to use the proper signature version (v4 at the time of writing) and region ('eu-central-1' in the case of Frankfurt).
Below is a step-by-step guide of what I did; it uses AngularJS on the client and NodeJS on the server, but should be rather easy to adapt to other stacks, especially because it deals with the most pathological cases (an SPA on a different domain than the server, with a bucket in a recent - at the time of writing - region).
Workflow summary
The user selects a file in the browser; your JavaScript keeps a reference to it.
The client sends a request to your server to obtain a signed upload URL.
Your server chooses a name for the object to put in the bucket (make sure to avoid name collisions!).
The server obtains a signed URL for your object using the AWS SDK, and sends it back to the client. This involves the object's name and the AWS credentials.
Given the file and the signed URL, the client sends a PUT request directly to your S3 Bucket.
Before you start
Make sure that:
Your server has the AWS SDK
Your server has AWS credentials with proper access rights to your bucket
Your S3 bucket has a proper CORS configuration for your client.
Step 1: set up a SPA-friendly file upload form / widget.
All that matters is to have a workflow that eventually gives you programmatic access to a File object - without uploading it.
In my case, I used the ng-file-select and ng-file-drop directives of the excellent angular-file-upload library. But there are other ways of doing it (see this post for example.).
Note that you can access useful information in your file object such as file.name, file.type etc.
Step 2: Get a signed URL for the file on your server
On your server, you can use the AWS SDK to obtain a secure, temporary URL to PUT your file from someplace else (like your frontend).
In NodeJS, I did it this way:
// ---------------------------------
// some initial configuration
var aws = require('aws-sdk');
aws.config.update({
    accessKeyId: process.env.AWS_ACCESS_KEY,
    secretAccessKey: process.env.AWS_SECRET_KEY,
    signatureVersion: 'v4',
    region: 'eu-central-1'
});

// ---------------------------------
// now say you want to fetch a URL for an object named `objectName`
var s3 = new aws.S3();
var s3_params = {
    Bucket: MY_BUCKET_NAME,
    Key: objectName,
    Expires: 60,
    ACL: 'public-read'
};
s3.getSignedUrl('putObject', s3_params, function (err, signedUrl) {
    // send signedUrl back to client
    // [...]
});
You'll probably also want the URL at which to GET your object (typically if it's an image). To do this, I simply removed the query string from the signed URL:
var url = require('url');
// ...
var parsedUrl = url.parse(signedUrl);
parsedUrl.search = null;
var objectUrl = url.format(parsedUrl);
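If your server is in PHP rather than NodeJS, a hedged sketch of the same step with the AWS SDK for PHP v3 (bucket name and object name are placeholders; v4 signing is the default in that SDK):
require 'vendor/autoload.php';

use Aws\S3\S3Client;

// The region must match the bucket's region ('eu-central-1' for Frankfurt).
$s3 = new S3Client([
    'version' => 'latest',
    'region'  => 'eu-central-1',
]);

$cmd = $s3->getCommand('PutObject', [
    'Bucket' => 'MY_BUCKET_NAME', // placeholder
    'Key'    => $objectName,      // the name your server chose for the object
]);

// Presign the PutObject request for 60 seconds and send the URL back to the client.
$signedUrl = (string) $s3->createPresignedRequest($cmd, '+60 seconds')->getUri();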
Step 3: send the PUT request from the client
Now that your client has your File object and the signed URL, it can send the PUT request to S3. My advice in Angular's case is to just use XMLHttpRequest instead of the $http service:
var signedUrl, file;
// ...
var d_completed = $q.defer(); // since I'm working with Angular, I use $q for asynchronous control flow, but it's not mandatory

var xhr = new XMLHttpRequest();
xhr.file = file; // not necessary if you create scopes like this
xhr.onreadystatechange = function (e) {
    if (4 == this.readyState) {
        // done uploading! HURRAY!
        d_completed.resolve(true);
    }
};
xhr.open('PUT', signedUrl, true);
xhr.setRequestHeader("Content-Type", "application/octet-stream");
xhr.send(file);
Acknowledgements
I would like to thank emil10001 and Will Webberley, whose publications were very valuable to me for this issue.
You can use the ng-file-upload $upload.http method in conjunction with the aws-sdk getSignedUrl to accomplish this. After you get the signedUrl back from your server, this is the client code:
var fileReader = new FileReader();
fileReader.readAsArrayBuffer(file);
fileReader.onload = function (e) {
    $upload.http({
        method: 'PUT',
        headers: { 'Content-Type': file.type != '' ? file.type : 'application/octet-stream' },
        url: signedUrl,
        data: e.target.result
    }).progress(function (evt) {
        var progressPercentage = parseInt(100.0 * evt.loaded / evt.total);
        console.log('progress: ' + progressPercentage + '% ' + file.name);
    }).success(function (data, status, headers, config) {
        console.log('file ' + file.name + ' uploaded. Response: ' + data);
    });
};
To do multipart uploads, or those larger than 5 GB, this process gets a bit more complicated, as each part needs its own signature. Conveniently, there is a JS library for that:
https://github.com/TTLabs/EvaporateJS
via https://github.com/aws/aws-sdk-js/issues/468
Use the s3FileUpload open-source directive, which has dynamic data-binding and auto-callback functions: https://github.com/vinayvnvv/s3FileUpload

Need help transferring a jpg from web server to S3 - PHP and CodeIgniter

I have a Flash application that captures an image and passes an encoded image to my web server. The web server then decodes the image and saves it to a tmp directory. That part works fine. Next I want to move this image from the web server to my S3 account, but am having trouble. I am using the code below. Any help is appreciated.
In the CodeIgniter Controller Constructor:
$this->load->library('S3');
In the function (also within the controller for now)
/************** S3 upload example ***************/
if (!defined('awsAccessKey')) define('awsAccessKey', 'xxxxxxx');
if (!defined('awsSecretKey')) define('awsSecretKey', 'xxxxxxx');

$s3 = new S3(awsAccessKey, awsSecretKey);
if ($s3->putObjectFile($filePathTemp, "bucket", $filePathNew, "ACL_PUBLIC_READ_WRITE")) {
    #unlink($filePathTemp);
}
I can't even get the $s3 variable to return anything in an "echo" statement, even after adding a return value on the first line of the S3 constructor. I can't access the S3 object/class, yet the following statement returns a "1":
echo "S3 --> " . class_exists('S3');
Thanks in advance for your time.
-Tim
I'm assuming (you don't say; you should specify which one) you are using this. Your usage looks correct, so I'm assuming your keys are wrong. If
print_r($s3->listBuckets());
doesn't return anything, check your server logs.
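For example, a hedged debugging sketch inside the controller, assuming the library is the standalone S3 PHP class (whose methods return false and report failures via trigger_error()):
$s3 = new S3(awsAccessKey, awsSecretKey);

$buckets = $s3->listBuckets();
if ($buckets === false) {
    // Failures are reported via trigger_error(), so the details end up in the PHP error log.
    log_message('error', 'S3 listBuckets() failed; check the keys and the server error log.');
} else {
    print_r($buckets); // should list your bucket names if the keys and network access are fine
}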
