Laravel 5: How to copy (stream) a file from Amazon S3 to FTP?

I have to move large content from AWS S3 to FTP with Laravel's filesystem, and I don't want to load it into memory.
I know how to stream local content to S3, but haven't found a solution yet for going from S3 to FTP.
The closest I found was this, but I'm stuck adapting it to my case.
Here is what's missing in my code (??):
$inputStream = Storage::disk('s3')->getDriver()->??
$destination = Storage::disk('ftp')->getDriver()->??
Storage::disk('ftp')->getDriver()->putStream($destination, $inputStream);

I think I found a solution:
$input = Storage::disk('s3')->getDriver();
$output = Storage::disk('ftp')->getDriver();
$output->writeStream($ftp_file_path, $input->readStream($s3_file_path));
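For reference, here is a slightly fuller sketch of the same approach. It is a minimal example, assuming the 's3' and 'ftp' disks are configured in config/filesystems.php; the two path variables are hypothetical placeholders.
// Minimal sketch: stream a file from the S3 disk to the FTP disk without
// loading it into memory. Disk names and paths are assumptions.
use Illuminate\Support\Facades\Storage;

$s3_file_path  = 'exports/large-file.zip';   // hypothetical source key on S3
$ftp_file_path = 'incoming/large-file.zip';  // hypothetical destination path on FTP

$stream = Storage::disk('s3')->getDriver()->readStream($s3_file_path);

// writeStream consumes the resource in chunks instead of buffering the whole file.
Storage::disk('ftp')->getDriver()->writeStream($ftp_file_path, $stream);

// Flysystem may already have closed the stream; guard before closing it.
if (is_resource($stream)) {
    fclose($stream);
}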

Related

s3 awk bash pipeline

Following on from this question, Splitting out a large file:
I would like to pipe files from an Amazon s3:// bucket containing large gzipped files and process them with an awk command.
Sample file to process
...
{"captureTime": "1534303617.738","ua": "..."}
...
Script to optimize
aws s3 cp s3://path/to/file.gz - \
| gzip -d \
| awk -F'"' '{date=strftime("%Y%m%d%H",$4); print > "splitted."date }'
gzip splitted.*
# make some visual checks here before copying to S3
aws s3 cp splitted.*.gz s3://path/to/splitted/
Do you think I can wrap everything in the same pipeline to avoid writing files locally?
I can use the approach from Using gzip to compress files to transfer with aws command to gzip and copy on the fly, but gzipping inside awk would be great.
Thank you.
It took me a bit to understand that your pipeline creates one "splitted."<date-hour> file for each capture hour found in the source file. Since shell pipelines operate on byte streams and not files, while S3 operates on files (objects), you must turn your byte stream into a set of files on local storage before sending them back to S3. So, a pipeline by itself won't suffice.
But I'll ask: what's the larger purpose you're trying to accomplish?
You're on the path to generating lots of S3 objects, one for each hour covered by your "large gzipped files". Is this using S3 as a key-value store? Is this the best design for the goal of your effort? In other words, is S3 the best repository for this information, or is there some other store (DynamoDB, or another NoSQL database) that would be a better solution?
All the best
Two possible optimizations:
On large and multiple files, it helps to use all cores to gzip the files; use xargs, pigz, or GNU parallel (see Gzip with all cores).
Parallelize the S3 upload:
https://github.com/aws-samples/aws-training-demo/tree/master/course/architecting/s3_parallel_upload

Laravel 5.5 - Store image to S3

I am currently writing images created by ImageMagick to local storage in a Laravel 5.5 app like this...
$imagick->writeImages(storage_path('app/files/' . $tempfoldername . '_' . $title . '/' . $title . '_page.jpg'), false);
I have now set up an S3 bucket on AWS to store the images instead. How can I modify the above statement to store them in the bucket?
I have already set Laravel up with the S3 details and can successfully read and write to the S3 bucket.
Should I do as I am doing and move them afterwards? Or can I do it directly from that imagemagick statement?
Since you're processing the image using ImageMagick, you have two options:
First option
Store the image in the local folder, then upload it, then unlink it:
// Requires: use Illuminate\Http\File;
Storage::disk('s3')->put($title . '_page.jpg', new File($filePath));
unlink($filePath);
Or add the image directly to S3 using the following:
Storage::disk('s3')->put($title . '_page.jpg', $imagick->getImageBlob());
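If the generated images can get very large, a variant of the second option that avoids holding the whole blob in PHP memory is to write to a temporary file and stream that to S3. This is only a sketch, assuming the 's3' disk is configured; $imagick and $title are taken from the question, and the temporary path and 'images/' prefix are placeholders:
// Hypothetical sketch: write the ImageMagick output to a temporary file,
// then stream it to S3 so the full image never sits in memory.
use Illuminate\Support\Facades\Storage;

$tmpPath = storage_path('app/tmp/' . $title . '_page.jpg');
$imagick->writeImages($tmpPath, false);  // same call style as in the question

$stream = fopen($tmpPath, 'r');
// Laravel's put() accepts a stream resource and uploads it in chunks.
Storage::disk('s3')->put('images/' . $title . '_page.jpg', $stream);

if (is_resource($stream)) {
    fclose($stream);
}
unlink($tmpPath);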

Writing Spark dataframe as parquet to S3 without creating a _temporary folder

Using pyspark I'm reading a dataframe from parquet files on Amazon S3 like
dataS3 = sql.read.parquet("s3a://" + s3_bucket_in)
This works without problems. But when I try to write the data
dataS3.write.parquet("s3a://" + s3_bucket_out)
I get the following exception:
py4j.protocol.Py4JJavaError: An error occurred while calling o39.parquet.
: java.lang.IllegalArgumentException: java.net.URISyntaxException:
Relative path in absolute URI: s3a://<s3_bucket_out>_temporary
It seems to me that Spark is trying to create a _temporary folder first, before writing into the given bucket. Can this be prevented somehow, so that Spark writes directly to the given output bucket?
You can't eliminate the _temporary directory, as it's used to keep the intermediate work of a query hidden until it's complete.
But that's OK, as this isn't the problem. The problem is that the output committer gets a bit confused trying to write to the root directory (it can't delete it, you see).
You need to write to a subdirectory under a bucket, with a full prefix, e.g.
s3a://mybucket/work/out.
I should add that trying to commit data to S3A is not reliable, precisely because of the way it mimics rename() with something like ls -rlf src | xargs -p8 -I% "cp % dst/% && rm %". Because ls has delayed consistency on S3, it can miss newly created files and so fail to copy them.
See: Improving Apache Spark for the details.
Right now, you can only reliably commit to s3a by writing to HDFS and then copying. EMR S3 works around this by using DynamoDB to offer a consistent listing.
I had the same issue when writing to the root of the S3 bucket:
df.save("s3://bucketname")
I resolved it by adding a / after the bucket name:
df.save("s3://bucketname/")

FTP copy a file to another place in same FTP

I need to upload the same file to 2 different places on the same FTP server. Is there a way to copy the file on the FTP server to the other place instead of uploading it again? Thanks.
There's no standard way to duplicate a remote file over the FTP protocol. Some FTP servers support proprietary or non-standard extensions for this though.
Some FTP clients do support the remote file duplication. Either using the extensions or via a temporary local copy of the remote file.
For example WinSCP FTP client does support the duplication using both drag&drop and menu/keyboard command:
It supports the SITE CPFR/CPTO FTP extension (supported for example by the ProFTPD mod_copy module)
It falls back to an automatic duplication via a local temporary copy, if the above extension is not available.
(I'm the author of WinSCP)
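If your server does support that SITE CPFR/CPTO extension (e.g. ProFTPD with mod_copy), a server-side copy can also be triggered from a script. A minimal PHP sketch, with placeholder host, credentials, and paths:
// Hypothetical sketch: server-side copy via the non-standard SITE CPFR/CPTO
// extension (ProFTPD mod_copy). All connection details are placeholders.
$conn = ftp_connect('ftp.example.com');
ftp_login($conn, 'username', 'password');

$from = ftp_raw($conn, 'SITE CPFR /path/to/source.txt');  // "copy from"
$to   = ftp_raw($conn, 'SITE CPTO /path/to/copy.txt');    // "copy to"

// mod_copy normally answers 350 to CPFR and 250 to CPTO on success.
var_dump($from, $to);

ftp_close($conn);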
Another workaround is to open a second connection to the FTP server and make the server upload the file to itself by piping a passive-mode data connection to an active-mode data connection. This solution is shown in the answer by SaadAchemlal below. This is basically a use of the FXP protocol, but with a single server. Many FTP servers will reject this, though, as they won't allow a data connection to/from an address different from the client's.
Side note: people often confuse move with copy. In case you actually want to move, that's a completely different question. Moving a file on FTP is widely supported.
I don't think there's a way to copy files without downloading and re-uploading; at least, I found nothing like this in the List of FTP commands, and no client I have seen so far supports something like this.
Yes, the FTP protocol itself can support this in theory. The FTP RFC 959 discusses this in section 5.2 (see the paragraph starting with "When data is to be transferred between two servers, A and B..."). However, I don't know of any client that offers this sort of dual server control operation.
Note that this method could transfer the file from the FTP server to itself using its own network, which won't be as fast as a local file copy but would almost certainly be faster than downloading and then reuploading the file.
I can copy files between remote folders in Linux based systems.
In my particular case, I'm using the very common file manager PCManFM:
Menu "Go" --> "Connect to server"
FTP Login info, etc
Open new tab in PCManFM
Connect to same server
Copy from tab to tab...
It's a bit slow, so I guess that it could be downloading and uploading back the files, but it's done automatically and very user-friendly.
The code below makes the FTP server upload the file to itself (using a loopback connection). It needs the FTP server to allow both passive and active connection modes.
If you want to understand the FTP commands, here is a list of them: List of FTP commands
function copyFile($filePath, $newFilePath)
{
    // Open two control connections to the same FTP server.
    $ftp1 = ftp_connect('192.168.1.1');
    $ftp2 = ftp_connect('192.168.1.1');
    ftp_raw($ftp1, "USER ftpUsername");
    ftp_raw($ftp1, "PASS mypassword");
    ftp_raw($ftp2, "USER ftpUsername");
    ftp_raw($ftp2, "PASS mypassword");
    // Put the second connection into passive mode and extract the
    // "h1,h2,h3,h4,p1,p2" address/port tuple from the PASV reply.
    $res = ftp_raw($ftp2, "PASV");
    $addressAndPort = substr($res[0], strpos($res[0], '(') + 1);
    $addressAndPort = substr($addressAndPort, 0, strpos($addressAndPort, ')'));
    // Change into the destination directory on one connection and the
    // source directory on the other.
    ftp_raw($ftp1, "CWD ." . dirname($newFilePath));
    ftp_raw($ftp2, "CWD ." . dirname($filePath));
    // Point the first connection's active-mode data channel at the second
    // connection's passive port, so the server connects to itself.
    ftp_raw($ftp1, "PORT " . $addressAndPort);
    // STOR on one connection and RETR on the other: the file is copied
    // over the server's own loopback data connection.
    ftp_raw($ftp1, "STOR " . basename($newFilePath));
    ftp_raw($ftp2, "RETR " . basename($filePath));
    ftp_raw($ftp1, "QUIT");
    ftp_raw($ftp2, "QUIT");
}
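A hypothetical call, with paths given relative to the FTP root as the CWD commands above expect:
copyFile('/uploads/report.pdf', '/backup/report.pdf');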
I managed to do this by using WebDrive to mount the FTP server as a local folder, then "download" the files with FileZilla directly into that folder. It was a bit slower than a normal download, but you don't need to have the space on your HDD.
Here's another workaround using PHP cURL to execute a copy request on the server, by feeding parameters from the local machine and reporting the outcome:
Local code:
In this simple test routine, I want to copy the leaning tower photo to the correct folder, Pisa:
$ch = curl_init();
$data = array ('pic' => 'leaningtower', 'folder' => 'Pisa');
curl_setopt($ch, CURLOPT_URL,"http://travelphotos.com/copypic.php");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$result = curl_exec($ch);
curl_close($ch);
echo $result;
Server code (copypic.php):
On the remote server, I have simple error checking. On this server I had to mess with the path designation, i.e., I had to use "./" for an acceptable path reference, so you may have to tinker with it a bit.
$pic = $_POST["pic"];
$folder = $_POST["folder"];
if (!$pic || !$folder) exit();
$sourcePath = "./unsortedpics/".$pic.".jpg";
$destPath = "./sortedpics/".$folder."/".$pic.".jpg";
if (!file_exists($sourcePath )) exit("Source file not found");
if (!is_dir("./sortedpics/".$folder)) exit("Invalid destination folder");
if (!copy($sourcePath , $destPath)) exit("Copy not successful");
echo "File copied";
You can do this from cPanel.
Log into your cPanel.
Go into the file manager.
Find the file or folder you want to duplicate.
Right-click and choose Copy.
Type in the new directory you want to copy to.
Done!
You can rename the file to be copied to the full path of your desired result.
For example:
If you want to move the file "file.txt" into the folder "NewFolder", you can write it as
ftp> rename file.txt NewFolder/file.txt
This worked for me.
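The same trick works from PHP with ftp_rename; a minimal sketch with placeholder connection details (note that this moves the file, it does not copy it):
// Hypothetical sketch: "rename" a file into another folder, which moves it.
$conn = ftp_connect('ftp.example.com');
ftp_login($conn, 'username', 'password');

// Equivalent to: ftp> rename file.txt NewFolder/file.txt
ftp_rename($conn, 'file.txt', 'NewFolder/file.txt');

ftp_close($conn);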

FTP client to zip before upload and unzip on the server after upload

I am always working with big websites that are annoying to upload given the number of small files.
I use FileZilla but am happy to buy a commercial solution if there is one out there that can zip the files before upload and then unzip them on the server after upload.
It's a pain to have to do that manually all the time.
Does someone know of any FTP client, or extension for FileZilla or another client, that would do that? I sent an email to the support teams for CuteFTP and WSFtp; no answer so far...
I know the FTP protocol does not provide such a command; that's why I'm asking for an extension (if anyone knows of one) or a free or commercial FTP client that does the job...
Use this in a PHP file, maybe called zip.php:
<?php
$zip = new ZipArchive();
$res = $zip->open('yourzipfile.zip');
if ($res === true) {
    $zip->extractTo('./');
    $zip->close();
    echo 'ok';
} else {
    echo 'failed';
}
Zip your site and upload the archive to the root of your server.
Also upload zip.php to the same place.
Now enter this in your browser: www.yoursite.com/zip.php
If everything goes well, you will receive "ok"; otherwise there is a problem
For more details on the class: http://www.php.net/manual/en/class.ziparchive.php
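For the "zip before upload" half of the workflow, ZipArchive can also build the archive locally before you send it. A minimal sketch, assuming the site lives in a local ./site directory (the directory name and archive name are placeholders):
// Hypothetical sketch: recursively zip a local ./site directory before uploading it.
$zip = new ZipArchive();
$zip->open('yourzipfile.zip', ZipArchive::CREATE | ZipArchive::OVERWRITE);

$files = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator('./site', FilesystemIterator::SKIP_DOTS)
);
foreach ($files as $file) {
    // Store each file under its path relative to ./site.
    $zip->addFile($file->getPathname(), substr($file->getPathname(), strlen('./site/')));
}
$zip->close();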
Couldn't you set up some bash scripts to rar and FTP a file, and then on the server check for the file's presence every x seconds, and unrar and remove it when it is there?
