FTP/SFTP access to an Amazon S3 Bucket [closed]

FTP/SFTP access to an Amazon S3 Bucket [closed] - ftp

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 3 years ago.
Improve this question
Is there a way to connect to an Amazon S3 bucket with FTP or SFTP rather than the built-in Amazon file transfer interface in the AWS console? Seems odd that this isn't a readily available option.

There are three options.
You can use a native Amazon Managed SFTP service (aka AWS Transfer for SFTP), which is easier to set up.
Or you can mount the bucket to a file system on a Linux server and access the files using the SFTP as any other files on the server (which gives you greater control).
Or you can just use a (GUI) client that natively supports S3 protocol (what is free).
Managed SFTP Service
In your Amazon AWS Console, go to AWS Transfer for SFTP and create a new server.
In SFTP server page, add a new SFTP user (or users).
Permissions of users are governed by an associated AWS role in IAM service (for a quick start, you can use AmazonS3FullAccess policy).
The role must have a trust relationship to transfer.amazonaws.com.
For details, see my guide Setting up an SFTP access to Amazon S3.
Mounting Bucket to Linux Server
Just mount the bucket using s3fs file system (or similar) to a Linux server (e.g. Amazon EC2) and use the server's built-in SFTP server to access the bucket.
Install the s3fs
Add your security credentials in a form access-key-id:secret-access-key to /etc/passwd-s3fs
Add a bucket mounting entry to fstab:
<bucket> /mnt/<bucket> fuse.s3fs rw,nosuid,nodev,allow_other 0 0
For details, see my guide Setting up an SFTP access to Amazon S3.
Use S3 Client
Or use any free "FTP/SFTP client", that's also an "S3 client", and you do not have setup anything on server-side. For example, my WinSCP or Cyberduck.
WinSCP has even scripting and .NET/PowerShell interface, if you need to automate the transfers.

Update
S3 now offers a fully-managed SFTP Gateway Service for S3 that integrates with IAM and can be administered using aws-cli.
There are theoretical and practical reasons why this isn't a perfect solution, but it does work...
You can install an FTP/SFTP service (such as proftpd) on a linux server, either in EC2 or in your own data center... then mount a bucket into the filesystem where the ftp server is configured to chroot, using s3fs.
I have a client that serves content out of S3, and the content is provided to them by a 3rd party who only supports ftp pushes... so, with some hesitation (due to the impedance mismatch between S3 and an actual filesystem) but lacking the time to write a proper FTP/S3 gateway server software package (which I still intend to do one of these days), I proposed and deployed this solution for them several months ago and they have not reported any problems with the system.
As a bonus, since proftpd can chroot each user into their own home directory and "pretend" (as far as the user can tell) that files owned by the proftpd user are actually owned by the logged in user, this segregates each ftp user into a "subdirectory" of the bucket, and makes the other users' files inaccessible.
There is a problem with the default configuration, however.
Once you start to get a few tens or hundreds of files, the problem will manifest itself when you pull a directory listing, because ProFTPd will attempt to read the .ftpaccess files over, and over, and over again, and for each file in the directory, .ftpaccess is checked to see if the user should be allowed to view it.
You can disable this behavior in ProFTPd, but I would suggest that the most correct configuration is to configure additional options -o enable_noobj_cache -o stat_cache_expire=30 in s3fs:
-o stat_cache_expire (default is no expire)
specify expire time(seconds) for entries in the stat cache
Without this option, you'll make fewer requests to S3, but you also will not always reliably discover changes made to objects if external processes or other instances of s3fs are also modifying the objects in the bucket. The value "30" in my system was selected somewhat arbitrarily.
-o enable_noobj_cache (default is disable)
enable cache entries for the object which does not exist. s3fs always has to check whether file(or sub directory) exists under object(path) when s3fs does some command, since s3fs has recognized a directory which does not exist and has files or subdirectories under itself. It increases ListBucket request and makes performance bad. You can specify this option for performance, s3fs memorizes in stat cache that the object (file or directory) does not exist.
This option allows s3fs to remember that .ftpaccess wasn't there.
Unrelated to the performance issues that can arise with ProFTPd, which are resolved by the above changes, you also need to enable -o enable_content_md5 in s3fs.
-o enable_content_md5 (default is disable)
verifying uploaded data without multipart by content-md5 header. Enable to send "Content-MD5" header when uploading a object without multipart posting. If this option is enabled, it has some influences on a performance of s3fs when uploading small object. Because s3fs always checks MD5 when uploading large object, this option does not affect on large object.
This is an option which never should have been an option -- it should always be enabled, because not doing this bypasses a critical integrity check for only a negligible performance benefit. When an object is uploaded to S3 with a Content-MD5: header, S3 will validate the checksum and reject the object if it's corrupted in transit. However unlikely that might be, it seems short-sighted to disable this safety check.
Quotes are from the man page of s3fs. Grammatical errors are in the original text.

Answer from 2014 for the people who are down-voting me:
Well, S3 isn't FTP. There are lots and lots of clients that support S3, however.
Pretty much every notable FTP client on OS X has support, including Transmit and Cyberduck.
If you're on Windows, take a look at Cyberduck or CloudBerry.
Updated answer for 2019:
AWS has recently released the AWS Transfer for SFTP service, which may do what you're looking for.

Or spin Linux instance for SFTP Gateway in your AWS infrastructure that saves uploaded files to your Amazon S3 bucket.
Supported by Thorntech

Amazon has released SFTP services for S3, but they only do SFTP (not FTP or FTPES) and they can be cost prohibitive depending on your circumstances.
I'm the Founder of DocEvent.io, and we provide FTP/S Gateways for your S3 bucket without having to spin up servers or worry about infrastructure.
There are also other companies that provide a standalone FTP server that you pay by the month that can connect to an S3 bucket through the software configuration, for example brickftp.com.
Lastly there are also some AWS Marketplace apps that can help, here is a search link. Many of these spin up instances in your own infrastructure - this means you'll have to manage and upgrade the instances yourself which can be difficult to maintain and configure over time.

WinSCp now supports S3 protocol
First, make sure your AWS user with S3 access permissions has an “Access key ID” created. You also have to know the “Secret access key”. Access keys are created and managed on Users page of IAM Management Console.
Make sure New site node is selected.
On the New site node, select Amazon S3 protocol.
Enter your AWS user Access key ID and Secret access key
Save your site settings using the Save button.
Login using the Login button.

Filezilla just released a Pro version of their FTP client. It connects to S3 buckets in a streamlined FTP like experience. I use it myself (no affiliation whatsoever) and it works great.

As other posters have pointed out, there are some limitations with the AWS Transfer for SFTP service. You need to closely align requirements. For example, there are no quotas, whitelists/blacklists, file type limits, and non key based access requires external services. There is also a certain overhead relating to user management and IAM, which can get to be a pain at scale.
We have been running an SFTP S3 Proxy Gateway for about 5 years now for our customers. The core solution is wrapped in a collection of Docker services and deployed in whatever context is needed, even on-premise or local development servers. The use case for us is a little different as our solution is focused data processing and pipelines vs a file share. In a Salesforce example, a customer will use SFTP as the transport method sending email, purchase...data to an SFTP/S3 enpoint. This is mapped an object key on S3. Upon arrival, the data is picked up, processed, routed and loaded to a warehouse. We also have fairly significant auditing requirements for each transfer, something the Cloudwatch logs for AWS do not directly provide.
As other have mentioned, rolling your own is an option too. Using AWS Lightsail you can setup a cluster, say 4, of $10 2GB instances using either Route 53 or an ELB.
In general, it is great to see AWS offer this service and I expect it to mature over time. However, depending on your use case, alternative solutions may be a better fit.

Related

No permanent filesystem for Heroku?

The app I am currently hosting on Heroku allows users to submit photos. Initially, I was thinking about storing those photos on the filesystem, as storing them in the database is apparently bad practice.
However, it seems there is no permanent filesystem on Heroku, only an ephemeral one. Is this true and, if so, what are my options with regards to storing photos and other files?

It is true. Heroku allows you to create cloud apps, but those cloud apps are not "permanent" - they are instances (or "slugs") that can be replicated multiple times on Amazon's EC2 (that's why scaling is so easy with Heroku). If you were to push a new version of your app, then the slug will be recompiled, and any files you had saved to the filesystem in the previous instance would be lost.
Your best bet (whether on Heroku or otherwise) is to save user submitted photos to a CDN. Since you are on Heroku, and Heroku uses AWS, I'd recommend Amazon S3, with optionally enabling CloudFront.
This is beneficial not only because it gets around Heroku's ephemeral "limitation", but also because a CDN is much faster, and will provide a better service for your webapp and experience for your users.

Depending on the technology you're using, your best bet is likely to stream the uploads to S3 (Amazon's storage service). You can interact with S3 with a client library to make it simple to post and retrieve the files. Boto is an example client library for Python - they exist for all popular languages.
Another thing to keep in mind is that Heroku file systems are not shared either. This means you'll have to be putting the file to S3 with the same application as the one handling the upload (instead of say, a worker process). If you can, try to load the upload into memory, never write it to disk and post directly to S3. This will increase the speed of your uploads.
Because Heroku is hosted on AWS, the streams to S3 happen at a very high speed. Keep that in mind when you're developing locally.

FTP access on Windows Azure

Quick question. I'm currently moving a asp.net MVC web application to the Windows Azure platform. Everything is working out okay apart from one thing.
In the application at the moment, we make use of FTP accounts for each user to import large quantities of files to our database.
I understand FTP on Azure is not as straightforward.
I've googled and found this article: Ftp on Azure
This seems to be what I need except obviously we'll need to be able to add new users with their own separate FTP account. Does anyone know of an easy workaround for this?
Thanks in advance

Did you consider running a (FTP) service that's not IIS based, and you could add users programatically? Also, how are you going to solve data sync issues when the role recycles or when you upgrade it? Make sure to backup to blob on a somewhat regular basis!
Personally, I'd mount a VHD drive (Azure Drive) which is actually hosted on blob storage, and have my FTP server point to that drive. However, make sure you only have one instance of the server (problem #1) unless you don't need higher than 99,9% reliability you can solve this by running a single instance. Step 2 is I'd implement user management in relation to that program.
It's not straightforward, and I'd advise against it though. But I understand that sometimes you have to do this. I would solve it like I described above.

steps to securing amazon EC2+EBS

I have just installed a fedora linux AMI on amazon EC2, from the amazon collection. I plan to connect it to EBS storage. Assuming I have done nothing more than the most basic steps, no password changed, nothing extra has been done at this stage other than the above.
Now, from this point, what steps should I take to stop the hackers and secure my instance/EBS?

Actually there is nothing different here from securing any other Linux server.

At some point you need to create your own image (AMI). The reason for doing this is that the changes you will make in an existing AMI will be lost if your instance goes down (which could easily happen as Amazon doesn't guarantee that an instance will stay active indefinitely). Even if you do use EBS for data storage, you will need to do the same mundane tasks configuring the OS every time the instance goes down. You may also want to stop and restart your instance in certain periods or in case of peak traffic start more than one of them.
You can read some instructions for creating your image in the documentation. Regarding security you need to be careful not to expose your certification files and keys. If you fail on doing this, then a cracker could use them to start new instances that will be charged for. Thankfully the process is very safe and you should only pay attention in a couple of points:
Start from an image you trust. Users are allowed to create public images to be used by everyone and they could either by mistake or in purpose have left a security hole in them that could allow someone to steal your identifiers. Starting from an official Amazon AMI, even if it lacks some of the features you require, is always a wise solution.
In the process of creating an image, you will need to upload your certificates in a running instance. Upload them in a location that isn't bundled in the image (/mnt or /tmp). Leaving them in the image is insecure, since you may need to share your image in the future. Even if you are never planning to do so, a cracker could exploit a security fault in the software your using (OS, web server, framework) to gain access in your running instance and steal your credentials.
If you are planning to create a public image, make sure that you leave no trace of your keys/identifies in it (in the command history of the shell for example).

What we did at work is we made sure that servers could be accessed only with a private key, no passwords. We also disabled ping so that anyone out there pinging for servers would be less likely to find ours. Additionally, we blocked port 22 from anything outside our network IP, wit the exception of a few IT personnel who might need access from home on the weekends. All other non-essential ports were blocked.
If you have more than one EC2 instance, I would recommend finding a way to ensure that intercommunication between servers is secure. For instance, you don't want server B to get hacked too just because server A was compromised. There may be a way to block SSH access from one server to another, but I have not personally done this.
What makes securing an EC2 instance more challenging than an in-house server is the lack of your corporate firewall. Instead, you rely solely on the tools Amazon provides you. When our servers were in-house, some weren't even exposed to the Internet and were only accessible within the network because the server just didn't have a public IP address.

Do I need Amazon's EC2, Cloudfront, RDS?

I want to publish a web site on Amazon's servers, that:
Runs CakePHP
Uses MySQL to store data
Lets users upload audio through flash (currently using a hosted Flash Media Server), and listen to the files later
Do I need Amazon's EC2 for the website, RDS for the MySQL database, and CloudFront for the FMS? I'd really like a walkthrough of which services I should use.
Thanks.

First of all you need EC2 service in order to have a virtual machine, where you can install Apache, PHP and your Web Application.
Then you also need a database server and data repository for the media files. The recommended way is exactly what you suggest: RDS for MySQL and CloudFront as the file repository.
Initially none of the above services (RDS, CloudFront and even EBS) were available. Developers have no way to use a MySQL database, because even if it was installed in an EC2 instance, the instance isn't guaranteed to stay up and running and if the instance is lost, the data is also lost. For this reason EBS was introduced. It created a mounted storage with guaranteed persistence that you could access from the EC2 instance. Theoretically you could install MySQL there and use it to store the flash files. If you only want to serve files through the HTTP protocol, there is no problem using EBS.
CloudFront however has some advantages:
Users are automatically routed to the nearest edge location for high performance delivery of your content.
You can also use it to stream content through the the RTMP protocol.
You don't have to worry about the size of the storage. With EBS you create a storage with a specific size. This could be a problem if you later find out that you need more storage. With CloudFront the files are installed in S3 and you do not need to worry about their size.
You do not waste web server capacity. If you use EBS, the files will be served by the server in EC2.
You could also use S3, but you wouldn't able to use the RTMP protocol and you would need to manually create links to your files. Also, it wouldn't be possible to use your domain name for the files.
RDS also has some advantages over installing MySQL in EC2, EBS:
automated database backups
You can monitor your database with Amazon CloudWatch (free service)

You need EC2 to launch instance and create your LAMP server. RDS is good if you don't need to manage MySql db yourself, but one limiting factor of RDS is you can't have DB replication.
For persistent storage, you can make use EBS or S3 for data file.

One thing not mentioned in any of these replies is the security that may (or may not) need to go around your file access. Cloud networks are good for publicly accessible data, but I haven't seen a cloud network yet that will provide a granular level of file access on a per user basis. While you may be able to obfuscate the url's to access files so that it isn't easy to sequentially guess audio file IDs, that may not be enough if people are keeping private audio. Not saying don't do it, just make the decision with care.

Is it possible to use AWS as a web host? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed last year.
The community reviewed whether to reopen this question last year and left it closed:
Original close reason(s) were not resolved
Improve this question
Is it possible to load / host an entire website using AWS? Or is it only a service that can load specific pieces of a website - such as images, etc. Obviously, I'd want to use my own domain. If you can use it, are there any limitations?
Here's the AWS link, for context:
http://aws.amazon.com/s3/

AWS = Amazon Web Services = a suite of different web services.
S3 (which you linked to) is an object store. You can't host a web service on S3.
EC2, also under the AWS umbrella, is virtualized compute space. You CAN host a web service on EC2. It is just like having a server in a rack somewhere, except that when you shut down an instance, it is gone forever. But using EBS, which is like a virtualized hard drive, will prevent you from losing your data when the EC2 instance shuts down.
See http://aws.amazon.com/ec2/ and http://aws.amazon.com/ebs/

EDIT: Aug 12, 2016 they have a dedicated section on how to get started hosting a website on AWS. Please note S3 only allows STATIC websites but AWS provides SDKs in case you want to run PHP, ASP.NET, etc on your instance. See the links for more details.
http://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html
https://aws.amazon.com/websites/
So guess what I just found while doing some Google searches for hosting on AWS?! A blog post by the AWS stating that you can (now) host a website on S3. (Funny enough, the StackOverflow and the AWS post were right next to each other in the SERPs!)
http://aws.typepad.com/aws/2011/02/host-your-static-website-on-amazon-s3.html

Yes it is completely possible to host websites on AWS in 2 ways:
1.) Easy - S3 (Simple Storage Solution) is a bucket storage solution that lets you serve static content e.g. images but has recently been upgraded so you can use it to host flat .html files and your site will get served by a default Apache installation with very little configuration on your part (but also little control).
2.) Trickier - You can use EC2 (Elastic Compute Cloud) and create a virtual Linux instance then install Apache/NGinx (or whatever) on that to give you complete control over serving whatever/however you want. You use SecurityGroups to enable/disable ports for individual machines or groups of them.
#danben your EC2 instance does not have a constant public IP by default. Amazon makes you use a CNAME - not an A record as your IP may change under load. You have to pay for an ElasticIP to get a consistent public IP for your setup (or use some sort of DynDNS)

As #danben mentioned, there is a difference between S3 and EC2.
One thing that may be interesting for people looking to host a website on Amazon, specially if they want to start small is that Amazon started offering a free tier some months ago. Together with services like BitNami Cloud Hosting (disclaimer, I helped design it, so it is a bit like my baby :) means you can get your site on the Amazon cloud in just minutes, for basically 0 dollars. You still need to give credit card info to Amazon, but it will not be charged if you stay within the limits of their free tier.
One thing to consider too is that at the time of writing this (Jul 2011), Amazon restricts you to one IP address per server. If you need to host multiple domains, you may need to use name-based virtual hosts or some tricks using their Elastic Load Balancer (which will cost you more). But all in all, it is worth a try if you are a bit technical and want more control than what shared hosting provides you

At reinvent 2018, AWS launched the Amplify Console, a continuous deployment and hosting service for single page and static apps with serverless backends. Check it out: http://console.amplify.aws

Yes! You can easily host your website on AWS.
There are two ways;
One with Native AWS - This is a tricky method that requires expertise and a series of commands to run. You need to manage security, DNS, SSL, server protocols, and more by yourself.
Managed Cloud Platforms like Cloudways - You can easily launch an AWS server and host your website with a few clicks. Moreover, you can quickly manage your server protocols, packages, security firewalls, DNS, and more from its intuitive platform.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio