Spark AMI for Ubuntu (or maybe CentOS), not Amazon Linux?

The Spark distribution includes an EC2 launch script that points to a location on GitHub for Spark AMIs. Unfortunately, the only AMI available is Amazon Linux, which is very limited; specifically, Amazon Linux has limited package support.
So if, for example, I want PHP 5.4 (instead of the default 5.3) on Amazon Linux, I'm out of luck.
Are there any non-Amazon Linux AMIs available for use with spark-ec2?

I don't know of an up-to-date set of Spark AMIs apart from the ones provided by the Spark project.
That said, I have developed a way to use Packer to automatically create a set of Spark AMIs from a set of base AMIs and some Bash scripts:
https://github.com/nchammas/spark-ec2/tree/packer/image-build
This is being done as part of SPARK-3821.
You'll need to do some work to make this run on Ubuntu, since the scripts currently assume a yum-based Linux distribution.
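One way to start porting the scripts is to have them detect which package manager the base image uses instead of hard-coding yum. The sketch below is a hypothetical helper, not part of spark-ec2: it reads the standard `ID` and `ID_LIKE` fields of `/etc/os-release` and returns `apt-get` for Debian/Ubuntu-family images and `yum` for the Red Hat family (including Amazon Linux, whose `ID` is `amzn`). The mapping covers only those families and is an assumption you would extend as needed.

```python
# Hypothetical helper (not part of spark-ec2): choose the package manager for a
# base image from /etc/os-release, so one build script can serve both
# yum-based and apt-based distributions.


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines from /etc/os-release into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Values may be wrapped in single or double quotes.
        info[key] = value.strip().strip("\"'")
    return info


def package_manager(os_release_text: str) -> str:
    """Return 'apt-get' or 'yum' based on the ID and ID_LIKE fields."""
    info = parse_os_release(os_release_text)
    ids = [info.get("ID", "")] + info.get("ID_LIKE", "").split()
    if any(i in ("ubuntu", "debian") for i in ids):
        return "apt-get"
    if any(i in ("amzn", "centos", "rhel", "fedora") for i in ids):
        return "yum"
    raise ValueError(f"unsupported distribution: {ids}")
```

On the build instance you would call `package_manager(open("/etc/os-release").read())` and branch the install commands on the result.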
Basically:
These lines define the base AMIs to build on.
These lines show the scripts that are being run to build the image.
Two further sets of lines tell Packer to copy the built AMIs to all EC2 regions. You probably want to change this.
The shortest path to success for you might be to try a CentOS or Fedora base image that has the packages you are looking for. That will minimize the changes you have to make to the Bash scripts.
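For orientation, here is a minimal sketch of the kind of Packer template described above, generated from Python as a legacy JSON template. The `amazon-ebs` builder, `source_ami`, `ami_regions`, and the `shell` provisioner's `scripts` list are real Packer options; the AMI ID, region, instance type, SSH user, and script names are hypothetical placeholders, not the values used in the spark-ec2 branch. Leaving out `ami_regions` keeps the AMI in the build region rather than copying it everywhere.

```python
import json

# Hypothetical placeholders: substitute a real CentOS/Fedora AMI ID and the
# build scripts you actually use.
BASE_AMI = "ami-00000000"
BUILD_SCRIPTS = ["base-setup.sh", "spark-setup.sh"]


def spark_ami_template(source_ami, scripts, copy_regions=None):
    """Build a legacy JSON Packer template (as a dict) for one amazon-ebs image."""
    builder = {
        "type": "amazon-ebs",
        "region": "us-east-1",
        "source_ami": source_ami,
        "instance_type": "t2.medium",
        "ssh_username": "centos",  # the default login user depends on the base AMI
        "ami_name": "spark-{{timestamp}}",  # Packer expands {{timestamp}}
    }
    # ami_regions copies the finished AMI to extra regions; omit it to keep
    # the image in the build region only.
    if copy_regions:
        builder["ami_regions"] = list(copy_regions)
    return {
        "builders": [builder],
        # The shell provisioner runs each script, in order, on the build instance.
        "provisioners": [{"type": "shell", "scripts": list(scripts)}],
    }


if __name__ == "__main__":
    print(json.dumps(spark_ami_template(BASE_AMI, BUILD_SCRIPTS), indent=2))
```

Redirect the output to a file such as `spark-ami.json` and run `packer build spark-ami.json`.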
Around the Spark 1.4 release timeframe (roughly June/July 2015), I will work to have this merged into the main spark-ec2 repo.

Related

Ways to build different types of Virtual Machine images

Inside a CI/CD environment, I have a tar.gz file that I need to package into a virtual machine image. I want to take an Ubuntu Server installation, install some packages, install my tar.gz file, and output the image in various formats, both EC2/AMI and VMware OVF (and possibly others in the future, e.g. Docker images).
I've been looking at Packer, Vagrant, and Ansible. But I'm not certain which of these tools will help me accomplish what I need.
Packer sounds like the right solution, but the documentation isn't very clear on how to start with a VMware OVF/OVA image and build an EC2/AMI image. Or can I start with a Docker image and output an EC2/AMI image? Based on the docs, it seems like I need to start with an AMI to build an AMI, or start with a ".vmx" (the docs don't actually say anything about OVF/OVA files) to build an OVF/OVA. But can I start with format A and end up with format B?
Is Vagrant or Ansible better for this?
Here is what I've gathered.
I found a nicely documented approach that starts with an Ubuntu ISO image, uses a preseed file for unattended Ubuntu installation, and deploys it out to vSphere as a template:
https://www.thehumblelab.com/automating-ubuntu-18-packer/
To deploy to vSphere, it requires a plugin from JetBrains, and it uses vSphere/ESXi to build the template image. The Packer docs also discuss "Building on a Remote vSphere Hypervisor" using the remote_* keys in the JSON template, but I suppose this accomplishes the same thing.
Then, to build the EC2 AMI images, I believe you add another builder to the JSON template. However, I don't believe you can start from the same ISO image used by the VMware builder. Instead, I believe you need to start from a prebuilt AMI (specified by the source_ami field in the JSON).
I guess Packer doesn't let you start from a single source A and fan out to target formats B, C, etc. If you want to build an AMI, you need to start with an AMI. If you want to build VMware images, you need to start from an ISO or a .vmx (I suppose this covers OVF) and build out an OVF/OVA or template.
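The closest Packer gets to "one source, many formats" is a single template with several builders that share the same provisioners: each builder starts from its own source (an ISO for `vmware-iso`, a `source_ami` for `amazon-ebs`), but the same install steps run against every builder, so the tar.gz ends up in both outputs. The sketch below generates such a template; the ISO URL, checksum, AMI ID, and tarball path are hypothetical placeholders, and the preseed `boot_command`/`http_directory` settings a real Ubuntu ISO build needs are omitted. Check the `vmware-iso` documentation for how its `format` option behaves on local versus remote builds.

```python
import json


def multi_target_template(iso_url, iso_checksum, source_ami, tarball):
    """Legacy JSON Packer template: one VMware build and one AMI build, same provisioning."""
    vmware = {
        "type": "vmware-iso",
        "iso_url": iso_url,
        "iso_checksum": iso_checksum,
        "ssh_username": "ubuntu",
        "format": "ovf",  # export as OVF (relies on VMware's ovftool)
        # boot_command / http_directory for the preseed file omitted for brevity
    }
    ec2 = {
        "type": "amazon-ebs",
        "region": "us-east-1",
        "source_ami": source_ami,
        "instance_type": "t2.micro",
        "ssh_username": "ubuntu",
        "ami_name": "myapp-{{timestamp}}",
    }
    # Provisioners run once per builder, so both images get the same contents.
    provisioners = [
        {"type": "file", "source": tarball, "destination": "/tmp/app.tar.gz"},
        {"type": "shell", "inline": ["sudo tar -xzf /tmp/app.tar.gz -C /opt"]},
    ]
    return {"builders": [vmware, ec2], "provisioners": provisioners}


if __name__ == "__main__":
    # Hypothetical placeholder values.
    template = multi_target_template(
        "https://example.com/ubuntu-server.iso", "sha256:<checksum>",
        "ami-00000000", "app.tar.gz",
    )
    print(json.dumps(template, indent=2))
```

`packer build` on the resulting file runs both builders, in parallel by default; `packer build -only=amazon-ebs` restricts it to the AMI.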

Need to update my EC2 environment regularly?

I have an EC2 instance, Python 3 environment with many installed features like SQLAlchemy, cherrypy, etc. How often do I need to update them? Are they updated automatically, or do I need to do this updating manually? I haven't found this information online.
Your EC2 instance will not be updated automatically; in this regard, it behaves exactly like a local machine. If you need updates to your OS or other software, you will have to apply them manually.
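To see which installed Python packages (SQLAlchemy, CherryPy, and so on) have newer releases, you can ask pip itself: `pip list --outdated --format=json` prints a JSON list with `name`, `version`, and `latest_version` for each outdated package. The sketch below wraps that command; it only reports, it does not upgrade anything, and it needs network access to PyPI.

```python
import json
import subprocess
import sys


def outdated_packages(pip_json: str):
    """Turn `pip list --outdated --format=json` output into (name, installed, latest) tuples."""
    return [(p["name"], p["version"], p["latest_version"]) for p in json.loads(pip_json)]


def check_environment():
    """Run pip for the interpreter executing this script, i.e. your Python 3 environment."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "list", "--outdated", "--format=json"],
        capture_output=True, text=True, check=True,
    )
    return outdated_packages(result.stdout)
```

Calling `check_environment()` from the environment's own interpreter and printing each tuple gives a quick report of what has drifted.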
However, manually running software update commands or even release upgrades is not the ideal scenario and something you may want to avoid in cloud environments.
Instead, you should think about how to automate your environment setup and configuration.
AWS provides some ways to run scripts at instance launch:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html
This way, you can easily migrate to more up-to-date base images (AMIs) or change the hardware configuration of your instance.
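As a concrete illustration, user data can be a shell script beginning with `#!`, which cloud-init runs as root on the instance's first boot. The sketch below builds such a script to patch the OS and install the Python dependencies. The requirement list is hypothetical, and the script assumes `python3` and pip are already on the base AMI. The raw RunInstances API expects the user data base64-encoded, which `encode_for_api` does.

```python
import base64
import shlex

# Hypothetical requirement list: pin the versions you have actually tested.
REQUIREMENTS = ["SQLAlchemy", "CherryPy"]


def build_user_data(requirements, package_manager="yum"):
    """Return a bash script for EC2 user data that patches the OS and installs Python deps."""
    lines = ["#!/bin/bash", "set -euo pipefail"]
    if package_manager == "yum":
        lines.append("yum update -y")
    else:
        # Keep apt from prompting during an unattended boot.
        lines.append("export DEBIAN_FRONTEND=noninteractive")
        lines.append("apt-get update && apt-get upgrade -y")
    # Assumes python3 and pip are already present on the base AMI.
    lines.append("python3 -m pip install --upgrade "
                 + " ".join(shlex.quote(r) for r in requirements))
    return "\n".join(lines) + "\n"


def encode_for_api(script: str) -> str:
    """Base64-encode the script, as the raw RunInstances API expects."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")
```

With boto3 you would pass the plain script as `UserData` to `run_instances`; the SDK base64-encodes it for you, so `encode_for_api` is only needed when calling the API directly.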

Can fly CLI tool be used for any Concourse machine?

I'm working with Concourse and using the fly CLI tool. Whenever I create a new VM or instance running Concourse, it immediately gives me the option to download the fly CLI tool.
Is this version of fly specific only to the machine I downloaded it from or will it work on any machine running Concourse?
fly will warn you if its version differs from the target's, and will refuse to run if the discrepancy is too large (a major or minor version difference), in which case you should run fly -t <target> sync to download a matching fly from that target.
If it's not warning you, you should be fine.
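The rule described above can be modeled in a few lines. This is a sketch of the behavior as stated in this answer, not fly's actual source code: identical versions are fine, a patch-only difference produces a warning, and a major or minor difference makes fly refuse to run until you sync. Version strings are assumed to look like `major.minor.patch`.

```python
def parse_version(v: str):
    """Parse 'major.minor.patch' into a tuple of ints, padding missing parts with 0."""
    parts = [int(x) for x in v.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def fly_compatibility(fly_version: str, target_version: str) -> str:
    """Model of the rule described above: returns 'ok', 'warn', or 'refuse'."""
    f, t = parse_version(fly_version), parse_version(target_version)
    if f == t:
        return "ok"
    if f[:2] != t[:2]:
        # Major or minor differs: fly won't run; `fly -t <target> sync` fixes it.
        return "refuse"
    # Only the patch level differs: fly warns but still runs.
    return "warn"
```

So a fly downloaded from one Concourse machine works against any other Concourse installation on the same major.minor version.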

Spark EC2 support for Windows

All the documentation about deploying a Spark cluster on Amazon EC2 assumes a Linux environment. However, my distributed project currently depends on some Windows functionality, and I would like to start working with a Windows cluster while making the necessary changes.
I would like to know if there is any way to deploy a Windows Spark cluster on EC2 that is relatively similar to the spark-ec2 script provided by Spark.
spark-ec2 currently only supports launching clusters in EC2 using specific Linux AMIs, so deploying a Windows Spark cluster is currently not possible using that tool. I doubt that spark-ec2 will ever have that capability, since all of the setup scripts it uses assume a Linux host.
That said, Databricks recently announced a community-managed index of Spark packages, and people are adding packages to it all the time. For example, there is already a package that lets you launch Spark clusters on Google Compute Engine.
Though there doesn't currently appear to be anything for you, I would keep my eye on that community index for something that lets you launch Windows Spark clusters on EC2.
In the Spark Packages index suggested by Nick, you can see a recently added project by Sigmoid Analytics, spark_azure, that lets you launch a Spark cluster on Azure:
https://github.com/sigmoidanalytics/spark_azure

Run Amazon EC2 AMI in Windows

Is there a way to run an Amazon EC2 AMI image in Windows? I'd like to be able to do some testing and configuration locally. I'm looking for something like Virtual PC.
If you build your images from scratch, you can do this with VMware (or your favorite VM software).
Build and install your Linux box as you'd like it, then run the AMI packaging/uploading tools in the guest. Then just keep backup copies of your VM image in sync with the different AMIs you upload.
Some caveats: you'll need to make sure you're using compatible kernels, or at least have compatible kernel modules in the VM, or your instance won't boot on the EC2 network. You'll also have to make sure your system can configure itself automatically (network, mounts, etc.).
If you want to use an existing AMI, it's a little trickier. You need to download and unpack the AMI into a VM image, add a kernel, and boot it. As far as I know, there's no one-click method to make this work. Also, the AMIs might be encrypted (I know they are at least signed).
You may be able to do this by setting up a "bootstrap" VM specifically to extract the AMIs into a virtual disk using the AMI tools, then booting that virtual disk separately.
I know it's pretty vague, but those are the steps you'd have to go through. You could probably write some scripts to automate converting AMIs to VMDKs.
The Amazon forum is also helpful; see, for example, this article.
This article also covers some of these processes in detail.
Amazon EC2 with Windows Server was announced this morning, which is very exciting:
http://aws.amazon.com/windows/
It's a bit of a square peg in a round hole ... kind of like running MS-Office on Linux.
Depending on how you value your time, it's cheaper to just get another PC and install Linux and Xen.

Resources