How to install the Puppet agent on a large number of nodes? - Windows

What is a good approach for installing PE4 agents on a large number (500+) of machines? These machines are all Windows VMs, not hosted in any cloud.
It is really a tough task to log in to each machine and install manually.

The orchestration tools already suggested in the comments are one option.
You can also use WMIRemote, PSExec or PowerShell Remoting to perform the installs. For the installs, you could also use Chocolatey.
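As a rough illustration of the PsExec route, the sketch below prints one install command per node; the output could be pasted into a batch file on a Windows admin machine. It assumes the agent MSI has already been copied to a local path on every node. The host names, the MSI path and the master name are made-up placeholders. `/qn /norestart /i` and the `PUPPET_MASTER_SERVER` property are the usual unattended-install options for the Puppet agent MSI, but check them against your PE version.

```shell
#!/bin/sh
# Hypothetical values: adjust the MSI path, master name and host list.
MSI='C:\Windows\Temp\puppet-agent-x64.msi'   # assumed to exist on each node
MASTER='puppet.example.com'
HOSTS='web01 web02 db01'

# Print the PsExec command line for one node.
# '%s' keeps printf from interpreting the backslashes in the Windows paths.
emit_install_cmd() {
    printf '%s\n' "psexec \\\\$1 -s msiexec /qn /norestart /i $MSI PUPPET_MASTER_SERVER=$MASTER"
}

for host in $HOSTS; do
    emit_install_cmd "$host"
done
```

Redirecting the output to a file such as install-agents.bat gives you a script that installs on one node after another.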


Remotely installing an MSI on different Windows nodes?

I have an MSI I want to occasionally install on remote Windows nodes in my cluster. What is the easiest, lowest-overhead way of doing this?
I've looked into Chef, but it seems like I need an entire server just to manage different nodes.
Chef's windows_package resource is used to install MSIs. If you're on a very old version you might need the windows cookbook for it, but we pulled MSI support into core in maybe ~12.0, I think. Check the release notes if you have to be certain, but anything remotely recent has it in core.

Chef Architecture Configuration Windows

We have a Windows environment here and I was looking into Chef as a CM solution, as it has significantly stronger Windows support compared to others. I know that it can use a server/client configuration or just chef-solo. Ideally, I would like to have a distributed environment and be able to manage my Windows nodes from a centralized server.
It looks like Chef server is not available on Windows, so I would like to ask whether there is a way to bypass Linux and run Chef in a pure Windows environment. Is there another way to set up a centralized Chef repository with all the recipes, cookbooks, etc. that would run on Windows, or is the Linux part essential to fully implement Chef? Is it doable to run just chef-solo but have a centralized, version-controlled Chef repository synced with it? Thanks!
Chef Solo, or Chef Client in local mode (a.k.a. chef-zero, preferred over chef-solo by many people), should be able to fulfil your needs. You would just have to either update, e.g., a Git repository or download an archive containing all the cookbooks, and then let chef-solo/chef-zero apply them.
Chef server is not available for Windows. If it is an option, you can make use of Hosted Chef.
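A minimal sketch of the Git-plus-local-mode workflow, assuming the cookbooks live in a Git repository that is already cloned on the node and that chef-client is installed there. The directory name chef-repo and the run list recipe[base] are hypothetical. By default the script only prints the commands (DRY_RUN=1); the same git and chef-client invocations can be typed into a Windows command prompt.

```shell
#!/bin/sh
REPO_DIR="chef-repo"        # hypothetical local clone of the cookbook repository
RUN_LIST="recipe[base]"     # hypothetical run list

# Print commands instead of executing them unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Update the repository, then converge from the repository root so that
# local mode can find ./cookbooks. The body runs in a subshell so the cd
# does not leak into the caller.
sync_and_converge() (
    run git -C "$REPO_DIR" pull --ff-only &&
    run cd "$REPO_DIR" &&
    run chef-client --local-mode --runlist "$RUN_LIST"
)

sync_and_converge
```

Scheduling this on each node (for example with a scheduled task) gives you a centrally versioned repository without a Chef server.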

Is it possible to install Hive and Hadoop on Windows?

I want to know if I can install Hive on Windows. If yes, how can I do that?
As of now the Microsoft-provided "Hadoop on Windows" is not available for general consumption and there is no public information about its general availability.
If you see my blog below you will see that I have had a chance to use the binaries in the past, but most of the focus is now on "Hadoop on Azure", which is in limited CTP release 2:
There are developers who have written articles or solutions on running Hadoop and other components on Windows using Cygwin, which you can try; however, I don't know of anything really robust and stable. If you really want to give it a try, I would personally suggest downloading the Cloudera Hadoop VM onto your Windows box and trying it with any virtual machine player application.
You need to install Cygwin.
Would I? No; I'd run the Cloudera VMs and not try to deal with all the possible issues.
Coming real-soon-now is the Hadoop for Windows Server program formerly known as Isotope.
Getting Started with Hadoop For Windows Server
Performance is way better than with the equivalent Cygwin install, since all the file I/O is done natively. It comes with Pig and Hive thrown into the bag too, and has an equivalent Azure install package as well - check it out.

Simplest way to get access to a remote server for computing tasks

I'm working on some academic research projects involving scraping large data sets from the web using Python. It's been inconvenient to work on my academic institution's Linux server because (1) I don't have superuser access, meaning I'm dependent on the IT staff to install my packages, and (2) my disk quota is somewhat limited (I would ideally want ~10 GB). What is the simplest way for me to get access to a machine that solves these problems? I don't need huge processing power; I just need access to a reasonably fast machine that runs 24/7, so that my programs can run continuously, and above all, something very simple to get running, use, and maintain, since I have a few non-CS people working on this project with me. Linux would be preferable, but I'd consider Windows too.
I'm aware of Amazon Web Services, but am wondering if there's something more appropriate to my specific needs.
By the way, it would be a huge bonus if I could get some sort of remote desktop access to this machine so I wasn't limited to using SSH and SFTP.
EDIT: I can't use VirtualBox or Virtual PC because I need the program to be running around the clock, and I need to turn off my laptop often, etc.
If you do want to stick with running on your CS department's machines, use virtualenv to solve your package installation woes. And if disk space is an issue, you could use S3 (and perhaps FUSE) to store huge amounts of data extremely cheaply.
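For the virtualenv route, a minimal sketch might look like this. It assumes virtualenv is already available on the server; the environment path and the two packages are just examples. By default it only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# Hypothetical location inside your home directory; no superuser access needed.
ENV_DIR="${ENV_DIR:-$HOME/envs/scraper}"

# Print commands instead of executing them unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Create the environment, then install packages with the environment's own pip
# so nothing touches the system-wide Python.
setup_env() {
    run virtualenv "$ENV_DIR" &&
    run "$ENV_DIR/bin/pip" install requests beautifulsoup4
}

setup_env
```

Afterwards, run your scripts with the interpreter in that environment's bin directory and they will see the packages you installed.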
However, if that's not really what you're after, I can recommend Slicehost very highly. They give you a virtual private server - so you have complete control over what gets installed, users, admin, etc.
In principle, it's very much like EC2 (which I prefer to use for "real" servers), but has a friendly interface, great customer service and is aimed at smaller projects like yours.
Use x11vnc with ssh.
Run 'sudo apt-get install x11vnc' on your remote server.
Once you have that, you can access your remote server via vnc, but the great thing is that you can tunnel vnc over ssh like so:
ssh -X -C -L 5900:localhost:5900 remotehost x11vnc -localhost -display :0
For more details see the x11vnc manpage.
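To make the two halves explicit, the sketch below prints the tunnel command from above together with the matching viewer command, which you run in a second terminal on your local machine. It assumes a vncviewer binary that accepts the host:display form (TigerVNC and RealVNC do) and keeps the placeholder hostname remotehost.

```shell
#!/bin/sh
REMOTE="remotehost"   # placeholder hostname, as in the command above
PORT=5900             # VNC display :0 listens on TCP port 5900

# Terminal 1: forward local port 5900 to the remote x11vnc over ssh.
tunnel_cmd() {
    printf 'ssh -X -C -L %s:localhost:%s %s x11vnc -localhost -display :0\n' \
        "$PORT" "$PORT" "$REMOTE"
}

# Terminal 2: point the viewer at the local end of the tunnel.
# VNC display N corresponds to TCP port 5900+N.
viewer_cmd() {
    printf 'vncviewer localhost:%s\n' "$((PORT - 5900))"
}

tunnel_cmd
viewer_cmd
```

Because x11vnc is started with -localhost, it only accepts connections that arrive through the ssh tunnel.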
Or, just set up remote desktop -- (which is actually vnc) on your Linux distribution. Most distributions come with a GUI to configure remote desktop access.
If you have a Linux machine you can use, then ssh -X will allow you to start GUI programs. It's not remote desktop, but it's close.
ssh -X
Then bam. A Firefox window pops up on your desktop.
I have been pretty happy with TekTonic Virtual Private Servers. It's a virtualized environment, but you have full root access to install any packages you need. I'm not sure what your CPU and memory constraints are, but if they aren't too extensive then this should fit the bill nicely for you. I don't know if you would be able to enable a remote desktop as I've never tried but it may be possible to install the requisite packages.
The plans range from $15/mo to $100/mo. The $15/mo plan comes with 294MB RAM, 13GB disk space, and 2.6GHz max CPU speed. I ran on that plan for quite a while and eventually moved to the next level up, with double the disk/cpu/mem, and I've been quite happy with it. I've been with them since 2003 and have yet to find anyone who offers equivalent plans at these prices.

Creating a virtual machine image as a continuous integration artifact?

I'm currently working on a server-side product which is a bit complex to deploy on a new server, which makes it an ideal candidate for testing out in a VM. We are already using Hudson as our CI system, and I would really like to be able to deploy a virtual machine image with the latest and greatest software as a build artifact.
So, how does one go about doing this exactly? What VM software is recommended for this purpose? How much scripting needs to be done to accomplish this? Are there any issues in particular when using Windows 2003 Server as the OS here?
Sorry to deny anyone an accepted answer here, but based on further research (thanks to your answers!), I've found a better solution and wanted to summarize what I've found.
First, both VirtualBox and VMWare Server are great products, and since both are free, each is worth evaluating. We've decided to go with VMWare Server, since it is a more established product and we can get support for it should we need it. This is especially important since we are also considering distributing our software to clients as a VM instead of a special server installation, assuming that the overhead from the VMWare Player is not too high. Also, there is a VMWare scripting interface called VIX which one can use to directly install files to the VM without needing to install SSH or SFTP, which is a big advantage.
So our solution is basically as follows... first we create a "vanilla" VM image with OS, nothing else, and check it into the repository. Then, we write a script which acts as our installer, putting the artifacts created by Hudson on the VM. This script should have interfaces to copy files directly, over SFTP, and through VIX. This will allow us to continue distributing software directly on the target machine, or through a VM of our choice. This resulting image is then compressed and distributed as an artifact of the CI server.
Regardless of the VM software (I can recommend VirtualBox, too) I think you are looking at the following scenario:
1. The build is done.
2. CI launches the virtual machine (or it is always running).
3. CI uses scp/sftp to upload the build into the VM over the network.
4. CI uses ssh (if available on the target OS running in the VM) or another remote command execution facility to trigger installation in the VM environment.
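The upload and install steps could be sketched roughly as follows, assuming an SSH server is reachable inside the VM. The host name, the artifact path and the install.sh script are all hypothetical placeholders. By default the script only prints the commands (DRY_RUN=1), so you can check them before wiring it into a build job.

```shell
#!/bin/sh
# Hypothetical names: adjust to your VM and build output.
VM_HOST="ci@testvm"
ARTIFACT="build/product.tar.gz"
REMOTE_DIR="/tmp/deploy"

# Print commands instead of executing them unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Upload the build artifact, then unpack it and trigger the installer remotely.
deploy() {
    archive=$(basename "$ARTIFACT")
    run ssh "$VM_HOST" "mkdir -p $REMOTE_DIR" &&
    run scp "$ARTIFACT" "$VM_HOST:$REMOTE_DIR/" &&
    run ssh "$VM_HOST" "cd $REMOTE_DIR && tar xzf $archive && ./install.sh"
}

deploy
```

On a Windows guest without SSH, the same three steps would have to go through whatever remote execution facility the VM offers.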
VMWare Server is free and a very stable product. It also gives you the ability to create snapshots of the VM slice and roll back to a previous version of your virtual machine when needed. It will run fine on Win 2003.
In terms of provisioning new VM slices for your builds, you can simply copy and paste the folder that contains the VMWare files, change the SID and IP of the new VM and you have a new machine. It takes about 15 minutes, depending on the size of your VM slice. No scripting required.
If you use VirtualBox, you'll want to look into running it headless, since it'll be on your server. Normally, VirtualBox runs as a desktop app, but it's possible to start VMs from the command line and access the virtual machine over RDP.
VBoxManage startvm "Windows 2003 Server" -type vrdp
We are using Jenkins + Vagrant + Chef for this scenario.
So you can do the following process:
1. Version control your VM environment using Vagrant provisioning scripts (Chef or Puppet).
2. Build your system using Jenkins/Hudson.
3. Run your Vagrant script to fetch the last stable release from the CI output.
4. Save the VM state to reuse in future.
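As a sketch of the last two steps, a Jenkins shell build step could look something like this. It assumes a Vagrantfile in the workspace whose provisioner pulls the latest build, and the VirtualBox provider. The box file name is made up; BUILD_NUMBER is the variable Jenkins sets for each build. By default the commands are only printed (DRY_RUN=1).

```shell
#!/bin/sh
# Hypothetical artifact name; falls back to "dev" outside Jenkins.
BOX_FILE="product-${BUILD_NUMBER:-dev}.box"

# Print commands instead of executing them unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Boot and provision the VM, shut it down cleanly, then export it as a box
# file that can be archived as a build artifact and reused later.
build_vm_artifact() {
    run vagrant up --provision &&
    run vagrant halt &&
    run vagrant package --output "$BOX_FILE"
}

build_vm_artifact
```

The resulting .box file can then be archived by the CI job like any other artifact.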
I'd recommend VirtualBox. It is free and has a well-defined programming interface, although I haven't personally used it in automated build situations.
Choosing VMWare is currently NOT a bad choice.
Just as VMWare provides support for VMWare Server, Sun provides support for VirtualBox.
You can also accomplish this task using VMWare Studio, which is also free.
The basic workflow is this:
1. Create an XML file that describes your virtual machine
2. Use Studio to create the shell.
3. Use VMWare server to provision the virtual machine.
