How to build and install Spark in an offline environment?

I am trying to install Spark 1.3.1 on an offline cluster (no Internet at all, only LAN). However, I don't know how to build it from source, since building with either Maven or sbt requires a network connection. Can someone offer some help or possible solutions?
Thanks.

A simple (albeit somewhat hacky) solution would be to build it on a machine with internet access and then copy everything in ~/.ivy2 (or ~/.m2, if you build with Maven) over to the machine with only LAN access, so that the build can use the cached items. Another, perhaps simpler, option would be to use a pre-built Spark, if that's an acceptable solution.
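The copy step above could be sketched roughly as follows. This is only an illustration in Python, not part of the original answer: the helper names pack_cache and unpack_cache and the archive name are hypothetical, and it assumes the cache lives in a single directory such as ~/.ivy2 (sbt/Ivy) or ~/.m2 (Maven).

```python
import tarfile
from pathlib import Path


def pack_cache(cache_dir, archive):
    """Pack a dependency cache directory (e.g. ~/.ivy2) into a .tar.gz file.

    Hypothetical helper: run this on the machine that has internet access,
    after a successful build has populated the cache.
    """
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        raise FileNotFoundError(f"cache directory not found: {cache_dir}")
    with tarfile.open(archive, "w:gz") as tar:
        # Store entries under the directory's own name (".ivy2/...") so the
        # archive can be unpacked directly into a home directory.
        tar.add(cache_dir, arcname=cache_dir.name)
    return Path(archive)


def unpack_cache(archive, home_dir):
    """Unpack the archive into home_dir on the offline machine.

    Hypothetical helper: recreates e.g. home_dir/.ivy2 with the cached items.
    """
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(home_dir)
```

On the connected machine one might call pack_cache(Path.home() / ".ivy2", "ivy2-cache.tar.gz"), move the archive over the LAN, and then call unpack_cache("ivy2-cache.tar.gz", Path.home()) on the offline machine. The build there probably still needs to be told not to reach the network; with Maven that is the -o (offline) flag.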

Related

Is it possible to install CDH on a RHEL7 server where Hadoop and a few other components are installed separately?

I have a RHEL7 server on which I am trying to create a common data lake platform for POC and learning purposes. I have set up Hadoop, Hive, ZooKeeper, Kafka, Spark and Sqoop separately.
Installing these components separately has turned out to be tricky and is taking a lot of effort, even though this is for internal use and not for production.
I am now trying to install the CDH package on this server.
Is it possible to do so? Will it overlap with the current installations?
How can this be achieved?
Note: We went with separate installations because the server had no internet access at that time.
The reason for moving to CDH now is that internet access is available for a few days after some approvals, and CDH saves a lot of time and effort and includes the components required to set up a data lake.
Can someone please help me out here?
Yes, it is feasible to set up CDH without disturbing the existing configuration by using Docker. Check out the link below for a setup guide. I have tested this, and it works fine even with the individual tools already set up.
https://www.cloudera.com/documentation/enterprise/5-6-x/topics/quickstart_docker_container.html

Remotely installing an MSI on different Windows nodes?

I have an MSI I want to occasionally install on remote Windows nodes in my cluster. What is the easiest, lowest-overhead way of doing this?
I've looked into Chef, but it seems like I need an entire server just to manage different nodes.
Chef's windows_package resource is used to install MSIs. If you're on a very old version you might need the windows cookbook for it, but we pulled MSI support into core in maybe ~12.0, I think. Check the release notes if you have to be certain, but anything remotely recent has it in core.

Clarity on Vagrant usage and provisioning tool

Ok, so I'm a bit late jumping onto the Vagrant band-wagon, but figured it's about time I did.
Brief background: I've been a freelance developer for quite some time now, developing solutions based on Magento and Drupal, and have finally gathered enough demand to warrant building up a team. Previously, whenever I started development on a new project, I used to clone a preconfigured base VM in VirtualBox and use that. Of course, there was still configuration to do on it before I could start actual development. Every project's web files therefore resided inside /var/www/projectname on an Ubuntu VM.
Now I've read up on why I should be using Vagrant, especially considering that I now have a team of 4 developers working with me, but I would appreciate any feedback on the following questions I have:
Moderator note: I know this isn't exactly asking a programming question, so please advise if this could be turned into a wiki, as I'm sure that feedback into this will help someone just like me.
I am still reading through the Vagrant docs, so please be kind...noob questions ahead!
1. I now work on a Mac. Does it matter if I use Parallels and another developer uses VirtualBox on Windows if we need to share or collaborate on projects?
2. When I issue the command vagrant up for an existing project, will it start the VM up as I would in VirtualBox, or will it recreate the VM?
3. Is the command vagrant halt the same as issuing sudo poweroff in Ubuntu, for example?
4. I currently use PhpStorm and its SFTP feature for project file synchronization, with the option to exclude certain files on the remote server (VM) from being imported and sync'ed...will I be able to specify the same using Vagrant folder sharing?
5. Could I easily zip or archive a Vagrant VM, move it to a file server, and then "re-import" it when and if needed? (For example, for bug fixes or new feature enhancements.)
6. What do we use to easily provision VMs for common projects? Should we be using Puppet, Chef, Puphpet or Salt? I've seen that Puphpet provides a nice GUI to create a Vagrantfile which, I'm sure, once generated, we could customize for future projects. At a very basic level, we need to ensure that certain applications are installed on the server (zip, phpmyadmin, OpenSSL, etc.), along with certain PHP settings, PHP and PEAR modules, and Apache settings. I already have base VMs set up as I'd like them for both Magento projects and Drupal projects.
7. EDIT: I should also add that I used to enable the Host Adapter in VirtualBox (on Windows), configure the VHost inside Ubuntu, and then update my host machine's hosts file with something like 192.168.56.3 drupalsite1.dev. So I'm unsure if Port Forwarding would be better to use? I'm not very clued up on that, I must admit.
Like I said - noob questions! However, I would really appreciate any feedback on these questions. My deepest thanks!
Most of what you are asking is subjective, so common sense and experience are the best tools.
1. I recommend that all team members use the same provider. Parallels isn't officially supported, and VirtualBox is readily available. The base boxes could vary slightly from provider to provider; you never know.
2. Vagrant will start the VM similarly, but it also does other things, such as configuring the network, hostname, and shared folders, so it is not quite the same. The big power lies in being able to tear down the environment and bring it back in a cleanly provisioned state.
3. Basically, yes.
4. Yes, your Vagrant VMs are just like your own mini cloud. You would interact with the servers much as you would with external boxes.
5. Yes, the simple answer is that it's called packaging, and you can share the resultant .box. However, it's good practice to keep the base box and provisioning scripts under CM so you can rebuild and modify as needed.
6. For provisioners, I think it depends on your experience, your familiarity with the provisioner language, and how much you want to invest in learning them. Look through the provisioner support and see what fits your needs and budget. Chef has a very steep learning curve, in my experience, but also has a lot of thought built in. Most provisioners have wide libraries of available installation "scripts".
7. The host adapter can be handled identically in Vagrant.
Learn by doing: I recommend going down the table of contents (navbar) of the Vagrant docs and trying each step where it makes sense. Then make your decisions.
That is my 2 cents. Hope this helps!

Is it possible to install Hive and Hadoop on Windows?

I want to know if I can install Hive on Windows. If yes, how can I do that?
As of now, the Microsoft-provided "Hadoop on Windows" is not available for general consumption, and there is no public information about its general availability.
As you can see from my blog post below, I have had a chance to use the binaries in the past, but most of the focus is now on "Hadoop on Azure", which is in limited CTP release 2:
http://blogs.msdn.com/b/avkashchauhan/archive/2012/01/28/creating-your-own-hadoop-cluster-on-windows-azure-by-using-your-own-windows-azure-subscription-account.aspx
There are developers who have written articles or solutions on running Hadoop and other components on Windows using Cygwin, which you can try; however, I don't know of anything very robust and stable. If you really want to give it a try, I would personally suggest downloading the Cloudera Hadoop VM onto your Windows box and running it with any virtual machine player application.
http://hadoop.apache.org/common/docs/r1.0.3/single_node_setup.html
You need to install Cygwin.
Would I? No; I'd run the Cloudera VMs and not try to deal with all the possible issues.
Coming real-soon-now is the Hadoop for Windows Server program formerly known as Isotope.
Getting Started with Hadoop For Windows Server
Performance is way better than with the equivalent Cygwin install, since all the file I/O is done natively instead. It comes with Pig and Hive thrown into the bag too, and has an Azure equivalent install package as well - check it out.

Experiences with the various ways of running svnserve on Windows

Is there anyone out there who can share experiences with the various flavours of running svnserve on Windows? I'm using it mainly for a small hobby project that I share with friends, so it will run on my desktop.
Using the Collabnet Subversion Edge seems a bit heavyweight. Are there any drawbacks in just running 'svnserve'? I recently found VisualSVN Server, which seems to add some easy administrative functions.
I have had good experiences with VisualSVN Server; it is very easy to set up and to configure user accounts.
It is also very easy to upgrade: just run the latest installer and you're done.
With VisualSVN you can run HTTPS with a self-signed certificate. If you just run plain svnserve, you're left without encryption by default, and that is not recommended if you plan to access your server from the internet.
Keep in mind that whatever solution you choose they all use standard svn as the backend and you can easily move your repositories from one solution to another.
If you plan to make your project open source, you can host your code at SourceForge or CodePlex.

Resources