Learning Mahout on Windows

I am familiar with machine learning algorithms, have a good background in statistics, and know how to program in R and Python. I have a Lenovo T430 laptop running Windows 7 with 4 GB of RAM and 200 GB of free hard disk space. I want to learn and observe the Apache Mahout algorithms and try the examples. I am a paid member of Safari Books Online and have access to books such as "Apache Mahout Cookbook" and "Mahout in Action".
Please answer the questions below.
1) What software do I need to install? Is it available for a Windows PC?
2) Please provide links for installing that software on a Windows PC.
3) Is there a book that shows how to use Mahout on a Windows PC?
I have already searched the Stack Exchange sites and Google extensively, but I am still confused about which software to install, how to install it, and how to run the examples from the Mahout website. Some of the results I found on Stack Exchange are two years old, and I suspect they may no longer be valid. My Google searches also turned up a variety of websites and YouTube videos, each recommending a different set of software.

From the readme:
Welcome to Apache Mahout!
Mahout is a scalable machine learning library that implements many different
approaches to machine learning. The project currently contains
implementations of algorithms for classification, clustering, frequent item
set mining, genetic programming and collaborative filtering.
Of course you can run Java and the Java SDK on Windows. All you need is an IDE in which you can write Java, such as Eclipse or IntelliJ. Just load up the jars!
The distribution contains:
mahout-core.jar
mahout-examples.jar
mahout-integration.jar
mahout-math.jar
It seems like a pretty straightforward Java library. You might need to add Hadoop or a database to run some of the examples.
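If you would rather try the jars from a command prompt before setting up an IDE, the following is a minimal sketch (in Python, since the asker already knows it) that assembles a Windows-style Java classpath from the jars listed above. The directory C:\mahout\lib and the class name MyFirstMahoutApp are hypothetical placeholders, and real releases usually add a version suffix to the jar names, so adjust them to the files you actually downloaded.

```python
import ntpath

# Jar names as listed in the answer above; real releases usually carry a
# version suffix, so adjust these to match the files you actually have.
MAHOUT_JARS = [
    "mahout-core.jar",
    "mahout-examples.jar",
    "mahout-integration.jar",
    "mahout-math.jar",
]


def build_classpath(lib_dir, jars=MAHOUT_JARS, extra=(".",)):
    """Join jar paths with ';', the classpath separator Java uses on Windows."""
    entries = list(extra) + [ntpath.join(lib_dir, jar) for jar in jars]
    return ";".join(entries)


def java_command(lib_dir, main_class):
    """Return the argument list for running main_class against the Mahout jars."""
    return ["java", "-cp", build_classpath(lib_dir), main_class]


if __name__ == "__main__":
    # C:\mahout\lib and MyFirstMahoutApp are hypothetical placeholders.
    print(" ".join(java_command(r"C:\mahout\lib", "MyFirstMahoutApp")))
```

The printed line can be pasted into a Windows command prompt once your own class has been compiled against the same classpath.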

Related

Installation error for CNTK v2beta15

I am trying to run my model on the Philly cluster, which uses CNTK v2beta15 with py34. Could someone point me to the documentation for that particular version? Many commands and examples from CNTK v2.0 (stable) do not work on Philly. I am also running into issues installing v2beta15 locally: I downloaded the binaries, but the machine crashes when I try to run install.bat.
Any solution would be very helpful!
Python documentation for 2.0b15 is at
https://cntk.ai/pythondocs-ver/2_0-beta15/
Also, consider rephrasing your question to benefit other SO users. Is the Philly cluster something that the rest of the community should care about? If you have issues with internal infrastructure in your organization it might be best to contact the team responsible for that through non-public means.

Installing Hadoop tar file vs Cloudera VM on Ubuntu

I am a complete beginner to Hadoop, and I have seen various posts on the internet which describe installing the Cloudera VM using VMware. Recently I saw a YouTube video which shows how to install Hadoop on Ubuntu by downloading the Hadoop tar file from Apache, without installing the Cloudera VM. My question is:
What is the difference between the two approaches? Is there any benefit to using one over the other?
I want to learn Hadoop by myself and am looking for the best or most widely adopted way to learn it.
Cloudera is "yet another distribution of Hadoop". You can think of basic Hadoop as the stock Android on Nexus phones and Cloudera Hadoop as the Android on non-Nexus phones. It's basically a custom-built version.
Cloudera is more of a plug-and-play version, meaning you can download the VM and start playing with Hadoop.
On the other hand, Hadoop on Ubuntu is a get-your-hands-dirty mode where you work on building your own Hadoop.
Personal opinion: I suggest setting up your own Hadoop, since it gives a better understanding of Hadoop's internals and helps with the Hadoop learning activities that follow.
Hope it helps. Happy Hadooping!
I spent a lot of time playing with the Cloudera software, and their Quickstart VM is good until you start trying to do things like add nodes. It was not designed for that, but once you have invested time in it, it would be nice to use it as the basis for a real system.
So the next step would be to use CDH (Cloudera's 'proper' Hadoop), Hortonworks' version, HDP, or maybe even MapR (I've not used it).
CDH and HDP have nice GUI features on top of basic Hadoop and are seemingly easier to set up. HOWEVER, I spent a lot of time trying unsuccessfully to get both CDH and HDP to work.
They give red lights and cryptic messages when things go wrong, and they add a layer of obfuscation when you try to fix things. For example, in plain Hadoop you can easily change the configuration files, but in CDH you can't access them directly; you have to discover where Cloudera hides its various options.
I would recommend using plain Hadoop unless you have a big organisation with lots of people and machines.
UPDATE: I have finally got HDP to work and it is really nice. It has a good Ambari GUI, and you can use Zeppelin notebooks to do fancy graphics.
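To illustrate the point above about plain Hadoop configuration being directly editable, here is a minimal sketch that writes and reads back the kind of XML used in files such as core-site.xml. The property fs.defaultFS is a standard Hadoop setting, but the value hdfs://localhost:9000 is only an example commonly seen in single-node tutorials, and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Render a dict of settings as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_site_xml(xml_text):
    """Parse a Hadoop-style configuration document back into a dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}


if __name__ == "__main__":
    # Example value only; use the address of your own NameNode.
    xml_text = build_site_xml({"fs.defaultFS": "hdfs://localhost:9000"})
    print(xml_text)
    print(read_site_xml(xml_text))
```

Because the file is plain XML, the same settings can also be changed by hand in a text editor, which is the convenience the answer describes.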

Which are the best sources for learning the Windows Installer technology?

I would like to know if you could share some (trusted) sources of information (books, URLs) that you consider the most relevant for learning Windows Installer. They could be for starting on this technology or for an advanced or professional level of knowledge.
Where can a future deployment engineer start and where can he/she go to keep on the right direction (step by step)?
I'm obviously biased but I think my blog and the WiX toolset are good ways to learn:
http://robmensching.com/blog
http://wix.sf.net (click on the Manual or Tutorial links on the right)
Some people like Phil Wilson's "The Definitive Guide to Windows Installer" but I never read it. I learned straight out of the MSI SDK.
I spent 7 years writing InstallScript installers before ever picking up MSI. While there is a huge difference between procedural, script-driven imperative installs and data-driven declarative installs, they both do the same fundamental thing: deploy software.
I became an MSI expert by studying everything I could on the domain, writing LOTS of installs, blogging for 7 years, and answering over 4,400 posts on the InstallShield community forums. The only way to go, in my book, is to have been there and done that.
So the first step in your quest should be to understand the Windows Platform and related technologies very thoroughly. These evolve over the years but you should get a decent understanding of:
Fundamentals
Registry
FileSystem
NTFS
ACLs
DLL Types (Win32, COM, .NET Assembly)
Win32 API
.NET Base Class Libraries
Service Control Manager
Drivers
ODBC
SQL
IIS
Active Directory (GPO, LDAP and so on)
Global Assembly Cache
WinSxS Cache
DLL Hell
Good and Bad Installer Behavior
The second step is
Tools
Now let's start writing installs. As Leslie (Easter, I assume) said in another answer, pick a tool and learn how to use it to accomplish the above things. But don't stop there; go on to the next step as soon as you can.
MSI
Start digging deep down into how your tool works behind the scenes as soon as you can. Just as you can write C# in .NET and look at the IL with ILDASM, learn to use ORCA and see what is happening. Read the MSI SDK. Yes, it's rough and cryptic, but I spent 3 months commuting between DC and TX, with at least 16 hours a week of travel away from internet connections and nothing except the SDK to read. Read it, know it, live it... the cryptic help topics will eventually start to click and become second nature.
And finally, read my blog: DeploymentEngineering.com and every other blog you can find.
There is not a simple answer. The primary reason is that most install developers use a specific tool which in turn hides the bulk of Windows Installer behavior. While it would be nice if those developers had an in-depth knowledge of Windows Installer, that's not the case. 
My suggestion would be as follows: 
Focus on a specific tool. Many of the development environments offer a trial period and some are free. The on-line help for these tools plus the act of building some sample packages will be a useful process.
If practical, consider taking a training class for the tool. I know Flexera sells their basic and advanced InstallShield course manuals. They are a bit overpriced, but they do include need-to-know Windows Installer specifics. The problem you'll run into is that most documentation is specific to the tool and does not explain much about how it connects to Windows Installer.
You'll need the Windows Installer SDK -- in addition to the help file, there are some interesting tools and VBScript scripts. Orca is one tool that is included with the SDK and there are similar tools on the Internet (SuperOrca, InstEd, etc.). The SDK is not a great read but it is a great reference. As you come across specific questions regarding Windows Installer use the SDK help file to understand the deeper internals.
Google 'windows installer blog'. You probably don't want to hear that, but there are many great blogs available that cover many bits and pieces of Windows Installer. Make sure you pick up the Windows Installer Team blog.
No matter what path you choose, you'll find learning Windows Installer to be a hands-on process. I hope this helps! 
I'm also biased, but this might be helpful. I recently revisited WiX for a real-world Windows Installer project and wrote up my solution which ultimately plugged into a continuous integration server.
The steps in the article take you through using WiX, localizing the MSI, and creating a bootstrapper for installing any prerequisites.
For learning, Tramontana's WiX tutorial helped me a lot.
A nice little blog post about how to debug custom actions is WiX and DTF: Debug a Managed Custom Action and how to generate an MSI log.
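As a rough illustration of generating an MSI log, the sketch below builds an msiexec command line with verbose logging enabled. The /i (install) and /l*v (verbose log) switches are standard msiexec options; the file names setup.msi and install.log are hypothetical placeholders, and the helper function is not taken from the linked post.

```python
def msi_install_command(msi_path, log_path):
    """Build an msiexec call that installs msi_path and writes a verbose log."""
    # /i installs the package; /l*v logs all messages with verbose output.
    return ["msiexec", "/i", msi_path, "/l*v", log_path]


if __name__ == "__main__":
    # setup.msi and install.log are placeholder names for illustration.
    print(" ".join(msi_install_command("setup.msi", "install.log")))
```

On a Windows machine the same argument list could be handed to subprocess.run to start the installation; the resulting log file is usually the first thing to check when a custom action fails.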

First Steps on Win7 and Instrumentation

Hello. As shown at the last PDC, we as developers can take advantage of OS capabilities and add instrumentation to our code (based on best practices).
In that session they introduced the new Windows PowerShell-based troubleshooting platform and showed how it enables you to easily monitor multiple data sources, empowering the end user and IT pro to detect and resolve software problems. But the demo was never uploaded, and I want to know how to write code using instrumentation on Win7, or how to demonstrate with some prof tools how my code helps to improve software quality.
I also tried the Windows 7 for Developers Training Kit, but its demo apparently does not use WMI.
Thanks
I found this to be a good place to start: Windows 7 Training Kit

CouchDB on Windows?

I have started exploring CouchDB and I am interested in the following:
Is there, or will there be, a Windows install?
If there is, is there a shared hosting provider that offers CouchDB?
Not knowing much about it, can it somehow be embedded in my application or bin deployed (don't laugh)?
The most reliable source is the CouchDB download page.
There are several places offering CouchDB hosting. Besides Cloudant, you can use most Infrastructure-as-a-Service providers such as Google, AWS, etc.
This question was asked (and answered) elsewhere on StackOverflow here and here.
There's a Windows version now, available on the CouchIO blog (http://www.couch.io/get).
Download & Unzip
Double-Click bin\couchdb
Relax!
Visit http://127.0.0.1:5984/_utils
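Once the steps above are done, a quick way to confirm the server is up without opening a browser is to request the root URL, which CouchDB answers with a small JSON welcome document. This is a minimal sketch that assumes the default address from the steps above; the function names are invented for illustration.

```python
import json
from urllib.request import urlopen


def parse_welcome(body):
    """Extract the version from CouchDB's root JSON document."""
    info = json.loads(body)
    if info.get("couchdb") != "Welcome":
        raise ValueError("not a CouchDB welcome document")
    return info.get("version")


def couchdb_version(base_url="http://127.0.0.1:5984/", timeout=5):
    """Fetch the root document from a running CouchDB and return its version."""
    with urlopen(base_url, timeout=timeout) as resp:
        return parse_welcome(resp.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        print("CouchDB version:", couchdb_version())
    except (OSError, ValueError) as exc:
        # Connection errors surface as OSError subclasses.
        print("CouchDB does not seem to be running:", exc)
```

If the script reports a version, the admin interface at the /_utils address above should load as well.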
There's been a fully compatible Windows build of CouchDB shortly after every source release, since the initial 1.0.0 release over 18 months ago. You can get this directly from the Apache CouchDB mirrors http://couchdb.apache.org/ now.
NB the embedded test suite is actually for developer testing; due to subtle timing constraints not all tests will pass first time round on every machine. In the next release of CouchDB, the tests will be done outside the browser which will be both simpler and more robust.
Please up-vote this so we have the right information to hand.
Since this question was posted, there is a Windows download available at https://couchdb.apache.org/ .
