starfish or splunk - hadoop

Hi all,
My goal is to analyze Hadoop log files, and there are two tools: Starfish (open source) and Splunk (a commercial product). Does anyone know the pros and cons of each, to help me choose?
I'd really appreciate your answers.
Thanks

Well,
the pros and cons are the same as for any open source vs. commercial tool choice.
The main guideline should be: what are your prerequisites?
Splunk is proprietary, but its free license allows you to index 500 MB/day.
Probably its main advantage is providing a BI tool cheaper than other commercial ones.
It also has an impressive number of plugins, including ones for Hadoop,
and, like Hadoop, it has relied on a (different) MapReduce implementation since Splunk 4.x.
It has both a Python and a Java SDK, which may come in handy.
Its approach is: install it and, after (a minimal) setup, start playing with your data.
I don't know Starfish, though it does look promising;
it only seems to require JavaFX, while Splunk comes with its own bundled Python installation.
But in the end, it all boils down to what your most important prerequisites are.

Barriers to entry are low for both. The best approach is to try both for a while and see what works for you.
Depending on your use case each tool has different strengths. What is your use case?
Generally speaking Splunk is easy and modern with great community support. Answers are generally a few searches away.

How viable is it to build a REST API server in C++?

I am mainly looking at this from a security perspective. Are there any frameworks available for building the server and working with JSON?
I am interested because of the performance C++ can offer. Python and Node.js are also options for me at the moment. How can I decide which language and framework to use?
I appreciate your support.
Thanks!
PS: I am currently using Java Spring to implement RESTful APIs.
There are plenty of them out there. The absolute simplest one I've found, and use, is this: https://github.com/eidheim/Simple-Web-Server
There are clearly more sophisticated ones out there, just "ask the google". I don't know of any exhaustive comparisons between these frameworks and the ones you specified. However, you could write your own simple benchmarks around the domain you're most interested in. That's what really matters, right?
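A minimal sketch of such a benchmark, written in Python only because the harness is independent of the language the server is written in. The function names (time_calls, summarize, http_get) and the example URL are made up for illustration; the idea is simply to time the same request against each candidate server and compare the summaries.

```python
import statistics
import time
import urllib.request


def time_calls(fn, n=100):
    """Call fn() n times and return the per-call latencies in seconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Reduce a list of latencies to a few numbers that are easy to compare."""
    ordered = sorted(latencies)
    p95_index = max(0, int(len(ordered) * 0.95) - 1)
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
    }


def http_get(url):
    """Return a zero-argument callable that fetches url once (assumed endpoint)."""
    def fetch():
        with urllib.request.urlopen(url, timeout=5) as response:
            response.read()
    return fetch
```

Usage would look something like summarize(time_calls(http_get("http://localhost:8080/items"), n=200)), run once per candidate server with the same payload. A sequential loop like this measures latency, not throughput under concurrent load, so treat the numbers as a first impression only.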
For JSON libs there's RapidJSON and JSON Spirit. Don't waste your time with boost::property_tree; it's not fully fleshed out with respect to JSON.
As for speed: it's compiled, so there's a good chance that a C++ framework will outperform one based on an interpreted language. So let's say it's faster. You have a heck of a learning curve to climb (assuming you don't know C++ already), but in the end, in my humble opinion, it's worth it. I've done these before in Python and Ruby. I really enjoy having the compiler check types. My code is more robust, it does what I tell it to do, and I'm not forced to use exceptions.
Tip: get a code completion plugin like YouCompleteMe

Reason for Make's Popularity vs. Alternatives

What forces are at work keeping crufty old Make (with or without makefile generator tools) prominent as a build tool? Is it deficiencies in alternatives that keep them from being widely adopted, or insufficient publicity, or does something about Make keep it in place?
Despite Make's many weaknesses and difficulties dealing with large projects
(e.g. see http://freshmeat.net/articles/what-is-wrong-with-make) it appears to still be more widely used than newer, improved alternatives such as Scons, Jam, Rake, Cook, and others.
Are there measurable benefits to the alternatives, or are the "market shares" due mostly to opinion and experience of team leaders?
Ubiquity: I like Make because I can trust it will be available where I need it i.e. installed or easily installable on the target machine.
It's widely available, well documented, concise and powerful, and best of all: no XML!
I've been using it for close to 15 years and still haven't found something better. The coolest thing I've done with it is to have a master makefile generate makefiles for sub projects on-the-fly.
Regarding your question of which forces are keeping make alive: it's the force of habit.
simplicity - easy to do simple things
ubiquity - some version is on your system
speed - fast enough for most things
expressive - pretty good match to the job
nonobvious complexity - mainly large projects expose problems
Its availability on a large number of platforms probably helps. If writing a product for multiple platforms, knowing it will always be there is a plus point. It's a pain to have to port your build tool to a new platform before you can build your own project.
Hm, I never used make as a build system.
Other than that, it's a unique dataflow-programming language, where you can describe a set of nodes, each serving a specific purpose, describe their behavior, and let the manager handle and control the data flow between them.
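To make the dataflow view concrete, here is a rough Python sketch of the two things the "manager" does: order the nodes so prerequisites come first, and decide which nodes are out of date. It is an illustration of the idea, not how make is implemented; the rule table, file names, and function names are invented, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical rules: each target maps to the prerequisites it depends on,
# the same shape as "target: prereq1 prereq2" lines in a makefile.
RULES = {
    "app": ["main.o", "util.o"],
    "main.o": ["main.c", "util.h"],
    "util.o": ["util.c", "util.h"],
}


def build_order(rules, goal):
    """Return everything needed for goal, with prerequisites listed first."""
    needed = {}
    stack = [goal]
    while stack:
        target = stack.pop()
        if target in needed:
            continue
        # Targets without a rule are treated as plain source files.
        needed[target] = rules.get(target, [])
        stack.extend(needed[target])
    return list(TopologicalSorter(needed).static_order())


def stale(target, prereqs, mtimes):
    """A target is stale if it is missing or older than any prerequisite."""
    if target not in mtimes:
        return True
    return any(mtimes.get(p, 0) > mtimes[target] for p in prereqs)
```

The mtimes argument is a plain dict of modification times here so the logic can be read without touching the filesystem; a real tool would fill it from os.path.getmtime.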
We used scons on a relatively large project to replace make, and found that it was a reasonably flexible system, that allowed us to do some very necessary (but very unfortunate) hacking to get things to build the way we needed them to. Also, make is -strange-.
I think that to see a big shift to another tool, first a significantly better tool would have to be created. Then, to effect change, either one of the Linux distros or one of the major packages would have to switch to it, and probably keep the old one around for compatibility. I would envision that the new build tool would be capable of generating legacy makefiles. Linus already demonstrated how well he can solve source code control with git. I have a pretty good hunch he could come up with something pretty cool that ties in with git.

How do you avoid platform/framework decision paralysis? [closed]

So, I've got an idea for a website. I can start off using any platform and frameworks I want, but there are almost too many options.
OS Platform:
Windows, *nix
Web Framework:
Rails, ASP.NET, ASP.NET MVC, Django, Zend, Cake, others
Hosting:
EC2, Dedicated Server, Shared Hosting, VPS, App Engine, Azure, others
Persistence:
S3, MySQL, PostgreSQL, SQL Server, SimpleDB, CouchDB, others
How do you avoid decision paralysis and get started?
Firstly, your familiarity with a framework's language should dictate which framework you choose. Don't add the burden of learning another language on top of learning a framework.
Next, have a look at the remaining frameworks. Do they have good documentation? What about the community? (A good community can go a long way toward making up for any shortcomings of a given technology.) Does the framework solve the problems that you need solved?
Finally, just dive in and try something! Pick the one that makes the most sense to you and start writing code. Don't do too much hand-wringing over your decision. If it becomes obvious that you made the wrong choice, it should be obvious quite early. Learn from what you've accomplished so far and consider restarting with a different technology. (Just don't get several weeks down the road before you make this decision!)
I'm sure you don't like all of those technologies equally. Pick a framework that you like and get to work.
It depends on what your app is going to be doing. A handful of the technologies you listed are direct competitors (like Django vs. Rails), but some are completely different ways to do things (like MySQL vs. S3).
Questions to answer before you begin:
Will the app need to be horizontally partitioned in the near term? If so, using EC2, Google App Engine or Azure would be a good option.
Will your app fit into the constraints of Google App Engine? If so, it requires a lot less hassle on your part than running on bare metal (whether real or virtual).
What's your preferred web framework? If you want an MS framework, you'll need to run on a host that supports that.
What will your persistence and data access patterns look like? This will determine whether to use a database or something more exotic.
If you are running on EC2, the other AWS services are more appealing. Similarly, if you are using GAE, you have only one option for persistence. If you are using Rails, may as well start with MySQL.
In answer to your question of how to reduce the number of options, the answer is to realize that many of the options are related, so you don't have as many choices to make as it first appears.
Some advice that was once given to me is, pick what your friends (or colleagues) are using. Having people around you that you can share ideas and the learning experience with is invaluable.
If you want to learn something new: I'd just go with your gut and get started. If it sucks then switch to something more familiar.
If you don't have much time: Go with what you know and forget about the other options. Just start coding.
Optimize for happiness. Pick the one that you like the most. Or the one that intrigues you the most.
I've worked in Microsoft shops, in Ruby on Rails, and in homegrown shops having Apache, Jetty, even Mason.
All frameworks have their warts, their idiosyncrasies that will keep you up until 3 AM, and their "tribal knowledge" vagaries that will be completely unexportable to other frameworks. (The last point is sometimes by design: the whole "platform entrenchment" business strategy.)
Listen to what the supporters of the frameworks say about the problems with the other frameworks (Google: X framework vs Y framework). Pick the framework that has the loudest supporters. If they are equally loud, make the decision with a dice roll.
With me it's simple.
I only know the MS stack and see no point in "checking out" all of those you mentioned.
No, actually I once tried to use JSF before excluding it from my list permanently.
Use what you are experienced in and where you can be more productive. The objective is to get your site up and running. Go for it.
One of the biggest factors in determining which platform/framework to use is your budget. You have to factor in the cost of licensing, software required to develop/maintain your website and other miscellaneous costs.
I suggest you begin with a scorecard of your own construction. Perhaps you can find different ones on the web, but if you do, modify them to meet YOUR needs. There should be a scorecard for each level in the stack (as you've described). Each scorecard should share some aspects to grade with other scorecards but each will also have their unique aspects.
Once constructed, weight each aspect graded according to your needs.
Once you've chosen the weights, pick the scales for grades.
At this point promise yourself you won't mess with the weights or the scale, and then start collecting data on your options for each level in the stack.
You may also want to put a time limit on the collection period.
Make your decision based on the outcome of the scorecard.
The beauty of this approach is that the effort is made in constructing the scorecard, not in circular arguments of options. The effort in making the scorecard is vendor agnostic and focuses on the desired result, not the options. Thus you can avoid paralysis.
One more thing, my best scorecards have included sections addressing the availability of resources and other human related things. Don't make the mistake of just looking at the technology.
good luck.
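The scorecard steps above come down to a weighted sum per option. A minimal sketch, assuming a 1-5 grading scale; the aspect names, weights, and grades are placeholders rather than recommendations, and the function names are my own.

```python
def score(option_grades, weights):
    """Weighted sum of grades; every weighted aspect must have a grade."""
    missing = set(weights) - set(option_grades)
    if missing:
        raise ValueError(f"ungraded aspects: {sorted(missing)}")
    return sum(weights[aspect] * option_grades[aspect] for aspect in weights)


def rank(options, weights):
    """Return (name, score) pairs, best first."""
    scored = [(name, score(grades, weights)) for name, grades in options.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical scorecard for the web framework level of the stack.
# Weights express importance; grades use a 1-5 scale fixed in advance.
WEIGHTS = {"familiarity": 5, "documentation": 3, "community": 3, "hosting_cost": 2}
OPTIONS = {
    "framework_a": {"familiarity": 4, "documentation": 3, "community": 4, "hosting_cost": 3},
    "framework_b": {"familiarity": 2, "documentation": 5, "community": 5, "hosting_cost": 4},
}
```

With these placeholder numbers, rank(OPTIONS, WEIGHTS) puts framework_b (48) just ahead of framework_a (47). A margin that small is a hint that the two are close enough for the choice not to matter much, which is itself a way out of the paralysis.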
Go with personal preference.
One decision at a time:
First I would begin with the type of language:
Script: PHP, Python,
Serious: Java, .Net
The language will restrict your OS and platform, and will give you hints for the database decision. The database load is also important. Also, do you want logic in the database? How much data?
One last piece of advice: try well-tested combinations, such as LAMP, WAMP, or Windows with SQL Server and .NET.
Evaluate each platform and technology for quality of tools for your needs. For example, if you are cost sensitive, you would value free operating systems and tools higher than costly ones. If you need performance, you would value tools which provide high performance higher than ones that don't.
It entirely depends on your situation. I spent several months evaluating stuff for a new commercial web site last year, and it was very easy to feel paralyzed. In the end what helped was talking to several people who'd done similar things, and of course reading a lot of stuff online and from Amazon. I chose Java, since our team had a lot of experience in it, and it has good performance and extensive supporting technologies. Oracle is our database, but we used a persistence manager to make it easy to change later on. We used a half-dozen very good libraries to eliminate much of the boring and repetitive coding (Restlet, iBatis, Freemarker, XStream, jQuery, SLF4J). We used Glassfish as our web server.
Yours sounds like a small project with only you to work on it. In that case, pick a complete framework instead of a smorgasbord like we did. Pick something fun to work with, and something with good "return on resume". Look very hard at Ruby on Rails, Django (kind of a Python on Rails), and Groovy on Grails (a Rails-wannabe for the Java world). In your shoes I'd pick Ruby on Rails because there's a large and growing community and a good number of books and tutorials. Plus, Ruby looks like a worthwhile language to learn. For your database, just pick one. These frameworks make it easy to change your mind later. Pick MySQL unless you have another you like better.
And as other posters said, just do it! ;-)
Like others said, pick something you and your employees are familiar with. I highly doubt you are close to being industry ready with all those techs.
OS Platform: Windows, *nix
Shouldn't matter except for Windows licensing costs, and that is probably the least of your expenses.
Web Framework: Rails, ASP.NET, ASP.NET MVC, Django, Zend, Cake, others
Dependent on your favorite language
Hosting: EC2, Dedicated Server, Shared Hosting, VPS, App Engine, Azure, others
You should design your product to be movable, so you can scale among these. If you know for sure you are going big, then just start off with EC2. App Engine is extremely limiting; e.g., they don't let you form outbound connections.
Persistence: S3, MySQL, PostgreSQL, SQL Server, SimpleDB, CouchDB, others
You need to do the research yourself whether or not your product requires an RDBMS or a simple key/value store, and what features each of these have.
Just go for it! Your platform choice really is not all that important as long as you make a reasonable choice (Ruby + Rails, Python + Django, PHP + Cake/CodeIgniter). Any of these can be used to build successful sites. If your site really takes off, you'll be able to scale it fine.

Multi-language build tools

I have a build process for a large enterprise system comprising several dozen separate EXEs and DLLs. These use multiple languages, C, C++, Fortran, Python, Awk and a couple more. The build scripts are 4DOS batch processes which evolved over 4 decades. They are large and unwieldy and need constant care and feeding.
I must keep the Visual Studio solution and project files as the basic compile/link entities. What's the best tool for wrapping these disparate languages all together. 4DOS is very old and cumbersome.
EDIT:
Thanks gang. I think I'll try SCONS first because it's Python. We have plenty of people well versed in Python to be able to update and maintain it. I'm 61 now and it's not going to be me supporting this in the long term. I don't like anything requiring JAVA or XML because those are not languages already in our product mix and we have enough in play.
Those blog posts were good. He concluded that SCONS was best but simply too slow for his purposes. I'm not looking for speed in nightly builds. It's got until 7 AM. I want readability and maintainability.
For example Apache Ant
Ant is a good choice. I would also be tempted to try Rake.
I think the best choice is NAnt and MSBuild.
Scons perhaps?
These may be a little outdated (the build systems might have evolved quite a bit), but they should at least give you a better idea of what to expect:
The Quest for the Perfect Build System
The Quest for the Perfect Build System (Part 2)
Personally, I never needed anything special that couldn't be achieved with VS project/solution files, makefiles and batch files, so I won't be recommending anything in particular.
Scons definitely. It plays with fortran and C naturally, and it is python based so it shouldn't have any problem with that one either (never used it for py though, so can't tell from experience). Also, much more readable than the majority of them out there.
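Since the Visual Studio solution files have to stay the basic build units, the Python layer mostly has to run them in the right order and stop on failure. The sketch below is plain Python rather than an SConstruct, just to show that shape; the step list, paths, and function names are hypothetical, and it assumes msbuild is on the PATH. The /t: and /p: switches are standard MSBuild command-line options.

```python
import subprocess

# Hypothetical build steps in dependency order; the paths are placeholders.
STEPS = [
    {"name": "core", "solution": "core/core.sln", "config": "Release"},
    {"name": "tools", "solution": "tools/tools.sln", "config": "Release"},
]


def msbuild_command(solution, config, target="Build"):
    """Build the argument list for one msbuild invocation."""
    return ["msbuild", solution, f"/t:{target}", f"/p:Configuration={config}"]


def run_steps(steps, runner=subprocess.run):
    """Run each step in order; return the name of the first failure, or None."""
    for step in steps:
        result = runner(msbuild_command(step["solution"], step["config"]))
        if result.returncode != 0:
            return step["name"]
    return None
```

Passing the runner in as a parameter keeps the ordering logic testable without a Windows build machine. The same table-of-steps layout would carry over to an SConstruct if you go that way, with the non-Visual Studio pieces (Fortran, Awk, Python) added as further steps.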
I know Maven isn't known to focus on anything but Java, but perhaps it might at least be worth mentioning. There has been some work towards enabling at least C/C++. When comparing to Ant, it's pluggable in a similar fashion, but it's declarative rather than imperative, with standardized dependency management, and a build result repository which may even be distributed.
ANT + terp for the C++ portions. terp plays nicely with VisualStudio as well as with many other C++ compilers on many platforms. ANT requires Java though, if only as the hosting technology. I don't know whether that is a no-no with your requirements or whether you just don't want to start writing Java code.

Practical Alternative for Windows Scheduled Tasks (small shop)

I work in a very small shop (2 people), and since I started a few months back we have been relying on Windows Scheduled tasks. Finally, I've decided I've had enough grief with some of its inabilities such as
No logs that I can find except on a domain level (inaccessible to machine admins who aren't domain admins)
No alerting mechanism (e-mail, for one) when the job fails.
Once again, we are a small shop. I'm looking to make the same kind of upgrade to our scheduling system that I'm making to source control (VSS --> Subversion). I'm looking for suggestions of systems that
Are able to do the two things outlined above
Have been community-tested. I'd love to be a guinea pig for exciting software, but job scheduling is not my day job.
Ability to remotely manage jobs a plus
Free a plus. Cheap is okay, but I have very little interest in going through a full-blown sales pitch with 7 PowerPoint presentations.
Built-in ability to run common tasks besides .EXEs a (minor) plus (run an assembly by name, run an Excel macro by name, run a database stored procedure, etc.).
I think you can look at:
http://www.visualcron.com/
Consider Cygwin and its version of "cron". It meets requirements #1 thru 4 (though without a nice UI for #3.)
Apologies for kicking up the dust here on a very old thread, but I couldn't disagree more with what's been presented here.
Scheduled tasks in Windows are AWESOME (a %^#% load better than writing services I might add). Yes, not without limitations. But still extremely powerful. I rely on them in earnest for a variety of different things.
If you have even a slight grasp of C#, you can write a custom "task" (essentially a console application) to do, well, virtually anything. If persistent/accessible logging is what you're after, why not something like Serilog or NLog? Even at the time of writing, it had a very robust feature set. This tool in and of itself, in conjunction with some C#, could've solved both your problems very easily.
Perhaps I'm missing the point, but it seems to me that this isn't really a problem. At least not anymore...
If you're looking for a free tool, there are plenty of Windows implementations of the popular cron tool, for example CRONw. It's pretty easy to configure and maintain. You could easily write custom WSH scripts to send your emails and add log entries.
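Whichever scheduler ends up launching the jobs, the two missing features (a local log and a failure e-mail) can live in a small wrapper around each job. Here is a rough sketch in Python instead of WSH, using only the standard library; the SMTP host, the addresses, and the function names are placeholders you would replace.

```python
import logging
import smtplib
import subprocess
from email.message import EmailMessage


def send_alert(subject, body, host="localhost", sender="jobs@example.com",
               recipient="admin@example.com"):
    """Email a failure notice; host and addresses are placeholders."""
    message = EmailMessage()
    message["Subject"] = subject
    message["From"] = sender
    message["To"] = recipient
    message.set_content(body)
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(message)


def run_job(name, command, log_path, alert=send_alert):
    """Run command, append the outcome to log_path, and alert on failure."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
    logger.addHandler(handler)
    try:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode == 0:
            logger.info("succeeded")
        else:
            logger.error("failed with exit code %d: %s",
                         result.returncode, result.stderr.strip())
            alert(f"Job {name} failed", result.stderr or "no error output")
        return result.returncode
    finally:
        # Detach the handler so repeated calls do not write duplicate lines.
        logger.removeHandler(handler)
        handler.close()
```

The scheduled task (or cron entry) would then call a script that invokes run_job with the real command, so the log file sits on the machine where machine admins can read it. This sketch does not cover the remote-management requirement.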
If you're going the commercial route, BMC Control-M is arguably one of the best, but I don't believe it is particularly cheap.
You may also consider some upcoming packages like JobScheduler
Pretty old question, but we use Jenkins. Yes, its main purpose is CI/CD, but it's also a really nice UI for cron with a ton of plugins and integrations.