Hadoop Hive web interface options

I've been experimenting with Hive for some data mining activities and would like to make it easily available to less command-line-oriented colleagues.
Hive does now ship with a web interface (http://wiki.apache.org/hadoop/Hive/HiveWebInterface), but it's very basic at this stage.
My question is: does a visually polished and fully featured interface (either desktop or, preferably, web based) to Hive exist yet? Are there any open source efforts outside the Hive project working on this?

The new version of Cloudera's Hadoop distribution comes with HUE (Hadoop User Experience) and a plugin called Beeswax, which is most likely all you need.
It's pretty tricky to configure, but once you get past that, it provides something like a phpMyAdmin interface, only much nicer and easier to use. It supports writing queries, importing data, storing results, etc.

Web-based open source GUIs for Hive
HWI - Shipped with Hive. Basic features only.
Hue - Nice query editor with autocompletion. Supports parameterized queries. The latest version includes basic visualization of query results. Includes many other useful tools for managing HDFS, JobFlows, etc., which makes it heavy and a little tricky to install and configure.
Zeppelin - Unlike Hue, it includes only the Hive tool. Supports query templates. Has a pluggable visualization architecture and an online archive, so you can easily create custom visualizations and share them. Lightweight and easier to install than Hue, but it has no features for anything unrelated to Hive.
Other alternatives
Excel - Microsoft Excel can run Hive queries and fetch data from Hive. http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-Win-1.1/bk_dataintegration/content/ch_using-hive-2.html has a guide for doing it.
Commercial BI tools - Commercial BI tools like Tableau, Datameer and Karmasphere support connections to Hadoop or Hive. They have nice dashboards and charts. They all offer a trial, community or personal edition.

HUE is useful and good, but you should also try "Karmasphere Analyst Free/Community Edition". It is very easy to use and well documented, and the free version is very capable. It is not web based, but it supports several operating systems (Windows, Linux, etc.). You can check the GUI in the documentation to see how it looks.

Related

How to read from an EDB database file

Microsoft Edge and other Microsoft products use the Extensible Storage Engine (ESE). If you have Edge installed on Windows, you can find the database file here:
C:\Users\username\AppData\Local\Packages\Microsoft.MicrosoftEdge_xxxxxxx\AC\MicrosoftEdge\User\Default\DataStore\Data\nouser1\xxxxxxxx\DBStore\spartan.edb
I would like to read this database from my .NET app.
The only tool I have seen for viewing this data appears to be deprecated:
http://www.woanware.co.uk/forensics/esedbviewer.html
I can't seem to find any relevant NuGet packages for querying this type of database. Does anyone have experience working with this type of database?
The database engine is esent.dll, and you can access it in several different ways:
C API. https://msdn.microsoft.com/en-us/library/gg269259(v=exchg.10).aspx
C#. https://github.com/microsoft/managedesent
Simplified C# (Isam layer). Easier to use, but not everything is exposed. https://github.com/Microsoft/ManagedEsent/tree/master/isam
(Disclosure: I've worked on the above products.)
That being said, if you just modify a random database, you can impact the host process's integrity, and it might end up crashing.
If you're a law-enforcement agent and it's for forensic purposes, Microsoft should be able to assist you (I haven't done it myself, but I've heard that they'll do it).
-martin

GoodData: "CloudConnect" or another tool for ETL development

We are GoodData customers who are beginning the process of evaluating ETL tools other than CloudConnect. I'd like some recommendations from other GoodData customers who do their own ETL/LDM development with a tool other than CloudConnect. What has been your experience with these other tools? How do they compare with CloudConnect? I have another conversation going on LinkedIn (https://www.linkedin.com/groups/Model-ETL-Development-CloudConnect-vs-6616061.S.5897711443083538433?qid=fbab6f85-4bd2-4515-8737-98a365bf9208&trk=groups_most_popular-0-b-ttl&goback=%2Egmp_6616061). From this conversation I have learned a lot about Keboola but I would like to hear others' experiences with other tools.
Another option is to use our "BI Automation Framework", which is being developed on top of our Ruby SDK and is a great fit if you are more of a developer/coder. It will be integrated with our Agile DataWarehouse Service (ADS), where you have the option to manage your data transformation process with SQL on the Vertica database. We are moving forward quickly in this space.
Yet another option is to use ADS with CloudConnect as the orchestration tool. Again, this helps if you are more comfortable doing transformations in SQL. If you want to start testing those tools, let me know.
JT

Can we compare saiku with Pentaho Analyzer?

I'm currently doing an internship and I have to create a whole BI application.
I think I'll use Pentaho, and I have to use only open source components.
I know that Pentaho Analyzer is not free.
My question is: is Saiku an equivalent of Analyzer?
If yes, can I use it with Pentaho instead of Analyzer?
Thanks
I'm the developer of the Pivot4J project and want to share my (subjective) opinion on the subject.
First, although you rightly assumed Pivot4J to be more of an API than an application, that does not always mean you need to write a lot of code to use it.
We also have a Pentaho BI plugin which does not require any coding and has features comparable to the Saiku plugin, though it's targeted at the as-yet-unreleased Pentaho 5.0 platform.
And our sample application provides most of the functionality of the JPivot web application, although it lacks a data source configuration feature, which will be fixed soon.
Compared to Saiku, I think each project has its own advantages in different scenarios.
Saiku has a much more lightweight architecture on the client side than our sample application and the plugin, so it can be deployed and embedded virtually anywhere.
While it's not very difficult to create a full REST-style analytic application with Pivot4J, our current sample and plugin applications require at least a Servlet container to run and are more difficult to embed than Saiku in certain environments.
On the other hand, as Pivot4J was designed from the start to be a UI-independent API, in my opinion it could provide more flexibility than Saiku to developers who want to build their own application on top of it or customize the core behavior of the API.
For example, if you want to use Pivot4J with your own application built with ExtJS, DhtmlX, or any other UI toolkit, it'd be much easier to achieve a seamless integration with Pivot4J, as it provides convenient abstract extension points for that.
Finally, if you're familiar with JavaScript you might find working with Saiku easier, as it delegates most of the UI-related work to the client side.
On the other hand, if you're an old school Java developer like me :) you might find our sample application easier to understand and work with, as there's virtually no custom script involved and everything is done on the server side with the JSF component model.
To conclude, I'd like to say that Pivot4J is not just an API that cannot be used without writing much code, as it already includes a fairly feature-complete Pentaho BI plugin for the upcoming 5.0 release of the platform. And as Pivot4J and Saiku take rather different approaches, each has its own strengths and advantages which can be leveraged to suit a specific use case.
Yes, of course. Both tools use the same underlying OLAP engine, Mondrian. Saiku is essentially the same as Analyzer, providing many of the same features; however, it has a different architecture, which additionally makes it very embeddable and pluggable. Plus, Saiku can be used standalone too if you want.
Check out the demo at dev.analytical-labs.com to see what it can do.
Also, for help you won't find many tools with such a great community. Hook up with them on Freenode IRC at either ##Pentaho or ##Saiku, depending on your questions!
Pentaho is the right choice for OS BI too. I presume you looked at Jaspersoft as well? Worth a look, but you'll no doubt realise the features are better in Pentaho.
Have you thought about a pure JavaScript UI to pivot your OLAP cubes? There is one such component called WebPivotTable at http://webpivottable.com
JPivot, Saiku and Pentaho are all based on the olap4j API, so they all need a Java server-side service. WebPivotTable makes AJAX calls to the XMLA service directly, so it can be used to pivot any XMLA OLAP server, such as Mondrian, SSAS or icCube. Since it isn't tied to any Java back end and is pure JavaScript, you can easily integrate it into any website or web application.
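To give a rough idea of what "calling the XMLA service directly" involves, here is a minimal sketch of an XMLA Execute request, written in Java rather than browser JavaScript. It is not WebPivotTable's code. The class name, the endpoint you pass in and the catalog name are assumptions for illustration, and a real server may require extra properties (for example a data source name) or authentication.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds and posts an XMLA Execute request carrying an MDX statement.
class XmlaRequest {

    // Escape the characters that would otherwise break the XML element content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wrap an MDX statement in a SOAP envelope using the XMLA Execute method.
    static String executeEnvelope(String mdx, String catalog) {
        return "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
             + "<soap:Body>"
             + "<Execute xmlns=\"urn:schemas-microsoft-com:xml-analysis\">"
             + "<Command><Statement>" + escapeXml(mdx) + "</Statement></Command>"
             + "<Properties><PropertyList>"
             + "<Catalog>" + escapeXml(catalog) + "</Catalog>"
             + "<Format>Multidimensional</Format>"
             + "</PropertyList></Properties>"
             + "</Execute>"
             + "</soap:Body>"
             + "</soap:Envelope>";
    }

    // POST the envelope to an XMLA endpoint (the URL is supplied by the caller)
    // and return the HTTP status code. Reading the response body is left out.
    static int post(String endpoint, String envelope) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        conn.setRequestProperty("SOAPAction",
                "\"urn:schemas-microsoft-com:xml-analysis:Execute\"");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(envelope.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }
}
```

A browser-side component would send the same kind of envelope with an AJAX call, so nothing olap4j-specific has to sit between the UI and the OLAP server.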

Lotus Notes XPages for design and Oracle (or other RDBMS) for data

I plan to make a web application using XPages for the design (plus the Lotus Notes Elements), but instead of storing the data in Lotus Notes, I will store the data in an RDBMS (specifically an Oracle Database). As you can see, I want to create a clear separation between the design and data. Is there a way I can do this? I mean, is there a way I can use Oracle as the data source and XPages/Lotus Notes for the UI only? Thanks a lot!
There actually has been native support for RDBMS/JDBC connectivity in XPages since July 2011. Take a look at the Extension Library on OpenNTF (http://extlib.openntf.org/) and read the blog post announcing the functionality: http://www.openntf.org/blogs/openntf.nsf/d6plinks/NHEF-8JYMXE.
It's not yet officially supported through IBM (not part of the Upgrade Pack 1 - http://www-01.ibm.com/software/lotus/notesanddomino/nd85-UpgradePack853-1.html), but there will be support for it in the future.
The short answer appears to be "yes but no".
XPages is JDBC compliant, so you can connect to anything, including Oracle. The snag is, you'll probably have to write the code yourself. Apparently there are plans to open source a JDBC-based Relational Database XPages DataSource but there's been no activity as such. Find out more
XPages101.net may be a good site for cutting your teeth on XPages. They have 60 lessons you can subscribe to, and they are highly recommended. You may be particularly interested in lesson 55.
DISCLAIMER: I am not affiliated with XPages101.net in any way.
The XPages environment doesn't have any particular support for relational data sources, but there are two viable options depending on how cautious your organization is about using recent and beta software and how much support code you're willing to write.
For the former case, the XPages Extension Library (http://extlib.openntf.org/) is adding in relational database support for the 8.5.3 release of Domino, which is scheduled to come out at the beginning of next month. I'm not in the 8.5.3 beta, so I haven't seen this in action, but the video on their page looks promising and the Extension Library is high-quality in general. So if you're willing to deploy 8.5.3 when it comes out and use the Extension Library, it'll likely do just what you want.
If you won't be immediately upgrading or don't want to deploy the Library, you can write your own code using JDBC drivers; there's an example of this technique here: http://www-10.lotus.com/ldd/ddwiki.nsf/dx/xpage_integration_rdb.htm . While you can't, for example, just pass a JDBC ResultSet to a Repeat control on an XPage, if you're willing to write your own ORM, you could make your objects implement the List interface and use JavaBean-style naming, which would let you use them in standard controls and write expression language like "#{someRecord.someField}".
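The List-plus-JavaBean idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the EmployeeRecord and EmployeeList classes and the "id" and "name" columns are made up, and in a real project each class would normally be public and live in its own file so the expression language can reach it.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

// Hypothetical record type for one row of a made-up "employees" table.
// The JavaBean-style getters are what let an expression such as
// #{employee.name} resolve to getName().
class EmployeeRecord {
    private final int id;
    private final String name;

    EmployeeRecord(int id, String name) {
        this.id = id;
        this.name = name;
    }

    public int getId() { return id; }
    public String getName() { return name; }
}

// Read-only List view over rows that have already been fetched, so the
// result can be handed to anything that iterates over a java.util.List.
class EmployeeList extends AbstractList<EmployeeRecord> {
    private final List<EmployeeRecord> rows;

    EmployeeList(List<EmployeeRecord> rows) {
        this.rows = new ArrayList<>(rows); // defensive copy
    }

    // Copy every row out of an open ResultSet; the caller is responsible
    // for closing the ResultSet, Statement and Connection afterwards.
    static EmployeeList fromResultSet(ResultSet rs) throws SQLException {
        List<EmployeeRecord> rows = new ArrayList<>();
        while (rs.next()) {
            rows.add(new EmployeeRecord(rs.getInt("id"), rs.getString("name")));
        }
        return new EmployeeList(rows);
    }

    @Override
    public EmployeeRecord get(int index) { return rows.get(index); }

    @Override
    public int size() { return rows.size(); }
}
```

Copying the rows eagerly keeps the list usable after the JDBC connection is closed, at the cost of holding the whole result in memory; a lazy or paged variant would need more code.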

Excel report framework

Is there any Excel report framework available? We need to export some of our reports to Excel format. Our application is a Java application, so anything supporting Java would be great. I have tried the Apache POI API, but it is not good enough. A framework based on the Windows API would be better.
SQL Server Reporting Services has options to export to Excel.
FYI, JasperReports uses POI for Excel-conversion.
Can you elaborate on what you don't like about Apache POI? I've been using POI for years now and haven't found anything that it couldn't do with a tweak here or there or by taking a creative approach. IMHO, it's the best open-source (and free) Excel generation/reporting framework out there.
If you are willing to pay money, then Actuate probably has the best solution. With Actuate's e.Spreadsheet Engine and its Excel API, you can read, write, modify and generate entire spreadsheets or parts of spreadsheets. I've used it, and their API is richer and simpler than POI. POI, while powerful, feels like an API that has grown up over time, with many developers involved in creating functionality and patches.
Try xlsgen; it supports Java (but can only run under Windows).
Why is POI not good enough for you?
An alternative might be JasperReports. I've used this instead of POI a couple of times and the experience was pleasant.
You can also try jxl, but honestly its API is more confusing than POI's.
Xylophone is an LGPL Java library and command line utility that uses Apache POI, but mitigates most of its drawbacks.
It consumes data in XML format and spreadsheet templates in XLS(X) format, and makes producing complex Excel reports more fun. You can read about it in this post. Because of licensing and security issues, this is probably a better choice for a Java backend than the Windows API.

Resources