Sqooping MS Access data into HDFS

I have a use case where I need to import/Sqoop Microsoft Access data into HDFS. Are there any drivers available for MS Access to Sqoop data? Has anyone come across such a case?
Please drop your comments and observations.

It looks like there is no support for Access. Here is the list of supported databases from Sqoop. The nearest is Microsoft SQL Server here. The main requirement is a JDBC driver to connect to the database.
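To illustrate that requirement, here is a minimal sketch of reading an Access file over plain JDBC from Java. It assumes a third-party Access JDBC driver such as UCanAccess is on the classpath; the file path and table name are made up.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class AccessJdbcCheck {
        public static void main(String[] args) throws Exception {
            // Hypothetical third-party Access JDBC driver; Sqoop itself only ships
            // connectors for the databases in its supported list.
            Class.forName("net.ucanaccess.jdbc.UcanaccessDriver");
            try (Connection conn = DriverManager.getConnection("jdbc:ucanaccess://C:/data/sample.accdb");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT * FROM customers")) {
                while (rs.next()) {
                    // Print the first column of each row just to prove connectivity.
                    System.out.println(rs.getString(1));
                }
            }
        }
    }

If such a driver works for your Access files, a generic JDBC-based import becomes possible.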

Related

Can ETL informatica Big Data edition (not the cloud version) connect to Cloudera Impala?

We are trying to do a proof of concept on Informatica Big Data Edition (not the cloud version), and I have seen that we might be able to use HDFS and Hive as source and target. But my question is: does Informatica connect to Cloudera Impala? If so, do we need any additional connector for that? I have done comprehensive research to check whether this is supported but could not find anything. Did anyone already try this? If so, can you specify the steps and link to any documentation?
Informatica version: 9.6.1 (Hotfix 2)
You can use the ODBC driver provided by Cloudera:
http://www.cloudera.com/downloads/connectors/impala/odbc/2-5-22.html
For Irene: you can use the same driver; the one above is based on the Simba driver:
http://www.simba.com/drivers/hbase-odbc-jdbc/

HDFS for Teradata

As per my understanding, HDFS is useful for data that is unstructured and large in quantity. I wanted to know: is it possible to use HDFS with Teradata, given that Teradata is an RDBMS and hence not so unstructured?
Also, how does HDFS come into the picture with a database anyway? Is it that the file system contains the data, or how exactly does it work, in simple terms? Thanks
With Teradata DB itself - no.
However :), Teradata provides the so-called UDA (Unified Data Architecture), where Teradata, Aster DB and Hadoop (HDFS) are interconnected and can work together almost seamlessly.
In general, if you want to work with unstructured data only, choose Aster, which is a Teradata product and can connect to HDFS directly. HDFS is used here as cheap and fast data storage.
An even more interesting solution will come with the new Aster version (6), where AFS (Aster File System) is going to be implemented. AFS is a distributed filesystem similar to HDFS. I'm looking forward to giving it a try as well ;)
To add some more details to xhudik's answer:
To connect Teradata with Hadoop, you need a connector. One is called Teradata QueryGrid for Hadoop. It is an add-on to the Teradata DWH and connects to HCatalog, and HCatalog connects to HDFS.
You can also use the Teradata Connector for Hadoop, which is a Sqoop extension, so you can connect to Teradata from Hadoop; a sketch of such an import follows below.
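Here is a rough sketch of a generic JDBC-based Sqoop import from Teradata, driven from Java through Sqoop 1.x's runTool entry point (the same arguments work on the sqoop command line). Host, database, table and credentials are placeholders, and the dedicated Teradata connector adds its own options beyond this generic --driver fallback.

    import org.apache.sqoop.Sqoop;

    public class TeradataImport {
        public static void main(String[] args) {
            // All connection details below are made up; the Teradata JDBC driver jar
            // must be available to Sqoop for this to run.
            String[] importArgs = {
                "import",
                "--connect", "jdbc:teradata://td-host/DATABASE=sales",
                "--driver", "com.teradata.jdbc.TeraDriver",
                "--username", "dbc", "--password", "dbc",
                "--table", "orders",
                "--target-dir", "/data/teradata/orders",
                "--num-mappers", "1"
            };
            System.exit(Sqoop.runTool(importArgs));
        }
    }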

Is there a way to use a JDBC as a input resource for Hadoop's MapReduce?

I have data in a PostgreSQL DB, and I'd like to read it, process it and save it to an HBase DB. Is it possible to somehow distribute the JDBC operation across map tasks?
Yes, you can do that with DBInputFormat:
DBInputFormat uses JDBC to connect to data sources. Because JDBC is widely implemented, DBInputFormat can work with MySQL, PostgreSQL, and several other database systems. Individual database vendors provide JDBC drivers to allow third-party applications (like Hadoop) to connect to their databases.
The DBInputFormat is an InputFormat class that allows you to read data from a database. An InputFormat is Hadoop’s formalization of a data source; it can mean files formatted in a particular way, data read from a database, etc. DBInputFormat provides a simple method of scanning entire tables from a database, as well as the means to read from arbitrary SQL queries performed against the database.
LINK
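To make that concrete, here is a minimal sketch of a map-only job that reads rows from a hypothetical PostgreSQL users table through DBInputFormat and writes them to HDFS as text. The table, columns, connection string and output path are assumptions; writing into HBase instead would change only the output side of the job.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
    import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
    import org.apache.hadoop.mapreduce.lib.db.DBWritable;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class PostgresToHdfsJob {

        // One row of the hypothetical "users" table; DBInputFormat hands these to the mapper.
        public static class UserRecord implements Writable, DBWritable {
            long id;
            String name;

            public void readFields(ResultSet rs) throws SQLException {
                id = rs.getLong(1);
                name = rs.getString(2);
            }
            public void write(PreparedStatement ps) throws SQLException {
                ps.setLong(1, id);
                ps.setString(2, name);
            }
            public void readFields(DataInput in) throws IOException {
                id = in.readLong();
                name = in.readUTF();
            }
            public void write(DataOutput out) throws IOException {
                out.writeLong(id);
                out.writeUTF(name);
            }
        }

        // Turns each database row into one tab-separated line of text.
        public static class ExportMapper
                extends Mapper<LongWritable, UserRecord, Text, NullWritable> {
            protected void map(LongWritable key, UserRecord row, Context ctx)
                    throws IOException, InterruptedException {
                ctx.write(new Text(row.id + "\t" + row.name), NullWritable.get());
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Driver class and JDBC URL are placeholders; adjust for your database.
            DBConfiguration.configureDB(conf, "org.postgresql.Driver",
                    "jdbc:postgresql://dbhost/mydb", "dbuser", "dbpass");

            Job job = Job.getInstance(conf, "postgres-to-hdfs");
            job.setJarByClass(PostgresToHdfsJob.class);
            job.setMapperClass(ExportMapper.class);
            job.setNumReduceTasks(0);                      // map-only job
            job.setInputFormatClass(DBInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(NullWritable.class);

            // Read id and name from "users", ordering by id so the input can be split.
            DBInputFormat.setInput(job, UserRecord.class, "users", null, "id", "id", "name");
            FileOutputFormat.setOutputPath(job, new Path("/data/users"));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }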
I think you're looking for Sqoop, which is designed to import from SQL databases into HDFS and the technologies stacked on top of it. It puts the data it gets over a JDBC connection into HDFS, splitting it across your Hadoop DataNodes.
SQl to hadOOP = SQOOP, get it?
Sqoop can import into HBase. See this link.
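A rough sketch of such an HBase-targeted import, driven from Java through Sqoop 1.x's runTool entry point (all connection, table and column-family names are placeholders; the same flags work on the sqoop command line):

    import org.apache.sqoop.Sqoop;

    public class PostgresToHbaseImport {
        public static void main(String[] args) {
            // Hypothetical PostgreSQL source and HBase target; adjust names for your setup.
            String[] importArgs = {
                "import",
                "--connect", "jdbc:postgresql://dbhost/mydb",
                "--username", "dbuser", "--password", "dbpass",
                "--table", "users",
                "--hbase-table", "users",
                "--column-family", "cf",
                "--hbase-row-key", "id",
                "--split-by", "id"
            };
            System.exit(Sqoop.runTool(importArgs));
        }
    }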

Hadoop remote data sources connectivity

What are the available Hadoop remote data source connectivity options?
I know about drivers for MongoDB, MySQL and Vertica connectivity, but my question is: what other data sources have drivers for Hadoop connectivity?
These are the few I am aware of:
Oracle
ArcGIS Geodatabase
Teradata
Microsoft SQL Server 2008 R2 Parallel Data Warehouse (PDW)
PostgreSQL
IBM InfoSphere warehouse
Couchbase
Netezza
Tresata
But I am still wondering about the intent of this question. Every data source fits a particular use case: Couchbase for document data storage, Tresata for financial data storage, and so on. Are you going to decide your store based on connector availability? I don't think so.
Your list would be too long to be useful.
Just one reference: Cascading gives you access to almost anything you want to access. Moreover, you're not limited to Java; for example, there is the Scalding component, which provides a very good framework for Scala programmers.

How to retrieve and analyse data from a MS SQL Server using Hadoop, Hive and Sqoop?

I want to do analysis on data which is in a database (MS SQL Server). How can I bring that data onto HDFS with the help of Sqoop/Hive? Is it possible with Hive/Sqoop?
Please suggest how we can do it.
Thanks.
Microsoft recently released a SQL Server connector for Sqoop. There are a few ETL tools (open source and otherwise) that also connect from SQL to Hadoop (like Talend, etc.). A sketch of the Sqoop route is below.
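As a sketch of that route (with assumed server, credentials and table names, and the Microsoft JDBC driver jar available to Sqoop), the import below pulls a hypothetical orders table from SQL Server straight into a Hive table via Sqoop 1.x's runTool entry point; the same arguments work on the sqoop command line.

    import org.apache.sqoop.Sqoop;

    public class SqlServerToHiveImport {
        public static void main(String[] args) {
            // Server, database, credentials and table names below are placeholders.
            String[] importArgs = {
                "import",
                "--connect", "jdbc:sqlserver://sqlhost:1433;databaseName=sales",
                "--username", "sa", "--password", "secret",
                "--table", "orders",
                "--split-by", "order_id",
                "--hive-import",
                "--hive-table", "orders",
                "--num-mappers", "4"
            };
            System.exit(Sqoop.runTool(importArgs));
        }
    }

Once the data is in the Hive table, you can query and analyse it with HiveQL.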
