Testing Hadoop to Teradata flow [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
I would like to test a flow between a Hadoop data lake and Teradata tables. The thing is that I am new to these technologies.
The data lake is the data source for the data warehouse I have on Teradata.
I have read about QuerySurge, but I'd like to know whether it is possible to create my own scripts to test the flows.

Teradata offers connectors for Cloudera (link) and Hortonworks (link), which facilitate moving data between the platforms.
QueryGrid is a Teradata offering that allows you to create "linked servers" on your Teradata platform. Using these linked servers, you can query data on a Hadoop platform from Teradata. Currently, these types of workloads are intended to be low-concurrency; that landscape is evolving fairly quickly, and concurrency rates may increase as the technologies evolve and mature.

Feel free to use QuerySurge.
I have been working with QuerySurge for the last 5 years to test and validate data from different sources.
It is basically automation of custom SQL scripts.
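If you do want to roll your own scripts instead, the heart of such a test is just comparing a source result set against a target one. A minimal sketch in Python, assuming you fetch the rows from Hive and Teradata yourself (via pyodbc, JDBC, etc.); the sample rows below are made up for illustration:

```python
# Minimal source-vs-target comparison for a Hadoop -> Teradata flow test.
# Fetching the rows from each side is left to your driver of choice;
# the comparison itself is plain Python.

from collections import Counter

def compare_rowsets(source_rows, target_rows):
    """Compare two row sets ignoring order but respecting duplicates.

    Returns (missing_in_target, unexpected_in_target) as Counters.
    """
    src = Counter(tuple(r) for r in source_rows)
    tgt = Counter(tuple(r) for r in target_rows)
    missing = src - tgt      # rows in the lake but not in the warehouse
    unexpected = tgt - src   # rows in the warehouse but not in the lake
    return missing, unexpected

# Example with in-memory stand-ins for the two query results:
source = [(1, "alice"), (2, "bob"), (3, "carol")]
target = [(1, "alice"), (3, "carol"), (4, "dave")]
missing, unexpected = compare_rowsets(source, target)
print(dict(missing))     # {(2, 'bob'): 1}
print(dict(unexpected))  # {(4, 'dave'): 1}
```

For large tables you would compare aggregates (counts, checksums) per partition instead of pulling full row sets, but the pass/fail logic stays the same.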

Related

Is Cassandra JDBC still actively supported? [closed]

Closed 2 years ago.
Looking through the Apache Cassandra page and other linked Cassandra documentation, the emphasis seems to be on CQL, and there are no references to JDBC. Is a Cassandra JDBC driver still under active development? A Google search for 'cassandra jdbc' turns up a few things, but none appear connected to Apache or DataStax. If anyone is using a Cassandra JDBC driver, which one are you using?
Digging around, I found one on Bitbucket from the DbSchema people.
I've only tried a few basic queries so far, so this is more of a 'hey, look' than an endorsement.
CData Software (the company I work for) makes a JDBC driver for Cassandra. As of 31 May 2016, the driver is in Beta, but it is written to the JDBC standard and should work in any environment that supports JDBC.
The link above will take you to a page where you can read more about the driver and download the Beta (or a trial/full version once the driver is released).

Selecting the ETL tool [closed]

Closed 6 years ago.
I will have this situation in my project:
DB: Cassandra
Data Source: A Relational DB
Manager of ETL process: .NET API
ETL TOOL: ???
I will use Cassandra as my database.
It will collaborate with an Oracle DB, dynamically, via an ETL tool.
The source data is stored in Oracle and will be used in Cassandra under the management of an API.
The question is: which ETL tool is best in this situation, given that my ETL manager, which will select the parameters dynamically, is a .NET API (users will use this API behind the scenes, through the project)?
SSIS would be a good tool, since it is compatible with Microsoft .NET, but it is incompatible with Cassandra 3.x. That means we can't use the benefits of newer Cassandra versions, such as materialized views, SASI secondary indexes (coming soon), etc.

Is there a good ETL framework for data warehouse in Hadoop [closed]

Closed 7 years ago.
I have investigated Oozie and Azkaban, but I think they are only used to schedule jobs.
A data warehouse often needs a large number of jobs to be scheduled and coordinated; is there a good framework for that?
You can use the Pentaho Data Integration tool. Check it out: http://www.pentaho.com/product/data-integration
You may also check Talend for data integration in a Hadoop-based warehouse. It offers graphical tools to create data integration flows between the Hadoop components, and it is open source too.
Please check http://www.talend.com/resource/hadoop-tools.html
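If the scheduling side is the real concern, the ordering logic that tools like Oozie and Azkaban provide boils down to a topological sort over a job-dependency graph. A toy sketch (the job names are hypothetical, and this is no substitute for a real scheduler):

```python
# Toy dependency scheduler: emit jobs in an order that respects
# their dependencies. Requires Python 3.9+ for graphlib.

from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must finish before it runs.
deps = {
    "load_staging": set(),
    "clean_dims":  {"load_staging"},
    "load_facts":  {"load_staging", "clean_dims"},
    "build_marts": {"load_facts"},
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # ['load_staging', 'clean_dims', 'load_facts', 'build_marts']
```

Real schedulers add retries, calendars, and parallel execution of independent jobs on top of this ordering, which is why a framework is usually still worth adopting.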

Tableau programming [closed]

Closed 7 years ago.
I am new here, and I hope I can find answers to my questions related to open source reporting systems.
Is it possible to change the programming logic of Tableau Desktop? I am asking because I need to make changes that enable me to log users' interactions with the system (Tableau Desktop).
Is it possible to perform big data analysis by combining Tableau Desktop with Hadoop or Spark?
If the answer to the above questions is no, could you please recommend another open source (free) reporting system that satisfies these requirements?
Thank you in advance, and best regards to all of you.
Tableau has drivers to connect to several "big data" NoSQL databases, and added a Spark SQL driver as of Tableau version 8.3.
The full list of supported drivers can be found on Tableau's website at http://www.tableau.com/support/drivers
Your question about logging user interactions is not at all clear, but you might have better luck instituting logging at the database level instead of at client level.
In response to your question regarding user interactions, I'd recommend you take a look at the views_stats table in the Tableau Server database.
Instructions for connecting to the 'workgroup' database: http://onlinehelp.tableau.com/current/server/en-us/adminview_postgres_connect.htm
Versions 8 and 9 include a Spark connection.
As far as logging users goes, Tableau Desktop is designed as a single-license tool for developers and shouldn't need to be logged.
If you're interested in logging users, you may be thinking of Tableau Server, which has built-in functions for things like that as well as a REST API, which has some additional functions.

Data Loading Software [closed]

Closed 6 years ago.
We deal with scientific research data, and we have volumes and volumes of data put together in different template file formats (Excel, CSV, TXT, XML, etc.). We were using old legacy C programs (developed in-house) to load these data into our databases. (We use Ingres as our DBMS.) Is there any open-source software available for the ETL (extraction, transformation, loading) process? What have been your experiences, if you have used any?
Based on what other Ingres users are saying, the two that are fairly well spoken of are Talend and Pentaho.
Pentaho site: http://www.pentaho.com/
Talend site - as already mentioned by Paul: http://talend.com/index.php
Here is an open source solution for importing multiple file formats into a database system or other system type: http://talend.com/index.php
At the company I work at, we use SQL Server Integration Services, which does similar things; it comes with SQL Server if you're using that.
There is an open-source set of BI and ETL tools: have a look at Pentaho. I believe its ETL tool is called "Kettle". It has a pretty rich set of functionality and GUI tools for the ETL process.
We use DBMS/COPY, but it looks like it is no longer in production. It has a GUI interface for setting up scripts, or you can hand-write them.
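If you end up scripting this yourselves rather than adopting a tool, the dispatch-by-file-format layer is simple in a scripting language. A sketch using only the Python standard library (CSV and XML shown; Excel would need a third-party reader such as openpyxl, and the Ingres load step is omitted):

```python
# Sketch: normalize different input formats into rows before loading.
# Only the parsing layer is shown; the actual database load step
# (e.g. Ingres via ODBC) would consume the returned rows.

import csv
import io
import xml.etree.ElementTree as ET

def rows_from_csv(text):
    """Parse CSV (or comma-delimited TXT) text into a list of rows."""
    return [row for row in csv.reader(io.StringIO(text))]

def rows_from_xml(text, record_tag="record"):
    """Turn each <record> element into one row of its child texts."""
    root = ET.fromstring(text)
    return [[child.text for child in rec] for rec in root.iter(record_tag)]

PARSERS = {".csv": rows_from_csv, ".txt": rows_from_csv, ".xml": rows_from_xml}

def parse(filename, text):
    """Dispatch on file extension to the matching parser."""
    ext = filename[filename.rfind("."):].lower()
    return PARSERS[ext](text)

print(parse("sample.csv", "1,alice\n2,bob"))
# [['1', 'alice'], ['2', 'bob']]
print(parse("sample.xml",
            "<data><record><id>1</id><name>alice</name></record></data>"))
# [['1', 'alice']]
```

Keeping every format behind the same "list of rows" interface is what makes the downstream load code format-agnostic, which is essentially what the heavier ETL tools give you out of the box.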
