I am new to the Talend ETL tool. I have created job workflows in Talend DI (Data Integration), and now I want to implement the same jobs on Hadoop, for which I am using the Talend Big Data tool. Can anybody explain how I could achieve this Talend DI to Talend Big Data migration?
As per my understanding, the libraries are different for the DI and Big Data Talend tools, which may be why the import is not possible.
Talend checks the version of the environment before importing items; you can dodge the check by deleting the talend.project file from the exported directory or archive.
I have had success with editing the talend.project file so the version shown matches the one you are trying to import TO (if they are different).
I imagine this may not always work if you are using components or features that are not compatible with the target version.
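As a minimal sketch of that edit, assuming the version is stored as a productVersion attribute in talend.project (inspect your own export to confirm the attribute name and the target version string, and keep a copy of the original file first):

```python
import re
from pathlib import Path

# Path to the exported item directory is a placeholder; adjust to your export.
project_file = Path("exported_items/talend.project")

text = project_file.read_text(encoding="utf-8")

# Replace the recorded product version with the target Studio's version.
# Attribute name and version string are assumptions; check your own file.
patched = re.sub(r'productVersion="[^"]*"', 'productVersion="6.0.1"', text)

project_file.write_text(patched, encoding="utf-8")
print("talend.project patched; re-run the import in the target Studio.")
```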
Related
What is the best and fastest way to ETL from a huge file of records to inserting those transformed records into a MySQL table?
There are a lot of tools to do that: you can use SSIS, Talend, Informatica, etc.
You can also build your own ETL process in Python, Go, or any other language.
Another simple and minimal tool for this is Dixer (I'm the main developer). It runs ETL processes fast on big files and supports big relational databases, without the need to download dependencies.
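As a rough illustration of the hand-rolled route, here is a chunked CSV-to-MySQL loader in Python; the table, columns, and connection details are made up, and it assumes the mysql-connector-python package:

```python
import csv
import mysql.connector  # pip install mysql-connector-python

BATCH_SIZE = 10_000

conn = mysql.connector.connect(
    host="localhost", user="etl", password="secret", database="warehouse"
)
cursor = conn.cursor()

# Hypothetical target table; replace with your own schema.
insert_sql = "INSERT INTO records (id, name, amount) VALUES (%s, %s, %s)"

with open("huge_file.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    batch = []
    for row in reader:
        # Transform step goes here, e.g. casting amount to float.
        batch.append((row[0], row[1], float(row[2])))
        if len(batch) >= BATCH_SIZE:
            cursor.executemany(insert_sql, batch)
            conn.commit()
            batch.clear()
    if batch:  # flush the final partial batch
        cursor.executemany(insert_sql, batch)
        conn.commit()

cursor.close()
conn.close()
```

Batching the inserts and committing per batch keeps memory use flat and avoids one giant transaction, which is usually the bottleneck with very large files.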
I need to schedule a report export for a particular department, which needs to run daily, and each day the list of employees for that department changes.
So I am using Talend ETL to schedule this job and save the PDFs in a particular folder on my local machine.
But the problem is that, while creating the PDF from the server, some of the data is missing from the generated PDF, even though the complete set of data is available in the report.
I am using JasperServer 6.3,
Talend ETL Community 6.01,
and the tJasperServerExec component to trigger the PDF export.
Can anyone please suggest how to fix these missing words in the automated export of reports? Thanks in advance.
As it was missing the last line, I just concatenated \n\r onto the column, and that fixed the issue.
Is there a way to deploy or redeploy a SAS job (Data Integration Studio) via a shell script?
Also, is there a way to create its SPK file via a script?
You can deploy DI jobs from the command line; see here:
http://support.sas.com/documentation/cdl/en/etlug/65807/HTML/default/viewer.htm#p1jxhqhaz10gj2n1pyr0hbzozv2f.htm
I have imported and exported objects into SAS DIS via shell script using the SAS ExportPackage utility. I personally find it far more convenient than the window method. However, for it to work you need an X Windows environment; I used Xming for that.
As for deploying jobs, I have never tried it.
To redeploy jobs, DI Studio versions 4.901 and higher have a DeployJobs tool which is designed to perform this function; read more in the SAS documentation. It is available on the server. Older versions had a similar but much more restrictive client tool using Ant.
Also see Paper 1067-2017, An Introduction to the Improved SAS® Data Integration Studio Batch Deployment Utility on UNIX by Jeff Dyson, The Financial Risk Group, which gives a run-through of how to use it.
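As a rough illustration of what such a call might look like (wrapped in Python here; every host, path, and flag below is illustrative and should be verified against the SAS documentation for your release):

```python
import subprocess

# A hypothetical wrapper around the server-side DeployJobs utility. All
# values are placeholders, and the flag names should be checked against
# the SAS docs for your DI Studio release before relying on them.
cmd = [
    "DeployJobs",                    # assumes the utility is on PATH
    "-host", "meta.example.com",     # metadata server host (placeholder)
    "-port", "8561",                 # typical metadata port (verify)
    "-user", "sasdemo",
    "-password", "secret",
    "-deploytype", "REDEPLOY",
    "-objects", "/Shared Data/Jobs/my_job",
    "-sourcedir", "/sas/deployed_jobs",
]
subprocess.run(cmd, check=True)
```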
Does anyone know if there is a straightforward way to get data out of MS Project 2013 and into Oracle 11g? We have a master schedule created in MS Proj and want to create a web-based application that will perform monitoring and metrics charting of the project schedule statuses. I have successfully exported to CSV and imported into Oracle, but this was cumbersome and required a lot of formatting of the data in the CSV format before it was pushed back into Oracle. I'm in the beginning phases here, but wanted to solicit anyone who may have had experience with this in the past.
If you don't mind writing a little code, you could use MPXJ. You should be able to extract what you need using Java or a .Net language. You can perform the import directly in code, or just generate a suitably formatted output file for import into Oracle using other tools.
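As a minimal sketch using MPXJ's Python distribution (the mpxj package wraps the Java library via JPype, so a JVM is required; the file name and the fields printed are placeholders):

```python
import jpype
import mpxj  # pip install mpxj; wraps the MPXJ Java library

jpype.startJVM()
from net.sf.mpxj.reader import UniversalProjectReader

project = UniversalProjectReader().read("master_schedule.mpp")

# Dump task data; from here you could insert rows into Oracle
# (e.g. with the oracledb package) or write a clean file for SQL*Loader.
for task in project.getTasks():
    print(task.getID(), task.getName(), task.getStart(), task.getFinish())

jpype.shutdownJVM()
```

Driving the extraction from code like this avoids the manual CSV export/reformat cycle, since you control the columns and formatting at the point of extraction.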
What's the best practice to control versions of Tableau projects?
If a change in a Tableau project requires changes in the database (in my case, Redshift) and in the ETL (in my case, my Python script), how do I version control all of them together, such that I would be able to roll back to a previous version in case of a problem?
Thanks!
EDIT: Tableau has added version control features to Tableau Server since this answer was originally written.
At present Tableau Server does not provide version control functionality. There are a few ideas on the Tableau Community forum requesting integration with version control software such as Git, or for version control to be baked into Tableau Server. Since Tableau workbooks are just XML files, one could use some form of source control software for workbooks stored on a shared drive, with publishing permissions restricted to a site/project admin.
In theory a script could tie all of these components together. If a particular version of a Tableau workbook were associated with a specific database and ETL change (although I'm not sure what part the Python script plays here), then the previous version of the workbook could be retrieved from source control and republished as part of a rollback.
Another way to get the ability to roll back to a previous version is to run the native Tableau backup command just before applying any project changes. This provides a snapshot of the server state at the time of the change.
The format is tabadmin backup backupfilename
In Tableau 8.0 and earlier, the server must be stopped first, via tabadmin stop
So your existing DB and ETL change deployment mechanism could be extended to call the backup command with a backupfilename that has the build or release number appended, as in the sketch below.
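For example, a small Python wrapper along those lines (the release identifier is a placeholder, and it assumes tabadmin is on PATH on the Tableau Server machine):

```python
import subprocess
from datetime import date

release = "r2024-01"  # placeholder: your build or release identifier
backup_name = f"pre_deploy_{release}_{date.today():%Y%m%d}"

# On Tableau 8.0 and earlier, run "tabadmin stop" first and
# "tabadmin start" again after the backup completes.
subprocess.run(["tabadmin", "backup", backup_name], check=True)
```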
Running a server backup like this may not be as heavyweight an operation as you think: if your workbooks use all live connections and no cached or uploaded data, the backup command is quick and should complete in a few seconds.