Best way for ETL from a file to MySQL

What is the best and fastest way to ETL records from a huge file, transform them, and insert the transformed records into a MySQL table?

There are a lot of tools for that: SSIS, Talend, Informatica, etc.
You can also build your own ETL process in Python, Go, or any other language.
Another simple and minimal tool for this is Dixer (I'm its main developer): it does fast ETL on big files and supports big relational databases without the need to download dependencies.
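If you want to stay inside MySQL itself, its bulk loader plus a staging table is usually the fastest route for huge files. A minimal sketch, assuming a comma-delimited file with a header row; the file path, table, and column names are all hypothetical:

    -- Load the raw file into a staging table first, then transform it
    -- into the target table with one set-based statement.
    CREATE TABLE staging_records (
        raw_id     VARCHAR(64),
        raw_name   VARCHAR(255),
        raw_amount VARCHAR(32)
    );

    LOAD DATA LOCAL INFILE '/path/to/huge_file.csv'
    INTO TABLE staging_records
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES;  -- skip the header row

    -- Apply the transformations while copying into the real table.
    INSERT INTO records (id, name, amount)
    SELECT CAST(raw_id AS UNSIGNED),
           TRIM(raw_name),
           CAST(raw_amount AS DECIMAL(10, 2))
    FROM staging_records;

LOAD DATA INFILE is typically much faster than row-by-row INSERTs because the server parses the file directly; the trade-off is that heavier transformations have to happen in the staging-to-target step.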

Related

Export jobs from Talend DI to Talend Big Data

I am new to the Talend ETL tool. I created job workflows in Talend DI (Data Integration), and now I want to implement the same jobs on Hadoop using the Talend Big Data tool. Can anybody explain how I could achieve this Talend DI to Talend Big Data migration?
As per my understanding, the libraries are different for the DI and Big Data Talend tools, which may be why the import is not possible.
Talend checks the version of the environment before importing items; you can dodge this by deleting the talend.project file from the exported directory or archive.
I have had success with editing the talend.project file so the version shown matches the one you are trying to import TO (if they are different).
I imagine this may not always work if you are using components or features that are not compatible with the target version.

Visual Studio 2013 - DB Data Compare

I know you can compare and sync data between databases in Visual Studio using SSDT. But is there a way to compare the data in a DB project against an actual DB?
We currently use RedGate to sync schema and data. Normally, when someone makes a data change in their local DB, they sync it into the RedGate project scripts and check them in to Git so that everyone can sync their local DBs from those scripts and stay up to date. RedGate became too expensive, so we are looking at alternatives, and it looks like Visual Studio's SSDT allows this kind of thing.
I was able to create a database project in VS and import the schema from the DB, so all developers can now stay up to date with schema changes, but there is no equivalent option for DB data. There is no option to create data scripts (and add them to the database project) to compare against the DB; as I said, it only allows a data compare between two DBs, not between a DB and the DB project scripts, at least as far as I have found. Is there even a way to do this, so that we can include data scripts and sync them with our DBs?
You could check in MERGE scripts for static tables and include them in your post-deployment script.
The MERGE statements will ensure that your target tables contain exactly the right rows when publishing (the insert/update/delete clauses of the MERGE take care of this).
Make one MERGE script file per table, and include them all in your post-deployment script file.
The only difference/downside is that you cannot import data into the MERGE script: you have to type the code in to get it version-controlled, or write an SP that generates the SQL MERGE statement.
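A minimal sketch of one such script, assuming a hypothetical lookup table dbo.OrderStatus; the source rows are typed in by hand, which is the downside mentioned above:

    -- Post-deployment script: make dbo.OrderStatus contain exactly these rows.
    MERGE INTO dbo.OrderStatus AS target
    USING (VALUES
        (1, N'Pending'),
        (2, N'Shipped'),
        (3, N'Cancelled')
    ) AS source (StatusId, StatusName)
        ON target.StatusId = source.StatusId
    WHEN MATCHED AND target.StatusName <> source.StatusName THEN
        UPDATE SET StatusName = source.StatusName
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (StatusId, StatusName) VALUES (source.StatusId, source.StatusName)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;

Because the script is idempotent, it can run on every publish without drifting the data.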

Importing MS Project 2013 data into Oracle 11g database

Does anyone know if there is a straightforward way to get data out of MS Project 2013 and into Oracle 11g? We have a master schedule created in MS Project and want to create a web-based application that will monitor and chart metrics on project schedule statuses. I have successfully exported to CSV and imported into Oracle, but this was cumbersome and required a lot of reformatting of the CSV data before it could be pushed into Oracle. I'm in the beginning phases here, but wanted to hear from anyone who has had experience with this in the past.
If you don't mind writing a little code, you could use MPXJ. You should be able to extract what you need using Java or a .Net language. You can perform the import directly in code, or just generate a suitably formatted output file for import into Oracle using other tools.

Managing database scripts in your solutions

I usually create a solution folder in Visual Studio and put my DB scripts in it. I always use at least this set of scripts:
Drop model
Create model script
User functions
Stored procedures
Static data (lookup tables)
Test data (not deployed)
Then I simply combine these scripts into a single one and execute it against SQL Server, so I can recreate the whole DB in a single step.
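One way to do that combination step, sketched here using SQLCMD-mode :r includes and hypothetical file names that mirror the list above:

    -- build.sql: run with SQLCMD (e.g. sqlcmd -S myserver -i build.sql)
    :r .\01_drop_model.sql
    :r .\02_create_model.sql
    :r .\03_user_functions.sql
    :r .\04_stored_procedures.sql
    :r .\05_static_data.sql
    -- 06_test_data.sql is deliberately left out of deployment builds

The same list could of course be concatenated by a shell script instead; the point is that the build file, not a person, decides the order.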
Anyway. I've never used projects in either:
Visual Studio or
SQL Management Studio
I've tried creating SQL Server 2008 Database Project in Visual Studio 2010, but I'm somehow overwhelmed by all the possible server settings (which I prefer to stay default as set on the server anyway). So I'm a bit confused: Should I use this project template or should I just do the same thing I always did?
What do you use and why? What are advantages I may benefit from by using either?
If I were you I would continue to do it the way you are doing it. In fact, I do! In my opinion, the advantages of having the actual .sql files right there in a folder to use/edit/look at far outweigh the advantages you get from a DB project. A DB project would be used for something like storage reports, where you have to communicate with, say, 8 databases, compare them to 8 other databases, save result sets, etc. Now don't get me wrong, there are advantages to database projects; I just don't think they help much when you have such a simple setup that already works.
Advantages of the SQL Server 2008 Database Project in VS10:
Not having to switch back and forth from the client you currently use to communicate with your SQL Server.
Decent data and schema compare tools.
Gives you a one-click way to reverse engineer a database into source control and keep it up to date.
You can compare projects to physical databases and vice versa. (This makes it pretty easy to keep your database up to date, no matter where you make the change: in the file-system database project or in the physical database itself.)
If the tool you're currently using is not specifically tailored to SQL Server, this one is.
Extremely helpful if you need to run unit tests directly against the database without using abstractions.
If you're looking for something a little less complicated, you might want to try SQL Source Control. It won't even require you to maintain scripts, as it does this for you behind the scenes. It will, however, only work as a solution for you if you use either TFS or SVN. And it costs $295...
It has a 28-day trial period, so if you're happy to try it out, I'd be interested in your feedback.

DB design strategy in Visual Studio

I'm currently investigating ASP.NET MVC 2 and LINQ to SQL. It all looks pretty cool. But I have a few application and development lifecycle issues.
Currently, I design the DB in SQL Server Management Studio.
Then I update my DBML files by deleting and re-importing modified tables.
Issues:
I can't find a way to simply update the whole DBML schema.
My DBML then loses some of the changes I made, such as renamed relation members or an int mapped to an enum.
If I want a SQL script to deploy my DB (or to keep the schema under source control), I need to use the 'Generate Script' SSMS wizard, which would be cool if a) it could remember my settings and b) it could be automated.
Should I work the other way around (start from my DBML and generate the DB)? Should I go for some other framework (NHibernate? Can I use some LINQ flavor with it?)
Also, I read that LINQ2SQL is already obsolete in favor of LINQ to Entities. Does that mean the ultimate tool that was supposed to make my life so much better will again make me lose time in the long term?
Thanks for shedding some light.
If you are starting your DB Schema from scratch you could consider "Code-First Development with Entity Framework 4" as outlined by Scottgu.
I have been using this on a new project and am finding it extremely beneficial - especially for testing.
I started with simple POCO classes representing my data; then, as the project progressed, I let EF4 generate the schema to a "real" DB using my in-memory example data. Now I use a mixture of in-memory POCOs (for development and TDD) and the auto-generated DB schema (auto-loaded with more "realistic" data) for demonstrations, etc. So far I am very happy.
There is a lot of opinion about LINQ2SQL and whether it's 'obsolete' or 'discontinued', but it is still in the .NET Framework and a good tool, so if it suits your needs then you should use it. Frankly, the Entity Framework is still not perfect, and if you don't need the extra flexibility it affords, it is not worth the pain. If I had a small to midsize project I would definitely use LINQ2SQL again (and over EF).
As for your question: yes, you'll lose any names or custom type mappings when you remove and re-add a table. The options I'm aware of are:
Only remove/re-add the table that has changed (not all tables).
Try altering the DBML tables in place rather than removing/re-adding them. You can add and remove columns, change column names and data types, and add relationships, all on the DBML.
I like JcMalta's suggestion of creating objects as classes before rendering them into the database, but if you find SQL Studio quick to develop with, it might simply be fastest to create tables there and drop them into your DBML. It's a touch annoying to have to change something in the database and then push the changes into your code, but the code-gen tools are quite good and take away most of the pain.
You can try CodeSmith/PLINQO to auto-sync DB/code:
http://plinqo.com/
As a follow-up, just wanted to say that I eventually found and fell in love with Huagati DBML/EDMX Tools.
To be totally honest, I must say that the price has significantly increased since I purchased it. I believe it is still worth the money anyway.
And for people who are looking for the same kind of tool for MySQL (or other), DevArt is your friend.
