Jitterbit 4 vs Jitterbit 5+ vs Talend vs other ETL tools

While I find Jitterbit 4 a fairly powerful tool, my company and I seem to have pretty much maxed out what v4 can do.
I am trying to keep some now business-critical processes alive, and I'm finding that I'm swimming against the tide.
Is there experience out there of improvements gained by moving to a later version of Jitterbit that would make that route worthwhile, or is it time to move to a more capable platform? I've used Business Objects DM in the past, but I don't think our budget would stretch to that.
I've done some limited research, but I need more information than a few generalized blog quotes to build a case for either upgrading or moving platforms.
I'd like to assign multiple automated triggers - for example Monday-Friday every 15 minutes, Saturday and Sunday every hour. It would also be nice to be able to open more than one project at a time in the IDE.
I have to look after a number of processes which take data from CSV files or MySQL/MSSQL tables and upload it to NetSuite CRM, or extract data from NetSuite CRM and move it to MySQL/MSSQL (interaction with NetSuite is via SOAP requests using XML). Up until November these processes generally ran perhaps 3 or 4 times a day, but a number of them now run at 15- or 5-minute intervals. I've done some optimisation work, but the server is running pretty much flat out - the limit being that we can update at most around 2000 records per hour to NetSuite. And the company wants to do more in 2015.
The NetSuite limit is absolute. The problems I want to sort out, though, include better control of logging - I can't seem to turn off logging on the parts I don't want or need logged. I'd like to be able to open two projects in one IDE so I can compare code. And I'd like to be able to open the development IDE on one server but the admin panel for the other server - the IDE I use allows only one login.
If Talend or something else can offer these sorts of advantages then perhaps it's the way to go - especially as Jitterbit isn't a skill many DevOps people here in the UK have, whereas Talend and other tools are.

I'm going to start by saying I really don't have any knowledge of Jitterbit at all, so I have no real comparison. The other thing to add is that some of the things you want are available in the enterprise licences for Talend but not in the free Talend Open Studio (TOS) edition. If you have absolutely zero budget you could probably get by with TOS, using external scripts to build your jobs and projects and running the built JARs via cron or some other launcher.
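To give a flavour of the pure-TOS route: a built job is just a JAR plus a generated launch script, so anything that can run that script on a timer will do. Below is a minimal stand-alone launcher, sketched in Java, covering your "every 15 minutes Monday-Friday, hourly at weekends" schedule; the job name and path are invented for illustration, and it assumes it is started on the hour so the ticks line up with the clock.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical launcher for a built Talend Open Studio job (no TAC involved).
// Fires every 15 minutes Monday-Friday and only on the hour at weekends.
public class JobLauncher {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Tick every 15 minutes; each tick decides whether this run should go ahead.
        scheduler.scheduleAtFixedRate(JobLauncher::maybeRun, 0, 15, TimeUnit.MINUTES);
    }

    private static void maybeRun() {
        LocalDateTime now = LocalDateTime.now();
        DayOfWeek day = now.getDayOfWeek();
        boolean weekend = (day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY);
        // At weekends, skip the quarter-past/half-past/quarter-to ticks.
        if (weekend && now.getMinute() >= 15) {
            return;
        }
        try {
            // csv_to_netsuite_run.sh stands in for the script TOS generates when you
            // build the job (the name and location here are made up).
            new ProcessBuilder("/opt/talend/jobs/csv_to_netsuite/csv_to_netsuite_run.sh")
                    .inheritIO()
                    .start()
                    .waitFor();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```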
I'll start by talking about what you can do with the enterprise editions of Talend (such as Talend Enterprise Data Integration).
Talend's enterprise editions come with the Talend Administration Centre (TAC), which can be used to schedule jobs on multiple triggers and deploy them to chosen job execution servers. It's pretty trivial to set up a cron-style trigger to run every 15 minutes Monday-Friday and another to run every hour on Saturday and Sunday. The TAC also provides a centralised reference for all of the Talend cluster's configuration and settings, as well as creating users and assigning privileges. You can also get some logging when Talend is configured to use the Activity Monitoring Console (AMC); any job-level logging can be configured in the job itself and then viewed in the execution history of the task.
I'm not sure what you mean about opening two projects at a time to compare code, or what you'd use it for, but you can open multiple jobs at the same time to look at them. Multiple projects at the same time is a no-go. I guess you could install the Studio twice in separate locations with separate workspaces (Talend Studio is based on Eclipse), open a project in each and compare them visually, though I'm not really sure why you would do that.
If you're finding that lots of processes are maxing out your job execution server, you can easily add more job execution servers and deploy some of the tasks to them. Unfortunately you can't just add a bunch of job execution servers and have the TAC load balance the work across them. With just TOS you could always add more commodity machines that you manually deploy prebuilt binaries to and execute (it's just running a binary JAR, so they only need a JRE on them). It might be a bit of a pain to organise though.
Talend's enterprise editions also come with some centralised source control in the form of SVN (although quite bastardised) which is useful if you ever intend to add more team members as putting TOS into source control can be a pain.
As for things that aren't enterprise specific, Talend generates reasonably performant Java code (it has easily matched my requirements so far with essentially no optimisation effort). For instance, I tend to hit around 3 requests per second against internal-network web services. Obviously if NetSuite simply takes a long time to respond to each request then that won't help.
Talend has out-of-the-box connectors for all of the data sources you mention except NetSuite directly (although there is an unofficial NetSuite connector on TalendForge), but as with Jitterbit you should be able to do XML over SOAP to talk to it easily enough.
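If it helps to see what that boils down to, the call is nothing more exotic than posting a SOAP envelope over HTTP, which you can prototype outside Talend first. A minimal Java sketch - the endpoint URL, SOAPAction and envelope fields here are placeholders, not NetSuite's actual schema:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Bare-bones XML-over-SOAP call of the kind a Talend SOAP/XML flow would make.
// Everything below (URL, operation, fields) is illustrative only.
public class SoapCallSketch {
    public static void main(String[] args) throws Exception {
        String envelope =
                "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\">"
              + "<soapenv:Body>"
              + "<updateRecord><internalId>12345</internalId><status>Shipped</status></updateRecord>"
              + "</soapenv:Body>"
              + "</soapenv:Envelope>";

        HttpURLConnection conn = (HttpURLConnection)
                new URL("https://webservices.example.com/services/endpoint").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        conn.setRequestProperty("SOAPAction", "updateRecord");
        conn.setDoOutput(true);

        try (OutputStream out = conn.getOutputStream()) {
            out.write(envelope.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP status: " + conn.getResponseCode());
    }
}
```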
If I were you I'd download TOS and see if it does what you need as is. If you think you might want some of the enterprise capabilities then they offer a free 30 day trial.
You might want to try and be critical and think about the things you might potentially lose from moving away from Jitterbit as well.

Related

Creating a Windows installer using C# Winforms instead of Installer tool [closed]

I have used InstallAware and InstallShield before, and they are pretty difficult to work with; when something goes wrong it is very difficult to find and resolve the issue.
My question is: why can't we use a Windows application written in C# to do this?
I understand that the .NET Framework may not be installed on the destination computer, so I wonder why no one has ever used this architecture:
I will create a simple installer using InstallShield (or any other similar tool) to just install the .NET Framework, and after that it extracts and runs my own Windows application, which I have written using C#, in elevated mode. My application will run a wizard with Back and Next buttons and I will take care of everything in it (copying files, creating and starting Windows services, adding registry values, creating firewall exceptions etc.).
Has anyone ever done this, and is there anything that prevents people from doing this?
In essence: don't try to re-invent the wheel. Use an existing deployment tool and stay with your day job :-). There are many such tools available. See links below.
And below, prolonged, repetitive musing:
Redux: IMHO and with all due respect, if I may say so, making your own installer software is reinventing the wheel for absolutely no gain whatsoever, I'm afraid. I believe you will "re-discover" the complexities of deployment found by others who have walked this path before, as you create your own installer software and find that such software is quick to make but very hard to perfect. In the process you will expend lots of effort trying to wrap things up - and "the last meter is very long" as you curse yourself dealing with trifles that eat up your time at the expense of what would otherwise pay the bills. Sorting out the bugs in any toolkit, for whatever technical feature, can take years or even decades. And no, I am not making it up. It is what all deployment software vendors deal with.
Many Existing Tools: there are many existing tools that implement such deployment functionality already - which are not based on Windows Installer (Inno Setup, NSIS, DeployMaster and heaps of other less known efforts):
There is a list of non-MSI installer software here.
There is another list of MSI-capable software here.
My 2 cents - if you do not like MSI, choose one of the free, non-MSI deployment tools. How to create windows installer.
Corporate Deployment: The really important point (for me) is that corporate deployment relies on standardized packaging formats - such as MSI - to allow reliable, remote management of your software's deployment. Making your own installer will not impress any system administrators or corporate deployment specialists (at least until you sort out years of bugs and deficiencies). They want a standardized format that they know how to handle (which does not imply that they are all that impressed with existing deployment technology). Using standardized deployment formats can get your software corporate approval. If you make a weird deployment format that does unusual things on install and can't be easily captured and deployed on a large scale, your software is head-first out of any large corporation. No mercy - for real. These are busy environments and you will face little understanding for your unusual solution.
"File-Pushers": Those of us who push files around for a living know that the field of deployment is riddled with silly problems that quickly kill your productivity in other endeavors - the ones that make you stand out in your field - your day job. Deployment is a high-profile, low-status endeavor - and we are not complaining. It is just what it is: a necessity that is harder to deal with than you might think. Just spend your time more wisely is what I would conclude.
Complexity: Maybe skim the section "The Complexity of Deployment" here: Windows Installer and the creation of WiX. It is astonishing to deal with all the silly bugs that happen in deployment. It is not just a file copy, though it might be easy to think it is. And if it happens to be just a file copy, then there are existing tools that do the job. Free ones too. See links above. And if you think deployment is only file copy in general, then please skim this list of tasks a deployment tool should be capable of supporting: What is the benefit and real purpose of program installation?
Will your home-grown package handle the following? (just some random thoughts)
A malware-infected terminal server PC in Korea with Unicode characters in the path?
Symbolic links and NTFS junction point paths?
A laptop which shuts itself off in the middle of your file copy because it is out of battery?
Out of disk space situations? What about disk errors? And copy timeouts?
What about reboot requirements? For in-use files or some other reason. How are they to be handled? What if the system is in a reboot pending state and you need to detect it before kicking off your install?
How will you reliably install, configure and start and stop services?
How will you support uninstall and cleanup for your application?
Security software which flags your unknown, unrecognized, non-standard package as a security threat and quarantines it? How would you begin to deal with this? Who do you contact to get into the good graces of a "recognized binary" for elevation?
Non-standard NTFS permissions (ACLs) and NT privileges? How do you detect them and degrade gracefully when you get permission denied (for whatever reason)?
Deployment of the runtimes your application needs in order to work? (This has been done by many others before.) Download of the latest runtimes if your embedded ones are out of date? Etc...
Provide a standardized way to extract files from your installation binary?
Provide help and support for your setup binaries for users who try to use them?
Etc... This was just a random list of whatever came to mind quickly. There are obviously many issues.
This was a bit over the top for what you asked, but don't be fooled into thinking deployment is something you can sort out a solution for in a few hours. And definitely don't take the job promising to do so - if that is what you are being asked. Just my two cents.
The above issues, and many others, are what people discover they have to handle when creating deployment software - for all but the most trivial deployments. Don't waste your time - use some established tool.
Transaction: If you are working in a corporation and just need to get your files to your testers, you can deploy using batch files for that matter - if you would like to. But you have to support it, and I guarantee you it will take a lot of your time. What do you do when the batch file fails half-way through due to a network error, and your testers end up testing files that are inconsistent? Future deployment technologies may be better for such light-weight tasks. Perhaps the biggest feature of a deployment tool is to report whether the deployment completed successfully or not, to log the errors, and to roll the machine back to a stable state if something failed. Windows Installer does a lot of this work for you.
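To make that concrete, here is roughly the bare minimum a "just copy the files" deployment needs in order to roll back on failure - sketched in Java purely for illustration, with invented paths - and even this ignores locked files, reboots, permissions, services and inventory:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Naive "copy my build folder" deployment with backup-and-rollback.
// Flat copy of top-level files only; source and target paths are invented.
public class NaiveDeploy {
    public static void main(String[] args) throws IOException {
        Path source = Paths.get("build");
        Path target = Paths.get("C:/app");
        List<Path> copied = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(source)) {
            for (Path file : files) {
                Path dest = target.resolve(file.getFileName());
                Path backup = dest.resolveSibling(dest.getFileName() + ".bak");
                if (Files.exists(dest)) {
                    // Keep a copy of whatever we are about to overwrite.
                    Files.copy(dest, backup, StandardCopyOption.REPLACE_EXISTING);
                }
                Files.copy(file, dest, StandardCopyOption.REPLACE_EXISTING);
                copied.add(dest);
            }
        } catch (IOException failure) {
            // Roll back: restore the backup of every file we already overwrote.
            // Note that brand-new files are simply left behind - yet another
            // detail a real deployment tool handles for you.
            for (Path dest : copied) {
                Path backup = dest.resolveSibling(dest.getFileName() + ".bak");
                if (Files.exists(backup)) {
                    Files.copy(backup, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
            throw failure;
        }
    }
}
```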
Distribution: A lot of people feel they can "just replicate my build folder to the user's computers". The complexities involved here are many. There is a network involved, and a network can never be assumed to be reliable; you need lots of error handling here. Then there is the issue of transactions: how do you know when the computer is in a stable state and replication should stop? How often do you replicate - only on demand? How do you deal with the few computers that failed to replicate? How do you tell the users? These are distribution issues. Corporations have huge tools such as SCCM to deal with all these error conditions. Trying to re-implement all these checks, logging and features will take a long time. In the end you will have re-created an existing distribution system. Full circle. And how do you do inventory of your computers when there is no product registered as installed, since only a batch file or script ran? And if you start replicating a lot of packages, how many times do you scan each file to determine whether it is up to date? How much network traffic do you want to create? Where does it end? The answer: I guess transactions must be implemented with full logging, error tracking and rollback. Then you are full circle back to a distribution system like I mentioned above, and a supported package format as well.
This "just replicate my build folder to my users" idea somehow reminds me of this list: https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing. Not a 100% match, but the issues are reminiscent. When networking is involved, things start to become very unpredictable and you need logging, error control, transactions, rollback, network communication, etc... We have re-discovered large-scale deployment - the beast that it is.
Network: and let's say you want to replicate your build folder to 10000 desktop machines in your enterprise. How do you kick off the replication? Do you start all replications at once and take down the trading floor of the bank as file replication takes over the whole network like a DDOS attack? Sorry - it is getting out of hand - please pardon the lunacy - but it really is upsetting that this replication approach is seen as viable for large scale deployment with current technology approaches. Built-in Windows features could help, but still need to be tested properly. You need scheduling, queuing, caching, regional distribution shares, logging, reporting / inventory, and God knows what else that a packaging / deployment system gives you already. And re-implementing it will be a pain train of brand new bugs to deal with.
Maybe one day we will see automatic output-folder replication based on automatic package generation that really works, via an intelligent and transacted distribution system. Many corporate teams are trying, and by using existing tools and standard package formats they are getting closer. I guess current cloud deployment systems are moving in this direction with online repositories and easy, interactive installation, but we still need to package our software intelligently. It will be interesting to see what the future holds and what new problems packaging and distribution face in the age of the cloud.
As we pull files directly from online repositories on demand, we will likely see a bunch of new problems: malware, spoofing and injection (already problematic, but it could get worse); remote files deleted without warning (to get rid of vulnerable releases that should no longer be used - leaving users stranded); certificate and signature problems; firewall and proxy issues; auto-magic updates with unfortunate bugs hitting everyone immediately and unexpectedly; and the fallacies of the network and other factors linked to above. Beats me. We will see.
OK, it became a rant as usual - and that last paragraph goes overboard with speculation (and some of the issues already apply to current deployment). Sorry about that. But do try to get management approval to use an existing packaging and deployment solution - that is my only advice.
Links:
Stefan Kruger's Installsite.org twitter feed: https://twitter.com/installsite
Choosing a deployment tool:
How to create windows installer
What installation product to use? InstallShield, WiX, Wise, Advanced Installer, etc
Windows Installer and the creation of WiX
WiX quick start tips
More on dark.exe (a bit down the page)

ETL Tool - Which is most configurable?

I am looking for the ETL tool best suited to the following criteria:
Supports MongoDB
Accepts metadata as input (or accepts a file and builds its metadata on the fly)
Provides configurable mapping (mapping can be defined outside development, using some file or table)
Please suggest the tool which caters to the above needs.
Hmm, your question is looking for the most configurable ETL tool. From years of experience with ETL processes, I can tell you that you will never find a tool that meets all your demands. Especially when you have an enterprise-level data warehouse (needed because of heavy and complex reporting requirements), the only complete solution is to build your own custom, project-based ETL software, which is often a thankless job.
But (big BUT), you can cover at least 80% of your needs with existing tools. Plugins, smart use of scripts, good data-flow design and (if needed) a small piece of custom software paired with scheduling can get you to the process you have in mind. ETL is no different from any other kind of work - 80% of the work is done in 20% of the time, and the remaining 20% of the work takes 80% of the time.
My suggestion for you:
Pentaho Data Integration - free and open source
PDI is a powerful ETL tool and can surely meet your demands. There are plenty of plugins, a solid community and a fine API if you're going to develop more plugins.
Pentaho Data Integration + Integration Server - Enterprise Edition - "cheap enough" for almost any medium-sized project
The Enterprise Edition has everything the free edition has, plus more plugins (a JMS producer, for example), a version control system, Instaview and so on.
Besides that, it has its own server, so scheduling is software-based (not OS-based), and you get logging, better management and, most importantly, support!
Informatica or Microsoft SSIS - expensive and brilliant
I won't waste many words on these tools. Informatica is primarily an ETL-oriented company, and using Informatica at a high level requires a deep understanding of DB/DWH design, ETL processes, PL/SQL, dimensional modelling etc.
SSIS is primarily built for SQL Server, so I don't see much call for it unless at least one of your source or target databases (the DWH) runs on SQL Server.
Conclusion
This barely scratches the surface of the many tools the market provides. Someone else would probably not even mention these tools. Please look at one of the tool lists.
Almost every BI system has its own ETL tool. Maybe the good choice would be to use them together; that way you can get the most out of both.
Note: a good ETL project manager or ETL developer can stretch a tool's advantages to the level that better/more expensive ones offer!

How could Database Projects and TFS help our dev team?

I have just started working with a new company in a very small IT department, and an even smaller dev team (just 2 developers including me). We mainly develop in house web applications for the company.
Now my background is in desktop applications, so this job comes with a slight learning curve for me, having never developed ASP.NET web applications before. Currently we do not use TFS; however, I have made the suggestion and it is something we are going to adopt soon.
I am also considering recommending that we move our SQL databases into database projects. Currently we do nightly backups to protect the data, but all the updates are manual - we connect to the database and execute queries, etc.
I'm not a DBA, but in my last job we were in the process of migrating our databases to database projects and the DBAs seemed to love the idea. What would the benefits and potential downfalls of this be? Would it aid us in updating databases in our live environment after development has been done? Obviously we don't want to lose any data, just update tables / stored procs etc.
As a side question I have very limited knowledge of TFS, and although we are going to be using it to handle our version control is it possible to use TFS to update our live websites automatically once development has finished?
Sorry if this is quite a broad question, I am attempting to research this myself but I would like to hear from people who actually use the products and do these things.
Thanks
About database projects: I, and several DBAs I know, have had mixed experiences with them. I'm not sure they are exactly where they should be at this time, but it may simply be a function of how I work. The deployment model is... difficult, and can result in some unexpected behavior. If you go this route, test, test, test to make sure you understand exactly what's happening.
If you are just trying to get version control for the database you might consider SQL Source Control from Red Gate. It looks pretty nice and hooks into TFS. I used one of the early versions (beta and 1.0) for a while and was very happy with it. Now that I think about it, I'm not sure why I don't have it here... ;)
As far as deploying out of TFS, you can absolutely do this. We have a build server set up so that whenever code is checked in, the build server automatically spins up to compile and deploy it out to one of our testing sites. Look here for a primer to get you started. This does require some configuration on your web server to support properly, and the documentation is spotty at best.
Once you are happy with the test area, you can hook something into changes to the Build Quality so that it pushes the code to a staging or even production server... Or simply have a build setup to recompile and deploy out there. Although I don't recommend doing production pushes this way. Not because of a technical issue, but rather a timing one. It's usually much faster to just copy / paste from one location to production when necessary thereby limiting downtime.

Does CI need a CI server?

Is a CI server required for continuous integration?
In order to facilitate continuous integration you need to automate the build, distribution, and deploy processes. Each of these steps is possible without any specialized CI server. Coordinating these activities can be done through file notifications and other low-level mechanisms; however, a database-driven backend (a CI server) coordinating these steps greatly enhances the reliability, scalability, and maintainability of your systems.
You don't need a dedicated server, but a build machine of some kind is invaluable; otherwise there is no single central place where the code is always being built and tested. Although you can mimic this effect using a developer machine, there's the risk of overlap with the code that is being changed on that machine.
BTW I use Hudson, which is pretty lightweight - it doesn't need much to get going.
It's important to use a dedicated machine so that you get independent verification, without corruption.
For small projects, it can be a pretty basic machine, so don't let hardware costs get you down. You probably have an old machine in a closet that is good enough.
You can also avoid dedicated hardware by using a virtual machine. Best bet is to find a server that is doing something else but is underloaded, and put the VM on it.
Before I ever heard the term "continuous integration" (this was back in 2002 or 2003) I wrote a nightly build script that connected to CVS, grabbed a clean copy of the main project and the five smaller sub-projects, built all the JARs via Ant, then built and redeployed a WAR file via a second Ant script that used the Tomcat Ant tasks.
It ran via cron at 7pm and sent email with a bunch of attached output files. We used it for the entire 7 months of the project and it stayed in use for the next 20 months of maintenance and improvements.
It worked fine, but I would prefer Hudson over bash scripts, cron and Ant.
A separate machine is really necessary if you have more than one developer on the project.
If you're using the .NET technology stack here's some pointers:
CruiseControl.Net is fairly lightweight. That's what we use. You could probably run it on your development machine without too much trouble.
You don't need to install or run Visual Studio unless you have Visual Studio Setup Projects. Instead, you can use a free command line build tool called MSBuild.

Are there better clients for viewing System Monitor logs?

Does anyone know of a better GUI client for displaying Windows System Monitor log files? (System Monitor is sometimes called Performance Monitor.) I'm trying to track a long-term memory leak in a C# application running on Windows XP or 2K3 by comparing memory usages to run logs.
Specifically I want a client that will allow me to see the following (because System Monitor is unable or difficult):
Specify exact date time ranges for viewing data (or at least finer granularity than hours)
Show time intervals along the horizontal axis
Show max, min, average for the time range
Somewhere show the interval on which source data was captured (1 sec, 5 min, etc.)
(If no such thing exists I'm willing to hear recommendations for better long term performance/memory capturing tools.)
Edit: I've done Google searches and haven't found anything except tutorials on how to create System Monitor logs.
See this question.
The PAL tool does a nice job of creating an HTML report with charts and graphs. By creating your own Threshold file you can control what goes into the report.
While I accepted Patrick Cuff's answer, for my needs I found a better way to graph the data: Excel
It still doesn't provide everything I need, but it is a marked improvement over the System Monitor GUI. I use the relog command-line tool to convert the log into a CSV, and then import the CSV into Excel. Excel does not automatically handle the third item (max, min and average for the time range), but I can add new columns to graph, and it does give me better control over which data I'm displaying.
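And if all you want are the max/min/average figures for an exact time range, the relog CSV is simple enough to summarise without charting at all. A rough sketch - the column index, timestamp format and file name are assumptions about your particular log:

```java
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.DoubleSummaryStatistics;

// Summarise one counter column from a relog-generated CSV over an exact time range.
public class CounterSummary {
    public static void main(String[] args) throws Exception {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss.SSS");
        LocalDateTime from = LocalDateTime.parse("01/05/2015 09:00:00.000", fmt);
        LocalDateTime to   = LocalDateTime.parse("01/05/2015 17:30:00.000", fmt);
        int counterColumn = 2; // e.g. the "\Process(myapp)\Private Bytes" column

        DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("perflog.csv"))) {
            reader.readLine(); // skip the header row
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cells = line.replace("\"", "").split(",");
                String value = cells[counterColumn].trim();
                if (value.isEmpty()) {
                    continue; // counters sometimes have gaps
                }
                LocalDateTime ts = LocalDateTime.parse(cells[0], fmt);
                if (!ts.isBefore(from) && !ts.isAfter(to)) {
                    stats.accept(Double.parseDouble(value));
                }
            }
        }
        System.out.printf("min=%.0f max=%.0f avg=%.0f samples=%d%n",
                stats.getMin(), stats.getMax(), stats.getAverage(), stats.getCount());
    }
}
```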
One of the tricks that I have used in the past is to use Performance/System Monitor to log this data to a SQL database. SQL Server Express can work great for this. Then you can generate reports using Reporting Services, or, for the more adventurous, do some cube analysis with Analysis Services. So while this does not solve the UI problem, it does allow you to make your own UI. When I did this previously I just used a simple Reporting Services graph.
SCOM 2007 with Reporting Services actually does a pretty good job of this. If not, the SQLH2 tool is almost as good and it's free. You will probably have to customize the reports yourself though.
