We currently use Datastage ETL to export a CSV/text file with data from 15 tables (3 different schemas) on a daily basis.
I am wondering if there is a simpler way to accomplish this without using an ETL tool. I tried Scriptella. It looks simple and fast, but again, it is an ETL. Please suggest alternatives.
We use Python. Every programming language -- every single one ever invented -- is an alternative to an ETL.
You never need an ETL.
The questions are these:
Which is cheaper to build? Custom software or a configuration of an ETL?
Which is cheaper to maintain and operate?
Which is easier to adapt to changing requirements?
Why not use a free and easy-to-use ETL tool such as expressor Studio? You can download it at http://www.expressorstudio.com.
My 2 cents.
Datastage is an awful tool, and expensive to license.
SSIS is much simpler, or cloverETL is good.
ETL tool vs code is a good question.
ETL tools often have better performance because they can queue data up ready to be used, whereas hand-written code is generally going to process it one piece at a time; Datastage can also do this in parallel (but again, I think it blows). Plus, ETL tools can pull data from multiple heterogeneous sources, which you can't do (easily) with code.
However, if the data transformations are all to be done with data on the same server, I generally end up doing as much as possible in SQL/T-SQL (or PL/SQL), as it is just tonnes easier to debug and maintain. Primary keys and foreign keys are your friends, and any missed lookups can be caught by checking row counts afterwards to make sure data integrity is in order.
You do not need an ETL tool for that purpose. You can perform all the tasks using Python, right from extracting data from CSV/XML/text files, through transforming it (identifying data types, handling null values), to loading it into tables.
https://towardsdatascience.com/python-etl-tools-best-8-options-5ef731e70b49
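For illustration, here is a minimal sketch of that extract/transform/load pattern in plain Python, using sqlite3 as a stand-in target; the file, table, and column names are made up:

```python
import csv
import sqlite3

# Extract: read rows from a CSV export (file and columns are hypothetical).
def extract(path):
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield row

# Transform: normalize types and handle null/blank values.
def transform(row):
    return (
        int(row["customer_id"]),
        row["name"].strip() or None,                       # empty string -> NULL
        float(row["balance"]) if row["balance"] else 0.0,  # default missing balance
    )

# Load: insert the transformed rows into a target table.
def load(rows, db_path="target.db"):
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (customer_id INTEGER, name TEXT, balance REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()
    conn.close()

if __name__ == "__main__":
    load(transform(r) for r in extract("daily_export.csv"))
```

Swap sqlite3 for whatever database driver you actually use; the overall shape stays the same.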
ETL can definitely be performed without the help of ETL tools.
For example, we can develop Python scripts, or there are open-source options such as Drift to work with.
I think it's better to use a cheap ETL tool for your task, because ETL tools generally work better than hand-rolled code and make the task easier.
ETL Tool vs. Manual Script
“According to the IT research firm Forrester, the low-code development platform market will reach a value of $21.2 billion by 2022, growing at an annual rate of 40 percent. What’s more, 45 percent of developers have already used a low-code platform or expect to do so in the near future.”
I have a use case. I want to integrate/transform data from different/disparate sources without storing it. The data sources are databases (Oracle, DB2, etc.), web services (REST/SOAP), flat files (CSV, XML, JSON), MQ dumps, and mainframe systems. I want to pull data from these sources, do some kind of intelligent transformation and integration, and provide the result to our customers. It looks like a typical ETL scenario, but my situation is different: I am not allowed to store the data given by the disparate sources. That means, as a simple example, I pull data from Oracle, a SOAP service, and a REST service, and do all my intelligent transformations and integrations on the fly.
I browsed through Google and technical material but could not find a convincing solution to my problem.
I would appreciate any valuable insight on this problem, along with suggestions and possible approaches.
Note: the data from these sources can sometimes be really huge.
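For concreteness, this is roughly the shape of what I mean, as a minimal Python sketch; the endpoint, file, and field names are invented, and nothing is persisted, only streamed and merged in memory:

```python
import csv
import requests  # REST source; the endpoint below is hypothetical

def rest_orders():
    # Pull records from a REST service page by page without storing them anywhere.
    page = 1
    while True:
        resp = requests.get("https://example.com/api/orders", params={"page": page})
        batch = resp.json()
        if not batch:
            break
        for record in batch:
            yield record
        page += 1

def csv_customers(path):
    # Stream rows from a flat-file source.
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def merge():
    # Join on a shared key and enrich on the fly, keeping only a small
    # lookup table in memory rather than the full data set.
    regions = {row["customer_id"]: row["region"] for row in csv_customers("customers.csv")}
    for order in rest_orders():
        yield {
            "order_id": order["id"],
            "amount": float(order["amount"]),
            "region": regions.get(str(order["customer_id"]), "UNKNOWN"),
        }

for enriched in merge():
    print(enriched)  # or hand each record to the downstream consumer
```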
Thanks in Advance
Take a look at http://teiid.org
That is exactly what it does, and it is open source.
Talend Open Studio is a great solution as well. I'm using it, and it makes building the ETL workflow easy.
https://www.talend.com/products/data-integration/data-integration-manuals-release-notes/
You can see a lot of help videos: https://www.youtube.com/results?search_query=talend+studio
This may be a question for Survey Monkey, but I felt that someone here may have encountered something like this in past experience. Is there a way to work with the Survey Monkey (SM) API to add the information from a survey straight into a database of my own? I realize that I can export the information to output files, but I was wondering if there was a way to directly access the information from the SM database. I feel like this might cause some privacy concerns for SM. Has anyone attempted this, or would my best option be to create my own surveys without a third-party website?
I had a similar issue and here's my solution.
I was doing health-related surveys which contain HIPAA-protected personal health info. Zapier is NOT HIPAA-compliant, so the "zap the results over to Google Drive" solution didn't work.
So I wanted a quick n dirty way to grab SM survey data and begin to design a data structure to analyse and store this data. I figured that I would start with <1000 results, sort it out, then build out a bigger/fancier structure as needed.
I just downloaded CSVs of the SM individual responses, munged the downloaded CSV files to make a Python CSV reader happy, then wrote a Python 3.5 script to grab the survey data and spit it out into a couple of output CSV files designed for different analytic purposes.
It was really quick and easy to alter the Python script to deliver different subsets of data to different output files, and really quick and easy to see if these output (CSV or XLS) files really told me what I wanted to know.
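As a rough illustration of that kind of script (the input file and column names here are made up, not the real SM export layout):

```python
import csv

# Read the cleaned-up SurveyMonkey individual-responses export.
with open("sm_responses_clean.csv", newline="") as f:
    responses = list(csv.DictReader(f))

# Split the data into subsets aimed at different analyses,
# e.g. demographics in one file and question scores in another.
with open("demographics.csv", "w", newline="") as demo, \
     open("scores.csv", "w", newline="") as scores:
    demo_writer = csv.DictWriter(demo, fieldnames=["respondent_id", "age", "zip"])
    score_writer = csv.DictWriter(scores, fieldnames=["respondent_id", "q1", "q2", "q3"])
    demo_writer.writeheader()
    score_writer.writeheader()
    for row in responses:
        demo_writer.writerow({k: row.get(k, "") for k in ["respondent_id", "age", "zip"]})
        score_writer.writerow({k: row.get(k, "") for k in ["respondent_id", "q1", "q2", "q3"]})
```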
This is a really quick and easy way to start analysing right away without spending too much time on procedural overhead. You can alter CSV (or XLS) tables really quickly and easily, so you can mix and match data and derivative data as much as you want. A wise person once told me "don't think, do." So the more you analyse on small runs of data, the better your final Big Buildout In The Sky will look.
Yeah, you can spend a lot of time writing an API and setting up a database, but if you are not completely sure what you want out of the SM data, start small. Hope this helps.
Are there ETL conversion tools that can convert a single job or multiple jobs in one go, without taking much time?
AnalytixDS claims to do this with their Litespeed Conversion product.
http://analytixds.com/litespeed-conversion/
A long list of the tools they claim to convert is about half way down the page.
I don't know anything more about it than I have read on their web site, but I've not seen any other product make this claim.
I am not a DBA. However, I work on a web application that lives entirely in an Oracle database (Yes, it uses PL/SQL procedures to write HTML to clobs and then vomits the clob at your browser. No, it wasn't my idea. Yes, I'll wait while you go cry.).
We're having some performance issues, and I've been assigned to find some bottlenecks and remove them. How do I go about measuring Oracle performance and finding these bottlenecks? Our unhelpful sysadmin says that Grid Control wasn't helpful, and that he had to rely on "his experience" and queries against the data dictionary and "v$" views.
I'd like to run some tests against my local Oracle instance and see if I can replicate the problems he found so I can make sure my changes are actually improving things. Could someone please point me in the direction of learning how to do this?
Not too surprisingly, there are entire books written on this topic.
Really what you need to do is divide and conquer.
The first thing is to just ask yourself some standard common-sense questions. For example: has performance slowly degraded, or was there a big drop in performance recently?
After the obvious, a good starting point is to narrow down where to spend your time; the top queries are a decent start, since they will point you at particular queries which run for a long time.
If you know specifically which screens in your front-end are slow and which stored procedures go with them, I'd add some logging: simple DBMS_OUTPUT.put_line calls with some wall-clock information at key points. Then I'd run those procedures interactively in SQL Navigator to see which part of the stored procedure is going slow.
Once you start narrowing it down, you can evaluate why a particular query is slow. EXPLAIN PLAN will be your best friend to start with.
It can be overwhelming to analyze database performance with Grid Control, and I would suggest starting with the simpler AWR report - you can find the scripts to generate them in $ORACLE_HOME/rdbms/admin on the db host. This report will rank the SQL seen in the database by various categories (e.g. CPU time, disk I/O, elapsed time) and give you an idea where the bottlenecks are on the database side.
One advantage of the AWR report is that it is a SQL*Plus script and can be run from any client - it will spool HTML or text files to your client.
edit:
There's a package called DBMS_PROFILER that lets you do what you want, I think. I found out my IDE will profile PL/SQL code, as I would guess many other IDEs do; they probably use this package.
http://www.dba-oracle.com/t_dbms_profiler.htm
http://www.databasejournal.com/features/oracle/article.php/2197231/Oracles-DBMSPROFILER-PLSQL-Performance-Tuning.htm
edit 2:
I just tried the Profiler out in PL/SQL Developer. It creates a report on the total time and occurrences of snippets of code during runtime and gives code location as unit name and line number.
original:
I'm in the same boat as you, as far as the crazy PL/SQL generated pages go.
I work in a small office with no programmer particularly versed in the advanced features of Oracle. We don't have any established methods of measuring and improving performance, but my best guess is to try out different PL/SQL IDEs.
I use PL/SQL Developer by Allround Automations. It has testing functionality that lets you debug your PL/SQL code, and it may have some benchmarking features I haven't used yet.
Hope you find a better answer. I'd like to know too. :)
"I work on a web application that
lives entirely in an Oracle database
(Yes, it uses PL/SQL procedures to
write HTML to clobs and then vomits
the clob at your browser"
Is it the Apex product? That's the web application environment now included as a standard part of the Oracle database (although technically it doesn't spit out CLOBs).
If so, there is a whole bunch of instrumentation already built into the product/environment (e.g. it keeps a rolling two-week history of activity).
I have a bunch of perfmon files that have captured information over a period of time. What's the best tool to crunch this information? Ideally I'd like to be able to see average stats per hour for the object counters that have been monitored.
From my experience, even just Excel makes a pretty good tool for quickly whipping up graphs of perfmon data if you relog the data to CSV or TSV. You can just plot a rolling average and see the progression. Excel isn't fancy, but if you don't have more than 30-40 megs of data it can do a pretty quick job. I've found that Excel 2007 tends to get unstable when using tables and over 50 megs of data: at one point an 'undo' caused it to consume 100% CPU and 1.3 GB of RAM.
Addendum: relog isn't the best-known tool, but it is very useful. I don't know of any GUI front ends, so you just have to run it from the command line. The two most common cases I've used it for are:
Removing unnecessary counters from logs that a different sysadmin gave me, e.g. the entire Process and Memory objects.
Converting the binary perfmon logs to .csv or .tsv files.
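Once the data has been relogged to CSV, a short pandas sketch like the following can produce the per-hour averages asked about above (assuming pandas is installed; the file name is made up):

```python
import pandas as pd

# Load the relogged perfmon CSV; the first column is the PDH timestamp.
df = pd.read_csv("perfmon.csv", na_values=[" "])
timestamp_col = df.columns[0]
df[timestamp_col] = pd.to_datetime(df[timestamp_col])
df = df.set_index(timestamp_col)

# Counter columns should be numeric; coerce anything unparseable to NaN.
df = df.apply(pd.to_numeric, errors="coerce")

# Average each counter per hour and write the result out for review.
hourly = df.resample("1H").mean()
hourly.to_csv("perfmon_hourly_averages.csv")
print(hourly.head())
```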
Perhaps look into using LogParser.
It depends on how the info was logged (Perfmon doesn't lack flexibility)
If they're CSVs, you can even use the ODBC Text driver and run queries against them!
(performance would be 'intriguing')
And here's the obligatory link to a CodingHorror article on the topic ;-)
This is a free tool provided on CodePlex. It provides charting capabilities and built-in thresholds for different server roles, which can also be modified, and it generates HTML reports.
http://www.codeplex.com/PAL/Release/ProjectReleases.aspx?ReleaseId=21261
Take a look at SmartMon (www.perfmonanalysis.com). It analyzes Perfmon data in CSV and SQL Server databases.