I have some doubts about the "partition-wise join" feature in Oracle:
https://www.youtube.com/watch?v=p9BMyQun84Y
Oracle partition-wise join over multiple partitions
I don't understand this functionality. I have watched videos and I still don't understand how to use it.
I understand partitioning, but I don't understand this. I need a simple example.
Thank you.
If you look at the documentation e.g. here, there's a good detailed and lengthy explanation.
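As a rough illustration of the idea (not Oracle's implementation): a full partition-wise join applies when both tables are partitioned the same way on the join key. Rows with a given key can then only match rows in the partition with the same number, so one big join splits into independent partition-to-partition joins that can run in parallel. The sketch below mimics that in plain Python; the table contents, the partition count and all helper names are made up for the example.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, identical for both tables


def hash_partition(rows, key, num_partitions=NUM_PARTITIONS):
    """Split rows into partitions by the join key (integer keys assumed)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key] % num_partitions].append(row)
    return partitions


def join_one_pair(left_rows, right_rows, key):
    """Ordinary hash join restricted to one pair of matching partitions."""
    lookup = defaultdict(list)
    for left in left_rows:
        lookup[left[key]].append(left)
    return [{**left, **right} for right in right_rows for left in lookup[right[key]]]


def partition_wise_join(left, right, key):
    left_parts = hash_partition(left, key)
    right_parts = hash_partition(right, key)
    result = []
    # Partition i of one table is only ever joined to partition i of the other.
    # Each iteration is independent, so a database could run them in parallel.
    for i in range(NUM_PARTITIONS):
        result.extend(join_one_pair(left_parts[i], right_parts[i], key))
    return result


customers = [{"cust_id": i, "name": f"customer {i}"} for i in range(1, 7)]
sales = [
    {"cust_id": 1, "amount": 10},
    {"cust_id": 1, "amount": 25},
    {"cust_id": 4, "amount": 7},
    {"cust_id": 6, "amount": 99},
]

joined = partition_wise_join(customers, sales, "cust_id")
for row in joined:
    print(row)
```

The result is the same as a plain join of the two tables; what changes is that no partition ever needs to see rows from a non-matching partition of the other table.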
I'm planning a repository cleanup and I would like to know if there is a way to find out in OBIEE (12.2.1.3.0) which tables are not being used at all.
That alone would solve my problem right away. It would be great to have access to a list of tables and fields and which analyses, agents, etc. are using them.
Thank you very much!
You can get all of that with OBIEE's built-in capabilities, but it will take time and effort. Best look at the lineage solution by this guy: https://datalysis.ch/
My wide data look like this:
What I am trying to accomplish is long:
I have many Score_X's and each score has many items. So the less hard-coding (e.g. Convert data from wide format to long format in SQL) the better.
I have thought about a few ways to do this; unfortunately Hive does not have many features that other SQL implementations have. So first I would appreciate a solution to my problem, and secondly, if anyone knows easy ways to emulate these things in Hive please do share with me.
The pivot function, which Hive doesn't have.
I tried to apply Joe Stefanelli's answer in Selecting all columns that start with XXX using a wildcard?. Hive does not have INFORMATION_SCHEMA either. I was told (also on Stack Overflow) that I could get table metadata by first installing MySQL and then taking a detour through MySQL; I don't feel like spending that much effort on a simple task like reshaping a table...
Then I think I can combine the values of Score_A_1, Score_A_2 and Score_A_3 into one Score_A array and then do a LATERAL VIEW EXPLODE like in myui's answer in How to transpose/pivot data in hive?. But I Googled around and could not find a tutorial to do that.
Thanks. Your help is greatly appreciated.
Update:
So the array function will create an array column from multiple columns. Now I am doing the LATERAL VIEW EXPLODE; through hard-coding (i.e., non-dynamic query) I am getting what I want. However it is difficult to believe that there is not a simpler way to perform a data management task as basic as reshaping. Am I missing something fundamental about Hive?
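One way to cut down on the hard-coding, sketched here as an assumption rather than a Hive feature, is to generate the query text from a list of column names in a small script and submit the result to Hive. The table name `wide_scores` and the `id` column are hypothetical, and the generated SQL assumes a Hive version that provides `posexplode` (which returns the array position along with the value); with plain `explode` the item number would be lost.

```python
def build_unpivot_query(table, id_column, score_columns):
    """Return Hive SQL that turns the given wide columns into one row per item."""
    column_list = ", ".join(score_columns)
    return (
        f"SELECT {id_column}, item_pos + 1 AS item, score\n"
        f"FROM {table}\n"
        f"LATERAL VIEW posexplode(array({column_list})) exploded AS item_pos, score"
    )


# Column names taken from the question; table and id column are made up.
query = build_unpivot_query(
    "wide_scores", "id", ["Score_A_1", "Score_A_2", "Score_A_3"]
)
print(query)
```

The column list still has to come from somewhere, but it now lives in one place and can be built programmatically (for example with a loop over score names and item numbers) instead of being typed into every query.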
I have a SQL query:
ANALYSE TABLE CUST_STAT COMPUTE STATISTICS;
It works well in Oracle, but I am now switching to PostgreSQL, so I changed the SQL to:
ANALYSE CUST_STAT COMPUTE STATISTICS;
I have already read the manual section on partitioning, and I know the TABLE keyword is not needed in PostgreSQL, but I am still getting an error for PARTITION:
ANALYZE CUST_STAT PARTITION CUST_STAT_P201307 ;
Can anyone help?
There is no COMPUTE STATISTICS sub-command for ANALYZE in PostgreSQL.
ANALYZE tablename;
per the manual on ANALYZE.
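There is also no PARTITION keyword for ANALYZE. In PostgreSQL each partition is a table in its own right, so you analyze a partition by naming it directly, e.g. ANALYZE CUST_STAT_P201307;. PostgreSQL's inheritance-based partitioning is limited and largely manual. See the user manual section on partitioning.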
The PostgreSQL manual is quite detailed and pretty good. I suggest reading it rather than trying to apply Oracle experience directly to Pg. They're not the same DB.
On partitioning, this tutorial is a bit old and is targeted at EnterpriseDB, but I think it uses only standard features, and it might help introduce the concepts. I haven't reviewed it in detail.
Another simple step-by-step example is on this blog entry.
Examples are no substitute for understanding though, and this is an area you need to understand, not just follow recipes for. If you don't have time for that I strongly recommend seeking someone who does to help you with your implementation in-depth.
One thing I always wonder while writing a query is whether I am writing the most optimized query or not. I know certain things like:
1) using SELECT field1, field2 instead of SELECT *
2) Adding proper indexes to the tables
but I am sure there are more things to keep in mind when writing queries, since most databases only grow, and an optimal query will help with execution time. Can you share some tips and tricks on writing queries?
Testing is the best way to measure performance. Monitor your queries on the live database and make use of things like the slow query log.
I would also recommend enabling the query cache, which will give most typical usage situations a massive boost.
Use proper data types for your fields
Use back-tick character (`) for reserved keywords
When dealing with multiple tables, try using joins
Resource:
See:
20 SQL Tips
As well as the Do's and Dont's, you may find the Hidden Features of MySQL useful.
As a matter of fact, no "tips" can help you.
Database design requires deep knowledge, not tips.
These "don'ts" always carry different weight. Most such lists include the least important things and fail to mention the important ones. Your list, for example, would look like this on a culinary forum:
Always use a knife with a black handle
To prepare a good dish you need to choose proper ingredients.
The first one sounds impressive but never helps in the real world.
The second one is right, but it must be backed by deep knowledge to be applied correctly.
So it has to be a book, not tips. The ones by Paul DuBois are among those recommended.
Always include the fields below in each table:
tablename_id (auto increment, unsigned zerofill)
created_by (timestamp)
tablerow_status (enum('t','f'), set to 't' by default)
Always add a comment when you create a field in MySQL (it helps when you search in phpMyAdmin).
Always take care of the normalization forms.
If a field will always be positive, make it unsigned.
Use the decimal data type instead of float in some cases (like a discount: it should be at most 99.99%, so use decimal(5,2)).
Use the date and time data types wherever needed; don't use timestamp everywhere.
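Correlated subqueries often perform very badly, because the inner query may be re-evaluated for every row of the outer query. They are often not well understood and still end up in production. They can often be fixed by using derived tables and a join instead.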
http://en.wikipedia.org/wiki/Correlated_subquery
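To make the rewrite concrete, here is a small sketch that runs both forms against an in-memory SQLite database (used only so the example is self-contained; the `customers` and `orders` tables are made up). It shows that the two queries return the same rows. It does not measure performance, which depends on the database and its optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 15.0), (3, 2, 7.5);
""")

# Correlated subquery: the inner SELECT refers to c.id from the outer query,
# so it is logically evaluated once per customer row.
correlated = conn.execute("""
    SELECT c.name,
           (SELECT SUM(o.total) FROM orders o WHERE o.customer_id = c.id) AS spent
    FROM customers c
    ORDER BY c.id
""").fetchall()

# Derived table + join: the totals are aggregated once, then joined.
derived = conn.execute("""
    SELECT c.name, t.spent
    FROM customers c
    LEFT JOIN (
        SELECT customer_id, SUM(total) AS spent
        FROM orders
        GROUP BY customer_id
    ) t ON t.customer_id = c.id
    ORDER BY c.id
""").fetchall()

print(correlated)
print(correlated == derived)
```

The LEFT JOIN matters: an inner join would drop customers with no orders, whereas the correlated form keeps them with a NULL total.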
One more thing I found today concerns the difference between COUNT(*) and COUNT(col):
Using COUNT(*) is faster than COUNT(col).
MyISAM caches the number of rows in a table, so COUNT(*) without a WHERE clause is fast; InnoDB does not cache the row count, so the same query may be slower.
When you do use COUNT(col), it is better to count a NOT NULL column than a column where NULL is allowed, for both MyISAM and InnoDB.
More details here
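Speed aside, the two forms do not always return the same number: COUNT(*) counts rows, while COUNT(col) counts only the rows where col is not NULL. A minimal sketch using an in-memory SQLite database (chosen only so the example runs on its own; the `users` table is made up, and the MyISAM/InnoDB timing behaviour is not demonstrated here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    INSERT INTO users (email) VALUES ('a@example.com'), (NULL), ('c@example.com');
""")

# COUNT(*) counts every row; COUNT(email) skips the row whose email is NULL.
count_star, count_email = conn.execute(
    "SELECT COUNT(*), COUNT(email) FROM users"
).fetchone()

print(count_star, count_email)  # 3 2
```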
I'm working on a project with a friend that will utilize HBase to store its data. Are there any good query examples? I seem to be writing a ton of Java code to iterate through lists of RowResults when, in SQL land, I could write a simple query. Am I missing something? Or is HBase missing something?
I think you, like many of us, are making the mistake of treating bigtable and HBase like just another RDBMS when it's actually a column-oriented storage model meant for efficiently storing and retrieving large sets of sparse data. This means storing, ideally, many-to-one relationships within a single row, for example. Your queries should return very few rows but contain (potentially) many datapoints.
Perhaps if you told us more about what you were trying to store, we could help you design your schema to match the bigtable/HBase way of doing things.
For a good rundown of what HBase does differently than a "traditional" RDBMS, check out this awesome article: Matching Impedance: When to use HBase by Bryan Duxbury.
If you want to access HBase using a query language and a JDBC driver it is possible. Paul Ambrose has released a library called HBQL at hbql.com that will help you do this. I've used it for a couple of projects and it works well. You obviously won't have access to full SQL, but it does make it a little easier to use.
I looked at Hadoop and Hbase and as Sean said, I soon realised it didn't give me what I actually wanted, which was a clustered JDBC compliant database.
I think you could be better off using something like C-JDBC or HA-JDBC, which seem more like what I was after. (Personally, I haven't got farther with either of these than reading the documentation, so I can't tell which of them is any good, if any.)
I'd recommend taking a look at the Apache Hive project, which is similar to HBase (in the sense that it's a distributed database) and implements a SQL-esque language.
Thanks for the reply, Sean, and sorry for my late response. I often make the mistake of treating HBase like an RDBMS. So often, in fact, that I've had to rewrite code because of it! It's such a hard thing to unlearn.
Right now we have only 4 tables. Which, in this case, is very few considering my background. I was just hoping to use some RDBMS functionality while mostly sticking to the column-oriented storage model.
Glad to hear you guys are using HBase! I'm not an expert by any stretch of the imagination, but here are a couple of things that might help.
HBase is based on / inspired by BigTable, which happens to be exposed by AppEngine as their db api, so browsing their docs should help a great deal if you're working on a webapp.
If you're not working on a webapp, the kind of iterating you're describing is usually handled via map/reduce (don't emit the values you don't want). Skipping over values using iterators virtually guarantees your application will have bottlenecks with HBase-sized data sets. If you find you're still thinking in SQL, check out Cloudera's Pig tutorial and Hive tutorial.
Basically the whole HBase/SQL mental difference (for non-webapps) boils down to "Send the computation to the data, don't send the data to the computation" -- if you keep that in mind while you're coding you'll do fine :-)
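To show the shape of "don't emit the values you don't want", here is a toy map/reduce in plain Python. It is only a sketch of the pattern, not the HBase or Hadoop API: the row layout, `map_row`, `reduce_values` and `run_map_reduce` are all made up, and in a real job the map step would run on the nodes that hold the data.

```python
from collections import defaultdict

# Hypothetical rows, standing in for what a table scan would hand to the mapper.
rows = [
    {"row_key": "user1", "event": "click", "count": 3},
    {"row_key": "user1", "event": "view", "count": 10},
    {"row_key": "user2", "event": "click", "count": 1},
    {"row_key": "user1", "event": "click", "count": 2},
    {"row_key": "user3", "event": "view", "count": 5},
]


def map_row(row):
    """Emit (key, value) only for rows we care about; everything else is dropped here."""
    if row["event"] == "click":
        yield row["row_key"], row["count"]


def reduce_values(key, values):
    """Combine all values emitted for one key."""
    return key, sum(values)


def run_map_reduce(rows):
    grouped = defaultdict(list)
    for row in rows:
        for key, value in map_row(row):
            grouped[key].append(value)
    return dict(reduce_values(key, values) for key, values in grouped.items())


totals = run_map_reduce(rows)
print(totals)  # {'user1': 5, 'user2': 1}
```

The filtering happens in the map step, next to the data, so unwanted rows never travel to the client; that is the difference from pulling every row back and skipping over it in an iterator.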
Regards,
David