How to increase sort_area_size - performance

How do I set sort_area_size in Oracle 10g, and what size should it be, given that I have more than 2.2 million rows in a single table? Please also tell me the suggested size for SORT_AREA_RETAINED_SIZE.
My queries are very slow; most of them take more than an hour to complete.
Please suggest how I can optimize my queries and tune the Oracle 10g database.
Thanks
Updated with the query.
The query is:
SELECT A.TITLE,C.TOWN_VILL U_R,F.CODE TOWN_CODE,F.CITY_TOWN_MAKE,A.FRM,A.PRD_CODE,A.BR_CODE,A.SIZE_CODE ,B.PRICES,
A.PROJECT_YY,A.PROJECT_MM,d.province ,D.BR_CODE BRANCH_CODE,D.STRATUM,L.LSM_GRP LSM,
SUM(GET_FRAC_FACTOR_ALL_PR_NEW(A.FRM,A.PRD_CODE,A.BR_CODE,A.SIZE_CODE,A.PROJECT_YY,A.PROJECT_MM,A.FRAC_CODE ,B.PRICES,A.QTY_USED,A.VERIF_CODE, A.PACKING_CODE, J.TYPE ,'R') )
* MAX(D.UNIVERSE) / MAX(E.SAMPLE) /1000000 MARKET , D.UNIVERSE ,E.SAMPLE
FROM A2_FOR_CPMARKETS A,
BRAND J,
PRICES B,CP_SAMPLE_ALL_MONTHS C ,
CP_LSM L,
HOUSEHOLD_GL D,
SAMPLE_CP_ALL_MONTHS E ,
City_Town_ALL F
WHERE A.PRD_CODE = B.PRD_CODE
AND A.BR_CODE = B.BR_CODE
AND DECODE(A.SIZE_CODE,NULL,'L',A.SIZE_CODE) = B.SIZE_CODE -- for unbranded loose
AND DECODE(B.VAR_CODE,'X','X',A.VAR_CODE) = B.VAR_CODE
AND DECODE(B.COL_CODE,'X','X',A.COL_CODE) = B.COL_CODE
AND DECODE(B.PACK_CODE,'X','X',A.PACKING_CODE) = B.PACK_CODE
AND A.project_yy||A.project_MM BETWEEN B.START_DATE AND B.END_DATE
AND A.PRD_CODE=J.PRD_CODE
AND A.BR_CODE=J.BR_CODE
AND A.FRM = C.FRM
AND A.PROJECT_YY=L.YEAR
AND A.frm=L.FORM_NO
AND C.TOWN_VILL= D.U_R
AND C.CLASS = D.CLASS
AND D.TOWN=F.GRP
AND D.TOWN = E.TOWN_CODE
AND A.PROJECT_YY = E.PROJECT_YY
AND A.PROJECT_MM = E.PROJECT_MM
AND A.PROJECT_YY = C.PROJECT_YY
AND A.PROJECT_MM = C.PROJECT_MM
-- FOR HOUSEHOLD_GL
AND A.PROJECT_YY = D.YEAR
AND A.PROJECT_MM = D.MONTH
-- END HOUSEHOLD_GL
AND C.TOWN_VILL = E.TOWN_VILL
AND C.CLASS = E.CLASS
AND C.TOWN_VILL = F.TOWN_VILL
AND C.TOWN_CODE=F.CODE
AND (DECODE(e.PROJECT_YY,'1997','1','1998','1','1999','1','2000','1','2001','1','2002','1','2') = F.TYP )
GROUP BY A.TITLE,C.TOWN_VILL,F.CODE ,F.CITY_TOWN_MAKE,A.FRM,A.PRD_CODE,A.BR_CODE,A.SIZE_CODE ,B.PRICES,
A.PROJECT_YY,A.PROJECT_MM,d.province,D.BR_CODE ,D.STRATUM,L.LSM_GRP ,
UNIVERSE ,E.SAMPLE
![alt text][1]
[1]: http://C:\Documents and Settings\Hussain\My Documents\My Pictures\explain plan.jpg

Check the Oracle documentation for SORT_AREA_SIZE. You can use the command alter session set sort_area_size=10000 to change it for the current session, and alter system to change it system-wide. SORT_AREA_RETAINED_SIZE is set the same way.
Is your entire table (with 2.2 million rows) fetched in the result set? Is there a sort operation in the query?
There could be other reasons for the query performing badly. Can you share the query and its explain plan?

When you generate an execution plan for the query using the DBMS_XPLAN.DISPLAY method, Oracle will estimate (usually pretty reasonably) how much temporary tablespace storage you would need to execute it.
By the way, the 2.2 million rows may be irrelevant to the sort size. The memory required for aggregate operations such as MAX and SUM is related more to the size of the result set than to the size of the source data.
Providing a link to a jpg file stored on your PC does not count as having provided an execution plan, by the way.
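To illustrate the point about aggregates, here is a small Python sketch of a hash-style GROUP BY. It is not Oracle's implementation, and the function name and sample data are made up. It keeps one running accumulator per group, so its working memory follows the number of distinct groups in the result rather than the number of input rows.

```python
from collections import defaultdict

def group_sum_max(rows):
    """Aggregate (key, value) pairs, keeping one accumulator per distinct key."""
    acc = defaultdict(lambda: [0, None])  # key -> [running sum, running max]
    for key, value in rows:
        slot = acc[key]
        slot[0] += value
        slot[1] = value if slot[1] is None else max(slot[1], value)
    return {key: tuple(slot) for key, slot in acc.items()}

# 2.2 million input rows (streamed from a generator) but only 12 groups.
rows = ((i % 12, i) for i in range(2_200_000))
result = group_sum_max(rows)
print(len(result))  # number of accumulators that were held in memory
```

A GROUP BY over many columns, as in the query above, can of course produce far more groups than this toy case, which is why the estimate in the real execution plan matters.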

A.project_yy||A.project_MM BETWEEN B.START_DATE AND B.END_DATE
You know we have DATE datatypes in databases, right? Using incorrect datatypes makes it harder for Oracle to determine data distributions, predicate selectivity, and appropriate query plans.
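A short Python sketch of why a range check on concatenated strings is fragile. It assumes, purely for illustration, that the year and month are stored as text and that months are not zero-padded; the real column formats are not shown in the question, and the helper names are hypothetical.

```python
from datetime import date

def period_as_text(yy, mm):
    # Mirrors A.project_yy || A.project_MM: plain string concatenation.
    return yy + mm

def period_as_date(yy, mm):
    # A real date value (first day of the month) orders chronologically.
    return date(int(yy), int(mm), 1)

start, end = ("2009", "1"), ("2009", "12")  # hypothetical price validity range
probe = ("2009", "9")                       # September 2009, inside the range

text_hit = period_as_text(*start) <= period_as_text(*probe) <= period_as_text(*end)
date_hit = period_as_date(*start) <= period_as_date(*probe) <= period_as_date(*end)
print(text_hit, date_hit)
```

Under these assumptions the text comparison misses a month that the date comparison correctly places inside the range, because strings are compared character by character.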

Related

Optimizing Apache Spark SQL Queries

I am facing very long latencies on Apache Spark when running some SQL queries. To simplify the query, I run my calculations sequentially: the output of each query is stored as a temporary table (.registerTempTable('TEMP')) so it can be used in the next SQL query, and so on. But the query takes too much time, while in 'pure Python' code it takes just a few minutes.
sqlContext.sql("""
SELECT PFMT.* ,
DICO_SITES.CodeAPI
FROM PFMT
INNER JOIN DICO_SITES
ON PFMT.assembly_department = DICO_SITES.CodeProg """).registerTempTable("PFMT_API_CODE")
sqlContext.sql("""
SELECT GAMMA.*,
(GAMMA.VOLUME*GAMMA.PRORATA)/100 AS VOLUME_PER_SUPPLIER
FROM
(SELECT PFMT_API_CODE.* ,
SUPPLIERS_PROP.CODE_SITE_FOURNISSEUR,
SUPPLIERS_PROP.PRORATA
FROM PFMT_API_CODE
INNER JOIN SUPPLIERS_PROP ON PFMT_API_CODE.reference = SUPPLIERS_PROP.PIE_NUMERO
AND PFMT_API_CODE.project_code = SUPPLIERS_PROP.FAM_CODE
AND PFMT_API_CODE.CodeAPI = SUPPLIERS_PROP.SITE_UTILISATION_FINAL) GAMMA """).registerTempTable("TEMP_ONE")
sqlContext.sql("""
SELECT TEMP_ONE.* ,
ADCP_DATA.* ,
CASE
WHEN ADCP_DATA.WEEK <= weekofyear(from_unixtime(unix_timestamp())) + 24 THEN ADCP_DATA.CAPACITY_ST + ADCP_DATA.ADD_CAPACITY_ST
WHEN ADCP_DATA.WEEK > weekofyear(from_unixtime(unix_timestamp())) + 24 THEN ADCP_DATA.CAPACITY_LT + ADCP_DATA.ADD_CAPACITY_LT
END AS CAPACITY_REF
FROM TEMP_ONE
INNER JOIN ADCP_DATA
ON TEMP_ONE.reference = ADCP_DATA.PART_NUMBER
AND TEMP_ONE.CodeAPI = ADCP_DATA.API_CODE
AND TEMP_ONE.project_code = ADCP_DATA.PROJECT_CODE
AND TEMP_ONE.CODE_SITE_FOURNISSEUR = ADCP_DATA.SUPPLIER_SITE_CODE
AND TEMP_ONE.WEEK_NUM = ADCP_DATA.WEEK_NUM
""" ).registerTempTable('TEMP_BIS')
sqlContext.sql("""
SELECT TEMP_BIS.CSF_ID,
TEMP_BIS.CF_ID ,
TEMP_BIS.CAPACITY_REF,
TEMP_BIS.VOLUME_PER_SUPPLIER,
CASE
WHEN TEMP_BIS.CAPACITY_REF >= VOLUME_PER_SUPPLIER THEN 'CAPACITY_OK'
WHEN TEMP_BIS.CAPACITY_REF < VOLUME_PER_SUPPLIER THEN 'CAPACITY_NOK'
END AS CAPACITY_CHECK
FROM TEMP_BIS
""").take(100)
Could anyone highlight the best practices (if there are any) for writing PySpark SQL queries on Spark?
Does it make sense that locally on my computer the script is much faster than on the Hadoop cluster?
Thanks in advance
You should cache your intermediate results. What is the data source?
Can you retrieve only the relevant data from it, or only the relevant columns? There are many options; you should provide more information about your data.
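As a rough illustration of why caching an intermediate result can matter, here is a toy model of lazy evaluation in plain Python. It is not Spark's API: LazyTable and its methods are hypothetical. In PySpark the corresponding step would be something like calling cache() on the DataFrame before registering it, or sqlContext.cacheTable(name) on a registered table.

```python
class LazyTable:
    """Toy stand-in for a lazily evaluated table; not Spark's API."""

    def __init__(self, compute):
        self._compute = compute   # function that produces the rows
        self._cached = None
        self._use_cache = False
        self.runs = 0             # how many times the rows were recomputed

    def cache(self):
        self._use_cache = True
        return self

    def rows(self):
        if self._use_cache and self._cached is not None:
            return self._cached
        self.runs += 1
        data = self._compute()
        if self._use_cache:
            self._cached = data
        return data

    def select(self, fn):
        # Derived table: evaluating it pulls the parent's rows again.
        return LazyTable(lambda: [fn(r) for r in self.rows()])


base = LazyTable(lambda: [{"ref": i, "volume": i * 10} for i in range(5)])
joined = base.select(lambda r: {**r, "per_supplier": r["volume"] / 100})

joined.select(lambda r: r["ref"]).rows()            # first action
joined.select(lambda r: r["per_supplier"]).rows()   # second action
runs_without_cache = joined.runs                    # recomputed once per action

joined.runs = 0
joined.cache()
joined.select(lambda r: r["ref"]).rows()
joined.select(lambda r: r["per_supplier"]).rows()
runs_with_cache = joined.runs                       # computed once, then reused
print(runs_without_cache, runs_with_cache)
```

Caching only pays off when the same intermediate table feeds more than one action; a single take(100) at the end of the chain evaluates each step once either way.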

LINQ to SQL deferred execution and materialization

I am sort of new to this. Curious what happens in the following situation?
var q = //MY LINQ TO SQL QUERY.Select(...)
.........
.........
var c = q.Count();
.........
.........
var x = q.Where(....).Select(....);
var y = x.ToList();//or something such that forces materialization
var d = q.Count();//same as c
var e = x.Count();
var f = y.Count();
How many trips to the database do the SQL statements actually make? One at Count()? Another at Where()? Or does LINQ retain what it materialized during Count()?
Or does it also depend on what the Where(..) contains, for example whether it references the database again or only references what was obtained as part of 'q', or any other .NET collections?
Edit:
Updated my code with a couple of other scenarios. Please correct my answers below:
q - no db trip
c - yes, but it translates to an aggregate query (select Count(*)), not a result set (as per the answer below)
x - no db trip, no matter what is written in the Where(..)
y - yes
d - yes, does not *reuse* c
e - yes, select count(*) EVEN THOUGH x was already materialized during y
f - no db trip
When you call Count, it does not materialize the entire data set. Rather, it prepares and executes a query like
SELECT COUNT(*) FROM ...
using ExecuteScalar to get the result.
When you call Where and Select it does not materialize anything (assuming q is an IQueryable). Instead it's just preparing a query like
SELECT col1, col2, ... FROM ...
But it doesn't actually execute it at that point. It will only execute the query when you call GetEnumerator on q. You'll rarely do this directly, but anything like the following will cause your query to be executed:
var arry = q.ToArray();
var list = q.ToList();
foreach(var rec in q) ...
Each enumeration of q sends the query to the database again, so several foreach loops over q mean several database queries; call ToList() once and loop over the list if you want a single trip. Likewise, if you create a new IQueryable based on q (e.g. var q2 = q.Where(...)), it is not tied to any result set already fetched through q, so it has to query the database again.
I tested out your code in LINQPad, and it seems all your analyses are correct.
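The same bookkeeping can be mimicked outside .NET. The sketch below uses Python's sqlite3 module and a hypothetical Query class (not LINQ to SQL itself) to count round-trips for the q, c, x, y, d, e, f sequence from the question. The table and its data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE item (id INTEGER, price INTEGER);"
    "INSERT INTO item VALUES (1, 5), (2, 15), (3, 25);"
)

round_trips = []  # every SQL statement sent after setup


def run(sql):
    round_trips.append(sql)
    return conn.execute(sql)


class Query:
    """Hypothetical lazy query: building it never touches the database."""

    def __init__(self, where="1=1"):
        self.where = where

    def filter(self, cond):   # like Where(): returns a new lazy query
        return Query(f"({self.where}) AND ({cond})")

    def count(self):          # like Count(): a scalar aggregate query
        return run(f"SELECT COUNT(*) FROM item WHERE {self.where}").fetchone()[0]

    def to_list(self):        # like ToList(): materializes the rows
        return run(f"SELECT id, price FROM item WHERE {self.where}").fetchall()


q = Query()                  # no trip
c = q.count()                # trip 1
x = q.filter("price > 10")   # no trip
y = x.to_list()              # trip 2
d = q.count()                # trip 3, c is not reused
e = x.count()                # trip 4, even though y already holds the rows
f = len(y)                   # no trip, y is an in-memory list
print(len(round_trips))
```

Only the calls that need an answer from the database (count and to_list here) cost a trip; composing queries and working with an already materialized list do not.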

Entity Framework SQL Selecting 600+ Columns

I have a query generated by entity framework running against oracle that's too slow. It runs in about 4 seconds.
This is the main portion of my query
var query = from x in db.BUILDINGs
join pro_co in db.PROFILE_COMMUNITY on x.COMMUNITY_ID equals pro_co.COMMUNITY_ID
join co in db.COMMUNITies on x.COMMUNITY_ID equals co.COMMUNITY_ID
join st in db.STATE_PROFILE on co.STATE_CD equals st.STATE_CD
where pro_co.PROFILE_NM == authorizedUser.ProfileName
select new
{
COMMUNITY_ID = x.COMMUNITY_ID,
COUNTY_ID = x.COUNTY_ID,
REALTOR_GROUP_NM = x.REALTOR_GROUP_NM,
BUILDING_NAME_TX = x.BUILDING_NAME_TX,
ACTIVE_FL = x.ACTIVE_FL,
CONSTR_SQFT_AVAIL_NB = x.CONSTR_SQFT_AVAIL_NB,
TRANS_RAIL_FL = x.TRANS_RAIL_FL,
LAST_UPDATED_DT = x.LAST_UPDATED_DT,
CREATED_DATE = x.CREATED_DATE,
BUILDING_ADDRESS_TX = x.BUILDING_ADDRESS_TX,
BUILDING_ID = x.BUILDING_ID,
COMMUNITY_NM = co.COMMUNITY_NM,
IMAGECOUNT = x.BUILDING_IMAGE2.Count(),
StateCode = st.STATE_NM,
BuildingTypeItems = x.BUILDING_TYPE_ITEM,
BuildingZoningItems = x.BUILDING_ZONING_ITEM,
BuildingSpecFeatures = x.BUILDING_SPEC_FEATURE_ITEM,
buildingHide = x.BUILDING_HIDE,
buildinghideSort = x.BUILDING_HIDE.Count(y => y.PROFILE_NM == ProfileName) > 0 ? 1 : 0,
BUILDING_CITY_TX = x.BUILDING_CITY_TX,
BUILDING_ZIP_TX = x.BUILDING_ZIP_TX,
LPF_GENERAL_DS = x.LPF_GENERAL_DS,
CONSTR_SQFT_TOTAL_NB = x.CONSTR_SQFT_TOTAL_NB,
CONSTR_STORIES_NB = x.CONSTR_STORIES_NB,
CONSTR_CEILING_CENTER_NB = x.CONSTR_CEILING_CENTER_NB,
CONSTR_CEILING_EAVES_NB = x.CONSTR_CEILING_EAVES_NB,
DESCR_EXPANDABLE_FL = x.DESCR_EXPANDABLE_FL,
CONSTR_MATERIAL_TYPE_TX = x.CONSTR_MATERIAL_TYPE_TX,
SITE_ACRES_SALE_NB = x.SITE_ACRES_SALE_NB,
DESCR_PREVIOUS_USE_TX = x.DESCR_PREVIOUS_USE_TX,
CONSTR_YEAR_BUILT_TX = x.CONSTR_YEAR_BUILT_TX,
DESCR_SUBDIVIDE_FL = x.DESCR_SUBDIVIDE_FL,
LOCATION_CITY_LIMITS_FL = x.LOCATION_CITY_LIMITS_FL,
TRANS_INTERSTATE_NEAREST_TX = x.TRANS_INTERSTATE_NEAREST_TX,
TRANS_INTERSTATE_MILES_NB = x.TRANS_INTERSTATE_MILES_NB,
TRANS_HIGHWAY_NAME_TX = x.TRANS_HIGHWAY_NAME_TX,
TRANS_HIGHWAY_MILES_NB = x.TRANS_HIGHWAY_MILES_NB,
TRANS_AIRPORT_COM_NAME_TX = x.TRANS_AIRPORT_COM_NAME_TX,
TRANS_AIRPORT_COM_MILES_NB = x.TRANS_AIRPORT_COM_MILES_NB,
UTIL_ELEC_SUPPLIER_TX = x.UTIL_ELEC_SUPPLIER_TX,
UTIL_GAS_SUPPLIER_TX = x.UTIL_GAS_SUPPLIER_TX,
UTIL_WATER_SUPPLIER_TX = x.UTIL_WATER_SUPPLIER_TX,
UTIL_SEWER_SUPPLIER_TX = x.UTIL_SEWER_SUPPLIER_TX,
UTIL_PHONE_SVC_PVD_TX = x.UTIL_PHONE_SVC_PVD_TX,
CONTACT_ORGANIZATION_TX = x.CONTACT_ORGANIZATION_TX,
CONTACT_PHONE_TX = x.CONTACT_PHONE_TX,
CONTACT_EMAIL_TX = x.CONTACT_EMAIL_TX,
TERMS_SALE_PRICE_TX = x.TERMS_SALE_PRICE_TX,
TERMS_LEASE_SQFT_NB = x.TERMS_LEASE_SQFT_NB
};
There is a section of code that tacks on dynamic where and sort clauses to the query but I've left those out. The query takes about 4 seconds to run no matter what is in the where and sort.
I ran the generated SQL in Oracle, and the explain plan didn't appear to show anything that screamed "fix me". The cost is 1554.
If this isn't allowed I apologize but I can't seem to find a good way to share this information. I've uploaded the explain plan generated by Sql Developer here: http://www.123server.org/files/explainPlanzip-e1d291efcd.html
Table Layout
Building
--------------------
- BuildingID
- CommunityId
- Lots of other columns
Profile_Community
-----------------------
- CommunityId
- ProfileNM
- lots of other columns
state_profile
---------------------
- StateCD
- ProfileNm
- lots of other columns
Profile
---------------------
- Profile-NM
- a few other columns
All of the tables with a lot of columns have 120-150 columns each. It seems like Entity Framework is generating a select statement that pulls every column from every table instead of just the ones I want.
The thing that's bugging me, and that I think might be my issue, is that I've selected 50 items in my LINQ, but the generated SQL returns 677 columns. I suspect that returning so many columns is the source of the slowness.
Any ideas why I am getting so many columns returned in the SQL, or how to speed up my query?
I have a suspicion that some of the performance is being impacted by your object creation. Try running the query with just a basic "select x" and see whether it's the SQL query or the object creation that takes the time.
Also, if the generated query is too complicated, you could try separating it into smaller sub-queries that gradually enrich your object rather than trying to query everything at once.
I ended up creating a view and having the view only select the columns I wanted and joining on things that needed to be left-joined in linq.
It's pretty annoying that EF selects every column from every table you're trying to join across. But I guess I only noticed this because I am joining a bunch of tables with 150+ columns in them.
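To make the view workaround concrete, here is a sketch using Python's sqlite3 module in place of Oracle and EF. The table shapes, filler column names and the view name building_summary are all hypothetical. It compares how many columns come back from a plain join with how many come back from a view that projects only what is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def wide_table(name, key, extra):
    # Build a table with one key column plus `extra` filler columns.
    cols = ", ".join(f"{name}_c{i} TEXT" for i in range(extra))
    conn.execute(f"CREATE TABLE {name} ({key} INTEGER, {cols})")


wide_table("building", "community_id", 120)
wide_table("community", "community_id", 120)

# The view exposes only the columns the caller actually needs.
conn.execute(
    """CREATE VIEW building_summary AS
       SELECT b.community_id,
              b.building_c0 AS building_name,
              c.community_c0 AS community_name
       FROM building b
       JOIN community c ON c.community_id = b.community_id"""
)

all_cols = conn.execute(
    "SELECT * FROM building b JOIN community c ON c.community_id = b.community_id"
).description
view_cols = conn.execute("SELECT * FROM building_summary").description
print(len(all_cols), len(view_cols))
```

Mapping the ORM to such a view means that even a generated "select everything" stays narrow.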

How to compare two tables in linq to sql?

I am trying to compare two tables (i.e. values, count, etc.) in LINQ to SQL, but I can't find a way to do it. I tried the following:
Table1.Any(i => i.itemNo == Table2.itemNo)
It gives an error. Could you please help me?
Thanks in Advance.
How about
var isDifferent =
Table1.Zip(Table2, (j, k) => j.itemNo == k.itemNo).Any(m => !m);
EDIT
If Linq-To-Sql does not support Zip:
var one = Table1.ToList();
var two = Table2.ToList();
var isDifferent =
one.Zip(two, (j, k) => j.itemNo == k.itemNo).Any(m => !m);
If the tables are very large, this could cause performance problems. In that case you will need a much more sophisticated solution; if so, please ask.
EDIT2
If the tables are very large, you don't want to get all the data from the server and hold it in memory. Additionally, Linq and SQL Server do not guarantee the order of the rows unless you specify an order in the query. This becomes especially relevant for large result sets returned by a multi-processor server, where the effects of parallelism are likely to come into play.
I suggest that Linq-to-Sql doesn't really cater well for your scenario, so you will have to help it out using ExecuteQuery, something like this.
string zipQuery =
@"SELECT TOP 1
1
FROM
[Table1] [one]
WHERE
NOT EXISTS (
SELECT * FROM [Table2] [two] WHERE [two].[itemNo] = [one].[itemNo]
)
UNION ALL
SELECT TOP 1
1
FROM
[Table2] [two]
WHERE
NOT EXISTS (
SELECT * FROM [Table1] [one] WHERE [one].[itemNo] = [two].[itemNo]
)
UNION ALL
SELECT 0";
// The query returns up to three rows; any row equal to 1 means some itemNo exists in only one table.
var isDifferent = context.ExecuteQuery<int>(zipQuery).Any(flag => flag == 1);
This will do the select on the server without returning lots of data to the client, but I think you will agree it is much more complicated.
EDIT3
Okay, the zip approach should be fine for 1000 rows. I've read your comment and I suggest changing the code accordingly.
var one = Table1.ToList();
var two = Table2.ToList();
var isDifferent =
one.Count != two.Count ||
one.Zip(two, (o, t) => o.itemNo == t.itemNo).Any(m => !m);
You should probably consider putting an order by on the list retrievers, like this.
var one = Table1.OrderBy(o => o.itemNo).ToList();
Strictly, the results of a Linq-to-Sql query can come back in any order unless an order is specified.
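The count check and the ordering both matter, which a few lines of Python can show. The sketch uses plain lists of item numbers as stand-ins for the two tables, and is_different is a hypothetical helper, not part of any library.

```python
def is_different(one, two):
    """Compare two collections of item numbers, ignoring retrieval order."""
    one, two = sorted(one), sorted(two)   # same role as OrderBy(o => o.itemNo)
    if len(one) != len(two):              # zip() stops at the shorter input
        return True
    return any(a != b for a, b in zip(one, two))


table1 = [3, 1, 2]
table2 = [2, 3, 1]       # same items, different row order
table3 = [1, 2, 3, 4]    # one extra row

same_items = is_different(table1, table2)
extra_row = is_different(table1, table3)
# Without the length check, the extra row would go unnoticed:
zip_only = any(a != b for a, b in zip(sorted(table1), sorted(table3)))
print(same_items, extra_row, zip_only)
```

Zip in .NET stops at the shorter sequence in the same way, which is why the Count comparison was added in EDIT3.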

How to speed up query

var ci = ctx.CI().Where(p => p.PId == pId);
var result = ctx.RM().Where(p => p.R.D.PId == Id && p.MTId == mt.Id).
Sum(p => (((p.M.TN * p.EC * p.F.PW * 52m) + (p.M.TN * p.EC * p.F.PY * (WW / 52m)))
/ 100m) * ci.FirstOrDefault(q => q.PId == p.R.PId.Value && q.FPId == p.R.FPId.Value).Factor);
8000 records. The query takes 2000 ms to run this way, and 4000 ms using a join on CI and RM.
As you can see, there are 6 tables used: CC, RM, R, D, F and M.
The model was defined using Code First, so I'm using EF 4.1.
How can I speed up my query so that it runs much faster than 2 seconds?
With a complex query like this, I'm guessing that a lot of the time is being spent compiling the query. Try using CompiledQuery to allow you to reuse a precompiled query.
Beyond that, you'll need to analyze the SQL that gets produced to see where time is spent in the execution plan. It's possible that you'll be able to significantly improve performance with a few well-placed indices.
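As a sketch of what checking the plan before and after adding an index looks like, the following uses Python's sqlite3 module as a stand-in for the real database. The rm table, its columns and the index name are hypothetical, loosely modeled on the RM entity above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rm (id INTEGER PRIMARY KEY, p_id INTEGER, mt_id INTEGER, ec REAL)"
)
conn.executemany(
    "INSERT INTO rm (p_id, mt_id, ec) VALUES (?, ?, ?)",
    [(i % 50, i % 7, 1.0) for i in range(8000)],
)

query = "SELECT SUM(ec) FROM rm WHERE p_id = ? AND mt_id = ?"


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (1, 1)).fetchall()
    return " | ".join(row[-1] for row in rows)


before = plan(query)   # full table scan
conn.execute("CREATE INDEX idx_rm_p_mt ON rm (p_id, mt_id)")
after = plan(query)    # lookup through the new index
print(before)
print(after)
```

The exact plan text differs between database engines and versions, but the change from a scan to an index search is the thing to look for.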
Use a stored procedure, and bind EF to a function for it.

Resources