Calculating an Average across multiple columns in Hadoop Hive

I am trying to calculate an average of three columns in Hive but with no luck. Below is my code.
select c.university_name, c.country, AVG(c.world_rank) as AvgC, AVG(s.world_rank) as AvgS, AVG(t.world_rank) as AvgT, SUM(AvgC+AvgS+AvgT)/3 as TotalAvg
from cwur c
join shanghai s on (c.university_name = s.university_name and c.year = s.year)
join times t on (c.university_name = t.university_name and c.year = t.year)
Is Hive even capable of averaging across three calculated columns?

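You are missing the GROUP BY clause. Hive also does not let the SELECT list refer to aliases defined in that same list, so TotalAvg has to repeat the aggregate expressions:
select
c.university_name,
c.country,
AVG(c.world_rank) as AvgC,
AVG(s.world_rank) as AvgS,
AVG(t.world_rank) as AvgT,
(AVG(c.world_rank) + AVG(s.world_rank) + AVG(t.world_rank))/3 as TotalAvg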
from cwur c
join shanghai s on (c.university_name = s.university_name and c.year = s.year)
join times t on (c.university_name = t.university_name and c.year = t.year)
group by c.university_name, c.country
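Because a SELECT list generally cannot refer to aliases defined in that same list, one way to compute the overall average from the three per-ranking averages is to do the grouping in a subquery and the arithmetic in an outer query. The following is a minimal sketch of that idea, assuming Python's built-in sqlite3 module as a stand-in for Hive; the sample rows and the per_university alias are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cwur     (university_name TEXT, country TEXT, year INTEGER, world_rank INTEGER);
    CREATE TABLE shanghai (university_name TEXT, year INTEGER, world_rank INTEGER);
    CREATE TABLE times    (university_name TEXT, year INTEGER, world_rank INTEGER);
    -- Made-up sample data.
    INSERT INTO cwur     VALUES ('A', 'X', 2014, 1), ('A', 'X', 2015, 3), ('B', 'Y', 2014, 10);
    INSERT INTO shanghai VALUES ('A', 2014, 2), ('A', 2015, 4), ('B', 2014, 20);
    INSERT INTO times    VALUES ('A', 2014, 3), ('A', 2015, 5), ('B', 2014, 30);
""")

query = """
    SELECT university_name, country, AvgC, AvgS, AvgT,
           (AvgC + AvgS + AvgT) / 3 AS TotalAvg      -- aliases are visible here
    FROM (
        -- Inner query: one row per university with the three averages.
        SELECT c.university_name AS university_name,
               c.country         AS country,
               AVG(c.world_rank) AS AvgC,
               AVG(s.world_rank) AS AvgS,
               AVG(t.world_rank) AS AvgT
        FROM cwur c
        JOIN shanghai s ON c.university_name = s.university_name AND c.year = s.year
        JOIN times t    ON c.university_name = t.university_name AND c.year = t.year
        GROUP BY c.university_name, c.country
    ) AS per_university
    ORDER BY university_name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The inner query is the grouped query from the answer without the TotalAvg column; the outer query only adds the arithmetic on top of it.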

Related

speed up a query with multiple inner joins in ms access

As the title says, I need to improve this query that I wrote in MS Access. The tables come from a linked database, so I can't index them. I need help understanding where the time is going. Is there anything like EXPLAIN in Access? Do I need to put more columns in some sort of GROUP BY? What can I do to improve the speed? (The GROUP BY in the first SELECT reads 4M rows but returns only 321k after grouping, and the query takes 20 minutes to run, when the laptop doesn't crash.)
SELECT a.SEQ_NO,
b.SKU,
b.maxdate,
(a.BASE_COST/a.EXCHANGE) AS BASE_COST,
(a.NET_COST/a.EXCHANGE) AS NET_COST,
(a.NET_NET_COST/a.EXCHANGE) AS EXCHAGED_NET_NET_COST,
a.NET_NET_COST,
(a.DEAD_NET_NET_COST/a.EXCHANGE) AS DEAD_NET_NET_COST,
(a.LANDED_COST/a.EXCHANGE) AS LANDED_COST,
(a.POSEIMA/a.EXCHANGE) AS POSEIMA,
(a.TOTAL_BONUS/a.EXCHANGE) AS TOTAL_BONUS,
(a.IEC/a.EXCHANGE) AS IEC,
(a.IEC_BONUS/a.EXCHANGE) AS IEC_BONUS,
(a.ECO_INVOICE_FORN/a.EXCHANGE) AS ECO_INVOICE_FORN_SYSTEM,
(a.ECO_INVOICE/a.EXCHANGE) AS ECO_INVOICE_SYSTEM,
(a.ECO_MERCHANDISE/a.EXCHANGE) AS ECO_MERCHANDISE_SYSTEM,
c.SUPPLIER,
c.SUP_NAME,
d.UPC,
d.PRIMARY_UPC_IND,
f.BRAND,
g.DEPT,
g.DESC_UP,
g.CLASS,
g.SUBCLASS,
h.AV_COST,
h.UNIT_RETAIL AS Last_of_unit_retail,
h.STATUS,
i.[UNIT VALUE],
i.[INITIAL DATE],
i.[END DATE] INTO PRICELIST
FROM (((((((RMS_MC_NB_PRICELIST_COST AS a
INNER JOIN (SELECT MAX(SEQ_NO) AS ID, SKU, MAX(ACTIVE_DATE) AS maxdate
FROM RMS_MC_NB_PRICELIST_COST
GROUP BY SKU) AS b ON a.SEQ_NO = b.ID)
INNER JOIN RMS_MC_SUPS AS c ON a.SUPPLIER = c.SUPPLIER)
INNER JOIN RMS_MC_UPC_EAN AS d ON b.SKU = d.SKU)
INNER JOIN RMS_MC_WIN_ATTRIBUTES AS e ON b.SKU = e.SKU)
INNER JOIN RMS_MC_NB_BRAND AS f ON e.NB_BRAND_NO = f.BRAND_NO)
INNER JOIN RMS_MC_DESC_LOOK AS g ON b.SKU = g.SKU)
INNER JOIN RMS_MC_WIN_STORE AS h ON b.SKU = h.SKU)
LEFT JOIN MAPA_APOIOS_SISO AS i ON b.SKU = i.[# ARTICLE];

sql select records that don't have relation in a third table

I have three tables
CLAIMS_TB
CLAIMS_RISK_TB
VEHICLE_TB
And then I need this result below:
Who can help me or share with me the query to be used?
N.B.: A code of 700 means the claim is for a vehicle, so the "ai_vehicle_use" column must be filled in; otherwise it must be left blank, because the "VEHICLE_TB" table contains only vehicles.
This is what I tried:
select
klm.CM_PL_INDEX,
klm.cm_no,
klmrisk.cr_risk_code,
CASE WHEN klm.CM_PR_CODE = '0700' THEN klmrisk.cr_risk_code ELSE '' END,
veh.ai_vehicle_use
from CLAIMS_TB klm
JOIN CLAIMS_RISK_TB klmrisk
ON (klm.cm_index = klmrisk.cr_cm_index)
INNER JOIN VEHICLE_TB veh
on veh.ai_regn_no = klm.cm_no
where klm.cm_no='CL/01/044/00001/01/2018'
or klmrisk.cr_cm_index='86594'
order by klmrisk.cr_risk_code;
I believe this could fit your needs.
SELECT
*
FROM CLAIMS_TB AS c
LEFT JOIN CLAIMS_RISK_TB cl ON c.cm_index = cl.cr_cm_index
LEFT JOIN VEHICLE_TB v ON cl.cr_risk_code = v.ai_risk_index
Finally I found the solution; the query below works:
select * from CLAIMS_TB c
JOIN CLAIMS_RISK_TB cr ON( C.CM_INDEX = cr.cr_cm_index)
LEFT OUTER JOIN VEHICLE_TB v ON (cr.cr_risk_code = v.ai_regn_no);
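The difference between the INNER JOIN in the original attempt and the LEFT OUTER JOIN in the working query can be seen on a tiny data set. This is only a sketch, assuming Python's built-in sqlite3 module and made-up rows; the tables are reduced to the columns used in the join conditions.

```python
import sqlite3

# In-memory SQLite database with cut-down versions of the three tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CLAIMS_TB      (cm_index INTEGER, cm_no TEXT, cm_pr_code TEXT);
    CREATE TABLE CLAIMS_RISK_TB (cr_cm_index INTEGER, cr_risk_code TEXT);
    CREATE TABLE VEHICLE_TB     (ai_regn_no TEXT, ai_vehicle_use TEXT);
    -- Made-up data: claim 1 is a vehicle claim, claim 2 is not.
    INSERT INTO CLAIMS_TB      VALUES (1, 'CL/1', '0700'), (2, 'CL/2', '0300');
    INSERT INTO CLAIMS_RISK_TB VALUES (1, 'REG-1'), (2, 'HOUSE-9');
    INSERT INTO VEHICLE_TB     VALUES ('REG-1', 'PRIVATE');
""")

template = """
    SELECT c.cm_no, cr.cr_risk_code, v.ai_vehicle_use
    FROM CLAIMS_TB c
    JOIN CLAIMS_RISK_TB cr ON c.cm_index = cr.cr_cm_index
    {join_type} JOIN VEHICLE_TB v ON cr.cr_risk_code = v.ai_regn_no
    ORDER BY c.cm_no
"""

# INNER JOIN drops the claim that has no row in VEHICLE_TB.
inner_rows = conn.execute(template.format(join_type="INNER")).fetchall()
# LEFT OUTER JOIN keeps it and leaves ai_vehicle_use empty (NULL / None).
left_rows = conn.execute(template.format(join_type="LEFT OUTER")).fetchall()

print(inner_rows)
print(left_rows)
```

With the outer join, the non-vehicle claim stays in the result and its ai_vehicle_use column is simply blank, which is the behaviour the question asks for.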

PL SQL - Join 2 tables and return max from right table

I am trying to retrieve the MAX doc from the right table.
SELECT F43.PDDOCO,
F43.PDSFXO,
F43.PDLNID,
F43.PDAREC/100 As Received,
F431.PRAREC/100,
max(F431.PRDOC)
FROM PRODDTA.F43121 F431
LEFT OUTER JOIN PRODDTA.F4311 F43
ON
F43.PDKCOO=F431.PRKCOO
AND F43.PDDOCO=F431.PRDOCO
AND F43.PDDCTO=F431.PRDCTO
AND F43.PDSFXO=F431.PRSFXO
AND F43.PDLNID=F431.PRLNID
WHERE F431.PRDOCO = 401531
and F431.PRMATC = 2
and F43.PDLNTY = 'DC'
Group by
F43.PDDOCO,
F43.PDSFXO,
F43.PDLNID,
F43.PDAREC,
F431.PRAREC/100
This query still returns both rows from the right table. I am fairly new to SQL and struggling with the statement. Any help would be appreciated.
Without seeing your data it is difficult to tell where the problem might be, so I will offer a few suggestions that could help.
First, you are using a LEFT JOIN to PRODDTA.F4311, but the WHERE clause filters on a column from that table. Rows with no match have NULL in F43.PDLNTY and fail that filter, so the query acts like an INNER JOIN. You should move F43.PDLNTY = 'DC' to the JOIN condition.
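The effect of filtering an outer-joined table in WHERE versus in the JOIN condition can be checked on a small example. This is a sketch only, assuming Python's built-in sqlite3 module; the receipts and order_lines tables and their columns are hypothetical stand-ins for PRODDTA.F43121 and PRODDTA.F4311.

```python
import sqlite3

# Hypothetical, simplified tables; not the real JD Edwards schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE receipts    (doc INTEGER, order_no INTEGER);
    CREATE TABLE order_lines (order_no INTEGER, line_type TEXT);
    INSERT INTO receipts    VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO order_lines VALUES (100, 'DC'), (200, 'S');
""")

# Filter in WHERE: unmatched rows have line_type NULL and are discarded,
# so the LEFT JOIN behaves like an INNER JOIN.
filter_in_where = conn.execute("""
    SELECT r.doc, o.line_type
    FROM receipts r
    LEFT OUTER JOIN order_lines o ON o.order_no = r.order_no
    WHERE o.line_type = 'DC'
    ORDER BY r.doc
""").fetchall()

# Filter in ON: every receipt is kept; only matching 'DC' lines are attached.
filter_in_on = conn.execute("""
    SELECT r.doc, o.line_type
    FROM receipts r
    LEFT OUTER JOIN order_lines o
        ON o.order_no = r.order_no AND o.line_type = 'DC'
    ORDER BY r.doc
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```

Which of the two results is wanted depends on whether rows without a matching 'DC' line should still appear.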
Second, you can try using a subquery to get the MAX(PRDOC) value. Then you can limit the columns that you are grouping on, which could eliminate the duplicates. The query would then be similar to the following:
SELECT F43.PDDOCO,
F43.PDSFXO,
F43.PDLNID,
F43.PDAREC/100 As Received,
F431.PRAREC/100,
F431.PRDOC
FROM PRODDTA.F43121 F431
INNER JOIN
(
-- subquery to get the max
-- then group by the distinct columns
SELECT PRKCOO, max(PRDOC) MaxPRDOC
FROM PRODDTA.F43121
WHERE PRDOCO = 401531
and PRMATC = 2
GROUP BY PRKCOO
) f2
-- join the subquery result back to the PRODDTA.F43121 table
on F431.PRDOC = f2.MaxPRDOC
AND F431.PRKCOO = f2.PRKCOO
LEFT OUTER JOIN PRODDTA.F4311 F43
ON F43.PDKCOO=F431.PRKCOO
AND F43.PDDOCO=F431.PRDOCO
AND F43.PDDCTO=F431.PRDCTO
AND F43.PDSFXO=F431.PRSFXO
AND F43.PDLNID=F431.PRLNID
AND F43.PDLNTY = 'DC' -- move this filter to the join instead of the WHERE
WHERE F431.PRDOCO = 401531
and F431.PRMATC = 2
If you provide your table structures and some sample data, it will be easier to determine the issue.
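The pattern of computing the maximum in a grouped subquery and joining it back to the same table can be tried in isolation. This is a rough sketch, assuming Python's built-in sqlite3 module; the receipts table, its columns, and the sample values are hypothetical and much simpler than PRODDTA.F43121.

```python
import sqlite3

# Hypothetical receipts table; amounts are stored as integers scaled by 100.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE receipts (company TEXT, order_no INTEGER, doc INTEGER, amount INTEGER);
    INSERT INTO receipts VALUES
        ('00001', 401531, 10, 5000),
        ('00001', 401531, 12, 7500),
        ('00002', 401531, 11, 1200);
""")

latest = conn.execute("""
    SELECT r.company, r.doc, r.amount / 100.0 AS received
    FROM receipts r
    INNER JOIN (
        -- One row per company holding its highest doc number.
        SELECT company, MAX(doc) AS max_doc
        FROM receipts
        WHERE order_no = 401531
        GROUP BY company
    ) m ON r.company = m.company AND r.doc = m.max_doc
    WHERE r.order_no = 401531
    ORDER BY r.company
""").fetchall()

print(latest)
```

Because the grouping happens inside the subquery, the outer query needs no GROUP BY, so extra selected columns cannot split one group into several rows.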

Complex Join Using LINQ EF

How can I join the two queries using LINQ to EF? I need the result set returned to me that includes joined data from the 2 queries combined.
1
select StockNo, Description
from VehicleOption_New
where StockNo in
(
select v.StockNo
from Vehicles v
join StatusDescription s
on v.Status = s.StatusId
where NewOrUsed = 'n' and v.model = 'cts'
)
and color is not null
2
select v.StockNo, s.StatusDescriptionText
from Vehicles v
join StatusDescription s
on v.Status = s.StatusId
where NewOrUsed = 'n' and v.model = 'cts'
Once you have the equivalent EF queries you can use either Concat() or Union() to combine the results.

Join two Oracle queries

I have to query two tables and want a single result. How can I join these two queries?
First query is querying from two tables and the second one is only from one.
select pt.id,pt.promorow,pt.promocolumn,pt.type,pt.image,pt.style,pt.quota_allowed,ptc.text,pq.quota_left
from promotables pt,promogroups pg ,promotablecontents ptc ,promoquotas pq where pt.id_promogroup = 1 and ptc.country ='049' and ptc.id_promotable = pt.id and pt.id_promogroup = pg.id and pq.id_promotable = pt.id order by pt.promorow,pt.promocolumn
select pt.id,pt.promorow,pt.promocolumn,pt.type,pt.image,pt.style,pt.quota_allowed from promotables pt where pt.type='heading'
Use UNION or UNION ALL. As long as you have the same number of columns and they are compatible types that should do what you want.
SELECT pt.id, pt.promorow, pt.promocolumn, pt.type, pt.image, pt.style, pt.quota_allowed, ptc.text, pq.quota_left
FROM promotables pt, promogroups pg, promotablecontents ptc, promoquotas pq
WHERE pt.id_promogroup = 1
AND ptc.country ='049'
AND ptc.id_promotable = pt.id
AND pt.id_promogroup = pg.id
AND pq.id_promotable = pt.id
UNION
SELECT pt.id, pt.promorow, pt.promocolumn, pt.type, pt.image, pt.style, pt.quota_allowed, NULL, NULL
FROM promotables pt
WHERE pt.type='heading'
ORDER BY 2, 3
If you want to display duplicates (e.g., identical rows coming from both queries), use UNION ALL.
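The NULL padding and the difference between UNION and UNION ALL can be illustrated as follows. This is a minimal sketch, assuming Python's built-in sqlite3 module rather than Oracle, with cut-down versions of the promotables and promoquotas tables and made-up rows.

```python
import sqlite3

# Cut-down tables with made-up data (assumption); SQLite stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE promotables (id INTEGER, promorow INTEGER, type TEXT);
    CREATE TABLE promoquotas (id_promotable INTEGER, quota_left INTEGER);
    INSERT INTO promotables VALUES (1, 1, 'heading'), (2, 1, 'cell');
    INSERT INTO promoquotas VALUES (2, 5);
""")

sql = """
    SELECT pt.id, pt.promorow, pq.quota_left
    FROM promotables pt
    LEFT JOIN promoquotas pq ON pq.id_promotable = pt.id
    WHERE pt.promorow = 1
    {op}
    -- The second branch has no quota column, so it is padded with NULL
    -- to give both branches the same number of columns.
    SELECT pt.id, pt.promorow, NULL
    FROM promotables pt
    WHERE pt.type = 'heading'
    ORDER BY 1
"""

# UNION removes the heading row that both branches return.
union_rows = conn.execute(sql.format(op="UNION")).fetchall()
# UNION ALL keeps every row from both branches, duplicates included.
union_all_rows = conn.execute(sql.format(op="UNION ALL")).fetchall()

print(union_rows)
print(union_all_rows)
```

The ORDER BY uses a column position because it applies to the combined result, not to either branch on its own.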
