sql server Min, Max range - max

I have a huge table
startip endip country
1111 2000 in
2001 3000 in
3001 4000 in
4001 5000 chi
5001 6000 chi
I want to merge all these rows like
startip endip country
1111 4000 in
4001 6000 chi
I want do this in sqlserver.
Min(start_ip) , Max(end_ip) for DISTINCT(country)
Any help ?

Use GROUP BY.
For example:
SELECT country, MIN(start_ip), MAX(end_ip)
FROM a_table
GROUP BY country

Not sure what your table schema looks like but I think what you're trying to achieve will be accomplished using GROUP BY:
SELECT country, MIN(start_ip), MAX(end_ip) FROM your_table
GROUP BY country

Related

Select single random sample from group by in Hive

I have a table that looks like so:
Name Age Num_Hobbies Num Shoes
Jane 31 10 2
Bob 23 3 4
Jane 60 2 200
Jane 31 100 6
Bob 10 8 7
etc etc
I would like to group this table by Name and Age, and at random pick one row from the rest of the columns.
In pandas, I would do the following:
df.groupby(['Name', 'Age']).apply(lambda x: x.sample(n=1))
In hive, I know how to create the group, but not how to choose a single random sample from group.
I saw this question on stack overflow: How to sample for each group in hive?
However, I do not understand how to apply Dynamic partitions or Hive bucketing to select a single sample from a group.
You can use rank() or row_number() with rand()
select * from
(
select name,age,rank() (partition by name,age order by rand()) as rank
from table
) t
where rank = 1

Tables Summary into single table in laravel

I have two tables named 'Customers' and 'Committeetables' in mysql database. They have following columns
Table: Customer
nd Name FatherName Cell# AdvanceAmount
25 Waseem Asghar 0302 2500
30 Ramzan Khan 0303 3500
35 Rana Ali 0307 2000
Table: CommitteeTables
nd Amount RecievingDate DrawDate
25 1500 2-10-15 10-10-15
30 1500 2-10-15 10-10-15
25 1500 3-10-15 10-11-15
30 1500 3-10-15 10-11-15
35 1500 3-10-15 10-11-15
A Column 'nd' in 'Customers' table is primary key and in 'CommitteeTables' table is foreign key. It means 'Customers' table joins with 'ComitteeTables' table through 'nd' column.
Expected Result table should be as under:
nd Name Father Name Cell# Advance Amount Balance
25 Waseem Asghar 0302 25000 3000 22000
30 Ramzan Khan 0303 35000 3000 32000
32 Rana Ali 0307 20000 1500 18500
----------------------------------------------------------------------------
total 80000 7500 72500
I am using Laravel 5 and HTML, Bootstrap. I need this result in Laravel 5. I shall be really thankful if someone help me to solve my problem.
Thanks
Waseem
This is not at all a Laravel question. It's a general SQL question.
Something like this:
SELECT nd, Name, FatherName, Cell#, AdvanceAmount as Advance,
(SELECT SUM(Amount) FROM CommitteeTables WHERE CommitteeTables.nd=Custoemr.nd) as Amount, Advance-Balance AS Balance
FROM Customer;
I'm not entirely sure that's correct but you can try it and fix any errors. As Bogdan suggests you should try something first then come back here with any problems that you have.

What if I want Oracle SQL to GROUP BY this but not that?

This is what I get previously from another SQL code:
Customer Id week_ending Purchase Id Price
1234 2/28/2015 8604220 15
1234 2/28/2015 8604220 13.75
1234 2/28/2015 8604220 12.95
1234 2/28/2015 8604220 18.95
567890 8/15/2015 6376243 5.15
567890 8/15/2015 6376243 0.89
567890 8/15/2015 6376243 3.99
567890 8/15/2015 6376243 2.3
1234 1/24/2015 8824241 0.99
1234 1/24/2015 8824241 3.99
1234 1/24/2015 8824241 3.89
Now I want to sum the price by Purchase ID since it is unique for every of our customer's order but I don't want my SQL to think and sum it by Customer ID (since each customer could order multiple times with multiple Purchase ID). Following is my code that I wrote but I'm afraid that it would sum them by customer_id. How do I avoid this mistake of double accounting? Thanks in advance!
WITH example AS(SELECT
customer_id
,MAX(nvl(promised_arrival_day, ship_day)) OVER (PARTITION BY purchase_id) AS ship2_day
,purchase_id
,SUM(price) AS order_size
FROM
my_table
GROUP BY
customer_id
,MAX(nvl(promised_arrival_day, ship_day)) OVER (PARTITION BY customer_purchase_id)
,purchase_id)
SELECT
example.customer_id
,TO_CHAR(example.ship2_day + (7-TO_CHAR(example.ship2_day,'d')),'MM-DD-YYYY') AS week_ending
,example.purchase_id
,example.order_size
FROM
example;
Just
SELECT customer_id, purchase_id, sum(price)
FROM your_table
GROUP BY customer_id, purchase_id
Each record will be counted only once. It doesn't do any "double accounting" as you say.
You will get one record for each unique combination of customer_id/puchase_id in your data.
Looking at your data, I would do something like.
with tbl as(
-- query by which you are getting dataset in example
)
select customer_id,purchase_id, sum(price) as total_price from tbl
group by purchase_id,customer_id

PIG Script How to

I am trying clean up this employee volunteer data. There is no way to track if employee already is registered volunteer so he can sign up as new volunteer and will get a new VOLUNTEER_ID. I have a data feeding into where i can tie each VOLUNTEER_ID to its EMP_ID. The volunteer data needs to be cleaned up so we can figure out how the employee moved from a volunteer_level to another and when.
The business logic is that, when there is a overlaping dates, we give the highest level to the employee for the timeframe of between start_date and end_date.
I posted a Input sample of data and what the output should be.
Is it possible to do this a PIG script ? Can someone please help me
INPUT:
EMP_ID VOLUNTEER_ID V_LEVEL STATUS START_DATE END_DATE
10001 100 1 A 1/1/2006 12/31/2007
10001 200 1 A 5/1/2006
10001 100 1 A 1/1/2008
10001 300 3 P 3/1/2008 3/1/2008
10001 300 3 A 3/2/2008 12/1/2008
10001 1001 2 A 5/1/2008 6/30/2008
10001 1001 3 A 7/1/2008
10001 300 2 A 12/2/2008
OUTPUT NEEDED:( VOLUNTEER_ID is not needed in output but adding below to show which ID was selected for output and which did not)
EMP_ID VOLUNTEER_ID V_LEVEL STATUS START_DATE END_DATE
10001 100 1 A 1/1/2006 12/31/2007
10001 300 3 P 3/1/2008 3/1/2008
10001 300 3 A 3/2/2008 12/1/2008
10001 1001 2 A 5/1/2008 6/30/2008
10001 1001 3 A 7/1/2008
It seems like you want the row in your data with the earliest start date for each V_LEVEL, STATUS, EMP_ID, and VOLUNTEER_ID
First we add a unix time column and then find the min for that column (this is in the latest version of pig so you may need to update your version).
data_with_unix = foreach data generate EMP_ID, VOLUNTEER_ID, V_LEVEL, STATUS, START_DATE, END_DATE, ToUnixTime((datetime)START_DATE) as unix_time;
grp = group data_with_unix by (EMP_ID, VOLUNTEER_ID, V_LEVEL, STATUS);
max_date = foreach grp generate group, MIN(data_with_unix.unix_time);
Then join the start and end date back into your dataset since there it doesn't look like there is currently a way to convert unix time back to date.

Grouping data by date ranges

I wonder how do I select a range of data depending on the date range?
I have these data in my payment table in format dd/mm/yyyy
Id Date Amount
1 4/1/2011 300
2 10/1/2011 200
3 27/1/2011 100
4 4/2/2011 300
5 22/2/2011 400
6 1/3/2011 500
7 1/1/2012 600
The closing date is on the 27 of every month. so I would like to group all the data from 27 till 26 of next month into a group.
Meaning to say I would like the output as this.
Group 1
1 4/1/2011 300
2 10/1/2011 200
Group 2
1 27/1/2011 100
2 4/2/2011 300
3 22/2/2011 400
Group 3
1 1/3/2011 500
Group 4
1 1/1/2012 600
It's not clear the context of your qestion. Are you querying a database?
If this is the case, you are asking about datetime but it seems you have a column in string format.
First of all, convert your data in datetime data type (or some equivalent, what db engine are you using?), and then use a grouping criteria like this:
GROUP BY datepart(month, dateadd(day, -26, [datefield])), DATEPART(year, dateadd(day, -26, [datefield]))
EDIT:
So, you are in Linq?
Different language, same logic:
.GroupBy(x => DateTime
.ParseExact(x.Date, "dd/mm/yyyy", CultureInfo.InvariantCulture) //Supposed your date field of string data type
.AddDays(-26)
.ToString("yyyyMM"));
If you are going to do this frequently, it would be worth investing in a table that assigns a unique identifier to each month and the start and end dates:
CREATE TABLE MonthEndings
(
MonthID INTEGER NOT NULL PRIMARY KEY,
StartDate DATE NOT NULL,
EndDate DATE NOT NULL
);
INSERT INTO MonthEndings VALUES(201101, '27/12/2010', '26/01/2011');
INSERT INTO MonthEndings VALUES(201102, '27/01/2011', '26/02/2011');
INSERT INTO MonthEndings VALUES(201103, '27/02/2011', '26/03/2011');
INSERT INTO MonthEndings VALUES(201112, '27/11/2011', '26/01/2012');
You can then group accurately using:
SELECT M.MonthID, P.Id, P.Date, P.Amount
FROM Payments AS P
JOIN MonthEndings AS M ON P.Date BETWEEN M.StartDate and M.EndDate
ORDER BY M.MonthID, P.Date;
Any group headings etc are best handled out of the DBMS - the SQL gets you the data in the correct sequence, and the software retrieving the data presents it to the user.
If you can't translate SQL to LINQ, that makes two of us. Sorry, I have never used LINQ, so I've no idea what is involved.
SELECT *, CASE WHEN datepart(day,date)<27 THEN datepart(month,date)
ELSE datepart(month,date) % 12 + 1 END as group_name
FROM payment

Resources