Thursday, December 13, 2018

OVER(PARTITION BY clause ORDER BY clause)

From MS Document,

PARTITION BY
Divides the query result set into partitions. The window function is applied to each partition separately and computation restarts for each partition.

First of all, we might consider that the concept of PARTITION BY is similar with GROUP BY, but they are used in different situation.

We will do the following experiment steps by steps to illustrate its difference.

Preparation: Create testing data in Database:

CREATE TABLE People
(
  PeopleID int identity(1,1),
  Name varchar(20),
);

CREATE TABLE PaymentHistory
(
  PaymentHistoryID int identity(1,1),
  PeopleID int,
  PaymentDate Datetime,
  Amount decimal
);

INSERT INTO People
Select 'Frank' UNION ALL
Select 'Jake' UNION ALL
Select 'Jordan'

INSERT INTO PaymentHistory
Select 1, '2018-10-10', 100 UNION ALL
Select 1, '2018-10-11', 200 UNION ALL
Select 1, '2018-10-12', 300 UNION ALL
Select 2, '2018-10-09', 200 UNION ALL
Select 2, '2018-10-13', 300 UNION ALL
Select 3, '2018-10-14', 300 

First, we use the following query to view all the data.

select
       p.PeopleID,
       p.Name,
       ph.PaymentDate,
       ph.Amount
from dbo.People p
join dbo.PaymentHistory ph on p.PeopleID = ph.PeopleID













From results, we can see that each People has paid different amounts on different dates.

To image that if one day, clients want us to provide a report indicating each people’s last payment date? How do we handle it?

We might come up the following solution using GROUP BY.

select
       p.PeopleID,
       p.Name,
       Max(ph.PaymentDate) as LastPaymentDate
from dbo.People p
join dbo.PaymentHistory ph on p.PeopleID = ph.PeopleID

group by p.PeopleID, p.Name




According to the results, we can find that the row number is reduced from 6 rows to 3 rows. Group By commands group that 6 rows to 3, and use aggregate function Max to get the latest payment date.

But, if clients change their mind, they also want to know what amount of the last payment for each people? How do we modify the previous query?

Because GROUP BY will make rows become grouped into a single row, we will not know which information from that grouped row is the right one we want.

Now, you might realize that if we still want to keep modifying the previous query, it will become really complicated.

Then, we would like to introduce the ranking window functions, such as ROW_NUMBER().

Syntax:
ROW_NUMBER () OVER (PARTITION BY x ORDER BY y)

The following query is an example.
select
       p.PeopleID,
       p.Name,
       ph.PaymentDate,
       ph.Amount,
       ROW_NUMBER() over (Partition by p.PeopleID order by ph.PaymentDate desc) as RowNumber
from dbo.People p

join dbo.PaymentHistory ph on p.PeopleID = ph.PeopleID



We found that it provides rolled-up data without rolling up. Then, using RowNumber, we can select the amount of last payment date for each People.
select
       *
from (
       select
              p.PeopleID,
              p.Name,
              ph.PaymentDate,
              ph.Amount,
              Rank() over (Partition by p.PeopleID order by ph.PaymentDate desc) as RowNumber
       from dbo.People p
       join dbo.PaymentHistory ph on p.PeopleID = ph.PeopleID
)as q

where q.RowNumber = 1




That is cool function, right? Next time, not only GROUP BY, but also we can have another choice OVER (PARTITION BY ORDER BY)

No comments:

Post a Comment