From MS Document,
PARTITION BY
Divides the query result set into partitions. The window function is applied to each partition separately and computation restarts for each partition.
First of all, we might consider that the concept of PARTITION BY is similar with GROUP BY, but they are used in different situation.
We will do the following experiment steps by steps to illustrate its difference.
Preparation: Create testing data in Database:
CREATE TABLE People(PeopleID int identity(1,1),Name varchar(20),);CREATE TABLE PaymentHistory(PaymentHistoryID int identity(1,1),PeopleID int,PaymentDate Datetime,Amount decimal);INSERT INTO PeopleSelect 'Frank' UNION ALLSelect 'Jake' UNION ALLSelect 'Jordan'INSERT INTO PaymentHistorySelect 1, '2018-10-10', 100 UNION ALLSelect 1, '2018-10-11', 200 UNION ALLSelect 1, '2018-10-12', 300 UNION ALLSelect 2, '2018-10-09', 200 UNION ALLSelect 2, '2018-10-13', 300 UNION ALLSelect 3, '2018-10-14', 300
First, we use the following query to view all the data.
selectp.PeopleID,p.Name,ph.PaymentDate,ph.Amountfrom dbo.People pjoin dbo.PaymentHistory ph on p.PeopleID = ph.PeopleID
From results, we can see that each People has paid different amounts on different dates.
To image that if one day, clients want us to provide a report indicating each people’s last payment date? How do we handle it?
We might come up the following solution using GROUP BY.
selectp.PeopleID,p.Name,Max(ph.PaymentDate) as LastPaymentDatefrom dbo.People pjoin dbo.PaymentHistory ph on p.PeopleID = ph.PeopleID
group by p.PeopleID, p.Name
According to the results, we can find that the row number is reduced from 6 rows to 3 rows. Group By commands group that 6 rows to 3, and use aggregate function Max to get the latest payment date.
But, if clients change their mind, they also want to know what amount of the last payment for each people? How do we modify the previous query?
Because GROUP BY will make rows become grouped into a single row, we will not know which information from that grouped row is the right one we want.
Now, you might realize that if we still want to keep modifying the previous query, it will become really complicated.
Then, we would like to introduce the ranking window functions, such as ROW_NUMBER().
Syntax:ROW_NUMBER () OVER (PARTITION BY x ORDER BY y)
The following query is an example.
selectp.PeopleID,p.Name,ph.PaymentDate,ph.Amount,ROW_NUMBER() over (Partition by p.PeopleID order by ph.PaymentDate desc) as RowNumberfrom dbo.People p
join dbo.PaymentHistory ph on p.PeopleID = ph.PeopleID
We found that it provides rolled-up data without rolling up. Then, using RowNumber, we can select the amount of last payment date for each People.
select*from (selectp.PeopleID,p.Name,ph.PaymentDate,ph.Amount,Rank() over (Partition by p.PeopleID order by ph.PaymentDate desc) as RowNumberfrom dbo.People pjoin dbo.PaymentHistory ph on p.PeopleID = ph.PeopleID)as q
where q.RowNumber = 1
That is cool function, right? Next time, not only GROUP BY, but also we can have another choice OVER (PARTITION BY ORDER BY)