分析函数语法:
function_name(<argument>,<argument>……)
over
(<partition-clause><order-by-clause><windowing clause>)
例:
sum(sal)
over
(partition by deptno order by ename) new_alias
说明:
sum就是函数名
(sal)是分析函数的参数,每个函数有0~3个参数,参数可以是表达式:sum(sal+comm)
over是一个关键字,用于标识分析函数,否则查询分析器不能区别sum()是聚合函数还是分析函数。
1.function子句
Oracle提供了26个分析函数,按功能分5类:
等级(ranking)函数:用于寻找前N种查询
开窗(windowing)函数:用于计算不同的累积,如sum,count,avg,min,max等,作用于数据的一个窗口上。
制表(reporting)函数:与开窗函数同名,作用于一个分区或一组上的所有列
开窗函数和制表函数的关键不同之处在于over语句上缺少一个order by子句。
lag,lead函数:这类函数允许在结果集中向前或向后检索值,为了避免数据的自连接,它们是非常有用的。
var_pop,var_samp,stdev_pope及线性的衰减函数:计算任何未排序分区的统计值
2.partition子句
按照表达式分区(就是分组),如果省略了分区子句,则全部的结果集被看做是一个单一的组。
3.order by子句
分析函数中order by的存在将添加一个默认的开窗子句,这意味着计算中所使用的行的集合是当前分区中当前行和前面所有行,没有order by时,默认的窗口是全部的分区,在order by子句后可以添加nulls last,如:order by comm desc null last表示排序时忽略comm列为空的行。
4.windowing子句
用于定义分析函数将在其上操作的行的集合
windowing子句给出了一个定义变化或固定的数据窗口的方法,分析函数将对这些数据进行操作
默认的窗口是一个固定的窗口,仅仅在一组的第一行开始,一直继续到当前行,要使用窗口,必须使用order by子句
根据2个标准可以建立窗口:数据值的范围(ranges)或与当前行的行偏移量
1)Range窗口:
Range 5 preceding将产生一个滑动窗口,它在组中拥有当前行的前5行集合。
Range窗口仅对numbers和dates起作用,因为不可能从varchar2中增加或减去N个单元
另外的限制是order by中只能有一列,因而范围实际上是一维的,不能在N维空间中
例:avg(t.sal) over(order by t.hiredate asc range 100 preceding) --统计前100天的平均工资
2)Row窗口
利用Row分区,就没有Range分区那样的限制了,数据可以是任何类型,且order by可以包括很多列
3)Specifying窗口
unbounded preceding:这个窗口从当前分区的每一行开始,并结束于正在处理的当前行
current row:该窗口从当前行开始(并结束)
numeric expression preceding:对该窗口从当前行之前的数据表达式的行开始,对range来说,从行序值小于数字表达式的当前行的值开始
numeric expression following:对该窗口从当前行之后的数据表达式的行终止(或开始),对range来说,从行序值大于数字表达式的当前行的方位开始(或终止)
range between 100 preceding and 100 following
等级函数:
例:按区域查找上一年度订单总额占区域订单总额20%以上的客户
select all_sales.*,
100 * round(cust_sales / region_sales, 2) || '%' Percent
from (select o.cust_nbr customer,
o.region_id region,
sum(o.tot_sales) cust_sales,
sum(sum(o.tot_sales)) over(partition by o.region_id) region_sales
from orders_tmp o
where o.year = 2001
group by o.region_id, o.cust_nbr) all_sales
where all_sales.cust_sales > all_sales.region_sales * 0.2;
①ROW_NUMBER:
Row_number函数返回一个唯一的值,当碰到相同数据时,排名按照记录集中记录的顺序依次递增。
②DENSE_RANK:
Dense_rank函数返回一个唯一的值,除非当碰到相同数据时,此时所有相同数据的排名都是一样的。
③RANK:
Rank函数返回一个唯一的值,除非遇到相同的数据时,此时所有相同数据的排名是一样的,同时会在最后一条相同记录和下一条不同记录的排名之间空出排名。
REGION_ID CUSTOMER_ID CUSTOMER_SALES
---------- ----------- --------------
5 1 151162
10 29 903383
6 7 971585
10 28 986964
9 21 1020541
9 22 1036146
8 16 1068467
6 8 1141638
5 3 1161286
5 5 1169926
8 19 1174421
7 12 1182275
7 11 1190421
6 10 1196748
6 9 1208959
10 30 1216858
5 2 1224992
9 24 1224992
9 23 1224992
8 18 1253840
7 15 1255591
7 13 1310434
10 27 1322747
8 20 1413722
6 6 1788836
10 26 1808949
5 4 1878275
7 14 1929774
8 17 1944281
9 25 2232703
SQL> select region_id,
customer_id,
sum(customer_sales) total,
rank() over(order by sum(customer_sales) desc) rank,
dense_rank() over(order by sum(customer_sales) desc) dense_rank,
row_number() over(order by sum(customer_sales) desc) row_number
from user_order
group by region_id, customer_id;
REGION_ID CUSTOMER_ID TOTAL RANK DENSE_RANK ROW_NUMBER
---------- ----------- ---------- ---------- ---------- ----------
......
8 18 1253840 11 11 11
5 2 1224992 12 12 12
9 23 1224992 12 12 13
9 24 1224992 12 12 14
10 30 1216858 15 13 15
.......
SQL> select region_id,
customer_id,
sum(customer_sales) total,
rank() over(partition by region_id order by sum(customer_sales) desc) rank,
dense_rank() over(partition by region_id order by sum(customer_sales) desc) dense_rank,
row_number() over(partition by region_id order by sum(customer_sales) desc) row_number
from user_order
group by region_id, customer_id;
REGION_ID CUSTOMER_ID TOTAL RANK DENSE_RANK ROW_NUMBER
---------- ----------- ---------- ---------- ---------- ----------
......
5 4 1878275 1 1 1
5 2 1224992 2 2 2
5 5 1169926 3 3 3
6 6 1788836 1 1 1
6 9 1208959 2 2 2
6 10 1196748 3 3 3
......
SQL> select region_id,
customer_id,
sum(customer_sales) cust_sales,
sum(sum(customer_sales)) over(partition by region_id) ran_total,
rank() over(partition by region_id order by sum(customer_sales) desc) rank
from user_order
group by region_id, customer_id;
REGION_ID CUSTOMER_ID CUST_SALES RAN_TOTAL RANK
---------- ----------- ---------- ---------- ----------
10 31 6238901 1
10 26 1808949 6238901 2
10 27 1322747 6238901 3
10 30 1216858 6238901 4
10 28 986964 6238901 5
10 29 903383 6238901 6
这里有一条记录的CUST_TOTAL字段值为NULL,居然排在第一名,显然是不符合情理。
SQL> select region_id,
customer_id,
sum(customer_sales) cust_total,
sum(sum(customer_sales)) over(partition by region_id) reg_total,
rank() over(partition by region_id order by sum(customer_sales) desc NULLS LAST) rank
from user_order
group by region_id, customer_id;
REGION_ID CUSTOMER_ID CUST_TOTAL REG_TOTAL RANK
---------- ----------- ---------- ---------- ----------
10 26 1808949 6238901 1
10 27 1322747 6238901 2
10 30 1216858 6238901 3
10 28 986964 6238901 4
10 29 903383 6238901 5
10 31 6238901 6
例:找出订单总额最多、最少的客户
SQL> select min(customer_id) keep (dense_rank first order by sum(customer_sales) desc) first,
min(customer_id) keep (dense_rank last order by sum(customer_sales) desc) last
from user_order
group by customer_id;
FIRST LAST
---------- ----------
31 1
min函数的作用是用于当存在多个First/Last情况下保证返回唯一的记录。
keep告诉Oracle只保留符合条件的记录。
dense_rank告诉Oracle排列的策略。
first/last告诉Oracle最终筛选的条件。
例:找出订单总额排名前1/5的客户
SQL> select region_id,
customer_id,
ntile(5) over(order by sum(customer_sales) desc) til
from user_order
group by region_id, customer_id;
REGION_ID CUSTOMER_ID TILE
---------- ----------- ----------
10 31 1
9 25 1
10 26 1
6 6 1
8 18 2
5 2 2
9 23 3
6 9 3
7 11 3
5 3 4
6 8 4
8 16 4
6 7 5
10 29 5
5 1 5
Ntil函数为各个记录在记录集中的排名计算比例,我们看到所有的记录被分成5个等级,那么假如我们只需要前1/5的记录则只需要截取TILE的值为1的记录就可以了。
窗口函数:
例:列出每月的订单总额以及截至到当前月的订单总额
SQL> select month,
sum(tot_sales) month_sales,
sum(sum(tot_sales)) over(order by month rows between unbounded preceding and current row) current_total_sales,
sum(sum(tot_sales)) over(order by month rows between unbounded preceding and unbounded following) total_sales
from orders
group by month;
MONTH MONTH_SALES CURRENT_TOTAL_SALES TOTAL_SALES
---------- ----------- ------------------- -----------
1 610697 610697 6307766
2 428676 1039373 6307766
3 637031 1676404 6307766
4 541146 2217550 6307766
5 592935 2810485 6307766
6 501485 3311970 6307766
7 606914 3918884 6307766
8 460520 4379404 6307766
9 392898 4772302 6307766
10 510117 5282419 6307766
11 532889 5815308 6307766
12 492458 6307766 6307766
例:当天销售额和五天内的平均销售额
SQL> select trunc(order_dt) day,
sum(sale_price) daily_sales,
avg(sum(sale_price)) over (order by trunc(order_dt) range between interval '2' day preceding and interval '2' day following) five_day_avg
from cust_order
where sale_price is not null
and order_dt between to_date('01-jul-2001','dd-mon-yyyy')
and to_date('31-jul-2001','dd-mon-yyyy')
例:显示当前月、上一个月、后一个月的销售情况,以及每3个月的销售平均值
SQL> select month,
first_value(sum(tot_sales)) over (order by month rows between 1 preceding and 1 following) prev_month,
sum(tot_sales) monthly_sales,
last_value (sum(tot_sales)) over (order by month rows between 1 preceding and 1 following) next_month,
avg(sum(tot_sales)) over (order by month rows between 1 preceding and 1 following) rolling_avg
from orders
where year = 2001
and region_id = 6
group by month
order by month;
例:每次显示当月的销售额和上个月的销售额
SQL> select month,
sum(tot_sales) monthly_sales,
lag(sum(tot_sales), 1) over (order by month) prev_month_sales
from orders
where year = 2001
and region_id = 6
group by month
order by month;
lag(sum(tot_sales),1)中的1表示以1月为基准。
制表函数:
SQL> select month,
sum(tot_sales) month_sales,
sum(sum(tot_sales)) over(order by month rows between unbounded preceding and unbounded following) win_sales,
sum(sum(tot_sales)) over() rpt_sales
from orders
group by month;
MONTH MONTH_SALES WINDOW_SALES REPORT_SALES
---------- ----------- ------------ ------------
1 610697 6307766 6307766
2 428676 6307766 6307766
3 637031 6307766 6307766
4 541146 6307766 6307766
5 592935 6307766 6307766
6 501485 6307766 6307766
7 606914 6307766 6307766
8 460520 6307766 6307766
9 392898 6307766 6307766
10 510117 6307766 6307766
11 532889 6307766 6307766
12 492458 6307766 6307766
over函数的空括号表示该记录集的所有记录都应该被列入统计的范围,如果使用了partition by则先分区,再依次统计各个分区。
SQL> select region_id,
salesperson_id,
sum(tot_sales) sp_sales,
round(ratio_to_report(sum(tot_sales)) over (partition by region_id), 2) sp_ratio
from orders
where year = 2001
group by region_id, salesperson_id
order by region_id, salesperson_id;
Oracle提供的Ratio_to_report函数允许我们计算每条记录在其对应记录集或其子集中所占的比例。