# Hive窗口函数总结

``````CREATE TABLE lxy (cookieid INT, create_time STRING, pv INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
SELECT * FROM lxy;
``````

# SUM(), MIN(),MAX(),AVG()等聚合函数

``````SELECT *,
SUM(a.pv) OVER (PARTITION BY cookieid ORDER BY create_time ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv1,
SUM(a.pv) OVER (PARTITION BY cookieid ORDER BY create_time ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING) AS pv2
FROM lxy AS a;
``````

# 新增加序号列NTILE, ROW_NUMBER(), RANK(), DENSE_RANK()

``````SELECT *,
NTILE(3) OVER (PARTITION BY cookid2 ORDER BY pv) AS n1,
ROW_NUMBER() OVER (PARTITION BY cookid2 ORDER BY pv) AS n2,
RANK() OVER (PARTITION BY cookid2 ORDER BY pv) AS n3,
DENSE_RANK() OVER (PARTITION BY cookid2 ORDER BY pv) AS n4
FROM lxy3;
``````

lx3
``````SELECT *,
LAG(pv, 2) OVER(PARTITION BY cookid2 ORDER BY log_date) AS lag1,
FIRST_VALUE() OVER(PARTITION BY cookid2 ORDER BY log_date) AS first_pv,
FIRST_VALUE() OVER(PARTITION BY cookid2 ORDER BY log_date) AS last_pv,
LAST_VALUE() OVER(PARTITION BY cookid2 ORDER BY log_date) AS current_last_pv
FROM lxy3;
``````

``````SELECT *,
FIRST_VALUE() OVER(PARTITION BY cookid2 ORDER BY pv DESC) AS first_pv
FROM lxy3;
``````

# GROUPING SET, CUBE, ROLL UP

``````CREATE EXTERNAL TABLE lxw1234 (
month STRING,
day STRING,
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/user/chenlinlin2156233/lxy2/';
``````

``````SELECT * FROM lxw1234;
``````

GROUPING SET(key1, key2)相当于是对不同字段进行group操作以后，再进行union all的操作。

``````SELECT month,
day,
GROUPING__ID
FROM lxw1234
GROUP BY month, day
GROUPING SETS(month, day)
ORDER BY GROUPING__ID;
``````

1. GROUPING_ID是自动生成的，是进行了GROUPING_SET()的操作之后。
2. 下划线有两个
3. 需要先做GROUP BY操作再传入GROUPING SETS
等价于先group再union all的做法
``````SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM lxw1234 GROUP BY month
UNION ALL
SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM lxw1234 GROUP BY day
UNION ALL
SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM lxw1234 GROUP BY month,day
``````

CUBE就是比以上的GROUPING SETS多了一个两列的整合，也就是笛卡尔乘积。

``````SELECT month,
day,
GROUPING__ID
FROM lxw1234
GROUP BY month, day
WITH CUBE
ORDER BY GROUPING__ID;
``````

``````SELECT month,
day,
GROUPING__ID
FROM lxw1234
GROUP BY month, day
WITH ROLLUP
ORDER BY GROUPING__ID;
``````
rollup返回的结果

``````SELECT month,
day,
GROUPING__ID
FROM lxw1234
GROUP BY day, month
WITH ROLLUP
ORDER BY GROUPING__ID;
``````

# 参考资源

### 推荐阅读更多精彩内容

• 一：前言 根据官网的介绍，hive推出的窗口函数功能是对hive sql的功能增强，确实目前用于离线数据分析逻辑日...
愤怒的谜团阅读 17,763评论 0 11
• pyspark.sql模块 模块上下文 Spark SQL和DataFrames的重要类： pyspark.sql...
mpro阅读 7,850评论 0 11
• 这一周主要学习了 Hive 的一些基础知识，学习了多个 Hive 窗口函数，虽然感觉这些窗口函数没有实际的应用...
大石兄阅读 1,182评论 2 8
• pyspark.sql module Module context Spark SQL和DataFrames中的重...
盗梦者_56f2阅读 4,654评论 0 19
• 分析函数，也称为窗口函数，通常被认为仅对数据仓库SQL有用。使用分析函数的查询，基于对数据行的分组来计算总量值。与...
猫猫_tomluo阅读 2,417评论 3 18