# Hive分组取TOPN数据

0.113字数 515阅读 9674

# 1、ROW_NUMBER，RANK()，DENSE_RANK()

partition by：类似hive的建表，分区的意思；
order by ：排序，默认是升序，加desc降序；
rank：表示别名

## 1.1 案例

### 1.1.2导入数据

``````--执行下述语句
-----------------------------------------------
drop table datatable;
CREATE table datatable (
province string,
city string,
people int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

overwrite into table temp.datatable;
----------------------------------------------
--查看结果
hive (temp)> select * from datatable;
OK
province      city  people

``````

## 1.2 按照人口降序排列，生成衍生变量

``````select province,city,
rank() over (order by people desc) rank,
dense_rank() over (order by people desc) dense_rank,
row_number() over(order by people desc) row_number
from datatable
group by province,city,people;

--结果
province        city    rank    dense_rank      row_number

``````

row_number：顺序下来
rank:在遇到数据相同项时,会留下空位5,(第一列4,4,6)
dense_rank:在遇到数据相同项时,不会留下空位,(红框内第一列,4,4,5)

### 1.3 分组按照省份分区，再按照人口降序排列，生成衍生变量

``````select province,city,
rank() over (partition by province order by people desc) rank,
dense_rank() over (partition by province order by people desc) dense_rank,
row_number() over(partition by province order by people desc) row_number
from datatable
group by province,city,people;

--结果
province        city    rank    dense_rank      row_number

``````

# 2取TOPN数据

## 2.1 按照国家提取TOP3

### 2.1.2导入数据

``````--执行下述语句
-----------------------------------------------
drop table temp.tripdata;
CREATE table datatable (
country string,
city string,
Visitors int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

overwrite into table temp.tripdata;
----------------------------------------------

----------------------------------------------
--查看结果
hive (temp)> select * from tripdata;
country        city   visitors

``````
``````---格式：select 品牌,count/sum/其它() as num  from table_name order by num limit 10;

select country,city,visitors
from tripdata
order by visitors desc
limit 5;

country city    visitors

``````

## 2.2 按照国家、城市提取TOP3

``````--取top10品牌下各品牌的top10渠道 ,格式:
select
a.*
from
(
select 品牌,渠道,count/sum/其它() as num row_number() over (partition by 品牌 order by num desc ) rank
from table_name
where 品牌限制条件
group by 品牌,渠道
)a
where a.rank<=10
``````
``````select a.*
from (
select country,city,visitors, row_number() over (partition by country order by visitors desc ) rank
from tripdata
order by country,visitors desc
) a
where a.rank<=3;

--结果
a.country       a.city  a.visitors      a.rank

``````

## 2.3 按照国家提取TOP5

``````--取top10品牌下各品牌的top10渠道中各渠道的top10档期 ,格式:
select  a.*
from
(
select 品牌,渠道,档期,count/sum/其它() as num row_number() over (partition by 品牌,渠道 order by num desc ) rank
from table_name
where 品牌,渠道 限制条件
group by 品牌,渠道,档期
)a
where  a.rank<=10
``````
``````select a.*
from (
select country,city,visitors, row_number() over (partition by city order by visitors desc ) rank
from tripdata
order by country,city,visitors desc
) a
where a.rank<=3;

--结果
a.country       a.city  a.visitors      a.rank

``````