使用clickhouse实现开窗函数 row/rank_number 和 lag lead ROW_NUMBER实现 如何在ClickHouse中实现ROW_NUMBER OVER 和DENSE_RANK OVER等同效果的查询,它们在一些其他数据库中可用于RANK排序。 CH中并没有直接提供对应的开窗函数,需要利用一些特殊函数变相实现,主要会用到下面几个数组函数,它们分别是: arrayEnumerate arrayEnumerateDense arrayEnumerateUniq 这些函数均接受一个数组作为输入参数,并返回数组中素出现的位置,例如: SELECT arrayEnumerate([10, 20, 30, 10, 40]) AS row_number, arrayEnumerateDense([10, 20, 30, 10, 40]) AS dense_rank, arrayEnumerateUniq([10, 20, 30, 10, 40]) AS uniq_rank ┌─row_number──┬─dense_rank──┬─uniq_rank───┐ │ [1,2,3,4,5] │ [1,2,3,1,4] │ [1,1,1,2,1] │ └─────────────┴─────────────┴─────────────┘ 数据格式:
我们的目标是实现开窗查询: ROW_NUMBER() OVER( PARTITION BY id ORDER BY val )==>arrayEnumerate DENSE_RANK() OVER( PARTITION BY id ORDER BY val )==>arrayEnumerateDense UNIQ_RANK() OVER( PARTITION BY id ORDER BY val )==>arrayEnumerateUniq 代码如下 SELECT customer_id , groupArray(loan_dt) AS loan_dt, groupArray(ifnull(end_date,toDate(‘2099-12-31’))) AS end_date, groupArray(due_days) AS due_days, groupArray(loan_id) AS loan_id, arrayEnumerate(loan_id) AS row_number, arrayEnumerateDense(loan_id) AS dense_rank, arrayEnumerateUniq(loan_id) AS uniq_rank FROM ( SELECT * FROM res_report.xxx_loan ORDER BY loan_dt ,loan_id ) GROUP BY customer_id
数组展开,利用ARRAY JOIN将数组展开,并按照customer_id 、loan_id 列排序: SELECT customer_id ,loan_id ,loan_dt ,IF(end_date=toDate(‘2099-12-31’),null,end_date) as end_dt ,due_days ,row_number ,dense_rank ,uniq_rank from ( SELECT customer_id , groupArray(loan_dt) AS loan_dt, groupArray(ifnull(end_date,toDate(‘2099-12-31’))) AS end_date, groupArray(due_days) AS due_days, groupArray(loan_id) AS loan_id, arrayEnumerate(loan_id) AS row_number, arrayEnumerateDense(loan_id) AS dense_rank, arrayEnumerateUniq(loan_id) AS uniq_rank FROM ( SELECT * FROM res_report.xxx_loan ORDER BY loan_dt ,loan_id ) GROUP BY customer_id ) ARRAY JOIN loan_dt, loan_id, end_date, due_days, row_number, dense_rank, uniq_rank ORDER BY customer_id ASC, row_number ASC , dense_rank ASC
技巧:因为end_date可能为空值,会导致array长度不一致。报错。需要用特数值填充然后最后再转换回来。 lag/lead实现: neighbor(column, offset[, default_value]) The result of the function depends on the affected data blocks and the order of data in the block. If you make a subquery with ORDER BY and call the function from outside the subquery, you can get the expected result. Parameters column — A column name or scalar expression. offset — The number of rows forwards or backwards from the current row of column. Int64. default_value — Optional. The value to be returned if offset goes beyond the scope of the block. Type of data blocks affected. Returned values Value for column in offset distance from current row if offset value is not outside block bounds. Default value for column if offset value is outside block bounds. If default_value is given, then it will be used. Type: type of data blocks affected or default value type. 参考: https://clickhouse.tech/docs/en/sql-reference/functions/other-functions/ 代码如下 SELECT customer_id ,loan_id ,loan_dt ,IF(end_date=toDate(‘2099-12-31’),null,end_date) as end_dt ,due_days ,row_number ,dense_rank ,uniq_rank ,if(neighbor(row_number , 1)<>1,neighbor(loan_dt , 1),null) as lead_loan_dt ,if(row_number<>1,neighbor(end_dt, -1),null) as lag_end_dt from ( SELECT customer_id , groupArray(loan_dt) AS loan_dt, groupArray(ifnull(end_date,toDate(‘2099-12-31’))) AS end_date, groupArray(due_days) AS due_days, groupArray(loan_id) AS loan_id, groupArray(loan_tot) AS loan_tot, arrayEnumerate(loan_id) AS row_number, arrayEnumerateDense(loan_id) AS dense_rank, arrayEnumerateUniq(loan_id) AS uniq_rank FROM ( SELECT * FROM res_report.xxx_loan ORDER BY loan_dt ,loan_id ) GROUP BY customer_id ) ARRAY JOIN loan_dt, loan_id, end_date, due_days, row_number, dense_rank, uniq_rank ORDER BY customer_id ASC, row_number ASC , dense_rank ASC
发现有一点问题:就是最后一行lead的时候会出现异常值1970-01-01年值的问题。 SELECT customer_id ,loan_id ,loan_dt ,IF(end_date=toDate(‘2099-12-31’),null,end_date) as end_dt ,due_days ,row_number as row_num ,dense_rank ,uniq_rank ,if(neighbor(row_num , 1)>1,neighbor(loan_dt , 1),null) as lead_loan_dt ,if(row_num<>1,neighbor(end_dt, -1),null) as lag_end_dt from ( SELECT customer_id , groupArray(loan_dt) AS loan_dt, groupArray(ifnull(end_date,toDate(‘2099-12-31’))) AS end_date, groupArray(due_days) AS due_days, groupArray(loan_id) AS loan_id, arrayEnumerate(loan_id) AS row_number, arrayEnumerateDense(loan_id) AS dense_rank, arrayEnumerateUniq(loan_id) AS uniq_rank FROM ( SELECT * FROM res_report.ipeso_loan ORDER BY loan_dt ,loan_id ) GROUP BY customer_id ) ARRAY JOIN loan_dt, loan_id, end_date, due_days, row_number, dense_rank, uniq_rank ORDER BY customer_id ASC, row_number ASC , dense_rank ASC
2024最新激活全家桶教程,稳定运行到2099年,请移步至置顶文章:https://sigusoft.com/99576.html
版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请联系我们举报,一经查实,本站将立刻删除。 文章由激活谷谷主-小谷整理,转载请注明出处:https://sigusoft.com/77034.html