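Chengliang Chai is a tenure-track associate professor (special research fellow) and doctoral supervisor, and a recipient of the CCF Outstanding Doctoral Dissertation Award. He received his bachelor's degree from Harbin Institute of Technology in 2015 and his Ph.D. from Tsinghua University in 2020, and was a postdoctoral researcher at Tsinghua University from 2020 to 2022. He has published nearly 40 CCF A papers in venues including SIGMOD, VLDB, ICDE, KDD, TKDE, and VLDBJ. His honors include the CCF Outstanding Doctoral Dissertation Award (top 10 nationwide), the ACM China Outstanding Doctoral Dissertation Award (top 2 nationwide), selection for the Forbes China 30 Under 30 list, and the Baidu Scholarship (top 10 worldwide). In academic service, he is a guest editor of the SCI-indexed international journal JCST; a program committee member for leading international conferences including KDD, ICDE, VLDB, AAAI, and ICDCS; academic director of the CCF Advanced Disciplines Lectures (ADL); and an executive member of the CCF Technical Committee on Databases. He has given three-hour tutorials at SIGMOD 2021, KDD 2018, and ICDE 2019.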
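Research direction 1: Data-centric AI. In the AI era, algorithms, computing power, and data are the three indispensable elements. Existing research focuses mainly on AI algorithms, yet data matters just as much. This direction studies how to empower AI models, especially large language models, from the data perspective. Topics include data discovery and selection, data cleaning and integration, data labeling, and data lineage for large AI models.
Research direction 2: Data lakes and large models. In the era of multi-source heterogeneous big data, data lakes are widely used because they can efficiently store all kinds of data in its raw format, and the stored data can effectively support data analytics and AI algorithms. This direction studies how to index the data in a data lake, how to retrieve data efficiently to support large-model inference (retrieval-augmented generation, RAG), how to extract and use the knowledge contained in a data lake, and how to use large models to analyze multimodal data in the lake efficiently and accurately.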
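Updated December 2024: Currently recruiting 3-4 master's students (via the national postgraduate entrance examination) for 2025 admission. Each year the plan is to admit 3-4 master's students and 1 Ph.D. student.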
Artificial intelligence, data science, data lakes, database systems
* denotes the corresponding author
[1] Chengliang Chai, Yuhao Deng, Yutong Zhan, Ziqi Cao, Yuanfang Zhang, Lei Cao, Yu-Ping Wang, Zhiwei Zhang, Ye Yuan, Guoren Wang, Nan Tang. LakeCompass: An End-to-End System for Table Maintenance, Search and Analysis in Data Lakes. VLDB 2024 (CCF A).
[2] Chengliang Chai, Kaisen Jin, Nan Tang, Ju Fan, Lianpeng Qiao, Yuping Wang, Yuyu Luo, Ye Yuan, Guoren Wang. Mitigating Data Scarcity in Supervised Machine Learning Through Reinforcement Learning Guided Data Generation. ICDE 2024 (CCF A).
[3] Yuhao Deng, Chengliang Chai*, Lei Cao, Nan Tang, Jiayi Wang, Ju Fan, Ye Yuan, Guoren Wang. MisDetect: Iterative Mislabel Detection using Early Loss. VLDB 2024 (CCF A).
[4] Yuhao Deng, Chengliang Chai*, Lei Cao, Qin Yuan, Siyuan Chen, Yanrui Yu, Zhaoze Sun, Junyi Wang, Jiajun Li, Ziqi Cao, Kaisen Jin, Chi Zhang, Yuqing Jiang, Yuanfang Zhang, Yuping Wang, Ye Yuan, Guoren Wang, Nan Tang. LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes. VLDB 2024 (CCF A).
[5] Jiayi Wang, Chengliang Chai*, Jiabin Liu, Guoliang Li. Cardinality estimation using normalizing flow. VLDB 2024 (CCF A).
[6] Tan Tang, Chengliang Chai*, Dawei Zhao, Haohai Ma, Yong Zheng, Zhenyong Fan, Xin Wu, Jiaquan Zhang, Rui Zhang, Duanshun Li, Yi He, Keji Huang, Guangbin Meng, Yidong Wang, Yuefeng Zhou, Tao Tao, Lirong Jian, Jiwu Shu, Yuping Wang, Ye Yuan, Guoren Wang, Guoliang Li. Separation Is for Better Reunion: Data Lake Storage at Huawei. ICDE 2024 (CCF A).
[7] Chengliang Chai, Jiabin Liu, Nan Tang, Ju Fan, Dongjing Miao, Jiayi Wang, Yuyu Luo, Guoliang Li. GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data. SIGMOD 2023 (CCF A).
[8] Chengliang Chai, Jiayi Wang, Yuyu Luo, Zeping Niu, Guoliang Li. Data Management for Machine Learning: A Survey. TKDE 2023 (CCF A).
[9] Chengliang Chai, Jiayi Wang, Nan Tang, Ye Yuan, Jiabin Liu, Yuhao Deng, Guoren Wang. Efficient Coreset Selection with Cluster-based Methods. KDD 2023 (CCF A).
[10] Chengliang Chai, Nan Tang, Ju Fan, Yuyu Luo. Demystifying Artificial Intelligence for Data Preparation. SIGMOD 2023 (CCF A).
[11] Jiayi Wang, Chengliang Chai*, Nan Tang, Jiabin Liu, Guoliang Li. Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning. VLDB 2023 (CCF A).
[12] Yue Han, Chengliang Chai*, Jiabin Liu, Guoliang Li, Chuangxian Wei, Chaoqun Zhan. Dynamic Materialized View Management using Graph Neural Network. ICDE 2023 (CCF A).
[13] Chengliang Chai, Jiabin Liu, Nan Tang, Guoliang Li. Selective Data Acquisition in the Wild for Model Charging. VLDB 2022 (CCF A).
[14] Lixi Zhang, Chengliang Chai*, Xuanhe Zhou, Guoliang Li. LearnedSQLGen: Constraint-aware SQL Generation using Reinforcement Learning. SIGMOD 2022 (CCF A).
[15] Xiang Yu, Chengliang Chai*, Guoliang Li, Jiabin Liu. Cost-based or Learning-based? A Hybrid Query Optimizer for Query Plan Selection. VLDB 2022 (CCF A).
[16] Jiayi Wang, Chengliang Chai*, Jiabin Liu, Guoliang Li. FACE: A Normalizing Flow based Cardinality Estimator. VLDB 2022 (CCF A).
[17] Xuedi Qin, Chengliang Chai*, Nan Tang, Jian Li, Yuyu Luo, Guoliang Li, Yaoyu Zhu. Synthesizing Entity Resolution Datasets. ICDE 2022 (CCF A).
[18] Jiabin Liu, Chengliang Chai*, Yuyu Luo, Yin Lou, Jianhua Feng, Nan Tang. Feature Augmentation with Reinforcement Learning. ICDE 2022 (CCF A).
[19] Haowen Dong, Chengliang Chai*, Yuyu Luo, Jiabin Liu, Guoliang Li. RW-tree: A Learned Workload-aware Framework for R-tree Construction. ICDE 2022 (CCF A).
[20] Chengliang Chai, Guoliang Li, Ju Fan, et al. CrowdChart: Crowdsourced Data Extraction from Visualization Charts. TKDE 2021 (CCF A).
[21] Xuedi Qin, Chengliang Chai*, Yuyu Luo, Tianyu Zhao, Nan Tang, Guoliang Li, Xiang Yu, Mourad Ouzzani. Interactively Discovering and Ranking Desired Tuples by Data Exploration. VLDBJ 2021 (CCF A).
[22] Jiabin Liu, Fu Zhu, Chengliang Chai*, Yuyu Luo, Nan Tang. Automatic Data Acquisition for Deep Learning. VLDB 2021 (CCF A).
[23] Xuedi Qin, Chengliang Chai*, Yuyu Luo, Nan Tang, Guoliang Li. Ranking Desired Tuples by Database Exploration. ICDE 2021 (CCF A).
[24] Chengliang Chai, Lei Cao, Jian Li, Guoliang Li, Yuyu Luo, Samuel Madden. Human-in-the-loop Outlier Detection. SIGMOD 2020 (CCF A).
[25] Xuanhe Zhou, Chengliang Chai*, Guoliang Li, Ji Sun. DB Meets AI: A Survey. TKDE 2020 (CCF A).
[26] Yuyu Luo, Xuedi Qin, Chengliang Chai*, Nan Tang, Guoliang Li. Steerable Self-driving Data Visualization. TKDE 2020 (CCF A).
[27] Xuanhe Zhou, Chengliang Chai*, Guoliang Li, Ji Sun. Database Meets Artificial Intelligence: A Survey. TKDE 2020 (CCF A).
[28] Yuyu Luo, Chengliang Chai*, Xuedi Qin, Guoliang Li, Nan Tang. Interactive Cleaning for Progressive Visualization through Composite Questions. ICDE 2020 (CCF A).
[29] Chengliang Chai, Guoliang Li, Ju Fan, Yuyu Luo. Crowdsourcing Data Extraction from Visualization Chart. ICDE 2020 (CCF A).
[30] Chengliang Chai, Ju Fan, Guoliang Li, Jiannan Wang, Yudian Zheng. Crowdsourcing Database Systems: Overview and Challenges. ICDE 2019 (CCF A).
[31] Chengliang Chai, Ju Fan, Guoliang Li. Incentive-Based Entity Collection Using Crowdsourcing. ICDE 2018 (CCF A).
[32] Chengliang Chai, Guoliang Li, Jian Li, et al. A Partial-order-based Framework for Cost-effective Crowdsourced Entity Resolution. VLDB Journal 2018 (CCF A).
[33] Chengliang Chai, Guoliang Li, Jian Li, et al. Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach. SIGMOD 2016 (CCF A).
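[1] National Postdoctoral Program for Innovative Talents, "Data Preparation for Artificial Intelligence", 2021-2022, principal investigator;
[2] National Natural Science Foundation of China (NSFC) Young Scientists Fund, "Data Discovery for Machine Learning", 2022-2024, principal investigator;
[3] NSFC General Program, "Data Cleaning and Integration Based on Large Language Models", 2025-2028, principal investigator;
[4] Postdoctoral General Program, "Research on Intelligent Discovery Techniques for Relational Data", 2021-2022, principal investigator;
[5] NSFC Key Program, "Research on Key Technologies of Crowdsourcing Databases", 2017-2021, participant;
[6] Ministry of Science and Technology 973 Program, "Fundamental Theory and Key Technologies of Big Data Crowd Computing", 2015-2019, participant.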
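[1] CCF Outstanding Doctoral Dissertation Award
[2] ACM China Outstanding Doctoral Dissertation Award
[3] Forbes China 30 Under 30
[4] National Postdoctoral Program for Innovative Talents
[5] Baidu Scholarship
[6] State Grid Science and Technology Progress Award, First Prize
[7] Zhejiang Province Science and Technology Progress Award, Second Prize
[8] Zhejiang Lab International Young Talent Outstanding Achievement Award
[9] Tsinghua University Outstanding Postdoctoral Fellow
[10] Tsinghua University Outstanding Doctoral Graduate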
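Executive member, China Computer Federation (CCF) Technical Committee on Databases
Academic director, CCF Advanced Disciplines Lectures (ADL)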
Chair, DBML workshop at ICDE (a CCF A conference)
Chair, BDQM workshop at the CCF-ranked conference DASFAA
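Guest editor, JCST (a major international journal)
Program committee member, on multiple occasions, for conferences including VLDB, ICDE, KDD, and AAAI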