Abstract: Ensemble learning plays an important role in big data analysis. A major limitation is that multiple parties cannot share the knowledge extracted from their ensemble learning models with a privacy guarantee; there is therefore a strong demand for privacy-preserving collaborative ensemble learning. This paper proposes a privacy-preserving collaborative ensemble learning framework under differential privacy. In this framework, multiple parties independently build their local ensemble models with personalized privacy budgets, and collaboratively share their knowledge, with the help of a central agent, to obtain a stronger classifier in a privacy-preserving way. Under this framework, this paper presents differentially private versions of two widely used ensemble learning algorithms: collaborative random forests under differential privacy (CRFsDP) and collaborative adaptive boosting under differential privacy (CAdaBoostDP). Theoretical analysis and extensive experimental results show that the proposed framework achieves a good balance between privacy and utility in an efficient way.
Keywords: Ensemble learning, differential privacy, random forests, adaptive boosting