Abstract: MapReduce is a programming model for processing large data sets, and Hadoop is the most popular open-source implementation of MapReduce. To achieve high performance, up to 190 Hadoop configuration parameters must be manually tuned. This is not only time-consuming but also error-prone. In this paper, we propose a new performance model based on random forest, a recently developed machine-learning algorithm. The model, called RFMS, predicts the performance of a Hadoop system from the system's configuration parameters. RFMS is built from 2000 distinct fine-grained performance observations with different Hadoop configurations. We test RFMS against the measured performance of representative workloads from the Hadoop Micro-benchmark suite. The results show that the prediction accuracy of RFMS reaches 95% on average and up to 99%. This new, highly accurate prediction model can be used to automatically optimize the performance of Hadoop systems.
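To make the idea in the abstract concrete, here is a minimal sketch of a random-forest regressor that maps configuration values to a predicted job runtime. It is not the RFMS implementation, which the abstract does not describe in detail. The three parameters, the `synthetic_runtime` formula that stands in for measured runs, the forest settings, and the error metric (mean relative error) are all assumptions chosen for illustration.

```python
import random

# Assumption: three Hadoop 1.x parameters picked for illustration only;
# the abstract does not say which parameters RFMS actually uses.
PARAMS = ["io.sort.mb", "io.sort.factor", "mapred.reduce.tasks"]

def mean(xs):
    return sum(xs) / len(xs)

def sse(ys):
    """Sum of squared errors around the mean (split quality measure)."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def sample_config(rng):
    """Draw one random configuration, ordered as in PARAMS."""
    return [rng.randint(50, 400), rng.randint(5, 100), rng.randint(1, 32)]

def synthetic_runtime(cfg, rng):
    """Made-up runtime in seconds; a stand-in for a real measured job run."""
    sort_mb, sort_factor, reducers = cfg
    base = 600.0 / reducers + 4000.0 / sort_mb + 200.0 / sort_factor + 2.0 * reducers
    return base + rng.gauss(0, 2.0)

def build_tree(rows, ys, rng, depth=0, max_depth=8, min_leaf=5):
    """Grow a regression tree; a leaf is a float, a split is (feature, threshold, left, right)."""
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return mean(ys)
    best = None
    n_features = len(rows[0])
    # Random feature subset at each split: the "random" in random forest.
    for f in rng.sample(range(n_features), max(1, n_features - 1)):
        values = [r[f] for r in rows]
        for t in rng.sample(values, min(10, len(values))):
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return mean(ys)
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [ys[i] for i in li], rng, depth + 1, max_depth, min_leaf),
            build_tree([rows[i] for i in ri], [ys[i] for i in ri], rng, depth + 1, max_depth, min_leaf))

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def train_forest(rows, ys, n_trees, rng):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    forest = []
    n = len(rows)
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([rows[i] for i in idx], [ys[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return mean([predict_tree(tree, row) for tree in forest])

def run_demo(seed=0, n_train=400, n_test=100, n_trees=20):
    """Return (forest error, baseline error) as mean relative errors on held-out data."""
    rng = random.Random(seed)
    train_x = [sample_config(rng) for _ in range(n_train)]
    train_y = [synthetic_runtime(c, rng) for c in train_x]
    test_x = [sample_config(rng) for _ in range(n_test)]
    test_y = [synthetic_runtime(c, rng) for c in test_x]
    forest = train_forest(train_x, train_y, n_trees, rng)
    baseline = mean(train_y)  # naive model: always predict the average runtime
    forest_err = mean([abs(predict_forest(forest, x) - y) / y for x, y in zip(test_x, test_y)])
    baseline_err = mean([abs(baseline - y) / y for y in test_y])
    return forest_err, baseline_err

forest_err, baseline_err = run_demo()
print(f"forest mean relative error:   {forest_err:.3f}")
print(f"baseline mean relative error: {baseline_err:.3f}")
```

With a trained model like this, an optimizer could score many candidate configurations by calling `predict_forest` instead of running each job, which is the use case the abstract points to. In practice a library implementation such as scikit-learn's `RandomForestRegressor` would replace the hand-written trees.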
Original link: http://www.cqvip.com//QK/70429X/201302/46536953.html