Abstract (translated):
Privacy-preserving machine learning algorithms are crucial for the increasingly common setting in which personal data, such as medical or financial records, are analyzed. We provide general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical risk minimization (ERM). These algorithms are private under the $\epsilon$-differential privacy definition due to Dwork et al. (2006). First we apply the output perturbation ideas of Dwork et al. (2006) to ERM classification. Then we propose a new method, objective perturbation, for the design of privacy-preserving machine learning algorithms: the objective function is perturbed before optimizing over classifiers. If the loss and regularizer satisfy certain convexity and differentiability criteria, we prove that our algorithms preserve privacy, and we give generalization bounds for linear and nonlinear kernels. We further present a privacy-preserving technique for tuning the parameters of general machine learning algorithms, thereby providing end-to-end privacy guarantees for the training process. We apply these results to produce privacy-preserving analogues of regularized logistic regression and support vector machines, and we evaluate their performance on real demographic and benchmark data sets, obtaining encouraging results. Our results show that, both theoretically and empirically, objective perturbation outperforms the previous state of the art, output perturbation, in managing the inherent tradeoff between privacy and learning performance.
---
English title:
《Differentially Private Empirical Risk Minimization》
---
Authors:
Kamalika Chaudhuri, Claire Monteleoni, Anand D. Sarwate
---
Latest submission year:
2011
---
Classification:
Primary: Computer Science
Secondary: Machine Learning
Description: Papers on all aspects of machine learning research (supervised, unsupervised, reinforcement learning, bandit problems, and so on) including also robustness, explanation, fairness, and methodology. cs.LG is also an appropriate primary category for applications of machine learning methods.
--
Primary: Computer Science
Secondary: Artificial Intelligence
Description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
--
Primary: Computer Science
Secondary: Cryptography and Security
Description: Covers all areas of cryptography and security including authentication, public key cryptosystems, proof-carrying code, etc. Roughly includes material in ACM Subject Classes D.4.6 and E.3.
--
Primary: Computer Science
Secondary: Databases
Description: Covers database management, data mining, and data processing. Roughly includes material in ACM Subject Classes E.2, E.5, H.0, H.2, and J.1.
--
---
English abstract:
Privacy-preserving machine learning algorithms are crucial for the increasingly common setting in which personal data, such as medical or financial records, are analyzed. We provide general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical risk minimization (ERM). These algorithms are private under the $\epsilon$-differential privacy definition due to Dwork et al. (2006). First we apply the output perturbation ideas of Dwork et al. (2006), to ERM classification. Then we propose a new method, objective perturbation, for privacy-preserving machine learning algorithm design. This method entails perturbing the objective function before optimizing over classifiers. If the loss and regularizer satisfy certain convexity and differentiability criteria, we prove theoretical results showing that our algorithms preserve privacy, and provide generalization bounds for linear and nonlinear kernels. We further present a privacy-preserving technique for tuning the parameters in general machine learning algorithms, thereby providing end-to-end privacy guarantees for the training process. We apply these results to produce privacy-preserving analogues of regularized logistic regression and support vector machines. We obtain encouraging results from evaluating their performance on real demographic and benchmark data sets. Our results show that both theoretically and empirically, objective perturbation is superior to the previous state-of-the-art, output perturbation, in managing the inherent tradeoff between privacy and learning performance.
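The objective perturbation idea summarized in the abstract (adding random noise to the training objective itself, rather than to the learned classifier) can be sketched for regularized logistic regression. The code below is a minimal illustrative sketch, not the paper's exact algorithm: the full method includes a curvature-dependent adjustment to the privacy parameter and conditions on the regularization constant, which are omitted here, and the function name and interface are our own.

```python
import numpy as np
from scipy.optimize import minimize

def objective_perturbation_logreg(X, y, eps, lam, rng):
    """Simplified sketch of objective perturbation for eps-differentially
    private L2-regularized logistic regression.

    X: (n, d) feature matrix, y: labels in {-1, +1}, eps: privacy parameter,
    lam: regularization constant, rng: numpy random Generator.
    """
    n, d = X.shape

    # Sample a noise vector b with density proportional to
    # exp(-(eps/2) * ||b||_2): a uniformly random direction scaled by a
    # Gamma(d, 2/eps)-distributed norm.
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    norm = rng.gamma(shape=d, scale=2.0 / eps)
    b = norm * direction

    def perturbed_objective(w):
        margins = y * (X @ w)
        loss = np.mean(np.logaddexp(0.0, -margins))  # logistic loss, stable form
        reg = 0.5 * lam * np.dot(w, w)               # L2 regularizer
        noise = np.dot(b, w) / n                     # linear perturbation term
        return loss + reg + noise

    # Minimize the perturbed objective; the minimizer is released as the
    # privacy-preserving classifier.
    result = minimize(perturbed_objective, np.zeros(d), method="L-BFGS-B")
    return result.x
```

Because the noise enters the objective as a single linear term, the perturbed problem remains convex and is solved with a standard optimizer; this is the structural property that lets objective perturbation add less effective noise than perturbing the output classifier directly.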
---
PDF link:
https://arxiv.org/pdf/0912.0071