Di Wang's Homepage
Chinese: 王帝
Al Khawarizmi Building 1, Room 4341
Division of CEMSE
King Abdullah University of Science and Technology
Thuwal, Saudi Arabia, 239556900
Email :
di.wang@kaust.edu.sa
Website: [
KAUST Personal][
Personal] [
Laboratory]
Tel: +966 (012) 8080645
I am currently an Assistant Professor of Computer Science in the Division of Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) at the King Abdullah University of Science and Technology (KAUST), start from Spring 2021. I am also the PI of PrivacyAwareness, Responsibility and Trustworthy (PART) Lab .
Before that, I got my Ph.D degree in Computer Science at
The State University of New York (SUNY) at Buffalo in 2020 under supervision of Dr. Jinhui Xu . Before my Ph.D study I took
my Master degree in Mathematics at
University of Western Ontario in 2015, and I received my Bachelor degree in Mathematics and Applied
Mathematics at Shandong University in 2014.
My most recent resume (last updated in November, 2021) can be found here.
Dissertation: Some Fundamental Machine Learning Problems in the Differential Privacy Model.
Current Openings: I am always looking for Postdocs, PhD students, internship and visiting students (all are fully funded). If you are interested in working with me, please send me your CV and transcripts before applying. See PhD and Postdoc for details.
Private Data Analytics: Differential privacy, privacypreserving machine learning/data mining and privacy attack in machine learning
Trustworthy Machine Learning: Robust statistics/estimation, interpretable machine learning, fairness in machine learning, adversarial machine learning
Statistical Learning Theory: high dimensional statistics, causal inference, statistical estimation, learning theory and quantum machine learning
Healthcare: Trustworthy issues in digital healthcare, biomedical imaging and bioinformatics

Instructor
 CS 229: Machine Learning, Spring 2022@KAUST
 CS 394S: Contemporary Topics in Computer Security: Differential Privacy, Fall 2021 @KAUST
 Short Course: Selected Topics in Private Machine Learning and Statistics, January 2021 @ECNU.
 CSE 474/574: Introduction to Machine Learning, Summer 2019 @SUNY at Buffalo.
 Teaching assistant:
 CSE 474/574 Introduction to Machine Learning, Spring 2018 @SUNY at Buffalo.
 CSE 431/531 Analysis of Algorithm, Fall 2017, Spring 2017, Fall 2016, Spring 2016 @SUNY at Buffalo.
 CSE 115 Introduction to Computer Science for Majors I, Fall 2015 @ @SUNY at Buffalo.
 MATH 1229A Methods of Matrix Algebra, Summer 2015, Spring 2015 @ UWO.
 MATH 1225B Methods of Calculus, Fall 2014 @ UWO.
Postdocs
 Yan Hu, 12/2021
 Sultan J. Majeed
PhD students
 Zihang Xiang (CS PhD), 01/2021
 Lijie Hu (CS PhD), 01/2021
 Yulian Wu (CS PhD), 09/2021
 Chenglong Wang (CS PhD), 09/2021
 Xiaochuan Gou (CS PhD), 09/2020 , Coadvised with Xiangliang Zhang
 Shaza Alawadi (AMCS PhD), 01/2022
Master Students
Visiting Students/Reaserch Interns
 Jinyan Su (Undergraduate at University of Electronic Science and Technology of China), 03/2021
 Hanpu Shen (Undergraduate at Southern University of Science and Technology), 05/202109/2021
 Yuan Qiu (Undergraduate at Sun Yatsen University), 09/202112/2021
 Tao Yang (Undergraduate at Nankai University), 09/202112/2021
 Binlan Wu (Master student at Technical University of Munich), 10/202101/2022

High Dimensional Differentially Private Stochastic Optimization with Heavytailed Data. [Link] Abstract▼
As one of the most fundamental problems in machine learning, statistics and differential privacy, Differentially Private Stochastic Convex Optimization (DPSCO) has been extensively studied in recent years. However, most of the previous work can only handle either regular data distribution or irregular data in the low dimensional space case. To better understand the challenges arising from irregular data distribution, in this paper we provide the first study on the problem of DPSCO with heavytailed data in the high dimensional space. In the first part we focus on the problem over some polytope constraint (such as the $\ell_1$norm ball). We show that if the loss function is smooth and its gradient has bounded second order moment, it is possible to get an error bound of $\tilde{O}(\frac{\log d}{(n\epsilon)^\frac{1}{3}})$ in the $\epsilon$DP model, where $n$ is the sample size and $d$ is the dimensionality of the underlying space. Next, for LASSO, if the data distribution that has bounded fourthorder moments, we improve the bound to $\tilde{O}(\frac{\log d}{(n\epsilon)^\frac{2}{5}})$ in the $(\epsilon, \delta)$DP model. In the second part of the paper, we study DPSCO for sparse learning with heavytailed data. We first revisit the sparse linear regression and propose a truncated DPIHT method whose output could achieve an error of $\tilde{O}(\frac{s^{*2}\log d}{n\epsilon})$, where $s^*$ is the sparsity of the underlying parameter. Then we study a more general problem over the sparsity ({\em i.e.,} $\ell_0$norm) constraint, and show that it is possible to achieve an error of $\tilde{O}(\frac{s^{*\frac{3}{2}}\log d}{n\epsilon})$, which is also near optimal up to a factor of $\tilde{O}{(\sqrt{s^*})}$, if the loss function is smooth and strongly convex.
Lijie Hu, Shuo Ni, Hanshen Xiao and Di Wang.
To appear in the 41st ACM SIGMODSIGACTSIGAI Symposium on Principles of Database Systems (PODS 2022)
 Estimating Smooth GLM in Noninteractive Local Differential Privacy Model with Public Unlabeled Data [Link] Abstract▼
In this paper, we study the problem of estimating smooth Generalized Linear Models (GLM) in the Noninteractive Local Differential Privacy (NLDP) model.
Different from its classical setting, our model allows the server to process
some additional public but unlabeled data.
We first show that there is an $(\epsilon, \delta)$NLDP algorithm for
GLM (under some mild assumptions), if each data record is i.i.d sampled from some subGaussian distribution with bounded $\ell_1$norm.
The sample complexity of both
public and private data, for the algorithm to achieve an $\alpha$ estimation error (in $\ell_\infty$norm), is $\tilde{O}(p^2\alpha^{2}\epsilon^{2})$ if $\alpha$ is not too small (i.e., $\alpha\geq \Omega(\frac{1}{\sqrt{p}})$), where $p$ is the dimensionality of the data. This is a significant improvement over the previously known quasipolynomial (in $\alpha$) or exponential (in $p$) complexity of convex GLM with no public data.
We then extend our idea to the nonlinear regression problem and show
a similar phenomenon for it.
Finally, we demonstrate the practicality of our algorithms through experiments on both synthethic and real world datasets.
To our best knowledge, this is the first paper showing the existence of efficient and practical
algorithms for GLM and nonlinear regression
in the NLDP model with public unlabeled data.
Di Wang^{*}, Huanyu Zhang^{*}, Marco Gaboardi and Jinhui Xu. (* equal contribution)
The 32nd International Conference on Algorithmic Learning Theory (ALT 2021).
 On Sparse Linear Regression in the Local Differential Privacy Model. [Link] Abstract▼
In this paper, we study the sparse linear regression problem in the Local Differential Privacy (LDP) model. We first show that polynomial dependency on the dimensionality $p$ of the space is unavoidable for the estimation error in both noninteractive and sequential interactive local models, if the privacy of the whole dataset needs to be preserved. Similar limitations also exist for other types of error measurements and in the relaxed local models. This indicates that differential privacy in high dimensional space is unlikely achievable for the problem. With the understanding of this limitation, we then present two algorithmic results. The first one is
a sequential interactive LDP algorithm for the low dimensional sparse case, called Locally Differentially Private Iterative Hard Thresholding (LDPIHT), which achieves a near optimal upper bound. This algorithm is actually rather general and can be used to solve quite a few other problems, such as (Local) DPERM with sparsity constraints and sparse regression with nonlinear measurements. The second one is for the restricted (high dimensional) case where only the privacy of the responses (labels) needs to be preserved. For this case,
we show that the optimal rate of the error estimation can be made logarithmically depending on $p$ (i.e., $\log p$) in the local model,
where an upper bound is obtained by a labelprivacy version of LDPIHT. Experiments on real world and synthetic datasets confirm our theoretical analysis.
Di Wang and Jinhui Xu.
IEEE Transactions on Information Theory, Volume 67, no. 2, Pages 11821200, Feb. 2021.
 Empirical Risk Minimization in the Noninteractive Local
Model of Differential Privacy. [Link] Abstract▼
In this paper, we study the Empirical Risk Minimization (ERM) problem in the noninteractive Local Differential Privacy (LDP) model. We first show that if the loss function is $(\infty, T)$smooth, by using the Bernstein polynomial approximation we can avoid a dependency of the sample complexity, to achieve error $\alpha$, on the exponential of the dimensionality $p$ with base $1/\alpha$ (i.e., $\alpha^{p}$).
This answers a question from (Smith et.al., 2017). Then, we propose playerefficient algorithms with $1$bit communication complexity and $O(1)$ computation cost for each player. The error bound of these algorithms is asymptotically the same as the original one. With some additional assumptions, we also give an algorithm which is more efficient for the server. Based on different types of polynomial approximations, we propose (efficient) noninteractive locally differential private algorithms for learning the set of kway marginal queries and the set of smooth queries. Moreover, we study the case of $1$Lipschitz generalized linear convex loss functions and show that there is an $(\epsilon, \delta)$LDP algorithm whose sample complexity for achieving error $\alpha$ is only linear in the dimensionality $p$ and quasipolynomial in other terms. To prove this, we first show that the conclusion holds for the hinge loss function. Then, we extend the result to any $1$Lipschitz generalized linear convex loss functions by showing that every such a function can be approximated by a linear combination of hinge loss functions and some linear functions. Our results use a polynomial of inner product approximation technique. Then we apply our technique to the Euclidean median problem and show that its sample complexity needs only to be quasipolynomial in $p$, which is the first result with a subexponential sample complexity in $p$ for nongeneralized linear loss functions.
Di Wang, Marco Gaboardi, Adam Smith and Jinhui Xu.
Journal of Machine Learning Research, Volume 21, 200 (2020), Pages 139.
 On Differentially Private Stochastic Convex Optimization with Heavytailed Data [Link] Abstract▼
In this paper, we consider the problem of designing Differentially Private (DP) algorithms for Stochastic Convex Optimization (SCO) on heavytailed data. The irregularity of such data violates some key assumptions used in almost all existing DPSCO and DPERM methods, resulting in failure to provide the DP guarantees. To better understand this type of challenges, we provide in this paper a comprehensive study of DPSCO under various settings. First, we consider the case where the loss function is strongly convex and smooth. For this case, we propose a method based on the sampleandaggregate framework, which has an excess population risk of $\tilde{O}(\frac{d^3}{n\epsilon^4})$ (after omitting other factors), where $n$ is the sample size and $d$ is the dimensionality of the data. Then, we show that with some additional assumptions on the loss functions, it is possible to reduce the \textit{expected} excess population risk to $\tilde{O}(\frac{ d^2}{ n\epsilon^2 })$. To lift these additional conditions, we also provide a gradient smoothing and trimming based scheme to achieve excess population risks of $\tilde{O}(\frac{ d^2}{n\epsilon^2})$ and $\tilde{O}(\frac{d^\frac{2}{3}}{(n\epsilon^2)^\frac{1}{3}})$ for strongly convex and general convex loss functions, respectively, \textit{with high probability}. Experiments on both synthetic and realworld datasets suggest that our algorithms can effectively deal with the challenges caused by data irregularity.
Di Wang^{*}, Hanshen Xiao^{*}, Srini Devadas and Jinhui Xu (* equal contribution).
The 37th International Conference on Machine Learning (ICML 2020).
 Facility Location Problem in Differential Privacy Model Revisited. [Link] Abstract▼
In this paper we study the uncapacitated facility location problem in the
model of differential privacy (DP) with uniform facility
cost. Specifically, we first show that, under the \emph{hierarchically wellseparated tree (HST) metrics} and the superset output setting that was introduced in (Gupta et al 2010), there is an $\epsilon$DP
algorithm that achieves an $O(\frac{1}{\epsilon})$
(expected multiplicative) approximation ratio; this implies an
$O(\frac{\log n}{\epsilon})$
approximation ratio for the general metric case, where $n$ is the size of the input metric. These bounds improve
the bestknown results given by (Gupta et al 2010). In particular, our approximation ratio for HSTmetrics is independent of $n$, and the ratio for general metrics is independent of the aspect ratio of the input metric. On the negative side, we show that the approximation ratio of any $\epsilon$DP algorithm is lower bounded by $\Omega(\frac{1}{\sqrt{\epsilon}})$, even for instances on HST metrics with uniform facility cost, under the superset output setting. The lower bound shows that the dependence of the approximation ratio for HST metrics on $\epsilon$ can not be removed or greatly improved. Our novel methods and techniques for both the upper and lower bound may find additional applications.
[alphabetic order] Yunus Esencayi, Marco Gaboardi, Shi Li and Di Wang
Conference on Neural Information Processing Systems (NIPS/NeurIPS), 2019.
 Differentially Private Empirical Risk Minimization with Nonconvex Loss Functions. [Link] Abstract▼
We study the problem of Empirical Risk Minimization (ERM) with (smooth) nonconvex loss functions under the differentialprivacy (DP) model. Existing approaches for this problem mainly adopt gradient norms to measure the error, which in general cannot guarantee the quality of the solution. To address this issue,
we first study the expected excess empirical (or population) risk, which was primarily used as the utility to measure the quality for convex loss functions. Specifically, we show that
the excess empirical (or population) risk can be upper bounded by $\tilde{O}(\frac{d\log (1/\delta)}{\log n\epsilon^2})$ in the $(\epsilon, \delta)$DP settings, where $n$ is the data size and $d$ is the dimensionality of the space.
The $\frac{1}{\log n}$ term in the empirical risk bound can be further improved to $\frac{1}{n^{\Omega(1)}}$ (when $d$ is a constant) by a highly nontrivial analysis on the timeaverage error.
To obtain more efficient solutions, we also consider the connection between achieving differential privacy and finding approximate local minimum.
Particularly, we show that when the size $n$ is large enough, there are $(\epsilon, \delta)$DP algorithms which can find an approximate local minimum of the empirical risk with high probability in both the constrained and nonconstrained settings.
These results indicate that one can escape saddle points privately.
Di Wang, Changyou Chen and Jinhui Xu.
The 36th International Conference on Machine Learning (ICML 2019).
 On Sparse Linear Regression in the Local Differential Privacy Model. [Link] Abstract▼
In this paper, we study the sparse linear regression problem under the Local Differential Privacy (LDP) model. We first show that polynomial dependency on the dimensionality $p$ of the space is unavoidable for the estimation error in both noninteractive and sequential interactive local models, if the privacy of the whole dataset needs to be preserved. Similar limitations also exist for other types of error measurements and in the relaxed local models. This indicates that differential privacy in high dimensional space is unlikely achievable for the problem. With the understanding of this limitation, we then present two algorithmic results. The first one is
a sequential interactive LDP algorithm for the low dimensional sparse case, called Locally Differentially Private Iterative Hard Thresholding (LDPIHT), which achieves a near optimal upper bound. This algorithm is actually rather general and can be used to solve quite a few other problems, such as (Local) DPERM with sparsity constraints and sparse regression with nonlinear measurements. The second one is for the restricted (high dimensional) case where only the privacy of the responses (labels) needs to be preserved. For this case,
we show that the optimal rate of the error estimation can be made logarithmically depending on $p$ (i.e., $\log p$) in the local model,
where an upper bound is obtained by a labelprivacy version of LDPIHT. Experiments on real world and synthetic datasets confirm our theoretical analysis.
Di Wang and Jinhui Xu.
The 36th International Conference on Machine Learning (ICML 2019).
Selected as Long Talk(Acceptance Rate: 140/3424= 4.1%) .
 Empirical Risk Minimization in Noninteractive Local Differential Privacy Revisited. [Link]
Abstract▼
In this paper, we revisit the Empirical Risk Minimization problem in the noninteractive local model of differential privacy. In the case of constant or low dimensions ($p\ll n$), we first show that if the loss function is $(\infty, T)$smooth, we can avoid a dependence of the sample complexity, to achieve error $\alpha$, on the exponential of the dimensionality $p$ with base $1/\alpha$ (i.e., $\alpha^{p}$),
which answers a question in (Smith et al 2017). Our approach is based on polynomial approximation. Then, we propose playerefficient algorithms with $1$bit communication complexity and $O(1)$ computation cost for each player. The error bound is asymptotically the same as the original one. With some additional assumptions, we also give an efficient algorithm for the server.
In the case of high dimensions ($n\ll p$),
we show that if the loss function is a convex generalized linear function, the error can be bounded by using the Gaussian width of the constrained set, instead of $p$, which improves the one in
Smith et al.
Our techniques can be extended to some related problems, such as $k$way marginal queries and smooth queries.
Di Wang, Marco Gaboardi and Jinhui Xu.
Conference on Neural Information Processing Systems (NIPS/NeurIPS), 2018.
 Differentially Private Empirical Risk Minimization Revisited: Faster and More General. [Link] Abstract▼
In this paper we study the differentially private Empirical Risk Minimization (ERM) problem in different settings. For smooth (strongly) convex loss function with or without (non)smooth regularization, we give algorithms that achieve either optimal or near optimal utility bounds with less gradient complexity compared with previous work. For ERM with smooth convex loss function in highdimensional ($p\gg n$) setting, we give an algorithm which achieves the upper bound with less gradient complexity than previous ones. At last, we generalize the expected excess empirical risk from convex loss functions to nonconvex ones satisfying the PolyakLojasiewicz condition and give a tighter upper bound on
the utility than the one in (Zhang et al 2017).
Di Wang, Minwei Ye and Jinhui Xu.
Conference on Neural
Information Processing Systems (NIPS/NeurIPS), 2017.

Byzantine Resilient and Private Machine Learning Revisited. Abstract▼
Zihang Xiang and Di Wang.

On Learning Halfspaces with Public Unlabeled Data in Noninteractive LDP Model. Abstract▼
Jinyan Su and Di Wang.

Privacypreserving Sparse Generalized Eigenvalue Problem. Abstract▼
Lijie Hu^{*}, Zihang Xiang^{*} and Di Wang. (* equal contribution)

On Differentially Private Stochastic Convex Optimization with Heavytailed Gradients. Abstract▼
Youming Tao, Yuian Wu, Xiuzhen Cheng and Di Wang.

High Dimensional Sparse Statistical Estimation Under Onebit Quantization. Abstract▼
In this paper, we consider several fundamental machine learning and (sparse) statistical estimation problems from i.i.d samples of a subGaussian or heavytailed distribution. Instead of having the full knowledge of the samples, we focus on the scenario where each entry of these the samples is quantized to one or two bits, which plays a critical role in signal processing or machine learning applications. Specifically, we give the first study on the problems of (sparse) covariance matrix estimation, sparse linear regression and lowrank matrix completion. For covariance matrix estimation, compared with the previous results, we extend to the heavytailed case where the underlying distribution has bounded fourth order moments. Moreover, we extend to the high dimensional sparse (and heavytailed) cases by providing several new estimators which consist of four steps: Truncating, Dithering, Quantizing and Thresholding. For sparse linear model and lowrank matrix completion, we propose new quadratic objective functions for onebit quantized samples based on our previous estimators. For all of our estimators we also derive estimation errors. Especially, in the subGaussian case, all of our estimators could achieve near optimal rates in the unquantized cases.
Finally, extensive experiments on synthetic and realworld data have been carried out to support our theories.
Junren Chen^{*}, ChengLong Wang^{*} and Di Wang. (* equal contribution)

Faster Rates of Differentially Private Stochastic Convex Optimization. Abstract▼
In this paper, we revisit the problem of Differentially Private Stochastic Convex Optimization (DPSCO) and provide excess population risks for some special classes of functions that are faster than the previous results of general convex and strongly convex functions. In the first part of the paper, we study the case where the population risk function satisfies the Tysbakov Noise Condition (TNC) with some parameter $\theta>1$. Specifically, we first show that under some mild assumptions on the loss functions, there is an algorithm whose output could achieve an upper bound of $\tilde{O}((\frac{1}{\sqrt{n}}+\frac{d}{n\epsilon})^\frac{\theta}{\theta1}) $ and $\tilde{O}((\frac{1}{\sqrt{n}}+\frac{\sqrt{d\log(1/\delta)}}{n\epsilon})^\frac{\theta}{\theta1})$ for $\epsilon$DP and $(\epsilon, \delta)$DP, respectively when $\theta\geq 2$, here $n$ is the sample size and $d$ is the dimension of the space. Then we address the inefficiency issue, improve the upper bounds by $\text{Poly}(\log n)$ factors and extend to the case where $\theta\geq \bar{\theta}>1$ for some known $\bar{\theta}$. Next we show that the excess population risk of population functions satisfying TNC with parameter $\theta\geq 2$ is always lower bounded by $\Omega((\frac{d}{n\epsilon})^\frac{\theta}{\theta1}) $ and $\Omega((\frac{\sqrt{d\log(1/\delta)}}{n\epsilon})^\frac{\theta}{\theta1})$ for $\epsilon$DP and $(\epsilon, \delta)$DP, respectively, which matches our upper bounds. In the second part, we focus on a special case where the population risk function is strongly convex. Unlike the previous studies, here we assume the loss function is {\em nonnegative} and {\em the optimal value of population risk is sufficiently small}. With these additional assumptions, we propose a new method whose output could achieve an upper bound of $O(\frac{d\log(1/\delta)}{n^2\epsilon^2}+\frac{1}{n^{\tau}})$ and $O(\frac{d^2}{n^2\epsilon^2}+\frac{1}{n^{\tau}})$ for any $\tau> 1$ in $(\epsilon,\delta)$DP and $\epsilon$DP model respectively if the sample size $n$ is sufficiently large. These results circumvent their corresponding lower bounds in \cite{feldman2020private} for general strongly convex functions. Finally, we conduct experiments of our new methods on real world data. Experimental results also provide new insights into established theories.
Jinyan Su and Di Wang.

Differentially Private Expectation Maximization Algorithm with Statistical Guarantees. [Link] Abstract▼
As one popular technique for estimating the maximum likelihood
of mixture models or incomplete data problems, (Gradient) Expectation Maximization (EM) algorithm presents a challenge for preserving privacy of sensitive data. While although there are already some Differentially Private (DP) variants of (Gradient) EM algorithm, however, unlike in the nonprivate case, there is still no finite sample statistical guarantees. To address this issue, in this paper we propose the first DP variant of (Gradient) EM algorithm with statistical guarantees. Moreover, we apply our general framework to three canonical models: Gaussian Mixture Model (GMM), Mixture of Regressions Model (MRM) and Linear Regression with Missing Covariates (RMC). Specifically, for GMM in the DP model, our estimation error is near optimal in some cases. And for other two models in the DP model, we provide the first finite sample statistical guarantees. Our theory is supported by thorough numerical results.
Di Wang^{*}, Jiahao Ding^{*}, Lijie Hu, Zejun Xie, Miao Pan and Jinhui Xu (* equal contribution).

Optimal Rates of (Locally) Differentially Private Heavytailed
MultiArmed Bandits. [Link] Abstract▼
In this paper we study the problem of stochastic multiarmed bandits (MAB) in the (local) differential privacy (DP/LDP) model. Unlike the previous results which need to assume bounded reward distributions, here we mainly focus on the case the reward distribution of each arm only has $(1+v)$th moment with some $v\in (0, 1]$. In the first part, we study the problem in the central $\epsilon$DP model. We first provide a nearoptimal result by developing a private and robust Upper Confidence Bound (UCB) algorithm. Then, we improve the result via a private and robust version of the Successive Elimination (SE) algorithm. Finally, we show that the instancedependent regret bound of our improved algorithm is optimal by showing its lower bound. In the second part of the paper, we study the problem in the $\epsilon$LDP model. We propose an algorithm which could be seen as locally private and robust version of the SE algorithm, and show it could achieve (near) optimal rates for both instancedependent and instanceindependent regrets. All of the above results can also reveal the differences between the problem of private MAB with bounded rewards and heavytailed rewards. To achieve these (near) optimal rates, we develop several new hard instances and private robust estimators as byproducts, which might could be used to other related problems. Finally, experimental results also support our theoretical analysis and show the effectiveness of our algorithms.
Youming Tao^{*}, Yulian Wu^{*}, Peng Zhao and Di Wang. (* equal contribution)

On Facility Location Problem in Local Differential Privacy Model. Abstract▼
This paper studies the facility location problem, where the given input is combined of (1) a metric space (V, d), (2) facility cost f_i for each facility, We study the uncapacitated facility location (UFL) problem under the constraints imposed by the local differential privacy (LDP). Recently, Gupta et al. (2009) and Esencayi et al. (2019) proposed lower and upper bounds for the UFL problem on the central differential privacy (DP) model where a curator first collects all data before being processed. In this paper, we focus on the LDP model, where we protect a client's participation in the facility location instance. Under the HST metric, we show that there is a noninteractive $\epsilon$LDP algorithm achieving $O(n^{1/4}/\epsilon^2)$approximation ratio, where n is the size of the metric. On the negative side, we show a lower bound of $\Omega(n^{1/4}/\sqrt{\epsilon})$ on the approximation ratio for any noninteractive $\epsilon$LDP algorithm. Thus, our results are tight up to a factor of $\epsilon$. Moreover, unlike previous results, our results generalize for nonuniform facility costs.
Yunus Esencayi, Chenglin Fan, Di Wang, Marco Gaboradi, Vincent CohenAddad and Shi Li.
 Towards Assessment of Randomized Mechanisms for
Certifying Adversarial Robustness. [Link] Abstract▼
As a certified defensive technique, randomized smoothing has received considerable attention due to its scalability to large datasets and neural networks. However, several important questions remain unanswered, such as (i) whether the Gaussian mechanism is an appropriate option for certifying $\ell_2$norm robustness, and (ii) whether there is an appropriate randomized mechanism to certify $\ell_\infty$norm robustness
for highdimensional datasets. To shed light on these questions, the main difficulty is how to assess each randomized mechanism. In this paper, we propose a generic framework, which connects the existing frameworks in (Lecuyer et al. 2018; Li et al. 2019), to assess randomized mechanisms.
Under our framework, for a mechanism which can certify a certain extent of robustness, we define the magnitude ({\em i.e.,} the expected $\ell_\infty$ norm) of the randomized noise it adds as the metric for assessing its appropriateness. We also derive lower bounds on the metric for $\ell_2$norm and $\ell_\infty$norm cases as the criteria for assessment.
Based on our framework, we assess the Gaussian and Exponential mechanisms by comparing the magnitude of noise added by these mechanisms and the corresponding criteria. We first conclude that the Gaussian mechanism is an appropriate option to certify $\ell_2$norm robustness. Moreover, surprisingly, we also show that the Gaussian mechanism is also an appropriate option for certifying $\ell_\infty$norm robustness, instead of the Exponential mechanism.
Finally, we verify our theoretical results by evaluations on CIFAR10 and ImageNet.
Tianhang Zheng^{*}, Di Wang^{*}, Baochun Li and Jinhui Xu (* equal contribution).