{"id":1010,"date":"2025-07-21T15:47:15","date_gmt":"2025-07-21T15:47:15","guid":{"rendered":"https:\/\/www.let-all.com\/blog\/?p=1010"},"modified":"2025-07-21T15:49:41","modified_gmt":"2025-07-21T15:49:41","slug":"testing-assumptions-of-learning-algorithms","status":"publish","type":"post","link":"https:\/\/www.let-all.com\/blog\/2025\/07\/21\/testing-assumptions-of-learning-algorithms\/","title":{"rendered":"Testing Assumptions of Learning Algorithms"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Today&#8217;s technical post is by <a href=\"https:\/\/www.vasilyan.net\/\">Arsen Vasilyan<\/a>. This focuses on the very exciting new &#8220;testable learning&#8221; he introduced with Rubinfeld in a <a href=\"https:\/\/arxiv.org\/abs\/2204.07196\">2023 paper<\/a>. There&#8217;s been a flurry of work since then, so this is a good chance to catch up in case you&#8217;re behind!<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">1. The Goal: Learning with Noisy Labels<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><b>1.1 Introduction <\/b><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Many learning algorithms simply stop working when some of the training examples are mislabeled. For example, suppose we are using a linear program to learn a linear classifier. Then, a single mislabeled data-point can make the linear program infeasible, and the algorithm fails to output a solution. Today, we will discuss how to design learning algorithms that are provably robust to such noise.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For noise-resistant learning algorithms, a key consideration emerges: what can these algorithms assume about how the data-points are distributed? We will discuss three theoretical frameworks addressing this question:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"\">Framework 1 &#8212; <em>the distribution-free framework<\/em> &#8212; makes no assumption on how data-points are generated. For every distribution over datapoints, the algorithm has to run efficiently and be robust to noise. This framework is very well-understood, but is computationally intractable. Even the simplest concept classes lead to computationally hard learning tasks in this framework.<\/li>\n\n\n\n<li class=\"\">Framework 2 &#8212; <em>the distribution-specific framework<\/em> &#8212; assumes that data-points come from some well-behaved distribution, such as the Gaussian distribution. Over the last 30 years, the distribution-specific framework has led to a wealth of provably efficient algorithms. However, these algorithms fail to be noise-resistant when their assumptions do not hold.<\/li>\n\n\n\n<li class=\"\">Framework 3 &#8212; <em>the testable framework<\/em> &#8212; relies on an assumption on the data-point distribution only for the run-time of an algorithm, but not for the algorithm&#8217;s robustness to noise. Even if data-points do not come from some well-behaved distribution, the algorithm should be robust to noise. However, in this case, the algorithm might take a long time to run.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">This blog post will mostly focus on the third framework, which was recently introduced in [<a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3564246.3585117\">RV &#8217;23<\/a>] in the context of learning with noise, and further explored in [<a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3564246.3585206\">GKK &#8217;23<\/a>, <a href=\"https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2023\/hash\/7c319b62e2257b34cb0e1040ced2e007-Abstract-Conference.html\">DKKLZ &#8217;23<\/a>, <a href=\"https:\/\/papers.nips.cc\/paper_files\/paper\/2023\/hash\/204d9a9a4816a45909010587ffc3204b-Abstract-Conference.html\">GKSV &#8217;23<\/a>, <a href=\"https:\/\/openreview.net\/forum?id=z6n1fKMMC1\">GKSV &#8217;24<\/a>, <a href=\"https:\/\/papers.nips.cc\/paper_files\/paper\/2024\/hash\/e209210eae282e23e305df49fbb2769c-Abstract-Conference.html\">GSSV &#8217;24<\/a>, <a href=\"https:\/\/proceedings.mlr.press\/v247\/diakonikolas24a.html\">DKLZ&#8217;24<\/a>, <a href=\"https:\/\/papers.nips.cc\/paper_files\/paper\/2024\/hash\/06e9029d3d4d6cee71c5d9b8502f891b-Abstract-Conference.html\">STW &#8217;24<\/a>, <a href=\"https:\/\/arxiv.org\/abs\/2501.09189\">GKSV &#8217;25<\/a>]. In brief, although Framework 3 is more stringent than Framework 2, there is a toolkit of techniques that we can use to upgrade many Framework 2 algorithms into Framework 3.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><b>1.2 FAQ on the Testable Framework (Framework 3) <\/b><\/h3>\n\n\n\n<div class=\"wp-block-group is-vertical is-layout-flex wp-container-core-group-is-layout-4fc3f8e1 wp-block-group-is-layout-flex\">\n<p class=\"wp-block-paragraph\">We get a lot of questions about Framework 3, and so we begin with the following FAQs:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li class=\"\"><b>Q:<\/b><em> Why is the framework called testable?<\/em> <br><b>A:<\/b> All current algorithms in Framework 3 follow the following general recipe for upgrading Framework 2 algorithms into Framework 3 &#8212; the procedure suggests the name <em>testable learning<\/em>. See Section <a href=\"#subsecMaking-Low-Degree-Algorithm\">2.2<\/a> for more about this recipe.\n<ul class=\"wp-block-list\">\n<li class=\"\">a) Run a Framework 2 algorithm and obtain a hypothesis <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%7D&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002\" alt=\"{h}\" class=\"latex\" \/>. <\/li>\n\n\n\n<li class=\"\">b) Run a tester that is guaranteed to output \u201caccept\u201d whenever the assumption holds. <\/li>\n\n\n\n<li class=\"\">c) If the tester accepted, return the hypothesis <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%7D&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002\" alt=\"{h}\" class=\"latex\" \/>. <\/li>\n\n\n\n<li class=\"\">d) Otherwise, the assumption must have been violated so we can run a computationally expensive algorithm (See Section <a href=\"#subsecThree-Frameworks-for\">1.3<\/a> for more about those). <\/li>\n<\/ul>\n<\/li>\n\n\n\n<li class=\"\"><b>Q:<\/b><em> To implement this tester, cant we just check that the distribution is &#8220;close&#8221; to a Gaussian using tools from distribution testing? <\/em><br><b>A:<\/b> For many notions of &#8220;closeness&#8221; this is statistically impossible: for example one cannot test whether an unknown distribution is Gaussian or far from Gaussian in total variation distance.<sup data-fn=\"15aafd02-c12c-4db6-b4a0-4f893efdd1a6\" class=\"fn\"><a href=\"#15aafd02-c12c-4db6-b4a0-4f893efdd1a6\" id=\"15aafd02-c12c-4db6-b4a0-4f893efdd1a6-link\">1<\/a><\/sup> For other notions of closeness, such as the earth-mover distance, these tools require a number of samples and run-time that is exponential in the dimension. In learning theory, we are looking for much faster run-times (see [<a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3564246.3585117\">RV &#8217;23<\/a>] for more discussion).<\/li>\n\n\n\n<li class=\"\"><b>Q:<\/b><em> What if the assumption is violated only slightly? I am concerned that this will cause a Framework 3 algorithm to run slowly as a result.<\/em> <br><b>A:<\/b> Many algorithms in Framework 3 can be modified to run fast whenever the assumption is only approximately satisfied in total variation distance [<a href=\"https:\/\/arxiv.org\/abs\/2501.09189\">GKSV &#8217;25<\/a>]. This setting is called <em>Tolerant Testable Learning<\/em>, and discussed more in Section <a href=\"#subsecAssumption-Tolerance-and-the\">3.2<\/a>.<\/li>\n\n\n\n<li class=\"\"><b>Q:<\/b><em> The algorithm is guaranteed to run fast for only one specific distribution.<\/em> <em>What if I want it to run fast for an entire family of distributions?<\/em> <br><b>A<\/b>: This setting, called <em>universal testable learning<\/em>, has also been studied. [<a href=\"https:\/\/papers.nips.cc\/paper_files\/paper\/2023\/hash\/204d9a9a4816a45909010587ffc3204b-Abstract-Conference.html\">GKSV &#8217;23<\/a>] studies it for the family of <em>log-concave distributions<\/em>, and it turns out to be intimately connected with Sum-of-Squares relaxations [<a href=\"https:\/\/arxiv.org\/abs\/1711.07465\">KS17<\/a>].<\/li>\n\n\n\n<li class=\"\"><b>Q:<\/b><em> Framework 3 is defined to work with learning under label noise (a.k.a. agnostic learning). What about (noise-free) PAC learning?<\/em> <br><b>A:<\/b> Frameworks 2 and 3 are essentially equivalent in the noise-free setting. In this setting, one can always tell whether the learning algorithm successfully produced a good hypothesis: just estimate its prediction error by drawing a fresh set of examples. If the prediction error is large, this must be because the assumption is violated, so Framework 3 does not require us to run fast. In this case, we can produce a classifier by running a computationally expensive algorithm (See Section <a href=\"#subsecThree-Frameworks-for\"><\/a><a href=\"#subsecThree-Frameworks-for\">1.3<\/a> for more about those).<\/li>\n\n\n\n<li class=\"\"><b>Q:<\/b><em> What is the sample complexity of this framework? Is it related to VC dimension?<\/em> <br><b>A: <\/b>[<a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3564246.3585206\">GKK &#8217;23<\/a>]<b> <\/b>have shown that the sample complexity of this model is characterized by the Rademacher complexity of the hypothesis class. However, in this blog post we will focus on the run-time rather than sample complexity.<\/li>\n<\/ol>\n<\/div>\n\n\n\n<p class=\"wp-block-paragraph\">We will now introduce the three frameworks in more detail, survey more of the known results and highlight general ideas for designing such algorithms.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><b>1.3 Three Frameworks for Learning with Noisy Labels<\/b><\/h3>\n<div id=\"subsecThree-Frameworks-for\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Let us be more formal now. <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/> will denote our hypothesis class, containing binary-valued functions over the domain <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathbb%7BR%7D%5E%7Bd%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathbb{R}^{d}}\" class=\"latex\" \/>. We get independent examples <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cleft%5C%7B+x_%7Bi%7D%5Cright%5C%7D+%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;left&#92;{ x_{i}&#92;right&#92;} }\" class=\"latex\" \/> from a distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%2C%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}},}\" class=\"latex\" \/> and each <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bx_%7Bi%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{x_{i}}\" class=\"latex\" \/> is labeled by<b> <\/b>some unknown binary-valued<b> <\/b>function<sup data-fn=\"30a9afd3-f30d-4c82-8879-cc2de2a0d64c\" class=\"fn\"><a href=\"#30a9afd3-f30d-4c82-8879-cc2de2a0d64c\" id=\"30a9afd3-f30d-4c82-8879-cc2de2a0d64c-link\">2<\/a><\/sup> <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bf_%7B%5Ctext%7Bnoisy+labels%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{f_{&#92;text{noisy labels}}}\" class=\"latex\" \/>. To model arbitrary label noise, we place no assumption on <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bf_%7B%5Ctext%7Bnoisy+labels%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{f_{&#92;text{noisy labels}}}\" class=\"latex\" \/>, and our goal is to get as close as possible to the smallest error among all functions <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bf%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{f}\" class=\"latex\" \/> from the class <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/>:<\/p>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Cdisplaystyle+%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%3A%3D%5Cmin_%7Bf%5Cin%5Cmathcal%7BC%7D%7D%5Cunderbrace%7B%5CPr_%7Bx%5Csim+D_%7B%5Ctext%7BExamples%7D%7D%7D%5Bf%28x%29%5Cneq+f_%7B%5Ctext%7Bnoisy+labels%7D%7D%28x%29%5D%7D_%7B%5Ctext%7BPrediction+error+of+%5Censuremath%7Bf%7D.%7D%7D.+&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"&#92;displaystyle &#92;text{opt}_{&#92;mathcal{C}}:=&#92;min_{f&#92;in&#92;mathcal{C}}&#92;underbrace{&#92;Pr_{x&#92;sim D_{&#92;text{Examples}}}[f(x)&#92;neq f_{&#92;text{noisy labels}}(x)]}_{&#92;text{Prediction error of &#92;ensuremath{f}.}}. \" class=\"latex\" \/><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For instance, <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bopt%7D_%7B%5Ctext%7BLinear+Classifiers%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{opt}_{&#92;text{Linear Classifiers}}}\" class=\"latex\" \/> denotes the smallest classification error achievable by any linear classifier.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Framework 1 [Distribution-free Framework]<\/b> For any distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/> and function <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bf_%7B%5Ctext%7Bnoisy+labels%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{f_{&#92;text{noisy labels}}}\" class=\"latex\" \/>, the learning algorithm should run in time <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BT%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{T}\" class=\"latex\" \/> and with high probability<sup data-fn=\"ebba8fcb-f5da-491f-bd59-92e53a33a4bc\" class=\"fn\"><a href=\"#ebba8fcb-f5da-491f-bd59-92e53a33a4bc\" id=\"ebba8fcb-f5da-491f-bd59-92e53a33a4bc-link\">3<\/a><\/sup> output a classifier <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h}\" class=\"latex\" \/> with<\/p>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Cdisplaystyle+%5Coverbrace%7B%5CPr_%7Bx%5Csim+D_%7B%5Ctext%7BExamples%7D%7D%7D%5Bh%28x%29%5Cneq+f_%7B%5Ctext%7Bnoisy+labels%7D%7D%28x%29%5D%7D%5E%7B%5Ctext%7BPrediction+error+of+%5Censuremath%7Bh%7D.%7D%7D%5Cleq%5Cunderbrace%7B%5Coverbrace%7B%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%7D%5E%7B%5Ctext%7BOptimal+prediction+error+in+%5Censuremath%7B%5Cmathcal%7BC%7D.%7D%7D%7D%2B%5Cvarepsilon%7D_%7B%5Ctext%7B%5Censuremath%7B%5Ctext%7B%5Ctext%7B%5Censuremath%7B%5Ctext%7BWeaker+guarantees%2C+such+as+O%28%5Censuremath%7B%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%7D%29%2B%5Censuremath%7B%5Cvarepsilon%7D%2C%5C+also+considered.%7D%7D%7D%7D%7D%7D%7D.+&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"&#92;displaystyle &#92;overbrace{&#92;Pr_{x&#92;sim D_{&#92;text{Examples}}}[h(x)&#92;neq f_{&#92;text{noisy labels}}(x)]}^{&#92;text{Prediction error of &#92;ensuremath{h}.}}&#92;leq&#92;underbrace{&#92;overbrace{&#92;text{opt}_{&#92;mathcal{C}}}^{&#92;text{Optimal prediction error in &#92;ensuremath{&#92;mathcal{C}.}}}+&#92;varepsilon}_{&#92;text{&#92;ensuremath{&#92;text{&#92;text{&#92;ensuremath{&#92;text{Weaker guarantees, such as O(&#92;ensuremath{&#92;text{opt}_{&#92;mathcal{C}}})+&#92;ensuremath{&#92;varepsilon},&#92; also considered.}}}}}}}. \" class=\"latex\" \/><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">This task, also known as <em>agnostic learning<\/em>, can be solved sample-efficiently. According to a central theorem in the VC theory, <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cwidetilde%7BO%7D%28%5Ctext%7BVC%7D%28%5Cmathcal%7BC%7D%29%2F%5Cvarepsilon%5E%7B2%7D%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;widetilde{O}(&#92;text{VC}(&#92;mathcal{C})\/&#92;varepsilon^{2})}\" class=\"latex\" \/> samples suffice (where <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7BVC%7D%28%5Cmathcal%7BC%7D%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{VC}(&#92;mathcal{C})}\" class=\"latex\" \/> is the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Vapnik%E2%80%93Chervonenkis_dimension\">VC dimension<\/a> of <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/>). For example, if <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/> is the class of linear classifiers, then <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7BVC%7D%28%5Cmathcal%7BC%7D%29%3Dd%2B1%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{VC}(&#92;mathcal{C})=d+1}\" class=\"latex\" \/>, so <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cwidetilde%7BO%7D%28d%2F%5Cvarepsilon%5E%7B2%7D%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;widetilde{O}(d\/&#92;varepsilon^{2})}\" class=\"latex\" \/> samples are enough.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">However, when considering the run-time <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BT%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{T}\" class=\"latex\" \/>, the Framework 1 is revealed to be intractable. For example, no <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B2%5E%7Bo%28d%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{2^{o(d)}}\" class=\"latex\" \/>-time algorithm is known in Framework 1 when <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/> is taken to be the class of linear classifiers (which is arguably the most basic hypothesis class). Furthermore, even the task of deciding if <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7B+%5Censuremath%7B%5Ctext%7Bopt%7D_%7B%5Ctext%7BLinear+Classifiers%7D%7D%7D%7D%5Cleq0.01%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{ &#92;ensuremath{&#92;text{opt}_{&#92;text{Linear Classifiers}}}}&#92;leq0.01}\" class=\"latex\" \/> or <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bopt%7D_%7B%5Ctext%7BLinear+Classifiers%7D%7D%5Cgeq0.49%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{opt}_{&#92;text{Linear Classifiers}}&#92;geq0.49}\" class=\"latex\" \/> is known to be NP-hard [<a href=\"https:\/\/ieeexplore.ieee.org\/document\/4031389\">GR&#8217;06<\/a>, <a href=\"https:\/\/ieeexplore.ieee.org\/document\/4031391\">FGKP&#8217;06<\/a>] for linear classifiers and believed to require <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B2%5E%7B%5COmega%28d%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{2^{&#92;Omega(d)}}\" class=\"latex\" \/> time. In summary, these Framework 1 tasks take <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cwidetilde%7BO%7D%28d%2F%5Cvarepsilon%5E%7B2%7D%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;widetilde{O}(d\/&#92;varepsilon^{2})}\" class=\"latex\" \/> samples but the run-time <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BT%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{T}\" class=\"latex\" \/> is <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B2%5E%7B%5COmega%28d%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{2^{&#92;Omega(d)}}\" class=\"latex\" \/>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">These intractability results hold for certain carefully-constructed worst-case choices of <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/>. What happens if <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/> is some specific commonly-occurring probability distribution? For instance, we may want to find a good linear classifier when the data-points come from a Gaussian distribution or some other well-behaved distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Assumption}}}\" class=\"latex\" \/>. This has motivated research in the following framework:<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Framework 2 [Distribution-Specific Framework]<\/b> For any distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/> and function <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bf_%7B%5Ctext%7Bnoisy+labels%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{f_{&#92;text{noisy labels}}}\" class=\"latex\" \/>, if <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%3DD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}=D_{&#92;text{Assumption}}}\" class=\"latex\" \/>, then the learning algorithm should run in time <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BT%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{T}\" class=\"latex\" \/> and with high probability output a classifier <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h}\" class=\"latex\" \/> with<\/p>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Cdisplaystyle+%5Coverbrace%7B%5CPr_%7Bx%5Csim+D_%7B%5Ctext%7BExamples%7D%7D%7D%5Bh%28x%29%5Cneq+f_%7B%5Ctext%7Bnoisy+labels%7D%7D%28x%29%5D%7D%5E%7B%5Ctext%7BPrediction+error+of+%5Censuremath%7Bh%7D.%7D%7D%5Cleq%5Cunderbrace%7B%5Coverbrace%7B%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%7D%5E%7B%5Ctext%7BOptimal+prediction+error+in+%5Censuremath%7B%5Cmathcal%7BC%7D.%7D%7D%7D%2B%5Cvarepsilon%7D_%7B%5Ctext%7B%5Censuremath%7B%5Ctext%7B%5Ctext%7B%5Censuremath%7B%5Ctext%7BWeaker+guarantees%2C+such+as+O%28%5Censuremath%7B%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%7D%29%2B%5Censuremath%7B%5Cvarepsilon%7D%2C%5C+also+considered.%7D%7D%7D%7D%7D%7D%7D.+&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"&#92;displaystyle &#92;overbrace{&#92;Pr_{x&#92;sim D_{&#92;text{Examples}}}[h(x)&#92;neq f_{&#92;text{noisy labels}}(x)]}^{&#92;text{Prediction error of &#92;ensuremath{h}.}}&#92;leq&#92;underbrace{&#92;overbrace{&#92;text{opt}_{&#92;mathcal{C}}}^{&#92;text{Optimal prediction error in &#92;ensuremath{&#92;mathcal{C}.}}}+&#92;varepsilon}_{&#92;text{&#92;ensuremath{&#92;text{&#92;text{&#92;ensuremath{&#92;text{Weaker guarantees, such as O(&#92;ensuremath{&#92;text{opt}_{&#92;mathcal{C}}})+&#92;ensuremath{&#92;varepsilon},&#92; also considered.}}}}}}}. \" class=\"latex\" \/><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">This framework is also often referred to as <em>distribution-specific agnostic learning<\/em>. For many natural choices of distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Assumption}}}\" class=\"latex\" \/>, the aforementioned hardness results for Framework 1 can be sidestepped in Framework 2. When the class <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/> is the class of linear classifiers and the distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Assumption}}}\" class=\"latex\" \/> is a Gaussian distribution, the algorithm of [<a href=\"https:\/\/epubs.siam.org\/doi\/10.1137\/060649057\">KKMS &#8217;08<\/a>, <a href=\"https:\/\/epubs.siam.org\/doi\/abs\/10.1137\/100783030\">DGJSV &#8217;10<\/a>] can achieve an error <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bopt%7D_%7B%5Ctext%7BLinear+Classifiers%7D%7D%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{opt}_{&#92;text{Linear Classifiers}}+&#92;varepsilon}\" class=\"latex\" \/> with run-time <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bd%5E%7B%5Ctilde%7BO%7D%281%2F%5Cvarepsilon%5E%7B2%7D%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{d^{&#92;tilde{O}(1\/&#92;varepsilon^{2})}}\" class=\"latex\" \/>, which is a lot better than the <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B2%5E%7BO%28d%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{2^{O(d)}}\" class=\"latex\" \/> run-time in Framework 1. This run-time is believed to be optimal [<a href=\"https:\/\/papers.nips.cc\/paper\/2020\/hash\/9d7311ba459f9e45ed746755a32dcd11-Abstract.html\">DKZ &#8217;20<\/a>, <a href=\"https:\/\/papers.nips.cc\/paper\/2020\/hash\/17257e81a344982579af1ae6415a7b8c-Abstract.html\">GGK &#8217;20<\/a>, <a href=\"https:\/\/proceedings.mlr.press\/v134\/diakonikolas21c.html\">DKPZ &#8217;21<\/a>, <a href=\"https:\/\/proceedings.mlr.press\/v202\/diakonikolas23b.html\">DKR &#8217;23<\/a>], but can be sped up to <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bpoly%7D%28d%2F%5Cvarepsilon%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{poly}(d\/&#92;varepsilon)}\" class=\"latex\" \/> if one is happy with a slightly larger error<sup data-fn=\"a367653b-0bd9-4068-ac43-a1ec745f36d5\" class=\"fn\"><a href=\"#a367653b-0bd9-4068-ac43-a1ec745f36d5\" id=\"a367653b-0bd9-4068-ac43-a1ec745f36d5-link\">4<\/a><\/sup> of <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BO%28%5Ctext%7Bopt%7D_%7B%5Ctext%7BLinear+Classifiers%7D%7D%29%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{O(&#92;text{opt}_{&#92;text{Linear Classifiers}})+&#92;varepsilon}\" class=\"latex\" \/> [<a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/2591796.2591839\">ABL &#8217;14<\/a>, <a href=\"https:\/\/proceedings.mlr.press\/v162\/diakonikolas22b.html\">DKTZ &#8217;22<\/a>].<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Yet, Framework 2 relies on the potentially unverifiable assumption that the distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/> equals <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Assumption}}}\" class=\"latex\" \/>. Note that it is impossible to verify whether <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/> is exactly equal to the Gaussian distribution. Unfortunately, when <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%5Cneq+D_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}&#92;neq D_{&#92;text{Assumption}}}\" class=\"latex\" \/>, the <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{opt}_{&#92;mathcal{C}}+&#92;varepsilon}\" class=\"latex\" \/> error guarantee of an algorithm in Framework 2 becomes null and void. Remember how a linear-programming approach can fully fail to find a linear classifier with even a single mislabeled example? We see that Framework 2 also permits an algorithm to fully fail when <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%5Cneq+D_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}&#92;neq D_{&#92;text{Assumption}}}\" class=\"latex\" \/>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This limitation is addressed by the following more challenging framework, in which the algorithm must <b>always<\/b> output a near-optimal classifier (with high probability, as in Framework 1) but must run quickly <strong>only <\/strong>when the assumption holds (as in Framework 2):<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Framework 3 [Testable Framework]<\/b> For any distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/> and function <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bf_%7B%5Ctext%7Bnoisy+labels%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{f_{&#92;text{noisy labels}}}\" class=\"latex\" \/>, the learning algorithm should with high probability output a classifier <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h}\" class=\"latex\" \/> with <a name=\"eq optimality guarantee.\"><\/a><\/p>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Cdisplaystyle+%5Coverbrace%7B%5CPr_%7Bx%5Csim+D_%7B%5Ctext%7BExamples%7D%7D%7D%5Bh%28x%29%5Cneq+f_%7B%5Ctext%7Bnoisy+labels%7D%7D%28x%29%5D%7D%5E%7B%5Ctext%7BPrediction+error+of+%5Censuremath%7Bh%7D.%7D%7D%5Cleq%5Cunderbrace%7B%5Coverbrace%7B%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%7D%5E%7B%5Ctext%7BOptimal+prediction+error+in+%5Censuremath%7B%5Cmathcal%7BC%7D.%7D%7D%7D%2B%5Cvarepsilon%7D_%7B%5Ctext%7B%5Censuremath%7B%5Ctext%7BWeaker+guarantees%2C+such+as+O%28%5Censuremath%7B%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%7D%29%2B%5Censuremath%7B%5Cvarepsilon%7D%2C%5C+also+considered.%7D%7D%7D%7D.+%5C+%5C+%5C+%5C+%5C+%281%29&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"&#92;displaystyle &#92;overbrace{&#92;Pr_{x&#92;sim D_{&#92;text{Examples}}}[h(x)&#92;neq f_{&#92;text{noisy labels}}(x)]}^{&#92;text{Prediction error of &#92;ensuremath{h}.}}&#92;leq&#92;underbrace{&#92;overbrace{&#92;text{opt}_{&#92;mathcal{C}}}^{&#92;text{Optimal prediction error in &#92;ensuremath{&#92;mathcal{C}.}}}+&#92;varepsilon}_{&#92;text{&#92;ensuremath{&#92;text{Weaker guarantees, such as O(&#92;ensuremath{&#92;text{opt}_{&#92;mathcal{C}}})+&#92;ensuremath{&#92;varepsilon},&#92; also considered.}}}}. &#92; &#92; &#92; &#92; &#92; (1)\" class=\"latex\" \/><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Furthermore, if <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%3DD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}=D_{&#92;text{Assumption}}}\" class=\"latex\" \/>, then with high probability the learning algorithm should run<sup data-fn=\"e18fc90b-639d-4ee0-bce9-3c77a7eaa096\" class=\"fn\"><a href=\"#e18fc90b-639d-4ee0-bce9-3c77a7eaa096\" id=\"e18fc90b-639d-4ee0-bce9-3c77a7eaa096-link\">5<\/a><\/sup> in time <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BT%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{T}\" class=\"latex\" \/>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">This framework is also often referred to as <em>testable learning<\/em> (for reasons explained in Section <a href=\"#subsecMaking-Low-Degree-Algorithm\"><\/a><a href=\"#subsecMaking-Low-Degree-Algorithm\">2.2<\/a>). When a Framework 3 algorithm produces a classifier <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h}\" class=\"latex\" \/>, the user can be confident that <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h}\" class=\"latex\" \/> is nearly-optimal (Equation <a href=\"#eq optimality guarantee.\">1<\/a>) whether or not <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%3DD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}=D_{&#92;text{Assumption}}}\" class=\"latex\" \/>, but the fast run-time is only guaranteed when <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%3DD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}=D_{&#92;text{Assumption}}}\" class=\"latex\" \/>.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1444\" height=\"698\" src=\"https:\/\/i0.wp.com\/www.let-all.com\/blog\/wp-content\/uploads\/2025\/07\/2.png?fit=678%2C328&amp;ssl=1\" alt=\"\" class=\"wp-image-1039\" srcset=\"https:\/\/i0.wp.com\/www.let-all.com\/blog\/wp-content\/uploads\/2025\/07\/2.png?w=1444&amp;ssl=1 1444w, https:\/\/i0.wp.com\/www.let-all.com\/blog\/wp-content\/uploads\/2025\/07\/2.png?resize=300%2C145&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.let-all.com\/blog\/wp-content\/uploads\/2025\/07\/2.png?resize=1024%2C495&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.let-all.com\/blog\/wp-content\/uploads\/2025\/07\/2.png?resize=768%2C371&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.let-all.com\/blog\/wp-content\/uploads\/2025\/07\/2.png?w=1356&amp;ssl=1 1356w\" sizes=\"auto, (max-width: 678px) 100vw, 678px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Even though Framework 3 requires more from the learning algorithm than Framework 2, a growing body of research demonstrates that Framework 3 is often as computationally tractable as Framework 2. This research direction was initiated in [<a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3564246.3585117\">RV &#8217;23<\/a>] which mainly considered linear classifiers under the Gaussian distribution, and was subsequently extended to many more hypothesis classes and distributions <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Assumption}}}\" class=\"latex\" \/> [<a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3564246.3585206\">GKK &#8217;23<\/a>, <a href=\"https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2023\/hash\/7c319b62e2257b34cb0e1040ced2e007-Abstract-Conference.html\">DKKLZ &#8217;23<\/a>, <a href=\"https:\/\/papers.nips.cc\/paper_files\/paper\/2023\/hash\/204d9a9a4816a45909010587ffc3204b-Abstract-Conference.html\">GKSV &#8217;23<\/a>, <a href=\"https:\/\/openreview.net\/forum?id=z6n1fKMMC1\">GKSV &#8217;24<\/a>, <a href=\"https:\/\/papers.nips.cc\/paper_files\/paper\/2024\/hash\/e209210eae282e23e305df49fbb2769c-Abstract-Conference.html\">GSSV &#8217;24<\/a>, <a href=\"https:\/\/proceedings.mlr.press\/v247\/diakonikolas24a.html\">DKLZ&#8217;24<\/a>, <a href=\"https:\/\/papers.nips.cc\/paper_files\/paper\/2024\/hash\/06e9029d3d4d6cee71c5d9b8502f891b-Abstract-Conference.html\">STW &#8217;24<\/a>, <a href=\"https:\/\/arxiv.org\/abs\/2501.09189\">GKSV &#8217;25<\/a>]. In this blog post, we will survey this research direction and explore some of the techniques used to design such algorithms.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><b>2. How to Modify Framework 2 Algorithms to Work in Framework 3. <\/b><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">In this section we will discuss a general recipe for converting Framework 2 algorithms into Framework 3 by augmenting them with an appropriate <em>tester<\/em>. As a case study, we will focus on one of the most widely-studied Framework 2 algorithms: the <em>Low-Degree Algorithm<\/em>. We will see how it can be upgraded into Framework 3 using the <em>Moment-Matching Tester<\/em>. Later, in Section 3, we will see how a similar recipe can be used for other algorithms as well.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><b>2.1 What is an Example of Framework 2 Algorithm? The Low-Degree Algorithm. <\/b><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">We begin by examining the following Framework 2 algorithm.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Degree-k Low-Degree Algorithm [<a href=\"https:\/\/epubs.siam.org\/doi\/10.1137\/060649057\">KKMS &#8217;08<\/a>]:<\/b><\/p>\n\n\n\n<div class=\"wp-block-group is-vertical is-layout-flex wp-container-core-group-is-layout-4fc3f8e1 wp-block-group-is-layout-flex\">\n<div class=\"wp-block-group is-vertical is-layout-flex wp-container-core-group-is-layout-4fc3f8e1 wp-block-group-is-layout-flex\">\n<p class=\"wp-block-paragraph\">Given: access to independent Gaussian examples labeled by unknown function <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bf_%7B%5Ctext%7Bnoisy+labels%7D%7D%3A%5Cmathbb%7BR%7D%5E%7Bd%7D%5Crightarrow%5Cleft%5C%7B+%5Cpm1%5Cright%5C%7D+%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{f_{&#92;text{noisy labels}}:&#92;mathbb{R}^{d}&#92;rightarrow&#92;left&#92;{ &#92;pm1&#92;right&#92;} }\" class=\"latex\" \/> and a parameter <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bk%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{k}\" class=\"latex\" \/>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Output: hypothesis <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%3A%5Cmathbb%7BR%7D%5E%7Bd%7D%5Crightarrow%5Cleft%5C%7B+%5Cpm1%5Cright%5C%7D+%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h:&#92;mathbb{R}^{d}&#92;rightarrow&#92;left&#92;{ &#92;pm1&#92;right&#92;} }\" class=\"latex\" \/>.<\/p>\n<\/div>\n\n\n\n<ol class=\"wp-block-list\">\n<li class=\"\"><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BS%5Cleftarrow%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{S&#92;leftarrow}\" class=\"latex\" \/> {<img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bd%5E%7Bk%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{d^{k}}\" class=\"latex\" \/> Gaussian examples labeled by function <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bf_%7B%5Ctext%7Bnoisy+labels%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{f_{&#92;text{noisy labels}}}\" class=\"latex\" \/>}.<\/li>\n\n\n\n<li class=\"\">Compute<br><p align=\"center\"><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Cdisplaystyle+p%5E%7B%2A%7D%5Cleftarrow%5Cunderset%7B%5Cboldsymbol%7B%5Ctext%7Bdegree%7D%7D-k%5Ctext%7B+polynomial+%7Dp%7D%7B%5Ctext%7Bargmin%7D%7D%5Cleft%5B%5Cmathbb%7BE%7D_%7B%28x_%7Bi%7D%2Cf_%7B%5Ctext%7Bnoisy+labels%7D%7D%28x_%7Bi%7D%29%29%5Csim+S%7D%7Cp%28x_%7Bi%7D%29-f_%7B%5Ctext%7Bnoisy+labels%7D%7D%28x_%7Bi%7D%29%7C%5Cright%5D+&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"&#92;displaystyle p^{*}&#92;leftarrow&#92;underset{&#92;boldsymbol{&#92;text{degree}}-k&#92;text{ polynomial }p}{&#92;text{argmin}}&#92;left[&#92;mathbb{E}_{(x_{i},f_{&#92;text{noisy labels}}(x_{i}))&#92;sim S}|p(x_{i})-f_{&#92;text{noisy labels}}(x_{i})|&#92;right] \" class=\"latex\" \/><\/p><p>by solving a linear program.<\/p><\/li>\n\n\n\n<li class=\"\">Output hypothesis <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h}\" class=\"latex\" \/> mapping <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bx%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{x}\" class=\"latex\" \/> in <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathbb%7BR%7D%5E%7Bd%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathbb{R}^{d}}\" class=\"latex\" \/> to <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bsign%28%5Censuremath%7Bp%5E%7B%2A%7D%7D%28x%29%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{sign(&#92;ensuremath{p^{*}}(x))}}\" class=\"latex\" \/>.<\/li>\n<\/ol>\n<\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">It can be shown [<a href=\"https:\/\/epubs.siam.org\/doi\/10.1137\/060649057\">KKMS &#8217;08<\/a>, <a href=\"https:\/\/epubs.siam.org\/doi\/abs\/10.1137\/100783030\">DGJSV &#8217;10<\/a>] that, for <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bk%3D%5Ctilde%7BO%7D%281%2F%5Cvarepsilon%5E%7B2%7D%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{k=&#92;tilde{O}(1\/&#92;varepsilon^{2})}\" class=\"latex\" \/>, the resulting classifier <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h}\" class=\"latex\" \/> will achieve prediction error <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7BO%28%5Censuremath%7B%5Ctext%7Bopt%7D_%7B%5Ctext%7BLinear+Classifiers%7D%7D%7D%7D%29%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{O(&#92;ensuremath{&#92;text{opt}_{&#92;text{Linear Classifiers}}}})+&#92;varepsilon}\" class=\"latex\" \/>. Furthermore, with a slightly more sophisticated version of step 3, the prediction error improves to <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7B%5Censuremath%7B%5Ctext%7Bopt%7D_%7B%5Ctext%7BLinear+Classifiers%7D%7D%7D%7D%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{&#92;ensuremath{&#92;text{opt}_{&#92;text{Linear Classifiers}}}}+&#92;varepsilon}\" class=\"latex\" \/> [<a href=\"https:\/\/epubs.siam.org\/doi\/10.1137\/060649057\">KKMS &#8217;08<\/a>, <a href=\"https:\/\/epubs.siam.org\/doi\/abs\/10.1137\/100783030\">DGJSV &#8217;10<\/a>]. The run-time is <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bd%5E%7BO%28k%29%7D%3Dd%5E%7B%5Ctilde%7BO%7D%281%2F%5Cvarepsilon%5E%7B2%7D%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{d^{O(k)}=d^{&#92;tilde{O}(1\/&#92;varepsilon^{2})}}\" class=\"latex\" \/>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This approach also yields a classifier <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h}\" class=\"latex\" \/> with error <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7B%5Censuremath%7B%5Ctext%7Bopt%7D_%7B%5Ctext%7B%5Censuremath%7B%5Cmathcal%7BC%7D%7D%7D%7D%7D%7D%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{&#92;ensuremath{&#92;text{opt}_{&#92;text{&#92;ensuremath{&#92;mathcal{C}}}}}}+&#92;varepsilon}\" class=\"latex\" \/> for many other hypothesis classes <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/>, such as ANDs of linear classifiers or indicators of convex sets in <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathbb%7BR%7D%5E%7Bd%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathbb{R}^{d}}\" class=\"latex\" \/> [<a href=\"https:\/\/ieeexplore.ieee.org\/document\/4690987\">KOS &#8217;08<\/a>]. The same approach can even be used when the class <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/> is constant-depth circuits and the distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/> is uniform over binary bit-strings [<a href=\"https:\/\/epubs.siam.org\/doi\/10.1137\/060649057\">KKMS &#8217;08<\/a>]. To achieve all these results, we only need to run the algorithm above with a larger value of the degree parameter <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bk%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{k}\" class=\"latex\" \/>. The run-time <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bd%5E%7BO%28k%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{d^{O(k)}}\" class=\"latex\" \/> will increase, but as long as we insist on error bound of <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{opt}_{&#92;mathcal{C}}+&#92;varepsilon}\" class=\"latex\" \/> in Framework 2, this approach is known to be optimal for many hypothesis classes <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/> [<a href=\"https:\/\/epubs.siam.org\/doi\/abs\/10.1137\/1.9781611973730.34\">DFTWW&#8217;15<\/a>, <a href=\"https:\/\/papers.nips.cc\/paper\/2020\/hash\/9d7311ba459f9e45ed746755a32dcd11-Abstract.html\">DKZ &#8217;20<\/a>, <a href=\"https:\/\/papers.nips.cc\/paper\/2020\/hash\/17257e81a344982579af1ae6415a7b8c-Abstract.html\">GGK &#8217;20<\/a>, <a href=\"https:\/\/proceedings.mlr.press\/v134\/diakonikolas21c.html\">DKPZ &#8217;21<\/a>, <a href=\"https:\/\/proceedings.mlr.press\/v202\/diakonikolas23b.html\">DKR &#8217;23<\/a>]. (Additionally, in order to achieve error <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7B%5Censuremath%7B%5Ctext%7Bopt%7D_%7B%5Ctext%7B%5Censuremath%7B%5Cmathcal%7BC%7D%7D%7D%7D%7D%7D%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{&#92;ensuremath{&#92;text{opt}_{&#92;text{&#92;ensuremath{&#92;mathcal{C}}}}}}+&#92;varepsilon}\" class=\"latex\" \/> rather than <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7BO%28%5Censuremath%7B%5Ctext%7Bopt%7D_%7B%5Ctext%7B%5Censuremath%7B%5Cmathcal%7BC%7D%7D%7D%7D%7D%29%7D%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{O(&#92;ensuremath{&#92;text{opt}_{&#92;text{&#92;ensuremath{&#92;mathcal{C}}}}})}+&#92;varepsilon}\" class=\"latex\" \/> one needs to replace step 3 with a slightly more sophisticated version of this step [<a href=\"https:\/\/epubs.siam.org\/doi\/10.1137\/060649057\">KKMS &#8217;08<\/a>]. From now on, we will make a tacit assumption that this improved version of step 3 is used.)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><b>2.2 Converting the Low-Degree Algorithm into Framework 3 via the Moment-Matching Test. <\/b><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><a name=\"subsecMaking-Low-Degree-Algorithm\"><\/a>For a wide range of concept classes <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/>, you can get Framework 3 algorithms by combining the Low-Degree Algorithm with a suitable tester <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BT%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{T}}\" class=\"latex\" \/>. Here is a general recipe to extend Framework 2 algorithms so they run in Framework 3:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li class=\"\">Run<sup data-fn=\"f0bd6d70-92ae-44be-9cd5-09cb8b45e961\" class=\"fn\"><a href=\"#f0bd6d70-92ae-44be-9cd5-09cb8b45e961\" id=\"f0bd6d70-92ae-44be-9cd5-09cb8b45e961-link\">6<\/a><\/sup> an algorithm <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BA%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{A}}\" class=\"latex\" \/> for class <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/> in Framework 2 to get a classifier <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh.%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h.}\" class=\"latex\" \/><\/li>\n\n\n\n<li class=\"\">Run some algorithm <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BT%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{T}}\" class=\"latex\" \/>, which we will call the <em>tester<\/em>, that accesses <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/> and outputs Accept or Reject.<\/li>\n\n\n\n<li class=\"\">If <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BT%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{T}}\" class=\"latex\" \/> accepts, then return the classifier <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h}\" class=\"latex\" \/> from step (1).<\/li>\n\n\n\n<li class=\"\">If <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BT%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{T}}\" class=\"latex\" \/> rejects, then run a (potentially very slow) algorithm <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BA%7D%27%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{A}&#039;}\" class=\"latex\" \/> for class <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/> in Framework 1. (See Section <a href=\"#subsecThree-Frameworks-for\">1.3<\/a> for more about those)<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">Let us compare Framework 2 and Framework 3, to see what specifications the tester <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BT%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{T}}\" class=\"latex\" \/> needs to meet to ensure that the procedure above satisfies Framework 3. We see that it would suffice if algorithm <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BT%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{T}}\" class=\"latex\" \/> accepted whenever <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%3DD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}=D_{&#92;text{Assumption}}}\" class=\"latex\" \/> and rejected whenever <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%5Cneq+D_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}&#92;neq D_{&#92;text{Assumption}}}\" class=\"latex\" \/>. Unfortunately, for example when <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Assumption}}}\" class=\"latex\" \/> is the Gaussian distribution, there is provably no tester <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BT%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{T}}\" class=\"latex\" \/> that can distinguish if <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%3DD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}=D_{&#92;text{Assumption}}}\" class=\"latex\" \/> or <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%5Cneq+D_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}&#92;neq D_{&#92;text{Assumption}}}\" class=\"latex\" \/>.<\/p>\n\n\n\n<div class=\"wp-block-group is-vertical is-layout-flex wp-container-core-group-is-layout-4fc3f8e1 wp-block-group-is-layout-flex\">\n<div class=\"wp-block-group is-vertical is-layout-flex wp-container-core-group-is-layout-4fc3f8e1 wp-block-group-is-layout-flex\">\n<p class=\"wp-block-paragraph\">However, we can get away with less stringent requirements for <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BT%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{T}}\" class=\"latex\" \/> and still achieve our goal:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"\"><b>Completeness:<\/b> whenever <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%3DD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}=D_{&#92;text{Assumption}}}\" class=\"latex\" \/>, the tester <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BT%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{T}}\" class=\"latex\" \/> accepts with high probability.<\/li>\n\n\n\n<li class=\"\"><b>Soundness: <\/b>For all distributions <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/>, either the tester <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BT%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{T}}\" class=\"latex\" \/> will with high probability reject <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/>, or <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/> is such that the algorithm <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BA%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{A}}\" class=\"latex\" \/> will with high probability give a hypothesis <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h}\" class=\"latex\" \/> satisfying the <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;varepsilon}\" class=\"latex\" \/>-optimality guarantee<br><p align=\"center\"><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Cdisplaystyle+%5Coverbrace%7B%5CPr_%7Bx%5Csim+D_%7B%5Ctext%7BExamples%7D%7D%7D%5Bh%28x%29%5Cneq+f_%7B%5Ctext%7Bnoisy+labels%7D%7D%28x%29%5D%7D%5E%7B%5Ctext%7BPrediction+error+of+%5Censuremath%7Bh%7D.%7D%7D%5Cleq%5Coverbrace%7B%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%7D%5E%7B%5Ctext%7BOptimal+prediction+error+in+%5Censuremath%7B%5Cmathcal%7BC%7D.%7D%7D%7D%2B%5Cvarepsilon.+&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"&#92;displaystyle &#92;overbrace{&#92;Pr_{x&#92;sim D_{&#92;text{Examples}}}[h(x)&#92;neq f_{&#92;text{noisy labels}}(x)]}^{&#92;text{Prediction error of &#92;ensuremath{h}.}}&#92;leq&#92;overbrace{&#92;text{opt}_{&#92;mathcal{C}}}^{&#92;text{Optimal prediction error in &#92;ensuremath{&#92;mathcal{C}.}}}+&#92;varepsilon. \" class=\"latex\" \/><\/p><\/li>\n\n\n\n<li class=\"\"><b>Fast Run Time: <\/b>the tester <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BT%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{T}}\" class=\"latex\" \/> runs in time at most <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BT%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{T}\" class=\"latex\" \/>.<\/li>\n<\/ul>\n<\/div>\n\n\n\n<p class=\"wp-block-paragraph\">The point of the Soundness condition is to permit the algorithm <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BT%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{T}}\" class=\"latex\" \/> to accept a distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/> even if <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%5Cneq+D_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}&#92;neq D_{&#92;text{Assumption}}}\" class=\"latex\" \/> but <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/> is still &#8220;good enough&#8221; for the algorithm <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BA%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{A}}\" class=\"latex\" \/>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n<\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Let us come back to using the recipe above, taking <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BA%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{A}}\" class=\"latex\" \/> to be the Low-Degree Algorithm. The recipe above can be executed using the following algorithm<sup data-fn=\"e24c404f-9465-416b-9d59-3feddcf98899\" class=\"fn\"><a href=\"#e24c404f-9465-416b-9d59-3feddcf98899\" id=\"e24c404f-9465-416b-9d59-3feddcf98899-link\">7<\/a><\/sup> as the tester <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BT%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{T}}\" class=\"latex\" \/>:<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Degree-<img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bk%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{k}\" class=\"latex\" \/> Moment-Matching Tester [<a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3564246.3585117\">RV &#8217;23<\/a>, <a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3564246.3585206\">GKK &#8217;23<\/a>]:<\/b><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"\">Given: access to independent examples from <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/>, a reference distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Assumption}}}\" class=\"latex\" \/>, parameter <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bk%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{k}\" class=\"latex\" \/>.<\/li>\n\n\n\n<li class=\"\">Output: Accept or Reject.<\/li>\n<\/ul>\n\n\n\n<ol class=\"wp-block-list\">\n<li class=\"\"><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BS%5Cleftarrow%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{S&#92;leftarrow}\" class=\"latex\" \/> {<img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bd%5E%7BO%28k%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{d^{O(k)}}\" class=\"latex\" \/> examples from <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/>}.<\/li>\n\n\n\n<li class=\"\">For all monomials <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bm%3D%5Cprod_%7Bi%7Dx_%7Bi%7D%5E%7B%5Calpha_%7Bi%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{m=&#92;prod_{i}x_{i}^{&#92;alpha_{i}}}\" class=\"latex\" \/> of total degree at most <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bk%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{k}\" class=\"latex\" \/>:\n<ol class=\"wp-block-list\">\n<li class=\"\">If <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Clvert%5Cmathbb%7BE%7D_%7Bx%5Csim+S%7D%5Bm%28x%29%5D-%5Cmathbb%7BE%7D_%7Bx%5Csim+D_%7B%5Ctext%7BAssumption%7D%7D%7D%5Bm%28x%29%5D%5Crvert%3Ed%5E%7B-k%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;lvert&#92;mathbb{E}_{x&#92;sim S}[m(x)]-&#92;mathbb{E}_{x&#92;sim D_{&#92;text{Assumption}}}[m(x)]&#92;rvert&gt;d^{-k}}\" class=\"latex\" \/>, output Reject and terminate. <br>(Note: For many distributions <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Assumption}}}\" class=\"latex\" \/> of interest, such as Gaussian distribution, the quantity <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathbb%7BE%7D_%7Bx%5Csim+D_%7B%5Ctext%7BAssumption%7D%7D%7D%5Bm%28x%29%5D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathbb{E}_{x&#92;sim D_{&#92;text{Assumption}}}[m(x)]}\" class=\"latex\" \/> can be computed directly. One can also draw samples from <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Assumption}}}\" class=\"latex\" \/> and use them to estimate <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathbb%7BE%7D_%7Bx%5Csim+D_%7B%5Ctext%7BAssumption%7D%7D%7D%5Bm%28x%29%5D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathbb{E}_{x&#92;sim D_{&#92;text{Assumption}}}[m(x)]}\" class=\"latex\" \/>.)<\/li>\n<\/ol>\n<\/li>\n\n\n\n<li class=\"\">If this step is reached, output Accept.<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\"> Since in <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bd%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{d}\" class=\"latex\" \/> dimensions there are at most <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bd%5E%7BO%28k%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{d^{O(k)}}\" class=\"latex\" \/> monomials of degree <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bk%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{k}\" class=\"latex\" \/>, the Moment-Matching Tester above runs in time <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bd%5E%7BO%28k%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{d^{O(k)}}\" class=\"latex\" \/>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Overall, using the recipe above to combine the degree-<img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bk%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{k}\" class=\"latex\" \/> Low-Degree Algorithm with the degree-<img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bk%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{k}\" class=\"latex\" \/> Moment-Matching Tester, yields Framework 3 algorithms for many concept classes&#8212;including linear classifiers, ANDs of linear classifiers, and constant-depth circuits. As a concrete example, when <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/> is the class of linear classifiers, the resulting algorithm in Framework 3 has a run-time bound <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BT%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{T}\" class=\"latex\" \/> of <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bd%5E%7B%5Ctilde%7BO%7D%281%2F%5Cvarepsilon%5E%7B2%7D%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{d^{&#92;tilde{O}(1\/&#92;varepsilon^{2})}}\" class=\"latex\" \/>. This run-time matches best algorithm of in Framework 2 (believed to be optimal).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The proof of correctness for these algorithms in Framework 3 is far from automatic. As explained in Section <a href=\"#subsecAnalysis-of-the\">2.3<\/a>, the notion of sandwiching polynomials from the field of pseudorandomness turns out to be very important for the proof of correctness [<a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3564246.3585206\">GKK &#8217;23<\/a>].<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Remark:<\/b> many papers referenced here phrase their main result as a tester <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BT%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{T}}\" class=\"latex\" \/> that satisfies the three aforementioned conditions of <em>completeness<\/em>, <em>soundness<\/em> and <em>fast run-time<\/em> (together with a Framework 2 algorithm <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BA%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{A}}\" class=\"latex\" \/>). Note that, as described above and also noted in [<a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3564246.3585117\">RV &#8217;23<\/a>], having such a tester gives<sup data-fn=\"5da7233e-676e-42f9-8bc5-41c5198adc8a\" class=\"fn\"><a href=\"#5da7233e-676e-42f9-8bc5-41c5198adc8a\" id=\"5da7233e-676e-42f9-8bc5-41c5198adc8a-link\">8<\/a><\/sup> an algorithm in Framework 3 (in fact it is equivalent to having an algorithm in Framework 3 [<a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3564246.3585117\">RV &#8217;23<\/a>]).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><b>2.3 Analysis of the Moment-Matching Tester through Sandwiching Polynomials. <\/b><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><a name=\"subsecAnalysis-of-the\"><\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Soon, we will discuss more Framework 3 algorithms, but let us first discuss the proof of correctness for the Moment-Matching Tester. Recall that it needs to satisfy three conditions: <em>Completeness<\/em>, <em>Soundness<\/em> and <em>Fast Run Time<\/em>. Among these three, the Soundness condition is the most challenging one to prove. [<a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3564246.3585206\">GKK &#8217;23<\/a>] give a principled way of doing this, which we will briefly sketch now. A key step is to show that every function <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bf%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{f}\" class=\"latex\" \/> in the class <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/> has<em> <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;varepsilon}\" class=\"latex\" \/>-sandwiching polynomials<\/em> of degree <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bk%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{k}\" class=\"latex\" \/> under <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Assumption}}}\" class=\"latex\" \/>, i.e. a pair of polynomials <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bp_%7B%5Ctext%7Bup%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{p_{&#92;text{up}}}\" class=\"latex\" \/> and <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bp_%7B%5Ctext%7Bdown%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{p_{&#92;text{down}}}\" class=\"latex\" \/> satisfying:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"\"><b>Sandwiching: <\/b>for every point <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bx%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{x}\" class=\"latex\" \/>, we have <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bp_%7B%5Ctext%7Bdown%7D%7D%28x%29%5Cleq+f%28x%29%5Cleq+p_%7B%5Ctext%7Bup%7D%7D%28x%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{p_{&#92;text{down}}(x)&#92;leq f(x)&#92;leq p_{&#92;text{up}}(x)}\" class=\"latex\" \/>.<\/li>\n\n\n\n<li class=\"\"><em><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;varepsilon}\" class=\"latex\" \/><\/em><b>-Closeness: <\/b>we have <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathbb%7BE%7D_%7Bx%5Csim+D_%7B%5Ctext%7BAssumption%7D%7D%7D%5Bp_%7B%5Ctext%7Bup%7D%7D%28x%29-p_%7B%5Ctext%7Bdown%7D%7D%28x%29%5D%5Cleq%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathbb{E}_{x&#92;sim D_{&#92;text{Assumption}}}[p_{&#92;text{up}}(x)-p_{&#92;text{down}}(x)]&#92;leq&#92;varepsilon}\" class=\"latex\" \/>.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">For example, linear classifiers have <em><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;varepsilon}\" class=\"latex\" \/>-sandwiching polynomials <\/em>of degree <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctilde%7BO%7D%5Cleft%28%5Cfrac%7B1%7D%7B%5Cvarepsilon%5E%7B2%7D%7D%5Cright%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;tilde{O}&#92;left(&#92;frac{1}{&#92;varepsilon^{2}}&#92;right)}\" class=\"latex\" \/> under the Gaussian distribution <a href=\"https:\/\/epubs.siam.org\/doi\/abs\/10.1137\/100783030\">[DGJSV &#8217;10]<\/a>. This notion was first studied in the field of <em>pseudorandomness<\/em>, because the existence of sandwiching polynomials for <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bf%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{f}\" class=\"latex\" \/> can be leveraged to approximate the expectation <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathbb%7BE%7D_%7Bx%5Csim+D_%7B%5Ctext%7BAssumption%7D%7D%7D%5Bf%28x%29%5D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathbb{E}_{x&#92;sim D_{&#92;text{Assumption}}}[f(x)]}\" class=\"latex\" \/> while only making deterministic queries to <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bf%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{f}\" class=\"latex\" \/>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">It is shown in [<a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3564246.3585206\">GKK &#8217;23<\/a>] that whenever such a pair of polynomials exists for every <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bf%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{f}\" class=\"latex\" \/> in the hypothesis class <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/>, then for the Moment-Matching Tester, the Soundness condition holds. Let us briefly sketch the argument, denoting by <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bf%5E%7B%2A%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{f^{*}}\" class=\"latex\" \/> the function in <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/> with the smallest prediction error <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{opt}_{&#92;mathcal{C}}}\" class=\"latex\" \/>. By assumption, there is a pair of sandwiching polynomials <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bp_%7B%5Ctext%7Bup%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{p_{&#92;text{up}}}\" class=\"latex\" \/> and <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bp_%7B%5Ctext%7Bdown%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{p_{&#92;text{down}}}\" class=\"latex\" \/> for <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bf%5E%7B%2A%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{f^{*}}\" class=\"latex\" \/>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">If the distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/> is such that the Moment-Matching Tester is likely to pass, then for every monomial <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bm%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{m}\" class=\"latex\" \/> of degree at most <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bk%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{k}\" class=\"latex\" \/> we have <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathbb%7BE%7D_%7Bx%5Csim+D_%7B%5Ctext%7BAssumption%7D%7D%7D%5Bm%28x%29%5D%5Capprox%5Cmathbb%7BE%7D_%7Bx%5Csim+D_%7B%5Ctext%7BExamples%7D%7D%7D%5Bm%28x%29%5D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathbb{E}_{x&#92;sim D_{&#92;text{Assumption}}}[m(x)]&#92;approx&#92;mathbb{E}_{x&#92;sim D_{&#92;text{Examples}}}[m(x)]}\" class=\"latex\" \/>. Therefore,<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Cdisplaystyle+%5Cmathbb%7BE%7D_%7Bx%5Csim+D_%7B%5Ctext%7BExamples%7D%7D%7D%5Cleft%5B%5Cleft%7Cf%5E%7B%2A%7D%28x%29-p_%7B%5Ctext%7Bdown%7D%7D%28x%29%5Cright%7C%5Cright%5D+%5C%5C+%5Cleq%5Cmathbb%7BE%7D_%7Bx%5Csim+D_%7B%5Ctext%7BExamples%7D%7D%7D%5Bp_%7B%5Ctext%7Bup%7D%7D%28x%29-p_%7B%5Ctext%7Bdown%7D%7D%28x%29%5D%5Capprox%5Cmathbb%7BE%7D_%7Bx%5Csim+D_%7B%5Ctext%7BAssumption%7D%7D%7D%5Bp_%7B%5Ctext%7Bup%7D%7D%28x%29-p_%7B%5Ctext%7Bdown%7D%7D%28x%29%5D%5Cleq%5Cvarepsilon%2C+&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"&#92;displaystyle &#92;mathbb{E}_{x&#92;sim D_{&#92;text{Examples}}}&#92;left[&#92;left|f^{*}(x)-p_{&#92;text{down}}(x)&#92;right|&#92;right] &#92;&#92; &#92;leq&#92;mathbb{E}_{x&#92;sim D_{&#92;text{Examples}}}[p_{&#92;text{up}}(x)-p_{&#92;text{down}}(x)]&#92;approx&#92;mathbb{E}_{x&#92;sim D_{&#92;text{Assumption}}}[p_{&#92;text{up}}(x)-p_{&#92;text{down}}(x)]&#92;leq&#92;varepsilon, \" class=\"latex\" \/><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">where the first step uses the sandwiching property, the second step breaks <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bp_%7B%5Ctext%7Bup%7D%7D%28x%29-p_%7B%5Ctext%7Bdown%7D%7D%28x%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{p_{&#92;text{up}}(x)-p_{&#92;text{down}}(x)}\" class=\"latex\" \/> into monomials and then applies<sup data-fn=\"010dd730-d676-4dad-b414-d8513a155208\" class=\"fn\"><a href=\"#010dd730-d676-4dad-b414-d8513a155208\" id=\"010dd730-d676-4dad-b414-d8513a155208-link\">9<\/a><\/sup> <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathbb%7BE%7D_%7Bx%5Csim+D_%7B%5Ctext%7BAssumption%7D%7D%7D%5Bm%28x%29%5D%5Capprox%5Cmathbb%7BE%7D_%7Bx%5Csim+D_%7B%5Ctext%7BExamples%7D%7D%7D%5Bm%28x%29%5D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathbb{E}_{x&#92;sim D_{&#92;text{Assumption}}}[m(x)]&#92;approx&#92;mathbb{E}_{x&#92;sim D_{&#92;text{Examples}}}[m(x)]}\" class=\"latex\" \/> to each monomial <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bm%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{m}\" class=\"latex\" \/>. The last step above uses the <em><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;varepsilon}\" class=\"latex\" \/><\/em>-Closeness property.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Finally, the upper bound on <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathbb%7BE%7D_%7Bx%5Csim+D_%7B%5Ctext%7BExamples%7D%7D%7D%5Cleft%5B%5Cleft%7Cf%5E%7B%2A%7D%28x%29-p_%7B%5Ctext%7Bdown%7D%7D%28x%29%5Cright%7C%5Cright%5D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathbb{E}_{x&#92;sim D_{&#92;text{Examples}}}&#92;left[&#92;left|f^{*}(x)-p_{&#92;text{down}}(x)&#92;right|&#92;right]}\" class=\"latex\" \/> tells us that the Low-Degree Algorithm finds a polynomial <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bp%5E%7B%2A%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{p^{*}}\" class=\"latex\" \/> with <\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Cdisplaystyle+%5Cfrac%7B1%7D%7B2%7D%5Cmathbb%7BE%7D_%7Bx%5Csim+D_%7B%5Ctext%7BExamples%7D%7D%7D%5Cleft%5B%5Cleft%7Cf_%7B%5Ctext%7Bnoisy+labels%7D%7D%28x%29-p%5E%7B%2A%7D%28x%29%5Cright%7C%5Cright%5D%5Clesssim%5Cfrac%7B1%7D%7B2%7D%5Cmathbb%7BE%7D_%7Bx%5Csim+D_%7B%5Ctext%7BExamples%7D%7D%7D%5Cleft%5B%5Cleft%7Cf_%7B%5Ctext%7Bnoisy+labels%7D%7D%28x%29-p_%7B%5Ctext%7Bdown%7D%7D%28x%29%5Cright%7C%5Cright%5D+%5Clesssim+%5C%5C+%5Cunderbrace%7B%5Cfrac%7B1%7D%7B2%7D%5Cmathbb%7BE%7D_%7Bx%5Csim+D_%7B%5Ctext%7BExamples%7D%7D%7D%5Cleft%5B%5Cleft%7Cf_%7B%5Ctext%7Bnoisy+labels%7D%7D%28x%29-f%5E%7B%2A%7D%28x%29%5Cright%7C%5Cright%5D%7D_%7B%3D%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%5Ctext%7B+because+%7D+f%5E%7B%2A%7D+%5Ctext%7B+is+best+classifier+in+%7D%5Cmathcal%7BC%7D%2B%5Cvarepsilon%7D.+&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002\" alt=\"&#92;displaystyle &#92;frac{1}{2}&#92;mathbb{E}_{x&#92;sim D_{&#92;text{Examples}}}&#92;left[&#92;left|f_{&#92;text{noisy labels}}(x)-p^{*}(x)&#92;right|&#92;right]&#92;lesssim&#92;frac{1}{2}&#92;mathbb{E}_{x&#92;sim D_{&#92;text{Examples}}}&#92;left[&#92;left|f_{&#92;text{noisy labels}}(x)-p_{&#92;text{down}}(x)&#92;right|&#92;right] &#92;lesssim &#92;&#92; &#92;underbrace{&#92;frac{1}{2}&#92;mathbb{E}_{x&#92;sim D_{&#92;text{Examples}}}&#92;left[&#92;left|f_{&#92;text{noisy labels}}(x)-f^{*}(x)&#92;right|&#92;right]}_{=&#92;text{opt}_{&#92;mathcal{C}}&#92;text{ because } f^{*} &#92;text{ is best classifier in }&#92;mathcal{C}+&#92;varepsilon}. \" class=\"latex\" \/><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><b>3. Going Beyond Low-Degree Algorithm. <\/b><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><a name=\"secGoing-Beyond-Low-Degree\"><\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To recap: in Framework 2, the Low-Degree Algorithm works for many hypothesis classes <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/>, giving a classifier <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h}\" class=\"latex\" \/> with error <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{opt}_{&#92;mathcal{C}}+&#92;varepsilon}\" class=\"latex\" \/>. In Framework 3, this algorithm can be used together with the Moment-Matching Tester to get computationally efficient algorithms. However, this approach has a number of limitations:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"\">Run-time for the Low-Degree Algorithm tends to be exponential in <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B1%2F%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{1\/&#92;varepsilon}\" class=\"latex\" \/>. For example, to get error <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7B%5Censuremath%7B%5Ctext%7Bopt%7D_%7B%5Ctext%7BLinear+Classifiers%7D%7D%7D%7D%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{&#92;ensuremath{&#92;text{opt}_{&#92;text{Linear Classifiers}}}}+&#92;varepsilon}\" class=\"latex\" \/>, the Low-Degree Algorithms needs to run in time <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bd%5E%7B%5Ctilde%7BO%7D%281%2F%5Cvarepsilon%5E%7B2%7D%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{d^{&#92;tilde{O}(1\/&#92;varepsilon^{2})}}\" class=\"latex\" \/>.<\/li>\n\n\n\n<li class=\"\">The algorithms are <em>improper<\/em>, which means the hypothesis <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h}\" class=\"latex\" \/> they give is not itself in the class <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/>. To be concrete, compare the two following two tasks:\n<ul class=\"wp-block-list\">\n<li class=\"\">(a) We get a labeled data-set <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BS%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{S}\" class=\"latex\" \/>, and we want to fit approximately the best linear classifier to <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BS%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{S}\" class=\"latex\" \/>.<\/li>\n\n\n\n<li class=\"\">(b) We get a labeled data set <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BS%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{S}\" class=\"latex\" \/>, and we want to fit a classifier to <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BS%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{S}\" class=\"latex\" \/> of the form <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bsign%7D%28p%28x%29%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{sign}(p(x))}\" class=\"latex\" \/> where <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bp%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{p}\" class=\"latex\" \/> is some polynomial, this classifier should approximately be as good as the best linear classifier on <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BS%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{S}\" class=\"latex\" \/> .<\/li>\n<\/ul>\n<\/li>\n\n\n\nThe Low-Degree Algorithm is solving task (b) not task (a). However, most people would agree that task (a) is far more natural.\n\n\n\n<li class=\"\">The Moment-Matching Tester is guaranteed to accept when the distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/> equals to a specific distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Assumption}}}\" class=\"latex\" \/>, but still might reject when <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/> is very similar to <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Assumption}}}\" class=\"latex\" \/>. Formally, one can construct a distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/> that will be rejected by the degree-<img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bk%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{k}\" class=\"latex\" \/> Moment-Matching Tester, even though <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bdist%7D_%7B%5Ctext%7BTV%7D%7D%28D_%7B%5Ctext%7BExamples%7D%7D%2CD_%7B%5Ctext%7BAssumption%7D%7D%29%3Dd%5E%7B-O%28k%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{dist}_{&#92;text{TV}}(D_{&#92;text{Examples}},D_{&#92;text{Assumption}})=d^{-O(k)}}\" class=\"latex\" \/>.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">We will now discuss subsequent work that focuses on removing these limitations.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><b>3.1 The Matter of Efficiency: Polynomial-Time Algorithms in Framework 3. <\/b><\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">3.1.1 Learning Beyond Low-Degree Algorithm.<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">In contrast with the Low-Degree Algorithm, some algorithms in Framework 2 are tailor-made for specific classes <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/> and have run-times <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bpoly%7D%28d%2F%5Cvarepsilon%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{poly}(d\/&#92;varepsilon)}\" class=\"latex\" \/>, at the price of looser error guarantees such as <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BO%28%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%29%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{O(&#92;text{opt}_{&#92;mathcal{C}})+&#92;varepsilon}\" class=\"latex\" \/> or <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cwidetilde%7BO%7D%28%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%29%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;widetilde{O}(&#92;text{opt}_{&#92;mathcal{C}})+&#92;varepsilon}\" class=\"latex\" \/>. For example, consider the following algorithm:<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Averaging Algorithm [<a href=\"https:\/\/epubs.siam.org\/doi\/10.1137\/060649057\">KKMS &#8217;08<\/a>]:<\/b><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"\">Given: access to independent standard Gaussian examples <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cleft%5C%7B+x_%7Bi%7D%5Cright%5C%7D+%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;left&#92;{ x_{i}&#92;right&#92;} }\" class=\"latex\" \/>, labeled by unknown function <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bf_%7B%5Ctext%7Bnoisy+labels%7D%7D%3A%5Cmathbb%7BR%7D%5E%7Bd%7D%5Crightarrow%5Cleft%5C%7B+%5Cpm1%5Cright%5C%7D+%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{f_{&#92;text{noisy labels}}:&#92;mathbb{R}^{d}&#92;rightarrow&#92;left&#92;{ &#92;pm1&#92;right&#92;} }\" class=\"latex\" \/>.<\/li>\n\n\n\n<li class=\"\">Output: hypothesis <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cwidehat%7Bh%7D%3A%5Cmathbb%7BR%7D%5E%7Bd%7D%5Crightarrow%5Cleft%5C%7B+%5Cpm1%5Cright%5C%7D+%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;widehat{h}:&#92;mathbb{R}^{d}&#92;rightarrow&#92;left&#92;{ &#92;pm1&#92;right&#92;} }\" class=\"latex\" \/>.<\/li>\n<\/ul>\n\n\n\n<ol class=\"wp-block-list\">\n<li class=\"\"><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BS%5Cleftarrow%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{S&#92;leftarrow}\" class=\"latex\" \/> {<img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bpoly%7D%28d%2F%5Cvarepsilon%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{poly}(d\/&#92;varepsilon)}\" class=\"latex\" \/> Gaussian examples <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cleft%5C%7B+x_%7Bi%7D%5Cright%5C%7D+%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;left&#92;{ x_{i}&#92;right&#92;} }\" class=\"latex\" \/> labeled by <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bf_%7B%5Ctext%7Bnoisy+labels%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{f_{&#92;text{noisy labels}}}\" class=\"latex\" \/>}.<\/li>\n\n\n\n<li class=\"\"><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bw%5Cleftarrow%5Csum_%7Bx_%7Bi%7D%5Cin+S%7D%5Cleft%5Bf_%7B%5Ctext%7Bnoisy+labels%7D%7D%28x_%7Bi%7D%29%5Ccdot+x_%7Bi%7D%5Cright%5D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{w&#92;leftarrow&#92;sum_{x_{i}&#92;in S}&#92;left[f_{&#92;text{noisy labels}}(x_{i})&#92;cdot x_{i}&#92;right]}\" class=\"latex\" \/>.<\/li>\n\n\n\n<li class=\"\">Output <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h}\" class=\"latex\" \/> mapping <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bx%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{x}\" class=\"latex\" \/> in <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathbb%7BR%7D%5E%7Bd%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathbb{R}^{d}}\" class=\"latex\" \/> to <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bsign%28%5Censuremath%7Bw%5Ccdot+x%7D%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{sign(&#92;ensuremath{w&#92;cdot x})}}\" class=\"latex\" \/><\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">When the distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Assumption}}}\" class=\"latex\" \/> is the standard Gaussian, this simple Framework 2 algorithm runs in time <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bpoly%7D%28d%2F%5Cvarepsilon%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{poly}(d\/&#92;varepsilon)}\" class=\"latex\" \/>, and gives a classifier <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bsign%7D%28w%5Censuremath%7B%5Ccdot+x%7D%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{sign}(w&#92;ensuremath{&#92;cdot x})}\" class=\"latex\" \/> with error <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cwidetilde%7BO%7D%28%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%29%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;widetilde{O}(&#92;text{opt}_{&#92;mathcal{C}})+&#92;varepsilon}\" class=\"latex\" \/>. Here <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{opt}_{&#92;mathcal{C}}}\" class=\"latex\" \/> is the best prediction error of an origin-centered linear classifier (i.e. a classifier of the form <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bsign%7D%28w%27%5Censuremath%7B%5Ccdot+x%7D%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{sign}(w&#039;&#92;ensuremath{&#92;cdot x})}\" class=\"latex\" \/>) [<a href=\"https:\/\/epubs.siam.org\/doi\/10.1137\/060649057\">KKMS &#8217;08<\/a>]. The analysis of the Averaging Algorithm is self-contained and only takes three pages [<a href=\"https:\/\/epubs.siam.org\/doi\/10.1137\/060649057\">KKMS &#8217;08<\/a>], and a key idea is to denote the optimal classifier as <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bsign%7D%28w%5E%7B%2A%7D%5Censuremath%7B%5Ccdot+x%7D%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{sign}(w^{*}&#92;ensuremath{&#92;cdot x})}\" class=\"latex\" \/> and decompose <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{w}\" class=\"latex\" \/> into <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bw_%7B%5Ctext%7Bsignal%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{w_{&#92;text{signal}}}\" class=\"latex\" \/> and <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bw_%7B%5Ctext%7Bnoise%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{w_{&#92;text{noise}}}\" class=\"latex\" \/> as follows:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\n<img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Cdisplaystyle++w%3D%5Csum_%7Bx_%7Bi%7D%5Cin+S%7D%5Cleft%5Bf_%7B%5Ctext%7Bnoisy+labels%7D%7D%28x_%7Bi%7D%29%5Ccdot+x_%7Bi%7D%5Cright%5D%3D+%5Cunderbrace%7B%5Csum_%7Bx_%7Bi%7D%5Cin+S%7D%5Cleft%5B%5Ctext%7Bsign%7D%28w%5E%7B%2A%7D%5Ccdot+x_%7Bi%7D%29%5Ccdot+x_%7Bi%7D%5Cright%5D%7D+_%7B%5Ctext%7BSignal+term%3A+%7Dw_%7B%5Ctext%7Bsignal%7D%7D%7D+%2B+%5Cunderbrace%7B%5Csum_%7Bx_%7Bi%7D%5Cin+S%7D%5Cleft%5B%28f_%7B%5Ctext%7Bnoisy+labels%7D%7D%28x_%7Bi%7D%29-%5Ctext%7Bsign%7D%28w%5E%7B%2A%7D%5Ccdot+x_%7Bi%7D%29%29%5Ccdot+x_%7Bi%7D%5Cright%5D%7D_%7B%5Ctext%7BNoise+term%3A+%7Dw_%7B%5Ctext%7Bnoise%7D%7D%7D+&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002\" alt=\"&#92;displaystyle  w=&#92;sum_{x_{i}&#92;in S}&#92;left[f_{&#92;text{noisy labels}}(x_{i})&#92;cdot x_{i}&#92;right]= &#92;underbrace{&#92;sum_{x_{i}&#92;in S}&#92;left[&#92;text{sign}(w^{*}&#92;cdot x_{i})&#92;cdot x_{i}&#92;right]} _{&#92;text{Signal term: }w_{&#92;text{signal}}} + &#92;underbrace{&#92;sum_{x_{i}&#92;in S}&#92;left[(f_{&#92;text{noisy labels}}(x_{i})-&#92;text{sign}(w^{*}&#92;cdot x_{i}))&#92;cdot x_{i}&#92;right]}_{&#92;text{Noise term: }w_{&#92;text{noise}}} \" class=\"latex\" \/>\n<br>\narguing<sup data-fn=\"8efb53cb-cccd-41c3-ae64-7e218cc109e9\" class=\"fn\"><a href=\"#8efb53cb-cccd-41c3-ae64-7e218cc109e9\" id=\"8efb53cb-cccd-41c3-ae64-7e218cc109e9-link\">10<\/a><\/sup> as follows:<\/p>\n\n\n\n<ol style=\"list-style-type:lower-alpha\" class=\"wp-block-list\">\n<li class=\"\">For a Gaussian dataset <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BS%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{S}\" class=\"latex\" \/>, the angle <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cangle%28w_%7B%5Ctext%7Bsignal%7D%7D%2Cw%5E%7B%2A%7D%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;angle(w_{&#92;text{signal}},w^{*})}\" class=\"latex\" \/> will be at most <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;varepsilon}\" class=\"latex\" \/> with high probability. This is argued by observing that the inner product <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bw_%7B%5Ctext%7Bsignal%7D%7D%5Ccdot+w%5E%7B%2A%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{w_{&#92;text{signal}}&#92;cdot w^{*}}\" class=\"latex\" \/> is likely to be large while the inner product with directions orthogonal to <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bw%5E%7B%2A%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{w^{*}}\" class=\"latex\" \/> will likely be much smaller. Likewise, the norm <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%7Cw_%7B%5Ctext%7Bsignal%7D%7D%7C%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{|w_{&#92;text{signal}}|}\" class=\"latex\" \/> can be shown to be close to <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5CTheta%28%7CS%7C%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;Theta(|S|)}\" class=\"latex\" \/> with high probability.<\/li>\n\n\n\n<li class=\"\">Since the classifier <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bsign%7D%28w%5E%7B%2A%7D%5Censuremath%7B%5Ccdot+x%7D%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{sign}(w^{*}&#92;ensuremath{&#92;cdot x})}\" class=\"latex\" \/> has the optimal error <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{opt}_{&#92;mathcal{C}}}\" class=\"latex\" \/>, only <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BO%28%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{O(&#92;text{opt}_{&#92;mathcal{C}})}\" class=\"latex\" \/> fraction of points in <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BS%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{S}\" class=\"latex\" \/> will contribute to <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bw_%7B%5Ctext%7Bnoise%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{w_{&#92;text{noise}}}\" class=\"latex\" \/> (with high probability over <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BS%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{S}\" class=\"latex\" \/>). Yet, it can be shown that small subsets of a Gaussian data-set have small averages. Formally, with high probability, if <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BS%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{S}\" class=\"latex\" \/> comes from the standard Gaussian and has size <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bpoly%7D%28d%2F%5Cvarepsilon%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{poly}(d\/&#92;varepsilon)}\" class=\"latex\" \/>, every subset <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BS%27%5Csubset+S%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{S&#039;&#92;subset S}\" class=\"latex\" \/> of size <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Calpha%7CS%7C%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;alpha|S|}\" class=\"latex\" \/> satisfies<br><p align=\"center\"><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Cdisplaystyle+%5Cleft%7C%5Csum_%7Bx_%7Bi%7D%5Cin+S%27%7D%5Cleft%5Bx_%7Bi%7D%5Cright%5D%5Cright%7C%5Cleq%5Cleft%28%5Ctilde%7BO%7D%5Cleft%28%5Calpha%5Cright%29%2B%5Cvarepsilon%5Cright%29%7CS%7C.+&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"&#92;displaystyle &#92;left|&#92;sum_{x_{i}&#92;in S&#039;}&#92;left[x_{i}&#92;right]&#92;right|&#92;leq&#92;left(&#92;tilde{O}&#92;left(&#92;alpha&#92;right)+&#92;varepsilon&#92;right)|S|. \" class=\"latex\" \/><\/p><br>This lets us upper-bound the norm <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%7Cw_%7B%5Ctext%7Bnoise%7D%7D%7C%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{|w_{&#92;text{noise}}|}\" class=\"latex\" \/> with <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cleft%28%5Ctilde%7BO%7D%28%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%29%2B%5Cvarepsilon%5Cright%29%5Ccdot%7CS%7C%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;left(&#92;tilde{O}(&#92;text{opt}_{&#92;mathcal{C}})+&#92;varepsilon&#92;right)&#92;cdot|S|}\" class=\"latex\" \/>.<\/li>\n\n\n\n<li class=\"\">Steps (a) and (b) together can be used to conclude that<br><p align=\"center\"><br><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Cdisplaystyle+%5Cangle%28w%2Cw%5E%7B%2A%7D%29%3D%5Cangle%28w_%7B%5Ctext%7Bsignal%7D%7D%2Bw_%7B%5Ctext%7Bnoise%7D%7D%2Cw%5E%7B%2A%7D%29%5Cleq%5Ctilde%7BO%7D%28%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%29%2B%5Cvarepsilon.+&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"&#92;displaystyle &#92;angle(w,w^{*})=&#92;angle(w_{&#92;text{signal}}+w_{&#92;text{noise}},w^{*})&#92;leq&#92;tilde{O}(&#92;text{opt}_{&#92;mathcal{C}})+&#92;varepsilon. \" class=\"latex\" \/><\/p><br><p>We now use the bound on the angle <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cangle%28w%2Cw%5E%7B%2A%7D%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;angle(w,w^{*})}\" class=\"latex\" \/> to bound how much less accurate the classifier <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bsign%7D%28w%5Censuremath%7B%5Ccdot+x%7D%29%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{sign}(w&#92;ensuremath{&#92;cdot x}))}\" class=\"latex\" \/> can be compared to <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bsign%7D%28w%5E%7B%2A%7D%5Censuremath%7B%5Ccdot+x%7D%29%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{sign}(w^{*}&#92;ensuremath{&#92;cdot x}))}\" class=\"latex\" \/>. Indeed, for a Gaussian sample <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bx%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{x}\" class=\"latex\" \/>, the probability that <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bsign%7D%28w%5E%7B%2A%7D%5Censuremath%7B%5Ccdot+x%7D%29%29%5Cneq%5Ctext%7Bsign%7D%28w%5Censuremath%7B%5Ccdot+x%7D%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{sign}(w^{*}&#92;ensuremath{&#92;cdot x}))&#92;neq&#92;text{sign}(w&#92;ensuremath{&#92;cdot x})}\" class=\"latex\" \/> is <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5CTheta%28%5Cangle%28w%2Cw%5E%7B%2A%7D%29%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;Theta(&#92;angle(w,w^{*}))}\" class=\"latex\" \/>, which allows us to conclude that <br><\/p> <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Cdisplaystyle+%5CPr_%7Bx%5Csim+D_%7B%5Ctext%7BExamples%7D%7D%7D%5B%5Ctext%7Bsign%7D%28w%7B%5Ccdot+x%7D%29%5Cneq+f_%7B%5Ctext%7Bnoisy+labels%7D%7D%28x%29%5D%5Cleq%5C%5C+%5Cunderbrace%7B%5CPr_%7Bx%5Csim+D_%7B%5Ctext%7BExamples%7D%7D%7D%5B%5Ctext%7Bsign%7D%28w%5E%7B%2A%7D%7B%5Ccdot+x%7D%29%5Cneq%5Ctext%7Bsign%7D%28w%7B%5Ccdot+x%7D%29%5D%7D_%7B%3D%5CTheta%28%5Cangle%28w%2Cw%5E%7B%2A%7D%29%29%5Cleq%5Ctilde%7BO%7D%28%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%29%2B%5Cvarepsilon%7D%2B%5Cunderbrace%7B%5CPr_%7Bx%5Csim+D_%7B%5Ctext%7BExamples%7D%7D%7D%5B%5Ctext%7Bsign%7D%28w%5E%7B%2A%7D%7B%5Ccdot+x%7D%29%5Cneq+f_%7B%5Ctext%7Bnoisy+labels%7D%7D%28x%29%5D%7D_%7B%3D%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%7D%3D%5Ctilde%7BO%7D%28%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%29%2B%5Cvarepsilon.+&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002\" alt=\"&#92;displaystyle &#92;Pr_{x&#92;sim D_{&#92;text{Examples}}}[&#92;text{sign}(w{&#92;cdot x})&#92;neq f_{&#92;text{noisy labels}}(x)]&#92;leq&#92;&#92; &#92;underbrace{&#92;Pr_{x&#92;sim D_{&#92;text{Examples}}}[&#92;text{sign}(w^{*}{&#92;cdot x})&#92;neq&#92;text{sign}(w{&#92;cdot x})]}_{=&#92;Theta(&#92;angle(w,w^{*}))&#92;leq&#92;tilde{O}(&#92;text{opt}_{&#92;mathcal{C}})+&#92;varepsilon}+&#92;underbrace{&#92;Pr_{x&#92;sim D_{&#92;text{Examples}}}[&#92;text{sign}(w^{*}{&#92;cdot x})&#92;neq f_{&#92;text{noisy labels}}(x)]}_{=&#92;text{opt}_{&#92;mathcal{C}}}=&#92;tilde{O}(&#92;text{opt}_{&#92;mathcal{C}})+&#92;varepsilon. \" class=\"latex\" \/><br><\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">A series of subsequent works developed new algorithms, improving the prediction error to <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BO%28%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%29%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{O(&#92;text{opt}_{&#92;mathcal{C}})+&#92;varepsilon}\" class=\"latex\" \/> [<a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/2591796.2591839\">ABL &#8217;14<\/a>, <a href=\"https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/d785bf9067f8af9e078b93cf26de2b54-Abstract.html\">DKTZ &#8217;20<\/a>, <a href=\"https:\/\/proceedings.mlr.press\/v162\/diakonikolas22b.html\">DKTZ &#8217;22<\/a>]. Also, unlike the Low-Degree Algorithm, these algorithms are <em>proper<\/em>, i.e. the hypothesis <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h}\" class=\"latex\" \/> they produce is itself a linear classifier, rather than a complicated function of the form <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bsign%7D%28p%28x%29%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{sign}(p(x))}\" class=\"latex\" \/>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">However, all these algorithms inherently operate in Framework 2 rather than Framework 3, relying on the Gaussianity assumption not only for their run-time but also for their accuracy guarantee. For example, we can see that all three aforementioned steps &#8212; (a), (b) and (c) &#8212; in the accuracy analysis of the Averaging Algorithm use Gaussianity in crucial ways.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">3.1.2 Modified Moment-Matching Tester.<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Can we again use the recipe from Section <a href=\"#subsecMaking-Low-Degree-Algorithm\">2.2<\/a> and the Moment-Matching Tester to get a <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bpoly%7D%28d%2F%5Cvarepsilon%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{poly}(d\/&#92;varepsilon)}\" class=\"latex\" \/>-time algorithm in Framework 3 based on the Averaging Algorithm? Conceptually, the Moment-Matching Tester checks that the distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/> is indistinguishable from <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Assumption}}}\" class=\"latex\" \/> by low-degree monomials. It is therefore unsurprising that this tester works well with the Low-Degree Algorithm, as this algorithm is based on finding the best-fitting polynomial. By the same token, one would not expect the Moment-Matching Tester to work well with tailor-made algorithms such as the Averaging Algorithm, as these algorithms do not use polynomials in any way.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Surprisingly, a modified version of the Moment-Matching Tester can nevertheless be used in this way. First, we run one of these algorithms and obtain a hypothesis <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%28x%29%3D%5Ctext%7Bsign%7D%28w%5Censuremath%7B%5Ccdot+x%7D%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h(x)=&#92;text{sign}(w&#92;ensuremath{&#92;cdot x})}\" class=\"latex\" \/>. As in Section <a href=\"#subsecMaking-Low-Degree-Algorithm\">2.2<\/a>, we need to decide whether to output the classifier <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h}\" class=\"latex\" \/> or to run a slow Framework 1 Algorithm. This is again done by running a tester <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BT%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{T}}\" class=\"latex\" \/>, but here the tester also uses the vector <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{w}\" class=\"latex\" \/> obtained prior to running the tester:<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Strip Tester [<a href=\"https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2023\/hash\/7c319b62e2257b34cb0e1040ced2e007-Abstract-Conference.html\">DKKLZ &#8217;23<\/a>, <a href=\"https:\/\/papers.nips.cc\/paper_files\/paper\/2023\/hash\/204d9a9a4816a45909010587ffc3204b-Abstract-Conference.html\">GKSV &#8217;23<\/a>, <a href=\"https:\/\/openreview.net\/forum?id=z6n1fKMMC1\">GKSV &#8217;24<\/a>]:<\/b><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"\">Given: access to independent examples from <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/>, a reference distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Assumption}}}\" class=\"latex\" \/>, unit vector <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{w}\" class=\"latex\" \/> in <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathbb%7BR%7D%5E%7Bd%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathbb{R}^{d}}\" class=\"latex\" \/>.<\/li>\n\n\n\n<li class=\"\">Output: Accept or Reject.<\/li>\n<\/ul>\n\n\n\n<ol class=\"wp-block-list\">\n<li class=\"\"><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BS%5Cleftarrow%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{S&#92;leftarrow}\" class=\"latex\" \/> {<img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bpoly%7D%28d%2F%5Cvarepsilon%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{poly}(d\/&#92;varepsilon)}\" class=\"latex\" \/> independent examples from <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/>}.<\/li>\n\n\n\n<li class=\"\">For all monomials <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bm%3D%5Cprod_%7Bi%7Dx_%7Bi%7D%5E%7B%5Calpha_%7Bi%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{m=&#92;prod_{i}x_{i}^{&#92;alpha_{i}}}\" class=\"latex\" \/> of total degree at most <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BO%281%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{O(1)}\" class=\"latex\" \/> and integers <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bi%5Cin%5Cleft%28-%5Cfrac%7B%5Csqrt%7B%5Clog1%2F%5Cvarepsilon%7D%7D%7B%5Cvarepsilon%7D%2C%5Cfrac%7B%5Csqrt%7B%5Clog1%2F%5Cvarepsilon%7D%7D%7B%5Cvarepsilon%7D%5Cright%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{i&#92;in&#92;left(-&#92;frac{&#92;sqrt{&#92;log1\/&#92;varepsilon}}{&#92;varepsilon},&#92;frac{&#92;sqrt{&#92;log1\/&#92;varepsilon}}{&#92;varepsilon}&#92;right)}\" class=\"latex\" \/>:\n<ol class=\"wp-block-list\">\n<li class=\"\">Define<br><p align=\"center\"><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Cdisplaystyle+%5Cmathbf%7B1%7D_%7Bw%5Ccdot+x%5Cin%5Bi%5Cvarepsilon%2C%28i%2B1%29%5Cvarepsilon%5D%7D%3D%5Cbegin%7Bcases%7D+1+%26+%5Ctext%7Bif+%7Dw%5Ccdot+x%5Cin%5Bi%5Cvarepsilon%2C%28i%2B1%29%5Cvarepsilon%5D%5C%5C+0+%26+%5Ctext%7Botherwise.%7D+%5Cend%7Bcases%7D+&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"&#92;displaystyle &#92;mathbf{1}_{w&#92;cdot x&#92;in[i&#92;varepsilon,(i+1)&#92;varepsilon]}=&#92;begin{cases} 1 &amp; &#92;text{if }w&#92;cdot x&#92;in[i&#92;varepsilon,(i+1)&#92;varepsilon]&#92;&#92; 0 &amp; &#92;text{otherwise.} &#92;end{cases} \" class=\"latex\" \/><\/p><br><\/li>\n\n\n\n<li class=\"\">If <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Clvert%5Cmathbb%7BE%7D_%7Bx%5Csim+S%7D%5Bm%28x%29%5Cmathbf%7B1%7D_%7Bw%5Ccdot+x%5Cin%5Bi%5Cvarepsilon%2C%28i%2B1%29%5Cvarepsilon%5D%7D%5D-%5Cmathbb%7BE%7D_%7Bx%5Csim+D_%7B%5Ctext%7BAssumption%7D%7D%7D%5Bm%28x%29%5Cmathbf%7B1%7D_%7Bw%5Ccdot+x%5Cin%5Bi%5Cvarepsilon%2C%28i%2B1%29%5Cvarepsilon%5D%7D%5D%5Crvert%3E%5Cleft%28%5Cvarepsilon%2Fd%5Cright%29%5E%7BO%281%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;lvert&#92;mathbb{E}_{x&#92;sim S}[m(x)&#92;mathbf{1}_{w&#92;cdot x&#92;in[i&#92;varepsilon,(i+1)&#92;varepsilon]}]-&#92;mathbb{E}_{x&#92;sim D_{&#92;text{Assumption}}}[m(x)&#92;mathbf{1}_{w&#92;cdot x&#92;in[i&#92;varepsilon,(i+1)&#92;varepsilon]}]&#92;rvert&gt;&#92;left(&#92;varepsilon\/d&#92;right)^{O(1)}}\" class=\"latex\" \/>, output Reject and terminate.<\/li>\n<\/ol>\n<\/li>\n\n\n\n<li class=\"\">If this step is reached, output Accept.<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">Essentially, the algorithm above breaks the <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bd%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{d}\" class=\"latex\" \/>-dimensional space into strips along the vector <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bw%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{w}\" class=\"latex\" \/> and runs the degree-<b><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BO%281%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{O(1)}\" class=\"latex\" \/> <\/b>moment tester for each of the strips.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Combining this Strip Tester with the Framework 2 algorithm of [<a href=\"https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/d785bf9067f8af9e078b93cf26de2b54-Abstract.html\">DKTZ &#8217;20<\/a>], yields an algorithm in Framework 3 with prediction error <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BO%28%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%29%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{O(&#92;text{opt}_{&#92;mathcal{C}})+&#92;varepsilon}\" class=\"latex\" \/> and run-time <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bpoly%7D%28d%2F%5Cvarepsilon%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{poly}(d\/&#92;varepsilon)}\" class=\"latex\" \/> [<a href=\"https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2023\/hash\/7c319b62e2257b34cb0e1040ced2e007-Abstract-Conference.html\">DKKLZ &#8217;23<\/a>, <a href=\"https:\/\/papers.nips.cc\/paper_files\/paper\/2023\/hash\/204d9a9a4816a45909010587ffc3204b-Abstract-Conference.html\">GKSV &#8217;23<\/a>, <a href=\"https:\/\/openreview.net\/forum?id=z6n1fKMMC1\">GKSV &#8217;24<\/a>], where <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/> is the class of origin-centered linear classifiers.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Moreover, like the Averaging Algorithm, this algorithm is <em>proper<\/em>, i.e. the hypothesis <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h}\" class=\"latex\" \/> is itself a linear classifier. This addresses the second limitation of the Low Degree Algorithm we pointed out earlier.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><b>3.2 Assumption-Tolerance and the Spectral Tester. <\/b><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><a name=\"subsecAssumption-Tolerance-and-the\"><\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Finally, we briefly discuss some recent work on addressing the third limitation of the Moment-Matching Tester described in the beginning of Section <a href=\"#secGoing-Beyond-Low-Degree\">3<\/a>. To recap, the limitation is that the Moment-Matching Tester might reject distributions that are very close to the distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Assumption}}}\" class=\"latex\" \/> but not exactly equal to it. To address this, the paper considers the following<sup data-fn=\"5da9fe9e-2908-44f7-b5dd-3394397a0f9a\" class=\"fn\"><a href=\"#5da9fe9e-2908-44f7-b5dd-3394397a0f9a\" id=\"5da9fe9e-2908-44f7-b5dd-3394397a0f9a-link\">11<\/a><\/sup> more stringent version of Framework 3:<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Framework 4 [Tolerant Testable Framework]<\/b> For any <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/> and <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bf_%7B%5Ctext%7Bnoisy+labels%7D%7D%2C%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{f_{&#92;text{noisy labels}},}\" class=\"latex\" \/> the learning algorithm should with high probability output a classifier <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h}\" class=\"latex\" \/> with <a name=\"eq optimality guarantee.-1\"><\/a><\/p>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Cdisplaystyle+%5Coverbrace%7B%5CPr_%7Bx%5Csim+D_%7B%5Ctext%7BExamples%7D%7D%7D%5Bh%28x%29%5Cneq+f_%7B%5Ctext%7Bnoisy+labels%7D%7D%28x%29%5D%7D%5E%7B%5Ctext%7BPrediction+error+of+%5Censuremath%7Bh%7D.%7D%7D%5Cleq%5Cunderbrace%7B%5Coverbrace%7B%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%7D%5E%7B%5Ctext%7BOptimal+prediction+error+in+%5Censuremath%7B%5Cmathcal%7BC%7D.%7D%7D%7D%2B%5Cvarepsilon%7D_%7B%5Ctext%7B%5Censuremath%7B%5Ctext%7BWeaker+guarantees+such+as+O%28%5Censuremath%7B%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%7D%29%2B%5Censuremath%7B%5Cvarepsilon%5C+%7Dalso+considered.%7D%7D%7D%7D.+%5C+%5C+%5C+%5C+%5C+%282%29&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"&#92;displaystyle &#92;overbrace{&#92;Pr_{x&#92;sim D_{&#92;text{Examples}}}[h(x)&#92;neq f_{&#92;text{noisy labels}}(x)]}^{&#92;text{Prediction error of &#92;ensuremath{h}.}}&#92;leq&#92;underbrace{&#92;overbrace{&#92;text{opt}_{&#92;mathcal{C}}}^{&#92;text{Optimal prediction error in &#92;ensuremath{&#92;mathcal{C}.}}}+&#92;varepsilon}_{&#92;text{&#92;ensuremath{&#92;text{Weaker guarantees such as O(&#92;ensuremath{&#92;text{opt}_{&#92;mathcal{C}}})+&#92;ensuremath{&#92;varepsilon&#92; }also considered.}}}}. &#92; &#92; &#92; &#92; &#92; (2)\" class=\"latex\" \/><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Furthermore, for some absolute constant <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BC%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{C}\" class=\"latex\" \/>, if <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bdist%7D_%7B%5Ctext%7BTV%7D%7D%28D_%7B%5Ctext%7BExamples%7D%7D%2CD_%7B%5Ctext%7BAssumption%7D%7D%29%5Cleq+C%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{dist}_{&#92;text{TV}}(D_{&#92;text{Examples}},D_{&#92;text{Assumption}})&#92;leq C&#92;varepsilon}\" class=\"latex\" \/>, then with high probability the learning algorithm should run in time <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BT%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{T}\" class=\"latex\" \/>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">In [<a href=\"https:\/\/papers.nips.cc\/paper_files\/paper\/2024\/hash\/e209210eae282e23e305df49fbb2769c-Abstract-Conference.html\">GSSV &#8217;24<\/a>], a general methodology is developed for designing algorithms in this more general framework. For example, when <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Assumption}}}\" class=\"latex\" \/> is the Gaussian distribution and <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/> is the class of linear classifiers, a run-time of <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bd%5E%7B%5Ctilde%7BO%7D%281%2F%5Cvarepsilon%5E%7B2%7D%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{d^{&#92;tilde{O}(1\/&#92;varepsilon^{2})}}\" class=\"latex\" \/> is achieved, matching the best algorithms in Frameworks 2 and 3.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In brief, one of the key insights is that the degree-<img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bk%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{k}\" class=\"latex\" \/> Moment-Matching tester can often be replaced by what is called the <em>degree-<img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bk%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{k}\" class=\"latex\" \/> Spectral Tester<\/em>. Given a data-set <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BS%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{S}\" class=\"latex\" \/>, this tester checks that<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Cdisplaystyle+%5Cmax_%7B%5Ctext%7Bdegree%7D-k%5Ctext%7B+polynomial+%7Dp%7D%5Cmathbb%7BE%7D_%7Bx%5Csim+S%7D%5B%28p%28x%29%29%5E%7B2%7D%5D%5Cleq+O%281%29%5Cmathbb%7BE%7D_%7Bx%5Csim+D_%7B%5Ctext%7BAssumption%7D%7D%7D%5B%28p%28x%29%29%5E%7B2%7D%5D.+&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"&#92;displaystyle &#92;max_{&#92;text{degree}-k&#92;text{ polynomial }p}&#92;mathbb{E}_{x&#92;sim S}[(p(x))^{2}]&#92;leq O(1)&#92;mathbb{E}_{x&#92;sim D_{&#92;text{Assumption}}}[(p(x))^{2}]. \" class=\"latex\" \/><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The second insight is that the degree-<img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bk%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{k}\" class=\"latex\" \/> Spectral Tester can be implemented in time <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bd%5E%7BO%28k%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{d^{O(k)}}\" class=\"latex\" \/> using an eigenvalue computation. The final insight<sup data-fn=\"d7bfd691-edf6-4468-9965-ab5b7dba918c\" class=\"fn\"><a href=\"#d7bfd691-edf6-4468-9965-ab5b7dba918c\" id=\"d7bfd691-edf6-4468-9965-ab5b7dba918c-link\">12<\/a><\/sup> is that when <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bdist%7D_%7B%5Ctext%7BTV%7D%7D%28D_%7B%5Ctext%7BExamples%7D%7D%2CD_%7B%5Ctext%7BAssumption%7D%7D%29%5Cleq+C%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{dist}_{&#92;text{TV}}(D_{&#92;text{Examples}},D_{&#92;text{Assumption}})&#92;leq C&#92;varepsilon}\" class=\"latex\" \/> one can remove <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BO%28%5Cvarepsilon%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{O(&#92;varepsilon)}\" class=\"latex\" \/> fraction of elements from <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BS%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{S}\" class=\"latex\" \/> and have it pass the Spectral Tester. [<a href=\"https:\/\/papers.nips.cc\/paper_files\/paper\/2024\/hash\/e209210eae282e23e305df49fbb2769c-Abstract-Conference.html\">GSSV &#8217;24<\/a>] designs an algorithm that efficiently finds which <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BO%28%5Cvarepsilon%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{O(&#92;varepsilon)}\" class=\"latex\" \/> fraction of elements needs to be removed from <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BS%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{S}\" class=\"latex\" \/> to achieve this.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><b>4. Other recent work. <\/b><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">This blog post focused on side-stepping the intractability of Framework 1 via making assumptions on the distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/> and not making any assumptions on the labeling function <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bf_%7B%5Ctext%7Bnoisy+labels%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{f_{&#92;text{noisy labels}}}\" class=\"latex\" \/>. Yet, improved algorithms are possible when the labeling function <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bf_%7B%5Ctext%7Bnoisy+labels%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{f_{&#92;text{noisy labels}}}\" class=\"latex\" \/> is also assumed to be well-behaved. For example, the <em>Random Classification Noise <\/em>assumption essentially presupposes that <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bf_%7B%5Ctext%7Bnoisy+labels%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{f_{&#92;text{noisy labels}}}\" class=\"latex\" \/> is formed by taking a hypothesis <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bf%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{f}\" class=\"latex\" \/> in the class <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/> and flipping each function value with some probability <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ceta%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;eta}\" class=\"latex\" \/>. The work of [<a href=\"https:\/\/arxiv.org\/abs\/2501.09189\">GKSV &#8217;25<\/a>] gives algorithms for which (like in Framework 3) the failure of the Random Classification Noise assumptions can affect only the run-time rather than the accuracy guarantee. [<a href=\"https:\/\/arxiv.org\/abs\/2501.09189\">GKSV &#8217;25<\/a>] also designs such algorithms for the <em>Massart Noise <\/em>assumption that is far more general than the Random Classification Noise assumption.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The work of [<a href=\"https:\/\/papers.nips.cc\/paper_files\/paper\/2024\/hash\/06e9029d3d4d6cee71c5d9b8502f891b-Abstract-Conference.html\">STW &#8217;24<\/a>] shows how to use the Low-Degree Algorithm and the Moment-Matching Tester to get algorithms in Framework 3 for the challenging hypothesis class of polynomial threshold functions.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The work of [<a href=\"https:\/\/papers.nips.cc\/paper_files\/paper\/2023\/hash\/204d9a9a4816a45909010587ffc3204b-Abstract-Conference.html\">GKSV &#8217;23<\/a>] extends Framework 3 to families of distributions. Rather than being required to run in time <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BT%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{T}\" class=\"latex\" \/> only when <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%3DD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}=D_{&#92;text{Assumption}}}\" class=\"latex\" \/> for a specific distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Assumption}}}\" class=\"latex\" \/>, the algorithms are required to run in time <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BT%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{T}\" class=\"latex\" \/> when <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}}\" class=\"latex\" \/> belongs to an entire family of distributions <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BD%7D_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{D}_{&#92;text{Assumption}}}\" class=\"latex\" \/>. Designing such algorithms turns out to be intimately connected with the field of Sum-of-Squares relaxations [<a href=\"https:\/\/arxiv.org\/abs\/1711.07465\">KS17<\/a>].<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">[<a href=\"https:\/\/proceedings.mlr.press\/v247\/klivans24a.html\">KKV24<\/a>, <a href=\"https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2024\/hash\/e209210eae282e23e305df49fbb2769c-Abstract-Conference.html\">GSSV&#8217;24<\/a>, <a href=\"https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2024\/hash\/f8238b414fce7d91fedb896253545300-Abstract-Conference.html\">CKKSV&#8217;24<\/a>, <a href=\"https:\/\/proceedings.mlr.press\/v247\/klivans24b.html\">KSV&#8217;24<\/a>] build on the approaches described here to <em>test distribution shift<\/em>, i.e. to test that the unlabeled data found during deployment of a classifier <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h}\" class=\"latex\" \/> comes from the same distribution as the training data. Similar to the test in Section <a href=\"#subsecMaking-Low-Degree-Algorithm\">2.2<\/a>, the test is required to either guarantee that <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bh%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{h}\" class=\"latex\" \/> has a good prediction accuracy on this new dataset, or to detect that the new dataset came from a data distribution that differs from the distribution used during training.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><b>5. Open Problems. <\/b><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">There are unsolved problems for all three frameworks we discussed today. Even in Framework 1, where there are still large gaps in our understanding of some basic problems:<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Open problem 1: <\/b>Is there an algorithm in Framework 1 with accuracy <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{opt}_{&#92;mathcal{C}}+&#92;varepsilon}\" class=\"latex\" \/> and run-time bound <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bd%5E%7BO_%7B%5Cvarepsilon%7D%281%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{d^{O_{&#92;varepsilon}(1)}}\" class=\"latex\" \/>, when <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/> is the class of linear classifiers? What about algorithms with run-times <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B2%5E%7Bd%5E%7Bo%281%29%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{2^{d^{o(1)}}}\" class=\"latex\" \/> or <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B2%5E%7Bo%28d%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{2^{o(d)}}\" class=\"latex\" \/> when <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;varepsilon}\" class=\"latex\" \/> is fixed to be a small constant (say <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cvarepsilon%3D0.01%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;varepsilon=0.01}\" class=\"latex\" \/>)?<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">To the best of our understanding, assuming the Exponential Time Hypothesis, the NP-hardness results of [<a href=\"https:\/\/ieeexplore.ieee.org\/document\/4031389\">GR&#8217;06<\/a>, <a href=\"https:\/\/ieeexplore.ieee.org\/document\/4031391\">FGKP&#8217;06<\/a>] imply that no <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B2%5E%7Bo%28d%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{2^{o(d)}}\" class=\"latex\" \/>-time algorithm exists for this problem for a <em>proper<\/em> learning algorithm, i.e. an algorithm that itself outputs a linear classifier. However, when general <em>improper <\/em>algorithms are considered (i.e. algorithms with no restrictions on what classifier they can produce), no such hardness result is known.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The following is an open question in Framework 2:<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Open problem 2: <\/b>Is there an algorithm in Framework 2 with accuracy <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BO%28%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%29%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{O(&#92;text{opt}_{&#92;mathcal{C}})+&#92;varepsilon}\" class=\"latex\" \/> and run-time bound <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bpoly%7D%28d%2F%5Cvarepsilon%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{poly}(d\/&#92;varepsilon)}\" class=\"latex\" \/>, when <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/> is the class of linear classifiers and <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Assumption}}}\" class=\"latex\" \/> is the uniform distribution over <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cleft%5C%7B+%5Cpm1%5Cright%5C%7D+%5E%7Bd%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;left&#92;{ &#92;pm1&#92;right&#92;} ^{d}}\" class=\"latex\" \/>? What about algorithms with weaker accuracy bounds such as <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cwidetilde%7BO%7D%28%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%29%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;widetilde{O}(&#92;text{opt}_{&#92;mathcal{C}})+&#92;varepsilon}\" class=\"latex\" \/> and <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BO%28%5Csqrt%7B%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%7D%29%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{O(&#92;sqrt{&#92;text{opt}_{&#92;mathcal{C}}})+&#92;varepsilon}\" class=\"latex\" \/>?<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">For a long time, we have had algorithms with run-time <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bpoly%7D%28d%2F%5Cvarepsilon%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{poly}(d\/&#92;varepsilon)}\" class=\"latex\" \/> and accuracy <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BO%28%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%29%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{O(&#92;text{opt}_{&#92;mathcal{C}})+&#92;varepsilon}\" class=\"latex\" \/> when <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7Bassumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{assumption}}}\" class=\"latex\" \/> is the standard Gaussian distribution [<a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/2591796.2591839\">ABL &#8217;14<\/a>, <a href=\"https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/d785bf9067f8af9e078b93cf26de2b54-Abstract.html\">DKTZ &#8217;20<\/a>, <a href=\"https:\/\/proceedings.mlr.press\/v162\/diakonikolas22b.html\">DKTZ &#8217;22<\/a>]. Ultimately, one would hope to achieve this for all product distributions over <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathbb%7BR%7D%5E%7Bd%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathbb{R}^{d}}\" class=\"latex\" \/> and not only the Gaussian distribution, but even for the uniform distribution over <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cleft%5C%7B+%5Cpm1%5Cright%5C%7D+%5E%7Bd%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;left&#92;{ &#92;pm1&#92;right&#92;} ^{d}}\" class=\"latex\" \/> such algorithms are yet to be developed.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Finally, the following is an open question in Framework 3:<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Open problem 3: <\/b>Is there an algorithm in Framework 3 with accuracy <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BO%28%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%29%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{O(&#92;text{opt}_{&#92;mathcal{C}})+&#92;varepsilon}\" class=\"latex\" \/> and run-time bound <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bpoly%7D%28d%2F%5Cvarepsilon%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{poly}(d\/&#92;varepsilon)}\" class=\"latex\" \/>, when <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/> is the class of general linear classifiers and <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Assumption}}}\" class=\"latex\" \/> is the standard Gaussian distribution?<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">Note that [<a href=\"https:\/\/proceedings.mlr.press\/v247\/diakonikolas24a.html\">DKLZ&#8217;24<\/a>] gives an algorithm with a run-time bound <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bpoly%7D%28d%2F%5Cvarepsilon%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{poly}(d\/&#92;varepsilon)}\" class=\"latex\" \/> and accuracy <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BO%28%5Csqrt%7B%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%7D%29%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{O(&#92;sqrt{&#92;text{opt}_{&#92;mathcal{C}}})+&#92;varepsilon}\" class=\"latex\" \/>, [<a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3564246.3585117\">RV &#8217;23<\/a>, <a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3564246.3585206\">GKK &#8217;23<\/a>] give algorithms with accuracy <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;text{opt}_{&#92;mathcal{C}}+&#92;varepsilon}\" class=\"latex\" \/> and run-time <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bd%5E%7B%5Ctilde%7BO%7D%281%2F%5Cvarepsilon%5E%7B2%7D%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{d^{&#92;tilde{O}(1\/&#92;varepsilon^{2})}}\" class=\"latex\" \/>. As mentioned earlier, [<a href=\"https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2023\/hash\/7c319b62e2257b34cb0e1040ced2e007-Abstract-Conference.html\">DKKLZ &#8217;23<\/a>, <a href=\"https:\/\/papers.nips.cc\/paper_files\/paper\/2023\/hash\/204d9a9a4816a45909010587ffc3204b-Abstract-Conference.html\">GKSV &#8217;23<\/a>, <a href=\"https:\/\/openreview.net\/forum?id=z6n1fKMMC1\">GKSV &#8217;24<\/a>] give algorithms with accuracy <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BO%28%5Ctext%7Bopt%7D_%7B%5Cmathcal%7BC%7D%7D%29%2B%5Cvarepsilon%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{O(&#92;text{opt}_{&#92;mathcal{C}})+&#92;varepsilon}\" class=\"latex\" \/> but only when <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/> is the class of <em>origin-centered<\/em> linear classifiers.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"> <strong>Acknowledgements <\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We are grateful to Gautam Kamath, Adam Klivans, and Ronitt Rubinfeld for their valuable feedback on a draft of this blog post.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><b>Footnotes<\/b><\/p>\n\n\n<ol class=\"wp-block-footnotes\"><li id=\"15aafd02-c12c-4db6-b4a0-4f893efdd1a6\">A proof sketch: Draw <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BN%5E%7B2%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{N^{2}}\" class=\"latex\" \/> samples <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BS%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{S}\" class=\"latex\" \/> from the Gaussian distribution and let <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7Bsub-sample%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{sub-sample}}}\" class=\"latex\" \/> be uniform over <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BS%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{S}\" class=\"latex\" \/>. The distribution <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7Bsub-sample%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{sub-sample}}}\" class=\"latex\" \/> is <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5COmega%281%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;Omega(1)}\" class=\"latex\" \/>-far from Gaussian in total variation distance, but it can be shown that <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5COmega%28N%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;Omega(N)}\" class=\"latex\" \/> samples are necessary to distinguish such <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7Bsub-sample%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{sub-sample}}}\" class=\"latex\" \/> from the Gaussian distribution. We can take <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BN%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{N}\" class=\"latex\" \/> to be arbitrarily large, so no finite-sample tester exists. <a href=\"#15aafd02-c12c-4db6-b4a0-4f893efdd1a6-link\" aria-label=\"Jump to footnote reference 1\">\u21a9\ufe0e<\/a><\/li><li id=\"30a9afd3-f30d-4c82-8879-cc2de2a0d64c\">One can also consider labels <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cleft%5C%7B+y_%7Bi%7D%5Cright%5C%7D+%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;left&#92;{ y_{i}&#92;right&#92;} }\" class=\"latex\" \/> that are themselves random variables rather than a deterministic function of example <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bx%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{x}\" class=\"latex\" \/>. Everything is this blog post will apply to that setting as well. <a href=\"#30a9afd3-f30d-4c82-8879-cc2de2a0d64c-link\" aria-label=\"Jump to footnote reference 2\">\u21a9\ufe0e<\/a><\/li><li id=\"ebba8fcb-f5da-491f-bd59-92e53a33a4bc\">Throughout blog post, we will assume a success probability of (say) <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B0.99%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{0.99}\" class=\"latex\" \/>. This probability can be improved to <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B1-%5Cdelta%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{1-&#92;delta}\" class=\"latex\" \/> by repeating the process <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Clog1%2F%5Cdelta%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;log1\/&#92;delta}\" class=\"latex\" \/> times and selecting the classifier that does best on a fresh set of samples. <a href=\"#ebba8fcb-f5da-491f-bd59-92e53a33a4bc-link\" aria-label=\"Jump to footnote reference 3\">\u21a9\ufe0e<\/a><\/li><li id=\"a367653b-0bd9-4068-ac43-a1ec745f36d5\">Note: even for this easier task, no algorithm is known in Framework 2 with run-time better than <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B2%5E%7BO%28d%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{2^{O(d)}}\" class=\"latex\" \/>. <a href=\"#a367653b-0bd9-4068-ac43-a1ec745f36d5-link\" aria-label=\"Jump to footnote reference 4\">\u21a9\ufe0e<\/a><\/li><li id=\"e18fc90b-639d-4ee0-bce9-3c77a7eaa096\">Henceforth, for simplicity, when we say &#8220;a Framework 3 algorithm <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BA%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{A}}\" class=\"latex\" \/> runs in time <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BT%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{T}\" class=\"latex\" \/>&#8221; we really mean &#8220;<img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BA%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{A}}\" class=\"latex\" \/> runs in time <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BT%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{T}\" class=\"latex\" \/> when <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%3DD_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}=D_{&#92;text{Assumption}}}\" class=\"latex\" \/>&#8221; as Framework 3 does not require a run-time bound when <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BD_%7B%5Ctext%7BExamples%7D%7D%5Cneq+D_%7B%5Ctext%7BAssumption%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{D_{&#92;text{Examples}}&#92;neq D_{&#92;text{Assumption}}}\" class=\"latex\" \/>. <a href=\"#e18fc90b-639d-4ee0-bce9-3c77a7eaa096-link\" aria-label=\"Jump to footnote reference 5\">\u21a9\ufe0e<\/a><\/li><li id=\"f0bd6d70-92ae-44be-9cd5-09cb8b45e961\">If the algorithm runs longer than <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7BT%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{T}\" class=\"latex\" \/> or fails to output a classifier, stop and go to last step. <a href=\"#f0bd6d70-92ae-44be-9cd5-09cb8b45e961-link\" aria-label=\"Jump to footnote reference 6\">\u21a9\ufe0e<\/a><\/li><li id=\"e24c404f-9465-416b-9d59-3feddcf98899\">The tester below generalizes the tester of [<a href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0020019003003594\">AGM &#8217;03<\/a>, <a href=\"https:\/\/dl.acm.org\/doi\/abs\/10.1145\/1250790.1250863\">AAKMRX &#8217;07<\/a>, <a href=\"https:\/\/drops.dagstuhl.de\/storage\/00lipics\/lipics-vol116-approx-random2018\/LIPIcs.APPROX-RANDOM.2018.54\/LIPIcs.APPROX-RANDOM.2018.54.pdf\">OZ &#8217;18<\/a>] for testing <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bk%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{k}\" class=\"latex\" \/>-wise independence. <a href=\"#e24c404f-9465-416b-9d59-3feddcf98899-link\" aria-label=\"Jump to footnote reference 7\">\u21a9\ufe0e<\/a><\/li><li id=\"5da7233e-676e-42f9-8bc5-41c5198adc8a\">Inspecting the general recipe described in this section, we see that, strictly speaking, to execute this recipe one needs not only a tester <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BT%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{T}}\" class=\"latex\" \/> but also an algorithm <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BA%7D%27%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{A}&#039;}\" class=\"latex\" \/> for class <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/> in Framework 1 (which is allowed to be very slow). Existence of such <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BA%7D%27%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{A}&#039;}\" class=\"latex\" \/> is a very mild condition and holds for all concept classes <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7B%5Cmathcal%7BC%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{&#92;mathcal{C}}\" class=\"latex\" \/> for which this problem has been considered. <a href=\"#5da7233e-676e-42f9-8bc5-41c5198adc8a-link\" aria-label=\"Jump to footnote reference 8\">\u21a9\ufe0e<\/a><\/li><li id=\"010dd730-d676-4dad-b414-d8513a155208\">To do this, one needs to show that none of the coefficients of <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%7Bp_%7B%5Ctext%7Bup%7D%7D%28x%29-p_%7B%5Ctext%7Bdown%7D%7D%28x%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0&#038;c=20201002\" alt=\"{p_{&#92;text{up}}(x)-p_{&#92;text{down}}(x)}\" class=\"latex\" \/> is too large, which turns out to be true. <a href=\"#010dd730-d676-4dad-b414-d8513a155208-link\" aria-label=\"Jump to footnote reference 9\">\u21a9\ufe0e<\/a><\/li><li id=\"8efb53cb-cccd-41c3-ae64-7e218cc109e9\">For the rest of this subsection we will refer to Standard Gaussian distribution as simply Gaussian and origin-centered linear classifiers as simply linear classifiers. <a href=\"#8efb53cb-cccd-41c3-ae64-7e218cc109e9-link\" aria-label=\"Jump to footnote reference 10\">\u21a9\ufe0e<\/a><\/li><li id=\"5da9fe9e-2908-44f7-b5dd-3394397a0f9a\">The definition is inspired by the setting of tolerant property testing [<a href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0022000006000444\">PRR06<\/a>]. <a href=\"#5da9fe9e-2908-44f7-b5dd-3394397a0f9a-link\" aria-label=\"Jump to footnote reference 11\">\u21a9\ufe0e<\/a><\/li><li id=\"d7bfd691-edf6-4468-9965-ab5b7dba918c\">This last insight is closely related to the method used in [<a href=\"https:\/\/dl.acm.org\/doi\/abs\/10.1145\/3188745.3188754\">DKS18<\/a>] to solve a different problem. <a href=\"#d7bfd691-edf6-4468-9965-ab5b7dba918c-link\" aria-label=\"Jump to footnote reference 12\">\u21a9\ufe0e<\/a><\/li><\/ol>\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today&#8217;s technical post is by Arsen Vasilyan. This focuses on the very exciting new &#8220;testable learning&#8221; he introduced with Rubinfeld in a 2023 paper. There&#8217;s been a flurry of work since then, so this is a good chance to catch up in case you&#8217;re behind! 1. The Goal: Learning with Noisy Labels 1.1 Introduction Many [&hellip;]<\/p>\n","protected":false},"author":20,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","om_disable_all_campaigns":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"[{\"content\":\"A proof sketch: Draw $latex {N^{2}}&amp;fg=000000$ samples $latex {S}&amp;fg=000000$ from the Gaussian distribution and let $latex {D_{\\\\text{sub-sample}}}&amp;fg=000000$ be uniform over $latex {S}&amp;fg=000000$. The distribution $latex {D_{\\\\text{sub-sample}}}&amp;fg=000000$ is $latex {\\\\Omega(1)}&amp;fg=000000$-far from Gaussian in total variation distance, but it can be shown that $latex {\\\\Omega(N)}&amp;fg=000000$ samples are necessary to distinguish such $latex {D_{\\\\text{sub-sample}}}&amp;fg=000000$ from the Gaussian distribution. We can take $latex {N}&amp;fg=000000$ to be arbitrarily large, so no finite-sample tester exists.\",\"id\":\"15aafd02-c12c-4db6-b4a0-4f893efdd1a6\"},{\"content\":\"One can also consider labels $latex {\\\\left\\\\{ y_{i}\\\\right\\\\} }&amp;fg=000000$ that are themselves random variables rather than a deterministic function of example $latex {x}&amp;fg=000000$. Everything is this blog post will apply to that setting as well.\",\"id\":\"30a9afd3-f30d-4c82-8879-cc2de2a0d64c\"},{\"content\":\"Throughout blog post, we will assume a success probability of (say) $latex {0.99}&amp;fg=000000$. This probability can be improved to $latex {1-\\\\delta}&amp;fg=000000$ by repeating the process $latex {\\\\log1\/\\\\delta}&amp;fg=000000$ times and selecting the classifier that does best on a fresh set of samples.\",\"id\":\"ebba8fcb-f5da-491f-bd59-92e53a33a4bc\"},{\"content\":\"Note: even for this easier task, no algorithm is known in Framework 2 with run-time better than $latex {2^{O(d)}}&amp;fg=000000$.\",\"id\":\"a367653b-0bd9-4068-ac43-a1ec745f36d5\"},{\"content\":\"Henceforth, for simplicity, when we say ``a Framework 3 algorithm $latex {\\\\mathcal{A}}&amp;fg=000000$ runs in time $latex {T}&amp;fg=000000$'' we really mean ``$latex {\\\\mathcal{A}}&amp;fg=000000$ runs in time $latex {T}&amp;fg=000000$ when $latex {D_{\\\\text{Examples}}=D_{\\\\text{Assumption}}}&amp;fg=000000$'' as Framework 3 does not require a run-time bound when $latex {D_{\\\\text{Examples}}\\\\neq D_{\\\\text{Assumption}}}&amp;fg=000000$.\",\"id\":\"e18fc90b-639d-4ee0-bce9-3c77a7eaa096\"},{\"content\":\"If the algorithm runs longer than $latex {T}&amp;fg=000000$ or fails to output a classifier, stop and go to last step.\",\"id\":\"f0bd6d70-92ae-44be-9cd5-09cb8b45e961\"},{\"content\":\"The tester below generalizes the tester of [<a href=\\\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0020019003003594\\\">AGM '03<\/a>, <a href=\\\"https:\/\/dl.acm.org\/doi\/abs\/10.1145\/1250790.1250863\\\">AAKMRX '07<\/a>, <a href=\\\"https:\/\/drops.dagstuhl.de\/storage\/00lipics\/lipics-vol116-approx-random2018\/LIPIcs.APPROX-RANDOM.2018.54\/LIPIcs.APPROX-RANDOM.2018.54.pdf\\\">OZ '18<\/a>] for testing $latex {k}&amp;fg=000000$-wise independence.\",\"id\":\"e24c404f-9465-416b-9d59-3feddcf98899\"},{\"content\":\"Inspecting the general recipe described in this section, we see that, strictly speaking, to execute this recipe one needs not only a tester $latex {\\\\mathcal{T}}&amp;fg=000000$ but also an algorithm $latex {\\\\mathcal{A}'}&amp;fg=000000$ for class $latex {\\\\mathcal{C}}&amp;fg=000000$ in Framework 1 (which is allowed to be very slow). Existence of such $latex {\\\\mathcal{A}'}&amp;fg=000000$ is a very mild condition and holds for all concept classes $latex {\\\\mathcal{C}}&amp;fg=000000$ for which this problem has been considered.\",\"id\":\"5da7233e-676e-42f9-8bc5-41c5198adc8a\"},{\"content\":\"To do this, one needs to show that none of the coefficients of $latex {p_{\\\\text{up}}(x)-p_{\\\\text{down}}(x)}&amp;fg=000000$ is too large, which turns out to be true.\",\"id\":\"010dd730-d676-4dad-b414-d8513a155208\"},{\"content\":\"For the rest of this subsection we will refer to Standard Gaussian distribution as simply Gaussian and origin-centered linear classifiers as simply linear classifiers.\",\"id\":\"8efb53cb-cccd-41c3-ae64-7e218cc109e9\"},{\"content\":\"The definition is inspired by the setting of tolerant property testing [<a href=\\\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0022000006000444\\\">PRR06<\/a>].\",\"id\":\"5da9fe9e-2908-44f7-b5dd-3394397a0f9a\"},{\"content\":\"This last insight is closely related to the method used in [<a href=\\\"https:\/\/dl.acm.org\/doi\/abs\/10.1145\/3188745.3188754\\\">DKS18<\/a>] to solve a different problem.\",\"id\":\"d7bfd691-edf6-4468-9965-ab5b7dba918c\"}]","jetpack_post_was_ever_published":false},"categories":[4],"tags":[],"class_list":["post-1010","post","type-post","status-publish","format-standard","hentry","category-technical"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.let-all.com\/blog\/wp-json\/wp\/v2\/posts\/1010","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.let-all.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.let-all.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.let-all.com\/blog\/wp-json\/wp\/v2\/users\/20"}],"replies":[{"embeddable":true,"href":"https:\/\/www.let-all.com\/blog\/wp-json\/wp\/v2\/comments?post=1010"}],"version-history":[{"count":121,"href":"https:\/\/www.let-all.com\/blog\/wp-json\/wp\/v2\/posts\/1010\/revisions"}],"predecessor-version":[{"id":1141,"href":"https:\/\/www.let-all.com\/blog\/wp-json\/wp\/v2\/posts\/1010\/revisions\/1141"}],"wp:attachment":[{"href":"https:\/\/www.let-all.com\/blog\/wp-json\/wp\/v2\/media?parent=1010"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.let-all.com\/blog\/wp-json\/wp\/v2\/categories?post=1010"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.let-all.com\/blog\/wp-json\/wp\/v2\/tags?post=1010"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}