Understanding Machine Learning: From Theory to Algorithms
Shai Shalev-Shwartz and Shai Ben-David

Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. The book provides an extensive theoretical account of the fundamental ideas underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics of the field, the book covers a wide array of central topics that have not been addressed by previous textbooks. These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds. Designed for an advanced undergraduate or beginning graduate course, the text makes the fundamentals and algorithms of machine learning accessible to students and non-expert readers in statistics, computer science, mathematics, and engineering.
Contents
Foundations | 11
A Formal Learning Model | 22
Learning via Uniform Convergence | 31
The Bias-Complexity Tradeoff | 36
The VC-Dimension | 43
Nonuniform Learnability | 58
The Runtime of Learning | 73
From Theory to Algorithms | 87
Boosting | 101
Model Selection and Validation | 114
Convex Learning Problems | 124
Regularization and Stability | 137
Stochastic Gradient Descent | 150
Support Vector Machines | 167
Kernel Methods | 179
Multiclass, Ranking, and Complex Prediction Problems | 190
Decision Trees | 212
Nearest Neighbor | 219
Neural Networks | 228
Additional Learning Models | 243
Clustering | 264
Dimensionality Reduction | 278
Generative Models | 295
Feature Selection and Generation | 309
Advanced Theory | 323