Deep learning
Material type: Book
ISBN: 9780262035613
Includes index and bibliography.
Contents:
Website -- Acknowledgments -- Notation
1 Introduction -- 1.1 Who Should Read This Book? -- 1.2 Historical Trends in Deep Learning
I Applied Math and Machine Learning Basics
2 Linear Algebra -- 2.1 Scalars, Vectors, Matrices and Tensors -- 2.2 Multiplying Matrices and Vectors -- 2.3 Identity and Inverse Matrices -- 2.4 Linear Dependence and Span -- 2.5 Norms -- 2.6 Special Kinds of Matrices and Vectors -- 2.7 Eigendecomposition -- 2.8 Singular Value Decomposition -- 2.9 The Moore-Penrose Pseudoinverse -- 2.10 The Trace Operator -- 2.11 The Determinant -- 2.12 Example: Principal Components Analysis
3 Probability and Information Theory -- 3.1 Why Probability? -- 3.2 Random Variables -- 3.3 Probability Distributions -- 3.4 Marginal Probability -- 3.5 Conditional Probability -- 3.6 The Chain Rule of Conditional Probabilities -- 3.7 Independence and Conditional Independence -- 3.8 Expectation, Variance and Covariance -- 3.9 Common Probability Distributions -- 3.10 Useful Properties of Common Functions -- 3.11 Bayes' Rule -- 3.12 Technical Details of Continuous Variables -- 3.13 Information Theory -- 3.14 Structured Probabilistic Models
4 Numerical Computation -- 4.1 Overflow and Underflow -- 4.2 Poor Conditioning -- 4.3 Gradient-Based Optimization -- 4.4 Constrained Optimization -- 4.5 Example: Linear Least Squares
5 Machine Learning Basics -- 5.1 Learning Algorithms -- 5.2 Capacity, Overfitting and Underfitting -- 5.3 Hyperparameters and Validation Sets -- 5.4 Estimators, Bias and Variance -- 5.5 Maximum Likelihood Estimation -- 5.6 Bayesian Statistics -- 5.7 Supervised Learning Algorithms -- 5.8 Unsupervised Learning Algorithms -- 5.9 Stochastic Gradient Descent -- 5.10 Building a Machine Learning Algorithm -- 5.11 Challenges Motivating Deep Learning
II Deep Networks: Modern Practices
6 Deep Feedforward Networks -- 6.1 Example: Learning XOR -- 6.2 Gradient-Based Learning -- 6.3 Hidden Units -- 6.4 Architecture Design -- 6.5 Back-Propagation and Other Differentiation Algorithms -- 6.6 Historical Notes
7 Regularization for Deep Learning -- 7.1 Parameter Norm Penalties -- 7.2 Norm Penalties as Constrained Optimization -- 7.3 Regularization and Under-Constrained Problems -- 7.4 Dataset Augmentation -- 7.5 Noise Robustness -- 7.6 Semi-Supervised Learning -- 7.7 Multitask Learning -- 7.8 Early Stopping -- 7.9 Parameter Tying and Parameter Sharing -- 7.10 Sparse Representations -- 7.11 Bagging and Other Ensemble Methods -- 7.12 Dropout -- 7.13 Adversarial Training -- 7.14 Tangent Distance, Tangent Prop and Manifold Tangent Classifier
8 Optimization for Training Deep Models -- 8.1 How Learning Differs from Pure Optimization -- 8.2 Challenges in Neural Network Optimization -- 8.3 Basic Algorithms -- 8.4 Parameter Initialization Strategies -- 8.5 Algorithms with Adaptive Learning Rates -- 8.6 Approximate Second-Order Methods -- 8.7 Optimization Strategies and Meta-Algorithms
9 Convolutional Networks -- 9.1 The Convolution Operation -- 9.2 Motivation -- 9.3 Pooling -- 9.4 Convolution and Pooling as an Infinitely Strong Prior -- 9.5 Variants of the Basic Convolution Function -- 9.6 Structured Outputs -- 9.7 Data Types -- 9.8 Efficient Convolution Algorithms -- 9.9 Random or Unsupervised Features -- 9.10 The Neuroscientific Basis for Convolutional Networks -- 9.11 Convolutional Networks and the History of Deep Learning
10 Sequence Modeling: Recurrent and Recursive Nets -- 10.1 Unfolding Computational Graphs -- 10.2 Recurrent Neural Networks -- 10.3 Bidirectional RNNs -- 10.4 Encoder-Decoder Sequence-to-Sequence Architectures -- 10.5 Deep Recurrent Networks -- 10.6 Recursive Neural Networks -- 10.7 The Challenge of Long-Term Dependencies -- 10.8 Echo State Networks -- 10.9 Leaky Units and Other Strategies for Multiple Time Scales -- 10.10 The Long Short-Term Memory and Other Gated RNNs -- 10.11 Optimization for Long-Term Dependencies -- 10.12 Explicit Memory
11 Practical Methodology -- 11.1 Performance Metrics -- 11.2 Default Baseline Models -- 11.3 Determining Whether to Gather More Data -- 11.4 Selecting Hyperparameters -- 11.5 Debugging Strategies -- 11.6 Example: Multi-Digit Number Recognition
12 Applications -- 12.1 Large-Scale Deep Learning -- 12.2 Computer Vision -- 12.3 Speech Recognition -- 12.4 Natural Language Processing -- 12.5 Other Applications
III Deep Learning Research
13 Linear Factor Models -- 13.1 Probabilistic PCA and Factor Analysis -- 13.2 Independent Component Analysis (ICA) -- 13.3 Slow Feature Analysis -- 13.4 Sparse Coding -- 13.5 Manifold Interpretation of PCA
14 Autoencoders -- 14.1 Undercomplete Autoencoders -- 14.2 Regularized Autoencoders -- 14.3 Representational Power, Layer Size and Depth -- 14.4 Stochastic Encoders and Decoders -- 14.5 Denoising Autoencoders -- 14.6 Learning Manifolds with Autoencoders -- 14.7 Contractive Autoencoders -- 14.8 Predictive Sparse Decomposition -- 14.9 Applications of Autoencoders
15 Representation Learning -- 15.1 Greedy Layer-Wise Unsupervised Pretraining -- 15.2 Transfer Learning and Domain Adaptation -- 15.3 Semi-Supervised Disentangling of Causal Factors -- 15.4 Distributed Representation -- 15.5 Exponential Gains from Depth -- 15.6 Providing Clues to Discover Underlying Causes
16 Structured Probabilistic Models for Deep Learning -- 16.1 The Challenge of Unstructured Modeling -- 16.2 Using Graphs to Describe Model Structure -- 16.3 Sampling from Graphical Models -- 16.4 Advantages of Structured Modeling -- 16.5 Learning about Dependencies -- 16.6 Inference and Approximate Inference -- 16.7 The Deep Learning Approach to Structured Probabilistic Models
17 Monte Carlo Methods -- 17.1 Sampling and Monte Carlo Methods -- 17.2 Importance Sampling -- 17.3 Markov Chain Monte Carlo Methods -- 17.4 Gibbs Sampling -- 17.5 The Challenge of Mixing between Separated Modes
18 Confronting the Partition Function -- 18.1 The Log-Likelihood Gradient -- 18.2 Stochastic Maximum Likelihood and Contrastive Divergence -- 18.3 Pseudolikelihood -- 18.4 Score Matching and Ratio Matching -- 18.5 Denoising Score Matching -- 18.6 Noise-Contrastive Estimation -- 18.7 Estimating the Partition Function
19 Approximate Inference -- 19.1 Inference as Optimization -- 19.2 Expectation Maximization -- 19.3 MAP Inference and Sparse Coding -- 19.4 Variational Inference and Learning -- 19.5 Learned Approximate Inference
20 Deep Generative Models -- 20.1 Boltzmann Machines -- 20.2 Restricted Boltzmann Machines -- 20.3 Deep Belief Networks -- 20.4 Deep Boltzmann Machines -- 20.5 Boltzmann Machines for Real-Valued Data -- 20.6 Convolutional Boltzmann Machines -- 20.7 Boltzmann Machines for Structured or Sequential Outputs -- 20.8 Other Boltzmann Machines -- 20.9 Back-Propagation through Random Operations -- 20.10 Directed Generative Nets -- 20.11 Drawing Samples from Autoencoders -- 20.12 Generative Stochastic Networks -- 20.13 Other Generation Schemes -- 20.14 Evaluating Generative Models -- 20.15 Conclusion
Bibliography -- Index