Activation Functions

In this post we will talk about activation functions, explaining what they are and which ones are most commonly used (e.g. ReLU). Disclaimer: These notes are for the most part a collection of concepts taken from the slides of the ‘Artificial Neural Networks and Deep Learning’ course at Polytechnic of Milan, the book ‘Deep Learning’ (Goodfellow et al., 2016) and from some other online resources. I am simply putting together all the information to study for the exam, and I thought it would be a good idea to upload the notes here since they may be useful to anyone interested in this topic....
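As a quick taste of what the post covers, here is a minimal NumPy sketch of two common activations, including the ReLU mentioned above; the function names and the toy input are my own illustration, not code from the post:

```python
import numpy as np

def relu(x):
    # ReLU: passes positive values through, zeroes out the rest
    return np.maximum(0.0, x)

def sigmoid(x):
    # squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))     # [0.  0.  0.  0.5 2. ]
print(sigmoid(x))  # values strictly between 0 and 1
```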

November 5, 2019

Overfitting in NNs

In this post we will talk about the problem of overfitting, explaining what it is, what its causes are and how we can deal with it. More precisely, the following techniques will be explained: early stopping, weight decay and dropout. Disclaimer: These notes are for the most part a collection of concepts taken from the slides of the ‘Artificial Neural Networks and Deep Learning’ course at Polytechnic of Milan, the book ‘Deep Learning’ (Goodfellow et al., 2016) and from some other online resources....
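To give a feel for one of these techniques, here is a minimal, self-contained sketch of early stopping with a patience counter; the function and the toy loss curve are illustrative, not taken from the post:

```python
def early_stopping_epoch(val_losses, patience=3):
    # stop once the validation loss has not improved for `patience` epochs
    best_loss, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_epoch

# validation loss bottoms out at epoch 3, then rises: stop there
print(early_stopping_epoch([1.0, 0.7, 0.5, 0.4, 0.45, 0.5, 0.6]))  # 3
```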

October 30, 2019

Error Functions in NNs

In this post we will talk about how error functions are used in neural networks and how they are chosen according to the task at hand. Disclaimer: These notes are for the most part a collection of concepts taken from the slides of the ‘Artificial Neural Networks and Deep Learning’ course at Polytechnic of Milan and from some other online resources. I am simply putting together all the information to study for the exam, and I thought it would be a good idea to upload the notes here since they may be useful to anyone interested in this topic....
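As a concrete hint of the task-dependent choice, here is a small NumPy sketch of the two most common error functions (sum-of-squares for regression, cross-entropy for binary classification); names and details are my own illustration:

```python
import numpy as np

def sum_of_squares(t, y):
    # typical error function for regression targets
    return 0.5 * np.sum((t - y) ** 2)

def binary_cross_entropy(t, p, eps=1e-12):
    # negative Bernoulli log-likelihood, typical for binary classification
    p = np.clip(p, eps, 1.0 - eps)
    return -np.sum(t * np.log(p) + (1 - t) * np.log(1 - p))

t = np.array([1.0, 0.0, 1.0])
print(sum_of_squares(t, np.array([0.9, 0.2, 0.8])))
print(binary_cross_entropy(t, np.array([0.9, 0.2, 0.8])))
```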

October 28, 2019

Kernel Methods

In this post we will talk about Kernel Methods, explaining the math behind them in order to understand how powerful they are and for which tasks they can be used efficiently. Disclaimer: the following notes were written following the slides provided by Professor Restelli at Polytechnic of Milan and the book ‘Pattern Recognition and Machine Learning’. Kernel methods are non-parametric and memory-based (e.g. K-NN), i....
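As a minimal illustration of the memory-based flavour, here is a sketch of a Gaussian (RBF) kernel and the Gram matrix it induces over a training set; the kernel choice and parameter names are my own example, not necessarily the post's:

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Gaussian kernel: similarity decays with squared Euclidean distance
    return np.exp(-gamma * np.sum((x - y) ** 2))

def gram_matrix(X, gamma=1.0):
    # K[i, j] = k(x_i, x_j); memory-based methods keep the whole training set
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = rbf_kernel(X[i], X[j], gamma)
    return K

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
print(gram_matrix(X))  # 3x3 symmetric matrix with ones on the diagonal
```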

October 24, 2019

PAC learning and VC dimension

In this post we will talk about PAC Learning and VC Dimension, explaining what they are and why they are useful in Machine Learning. Disclaimer: the following notes were written following the slides provided by Professor Restelli at Polytechnic of Milan and the book ‘Pattern Recognition and Machine Learning’. In Probably Approximately Correct Learning, the learner receives samples and must select a generalization function (called the hypothesis) from a certain class of possible functions....
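To make ‘probably approximately correct’ concrete, a standard sample-complexity bound for a finite hypothesis class $H$ and a learner that outputs a hypothesis consistent with the training data is the following (textbook context, not a quote from the post):

$$N \geq \frac{1}{\varepsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$$

With at least $N$ samples, the selected hypothesis has true error at most $\varepsilon$ with probability at least $1-\delta$.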

October 19, 2019

Bias-Variance Tradeoff and Model Selection

In this post we will talk about the Bias-Variance tradeoff, explaining where it comes from and how we can manage it, introducing techniques for model selection (feature selection, regularization, dimensionality reduction) and model ensembles (bagging and boosting). Disclaimer: the following notes were written following the slides provided by Professor Restelli at Polytechnic of Milan and the book ‘Pattern Recognition and Machine Learning’. No Free Lunch Theorems: define $Acc_G(L)$ as the generalization accuracy of the learner $L$, which is the accuracy of $L$ on non-training samples....
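For reference, the decomposition behind the tradeoff is the standard one for squared loss (stated here in its textbook form, not quoted from the post): with $f$ the true function, $\hat{y}$ the learned predictor, $\sigma^2$ the noise variance, and the expectation taken over training sets,

$$\mathbb{E}\left[(t - \hat{y}(x))^2\right] = \left(f(x) - \mathbb{E}[\hat{y}(x)]\right)^2 + \mathbb{E}\left[\left(\hat{y}(x) - \mathbb{E}[\hat{y}(x)]\right)^2\right] + \sigma^2$$

i.e. squared bias plus variance plus irreducible error.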

October 17, 2019

Linear Classification

In this post we will talk about Linear Classification, explaining some of the main methods that form the basis of this task. Disclaimer: the following notes were written following the slides provided by Professor Restelli at Polytechnic of Milan and the book ‘Pattern Recognition and Machine Learning’. The goal in classification is to take an input vector $x$ and to assign it to one of $K$ discrete classes $C_k$, where $k = 1,\dots,K$....
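A minimal sketch of the linear discriminant view of this task (one linear score per class, assign the argmax); the weight values below are made up for illustration:

```python
import numpy as np

def predict_class(x, W, b):
    # one linear discriminant per class: y_k(x) = w_k . x + b_k
    scores = W @ x + b
    return int(np.argmax(scores))  # index of the winning class

W = np.array([[1.0, -1.0],   # weights for class 0
              [-1.0, 1.0],   # weights for class 1
              [0.5, 0.5]])   # weights for class 2
b = np.zeros(3)
print(predict_class(np.array([2.0, 0.0]), W, b))  # class 0
```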

September 30, 2019

Linear Regression

In this post we will analyze Linear Regression models in a fairly detailed way, discussing the different approaches with which the problem can be tackled and also explaining what regularization is. Disclaimer: the following notes were written following the slides provided by Professor Restelli at Polytechnic of Milan and the book ‘Pattern Recognition and Machine Learning’. The goal of regression is to predict the value of one or more continuous target variables $t$ given the value of a $D$-dimensional vector $\boldsymbol{x}$ of input variables....
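As a small preview, here is a sketch of the closed-form least-squares solution, with an optional L2 (ridge) penalty standing in for the regularization the post discusses; the function name and the toy data are mine:

```python
import numpy as np

def fit_linear(X, t, lam=0.0):
    # normal equations; lam > 0 adds an L2 (ridge) penalty on the weights
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ t)

# toy 1-D problem with a bias column: t is roughly 2x + 1
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
t = np.array([1.0, 3.1, 4.9, 7.0])
print(fit_linear(X, t))           # approximately [1, 2]
print(fit_linear(X, t, lam=1.0))  # slightly shrunk weights
```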

September 25, 2019