Quantization in Deep Learning
In recent years deep learning models have become huge, reaching hundreds of billions of parameters, hence the need to reduce their size. Of course, this had to be done without sacrificing accuracy. Enter quantization. Background As you might know, deep learning models eat numbers, both during training and inference. When the task has to do with images, we just note that images are nothing more than matrices of pixels, so we’re already good to go....
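As a taste of what the post covers, here is a minimal sketch of the core idea, affine (asymmetric) quantization of float weights to 8-bit integers; the function names and the uint8 range are my own assumptions for illustration, not code from the post:

```python
import numpy as np

def quantize_uint8(x: np.ndarray):
    """Map float values to uint8 via a scale and zero-point (affine quantization)."""
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / 255.0 or 1.0            # guard against a constant tensor
    zero_point = int(round(-x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover an approximation of the original floats."""
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_uint8(weights)
print(np.abs(weights - dequantize(q, scale, zp)).max())  # error on the order of the scale
```

Storage drops from 32 bits to 8 bits per value, at the cost of a small, bounded reconstruction error.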
What is the KV cache?
Recently we’ve seen researchers and engineers scaling transformer-based models to hundreds of billions of parameters. The transformer architecture is exactly what made this possible, thanks to its sequence parallelism (here is an introduction to the transformer architecture). However, while it certainly enables an efficient training procedure, the same cannot be said about the inference process. Background Recall the definition of Attention given in the “Attention Is All You Need” paper:...
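For quick reference, the definition the excerpt points to is the standard scaled dot-product attention from that paper, with $Q$, $K$, $V$ the query, key and value matrices and $d_k$ the key dimension:

$$\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$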
Word Embedding
In this post I will give you a brief introduction to Word Embedding, a technique used in NLP as an efficient representation of words. Disclaimer: These notes are for the most part a collection of concepts taken from the slides of the ‘Artificial Neural Networks and Deep Learning’ course at Polytechnic of Milan and from some other online resources. I am simply putting together all the information to study for the exam and I thought it would be a good idea to upload them here since they can be useful for someone who is interested in this topic....
Seq2Seq models and the Attention mechanism
The path followed in this post is: sequence-to-sequence models $\rightarrow$ Neural Turing Machines $\rightarrow$ attentional interfaces $\rightarrow$ transformers. This post packs in a lot, but I tried to keep it as simple as possible without losing important details! Disclaimer: These notes are for the most part a collection of concepts taken from the slides of the ‘Artificial Neural Networks and Deep Learning’ course at Polytechnic of Milan, the book ‘Deep Learning’ (Goodfellow-et-al-2016) and from some other online resources....
Introduction to RNN and LSTM
In this post I will go through Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs), explaining why RNNs are not enough to deal with sequence modeling and how LSTMs solve those problems. Disclaimer: These notes are for the most part a collection of concepts taken from the slides of the ‘Artificial Neural Networks and Deep Learning’ course at Polytechnic of Milan, the book ‘Deep Learning’ (Goodfellow-et-al-2016) and from some other online resources....
Introduction to GAN
In this post I will give you an introduction to Generative Adversarial Networks, explaining the reasons behind their architecture and how they are trained. Disclaimer: These notes are for the most part a collection of concepts taken from the slides of the ‘Artificial Neural Networks and Deep Learning’ course at Polytechnic of Milan, the book ‘Deep Learning’ (Goodfellow-et-al-2016) and from some other online resources. I am simply putting together all the information to study for the exam and I thought it would be a good idea to upload them here since they can be useful for someone who is interested in this topic....
Object Localization and Detection
In this post I will introduce the Object Localization and Detection task, starting from the most straightforward solutions and moving up to the models that reached state-of-the-art performance, i.e. R-CNN, Fast R-CNN, Faster R-CNN and YOLO. Disclaimer: These notes are for the most part a collection of concepts taken from the slides of the ‘Artificial Neural Networks and Deep Learning’ course at Polytechnic of Milan and from some other online resources. I am just putting together all the information to study for the exam and I thought it would be a good idea to upload them here since they can be useful for someone interested in this topic....
Image Segmentation
In this post I will explain Image Segmentation, focusing on the architecture of the models used to perform this task. Fully Convolutional Networks and U-Net will be at the center of the discussion. Disclaimer: These notes are for the most part a collection of concepts taken from the slides of the ‘Artificial Neural Networks and Deep Learning’ course at Polytechnic of Milan and from some other online resources. I am just putting together all the information to study for the exam and I thought it would be a good idea to upload them here since they can be useful for someone interested in this topic....
Introduction to CNN
In this post I will give you an introduction to Convolutional Neural Networks (CNNs). We will see the reasons behind the success of this architecture and analyze it layer by layer. Disclaimer: These notes are for the most part a collection of concepts taken from the slides of the ‘Artificial Neural Networks and Deep Learning’ course at Polytechnic of Milan, the book ‘Deep Learning’ (Goodfellow-et-al-2016) and from some other online resources....
Batch Normalization
In this post we will talk about batch normalization, explaining what it is and how it works! Disclaimer: These notes are for the most part a collection of concepts taken from the slides of the ‘Artificial Neural Networks and Deep Learning’ course at Polytechnic of Milan, the book ‘Deep Learning’ (Goodfellow-et-al-2016) and from some other online resources. I am simply putting together all the information to study for the exam and I thought it would be a good idea to upload them here since they can be useful for someone who is interested in this topic....