adeeplearner's blog

My notes on Memory Efficient Swish

May 17, 2021

Recently, while implementing EfficientNet networks, I came across a GitHub comment detailing an implementation of the Swish activation function that promises to save up to 30% of GPU memory for EfficientNets. In this (short) blog post I will briefly go over the details of this implementation and explain what enables it to save GPU memory.
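
As a rough illustration of the idea, here is a minimal scalar sketch in plain Python. It assumes the saving comes from storing only the input during the forward pass and recomputing the sigmoid in the backward pass, instead of keeping every intermediate result around. The function names are hypothetical and this is not the code from the GitHub comment; in PyTorch the same pattern would typically live in a custom autograd function.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def swish_forward(x: float):
    """Return swish(x) = x * sigmoid(x) and the one value kept for backward."""
    saved_input = x  # assumption: only the input is stored, not sigmoid(x)
    return x * sigmoid(x), saved_input


def swish_backward(saved_input: float, grad_output: float) -> float:
    """Recompute sigmoid from the saved input and apply the analytic derivative."""
    s = sigmoid(saved_input)
    # d/dx [x * s(x)] = s(x) * (1 + x * (1 - s(x)))
    return grad_output * (s * (1.0 + saved_input * (1.0 - s)))


y, saved = swish_forward(1.5)
grad = swish_backward(saved, 1.0)
```

The trade-off in this sketch is a little extra computation in the backward pass in exchange for fewer stored intermediates.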

Learning a Gaussian distribution

February 7, 2021

Until now, I have discussed what Gaussians are, how we can implement them, and how different metrics can be defined to compare two Gaussian distributions. This brief post goes into detail on how we can enable a neural network to learn to output Gaussian distributions.

More precisely, this post will cover:

The concept of a loss function and its minimisation.

How to take partial derivatives of a loss function.

How to train a simple neural network that outputs Gaussian distributions.

Incorporating the above into much bigger neural networks.
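
To make the first three points concrete, here is a minimal sketch in plain Python. It assumes the loss is the Gaussian negative log-likelihood and that the "network" is just two learnable numbers, a mean and a log standard deviation, with no inputs. The names `gaussian_nll`, `nll_gradients` and `fit`, the toy data and the hyperparameters are all illustrative choices of mine, not taken from the post.

```python
import math


def gaussian_nll(x, mu, log_sigma):
    """Negative log-likelihood of x under N(mu, sigma^2), with sigma = exp(log_sigma)."""
    sigma2 = math.exp(2.0 * log_sigma)
    return 0.5 * math.log(2.0 * math.pi) + log_sigma + (x - mu) ** 2 / (2.0 * sigma2)


def nll_gradients(x, mu, log_sigma):
    """Partial derivatives of the loss above w.r.t. mu and log_sigma."""
    sigma2 = math.exp(2.0 * log_sigma)
    d_mu = -(x - mu) / sigma2
    d_log_sigma = 1.0 - (x - mu) ** 2 / sigma2
    return d_mu, d_log_sigma


def fit(data, steps=2000, lr=0.05):
    """Plain gradient descent on the mean loss over the data."""
    mu, log_sigma = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grads = [nll_gradients(x, mu, log_sigma) for x in data]
        mu -= lr * sum(g[0] for g in grads) / n
        log_sigma -= lr * sum(g[1] for g in grads) / n
    return mu, math.exp(log_sigma)


data = [1.0, 2.0, 3.0, 4.0, 5.0]
mu_hat, sigma_hat = fit(data)
```

Under these assumptions the fitted mean and standard deviation should approach the sample mean and the (biased) sample standard deviation of the data. Parameterising the scale as `log_sigma` is a design choice in this sketch that keeps the standard deviation positive without any constraint.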

What are class activation maps?

October 25, 2020

This post is aimed at implementing and showing some interesting use cases for Class Activation Maps (CAMs), following their description in the original paper “Learning Deep Features for Discriminative Localization”. For this, I will be using PyTorch to implement the method to get CAMs out of relevant deep learning classification models.
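
As a small illustration of the core computation, a CAM for one class is a weighted sum of the last convolutional feature maps, with the weights taken from the classification layer that follows global average pooling. The sketch below uses plain Python lists rather than PyTorch tensors, and the function name and toy values are hypothetical.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W) using K weights for one class."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]
    return cam


# Toy example: two 2x2 feature maps and the two weights of a single class.
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
class_weights = [0.5, 1.5]
cam = class_activation_map(feature_maps, class_weights)
```

In practice the resulting map is usually upsampled to the input image size before being overlaid on the image.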

Implementing Gaussians and distribution comparison metrics

August 23, 2020

In the previous post, I went through some areas where a Gaussian distribution could be useful. This post focuses on implementing Gaussians. Specifically, we will be implementing our first Gaussian, its discrete integral approximation, and different metrics that can be used to compare two distributions. I will be using Python’s NumPy library for all numerical operations in this post.
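
Here is a minimal sketch of those three pieces. To stay dependency-free it uses Python's `math` module instead of NumPy, and it picks the KL divergence as one example of a comparison metric; both choices, as well as the function names, are my assumptions for illustration.

```python
import math


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a univariate Gaussian N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def riemann_integral(f, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    dx = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * dx) for k in range(n)) * dx


def kl_divergence(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(p || q) between two univariate Gaussians."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)


area = riemann_integral(gaussian_pdf, -8.0, 8.0)   # should be close to 1
kl_same = kl_divergence(0.0, 1.0, 0.0, 1.0)        # identical distributions
kl_shift = kl_divergence(0.0, 1.0, 1.0, 1.0)       # mean shifted by one sigma
```

The discrete integral is a quick sanity check that the density is normalised, and the KL divergence is zero only when the two Gaussians coincide.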

On the relation of learning Gaussian and the L2 loss

July 26, 2020

In my previous post, I introduced Gaussians and their existence in different datasets. This post is primarily aimed at a derivation that I think is trivial for understanding where Gaussians are used in machine learning. This proof can be found in any machine learning textbook. Most of the derivation can be found in this StatsExchange post that I got help from.
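
For reference, the standard textbook argument can be sketched as follows. It assumes each target $y_i$ is Gaussian around the model prediction $f_\theta(x_i)$ with a fixed variance $\sigma^2$; the symbols $f_\theta$, $\sigma$ and $N$ are notation I am introducing here, not necessarily the notation used in the post.

```latex
p(y_i \mid x_i; \theta)
  = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(-\frac{\bigl(y_i - f_\theta(x_i)\bigr)^2}{2\sigma^2}\right)

-\log \prod_{i=1}^{N} p(y_i \mid x_i; \theta)
  = \frac{N}{2}\log\bigl(2\pi\sigma^2\bigr)
    + \frac{1}{2\sigma^2}\sum_{i=1}^{N}\bigl(y_i - f_\theta(x_i)\bigr)^2

\arg\min_\theta \Bigl[-\log \prod_{i=1}^{N} p(y_i \mid x_i; \theta)\Bigr]
  = \arg\min_\theta \sum_{i=1}^{N}\bigl(y_i - f_\theta(x_i)\bigr)^2
```

Since the first term and the factor $1/(2\sigma^2)$ do not depend on $\theta$, maximising the Gaussian likelihood under this assumption is the same as minimising the L2 loss.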