Posts

Fast online moments estimates on the GPU

06/08/2021  ·  3 minutes
Python PyTorch
Estimating moments is an important step in any statistical analysis. The mean, variance, skewness and kurtosis of a dataset already tell us a lot about its distribution. However, some datasets don’t fit in memory. If you have a dataset of N samples and C features where N is much larger than C, you can benefit a lot from online algorithms, which process the data in a single pass without holding all of it in memory.
$$ \bar x = \frac 1N\sum_{i=1}^N x_i $$
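To make the idea of an online estimate concrete, here is a minimal sketch in plain Python. It uses Welford's update for the running mean and variance, which is a standard technique and not necessarily the method used in the post; the class name `OnlineMoments` and the sample values are made up for illustration.

```python
class OnlineMoments:
    """Running mean and variance, updated one sample at a time (Welford's update)."""

    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the deviation from the old mean and from the new mean
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance (divide by n), consistent with the 1/N in the mean
        return self.m2 / self.n if self.n else float("nan")


# Hypothetical data stream: each value is seen once and never stored
stats = OnlineMoments()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(x)

print(stats.mean, stats.variance)
```

Only three numbers are kept per feature, so memory use does not grow with N. The same update can be applied elementwise to a vector of C features.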

A simple image concat script

21/05/2021  ·  1 minute
Usage
    ./image_concat.py --resize 256x256 hazelnut.png leather.png out.png
Result
Images from the MVTec-AD dataset.

Converting a python Dict into a class

07/12/2020  ·  1 minute
Python
In Python, dictionaries and classes behave differently. One of the main differences is how you access their members. In the following example, class_instance and dictionnary hold the same data.

    class Foo:
        def __init__(self):
            self.bar = "foo"

    class_instance = Foo()
    dictionnary = {'bar': 'foo'}
    print(class_instance.bar, dictionnary['bar'])
    # foo foo

However, accessing the value of bar takes a longer syntax for the dictionary, which can be inconvenient in cases where the class syntax is required.
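One possible way to get attribute-style access from a dictionary is sketched below using `types.SimpleNamespace` from the standard library. This is an assumption about one workable approach, not necessarily the one the post describes; the variable names `data` and `namespace` are illustrative.

```python
from types import SimpleNamespace

# Illustrative dictionary holding the same data as the Foo instance
data = {'bar': 'foo'}

# Unpack the dictionary into keyword arguments; each key becomes an attribute.
# Keys must be strings that are valid Python identifiers.
namespace = SimpleNamespace(**data)

print(namespace.bar)  # foo
```

Note that this conversion is shallow: a nested dictionary stays a dictionary and would need to be converted recursively.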

Keeping the same tmux session across launches

13/09/2020  ·  2 minutes
Linux
Context
I use alacritty as my main terminal. As fast as it is, it doesn’t come with tabs or multiple windows. So what most people do (me included) is use tmux. tmux is a terminal multiplexer (hence the name) that lets you create windows and split them into panes inside a terminal UI. One of the best features of tmux, which is often used over ssh, is that sessions persist even after tmux is closed.

An analysis on US flights and cascading failures using PySpark

09/09/2020  ·  7 minutes
Python Spark
Introduction
In this blog post, we study a dataset of US-only flights from the year 2007. The dataset was released by the American Statistical Association as part of their Bi-Annual Data Exposition. During the competition, participants were asked to focus on a single question and try to answer it by investigating the dataset. The question we are going to try to answer is: Can you detect cascading failures as delays in one airport create delays in others?

Google Compute Engine SSH Tunnel

01/09/2020  ·  1 minute
Google Cloud
TLDR
You can create an SSH tunnel from your virtual machine to your local machine by running the following command on your local machine:

    $ gcloud compute ssh <INSTANCE-NAME> -- \
        -N -L <LOCAL-PORT>:localhost:<REMOTE-PORT>

Example
For example, suppose you are training a neural network using TensorFlow on a virtual machine named instance-1. To monitor the training, you would launch TensorBoard on the remote machine: