Setup#

The tutorials will be done on Google Colab, a remote Python server running in the cloud. Although you access it from your account, Colab runs on Google’s servers and does not automatically have access to the files on your local computer or Google Drive.

You will be given data files to download and upload to your Google Drive. Storing files in Drive is the easiest way to make them accessible to Colab without extra setup. But how do you access them from Colab? This section will guide you through the steps.

Bash (re)fresher#

You will need to be familiar with the basics of bash for your setup mission.

Note to the Unix whisperers

If you are familiar with bash^*, you can quickly go through this. If not, follow along and learn the basic commands that will help you to navigate in a terminal.

Here are the most common commands in bash:

Command	Meaning / Action	Example / Usage
`ls`	List files and folders	`ls`
`pwd`	Print working directory (where you are)	`pwd`
`cd`	Change directory	`cd myfolder/`
`cd ..`	Go up one directory	`cd ..`
`mkdir`	Make a new directory (folder)	`mkdir a_new_folder`
`rm`	⚠️ Remove a file ⚠️	`rm a_file.txt`
`rm -r`	⚠️ Remove a directory (recursively) ⚠️	`rm -r a_folder/`
`cp`	Copy files or folders	`cp a_file.txt copied_file.txt`
`mv`	Move or rename a file or folder	`mv old.txt new.txt`
`cat`	Show file contents	`cat notes.txt`
`head`	Show first 5 lines of a file	`head -n 5 a_big_file.csv`
`tail`	Show last 12 lines of a file	`tail -n 12 another_big_file.txt`

In Colab, a code cell runs Python by default. To run a bash command, put a ! in front of it. This will however work only in the current cell. To run a bash command persisting across the follow cells, you need to use the % sign. In brief:

Use ! for quick inspections like !ls, !cat file.txt, !head file.csv.
Use %cd only if you want to change the working directory for Colab in a persistent way.

Your setup mission#

Let’s do a short exercise to get familiar with the steps that will be done at each tutorial:

Download the dataset
Upload it to your Google Drive
Mount your Google Drive in Colab
Read the file from Colab

1. Download the dataset#

Here is a cute little dataset: (you will understand later)
Download dataset

2. Upload the file on your Drive#

Start by uploading the dataset to your Google Drive. It can help to keep things organized in a folder like data/intro_to_ml, but feel free to place it wherever works best for you.

3. Mount your Google Drive in Colab#

In your Google Drive, click on + New and then More and click on Google Colaboratory.

To access your Google Drive files, you need to create a ‘bridge’ from the Colab server to your Drive. This is done by first importing a special library and then entering this command:

from google.colab import drive
drive.mount('/content/gdrive')

Copy and paste the above in your blank Colab file. A popup will appear asking you to grant Colab access to your Google Drive: click “accept” to proceed.

4. Read your file#

Now let’s navigate with bash commands to find that file.

You can first do:

!pwd

to see where you are in your Drive and then list the file content with ls or to put more details, using the option -l:

!ls -l

Then navigate to the folders where your file is using a couple of cd and ls commands. For the cd command, remember to use the % symbol at front.

Tip

Once you locate your data file in Google Drive, it’s a good practice to store its full path (i.e., the string of characters representing all directories and subdirectories) in a variable. You can name it data_path for instance. Careful using path as it is a special python variable!

Let’s inspect the file!
In the section Bash (re)fresher above, you can see the commands cat, head and tail. The first one will display in the terminal the entire file. What if there are many many lines? This will saturate your output window. Instead, you can use head or tail to display a portion of the file. For instance, head -n 10 shows the first 10 lines, tail -n 3 will give you the last three lines.

In my Colab setup with Drive mounted, it looks like this:

data_path = "/content/gdrive/MyDrive/data/intro_ml_2025/"
data_file = "data_setup.csv"

In bash, variables are prefixed with a $ sign. To use multiple variables (or a variable and a string) together, just place them back-to-back without any separator.

!head -n 10 $data_path$data_file

This will print the first 10 lines of the file.

It’s important to reach this point before starting the tutorials. Ask your TA for help if need be. And remember you are here to learn!

Bonus: plot the data#

Curious about what this setup dataset looks like? Use this plotting macro and see for yourself 😉

import os
import pandas as pd
import matplotlib.pyplot as plt

def plot_setup_data(csv_file):
    # Check if file exist
    if not os.path.exists(csv_file):
        print(f"File {csv_file} does not exist.")
        return
    
    df = pd.read_csv(csv_file)

    required_cols = {"x1", "x2", "y"}
    if not required_cols.issubset(df.columns):
        print(f"CSV must contain columns: {required_cols}")
        return
    
    # map y=0 -> black, y=1 -> red
    colors = df["y"].map({0: "black", 1: "red"})
    
    plt.figure(figsize=(6, 6))
    plt.scatter(df["x1"], df["x2"], c=colors, s=8)
    plt.xlabel(r"$x_1$"); plt.ylabel(r"$x_2$")
    
    # Square plot from 0 to 1, no other tick
    plt.xlim(0, 1); plt.ylim(0, 1)
    plt.xticks([0, 1]); plt.yticks([0, 1])
    plt.axis("equal")
    plt.grid(False)

    plt.show()

You’re done the setup! Enjoy the tutorials!

Introduction
to Machine Learning

Setup

Contents

Setup#

Bash (re)fresher#

Your setup mission#

1. Download the dataset#

2. Upload the file on your Drive#

3. Mount your Google Drive in Colab#

4. Read your file#

Bonus: plot the data#

Introduction to Machine Learning

Setup

Contents

Setup#

Bash (re)fresher#

Your setup mission#

1. Download the dataset#

2. Upload the file on your Drive#

3. Mount your Google Drive in Colab#

4. Read your file#

Bonus: plot the data#

Introduction
to Machine Learning