The Course Project

To learn machine learning techniques, best practices, and theory, you really need to apply machine learning to the real world. Synthetic, mocked, or theoretical data is often studied, but a set of real-world applications (yours and your fellow students’) will ground your understanding and better prepare you for using machine learning in your future careers.

About Project Deliverables

Each deliverable has a certain number of points associated with it:

  • (1) Possible Direction
  • (1) Task Definition
  • (2) Test Set
  • (3) Training Set
  • (3) Feature & Data Analysis
  • (3) Exploration
  • (3) Project Reflection

There are 15 total points in the project. Remember that for an A in the course you will need at least 12 of these points declared satisfactory.

(1) Possible Direction (4 March)

Your first deliverable for the Course Project will be a verbal presentation of a Possible Direction, consisting of:

  1. The question you will teach a machine to answer.
  2. The dataset you expect to draw examples from.

Before settling on a dataset & question pair, you should have checked the issues discussed in the link below. This choice is not binding -- we will present these “Possible Directions” in class, and you may join a classmate’s direction (with or without making it a group project).

For more information, please see: Choosing a Dataset and Task

(1) Task Definition (11 March)

For this deadline, your task definition will be written and binding. By now you will have heard about your peers’ projects, and you may even be working on the same task as another student. That is fully acceptable, and can lead to interesting comparisons and a shared understanding of the problem later on.

For more information, please see: Choosing a Dataset and Task

(2) Test Set (25 March)

By this point, you will have collected (as fairly as possible) at least 100 labels for your problem. This collection is large enough to be useful, but small enough that if you realize your task definition needs to change or accommodate something in the actual data, you can quickly patch any labels that need changing. A brief reflection on what you learned while performing this annotation is required.


  • (1) >= 100 labels for test set
  • (1) Thoughts on task definition.
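
One way to draw the test set “as fairly as possible” is to sample uniformly at random from your pool of unlabeled examples, rather than hand-picking the easy or interesting ones, so the test set reflects the real distribution. A minimal sketch in Python; the pool and seed below are illustrative stand-ins for your actual dataset:

```python
import random

# Stand-in for your actual pool of unlabeled examples.
pool = [f"example-{i}" for i in range(1000)]

rng = random.Random(0)            # fixed seed so the draw is reproducible
to_label = rng.sample(pool, 100)  # 100 distinct examples to annotate

print(len(to_label), len(set(to_label)))
```

Fixing the seed also lets you document exactly which examples went into the test set, which helps if you later need to patch labels.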

(3) Training Set (TUESDAY 6 April)

By this point, you will have collected at least 300 additional labels for your problem. Your understanding of the task, and of what makes it solvable for you as a human, will be improving, and you will likely be working on feature extractors. You will choose any basic machine-learning model and use it to plot the learning curve of your problem; but even if that curve indicates that more labels would improve performance, you will likely focus on the model from this point onward.


  • (1) >= 300 labels for training set
  • (1) Learning Curve
  • (1) Thoughts on feature design.
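
The learning curve above can be sketched by training on growing slices of the labeled data and measuring held-out accuracy at each size. Here is a minimal pure-Python sketch using a toy nearest-centroid classifier as a stand-in for whatever basic model you choose; all names and data below are illustrative, not part of the course spec:

```python
# Toy nearest-centroid classifier: one centroid per label.
def nearest_centroid_fit(examples):
    """examples: list of (feature_vector, label). Returns per-label centroids."""
    sums, counts = {}, {}
    for x, y in examples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def nearest_centroid_predict(centroids, x):
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda y: dist2(centroids[y]))

def learning_curve(train, test, sizes):
    """Accuracy on `test` after training on the first n examples, for each n."""
    curve = []
    for n in sizes:
        model = nearest_centroid_fit(train[:n])
        correct = sum(nearest_centroid_predict(model, x) == y for x, y in test)
        curve.append((n, correct / len(test)))
    return curve

# Toy data: the label is "hi" exactly when the first feature is large.
train = [([float(i), 0.0], "hi" if i > 5 else "lo") for i in range(10)]
test = [([2.0, 0.0], "lo"), ([9.0, 0.0], "hi")]
print(learning_curve(train, test, [2, 5, 10]))
```

If accuracy is still climbing at your largest training size, that is the curve telling you more labels would help.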

(3) Feature & Data Analysis (22 April)

By this point, you will have explored a handful of models and have a better understanding of the difficulties of your task: what works well, and what does not. You will analyze feature contributions to model performance to see whether you can simplify the model, and propose your next 'Exploration' task.


  • (1) Feature Performance Analysis
  • (1) Appropriate Models tried and Compared
  • (1) Proposal of Exploration Task
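
One common way to do the feature performance analysis is ablation: retrain with each feature removed and see how much held-out accuracy drops. A large drop suggests the feature matters; no drop suggests the model can be simplified without it. A minimal sketch, again using a toy nearest-centroid model and made-up data (neither is prescribed by the course):

```python
def train_and_score(train, test, keep):
    """Nearest-centroid accuracy using only the feature indices in `keep`."""
    def project(x):
        return [x[i] for i in keep]
    sums, counts = {}, {}
    for x, y in train:
        px = project(x)
        acc = sums.setdefault(y, [0.0] * len(px))
        for i, v in enumerate(px):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    means = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def predict(x):
        px = project(x)
        return min(means, key=lambda y: sum((a - b) ** 2
                                            for a, b in zip(means[y], px)))
    return sum(predict(x) == y for x, y in test) / len(test)

# Two features: feature 0 is predictive of the label, feature 1 is noise.
train = [([float(i), float(i % 3)], "hi" if i > 5 else "lo") for i in range(10)]
test = [([2.0, 0.0], "lo"), ([9.0, 1.0], "hi")]

all_feats = [0, 1]
baseline = train_and_score(train, test, all_feats)
for f in all_feats:
    ablated = train_and_score(train, test, [i for i in all_feats if i != f])
    print(f"drop without feature {f}: {baseline - ablated:+.2f}")
```

Ablating the noise feature should cost you nothing, which is exactly the evidence you need to argue for a simpler model.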

(3) Exploration (6 May)

This task is the most open-ended.

  • If your course project turns out to be rather solvable, you are free to pursue a related task, or a task redefinition here.
  • You may also explore implementing a particular model or a subdomain of ML (e.g., deep learning, reinforcement learning, recommender systems, etc.) instead of being tied to your project description.


  • (1) Task is a logical continuation or appropriate challenge.
  • (1) Task is properly versioned / submitted.
  • (1) Task is properly evaluated.

(3) Project Reflection (20 May)

The project reflection is, again, slightly open-ended.

If you are still working on your originally proposed task, your job here is to assemble a presentation for your corporate overlords, imploring them to: (a) ship your new model, (b) fund more research into this problem, (c) never ship your new model, etc.

If you are not still working on your originally proposed task, you may assemble a similar presentation -- but for your exploration. Alternately, we will discuss an appropriate objective.


  • (1) Presentation exists and is of appropriate length and depth.
  • (1) Recommendation is clear from presentation, and supported by data.
  • (1) Lessons-learned from the overall project are included.