Syllabus
An always up-to-date version of this syllabus can be found online.
- Instructor:
  - Prof. John Foley (johnf@middlebury.edu)
- Course Website:
- Lectures:
  - Some content available as videos before class.
  - Synchronous and remote.
  - Tuesdays & Thursdays 5:05-6:20PM EDT (Middlebury Time)
- Office Hours
  - Office Hours or "Student Hours" are times when I am available for you to discuss aspects of the course, Computer Science, Middlebury, your plans, your questions about college in general, or really anything else I might be able to help with.
  - Always by appointment:
    - Schedule a 15, 20 or 30-minute meeting with me. Calendly is a service that lets you drop a meeting on my calendar whenever I'm free. I will guarantee I can make anything with 24 hours notice, but I will try my best with anything sooner.
    - Message me in Slack and we can find a time to chat if those EDT business hours don't work for you.
Course Description
Machine Learning is the study and design of computational systems that automatically improve their performance through experience. This course introduces the theory and practice of machine learning and its application to tasks such as database mining, pattern recognition, and strategic game-playing. Possible topics include decision-tree methods, neural networks, Bayesian and statistical methods, genetic algorithms, and reinforcement learning. (CSCI 0200 and CSCI 0201 and MATH 0200) 3 hrs. lect./lab DED
Course Modalities
- Lectures will be delivered somewhat asynchronously, as videos and readings.
- In-class synchronous time will be used for questions, labs & projects with a focus on exploring concepts with short hands-on exercises in Python.
Course Topics
This is an elective computer science course and it requires Math Foundations of Computing (CS200), Data Structures (CS201) and Linear Algebra (MATH 200). This course does not require any courses in probability or statistics, nor any previous data science experience.
Course Content
The primary mode of content delivery will be through video lectures and readings, and there will be synchronous discussions and coding problems during class time.
Textbook: A Course in Machine Learning
We will be drawing some readings from the free textbook "A Course in Machine Learning" by Hal Daumé III, which is available in chapters online at ciml.info.
At various points, I may draw selections from other textbooks, including:
- Patterns, Predictions, and Actions: A story about machine learning by Moritz Hardt and Benjamin Recht
- Probabilistic Machine Learning: An Introduction by Kevin Patrick Murphy (forthcoming 2021 version available as a draft)
- Search Engines: Information Retrieval in Practice by Bruce Croft, Donald Metzler, and Trevor Strohman
This list will be kept up to date as I locate useful readings. Given the diversity, and sometimes the depth, of mathematical background assumed by authors of machine learning texts in particular, I find that no single textbook has the best descriptions or most useful examples.
Online Resources
There are probably hundreds of machine-learning resources available online (free or paid), so at various points we will have the opportunity to supplement our knowledge with them.
Previous versions of this course followed Andrew Ng's Coursera course, which is available in full on YouTube, starting with "1.1 What is Machine Learning?". We may draw on this course, but I find that it focuses far too much on optimization for a modern course; in the post-TensorFlow world, we have auto-differentiation, and while computing derivatives is an important part of some machine learning approaches, I don't think it should be front and center in an undergraduate course in Computer Science.
Course Equipment & Software
Since this course will be run virtually, we will need to use software in addition to the course-specific software we would ordinarily set up and use.
Computer / Laptop Access
If you have any questions about computer access, please don't hesitate to contact me ASAP - johnf@middlebury.edu - there are options through both the College and Department. Some more details are below, but I'm happy to help you navigate these systems.
If you ever find yourself temporarily in need of a laptop, the Computer Science department has 10 rotating Dell laptops available to our students. These come pre-installed with software for most of the courses in the major. They are available to be loaned out short-term or long-term. Please get in touch ahead of time if you think you might need one. Due to COVID-19, short-term loans may be trickier than usual.
On Long-Term Use: College policy has changed recently to include the expectation for every student to have a laptop available. The college provides laptops to those who need them where “need” is based on Student Financial Services calculations. If you anticipate needing a laptop for the whole term, we encourage you to inquire with Student Financial Services and the library first due to our smaller pool of equipment. However, our department commits to meeting the needs of every student, so do not be afraid to reach out if you believe you need one of our laptops for any length of time.
From Robert Lichenstein (rlichenstein@), lightly edited.
Zoom (emphasizing synchronous audio)
Our synchronous meeting times will be conducted over Zoom, where we will occasionally use breakout rooms or share screens but we will focus on working with concepts in a hands-on manner.
Slack
Slack is an online, real-time chat service commonly used in industry/etc. This will:
- Provide a space for asking questions that is less formal than email and doesn't require a good Zoom connection.
- Provide a long-lasting chat space, since Zoom chat disappears at the end of any specific call.
- Provide the opportunity for students to have one-on-one or small group conversations given the course exists in virtual space this Spring.
You should receive an invitation to participate in a private Slack. I recommend downloading their application rather than using it in a web browser.
Python
repl.it
You will also be able to access Python through a website like repl.it. This kind of access to Python will be far less convenient than having it on your personal computer but may be the best choice.
Course Structure
We will meet every Tuesday & Thursday. Each day, we will have time for discussion and an in-class activity, usually by exploring a concept in a hands-on manner with Python.
Weekly Practical Assignments
These will be due Monday evenings: the end of the week, plus the weekend if you need the time. They will typically be between 100 and 300 lines of Python code that explore an idea, method, or concept.
While we will have the opportunity to work on practical assignments almost every day in class, you may not actually finish these activities during class time. Every week, you will complete one of the two potential assignments and submit it as a 'Practical' assignment, which will be graded for participation (see Grading Information below).
The Course Project
Most course project deliverables will be in-class on Thursdays. Due to the scattered vacation days, one deliverable will be due on a Tuesday class day.
During the course, you will work on a personalized longer-term machine learning project. There will be a handful of deliverables, some due in class (on Thursdays) and some due by the end of the week (e.g., Fridays). More detail about the course project is available.
Tuesday/Thursdays: Readings & Videos
Class time will be as hands-on as possible, so I will be asking you to watch videos and occasionally read before class.
Deadline Conflicts
The value in a schedule is not usually the exact hard deadlines but rather in ensuring that our learning proceeds together, at a reasonable pace. Deadlines also ensure that I can get you timely and helpful feedback!
Grading Information
This semester I am exploring 'specification' grading. This means that you are expected to satisfactorily complete a majority of the activities in the course. This gives you more flexibility in choosing which topics to focus your energies on, while avoiding the need to assign extremely detailed point values or to, e.g., assign exams.
For each 'point' (Practical 'Lab' or Project aspect), if a satisfactory effort was put in, you will receive a 'Pass' (weight=1.0), if effort was put in but a major misunderstanding is presented, you will be given a 'Retry' (and a temporary weight=0.5) and if no effort was put in, you will receive a 'Fail' (weight=0.0).
Letter Grade | Practical 'Lab' Points | Project Points |
---|---|---|
A | 10/12 | 13/15 |
B | 8/12 | 10/15 |
C | 6/12 | 8/15 |
D | 5/12 | 7/15 |
F | 0/12 | 0/15 |
A-, B+, etc. will be assessed as falling between these requirements: the higher 'signed' grade will be given to someone who achieves at least one of the point requirements for the higher grade, while someone who misses both targets will be given the slightly lower signed grade.
A student with scores Practical=11, Project=11 would receive an A-, for meeting the A level for Practicals but missing it on the project.
A student with scores Practical=9, Project=11 would receive a B+, for exceeding the B level but not reaching the A level on either section.
Large differences between your two grading categories will be dealt with on a case-by-case basis.
One can envision this system as having two 'skips' for the practicals, and almost one 'skip' for a later project deliverable.
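As a rough illustration only, the table and rules above could be sketched in Python as follows. The names `points` and `letter_grade` are hypothetical, and the rule for when a plus is awarded is an assumption (the text only says a B+ goes to someone "exceeding the B level"). The case-by-case handling of large differences between categories is not modeled.

```python
# Weights from the grading rules: Pass=1.0, Retry=0.5 (temporary), Fail=0.0.
WEIGHTS = {"Pass": 1.0, "Retry": 0.5, "Fail": 0.0}

# (letter, practical points required, project points required), best first.
LEVELS = [("A", 10, 13), ("B", 8, 10), ("C", 6, 8), ("D", 5, 7), ("F", 0, 0)]


def points(marks):
    """Sum the weights of a list of 'Pass' / 'Retry' / 'Fail' marks."""
    return sum(WEIGHTS[m] for m in marks)


def letter_grade(practical, project):
    """Hypothetical helper: map two point totals to a signed letter grade."""
    # Base letter: the best level whose requirements are BOTH met.
    base = next(i for i, (_, p, q) in enumerate(LEVELS)
                if practical >= p and project >= q)
    letter, p_req, q_req = LEVELS[base]
    if base == 0:
        return letter  # already at the top level
    up_letter, up_p, up_q = LEVELS[base - 1]
    # Meeting ONE requirement of the next level up earns that level's minus.
    if practical >= up_p or project >= up_q:
        return up_letter + "-"
    # Assumption: a plus needs a score above the base level in some category.
    if letter != "F" and (practical > p_req or project > q_req):
        return letter + "+"
    return letter
```

Under these assumptions, `letter_grade(11, 11)` gives `'A-'` and `letter_grade(9, 11)` gives `'B+'`, matching the two worked examples above.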
Academic Honesty at Middlebury
As an academic community devoted to the life of the mind, Middlebury requires of every student complete intellectual honesty in the preparation and submission of all academic work. Details of our Academic Honesty, Honor Code, and Related Disciplinary Policies are available in Middlebury’s handbook.
Middlebury College Syllabus Template retrieved 18 August 2020.
My personal take
When you get the opportunity to discuss your work with other students, ensure that everyone leaves the group discussion with the same level of understanding. A working solution you do not understand has no value to your learning.
Because machine learning is rather popular, it is likely you will be able to find partial or full solutions to challenges in your project or labs using online resources such as Google or StackOverflow. A working solution you do not understand has no value to your learning.
Lastly, I remind you that I have a PhD in search engines. If you can find it, I can find it, too. Just CITE it and tell me, and don't turn in something that is fully-quoted from any sources.
Universal Access & Accommodations
The Disability Resource Center (DRC) at Middlebury provides for student accommodation in courses.
The DRC provides support for students with disabilities and facilitates the accommodations process by helping students understand the resources and options available and by helping faculty understand how to increase access and full participation in courses. The DRC can also provide referrals for students who would like to undergo diagnostic testing. Students who are on financial aid and have never undergone diagnostic testing can apply to the CTLR for support to cover the cost of off-campus testing. DRC services are free to all students.
If you have any accommodations through this office, please notify the instructor as soon as possible at the start of the semester so that your accommodation can be supported as quickly as possible.
I try to create course materials with "Universal Design" in mind: that is, I try to make it so that all materials can be accomplished fairly by all students. If there is any change that can be made to the course materials that would improve your learning, don't hesitate to ask or suggest such changes.
Inclusivity & Discussions
It is important to me to create an inclusive learning environment where diversity and individual differences are respected and recognized as a source of strength. However, this must be a team effort so I expect you to join me in fostering such an environment. This class will represent a diversity of individual backgrounds and experiences, and every member is expected to show respect for every other member so that everyone can learn in this space. If you experience or witness any behavior that opposes this idea, it would be helpful for me to know so that I can address it, but I do recognize that this is additional work and may be difficult. If you are comfortable reporting such incidents, there are a few ways you can do so:
- Email / Talk to me
- Report it to our anonymous CS departmental climate feedback form
- Fill out a Bias Incident Report which goes to the Middlebury Community Bias Response Team
You belong in this class and in the computer science department. Thank you for being here and for contributing to this course.
Pronouns and Identity
I will use your preferred gender pronouns and name, and I expect you to use the names and pronouns your classmates prefer. (I understand that some students may be in the process of exploring their gender identity, may not feel comfortable sharing a gender pronoun, or may not go by gender pronouns; you can let me know if you do not want to share a gender pronoun.)
If you are communicating about another student and do not know their pronouns, go ahead and use their whole name to refer to them, e.g., "I agree with John." You will notice that I also use they/them when referring to a hypothetical student.
Preliminary Schedule
This is quite a strange semester and time in the world -- this schedule represents my best guess about the future -- something will have to change. Check the online version for the latest; it will be kept up-to-date.
Date | Week # | Project/Notes | Content |
---|---|---|---|
25-Feb | 0 | | What is ML? |
2-Mar | 1 | | Decision "CART" Trees |
4-Mar | 1 | P: Possible Direction | ... |
9-Mar | 2 | Last day to drop online | Experimentation in ML |
11-Mar | 2 | P: Task Def. | ... |
16-Mar | 3 | | Dataset Balance & AUC |
18-Mar | 3 | | frequency learning & Naïve Bayes |
23-Mar | 4 | P: Test Set | Perceptron & Linear Models |
25-Mar | 4 | | Ethics in ML: could vs. should |
30-Mar | 5 | Last day to drop with approval | ... |
1-Apr | 5 | | kNN, Regression |
6-Apr | 6 | | The effect of data size |
8-Apr | 6 | | -- NO CLASS (R,F) |
13-Apr | 7 | No large assignments due. | Feature Engineering |
15-Apr | 7 | P: Training Set | Coordinate Ascent |
20-Apr | 8 | | Optimization: Linear & Logistic Regression |
22-Apr | 8 | | SGD & Neural Networks |
27-Apr | 9 | | k-Means & Clustering |
29-Apr | 9 | P: Analysis | Word Embeddings |
4-May | 10 | No large assignments due. | Boosting & Bagging |
6-May | 10 | P: Exploration | Sequence Labeling: CRF, LSTM, Transformer |
11-May | 11 | Spring Symposium | NO CLASS |
13-May | 11 | No large assignments due. | Recommender Systems |
18-May | 12 | | Adv. Topic by Request |
20-May | 12 | P: Reflection | Adv. Topic by Request |
TA/Tutoring Sessions
This course has no dedicated tutoring resources. General tutoring will be able to help with Python-specific questions or with setting up and managing Python itself.
Information about the tutors can be found at go/cstutors.