COLUMBIA UNIVERSITY COMS 6998.005
(This page is in flux)

Important Dates

Percentages are of your total class grade.

Overview

The major portion of your grade is based on the research project. Students will organize into teams of 1-3 students and work on a semester-long project. Some possible ideas are described below.

Good class projects can vary dramatically in complexity, scope, and topic. The only requirements are that they be related to something we have studied in this class and that they contain some element of research, i.e., that you do more than simply engineer a piece of software that someone else has described or architected. To help you determine if your idea is of reasonable scope, we will arrange to meet with each group several times throughout the semester.

Proposal Presentations (Due: 1/24-2/7)

At the beginning of the 2nd to 4th lectures, each group will give a 5-minute presentation about their proposed project to the class. The presentation should contain:

Teams can meet with the instructor after their presentations for further discussion and feedback.

Click here to sign up. Click “next” until you get to the appropriate week.

Prospectus (Due: 2/11)

Your research prospectus will contain an overview of the research problem, your hypothesis, a first pass at related work, a description of how you plan to complete the project, and the metrics you will use to decide if it worked.

Your prospectus should follow the example:

Submission:

  1. Name your prospectus file in the following format, with last names in alphabetical order: prospectus_<lastname1>_.._<lastnameN>.pdf
  2. Click here to upload the file by 2/11 11:59PM EST

Poster Session (Due: 4/25)

Your team will prepare and present a project poster at the end-of-course poster session. This gives you an opportunity to present a short demo of your work and show what you have accomplished in the class!

Submission

Report (Due: 4/30)

You will prepare a conference-style report on your project with a maximum length of 15 pages (10 pt font or larger, one or two columns, 1 inch margins, single or double spaced; more is not better). Your report should expand upon your prospectus: introduce and motivate the problem your project addresses, describe related work in the area, discuss the elements of your solution, and present results that measure the behavior, performance, or functionality of your system (with comparisons to other related systems as appropriate).

Because this report is the primary deliverable upon which you will be graded, do not treat it as an afterthought. Plan to leave at least a week to do the writing, and make sure you proofread and edit carefully!

Submission

  1. Name your report file in the following format, with last names in alphabetical order: report_<lastname1>_<lastname2>.._<lastnameN>.pdf
  2. Click here to upload the file by 4/30 11:59PM EST
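
As a convenience, the naming convention above can be sketched in a few lines of Python. The helper name `submission_filename` is hypothetical and not part of any course tooling, and it assumes that "alphabetical order" means a case-insensitive sort of last names joined by underscores.

```python
def submission_filename(kind, last_names):
    """Build a name such as report_<lastname1>_<lastname2>.pdf.

    Assumption: last names are sorted case-insensitively and
    joined with underscores, per the format described above.
    """
    ordered = sorted(last_names, key=str.lower)
    return f"{kind}_{'_'.join(ordered)}.pdf"


# Example with made-up names for a two-person team.
print(submission_filename("report", ["Smith", "Doe"]))
```

The same helper covers the prospectus by passing "prospectus" as the first argument.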

Project Suggestions

The following are examples of possible projects – they are by no means a complete list and you are free to select your own projects. In general, projects can be of three varieties:

  1. Research project: model an unsolved problem, propose algorithmic solution, evaluate and report findings.
  2. Win: pick an existing useful application and a well-recognized metric (latency, prediction, etc.) and win against the state of the art.
  3. Break and fix: implement a state-of-the-art algorithm on real data, show that it doesn’t actually work (results are poor, it’s slow, etc.), and make it work.

Data Cleaning

Understand how scientific articles use and talk about data. Two possible directions:

Arachnid is a new explanation engine that automatically generates cleaning programs based on user specifications of data quality. It is an extension to ideas from Scorpion. Contact Eugene for a copy of Arachnid. Some possible projects:

Automatic Interface Generation

Precision Interfaces automatically generates interaction interfaces from program logs. It supports any parsable language that can be represented as an abstract syntax tree. Extend the system in interesting ways.

Query Engine for Interactive Apps

Smoke is the fastest lineage-enabled database engine. It captures the relationships between output and input records as efficient lineage indexes. It turns out that these indexes can be used to express and speed up interactive applications such as visualizations. Extend or use it in interesting ways.
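
To make the idea of a lineage index concrete, here is a toy sketch in Python. It is not Smoke's implementation or API: the function `group_sum_with_lineage` and the sample data are made up for illustration. It only shows the general technique of recording, while a group-by aggregation runs, which input rows contribute to each output record (backward lineage) and which output each input row feeds (forward lineage).

```python
from collections import defaultdict


def group_sum_with_lineage(rows, key, value):
    """Sum `value` per `key` and capture lineage during the same pass."""
    totals = defaultdict(float)
    backward = defaultdict(list)  # output group -> input row ids
    forward = {}                  # input row id -> output group
    for rid, row in enumerate(rows):
        group = row[key]
        totals[group] += row[value]
        backward[group].append(rid)
        forward[rid] = group
    return dict(totals), dict(backward), forward


# Hypothetical input table.
rows = [
    {"state": "NY", "sales": 10.0},
    {"state": "CA", "sales": 5.0},
    {"state": "NY", "sales": 2.5},
]
totals, backward, forward = group_sum_with_lineage(rows, "state", "sales")

# A click on the "NY" bar of a chart can look up its input rows
# through the index instead of rescanning the table.
selected = [rows[rid] for rid in backward["NY"]]
print(totals, selected)
```

In an interactive visualization, operations such as linked brushing or drill-down then become index lookups over `backward` and `forward` rather than fresh queries over the base data.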