Project Website: http://daslab.seas.harvard.edu/classes/cs165/project.html
The class has a running project throughout the semester. The project is about designing and implementing a prototype of
a modern main-memory optimized column-store data system. By the end of the project you will have designed, implemented, and
evaluated several key elements of a modern data system and you will have experienced several design tradeoffs in the same
way they are experienced in industry labs.
This is a challenging but fun project!
We will also point to several open research problems throughout the semester that may be studied on top of the class project
and that you may decide to take on as a research project. The project has a total of five milestones with specific expected deliverables.
The five deliverables are:
1) basic storage layer, 2) indexing methods optimized for main-memory, 3) shared scans methods, 4) joins, and 5) updates.
The deliverables will be tested using predefined automated unit tests for functionality and, as extra credit, for performance.
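To give a rough sense of what a column-store storage layer involves, the sketch below keeps each attribute in its own contiguous array and answers a range select by scanning a single column. It is only an illustration under stated assumptions: the class `ColumnTable` and its methods `insert`, `select`, and `fetch` are hypothetical names, and none of this is the project's starting code, API, or required language.

```python
from array import array


class ColumnTable:
    """Toy column-oriented table: one contiguous integer array per attribute.

    Hypothetical illustration only; not the interface used in the project.
    """

    def __init__(self, column_names):
        # Each column lives in its own typed array ('q' = signed 64-bit int).
        self.columns = {name: array("q") for name in column_names}

    def insert(self, row):
        # A row is split apart; each value is appended to its own column.
        for name, value in row.items():
            self.columns[name].append(value)

    def select(self, name, low, high):
        # Scan one column and return the positions where low <= value < high.
        col = self.columns[name]
        return [pos for pos, value in enumerate(col) if low <= value < high]

    def fetch(self, name, positions):
        # Read values of another column at the qualifying positions.
        col = self.columns[name]
        return [col[pos] for pos in positions]


table = ColumnTable(["a", "b"])
for a, b in [(10, 100), (25, 200), (7, 300), (31, 400)]:
    table.insert({"a": a, "b": b})

positions = table.select("a", 10, 30)  # positions of rows with 10 <= a < 30
values = table.fetch("b", positions)   # matching values from column b
```

The select touches only column `a`, and column `b` is read only at the qualifying positions, which is the basic access pattern that distinguishes a column layout from a row layout.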
Automated Testing Infrastructure:
We have an automated testing infrastructure. We provide a series of tests (using both fixed and randomized data) to automatically
test your code for each project milestone. You are able to submit your code daily and get results by automated emails overnight.
Tests run against an in-house Linux server at DASlab. You will be able to find the exact specifications of the machine and
tests on the project website. Once you pass all the tests in the testing infrastructure your project is complete!
We will have a running competition and an anonymous leaderboard so you can continuously compare your system’s performance
against the rest of the class (and past classes). Essentially we provide additional tests that increase the amount of test
data so performance differences between projects will be highlighted. You will be able to run these tests daily as well,
so you can improve throughout the semester. We will also provide a “benchmark” entry in the leaderboard which represents
what we consider good performance for each milestone based on an in-house implementation from the lab.
We will give you starting code
that implements the basic client-server functionality (i.e., communication) so you can focus on building the server side
code, that is, the essential core data processing algorithms and data structures of a database system. In addition, whenever
applicable we will let you know if there are existing libraries that are OK to use.
Individual deliverables should pass all provided tests on the testing infrastructure. However, you will not be judged only
on how well your system works; it should be clear that you have designed and implemented the whole system, i.e., you should
be able to make changes on the fly and explain design details. At the end of the semester each student will have a 1-hour
session with the instructor and another 1-hour session with the TFs where the student will demonstrate the system, and answer
questions about the design and about supporting alternative functionality. [Tip: From past experience we found that frequent
participation in office hours, brainstorming sessions and labs means that the instructor and the TFs are very well aware
of your system and your progress, which makes the final evaluation a mere formality in these cases.]
The project is an individual project. The final deliverable should be personal. You must write from scratch all the code
of your system and all documentation and reports. Discussing the design and implementation problems with other students is
allowed and encouraged! We will do so in the class as well and during office hours, labs and brainstorming sessions. All
students that have collaborated with other students in whatever capacity should provide a collaboration statement with their
final deliverable to properly acknowledge any ideas that were taken from or influenced by discussions with other students.
Late Days Policy & Schedule:
The project is due at the end of the semester. In the project description you can find a detailed time-schedule that we propose
you follow. With the exception of the midway check-in (which is a hard deadline), the rest is a “suggested schedule” that
will allow you to spread the work throughout the semester and to have sufficient time for each milestone based on the complexity
and the work required at each phase of the project. This is an involved project that requires commitment through the entire
semester and cannot be done in 2-3 weeks at the end. Not submitting the project milestones on time will have no side effects
on your grade, but we will not be able to provide you with any feedback on your progress until we have your
design documents and your code.
Experience says that every year a number of students cannot handle the freedom to self-pace, and end up significantly deviating
from the schedule. We will send you frequent reminders but you should know that deviating from the schedule by more than
a couple of weeks will most likely mean that you will not be able to finish the whole project by the end of the semester
(unless you are already an experienced systems student).
The goal of the midway check-in is to demonstrate that you are making decent progress and mainly to avoid falling behind. By late October each
student should 1) deliver a design document that describes the intended design for the first two milestones (5%) and 2) have
implemented a project that passes at least the first three tests of the first milestone in the automated testing infrastructure
(5%). A template of the expected design document is provided online. The midway check-in deadline is a hard one; no extensions
will be given so please do not ask for one unless you think there is a fair reason such as a medical issue. The reason is
that we are trying to make students see the scope of the project early on.
The three fastest projects (top 3 in the leaderboard by the end of the testing period) will gain extra points (5%). The competition
will terminate the last day before we need to upload grades so you will have plenty of time to improve (until around mid
Extra Points for Bonus Tasks:
We will regularly assign extra tasks or you can come up with your own extra tasks for the various components of the project.
With these extra tasks you gain extra points (up to 5%).
What is a Successful Project?
A successful project passes all the predefined tests we provide on the testing infrastructure and the student successfully
passes the final face-to-face evaluation. A successful final evaluation is one where the student is able: (1) to fully explain
every detail of the design, and (2) to propose efficient designs for new functionality on the spot. On the class website
you will find a step-by-step guide that will help you prepare for the evaluation meeting.