The class meets twice a week: Mondays and
Wednesdays 4:00-5:30pm. Room TBA. Class starts at 4:10pm.
Classes are designed to be discussion-based and slides
will be used mainly to drive discussions as opposed to
delivering the material.
Interaction in every class:
In every class there
will be an interactive 30-40 minute session where students will work on
problems in groups of 3-4 students.
Office hours & Labs:
Interaction does not stop in lecture time. CS165 is designed to maximize interaction as we truly believe this is the best way to learn; we offer daily office hours and labs.
Starting Week 1, Prof. Idreos will hold office hours every weekday in his office, MD139. Labs are also offered every day of the week as of Week 2. Labs are offered by the TFs. Check the class website to get the exact time slots for both OH and labs.
The goal of OH is to provide any kind of feedback on the class material. You should come to OH to ask questions about past classes and quizzes. You should also come to OH to discuss the design of your project and to get feedback on your design documents. You are also welcome to come to OH for any other general question regarding classes, careers in industry/academia, PhDs, etc.
Labs: Labs can help with similar discussions as with OH but the main goal of Labs is to provide hands-on help for the project. So bring your laptop and your questions about specific project parts you need help with. Labs are the place to go when you have a persistent bug, when you need help with a specific tool for the project (e.g., for debugging or performance testing) or to get feedback about the quality of your coding.
Finding and fixing bugs can be very difficult and time consuming. As such, we want to make the time you spend in Labs as useful as possible. We want to teach you the process of finding and fixing bugs, not just solve a bug for you. We expect that before coming to labs you have spent several hours “fighting” a bug. Then, if you cannot make any more progress on your own, you should come by; by then you will have enough experience to really understand the solution and the process. Do not feel like something is wrong if you find yourself stuck on a bug for a day or two. This is normal and part of the learning process. It will and should happen several times through the semester. Before coming to discuss a bug you should work through several questions on your own: check the class website for exact instructions.
We will also offer extra weekend office hours and labs as needed.
Attendance and Simultaneous Enrollment:
Based on the philosophy of the course, attendance in lectures, labs and office hours is optional. The best way to learn, though, is through discussion and interaction with the instructor and the TFs. Our classes are not about "lecturing" - they are semi-flipped and all about interaction. We hope to see you there! If you are a college student considering simultaneous enrollment, come to OH to discuss whether this is a good idea in your exact situation.
All classes and interactive sessions in class will be recorded and will be available online. So even if you miss a class it will be easy to catch up, and you can also use these recordings to review specific material throughout the semester (e.g., to prepare for midterms).
Another component of the course is sections. Sections are used to deliver material about the class, i.e., to go more deeply into some of the concepts discussed in class, to do additional quizzes, or to deliver background material that is needed to follow next week’s class or for the project. There will be no actual section meeting. Instead, all sections will be recorded by the TFs and videos will be posted online. The material posted will be tailored to present a step-by-step guide for any of the topics presented, to make it easy to follow everything without having to be physically present in an actual section. However, if there are still questions about the material presented in sections, you will be able to ask those questions either during the daily office hours or during the daily labs.
Throughout the semester, on select Tuesday evenings, the instructor and DASlab PhDs and postdocs will discuss research! First, DASlab researchers will present their recent work on data systems research and connect it with the material you are learning in class. Then, you will get the chance to talk with them about their research and open problems, and be exposed to open research opportunities. Snacks and drinks will be provided.
It is a tradition in CS165 and CS265 to schedule several discussion sessions throughout the semester. Typically we bring food and drinks and have a relaxed time discussing projects, open research topics, careers in industry and academia, grad school and anything else you may have in mind.
Who can take this class?
You probably heard stories that this is a very heavy class and that the project will consume a ton of your time. While this is true, it is also true that you will have a lot of help! So fear not.
Naturally, the more background you have the
smoother your experience in 165 will be. Prior knowledge of C
programming and systems programming, as well as a good
understanding of computer architecture and in particular the
memory hierarchy (cache memories) is very important for this
class. Courses providing systems background (like CS50 and
in particular CS61 or equivalent) are essential. Good hacking, algorithm design, and data structure skills are also required.
A self-evaluation guide
is posted on the class website to help you understand if you qualify for the course and how much material you might need to cover. The course (lectures, sections, labs, and office hours) is designed so you can acquire the necessary background even if you are missing some essential knowledge at the beginning of the semester. So we have you covered. However, you should be aware that if you did not breeze through the self-evaluation guide you will have to put in more hours to successfully complete the course. Talk to the instructor if you have not taken CS61 or if you do not feel completely comfortable with the self-test but you still think you are ready for CS165.
We provide a Test 0 that is designed to 1) help you get an idea about how fit you are for the class and 2) bootstrap your semester project. Essentially Test 0 consists of an independent data structure design and implementation in C that you can later on use as is for the first milestone of your semester project.
If you are reading this text a few weeks or even months before the semester starts, you can use the guidelines on the class website to prepare for the course. There you will find specific study material and programming exercises.
How can I do great in CS165?
Just utilize all resources provided. Show up in class to participate in interactive sessions. There are also daily office hours and labs; show up as often as possible so we can help with anything you need! When you find yourself stuck with the project, either with a design decision or just a bug, it is normal to struggle for a while (it is part of the learning process), but after some time grab your laptop and come by!
We welcome feedback and ideas about the course at any point during the semester. Just come and chat with us during office hours! Tell us how you are keeping up and how we can make it easier for you.
No Laptop/Phone Policy:
CS165 is based on
interaction. We want students actively participating in
class and interactive sessions, asking and answering
questions to maximize learning. In each class, we will
bring a printed copy of the slides for each one of the
students so you can follow along and keep notes on
paper. So you do not need your laptop or phone for notes
or looking up the slides online. In fact, recent studies
show that even if you only use a laptop for note taking,
it can have a negative impact on how well you understand
the material in class. [The Pen Is Mightier Than the
Keyboard: Advantages of Longhand Over Laptop Note Taking.
Pam A. Mueller and Daniel M. Oppenheimer. Psychological
Science. 2014, Vol. 25(6) 1159–1168] (NOTE: There are
cases where having a phone or laptop during class is
necessary such as when you expect an important call or
message or when you need the laptop to better follow the
slides due to any issues with your eyes or ears. Just let
the instructor know and all such cases will be granted
permission to use any tools necessary.)
Every semester we arrange a few
guest lectures by leaders in data system design from
industry and academia. Past guest lecturers in our
2014/2015 classes include: Guy Lohman from IBM Research,
Erietta Liarou from EPFL Lausanne, Alkis Simitsis and
Georgia Koutrika from HP Labs, Nikita Shamgunov from
MemSQL, Laura Haas from IBM Research, Nga Tran from
Vertica and Jignesh Patel from University of Wisconsin,
Magda Balazinska from University of Washington,
Johannes Gehrke from Microsoft, Goetz Graefe from Google, Marcin Zukowski from Snowflake, Justin Levandoski from Microsoft Research.
You will get the opportunity to both hear a guest lecture
and to actively participate in discussions with our guest
The class is about
state-of-the-art data system design. There is no textbook
for that. Thus, we use recent research papers and surveys
which will be posted on the course website, which you will
have access to through the Harvard network. We also use
the following textbook: Database Management Systems, by
Raghu Ramakrishnan and Johannes Gehrke. This textbook is a
great source for all the seminal and traditional topics
that we will cover.
The slides used during the course
will be available online before each class. We will also
print slides for you and bring them to each class. If
there is material that we want to communicate to you only
after class, this will be available shortly after each class.
SLIDES ARE NOT NOTES!
You should not expect the slides to cover the material in
detail. The class is based on discussion and problem
solving; the slides are tailored to drive the discussion
as opposed to serving the material.
In each class one or more students will be assigned to
take notes. After class these students will populate a
collaborative notes document and then all students are
welcome to jump in and enrich the notes further.
Collaborative note taking and editing will be part of your
class participation grade and a great way to review the
material and also see how your fellow students perceive it.
The link to the collaborative notes is available on the
top right of the class website.
We will use Piazza for online discussions. The link for the class is https://piazza.com/harvard/fall2017/cs165/home
for extension, and piazza.com/harvard/fall2017/cs165l/home
We continuously monitor Piazza and will answer your questions promptly. In past offerings the average response time was on the order of a few minutes. So you basically have access to the teaching staff all day long. You are welcome to post any question that might help you understand the material better or help you with the project. Anonymous posting (to the other students) will be enabled so that students feel more comfortable posting questions.
BASIC RULES FOR PIAZZA: We only have a few basic rules so we can keep the forum functional and useful for the students as well as manageable for the staff.
- We ask that you first search the forum well before
posting a question so that we do not have duplicate entries.
- Please make sure to stay on top of all staff posts
(especially those that are pinned). Anything we post in
Piazza we consider “known.”
- Do not use Piazza to post code or ask for help with
debugging. While it can work in some cases, remote debugging
is a pain and takes a lot of time. We have labs every day.
Bring your laptop and we will help you on site, or join
remotely and we will help you via a shared screen mode.
• Class participation and quizzes: 20%
• Midterm 1: 15%
• Midterm 2: 15%
• Project milestones 1-5: 40%
• Midway Check-in: 10%
• Bonus: Extra project tasks: up to 5%
• Bonus: Speed prize: up to 5%
These weights add up to more than 100% because of the two bonus items; grades are nevertheless judged on a 100% scale.
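To make the arithmetic concrete, here is a minimal Python sketch. It assumes the final score is a plain weighted sum with bonus points added on top and the total capped at 100. The cap, the component names, and the example scores are illustrative assumptions, not the official grading procedure.

```python
# Weights copied from the list above. Scores are fractions in [0, 1].
BASE_WEIGHTS = {
    "participation_and_quizzes": 20,
    "midterm_1": 15,
    "midterm_2": 15,
    "project_milestones": 40,
    "midway_check_in": 10,
}
BONUS_WEIGHTS = {
    "extra_project_tasks": 5,
    "speed_prize": 5,
}


def final_score(scores):
    """Weighted sum of base and bonus components, capped at 100 (assumed)."""
    weights = {**BASE_WEIGHTS, **BONUS_WEIGHTS}
    # Components missing from `scores` count as zero.
    total = sum(weights[name] * scores.get(name, 0.0) for name in weights)
    return min(total, 100.0)


# Hypothetical student: all values below are made up for illustration.
example = {
    "participation_and_quizzes": 0.75,
    "midterm_1": 0.5,
    "midterm_2": 0.75,
    "project_milestones": 1.0,
    "midway_check_in": 1.0,
    "extra_project_tasks": 0.5,
}

print(sum(BASE_WEIGHTS.values()))   # base components alone sum to 100
print(sum(BONUS_WEIGHTS.values()))  # bonuses add up to 10 more
print(final_score(example))
```

Under these assumptions, the bonus items can make up for points lost elsewhere but cannot push a score above 100.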
We do not allow pass/fail in CS165. Due to the interactive nature of the course, for every student that takes it, the teaching staff need to invest a lot of time during class, OH and labs. We expect students to fully commit and we are here to help you all the way through every single day.
Auditing: We may allow a couple of audit slots depending on the number of students. Contact Stratos.
We hold two midterms. Books and notes may
be open during midterms. Laptops, phones or any other
electronic devices are not allowed.
Midterms are not designed to test how much you can remember from the content. Instead, they stress your ability to come up with new solutions, to think through all design decisions and side effects of any solution you choose, and to communicate your design. The best way to prepare for midterms is to have an excellent handle on all the
topics we work on during our interactive in-class sessions. In particular, the midterm questions require similar thinking to the interactive sessions.
As a result, following the class and the in-class quizzes will naturally help you practice for the midterms.
You do not have to study for midterms alone. In addition to office hours and labs, before each midterm the instructor will hold special weekend-long meetings to help you go over the current material and past in-class quizzes. You may stay for as long as you need until you feel you are well prepared.
Feedback on Progress:
We provide feedback continuously. The main thing that you will need feedback on is your semester project. The way to get feedback is to show up to our daily office hours and labs and share your design decisions, code, and test results with the staff. In this way, you will get hands-on help and feedback.
Feedback on midterms will be provided within one week and you are welcome to come by during office hours to discuss any one of the tasks. We will also cover the midterm topics during class 1-2 weeks after each midterm.
Project Website: http://daslab.seas.harvard.edu/classes/cs165/project.html
The class has a running project throughout the semester. The project is about designing and implementing a prototype of a modern main-memory optimized column-store data system. By the end of the project you will have designed, implemented, and evaluated several key elements of a modern data system and you will have experienced several design tradeoffs in the same way they are experienced in industry labs.
This is a challenging but fun project!
We will also point to several open research problems throughout the semester that may be studied on top of the class project and that you may decide to take on as a research project.
The project has a total of five milestones with specific expected deliverables. The submission of each deliverable includes two parts: source code and a document detailing the major design decisions and why you made them (design document).
The five deliverables are:
1) basic storage layer, 2) indexing methods optimized for main-memory, 3) shared scans methods, 4) joins, and 5) updates.
The deliverables will be tested using predefined automated unit tests for functionality and, as extra credit, for performance.
Automated Testing Infrastructure:
We have an automated testing infrastructure. We provide a series of tests (using both fixed and randomized data) to automatically test your code for each project milestone. You are able to submit your code daily and get results by automated emails overnight. Tests run against an in-house Linux server at DASlab. You will be able to find the exact specifications of the machine and tests on the project website. Once you pass all the tests in the testing infrastructure your project is complete!
We will have a running competition and an anonymous leaderboard so you can continuously compare your system’s performance against the rest of the class. Essentially this means that we provide additional tests that increase the amount of test data so performance differences between projects will be highlighted. You will be able to run these tests daily as well, so you can improve throughout the semester. We will also provide a "benchmark" entry in the leaderboard which represents what we consider good performance for each milestone based on an in-house implementation from the lab.
We will give you starting code that implements the basic client-server
functionality (i.e., communication) so you can focus on building the server side code, that is, the essential core data processing algorithms and data structures of a database system. In addition, whenever applicable we will let you know if there are existing libraries that are OK to use.
Individual deliverables should pass all provided tests on the testing infrastructure. However, you will not be judged only on how well your system works; it should be clear that you have designed and implemented the whole system, i.e., you should be able to perform changes on-the-fly and
explain design details.
At the end of the semester each student will have a 1-hour session with the instructor and another 1-hour session with the TFs where the student will demonstrate the system, and answer questions about the design and about supporting alternative
functionality. [Tip: From past experience we found that frequent
participation in office hours, brainstorming sessions and labs means that the instructor and the TFs are very well aware of your system and your progress, which makes the final evaluation a mere formality in these cases.]
The project is an individual project. The final deliverable should be personal. You must write from scratch all the code of your system and all documentation and reports. Discussing the design and implementation problems with other students is allowed and encouraged! We will do so in the class as well and during office hours, labs and brainstorming sessions.
All students who have collaborated with other students in whatever capacity should provide a collaboration statement with their final deliverable to properly acknowledge any ideas that were taken from or influenced by discussions with other students.
Late Days Policy & Schedule:
We allow for 1000 late days or until Harvard requires us to upload your grade! The more input you give us, the more we can help you learn. On the project website and in the project description you can find a detailed time schedule that we propose you follow. With the exception of the midway check-in (which is a hard deadline), the rest is a “suggested schedule” that will allow you to spread the work throughout the semester and to have sufficient time for each milestone based on the complexity and the work required at each phase of the project. This is an involved project that requires commitment through the entire semester and cannot be done in 2-3 weeks at the end. Not submitting the project milestones on time will have no side effects on your grade but, at the same time, we will not be able to provide you with any feedback on your progress until we have your design documents and your code.
Experience says that every year a number of students cannot handle the freedom to self-pace, and end up significantly deviating from the schedule. We will send you frequent reminders but you should know that deviating from the schedule by more than a couple of weeks will most likely mean that you will not be able to finish the whole project by the end of the semester (unless you are already an experienced systems student).
The goal here is to demonstrate that you are making decent progress and mainly to avoid falling behind. By October 10 midnight (hard deadline) each student should 1) deliver a design document that describes the intended design for the first two milestones and a description of the rest of the milestones (5%) and 2) have implemented a project that passes at least the first three tests of the first milestone in the automated testing infrastructure (5%). A template of the expected design document will be provided early in the semester.
The three fastest projects (top 3 in the leaderboard by the end of the testing period) will gain extra points (5%). The competition will terminate the last day before we need to upload grades so you will have plenty of time to improve (until mid December).
Extra Points for Bonus Tasks:
We will regularly assign extra tasks or you can come up with your own extra tasks for the various components of the project. With these extra tasks you gain extra points (up to 5%).
What is a Successful Project?
A successful project passes all the predefined tests we provide on the testing infrastructure and the student successfully passes the final face-to-face evaluation. A successful final evaluation is one where the student is able: (1) to fully explain every detail of the design, and (2) to propose efficient designs for new functionality on the spot. A month before the final evaluation you will find on the class website a step-by-step guide that will help you prepare for the evaluation meeting.
Joining Class Remotely
Lectures will be broadcast live Mondays/Wednesdays 4-5:30pm. Lectures will also be available on demand within 24 hours after each class. Students will be able to watch the live or recorded broadcast through their browser using the Matterhorn player. The link to the broadcasts for CS165 will be available through the Canvas website for this class and will also be posted on the class website before the first lecture.
Extension school students will be able to participate live in classes, office hours and labs via web-conference tools (we will use Zoom). The course staff will be online with Zoom during each session and you will be able to actively interact with the staff. Besides standard chat and audio features, Zoom also offers screen sharing, which can be used when you need help with specific issues such as debugging.
Capturing Discussions: Given that a big portion of the class is based on interaction, extension school in cooperation with the class staff is working to set up a system with several microphones across the classroom so we can accurately and clearly capture brainstorming discussions and comments during class time. Microphones will “follow” the instructor.
Grading: Even though we encourage extension school students to use the opportunity to interact with the staff and participate in class live, we know that for practical reasons this will not be possible for all remote students. For this reason, there will be no “class participation” grade for extension school students; the weight of this component will be split equally between the project (60% in total) and the midterms (40% in total).
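The 60%/40% figures can be checked with a few lines of arithmetic. This is a minimal Python sketch, assuming the 20% participation weight is split evenly (10 points each) between the project and the midterms. Counting the project as the milestones plus the midway check-in is an assumption made to reproduce the stated totals, and the component names are hypothetical.

```python
# Weights for college students, as listed in the grading breakdown.
COLLEGE_WEIGHTS = {
    "participation_and_quizzes": 20,
    "midterm_1": 15,
    "midterm_2": 15,
    "project_milestones": 40,
    "midway_check_in": 10,
}


def extension_weights(weights):
    """Drop participation and split its weight evenly (assumed grouping)."""
    half = weights["participation_and_quizzes"] / 2
    project = weights["project_milestones"] + weights["midway_check_in"]
    midterms = weights["midterm_1"] + weights["midterm_2"]
    return {"project": project + half, "midterms": midterms + half}


print(extension_weights(COLLEGE_WEIGHTS))
```

With this grouping the project rises from 50 to 60 points and the midterms from 30 to 40, which matches the totals stated above.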
Midterms: Extension School will contact students directly regarding administrative preparations and options for midterms. Midterms are proctored and we also allow the new option to take the midterm directly through Canvas with a camera. Local extension school students should come to campus and take the midterm on midterm day (we usually book a slot at ~6pm so it is easy to attend after work).
Piazza: To participate in piazza you need a Harvard
email address. If you do not have one you can create one
Office Hours and Labs: If none of the existing slots
for office hours and labs work for you (e.g., due to time
differences), we will include additional slots; just let us know.
Starting Date: Note that usually extension school
shows the class starting date to be one day after the
actual starting date (which is Wednesday August 31, at
4pm). In fact, this is when the first video will be
available. However, extension school students will still be
able to stream live the first class on August 31 and
participate live as normal.
Accessibility: Harvard and the Extension School are
committed to providing an accessible academic community.
The Disability Services Office offers a variety of
accommodations and services to students with documented
disabilities. Please visit www.extension.harvard.edu/resources-policies/resources/disability-services-accessibility
for more information and do not hesitate to contact Prof.
Idreos directly, by email, with any questions or concerns
you might have.
Using Zoom Guide
Interacting using Zoom:
We will be using Zoom for in-class communication [http://zoom.us/].
The class will be recorded through a separate service of Harvard Extension School, so Zoom will not be used for recordings.
Install and try Zoom:
Please navigate to http://zoom.us/ and create an account so we can see your name during class and OH discussion.
Download the client (for your computer and/or your tablet and phone).
For the discussion you will need a USB headset with a microphone in order to ensure that the audio will be clear
when you speak and that there won’t be feedback that distracts everyone in the class. Make sure that you plug in
your headset before you log into Zoom or your audio may not work.
It is good practice to test your audio before class begins.
If you forget to plug in your mic before class, restart Zoom.
If all else fails, you can also join by telephone.
(Go to Joining a Session, which you see after you log on, and then click on the Join by Phone option.)
If you have any technical issues during class, please mute yourself and immediately call the HELP Desk at 617-998-8571.
They will be there to help you with any technical problem so that you can rejoin class as quickly as possible.
Using Zoom in class and OH:
When class starts all remote students will be muted, but you can un-mute and/or raise your hand when you have questions.
Keep your camera on, especially if you are interacting with the class. Always keep camera on during OH.
Speaker View & Gallery View:
Zoom supports speaker view & gallery view: Speaker view highlights one person with four in miniature;
gallery view allows the whole class to be visible.
Speaker view is for talking/lecturing; gallery view is for whole-class discussion and OH.
In some cases, we may share material from the computer; the video will still be available, and the view will be focused on the shared screen.
Interaction During Class and OH:
How to get our attention during class: (1) use chat, (2) click on the hand icon in Participants.
In OH you can typically just speak up and jump in directly when you have questions.
The meeting URL will be https://zoom.us/j/9063672373 for sections and https://zoom.us/j/462802443 for class. These links will only be active during the time of office hours and class respectively.
How to Read Research Papers
In this class many of the reading assignments are recent research papers. Unless mentioned otherwise in class, you are not expected to read and understand all these research papers in extreme detail. The main purpose is for you to get exposed to recent ideas and concepts and to get inspiration about new opportunities and what is coming in the future. We expect you to read and understand well the abstract, introduction and related work parts of all papers. For the rest of the material in a paper, i.e., the main technical part and the analysis, we expect you to have a high-level idea of what it does, unless we explicitly cover the exact techniques in detail in class. Our goal is that by the end of the semester you will have enough background to be able to pick up any of these papers again and understand it fully! Of course, if you want to discuss any of those papers in more detail, we will happily do so during office hours.