For the last 3 years I have been working on DataPath, a fast database engine I'm developing with Chris Jermaine (Rice, Computer Science Department). As we approach the first public code release and work to wrap up the second rewrite, it crossed my mind that it would be beneficial to share the problems and solutions (good or bad) that we encounter.
While the effort is shared with Chris and many students, the ideas in this blog are mine and are not necessarily shared by the other members of the project.
Main goal: share my insights on writing code that pushes the modern architecture (disk, memory, multi-cores) to the limit. The perspective is mostly that of designing a database engine that runs well on large data (1-10TB/machine).
Sub goals: offer insights into software development issues, dealing with design decisions, marrying performance and elegance, and other things readers might ask for.
My qualifications: Until 2009, when we started developing DataPath, I was mainly a theoretician (approximate query processing, foundations of data-mining). Designing and writing code for a database system is a new experience for me. The lack of experience in designing and building systems would normally be a big warning sign and a good indicator of lots of sub-standard work. In this case, due to the ambitious goals of DataPath, I think it is a tremendous asset, since I can leverage my theoretical understanding and I am forced to think through everything and potentially follow paths that are unconventional. Chris has implemented a lot of systems and he keeps me in check.
Alin