COLUMBIA UNIVERSITY COMS 6998.005

Questions for LogicBlox lecture

The LogicBlox paper and company spans the intersection of databases, programming languages, ML, and algorithms/theory. There are three sets of questions to ponder while reading the paper. Pick one of the following sets of questions to answer:

1. DB and PL

Notice that the PL community tends to hold the virtual machine constant (e.g. Java VM, the CLR, the x86 instruction set) and vary the imperative or functional surface language. ON the other hand, the DB community holds the language constant (SQL) and varies the underlying run-time (e.g. runtimes for OLTP, OLAP, Graphs, Documents, etc).

2. DB and ML/AI

3.DB and Algorithms/Theory

Broadly speaking, the performance of a program can be improved in two ways:

  1. improving constant factors (e.g. reduce interpretation overhead, memory hierarchy optimizations (latency hiding), parallelization)
  2. improving asymptotics (e.g. sorting in O(n log n) vs in O(n^2))

Q1: Discuss why traditional relational algebra operators are either unary or binary. Consider the following aspects:

Seminal work from the early 1970’s by researchers like Codd working on relational database and Kowalski on Prolog showed that it is possible to build systems that can compute the correct answers to declarative programs written in subsets of first order logic. Their work established that it can be done but didn’t make any guarantees about the computing resources needed to do it. The lack asymptotic complexity guarantees on the evaluation of such programs limited the utility of declarative of programming systems to important but niche subsystems like databases. Most systems today are still built primarily with imperative programming languages.

Q2: Can we build databases systems that make asymptotic complexity guarantees on query evaluation?