Mechanics

This course can be, and has been, taught on a variety of platforms.

Three systems are needed: a relational database, Spark, and TensorFlow.

Because the course uses Big Data, it helps to set up a small-scale, low- or no-cost environment where students can experiment and debug their code before running it on the full datasets.
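One way to keep the same code usable in both settings is to choose the dataset location from a single setting. The following is a minimal sketch, not part of the course materials: the environment variable COURSE_DATA_SCALE, the function dataset_path, and both dataset paths are hypothetical names chosen for illustration.

```python
import os

# Hypothetical dataset locations; neither path comes from the course materials.
DATASETS = {
    "small": "data/sample/events.csv",
    "full": "s3://example-course-bucket/full/events.csv",
}


def dataset_path(env=None):
    """Return the dataset location for the selected scale.

    Reads the hypothetical COURSE_DATA_SCALE variable and falls back to
    the small sample, so code under development does not touch the full
    data unless that is requested explicitly.
    """
    env = os.environ if env is None else env
    scale = env.get("COURSE_DATA_SCALE", "small")
    if scale not in DATASETS:
        raise ValueError(f"unknown data scale: {scale!r}")
    return DATASETS[scale]


# With no variable set, the small sample is chosen.
print(dataset_path({}))
```

Defaulting to the small sample means a student has to opt in to the full datasets, which is the cheaper mistake to make while debugging.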

Our current deployment consists of a JupyterHub hosted on Rice University infrastructure for Python, Postgres, and Spark (using small datasets). Students then run their code on the larger datasets on Amazon Web Services (AWS).

An alternative to the locally hosted solution is to install container software, such as Docker, on student machines. Containers holding the database software and data, Spark, or TensorFlow can then be distributed to students. Again, the large datasets are processed on AWS.

Finally, students may use a fully hosted environment, such as Amazon Web Services, for all of the systems.