M.S. IDSE

The MS in Data Science offers students an in-depth educational experience focused on data science as it pertains to their individual interests. The program builds on the 4-course Certification of Professional Achievement in Data Sciences program. Students will interact with diverse faculty members and fellow students, have the opportunity to conduct research, take part in a capstone project course, and engage with industry.

Students may select an elective track. The tracks correspond to the six centers within the Institute, plus an Entrepreneurship track. This allows students to home in on their particular interests and skill sets.

Tuition for the course(s) will default to the rate used by the School of Engineering. Please note that this rate is expected to change each academic year. For more details on tuition and fees, click here.

We’ll be accepting applications in late September.

Our curriculum is 30 credits total.

      1. Prerequisites: MATH V1101 and V1102 or the equivalent. A calculus-based introduction to probability theory. Topics covered include random variables, conditional probability, expectation, independence, Bayes’ rule, important distributions, joint distributions, moment generating functions, central limit theorem, laws of large numbers and Markov’s inequality.
      1. Methods for organizing data, e.g. hashing, trees, queues, lists, priority queues. Streaming algorithms for computing statistics on the data. Sorting and searching. Basic graph models and algorithms for searching, shortest paths, and matching. Dynamic programming. Linear and convex programming. Floating point arithmetic and stability of numerical algorithms. Eigenvalues, singular values, PCA, gradient descent, stochastic gradient descent, and block coordinate descent. Conjugate gradient, Newton and quasi-Newton methods. Large-scale applications from signal processing, collaborative filtering, recommendation systems, etc.
      1. Course covers fundamentals of statistical inference and testing, and gives an introduction to statistical modeling. The first half of the course will focus on inference and testing, covering topics such as maximum likelihood estimates, hypothesis testing, likelihood ratio tests, Bayesian inference, etc. The second half of the course will provide an introduction to statistical modeling via introductory lectures on linear regression models, generalized linear regression models, nonparametric regression, and statistical computing. Throughout the course, real-data examples will be used in lecture discussion and homework problems.
      1. An introduction to computer architecture and distributed systems with an emphasis on warehouse-scale computing systems. Topics will include fundamental tradeoffs in computer systems, hardware and software techniques for exploiting instruction-level parallelism, data-level parallelism and task-level parallelism, scheduling, caching, prefetching, network and memory architecture, latency and throughput optimizations, specialization, and an introduction to programming data center computers.
      1. An introduction to machine learning, with an emphasis on data science. Topics will include least squares methods, Gaussian distributions, linear classification, linear regression, maximum likelihood, exponential family distributions, Bayesian networks, Bayesian inference, mixture models, the EM algorithm, graphical models, hidden Markov models, support vector machines, and kernel methods. Part of the course will focus on methods and problems relevant to big data.
      1. This class introduces the data processing and algorithmic skills, as well as the design principles, necessary to explore and present datasets computationally and visually. These include command line tools, the use of state-of-the-art languages and software, an algorithmic understanding of how to work with large datasets (including parallelism and the map-reduce framework), interactive visualizations, exploratory data analysis as a means to generate and test hypotheses, as well as basics of data exploration and visualization.
      1. This course provides a unique opportunity for students in the MS in Data Science program to apply their knowledge of the foundations, theory and methods of data science to address data science problems in industry, government and the non-profit sector. The course activities focus on a semester-length data science project sponsored by a local organization. The project synthesizes the statistical, computational, and engineering challenges and the social issues involved in solving complex real-world problems.
      1. The elective courses for the proposed M.S. in Data Science will draw upon existing graduate level courses at Columbia University. In addition to advisor approval, elective course selection will be subject to course pre-requisites, course availability, and the cross-registration procedures of the school/department offering the requested courses.

 

The program may be completed in two semesters of full-time intensive study or on a part-time basis.