Data Analysis: Statistical Modeling and Computation in Applications
A hands-on introduction to the interplay between statistics and computation for the analysis of real data. Part of the MITx MicroMasters program in Statistics and Data Science.
About the course Data Analysis: Statistical Modeling and Computation in Applications
Data science requires multidisciplinary skills, ranging from mathematics, statistics, machine learning, and problem solving to programming, visualization, and communication. In this course, learners will combine these foundational and practical skills with domain knowledge to ask and answer questions using real data.
This course will start with a review of common statistical and computational tools such as hypothesis testing, regression, and gradient descent methods. Then, learners will study common models and methods to analyze specific types of data in four different domain areas:
- Epigenetic Codes and Data Visualization
- Criminal Networks and Network Analysis
- Prices, Economics and Time Series
- Environmental Data and Spatial Statistics
Learners will be guided to analyze a real data set from each of these areas of focus, and present their findings in written reports. They will also discuss relevant and practical issues with peers.
This course is part of the MITx MicroMasters Program in Statistics and Data Science. It runs at a pace and level of rigor similar to those of an on-campus course at MIT. Master the skills needed to be an informed and effective practitioner of data science. You will complete this course and three others from MITx and then take a virtually proctored exam to earn your MicroMasters, an academic credential that will demonstrate your proficiency in data science or accelerate your path towards an MIT PhD or a Master’s at other universities. To learn more, see the MicroMasters program page.
If you have specific questions about this course, please contact us at [email protected].
What you’ll learn
- Model real data, form hypotheses, and perform statistical analysis
- Use dimension reduction techniques such as principal component analysis to visualize high-dimensional data, and apply them to genomics data
- Analyze networks (e.g. social networks), use centrality measures to describe the importance of nodes, and apply these methods to criminal networks
- Model time series using moving-average, autoregressive, and other stationary models, and use them to forecast financial data
- Use Gaussian processes to model environmental data and make predictions
- Communicate analysis results effectively