In this course, we will learn how to find data sources and datasets and how to learn from them. The course situates data analysis tasks and their interpretation in the context of urban planning and research methods, and it uses advanced quantitative and statistical methods to analyze urban issues. Students are expected to start the course with basic computer literacy, including file management and cloud-based data backups (e.g., Dropbox, Google Drive, MS OneDrive). Students are also expected to have basic mathematical knowledge and experience manipulating data in MS Excel.
This course uses the Python programming language as the main platform for data manipulation and statistical analysis. Students will learn basic data manipulation in Python and how to measure relationships using statistical methods. We will start by downloading data from NYC’s open data catalog and reading it into Jupyter Notebooks. After some basic data visualization, we will explore the data using descriptive statistics. Then we will cover the fundamentals of inferential statistics. The coursework will culminate in a final project in which students choose from a pool of datasets to answer independent research questions.
This summer course is fast-paced, and students are expected to learn how to troubleshoot programming problems independently. We will use generative AI in this course to develop basic code. Students who have concerns about using AI should talk to the instructor in advance.
We will use MS Teams as our communication platform. Students need to join Teams and participate actively. Being an active participant means asking questions and responding to questions raised by others.
Learning Objectives: The objectives of this course include:
- To learn about online datasets, open data catalogs, and other relevant resources;
- To learn basic programming skills (with Python) for data preparation and analysis;
- To learn descriptive statistics;
- To learn how to measure relationships between planning-related variables in urban areas, and to use inferential statistics for hypothesis testing;
- To learn how to use data analysis for critically evaluating existing planning policies and building future alternatives.
Session 1
- Introduction to the course, our policies, and our resources.
- If you are not on Teams, you are not in our team.
- How to leverage AI.
- Let’s set up Google Colab as your programming environment.
- We will mount your Google Drive in Colab.
- We are using this notebook: https://github.com/mehdiheris/UrbanDataAnalytics/blob/main/notebooks/Start_Python_Session_1.ipynb
- Let’s do Python: What are variables?
- Let’s do Python: What are data types?
- Let’s do Python: Create Lists.
- Let’s do Python: What are for loops?
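The four Python topics above can be previewed in a few lines of plain Python. This is a minimal sketch, not the contents of the Session 1 notebook: the variable names are hypothetical, and `avg_commute_min` is an illustrative number, not a real statistic.

```python
# Variables: a name that refers to a value
city = "New York"       # str: text
n_boroughs = 5          # int: whole number
avg_commute_min = 41.5  # float: decimal number (illustrative value only)
is_coastal = True       # bool: True or False

# Data types: type() reports what kind of value a variable holds
for value in [city, n_boroughs, avg_commute_min, is_coastal]:
    print(value, type(value).__name__)

# Lists: an ordered, changeable collection; indexing starts at 0
boroughs = ["Manhattan", "Brooklyn", "Queens", "Bronx", "Staten Island"]
print(len(boroughs), boroughs[0], boroughs[-1])  # -1 is the last item

# For loops: run the indented block once for each item in the list
name_lengths = []
for b in boroughs:
    name_lengths.append(len(b))  # number of characters in each name
print(name_lengths)
```

Running it in a Colab cell prints each value with its type name (`str`, `int`, `float`, `bool`), then `5 Manhattan Staten Island`, then `[9, 8, 6, 5, 13]`.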
Session 2
- Explanation of mean, median, and standard deviation
- We are using the following notebooks:
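```python
import pandas as pd              # DataFrames for tabular data
import matplotlib.pyplot as plt  # general-purpose plotting
import seaborn as sns            # statistical plots built on matplotlib
```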
- Start with DataFrames in pandas
- Explanation of columns
- Learn how to get mean, median, min, and max for each column
- Explanation of histograms
- Explanation of scatter plots
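The Session 2 steps (DataFrames, columns, column statistics, histograms, and scatter plots) can be sketched as follows. This sketch assumes pandas and matplotlib are installed (both are available in Google Colab). The five-row table, its column names (`tract`, `median_income`, `pct_transit`), and its values are made up for illustration. In class, the data would come from a downloaded file rather than an inline string. Note that pandas' `std()` returns the sample standard deviation (dividing by n − 1) by default.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# A tiny, made-up table standing in for a downloaded CSV file
csv_text = """tract,median_income,pct_transit
A,52000,41.0
B,61000,35.5
C,48000,55.0
D,75000,22.5
E,58000,38.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Columns: each column is a pandas Series with its own data type
print(df.columns.tolist())
print(df.dtypes)

# Summary statistics for one column
income = df["median_income"]
print("mean:", income.mean())
print("median:", income.median())
print("std:", income.std())  # sample standard deviation (ddof=1) by default
print("min:", income.min(), "max:", income.max())

# The same statistics for several numeric columns at once
summary = df[["median_income", "pct_transit"]].agg(["mean", "median", "std", "min", "max"])
print(summary)

# Histogram: the distribution of a single variable
fig, ax = plt.subplots()
ax.hist(df["pct_transit"], bins=5)
ax.set_xlabel("Commuters using transit (%)")
ax.set_ylabel("Number of tracts")
fig.savefig("hist_pct_transit.png")
plt.close(fig)

# Scatter plot: the relationship between two variables
fig, ax = plt.subplots()
ax.scatter(df["median_income"], df["pct_transit"])
ax.set_xlabel("Median household income ($)")
ax.set_ylabel("Commuters using transit (%)")
fig.savefig("scatter_income_transit.png")
plt.close(fig)
```

For this made-up table, the income mean is 58800 and the median is 58000. In a Colab notebook, you would typically drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of `savefig` to see the figures inline.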
Session 3
We are using these notebooks:
- https://github.com/mehdiheris/UrbanDataAnalytics/blob/main/notebooks/Explore_a_Table_Session_3.ipynb
- https://github.com/mehdiheris/UrbanDataAnalytics/blob/main/notebooks/Pandas_Groupby_Session_3.ipynb
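The two Session 3 notebooks cover exploring a table and pandas `groupby`. The sketch below is not taken from those notebooks. It builds a small, made-up complaints table (the column names `borough`, `complaint_type`, and `days_to_close` are hypothetical) and shows some common first-look methods. It then shows a split-apply-combine summary: rows are split by a key column, each group is summarized, and the results are combined into one table.

```python
import pandas as pd

# Hypothetical records: one row per housing complaint (made-up data)
df = pd.DataFrame({
    "borough": ["Bronx", "Brooklyn", "Bronx", "Queens", "Brooklyn", "Bronx"],
    "complaint_type": ["HEAT", "PLUMBING", "HEAT", "HEAT", "PAINT", "PLUMBING"],
    "days_to_close": [3, 10, 5, 2, 7, 4],
})

# Explore the table: size, first rows, column types, summary statistics
print(df.shape)       # (rows, columns)
print(df.head())      # first five rows
print(df.dtypes)
print(df.describe())  # numeric columns only by default

# How often each category appears in a column
print(df["complaint_type"].value_counts())

# Groupby: split rows by borough, then summarize days_to_close in each group
by_borough = df.groupby("borough")["days_to_close"].agg(["count", "mean", "median"])
print(by_borough)

# Group by two columns: number of rows for each borough/complaint pair
counts = df.groupby(["borough", "complaint_type"]).size()
print(counts)
```

In this toy table, the Bronx group has three rows with a mean of 4.0 days to close, and Brooklyn has two rows with a mean of 8.5.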