๐ Digital Wellbeing Analysis: Statistical Impact of Screen Time on Youth Health Using Python + Tableau
- Business Problem: The pervasive increase in screen time among youth is a significant concern for parents, educators, and healthcare providers. This project was initiated to statistically quantify the impact of screen time on various health domains and to identify the key demographic factors that influence screen time habits.
- Solution: A comprehensive statistical analysis was performed on survey data from 99 children and adolescents. The methodology included data cleaning, feature engineering, independent t-tests, Pearson correlation, Chi-square tests of association, and multiple regression to build a holistic understanding of the issue.
- Key Results: The analysis revealed that higher screen time is statistically linked to more negative health outcomes across all measured domains (Physical, Psychological, Academic, Social, and Habitual). Habit patterns showed the strongest correlation with screen time (r = 0.46). Age was identified as the most significant demographic factor influencing screen time levels.
- Tools & Skills Used: Python | Pandas | Seaborn | Tableau | Scikit-Learn | FPDF | python-docx | Statistical Analysis | Data Visualization | Automated Reporting
This analysis provides clear, data-driven evidence that screen time duration is a critical factor in youth wellbeing, offering a foundation for targeted, age-appropriate intervention strategies.
For a more dynamic and exploratory view of the findings, please visit the interactive dashboard on Tableau Public. The dashboard allows you to filter data by demographic groups and see how health impacts change accordingly.
View Tableau Dashboard
- This project simulates a public health or academic research study aimed at understanding the consequences of digital device usage on children and adolescents (ages 10-21). As digital immersion becomes the norm, it's crucial to move beyond anecdotal evidence and provide quantifiable insights.
- From an analyst's perspective, the objective is to translate raw survey data into actionable intelligence for stakeholders like schools, parental groups, and policymakers. The findings can inform educational curricula, public health campaigns, and guidelines for healthy digital habits.
- The project culminates in a series of automated reports, including a detailed academic-style PDF and DOCX, making the findings accessible to both technical and non-technical audiences.
- Data Source: The data was collected via a structured survey and provided in an Excel file (
dataset_raw.xlsx) containing two main tables. - Data Structure:
- Table 1:
demographic_dataโ 100 records containing participant demographics (Age, Gender, Income, etc.). - Table 2:
impact_dataโ 100 records containing responses on a 5-point Likert scale to questions across five health domains.
- Table 1:
- Data Quality & Cleaning: The initial dataset of 100 records was checked for missing values. One record with a missing value in
Social_3was dropped, resulting in a final, clean dataset of 99 records and 41 columns (including engineered features) for analysis.
The project followed a structured analytical workflow:
- Data Cleaning & Transformation: Raw data from two Excel sheets was loaded, merged, and cleaned. Columns were renamed for clarity. Categorical variables were mapped to readable labels (e.g.,
1โ'10-12'), and composite scores were engineered for each health domain by averaging responses. - Exploratory Data Analysis (EDA) & Statistical Testing:
- Independent T-tests: To compare the mean health scores of "High Screen Time" (>2 hours/day) vs. "Low Screen Time" groups.
- Pearson Correlation: To measure the strength and direction of the linear relationship between daily screen time and the five health domain scores.
- Chi-Square Test of Association: To determine if there is a statistically significant association between screen time levels and key demographic variables.
- Predictive Modeling: A simple Multiple Linear Regression model was built to identify which demographic variables were the strongest predictors of daily screen time duration.
- Automated Reporting & Visualization: A dedicated Python script was created to automatically generate all findings into multiple formats (CSV, Excel, PNG, TXT, PDF, DOCX) to ensure the results are easily shareable and interpretable.
- Insight: Participants with high screen time (>2 hours/day) reported statistically significant negative impacts across all five health domains (p < 0.05 for all tests). This confirms that increased screen time is not benign and has measurable, broad-ranging consequences.
- Insight: The most pronounced negative impact was observed in Habit Patterns (Mean Difference = 0.82, p < 0.0001) and Academic Scores (Mean Difference = 0.63, p = 0.003), suggesting these are the areas of greatest vulnerability.
- Insight: There is a significant positive correlation between daily screen time and negative health impact scores across all domains. This indicates a dose-response relationship: as screen time increases, negative outcomes tend to increase as well.
- Insight: The strongest relationship exists between screen time and Habit Score (r = 0.46), followed by Overall Health Impact (r = 0.38). This suggests that excessive screen time is most disruptive to a child's routines and habits.

- Insight: Age is the only demographic variable with a statistically significant association with screen time levels (ฯยฒ = 21.90, p = 0.0013). This implies that screen time habits change significantly as a child transitions through different age groups.
- Observation: Other factors like Gender, Income, and Family Type did not show a statistically significant association in this dataset, suggesting that age is a more powerful differentiating factor.

-
Results:
- Universal Negative Impact: High screen time is significantly associated with poorer outcomes in physical, psychological, academic, social, and habitual health domains.
- Habits and Academics are Key: The strongest correlations and mean differences were seen in habit patterns and academic performance, highlighting these as primary areas of concern.
- Age is the Primary Driver: Age group is the most significant demographic factor linked to different screen time patterns.
-
Recommendations:
- For Educators & Parents: Develop age-specific screen time guidelines. A one-size-fits-all approach is insufficient. Interventions for early adolescents (13-15) should differ from those for older teens.
- For Public Health: Launch awareness campaigns focusing on how screen time disrupts healthy habits (sleep, meals, chores) and negatively impacts academic focus, as these are the most affected areas.
- For Future Research: Further investigate the nuances within age groups to develop more targeted and effective digital wellbeing strategies.
- Self-Reported Data: The analysis relies on self-reported survey data, which may be subject to recall bias or social desirability bias.
- Sample Size: The sample size of 99 participants is relatively small, which may limit the statistical power for subgroup analyses and the generalizability of the findings.
- Correlation vs. Causation: This is a cross-sectional study. While significant correlations and associations were found, they do not imply causation. A longitudinal study would be required to establish causal links.
- Longitudinal Study: Track a cohort of participants over several years to understand the causal relationships between screen time and long-term health outcomes.
- Qualitative Analysis: Conduct interviews or focus groups to understand the context behind screen time usage (e.g., educational vs. entertainment, social interaction vs. passive consumption).
- Expand Dataset: Increase the sample size and diversity to allow for more robust analysis of demographic subgroups (e.g., income levels, family types).
- Python: Pandas, NumPy, Seaborn, Matplotlib, Scikit-Learn
- Statistical Analysis: Independent T-tests, Pearson Correlation, Chi-Square Tests, Multiple Linear Regression
- Automated Reporting: FPDF, python-docx for generating professional PDF and Word documents directly from the analysis script.
- Data Cleaning & Feature Engineering: Handling missing data, creating composite scores, and mapping categorical variables.
โโโ .gitignore
โโโ LICENSE
โโโ README.md
โโโ data
โโโ processed_data
โ โโโ clean_df.csv # Final cleaned dataset
โ โโโ dataset.xlsx # Dataset after Excel preprocessing
โ โโโ final_merged_dataset.csv # Merged dataset
โโโ raw_data
โ โโโ raw_data.xlsx
โโโ models
โโโ analysis_results.pkl # Pickled analysis results
โโโ notebooks
โโโ 01_Data_Cleaning _and_Preprocessing.ipynb
โโโ 02_Exploratory_Data_Analysis.ipynb
โโโ 03_Reporting.ipynb
โโโ analysis_summary_report.txt
โโโ reports
โโโ Report.md
โโโ Screen_time_impact_analysis_report.docx # Final Report (in Docx format)
โโโ analysis_summary_report.txt # Final Report (in txt format)
โโโ detailed_screen_time_impact_analysis_report.pdf # Final Report (in pdf format)
โโโ figures # PNG visualizations
โ โโโ correlation_matrix.png
โ โโโ correlation_matrix_detailed.png
โ โโโ demographic_distribution_detailed.png
โ โโโ demographic_distributions.png
โโโ final_dataset.csv # Some extra datasets (showing results)
โโโ regression_coefficients.csv
โโโ association_analysis.csv
โโโ correlation_analysis.csv
โโโ significance_results.csv
โโโ complete_analysis_results.xlsx # Final Excel workbook having all final datasets
โโโ requirements.txt
- Dataset Source: "Proprietary survey data collected in Q2 2025"
- American Academy of Pediatrics. (2016). Media and Young Minds.
- Twenge, J. M., & Campbell, W. K. (2018). Associations between screen time and lower psychological well-being.
- World Health Organization. (2019). Guidelines on physical activity, sedentary behaviour and sleep for children.