Malik Ibrahim Ali Khan

Master of Science in Data Science

Regis University

Expected Graduation: Spring 2026

About Me

I am a Master’s student in Data Science at Regis University with a background in software development and a strong interest in applying analytics to real-world problems. My experience spans machine learning, statistical modeling, and data visualization, along with hands-on work building and evaluating predictive models across different domains. I enjoy breaking down complex problems, testing assumptions, and building structured, data-driven solutions that are both practical and ethical. Beyond academics, I value discipline, continuous improvement, and creative thinking, whether in coding, research, or personal development. I aim to build systems and insights that create measurable impact.

Technical Skills

Programming Languages

Python, JavaScript, R, SQL, Java

Tools & Frameworks

Docker, Git, Terraform

Machine Learning

TensorFlow, PyTorch, Scikit-learn

Web Development

React, Node.js, Flask, Django

Practicum Projects

MSDS 692

Early Gameplay Risk Scoring Engine Using Behavioral Data

Machine Learning, Python

Project Methodology

In this project, I predicted whether a gameplay session would end in an incorrect outcome using early behavioral signals. The main idea was to test whether the first 20 actions in a session contain enough information to estimate risk.

First, I cleaned and prepared the dataset from the Kaggle competition “Predict Student Performance from Game Play.” The dataset included question-level labels, session IDs, level groups, and timestamped gameplay logs.

I engineered two types of features:

Overall session features: total events, session time, mean and median time between actions, number of long pauses, unique event count, and pause ratio.
Early-session features (first 20 actions): early mean timing, timing variability, early pauses, early unique events, and early pause ratio.

To prevent data leakage, I applied a group-based train-test split so that no session appeared in both the training and testing sets.

I trained two models:

Logistic Regression (baseline)
Random Forest (nonlinear ensemble model)

Model performance was evaluated using ROC AUC and risk bucket analysis.

Key Findings

The Random Forest model performed better than Logistic Regression, achieving a ROC AUC of approximately 0.64 compared to 0.61. Although the AUC is moderate, the model was effective for risk ranking. When sessions were ranked by predicted risk and the top 20% highest-risk sessions were flagged:

About 44% of incorrect sessions were captured, roughly 2.2 times the 20% that random flagging would be expected to capture.
The high-risk group had a much lower correctness rate than the low-risk group.

This shows that incorrect outcomes are concentrated in higher-risk sessions. Even without perfect prediction, early gameplay behavior provides meaningful signals that can support targeted intervention strategies.
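The three steps described above (early-session features, a group-based split, and top-20% capture analysis) can be illustrated with a minimal standard-library sketch. This is not the project's code: the function names, the 5-second pause threshold, and the feature keys are assumptions made for illustration only.

```python
import random
from statistics import mean, pstdev


def early_session_features(timestamps_ms, n_actions=20, pause_ms=5000):
    """Timing features from the first `n_actions` events of one session.

    `pause_ms` is a hypothetical cutoff for what counts as a long pause.
    """
    early = sorted(timestamps_ms)[:n_actions]
    gaps = [later - earlier for earlier, later in zip(early, early[1:])]
    if not gaps:  # fewer than two events: no timing information
        return {"early_mean_gap": 0.0, "early_gap_std": 0.0,
                "early_pauses": 0, "early_pause_ratio": 0.0}
    pauses = sum(gap > pause_ms for gap in gaps)
    return {
        "early_mean_gap": mean(gaps),
        "early_gap_std": pstdev(gaps),
        "early_pauses": pauses,
        "early_pause_ratio": pauses / len(gaps),
    }


def group_train_test_split(session_ids, test_fraction=0.2, seed=0):
    """Split row indices so each session lands entirely in train or test."""
    unique_sessions = sorted(set(session_ids))
    random.Random(seed).shuffle(unique_sessions)
    n_test = max(1, round(len(unique_sessions) * test_fraction))
    test_sessions = set(unique_sessions[:n_test])
    train_idx = [i for i, s in enumerate(session_ids) if s not in test_sessions]
    test_idx = [i for i, s in enumerate(session_ids) if s in test_sessions]
    return train_idx, test_idx


def capture_rate_at_top(risk_scores, is_incorrect, top_fraction=0.2):
    """Share of all incorrect outcomes that fall in the top-risk bucket."""
    ranked = sorted(zip(risk_scores, is_incorrect),
                    key=lambda pair: pair[0], reverse=True)
    n_flagged = max(1, round(len(ranked) * top_fraction))
    flagged_incorrect = sum(label for _, label in ranked[:n_flagged])
    total_incorrect = sum(is_incorrect)
    return flagged_incorrect / total_incorrect if total_incorrect else 0.0
```

Splitting on session IDs rather than on rows is what keeps events from one session out of both sets. The capture rate is the quantity behind the "about 44% of incorrect sessions" figure, computed here on whatever scores a fitted model produces.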

Regis University | MSDS 692 | Spring 2026

MSDS 696

Wildfire Ignition Risk Prediction (Colorado)

Deep Learning, Data Analysis, Python

Wildfires are becoming a serious concern in Colorado because they affect communities, infrastructure, and the environment. Predicting the exact time and location of a wildfire ignition is very difficult, so this project focuses on ranking locations by wildfire ignition risk instead of predicting exact fire events.

This project combines wildfire ignition data from NASA’s MODIS FIRMS system with daily climate data from the gridMET dataset. The final dataset includes grid-level observations across Colorado from 2018 to 2023. Exploratory data analysis showed that wildfire events are rare, seasonal, and more closely related to accumulated dryness than to single-day weather conditions.

Several machine learning models were tested, including Logistic Regression, Random Forest, Gradient Boosting, and XGBoost. The models were trained on data from 2018 to 2022 and evaluated on unseen data from 2023. Since wildfire events are rare, the focus was placed on risk ranking rather than traditional accuracy.

The results suggest that climate-based features can help identify areas with higher wildfire ignition risk. This approach can support wildfire monitoring, planning, and decision-making.
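The year-based evaluation and the ranking focus can be sketched as follows. This is a minimal standard-library illustration, not the project's code: the `year` key on each row, the function names, and the choice of lift as the ranking metric are all assumptions.

```python
def temporal_split(rows, last_train_year=2022):
    """Train on rows up to `last_train_year`; hold out every later year."""
    train = [row for row in rows if row["year"] <= last_train_year]
    test = [row for row in rows if row["year"] > last_train_year]
    return train, test


def lift_at_top(risk_scores, ignitions, top_fraction=0.1):
    """Ignition rate in the top-risk bucket divided by the overall rate.

    A lift above 1 means the highest-ranked grid cells contain more
    ignitions than a random selection of the same size would.
    """
    ranked = sorted(zip(risk_scores, ignitions),
                    key=lambda pair: pair[0], reverse=True)
    n_flagged = max(1, round(len(ranked) * top_fraction))
    top_rate = sum(label for _, label in ranked[:n_flagged]) / n_flagged
    base_rate = sum(ignitions) / len(ignitions)
    return top_rate / base_rate if base_rate else 0.0
```

Splitting by year rather than at random keeps future observations out of training, which matches the 2018 to 2022 training and 2023 evaluation setup. A ranking metric such as lift stays informative when positives are rare, whereas accuracy can look high for a model that never predicts an ignition.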

Regis University | MSDS 696 | Spring 2026

Get In Touch

Let's Connect

Feel free to reach out for collaboration opportunities, questions about my projects, or just to connect!

Send a Message

I'm always open to discussing new projects, creative ideas, or opportunities to be part of your vision.
