Introduction
Predicting academic success is a crucial task for universities seeking to admit students who will thrive academically. To tackle this, I developed a deep learning model that forecasts first-year college GPA from SAT scores and high school GPA. But building an accurate model isn't just about architecture: optimization plays a key role in ensuring efficient and precise learning.
This project explores various optimization techniques such as SGD, Adam, RMSProp, and more, applied to a regression task, and evaluates their impact on training convergence and prediction accuracy.
🔗 GitHub Repository: Student GPA Prediction with Optimization Techniques
Problem Statement
Objective: Prodigy University aims to develop a predictive model to forecast first-year GPA based on SAT scores and high school GPA. The model should help the admissions office identify candidates likely to excel academically.
Challenges:
Predicting continuous outcomes (GPA).
Working with a low-dimensional dataset of only two features, which limits the information available to the model.
Evaluating the impact of different optimization algorithms on model performance.
Data Overview
The dataset consists of:
Inputs: SAT scores and high school GPA.
Target: First-year college GPA.
The data was preprocessed to handle missing values, scale features, and standardize inputs for effective model training.
Neural Network Architecture
A fully connected neural network (Multi-Layer Perceptron) was implemented:
Input Layer: Two input features (SAT and GPA).
Hidden Layer: A single layer of two neurons with ReLU activation.
Output Layer: Single neuron for GPA prediction with linear activation.
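A minimal PyTorch sketch of this architecture (the class and variable names are illustrative, not necessarily those used in the repository):

```python
import torch
import torch.nn as nn

class GPAPredictor(nn.Module):
    """Tiny MLP: 2 inputs -> 2 hidden units (ReLU) -> 1 linear output."""
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(2, 2)   # SAT score + high school GPA
        self.output = nn.Linear(2, 1)   # predicted first-year GPA

    def forward(self, x):
        x = torch.relu(self.hidden(x))
        return self.output(x)           # linear activation on the output
```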
Optimization Techniques Compared
To optimize the neural network, I experimented with the following algorithms:
| Optimizer | Key Feature |
| --- | --- |
| Stochastic Gradient Descent (SGD) | Updates weights using one data point at a time. |
| Batch Gradient Descent | Computes the gradient over the entire dataset in one pass. |
| Mini-Batch Gradient Descent | Combines the advantages of SGD and batch methods. |
| Momentum | Accelerates convergence by accumulating past gradients. |
| Nesterov Momentum | Looks ahead to the anticipated next position for faster learning. |
| AdaGrad | Adapts per-parameter learning rates based on accumulated squared gradients. |
| RMSProp | Emphasizes recent gradients via an exponentially decaying average of squared gradients. |
| Adam | Combines momentum with RMSProp-style adaptive learning rates for robust learning. |
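In PyTorch, momentum and Nesterov momentum are options on `torch.optim.SGD` rather than separate classes, while the batch, mini-batch, and stochastic variants differ only in how the data is fed to the optimizer. A sketch of how the remaining optimizers can be instantiated (the learning rates shown are illustrative, not the tuned values from the experiments):

```python
import torch

def make_optimizers(model):
    # In the experiments, each optimizer trains its own freshly initialized model;
    # this mapping just shows how each one is constructed.
    return {
        "SGD":               torch.optim.SGD(model.parameters(), lr=0.01),
        "Momentum":          torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9),
        "Nesterov Momentum": torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True),
        "AdaGrad":           torch.optim.Adagrad(model.parameters(), lr=0.01),
        "RMSProp":           torch.optim.RMSprop(model.parameters(), lr=0.001),
        "Adam":              torch.optim.Adam(model.parameters(), lr=0.001),
    }
```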
Steps in the Project
1. Data Preprocessing
Standardized Features: Ensured all features are scaled for consistent gradient updates.
Feature Engineering: Applied techniques to refine SAT and GPA inputs for better model interpretation.
Insights:
Standardization helped accelerate convergence across all optimizers, highlighting its importance in numerical stability.
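A sketch of this preprocessing step using pandas and scikit-learn (the file name and column names are hypothetical stand-ins for whatever the actual dataset uses):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical file and column names, for illustration only
df = pd.read_csv("student_records.csv").dropna()   # drop rows with missing values
X = df[["sat_score", "hs_gpa"]].values             # inputs: SAT score, high school GPA
y = df["first_year_gpa"].values                    # target: first-year college GPA

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```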
2. Model Implementation
Built a simple MLP architecture using PyTorch with two hidden neurons, ReLU activation, and a linear output layer.
Insights:
The simplicity of the model allowed me to focus on how optimizers influenced convergence and final performance.
3. Optimization Techniques and Training
Each optimizer was applied individually, and the model was trained for 100 epochs. Loss curves were recorded for comparison.
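A sketch of the per-optimizer training loop (full-batch updates for brevity; it reuses `GPAPredictor` and the standardized arrays from the sketches above, and only two optimizers are listed to keep it short):

```python
import torch
import torch.nn as nn

X_t = torch.tensor(X_train, dtype=torch.float32)
y_t = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)

criterion = nn.MSELoss()
loss_history = {}

for name, build in [
    ("Adam",    lambda p: torch.optim.Adam(p, lr=0.001)),
    ("RMSProp", lambda p: torch.optim.RMSprop(p, lr=0.001)),
    # ... remaining optimizers from the table below
]:
    model = GPAPredictor()                    # fresh model for each optimizer
    optimizer = build(model.parameters())
    losses = []
    for epoch in range(100):
        optimizer.zero_grad()
        loss = criterion(model(X_t), y_t)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    loss_history[name] = losses               # recorded for the loss-curve comparison
```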
| Optimizer | MSE (Lower is Better) | Training Stability |
| --- | --- | --- |
| SGD | 0.32 | High Variance |
| Batch GD | 0.29 | Stable |
| Mini-Batch GD | 0.28 | Balanced |
| Momentum | 0.25 | Faster Convergence |
| Nesterov Momentum | 0.24 | Faster and Smoother |
| AdaGrad | 0.27 | Limited Generalization |
| RMSProp | 0.22 | Best Convergence Rate |
| Adam | 0.23 | Most Efficient |
4. Evaluation and Visualization
Loss Curves: RMSProp showed the most consistent and fastest loss reduction.
Predicted vs Actual GPA: Visualized actual vs predicted GPA to assess generalization.
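A sketch of both plots with matplotlib (it assumes `loss_history`, a trained `model`, and the test arrays from the earlier sketches):

```python
import matplotlib.pyplot as plt
import torch

# Loss curves: one line per optimizer
for name, losses in loss_history.items():
    plt.plot(losses, label=name)
plt.xlabel("Epoch")
plt.ylabel("MSE loss")
plt.legend()
plt.show()

# Predicted vs. actual first-year GPA on the held-out test set
with torch.no_grad():
    preds = model(torch.tensor(X_test, dtype=torch.float32)).squeeze(1).numpy()
plt.scatter(y_test, preds, alpha=0.6)
plt.xlabel("Actual GPA")
plt.ylabel("Predicted GPA")
plt.title("Predicted vs. Actual GPA")
plt.show()
```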
Insights:
RMSProp and Adam achieved the best balance of speed and accuracy.
SGD struggled with stability due to variance in updates, while Momentum-based optimizers excelled in efficiency.
Conclusion
Through this comparative study, I observed that optimizers like RMSProp and Adam are well-suited for regression tasks due to their adaptive learning capabilities. Momentum-based techniques like Nesterov also offer excellent convergence rates, particularly for smaller datasets.
Takeaways:
Selecting the right optimizer significantly impacts training efficiency and model performance.
Combining standardized inputs with adaptive optimizers helps produce robust results.
Future Work
Extend the study to larger and more complex datasets.
Experiment with deep architectures for more nuanced predictions.
Explore hybrid optimizers combining best features from existing techniques.
Call to Action
Interested in understanding how different optimization techniques impact deep learning models? Check out the complete implementation on my GitHub repository:
🔗 GitHub Repository: Student GPA Prediction with Optimization Techniques
Let’s connect and discuss more about the exciting world of AI and deep learning!
Tags
#DeepLearning #MachineLearning #OptimizationTechniques #AI #Regression #PyTorch #LearningJourney