Optimizing Neural Networks: A Comparative Study of Optimization Techniques

Introduction

Predicting academic success is a crucial task for universities seeking to admit students who will thrive academically. To tackle this, I developed a deep learning model that forecasts first-year college GPA from SAT scores and high school GPA. But building an accurate model isn't just about architecture: optimization plays a key role in ensuring efficient and precise learning.

This project applies a range of optimization techniques, including SGD, Adam, and RMSProp, to this regression task and evaluates their impact on training convergence and prediction accuracy.

🔗 GitHub Repository: Student GPA Prediction with Optimization Techniques


Problem Statement

Objective: Prodigy University aims to develop a predictive model to forecast first-year GPA based on SAT scores and high school GPA. The model should help the admissions office identify candidates likely to excel academically.

Challenges:

  • Predicting continuous outcomes (GPA).

  • Learning from a low-dimensional feature set (only two predictors), which limits the information available to the model.

  • Evaluating the impact of different optimization algorithms on model performance.


Data Overview

The dataset consists of:

  • Inputs: SAT scores and high school GPA.

  • Target: First-year college GPA.

The data was preprocessed to handle missing values, scale features, and standardize inputs for effective model training.


Neural Network Architecture

A fully connected neural network (Multi-Layer Perceptron) was implemented:

  • Input Layer: Two input features (SAT and GPA).

  • Hidden Layer: A single hidden layer with two neurons and ReLU activation.

  • Output Layer: Single neuron for GPA prediction with linear activation.


Optimization Techniques Compared

To optimize the neural network, I experimented with the following algorithms:

Optimizer                         | Key Feature
Stochastic Gradient Descent (SGD) | Updates weights using one data point at a time.
Batch Gradient Descent            | Processes the entire dataset in one go.
Mini-Batch Gradient Descent       | Combines advantages of SGD and batch methods.
Momentum                          | Accelerates convergence by considering past gradients.
Nesterov Momentum                 | Anticipates the next step for faster learning.
AdaGrad                           | Adapts the learning rate per parameter based on accumulated past gradients.
RMSProp                           | Focuses on recent gradients for efficient updates.
Adam                              | Combines momentum and RMSProp for robust learning.
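To make the comparison concrete, here is a minimal sketch of how these optimizers can be instantiated in PyTorch. The function name and learning rates are my own illustrative choices, not necessarily the values used in the repository:

```python
import torch
import torch.nn as nn

# Illustrative learning rates; the project's actual hyperparameters may differ.
def build_optimizers(model: nn.Module) -> dict:
    return {
        "SGD": torch.optim.SGD(model.parameters(), lr=0.01),
        "Momentum": torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9),
        "Nesterov": torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True),
        "AdaGrad": torch.optim.Adagrad(model.parameters(), lr=0.01),
        "RMSProp": torch.optim.RMSprop(model.parameters(), lr=0.001),
        "Adam": torch.optim.Adam(model.parameters(), lr=0.001),
    }
```

Note that stochastic, mini-batch, and full-batch gradient descent all use torch.optim.SGD; what distinguishes them is how much data is fed per update step (a single sample, a small batch, or the entire dataset).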

Steps in the Project

1. Data Preprocessing

  • Standardized Features: Ensured all features are scaled for consistent gradient updates.

  • Feature Engineering: Refined the SAT and GPA inputs so the model could learn from them more effectively.

Insights:
Standardization helped accelerate convergence across all optimizers, highlighting its importance in numerical stability.
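As a rough sketch of this step, standardization can be done with scikit-learn's StandardScaler. The values below are hypothetical placeholders rather than the project's actual data:

```python
import numpy as np
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical placeholder data: each row is [SAT score, high school GPA],
# and the target is first-year college GPA.
X = np.array([[1200, 3.5], [1350, 3.8], [1100, 3.1], [1450, 3.9], [1280, 3.4]])
y = np.array([3.2, 3.6, 2.9, 3.8, 3.3])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, then reuse it for the test split.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert to float32 tensors for PyTorch, with targets shaped (n_samples, 1).
X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.float32).reshape(-1, 1)
X_test_t = torch.tensor(X_test, dtype=torch.float32)
y_test_t = torch.tensor(y_test, dtype=torch.float32).reshape(-1, 1)
```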


2. Model Implementation

Built a simple MLP architecture using PyTorch with two hidden neurons, ReLU activation, and a linear output layer.
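A minimal PyTorch sketch of that architecture could look like the following; the class name is my own, and the repository's implementation may differ in detail:

```python
import torch.nn as nn

class GPARegressor(nn.Module):
    """Tiny MLP: 2 inputs -> one hidden layer of 2 ReLU units -> 1 linear output."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, 2),   # inputs: SAT score and high school GPA
            nn.ReLU(),
            nn.Linear(2, 1),   # output: predicted first-year GPA (linear activation)
        )

    def forward(self, x):
        return self.net(x)

model = GPARegressor()
```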

Insights:
The simplicity of the model allowed me to focus on how optimizers influenced convergence and final performance.


3. Optimization Techniques and Training

Each optimizer was applied individually, and the model was trained for 100 epochs. Loss curves were recorded for comparison.
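A rough sketch of that per-optimizer training loop is shown below. It uses full-batch updates for simplicity (the stochastic and mini-batch variants would instead iterate over a DataLoader with the appropriate batch size) and reuses the hypothetical names from the earlier snippets:

```python
import torch.nn as nn

def train(model, optimizer, X, y, epochs=100):
    """Train with MSE loss and record the loss curve for later comparison."""
    criterion = nn.MSELoss()
    losses = []
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses

# Train a freshly initialized model with each optimizer and keep its loss curve.
loss_curves = {}
for name in ["SGD", "Momentum", "Nesterov", "AdaGrad", "RMSProp", "Adam"]:
    model = GPARegressor()                     # from the model sketch above
    optimizer = build_optimizers(model)[name]  # from the optimizer sketch above
    loss_curves[name] = train(model, optimizer, X_train_t, y_train_t)
```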

Optimizer         | MSE (Lower is Better) | Training Stability
SGD               | 0.32                  | High Variance
Batch GD          | 0.29                  | Stable
Mini-Batch GD     | 0.28                  | Balanced
Momentum          | 0.25                  | Faster Convergence
Nesterov Momentum | 0.24                  | Faster and Smoother
AdaGrad           | 0.27                  | Limited Generalization
RMSProp           | 0.22                  | Best Convergence Rate
Adam              | 0.23                  | Most Efficient

4. Evaluation and Visualization

  • Loss Curves: RMSProp showed the most consistent and fastest loss reduction.

  • Predicted vs Actual GPA: Visualized actual vs predicted GPA to assess generalization.
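As a rough illustration of these two plots, a minimal matplotlib sketch (reusing the hypothetical names from the earlier snippets) might look like this:

```python
import matplotlib.pyplot as plt
import torch

# Loss curves recorded during training (loss_curves comes from the training sketch above).
for name, losses in loss_curves.items():
    plt.plot(losses, label=name)
plt.xlabel("Epoch")
plt.ylabel("MSE loss")
plt.title("Training loss by optimizer")
plt.legend()
plt.show()

# Predicted vs. actual first-year GPA on the held-out split.
model.eval()
with torch.no_grad():
    preds = model(X_test_t).squeeze(1).numpy()
plt.scatter(y_test_t.squeeze(1).numpy(), preds)
plt.xlabel("Actual GPA")
plt.ylabel("Predicted GPA")
plt.title("Predicted vs. actual GPA")
plt.show()
```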

Insights:

  • RMSProp and Adam achieved the best balance of speed and accuracy.

  • SGD struggled with stability due to variance in updates, while Momentum-based optimizers excelled in efficiency.


Conclusion

Through this comparative study, I observed that optimizers like RMSProp and Adam are well-suited for regression tasks due to their adaptive learning capabilities. Momentum-based techniques like Nesterov also offer excellent convergence rates, particularly for smaller datasets.

Takeaways:

  • Selecting the right optimizer significantly impacts training efficiency and model performance.

  • Combining standardized inputs with advanced optimizers ensures robust results.


Future Work

  • Extend the study to larger and more complex datasets.

  • Experiment with deep architectures for more nuanced predictions.

  • Explore hybrid optimizers combining best features from existing techniques.


Call to Action

Interested in understanding how different optimization techniques impact deep learning models? Check out the complete implementation on my GitHub repository:

🔗 GitHub Repository: Student GPA Prediction with Optimization Techniques

Let’s connect and discuss more about the exciting world of AI and deep learning!


Tags

#DeepLearning #MachineLearning #OptimizationTechniques #AI #Regression #PyTorch #LearningJourney