Ellamae O'Reilly · 4 days ago · 3 minutes read


Early Student Performance Prediction Using GNN-Transformer-InceptionNet: A Multi-label, Multidimensional Deep Learning Framework

Introduction

Educational data mining (EDM) is crucial for analyzing patterns in educational datasets, identifying factors that contribute to student success, and improving learning outcomes. Predicting student performance is a critical task in EDM, and various methods have been proposed, including decision trees, random forests, and deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs).

However, existing models often face challenges in capturing complex relationships and interactions within multi-label educational datasets. Multi-label datasets involve students with multiple performance categories, such as high, medium, or low performance in various academic areas. To address these challenges, we propose GNN-Transformer-InceptionNet (GNN-TINet), a novel deep learning model that combines graph neural networks (GNNs), transformers, and the Inception architecture.

GNNs excel at representing and learning from graph-structured data, making them suitable for capturing the relational aspects of educational data. Transformers are known for their ability to model long-range dependencies and handle sequential information, such as student performance over time. The Inception architecture allows for multi-scale feature extraction, capturing diverse patterns within the data.
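The relational aggregation that GNNs perform can be illustrated with a single graph-convolution step. This is a minimal NumPy sketch, not the paper's actual layer: the adjacency matrix, feature matrix, and weight matrix are toy placeholders, and the symmetric normalization follows the standard GCN formulation.

```python
import numpy as np

def gcn_layer(adj, feats, weights):
    """One graph-convolution step: aggregate each node's neighbor
    features, then apply a learned linear transform and ReLU."""
    # Add self-loops so each node keeps its own features.
    a_hat = adj + np.eye(adj.shape[0])
    # Symmetric degree normalization: D^{-1/2} A_hat D^{-1/2}.
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
    # Message passing followed by a ReLU nonlinearity.
    return np.maximum(a_norm @ feats @ weights, 0.0)

# Toy example: 3 students connected in a chain, 2 features each.
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
feats = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
weights = np.eye(2)  # identity transform, for illustration only
out = gcn_layer(adj, feats, weights)
print(out.shape)  # (3, 2)
```

Each output row mixes a student's own features with those of connected students, which is how a GNN can propagate signals such as peer influence through the cohort graph.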

Proposed Methodology

The proposed methodology consists of several key modules:

  • Data Preprocessing: Handles missing values, encodes categorical variables, normalizes numerical features, and identifies outliers, using novel techniques such as Contextual Adaptive Imputation (for missing values) and Dynamic Range Scaling (for normalization).
  • Hierarchical Contextual Feature Scoring (HCFS): Selects relevant features by considering their mutual information, contextual importance, and redundancy, resulting in an optimal feature set.
  • Feature Engineering: Adds new features, such as Academic Consistency Score, Study Efficiency Ratio, and Peer Influence Index, to enhance the model's ability to capture relationships between data points.
  • Data Balancing: Uses Cluster-Based Class Expansion (CBCE) to address class imbalance by generating synthetic instances for underrepresented classes, preserving the inherent data distribution and diversity.
  • Classification with GNN-Transformer-InceptionNet Network (GNN-TINet): Combines GNNs, transformers, and Inception architecture to classify complex educational data. GNNs model the relational structure, transformers capture long-range dependencies, and Inception enables multi-scale feature extraction.
  • Performance Evaluation: The model is evaluated with a comprehensive set of metrics, including accuracy, precision, recall, and F1-score, plus two novel metrics: the Learning Impact Factor (LIF), which assesses the model's effectiveness in predicting long-term changes in student performance, and the Predictive Consistency Score (PCS), which measures its ability to make consistent predictions across repeated evaluations.
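The CBCE balancing step described above can be sketched as cluster-then-interpolate oversampling. This is an illustrative reconstruction, not the paper's algorithm: the tiny k-means, the interpolation rule, and all parameters are assumptions chosen to show the idea of expanding a minority class while respecting its local structure.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(x, k, iters=10):
    """Tiny k-means for illustration only: returns cluster labels."""
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def cbce_oversample(x_min, n_new, k=2):
    """Cluster-Based Class Expansion sketch: cluster the minority
    class, then synthesize points by interpolating between random
    pairs from the same cluster, preserving local distribution."""
    labels = kmeans(x_min, k)
    synth = []
    for _ in range(n_new):
        j = rng.integers(k)
        members = x_min[labels == j]
        if len(members) < 2:
            members = x_min  # fall back to the whole class
        a, b = members[rng.choice(len(members), size=2, replace=False)]
        t = rng.random()
        synth.append(a + t * (b - a))
    return np.array(synth)

# Toy minority class: 10 students, 3 numeric features each.
x_min = rng.normal(size=(10, 3))
new = cbce_oversample(x_min, n_new=5)
print(new.shape)  # (5, 3)
```

Because synthetic points are interpolated within a cluster rather than drawn globally, they stay close to real minority examples, which is the stated goal of preserving the inherent data distribution and diversity.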
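The PCS metric is paper-specific and its exact formula is not given here, but a plausible reading of "consistent predictions across evaluations" is the fraction of samples that receive the same label in every run. The following pure-Python sketch implements that assumed definition; the function name and the formula are hypothetical.

```python
def predictive_consistency_score(runs):
    """Hypothetical PCS: fraction of samples that receive the same
    predicted label in every evaluation run.  `runs` is a list of
    equal-length prediction lists, one list per run."""
    n = len(runs[0])
    consistent = sum(
        1 for i in range(n) if len({run[i] for run in runs}) == 1
    )
    return consistent / n

# Three evaluation runs over the same four students.
runs = [
    ["high", "low", "medium", "low"],   # run 1
    ["high", "low", "medium", "high"],  # run 2
    ["high", "low", "low",    "high"],  # run 3
]
print(predictive_consistency_score(runs))  # 0.5
```

Here the first two students are classified identically in all three runs, so PCS is 2/4 = 0.5; a model that flips labels between evaluations would score lower even if its per-run accuracy were unchanged.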

Simulation Results and Discussion

Extensive simulations demonstrate the effectiveness of the GNN-TINet model. It achieves an accuracy of 98.5%, outperforming existing methods. Analysis of student performance patterns reveals relationships between GPA, homework completion, parental involvement, and other factors. The GNN-TINet model identifies students at risk of underperformance and excels in predicting long-term performance changes.

Conclusion and Future Work

GNN-TINet advances EDM by providing a robust and accurate framework for early student performance prediction. It effectively handles multi-label educational datasets and captures complex relationships and interactions within the data. By integrating GNNs, transformers, and InceptionNet, the model achieves superior performance and can be used to personalize interventions and improve learning outcomes.

Future work may focus on extending the model to handle additional educational data sources, such as behavioral and affective data, and exploring its applicability in different educational contexts. By leveraging advanced deep learning techniques, we can further enhance the accuracy and generalizability of student performance prediction models and contribute to the advancement of educational data mining.