Fine-Tuning of Distil-BERT for Continual Learning in Text Classification: An Experimental Analysis
Continual learning (CL) with bidirectional encoder representations from transformers (BERT) and its variant Distil-BERT has shown remarkable performance in various natural language processing (NLP) tasks, such as text classification (TC). However, model-degrading factors like catastrophic forgetting (CF), accuracy, task-dependent architecture ru