Klasifikasi Kategori Cerita Pendek Menggunakan XGBoost dengan Seleksi Fitur Chi-Square (Short Story Category Classification Using XGBoost with Chi-Square Feature Selection)

M. Faisal Afandi; Ngurah Agus Sanjaya ER; Putu Gede Hendra Suputra; Luh Arida Ayu Rahning Putri

Authors

M. Faisal Afandi, Universitas Udayana
Ngurah Agus Sanjaya ER, Universitas Udayana
Putu Gede Hendra Suputra, Universitas Udayana
Luh Arida Ayu Rahning Putri, Universitas Udayana

Keywords:

text classification, short stories, xgboost, random forest, chi-square, ensemble learning

Abstract

Text classification is one of the major challenges in natural language processing, particularly when texts must be categorized by genre. This study develops a system that classifies Indonesian short stories into three genres: romance, horror, and religion. Two ensemble-based machine learning algorithms, XGBoost and Random Forest, are used in the experiments. Before model training, the short story data undergo text preprocessing, and features are extracted with TF-IDF. Chi-Square feature selection is then applied to improve feature relevance. The models are trained with various hyperparameter combinations and validated using 5-fold cross-validation. Experimental results show that Chi-Square feature selection improves model accuracy. Final evaluation is performed on test data using the best hyperparameter configuration. XGBoost achieves the best performance with an F1-score of 89%, while Random Forest achieves an F1-score of 86%. These results indicate that XGBoost generalizes better to unseen data, despite using fewer trees than Random Forest.
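
To make the Chi-Square selection step concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state which Chi-Square variant or how many features were kept, so this sketch assumes the textbook 2x2 contingency formulation on binary term presence (not TF-IDF weights). The toy documents, the helper names (`chi_square`, `score_terms`, `select_top_k`), and the cutoff `k = 6` are all invented for illustration.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # term or class is degenerate; no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def score_terms(docs, labels):
    """Score each term by its maximum chi-square value over all classes."""
    tokenized = [set(doc.lower().split()) for doc in docs]
    vocab = set().union(*tokenized)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls in set(labels):
            pairs = list(zip(tokenized, labels))
            a = sum(1 for t, y in pairs if term in t and y == cls)
            b = sum(1 for t, y in pairs if term in t and y != cls)
            c = sum(1 for t, y in pairs if term not in t and y == cls)
            d = sum(1 for t, y in pairs if term not in t and y != cls)
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus, invented for illustration only (two stories per genre).
docs = [
    "cinta hati rindu malam",
    "cinta rindu hati",
    "hantu malam gelap",
    "hantu gelap takut",
    "doa iman masjid",
    "doa iman malam",
]
labels = ["romance", "romance", "horror", "horror", "religion", "religion"]

scores = score_terms(docs, labels)
top_terms = select_top_k(scores, k=6)  # k is an arbitrary choice here
```

In this toy corpus, "malam" appears once in every genre, so it scores zero and is dropped, while terms confined to a single genre score highest. In practice the scoring would more likely come from a library routine applied to the TF-IDF matrix (scikit-learn provides `chi2` and `SelectKBest`, for example), with the selected features then passed to the classifiers.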
