Income Prediction Using Machine Learning
DOI: https://doi.org/10.22105/scfa.v1i3.46
Keywords: Income prediction, Machine learning, Data preprocessing, KMeans clustering, Random Forest, Predictive analytics, Model evaluation
Abstract
This study applies machine learning techniques to forecast personal income levels from demographic and employment information. It improves predictive precision by grouping individuals with similar traits using KMeans clustering and by applying algorithms such as Random Forest and XGBoost. Key data preprocessing steps, such as handling missing values and encoding categorical variables, were crucial to model performance. Of all the models assessed, Random Forest achieved the highest accuracy.
This research highlights the importance of income prediction in areas such as finance, policymaking, and marketing, where data-driven insights support targeted decision-making. The study demonstrates that machine learning can deliver accurate income predictions, enabling well-informed decisions across a range of industries.
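The workflow summarized in the abstract (impute missing values, encode categorical variables, group similar individuals with KMeans, then classify with a Random Forest scored by accuracy) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: it assumes scikit-learn and pandas, runs on a small synthetic table whose column names (`age`, `hours_per_week`, `education`, `occupation`), labeling rule, and parameter values are all hypothetical, and uses a hypothetical helper, `KMeansClusterFeature`, that appends each row's cluster id as an extra feature, which is only one plausible way to feed the clusters to the classifier. XGBoost is left out to keep the dependencies to scikit-learn; `xgboost.XGBClassifier` could be dropped in as the final step instead.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["age", "hours_per_week"]        # hypothetical feature names
CATEGORICAL = ["education", "occupation"]  # hypothetical feature names


def make_synthetic_income_data(n=600, seed=0):
    """Synthetic stand-in for a census-style table; NOT the paper's data."""
    rng = np.random.default_rng(seed)
    age = rng.integers(18, 70, n).astype(float)
    hours = rng.integers(10, 60, n).astype(float)
    education = rng.choice(["HS", "Bachelors", "Masters"], n)
    occupation = rng.choice(["Sales", "Tech", "Service"], n)
    # Hypothetical labeling rule: older, longer hours, more education -> higher income.
    edu_level = pd.Series(education).map({"HS": 0, "Bachelors": 1, "Masters": 2}).to_numpy()
    score = 0.04 * (age - 40) + 0.05 * (hours - 35) + 0.8 * edu_level + rng.normal(0, 1, n)
    y = (score > 1.0).astype(int)  # 1 = high-income class
    X = pd.DataFrame({"age": age, "hours_per_week": hours,
                      "education": education, "occupation": occupation})
    # Blank out roughly 5% of each column so the imputers have gaps to fill.
    for col in X.columns:
        X.loc[rng.random(n) < 0.05, col] = np.nan
    return X, y


class KMeansClusterFeature(TransformerMixin, BaseEstimator):
    """Hypothetical helper: append each row's KMeans cluster id as a new column."""

    def __init__(self, n_clusters=4, random_state=0):
        self.n_clusters = n_clusters
        self.random_state = random_state

    def fit(self, X, y=None):
        self.kmeans_ = KMeans(n_clusters=self.n_clusters, n_init=10,
                              random_state=self.random_state).fit(X)
        return self

    def transform(self, X):
        cluster_id = self.kmeans_.predict(X).reshape(-1, 1)
        return np.hstack([X, cluster_id])


def build_model(n_clusters=4):
    preprocess = ColumnTransformer(
        [
            # Numeric: fill gaps with the median, then scale so KMeans distances are balanced.
            ("num", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()), NUMERIC),
            # Categorical: fill gaps with the most frequent value, then one-hot encode.
            ("cat", make_pipeline(SimpleImputer(strategy="most_frequent"),
                                  OneHotEncoder(handle_unknown="ignore")), CATEGORICAL),
        ],
        sparse_threshold=0.0,  # force a dense array for KMeans and np.hstack
    )
    return Pipeline([
        ("preprocess", preprocess),
        ("cluster", KMeansClusterFeature(n_clusters=n_clusters, random_state=0)),
        ("forest", RandomForestClassifier(n_estimators=200, random_state=0)),
    ])


def evaluate(seed=0):
    """Fit on a training split and return accuracy on the held-out split."""
    X, y = make_synthetic_income_data(seed=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=seed, stratify=y)
    model = build_model().fit(X_tr, y_tr)
    return accuracy_score(y_te, model.predict(X_te))


if __name__ == "__main__":
    print(f"Held-out accuracy on synthetic data: {evaluate():.3f}")
```

Because the imputers, scaler, encoder, and KMeans all sit inside one `Pipeline`, they are fitted only on the training split, so the held-out accuracy is not inflated by statistics leaking in from the test rows.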