The operating data of lithium-ion batteries can vary significantly with battery type, production process, and usage conditions. These differences make it difficult to predict the state of charge (SOC) accurately: a model may show low training accuracy, or high training accuracy but low prediction accuracy. To understand how data differences affect training results, it is necessary to study how the distribution diversity of large-scale data influences the generalization of SOC prediction models. This paper therefore studies 32 operational data sets from real lithium-ion batteries. To meet the demands of advanced battery management technology, random forest (RF) is combined with the multi-input multi-output (MIMO) strategy to predict multi-step SOC, and a separate prediction model is built for each of the 32 data sets. The performance of RF is evaluated, and the effect of data set properties on the multi-step SOC prediction model is analyzed. The results indicate that, for large-scale lithium-ion battery data and apart from a small portion of the data, the RF-MIMO model reaches a training R² of approximately 0.95 or higher when predicting future SOC with a time step of 180 intervals. When each model is used to predict the other data sets, the median R² remains about 0.9. Model training reaches high accuracy when the data set covers a wide SOC range, its kernel density curve tends to be left-skewed, and its distribution is relatively uniform.
Keywords: lithium-ion battery, large-scale data, state of charge, multi-step prediction
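
As a rough illustration of the MIMO strategy named in the abstract, the sketch below turns a single SOC series into training pairs in which each input window maps to a vector of future values, so that one model can output all prediction steps at once. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the function name `make_mimo_windows`, the parameters `n_lags` and `horizon`, and the toy SOC values are all hypothetical, and the paper's actual input features are not specified here.

```python
from typing import List, Sequence, Tuple


def make_mimo_windows(
    soc: Sequence[float], n_lags: int, horizon: int
) -> Tuple[List[List[float]], List[List[float]]]:
    """Build (X, Y) pairs for MIMO multi-step forecasting.

    Each row of X holds `n_lags` consecutive SOC values; the matching row
    of Y holds the next `horizon` values, so a single multi-output model
    predicts every future step in one call.
    """
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be positive")
    X: List[List[float]] = []
    Y: List[List[float]] = []
    # Last index at which a full input window plus a full target fits.
    last_start = len(soc) - n_lags - horizon
    for start in range(last_start + 1):
        split = start + n_lags
        X.append(list(soc[start:split]))
        Y.append(list(soc[split:split + horizon]))
    return X, Y


# Toy SOC trace (fractions of full charge); hypothetical values.
soc_trace = [1.00, 0.98, 0.95, 0.93, 0.90, 0.88, 0.85]
X, Y = make_mimo_windows(soc_trace, n_lags=3, horizon=2)
```

With a horizon of 180, each row of `Y` would hold 180 future SOC values. The resulting `X` and `Y` could then be passed to any regressor that accepts two-dimensional targets; scikit-learn's `RandomForestRegressor` is one such option, though whether the paper used that library is an assumption.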