Analysis and Prediction of the China Securities Bank Index Based on the ARIMA Model

Abstract: Banking is crucial to a country's financial system. The performance of banks also indicates, to a certain extent, investor confidence in the financial market and the overall state of the country's economy. Because the China Securities Bank Index (CSBI) reflects the overall performance of Chinese banking stocks, forecasting its price movements can indicate how the banking industry is likely to move and can help investors make investment decisions. This paper therefore selects the CSBI closing price from January 2nd, 2019 to August 11th, 2023 and builds an ARIMA (0,1,0) model to make predictions based on these data. The results show that the ARIMA model forecasts well in the short term, but its timeliness is relatively poor in the long term. Finally, on the basis of these analyses, the paper summarizes the short-term development trend of the industry, lists the reasons for the inaccuracy of the long-term forecast from different aspects and provides suggestions for investors.


Introduction
Bank stocks play a crucial role in a country's financial system. As providers of financial services, banks see their earnings and performance directly affected by changes in the country's economy. The China Securities Bank Index (CSBI) is made up of the banking stocks among the constituents of the CSI Broad Market Index, weighted by average daily turnover and average daily total market capitalization. It reflects the overall performance of stocks in the banking industry. Forecasting its future price can therefore indicate the direction of China's economy and short-term policies, and it provides valuable insights for investors' decision-making. In terms of method, the Autoregressive Integrated Moving Average (ARIMA) model is generally appropriate for the analysis of time series and has been widely applied to forecasting stock prices, particularly over short horizons.
In the past, changes in different financial market indexes were a popular research topic both at home and abroad. Scholars often conducted time series analyses to provide data references for future investment decisions. Some scholars selected specific stocks for predictive analysis and confirmed the practicality of the ARIMA model for short-term prediction. Liu et al. established an ARIMA model in Python to predict the closing price of Southwest Securities stock from historical data, and confirmed the practicality of the model [1]. Huang established an ARIMA model in R to analyze and predict Ping An's stock price, and also obtained satisfactory results [2]. Khan used the ARIMA model to predict the Maruti Suzuki stock price, which also confirmed that the model performs better in short-term prediction [3]. Shakir et al. used three different custom ARIMA models to analyze five years of Netflix's historical stock price data and predict the future trend, and identified the most accurate of the three [4]. Some scholars also carried out predictive analysis of bank stock prices and related indicators. Kong predicted the up-and-down trend of ICBC's stock price on the next trading day by constructing a variety of models and algorithms. After comparison, it was concluded that the ARIMA model was suitable for price prediction and machine learning was suitable for judging the accuracy of predictions [5]. Yin introduced the ARIMA (2,1,2) model and the Box-Jenkins method to analyze and predict the P/E ratio of BCM stock, and concluded that the model had high prediction accuracy [6]. However, some of the above studies did not present data to support their evaluation of long-term predictions; they only gave simple descriptions. Weng constructed the ARIMA (6,1,6) model to analyze and predict the stock price of CCB, and judged the data fit through a variety of indicators. On the basis of the data analysis, it was also concluded that the long-term prediction error was large [7]. Many scholars also conducted predictive analysis of other stock market indexes. Li used the ARIMA model to predict the Shanghai Stock Exchange Index (SSE Index). Based on the experimental results, the model was found to have good short-term prediction ability, but the timeliness of its long-term prediction was not good enough [8]. Zou constructed the ARIMA (3,1,3) model to conduct an empirical analysis of the closing price of the SSE Index and came to a similar conclusion [9]. Wang et al. established the ARIMA (0,1,6) model to predict the annual trend of the SSE 50 Index in 2017. They concluded that the index showed a slight recovery and remained volatile, and that the model's error gradually increased as the forecast horizon lengthened [10]. Li combined the ARIMA (3,1,3) model with an artificial neural network model to predict the CSI 300 Index and obtained a prediction curve with high accuracy [11].
In short, a growing number of studies analyze and forecast composite indexes and individual stocks in the stock market. However, less attention is paid to indicators that reflect the development of the entire banking industry. At the same time, most scholars emphasize the applicability of the ARIMA model for short-term prediction, while the reasons for the limitations of long-term prediction are insufficiently analyzed. This paper addresses these two gaps using the ARIMA model. It analyzes and predicts the CSBI's price over the next 15 trading days based on historical data, and summarizes the implied changes in China's banking industry and, more broadly, the national economy. Moreover, the paper compares the predicted and actual data to assess both short-term and long-term accuracy, and then summarizes the reasons why the prediction is not accurate enough in the long term.

Data Source
The price of the CSBI fluctuates throughout each trading session, so the daily closing price is a more meaningful research object for predicting the overall fluctuation of the index than the opening, highest or lowest price. The paper extracts the closing price data for 1120 trading days from January 2, 2019 to August 11, 2023. The data are gathered from Orient Wealth Online.

Variable Selection
This paper analyzes and predicts the change of the CSBI closing price over time, so the variables are the trading day and the CSBI closing price. The trading day may also be expressed as a calendar date. Table 1 shows the variable attributes.

Research Protocol
The research uses Excel to outline the selected data and SPSS Statistics 27.0 to build an ARIMA model for analysis. The research process follows a fixed sequence of steps. First, the paper carries out an initial stationarity test, in which the sequence diagram and the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots of the historical closing price data are examined. If the data are non-stationary, they are differenced and the stationarity test is carried out again. Then, according to the ACF and PACF plots, the model parameters p (autoregressive order), d (the number of differences required to make the original time series stationary) and q (moving average order) are determined to build the optimal model. After that, the applicability of the model is tested by performing a white noise test on the residual sequence of the model. Finally, the future value of the index is predicted and analyzed using the model that passes the test.

Results and Discussion

Stationary Test
First, the original time series is plotted in Figure 1. It can be preliminarily determined that the series is non-stationary.
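As an illustration of this kind of check, the following minimal Python sketch computes the sample ACF of a series before and after first-order differencing and compares it with the approximate 95% bound of 1.96/sqrt(n). It is not the SPSS procedure used in the paper. The simulated random walk, its starting level and step size, and the function names `difference`, `sample_acf` and `confidence_bound` are all assumptions introduced here for illustration; they are not the CSBI data.

```python
import math
import random


def difference(series):
    """Return the first-order difference y[t] - y[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def sample_acf(series, max_lag):
    """Sample autocorrelation coefficients for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def confidence_bound(series):
    """Approximate 95% bound for the ACF of a white noise series."""
    return 1.96 / math.sqrt(len(series))


# Hypothetical stand-in for the closing prices: a simulated random walk.
random.seed(0)
prices = [6000.0]
for _ in range(1119):
    prices.append(prices[-1] + random.gauss(0.0, 70.0))

diffs = difference(prices)
acf_level = sample_acf(prices, 16)
acf_diff = sample_acf(diffs, 16)

print("lag-1 ACF of the level series:      ", round(acf_level[0], 3))
print("lag-1 ACF of the differenced series:", round(acf_diff[0], 3))
print("differenced lags outside the bound: ",
      sum(abs(r) > confidence_bound(diffs) for r in acf_diff))
```

For a series of this kind, the ACF of the levels decays slowly from a value near 1, while the ACF of the differences stays close to zero, which is the pattern the stationarity test looks for.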

Model Recognition
In this case, the original data should be differenced to an appropriate order to form a stationary time series. The paper therefore applies first-order differencing to the original sequence and then examines the stationarity of the new one. Figure 4 shows the result, which indicates that the differenced data tend to be stable. The ACF and PACF plots of the first-order differenced sequence in Figures 5 and 6 show that the coefficients are distributed within the confidence interval. This indicates that a stationary sequence suitable for the ARIMA model is obtained after first-order differencing. The model is therefore ARIMA (0,1,0), with p = 0, d = 1 and q = 0.
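For reference, the ARIMA (0,1,0) specification can be written out explicitly. The notation below is an assumption introduced here and does not appear in the original text: B is the backshift operator, c is an optional constant (the text does not state whether the fitted model includes one), and the errors are white noise with variance sigma squared. The variance expression ignores parameter estimation error.

```latex
\begin{aligned}
(1 - B)\, y_t &= c + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{WN}(0, \sigma^2) \\
y_t &= y_{t-1} + c + \varepsilon_t \\
\hat{y}_{T+h \mid T} &= y_T + h\,c \\
\operatorname{Var}\!\left(y_{T+h} - \hat{y}_{T+h \mid T}\right) &= h\,\sigma^2
\end{aligned}
```

Under these assumptions the model is a random walk, possibly with drift: the h-step forecast is the last observed value plus h times the constant, and the forecast error variance grows linearly with the horizon h.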

Model Testing
To ensure the applicability of the model to the CSBI closing price, the model is constructed in SPSS 27.0. Some of the relevant model parameters and features are shown in the tables below. Table 2 shows that the R-squared is 0.979, close to 1, indicating that the model explains 97.9% of the variation in the closing price. This is a preliminary indication that the model fits well.
At the same time, according to the BIC criterion, the smaller the BIC value, the better the model fits the data. The BIC value for this model is shown in Table 3. Compared to models with other parameters, the BIC value of 8.578 for this model is the smallest, which indicates that the data fit well in this case. In summary, it can be concluded that this model is the optimal one.
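The two fit measures discussed above can be sketched for a random walk with drift as follows. This is a minimal illustration, not a reproduction of the SPSS output: the closing prices are hypothetical, the function name `random_walk_fit_stats` is introduced here, and the normalized BIC is taken as ln(MSE) + k ln(n)/n, which is an assumption about the form of the criterion.

```python
import math


def random_walk_fit_stats(series):
    """Fit measures for a random walk with drift, y[t] = y[t-1] + c + e[t]."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    n = len(diffs)
    drift = sum(diffs) / n                  # estimate of the constant c
    residuals = [d - drift for d in diffs]  # one-step-ahead errors
    sse = sum(e * e for e in residuals)

    observed = series[1:]
    mean_obs = sum(observed) / n
    sst = sum((y - mean_obs) ** 2 for y in observed)
    r_squared = 1.0 - sse / sst

    k = 1                                   # one estimated parameter: c
    mse = sse / (n - k)
    # Assumed form of the normalized BIC: ln(MSE) + k * ln(n) / n.
    normalized_bic = math.log(mse) + k * math.log(n) / n
    return {
        "drift": drift,
        "r_squared": r_squared,
        "normalized_bic": normalized_bic,
    }


# Hypothetical closing prices, for illustration only.
closes = [6010.0, 6042.5, 6031.0, 6075.2, 6060.8, 6098.4, 6120.1, 6105.7]
stats = random_walk_fit_stats(closes)
print(stats)
```

With only a handful of hypothetical observations the R-squared is far lower than the 0.979 reported in Table 2; the sketch is meant to show how the measures are formed, not to match the reported values.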

Residual White-noise Test
After the final model is selected, the residual terms also need to be tested. The paper tests their characteristics in two different ways. The first is a stationarity test on the residuals. Figure 7 shows that the ACF and PACF values at each lag are within the confidence interval and show no significant trend. However, this analysis alone cannot accurately describe the distribution of the residual terms, so further evidence is needed. The Ljung-Box test results in Table 4 show that the residual terms have P values greater than 0.05. According to this result, the residual terms can be regarded as a white noise sequence, which indicates that the useful information in the series has been extracted and the model is adequate. Furthermore, the Q-Q plot in Figure 8 shows that most of the residual values lie near the straight line, indicating that the residual terms are approximately normally distributed. Therefore, combined with the stationarity test results, it is determined that the residual terms of the optimal model are a white noise sequence.
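The Ljung-Box statistic used in this test can be sketched in plain Python as below. The sketch assumes that no ARMA parameters are estimated, so the degrees of freedom equal the number of lags, and it uses the closed-form chi-square tail probability that holds for even degrees of freedom only. The choice of 18 lags, the simulated residuals and the function names `chi2_sf_even` and `ljung_box` are assumptions made here, not values taken from Table 4.

```python
import math
import random


def chi2_sf_even(x, df):
    """Chi-square upper tail probability, valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def ljung_box(residuals, lags):
    """Ljung-Box Q statistic and p-value with `lags` degrees of freedom."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    denom = sum(c * c for c in centered)
    q = 0.0
    for k in range(1, lags + 1):
        r_k = sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        q += r_k * r_k / (n - k)
    q *= n * (n + 2)
    return q, chi2_sf_even(q, lags)


# Hypothetical residuals: simulated white noise, not the model's residuals.
random.seed(1)
noise = [random.gauss(0.0, 70.0) for _ in range(1119)]
q_noise, p_noise = ljung_box(noise, 18)

# A strongly autocorrelated series for contrast: the cumulative sum of the noise.
walk = []
level = 0.0
for e in noise:
    level += e
    walk.append(level)
q_walk, p_walk = ljung_box(walk, 18)

print("white noise:  Q =", round(q_noise, 2), " p =", round(p_noise, 4))
print("random walk:  Q =", round(q_walk, 2), " p =", p_walk)
```

A p-value above 0.05 means the null hypothesis of no autocorrelation up to the chosen lag is not rejected, which is the criterion applied to the residual terms above.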

Model Prediction
The CSBI closing price is predicted using the optimal model. The results are shown in Table 5 and Figure 9. Table 5 shows that the relative errors of the forecast values for the first 5 trading days are relatively small; the absolute values of three of them are within 1%, which is fairly accurate. In the last 3 predicted trading days, the absolute relative errors become larger, with a value above 2% occurring. In addition, the average absolute relative error in this period is greater than that in the first 5 days. It can be seen that the prediction error gradually increases as the forecast horizon lengthens, while the accuracy of short-term prediction is fairly high. Figure 9 shows that the forecast trend of the CSBI price is relatively stable in the coming period: it is slightly downward but without large fluctuations.
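The comparison of predicted and actual values can be sketched as follows. The sketch assumes that the relative error is defined as (predicted - actual) / actual, which the text does not state explicitly, and it uses hypothetical prices rather than the values in Table 5. The function names `random_walk_forecast` and `relative_errors` are introduced here for illustration.

```python
def random_walk_forecast(series, horizon, include_drift=False):
    """Point forecasts from an ARIMA (0,1,0) model, with or without a constant."""
    last = series[-1]
    drift = 0.0
    if include_drift:
        # The mean of the first differences equals (last - first) / (n - 1).
        drift = (series[-1] - series[0]) / (len(series) - 1)
    return [last + h * drift for h in range(1, horizon + 1)]


def relative_errors(predicted, actual):
    """Signed relative error of each forecast, (predicted - actual) / actual."""
    return [(p - a) / a for p, a in zip(predicted, actual)]


# Hypothetical history and later realized values, for illustration only.
history = [6100.0, 6080.0, 6090.0, 6050.0]
realized = [6040.0, 6075.0, 6010.0]

forecast = random_walk_forecast(history, horizon=3)
forecast_with_drift = random_walk_forecast(history, horizon=3, include_drift=True)
errors = relative_errors(forecast, realized)
mean_abs_error = sum(abs(e) for e in errors) / len(errors)

for day, (f, a, e) in enumerate(zip(forecast, realized, errors), start=1):
    print(f"day {day}: forecast={f:.1f} actual={a:.1f} relative error={e:.4%}")
print(f"mean absolute relative error: {mean_abs_error:.4%}")
```

Without a constant, every forecast equals the last observed closing price, so any sustained movement in the actual series shows up directly as a growing relative error at longer horizons.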

Conclusion
From the generally stable trend of the CSBI price, it can be inferred that China's banking industry is operating in a stable manner. At present, the economic policy orientation promoted by the state is to maintain the stable operation of the banking industry and to meet the overall requirement of 'economic stability and development security'. At the same time, the country's attitude towards the banking industry is to seek progress while credit risk is manageable and market risk and liquidity risk are generally stable. The prediction results and the relative error analysis demonstrate that the ARIMA (0,1,0) model is a reliable prediction method in the economic field over short horizons. The comparison of the predicted and actual prices over the next 15 days shows that the model predicts well in the first 5 trading days. Nevertheless, the prediction error after that is somewhat large. This indicates that the timeliness of the model should be taken into account when using it for prediction: it is better suited to short-term forecasting.
There are numerous reasons for the larger errors in the long-term forecasts. In addition to its internal volatility, the stock market is affected by various external factors. First, the government may introduce new policies to promote economic development. For example, at the end of August 2023, the government introduced policies such as halving the stamp duty on securities transactions and tightening the pace of IPOs in stages. The implementation of these policies may bring new vitality to the market and boost the growth of China's overall economy, so CSBI prices may increase significantly. Second, changes in the social environment and the international situation are also sources of uncertainty. If COVID-19 outbreaks recur or new variants spread, the national economy may again experience sluggish development. As a major part of the national economy, the banking industry will inevitably be affected to a certain extent. Changes in the international political situation and fluctuations in the international financial market may also affect the domestic stock market. Finally, investor psychology will be affected by changes in the investor base. If more investors become more aggressive, the prices of relatively unpopular banking stocks and indexes will fall.
In summary, the steady trend of the CSBI suggests that the development of China's banking industry is relatively stable. The paper recommends the CSBI as an investment choice for conservative investors. However, investors should base future investment decisions on both model forecasts and economic facts.

Figure 1: Original time series plot of CSBI closing price.

Figure 2: Original time series ACF plot of CSBI closing price.

Figure 3: Original time series PACF plot of CSBI closing price.

Figures 2 and 3 show that the ACF plot tails off: the coefficients decrease slowly and most of them fall outside the confidence interval. The PACF plot cuts off after lag 1, and the remaining coefficients fluctuate within a small range around zero. In summary, the original time series is non-stationary.

Figure 4: Time series plot of CSBI closing price after 1st order differential processing.

Figure 5: ACF plot of CSBI closing price after 1st order differential processing.

Figure 8: Q-Q plot of residual from the final model.
Table 4 reports the Ljung-Box test results for the residual terms.