ESG Scoring System Construction: Portfolio Investment Based on Machine Learning

Abstract: The majority of existing ESG rating systems in the Chinese market are based on categorical classification ratings, and as a result of the voluntary disclosure system, rating data provided by rating organizations is occasionally absent or delayed. This article employs natural language processing (NLP) to extract keywords such as green, clean, renewable, poverty alleviation, and moral from the financial reports of CSI 300 constituent companies, and then counts their corresponding frequencies in order to construct percentage ESG ratings that address the discontinuity, imprecision, and time lag inherent in the original ratings. This article employs a self-normalized neural network (SNN) to develop a multi-factor model based on the suggested ESG ratings and then conducts sector-neutral hierarchical back-testing to compare the proposed rating to the traditional ratings. The results indicate that the model generated using the ESG ratings developed in this research yields a higher rate of return than both the model built using traditional ESG ratings and the model constructed without an ESG factor. This may be because deriving ESG ratings directly from financial statements reduces the risk of corporate falsification or whitewashing of accounts. This work adds to the body of knowledge by proposing a novel approach to constructing an ESG scoring system and incorporating it into portfolio investments to maximize returns.


Introduction
Along with the opening of the Chinese financial market, ESG has garnered increasing attention from Chinese investors as an emerging investment concept. ESG stands for Environmental (E), Social (S), and Governance (G). These three non-economic factors are used to assess a company's performance in terms of social responsibility, establishing a sustainable development pattern, and implementing a modern management style. They serve as a benchmark for investors seeking to make socially responsible investments. A scientific and objective ESG rating assists investors not only in forecasting a company's future financial performance but also in assessing the company's sustainable development status. ESG ratings are a set of indicators developed by professional rating agencies that are critical for investors making investment choices. However, in the Chinese market, ESG disclosure is not a mandatory requirement, and regulators and industry associations only encourage companies to disclose pertinent information voluntarily. As a result, the Chinese market's existing ESG ratings are characterized by an opaque evaluation process and a small number of covered companies.
At the moment, research on ESG in Chinese markets is scarce, and the majority of the literature focuses on its definition and strategy application rather than on the quantitative impact of ESG factors on the stock investment process and portfolio selection. Additionally, the indicators published by existing rating agencies, such as Huazheng and Wind, are coarse categorical grades and usually carry a time lag that is intolerable when making real-time transaction decisions. In this paper, I propose a new ESG rating system based on a numerical grading mechanism to cope with these two problems and then backtest its effect using a self-normalized neural network. With its percentage grading scale, this numerical rating system compensates for the shortcomings of overly broad traditional rating systems. Additionally, because the rating is based on ESG keywords in financial reports, it could avoid the constraints associated with companies withholding ESG information.

2. Literature Review and Hypotheses

Impact of ESG on Companies' Financial Performance
In terms of the relationship between ESG performance and revenue, Rockness [1] was the first to conduct serious research on the topic and determined that no correlation existed at the time.
Several further researchers concurred that there is a minimal association between the two: Jaggi and Freedman [2] conducted a study in the United States and concluded that the environmental and financial performance of a business is not associated in the short term; Andreas [3] studied the influence of European firms' sustainability performance on their stock price and concluded that a corporation's social performance within a specific industry has no discernible effect on stock returns. On the contrary, other research indicates that whether or not a company complies with its social obligations does have an effect on its financial success. According to several publications, strong environmental performance is a possible expense to the business and will have a detrimental effect on its income level (Walley and Whitehead [4]; Gray and Shabegian [5]). These adverse effects of ESG may result in a fall in the market price, impairing anticipated future earnings. However, other studies assert that improving environmental protection and reducing pollution benefits business financial performance (Hart and Ahuja [6]; Telle [7]). In general, positive correlations exist between corporate performance and public relations in terms of shareholder value, which represents a company's profitability and competitiveness.
H1: Companies' ESG performance and compliance have a significant impact on their stock returns.

Criteria Selection of ESG Rating Index
Existing studies mostly address ESG systems and the theoretical discussion of related disclosure, while relevant research on specific ESG evaluation systems and on Chinese enterprises' ESG disclosure and financial performance remains relatively scarce. Additionally, the majority of the available research discusses just a subset of elements such as equity structure, environmental governance spending, and so forth. The proposed rating system in this paper presents a complete set of ESG rating criteria from a novel viewpoint, based on the frequency of ESG-related terms in financial reports of publicly traded corporations.

Overview
To begin, this article employs natural language processing to extract the frequencies of keywords such as green, clean, renewable, poverty alleviation, and ethical from the financial reports of CSI 300 constituent companies and then converts them to percentage ESG ratings in order to address the discontinuity, inaccuracy, and time lag associated with traditional ESG ratings. The study then uses a self-normalized neural network to construct a multi-factor model based on the constructed ESG scores. Following that, a sector-neutral hierarchical backtest is conducted to compare the suggested ratings' efficacy to that of the standard ratings.

Data Selection
The analyses in this paper are carried out on portfolios chosen from the CSI 300, a weighted index that consists of 300 A-share stocks listed on the Shanghai or Shenzhen Stock Exchanges. The time span covered by this article is 2010/12/31 to 2020/12/31, and the sample includes 29 distinct industries, including renewable energy, communication, iron, real estate, agriculture, and others.
To establish the ESG rating, this article takes the annual financial reports of CSI 300 businesses and extracts the associated keywords from these PDF documents. It then assigns different weights to these words according to their relative importance and develops a scoring system from the keyword frequencies and their corresponding weights.
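The weighted keyword-frequency scoring described above can be sketched as follows. This is only an illustration under stated assumptions: the keyword weights, the normalization by report length, and the min-max rescaling to a 0-100 scale are hypothetical choices, since the paper's actual keyword list, weights, and rescaling rule are not given in this section. The report text is assumed to have already been extracted from the PDF files.

```python
import re

# Hypothetical weights for illustration only; not the paper's actual values.
KEYWORD_WEIGHTS = {
    "green": 1.0,
    "clean": 1.0,
    "renewable": 1.5,
    "poverty alleviation": 2.0,
    "ethical": 1.0,
}


def raw_esg_score(text, weights=KEYWORD_WEIGHTS):
    """Weighted keyword count per 1,000 words (length normalization is an assumption)."""
    lowered = text.lower()
    n_words = max(len(re.findall(r"\w+", lowered)), 1)
    weighted = 0.0
    for keyword, weight in weights.items():
        pattern = r"\b" + re.escape(keyword) + r"\b"
        weighted += weight * len(re.findall(pattern, lowered))
    return 1000.0 * weighted / n_words


def percentage_scores(raw_scores):
    """Min-max rescale raw scores across companies to a 0-100 scale (an assumption)."""
    lo, hi = min(raw_scores.values()), max(raw_scores.values())
    if hi == lo:
        return {name: 50.0 for name in raw_scores}
    return {name: 100.0 * (s - lo) / (hi - lo) for name, s in raw_scores.items()}


# Toy report snippets standing in for text extracted from annual reports.
reports = {
    "A": "green clean renewable energy and poverty alleviation projects",
    "B": "steel output rose while costs fell this year",
    "C": "clean production and green finance",
}
raw = {name: raw_esg_score(text) for name, text in reports.items()}
scores = percentage_scores(raw)
```

Because the resulting score is continuous rather than a letter grade, two companies that would share a categorical rating can still be ranked against each other.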

Constructing Multi-Factor Model
Based on the factor model built by Nalini [17], a total of 89 factors, including financial quality, leverage, momentum, volatility, investor sentiment, and technical fundamentals, are selected in this study. The aim is to integrate the various elements that may affect returns into the model and to boost the explanatory power of the multi-factor model for fluctuations in market returns.
Then, three multi-factor models were built: a) the model with only the basic factor pool, b) the model with the basic factor pool plus traditional ESG rating published by Wind Database, and c) the model with the basic factor pool and the ESG rating proposed above.

Training the Model
(1) Self-Normalizing Neural Network
Deep learning can be viewed as a multi-layered, more sophisticated form of machine learning. This paper applies the self-normalizing neural network (SNN) proposed by Klambauer [18] to calculate the optimal parameters for these models. The term "normalization" refers to the process of converting inputs to zero mean and unit variance; this is often performed as a preprocessing step. It accelerates learning and increases accuracy since it makes the values of different features comparable. An SNN keeps the distribution of activations stable across each layer, allowing practitioners to use neural-net-based techniques for non-perception tasks such as predicting stock returns in financial markets.
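To make the self-normalizing property concrete, the sketch below implements the SELU activation that SNNs are built on and pushes a random input through a few dense layers. The layer width, depth, and random weights are illustrative assumptions and have nothing to do with the paper's actual network settings; the weights are drawn with variance 1/n, the initialization that SELU is designed to work with.

```python
import math
import random

# SELU constants as defined for self-normalizing networks.
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805


def selu(x):
    """Scaled exponential linear unit."""
    if x > 0:
        return SCALE * x
    return SCALE * ALPHA * (math.exp(x) - 1.0)


def selu_layer(inputs, rng):
    """One dense layer with random weights of variance 1/n, followed by SELU."""
    n = len(inputs)
    std = 1.0 / math.sqrt(n)
    outputs = []
    for _ in range(n):
        z = sum(rng.gauss(0.0, std) * x for x in inputs)
        outputs.append(selu(z))
    return outputs


def mean_var(values):
    """Population mean and variance of a list of activations."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v


# Illustrative experiment: 200 units, 6 layers (both arbitrary choices).
rng = random.Random(0)
activations = [rng.gauss(0.0, 1.0) for _ in range(200)]
for _ in range(6):
    activations = selu_layer(activations, rng)
final_mean, final_var = mean_var(activations)
```

After several layers, `final_mean` and `final_var` should remain close to 0 and 1 respectively, without any explicit batch normalization step; this is the stability property the paragraph above refers to.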
(2) Time Series Cross-Validation
After deciding on a model and training strategy, the next step is to choose hyperparameters for training the SNN. One of the most effective approaches to evaluating and choosing hyperparameters is cross-validation. Cross-validation, sometimes referred to as out-of-sample testing, is a resampling technique used to assess machine learning models on limited training data. Traditionally, cross-validation is performed by dividing the original data set into k subsets and then using one subset as the test set and the remaining subsets as the training set. Training the model k times, each time holding out a different subset, produces k mean squared errors (MSEs). The final metric CV equals the average of these k MSEs:
$$\mathrm{CV} = \frac{1}{k}\sum_{i=1}^{k}\mathrm{MSE}_i$$
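As a rough illustration of how the CV metric can be computed on time-ordered data, the sketch below uses expanding-window splits, in which every test fold lies strictly after its training data. The split scheme and the trivial "predict the training mean" model are assumptions made for the example; the paper's exact fold construction is not specified in this section, and in practice the SNN would take the place of the baseline model.

```python
def expanding_window_splits(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each test fold follows its training data in time."""
    fold = n_samples // (k + 1)
    if fold == 0:
        raise ValueError("not enough samples for k folds")
    for i in range(1, k + 1):
        train_end = i * fold
        test_end = n_samples if i == k else train_end + fold
        yield list(range(train_end)), list(range(train_end, test_end))


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def cross_validate(y, k):
    """Return (CV, per-fold MSEs) for a baseline that predicts the training mean."""
    fold_mses = []
    for train_idx, test_idx in expanding_window_splits(len(y), k):
        train = [y[i] for i in train_idx]
        prediction = sum(train) / len(train)  # stand-in for a fitted model
        test = [y[i] for i in test_idx]
        fold_mses.append(mse(test, [prediction] * len(test)))
    return sum(fold_mses) / len(fold_mses), fold_mses


# Toy series: two folds with MSEs 4.25 and 9.25, so CV = 6.75.
cv, fold_mses = cross_validate([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], k=2)
```

Hyperparameters would then be chosen by repeating this procedure for each candidate setting and keeping the one with the lowest CV.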

Results and Discussion
This section provides an assessment of the three models. For back-testing, this paper stratified the stocks into five layers based on return forecasts for T+1. It then conducted back-testing and analyzed the models' effectiveness from eight aspects: annualized return, volatility, Sharpe ratio, maximum drawdown, excess return, tracking error, information ratio (IR), and success ratio. Specifically, the stratification involved labeling all stocks by sector, sorting them within each sector by forecasted return, and dividing each sector into five layers. Combining the corresponding layers across sectors yields five sector-neutral stratified portfolios.
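The sector-neutral stratification can be sketched as below. The tickers, sector labels, and forecasts are made up for illustration, and the rule for assigning ranks to layers when a sector's size is not a multiple of five is an assumption.

```python
from collections import defaultdict


def sector_neutral_layers(stocks, n_layers=5):
    """Split stocks into layers within each sector, then pool layers across sectors.

    stocks: iterable of (ticker, sector, predicted_return).
    Returns {layer: [tickers]}, where layer 1 holds the highest forecasts.
    """
    by_sector = defaultdict(list)
    for ticker, sector, predicted in stocks:
        by_sector[sector].append((predicted, ticker))

    layers = {i: [] for i in range(1, n_layers + 1)}
    for members in by_sector.values():
        members.sort(reverse=True)  # best forecast first
        n = len(members)
        for rank, (_, ticker) in enumerate(members):
            layer = rank * n_layers // n + 1
            layers[layer].append(ticker)
    return layers


# Hypothetical universe: two sectors with five stocks each.
universe = [
    ("B1", "bank", 0.05), ("B2", "bank", 0.04), ("B3", "bank", 0.03),
    ("B4", "bank", 0.02), ("B5", "bank", 0.01),
    ("E1", "energy", 0.10), ("E2", "energy", 0.08), ("E3", "energy", 0.06),
    ("E4", "energy", 0.04), ("E5", "energy", 0.02),
]
portfolios = sector_neutral_layers(universe)
```

Because each layer draws the same share of names from every sector, differences in performance between Portfolio 1 and Portfolio 5 reflect the model's ranking ability rather than sector bets.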

The Model with Basic Factor Pool
Figure 1 gives an overview of the cumulative returns of each of the five portfolios as well as the benchmark CSI 300 for the model built with only the basic factor pool. Table 2 compares the benchmark CSI 300 Index to the five portfolios plus an additional Long-Short Portfolio constructed by going long Portfolio 1 and short Portfolio 5. Portfolio 1 yields an annualized return of 0.0869 and has the highest Sharpe ratio, 0.01489, which is consistent with the stratification method this paper employed, since Portfolio 1 holds the stocks with the highest forecasted returns.
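For reference, several of the back-testing metrics reported in the tables can be computed from a series of periodic returns as sketched below. The number of periods per year, the zero risk-free rate, and the equal-weighted long-short construction are assumptions for illustration; the paper's exact conventions are not stated in this section.

```python
import math


def annualized_return(returns, periods_per_year=252):
    """Geometric annualized return from periodic simple returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth ** (periods_per_year / len(returns)) - 1.0


def annualized_volatility(returns, periods_per_year=252):
    """Sample standard deviation of periodic returns, scaled to one year."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return math.sqrt(var * periods_per_year)


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized excess return divided by annualized volatility."""
    vol = annualized_volatility(returns, periods_per_year)
    return (annualized_return(returns, periods_per_year) - risk_free) / vol


def max_drawdown(returns):
    """Largest peak-to-trough decline of the cumulative net value."""
    nav, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        nav *= 1.0 + r
        peak = max(peak, nav)
        worst = max(worst, 1.0 - nav / peak)
    return worst


def long_short(long_returns, short_returns):
    """Per-period return of going long one portfolio and short another."""
    return [l - s for l, s in zip(long_returns, short_returns)]


# Toy data only; these are not the paper's results.
p1 = [0.02, 0.01, -0.01, 0.03]
p5 = [0.01, -0.01, -0.02, 0.01]
ls = long_short(p1, p5)
```

The same functions can be applied to each of the five stratified portfolios and to the long-short series to fill in one row of a back-testing table.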

The Model with Wind ESG Rating
As with the previous section, Figure 2 illustrates the cumulative return for each portfolio, and Table 3 exhibits the back-testing results for the model created using features from the basic factor pool plus the Wind ESG rating, the most widely used ESG rating in the Chinese market. As these results show, all the metrics in Table 3 have increased in comparison to Model 1, suggesting that portfolios created using ESG are indeed capable of providing better outcomes. The graph also shows that the portfolio selected following the method described above can yield a return that is up to 28% higher than the benchmark CSI 300 Index.

The Model with Constructed ESG Rating
The graphs below illustrate the return and other financial indicators of the model built using the proposed ESG rating data. As demonstrated by the following results, the model trained using the proposed ESG criterion exceeds the two preceding models across all relevant metrics, including the return, Sharpe ratio, and success ratio. Among these, Portfolio 1's Sharpe ratio is 32% greater than that of Model 2 and 235.9% greater than that of Model 1, indicating that the continuous and non-defective ESG scoring criteria developed in this research outperform traditional ESG scoring criteria in the real world. This could be because the continuous ESG rating, which is calculated on a percentage basis, provides a more accurate depiction of a company's ESG performance. Additionally, deriving ESG ratings directly from financial statements reduces the risk of corporate falsification or whitewashing of accounts, which might be a significant disadvantage of the existing ESG rating system, as corporations report ESG data only voluntarily.

Conclusion
The crucial finding in this article is that constructing a new ESG scoring system using natural language processing and a self-normalized neural network can generate higher returns when conducting portfolio investing in the Chinese stock market. This paper investigates the relationship between ESG ratings and portfolio returns by extracting ESG keywords using NLP, building multi-factor models with SNN, and then conducting sector-neutral back-testing to compare the results of the three models. The empirical results show that the ESG factor generally improves the effectiveness of the model: models incorporating the ESG factor provide higher returns than the model that does not. Moreover, the ESG factor established in this work outperforms the traditional ESG factor in terms of returns. With the rating system outlined in this paper, investors will have access to more accurate and timely ESG ratings for all publicly traded companies, enabling them to make sound investment decisions.

3.3. Methodology
3.3.1. Developing ESG Rating System
Development, Swaps, Option Incentives, Long Term Equity, etc. Based on China ESG Development White Paper 2021 [16]

Table 1: Basic Settings of Self-Normalizing Neural Network.

Table 2: Back-Testing results of Model 1.

Table 3: Back-Testing results of Model 2 (With Wind ESG).