The Effect of Investor Sentiment throughout the Distribution of Returns in the Chinese Stock Market

This paper uses xueqiu.com data for the first half of 2015 from the Datago database and selects the 100 A-share stocks in the CSI100. The number of followers of each poster is considered when building the bullishness index. Quantile regression is then used to study the contemporaneous and predictive effects of sentiment on stock returns. The results suggest that sentiment has a negative contemporaneous effect on stock returns, while in the predictive model the effect is positive. Furthermore, the effects of sentiment are stronger in extreme circumstances and more substantial at lower quantiles than at higher quantiles. However, the weighted bullishness index does not explain returns as well as the original one. Overall, this paper verifies the difference in effects between extreme and normal conditions, attempts a new method of constructing a bullishness index, and offers more evidence about the relationship between social network sentiment and stock returns in the Chinese stock market.


Introduction
In recent years, the rise of social networks has provided people with a broader platform to capture valuable information and exchange ideas, and stock investors are no exception. There is a growing literature on the link between investor sentiment and stock returns. With the development of text mining, more studies extract the sentiment directly from each post and aggregate it into a sentiment indicator. Different social media platforms have been studied separately, including platforms that mainly discuss US stocks, such as Twitter and StockTwits, and platforms that mainly focus on A-shares, such as Weibo and xueqiu.com [1][2][3][4]. One apparent difference among these platforms is that StockTwits and xueqiu.com are financial forums whose content is more focused, because almost all users are investors.
However, existing studies only consider the number of posts conveying sentiment when constructing the sentiment indicator. Other factors, such as the number of followers of the poster and the number of likes, retweets and replies, can also affect the influence of a post. Within A-share related research, most studies validate the correlation between sentiment and stock returns, but whether it is positive or negative has not been confirmed [5]. Moreover, only the predictive effect has been examined, while the contemporaneous effect has rarely been mentioned.
This paper focuses on 100 representative A-share stocks to investigate the contemporaneous and predictive effects of social network sentiment on returns in the Chinese stock market. In addition, the followers of posters are used to construct a new weighted sentiment indicator. Two models, one using the original indicator and one using the weighted indicator, are compared to see whether the weighted indicator fits the data more accurately.
The study explores the exact influence of social sentiment on A-share related stock returns and can help future researchers comprehensively understand the effect of social network sentiment on stock returns.

Literature Review
Behavioral finance has always been a hot topic. It is inevitable for an investor to be influenced by others' sentiments. According to the "no-trade theorem" of Milgrom and Stokey, when a transaction is proposed, the disagreement between the buyer and the seller makes each of them question why the other is willing to trade [6]. Sometimes, these sentiments cause investors to change their behavior. In this case, they cannot be considered rational investors, because they act on noise. De Long et al. argued that the unpredictable behavior of noise traders could cause asset prices to diverge from their fundamental values [7]. More detailed studies verified the relationship between investor sentiment and stock returns in the stock market. Specifically, Antweiler and Frank stated that positive message posting has a statistically significant negative effect on returns the next day [8].
To study these topics further, the first key research point is how to measure investor sentiment. The first primary method uses market survey indicators, including the consumer confidence index [9], the investor intelligence survey, etc. [10]. The second primary method uses observable underlying proxies to build a model for the investor sentiment index. One authoritative set of composite variables comprises the closed-end fund discount, the number of IPOs and their average first-day returns, share turnover, the equity share and the dividend premium [11,12]. Both methods measure sentiment indirectly. With the development of the internet, investors have more opportunities to search for information, express emotions, and exchange opinions. Their comments on a particular stock on social media directly reflect their sentiments. Through text mining and machine learning, analysis of the specific emotions contained in the text is possible. Studies of Twitter have made progress: DJIA predictions that use mood states derived from Twitter achieve higher accuracy and lower error [1].
In China, the effect is more significant due to the imperfection of China's market trading mechanism. In particular, ordinary small and medium-sized investors in the Chinese stock market are limited by their information channels and knowledge levels [13]. Filtering the data source is vital to getting reliable results. In China, Weibo is a micro-blogging platform like Twitter, where over 100 million people actively share their feelings and thoughts every day. Xu et al. extracted time series of network emotion from Weibo and showed a causal relationship between online sentiment volatility and stock market returns [14]. However, the shortcoming of Weibo is also obvious: the active user base is large and general, so it is challenging to capture texts highly related to stocks [3]. Consequently, some studies selected financial forums whose users are mainly shareholders and whose topics are professional and closely related to stocks. Bu et al. collected all the comments and posts about the CSI 300 constituent stocks on the stock message boards of Easymoney.com [13].
Based on the analysis above, this paper uses data from xueqiu.com, one of the biggest and most professional online financial forums in China, to carry out the remaining discussion of investor sentiment and stock returns. Moreover, the literature mainly focuses on the classification of sentiment and constructs the bullishness index from the total numbers of positive, negative and neutral messages, mostly ignoring the differences in influence between messages. In fact, a post from a poster with more followers is likely to have a greater impact on investor sentiment. This paper aims to take the level of influence into account when constructing the bullishness index and to compare the accuracy of prediction between the original and the weighted index.
The second key research point is how to build a prediction model relating investor sentiment to stock returns. A traditional and common method is ordinary least squares (OLS), which confines the analysis to the conditional mean of the return distribution [8]. Quantile regression (QR) is a better choice than OLS, since OLS ignores differences across the return distribution, that is, across different market conditions (i.e., low/high return distribution quantiles) [2]. Therefore, this paper adopts QR, introduced by Koenker and Bassett, to analyze the effects of investor sentiment throughout the return distribution while considering the influence of messages [15].

Data Set
The data set used in this paper to reflect investor sentiment comes from Datago Technology Limited. Datago analyzed postings and replies to postings from xueqiu.com users with machine learning and natural language processing technology. For each sentiment (positive, neutral and negative), the data set provides not only the total number of posts but also the average number of followers of the posters. Given this information, the paper assumes that the more followers positive posters have than negative posters, the more influential positive sentiment is relative to negative sentiment on that day. The level of this influence is measured by rating the average number of followers of the posters. The sample covers the period from January 1, 2015, to June 30, 2015, and includes all stocks in the CSI100, which are the 100 most representative large-cap companies in the A-share market. These companies are frequently discussed, which avoids the error caused by too few posts. The first half of 2015 is chosen because a stock crash occurred in China's stock market in June 2015. Some companies were highly volatile in this period while the market indices were not completely off track, so the period is well suited to studying extreme circumstances.

Bullishness Index
Let $M_t^c$ denote the number of posts of type $c \in \{pos, neu, neg\}$ on day $t$. The total number of relevant posts, $M_t = M_t^{pos} + M_t^{neg}$, does not include neutral posts. According to Antweiler and Frank, the logarithmic transformation works better than the two other indices they consider, since it reduces the impact of extremely large post counts [8]. Furthermore, to take the number of followers into account, the number of positive or negative posts is weighted using follower ratings. If there is no post for a stock on a given day, both bullishness indices are 0. $B_t$ denotes the bullishness index, while $B_t^*$ denotes the weighted bullishness index.
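The formulas for the two indices are not reproduced in this section, so the following is only a minimal sketch under stated assumptions. It assumes the logarithmic form from Antweiler and Frank, $B_t = \ln\big((1 + M_t^{pos}) / (1 + M_t^{neg})\big)$, which equals 0 when there are no posts, in line with the text. The weighted variant and the names `weighted_bullishness`, `w_pos` and `w_neg` are hypothetical: the text says only that post counts are weighted by follower ratings, not how.

```python
import math


def bullishness(m_pos: float, m_neg: float) -> float:
    """Logarithmic bullishness index, assumed form: ln((1 + M_pos) / (1 + M_neg))."""
    return math.log((1.0 + m_pos) / (1.0 + m_neg))


def weighted_bullishness(m_pos: float, m_neg: float,
                         w_pos: float, w_neg: float) -> float:
    """Hypothetical weighted variant: scale each post count by a follower rating.

    w_pos and w_neg stand in for the follower ratings of positive and
    negative posters; the paper's actual weighting scheme may differ.
    """
    return bullishness(w_pos * m_pos, w_neg * m_neg)


if __name__ == "__main__":
    # A day without posts gives 0 for both indices, as stated in the text.
    print(bullishness(0, 0), weighted_bullishness(0, 0, 3.0, 1.0))
    # Equal post counts, but positive posters carry a higher follower rating.
    print(bullishness(10, 10), weighted_bullishness(10, 10, 2.0, 1.0))
```

With equal counts of positive and negative posts, the unweighted index is 0, while the hypothetical weighted index turns positive when positive posters have the higher rating.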

The Quantile Regression
Quantile regression is used in this research because OLS analysis only considers the effect of sentiment on average returns. However, sentiment may affect the entire distribution of returns, not only the mean, since the sentiment expressed in posts can vary under different conditions, especially in extreme times. Additionally, stock returns are usually not normally distributed, so quantile regression has two advantages over the OLS approach of previous studies. Firstly, QR still works in the presence of outliers and when the error term is not normally distributed. Therefore, the results obtained by QR are more robust than those obtained by OLS. Secondly, Chevapatrakul stated that no assumption on the error term is necessary, owing to the semiparametric nature of QR [16]. Based on the above, quantile regression is more appropriate for modelling stock returns. Al-Nasseri et al. summarized the benchmark model of stock returns accepted in the literature and discussed both the contemporaneous and the predictive effects of sentiment [2].
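To make the contrast with OLS concrete, the sketch below illustrates the check (pinball) loss that quantile regression minimizes, in the simplest case of a model with only an intercept. The return values are made up for illustration and are not taken from the paper's data. Minimizing the check loss over a constant recovers the sample quantile, so one extreme observation moves the fitted 0.1 or 0.9 quantile but leaves the median untouched.

```python
def check_loss(u: float, tau: float) -> float:
    """Check function rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_loss(q: float, ys: list[float], tau: float) -> float:
    """Sum of check losses of the residuals y - q."""
    return sum(check_loss(y - q, tau) for y in ys)


def best_constant(ys: list[float], tau: float) -> float:
    """Minimise the total check loss over candidates taken from the sample.

    For an intercept-only model a minimiser always exists among the
    observed values, so searching the sample is sufficient.
    """
    return min(sorted(ys), key=lambda q: total_loss(q, ys, tau))


if __name__ == "__main__":
    # Made-up daily returns (%), with one extreme value in each tail.
    returns = [-9.0, -2.0, -0.5, 0.1, 0.3, 0.8, 1.5, 2.5, 9.5]
    for tau in (0.1, 0.5, 0.9):
        print(tau, best_constant(returns, tau))
```

The full model replaces the constant with a linear function of the regressors and minimizes the same loss, which is why the estimated coefficients can differ from one quantile to another.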
As for contemporaneous effects, the conditional quantile functions of $R_{it}$ with the original bullishness index and the weighted bullishness index are specified respectively as

$$q_\tau(R_{it} \mid \Delta B_{it}, MSG_{it}, MKT_t, NWK_t) = \alpha_\tau + \gamma_\tau \Delta B_{it} + \lambda_{1\tau} MSG_{it} + \lambda_{2\tau} MKT_t + \lambda_{3\tau} NWK_t$$

$$q_\tau(R_{it} \mid \Delta B_{it}^{*}, MSG_{it}, MKT_t, NWK_t) = \alpha_\tau + \gamma_\tau \Delta B_{it}^{*} + \lambda_{1\tau} MSG_{it} + \lambda_{2\tau} MKT_t + \lambda_{3\tau} NWK_t$$

As for predictive effects, the conditional quantile functions of $R_{it}$ with the original bullishness index and the weighted bullishness index are specified respectively as

$$q_\tau(R_{it} \mid \Delta B_{i(t-1)}, MSG_{i(t-1)}, MKT_t, NWK_t) = \alpha_\tau + \gamma_\tau \Delta B_{i(t-1)} + \lambda_{1\tau} MSG_{i(t-1)} + \lambda_{2\tau} MKT_t + \lambda_{3\tau} NWK_t$$

$$q_\tau(R_{it} \mid \Delta B_{i(t-1)}^{*}, MSG_{i(t-1)}, MKT_t, NWK_t) = \alpha_\tau + \gamma_\tau \Delta B_{i(t-1)}^{*} + \lambda_{1\tau} MSG_{i(t-1)} + \lambda_{2\tau} MKT_t + \lambda_{3\tau} NWK_t$$

where $R_{it}$ is the daily return of stock $i$ on day $t$, calculated as $100 \ln(P_t / P_{t-1})$, that is, 100 times the logarithmic difference of the closing prices on day $t$ and day $t-1$; $\Delta B_{it}$ and $\Delta B_{it}^{*}$ are the changes in the original and weighted bullishness indices, reflecting the change in investor sentiment from day $t-1$ to day $t$; $MSG_{it}$ is the sum of positive, neutral and negative posts for stock $i$ on day $t$; $MKT_t$ is the return of the CSI100 index on day $t$, representing the market return; and $NWK_t$ is a dummy variable that captures the potential Monday return anomaly, equal to 1 if day $t$ is a Monday and 0 otherwise.

Table 2 summarizes the data used in the research. The data has 11882 observations for the 100 CSI100 stocks from January 1, 2015, to June 30, 2015, covering 119 trading days. (One stock, 000166.SZ, did not go public until January 26, 2015, so it is left out of the sample for January.) The mean of the stock returns is 0.112%, while the highest return is 9.631% and the lowest return is -72.485%. The returns have a relatively large range and high volatility, as shown by the standard deviation of 3.652.
The change in the original bullishness index is relatively stable, ranging from -3.807 to 4.043 with a standard deviation of 0.780. The change in the weighted bullishness index has a larger range, from -5.472 to 5.323, and a larger standard deviation of 1.189, indicating that the weighted bullishness index is more volatile.

This figure shows the relationship between the quantile and the coefficients of the bullishness index. The x-axis represents the quantiles of the return distribution ($\tau$ = 0.02, 0.04, …, 0.98), and the y-axis represents the coefficient $\gamma_\tau$ on the change in the original bullishness index (a) and in the weighted bullishness index (b). The black points are the coefficients at different quantiles, while the shaded area denotes the 95% confidence interval. The red line represents the OLS estimated coefficient, with the dotted lines representing its 95% confidence interval.
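The regression variables defined in this section can be built from raw daily series along the following lines. This is a minimal sketch with made-up prices and index values. The function names are hypothetical, and it assumes that consecutive entries are consecutive trading days for the same stock.

```python
import math
from datetime import date


def log_return(p_today: float, p_prev: float) -> float:
    """R_it = 100 * ln(P_t / P_{t-1}), the daily log return in percent."""
    return 100.0 * math.log(p_today / p_prev)


def monday_dummy(day: date) -> int:
    """NWK_t: 1 if the trading day is a Monday, 0 otherwise."""
    return 1 if day.weekday() == 0 else 0


def first_difference(series: list[float]) -> list[float]:
    """Delta B_it = B_it - B_i(t-1); the result is one element shorter."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


if __name__ == "__main__":
    days = [date(2015, 1, 5), date(2015, 1, 6), date(2015, 1, 7)]
    prices = [10.0, 11.0, 10.5]   # made-up closing prices
    b_index = [0.0, 0.5, 0.2]     # made-up bullishness index values

    returns = [log_return(p, q) for q, p in zip(prices, prices[1:])]
    delta_b = first_difference(b_index)
    dummies = [monday_dummy(d) for d in days[1:]]

    # One row per day with a defined return: (day, R_it, Delta B_it, NWK_t).
    for row in zip(days[1:], returns, delta_b, dummies):
        print(row)
```

For the predictive specification, the same rows would pair $R_{it}$ with the change in bullishness and the message count from day $t-1$ instead of day $t$.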

Contemporaneous Effects of Original Bullishness Index
Both (a) and (b) in Figure 1 show that $\gamma_\tau$ is negative at all quantiles. In both panels, $\gamma_\tau$ is close to 0 at medium quantiles, and its absolute value increases from the median toward the higher and lower quantiles. This pattern is more evident at the lower quantiles. Table 3 and Table 4 report the statistical significance of the bullishness index. For the original bullishness index, Table 3 indicates good significance at low and medium quantiles. For the weighted bullishness index, Table 4 indicates good significance only at the low quantiles.

For the predictive model, both (a) and (b) in Figure 2 show that $\gamma_\tau$ is positive at all quantiles. In both panels, $\gamma_\tau$ is close to 0 at medium quantiles, and its absolute value increases with the distance from the median. Additionally, $\gamma_\tau$ reaches its largest value at the low quantiles. Table 5 and Table 6 report the statistical significance of the bullishness index. For the original bullishness index, Table 5 implies that $\gamma_\tau$ is significant at all quantiles. For the weighted bullishness index, Table 6 indicates good significance only at low and medium quantiles.

Discussion
The results suggest that sentiment has a negative contemporaneous effect on stock returns, while in the predictive model the effect is positive. These effects, as reflected by the coefficients, are indeed larger at low or high quantiles, confirming the current view that mispricing from sentiment can be strong in extreme conditions [2,17]. However, the signs of the contemporaneous and predictive effects reported in the literature are opposite to those found here [2,8]. The results therefore provide a new insight into the relationship between sentiment and stock returns. A potential reason is the difference between the Chinese and US stock markets. In terms of statistical significance, the weighted bullishness index has the same or even a weaker significance level than the original one. This is inconsistent with the assumption that posts from posters with more followers are likely to have a greater influence. The reason for this inconsistency may be that the number of followers is a more indirect measure of influence than the number of views, which is not available in the data.
Both the absolute value of the slope and its significance level imply that sentiment has a greater impact on stock returns at lower quantiles. This suggests that in a sluggish stock market, people are more susceptible to the words of others on social media. It adds further evidence that investor sentiment has less impact on market returns in bull markets than in bear markets [18].
The paper also has the following limitations. Firstly, posts on non-trading days are not considered, and their effects cannot simply be captured by the Monday return anomaly effect (variable NWK). Secondly, the constituents of the CSI100 index are partially replaced every six months, and one replacement happened in June 2015, but the 100 stocks used in the study are kept unchanged because the scale of the change is small. Thirdly, the R-squared of the quantile regression model is not high, probably because essential variables are missing. Fitting stock returns very accurately with the model is beyond the scope of this study.

Conclusions
In this paper, the sentiment and weighted sentiment indicators for 100 stocks in CSI100 in the first half of 2015 are extracted based on the data of xueqiu.com from the Datago database. One sentiment indicator only considers the number of positive and negative posts, while the other adopts the number of followers of the posters to weight the number of positive or negative posts to discuss whether it is more accurate to take the impact of posters into account. Additionally, quantile regression models are constructed to study both the contemporaneous and predictive effects of sentiment instead of OLS.
The study finds that sentiment has a negative contemporaneous effect on stock returns but a positive predictive effect. Both effects are strong at high or low quantiles, but they are not conspicuous in normal circumstances. More specifically, the effect at low quantiles is larger than that at high quantiles. However, the assumption about the followers of the posters was not verified, since the weighted bullishness index could not explain stock returns better than the original index.
This paper provides a new approach to building the bullishness index and studies the contemporaneous effects through quantile regression in the A-share market. Neither was performed in the previous literature. Future research can further explore why the contemporaneous effects are negative while the predictive effects are positive. Data from more platforms should also be tested to confirm the relationship. More notably, two questions are worth exploring: whether there is a sentiment indicator better than the bullishness index that only contains the number of positive and negative posts, and how other relevant factors should be combined into the bullishness index.