Article

Portfolio Evaluation with the Vector Distance Based on Portfolio Composition

School of Business, Yonsei University, Seoul 03722, Republic of Korea
*
Author to whom correspondence should be addressed.
Submission received: 11 September 2022 / Revised: 4 December 2022 / Accepted: 5 December 2022 / Published: 1 January 2023

Abstract

We propose a novel portfolio evaluation method, a distance-based approach, which directly evaluates the portfolio composition rather than portfolio returns. In this approach, we consider a portfolio as an estimator for an in-sample tangency portfolio, which we define as the optimal reference portfolio. We then evaluate the portfolio by computing its vector distance to the optimal reference portfolio. In search of a suitable distance-based performance measure, we choose four representative vector distances and compare their suitability as a new portfolio performance measure. Through extensive statistical analysis, we find that the Euclidean distance is the most suitable of the four. We further verify that a portfolio with a large Euclidean distance is undesirable: it provides a low utility implied by the first four moments of portfolio returns, and it is unlikely to maintain its long-term performance. Hence, the Euclidean distance can complement the return-based performance measures by confirming the reliability of a portfolio in its investment performance.
MSC:
62P05; 90C30; 91G10; 91G70; 91G80

1. Introduction

The portfolio selection theory [1] has been an important foundation of modern portfolio theory. Despite its theoretical contribution, researchers have pointed out its limited practical impact since its input parameters, the mean and the covariance of returns, have to be estimated from historical data. Sample estimates obtained from such historical data usually contain significant estimation errors, leading to unsatisfactory investment performance [2,3].
To improve Markowitz’s portfolio model, abundant research has attempted to develop more sophisticated models. Related literature includes imposing an additional constraint on portfolio weights [4,5,6], incorporating robust optimization techniques [7,8], and developing better estimates of the input parameters, such as shrinkage estimators [9,10]. Other efforts include integrating the mean-variance model with various active management strategies, such as time-varying investment targets [11] or dynamic self-rebalancing [12].
At the same time, researchers have developed various performance measures to evaluate these portfolio models. Most literature focuses on return-based performance measures such as the Sharpe ratio [13] that characterize the expected return-risk tradeoff of a portfolio. Another example is the Treynor ratio [14], which measures how much excess return was generated for each unit of systematic risk taken on by a portfolio. Recently, Refs [15,16] investigated higher moments of portfolio returns, such as skewness and kurtosis, as a new return-based performance measure.
While the return-based performance measure is intuitive, it is often insufficient for more sophisticated portfolio evaluation. Indeed, researchers have reported empirical results showing that, in terms of the Sharpe ratio, many portfolio models are statistically indistinguishable from a benchmark model such as the 1/N portfolio, which assigns an equal weight of 1/N to each of N risky assets [17,18]. Hence, in a practical situation with a choice of many portfolio models, more effort should be devoted to developing an additional dimension of portfolio analysis to allow for more fine-grained evaluation.
To this end, we break away from the traditional return-based portfolio analysis and propose to evaluate portfolio composition. While several researchers have focused on portfolio composition as an object of interest for estimation, it has rarely been used for portfolio evaluation. An idea of evaluating the portfolio composition appeared at best indirectly in [19], who used the 2-norm distance between a portfolio obtained from historical returns and an optimal reference portfolio with no estimation risk to measure the estimation error. Extending this idea, we develop a novel framework for directly evaluating the portfolio composition. We consider a portfolio as an estimator for an in-sample tangency portfolio, which we define as the optimal reference portfolio. We then evaluate the portfolio by computing its vector distance to the optimal reference portfolio. We refer to this new evaluation as a distance-based performance measure.
In searching for the proper distance-based performance measure, we aim to explore the following research questions. First, out of many vector distances, such as norm-based or inner-product-based distances, which is the most suitable for portfolio evaluation? Second, can the distance-based performance measure represent investor preference implied by the traditional utility theory? Finally, what complementary information can the distance-based performance measure provide as an additional dimension of portfolio analysis?
By addressing these research questions, we make the following contributions to the literature. First, out of four representative types of vector distances (Euclidean, Manhattan, Cosine, and Pearson), we confirm that the Euclidean distance is the most suitable for portfolio evaluation. We randomly generate 30,000 portfolios and compute the four representative vector distances from each of them to the in-sample tangency portfolio. Through extensive statistical analysis, we find that the Euclidean distance is the most suitable distance-based performance measure of the four, since it evaluates portfolios in the most unbiased and sophisticated manner and shows the most consistent relationship with the traditional return-based performance measures.
Our second contribution is to justify the Euclidean distance further by examining its relationship with an investor’s utility preference and reliability of investment performance. The traditional utility theory states that an investor should prefer a portfolio with a high mean, a low standard deviation, a high skewness, and a low kurtosis of portfolio returns [20]. Our numerical results suggest that the Euclidean distance shows a consistent relationship with these four moments of portfolio returns. That is, investors should prefer a portfolio with a shorter Euclidean distance since it gives a higher utility implied by portfolio returns. Finally, we find that a portfolio with a large Euclidean distance is unreliable since its mean return and risk are highly scattered around the average. Hence, we conclude that the Euclidean distance can complement the return-based performance measures by confirming the reliability of a portfolio in its investment performance.
From a practical perspective, we believe that our proposed distance-based framework can provide investors with a more sophisticated tool for portfolio evaluation. Furthermore, our findings suggest that decreasing the Euclidean distance can be a convincing direction for improving out-of-sample performance. Indeed, a subsequent work of this paper has developed a new algorithm that incorporates this idea and shows significantly improved investment performance [21] over various benchmarks, including the 1/N portfolio.
The remainder of this paper is organized as follows. Section 2 formally introduces the distance-based framework for portfolio evaluation. Section 3 lays out the design of numerical experiments. Section 4 investigates the suitability of four representative vector distances as a portfolio performance measure and confirms that the Euclidean distance is the most proper one. Section 5 further justifies the Euclidean distance as a meaningful performance measure by analyzing its relationship with an investor’s return-based utility and the reliability of investment performance. Finally, Section 6 provides conclusions and directions for future research.

2. Distance-Based Performance Measures for Portfolio Analysis

Consider a typical investment for a period $[0, T]$ consisting of $n$ risky assets. At the initial time $0$, we choose a portfolio to hold for the investment period. Since we do not have full knowledge of future returns, we can only use historical returns to choose a portfolio. We denote this estimated portfolio as $w = (w_1, \ldots, w_n)^\top \in \mathbb{R}^n$.
After the investment period, that is, at time $T$, we can observe the true (in-sample) returns realized for $[0, T]$ and evaluate the performance of $w$. A common approach to doing so is a return-based approach where we measure its out-of-sample return. However, we propose a distance-based approach to directly evaluate the portfolio composition since it is a vector of decision variables that fundamentally affects the out-of-sample return of the portfolio. In this approach, we consider $w$ as an estimator for an optimal reference portfolio $w^*$. We then evaluate $w$ by measuring $d(w, w^*)$, where $d$ is a measure of similarity between $w$ and $w^*$. We will generally refer to $d$ as a distance function. A small value of $d(w, w^*)$ indicates that $w$ is compositionally similar to $w^*$ and, therefore, more desirable in the distance-based approach. We refer to $d(w, w^*)$ as a distance-based performance measure.
From the mean-variance perspective, the most intuitive choice for the optimal reference portfolio is an in-sample tangency portfolio (hereafter, in-sample TP). The in-sample TP is a portfolio obtained in an ex-post manner that achieves the highest Sharpe ratio for the investment period. Formally, let $\mu^*$ and $\Sigma^*$ denote the in-sample mean vector and the covariance matrix of returns, respectively, obtained from the in-sample excess returns (over a risk-free return) for $[0, T]$. The in-sample TP $w^* = (w_1^*, \ldots, w_n^*)^\top \in \mathbb{R}^n$ is defined as follows:
$$
w^* = \arg\max_{w \in \mathbb{R}^n} \frac{w^\top \mu^*}{\left(w^\top \Sigma^* w\right)^{1/2}} \quad \text{s.t.} \quad w^\top 1_N = 1, \quad w \geq 0_N, \tag{1}
$$
where $0_N$ and $1_N$ denote vectors of zeros and ones, respectively. That is, the in-sample TP is the optimal portfolio that an investor should have targeted if the investor had fully known the returns that would be realized for the future investment period. Note that $w^*$ and, therefore, $d(w, w^*)$ can be computed only in an ex-post manner (i.e., at $T$). Hence, the timeline for portfolio evaluation is the same as in the return-based approach. The optimal solution of the tangency portfolio model can be found with readily available software such as R and Matlab [22].
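To make the definition in (1) concrete, the following is a minimal sketch of a long-only maximum-Sharpe search for a toy three-asset case. It is not the authors' procedure (the paper uses an optimization solver): it simply enumerates weights on a grid over the simplex, which is only practical for very small $n$. The inputs `mu_star` and `cov_star` are hypothetical numbers chosen for illustration, and the function names are assumptions.

```python
import itertools
import math


def portfolio_sharpe(w, mu, cov):
    """Sharpe ratio w'mu / sqrt(w' Sigma w) for excess-return inputs."""
    n = len(w)
    mean = sum(w[i] * mu[i] for i in range(n))
    var = sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))
    return mean / math.sqrt(var)


def grid_tangency(mu, cov, steps=50):
    """Brute-force search over long-only weights on a grid of width 1/steps.

    Illustrative only: the number of grid points grows quickly with the
    number of assets, so a proper solver is needed for realistic n.
    """
    n = len(mu)
    best_w, best_sr = None, -math.inf
    # Enumerate k_1 + ... + k_n = steps with k_i >= 0, then scale by 1/steps.
    for head in itertools.product(range(steps + 1), repeat=n - 1):
        if sum(head) > steps:
            continue  # last weight would be negative
        w = [k / steps for k in head] + [(steps - sum(head)) / steps]
        sr = portfolio_sharpe(w, mu, cov)
        if sr > best_sr:
            best_w, best_sr = w, sr
    return best_w, best_sr


# Hypothetical monthly excess-return moments (not taken from the paper).
mu_star = [0.006, 0.009, 0.004]
cov_star = [
    [0.0020, 0.0008, 0.0004],
    [0.0008, 0.0045, 0.0006],
    [0.0004, 0.0006, 0.0012],
]

w_star, sr_star = grid_tangency(mu_star, cov_star)
print("grid-search TP weights:", w_star)
print("Sharpe ratio:", round(sr_star, 4))
```

The grid resolution limits the accuracy of the weights to `1/steps`, so the result should be read as an approximation of the solution of (1).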
The in-sample TP has the following desirable properties as the optimal reference portfolio. First, since a high Sharpe ratio indicates a suitable risk-return tradeoff, the literature has used TP as a superior portfolio [23,24]. Second, the composition of the in-sample TP is uniquely determined on the efficient frontier [25,26]. Lastly, unlike the mean-variance portfolio, the tangency portfolio is independent of an individual investor’s attitude toward risk, such as minimum acceptable return [27].
There are many types of distance function $d$ that quantify the compositional similarity between $w$ and $w^*$. Generally, vector distances in $\mathbb{R}^n$ can be categorized as either norm or inner product distances. In search of a suitable distance-based performance measure, we investigate four representative vector distances, as shown in Table 1. We first choose two norm distances: Manhattan and Euclidean distances. The Manhattan distance is the sum of the absolute compositional differences between $w$ and $w^*$, while the Euclidean distance is the straight-line distance between the two portfolio vectors. We also investigate two inner product distances: Cosine and Pearson distances. The Cosine distance measures the angle between $w$ and $w^*$, while the Pearson distance evaluates the correlation between the two vectors. Note that we have slightly adjusted the definition of the two inner product distances so that, like the norm distances, a small value indicates that the two portfolios are compositionally similar.
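As an illustration of the four distances, a minimal sketch follows. The exact definitions in Table 1 are not reproduced here, so the adjusted inner product distances are assumed to be one minus the cosine similarity and one minus the Pearson correlation, which is one common way to make a small value mean "similar". The example weights are hypothetical.

```python
import math


def manhattan(w, w_ref):
    """Sum of absolute weight differences (1-norm)."""
    return sum(abs(a - b) for a, b in zip(w, w_ref))


def euclidean(w, w_ref):
    """Straight-line distance between the weight vectors (2-norm)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w, w_ref)))


def cosine_distance(w, w_ref):
    """Assumed form: 1 - cosine similarity, so 0 means same direction."""
    dot = sum(a * b for a, b in zip(w, w_ref))
    norms = math.sqrt(sum(a * a for a in w)) * math.sqrt(sum(b * b for b in w_ref))
    return 1.0 - dot / norms


def pearson_distance(w, w_ref):
    """Assumed form: 1 - Pearson correlation of the two weight vectors.

    Undefined when either vector has identical entries (e.g. the 1/N
    portfolio), because the correlation denominator is then zero.
    """
    n = len(w)
    mean_w, mean_ref = sum(w) / n, sum(w_ref) / n
    cov = sum((a - mean_w) * (b - mean_ref) for a, b in zip(w, w_ref))
    sd_w = math.sqrt(sum((a - mean_w) ** 2 for a in w))
    sd_ref = math.sqrt(sum((b - mean_ref) ** 2 for b in w_ref))
    return 1.0 - cov / (sd_w * sd_ref)


# Hypothetical three-asset portfolios.
w = [0.5, 0.3, 0.2]
w_ref = [0.2, 0.3, 0.5]

for name, fn in [("Manhattan", manhattan), ("Euclidean", euclidean),
                 ("Cosine", cosine_distance), ("Pearson", pearson_distance)]:
    print(name, round(fn(w, w_ref), 6))
```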
Figure 1 visualizes the difference between the norm and the inner product methods for measuring the compositional difference between the two portfolios. The shorter the norm distance d between the two portfolios, the more similar the two portfolios are. In addition, the smaller the inner product distance θ between the two portfolios, the more similar the directions of the two portfolios are.
As stated in the first research question, our goal is to investigate which vector distance would be the most suitable for portfolio evaluation. To this end, in the following section, we design numerical experiments to examine the different implications that each vector distance has in evaluating portfolios.

3. Experiment Design

We first generate 30,000 random portfolios for evaluation, denoted as $w_R = (w_{R1}, \ldots, w_{Rn})^\top \in \mathbb{R}^n$, using the Random Portfolio Weights Generator from Rho-Works’ website (http://www.rhoworks.com/randweights.php, accessed on 18 July 2021). We thereafter construct the in-sample TP $w^*$ using monthly excess returns of 17 industry portfolios (see Appendix A for a detailed description) from Kenneth French’s database (http://mba.tuck.dartmouth.edu/pages/faculty/ken.french, accessed on 18 July 2021) from October 1992 to September 2017. Kenneth French’s database is widely used for portfolio research and is created using all stock information provided by the Center for Research in Security Prices (CRSP) in the United States. The 17 industry portfolios are composed of Food, Mines, Oil, and 14 other industries. The total investment period is 25 years (300 months). For each random portfolio, we compute the four representative vector distances in Table 1 and hence obtain 30,000 observations for each vector distance. We standardize each vector distance as shown in Table 2 to unify their scales. Furthermore, we compute the mean return, the risk, and the Sharpe ratio for each random portfolio to investigate their relationship with each vector distance.
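The pipeline described above can be sketched as follows. This is not the Rho-Works generator, whose algorithm is not described here: the sketch assumes long-only weights drawn uniformly from the simplex (normalized exponential draws). The standardization in Table 2 is likewise assumed to be a z-score, which is consistent with all standard deviations being adjusted to 1. The reference portfolio `w_ref` is a random placeholder standing in for the in-sample TP.

```python
import math
import random
import statistics


def random_simplex_weights(n, rng):
    """Long-only weights summing to 1, uniform on the simplex (assumption)."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def euclidean(w, w_ref):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w, w_ref)))


def standardize(values):
    """Z-score: subtract the mean and divide by the standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


rng = random.Random(0)
n_assets = 17            # 17 industry portfolios
n_portfolios = 30_000    # number of random portfolios

w_ref = random_simplex_weights(n_assets, rng)  # placeholder for the in-sample TP
portfolios = [random_simplex_weights(n_assets, rng) for _ in range(n_portfolios)]
distances = [euclidean(w, w_ref) for w in portfolios]
z = standardize(distances)

print("observations:", len(z))
print("standardized range:", round(max(z) - min(z), 3))
```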
Figure 2 outlines the experiments. In Section 4, we compare the suitability of each vector distance as a performance measure. We focus on two necessary conditions of a proper performance measure. First, it should not show significant bias and skewness when evaluating randomly generated portfolios. Second, it should suggest a consistent relationship with the traditional return-based performance measures. Section 4.1 examines the first condition by examining the distribution of each vector distance. Section 4.2 investigates the second condition by analyzing how the mean portfolio return, the risk, the Sharpe ratio, and their proportions of outperformance to the 1/N portfolio change as each vector distance increases. Based on the numerical results in Section 4, out of the four representative vector distances, we argue that the Euclidean distance is the most suitable performance measure for portfolio analysis.
In Section 5, we further justify the Euclidean distance as an additional dimension for portfolio analysis. For a portfolio with a smaller Euclidean distance to be preferred by practical investors caring about portfolio returns, it should provide a higher return-based utility. Section 5.1 investigates this by analyzing how the Euclidean distance is associated with the preference implied by the traditional return-based utility theory. In Section 5.2, we discuss how the Euclidean distance can complement the return-based performance measures. By analyzing how the random portfolios grouped by the Euclidean distance are located on the risk-return plane, we provide an insight into the relationship between the Euclidean distance and the reliability of investment performance. All experimental procedures, such as optimization, distance calculation, and statistical analyses, are conducted using MATLAB R2017a.

4. Comparing the Suitability of Vector Distances as a Performance Measure

4.1. Analysis of the Distribution of Each Vector Distance

Figure 3a shows scatter plots of each vector distance against the Sharpe ratio. We first observe that all of the vector distances show a strong negative correlation with the Sharpe ratio. This result is also supported by Table 3, which lists Pearson’s correlation coefficients between each vector distance and the Sharpe ratio. This negative relationship is indeed expected since we use the in-sample TP as the optimal reference portfolio.
However, the negative relationship with the Sharpe ratio is not enough for a vector distance to be a suitable performance measure. Since the random portfolios are generated without sampling bias, a vector distance that reveals strong skewness may not be suitable for portfolio evaluation. Figure 3b shows how the 30,000 random portfolios are distributed when each distance is applied. It is visually clear that the Euclidean distance has a smooth and symmetric distribution without strong skewness. However, the other three distances show strong negative skewness. If the evaluation results of randomly selected portfolios are negatively skewed, it is difficult to distinguish which portfolios are good or bad since a considerable proportion of the portfolios are far from the in-sample TP.
This visual implication is supported by the descriptive statistics of each vector distance in Table 4. The standardized Euclidean distance has a skewness coefficient close to 0, the lowest kurtosis coefficient, and the largest range. Since all standard deviations are adjusted to 1, the fact that the Euclidean distance has the largest range implies that it can distinguish portfolio compositions with the highest level of sophistication. A distribution with a skewness coefficient between −0.5 and 0.5 is classified as symmetric [28]. Moreover, the lower the kurtosis of the distribution is, the less peaked and the smoother the shape of the distribution is. Considering that the standard normal distribution has a kurtosis coefficient of 3, the standardized distribution of the Euclidean distance is the smoothest and the most symmetric, and it approximates the normal distribution most closely. Hence, we conclude that the Euclidean distance is the most suitable measure for evaluating portfolio composition.
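The descriptive statistics used in this comparison can be computed as in the sketch below. It assumes moment-based (population) estimators and the non-excess kurtosis convention, under which the normal distribution has a kurtosis of 3; the estimators behind Table 4 may differ in small-sample corrections. The sample data are hypothetical.

```python
import statistics


def central_moment(values, k):
    """k-th central moment, dividing by the number of observations."""
    mean = statistics.fmean(values)
    return statistics.fmean([(v - mean) ** k for v in values])


def skewness(values):
    """Third standardized moment; 0 for a symmetric sample."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Fourth standardized moment (non-excess; normal distribution = 3)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


def is_symmetric(values, bound=0.5):
    """Rule of thumb from the text: skewness within [-0.5, 0.5]."""
    return abs(skewness(values)) <= bound


balanced = [1.0, 2.0, 3.0, 4.0, 5.0]    # symmetric toy sample
lopsided = [1.0, 1.0, 1.0, 1.0, 10.0]   # one large value in the right tail

for name, sample in [("balanced", balanced), ("lopsided", lopsided)]:
    print(name, round(skewness(sample), 4), round(kurtosis(sample), 4),
          is_symmetric(sample))
```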

4.2. Relationship between Each Vector Distance and the Traditional Return-Based Performance Measures

In this subsection, we identify which vector distance shows the most consistent relationship with the traditional performance measures. Intuitively, for a vector distance to be suitable as an evaluation method, it should suggest a consistent tendency where portfolios with longer distances, on average, have lower Sharpe ratios. Hence, we aim to examine how the Sharpe ratio changes as each vector distance increases. Furthermore, since changes in the Sharpe ratio are accompanied by changes in the mean portfolio return and/or the risk, we also investigate how these two performance measures change as each vector distance increases.
The meaning of the consistent relationship is defined as follows: if there is an instance where a longer distance implies a statistically significant increase in the Sharpe ratio, the distance method may not be aligned with the existing portfolio evaluation. In this sense, such vector distance does not show a consistent relationship with the Sharpe ratio. For the same reason, we expect that the mean return (the risk) decreases (increases) as a vector distance increases if it shows a consistent relationship.
We categorize the 30,000 random portfolios for evaluation into 100 groups based on the percentile from each distance. We thereafter examine how their average investment performance varies due to the changes in the percentile group. For example, the first percentile group in the Euclidean distance contains 300 portfolios with the shortest Euclidean distance among the 30,000 random portfolios, and the 100th percentile group in the Manhattan distance contains the 300 portfolios with the longest Manhattan distance. We analyze how the average (annualized) Sharpe ratio, the mean return, the risk, and their proportions of outperformance to the 1/N portfolio change with the percentile group for each vector distance. The proportion of outperformance compared to the 1/N portfolio is defined as the probability that a portfolio performs better than the 1/N portfolio. For example, the proportion of outperformance of the Sharpe ratio to the 1/N portfolio in the first percentile group is the proportion of portfolios with a higher Sharpe ratio than that of the 1/N portfolio among the 300 portfolios with the shortest distance. We include this proportion of outperformance since outperformance over the 1/N portfolio has been considered an objective criterion to judge whether investment performance is good or bad [12,17].
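The grouping and the proportion of outperformance can be sketched as follows. The data here are synthetic (a Sharpe ratio that falls with distance plus noise) and the benchmark value `benchmark_sharpe` is a hypothetical stand-in for the Sharpe ratio of the 1/N portfolio. The helper assumes the number of portfolios is divisible by the number of groups, as with 30,000 portfolios and 100 groups.

```python
import random


def percentile_groups(distances, n_groups=100):
    """Split portfolio indices into equal-sized groups by ascending distance.

    groups[0] holds the shortest distances, groups[-1] the longest.
    """
    order = sorted(range(len(distances)), key=distances.__getitem__)
    size = len(order) // n_groups
    return [order[g * size:(g + 1) * size] for g in range(n_groups)]


def outperformance_proportion(values, benchmark, higher_is_better=True):
    """Share of portfolios that beat the benchmark on one measure."""
    if higher_is_better:
        wins = sum(v > benchmark for v in values)
    else:  # e.g. risk, where a lower value is better
        wins = sum(v < benchmark for v in values)
    return wins / len(values)


rng = random.Random(1)
distances = [rng.random() for _ in range(30_000)]
# Synthetic Sharpe ratios that decline with distance (illustration only).
sharpe = [1.0 - d + rng.gauss(0.0, 0.1) for d in distances]

groups = percentile_groups(distances)
benchmark_sharpe = 0.5  # hypothetical 1/N Sharpe ratio

first = outperformance_proportion([sharpe[i] for i in groups[0]], benchmark_sharpe)
last = outperformance_proportion([sharpe[i] for i in groups[-1]], benchmark_sharpe)
print("groups:", len(groups), "size:", len(groups[0]))
print("proportion of outperformance, first vs last group:", first, last)
```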
We first examine the visual implications of each vector distance with Figure 4 and Figure 5. Figure 4 shows the changes in the average investment performance by the distance level. Figure 4a–c represent the Sharpe ratio, the mean portfolio return, and the risk, respectively. The central line of each graph represents the average performance of all portfolios in each percentile group. The lower (upper) line represents the average of portfolios in the lower (upper) 10% in each percentile group. The horizontal axis represents percentile groups for each distance. Figure 5 shows the corresponding changes in the proportions of outperformance to the 1/N portfolio by distance level.
From Figure 4 and Figure 5, we observe that the Euclidean distance shows the most consistent relationship with the three return-based performance measures. The Euclidean distance generally shows monotone relationships where portfolios with longer Euclidean distances, on average, have lower Sharpe ratios, lower mean returns, and higher risks. However, the other vector distances do not show a systematic decrease (increase) of the Sharpe ratio and the mean portfolio return (the risk) as each vector distance increases. Figure 5 shows qualitatively similar results to Figure 4 in that only the Euclidean distance satisfies the condition that the portfolios with longer distances, on average, have lower proportions of outperformance to the 1/N portfolio.
For example, for the Manhattan distance in Figure 4, the average Sharpe ratio of the 70th percentile group is higher than that of the 60th percentile group. This means that the average Sharpe ratio can even go up with the increasing Manhattan distance. One can also observe that, in the graphs on Manhattan, Cosine, and Pearson distances, the first and the ninth decile lines for all the performance measures exhibit irregular trend movements. For example, when portfolios are classified with the Pearson distance, the difference between the first and the ninth decile of the risk decreases from the 60th percentile in Figure 4, which means that a portfolio with a longer Pearson distance can be, on average, more stable than a portfolio with a shorter Pearson distance.
To rigorously confirm these visual implications, we conduct multiple independent t-tests investigating whether the differences in the three return-based performance measures between the $i$th and $j$th percentile groups ($i < j$ and $i, j = 1, 2, \ldots, 100$) are statistically significant. Multiple independent t-tests involve conducting an independent t-test between each pair of groups among the entire set. Although conducting a t-test multiple times is known to inflate the type I error rate, this can be adjusted by various methods, such as the Bonferroni correction [29]. The significance level adjusted by the Bonferroni correction is defined as $\alpha' = \alpha / m$ when t-tests are performed $m$ times at a target significance level $\alpha$. This adjustment is frequently used in medical and health studies [30]. We conduct pairwise comparisons of the investment performance for 100 percentile groups at the target significance level $\alpha = 0.1$. Therefore, a total of $m = \frac{100!}{2!\,(100-2)!} = 4950$ t-tests are conducted at the corrected significance level $\alpha' = 0.1/4950$.
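The counting behind the corrected significance level can be checked with a few lines. The variable names are illustrative.

```python
import math
from itertools import combinations

n_groups = 100
alpha = 0.1  # target significance level

# All ordered pairs (i, j) with i < j among the 100 percentile groups.
pairs = list(combinations(range(1, n_groups + 1), 2))
m = len(pairs)

# Bonferroni-corrected level applied to each individual test.
alpha_corrected = alpha / m

print("number of tests:", m)
print("matches 100 choose 2:", m == math.comb(n_groups, 2))
print("corrected significance level:", alpha_corrected)
```

With 4950 tests, each individual comparison is held to a level of roughly 0.00002, which is why only large and stable differences between groups are flagged.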
Formally, let $Y_i$ generally denote each return-based performance measure (and its proportion of outperformance) in the $i$th percentile distance group, with $E[Y_i]$ denoting the mean of $Y_i$. We construct the null and alternative hypotheses as (2)–(7). These hypotheses verify whether the changes in the Sharpe ratio, the mean return, the risk, and their proportions of outperformance due to the changes in each vector distance are consistent with the intuition. Hence, we define “Inconsistency” as the case of adopting the alternative hypothesis and “Consistency” as the case of failing to reject the null hypothesis.
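Investment performance:
$$\text{Sharpe ratio:} \quad H_{10}: E[Y_i] - E[Y_j] \geq 0 \quad \text{vs.} \quad H_{11}: E[Y_i] - E[Y_j] < 0 \quad (i < j) \tag{2}$$
$$\text{Rate of return:} \quad H_{20}: E[Y_i] - E[Y_j] \geq 0 \quad \text{vs.} \quad H_{21}: E[Y_i] - E[Y_j] < 0 \quad (i < j) \tag{3}$$
$$\text{Risk:} \quad H_{30}: E[Y_i] - E[Y_j] \leq 0 \quad \text{vs.} \quad H_{31}: E[Y_i] - E[Y_j] > 0 \quad (i < j) \tag{4}$$
Proportion of outperformance:
$$\text{Sharpe ratio:} \quad H_{40}: E[Y_i] - E[Y_j] \geq 0 \quad \text{vs.} \quad H_{41}: E[Y_i] - E[Y_j] < 0 \quad (i < j) \tag{5}$$
$$\text{Rate of return:} \quad H_{50}: E[Y_i] - E[Y_j] \geq 0 \quad \text{vs.} \quad H_{51}: E[Y_i] - E[Y_j] < 0 \quad (i < j) \tag{6}$$
$$\text{Risk:} \quad H_{60}: E[Y_i] - E[Y_j] \geq 0 \quad \text{vs.} \quad H_{61}: E[Y_i] - E[Y_j] < 0 \quad (i < j) \tag{7}$$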
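A single one-sided comparison of this kind can be sketched as follows. This is not the authors' MATLAB procedure: the sketch uses Welch's unequal-variance statistic and approximates the t distribution by the standard normal, which is reasonable for groups of 300 portfolios but remains an assumption. The two groups are synthetic, and the function names are illustrative.

```python
import math
import random
import statistics
from statistics import NormalDist


def welch_statistic(x, y):
    """Two-sample statistic (mean(x) - mean(y)) / standard error."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.fmean(x) - statistics.fmean(y)) / se


def is_inconsistent(y_i, y_j, alpha_corrected, higher_is_better=True):
    """Flag "Inconsistency" for group i (shorter distance) vs group j (longer).

    higher_is_better=True : H0: E[Y_i] - E[Y_j] >= 0 vs H1: < 0
    higher_is_better=False: H0: E[Y_i] - E[Y_j] <= 0 vs H1: > 0 (raw risk)
    The p-value uses a normal approximation to the t distribution.
    """
    t = welch_statistic(y_i, y_j)
    if higher_is_better:
        p_value = NormalDist().cdf(t)        # left tail
    else:
        p_value = 1.0 - NormalDist().cdf(t)  # right tail
    return p_value < alpha_corrected


rng = random.Random(42)
alpha_corrected = 0.1 / 4950

# Synthetic Sharpe ratios for a near group and a far group (300 each).
near = [rng.gauss(0.60, 0.10) for _ in range(300)]
far = [rng.gauss(0.40, 0.10) for _ in range(300)]

print("near vs far flagged:", is_inconsistent(near, far, alpha_corrected))
print("far vs near flagged:", is_inconsistent(far, near, alpha_corrected))
```

In the first call the shorter-distance group has the higher average, so the null hypothesis is not rejected; swapping the groups mimics a case that would be counted as an inconsistency.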
Table 5 shows the numerical results of the multiple t-tests. Table 5a compares the investment performance among the percentile groups, and Table 5b compares the proportions of outperformance among the groups. The upper figure for each distance is the number of “Inconsistency” or “Consistency” cases, and the lower one is its corresponding proportion out of the entire 4950 t-tests. Results for the Euclidean distance show that, in all pairs, portfolios in a lower percentile group do not have a significantly lower Sharpe ratio, lower mean portfolio return, higher risk, or lower proportion of outperformance on average than portfolios in a higher percentile group. However, for the other three distances, a non-negligible proportion of cases are inconsistent with the intuition, which parallels the visual implications of Figure 4 and Figure 5. For example, in 63 out of the 4950 (1.27%) t-tests, portfolios with shorter Manhattan distances have, on average, significantly lower Sharpe ratios than portfolios with longer Manhattan distances. Similar inconsistencies are found for Cosine and Pearson distances. In summary, we statistically verify that only the Euclidean distance is consistent with the traditional return-based performance measures. Therefore, we conclude that, out of the four representative vector distances, the Euclidean distance is the most suitable distance-based performance measure for portfolio analysis.

5. Further Justification of the Euclidean Distance as a New Performance Measure

In this section, we further justify the Euclidean distance as a new dimension for portfolio analysis. To this end, we conduct a detailed analysis of why a shorter Euclidean distance should be preferred by an investor who cares about portfolio returns. We first investigate how the Euclidean distance can represent an investor’s preference implied by the return-based utility theory (Section 5.1). We thereafter examine why a portfolio with a large Euclidean distance is not desirable by studying how the reliability of investment performance changes as the Euclidean distance increases (Section 5.2).

5.1. Relationship between the Euclidean Distance and the Utility Preference Implied by Portfolio Returns

The utility function is often applied in portfolio theory [20,31], where investors are usually assumed to show risk aversion. Specifically, the utility function is increasing in portfolio returns and satisfies the law of diminishing marginal utility. Since it has been argued that higher moments beyond the mean return and the risk should be considered to represent an investor’s utility [32], we analyze how the Euclidean distance can represent the preference implied by the first four moments of portfolio returns, including the skewness and kurtosis of portfolio returns.
We denote $R_w$ as the return of portfolio $w$ and $\mu_w$ as the mean portfolio return, that is, $E[R_w]$. Then, $U(R_w)$, the utility function of $R_w$, can be written as (8) through the Taylor series expansion. The second moment of return $E[(R_w - \mu_w)^2]$ refers to the variance of the portfolio return ($\sigma_w^2$), the third moment $E[(R_w - \mu_w)^3]$ to its skewness ($s_w^3$), and the fourth moment $E[(R_w - \mu_w)^4]$ to its kurtosis ($k_w^4$). The skewness measures the asymmetry of a distribution of stock returns, and the kurtosis measures the thickness of tails of the stock return distribution. Based on the Taylor series expansion, the expected value of $U(R_w)$ can be approximated by (9) [29,33]:
The Taylor series expansion for $U(R_w)$:
$$
U(R_w) = U(\mu_w) + U'(\mu_w)(R_w - \mu_w) + \frac{1}{2!} U''(\mu_w)(R_w - \mu_w)^2 + \frac{1}{3!} U'''(\mu_w)(R_w - \mu_w)^3 + \frac{1}{4!} U''''(\mu_w)(R_w - \mu_w)^4 + O(R_w^4) \tag{8}
$$
Equation for $E[U(R_w)]$:
$$
E[U(R_w)] \approx U(\mu_w) + \frac{1}{2!} U''(\mu_w)\,\sigma_w^2 + \frac{1}{3!} U'''(\mu_w)\,s_w^3 + \frac{1}{4!} U''''(\mu_w)\,k_w^4 \tag{9}
$$
It was shown in [20] that the portfolio utility function $U(\cdot)$ satisfies $U''(\mu_w) < 0$, $U'''(\mu_w) > 0$, and $U''''(\mu_w) < 0$, and proved that the portfolio utility should increase with $\mu_w$ and $s_w^3$ and decrease with $\sigma_w^2$ and $k_w^4$. That is, a portfolio with a low mean return, a high risk, a low skewness, and a high kurtosis will not be preferred by investors. Therefore, if the Euclidean distance appropriately represents an investor’s preference, the portfolio with a shorter Euclidean distance should show a higher (lower) mean return and skewness (risk and kurtosis) of portfolio returns.
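As a worked illustration of the four-moment approximation, the sketch below evaluates it for an exponential (CARA) utility $U(x) = -e^{-ax}$. This utility is chosen here only because its derivatives have the alternating signs stated above; the paper does not commit to a specific utility function, and the risk-aversion value `a` and the return series are hypothetical.

```python
import math
import statistics


def central_moment(returns, k):
    """k-th central moment E[(R - mu)^k], dividing by the sample size."""
    mu = statistics.fmean(returns)
    return statistics.fmean([(r - mu) ** k for r in returns])


def expected_utility_4m(returns, a=3.0):
    """Four-moment approximation of E[U(R)] for U(x) = -exp(-a x).

    Derivatives at mu: U'' = -a^2 e^{-a mu} < 0, U''' = a^3 e^{-a mu} > 0,
    U'''' = -a^4 e^{-a mu} < 0, matching the sign conditions in the text.
    """
    mu = statistics.fmean(returns)
    base = math.exp(-a * mu)
    u0 = -base
    u2 = -(a ** 2) * base
    u3 = (a ** 3) * base
    u4 = -(a ** 4) * base
    return (u0
            + u2 * central_moment(returns, 2) / math.factorial(2)
            + u3 * central_moment(returns, 3) / math.factorial(3)
            + u4 * central_moment(returns, 4) / math.factorial(4))


# Two hypothetical return series with the same mean but different dispersion.
calm = [0.01, 0.02, 0.01, 0.02]
wild = [-0.05, 0.08, -0.05, 0.08]

print("calm:", expected_utility_4m(calm))
print("wild:", expected_utility_4m(wild))
```

Both series have the same mean return, so the difference in approximate utility comes entirely from the variance and kurtosis terms, which penalize the more dispersed series.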
Figure 6 displays the scatter plots of the annualized mean return, the risk, the skewness, and the kurtosis of the 30,000 random portfolios versus their Euclidean distances. The solid black line is a linear regression line summarizing the pattern in each graph. Figure 6a,b show that, as the Euclidean distance increases, the mean portfolio return tends to decrease, and the risk tends to increase, while Figure 6c,d show that the skewness tends to decrease and the kurtosis to increase. Hence, we confirm from Figure 6 that the Euclidean distance is consistent with the characteristics of a portfolio utility function. In conclusion, it is reasonable to evaluate portfolio composition with the Euclidean distance since the Euclidean distance is closely related to an investor’s preference implied by portfolio returns.

5.2. Risk-Return Analysis of Portfolio Based on the Euclidean Distance

Recall that, in Section 4.2, we confirmed that the Euclidean distance showed a monotone (and therefore consistent) relationship with the mean portfolio return and the risk. We now further investigate, in addition to these monotone relationships, how the mean return and the risk change with the Euclidean distance.
We categorize the 30,000 random portfolios into 10 groups based on the Euclidean distance and draw the risk-return scatter diagram of each group. For example, 3000 portfolios with the shortest Euclidean distance are classified as Group 1, and 3000 portfolios with the longest Euclidean distance as Group 10. The risk-return scatter diagrams for all 10 groups are shown in Figure 7. The horizontal axis is the risk, and the vertical axis is the excess mean return over the risk-free rate. Descriptive statistics for each group are summarized in Table 6.
The curved line in Figure 7 is the minimum-variance frontier, with the upper-right-hand part of it being the efficient frontier [34,35]. ▲ indicates the in-sample TP and ◆ the average risk and return for each group. The location of ◆ for each group can be found in detail in Table 6.
According to Figure 7 and Table 6, ◆ moves gradually southeast from Group 1 to Group 10: the average mean return falls from 0.1026 to 0.0910 while the average risk rises from 0.1365 to 0.2015. More importantly, while the portfolios in Group 1 cluster compactly around ◆ in Figure 7, the portfolios in Group 10 scatter widely around it; in Table 6, the standard deviation of the mean return rises from 0.0074 to 0.0162 and that of the risk from 0.0128 to 0.0387. These results indicate that a portfolio with a larger distance has not only worse average investment performance but also higher variability in both return and risk. Hence, a portfolio with a large Euclidean distance is unreliable even if it achieves a higher mean return or a lower risk; such performance is likely an accident of the larger variability. In this sense, the Euclidean distance can complement the return-based performance measures by confirming the reliability of a portfolio in its investment performance.

6. Conclusions and Future Research

In this study, we developed a distance-based approach for portfolio analysis that directly evaluates portfolio composition. To develop a novel distance-based performance measure, we selected four representative vector distances (the Euclidean, Manhattan, Cosine, and Pearson distances) and investigated their suitability as a portfolio performance measure. Through extensive simulation and statistical analysis, we confirmed that the Euclidean distance is the most suitable distance-based performance measure, since it enables the most sophisticated and unbiased evaluation of randomly generated portfolios and shows the most consistent relationship with the traditional return-based performance measures. We further justified the Euclidean distance as a new dimension for portfolio analysis by verifying that a portfolio with a large Euclidean distance not only provides a lower utility implied by the traditional utility theory but also has highly unreliable investment performance.
For future research, we are working on constructing a portfolio with a shorter Euclidean distance using a statistical learning technique based on [21]. Specifically, we aim to generate an investment portfolio by making a convex combination of various out-of-sample portfolios and calibrating the combination level based on the Euclidean distance. Additionally, another study is underway to predict future in-sample TPs directly through time-series prediction models [36]. It collects the past in-sample TPs as time-series data and directly uses those portfolios instead of the historical rate of return to predict future investment portfolios through time-series forecasting methods.
It would also be meaningful to investigate, in terms of portfolio composition, the impact of estimated inputs on the portfolio selection model, using prediction methods such as exponential smoothing and moving averages with real stock data such as the S&P 500. Such data can be harvested mechanically by previously studied algorithms [37]. Lastly, verifying our results by constructing mathematical models and proving theorems would greatly enhance the applicability of the insight provided in this paper and would be an interesting topic for future research.

Author Contributions

Conceptualization, H.J. and S.K.; methodology, H.J., H.K. and S.K.; software, H.J.; validation, H.K. and S.L.; formal analysis, H.J.; investigation, H.J. and H.K.; resources, H.J. and S.K.; data curation, H.J.; writing—original draft preparation, H.J. and S.B.S.; writing—review and editing, S.L. and S.B.S.; supervision, S.K.; project administration, S.K.; funding acquisition, S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea under grant NRF-2020S1A5A2A01045624; Yonsei University under grants 2021-22-0219 and 2022-22-0028; Yonsei Business Research Institute.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Figure A1 shows time-series graphs of the 17 industry portfolios used in this paper, taken from Kenneth French’s database. The horizontal axis is the timeline from October 1992 to September 2017, and the vertical axis is the monthly rate of return of each industry. In addition, simple statistics such as the average, the standard deviation, the skewness, and the kurtosis of returns by industry can be found in Table A1.
Figure A1. Time-series graphs of returns by 17 industry portfolios from October 1992 to September 2017. (a) Food; (b) Mining and Mineral (Mines); (c) Oil and Petroleum Products (Oil); (d) Textiles, Apparel & Footwear (Clths); (e) Consumer Durables (Durbl); (f) Chemicals (Chems); (g) Drugs, Soap, Perfumes, Tobacco (Cnsum); (h) Construction and Construction Materials (Cnstr); (i) Steel Works Etc (Steel); (j) Fabricated Products (FabPr); (k) Machinery and Business Equipment (Machn); (l) Automobiles (Cars); (m) Transportation (Trans); (n) Utilities (Utils); (o) Retail Stores (Rtail); (p) Banks, Insurance Companies, and Other Financials (Finan); (q) Other.
Table A1. Summary statistics about 17 industry portfolios from October 1992 to September 2017.

| Industry | Mean | Standard Deviation | Skewness | Kurtosis |
|---|---|---|---|---|
| a. Food | 0.008 | 0.037 | −0.457 | 4.849 |
| b. Mining and Mineral (Mines) | 0.009 | 0.082 | −0.310 | 4.105 |
| c. Oil and Petroleum Products (Oil) | 0.009 | 0.055 | 0.031 | 3.584 |
| d. Textiles, Apparel & Footwear (Clths) | 0.009 | 0.058 | −0.153 | 5.261 |
| e. Consumer Durables (Durbl) | 0.007 | 0.056 | −0.237 | 6.907 |
| f. Chemicals (Chems) | 0.010 | 0.057 | −0.119 | 5.330 |
| g. Drugs, Soap, Perfumes, Tobacco (Cnsum) | 0.010 | 0.040 | −0.374 | 3.138 |
| h. Construction and Construction Materials (Cnstr) | 0.010 | 0.056 | −0.216 | 4.032 |
| i. Steel Works Etc (Steel) | 0.008 | 0.084 | −0.225 | 4.893 |
| j. Fabricated Products (FabPr) | 0.010 | 0.053 | −0.493 | 5.267 |
| k. Machinery and Business Equipment (Machn) | 0.012 | 0.072 | −0.562 | 4.893 |
| l. Automobiles (Cars) | 0.009 | 0.065 | −0.018 | 5.849 |
| m. Transportation (Trans) | 0.011 | 0.048 | −0.621 | 4.668 |
| n. Utilities (Utils) | 0.008 | 0.040 | −0.557 | 3.786 |
| o. Retail Stores (Rtail) | 0.009 | 0.045 | −0.265 | 3.873 |
| p. Banks, Insurance Companies, and Other Financials (Finan) | 0.010 | 0.055 | −0.700 | 5.480 |
| q. Other | 0.009 | 0.049 | −0.565 | 4.125 |

References

  1. Markowitz, H. Portfolio selection. J. Financ. 1952, 7, 77–91.
  2. Klein, R.W.; Bawa, V.S. The effect of estimation risk on optimal portfolio choice. J. Financ. Econ. 1976, 3, 215–231.
  3. Best, M.J.; Grauer, R.R. On the sensitivity of mean-variance-efficient portfolios to changes in asset means: Some analytical and computational results. Rev. Financ. Stud. 1991, 4, 315–342.
  4. DeMiguel, V.; Garlappi, L.; Nogales, F.J.; Uppal, R. A Generalized Approach to Portfolio Optimization: Improving Performance by Constraining Portfolio Norms. Manag. Sci. 2009, 55, 798–812.
  5. Jagannathan, R.; Ma, T. Risk reduction in large portfolios: Why imposing the wrong constraints helps. J. Financ. 2003, 58, 1651–1683.
  6. Ban, G.Y.; El Karoui, N.; Lim, A.E.B. Machine learning and portfolio optimization. Manag. Sci. 2018, 64, 1136–1154.
  7. Goldfarb, D.; Iyengar, G. Robust portfolio selection problems. Math. Oper. Res. 2003, 28, 1–38.
  8. Kim, W.C.; Kim, J.H.; Fabozzi, F.J. Deciphering robust portfolios. J. Bank. Financ. 2014, 45, 1–8.
  9. Ledoit, O.; Wolf, M. A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal. 2004, 88, 365–411.
  10. Ledoit, O.; Wolf, M. Nonlinear shrinkage of the covariance matrix for portfolio selection: Markowitz meets goldilocks. Rev. Financ. Stud. 2017, 30, 4349–4388.
  11. Jung, J.; Kim, S. An adaptively managed dynamic portfolio selection model using a time-varying investment target according to the market forecast. J. Oper. Res. Soc. 2015, 66, 1115–1131.
  12. Jung, J.; Kim, S. Developing a dynamic portfolio selection model with a self-adjusted rebalancing method. J. Oper. Res. Soc. 2017, 68, 766–779.
  13. Sharpe, W.F. Mutual fund performance. J. Bus. 1966, 39, 119–138.
  14. Treynor, J.L. How to rate management of investment funds. Harv. Bus. Rev. 1965, 43, 63–75.
  15. Harvey, C.R.; Liechty, J.C.; Liechty, M.W.; Müller, P. Portfolio selection with higher moments. Quant. Financ. 2010, 10, 469–485.
  16. Chen, L.; Li, S.; Wang, J. Liquidity, Skewness and Stock Returns: Evidence from Chinese Stock Market. Asia-Pac. Financ. Mark. 2011, 18, 405–427.
  17. DeMiguel, V.; Garlappi, L.; Uppal, R. Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy? Rev. Financ. Stud. 2009, 22, 1915–1953.
  18. Agrrawal, P. Using Index ETFs for Multi-Asset-Class Investing: Shifting the Efficient Frontier Up. J. Index Investig. 2013, 4, 83–94.
  19. Simaan, Y. Estimation Risk in Portfolio Selection: The Mean Variance Model Versus the Mean Absolute Deviation Model. Manag. Sci. 1997, 43, 1437–1446.
  20. Scott, R.C.; Horvath, P.A. On the Direction of Preference for Moments of Higher Order than the Variance. J. Financ. 1980, 35, 915–919.
  21. Kim, H.; Lee, S.; Soh, S.B.; Kim, S. Improving portfolio investment performance with distance-based portfolio-combining algorithms. J. Financ. Res. 2022, 45, 941–959.
  22. Benninga, S. Financial Modeling; MIT Press: Cambridge, MA, USA, 2014.
  23. Martellini, L. Toward the design of better equity benchmarks: Rehabilitating the tangency portfolio from modern portfolio theory. J. Portf. Manag. 2008, 34, 34–41.
  24. Keykhaei, R.; Jahandideh, M.T. Tangency portfolios in the LP solvable portfolio selection models. RAIRO-Oper. Res. 2012, 46, 149–158.
  25. Fama, E.F. Risk, return and equilibrium: Some clarifying comments. J. Financ. 1968, 23, 29–40.
  26. Sharpe, W.F. Capital asset prices: A theory of market equilibrium under conditions of risk. J. Financ. 1964, 19, 425–442.
  27. Bhalla, V.K. Investment Management (Security Analysis and Portfolio Management); S. Chand Publishing: New Delhi, India, 2008.
  28. Evans, J.R.; Lindsay, W.M. Managing for Quality and Performance Excellence; Cengage Learning: Mason, OH, USA, 2013; ISBN 1285633172.
  29. Welkowitz, J.; Cohen, B.H.; Ewen, R.B. Introductory Statistics for the Behavioral Sciences; John Wiley & Sons: Hoboken, NJ, USA, 2006.
  30. Lingeswaran, A. Repetitive transcranial magnetic stimulation in the treatment of depression: A randomized, double-blind, placebo-controlled trial. Indian J. Psychol. Med. 2011, 33, 35.
  31. Levy, H.; Markowitz, H.M. Approximating expected utility by a function of mean and variance. Am. Econ. Rev. 1979, 69, 308–317.
  32. Samuelson, P.A. The fundamental approximation theorem of portfolio analysis in terms of means, variances and higher moments. Rev. Econ. Stud. 1970, 37, 537–542.
  33. Brito, R.P.; Sebastião, H.; Godinho, P. Efficient skewness/semivariance portfolios. J. Asset Manag. 2016, 17, 331–346.
  34. Merton, R.C. An analytic derivation of the efficient portfolio frontier. J. Financ. Quant. Anal. 1972, 7, 1851–1872.
  35. Elton, E.J.; Gruber, M.J.; Brown, S.J.; Goetzmann, W.N. Modern Portfolio Theory and Investment Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2009.
  36. Worthington, A.; Valadkhani, A. Catastrophic shocks and capital markets: A comparative analysis by disaster and sector. Glob. Econ. Rev. 2005, 34, 331–344.
  37. Agrrawal, P. An automation algorithm for harvesting capital market information from the web. Manag. Financ. 2009, 35, 427–438.
Figure 1. Norm and inner product methods to measure similarity between two vectors. (a) distance measured by the norm; (b) distance measured by the inner product.
Figure 2. An outline of the experiments.
Figure 3. Sharpe ratio and histogram for each distance. (a) scatter plots of each distance and the Sharpe ratio; (b) histograms of the distance.
Figure 4. Changes in the average investment performance by the distance percentile group. (a) Sharpe ratio; (b) rate of return; (c) risk.
Figure 5. Changes in the proportions of outperformance to the 1/N portfolio by distance percentile group. (a) Sharpe ratio; (b) rate of return; (c) risk.
Figure 6. Moments of return according to the Euclidean distance. (a) rate of return; (b) risk; (c) skewness; (d) kurtosis.
Figure 7. Risk-return graphs according to the Euclidean distance groups. (a) Group 1; (b) Group 2; (c) Group 3; (d) Group 4; (e) Group 5; (f) Group 6; (g) Group 7; (h) Group 8; (i) Group 9; (j) Group 10.
Table 1. Distances for the portfolio composition evaluation.

| Category | Distance Method | Formula |
|---|---|---|
| Norm | Euclidean distance | $\sqrt{\sum_{i=1}^{n} (w_i - w_i^*)^2}$ |
| Norm | Manhattan distance | $\sum_{i=1}^{n} \lvert w_i - w_i^* \rvert$ |
| Inner Product | Cosine distance | $1 - \dfrac{\sum_{i=1}^{n} w_i \cdot w_i^*}{\sqrt{\sum_{i=1}^{n} w_i^2}\,\sqrt{\sum_{i=1}^{n} (w_i^*)^2}}$ |
| Inner Product | Pearson distance | $1 - \dfrac{\sum_{i=1}^{n} (w_i - \bar{w})(w_i^* - \bar{w}^*)}{\sqrt{\sum_{i=1}^{n} (w_i - \bar{w})^2}\,\sqrt{\sum_{i=1}^{n} (w_i^* - \bar{w}^*)^2}} = 1 - \rho_{w, w^*}$, where $\bar{w} = \sum_{i=1}^{n} w_i / n$ and $\bar{w}^* = \sum_{i=1}^{n} w_i^* / n$ |
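The four formulas in Table 1 can be written out directly with NumPy. This is a minimal sketch; the function names and the two three-asset weight vectors are hypothetical and serve only to exercise the formulas.

```python
import numpy as np

def euclidean_distance(w, w_star):
    """Square root of the sum of squared weight differences."""
    return float(np.sqrt(np.sum((w - w_star) ** 2)))

def manhattan_distance(w, w_star):
    """Sum of absolute weight differences."""
    return float(np.sum(np.abs(w - w_star)))

def cosine_distance(w, w_star):
    """One minus the cosine of the angle between the two weight vectors."""
    cosine = np.dot(w, w_star) / (np.linalg.norm(w) * np.linalg.norm(w_star))
    return float(1.0 - cosine)

def pearson_distance(w, w_star):
    """One minus the Pearson correlation between the two weight vectors."""
    return float(1.0 - np.corrcoef(w, w_star)[0, 1])

# Hypothetical three-asset example; w_star plays the role of the reference portfolio.
w_star = np.array([0.5, 0.3, 0.2])
w = np.array([0.4, 0.4, 0.2])

for fn in (euclidean_distance, manhattan_distance, cosine_distance, pearson_distance):
    print(fn.__name__, round(fn(w, w_star), 6))
```

Note that the cosine distance is unchanged when either vector is rescaled, and the Pearson distance is also unchanged when a constant is added to every weight, whereas the two norm-based distances respond to any change in the weights.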
Table 2. Formulas for standardizing distance methods.

| Standardized Distance | Formula |
|---|---|
| Euclidean distance | $\dfrac{ED(w^*, w_R) - \mathrm{Mean}\left(ED(w^*, w_R)\right)}{\mathrm{Std}\left(ED(w^*, w_R)\right)}$ |
| Manhattan distance | $\dfrac{MD(w^*, w_R) - \mathrm{Mean}\left(MD(w^*, w_R)\right)}{\mathrm{Std}\left(MD(w^*, w_R)\right)}$ |
| Cosine distance | $\dfrac{CD(w^*, w_R) - \mathrm{Mean}\left(CD(w^*, w_R)\right)}{\mathrm{Std}\left(CD(w^*, w_R)\right)}$ |
| Pearson distance | $\dfrac{PD(w^*, w_R) - \mathrm{Mean}\left(PD(w^*, w_R)\right)}{\mathrm{Std}\left(PD(w^*, w_R)\right)}$ |
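Each row of Table 2 is the same z-score transformation applied to a different distance. A minimal sketch follows; the function name `standardize` is hypothetical, and the population standard deviation (`ddof=0`) is an assumption because the table does not specify the estimator.

```python
import numpy as np

def standardize(distances, ddof=0):
    """Subtract the mean and divide by the standard deviation."""
    d = np.asarray(distances, dtype=float)
    return (d - d.mean()) / d.std(ddof=ddof)

# Made-up distances of five random portfolios to a reference portfolio.
raw = np.array([0.10, 0.25, 0.40, 0.55, 0.70])
z = standardize(raw)
print(z)
```

After this transformation every distance has mean zero and unit standard deviation, so distances on different scales can be compared on a common footing.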
Table 3. The correlations between each distance and the Sharpe ratio.

| Standardized Distance | Pearson’s Correlation Coefficient |
|---|---|
| Euclidean distance | −0.5569 |
| Manhattan distance | −0.6517 |
| Cosine distance | −0.6849 |
| Pearson distance | −0.6684 |
Table 4. Descriptive statistics for each vector distance.

| Standardized Distance | Range (= max − min) | Coefficient of Skewness | Coefficient of Kurtosis |
|---|---|---|---|
| Euclidean distance | 6.2208 | −0.1549 | 3.0141 |
| Manhattan distance | 5.7967 | −1.0847 | 3.5950 |
| Cosine distance | 4.6691 | −1.3696 | 4.1047 |
| Pearson distance | 5.4545 | −1.4195 | 4.3447 |
Table 5. Multiple t-test results. (a) Investment performance comparison; (b) Proportion of outperformance comparison.

(a)

| Standardized Distance | Sharpe Ratio: Inconsistency | Sharpe Ratio: Consistency | Mean Return: Inconsistency | Mean Return: Consistency | Risk: Inconsistency | Risk: Consistency |
|---|---|---|---|---|---|---|
| Euclidean distance | 0 (0.00%) | 4950 (100.00%) | 0 (0.00%) | 4950 (100.00%) | 0 (0.00%) | 4950 (100.00%) |
| Manhattan distance | 63 (1.27%) | 4887 (98.73%) | 144 (2.91%) | 4806 (97.09%) | 105 (2.12%) | 4845 (97.88%) |
| Cosine distance | 26 (0.53%) | 4924 (99.47%) | 72 (1.45%) | 4878 (98.55%) | 29 (0.59%) | 4921 (99.41%) |
| Pearson distance | 7 (0.14%) | 4943 (99.86%) | 51 (1.03%) | 4899 (98.97%) | 217 (4.38%) | 4733 (95.62%) |

(b)

| Standardized Distance | Sharpe Ratio Proportion of Outperformance: Inconsistency | Sharpe Ratio Proportion of Outperformance: Consistency | Mean Return Proportion of Outperformance: Inconsistency | Mean Return Proportion of Outperformance: Consistency | Risk Proportion of Outperformance: Inconsistency | Risk Proportion of Outperformance: Consistency |
|---|---|---|---|---|---|---|
| Euclidean distance | 0 (0.00%) | 4950 (100.00%) | 0 (0.00%) | 4950 (100.00%) | 0 (0.00%) | 4950 (100.00%) |
| Manhattan distance | 76 (1.54%) | 4874 (98.46%) | 134 (2.71%) | 4816 (97.29%) | 263 (5.31%) | 4687 (94.69%) |
| Cosine distance | 24 (0.48%) | 4926 (99.52%) | 109 (2.20%) | 4841 (97.80%) | 61 (1.23%) | 4889 (98.77%) |
| Pearson distance | 0 (0.00%) | 4950 (100.00%) | 75 (1.52%) | 4875 (98.48%) | 0 (0.00%) | 4950 (100.00%) |
Table 6. Summary statistics of the mean return and the risk according to the Euclidean distance groups.

| Group | Average: Mean Return | Average: Risk | Standard Deviation: Mean Return | Standard Deviation: Risk |
|---|---|---|---|---|
| 1 | 0.1026 | 0.1365 | 0.0074 | 0.0128 |
| 2 | 0.0972 | 0.1484 | 0.0086 | 0.0176 |
| 3 | 0.0965 | 0.1583 | 0.0093 | 0.0195 |
| 4 | 0.0947 | 0.1649 | 0.0100 | 0.0209 |
| 5 | 0.0939 | 0.1705 | 0.0105 | 0.0221 |
| 6 | 0.0931 | 0.1744 | 0.0108 | 0.0241 |
| 7 | 0.0922 | 0.1791 | 0.0120 | 0.0273 |
| 8 | 0.0921 | 0.1846 | 0.0131 | 0.0294 |
| 9 | 0.0927 | 0.1885 | 0.0143 | 0.0328 |
| 10 | 0.0910 | 0.2015 | 0.0162 | 0.0387 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Jeon, H.; Lee, S.; Kim, H.; Soh, S.B.; Kim, S. Portfolio Evaluation with the Vector Distance Based on Portfolio Composition. Mathematics 2023, 11, 221. https://0-doi-org.brum.beds.ac.uk/10.3390/math11010221
