Theoretical development in the regression analysis of fractional data and its applications to Finance

Co-financed by:
Project title | Theoretical development in the regression analysis of fractional data and its applications to Finance
Project Code | PTDC/EGE-ECO/119148/2010
Main objective |

Region of intervention |

Beneficiary entity |
  • Universidade de Évora (lead)
  • Centro de Matemática Aplicada à Previsão e Decisão Económica (CEMAPRE/ISEG/UTL) (partner)
  • Universidade de Coimbra (partner)

Approval date | 09-09-2011
Start date | 01-03-2012
Conclusion date | 31-08-2015

Total eligible cost |
European Union financial support |
National/regional public financial support |
Financial support awarded to Universidade de Évora | 81800 €

Summary

In several economic settings, the variable of interest in regression models is often a proportion, or a vector of proportions, corresponding to a set of shares for a given number of exhaustive, mutually exclusive categories. Examples include pension plan participation rates, fraction of land allocated to agriculture, percentage of weekly time devoted to each of a given set of human activities, fractions of income spent on various classes of goods, asset portfolio shares, and proportions of different types of debt within firms’ financing mix. While in the first two cases there are only two categories (usually a characteristic and its opposite, or absence) and a single proportion is modelled, the remaining examples illustrate the more general situation where the joint behaviour of a multivariate fractional variable is of interest.

The regression analysis of fractional data, inherently bounded within the unit simplex, raises a number of interesting research issues that challenge conventional approaches of estimation and inference. For the case of a single proportion, the main issues are discussed in the seminal paper by [PaWo96], who propose robust quasi-maximum likelihood estimation on the basis of a Bernoulli-based likelihood and a logit conditional mean function. In a recent paper ([RaRaMu11]), some of the authors of this research proposal survey the main alternative regression models and estimation methods that are available for dealing with (univariate) fractional response variables and propose a unified testing methodology to assess the validity of the assumptions required by each model and method.
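As a rough illustration of the single-proportion estimator described above, the sketch below maximises a Bernoulli quasi-log-likelihood with a logit conditional mean by Newton-Raphson. The function names, the single-regressor setup and the toy data are assumptions made for illustration only; they are not taken from [PaWo96] or from the project, and robust (sandwich) standard errors are omitted for brevity.

```python
import math

def logistic(z):
    """Logit mean function G(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_fractional_logit(x, y, iterations=50, tol=1e-10):
    """Bernoulli QML for E(y|x) = G(b0 + b1*x) via Newton-Raphson.

    y may take any value in [0, 1], including the boundaries 0 and 1.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # score of the quasi-log-likelihood
        h00 = h01 = h11 = 0.0  # negative Hessian (2x2, symmetric)
        for xi, yi in zip(x, y):
            mu = logistic(b0 + b1 * xi)
            r = yi - mu
            w = mu * (1.0 - mu)
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (negative Hessian) * step = score
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0 += s0
        b1 += s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1

# Hypothetical toy data: a proportion that rises with x and hits both boundaries.
x_demo = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y_demo = [0.0, 0.10, 0.35, 0.60, 0.85, 1.0]
b0_hat, b1_hat = fit_fractional_logit(x_demo, y_demo)
fitted_demo = [logistic(b0_hat + b1_hat * xi) for xi in x_demo]
```

Because only the conditional mean is specified, boundary observations enter the objective without any transformation, and the fitted values remain strictly inside the unit interval.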

In this project, we continue the research initiated in 2007 with the FCT-funded project PTDC/ECO/64693/2006, which focused on the regression analysis of univariate fractional data using parametric methods. The application of these methods in that context is again considered but the main aim of the current research proposal is the analysis of multivariate fractional data using both parametric and nonparametric regression techniques. Moreover, while the previous project considered a single empirical application (the determinants of firms’ capital structure decisions) to illustrate the usefulness of the estimation and inference tools developed, this research project considers a much wider range of applications in finance, which is an area where, in recent years, most team members have also accumulated some expertise (see, for instance, [RaSi09], [SaMu09] and [Ba10]).

The parametric models proposed for modelling multivariate fractional responses differ in a number of respects, such as: (i) the adoption, or not, of full joint distributional assumptions for shares; (ii) the possibility, or not, of dealing with boundary observations; and (iii) in cases where shares result from ratios of known integers, the use, or not, of this additional information. In any case, all models have in common the use of functional forms for the conditional expectation of the response variables which enforce the conceptual requirement that, as for the observed shares, its elements also belong to the unit simplex. We also investigate at some length the specification analysis of these models, which is a sensitive issue that has not received much attention in the literature on multivariate fractional regression.

In some applications, it may be useful to have other, less conventional regression tools available to model fractional response variables. In this research project we consider two alternatives to parametric methods: decision tree models and artificial neural networks. Both of these nonparametric methods are first adapted to the fractional context and then shown to be competitive with standard methods in modelling and forecasting fractional responses. Decision tree models are also used jointly with parametric methods to model fractional response variables by groups of homogeneous firms, where the definition of the groups is determined by the nonparametric procedure.

A research area where many variables of interest have a fractional nature is that of finance. In this project, we intend to revisit several important issues in finance, ranging from the relationship between corporate board structure and firm performance to the classical capital structure and cash-holding decisions. We will also explore other relevant topics, such as the forecasting of loss-given-default in bank loans and the determinants of institutional investors’ equity ownership. Such diversity of examples enables us to apply the methods developed in a rich variety of settings: single and multivariate shares; fractional responses obtained as ratios of known and/or unknown integers; observation, or not, of boundary values with nontrivial probability; joint or separate regressions by groups of firms; and cases where the main interest is either modelling relationships or forecasting.


Goals, activities and expected/achieved results

In order to circumvent the limitations of the existing parametric models for multivariate fractional responses, we consider various alternative approaches that fully account for the bounded, unit-sum nature of fractional variables and are sufficiently general to be applied to a variety of situations. Moreover, we consider the issue of specification testing. In particular, concerning parametric approaches, we address the following topics:

(i) The few studies that have modelled multivariate fractional response variables acknowledging appropriately their share nature and using conditional mean models are all based on a multinomial logit specification. This choice has been dictated mainly by convenience: this is the simplest model used in the discrete case to describe multivariate choice probabilities. However, the behavioral implications of the well-known independence of irrelevant alternatives (IIA) property of the multinomial logit model, which implies that the ratio between the proportions allocated to two categories is independent of the remaining categories, naturally extend to the case of fractional variables. In this research project we consider the adaptation to the fractional setting of other specifications that are commonly used in the discrete case to model choice probabilities (e.g. nested logit, mixed logit).

(ii) In contrast to the case of conditional mean models, most applications of the Dirichlet model are based on different reparameterizations of the Dirichlet distribution, which imply different functional forms for the conditional expectation of the response variables. However, most of those alternative specifications of the Dirichlet regression model seem to be specific to the particular studies carried out by their proponents. For instance, [ChGr02] use a specification that only makes sense in the framework of Lorenz curves. In this research project we provide an integrated approach for all models analyzed, suggesting reparameterizations of the distributions underlying the Dirichlet and other fully parametric regression models that ensure that the same specifications used for E(Y|X) in conditional mean models may also be used for describing the conditional expectation implied by each parametric model. With this approach, as for the univariate case, each model formulated for E(Y|X), irrespective of the specific formulation chosen, may be estimated by either QML or ML, depending on the adoption, or not, of full joint distributional assumptions for the fractional response variables.

(iii) In some applications, the response variables may be interpreted as ratios of integers, i.e. the dependent variables are the proportions of sampling units in a given group who select each of a set of mutually exclusive alternatives. In such cases, provided that the group size or the total number of units in a given group that choose each alternative are also known, models that exploit the extra information available may produce more efficient estimators than models which use information on the fractional response alone. In this research project, we propose using two parametric models that have not been considered previously in the literature on fractional responses, namely the multinomial and the Dirichlet-multinomial regression models. For both models we propose parameterizations that imply standard specifications for the conditional expectation of the response variables and, in contrast to the Dirichlet model, have the advantage of being able to deal with boundary observations.

(iv) Another limitation of the existing literature on multivariate fractional regression is the absence of suitable tests for assessing the assumptions underlying each parametric model. To the best of our knowledge, although the assumptions implied by the multinomial logit model are often questionable and the Dirichlet regression model is not robust to deviations from the assumed distribution, not a single paper in this area has applied specification tests. In this research project we develop tests for assessing both distributional and first moment assumptions. In the former case, we consider conditional moment tests for assessing the covariance structures implied by the Dirichlet, the multinomial and the Dirichlet-multinomial regression models. Regarding conditional mean assumptions, we propose tests that are extensions of their counterparts for either the univariate fractional case or the multivariate discrete case, namely RESET-type tests, goodness-of-functional form tests ([RaRaMu11]) and tests for the IIA assumption.
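Two of the ideas in topics (i) and (ii) can be sketched in a few lines. The first function computes multinomial logit shares and is used to check the IIA property numerically: changing the coefficients of a third category leaves the ratio between the first two shares unchanged. The second evaluates a Dirichlet log-density under a mean-precision reparameterization, alpha_j = phi * mu_j, so that the Dirichlet mean equals mu. This particular parameterization, the name phi and all numerical values are assumptions chosen for illustration; the text does not state which reparameterizations the project adopts.

```python
import math

def mnl_shares(x, betas):
    """Multinomial logit conditional mean: share_j proportional to exp(x'b_j).

    betas holds one coefficient list per category; the first is set to
    zeros here as an identification normalisation.
    """
    scores = [sum(xk * bk for xk, bk in zip(x, b)) for b in betas]
    top = max(scores)  # subtract the maximum for numerical stability
    expd = [math.exp(s - top) for s in scores]
    total = sum(expd)
    return [e / total for e in expd]

def dirichlet_loglik(y, mu, phi):
    """Dirichlet log-density with alpha_j = phi * mu_j, so that E(y_j) = mu_j.

    Requires 0 < y_j < 1 for every j: boundary shares have no finite
    log-density under this model.
    """
    alpha = [phi * m for m in mu]
    return (math.lgamma(sum(alpha))
            - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, y)))

# Hypothetical covariates (constant plus one regressor) and coefficients.
x_obs = [1.0, 0.5]
betas_a = [[0.0, 0.0], [0.4, -1.0], [-0.3, 0.8]]
betas_b = [[0.0, 0.0], [0.4, -1.0], [2.0, -1.5]]  # only category 3 differs

shares_a = mnl_shares(x_obs, betas_a)
shares_b = mnl_shares(x_obs, betas_b)
ratio_a = shares_a[1] / shares_a[0]
ratio_b = shares_b[1] / shares_b[0]  # equals ratio_a up to rounding: IIA

# Log-density of one interior observation when the mean is the MNL prediction.
loglik_obs = dirichlet_loglik([0.2, 0.3, 0.5], shares_a, phi=10.0)
```

Under this parameterization the same mean function can be plugged into either a QML objective or the Dirichlet likelihood, which is the kind of interchangeability that topic (ii) describes.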

In terms of semi- and nonparametric techniques, the following methods are expected to be adapted and applied to the modelling and forecasting of both univariate and multivariate fractional response variables:

(i) Decision tree models ([BrFrOlSt84]; [Qu86]). A decision tree model is a regression technique in which the predicted values of the target variable are obtained through a series of sequential logical if-then conditions. This sequence of binary splits divides the fractional response observations into several partitions according to some explanatory variables. The objective of the splitting procedure is to divide the data into groups in which the observations are as homogeneous as possible. The predicted response in a given partition is equal to the average of the response variable for the set of observations that lie in the partition, which implies that when the response variable is bounded to the unit interval, predicted values will inevitably also be bounded between 0 and 1. The only application of decision tree models in the fractional context, [Ba10], was made by one of the researchers involved in this project, who, however, was mainly interested in forecasting in the framework of univariate models. In this project, we extend the method to the multivariate setting and also consider its use for the determination of the factors that explain the conditional mean of fractional response variables, both as a sole method and in conjunction with parametric models.

(ii) Artificial neural networks ([Bi96]). An artificial neural network is a nonparametric mathematical model that attempts to emulate the functioning of biological neural networks, consisting of a group of interconnected processing units called neurons. Due to their good capability of approximating arbitrarily complex functions ([HoStWh89]), these models have been applied in a wide range of scientific domains, including finance (e.g. [AlMaVa94] use neural networks to model the probability of default). Typically, neural networks employ linear activation functions in the output neuron. In order to adapt them to the fractional context, in this research project we consider using sigmoid activation functions that guarantee that the predicted values are constrained to the unit interval. Furthermore, neural network architectures with two or more neurons in the output layer allow regression analyses of multivariate fractional dependent variables.
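The two boundedness arguments made in items (i) and (ii) can be checked with a minimal sketch. The first part searches a single regressor for the binary split that minimises the within-leaf sum of squared errors over vectors of shares, and predicts each leaf by its component-wise average, which necessarily lies in the unit simplex. The second part is a forward pass through a one-hidden-layer network with a sigmoid output neuron. The squared-error splitting criterion, the network architecture, the weights and the toy data are all assumptions made for illustration, not the project's specification.

```python
import math

def leaf_mean(rows):
    """Component-wise average of share vectors; it stays in the unit simplex."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def sse(rows):
    """Sum of squared deviations from the leaf mean, over all components."""
    m = leaf_mean(rows)
    return sum((r[j] - m[j]) ** 2 for r in rows for j in range(len(m)))

def best_split(x, shares):
    """Search one regressor for the threshold minimising total within-leaf SSE."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2.0
        left = [s for xi, s in zip(x, shares) if xi <= cut]
        right = [s for xi, s in zip(x, shares) if xi > cut]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, cut, leaf_mean(left), leaf_mean(right))
    return best

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def network_output(x, hidden_weights, output_weights):
    """One hidden layer; the sigmoid output keeps the prediction in (0, 1).

    Each weight list starts with a bias term followed by the slopes.
    """
    hidden = [sigmoid(w[0] + sum(wk * xk for wk, xk in zip(w[1:], x)))
              for w in hidden_weights]
    return sigmoid(output_weights[0]
                   + sum(wk * hk for wk, hk in zip(output_weights[1:], hidden)))

# Hypothetical data: three shares per observation, one regressor.
x_demo = [1, 2, 3, 10, 11, 12]
shares_demo = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1],
               [0.1, 0.3, 0.6], [0.2, 0.2, 0.6], [0.0, 0.3, 0.7]]
cost, cut, left_pred, right_pred = best_split(x_demo, shares_demo)

# Hypothetical weights for a network with two inputs and two hidden neurons.
y_hat = network_output([0.5, -1.2],
                       hidden_weights=[[0.1, 0.8, -0.4], [-0.3, 0.2, 0.9]],
                       output_weights=[0.05, 1.5, -2.0])
```

Note that a boundary share such as the 0.0 in the last observation poses no difficulty for the tree, since leaf predictions are plain averages.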

For all the new models developed in this research project, we carry out Monte Carlo simulation studies to assess the finite sample performance of the estimators and tests developed. In addition, we use real data to show the usefulness of the suggested methods in empirical work. In particular, we consider the following applications in the area of finance:

(i) Capital structure decisions. In this case, we consider two distinct applications of our methods. First, since nonparametric methods may be used both in classification and regression problems, we implement a fully nonparametric version of the two-part fractional regression model considered by [RaSi09] to deal with cases where the fractional dependent variable (e.g. the proportion of interest-bearing debt in firms’ capital) has a nontrivial probability of assuming one of its boundary values. Second, we show how our multivariate models can cope with situations where there is a clear interdependency between the proportions allocated to different types/sources of funding, an issue mainly disregarded by the previous literature.

(ii) Cash-holding decisions. This application is mainly designed to show how decision tree models and parametric regression models can be combined in a single research design to improve our understanding of a well-established research area. Specifically, we first use decision tree models to partition firms into homogeneous groups using a number of firms’ attributes (e.g. number of employees, annual turnover, annual balance sheet total, industry dummies), and then employ parametric models to study the conditional mean of cash holdings within each group.

(iii) Composition of the board of directors and determinants of institutional equity ownership. These two applications will shed light on how fractional regression models behave when the fractional dependent variable is defined as the ratio of two known integers. While in the first application the fractional response is univariate and the denominator of the ratio is typically a small value, in the second case that denominator is commonly very large. In this second application, we also consider the case of multivariate proportions, as we partition institutional investors into five mutually exclusive categories, an innovation in comparison with the extant literature.

(iv) Loss-given-default. The main purpose of this application is to apply the theory of forecasting in the context of fractional regression. In order to contribute to the current literature on this subject, we implement neural network models of loss-given-default and apply parametric and nonparametric models that take into account the fact that recovery distributions for bank loans and subordinated bonds typically present point masses at zero, that is, many credits result in total loss of the outstanding debt.
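Applications (i) and (iv) both involve a fractional response with a point mass at a boundary value. One standard way to handle this, sketched below for a mass at zero, is a two-part decomposition: a binary model for the probability of a positive outcome and a fractional model for the mean of the positive outcomes, combined as E(y|x) = Pr(y > 0 | x) * E(y | x, y > 0). The logit form for both parts, the single regressor and the coefficient values are assumptions for illustration; the project proposes a fully nonparametric version, which this parametric sketch does not reproduce.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def two_part_mean(x, gamma, beta):
    """E(y|x) = Pr(y > 0 | x) * E(y | x, y > 0), both parts with logit links.

    gamma and beta are (intercept, slope) pairs for the binary part and
    the fractional part respectively.
    """
    p_positive = logistic(gamma[0] + gamma[1] * x)   # first part: binary model
    mean_positive = logistic(beta[0] + beta[1] * x)  # second part: fractional model
    return p_positive * mean_positive

# Hypothetical coefficients: the two parts may respond to x in opposite ways.
gamma_demo = (0.5, 1.2)
beta_demo = (-0.2, -0.6)
predictions = [two_part_mean(x, gamma_demo, beta_demo) for x in (-1.0, 0.0, 1.0)]
```

Separating the two parts allows a covariate to affect the probability of a positive outcome and its conditional magnitude differently, which a single-equation model cannot represent.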