Economic analysis of healthcare interventions can provide decision makers with a useful perspective on resource allocation. It demonstrates the value of an intervention when compared to others that would produce the same or at least similar outcomes. However, it is not the only perspective. Political, budgetary, humanistic and/or resource allocation perspectives may provide additional vectors that will influence healthcare decisions.

Economic analysis uses neoclassical economic models to compare health interventions, whether they are pharmaceutical, surgical or medical. Although the construction of the models is fairly straightforward, populating them with relevant data has proved to be challenging, especially for recent interventions for which market data are not yet available. Nevertheless, decisions need to be made, and so one needs to consider practical issues. In evaluating choices in resource allocation, opportunity costs provide valuable insights for making health policies and for implementing them. Economic models can provide those insights.

*Consider Retrospective versus Prospective Studies*

True, prospective studies are (currently) the gold standard for answering questions on health interventions. Does this mean that retrospective studies are unnecessary? Not at all. Retrospective studies can answer many healthcare questions, especially in resource utilization, treatment pathway optimization, improvement of treatment patterns, and cost/benefit and cost/effectiveness analysis.

Before embarking on a prospective time-and-motion study, it is prudent to review the resource utilization in every cost center, so that efforts can be concentrated primarily on the areas that are cost drivers. A retrospective analysis can provide this information. The same holds true for comparing treatment pathways. The most frequently used method for analyzing retrospective data is regression. Since retrospective datasets were originally collected for purposes other than the construction of an economic model, the results may be skewed. To correct for this possibility, multiple regression analysis aims to adjust for confounding factors and thus provide more objective results.
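To make the adjustment for confounding concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption made for illustration: the cohort is invented, the variable names (`severity`, `treated`, `cost`) are hypothetical, and the cost formula is simply chosen so that the true treatment effect is known to be +200. Sicker patients are treated more often and also cost more, so the unadjusted comparison overstates the effect of treatment.

```python
import numpy as np

def build_cohort():
    """Build a small, made-up cohort in which sicker patients are treated more often."""
    severity, treated = [], []
    for level in range(4):                 # severity score 0..3
        for i in range(5):                 # 5 patients per severity level
            severity.append(level)
            treated.append(1 if i <= level else 0)   # level + 1 of the 5 are treated
    severity = np.array(severity, dtype=float)
    treated = np.array(treated, dtype=float)
    # Assumed data-generating process: treatment adds 200, each severity point adds 500.
    cost = 1000.0 + 500.0 * severity + 200.0 * treated
    return severity, treated, cost

def naive_effect(treated, cost):
    """Unadjusted difference in mean cost, treated minus untreated."""
    return cost[treated == 1].mean() - cost[treated == 0].mean()

def adjusted_effect(severity, treated, cost):
    """Coefficient on treatment from a multiple regression that also includes severity."""
    X = np.column_stack([np.ones_like(cost), treated, severity])
    beta, *_ = np.linalg.lstsq(X, cost, rcond=None)
    return beta[1]

severity, treated, cost = build_cohort()
print(f"naive difference:    {naive_effect(treated, cost):.1f}")
print(f"adjusted difference: {adjusted_effect(severity, treated, cost):.1f}")
```

In this toy cohort the naive difference is 700, while the regression that includes severity recovers the assumed effect of 200. Real retrospective data are noisier, and the adjustment only works for confounders that were actually recorded.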

*Consider Regression Analysis*

There are many ways to do regression but only one of them is BLUE. What is a BLUE regression? BLUE stands for the best linear unbiased estimator. In other words, among all linear estimators that are unbiased, it is the one with the smallest variance, so it yields the most reliable results. In our “point-&-click” world, anyone can construct a regression. However, you need to ask: ‘How valid are the results?’

Multiple regression is a tool for estimating/predicting the impact that several factors (independent variables) have on another factor (dependent variable). It can be used in two ways:

- Estimation, when the result of interest is within the ranges determined by the variables (interpolation).
- Prediction, when it is outside of that range (extrapolation).

Sidebar: Here is an example of how things can go wrong. Assume that you want to determine the speed at which salt dissolves in water based on several factors, including water temperature. The coefficient for the water temperature variable will indicate the increase in the speed at which salt dissolves for a given increase in temperature. Let’s say that you have data on the speed of dissolution over a range of 20°C to 80°C. So far, so good. Now, assume that someone asks you for the speed of dissolution at 120°C. You can plug it into the regression formula and get an answer, but that answer would be wrong. Why? Because at temperatures over 100°C (at normal atmospheric pressure) water changes its state from liquid to gas. Hence, there will be no liquid water in which to dissolve the salt, and your regression formula does not apply to a gaseous form. The point is that you need to be careful about results that are extrapolated from available data.
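One simple safeguard is to have the prediction routine report whether a request falls outside the range of the data used to fit the model. The sketch below assumes Python with NumPy. The temperature and dissolution figures are invented for illustration, and `predict_rate` is a hypothetical helper, not part of any library.

```python
import numpy as np

# Made-up measurements: dissolution rate (g/min) at water temperatures of 20-80 °C.
temps = np.array([20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0])
rates = np.array([1.1, 1.4, 1.8, 2.1, 2.3, 2.8, 3.0])

# Straight-line fit: np.polyfit returns the slope first, then the intercept.
slope, intercept = np.polyfit(temps, rates, 1)

def predict_rate(temp_c):
    """Return (predicted rate, is_extrapolated) for a temperature in °C."""
    inside = bool(temps.min() <= temp_c <= temps.max())
    return slope * temp_c + intercept, not inside

for t in (50.0, 120.0):
    rate, extrapolated = predict_rate(t)
    note = "EXTRAPOLATED, do not trust" if extrapolated else "within observed range"
    print(f"{t:5.1f} °C -> {rate:.2f} g/min ({note})")
```

The formula happily returns a number for 120°C. The flag is what tells you that the number is not supported by the data.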

In the above paragraph, I spoke of independent variables as impacting the dependent variable. Let’s stay with this for a moment. What makes a variable independent? Lack of correlation with any other independent variable. The independent variables need to be independent because the interpretation of each coefficient is the amount by which the dependent variable will change given a one-unit change in that independent variable, with all other factors remaining the same. Now, if two independent variables are correlated, then a change in one will also constitute a change in the other. So, the condition that “all other factors remain the same” is violated. This is called collinearity. To avoid it, do a simple correlation test to make sure that the independent variables are truly independent. If you can derive one independent variable from another, the coefficient estimates become unstable and cannot be interpreted separately. For example, the value of hematocrit is roughly three times the value of hemoglobin. As an independent variable, use either one, but not both.
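The simple correlation test can be sketched as follows, assuming Python with NumPy. The lab values are invented, `correlated_pairs` is a hypothetical helper, and the 0.9 cut-off is an arbitrary assumption rather than an established rule.

```python
import numpy as np

# Made-up candidate predictors for eight patients.
predictors = {
    "hemoglobin": [11.2, 12.5, 13.1, 14.0, 15.3, 12.0, 13.8, 14.6],
    "hematocrit": [33.9, 37.2, 39.5, 41.8, 46.1, 36.3, 41.1, 43.9],  # roughly 3 x hemoglobin
    "age":        [54, 61, 47, 70, 38, 66, 59, 45],
}

def correlated_pairs(columns, threshold=0.9):
    """List pairs of predictors whose absolute Pearson correlation reaches the threshold."""
    names = list(columns)
    corr = np.corrcoef(np.array([columns[name] for name in names], dtype=float))
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                flagged.append((names[i], names[j], float(corr[i, j])))
    return flagged

for first, second, r in correlated_pairs(predictors):
    print(f"{first} and {second} are collinear (r = {r:.3f}); keep only one of them")
```

A pairwise check like this catches the hematocrit and hemoglobin case. It will not catch a variable that is a combination of several others, which needs a fuller diagnostic.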

A few words on the importance of the error term. Start by considering the functional form of a “True Model”, which explains all the variation in the dependent variable. It means that the independent variables completely describe the process. Alas, life is not that easy. Chances are, you’ll have missing variables (latent variables), measurement error, missing data, etc. Hence the error term.
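In standard textbook notation, the idea can be written as below. The symbols are assumptions introduced here for illustration: `y` is the dependent variable, `x` the independent variables, the betas the true coefficients, `b` their estimates, and epsilon the error term.

```latex
% Model with an error term that absorbs missing variables, measurement error, etc.
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i

% Fitted model and its residuals, which estimate the unobserved errors
\hat{y}_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \dots + b_k x_{ik},
\qquad e_i = y_i - \hat{y}_i
```

In a “True Model” every error term would be zero. In practice it is not, and the residuals are the only view of the error term that the data provide.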

Moving on, two important questions to ask are:

- How well does the model fit the data? You can find this out from the coefficient of determination, otherwise known as the R-square. It represents the proportion of variability in the data that is explained by the model. The higher the coefficient of determination, the higher the confidence that the functional form is correct.
- How well does the data fit the model? This is otherwise known as the analysis of residuals. Few, very few people look at this. The most important aspect is that the residuals are normally distributed, which stems from the stochastic nature of random variables. Regression residuals are actually estimates of the true error, just like the regression coefficients are estimates of the true population coefficients. Residuals can be thought of as elements of variation unexplained by the fitted model. Since this is a form of error, the same general assumptions that hold for errors apply to the residuals: they are expected to be normally and independently distributed, with a mean of 0 and constant variance. Randomness and unpredictability of the residuals are crucial components of any regression model. Otherwise, your model is not valid.
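Both questions can be checked in a few lines. The sketch below assumes Python with NumPy and simulated data, so the "true" coefficients 2 and 3 are assumptions of the example. `fit_ols` and `residual_summary` are hypothetical helper names. The residual summary is only a rough screen. A formal normality test or a residual plot would be the next step.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns coefficients, residuals, R-square."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ beta
    ss_res = np.sum(residuals ** 2)            # variation left unexplained
    ss_tot = np.sum((y - y.mean()) ** 2)       # total variation in the data
    return beta, residuals, 1.0 - ss_res / ss_tot

def residual_summary(residuals):
    """Rough checks: mean, skewness and excess kurtosis (all near 0 for normal residuals)."""
    centered = residuals - residuals.mean()
    sd = centered.std()
    return {
        "mean": float(residuals.mean()),
        "skewness": float(np.mean(centered ** 3) / sd ** 3),
        "excess_kurtosis": float(np.mean(centered ** 4) / sd ** 4 - 3.0),
    }

# Simulated data: y depends linearly on x, plus normally distributed noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 3.0 * x + rng.normal(0, 1, size=200)

beta, residuals, r2 = fit_ols(x, y)
summary = residual_summary(residuals)
print(f"R-square: {r2:.3f}")
print({name: round(value, 3) for name, value in summary.items()})
```

A high R-square together with residuals that show marked skew, heavy tails, or a visible pattern against the fitted values is a sign that the functional form is missing something.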

One word about betas (the coefficients in front of each independent variable). As mentioned earlier, each beta indicates the amount by which the dependent variable will change due to a one-unit change in that particular independent variable, all other variables remaining unchanged. Yes, it is important enough to be repeated.

My last point in this brief post on regression (for a complete description, look for my upcoming book) has to do with the functional form specification. The functional form is correctly specified if and only if there is a conceptual rationale for including each of the independent variables in the model. For instance: a model was developed having as one of the independent variables the square of the age of the participant. Not the actual age, but the square of the age. Although it explained 4% of the variability in the model, to this day I cannot imagine what relevance the square of the age may have to the outcome. Hence, be on the lookout for independent variables that do not make sense. Some ignorant (let’s call them ignorant because it is kinder than unscrupulous) modelers will include every available variable in order to increase the coefficient of determination. The question you need to ask is: ‘Is it logical to assume that this variable has an actual impact on the outcome, or is it just a spurious association?’
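The reason this trick works is mechanical: in ordinary least squares with an intercept, adding a variable can never lower the R-square, even when the variable is pure noise. The sketch below, assuming Python with NumPy and simulated data, adds ten meaningless columns to a one-variable model. The adjusted R-square shown alongside is one common penalty for extra variables. It is brought in here as an illustration and is not a substitute for the conceptual rationale discussed above. All names and numbers are assumptions of the example.

```python
import numpy as np

def r_squared(X, y):
    """R-square of an ordinary least squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    ss_res = np.sum((y - X1 @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_obs, n_predictors):
    """R-square penalized for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Simulated data: the outcome truly depends on age only.
rng = np.random.default_rng(1)
n = 60
age = rng.uniform(30, 80, size=n)
outcome = 5.0 + 0.4 * age + rng.normal(0, 4, size=n)
junk = rng.normal(size=(n, 10))    # ten columns of pure noise

r2_small = r_squared(age, outcome)
r2_big = r_squared(np.column_stack([age, junk]), outcome)
adj_small = adjusted_r_squared(r2_small, n, 1)
adj_big = adjusted_r_squared(r2_big, n, 11)

print(f"age only:        R2 = {r2_small:.3f}, adjusted R2 = {adj_small:.3f}")
print(f"age + 10 noise:  R2 = {r2_big:.3f}, adjusted R2 = {adj_big:.3f}")
```

The larger model always reports an R-square at least as high as the smaller one, which is exactly why a rising coefficient of determination, by itself, says nothing about whether a variable belongs in the model.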

In conclusion, let me go over the best linear unbiased estimator (BLUE) one more time:

- Best means that there is none better: no other linear unbiased estimator has a smaller variance.
- Linear means that the relationship between the independent and dependent variables is in fact linear.
- Unbiased means that there is no systematic error.
- Estimator refers to the betas, but you already knew that.

*Consider the Fallacy of Survey Data*

Surveys are as good as the people who validate them. There is an entire science dedicated to developing and validating such instruments. Unfortunately, often they are cobbled together by people who have no idea as to how to go about it. Worse, conclusions are drawn from, and decisions are based on the data from such questionnaires.

I remember the following example given by Dr. Steve Albert, my doctoral adviser: “Say you give 100 people the following four distinct choices: ‘Is the moon made of yellow, blue, red or white cheese?’ and 80% of them reply ‘yellow’. Does this mean that the moon is made of yellow cheese?” Think about it.

Now, think about the implications of acting on the belief that the moon is made of yellow cheese. Publishing these data would cause ripples in the cheese markets. To aggravate the situation, private space exploration companies announce their intention to mine the moon for cheese and deliver it to markets on Earth at very low prices, given the abundance of this product. Cheese makers will ask for protection, demanding that:

- Only cheese produced on Earth can be labeled as “cheese”.
- Only cheese produced on Earth can bear the label “Appellation d’origine contrôlée” (protected designation of origin).
- Any other similar product would have to bear the label “cheese-like substance”.
- Moon cheese-like substance will be subject to high importation tariffs.

Granted, it is a bit of a stretch, but it illustrates a point: what matters is not only which questions are asked, but also how they are formulated, the order in which they are placed, and the respondent burden. The last one pertains to how much time one can expect a person to spend reading and answering a questionnaire before they lose interest and start checking boxes at random, just to be done with the task.

Designing questionnaires is both an art and a science. It requires the imagination to create something new as well as the scientific rigor to produce valid data. In this process, the first and most important step is to determine what concept is being measured, and then to decide how to measure it, since measuring the wrong concept with maximum accuracy is just as useless as measuring the right concept without any.

*Consider PRO Chaos*

Notoriously noisy, patient-reported outcome (PRO) data present unique analysis challenges. As with any other type of data, the objective of PRO data analysis is to find a pattern that models the data. This pattern may not be obvious, but once found, a definite structure can be modeled that would explain the phenomenon being studied. In Chaos Theory, mathematical studies of dynamic systems demonstrate that the solutions of simple non-linear equations can produce complex behavior over time. These studies posit the concept of a strange attractor, which has the property of modeling the decay of the system to a steady state. However, this steady state is rather complex and not. Nevertheless, upon convergence, it will emerge as the solution of a probabilistic set of equations. When applied to PRO data, these equations can assist in separating a signal (should it exist) from the noise.
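As a textbook illustration of how a simple non-linear equation can either settle to a steady state or behave erratically, here is a minimal sketch of the logistic map in plain Python. This is a standard classroom example chosen for this purpose, not a model of PRO data and not the method referred to above. The parameter values 2.8 and 3.9 and the helper name `logistic_orbit` are assumptions of the example. Note also that the steady state at 2.8 is an ordinary fixed point, a far simpler object than a strange attractor.

```python
def logistic_orbit(r, x0=0.2, n_steps=1000, keep=20):
    """Iterate x -> r * x * (1 - x) and return the last `keep` values."""
    x = x0
    tail = []
    for step in range(n_steps):
        x = r * x * (1.0 - x)
        if step >= n_steps - keep:
            tail.append(x)
    return tail

settled = logistic_orbit(2.8)    # decays to the steady state 1 - 1/r
erratic = logistic_orbit(3.9)    # same equation, but the values never settle

print("r = 2.8:", [round(x, 4) for x in settled[-4:]])
print("r = 3.9:", [round(x, 4) for x in erratic[-4:]])
```

The same one-line rule produces a flat line for one parameter value and what looks like noise for another, which is the sense in which apparent noise may still hide a deterministic structure.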

In a separate but similar line of thinking, signal processing algorithms have been used by Philips to review data and extract its specific mathematical features. Those features, in turn, may provide evidence of a measurable effect of a compound or treatment.

## Finally,

**Consider the Blunt Axe Theorem:** “If the axe felled the tree, it was sharp enough”. In our case, we need to ensure that the models are not needlessly complicated. Transparency is the key to designing intuitive models that can be easily understood and implemented. Simplicity is not always possible, but we should strive for it nevertheless.