## Multiple Regression

## Testing Multiple Linear Restrictions: the F-test

The **t-test** tests whether an unknown population parameter is equal to a given constant (in some cases, we test if a coefficient is equal to 0 – in other words, if the independent variable is **individually significant**).

The **F-test** tests whether **a group of variables** has an effect on y, meaning we test if these variables are **jointly significant**.

Looking at the **t-ratios** for *"bavg," "hrunsyr,"* and *"rbisyr,"* we can see that **none of them is individually statistically different from 0.** However, in this case, we are not interested in their *individual* significance on y; **we are interested in their *joint significance* on y.** (Their individual t-ratios may be small because of multicollinearity.) Therefore, we need to conduct the F-test.

SSR_{UR }= 183.186327 (SSR of Unrestricted Model)

SSR_{R}=198.311477 (SSR of Restricted Model)

SSR stands for **Sum of Squares of Residuals**. *Residual is the difference between the actual y and the predicted y from the model.* Therefore, the smaller SSR is, the better the model is.

From the data above, we can see that *after we drop the group of variables* (*bavg*, *hrunsyr*, and *rbisyr*), *SSR increases from about 183 to 198*, roughly 8.3%. The F-test tells us whether this increase is large enough to conclude that we should keep those 3 variables.

**q**: number of restrictions (the number of independent variables dropped). In this case, q = 3.

**k**: number of independent variables

q: numerator degrees of freedom

n-k-1: denominator degrees of freedom

The F statistic compares the two SSRs:

**F = [(SSR_{R} – SSR_{UR}) / q] / [SSR_{UR} / (n-k-1)]**
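To double-check the arithmetic, we can plug the two SSRs into the F formula in plain Python. One caveat: the unrestricted model having k = 5 regressors (so n-k-1 = 347) is my assumption; it is not shown in the output above, but it is consistent with the F of 9.55 that STATA reports.

```python
# F test for joint significance, computed from the two SSRs.
# SSR values are from the example above; k = 5 is an assumption
# about the unrestricted model (not shown in the post's output).
ssr_ur = 183.186327  # unrestricted model (all variables)
ssr_r = 198.311477   # restricted model (bavg, hrunsyr, rbisyr dropped)
n, k, q = 353, 5, 3  # observations, regressors, restrictions

f_stat = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
print(round(f_stat, 2))  # 9.55
```

The numerator measures how much worse the fit gets per restriction; the denominator scales it by the unrestricted model's error variance.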

In order to find the critical F value, we can look it up in an F table. I have also found a convenient website for critical F values: http://www.danielsoper.com/statcalc/calc04.aspx.

We can calculate F in STATA by using the command

**test bavg hrunsyr rbisyr**

From the output, our F statistic is **9.55**.

**NOTE**: When we calculate the F test, **we need to make sure that our unrestricted and restricted models are estimated from the same set of observations**. We can check by looking at **the number of observations** in each model and making sure they are the same. Sometimes there are *missing values* in our data, so *there may be fewer observations in the unrestricted model* (since we account for more variables) *than in the restricted model* (using fewer variables).

In our example, our observations are **353** for both unrestricted and restricted models.

**If the number of observations differs, we have to re-estimate the restricted model** (the model after dropping some variables) using the same observations used to estimate the unrestricted model (the original model).

Back to our example, **if our observations were different in the two models**, we would re-estimate the restricted model with an *if* qualifier on the regression command, keeping only observations where the dropped variables are not missing:

**if bavg~=.** means if bavg is not missing,

**if bavg~=. & hrunsyr~=. & rbisyr~=.** means if bavg, hrunsyr, and rbisyr are **ALL** not missing (notice the "**&**" sign). That means if even one of these variables is missing, STATA will not take that observation into account while running the regression.
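To see why the two samples can differ in the first place, here is a tiny made-up illustration in plain Python (not STATA; all names and numbers are hypothetical). Regression software drops any observation with a missing value in a variable the model uses, so the model with more variables can end up with fewer usable rows:

```python
# Hypothetical mini-dataset; None plays the role of STATA's "." (missing).
rows = [
    {"salary": 1.2, "years": 3, "bavg": 0.280},
    {"salary": 2.5, "years": 5, "bavg": None},   # bavg is missing here
    {"salary": 3.0, "years": 7, "bavg": 0.310},
]

def usable(rows, cols):
    """Observations with no missing value in any variable the model uses."""
    return [r for r in rows if all(r[c] is not None for c in cols)]

# the restricted model uses fewer variables, so it keeps more rows
print(len(usable(rows, ["salary", "years"])))          # 3
print(len(usable(rows, ["salary", "years", "bavg"])))  # 2
```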

========================================

==============================

===============

There is one special case of the F-test in which we test the **overall significance** of a model. In other words, we want to know **if the regression model is useful at all**, or whether we should throw it out and consider other variables. This is rarely the case, though.
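For this special case the restricted model is just y = β_{0} + u, so the statistic has a convenient R² form: F = (R²/k) / ((1 − R²)/(n-k-1)). A quick sketch with made-up numbers (the R², n, and k below are purely for illustration):

```python
# Overall-significance F statistic in its R-squared form.
# H0: all slope coefficients are 0; the restricted model is y = b0 + u.
# The R2, n, and k below are made-up illustration numbers.
r2, n, k = 0.60, 100, 4
f_overall = (r2 / k) / ((1 - r2) / (n - k - 1))
print(round(f_overall, 1))  # 35.6
```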

**~~~~~~~~~~~000~~~~~~~~~~~**

This was such a painful and lengthy post. It has so many formulas that I had to do it in Microsoft Word and then convert it into several pictures…. I hope I made sense, though. =)

Let’s just keep in mind that **the F test is for joint significance**. That means we want to see whether or not **a group of variables should be kept in the model.**

Also, unlike the t distribution (a bell-shaped curve), **the F distribution is skewed to the right**, with the smallest possible value being 0. Therefore, we **reject the null hypothesis** if the **F statistic** (from the formula) **is greater than the critical F** (from the F table).
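As a tiny decision-rule sketch: the rejection region sits entirely in the upper tail, so the test is a one-sided comparison. (The 5% critical value of roughly 2.60 for (3, 347) degrees of freedom is my approximate table lookup; check your own table.)

```python
# Upper-tail decision rule: F >= 0 and the rejection region is
# entirely in the right tail of the F distribution.
def f_decision(f_stat, f_crit):
    return "reject H0" if f_stat > f_crit else "fail to reject H0"

# our example: F = 9.55 vs. an approximate 5% critical value of 2.60
print(f_decision(9.55, 2.60))  # reject H0
```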

## Single Linear Combinations of Parameters

**Single Linear Combinations of Parameters** means we test a linear relationship between two parameters in our multiple regression analysis. The simplest case can be

**H_{0}: β_{1} = β_{2}**

Or **H_{0}: β_{1} = 10β_{2}**

Our hypothesis can be pretty much anything, as long as β_{1} and β_{2} have a linear relationship.

**Note** that we are testing whether **the effects of the two x variables on y** have **a linear relationship**, **NOT** whether **the two x variables** have a **linear relationship with each other** (that is the case of *perfect multicollinearity*).

For example, we are interested in testing

**H_{0}: β_{1} = β_{2}**

**H_{1}: β_{1} ≠ β_{2}**

**FIRST METHOD**

Set **θ = β _{1} – β_{2}**, then we will have

**H_{0}: θ = 0**

H_{1}: θ ≠ 0

–**Set α** (if not given, assume it to be .05)

–**Find critical value**: df=n-k-1 (k is the number of x variables), then use the t-table to find critical value.

–**Calculate test statistic**:

**t^{0} = θ̂ / se(θ̂)**, where **se(θ̂) = √[Var(β̂_{1}) + Var(β̂_{2}) – 2Cov(β̂_{1}, β̂_{2})]**

*(A worked numerical example of this step is in my class notes.)*

–**Decision**: to reject H_{0} or not (by comparing t^{0} with the critical value)

–**Conclusion**:

If we **reject H_{0}** (β_{1} = β_{2}), we will conclude that **β_{1} is statistically different from β_{2} at the α level**.

If we **fail to reject H_{0}** (β_{1} = β_{2}), we will conclude that **β_{1} is not statistically different from β_{2} at the α level**.

=========
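Since the original screenshots with the numbers did not survive, here is the arithmetic for this method with made-up values (the coefficients, variances, and covariance below are purely hypothetical, not from the post's regression output):

```python
import math

# First-method t statistic for H0: beta1 = beta2, via theta = beta1 - beta2.
# All numbers below are made-up illustration values, not real output.
b1, b2 = 0.50, 0.30            # estimated coefficients
var_b1, var_b2 = 0.010, 0.008  # their estimated variances
cov_b1_b2 = 0.002              # their estimated covariance

theta_hat = b1 - b2
se_theta = math.sqrt(var_b1 + var_b2 - 2 * cov_b1_b2)
t0 = theta_hat / se_theta
print(round(t0, 2))  # 1.69 -- compare this with the t critical value
```

The covariance term is the whole reason this method is painful in practice: standard regression output does not print Cov(β̂_{1}, β̂_{2}), so you have to dig it out of the variance-covariance matrix.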

Crazy enough, huh? There is another method that may look easier:

**SECOND METHOD**

-Set **θ = β_{1} – β_{2}**, then **β_{1} = θ + β_{2}**

-Substitute **β_{1}** in our original model by **θ + β_{2}**:

y = β_{0} + **β_{1}**x_{1} + β_{2}x_{2} + β_{3}x_{3} + u

y = β_{0} + **(θ + β_{2})**x_{1} + β_{2}x_{2} + β_{3}x_{3} + u = β_{0} + θ**x_{1}** + β_{2}(**x_{1} + x_{2}**) + β_{3}x_{3} + u

Now our 3 variables in the model are **x_{1}**, **x_{1} + x_{2}**, and **x_{3}**.

-Construct a new variable that is the sum of x_{1} and x_{2} (in STATA) by using the command

**gen totx12 = x1 + x2**

(*"totx12" is just the name of the new variable.*)

-Run the regression of y on x_{1}, totx12, and x_{3}

-Test:

**H_{0}: θ = 0**

H_{1}: θ ≠ 0

Now we can look at the t-ratio or p-value of the coefficient on x_{1} (**the coefficient on x_{1} is now θ**) and then decide whether or not to reject H_{0}.
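As a sanity check on this trick, here is a hedged plain-Python sketch (a tiny hand-rolled OLS on made-up data, since we cannot run STATA here): the coefficient on x_{1} in the regression of y on x_{1}, x_{1} + x_{2}, and x_{3} comes out equal to β̂_{1} – β̂_{2} from the original regression.

```python
def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for A x = b."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c and m[r][c] != 0.0:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(xs, y):
    """OLS coefficients via the normal equations (X'X)b = X'y."""
    k = len(xs[0])
    xtx = [[sum(row[i] * row[j] for row in xs) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(xs, y)) for i in range(k)]
    return solve(xtx, xty)

# made-up data: columns are (intercept, x1, x2, x3)
data = [(1, 1, 2, 0), (1, 2, 1, 1), (1, 3, 4, 0), (1, 4, 2, 1),
        (1, 5, 5, 0), (1, 6, 3, 1), (1, 7, 7, 1), (1, 8, 4, 0)]
y = [2.0, 3.1, 5.2, 5.9, 8.1, 8.8, 11.3, 11.0]

b = ols(data, y)                                        # original model
reparam = [(c0, x1, x1 + x2, x3) for (c0, x1, x2, x3) in data]
t = ols(reparam, y)                                     # coef on x1 is theta

print(abs(b[1] - b[2] - t[1]) < 1e-6)  # True
```

The identity holds because the two design matrices span the same columns; the reparameterization just relabels the coefficients, so the t-ratio on x_{1} in the second regression is exactly the test of θ = 0.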