CHAPTER 9

Bayesian Hypothesis Testing

9.1 INTRODUCTION

In this chapter we discuss Bayesian hypothesis testing. We begin with some historical background regarding how hypothesis testing has been treated in science in the past, and show how the Bayesian approach to the subject has really provided the statistical basis for its development. We then discuss some of the problems that have plagued frequentist methods of hypothesis testing during the twentieth century. We will treat two Bayesian approaches to the subject: (1) the vague prior approach of Lindley, which is somewhat limited but easy to implement; and (2) the very general approach of Jeffreys, which is the currently accepted Bayesian method of hypothesis testing, although it is somewhat more complicated to carry out.

9.2 A BRIEF HISTORY OF SCIENTIFIC HYPOTHESIS TESTING

There is considerable evidence of ad hoc tests of hypotheses that were developed to serve particular applications (especially in astronomy) as science developed. But there had been no underlying theory that could serve as the basis for generating appropriate tests in general until Bayes' theorem was expounded. Moreover, researchers had difficulty applying the theorem even when they wanted to use it.

Karl Pearson (1892) initiated the development of a formal theory of hypothesis testing with his development of chi-squared testing for multinomial proportions. He liked the idea of applying Bayes' theorem to test hypotheses, but he could not quite figure out how to generate prior distributions to support the Bayesian approach. Moreover, he did not recognize that consideration of one or more alternative hypotheses might be relevant for testing a basic scientific hypothesis.

[From Subjective and Objective Bayesian Statistics: Principles, Models, and Applications, by S. James Press. Copyright © 2003 by John Wiley and Sons, Inc.]

"Student" (William Sealy Gosset, 1908), in developing his t-test for the mean of a normal distribution and in his work with the
sample correlation coefficient, claimed that he would have preferred to use Bayes' theorem (he referred to it as "inverse probability"), but he did not know how to set his prior distribution.

Fisher (1925) developed a formal theory of hypothesis testing that would serve for a variety of scientific situations, although Fisher, like Karl Pearson, also did not consider alternative hypotheses; that modification would wait for Neyman and Pearson (1933). Fisher attempted to develop an approach that would be objective in some sense, and would compare the actual observed data with how data might look if they were generated randomly.

Fisher's approach to scientific hypothesis testing was totally non-Bayesian; it was based upon the thinking of his time, which was dominated by the influential twentieth-century philosopher Karl Popper (1935, 1959). Popper advocated a theory of falsification, or refutability, to test scientific theories. As Popper saw it, a scientific theory should be tested by examining evidence that could, in principle, refute, disconfirm, or falsify the theory. Popper's idea for testing a scientific theory was for the scientist to set up a "strawman" hypothesis (a hypothesis opposite to what the scientist truly believes, but under consideration to see if it can be destroyed), and then show that the strawman hypothesis is indeed false, so that the theory based upon the strawman hypothesis could be discarded. Otherwise, one had to wait for additional evidence before proceeding to accept the strawman hypothesis. Fisher adopted this falsification/strawman position.

For example, Popper suggests that a scientist might believe that he/she has a strong theory about why some phenomenon takes place. The scientist might set up a hypothesis that implies that the phenomenon takes place, say, at random. The hypothesis of randomness is then tested (that is the strawman hypothesis). The scientist may then find that empirical data do not support the randomness hypothesis, so it must be rejected. But the scientist's real hypothesis, the one that
he/she believes in, cannot yet be accepted as correct, if that is the alternative hypothesis; it will require more testing before it can be accepted.

This process was formalized by Fisher: begin with a null hypothesis (a hypothesis that the researcher believes a priori to be false), and then carry out an experiment that will generate data that will show the null hypothesis to be false. Fisher proposed a test of significance in which, if an appropriate test statistic exceeds some special calculated value based upon a preassigned significance level, the null hypothesis is rejected. But if it turns out that the null hypothesis cannot be rejected, no conclusion is drawn. For Fisher, there was no criterion for accepting a hypothesis: in science we never know when a theory is true in some sense; we can only show that a theory may be false, because we can find contradictions or inconsistencies in its implications.

Fisher also suggested that one could alternatively compute a p-value, that is, the probability of observing the actually observed value of the test statistic, or anything more extreme, assuming the null hypothesis is true. Some frequentists also think of the p-value as a "sample significance level." We will discuss below how p-values relate to posterior probabilities.

To the extent that scientists should reject a hypothesis (theory) if experimental data suggest that the alternative hypothesis (theory) is more probable, this is very sensible; it is the way science proceeds. So Fisher appropriately suggested that we not accept the alternative hypothesis when we reject the null hypothesis; we should just postpone a decision until we have better information. We will see that in Bayesian hypothesis testing we compute the weight of the experimental evidence as measured by how probable it makes the main hypothesis relative to the alternative hypothesis.

The Popper/Fisher approach to hypothesis testing was not probability based. That
is, probabilities were not placed on the hypothesis being tested. For Fisher, one could not place a probability on a hypothesis, because a hypothesis is not a random variable in the frequentist sense. For a Bayesian, however, there is no problem placing a probability on a hypothesis: the truthfulness of the hypothesis is unknown, so a researcher can put his/her subjective probability on it to express his/her degree of uncertainty about its truthfulness. We will see how this is done in the Jeffreys approach to Bayesian hypothesis testing.

Jerzy Neyman and Egon Pearson, in a series of papers starting in 1928, developed a theory of hypothesis testing that modified and extended Fisher's ideas in various ways (see, for example, Neyman and Pearson, 1966, where these papers are collected). Neyman and Pearson introduced the idea of alternative hypotheses. In addition, they introduced the notions of Type One and Type Two errors that could be made in testing statistical hypotheses. They defined the concept of power of a test, and proposed that the ratio of the likelihood of the null hypothesis to the likelihood of the alternative hypothesis be used to compare a simple null hypothesis against a simple alternative hypothesis (the Neyman-Pearson Lemma). But the theory was all still embedded in the frequentist falsification notions of Popper and Fisher.

Wald (1939) proposed a theory of decision making that incorporated statistical inference problems of hypothesis testing (we will be discussing decision theory in Chapter 11). This theory suggested, as did the idea of Neyman and Pearson, that hypotheses should be tested on the basis of their consequences. Bayesian methodology shaped the development of this approach to hypothesis testing, in that it was found that Bayesian-derived decision-making procedures are the ones that generate the very best decision rules (see Chapter 11). When it was found to be difficult to assign prior distributions to develop the optimal procedures, given the limited development of prior
distribution theory at that time, alternative minimax and other procedures were developed to generate useful decision rules. But the Bayes procedures, that is, procedures found by minimizing the expected loss (or maximizing the expected utility) of a decision rule, were seen to be the best decision rules that could be found in contexts in which the decision maker had to make decisions in the face of uncertain outcomes of experiments that were decided by nature; this was shown in Savage (1954). In contexts (games) in which decisions were to be made with respect to other human decision makers, minimax or some other non-expected-loss criterion might be appropriate. There is additional discussion in Chapter 11.

Lehmann (1959) summarized the Fisher/Neyman-Pearson/Wald frequentist methods of hypothesis testing in his book on hypothesis testing. Throughout, he used the long-run frequency interpretation of probability, rather than the Bayesian or subjective probability notion.

In keeping with Bayesian thinking, Good (1950, 1965) proposed and further codified the testing process by suggesting that, to compare scientific theories, scientists should examine the "weight of the evidence" favoring each of them, and he showed that this concept is well defined in terms of a conditioning on prior information. Good (1983) developed these ideas further.

This was the state of statistical and scientific hypothesis testing until Jeffreys (1961) and Lindley (1965) proposed their Bayesian approaches, which are outlined in Sections 9.4 and 9.5.

9.3 PROBLEMS WITH FREQUENTIST METHODS OF HYPOTHESIS TESTING

There are a variety of problems, difficulties, and inconsistencies associated with frequentist methods of testing hypotheses that are overcome by using Bayesian methods. Some of these problems are enumerated below.

1. Bayesians have infrequent need to test. To begin, hypothesis testing per se is something Bayesian scientists do only infrequently. The reason is that once the posterior distribution is
found, it contains all the information usually required about an unknown quantity. The posterior distribution can and should be used to learn, and to modify or update earlier-held beliefs and judgments; we do not normally need to go beyond the posterior distribution. In some situations, however, such as where the researcher is attempting to decide whether some empirical data conform to a particular theory, or when the researcher is trying to distinguish among two or more theories that might reasonably explain some empirical data, the Bayesian researcher does need to test among several hypotheses or theories. We also need to go beyond the posterior distribution in experimental design situations (see, for example, Raiffa and Schlaifer, 1961, who discuss the value of sample information and preposterior analysis).

2. Problems with probabilities on hypotheses. The frequentist approach to hypothesis testing does not permit researchers to place probabilities of being correct on the competing hypotheses. This is because of the limitations on mathematical probabilities used by frequentists. For the frequentist, probabilities can only be defined for random variables, and hypotheses are not random variables (they are not observable). But Bayesian subjective probability is defined for all unknowns, and the truthfulness of the hypotheses is unknown. This limitation for frequentists is a real drawback, because the applied researcher would really like to be able to place a degree of belief on the hypothesis; he or she would like to see how the weight of evidence modifies his/her degree of belief (probability) of the hypothesis being true. It is the subjective probabilities of the competing hypotheses being true that are compared in the subjective Bayesian approach. Objective Bayesians, as well as frequentists, have problems in hypothesis testing with odds ratios, as with Bayes' factors, because odds ratios are not defined when improper prior probabilities
are used in both the numerator and denominator of the odds ratio.

3. Problems with preassigned significance levels. Frequentist methods of hypothesis testing require that a level of significance of the test (such as 5 percent) be preassigned. But that level of significance is quite arbitrary; it could just as easily be some other arbitrary value, such as 1 percent or 2 percent. Where should the line be drawn and still have the result be "significant" statistically for a frequentist? The concept is not well defined. In the Bayesian approach we completely obviate the necessity of assigning such arbitrary levels of significance: if the weight of the evidence favors one hypothesis over another, that is all we need to know to decide in favor of that hypothesis.

4. Inadequacy of frequentist testing of a sharp null hypothesis. Suppose we wish to test H0: θ = θ0 versus H1: θ ≠ θ0 for some known θ0, and we decide to base our test on a statistic T = T(X1, ..., Xn). It is usually the case that, in addition to θ being unknown, we cannot be certain that θ = θ0 precisely, even if it may be close to θ0. In fact, in the usual frequentist approach we often start out believing θ ≠ θ0 (which corresponds to some intervention having had an effect), but we test the null hypothesis H0 that θ = θ0; that is, we start out by disbelieving θ = θ0, and then we test it. Suppose θ is actually ε away from θ0, for some ε > 0, and ε is very small. Then, by consistency of the testing procedure, for sufficiently large n we will reject H0 with probability equal to one. So, depending upon whether we want to reject H0, or whether we want to find that we cannot reject H0, we can choose n accordingly. This is a very unsatisfactory situation. The same argument applies to all significance testing.

5. Frequentist use of possible values never observed. Jeffreys (1961, p. 385) points out that frequentist hypothesis testers have to rely upon values of observables never observed. He says: "What the use of the p-value implies, therefore, is that a hypothesis that may be true may be rejected because it has not predicted
observable results that have not occurred." Jeffreys is reacting to the fact that the frequentist divides the sample space into a critical region, for which the null hypothesis will be rejected if the test statistic falls into it, and a complementary region, for which the null hypothesis will not be rejected if the test statistic falls into it. But these regions contain values of possible test statistics never actually observed, so the test depends upon values that are observable but have never actually been observed. Tests that depend upon values that have never actually been observed violate the likelihood principle (see Chapter 3).

6. Problems with p-values. The p-value is generally the value reported by researchers for the statistical significance of an experiment they carried out. If p is very low (say, p ≤ 0.05), the result found in the experiment is considered statistically significant by frequentist standards. But the p-value depends upon the sample size used in the experiment; by taking a sufficiently large sample size, we can generally achieve a small p-value.

Berger and Selke (1987) and Casella and Berger (1987) compared p-values with the posterior distribution for the same problem. They found, for that problem, that the evidence against the null hypothesis based upon the posterior distribution is generally weaker than that reflected by the p-value. That is, the p-value suggests rejecting the null hypothesis more often than the Bayesian approach would suggest. In this sense, the Bayesian hypothesis test is more conservative than is the frequentist hypothesis test.

The example used by Berger and Selke (1987) is presented in the following. Suppose the probability density function for an observable X is given by f(x | θ). We are interested in testing the sharp null hypothesis H0: θ = θ0 versus the alternative hypothesis H1: θ ≠ θ0. Let T(X) denote an appropriate test statistic, and let p denote the p-value. While the result will be general, for concreteness we will make the example very specific. Suppose
we have a sample of independent and identically distributed data X1, ..., Xn, with Xi | θ ~ N(θ, σ²) for known σ², i = 1, ..., n. The usual test statistic (sufficient in this problem) is the sample mean x̄. Define

g = (√n) |x̄ − θ0| / σ.

The p-value becomes

p = 2[1 − Φ(g)],     (9.2)

where Φ denotes the cdf of the standard normal distribution.

Suppose, in the interest of fairness, a Bayesian scientist assigns 50 percent prior probability to H0 and 50 percent prior probability to H1, but spreads the mass on H1 out according to a N(θ0, σ²) distribution. The posterior probability on the null hypothesis is shown below to be given by

P(H0 | x̄) = [1 + (n + 1)^(−1/2) exp{n g² / (2(n + 1))}]^(−1).     (9.5)

Proof. By Bayes' theorem,

P(H0 | x̄) = P(H0) f(x̄ | H0) / [P(H0) f(x̄ | H0) + P(H1) f(x̄ | H1)],

where f(x̄ | H0) is the N(θ0, σ²/n) density at x̄ and, since marginally x̄ | H1 ~ N(θ0, σ²(n + 1)/n), f(x̄ | H1) is the N(θ0, σ²(n + 1)/n) density at x̄. Simplifying gives the result in Equation (9.5).

We provide the values of the posterior probability P(H0 | x̄) in Table 9.1, from which it may be seen, for example, that for n = 50 the frequentist researcher could reject at p = 0.05 (5 percent), since g = 1.96, whereas the posterior probability is P(H0 | x̄) = 0.52, so actually H0 is favored over the alternative. At n = 50 the frequentist approach to hypothesis testing suggests that the null hypothesis be rejected at the 5 percent level of significance, whereas from a Bayesian point of view, for n = 50 or more, the posterior probability says that we should not reject the null hypothesis. So for the Bayesian hypothesis tester, the evidence against the null hypothesis is weaker.

Table 9.1  Values of the Posterior Probability P(H0 | x̄)

p-value   g       n=1     n=5     n=10    n=20    n=50    n=100   n=1000
0.100     1.645   0.42    0.44    0.47    0.56    0.65    0.72    0.89
0.050     1.960   0.35    0.33    0.37    0.42    0.52    0.60    0.82
0.010     2.576   0.21    0.13    0.14    0.16    0.22    0.27    0.53
0.001     3.291   0.086   0.026   0.024   0.026   0.034   0.045   0.124
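The posterior probability in Equation (9.5) and the entries of Table 9.1 are easy to check numerically. The following sketch (Python, not part of the original text; the function names are mine) reproduces the g = 1.96 row of the table alongside the corresponding two-sided p-value:

```python
import math

# Berger-Selke comparison behind Table 9.1: test H0: theta = theta0 against
# H1: theta != theta0 with Xi | theta ~ N(theta, sigma^2), prior mass 1/2 on
# H0 and the other 1/2 spread as N(theta0, sigma^2).  With
# g = sqrt(n) |xbar - theta0| / sigma, Equation (9.5) gives the posterior
# probability of H0.

def posterior_prob_null(g, n):
    # P(H0 | xbar) = [1 + (n+1)^(-1/2) exp{n g^2 / (2(n+1))}]^(-1)
    return 1.0 / (1.0 + math.exp(n * g * g / (2.0 * (n + 1))) / math.sqrt(n + 1))

def two_sided_p_value(g):
    # p = 2 [1 - Phi(g)], written via the complementary error function
    return math.erfc(g / math.sqrt(2.0))

g = 1.96   # "significant at the 5 percent level" for the frequentist
print(round(two_sided_p_value(g), 3))
for n in (1, 5, 10, 20, 50, 100, 1000):
    print(n, round(posterior_prob_null(g, n), 2))
```

Running this recovers the g = 1.96 row of Table 9.1 (0.35, 0.33, 0.37, 0.42, 0.52, 0.60, 0.82), making the sample-size effect visible: the same "p = 0.05" evidence leaves ever more posterior probability on the null as n grows.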
9.4 LINDLEY'S VAGUE PRIOR PROCEDURE FOR BAYESIAN HYPOTHESIS TESTING

A procedure for testing a hypothesis H0 against an alternative hypothesis H1 from a Bayesian point of view was suggested by Lindley (1965, Vol. 2, p. 65). The test procedure is readily understood through a simple example. Suppose we have independent and identically distributed data X1, ..., Xn, with Xi | θ ~ N(θ, 1), i = 1, ..., n. We wish to test H0: θ = θ0 versus H1: θ ≠ θ0. We first recognize that x̄ is sufficient for θ, and that x̄ | θ ~ N(θ, 1/n). Adopt the vague prior density for θ, g(θ) ∝ constant. The result is that the posterior density for θ is given by θ | x̄ ~ N(x̄, 1/n). Next, develop a credibility interval for θ at level of credibility α, where, say, α = 5 percent. This result is

P{x̄ − 1.96/√n ≤ θ ≤ x̄ + 1.96/√n | x̄} = 95 percent.     (9.6)

Now examine whether the interval includes the null hypothesis value θ0. If θ0 is not included within this 95 percent credibility interval, this is considered evidence against the null hypothesis, and H0 is rejected. Alternatively, if θ0 is found to lie within the 95 percent credibility interval, we cannot reject the null hypothesis.

Lindley (1965) actually proposed this frequentist-like Bayesian testing procedure by using a frequentist confidence interval approach instead of the credibility interval approach outlined here, but under a vague prior density for θ the approaches are equivalent. We note here that these ideas result from attempts to avoid totally abandoning the frequentist notions of Popper/Fisher/Neyman-Pearson: it is still required that we preassign a level of significance for the test, we still adopt a strawman null hypothesis, and we still do not place probabilities on the competing hypotheses.

The procedure also requires that we adopt a vague prior density for the unknown θ, and, for the credibility interval to be sensible, the prior density must be smooth in the vicinity of θ0. However, suppose we have meaningful prior information about θ and we would like to bring that information to bear on the problem; the procedure does not afford us any way to bring the information into the problem. Even worse, if the prior information is mixed, continuous and discrete, the testing approach will not be applicable. Regardless, for simple situations where vague prior densities are sensible, the Lindley hypothesis testing procedure provides a rapid and easily applied testing method.
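Lindley's vague-prior test reduces to a single interval check. A minimal sketch, with invented numbers and my own function name:

```python
import math

# Lindley's vague-prior test of H0: theta = theta0 vs H1: theta != theta0
# for Xi | theta ~ N(theta, 1): under the vague prior the posterior is
# theta | xbar ~ N(xbar, 1/n), so reject H0 exactly when theta0 falls
# outside the 95 percent credibility interval xbar +/- 1.96/sqrt(n).

def lindley_test(xbar, n, theta0, z=1.96):
    half_width = z / math.sqrt(n)
    lower, upper = xbar - half_width, xbar + half_width
    reject = not (lower <= theta0 <= upper)
    return lower, upper, reject

# Invented data: xbar = 0.5 with n = 20 observations, testing theta0 = 0.
lo, hi, reject = lindley_test(xbar=0.5, n=20, theta0=0.0)
print(round(lo, 3), round(hi, 3), reject)   # interval excludes 0, so reject H0
```

With the same n, a smaller observed mean (say x̄ = 0.3) gives an interval that covers 0, and H0 cannot be rejected, illustrating how the decision turns entirely on the interval check.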
A more general hypothesis testing procedure is found in the Jeffreys (1961) approach described in Section 9.5.

9.4.1 The Lindley Paradox

The Bayesian approach to hypothesis testing when little prior information is available received substantial interest when Lindley (1957) called attention to the paradoxical result that a frequentist scientist could strongly reject a sharp null hypothesis H0, while a Bayesian scientist could put a lump of prior probability on H0, spread the remaining prior probability out over all other values "in a vague way" (uniformly), and find that there are high posterior odds in favor of H0. This paradox is equivalent to the discussion surrounding Table 9.1.

9.5 JEFFREYS' PROCEDURE FOR BAYESIAN HYPOTHESIS TESTING

Jeffreys (1961, Chapters 5 and 6) suggested a totally Bayesian approach to hypothesis testing that circumvents the inadequacies of the frequentist test procedures. This procedure is outlined below.

9.5.1 Testing a Simple Null Hypothesis Against a Simple Alternative Hypothesis

First we consider the case of testing a simple null hypothesis H0: θ = θ0 against a simple alternative hypothesis H1: θ = θ1, where θ0 and θ1 are preassigned constants (recall that a simple hypothesis is one for which there is only one possible value for the unknown). We assume that H0 and H1 are mutually exclusive and exhaustive hypotheses. Let T = T(X1, ..., Xn) denote an appropriate test statistic based upon a sample of n observations. Then, by Bayes' theorem, the posterior probability of H0, given the observed data T, is

P(H0 | T) = P(H0) P(T | H0) / [P(H0) P(T | H0) + P(H1) P(T | H1)],     (9.7)

where P(H0) and P(H1) denote the researcher's prior probabilities of H0 and H1. Similarly, for hypothesis H1 we have

P(H1 | T) = P(H1) P(T | H1) / [P(H0) P(T | H0) + P(H1) P(T | H1)].     (9.8)

Note that P(H0 | T) + P(H1 | T) = 1. Equations (9.7) and (9.8) can be combined to form the ratio

P(H0 | T) / P(H1 | T) = [P(H0) / P(H1)] [P(T | H0) / P(T | H1)].     (9.9)

Recall that if two probabilities sum to one, their ratio is called the odds in favor of the event whose probability is in the numerator of the ratio. Therefore, Equation (9.9) may be interpreted to state that the posterior odds ratio in favor of H0 is equal to the product of
the prior odds ratio in favor of H0 and the likelihood ratio.

Jeffreys' Hypothesis Testing Criterion

The Jeffreys criterion for hypothesis testing becomes, in a natural way: if the posterior odds ratio exceeds unity, we accept H0; otherwise, we reject H0 in favor of H1. It is not necessary to specify any particular level of significance; we merely accept or reject the null hypothesis on the basis of which posterior probability is greater, or, equivalently, on the basis of whether the posterior odds ratio is greater or less than one. (If the posterior odds ratio is precisely equal to one, no decision can be made without additional data or additional prior information.) Note that if there were several possible hypotheses, this approach would extend in a natural way: we would find the hypothesis with the largest posterior probability.

REMARK: When we accept the null hypothesis because the weight of the evidence shows the null hypothesis to be favored by the data over the alternative hypothesis, we should recognize that we are merely doing so on the basis that the null hypothesis is the one to be entertained until we have better information or a modified theory. We are not assuming that the null hypothesis is true, merely that, with the present state of knowledge, the null hypothesis is more probable than the alternative hypothesis.

Bayes' Factors

We note from Equation (9.9) that the ratio of the posterior odds ratio to the prior odds ratio, called the Bayes' factor, is a factor that depends only upon the sample data. The Bayes' factor reflects the extent to which the data themselves (without prior information) favor one model over another. In the case of testing a simple null hypothesis against a simple alternative hypothesis, the Bayes' factor is just the likelihood ratio, which is also the frequentist test statistic for comparing two simple hypotheses (the result of the Neyman-Pearson Lemma). It becomes a bit more complicated (instead of just the simple likelihood ratio) in the case
of a simple null hypothesis versus a composite alternative hypothesis. Because the prior odds ratio is often taken to be one by the objective Bayesian, the Bayes' factor acts as an objectivist Bayesian's answer to how to compare models. The subjectivist Bayesian scientist needs only the posterior odds ratio to compare models, which may differ from the Bayes' factor, depending upon the value of the prior odds ratio.

As an example, suppose X | θ ~ N(θ, 1) and we are interested in testing H0: θ = 0 versus H1: θ = 1, where these are the only two possibilities. We take a random sample X1, ..., XN and form the sufficient statistic T = x̄, so that T | θ ~ N(θ, 1/N). We note that T | H0 ~ N(0, 1/N) and T | H1 ~ N(1, 1/N). Assume that a priori P(H0) = P(H1) = 0.5. Then the posterior odds ratio is given by

P(H0 | T) / P(H1 | T) = exp{−(N/2)[x̄² − (x̄ − 1)²]}.     (9.11)

Suppose our sample is of size N = 10, and we find x̄ = 2. Then the posterior odds ratio becomes

P(H0 | T) / P(H1 | T) = exp{−15} ≈ 3.1 × 10^(−7).     (9.12)

Since the posterior odds ratio is so small, we must clearly reject H0 in favor of H1: θ = 1. Because the prior odds ratio is unity in this case, the posterior odds ratio is equal to the Bayes' factor. Note that comparing the posterior odds ratio with unity is equivalent to choosing the larger of the two posterior probabilities of the hypotheses. If we could assign losses to the two possible incorrect decisions, we would choose the hypothesis with the smaller expected loss (see Chapter 11 for the role of loss functions in decision making).
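The computation in Equations (9.11) and (9.12) can be reproduced directly. The sketch below (illustrative only; function and variable names are mine) evaluates the two normal likelihoods for the sufficient statistic and forms the posterior odds ratio:

```python
import math

# Jeffreys' test of the simple hypotheses H0: theta = 0 vs H1: theta = 1 in
# the chapter's example: Xi | theta ~ N(theta, 1), N = 10, xbar = 2, equal
# prior probabilities, so the posterior odds ratio equals the likelihood
# ratio (the Bayes' factor).

def norm_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

N, xbar = 10, 2.0
prior_odds = 0.5 / 0.5
# xbar | H0 ~ N(0, 1/N) and xbar | H1 ~ N(1, 1/N)
bayes_factor = norm_pdf(xbar, 0.0, 1.0 / N) / norm_pdf(xbar, 1.0, 1.0 / N)
posterior_odds = prior_odds * bayes_factor
print(posterior_odds)   # exp(-15), about 3.1e-7: reject H0 in favor of H1
```

Since the normalizing constants of the two densities cancel, the ratio collapses to exp{−(N/2)[x̄² − (x̄ − 1)²]} = exp{−15}, matching Equation (9.12).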
9.5.2 Testing a Simple Null Hypothesis Against a Composite Alternative Hypothesis

Next we consider the more common case of testing a simple hypothesis H0 against a composite hypothesis H1. Suppose there is a parameter θ (possibly vector valued) indexing the distribution of the test statistic T = T(X1, ..., Xn). Then the ratio of the posterior probability of H0 compared with that of H1 is

P(H0 | T) / P(H1 | T) = [P(H0) / P(H1)] [P(T | H0) / ∫ P(T | θ) g(θ) dθ],     (9.13)

where g(θ) denotes the prior density for θ under H1. Thus, the posterior odds ratio in the case of a composite alternative hypothesis is the product of the prior odds ratio times the ratio of the averaged, or marginal, likelihoods under H0 and H1. Note that under H0, because it is a simple hypothesis, the likelihood has only one value, so its average is that value. (If the null hypothesis were also composite, we would need to use an integral average in that case as well.) We assume, of course, that these integrals converge; in the event g(θ) is an improper density, the integrals will not always exist. Note also that in this case the Bayes' factor is the ratio of the likelihood under H0 to the averaged likelihood under H1.

We have assumed there are no additional parameters in the problem. If there are, we deal with them by integrating them out with respect to an appropriate prior distribution. For example, suppose X | θ, σ² ~ N(θ, σ²) and we are interested in testing the hypothesis H0: θ = 0, σ² > 0, versus the alternative hypothesis H1: θ ≠ 0, σ² > 0. If X1, ..., Xn are i.i.d., (x̄, s²) is sufficient for (θ, σ²), where s² is the sample variance. Then the posterior odds ratio for testing H0 versus H1 is

P(H0 | x̄, s²) / P(H1 | x̄, s²) = [P(H0) / P(H1)] [∫ f(x̄, s² | 0, σ²) g2(σ²) dσ² / ∫∫ f(x̄, s² | θ, σ²) g1(θ) g2(σ²) dθ dσ²],     (9.14)

for appropriate prior densities g1(θ) and g2(σ²).

As an example of a simple versus composite hypothesis testing problem in which there are no additional parameters, suppose X | θ ~ N(θ, 1) and we are interested in testing H0: θ = 0 versus H1: θ ≠ 0. We take a random sample X1, ..., XN of size N = 10, form the sufficient statistic T = x̄, and assume x̄ = 2 and P(H0) = P(H1) = 0.5. We note that x̄ | θ ~ N(θ, 1/N). As a prior distribution for θ under H1 we take θ ~ N(1, 1). Averaging the likelihood over this prior, the marginal distribution of x̄ under H1 is N(1, 1 + 1/N), so the posterior odds ratio becomes

P(H0 | x̄) / P(H1 | x̄) = √(N + 1) exp{−(1/2)[N x̄² − N(x̄ − 1)²/(N + 1)]}.

Since N = 10 and x̄ = 2, we have

P(H0 | x̄) / P(H1 | x̄) = √11 exp{−19.55} ≈ 1.1 × 10^(−8).     (9.18)

Thus we reject H0: θ = 0 in favor of H1: θ ≠ 0; that is, the evidence strongly favors the alternative hypothesis H1.
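The same arithmetic can be checked numerically: since the averaged likelihood under H1 is just the N(1, 1 + 1/N) density at x̄, the posterior odds ratio of roughly 1.1 × 10^(−8) falls out of two density evaluations (an illustrative sketch, with my own names):

```python
import math

# The chapter's simple-vs-composite example: H0: theta = 0 against
# H1: theta != 0, with Xi | theta ~ N(theta, 1), N = 10, xbar = 2, equal
# prior probabilities on the hypotheses, and theta ~ N(1, 1) under H1.
# Averaging the likelihood over that prior makes xbar | H1 ~ N(1, 1 + 1/N).

def norm_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

N, xbar = 10, 2.0
m0 = norm_pdf(xbar, 0.0, 1.0 / N)          # likelihood under the sharp null
m1 = norm_pdf(xbar, 1.0, 1.0 + 1.0 / N)    # averaged likelihood under H1
posterior_odds = (0.5 / 0.5) * (m0 / m1)
print(posterior_odds)   # about 1.1e-8: the evidence strongly favors H1
```

Because the prior odds are unity here, this number is also the Bayes' factor: the ratio of the likelihood under H0 to the averaged likelihood under H1.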
9.5.3 Problems with Bayesian Hypothesis Testing with Vague Prior Information

Comparing models, or testing hypotheses, when the prior information about the unknowns is weak or vague presents some difficulties. Note from Equation (9.14) that as long as g1(θ) and g2(σ²) are proper prior densities, the integrals and the Bayes' factor are well defined. The subjectivist Bayesian scientist, who uses subjective information to assess his/her prior distributions, will have no difficulty adopting Jeffreys' method of testing hypotheses, or comparing models and theories. But the objectivist Bayesian has considerable problems, as will be seen from the following discussion.

Suppose, for example, that g2(σ²) is a vague prior density, so that g2(σ²) ∝ constant, where the proportionality constant is arbitrary. In this situation, the ratio of integrals in Equation (9.14) contains an arbitrary ratio of constants, rendering the criterion for decision arbitrary.

A solution to this problem was proposed by Lempers (1971, Section 5.3). He suggested that in such situations the data could be divided into two parts, the first of which is used as a training sample, and the remaining part for hypothesis testing. A two-step procedure results. For the first step, the training sample is used with a vague prior density for the unknowns, and a posterior density is developed in the usual way; this posterior distribution is not used for comparing models. In the second step, the posterior density developed in the first step is used as the (proper) prior density for model comparison with the remaining part of the data. Now there are no hypothesis testing problems. The resulting Bayes' factor is based upon only part of the data (the part remaining after the training portion is extracted), and accordingly is called a partial Bayes' factor.

A remaining problem is how to subdivide the data into two parts. Berger and Pericchi (1996) suggested that the training data portion be determined as the smallest possible data set that would generate a proper posterior distribution for the unknowns (a proper posterior distribution is what is required to make the Lempers proposal operational). There are, of course, many ways to generate such a minimal training data set, and each such training data set would generate a feasible Bayes' factor. Berger and Pericchi call the average of these feasible Bayes' factors the intrinsic Bayes' factor.
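A toy sketch of the partial and intrinsic Bayes' factors, for the problem H0: θ = 0 versus H1: θ unrestricted with Xi | θ ~ N(θ, 1). Under H1 a vague (improper uniform) prior makes the ordinary Bayes' factor undefined, so each single observation serves as a minimal training sample: it turns the vague prior into the proper posterior N(xi, 1), which then acts as the prior for the remaining observations. The data, function names, and grid-integration details below are invented for illustration and are not from the text:

```python
import math

def norm_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def partial_bayes_factor(data, i):
    """B(H0 vs H1) using data[i] as the minimal training sample."""
    rest = [x for j, x in enumerate(data) if j != i]
    # Marginal likelihood of the remaining data under H0: theta = 0.
    m0 = math.prod(norm_pdf(x, 0.0, 1.0) for x in rest)
    # Under H1: average the likelihood of the remaining data over the
    # training-sample posterior N(data[i], 1), via a crude trapezoid grid.
    lo, hi, k = data[i] - 10.0, data[i] + 10.0, 4000
    h = (hi - lo) / k
    m1 = 0.0
    for s in range(k + 1):
        theta = lo + s * h
        w = 0.5 if s in (0, k) else 1.0
        lik = math.prod(norm_pdf(x, theta, 1.0) for x in rest)
        m1 += w * lik * norm_pdf(theta, data[i], 1.0) * h
    return m0 / m1

data = [0.3, -0.5, 1.2, 0.8, -0.1, 0.4]   # invented sample
feasible = [partial_bayes_factor(data, i) for i in range(len(data))]
intrinsic = sum(feasible) / len(feasible)  # arithmetic intrinsic Bayes' factor
print(intrinsic)
```

Each choice of training observation gives a different feasible Bayes' factor; averaging them is the Berger-Pericchi remedy for that arbitrariness. (For this toy model the integral can also be done in closed form, which makes the grid computation easy to verify.)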
If the partial Bayes' factor is robust with respect to which training data set is used, so that the resulting posterior probabilities do not vary much, using the intrinsic Bayes' factor is very reasonable. If the partial Bayes' factor is not robust in this sense, there can still be problems.

O'Hagan (1993) considers the case in which there are very large data sets, so that asymptotic behavior can be used. For such situations he defines a fractional Bayes' factor that depends upon the fraction b of the total sample of data that has not been used for model comparison. It is not clear that the use of fractional Bayes' factors will improve the situation in small or moderate size samples.

The Bayesian (Jeffreys) approach is now the preferred method of comparing scientific theories. For example, in the book by Mathews and Walker (1965, pp. 361-370), which the Preface explains was an outgrowth of lectures by Richard Feynman at Cornell University, Feynman suggests that to compare contending theories in physics one should use the Bayesian approach. This fact was called to my attention by Dr. Carlo Brumat.

SUMMARY

This chapter has presented the Bayesian approach to hypothesis testing and model comparison. We traced the development of scientific hypothesis testing from the approach of Karl Pearson to that of Harold Jeffreys. We showed how the Bayesian approach differs from the frequentist approach, and why there are problems with the frequentist methodology. We introduced both the Lindley vague prior approach to hypothesis testing and the Jeffreys general prior approach to testing and model comparison. We examined the testing of simple null hypotheses against simple alternative hypotheses, as well as the testing of simple versus composite hypotheses. We discussed Bayes' factors, partial Bayes' factors, intrinsic Bayes' factors, and fractional Bayes' factors.

EXERCISES
9.1 Suppose that X1, ..., Xn are independent and identically distributed as N(θ, 4). Test the hypothesis H0: θ = 3, versus the alternative hypothesis H1: θ ≠ 3. Assume that you have observed x̄ = 5, and that your prior probabilities are P(H0) = 0.6 and P(H1) = 0.4. Assume that your prior probability for θ follows the law N(1, 1). Use the Jeffreys testing procedure.

9.2 Give the Bayes factor for Exercise 9.1.

9.3 Suppose the probability mass function for an observable variable X is given by f(x | λ) = e^(−λ) λ^x / x!, x = 0, 1, ..., λ > 0. Your prior density for λ is given by g(λ) = 2e^(−2λ). You observe X = 3. Test the hypothesis H0: λ ≤ 1, versus the alternative hypothesis H1: λ > 1. Assume that your prior probabilities on the hypotheses are P(H0) = P(H1) = 1/2.

9.4 Explain the use of the intrinsic Bayes factor.

9.5 Explain the difference between the Lindley and Jeffreys methods of Bayesian hypothesis testing.

9.6 What is meant by Lindley's paradox?

9.7 Explain some of the problems associated with the use of p-values and significance testing.

9.8 Explain how frequentist hypothesis testing violates the likelihood principle.

9.9 Suppose X1, ..., Xn, n = 50, are i.i.d. observations from N(θ, 7). You observe x̄ = 2. Suppose your prior distribution for θ is vague. Use the Lindley hypothesis testing procedure to test H0: θ = 5, versus H1: θ ≠ 5.

9.10 Suppose that X1, ..., Xn are i.i.d. following the law N(θ, σ²). We assume that σ² is unknown. Form the sample mean and variance, x̄ = (1/n) Σ Xi and s² = (1/n) Σ (Xi − x̄)². You observe x̄ = 5, s² = 37, for n = 50. You adopt the prior distributions θ ~ N(1, 1) and g(σ²) ∝ σ⁻⁴ e^(−2/σ²), with θ and σ² a priori independent. Assume that P(H0) = 0.75, P(H1) = 0.25. Test the hypothesis H0: θ ≤ 3, versus the alternative H1: θ > 3.

9.11 Find the Bayes factor for the hypothesis testing problem in Exercise 9.10.

9.12 Suppose r denotes the number of successes in n trials, and r follows a binomial distribution with parameter p. Carry out a Bayesian test of the hypothesis H: p = 0.2, versus the alternative A: p = 0.8, where these are the only two possibilities. Assume that r = 3 and n = 10, and that the prior probabilities of H and A are equal.

*Solutions for asterisked exercises may be found in Appendix 7.

FURTHER READING

Berger, J. O. and Pericchi, L. R. (1996). "The Intrinsic Bayes Factor for Model Selection and Prediction," J. Am. Statist. Assoc., 91, 109–122.
Berger, J. O. and Sellke, T. (1987). "Testing a Point Null Hypothesis: The Irreconcilability of p-Values and Evidence," J. Am. Statist. Assoc., 82(397), 112–122.
Casella, G. and Berger, R. L. (1987). "Reconciling Bayesian and Frequentist Evidence in the One-Sided Testing Problem," J. Am. Statist. Assoc., 82(397), 106–111.
Fisher, R. A. (1925, 1970). Statistical Methods for Research Workers, 14th Edition, New York: Hafner; Edinburgh: Oliver and Boyd Ltd.
Good, I. J. (1950). Probability and the Weighing of Evidence, London: Charles Griffin and Co.
Good, I. J. (1965). The Estimation of Probabilities: An Essay on Modern Bayesian Methods, Research Monograph 30, Cambridge, MA: The MIT Press.
Good, I. J. (1983). Good Thinking: The Foundations of Probability and Its Applications, Minneapolis: University of Minnesota Press.
Jeffreys, H. (1939, 1948, 1961). Theory of Probability, 3rd Edition, Oxford: The Clarendon Press.
Lehmann, E. L. (1959). Testing Statistical Hypotheses, New York: John Wiley and Sons, Inc.
Lempers, F. B. (1971). Posterior Probabilities of Alternative Linear Models, Rotterdam: Rotterdam University Press.
Lindley, D. V. (1957). "A Statistical Paradox," Biometrika, 44, 187–192.
Lindley, D. V. (1965). Introduction to Probability and Statistics, Part 1: Probability and Part 2: Inference, Cambridge: Cambridge University Press.
Mathews, J. and Walker, R. L. (1965). Mathematical Methods of Physics, New York: W. A. Benjamin, Inc.
Neyman, J. and Pearson, E. S. (1933). "On the Testing of Statistical Hypotheses in Relation to Probability A Priori," Proc. of the Cambridge Phil. Soc., 29, 492–510.
Neyman, J. and Pearson, E. S. (1966). Joint Statistical Papers of J. Neyman and E. S. Pearson (10 papers), Berkeley, CA: University of California Press.
O'Hagan, A. (1993). "Fractional Bayes Factors for Model Comparison," Statistical Research Report 936, University of Nottingham.
Pearson, K. (1892). The Grammar of Science, London: Adam and Charles Black.
Popper, K. (1933, 1959). The Logic of Scientific Discovery, New York: Basic Books; London: Hutchinson.
Raiffa, H. and Schlaifer, R. (1961). Applied Statistical Decision Theory, Boston: Graduate School of Business Administration, Harvard University.
Savage, L. J. (1954). The Foundations of Statistics, New York: John Wiley & Sons, Inc.
"Student" (William Sealy Gosset) (1908). Paper on the Student t-distribution, Biometrika, 6, 1–25.
Wald, A. (1939). "Contributions to the Theory of Statistical Estimation and Testing Hypotheses," Ann. Math. Statist., 10, 299–326.
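As a computational footnote, the Berger–Pericchi training-sample construction described in this chapter can be sketched in code. The following Python sketch is an illustration only, not taken from the book: it assumes a toy comparison of M0: Xi ~ N(0, 1) against M1: Xi ~ N(θ, 1) with a flat (improper) prior on θ, uses each single observation as a minimal training sample to produce the proper posterior θ ~ N(xi, 1), computes the resulting partial Bayes factors on the remaining data, and averages them, following the chapter's description of the intrinsic Bayes factor. The data values are hypothetical.

```python
from math import log, pi, exp

def log_marginal_m1(data, mu, v):
    """Log marginal likelihood of `data` under M1: y ~ N(theta, 1) i.i.d.,
    theta ~ N(mu, v), via the sequential predictive decomposition."""
    lm = 0.0
    for y in data:
        pv = v + 1.0                      # predictive variance of next y
        lm += -0.5 * (log(2 * pi * pv) + (y - mu) ** 2 / pv)
        prec = 1.0 / v + 1.0              # conjugate posterior update
        mu = (mu / v + y) / prec
        v = 1.0 / prec
    return lm

def log_marginal_m0(data):
    """Log likelihood of `data` under the simple model M0: y ~ N(0, 1)."""
    return sum(-0.5 * (log(2 * pi) + y ** 2) for y in data)

def intrinsic_bayes_factor(data):
    """Arithmetic average of the partial Bayes factors for M1 vs. M0.

    Each observation x_i in turn serves as the minimal training sample:
    under a flat prior on theta it yields the proper posterior
    theta ~ N(x_i, 1), which becomes the prior for a partial Bayes
    factor computed on the remaining observations."""
    partial_bfs = []
    for i, xi in enumerate(data):
        rest = data[:i] + data[i + 1:]
        lb = log_marginal_m1(rest, mu=xi, v=1.0) - log_marginal_m0(rest)
        partial_bfs.append(exp(lb))
    return sum(partial_bfs) / len(partial_bfs)

# Hypothetical data far from theta = 0: the averaged factor favors M1.
x = [2.1, 1.9, 2.5, 2.2, 1.8]
print(intrinsic_bayes_factor(x))
```

Because the flat prior's arbitrary constant cancels inside each partial Bayes factor, the averaged quantity is well defined even though the marginal likelihood of the full data under M1 alone is not — which is precisely the difficulty the training-sample device is designed to remove.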