Exercises - DeGroot - Probability Calculus 1

· Statistics ·

Probability Calculus 1

· 2023/2

Probability and Statistics
Fourth Edition

Morris H. DeGroot, Carnegie Mellon University
Mark J. Schervish, Carnegie Mellon University

Addison-Wesley
Boston Columbus Indianapolis New York San Francisco Upper Saddle River Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montréal Toronto Delhi Mexico City São Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo

Editor in Chief: Deirdre Lynch
Acquisitions Editor: Christopher Cummings
Associate Content Editors: Leah Goldberg, Dana Jones Bettez
Associate Editor: Christina Lepre
Senior Managing Editor: Karen Wernholm
Production Project Manager: Patty Bergin
Cover Designer: Heather Scott
Design Manager: Andrea Nix
Senior Marketing Manager: Alex Gay
Marketing Assistant: Kathleen DeChavez
Senior Author Support/Technology Specialist: Joe Vetere
Rights and Permissions Advisor: Michael Joyce
Manufacturing Manager: Carol Melville
Project Management, Composition: Windfall Software, using ZzTEX
Cover Photo: Shutterstock/© Marilyn Volan

The programs and applications presented in this book have been included for their instructional value. They have been tested with care, but are not guaranteed for any particular purpose. The publisher does not offer any warranties or representations, nor does it accept any liabilities with respect to the programs or applications.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Pearson Education was aware of a trademark claim, the designations have been printed in initial caps or all caps. Library of Congress Cataloging-in-Publication Data DeGroot, Morris H., 1931–1989. Probability and statistics / Morris H. DeGroot, Mark J. Schervish.—4th ed. p. cm. ISBN 978-0-321-50046-5 1. Probabilities—Textbooks. 2. Mathematical statistics—Textbooks. I. Schervish, Mark J. II. Title. QA273.D35 2012 519.2—dc22 2010001486 Copyright © 2012, 2002 Pearson Education, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America. For information on obtaining permission for use of material in this work, please submit a written request to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116, fax your request to 617-848-7047, or e-mail at http://www.pearsoned.com/legal/permissions.htm. 
1 2 3 4 5 6 7 8 9 10—EB—14 13 12 11 10
ISBN 10: 0-321-50046-6
ISBN 13: 978-0-321-50046-5
www.pearsonhighered.com
To the memory of Morrie DeGroot.
MJS

Contents

Preface

1 Introduction to Probability
1.1 The History of Probability
1.2 Interpretations of Probability
1.3 Experiments and Events
1.4 Set Theory
1.5 The Definition of Probability
1.6 Finite Sample Spaces
1.7 Counting Methods
1.8 Combinatorial Methods
1.9 Multinomial Coefficients
1.10 The Probability of a Union of Events
1.11 Statistical Swindles
1.12 Supplementary Exercises

2 Conditional Probability
2.1 The Definition of Conditional Probability
2.2 Independent Events
2.3 Bayes’ Theorem
⋆ 2.4 The Gambler’s Ruin Problem
2.5 Supplementary Exercises

3 Random Variables and Distributions
3.1 Random Variables and Discrete Distributions
3.2 Continuous Distributions
3.3 The Cumulative Distribution Function
3.4 Bivariate Distributions
3.5 Marginal Distributions
3.6 Conditional Distributions
3.7 Multivariate Distributions
3.8 Functions of a Random Variable
3.9 Functions of Two or More Random Variables
⋆ 3.10 Markov Chains
3.11 Supplementary Exercises

4 Expectation
4.1 The Expectation of a Random Variable
4.2 Properties of Expectations
4.3 Variance
4.4 Moments
4.5 The Mean and the Median
4.6 Covariance and Correlation
4.7 Conditional Expectation
⋆ 4.8 Utility
4.9 Supplementary Exercises

5 Special Distributions
5.1 Introduction
5.2 The Bernoulli and Binomial Distributions
5.3 The Hypergeometric Distributions
5.4 The Poisson Distributions
5.5 The Negative Binomial Distributions
5.6 The Normal Distributions
5.7 The Gamma Distributions
5.8 The Beta Distributions
5.9 The Multinomial Distributions
5.10 The Bivariate Normal Distributions
5.11 Supplementary Exercises

6 Large Random Samples
6.1 Introduction
6.2 The Law of Large Numbers
6.3 The Central Limit Theorem
6.4 The Correction for Continuity
6.5 Supplementary Exercises

7 Estimation
7.1 Statistical Inference
7.2 Prior and Posterior Distributions
7.3 Conjugate Prior Distributions
7.4 Bayes Estimators
7.5 Maximum Likelihood Estimators
7.6 Properties of Maximum Likelihood Estimators
⋆ 7.7 Sufficient Statistics
⋆ 7.8 Jointly Sufficient Statistics
⋆ 7.9 Improving an Estimator
7.10 Supplementary Exercises

8 Sampling Distributions of Estimators
8.1 The Sampling Distribution of a Statistic
8.2 The Chi-Square Distributions
8.3 Joint Distribution of the Sample Mean and Sample Variance
8.4 The t Distributions
8.5 Confidence Intervals
⋆ 8.6 Bayesian Analysis of Samples from a Normal Distribution
8.7 Unbiased Estimators
⋆ 8.8 Fisher Information
8.9 Supplementary Exercises

9 Testing Hypotheses
9.1 Problems of Testing Hypotheses
⋆ 9.2 Testing Simple Hypotheses
⋆ 9.3 Uniformly Most Powerful Tests
⋆ 9.4 Two-Sided Alternatives
9.5 The t Test
9.6 Comparing the Means of Two Normal Distributions
9.7 The F Distributions
⋆ 9.8 Bayes Test Procedures
⋆ 9.9 Foundational Issues
9.10 Supplementary Exercises

10 Categorical Data and Nonparametric Methods
10.1 Tests of Goodness-of-Fit
10.2 Goodness-of-Fit for Composite Hypotheses
10.3 Contingency Tables
10.4 Tests of Homogeneity
10.5 Simpson’s Paradox
⋆ 10.6 Kolmogorov-Smirnov Tests
⋆ 10.7 Robust Estimation
⋆ 10.8 Sign and Rank Tests
10.9 Supplementary Exercises

11 Linear Statistical Models
11.1 The Method of Least Squares
11.2 Regression
11.3 Statistical Inference in Simple Linear Regression
⋆ 11.4 Bayesian Inference in Simple Linear Regression
11.5 The General Linear Model and Multiple Regression
11.6 Analysis of Variance
⋆ 11.7 The Two-Way Layout
⋆ 11.8 The Two-Way Layout with Replications
11.9 Supplementary Exercises

12 Simulation
12.1 What Is Simulation?
12.2 Why Is Simulation Useful?
12.3 Simulating Specific Distributions
12.4 Importance Sampling
⋆ 12.5 Markov Chain Monte Carlo
12.6 The Bootstrap
12.7 Supplementary Exercises

Tables
Answers to Odd-Numbered Exercises
References
Index

Preface

Changes to the Fourth Edition

• I have reorganized many main results that were included in the body of the text by labeling them as theorems, to make these results easier for students to find and reference.
• I have pulled the important definitions and assumptions out of the body of the text and labeled them as such so that they stand out better.
• When a new topic is introduced, I introduce it with a motivating example before delving into the mathematical formalities. Then I return to the example to illustrate the newly introduced material.
• I moved the material on the law of large numbers and the central limit theorem to a new Chapter 6. It seemed more natural to deal with the main large-sample results together.
• I moved the section on Markov chains into Chapter 3. Every time I cover this material with my own students, I stumble over not being able to refer to random variables, distributions, and conditional distributions. I have actually postponed this material until after introducing distributions, and then gone back to cover Markov chains. I feel that the time has come to place it in a more natural location. I also added some material on stationary distributions of Markov chains.
• I have moved the lengthy proofs of several theorems to the ends of their respective sections in order to improve the flow of the presentation of ideas.
• I rewrote Section 7.1 to make the introduction to inference clearer.
• I rewrote Section 9.1 as a more complete introduction to hypothesis testing, including likelihood ratio tests. For instructors not interested in the more mathematical theory of hypothesis testing, it should now be easier to skip from Section 9.1 directly to Section 9.5.

Some other changes that readers will notice:

• I have replaced the notation in which the intersection of two sets A and B had been represented AB with the more popular A ∩ B. The old notation, although mathematically sound, seemed a bit arcane for a text at this level.
• I added the statements of Stirling’s formula and Jensen’s inequality.
• I moved the law of total probability and the discussion of partitions of a sample space from Section 2.3 to Section 2.1.
• I define the cumulative distribution function (c.d.f.) as the preferred name of what used to be called only the distribution function (d.f.).
• I added some discussion of histograms in Chapters 3 and 6.
• I rearranged the topics in Sections 3.8 and 3.9 so that simple functions of random variables appear first and the general formulations appear at the end to make it easier for instructors who want to avoid some of the more mathematically challenging parts.
• I emphasized the closeness of a hypergeometric distribution with a large number of available items to a binomial distribution.
• I gave a brief introduction to Chernoff bounds. These are becoming increasingly important in computer science, and their derivation requires only material that is already in the text.
• I changed the definition of confidence interval to refer to the random interval rather than the observed interval. This makes statements less cumbersome, and it corresponds to more modern usage.
• I added a brief discussion of the method of moments in Section 7.6.
• I added brief introductions to Newton’s method and the EM algorithm in Chapter 7.
• I introduced the concept of pivotal quantity to facilitate construction of confidence intervals in general.
• I added the statement of the large-sample distribution of the likelihood ratio test statistic. I then used this as an alternative way to test the null hypothesis that two normal means are equal when it is not assumed that the variances are equal.
• I moved the Bonferroni inequality into the main text (Chapter 1) and later (Chapter 11) used it as a way to construct simultaneous tests and confidence intervals.

How to Use This Book

The text is somewhat long for complete coverage in a one-year course at the undergraduate level and is designed so that instructors can make choices about which topics are most important to cover and which can be left for more in-depth study. As an example, many instructors wish to deemphasize the classical counting arguments that are detailed in Sections 1.7–1.9. An instructor who only wants enough information to be able to cover the binomial and/or multinomial distributions can safely discuss only the definitions and theorems on permutations, combinations, and possibly multinomial coefficients. Just make sure that the students realize what these values count; otherwise the associated distributions will make no sense. The various examples in these sections are helpful, but not necessary, for understanding the important distributions. Another example is Section 3.9 on functions of two or more random variables. The use of Jacobians for general multivariate transformations might be more mathematics than the instructors of some undergraduate courses are willing to cover. The entire section could be skipped without causing problems later in the course, but some of the more straightforward cases early in the section (such as convolution) might be worth introducing.
The material in Sections 9.2–9.4 on optimal tests in one-parameter families is pretty mathematics, but it is of interest primarily to graduate students who require a very deep understanding of hypothesis testing theory. The rest of Chapter 9 covers everything that an undergraduate course really needs.

In addition to the text, the publisher has an Instructor’s Solutions Manual, available for download from the Instructor Resource Center at www.pearsonhighered.com/irc, which includes some specific advice about many of the sections of the text.

I have taught a year-long probability and statistics sequence from earlier editions of this text for a group of mathematically well-trained juniors and seniors. In the first semester, I covered what was in the earlier edition but is now in the first five chapters (including the material on Markov chains) and parts of Chapter 6. In the second semester, I covered the rest of the new Chapter 6, Chapters 7–9, Sections 11.1–11.5, and Chapter 12. I have also taught a one-semester probability and random processes course for engineers and computer scientists. I covered what was in the old edition and is now in Chapters 1–6 and 12, including Markov chains, but not Jacobians. This latter course did not emphasize mathematical derivation to the same extent as the course for mathematics students.

A number of sections are designated with an asterisk (*). This indicates that later sections do not rely materially on the material in that section. This designation is not intended to suggest that instructors skip these sections. Skipping one of these sections will not cause the students to miss definitions or results that they will need later. The sections are 2.4, 3.10, 4.8, 7.7, 7.8, 7.9, 8.6, 8.8, 9.2, 9.3, 9.4, 9.8, 9.9, 10.6, 10.7, 10.8, 11.4, 11.7, 11.8, and 12.5.
Aside from cross-references between sections within this list, occasional material from elsewhere in the text does refer back to some of the sections in this list. Each of the dependencies is quite minor, however. Most of the dependencies involve references from Chapter 12 back to one of the optional sections. The reason for this is that the optional sections address some of the more difficult material, and simulation is most useful for solving those difficult problems that cannot be solved analytically. Except for passing references that help put material into context, the dependencies are as follows:

• The sample distribution function (Section 10.6) is reintroduced during the discussion of the bootstrap in Section 12.6. The sample distribution function is also a useful tool for displaying simulation results. It could be introduced as early as Example 12.3.7 simply by covering the first subsection of Section 10.6.
• The material on robust estimation (Section 10.7) is revisited in some simulation exercises in Section 12.2 (Exercises 4, 5, 7, and 8).
• Example 12.3.4 makes reference to the material on two-way analysis of variance (Sections 11.7 and 11.8).

Supplements

The text is accompanied by the following supplementary material:

• Instructor’s Solutions Manual contains fully worked solutions to all exercises in the text. Available for download from the Instructor Resource Center at www.pearsonhighered.com/irc.
• Student Solutions Manual contains fully worked solutions to all odd-numbered exercises in the text. Available for purchase from MyPearsonStore at www.mypearsonstore.com. (ISBN-13: 978-0-321-71598-2; ISBN-10: 0-321-71598-5)

Acknowledgments

There are many people that I want to thank for their help and encouragement during this revision. First and foremost, I want to thank Marilyn DeGroot and Morrie’s children for giving me the chance to revise Morrie’s masterpiece.
I am indebted to the many readers, reviewers, colleagues, staff, and people at Addison-Wesley whose help and comments have strengthened this edition. The reviewers were: Andre Adler, Illinois Institute of Technology; E. N. Barron, Loyola University; Brian Blank, Washington University in St. Louis; Indranil Chakraborty, University of Oklahoma; Daniel Chambers, Boston College; Rita Chattopadhyay, Eastern Michigan University; Stephen A. Chiappari, Santa Clara University; Sheng-Kai Chang, Wayne State University; Justin Corvino, Lafayette College; Michael Evans, University of Toronto; Doug Frank, Indiana University of Pennsylvania; Anda Gadidov, Kennesaw State University; Lyn Geisler, Randolph–Macon College; Prem Goel, Ohio State University; Susan Herring, Sonoma State University; Pawel Hitczenko, Drexel University; Lifang Hsu, Le Moyne College; Wei-Min Huang, Lehigh University; Syed Kirmani, University of Northern Iowa; Michael Lavine, Duke University; Rich Levine, San Diego State University; John Liukkonen, Tulane University; Sergio Loch, Grand View College; Rosa Matzkin, Northwestern University; Terry McConnell, Syracuse University; Hans-Georg Mueller, University of California–Davis; Robert Myers, Bethel College; Mario Peruggia, The Ohio State University; Stefan Ralescu, Queens University; Krishnamurthi Ravishankar, SUNY New Paltz; Diane Saphire, Trinity University; Steven Sepanski, Saginaw Valley State University; Hen-Siong Tan, Pennsylvania University; Kanapathi Thiru, University of Alaska; Kenneth Troske, Johns Hopkins University; John Van Ness, University of Texas at Dallas; Yehuda Vardi, Rutgers University; Yelena Vaynberg, Wayne State University; Joseph Verducci, Ohio State University; Mahbobeh Vezveai, Kent State University; Brani Vidakovic, Duke University; Karin Vorwerk, Westfield State College; Bette Warren, Eastern Michigan University; Calvin L. Williams, Clemson University; Lori Wolff, University of Mississippi.

The person who checked the accuracy of the book was Anda Gadidov, Kennesaw State University. I would also like to thank my colleagues at Carnegie Mellon University, especially Anthony Brockwell, Joel Greenhouse, John Lehoczky, Heidi Sestrich, and Valerie Ventura.
The people at Addison-Wesley and other organizations that helped produce the book were Paul Anagnostopoulos, Patty Bergin, Dana Jones Bettez, Chris Cummings, Kathleen DeChavez, Alex Gay, Leah Goldberg, Karen Hartpence, and Christina Lepre.

If I left anyone out, it was unintentional, and I apologize. Errors inevitably arise in any project like this (meaning a project in which I am involved). For this reason, I shall post information about the book, including a list of corrections, on my Web page, http://www.stat.cmu.edu/~mark/, as soon as the book is published. Readers are encouraged to send me any errors that they discover.

Mark J. Schervish
October 2010

Chapter 1
Introduction to Probability

1.1 The History of Probability
1.2 Interpretations of Probability
1.3 Experiments and Events
1.4 Set Theory
1.5 The Definition of Probability
1.6 Finite Sample Spaces
1.7 Counting Methods
1.8 Combinatorial Methods
1.9 Multinomial Coefficients
1.10 The Probability of a Union of Events
1.11 Statistical Swindles
1.12 Supplementary Exercises

1.1 The History of Probability

The use of probability to measure uncertainty and variability dates back hundreds of years. Probability has found application in areas as diverse as medicine, gambling, weather forecasting, and the law.

The concepts of chance and uncertainty are as old as civilization itself. People have always had to cope with uncertainty about the weather, their food supply, and other aspects of their environment, and have striven to reduce this uncertainty and its effects. Even the idea of gambling has a long history. By about the year 3500 B.C., games of chance played with bone objects that could be considered precursors of dice were apparently highly developed in Egypt and elsewhere. Cubical dice with markings virtually identical to those on modern dice have been found in Egyptian tombs dating from 2000 B.C. We know that gambling with dice has been popular ever since that time and played an important part in the early development of probability theory.

It is generally believed that the mathematical theory of probability was started by the French mathematicians Blaise Pascal (1623–1662) and Pierre Fermat (1601–1665) when they succeeded in deriving exact probabilities for certain gambling problems involving dice. Some of the problems that they solved had been outstanding for about 300 years. However, numerical probabilities of various dice combinations had been calculated previously by Girolamo Cardano (1501–1576) and Galileo Galilei (1564–1642).
The theory of probability has been developed steadily since the seventeenth century and has been widely applied in diverse fields of study. Today, probability theory is an important tool in most areas of engineering, science, and management. Many research workers are actively engaged in the discovery and establishment of new applications of probability in fields such as medicine, meteorology, photography from satellites, marketing, earthquake prediction, human behavior, the design of computer systems, finance, genetics, and law. In many legal proceedings involving antitrust violations or employment discrimination, both sides will present probability and statistical calculations to help support their cases.

References

The ancient history of gambling and the origins of the mathematical theory of probability are discussed by David (1988), Ore (1960), Stigler (1986), and Todhunter (1865).

Some introductory books on probability theory, which discuss many of the same topics that will be studied in this book, are Feller (1968); Hoel, Port, and Stone (1971); Meyer (1970); and Olkin, Gleser, and Derman (1980).
Other introductory books, which discuss both probability theory and statistics at about the same level as they will be discussed in this book, are Brunk (1975); Devore (1999); Fraser (1976); Hogg and Tanis (1997); Kempthorne and Folks (1971); Larsen and Marx (2001); Larson (1974); Lindgren (1976); Miller and Miller (1999); Mood, Graybill, and Boes (1974); Rice (1995); and Wackerly, Mendenhall, and Schaeffer (2008).

1.2 Interpretations of Probability

This section describes three common operational interpretations of probability. Although the interpretations may seem incompatible, it is fortunate that the calculus of probability (the subject matter of the first six chapters of this book) applies equally well no matter which interpretation one prefers.

In addition to the many formal applications of probability theory, the concept of probability enters our everyday life and conversation. We often hear and use such expressions as “It probably will rain tomorrow afternoon,” “It is very likely that the plane will arrive late,” or “The chances are good that he will be able to join us for dinner this evening.” Each of these expressions is based on the concept of the probability, or the likelihood, that some specific event will occur.

Despite the fact that the concept of probability is such a common and natural part of our experience, no single scientific interpretation of the term probability is accepted by all statisticians, philosophers, and other authorities. Through the years, each interpretation of probability that has been proposed by some authorities has been criticized by others. Indeed, the true meaning of probability is still a highly controversial subject and is involved in many current philosophical discussions pertaining to the foundations of statistics. Three different interpretations of probability will be described here. Each of these interpretations can be very useful in applying probability theory to practical problems.
The Frequency Interpretation of Probability

In many problems, the probability that some specific outcome of a process will be obtained can be interpreted to mean the relative frequency with which that outcome would be obtained if the process were repeated a large number of times under similar conditions. For example, the probability of obtaining a head when a coin is tossed is considered to be 1/2 because the relative frequency of heads should be approximately 1/2 when the coin is tossed a large number of times under similar conditions. In other words, it is assumed that the proportion of tosses on which a head is obtained would be approximately 1/2.

Of course, the conditions mentioned in this example are too vague to serve as the basis for a scientific definition of probability. First, a “large number” of tosses of the coin is specified, but there is no definite indication of an actual number that would be considered large enough. Second, it is stated that the coin should be tossed each time “under similar conditions,” but these conditions are not described precisely. The conditions under which the coin is tossed must not be completely identical for each toss because the outcomes would then be the same, and there would be either all heads or all tails. In fact, a skilled person can toss a coin into the air repeatedly and catch it in such a way that a head is obtained on almost every toss. Hence, the tosses must not be completely controlled but must have some “random” features.

Furthermore, it is stated that the relative frequency of heads should be “approximately 1/2,” but no limit is specified for the permissible variation from 1/2. If a coin were tossed 1,000,000 times, we would not expect to obtain exactly 500,000 heads. Indeed, we would be extremely surprised if we obtained exactly 500,000 heads. On the other hand, neither would we expect the number of heads to be very far from 500,000. It would be desirable to be able to make a precise statement of the likelihoods of the different possible numbers of heads, but these likelihoods would of necessity depend on the very concept of probability that we are trying to define.

Another shortcoming of the frequency interpretation of probability is that it applies only to a problem in which there can be, at least in principle, a large number of similar repetitions of a certain process. Many important problems are not of this type.
For example, the frequency interpretation of probability cannot be applied directly to the probability that a specific acquaintance will get married within the next two years or to the probability that a particular medical research project will lead to the development of a new treatment for a certain disease within a specified period of time.

The Classical Interpretation of Probability

The classical interpretation of probability is based on the concept of equally likely outcomes. For example, when a coin is tossed, there are two possible outcomes: a head or a tail. If it may be assumed that these outcomes are equally likely to occur, then they must have the same probability. Since the sum of the probabilities must be 1, both the probability of a head and the probability of a tail must be 1/2. More generally, if the outcome of some process must be one of n different outcomes, and if these n outcomes are equally likely to occur, then the probability of each outcome is 1/n.

Two basic difficulties arise when an attempt is made to develop a formal definition of probability from the classical interpretation. First, the concept of equally likely outcomes is essentially based on the concept of probability that we are trying to define. The statement that two possible outcomes are equally likely to occur is the same as the statement that two outcomes have the same probability. Second, no systematic method is given for assigning probabilities to outcomes that are not assumed to be equally likely. When a coin is tossed, or a well-balanced die is rolled, or a card is chosen from a well-shuffled deck of cards, the different possible outcomes can usually be regarded as equally likely because of the nature of the process.
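As a hedged illustration of the classical rule 1/n, the following Python sketch counts equally likely outcomes for a six-sided die and compares the result with a simulated relative frequency, in the spirit of the frequency interpretation. The function names `classical_probability` and `relative_frequency`, the seed, and the number of trials are illustrative assumptions and do not come from the text. The simulation relies on a pseudorandom generator that is itself designed to choose outcomes with equal likelihood, so the sketch illustrates the two interpretations rather than defining probability.

```python
import random
from fractions import Fraction


def classical_probability(outcomes, event):
    """Favourable outcomes divided by all outcomes, assuming equal likelihood."""
    favourable = sum(1 for outcome in outcomes if outcome in event)
    return Fraction(favourable, len(outcomes))


def relative_frequency(outcomes, event, trials, seed=0):
    """Proportion of simulated repetitions whose outcome falls in the event."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(1 for _ in range(trials) if rng.choice(outcomes) in event)
    return hits / trials


die = [1, 2, 3, 4, 5, 6]          # n = 6 outcomes assumed equally likely
even = {2, 4, 6}

p_single = classical_probability(die, {3})   # probability of one face: 1/n
p_even = classical_probability(die, even)    # three favourable faces out of six
freq_even = relative_frequency(die, even, trials=100_000)

print("classical P(face 3):", p_single)
print("classical P(even):  ", p_even)
print("simulated frequency of even:", freq_even)
```

The simulated frequency should be close to, but generally not exactly equal to, the classical value, which mirrors the point made about tossing a coin 1,000,000 times.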
However, when the problem is to guess whether an acquaintance will get married or whether a research project will be successful, the possible outcomes would not typically be considered to be equally likely, and a different method is needed for assigning probabilities to these outcomes.

The Subjective Interpretation of Probability

According to the subjective, or personal, interpretation of probability, the probability that a person assigns to a possible outcome of some process represents her own judgment of the likelihood that the outcome will be obtained. This judgment will be based on each person’s beliefs and information about the process. Another person, who may have different beliefs or different information, may assign a different probability to the same outcome. For this reason, it is appropriate to speak of a certain person’s subjective probability of an outcome, rather than to speak of the true probability of that outcome.

As an illustration of this interpretation, suppose that a coin is to be tossed once. A person with no special information about the coin or the way in which it is tossed might regard a head and a tail to be equally likely outcomes. That person would then assign a subjective probability of 1/2 to the possibility of obtaining a head. The person who is actually tossing the coin, however, might feel that a head is much more likely to be obtained than a tail. In order that people in general may be able to assign subjective probabilities to the outcomes, they must express the strength of their belief in numerical terms. Suppose, for example, that they regard the likelihood of obtaining a head to be the same as the likelihood of obtaining a red card when one card is chosen from a well-shuffled deck containing four red cards and one black card.
Because those people would assign a probability of 4/5 to the possibility of obtaining a red card, they should also assign a probability of 4/5 to the possibility of obtaining a head when the coin is tossed.

This subjective interpretation of probability can be formalized. In general, if people’s judgments of the relative likelihoods of various combinations of outcomes satisfy certain conditions of consistency, then it can be shown that their subjective probabilities of the different possible events can be uniquely determined. However, there are two difficulties with the subjective interpretation. First, the requirement that a person’s judgments of the relative likelihoods of an infinite number of events be completely consistent and free from contradictions does not seem to be humanly attainable, unless a person is simply willing to adopt a collection of judgments known to be consistent. Second, the subjective interpretation provides no “objective” basis for two or more scientists working together to reach a common evaluation of the state of knowledge in some scientific area of common interest.

On the other hand, recognition of the subjective interpretation of probability has the salutary effect of emphasizing some of the subjective aspects of science. A particular scientist’s evaluation of the probability of some uncertain outcome must ultimately be that person’s own evaluation based on all the evidence available. This evaluation may well be based in part on the frequency interpretation of probability, since the scientist may take into account the relative frequency of occurrence of this outcome or similar outcomes in the past. It may also be based in part on the classical interpretation of probability, since the scientist may take into account the total number of possible outcomes that are considered equally likely to occur. Nevertheless, the final assignment of numerical probabilities is the responsibility of the scientist herself.
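The reference-deck comparison used earlier in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names are hypothetical, the red-card probability is computed by the classical rule for equally likely cards, and the tail probability is obtained from the requirement, stated under the classical interpretation, that the probabilities of the possible outcomes sum to 1.

```python
from fractions import Fraction

# Reference experiment: one card drawn from a well-shuffled deck
# of four red cards and one black card (cards assumed equally likely).
reference_deck = ["red"] * 4 + ["black"]
p_red = Fraction(reference_deck.count("red"), len(reference_deck))

# The person judges "head" to be exactly as likely as "red card",
# so the same number becomes the subjective probability of a head.
p_head = p_red

# Consistency: head and tail are the only outcomes, so they sum to 1.
p_tail = 1 - p_head

print("subjective P(head):", p_head)
print("subjective P(tail):", p_tail)
```

A different person comparing the same coin toss with a different reference deck would arrive at a different number, which is the sense in which the probability is personal.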
The subjective nature of science is also revealed in the actual problem that a particular scientist chooses to study from the class of problems that might have been chosen, in the experiments that are selected in carrying out this study, and in the conclusions drawn from the experimental data. The mathematical theory of probability and statistics can play an important part in these choices, decisions, and conclusions.

Note: The Theory of Probability Does Not Depend on Interpretation. The mathematical theory of probability is developed and presented in Chapters 1–6 of this book without regard to the controversy surrounding the different interpretations of the term probability. This theory is correct and can be usefully applied, regardless of which interpretation of probability is used in a particular problem. The theories and techniques that will be presented in this book have served as valuable guides and tools in almost all aspects of the design and analysis of effective experimentation.

1.3 Experiments and Events

Probability will be the way that we quantify how likely something is to occur (in the sense of one of the interpretations in Sec. 1.2). In this section, we give examples of the types of situations in which probability will be used.

Types of Experiments

The theory of probability pertains to the various possible outcomes that might be obtained and the possible events that might occur when an experiment is performed.

Definition 1.3.1 Experiment and Event. An experiment is any process, real or hypothetical, in which the possible outcomes can be identified ahead of time. An event is a well-defined set of possible outcomes of the experiment.

The breadth of this definition allows us to call almost any imaginable process an experiment whether or not its outcome will ever be known. The probability of each event will be our way of saying how likely it is that the outcome of the experiment is in the event. Not every set of possible outcomes will be called an event. We shall be more specific about which subsets count as events in Sec. 1.4.
Probability will be most useful when applied to a real experiment in which the outcome is not known in advance, but there are many hypothetical experiments that provide useful tools for modeling real experiments. A common type of hypothetical experiment is repeating a well-defined task infinitely often under similar conditions. Some examples of experiments and specific events are given next. In each example, the words following “the probability that” describe the event of interest.

1. In an experiment in which a coin is to be tossed 10 times, the experimenter might want to determine the probability that at least four heads will be obtained.
2. In an experiment in which a sample of 1000 transistors is to be selected from a large shipment of similar items and each selected item is to be inspected, a person might want to determine the probability that not more than one of the selected transistors will be defective.
3. In an experiment in which the air temperature at a certain location is to be observed every day at noon for 90 successive days, a person might want to determine the probability that the average temperature during this period will be less than some specified value.
4. From information relating to the life of Thomas Jefferson, a person might want to determine the probability that Jefferson was born in the year 1741.
5. In evaluating an industrial research and development project at a certain time, a person might want to determine the probability that the project will result in the successful development of a new product within a specified number of months.
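To make Definition 1.3.1 concrete, the coin-tossing experiment in item 1 can be sketched in Python. This is only an illustration: the names `sample_space` and `at_least_four_heads` are hypothetical, and the encoding of each outcome as a tuple of "H" and "T" is an assumption made here. No probabilities are assigned, since the text has not yet said how to do so; the sketch only identifies the outcomes and the event as a set of outcomes.

```python
from itertools import product

# Every possible outcome of the experiment: one tuple of 10 tosses,
# each toss recorded as "H" (head) or "T" (tail). This encoding is an
# assumption made for the illustration.
sample_space = list(product("HT", repeat=10))

# The event of interest is a well-defined set of outcomes:
# those sequences that contain at least four heads.
at_least_four_heads = {s for s in sample_space if s.count("H") >= 4}

print(len(sample_space))         # 1024 possible outcomes
print(len(at_least_four_heads))  # 848 outcomes belong to the event
```

The event is fully described by listing which outcomes belong to it, which is exactly the sense in which the words following "the probability that" describe the event of interest.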
The Mathematical Theory of Probability

As was explained in Sec. 1.2, there is controversy in regard to the proper meaning and interpretation of some of the probabilities that are assigned to the outcomes of many experiments. However, once probabilities have been assigned to some simple outcomes in an experiment, there is complete agreement among all authorities that the mathematical theory of probability provides the appropriate methodology for the further study of these probabilities. Almost all work in the mathematical theory of probability, from the most elementary textbooks to the most advanced research, has been related to the following two problems: (i) methods for determining the probabilities of certain events from the specified probabilities of each possible outcome of an experiment and (ii) methods for revising the probabilities of events when additional relevant information is obtained.
These methods are based on standard mathematical techniques. The purpose of the first six chapters of this book is to present these techniques, which, together, form the mathematical theory of probability.

1.4 Set Theory

This section develops the formal mathematical model for events, namely, the theory of sets. Several important concepts are introduced, namely, element, subset, empty set, intersection, union, complement, and disjoint sets.

The Sample Space

Definition 1.4.1 Sample Space. The collection of all possible outcomes of an experiment is called the sample space of the experiment.

The sample space of an experiment can be thought of as a set, or collection, of different possible outcomes; and each outcome can be thought of as a point, or an element, in the sample space. Similarly, events can be thought of as subsets of the sample space.

Example 1.4.1 Rolling a Die. When a six-sided die is rolled, the sample space can be regarded as containing the six numbers 1, 2, 3, 4, 5, 6, each representing a possible side of the die that shows after the roll. Symbolically, we write S = {1, 2, 3, 4, 5, 6}. One event A is that an even number is obtained, and it can be represented as the subset A = {2, 4, 6}. The event B that a number greater than 2 is obtained is defined by the subset B = {3, 4, 5, 6}. ◀

Because we can interpret outcomes as elements of a set and events as subsets of a set, the language and concepts of set theory provide a natural context for the development of probability theory. The basic ideas and notation of set theory will now be reviewed.
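The sample space and events of Example 1.4.1 map directly onto a set type in a programming language. The following Python sketch is one possible encoding, not part of the original text: `S`, `A`, and `B` mirror the symbols in the example, while the helper `occurred` is a hypothetical name introduced here.

```python
# Sample space of Example 1.4.1: the six faces of the die.
S = frozenset({1, 2, 3, 4, 5, 6})

# Events are subsets of S, built here from the conditions that define them.
A = frozenset(s for s in S if s % 2 == 0)  # an even number is obtained
B = frozenset(s for s in S if s > 2)       # a number greater than 2 is obtained

def occurred(event, outcome):
    """Hypothetical helper: an event occurs when the outcome is one of its elements."""
    return outcome in event

print(sorted(A))       # [2, 4, 6]
print(sorted(B))       # [3, 4, 5, 6]
print(occurred(A, 4))  # True: rolling a 4 means A occurred
print(occurred(A, 3))  # False
```

Building each event from its defining condition, rather than typing the elements by hand, reflects the two equivalent views of an event: a condition on the outcome and a subset of the sample space.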
Relations of Set Theory

Let S denote the sample space of some experiment. Then each possible outcome s of the experiment is said to be a member of the space S, or to belong to the space S. The statement that s is a member of S is denoted symbolically by the relation s ∈ S.

When an experiment has been performed and we say that some event E has occurred, we mean two equivalent things. One is that the outcome of the experiment satisfied the conditions that specified that event E. The other is that the outcome, considered as a point in the sample space, is an element of E.

To be precise, we should say which sets of outcomes correspond to events as defined above. In many applications, such as Example 1.4.1, it will be clear which sets of outcomes should correspond to events. In other applications (such as Example 1.4.5 coming up later), there are too many sets available to have them all be events. Ideally, we would like to have the largest possible collection of sets called events so that we have the broadest possible applicability of our probability calculations. However, when the sample space is too large (as in Example 1.4.5) the theory of probability simply will not extend to the collection of all subsets of the sample space. We would prefer not to dwell on this point for two reasons. First, a careful handling requires mathematical details that interfere with an initial understanding of the important concepts, and second, the practical implications for the results in this text are minimal. In order to be mathematically correct without imposing an undue burden on the reader, we note the following.
In order to be able to do all of the probability calculations that we might find interesting, there are three simple conditions that must be met by the collection of sets that we call events. In every problem that we see in this text, there exists a collection of sets that includes all the sets that we will need to discuss and that satisfies the three conditions, and the reader should assume that such a collection has been chosen as the events. For a sample space S with only finitely many outcomes, the collection of all subsets of S satisfies the conditions, as the reader can show in Exercise 12 in this section.

The first of the three conditions can be stated immediately.

Condition 1 The sample space S must be an event.

That is, we must include the sample space S in our collection of events. The other two conditions will appear later in this section because they require additional definitions. Condition 2 is on page 9, and Condition 3 is on page 10.

Definition 1.4.2 Containment. It is said that a set A is contained in another set B if every element of the set A also belongs to the set B. This relation between two events is expressed symbolically by the expression A ⊂ B, which is the set-theoretic expression for saying that A is a subset of B. Equivalently, if A ⊂ B, we may say that B contains A and may write B ⊃ A.

For events, to say that A ⊂ B means that if A occurs then so does B.

The proof of the following result is straightforward and is omitted.

Theorem 1.4.1 Let A, B, and C be events. Then A ⊂ S. If A ⊂ B and B ⊂ A, then A = B. If A ⊂ B and B ⊂ C, then A ⊂ C.

Example 1.4.2 Rolling a Die. In Example 1.4.1, suppose that A is the event that an even number is obtained and C is the event that a number greater than 1 is obtained. Since A = {2, 4, 6} and C = {2, 3, 4, 5, 6}, it follows that A ⊂ C. ◀
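Containment as in Definition 1.4.2 can be checked mechanically for the finite sets of Example 1.4.2. The sketch below is an illustration only, and `is_contained` is a hypothetical helper name. Note that the text's A ⊂ B allows A = B, so it corresponds to Python's `<=` on sets rather than the strict `<`.

```python
# Sample space and events from Examples 1.4.1 and 1.4.2.
S = frozenset(range(1, 7))
A = frozenset({2, 4, 6})        # an even number is obtained
C = frozenset({2, 3, 4, 5, 6})  # a number greater than 1 is obtained

def is_contained(x, y):
    """Return True when every element of x also belongs to y (x ⊂ y in the text's sense)."""
    return x <= y

print(is_contained(A, C))  # True: whenever A occurs, C occurs as well
print(is_contained(C, A))  # False: the outcome 3 is in C but not in A
print(is_contained(A, S))  # True: every event is contained in S

# Spot checks of Theorem 1.4.1 on these particular sets (not a proof).
assert is_contained(A, A)                            # A ⊂ A, so ⊂ is not strict
assert is_contained(A, C) and is_contained(C, S)     # A ⊂ C and C ⊂ S ...
assert is_contained(A, S)                            # ... give A ⊂ S
```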
The Empty Set

Some events are impossible. For example, when a die is rolled, it is impossible to obtain a negative number. Hence, the event that a negative number will be obtained is defined by the subset of S that contains no outcomes.

Definition 1.4.3 Empty Set. The subset of S that contains no elements is called the empty set, or null set, and it is denoted by the symbol ∅.

In terms of events, the empty set is any event that cannot occur.

Theorem 1.4.2 Let A be an event. Then ∅ ⊂ A.

Proof Let A be an arbitrary event. Since the empty set ∅ contains no points, it is logically correct to say that every point belonging to ∅ also belongs to A, or ∅ ⊂ A.

Finite and Infinite Sets

Some sets contain only finitely many elements, while others have infinitely many elements. There are two sizes of infinite sets that we need to distinguish.

Definition 1.4.4 Countable/Uncountable. An infinite set A is countable if there is a one-to-one correspondence between the elements of A and the set of natural numbers {1, 2, 3, ...}. A set is uncountable if it is neither finite nor countable. If we say that a set has at most countably many elements, we mean that the set is either finite or countable.

Examples of countably infinite sets include the integers, the even integers, the odd integers, the prime numbers, and any infinite sequence. Each of these can be put in one-to-one correspondence with the natural numbers.
For example, the following function f puts the integers in one-to-one correspondence with the natural numbers:

f(n) = (n − 1)/2 if n is odd,
f(n) = −n/2 if n is even.

Every infinite sequence of distinct items is a countable set, as its indexing puts it in one-to-one correspondence with the natural numbers. Examples of uncountable sets include the real numbers, the positive reals, the numbers in the interval [0, 1], and the set of all ordered pairs of real numbers. An argument to show that the real numbers are uncountable appears at the end of this section. Every subset of the integers has at most countably many elements.

Operations of Set Theory

Definition 1.4.5 Complement. The complement of a set A is defined to be the set that contains all elements of the sample space S that do not belong to A. The notation for the complement of A is A^c.

In terms of events, the event A^c is the event that A does not occur.

Example 1.4.3 Rolling a Die. In Example 1.4.1, suppose again that A is the event that an even number is rolled; then A^c = {1, 3, 5} is the event that an odd number is rolled. ◀

We can now state the second condition that we require of the collection of events.

Condition 2 If A is an event, then A^c is also an event.

That is, for each set A of outcomes that we call an event, we must also call its complement A^c an event.

A generic version of the relationship between A and A^c is sketched in Fig. 1.1. A sketch of this type is called a Venn diagram.

Figure 1.1 The event A^c.

Some properties of the complement are stated without proof in the next result.

Theorem 1.4.3 Let A be an event. Then

(A^c)^c = A, ∅^c = S, S^c = ∅.

The empty event ∅ is an event.

Definition 1.4.6 Union of Two Sets. If A and B are any two sets, the union of A and B is defined to be the set containing all outcomes that belong to A alone, to B alone, or to both A and B. The notation for the union of A and B is A ∪ B.

Figure 1.2 The set A ∪ B.
The set A ∪ B is sketched in Fig. 1.2. In terms of events, A ∪ B is the event that either A or B or both occur.

The union has the following properties whose proofs are left to the reader.

Theorem 1.4.4 For all sets A and B,

A ∪ B = B ∪ A, A ∪ A = A, A ∪ A^c = S, A ∪ ∅ = A, A ∪ S = S.

Furthermore, if A ⊂ B, then A ∪ B = B.

The concept of union extends to more than two sets.

Definition 1.4.7 Union of Many Sets. The union of n sets A_1, ..., A_n is defined to be the set that contains all outcomes that belong to at least one of these n sets. The notation for this union is either of the following:

A_1 ∪ A_2 ∪ ··· ∪ A_n or ⋃_{i=1}^{n} A_i.

Similarly, the union of an infinite sequence of sets A_1, A_2, ... is the set that contains all outcomes that belong to at least one of the events in the sequence. The infinite union is denoted by ⋃_{i=1}^{∞} A_i.

In terms of events, the union of a collection of events is the event that at least one of the events in the collection occurs.
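The complement and union operations defined in this section can be tried out on the die events of Example 1.4.1. The Python sketch below is illustrative only: `complement` and `union_of` are hypothetical helper names, and the assertions spot-check Theorems 1.4.3 and 1.4.4 on particular finite sets rather than prove them.

```python
# Sample space and events from Example 1.4.1.
S = frozenset(range(1, 7))
A = frozenset({2, 4, 6})     # an even number is obtained
B = frozenset({3, 4, 5, 6})  # a number greater than 2 is obtained
EMPTY = frozenset()          # the empty set

def complement(event, sample_space=S):
    """Outcomes of the sample space that do not belong to the event."""
    return sample_space - event

def union_of(events):
    """Outcomes that belong to at least one event of a finite collection."""
    return frozenset().union(*events)

print(sorted(complement(A)))  # [1, 3, 5], the event that an odd number is rolled
print(sorted(A | B))          # [2, 3, 4, 5, 6]

# Spot checks of Theorem 1.4.3 on these sets.
assert complement(complement(A)) == A
assert complement(EMPTY) == S and complement(S) == EMPTY

# Spot checks of Theorem 1.4.4 on these sets.
assert A | B == B | A
assert A | A == A
assert A | complement(A) == S
assert A | EMPTY == A
assert A | S == S

# Union of more than two sets.
print(sorted(union_of([{1}, {2}, {1, 3}])))  # [1, 2, 3]
```

Only finite collections can be handled this way; the union of an infinite sequence of sets has no direct counterpart in this sketch.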
Podemos agora enunciar a condicdo final necessdria para a colecdo de conjuntos que We can now state the final condition that we require for the collection of sets chamamos de eventos. that we call events. . - , , Us sas ; . oO 4: Doenca SeA1, A2,...€ uma colegdo contavel de eventos, entdo eu=1Aeutambém é um evento. Condition If Ay, Az, ... is a countable collection of events, then );", A; is also an event. 3 3 Por outras palavras, se decidirmos chamar cada conjunto de resultados numa qualquer colecgdo contavel de In other words, if we choose to call each set of outcomes in some countable collection acontecimento, seremos obrigados a chamar também a sua unido de acontecimento. Nos fazemosndo an event, we are required to call their union an event also. We do not require that exigem que a unido de uma colecao arbitraria de eventos seja um evento. Para ficar claro, vamos£User um the union of an arbitrary collection of events be an event. To be clear, let J be an conjunto arbitrario que usamos para indexar uma colecdo geral de eventos {Aeu: eu FU}.A unido dos eventos arbitrary set that we use to index a general collection of events {A; :i € 7}. The union nesta colecdo é 0 conjunto de resultados queU estado em pelo menos um dos of the events in this collection is the set of outcomes that are in at least one of the atéUs na colegdo. A notacdo para esta unido é eueEUAeu. NOs ndo events in the collection. The notation for this union is );-,; A;. We do not require exigimesdssaser um evento, a menos que£Ué contavel. that L;-<, A; be an event unless / is countable. A condicdo 3 refere-se a uma colecao contavel de eventos. Podemos provar que a condicaéo Condition 3 refers to a countable collection of events. We can prove that the também se aplica a todo conjunto finito de eventos. condition also applies to every finite collection of events. Teorema A unido de um numero finito de eventosA1,..., Ané um evento. 
Theorem 1.4.5 The union of a finite number of events $A_1, \ldots, A_n$ is an event.

Proof For each $m = n+1, n+2, \ldots$, define $A_m = \emptyset$. Because $\emptyset$ is an event, we now have a countable collection $A_1, A_2, \ldots$ of events. It follows from Condition 3 that $\bigcup_{m=1}^{\infty} A_m$ is an event. But it is easy to see that $\bigcup_{m=1}^{\infty} A_m = \bigcup_{m=1}^{n} A_m$.

The union of three events $A$, $B$, and $C$ can be constructed either directly from the definition of $A \cup B \cup C$ or by first evaluating the union of any two of the events and then forming the union of this combination of events and the third event. In other words, the following result is true.

Theorem 1.4.6 Associative Property. For every three events $A$, $B$, and $C$, the following associative relations are satisfied:
\[ A \cup B \cup C = (A \cup B) \cup C = A \cup (B \cup C). \]

Definition 1.4.8 Intersection of Two Sets. If $A$ and $B$ are any two sets, the intersection of $A$ and $B$ is defined to be the set that contains all outcomes that belong both to $A$ and to $B$. The notation for the intersection of $A$ and $B$ is $A \cap B$.
The set $A \cap B$ is sketched in a Venn diagram in Fig. 1.3. In terms of events, $A \cap B$ is the event that both $A$ and $B$ occur.

The proof of the first part of the next result follows from Exercise 3 in this section. The rest of the proof is straightforward.

Figure 1.3 The set $A \cap B$.

Theorem 1.4.7 If $A$ and $B$ are events, then so is $A \cap B$. For all events $A$ and $B$,
\[ A \cap B = B \cap A, \quad A \cap A = A, \quad A \cap A^c = \emptyset, \quad A \cap \emptyset = \emptyset, \quad A \cap S = A. \]
Furthermore, if $A \subset B$, then $A \cap B = A$.

The concept of intersection extends to more than two sets.

Definition 1.4.9 Intersection of Many Sets. The intersection of $n$ sets $A_1, \ldots, A_n$ is defined to be the set that contains the elements that are common to all these $n$ sets. The notation for this intersection is $A_1 \cap A_2 \cap \cdots \cap A_n$ or $\bigcap_{i=1}^{n} A_i$. Similar notations are used for the intersection of an infinite sequence of sets or for the intersection of an arbitrary collection of sets.

In terms of events, the intersection of a collection of events is the event that every event in the collection occurs.
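Because a small sample space has only a few subsets, the identities of Theorem 1.4.7 can be checked exhaustively rather than on a single example. The sketch below is an illustration only: the four-point sample space `S` and the helper names `all_events`, `theorem_1_4_7_holds`, and `intersection_of` are assumptions, not notation from the text.

```python
from itertools import chain, combinations

# Hypothetical four-point sample space (illustrative choice).
S = frozenset({1, 2, 3, 4})
EMPTY = frozenset()


def all_events(sample_space):
    """Every subset of a finite sample space, as a list of frozensets."""
    items = sorted(sample_space)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return [frozenset(subset) for subset in subsets]


def theorem_1_4_7_holds(a, b, sample_space=S):
    """Check the identities of Theorem 1.4.7 for one pair of sets."""
    a_complement = sample_space - a
    return all([
        (a & b) == (b & a),
        (a & a) == a,
        (a & a_complement) == EMPTY,
        (a & EMPTY) == EMPTY,
        (a & sample_space) == a,
        (not a <= b) or (a & b) == a,  # if A is a subset of B, then A n B = A
    ])


def intersection_of(events, sample_space=S):
    """Outcomes common to every set in a finite collection.

    For an empty collection this returns the whole sample space,
    which is a convention assumed here.
    """
    result = sample_space
    for event in events:
        result = result & event
    return result


EVENTS = all_events(S)
ALL_HOLD = all(theorem_1_4_7_holds(a, b) for a in EVENTS for b in EVENTS)
COMMON = intersection_of([frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({3})])
```

The exhaustive loop covers all 16 x 16 pairs of subsets of this particular `S`; it is a sanity check for one sample space, not a substitute for the general proof.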
The following result concerning the intersection of three events is straightforward to prove.

Theorem 1.4.8 Associative Property. For every three events $A$, $B$, and $C$, the following associative relations are satisfied:
\[ A \cap B \cap C = (A \cap B) \cap C = A \cap (B \cap C). \]

Definition 1.4.10 Disjoint/Mutually Exclusive. It is said that two sets $A$ and $B$ are disjoint, or mutually exclusive, if $A$ and $B$ have no outcomes in common, that is, if $A \cap B = \emptyset$. The sets $A_1, \ldots, A_n$ or the sets $A_1, A_2, \ldots$ are disjoint if for every $i \ne j$, we have that $A_i$ and $A_j$ are disjoint, that is, $A_i \cap A_j = \emptyset$ for all $i \ne j$. The events in an arbitrary collection are disjoint if no two events in the collection have any outcomes in common.

In terms of events, $A$ and $B$ are disjoint if they cannot both occur.

As an illustration of these concepts, a Venn diagram for three events $A_1$, $A_2$, and $A_3$ is presented in Fig. 1.4. This diagram indicates that the various intersections of $A_1$, $A_2$, and $A_3$ and their complements will partition the sample space $S$ into eight disjoint subsets.

Figure 1.4 Partition of $S$ determined by three events $A_1$, $A_2$, $A_3$.

Example 1.4.4 Tossing a Coin. Suppose that a coin is tossed three times. Then the sample space $S$ contains the following eight possible outcomes $s_1, \ldots, s_8$:

$s_1$: HHH, $s_2$: THH, $s_3$: HTH, $s_4$: HHT, $s_5$: HTT, $s_6$: THT, $s_7$: TTH, $s_8$: TTT.

In this notation, H indicates a head and T indicates a tail. The outcome $s_3$, for example, is the outcome in which a head is obtained on the first toss, a tail is obtained on the second toss, and a head is obtained on the third toss.

To apply the concepts introduced in this section, we shall define four events as follows: Let $A$ be the event that at least one head is obtained in the three tosses; let $B$ be the event that a head is obtained on the second toss; let $C$ be the event that a tail is obtained on the third toss; and let $D$ be the event that no heads are obtained. Accordingly,
\[ A = \{s_1, s_2, s_3, s_4, s_5, s_6, s_7\}, \quad B = \{s_1, s_2, s_4, s_6\}, \quad C = \{s_4, s_5, s_6, s_8\}, \quad D = \{s_8\}. \]
Various relations among these events can be derived. Some of these relations are $B \subset A$, $A^c = D$, $B \cap D = \emptyset$, $A \cup C = S$, $B \cap C = \{s_4, s_6\}$, $(B \cup C)^c = \{s_3, s_7\}$, and $A \cap (B \cup C) = \{s_1, s_2, s_4, s_5, s_6\}$. ◀

Example 1.4.5 Demands for Utilities. A contractor is building an office complex and needs to plan for water and electricity demand (sizes of pipes, conduit, and wires). After consulting with prospective tenants and examining historical data, the contractor decides that the demand for electricity will range somewhere between 1 million and 150 million kilowatt-hours per day and water demand will be between 4 and 200 (in thousands of gallons per day). All combinations of electrical and water demand are considered possible. The shaded region in Fig. 1.5 shows the sample space for the experiment, consisting of learning the actual water and electricity demands for the office complex. We can express the sample space as the set of ordered pairs $\{(x, y) : 4 \le x \le 200,\ 1 \le y \le 150\}$, where $x$ stands for water demand in thousands of gallons per day and $y$ stands for the electric demand in millions of kilowatt-hours per day. The types of sets that we want to call events include sets like
\[ \{\text{water demand is at least 100}\} = \{(x, y) : x \ge 100\}, \text{ and} \]
\[ \{\text{electric demand is no more than 35}\} = \{(x, y) : y \le 35\}, \]
along with intersections, unions, and complements of such sets. This sample space has infinitely many points. Indeed, the sample space is uncountable. There are many more sets that are difficult to describe and which we will have no need to consider as events. ◀

Figure 1.5 Sample space for water and electric demand in Example 1.4.5 (horizontal axis: Water, 4 to 200; vertical axis: Electric, 1 to 150).

Figure 1.6 Partition of $A \cup B$ in Theorem 1.4.11.

Additional Properties of Sets The proof of the following useful result is left to Exercise 3 in this section.
Theorem 1.4.9 De Morgan's Laws. For every two sets $A$ and $B$,
\[ (A \cup B)^c = A^c \cap B^c \quad \text{and} \quad (A \cap B)^c = A^c \cup B^c. \]

The generalization of Theorem 1.4.9 is the subject of Exercise 5 in this section.

The proofs of the following distributive properties are left to Exercise 2 in this section. These properties also extend in natural ways to larger collections of events.

Theorem 1.4.10 Distributive Properties. For every three sets $A$, $B$, and $C$,
\[ A \cap (B \cup C) = (A \cap B) \cup (A \cap C) \quad \text{and} \quad A \cup (B \cap C) = (A \cup B) \cap (A \cup C). \]

The following result is useful for computing probabilities of events that can be partitioned into smaller pieces. Its proof is left to Exercise 4 in this section, and is illuminated by Fig. 1.6.

Theorem 1.4.11 Partitioning a Set. For every two sets $A$ and $B$, $A \cap B$ and $A \cap B^c$ are disjoint and
\[ A = (A \cap B) \cup (A \cap B^c). \]
In addition, $B$ and $A \cap B^c$ are disjoint, and
\[ A \cup B = B \cup (A \cap B^c). \]

Proof That the Real Numbers Are Uncountable

We shall show that the real numbers in the interval $[0, 1)$ are uncountable. Every larger set is a fortiori uncountable. For each number $x \in [0, 1)$, define the sequence $\{a_n(x)\}_{n=1}^{\infty}$ as follows. First, $a_1(x) = \lfloor 10x \rfloor$, where $\lfloor y \rfloor$ stands for the greatest integer less than or equal to $y$ (round nonintegers down to the closest integer below). Then set $b_1(x) = 10x - a_1(x)$, which will again be in $[0, 1)$. For $n > 1$, $a_n(x) = \lfloor 10 b_{n-1}(x) \rfloor$ and $b_n(x) = 10 b_{n-1}(x) - a_n(x)$. It is easy to see that the sequence $\{a_n(x)\}_{n=1}^{\infty}$ gives a decimal expansion for $x$ in the form
\[ x = \sum_{n=1}^{\infty} a_n(x) 10^{-n}. \tag{1.4.1} \]

Figure 1.7 An array of a countable collection of sequences of digits with the diagonal underlined.

By construction, each number of the form $x = k/10^m$ for some nonnegative integers $k$ and $m$ will have $a_n(x) = 0$ for $n > m$. The numbers of the form $k/10^m$ are the only ones that have an alternate decimal expansion $x = \sum_{n=1}^{\infty} c_n(x) 10^{-n}$.
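The recursion for $a_n(x)$ and $b_n(x)$ above is a short algorithm, and it can be run exactly on rational inputs. The sketch below is an illustration under stated assumptions: the function names `decimal_digits` and `partial_sum` are hypothetical, the starting value $b_0(x) = x$ is a convenience that reproduces $a_1(x) = \lfloor 10x \rfloor$ and $b_1(x) = 10x - a_1(x)$, and `fractions.Fraction` is used so that no floating-point rounding enters.

```python
from fractions import Fraction
from math import floor


def decimal_digits(x, count):
    """Return [a_1(x), ..., a_count(x)] for a rational x in [0, 1).

    Pass x as a Fraction (or an int) so that the arithmetic is exact.
    """
    x = Fraction(x)
    if not 0 <= x < 1:
        raise ValueError("x must lie in [0, 1)")
    digits = []
    b = x  # b_0(x) = x, a convenience not named in the text
    for _ in range(count):
        a = floor(10 * b)  # a_n(x) = floor(10 * b_{n-1}(x))
        b = 10 * b - a     # b_n(x) = 10 * b_{n-1}(x) - a_n(x), again in [0, 1)
        digits.append(a)
    return digits


def partial_sum(digits):
    """Partial sum of (1.4.1): sum of a_n * 10^(-n) over the given digits."""
    return sum(Fraction(a, 10 ** n) for n, a in enumerate(digits, start=1))


THIRD = decimal_digits(Fraction(1, 3), 5)
TERMINATING = decimal_digits(Fraction(37, 100), 5)
```

For $x = 37/100$, which has the form $k/10^m$ with $m = 2$, every digit after the second is 0, as the text states. For $x = 1/3$ the partial sums of (1.4.1) approach $x$ from below, with an error smaller than $10^{-n}$ after $n$ digits.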
When $k$ is not a multiple of 10, this alternate expansion satisfies $c_n(x) = a_n(x)$ for $n = 1, \ldots, m-1$, $c_m(x) = a_m(x) - 1$, and $c_n(x) = 9$ for $n > m$. Let $C = \{0, 1, \ldots, 9\}^{\infty}$ stand for the set of all infinite sequences of digits. Let $B$ denote the subset of $C$ consisting of those sequences that don't end in repeating 9's. Then we have just constructed a function $a$ from the interval $[0, 1)$ onto $B$ that is one-to-one and whose inverse is given in (1.4.1). We now show that the set $B$ is uncountable, hence $[0, 1)$ is uncountable. Take any countable subset of $B$ and arrange the sequences into a rectangular array with the $k$th sequence running across the $k$th row of the array for $k = 1, 2, \ldots$. Figure 1.7 gives an example of part of such an array.

In Fig. 1.7, we have underlined the $k$th digit in the $k$th sequence for each $k$. This portion of the array is called the diagonal of the array. We now show that there must exist a sequence in $B$ that is not part of this array. This will prove that the whole set $B$ cannot be put into such an array, and hence cannot be countable. Construct the sequence $\{d_n\}_{n=1}^{\infty}$ as follows. For each $n$, let $d_n = 2$ if the $n$th digit in the $n$th sequence is 1, and $d_n = 1$ otherwise. This sequence does not end in repeating 9's; hence, it is in $B$. We conclude the proof by showing that $\{d_n\}_{n=1}^{\infty}$ does not appear anywhere in the array. If the sequence did appear in the array, say, in the $k$th row, then its $k$th element would be the $k$th diagonal element of the array. But we constructed the sequence so that for every $n$ (including $n = k$), its $n$th element never matched the $n$th diagonal element. Hence, the sequence can't be in the $k$th row, no matter what $k$ is. The argument given here is essentially that of the nineteenth-century German mathematician Georg Cantor.

Summary

We will use set theory for the mathematical model of events. Outcomes of an experiment are elements of some sample space $S$, and each event is a subset of $S$. Two events both occur if the outcome is in the intersection of the two sets. At least one of a collection of events occurs if the outcome is in the union of the sets. Two events cannot both occur if the sets are disjoint. An event fails to occur if the outcome is in the complement of the set. The empty set stands for every event that cannot possibly occur. The collection of events is assumed to contain the sample space, the complement of each event, and the union of each countable collection of events.

Exercises

1. Suppose that $A \subset B$. Show that $B^c \subset A^c$.

2. Prove the distributive properties in Theorem 1.4.10.

3. Prove De Morgan's laws (Theorem 1.4.9).

4. Prove Theorem 1.4.11.

5. For every collection of events $A_i$ ($i \in I$), show that
\[ \left( \bigcup_{i \in I} A_i \right)^c = \bigcap_{i \in I} A_i^c \quad \text{and} \quad \left( \bigcap_{i \in I} A_i \right)^c = \bigcup_{i \in I} A_i^c. \]

6. Suppose that one card is to be selected from a deck of 20 cards that contains 10 red cards numbered from 1 to 10 and 10 blue cards numbered from 1 to 10. Let $A$ be the event that a card with an even number is selected, let $B$ be the event that a blue card is selected, and let $C$ be the event that a card with a number less than 5 is selected. Describe the sample space $S$ and describe each of the following events both in words and as subsets of $S$:
a. $A \cap B \cap C$
b. $B \cap C^c$
c. $A \cup B \cup C$
d. $A \cap (B \cup C)$
e. $A^c \cap B^c \cap C^c$.

7. Suppose that a number $x$ is to be selected from the real line $S$, and let $A$, $B$, and $C$ be the events represented by the following subsets of $S$, where the notation $\{x : \ldots\}$ denotes the set containing every point $x$ for which the property presented following the colon is satisfied:
\[ A = \{x : 1 \le x \le 5\}, \quad B = \{x : 3 < x \le 7\}, \quad C = \{x : x \le 0\}. \]
Describe each of the following events as a set of real numbers:
a. $A^c$
b. $A \cup B$
c. $B \cap C^c$
d. $A^c \cap B^c \cap C^c$
e. $(A \cup B) \cap C$.

8. A simplified model of the human blood-type system has four blood types: A, B, AB, and O. There are two antigens, anti-A and anti-B, that react with a person's blood in different ways depending on the blood type. Anti-A reacts with blood types A and AB, but not with B and O. Anti-B reacts with blood types B and AB, but not with A and O. Suppose that a person's blood is sampled and tested with the two antigens. Let $A$ be the event that the blood reacts with anti-A, and let $B$ be the event that it reacts with anti-B. Classify the person's blood type using the events $A$, $B$, and their complements.

9. Let $S$ be a given sample space and let $A_1, A_2, \ldots$ be an infinite sequence of events. For $n = 1, 2, \ldots$, let $B_n = \bigcup_{i=n}^{\infty} A_i$ and let $C_n = \bigcap_{i=n}^{\infty} A_i$.
a. Show that $B_1 \supset B_2 \supset \cdots$ and that $C_1 \subset C_2 \subset \cdots$.
b. Show that an outcome in $S$ belongs to the event $\bigcap_{n=1}^{\infty} B_n$ if and only if it belongs to an infinite number of the events $A_1, A_2, \ldots$.
c. Show that an outcome in $S$ belongs to the event $\bigcup_{n=1}^{\infty} C_n$ if and only if it belongs to all the events $A_1, A_2, \ldots$ except possibly a finite number of those events.

10. Three six-sided dice are rolled. The six sides of each die are numbered 1-6. Let $A$ be the event that the first die shows an even number, let $B$ be the event that the second die shows an even number, and let $C$ be the event that the third die shows an even number. Also, for each $i = 1, \ldots, 6$, let $A_i$ be the event that the first die shows the number $i$, let $B_i$ be the event that the second die shows the number $i$, and let $C_i$ be the event that the third die shows the number $i$. Express each of the following events in terms of the named events described above:
a. The event that all three dice show even numbers
b. The event that no die shows an even number
c. The event that at least one die shows an odd number
d. The event that at most two dice show odd numbers
e. The event that the sum of the three dice is no greater than 5

11. A power cell consists of two subcells, each of which can provide from 0 to 5 volts, regardless of what the other subcell provides. The power cell is functional if and only if the sum of the two voltages of the subcells is at least 6 volts. An experiment consists of measuring and recording the voltages of the two subcells. Let $A$ be the event that the power cell is functional, let $B$ be the event that two subcells have the same voltage, let $C$ be the event that the first subcell has a strictly higher voltage than the second subcell, and let $D$ be the event that the power cell is not functional but needs less than one additional volt to become functional.
a. Define a sample space $S$ for the experiment as a set of ordered pairs that makes it possible for you to express the four sets above as events.
b. Express each of the events $A$, $B$, $C$, and $D$ as sets of ordered pairs that are subsets of $S$.
c. Express the following set in terms of $A$, $B$, $C$, and/or $D$: $\{(x, y) : x = y \text{ and } x + y \le 5\}$.
d. Express the following event in terms of $A$, $B$, $C$, and/or $D$: the event that the power cell is not functional and the second subcell has a strictly higher voltage than the first subcell.

12. Suppose that the sample space $S$ of some experiment is finite. Show that the collection of all subsets of $S$ satisfies the three conditions required to be called the collection of events.

13. Let $S$ be the sample space for some experiment. Show that the collection of subsets consisting solely of $S$ and $\emptyset$ satisfies the three conditions required in order to be called the collection of events. Explain why this collection would not be very interesting in most real problems.

14. Suppose that the sample space $S$ of some experiment is countable.
Suppose also that, for every outcome s ∈ S, the subset {s} is an event. Show that every subset of S must be an event. Hint: Recall the three conditions required of the collection of subsets of S that we call events. 1.5 The Definition of Probability We begin with the mathematical definition of probability and then present some useful results that follow easily from the definition. Axioms and Basic Theorems In this section, we shall present the mathematical, or axiomatic, definition of proba- bility. In a given experiment, it is necessary to assign to each event A in the sample space S a number Pr(A) that indicates the probability that A will occur. In order to satisfy the mathematical definition of probability, the number Pr(A) that is assigned must satisfy three specific axioms. These axioms ensure that the number Pr(A) will have certain properties that we intuitively expect a probability to have under each of the various interpretations described in Sec. 1.2. The first axiom states that the probability of every event must be nonnegative. Axiom 1 For every event A, Pr(A) ≥ 0. The second axiom states that if an event is certain to occur, then the probability of that event is 1. Axiom 2 Pr(S) = 1. Before stating Axiom 3, we shall discuss the probabilities of disjoint events. If two events are disjoint, it is natural to assume that the probability that one or the other will occur is the sum of their individual probabilities. In fact, it will be assumed that this additive property of probability is also true for every finite collection of disjoint events and even for every infinite sequence of disjoint events. If we assume that this additive property is true only for a finite number of disjoint events, we cannot then be certain that the property will be true for an infinite sequence of disjoint events as well. However, if we assume that the additive property is true for every infinite sequence 16 Capítulo 1 Introdução à Probabilidade subcélula fornece. 
A célula de potência está funcional se e somente se a soma das duas tensões das subcélulas for de pelo menos 6 volts. Um experimento consiste em medir e registrar as tensões das duas subcélulas. DeixarAseja o evento em que a célula de energia esteja funcional, deixeBseja o evento em que duas subcélulas tenham a mesma voltagem, sejaCseja o evento em que a primeira subcélula tenha uma tensão estritamente mais alta que a segunda subcélula, e sejaDseja o caso em que a célula de potência não esteja funcional, mas precise de menos de um volt adicional para se tornar funcional. d.Expresse o seguinte evento em termos deA,B,Ce/ouD: o evento em que a célula de potência não está funcionando e a segunda subcélula tem uma tensão estritamente mais alta que a primeira subcélula. 12.Suponha que o espaço amostralSde algum experimento é finito. Mostre que a coleção de todos os subconjuntos deSsatisfaz as três condições necessárias para ser chamada de coleção de eventos. 13.DeixarSser o espaço amostral para algum experimento. Mostre que a coleção de subconjuntos consistindo apenas deSe∅ satisfaz as três condições exigidas para ser chamada de coleção de eventos. Explique por que esta coleção não seria muito interessante na maioria dos problemas reais. a.Defina um espaço amostralSpara o experimento como um conjunto de pares ordenados que possibilita expressar os quatro conjuntos acima como eventos. b.Expresse cada um dos eventosA,B,C, eDcomo conjuntos de pares ordenados que são subconjuntos deS. 14.Suponha que o espaço amostralSde algum experimento é contável. Suponha também que, para cada resultadoé∈S, o subconjunto {é}é um evento. Mostre que todo subconjunto deS deve ser um evento.Dica:Lembre-se das três condições exigidas para a coleção de subconjuntos deSque chamamos de eventos. c.Expresse o seguinte conjunto em termos deA,B,Ce/ ou D: {(x, y):x=simex+sim≤5}. 
However, if we assume that the additive property is true for every infinite sequence of disjoint events, then (as we shall prove) the property must also be true for every finite number of disjoint events. These considerations lead to the third axiom.

Axiom 3 For every infinite sequence of disjoint events A_1, A_2, ...,

$$\Pr\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \Pr(A_i).$$

Example 1.5.1 Rolling a Die. In Example 1.4.1, for each subset A of S = {1, 2, 3, 4, 5, 6}, let Pr(A) be the number of elements of A divided by 6. It is trivial to see that this satisfies the first two axioms. There are only finitely many distinct collections of nonempty disjoint events. It is not difficult to see that Axiom 3 is also satisfied by this example. ◀

Example 1.5.2 A Loaded Die. In Example 1.5.1, there are other choices for the probabilities of events. For example, if we believe that the die is loaded, we might believe that some sides have different probabilities of turning up.
To be specific, suppose that we believe that 6 is twice as likely to come up as each of the other five sides. We could set p_i = 1/7 for i = 1, 2, 3, 4, 5 and p_6 = 2/7. Then, for each event A, define Pr(A) to be the sum of all p_i such that i ∈ A. For example, if A = {1, 3, 5}, then Pr(A) = p_1 + p_3 + p_5 = 3/7. It is not difficult to check that this also satisfies all three axioms. ◀

We are now prepared to give the mathematical definition of probability.

Definition 1.5.1 Probability. A probability measure, or simply a probability, on a sample space S is a specification of numbers Pr(A) for all events A that satisfy Axioms 1, 2, and 3.

We shall now derive two important consequences of Axiom 3. First, we shall show that if an event is impossible, its probability must be 0.

Theorem 1.5.1 Pr(∅) = 0.

Proof Consider the infinite sequence of events A_1, A_2, ... such that A_i = ∅ for i = 1, 2, ....
In other words, each of the events in the sequence is just the empty set ∅. Then this sequence is a sequence of disjoint events, since ∅ ∩ ∅ = ∅. Furthermore, $\bigcup_{i=1}^{\infty} A_i = \emptyset$. Therefore, it follows from Axiom 3 that

$$\Pr(\emptyset) = \Pr\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \Pr(A_i) = \sum_{i=1}^{\infty} \Pr(\emptyset).$$

This equation states that when the number Pr(∅) is added repeatedly in an infinite series, the sum of that series is simply the number Pr(∅). The only real number with this property is zero. ∎

We can now show that the additive property assumed in Axiom 3 for an infinite sequence of disjoint events is also true for every finite number of disjoint events.

Theorem 1.5.2 For every finite sequence of n disjoint events A_1, ..., A_n,

$$\Pr\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \Pr(A_i).$$

Proof Consider the infinite sequence of events A_1, A_2, ..., in which A_1, ..., A_n are the n given disjoint events and A_i = ∅ for i > n. Then the events in this infinite sequence are disjoint and $\bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{n} A_i$. Therefore, by Axiom 3,

$$\begin{aligned}
\Pr\left(\bigcup_{i=1}^{n} A_i\right) &= \Pr\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \Pr(A_i) \\
&= \sum_{i=1}^{n} \Pr(A_i) + \sum_{i=n+1}^{\infty} \Pr(A_i) \\
&= \sum_{i=1}^{n} \Pr(A_i) + 0 \\
&= \sum_{i=1}^{n} \Pr(A_i).
\end{aligned}$$

∎

Further Properties of Probability

From the axioms and theorems just given, we shall now derive four other general properties of probability measures. Because of the fundamental nature of these four properties, they will be presented in the form of four theorems, each one of which is easily proved.

Theorem 1.5.3 For every event A, Pr(A^c) = 1 − Pr(A).

Proof Since A and A^c are disjoint events and A ∪ A^c = S, it follows from Theorem 1.5.2 that Pr(S) = Pr(A) + Pr(A^c). Since Pr(S) = 1 by Axiom 2, then Pr(A^c) = 1 − Pr(A). ∎

Theorem 1.5.4 If A ⊂ B, then Pr(A) ≤ Pr(B).

Proof As illustrated in Fig. 1.8, the event B may be treated as the union of the two disjoint events A and B ∩ A^c. Therefore, Pr(B) = Pr(A) + Pr(B ∩ A^c). Since Pr(B ∩ A^c) ≥ 0, then Pr(B) ≥ Pr(A). ∎

Theorem 1.5.5 For every event A, 0 ≤ Pr(A) ≤ 1.
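
As a numerical companion to Theorems 1.5.3 through 1.5.5, the following minimal Python sketch checks all three statements on the loaded die of Example 1.5.2 (p_i = 1/7 for i = 1, ..., 5 and p_6 = 2/7). The helper names `pr` and `all_events` are illustrative assumptions, not notation from the text, and exact fractions are used so that the comparisons involve no rounding. A check on one finite example is an illustration, not a proof.

```python
from fractions import Fraction
from itertools import combinations

# Loaded die of Example 1.5.2: p_i = 1/7 for i = 1..5 and p_6 = 2/7.
P = {i: Fraction(1, 7) for i in range(1, 6)}
P[6] = Fraction(2, 7)
S = frozenset(P)


def pr(event):
    """Pr(A) as the sum of p_i over the outcomes i in A."""
    return sum((P[i] for i in event), Fraction(0))


def all_events(sample_space):
    """Yield every subset of a finite sample space (hypothetical helper)."""
    items = sorted(sample_space)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


events = list(all_events(S))

# Theorem 1.5.3: Pr(A^c) = 1 - Pr(A).
complement_ok = all(pr(S - A) == 1 - pr(A) for A in events)
# Theorem 1.5.4: if A is a subset of B, then Pr(A) <= Pr(B).
monotone_ok = all(pr(A) <= pr(B) for A in events for B in events if A <= B)
# Theorem 1.5.5: 0 <= Pr(A) <= 1.
bounds_ok = all(0 <= pr(A) <= 1 for A in events)

print(pr({1, 3, 5}))
print(complement_ok, monotone_ok, bounds_ok)
```

Because the sample space has only six outcomes, the sketch can enumerate all 2^6 = 64 events directly; the same brute-force approach would not scale to large sample spaces.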
Proof It is known from Axiom 1 that Pr(A) ≥ 0. Since A ⊂ S for every event A, Theorem 1.5.4 implies Pr(A) ≤ Pr(S) = 1, by Axiom 2. ∎

Figure 1.8 B = A ∪ (B ∩ A^c) in the proof of Theorem 1.5.4.

Theorem 1.5.6 For every two events A and B,

Pr(A ∩ B^c) = Pr(A) − Pr(A ∩ B).

Proof According to Theorem 1.4.11, the events A ∩ B^c and A ∩ B are disjoint and A = (A ∩ B) ∪ (A ∩ B^c). It follows from Theorem 1.5.2 that Pr(A) = Pr(A ∩ B) + Pr(A ∩ B^c). Subtract Pr(A ∩ B) from both sides of this last equation to complete the proof.

Theorem 1.5.7 For every two events A and B,

Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B). (1.5.1)

Proof From Theorem 1.4.11, we have A ∪ B = B ∪ (A ∩ B^c), and the two events on the right side of this equation are disjoint. Hence, we have

Pr(A ∪ B) = Pr(B) + Pr(A ∩ B^c)
          = Pr(B) + Pr(A) − Pr(A ∩ B),

where the first equation follows from Theorem 1.5.2, and the second follows from Theorem 1.5.6.

Example 1.5.3 Diagnosing Diseases. A patient arrives at a doctor's office with a sore throat and low-grade fever. After an exam, the doctor decides that the patient has either a bacterial infection or a viral infection or both. The doctor decides that there is a probability of 0.7 that the patient has a bacterial infection and a probability of 0.4 that the person has a viral infection. What is the probability that the patient has both infections?

Let B be the event that the patient has a bacterial infection, and let V be the event that the patient has a viral infection. We are told that Pr(B) = 0.7, that Pr(V) = 0.4, and that S = B ∪ V. We are asked to find Pr(B ∩ V). We will use Theorem 1.5.7, which says that

Pr(B ∪ V) = Pr(B) + Pr(V) − Pr(B ∩ V). (1.5.2)
Since S = B ∪ V, the left-hand side of (1.5.2) is 1, while the first two terms on the right-hand side are 0.7 and 0.4. The result is

1 = 0.7 + 0.4 − Pr(B ∩ V),

which leads to Pr(B ∩ V) = 0.1, the probability that the patient has both infections. ◀

Example 1.5.4 Demands for Utilities. Consider, once again, the contractor who needs to plan for water and electricity demands in Example 1.4.5. There are many possible choices for how to spread the probability around the sample space (pictured in Fig. 1.5 on page 12). One simple choice is to make the probability of an event E proportional to the area of E. The area of S (the sample space) is (150 − 1) × (200 − 4) = 29,204, so Pr(E) equals the area of E divided by 29,204. For example, suppose that the contractor is interested in high demand. Let A be the set where water demand is at least 100, and let B be the event that electric demand is at least 115, and suppose that these values are considered high demand. These events are shaded with different patterns in Fig. 1.9.

[Figure 1.9: The two events of interest in utility demand sample space for Exam… Axes: Water (ticks at 4, 100, 200) and Electric (ticks at 1, 115, 150).]

The area of A is (150 − 1) × (200 − 100) = 14,900, and the area of B is (150 − 115) × (200 − 4) = 6,860.
So,

Pr(A) = 14,900/29,204 = 0.5102,  Pr(B) = 6,860/29,204 = 0.2349.

The two events intersect in the region denoted by A ∩ B. The area of this region is (150 − 115) × (200 − 100) = 3,500, so Pr(A ∩ B) = 3,500/29,204 = 0.1198. If the contractor wishes to compute the probability that at least one of the two demands will be high, that probability is

Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B) = 0.5102 + 0.2349 − 0.1198 = 0.6253,

according to Theorem 1.5.7. ◀

The proof of the following useful result is left to Exercise 13.

Theorem 1.5.8 Bonferroni Inequality. For all events A_1, ..., A_n,

$$\Pr\left(\bigcup_{i=1}^{n} A_i\right) \le \sum_{i=1}^{n} \Pr(A_i) \quad\text{and}\quad \Pr\left(\bigcap_{i=1}^{n} A_i\right) \ge 1 - \sum_{i=1}^{n} \Pr(A_i^c).$$

(The second inequality above is known as the Bonferroni inequality.)

Note: Probability Zero Does Not Mean Impossible. When an event has probability 0, it does not mean that the event is impossible. In Example 1.5.4, there are many events with 0 probability, but they are not all impossible.
For example, for every x, the event that water demand equals x corresponds to a line segment in Fig. 1.5. Since line segments have 0 area, the probability of every such line segment is 0, but the events are not all impossible. Indeed, if every event of the form {water demand equals x} were impossible, then water demand could not take any value at all. If ε > 0, the event

{water demand is between x − ε and x + ε}

will have positive probability, but that probability will go to 0 as ε goes to 0.

Summary

We have presented the mathematical definition of probability through the three axioms. The axioms require that every event have nonnegative probability, that the whole sample space have probability 1, and that the union of an infinite sequence of disjoint events have probability equal to the sum of their probabilities. Some important results to remember include the following:

• If A_1, ..., A_k are disjoint, $\Pr\left(\bigcup_{i=1}^{k} A_i\right) = \sum_{i=1}^{k} \Pr(A_i)$.
• Pr(A^c) = 1 − Pr(A).
• A ⊂ B implies that Pr(A) ≤ Pr(B).
• Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).

It does not matter how the probabilities were determined. As long as they satisfy the three axioms, they must also satisfy the above relations as well as all of the results that we prove later in the text.

Exercises

1. One ball is to be selected from a box containing red, white, blue, yellow, and green balls. If the probability that the selected ball will be red is 1/5 and the probability that it will be white is 2/5, what is the probability that it will be blue, yellow, or green?

2. A student selected from a class will be either a boy or a girl. If the probability that a boy will be selected is 0.3, what is the probability that a girl will be selected?

3. Consider two events A and B such that Pr(A) = 1/3 and Pr(B) = 1/2. Determine the value of Pr(B ∩ A^c) for each of the following conditions: (a) A and B are disjoint; (b) A ⊂ B; (c) Pr(A ∩ B) = 1/8.

4. If the probability that student A will fail a certain statistics examination is 0.5, the probability that student B will fail the examination is 0.2, and the probability that both student A and student B will fail the examination is 0.1, what is the probability that at least one of these two students will fail the examination?

5. For the conditions of Exercise 4, what is the probability that neither student A nor student B will fail the examination?

6. For the conditions of Exercise 4, what is the probability that exactly one of the two students will fail the examination?

7. Consider two events A and B with Pr(A) = 0.4 and Pr(B) = 0.7. Determine the maximum and minimum possible values of Pr(A ∩ B) and the conditions under which each of these values is attained.

8. If 50 percent of the families in a certain city subscribe to the morning newspaper, 65 percent of the families subscribe to the afternoon newspaper, and 85 percent of the families subscribe to at least one of the two newspapers, what percentage of the families subscribe to both newspapers?

9. Prove that for every two events A and B, the probability that exactly one of the two events will occur is given by the expression

Pr(A) + Pr(B) − 2 Pr(A ∩ B).

10. For two arbitrary events A and B, prove that

Pr(A) = Pr(A ∩ B) + Pr(A ∩ B^c).

11. A point (x, y) is to be selected from the square S containing all points (x, y) such that 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1. Suppose that the probability that the selected point will belong to each specified subset of S is equal to the area of that subset. Find the probability of each of the following subsets: (a) the subset of points such that (x − 1/2)^2 + (y − 1/2)^2 ≥ 1/4; (b) the subset of points such that 1/2 < x + y < 3/2; (c) the subset of points such that y ≤ 1 − x^2; (d) the subset of points such that x = y.

12. Let A_1, A_2, ... be an arbitrary infinite sequence of events, and let B_1, B_2, ... be another infinite sequence of events defined as follows: B_1 = A_1, B_2 = A_1^c ∩ A_2, B_3 = A_1^c ∩ A_2^c ∩ A_3, B_4 = A_1^c ∩ A_2^c ∩ A_3^c ∩ A_4, .... Prove that

$$\Pr\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \Pr(B_i) \quad \text{for } n = 1, 2, \ldots,$$

and that

$$\Pr\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \Pr(B_i).$$

13. Prove Theorem 1.5.8. Hint: Use Exercise 12.

14. Consider, once again, the four blood types A, B, AB, and O described in Exercise 8 in Sec. 1.4 together with the two antigens anti-A and anti-B. Suppose that, for a given person, the probability of type O blood is 0.5, the probability of type A blood is 0.34, and the probability of type B blood is 0.12.

a. Find the probability that each of the antigens will react with this person's blood.
b. Find the probability that both antigens will react with this person's blood.

1.6 Finite Sample Spaces

The simplest experiments in which to determine and derive probabilities are those that involve only finitely many possible outcomes. This section gives several examples to illustrate the important concepts from Sec. 1.5 in finite sample spaces.

Example 1.6.1 Current Population Survey. Every month, the Census Bureau conducts a survey of the United States population in order to learn about labor-force characteristics. Several pieces of information are collected on each of about 50,000 households. One piece of information is whether or not someone in the household is actively looking for employment but currently not employed. Suppose that our experiment consists of selecting three households at random from the 50,000 that were surveyed in a particular month and obtaining access to the information recorded during the survey. (Due to the confidential nature of information obtained during the Current Population Survey, only researchers in the Census Bureau would be able to perform the experiment just described.) The outcomes that make up the sample space S for this experiment can be described as lists of three distinct numbers from 1 to 50,000.
For example, (300, 1, 24602) is one such list where we have kept track of the order in which the three households were selected. Clearly, there are only finitely many such lists. We can assume that each list is equally likely to be chosen, but we need to be able to count how many such lists there are. We shall learn a method for counting the outcomes for this example in Sec. 1.7. ◀

Requirements of Probabilities

In this section, we shall consider experiments for which there are only a finite number of possible outcomes. In other words, we shall consider experiments for which the sample space S contains only a finite number of points s_1, ..., s_n. In an experiment of this type, a probability measure on S is specified by assigning a probability p_i to each point s_i ∈ S. The number p_i is the probability that the outcome of the experiment will be s_i (i = 1, ..., n).
In order to satisfy the axioms of probability, the numbers p_1, ..., p_n must satisfy the following two conditions:

p_i ≥ 0 for i = 1, ..., n

and

$$\sum_{i=1}^{n} p_i = 1.$$

The probability of each event A can then be found by adding the probabilities p_i of all outcomes s_i that belong to A. This is the general version of Example 1.5.2.

Example 1.6.2 Fiber Breaks. Consider an experiment in which five fibers having different lengths are subjected to a testing process to learn which fiber will break first. Suppose that the lengths of the five fibers are 1, 2, 3, 4, and 5 inches, respectively. Suppose also that the probability that any given fiber will be the first to break is proportional to the length of that fiber. We shall determine the probability that the length of the fiber that breaks first is not more than 3 inches.

In this example, we shall let s_i be the outcome in which the fiber whose length is i inches breaks first (i = 1, ..., 5). Then S = {s_1, ..., s_5} and p_i = αi for i = 1, ..., 5, where α is a proportionality factor.
where $\alpha$ is a proportionality factor. It must be true that $p_1 + \cdots + p_5 = 1$, and we know that $p_1 + \cdots + p_5 = 15\alpha$, so $\alpha = 1/15$. If A is the event that the length of the fiber that breaks first is not more than 3 inches, then $A = \{s_1, s_2, s_3\}$. Therefore,

$$\Pr(A) = p_1 + p_2 + p_3 = \frac{1}{15} + \frac{2}{15} + \frac{3}{15} = \frac{2}{5}.$$ ◀

Simple Sample Spaces

A sample space S containing n outcomes $s_1, \ldots, s_n$ is called a simple sample space if the probability assigned to each of the outcomes $s_1, \ldots, s_n$ is 1/n. If an event A in this simple sample space contains exactly m outcomes, then

$$\Pr(A) = \frac{m}{n}.$$

Example 1.6.3 Tossing Coins. Suppose that three fair coins are tossed simultaneously. We shall determine the probability of obtaining exactly two heads.

Regardless of whether or not the three coins can be distinguished from each other by the experimenter, it is convenient for the purpose of describing the sample space to assume that the coins can be distinguished. We can then speak of the result for the first coin, the result for the second coin, and the result for the third coin; and the sample space will comprise the eight possible outcomes listed in Example 1.4.4 on page 12.

Furthermore, because of the assumption that the coins are fair, it is reasonable to assume that this sample space is simple and that the probability assigned to each of the eight outcomes is 1/8. As can be seen from the listing in Example 1.4.4, exactly two heads will be obtained in three of these outcomes. Therefore, the probability of obtaining exactly two heads is 3/8. ◀

It should be noted that if we had considered the only possible outcomes to be no heads, one head, two heads, and three heads, it would have been reasonable to assume that the sample space contained just these four outcomes. This sample space would not be simple because the outcomes would not be equally probable.

Example 1.6.4 Genetics.
Inherited traits in humans are determined by material in specific locations on chromosomes. Each normal human receives 23 chromosomes from each parent, and these chromosomes are naturally paired, with one chromosome in each pair coming from each parent. For the purposes of this text, it is safe to think of a gene as a portion of each chromosome in a pair. The genes, either one at a time or in combination, determine the inherited traits, such as blood type and hair color. The material in the two locations that make up a gene on the pair of chromosomes comes in forms called alleles. Each distinct combination of alleles (one on each chromosome) is called a genotype.

Consider a gene with only two different alleles A and a. Suppose that both parents have genotype Aa, that is, each parent has allele A on one chromosome and allele a on the other. (We do not distinguish the same alleles in a different order as a different genotype. For example, aA would be the same genotype as Aa. But it can be convenient to distinguish the two chromosomes during intermediate steps in probability calculations, just as we distinguished the three coins in Example 1.6.3.) What are the possible genotypes of an offspring of these two parents? If all possible results of the parents contributing pairs of alleles are equally likely, what are the probabilities of the different genotypes?

To begin, we shall distinguish which allele the offspring receives from each parent, since we are assuming that pairs of contributed alleles are equally likely.
Afterward, we shall combine those results that produce the same genotype. The possible contributions from the parents are:

              Mother
    Father    A     a
    A         AA    Aa
    a         aA    aa

So, there are three possible genotypes AA, Aa, and aa for the offspring. Since we assumed that every combination was equally likely, the four cells in the table all have probability 1/4. Since two of the cells in the table combined into genotype Aa, that genotype has probability 1/2. The other two genotypes each have probability 1/4, since they each correspond to only one cell in the table. ◀

Example 1.6.5 Rolling Two Dice. We shall now consider an experiment in which two balanced dice are rolled, and we shall calculate the probability of each of the possible values of the sum of the two numbers that may appear.

Although the experimenter need not be able to distinguish the two dice from one another in order to observe the value of their sum, the specification of a simple sample space in this example will be facilitated if we assume that the two dice are distinguishable.
If this assumption is made, each outcome in the sample space S can be represented as a pair of numbers (x, y), where x is the number that appears on the first die and y is the number that appears on the second die. Therefore, S comprises the following 36 outcomes:

    (1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
    (2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
    (3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
    (4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
    (5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
    (6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)

It is natural to assume that S is a simple sample space and that the probability of each of these outcomes is 1/36.

Let $P_i$ denote the probability that the sum of the two numbers is i for $i = 2, 3, \ldots, 12$. The only outcome in S for which the sum is 2 is the outcome (1, 1). Therefore, $P_2 = 1/36$. The sum will be 3 for either of the two outcomes (1, 2) and (2, 1). Therefore, $P_3 = 2/36 = 1/18$. By continuing in this manner, we obtain the following probability for each of the possible values of the sum:

$$P_2 = P_{12} = \frac{1}{36}, \qquad P_5 = P_9 = \frac{4}{36},$$
$$P_3 = P_{11} = \frac{2}{36}, \qquad P_6 = P_8 = \frac{5}{36},$$
$$P_4 = P_{10} = \frac{3}{36}, \qquad P_7 = \frac{6}{36}.$$ ◀
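The enumeration in Example 1.6.5 can be checked mechanically. The following is a minimal Python sketch, not part of the original text: it lists the 36 ordered pairs, counts how many give each sum, and divides by the size of the sample space. The names `outcomes` and `sum_probs` are illustrative choices.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered pairs (x, y) for two distinguishable dice.
outcomes = list(product(range(1, 7), repeat=2))

# Count the outcomes that produce each sum, then divide by the size of S.
counts = Counter(x + y for x, y in outcomes)
sum_probs = {s: Fraction(c, len(outcomes)) for s, c in sorted(counts.items())}

for s, p in sum_probs.items():
    # Fractions print in lowest terms, e.g. 2/36 appears as 1/18.
    print(f"P{s} = {p}")
```

Exact fractions are used instead of floating-point numbers so that the results can be compared directly with the values in the text.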
Summary

A simple sample space is a finite sample space S such that every outcome in S has the same probability. If there are n outcomes in a simple sample space S, then each one must have probability 1/n. The probability of an event E in a simple sample space is the number of outcomes in E divided by n. In the next three sections, we will present some useful methods for counting numbers of outcomes in various events.

Exercises

1. If two balanced dice are rolled, what is the probability that the sum of the two numbers that appear will be odd?
2. If two balanced dice are rolled, what is the probability that the sum of the two numbers that appear will be even?
3.
If two balanced dice are rolled, what is the probability that the difference between the two numbers that appear will be less than 3?
4. A school contains students in grades 1, 2, 3, 4, 5, and 6. Grades 2, 3, 4, 5, and 6 all contain the same number of students, but there are twice this number in grade 1. If a student is selected at random from a list of all the students in the school, what is the probability that she will be in grade 3?
5. For the conditions of Exercise 4, what is the probability that the selected student will be in an odd-numbered grade?
6. If three fair coins are tossed, what is the probability that all three faces will be the same?
7. Consider the setup of Example 1.6.4 on page 23. This time, assume that two parents have genotypes Aa and aa. Find the possible genotypes for an offspring and find the probabilities for each genotype. Assume that all possible results of the parents contributing pairs of alleles are equally likely.
8. Consider an experiment in which a fair coin is tossed once and a balanced die is rolled once.
   a. Describe the sample space for this experiment.
   b. What is the probability that a head will be obtained on the coin and an odd number will be obtained on the die?

1.7 Counting Methods

In simple sample spaces, one way to calculate the probability of an event involves counting the number of outcomes in the event and the number of outcomes in the sample space. This section presents some common methods for counting the number of outcomes in a set. These methods rely on special structure that exists in many common experiments, namely, that each outcome consists of several parts and that it is relatively easy to count how many possibilities there are for each of the parts.

We have seen that in a simple sample space S, the probability of an event A is the ratio of the number of outcomes in A to the total number of outcomes in S.
In many experiments, the number of outcomes in S is so large that a complete listing of these outcomes is too expensive, too slow, or too likely to be incorrect to be useful. In such an experiment, it is convenient to have a method of determining the total number of outcomes in the space S and in various events in S without compiling a list of all these outcomes. In this section, some of these methods will be presented.
Figure 1.10 Three cities with routes between them in Example 1.7.1.

Multiplication Rule

Example 1.7.1 Routes between Cities. Suppose that there are three different routes from city A to city B and five different routes from city B to city C. The cities and routes are depicted in Fig. 1.10, with the routes numbered from 1 to 8. We wish to count the number of different routes from A to C that pass through B. For example, one such route from Fig.
1.10 is 1 followed by 4, which we can denote (1, 4). Similarly, there are the routes (1, 5), (1, 6), . . . , (3, 8). It is not difficult to see that the number of different routes is 3 × 5 = 15. ◀

Example 1.7.1 is a special case of a common form of experiment.

Example 1.7.2 Experiment in Two Parts. Consider an experiment that has the following two characteristics:

i. The experiment is performed in two parts.
ii. The first part of the experiment has m possible outcomes $x_1, \ldots, x_m$, and, regardless of which one of these outcomes $x_i$ occurs, the second part of the experiment has n possible outcomes $y_1, \ldots, y_n$.

Each outcome in the sample space S of such an experiment will therefore be a pair having the form $(x_i, y_j)$, and S will be composed of the following pairs:

$$\begin{array}{cccc}
(x_1, y_1) & (x_1, y_2) & \cdots & (x_1, y_n) \\
(x_2, y_1) & (x_2, y_2) & \cdots & (x_2, y_n) \\
\vdots & \vdots & & \vdots \\
(x_m, y_1) & (x_m, y_2) & \cdots & (x_m, y_n).
\end{array}$$ ◀

Since each of the m rows in the array in Example 1.7.2 contains n pairs, the following result follows directly.

Theorem 1.7.1 Multiplication Rule for Two-Part Experiments. In an experiment of the type described in Example 1.7.2, the sample space S contains exactly mn outcomes.

Figure 1.11 illustrates the multiplication rule for the case of n = 3 and m = 2 with a tree diagram. Each end-node of the tree represents an outcome, which is the pair consisting of the two parts whose names appear along the branch leading to the end-node.

Example 1.7.3 Rolling Two Dice. Suppose that two dice are rolled. Since there are six possible outcomes for each die, the number of possible outcomes for the experiment is 6 × 6 = 36, as we saw in Example 1.6.5. ◀

The multiplication rule can be extended to experiments with more than two parts.
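As an informal check on Theorem 1.7.1, the two-part sample space can be built explicitly and counted. The sketch below is not from the original text. It assumes, consistent with the routes (1, 4), . . . , (3, 8) named in Example 1.7.1, that routes 1 to 3 join A to B and routes 4 to 8 join B to C; the variable names are illustrative.

```python
from itertools import product

# Assumed route numbering from Example 1.7.1 / Fig. 1.10.
routes_a_to_b = [1, 2, 3]        # m = 3 possible first parts
routes_b_to_c = [4, 5, 6, 7, 8]  # n = 5 possible second parts

# Each outcome is a pair (x_i, y_j); product builds the full m-by-n array.
routes_a_to_c = list(product(routes_a_to_b, routes_b_to_c))

print(routes_a_to_c[0], routes_a_to_c[-1])  # first and last pair in the array
print(len(routes_a_to_c))                   # equals m * n
```

Listing every pair is only practical for small m and n; the point of the multiplication rule is that the count mn is available without building the list.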
Figure 1.11 Tree diagram in which end-nodes represent outcomes.

Theorem 1.7.2 Multiplication Rule. Suppose that an experiment has k parts (k ≥ 2), that the ith part of the experiment can have $n_i$ possible outcomes ($i = 1, \ldots, k$), and that all of the outcomes in each part can occur regardless of which specific outcomes have occurred in the other parts. Then the sample space S of the experiment will contain all vectors of the form $(u_1, \ldots, u_k)$, where $u_i$ is one of the $n_i$ possible outcomes of part i ($i = 1, \ldots, k$). The total number of these vectors in S will be equal to the product $n_1 n_2 \cdots n_k$.

Example 1.7.4 Tossing Several Coins. Suppose that we toss six coins. Each outcome in S will consist of a sequence of six heads and tails, such as HTTHHH. Since there are two possible outcomes for each of the six coins, the total number of outcomes in S will be $2^6 = 64$. If head and tail are considered equally likely for each coin, then S will be a simple sample space. Since there is only one outcome in S with six heads and no tails, the probability of obtaining heads on all six coins is 1/64. Since there are six outcomes in S with one head and five tails, the probability of obtaining exactly one head is 6/64 = 3/32. ◀

Example 1.7.5 Combination Lock. A standard combination lock has a dial with tick marks for 40 numbers from 0 to 39. The combination consists of a sequence of three numbers that must be dialed in the correct order to open the lock. Each of the 40 numbers may appear in each of the three positions of the combination regardless of what the other two positions contain.
It follows that there are $40^3 = 64,000$ possible combinations. This number is supposed to be large enough to discourage would-be thieves from trying every combination. ◀

Note: The Multiplication Rule Is Slightly More General. In the statements of Theorems 1.7.1 and 1.7.2, it is assumed that each possible outcome in each part of the experiment can occur regardless of what occurs in the other parts of the experiment. Technically, all that is necessary is that the number of possible outcomes for each part of the experiment not depend on what occurs on the other parts. The discussion of permutations below is an example of this situation.

Permutations

Example 1.7.6 Sampling without Replacement. Consider an experiment in which a card is selected and removed from a deck of n different cards, a second card is then selected and removed from the remaining n − 1 cards, and finally a third card is selected from the remaining n − 2 cards. Each outcome consists of the three cards in the order selected. A process of this kind is called sampling without replacement, since a card that is drawn is not replaced in the deck before the next card is selected. In this experiment, any one of the n cards could be selected first. Once this card has been removed, any one of the other n − 1 cards could be selected second. Therefore, there are n(n − 1)
possible outcomes for the first two selections. Finally, for every given outcome of the first two selections, there are n − 2 other cards that could possibly be selected third. Therefore, the total number of possible outcomes for all three selections is n(n − 1)(n − 2). ◀

The situation in Example 1.7.6 can be generalized to any number of selections without replacement.

Definition 1.7.1 Permutations. Suppose that a set has n elements. Suppose that an experiment consists of selecting k of the elements one at a time without replacement. Let each outcome consist of the k elements in the order selected. Each such outcome is called a permutation of n elements taken k at a time. We denote the number of distinct such permutations by the symbol $P_{n,k}$.

By arguing as in Example 1.7.6, we can figure out how many different permutations there are of n elements taken k at a time. The proof of the following theorem is simply to extend the reasoning in Example 1.7.6 to selecting k cards without replacement. The proof is left to the reader.

Theorem 1.7.3 Number of Permutations. The number of permutations of n elements taken k at a time is $P_{n,k} = n(n-1) \cdots (n-k+1)$.

Example 1.7.7 Current Population Survey.
Theorem 1.7.3 allows us to count the number of points in the sample space of Example 1.6.1. Each outcome in S consists of a permutation of n = 50,000 elements taken k = 3 at a time. Hence, the sample space S in that example consists of

$$50,000 \times 49,999 \times 49,998 \approx 1.25 \times 10^{14}$$

outcomes. ◀

When k = n, the number of possible permutations will be the number $P_{n,n}$ of different permutations of all n cards. It is seen from the equation just derived that

$$P_{n,n} = n(n-1) \cdots 1 = n!$$

The symbol n! is read n factorial. In general, the number of permutations of n different items is n!.

The expression for $P_{n,k}$ can be rewritten in the following alternate form for $k = 1, \ldots, n-1$:

$$P_{n,k} = n(n-1) \cdots (n-k+1) \, \frac{(n-k)(n-k-1) \cdots 1}{(n-k)(n-k-1) \cdots 1} = \frac{n!}{(n-k)!}.$$

Here and elsewhere in the theory of probability, it is convenient to define 0! by the relation 0! = 1. With this definition, it follows that the relation $P_{n,k} = n!/(n-k)!$ will be correct for the value k = n as well as for the values $k = 1, \ldots, n-1$. To summarize:

Theorem 1.7.4 Permutations. The number of distinct orderings of k items selected without replacement from a collection of n different items (0 ≤ k ≤ n) is

$$P_{n,k} = \frac{n!}{(n-k)!}.$$
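The two expressions for $P_{n,k}$ can be compared numerically. The following Python sketch is an illustration rather than part of the original text: `num_permutations` is a hypothetical helper that multiplies the k factors of Theorem 1.7.3 directly, and the result is compared with the factorial form of Theorem 1.7.4 and with the standard-library function `math.perm` (available in Python 3.8 and later).

```python
import math

def num_permutations(n, k):
    """Return P_{n,k} = n(n-1)...(n-k+1) as a running product of k factors."""
    result = 1
    for factor in range(n, n - k, -1):
        result *= factor
    return result

n, k = 50_000, 3  # the values from Example 1.7.7
direct = num_permutations(n, k)
via_factorials = math.factorial(n) // math.factorial(n - k)

print(direct)                    # exact count of outcomes in Example 1.7.7
print(direct == via_factorials)  # the two formulas agree
print(direct == math.perm(n, k))
```

The running product needs only k multiplications, whereas the factorial form computes two very large factorials before dividing; for large n and small k the first form is the more economical way to evaluate the same number.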
Example 1.7.8 Choosing Officers. Suppose that a club consists of 25 members and that a president and a secretary are to be chosen from the membership. We shall determine the total possible number of ways in which these two positions can be filled.
Since the positions can be filled by first choosing one of the 25 members to be president and then choosing one of the remaining 24 members to be secretary, the possible number of choices is $P_{25,2} = (25)(24) = 600$. ◀

Example 1.7.9 Arranging Books. Suppose that six different books are to be arranged on a shelf. The number of possible permutations of the books is 6! = 720. ◀

Example 1.7.10 Sampling with Replacement. Consider a box that contains n balls numbered 1, . . . , n. First, one ball is selected at random from the box and its number is noted. This ball is then put back in the box and another ball is selected (it is possible that the same ball will be selected again). As many balls as desired can be selected in this way. This process is called sampling with replacement. It is assumed that each of the n balls is equally likely to be selected at each stage and that all selections are made independently of each other.

Suppose that a total of k selections are to be made, where k is a given positive integer. Then the sample space S of this experiment will contain all vectors of the form $(x_1, \ldots, x_k)$, where $x_i$ is the outcome of the ith selection (i = 1, . . . , k). Since there are n possible outcomes for each of the k selections, the total number of vectors in S is $n^k$. Furthermore, from our assumptions it follows that S is a simple sample space. Hence, the probability assigned to each vector in S is $1/n^k$. ◀

Example 1.7.11 Obtaining Different Numbers. For the experiment in Example 1.7.10, we shall determine the probability of the event E that each of the k balls that are selected will have a different number.

If k > n, it is impossible for all the selected balls to have different numbers because there are only n different numbers. Suppose, therefore, that k ≤ n. The number of outcomes in the event E is the number of vectors for which all k components are different.
This equals $P_{n,k}$, since the first component $x_1$ of each vector can have n possible values, the second component $x_2$ can then have any one of the other n − 1 values, and so on. Since S is a simple sample space containing $n^k$ vectors, the probability p that k different numbers will be selected is
$$p = \frac{P_{n,k}}{n^k} = \frac{n!}{(n-k)!\,n^k}.$$ ◀

Note: Using Two Different Methods in the Same Problem. Example 1.7.11 illustrates a combination of techniques that might seem confusing at first. The method used to count the number of outcomes in the sample space was based on sampling with replacement, since the experiment allows repeat numbers in each outcome. The method used to count the number of outcomes in the event E was permutations (sampling without replacement) because E consists of those outcomes without repeats. It often happens that one needs to use different methods to count the numbers of outcomes in different subsets of the sample space. The birthday problem, which follows, is another example in which we need more than one counting method in the same problem.
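The two counting methods used in Example 1.7.11 can be checked against each other on a small case. The sketch below is an illustration added here, not taken from the text: the function names are hypothetical, and the brute-force version simply enumerates every vector in the sample space, so it is only practical for small n and k.

```python
from itertools import product
from math import perm

def prob_all_different(n: int, k: int) -> float:
    """p = P_{n,k} / n^k, the probability that k draws with replacement
    from n numbered balls are all different (0 when k > n)."""
    if k > n:
        return 0.0
    return perm(n, k) / n**k

def prob_all_different_bruteforce(n: int, k: int) -> float:
    """Same probability by listing all n^k equally likely vectors."""
    outcomes = list(product(range(1, n + 1), repeat=k))
    favorable = sum(1 for x in outcomes if len(set(x)) == k)
    return favorable / len(outcomes)

# n = 5 balls, k = 3 draws: 5*4*3 = 60 favorable vectors out of 5^3 = 125.
assert abs(prob_all_different(5, 3) - prob_all_different_bruteforce(5, 3)) < 1e-12
print(prob_all_different(5, 3))  # 0.48
```

The denominator comes from sampling with replacement and the numerator from permutations, which mirrors the point made in the Note.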
The Birthday Problem

In the following problem, which is often called the birthday problem, it is required to determine the probability p that at least two people in a group of k people will have the same birthday, that is, will have been born on the same day of the same month but not necessarily in the same year. For the solution presented here, we assume that the birthdays of the k people are unrelated (in particular, we assume that twins are not present) and that each of the 365 days of the year is equally likely to be the birthday of any person in the group. In particular, we ignore the fact that the birth rate actually varies during the year and we assume that anyone actually born on February 29 will consider his birthday to be another day, such as March 1.

When these assumptions are made, this problem becomes similar to the one in Example 1.7.11. Since there are 365 possible birthdays for each of k people, the sample space S will contain $365^k$ outcomes, all of which will be equally probable. If k > 365, there are not enough birthdays for every one to be different, and hence at least two people must have the same birthday. So, we assume that k ≤ 365. Counting the number of outcomes in which at least two birthdays are the same is tedious.
However, the number of outcomes in S for which all k birthdays will be different is $P_{365,k}$, since the first person’s birthday could be any one of the 365 days, the second person’s birthday could then be any of the other 364 days, and so on. Hence, the probability that all k persons will have different birthdays is
$$\frac{P_{365,k}}{365^k}.$$
The probability p that at least two of the people will have the same birthday is therefore
$$p = 1 - \frac{P_{365,k}}{365^k} = 1 - \frac{365!}{(365-k)!\,365^k}.$$
Numerical values of this probability p for various values of k are given in Table 1.1. These probabilities may seem surprisingly large to anyone who has not thought about them before. Many persons would guess that in order to obtain a value of p greater than 1/2, the number of people in the group would have to be about 100. However, according to Table 1.1, there would have to be only 23 people in the group. As a matter of fact, for k = 100 the value of p is 0.9999997.

Table 1.1 The probability p that at least two people in a group of k people will have the same birthday

 k    p        k    p
 5    0.027    25   0.569
10    0.117    30   0.706
15    0.253    40   0.891
20    0.411    50   0.970
22    0.476    60   0.994
23    0.507
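The entries of Table 1.1 can be reproduced with a few lines of code. The following Python sketch is an addition for illustration only; the function name `birthday_probability` and its `days` parameter are assumptions made here. It multiplies the k ratios (365 − i)/365 one at a time rather than forming 365! directly, which gives the same value as $P_{365,k}/365^k$ while keeping every intermediate number between 0 and 1.

```python
def birthday_probability(k: int, days: int = 365) -> float:
    """Probability that at least two of k people share a birthday,
    assuming independent birthdays uniform over `days` days."""
    if k > days:
        return 1.0  # more people than days, so a match is certain
    p_all_different = 1.0
    for i in range(k):
        # The (i+1)th person must avoid the i birthdays already taken.
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different

# Print the values of k listed in Table 1.1, rounded to three decimals.
for k in (5, 10, 15, 20, 22, 23, 25, 30, 40, 50, 60):
    print(k, round(birthday_probability(k), 3))
```

Running the loop shows that k = 23 is the smallest group size listed for which the probability exceeds 1/2.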
The calculation in this example illustrates a common technique for solving probability problems.
If one wishes to compute the probability of some event A, it might be more straightforward to calculate $\Pr(A^c)$ and then use the fact that $\Pr(A) = 1 - \Pr(A^c)$. This idea is particularly useful when the event A is of the form “at least n things happen” where n is small compared to how many things could happen.

Stirling’s Formula

For large values of n, it is nearly impossible to compute n!. For n ≥ 70, $n! > 10^{100}$ and cannot be represented on many scientific calculators. In most cases for which n! is needed with a large value of n, one only needs the ratio of n! to another large number $a_n$. A common example of this is $P_{n,k}$ with large n and not so large k, which equals n!/(n − k)!. In such cases, we can notice that
$$\frac{n!}{a_n} = e^{\log(n!) - \log(a_n)}.$$
Compared to computing n!, it takes a much larger n before log(n!) becomes difficult to represent. Furthermore, if we had a simple approximation $s_n$ to log(n!) such that $\lim_{n\to\infty} |s_n - \log(n!)| = 0$, then the ratio of $n!/a_n$ to $e^{s_n}/a_n$ would be close to 1 for large n. The following result, whose proof can be found in Feller (1968), provides such an approximation.

Theorem 1.7.5 Stirling’s Formula. Let
$$s_n = \frac{1}{2}\log(2\pi) + \left(n + \frac{1}{2}\right)\log(n) - n.$$
Then $\lim_{n\to\infty} |s_n - \log(n!)| = 0$. Put another way,
$$\lim_{n\to\infty} \frac{(2\pi)^{1/2}\, n^{n+1/2}\, e^{-n}}{n!} = 1.$$

Example 1.7.12 Approximating the Number of Permutations. Suppose that we want to compute $P_{70,20} = 70!/50!$. The approximation from Stirling’s formula is
$$\frac{70!}{50!} \approx \frac{(2\pi)^{1/2}\, 70^{70.5}\, e^{-70}}{(2\pi)^{1/2}\, 50^{50.5}\, e^{-50}} = 3.940 \times 10^{35}.$$
The exact calculation yields $3.938 \times 10^{35}$. The approximation and the exact calculation differ by less than 1/10 of 1 percent. ◀

Summary

Suppose that the following conditions are met:
• Each element of a set consists of k distinguishable parts $x_1, \ldots, x_k$.
• There are $n_1$ possibilities for the first part $x_1$.
• For each i = 2, . . . , k and each combination $(x_1, \ldots, x_{i-1})$ of the first i − 1 parts, there are $n_i$ possibilities for the ith part $x_i$.

Under these conditions, there are $n_1 \cdots n_k$ elements of the set. The third condition requires only that the number of possibilities for $x_i$ be $n_i$ no matter what the earlier parts are. For example, for i = 2, it does not require that the same $n_2$ possibilities be available for $x_2$ regardless of what $x_1$ is. It only requires that the number of possibilities for $x_2$ be $n_2$ no matter what $x_1$ is. In this way, the general rule includes the multiplication rule, the calculation of permutations, and sampling with replacement as special cases. For permutations of m items k at a time, we have $n_i = m - i + 1$ for i = 1, . . . , k, and the $n_i$ possibilities for part i are just the $n_i$ items that have not yet appeared in the first i − 1 parts. For sampling with replacement from m items, we have $n_i = m$ for all i, and the m possibilities are the same for every part. In the next section, we shall consider how to count elements of sets in which the parts of each element are not distinguishable.

Exercises

1. Each year starts on one of the seven days (Sunday through Saturday). Each year is either a leap year (i.e., it includes February 29) or not. How many different calendars are possible for a year?

2. Three different classes contain 20, 18, and 25 students, respectively, and no student is a member of more than one class. If a team is to be composed of one student from each of these three classes, in how many different ways can the members of the team be chosen?

3. In how many different ways can the five letters a, b, c, d, and e be arranged?

4. If a man has six different sportshirts and four different pairs of slacks, how many different combinations can he wear?

5.
If four dice are rolled, what is the probability that each of the four numbers that appear will be different?

6. If six dice are rolled, what is the probability that each of the six different numbers will appear exactly once?

7. If 12 balls are thrown at random into 20 boxes, what is the probability that no box will receive more than one ball?

8. An elevator in a building starts with five passengers and stops at seven floors. If every passenger is equally likely to get off at each floor and all the passengers leave independently of each other, what is the probability that no two passengers will get off at the same floor?

9. Suppose that three runners from team A and three runners from team B participate in a race. If all six runners have equal ability and there are no ties, what is the probability that the three runners from team A will finish first, second, and third, and the three runners from team B will finish fourth, fifth, and sixth?

10. A box contains 100 balls, of which r are red. Suppose that the balls are drawn from the box one at a time, at random, without replacement. Determine (a) the probability that the first ball drawn will be red; (b) the probability that the 50th ball drawn will be red; and (c) the probability that the last ball drawn will be red.

11. Let n and k be positive integers such that both n and n − k are large. Use Stirling’s formula to write as simple an approximation as you can for $P_{n,k}$.

1.8 Combinatorial Methods

Many problems of counting the number of outcomes in an event amount to counting how many subsets of a certain size are contained in a fixed set. This section gives examples of how to do such counting and where it can arise.

Combinations

Example 1.8.1 Choosing Subsets. Consider the set {a, b, c, d} containing the four different letters. We want to count the number of distinct subsets of size two. In this case, we can list all of the subsets of size two: {a, b}, {a, c}, {a, d}, {b, c}, {b, d}, and {c, d}.
We see that there are six distinct subsets of size two. This is different from counting permutations because {a, b} and {b, a} are the same subset. ◀

For large sets, it would be tedious, if not impossible, to enumerate all of the subsets of a given size and count them as we did in Example 1.8.1.
However, there is a connection between counting subsets and counting permutations that will allow us to derive the general formula for the number of subsets.

Suppose that there is a set of n distinct elements from which it is desired to choose a subset containing k elements (1 ≤ k ≤ n). We shall determine the number of different subsets that can be chosen. In this problem, the arrangement of the elements in a subset is irrelevant and each subset is treated as a unit.

Definition 1.8.1 Combinations. Consider a set with n elements. Each subset of size k chosen from this set is called a combination of n elements taken k at a time. We denote the number of distinct such combinations by the symbol $C_{n,k}$.

No two combinations will consist of exactly the same elements because two subsets with the same elements are the same subset. At the end of Example 1.8.1, we noted that two different permutations (a, b) and (b, a) both correspond to the same combination or subset {a, b}. We can think of permutations as being constructed in two steps. First, a combination of k elements is chosen out of n, and second, those k elements are arranged in a specific order. There are $C_{n,k}$ ways to choose the k elements out of n, and for each such choice there are k! ways to arrange those k elements in different orders. Using the multiplication rule from Sec. 1.7, we see that the number of permutations of n elements taken k at a time is $P_{n,k} = C_{n,k}\,k!$; hence, we have the following.

Theorem 1.8.1 Combinations. The number of distinct subsets of size k that can be chosen from a set of size n is
$$C_{n,k} = \frac{P_{n,k}}{k!} = \frac{n!}{k!\,(n-k)!}.$$

In Example 1.8.1, we see that $C_{4,2} = 4!/[2!\,2!] = 6$.

Example 1.8.2 Selecting a Committee. Suppose that a committee composed of eight people is to be selected from a group of 20 people. The number of different groups of people that might be on the committee is
$$C_{20,8} = \frac{20!}{8!\,12!} = 125{,}970.$$ ◀

Example 1.8.3 Choosing Jobs.
Suppose that, in Example 1.8.2, the eight people in the committee each get a different job to perform on the committee. The number of ways to choose eight people out of 20 and assign them to the eight different jobs is the number of permutations of 20 elements taken eight at a time, or
$$P_{20,8} = C_{20,8} \times 8! = 125{,}970 \times 8! = 5{,}079{,}110{,}400.$$ ◀

Examples 1.8.2 and 1.8.3 illustrate the difference and relationship between combinations and permutations. In Example 1.8.3, we count the same group of people in a different order as a different outcome, while in Example 1.8.2, we count the same group in different orders as the same outcome. The two numerical values differ by a factor of 8!, the number of ways to reorder each of the combinations in Example 1.8.2 to get a permutation in Example 1.8.3.
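The relation $P_{n,k} = C_{n,k}\,k!$ behind Theorem 1.8.1 is easy to verify numerically. The sketch below is an illustrative addition in Python, not part of the original text; `combinations_count` is a hypothetical helper name, while `math.comb`, `math.perm`, `math.factorial`, and `itertools.combinations` are standard-library functions.

```python
from itertools import combinations
from math import comb, factorial, perm

def combinations_count(n: int, k: int) -> int:
    """C_{n,k} = P_{n,k} / k!, the number of subsets of size k
    that can be chosen from a set of size n (Theorem 1.8.1)."""
    return perm(n, k) // factorial(k)

# Example 1.8.1: enumerating the size-two subsets of {a, b, c, d} gives 6.
assert combinations_count(4, 2) == len(list(combinations("abcd", 2))) == 6

# Example 1.8.2: committees of 8 chosen from 20 people.
assert combinations_count(20, 8) == comb(20, 8) == 125_970

# Example 1.8.3: assigning the 8 distinct jobs multiplies the count by 8!.
assert comb(20, 8) * factorial(8) == perm(20, 8)
print(perm(20, 8))  # 5079110400
```

Dividing by k! in integer arithmetic is exact here because every set of k elements corresponds to exactly k! orderings.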
Binomial Coefficients

Definition 1.8.2 Binomial Coefficients. The number $C_{n,k}$ is also denoted by the symbol $\binom{n}{k}$. That is, for k = 0, 1, . . . , n,
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}. \qquad (1.8.1)$$
When this notation is used, this number is called a binomial coefficient.

The name binomial coefficient derives from the appearance of the symbol in the binomial theorem, whose proof is left as Exercise 20 in this section.

Theorem 1.8.2 Binomial Theorem. For all numbers x and y and each positive integer n,
$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}.$$

There are a couple of useful relations between binomial coefficients.

Theorem 1.8.3 For all n,
$$\binom{n}{0} = \binom{n}{n} = 1.$$
For all n and all k = 0, 1, . . . , n,
$$\binom{n}{k} = \binom{n}{n-k}.$$

Proof The first equation follows from the fact that 0! = 1. The second equation follows from Eq. (1.8.1).
The second equation can also be derived from the fact that selecting k elements to form a subset is equivalent to selecting the remaining n − k elements to form the complement of the subset. ∎

It is sometimes convenient to use the expression “n choose k” for the value of $C_{n,k}$. Thus, the same quantity is represented by the two different notations $C_{n,k}$ and $\binom{n}{k}$, and we may refer to this quantity in three different ways: as the number of combinations of n elements taken k at a time, as the binomial coefficient of n and k, or simply as “n choose k.”

Example 1.8.4 Blood Types. In Example 1.6.4 on page 23, we defined genes, alleles, and genotypes. The gene for human blood type consists of a pair of alleles chosen from the three alleles commonly called O, A, and B. For example, two possible combinations of alleles (called genotypes) to form a blood-type gene would be BB and AO. We will not distinguish the same two alleles in different orders, so OA represents the same genotype as AO. How many genotypes are there for blood type?
The answer could easily be found by counting, but it is an example of a more general calculation. Suppose that a gene consists of a pair chosen from a set of $n$ different alleles. Assuming that we cannot distinguish the same pair in different orders, there are $n$ pairs where both alleles are the same, and there are $\binom{n}{2}$ pairs where the two alleles are different. The total number of genotypes is

$$n + \binom{n}{2} = n + \frac{n(n-1)}{2} = \frac{n(n+1)}{2} = \binom{n+1}{2}.$$

For the case of blood type, we have $n = 3$, so there are

$$\binom{4}{2} = \frac{4 \times 3}{2} = 6$$

genotypes, as could easily be verified by counting. ◀

Note: Sampling with Replacement. The counting method described in Example 1.8.4 is a type of sampling with replacement that is different from the type described in Example 1.7.10. In Example 1.7.10, we sampled with replacement, but we distinguished between samples having the same balls in different orders. This could be called ordered sampling with replacement.
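Returning to the genotype count in Example 1.8.4, the identity $n + \binom{n}{2} = \binom{n+1}{2}$ can be checked by direct enumeration. The following is a minimal sketch in Python, not part of the original text; the function name `enumerate_genotypes` is a hypothetical helper introduced only for illustration.

```python
from itertools import combinations_with_replacement
from math import comb


def enumerate_genotypes(alleles):
    """List unordered pairs of alleles, allowing both alleles to be the same.

    Hypothetical helper for illustration; order within a pair is ignored,
    so ('O', 'A') stands for both OA and AO.
    """
    return list(combinations_with_replacement(alleles, 2))


# Blood-type case from Example 1.8.4: three alleles O, A, B.
genotypes = enumerate_genotypes("OAB")
print(genotypes)
print(len(genotypes), comb(4, 2))

# Check n + C(n, 2) = C(n + 1, 2) for a few small allele sets.
for n in range(1, 10):
    assert len(enumerate_genotypes(range(n))) == n + comb(n, 2) == comb(n + 1, 2)
```

The enumeration treats each genotype as an unordered pair, which is the same convention the example adopts when it identifies OA with AO.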
In Example 1.8.4, samples containing the same genes in different orders were considered the same outcome. This could be called unordered sampling with replacement. The general formula for the number of unordered samples of size $k$ with replacement from $n$ elements is $\binom{n+k-1}{k}$, and can be derived in Exercise 19. It is possible to have $k$ larger than $n$ when sampling with replacement.

Example 1.8.5 Selecting Baked Goods. You go to a bakery to select some baked goods for a dinner party. You need to choose a total of 12 items. The baker has seven different types of items from which to choose, with lots of each type available. How many different boxfuls of 12 items are possible for you to choose? Here we will not distinguish the same collection of 12 items arranged in different orders in the box. This is an example of unordered sampling with replacement because we can (indeed we must) choose the same type of item more than once, but we are not distinguishing the same items in different orders. There are $\binom{7+12-1}{12} = \binom{18}{12}$ = 18,564 different boxfuls. ◀

Example 1.8.5 raises an issue that can cause confusion if one does not carefully determine the elements of the sample space and carefully specify which outcomes (if any) are equally likely. The next example illustrates the issue in the context of Example 1.8.5.

Example 1.8.6 Selecting Baked Goods. Imagine two different ways of choosing a boxful of 12 baked goods selected from the seven different types available. In the first method, you choose one item at random from the seven available. Then, without regard to what item was chosen first, you choose the second item at random from the seven available. Then you continue in this way choosing the next item at random from the seven available without regard to what has already been chosen until you have chosen 12. For this method of choosing, it is natural to let the outcomes be the possible sequences of the 12 types of items chosen.
The sample space would contain $7^{12} = 1.38 \times 10^{10}$ different outcomes that would be equally likely.

In the second method of choosing, the baker tells you that she has available 18,564 different boxfuls freshly packed. You then select one at random. In this case, the sample space would consist of 18,564 different equally likely outcomes.

In spite of the different sample spaces that arise in the two methods of choosing, there are some verbal descriptions that identify an event in both sample spaces. For example, both sample spaces contain an event that could be described as {all 12 items are of the same type} even though the outcomes are different types of mathematical objects in the two sample spaces. The probability that all 12 items are of the same type will actually be different depending on which method you use to choose the boxful.

In the first method, seven of the $7^{12}$ equally likely outcomes contain 12 of the same type of item.
Hence, the probability that all 12 items are of the same type is $7/7^{12} = 5.06 \times 10^{-10}$. In the second method, there are seven equally likely boxes that contain 12 of the same type of item. Hence, the probability that all 12 items are of the same type is $7/18{,}564 = 3.77 \times 10^{-4}$. Before one can compute the probability for an event such as {all 12 items are of the same type}, one must be careful about defining the experiment and its outcomes. ◀

Arrangements of Elements of Two Distinct Types When a set contains only elements of two distinct types, a binomial coefficient can be used to represent the number of different arrangements of all the elements in the set. Suppose, for example, that $k$ similar red balls and $n - k$ similar green balls are to be arranged in a row. Since the red balls will occupy $k$ positions in the row, each different arrangement of the $n$ balls corresponds to a different choice of the $k$ positions occupied by the red balls. Hence, the number of different arrangements of the $n$ balls will be equal to the number of different ways in which $k$ positions can be selected for the red balls from the $n$ available positions. Since this number of ways is specified by the binomial coefficient $\binom{n}{k}$, the number of different arrangements of the $n$ balls is also $\binom{n}{k}$. In other words, the number of different arrangements of $n$ objects consisting of $k$ similar objects of one type and $n - k$ similar objects of a second type is $\binom{n}{k}$.

Example 1.8.7 Tossing a Coin. Suppose that a fair coin is to be tossed 10 times, and it is desired to determine (a) the probability $p$ of obtaining exactly three heads and (b) the probability $p'$ of obtaining three or fewer heads.

(a) The total possible number of different sequences of 10 heads and tails is $2^{10}$, and it may be assumed that each of these sequences is equally probable.
The number of these sequences that contain exactly three heads will be equal to the number of different arrangements that can be formed with three heads and seven tails. Here are some of those arrangements:

HHHTTTTTTT, HHTHTTTTTT, HHTTHTTTTT, TTHTHTHTTT, etc.

Each such arrangement is equivalent to a choice of where to put the 3 heads among the 10 tosses, so there are $\binom{10}{3}$ such arrangements. The probability of obtaining exactly three heads is then

$$p = \frac{\binom{10}{3}}{2^{10}} = 0.1172.$$

(b) Using the same reasoning as in part (a), the number of sequences in the sample space that contain exactly $k$ heads ($k = 0, 1, 2, 3$) is $\binom{10}{k}$. Hence, the probability of obtaining three or fewer heads is

$$p' = \frac{\binom{10}{0} + \binom{10}{1} + \binom{10}{2} + \binom{10}{3}}{2^{10}} = \frac{1 + 10 + 45 + 120}{2^{10}} = \frac{176}{2^{10}} = 0.1719.$$ ◀

Note: Using Two Different Methods in the Same Problem. Part (a) of Example 1.8.7 is another example of using two different counting methods in the same problem. Part (b) illustrates another general technique.
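The two probabilities in Example 1.8.7 can also be checked by brute force, listing all $2^{10}$ equally likely sequences and counting heads in each. This is an illustrative sketch in Python rather than anything from the original text; `heads_count_distribution` is a hypothetical helper name.

```python
from itertools import product
from math import comb


def heads_count_distribution(n_tosses):
    """Count how many H/T sequences of length n_tosses have each number of heads.

    Hypothetical helper for illustration: counts[j] is the number of
    sequences that contain exactly j heads.
    """
    counts = [0] * (n_tosses + 1)
    for seq in product("HT", repeat=n_tosses):
        counts[seq.count("H")] += 1
    return counts


counts = heads_count_distribution(10)
total = 2 ** 10

# Part (a): exactly three heads.
p_exactly_three = counts[3] / total
# Part (b): three or fewer heads, as a sum over disjoint subsets of the event.
p_at_most_three = sum(counts[:4]) / total

# The enumerated counts agree with the binomial coefficients C(10, j).
assert counts == [comb(10, j) for j in range(11)]
print(round(p_exactly_three, 4), round(p_at_most_three, 4))
```

The enumeration is feasible here only because there are 1,024 sequences; the binomial coefficient gives the same counts without listing anything.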
In this part, we broke the event of interest into several disjoint subsets and counted the numbers of outcomes separately for each subset and then added the counts together to get the total. In many problems, it can require several applications of the same or different counting methods in order to count the number of outcomes in an event. The next example is one in which the elements of an event are formed in two parts (multiplication rule), but we need to perform separate combination calculations to determine the numbers of outcomes for each part.

Example 1.8.8 Sampling without Replacement. Suppose that a class contains 15 boys and 30 girls, and that 10 students are to be selected at random for a special assignment. We shall determine the probability $p$ that exactly three boys will be selected.
The number of different combinations of the 45 students that might be obtained in the sample of 10 students is $\binom{45}{10}$, and the statement that the 10 students are selected at random means that each of these $\binom{45}{10}$ possible combinations is equally probable. Therefore, we must find the number of these combinations that contain exactly three boys and seven girls.

When a combination of three boys and seven girls is formed, the number of different combinations in which three boys can be selected from the 15 available boys is $\binom{15}{3}$, and the number of different combinations in which seven girls can be selected from the 30 available girls is $\binom{30}{7}$. Since each of these combinations of three boys can be paired with each of the combinations of seven girls to form a distinct sample, the number of combinations containing exactly three boys is $\binom{15}{3}\binom{30}{7}$. Therefore, the desired probability is

$$p = \frac{\binom{15}{3}\binom{30}{7}}{\binom{45}{10}} = 0.2904.$$ ◀

Example 1.8.9 Playing Cards. Suppose that a deck of 52 cards containing four aces is shuffled thoroughly and the cards are then distributed among four players so that each player receives 13 cards. We shall determine the probability that each player will receive one ace.

The number of possible different combinations of the four positions in the deck occupied by the four aces is $\binom{52}{4}$, and it may be assumed that each of these $\binom{52}{4}$ combinations is equally probable. If each player is to receive one ace, then there must be exactly one ace among the 13 cards that the first player will receive and one ace among each of the remaining three groups of 13 cards that the other three players will receive. In other words, there are 13 possible positions for the ace that the first player is to receive, 13 other possible positions for the ace that the second player is to receive, and so on. Therefore, among the $\binom{52}{4}$ possible combinations of the positions for the four aces, exactly $13^4$ of these combinations will lead to the desired result.
Hence, the probability $p$ that each player will receive one ace is

$$p = \frac{13^4}{\binom{52}{4}} = 0.1055.$$ ◀

Ordered versus Unordered Samples Several of the examples in this section and the previous section involved counting the numbers of possible samples that could arise using various sampling schemes. Sometimes we treated the same collection of elements in different orders as different samples, and sometimes we treated the same elements in different orders as the same sample. In general, how can one tell which is the correct way to count in a given problem? Sometimes, the problem description will make it clear which is needed. For example, if we are asked to find the probability that the items in a sample arrive in a specified order, then we cannot even specify the event of interest unless we treat different arrangements of the same items as different outcomes. Examples 1.8.5 and 1.8.6 illustrate how different problem descriptions can lead to very different calculations. However, there are cases in which the problem description does not make it clear whether or not one must count the same elements in different orders as different outcomes. Indeed, there are some problems that can be solved correctly both ways. Example 1.8.9 is one such problem.
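The unordered count used in Example 1.8.9 is small enough to check by enumeration: there are only $\binom{52}{4}$ sets of positions for the four aces. The sketch below is illustrative and not from the original text. It assumes, purely for concreteness, that player $i$ receives deck positions $13i$ through $13i + 12$; any fixed dealing rule gives the same count.

```python
from itertools import combinations
from math import comb

DECK_SIZE = 52
HAND_SIZE = 13  # assumed dealing rule: player i holds positions 13*i .. 13*i + 12

# Outcomes are unordered sets of the four positions occupied by the aces.
favorable = 0
for positions in combinations(range(DECK_SIZE), 4):
    # With four aces and four players, "one ace each" means the four
    # positions fall in four different players' blocks of 13 cards.
    players_with_ace = {pos // HAND_SIZE for pos in positions}
    if len(players_with_ace) == 4:
        favorable += 1

total = comb(DECK_SIZE, 4)
p = favorable / total
print(favorable, total, round(p, 4))
```

Because the outcomes here are sets of positions, different arrangements of the four aces within those positions are never counted separately, in either the numerator or the denominator.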
In that problem, we needed to decide what we would call an outcome, and then we needed to count how many outcomes were in the whole sample space $S$ and how many were in the event $E$ of interest. In the solution presented in Example 1.8.9, we chose as our outcomes the positions in the 52-card deck that were occupied by the four aces. We did not count different arrangements of the four aces in those four positions as different outcomes when we counted the number of outcomes in $S$. Hence, when we calculated the number of outcomes in $E$, we also did not count the different arrangements of the four aces in the four possible positions as different outcomes. In general, this is the principle that should guide the choice of counting method. If we have the choice between whether or not to count the same elements in different orders as different outcomes, then we need to make our choice and be consistent throughout the problem. If we count the same elements in different orders as different outcomes when counting the outcomes in $S$, we must do the same when counting the elements of $E$. If we do not count them as different outcomes when counting $S$, we should not count them as different when counting $E$.

Example 1.8.10 Playing Cards, Revisited. We shall solve the problem in Example 1.8.9 again, but this time, we shall distinguish outcomes with the same cards in different orders. To go to the extreme, let each outcome be a complete ordering of the 52 cards. So, there are 52! possible outcomes. How many of these have one ace in each of the four sets of 13 cards received by the four players? As before, there are $13^4$ ways to choose the four positions for the four aces, one among each of the four sets of 13 cards. No matter which of these sets of positions we choose, there are 4! ways to arrange the four aces in these four positions. No matter how the aces are arranged, there are 48! ways to arrange the remaining 48 cards in the 48 remaining positions. So, there are $13^4 \times 4! \times 48!$ outcomes in the event of interest. We then calculate

$$p = \frac{13^4 \times 4! \times 48!}{52!} = 0.1055.$$ ◀

In the following example, whether one counts the same items in different orders as different outcomes is allowed to depend on which events one wishes to use.

Example 1.8.11 Lottery Tickets. In a lottery game, six numbers from 1 to 30 are drawn at random from a bin without replacement, and each player buys a ticket with six different numbers from 1 to 30. If all six numbers drawn match those on the player's ticket, the player wins. We assume that all possible draws are equally likely. One way to construct a sample space for the experiment of drawing the winning combination is to consider the possible sequences of draws. That is, each outcome consists of an ordered subset of six numbers chosen from the 30 available numbers. There are $P_{30,6} = 30!/24!$ such outcomes. With this sample space $S$, we can calculate probabilities for events such as

A = {the draw contains the numbers 1, 14, 15, 20, 23, and 27},
B = {one of the numbers drawn is 15}, and
C = {the first number drawn is less than 10}.
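As a rough illustration of how the events $A$, $B$, and $C$ are counted in the ordered sample space $S$, the sketch below applies the multiplication rule to each event. It is not part of the original text; the variable names and the counting arguments in the comments are one possible way to set up the calculation.

```python
from fractions import Fraction
from math import perm

N_NUMBERS = 30
N_DRAWN = 6

# |S|: ordered draws of six numbers from 30, i.e. P_{30,6} = 30!/24!.
size_S = perm(N_NUMBERS, N_DRAWN)

# A: the six drawn numbers are a specific set, in any of 6! orders.
count_A = perm(N_DRAWN, N_DRAWN)

# B: choose which of the 6 draws is the number 15, then fill the other
# five draws with an ordered selection from the remaining 29 numbers.
count_B = N_DRAWN * perm(N_NUMBERS - 1, N_DRAWN - 1)

# C: the first draw is one of 1..9, then the other five draws are an
# ordered selection from the remaining 29 numbers.
count_C = 9 * perm(N_NUMBERS - 1, N_DRAWN - 1)

pr_A = Fraction(count_A, size_S)
pr_B = Fraction(count_B, size_S)
pr_C = Fraction(count_C, size_S)
print(size_S, pr_A, pr_B, pr_C)
```

Exact fractions are used so that the results can be compared with hand calculations without rounding.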
There is another natural sample space, which we shall denote $S'$, for this experiment. It consists solely of the different combinations of six numbers drawn from the 30 available. There are $\binom{30}{6} = 30!/(6!\,24!)$ such outcomes. It also seems natural to consider all of these outcomes equally likely.
With this sample space, we can calculate the Bacima, masCndo é um subconjunto do espacgo amostral $, portanto, nao podemos calcular sua probabilities of the events A and B above, but C is not a subset of the sample space probabilidade usando esse espaco amostral menor. Quando 0 espaco amostral para um experimento S’, so we cannot calculate its probability using this smaller sample space. When the pode naturalmente ser construido de mais de uma maneira, é necessario escolher com base em quais sample space for an experiment could naturally be constructed in more than one way, eventos se deseja calcular as probabilidades. one needs to choose based on for which events one wants to compute probabilities. - < O Exemplo 1.8.11 levanta a questdo de saber se sera possivel calcular as mesmas Example 1.8.11 raises the question of whether one will compute the same prob- probabilidades usando dois espacos amostrais diferentes quando 0 evento, comoAouZB, existe abilities using two different sample spaces when the event, such as A or B, exists em ambos os espacos amostrais. No exemplo, cada resultado no espaco amostral menor 5 in both sample spaces. In the example, each outcome in the smaller sample space corresponde a um evento no espacgo amostral maior S. Na verdade, cada resultadoé S’ corresponds to an event in the larger sample space S. Indeed, each outcome s’ emSicorresponde ao evento emScontendo o 6! permutacgées da combinacgdo in S’ corresponds to the event in S containing the 6! permutations of the single Unicaé. Por exemplo, o eventoAno exemplo tem apenas um resultado 6=(1,14, combination s’. For example, the event A in the example has only one outcome 15,20,23,27)no espaco amostralS, enquanto o evento correspondente no espago s’ = (1, 14, 15, 20, 23, 27) in the sample space S’, while the corresponding event in amostralStem 6! permutagées incluindo the sample space S has 6! permutations including (1,14,15,20,23,27), (14,20,27,15,23,1), (27,23,20,15,14,1 }etc. 
(1, 14, 15, 20, 23, 27), (14, 20, 27, 15, 23, 1), (27, 23, 20, 15, 14, 1), ete. No espaco amostralS, a probabilidade do eventoAé In the sample space S, the probability of the event A is ! 124) ! 124! Pr.(AF = Oy Pr(a)= = P30,6 : 6 30,6 , (5) No espaco amostralS, o eventoAtem essa mesma probabilidade porque tem apenas um dose In the sample space S’, the event A has this same probability because it has only one resultad@@igualmente provaveis. O mesmo raciocinio se aplica a todos os resultados em of the (°°) equally likely outcomes. The same reasoning applies to every outcome in S. Portanto, se o mesmo evento pode ser expresso em ambos os espacos amostraisSeS;, calcularemos S’. Hence, if the same event can be expressed in both sample spaces S and S’, we a mesma probabilidade usando qualquer espaco amostral. Esta é uma caracteristica especial de will compute the same probability using either sample space. This is a special feature exemplos como o Exemplo 1.8.11, em que cada resultado no espaco amostral menor corresponde a of examples like Example 1.8.11 in which each outcome in the smaller sample space um evento no espaco amostral maior com o mesmo numero de elementos. Existem exemplos em que corresponds to an event in the larger sample space with the same number of elements. esta caracteristica ndo esta presente, e ndo se pode tratar ambos os espacos amostrais como espacos There are examples in which this feature is not present, and one cannot treat both amostrais simples. sample spaces as simple sample spaces. Exemplo Jogando moedas.Um experimento consiste em langar uma moeda duas vezes. Se quisermos Example Tossing Coins. An experiment consists of tossing a coin two times. If we want to 1.8.12 distinguir H seguido por T de T seguido por H, devemos usar 0 espaco amostral S= {HH, 1.8.12 distinguish H followed by T from T followed by H, we should use the sample space HT, TH, TT},que pode naturalmente ser assumido como um espaco amostral simples. 
On the other hand, we might be interested solely in the number of H's tossed. In this case, we might consider the smaller sample space $S' = \{0, 1, 2\}$, where each outcome merely counts the number of H's. The outcomes 0 and 2 in $S'$ each correspond to a single outcome in $S$, but $1 \in S'$ corresponds to the event $\{HT, TH\} \subset S$ with two outcomes. If we think of $S$ as a simple sample space, then $S'$ will not be a simple sample space, because the outcome 1 will have probability 1/2 while the other two outcomes each have probability 1/4.

There are situations in which one would be justified in treating $S'$ as a simple sample space and assigning each of its outcomes probability 1/3. One might do this if one believed that the coin was not fair, but one had no idea how unfair it was or which side was more likely to land up.
In this case, $S$ would not be a simple sample space, because two of its outcomes would have probability 1/3 and the other two would have probabilities that add up to 1/3.

Example 1.8.6 is another case of two different sample spaces in which each outcome in one sample space corresponds to a different number of outcomes in the other space. See Exercise 12 in Sec. 1.9 for a more complete analysis of Example 1.8.6.

The Tennis Tournament

We shall now present a difficult problem that has a simple and elegant solution. Suppose that n tennis players are entered in a tournament. In the first round, the players are paired one against another at random. The loser in each pair is eliminated from the tournament, and the winner in each pair continues into the second round. If the number of players n is odd, then one player is chosen at random before the pairings are made for the first round, and that player automatically continues into the second round. All the players in the second round are then paired at random.
Again, the loser in each pair is eliminated, and the winner in each pair continues into the third round. If the number of players in the second round is odd, then one of these players is chosen at random before the others are paired, and that player automatically continues into the third round. The tournament continues in this way until only two players remain in the final round. They then play against each other, and the winner of this match is the winner of the tournament. We shall assume that all n players have equal ability, and we shall determine the probability p that two specific players A and B will ever play against each other during the tournament.

We shall first determine the total number of matches that will be played during the tournament. After each match has been played, one player, the loser of that match, is eliminated from the tournament. The tournament ends when everyone has been eliminated from the tournament except the winner of the final match.
Since exactly n − 1 players must be eliminated, it follows that exactly n − 1 matches must be played during the tournament.

The number of possible pairs of players is $\binom{n}{2}$. Each of the two players in every match is equally likely to win that match, and all initial pairings are made in a random manner. Therefore, before the tournament begins, every possible pair of players is equally likely to appear in each particular one of the n − 1 matches to be played during the tournament. Accordingly, the probability that players A and B will meet in some particular match that is specified in advance is $1/\binom{n}{2}$. If A and B do meet in that particular match, one of them will lose and be eliminated. Therefore, these same two players cannot meet in more than one match.
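The counting argument can be checked by simulation. The following Python sketch is an illustration only and is not part of the original text: the function names `simulate_tournament` and `estimate_meeting_probability` are hypothetical, players 0 and 1 stand in for A and B, and every match is decided by a fair random choice to reflect the equal-ability assumption. It confirms that n − 1 matches are played and lets one compare the observed meeting frequency with (n − 1) times $1/\binom{n}{2}$, the two quantities obtained above.

```python
import random
from math import comb


def simulate_tournament(n, rng):
    """Play one random knockout tournament among players 0..n-1.

    Returns (number of matches played, whether players 0 and 1 met).
    """
    players = list(range(n))
    matches = 0
    met = False
    while len(players) > 1:
        rng.shuffle(players)  # random pairing for this round
        advancing = []
        if len(players) % 2 == 1:
            # Odd round: a randomly chosen player advances without playing.
            advancing.append(players.pop())
        for i in range(0, len(players), 2):
            x, y = players[i], players[i + 1]
            matches += 1
            if {x, y} == {0, 1}:
                met = True
            advancing.append(rng.choice((x, y)))  # equal ability
        players = advancing
    return matches, met


def estimate_meeting_probability(n, trials=20000, seed=0):
    """Fraction of simulated tournaments in which players 0 and 1 meet."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        matches, met = simulate_tournament(n, rng)
        assert matches == n - 1  # one player is eliminated per match
        hits += met
    return hits / trials


if __name__ == "__main__":
    n = 8
    print("simulated frequency:", estimate_meeting_probability(n))
    print("(n - 1) / C(n, 2): ", (n - 1) / comb(n, 2))
```

The seed and the number of trials are arbitrary choices; a larger number of trials gives a tighter estimate.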
It follows from the preceding explanation that the probability p that players A and B will meet at some time during the tournament is equal to the product of the probability $1/\binom{n}{2}$ that they will meet in any particular specified match and the total number n − 1 of different matches in which they might possibly meet. The product applies because A and B cannot meet in more than one match, so the n − 1 events are disjoint and their probabilities add. Hence,

$$p = \frac{n-1}{\binom{n}{2}} = \frac{2}{n}.$$

Summary

We showed that the number of size k subsets of a set of size n is $\binom{n}{k} = n!/[k!(n-k)!]$. This turns out to be the number of possible samples of size k drawn without replacement from a population of size n as well as the number of arrangements of n items of two types with k of one type and n − k of the other type. We also saw several examples in which more than one counting technique was required at different points in the same problem. Sometimes, more than one technique is required to count the elements of a single set.

Exercises
1. Two pollsters will canvass a neighborhood with 20 houses. Each pollster will visit 10 of the houses. How many different assignments of pollsters to houses are possible?

2. Which of the following two numbers is larger: $\binom{93}{30}$ or $\binom{93}{31}$?

3. Which of the following two numbers is larger: $\binom{93}{30}$ or $\binom{93}{63}$?

4. A box contains 24 light bulbs, of which four are defective. If a person selects four bulbs from the box at random, without replacement, what is the probability that all four bulbs will be defective?

5. Prove that the following number is an integer:

$$\frac{4155 \times 4156 \times \cdots \times 4250 \times 4251}{2 \times 3 \times \cdots \times 96 \times 97}.$$

6. Suppose that n people are seated in a random manner in a row of n theater seats. What is the probability that two particular people A and B will be seated next to each other?

7. If k people are seated in a random manner in a row containing n seats (n > k), what is the probability that the people will occupy k adjacent seats in the row?

8. If k people are seated in a random manner in a circle containing n chairs (n > k), what is the probability that the people will occupy k adjacent chairs in the circle?

9. If n people are seated in a random manner in a row containing 2n seats, what is the probability that no two people will occupy adjacent seats?

10. A box contains 24 light bulbs, of which two are defective. If a person selects 10 bulbs at random, without replacement, what is the probability that both defective bulbs will be selected?

11. Suppose that a committee of 12 people is selected in a random manner from a group of 100 people. Determine the probability that two particular people A and B will both be selected.

12. Suppose that 35 people are divided in a random manner into two teams in such a way that one team contains 10 people and the other team contains 25 people. What is the probability that two particular people A and B will be on the same team?

13. A box contains 24 light bulbs of which four are defective. If one person selects 10 bulbs from the box in a random manner, and a second person then takes the remaining 14 bulbs, what is the probability that all four defective bulbs will be obtained by the same person?

14. Prove that, for all positive integers n and k ($n \ge k$),

$$\binom{n}{k} + \binom{n}{k-1} = \binom{n+1}{k}.$$

15. a. Prove that

$$\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n.$$

b. Prove that

$$\binom{n}{0} - \binom{n}{1} + \binom{n}{2} - \binom{n}{3} + \cdots + (-1)^n \binom{n}{n} = 0.$$

Hint: Use the binomial theorem.

16. The United States Senate contains two senators from each of the 50 states. (a) If a committee of eight senators is selected at random, what is the probability that it will contain at least one of the two senators from a certain specified state? (b) What is the probability that a group of 50 senators selected at random will contain one senator from each state?

17. A deck of 52 cards contains four aces. If the cards are shuffled and distributed in a random manner to four players so that each player receives 13 cards, what is the probability that all four aces will be received by the same player?

18. Suppose that 100 mathematics students are divided into five classes, each containing 20 students, and that awards are to be given to 10 of these students. If each student is equally likely to receive an award, what is the probability that exactly two students in each class will receive awards?

19. A restaurant has n items on its menu. During a particular day, k customers will arrive and each one will choose one item. The manager wants to count how many different collections of customer choices are possible without regard to the order in which the choices are made. (For example, if k = 3 and $a_1, \ldots, a_n$ are the menu items, then $a_1 a_3 a_1$ is not distinguished from $a_1 a_1 a_3$.) Prove that the number of different collections of customer choices is $\binom{n+k-1}{k}$. Hint: Assume that the menu items are $a_1, \ldots, a_n$. Show that each collection of customer choices, arranged with the $a_1$'s first, the $a_2$'s second, etc., can be identified with a sequence of k zeros and n − 1 ones, where each 0 stands for a customer choice and each 1 indicates a point in the sequence where the menu item number increases by 1. For example, if k = 3 and n = 5, then $a_1 a_1 a_3$ becomes 0011011.

20. Prove the binomial theorem 1.8.2. Hint: You may use an induction argument. That is, first prove that the result is true if n = 1. Then, under the assumption that there is $n_0$ such that the result is true for all $n \le n_0$, prove that it is also true for $n = n_0 + 1$.

21. Return to the birthday problem on page 30. How many different sets of birthdays are available with k people and 365 days when we don't distinguish the same birthdays in different orders? For example, if k = 3, we would count (Jan. 1, Mar. 3, Jan. 1) the same as (Jan. 1, Jan. 1, Mar. 3).

22. Let n be a large even integer. Use Stirling's formula (Theorem 1.7.5) to find an approximation to the binomial coefficient $\binom{n}{n/2}$. Compute the approximation with n = 500.
1.9 Multinomial Coefficients

We learn how to count the number of ways to partition a finite set into more than two disjoint subsets. This generalizes the binomial coefficients from Sec. 1.8. The generalization is useful when outcomes consist of several parts selected from a fixed number of distinct types.

We begin with a fairly simple example that will illustrate the general ideas of this section.

Example 1.9.1 Choosing Committees. Suppose that 20 members of an organization are to be divided into three committees A, B, and C in such a way that each of the committees A and B is to have eight members and committee C is to have four members. We shall determine the number of different ways in which members can be assigned to these committees. Notice that each of the 20 members gets assigned to one and only one committee.

One way to think of the assignments is to form committee A first by choosing its eight members and then split the remaining 12 members into committees B and C.
Each of these operations is choosing a combination, and every choice of committee A can be paired with every one of the splits of the remaining 12 members into committees B and C. Hence, the number of assignments into three committees is the product of the numbers of combinations for the two parts of the assignment. Specifically, to form committee A, we must choose eight out of 20 members, and this can be done in $\binom{20}{8}$ ways. Then, to split the remaining 12 members into committees B and C, there are $\binom{12}{8}$ ways to do it. Here, the answer is

$$\binom{20}{8}\binom{12}{8} = \frac{20!}{8!\,12!} \cdot \frac{12!}{8!\,4!} = \frac{20!}{8!\,8!\,4!} = 62{,}355{,}150.$$

Notice how the 12! that appears in the denominator of $\binom{20}{8}$ divides out with the 12! that appears in the numerator of $\binom{12}{8}$. This fact is the key to the general formula that we shall derive next.

In general, suppose that n distinct elements are to be divided into k different groups ($k \ge 2$) in such a way that, for $j = 1, \ldots, k$, the jth group contains exactly $n_j$ elements, where $n_1 + n_2 + \cdots + n_k = n$.
It is desired to determine the number of different ways in which the n elements can be divided into the k groups. The $n_1$ elements in the first group can be selected from the n available elements in $\binom{n}{n_1}$ different ways. After the $n_1$ elements in the first group have been selected, the $n_2$ elements in the second group can be selected from the remaining $n - n_1$ elements in $\binom{n-n_1}{n_2}$ different ways. Hence, the total number of different ways of selecting the elements for both the first group and the second group is $\binom{n}{n_1}\binom{n-n_1}{n_2}$. After the $n_1 + n_2$ elements in the first two groups have been selected, the number of different ways in which the $n_3$ elements in the third group can be selected is $\binom{n-n_1-n_2}{n_3}$.
Hence, the total number of different ways of selecting the elements for the first three groups is

$$\binom{n}{n_1}\binom{n-n_1}{n_2}\binom{n-n_1-n_2}{n_3}.$$

It follows from the preceding explanation that, for each $j = 1, \ldots, k-2$, after the first j groups have been formed, the number of different ways in which the $n_{j+1}$ elements in the next group (j + 1) can be selected from the remaining $n - n_1 - \cdots - n_j$ elements is $\binom{n-n_1-\cdots-n_j}{n_{j+1}}$. After the elements of group k − 1 have been selected, the remaining $n_k$ elements must then form the last group. Hence, the total number of different ways of dividing the n elements into the k groups is

$$\binom{n}{n_1}\binom{n-n_1}{n_2}\binom{n-n_1-n_2}{n_3}\cdots\binom{n-n_1-\cdots-n_{k-2}}{n_{k-1}} = \frac{n!}{n_1!\,n_2!\cdots n_k!},$$

where the last formula follows from writing the binomial coefficients in terms of factorials.

Definition 1.9.1 Multinomial Coefficients. The number

$$\frac{n!}{n_1!\,n_2!\cdots n_k!}, \quad \text{which we shall denote by} \quad \binom{n}{n_1, n_2, \ldots, n_k},$$

is called a multinomial coefficient.
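The derivation above gives two equivalent ways to evaluate a multinomial coefficient: as a product of binomial coefficients, one per group, or directly from factorials. The Python sketch below is offered only as an illustration; the function names `multinomial_by_binomials` and `multinomial_by_factorials` are hypothetical and do not come from the text. It evaluates both forms for the committee sizes 8, 8, and 4 of Example 1.9.1.

```python
from math import comb, factorial


def multinomial_by_binomials(*sizes):
    """Choose each group in turn from the elements that remain."""
    remaining = sum(sizes)
    ways = 1
    for size in sizes:
        ways *= comb(remaining, size)  # C(n - n_1 - ... - n_{j-1}, n_j)
        remaining -= size
    return ways


def multinomial_by_factorials(*sizes):
    """Evaluate n! / (n_1! n_2! ... n_k!) with n = sum of the sizes."""
    denominator = 1
    for size in sizes:
        denominator *= factorial(size)
    return factorial(sum(sizes)) // denominator


if __name__ == "__main__":
    print(multinomial_by_binomials(8, 8, 4))   # committees A, B, C
    print(multinomial_by_factorials(8, 8, 4))
```

Both calls return 62,355,150, the count obtained in Example 1.9.1. The product-of-binomials form avoids building the large intermediate factorial, which can matter when n is large.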
The name multinomial coefficient derives from the appearance of the symbol in the multinomial theorem, whose proof is left as Exercise 11 in this section.

Theorem 1.9.1 Multinomial Theorem. For all numbers $x_1, \ldots, x_k$ and each positive integer n,

$$(x_1 + \cdots + x_k)^n = \sum \binom{n}{n_1, n_2, \ldots, n_k} x_1^{n_1} x_2^{n_2} \cdots x_k^{n_k},$$

where the summation extends over all possible combinations of nonnegative integers $n_1, \ldots, n_k$ such that $n_1 + n_2 + \cdots + n_k = n$.

A multinomial coefficient is a generalization of the binomial coefficient discussed in Sec. 1.8. For k = 2, the multinomial theorem is the same as the binomial theorem, and the multinomial coefficient becomes a binomial coefficient. In particular,

$$\binom{n}{k, n-k} = \binom{n}{k}.$$

Example 1.9.2 Choosing Committees. In Example 1.9.1, we see that the solution obtained there is the same as the multinomial coefficient for which n = 20, k = 3, $n_1 = n_2 = 8$, and $n_3 = 4$, namely,

$$\binom{20}{8, 8, 4} = \frac{20!}{(8!)^2\,4!} = 62{,}355{,}150.$$
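The multinomial theorem can be checked numerically for small cases by summing its right-hand side term by term. The following Python sketch is an illustration under stated assumptions rather than a proof: the helper names `multinomial` and `expand_power_of_sum` are hypothetical, and the brute-force enumeration of exponent tuples is practical only for small n and k.

```python
from itertools import product
from math import comb, factorial


def multinomial(counts):
    """n! / (n_1! n_2! ... n_k!) with n = sum(counts)."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)  # each partial quotient is an integer
    return result


def expand_power_of_sum(xs, n):
    """Right-hand side of the multinomial theorem for (x_1 + ... + x_k)^n."""
    total = 0
    # Enumerate all tuples of nonnegative integers (n_1, ..., n_k) ...
    for counts in product(range(n + 1), repeat=len(xs)):
        if sum(counts) != n:  # ... and keep those that sum to n.
            continue
        term = multinomial(counts)
        for x, c in zip(xs, counts):
            term *= x ** c
        total += term
    return total


if __name__ == "__main__":
    xs, n = [1, 2, 3], 4
    print(expand_power_of_sum(xs, n), sum(xs) ** n)
    # For k = 2 the coefficient reduces to a binomial coefficient.
    print(multinomial((3, 4)), comb(7, 3))
```

With integer inputs the comparison is exact, so the two numbers on each printed line agree.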
Arrangements of Elements of More Than Two Distinct Types

Just as binomial coefficients can be used to represent the number of different arrangements of the elements of a set containing elements of only two distinct types, multinomial coefficients can be used to represent the number of different arrangements of the elements of a set containing elements of k different types ($k \ge 2$). Suppose, for example, that n balls of k different colors are to be arranged in a row and that there are $n_j$ balls of color j ($j = 1, \ldots, k$), where $n_1 + n_2 + \cdots + n_k = n$. Then each different arrangement of the n balls corresponds to a different way of dividing the n available positions in the row into a group of $n_1$ positions to be occupied by the balls of color 1, a second group of $n_2$ positions to be occupied by the balls of color 2, and so on. Hence, the total number of different possible arrangements of the n balls must be

$$\binom{n}{n_1, n_2, \ldots, n_k} = \frac{n!}{n_1!\,n_2!\cdots n_k!}.$$
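For a handful of balls, the correspondence between arrangements and multinomial coefficients can be confirmed by listing every distinguishable arrangement. The Python sketch below is illustrative only; the names `multinomial` and `count_arrangements` are hypothetical, colors are represented by the integers 0, 1, 2, ..., and the enumeration grows factorially, so it is meant for very small inputs.

```python
from itertools import permutations
from math import factorial


def multinomial(counts):
    """n! / (n_1! n_2! ... n_k!) with n = sum(counts)."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result


def count_arrangements(counts):
    """Count distinguishable rows of balls, with counts[j] balls of color j."""
    balls = [color for color, c in enumerate(counts) for _ in range(c)]
    # Permutations that differ only by swapping same-colored balls look
    # identical, so collecting them in a set keeps one copy of each row.
    return len(set(permutations(balls)))


if __name__ == "__main__":
    counts = (2, 1, 2)  # two balls of color 0, one of color 1, two of color 2
    print(count_arrangements(counts), multinomial(counts))
```

For the counts (2, 1, 2) both numbers are 30, which is 5!/(2! 1! 2!).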
Example 1.9.3 Rolling Dice. Suppose that 12 dice are to be rolled. We shall determine the probability p that each of the six different numbers will appear twice.

Each outcome in the sample space $S$ can be regarded as an ordered sequence of 12 numbers, where the ith number in the sequence is the outcome of the ith roll. Hence, there will be $6^{12}$ possible outcomes in $S$, and each of these outcomes can be regarded as equally probable. The number of these outcomes that would contain each of the six numbers 1, 2, ..., 6 exactly twice will be equal to the number of different possible arrangements of these 12 elements. This number can be determined by evaluating the multinomial coefficient for which n = 12, k = 6, and $n_1 = n_2 = \cdots = n_6 = 2$. Hence, the number of such outcomes is

$$\binom{12}{2, 2, 2, 2, 2, 2} = \frac{12!}{(2!)^6},$$

and the required probability p is

$$p = \frac{12!}{2^6\,6^{12}} \approx 0.0034.$$

Example 1.9.4 Playing Cards. A deck of 52 cards contains 13 hearts.
Suppose that the cards are shuffled and distributed among four players A, B, C, and D so that each player receives 13 cards. We shall determine the probability p that player A will receive six hearts, player B will receive four hearts, player C will receive two hearts, and player D will receive one heart.

The total number N of different ways in which the 52 cards can be distributed among the four players so that each player receives 13 cards is

$$N = \binom{52}{13, 13, 13, 13} = \frac{52!}{(13!)^4}.$$

It may be assumed that each of these ways is equally probable. We must now calculate the number M of ways of distributing the cards so that each player receives the required number of hearts. The number of different ways in which the hearts can be distributed to players A, B, C, and D so that the numbers of hearts they receive are 6, 4, 2, and 1, respectively, is

$$\binom{13}{6, 4, 2, 1} = \frac{13!}{6!\,4!\,2!\,1!}.$$
‘ YO 1.9 Coeficientes Multinomiais 45 1.9 Multinomial Coefficients 45 Além disso, o numero de maneiras diferentes pelas quais as outras 39 cartas podem ser distribuidas aos Also, the number of different ways in which the other 39 cards can then be distributed quatro jogadores, de modo que cada um tenha um total de 13 cartas, é to the four players so that each will have a total of 13 cards is ( 39 ) _ 39! ( 39 ) _ 39! 7,9,11,12 719111112!” 7,9,11,12)/ 7!9"1"2!" Portanto, Therefore, 13! 39! 13! 39! M= ——— .——__, M = ——_ - ——_., 6l4!2'1! 719111112! Ol4!2"! 719111112! e a probabilidade necessdariapé and the required probability p is M 13!39!(13! M 13!391(13))4 p= —= BRISA =0.00196. p=—= B83) = 0.00196. N 6141211171911 1112!52! N 6l41211171911111 2152! Ha outra abordagem para este problema nos moldes indicados no Exemplo 1.8.9 There is another approach to this problem along the lines indicated in Exam- na pagina 37. O numero de combinac6es diferentes possiveis das 13 posicées ple 1.8.9 on page 37. The number of possible different combinations of the 13 posi- ¢des no baralho ocupado pelas copas 613. G2 ogadorAé receber seis coragdes, tions in the deck occupied by the hearts is (33) If player A is to receive six hearts, ha13) spossiveis combinagées das seis posigdes que esses coragdes ocupam entre there are (¢) possible combinations of the six positions these hearts occupy among as 13 cartas queAira receber. Da mesma forma, se 0 jogadorBé receber quatro coracées, ha the 13 cards that A will receive. Similarly, if player B is to receive four hearts, there siol"Bossiveis combinacées de suas posicées entre as 13 cartas queBira re- are (7) possible combinations of their positions among the 13 cards that B will re- receber. Ha2_ ("3) combinacées possiveis para 0 jogadorC, e haiposbivel ceive. There are (5) possible combinations for player C, and there are (") possible combinacgées para jogadorD. Por isso, combinations for player D. 
Hence, ()O430 13 13 13) (13) (13) (13 6 4 2 1 _ \6 4 2 1 Pp ED , p= 52 13 13 que produz 0 mesmo valor obtido pelo primeiro método de solucao. which produces the same value as the one obtained by the first method of solution. - < Resumo Summary ( ), ae ; Co ; ; ; Os coeficientes multinomiais generalizam os coeficientes binomiais. O coeficiente m a n, © Multinomial coefficients generalize binomial coefficients. The coefficient (,, " n,) Is o numero de maneiras de particionar um conjunto denitens em subconjuntos the number of ways to partition a set of n items into distinguishable subsets of sizes distinguiveis de tamanhos m,..., nkondem+. . +n. E também o nimero de arranjos ny, ..., Mg Where ny +--- +n, =n. It is also the number of arrangements of n items denitens dekdiferentes tipos para os quaisneusdo do tipoeuparaeu=1,..., k.O Exemplo of k different types for which n; are of typei fori =1,..., k. Example 1.9.4 illustrates 1.9.4 ilustra outro ponto importante a ser lembrado sobre o calculo de probabilidades: another important point to remember about computing probabilities: There might pode haver mais de um método correto para calcular a mesma probabilidade. be more than one correct method for computing the same probability. Exercicios Exercises 1.Trés pesquisadores examinardo um bairro com 21 casas. __ ter cinco membros e a outra comissdo ter oito membros, 1. Three pollsters will canvas a neighborhood with 21 to have five members and the other committee is to have Cada pesquisador visitara sete das casas. Quantas de quantas maneiras diferentes essas comissOes podem houses. Each pollster will visit seven of the houses. How eight members, in how many different ways can these atribuigées diferentes de pesquisadores as casas sdo ser selecionadas? many different assignments of pollsters to houses are pos- committees be selected? possiveis? oe . . sible? i. . 4.Se as letras6,6,6,¢¢¢.eu,eu,a,cestdo organizados em 4. 
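The numbers in Examples 1.9.3 and 1.9.4 can be checked numerically. The following is a minimal sketch, not part of the text: the helper name `multinomial` and the variable names are our own, and it only assumes Python 3.8 or later for `math.comb` and `math.prod`. It evaluates the dice probability and both solution methods for the card problem.

```python
from math import comb, factorial, prod


def multinomial(*counts):
    """Multinomial coefficient n! / (n_1! ... n_k!) with n = sum(counts)."""
    return factorial(sum(counts)) // prod(factorial(c) for c in counts)


# Example 1.9.3: each of the six faces appears exactly twice in 12 rolls.
p_dice = multinomial(2, 2, 2, 2, 2, 2) / 6**12

# Example 1.9.4, first method: p = M / N.
N = multinomial(13, 13, 13, 13)
M = multinomial(6, 4, 2, 1) * multinomial(7, 9, 11, 12)
p_first = M / N

# Example 1.9.4, second method: positions of the hearts within each hand.
p_second = comb(13, 6) * comb(13, 4) * comb(13, 2) * comb(13, 1) / comb(52, 13)

print(round(p_dice, 4), round(p_first, 5), round(p_second, 5))
```

Integer arithmetic is used for the counts so that only the final division is carried out in floating point; the two card probabilities then agree up to rounding error.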
Exercises

1. Three pollsters will canvass a neighborhood with 21 houses. Each pollster will visit seven of the houses. How many different assignments of pollsters to houses are possible?

2. Suppose that 18 red beads, 12 yellow beads, eight blue beads, and 12 black beads are to be strung in a row. How many different arrangements of the colors can be formed?

3. Suppose that two committees are to be formed in an organization that has 300 members. If one committee is to have five members and the other committee is to have eight members, in how many different ways can these committees be selected?

4. If the letters s, s, s, t, t, t, i, i, a, c are arranged in a random order, what is the probability that they will spell the word "statistics"?

5. Suppose that $n$ balanced dice are rolled. Determine the probability that the number $j$ will appear exactly $n_j$ times ($j = 1, \ldots, 6$), where $n_1 + n_2 + \cdots + n_6 = n$.

6. If seven balanced dice are rolled, what is the probability that each of the six different numbers will appear at least once?

7. Suppose that a deck of 25 cards contains 12 red cards. Suppose also that the 25 cards are distributed in a random manner to three players A, B, and C in such a way that player A receives 10 cards, player B receives eight cards, and player C receives seven cards. Determine the probability that player A will receive six red cards, player B will receive two red cards, and player C will receive four red cards.

8. A deck of 52 cards contains 12 picture cards. If the 52 cards are distributed in a random manner among four players in such a way that each player receives 13 cards, what is the probability that each player will receive three picture cards?

9. Suppose that a deck of 52 cards contains 13 red cards, 13 yellow cards, 13 blue cards, and 13 green cards. If the 52 cards are distributed in a random manner among four players in such a way that each player receives 13 cards, what is the probability that each player will receive 13 cards of the same color?

10. Suppose that two boys named Davis, three boys named Jones, and four boys named Smith are seated at random in a row containing nine seats. What is the probability that the Davis boys will occupy the first two seats in the row, the Jones boys will occupy the next three seats, and the Smith boys will occupy the last four seats?

11. Prove the multinomial theorem 1.9.1. (You may wish to use the same hint as in Exercise 20 in Sec. 1.8.)

12. Return to Example 1.8.6. Let $S$ be the larger sample space (first method of choosing) and let $S'$ be the smaller sample space (second method). For each element $s'$ of $S'$, let $N(s')$ stand for the number of elements of $S$ that lead to the same boxful $s'$ when the order of choosing is ignored.
a. For each $s' \in S'$, find a formula for $N(s')$. Hint: Let $n_i$ stand for the number of items of type $i$ in $s'$ for $i = 1, \ldots, 7$.
b. Verify that $\sum_{s' \in S'} N(s')$ equals the number of outcomes in $S$.

1.10 The Probability of a Union of Events

The axioms of probability tell us directly how to find the probability of the union of disjoint events.
Theorem 1.5.7 showed how to find the probability for the union of two arbitrary events. This theorem is generalized to the union of an arbitrary finite collection of events.

We shall now consider again an arbitrary sample space $S$ that may contain either a finite number of outcomes or an infinite number, and we shall develop some further general properties of the various probabilities that might be specified for the events in $S$. In this section, we shall study in particular the probability of the union $\bigcup_{i=1}^{n} A_i$ of $n$ events $A_1, \ldots, A_n$.

If the events $A_1, \ldots, A_n$ are disjoint, we know that

$$\Pr\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \Pr(A_i).$$

Furthermore, for every two events $A_1$ and $A_2$, regardless of whether or not they are disjoint, we know from Theorem 1.5.7 of Sec. 1.5 that

$$\Pr(A_1 \cup A_2) = \Pr(A_1) + \Pr(A_2) - \Pr(A_1 \cap A_2).$$

In this section, we shall extend this result, first to three events and then to an arbitrary finite number of events.
The Union of Three Events

Theorem 1.10.1 For every three events $A_1$, $A_2$, and $A_3$,

$$\Pr(A_1 \cup A_2 \cup A_3) = \Pr(A_1) + \Pr(A_2) + \Pr(A_3) - [\Pr(A_1 \cap A_2) + \Pr(A_2 \cap A_3) + \Pr(A_1 \cap A_3)] + \Pr(A_1 \cap A_2 \cap A_3). \tag{1.10.1}$$

Proof By the associative property of unions (Theorem 1.4.6), we can write

$$A_1 \cup A_2 \cup A_3 = (A_1 \cup A_2) \cup A_3.$$

Apply Theorem 1.5.7 to the two events $A = A_1 \cup A_2$ and $B = A_3$ to obtain

$$\Pr(A_1 \cup A_2 \cup A_3) = \Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B). \tag{1.10.2}$$

We next compute the three probabilities on the far right side of (1.10.2) and combine them to get (1.10.1). First, apply Theorem 1.5.7 to the two events $A_1$ and $A_2$ to obtain

$$\Pr(A) = \Pr(A_1) + \Pr(A_2) - \Pr(A_1 \cap A_2). \tag{1.10.3}$$

Next, use the first distributive property in Theorem 1.4.10 to write

$$A \cap B = (A_1 \cup A_2) \cap A_3 = (A_1 \cap A_3) \cup (A_2 \cap A_3). \tag{1.10.4}$$

Apply Theorem 1.5.7 to the events on the far right side of (1.10.4) to obtain

$$\Pr(A \cap B) = \Pr(A_1 \cap A_3) + \Pr(A_2 \cap A_3) - \Pr(A_1 \cap A_2 \cap A_3). \tag{1.10.5}$$

Substitute (1.10.3), $\Pr(B) = \Pr(A_3)$, and (1.10.5) into (1.10.2) to complete the proof.

Example 1.10.1 Student Enrollment. Among a group of 200 students, 137 students are enrolled in a mathematics class, 50 students are enrolled in a history class, and 124 students are enrolled in a music class. Furthermore, the number of students enrolled in both the mathematics and history classes is 33, the number enrolled in both the history and music classes is 29, and the number enrolled in both the mathematics and music classes is 92. Finally, the number of students enrolled in all three classes is 18. We shall determine the probability that a student selected at random from the group of 200 students will be enrolled in at least one of the three classes.

Let $A_1$ denote the event that the selected student is enrolled in the mathematics class, let $A_2$ denote the event that he is enrolled in the history class, and let $A_3$ denote the event that he is enrolled in the music class. To solve the problem, we must determine the value of $\Pr(A_1 \cup A_2 \cup A_3)$. From the given numbers,

$$\Pr(A_1) = \frac{137}{200}, \quad \Pr(A_2) = \frac{50}{200}, \quad \Pr(A_3) = \frac{124}{200},$$

$$\Pr(A_1 \cap A_2) = \frac{33}{200}, \quad \Pr(A_2 \cap A_3) = \frac{29}{200}, \quad \Pr(A_1 \cap A_3) = \frac{92}{200},$$

$$\Pr(A_1 \cap A_2 \cap A_3) = \frac{18}{200}.$$

It follows from Eq. (1.10.1) that $\Pr(A_1 \cup A_2 \cup A_3) = 175/200 = 7/8$. ◀
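The arithmetic behind Example 1.10.1 can be reproduced directly from Eq. (1.10.1). The sketch below is only an illustration; the dictionary layout and variable names are our own choices, and exact fractions are used so that the result appears as 7/8 rather than a rounded decimal.

```python
from fractions import Fraction

total = 200
singles = {"math": 137, "history": 50, "music": 124}
pairs = {
    ("math", "history"): 33,
    ("history", "music"): 29,
    ("math", "music"): 92,
}
triple = 18

# Eq. (1.10.1): add the single counts, subtract the pairwise
# intersections, then add back the triple intersection.
at_least_one = sum(singles.values()) - sum(pairs.values()) + triple
p_union = Fraction(at_least_one, total)

print(at_least_one, p_union)
```

Working with counts first and dividing by 200 at the end is equivalent to applying the formula to the seven probabilities, because every term shares the same denominator.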
The Union of a Finite Number of Events

A result similar to Theorem 1.10.1 holds for any arbitrary finite number of events, as shown by the following theorem.

Theorem 1.10.2 For every $n$ events $A_1, \ldots, A_n$,

$$\Pr\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \Pr(A_i) - \sum_{i<j} \Pr(A_i \cap A_j) + \sum_{i<j<k} \Pr(A_i \cap A_j \cap A_k) - \sum_{i<j<k<l} \Pr(A_i \cap A_j \cap A_k \cap A_l) + \cdots + (-1)^{n+1} \Pr(A_1 \cap A_2 \cap \cdots \cap A_n). \tag{1.10.6}$$

Proof The proof proceeds by induction. In particular, we first establish that (1.10.6) is true for $n = 1$ and $n = 2$. Next, we show that if there exists $m$ such that (1.10.6) is true for all $n \le m$, then (1.10.6) is also true for $n = m + 1$. The case of $n = 1$ is trivial, and the case of $n = 2$ is Theorem 1.5.7. To complete the proof, assume that (1.10.6) is true for all $n \le m$. Let $A_1, \ldots, A_{m+1}$ be events. Define $A = \bigcup_{i=1}^{m} A_i$ and $B = A_{m+1}$. Theorem 1.5.7 says that

$$\Pr\left(\bigcup_{i=1}^{m+1} A_i\right) = \Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B). \tag{1.10.7}$$

We have assumed that $\Pr(A)$ equals (1.10.6) with $n = m$. We need to show that when we add $\Pr(A)$ to $\Pr(B) - \Pr(A \cap B)$, we get (1.10.6) with $n = m + 1$. The difference between (1.10.6) with $n = m + 1$ and $\Pr(A)$ is all of the terms in which one of the subscripts ($i$, $j$, $k$, etc.) equals $m + 1$. Those terms are the following:

$$\Pr(A_{m+1}) - \sum_{i=1}^{m} \Pr(A_i \cap A_{m+1}) + \sum_{i<j} \Pr(A_i \cap A_j \cap A_{m+1}) - \sum_{i<j<k} \Pr(A_i \cap A_j \cap A_k \cap A_{m+1}) + \cdots + (-1)^{m+2} \Pr(A_1 \cap A_2 \cap \cdots \cap A_m \cap A_{m+1}). \tag{1.10.8}$$

The first term in (1.10.8) is $\Pr(B) = \Pr(A_{m+1})$. All that remains is to show that $-\Pr(A \cap B)$ equals all but the first term in (1.10.8).

Use the natural generalization of the distributive property (Theorem 1.4.10) to write

$$A \cap B = \left(\bigcup_{i=1}^{m} A_i\right) \cap A_{m+1} = \bigcup_{i=1}^{m} (A_i \cap A_{m+1}). \tag{1.10.9}$$

The union in (1.10.9) contains $m$ events, and hence we can apply (1.10.6) with $n = m$ and each $A_i$ replaced by $A_i \cap A_{m+1}$. The result is that $-\Pr(A \cap B)$ equals all but the first term in (1.10.8).

The calculation in Theorem 1.10.2 can be outlined as follows: First, take the sum of the probabilities of the $n$ individual events. Second, subtract the sum of the probabilities of the intersections of all possible pairs of events; in this step, there will be $\binom{n}{2}$ different pairs for which the probabilities are included. Third, add the probabilities of the intersections of all possible groups of three of the events; there will be $\binom{n}{3}$ intersections of this type. Fourth, subtract the sum of the probabilities of the intersections of all possible groups of four of the events; there will be $\binom{n}{4}$ intersections of this type. Continue in this way until, finally, the probability of the intersection of all $n$ events is either added or subtracted, depending on whether $n$ is an odd number or an even number.
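The alternating sum in Eq. (1.10.6) translates almost line for line into a loop over group sizes. The following sketch is an illustration under stated assumptions rather than anything from the text: events are represented as Python sets inside a finite sample space with equally likely outcomes, and the function name `union_probability` and the die example are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def union_probability(events, sample_space):
    """Pr of the union of `events` via Eq. (1.10.6).

    Assumes a finite sample space whose outcomes are equally likely,
    so Pr(E) = |E| / |S| for every event E.
    """
    size = len(sample_space)
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = 1 if r % 2 == 1 else -1  # (-1)^(r+1)
        # There are C(n, r) groups of r events at this step.
        for group in combinations(events, r):
            common = set.intersection(*group)
            total += sign * Fraction(len(common), size)
    return total


# Hypothetical illustration: one roll of a fair die.
S = set(range(1, 7))
A1, A2, A3 = {1, 2, 3}, {2, 4, 6}, {3, 6}
by_formula = union_probability([A1, A2, A3], S)
direct = Fraction(len(A1 | A2 | A3), len(S))

print(by_formula, direct)
```

The loop visits $2^n - 1$ intersections, so this form is practical only for small $n$; its value here is that the result can be compared with the probability of the union computed directly.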
The Matching Problem

Suppose that all the cards in a deck of $n$ different cards are placed in a row, and that the cards in another similar deck are then shuffled and placed in a row on top of the cards in the original deck. It is desired to determine the probability $p_n$ that there will be at least one match between the corresponding cards from the two decks. The same problem can be expressed in various entertaining contexts. For example, we could suppose that a person types $n$ letters, types the corresponding addresses on $n$ envelopes, and then places the $n$ letters in the $n$ envelopes in a random manner. It could be desired to determine the probability $p_n$ that at least one letter will be placed in the correct envelope. As another example, we could suppose that the photographs of $n$ famous film actors are paired in a random manner with $n$ photographs of the same actors taken when they were babies. It could then be desired to determine the probability $p_n$ that the photograph of at least one actor will be paired correctly with this actor's own baby photograph.

Here we shall discuss this matching problem in the context of letters being placed in envelopes. Thus, we shall let $A_i$ be the event that letter $i$ is placed in the correct envelope ($i = 1, \ldots, n$), and we shall determine the value of $p_n = \Pr\left(\bigcup_{i=1}^{n} A_i\right)$ by using Eq. (1.10.6). Since the letters are placed in the envelopes at random, the probability $\Pr(A_i)$ that any particular letter will be placed in the correct envelope is $1/n$. Therefore, the value of the first summation on the right side of Eq. (1.10.6) is

$$\sum_{i=1}^{n} \Pr(A_i) = n \cdot \frac{1}{n} = 1.$$

Furthermore, since letter 1 could be placed in any one of $n$ envelopes and letter 2 could then be placed in any one of the other $n - 1$ envelopes, the probability $\Pr(A_1 \cap A_2)$ that both letter 1 and letter 2 will be placed in the correct envelopes is $1/[n(n-1)]$. Similarly, the probability $\Pr(A_i \cap A_j)$ that any two specific letters $i$ and $j$ ($i \ne j$) will both be placed in the correct envelopes is $1/[n(n-1)]$. Therefore, the value of the second summation on the right side of Eq. (1.10.6) is

$$\sum_{i<j} \Pr(A_i \cap A_j) = \binom{n}{2} \frac{1}{n(n-1)} = \frac{1}{2!}.$$

By similar reasoning, it can be determined that the probability $\Pr(A_i \cap A_j \cap A_k)$ that any three specific letters $i$, $j$, and $k$ ($i < j < k$) will be placed in the correct envelopes is $1/[n(n-1)(n-2)]$. Therefore, the value of the third summation is

$$\sum_{i<j<k} \Pr(A_i \cap A_j \cap A_k) = \binom{n}{3} \frac{1}{n(n-1)(n-2)} = \frac{1}{3!}.$$

This procedure can be continued until it is found that the probability $\Pr(A_1 \cap A_2 \cap \cdots \cap A_n)$ that all $n$ letters will be placed in the correct envelopes is $1/(n!)$. It now follows from Eq. (1.10.6) that the probability $p_n$ that at least one letter will be placed in the correct envelope is

$$p_n = 1 - \frac{1}{2!} + \frac{1}{3!} - \frac{1}{4!} + \cdots + (-1)^{n+1} \frac{1}{n!}. \tag{1.10.10}$$

This probability has the following interesting features. As $n \to \infty$, the value of $p_n$ approaches the following limit:

$$\lim_{n \to \infty} p_n = 1 - \frac{1}{2!} + \frac{1}{3!} - \frac{1}{4!} + \cdots.$$

It is shown in books on elementary calculus that the sum of the infinite series on the right side of this equation is $1 - (1/e)$, where $e = 2.71828\ldots$. Hence, $1 - (1/e) = 0.63212\ldots$. It follows that for a large value of $n$, the probability $p_n$ that at least one letter will be placed in the correct envelope is approximately 0.63212.

The exact values of $p_n$, as given in Eq. (1.10.10), will form an oscillating sequence as $n$ increases. As $n$ increases through the even integers $2, 4, 6, \ldots$, the values of $p_n$ will increase toward the limiting value 0.63212; and as $n$ increases through the odd integers $3, 5, 7, \ldots$, the values of $p_n$ will decrease toward this same limiting value.

The values of $p_n$ converge to the limit very rapidly. In fact, for $n = 7$ the exact value $p_7$ and the limiting value of $p_n$ agree to four decimal places. Hence, regardless of whether seven letters are placed at random in seven envelopes or seven million letters are placed at random in seven million envelopes, the probability that at least one letter will be placed in the correct envelope is 0.6321.

Summary

We generalized the formula for the probability of the union of two arbitrary events to the union of finitely many events. As an aside, there are cases in which it is easier to compute $\Pr(A_1 \cup \cdots \cup A_n)$ as $1 - \Pr(A_1^c \cap \cdots \cap A_n^c)$ using the fact that $(A_1 \cup \cdots \cup A_n)^c = A_1^c \cap \cdots \cap A_n^c$.
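The behavior of $p_n$ in the matching problem (oscillation around the limit and agreement with $1 - 1/e$ to four decimal places by $n = 7$) can be observed numerically. The sketch below is a hedged illustration; the function names `p_match` and `p_match_brute_force` are our own, and the brute-force check enumerates all $n!$ placements, so it is meant only for small $n$.

```python
from fractions import Fraction
from itertools import permutations
from math import e, factorial


def p_match(n):
    """Eq. (1.10.10): probability of at least one match among n letters."""
    return sum(Fraction((-1) ** (k + 1), factorial(k)) for k in range(1, n + 1))


def p_match_brute_force(n):
    """Same probability by enumerating all n! placements (small n only)."""
    perms = list(permutations(range(n)))
    # A placement has a match if some letter i lands in envelope i.
    hits = sum(any(letter == env for letter, env in enumerate(p)) for p in perms)
    return Fraction(hits, len(perms))


for n in range(1, 8):
    print(n, float(p_match(n)))
print("limit", 1 - 1 / e)
```

The printed values rise through the even $n$ and fall through the odd $n$, in line with the oscillation described for Eq. (1.10.10).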
∪ An)c = Ac 1 ∩ . . . ∩ Ac n. Exercises 1. Three players are each dealt, in a random manner, five cards from a deck containing 52 cards. Four of the 52 cards are aces. Find the probability that at least one person receives exactly two aces in their five cards. 2. In a certain city, three newspapers A, B, and C are published. Suppose that 60 percent of the families in the city subscribe to newspaper A, 40 percent of the families subscribe to newspaper B, and 30 percent subscribe to newspaper C. Suppose also that 20 percent of the families subscribe to both A and B, 10 percent subscribe to both A and C, 20 percent subscribe to both B and C, and 5 percent subscribe to all three newspapers A, B, and C. What percentage of the families in the city subscribe to at least one of the three newspapers? 3. For the conditions of Exercise 2, what percentage of the families in the city subscribe to exactly one of the three newspapers? 4. Suppose that three compact discs are removed from their cases, and that after they have been played, they are put back into the three empty cases in a random manner. Determine the probability that at least one of the CD’s will be put back into the proper cases. 5. Suppose that four guests check their hats when they arrive at a restaurant, and that these hats are returned to them in a random order when they leave. Determine the probability that no guest will receive the proper hat. 6. A box contains 30 red balls, 30 white balls, and 30 blue balls. If 10 balls are selected at random, without replace- ment, what is the probability that at least one color will be missing from the selection? 7. Suppose that a school band contains 10 students from the freshman class, 20 students from the sophomore class, 30 students from the junior class, and 40 students from the senior class. If 15 students are selected at random from the band, what is the probability that at least one student will be selected from each of the four classes? 
Hint: First determine the probability that at least one of the four classes will not be represented in the selection. 8. If n letters are placed at random in n envelopes, what is the probability that exactly n − 1 letters will be placed in the correct envelopes? 9. Suppose that n letters are placed at random in n en- velopes, and let qn denote the probability that no letter is placed in the correct envelope. For which of the follow- ing four values of n is qn largest: n = 10, n = 21, n = 53, or n = 300? 50 Capítulo 1 Introdução à Probabilidade É mostrado em livros de cálculo elementar que a soma das séries infinitas no lado direito desta equação é 1 -(1/e), ondee=2.71828. . . .Portanto, 1 -(1/e)= 0.63212. . . .Segue-se que para um grande valor den, a probabilidadepnque pelo menos uma carta será colocada no envelope correto é de aproximadamente 0,63212. Os valores exatos depn, conforme dado na Eq. (1.10.10), formará uma sequência oscilante comonaumenta. Comonaumenta através dos inteiros pares 2,4,6, . . . ,os valores depn aumentará em direção ao valor limite 0,63212; e comonaumenta através dos inteiros ímpares 3,5,7, . . . ,os valores depndiminuirá em direção a esse mesmo valor limite. Os valores depnconvergem para o limite muito rapidamente. Na verdade, paran=7 o valor exatop7e o valor limite depnconcordar com quatro casas decimais. Portanto, independentemente de sete cartas serem colocadas aleatoriamente em sete envelopes ou sete milhões de cartas serem colocadas aleatoriamente em sete milhões de envelopes, a probabilidade de pelo menos uma carta ser colocada no envelope correto é 0,6321. Resumo Generalizamos a fórmula da probabilidade da união de dois eventos arbitrários para a união de um número finito de eventos. À parte, há casos em que é mais fácil de calcular Pr(A1∪. . .∪An)como 1 - Pr(Ac 1∩. . .∩Ac n)usando o fato de que (A1∪. . .∪An)c=Ac 1∩. . .∩Ac n. 
10. If three letters are placed at random in three envelopes, what is the probability that exactly one letter will be placed in the correct envelope?
11. Suppose that 10 cards, of which five are red and five are green, are placed at random in 10 envelopes, of which five are red and five are green. Determine the probability that exactly x envelopes will contain a card with a matching color (x = 0, 1, ..., 10).
12. Let $A_1, A_2, \ldots$ be an infinite sequence of events such that $A_1 \subset A_2 \subset \cdots$. Prove that
$$\Pr\left(\bigcup_{i=1}^{\infty} A_i\right) = \lim_{n \to \infty} \Pr(A_n).$$
Hint: Let the sequence $B_1, B_2, \ldots$ be defined as in Exercise 12 of Sec. 1.5, and show that
$$\Pr\left(\bigcup_{i=1}^{\infty} A_i\right) = \lim_{n \to \infty} \Pr\left(\bigcup_{i=1}^{n} B_i\right) = \lim_{n \to \infty} \Pr(A_n).$$
13. Let $A_1, A_2, \ldots$ be an infinite sequence of events such that $A_1 \supset A_2 \supset \cdots$. Prove that
$$\Pr\left(\bigcap_{i=1}^{\infty} A_i\right) = \lim_{n \to \infty} \Pr(A_n).$$
Hint: Consider the sequence $A_1^c, A_2^c, \ldots$, and apply Exercise 12.

1.11 Statistical Swindles
This section presents some examples of how one can be misled by arguments that require one to ignore the calculus of probability.

Misleading Use of Statistics
The field of statistics has a poor image in the minds of many people because there is a widespread belief that statistical data and statistical analyses can easily be manipulated in an unscientific and unethical fashion in an effort to show that a particular conclusion or point of view is correct. We all have heard the sayings that “There are three kinds of lies: lies, damned lies, and statistics” (Mark Twain [1924, p. 246] says that this line has been attributed to Benjamin Disraeli) and that “you can prove anything with statistics.”
One benefit of studying probability and statistics is that the knowledge we gain enables us to analyze statistical arguments that we read in newspapers, magazines, or elsewhere. We can then evaluate these arguments on their merits, rather than accepting them blindly. In this section, we shall describe three schemes that have been used to induce consumers to send money to the operators of the schemes in exchange for certain types of information. The first two schemes are not strictly statistical in nature, but they are strongly based on undertones of probability.

Perfect Forecasts
Suppose that one Monday morning you receive in the mail a letter from a firm with which you are not familiar, stating that the firm sells forecasts about the stock market for very high fees.
To indicate the firm’s ability in forecasting, it predicts that a particular stock, or a particular portfolio of stocks, will rise in value during the coming week. You do not respond to this letter, but you do watch the stock market during the week and notice that the prediction was correct. On the following Monday morning you receive another letter from the same firm containing another prediction, this one specifying that a particular stock will drop in value during the coming week. Again the prediction proves to be correct.
This routine continues for seven weeks. Every Monday morning you receive a prediction in the mail from the firm, and each of these seven predictions proves to be correct. On the eighth Monday morning, you receive another letter from the firm. This letter states that for a large fee the firm will provide another prediction, on the basis of which you can presumably make a large amount of money on the stock market. How should you respond to this letter?
Since the firm has made seven successive correct predictions, it would seem that it must have some special information about the stock market and is not simply guessing. After all, the probability of correctly guessing the outcomes of seven successive tosses of a fair coin is only $(1/2)^7 \approx 0.008$. Hence, if the firm had only been guessing each week, then the firm had a probability less than 0.01 of being correct seven weeks in a row.
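The guessing probability quoted above can be checked with exact arithmetic. The following is a minimal sketch; the function name `prob_all_correct` is introduced here for illustration and does not come from the text, and it assumes each weekly guess is an independent fair coin toss, as in the passage.

```python
from fractions import Fraction


def prob_all_correct(n_weeks: int) -> Fraction:
    """Probability of guessing n_weeks independent fair up/down outcomes correctly.

    Hypothetical helper: assumes each guess is right with probability 1/2,
    independently of the others.
    """
    return Fraction(1, 2) ** n_weeks


p = prob_all_correct(7)
print(p, float(p))  # 1/128 0.0078125
```

The exact value 1/128 rounds to the 0.008 given in the text and is indeed below 0.01.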
The fallacy here is that you may have seen only a relatively small number of the forecasts that the firm made during the seven-week period. Suppose, for example, that the firm started the entire process with a list of $2^7 = 128$ potential clients. On the first Monday, the firm could send the forecast that a particular stock will rise in value to half of these clients and send the forecast that the same stock will drop in value to the other half. On the second Monday, the firm could continue writing to those 64 clients for whom the first forecast proved to be correct. It could again send a new forecast to half of those 64 clients and the opposite forecast to the other half. At the end of seven weeks, the firm (which usually consists of only one person and a computer) must necessarily have one client (and only one client) for whom all seven forecasts were correct.
By following this procedure with several different groups of 128 clients, and starting new groups each week, the firm may be able to generate enough positive responses from clients for it to realize significant profits.

Guaranteed Winners
There is another scheme that is somewhat related to the one just described but that is even more elegant because of its simplicity. In this scheme, a firm advertises that for a fixed fee, usually 10 or 20 dollars, it will send the client its forecast of the winner of any upcoming baseball game, football game, boxing match, or other sports event that the client might specify. Furthermore, the firm offers a money-back guarantee that this forecast will be correct; that is, if the team or person designated as the winner in the forecast does not actually turn out to be the winner, the firm will return the full fee to the client.
How should you react to such an advertisement? At first glance, it would appear that the firm must have some special knowledge about these sports events, because otherwise it could not afford to guarantee its forecasts.
Further reflection reveals, however, that the firm simply cannot lose, because its only expenses are those for advertising and postage. In effect, when this scheme is used, the firm holds the client’s fee until the winner has been decided. If the forecast was correct, the firm keeps the fee; otherwise, it simply returns the fee to the client.
On the other hand, the client can very well lose. He presumably purchases the firm’s forecast because he desires to bet on the sports event. If the forecast proves to be wrong, the client will not have to pay any fee to the firm, but he will have lost any money that he bet on the predicted winner.
Thus, when there are “guaranteed winners,” only the firm is guaranteed to win. In fact, the firm knows that it will be able to keep the fees from all the clients for whom the forecasts were correct.
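Returning to the Perfect Forecasts scheme described earlier in this section, the halving procedure can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `run_forecast_scheme`, the integer client IDs, and the modeling of each week's market movement as a fair coin flip are all hypothetical and are not part of the text.

```python
import random


def run_forecast_scheme(n_weeks: int = 7, seed: int = 0) -> list:
    """Count the clients who have seen only correct forecasts after each week.

    Hypothetical model: the firm starts with 2**n_weeks clients, tells half of
    the remaining clients "rise" and the other half "drop" each week, and keeps
    writing only to the half that received the correct forecast.
    """
    rng = random.Random(seed)
    clients = list(range(2 ** n_weeks))  # hypothetical client IDs
    survivors_per_week = []
    for _ in range(n_weeks):
        half = len(clients) // 2
        told_rise, told_drop = clients[:half], clients[half:]
        stock_rose = rng.random() < 0.5  # the firm has no information about this
        clients = told_rise if stock_rose else told_drop
        survivors_per_week.append(len(clients))
    return survivors_per_week


print(run_forecast_scheme())  # [64, 32, 16, 8, 4, 2, 1]
```

Whatever the market does, exactly one client is left after seven weeks who has seen seven correct forecasts, which is the point of the fallacy: the firm's record looks perfect only to that one client.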

Improving Your Lottery Chances
State lotteries have become very popular in America. People spend millions of dollars each week to purchase tickets with very small chances of winning medium to enormous prizes. With so much money being spent on lottery tickets, it should not be surprising that a few enterprising individuals have concocted schemes to cash in on the probabilistic naiveté of the ticket-buying public. There are now several books and videos available that claim to help lottery players improve their performance. People actually pay money for these items. Some of the advice is just common sense, but some of it is misleading and plays on subtle misconceptions about probability.
For concreteness, suppose that we have a game in which there are 40 balls numbered 1 to 40 and six are drawn without replacement to determine the winning combination. A ticket purchase requires the customer to choose six different numbers from 1 to 40 and pay a fee. This game has $\binom{40}{6} = 3{,}838{,}380$ different winning combinations and the same number of possible tickets. One piece of advice often found in published lottery aids is not to choose the six numbers on your ticket too far apart. Many people tend to pick their six numbers uniformly spread out from 1 to 40, but the winning combination often has two consecutive numbers or at least two numbers very close together.
Some of these “advisors” recommend that, since it is more likely that there will be numbers close together, players should bunch some of their six numbers close together. Such advice might make sense in order to avoid choosing the same numbers as other players in a parimutuel game (i.e., a game in which all winners share the jackpot). But the idea that any strategy can improve your chances of winning is misleading.
To see why this advice is misleading, let E be the event that the winning combination contains at least one pair of consecutive numbers. The reader can calculate Pr(E) in Exercise 13 in Sec. 1.12. For this example, Pr(E) = 0.577. So the lottery aids are correct that E has high probability. However, by claiming that choosing a ticket in E increases your chance of winning, they confuse the probability of the event E with the probability of each outcome in E.
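The two numbers in this example can be reproduced directly. The sketch below assumes the count stated in Exercise 13(c) of Sec. 1.12, namely that the number of k-number combinations from 1 to n with no consecutive pair is $\binom{n-k+1}{k}$; the variable names are chosen here for illustration.

```python
from math import comb

n, k = 40, 6
total = comb(n, k)                    # number of possible tickets
no_consecutive = comb(n - k + 1, k)   # count taken from Exercise 13(c), Sec. 1.12
pr_E = 1 - no_consecutive / total     # at least one consecutive pair

print(total)           # 3838380
print(round(pr_E, 3))  # 0.577
print(1 / total)       # chance that any one particular ticket wins
```

The event E is likely because it contains many combinations, while the last line shows that each individual ticket, whether in E or not, wins with the same tiny probability.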
If you choose the ticket (5, 7, 14, 23, 24, 38), your probability of winning is only 1/3,838,380, just as it would be if you chose any other ticket. The fact that this ticket happens to be in E doesn’t make your probability of winning equal to 0.577. The reason that Pr(E) is so big is that so many different combinations are in E. Each of those combinations still has probability 1/3,838,380 of winning, and you only get one combination on each ticket. The fact that there are so many combinations in E does not make each one any more likely than anything else.

1.12 Supplementary Exercises
1. Suppose that a coin is tossed seven times. Let A denote the event that a head is obtained on the first toss, and let B denote the event that a head is obtained on the fifth toss. Are A and B disjoint?
2. If A, B, and D are three events such that $\Pr(A \cup B \cup D) = 0.7$, what is the value of $\Pr(A^c \cap B^c \cap D^c)$?
3. Suppose that a certain precinct contains 350 voters, of which 250 are Democrats and 100 are Republicans. If 30 voters are chosen at random from the precinct, what is the probability that exactly 18 Democrats will be selected?
4. Suppose that in a deck of 20 cards, each card has one of the numbers 1, 2, 3, 4, or 5 and there are four cards with each number. If 10 cards are chosen from the deck at random, without replacement, what is the probability that each of the numbers 1, 2, 3, 4, and 5 will appear exactly twice?
5. Consider the contractor in Example 1.5.4 on page 19. He wishes to compute the probability that the total utility demand is high, meaning that the sum of water and electrical demand (in the units of Example 1.4.5) is at least 215. Draw a picture of this event on a graph like Fig. 1.5 or Fig. 1.9 and find its probability.
6. Suppose that a box contains r red balls and w white balls. Suppose also that balls are drawn from the box one at a time, at random, without replacement. (a) What is the probability that all r red balls will be obtained before any white balls are obtained? (b) What is the probability that all r red balls will be obtained before two white balls are obtained?
7. Suppose that a box contains r red balls, w white balls, and b blue balls. Suppose also that balls are drawn from the box one at a time, at random, without replacement. What is the probability that all r red balls will be obtained before any white balls are obtained?
8. Suppose that 10 cards, of which seven are red and three are green, are put at random into 10 envelopes, of which seven are red and three are green, so that each envelope contains one card. Determine the probability that exactly k envelopes will contain a card with a matching color (k = 0, 1, ..., 10).
9. Suppose that 10 cards, of which five are red and five are green, are put at random into 10 envelopes, of which seven are red and three are green, so that each envelope contains one card. Determine the probability that exactly k envelopes will contain a card with a matching color (k = 0, 1, ..., 10).
10. Suppose that the events A and B are disjoint. Under what conditions are $A^c$ and $B^c$ disjoint?
11. Let $A_1$, $A_2$, and $A_3$ be three arbitrary events. Show that the probability that exactly one of these three events will occur is
$$\Pr(A_1) + \Pr(A_2) + \Pr(A_3) - 2\Pr(A_1 \cap A_2) - 2\Pr(A_1 \cap A_3) - 2\Pr(A_2 \cap A_3) + 3\Pr(A_1 \cap A_2 \cap A_3).$$
12. Let $A_1, \ldots, A_n$ be n arbitrary events. Show that the probability that exactly one of these n events will occur is
$$\sum_{i=1}^{n} \Pr(A_i) - 2\sum_{i<j} \Pr(A_i \cap A_j) + 3\sum_{i<j<k} \Pr(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1}\, n \Pr(A_1 \cap A_2 \cap \cdots \cap A_n).$$
13. Consider a state lottery game in which each winning combination and each ticket consists of one set of k numbers chosen from the numbers 1 to n without replacement. We shall compute the probability that the winning combination contains at least one pair of consecutive numbers.
a. Prove that if $n < 2k - 1$, then every winning combination has at least one pair of consecutive numbers. For the rest of the problem, assume that $n \ge 2k - 1$.
b. Let $i_1 < \cdots < i_k$ be an arbitrary possible winning combination arranged in order from smallest to largest. For $s = 1, \ldots, k$, let $j_s = i_s - (s - 1)$. That is,
$$j_1 = i_1, \quad j_2 = i_2 - 1, \quad \ldots, \quad j_k = i_k - (k - 1).$$
Prove that $(i_1, \ldots, i_k)$ contains at least one pair of consecutive numbers if and only if $(j_1, \ldots, j_k)$ contains repeated numbers.
c. Prove that $1 \le j_1 \le \cdots \le j_k \le n - k + 1$ and that the number of $(j_1, \ldots, j_k)$ sets with no repeats is $\binom{n-k+1}{k}$.
d. Find the probability that there is no pair of consecutive numbers in the winning combination.
e. Find the probability of at least one pair of consecutive numbers in the winning combination.

Chapter 2 Conditional Probability
2.1 The Definition of Conditional Probability
2.2 Independent Events
2.3 Bayes’ Theorem
2.4 The Gambler’s Ruin Problem
2.5 Supplementary Exercises

2.1 The Definition of Conditional Probability
A major use of probability in statistical inference is the updating of probabilities when certain events are observed.
The updated probability of event A after we learn that event B has occurred is the conditional probability of A given B.
Example 2.1.1 Lottery Ticket. Consider a state lottery game in which six numbers are drawn without replacement from a bin containing the numbers 1–30. Each player tries to match the set of six numbers that will be drawn without regard to the order in which the numbers are drawn. Suppose that you hold a ticket in such a lottery with the numbers 1, 14, 15, 20, 23, and 27. You turn on your television to watch the drawing but all you see is one number, 15, being drawn when the power suddenly goes off in your house. You don’t even know whether 15 was the first, last, or some in-between draw. However, now that you know that 15 appears in the winning draw, the probability that your ticket is a winner must be higher than it was before you saw the draw. How do you calculate the revised probability? ◀
Example 2.1.1 is typical of the following situation. An experiment is performed for which the sample space S is given (or can be constructed easily) and the probabilities are available for all of the events of interest. We then learn that some event B has occurred, and we want to know how the probability of another event A changes after we learn that B has occurred. In Example 2.1.1, the event that we have learned is B = {one of the numbers drawn is 15}. We are certainly interested in the probability of A = {the numbers 1, 14, 15, 20, 23, and 27 are drawn}, and possibly other events.
If we know that the event B has occurred, then we know that the outcome of the experiment is one of those included in B. Hence, to evaluate the probability that A will occur, we must consider the set of those outcomes in B that also result in the occurrence of A. As sketched in Fig. 2.1, this set is precisely the set A ∩ B. It is therefore natural to calculate the revised probability of A according to the following definition.
Figure 2.1 The outcomes in the event B that also belong to the event A.

Definition 2.1.1 Conditional Probability. Suppose that we learn that an event B has occurred and that we wish to compute the probability of another event A taking into account that we know that B has occurred. The new probability of A is called the conditional probability of the event A given that the event B has occurred and is denoted Pr(A|B). If Pr(B) > 0, we compute this probability as

    Pr(A|B) = Pr(A ∩ B) / Pr(B). (2.1.1)

The conditional probability Pr(A|B) is not defined if Pr(B) = 0.

For convenience, the notation in Definition 2.1.1 is read simply as the conditional probability of A given B. Eq. (2.1.1) indicates that Pr(A|B) is computed as the proportion of the total probability Pr(B) that is represented by Pr(A ∩ B), intuitively the proportion of B that is also part of A.

Example 2.1.2 Lottery Ticket. In Example 2.1.1, you learned that the event

    B = {one of the numbers drawn is 15}

has occurred. You want to calculate the probability of the event A that your ticket is a winner. Both events A and B are expressible in the sample space that consists of the (30 choose 6) = 30!/(6! 24!) possible combinations of 30 items taken six at a time, namely, the unordered draws of six numbers from 1–30. The event B consists of combinations that include 15. Since there are 29 remaining numbers from which to choose the other five in the winning draw, there are (29 choose 5) outcomes in B. It follows that

    Pr(B) = (29 choose 5) / (30 choose 6) = (29! 6! 24!) / (30! 5! 24!) = 0.2.

The event A that your ticket is a winner consists of a single outcome that is also in B, so A ∩ B = A, and

    Pr(A ∩ B) = Pr(A) = 1 / (30 choose 6) = (6! 24!) / 30! = 1.68 × 10^−6.
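The counts in Example 2.1.2 can be reproduced with exact integer arithmetic. The following Python sketch is an illustration only and is not part of the original text; the variable names are chosen here for readability. It applies Eq. (2.1.1) to the two probabilities computed above.

```python
from fractions import Fraction
from math import comb

# Unordered draws of six numbers from 1-30 (the sample space of Example 2.1.2).
total_draws = comb(30, 6)

# Draws that include the number 15: choose the other five from the remaining 29.
draws_with_15 = comb(29, 5)

pr_B = Fraction(draws_with_15, total_draws)  # Pr(B)
pr_A_and_B = Fraction(1, total_draws)        # Pr(A ∩ B) = Pr(A), a single outcome
pr_A_given_B = pr_A_and_B / pr_B             # Eq. (2.1.1)

print(pr_B)                 # exact value of Pr(B)
print(float(pr_A_and_B))    # Pr(A ∩ B) as a decimal
print(float(pr_A_given_B))  # Pr(A|B) as a decimal
```

Using Fraction keeps every intermediate value exact, so the ratio in Eq. (2.1.1) is not affected by floating-point rounding.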
It follows that the conditional probability of A given B is

    Pr(A|B) = (6! 24! / 30!) / 0.2 = 8.4 × 10^−6.

This is five times as large as Pr(A) before you learned that B had occurred. ◀

Definition 2.1.1 for the conditional probability Pr(A|B) is worded in terms of the subjective interpretation of probability in Sec. 1.2. Eq. (2.1.1) also has a simple meaning in terms of the frequency interpretation of probability. According to the frequency interpretation, if an experimental process is repeated a large number of times, then the proportion of repetitions in which the event B will occur is approximately Pr(B) and the proportion of repetitions in which both the event A and the event B will occur is approximately Pr(A ∩ B). Therefore, among those repetitions in which the event B occurs, the proportion of repetitions in which the event A will also occur is approximately equal to

    Pr(A|B) = Pr(A ∩ B) / Pr(B).

Example 2.1.3 Rolling Dice. Suppose that two dice were rolled and it was observed that the sum T of the two numbers was odd. We shall determine the probability that T was less than 8.

If we let A be the event that T < 8 and let B be the event that T is odd, then A ∩ B is the event that T is 3, 5, or 7. From the probabilities for two dice given at the end of Sec. 1.6, we can evaluate Pr(A ∩ B) and Pr(B) as follows:

    Pr(A ∩ B) = 2/36 + 4/36 + 6/36 = 12/36 = 1/3,
    Pr(B) = 2/36 + 4/36 + 6/36 + 4/36 + 2/36 = 18/36 = 1/2.

Hence,

    Pr(A|B) = Pr(A ∩ B) / Pr(B) = 2/3. ◀

Example 2.1.4 A Clinical Trial.
It is very common for patients with episodes of depression to have a recurrence within two to three years. Prien et al. (1984) studied three treatments for depression: imipramine, lithium carbonate, and a combination. As is traditional in such studies (called clinical trials), there was also a group of patients who received a placebo. (A placebo is a treatment that is supposed to be neither helpful nor harmful. Some patients are given a placebo so that they will not know that they did not receive one of the other treatments. None of the other patients knew which treatment or placebo they received, either.) In this example, we shall consider 150 patients who entered the study after an episode of depression that was classified as "unipolar" (meaning that there was no manic disorder). They were divided into the four groups (three treatments plus placebo) and followed to see how many had recurrences of depression. Table 2.1 summarizes the results.

Table 2.1 Results of the clinical depression study in Example 2.1.4

                           Treatment group
    Response      Imipramine  Lithium  Combination  Placebo  Total
    Relapse           18        13         22          24      77
    No relapse        22        25         16          10      73
    Total             40        38         38          34     150

If a patient were selected at random from this study and it were found that the patient received the placebo treatment, what is the conditional probability that the patient had a relapse? Let B be the event that the patient received the placebo, and let A be the event that the patient had a relapse. We can calculate Pr(B) = 34/150 and Pr(A ∩ B) = 24/150 directly from the table. Then Pr(A|B) = 24/34 = 0.706. On the other hand, if the randomly selected patient is found to have received lithium (call this event C) then Pr(C) = 38/150, Pr(A ∩ C) = 13/150, and Pr(A|C) = 13/38 = 0.342. Knowing which treatment a patient received seems to make a difference to the probability of relapse. In Chapter 10, we shall study methods for being more precise about how much of a difference it makes. ◀

Example 2.1.5 Rolling Dice Repeatedly. Suppose that two dice are to be rolled repeatedly and the sum T of the two numbers is to be observed for each roll. We shall determine the probability p that the value T = 7 will be observed before the value T = 8 is observed.

The desired probability p could be calculated directly as follows: We could assume that the sample space S contains all sequences of outcomes that terminate as soon as either the sum T = 7 or the sum T = 8 is obtained. Then we could find the sum of the probabilities of all the sequences that terminate when the value T = 7 is obtained.

However, there is a simpler approach in this example. We can consider the simple experiment in which two dice are rolled. If we repeat the experiment until either the sum T = 7 or the sum T = 8 is obtained, the effect is to restrict the outcome of the experiment to one of these two values. Hence, the problem can be restated as follows: Given that the outcome of the experiment is either T = 7 or T = 8, determine the probability p that the outcome is actually T = 7.
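The two approaches just described, summing over all terminating sequences and conditioning on a single roll being 7 or 8, can be compared numerically. The Python sketch below is illustrative only: the variable names and the truncation of the infinite sum at 200 terms are choices made here, not part of the text.

```python
from fractions import Fraction
from itertools import product

# Distribution of the sum T of two fair dice, by enumerating the 36 outcomes.
counts = {}
for d1, d2 in product(range(1, 7), repeat=2):
    counts[d1 + d2] = counts.get(d1 + d2, 0) + 1
pr = {t: Fraction(c, 36) for t, c in counts.items()}

# Simpler approach: condition on a single roll being a 7 or an 8.
p_conditional = pr[7] / (pr[7] + pr[8])

# Direct approach: T = 7 first appears on roll n after n - 1 rolls that are
# neither 7 nor 8. Sum the geometric series, truncated at 200 terms
# (an arbitrary cutoff for this sketch; the omitted tail is negligible).
q = float(1 - pr[7] - pr[8])
p_direct = sum(q ** (n - 1) * float(pr[7]) for n in range(1, 201))

print(p_conditional)
print(abs(p_direct - float(p_conditional)) < 1e-12)
```

The agreement of the two values reflects the fact that rolls other than 7 or 8 only delay the outcome and do not change which of the two sums appears first.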
If we let A be the event that T = 7 and let B be the event that the value of T is either 7 or 8, then A ∩ B = A and

    p = Pr(A|B) = Pr(A ∩ B) / Pr(B) = Pr(A) / Pr(B).

From the probabilities for two dice given in Example 1.6.5, Pr(A) = 6/36 and Pr(B) = (6/36) + (5/36) = 11/36. Hence, p = 6/11. ◀

The Multiplication Rule for Conditional Probabilities

In some experiments, certain conditional probabilities are relatively easy to assign directly. In these experiments, it is then possible to compute the probability that both of two events occur by applying the next result that follows directly from Eq. (2.1.1) and the analogous definition of Pr(B|A).

Theorem 2.1.1 Multiplication Rule for Conditional Probabilities. Let A and B be events. If Pr(B) > 0, then

    Pr(A ∩ B) = Pr(B) Pr(A|B).

If Pr(A) > 0, then

    Pr(A ∩ B) = Pr(A) Pr(B|A).

Example 2.1.6 Selecting Two Balls. Suppose that two balls are to be selected at random, without replacement, from a box containing r red balls and b blue balls. We shall determine the probability p that the first ball will be red and the second ball will be blue.

Let A be the event that the first ball is red, and let B be the event that the second ball is blue. Obviously, Pr(A) = r/(r + b). Furthermore, if the event A has occurred, then one red ball has been removed from the box on the first draw. Therefore, the probability of obtaining a blue ball on the second draw will be

    Pr(B|A) = b / (r + b − 1).

It follows that

    Pr(A ∩ B) = [r / (r + b)] · [b / (r + b − 1)]. ◀

The principle that has just been applied can be extended to any finite number of events, as stated in the following theorem.

Theorem 2.1.2 Multiplication Rule for Conditional Probabilities. Suppose that A1, A2, . . . , An are events such that Pr(A1 ∩ A2 ∩ . . . ∩ An−1) > 0. Then

    Pr(A1 ∩ A2 ∩ . . . ∩ An) = Pr(A1) Pr(A2|A1) Pr(A3|A1 ∩ A2) . . . Pr(An|A1 ∩ A2 ∩ . . . ∩ An−1). (2.1.2)

Proof The product of probabilities on the right side of Eq. (2.1.2) is equal to

    Pr(A1) · [Pr(A1 ∩ A2) / Pr(A1)] · [Pr(A1 ∩ A2 ∩ A3) / Pr(A1 ∩ A2)] . . . [Pr(A1 ∩ A2 ∩ . . . ∩ An) / Pr(A1 ∩ A2 ∩ . . . ∩ An−1)].

Since Pr(A1 ∩ A2 ∩ . . . ∩ An−1) > 0, each of the denominators in this product must be positive. All of the terms in the product cancel each other except the final numerator Pr(A1 ∩ A2 ∩ . . . ∩ An), which is the left side of Eq. (2.1.2).

Example 2.1.7 Selecting Four Balls. Suppose that four balls are selected one at a time, without replacement, from a box containing r red balls and b blue balls (r ≥ 2, b ≥ 2). We shall determine the probability of obtaining the sequence of outcomes red, blue, red, blue.

If we let Rj denote the event that a red ball is obtained on the jth draw and let Bj denote the event that a blue ball is obtained on the jth draw (j = 1, . . . , 4), then

    Pr(R1 ∩ B2 ∩ R3 ∩ B4) = Pr(R1) Pr(B2|R1) Pr(R3|R1 ∩ B2) Pr(B4|R1 ∩ B2 ∩ R3)
                          = [r / (r + b)] · [b / (r + b − 1)] · [(r − 1) / (r + b − 2)] · [(b − 1) / (r + b − 3)]. ◀

Note: Conditional Probabilities Behave Just Like Probabilities. In all of the situations that we shall encounter in this text, every result that we can prove has a conditional version given an event B with Pr(B) > 0. Just replace all probabilities by conditional probabilities given B and replace all conditional probabilities given other events C by conditional probabilities given C ∩ B. For example, Theorem 1.5.3 says that Pr(A^c) = 1 − Pr(A). It is easy to prove that Pr(A^c|B) = 1 − Pr(A|B) if Pr(B) > 0. (See Exercises 11 and 12 in this section.) Another example is Theorem 2.1.3, which is a conditional version of the multiplication rule Theorem 2.1.2. Although a proof is given for Theorem 2.1.3, we shall not provide proofs of all such conditional theorems, because their proofs are generally very similar to the proofs of the unconditional versions.

Theorem 2.1.3 Suppose that A1, A2, . . . , An, B are events such that Pr(B) > 0 and Pr(A1 ∩ A2 ∩ . . . ∩ An−1|B) > 0. Then

    Pr(A1 ∩ A2 ∩ . . . ∩ An|B) = Pr(A1|B) Pr(A2|A1 ∩ B) . . . Pr(An|A1 ∩ A2 ∩ . . . ∩ An−1 ∩ B). (2.1.3)
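As a sanity check on the multiplication rule of Theorem 2.1.2, the product obtained in Example 2.1.7 can be compared with a brute-force count over all equally likely ordered draws. The Python sketch below is an illustration under assumptions made here: the function names are hypothetical, and r = 3, b = 4 are arbitrary sample values.

```python
from fractions import Fraction
from itertools import permutations

def rbrb_by_rule(r, b):
    """Pr(R1 ∩ B2 ∩ R3 ∩ B4) from the multiplication rule, as in Example 2.1.7."""
    n = r + b
    return (Fraction(r, n) * Fraction(b, n - 1)
            * Fraction(r - 1, n - 2) * Fraction(b - 1, n - 3))

def rbrb_by_enumeration(r, b):
    """Same probability by listing every equally likely ordered draw of four balls."""
    balls = ["R"] * r + ["B"] * b
    # Ordered selections of four distinct ball positions, without replacement.
    draws = list(permutations(range(len(balls)), 4))
    hits = sum(1 for d in draws if [balls[i] for i in d] == ["R", "B", "R", "B"])
    return Fraction(hits, len(draws))

r, b = 3, 4  # illustrative values; any r >= 2 and b >= 2 can be used
print(rbrb_by_rule(r, b), rbrb_by_enumeration(r, b))
```

The enumeration treats the balls as distinguishable, which is what makes every ordered draw equally likely; the colors are only read off afterwards.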
Proof The product of probabilities on the right side of Eq. (2.1.3) is equal to

    [Pr(A1 ∩ B) / Pr(B)] · [Pr(A1 ∩ A2 ∩ B) / Pr(A1 ∩ B)] . . . [Pr(A1 ∩ A2 ∩ . . . ∩ An ∩ B) / Pr(A1 ∩ A2 ∩ . . . ∩ An−1 ∩ B)].

Since Pr(A1 ∩ A2 ∩ . . . ∩ An−1|B) > 0, each of the denominators in this product must be positive. All of the terms in the product cancel each other except the first denominator and the final numerator to yield Pr(A1 ∩ A2 ∩ . . . ∩ An ∩ B)/Pr(B), which is the left side of Eq. (2.1.3).

Conditional Probability and Partitions

Theorem 1.4.11 shows how to calculate the probability of an event by partitioning the sample space into two events B and B^c. This result easily generalizes to larger partitions, and when combined with Theorem 2.1.1 it leads to a very powerful tool for calculating probabilities.

Definition 2.1.2 Partition. Let S denote the sample space of some experiment, and consider k events B1, . . . , Bk in S such that B1, . . . , Bk are disjoint and ∪_{i=1}^{k} Bi = S. It is said that these events form a partition of S.
Typically, the events that make up a partition are chosen so that an important source of uncertainty in the problem is reduced if we learn which event has occurred.

Example 2.1.8 Selecting Bolts. Two boxes contain long bolts and short bolts. Suppose that one box contains 60 long bolts and 40 short bolts, and that the other box contains 10 long bolts and 20 short bolts. Suppose also that one box is selected at random and a bolt is then selected at random from that box. We would like to determine the probability that this bolt is long. ◀

Partitions can facilitate the calculations of probabilities of certain events.

Theorem 2.1.4 Law of total probability. Suppose that the events B1, . . . , Bk form a partition of the space S and Pr(Bj) > 0 for j = 1, . . . , k. Then, for every event A in S,

    Pr(A) = Σ_{j=1}^{k} Pr(Bj) Pr(A|Bj). (2.1.4)

Proof The events B1 ∩ A, B2 ∩ A, . . . , Bk ∩ A will form a partition of A, as illustrated in Fig. 2.2. Hence, we can write

    A = (B1 ∩ A) ∪ (B2 ∩ A) ∪ . . . ∪ (Bk ∩ A).

Furthermore, since the k events on the right side of this equation are disjoint,

    Pr(A) = Σ_{j=1}^{k} Pr(Bj ∩ A).

Finally, if Pr(Bj) > 0 for j = 1, . . . , k, then Pr(Bj ∩ A) = Pr(Bj) Pr(A|Bj) and it follows that Eq. (2.1.4) holds.

Figure 2.2 The intersections of A with events B1, . . . , B5 of a partition in the proof of Theorem 2.1.4.

Example 2.1.9 Selecting Bolts. In Example 2.1.8, let B1 be the event that the first box (the one with 60 long and 40 short bolts) is selected, let B2 be the event that the second box (the one with 10 long and 20 short bolts) is selected, and let A be the event that a long bolt is selected. Then

    Pr(A) = Pr(B1) Pr(A|B1) + Pr(B2) Pr(A|B2).

Since a box is selected at random, we know that Pr(B1) = Pr(B2) = 1/2.
Furthermore, the probability of selecting a long bolt from the first box is Pr(A|B1) = 60/100 = 3/5, and the probability of selecting a long bolt from the second box is Pr(A|B2) = 10/30 = 1/3. Hence,

    Pr(A) = (1/2) · (3/5) + (1/2) · (1/3) = 7/15. ◀

Example 2.1.10 Achieving a High Score. Suppose that a person plays a game in which his score must be one of the 50 numbers 1, 2, . . . , 50 and that each of these 50 numbers is equally likely to be his score. The first time he plays the game, his score is X. He then continues to play the game until he obtains another score Y such that Y ≥ X. We will assume that, conditional on previous plays, the 50 scores remain equally likely on all subsequent plays. We shall determine the probability of the event A that Y = 50.

For each i = 1, . . . , 50, let Bi be the event that X = i. Conditional on Bi, the value of Y is equally likely to be any one of the numbers i, i + 1, . . . , 50.
Since each of these (51 − i) possible values for Y is equally likely, it follows that

    Pr(A|Bi) = Pr(Y = 50|Bi) = 1 / (51 − i).

Furthermore, since the probability of each of the 50 values of X is 1/50, it follows that Pr(Bi) = 1/50 for all i and

    Pr(A) = Σ_{i=1}^{50} (1/50) · [1 / (51 − i)] = (1/50) (1 + 1/2 + 1/3 + . . . + 1/50) = 0.0900. ◀

Note: Conditional Version of Law of Total Probability. The law of total probability has an analog conditional on another event C, namely,

    Pr(A|C) = Σ_{j=1}^{k} Pr(Bj|C) Pr(A|Bj ∩ C). (2.1.5)

The reader can prove this in Exercise 17.

Augmented Experiment In some experiments, it may not be clear from the initial description of the experiment that a partition exists that will facilitate the calculation of probabilities. However, there are many such experiments in which such a partition exists if we imagine that the experiment has some additional structure. Consider the following modification of Examples 2.1.8 and 2.1.9.
Example 2.1.11 Selecting Bolts. There is one box of bolts that contains some long and some short bolts. A manager is unable to open the box at present, so she asks her employees what the composition of the box is. One employee says that it contains 60 long bolts and 40 short bolts. Another says that it contains 10 long bolts and 20 short bolts. Unable to reconcile these opinions, the manager decides that each of the employees is correct with probability 1/2. Let B1 be the event that the box contains 60 long and 40 short bolts, and let B2 be the event that the box contains 10 long and 20 short bolts. The probability that the first bolt selected is long is now calculated precisely as in Example 2.1.9. ◀

In Example 2.1.11, there is only one box of bolts, but we believe that it has one of two possible compositions. We let the events B1 and B2 determine the possible compositions.
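The calculation for Example 2.1.11 can be sketched in a few lines. This is a minimal illustration rather than anything from the text: the helper `total_probability` and the list names `prior` and `likelihood` are hypothetical, and the numbers are the two compositions proposed by the employees.

```python
from fractions import Fraction


def total_probability(prior, likelihood):
    """Pr(A) = sum over j of Pr(B_j) * Pr(A | B_j) for a partition B_1, ..., B_k.

    Hypothetical helper; `prior` holds Pr(B_j) and `likelihood` holds Pr(A | B_j).
    """
    if sum(prior) != 1:
        raise ValueError("partition probabilities must sum to 1")
    return sum(p * l for p, l in zip(prior, likelihood))


# Each employee is believed to be correct with probability 1/2.
prior = [Fraction(1, 2), Fraction(1, 2)]
# Pr(long bolt | B1) = 60/100 and Pr(long bolt | B2) = 10/30.
likelihood = [Fraction(60, 100), Fraction(10, 30)]

pr_long = total_probability(prior, likelihood)
print(pr_long)  # 7/15
```

The result is the same 7/15 that arises when one of two physical boxes is chosen at random, which is the point of the example: uncertainty about a single box is handled exactly like a random choice between boxes.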
This type of situation, in which there is uncertainty about the composition of a single collection, is very common in experiments.

Example 2.1.12 A Clinical Trial. Consider a clinical trial such as the study of treatments for depression in Example 2.1.4. As in many such trials, each patient has two possible outcomes, in this case relapse and no relapse. We shall refer to relapse as "failure" and no relapse as "success." For now, we shall consider only patients in the imipramine treatment group. If we knew the effectiveness of imipramine, that is, the proportion p of successes among all patients who might receive the treatment, then we might model the patients in our study as having probability p of success. Unfortunately, we do not know p at the start of the trial. In analogy to the box of bolts with unknown composition in Example 2.1.11, we can imagine that the collection of all available patients (from which the 40 imipramine patients in this trial were selected) has two or more possible compositions.
We can imagine that the composition of the collection of patients determines the proportion that will be successes. For simplicity, in this example, we imagine that there are 11 different possible compositions of the collection of patients. In particular, we assume that the proportions of success for the 11 possible compositions are 0, 1/10, . . . , 9/10, 1. (We shall be able to handle more realistic models for p in Chapter 3.) For example, if we knew that our patients were drawn from a collection with the proportion 3/10 of successes, we would be comfortable saying that the patients in our sample each have success probability p = 3/10. The value of p is an important source of uncertainty in this problem, and we shall partition the sample space by the possible values of p. For j = 1, . . . , 11, let B_j be the event that our sample was drawn from a collection with proportion (j − 1)/10 of successes. We can also identify B_j as the event {p = (j − 1)/10}.
Now, let E1 be the event that the first patient in the imipramine group has a success. We defined each event B_j so that Pr(E1|B_j) = (j − 1)/10. Suppose that, prior to starting the trial, we believe that Pr(B_j) = 1/11 for each j. It follows that

\[ \Pr(E_1) = \sum_{j=1}^{11} \frac{1}{11} \cdot \frac{j - 1}{10} = \frac{55}{110} = \frac{1}{2}, \tag{2.1.6} \]

where the second equality uses the fact that \(\sum_{j=1}^{n} j = n(n + 1)/2\). ◀

The events B1, B2, . . . , B11 in Example 2.1.12 can be thought of in much the same way as the two events B1 and B2 that determine the mixture of long and short bolts in Example 2.1.11. There is only one box of bolts, but there is uncertainty about its composition. Similarly in Example 2.1.12, there is only one group of patients, but we believe that it has one of 11 possible compositions determined by the events B1, B2, . . . , B11. For these to be called events, they must be subsets of the sample space for the experiment in question.
That will be the case in Example 2.1.12 if we imagine that the experiment consists not only of observing the numbers of successes and failures among the patients but also of potentially observing enough additional patients to be able to compute p, possibly at some time very far in the future. Similarly, in Example 2.1.11, the two events B1 and B2 are subsets of the sample space if we imagine that the experiment consists not only of observing one sample bolt but also of potentially observing the entire composition of the box.

Throughout the remainder of this text, we shall implicitly assume that experiments are augmented to include outcomes that determine the values of quantities such as p. We shall not require that we ever get to observe the complete outcome of the experiment so as to tell us precisely what p is, but merely that there is an experiment that includes all of the events of interest to us, including those that determine quantities like p.

Definition 2.1.3 Augmented Experiment. If desired, any experiment can be augmented to include the potential or hypothetical observation of as much additional information as we would find useful to help us calculate any probabilities that we desire.

Definition 2.1.3 is worded somewhat vaguely because it is intended to cover a wide variety of cases. Here is an explicit application to Example 2.1.12.

Example 2.1.13 A Clinical Trial. In Example 2.1.12, we could explicitly assume that there exists an infinite sequence of patients who could be treated with imipramine even though we will observe only finitely many of them. We could let the sample space consist of infinite sequences of the two symbols S and F such as (S, S, F, S, F, F, F, . . .). Here S in coordinate i means that the ith patient is a success, and F stands for failure. So, the event E1 in Example 2.1.12 is the event that the first coordinate is S. The example sequence above is then in the event E1.
To accommodate our interpretation of p as the proportion of successes, we can assume that, for every such sequence, the proportion of S's among the first n coordinates gets close to one of the numbers 0, 1/10, . . . , 9/10, 1 as n increases. In this way, p is explicitly the limit of the proportion of successes we would observe if we could find a way to observe indefinitely. In Example 2.1.12, B2 is the event consisting of all the outcomes in which the limit of the proportion of S's equals 1/10, B3 is the set of outcomes in which the limit is 2/10, etc. Also, we observe only the first 40 coordinates of the infinite sequence, but we still behave as if p exists and could be determined if only we could observe forever. ◀

In the remainder of the text, there will be many experiments that we assume are augmented. In such cases, we will mention which quantities (such as p in Example 2.1.13) would be determined by the augmented part of the experiment even if we do not explicitly mention that the experiment is augmented.

The Game of Craps

We shall conclude this section by discussing a popular gambling game called craps. One version of this game is played as follows: A player rolls two dice, and the sum of the two numbers that appear is observed. If the sum on the first roll is 7 or 11, the player wins the game immediately. If the sum on the first roll is 2, 3, or 12, the player loses the game immediately. If the sum on the first roll is 4, 5, 6, 8, 9, or 10, then the two dice are rolled again and again until the sum is either 7 or the original value. If the original value is obtained a second time before 7 is obtained, then the player wins. If the sum 7 is obtained before the original value is obtained a second time, then the player loses.

We shall now compute the probability Pr(W), where W is the event that the player will win.
Let the sample space S consist of all possible sequences of sums from the rolls of dice that might occur in a game. For example, some of the elements of S are (4, 7), (11), (4, 3, 4), (12), (10, 8, 2, 12, 6, 7), etc. We see that (11) ∈ W but (4, 7) ∈ W^c, etc. We begin by noticing that whether or not an outcome is in W depends in a crucial way on the first roll. For this reason, it makes sense to partition W according to the sum on the first roll. Let B_i be the event that the first roll is i for i = 2, . . . , 12.

Theorem 2.1.4 tells us that \(\Pr(W) = \sum_{i=2}^{12} \Pr(B_i) \Pr(W \mid B_i)\). Since Pr(B_i) for each i was computed in Example 1.6.5, we need to determine Pr(W|B_i) for each i. We begin with i = 2. Because the player loses if the first roll is 2, we have Pr(W|B2) = 0. Similarly, Pr(W|B3) = 0 = Pr(W|B12). Also, Pr(W|B7) = 1 because the player wins if the first roll is 7. Similarly, Pr(W|B11) = 1.

For each first roll i ∈ {4, 5, 6, 8, 9, 10}, Pr(W|B_i) is the probability that, in a sequence of dice rolls, the sum i will be obtained before the sum 7 is obtained. As described in Example 2.1.5, this probability is the same as the probability of obtaining the sum i when the sum must be either i or 7. Hence,

\[ \Pr(W \mid B_i) = \frac{\Pr(B_i)}{\Pr(B_i \cup B_7)}. \]

We compute the necessary values here:

\[ \Pr(W \mid B_4) = \frac{3/36}{3/36 + 6/36} = \frac{1}{3}, \qquad \Pr(W \mid B_5) = \frac{4/36}{4/36 + 6/36} = \frac{2}{5}, \]
\[ \Pr(W \mid B_6) = \frac{5/36}{5/36 + 6/36} = \frac{5}{11}, \qquad \Pr(W \mid B_8) = \frac{5/36}{5/36 + 6/36} = \frac{5}{11}, \]
\[ \Pr(W \mid B_9) = \frac{4/36}{4/36 + 6/36} = \frac{2}{5}, \qquad \Pr(W \mid B_{10}) = \frac{3/36}{3/36 + 6/36} = \frac{1}{3}. \]

Finally, we compute the sum \(\sum_{i=2}^{12} \Pr(B_i) \Pr(W \mid B_i)\):

\[ \Pr(W) = \sum_{i=2}^{12} \Pr(B_i) \Pr(W \mid B_i) = 0 + 0 + \frac{3}{36} \cdot \frac{1}{3} + \frac{4}{36} \cdot \frac{2}{5} + \frac{5}{36} \cdot \frac{5}{11} + \frac{6}{36} + \frac{5}{36} \cdot \frac{5}{11} + \frac{4}{36} \cdot \frac{2}{5} + \frac{3}{36} \cdot \frac{1}{3} + \frac{2}{36} + 0 = \frac{2928}{5940} = 0.493. \]

Thus, the probability of winning in the game of craps is slightly less than 1/2.

Summary

The revised probability of an event A after learning that event B (with Pr(B) > 0) has occurred is the conditional probability of A given B, denoted by Pr(A|B) and computed as Pr(A ∩ B)/Pr(B). Often it is easy to assess a conditional probability, such as Pr(A|B), directly.
In such a case, we can use the multiplication rule for conditional probabilities to compute Pr(A ∩ B) = Pr(B) Pr(A|B). All probability results have versions conditional on an event B with Pr(B) > 0: Just change all probabilities so that they are conditional on B in addition to anything else they were already conditional on. For example, the multiplication rule for conditional probabilities becomes Pr(A1 ∩ A2|B) = Pr(A1|B) Pr(A2|A1 ∩ B). A partition is a collection of disjoint events whose union is the whole sample space. To be most useful, a partition is chosen so that an important source of uncertainty is reduced if we learn which one of the partition events occurs. If the conditional probability of an event A is available given each event in a partition, the law of total probability tells how to combine these conditional probabilities to get Pr(A).

Exercises

1. If A ⊂ B with Pr(B) > 0, what is the value of Pr(A|B)?
2. If A and B are disjoint events and Pr(B) > 0, what is the value of Pr(A|B)?
3. If S is the sample space of an experiment and A is any event in that space, what is the value of Pr(A|S)?
4. Each time a shopper purchases a tube of toothpaste, he chooses either brand A or brand B. Suppose that for each purchase after the first, the probability is 1/3 that he will choose the same brand that he chose on his preceding purchase and the probability is 2/3 that he will switch brands. If he is equally likely to choose either brand A or brand B on his first purchase, what is the probability that both his first and second purchases will be brand A and both his third and fourth purchases will be brand B?
5. A box contains r red balls and b blue balls.
One ball is selected at random and its color is observed. The ball is then returned to the box and k additional balls of the same color are also put into the box. A second ball is then selected at random, its color is observed, and it is returned to the box together with k additional balls of the same color. Each time another ball is selected, the process is repeated. If four balls are selected, what is the probability that the first three balls will be red and the fourth ball will be blue?
6. A box contains three cards. One card is red on both sides, one card is green on both sides, and one card is red on one side and green on the other. One card is selected from the box at random, and the color on one side is observed. If this side is green, what is the probability that the other side of the card is also green?
7. Consider again the conditions of Exercise 2 of Sec. 1.10. If a family selected at random from the city subscribes to newspaper A, what is the probability that the family also subscribes to newspaper B?
8. Consider again the conditions of Exercise 2 of Sec. 1.10. If a family selected at random from the city subscribes to at least one of the three newspapers A, B, and C, what is the probability that the family subscribes to newspaper A?
9. Suppose that a box contains one blue card and four red cards, which are labeled A, B, C, and D. Suppose also that two of these five cards are selected at random, without replacement.
a. If it is known that card A has been selected, what is the probability that both cards are red?
b. If it is known that at least one red card has been selected, what is the probability that both cards are red?
10. Consider the following version of the game of craps: The player rolls two dice. If the sum on the first roll is 7 or 11, the player wins the game immediately. If the sum on the first roll is 2, 3, or 12, the player loses the game immediately.
However, if the sum on the first roll is 4, 5, 6, 8, 9, or 10, then the two dice are rolled again and again until the sum is either 7 or 11 or the original value. If the original value is obtained a second time before either 7 or 11 is obtained, then the player wins. If either 7 or 11 is obtained before the original value is obtained a second time, then the player loses. Determine the probability that the player will win this game.
11. For any two events A and B with Pr(B) > 0, prove that Pr(A^c|B) = 1 − Pr(A|B).
12. For any three events A, B, and D, such that Pr(D) > 0, prove that Pr(A ∪ B|D) = Pr(A|D) + Pr(B|D) − Pr(A ∩ B|D).
13. A box contains three coins with a head on each side, four coins with a tail on each side, and two fair coins. If one of these nine coins is selected at random and tossed once, what is the probability that a head will be obtained?
14. A machine produces defective parts with three different probabilities depending on its state of repair. If the machine is in good working order, it produces defective parts with probability 0.02. If it is wearing down, it produces defective parts with probability 0.1. If it needs maintenance, it produces defective parts with probability 0.3. The probability that the machine is in good working order is 0.8, the probability that it is wearing down is 0.1, and the probability that it needs maintenance is 0.1. Compute the probability that a randomly selected part will be defective.
15. The percentages of voters classed as Liberals in three different election districts are divided as follows: in the first district, 21 percent; in the second district, 45 percent; and in the third district, 75 percent. If a district is selected at random and a voter is selected at random from that district, what is the probability that she will be a Liberal?
16. Consider again the shopper described in Exercise 4. On each purchase, the probability that he will choose the same brand of toothpaste that he chose on his preceding purchase is 1/3, and the probability that he will switch brands is 2/3. Suppose that on his first purchase the probability that he will choose brand A is 1/4 and the probability that he will choose brand B is 3/4. What is the probability that his second purchase will be brand B?
17. Prove the conditional version of the law of total probability (2.1.5).

2.2 Independent Events

If learning that B has occurred does not change the probability of A, then we say that A and B are independent. There are many cases in which events A and B are not independent, but they would be independent if we learned that some other event C had occurred. In this case, A and B are conditionally independent given C.

Example 2.2.1 Tossing Coins. Suppose that a fair coin is tossed twice. The experiment has four outcomes, HH, HT, TH, and TT, that tell us how the coin landed on each of the two tosses. We can assume that this sample space is simple so that each outcome has probability 1/4.
Suppose that we are interested in the second toss. In particular, we want to calculate the probability of the event A = {H on second toss}. We see that A = {HH, TH}, so that Pr(A) = 2/4 = 1/2. If we learn that the first coin landed T, we might wish to compute the conditional probability Pr(A|B) where B = {T on first toss}. Using the definition of conditional probability, we easily compute

\[ \Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)} = \frac{1/4}{1/2} = \frac{1}{2}, \]

because A ∩ B = {TH} has probability 1/4. We see that Pr(A|B) = Pr(A); hence, we don't change the probability of A even after we learn that B has occurred. ◀

Definition of Independence

The conditional probability of the event A given that the event B has occurred is the revised probability of A after we learn that B has occurred. It might be the case, however, that no revision is necessary to the probability of A even after we learn that B occurs. This is precisely what happened in Example 2.2.1. In this case, we say that A and B are independent events. As another example, if we toss a coin and then roll a die, we could let A be the event that the die shows 3 and let B be the event that the coin lands with heads up. If the tossing of the coin is done in isolation of the rolling of the die, we might be quite comfortable assigning Pr(A|B) = Pr(A) = 1/6. In this case, we say that A and B are independent events.

In general, if Pr(B) > 0, the equation Pr(A|B) = Pr(A) can be rewritten as Pr(A ∩ B)/Pr(B) = Pr(A). If we multiply both sides of this last equation by Pr(B), we obtain the equation Pr(A ∩ B) = Pr(A) Pr(B). In order to avoid the condition Pr(B) > 0, the mathematical definition of the independence of two events is stated as follows:

Definition 2.2.1 Independent Events. Two events A and B are independent if Pr(A ∩ B) = Pr(A) Pr(B).
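Definition 2.2.1 can be checked directly on the two-toss sample space of Example 2.2.1. The following is a small sketch under the stated assumption of a simple sample space; the helper `pr` and the variable names are illustrative and do not come from the text.

```python
from fractions import Fraction
from itertools import product

# Simple sample space for two tosses of a fair coin: HH, HT, TH, TT.
outcomes = ["".join(t) for t in product("HT", repeat=2)]


def pr(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


A = {w for w in outcomes if w[1] == "H"}  # H on second toss
B = {w for w in outcomes if w[0] == "T"}  # T on first toss

# Definition 2.2.1: A and B are independent if Pr(A ∩ B) = Pr(A) Pr(B).
independent = pr(A & B) == pr(A) * pr(B)
print(pr(A), pr(B), pr(A & B), independent)  # 1/2 1/2 1/4 True
```

Because the check uses the product form, it needs no division by Pr(B), which is the reason the definition is stated that way.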
Suppose that $\Pr(A) > 0$ and $\Pr(B) > 0$. Then it follows easily from the definitions of independence and conditional probability that A and B are independent if and only if $\Pr(A \mid B) = \Pr(A)$ and $\Pr(B \mid A) = \Pr(B)$.
Independence of Two Events

If two events A and B are considered to be independent because the events are physically unrelated, and if the probabilities $\Pr(A)$ and $\Pr(B)$ are known, then the definition can be used to assign a value to $\Pr(A \cap B)$.

Example 2.2.2 Machine Operation. Suppose that two machines 1 and 2 in a factory are operated independently of each other. Let A be the event that machine 1 will become inoperative during a given 8-hour period, let B be the event that machine 2 will become inoperative during the same period, and suppose that $\Pr(A) = 1/3$ and $\Pr(B) = 1/4$. We shall determine the probability that at least one of the machines will become inoperative during the given period.

The probability $\Pr(A \cap B)$ that both machines will become inoperative during the period is
\[ \Pr(A \cap B) = \Pr(A)\Pr(B) = \left(\frac{1}{3}\right)\left(\frac{1}{4}\right) = \frac{1}{12}. \]
Therefore, the probability $\Pr(A \cup B)$ that at least one of the machines will become inoperative during the period is
\[ \Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B) = \frac{1}{3} + \frac{1}{4} - \frac{1}{12} = \frac{1}{2}. \]
◀

The next example shows that two events A and B, which are physically related, can, nevertheless, satisfy the definition of independence.

Example 2.2.3 Rolling a Die. Suppose that a balanced die is rolled. Let A be the event that an even number is obtained, and let B be the event that one of the numbers 1, 2, 3, or 4 is obtained. We shall show that the events A and B are independent.

In this example, $\Pr(A) = 1/2$ and $\Pr(B) = 2/3$. Furthermore, since A ∩ B is the event that either the number 2 or the number 4 is obtained, $\Pr(A \cap B) = 1/3$. Hence, $\Pr(A \cap B) = \Pr(A)\Pr(B)$. It follows that the events A and B are independent events, even though the occurrence of each event depends on the same roll of a die. ◀

The independence of the events A and B in Example 2.2.3 can also be interpreted as follows: Suppose that a person must bet on whether the number obtained on the die will be even or odd, that is, on whether or not the event A will occur. Since three of the possible outcomes of the roll are even and the other three are odd, the person will typically have no preference between betting on an even number and betting on an odd number.

Suppose also that after the die has been rolled, but before the person has learned the outcome and before she has decided whether to bet on an even outcome or on an odd outcome, she is informed that the actual outcome was one of the numbers 1, 2, 3, or 4, i.e., that the event B has occurred. The person now knows that the outcome was 1, 2, 3, or 4. However, since two of these numbers are even and two are odd, the person will typically still have no preference between betting on an even number and betting on an odd number. In other words, the information that the event B has occurred is of no help to the person who is trying to decide whether or not the event A has occurred.

Independence of Complements

In the foregoing discussion of independent events, we stated that if A and B are independent, then the occurrence or nonoccurrence of A should not be related to the occurrence or nonoccurrence of B.
Hence, if A and B satisfy the mathematical definition of independent events, then it should also be true that A and $B^c$ are independent events, that $A^c$ and B are independent events, and that $A^c$ and $B^c$ are independent events. One of these results is established in the next theorem.

Theorem 2.2.1 If two events A and B are independent, then the events A and $B^c$ are also independent.

Proof Theorem 1.5.6 says that
\[ \Pr(A \cap B^c) = \Pr(A) - \Pr(A \cap B). \]
Furthermore, since A and B are independent events, $\Pr(A \cap B) = \Pr(A)\Pr(B)$. It now follows that
\[ \Pr(A \cap B^c) = \Pr(A) - \Pr(A)\Pr(B) = \Pr(A)[1 - \Pr(B)] = \Pr(A)\Pr(B^c). \]
Therefore, the events A and $B^c$ are independent.

The proof of the analogous result for the events $A^c$ and B is similar, and the proof for the events $A^c$ and $B^c$ is required in Exercise 2 at the end of this section.

Independence of Several Events

The definition of independent events can be extended to any number of events, $A_1, \ldots, A_k$. Intuitively, if learning that some of these events do or do not occur does not change our probabilities for any events that depend only on the remaining events, we would say that all k events are independent. The mathematical definition is the following analog to Definition 2.2.1.

Definition 2.2.2 (Mutually) Independent Events. The k events $A_1, \ldots, A_k$ are independent (or mutually independent) if, for every subset $A_{i_1}, \ldots, A_{i_j}$ of j of these events ($j = 2, 3, \ldots, k$),
\[ \Pr(A_{i_1} \cap \cdots \cap A_{i_j}) = \Pr(A_{i_1}) \cdots \Pr(A_{i_j}). \]

As an example, in order for three events A, B, and C to be independent, the following four relations must be satisfied:
\[ \Pr(A \cap B) = \Pr(A)\Pr(B), \qquad \Pr(A \cap C) = \Pr(A)\Pr(C), \qquad \Pr(B \cap C) = \Pr(B)\Pr(C), \tag{2.2.1} \]
and
\[ \Pr(A \cap B \cap C) = \Pr(A)\Pr(B)\Pr(C). \tag{2.2.2} \]
It is possible that Eq. (2.2.2) will be satisfied, but one or more of the three relations (2.2.1) will not be satisfied.
On the other hand, as is shown in the next example, it is also possible that each of the three relations (2.2.1) will be satisfied but Eq. (2.2.2) will not be satisfied.

Example 2.2.4 Pairwise Independence. Suppose that a fair coin is tossed twice so that the sample space S = {HH, HT, TH, TT} is simple. Define the following three events:

A = {H on first toss} = {HH, HT},
B = {H on second toss} = {HH, TH}, and
C = {Both tosses the same} = {HH, TT}.

Then A ∩ B = A ∩ C = B ∩ C = A ∩ B ∩ C = {HH}. Hence,
\[ \Pr(A) = \Pr(B) = \Pr(C) = 1/2 \]
and
\[ \Pr(A \cap B) = \Pr(A \cap C) = \Pr(B \cap C) = \Pr(A \cap B \cap C) = 1/4. \]
It follows that each of the three relations of Eq. (2.2.1) is satisfied but Eq. (2.2.2) is not satisfied. These results can be summarized by saying that the events A, B, and C are pairwise independent, but all three events are not independent. ◀
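The distinction drawn in Example 2.2.4 can be checked mechanically by applying the product rule of Definition 2.2.2 to every subset of two or more events. The sketch below is an added illustration under that reading of the definition; the function names `product_rule_holds` and `mutually_independent` are hypothetical and do not come from the text.

```python
from fractions import Fraction
from itertools import combinations

sample_space = {"HH", "HT", "TH", "TT"}  # simple sample space of Example 2.2.4

def pr(event):
    """Probability of an event in a simple (equally likely) sample space."""
    return Fraction(len(event), len(sample_space))

def product_rule_holds(events):
    """True if Pr(intersection of the events) equals the product of their probabilities."""
    intersection = set.intersection(*events)
    expected = Fraction(1)
    for e in events:
        expected *= pr(e)
    return pr(intersection) == expected

def mutually_independent(events):
    """Definition 2.2.2: the product rule must hold for every subset of size >= 2."""
    return all(
        product_rule_holds(subset)
        for j in range(2, len(events) + 1)
        for subset in combinations(events, j)
    )

A = {"HH", "HT"}  # H on first toss
B = {"HH", "TH"}  # H on second toss
C = {"HH", "TT"}  # both tosses the same

pairwise = all(product_rule_holds(pair) for pair in combinations([A, B, C], 2))
print(pairwise, mutually_independent([A, B, C]))  # prints: True False
```

The triple fails because the intersection {HH} has probability 1/4, whereas the product of the three individual probabilities is 1/8.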
We shall now present some examples that will illustrate the power and scope of the concept of independence in the solution of probability problems.

Example 2.2.5 Inspecting Items. Suppose that a machine produces a defective item with probability p (0 < p < 1) and produces a nondefective item with probability 1 − p. Suppose further that six items produced by the machine are selected at random and inspected, and that the results (defective or nondefective) for these six items are independent. We shall determine the probability that exactly two of the six items are defective.

It can be assumed that the sample space S contains all possible arrangements of six items, each one of which might be either defective or nondefective. For $j = 1, \ldots, 6$, we shall let $D_j$ denote the event that the jth item in the sample is defective so that $D_j^c$ is the event that this item is nondefective. Since the outcomes for the six different items are independent, the probability of obtaining any particular sequence of defective and nondefective items will simply be the product of the individual probabilities for the items. For example,
\[ \Pr(D_1^c \cap D_2 \cap D_3^c \cap D_4^c \cap D_5 \cap D_6^c) = \Pr(D_1^c)\Pr(D_2)\Pr(D_3^c)\Pr(D_4^c)\Pr(D_5)\Pr(D_6^c) = (1-p)\,p\,(1-p)(1-p)\,p\,(1-p) = p^2(1-p)^4. \]

It can be seen that the probability of any other particular sequence in S containing two defective items and four nondefective items will also be $p^2(1-p)^4$. Hence, the probability that there will be exactly two defectives in the sample of six items can be found by multiplying the probability $p^2(1-p)^4$ of any particular sequence containing two defectives by the possible number of such sequences. Since there are $\binom{6}{2}$ distinct arrangements of two defective items and four nondefective items, the probability of obtaining exactly two defectives is $\binom{6}{2} p^2(1-p)^4$. ◀

Example 2.2.6 Obtaining a Defective Item. For the conditions of Example 2.2.5, we shall now determine the probability that at least one of the six items in the sample will be defective.

Since the outcomes for the different items are independent, the probability that all six items will be nondefective is $(1-p)^6$. Therefore, the probability that at least one item will be defective is $1 - (1-p)^6$. ◀

Example 2.2.7 Tossing a Coin Until a Head Appears. Suppose that a fair coin is tossed until a head appears for the first time, and assume that the outcomes of the tosses are independent. We shall determine the probability $p_n$ that exactly n tosses will be required.

The desired probability is equal to the probability of obtaining n − 1 tails in succession and then obtaining a head on the next toss. Since the outcomes of the tosses are independent, the probability of this particular sequence of n outcomes is $p_n = (1/2)^n$.
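As a quick numerical illustration of $p_n = (1/2)^n$, the following sketch (an addition, with the function names `p` and `partial_sum` chosen only for illustration) tabulates a few of these probabilities with exact fractions, together with the probability that a head appears within the first N tosses.

```python
from fractions import Fraction

def p(n):
    """Probability that the first head appears on toss n (n - 1 tails, then a head)."""
    return Fraction(1, 2) ** n

def partial_sum(N):
    """Probability that a head appears within the first N tosses."""
    return sum(p(n) for n in range(1, N + 1))

for N in (1, 2, 3, 10):
    print(N, p(N), partial_sum(N))
```

Each partial sum equals $1 - (1/2)^N$, so the probability of still waiting for a head after N tosses halves with every additional toss.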
The probability that a head will be obtained sooner or later (or, equivalently, that tails will not be obtained forever) is
\[ \sum_{n=1}^{\infty} p_n = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots = 1. \]
Since the sum of the probabilities $p_n$ is 1, it follows that the probability of obtaining an infinite sequence of tails without ever obtaining a head must be 0. ◀

Example 2.2.8 Inspecting Items One at a Time. Consider again a machine that produces a defective item with probability p and produces a nondefective item with probability 1 − p. Suppose that items produced by the machine are selected at random and inspected one at a time until exactly five defective items have been obtained. We shall determine the probability $p_n$ that exactly n items ($n \geq 5$) must be selected to obtain the five defectives.

The fifth defective item will be the nth item that is inspected if and only if there are exactly four defectives among the first n − 1 items and then the nth item is defective. By reasoning similar to that given in Example 2.2.5, it can be shown that the probability of obtaining exactly four defectives and n − 5 nondefectives among the first n − 1 items is $\binom{n-1}{4} p^4 (1-p)^{n-5}$. The probability that the nth item will be defective is p. Since the first event refers to outcomes for only the first n − 1 items and the second event refers to the outcome for only the nth item, these two events are independent. Therefore, the probability that both events will occur is equal to the product of their probabilities. It follows that
\[ p_n = \binom{n-1}{4} p^5 (1-p)^{n-5}. \]
◀

Example 2.2.9 People v. Collins. Finkelstein and Levin (1990) describe a criminal case whose verdict was overturned by the Supreme Court of California in part due to a probability calculation involving both conditional probability and independence. The case, People v. Collins, 68 Cal. 2d 319, 438 P.2d 33 (1968), involved a purse snatching in which witnesses claimed to see a young woman with blond hair in a ponytail fleeing from the scene in a yellow car driven by a black man with a beard. A couple meeting the description was arrested a few days after the crime, but no physical evidence was found. A mathematician calculated the probability that a randomly selected couple would possess the described characteristics as about $8.3 \times 10^{-8}$, or 1 in 12 million. Faced with such overwhelming odds and no physical evidence, the jury decided that the defendants must have been the only such couple and convicted them. The Supreme Court thought that a more useful probability should have been calculated. Based on the testimony of the witnesses, there was a couple that met the above description. Given that there was already one couple who met the description, what is the conditional probability that there was also a second couple such as the defendants?

Let p be the probability that a randomly selected couple from a population of n couples has certain characteristics. Let A be the event that at least one couple in the population has the characteristics, and let B be the event that at least two couples have the characteristics. What we seek is $\Pr(B \mid A)$. Since B ⊂ A, it follows that
\[ \Pr(B \mid A) = \frac{\Pr(B \cap A)}{\Pr(A)} = \frac{\Pr(B)}{\Pr(A)}. \]
We shall calculate $\Pr(B)$ and $\Pr(A)$ by breaking each event into more manageable pieces. Suppose that we number the n couples in the population from 1 to n. Let $A_i$ be the event that couple number i has the characteristics in question for $i = 1, \ldots, n$, and let C be the event that exactly one couple has the characteristics. Then
\[ A = (A_1^c \cap A_2^c \cap \cdots \cap A_n^c)^c, \]
\[ C = (A_1 \cap A_2^c \cap \cdots \cap A_n^c) \cup (A_1^c \cap A_2 \cap A_3^c \cap \cdots \cap A_n^c) \cup \cdots \cup (A_1^c \cap \cdots \cap A_{n-1}^c \cap A_n), \]
\[ B = A \cap C^c. \]
Assuming that the n couples are mutually independent, $\Pr(A^c) = (1-p)^n$, and $\Pr(A) = 1 - (1-p)^n$. The n events whose union is C are disjoint and each one has probability $p(1-p)^{n-1}$, so $\Pr(C) = np(1-p)^{n-1}$. Since A = B ∪ C with B and C disjoint, we have
\[ \Pr(B) = \Pr(A) - \Pr(C) = 1 - (1-p)^n - np(1-p)^{n-1}. \]
So,
\[ \Pr(B \mid A) = \frac{1 - (1-p)^n - np(1-p)^{n-1}}{1 - (1-p)^n}. \tag{2.2.3} \]
The Supreme Court of California reasoned that, since the crime occurred in a heavily populated area, n would be in the millions. For example, with $p = 8.3 \times 10^{-8}$ and n = 8,000,000, the value of (2.2.3) is 0.2966. Such a probability suggests that there is a reasonable chance that there was another couple meeting the same description as the witnesses provided. Of course, the court did not know how large n was, but the fact that (2.2.3) could easily be so large was grounds enough to rule that reasonable doubt remained as to the guilt of the defendants. ◀

Independence and Conditional Probability

Two events A and B with positive probability are independent if and only if $\Pr(A \mid B) = \Pr(A)$. Similar results hold for larger collections of independent events.
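Returning briefly to Example 2.2.9, Eq. (2.2.3) is simple to evaluate numerically. The sketch below is an added illustration rather than part of the original text; the function name `prob_second_couple` is hypothetical, and `math.log1p` is used only to keep $(1-p)^n$ accurate when p is tiny. In this sketch, taking p to be exactly 1 in 12 million reproduces the quoted value 0.2966 to four decimal places, while the rounded figure $8.3 \times 10^{-8}$ gives a slightly smaller value of about 0.2955.

```python
import math

def prob_second_couple(p, n):
    """Evaluate Eq. (2.2.3): Pr(at least two couples | at least one couple)."""
    log_q = math.log1p(-p)                           # log(1 - p), accurate for small p
    none = math.exp(n * log_q)                       # (1 - p)^n
    exactly_one = n * p * math.exp((n - 1) * log_q)  # n p (1 - p)^(n - 1)
    pr_a = 1.0 - none                                # Pr(A): at least one couple
    pr_b = pr_a - exactly_one                        # Pr(B): at least two couples
    return pr_b / pr_a

n = 8_000_000
print(round(prob_second_couple(1 / 12_000_000, n), 4))  # p taken as exactly 1 in 12 million
print(round(prob_second_couple(8.3e-8, n), 4))          # p taken as the rounded 8.3e-8
```

Varying n in this function shows how strongly the answer depends on the assumed population size, which is the quantity the court did not know.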
The following theorem, for example, is straightforward to prove based on the definition of independence. Theorem 2.2.2 Let A1, . . . , Ak be events such that Pr(A1 ∩ . . . ∩ Ak) > 0. Then A1, . . . , Ak are independent if and only if, for every two disjoint subsets {i1, . . . , im} and {j1, . . . , jℓ} of {1, . . . , k}, we have Pr(Ai1 ∩ . . . ∩ Aim|Aj1 ∩ . . . ∩ Ajℓ) = Pr(Ai1 ∩ . . . ∩ Aim). Theorem 2.2.2 says that k events are independent if and only if learning that some of the events occur does not change the probability that any combination of the other events occurs. The Meaning of Independence We have given a mathematical definition of inde- pendent events in Definition 2.2.1. We have also given some interpretations for what it means for events to be independent. The most instructive interpretation is the one based on conditional probability. If learning that B occurs does not change the prob- ability of A, then A and B are independent. In simple examples such as tossing what we believe to be a fair coin, we would generally not expect to change our minds 2.2 Eventos Independentes 71 tem as características. O que buscamos é Pr(B|A). DesdeB⊂A, segue que Pr.(B∩A) Pr.(A) Pr.(B) Pr.(A) Pr.(B|A)= = . Vamos calcular Pr(B)e Pr(A)dividindo cada evento em partes mais gerenciáveis. Suponha que numeramos oncasais na população de 1 an. DeixarAeu seja o evento que o número do casaleutem as características em questão paraeu=1, . . . , n, e deixarCseja o evento em que exatamente um casal tenha as características. Então A=(Ac 1∩Ac. . .∩Ac 2 n)c, C=(A1∩Ac. . .∩Ac 2 n)∪(Ac 1∩A2∩Ac. . .∩Ac n)∪. . .∪(Ac 3 1∩. . .∩Ac n−1∩An), B=A∩Cc. Supondo que oncasais são mutuamente independentes, Pr(Ac)=(1 -p)ne Pr.(A)=1 - (1 -p)n. Oneventos cuja união éCsão disjuntos e cada um tem probabilidadep(1 - p)n−1, então Pr(C)=np(1 -p)n−1. DesdeA=B∪CcomBeC disjunto, temos Pr.(B)=Pr.(A)-Pr.(C)=1 -(1 -p)n-np(1 -p)n−1. Então, 1 -(1 -p)n-np(1 -p)n−1 1 -(1 -p)n Pr.(B|A)= . 
(2.2.3) A Suprema Corte da Califórnia argumentou que, como o crime ocorreu em uma área densamente povoada,nestaria na casa dos milhões. Por exemplo, comp=8.3×10−8 en=8,000,000, o valor de (2.2.3) é 0,2966. Tal probabilidade sugere que há uma probabilidade razoável de que houvesse outro casal que correspondesse à mesma descrição que as testemunhas forneceram. É claro que o tribunal não sabia quão grandenera, mas o facto de (2.2.3) poder facilmente ser tão grande foi motivo suficiente para determinar que subsistia uma dúvida razoável quanto à culpa dos réus. - Independência e probabilidade condicionalDois eventosAeBcom positivo probabilidade são independentes se e somente se Pr(A|B)=Pr.(A). Resultados semelhantes são válidos para coleções maiores de eventos independentes. O seguinte teorema, por exemplo, é fácil de provar com base na definição de independência. Teorema 2.2.2 DeixarA1, . . . , Aksejam eventos tais que Pr(A1∩. . .∩Ak) >0. EntãoA1, . . . , Aksão independente se e somente se, para cada dois subconjuntos disjuntos {eu1, . . . , eueu}e {j1, . . . , j} de {1 , . . . , k}, Nós temos Pr.(Aeu∩. . .∩Aeu|A ∩. . .∩Aj)=Pr.(Aeu1 ∩ . . . ∩Aeu). 1 eu j 1 eu O teorema 2.2.2 diz quekos eventos são independentes se e somente se saber que alguns dos eventos ocorrem não altera a probabilidade de que qualquer combinação dos outros eventos ocorra. O significado da independênciaDemos uma definição matemática de eventos independentes na Definição 2.2.1. Também demos algumas interpretações sobre o que significa os eventos serem independentes. A interpretação mais instrutiva é aquela baseada na probabilidade condicional. Se aprender issoBocorre não altera a probabilidade deA, entãoAeBsão independentes. 
about what is likely to happen on later flips after we observe earlier flips; hence, we declare the events that concern different flips to be independent. However, consider a situation similar to Example 2.2.5 in which items produced by a machine are inspected to see whether or not they are defective. In Example 2.2.5, we declared that the different items were independent and that each item had probability $p$ of being defective. This might make sense if we were confident that we knew how well the machine was performing. But if we were unsure of how the machine was performing, we could easily imagine changing our mind about the probability that the 10th item is defective depending on how many of the first nine items are defective. To be specific, suppose that we begin by thinking that the probability is 0.08 that an item will be defective. If we observe one or zero defective items in the first nine, we might not make much revision to the probability that the 10th item is defective. On the other hand, if we observe eight or nine defectives in the first nine items, we might be uncomfortable keeping the probability at 0.08 that the 10th item will be defective.

In summary, when deciding whether to model events as independent, try to answer the following question: "If I were to learn that some of these events occurred, would I change the probabilities of any of the others?" If we feel that we already know everything that we could learn from these events about how likely the others should be, we can safely model them as independent. If, on the other hand, we feel that learning some of these events could change our minds about how likely some of the others are, then we should be more careful about determining the conditional probabilities and not model the events as independent.
Mutually Exclusive Events and Mutually Independent Events
Two similar-sounding definitions have appeared earlier in this text. Definition 1.4.10 defines mutually exclusive events, and Definition 2.2.2 defines mutually independent events. It is almost never the case that the same set of events satisfies both definitions. The reason is that if events are disjoint (mutually exclusive), then learning that one occurs means that the others definitely did not occur. Hence, learning that one occurs would change the probabilities for all the others to 0, unless the others already had probability 0. Indeed, this suggests the only condition in which the two definitions would both apply to the same collection of events. The proof of the following result is left to Exercise 24 in this section.

Theorem 2.2.3 Let $n > 1$ and let $A_1, \ldots, A_n$ be events that are mutually exclusive. The events are also mutually independent if and only if all the events except possibly one of them have probability 0.

Conditionally Independent Events
Conditional probability and independence combine into one of the most versatile models of data collection. The idea is that, in many circumstances, we are unwilling to say that certain events are independent because we believe that learning some of them will provide information about how likely the others are to occur. But if we knew the frequency with which such events would occur, we might then be willing to assume that they are independent. This model can be illustrated using one of the examples from earlier in this section.

Example 2.2.10 Inspecting Items. Consider again the situation in Example 2.2.5.
This time, however, suppose that we believe that we would change our minds about the probabilities of later items being defective were we to learn that certain numbers of early items
were defective. Suppose that we think of the number $p$ from Example 2.2.5 as the proportion of defective items that we would expect to see if we were to inspect a very large sample of items. If we knew this proportion $p$, and if we were to sample only a few, say, six or 10 items now, we might feel confident maintaining that the probability of a later item being defective remains $p$ even after we inspect some of the earlier items. On the other hand, if we are not sure what would be the proportion of defective items in a large sample, we might not feel confident keeping the probability the same as we continue to inspect.

To be precise, suppose that we treat the proportion $p$ of defective items as unknown and that we are dealing with an augmented experiment as described in Definition 2.1.3.
For simplicity, suppose that $p$ can take one of two values, either 0.01 or 0.4, the first corresponding to normal operation and the second corresponding to a need for maintenance. Let $B_1$ be the event that $p = 0.01$, and let $B_2$ be the event that $p = 0.4$. If we knew that $B_1$ had occurred, then we would proceed under the assumption that the events $D_1, D_2, \ldots$ were independent with $\Pr(D_i \mid B_1) = 0.01$ for all $i$. For example, we could do the same calculations as in Examples 2.2.5 and 2.2.8 with $p = 0.01$. Let $A$ be the event that we observe exactly two defectives in a random sample of six items. Then
$$\Pr(A \mid B_1) = \binom{6}{2} 0.01^2 \, 0.99^4 = 1.44 \times 10^{-3}.$$
Similarly, if we knew that $B_2$ had occurred, then we would assume that $D_1, D_2, \ldots$ were independent with $\Pr(D_i \mid B_2) = 0.4$. In this case,
$$\Pr(A \mid B_2) = \binom{6}{2} 0.4^2 \, 0.6^4 = 0.311.$$ ◀

In Example 2.2.10, there is no reason that $p$ must be required to assume at most two different values. We could easily allow $p$ to take a third value or a fourth value, etc.
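The two conditional probabilities in Example 2.2.10 are binomial calculations in which the conditioning event fixes the value of $p$. As a minimal sketch, the same calculation can be repeated for any candidate value of $p$. The function name `prob_exactly_k_defective` and the third value 0.1 are illustrative assumptions and do not come from the text.

```python
from math import comb

def prob_exactly_k_defective(k, n, p):
    """Pr(exactly k of n items are defective), assuming the items are
    conditionally independent, each defective with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# p = 0.01 (event B1) and p = 0.4 (event B2) come from Example 2.2.10;
# p = 0.1 is a hypothetical third value added only for illustration.
for p in (0.01, 0.4, 0.1):
    print(f"p = {p}: Pr(A | p) = {prob_exactly_k_defective(2, 6, p):.4g}")
```

For the two values used in the example, the results agree with the figures $1.44 \times 10^{-3}$ and $0.311$ to the precision quoted there.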
Indeed, in Chapter 3 we shall learn how to handle the case in which every number between 0 and 1 is a possible value of $p$. The point of the simple example is to illustrate the concept of assuming that events are independent conditional on another event, such as $B_1$ or $B_2$ in the example.

The formal concept illustrated in Example 2.2.10 is the following:

Definition 2.2.3 Conditional Independence. We say that events $A_1, \ldots, A_k$ are conditionally independent given $B$ if, for every subcollection $A_{i_1}, \ldots, A_{i_j}$ of $j$ of these events ($j = 2, 3, \ldots, k$),
$$\Pr(A_{i_1} \cap \cdots \cap A_{i_j} \mid B) = \Pr(A_{i_1} \mid B) \cdots \Pr(A_{i_j} \mid B).$$

Definition 2.2.3 is identical to Definition 2.2.2 for independent events with the modification that all probabilities in the definition are now conditional on $B$. As a note, even if we assume that events $A_1, \ldots, A_k$ are conditionally independent given $B$, it is not necessary that they be conditionally independent given $B^c$. In Example 2.2.10, the events $D_1, D_2, \ldots$ were conditionally independent given both $B_1$ and $B_2 = B_1^c$,
which is the typical situation. Exercise 16 in Sec. 2.3 is an example in which events are conditionally independent given one event $B$ but are not conditionally independent given the complement $B^c$.

Recall that two events $A_1$ and $A_2$ (with $\Pr(A_1) > 0$) are independent if and only if $\Pr(A_2 \mid A_1) = \Pr(A_2)$. A similar result holds for conditionally independent events.

Theorem 2.2.4 Suppose that $A_1$, $A_2$, and $B$ are events such that $\Pr(A_1 \cap B) > 0$. Then $A_1$ and $A_2$ are conditionally independent given $B$ if and only if $\Pr(A_2 \mid A_1 \cap B) = \Pr(A_2 \mid B)$.

This is another example of the claim we made earlier that every result we can prove has an analog conditional on an event $B$. The reader can prove this theorem in Exercise 22.

The Collector's Problem
Suppose that $n$ balls are thrown in a random manner into $r$ boxes ($r \le n$). We shall assume that the $n$ throws are independent and that each of the $r$ boxes is equally likely to receive any given ball.
The problem is to determine the probability $p$ that every box will receive at least one ball. This problem can be reformulated in terms of a collector's problem as follows: Suppose that each package of bubble gum contains the picture of a baseball player, that the pictures of $r$ different players are used, that the picture of each player is equally likely to be placed in any given package of gum, and that pictures are placed in different packages independently of each other. The problem now is to determine the probability $p$ that a person who buys $n$ packages of gum ($n \ge r$) will obtain a complete set of $r$ different pictures.

For $i = 1, \ldots, r$, let $A_i$ denote the event that the picture of player $i$ is missing from all $n$ packages. Then $\bigcup_{i=1}^{r} A_i$ is the event that the picture of at least one player is missing. We shall find $\Pr\left(\bigcup_{i=1}^{r} A_i\right)$ by applying Eq. (1.10.6).
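The probability $p$ described above can also be estimated by simulation, which gives an independent check on the closed form. The following is a minimal sketch under stated assumptions: the function names, the seed, the number of trials, and the values $n = 10$, $r = 4$ are all illustrative choices and are not from the text. The function `exact_complete_set` evaluates the inclusion-exclusion sum $\sum_{j=0}^{r-1} (-1)^j \binom{r}{j} (1 - j/r)^n$ that the derivation in this section arrives at.

```python
import random
from math import comb

def simulate_complete_set(n, r, trials=100_000, seed=0):
    """Estimate Pr(all r pictures appear among n packages) by simulation.
    Each package independently holds one of r equally likely pictures."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        seen = {rng.randrange(r) for _ in range(n)}
        if len(seen) == r:
            hits += 1
    return hits / trials

def exact_complete_set(n, r):
    """Closed form: sum over j = 0..r-1 of (-1)^j * C(r, j) * (1 - j/r)^n."""
    return sum((-1) ** j * comb(r, j) * (1 - j / r) ** n for j in range(r))

n, r = 10, 4  # hypothetical values chosen for illustration
print(exact_complete_set(n, r), simulate_complete_set(n, r))
```

With a large number of trials the two printed values should be close; the simulated value fluctuates with the seed, so only approximate agreement is expected.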
Since the picture of each of the $r$ players is equally likely to be placed in any particular package, the probability that the picture of player $i$ will not be obtained in any particular package is $(r - 1)/r$. Since the packages are filled independently, the probability that the picture of player $i$ will not be obtained in any of the $n$ packages is $[(r - 1)/r]^n$. Hence,
$$\Pr(A_i) = \left(\frac{r-1}{r}\right)^n \quad \text{for } i = 1, \ldots, r.$$

Now consider any two players $i$ and $j$. The probability that neither the picture of player $i$ nor the picture of player $j$ will be obtained in any particular package is $(r - 2)/r$. Therefore, the probability that neither picture will be obtained in any of the $n$ packages is $[(r - 2)/r]^n$. Thus,
$$\Pr(A_i \cap A_j) = \left(\frac{r-2}{r}\right)^n.$$

If we next consider any three players $i$, $j$, and $k$, we find that
$$\Pr(A_i \cap A_j \cap A_k) = \left(\frac{r-3}{r}\right)^n.$$

By continuing in this way, we finally arrive at the probability $\Pr(A_1 \cap A_2 \cap \cdots \cap A_r)$ that the pictures of all $r$ players are missing from the $n$ packages. Of course, this
probability is 0. Therefore, by Eq. (1.10.6) of Sec. 1.10,
$$\Pr\left(\bigcup_{i=1}^{r} A_i\right) = r\left(\frac{r-1}{r}\right)^n - \binom{r}{2}\left(\frac{r-2}{r}\right)^n + \cdots + (-1)^r \binom{r}{r-1}\left(\frac{1}{r}\right)^n = \sum_{j=1}^{r-1} (-1)^{j+1} \binom{r}{j} \left(1 - \frac{j}{r}\right)^n.$$

Since the probability $p$ of obtaining a complete set of $r$ different pictures is equal to $1 - \Pr\left(\bigcup_{i=1}^{r} A_i\right)$, it follows from the foregoing derivation that $p$ can be written in the form
$$p = \sum_{j=0}^{r-1} (-1)^{j} \binom{r}{j} \left(1 - \frac{j}{r}\right)^n.$$

Summary
A collection of events is independent if and only if learning that some of them occur does not change the probabilities that any combination of the rest of them occurs. Equivalently, a collection of events is independent if and only if the probability of the intersection of every subcollection is the product of the individual probabilities. The concept of independence has a version conditional on another event. A collection of events is independent conditional on $B$ if and only if the conditional probability of the intersection of every subcollection given $B$ is the product of the individual conditional probabilities given $B$. Equivalently, a collection of events is conditionally independent given $B$ if and only if learning that some of them (and $B$) occur does not change the conditional probabilities given $B$ that any combination of the rest of them occur. The full power of conditional independence will become more apparent after we introduce Bayes' theorem in the next section.

Exercises
1. If $A$ and $B$ are independent events and $\Pr(B) < 1$, what is the value of $\Pr(A^c \mid B^c)$?
2. Assuming that $A$ and $B$ are independent events, prove that the events $A^c$ and $B^c$ are also independent.
3.
Suppose that $A$ is an event such that $\Pr(A) = 0$ and that $B$ is any other event. Prove that $A$ and $B$ are independent events.
4. Suppose that a person rolls two balanced dice three times in succession. Determine the probability that on each of the three rolls, the sum of the two numbers that appear will be 7.
5. Suppose that the probability that the control system used in a spaceship will malfunction on a given flight is 0.001. Suppose further that a duplicate, but completely independent, control system is also installed in the spaceship to take control in case the first system malfunctions. Determine the probability that the spaceship will be under the control of either the original system or the duplicate system on a given flight.
6. Suppose that 10,000 tickets are sold in one lottery and 5000 tickets are sold in another lottery. If a person owns 100 tickets in each lottery, what is the probability that she will win at least one first prize?
7. Two students $A$ and $B$ are both registered for a certain course. Assume that student $A$ attends class 80 percent of the time, student $B$ attends class 60 percent of the time, and the absences of the two students are independent.
a. What is the probability that at least one of the two students will be in class on a given day?
b. If at least one of the two students is in class on a given day, what is the probability that $A$ is in class that day?
8. If three balanced dice are rolled, what is the probability that all three numbers will be the same?
9. Consider an experiment in which a fair coin is tossed until a head is obtained for the first time. If this experiment is performed three times, what is the probability that exactly the same number of tosses will be required for each of the three performances?
10. The probability that any child in a certain family will have blue eyes is 1/4, and this feature is inherited independently by different children in the family.
If there are five children in the family and it is known that at least one of these children has blue eyes, what is the probability that at least three of the children have blue eyes?
11. Consider the family with five children described in Exercise 10.
a. If it is known that the youngest child in the family has blue eyes, what is the probability that at least three of the children have blue eyes?
b. Explain why the answer in part (a) is different from the answer in Exercise 10.
12. Suppose that $A$, $B$, and $C$ are three independent events such that $\Pr(A) = 1/4$, $\Pr(B) = 1/3$, and $\Pr(C) = 1/2$. (a) Determine the probability that none of these three events will occur. (b) Determine the probability that exactly one of these three events will occur.
13. Suppose that the probability that any particle emitted by a radioactive material will penetrate a certain shield is 0.01. If 10 particles are emitted, what is the probability that exactly one of the particles will penetrate the shield?
14. Consider again the conditions of Exercise 13. If 10 particles are emitted, what is the probability that at least one of the particles will penetrate the shield?
15. Consider again the conditions of Exercise 13. How many particles must be emitted in order for the probability to be at least 0.8 that at least one particle will penetrate the shield?
16. In the World Series of baseball, two teams $A$ and $B$ play a sequence of games against each other, and the first team that wins a total of four games becomes the winner of the World Series.
If the probability that team $A$ will win any particular game against team $B$ is 1/3, what is the probability that team $A$ will win the World Series?
17. Two boys $A$ and $B$ throw a ball at a target. Suppose that the probability that boy $A$ will hit the target on any throw is 1/3 and the probability that boy $B$ will hit the target on any throw is 1/4. Suppose also that boy $A$ throws first and the two boys take turns throwing. Determine the probability that the target will be hit for the first time on the third throw of boy $A$.
18. For the conditions of Exercise 17, determine the probability that boy $A$ will hit the target before boy $B$ does.
19. A box contains 20 red balls, 30 white balls, and 50 blue balls. Suppose that 10 balls are selected at random one at a time, with replacement; that is, each selected ball is replaced in the box before the next selection is made. Determine the probability that at least one color will be missing from the 10 selected balls.
20. Suppose that $A_1, \ldots, A_k$ form a sequence of $k$ independent events. Let $B_1, \ldots, B_k$ be another sequence of $k$ events such that for each value of $j$ ($j = 1, \ldots, k$), either $B_j = A_j$ or $B_j = A_j^c$. Prove that $B_1, \ldots, B_k$ are also independent events. Hint: Use an induction argument based on the number of events $B_j$ for which $B_j = A_j^c$.
21. Prove Theorem 2.2.2 on page 71. Hint: The "only if" direction is direct from the definition of independence on page 68. For the "if" direction, use induction on the value of $j$ in the definition of independence. Let $m = j - 1$ and let $\ell = 1$ with $j_1 = i_j$.
22. Prove Theorem 2.2.4 on page 73.
23. A programmer is about to attempt to compile a series of 11 similar programs. Let $A_i$ be the event that the $i$th program compiles successfully for $i = 1, \ldots, 11$. When the programming task is easy, the programmer expects that 80 percent of programs should compile. When the programming task is difficult, she expects that only 40 percent of the programs will compile.
Let $B$ be the event that the programming task was easy. The programmer believes that the events $A_1, \ldots, A_{11}$ are conditionally independent given $B$ and given $B^c$.
a. Compute the probability that exactly 8 out of 11 programs will compile given $B$.
b. Compute the probability that exactly 8 out of 11 programs will compile given $B^c$.
24. Prove Theorem 2.2.3 on page 72.

2.3 Bayes' Theorem
Suppose that we are interested in which of several disjoint events $B_1, \ldots, B_k$ will occur and that we will get to observe some other event $A$. If $\Pr(A \mid B_i)$ is available for each $i$, then Bayes' theorem is a useful formula for computing the conditional probabilities of the $B_i$ events given $A$.

We begin with a typical example.

Example 2.3.1 Test for a Disease. Suppose that you are walking down the street and notice that the Department of Public Health is giving a free medical test for a certain disease. The test is 90 percent reliable in the following sense: If a person has the disease, there is a probability of 0.9 that the test will give a positive response; whereas, if a person does not have the disease, there is a probability of only 0.1 that the test will give a positive response. Data indicate that your chances of having the disease are only 1 in 10,000. However, since the test costs you nothing, and is fast and harmless, you decide to stop and take the test. A few days later you learn that you had a positive response to the test. Now, what is the probability that you have the disease? ◀
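One way to get a feel for the question in Example 2.3.1 before any formula is introduced is to count expected outcomes in a large hypothetical population. The sketch below is an illustration under stated assumptions: the population size of 1,000,000 and the function and parameter names are not from the text, and only the three rates (1 in 10,000, 0.9, and 0.1) come from the example.

```python
def fraction_diseased_among_positives(population, prevalence,
                                      pos_rate_if_diseased,
                                      pos_rate_if_healthy):
    """Expected share of positive tests that belong to diseased people,
    using expected counts in a hypothetical population."""
    diseased = population * prevalence
    healthy = population - diseased
    true_positives = diseased * pos_rate_if_diseased
    false_positives = healthy * pos_rate_if_healthy
    return true_positives / (true_positives + false_positives)

# Rates from Example 2.3.1; the population size is a hypothetical choice.
share = fraction_diseased_among_positives(1_000_000, 1 / 10_000, 0.9, 0.1)
print(f"{share:.6f}")
```

In this hypothetical population the positive responses from healthy people greatly outnumber those from diseased people, so the resulting share is small even though the test is described as 90 percent reliable.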
The last question in Example 2.3.1 is a prototype of the question for which Bayes’ theorem was designed.
We have at least two disjoint events (“you have the disease” and “you do not have the disease”) about which we are uncertain, and we learn a piece of information (the result of the test) that tells us something about the uncertain events. Then we need to know how to revise the probabilities of the events in the light of the information we learned.

We now present the general structure in which Bayes’ theorem operates before returning to the example.

Statement, Proof, and Examples of Bayes’ Theorem

Example 2.3.2 Selecting Bolts. Consider again the situation in Example 2.1.8, in which a bolt is selected at random from one of two boxes. Suppose that we cannot tell without making a further effort from which of the two boxes the one bolt is being selected. For example, the boxes may be identical in appearance or somebody else may actually select the box, but we only get to see the bolt. Prior to selecting the bolt, it was equally likely that each of the two boxes would be selected. However, if we learn that
However, if we learn that ou seja, um parafuso longo foi selecionado, podemos calcular as probabilidades condicionais event A has occurred, that is, a long bolt was selected, we can compute the conditional das duas caixas dadasA. Para lembrar ao leitor, B16 0 evento em que a caixa é selecionada probabilities of the two boxes given A. To remind the reader, B, is the event that the contendo 60 parafusos longos e 40 parafusos curtos, enquantoA2é 0 evento em que a caixa é box is selected containing 60 long bolts and 40 short bolts, while B, is the event that selecionada contendo 10 parafusos longos e 20 parafusos curtos. No Exemplo 2.1.9, calculamos the box is selected containing 10 long bolts and 20 short bolts. In Example 2.1.9, we Pr(AE7/5, Pr.(A| B1 534, Pr.(A| 821A, e Pr.(BiPr.(B21/2. Entdo, por exemplo, computed Pr(A) = 7/15, Pr(A|B,) = 3/5, Pr(A|B,) = 1/3, and Pr(B,) = Pr(Bz) = 1/2. So, for example, 1 1 3 Pr.(ANB Pr.(B1 )Pr.(A| Bt - sz 9 Pr(ANB Pr(B,) Pr(A|B 7Xz 9 Pr(Bi|A) = (ANB 4) Pr(BiPr(Al Bi) _, 7 Pr(B,|A) = ( v _ Pr(By) CIB) 2732 Pr.(A) Pr.(A) 15 14 Pr(A) Pr(A) B 14 Como a primeira caixa tem uma propor¢ao maior de parafusos longos que a segunda caixa, parece Since the first box has a higher proportion of long bolts than the second box, it seems razoavel que a probabilidade deBideve subir depois de sabermos que um parafuso longo foi reasonable that the probability of B, should rise after we learn that a long bolt was selecionado. Deve ser aquele Pr(B2| A5/14 ja que uma ou outra caixa tinha que ser selecionada. selected. It must be that Pr(B|A) = 5/14 since one or the other box had to be selected. - < No Exemplo 2.3.2, comegamos com a incerteza sobre qual das duas caixas seria escolhida In Example 2.3.2, we started with uncertainty about which of two boxes would e entdo observamos um longo parafuso retirado da caixa escolhida. Como as duas caixas tm be chosen and then we observed a long bolt drawn from the chosen box. 
Because the chances diferentes de ter um parafuso longo desenhado, a observacdo de um parafuso longo two boxes have different chances of having a long bolt drawn, the observation of a alterou as probabilidades de cada uma das duas caixas ter sido escolhida. O calculo preciso de long bolt changed the probabilities of each of the two boxes having been chosen. The como as probabilidades mudam é 0 propésito do teorema de Bayes. precise calculation of how the probabilities change is the purpose of Bayes’ theorem. Teorema Teorema de Bayes.Deixe os eventos&1,..., Bformar uma partigao do espacoSde tal modo que Theorem Bayes’ theorem. Let the events By,..., B, form a partition of the space S such that 2.3.1 Pr.(Bj) >0 paraf-1,..., k, e deixarAser um evento tal que Pr(A) >0. Entdo, por eu= 2.3.1 Pr(B;) > 0 for j =1,...,k, and let A be an event such that Pr(A) > 0. Then, for 1,...,k, i=l,...,k, Pr.(BeuPr.(A| B Pr(B;) Pr(A|B; Pr.(Beu| ARS «CaP Al Bey (2.3.1) Pr(B,)A) = BD ENAIB) (2.3.1) fiPr.(ByPr.(A| Bi) Vizl Pr(B;) Pr(A|B;) ProvaPela definicdo de probabilidade condicional, Proof By the definition of conditional probability, Pr.(BeuNA Pr(ByN A Pr.(Beu| A Pri(BewVA) Pr(B;|A) = Pr(Bj 0 A) Pr.(A) Pr(A) O numerador do lado direito da Eq. (2.3.1) € igual a Pr(BeuA)pelo Teorema 2.1.1. O The numerator on the right side of Eq. (2.3.1) is equal to Pr(B; N A) by Theorem 2.1.1. denominador é igual a Pr(A)de acordo com o Teorema 2.1.4. 7 The denominator is equal to Pr(A) according to Theorem 2.1.4. = 78 Chapter 2 Conditional Probability Example 2.3.3 Test for a Disease. Let us return to the example with which we began this section. We have just received word that we have tested positive for a disease. The test was 90 percent reliable in the sense that we described in Example 2.3.1. We want to know the probability that we have the disease after we learn that the result of the test is positive. Some readers may feel that this probability should be about 0.9. 
However, this feeling completely ignores the small probability of 0.0001 that you had the disease before taking the test. We shall let B1 denote the event that you have the disease, and let B2 denote the event that you do not have the disease. The events B1 and B2 form a partition. Also, let A denote the event that the response to the test is positive. The event A is information we will learn that tells us something about the partition elements. Then, by Bayes’ theorem,

\[
\Pr(B_1 \mid A) = \frac{\Pr(A \mid B_1)\Pr(B_1)}{\Pr(A \mid B_1)\Pr(B_1) + \Pr(A \mid B_2)\Pr(B_2)} = \frac{(0.9)(0.0001)}{(0.9)(0.0001) + (0.1)(0.9999)} = 0.00090.
\]

Thus, the conditional probability that you have the disease given the test result is approximately only 1 in 1000. Of course, this conditional probability is approximately 9 times as great as the probability was before you were tested, but even the conditional probability is quite small.

Another way to explain this result is as follows: Only one person in every 10,000 actually has the disease, but the test gives a positive response for approximately one person in every 10. Hence, the number of positive responses is approximately 1000 times the number of persons who actually have the disease. In other words, out of every 1000 persons for whom the test gives a positive response, only one person actually has the disease. This example illustrates not only the use of Bayes’ theorem but also the importance of taking into account all of the information available in a problem. ◀

Example 2.3.4 Identifying the Source of a Defective Item. Three different machines M1, M2, and M3 were used for producing a large batch of similar manufactured items. Suppose that 20 percent of the items were produced by machine M1, 30 percent by machine M2, and 50 percent by machine M3. Suppose further that 1 percent of the items produced by machine M1 are defective, that 2 percent of the items produced by machine M2 are defective, and that 3 percent of the items produced by machine M3 are defective.
Finally, suppose that one item is selected at random from the entire batch and it is found to be defective. We shall determine the probability that this item was produced by machine M2.

Let Bi be the event that the selected item was produced by machine Mi (i = 1, 2, 3), and let A be the event that the selected item is defective. We must evaluate the conditional probability Pr(B2|A).

The probability Pr(Bi) that an item selected at random from the entire batch was produced by machine Mi is as follows, for i = 1, 2, 3:

Pr(B1) = 0.2, Pr(B2) = 0.3, Pr(B3) = 0.5.

Furthermore, the probability Pr(A|Bi) that an item produced by machine Mi will be defective is

Pr(A|B1) = 0.01, Pr(A|B2) = 0.02, Pr(A|B3) = 0.03.

It now follows from Bayes’ theorem that
\[
\Pr(B_2 \mid A) = \frac{\Pr(B_2)\Pr(A \mid B_2)}{\sum_{j=1}^{3} \Pr(B_j)\Pr(A \mid B_j)} = \frac{(0.3)(0.02)}{(0.2)(0.01) + (0.3)(0.02) + (0.5)(0.03)} = 0.26.
\]
◀

Example 2.3.5 Identifying Genotypes. Consider a gene that has two alleles (see Example 1.6.4 on page 23) A and a. Suppose that the gene exhibits itself through a trait (such as hair color or blood type) with two versions. We call A dominant and a recessive if individuals with genotypes AA and Aa have the same version of the trait and the individuals with genotype aa have the other version. The two versions of the trait are called phenotypes. We shall call the phenotype exhibited by individuals with genotypes AA and Aa the dominant trait, and the other trait will be called the recessive trait. In population genetics studies, it is common to have information on the phenotypes of individuals, but it is rather difficult to determine genotypes. However, some information about genotypes can be obtained by observing phenotypes of parents and children.
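Looking back at Examples 2.3.3 and 2.3.4, the arithmetic of Eq. (2.3.1) can be checked with a short Python sketch. The function name `bayes_posterior` and the use of exact fractions are choices made here for illustration; they are not part of the text.

```python
from fractions import Fraction


def bayes_posterior(priors, likelihoods):
    """Return Pr(B_i | A) for each i, given Pr(B_i) and Pr(A | B_i).

    Illustrative helper (hypothetical name): it evaluates Eq. (2.3.1)
    for a partition B_1, ..., B_k.
    """
    # Numerators of Eq. (2.3.1): Pr(B_i) * Pr(A | B_i) = Pr(B_i and A)
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator: Pr(A), by the law of total probability
    total = sum(joint)
    if total == 0:
        raise ValueError("Pr(A) must be positive")
    return [j / total for j in joint]


# Example 2.3.3: B1 = disease, B2 = no disease, A = positive test
disease = bayes_posterior(
    [Fraction(1, 10000), Fraction(9999, 10000)],
    [Fraction(9, 10), Fraction(1, 10)],
)

# Example 2.3.4: machines M1, M2, M3, A = selected item is defective
machines = bayes_posterior(
    [Fraction(20, 100), Fraction(30, 100), Fraction(50, 100)],
    [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)],
)

print(disease[0])   # 1/1112, roughly 0.0009
print(machines[1])  # 6/23, roughly 0.26
```

Exact fractions are used only so that the results can be compared with the rounded values 0.00090 and 0.26 without floating-point noise; ordinary floats would work equally well.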
Assume that the allele A is dominant, that individuals mate independently of genotype, and that the genotypes AA, Aa, and aa occur in the population with probabilities 1/4, 1/2, and 1/4, respectively. We are going to observe an individual whose parents are not available, and we shall observe the phenotype of this individual. Let E be the event that the observed individual has the dominant trait. We would like to revise our opinion of the possible genotypes of the parents. There are six possible genotype combinations, B1, ..., B6, for the parents prior to making any observations, and these are listed in Table 2.2.

The probabilities of the Bi were computed using the assumption that the parents mated independently of genotype. For example, B3 occurs if the father is AA and the mother is aa (probability 1/16) or if the father is aa and the mother is AA (probability 1/16).
The values of Pr(E|Bi) were computed assuming that the two available alleles are passed from parents to children with probability 1/2 each and independently for the two parents. For example, given B4, the event E occurs if and only if the child does not get two a’s. The probability of getting a from both parents given B4 is 1/4, so Pr(E|B4) = 3/4.

Now we shall compute Pr(B1|E) and Pr(B5|E). We leave the other calculations to the reader. The denominator of Bayes’ theorem is the same for both calculations, namely,

\[
\Pr(E) = \sum_{i=1}^{6} \Pr(B_i)\Pr(E \mid B_i) = \frac{1}{16} \times 1 + \frac{1}{4} \times 1 + \frac{1}{8} \times 1 + \frac{1}{4} \times \frac{3}{4} + \frac{1}{4} \times \frac{1}{2} + \frac{1}{16} \times 0 = \frac{3}{4}.
\]

Table 2.2
Name of event        B1     B2    B3    B4    B5    B6
Probability of Bi    1/16   1/4   1/8   1/4   1/4   1/16
Pr(E|Bi)             1      1     1     3/4   1/2   0

Applying Bayes’ theorem, we get

\[
\Pr(B_1 \mid E) = \frac{\frac{1}{16} \times 1}{\frac{3}{4}} = \frac{1}{12}, \qquad \Pr(B_5 \mid E) = \frac{\frac{1}{4} \times \frac{1}{2}}{\frac{3}{4}} = \frac{1}{6}.
\]
◀

Note: Conditional Version of Bayes’ Theorem. There is also a version of Bayes’
theorem conditional on an event C:

\[
\Pr(B_i \mid A \cap C) = \frac{\Pr(B_i \mid C)\Pr(A \mid B_i \cap C)}{\sum_{j=1}^{k} \Pr(B_j \mid C)\Pr(A \mid B_j \cap C)}. \tag{2.3.2}
\]

Prior and Posterior Probabilities

In Example 2.3.4, a probability like Pr(B2) is often called the prior probability that the selected item will have been produced by machine M2, because Pr(B2) is the probability of this event before the item is selected and before it is known whether the selected item is defective or nondefective. A probability like Pr(B2|A) is then called the posterior probability that the selected item was produced by machine M2, because it is the probability of this event after it is known that the selected item is defective.

Thus, in Example 2.3.4, the prior probability that the selected item will have been produced by machine M2 is 0.3. After an item has been selected and has been found to be defective, the posterior probability that the item was produced by machine M2 is 0.26.
Since this posterior probability is smaller than the prior probability that the item was produced by machine M2, the posterior probability that the item was produced by one of the other machines must be larger than the prior probability that it was produced by one of those machines (see Exercises 1 and 2 at the end of this section).

Computation of Posterior Probabilities in More Than One Stage

Suppose that a box contains one fair coin and one coin with a head on each side. Suppose also that one coin is selected at random and that when it is tossed, a head is obtained. We shall determine the probability that the coin is the fair coin.

Let B1 be the event that the coin is fair, let B2 be the event that the coin has two heads, and let H1 be the event that a head is obtained when the coin is tossed. Then, by Bayes’ theorem,

\[
\Pr(B_1 \mid H_1) = \frac{\Pr(B_1)\Pr(H_1 \mid B_1)}{\Pr(B_1)\Pr(H_1 \mid B_1) + \Pr(B_2)\Pr(H_1 \mid B_2)} = \frac{(1/2)(1/2)}{(1/2)(1/2) + (1/2)(1)} = \frac{1}{3}. \tag{2.3.3}
\]
Thus, after the first toss, the posterior probability that the coin is fair is 1/3.

Now suppose that the same coin is tossed again and we assume that the two tosses are conditionally independent given both B1 and B2. Suppose that another head is obtained. There are two ways of determining the new value of the posterior probability that the coin is fair.

The first way is to return to the beginning of the experiment and assume again that the prior probabilities are Pr(B1) = Pr(B2) = 1/2. We shall let H1 ∩ H2 denote the event in which heads are obtained on two tosses of the coin, and we shall calculate the posterior probability Pr(B1|H1 ∩ H2) that the coin is fair after we have observed the event H1 ∩ H2. The assumption that the tosses are conditionally independent given B1 means that Pr(H1 ∩ H2|B1) = 1/2 × 1/2 = 1/4. By Bayes’ theorem,

\[
\Pr(B_1 \mid H_1 \cap H_2) = \frac{\Pr(B_1)\Pr(H_1 \cap H_2 \mid B_1)}{\Pr(B_1)\Pr(H_1 \cap H_2 \mid B_1) + \Pr(B_2)\Pr(H_1 \cap H_2 \mid B_2)} = \frac{(1/2)(1/4)}{(1/2)(1/4) + (1/2)(1)} = \frac{1}{5}. \tag{2.3.4}
\]

The second way of determining this same posterior probability is to use the conditional version of Bayes’ theorem (2.3.2) given the event H1. Given H1, the conditional probability of B1 is 1/3, and the conditional probability of B2 is therefore 2/3.
These conditional probabilities can now serve as the prior probabilities for the next stage of the experiment, in which the coin is tossed a second time. Thus, we can apply (2.3.2) with C = H1, Pr(B1|H1) = 1/3, and Pr(B2|H1) = 2/3. We can then compute the posterior probability Pr(B1|H1 ∩ H2) that the coin is fair after we have observed a head on the second toss and a head on the first toss. We shall need Pr(H2|B1 ∩ H1), which equals Pr(H2|B1) = 1/2 by Theorem 2.2.4 since H1 and H2 are conditionally independent given B1. Since the coin is two-headed when B2 occurs, Pr(H2|B2 ∩ H1) = 1. So we obtain

\[
\Pr(B_1 \mid H_1 \cap H_2) = \frac{\Pr(B_1 \mid H_1)\Pr(H_2 \mid B_1 \cap H_1)}{\Pr(B_1 \mid H_1)\Pr(H_2 \mid B_1 \cap H_1) + \Pr(B_2 \mid H_1)\Pr(H_2 \mid B_2 \cap H_1)} = \frac{(1/3)(1/2)}{(1/3)(1/2) + (2/3)(1)} = \frac{1}{5}. \tag{2.3.5}
\]

The posterior probability of the event B1 obtained in the second way is the same as that obtained in the first way. We can make the following general statement: If an experiment is carried out in more than one stage, then the posterior probability of every event can also be calculated in more than one stage. After each stage has been carried out, the posterior probability calculated for the event after that stage serves as the prior probability for the next stage. The reader should look back at (2.3.2) to see that this interpretation is precisely what the conditional version of Bayes’ theorem says. The example we have been doing with coin tossing is typical of many applications of Bayes’ theorem and its conditional version because we are assuming that the observable events are conditionally independent given each element of the partition B1, ..., Bk (in this case, k = 2). The conditional independence makes the probability of Hi (head on ith toss) given B1 (or given B2) the same whether or not we also condition on earlier tosses (see Theorem 2.2.4).
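The agreement between the one-stage and two-stage calculations for the coin example can be illustrated with a short Python sketch. The helper name `update` is hypothetical, and the sketch relies on the same assumption as the text, namely that the tosses are conditionally independent given which coin was selected.

```python
from fractions import Fraction


def update(prior, likelihood):
    """One application of Bayes' theorem over a finite partition."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)  # probability of the observed event
    return [j / total for j in joint]


# Partition: B1 = fair coin, B2 = two-headed coin
prior = [Fraction(1, 2), Fraction(1, 2)]
# Pr(head on a toss | B1), Pr(head on a toss | B2)
head = [Fraction(1, 2), Fraction(1)]

# First way: condition on H1 and H2 at once. Conditional independence
# gives Pr(H1 and H2 | Bj) = Pr(H1 | Bj) * Pr(H2 | Bj).
one_stage = update(prior, [h * h for h in head])

# Second way: the posterior after the first head becomes the prior
# for the second toss.
after_first = update(prior, head)
two_stage = update(after_first, head)

print(after_first[0])              # 1/3
print(one_stage[0], two_stage[0])  # 1/5 1/5
```

Reusing `update` with the previous posterior as its first argument is exactly the staged use of the conditional version of Bayes’ theorem described above.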
Conditionally Independent Events

The calculations that led to (2.3.3) and (2.3.5) together with Example 2.2.10 illustrate simple cases of a very powerful statistical model for observable events. It is very common to encounter a sequence of events that we believe are similar in that they all have the same probability of occurring. It is also common that the order in which the events are labeled does not affect the probabilities that we assign. However, we often believe that these events are not independent, because, if we were to observe some of them, we would change our minds about the probability of the ones we had not observed depending on how many of the observed events occur. For example, in the coin-tossing calculation leading up to Eq. (2.3.3), before any tosses occur, the probability of H2 is the same as the probability of H1, namely, the
Por exemplo, no cálculo do lançamento de moeda que leva à Eq. (2.3.3), antes de ocorrer qualquer lançamento, a probabilidade deH2é igual à probabilidade deH1, a saber, o 82 Chapter 2 Conditional Probability denominator of (2.3.3), 3/4, as Theorem 2.1.4 says. However, after observing that the event H1 occurs, the probability of H2 is Pr(H2|H1), which is the denominator of (2.3.5), 5/6, as computed by the conditional version of the law of total probability (2.1.5). Even though we might treat the coin tosses as independent conditional on the coin being fair, and we might treat them as independent conditional on the coin being two-headed (in which case we know what will happen every time anyway), we cannot treat them as independent without the conditioning information. The conditioning information removes an important source of uncertainty from the problem, so we partition the sample space accordingly. Now we can use the conditional independence of the tosses to calculate joint probabilities of various combinations of events conditionally on the partition events. Finally, we can combine these probabilities using Theorem 2.1.4 and (2.1.5). Two more examples will help to illustrate these ideas. Example 2.3.6 Learning about a Proportion. In Example 2.2.10 on page 72, a machine produced defective parts in one of two proportions, p = 0.01 or p = 0.4. Suppose that the prior probability that p = 0.01 is 0.9. After sampling six parts at random, suppose that we observe two defectives. What is the posterior probability that p = 0.01? Let B1 = {p = 0.01} and B2 = {p = 0.4} as in Example 2.2.10. Let A be the event that two defectives occur in a random sample of size six. The prior probability of B1 is 0.9, and the prior probability of B2 is 0.1. We already computed Pr(A|B1) = 1.44 × 10−3 and Pr(A|B2) = 0.311 in Example 2.2.10. Bayes’ theorem tells us that Pr(B1|A) = 0.9 × 1.44 × 10−3 0.9 × 1.44 × 10−3 + 0.1 × 0.311 = 0.04. 
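The arithmetic in Example 2.3.6 can be checked with a few lines of Python. This is only a sketch: it assumes that the two likelihoods quoted from Example 2.2.10 come from the binomial formula for two defectives in six independent parts (which reproduces the quoted values), and the names `binomial_likelihood`, `priors`, and `posterior` are illustrative rather than taken from the text.

```python
from math import comb


def binomial_likelihood(n, x, p):
    """Probability of exactly x defectives among n independent parts,
    each defective with probability p (assumed model for Pr(A|B))."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Prior probabilities of the two partition events, keyed by the value of p.
priors = {0.01: 0.9, 0.4: 0.1}

# Likelihood of observing two defectives in a sample of six under each p.
likelihoods = {p: binomial_likelihood(6, 2, p) for p in priors}

# Law of total probability gives Pr(A); Bayes' theorem gives the posteriors.
evidence = sum(priors[p] * likelihoods[p] for p in priors)
posterior = {p: priors[p] * likelihoods[p] / evidence for p in priors}

for p in priors:
    print(f"p = {p}: likelihood = {likelihoods[p]:.5f}, posterior = {posterior[p]:.3f}")
```

Keeping the priors and likelihoods in dictionaries keyed by $p$ makes it easy to try other prior weights and see how strongly the data override them.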
Even though we thought originally that $B_1$ had probability as high as 0.9, after we learned that there were two defective items in a sample as small as six, we changed our minds dramatically and now we believe that $B_1$ has probability as small as 0.04. The reason for this major change is that the event $A$ that occurred has much higher probability if $B_2$ is true than if $B_1$ is true. ◀

Example 2.3.7 A Clinical Trial. Consider the same clinical trial described in Examples 2.1.12 and 2.1.13. Let $E_i$ be the event that the $i$th patient has success as her outcome. Recall that $B_j$ is the event that $p = (j-1)/10$ for $j = 1, \ldots, 11$, where $p$ is the proportion of successes among all possible patients. If we knew which $B_j$ occurred, we would say that $E_1, E_2, \ldots$ were independent. That is, we are willing to model the patients as conditionally independent given each event $B_j$, and we set $\Pr(E_i \mid B_j) = (j-1)/10$ for all $i, j$. We shall still assume that $\Pr(B_j) = 1/11$ for all $j$ prior to the start of the trial. We are now in position to express what we learn about $p$ by computing posterior probabilities for the $B_j$ events after each patient finishes the trial.

For example, consider the first patient. We calculated $\Pr(E_1) = 1/2$ in (2.1.6). If $E_1$ occurs, we apply Bayes' theorem to get
\[
\Pr(B_j \mid E_1) = \frac{\Pr(E_1 \mid B_j)\Pr(B_j)}{1/2} = \frac{2(j-1)}{10 \times 11} = \frac{j-1}{55}. \tag{2.3.6}
\]
After observing one success, the posterior probabilities of large values of $p$ are higher than their prior probabilities and the posterior probabilities of low values of $p$ are lower than their prior probabilities, as we would expect. For example, $\Pr(B_1 \mid E_1) = 0$, because $p = 0$ is ruled out after one success. Also, $\Pr(B_2 \mid E_1) = 0.0182$, which is much smaller than its prior value 0.0909, and $\Pr(B_{11} \mid E_1) = 0.1818$, which is larger than its prior value 0.0909.
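The single-patient update in (2.3.6) can be reproduced exactly with rational arithmetic. The sketch below assumes only the uniform prior $\Pr(B_j) = 1/11$ and the likelihoods $\Pr(E_1 \mid B_j) = (j-1)/10$ stated in the example; the variable names are illustrative.

```python
from fractions import Fraction

# Partition events B_1, ..., B_11 correspond to p = (j - 1)/10.
prior = [Fraction(1, 11) for _ in range(11)]
likelihood = [Fraction(j - 1, 10) for j in range(1, 12)]  # Pr(E_1 | B_j)

# Law of total probability: Pr(E_1) = sum_j Pr(E_1 | B_j) Pr(B_j).
pr_e1 = sum(l * q for l, q in zip(likelihood, prior))

# Bayes' theorem: Pr(B_j | E_1) = Pr(E_1 | B_j) Pr(B_j) / Pr(E_1).
posterior = [l * q / pr_e1 for l, q in zip(likelihood, prior)]

print("Pr(E_1) =", pr_e1)
for j, w in enumerate(posterior, start=1):
    print(f"Pr(B_{j} | E_1) = {w} = {float(w):.4f}")
```

Using `Fraction` keeps every posterior in the closed form $(j-1)/55$, so the decimal values quoted in the text can be compared without rounding error.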
Figure 2.3 The posterior probabilities of partition elements after 40 patients in Example 2.3.7.

We could check how the posterior probabilities behave after each patient is observed. However, we shall skip ahead to the point at which all 40 patients in the imipramine column of Table 2.1 have been observed. Let $A$ stand for the observed event that 22 of them are successes and 18 are failures. We can use the same reasoning as in Example 2.2.5 to compute $\Pr(A \mid B_j)$. There are $\binom{40}{22}$ possible sequences of 40 patients with 22 successes, and, conditional on $B_j$, the probability of each sequence is $([j-1]/10)^{22}(1-[j-1]/10)^{18}$. So,
\[
\Pr(A \mid B_j) = \binom{40}{22}([j-1]/10)^{22}(1-[j-1]/10)^{18}, \tag{2.3.7}
\]
for each $j$. Then Bayes' theorem tells us that
\[
\Pr(B_j \mid A) = \frac{\frac{1}{11}\binom{40}{22}([j-1]/10)^{22}(1-[j-1]/10)^{18}}{\sum_{i=1}^{11}\frac{1}{11}\binom{40}{22}([i-1]/10)^{22}(1-[i-1]/10)^{18}}.
\]
Figure 2.3 shows the posterior probabilities of the 11 partition elements after observing $A$. Notice that the probabilities of $B_6$ and $B_7$ are the highest, 0.42. This corresponds to the fact that the proportion of successes in the observed sample is 22/40 = 0.55, halfway between $(6-1)/10$ and $(7-1)/10$.

We can also compute the probability that the next patient will be a success both before the trial and after the 40 patients. Before the trial, $\Pr(E_{41}) = \Pr(E_1)$, which equals 1/2, as computed in (2.1.6). After observing the 40 patients, we can compute $\Pr(E_{41} \mid A)$ using the conditional version of the law of total probability, (2.1.5):
\[
\Pr(E_{41} \mid A) = \sum_{j=1}^{11} \Pr(E_{41} \mid B_j \cap A)\Pr(B_j \mid A). \tag{2.3.8}
\]
Using the values of $\Pr(B_j \mid A)$ in Fig. 2.3 and the fact that $\Pr(E_{41} \mid B_j \cap A) = \Pr(E_{41} \mid B_j) = (j-1)/10$ (conditional independence of the $E_i$ given the $B_j$), we compute (2.3.8) to be 0.5476. This is also very close to the observed frequency of success. ◀

The calculation at the end of Example 2.3.7 is typical of what happens after observing many conditionally independent events with the same conditional probability of occurrence.
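The posterior probabilities plotted in Fig. 2.3 and the predictive probability in (2.3.8) can be recomputed directly from (2.3.7). The following is a minimal sketch under the assumptions of Example 2.3.7 (uniform prior, 22 successes among 40 patients); the variable names are illustrative.

```python
from math import comb

# Candidate success proportions p = (j - 1)/10 for B_1, ..., B_11.
p_values = [(j - 1) / 10 for j in range(1, 12)]
prior = [1 / 11] * 11

# Eq. (2.3.7): Pr(A | B_j) for 22 successes and 18 failures in 40 patients.
likelihood = [comb(40, 22) * p**22 * (1 - p) ** 18 for p in p_values]

# Bayes' theorem: normalize prior times likelihood over the partition.
evidence = sum(q * l for q, l in zip(prior, likelihood))
posterior = [q * l / evidence for q, l in zip(prior, likelihood)]

# Eq. (2.3.8): Pr(E_41 | A) = sum_j Pr(E_41 | B_j) Pr(B_j | A),
# using conditional independence so that Pr(E_41 | B_j, A) = (j - 1)/10.
predictive = sum(p * w for p, w in zip(p_values, posterior))

for j, w in enumerate(posterior, start=1):
    print(f"Pr(B_{j} | A) = {w:.4f}")
print(f"Pr(E_41 | A) = {predictive:.4f}")
```

The binomial coefficient cancels in the normalization, so it could be dropped; it is kept here so that `likelihood` matches (2.3.7) term by term.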
The conditional probability of the next event given those that were observed tends to be close to the observed frequency of occurrence among the observed events. Indeed, when there is substantial data, the choice of prior probabilities becomes far less important.

Figure 2.4 The posterior probabilities of partition elements after 40 patients in Example 2.3.8. The X characters mark the values of the posterior probabilities calculated in Example 2.3.7.

Example 2.3.8 The Effect of Prior Probabilities. Consider the same clinical trial as in Example 2.3.7. This time, suppose that a different researcher has a different prior opinion about the value of $p$, the probability of success. This researcher believes the following prior probabilities:

Event        B1    B2    B3    B4    B5    B6    B7    B8    B9    B10   B11
p            0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
Prior prob.  0.00  0.19  0.19  0.17  0.14  0.11  0.09  0.06  0.04  0.01  0.00

We can recalculate the posterior probabilities using Bayes' theorem, and we get the values pictured in Fig. 2.4. To aid comparison, the posterior probabilities from Example 2.3.7 are also plotted in Fig. 2.4 using the symbol X. One can see how close the two sets of posterior probabilities are despite the large differences between the prior probabilities. If there had been fewer patients observed, there would have been larger differences between the two sets of posterior probabilities because the observed events would have provided less information. (See Exercise 12 in this section.) ◀

Summary

Bayes' theorem tells us how to compute the conditional probability of each event in a partition given an observed event $A$.
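The rule summarized here can be written as one short function that takes prior probabilities and likelihoods over a partition and returns the posterior probabilities. The sketch below applies it to the two priors of Examples 2.3.7 and 2.3.8 with the 40-patient data; the function name `posterior` and the variable names are illustrative, not notation from the text.

```python
from math import comb


def posterior(prior, likelihood):
    """Bayes' theorem over a partition: Pr(B_j | A) is proportional to
    Pr(B_j) * Pr(A | B_j), normalized so the results sum to 1."""
    joint = [q * l for q, l in zip(prior, likelihood)]
    total = sum(joint)
    return [x / total for x in joint]


# Likelihood of 22 successes among 40 patients for p = 0.0, 0.1, ..., 1.0.
p_values = [(j - 1) / 10 for j in range(1, 12)]
likelihood = [comb(40, 22) * p**22 * (1 - p) ** 18 for p in p_values]

uniform_prior = [1 / 11] * 11  # Example 2.3.7
second_prior = [0.00, 0.19, 0.19, 0.17, 0.14, 0.11,
                0.09, 0.06, 0.04, 0.01, 0.00]  # Example 2.3.8

post_uniform = posterior(uniform_prior, likelihood)
post_second = posterior(second_prior, likelihood)

# Largest disagreement between the two researchers after seeing the data.
largest_gap = max(abs(a - b) for a, b in zip(post_uniform, post_second))

for j, (a, b) in enumerate(zip(post_uniform, post_second), start=1):
    print(f"B_{j}: uniform prior -> {a:.4f}, second prior -> {b:.4f}")
print(f"largest difference = {largest_gap:.4f}")
```

Separating the normalization into its own function makes the role of the prior explicit: the likelihood list is shared, and only the first argument changes between the two researchers.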
A major use of partitions is to divide the sample space into small enough pieces so that a collection of events of interest becomes conditionally independent given each event in the partition.

Exercises

1. Suppose that $k$ events $B_1, \ldots, B_k$ form a partition of the sample space $S$. For $i = 1, \ldots, k$, let $\Pr(B_i)$ denote the prior probability of $B_i$. Also, for each event $A$ such that $\Pr(A) > 0$, let $\Pr(B_i \mid A)$ denote the posterior probability of $B_i$ given that the event $A$ has occurred. Prove that if $\Pr(B_1 \mid A) < \Pr(B_1)$, then $\Pr(B_i \mid A) > \Pr(B_i)$ for at least one value of $i$ ($i = 2, \ldots, k$).
2. Consider again the conditions of Example 2.3.4 in this section, in which an item was selected at random from a batch of manufactured items and was found to be defective. For which values of $i$ ($i = 1, 2, 3$) is the posterior probability that the item was produced by machine $M_i$ larger than the prior probability that the item was produced by machine $M_i$?

3. Suppose that in Example 2.3.4 in this section, the item selected at random from the entire lot is found to be nondefective. Determine the posterior probability that it was produced by machine $M_2$.

4. A new test has been devised for detecting a particular type of cancer. If the test is applied to a person who has this type of cancer, the probability that the person will have a positive reaction is 0.95 and the probability that the person will have a negative reaction is 0.05. If the test is applied to a person who does not have this type of cancer, the probability that the person will have a positive reaction is 0.05 and the probability that the person will have a negative reaction is 0.95. Suppose that in the general population, one person out of every 100,000 people has this type of cancer. If a person selected at random has a positive reaction to the test, what is the probability that he has this type of cancer?

5. In a certain city, 30 percent of the people are Conservatives, 50 percent are Liberals, and 20 percent are Independents. Records show that in a particular election, 65 percent of the Conservatives voted, 82 percent of the Liberals voted, and 50 percent of the Independents voted. If a person in the city is selected at random and it is learned that she did not vote in the last election, what is the probability that she is a Liberal?

6. Suppose that when a machine is adjusted properly, 50 percent of the items produced by it are of high quality and the other 50 percent are of medium quality. Suppose, however, that the machine is improperly adjusted during 10 percent of the time and that, under these conditions, 25 percent of the items produced by it are of high quality and 75 percent are of medium quality.
   a. Suppose that five items produced by the machine at a certain time are selected at random and inspected. If four of these items are of high quality and one item is of medium quality, what is the probability that the machine was adjusted properly at that time?
   b. Suppose that one additional item, which was produced by the machine at the same time as the other five items, is selected and found to be of medium quality. What is the new posterior probability that the machine was adjusted properly?

7. Suppose that a box contains five coins and that for each coin there is a different probability that a head will be obtained when the coin is tossed. Let $p_i$ denote the probability of a head when the $i$th coin is tossed ($i = 1, \ldots, 5$), and suppose that $p_1 = 0$, $p_2 = 1/4$, $p_3 = 1/2$, $p_4 = 3/4$, and $p_5 = 1$.
   a. Suppose that one coin is selected at random from the box and when it is tossed once, a head is obtained. What is the posterior probability that the $i$th coin was selected ($i = 1, \ldots, 5$)?
   b. If the same coin were tossed again, what would be the probability of obtaining another head?
   c. If a tail had been obtained on the first toss of the selected coin and the same coin were tossed again, what would be the probability of obtaining a head on the second toss?

8. Consider again the box containing the five different coins described in Exercise 7. Suppose that one coin is selected at random from the box and is tossed repeatedly until a head is obtained.
   a. If the first head is obtained on the fourth toss, what is the posterior probability that the $i$th coin was selected ($i = 1, \ldots, 5$)?
   b. If we continue to toss the same coin until another head is obtained, what is the probability that exactly three additional tosses will be required?

9. Consider again the conditions of Exercise 14 in Sec. 2.1. Suppose that several parts will be observed and that the different parts are conditionally independent given each of the three states of repair of the machine. If seven parts are observed and exactly one is defective, compute the posterior probabilities of the three states of repair.

10. Consider again the conditions of Example 2.3.5, in which the phenotype of an individual was observed and found to be the dominant trait. For which values of $i$ ($i = 1, \ldots, 6$) is the posterior probability that the parents have the genotypes of event $B_i$ smaller than the prior probability that the parents have the genotypes of event $B_i$?

11. Suppose that in Example 2.3.5 the observed individual has the recessive trait. Determine the posterior probability that the parents have the genotypes of event $B_4$.

12. In the clinical trial in Examples 2.3.7 and 2.3.8, suppose that we have only observed the first five patients and three of the five had been successes. Use the two different sets of prior probabilities from Examples 2.3.7 and 2.3.8 to calculate two sets of posterior probabilities.
Are these two sets of posterior probabilities as close to each other as were the two in Examples 2.3.7 and 2.3.8? Why or why not?

13. Suppose that a box contains one fair coin and one coin with a head on each side. Suppose that a coin is drawn at random from this box and that we begin to flip the coin. In Eqs. (2.3.4) and (2.3.5), we computed the conditional probability that the coin was fair given that the first two flips both produce heads.
   a. Suppose that the coin is flipped a third time and another head is obtained. Compute the probability that the coin is fair given that all three flips produced heads.
   b. Suppose that the coin is flipped a fourth time and the result is tails. Compute the posterior probability that the coin is fair.

14. Consider again the conditions of Exercise 23 in Sec. 2.2. Assume that $\Pr(B) = 0.4$. Let $A$ be the event that exactly 8 out of 11 programs compiled. Compute the conditional probability of $B$ given $A$.

15. Use the prior probabilities in Example 2.3.8 for the events $B_1, \ldots, B_{11}$. Let $E_1$ be the event that the first patient is a success. Compute the probability of $E_1$ and explain why it is so much less than the value computed in Example 2.3.7.

16. Consider a machine that produces items in sequence. Under normal operating conditions, the items are independent with probability 0.01 of being defective. However, it is possible for the machine to develop a "memory" in the following sense: After each defective item, and independent of anything that happened earlier, the probability that the next item is defective is 2/5. After each nondefective item, and independent of anything that happened earlier, the probability that the next item is defective is 1/165. Assume that the machine is either operating normally for the whole time we observe or has a memory for the whole time that we observe. Let $B$ be the event that the machine is operating normally, and assume that $\Pr(B) = 2/3$. Let $D_i$ be the event that the $i$th item inspected is defective. Assume that $D_1$ is independent of $B$.
   a. Prove that $\Pr(D_i) = 0.01$ for all $i$. Hint: Use induction.
   b. Assume that we observe the first six items and the event that occurs is $E = D_1^c \cap D_2^c \cap D_3 \cap D_4 \cap D_5^c \cap D_6^c$. That is, the third and fourth items are defective, but the other four are not. Compute $\Pr(B \mid E)$.

⋆ 2.4 The Gambler's Ruin Problem

Consider two gamblers with finite resources who repeatedly play the same game against each other.
Using the tools of conditional probability, we can calculate the probability that each of the gamblers will eventually lose all of his money to the opponent.

Statement of the Problem

Suppose that two gamblers A and B are playing a game against each other. Let p be a given number (0 < p < 1), and suppose that on each play of the game, the probability that gambler A will win one dollar from gambler B is p and the probability that gambler B will win one dollar from gambler A is 1 − p. Suppose also that the initial fortune of gambler A is i dollars and the initial fortune of gambler B is k − i dollars, where i and k − i are given positive integers. Thus, the total fortune of the two gamblers is k dollars. Finally, suppose that the gamblers play the game repeatedly and independently until the fortune of one of them has been reduced to 0 dollars. Another way to think about this problem is that B is a casino and A is a gambler who is determined to quit as soon as he wins k − i dollars from the casino or when he goes broke, whichever comes first.

We shall now consider this game from the point of view of gambler A. His initial fortune is i dollars and on each play of the game his fortune will either increase by one dollar with a probability of p or decrease by one dollar with a probability of 1 − p. If p > 1/2, the game is favorable to him; if p < 1/2, the game is unfavorable to him; and if p = 1/2, the game is equally favorable to both gamblers. The game ends either when the fortune of gambler A reaches k dollars, in which case gambler B will have no money left, or when the fortune of gambler A reaches 0 dollars. The problem is to determine the probability that the fortune of gambler A will reach k dollars before it reaches 0 dollars.
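The game described in the problem statement can be simulated directly, which gives a rough numerical feel for the probability in question. The following is a minimal sketch and not part of the original text: the function names `play_until_ruin` and `estimate_win_probability`, the number of simulated games, and the fixed seed are all illustrative assumptions.

```python
import random


def play_until_ruin(i, k, p, rng):
    """Play one game starting from fortune i; return True if A reaches k before 0."""
    fortune = i
    while 0 < fortune < k:
        # A wins one dollar with probability p, otherwise loses one dollar.
        if rng.random() < p:
            fortune += 1
        else:
            fortune -= 1
    return fortune == k


def estimate_win_probability(i, k, p, n_games=20_000, seed=0):
    """Estimate the probability that A reaches k dollars before 0 dollars."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    wins = sum(play_until_ruin(i, k, p, rng) for _ in range(n_games))
    return wins / n_games


if __name__ == "__main__":
    # Hypothetical parameters: A starts with 3 of the 10 dollars in a fair game.
    print(estimate_win_probability(i=3, k=10, p=0.5))
```

An estimate of this kind is only approximate; its accuracy improves slowly as `n_games` grows.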
Because one of the gamblers will have no money left at the end of the game, this problem is called the Gambler’s Ruin problem.

Solution of the Problem

We shall continue to assume that the total fortune of the gamblers A and B is k dollars, and we shall let $a_i$ denote the probability that the fortune of gambler A will reach k dollars before it reaches 0 dollars, given that his initial fortune is i dollars. We assume that the game is the same each time it is played and the plays are independent of each other. It follows that, after each play, the Gambler’s Ruin problem essentially starts over with the only change being that the initial fortunes of the two gamblers have changed. In particular, for each j = 0, . . . , k, each time that we observe a sequence of plays that lead to gambler A’s fortune being j dollars, the conditional probability, given such a sequence, that gambler A wins is $a_j$. If gambler A’s fortune ever reaches 0, then gambler A is ruined, hence $a_0 = 0$. Similarly, if his fortune ever reaches k, then gambler A has won, hence $a_k = 1$.

We shall now determine the value of $a_i$ for i = 1, . . . , k − 1. Let $A_1$ denote the event that gambler A wins one dollar on the first play of the game, let $B_1$ denote the event that gambler A loses one dollar on the first play of the game, and let W denote the event that the fortune of gambler A ultimately reaches k dollars before it reaches 0 dollars. Then

\[
\Pr(W) = \Pr(A_1)\Pr(W \mid A_1) + \Pr(B_1)\Pr(W \mid B_1) = p\Pr(W \mid A_1) + (1 - p)\Pr(W \mid B_1).
\tag{2.4.1}
\]

Since the initial fortune of gambler A is i dollars (i = 1, . . . , k − 1), then $\Pr(W) = a_i$. Furthermore, if gambler A wins one dollar on the first play of the game, then his fortune becomes i + 1 dollars and the conditional probability $\Pr(W \mid A_1)$ that his fortune will ultimately reach k dollars is therefore $a_{i+1}$.
If A loses one dollar on the first play of the game, then his fortune becomes i − 1 dollars and the conditional probability $\Pr(W \mid B_1)$ that his fortune will ultimately reach k dollars is therefore $a_{i-1}$. Hence, by Eq. (2.4.1),

\[
a_i = p a_{i+1} + (1 - p) a_{i-1}.
\tag{2.4.2}
\]

We shall let i = 1, . . . , k − 1 in Eq. (2.4.2). Then, since $a_0 = 0$ and $a_k = 1$, we obtain the following k − 1 equations:

\[
\begin{aligned}
a_1 &= p a_2, \\
a_2 &= p a_3 + (1 - p) a_1, \\
a_3 &= p a_4 + (1 - p) a_2, \\
&\;\;\vdots \\
a_{k-2} &= p a_{k-1} + (1 - p) a_{k-3}, \\
a_{k-1} &= p + (1 - p) a_{k-2}.
\end{aligned}
\tag{2.4.3}
\]

If the value of $a_i$ on the left side of the ith equation is rewritten in the form $p a_i + (1 - p) a_i$ and some elementary algebra is performed, then these k − 1 equations can be rewritten as follows:

\[
\begin{aligned}
a_2 - a_1 &= \frac{1-p}{p}\, a_1, \\
a_3 - a_2 &= \frac{1-p}{p}\,(a_2 - a_1) = \left(\frac{1-p}{p}\right)^{2} a_1, \\
a_4 - a_3 &= \frac{1-p}{p}\,(a_3 - a_2) = \left(\frac{1-p}{p}\right)^{3} a_1, \\
&\;\;\vdots \\
a_{k-1} - a_{k-2} &= \frac{1-p}{p}\,(a_{k-2} - a_{k-3}) = \left(\frac{1-p}{p}\right)^{k-2} a_1, \\
1 - a_{k-1} &= \frac{1-p}{p}\,(a_{k-1} - a_{k-2}) = \left(\frac{1-p}{p}\right)^{k-1} a_1.
\end{aligned}
\tag{2.4.4}
\]

By equating the sum of the left sides of these k − 1 equations with the sum of the right sides, we obtain the relation

\[
1 - a_1 = a_1 \sum_{i=1}^{k-1} \left(\frac{1-p}{p}\right)^{i}.
\tag{2.4.5}
\]

Solution for a Fair Game

Suppose first that p = 1/2. Then (1 − p)/p = 1, and it follows from Eq. (2.4.5) that $1 - a_1 = (k - 1)a_1$, from which $a_1 = 1/k$. In turn, it follows from the first equation in (2.4.4) that $a_2 = 2/k$, it follows from the second equation in (2.4.4) that $a_3 = 3/k$, and so on. In this way, we obtain the following complete solution when p = 1/2:

\[
a_i = \frac{i}{k} \quad \text{for } i = 1, \ldots, k - 1.
\tag{2.4.6}
\]

Example 2.4.1. The Probability of Winning in a Fair Game. Suppose that p = 1/2, in which case the game is equally favorable to both gamblers; and suppose that the initial fortune of gambler A is 98 dollars and the initial fortune of gambler B is just two dollars. In this example, i = 98 and k = 100. Therefore, it follows from Eq. (2.4.6) that there is a probability of 0.98 that gambler A will win two dollars from gambler B before gambler B wins 98 dollars from gambler A.
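The difference equations (2.4.4) together with the boundary condition that the fortune k is a win can be turned into a short exact computation. The sketch below is an illustration rather than part of the original text; the function name `win_probabilities` is hypothetical, and exact rational arithmetic via `fractions.Fraction` is a choice made here so that results can be compared with Eq. (2.4.6) without rounding error.

```python
from fractions import Fraction


def win_probabilities(k, p):
    """Return the list [a_0, a_1, ..., a_k] for total fortune k and win probability p.

    Uses Eq. (2.4.4): successive differences a_{i+1} - a_i equal r**i * a_1,
    where r = (1 - p) / p, so a_i = a_1 * (1 + r + ... + r**(i - 1)).
    """
    p = Fraction(p)
    r = (1 - p) / p
    # partial[i] = 1 + r + ... + r**(i - 1), with partial[0] = 0.
    partial = [Fraction(0)]
    power = Fraction(1)
    for _ in range(k):
        partial.append(partial[-1] + power)
        power *= r
    # The condition a_k = 1 fixes a_1.
    a1 = 1 / partial[k]
    return [a1 * s for s in partial]


if __name__ == "__main__":
    a = win_probabilities(100, Fraction(1, 2))
    print(a[98])  # 49/50, i.e. 0.98 as in Example 2.4.1
```

Because the routine works from the differences rather than from a closed form, it accepts any p strictly between 0 and 1; passing p as a `Fraction` (or a string such as `"2/5"`) keeps the arithmetic exact.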
Solution for an Unfair Game

Suppose now that p ≠ 1/2. Then Eq. (2.4.5) can be rewritten in the form

\[
1 - a_1 = a_1\, \frac{\left(\dfrac{1-p}{p}\right)^{k} - \dfrac{1-p}{p}}{\dfrac{1-p}{p} - 1}.
\tag{2.4.7}
\]

Hence,

\[
a_1 = \frac{\dfrac{1-p}{p} - 1}{\left(\dfrac{1-p}{p}\right)^{k} - 1}.
\tag{2.4.8}
\]

Each of the other values of $a_i$ for i = 2, . . . , k − 1 can now be determined in turn from the equations in (2.4.4). In this way, we obtain the following complete solution:

\[
a_i = \frac{\left(\dfrac{1-p}{p}\right)^{i} - 1}{\left(\dfrac{1-p}{p}\right)^{k} - 1} \quad \text{for } i = 1, \ldots, k - 1.
\tag{2.4.9}
\]

Example 2.4.2. The Probability of Winning in an Unfavorable Game. Suppose that p = 0.4, in which case the probability that gambler A will win one dollar on any given play is smaller than the probability that he will lose one dollar. Suppose also that the initial fortune of gambler A is 99 dollars and the initial fortune of gambler B is just one dollar. We shall determine the probability that gambler A will win one dollar from gambler B before gambler B wins 99 dollars from gambler A.

In this example, the required probability $a_i$ is given by Eq. (2.4.9), in which (1 − p)/p = 3/2, i = 99, and k = 100. Therefore,

\[
a_i = \frac{\left(\frac{3}{2}\right)^{99} - 1}{\left(\frac{3}{2}\right)^{100} - 1} \approx \frac{1}{3/2} = \frac{2}{3}.
\]

Hence, although the probability that gambler A will win one dollar on any given play is only 0.4, the probability that he will win one dollar before he loses 99 dollars is approximately 2/3.

Summary

We considered a gambler and an opponent who each start with finite amounts of money. The two then play a sequence of games against each other until one of them runs out of money. We were able to calculate the probability that each of them would be the first to run out as a function of the probability of winning the game and of how much money each has at the start.

Exercises

1. Consider the unfavorable game in Example 2.4.2. This time, suppose that the initial fortune of gambler A is i dollars with i ≤ 98. Suppose that the initial fortune of gambler B is 100 − i dollars. Show that the probability is greater than 1/2 that gambler A loses i dollars before winning 100 − i dollars.

2. Consider the following three different possible conditions in the gambler’s ruin problem:
a. The initial fortune of gambler A is two dollars, and the initial fortune of gambler B is one dollar.
b. The initial fortune of gambler A is 20 dollars, and the initial fortune of gambler B is 10 dollars.
c. The initial fortune of gambler A is 200 dollars, and the initial fortune of gambler B is 100 dollars.
Suppose that p = 1/2. For which of these three conditions is there the greatest probability that gambler A will win the initial fortune of gambler B before he loses his own initial fortune?

3. Consider again the three different conditions (a), (b), and (c) given in Exercise 2, but suppose now that p < 1/2. For which of these three conditions is there the greatest probability that gambler A will win the initial fortune of gambler B before he loses his own initial fortune?

4. Consider again the three different conditions (a), (b), and (c) given in Exercise 2, but suppose now that p > 1/2. For which of these three conditions is there the greatest probability that gambler A will win the initial fortune of gambler B before he loses his own initial fortune?

5. Suppose that on each play of a certain game, a person is equally likely to win one dollar or lose one dollar. Suppose also that the person’s goal is to win two dollars by playing this game. How large an initial fortune must the person have in order for the probability to be at least 0.99 that she will achieve her goal before she loses her initial fortune?

6. Suppose that on each play of a certain game, a person will either win one dollar with probability 2/3 or lose one dollar with probability 1/3. Suppose also that the person’s goal is to win two dollars by playing this game. How large an initial fortune must the person have in order for the probability to be at least 0.99 that he will achieve his goal before he loses his initial fortune?

7. Suppose that on each play of a certain game, a person will either win one dollar with probability 1/3 or lose one dollar with probability 2/3. Suppose also that the person’s goal is to win two dollars by playing this game. Show that no matter how large the person’s initial fortune might be, the probability that she will achieve her goal before she loses her initial fortune is less than 1/4.

8. Suppose that the probability of a head on any toss of a certain coin is p (0 < p < 1), and suppose that the coin is tossed repeatedly. Let $X_n$ denote the total number of heads that have been obtained on the first n tosses, and let $Y_n = n - X_n$ denote the total number of tails on the first n tosses.
Suppose that the tosses are stopped as soon as a number n is reached such that either $X_n = Y_n + 3$ or $Y_n = X_n + 3$. Determine the probability that $X_n = Y_n + 3$ when the tosses are stopped.

9. Suppose that a certain box A contains five balls and another box B contains 10 balls. One of these two boxes is selected at random, and one ball from the selected box is transferred to the other box. If this process of selecting a box at random and transferring one ball from that box to the other box is repeated indefinitely, what is the probability that box A will become empty before box B becomes empty?

2.5 Supplementary Exercises

1. Suppose that A, B, and D are any three events such that $\Pr(A \mid D) \ge \Pr(B \mid D)$ and $\Pr(A \mid D^c) \ge \Pr(B \mid D^c)$. Prove that Pr(A) ≥ Pr(B).

2. Suppose that a fair coin is tossed repeatedly and independently until both a head and a tail have appeared at least once. (a) Describe the sample space of this experiment. (b) What is the probability that exactly three tosses will be required?

3. Suppose that A and B are events such that Pr(A) = 1/3, Pr(B) = 1/5, and $\Pr(A \mid B) + \Pr(B \mid A) = 2/3$. Evaluate $\Pr(A^c \cup B^c)$.

4. Suppose that A and B are independent events such that Pr(A) = 1/3 and Pr(B) > 0. What is the value of $\Pr(A \cup B^c \mid B)$?

5. Suppose that in 10 rolls of a balanced die, the number 6 appeared exactly three times. What is the probability that the first three rolls each yielded the number 6?

6. Suppose that A, B, and D are events such that A and B are independent, Pr(A ∩ B ∩ D) = 0.04, Pr(D | A ∩ B) = 0.25, and Pr(B) = 4 Pr(A). Evaluate Pr(A ∪ B).

7. Suppose that the events A, B, and C are mutually independent. Under what conditions are $A^c$, $B^c$, and $C^c$ mutually independent?

8. Suppose that the events A and B are disjoint and that each has positive probability. Are A and B independent?

9. Suppose that A, B, and C are three events such that A and B are disjoint, A and C are independent, and B and C are independent.
Suppose also that 4Pr(A) = 2Pr(B) = Pr(C) > 0 and Pr(A ∪ B ∪ C) = 5Pr(A). Determine the value of Pr(A).

10. Suppose that each of two dice is loaded so that when either die is rolled, the probability that the number k will appear is 0.1 for k = 1, 2, 5, or 6 and is 0.3 for k = 3 or 4. If the two loaded dice are rolled independently, what is the probability that the sum of the two numbers that appear will be 7?

11. Suppose that there is a probability of 1/50 that you will win a certain game. If you play the game 50 times, independently, what is the probability that you will win at least once?

12. Suppose that a balanced die is rolled three times, and let $X_i$ denote the number that appears on the ith roll (i = 1, 2, 3). Evaluate $\Pr(X_1 > X_2 > X_3)$.

13. Three students A, B, and C are enrolled in the same class. Suppose that A attends class 30 percent of the time, B attends class 50 percent of the time, and C attends class 80 percent of the time. If these students attend class independently of each other, what is (a) the probability that at least one of them will be in class on a particular day and (b) the probability that exactly one of them will be in class on a particular day?

14. Consider the World Series of baseball, as described in Exercise 16 of Sec. 2.2. If there is probability p that team A will win any particular game, what is the probability that it will be necessary to play seven games in order to determine the winner of the Series?

15. Suppose that three red balls and three white balls are thrown at random into three boxes and that all throws are independent. What is the probability that each box contains one red ball and one white ball?

16. If five balls are thrown at random into n boxes, and all throws are independent, what is the probability that no box contains more than two balls?

17. Bus tickets in a certain city contain four numbers, U, V, W, and X. Each of these numbers is equally likely to be any of the 10 digits 0, 1, . . . , 9, and the four numbers are chosen independently. A bus rider is said to be lucky if U + V = W + X. What proportion of the riders are lucky?

18. A certain group has eight members. In January, three members are selected at random to serve on a committee. In February, four members are selected at random and independently of the first selection to serve on another committee. In March, five members are selected at random and independently of the previous two selections to serve on a third committee. Determine the probability that each of the eight members serves on at least one of the three committees.

19. For the conditions of Exercise 18, determine the probability that two particular members A and B will serve together on at least one of the three committees.

20. Suppose that two players A and B take turns rolling a pair of balanced dice and that the winner is the first player who obtains the sum of 7 on a given roll of the two dice.
If A rolls first, what is the probability that B will win? 21. Three players A, B, and C take turns tossing a fair coin. Suppose that A tosses the coin first, B tosses second, and C tosses third; and suppose that this cycle is repeated indefinitely until someone wins by being the first player to obtain a head. Determine the probability that each of three players will win. 22. Suppose that a balanced die is rolled repeatedly until the same number appears on two successive rolls, and let X denote the number of rolls that are required. Determine the value of Pr(X = x), for x = 2, 3, . . . . 23. Suppose that 80 percent of all statisticians are shy, whereas only 15 percent of all economists are shy. Suppose also that 90 percent of the people at a large gathering are economists and the other 10 percent are statisticians. If you meet a shy person at random at the gathering, what is the probability that the person is a statistician? 24. Dreamboat cars are produced at three different fac- tories A, B, and C. Factory A produces 20 percent of the total output of Dreamboats, B produces 50 percent, and C produces 30 percent. However, 5 percent of the cars produced at A are lemons, 2 percent of those produced at B are lemons, and 10 percent of those produced at C are lemons. If you buy a Dreamboat and it turns out to be a lemon, what is the probability that it was produced at factory A? 25. Suppose that 30 percent of the bottles produced in a certain plant are defective. If a bottle is defective, the probability is 0.9 that an inspector will notice it and re- move it from the filling line. If a bottle is not defective, the probability is 0.2 that the inspector will think that it is defective and remove it from the filling line. a. If a bottle is removed from the filling line, what is the probability that it is defective? b. If a customer buys a bottle that has not been removed from the filling line, what is the probability that it is defective? 26. 
Suppose that a fair coin is tossed until a head is ob- tained and that this entire experiment is then performed independently a second time. What is the probability that the second experiment requires more tosses than the first experiment? 27. Suppose that a family has exactly n children (n ≥ 2). Assume that the probability that any child will be a girl is 1/2 and that all births are independent. Given that the family has at least one girl, determine the probability that the family has at least one boy. 28. Suppose that a fair coin is tossed independently n times. Determine the probability of obtaining exactly n − 1 heads, given (a) that at least n − 2 heads are obtained and (b) that heads are obtained on the first n − 2 tosses. 29. Suppose that 13 cards are selected at random from a regular deck of 52 playing cards. a. If it is known that at least one ace has been selected, what is the probability that at least two aces have been selected? b. If it is known that the ace of hearts has been selected, what is the probability that at least two aces have been selected? 30. Suppose that n letters are placed at random in n en- velopes, as in the matching problem of Sec. 1.10, and let qn denote the probability that no letter is placed in the cor- rect envelope. Show that the probability that exactly one letter is placed in the correct envelope is qn−1. 31. Consider again the conditions of Exercise 30. Show that the probability that exactly two letters are placed in the correct envelopes is (1/2)qn−2. 32. Consider again the conditions of Exercise 7 of Sec. 2.2. If exactly one of the two students A and B is in class on a given day, what is the probability that it is A? 33. Consider again the conditions of Exercise 2 of Sec. 1.10. If a family selected at random from the city 2.5 Exercícios Suplementares 91 que será necessário disputar sete partidas para determinar o vencedor da Série? noBsão limões, e 10 por cento daqueles produzidos emC são limões. 
Se você comprar um Dreamboat e ele for um limão, qual é a probabilidade de ele ter sido produzido na fábrica?A? 15.Suponha que três bolas vermelhas e três bolas brancas sejam lançadas aleatoriamente em três caixas e que todos os lançamentos sejam independentes. Qual é a probabilidade de que cada caixa contenha uma bola vermelha e uma bola branca? 25.Suponha que 30% das garrafas produzidas em uma determinada fábrica sejam defeituosas. Se uma garrafa estiver com defeito, a probabilidade é de 0,9 de que um inspetor a perceba e a retire da linha de envase. Se uma garrafa não estiver com defeito, a probabilidade é de 0,2 de que o inspetor pense que ela está com defeito e a retire da linha de envase. 16.Se cinco bolas forem lançadas aleatoriamentencaixas, e todos os lançamentos são independentes, qual é a probabilidade de nenhuma caixa conter mais de duas bolas? 17.As passagens de ônibus em uma determinada cidade contêm quatro números,você, V,C, eX. Cada um desses números tem a mesma probabilidade de ser qualquer um dos 10 dígitos 0,1, . . . ,9, e os quatro números são escolhidos independentemente. Diz-se que um passageiro de ônibus tem sorte se você+V=C+X. Qual proporção dos pilotos tem sorte? a.Se uma garrafa for retirada da linha de envase, qual é a probabilidade de ela estar com defeito? b.Se um cliente comprar uma garrafa que não foi retirada da linha de envase, qual a probabilidade de ela estar com defeito? 18.Um certo grupo tem oito membros. Em janeiro, três membros são selecionados aleatoriamente para integrar um comitê. Em fevereiro, quatro membros são selecionados aleatoriamente e independentemente da primeira seleção para integrar outro comitê. Em março, cinco membros são selecionados aleatoriamente e independentemente das duas seleções anteriores para integrar um terceiro comitê. Determine a probabilidade de cada um dos oito membros servir em pelo menos um dos três comitês. 
subscribes to exactly one of the three newspapers A, B, and C, what is the probability that it is A?

34. Three prisoners A, B, and C on death row know that exactly two of them are going to be executed, but they do not know which two. Prisoner A knows that the jailer will not tell him whether or not he is going to be executed. He therefore asks the jailer to tell him the name of one prisoner other than A himself who will be executed. The jailer responds that B will be executed. Upon receiving this response, Prisoner A reasons as follows: Before he spoke to the jailer, the probability was 2/3 that he would be one of the two prisoners executed. After speaking to the jailer, he knows that either he or prisoner C will be the other one to be executed. Hence, the probability that he will be executed is now only 1/2. Thus, merely by asking the jailer his question, the prisoner reduced the probability that he would be executed from 2/3 to 1/2, because he could go through exactly this same reasoning regardless of which answer the jailer gave. Discuss what is wrong with prisoner A's reasoning.

35. Suppose that each of two gamblers A and B has an initial fortune of 50 dollars, and that there is probability p that gambler A will win on any single play of a game against gambler B. Also, suppose either that one gambler can win one dollar from the other on each play of the game or that they can double the stakes and one can win two dollars from the other on each play of the game. Under which of these two conditions does A have the greater probability of winning the initial fortune of B before losing her own for each of the following conditions: (a) p < 1/2; (b) p > 1/2; (c) p = 1/2?

36. A sequence of n job candidates is prepared to interview for a job. We would like to hire the best candidate, but we have no information to distinguish the candidates before we interview them. We assume that the best candidate is equally likely to be each of the n candidates in the sequence before the interviews start. After the interviews start, we are able to rank those candidates we have seen, but we have no information about where the remaining candidates rank relative to those we have seen. After each interview, it is required that either we hire the current candidate immediately and stop the interviews, or we must let the current candidate go and we never can call them back. We choose to interview as follows: We select a number 0 ≤ r < n and we interview the first r candidates without any intention of hiring them. Starting with the next candidate r + 1, we continue interviewing until the current candidate is the best we have seen so far. We then stop and hire the current candidate. If none of the candidates from r + 1 to n is the best, we just hire candidate n. We would like to compute the probability that we hire the best candidate and we would like to choose r to make this probability as large as possible. Let A be the event that we hire the best candidate, and let $B_i$ be the event that the best candidate is in position i in the sequence of interviews.
a. Let i > r. Find the probability that the candidate who is relatively the best among the first i interviewed appears in the first r interviews.
b. Prove that $\Pr(A \mid B_i) = 0$ for i ≤ r and $\Pr(A \mid B_i) = r/(i-1)$ for i > r.
c. For fixed r, let $p_r$ be the probability of A using that value of r. Prove that $p_r = (r/n) \sum_{i=r+1}^{n} (i-1)^{-1}$.
d. Let $q_r = p_r - p_{r-1}$ for r = 1, ..., n − 1, and prove that $q_r$ is a strictly decreasing function of r.
e. Show that a value of r that maximizes $p_r$ is the last r such that $q_r > 0$. (Hint: Write $p_r = p_0 + q_1 + \cdots + q_r$ for r > 0.)
f. For n = 10, find the value of r that maximizes $p_r$, and find the corresponding $p_r$ value.

Chapter 3 Random Variables and Distributions

3.1 Random Variables and Discrete Distributions
3.2 Continuous Distributions
3.3 The Cumulative Distribution Function
3.4 Bivariate Distributions
3.5 Marginal Distributions
3.6 Conditional Distributions
3.7 Multivariate Distributions
3.8 Functions of a Random Variable
3.9 Functions of Two or More Random Variables
3.10 Markov Chains
3.11 Supplementary Exercises

3.1 Random Variables and Discrete Distributions

A random variable is a real-valued function defined on a sample space. Random variables are the main tools used for modeling unknown quantities in statistical analyses.
For each random variable X and each set C of real numbers, we could calculate the probability that X takes its value in C. The collection of all of these probabilities is the distribution of X. There are two major classes of distributions and random variables: discrete (this section) and continuous (Sec. 3.2). Discrete distributions are those that assign positive probability to at most countably many different values. A discrete distribution can be characterized by its probability function (p.f.), which specifies the probability that the random variable takes each of the different possible values. A random variable with a discrete distribution will be called a discrete random variable.

Definition of a Random Variable

Example 3.1.1 Tossing a Coin. Consider an experiment in which a fair coin is tossed 10 times. In this experiment, the sample space S can be regarded as the set of outcomes consisting of the $2^{10}$ different sequences of 10 heads and/or tails that are possible. We might be interested in the number of heads in the observed outcome. We can let X stand for the real-valued function defined on S that counts the number of heads in each outcome. For example, if s is the sequence HHTTTHTTTH, then X(s) = 4. For each possible sequence s consisting of 10 heads and/or tails, the value X(s) equals the number of heads in the sequence. The possible values for the function X are 0, 1, . . . , 10. ◀

Definition 3.1.1 Random Variable. Let S be the sample space for an experiment. A real-valued function that is defined on S is called a random variable.

For example, in Example 3.1.1, the number X of heads in the 10 tosses is a random variable. Another random variable in that example is Y = 10 − X, the number of tails.
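As an illustration of Definition 3.1.1, the following minimal Python sketch treats the random variables of Example 3.1.1 literally as functions on the sample space. The string encoding of outcomes ("H" and "T" characters) and the names `S`, `X`, and `Y` are choices made for this sketch only.

```python
from itertools import product

# Sample space S: all 2**10 sequences of 10 tosses, encoded as strings of "H"/"T".
S = ["".join(seq) for seq in product("HT", repeat=10)]


def X(s: str) -> int:
    """Random variable X: the number of heads in outcome s."""
    return s.count("H")


def Y(s: str) -> int:
    """Random variable Y = 10 - X: the number of tails in outcome s."""
    return 10 - X(s)


print(len(S))                      # 1024 outcomes in the sample space
print(X("HHTTTHTTTH"))             # 4, the outcome used in Example 3.1.1
print(sorted({X(s) for s in S}))   # possible values of X: 0, 1, ..., 10
```

Nothing here is random yet: X and Y are ordinary deterministic functions, and probabilities enter only once a probability measure is placed on S.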
[Figure 3.1: The event that at least one utility demand is high in Example 3.1.3. Axes: Water (horizontal), Electric (vertical).]

Example 3.1.2 Measuring a Person's Height. Consider an experiment in which a person is selected at random from some population and her height in inches is measured. This height is a random variable. ◀

Example 3.1.3 Demands for Utilities. Consider the contractor in Example 1.5.4 on page 19 who is concerned about the demands for water and electricity in a new office complex. The sample space was pictured in Fig. 1.5 on page 12, and it consists of a collection of points of the form (x, y), where x is the demand for water and y is the demand for electricity. That is, each point s ∈ S is a pair s = (x, y). One random variable that is of interest in this problem is the demand for water. This can be expressed as X(s) = x when s = (x, y). The possible values of X are the numbers in the interval [4, 200]. Another interesting random variable is Y, equal to the electricity demand, which can be expressed as Y(s) = y when s = (x, y). The possible values of Y are the numbers in the interval [1, 150]. A third possible random variable Z is an indicator of whether or not at least one demand is high. Let A and B be the two events described in Example 1.5.4. That is, A is the event that water demand is at least 100, and B is the event that electric demand is at least 115. Define

$$Z(s) = \begin{cases} 1 & \text{if } s \in A \cup B, \\ 0 & \text{if } s \notin A \cup B. \end{cases}$$

The possible values of Z are the numbers 0 and 1. The event A ∪ B is indicated in Fig. 3.1. ◀
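The three random variables of Example 3.1.3 can be sketched as plain Python functions of an outcome s = (x, y). This is only an illustration: representing outcomes as tuples is an assumption of the sketch, and the sample points in the usage lines are hypothetical.

```python
def X(s):
    """Water demand: X(s) = x when s = (x, y)."""
    x, _ = s
    return x


def Y(s):
    """Electricity demand: Y(s) = y when s = (x, y)."""
    _, y = s
    return y


def Z(s):
    """Indicator of A ∪ B: 1 if at least one demand is high, 0 otherwise."""
    in_A = X(s) >= 100   # event A: water demand is at least 100
    in_B = Y(s) >= 115   # event B: electric demand is at least 115
    return 1 if (in_A or in_B) else 0


# Hypothetical outcomes inside [4, 200] x [1, 150]:
print(Z((150, 50)))   # 1, water demand is high
print(Z((50, 120)))   # 1, electric demand is high
print(Z((50, 50)))    # 0, neither demand is high
```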
The Distribution of a Random Variable

When a probability measure has been specified on the sample space of an experiment, we can determine probabilities associated with the possible values of each random variable X. Let C be a subset of the real line such that {X ∈ C} is an event, and let Pr(X ∈ C) denote the probability that the value of X will belong to the subset C. Then Pr(X ∈ C) is equal to the probability that the outcome s of the experiment will be such that X(s) ∈ C. In symbols,

$$\Pr(X \in C) = \Pr(\{s : X(s) \in C\}). \tag{3.1.1}$$

Definition 3.1.2 Distribution. Let X be a random variable. The distribution of X is the collection of all probabilities of the form Pr(X ∈ C) for all sets C of real numbers such that {X ∈ C} is an event.

It is a straightforward consequence of the definition of the distribution of X that this distribution is itself a probability measure on the set of real numbers.
The set {X ∈ C} will be an event for every set C of real numbers that most readers will be able to imagine.

[Figure 3.2: The event that water demand is between 50 and 175 in Example 3.1.5. Axes: Water (horizontal), Electric (vertical).]

Example 3.1.4 Tossing a Coin. Consider again an experiment in which a fair coin is tossed 10 times, and let X be the number of heads that are obtained. In this experiment, the possible values of X are 0, 1, 2, ..., 10. For each x, Pr(X = x) is the sum of the probabilities of all of the outcomes in the event {X = x}. Because the coin is fair, each outcome has the same probability $1/2^{10}$, and we need only count how many outcomes s have X(s) = x. We know that X(s) = x if and only if exactly x of the 10 tosses are H. Hence, the number of outcomes s with X(s) = x is the same as the number of subsets of size x (to be the heads) that can be chosen from the 10 tosses, namely, $\binom{10}{x}$, according to Definitions 1.8.1 and 1.8.2. Hence,

$$\Pr(X = x) = \binom{10}{x} \frac{1}{2^{10}} \quad \text{for } x = 0, 1, 2, \ldots, 10.$$ ◀

Example 3.1.5 Demands for Utilities. In Example 1.5.4, we actually calculated some features of the distributions of the three random variables X, Y, and Z defined in Example 3.1.3. For example, the event A, defined as the event that water demand is at least 100, can be expressed as A = {X ≥ 100}, and Pr(A) = 0.5102. This means that Pr(X ≥ 100) = 0.5102. The distribution of X consists of all probabilities of the form Pr(X ∈ C) for all sets C such that {X ∈ C} is an event. These can all be calculated in a manner similar to the calculation of Pr(A) in Example 1.5.4. In particular, if C is a subinterval of the interval [4, 200], then

$$\Pr(X \in C) = \frac{(150 - 1) \times (\text{length of interval } C)}{29{,}204}. \tag{3.1.2}$$

For example, if C is the interval [50, 175], then its length is 125, and Pr(X ∈ C) = 149 × 125/29,204 = 0.6378. The subset of the sample space whose probability was just calculated is drawn in Fig. 3.2. ◀

The general definition of distribution in Definition 3.1.2 is awkward, and it will be useful to find alternative ways to specify the distributions of random variables. In the remainder of this section, we shall introduce a few such alternatives.

Discrete Distributions

Definition 3.1.3 Discrete Distribution/Random Variable. We say that a random variable X has a discrete distribution or that X is a discrete random variable if X can take only a finite number k of different values $x_1, \ldots, x_k$ or, at most, an infinite sequence of different values $x_1, x_2, \ldots$.

Random variables that can take every value in an interval are said to have continuous distributions and are discussed in Sec. 3.2.

Definition 3.1.4 Probability Function/p.f./Support. If a random variable X has a discrete distribution, the probability function (abbreviated p.f.) of X is defined as the function f such that for every real number x,

$$f(x) = \Pr(X = x).$$
The closure of the set {x : f(x) > 0} is called the support of (the distribution of) X.

Some authors refer to the probability function as the probability mass function, or p.m.f. We will not use that term again in this text.

Example 3.1.6 Demands for Utilities. The random variable Z in Example 3.1.3 equals 1 if at least one of the utility demands is high, and Z = 0 if neither demand is high. Since Z takes only two different values, it has a discrete distribution. Note that {s : Z(s) = 1} = A ∪ B, where A and B are defined in Example 1.5.4. We calculated Pr(A ∪ B) = 0.65253 in Example 1.5.4. If Z has p.f. f, then

$$f(z) = \begin{cases} 0.65253 & \text{if } z = 1, \\ 0.34747 & \text{if } z = 0, \\ 0 & \text{otherwise.} \end{cases}$$

The support of Z is the set {0, 1}, which has only two elements. ◀

Example 3.1.7 Tossing a Coin. The random variable X in Example 3.1.4 has only 11 different possible values. Its p.f. f is given at the end of that example for the values x = 0, ..., 10 that constitute the support of X; f(x) = 0 for all other values of x. ◀
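The p.f. described in Example 3.1.7 can be written out directly. The sketch below is an illustration, not part of the text: it uses exact fractions, returns 0 off the support, and cross-checks the closed form against a brute-force count of outcomes in the spirit of Eq. (3.1.1). The function name `f` and the "H"/"T" encoding are choices made here.

```python
from fractions import Fraction
from itertools import product
from math import comb


def f(x):
    """p.f. of X, the number of heads in 10 tosses of a fair coin."""
    if isinstance(x, int) and 0 <= x <= 10:   # the support of X is {0, 1, ..., 10}
        return Fraction(comb(10, x), 2**10)
    return Fraction(0)                        # f(x) = 0 for every other real x


# Cross-check: count the outcomes s with X(s) = x over the whole sample space.
counts = [0] * 11
for s in product("HT", repeat=10):
    counts[s.count("H")] += 1

assert all(f(x) == Fraction(counts[x], 2**10) for x in range(11))

print(f(4))                            # 105/512
print(sum(f(x) for x in range(11)))    # 1
```

Exact `Fraction` arithmetic is used so that the check that the probabilities sum to 1 is not affected by floating-point rounding.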
< Aqui estado alguns fatos simples sobre funcées de probabilidade Here are some simple facts about probability functions Teorema DeixarXseja uma variavel aleatdéria discreta com PF£Sexndo é um dos valores possiveis Theorem Let X be a discrete random variable with p.f. f. If x is not one of the possible values 3.1.1 dex, entao¥x0. Além disso, se a sequénciax1, x2, .. .inclui todos os valores possiveis 3.1.1 of X, then f(x) = 0. Also, if the sequence xj, x, ... includes all the possible values deX,ento eu=1f(xeu=1. a of X, then ran f@) =1. a Um PF tipico é esbogado na Fig. 3.3, em que cada segmento vertical representa o A typical p.f. is sketched in Fig. 3.3, in which each vertical segment represents valor def(xkcorrespondente a um valor possivelx. A soma das alturas dos segmentos the value of f(x) corresponding to a possible value x. The sum of the heights of the verticais na Fig. 3.3 deve ser 1. vertical segments in Fig. 3.3 must be 1. Figura 3.3Um exemplo de Ax) Figure 3.3 An example of Sx) PF a pf x1 x2 x3 O x4 x xy X, x3 «0 X4 x 3.1 Variaveis Aleatdérias e Distribuigdes Discretas 97 3.1 Random Variables and Discrete Distributions 97 O Teorema 3.1.2 mostra que o FP de uma variavel aleatéria discreta caracteriza Theorem 3.1.2 shows that the p.f. of a discrete random variable characterizes its sua distribuicgdo e nos permite dispensar a definigdo geral de distribuigdo quando distribution, and it allows us to dispense with the general definition of distribution discutimos variaveis aleatorias discretas. when we are discussing discrete random variables. Teorema SeXtem uma distribui¢gdo discreta, a probabilidade de cada subconjunto Cda linha real pode Theorem If X has a discrete distribution, the probability of each subset C of the real line can 3.1.2 ser determinado a partir da relagdo 5 3.1.2 be determined from the relation Pr.(XECE f{xeu). = Pr(X €C)= > f(x). 
= xeuEC xzpEC Algumas variaveis aleatérias tém distribuigdes que aparecem com tanta frequéncia que Some random variables have distributions that appear so frequently that the as distribuigées recebem nomes. A variavel aleatériaZno Exemplo 3.1.6 é um deles. distributions are given names. The random variable Z in Example 3.1.6 is one such. Definicgao Distribuigdo de Bernoulli/variavel aleatéria.Uma variavel aleatériaZisso leva apenas dois Definition Bernoulli Distribution/Random Variable. A random variable Z that takes only two 3.1.5 valores 0 e 1 com Pr(Z=1ptem oDistribuicao de Bernoulli com parametrop. Também 3.1.5 values 0 and 1 with Pr(Z = 1) = p has the Bernoulli distribution with parameter p. dizemos queZ um Varidvel aleatéria de Bernoulli com parametrop. We also say that Z is a Bernoulli random variable with parameter p. OZno Exemplo 3.1.6 tem a distribuigao de Bernoulli com parametro 0.65252. E facil The Z in Example 3.1.6 has the Bernoulli distribution with parameter 0.65252. It perceber que o nome de cada distribuigdo de Bernoulli é suficiente para nos permitir is easy to see that the name of each Bernoulli distribution is enough to allow us to calcular o FP, o que, por sua vez, nos permite caracterizar a sua distribuicdo. compute the p.f., which, in turn, allows us to characterize its distribution. Concluimos esta secdo com ilustracdes de duas familias adicionais de distribuicdes discretas que We conclude this section with illustrations of two additional families of discrete surgem com frequéncia suficiente para terem nomes. distributions that arise often enough to have names. Distribuigées Uniformes em Numeros Inteiros Uniform Distributions on Integers Exemplo Numeros diarios.Um popular jogo de loteria estadual exige que os participantes selecionem um jogo de trés Example Daily Numbers. A popular state lottery game requires participants to select a three- 3.1.8 numero de digito (0s iniciais permitidos). 
Then three balls, each with one digit, are chosen at random from well-mixed bowls. The sample space here consists of all triples $(i_1, i_2, i_3)$ where $i_j \in \{0, \ldots, 9\}$ for j = 1, 2, 3. If $s = (i_1, i_2, i_3)$, define $X(s) = 100 i_1 + 10 i_2 + i_3$. For example, X(0, 1, 5) = 15. It is easy to check that Pr(X = x) = 0.001 for each integer x ∈ {0, 1, ..., 999}.

Definition 3.1.6 Uniform Distribution on Integers. Let a ≤ b be integers. Suppose that the value of a random variable X is equally likely to be each of the integers a, ..., b. Then we say that X has the uniform distribution on the integers a, ..., b.

The X in Example 3.1.8 has the uniform distribution on the integers 0, 1, ..., 999. A uniform distribution on a set of k integers has probability 1/k on each integer. If b > a, there are b − a + 1 integers from a to b including a and b. The next result follows immediately from what we have just seen, and it illustrates how the name of the distribution characterizes the distribution.
Theorem 3.1.3 If X has the uniform distribution on the integers a, ..., b, the p.f. of X is
\[ f(x) = \begin{cases} \dfrac{1}{b - a + 1} & \text{for } x = a, \ldots, b, \\ 0 & \text{otherwise.} \end{cases} \]

The uniform distribution on the integers a, ..., b represents the outcome of an experiment that is often described by saying that one of the integers a, ..., b is chosen at random. In this context, the phrase “at random” means that each of the b − a + 1 integers is equally likely to be chosen. In this same sense, it is not possible to choose an integer at random from the set of all positive integers, because it is not possible to assign the same probability to every one of the positive integers and still make the sum of these probabilities equal to 1. In other words, a uniform distribution cannot be assigned to an infinite sequence of possible values, but such a distribution can be assigned to any finite sequence.
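As a rough illustration of Theorem 3.1.3 and Example 3.1.8, the following Python sketch enumerates the lottery triples and compares the resulting probabilities with the uniform p.f. It assumes, as the example does, that all 1000 triples of digits are equally likely. The names `uniform_pf` and `daily_number` are made up for this illustration and do not come from the text.

```python
from fractions import Fraction
from itertools import product


def uniform_pf(x, a, b):
    """p.f. of the uniform distribution on the integers a, ..., b (Theorem 3.1.3)."""
    if isinstance(x, int) and a <= x <= b:
        return Fraction(1, b - a + 1)
    return Fraction(0)


def daily_number(s):
    """X(s) = 100*i1 + 10*i2 + i3 for a triple s = (i1, i2, i3)."""
    i1, i2, i3 = s
    return 100 * i1 + 10 * i2 + i3


# Assumption: each of the 10**3 triples of digits is equally likely.
triples = list(product(range(10), repeat=3))
counts = {}
for s in triples:
    x = daily_number(s)
    counts[x] = counts.get(x, 0) + 1

# p.f. of X obtained by counting outcomes in the sample space.
pf_from_counts = {x: Fraction(c, len(triples)) for x, c in counts.items()}

# Every integer 0, ..., 999 should receive probability 1/1000.
matches = all(pf_from_counts[x] == uniform_pf(x, 0, 999) for x in range(1000))
print(daily_number((0, 1, 5)), matches, sum(pf_from_counts.values()))
```

Exact fractions are used instead of floating-point numbers so that the comparison with 1/(b − a + 1) is an equality rather than an approximation.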
Note: Random Variables Can Have the Same Distribution without Being the Same Random Variable. Consider two consecutive daily number draws as in Example 3.1.8. The sample space consists of all 6-tuples $(i_1, \ldots, i_6)$, where the first three coordinates are the numbers drawn on the first day and the last three are the numbers drawn on the second day (all in the order drawn). If $s = (i_1, \ldots, i_6)$, let $X_1(s) = 100 i_1 + 10 i_2 + i_3$ and let $X_2(s) = 100 i_4 + 10 i_5 + i_6$. It is easy to see that $X_1$ and $X_2$ are different functions of s and are not the same random variable. Indeed, there is only a small probability that they will take the same value. But they have the same distribution because they assume the same values with the same probabilities. If a businessman has 1000 customers numbered 0, ..., 999, and he selects one at random and records the number Y, the distribution of Y will be the same as the distribution of $X_1$ and of $X_2$, but Y is not like $X_1$ or $X_2$ in any other way.
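The point of the note can be checked by brute force. The sketch below assumes that all $10^6$ six-tuples of digits are equally likely (an assumption in the spirit of Example 3.1.8, not something stated in the note), tabulates the distributions of the two daily numbers, and counts how often they coincide. The variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

dist_x1 = Counter()   # how many outcomes give each value of X1
dist_x2 = Counter()   # how many outcomes give each value of X2
same_value = 0        # number of outcomes s with X1(s) == X2(s)
total = 0

# Assumption: all 10**6 six-tuples of digits are equally likely.
for s in product(range(10), repeat=6):
    x1 = 100 * s[0] + 10 * s[1] + s[2]   # first day's number
    x2 = 100 * s[3] + 10 * s[4] + s[5]   # second day's number
    dist_x1[x1] += 1
    dist_x2[x2] += 1
    same_value += (x1 == x2)
    total += 1

same_distribution = dist_x1 == dist_x2
prob_equal = Fraction(same_value, total)
print(same_distribution, prob_equal)
```

Under that assumption the two tallies are identical, so the two variables have the same distribution, while the two functions agree on only a small fraction of the outcomes.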
Binomial Distributions

Example 3.1.9 Defective Parts. Consider again Example 2.2.5 from page 69. In that example, a machine produces a defective item with probability p (0 < p < 1) and produces a nondefective item with probability 1 − p. We assumed that the events that the different items were defective were mutually independent. Suppose that the experiment consists of examining n of these items. Each outcome of this experiment will consist of a list of which items are defective and which are not, in the order examined. For example, we can let 0 stand for a nondefective item and 1 stand for a defective item. Then each outcome is a string of n digits, each of which is 0 or 1. To be specific, if, say, n = 6, then some of the possible outcomes are

010010, 100100, 000011, 110000, 100001, 000000, etc. (3.1.3)

We will let X denote the number of these items that are defective.
Then the random variable X will have a discrete distribution, and the possible values of X will be 0, 1, 2, ..., n. For example, the first four outcomes listed in Eq. (3.1.3) all have X(s) = 2. The last outcome listed has X(s) = 0.

Example 3.1.9 is a generalization of Example 2.2.5 with n items inspected rather than just six, and rewritten in the notation of random variables. For x = 0, 1, ..., n, the probability of obtaining each particular ordered sequence of n items containing exactly x defectives and n − x nondefectives is $p^x (1 - p)^{n - x}$, just as it was in Example 2.2.5. Since there are $\binom{n}{x}$ different ordered sequences of this type, it follows that
\[ \Pr(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}. \]
Therefore, the p.f. of X will be as follows:
\[ f(x) = \begin{cases} \dbinom{n}{x} p^x (1 - p)^{n - x} & \text{for } x = 0, 1, \ldots, n, \\ 0 & \text{otherwise.} \end{cases} \tag{3.1.4} \]

Definition 3.1.7 Binomial Distribution/Random Variable. The discrete distribution represented by the p.f. in (3.1.4) is called the binomial distribution with parameters n and p.
A random variable with this distribution is said to be a binomial random variable with parameters n and p.

The reader should be able to verify that the random variable X in Example 3.1.4, the number of heads in a sequence of 10 independent tosses of a fair coin, has the binomial distribution with parameters 10 and 1/2.

Since the name of each binomial distribution is sufficient to construct its p.f., it follows that the name is enough to identify the distribution. The name of each distribution includes the two parameters. The binomial distributions are very important in probability and statistics and will be discussed further in later chapters of this book.

A short table of values of certain binomial distributions is given at the end of this book. It can be found from this table, for example, that if X has the binomial distribution with parameters n = 10 and p = 0.2, then Pr(X = 5) = 0.0264 and Pr(X ≥ 5) = 0.0328.

As another example, suppose that a clinical trial is being run. Suppose that the probability that a patient recovers from her symptoms during the trial is p and that the probability is 1 − p that the patient does not recover. Let Y denote the number of patients who recover out of n independent patients in the trial. Then the distribution of Y is also binomial with parameters n and p. Indeed, consider a general experiment that consists of observing n independent repetitions (trials) with only two possible results for each trial. For convenience, call the two possible results “success” and “failure.” Then the distribution of the number of trials that result in success will be binomial with parameters n and p, where p is the probability of success on each trial.

Note: Names of Distributions. In this section, we gave names to several families of distributions. The name of each distribution includes any numerical parameters that are part of the definition.
For example, the random variable X in Example 3.1.4 has the binomial distribution with parameters 10 and 1/2. It is a correct statement to say that X has a binomial distribution or that X has a discrete distribution, but such statements are only partial descriptions of the distribution of X. Such statements are not sufficient to name the distribution of X, and hence they are not sufficient as answers to the question “What is the distribution of X?” The same considerations apply to all of the named distributions that we introduce elsewhere in the book. When attempting to specify the distribution of a random variable by giving its name, one must give the full name, including the values of any parameters. Only the full name is sufficient for determining the distribution.

Summary

A random variable is a real-valued function defined on a sample space. The distribution of a random variable X is the collection of all probabilities Pr(X ∈ C) for all subsets C of the real numbers such that {X ∈ C} is an event. A random variable X is discrete if there are at most countably many possible values for X. In this case, the distribution of X can be characterized by the probability function (p.f.) of X, namely, f(x) = Pr(X = x) for x in the set of possible values. Some distributions are so famous that they have names. One collection of such named distributions is the collection of uniform distributions on finite sets of integers. A more famous collection is the collection of binomial distributions whose parameters are n and p, where n is a positive integer and 0 < p < 1, having p.f. (3.1.4). The binomial distribution with parameters n = 1 and p is also called the Bernoulli distribution with parameter p. The names of these distributions also characterize the distributions.
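The binomial p.f. in Eq. (3.1.4) is simple enough to evaluate directly, which gives a way to cross-check the table values quoted earlier for n = 10 and p = 0.2. The following is only a minimal sketch; the function name `binomial_pf` is an illustrative choice, and floating-point arithmetic is assumed to be accurate enough for four decimal places.

```python
from math import comb


def binomial_pf(x, n, p):
    """p.f. of the binomial distribution with parameters n and p, Eq. (3.1.4)."""
    if isinstance(x, int) and 0 <= x <= n:
        return comb(n, x) * p**x * (1 - p) ** (n - x)
    return 0.0


n, p = 10, 0.2
pr_eq_5 = binomial_pf(5, n, p)                                # Pr(X = 5)
pr_ge_5 = sum(binomial_pf(x, n, p) for x in range(5, n + 1))  # Pr(X >= 5)
total = sum(binomial_pf(x, n, p) for x in range(n + 1))       # should be close to 1

print(round(pr_eq_5, 4), round(pr_ge_5, 4))
```

Calling `binomial_pf(1, 1, p)` returns p, which matches the remark that the binomial distribution with n = 1 is the Bernoulli distribution with parameter p.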
Exercises

1. Suppose that a random variable X has the uniform distribution on the integers 10, ..., 20. Find the probability that X is even.

2. Suppose that a random variable X has a discrete distribution with the following p.f.:
\[ f(x) = \begin{cases} cx & \text{for } x = 1, \ldots, 5, \\ 0 & \text{otherwise.} \end{cases} \]
Determine the value of the constant c.

3. Suppose that two balanced dice are rolled, and let X denote the absolute value of the difference between the two numbers that appear. Determine and sketch the p.f. of X.

4. Suppose that a fair coin is tossed 10 times independently. Determine the p.f. of the number of heads that will be obtained.

5. Suppose that a box contains seven red balls and three blue balls. If five balls are selected at random, without replacement, determine the p.f. of the number of red balls that will be obtained.

6. Suppose that a random variable X has the binomial distribution with parameters n = 15 and p = 0.5. Find Pr(X < 6).

7. Suppose that a random variable X has the binomial distribution with parameters n = 8 and p = 0.7. Find Pr(X ≥ 5) by using the table given at the end of this book. Hint: Use the fact that Pr(X ≥ 5) = Pr(Y ≤ 3), where Y has the binomial distribution with parameters n = 8 and p = 0.3.

8. If 10 percent of the balls in a certain box are red, and if 20 balls are selected from the box at random, with replacement, what is the probability that more than three red balls will be obtained?

9. Suppose that a random variable X has a discrete distribution with the following p.f.:
\[ f(x) = \begin{cases} \dfrac{c}{2^x} & \text{for } x = 0, 1, 2, \ldots, \\ 0 & \text{otherwise.} \end{cases} \]
Find the value of the constant c.

10. A civil engineer is studying a left-turn lane that is long enough to hold seven cars. Let X be the number of cars in the lane at the end of a randomly chosen red light. The engineer believes that the probability that X = x is proportional to (x + 1)(8 − x) for x = 0, ..., 7 (the possible values of X).
a. Find the p.f. of X.
b. Find the probability that X will be at least 5.

11. Show that there does not exist any number c such that the following function would be a p.f.:
\[ f(x) = \begin{cases} \dfrac{c}{x} & \text{for } x = 1, 2, \ldots, \\ 0 & \text{otherwise.} \end{cases} \]

3.2 Continuous Distributions

Next, we focus on random variables that can assume every value in an interval (bounded or unbounded). If a random variable X has associated with it a function f such that the integral of f over each interval gives the probability that X is in the interval, then we call f the probability density function (p.d.f.) of X and we say that X has a continuous distribution.

The Probability Density Function

Example 3.2.1 Demands for Utilities. In Example 3.1.5, we determined the distribution of the demand for water, X. From Fig. 3.2, we see that the smallest possible value of X is 4 and the largest is 200. For each interval $C = [c_0, c_1] \subset [4, 200]$, Eq. (3.1.2) says that
\[ \Pr(c_0 \le X \le c_1) = \frac{149(c_1 - c_0)}{29204} = \frac{c_1 - c_0}{196} = \int_{c_0}^{c_1} \frac{1}{196}\,dx. \]
So, if we define
\[ f(x) = \begin{cases} \dfrac{1}{196} & \text{if } 4 \le x \le 200, \\ 0 & \text{otherwise,} \end{cases} \tag{3.2.1} \]
we have that
\[ \Pr(c_0 \le X \le c_1) = \int_{c_0}^{c_1} f(x)\,dx. \tag{3.2.2} \]
Because we defined f(x) to be 0 for x outside of the interval [4, 200], we see that Eq. (3.2.2) holds for all $c_0 \le c_1$, even if $c_0 = -\infty$ and/or $c_1 = \infty$.

The water demand X in Example 3.2.1 is an example of the following.

Definition 3.2.1 Continuous Distribution/Random Variable. We say that a random variable X has a continuous distribution or that X is a continuous random variable if there exists a nonnegative function f, defined on the real line, such that for every interval of real numbers (bounded or unbounded), the probability that X takes a value in the interval is the integral of f over the interval.

For example, in the situation described in Definition 3.2.1, for each bounded closed interval [a, b],
\[ \Pr(a \le X \le b) = \int_a^b f(x)\,dx. \tag{3.2.3} \]
Similarly, $\Pr(X \ge a) = \int_a^{\infty} f(x)\,dx$ and $\Pr(X \le b) = \int_{-\infty}^{b} f(x)\,dx$.
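To make Eq. (3.2.2) concrete for the water demand of Example 3.2.1, the sketch below computes an interval probability in two ways: in closed form, as the length of the overlap with [4, 200] divided by 196, and by a crude midpoint-rule integration of the p.d.f. (3.2.1). The function names, the interval [50, 148], and the number of integration steps are all arbitrary choices made for illustration.

```python
def water_pdf(x):
    """p.d.f. (3.2.1): 1/196 on [4, 200], 0 elsewhere."""
    return 1 / 196 if 4 <= x <= 200 else 0.0


def prob_interval(c0, c1):
    """Pr(c0 <= X <= c1): length of the overlap of [c0, c1] with [4, 200], over 196."""
    lo, hi = max(c0, 4), min(c1, 200)
    return max(hi - lo, 0) / 196


def midpoint_integral(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over the bounded interval [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


closed_form = prob_interval(50, 148)              # 98 / 196
numeric = midpoint_integral(water_pdf, 50, 148)   # integral of (3.2.1) over [50, 148]
print(closed_form, round(numeric, 6))
```

Because `prob_interval` clips the endpoints to [4, 200], it also handles unbounded intervals: `prob_interval(float("-inf"), float("inf"))` gives 1.0, in line with the remark that Eq. (3.2.2) holds even for infinite endpoints.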
We see that the function f characterizes the distribution of a continuous random variable in much the same way that the probability function characterizes the distribution of a discrete random variable. For this reason, the function f plays an important role, and hence we give it a name.

Definition 3.2.2 Probability Density Function/p.d.f./Support. If X has a continuous distribution, the function f described in Definition 3.2.1 is called the probability density function (abbreviated p.d.f.) of X. The closure of the set {x : f(x) > 0} is called the support of (the distribution of) X.

Example 3.2.1 demonstrates that the water demand X has p.d.f. given by (3.2.1). Every p.d.f. f must satisfy the following two requirements:
\[ f(x) \ge 0, \quad \text{for all } x, \tag{3.2.4} \]
and
\[ \int_{-\infty}^{\infty} f(x)\,dx = 1. \tag{3.2.5} \]

A typical p.d.f. is sketched in Fig. 3.4. In that figure, the total area under the curve must be 1, and the value of Pr(a ≤ X ≤ b) is equal to the area of the shaded region.
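As a quick check of these two requirements, consider the p.d.f. in Eq. (3.2.1). It takes only the values 1/196 and 0, so (3.2.4) holds, and since it vanishes outside [4, 200], requirement (3.2.5) reduces to a one-line computation:
\[
\int_{-\infty}^{\infty} f(x)\,dx
  = \int_{4}^{200} \frac{1}{196}\,dx
  = \frac{200 - 4}{196}
  = 1 .
\]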
Note: Continuous Distributions Assign Probability 0 to Individual Values. The integral in Eq. (3.2.3) also equals Pr(a < X ≤ b) as well as Pr(a < X < b) and Pr(a ≤ X < b). Hence, it follows from the definition of continuous distributions that, if X has a continuous distribution, Pr(X = a) = 0 for each number a. As we noted on page 20, the fact that Pr(X = a) = 0 does not imply that X = a is impossible. If it did, all values of X would be impossible and X couldn’t assume any value. What happens is that the probability in the distribution of X is spread so thinly that we can only see it on sets like nondegenerate intervals. It is much the same as the fact that lines have 0 area in two dimensions, but that does not mean that lines are not there. The two vertical lines indicated under the curve in Fig. 3.4 have 0 area, and this signifies that Pr(X = a) = Pr(X = b) = 0. However, for each ε > 0 and each a such that f(a) > 0, Pr(a − ε ≤ X ≤ a + ε) ≈ 2εf(a) > 0.

Figure 3.4 An example of a p.d.f. (vertical axis f(x); horizontal axis x with the points a and b marked).

Nonuniqueness of the p.d.f.

If a random variable X has a continuous distribution, then Pr(X = x) = 0 for every individual value x. Because of this property, the values of each p.d.f. can be changed at a finite number of points, or even at certain infinite sequences of points, without changing the value of the integral of the p.d.f. over any subset A. In other words, the values of the p.d.f.
of a random variable X can be changed arbitrarily at many points without affecting any probabilities involving X, that is, without affecting the probability distribution of X. At exactly which sets of points we can change a p.d.f. depends on subtle features of the definition of the Riemann integral. We shall not deal with this issue in this text, and we shall only contemplate changes to p.d.f.’s at finitely many points. To the extent just described, the p.d.f. of a random variable is not unique. In many problems, however, there will be one version of the p.d.f. that is more natural than any other because for this version the p.d.f. will, wherever possible, be continuous on the real line. For example, the p.d.f. sketched in Fig. 3.4 is a continuous function over the entire real line. This p.d.f. could be changed arbitrarily at a few points without affecting the probability distribution that it represents, but these changes would introduce discontinuities into the p.d.f. without introducing any apparent advantages. Throughout most of this book, we shall adopt the following practice: If a random variable X has a continuous distribution, we shall give only one version of the p.d.f. of X and we shall refer to that version as the p.d.f. of X, just as though it had been uniquely determined. It should be remembered, however, that there is some freedom in the selection of the particular version of the p.d.f. that is used to represent each continuous distribution. The most common place where such freedom will arise is in cases like Eq. (3.2.1) where the p.d.f. is required to have discontinuities. Without making the function f any less continuous, we could have defined the p.d.f. in that example so that f (4) = f (200) = 0 instead of f (4) = f (200) = 1/196. 
Both of these choices lead to the same calculations of all probabilities associated with X, and they are both equally valid. Because the support of a continuous distribution is the closure of the set where the p.d.f. is strictly positive, it can be shown that the support is unique. A sensible approach would then be to choose the version of the p.d.f. that was strictly positive on the support whenever possible.

The reader should note that "continuous distribution" is not the name of a distribution, just as "discrete distribution" is not the name of a distribution.
There are many distributions that are discrete and many that are continuous. Some distributions of each type have names that we either have introduced or will introduce later.

We shall now present several examples of continuous distributions and their p.d.f.'s.

Uniform Distributions on Intervals

Example 3.2.2 Temperature Forecasts. Television weather forecasters announce high and low temperature forecasts as integer numbers of degrees. These forecasts, however, are the results of very sophisticated weather models that provide more precise forecasts that the television personalities round to the nearest integer for simplicity. Suppose that the forecaster announces a high temperature of y. If we wanted to know what temperature X the weather models actually produced, it might be safe to assume that X was equally likely to be any number in the interval from y − 1/2 to y + 1/2. ◀

The distribution of X in Example 3.2.2 is a special case of the following.
Definition 3.2.3 Uniform Distribution on an Interval. Let a and b be two given real numbers such that a < b. Let X be a random variable such that it is known that a ≤ X ≤ b and, for every subinterval of [a, b], the probability that X will belong to that subinterval is proportional to the length of that subinterval. We then say that the random variable X has the uniform distribution on the interval [a, b].

A random variable X with the uniform distribution on the interval [a, b] represents the outcome of an experiment that is often described by saying that a point is chosen at random from the interval [a, b]. In this context, the phrase "at random" means that the point is just as likely to be chosen from any particular part of the interval as from any other part of the same length.

Theorem 3.2.1 Uniform Distribution p.d.f. If X has the uniform distribution on an interval [a, b], then the p.d.f. of X is

$$ f(x) = \begin{cases} \dfrac{1}{b-a} & \text{for } a \le x \le b, \\ 0 & \text{otherwise.} \end{cases} \tag{3.2.6} $$
Proof X must take a value in the interval [a, b]. Hence, the p.d.f. f(x) of X must be 0 outside of [a, b]. Furthermore, since any particular subinterval of [a, b] having a given length is as likely to contain X as is any other subinterval having the same length, regardless of the location of the particular subinterval in [a, b], it follows that f(x) must be constant throughout [a, b], and that interval is then the support of the distribution. Also,

$$ \int_{-\infty}^{\infty} f(x)\,dx = \int_{a}^{b} f(x)\,dx = 1. \tag{3.2.7} $$

Therefore, the constant value of f(x) throughout [a, b] must be 1/(b − a), and the p.d.f. of X must be (3.2.6). ∎

Figure 3.5 The p.d.f. for the uniform distribution on the interval [a, b].

The p.d.f. (3.2.6) is sketched in Fig. 3.5. As an example, the random variable X (demand for water) in Example 3.2.1 has the uniform distribution on the interval [4, 200].

Note: Density Is Not Probability. The reader should note that the p.d.f. in (3.2.6) can be greater than 1, particularly if b − a < 1. Indeed, p.d.f.'s can be unbounded, as we shall see in Example 3.2.6. The p.d.f. of X, f(x), itself does not equal the probability that X is near x. The integral of f over values near x gives the probability that X is near x, and the integral is never greater than 1.

It is seen from Eq. (3.2.6) that the p.d.f. representing a uniform distribution on a given interval is constant over that interval, and the constant value of the p.d.f. is the reciprocal of the length of the interval. It is not possible to define a uniform distribution over an unbounded interval, because the length of such an interval is infinite.

Consider again the uniform distribution on the interval [a, b].
Since the probability is 0 that one of the endpoints a or b will be chosen, it is irrelevant whether the distribution is regarded as a uniform distribution on the closed interval a ≤ x ≤ b, or as a uniform distribution on the open interval a < x < b, or as a uniform distribution on the half-open and half-closed interval (a, b] in which one endpoint is included and the other endpoint is excluded.

For example, if a random variable X has the uniform distribution on the interval [−1, 4], then the p.d.f. of X is

$$ f(x) = \begin{cases} \dfrac{1}{5} & \text{for } -1 \le x \le 4, \\ 0 & \text{otherwise.} \end{cases} $$

Furthermore,

$$ \Pr(0 \le X \le 2) = \int_{0}^{2} f(x)\,dx = \frac{2}{5}. $$

Notice that we defined the p.d.f. of X to be strictly positive on the closed interval [−1, 4] and 0 outside of this closed interval. It would have been just as sensible to define the p.d.f. to be strictly positive on the open interval (−1, 4) and 0 outside of this open interval. The probability distribution would be the same either way, including the calculation of Pr(0 ≤ X ≤ 2) that we just performed.
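The value Pr(0 ≤ X ≤ 2) = 2/5 for the uniform distribution on [−1, 4] can be checked numerically. The following is a minimal sketch and not part of the original text: the helper names `uniform_pdf` and `trapezoid` are hypothetical, and the trapezoid rule is just one convenient way to approximate the integral. It also computes the total mass for the closed-interval and open-interval versions of the p.d.f., which differ only at the two endpoints.

```python
def uniform_pdf(x, a, b, closed=True):
    """p.d.f. of the uniform distribution on [a, b], as in Eq. (3.2.6).

    With closed=False the density is set to 0 at the endpoints a and b,
    which is a different version of the p.d.f. for the same distribution.
    """
    inside = (a <= x <= b) if closed else (a < x < b)
    return 1.0 / (b - a) if inside else 0.0


def trapezoid(f, lo, hi, n=100_000):
    """Trapezoid-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    interior = sum(f(lo + i * h) for i in range(1, n))
    return h * (0.5 * f(lo) + interior + 0.5 * f(hi))


# Pr(0 <= X <= 2) for X uniform on [-1, 4]; the exact value is 2/5.
prob = trapezoid(lambda x: uniform_pdf(x, -1.0, 4.0), 0.0, 2.0)

# Total mass under the closed and the open version of the p.d.f.
total_closed = trapezoid(lambda x: uniform_pdf(x, -1.0, 4.0, closed=True), -1.0, 4.0)
total_open = trapezoid(lambda x: uniform_pdf(x, -1.0, 4.0, closed=False), -1.0, 4.0)

print(prob, total_closed, total_open)
```

With n = 100,000 subintervals the two totals differ by roughly 1e-5, and the gap shrinks as n grows. This matches the point that changing a p.d.f. at finitely many points does not change any probability.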
After this, when there are several equally sensible choices for how to define a p.d.f., we will simply choose one of them without making any note of the other choices.

Other Continuous Distributions

Example 3.2.3 Incompletely Specified p.d.f. Suppose that the p.d.f. of a certain random variable X has the following form:

$$ f(x) = \begin{cases} cx & \text{for } 0 < x < 4, \\ 0 & \text{otherwise,} \end{cases} $$

where c is a given constant. We shall determine the value of c.

For every p.d.f., it must be true that $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Therefore, in this example,

$$ \int_{0}^{4} cx\,dx = 8c = 1. $$

Hence, c = 1/8. ◀

Note: Calculating Normalizing Constants. The calculation in Example 3.2.3 illustrates an important point that simplifies many statistical results. The p.d.f. of X was specified without explicitly giving the value of the constant c. However, we were able to figure out what was the value of c by using the fact that the integral of a p.d.f. must be 1.
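The normalization step in Example 3.2.3 can be mirrored numerically. The sketch below is illustrative only: `integrate` and `normalizing_constant` are hypothetical helper names, and a midpoint rule stands in for exact integration. It recovers c for the unnormalized shape g(x) = x on (0, 4) by taking the reciprocal of the integral of g.

```python
def integrate(g, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


def normalizing_constant(g, lo, hi):
    """Return c such that c * g integrates to 1 over [lo, hi].

    Assumes g is nonnegative with a finite, positive integral on [lo, hi].
    """
    return 1.0 / integrate(g, lo, hi)


# Unnormalized density from Example 3.2.3: g(x) = x on (0, 4).
c = normalizing_constant(lambda x: x, 0.0, 4.0)
print(c)  # close to 1/8
```

The same helper applies to any shape whose integral is finite and positive. When the integral diverges, no normalizing constant exists.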
It will often happen, especially in Chapter 8 where we find sampling distributions of summaries of observed data, that we can determine the p.d.f. of a random variable except for a constant factor. That constant factor must be the unique value such that the integral of the p.d.f. is 1, even if we cannot calculate it directly.

Example 3.2.4 Calculating Probabilities from a p.d.f. Suppose that the p.d.f. of X is as in Example 3.2.3, namely,

$$ f(x) = \begin{cases} \dfrac{1}{8}x & \text{for } 0 < x < 4, \\ 0 & \text{otherwise.} \end{cases} $$

We shall now determine the values of Pr(1 ≤ X ≤ 2) and Pr(X > 2). Apply Eq. (3.2.3) to get

$$ \Pr(1 \le X \le 2) = \int_{1}^{2} \frac{1}{8}x\,dx = \frac{3}{16} $$

and

$$ \Pr(X > 2) = \int_{2}^{4} \frac{1}{8}x\,dx = \frac{3}{4}. $$

◀

Example 3.2.5 Unbounded Random Variables. It is often convenient and useful to represent a continuous distribution by a p.d.f. that is positive over an unbounded interval of the real line. For example, in a practical problem, the voltage X in a certain electrical system might be a random variable with a continuous distribution that can be approximately represented by the p.d.f.
$$ f(x) = \begin{cases} 0 & \text{for } x \le 0, \\ \dfrac{1}{(1+x)^2} & \text{for } x > 0. \end{cases} \tag{3.2.8} $$

It can be verified that the properties (3.2.4) and (3.2.5) required of all p.d.f.'s are satisfied by f(x).

Even though the voltage X may actually be bounded in the real situation, the p.d.f. (3.2.8) may provide a good approximation for the distribution of X over its full range of values. For example, suppose that it is known that the maximum possible value of X is 1000, in which case Pr(X > 1000) = 0. When the p.d.f. (3.2.8) is used, we compute Pr(X > 1000) = 1/1001 ≈ 0.001. If (3.2.8) adequately represents the variability of X over the interval (0, 1000), then it may be more convenient to use the p.d.f. (3.2.8) than a p.d.f. that is similar to (3.2.8) for x ≤ 1000, except for a new normalizing constant, and is 0 for x > 1000. This can be especially true if we do not know for sure that the maximum voltage is only 1000. ◀

Example 3.2.6 Unbounded p.d.f.'s. Since a value of a p.d.f. is a probability density, rather than a probability, such a value can be larger than 1. In fact, the values of the following p.d.f. are unbounded in the neighborhood of x = 0:

$$ f(x) = \begin{cases} \dfrac{2}{3}x^{-1/3} & \text{for } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases} \tag{3.2.9} $$

It can be verified that even though the p.d.f. (3.2.9) is unbounded, it satisfies the properties (3.2.4) and (3.2.5) required of a p.d.f. ◀

Mixed Distributions

Most distributions that are encountered in practical problems are either discrete or continuous. We shall show, however, that it may sometimes be necessary to consider a distribution that is a mixture of a discrete distribution and a continuous distribution.

Example 3.2.7 Truncated Voltage. Suppose that in the electrical system considered in Example 3.2.5, the voltage X is to be measured by a voltmeter that will record the actual value of X if X ≤ 3 but will simply record the value 3 if X > 3. If we let Y denote the value recorded by the voltmeter, then the distribution of Y can be derived as follows.
First, Pr(Y = 3) = Pr(X ≥ 3) = 1/4. Since the single value Y = 3 has probability 1/4, it follows that Pr(0 < Y < 3) = 3/4. Furthermore, since Y = X for 0 < X < 3, this probability 3/4 for Y is distributed over the interval (0, 3) according to the same p.d.f. (3.2.8) as that of X over the same interval. Thus, the distribution of Y is specified by the combination of a p.d.f. over the interval (0, 3) and a positive probability at the point Y = 3. ◀

Summary

A continuous distribution is characterized by its probability density function (p.d.f.). A nonnegative function f is the p.d.f. of the distribution of X if, for every interval [a, b], $\Pr(a \le X \le b) = \int_a^b f(x)\,dx$. Continuous random variables satisfy Pr(X = x) = 0 for every value x. If the p.d.f. of a distribution is constant on an interval [a, b] and is 0 off the interval, we say that the distribution is uniform on the interval [a, b].

Exercises

1. Let X be a random variable with the p.d.f. specified in Example 3.2.6. Compute Pr(X ≤ 8/27).

2. Suppose that the p.d.f. of a random variable X is as follows:

$$ f(x) = \begin{cases} \dfrac{4}{3}(1 - x^3) & \text{for } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases} $$

Sketch this p.d.f. and determine the values of the following probabilities: a. Pr(X < 1/2) b. Pr(1/4 < X < 3/4) c. Pr(X > 1/3).

3. Suppose that the p.d.f. of a random variable X is as follows:

$$ f(x) = \begin{cases} \dfrac{1}{36}(9 - x^2) & \text{for } -3 \le x \le 3, \\ 0 & \text{otherwise.} \end{cases} $$

Sketch this p.d.f. and determine the values of the following probabilities: a. Pr(X < 0) b. Pr(−1 ≤ X ≤ 1) c. Pr(X > 2).

4. Suppose that the p.d.f. of a random variable X is as follows:

$$ f(x) = \begin{cases} cx^2 & \text{for } 1 \le x \le 2, \\ 0 & \text{otherwise.} \end{cases} $$

a. Find the value of the constant c and sketch the p.d.f.
b. Find the value of Pr(X > 3/2).

5. Suppose that the p.d.f. of a random variable X is as follows:

$$ f(x) = \begin{cases} \dfrac{1}{8}x & \text{for } 0 \le x \le 4, \\ 0 & \text{otherwise.} \end{cases} $$

a. Find the value of t such that Pr(X ≤ t) = 1/4.
b. Find the value of t such that Pr(X ≥ t) = 1/2.

6. Let X be a random variable for which the p.d.f. is as given in Exercise 5. After the value of X has been observed, let Y be the integer closest to X. Find the p.f. of the random variable Y.

7. Suppose that a random variable X has the uniform distribution on the interval [−2, 8]. Find the p.d.f. of X and the value of Pr(0 < X < 7).

8. Suppose that the p.d.f. of a random variable X is as follows:

$$ f(x) = \begin{cases} ce^{-2x} & \text{for } x > 0, \\ 0 & \text{otherwise.} \end{cases} $$

a. Find the value of the constant c and sketch the p.d.f.
b. Find the value of Pr(1 < X < 2).

9. Show that there does not exist any number c such that the following function f(x) would be a p.d.f.:

$$ f(x) = \begin{cases} \dfrac{c}{1+x} & \text{for } x > 0, \\ 0 & \text{otherwise.} \end{cases} $$

10. Suppose that the p.d.f. of a random variable X is as follows:

$$ f(x) = \begin{cases} \dfrac{c}{(1-x)^{1/2}} & \text{for } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases} $$

a. Find the value of the constant c and sketch the p.d.f.
b. Find the value of Pr(X ≤ 1/2).

11. Show that there does not exist any number c such that the following function f(x) would be a p.d.f.:

$$ f(x) = \begin{cases} \dfrac{c}{x} & \text{for } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases} $$

12. In Example 3.1.3 on page 94, determine the distribution of the random variable Y, the electricity demand. Also, find Pr(Y < 50).

13. An ice cream seller takes 20 gallons of ice cream in her truck each day. Let X stand for the number of gallons that she sells. The probability is 0.1 that X = 20. If she doesn't sell all 20 gallons, the distribution of X follows a continuous distribution with a p.d.f. of the form

$$ f(x) = \begin{cases} cx & \text{for } 0 < x < 20, \\ 0 & \text{otherwise,} \end{cases} $$

where c is a constant that makes Pr(X < 20) = 0.9. Find the constant c so that Pr(X < 20) = 0.9 as described above.

3.3 The Cumulative Distribution Function

Although a discrete distribution is characterized by its p.f. and a continuous distribution is characterized by its p.d.f., every distribution has a common characterization through its (cumulative) distribution function (c.d.f.). The inverse of the c.d.f. is called the quantile function, and it is useful for indicating where the probability is located in a distribution.

Example 3.3.1 Voltage. Consider again the voltage X from Example 3.2.5. The distribution of X is characterized by the p.d.f. in Eq. (3.2.8).
An alternative characterization that is more directly related to probabilities associated with X is obtained from the following function:

$$ F(x) = \Pr(X \le x) = \int_{-\infty}^{x} f(y)\,dy = \begin{cases} 0 & \text{for } x \le 0, \\ \displaystyle\int_{0}^{x} \frac{dy}{(1+y)^2} & \text{for } x > 0, \end{cases} = \begin{cases} 0 & \text{for } x \le 0, \\ 1 - \dfrac{1}{1+x} & \text{for } x > 0. \end{cases} \tag{3.3.1} $$

So, for example, Pr(X ≤ 3) = F(3) = 3/4. ◀

Definition and Basic Properties

Definition 3.3.1 (Cumulative) Distribution Function. The distribution function or cumulative distribution function (abbreviated c.d.f.) F of a random variable X is the function

$$ F(x) = \Pr(X \le x) \quad \text{for } -\infty < x < \infty. \tag{3.3.2} $$

It should be emphasized that the cumulative distribution function is defined as above for every random variable X, regardless of whether the distribution of X is discrete, continuous, or mixed. For the continuous random variable in Example 3.3.1, the c.d.f. was calculated in Eq. (3.3.1). Here is a discrete example:

Example 3.3.2 Bernoulli c.d.f.
Let X have the Bernoulli distribution with parameter p defined in Definition 3.1.5. Then Pr(X = 0) = 1 − p and Pr(X = 1) = p. Let F be the c.d.f. of X. It is easy to see that F(x) = 0 for x < 0 because X ≥ 0 for sure. Similarly, F(x) = 1 for x ≥ 1 because X ≤ 1 for sure. For 0 ≤ x < 1, Pr(X ≤ x) = Pr(X = 0) = 1 − p because 0 is the only possible value of X that is in the interval (−∞, x]. In summary,

$$ F(x) = \begin{cases} 0 & \text{for } x < 0, \\ 1 - p & \text{for } 0 \le x < 1, \\ 1 & \text{for } x \ge 1. \end{cases} $$

◀

We shall soon see (Theorem 3.3.2) that the c.d.f. allows calculation of all interval probabilities; hence, it characterizes the distribution of a random variable. It follows from Eq. (3.3.2) that the c.d.f. of each random variable X is a function F defined on the real line. The value of F at every point x must be a number F(x) in the interval [0, 1] because F(x) is the probability of the event {X ≤ x}. Furthermore, it follows from Eq. (3.3.2) that the c.d.f. of every random variable X must have the following three properties.

Property 3.3.1 Nondecreasing.
The function F(x) is nondecreasing as x increases; that is, if x1 < x2, then F(x1) ≤ F(x2).

Proof If x1 < x2, then the event {X ≤ x1} is a subset of the event {X ≤ x2}. Hence, Pr{X ≤ x1} ≤ Pr{X ≤ x2} according to Theorem 1.5.4. ∎

An example of a c.d.f. is sketched in Fig. 3.6. It is shown in that figure that 0 ≤ F(x) ≤ 1 over the entire real line. Also, F(x) is always nondecreasing as x increases, although F(x) is constant over the interval x1 ≤ x ≤ x2 and for x ≥ x4.

Property 3.3.2 Limits at ±∞. lim F(x) = 0 as x → −∞, and lim F(x) = 1 as x → ∞.

Proof As in the proof of Property 3.3.1, note that {X ≤ x1} ⊂ {X ≤ x2} whenever x1 < x2. The fact that Pr(X ≤ x) approaches 0 as x → −∞ now follows from Exercise 13 in Section 1.10. Similarly, the fact that Pr(X ≤ x) approaches 1 as x → ∞ follows from Exercise 12 in Sec. 1.10.

Figure 3.6 An example of a c.d.f. (horizontal axis x with marked points x1, x2, x3, x4; vertical axis F(x) with marked levels z0, z1, z2, z3, and 1).

The limiting values specified in Property 3.3.2 are indicated in Fig. 3.6. In this figure, the value of F(x) actually becomes 1 at x = x4 and then remains 1 for x > x4. Hence, it may be concluded that Pr(X ≤ x4) = 1 and Pr(X > x4) = 0. On the other hand, according to the sketch in Fig. 3.6, the value of F(x) approaches 0 as x → −∞, but does not actually become 0 at any finite point x.
Therefore, for every finite value of x, no matter how small, Pr(X ≤ x) > 0.

A c.d.f. need not be continuous. In fact, the value of F(x) may jump at any finite or countable number of points. In Fig. 3.6, for instance, such jumps or points of discontinuity occur where x = x1 and x = x3. For each fixed value x, we shall let F(x−) denote the limit of the values of F(y) as y approaches x from the left, that is, as y approaches x through values smaller than x. In symbols,

\[ F(x^-) = \lim_{y \to x,\; y < x} F(y). \]

Similarly, we shall define F(x+) as the limit of the values of F(y) as y approaches x from the right. Thus,

\[ F(x^+) = \lim_{y \to x,\; y > x} F(y). \]

If the c.d.f. is continuous at a given point x, then F(x−) = F(x+) = F(x) at that point.

Property 3.3.3 Continuity from the Right. A c.d.f. is always continuous from the right; that is, F(x) = F(x+) at every point x.

Proof Let y1 > y2 > ... be a decreasing sequence of numbers such that lim n→∞ yn = x. Then the event {X ≤ x} is the intersection of all the events {X ≤ yn} for n = 1, 2, .... Hence, by Exercise 13 of Sec. 1.10,

\[ F(x) = \Pr(X \le x) = \lim_{n \to \infty} \Pr(X \le y_n) = F(x^+). \]

It follows from Property 3.3.3 that at every point x at which a jump occurs, F(x+) = F(x) and F(x−) < F(x).
In Fig. 3.6 this property is illustrated by the fact that, at the points of discontinuity x = x1 and x = x3, the value of F(x1) is taken as z1 and the value of F(x3) is taken as z3.

Determining Probabilities from the Distribution Function

Example 3.3.3 Voltage. In Example 3.3.1, suppose that we want to know the probability that X lies in the interval [2, 4]. That is, we want Pr(2 ≤ X ≤ 4). The c.d.f. allows us to compute Pr(X ≤ 4) and Pr(X ≤ 2). These are related to the probability that we want as follows: Let A = {2 < X ≤ 4}, B = {X ≤ 2}, and C = {X ≤ 4}. Because X has a continuous distribution, Pr(A) is the same as the probability that we desire. We see that A ∪ B = C, and it is clear that A and B are disjoint. Hence, Pr(A) + Pr(B) = Pr(C). It follows that

\[ \Pr(A) = \Pr(C) - \Pr(B) = F(4) - F(2) = \frac{4}{5} - \frac{2}{3} = \frac{2}{15}. \]
◀
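The subtraction of c.d.f. values used in Example 3.3.3 can be checked with exact arithmetic. The sketch below is only illustrative: it assumes the c.d.f. F(x) = x/(1 + x) for x ≥ 0 (and 0 otherwise), which is the antiderivative of the p.d.f. 1/(1 + x)^2 that this section attributes to Example 3.3.1, and the function names are hypothetical.

```python
from fractions import Fraction


def voltage_cdf(x):
    """Assumed c.d.f. of Example 3.3.1: F(x) = x / (1 + x) for x >= 0, else 0."""
    x = Fraction(x)
    return x / (1 + x) if x >= 0 else Fraction(0)


def prob_interval(cdf, a, b):
    """Pr(a < X <= b) = F(b) - F(a), valid for any c.d.f. when a < b."""
    return cdf(b) - cdf(a)


p = prob_interval(voltage_cdf, 2, 4)
print(voltage_cdf(4), voltage_cdf(2), p)  # prints: 4/5 2/3 2/15
```

Because this distribution is continuous, the same number is also Pr(2 ≤ X ≤ 4); for a c.d.f. with a jump at 2 the two probabilities would differ by the size of that jump.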
The type of reasoning used in Example 3.3.3 can be extended to find, from the c.d.f., the probability that an arbitrary random variable X will lie in any specified interval of the real line. We shall derive this probability for four different types of intervals.

Theorem 3.3.1 For every value x,

\[ \Pr(X > x) = 1 - F(x). \tag{3.3.3} \]

Proof The events {X > x} and {X ≤ x} are disjoint, and their union is the whole sample space S whose probability is 1. Hence, Pr(X > x) + Pr(X ≤ x) = 1. Now, Eq. (3.3.3) follows from Eq. (3.3.2).

Theorem 3.3.2 For all values x1 and x2 such that x1 < x2,

\[ \Pr(x_1 < X \le x_2) = F(x_2) - F(x_1). \tag{3.3.4} \]

Proof Let A = {x1 < X ≤ x2}, B = {X ≤ x1}, and C = {X ≤ x2}. As in Example 3.3.3, A and B are disjoint, and their union is C, so

\[ \Pr(x_1 < X \le x_2) + \Pr(X \le x_1) = \Pr(X \le x_2). \]

Subtracting Pr(X ≤ x1) from both sides of this equation and applying Eq. (3.3.2) yields Eq. (3.3.4).

For example, if the c.d.f. of X is as sketched in Fig. 3.6, then it follows from Theorems 3.3.1 and 3.3.2 that Pr(X > x2) = 1 − z1 and Pr(x2 < X ≤ x3) = z3 − z1. Also, since F(x) is constant over the interval x1 ≤ x ≤ x2, then Pr(x1 < X ≤ x2) = 0.

It is important to distinguish carefully between the strict inequalities and the weak inequalities that appear in all of the preceding relations and also in the next theorem. If there is a jump in F(x) at a given value x, then the values of Pr(X ≤ x) and Pr(X < x) will be different.

Theorem 3.3.3 For each value x,

\[ \Pr(X < x) = F(x^-). \tag{3.3.5} \]
Proof Let y1 < y2 < ... be an increasing sequence of numbers such that lim n→∞ yn = x. Then it can be shown that

\[ \{X < x\} = \bigcup_{n=1}^{\infty} \{X \le y_n\}. \]

Therefore, it follows from Exercise 12 of Sec. 1.10 that

\[ \Pr(X < x) = \lim_{n \to \infty} \Pr(X \le y_n) = \lim_{n \to \infty} F(y_n) = F(x^-). \]

For example, for the c.d.f. sketched in Fig. 3.6, Pr(X < x3) = z2 and Pr(X < x4) = 1.

Finally, we shall show that for every value x, Pr(X = x) is equal to the amount of the jump that occurs in F at the point x. If F is continuous at the point x, that is, if there is no jump in F at x, then Pr(X = x) = 0.

Theorem 3.3.4 For every value x,

\[ \Pr(X = x) = F(x) - F(x^-). \tag{3.3.6} \]

Proof It is always true that Pr(X = x) = Pr(X ≤ x) − Pr(X < x). The relation (3.3.6) follows from the fact that Pr(X ≤ x) = F(x) at every point and from Theorem 3.3.3.

In Fig. 3.6, for example, Pr(X = x1) = z1 − z0, Pr(X = x3) = z3 − z2, and the probability of every other individual value of X is 0.

The c.d.f. of a Discrete Distribution

From the definition and properties of a c.d.f. F(x), it follows that if a < b and
if Pr(a < X < b) = 0, then F(x) will be constant and horizontal over the interval a < x < b. Furthermore, as we have just seen, at every point x such that Pr(X = x) > 0, the c.d.f. will jump by the amount Pr(X = x).

Suppose that X has a discrete distribution with the p.f. f(x). Together, the properties of a c.d.f. imply that F(x) must have the following form: F(x) will have a jump of magnitude f(xi) at each possible value xi of X, and F(x) will be constant between every pair of successive jumps. The distribution of a discrete random variable X can be represented equally well by either the p.f. or the c.d.f. of X.

The c.d.f. of a Continuous Distribution

Theorem 3.3.5 Let X have a continuous distribution, and let f(x) and F(x) denote its p.d.f. and the c.d.f., respectively. Then F is continuous at every x,

\[ F(x) = \int_{-\infty}^{x} f(t)\,dt, \tag{3.3.7} \]

and

\[ \frac{dF(x)}{dx} = f(x), \tag{3.3.8} \]

at all x such that f is continuous.

Proof Since the probability of each individual point x is 0, the c.d.f. F(x) will have no jumps. Hence, F(x) will be a continuous function over the entire real line.

By definition, F(x) = Pr(X ≤ x). Since f is the p.d.f. of X, we have from the definition of p.d.f.
that Pr(X ≤ x) is the right-hand side of Eq. (3.3.7).

It follows from Eq. (3.3.7) and the relation between integrals and derivatives (the fundamental theorem of calculus) that, for every x at which f is continuous, Eq. (3.3.8) holds.

Thus, the c.d.f. of a continuous random variable X can be obtained from the p.d.f. and vice versa. Eq. (3.3.7) is how we found the c.d.f. in Example 3.3.1. Notice that the derivative of the F in Example 3.3.1 is

\[
F'(x) = \begin{cases} 0 & \text{for } x < 0, \\ \dfrac{1}{(1 + x)^2} & \text{for } x > 0, \end{cases}
\]

and F′ does not exist at x = 0. This verifies Eq. (3.3.8) for Example 3.3.1. Here, we have used the popular shorthand notation F′(x) for the derivative of F at the point x.

Example 3.3.4 Calculating a p.d.f. from a c.d.f. Let the c.d.f. of a random variable be

\[
F(x) = \begin{cases} 0 & \text{for } x < 0, \\ x^{2/3} & \text{for } 0 \le x \le 1, \\ 1 & \text{for } x > 1. \end{cases}
\]

This function clearly satisfies the three properties required of every c.d.f., as given earlier in this section. Furthermore, since this c.d.f. is continuous over the entire real line and is differentiable at every point except x = 0 and x = 1, the distribution of X is continuous. Therefore, the p.d.f. of X can be found at every point other than x = 0 and x = 1 by the relation (3.3.8). The value of f(x) at the points x = 0 and x = 1 can be assigned arbitrarily. When the derivative F′(x) is calculated, it is found that f(x) is as given by Eq. (3.2.9) in Example 3.2.6. Conversely, if the p.d.f. of X is given by Eq. (3.2.9), then by using Eq. (3.3.7) it is found that F(x) is as given in this example. ◀

The Quantile Function

Example 3.3.5 Fair Bets. Suppose that X is the amount of rain that will fall tomorrow, and X has c.d.f. F. Suppose that we want to place an even-money bet on X as follows: If X ≤ x0, we win one dollar and if X > x0 we lose one dollar. In order to make this bet fair, we need Pr(X ≤ x0) = Pr(X > x0) = 1/2.
We could search through all of the real numbers x trying to find one such that F(x) = 1/2, and then we would let x0 equal the value we found. If F is a one-to-one function, then F has an inverse F⁻¹ and x0 = F⁻¹(1/2). ◀

The value x0 that we seek in Example 3.3.5 is called the 0.5 quantile of X or the 50th percentile of X because 50% of the distribution of X is at or below x0.

Definition 3.3.2 Quantiles/Percentiles. Let X be a random variable with c.d.f. F. For each p strictly between 0 and 1, define F⁻¹(p) to be the smallest value x such that F(x) ≥ p. Then F⁻¹(p) is called the p quantile of X or the 100p percentile of X. The function F⁻¹ defined here on the open interval (0, 1) is called the quantile function of X.
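The "search through all of the real numbers" in Example 3.3.5 can be organized as a bisection, because a c.d.f. is nondecreasing and continuous from the right. The following is a minimal sketch rather than a prescribed method: `quantile_bisect` is a hypothetical helper, it assumes the caller supplies a bracket with F(lo) < p ≤ F(hi), and it is tried on the c.d.f. of Example 3.3.4, whose p quantile solves x^(2/3) = p.

```python
def quantile_bisect(cdf, p, lo, hi, tol=1e-12):
    """Approximate the smallest x with cdf(x) >= p by bisection.

    Assumes 0 < p < 1 and a bracket with cdf(lo) < p <= cdf(hi).
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if not (cdf(lo) < p <= cdf(hi)):
        raise ValueError("need cdf(lo) < p <= cdf(hi)")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) >= p:
            hi = mid  # mid already satisfies F(x) >= p
        else:
            lo = mid  # the quantile lies to the right of mid
    return hi


def cdf_example_334(x):
    """c.d.f. of Example 3.3.4: 0 for x < 0, x**(2/3) on [0, 1], 1 for x > 1."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return x ** (2 / 3)


median = quantile_bisect(cdf_example_334, 0.5, -1.0, 2.0)
print(median, 0.5 ** 1.5)  # the two values agree to within the tolerance
```

The loop keeps F(lo) < p ≤ F(hi) true at every step, so the returned value approximates the smallest x with F(x) ≥ p even when F has jumps or flat stretches.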
Example 3.3.6 Standardized Test Scores. Many universities in the United States rely on standardized test scores as part of their admissions process. Thousands of people take these tests each time that they are offered.
Each examinee's score is compared to the collection of scores of all examinees to see where it fits in the overall ranking. For example, if 83% of all test scores are at or below your score, your test report will say that you scored at the 83rd percentile. ◀

The notation F⁻¹(p) in Definition 3.3.2 deserves some justification. Suppose first that the c.d.f. F of X is continuous and one-to-one over the whole set of possible values of X. Then the inverse F⁻¹ of F exists, and for each 0 < p < 1, there is one and only one x such that F(x) = p. That x is F⁻¹(p). Definition 3.3.2 extends the concept of inverse function to nondecreasing functions (such as c.d.f.'s) that may be neither one-to-one nor continuous.

Quantiles of Continuous Distributions When the c.d.f. of a random variable X is continuous and one-to-one over the whole set of possible values of X, the inverse F⁻¹ of F exists and equals the quantile function of X.

Example 3.3.7 Value at Risk.
The manager of an investment portfolio is interested in how much money the portfolio might lose over a fixed time horizon. Let X be the change in value of the given portfolio over a period of one month. Suppose that X has the p.d.f. in Fig. 3.7. The manager computes a quantity known in the world of risk management as Value at Risk (denoted by VaR). To be specific, let Y = −X stand for the loss incurred by the portfolio over the one month. The manager wants to have a level of confidence about how large Y might be. In this example, the manager specifies a probability level, such as 0.99, and then finds y0, the 0.99 quantile of Y. The manager is now 99% sure that Y ≤ y0, and y0 is called the VaR. If X has a continuous distribution, then it is easy to see that y0 is closely related to the 0.01 quantile of the distribution of X. The 0.01 quantile x0 has the property that Pr(X < x0) = 0.01. But Pr(X < x0) = Pr(Y > −x0) = 1 − Pr(Y ≤ −x0). Hence, −x0 is a 0.99 quantile of
Y. For the p.d.f. in Fig. 3.7, we see that x0 = −4.14, as the shaded region indicates. Then y0 = 4.14 is VaR for one month at probability level 0.99. ◀

[Figure 3.7: The p.d.f. of the change in value of a portfolio with lower 1% indicated. Horizontal axis: change in value, from −20 to 20, with −4.14 marked; vertical axis: density, from 0 to 0.12.]

[Figure 3.8: The c.d.f. of a uniform distribution indicating how to solve for a quantile. Horizontal axis: x, from 0 to 4, with F⁻¹(p) marked; vertical axis: c.d.f., with p marked.]

Example 3.3.8 Uniform Distribution on an Interval. Let X have the uniform distribution on the interval [a, b]. The c.d.f. of X is

\[
F(x) = \Pr(X \le x) = \begin{cases} 0 & \text{if } x \le a, \\ \displaystyle\int_a^x \frac{1}{b - a}\,du & \text{if } a < x \le b, \\ 1 & \text{if } x > b. \end{cases}
\]

The integral above equals (x − a)/(b − a). So, F(x) = (x − a)/(b − a) for all a < x < b, which is a strictly increasing function over the entire interval of possible values of X. The inverse of this function is the quantile function of X, which we obtain by setting F(x) equal to p and solving for x:

\[
\frac{x - a}{b - a} = p, \qquad x - a = p(b - a), \qquad x = a + p(b - a) = pb + (1 - p)a.
\]
Figure 3.8 illustrates how the calculation of a quantile relates to the c.d.f. The quantile function of X is F⁻¹(p) = pb + (1 − p)a for 0 < p < 1. In particular, F⁻¹(1/2) = (b + a)/2. ◀

Note: Quantiles, Like c.d.f.'s, Depend on the Distribution Only. Any two random variables with the same distribution have the same quantile function. When we refer to a quantile of X, we mean a quantile of the distribution of X.

Quantiles of Discrete Distributions It is convenient to be able to calculate quantiles for discrete distributions as well. The quantile function of Definition 3.3.2 exists for all distributions whether discrete, continuous, or otherwise. For example, in Fig. 3.6, let z0 < p ≤ z1. Then the smallest x such that F(x) ≥ p is x1. For every value of x < x1, we have F(x) < z0 < p and F(x1) = z1. Notice that F(x) = z1 for all x between x1 and x2, but since x1 is the smallest of all those numbers, x1 is the p quantile. Because distribution functions are continuous from the right, the smallest x such that F(x) ≥ p
exists for all 0 < p < 1. For p = 1, there is no guarantee that such an x will exist. For example, in Fig. 3.6, F(x4) = 1, but in Example 3.3.1, F(x) < 1 for all x. For p = 0, there is never a smallest x such that F(x) = 0 because lim x→−∞ F(x) = 0. That is, if F(x0) = 0, then F(x) = 0 for all x < x0. For these reasons, we never talk about the 0 or 1 quantiles.

Table 3.1 Quantile function for Example 3.3.9

p                   F⁻¹(p)
(0, 0.1681]         0
(0.1681, 0.5283]    1
(0.5283, 0.8370]    2
(0.8370, 0.9693]    3
(0.9693, 0.9977]    4
(0.9977, 1)         5

Example 3.3.9 Quantiles of a Binomial Distribution. Let X have the binomial distribution with parameters 5 and 0.3. The binomial table in the back of the book has the p.f. f of X, which we reproduce here together with the c.d.f. F:

x       0        1        2        3        4        5
f(x)    0.1681   0.3602   0.3087   0.1323   0.0284   0.0024
F(x)    0.1681   0.5283   0.8370   0.9693   0.9977   1

(A little rounding error occurred in the p.f.) So, for example, the 0.5 quantile of this distribution is 1, which is also the 0.25 quantile and the 0.20 quantile. The entire quantile function is in Table 3.1. So, the 90th percentile is 3, which is also the 95th percentile, etc. ◀

Certain quantiles have special names.

Definition 3.3.3 Median/Quartiles. The 1/2 quantile or the 50th percentile of a distribution is called its median. The 1/4 quantile or 25th percentile is the lower quartile. The 3/4 quantile or 75th percentile is called the upper quartile.

Note: The Median Is Special. The median of a distribution is one of several special features that people like to use when summarizing the distribution of a random variable. We shall discuss summaries of distributions in more detail in Chapter 4.
Because the median is such a popular summary, we need to note that there are several different but similar "definitions" of median. Recall that the 1/2 quantile is the smallest number x such that F(x) ≥ 1/2. For some distributions, usually discrete distributions, there will be an interval of numbers [x1, x2) such that for all x ∈ [x1, x2), F(x) = 1/2. In such cases, it is common to refer to all such x (including x2) as medians of the distribution. (See Definition 4.5.1.) Another popular convention is to call (x1 + x2)/2 the median. This last is probably the most common convention. Readers should be aware that, whenever they encounter a median, it might be any one of the things that we just discussed. Fortunately, they all mean nearly the same thing, namely that the number divides the distribution in half as closely as is possible.
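The quantile lookups of Example 3.3.9 and the median conventions described in the note above can be reproduced directly from a p.f. The sketch below is illustrative only: it recomputes the binomial p.f. with parameters 5 and 0.3 from the binomial formula in exact fractions rather than reading the book's rounded table, the helper names `quantile` and `median_interval` are hypothetical, and the four-point distribution (probability 1/4 on each of 1, 2, 3, 4) is chosen simply because its c.d.f. equals 1/2 on an interval.

```python
from fractions import Fraction
from math import comb


def binomial_pf(n, p):
    """Exact p.f. of the binomial distribution as {value: probability}."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def quantile(pf, p):
    """Smallest possible value x with F(x) >= p, for 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    total = 0
    for x in sorted(pf):
        total += pf[x]  # running value of the c.d.f. F(x)
        if total >= p:
            return x
    raise ValueError("the p.f. does not sum to 1")


def median_interval(pf):
    """Return (x1, x2) such that every number in [x1, x2] may be called a median."""
    values = sorted(pf)
    x1 = quantile(pf, Fraction(1, 2))
    total = sum(pf[x] for x in values if x <= x1)
    if total == Fraction(1, 2):
        # F equals 1/2 on [x1, x2), where x2 is the next possible value.
        x2 = min(x for x in values if x > x1)
        return x1, x2
    return x1, x1


pf = binomial_pf(5, Fraction(3, 10))
print([quantile(pf, Fraction(q, 100)) for q in (20, 25, 50, 90, 95)])  # [1, 1, 1, 3, 3]

four_point = {x: Fraction(1, 4) for x in (1, 2, 3, 4)}
x1, x2 = median_interval(four_point)
print(x1, x2, Fraction(x1 + x2, 2))  # 2 3 5/2
```

Exact fractions are used so that the test F(x) = 1/2 is not disturbed by floating-point rounding; with float probabilities that equality check would be unreliable.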
Example 3.3.10 Uniform Distribution on Integers. Let X have the uniform distribution on the integers 1, 2, 3, 4. (See Definition 3.1.6.) The c.d.f. of X is

\[
F(x) = \begin{cases} 0 & \text{if } x < 1, \\ 1/4 & \text{if } 1 \le x < 2, \\ 1/2 & \text{if } 2 \le x < 3, \\ 3/4 & \text{if } 3 \le x < 4, \\ 1 & \text{if } x \ge 4. \end{cases}
\]

The 1/2 quantile is 2, but every number in the interval [2, 3] might be called a median. The most popular choice would be 2.5. ◀

One advantage to describing a distribution by the quantile function rather than by the c.d.f. is that quantile functions are easier to display in tabular form for multiple distributions. The reason is that the domain of the quantile function is always the interval (0, 1) no matter what the possible values of X are. Quantiles are also useful for summarizing distributions in terms of where the probability is.
For example, if one wishes to say where the middle half of a distribution is, one can say that it lies between the 0.25 quantile and the 0.75 quantile. In Sec. 8.5, we shall see how to use quantiles to help provide estimates of unknown quantities after observing data.

In Exercise 19, you can show how to recover the c.d.f. from the quantile function. Hence, the quantile function is an alternative way to characterize a distribution.

Summary

The c.d.f. F of a random variable X is F(x) = Pr(X ≤ x) for all real x. This function is continuous from the right. If we let F(x−) equal the limit of F(y) as y approaches x from below, then F(x) − F(x−) = Pr(X = x). A continuous distribution has a continuous c.d.f. and F′(x) = f(x), the p.d.f. of the distribution, for all x at which F is differentiable. A discrete distribution has a c.d.f. that is constant between the possible values and jumps by f(x) at each possible value x. The quantile function F⁻¹(p) is equal to the smallest x such that F(x) ≥ p for 0 < p < 1.

Exercises

1. Suppose that a random variable X has the Bernoulli distribution with parameter p = 0.7. (See Definition 3.1.5.) Sketch the c.d.f. of X.

2. Suppose that a random variable X can take only the values −2, 0, 1, and 4, and that the probabilities of these values are as follows: Pr(X = −2) = 0.4, Pr(X = 0) = 0.1, Pr(X = 1) = 0.3, and Pr(X = 4) = 0.2. Sketch the c.d.f. of X.

3. Suppose that a coin is tossed repeatedly until a head is obtained for the first time, and let X denote the number of tosses that are required. Sketch the c.d.f. of X.

4. Suppose that the c.d.f. F of a random variable X is as sketched in Fig. 3.9. Find each of the following probabilities:
a. Pr(X = −1)  b. Pr(X < 0)  c. Pr(X ≤ 0)  d. Pr(X = 1)  e. Pr(0 < X ≤ 3)  f. Pr(0 < X < 3)  g. Pr(0 ≤ X ≤ 3)  h. Pr(1 < X ≤ 2)  i. Pr(1 ≤ X ≤ 2)  j. Pr(X > 5)  k. Pr(X ≥ 5)  l. Pr(3 ≤ X ≤ 4)

5. Suppose that the c.d.f.
of a random variable X is as follows: F(x) = ⎧ ⎪⎨ ⎪⎩ 0 for x ≤ 0, 1 9x2 for 0 < x ≤ 3, 1 for x > 3. Find and sketch the p.d.f. of X. 116 Capítulo 3 Variáveis Aleatórias e Distribuições Exemplo 3.3.10 Distribuição Uniforme em Inteiros.DeixarXtem a distribuição uniforme nos inteiros 1,2,3,4. (Ver Definição 3.1.6.) O cdf deXé ⎧ ⎪⎪0 ⎪⎪⎪⎨1/4 sex <1, se 1≤x <2, se 2≤x <3, se 3≤x <4, sex≥4. F(x)= 1/2 ⎪⎪⎪⎪⎪3/4 ⎩ 1 O quantil 1/2 é 2, mas cada número no intervalo [2,3] pode ser chamada de mediana. A escolha mais popular seria 2,5. - Uma vantagem de descrever uma distribuição pela função quantílica em vez da cdf é que as funções quantílicas são mais fáceis de exibir em forma tabular para distribuições múltiplas. A razão é que o domínio da função quantil é sempre o intervalo(0,1)não importa quais sejam os valores possíveis deXsão. Os quantis também são úteis para resumir distribuições em termos de onde está a probabilidade. Por exemplo, se quisermos dizer onde está a metade central de uma distribuição, podemos dizer que ela está entre o quantil 0,25 e o quantil 0,75. Na seg. 8.5, veremos como usar quantis para ajudar a fornecer estimativas de quantidades desconhecidas após observar os dados. No Exercício 19, você pode mostrar como recuperar o cdf da função quantil. Portanto, a função quantil é uma forma alternativa de caracterizar uma distribuição. Resumo O CDFFde uma variável aleatóriaXéF(x)=Pr.(X≤x)para tudo de verdadex. Esta função é contínua da direita. Se deixarmosF (x-)igualar o limite deF (s)comosim abordagens xde baixo, entãoF(x)-F (x-)=Pr.(X=x). Uma distribuição contínua tem um cdf contínuo eF'(x)=f(x), o pdf da distribuição, para todosxem qual Fé diferenciável. Uma distribuição discreta tem um cdf que é constante entre os valores possíveis e salta porf(x)em cada valor possívelx. A função quantil F-1(P)é igual ao menorxde tal modo queF(x)≥ppara 0<p <1. Exercícios 1.Suponha que uma variável aleatóriaXtem a distribuição de Bernoulli com parâmetrop=0.7. 
(Ver Definição 3.1.5.) Esboce o cdf deX. a.Pr.(X= -1) c.Pr.(X≤0) e.Pr.(0<X≤3) g.Pr.(0≤X≤3) eu.Pr.(1≤X≤2) k.Pr.(X≥5) b.Pr.(X <0) d.Pr.(X=1) f. h. j.Pr.(X >5) eu.Pr.(3≤X≤4) Pr.(0<X<3) Pr.( 1<X≤2) 2.Suponha que uma variável aleatóriaXpode assumir apenas os valores −2,0,1 e 4, e que as probabilidades desses valores são as seguintes: Pr(X= -2)=0.4, Pr.(X=0)= 0.1, Pr.(X=1)=0.3, e Pr.(X=4)=0.2. Esboce o cdf de X. 5.Suponha que o cdf de uma variável aleatóriaXé o seguinte: 3.Suponha que uma moeda seja lançada repetidamente até que uma cara seja obtida pela primeira vez, e deixeXdenota o número de lançamentos necessários. Esboce o cdf deX. ⎧ ⎪⎨0 parax≤0, para 0<x≤3, parax >3. F(x)= 1x2 ⎪⎩9 1 4.Suponha que o cdfFde uma variável aleatóriaXé como esboçado na Fig. 3.9. Encontre cada uma das seguintes probabilidades: Encontre e esboce o pdf deX. 3.3 A Fungdo de Distribuigéo Cumulativa 117 3.3 The Cumulative Distribution Function 117 6.Suponha que o cdf de uma variavel aleatériaXé 0 15.Suponha queXtem o pdf 6. Suppose that the c.d.f. of a random variable X is as 15. Suppose that X has the p.d.f. seguinte: { follows: . { oy3 fix)= 2x seO<x <1, 3 roy = | if0<x <1, _ = x = F(x) paraxs3, 0 de outra forma. F(x) = | eé for x < 3, 0 otherwise. 1 parax >3. Encontre e esboce o cdf ouXx. ! for x > 3. Find and sketch the c.d.f. or X. Encontre e esboce o pdf dex. . .. ae Find and sketch the p.d-f. of X. . . . Co 16.Encontre a fungdo quantilica para a distribuigdo no 16. Find the quantile function for the distribution in Ex- 7.Suponha, como no Exercicio 7 da Segdo. 3.2, que uma Exemplo 3.3.1. 7. Suppose, as in Exercise 7 of Sec. 3.2, that a random ample 3.3.1. variavel aleatériaXtem a distribuigdo uniforme no intervalo . ; , variable X has the uniform distribution on the interval ; . 4 [-2,8]. Encontre e esboce o cdf dex. 17.Prove que a funcdo quantilFide uma variavel [—2, 8]. Find and sketch the c.d.f. of X. 17. 
Prove that the quantile function F' ofa general ran- aleatéria geralXtem as trés propriedades a seguir que dom variable X has the following three properties that are 8.Suponha que um ponto noxy-plano é escolhido sao anadlogas as propriedades do cdf: 8. Suppose that a point in the xy-plane is chosen at ran- analogous to properties of the c.d.f.: aleatoriamente do interior de um circulo para o qual a 1 funcSo nao d ted 0<p <1 dom from the interior of a circle for which the equation is p-hi d ne functi fp for 0 1 equagdo é x2+s/m= 1; e suponha que a probabilidade de o a.Fre uma TUNGadd NAdo Cecrescente deppara Usp <'. x? + y? = 1; and suppose that the probability that the point a. 1s. a nondecreasing TNCHON OF p tors P<" ponto pertencer a cada regido dentro do circulo seja b.Deixarxrlimp-oF 1 (Pxi=limp-1F1 (P). will belong to each region inside the circle is proportional b. Let xo = lim po F-\(p) and x, =lim pod F-\(p). proporcional a area dessa regido. DeixarZdenota uma Entzoxé igual ao maior limite inferior do conjunto to the area of that region. Let Z denote a random variable Then x9 equals the greatest lower bound on the set variavel aleatoria que representa a distancia do centro do de numerosctal que Pr(X<c) >0, exié igual ao representing the distance from the center of the circle to of numbers c such that Pr(X <c) > 0, and x; equals circulo até o ponto. Encontre e esboce o cdf dez. menor limite superior no conjunto de numerosd the point. Find and sketch the c.d.f. of Z. the least upper bound on the set of numbers d such 9.Suponha queXtem distribuicgdo uniforme no intervalo tal que Pr(X2e) >0. 9. Suppose that X has the uniform distribution on the that Pr(X > d) > 0. [0,5] e que a variavel aleatériaSé definido por S=0 sexs c.£1é continuo da esquerda; aquilo 6F1(P F1(pdg.-) interval [0, 5] and that the random variable Y is defined c. F~! is continuous from the left; that is F-\(p) = 1,S=5 sex23, eS=Xde outra forma. Esboce o cdf deS. 
para todos 03, and Y=X otherwise. F-\(p-) for all0 < p <1. . 7 .. . Sketch the c.d.f. of Y. . . . . 18.DeixarXser uma variavel aleatéria com fungdo 18. Let X be a random variable with quantile function 10.Para o cdf no Exemplo 3.3.4, encontre a fun¢do quantil 1. Suponha as trés condig6es a seguir: (i)F1(P 10. For the c.d.f. in Example 3.3.4, find the quantile func- 1. Assume the following three conditions: (i) F~!(p) = quantil. qpara todospno intervalo(po, pdg.1), (ii) qualquer po= 0 tion. c for all p in the interval (po, p1), (ii) either po = 0 or z see _ —1 ses : -1 11.Para o cdf no Exercicio 5, encontre a fungdo quantil. ou F1(pag.o) <¢, € (ill) oupI= 1 OUF1(p) > cparap > pi. 11. For the c.d.f. in Exercise 5, find the quantile function. | * (Po) <¢, and (iii) either pj = lor F~"(p) > ¢ for p > Prove que Pr(X=cFp1-po. Pp}. Prove that Pr(X =c) = p; — po. 12.Para o cdf no Exercicio 6, encontre a fungdo quantil. 12. For the c.d-f. in Exercise 6, find the quantile function. . . . 19.DeixarXseja uma variavel aleatéria com cdffe fungado ; ; 19, Let X be a random variable with c.d.f. F and quantile 13.Suponha que um corretor acredite que amudanca no valor —quantilF1. Deixarxoexiser conforme definido no Exercicio 13. Suppose that a broker believes that the change in function F~!. Let xq and x, be as defined in Exercise 17. Xde um determinado investimento nos prdoximos dois meses 17. (Observe quexo= -0e/oux1=~sdo possiveis.) Prove que value X of a particular investment over the next two (Note that x9 = —oo and/or x, = 00 are possible.) Prove tem distribuigdo uniforme no intervalo [-12, 24]. Encontre o para todosxno intervalo aberto(xo, x1), F(xJé 0 maior pde tal months has the uniform distribution on the interval [—12, that for all x in the open interval (xo, x1), F (x) is the largest valor em risco VaR para dois meses no nivel de probabilidade modo queF1(PKX. 24]. Find the value at risk VaR for two months at proba- p such that F~1(p) <x. 0,95. bility level 0.95. — . . 
Coe 20.No Exercicio 13 da Sedo. 3.2, desenhe um esboco do cdfF . . . . . 20. In Exercise 13 of Sec. 3.2, draw a sketch of the c.d.f. F 14.Encontre os quartis e a mediana da distribui¢do de Xe encontraF (10). 14. Find the quartiles and the median of the binomial — of y and find F(10). binomial com pardmetros=10 ep=0.2. distribution with parameters n = 10 and p = 0.2. AX) F(x) {+ --------------------------- |+--------------------------- | | | | 08+ ---------------- ! 0.8 ---------------- ! | | | | | | | | 0,6-- ---------------- | | 0.6-- ---------------- | | | | | | | | OA mn | | | 0.4 pa | | | | | | | | | 0,2 | | | | | 0.2 | | | | | | | | | | | | | | | | | | | | | | | | | -1 0 1 2 3 4 5 x =1 0 1 2 3 4 5 x Figura 3.90 cdf para o Exercicio 4. Figure 3.9 The c.d.f. for Exercise 4. 118 Chapter 3 Random Variables and Distributions 3.4 Bivariate Distributions We generalize the concept of distribution of a random variable to the joint distri- bution of two random variables. In doing so, we introduce the joint p.f. for two discrete random variables, the joint p.d.f. for two continuous random variables, and the joint c.d.f. for any two random variables. We also introduce a joint hybrid of p.f. and p.d.f. for the case of one discrete random variable and one continuous random variable. Example 3.4.1 Demands for Utilities. In Example 3.1.5, we found the distribution of the random variable X that represented the demand for water. But there is another random variable, Y, the demand for electricity, that is also of interest. When discussing two random variables at once, it is often convenient to put them together into an ordered pair, (X, Y). As early as Example 1.5.4 on page 19, we actually calculated some probabilities associated with the pair (X, Y). In that example, we defined two events A and B that we now can express as A = {X ≥ 115} and B = {Y ≥ 110}. In Example 1.5.4, we computed Pr(A ∩ B) and Pr(A ∪ B). We can express A ∩ B and A ∪ B as events involving the pair (X, Y). 
For example, define the set of ordered pairs C = {(x, y) : x ≥ 115 and y ≥ 110} so that the event {(X, Y) ∈ C} = A ∩ B. That is, the event that the pair of random variables lies in the set C is the same as the intersection of the two events A and B. In Example 1.5.4, we computed Pr(A ∩ B) = 0.1198. So, we can now assert that Pr((X, Y) ∈ C) = 0.1198. ◀

Definition 3.4.1 Joint/Bivariate Distribution. Let X and Y be random variables. The joint distribution or bivariate distribution of X and Y is the collection of all probabilities of the form Pr[(X, Y) ∈ C] for all sets C of pairs of real numbers such that {(X, Y) ∈ C} is an event.

It is a straightforward consequence of the definition of the joint distribution of X and Y that this joint distribution is itself a probability measure on the set of ordered pairs of real numbers. The set {(X, Y) ∈ C} will be an event for every set C of pairs of real numbers that most readers will be able to imagine.

In this section and the next two sections, we shall discuss convenient ways to characterize and do computations with bivariate distributions. In Sec. 3.7, these considerations will be extended to the joint distribution of an arbitrary finite number of random variables.

Discrete Joint Distributions

Example 3.4.2 Theater Patrons. Suppose that a sample of 10 people is selected at random from a theater with 200 patrons. One random variable of interest might be the number X of people in the sample who are over 60 years of age, and another random variable might be the number Y of people in the sample who live more than 25 miles from the theater. For each ordered pair (x, y) with x = 0, . . . , 10 and y = 0, . . . , 10, we might wish to compute Pr((X, Y) = (x, y)), the probability that there are x people in the sample who are over 60 years of age and there are y people in the sample who live more than 25 miles away. ◀

Definition 3.4.2 Discrete Joint Distribution. Let X and Y be random variables, and consider the ordered pair (X, Y). If there are only finitely or at most countably many different possible values (x, y) for the pair (X, Y), then we say that X and Y have a discrete joint distribution.

The two random variables in Example 3.4.2 have a discrete joint distribution.
Theorem 3.4.1 Suppose that two random variables X and Y each have a discrete distribution. Then X and Y have a discrete joint distribution.

Proof If both X and Y have only finitely many possible values, then there will be only a finite number of different possible values (x, y) for the pair (X, Y). On the other hand, if either X or Y or both can take a countably infinite number of possible values, then there will also be a countably infinite number of possible values for the pair (X, Y). In all of these cases, the pair (X, Y) has a discrete joint distribution.

When we define continuous joint distribution shortly, we shall see that the obvious analog of Theorem 3.4.1 is not true.

Definition 3.4.3 Joint Probability Function, p.f. The joint probability function, or the joint p.f., of X and Y is defined as the function f such that for every point (x, y) in the xy-plane,
\[
f(x, y) = \Pr(X = x \text{ and } Y = y).
\]
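To make Definition 3.4.3 concrete, a joint p.f. can be built by listing the outcomes of an experiment and adding up the probability of every outcome that produces each pair (x, y). The sketch below uses a hypothetical experiment that does not appear in the text: three tosses of a fair coin, with X the number of heads in tosses 1 and 2 and Y the number of heads in tosses 2 and 3. The names `joint_pf` and `f` are likewise chosen only for this illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical experiment (an assumption, not from the text): three fair coin
# tosses, coded 1 for heads and 0 for tails. Each of the 8 outcomes has
# probability 1/8.
joint_pf = defaultdict(Fraction)
for tosses in product((0, 1), repeat=3):
    x = tosses[0] + tosses[1]  # X: heads in tosses 1 and 2
    y = tosses[1] + tosses[2]  # Y: heads in tosses 2 and 3
    joint_pf[(x, y)] += Fraction(1, 8)


def f(x, y):
    """Joint p.f.: Pr(X = x and Y = y), which is 0 for pairs that cannot occur."""
    return joint_pf.get((x, y), Fraction(0))


print(f(1, 1), f(0, 2), sum(joint_pf.values()))
```

Because the pair (X, Y) has only finitely many possible values here, the table `joint_pf` describes the whole joint distribution.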
The following result is easy to prove because there are at most countably many pairs (x, y) that must account for all of the probability of a discrete joint distribution.

Theorem 3.4.2 Let X and Y have a discrete joint distribution. If (x, y) is not one of the possible values of the pair (X, Y), then f(x, y) = 0. Also,
\[
\sum_{\text{All } (x, y)} f(x, y) = 1.
\]
Finally, for each set C of ordered pairs,
\[
\Pr[(X, Y) \in C] = \sum_{(x, y) \in C} f(x, y).
\]

Example 3.4.3 Specifying a Discrete Joint Distribution by a Table of Probabilities. In a certain suburban area, each household reported the number of cars and the number of television sets that they owned. Let X stand for the number of cars owned by a randomly selected household in this area. Let Y stand for the number of television sets owned by that same randomly selected household. In this case, X takes only the values 1, 2, and 3; Y takes only the values 1, 2, 3, and 4; and the joint p.f. f of X and Y is as specified in Table 3.2.

Table 3.2 Joint p.f. f(x, y) for Example 3.4.3

             y
  x     1     2     3     4
  1    0.1    0    0.1    0
  2    0.3    0    0.1   0.2
  3     0    0.2    0     0

Figure 3.10 The joint p.f. of X and Y in Example 3.4.3.

This joint p.f. is sketched in Fig. 3.10. We shall determine the probability that the randomly selected household owns at least two of both cars and televisions. In symbols, this is Pr(X ≥ 2 and Y ≥ 2). By summing f(x, y) over all values of x ≥ 2 and y ≥ 2, we obtain the value
\[
\Pr(X \ge 2 \text{ and } Y \ge 2) = f(2, 2) + f(2, 3) + f(2, 4) + f(3, 2) + f(3, 3) + f(3, 4) = 0.5.
\]
Next, we shall determine the probability that the randomly selected household owns exactly one car, namely Pr(X = 1). By summing the probabilities in the first row of the table, we obtain the value
\[
\Pr(X = 1) = \sum_{y=1}^{4} f(1, y) = 0.2.
\]
◀

Continuous Joint Distributions

Example 3.4.4 Demands for Utilities. Consider again the joint distribution of X and Y in Example 3.4.1.
When we first calculated probabilities for these two random variables back in Example 1.5.4 on page 19 (even before we named them or called them random variables), we assumed that the probability of each subset of the sample space was proportional to the area of the subset. Since the area of the sample space is 29,204, the probability that the pair (X, Y) lies in a region C is the area of C divided by 29,204. We can also write this relation as
\[
\Pr((X, Y) \in C) = \iint_C \frac{1}{29{,}204} \, dx \, dy, \tag{3.4.1}
\]
assuming that the integral exists. ◀

If one looks carefully at Eq. (3.4.1), one will notice the similarity to Eqs. (3.2.2) and (3.2.1). We formalize this connection as follows.

Definition 3.4.4 Continuous Joint Distribution/Joint p.d.f./Support. Two random variables X and Y have a continuous joint distribution if there exists a nonnegative function f defined over the entire xy-plane such that for every subset C of the plane,
\[
\Pr[(X, Y) \in C] = \iint_C f(x, y) \, dx \, dy,
\]
if the integral exists. The function f is called the joint probability density function (abbreviated joint p.d.f.) of X and Y. The closure of the set {(x, y) : f(x, y) > 0} is called the support of (the distribution of) (X, Y).

Example 3.4.5 Demands for Utilities. In Example 3.4.4, it is clear from Eq. (3.4.1) that the joint p.d.f. of X and Y is the function
\[
f(x, y) =
\begin{cases}
\dfrac{1}{29{,}204} & \text{for } 4 \le x \le 200 \text{ and } 1 \le y \le 150, \\
0 & \text{otherwise.}
\end{cases}
\tag{3.4.2}
\]
◀

It is clear from Definition 3.4.4 that the joint p.d.f. of two random variables characterizes their joint distribution. The following result is also straightforward.

Theorem 3.4.3 A joint p.d.f. must satisfy the following two conditions:
\[
f(x, y) \ge 0 \quad \text{for } -\infty < x < \infty \text{ and } -\infty < y < \infty,
\]
and
\[
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y) \, dx \, dy = 1.
\]

Any function that satisfies the two displayed formulas in Theorem 3.4.3 is the joint p.d.f. for some probability distribution.

An example of the graph of a joint p.d.f. is presented in Fig. 3.11.
The total volume beneath the surface z = f(x, y) and above the xy-plane must be 1. The probability that the pair (X, Y) will belong to the rectangle C is equal to the volume of the solid figure with base A shown in Fig. 3.11. The top of this solid figure is formed by the surface z = f(x, y).

Figure 3.11 An example of a joint p.d.f.

In Sec. 3.5, we will show that if X and Y have a continuous joint distribution, then X and Y each have a continuous distribution when considered separately. This seems reasonable intuitively. However, the converse of this statement is not true, and the following result helps to show why.

Theorem 3.4.4 For every continuous joint distribution on the xy-plane, the following two statements hold:
i. Every individual point, and every infinite sequence of points, in the xy-plane has probability 0.
ii. Let f be a continuous function of one real variable defined on a (possibly unbounded) interval (a, b). The sets {(x, y) : y = f(x), a < x < b} and {(x, y) : x = f(y), a < y < b} have probability 0.

Proof According to Definition 3.4.4, the probability that a continuous joint distribution assigns to a specified region of the xy-plane can be found by integrating the joint p.d.f. f(x, y) over that region, if the integral exists. If the region is a single point, the integral will be 0. By Axiom 3 of probability, the probability for any countable collection of points must also be 0. The integral of a function of two variables over the graph of a continuous function in the xy-plane is also 0.

Example 3.4.6 Not a Continuous Joint Distribution. It follows from (ii) of Theorem 3.4.4 that the probability that (X, Y) will lie on each specified straight line in the plane is 0. If X has a continuous distribution and if Y = X, then both X and Y have continuous distributions, but the probability is 1 that (X, Y) lies on the straight line y = x. Hence, X and Y cannot have a continuous joint distribution. ◀
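The contradiction in Example 3.4.6 can be spelled out as a short calculation. This is only a sketch: g is a hypothetical name, not used in the text, for a joint p.d.f. that X and Y = X are supposed, for the sake of argument, to have. Writing the double integral as an iterated integral is legitimate here because g is nonnegative.

```latex
\[
\Pr(Y = X)
  = \iint_{\{(x, y)\,:\,y = x\}} g(x, y)\,dx\,dy
  = \int_{-\infty}^{\infty} \left[ \int_{x}^{x} g(x, y)\,dy \right] dx
  = \int_{-\infty}^{\infty} 0\,dx
  = 0 .
\]
```

Since Pr(Y = X) = 1 by construction, no such g can exist, which is the conclusion of the example.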
Example 3.4.7 Calculating a Normalizing Constant. Suppose that the joint p.d.f. of X and Y is specified as follows:
\[
f(x, y) =
\begin{cases}
c x^2 y & \text{for } x^2 \le y \le 1, \\
0 & \text{otherwise.}
\end{cases}
\]
We shall determine the value of the constant c.

The support S of (X, Y) is sketched in Fig. 3.12. Since f(x, y) = 0 outside S, it follows that
\[
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y) \, dx \, dy
  = \iint_S f(x, y) \, dx \, dy
  = \int_{-1}^{1} \int_{x^2}^{1} c x^2 y \, dy \, dx
  = \frac{4}{21} c.
\tag{3.4.3}
\]
Since the value of this integral must be 1, the value of c must be 21/4.

The limits of integration on the last integral in (3.4.3) were determined as follows. We have our choice of whether to integrate x or y as the inner integral, and we chose y. So, we must find, for each x, the interval of y values over which to integrate. From Fig. 3.12, we see that, for each x, y runs from the curve where y = x² to the line where y = 1. The interval of x values for the outer integral is from −1 to 1 according to Fig. 3.12. If we had chosen to integrate x on the inside, then for each y, we see that x runs from −√y to √y, while y runs from 0 to 1.
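Both orders of integration described above can be checked numerically. The sketch below is a rough check rather than a derivation: it approximates each iterated integral of x²y over the support with a simple midpoint rule, so the results agree with 4/21 only up to a small discretization error. The helper names `midpoint`, `kernel`, `y_inside`, and `x_inside` are introduced here for illustration.

```python
import math


def midpoint(g, a, b, n=200):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def kernel(x, y):
    """The joint p.d.f. of Example 3.4.7 without the constant c."""
    return x * x * y


# y on the inside: y runs from x^2 to 1, then x runs from -1 to 1.
y_inside = midpoint(
    lambda x: midpoint(lambda y: kernel(x, y), x * x, 1.0), -1.0, 1.0
)

# x on the inside: x runs from -sqrt(y) to sqrt(y), then y runs from 0 to 1.
x_inside = midpoint(
    lambda y: midpoint(lambda x: kernel(x, y), -math.sqrt(y), math.sqrt(y)),
    0.0,
    1.0,
)

# The normalizing constant is the reciprocal of the integral of the kernel.
c_estimate = 1.0 / y_inside
print(y_inside, x_inside, 4 / 21, c_estimate)
```

The two approximations should both be close to 4/21, and `c_estimate` close to 21/4.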
A resposta final teria sido x runs from —,/y to ,/y, while y runs from 0 to 1. The final answer would have been mesmo. - the same. < Exemplo Calculando Probabilidades a partir de um PDF ConjuntoPara a distribuicdo conjunta no Exemplo 3.4.7, Example Calculating Probabilities from a Joint p.d.f. For the joint distribution in Example 3.4.7, 3.4.8 vamos agora determinar o valor de Pr(X25$). 3.4.8 we shall now determine the value of Pr(X > Y). O subconjuntoSode Sondex2simesta esbocgado na Fig. 3.13. Por isso, The subset So of S where x > y is sketched in Fig. 3.13. Hence, SJ frfe2 3 Leroy 3 Pr.(X2S f(x, y) dx day= — x2vocé morre dx= —. - Prix >Y)= / / SQ, y) dx dy = / / <x? dy dx =—. < % 0 x24 20 So 0 Jx2 4 20 3.4 Distribuicées Bivariadas 123 3.4 Bivariate Distributions 123 Figura 3.120 apoios de( si simex Figure 3.12 The support S$ y y=x? X, ¥Yno Exemplo 3.4.8. of (X, Y) in Example 3.4.8. (-1, 1) (1, 1) (-1, 1) d, 1) -1 1 x —1 1 x Figura 3.130 subconjuntoSo Sj Figure 3.13 The subset Sp y do suporte Sondex2sim no of the support S where x > y Exemplo 3.4.8. im- in Example 3.4.8. y= p (1,1) sim-x Pp (1, 1) y=x SIM-X2. y= x? 4 1 x -1 1 x Exemplo Determinacdo de uma PDF conjunta por métodos geométricos.Suponha que um ponto(X, Yb se- Example Determining a Joint p.d.f. by Geometric Methods. Suppose that a point (X, Y) is se- 3.4.9 selecionado aleatoriamente de dentro do circulox2+sin2s9. Determinaremos o pdf 3.4.9 lected at random from inside the circle x? + y? <9. We shall determine the joint conjunto deXeS. p.d.f. of X and Y. O apoio de(X, Y¥ 0 conjuntoSde pontos dentro e dentro do circuloxe+sinns9. A The support of (X, Y) is the set S of points on and inside the circle x? + y? <9. afirmacao de que o ponto(X, Y)é selecionado aleatoriamente de dentro do circulo é The statement that the point (X, Y) is selected at random from inside the circle is interpretado como significando que a pdf conjunta deXeSé constante ao longoSe é 0 forasS interpreted to mean that the joint p.d-f. 
of $X$ and $Y$ is constant over $S$ and is 0 outside $S$. Thus,

$$f(x, y) = \begin{cases} c & \text{for } (x, y) \in S, \\ 0 & \text{otherwise.} \end{cases}$$

We must have

$$\iint_S f(x, y)\,dx\,dy = c \times (\text{area of } S) = 1.$$

Since the area of the circle $S$ is $9\pi$, the value of the constant $c$ must be $1/(9\pi)$. ◀

Mixed Bivariate Distributions

Example 3.4.10 A Clinical Trial. Consider a clinical trial (such as the one described in Example 2.1.12) in which each patient with depression receives a treatment and is followed to see whether they have a relapse into depression. Let $X$ be the indicator of whether or not the first patient is a "success" (no relapse). That is, $X = 1$ if the patient does not relapse and $X = 0$ if the patient relapses. Also, let $P$ be the proportion of patients who have no relapse among all patients who might receive the treatment. It is clear that $X$ must have a discrete distribution, but it might be sensible to think of $P$ as a continuous random variable taking its value anywhere in the interval $[0, 1]$.
Even though $X$ and $P$ can have neither a joint discrete distribution nor a joint continuous distribution, we can still be interested in the joint distribution of $X$ and $P$. ◀

Prior to Example 3.4.10, we have discussed bivariate distributions that were either discrete or continuous. Occasionally, one must consider a mixed bivariate distribution in which one of the random variables is discrete and the other is continuous. We shall use a function $f(x, y)$ to characterize such a joint distribution in much the same way that we use a joint p.f. to characterize a discrete joint distribution or a joint p.d.f. to characterize a continuous joint distribution.

Definition 3.4.5 Joint p.f./p.d.f. Let $X$ and $Y$ be random variables such that $X$ is discrete and $Y$ is continuous. Suppose that there is a function $f(x, y)$ defined on the xy-plane such that, for every pair $A$ and $B$ of subsets of the real numbers,

$$\Pr(X \in A \text{ and } Y \in B) = \int_B \sum_{x \in A} f(x, y)\,dy, \tag{3.4.4}$$
if the integral exists. Then the function $f$ is called the joint p.f./p.d.f. of $X$ and $Y$.

Clearly, Definition 3.4.5 can be modified in an obvious way if $Y$ is discrete and $X$ is continuous. Every joint p.f./p.d.f. must satisfy two conditions. If $X$ is the discrete random variable with possible values $x_1, x_2, \ldots$ and $Y$ is the continuous random variable, then $f(x, y) \ge 0$ for all $x$, $y$ and

$$\int_{-\infty}^{\infty} \sum_{i=1}^{\infty} f(x_i, y)\,dy = 1. \tag{3.4.5}$$

Because $f$ is nonnegative, the sum and integral in Eqs. (3.4.4) and (3.4.5) can be done in whichever order is more convenient.

Note: Probabilities of More General Sets. For a general set $C$ of pairs of real numbers, we can compute $\Pr((X, Y) \in C)$ using the joint p.f./p.d.f. of $X$ and $Y$. For each $x$, let $C_x = \{y : (x, y) \in C\}$. Then

$$\Pr((X, Y) \in C) = \sum_{\text{All } x} \int_{C_x} f(x, y)\,dy,$$

if all of the integrals exist. Alternatively, for each $y$, define $C^y = \{x : (x, y) \in C\}$, and then

$$\Pr((X, Y) \in C) = \int_{-\infty}^{\infty} \left[ \sum_{x \in C^y} f(x, y) \right] dy,$$

if the integral exists.

Example 3.4.11 A Joint p.f./p.d.f. Suppose that the joint p.f./p.d.f. of $X$ and $Y$ is
$$f(x, y) = \frac{x y^{x-1}}{3}, \quad \text{for } x = 1, 2, 3 \text{ and } 0 < y < 1.$$

We should check to make sure that this function satisfies (3.4.5). It is easier to integrate over the $y$ values first, so we compute

$$\sum_{x=1}^{3} \int_0^1 \frac{x y^{x-1}}{3}\,dy = \sum_{x=1}^{3} \frac{1}{3} = 1.$$

Suppose that we wish to compute the probability that $Y \ge 1/2$ and $X \ge 2$. That is, we want $\Pr(X \in A \text{ and } Y \in B)$ with $A = [2, \infty)$ and $B = [1/2, \infty)$. So, we apply Eq. (3.4.4) to get the probability

$$\sum_{x=2}^{3} \int_{1/2}^{1} \frac{x y^{x-1}}{3}\,dy = \sum_{x=2}^{3} \frac{1 - (1/2)^x}{3} = 0.5417.$$

For illustration, we shall compute the sum and integral in the other order also. For each $y \in [1/2, 1)$, $\sum_{x=2}^{3} f(x, y) = 2y/3 + y^2$. For $y \ge 1$, the sum is 0 because $f(x, y) = 0$ there. So, the probability is

$$\int_{1/2}^{1} \left[ \frac{2}{3} y + y^2 \right] dy = \frac{1}{3}\left[ 1 - \left(\frac{1}{2}\right)^2 \right] + \frac{1}{3}\left[ 1 - \left(\frac{1}{2}\right)^3 \right] = 0.5417.$$ ◀

Example 3.4.12 A Clinical Trial. A possible joint p.f./p.d.f. for $X$ and $P$ in Example 3.4.10 is

$$f(x, p) = p^x (1 - p)^{1-x}, \quad \text{for } x = 0, 1 \text{ and } 0 < p < 1.$$

Here, $X$ is discrete and $P$ is continuous. The function $f$ is nonnegative, and the reader should be able to demonstrate that it satisfies (3.4.5). Suppose that we wish
to compute $\Pr(X \le 0 \text{ and } P \le 1/2)$. This can be computed as

$$\int_0^{1/2} (1 - p)\,dp = -\frac{1}{2}\left[ (1 - 1/2)^2 - (1 - 0)^2 \right] = \frac{3}{8}.$$

Suppose that we also wish to compute $\Pr(X = 1)$. This time, we apply Eq. (3.4.4) with $A = \{1\}$ and $B = (0, 1)$. In this case,

$$\Pr(X = 1) = \int_0^1 p\,dp = \frac{1}{2}.$$ ◀

A more complicated type of joint distribution can also arise in a practical problem.

Example 3.4.13 A Complicated Joint Distribution. Suppose that $X$ and $Y$ are the times at which two specific components in an electronic system fail. There might be a certain probability $p$ ($0 < p < 1$) that the two components will fail at the same time and a certain probability $1 - p$ that they will fail at different times. Furthermore, if they fail at the same time, then their common failure time might be distributed according to a certain p.d.f. $f(x)$; if they fail at different times, then these times might be distributed according to a certain joint p.d.f. $g(x, y)$.
The joint distribution of $X$ and $Y$ in this example is not continuous, because there is positive probability $p$ that $(X, Y)$ will lie on the line $x = y$. Nor does the joint distribution have a joint p.f./p.d.f. or any other simple function to describe it. There are ways to deal with such joint distributions, but we shall not discuss them in this text. ◀

Bivariate Cumulative Distribution Functions

The first calculation in Example 3.4.12, namely, $\Pr(X \le 0 \text{ and } P \le 1/2)$, is a generalization of the calculation of a c.d.f. to a bivariate distribution. We formalize the generalization as follows.

Figure 3.14 The probability of a rectangle.

Definition 3.4.6 Joint (Cumulative) Distribution Function/c.d.f. The joint distribution function or joint cumulative distribution function (joint c.d.f.) of two random variables $X$ and $Y$ is defined as the function $F$ such that for all values of $x$ and $y$ ($-\infty < x < \infty$ and $-\infty < y < \infty$),
$$F(x, y) = \Pr(X \le x \text{ and } Y \le y).$$

It is clear from Definition 3.4.6 that $F(x, y)$ is monotone increasing in $x$ for each fixed $y$ and is monotone increasing in $y$ for each fixed $x$.

If the joint c.d.f. of two arbitrary random variables $X$ and $Y$ is $F$, then the probability that the pair $(X, Y)$ will lie in a specified rectangle in the xy-plane can be found from $F$ as follows: For given numbers $a < b$ and $c < d$,

$$\begin{aligned}
\Pr(a < X \le b \text{ and } c < Y \le d)
&= \Pr(a < X \le b \text{ and } Y \le d) - \Pr(a < X \le b \text{ and } Y \le c) \\
&= [\Pr(X \le b \text{ and } Y \le d) - \Pr(X \le a \text{ and } Y \le d)] \\
&\quad - [\Pr(X \le b \text{ and } Y \le c) - \Pr(X \le a \text{ and } Y \le c)] \\
&= F(b, d) - F(a, d) - F(b, c) + F(a, c).
\end{aligned} \tag{3.4.6}$$

Hence, the probability of the rectangle $C$ sketched in Fig. 3.14 is given by the combination of values of $F$ just derived. It should be noted that two sides of the rectangle are included in the set $C$ and the other two sides are excluded. Thus, if there are points or line segments on the boundary of $C$ that have positive probability, it is important to distinguish between the weak inequalities and the strict inequalities in Eq. (3.4.6).
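Equation (3.4.6) translates directly into code. The following Python sketch evaluates the probability of a rectangle from any joint c.d.f. supplied as a function. The helper names and the choice of a c.d.f. (a point uniform on the unit square) are illustrative assumptions and do not come from the text.

```python
def rectangle_probability(F, a, b, c, d):
    """Pr(a < X <= b and c < Y <= d) from a joint c.d.f. F, as in Eq. (3.4.6)."""
    if not (a < b and c < d):
        raise ValueError("need a < b and c < d")
    return F(b, d) - F(a, d) - F(b, c) + F(a, c)


def clip01(t):
    """Clamp t to the interval [0, 1]."""
    return min(max(t, 0.0), 1.0)


def uniform_square_cdf(x, y):
    # Assumed example: joint c.d.f. of a point uniform on the unit square.
    return clip01(x) * clip01(y)


prob = rectangle_probability(uniform_square_cdf, 0.25, 0.75, 0.5, 1.0)
print(prob)  # area of the rectangle (0.25, 0.75] x (0.5, 1.0], i.e. 0.25
```

For a continuous distribution such as this one, the boundary of the rectangle has probability 0, so the distinction between weak and strict inequalities does not change the result; it would matter for a c.d.f. with point masses on the boundary.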
Theorem 3.4.5 Let $X$ and $Y$ have a joint c.d.f. $F$. The c.d.f. $F_1$ of just the single random variable $X$ can be derived from the joint c.d.f. $F$ as $F_1(x) = \lim_{y \to \infty} F(x, y)$. Similarly, the c.d.f. $F_2$ of $Y$ equals $F_2(y) = \lim_{x \to \infty} F(x, y)$, for $-\infty < y < \infty$.

Proof We prove the claim about $F_1$, as the claim about $F_2$ is similar. Let $-\infty < x < \infty$. Define

$$\begin{aligned}
B_0 &= \{X \le x \text{ and } Y \le 0\}, \\
B_n &= \{X \le x \text{ and } n - 1 < Y \le n\}, \quad \text{for } n = 1, 2, \ldots, \\
A_m &= \bigcup_{n=0}^{m} B_n, \quad \text{for } m = 1, 2, \ldots.
\end{aligned}$$

Then $\{X \le x\} = \bigcup_{n=0}^{\infty} B_n$, and $A_m = \{X \le x \text{ and } Y \le m\}$ for $m = 1, 2, \ldots$. It follows that $\Pr(A_m) = F(x, m)$ for each $m$. Also,

$$\begin{aligned}
F_1(x) = \Pr(X \le x) &= \Pr\left( \bigcup_{n=0}^{\infty} B_n \right) \\
&= \sum_{n=0}^{\infty} \Pr(B_n) = \lim_{m \to \infty} \Pr(A_m) \\
&= \lim_{m \to \infty} F(x, m) = \lim_{y \to \infty} F(x, y),
\end{aligned}$$

where the third equality follows from countable additivity and the fact that the $B_n$ events are disjoint, and the last equality follows from the fact that $F(x, y)$ is monotone increasing in $y$ for each fixed $x$. ∎

Other relationships involving the univariate distribution of $X$, the univariate distribution of $Y$, and their joint bivariate distribution will be presented in the next section.

Finally, if $X$ and $Y$ have a continuous joint distribution with joint p.d.f. $f$, then the joint c.d.f. at $(x, y)$ is

$$F(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f(r, s)\,dr\,ds.$$

Here, the symbols $r$ and $s$ are used simply as dummy variables of integration. The joint p.d.f. can be derived from the joint c.d.f. by using the relations

$$f(x, y) = \frac{\partial^2 F(x, y)}{\partial x\,\partial y} = \frac{\partial^2 F(x, y)}{\partial y\,\partial x}$$

at every point $(x, y)$ at which these second-order derivatives exist.

Example 3.4.14 Determining a Joint p.d.f. from a Joint c.d.f. Suppose that $X$ and $Y$ are random variables that take values only in the intervals $0 \le X \le 2$ and $0 \le Y \le 2$. Suppose also that the joint c.d.f. of $X$ and $Y$, for $0 \le x \le 2$ and $0 \le y \le 2$, is as follows:

$$F(x, y) = \frac{1}{16} x y (x + y). \tag{3.4.7}$$

We shall first determine the c.d.f. $F_1$ of just the random variable $X$ and then determine the joint p.d.f. $f$ of $X$ and $Y$.
The value of $F(x, y)$ at any point $(x, y)$ in the xy-plane that does not represent a pair of possible values of $X$ and $Y$ can be calculated from (3.4.7) and the fact that $F(x, y) = \Pr(X \le x \text{ and } Y \le y)$. Thus, if either $x < 0$ or $y < 0$, then $F(x, y) = 0$. If both $x > 2$ and $y > 2$, then $F(x, y) = 1$. If $0 \le x \le 2$ and $y > 2$, then $F(x, y) = F(x, 2)$, and it follows from Eq. (3.4.7) that

$$F(x, y) = \frac{1}{8} x (x + 2).$$

Similarly, if $0 \le y \le 2$ and $x > 2$, then

$$F(x, y) = \frac{1}{8} y (y + 2).$$

The function $F(x, y)$ has now been specified for every point in the xy-plane.

By letting $y \to \infty$, we find that the c.d.f. of just the random variable $X$ is

$$F_1(x) = \begin{cases} 0 & \text{for } x < 0, \\ \frac{1}{8} x (x + 2) & \text{for } 0 \le x \le 2, \\ 1 & \text{for } x > 2. \end{cases}$$

Furthermore, for $0 < x < 2$ and $0 < y < 2$,

$$\frac{\partial^2 F(x, y)}{\partial x\,\partial y} = \frac{1}{8}(x + y).$$

Also, if $x < 0$, $y < 0$, $x > 2$, or $y > 2$, then

$$\frac{\partial^2 F(x, y)}{\partial x\,\partial y} = 0.$$

Hence, the joint p.d.f. of $X$ and $Y$ is

$$f(x, y) = \begin{cases} \frac{1}{8}(x + y) & \text{for } 0 < x < 2 \text{ and } 0 < y < 2, \\ 0 & \text{otherwise.} \end{cases}$$ ◀

Example 3.4.15 Demands for Utilities. We can compute the joint c.d.f.
for water and electric demand in Example 3.4.4 by using the joint p.d.f. that was given in Eq. (3.4.2). If either $x \le 4$ or $y \le 1$, then $F(x, y) = 0$ because either $X \le x$ or $Y \le y$ would be impossible. Similarly, if both $x \ge 200$ and $y \ge 150$, $F(x, y) = 1$ because both $X \le x$ and $Y \le y$ would be sure events. For other values of $x$ and $y$, we compute

$$F(x, y) = \begin{cases}
\displaystyle \int_4^x \int_1^y \frac{1}{29{,}204}\,dy\,dx = \frac{(x - 4)(y - 1)}{29{,}204} & \text{for } 4 \le x \le 200,\ 1 \le y \le 150, \\[2ex]
\displaystyle \int_4^x \int_1^{150} \frac{1}{29{,}204}\,dy\,dx = \frac{x - 4}{196} & \text{for } 4 \le x \le 200,\ y > 150, \\[2ex]
\displaystyle \int_4^{200} \int_1^y \frac{1}{29{,}204}\,dy\,dx = \frac{y - 1}{149} & \text{for } x > 200,\ 1 \le y \le 150.
\end{cases}$$

The reason that we need three cases in the formula for $F(x, y)$ is that the joint p.d.f. in Eq. (3.4.2) drops to 0 when $x$ crosses above 200 or when $y$ crosses above 150; hence, we never want to integrate $1/29{,}204$ beyond $x = 200$ or beyond $y = 150$. If one takes the limit as $y \to \infty$ of $F(x, y)$ (for fixed $4 \le x \le 200$), one gets the second case in the formula above, which then is the c.d.f. of $X$, $F_1(x)$. Similarly, if one takes the $\lim_{x \to \infty}$
$F(x, y)$ (for fixed $1 \le y \le 150$), one gets the third case in the formula, which then is the c.d.f. of $Y$, $F_2(y)$. ◀

Summary

The joint c.d.f. of two random variables $X$ and $Y$ is $F(x, y) = \Pr(X \le x \text{ and } Y \le y)$. The joint p.d.f. of two continuous random variables is a nonnegative function $f$ such that the probability of the pair $(X, Y)$ being in a set $C$ is the integral of $f(x, y)$ over the set $C$, if the integral exists. The joint p.d.f. is also the second mixed partial derivative of the joint c.d.f. with respect to both variables. The joint p.f. of two discrete random variables is a nonnegative function $f$ such that the probability of the pair $(X, Y)$ being in a set $C$ is the sum of $f(x, y)$ over all points in $C$. A joint p.f. can be strictly positive at countably many pairs $(x, y)$ at most. The joint p.f./p.d.f. of a discrete random variable $X$ and a continuous random variable $Y$ is a nonnegative function $f$ such that the probability of the pair $(X, Y)$ being in a set $C$ is obtained by summing $f(x, y)$ over
all $x$ such that $(x, y) \in C$ for each $y$ and then integrating the resulting function of $y$.

Exercises

1. Suppose that the joint p.d.f. of a pair of random variables $(X, Y)$ is constant on the rectangle where $0 \le x \le 2$ and $0 \le y \le 1$, and suppose that the p.d.f. is 0 off of this rectangle.
a. Find the constant value of the p.d.f. on the rectangle.
b. Find $\Pr(X \ge Y)$.

2. Suppose that in an electric display sign there are three light bulbs in the first row and four light bulbs in the second row. Let $X$ denote the number of bulbs in the first row that will be burned out at a specified time $t$, and let $Y$ denote the number of bulbs in the second row that will be burned out at the same time $t$. Suppose that the joint p.f. of $X$ and $Y$ is as specified in the following table:

                Y
X       0      1      2      3      4
0     0.08   0.07   0.06   0.01   0.01
1     0.06   0.10   0.12   0.05   0.02
2     0.05   0.06   0.09   0.04   0.03
3     0.02   0.03   0.03   0.03   0.04

Determine each of the following probabilities:
a. $\Pr(X = 2)$
b. $\Pr(Y \ge 2)$
c. $\Pr(X \le 2 \text{ and } Y \le 2)$
d. $\Pr(X = Y)$
e. $\Pr(X > Y)$

3. Suppose that $X$ and $Y$ have a discrete joint distribution for which the joint p.f. is defined as follows:

$$f(x, y) = \begin{cases} c\,|x + y| & \text{for } x = -2, -1, 0, 1, 2 \text{ and } y = -2, -1, 0, 1, 2, \\ 0 & \text{otherwise.} \end{cases}$$

Determine (a) the value of the constant $c$; (b) $\Pr(X = 0 \text{ and } Y = -2)$; (c) $\Pr(X = 1)$; (d) $\Pr(|X - Y| \le 1)$.

4. Suppose that $X$ and $Y$ have a continuous joint distribution for which the joint p.d.f. is defined as follows:

$$f(x, y) = \begin{cases} c y^2 & \text{for } 0 \le x \le 2 \text{ and } 0 \le y \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

Determine (a) the value of the constant $c$; (b) $\Pr(X + Y > 2)$; (c) $\Pr(Y < 1/2)$; (d) $\Pr(X \le 1)$; (e) $\Pr(X = 3Y)$.

5. Suppose that the joint p.d.f. of two random variables $X$ and $Y$ is as follows:

$$f(x, y) = \begin{cases} c\,(x^2 + y) & \text{for } 0 \le y \le 1 - x^2, \\ 0 & \text{otherwise.} \end{cases}$$

Determine (a) the value of the constant $c$; (b) $\Pr(0 \le X \le 1/2)$; (c) $\Pr(Y \le X + 1)$; (d) $\Pr(Y = X^2)$.

6. Suppose that a point $(X, Y)$ is chosen at random from the region $S$ in the xy-plane containing all points $(x, y)$ such that $x \ge 0$, $y \ge 0$, and $4y + x \le 4$.
a. Determine the joint p.d.f. of $X$ and $Y$.
b. Suppose that $S_0$ is a subset of the region $S$ having area $\alpha$ and determine $\Pr[(X, Y) \in S_0]$.

7. Suppose that a point $(X, Y)$ is to be chosen from the square $S$ in the xy-plane containing all points $(x, y)$ such that $0 \le x \le 1$ and $0 \le y \le 1$. Suppose that the probability that the chosen point will be the corner $(0, 0)$ is 0.1, the probability that it will be the corner $(1, 0)$ is 0.2, the probability that it will be the corner $(0, 1)$ is 0.4, and the probability that it will be the corner $(1, 1)$ is 0.1. Suppose also that if the chosen point is not one of the four corners of the square, then it will be an interior point of the square and will be chosen according to a constant p.d.f. over the interior of the square. Determine (a) $\Pr(X \le 1/4)$ and (b) $\Pr(X + Y \le 1)$.

8. Suppose that $X$ and $Y$ are random variables such that $(X, Y)$ must belong to the rectangle in the xy-plane containing all points $(x, y)$ for which $0 \le x \le 3$ and $0 \le y \le 4$. Suppose also that the joint c.d.f. of $X$ and $Y$ at every point $(x, y)$ in this rectangle is specified as follows:

$$F(x, y) = \frac{1}{156} x y (x^2 + y).$$

Determine (a) $\Pr(1 \le X \le 2 \text{ and } 1 \le Y \le 2)$; (b) $\Pr(2 \le X \le 4 \text{ and } 2 \le Y \le 4)$; (c) the c.d.f. of $Y$; (d) the joint p.d.f. of $X$ and $Y$; (e) $\Pr(Y \le X)$.

9. In Example 3.4.5, compute the probability that water demand $X$ is greater than electric demand $Y$.

10. Let $Y$ be the rate (calls per hour) at which calls arrive at a switchboard. Let $X$ be the number of calls during a two-hour period. A popular choice of joint p.f./p.d.f. for $(X, Y)$ in this example would be one like

$$f(x, y) = \begin{cases} \dfrac{(2y)^x}{x!} e^{-3y} & \text{if } y > 0 \text{ and } x = 0, 1, \ldots, \\ 0 & \text{otherwise.} \end{cases}$$

a. Verify that $f$ is a joint p.f./p.d.f. Hint: First, sum over the $x$ values using the well-known formula for the power series expansion of $e^{2y}$.
b. Find $\Pr(X = 0)$.

11. Consider the clinical trial of depression drugs in Example 2.1.4. Suppose that a patient is selected at random
from the 150 patients in that study and we record $Y$, an indicator of the treatment group for that patient, and $X$, an indicator of whether or not the patient relapsed. Table 3.3 contains the joint p.f. of $X$ and $Y$.
a. Calculate the probability that a patient selected at random from this study used Lithium (either alone or in combination with Imipramine) and did not relapse.
b. Calculate the probability that the patient had a relapse (without regard to the treatment group).

Table 3.3 Proportions in clinical depression study for Exercise 11

                                  Treatment group (Y)
Response (X)      Imipramine (1)   Lithium (2)   Combination (3)   Placebo (4)
Relapse (0)           0.120           0.087           0.146           0.160
No relapse (1)        0.147           0.166           0.107           0.067

3.5 Marginal Distributions

Earlier in this chapter, we introduced distributions for random variables, and in Sec. 3.4 we discussed a generalization to joint distributions of two random variables simultaneously. Often, we start with a joint distribution of two random variables and we then want to find the distribution of just one of them. The distribution of one random variable $X$ computed from a joint distribution is also called the marginal distribution of $X$. Each random variable will have a marginal c.d.f. as well as a marginal p.d.f. or p.f. We also introduce the concept of independent random variables, which is a natural generalization of independent events.

Deriving a Marginal p.f. or a Marginal p.d.f.

We have seen in Theorem 3.4.5 that if the joint c.d.f. $F$ of two random variables $X$ and $Y$ is known, then the c.d.f. $F_1$ of the random variable $X$ can be derived from $F$. We saw an example of this derivation in Example 3.4.15. If $X$ has a continuous distribution, we can also derive the p.d.f. of $X$ from the joint distribution.

Example 3.5.1 Demands for Utilities.
Look carefully at the formula for F(x, y) in Example 3.4.15, specifically the last two branches that we identified as $F_1(x)$ and $F_2(y)$, the c.d.f.'s of the two individual random variables X and Y. It is apparent from those two formulas and Theorem 3.3.5 that the p.d.f. of X alone is

$$f_1(x) = \begin{cases} \dfrac{1}{196} & \text{for } 4 \le x \le 200, \\ 0 & \text{otherwise,} \end{cases}$$

which matches what we already found in Example 3.2.1. Similarly, the p.d.f. of Y alone is

$$f_2(y) = \begin{cases} \dfrac{1}{149} & \text{for } 1 \le y \le 150, \\ 0 & \text{otherwise.} \end{cases}$$ ◀

The ideas employed in Example 3.5.1 lead to the following definition.
Figure 3.15 Computing $f_1(x)$ from the joint p.f.

Definition 3.5.1 Marginal c.d.f./p.f./p.d.f. Suppose that X and Y have a joint distribution. The c.d.f. of X derived by Theorem 3.4.5 is called the marginal c.d.f. of X. Similarly, the p.f. or p.d.f. of X associated with the marginal c.d.f. of X is called the marginal p.f. or marginal p.d.f. of X.

To obtain a specific formula for the marginal p.f. or marginal p.d.f., we start with a discrete joint distribution.
Theorem 3.5.1 If X and Y have a discrete joint distribution for which the joint p.f. is f, then the marginal p.f. $f_1$ of X is

$$f_1(x) = \sum_{\text{All } y} f(x, y). \tag{3.5.1}$$

Similarly, the marginal p.f. $f_2$ of Y is $f_2(y) = \sum_{\text{All } x} f(x, y)$.

Proof We prove the result for $f_1$, as the proof for $f_2$ is similar. We illustrate the proof in Fig. 3.15. In that figure, the set of points in the dashed box is the set of pairs with first coordinate x. The event {X = x} can be expressed as the union of the events represented by the pairs in the dashed box, namely, $B_y$ = {X = x and Y = y} for all possible y. The $B_y$ events are disjoint and $\Pr(B_y) = f(x, y)$. Since $\Pr(X = x) = \sum_{\text{All } y} \Pr(B_y)$, Eq. (3.5.1) holds.

Example 3.5.2 Deriving a Marginal p.f. from a Table of Probabilities. Suppose that X and Y are the random variables in Example 3.4.3 on page 119. These are respectively the numbers of cars and televisions owned by a randomly selected household in a certain suburban
area. Table 3.2 on page 119 gives their joint p.f., and we repeat that table in Table 3.4 together with row and column totals added to the margins.

The marginal p.f. $f_1$ of X can be read from the row totals of Table 3.4. The numbers were obtained by summing the values in each row of this table from the four columns in the central part of the table (those labeled y = 1, 2, 3, 4). In this way, it is found that $f_1(1) = 0.2$, $f_1(2) = 0.6$, $f_1(3) = 0.2$, and $f_1(x) = 0$ for all other values of x. This marginal p.f. gives the probabilities that a randomly selected household owns 1, 2, or 3 cars. Similarly, the marginal p.f. $f_2$ of Y, the probabilities that a household owns 1, 2, 3, or 4 televisions, can be read from the column totals. These numbers were obtained by adding the numbers in each of the columns from the three rows in the central part of the table (those labeled x = 1, 2, 3). ◀

Table 3.4 Joint p.f. f(x, y) with marginal p.f.'s for Example 3.5.2

                      y
x          1      2      3      4     Total
1         0.1     0     0.1     0      0.2
2         0.3     0     0.1    0.2     0.6
3          0     0.2     0      0      0.2
Total     0.4    0.2    0.2    0.2     1.0

The name marginal distribution derives from the fact that the marginal distributions are the totals that appear in the margins of tables like Table 3.4.

If X and Y have a continuous joint distribution for which the joint p.d.f. is f, then the marginal p.d.f. $f_1$ of X is again determined in the manner shown in Eq. (3.5.1), but the sum over all possible values of Y is now replaced by the integral over all possible values of Y.

Theorem 3.5.2 If X and Y have a continuous joint distribution with joint p.d.f. f, then the marginal p.d.f. $f_1$ of X is

$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \quad \text{for } -\infty < x < \infty. \tag{3.5.2}$$

Similarly, the marginal p.d.f. $f_2$ of Y is

$$f_2(y) = \int_{-\infty}^{\infty} f(x, y)\,dx \quad \text{for } -\infty < y < \infty. \tag{3.5.3}$$

Proof We prove (3.5.2), as the proof of (3.5.3) is similar. For each x, Pr(X ≤ x) can be written as Pr((X, Y) ∈ C), where C = {(r, s) : r ≤ x}. We can compute this probability directly from the joint p.d.f.
of X and Y as

$$\Pr((X, Y) \in C) = \int_{-\infty}^{x} \int_{-\infty}^{\infty} f(r, s)\,ds\,dr = \int_{-\infty}^{x} \left[ \int_{-\infty}^{\infty} f(r, s)\,ds \right] dr. \tag{3.5.4}$$

The inner integral in the last expression of Eq. (3.5.4) is a function of r and it can easily be recognized as $f_1(r)$, where $f_1$ is defined in Eq. (3.5.2). It follows that $\Pr(X \le x) = \int_{-\infty}^{x} f_1(r)\,dr$, so $f_1$ is the marginal p.d.f. of X.

Example 3.5.3 Deriving a Marginal p.d.f. Suppose that the joint p.d.f. of X and Y is as specified in Example 3.4.8, namely,

$$f(x, y) = \begin{cases} \dfrac{21}{4}\, x^2 y & \text{for } x^2 \le y \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

The set S of points (x, y) for which f(x, y) > 0 is sketched in Fig. 3.16. We shall determine first the marginal p.d.f. $f_1$ of X and then the marginal p.d.f. $f_2$ of Y.

It can be seen from Fig. 3.16 that X cannot take any value outside the interval [-1, 1]. Therefore, $f_1(x) = 0$ for x < -1 or x > 1. Furthermore, for -1 ≤ x ≤ 1, it is seen from Fig. 3.16 that f(x, y) = 0 unless $x^2 \le y \le 1$. Therefore, for -1 ≤ x ≤ 1,

$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{x^2}^{1} \frac{21}{4}\, x^2 y\,dy = \frac{21}{8}\, x^2 (1 - x^4).$$

Figure 3.16 The set S where f(x, y) > 0 in Example 3.5.3.

Figure 3.17 The marginal p.d.f. of X in Example 3.5.3.

Figure 3.18 The marginal p.d.f. of Y in Example 3.5.3.

This marginal p.d.f. of X is sketched in Fig. 3.17.

Next, it can be seen from Fig. 3.16 that Y cannot take any value outside the interval [0, 1]. Therefore, $f_2(y) = 0$ for y < 0 or y > 1. Furthermore, for 0 ≤ y ≤ 1, it is seen from Fig. 3.12 that f(x, y) = 0 unless $-\sqrt{y} \le x \le \sqrt{y}$. Therefore, for 0 ≤ y ≤ 1,

$$f_2(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \int_{-\sqrt{y}}^{\sqrt{y}} \frac{21}{4}\, x^2 y\,dx = \frac{7}{2}\, y^{5/2}.$$

This marginal p.d.f. of Y is sketched in Fig. 3.18. ◀

If X has a discrete distribution and Y has a continuous distribution, we can derive the marginal p.f. of X and the marginal p.d.f. of Y from the joint p.f./p.d.f. in the same ways that we derived a marginal p.f. or a marginal p.d.f. from a joint p.f. or a joint p.d.f.
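As a numerical cross-check of the two marginal p.d.f.'s derived in Example 3.5.3, the sketch below integrates the joint p.d.f. with a simple midpoint rule and compares the result with the closed forms. It is only an illustration: the helper `integrate` and the other function names are hypothetical and not from the text, and the midpoint rule is only approximate near the boundary of the support.

```python
# Illustrative check of Example 3.5.3 (names are assumptions, not from the text).

def joint_pdf(x, y):
    """Joint p.d.f. of Example 3.5.3: (21/4) x^2 y on x^2 <= y <= 1, else 0."""
    return 21.0 / 4.0 * x * x * y if x * x <= y <= 1.0 else 0.0

def integrate(g, a, b, n=20000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def marginal_x(x):
    # Integrate y out over [0, 1], which contains every y with f(x, y) > 0.
    return integrate(lambda y: joint_pdf(x, y), 0.0, 1.0)

def marginal_y(y):
    # Integrate x out over [-1, 1], which contains every x with f(x, y) > 0.
    return integrate(lambda x: joint_pdf(x, y), -1.0, 1.0)

def f1_closed(x):
    """Closed form from the example: (21/8) x^2 (1 - x^4) on [-1, 1]."""
    return 21.0 / 8.0 * x**2 * (1.0 - x**4) if -1.0 <= x <= 1.0 else 0.0

def f2_closed(y):
    """Closed form from the example: (7/2) y^(5/2) on [0, 1]."""
    return 7.0 / 2.0 * y**2.5 if 0.0 <= y <= 1.0 else 0.0

for x in (-0.8, 0.3, 0.9):
    print("x =", x, "numeric:", marginal_x(x), "closed form:", f1_closed(x))
for y in (0.1, 0.5, 0.9):
    print("y =", y, "numeric:", marginal_y(y), "closed form:", f2_closed(y))
```

The two columns printed for each point should agree to a few decimal places, and integrating either closed form over its interval should give a value close to 1.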
The following result can be proven by combining the techniques used in the proofs of Theorems 3.5.1 and 3.5.2.

Theorem 3.5.3 Let f be the joint p.f./p.d.f. of X and Y, with X discrete and Y continuous. Then the marginal p.f. of X is

$$f_1(x) = \Pr(X = x) = \int_{-\infty}^{\infty} f(x, y)\,dy, \quad \text{for all } x,$$

and the marginal p.d.f. of Y is

$$f_2(y) = \sum_{x} f(x, y), \quad \text{for } -\infty < y < \infty.$$

Example 3.5.4 Determining a Marginal p.f. and Marginal p.d.f. from a Joint p.f./p.d.f. Suppose that the joint p.f./p.d.f. of X and Y is as in Example 3.4.11 on page 124. The marginal p.f. of X is obtained by integrating

$$f_1(x) = \int_{0}^{1} \frac{x\,y^{x-1}}{3}\,dy = \frac{1}{3},$$

for x = 1, 2, 3. The marginal p.d.f. of Y is obtained by summing

$$f_2(y) = \frac{1}{3} + \frac{2y}{3} + y^2, \quad \text{for } 0 < y < 1.$$ ◀

Although the marginal distributions of X and Y can be derived from their joint distribution, it is not possible to reconstruct the joint distribution of X and Y from their marginal distributions without additional information. For instance, the marginal p.d.f.'s sketched in Figs.
3.17 and 3.18 reveal no information about the relationship between X and Y. In fact, by definition, the marginal distribution of X specifies probabilities for X without regard for the values of any other random variables. This property of a marginal p.d.f. can be further illustrated by another example.

Example 3.5.5 Marginal and Joint Distributions. Suppose that a penny and a nickel are each tossed n times so that every pair of sequences of tosses (n tosses in each sequence) is equally likely to occur. Consider the following two definitions of X and Y: (i) X is the number of heads obtained with the penny, and Y is the number of heads obtained with the nickel. (ii) Both X and Y are the number of heads obtained with the penny, so the random variables X and Y are actually identical.

In case (i), the marginal distribution of X and the marginal distribution of Y will be identical binomial distributions. The same pair of marginal distributions of X and Y will also be obtained in case (ii).
However, the joint distribution of X and Y will not be the same in the two cases. In case (i), X and Y can take different values. Their joint p.f. is

$$f(x, y) = \begin{cases} \dbinom{n}{x} \dbinom{n}{y} \left(\dfrac{1}{2}\right)^{2n} & \text{for } x = 0, 1, \ldots, n,\ y = 0, 1, \ldots, n, \\ 0 & \text{otherwise.} \end{cases}$$

In case (ii), X and Y must take the same value, and their joint p.f. is

$$f(x, y) = \begin{cases} \dbinom{n}{x} \left(\dfrac{1}{2}\right)^{n} & \text{for } x = y = 0, 1, \ldots, n, \\ 0 & \text{otherwise.} \end{cases}$$ ◀

Independent Random Variables

Example 3.5.6 Demands for Utilities. In Examples 3.4.15 and 3.5.1, we found that the marginal c.d.f.'s of water and electric demand were, respectively,

$$F_1(x) = \begin{cases} 0 & \text{for } x < 4, \\ \dfrac{x - 4}{196} & \text{for } 4 \le x \le 200, \\ 1 & \text{for } x > 200, \end{cases} \qquad F_2(y) = \begin{cases} 0 & \text{for } y < 1, \\ \dfrac{y - 1}{149} & \text{for } 1 \le y \le 150, \\ 1 & \text{for } y > 150. \end{cases}$$

The product of these two functions is precisely the same as the joint c.d.f. of X and Y given in Example 3.4.15. One consequence of this fact is that, for every x and y, Pr(X ≤ x and Y ≤ y) = Pr(X ≤ x) Pr(Y ≤ y). This equation makes X and Y an example of the next definition. ◀

Definition 3.5.2 Independent Random Variables.
It is said that two random variables X and Y are independent if, for every two sets A and B of real numbers such that {X ∈ A} and {Y ∈ B} are events,

$$\Pr(X \in A \text{ and } Y \in B) = \Pr(X \in A)\,\Pr(Y \in B). \tag{3.5.5}$$

In other words, let E be any event the occurrence or nonoccurrence of which depends only on the value of X (such as E = {X ∈ A}), and let D be any event the occurrence or nonoccurrence of which depends only on the value of Y (such as D = {Y ∈ B}). Then X and Y are independent random variables if and only if E and D are independent events for all such events E and D.

If X and Y are independent, then for all real numbers x and y, it must be true that

$$\Pr(X \le x \text{ and } Y \le y) = \Pr(X \le x)\,\Pr(Y \le y). \tag{3.5.6}$$

Moreover, since all probabilities for X and Y of the type appearing in Eq. (3.5.5) can be derived from probabilities of the type appearing in Eq. (3.5.6), it can be shown that if Eq. (3.5.6) is satisfied for all values of x and y, then X and Y must be independent.
The proof of this statement is beyond the scope of this book and is omitted, but we summarize it as the following theorem.

Theorem 3.5.4 Let the joint c.d.f. of X and Y be F, let the marginal c.d.f. of X be $F_1$, and let the marginal c.d.f. of Y be $F_2$. Then X and Y are independent if and only if, for all real numbers x and y, $F(x, y) = F_1(x) F_2(y)$.

For example, the demands for water and electricity in Example 3.5.6 are independent. If one returns to Example 3.5.1, one also sees that the product of the marginal p.d.f.'s of water and electric demand equals their joint p.d.f. given in Eq. (3.4.2). This relation is characteristic of independent random variables whether discrete or continuous.

Theorem 3.5.5 Suppose that X and Y are random variables that have a joint p.f., p.d.f., or p.f./p.d.f. f. Then X and Y will be independent if and only if f can be represented in the following form for -∞ < x < ∞ and -∞ < y < ∞:

$$f(x, y) = h_1(x)\,h_2(y), \tag{3.5.7}$$

where $h_1$ is a nonnegative function of x alone and $h_2$ is a nonnegative function of y alone.
Proof We shall give the proof only for the case in which X is discrete and Y is continuous. The other cases are similar. For the "if" part, assume that Eq. (3.5.7) holds. Write

$$f_1(x) = \int_{-\infty}^{\infty} h_1(x)\,h_2(y)\,dy = c_1 h_1(x),$$

where $c_1 = \int_{-\infty}^{\infty} h_2(y)\,dy$ must be finite and strictly positive, otherwise $f_1$ wouldn't be a p.f. So, $h_1(x) = f_1(x)/c_1$. Similarly,

$$f_2(y) = \sum_{x} h_1(x)\,h_2(y) = h_2(y) \sum_{x} \frac{1}{c_1} f_1(x) = \frac{1}{c_1} h_2(y).$$

So, $h_2(y) = c_1 f_2(y)$. Since $f(x, y) = h_1(x) h_2(y)$, it follows that

$$f(x, y) = \frac{f_1(x)}{c_1}\, c_1 f_2(y) = f_1(x)\,f_2(y). \tag{3.5.8}$$

Now let A and B be sets of real numbers. Assuming the integrals exist, we can write

$$\Pr(X \in A \text{ and } Y \in B) = \sum_{x \in A} \int_{B} f(x, y)\,dy = \int_{B} \sum_{x \in A} f_1(x)\,f_2(y)\,dy = \sum_{x \in A} f_1(x) \int_{B} f_2(y)\,dy,$$

where the first equality is from Definition 3.4.5, the second is from Eq. (3.5.8), and the third is straightforward rearrangement. We now see that X and Y are independent according to Definition 3.5.2.
For the "only if" part, assume that X and Y are independent. Let A and B be sets of real numbers. Let $f_1$ be the marginal p.f. of X, and let $f_2$ be the marginal p.d.f. of Y. Then

$$\Pr(X \in A \text{ and } Y \in B) = \sum_{x \in A} f_1(x) \int_{B} f_2(y)\,dy = \int_{B} \sum_{x \in A} f_1(x)\,f_2(y)\,dy$$

(if the integral exists), where the first equality follows from Definition 3.5.2 and the second is a straightforward rearrangement. We now see that $f_1(x) f_2(y)$ satisfies the conditions needed to be f(x, y) as stated in Definition 3.4.5.

A simple corollary follows from Theorem 3.5.5.

Corollary 3.5.1 Two random variables X and Y are independent if and only if the following factorization is satisfied for all real numbers x and y:

$$f(x, y) = f_1(x)\,f_2(y). \tag{3.5.9}$$

As stated in Sec. 3.2 (see page 102), in a continuous distribution the values of a p.d.f. can be changed arbitrarily at any countable set of points.
Therefore, for such a distribution it would be more precise to state that the random variables X and Y are independent if and only if it is possible to choose versions of f, $f_1$, and $f_2$ such that Eq. (3.5.9) is satisfied for -∞ < x < ∞ and -∞ < y < ∞.

The Meaning of Independence We have given a mathematical definition of independent random variables in Definition 3.5.2, but we have not yet given any interpretation of the concept of independent random variables. Because of the close connection between independent events and independent random variables, the interpretation of independent random variables should be closely related to the interpretation of independent events. We model two events as independent if learning that one of them occurs does not change the probability that the other one occurs. It is easiest to extend this idea to discrete random variables. Suppose that X and Y have a discrete joint distribution. If, for each y, learning that Y = y does not change any of the probabilities of the events {X = x}, we would like to say that X and Y are independent. From Corollary 3.5.1 and the definition of marginal p.f., we see that indeed X and Y are independent if and only if, for each y and x such that Pr(Y = y) > 0, Pr(X = x | Y = y) = Pr(X = x), that is, learning the value of Y doesn't change any of the probabilities associated with X. When we formally define conditional distributions in Sec. 3.6, we shall see that this interpretation of independent discrete random variables extends to all bivariate distributions. In summary, if we are trying to decide whether or not to model two random variables X and Y as independent, we should think about whether we would change the distribution of X after we learned the value of Y or vice versa.

Example 3.5.7 Games of Chance. A carnival game consists of rolling a fair die, tossing a fair coin two times, and recording both outcomes. Let Y stand for the number on the die, and let X stand for the number of heads in the two tosses. It seems reasonable to believe that all of the events determined by the roll of the die are independent of all of the events determined by the flips of the coin. Hence, we can assume that X and Y are independent random variables. The marginal distribution of Y is the uniform distribution on the integers 1, ..., 6, while the distribution of X is the binomial distribution with parameters 2 and 1/2. The marginal p.f.'s and the joint p.f. of X and Y are given in Table 3.5, where the joint p.f. was constructed using Eq. (3.5.9). The Total column gives the marginal p.f. $f_1$ of X, and the Total row gives the marginal p.f. $f_2$ of Y. ◀

Table 3.5 Joint p.f. f(x, y) for Example 3.5.7

                            y
x          1      2      3      4      5      6     Total
0         1/24   1/24   1/24   1/24   1/24   1/24    1/4
1         1/12   1/12   1/12   1/12   1/12   1/12    1/2
2         1/24   1/24   1/24   1/24   1/24   1/24    1/4
Total     1/6    1/6    1/6    1/6    1/6    1/6    1.000
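The construction of Table 3.5 can be sketched in a few lines of Python. This is only an illustration under the independence assumption made in Example 3.5.7: the joint p.f. is formed as the product of the two marginals, as in Eq. (3.5.9), and the row and column totals are then checked against those marginals. The variable names are hypothetical, and exact fractions are used so that the entries match the table.

```python
# Illustrative sketch for Example 3.5.7 (variable names are assumptions).
from fractions import Fraction
from math import comb

# Marginal p.f. of X: binomial with parameters 2 and 1/2 (heads in two tosses).
f1 = {x: Fraction(comb(2, x), 4) for x in range(3)}
# Marginal p.f. of Y: uniform on the integers 1, ..., 6 (fair die).
f2 = {y: Fraction(1, 6) for y in range(1, 7)}

# Under independence, the joint p.f. is the product of the marginal p.f.'s.
joint = {(x, y): f1[x] * f2[y] for x in f1 for y in f2}

# Summing the joint p.f. over the other variable recovers each marginal.
row_totals = {x: sum(joint[x, y] for y in f2) for x in f1}
col_totals = {y: sum(joint[x, y] for x in f1) for y in f2}

for x in f1:
    print(x, [str(joint[x, y]) for y in f2], str(row_totals[x]))
print("Total", [str(col_totals[y]) for y in f2], str(sum(joint.values())))
```

Each printed row corresponds to one row of Table 3.5. Going the other way does not work in general: as Example 3.5.5 shows, the marginals alone determine the joint p.f. only when independence is assumed.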
Example 3.5.8 Determining Whether Random Variables Are Independent in a Clinical Trial. Return to the clinical trial of depression drugs in Exercise 11 of Sec. 3.4 (on page 129). In that trial, a patient is selected at random from the 150 patients in the study and we record Y, an indicator of the treatment group for that patient, and X, an indicator of whether or not the patient relapsed. Table 3.6 repeats the joint p.f. of X and Y along with the marginal distributions in the margins. We shall determine whether or not X and Y are independent.

In Eq. (3.5.9), f(x, y) is the probability in the xth row and the yth column of the table, $f_1(x)$ is the number in the Total column in the xth row, and $f_2(y)$ is the number in the Total row in the yth column. It is seen in the table that f(1, 2) = 0.087, while $f_1(1) = 0.513$ and $f_2(2) = 0.253$. Hence, $f(1, 2) \ne f_1(1) f_2(2) \approx 0.130$. It follows that X and Y are not independent. ◀

It should be noted from Examples 3.5.7 and 3.5.8 that X and Y will be independent if and only if the rows of the table specifying their joint p.f. are proportional to
Em resumo, se estamos tentando decidir se devemos ou não modelar duas variáveis aleatóriasXeScomo independentes, deveríamos pensar se mudaríamos a distribuição deXdepois que aprendemos o valor deSou vice-versa. Exemplo 3.5.7 Jogos de azar.Um jogo de carnaval consiste em lançar um dado justo, lançando uma moeda honesta duas vezes e registrando ambos os resultados. DeixarSrepresente o número no dado e deixeXrepresentam o número de caras nos dois lançamentos. Parece razoável acreditar que todos os eventos determinados pelo lançamento do dado são independentes de todos os eventos determinados pelo lançamento da moeda. Portanto, podemos assumir queXeS são variáveis aleatórias independentes. A distribuição marginal deSé a distribuição uniforme nos inteiros 1, . . . ,6, enquanto a distribuição deXé a distribuição binomial com parâmetros 2 e 1/2. Os FP's marginais e o PF conjunto deX eSsão apresentados na Tabela 3.5, onde o PF da junta foi construído usando a Eq. (3.5.9). A coluna Total fornece o PF marginalf1deX, e a linha Total fornece o PF marginalf2deS. - Exemplo 3.5.8 Determinar se as variáveis aleatórias são independentes em um ensaio clínico.Voltou para o ensaio clínico de medicamentos para depressão no Exercício 11 da Seção. 3.4 (na página 129). Nesse ensaio, um paciente é selecionado aleatoriamente entre os 150 pacientes do estudo e registramos S,um indicador do grupo de tratamento para esse paciente, eX, um indicador de se o paciente teve ou não uma recaída. A Tabela 3.6 repete o FP conjunto deXeSjuntamente com as distribuições marginais nas margens. Determinaremos se ou nãoXeS são independentes. Na equação (3.5.9),f (x, y)é a probabilidade noxa linha e asimª coluna da tabela,f1(x)é o número na coluna Total noxª linha, ef2(s)é o número na linha Total nosimª coluna. Vê-se na tabela quef (1,2)=0.087, enquanto f1(1)=0.513, ef2(1)=0. 253. Portanto,f (1,2)=f1(1)f2(1)=0.129. Daqui resulta que XeSnão são independentes. 
- Deve-se notar nos Exemplos 3.5.7 e 3.5.8 queXeSserá independente se e somente se as linhas da tabela que especificam seu FP conjunto forem proporcionais a 138 Capitulo 3 Varidveis Aleatdrias e Distribuicgées 138 Chapter 3 Random Variables and Distributions Tabela 3.6Proporcgées marginais no Exemplo 3.5.8 Table 3.6 Proportions marginals in Example 3.5.8 Grupo de tratamento (5) Treatment group (Y) Resposta (X) Imipramina (1) Litio (2) Combinagdo (3) Placebo (4) ‘Total Response (X) Imipramine (1) Lithium (2) Combination (3) Placebo (4) Total Recaida (0) 0,120 0,087 0,146 0,160 0,513 Relapse (0) 0.120 0.087 0.146 0.160 0.513 Sem recaida (1) 0,147 0,166 0,107 0,067 0,487 No relapse (1) 0.147 0.166 0.107 0.067 0.487 entre si, ou equivalentemente, se e somente se as colunas da tabela forem one another, or equivalently, if and only if the columns of the table are proportional proporcionais entre si. to one another. Exemplo C4lculo de uma probabilidade envolvendo varidveis aleatérias independentes.Suponha que dois Example Calculating a Probability Involving Independent Random Variables. Suppose that two 3.5.9 MedidasXe5sdo feitos das chuvas em um determinado local em 1° de maio em dois anos 3.5.9 measurements X and Y are made of the rainfall at a certain location on May 1 in two consecutivos. Pode ser razoavel, dado 0 conhecimento do histérico das chuvas em 1° de consecutive years. It might be reasonable, given knowledge of the history of rainfall maio, tratar as varidveis aleatériasXeScomo independente. Suponha que o on May 1, to treat the random variables X and Y as independent. Suppose that the pdfgde cada medicdo é o seguinte: p.d.f. g of each measurement is as follows: { 2x para OSxs1, 2x for0O<x <1, OXF P g(x) = | =" = 0 de outra forma. 0 otherwise. Vamos determinar 0 valor de Pr(X+Ss1). We shall determine the value of Pr(X + Y <1). DesdeXeSsdo independentes e cada um tem 0 pdfg, segue da Eq. (3.5.9) que Since X and Y are independent and each has the p.d_f. 
g, it follows from Eq. (3.5.9) para todos os valores dexesimo pdf conjuntof (x, ydeXe Ssera especificado por that for all values of x and y the joint p.d-f. f(x, y) of X and Y will be specified by a relagdof(x, yFg(x)g(y). Por isso, the relation f(x, y) = g(x)g(y). Hence, { 4x ara OS x<1 e OSsims1, caso 4xy forO<x<land0<y <1, Fi,yp paral™ fon =| » == =e 0 contrario. 0 otherwise. O conjuntoSnoxy-avido, ondef (x, y) >0, e o subconjuntoSo, ondex+sims1, estdo The set S in the xy-plane, where f(x, y) > 0, and the subset So, where x + y <1, are esbocados na Fig. Por isso, sketched in Fig. 3.19. Thus, SJ fifi-x 1 1 pl-x 1 Pr.(X+ S81 F(x, y) dx dy= Axy dydx= -. poxsyens= [ [rey dx ay= | / 4xydydx=-. So 0 Oo 6 So 0 Jo 6 Como nota final, se as duas medicgéesXeSforam feitas no mesmo dia em locais préximos, As a final note, if the two measurements X and Y had been made on the same day at entdo pode nao fazer tanto sentido trata-las como independentes, uma vez que nearby locations, then it might not make as much sense to treat them as independent, esperariamos que fossem mais semelhantes entre si do que com precipitagées histéricas. since we would expect them to be more similar to each other than to historical Por exemplo, se aprendermos primeiro queXé pequena em comparagdo com a rainfalls. For example, if we first learn that X is small compared to historical rainfall precipitagdo historica na data em questdo, podemos entdo esperar Sser menor do que a on the date in question, we might then expect Y to be smaller than the historical distribuigdo histérica sugeriria. - distribution would suggest. < Figura 3.190 subconjunto% Sh Figure 3.19 The subset So y ondex+sims1 where x+y <1 no Exemplo 3.5.9. Ss in Example 3.5.9. 
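The value 1/6 obtained in Example 3.5.9 can be checked numerically. The sketch below is an illustration only: the helper names `midpoint` and `inner` and the choice of 200 subintervals are assumptions made here, and a plain midpoint rule stands in for the exact integration carried out in the text.

```python
def g(x):
    """p.d.f. of a single rainfall measurement in Example 3.5.9."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def midpoint(func, a, b, n=200):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))


def inner(x):
    """Integral over y from 0 to 1 - x of the joint p.d.f. f(x, y) = g(x) g(y)."""
    return midpoint(lambda y: g(x) * g(y), 0.0, 1.0 - x)


# Outer integral over x from 0 to 1 gives Pr(X + Y <= 1).
prob = midpoint(inner, 0.0, 1.0)
print(prob)
```

The inner integrand is linear in y, so the midpoint rule handles it essentially exactly; the remaining error comes from the outer integral and shrinks as the number of subintervals grows.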
Theorem 3.5.5 says that X and Y are independent if and only if, for all values of x and y, f can be factored into the product of an arbitrary nonnegative function of x and an arbitrary nonnegative function of y. However, it should be emphasized that, just as in Eq. (3.5.9), the factorization in Eq. (3.5.7) must be satisfied for all values of x and y (−∞ < x < ∞ and −∞ < y < ∞).

Example 3.5.10 Dependent Random Variables. Suppose that the joint p.d.f. of X and Y has the following form:

\[
f(x, y) =
\begin{cases}
k x^2 y^2 & \text{for } x^2 + y^2 \le 1, \\
0 & \text{otherwise.}
\end{cases}
\]

We shall show that X and Y are not independent.

It is evident that at each point inside the circle x^2 + y^2 ≤ 1, f(x, y) can be factored as in Eq. (3.5.7). However, this same factorization cannot also be satisfied at every point outside this circle. For example, f(0.9, 0.9) = 0, but neither f1(0.9) = 0 nor f2(0.9) = 0. (In Exercise 13, you can verify this feature of f1 and f2.)
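The observation that f(0.9, 0.9) = 0 while neither marginal vanishes at 0.9 can be checked with a short Python sketch. Two things here are assumptions rather than statements from the example itself: the closed form of the marginal is the one quoted in Exercise 13, and the value k = 24/π is obtained here by normalizing f over the unit disc (the example leaves k unspecified). The conclusion does not depend on the particular positive value of k.

```python
import math

# Assumed normalizing constant: the integral of x^2 y^2 over the unit disc is pi/24.
K = 24.0 / math.pi


def f(x, y):
    """Joint p.d.f. of Example 3.5.10: k x^2 y^2 inside the unit circle, 0 outside."""
    return K * x**2 * y**2 if x**2 + y**2 <= 1.0 else 0.0


def marginal(x):
    """Common marginal p.d.f. from Exercise 13: 2 k x^2 (1 - x^2)^(3/2) / 3 on [-1, 1]."""
    if -1.0 <= x <= 1.0:
        return 2.0 * K * x**2 * (1.0 - x**2) ** 1.5 / 3.0
    return 0.0


joint_value = f(0.9, 0.9)                   # the point (0.9, 0.9) lies outside the circle
product = marginal(0.9) * marginal(0.9)     # product of the two marginals at 0.9
print(joint_value, product)
```

Because the joint value is zero while the product of the marginals is strictly positive, the factorization f(x, y) = f1(x)f2(y) fails at this point.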
The important feature of this example is that the values of X and Y are constrained to lie inside a circle. The joint p.d.f. of X and Y is positive inside the circle and zero outside the circle. Under these conditions, X and Y cannot be independent, because for every given value y of Y, the possible values of X will depend on y. For example, if Y = 0, then X can have any value such that X^2 ≤ 1; if Y = 1/2, then X must have a value such that X^2 ≤ 3/4. ◀

Example 3.5.10 shows that one must be careful when trying to apply Theorem 3.5.5. The situation that arose in that example will occur whenever {(x, y) : f(x, y) > 0} has boundaries that are curved or not parallel to the coordinate axes. There is one important special case in which it is easy to check the conditions of Theorem 3.5.5. The proof is left as an exercise.

Theorem 3.5.6 Let X and Y have a continuous joint distribution. Suppose that {(x, y) : f(x, y) > 0} is a rectangular region R (possibly unbounded) with sides (if any) parallel to the coordinate axes. Then X and Y are independent if and only if Eq. (3.5.7) holds for all (x, y) ∈ R.

Example 3.5.11 Verifying the Factorization of a Joint p.d.f. Suppose that the joint p.d.f. f of X and Y is as follows:

\[
f(x, y) =
\begin{cases}
k e^{-(x + 2y)} & \text{for } x \ge 0 \text{ and } y \ge 0, \\
0 & \text{otherwise,}
\end{cases}
\]

where k is some constant. We shall first determine whether X and Y are independent and then determine their marginal p.d.f.'s.

In this example, f(x, y) = 0 outside of an unbounded rectangular region R whose sides are the lines x = 0 and y = 0. Furthermore, at each point inside R, f(x, y) can be factored as in Eq. (3.5.7) by letting h1(x) = ke^{-x} and h2(y) = e^{-2y}. Therefore, X and Y are independent.

It follows that in this case, except for constant factors, h1(x) for x ≥ 0 and h2(y) for y ≥ 0 must be the marginal p.d.f.'s of X and Y.
By choosing constants that make h1(x) and h2(y) integrate to unity, we can conclude that the marginal p.d.f.'s f1 and f2 of X and Y must be as follows:

\[
f_1(x) =
\begin{cases}
e^{-x} & \text{for } x \ge 0, \\
0 & \text{otherwise,}
\end{cases}
\]

and

\[
f_2(y) =
\begin{cases}
2e^{-2y} & \text{for } y \ge 0, \\
0 & \text{otherwise.}
\end{cases}
\]

If we multiply f1(x) times f2(y) and compare the product to f(x, y), we see that k = 2. ◀

Note: Separate Functions of Independent Random Variables Are Independent. If X and Y are independent, then h(X) and g(Y) are independent no matter what the functions h and g are. This is true because for every t, the event {h(X) ≤ t} can always be written as {X ∈ A}, where A = {x : h(x) ≤ t}. Similarly, {g(Y) ≤ u} can be written as {Y ∈ B}, so Eq. (3.5.6) for h(X) and g(Y) follows from Eq. (3.5.5) for X and Y.

Summary

Let f(x, y) be a joint p.f., joint p.d.f., or joint p.f./p.d.f. of two random variables X and Y. The marginal p.f. or p.d.f. of X is denoted by f1(x), and the marginal p.f. or p.d.f. of Y is denoted by f2(y).
To obtain f1(x), compute $\sum_y f(x, y)$ if Y is discrete or $\int_{-\infty}^{\infty} f(x, y)\,dy$ if Y is continuous. Similarly, to obtain f2(y), compute $\sum_x f(x, y)$ if X is discrete or $\int_{-\infty}^{\infty} f(x, y)\,dx$ if X is continuous. The random variables X and Y are independent if and only if f(x, y) = f1(x)f2(y) for all x and y. This is true regardless of whether X and/or Y is continuous or discrete. A sufficient condition for two continuous random variables to be independent is that R = {(x, y) : f(x, y) > 0} be rectangular with sides parallel to the coordinate axes and that f(x, y) factors into separate functions of x and y in R.

Exercises

1. Suppose that X and Y have a continuous joint distribution for which the joint p.d.f. is

\[
f(x, y) =
\begin{cases}
k & \text{for } a \le x \le b \text{ and } c \le y \le d, \\
0 & \text{otherwise,}
\end{cases}
\]

where a < b, c < d, and k > 0. Find the marginal distributions of X and Y.

2. Suppose that X and Y have a discrete joint distribution for which the joint p.f. is defined as follows:

\[
f(x, y) =
\begin{cases}
\frac{1}{30}(x + y) & \text{for } x = 0, 1, 2 \text{ and } y = 0, 1, 2, 3, \\
0 & \text{otherwise.}
\end{cases}
\]

a. Determine the marginal p.f.'s of X and Y.
b. Are X and Y independent?

3. Suppose that X and Y have a continuous joint distribution for which the joint p.d.f. is defined as follows:

\[
f(x, y) =
\begin{cases}
\frac{3}{2} y^2 & \text{for } 0 \le x \le 2 \text{ and } 0 \le y \le 1, \\
0 & \text{otherwise.}
\end{cases}
\]

a. Determine the marginal p.d.f.'s of X and Y.
b. Are X and Y independent?
c. Are the event {X < 1} and the event {Y ≥ 1/2} independent?

4. Suppose that the joint p.d.f. of X and Y is as follows:

\[
f(x, y) =
\begin{cases}
\frac{15}{4} x^2 & \text{for } 0 \le y \le 1 - x^2, \\
0 & \text{otherwise.}
\end{cases}
\]

a. Determine the marginal p.d.f.'s of X and Y.
b. Are X and Y independent?

5. A certain drugstore has three public telephone booths. For i = 0, 1, 2, 3, let p_i denote the probability that exactly i telephone booths will be occupied on any Monday evening at 8:00 p.m.; and suppose that p_0 = 0.1, p_1 = 0.2, p_2 = 0.4, and p_3 = 0.3. Let X and Y denote the number of booths that will be occupied at 8:00 p.m. on two independent Monday evenings. Determine: (a) the joint p.f. of X and Y; (b) Pr(X = Y); (c) Pr(X > Y).

6. Suppose that in a certain drug the concentration of a particular chemical is a random variable with a continuous distribution for which the p.d.f. g is as follows:

\[
g(x) =
\begin{cases}
\frac{3}{8} x^2 & \text{for } 0 \le x \le 2, \\
0 & \text{otherwise.}
\end{cases}
\]

Suppose that the concentrations X and Y of the chemical in two separate batches of the drug are independent random variables for each of which the p.d.f. is g. Determine (a) the joint p.d.f. of X and Y; (b) Pr(X = Y); (c) Pr(X > Y); (d) Pr(X + Y ≤ 1).

7. Suppose that the joint p.d.f. of X and Y is as follows:

\[
f(x, y) =
\begin{cases}
2x e^{-y} & \text{for } 0 \le x \le 1 \text{ and } 0 < y < \infty, \\
0 & \text{otherwise.}
\end{cases}
\]

Are X and Y independent?

8. Suppose that the joint p.d.f. of X and Y is as follows:

\[
f(x, y) =
\begin{cases}
24xy & \text{for } x \ge 0,\ y \ge 0, \text{ and } x + y \le 1, \\
0 & \text{otherwise.}
\end{cases}
\]

Are X and Y independent?

9. Suppose that a point (X, Y) is chosen at random from the rectangle S defined as follows:

S = {(x, y) : 0 ≤ x ≤ 2 and 1 ≤ y ≤ 4}.

a. Determine the joint p.d.f. of X and Y, the marginal p.d.f. of X, and the marginal p.d.f. of Y.
b. Are X and Y independent?

10. Suppose that a point (X, Y) is chosen at random from the circle S defined as follows:

S = {(x, y) : x^2 + y^2 ≤ 1}.

a. Determine the joint p.d.f. of X and Y, the marginal p.d.f. of X, and the marginal p.d.f. of Y.
b. Are X and Y independent?

11. Suppose that two persons make an appointment to meet between 5 p.m. and 6 p.m. at a certain location, and they agree that neither person will wait more than 10 minutes for the other person. If they arrive independently at random times between 5 p.m. and 6 p.m., what is the probability that they will meet?

12. Prove Theorem 3.5.6.

13. In Example 3.5.10, verify that X and Y have the same marginal p.d.f.'s and that

\[
f_1(x) =
\begin{cases}
2k x^2 (1 - x^2)^{3/2} / 3 & \text{if } -1 \le x \le 1, \\
0 & \text{otherwise.}
\end{cases}
\]

14. For the joint p.d.f. in Example 3.4.7, determine whether or not X and Y are independent.

15. A painting process consists of two stages. In the first stage, the paint is applied, and in the second stage, a protective coat is added. Let X be the time spent on the first stage, and let Y be the time spent on the second stage. The first stage involves an inspection. If the paint fails the inspection, one must wait three minutes and apply the paint again. After a second application, there is no further inspection. The joint p.d.f. of X and Y is

\[
f(x, y) =
\begin{cases}
\frac{1}{3} & \text{if } 1 < x < 3 \text{ and } 0 < y < 1, \\
\frac{1}{6} & \text{if } 6 < x < 8 \text{ and } 0 < y < 1, \\
0 & \text{otherwise.}
\end{cases}
\]

a. Sketch the region where f(x, y) > 0. Note that it is not exactly a rectangle.
b. Find the marginal p.d.f.'s of X and Y.
c. Show that X and Y are independent.

This problem does not contradict Theorem 3.5.6. In that theorem the conditions, including that the set where f(x, y) > 0 be rectangular, are sufficient but not necessary.

3.6 Conditional Distributions

We generalize the concept of conditional probability to conditional distributions. Recall that distributions are just collections of probabilities of events determined by random variables.
Conditional distributions will be the probabilities of events determined by some random variables conditional on events determined by other random variables. The idea is that there will typically be many random variables of interest in an applied problem. After we observe some of those random variables, we want to be able to adjust the probabilities associated with the ones that have not yet been observed. The conditional distribution of one random variable X given another Y will be the distribution that we would use for X after we learn the value of Y.

Table 3.7 Joint p.f. for Example 3.6.1

                          Brand Y
  Stolen X     1      2      3      4      5     Total
  0          0.129  0.298  0.161  0.280  0.108   0.976
  1          0.010  0.010  0.001  0.002  0.001   0.024
  Total      0.139  0.308  0.162  0.282  0.109   1.000

Discrete Conditional Distributions

Example 3.6.1 Auto Insurance. Insurance companies keep track of how likely various cars are to be stolen. Suppose that a company in a particular area computes the joint distribution of car brands and the indicator of whether the car will be stolen during a particular year that appears in Table 3.7. We let X = 1 mean that a car is stolen, and we let X = 0 mean that the car is not stolen. We let Y take one of the values from 1 to 5 to indicate the brand of car as indicated in Table 3.7.
If a customer applies for insurance for a particular brand of car, the company needs to compute the distribution of the random variable X as part of its premium determination. The insurance company might adjust its premium according to a risk factor such as likelihood of being stolen. Although, overall, the probability that a car will be stolen is 0.024, if we assume that we know the brand of car, the probability might change quite a bit. This section introduces the formal concepts for addressing this type of problem. ◀

Suppose that X and Y are two random variables having a discrete joint distribution for which the joint p.f. is f. As before, we shall let f1 and f2 denote the marginal p.f.'s of X and Y, respectively. After we observe that Y = y, the probability that the random variable X will take a particular value x is specified by the following conditional probability:

\[
\Pr(X = x \mid Y = y) = \frac{\Pr(X = x \text{ and } Y = y)}{\Pr(Y = y)} = \frac{f(x, y)}{f_2(y)}. \tag{3.6.1}
\]

In other words, if it is known that Y = y, then the probability that X = x will be updated to the value in Eq. (3.6.1). Next, we consider the entire distribution of X after learning that Y = y.

Definition 3.6.1 Conditional Distribution/p.f. Let X and Y have a discrete joint distribution with joint p.f. f. Let f2 denote the marginal p.f. of Y. For each y such that f2(y) > 0, define

\[
g_1(x \mid y) = \frac{f(x, y)}{f_2(y)}. \tag{3.6.2}
\]

Then g1 is called the conditional p.f. of X given Y. The discrete distribution whose p.f. is g1(·|y) is called the conditional distribution of X given that Y = y.
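Definition 3.6.1 can be applied directly to the joint p.f. of Table 3.7. The following Python sketch is an illustration only: the dictionary layout and the names `joint`, `f2`, and `g1` are choices made here, not part of the text. It divides each joint probability by the marginal probability of its brand, as in Eq. (3.6.2).

```python
# Joint p.f. f(x, y) from Table 3.7.
# Keys are X (0 = not stolen, 1 = stolen); list positions are brands Y = 1..5.
joint = {
    0: [0.129, 0.298, 0.161, 0.280, 0.108],
    1: [0.010, 0.010, 0.001, 0.002, 0.001],
}
brands = range(1, 6)

# Marginal p.f. of Y: f2(y) is the sum of f(x, y) over x.
f2 = {y: joint[0][y - 1] + joint[1][y - 1] for y in brands}

# Conditional p.f. of X given Y = y, Eq. (3.6.2): g1(x | y) = f(x, y) / f2(y),
# defined only for those y with f2(y) > 0.
g1 = {
    (x, y): joint[x][y - 1] / f2[y]
    for x in joint
    for y in brands
    if f2[y] > 0
}

for y in brands:
    print(y, round(g1[(0, y)], 3), round(g1[(1, y)], 3))
```

For each brand the two conditional probabilities add to 1, and the conditional probability of theft differs noticeably from the overall value 0.024 for some brands (for brand 1 it is about 0.072).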
Table 3.8 Conditional p.f. of X given Y for Example 3.6.3

                          Brand Y
  Stolen X     1      2      3      4      5
  0          0.928  0.968  0.994  0.993  0.991
  1          0.072  0.032  0.006  0.007  0.009

We should verify that g1(x|y) is actually a p.f. as a function of x for each y. Let y be such that f2(y) > 0. Then g1(x|y) ≥ 0 for all x and

\[
\sum_x g_1(x \mid y) = \frac{1}{f_2(y)} \sum_x f(x, y) = \frac{1}{f_2(y)} f_2(y) = 1.
\]

Notice that we do not bother to define g1(x|y) for those y such that f2(y) = 0.

Similarly, if x is a given value of X such that f1(x) = Pr(X = x) > 0, and if g2(y|x) is the conditional p.f. of Y given that X = x, then

\[
g_2(y \mid x) = \frac{f(x, y)}{f_1(x)}. \tag{3.6.3}
\]

For each x such that f1(x) > 0, the function g2(y|x) will be a p.f. as a function of y.

Example 3.6.2 Calculating a Conditional p.f. from a Joint p.f. Suppose that the joint p.f. of X and Y is as specified in Table 3.4 in Example 3.5.2. We shall determine the conditional p.f. of Y given that X = 2.

The marginal p.f. of X appears in the Total column of Table 3.4, so f1(2) = Pr(X = 2) = 0.6. Therefore, the conditional probability g2(y|2) that Y will take a particular value y is

\[
g_2(y \mid 2) = \frac{f(2, y)}{0.6}.
\]

It should be noted that for all possible values of y, the conditional probabilities g2(y|2) must be proportional to the joint probabilities f(2, y). In this example, each value of f(2, y) is simply divided by the constant f1(2) = 0.6 in order that the sum of the results will be equal to 1. Thus,

g2(1|2) = 1/2, g2(2|2) = 0, g2(3|2) = 1/6, g2(4|2) = 1/3. ◀

Example 3.6.3 Auto Insurance. Consider again the probabilities of car brands and cars being stolen in Example 3.6.1. The conditional distribution of X (being stolen) given Y (brand) is given in Table 3.8. It appears that Brand 1 is much more likely to be stolen than other cars in this area, with Brand 2 also having a significant chance of being stolen. ◀

Continuous Conditional Distributions

Example 3.6.4 Processing Times. A manufacturing process consists of two stages.
The first stage takes $Y$ minutes, and the whole process takes $X$ minutes (which includes the first $Y$ minutes). Suppose that $X$ and $Y$ have a joint continuous distribution with joint p.d.f.
\[ f(x, y) = \begin{cases} e^{-x} & \text{for } 0 \le y \le x < \infty, \\ 0 & \text{otherwise.} \end{cases} \]
After we learn how much time $Y$ the first stage takes, we want to update our distribution for the total time $X$. In other words, we would like to be able to compute a conditional distribution for $X$ given $Y = y$. We cannot argue the same way as we did with discrete joint distributions, because $\{Y = y\}$ is an event with probability 0 for all $y$. ◀

To facilitate the solutions of problems such as the one posed in Example 3.6.4, the concept of conditional probability will be extended by considering the definition of the conditional p.f. of $X$ given in Eq. (3.6.2) and the analogy between a p.f. and a p.d.f.

Definition 3.6.2 Conditional p.d.f. Let $X$ and $Y$ have a continuous joint distribution with joint p.d.f.
$f$ and respective marginals $f_1$ and $f_2$. Let $y$ be a value such that $f_2(y) > 0$. Then the conditional p.d.f. $g_1$ of $X$ given that $Y = y$ is defined as follows:
\[ g_1(x|y) = \frac{f(x, y)}{f_2(y)} \quad \text{for } -\infty < x < \infty. \tag{3.6.4} \]
For values of $y$ such that $f_2(y) = 0$, we are free to define $g_1(x|y)$ however we wish, so long as $g_1(x|y)$ is a p.d.f. as a function of $x$.

It should be noted that Eq. (3.6.2) and Eq. (3.6.4) are identical. However, Eq. (3.6.2) was derived as the conditional probability that $X = x$ given that $Y = y$, whereas Eq. (3.6.4) was defined to be the value of the conditional p.d.f. of $X$ given that $Y = y$. In fact, we should verify that $g_1(x|y)$ as defined above really is a p.d.f.

Theorem 3.6.1 For each $y$, $g_1(x|y)$ defined in Definition 3.6.2 is a p.d.f. as a function of $x$.

Proof If $f_2(y) = 0$, then $g_1$ is defined to be any p.d.f. we wish, and hence it is a p.d.f. If $f_2(y) > 0$, $g_1$ is defined by Eq. (3.6.4). For each such $y$, it is clear that $g_1(x|y) \ge 0$ for all $x$. Also, if $f_2(y) > 0$, then
\[ \int_{-\infty}^{\infty} g_1(x|y)\,dx = \frac{\int_{-\infty}^{\infty} f(x, y)\,dx}{f_2(y)} = \frac{f_2(y)}{f_2(y)} = 1, \]
by using the formula for $f_2(y)$ in Eq. (3.5.3).

Example 3.6.5 Processing Times. In Example 3.6.4, $Y$ is the time that the first stage of a process takes, while $X$ is the total time of the two stages. We want to calculate the conditional p.d.f. of $X$ given $Y$. We can calculate the marginal p.d.f. of $Y$ as follows: For each $y$, the possible values of $X$ are all $x \ge y$, so for each $y > 0$,
\[ f_2(y) = \int_y^{\infty} e^{-x}\,dx = e^{-y}, \]
and $f_2(y) = 0$ for $y < 0$. For each $y \ge 0$, the conditional p.d.f. of $X$ given $Y = y$ is then
\[ g_1(x|y) = \frac{f(x, y)}{f_2(y)} = \frac{e^{-x}}{e^{-y}} = e^{y - x}, \quad \text{for } x \ge y, \]
and $g_1(x|y) = 0$ for $x < y$. So, for example, if we observe $Y = 4$ and we want the conditional probability that $X \ge 9$, we compute
\[ \Pr(X \ge 9 \mid Y = 4) = \int_9^{\infty} e^{4 - x}\,dx = e^{-5} = 0.0067. \]
◀

Figure 3.20 The conditional p.d.f. $g_1(x|y_0)$ is proportional to $f(x, y_0)$.

Definition 3.6.2 has an interpretation that can be understood by considering
Fig. 3.20. The joint p.d.f. $f$ defines a surface over the $xy$-plane for which the height $f(x, y)$ at each point $(x, y)$ represents the relative likelihood of that point. For instance, if it is known that $Y = y_0$, then the point $(x, y)$ must lie on the line $y = y_0$ in the $xy$-plane, and the relative likelihood of any point $(x, y_0)$ on this line is $f(x, y_0)$. Hence, the conditional p.d.f. $g_1(x|y_0)$ of $X$ should be proportional to $f(x, y_0)$. In other words, $g_1(x|y_0)$ is essentially the same as $f(x, y_0)$, but it includes a constant factor $1/[f_2(y_0)]$, which is required to make the conditional p.d.f. integrate to unity over all values of $x$.

Similarly, for each value of $x$ such that $f_1(x) > 0$, the conditional p.d.f. of $Y$ given that $X = x$ is defined as follows:
\[ g_2(y|x) = \frac{f(x, y)}{f_1(x)} \quad \text{for } -\infty < y < \infty. \tag{3.6.5} \]
This equation is identical to Eq. (3.6.3), which was derived for discrete distributions. If $f_1(x) = 0$, then $g_2(y|x)$ is arbitrary so long as it is a p.d.f. as a function of $y$.
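The normalization by $f_2(y_0)$ can be illustrated numerically. The following Python sketch is not part of the text: it assumes the joint p.d.f. of Example 3.6.4, replaces the infinite upper limit by a finite cutoff, and uses a simple midpoint rule. The names joint_pdf, integrate, conditional_tail_prob, and cutoff are hypothetical helpers introduced only for this illustration. It approximates $\Pr(X \ge 9 \mid Y = 4)$ as the ratio of two integrals of $f(x, 4)$.

```python
import math

def joint_pdf(x, y):
    """Joint p.d.f. of Example 3.6.4: e^{-x} on 0 <= y <= x, and 0 otherwise."""
    return math.exp(-x) if 0.0 <= y <= x else 0.0

def integrate(func, lower, upper, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))

def conditional_tail_prob(threshold, y, cutoff=60.0):
    """Approximate Pr(X >= threshold | Y = y) from the joint p.d.f. alone.

    The cutoff stands in for infinity (an assumption of this sketch);
    the tail of e^{-x} beyond 60 is negligible.
    """
    marginal_y = integrate(lambda x: joint_pdf(x, y), y, cutoff)        # f2(y)
    numerator = integrate(lambda x: joint_pdf(x, y), threshold, cutoff)
    return numerator / marginal_y

prob = conditional_tail_prob(9.0, 4.0)
print(round(prob, 4))  # 0.0067, matching e^{-5}
```

Dividing by the numerically computed marginal plays the role of the constant factor $1/[f_2(y_0)]$ described above.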
Example 3.6.6 Calculating a Conditional p.d.f. from a Joint p.d.f. Suppose that the joint p.d.f. of $X$ and $Y$ is as specified in Example 3.4.8 on page 122. We shall first determine the conditional p.d.f. of $Y$ given that $X = x$ and then determine some probabilities for $Y$ given the specific value $X = 1/2$.

The set $S$ for which $f(x, y) > 0$ was sketched in Fig. 3.12 on page 123. Furthermore, the marginal p.d.f. $f_1$ was derived in Example 3.5.3 on page 132 and sketched in Fig. 3.17 on page 133. It can be seen from Fig. 3.17 that $f_1(x) > 0$ for $-1 < x < 1$ but not for $x = 0$. Therefore, for each given value of $x$ such that $-1 < x < 0$ or $0 < x < 1$, the conditional p.d.f. $g_2(y|x)$ of $Y$ will be as follows:
\[ g_2(y|x) = \begin{cases} \dfrac{2y}{1 - x^4} & \text{for } x^2 \le y \le 1, \\ 0 & \text{otherwise.} \end{cases} \]
In particular, if it is known that $X = 1/2$, then $\Pr(Y \ge 1/4 \mid X = 1/2) = 1$ and
\[ \Pr\left(Y \ge \tfrac{3}{4} \,\Big|\, X = \tfrac{1}{2}\right) = \int_{3/4}^{1} g_2\left(y \,\Big|\, \tfrac{1}{2}\right) dy = \frac{7}{15}. \]
◀

Note: A Conditional p.d.f.
Is Not the Result of Conditioning on a Set of Probability Zero. The conditional p.d.f. $g_1(x|y)$ of $X$ given $Y = y$ is the p.d.f. we would use for $X$ if we were to learn that $Y = y$. This sounds as if we were conditioning on the event $\{Y = y\}$, which has zero probability if $Y$ has a continuous distribution. Actually, for the cases we shall see in this text, the value of $g_1(x|y)$ is a limit:
\[ g_1(x|y) = \lim_{\epsilon \to 0} \frac{\partial}{\partial x} \Pr(X \le x \mid y - \epsilon < Y \le y + \epsilon). \tag{3.6.6} \]
The conditioning event $\{y - \epsilon < Y \le y + \epsilon\}$ in Eq. (3.6.6) has positive probability if the marginal p.d.f. of $Y$ is positive at $y$. The mathematics required to make this rigorous is beyond the scope of this text. (See Exercise 11 in this section and Exercises 25 and 26 in Sec. 3.11 for results that we can prove.) Another way to think about conditioning on a continuous random variable is to notice that the conditional p.d.f.'s that we compute are typically continuous as a function of the conditioning variable.
This means that conditioning on $Y = y$ or on $Y = y + \epsilon$ for small $\epsilon$ will produce nearly the same conditional distribution for $X$. So it does not matter much if we use $Y = y$ as a surrogate for $Y$ close to $y$. Nevertheless, it is important to keep in mind that the conditional p.d.f. of $X$ given $Y = y$ is better thought of as the conditional p.d.f. of $X$ given that $Y$ is very close to $y$. This wording is awkward, so we shall not use it, but we must remember the distinction between the conditional p.d.f. and conditioning on an event with probability 0. Despite this distinction, it is still legitimate to treat $Y$ as the constant $y$ when dealing with the conditional distribution of $X$ given $Y = y$.

For mixed joint distributions, we continue to use Eqs. (3.6.2) and (3.6.3) to define conditional p.f.'s and p.d.f.'s.

Definition 3.6.3 Conditional p.f. or p.d.f. from Mixed Distribution. Let $X$ be discrete and let $Y$ be continuous with joint p.f./p.d.f. $f$. Then the conditional p.f. of $X$ given $Y = y$ is defined by Eq.
(3.6.2), and the conditional p.d.f. of $Y$ given $X = x$ is defined by Eq. (3.6.3).

Construction of the Joint Distribution

Example 3.6.7 Defective Parts. Suppose that a certain machine produces defective and nondefective parts, but we do not know what proportion of defectives we would find among all parts that could be produced by this machine. Let $P$ stand for the unknown proportion of defective parts among all possible parts produced by the machine. If we were to learn that $P = p$, we might be willing to say that the parts were independent of each other and each had probability $p$ of being defective. In other words, if we condition on $P = p$, then we have the situation described in Example 3.1.9. As in that example, suppose that we examine $n$ parts and let $X$ stand for the number of defectives among the $n$ examined parts. The distribution of $X$, assuming that we know $P = p$, is the binomial distribution with parameters $n$ and $p$. That is, we can let the binomial p.f. (3.1.4) be the conditional p.f.
of $X$ given $P = p$, namely,
\[ g_1(x|p) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad \text{for } x = 0, \ldots, n. \]
We might also believe that $P$ has a continuous distribution with p.d.f. such as $f_2(p) = 1$ for $0 \le p \le 1$. (This means that $P$ has the uniform distribution on the interval $[0, 1]$.) We know that the conditional p.f. $g_1$ of $X$ given $P = p$ satisfies
\[ g_1(x|p) = \frac{f(x, p)}{f_2(p)}, \]
where $f$ is the joint p.f./p.d.f. of $X$ and $P$. If we multiply both sides of this equation by $f_2(p)$, it follows that the joint p.f./p.d.f. of $X$ and $P$ is
\[ f(x, p) = g_1(x|p) f_2(p) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad \text{for } x = 0, \ldots, n, \text{ and } 0 \le p \le 1. \]
◀

The construction in Example 3.6.7 is available in general, as we explain next.

Generalizing the Multiplication Rule for Conditional Probabilities. A special case of Theorem 2.1.2, the multiplication rule for conditional probabilities, says that if $A$ and $B$ are two events, then $\Pr(A \cap B) = \Pr(A) \Pr(B|A)$. The following theorem, whose proof is immediate from Eqs. (3.6.4) and (3.6.5), generalizes Theorem 2.1.2 to the case of two random variables.
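As an informal check on the construction in Example 3.6.7, the Python sketch below builds the joint p.f./p.d.f. as the product of the binomial conditional p.f. and the uniform p.d.f., then confirms numerically that summing over $x$ and integrating over $p$ gives total mass 1. It is only an illustration under stated assumptions: the function names, the choice $n = 5$, and the midpoint rule are ours and do not come from the text.

```python
import math

def conditional_pf(x, p, n):
    """Binomial p.f. g1(x|p) with parameters n and p."""
    return math.comb(n, x) * p**x * (1.0 - p) ** (n - x)

def prior_pdf(p):
    """Uniform p.d.f. f2(p) on the interval [0, 1]."""
    return 1.0 if 0.0 <= p <= 1.0 else 0.0

def joint(x, p, n):
    """Joint p.f./p.d.f. f(x, p) = g1(x|p) * f2(p)."""
    return conditional_pf(x, p, n) * prior_pdf(p)

def total_mass(n, steps=10_000):
    """Sum over x = 0..n and integrate over p in [0, 1] with a midpoint rule."""
    width = 1.0 / steps
    return sum(
        width * joint(x, (i + 0.5) * width, n)
        for x in range(n + 1)
        for i in range(steps)
    )

mass = total_mass(n=5)   # n = 5 is an arbitrary illustrative choice
print(round(mass, 6))    # 1.0
```

The total is 1 because, for every fixed $p$, the binomial probabilities sum to 1 over $x$, and the uniform p.d.f. integrates to 1 over $p$.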
Theorem 3.6.2 Multiplication Rule for Distributions. Let $X$ and $Y$ be random variables such that $X$ has p.f. or p.d.f. $f_1(x)$ and $Y$ has p.f. or p.d.f. $f_2(y)$. Also, assume that the conditional p.f. or p.d.f. of $X$ given $Y = y$ is $g_1(x|y)$ while the conditional p.f. or p.d.f. of $Y$ given $X = x$ is $g_2(y|x)$. Then for each $y$ such that $f_2(y) > 0$ and each $x$,
\[ f(x, y) = g_1(x|y) f_2(y), \tag{3.6.7} \]
where $f$ is the joint p.f., p.d.f., or p.f./p.d.f. of $X$ and $Y$. Similarly, for each $x$ such that $f_1(x) > 0$ and each $y$,
\[ f(x, y) = f_1(x) g_2(y|x). \tag{3.6.8} \]

In Theorem 3.6.2, if $f_2(y_0) = 0$ for some value $y_0$, then it can be assumed without loss of generality that $f(x, y_0) = 0$ for all values of $x$. In this case, both sides of Eq. (3.6.7) will be 0, and the fact that $g_1(x|y_0)$ is not uniquely defined becomes irrelevant. Hence, Eq. (3.6.7) will be satisfied for all values of $x$ and $y$. A similar statement applies to Eq. (3.6.8).

Example 3.6.8 Waiting in a Queue. Let $X$ be the amount of time that a person has to wait for service
in a queue. The faster the server works in the queue, the shorter should be the waiting time. Let $Y$ stand for the rate at which the server works, which we will take to be unknown. A common choice of conditional distribution for $X$ given $Y = y$ has conditional p.d.f. for each $y > 0$:
\[ g_1(x|y) = \begin{cases} y e^{-xy} & \text{for } x \ge 0, \\ 0 & \text{otherwise.} \end{cases} \]
We shall assume that $Y$ has a continuous distribution with p.d.f. $f_2(y) = e^{-y}$ for $y > 0$. Now we can construct the joint p.d.f. of $X$ and $Y$ using Theorem 3.6.2:
\[ f(x, y) = g_1(x|y) f_2(y) = \begin{cases} y e^{-y(x + 1)} & \text{for } x \ge 0,\ y > 0, \\ 0 & \text{otherwise.} \end{cases} \]
◀

Example 3.6.9 Defective Parts. Let $X$ be the number of defective parts in a sample of size $n$, and let $P$ be the proportion of defectives among all parts, as in Example 3.6.7. The joint p.f./p.d.f. of $X$ and $P$ was calculated there as
\[ f(x, p) = g_1(x|p) f_2(p) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad \text{for } x = 0, \ldots, n \text{ and } 0 \le p \le 1. \]
We could now compute the conditional p.d.f. of $P$ given $X = x$ by first finding the marginal p.f.
of $X$:
\[ f_1(x) = \int_0^1 \binom{n}{x} p^x (1 - p)^{n - x}\,dp. \tag{3.6.9} \]
The conditional p.d.f. of $P$ given $X = x$ is then
\[ g_2(p|x) = \frac{f(x, p)}{f_1(x)} = \frac{p^x (1 - p)^{n - x}}{\int_0^1 q^x (1 - q)^{n - x}\,dq}, \quad \text{for } 0 < p < 1. \tag{3.6.10} \]
The integral in the denominator of Eq. (3.6.10) can be tedious to calculate, but it can be found. For example, if $n = 2$ and $x = 1$, we get
\[ \int_0^1 q(1 - q)\,dq = \frac{1}{2} - \frac{1}{3} = \frac{1}{6}. \]
In this case, $g_2(p|1) = 6p(1 - p)$ for $0 \le p \le 1$. ◀

Bayes' Theorem and the Law of Total Probability for Random Variables. The calculation done in Eq. (3.6.9) is an example of the generalization of the law of total probability to random variables. Also, the calculation in Eq. (3.6.10) is an example of the generalization of Bayes' theorem to random variables. The proofs of these results are straightforward and not given here.

Theorem 3.6.3 Law of Total Probability for Random Variables. If $f_2(y)$ is the marginal p.f. or p.d.f. of a random variable $Y$ and $g_1(x|y)$ is the conditional p.f. or p.d.f. of $X$ given $Y = y$, then the marginal p.f. or p.d.f. of $X$ is
\[ f_1(x) = \sum_y g_1(x|y) f_2(y), \tag{3.6.11} \]
if $Y$ is discrete. If $Y$ is continuous, the marginal p.f. or p.d.f. of $X$ is
\[ f_1(x) = \int_{-\infty}^{\infty} g_1(x|y) f_2(y)\,dy. \tag{3.6.12} \]

There are versions of Eqs. (3.6.11) and (3.6.12) with $x$ and $y$ switched and the subscripts 1 and 2 switched. These versions would be used if the joint distribution of $X$ and $Y$ were constructed from the conditional distribution of $Y$ given $X$ and the marginal distribution of $X$.

Theorem 3.6.4 Bayes' Theorem for Random Variables. If $f_2(y)$ is the marginal p.f. or p.d.f. of a random variable $Y$ and $g_1(x|y)$ is the conditional p.f. or p.d.f. of $X$ given $Y = y$, then the conditional p.f. or p.d.f. of $Y$ given $X = x$ is
\[ g_2(y|x) = \frac{g_1(x|y) f_2(y)}{f_1(x)}, \tag{3.6.13} \]
where $f_1(x)$ is obtained from Eq. (3.6.11) or (3.6.12). Similarly, the conditional p.f. or p.d.f. of $X$ given $Y = y$ is
\[ g_1(x|y) = \frac{g_2(y|x) f_1(x)}{f_2(y)}, \tag{3.6.14} \]
where $f_2(y)$ is obtained from Eq. (3.6.11) or (3.6.12) with $x$ and $y$ switched and with the subscripts 1 and 2 switched.

Example 3.6.10 Choosing Points from Uniform Distributions.
Suppose that a point $X$ is chosen from the uniform distribution on the interval $[0, 1]$, and that after the value $X = x$ has been observed ($0 < x < 1$), a point $Y$ is then chosen from the uniform distribution on the interval $[x, 1]$. We shall derive the marginal p.d.f. of $Y$.

Since $X$ has a uniform distribution, the marginal p.d.f. of $X$ is as follows:
\[ f_1(x) = \begin{cases} 1 & \text{for } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases} \]
Similarly, for each value $X = x$ ($0 < x < 1$), the conditional distribution of $Y$ is the uniform distribution on the interval $[x, 1]$. Since the length of this interval is $1 - x$, the conditional p.d.f. of $Y$ given that $X = x$ will be
\[ g_2(y|x) = \begin{cases} \dfrac{1}{1 - x} & \text{for } x < y < 1, \\ 0 & \text{otherwise.} \end{cases} \]
It follows from Eq. (3.6.8) that the joint p.d.f. of $X$ and $Y$ will be
\[ f(x, y) = \begin{cases} \dfrac{1}{1 - x} & \text{for } 0 < x < y < 1, \\ 0 & \text{otherwise.} \end{cases} \tag{3.6.15} \]
Thus, for $0 < y < 1$, the value of the marginal p.d.f. $f_2(y)$ of $Y$ will be
\[ f_2(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \int_0^y \frac{1}{1 - x}\,dx = -\log(1 - y). \tag{3.6.16} \]
Furthermore, since $Y$ cannot be outside the interval $0 < y < 1$, then $f_2(y) = 0$ for
$y \le 0$ or $y \ge 1$. This marginal p.d.f. $f_2$ is sketched in Fig. 3.21. It is interesting to note that in this example the function $f_2$ is unbounded.

We can also find the conditional p.d.f. of $X$ given $Y = y$ by applying Bayes' theorem (3.6.14). The product of $g_2(y|x)$ and $f_1(x)$ was already calculated in Eq. (3.6.15).

Figure 3.21 The marginal p.d.f. of $Y$ in Example 3.6.10.

The ratio of this product to $f_2(y)$ from Eq. (3.6.16) is
\[ g_1(x|y) = \begin{cases} \dfrac{-1}{(1 - x) \log(1 - y)} & \text{for } 0 < x < y, \\ 0 & \text{otherwise.} \end{cases} \]
◀

Theorem 3.6.5 Independent Random Variables. Suppose that $X$ and $Y$ are two random variables having a joint p.f., p.d.f., or p.f./p.d.f. $f$. Then $X$ and $Y$ are independent if and only if for every value of $y$ such that $f_2(y) > 0$ and every value of $x$,
\[ g_1(x|y) = f_1(x). \tag{3.6.17} \]

Proof Theorem 3.5.4 says that $X$ and $Y$ are independent if and only if $f(x, y)$ can be factored in the following form for $-\infty < x < \infty$ and $-\infty < y < \infty$:
\[ f(x, y) = f_1(x) f_2(y), \]
which holds if and only if, for all $x$ and all $y$ such that $f_2(y) > 0$,
\[ f_1(x) = \frac{f(x, y)}{f_2(y)}. \tag{3.6.18} \]
But the right side of Eq. (3.6.18) is the formula for $g_1(x|y)$. Hence, $X$ and $Y$ are independent if and only if Eq. (3.6.17) holds for all $x$ and all $y$ such that $f_2(y) > 0$.

Theorem 3.6.5 says that $X$ and $Y$ are independent if and only if the conditional p.f. or p.d.f. of $X$ given $Y = y$ is the same as the marginal p.f. or p.d.f. of $X$ for all $y$ such that $f_2(y) > 0$. Because $g_1(x|y)$ is arbitrary when $f_2(y) = 0$, we cannot expect Eq. (3.6.17) to hold in that case.

Similarly, it follows from Eq.
(3.6.8) that $X$ and $Y$ are independent if and only if
\[ g_2(y|x) = f_2(y), \tag{3.6.19} \]
for every value of $x$ such that $f_1(x) > 0$. Theorem 3.6.5 and Eq. (3.6.19) give the mathematical justification for the meaning of independence that we presented on page 136.

Note: Conditional Distributions Behave Just Like Distributions. As we noted on page 59, conditional probabilities behave just like probabilities. Since distributions are just collections of probabilities, it follows that conditional distributions behave just like distributions. For example, to compute the conditional probability that a discrete random variable $X$ is in some interval $[a, b]$ given $Y = y$, we must add $g_1(x|y)$ for all values of $x$ in the interval. Also, theorems that we have proven or shall prove about distributions will have versions conditional on additional random variables. We shall postpone examples of such theorems until Sec. 3.7 because they rely on joint distributions of more than two random variables.

Summary

The conditional distribution of one random variable $X$ given an observed value $y$ of another random variable $Y$ is the distribution we would use for $X$ if we were to learn that $Y = y$. When dealing with the conditional distribution of $X$ given $Y = y$, it is safe to behave as if $Y$ were the constant $y$. If $X$ and $Y$ have joint p.f., p.d.f., or p.f./p.d.f. $f(x, y)$, then the conditional p.f. or p.d.f. of $X$ given $Y = y$ is
When dealing with the conditional distribution of $X$ given $Y = y$, it is safe to behave as if $Y$ were the constant $y$. If $X$ and $Y$ have joint p.f., p.d.f., or p.f./p.d.f. $f(x, y)$, then the conditional p.f. or p.d.f. of $X$ given $Y = y$ is $g_1(x \mid y) = f(x, y)/f_2(y)$, where $f_2$ is the marginal p.f. or p.d.f. of $Y$. When it is convenient to specify a conditional distribution directly, the joint distribution can be constructed from the conditional together with the other marginal. For example,
\[ f(x, y) = g_1(x \mid y) f_2(y) = f_1(x) g_2(y \mid x). \]
In this case, we have versions of the law of total probability and Bayes' theorem for random variables that allow us to calculate the other marginal and conditional.

Two random variables $X$ and $Y$ are independent if and only if the conditional p.f. or p.d.f. of $X$ given $Y = y$ is the same as the marginal p.f. or p.d.f. of $X$ for all $y$ such that $f_2(y) > 0$. Equivalently, $X$ and $Y$ are independent if and only if the conditional p.f. or p.d.f. of $Y$ given $X = x$ is the same as the marginal p.f. or p.d.f. of $Y$ for all $x$ such that $f_1(x) > 0$.
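The identities in this summary can be illustrated numerically. The following is a minimal sketch under assumptions that are not part of the text: a hypothetical pair in which $Y$ is uniform on $(0, 1)$ and, given $Y = y$, $X$ is uniform on $(0, y)$. The function names and the midpoint-rule helper are also illustrative choices. The sketch builds the joint p.d.f. from $g_1$ and $f_2$, recovers the other marginal $f_1$ by the law of total probability, and recovers the other conditional $g_2$ by Bayes' theorem.

```python
import math

def f2(y):
    """Marginal p.d.f. of Y: uniform on (0, 1) (hypothetical choice)."""
    return 1.0 if 0.0 < y < 1.0 else 0.0

def g1(x, y):
    """Conditional p.d.f. of X given Y = y: uniform on (0, y) (hypothetical choice)."""
    return 1.0 / y if 0.0 < x < y < 1.0 else 0.0

def joint(x, y):
    """Joint p.d.f. constructed as f(x, y) = g1(x | y) * f2(y)."""
    return g1(x, y) * f2(y)

def midpoint(func, a, b, n=20000):
    """Midpoint-rule approximation of the integral of func over (a, b)."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def f1(x):
    """Marginal p.d.f. of X by the law of total probability."""
    if not 0.0 < x < 1.0:
        return 0.0
    # The joint p.d.f. is zero for y <= x, so integrate over (x, 1) only.
    return midpoint(lambda y: joint(x, y), x, 1.0)

def make_g2(x):
    """Conditional p.d.f. of Y given X = x by Bayes' theorem, as a function of y."""
    fx = f1(x)  # computed once, then reused for every y
    return lambda y: joint(x, y) / fx

x0 = 0.5
print(f1(x0), -math.log(x0))              # numerical marginal vs. closed form -ln(x0)
print(midpoint(make_g2(x0), x0, 1.0))     # g2(. | x0) should integrate to about 1
print(g1(0.25, 0.5), f1(0.25))            # these differ, so X and Y are not independent
```

In this hypothetical example $g_1(x \mid y)$ changes with $y$, so it cannot equal $f_1(x)$ for all $y$, which is the failure of independence described in the last paragraph of the summary.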
Exercises

1. Suppose that two random variables $X$ and $Y$ have the joint p.d.f. in Example 3.5.10 on page 139. Compute the conditional p.d.f. of $X$ given $Y = y$ for each $y$.

2. Each student in a certain high school was classified according to her year in school (freshman, sophomore, junior, or senior) and according to the number of times that she had visited a certain museum (never, once, or more than once). The proportions of students in the various classifications are given in the following table:

                 Never    Once    More than once
    Freshmen     0.08     0.10    0.04
    Sophomores   0.04     0.10    0.04
    Juniors      0.04     0.20    0.09
    Seniors      0.02     0.15    0.10

a. If a student selected at random from the high school is a junior, what is the probability that she has never visited the museum?
b. If a student selected at random from the high school has visited the museum three times, what is the probability that she is a senior?

3. Suppose that a point $(X, Y)$ is chosen at random from the disk $S$ defined as follows:
\[ S = \{(x, y) : (x - 1)^2 + (y + 2)^2 \le 9\}. \]
Determine (a) the conditional p.d.f. of $Y$ for every given value of $X$, and (b) $\Pr(Y > 0 \mid X = 2)$.

4. Suppose that the joint p.d.f. of two random variables $X$ and $Y$ is as follows:
\[ f(x, y) = \begin{cases} c(x + y^2) & \text{for } 0 \le x \le 1 \text{ and } 0 \le y \le 1, \\ 0 & \text{otherwise.} \end{cases} \]
Determine (a) the conditional p.d.f. of $X$ for every given value of $Y$, and (b) $\Pr(X < \frac{1}{2} \mid Y = \frac{1}{2})$.

5. Suppose that the joint p.d.f. of two points $X$ and $Y$ chosen by the process described in Example 3.6.10 is as given by Eq. (3.6.15). Determine (a) the conditional p.d.f. of $X$ for every given value of $Y$, and (b) $\Pr(X > \frac{1}{2} \mid Y = \frac{3}{4})$.

6. Suppose that the joint p.d.f. of two random variables $X$ and $Y$ is as follows:
\[ f(x, y) = \begin{cases} c \sin x & \text{for } 0 \le x \le \pi/2 \text{ and } 0 \le y \le 3, \\ 0 & \text{otherwise.} \end{cases} \]
Determine (a) the conditional p.d.f. of $Y$ for every given value of $X$, and (b) $\Pr(1 < Y < 2 \mid X = 0.73)$.

7. Suppose that the joint p.d.f. of two random variables $X$ and $Y$ is as follows:
\[ f(x, y) = \begin{cases} \frac{3}{16}(4 - 2x - y) & \text{for } x > 0,\ y > 0, \text{ and } 2x + y < 4, \\ 0 & \text{otherwise.} \end{cases} \]
Determine (a) the conditional p.d.f. of $Y$ for every given value of $X$, and (b) $\Pr(Y \ge 2 \mid X = 0.5)$.

8. Suppose that a person's score $X$ on a mathematics aptitude test is a number between 0 and 1, and that his score $Y$ on a music aptitude test is also a number between 0 and 1. Suppose further that in the population of all college students in the United States, the scores $X$ and $Y$ are distributed according to the following joint p.d.f.:
\[ f(x, y) = \begin{cases} \frac{2}{5}(2x + 3y) & \text{for } 0 \le x \le 1 \text{ and } 0 \le y \le 1, \\ 0 & \text{otherwise.} \end{cases} \]
a. What proportion of college students obtain a score greater than 0.8 on the mathematics test?
b. If a student's score on the music test is 0.3, what is the probability that his score on the mathematics test will be greater than 0.8?
c. If a student's score on the mathematics test is 0.3, what is the probability that his score on the music test will be greater than 0.8?

9. Suppose that either of two instruments might be used for making a certain measurement. Instrument 1 yields a measurement whose p.d.f. $h_1$ is
\[ h_1(x) = \begin{cases} 2x & \text{for } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases} \]
Instrument 2 yields a measurement whose p.d.f. $h_2$ is
\[ h_2(x) = \begin{cases} 3x^2 & \text{for } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases} \]
Suppose that one of the two instruments is chosen at random and a measurement $X$ is made with it.
a. Determine the marginal p.d.f. of $X$.
b. If the value of the measurement is $X = 1/4$, what is the probability that instrument 1 was used?

10. In a large collection of coins, the probability $X$ that a head will be obtained when a coin is tossed varies from one coin to another, and the distribution of $X$ in the collection is specified by the following p.d.f.:
\[ f_1(x) = \begin{cases} 6x(1 - x) & \text{for } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases} \]
Suppose that a coin is selected at random from the collection and tossed once, and that a head is obtained. Determine the conditional p.d.f. of $X$ for this coin.

11. The definition of the conditional p.d.f. of $X$ given $Y = y$ is arbitrary if $f_2(y) = 0$. The reason that this causes no serious problem is that it is highly unlikely that we will observe $Y$ close to a value $y_0$ such that $f_2(y_0) = 0$. To be more precise, let $f_2(y_0) = 0$, and let $A_0 = [y_0 - \epsilon, y_0 + \epsilon]$. Also, let $y_1$ be such that $f_2(y_1) > 0$, and let $A_1 = [y_1 - \epsilon, y_1 + \epsilon]$. Assume that $f_2$ is continuous at both $y_0$ and $y_1$. Show that
\[ \lim_{\epsilon \to 0} \frac{\Pr(Y \in A_0)}{\Pr(Y \in A_1)} = 0. \]
That is, the probability that $Y$ is close to $y_0$ is much smaller than the probability that $Y$ is close to $y_1$.

12. Let $Y$ be the rate (calls per hour) at which calls arrive at a switchboard. Let $X$ be the number of calls during a two-hour period. Suppose that the marginal p.d.f. of $Y$ is
\[ f_2(y) = \begin{cases} e^{-y} & \text{if } y > 0, \\ 0 & \text{otherwise,} \end{cases} \]
and that the conditional p.f. of $X$ given $Y = y$ is
\[ g_1(x \mid y) = \begin{cases} \dfrac{(2y)^x}{x!} e^{-2y} & \text{if } x = 0, 1, \ldots, \\ 0 & \text{otherwise.} \end{cases} \]
a. Find the marginal p.f. of $X$. (You may use the formula $\int_0^\infty y^k e^{-y}\, dy = k!$.)
b. Find the conditional p.d.f. $g_2(y \mid 0)$ of $Y$ given $X = 0$.
c. Find the conditional p.d.f. $g_2(y \mid 1)$ of $Y$ given $X = 1$.
d. For what values of $y$ is $g_2(y \mid 1) > g_2(y \mid 0)$? Does this agree with the intuition that the more calls you see, the higher you should think the rate is?

13. Start with the joint distribution of treatment group and response in Table 3.6 on page 138. For each treatment group, compute the conditional distribution of response given the treatment group. Do they appear to be very similar or quite different?

3.7 Multivariate Distributions

In this section, we shall extend the results that were developed in Sections 3.4, 3.5, and 3.6 for two random variables $X$ and $Y$ to an arbitrary finite number $n$ of random variables $X_1, \ldots, X_n$. In general, the joint distribution of more than two random variables is called a multivariate distribution.
The theory of statistical inference (the subject of the part of this book beginning with Chapter 7) relies on mathematical models for observable data in which each observation is a random variable. For this reason, multivariate distributions arise naturally in the mathematical models for data. The most commonly used model will be one in which the individual data random variables are conditionally independent given one or two other random variables.

Joint Distributions

Example 3.7.1 A Clinical Trial. Suppose that $m$ patients with a certain medical condition are given a treatment, and each patient either recovers from the condition or fails to recover. For each $i = 1, \ldots, m$, we can let $X_i = 1$ if patient $i$ recovers and $X_i = 0$ if not. We might also believe that there is a random variable $P$ having a continuous distribution taking values between 0 and 1 such that, if we knew that $P = p$, we would say that the $m$ patients recover or fail to recover independently of each other, each with probability $p$ of recovery. We now have named $n = m + 1$ random variables in which we are interested.

The situation described in Example 3.7.1 requires us to construct a joint distribution for $n$ random variables. We shall now provide definitions and examples of the important concepts needed to discuss multivariate distributions.

Definition 3.7.1 Joint Distribution Function/c.d.f. The joint c.d.f. of $n$ random variables $X_1, \ldots, X_n$ is the function $F$ whose value at every point $(x_1, \ldots, x_n)$ in $n$-dimensional space $R^n$ is specified by the relation
\[ F(x_1, \ldots, x_n) = \Pr(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n). \tag{3.7.1} \]
Every multivariate c.d.f. satisfies properties similar to those given earlier for univariate and bivariate c.d.f.'s.

Example 3.7.2 Failure Times. Suppose that a machine has three parts, and part $i$ will fail at time $X_i$ for $i = 1, 2, 3$. The following function might be the joint c.d.f. of $X_1$, $X_2$, and $X_3$:
\[ F(x_1, x_2, x_3) = \begin{cases} (1 - e^{-x_1})(1 - e^{-2x_2})(1 - e^{-3x_3}) & \text{for } x_1, x_2, x_3 \ge 0, \\ 0 & \text{otherwise.} \end{cases} \]

Vector Notation. In the study of the joint distribution of $n$ random variables $X_1, \ldots, X_n$, it is often convenient to use the vector notation $\boldsymbol{X} = (X_1, \ldots, X_n)$ and to refer to $\boldsymbol{X}$ as a random vector. Instead of speaking of the joint distribution of the random variables $X_1, \ldots, X_n$ with a joint c.d.f. $F(x_1, \ldots, x_n)$, we can simply speak of the distribution of the random vector $\boldsymbol{X}$ with c.d.f. $F(\boldsymbol{x})$. When this vector notation is used, it must be kept in mind that if $\boldsymbol{X}$ is an $n$-dimensional random vector, then its c.d.f. is defined as a function on $n$-dimensional space $R^n$. At each point $\boldsymbol{x} = (x_1, \ldots, x_n) \in R^n$, the value of $F(\boldsymbol{x})$ is specified by Eq. (3.7.1).

Definition 3.7.2 Joint Discrete Distribution/p.f. It is said that $n$ random variables $X_1, \ldots, X_n$ have a discrete joint distribution if the random vector $(X_1, \ldots, X_n)$ can have only a finite number or an infinite sequence of different possible values $(x_1, \ldots, x_n)$ in $R^n$. The joint p.f. of $X_1, \ldots, X_n$ is then defined as the function $f$ such that for every point $(x_1, \ldots, x_n) \in R^n$,
\[ f(x_1, \ldots, x_n) = \Pr(X_1 = x_1, \ldots, X_n = x_n). \]
In vector notation, Definition 3.7.2 says that the random vector $\boldsymbol{X}$ has a discrete distribution and that its p.f. is specified at every point $\boldsymbol{x} \in R^n$ by the relation
\[ f(\boldsymbol{x}) = \Pr(\boldsymbol{X} = \boldsymbol{x}). \]

The following result is a simple generalization of Theorem 3.4.2.

Theorem 3.7.1 If $\boldsymbol{X}$ has a joint discrete distribution with joint p.f. $f$, then for every subset $C \subset R^n$,
\[ \Pr(\boldsymbol{X} \in C) = \sum_{\boldsymbol{x} \in C} f(\boldsymbol{x}). \]

It is easy to show that, if each of $X_1, \ldots, X_n$ has a discrete distribution, then $\boldsymbol{X} = (X_1, \ldots, X_n)$ has a discrete joint distribution.

Example 3.7.3 A Clinical Trial. Consider the $m$ patients in Example 3.7.1. Suppose for now that $P = p$ is known so that we don't treat it as a random variable. The joint p.f. of $\boldsymbol{X} = (X_1, \ldots, X_m)$ is
\[ f(\boldsymbol{x}) = p^{x_1 + \cdots + x_m} (1 - p)^{m - x_1 - \cdots - x_m} \]
for all $x_i \in \{0, 1\}$ and 0 otherwise.
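Theorem 3.7.1 and the joint p.f. of Example 3.7.3 can be illustrated by direct enumeration. The sketch below assumes illustrative values $m = 4$ and $p = 0.3$; the function names `joint_pf` and `prob` are hypothetical helpers, not notation from the text. It sums $f(\boldsymbol{x})$ over all points of $\{0, 1\}^m$ that belong to a set $C$, where $C$ is described by a predicate.

```python
from itertools import product

def joint_pf(x, p):
    """Joint p.f. of Example 3.7.3 at the point x = (x_1, ..., x_m)."""
    if any(xi not in (0, 1) for xi in x):
        return 0.0
    successes = sum(x)
    return p ** successes * (1.0 - p) ** (len(x) - successes)

def prob(event, m, p):
    """Pr(X in C): sum of f(x) over the points x in {0, 1}^m for which event(x) is true."""
    return sum(joint_pf(x, p) for x in product((0, 1), repeat=m) if event(x))

m, p = 4, 0.3  # illustrative values (assumptions)

total = prob(lambda x: True, m, p)                         # sum over every possible point
one_of_first_two = prob(lambda x: x[0] + x[1] == 1, m, p)  # exactly one recovery among patients 1 and 2

print(total)             # close to 1, as any p.f. must be
print(one_of_first_two)  # close to 2 * p * (1 - p) = 0.42
```

Enumeration costs $2^m$ evaluations, so this approach is only meant for small $m$.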
Definition 3.7.3 Continuous Distribution/p.d.f. It is said that $n$ random variables $X_1, \ldots, X_n$ have a continuous joint distribution if there is a nonnegative function $f$ defined on $R^n$ such that for every subset $C \subset R^n$,
\[ \Pr[(X_1, \ldots, X_n) \in C] = \int \cdots \int_C f(x_1, \ldots, x_n)\, dx_1 \cdots dx_n, \tag{3.7.2} \]
if the integral exists. The function $f$ is called the joint p.d.f. of $X_1, \ldots, X_n$.

In vector notation, $f(\boldsymbol{x})$ denotes the p.d.f. of the random vector $\boldsymbol{X}$ and Eq. (3.7.2) could be rewritten more simply in the form
\[ \Pr(\boldsymbol{X} \in C) = \int \cdots \int_C f(\boldsymbol{x})\, d\boldsymbol{x}. \]

Theorem 3.7.2 If the joint distribution of $X_1, \ldots, X_n$ is continuous, then the joint p.d.f. $f$ can be derived from the joint c.d.f. $F$ by using the relation
\[ f(x_1, \ldots, x_n) = \frac{\partial^n F(x_1, \ldots, x_n)}{\partial x_1 \cdots \partial x_n} \]
at all points $(x_1, \ldots, x_n)$ at which the derivative in this relation exists.

Example 3.7.4 Failure Times. We can find the joint p.d.f. for the three random variables in Example 3.7.2 by applying Theorem 3.7.2. The third-order mixed partial is easily calculated to be
\[ f(x_1, x_2, x_3) = \begin{cases} 6 e^{-x_1 - 2x_2 - 3x_3} & \text{for } x_1, x_2, x_3 > 0, \\ 0 & \text{otherwise.} \end{cases} \]

It is important to note that, even if each of $X_1, \ldots, X_n$ has a continuous distribution, the vector $\boldsymbol{X} = (X_1, \ldots, X_n)$ might not have a continuous joint distribution. See Exercise 9 in this section.

Example 3.7.5 Service Times in a Queue. A queue is a system in which customers line up for service and receive their service according to some algorithm. A simple model is the single-server queue, in which all customers wait for a single server to serve everyone ahead of them in the line and then they get served. Suppose that $n$ customers arrive at a single-server queue for service. Let $X_i$ be the time that the server spends serving customer $i$ for $i = 1, \ldots, n$. We might use a joint distribution for $\boldsymbol{X} = (X_1, \ldots, X_n)$ with joint p.d.f. of the form
\[ f(\boldsymbol{x}) = \begin{cases} \dfrac{c}{\left(2 + \sum_{i=1}^{n} x_i\right)^{n+1}} & \text{for all } x_i > 0, \\ 0 & \text{otherwise.} \end{cases} \tag{3.7.3} \]
We shall now find the value of $c$ such that the function in Eq. (3.7.3) is a joint p.d.f. We can do this by integrating over each variable $x_1, \ldots, x_n$ in succession (starting with $x_n$). The first integral is
\[ \int_0^\infty \frac{c}{(2 + x_1 + \cdots + x_n)^{n+1}}\, dx_n = \frac{c/n}{(2 + x_1 + \cdots + x_{n-1})^{n}}. \tag{3.7.4} \]
The right-hand side of Eq. (3.7.4) is in the same form as the original p.d.f. except that $n$ has been reduced to $n - 1$ and $c$ has been divided by $n$. It follows that when we integrate over the variable $x_i$ (for $i = n - 1, n - 2, \ldots, 1$), the result will be in the same form with $n$ reduced to $i - 1$ and $c$ divided by $n(n - 1) \cdots i$. The result of integrating all coordinates except $x_1$ is then
\[ \frac{c/n!}{(2 + x_1)^2} \quad \text{for } x_1 > 0. \]
Integrating $x_1$ out of this yields $c/[2(n!)]$, which must equal 1, so $c = 2(n!)$.

Mixed Distributions

Example 3.7.6 Arrivals at a Queue. In Example 3.7.5, we introduced the single-server queue and discussed service times. Some features that influence the performance of a queue are the rate at which customers arrive and the rate at which customers are served. Let $Z$ stand for the rate at which customers are served, and let $Y$ stand for the rate at which customers arrive at the queue. Finally, let $W$ stand for the number of customers that arrive during one day. Then $W$ is discrete while $Y$ and $Z$ could be continuous random variables. A possible joint p.f./p.d.f. for these three random variables is
\[ f(y, z, w) = \begin{cases} 6 e^{-3z - 10y} (8y)^w / w! & \text{for } z, y > 0 \text{ and } w = 0, 1, \ldots, \\ 0 & \text{otherwise.} \end{cases} \]
We shall verify this claim shortly.
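Before the analytic verification, the claim in Example 3.7.6 can be given a rough numerical check. The sketch below rests on several assumptions made only for illustration: the truncation points `Z_MAX`, `Y_MAX`, and `W_MAX`, the grid sizes, and the helper names. It uses the observation that the proposed function is the product of $6e^{-3z}$ and $e^{-10y}(8y)^w/w!$, so the integral over $z$ and the sum and integral over $(y, w)$ can be approximated separately and then multiplied.

```python
import math

def h2(z):
    """Factor of the proposed p.f./p.d.f. that depends only on z."""
    return 6.0 * math.exp(-3.0 * z) if z > 0.0 else 0.0

def h13(y, w):
    """Factor that depends on (y, w), evaluated on the log scale to avoid overflow."""
    if y <= 0.0 or w < 0:
        return 0.0
    return math.exp(-10.0 * y + w * math.log(8.0 * y) - math.lgamma(w + 1))

def midpoint(func, a, b, n):
    """Midpoint-rule approximation of the integral of func over (a, b)."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

# Truncation points for the infinite ranges (assumptions chosen so that the
# neglected tails are negligible for this particular function).
Z_MAX, Y_MAX, W_MAX = 15.0, 20.0, 120

z_part = midpoint(h2, 0.0, Z_MAX, 15000)
yw_part = sum(
    midpoint(lambda y, w=w: h13(y, w), 0.0, Y_MAX, 4000)
    for w in range(W_MAX + 1)
)

print(z_part)            # approximately 2
print(yw_part)           # approximately 1/2
print(z_part * yw_part)  # approximately 1
```

A product close to 1 is consistent with the function being a joint p.f./p.d.f., although a numerical check of this kind is not a proof.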
Definition 3.7.4 Joint p.f./p.d.f. Let $X_1, \ldots, X_n$ be random variables, some of which have a continuous joint distribution and some of which have discrete distributions; their joint distribution would then be represented by a function $f$ that we call the joint p.f./p.d.f. The function has the property that the probability that $\boldsymbol{X}$ lies in a subset $C \subset R^n$ is calculated by summing $f(\boldsymbol{x})$ over the values of the coordinates of $\boldsymbol{x}$ that correspond to the discrete random variables and integrating over those coordinates that correspond to the continuous random variables for all points $\boldsymbol{x} \in C$.

Example 3.7.7 Arrivals at a Queue. We shall now verify that the proposed p.f./p.d.f. in Example 3.7.6 actually sums and integrates to 1 over all values of $(y, z, w)$. We must sum over $w$ and integrate over $y$ and $z$. We may choose the order in which to do so. It is not difficult to see that we can factor $f$ as $f(y, z, w) = h_2(z) h_{13}(y, w)$, where
\[ h_2(z) = \begin{cases} 6 e^{-3z} & \text{for } z > 0, \\ 0 & \text{otherwise,} \end{cases} \]
\[ h_{13}(y, w) = \begin{cases} e^{-10y} (8y)^w / w! & \text{for } y > 0 \text{ and } w = 0, 1, \ldots, \\ 0 & \text{otherwise.} \end{cases} \]
So we can integrate $z$ out first to get
\[ \int_{-\infty}^{\infty} f(y, z, w)\, dz = h_{13}(y, w) \int_0^\infty 6 e^{-3z}\, dz = 2 h_{13}(y, w). \]
Integrating $y$ out of $h_{13}(y, w)$ is possible, but not pleasant. Instead, notice that $(8y)^w / w!$ is the $w$th term in the Taylor expansion of $e^{8y}$. Hence,
\[ \sum_{w=0}^{\infty} 2 h_{13}(y, w) = 2 e^{-10y} \sum_{w=0}^{\infty} \frac{(8y)^w}{w!} = 2 e^{-10y} e^{8y} = 2 e^{-2y}, \]
for $y > 0$ and 0 otherwise. Finally, integrating over $y$ yields 1.

Example 3.7.8 A Clinical Trial. In Example 3.7.1, one of the random variables $P$ has a continuous distribution, and the others $X_1, \ldots, X_m$ have discrete distributions. A possible joint p.f./p.d.f. for $(X_1, \ldots, X_m, P)$ is
\[ f(\boldsymbol{x}, p) = \begin{cases} p^{x_1 + \cdots + x_m} (1 - p)^{m - x_1 - \cdots - x_m} & \text{for all } x_i \in \{0, 1\} \text{ and } 0 \le p \le 1, \\ 0 & \text{otherwise.} \end{cases} \]
We can find probabilities based on this function. Suppose, for example, that we want the probability that there is exactly one success among the first two patients, that is, $\Pr(X_1 + X_2 = 1)$. We must integrate $f(\boldsymbol{x}, p)$ over $p$ and sum over all values of $\boldsymbol{x}$ that have $x_1 + x_2 = 1$. For purposes of illustration, suppose that $m = 4$. First, factor out $p^{x_1 + x_2}(1 - p)^{2 - x_1 - x_2} = p(1 - p)$, which yields
\[ f(\boldsymbol{x}, p) = [p(1 - p)]\, p^{x_3 + x_4} (1 - p)^{2 - x_3 - x_4}, \]
for $x_3, x_4 \in \{0, 1\}$, $0 < p < 1$, and $x_1 + x_2 = 1$. Summing over $x_3$ yields
\[ [p(1 - p)] \left( p^{x_4}(1 - p)^{1 - x_4}(1 - p) + p \, p^{x_4}(1 - p)^{1 - x_4} \right) = [p(1 - p)]\, p^{x_4}(1 - p)^{1 - x_4}. \]
Summing this over $x_4$ gives $p(1 - p)$. Next, integrate over $p$ to get $\int_0^1 p(1 - p)\, dp = 1/6$. Finally, note that there are two $(x_1, x_2)$ vectors, $(1, 0)$ and $(0, 1)$, that have $x_1 + x_2 = 1$, so $\Pr(X_1 + X_2 = 1) = (1/6) + (1/6) = 1/3$.

Marginal Distributions

Deriving a Marginal p.d.f. If the joint distribution of $n$ random variables $X_1, \ldots, X_n$ is known, then the marginal distribution of each single random variable $X_i$ can be derived from this joint distribution. For example, if the joint p.d.f. of $X_1, \ldots, X_n$ is $f$, then the marginal p.d.f. $f_1$ of $X_1$ is specified at every value $x_1$ by the relation
\[ f_1(x_1) = \underbrace{\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}}_{n-1} f(x_1, \ldots, x_n)\, dx_2 \cdots dx_n. \]
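As a rough numerical illustration of this relation, the sketch below integrates $x_2$ out of the joint p.d.f. of Eq. (3.7.3) with $n = 2$ and $c = 2(2!) = 4$, the constant obtained in Example 3.7.5. The change of variables $x = t/(1 - t)$, the midpoint rule, and all function names are choices made here for illustration only. By Eq. (3.7.4), the result should be close to $2/(2 + x_1)^2$.

```python
def joint_pdf(x1, x2):
    """Joint p.d.f. of Eq. (3.7.3) with n = 2 and c = 2(2!) = 4."""
    if x1 <= 0.0 or x2 <= 0.0:
        return 0.0
    return 4.0 / (2.0 + x1 + x2) ** 3

def integrate_0_inf(func, n=20000):
    """Approximate the integral of func over (0, infinity).

    The substitution x = t / (1 - t), with dx = dt / (1 - t)^2, maps the
    interval (0, 1) onto (0, infinity); the midpoint rule is then applied in t.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x = t / (1.0 - t)
        total += func(x) / (1.0 - t) ** 2
    return h * total

def marginal_x1(x1):
    """Marginal p.d.f. of X_1, obtained by integrating x_2 out of the joint p.d.f."""
    return integrate_0_inf(lambda x2: joint_pdf(x1, x2))

x1 = 1.0
print(marginal_x1(x1))         # numerical marginal at x1 = 1
print(2.0 / (2.0 + x1) ** 2)   # closed form from Eq. (3.7.4) with n = 2, about 0.2222
```

With more than two variables the same idea applies one coordinate at a time, although nested numerical integration becomes expensive quickly.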
More generally, the marginal joint p.d.f. of any $k$ of the $n$ random variables $X_1, \ldots, X_n$ can be found by integrating the joint p.d.f. over all possible values of the other $n - k$ variables. For example, if $f$ is the joint p.d.f. of four random variables $X_1$, $X_2$, $X_3$, and $X_4$, then the marginal bivariate p.d.f. $f_{24}$ of $X_2$ and $X_4$ is specified at each point $(x_2, x_4)$ by the relation

$$f_{24}(x_2, x_4) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x_1, x_2, x_3, x_4)\,dx_1\,dx_3.$$

Example 3.7.9 Service Times in a Queue. Suppose that $n = 5$ in Example 3.7.5 and that we want the marginal bivariate p.d.f. of $(X_1, X_4)$. We must integrate Eq. (3.7.3) over $x_2$, $x_3$, and $x_5$. Since the joint p.d.f. is symmetric with respect to permutations of the coordinates of $\boldsymbol{x}$, we shall just integrate over the last three variables and then change the names of the remaining variables to $x_1$ and $x_4$. We already saw how to do this in Example 3.7.5. The result is

$$f_{12}(x_1, x_2) = \begin{cases}\dfrac{4}{(2 + x_1 + x_2)^3} & \text{for } x_1, x_2 > 0,\\[1ex] 0 & \text{otherwise.}\end{cases}\tag{3.7.5}$$

Then $f_{14}$ is just like (3.7.5) with all the 2 subscripts changed to 4.
The univariate marginal p.d.f. of each $X_i$ is

$$f_i(x_i) = \begin{cases}\dfrac{2}{(2 + x_i)^2} & \text{for } x_i > 0,\\[1ex] 0 & \text{otherwise.}\end{cases}\tag{3.7.6}$$

So, for example, if we want to know how likely it is that a customer will have to wait longer than three time units, we can calculate $\Pr(X_i > 3)$ by integrating the function in Eq. (3.7.6) from 3 to $\infty$. The result is 0.4. ◁

If $n$ random variables $X_1, \ldots, X_n$ have a discrete joint distribution, then the marginal joint p.f. of each subset of the $n$ variables can be obtained from relations similar to those for continuous distributions. In the new relations, the integrals are replaced by sums.

Deriving a Marginal c.d.f. Consider now a joint distribution for which the joint c.d.f. of $X_1, \ldots, X_n$ is $F$. The marginal c.d.f. $F_1$ of $X_1$ can be obtained from the following relation:

$$F_1(x_1) = \Pr(X_1 \le x_1) = \Pr(X_1 \le x_1, X_2 < \infty, \ldots, X_n < \infty) = \lim_{x_2, \ldots, x_n \to \infty} F(x_1, x_2, \ldots, x_n).$$
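The limit relation for a marginal c.d.f. can be illustrated with the service-time p.d.f. of Eq. (3.7.5). The sketch below is an illustration added here, not part of the text. It assumes the closed form $F_{12}(x_1, x_2) = 1 - 2/(2+x_1) - 2/(2+x_2) + 2/(2+x_1+x_2)$ for $x_1, x_2 > 0$, which was obtained for this sketch by integrating Eq. (3.7.5) over $(0, x_1) \times (0, x_2)$. Letting $x_2$ grow should recover the c.d.f. implied by Eq. (3.7.6).

```python
def joint_cdf_12(x1: float, x2: float) -> float:
    """Bivariate c.d.f. derived for this sketch by integrating Eq. (3.7.5)."""
    if x1 <= 0 or x2 <= 0:
        return 0.0
    return 1 - 2 / (2 + x1) - 2 / (2 + x2) + 2 / (2 + x1 + x2)


def marginal_cdf_1(x1: float) -> float:
    """Univariate c.d.f. obtained by integrating Eq. (3.7.6) from 0 to x1."""
    return 1 - 2 / (2 + x1) if x1 > 0 else 0.0


# Letting x2 grow recovers the marginal c.d.f. of X1 at x1 = 3.
for x2 in (1e2, 1e4, 1e6):
    print(x2, joint_cdf_12(3.0, x2))

# Tail probability Pr(X1 > 3); the text reports 0.4.
print(1 - marginal_cdf_1(3.0))
```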
Example 3.7.10 Failure Times. We can find the marginal c.d.f. of $X_1$ from the joint c.d.f. in Example 3.7.2 by letting $x_2$ and $x_3$ go to $\infty$. The limit is $F_1(x_1) = 1 - e^{-x_1}$ for $x_1 \ge 0$ and 0 otherwise. ◁

More generally, the marginal joint c.d.f. of any $k$ of the $n$ random variables $X_1, \ldots, X_n$ can be found by computing the limiting value of the $n$-dimensional c.d.f. $F$ as $x_j \to \infty$ for each of the other $n - k$ variables $x_j$. For example, if $F$ is the joint c.d.f. of four random variables $X_1$, $X_2$, $X_3$, and $X_4$, then the marginal bivariate c.d.f. $F_{24}$ of $X_2$ and $X_4$ is specified at every point $(x_2, x_4)$ by the relation

$$F_{24}(x_2, x_4) = \lim_{x_1, x_3 \to \infty} F(x_1, x_2, x_3, x_4).$$

Example 3.7.11 Failure Times. We can find the marginal bivariate c.d.f. of $X_1$ and $X_3$ from the joint c.d.f. in Example 3.7.2 by letting $x_2$ go to $\infty$. The limit is

$$F_{13}(x_1, x_3) = \begin{cases}(1 - e^{-x_1})(1 - e^{-3x_3}) & \text{for } x_1, x_3 \ge 0,\\ 0 & \text{otherwise.}\end{cases}$$ ◁

Independent Random Variables
Definition 3.7.5 Independent Random Variables. It is said that $n$ random variables $X_1, \ldots, X_n$ are independent if, for every $n$ sets $A_1, A_2, \ldots, A_n$ of real numbers,

$$\Pr(X_1 \in A_1, X_2 \in A_2, \ldots, X_n \in A_n) = \Pr(X_1 \in A_1)\Pr(X_2 \in A_2)\cdots\Pr(X_n \in A_n).$$

If $X_1, \ldots, X_n$ are independent, it follows easily that the random variables in every nonempty subset of $X_1, \ldots, X_n$ are also independent. (See Exercise 11.)

There is a generalization of Theorem 3.5.4.

Theorem 3.7.3 Let $F$ denote the joint c.d.f. of $X_1, \ldots, X_n$, and let $F_i$ denote the marginal univariate c.d.f. of $X_i$ for $i = 1, \ldots, n$. The variables $X_1, \ldots, X_n$ are independent if and only if, for all points $(x_1, x_2, \ldots, x_n) \in R^n$,

$$F(x_1, x_2, \ldots, x_n) = F_1(x_1)F_2(x_2)\cdots F_n(x_n).$$

Theorem 3.7.3 says that $X_1, \ldots, X_n$ are independent if and only if their joint c.d.f. is the product of their $n$ individual marginal c.d.f.'s. It is easy to check that the three random variables in Example 3.7.2 are independent using Theorem 3.7.3.

There is also a generalization of Corollary 3.5.1.
Theorem 3.7.4 If $X_1, \ldots, X_n$ have a continuous, discrete, or mixed joint distribution for which the joint p.d.f., joint p.f., or joint p.f./p.d.f. is $f$, and if $f_i$ is the marginal univariate p.d.f. or p.f. of $X_i$ $(i = 1, \ldots, n)$, then $X_1, \ldots, X_n$ are independent if and only if the following relation is satisfied at all points $(x_1, x_2, \ldots, x_n) \in R^n$:

$$f(x_1, x_2, \ldots, x_n) = f_1(x_1)f_2(x_2)\cdots f_n(x_n).\tag{3.7.7}$$

Example 3.7.12 Service Times in a Queue. In Example 3.7.9, we can multiply together the two univariate marginal p.d.f.'s of $X_1$ and $X_2$ calculated using Eq. (3.7.6) and see that the product does not equal the bivariate marginal p.d.f. of $(X_1, X_2)$ in Eq. (3.7.5). So $X_1$ and $X_2$ are not independent. ◁
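The comparison in Example 3.7.12 can be made concrete by evaluating both sides of Eq. (3.7.7) at a single point. The sketch below is an added illustration; the test point $(1, 1)$ is an arbitrary choice, not one used in the text. Because both functions are continuous for positive arguments, a disagreement at one such point is enough to rule out the factorization.

```python
def f12(x1: float, x2: float) -> float:
    """Bivariate marginal p.d.f. of (X1, X2), Eq. (3.7.5)."""
    return 4 / (2 + x1 + x2) ** 3 if x1 > 0 and x2 > 0 else 0.0


def f_marginal(x: float) -> float:
    """Univariate marginal p.d.f. of each Xi, Eq. (3.7.6)."""
    return 2 / (2 + x) ** 2 if x > 0 else 0.0


x1, x2 = 1.0, 1.0                               # arbitrary illustrative point
joint = f12(x1, x2)                             # left side of Eq. (3.7.7)
factored = f_marginal(x1) * f_marginal(x2)      # right side of Eq. (3.7.7)
print(joint, factored)                          # the two values differ
```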
Definition 3.7.6 Random Samples/i.i.d./Sample Size. Consider a given probability distribution on the real line that can be represented by either a p.f. or a p.d.f. $f$. It is said that $n$ random variables $X_1, \ldots, X_n$ form a random sample from this distribution if these random variables are independent and the marginal p.f. or p.d.f. of each of them is $f$. Such random variables are also said to be independent and identically distributed, abbreviated i.i.d. We refer to the number $n$ of random variables as the sample size.

Definition 3.7.6 says that $X_1, \ldots, X_n$ form a random sample from the distribution represented by $f$ if their joint p.f. or p.d.f. $g$ is specified as follows at all points $(x_1, x_2, \ldots, x_n) \in R^n$:

$$g(x_1, \ldots, x_n) = f(x_1)f(x_2)\cdots f(x_n).$$

Clearly, an i.i.d. sample cannot have a mixed joint distribution.

Example 3.7.13 Lifetimes of Light Bulbs. Suppose that the lifetime of each light bulb produced in a certain factory is distributed according to the following p.d.f.:

$$f(x) = \begin{cases} xe^{-x} & \text{for } x > 0,\\ 0 & \text{otherwise.}\end{cases}$$

We shall determine the joint p.d.f. of the lifetimes of a random sample of $n$ light bulbs drawn from the factory's production.
The lifetimes $X_1, \ldots, X_n$ of the selected bulbs will form a random sample from the p.d.f. $f$. For typographical simplicity, we shall use the notation $\exp(v)$ to denote the exponential $e^v$ when the expression for $v$ is complicated. Then the joint p.d.f. $g$ of $X_1, \ldots, X_n$ will be as follows: If $x_i > 0$ for $i = 1, \ldots, n$,

$$g(x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i) = \left(\prod_{i=1}^{n} x_i\right)\exp\left(-\sum_{i=1}^{n} x_i\right).$$

Otherwise, $g(x_1, \ldots, x_n) = 0$.

Every probability involving the $n$ lifetimes $X_1, \ldots, X_n$ can in principle be determined by integrating this joint p.d.f. over the appropriate subset of $R^n$. For example, if $C$ is the subset of points $(x_1, \ldots, x_n)$ such that $x_i > 0$ for $i = 1, \ldots, n$ and $\sum_{i=1}^{n} x_i < a$, where $a$ is a given positive number, then

$$\Pr\left(\sum_{i=1}^{n} X_i < a\right) = \int\cdots\int_{C}\left(\prod_{i=1}^{n} x_i\right)\exp\left(-\sum_{i=1}^{n} x_i\right)dx_1\cdots dx_n.$$ ◁

The evaluation of the integral given at the end of Example 3.7.13 may require a considerable amount of time without the aid of tables or a computer.
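With a computer the integral is manageable. The sketch below is an added illustration, not the text's method. It makes two assumptions that are not stated in this section: each lifetime with p.d.f. $xe^{-x}$ is gamma distributed with shape 2 and rate 1, and the sum of $n$ such independent lifetimes is gamma distributed with integer shape $2n$, whose c.d.f. is $1 - e^{-a}\sum_{j=0}^{2n-1} a^j/j!$. For $n = 2$ the closed form is compared with a crude midpoint-rule evaluation of the double integral over the region $C$.

```python
import math


def prob_sum_below(a: float, n: int) -> float:
    """Pr(X1 + ... + Xn < a), assuming the sum is gamma with integer shape 2n."""
    if a <= 0:
        return 0.0
    partial = sum(a**j / math.factorial(j) for j in range(2 * n))
    return 1.0 - math.exp(-a) * partial


def grid_integral_n2(a: float, steps: int = 400) -> float:
    """Midpoint-rule value of the integral of x1*x2*exp(-(x1+x2)) over C when n = 2."""
    h = a / steps
    total = 0.0
    for i in range(steps):
        x1 = (i + 0.5) * h
        for j in range(steps - i):           # cells beyond this lie outside C
            x2 = (j + 0.5) * h
            if x1 + x2 < a:                  # keep only midpoints inside C
                total += x1 * x2 * math.exp(-(x1 + x2))
    return total * h * h


print(prob_sum_below(5.0, 2), grid_integral_n2(5.0))
```

The grid treats cells on the boundary $x_1 + x_2 = a$ roughly, so the two numbers should agree only to about three decimal places.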
Certain other probabilities, however, can be evaluated easily from the basic properties of continuous distributions and random samples. For example, suppose that for the conditions of Example 3.7.13 it is desired to find $\Pr(X_1 < X_2 < \cdots < X_n)$. Since the random variables $X_1, \ldots, X_n$ have a continuous joint distribution, the probability that at least two of these random variables will have the same value is 0. In fact, the probability is 0 that the vector $(X_1, \ldots, X_n)$ will belong to each specific subset of $R^n$ for which the $n$-dimensional volume is 0. Furthermore, since $X_1, \ldots, X_n$ are independent and identically distributed, each of these variables is equally likely to be the smallest of the $n$ lifetimes, and each is equally likely to be the largest. More generally, if the lifetimes $X_1, \ldots, X_n$ are arranged in order from the smallest to the largest, each particular ordering of $X_1, \ldots, X_n$ is as likely to be obtained as any other ordering. Since there are $n!$ different possible orderings, the probability that the particular ordering $X_1 < X_2 < \cdots < X_n$ will be obtained is $1/n!$. Hence,

$$\Pr(X_1 < X_2 < \cdots < X_n) = \frac{1}{n!}.$$
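The symmetry argument can be checked by simulation. The sketch below is an added illustration with arbitrary trial counts and seed. It assumes that lifetimes with p.d.f. $xe^{-x}$ can be drawn with the standard-library call `random.gammavariate(2.0, 1.0)` (shape 2, scale 1), and it estimates how often a sample of size $n$ happens to come out in increasing order.

```python
import math
import random


def estimate_increasing_order(n: int, trials: int = 100_000, seed: int = 1) -> float:
    """Fraction of simulated i.i.d. samples of size n that are already increasing."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        sample = [rng.gammavariate(2.0, 1.0) for _ in range(n)]  # p.d.f. x*exp(-x)
        if all(sample[i] < sample[i + 1] for i in range(n - 1)):
            count += 1
    return count / trials


n = 3
print(estimate_increasing_order(n), 1 / math.factorial(n))
```

The estimate should be close to $1/n!$ for any continuous lifetime distribution, since the argument uses only the i.i.d. property.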
Conditional Distributions

Suppose that $n$ random variables $X_1, \ldots, X_n$ have a continuous joint distribution for which the joint p.d.f. is $f$ and that $f_0$ denotes the marginal joint p.d.f. of the $k < n$ random variables $X_1, \ldots, X_k$. Then for all values of $x_1, \ldots, x_k$ such that $f_0(x_1, \ldots, x_k) > 0$, the conditional p.d.f. of $(X_{k+1}, \ldots, X_n)$ given that $X_1 = x_1, \ldots, X_k = x_k$ is defined as follows:

$$g_{k+1\cdots n}(x_{k+1}, \ldots, x_n \mid x_1, \ldots, x_k) = \frac{f(x_1, x_2, \ldots, x_n)}{f_0(x_1, \ldots, x_k)}.$$

The definition above generalizes to arbitrary joint distributions as follows.

Definition 3.7.7 Conditional p.f., p.d.f., or p.f./p.d.f. Suppose that the random vector $\boldsymbol{X} = (X_1, \ldots, X_n)$ is divided into two subvectors $\boldsymbol{Y}$ and $\boldsymbol{Z}$, where $\boldsymbol{Y}$ is a $k$-dimensional random vector comprising $k$ of the $n$ random variables in $\boldsymbol{X}$, and $\boldsymbol{Z}$ is an $(n - k)$-dimensional random vector comprising the other $n - k$ random variables in $\boldsymbol{X}$.
Suppose also that the $n$-dimensional joint p.f., p.d.f., or p.f./p.d.f. of $(\boldsymbol{Y}, \boldsymbol{Z})$ is $f$ and that the marginal $(n - k)$-dimensional p.f., p.d.f., or p.f./p.d.f. of $\boldsymbol{Z}$ is $f_2$. Then for every given point $\boldsymbol{z} \in R^{n-k}$ such that $f_2(\boldsymbol{z}) > 0$, the conditional $k$-dimensional p.f., p.d.f., or p.f./p.d.f. $g_1$ of $\boldsymbol{Y}$ given $\boldsymbol{Z} = \boldsymbol{z}$ is defined as follows:

$$g_1(\boldsymbol{y} \mid \boldsymbol{z}) = \frac{f(\boldsymbol{y}, \boldsymbol{z})}{f_2(\boldsymbol{z})}\quad\text{for } \boldsymbol{y} \in R^k.\tag{3.7.8}$$

Eq. (3.7.8) can be rewritten as

$$f(\boldsymbol{y}, \boldsymbol{z}) = g_1(\boldsymbol{y} \mid \boldsymbol{z})\,f_2(\boldsymbol{z}),\tag{3.7.9}$$

which allows construction of the joint distribution from a conditional distribution and a marginal distribution. As in the bivariate case, it is safe to assume that $f(\boldsymbol{y}, \boldsymbol{z}) = 0$ whenever $f_2(\boldsymbol{z}) = 0$. Then Eq. (3.7.9) holds for all $\boldsymbol{y}$ and $\boldsymbol{z}$ even though $g_1(\boldsymbol{y} \mid \boldsymbol{z})$ is not uniquely defined.

Example 3.7.14 Service Times in a Queue. In Example 3.7.9, we calculated the marginal bivariate distribution of two service times $\boldsymbol{Z} = (X_1, X_2)$.
We can now find the conditional three-dimensional p.d.f. of $\boldsymbol{Y} = (X_3, X_4, X_5)$ given $\boldsymbol{Z} = (x_1, x_2)$ for every pair $(x_1, x_2)$ such that $x_1, x_2 > 0$:

$$g_1(x_3, x_4, x_5 \mid x_1, x_2) = \frac{f(x_1, \ldots, x_5)}{f_{12}(x_1, x_2)} = \left(\frac{240}{(2 + x_1 + \cdots + x_5)^6}\right)\left(\frac{4}{(2 + x_1 + x_2)^3}\right)^{-1} = \frac{60(2 + x_1 + x_2)^3}{(2 + x_1 + \cdots + x_5)^6},\tag{3.7.10}$$

for $x_3, x_4, x_5 > 0$, and 0 otherwise. The joint p.d.f. in (3.7.10) looks like a bunch of symbols, but it can be quite useful. Suppose that we observe $X_1 = 4$ and $X_2 = 6$. Then

$$g_1(x_3, x_4, x_5 \mid 4, 6) = \begin{cases}\dfrac{103{,}680}{(12 + x_3 + x_4 + x_5)^6} & \text{for } x_3, x_4, x_5 > 0,\\[1ex] 0 & \text{otherwise.}\end{cases}$$

We can now calculate the conditional probability that $X_3 > 3$ given $X_1 = 4$, $X_2 = 6$:

$$\begin{aligned}\Pr(X_3 > 3 \mid X_1 = 4, X_2 = 6) &= \int_3^{\infty}\int_0^{\infty}\int_0^{\infty}\frac{103{,}680}{(12 + x_3 + x_4 + x_5)^6}\,dx_5\,dx_4\,dx_3\\ &= \int_3^{\infty}\int_0^{\infty}\frac{20{,}736}{(12 + x_3 + x_4)^5}\,dx_4\,dx_3\\ &= \int_3^{\infty}\frac{5184}{(12 + x_3)^4}\,dx_3 = \frac{1728}{15^3} = 0.512.\end{aligned}$$

Compare this to the calculation of $\Pr(X_3 > 3) = 0.4$ at the end of Example 3.7.9.
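The nested integration can be reproduced for any observed pair $(x_1, x_2)$. The sketch below is an added illustration. With $c = 2 + x_1 + x_2$, integrating $x_5$ and then $x_4$ out of Eq. (3.7.10) leaves $3c^3/(c + x_3)^4$, and integrating that from $t$ to $\infty$ gives $(c/(c+t))^3$. These closed forms were worked out for the sketch and match the intermediate constants shown in the example when $c = 12$. The truncation point and step count in the numerical check are arbitrary choices.

```python
def conditional_tail(t: float, x1: float, x2: float) -> float:
    """Pr(X3 > t | X1 = x1, X2 = x2) implied by Eq. (3.7.10)."""
    c = 2 + x1 + x2
    return (c / (c + t)) ** 3


def numeric_tail(t: float, x1: float, x2: float,
                 upper: float = 10_000.0, steps: int = 200_000) -> float:
    """Midpoint-rule check of the final one-dimensional integral, truncated at `upper`."""
    c = 2 + x1 + x2
    h = (upper - t) / steps
    total = 0.0
    for i in range(steps):
        x3 = t + (i + 0.5) * h
        total += 3 * c**3 / (c + x3) ** 4   # equals 5184 / (12 + x3)**4 when c = 12
    return total * h


print(conditional_tail(3.0, 4.0, 6.0))   # about 0.512
print(numeric_tail(3.0, 4.0, 6.0))       # numerical check of the same value
```

Other observed service times can be passed to `conditional_tail` to see how the conditional probability moves with $x_1 + x_2$.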
After learning that the first two service times are a bit longer than three time units, we revise the probability that $X_3 > 3$ upward to reflect what we learned from the first two observations. If the first two service times had been small, the conditional probability that $X_3 > 3$ would have been smaller than 0.4. For example, $\Pr(X_3 > 3 \mid X_1 = 1, X_2 = 1.5) = 0.216$. ◁

Example 3.7.15 Determining a Marginal Bivariate p.d.f. Suppose that $Z$ is a random variable for which the p.d.f. $f_0$ is as follows:

$$f_0(z) = \begin{cases} 2e^{-2z} & \text{for } z > 0,\\ 0 & \text{otherwise.}\end{cases}\tag{3.7.11}$$

Suppose, furthermore, that for every given value $Z = z > 0$ two other random variables $X_1$ and $X_2$ are independent and identically distributed and the conditional p.d.f. of each of these variables is as follows:

$$g(x \mid z) = \begin{cases} ze^{-zx} & \text{for } x > 0,\\ 0 & \text{otherwise.}\end{cases}\tag{3.7.12}$$

We shall determine the marginal joint p.d.f. of $(X_1, X_2)$.
Since $X_1$ and $X_2$ are i.i.d. for each given value of $Z$, their conditional joint p.d.f. when $Z = z > 0$ is

$$g_{12}(x_1, x_2 \mid z) = \begin{cases} z^2 e^{-z(x_1 + x_2)} & \text{for } x_1, x_2 > 0,\\ 0 & \text{otherwise.}\end{cases}$$

The joint p.d.f. $f$ of $(Z, X_1, X_2)$ will be positive only at those points $(z, x_1, x_2)$ such that $x_1, x_2, z > 0$. It now follows that, at every such point,

$$f(z, x_1, x_2) = f_0(z)\,g_{12}(x_1, x_2 \mid z) = 2z^2 e^{-z(2 + x_1 + x_2)}.$$

For $x_1 > 0$ and $x_2 > 0$, the marginal joint p.d.f. $f_{12}(x_1, x_2)$ of $X_1$ and $X_2$ can be determined either using integration by parts or some special results that will arise in Sec. 5.7:

$$f_{12}(x_1, x_2) = \int_0^{\infty} f(z, x_1, x_2)\,dz = \frac{4}{(2 + x_1 + x_2)^3},$$

for $x_1, x_2 > 0$. The reader will note that this p.d.f. is the same as the marginal bivariate p.d.f. of $(X_1, X_2)$ found in Eq. (3.7.5).

From this marginal bivariate p.d.f., we can evaluate probabilities involving $X_1$ and $X_2$, such as $\Pr(X_1 + X_2 < 4)$. We have

$$\Pr(X_1 + X_2 < 4) = \int_0^4\int_0^{4 - x_2}\frac{4}{(2 + x_1 + x_2)^3}\,dx_1\,dx_2 = \frac{4}{9}.$$ ◁

Example 3.7.16 Service Times in a Queue. We can think of the random variable $Z$ in Example 3.7.15 as the rate at which customers are served in the queue of Example 3.7.5.
With this interpretation, it is useful to find the conditional distribution of the rate $Z$ after we observe some of the service times such as $X_1$ and $X_2$.

For every value of $z$, the conditional p.d.f. of $Z$ given $X_1 = x_1$ and $X_2 = x_2$ is

$$g_0(z \mid x_1, x_2) = \frac{f(z, x_1, x_2)}{f_{12}(x_1, x_2)} = \begin{cases}\tfrac{1}{2}(2 + x_1 + x_2)^3 z^2 e^{-z(2 + x_1 + x_2)} & \text{for } z > 0,\\ 0 & \text{otherwise.}\end{cases}\tag{3.7.13}$$

Finally, we shall evaluate $\Pr(Z \le 1 \mid X_1 = 1, X_2 = 4)$. We have

$$\Pr(Z \le 1 \mid X_1 = 1, X_2 = 4) = \int_0^1 g_0(z \mid 1, 4)\,dz = \int_0^1 171.5\,z^2 e^{-7z}\,dz = 0.9704.$$ ◁

Law of Total Probability and Bayes' Theorem. Example 3.7.15 contains an example of the multivariate version of the law of total probability, while Example 3.7.16 contains an example of the multivariate version of Bayes' theorem. The proofs of the general versions are straightforward consequences of Definition 3.7.7.
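Both calculations can be checked numerically from the joint p.d.f. $f(z, x_1, x_2) = 2z^2 e^{-z(2 + x_1 + x_2)}$ of Example 3.7.15. The sketch below is an added illustration, not the text's derivation. It integrates $z$ out with a plain midpoint rule (the law of total probability) and then normalizes by the result (Bayes' theorem). The truncation of the $z$ range at 50 and the step count are assumptions made for the sketch.

```python
import math


def joint_density(z: float, x1: float, x2: float) -> float:
    """f(z, x1, x2) = f0(z) * g12(x1, x2 | z) from Example 3.7.15."""
    if z <= 0 or x1 <= 0 or x2 <= 0:
        return 0.0
    return 2 * z**2 * math.exp(-z * (2 + x1 + x2))


def midpoint(func, lower: float, upper: float, steps: int = 100_000) -> float:
    """Plain midpoint rule on [lower, upper]."""
    h = (upper - lower) / steps
    return h * sum(func(lower + (i + 0.5) * h) for i in range(steps))


def marginal_f12(x1: float, x2: float, z_max: float = 50.0) -> float:
    """Law of total probability: integrate z out of the joint density."""
    return midpoint(lambda z: joint_density(z, x1, x2), 0.0, z_max)


def prob_rate_at_most(z0: float, x1: float, x2: float) -> float:
    """Bayes' theorem: Pr(Z <= z0 | X1 = x1, X2 = x2)."""
    f12 = marginal_f12(x1, x2)
    return midpoint(lambda z: joint_density(z, x1, x2) / f12, 0.0, z0)


print(marginal_f12(1.0, 4.0), 4 / (2 + 1.0 + 4.0) ** 3)  # numeric vs. closed form
print(prob_rate_at_most(1.0, 1.0, 4.0))                   # about 0.9704
```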
Theorem 3.7.5 Multivariate Law of Total Probability and Bayes' Theorem. Assume the conditions and notation given in Definition 3.7.7. If $\boldsymbol{Z}$ has a continuous joint distribution, the marginal p.d.f. of $\boldsymbol{Y}$ is

$$f_1(\boldsymbol{y}) = \underbrace{\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}}_{n-k} g_1(\boldsymbol{y} \mid \boldsymbol{z})\,f_2(\boldsymbol{z})\,d\boldsymbol{z},\tag{3.7.14}$$

and the conditional p.d.f. of $\boldsymbol{Z}$ given $\boldsymbol{Y} = \boldsymbol{y}$ is

$$g_2(\boldsymbol{z} \mid \boldsymbol{y}) = \frac{g_1(\boldsymbol{y} \mid \boldsymbol{z})\,f_2(\boldsymbol{z})}{f_1(\boldsymbol{y})}.\tag{3.7.15}$$

If $\boldsymbol{Z}$ has a discrete joint distribution, then the multiple integral in (3.7.14) must be replaced by a multiple summation. If $\boldsymbol{Z}$ has a mixed joint distribution, the multiple integral must be replaced by integration over those coordinates with continuous distributions and summation over those coordinates with discrete distributions.

Conditionally Independent Random Variables. In Examples 3.7.15 and 3.7.16, $\boldsymbol{Z}$ is the single random variable $Z$ and $\boldsymbol{Y} = (X_1, X_2)$. These examples also illustrate the use of conditionally independent random variables. That is, $X_1$ and $X_2$ are conditionally independent given $Z = z$ for all $z > 0$. In Example 3.7.16, we said that $Z$ was the rate at which customers were served. When this rate is unknown, it is a major source of uncertainty.
Partitioning the sample space by the values of the rate $Z$ and then conditioning on each value of $Z$ removes a major source of uncertainty for part of the calculation.

In general, conditional independence for random variables is similar to conditional independence for events.

Definition 3.7.8 Conditionally Independent Random Variables. Let $\boldsymbol{Z}$ be a random vector with joint p.f., p.d.f., or p.f./p.d.f. $f_0(\boldsymbol{z})$. Several random variables $X_1, \ldots, X_n$ are conditionally independent given $\boldsymbol{Z}$ if, for all $\boldsymbol{z}$ such that $f_0(\boldsymbol{z}) > 0$, we have

$$g(\boldsymbol{x} \mid \boldsymbol{z}) = \prod_{i=1}^{n} g_i(x_i \mid \boldsymbol{z}),$$

where $g(\boldsymbol{x} \mid \boldsymbol{z})$ stands for the conditional multivariate p.f., p.d.f., or p.f./p.d.f. of $\boldsymbol{X}$ given $\boldsymbol{Z} = \boldsymbol{z}$ and $g_i(x_i \mid \boldsymbol{z})$ stands for the conditional univariate p.f. or p.d.f. of $X_i$ given $\boldsymbol{Z} = \boldsymbol{z}$.

In Example 3.7.15, $g_i(x_i \mid z) = ze^{-zx_i}$ for $x_i > 0$ and $i = 1, 2$.
Example 3.7.17 A Clinical Trial. In Example 3.7.8, the joint p.f./p.d.f. given there was constructed by assuming that $X_1, \ldots, X_m$ were conditionally independent given $P = p$ each with the same conditional p.f., $g_i(x_i \mid p) = p^{x_i}(1 - p)^{1 - x_i}$ for $x_i \in \{0, 1\}$ and that $P$ had the uniform distribution on the interval $[0, 1]$. These assumptions produce, in the notation of Definition 3.7.8,

$$g(\boldsymbol{x} \mid p) = \begin{cases} p^{x_1 + \cdots + x_m}(1 - p)^{m - x_1 - \cdots - x_m} & \text{for all } x_i \in \{0, 1\},\\ 0 & \text{otherwise,}\end{cases}$$

for $0 \le p \le 1$. Combining this with the marginal p.d.f. of $P$, $f_2(p) = 1$ for $0 \le p \le 1$ and 0 otherwise, we get the joint p.f./p.d.f. given in Example 3.7.8. ◁

Conditional Versions of Past and Future Theorems. We mentioned earlier that conditional distributions behave just like distributions. Hence, all theorems that we have proven and will prove in the future have conditional versions.
For example, the law of total probability in Eq. (3.7.14) has the following version conditional on another random vector $\boldsymbol{W} = \boldsymbol{w}$:

$$f_1(\boldsymbol{y} \mid \boldsymbol{w}) = \underbrace{\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}}_{n-k} g_1(\boldsymbol{y} \mid \boldsymbol{z}, \boldsymbol{w})\,f_2(\boldsymbol{z} \mid \boldsymbol{w})\,d\boldsymbol{z},\tag{3.7.16}$$

where $f_1(\boldsymbol{y} \mid \boldsymbol{w})$ stands for the conditional p.d.f., p.f., or p.f./p.d.f. of $\boldsymbol{Y}$ given $\boldsymbol{W} = \boldsymbol{w}$, $g_1(\boldsymbol{y} \mid \boldsymbol{z}, \boldsymbol{w})$ stands for the conditional p.d.f., p.f., or p.f./p.d.f. of $\boldsymbol{Y}$ given $(\boldsymbol{Z}, \boldsymbol{W}) = (\boldsymbol{z}, \boldsymbol{w})$, and $f_2(\boldsymbol{z} \mid \boldsymbol{w})$ stands for the conditional p.d.f. of $\boldsymbol{Z}$ given $\boldsymbol{W} = \boldsymbol{w}$. Using the same notation, the conditional version of Bayes' theorem is

$$g_2(\boldsymbol{z} \mid \boldsymbol{y}, \boldsymbol{w}) = \frac{g_1(\boldsymbol{y} \mid \boldsymbol{z}, \boldsymbol{w})\,f_2(\boldsymbol{z} \mid \boldsymbol{w})}{f_1(\boldsymbol{y} \mid \boldsymbol{w})}.\tag{3.7.17}$$

Example 3.7.18 Conditioning on Random Variables in Sequence. In Example 3.7.15, we found the conditional p.d.f. of $Z$ given $(X_1, X_2) = (x_1, x_2)$. Suppose now that there are three more observations available, $X_3$, $X_4$, and $X_5$, and suppose that all of $X_1, \ldots, X_5$ are conditionally i.i.d. given $Z = z$ with p.d.f. $g(x \mid z)$. We shall use the conditional version of Bayes' theorem to compute the conditional p.d.f. of $Z$ given $(X_1, \ldots, X_5) = (x_1, \ldots, x_5)$. First, we shall find the conditional p.d.f. $g_{345}(x_3, x_4, x_5 \mid x_1, x_2, z)$ of $\boldsymbol{Y} = (X_3, X_4, X_5)$ given $Z = z$ and $\boldsymbol{W} = (X_1, X_2) = (x_1, x_2)$. We shall use the notation for p.d.f.'s in the discussion immediately preceding this example.
DesdeX, ..., X5sdo condicionalmente iid dadosZ, temos p.d.f’s in the discussion immediately preceding this example. Since X,..., X5 are issogi(sim| z,c)ndo depende dec. Na verdade, conditionally i.i.d. given Z, we have that g)(y|z, w) does not depend on w. In fact, (siM| Z, CF 903 | 2)9(x4| Z)Q06 | ZF 23€-203+ x48), gi(ylz, W) = g(x3lz)g(xglz)g(as|z) = Pe FOSS), 164 Capitulo 3 Varidveis Aleatdrias e Distribuicgées 164 Chapter 3 Random Variables and Distributions paraxs, x4, x5>0. Também precisamos do pdf condicional deZdadoC=c, que foi for x3, x4, x5 > 0. We also need the conditional p.d.f. of Z given W = w, which was calculado na Eq. (3.7.13), e agora denotamos isso calculated in Eq. (3.7.13), and we now denote it 1 1 A(z| oF 52 txt 2PE( 2#xI472). flzlw) = 52 +x + xy P272e 2 F 1439), Finalmente, precisamos da fdp condicional das Ultimas trés observacées, dadas as duas Finally, we need the conditional p.d.f. of the last three observations given the first primeiras. Isto foi calculado no Exemplo 3.7.14, e agora o denotamos two. This was calculated in Example 3.7.14, and we now denote it . 602 +x1+.28 60(2 + x, + x5)? A(sim| c= ————_—_——_. A(ylw) = ———+* (2 +X1+.. 4x54 (2+4x,+++++%5) Agora combine-os usando 0 teorema de Bayes (3.7.17) para obter Now combine these using Bayes’ theorem (3.7.17) to obtain . BO ZB 32 +x1+x2822E-22+m1+x2) Z3eW23txat45) 5 (2 + xy $ xp)3 2277 ta t2) g2(2| sim, = 82(Z|Y, W) = — 60(2 +x1+x28 60(2 + xy + x2) (2 +X1+.. 4X54 (2+x,+---+x5)° 1 1 6,5,-2(2 = — (2 +X+4. AXK 2 Cg2+xit...+%5), = — (24x, 4---4+x5)% onl tay teebis)_ 120° | % 120°" "1 ° paraz >0. - for z > 0. < Nota: Regra simples para criagdo de versdes condicionais de resultados.Se vocé quiser Note: Simple Rule for Creating Conditional Versions of Results. If you ever wish to determinar a versdo condicional dadaC=cde um resultado que vocé comprovou, aqui esta um determine the conditional version given W = w ofa result that you have proven, here método simples. 
Basta adicionar “condicional emC=c’Para cada declaracao probabilistica no is asimple method. Just add “conditional on W = w” to every probabilistic statement resultado. Isso inclui todas as probabilidades, cdfs, quantis, nomes de distribuicées, pdfs, PFs e in the result. This includes all probabilities, c.d.f’s, quantiles, names of distributions, assim por diante. Também inclui todos os conceitos probabilisticos futuros que p.d.f’s, p.f’s, and so on. It also includes all future probabilistic concepts that we apresentaremos em capitulos posteriores (como valores esperados e variagdes no Capitulo 4). introduce in later chapters (such as expected values and variances in Chapter 4). Nota: A independéncia é um caso especial de independéncia condicional.Deixar%,..., Note: Independence is a Special Case of Conditional Independence. Let X),..., Xnsejam variaveis aleatérias independentes e sejam Cseja uma variavel aleatéria constante. X,, be independent random variables, and let W be a constant random variable. Ou seja, ha uma constantectal que Pr(G@c=1. Entéo™,..., Xntambém so condicionalmente That is, there is a constant c such that Pr(W =c) =1. Then Xj,..., X, are also independentes, dadoC=c. A prova é direta e fica ao critério do leitor (Exercicio 15). Este conditionally independent given W =c. The proof is straightforward and is left to resultado nao é particularmente interessante por si s6. Seu valor é 0 seguinte: Se provarmos the reader (Exercise 15). This result is not particularly interesting in its own right. um resultado para variaveis aleatorias condicionalmente independentes ou variaveis Its value is the following: If we prove a result for conditionally independent random aleatorias condicionalmente iid, entao o mesmo resultado sera valido para varidveis aleatorias variables or conditionally i.i.d. random variables, then the same result will hold for independentes ou variaveis aleatérias iid, conforme o caso. independent random variables or 1.i.d. 
random variables as the case may be. Histogramas Histograms Exemplo Taxa de servigo.Nos Exemplos 3.7.5 e 3.7.6, consideramos os clientes que chegam a um Example Rate of Service. In Examples 3.7.5 and 3.7.6, we considered customers arriving at a 3.7.19 fila e sendo atendido. DeixarZrepresentam a taxa com que os clientes foram atendidos, e 3.7.19 queue and being served. Let Z stand for the rate at which customers were served, deixamos%1, X2,...representam os tempos que os sucessivos clientes exigiram para and we let X;, X>, .. . stand for the times that the successive customers requrired for servigo. Assuma isso, X2, . ..sdo condicionalmente iid dadosZ=zcom pdf service. Assume that X;, X, ... are conditionally iid. given Z = z with p.d-f. { Z @-2x —zx arax >0 for 0 G(x| ZF parax (3.7.18) g(x|z) = | ze ve’ (3.7.18) 0 de outra forma. 0 otherwise. Isto 6 0 mesmo que (3.7.12) do Exemplo 3.7.15. Nesse exemplo, modelamosZcomo uma This is the same as (3.7.12) from Example 3.7.15. In that example, we modeled Z as variavel aleatoria com pdff(z+2 exp(-2z)paraz >0. Neste exemplo, assumiremos queX1 a random variable with p.d.f. f(z) = 2 exp(—2z) for z > 0. In this example, we shall ,...,Xnsera observado para algum valor granden, e queremos pensar sobre 0 que essas assume that X,,..., X, will be observed for some large value n, and we want to observacées nos dizem sobreZ. Para ser mais especifico, suponha que observamosn=100 think about what these observations tell us about Z. To be specific, suppose that we tempos de servico. As primeiras 10 vezes estado listadas aqui: observe n = 100 service times. The first 10 times are listed here: 1.39,0.61,2.47,3.35,2.56,3.60,0.32,1.43,0.51,0.94. 1.39, 0.61, 2.47, 3.35, 2.56, 3.60, 0.32, 1.43, 0.51, 0.94. 3.7 Multivariate Distributions 165 The smallest and largest observed service times from the entire sample are 0.004 and 9.60, respectively. 
It would be nice to have a graphical display of the entire sample of $n = 100$ service times without having to list them separately. ◀

The histogram, defined below, is a graphical display of a collection of numbers. It is particularly useful for displaying the observed values of a collection of random variables that have been modeled as conditionally i.i.d.

Definition 3.7.9 Histogram. Let $x_1, \ldots, x_n$ be a collection of numbers that all lie between two values $a < b$. That is, $a \le x_i \le b$ for all $i = 1, \ldots, n$. Choose some integer $k \ge 1$ and divide the interval $[a, b]$ into $k$ equal-length subintervals of length $(b - a)/k$. For each subinterval, count how many of the numbers $x_1, \ldots, x_n$ are in the subinterval. Let $c_i$ be the count for subinterval $i$ for $i = 1, \ldots, k$. Choose a number $r > 0$. (Typically, $r = 1$ or $r = n$ or $r = n(b - a)/k$.) Draw a two-dimensional graph with the horizontal axis running from $a$ to $b$. For each subinterval $i = 1, \ldots, k$, draw a rectangular bar of width $(b - a)/k$ and height equal to $c_i/r$ over the midpoint of the $i$th subinterval. Such a graph is called a histogram.

The choice of the number $r$ in the definition of histogram depends on what one wishes to be displayed on the vertical axis. The shape of the histogram is identical regardless of what value one chooses for $r$. With $r = 1$, the height of each bar is the raw count for each subinterval, and counts are displayed on the vertical axis. With $r = n$, the height of each bar is the proportion of the set of numbers in each subinterval, and the vertical axis displays proportions. With $r = n(b - a)/k$, the area of each bar is the proportion of the set of numbers in each subinterval.

Example 3.7.20 Rate of Service. The $n = 100$ observed service times in Example 3.7.19 all lie between 0 and 10. It is convenient, in this example, to draw a histogram with horizontal axis running from 0 to 10 and divided into 10 subintervals of length 1 each. Other choices are possible, but this one will do for illustration.
Figure 3.22 contains the histogram of the 100 observed service times with $r = 100$. One sees that the numbers of observed service times in the subintervals decrease as the center of the subinterval increases. This matches the behavior of the conditional p.d.f. $g(x|z)$ of the service times as a function of $x$ for fixed $z$. ◀

[Figure 3.22: Histogram of service times for Example 3.7.20 with $a = 0$, $b = 10$, $k = 10$, and $r = 100$. Horizontal axis: Time (0 to 10); vertical axis: Proportion (0 to 0.30).]

Histograms are useful as more than just graphical displays of large sets of numbers. After we see the law of large numbers (Theorem 6.2.4), we can show that the histogram of a large (conditionally) i.i.d. sample of continuous random variables is an approximation to the (conditional) p.d.f. of the random variables in the sample, so long as one uses the third choice of $r$, namely, $r = n(b - a)/k$.
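The construction in Definition 3.7.9 and the claim about the third choice of $r$ can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `histogram_heights`, the rate `z = 0.5`, and the simulated sample are hypothetical and are not taken from the text (the 100 observed service times of Example 3.7.19 are not reproduced here).

```python
import math
import random


def histogram_heights(xs, a, b, k, r):
    """Bar heights c_i / r over k equal-length subintervals of [a, b]."""
    width = (b - a) / k
    counts = [0] * k
    for x in xs:
        if not a <= x <= b:
            raise ValueError("every number must lie in [a, b]")
        # Subinterval index of x; the right endpoint b is placed in the last bar.
        i = min(int((x - a) / width), k - 1)
        counts[i] += 1
    return [c / r for c in counts]


# Hypothetical data: simulated service times with an assumed rate z = 0.5.
random.seed(0)
z = 0.5
sample = [random.expovariate(z) for _ in range(20000)]
n = len(sample)

a = 0.0
b = float(math.ceil(max(sample)))  # smallest integer bound covering the sample
k = 2 * int(b)                     # subintervals of length 1/2
width = (b - a) / k

# Third choice of r: the area of each bar is the proportion in its subinterval.
heights = histogram_heights(sample, a, b, k, r=n * (b - a) / k)
total_area = sum(h * width for h in heights)

# Compare the first few bar heights with the p.d.f. z * exp(-z * x) at the midpoints.
for i in range(3):
    mid = a + (i + 0.5) * width
    print(f"midpoint {mid:.2f}: height {heights[i]:.3f}, p.d.f. {z * math.exp(-z * mid):.3f}")
```

With $r = n(b - a)/k$ the bar areas sum to 1, so the bar heights are on the same scale as a p.d.f.; with $r = 1$ or $r = n$ the same function returns raw counts or proportions instead.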
Note: More General Histograms. Sometimes it is convenient to divide the range of the numbers to be plotted in a histogram into unequal-length subintervals. In such a case, one would typically let the height of each bar be $c_i/r_i$, where $c_i$ is the raw count and $r_i$ is proportional to the length of the $i$th subinterval. In this way, the area of each bar is still proportional to the count or proportion in each subinterval.

Summary

A finite collection of random variables is called a random vector. We have defined joint distributions for arbitrary random vectors. Every random vector has a joint c.d.f. Continuous random vectors have a joint p.d.f. Discrete random vectors have a joint p.f. Mixed distribution random vectors have a joint p.f./p.d.f. The coordinates of an $n$-dimensional random vector $X$ are independent if the joint p.f., p.d.f., or p.f./p.d.f. $f(x)$ factors into $\prod_{i=1}^{n} f_i(x_i)$.

We can compute marginal distributions of subvectors of a random vector, and we can compute the conditional distribution of one subvector given the rest of the vector. We can construct a joint distribution for a random vector by piecing together a marginal distribution for part of the vector and a conditional distribution for the rest given the first part. There are versions of Bayes' theorem and the law of total probability for random vectors.

An $n$-dimensional random vector $X$ has coordinates that are conditionally independent given $Z$ if the conditional p.f., p.d.f., or p.f./p.d.f. $g(x|z)$ of $X$ given $Z = z$ factors into $\prod_{i=1}^{n} g_i(x_i|z)$. There are versions of Bayes' theorem, the law of total probability, and all future theorems about random variables and random vectors conditional on an arbitrary additional random vector.

Exercises

1. Suppose that three random variables $X_1$, $X_2$, and $X_3$ have a continuous joint distribution with the following joint p.d.f.:

$$f(x_1, x_2, x_3) = \begin{cases} c(x_1 + 2x_2 + 3x_3) & \text{for } 0 \le x_i \le 1 \ (i = 1, 2, 3), \\ 0 & \text{otherwise.} \end{cases}$$

Determine (a) the value of the constant $c$; (b) the marginal joint p.d.f. of $X_1$ and $X_3$; and (c) $\Pr\left(X_3 < \frac{1}{2} \,\middle|\, X_1 = \frac{1}{4}, X_2 = \frac{3}{4}\right)$.

2. Suppose that three random variables $X_1$, $X_2$, and $X_3$ have a mixed joint distribution with p.f./p.d.f.:

$$f(x_1, x_2, x_3) = \begin{cases} c\, x_1^{1 + x_2 + x_3} (1 - x_1)^{3 - x_2 - x_3} & \text{if } 0 < x_1 < 1 \text{ and } x_2, x_3 \in \{0, 1\}, \\ 0 & \text{otherwise.} \end{cases}$$

(Notice that $X_1$ has a continuous distribution and $X_2$ and $X_3$ have discrete distributions.) Determine (a) the value of the constant $c$; (b) the marginal joint p.f. of $X_2$ and $X_3$; and (c) the conditional p.d.f. of $X_1$ given $X_2 = 1$ and $X_3 = 1$.

3. Suppose that three random variables $X_1$, $X_2$, and $X_3$ have a continuous joint distribution with the following joint p.d.f.:

$$f(x_1, x_2, x_3) = \begin{cases} c\, e^{-(x_1 + 2x_2 + 3x_3)} & \text{for } x_i > 0 \ (i = 1, 2, 3), \\ 0 & \text{otherwise.} \end{cases}$$

Determine (a) the value of the constant $c$; (b) the marginal joint p.d.f. of $X_1$ and $X_3$; and (c) $\Pr(X_1 < 1 \mid X_2 = 2, X_3 = 1)$.

4. Suppose that a point $(X_1, X_2, X_3)$ is chosen at random, that is, in accordance with the uniform p.d.f., from the following set $S$:

$$S = \{(x_1, x_2, x_3) : 0 \le x_i \le 1 \text{ for } i = 1, 2, 3\}.$$

Determine:
a. $\Pr\left[\left(X_1 - \frac{1}{2}\right)^2 + \left(X_2 - \frac{1}{2}\right)^2 + \left(X_3 - \frac{1}{2}\right)^2 \le \frac{1}{4}\right]$
b. $\Pr(X_1^2 + X_2^2 + X_3^2 \le 1)$

5. Suppose that an electronic system contains $n$ components that function independently of each other and that the probability that component $i$ will function properly is $p_i$ $(i = 1, \ldots, n)$. It is said that the components are connected in series if a necessary and sufficient condition for the system to function properly is that all $n$ components function properly. It is said that the components are connected in parallel if a necessary and sufficient condition for the system to function properly is that at least one of the $n$ components functions properly. The probability that the system will function properly is called the reliability of the system. Determine the reliability of the system, (a) assuming that the components are connected in series, and (b) assuming that the components are connected in parallel.

6. Suppose that the $n$ random variables $X_1, \ldots, X_n$ form a random sample from a discrete distribution for which the p.f. is $f$. Determine the value of $\Pr(X_1 = X_2 = \cdots = X_n)$.

7. Suppose that the $n$ random variables $X_1, \ldots, X_n$ form a random sample from a continuous distribution for which the p.d.f. is $f$. Determine the probability that at least $k$ of these $n$ random variables will lie in a specified interval $a \le x \le b$.

8. Suppose that the p.d.f. of a random variable $X$ is as follows:

$$f(x) = \begin{cases} \dfrac{1}{n!} x^n e^{-x} & \text{for } x > 0, \\ 0 & \text{otherwise.} \end{cases}$$

Suppose also that for any given value $X = x$ $(x > 0)$, the $n$ random variables $Y_1, \ldots, Y_n$ are i.i.d. and the conditional p.d.f. $g$ of each of them is as follows:

$$g(y|x) = \begin{cases} \dfrac{1}{x} & \text{for } 0 < y < x, \\ 0 & \text{otherwise.} \end{cases}$$

Determine (a) the marginal joint p.d.f. of $Y_1, \ldots, Y_n$ and (b) the conditional p.d.f. of $X$ for any given values of $Y_1, \ldots, Y_n$.

9. Let $X$ be a random variable with a continuous distribution. Let $X_1 = X_2 = X$.
a. Prove that both $X_1$ and $X_2$ have a continuous distribution.
b. Prove that $X = (X_1, X_2)$ does not have a continuous joint distribution.

10. Return to the situation described in Example 3.7.18. Let $X = (X_1, \ldots, X_5)$ and compute the conditional p.d.f. of $Z$ given $X = x$ directly in one step, as if all of $X$ were observed at the same time.

11. Suppose that $X_1, \ldots, X_n$ are independent. Let $k < n$ and let $i_1, \ldots, i_k$ be distinct integers between 1 and $n$. Prove that $X_{i_1}, \ldots, X_{i_k}$ are independent.

12. Let $X$ be a random vector that is split into three parts, $X = (Y, Z, W)$. Suppose that $X$ has a continuous joint distribution with p.d.f. $f(y, z, w)$. Let $g_1(y, z|w)$ be the conditional p.d.f. of $(Y, Z)$ given $W = w$, and let $g_2(y|w)$ be the conditional p.d.f. of $Y$ given $W = w$. Prove that $g_2(y|w) = \int g_1(y, z|w)\, dz$.

13. Let $X_1, X_2, X_3$ be conditionally independent given $Z = z$ for all $z$ with the conditional p.d.f. $g(x|z)$ in Eq. (3.7.12). Also, let the marginal p.d.f. of $Z$ be $f_0$ in Eq. (3.7.11). Prove that the conditional p.d.f. of $X_3$ given $(X_1, X_2) = (x_1, x_2)$ is $\int_0^{\infty} g(x_3|z)\, g_0(z|x_1, x_2)\, dz$, where $g_0$ is defined in Eq. (3.7.13). (You can prove this even if you cannot compute the integral in closed form.)

14. Consider the situation described in Example 3.7.14. Suppose that $X_1 = 5$ and $X_2 = 7$ are observed.
a. Compute the conditional p.d.f. of $X_3$ given $(X_1, X_2) = (5, 7)$. (You may use the result stated in Exercise 12.)
b. Find the conditional probability that $X_3 > 3$ given $(X_1, X_2) = (5, 7)$ and compare it to the value of $\Pr(X_3 > 3)$ found in Example 3.7.9. Can you suggest a reason why the conditional probability should be higher than the marginal probability?

15. Let $X_1, \ldots, X_n$ be independent random variables, and let $W$ be a random variable such that $\Pr(W = c) = 1$ for some constant $c$. Prove that $X_1, \ldots, X_n$ are conditionally independent given $W = c$.

3.8 Functions of a Random Variable

Often we find that after we compute the distribution of a random variable $X$, we really want the distribution of some function of $X$.
For example, if $X$ is the rate at which customers are served in a queue, then $1/X$ is the average waiting time. If we have the distribution of $X$, we should be able to determine the distribution of $1/X$ or of any other function of $X$. How to do that is the subject of this section.

Random Variable with a Discrete Distribution

Example 3.8.1 Distance from the Middle. Let $X$ have the uniform distribution on the integers $1, 2, \ldots, 9$. Suppose that we are interested in how far $X$ is from the middle of the distribution, namely, 5. We could define $Y = |X - 5|$ and compute probabilities such as $\Pr(Y = 1) = \Pr(X \in \{4, 6\}) = 2/9$. ◀

Example 3.8.1 illustrates the general procedure for finding the distribution of a function of a discrete random variable. The general result is straightforward.

Theorem 3.8.1 Function of a Discrete Random Variable. Let $X$ have a discrete distribution with p.f. $f$, and let $Y = r(X)$ for some function $r$ defined on the set of possible values of $X$.
For each possible value $y$ of $Y$, the p.f. $g$ of $Y$ is

$$g(y) = \Pr(Y = y) = \Pr[r(X) = y] = \sum_{x:\, r(x) = y} f(x).$$

Example 3.8.2 Distance from the Middle. The possible values of $Y$ in Example 3.8.1 are 0, 1, 2, 3, and 4. We see that $Y = 0$ if and only if $X = 5$, so $g(0) = f(5) = 1/9$. For all other values of $Y$, there are two values of $X$ that give that value of $Y$. For example, $\{Y = 4\} = \{X = 1\} \cup \{X = 9\}$. So, $g(y) = 2/9$ for $y = 1, 2, 3, 4$. ◀

Random Variable with a Continuous Distribution

If a random variable $X$ has a continuous distribution, then the procedure for deriving the probability distribution of a function of $X$ differs from that given for a discrete distribution. One way to proceed is by direct calculation as in Example 3.8.3.

Example 3.8.3 Average Waiting Time. Let $Z$ be the rate at which customers are served in a queue, and suppose that $Z$ has a continuous c.d.f. $F$. The average waiting time is $Y = 1/Z$. If we want to find the c.d.f. $G$ of $Y$, we can write, for $y > 0$,

$$G(y) = \Pr(Y \le y) = \Pr\left(\frac{1}{Z} \le y\right) = \Pr\left(Z \ge \frac{1}{y}\right) = \Pr\left(Z > \frac{1}{y}\right) = 1 - F\left(\frac{1}{y}\right),$$

where the fourth equality follows from the fact that $Z$ has a continuous distribution so that $\Pr(Z = 1/y) = 0$. ◀

In general, suppose that the p.d.f. of $X$ is $f$ and that another random variable is defined as $Y = r(X)$. For each real number $y$, the c.d.f. $G(y)$ of $Y$ can be derived as follows:

$$G(y) = \Pr(Y \le y) = \Pr[r(X) \le y] = \int_{\{x:\, r(x) \le y\}} f(x)\, dx.$$

If the random variable $Y$ also has a continuous distribution, its p.d.f. $g$ can be obtained from the relation

$$g(y) = \frac{dG(y)}{dy}.$$

This relation is satisfied at every point $y$ at which $G$ is differentiable.

[Figure 3.23: The p.d.f. of $Y = X^2$ in Example 3.8.4. Horizontal axis: $y$ (ticks at 0 and 1); vertical axis: $g(y)$.]

Example 3.8.4 Deriving the p.d.f. of $X^2$ when $X$ Has a Uniform Distribution. Suppose that $X$ has the uniform distribution on the interval $[-1, 1]$, so

$$f(x) = \begin{cases} 1/2 & \text{for } -1 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

We shall determine the p.d.f. of the random variable $Y = X^2$. Since $Y = X^2$, then $Y$ must belong to the interval $0 \le Y \le 1$.
Thus, for each value deStal que O0<sims1, 0 CDFG(sddeSé of Y such that 0 < y <1, the c.d.f. G(y) of Y is G(sPr.(SsePr. (X2<e) G(y) = Pr(Y < y) =Pr(X? <y) =Pr.-sinn2sXS sim) _ Pr(—y!/? <X< yl/2y Jsinna yl = fxdx= sim 2. = / f(x)dx =y'/?. = sim —yl/2 Para 0<vocé <1, seque-se que 0 pdfg(/sddesé For 0 < y < 1, it follows that the p.d.f. g(y) of Y is dGly) _ 1 _dG(y) 1 oS mover sim" gy) = dy 7 2yl/2° Este pdf deSesta esbocado na Figura 3.23. Deve-se notar que emboraSé This p.d.f. of Y is sketched in Fig. 3.23. It should be noted that although Y is simplesmente o quadrado de uma variavel aleatéria com distribuigdo uniforme, a pdf simply the square of a random variable with a uniform distribution, the p.d.f. of Y is de Sé ilimitado na vizinhancga desim=0. - unbounded in the neighborhood of y = 0. < Funcées lineares sdo transformagédes muito uteis e a fdp de uma funcdo linear Linear functions are very useful transformations, and the p.d.f. of a linear func- de uma variavel aleatoria continua é facil de derivar. A prova do seguinte resultado é tion of a continuous random variable is easy to derive. The proof of the following deixada ao leitor no Exercicio 5. result is left to the reader in Exercise 5. Teorema Fungdo linear.Suponha queXé uma variavel aleatéria para a qual o pdf éf essa Theorem Linear Function. Suppose that X is arandom variable for which the p.d-f. is f and that 3.8.2 S=machado+ BA=0). Entdo o pdf de sé 3.8.2 Y =aX +b (a £0). Then the p.d-f. of Y is 1 ( mb 1 —b Gs- —f — para -~<vocé <o, (3.8.1) a(y) = ar(*) for —co< y<o, (3.8.1) la a la| a e 0 caso contrario. = and 0 otherwise. = A transformacdo integral de probabilidade The Probability Integral Transformation Exemplo DeixarXser uma variavel aleatéria continua com pdff(x-experiéncia-x)parax >0 e 0 Example Let X be a continuous random variable with p.d.f. f(x) = exp(—x) for x > 0 and 0 3.8.5 de outra forma. O cdf deXéF/x1 - exp(¢-x)parax >0 e 0 caso contrario. Se deixarmos 3.8.5 otherwise. The c.d.f. 
of $X$ is $F(x) = 1 - \exp(-x)$ for $x > 0$ and 0 otherwise. If we let $F$ be the function $r$ in the earlier results of this section, we can find the distribution of $Y = F(X)$. The c.d.f. of $Y$ is, for $0 < y < 1$,

$$G(y) = \Pr(Y \le y) = \Pr(1 - \exp(-X) \le y) = \Pr(X \le -\log(1 - y)) = F(-\log(1 - y)) = 1 - \exp(-[-\log(1 - y)]) = y,$$

which is the c.d.f. of the uniform distribution on the interval [0, 1]. It follows that $Y$ has the uniform distribution on the interval [0, 1]. ◀

The result in Example 3.8.5 is quite general.

Theorem 3.8.3 Probability Integral Transformation. Let $X$ have a continuous c.d.f. $F$, and let $Y = F(X)$. (This transformation from $X$ to $Y$ is called the probability integral transformation.) The distribution of $Y$ is the uniform distribution on the interval [0, 1].

Proof First, because $F$ is the c.d.f. of a random variable, then $0 \le F(x) \le 1$ for $-\infty < x < \infty$. Therefore, $\Pr(Y < 0) = \Pr(Y > 1) = 0$. Since $F$ is continuous, the set of $x$ such that $F(x) = y$ is a nonempty closed and bounded interval $[x_0, x_1]$ for each $y$ in the interval (0, 1). Let $F^{-1}(y)$ denote the lower endpoint $x_0$ of this interval, which was called the $y$ quantile of $F$ in Definition 3.3.2. In this way, $Y \le y$ if and only if $X \le x_1$. Let $G$ denote the c.d.f. of $Y$. Then

$$G(y) = \Pr(Y \le y) = \Pr(X \le x_1) = F(x_1) = y.$$

Hence, $G(y) = y$ for $0 < y < 1$. Because this function is the c.d.f. of the uniform distribution on the interval [0, 1], this uniform distribution is the distribution of $Y$.

Because $\Pr(X = F^{-1}(Y)) = 1$ in the proof of Theorem 3.8.3, we have the following corollary.

Corollary 3.8.1 Let $Y$ have the uniform distribution on the interval [0, 1], and let $F$ be a continuous c.d.f. with quantile function $F^{-1}$. Then $X = F^{-1}(Y)$ has c.d.f. $F$.

Theorem 3.8.3 and its corollary give us a method for transforming an arbitrary continuous random variable $X$ into another random variable $Z$ with any desired continuous distribution. To be specific, let $X$ have a continuous c.d.f.
$F$, and let $G$ be another continuous c.d.f. Then $Y = F(X)$ has the uniform distribution on the interval [0, 1] according to Theorem 3.8.3, and $Z = G^{-1}(Y)$ has the c.d.f. $G$ according to Corollary 3.8.1. Combining these, we see that $Z = G^{-1}[F(X)]$ has c.d.f. $G$.

Simulation

Pseudo-Random Numbers Most computer packages that do statistical analyses also produce what are called pseudo-random numbers. These numbers appear to have some of the properties that a random sample would have, even though they are generated by deterministic algorithms. The most fundamental of these programs are the ones that generate pseudo-random numbers that appear to have the uniform distribution on the interval [0, 1]. We shall refer to such functions as uniform pseudo-random number generators.

The important features that a uniform pseudo-random number generator must have are the following. The numbers that it produces need to be spread somewhat uniformly over the interval [0, 1], and they need to appear to be observed values of independent random
variables. This last feature is very complicated to word precisely. An example of a sequence that does not appear to be observations of independent random variables would be one that was perfectly evenly spaced. Another example would be one with the following behavior: Suppose that we look at the sequence $X_1, X_2, \ldots$ one at a time, and every time we find an $X_i > 0.5$, we write down the next number $X_{i+1}$. If the subsequence of numbers that we write down is not spread approximately uniformly over the interval [0, 1], then the original sequence does not look like observations of independent random variables with the uniform distribution on the interval [0, 1]. The reason is that the conditional distribution of $X_{i+1}$ given that $X_i > 0.5$ is supposed to be uniform over the interval [0, 1], according to independence.

Generating Pseudo-Random Numbers Having a Specified Distribution A uniform pseudo-random number generator can be used to generate values of a random
variable $Y$ having any specified continuous c.d.f. $G$. If a random variable $X$ has the uniform distribution on the interval [0, 1] and if the quantile function $G^{-1}$ is defined as before, then it follows from Corollary 3.8.1 that the c.d.f. of the random variable $Y = G^{-1}(X)$ will be $G$. Hence, if a value of $X$ is produced by a uniform pseudo-random number generator, then the corresponding value of $Y$ will have the desired property. If $n$ independent values $X_1, \ldots, X_n$ are produced by the generator, then the corresponding values $Y_1, \ldots, Y_n$ will appear to form a random sample of size $n$ from the distribution with the c.d.f. $G$.

Example 3.8.6 Generating Independent Values from a Specified p.d.f. Suppose that a uniform pseudo-random number generator is to be used to generate three independent values from the distribution for which the p.d.f. $g$ is as follows:

$$g(y) = \begin{cases} \frac{1}{2}(2 - y) & \text{for } 0 < y < 2, \\ 0 & \text{otherwise.} \end{cases}$$

For $0 < y < 2$, the c.d.f. $G$ of the given distribution is
$$G(y) = y - \frac{y^2}{4}.$$

Also, for $0 < x < 1$, the inverse function $y = G^{-1}(x)$ can be found by solving the equation $x = G(y)$ for $y$. The result is

$$y = G^{-1}(x) = 2\left[1 - (1 - x)^{1/2}\right]. \tag{3.8.2}$$

The next step is to generate three uniform pseudo-random numbers $x_1$, $x_2$, and $x_3$ using the generator. Suppose that the three generated values are

$$x_1 = 0.4125, \quad x_2 = 0.0894, \quad x_3 = 0.8302.$$

When these values of $x_1$, $x_2$, and $x_3$ are substituted successively into Eq. (3.8.2), the values of $y$ that are obtained are $y_1 = 0.47$, $y_2 = 0.09$, and $y_3 = 1.18$. These are then treated as the observed values of three independent random variables with the distribution for which the p.d.f. is $g$. ◀

If $G$ is a general c.d.f., there is a method similar to Corollary 3.8.1 that can be used to transform a uniform random variable into a random variable with c.d.f. $G$. See Exercise 12 in this section. There are other computer methods for generating values from certain specified distributions that are faster and more accurate than
using the quantile function. These topics are discussed in the books by Kennedy and Gentle (1980) and Rubinstein (1981). Chapter 12 of this text contains techniques and examples that show how simulation can be used to solve statistical problems.

General Function In general, if $X$ has a continuous distribution and if $Y = r(X)$, then it is not necessarily true that $Y$ will also have a continuous distribution. For example, suppose that $r(x) = c$, where $c$ is a constant, for all values of $x$ in some interval $a \le x \le b$, and that $\Pr(a \le X \le b) > 0$. Then $\Pr(Y = c) > 0$. Since the distribution of $Y$ assigns positive probability to the value $c$, this distribution cannot be continuous. In order to derive the distribution of $Y$ in a case like this, the c.d.f. of $Y$ must be derived by applying methods like those described above. For certain functions $r$, however, the distribution of $Y$ will be continuous; and it will then be possible to derive the p.d.f.
of $Y$ directly without first deriving its c.d.f. We shall develop this case in detail at the end of this section.

Direct Derivation of the p.d.f. When r is One-to-One and Differentiable

Example 3.8.7 Average Waiting Time. Consider Example 3.8.3 again. The p.d.f. $g$ of $Y$ can be computed from $G(y) = 1 - F(1/y)$ because $F$ and $1/y$ both have derivatives at enough places. We apply the chain rule for differentiation to obtain

$$g(y) = \frac{dG(y)}{dy} = -\left.\frac{dF(x)}{dx}\right|_{x = 1/y} \left(-\frac{1}{y^2}\right) = f\!\left(\frac{1}{y}\right) \frac{1}{y^2},$$

except at $y = 0$ and at those values of $y$ such that $F(x)$ is not differentiable at $x = 1/y$. ◀

Differentiable One-To-One Functions The method used in Example 3.8.7 generalizes to very arbitrary differentiable one-to-one functions. Before stating the general result, we should recall some properties of differentiable one-to-one functions from calculus. Let $r$ be a differentiable one-to-one function on the open interval $(a, b)$. Then $r$ is either strictly increasing or strictly decreasing.
Because $r$ is also continuous, it will map the interval $(a, b)$ to another open interval $(\alpha, \beta)$, called the image of $(a, b)$ under $r$. That is, for each $x \in (a, b)$, $r(x) \in (\alpha, \beta)$, and for each $y \in (\alpha, \beta)$ there is $x \in (a, b)$ such that $y = r(x)$, and this $x$ is unique because $r$ is one-to-one. So the inverse $s$ of $r$ will exist on the interval $(\alpha, \beta)$, meaning that for $x \in (a, b)$ and $y \in (\alpha, \beta)$ we have $r(x) = y$ if and only if $s(y) = x$. The derivative of $s$ will exist (possibly infinite), and it is related to the derivative of $r$ by

$$\frac{ds(y)}{dy} = \left( \left.\frac{dr(x)}{dx}\right|_{x = s(y)} \right)^{-1}.$$

Theorem 3.8.4 Let $X$ be a random variable for which the p.d.f. is $f$ and for which $\Pr(a < X < b) = 1$. (Here, $a$ and/or $b$ can be either finite or infinite.) Let $Y = r(X)$, and suppose that $r(x)$ is differentiable and one-to-one for $a < x < b$. Let $(\alpha, \beta)$ be the image of the interval $(a, b)$ under the function $r$. Let $s(y)$ be the inverse function of $r(x)$ for $\alpha < y < \beta$. Then the p.d.f. $g$ of $Y$ is

$$g(y) = \begin{cases} f[s(y)] \left| \dfrac{ds(y)}{dy} \right| & \text{for } \alpha < y < \beta, \\ 0 & \text{otherwise.} \end{cases} \tag{3.8.3}$$
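As a small numerical illustration of Eq. (3.8.3), the following Python sketch evaluates the transformed p.d.f. for one decreasing transformation. The helper name `transformed_pdf` and the particular choice of $X$ (p.d.f. $\exp(-x)$ for $x > 0$) with $r(x) = \exp(-x)$ are assumptions made only for this illustration; they do not come from the text. Here $s(y) = -\log y$ and $ds(y)/dy = -1/y$, so the formula should return the uniform density $g(y) = 1$ on (0, 1).

```python
import math


def transformed_pdf(f, s, ds_dy, alpha, beta):
    """Build g(y) = f(s(y)) * |ds/dy| on (alpha, beta) and 0 elsewhere (Eq. 3.8.3)."""
    def g(y):
        if alpha < y < beta:
            return f(s(y)) * abs(ds_dy(y))
        return 0.0
    return g


def f(x):
    # Assumed p.d.f. of X for this illustration: exp(-x) for x > 0.
    return math.exp(-x) if x > 0 else 0.0


def s(y):
    # Inverse of the assumed transformation r(x) = exp(-x).
    return -math.log(y)


def ds_dy(y):
    # Derivative of s; it is negative, which is why Eq. (3.8.3) takes |ds/dy|.
    return -1.0 / y


# r maps (0, infinity) onto (0, 1), so (alpha, beta) = (0, 1).
g = transformed_pdf(f, s, ds_dy, 0.0, 1.0)
values = [g(y) for y in (0.1, 0.5, 0.9)]
print(values)            # each value should be close to 1
print(g(1.5), g(-0.2))   # outside (0, 1) the density is 0
```

Dropping the absolute value in `transformed_pdf` would make the computed density negative for this decreasing $r$, which is one way to see why the theorem is stated with $|ds(y)/dy|$.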
Proof If $r$ is increasing, then $s$ is increasing, and for each $y \in (\alpha, \beta)$,

$$G(y) = \Pr(Y \le y) = \Pr[r(X) \le y] = \Pr[X \le s(y)] = F[s(y)].$$

It follows that $G$ is differentiable at all $y$ where both $s$ is differentiable and where $F(x)$ is differentiable at $x = s(y)$. Using the chain rule for differentiation, it follows that the p.d.f. $g(y)$ for $\alpha < y < \beta$ will be

$$g(y) = \frac{dG(y)}{dy} = \frac{dF[s(y)]}{dy} = f[s(y)] \frac{ds(y)}{dy}. \tag{3.8.4}$$

Because $s$ is increasing, $ds(y)/dy$ is positive; hence, it equals $|ds(y)/dy|$ and Eq. (3.8.4) implies Eq. (3.8.3). Similarly, if $r$ is decreasing, then $s$ is decreasing, and for each $y \in (\alpha, \beta)$,

$$G(y) = \Pr[r(X) \le y] = \Pr[X \ge s(y)] = 1 - F[s(y)].$$

Using the chain rule again, we differentiate $G$ to get the p.d.f. of $Y$

$$g(y) = \frac{dG(y)}{dy} = -f[s(y)] \frac{ds(y)}{dy}. \tag{3.8.5}$$

Since $s$ is strictly decreasing, $ds(y)/dy$ is negative so that $-ds(y)/dy$ equals $|ds(y)/dy|$. It follows that Eq. (3.8.5) implies Eq. (3.8.3).

Example 3.8.8 Microbial Growth.
A popular model for populations of microscopic organisms in large environments is exponential growth. At time 0, suppose that $v$ organisms are introduced into a large tank of water, and let $X$ be the rate of growth. After time $t$, we would predict a population size of $ve^{Xt}$. Assume that $X$ is unknown but has a continuous distribution with p.d.f.

$$f(x) = \begin{cases} 3(1 - x)^2 & \text{for } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases}$$

We are interested in the distribution of $Y = ve^{Xt}$ for known values of $v$ and $t$. For concreteness, let $v = 10$ and $t = 5$, so that $r(x) = 10e^{5x}$.

In this example, $\Pr(0 < X < 1) = 1$ and $r$ is a continuous and strictly increasing function of $x$ for $0 < x < 1$. As $x$ varies over the interval (0, 1), it is found that $y = r(x)$ varies over the interval $(10, 10e^5)$. Furthermore, for $10 < y < 10e^5$, the inverse function is $s(y) = \log(y/10)/5$. Hence, for $10 < y < 10e^5$,

$$\frac{ds(y)}{dy} = \frac{1}{5y}.$$

It follows from Eq. (3.8.3) that $g(y)$ will be

$$g(y) = \begin{cases} \dfrac{3\left(1 - \log(y/10)/5\right)^2}{5y} & \text{for } 10 < y < 10e^5, \\ 0 & \text{otherwise.} \end{cases}$$ ◀
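The result of Example 3.8.8 can be checked by simulation, combining it with the quantile-function method described earlier in this section. The sketch below is only an illustration under stated assumptions: it uses the c.d.f. $F(x) = 1 - (1 - x)^3$ obtained by integrating $f$ on (0, 1), inverts it to draw values of $X$ from uniform pseudo-random numbers, and compares the empirical c.d.f. of $Y = 10e^{5X}$ with $G(y) = F(\log(y/10)/5)$. The function names, the seed, and the sample size are arbitrary choices, not part of the text.

```python
import math
import random


def sample_growth_rate(u):
    """Quantile function of F(x) = 1 - (1 - x)**3 on (0, 1), applied to u in [0, 1)."""
    return 1.0 - (1.0 - u) ** (1.0 / 3.0)


def cdf_y(y, v=10.0, t=5.0):
    """C.d.f. of Y = v * exp(t * X): G(y) = F(log(y / v) / t) for v < y < v * exp(t)."""
    if y <= v:
        return 0.0
    if y >= v * math.exp(t):
        return 1.0
    x = math.log(y / v) / t
    return 1.0 - (1.0 - x) ** 3


rng = random.Random(0)   # fixed seed so the run is reproducible
n = 100_000              # arbitrary sample size for the illustration

# Draw X by the quantile method, then transform to Y = 10 * exp(5 * X).
ys = [10.0 * math.exp(5.0 * sample_growth_rate(rng.random())) for _ in range(n)]

for y in (20.0, 100.0, 500.0):
    empirical = sum(val <= y for val in ys) / n
    print(y, round(empirical, 3), round(cdf_y(y), 3))
```

If the derivation is right, the two printed columns should agree to roughly two decimal places at each $y$; differentiating `cdf_y` with respect to $y$ gives back the p.d.f. $g(y)$ stated in the example.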
Summary

We learned several methods for determining the distribution of a function of a random variable. For a random variable $X$ with a continuous distribution having p.d.f. $f$, if $r$ is strictly increasing or strictly decreasing with differentiable inverse $s$ (i.e., $s(r(x)) = x$ and $s$ is differentiable), then the p.d.f. of $Y = r(X)$ is $g(y) = f(s(y))\,|ds(y)/dy|$. A special transformation allows us to transform a random variable $X$ with the uniform distribution on the interval [0, 1] into a random variable $Y$ with an arbitrary continuous c.d.f. $G$ by $Y = G^{-1}(X)$. This method can be used in conjunction with a uniform pseudo-random number generator to generate random variables with arbitrary continuous distributions.

Exercises

1. Suppose that the p.d.f. of a random variable $X$ is as follows:

$$f(x) = \begin{cases} 3x^2 & \text{for } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases}$$

Also, suppose that $Y = 1 - X^2$. Determine the p.d.f. of $Y$.

2. Suppose that a random variable $X$ can have each of the seven values $-3, -2, -1, 0, 1, 2, 3$ with equal probability. Determine the p.f. of $Y = X^2 - X$.

3. Suppose that the p.d.f. of a random variable $X$ is as follows:

$$f(x) = \begin{cases} \frac{1}{2}x & \text{for } 0 < x < 2, \\ 0 & \text{otherwise.} \end{cases}$$

Also, suppose that $Y = X(2 - X)$. Determine the c.d.f. and the p.d.f. of $Y$.

4. Suppose that the p.d.f. of $X$ is as given in Exercise 3. Determine the p.d.f. of $Y = 4 - X^3$.

5. Prove Theorem 3.8.2. (Hint: Either apply Theorem 3.8.4 or first compute the c.d.f. separately for $a > 0$ and $a < 0$.)

6. Suppose that the p.d.f. of $X$ is as given in Exercise 3. Determine the p.d.f. of $Y = 3X + 2$.

7. Suppose that a random variable $X$ has the uniform distribution on the interval [0, 1]. Determine the p.d.f. of (a) $X^2$, (b) $-X^3$, and (c) $X^{1/2}$.

8. Suppose that the p.d.f. of $X$ is as follows:

$$f(x) = \begin{cases} e^{-x} & \text{for } x > 0, \\ 0 & \text{for } x \le 0. \end{cases}$$

Determine the p.d.f. of $Y = X^{1/2}$.

9. Suppose that $X$ has the uniform distribution on the interval [0, 1]. Construct a random variable $Y = r(X)$ for which the p.d.f. will be

$$g(y) = \begin{cases} \frac{3}{8}y^2 & \text{for } 0 < y < 2, \\ 0 & \text{otherwise.} \end{cases}$$

10. Let $X$ be a random variable for which the p.d.f. $f$ is as given in Exercise 3. Construct a random variable $Y = r(X)$ for which the p.d.f. $g$ is as given in Exercise 9.

11. Explain how to use a uniform pseudo-random number generator to generate four independent values from a distribution for which the p.d.f. is

$$g(y) = \begin{cases} \frac{1}{2}(2y + 1) & \text{for } 0 < y < 1, \\ 0 & \text{otherwise.} \end{cases}$$

12. Let $F$ be an arbitrary c.d.f. (not necessarily discrete, not necessarily continuous, not necessarily either). Let $F^{-1}$ be the quantile function from Definition 3.3.2. Let $X$ have the uniform distribution on the interval [0, 1]. Define $Y = F^{-1}(X)$. Prove that the c.d.f. of $Y$ is $F$. Hint: Compute $\Pr(Y \le y)$ in two cases. First, do the case in which $y$ is the unique value of $x$ such that $F(x) = F(y)$. Second, do the case in which there is an entire interval of $x$ values such that $F(x) = F(y)$.

13. Let $Z$ be the rate at which customers are served in a queue. Assume that $Z$ has the p.d.f.

$$f(z) = \begin{cases} 2e^{-2z} & \text{for } z > 0, \\ 0 & \text{otherwise.} \end{cases}$$

Find the p.d.f. of the average waiting time $T = 1/Z$.

14. Let $X$ have the uniform distribution on the interval $[a, b]$, and let $c > 0$. Prove that $cX + d$ has the uniform distribution on the interval $[ca + d, cb + d]$.

15. Most of the calculation in Example 3.8.4 is quite general. Suppose that $X$ has a continuous distribution with p.d.f. $f$. Let $Y = X^2$, and show that the p.d.f. of $Y$ is

$$g(y) = \frac{1}{2y^{1/2}} \left[ f(y^{1/2}) + f(-y^{1/2}) \right].$$

16. In Example 3.8.4, the p.d.f. of $Y = X^2$ is much larger for values of $y$ near 0 than for values of $y$ near 1 despite the fact that the p.d.f. of $X$ is flat. Give an intuitive reason why this occurs in this example.

17. An insurance agent sells a policy which has a \$100 deductible and a \$5000 cap. This means that when the policy holder files a claim, the policy holder must pay the first \$100. After the first \$100, the insurance company pays the rest of the claim up to a maximum payment of \$5000. Any excess must be paid by the policy holder. Suppose that the dollar amount $X$ of a claim has a continuous distribution with p.d.f. $f(x) = 1/(1 + x)^2$ for $x > 0$ and 0 otherwise.
Let $Y$ be the amount that the insurance company has to pay on the claim.
a. Write $Y$ as a function of $X$, i.e., $Y = r(X)$.
b. Find the c.d.f. of $Y$.
c. Explain why $Y$ has neither a continuous nor a discrete distribution.

3.9 Functions of Two or More Random Variables

When we observe data consisting of the values of several random variables, we need to summarize the observed values in order to be able to focus on the information in the data. Summarizing consists of constructing one or a few functions of the random variables that capture the bulk of the information. In this section, we describe the techniques needed to determine the distribution of a function of two or more random variables.

Random Variables with a Discrete Joint Distribution

Example 3.9.1 Bull Market. Three different investment firms are trying to advertise their mutual funds by showing how many perform better than a recognized standard. Each company has 10 funds, so there are 30 in total. Suppose that the first 10 funds belong to the first firm, the next 10 to the second firm, and the last 10 to the third firm. Let $X_i = 1$ if fund $i$ performs better than the standard and $X_i = 0$ otherwise, for $i = 1, \ldots, 30$. Then, we are interested in the three functions

$$Y_1 = X_1 + \cdots + X_{10}, \quad Y_2 = X_{11} + \cdots + X_{20}, \quad Y_3 = X_{21} + \cdots + X_{30}.$$

We would like to be able to determine the joint distribution of $Y_1$, $Y_2$, and $Y_3$ from the joint distribution of $X_1, \ldots, X_{30}$. ◀

The general method for solving problems like those of Example 3.9.1 is a straightforward extension of Theorem 3.8.1.

Theorem 3.9.1 Functions of Discrete Random Variables. Suppose that $n$ random variables $X_1, \ldots, X_n$ have a discrete joint distribution for which the joint p.f. is $f$, and that $m$ functions $Y_1, \ldots, Y_m$ of these $n$ random variables are defined as follows:

$$\begin{aligned} Y_1 &= r_1(X_1, \ldots, X_n), \\ Y_2 &= r_2(X_1, \ldots, X_n), \\ &\;\;\vdots \\ Y_m &= r_m(X_1, \ldots, X_n). \end{aligned}$$
, Xn têm uma distribuição conjunta discreta para a qual o FP conjunto éf,e essaeufunções S1, . . . , S eudestesnvariáveis aleatórias são definidas da seguinte forma: S1=R1(X1, . . . , Xn), S2=R2(X1, . . . , Xn), . . . Seu=Reu(X1, . . . , Xn). 176 Capitulo 3 Varidveis Aleatdrias e Distribuicgées 176 Chapter 3 Random Variables and Distributions Para determinados valoressinm,..., simeudoewariaveis aleatdriasSi,..., Seu, deixarAdenotar o conjunto de For given values Vy>-++5 Vm of the m random variables Y1, wae Yo let A denote the todos os pontos(x1,..., Xndde tal modo que set of all points (x1, ..., x,,) such that Ri(xi,..., XnFsim, ry(x4,---5X%,) =V, R2(m1,...,XnF sim, 17(X1,---5X_) =Y2, Reu(x1,..., XnFsimeu. Vin (X4; wey Xp) =Ymn- Entdo o valor da junta pfgde51,..., Seué especificado no ponto(sinm, ..., simeu) pela Then the value of the joint p.f. g of Yj,..., Y,, is specified at the point (1, ..., Ym) relagdo by the relation 2d G(S1,..., SiMeu fa, ..., Xn). 7 BOD +s Yn) = > f (1, -..5 Xp). a (x1, ...XnJEA (X45 ---.Xn)EA Exemplo Mercado em alta.Lembre-se da situacdo no Exemplo 3.9.1. Suponha que queremos a uniao Example Bull Market. Recall the situation in Example 3.9.1. Suppose that we want the joint 3.9.2 pfgde(5si, S2, $3)no pontoG,5,8). Ou seja, queremosg3,5,8Pr.(Si= 3, S2= 5, S3= 8) O 3.9.2 p.f. g of (%, Yo, Y3) at the point (3, 5, 8). That is, we want g9(3, 5, 8) = Pr(Y; = 3, Y, = conjuntoAconforme definido no Teorema 3.9.1 é 5, Y3 = 8). The set A as defined in Theorem 3.9.1 is A= {(x1,..., B02X1+.. AXIO= 3, x14... .+x20= 5, x21+. . .+.X30= 8}. A= {(Xy,..., X39) 1p +--+ + x19 = 3, Hy ++ ++ + X99 =5, XQ] +--+ +239 = 8}. Dois dos pontos do conjuntoAsdo Two of the points in the set A are (1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0), (1,0,0,0,1,0,0,1, qd, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0), 0,0,0,1,1,0,0,1,0,1,0,1,1,0,1,1,1,0,1,1,1,1). 
(1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1).
A counting argument like those developed in Sec. 1.8 can be used to discover that there are
$$\binom{10}{3}\binom{10}{5}\binom{10}{8} = 1{,}360{,}800$$
points in $A$. Unless the joint distribution of $X_1, \ldots, X_{30}$ has some simple structure, it will be extremely tedious to compute $g(3, 5, 8)$ as well as most other values of $g$. For example, if all of the $2^{30}$ possible values of the vector $(X_1, \ldots, X_{30})$ are equally likely, then
$$g(3, 5, 8) = \frac{1{,}360{,}800}{2^{30}} = 1.27 \times 10^{-3}.$$ ◀

The next result gives an important example of a function of discrete random variables.

Theorem 3.9.2 Binomial and Bernoulli Distributions. Assume that $X_1, \ldots, X_n$ are i.i.d. random variables having the Bernoulli distribution with parameter $p$. Let $Y = X_1 + \cdots + X_n$. Then $Y$ has the binomial distribution with parameters $n$ and $p$.

Proof It is clear that $Y = y$ if and only if exactly $y$ of $X_1, \ldots, X_n$ equal 1 and the other $n - y$ equal 0. There are $\binom{n}{y}$ distinct possible values for the vector $(X_1, \ldots, X_n)$ that have $y$ ones and $n - y$ zeros.
Each such vector has probability $p^y(1 - p)^{n-y}$ of being observed; hence the probability that $Y = y$ is the sum of the probabilities of those vectors, namely, $\binom{n}{y} p^y (1 - p)^{n-y}$ for $y = 0, \ldots, n$. From Definition 3.1.7, we see that $Y$ has the binomial distribution with parameters $n$ and $p$. ■

Example 3.9.3 Sampling Parts. Suppose that two machines are producing parts. For $i = 1, 2$, the probability is $p_i$ that machine $i$ will produce a defective part, and we shall assume that all parts from both machines are independent. Assume that the first $n_1$ parts are produced by machine 1 and that the last $n_2$ parts are produced by machine 2, with $n = n_1 + n_2$ being the total number of parts sampled. Let $X_i = 1$ if the $i$th part is defective and $X_i = 0$ otherwise for $i = 1, \ldots, n$. Define $Y_1 = X_1 + \cdots + X_{n_1}$ and $Y_2 = X_{n_1+1} + \cdots + X_n$.
These are the total numbers of defective parts produced by each machine. The assumptions stated in the problem allow us to conclude that $Y_1$ and $Y_2$ are independent according to the note about separate functions of independent random variables on page 140. Furthermore, Theorem 3.9.2 says that $Y_j$ has the binomial distribution with parameters $n_j$ and $p_j$ for $j = 1, 2$. These two marginal distributions, together with the fact that $Y_1$ and $Y_2$ are independent, give the entire joint distribution. So, for example, if $g$ is the joint p.f. of $Y_1$ and $Y_2$, we can compute
$$g(y_1, y_2) = \binom{n_1}{y_1} p_1^{y_1} (1 - p_1)^{n_1 - y_1} \binom{n_2}{y_2} p_2^{y_2} (1 - p_2)^{n_2 - y_2},$$
for $y_1 = 0, \ldots, n_1$ and $y_2 = 0, \ldots, n_2$, while $g(y_1, y_2) = 0$ otherwise. There is no need to find a set $A$ as in Example 3.9.2, because of the simplifying structure of the joint distribution of $X_1, \ldots, X_n$. ◀

Random Variables with a Continuous Joint Distribution

Example 3.9.4 Total Service Time. Suppose that the first two customers in a queue plan to leave together. Let $X_i$ be the time it takes to serve customer $i$ for $i = 1, 2$.
Suppose also that $X_1$ and $X_2$ are independent random variables with common distribution having p.d.f. $f(x) = 2e^{-2x}$ for $x > 0$ and 0 otherwise. Since the customers will leave together, they are interested in the total time it takes to serve both of them, namely, $Y = X_1 + X_2$. We can now find the p.d.f. of $Y$.

For each $y$, let
$$A_y = \{(x_1, x_2) : x_1 + x_2 \le y\}.$$
Then $Y \le y$ if and only if $(X_1, X_2) \in A_y$. The set $A_y$ is pictured in Fig. 3.24. If we let $G(y)$ denote the c.d.f. of $Y$, then, for $y > 0$,
$$\begin{aligned}
G(y) = \Pr((X_1, X_2) \in A_y) &= \int_0^y \int_0^{y - x_2} 4e^{-2x_1 - 2x_2}\, dx_1\, dx_2\\
&= \int_0^y 2e^{-2x_2}\left[1 - e^{-2(y - x_2)}\right] dx_2 = \int_0^y \left[2e^{-2x_2} - 2e^{-2y}\right] dx_2\\
&= 1 - e^{-2y} - 2ye^{-2y}.
\end{aligned}$$

Figure 3.24 The set $A_y$ in Example 3.9.4 and in the proof of Theorem 3.9.4.

Taking the derivative of $G(y)$ with respect to $y$, we get the p.d.f.
$$g(y) = \frac{d}{dy}\left[1 - e^{-2y} - 2ye^{-2y}\right] = 4ye^{-2y},$$
for $y > 0$ and 0 otherwise. ◀
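As a numerical check of Example 3.9.4, the following Python sketch compares the closed-form c.d.f. $G(y) = 1 - e^{-2y} - 2ye^{-2y}$ with a Monte Carlo estimate of $\Pr(X_1 + X_2 \le y)$. The function names, the seed, and the number of draws are illustrative assumptions and are not part of the text; the simulation only approximates the probability.

```python
import math
import random


def cdf_total_service_time(y: float) -> float:
    """Closed-form c.d.f. G(y) of Y = X1 + X2 from Example 3.9.4."""
    if y <= 0:
        return 0.0
    return 1.0 - math.exp(-2.0 * y) - 2.0 * y * math.exp(-2.0 * y)


def pdf_total_service_time(y: float) -> float:
    """Closed-form p.d.f. g(y) = 4 y e^{-2y} for y > 0, and 0 otherwise."""
    if y <= 0:
        return 0.0
    return 4.0 * y * math.exp(-2.0 * y)


def simulate_cdf(y: float, n_draws: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr(X1 + X2 <= y) for i.i.d. X1, X2 with p.d.f. 2e^{-2x}.

    The seed and n_draws are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        # expovariate(2.0) draws from the p.d.f. 2 e^{-2x} for x > 0.
        total = rng.expovariate(2.0) + rng.expovariate(2.0)
        if total <= y:
            hits += 1
    return hits / n_draws


if __name__ == "__main__":
    y = 1.0
    print("closed form G(1):", cdf_total_service_time(y))
    print("simulated   G(1):", simulate_cdf(y))
```

At $y = 1$ the closed form gives $G(1) = 1 - 3e^{-2} \approx 0.594$, and the simulated value should be close to it. A central difference of `cdf_total_service_time` can likewise be compared with `pdf_total_service_time` to confirm the differentiation step.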
The transformation in Example 3.9.4 is an example of a brute-force method that is always available for finding the distribution of a function of several random variables; however, it might be difficult to apply in individual cases.

Theorem 3.9.3 Brute-Force Distribution of a Function. Suppose that the joint p.d.f. of $\boldsymbol{X} = (X_1, \ldots, X_n)$ is $f(\boldsymbol{x})$ and that $Y = r(\boldsymbol{X})$. For each real number $y$, define $A_y = \{\boldsymbol{x} : r(\boldsymbol{x}) \le y\}$. Then the c.d.f. $G(y)$ of $Y$ is
$$G(y) = \int \cdots \int_{A_y} f(\boldsymbol{x})\, d\boldsymbol{x}. \tag{3.9.1}$$

Proof From the definition of c.d.f.,
$$G(y) = \Pr(Y \le y) = \Pr[r(\boldsymbol{X}) \le y] = \Pr(\boldsymbol{X} \in A_y),$$
which equals the right side of Eq. (3.9.1) by Definition 3.7.3. ■

If the distribution of $Y$ also is continuous, then the p.d.f. of $Y$ can be found by differentiating the c.d.f. $G(y)$.

A popular special case of Theorem 3.9.3 is the following.

Theorem 3.9.4 Linear Function of Two Random Variables. Let $X_1$ and $X_2$ have joint p.d.f. $f(x_1, x_2)$, and let $Y = a_1X_1 + a_2X_2 + b$ with $a_1 \ne 0$. Then $Y$ has a continuous distribution whose p.d.f. is
$$g(y) = \int_{-\infty}^{\infty} f\left(\frac{y - b - a_2x_2}{a_1},\, x_2\right) \frac{1}{|a_1|}\, dx_2. \tag{3.9.2}$$

Proof First, we shall find the c.d.f. $G$ of $Y$ whose derivative we will see is the function $g$ in Eq. (3.9.2). For each $y$, let $A_y = \{(x_1, x_2) : a_1x_1 + a_2x_2 + b \le y\}$. The set $A_y$ has the same general form as the set in Fig. 3.24. We shall write the integral over the set $A_y$ with $x_2$ in the outer integral and $x_1$ in the inner integral. Assume that $a_1 > 0$. The other case is similar. According to Theorem 3.9.3,
$$G(y) = \iint_{A_y} f(x_1, x_2)\, dx_1\, dx_2 = \int_{-\infty}^{\infty} \int_{-\infty}^{(y - b - a_2x_2)/a_1} f(x_1, x_2)\, dx_1\, dx_2. \tag{3.9.3}$$
For the inner integral, perform the change of variable $z = a_1x_1 + a_2x_2 + b$ whose inverse is $x_1 = (z - b - a_2x_2)/a_1$, so that $dx_1 = dz/a_1$. The inner integral, after this change of variable, becomes
$$\int_{-\infty}^{y} f\left(\frac{z - b - a_2x_2}{a_1},\, x_2\right) \frac{1}{a_1}\, dz.$$
We can now substitute this expression for the inner integral into Eq. (3.9.3):
$$\begin{aligned}
G(y) &= \int_{-\infty}^{\infty} \int_{-\infty}^{y} f\left(\frac{z - b - a_2x_2}{a_1},\, x_2\right) \frac{1}{a_1}\, dz\, dx_2\\
&= \int_{-\infty}^{y} \int_{-\infty}^{\infty} f\left(\frac{z - b - a_2x_2}{a_1},\, x_2\right) \frac{1}{a_1}\, dx_2\, dz.
\end{aligned} \tag{3.9.4}$$
Let $g(z)$ denote the inner integral on the far right side of Eq. (3.9.4). Then we have $G(y) = \int_{-\infty}^{y} g(z)\, dz$, whose derivative is $g(y)$, the function in Eq. (3.9.2). ■

The special case of Theorem 3.9.4 in which $X_1$ and $X_2$ are independent, $a_1 = a_2 = 1$, and $b = 0$ is called convolution.

Definition 3.9.1 Convolution. Let $X_1$ and $X_2$ be independent continuous random variables and let $Y = X_1 + X_2$. The distribution of $Y$ is called the convolution of the distributions of $X_1$ and $X_2$. The p.d.f. of $Y$ is sometimes called the convolution of the p.d.f.'s of $X_1$ and $X_2$.

If we let the p.d.f. of $X_i$ be $f_i$ for $i = 1, 2$ in Definition 3.9.1, then Theorem 3.9.4 (with $a_1 = a_2 = 1$ and $b = 0$) says that the p.d.f. of $Y = X_1 + X_2$ is
$$g(y) = \int_{-\infty}^{\infty} f_1(y - z) f_2(z)\, dz. \tag{3.9.5}$$
Equivalently, by switching the names of $X_1$ and $X_2$, we obtain the alternative form for the convolution:
$$g(y) = \int_{-\infty}^{\infty} f_1(z) f_2(y - z)\, dz. \tag{3.9.6}$$
The p.d.f.
found in Example 3.9.4 is the special case of (3.9.5) with $f_1(x) = f_2(x) = 2e^{-2x}$ for $x > 0$ and 0 otherwise.

Example 3.9.5 An Investment Portfolio. Suppose that an investor wants to purchase both stocks and bonds. Let $X_1$ be the value of the stocks at the end of one year, and let $X_2$ be the value of the bonds at the end of one year. Suppose that $X_1$ and $X_2$ are independent. Let $X_1$ have the uniform distribution on the interval [1000, 4000], and let $X_2$ have the uniform distribution on the interval [800, 1200]. The sum $Y = X_1 + X_2$ is the value at the end of the year of the portfolio consisting of both the stocks and the bonds. We shall find the p.d.f. of $Y$. The function $f_1(z) f_2(y - z)$ in Eq. (3.9.6) is
$$f_1(z) f_2(y - z) = \begin{cases} 8.333 \times 10^{-7} & \text{for } 1000 \le z \le 4000 \text{ and } 800 \le y - z \le 1200,\\ 0 & \text{otherwise.} \end{cases} \tag{3.9.7}$$
We need to integrate the function in Eq. (3.9.7) over $z$ for each value of $y$ to get the marginal p.d.f. of $Y$. It is helpful to look at a graph of the set of $(y, z)$ pairs for which the function in Eq. (3.9.7) is positive. Figure 3.25 shows the region shaded. For
$1800 < y \le 2200$, we must integrate $z$ from 1000 to $y - 800$. For $2200 < y \le 4800$, we must integrate $z$ from $y - 1200$ to $y - 800$. For $4800 < y < 5200$, we must integrate $z$ from $y - 1200$ to 4000. Since the function in Eq. (3.9.7) is constant when it is positive, the integral equals the constant times the length of the interval of $z$ values. So, the p.d.f. of $Y$ is
$$g(y) = \begin{cases} 8.333 \times 10^{-7}(y - 1800) & \text{for } 1800 < y \le 2200,\\ 3.333 \times 10^{-4} & \text{for } 2200 < y \le 4800,\\ 8.333 \times 10^{-7}(5200 - y) & \text{for } 4800 < y < 5200,\\ 0 & \text{otherwise.} \end{cases}$$ ◀

Figure 3.25 The region where the function in Eq. (3.9.7) is positive. (Axes: $y$ and $z$.)

As another example of the brute-force method, we consider the largest and smallest observations in a random sample. These functions give an idea of how spread out the sample is. For example, meteorologists often report record high and low temperatures for specific days as well as record high and low rainfalls for months and years.

Example 3.9.6 Maximum and Minimum of a Random Sample. Suppose that $X_1, \ldots, X_n$ form a random sample of size $n$ from a distribution for which the p.d.f. is $f$ and the c.d.f. is $F$. The largest value $Y_n$ and the smallest value $Y_1$ in the random sample are defined as follows:
$$Y_n = \max\{X_1, \ldots, X_n\}, \qquad Y_1 = \min\{X_1, \ldots, X_n\}. \tag{3.9.8}$$
Consider $Y_n$ first. Let $G_n$ stand for its c.d.f., and let $g_n$ be its p.d.f. For every given value of $y$ ($-\infty < y < \infty$),
$$\begin{aligned}
G_n(y) = \Pr(Y_n \le y) &= \Pr(X_1 \le y, X_2 \le y, \ldots, X_n \le y)\\
&= \Pr(X_1 \le y) \Pr(X_2 \le y) \cdots \Pr(X_n \le y)\\
&= F(y) F(y) \cdots F(y) = [F(y)]^n,
\end{aligned}$$
where the third equality follows from the fact that the $X_i$ are independent and the fourth follows from the fact that all of the $X_i$ have the same c.d.f. $F$. Thus, $G_n(y) = [F(y)]^n$.

Now, $g_n$ can be determined by differentiating the c.d.f. $G_n$. The result is
$$g_n(y) = n[F(y)]^{n-1} f(y) \quad \text{for } -\infty < y < \infty.$$
Next, consider $Y_1$ with c.d.f. $G_1$ and p.d.f. $g_1$. For every given value of $y$ ($-\infty < y < \infty$),
$$\begin{aligned}
G_1(y) = \Pr(Y_1 \le y) &= 1 - \Pr(Y_1 > y)\\
&= 1 - \Pr(X_1 > y, X_2 > y, \ldots, X_n > y)\\
&= 1 - \Pr(X_1 > y) \Pr(X_2 > y) \cdots \Pr(X_n > y)\\
&= 1 - [1 - F(y)][1 - F(y)] \cdots [1 - F(y)]\\
&= 1 - [1 - F(y)]^n.
\end{aligned}$$
Thus, $G_1(y) = 1 - [1 - F(y)]^n$.

Then $g_1$ can be determined by differentiating the c.d.f. $G_1$. The result is
$$g_1(y) = n[1 - F(y)]^{n-1} f(y) \quad \text{for } -\infty < y < \infty.$$
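The formulas $G_n(y) = [F(y)]^n$ and $G_1(y) = 1 - [1 - F(y)]^n$ from Example 3.9.6 can be checked by simulation. The Python sketch below does this under the assumption that the sample comes from the uniform distribution on [0, 1], so that $F(y) = y$ for $0 < y < 1$; the function names, the seed, and the number of simulated samples are illustrative choices only.

```python
import random


def cdf_max(F_y: float, n: int) -> float:
    """G_n(y) = [F(y)]^n, the c.d.f. of the sample maximum, given F(y)."""
    return F_y ** n


def cdf_min(F_y: float, n: int) -> float:
    """G_1(y) = 1 - [1 - F(y)]^n, the c.d.f. of the sample minimum, given F(y)."""
    return 1.0 - (1.0 - F_y) ** n


def simulate_max_min_cdfs(y: float, n: int, n_samples: int = 100_000, seed: int = 1):
    """Estimate Pr(max <= y) and Pr(min <= y) for uniform[0, 1] samples of size n.

    The uniform distribution, seed, and n_samples are assumptions of this sketch.
    """
    rng = random.Random(seed)
    max_hits = 0
    min_hits = 0
    for _ in range(n_samples):
        sample = [rng.random() for _ in range(n)]
        if max(sample) <= y:
            max_hits += 1
        if min(sample) <= y:
            min_hits += 1
    return max_hits / n_samples, min_hits / n_samples


if __name__ == "__main__":
    y, n = 0.7, 5
    print("formula:   ", cdf_max(y, n), cdf_min(y, n))  # F(y) = y for the uniform case
    print("simulation:", simulate_max_min_cdfs(y, n))
```

For $n = 5$ and $y = 0.7$ the formulas give $0.7^5 = 0.16807$ for the maximum and $1 - 0.3^5 = 0.99757$ for the minimum; the simulated frequencies should be close to these values.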
Figure 3.26 The p.d.f. of the uniform distribution on the interval [0, 1] together with the p.d.f.'s of the minimum and maximum of samples of size $n = 5$. The p.d.f. of the range of a sample of size $n = 5$ (see Example 3.9.7) is also included.

Figure 3.26 shows the p.d.f. of the uniform distribution on the interval [0, 1] together with the p.d.f.'s of $Y_1$ and $Y_n$ for the case $n = 5$. It also shows the p.d.f. of $Y_5 - Y_1$, which will be derived in Example 3.9.7. Notice that the p.d.f. of $Y_1$ is highest near 0 and lowest near 1, while the opposite is true of the p.d.f.
of $Y_n$, as one would expect.

Finally, we shall determine the joint distribution of $Y_1$ and $Y_n$. For every pair of values $(y_1, y_n)$ such that $-\infty < y_1 < y_n < \infty$, the event $\{Y_1 \le y_1\} \cap \{Y_n \le y_n\}$ is the same as $\{Y_n \le y_n\} \cap \{Y_1 > y_1\}^c$. If $G$ denotes the bivariate joint c.d.f. of $Y_1$ and $Y_n$, then
$$\begin{aligned}
G(y_1, y_n) &= \Pr(Y_1 \le y_1 \text{ and } Y_n \le y_n)\\
&= \Pr(Y_n \le y_n) - \Pr(Y_n \le y_n \text{ and } Y_1 > y_1)\\
&= \Pr(Y_n \le y_n) - \Pr(y_1 < X_1 \le y_n,\, y_1 < X_2 \le y_n,\, \ldots,\, y_1 < X_n \le y_n)\\
&= G_n(y_n) - \prod_{i=1}^{n} \Pr(y_1 < X_i \le y_n)\\
&= [F(y_n)]^n - [F(y_n) - F(y_1)]^n.
\end{aligned}$$
The bivariate joint p.d.f. $g$ of $Y_1$ and $Y_n$ can be found from the relation
$$g(y_1, y_n) = \frac{\partial^2 G(y_1, y_n)}{\partial y_1\, \partial y_n}.$$
Thus, for $-\infty < y_1 < y_n < \infty$,
$$g(y_1, y_n) = n(n - 1)[F(y_n) - F(y_1)]^{n-2} f(y_1) f(y_n). \tag{3.9.9}$$
Also, for all other values of $y_1$ and $y_n$, $g(y_1, y_n) = 0$. ◀

A popular way to describe the spread of a random sample is to use the distance from the minimum to the maximum, which is called the range of the random sample. We can combine the result from the end of Example 3.9.6 with Theorem 3.9.4 to find the p.d.f.
of the range.

Example 3.9.7 The Distribution of the Range of a Random Sample. Consider the same situation as in Example 3.9.6. The random variable $W = Y_n - Y_1$ is called the range of the sample. The joint p.d.f. $g(y_1, y_n)$ of $Y_1$ and $Y_n$ was presented in Eq. (3.9.9). We can now apply Theorem 3.9.4 with $a_1 = -1$, $a_2 = 1$, and $b = 0$ to get the p.d.f. $h$ of $W$:
$$h(w) = \int_{-\infty}^{\infty} g(y_n - w,\, y_n)\, dy_n = \int_{-\infty}^{\infty} g(z,\, z + w)\, dz, \tag{3.9.10}$$
where, for the last equality, we have made the change of variable $z = y_n - w$. ◀

Here is a special case in which the integral of Eq. (3.9.10) can be computed in closed form.

Example 3.9.8 The Range of a Random Sample from a Uniform Distribution. Suppose that the $n$ random variables $X_1, \ldots, X_n$ form a random sample from the uniform distribution on the interval [0, 1]. We shall determine the p.d.f. of the range of the sample. In this example,
$$f(x) = \begin{cases} 1 & \text{for } 0 < x < 1,\\ 0 & \text{otherwise.} \end{cases}$$
Also, $F(x) = x$ for $0 < x < 1$. We can write $g(y_1, y_n)$ from Eq.
(3.9.9) in this case as
$$g(y_1, y_n) = \begin{cases} n(n - 1)(y_n - y_1)^{n-2} & \text{for } 0 < y_1 < y_n < 1,\\ 0 & \text{otherwise.} \end{cases}$$
Therefore, in Eq. (3.9.10), $g(z, z + w) = 0$ unless $0 < w < 1$ and $0 < z < 1 - w$. For values of $w$ and $z$ satisfying these conditions, $g(z, w + z) = n(n - 1)w^{n-2}$. The p.d.f. in Eq. (3.9.10) is then, for $0 < w < 1$,
$$h(w) = \int_0^{1 - w} n(n - 1)w^{n-2}\, dz = n(n - 1)w^{n-2}(1 - w).$$
Otherwise, $h(w) = 0$. This p.d.f. is shown in Fig. 3.26 for the case $n = 5$. ◀

Direct Transformation of a Multivariate p.d.f.

Next, we state without proof a generalization of Theorem 3.8.4 to the case of several random variables. The proof of Theorem 3.9.5 is based on the theory of differentiable one-to-one transformations in advanced calculus.

Theorem 3.9.5 Multivariate Transformation. Let $X_1, \ldots, X_n$ have a continuous joint distribution for which the joint p.d.f. is $f$. Assume that there is a subset $S$ of $R^n$ such that $\Pr[(X_1, \ldots, X_n) \in S] = 1$.
Define $n$ new random variables $Y_1, \ldots, Y_n$ as follows:
$$\begin{aligned}
Y_1 &= r_1(X_1, \ldots, X_n),\\
Y_2 &= r_2(X_1, \ldots, X_n),\\
&\;\;\vdots\\
Y_n &= r_n(X_1, \ldots, X_n),
\end{aligned} \tag{3.9.11}$$
where we assume that the $n$ functions $r_1, \ldots, r_n$ define a one-to-one differentiable transformation of $S$ onto a subset $T$ of $R^n$. Let the inverse of this transformation be given as follows:
$$\begin{aligned}
x_1 &= s_1(y_1, \ldots, y_n),\\
x_2 &= s_2(y_1, \ldots, y_n),\\
&\;\;\vdots\\
x_n &= s_n(y_1, \ldots, y_n).
\end{aligned} \tag{3.9.12}$$
Then the joint p.d.f. $g$ of $Y_1, \ldots, Y_n$ is
$$g(y_1, \ldots, y_n) = \begin{cases} f(s_1, \ldots, s_n)\,|J| & \text{for } (y_1, \ldots, y_n) \in T,\\ 0 & \text{otherwise,} \end{cases} \tag{3.9.13}$$
where $J$ is the determinant
$$J = \det \begin{bmatrix} \dfrac{\partial s_1}{\partial y_1} & \cdots & \dfrac{\partial s_1}{\partial y_n}\\ \vdots & \ddots & \vdots\\ \dfrac{\partial s_n}{\partial y_1} & \cdots & \dfrac{\partial s_n}{\partial y_n} \end{bmatrix}$$
and $|J|$ denotes the absolute value of the determinant $J$. ■

Thus, the joint p.d.f. $g(y_1, \ldots, y_n)$ is obtained by starting with the joint p.d.f. $f(x_1, \ldots, x_n)$, replacing each value $x_i$ by its expression $s_i(y_1, \ldots, y_n)$ in terms of $y_1, \ldots, y_n$, and then multiplying the result by $|J|$. This determinant $J$ is called the Jacobian of the transformation specified by the equations in (3.9.12).
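The recipe in Eq. (3.9.13) can be sketched numerically for $n = 2$. The Python fragment below is only an illustration under stated assumptions: the transformation $Y_1 = X_1 + X_2$, $Y_2 = X_1 - X_2$ (with inverse $x_1 = (y_1 + y_2)/2$, $x_2 = (y_1 - y_2)/2$) and the uniform joint p.d.f. on the unit square are hypothetical choices that do not come from the text, and the Jacobian is approximated by central differences rather than computed symbolically.

```python
def inverse_transform(y1: float, y2: float):
    """Hypothetical inverse (s1, s2) of Y1 = X1 + X2, Y2 = X1 - X2."""
    return 0.5 * (y1 + y2), 0.5 * (y1 - y2)


def joint_pdf_x(x1: float, x2: float) -> float:
    """Hypothetical joint p.d.f. f: uniform on the unit square S."""
    return 1.0 if 0.0 < x1 < 1.0 and 0.0 < x2 < 1.0 else 0.0


def jacobian_det(inverse, y1: float, y2: float, h: float = 1e-6) -> float:
    """Approximate J = det[ds_i/dy_j] for a 2-D inverse map by central differences."""
    a_plus, a_minus = inverse(y1 + h, y2), inverse(y1 - h, y2)
    b_plus, b_minus = inverse(y1, y2 + h), inverse(y1, y2 - h)
    ds1_dy1 = (a_plus[0] - a_minus[0]) / (2.0 * h)
    ds2_dy1 = (a_plus[1] - a_minus[1]) / (2.0 * h)
    ds1_dy2 = (b_plus[0] - b_minus[0]) / (2.0 * h)
    ds2_dy2 = (b_plus[1] - b_minus[1]) / (2.0 * h)
    return ds1_dy1 * ds2_dy2 - ds1_dy2 * ds2_dy1


def joint_pdf_y(y1: float, y2: float) -> float:
    """g(y1, y2) = f(s1, s2) |J|; f is 0 outside S, so g is 0 outside T."""
    x1, x2 = inverse_transform(y1, y2)
    return joint_pdf_x(x1, x2) * abs(jacobian_det(inverse_transform, y1, y2))


if __name__ == "__main__":
    print(jacobian_det(inverse_transform, 1.0, 0.2))  # close to -1/2
    print(joint_pdf_y(1.0, 0.2))                      # close to 1/2 inside T
```

For this linear map the exact Jacobian is $J = -1/2$, so $g(y_1, y_2) = 1/2$ on $T$ and 0 elsewhere. Because `joint_pdf_x` already vanishes outside $S$, no separate description of $T$ is needed in the code.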
Note: The Jacobian Is a Generalization of the Derivative of the Inverse. Eqs. (3.8.3) and (3.9.13) are very similar. The former gives the p.d.f. of a single function of a single random variable. Indeed, if $n = 1$ in (3.9.13), $J = ds_1(y_1)/dy_1$ and Eq. (3.9.13) becomes the same as (3.8.3). The Jacobian merely generalizes the derivative of the inverse of a single function of one variable to $n$ functions of $n$ variables.

Example 3.9.9 The Joint p.d.f. of the Quotient and the Product of Two Random Variables. Suppose that two random variables $X_1$ and $X_2$ have a continuous joint distribution for which the joint p.d.f. is as follows:
$$f(x_1, x_2) = \begin{cases} 4x_1x_2 & \text{for } 0 < x_1 < 1 \text{ and } 0 < x_2 < 1,\\ 0 & \text{otherwise.} \end{cases}$$
We shall determine the joint p.d.f. of two new random variables $Y_1$ and $Y_2$, which are defined by the relations
$$Y_1 = \frac{X_1}{X_2} \quad \text{and} \quad Y_2 = X_1X_2.$$
In the notation of Theorem 3.9.5, we would say that $Y_1 = r_1(X_1, X_2)$ and $Y_2 = r_2(X_1, X_2)$, where
$$r_1(x_1, x_2) = \frac{x_1}{x_2} \quad \text{and} \quad r_2(x_1, x_2) = x_1x_2. \tag{3.9.14}$$
The inverse of the transformation in Eq.
(3.9.14) is found by solving the equations 1=Ri (x1, X2Jesinn=R2(x1, x2)paraxiexzem termos desinnesin. O resultado é yy = (X41, X2) and yo =1rp(x1, x2) for x; and x, in terms of y, and y>. The result is M=€1(vocél, sim)=(vocé sim) 12 xX, = 54011, V2) = (ypy2)?, ( oy (3.9.15) yy? (3.9.15) x2=@(voc&i, sim — . Xp = $7(¥y, V2) = (22) . SIM yy DeixarSdenotar 0 conjunto de pontos (x1,x2) tal que 0<x1 <1 e O<x2<1, de modo que Pr[(™1, Let S denote the set of points (x1, x2) such that 0 < x, < 1 and 0 < x» <1, so that X25] = 1. Deixe 7seja o conjunto de(sinm, sim2)pares tais que(sirm, sim2}€ Tse € apenas se( Pr[(X,, X2) € S]=1. Let T be the set of (y,, yz) pairs such that (y1, y2) € T if and only &1\(voc&, sim2), S2(voc&, sirm)ES. Entao Pr[(S1, 52}€7] =1. A transformagcao definida pelas if (8101, y2), 82091, 2)) € S. Then Pr[(¥,, Y,) € T] = 1. The transformation defined by equacées em (3.9.14) ou, equivalentemente, pelas equacées em (3.9.15) especifica uma the equations in (3.9.14) or, equivalently, by the equations in (3.9.15) specifies a one- relacdo um a um entre os pontos em Se os pontos em7. to-one relation between the points in S and the points in T. 184 Capitulo 3 Varidveis Aleatdrias e Distribuigées 184 Chapter 3 Random Variables and Distributions Si J2 sirmsime- 1 yiy2= 1 x2 X9 simasinm-1 yo/y,=l 1 1+ ----- 1 1{----- T T 0 1 xi 0 sirm 0 1 xy 0 yy Figura 3.270s conjuntosSe 7no Exemplo 3.9.9. Figure 3.27 The sets S and T in Example 3.9.9. Mostraremos agora como encontrar o conjunto 7.Nés sabemos isso(x1, x2)€.5se e somente se as We shall now show how to find the set T. We know that (x1, x») € S if and only seguintes desigualdades forem validas: if the following inequalities hold: x1>0, X1<1, x2>0,eX2<1. (3.9.16) xy>0, xy<1, x >0, andx <1. (3.9.16) Podemos substituir as formulas pormiexzem termos desinnesimeda Eq. (3.9.15) nas We can substitute the formulas for x; and x in terms of y, and yy from Eq. 
(3.9.15) desigualdades em (3.9.16) para obter into the inequalities in (3.9.16) to obtain ( Jip 1/2 Si (vocé sirm)2>0, (vocé\ sim) r<1, syn >0, (yy)? > 0, (y1y2) 7 <1, (22) > 0, sirm yy ( Jar 1/2 sim e 2 4. (3.9.17) and (2) <1. (3.9.17) SIM Y A primeira desigualdade se transforma em (sinm>0 esim2>0) ou (sin <0 esim2<0). No entanto, The first inequality transforms to (y, > 0 and y2 > 0) or (y; < Oand y, < 0). However, desdesinn=x1/x2, ndo podemos tersirm <0, entdo obtemos apenassirm >0 esimm2>0. A terceira since y; = x,/x2, we cannot have y, < 0, so we get only y, > 0 and y, > 0. The third desigualdade em (3.9.17) transforma-se na mesma coisa. A segunda desigualdade em (3.9.17) inequality in (3.9.17) transforms to the same thing. The second inequality in (3.9.17) torna-sesime<1/s1. A quarta desigualdade torna-sesime<sirm. A regido Tonde (vocé1, sim2) becomes y < 1/y,. The fourth inequality becomes y) < y,. The region T where satisfazer essas novas desigualdades é mostrado no painel direito da Fig. 3.27 com o conjuntoS (1, ¥2) Satisfy these new inequalities is shown in the right panel of Fig. 3.27 with no painel esquerdo. the set S in the left panel. Para as funcgdes em (3.9.15), For the functions in (3.9.15), as 1° sii a8 1 siti? a =i (2)" fs 1 (n)" oy) 2 sirm’ ody2 2 sim’ ay, 2\yJ > ay 2X\y/J ’ ( diz 1/2 a2 1 sim” an _1° 17 as. __1 (x ta 1 (1 \" oy) 2 simp” O22 ssi ayy 2\y7]} 7 dy. 2 \yy Por isso, Hence, [ a si? 1Sirma| (2) (ny Si 2 sin 1 2 2 1 J=det| p (i “Ty - ( Fall —. J =det ‘ 1/2 » y2 |=5n° rr sm I(y (+) v1 2sirm, 2 siomsinm 2 y3 2 \ yy Desdesinm>0 em todo o conjunto/7, |/.| =1/2sinm). Since y, > 0 throughout the set T, |J| = 1/(2y). O pdf conjunto9(si, sim2Jagora pode ser obtido diretamente da Eq. (3.9.13) da The joint p.d.f. g(1, y2) can now be obtained directly from Eq. 
(3.9.13) in the seguinte forma: Na expressdo paraf(x1, x2), substituirx1com(sinm sim2).2, substituirx2 following way: In the expression for f (x,, x2), replace x1 with (y,y7)!/?, replace x, 3.9 Fungdes de duas ou mais variaveis aleatdrias 185 3.9 Functions of Two or More Random Variables 185 com(sim2/siize multiplique o resultado por |/.| =1/2sinm). Portanto, with (y,/y,)!/*, and multiply the result by |/| = 1/(2y,). Therefore, {( g(s1, sin) 3s para(sinm, sim2)€T, gO, >) _ | 2(2) for 1, yo) eT, 0 de outra forma. - 0 otherwise. < Exemplo Tempo de atendimento em fila.DeixarXseja o momento em que o servidor em uma fila de servidor Unico Example Service Time in a Queue. Let X be the time that the server in a single-server queue 3.9.10 gastara com um cliente especifico e deixaraSser a taxa na qual o servidor pode 3.9.10 will spend on a particular customer, and let Y be the rate at which the server can operar. Um modelo popular para a distribuigdo condicional deXdadoS=simé dizer operate. A popular model for the conditional distribution of X given Y = y is to say que o pdf condicional deXdadoS=sirmé that the conditional p.d.f. of X given Y = y is { vosxy parax >0, ye-*) forx >0, g1(x| eF girly) = | 0 de outra forma. 0 otherwise. Deixar Stem o pdffa(s). O pdf conjunto de(X, Yé entdogi(x| s)f(s). Porque 1/Spode ser Let Y have the p.d.f. f(y). The joint p.d-f. of (X, Y) is then g;(x|y) fo(y). Because interpretado como o tempo médio de servico,XYmede a rapidez com que, em 1/Y can be interpreted as the average service time, Z = XY measures how quickly, comparacdo com a média, 0 cliente é atendido. Por exemplo,Z=1 corresponde a um compared to average, that the customer is served. 
For example, $Z = 1$ corresponds to an average service time, while $Z > 1$ means that this customer took longer than average, and $Z < 1$ means that this customer was served more quickly than the average customer. If we want the distribution of $Z$, we could compute the joint p.d.f. of $(Z, Y)$ directly using the methods just illustrated. We could then integrate the joint p.d.f. over $y$ to obtain the marginal p.d.f. of $Z$. However, it is simpler to transform the conditional distribution of $X$ given $Y = y$ into the conditional distribution of $Z$ given $Y = y$, since conditioning on $Y = y$ allows us to treat $Y$ as the constant $y$. Because $X = Z/Y$, the inverse transformation is $x = s(z)$, where $s(z) = z/y$. The derivative of this is $1/y$, and the conditional p.d.f. of $Z$ given $Y = y$ is

\[
h_1(z \mid y) = \frac{1}{y}\, g_1\!\left(\frac{z}{y} \,\Big|\, y\right).
\]

Because $Y$ is a rate, $Y \ge 0$ and $X = Z/Y > 0$ if and only if $Z > 0$. So,

\[
h_1(z \mid y) =
\begin{cases}
e^{-z} & \text{for } z > 0,\\
0 & \text{otherwise.}
\end{cases}
\tag{3.9.18}
\]

Notice that $h_1$ does not depend on $y$, so $Z$ is independent of $Y$ and $h_1$ is the marginal p.d.f. of $Z$.
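The conclusion of Eq. (3.9.18) can be checked by simulation. The sketch below is an illustration under stated assumptions, not part of the original example: the text leaves the p.d.f. $f_2$ of $Y$ unspecified, so the uniform distribution on $[0.5, 2.0]$ used here is an arbitrary choice, and the function name `simulate_z` is hypothetical. If $Z = XY$ has p.d.f. $e^{-z}$ regardless of $y$, its sample mean should be near 1, the fraction of values at most 1 should be near $1 - e^{-1}$, and the mean of $Z$ should look the same for slow and fast servers.

```python
import math
import random


def simulate_z(n_samples, sample_rate, seed=0):
    """Draw (Y, Z) pairs where X | Y=y has p.d.f. y*exp(-x*y) and Z = X*Y."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_samples):
        y = sample_rate(rng)       # server rate Y (its distribution is an assumption)
        x = rng.expovariate(y)     # service time X given Y = y (exponential, rate y)
        pairs.append((y, x * y))
    return pairs


# Assumed rate distribution for illustration only: Y uniform on [0.5, 2.0].
pairs = simulate_z(200_000, lambda rng: rng.uniform(0.5, 2.0))
z_values = [z for _, z in pairs]

mean_z = sum(z_values) / len(z_values)
frac_le_1 = sum(z <= 1.0 for z in z_values) / len(z_values)

# Split by server speed; independence of Z and Y suggests similar means.
slow = [z for y, z in pairs if y < 1.25]
fast = [z for y, z in pairs if y >= 1.25]
mean_slow = sum(slow) / len(slow)
mean_fast = sum(fast) / len(fast)

print(mean_z, frac_le_1, 1 - math.exp(-1))
print(mean_slow, mean_fast)
```

Agreement of these summaries is consistent with independence but does not prove it; the analytic argument in the text is what establishes the result.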
The reader can verify all of this in Exercise 17. ◀

Note: Removing Dependence. The formula $Z = XY$ in Example 3.9.10 makes it look as if $Z$ should depend on $Y$. In reality, however, multiplying $X$ by $Y$ removes the dependence that $X$ already has on $Y$ and makes the result independent of $Y$. This type of transformation that removes the dependence of one random variable on another is a very powerful technique for finding marginal distributions of transformations of random variables.

In Example 3.9.10, we mentioned that there was another, more straightforward but more tedious, way to compute the distribution of $Z$. That method, which is useful in many settings, is to transform $(X, Y)$ into $(Z, W)$ for some uninteresting random variable $W$ and then integrate $w$ out of the joint p.d.f. All that matters in the choice of $W$ is that the transformation be one-to-one with differentiable inverse and that the calculations are feasible. Here is a specific example.

Example 3.9.11 One Function of Two Variables.
In Example 3.9.9, suppose that we were interested only in the quotient $Y_1 = X_1/X_2$ rather than both the quotient and the product $Y_2 = X_1X_2$. Since we already have the joint p.d.f. of $(Y_1, Y_2)$, we will merely integrate $y_2$ out rather than start from scratch. For each value of $y_1 > 0$, we need to look at the set $T$ in Fig. 3.27 and find the interval of $y_2$ values to integrate over. For $0 < y_1 < 1$, we integrate over $0 < y_2 < y_1$. For $y_1 > 1$, we integrate over $0 < y_2 < 1/y_1$. (For $y_1 = 1$ both intervals are the same.) So, the marginal p.d.f. of $Y_1$ is

\[
g_1(y_1) =
\begin{cases}
\displaystyle\int_0^{y_1} 2\left(\frac{y_2}{y_1}\right) dy_2 & \text{for } 0 < y_1 < 1,\\[2ex]
\displaystyle\int_0^{1/y_1} 2\left(\frac{y_2}{y_1}\right) dy_2 & \text{for } y_1 > 1,
\end{cases}
\;=\;
\begin{cases}
y_1 & \text{for } 0 < y_1 < 1,\\[1ex]
\dfrac{1}{y_1^3} & \text{for } y_1 > 1.
\end{cases}
\]

There are other transformations that would have made the calculation of $g_1$ simpler if that had been all we wanted. See Exercise 21 for an example. ◀

Theorem 3.9.6 Linear Transformations. Let $\boldsymbol{X} = (X_1, \ldots, X_n)$ have a continuous joint distribution for which the joint p.d.f. is $f$.
Define $\boldsymbol{Y} = (Y_1, \ldots, Y_n)$ by

\[
\boldsymbol{Y} = A\boldsymbol{X}, \tag{3.9.19}
\]

where $A$ is a nonsingular $n \times n$ matrix. Then $\boldsymbol{Y}$ has a continuous joint distribution with p.d.f.

\[
g(\boldsymbol{y}) = \frac{1}{|\det A|}\, f(A^{-1}\boldsymbol{y}) \quad \text{for } \boldsymbol{y} \in R^n, \tag{3.9.20}
\]

where $A^{-1}$ is the inverse of $A$.

Proof Each $Y_i$ is a linear combination of $X_1, \ldots, X_n$. Because $A$ is nonsingular, the transformation in Eq. (3.9.19) is a one-to-one transformation of the entire space $R^n$ onto itself. At every point $\boldsymbol{y} \in R^n$, the inverse transformation can be represented by the equation

\[
\boldsymbol{x} = A^{-1}\boldsymbol{y}. \tag{3.9.21}
\]

The Jacobian $J$ of the transformation that is defined by Eq. (3.9.21) is simply $J = \det A^{-1}$. Also, it is known from the theory of determinants that

\[
\det A^{-1} = \frac{1}{\det A}.
\]

Therefore, at every point $\boldsymbol{y} \in R^n$, the joint p.d.f. $g(\boldsymbol{y})$ can be evaluated in the following way, according to Theorem 3.9.5: First, for $i = 1, \ldots, n$, the component $x_i$ in $f(x_1, \ldots, x_n)$ is replaced with the $i$th component of the vector $A^{-1}\boldsymbol{y}$. Then, the result is divided by $|\det A|$. This produces Eq. (3.9.20). ■
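Eq. (3.9.20) translates almost directly into code. The following is a minimal sketch for the $2 \times 2$ case only, written with the explicit inverse of a $2 \times 2$ matrix so that it needs no external libraries. The function name `linear_transform_density`, the choice of $f$ (two i.i.d. uniform variables on $[0, 1]$), and the matrix $A$ are assumptions made for illustration.

```python
def linear_transform_density(f, A, y):
    """Evaluate g(y) = f(A^{-1} y) / |det A| for a nonsingular 2x2 matrix A (Eq. 3.9.20)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("A must be nonsingular")
    # x = A^{-1} y, using the explicit inverse of a 2x2 matrix.
    x1 = (d * y[0] - b * y[1]) / det
    x2 = (-c * y[0] + a * y[1]) / det
    return f(x1, x2) / abs(det)


# Assumed example: X1, X2 i.i.d. uniform on [0, 1], so f = 1 on the unit square.
def f_uniform(x1, x2):
    return 1.0 if 0 <= x1 <= 1 and 0 <= x2 <= 1 else 0.0


A = [[1.0, 1.0], [1.0, -1.0]]  # Y1 = X1 + X2, Y2 = X1 - X2, det A = -2

inside = linear_transform_density(f_uniform, A, (1.0, 0.0))   # x = (0.5, 0.5)
outside = linear_transform_density(f_uniform, A, (3.0, 0.0))  # x = (1.5, 1.5)
print(inside, outside)  # prints 0.5 0.0
```

The density is halved because $|\det A| = 2$: the map spreads the unit square over a region of area 2. For general $n$, a linear-algebra routine would replace the hand-written inverse.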
Summary

We extended the construction of the distribution of a function of a random variable to the case of several functions of several random variables. If one only wants the distribution of one function $r_1$ of $n$ random variables, the usual way to find this is to first find $n - 1$ additional functions $r_2, \ldots, r_n$ so that the $n$ functions together compose a one-to-one transformation. Then find the joint p.d.f. of the $n$ functions and finally find the marginal p.d.f. of the first function by integrating out the extra $n - 1$ variables. The method is illustrated for the cases of the sum and the range of several random variables.

Exercises

1. Suppose that $X_1$ and $X_2$ are i.i.d. random variables and that each of them has the uniform distribution on the interval $[0, 1]$. Find the p.d.f. of $Y = X_1 + X_2$.

2. For the conditions of Exercise 1, find the p.d.f. of the average $(X_1 + X_2)/2$.

3. Suppose that three random variables $X_1$, $X_2$, and $X_3$ have a continuous joint distribution for which the joint p.d.f. is as follows:

\[
f(x_1, x_2, x_3) =
\begin{cases}
8x_1x_2x_3 & \text{for } 0 < x_i < 1 \ (i = 1, 2, 3),\\
0 & \text{otherwise.}
\end{cases}
\]

Suppose also that $Y_1 = X_1$, $Y_2 = X_1X_2$, and $Y_3 = X_1X_2X_3$. Find the joint p.d.f. of $Y_1$, $Y_2$, and $Y_3$.

4. Suppose that $X_1$ and $X_2$ have a continuous joint distribution for which the joint p.d.f. is as follows:

\[
f(x_1, x_2) =
\begin{cases}
x_1 + x_2 & \text{for } 0 < x_1 < 1 \text{ and } 0 < x_2 < 1,\\
0 & \text{otherwise.}
\end{cases}
\]

Find the p.d.f. of $Y = X_1X_2$.

5. Suppose that the joint p.d.f. of $X_1$ and $X_2$ is as given in Exercise 4. Find the p.d.f. of $Z = X_1/X_2$.

6. Let $X$ and $Y$ be random variables for which the joint p.d.f. is as follows:

\[
f(x, y) =
\begin{cases}
2(x + y) & \text{for } 0 \le x \le y \le 1,\\
0 & \text{otherwise.}
\end{cases}
\]

Find the p.d.f. of $Z = X + Y$.

7. Suppose that $X_1$ and $X_2$ are i.i.d. random variables and that the p.d.f. of each of them is as follows:

\[
f(x) =
\begin{cases}
e^{-x} & \text{for } x > 0,\\
0 & \text{otherwise.}
\end{cases}
\]

Find the p.d.f. of $Y = X_1 - X_2$.

8. Suppose that $X_1, \ldots, X_n$ form a random sample of size $n$ from the uniform distribution on the interval $[0, 1]$ and that $Y_n = \max\{X_1, \ldots, X_n\}$. Find the smallest value of $n$ such that

\[
\Pr\{Y_n \ge 0.99\} \ge 0.95.
\]

9. Suppose that the $n$ variables $X_1, \ldots, X_n$ form a random sample from the uniform distribution on the interval $[0, 1]$ and that the random variables $Y_1$ and $Y_n$ are defined as in Eq. (3.9.8). Determine the value of $\Pr(Y_1 \le 0.1 \text{ and } Y_n \le 0.8)$.

10. For the conditions of Exercise 9, determine the value of $\Pr(Y_1 \le 0.1 \text{ and } Y_n \ge 0.8)$.

11. For the conditions of Exercise 9, determine the probability that the interval from $Y_1$ to $Y_n$ will not contain the point $1/3$.

12. Let $W$ denote the range of a random sample of $n$ observations from the uniform distribution on the interval $[0, 1]$. Determine the value of $\Pr(W > 0.9)$.

13. Determine the p.d.f. of the range of a random sample of $n$ observations from the uniform distribution on the interval $[-3, 5]$.

14. Suppose that $X_1, \ldots, X_n$ form a random sample of $n$ observations from the uniform distribution on the interval $[0, 1]$, and let $Y$ denote the second largest of the observations. Determine the p.d.f. of $Y$. Hint: First determine the c.d.f. $G$ of $Y$ by noting that

\[
G(y) = \Pr(Y \le y) = \Pr(\text{At least } n - 1 \text{ observations} \le y).
\]

15. Show that if $X_1, X_2, \ldots, X_n$ are independent random variables and if $Y_1 = r_1(X_1)$, $Y_2 = r_2(X_2)$, $\ldots$, $Y_n = r_n(X_n)$, then $Y_1, Y_2, \ldots, Y_n$ are also independent random variables.

16. Suppose that $X_1, X_2, \ldots, X_5$ are five random variables for which the joint p.d.f. can be factored in the following form for all points $(x_1, x_2, \ldots, x_5) \in R^5$:

\[
f(x_1, x_2, \ldots, x_5) = g(x_1, x_2)\, h(x_3, x_4, x_5),
\]

where $g$ and $h$ are certain nonnegative functions. Show that if $Y_1 = r_1(X_1, X_2)$ and $Y_2 = r_2(X_3, X_4, X_5)$, then the random variables $Y_1$ and $Y_2$ are independent.

17. In Example 3.9.10, use the Jacobian method (3.9.13) to verify that $Y$ and $Z$ are independent and that Eq. (3.9.18) is the marginal p.d.f. of $Z$.

18. Let the conditional p.d.f. of $X$ given $Y$ be $g_1(x \mid y) = 3x^2/y^3$ for $0 < x < y$ and 0 otherwise. Let the marginal p.d.f. of $Y$ be $f_2(y)$, where $f_2(y) = 0$ for $y \le 0$ but is otherwise unspecified. Let $Z = X/Y$. Prove that $Z$ and $Y$ are independent and find the marginal p.d.f. of $Z$.

19. Let $X_1$ and $X_2$ be as in Exercise 7. Find the p.d.f. of $Y = X_1 + X_2$.

20. If $a_2 = 0$ in Theorem 3.9.4, show that Eq. (3.9.2) becomes the same as Eq. (3.8.1) with $a = a_1$ and $f = f_1$.

21. In Examples 3.9.9 and 3.9.11, find the marginal p.d.f. of $Z_1 = X_1/X_2$ by first transforming to $Z_1$ and $Z_2 = X_1$ and then integrating $z_2$ out of the joint p.d.f.

⋆ 3.10 Markov Chains

A popular model for systems that change over time in a random manner is the Markov chain model. A Markov chain is a sequence of random variables, one for each time. At each time, the corresponding random variable gives the state of the system. Also, the conditional distribution of each future state given the past states and the present state depends only on the present state.

Stochastic Processes

Example 3.10.1 Occupied Telephone Lines. Suppose that a certain business office has five telephone lines and that any number of these lines may be in use at any given time. During a certain period of time, the telephone lines are observed at regular intervals of 2 minutes and the number of lines that are being used at each time is noted. Let $X_1$ denote the number of lines that are being used when the lines are first observed at the beginning of the period; let $X_2$ denote the number of lines that are being used when they are observed the second time, 2 minutes later; and in general, for $n = 1, 2, \ldots$, let $X_n$ denote the number of lines that are being used when they are observed for the $n$th time. ◀

Definition 3.10.1 Stochastic Process. A sequence of random variables $X_1, X_2, \ldots$ is called a stochastic process or random process with discrete time parameter. The first random variable $X_1$ is called the initial state of the process; and for $n = 2, 3, \ldots$, the random variable $X_n$ is called the state of the process at time $n$.

In Example 3.10.1, the state of the process at any time is the number of lines being used at that time. Therefore, each state must be an integer between 0 and 5. Each of the random variables in a stochastic process has a marginal distribution, and the entire process has a joint distribution. For convenience, in this text, we will discuss only joint distributions for finitely many of $X_1, X_2, \ldots$ at a time. The meaning of the phrase "discrete time parameter" is that the process, such as the numbers of occupied phone lines, is observed only at discrete or separated points in time, rather than continuously in time. In Sec. 5.4, we will introduce a different stochastic process (called the Poisson process) with a continuous time parameter.

In a stochastic process with a discrete time parameter, the state of the process varies in a random manner from time to time. To describe a complete probability model for a particular process, it is necessary to specify the distribution for the initial state $X_1$ and also to specify for each $n = 1, 2, \ldots$ the conditional distribution of the subsequent state $X_{n+1}$ given $X_1, \ldots, X_n$. These conditional distributions are equivalent to the collection of conditional c.d.f.'s of the following form:

\[
\Pr(X_{n+1} \le b \mid X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n).
\]

Markov Chains

A Markov chain is a special type of stochastic process, defined in terms of the conditional distributions of future states given the present and past states.

Definition 3.10.2 Markov Chain. A stochastic process with discrete time parameter is a Markov chain if, for each time $n$, the conditional distributions of all $X_{n+j}$ for $j \ge 1$ given $X_1, \ldots, X_n$ depend only on $X_n$ and not on the earlier states $X_1, \ldots, X_{n-1}$.
In symbols, for $n = 1, 2, \ldots$ and for each $b$ and each possible sequence of states $x_1, x_2, \ldots, x_n$,

\[
\Pr(X_{n+1} \le b \mid X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = \Pr(X_{n+1} \le b \mid X_n = x_n).
\]

A Markov chain is called finite if there are only finitely many possible states. In the remainder of this section, we shall consider only finite Markov chains. This assumption could be relaxed at the cost of more complicated theory and calculation.
For convenience, we shall reserve the symbol $k$ to stand for the number of possible states of a general finite Markov chain for the remainder of the section. It will also be convenient, when discussing a general finite Markov chain, to name the $k$ states using the integers $1, \ldots, k$. That is, for each $n$ and $j$, $X_n = j$ will mean that the chain is in state $j$ at time $n$. In specific examples, it may prove more convenient to label the states in a more informative fashion. For example, if the states are the numbers of phone lines in use at given times (as in the example that introduced this section), we would label the states $0, \ldots, 5$ even though $k = 6$.

The following result follows from the multiplication rule for conditional probabilities, Theorem 2.1.2.

Theorem 3.10.1 For a finite Markov chain, the joint p.f. for the first $n$ states equals

\[
\begin{aligned}
&\Pr(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)\\
&\quad = \Pr(X_1 = x_1)\Pr(X_2 = x_2 \mid X_1 = x_1)\Pr(X_3 = x_3 \mid X_2 = x_2) \cdots \Pr(X_n = x_n \mid X_{n-1} = x_{n-1}).
\end{aligned}
\tag{3.10.1}
\]

Also, for each $n$ and each $m > 0$,

\[
\begin{aligned}
&\Pr(X_{n+1} = x_{n+1}, X_{n+2} = x_{n+2}, \ldots, X_{n+m} = x_{n+m} \mid X_n = x_n)\\
&\quad = \Pr(X_{n+1} = x_{n+1} \mid X_n = x_n)\Pr(X_{n+2} = x_{n+2} \mid X_{n+1} = x_{n+1}) \cdots \Pr(X_{n+m} = x_{n+m} \mid X_{n+m-1} = x_{n+m-1}).
\end{aligned}
\tag{3.10.2}
\]

Eq. (3.10.1) is a discrete version of a generalization of conditioning in sequence that was illustrated in Example 3.7.18 with continuous random variables. Eq. (3.10.2) is a conditional version of (3.10.1) shifted forward in time.

Example 3.10.2 Shopping for Toothpaste. In Exercise 4 in Sec. 2.1, we considered a shopper who chooses between two brands of toothpaste on several occasions. Let $X_i = 1$ if the shopper chooses brand $A$ on the $i$th purchase, and let $X_i = 2$ if the shopper chooses brand $B$ on the $i$th purchase. Then the sequence of states $X_1, X_2, \ldots$ is a stochastic process with two possible states at each time. The probabilities of purchase were specified by saying that the shopper will choose the same brand as on the previous purchase with probability 1/3 and will switch with probability 2/3.
Since this happens regardless of purchases that are older than the previous one, we see that this stochastic process is a Markov chain with

\[
\begin{aligned}
\Pr(X_{n+1} = 1 \mid X_n = 1) &= \tfrac{1}{3}, & \Pr(X_{n+1} = 2 \mid X_n = 1) &= \tfrac{2}{3},\\
\Pr(X_{n+1} = 1 \mid X_n = 2) &= \tfrac{2}{3}, & \Pr(X_{n+1} = 2 \mid X_n = 2) &= \tfrac{1}{3}.
\end{aligned}
\]
◀

Example 3.10.2 has an additional feature that puts it in a special class of Markov chains. The probability of moving from one state at time $n$ to another state at time $n + 1$ does not depend on $n$.
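The product formula in Eq. (3.10.1) can be sketched for the toothpaste chain of Example 3.10.2. The following is a minimal illustration under stated assumptions: the transition probabilities are those given in the example, but the initial distribution (each brand equally likely on the first purchase) is an assumption made here, and the names `P`, `INITIAL`, `joint_pf`, and `simulate` are hypothetical. The code computes the joint p.f. of one particular sequence and compares it with the relative frequency of that sequence in simulated paths.

```python
import random

# Transition probabilities from Example 3.10.2: stay with prob. 1/3, switch with prob. 2/3.
P = {1: {1: 1 / 3, 2: 2 / 3},
     2: {1: 2 / 3, 2: 1 / 3}}

# Assumption for illustration: the first purchase is equally likely to be either brand.
INITIAL = {1: 0.5, 2: 0.5}


def joint_pf(states, initial=INITIAL, transition=P):
    """Joint p.f. of (X1, ..., Xn), computed as the product in Eq. (3.10.1)."""
    prob = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= transition[prev][cur]
    return prob


def simulate(n, rng, initial=INITIAL, transition=P):
    """Sample X1, ..., Xn; each step uses only the current state."""
    state = 1 if rng.random() < initial[1] else 2
    path = [state]
    for _ in range(n - 1):
        state = 1 if rng.random() < transition[state][1] else 2
        path.append(state)
    return path


rng = random.Random(1)
target = [1, 2, 2]
exact = joint_pf(target)  # 0.5 * (2/3) * (1/3) = 1/9
trials = 100_000
hits = sum(simulate(3, rng) == target for _ in range(trials))
estimate = hits / trials
print(exact, estimate)
```

Because the same table `P` is used at every step, the sketch also reflects the remark that the probability of moving between states does not depend on $n$.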
Definition 3.10.3 Transition Distributions/Stationary Transition Distributions. Consider a finite Markov chain with k possible states. The conditional distributions of the state at time n + 1 given the state at time n, that is, $\Pr(X_{n+1} = j \mid X_n = i)$ for i, j = 1, ..., k and n = 1, 2, ..., are called the transition distributions of the Markov chain. If the transition distribution is the same for every time n (n = 1, 2, ...), then the Markov chain has stationary transition distributions.

When a Markov chain with k possible states has stationary transition distributions, there exist probabilities $p_{ij}$ for i, j = 1, ..., k such that, for all n,
\[ \Pr(X_{n+1} = j \mid X_n = i) = p_{ij} \quad \text{for } n = 1, 2, \ldots. \tag{3.10.3} \]
The Markov chain in Example 3.10.2 has stationary transition distributions. For example, $p_{11} = 1/3$.

In the language of multivariate distributions, when a Markov chain has stationary transition distributions, specified by (3.10.3), we can write the conditional p.f. of $X_{n+1}$ given $X_n$ as
\[ g(j \mid i) = p_{ij}, \tag{3.10.4} \]
for all n, i, j.

Example 3.10.3 Occupied Telephone Lines. To illustrate the application of these concepts, we shall consider again the example involving the office with five telephone lines. In order for this stochastic process to be a Markov chain, the specified distribution for the number of lines that may be in use at each time must depend only on the number of lines that were in use when the process was observed most recently, 2 minutes earlier, and must not depend on any other observed values previously obtained. For example, if three lines were in use at time n, then the distribution for time n + 1 must be the same regardless of whether 0, 1, 2, 3, 4, or 5 lines were in use at time n − 1. In reality, however, the observation at time n − 1 might provide some information in regard to the length of time for which each of the three lines in use at time n had been occupied, and this information might be helpful in determining the distribution for time n + 1. Nevertheless, we shall suppose now that this process is a Markov chain.
If this Markov chain is to have stationary transition distributions, it must be true that the rates at which incoming and outgoing telephone calls are made and the average duration of these telephone calls do not change during the entire period covered by the process. This requirement means that the overall period cannot include busy times when more calls are expected or quiet times when fewer calls are expected. For example, if only one line is in use at a particular observation time, regardless of when this time occurs during the entire period covered by the process, then there must be a specific probability $p_{1j}$ that exactly j lines will be in use 2 minutes later. ◀

The Transition Matrix

Example 3.10.4 Shopping for Toothpaste. The notation for stationary transition distributions, $p_{ij}$, suggests that they could be arranged in a matrix. The transition probabilities for Example 3.10.2 can be arranged into the following matrix:
\[ P = \begin{bmatrix} 1/3 & 2/3 \\ 2/3 & 1/3 \end{bmatrix}. \] ◀
Every finite Markov chain with stationary transition distributions has a matrix like the one constructed in Example 3.10.4.

Definition 3.10.4 Transition Matrix. Consider a finite Markov chain with stationary transition distributions given by $p_{ij} = \Pr(X_{n+1} = j \mid X_n = i)$ for all n, i, j. The transition matrix of the Markov chain is defined to be the k × k matrix P with elements $p_{ij}$. That is,
\[ P = \begin{bmatrix} p_{11} & \cdots & p_{1k} \\ p_{21} & \cdots & p_{2k} \\ \vdots & \ddots & \vdots \\ p_{k1} & \cdots & p_{kk} \end{bmatrix}. \tag{3.10.5} \]

A transition matrix has several properties that are apparent from its definition. For example, each element is nonnegative because all elements are probabilities. Since each row of a transition matrix is a conditional p.f. for the next state given some value of the current state, we have $\sum_{j=1}^{k} p_{ij} = 1$ for i = 1, ..., k. Indeed, row i of the transition matrix specifies the conditional p.f. $g(\cdot \mid i)$ defined in (3.10.4).

Definition 3.10.5 Stochastic Matrix. A square matrix for which all elements are nonnegative and the sum of the elements in each row is 1 is called a stochastic matrix.

It is clear that the transition matrix P for every finite Markov chain with stationary transition probabilities must be a stochastic matrix. Conversely, every k × k stochastic matrix can serve as the transition matrix of a finite Markov chain with k possible states and stationary transition distributions.

Example 3.10.5 A Transition Matrix for the Number of Occupied Telephone Lines. Suppose that in the example involving the office with five telephone lines, the numbers of lines being used at times 1, 2, ... form a Markov chain with stationary transition distributions. This chain has six possible states 0, 1, ..., 5, where i is the state in which exactly i lines are being used at a given time (i = 0, 1, ..., 5). Suppose that the transition matrix P is as follows:
\[
P = \begin{array}{c|cccccc}
  & 0 & 1 & 2 & 3 & 4 & 5 \\ \hline
0 & 0.1 & 0.4 & 0.2 & 0.1 & 0.1 & 0.1 \\
1 & 0.2 & 0.3 & 0.2 & 0.1 & 0.1 & 0.1 \\
2 & 0.1 & 0.2 & 0.3 & 0.2 & 0.1 & 0.1 \\
3 & 0.1 & 0.1 & 0.2 & 0.3 & 0.2 & 0.1 \\
4 & 0.1 & 0.1 & 0.1 & 0.2 & 0.3 & 0.2 \\
5 & 0.1 & 0.1 & 0.1 & 0.1 & 0.4 & 0.2
\end{array} \tag{3.10.6}
\]
(a) Assuming that all five lines are in use at a certain observation time, we shall determine the probability that exactly four lines will be in use at the next observation time. (b) Assuming that no lines are in use at a certain time, we shall determine the probability that at least one line will be in use at the next observation time.

(a) This probability is the element in the matrix P in the row corresponding to the state 5 and the column corresponding to the state 4. Its value is seen to be 0.4.

(b) If no lines are in use at a certain time, then the element in the upper left corner of the matrix P gives the probability that no lines will be in use at the next observation time. Its value is seen to be 0.1. Therefore, the probability that at least one line will be in use at the next observation time is 1 − 0.1 = 0.9. ◀

Figure 3.28 The generation following {Aa, Aa}.
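The conditions in Definition 3.10.5 and the two look-ups in Example 3.10.5 can be checked mechanically. The Python sketch below stores matrix (3.10.6) as a list of rows; the helper name `is_stochastic` and the tolerance `tol` are assumptions introduced for illustration only.

```python
# Transition matrix (3.10.6). Row i and column j are indexed by the number
# of telephone lines in use, so P[i][j] is the probability of moving from
# i lines to j lines in one step.
P = [
    [0.1, 0.4, 0.2, 0.1, 0.1, 0.1],
    [0.2, 0.3, 0.2, 0.1, 0.1, 0.1],
    [0.1, 0.2, 0.3, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.3, 0.2, 0.1],
    [0.1, 0.1, 0.1, 0.2, 0.3, 0.2],
    [0.1, 0.1, 0.1, 0.1, 0.4, 0.2],
]

def is_stochastic(matrix, tol=1e-9):
    """Check Definition 3.10.5: square, nonnegative entries, rows summing to 1.

    The tolerance absorbs floating-point rounding in the row sums.
    """
    k = len(matrix)
    return all(
        len(row) == k
        and all(entry >= 0 for entry in row)
        and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )

# (a) Five lines in use now, exactly four in use at the next observation time.
prob_five_to_four = P[5][4]

# (b) No lines in use now, at least one in use at the next observation time.
prob_zero_to_at_least_one = 1 - P[0][0]
```

Here `is_stochastic(P)` holds for matrix (3.10.6), and the two look-ups reproduce the values 0.4 and 0.9 found in parts (a) and (b).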
Example 3.10.6 Plant Breeding Experiment. A botanist is studying a certain variety of plant that is monoecious (has male and female organs in separate flowers on a single plant). She begins with two plants I and II and cross-pollinates them by crossing male I with female II and female I with male II to produce two offspring for the next generation. The original plants are destroyed and the process is repeated as soon as the new generation of two plants is mature. Several replications of the study are run simultaneously. The botanist might be interested in the proportion of plants in any generation that have each of several possible genotypes for a particular gene. (See Example 1.6.4 on page 23.)

Suppose that the gene has two alleles, A and a. The genotype of an individual will be one of the three combinations AA, Aa, or aa. When a new individual is born, it gets one of the two alleles (with probability 1/2 each) from one of the parents, and it independently gets one of the two alleles from the other parent. The two offspring get their genotypes independently of each other. For example, if the parents have genotypes AA and Aa, then an offspring will get A for sure from the first parent and will get either A or a from the second parent with probability 1/2 each.

Let the states of this population be the set of genotypes of the two members of the current population. We will not distinguish the set {AA, Aa} from {Aa, AA}. There are then six states: {AA, AA}, {AA, Aa}, {AA, aa}, {Aa, Aa}, {Aa, aa}, and {aa, aa}. For each state, we can calculate the probability that the next generation will be in each of the six states. For example, if the state is either {AA, AA} or {aa, aa}, the next generation will be in the same state with probability 1. If the state is {AA, aa}, the next generation will be in state {Aa, Aa} with probability 1. The other three states have more complicated transitions.
If the current state is {Aa, Aa}, then all six states are possible for the next generation. In order to compute the transition distribution, it helps to first compute the probability that a given offspring will have each of the three genotypes. Figure 3.28 illustrates the possible offspring in this state. Each arrow going down in Fig. 3.28 is a possible inheritance of an allele, and each combination of arrows terminating in a genotype has probability 1/4. It follows that the probabilities of AA and aa are both 1/4, while the probability of Aa is 1/2, because two different combinations of arrows lead to this offspring.

In order for the next state to be {AA, AA}, both offspring must be AA independently, so the probability of this transition is (1/4) × (1/4) = 1/16. The same argument implies that the probability of a transition to {aa, aa} is 1/16. A transition to {AA, Aa} requires one offspring to be AA (probability 1/4) and the other to be Aa (probability 1/2). But the two different genotypes could occur in either order, so the whole probability of such a transition is 2 × (1/4) × (1/2) = 1/4. A similar argument shows that a transition to {Aa, aa} also has probability 1/4. A transition to {AA, aa} requires one offspring to be AA (probability 1/4) and the other to be aa (probability 1/4). Once again, these can occur in two orders, so the whole probability is 2 × (1/4) × (1/4) = 1/8. By subtraction, the probability of a transition to {Aa, Aa} must be 1 − 1/16 − 1/16 − 1/4 − 1/4 − 1/8 = 1/4. Here is the entire transition matrix, which can be verified in a manner similar to what has just been done:
\[
\begin{array}{l|cccccc}
 & \{AA, AA\} & \{AA, Aa\} & \{AA, aa\} & \{Aa, Aa\} & \{Aa, aa\} & \{aa, aa\} \\ \hline
\{AA, AA\} & 1.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000 \\
\{AA, Aa\} & 0.2500 & 0.5000 & 0.0000 & 0.2500 & 0.0000 & 0.0000 \\
\{AA, aa\} & 0.0000 & 0.0000 & 0.0000 & 1.0000 & 0.0000 & 0.0000 \\
\{Aa, Aa\} & 0.0625 & 0.2500 & 0.1250 & 0.2500 & 0.2500 & 0.0625 \\
\{Aa, aa\} & 0.0000 & 0.0000 & 0.0000 & 0.2500 & 0.5000 & 0.2500 \\
\{aa, aa\} & 0.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000 & 1.0000
\end{array}
\] ◀

The Transition Matrix for Several Steps

Example 3.10.7 Single Server Queue. A manager usually checks the server at her store every 5 minutes to see whether the server is busy or not.
She models the state of the server (1 = busy or 2 = not busy) as a Markov chain with two possible states and stationary transition distributions given by the following matrix:
\[
P = \begin{array}{l|cc}
 & \text{Busy} & \text{Not busy} \\ \hline
\text{Busy} & 0.9 & 0.1 \\
\text{Not busy} & 0.6 & 0.4
\end{array}
\]
The manager realizes that, later in the day, she will have to be away for 10 minutes and will miss one server check. She wants to compute the conditional distribution of the state two time periods in the future given each of the possible states. She reasons as follows: If $X_n = 1$, for example, then the state will have to be either 1 or 2 at time n + 1 even though she does not care now about the state at time n + 1. But, if she computes the joint conditional distribution of $X_{n+1}$ and $X_{n+2}$ given $X_n = 1$, she can sum over the possible values of $X_{n+1}$ to get the conditional distribution of $X_{n+2}$ given $X_n = 1$. In symbols,
\[ \Pr(X_{n+2} = 1 \mid X_n = 1) = \Pr(X_{n+1} = 1, X_{n+2} = 1 \mid X_n = 1) + \Pr(X_{n+1} = 2, X_{n+2} = 1 \mid X_n = 1). \]
By the second part of Theorem 3.10.1,
\[ \Pr(X_{n+1} = 1, X_{n+2} = 1 \mid X_n = 1) = \Pr(X_{n+1} = 1 \mid X_n = 1) \Pr(X_{n+2} = 1 \mid X_{n+1} = 1) = 0.9 \times 0.9 = 0.81. \]
Similarly,
\[ \Pr(X_{n+1} = 2, X_{n+2} = 1 \mid X_n = 1) = \Pr(X_{n+1} = 2 \mid X_n = 1) \Pr(X_{n+2} = 1 \mid X_{n+1} = 2) = 0.1 \times 0.6 = 0.06. \]
It follows that $\Pr(X_{n+2} = 1 \mid X_n = 1) = 0.81 + 0.06 = 0.87$, and hence $\Pr(X_{n+2} = 2 \mid X_n = 1) = 1 - 0.87 = 0.13$. By similar reasoning, if $X_n = 2$,
\[ \Pr(X_{n+2} = 1 \mid X_n = 2) = 0.6 \times 0.9 + 0.4 \times 0.6 = 0.78, \]
and $\Pr(X_{n+2} = 2 \mid X_n = 2) = 1 - 0.78 = 0.22$. ◀

Generalizing the calculations in Example 3.10.7 to three or more transitions might seem tedious. However, if one examines the calculations carefully, one sees a pattern that will allow a compact calculation of transition distributions for several steps. Consider a general Markov chain with k possible states 1, ..., k and the transition matrix P given by Eq. (3.10.5). Assuming that the chain is in state i at a given time n, we shall now determine the probability that the chain will be in state j at time n + 2. In other words, we shall determine the conditional probability of $X_{n+2} = j$ given $X_n = i$. The notation for this probability is $p_{ij}^{(2)}$.

We argue as the manager did in Example 3.10.7. Let r denote the value of $X_{n+1}$ that is not of primary interest but is helpful to the calculation. Then
\begin{align*}
p_{ij}^{(2)} &= \Pr(X_{n+2} = j \mid X_n = i) \\
&= \sum_{r=1}^{k} \Pr(X_{n+1} = r \text{ and } X_{n+2} = j \mid X_n = i) \\
&= \sum_{r=1}^{k} \Pr(X_{n+1} = r \mid X_n = i) \Pr(X_{n+2} = j \mid X_{n+1} = r, X_n = i) \\
&= \sum_{r=1}^{k} \Pr(X_{n+1} = r \mid X_n = i) \Pr(X_{n+2} = j \mid X_{n+1} = r) \\
&= \sum_{r=1}^{k} p_{ir} p_{rj},
\end{align*}
where the third equality follows from Theorem 2.1.3 and the fourth equality follows from the definition of a Markov chain.
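The sum over the intermediate state r can be evaluated directly for the manager's chain in Example 3.10.7. The Python sketch below is one way to do so; the function name `two_step_probability` and the dictionary `two_step` are hypothetical names chosen for this illustration.

```python
# One-step transition matrix from Example 3.10.7
# (state 1 = busy, state 2 = not busy).
# Python lists are 0-indexed, so state s is stored at index s - 1.
P = [
    [0.9, 0.1],
    [0.6, 0.4],
]

def two_step_probability(P, i, j):
    """Return the sum over r of p_ir * p_rj, with states i and j numbered from 1."""
    k = len(P)
    return sum(P[i - 1][r] * P[r][j - 1] for r in range(k))

# Two-step probabilities for every pair (current state, state two periods later).
two_step = {
    (i, j): two_step_probability(P, i, j)
    for i in (1, 2)
    for j in (1, 2)
}
```

Up to floating-point rounding, the entries of `two_step` agree with the values 0.87, 0.13, 0.78, and 0.22 that the manager obtained by hand.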
The value of $p_{ij}^{(2)}$ can be determined in the following manner: If the transition matrix P is squared, that is, if the matrix $P^2 = PP$ is constructed, then the element in the ith row and the jth column of the matrix $P^2$ will be $\sum_{r=1}^{k} p_{ir} p_{rj}$. Therefore, $p_{ij}^{(2)}$ will be the element in the ith row and the jth column of $P^2$.

By a similar argument, the probability that the chain will move from the state i to the state j in three steps, or $p_{ij}^{(3)} = \Pr(X_{n+3} = j \mid X_n = i)$, can be found by constructing the matrix $P^3 = P^2 P$. Then the probability $p_{ij}^{(3)}$ will be the element in the ith row and the jth column of the matrix $P^3$.

In general, we have the following result.

Theorem 3.10.2 Multiple Step Transitions. Let P be the transition matrix of a finite Markov chain with stationary transition distributions. For each m = 2, 3, ..., the mth power $P^m$ of the matrix P has in row i and column j the probability $p_{ij}^{(m)}$ that the chain will move from state i to state j in m steps.

Definition 3.10.6 Multiple Step Transition Matrix. Under the conditions of Theorem 3.10.2, the matrix $P^m$ is called the m-step transition matrix of the Markov chain.

In summary, the ith row of the m-step transition matrix gives the conditional distribution of $X_{n+m}$ given $X_n = i$ for all i = 1, ..., k and all n, m = 1, 2, ....

Example 3.10.8 The Two-Step and Three-Step Transition Matrices for the Number of Occupied Telephone Lines. Consider again the transition matrix P given by Eq. (3.10.6) for the Markov chain based on five telephone lines. We shall assume first that i lines are in use at a certain time, and we shall determine the probability that exactly j lines will be in use two time periods later. If we multiply the matrix P by itself, we obtain the following two-step transition matrix:
\[
P^2 = \begin{array}{c|cccccc}
  & 0 & 1 & 2 & 3 & 4 & 5 \\ \hline
0 & 0.14 & 0.23 & 0.20 & 0.15 & 0.16 & 0.12 \\
1 & 0.13 & 0.24 & 0.20 & 0.15 & 0.16 & 0.12 \\
2 & 0.12 & 0.20 & 0.21 & 0.18 & 0.17 & 0.12 \\
3 & 0.11 & 0.17 & 0.19 & 0.20 & 0.20 & 0.13 \\
4 & 0.11 & 0.16 & 0.16 & 0.18 & 0.24 & 0.15 \\
5 & 0.11 & 0.16 & 0.15 & 0.17 & 0.25 & 0.16
\end{array} \tag{3.10.7}
\]
From this matrix we can find any two-step transition probability for the chain, such as the following:

i. If two lines are in use at a certain time, then the probability that four lines will be in use two time periods later is 0.17.

ii. If three lines are in use at a certain time, then the probability that three lines will again be in use two time periods later is 0.20.
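Theorem 3.10.2 reduces multi-step probabilities to repeated matrix multiplication. The following Python sketch computes the m-step matrix for the telephone-line chain with plain nested lists; the helper names `mat_mul` and `m_step_matrix` are assumptions made for this illustration, and a numerical library could be substituted for the hand-written product.

```python
# One-step transition matrix (3.10.6); P[i][j] is the probability of moving
# from i lines in use to j lines in use.
P = [
    [0.1, 0.4, 0.2, 0.1, 0.1, 0.1],
    [0.2, 0.3, 0.2, 0.1, 0.1, 0.1],
    [0.1, 0.2, 0.3, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.3, 0.2, 0.1],
    [0.1, 0.1, 0.1, 0.2, 0.3, 0.2],
    [0.1, 0.1, 0.1, 0.1, 0.4, 0.2],
]

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    k = len(A)
    return [
        [sum(A[i][r] * B[r][j] for r in range(k)) for j in range(k)]
        for i in range(k)
    ]

def m_step_matrix(P, m):
    """Return the mth power of P (m >= 1), the m-step transition matrix."""
    result = P
    for _ in range(m - 1):
        result = mat_mul(result, P)
    return result

P2 = m_step_matrix(P, 2)

# Entries quoted in items i and ii, rounded to two decimals to hide
# floating-point noise.
two_to_four = round(P2[2][4], 2)
three_to_three = round(P2[3][3], 2)
```

The rounded entries `two_to_four` and `three_to_three` match the values 0.17 and 0.20 read from (3.10.7), and calling `m_step_matrix(P, m)` with a larger m gives the corresponding m-step matrix in the same way.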
We shall now assume that i lines are in use at a certain time, and we shall determine the probability that exactly j lines will be in use three time periods later. If we construct the matrix $P^3 = P^2 P$, we obtain the following three-step transition matrix:
\[
P^3 = \begin{array}{c|cccccc}
  & 0 & 1 & 2 & 3 & 4 & 5 \\ \hline
0 & 0.123 & 0.208 & 0.192 & 0.166 & 0.183 & 0.128 \\
1 & 0.124 & 0.207 & 0.192 & 0.166 & 0.183 & 0.128 \\
2 & 0.120 & 0.197 & 0.192 & 0.174 & 0.188 & 0.129 \\
3 & 0.117 & 0.186 & 0.186 & 0.179 & 0.199 & 0.133 \\
4 & 0.116 & 0.181 & 0.177 & 0.176 & 0.211 & 0.139 \\
5 & 0.116 & 0.180 & 0.174 & 0.174 & 0.215 & 0.141
\end{array} \tag{3.10.8}
\]
From this matrix we can find any three-step transition probability for the chain, such as the following:

i. If all five lines are in use at a certain time, then the probability that no lines will be in use three time periods later is 0.116.

ii. If one line is in use at a certain time, then the probability that exactly one line will again be in use three time periods later is 0.207. ◀

Example 3.10.9 Plant Breeding Experiment. In Example 3.10.6, the transition matrix has many zeros, since many of the transitions will not occur. However, if we are willing to wait two steps, we will find that the only transitions that cannot occur in two steps are those from the first state to anything else and those from the last state to anything else.
Se duas linhas estiverem em uso em um determinado momento, a probabilidade de quatro linhas estarem em uso dois períodos depois é de 0,17. ii. Se três linhas estiverem em uso em um determinado momento, então a probabilidade de três linhas estarem novamente em uso dois períodos depois é de 0,20. Vamos agora assumir queeulinhas estão em uso em um determinado momento, e determinaremos a probabilidade de que exatamentejas linhas estarão em uso três períodos depois. Se construirmos a matrizP3=P2P, obtemos a seguinte matriz de transição de três etapas: ⎡0 1 2 3 4 5⎤ 0 0.123 0.208 0.192 0.166 0.183 0.128 ⎢ ⎥ ⎥ 1⎢⎢0.124 0.207 0.192 0.166 0.183 0.128⎥ 2⎢⎢0.120 0.197 0.192 0.174 0.188 0.129⎥ P3= ⎥ 3⎢ ⎥. (3.10.8) ⎢0.117 0.186 0.186 0.179 0.199 0.133⎥ 4⎢⎣0.116 0.181 0.177 0.176 0.211 0.139⎥ ⎦ 5 0.116 0.180 0.174 0.174 0.215 0.141 A partir desta matriz podemos encontrar qualquer probabilidade de transição de três etapas para a cadeia, como a seguinte: eu. Se todas as cinco linhas estiverem em uso em um determinado momento, então a probabilidade de que nenhuma linha esteja em uso três períodos depois é de 0,116. ii. Se uma linha estiver em uso em um determinado momento, então a probabilidade de que exatamente uma linha esteja novamente em uso três períodos depois é de 0,207. - Exemplo 3.10.9 Experimento de melhoramento de plantas.No Exemplo 3.10.6, a matriz de transição tem muitos zeros, já que muitas das transições não ocorrerão. Contudo, se estivermos dispostos a esperar dois passos, descobriremos que as únicas transições que não podem ocorrer em dois passos são as do primeiro estado para qualquer outro estado e as do último estado para qualquer outro. 
Here is the two-step transition matrix:

\[
\begin{array}{c|cccccc}
 & \{AA, AA\} & \{AA, Aa\} & \{AA, aa\} & \{Aa, Aa\} & \{Aa, aa\} & \{aa, aa\} \\ \hline
\{AA, AA\} & 1.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000 \\
\{AA, Aa\} & 0.3906 & 0.3125 & 0.0313 & 0.1875 & 0.0625 & 0.0156 \\
\{AA, aa\} & 0.0625 & 0.2500 & 0.1250 & 0.2500 & 0.2500 & 0.0625 \\
\{Aa, Aa\} & 0.1406 & 0.1875 & 0.0313 & 0.3125 & 0.1875 & 0.1406 \\
\{Aa, aa\} & 0.0156 & 0.0625 & 0.0313 & 0.1875 & 0.3125 & 0.3906 \\
\{aa, aa\} & 0.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000 & 1.0000
\end{array}
\]

Indeed, if we look at the three-step or the four-step or the general m-step transition matrix, the first and last rows will always be the same. ◀

The first and last states in Example 3.10.9 have the property that, once the chain gets into one of those states, it can't get out. Such states occur in many Markov chains and have a special name.

Definition 3.10.7 Absorbing State. In a Markov chain, if $p_{ii} = 1$ for some state $i$, then that state is called an absorbing state.

In Example 3.10.9, there is positive probability of getting into each absorbing state in two steps no matter where the chain starts. Hence, the probability is 1 that the chain will eventually be absorbed into one of the absorbing states if it is allowed to run long enough.

The Initial Distribution

Example 3.10.10 Single Server Queue. The manager in Example 3.10.7 enters the store thinking that the probability is 0.3 that the server will be busy the first time that she checks. Hence, the probability is 0.7 that the server will be not busy. These values specify the marginal distribution of the state at time 1, $X_1$. We can represent this distribution by the vector $v = (0.3, 0.7)$ that gives the probabilities of the two states at time 1 in the same order that they appear in the transition matrix. ◀

The vector giving the marginal distribution of $X_1$ in Example 3.10.10 has a special name.

Definition 3.10.8 Probability Vector/Initial Distribution. A vector consisting of nonnegative numbers that add to 1 is called a probability vector.
A probability vector whose coordinates specify the probabilities that a Markov chain will be in each of its states at time 1 is called the initial distribution of the chain or the initial probability vector.

For Example 3.10.2, the initial distribution was given in Exercise 4 in Sec. 2.1 as $v = (0.5, 0.5)$.

The initial distribution and the transition matrix together determine the entire joint distribution of the Markov chain. Indeed, Theorem 3.10.1 shows how to construct the joint distribution of the chain from the initial probability vector and the transition matrix. Letting $v = (v_1, \ldots, v_k)$ denote the initial distribution, Eq. (3.10.1) can be rewritten as

\[
\Pr(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = v_{x_1} p_{x_1 x_2} \cdots p_{x_{n-1} x_n}.
\tag{3.10.9}
\]

The marginal distributions of states at times later than 1 can be found from the joint distribution.
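The following is a minimal sketch of Eq. (3.10.9) and of summing the joint distribution to obtain a marginal. Everything in it is an assumption made for illustration: the two-state chain (`v`, `P`) is made up and is not one of the text's examples, states are numbered from 0 rather than 1, and the function names are hypothetical. The brute-force sum over all paths is compared with multiplying the initial vector by `P` a total of n − 1 times.

```python
from itertools import product


def joint_prob(v, P, path):
    """Eq. (3.10.9): v[x1] * p[x1][x2] * ... * p[x_{n-1}][x_n] for one path."""
    prob = v[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= P[a][b]
    return prob


def marginal_by_summation(v, P, n):
    """Distribution of X_n, obtained by summing the joint pmf over x_1..x_{n-1}."""
    k = len(v)
    marginal = [0.0] * k
    for path in product(range(k), repeat=n):
        marginal[path[-1]] += joint_prob(v, P, path)
    return marginal


def marginal_by_multiplication(v, P, n):
    """Distribution of X_n, obtained by n - 1 vector-matrix multiplications."""
    dist = list(v)
    for _ in range(n - 1):
        dist = [sum(dist[i] * P[i][j] for i in range(len(dist)))
                for j in range(len(dist))]
    return dist


# Hypothetical two-state chain (not taken from the text).
v = [0.3, 0.7]
P = [[0.9, 0.1],
     [0.6, 0.4]]

print(joint_prob(v, P, (0, 0, 1)))          # Pr(X1 = 0, X2 = 0, X3 = 1)
print(marginal_by_summation(v, P, 4))       # distribution of X4, brute force
print(marginal_by_multiplication(v, P, 4))  # same distribution, matrix form
```

The brute-force sum visits k^n paths, so it is only practical for tiny chains; it is shown here to make the link between the joint distribution and the marginal concrete.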
Theorem 3.10.3 Marginal Distributions at Times Other Than 1. Consider a finite Markov chain with stationary transition distributions having initial distribution $v$ and transition matrix $P$. The marginal distribution of $X_n$, the state at time $n$, is given by the probability vector $vP^{n-1}$.

Proof The marginal distribution of $X_n$ can be found from Eq. (3.10.9) by summing over the possible values of $x_1, \ldots, x_{n-1}$. That is,

\[
\Pr(X_n = x_n) = \sum_{x_{n-1}=1}^{k} \cdots \sum_{x_2=1}^{k} \sum_{x_1=1}^{k} v_{x_1} p_{x_1 x_2} p_{x_2 x_3} \cdots p_{x_{n-1} x_n}.
\tag{3.10.10}
\]

The innermost sum in Eq. (3.10.10) for $x_1 = 1, \ldots, k$ involves only the first two factors $v_{x_1} p_{x_1 x_2}$ and produces the $x_2$ coordinate of $vP$. Similarly, the next innermost sum over $x_2 = 1, \ldots, k$ involves only the $x_2$ coordinate of $vP$ and $p_{x_2 x_3}$ and produces the $x_3$ coordinate of $vPP = vP^2$. Proceeding in this way through all $n - 1$ summations produces the $x_n$ coordinate of $vP^{n-1}$. ∎

Example 3.10.11 Probabilities for the Number of Occupied Telephone Lines. Consider again the office with five telephone lines and the Markov chain for which the transition matrix $P$ is given by Eq. (3.10.6). Suppose that at the beginning of the observation process at time $n = 1$, the probability that no lines will be in use is 0.5, the probability that one line will be in use is 0.3, and the probability that two lines will be in use is 0.2. Then the initial probability vector is $v = (0.5, 0.3, 0.2, 0, 0, 0)$. We shall first determine the distribution of the number of lines in use at time 2, one period later.

By an elementary computation it will be found that

\[
vP = (0.13, 0.33, 0.22, 0.12, 0.10, 0.10).
\]

Since the first component of this probability vector is 0.13, the probability that no lines will be in use at time 2 is 0.13; since the second component is 0.33, the probability that exactly one line will be in use at time 2 is 0.33; and so on.

Next, we shall determine the distribution of the number of lines that will be in use at time 3.
By use of Eq. (3.10.7), it can be found that

\[
vP^2 = (0.133, 0.227, 0.202, 0.156, 0.162, 0.120).
\]

Since the first component of this probability vector is 0.133, the probability that no lines will be in use at time 3 is 0.133; since the second component is 0.227, the probability that exactly one line will be in use at time 3 is 0.227; and so on. ◀

Stationary Distributions

Example 3.10.12 A Special Initial Distribution for Telephone Lines. Suppose that the initial distribution for the number of occupied telephone lines is

\[
v = (0.119, 0.193, 0.186, 0.173, 0.196, 0.133).
\]

It can be shown, by matrix multiplication, that $vP = v$. This means that if $v$ is the initial distribution, then it is also the distribution after one transition. Hence, it will also be the distribution after two or more transitions. ◀

Definition 3.10.9 Stationary Distribution. Let $P$ be the transition matrix for a Markov chain. A probability vector $v$ that satisfies $vP = v$ is called a stationary distribution for the Markov chain.

The initial distribution in Example 3.10.12 is a stationary distribution for the telephone lines Markov chain.
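The computations in Examples 3.10.11 and 3.10.12 can be replayed numerically, at least approximately. The sketch below is an illustration under stated assumptions: it uses the rounded two-step matrix of Eq. (3.10.7), because the one-step matrix of Eq. (3.10.6) is not reproduced in this part of the text, and the helper name `vec_mat` is hypothetical. Since $vP = v$ implies $vP^2 = v$, the stationary vector of Example 3.10.12 should be reproduced by $P^2$ up to rounding error.

```python
# Two-step transition matrix as printed (rounded) in Eq. (3.10.7).
P2 = [
    [0.14, 0.23, 0.20, 0.15, 0.16, 0.12],
    [0.13, 0.24, 0.20, 0.15, 0.16, 0.12],
    [0.12, 0.20, 0.21, 0.18, 0.17, 0.12],
    [0.11, 0.17, 0.19, 0.20, 0.20, 0.13],
    [0.11, 0.16, 0.16, 0.18, 0.24, 0.15],
    [0.11, 0.16, 0.15, 0.17, 0.25, 0.16],
]


def vec_mat(v, M):
    """Row vector times matrix: coordinate j is sum_i v[i] * M[i][j]."""
    return [sum(v[i] * M[i][j] for i in range(len(v)))
            for j in range(len(M[0]))]


# Example 3.10.11: distribution at time 3 is v P^2.
v_init = [0.5, 0.3, 0.2, 0.0, 0.0, 0.0]
dist_time3 = vec_mat(v_init, P2)
print([round(x, 3) for x in dist_time3])

# Example 3.10.12: a stationary v should satisfy v P^2 = v (here only
# approximately, because both v and P2 are rounded).
v_stat = [0.119, 0.193, 0.186, 0.173, 0.196, 0.133]
after_two = vec_mat(v_stat, P2)
max_gap = max(abs(a - b) for a, b in zip(after_two, v_stat))
print(max_gap)  # small, on the order of the rounding in the inputs
```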
If the chain starts in this distribution, the distribution stays the same at all times. Every finite Markov chain with stationary transition distributions has at least one stationary distribution. Some chains have a unique stationary distribution.

Note: A Stationary Distribution Does Not Mean That the Chain is Not Moving. It is important to note that $vP^n$ gives the probabilities that the chain is in each of its states after $n$ transitions, calculated before the initial state of the chain or any transitions are observed. These are different from the probabilities of being in the various states after observing the initial state or after observing any of the intervening transitions. In addition, a stationary distribution does not imply that the Markov chain is staying put. If a Markov chain starts in a stationary distribution, then for each state $i$, the probability that the chain is in state $i$ after $n$ transitions is the same as the probability that it is in state $i$ at the start. But the Markov chain can still move around from one state to the next at each transition. The one case in which a Markov chain does stay put is after it moves into an absorbing state. A distribution that is concentrated solely on absorbing states will necessarily be stationary because the Markov chain will never move if it starts in such a distribution. In such cases, all of the uncertainty surrounds the initial state, which will also be the state after every transition.

Example 3.10.13 Stationary Distributions for the Plant Breeding Experiment. Consider again the experiment described in Example 3.10.6. The first and sixth states, $\{AA, AA\}$ and $\{aa, aa\}$, respectively, are absorbing states. It is easy to see that every initial distribution of the form $v = (p, 0, 0, 0, 0, 1 - p)$ for $0 \le p \le 1$ has the property that $vP = v$. Suppose that the chain is in state 1 with probability $p$ and in state 6 with probability $1 - p$ at time 1.
Because these two states are absorbing states, the chain will never move and the event $X_1 = 1$ is the same as the event that $X_n = 1$ for all $n$. Similarly, $X_1 = 6$ is the same as $X_n = 6$. So, thinking ahead to where the chain is likely to be after $n$ transitions, we would also say that it will be in state 1 with probability $p$ and in state 6 with probability $1 - p$. ◀

Method for Finding Stationary Distributions

We can rewrite the equation $vP = v$ that defines stationary distributions as $v[P - I] = 0$, where $I$ is a $k \times k$ identity matrix and $0$ is a $k$-dimensional vector of all zeros. Unfortunately, this system of equations has lots of solutions even if there is a unique stationary distribution. The reason is that whenever $v$ solves the system, so does $cv$ for all real $c$ (including $c = 0$). Even though the system has $k$ equations for $k$ variables, there is at least one redundant equation. However, there is also one missing equation. We need to require that the solution vector $v$ has coordinates that sum to 1. We can fix both of these problems by replacing one of the equations in the original system by the equation that says that the coordinates of $v$ sum to 1.

To be specific, define the matrix $G$ to be $P - I$ with its last column replaced by a column of all ones. Then, solve the equation

\[
vG = (0, \ldots, 0, 1).
\tag{3.10.11}
\]

If there is a unique stationary distribution, we will find it by solving (3.10.11). In this case, the matrix $G$ will have an inverse $G^{-1}$ that satisfies

\[
GG^{-1} = G^{-1}G = I.
\]

The solution of (3.10.11) will then be

\[
v = (0, \ldots, 0, 1)G^{-1},
\]

which is easily seen to be the bottom row of the matrix $G^{-1}$. This was the method used to find the stationary distribution in Example 3.10.12.
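The method just described can be sketched in a few lines. The code below is one possible implementation, not the one used in the text: it builds $G$ from $P - I$, then solves $vG = (0, \ldots, 0, 1)$ by Gauss-Jordan elimination on the transposed system, using exact fractions so that a singular $G$ is detected reliably. The function name `stationary_distribution` and the three-state matrix are assumptions made for illustration; the matrix is not one of the text's examples.

```python
from fractions import Fraction


def stationary_distribution(P):
    """Solve v G = (0, ..., 0, 1), where G is P - I with its last column
    replaced by ones. Raises ValueError if G is singular."""
    k = len(P)
    G = [[Fraction(P[i][j]) - (1 if i == j else 0) for j in range(k)]
         for i in range(k)]
    for i in range(k):
        G[i][k - 1] = Fraction(1)

    # v G = e is the same as G^T v^T = e^T; build the augmented matrix [G^T | e].
    A = [[G[i][j] for i in range(k)] + [Fraction(1 if j == k - 1 else 0)]
         for j in range(k)]

    # Gauss-Jordan elimination with exact arithmetic.
    for col in range(k):
        pivot = next((r for r in range(col, k) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("G is singular: no unique stationary distribution")
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [x / piv for x in A[col]]
        for r in range(k):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[r][k] for r in range(k)]


# Hypothetical three-state chain (not taken from the text).
half, quarter = Fraction(1, 2), Fraction(1, 4)
P = [[half, half, 0],
     [quarter, half, quarter],
     [0, half, half]]

v = stationary_distribution(P)
print(v)  # coordinates 1/4, 1/2, 1/4 for this particular matrix
```

Solving the linear system directly avoids forming $G^{-1}$ explicitly; the answer is the same vector that the text describes as the bottom row of $G^{-1}$. For a chain whose states are all absorbing, such as the 2 × 2 identity matrix, the function raises `ValueError` because $G$ is singular.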
If the Markov chain has multiple stationary distributions, then the matrix $G$ will be singular, and this method will not find any of the stationary distributions. That is what would happen in Example 3.10.13 if one were to apply the method.

Example 3.10.14 Stationary Distribution for Toothpaste Shopping. Consider the transition matrix $P$ given in Example 3.10.4. We can construct the matrix $G$ as follows:

\[
P - I =
\begin{bmatrix}
-\tfrac{2}{3} & \tfrac{2}{3} \\
\tfrac{2}{3} & -\tfrac{2}{3}
\end{bmatrix};
\quad \text{hence} \quad
G =
\begin{bmatrix}
-\tfrac{2}{3} & 1 \\
\tfrac{2}{3} & 1
\end{bmatrix}.
\]

The inverse of $G$ is

\[
G^{-1} =
\begin{bmatrix}
-\tfrac{3}{4} & \tfrac{3}{4} \\
\tfrac{1}{2} & \tfrac{1}{2}
\end{bmatrix}.
\]

We now see that the stationary distribution is the bottom row of $G^{-1}$, $v = (1/2, 1/2)$. ◀

There is a special case in which it is known that a unique stationary distribution exists and it has special properties.
Theorem 3.10.4 If there exists $m$ such that every element of $P^m$ is strictly positive, then
• the Markov chain has a unique stationary distribution $v$,
• $\lim_{n \to \infty} P^n$ is a matrix with all rows equal to $v$, and
• no matter with what distribution the Markov chain starts, its distribution after $n$ steps converges to $v$ as $n \to \infty$.

We shall not prove Theorem 3.10.4, although some evidence for the second claim can be seen in Eq. (3.10.8), where the six rows of $P^3$ are much more alike than the rows of $P$ and they are very similar to the stationary distribution given in Example 3.10.12. The third claim in Theorem 3.10.4 actually follows easily from the second claim. In Sec. 12.5, we shall introduce a method that makes use of the third claim in Theorem 3.10.4 in order to approximate distributions of random variables when those distributions are difficult to calculate exactly.

The transition matrices in Examples 3.10.2, 3.10.5, and 3.10.7 satisfy the conditions of Theorem 3.10.4. The following example has a unique stationary distribution but does not satisfy the conditions of Theorem 3.10.4.

Example 3.10.15 Alternating Chain. Let the transition matrix for a two-state Markov chain be

\[
P =
\begin{bmatrix}
0 & 1 \\
1 & 0
\end{bmatrix}.
\]

The matrix $G$ is easy to construct and invert, and we find that the unique stationary distribution is $v = (0.5, 0.5)$. However, as $m$ increases, $P^m$ alternates between $P$ and the $2 \times 2$ identity matrix. It does not converge and never does it have all elements strictly positive. If the initial distribution is $(v_1, v_2)$, the distribution after $n$ steps alternates between $(v_1, v_2)$ and $(v_2, v_1)$. ◀

Another example that fails to satisfy the conditions of Theorem 3.10.4 is the gambler's ruin problem from Sec. 2.4.

Example 3.10.16 Gambler's Ruin. In Sec. 2.4, we described the gambler's ruin problem, in which a gambler wins one dollar with probability $p$ and loses one dollar with probability $1 - p$ on each play of a game. The sequence of amounts held by the gambler through the course of those plays forms a Markov chain with two absorbing states, namely, 0 and $k$. There are $k - 1$ other states, namely, $1, \ldots, k - 1$. (This notation violates our use of $k$ to stand for the number of states, which is $k + 1$ in this example. We felt this was less confusing than switching from the original notation of Sec. 2.4.) The transition matrix has first and last row being $(1, 0, \ldots, 0)$ and $(0, \ldots, 1)$, respectively. The $i$th row (for $i = 1, \ldots, k - 1$) has 0 everywhere except in coordinate $i - 1$ where it has $1 - p$ and in coordinate $i + 1$ where it has $p$. Unlike Example 3.10.15, this time the sequence of matrices $P^m$ converges but there is no unique stationary distribution.
The limit of $P^m$ has as its last column the numbers $a_0, \ldots, a_k$, where $a_i$ is the probability that the fortune of a gambler who starts with $i$ dollars reaches $k$ dollars before it reaches 0 dollars. The first column of the limit has the numbers $1 - a_0, \ldots, 1 - a_k$ and the rest of the limit matrix is all zeros. The stationary distributions have the same form as those in Example 3.10.13, namely, all probability is in the absorbing states. ◀

Summary

A Markov chain is a stochastic process, a sequence of random variables giving the states of the process, in which the conditional distribution of the state at the next time given all of the past states depends on the past states only through the most recent state. For Markov chains with finitely many states and stationary transition distributions, the transitions over time can be described by a matrix giving the probabilities of transition from the state indexing the row to the state indexing the column (the transition matrix $P$). The initial probability vector $v$ gives the distribution of the state at time 1. The transition matrix and initial probability vector together allow calculation of all probabilities associated with the Markov chain. In particular, $P^n$ gives the probabilities of transitions over $n$ time periods, and $vP^n$ gives the distribution of the state at time $n + 1$. A stationary distribution is a probability vector $v$ such that $vP = v$. Every finite Markov chain with stationary transition distributions has at least one stationary distribution. For many Markov chains, there is a unique stationary distribution and the distribution of the chain after $n$ transitions converges to the stationary distribution as $n$ goes to $\infty$.

Exercises

1. Consider the Markov chain in Example 3.10.2 with initial probability vector $v = (1/2, 1/2)$.
a. Find the probability vector specifying the probabilities of the states at time $n = 2$.
b. Find the two-step transition matrix.
200 Capítulo 3 Variáveis Aleatórias e Distribuições O MatrixGé fácil de construir e inverter, e descobrimos que a distribuição estacionária única év=(0.5,0.5). No entanto, comoeuaumenta,Peualterna entrePe o 2×2 matriz identidade. Não converge e nunca possui todos os elementos estritamente positivos. Se a distribuição inicial for(v1, v2), a distribuição depoisn passos alternam entre(v1, v2)e(v2, v1). - Outro exemplo que não satisfaz as condições do Teorema 3.10.4 é o problema da ruína do jogador da Seção. 2.4. Exemplo 3.10.16 Ruína do Jogador.Na seg. 2.4, descrevemos o problema da ruína do jogador, no qual um jogador ganha um dólar com probabilidadepe perde um dólar com probabilidade 1 -p em cada jogada de um jogo. A sequência de quantias mantidas pelo jogador ao longo dessas jogadas forma uma cadeia de Markov com dois estados absorventes, a saber, 0 e k. Hák-1 outros estados, a saber, 1, . . . , k-1. (Esta notação viola nosso uso de kpara representar o número de estados, que ék+1 neste exemplo. Achamos que isso era menos confuso do que mudar da notação original de Sec. 2.4.) A matriz de transição tem a primeira e a última linha sendo(1,0, . . . ,0)e(0, . . . ,1), respectivamente. Oeuª linha (para eu=1, . . . , k-1) tem 0 em todos os lugares, exceto na coordenadaeu-1 onde tem 1 -pe em coordenaçãoeu +1 onde temp. Ao contrário do Exemplo 3.10.15, desta vez a sequência de matrizesPeu converge, mas não há distribuição estacionária única. O limite dePeutem como última coluna os númerosa0, . . . , ak, ondeaeué a probabilidade de que a fortuna de um jogador que começa comeudólares atingemkdólares antes de chegar a 0 dólares. A primeira coluna do limite contém os números 1 -a0, . . . ,1 -ake o resto da matriz limite são todos zeros. As distribuições estacionárias têm a mesma forma que as do Exemplo 3.10.13, ou seja, todas as probabilidades estão nos estados absorventes. 
Summary

A Markov chain is a stochastic process, a sequence of random variables giving the states of the process, in which the conditional distribution of the state at the next time given all of the past states depends on the past states only through the most recent state. For Markov chains with finitely many states and stationary transition distributions, the transitions over time can be described by a matrix giving the probabilities of transition from the state indexing the row to the state indexing the column (the transition matrix $P$). The initial probability vector $v$ gives the distribution of the state at time 1. The transition matrix and initial probability vector together allow calculation of all probabilities associated with the Markov chain. In particular, $P^n$ gives the probabilities of transitions over $n$ time periods, and $vP^n$ gives the distribution of the state at time $n + 1$. A stationary distribution is a probability vector $v$ such that $vP = v$. Every finite Markov chain with stationary transition distributions has at least one stationary distribution. For many Markov chains, there is a unique stationary distribution, and the distribution of the chain after $n$ transitions converges to the stationary distribution as $n$ goes to $\infty$.

Exercises

1. Consider the Markov chain in Example 3.10.2 with initial probability vector $v = (1/2, 1/2)$.
   a. Find the probability vector specifying the probabilities of the states at time $n = 2$.
   b. Find the two-step transition matrix.

2. Suppose that the weather can be only sunny or cloudy and the weather conditions on successive mornings form a Markov chain with stationary transition probabilities. Suppose also that the transition matrix is as follows:

             Sunny   Cloudy
   Sunny      0.7     0.3
   Cloudy     0.6     0.4

   a. If it is cloudy on a given day, what is the probability that it will also be cloudy the next day?
   b. If it is sunny on a given day, what is the probability that it will be sunny on the next two days?
   c. If it is cloudy on a given day, what is the probability that it will be sunny on at least one of the next three days?

3. Consider again the Markov chain described in Exercise 2.
   a. If it is sunny on a certain Wednesday, what is the probability that it will be sunny on the following Saturday?
   b. If it is cloudy on a certain Wednesday, what is the probability that it will be sunny on the following Saturday?

4. Consider again the conditions of Exercises 2 and 3.
   a. If it is sunny on a certain Wednesday, what is the probability that it will be sunny on both the following Saturday and Sunday?
   b. If it is cloudy on a certain Wednesday, what is the probability that it will be sunny on both the following Saturday and Sunday?

5. Consider again the Markov chain described in Exercise 2. Suppose that the probability that it will be sunny on a certain Wednesday is 0.2 and the probability that it will be cloudy is 0.8.
   a. Determine the probability that it will be cloudy on the next day, Thursday.
   b. Determine the probability that it will be cloudy on Friday.
   c. Determine the probability that it will be cloudy on Saturday.

6. Suppose that a student will be either on time or late for a particular class and that the events that he is on time or late for the class on successive days form a Markov chain with stationary transition probabilities. Suppose also that if he is late on a given day, then the probability that he will be on time the next day is 0.8. Furthermore, if he is on time on a given day, then the probability that he will be late the next day is 0.5.
   a. If the student is late on a certain day, what is the probability that he will be on time on each of the next three days?
   b. If the student is on time on a given day, what is the probability that he will be late on each of the next three days?

7. Consider again the Markov chain described in Exercise 6.
   a. If the student is late on the first day of class, what is the probability that he will be on time on the fourth day of class?
   b. If the student is on time on the first day of class, what is the probability that he will be on time on the fourth day of class?

8. Consider again the conditions of Exercises 6 and 7. Suppose that the probability that the student will be late on the first day of class is 0.7 and that the probability that he will be on time is 0.3.
   a. Determine the probability that he will be late on the second day of class.
   b. Determine the probability that he will be on time on the fourth day of class.

9. Suppose that a Markov chain has four states 1, 2, 3, 4 and stationary transition probabilities as specified by the following transition matrix:

            1      2      3      4
      1    1/4    1/4     0     1/2
      2     0      1      0      0
      3    1/2     0     1/2     0
      4    1/4    1/4    1/4    1/4

   a. If the chain is in state 3 at a given time $n$, what is the probability that it will be in state 2 at time $n + 2$?
   b. If the chain is in state 1 at a given time $n$, what is the probability that it will be in state 3 at time $n + 3$?

10. Let $X_1$ denote the initial state at time 1 of the Markov chain for which the transition matrix is as specified in Exercise 9, and suppose that the initial probabilities are as follows:
    $\Pr(X_1 = 1) = 1/8$, $\Pr(X_1 = 2) = 1/4$, $\Pr(X_1 = 3) = 3/8$, $\Pr(X_1 = 4) = 1/4$.
    Determine the probabilities that the chain will be in states 1, 2, 3, and 4 at time $n$ for each of the following values of $n$: (a) $n = 2$; (b) $n = 3$; (c) $n = 4$.

11. Each time that a shopper purchases a tube of toothpaste, she chooses either brand A or brand B. Suppose that the probability is 1/3 that she will choose the same brand chosen on her previous purchase, and the probability is 2/3 that she will switch brands.
    a. If her first purchase is brand A, what is the probability that her fifth purchase will be brand B?
    b. If her first purchase is brand B, what is the probability that her fifth purchase will be brand B?

12. Suppose that three boys A, B, and C are throwing a ball from one to another. Whenever A has the ball, he throws it to B with a probability of 0.2 and to C with a probability of 0.8. Whenever B has the ball, he throws it to A with a probability of 0.6 and to C with a probability of 0.4. Whenever C has the ball, he is equally likely to throw it to either A or B.
    a. Consider this process to be a Markov chain and construct the transition matrix.
    b. If each of the three boys is equally likely to have the ball at a certain time $n$, which boy is most likely to have the ball at time $n + 2$?

13. Suppose that a coin is tossed repeatedly in such a way that heads and tails are equally likely to appear on any given toss and that all tosses are independent, with the following exception: Whenever either three heads or three tails have been obtained on three successive tosses, then the outcome of the next toss is always of the opposite type. At time $n$ ($n \ge 3$), let the state of this process be specified by the outcomes on tosses $n - 2$, $n - 1$, and $n$. Show that this process is a Markov chain with stationary transition probabilities and construct the transition matrix.

14. There are two boxes A and B, each containing red and green balls. Suppose that box A contains one red ball and two green balls and box B contains eight red balls and two green balls. Consider the following process: One ball is selected at random from box A, and one ball is selected at random from box B. The ball selected from box A is then placed in box B and the ball selected from box B is placed in box A.
These operations are then repeated indefinitely. Show that the numbers of red balls in box A form a Markov chain with stationary transition probabilities, and construct the transition matrix of the Markov chain.

15. Verify the rows of the transition matrix in Example 3.10.6 that correspond to current states {AA, Aa} and {Aa, aa}.

16. Let the initial probability vector in Example 3.10.6 be $v = (1/16, 1/4, 1/8, 1/4, 1/4, 1/16)$. Find the probabilities of the six states after one generation.

17. Return to Example 3.10.6. Assume that the state at time $n - 1$ is {Aa, aa}.
    a. Suppose that we learn that $X_{n+1}$ is {AA, aa}. Find the conditional distribution of $X_n$. (That is, find all the probabilities for the possible states at time $n$ given that the state at time $n + 1$ is {AA, aa}.)
    b. Suppose that we learn that $X_{n+1}$ is {aa, aa}. Find the conditional distribution of $X_n$.

18. Return to Example 3.10.13. Prove that the stationary distributions described there are the only stationary distributions for that Markov chain.

19. Find the unique stationary distribution for the Markov chain in Exercise 2.

20. The unique stationary distribution in Exercise 9 is $v = (0, 1, 0, 0)$. This is an instance of the following general result: Suppose that a Markov chain has exactly one absorbing state. Suppose further that, for each non-absorbing state $k$, there is $n$ such that the probability is positive of moving from state $k$ to the absorbing state in $n$ steps. Then the unique stationary distribution has probability 1 in the absorbing state. Prove this result.

3.11 Supplementary Exercises

1. Suppose that $X$ and $Y$ are independent random variables, that $X$ has the uniform distribution on the integers 1, 2, 3, 4, 5 (discrete), and that $Y$ has the uniform distribution on the interval $[0, 5]$ (continuous). Let $Z$ be a random variable such that $Z = X$ with probability 1/2 and $Z = Y$ with probability 1/2. Sketch the c.d.f. of $Z$.

2. Suppose that $X$ and $Y$ are independent random variables.
Suppose that $X$ has a discrete distribution concentrated on finitely many distinct values with p.f. $f_1$. Suppose that $Y$ has a continuous distribution with p.d.f. $f_2$. Let $Z = X + Y$. Show that $Z$ has a continuous distribution and find its p.d.f. Hint: First find the conditional p.d.f. of $Z$ given $X = x$.

3. Suppose that the random variable $X$ has the following c.d.f.:
\[
F(x) =
\begin{cases}
0 & \text{for } x \le 0, \\
\frac{2}{5}x & \text{for } 0 < x \le 1, \\
\frac{3}{5}x - \frac{1}{5} & \text{for } 1 < x \le 2, \\
1 & \text{for } x > 2.
\end{cases}
\]
Verify that $X$ has a continuous distribution, and determine the p.d.f. of $X$.
4. Suppose that the random variable $X$ has a continuous distribution with the following p.d.f.:
\[
f(x) = \frac{1}{2} e^{-|x|} \quad \text{for } -\infty < x < \infty.
\]
Determine the value $x_0$ such that $F(x_0) = 0.9$, where $F(x)$ is the c.d.f. of $X$.

5. Suppose that $X_1$ and $X_2$ are i.i.d. random variables, and that each has the uniform distribution on the interval $[0, 1]$. Evaluate $\Pr(X_1^2 + X_2^2 \le 1)$.

6. For each value of $p > 1$, let
\[
c(p) = \sum_{x=1}^{\infty} \frac{1}{x^p}.
\]
Suppose that the random variable $X$ has a discrete distribution with the following p.f.:
\[
f(x) = \frac{1}{c(p)\, x^p} \quad \text{for } x = 1, 2, \ldots.
\]
   a. For each fixed positive integer $n$, determine the probability that $X$ will be divisible by $n$.
   b. Determine the probability that $X$ will be odd.

7. Suppose that $X_1$ and $X_2$ are i.i.d. random variables, each of which has the p.f. $f(x)$ specified in Exercise 6. Determine the probability that $X_1 + X_2$ will be even.

8. Suppose that an electronic system comprises four components, and let $X_j$ denote the time until component $j$ fails to operate ($j = 1, 2, 3, 4$). Suppose that $X_1$, $X_2$, $X_3$, and $X_4$ are i.i.d. random variables, each of which has a continuous distribution with c.d.f. $F(x)$. Suppose that the system will operate as long as both component 1 and at least one of the other three components operate. Determine the c.d.f. of the time until the system fails to operate.

9. Suppose that a box contains a large number of tacks and that the probability $X$ that a particular tack will land with its point up when it is tossed varies from tack to tack in accordance with the following p.d.f.:
\[
f(x) =
\begin{cases}
2(1 - x) & \text{for } 0 < x < 1, \\
0 & \text{otherwise.}
\end{cases}
\]
Suppose that a tack is selected at random from the box and that this tack is then tossed three times independently. Determine the probability that the tack will land with its point up on all three tosses.

10. Suppose that the radius $X$ of a circle is a random variable having the following p.d.f.:
\[
f(x) =
\begin{cases}
\frac{1}{8}(3x + 1) & \text{for } 0 < x < 2, \\
0 & \text{otherwise.}
\end{cases}
\]
Determine the p.d.f. of the area of the circle.

11. Suppose that the random variable $X$ has the following p.d.f.:
\[
f(x) =
\begin{cases}
2e^{-2x} & \text{for } x > 0, \\
0 & \text{otherwise.}
\end{cases}
\]
Construct a random variable $Y = r(X)$ that has the uniform distribution on the interval $[0, 5]$.

12. Suppose that the 12 random variables $X_1, \ldots, X_{12}$ are i.i.d. and each has the uniform distribution on the interval $[0, 20]$. For $j = 0, 1, \ldots, 19$, let $I_j$ denote the interval $(j, j + 1)$. Determine the probability that none of the 20 disjoint intervals $I_j$ will contain more than one of the random variables $X_1, \ldots, X_{12}$.

13. Suppose that the joint distribution of $X$ and $Y$ is uniform over a set $A$ in the $xy$-plane. For which of the following sets $A$ are $X$ and $Y$ independent?
    a. A circle with a radius of 1 and with its center at the origin
    b. A circle with a radius of 1 and with its center at the point $(3, 5)$
    c. A square with vertices at the four points $(1, 1)$, $(1, -1)$, $(-1, -1)$, and $(-1, 1)$
    d. A rectangle with vertices at the four points $(0, 0)$, $(0, 3)$, $(1, 3)$, and $(1, 0)$
    e. A square with vertices at the four points $(0, 0)$, $(1, 1)$, $(0, 2)$, and $(-1, 1)$

14. Suppose that $X$ and $Y$ are independent random variables with the following p.d.f.'s:
\[
f_1(x) =
\begin{cases}
1 & \text{for } 0 < x < 1, \\
0 & \text{otherwise,}
\end{cases}
\qquad
f_2(y) =
\begin{cases}
8y & \text{for } 0 < y < \frac{1}{2}, \\
0 & \text{otherwise.}
\end{cases}
\]
Determine the value of $\Pr(X > Y)$.

15. Suppose that, on a particular day, two persons A and B arrive at a certain store independently of each other. Suppose that A remains in the store for 15 minutes and B remains in the store for 10 minutes. If the time of arrival of each person has the uniform distribution over the hour between 9:00 a.m. and 10:00 a.m., what is the probability that A and B will be in the store at the same time?

16. Suppose that $X$ and $Y$ have the following joint p.d.f.:
\[
f(x, y) =
\begin{cases}
2(x + y) & \text{for } 0 < x < y < 1, \\
0 & \text{otherwise.}
\end{cases}
\]
Determine (a) $\Pr(X < 1/2)$; (b) the marginal p.d.f. of $X$; and (c) the conditional p.d.f. of $Y$ given that $X = x$.

17. Suppose that $X$ and $Y$ are random variables. The marginal p.d.f. of $X$ is
\[
f(x) =
\begin{cases}
3x^2 & \text{for } 0 < x < 1, \\
0 & \text{otherwise.}
\end{cases}
\]
Also, the conditional p.d.f. of $Y$ given that $X = x$ is
\[
g(y \mid x) =
\begin{cases}
\dfrac{3y^2}{x^3} & \text{for } 0 < y < x, \\
0 & \text{otherwise.}
\end{cases}
\]
Determine (a) the marginal p.d.f. of $Y$ and (b) the conditional p.d.f. of $X$ given that $Y = y$.

18. Suppose that the joint distribution of $X$ and $Y$ is uniform over the region in the $xy$-plane bounded by the four lines $x = -1$, $x = 1$, $y = x + 1$, and $y = x - 1$. Determine (a) $\Pr(XY > 0)$ and (b) the conditional p.d.f. of $Y$ given that $X = x$.

19. Suppose that the random variables $X$, $Y$, and $Z$ have the following joint p.d.f.:
\[
f(x, y, z) =
\begin{cases}
6 & \text{for } 0 < x < y < z < 1, \\
0 & \text{otherwise.}
\end{cases}
\]
Determine the univariate marginal p.d.f.'s of $X$, $Y$, and $Z$.

20. Suppose that the random variables $X$, $Y$, and $Z$ have the following joint p.d.f.:
\[
f(x, y, z) =
\begin{cases}
2 & \text{for } 0 < x < y < 1 \text{ and } 0 < z < 1, \\
0 & \text{otherwise.}
\end{cases}
\]
Evaluate $\Pr(3X > Y \mid 1 < 4Z < 2)$.

21. Suppose that $X$ and $Y$ are i.i.d. random variables, and that each has the following p.d.f.:
\[
f(x) =
\begin{cases}
e^{-x} & \text{for } x > 0, \\
0 & \text{otherwise.}
\end{cases}
\]
Also, let $U = X/(X + Y)$ and $V = X + Y$.
    a. Determine the joint p.d.f. of $U$ and $V$.
    b. Are $U$ and $V$ independent?

22. Suppose that the random variables $X$ and $Y$ have the following joint p.d.f.:
\[
f(x, y) =
\begin{cases}
8xy & \text{for } 0 \le x \le y \le 1, \\
0 & \text{otherwise.}
\end{cases}
\]
Also, let $U = X/Y$ and $V = Y$.
    a. Determine the joint p.d.f. of $U$ and $V$.
    b. Are $X$ and $Y$ independent?
    c. Are $U$ and $V$ independent?

23. Suppose that $X_1, \ldots, X_n$ are i.i.d. random variables, each having the following c.d.f.:
\[
F(x) =
\begin{cases}
0 & \text{for } x \le 0, \\
1 - e^{-x} & \text{for } x > 0.
\end{cases}
\]
Let $Y_1 = \min\{X_1, \ldots, X_n\}$ and $Y_n = \max\{X_1, \ldots, X_n\}$. Determine the conditional p.d.f. of $Y_1$ given that $Y_n = y_n$.

24. Suppose that $X_1$, $X_2$, and $X_3$ form a random sample of three observations from a distribution having the following p.d.f.:
\[
f(x) =
\begin{cases}
2x & \text{for } 0 < x < 1, \\
0 & \text{otherwise.}
\end{cases}
\]
Determine the p.d.f. of the range of the sample.

25. In this exercise, we shall provide an approximate justification for Eq. (3.6.6). First, remember that if $a$ and $b$ are close together, then
\[
\int_a^b r(t)\, dt \approx (b - a)\, r\!\left(\frac{a + b}{2}\right). \tag{3.11.1}
\]
Throughout this problem, assume that $X$ and $Y$ have joint p.d.f. $f$.
    a. Use (3.11.1) to approximate $\Pr(y - \epsilon < Y \le y + \epsilon)$.
    b. Use (3.11.1) with $r(t) = f(s, t)$ for fixed $s$ to approximate
\[
\Pr(X \le x \text{ and } y - \epsilon < Y \le y + \epsilon) = \int_{-\infty}^{x} \int_{y - \epsilon}^{y + \epsilon} f(s, t)\, dt\, ds.
\]
    c. Show that the ratio of the approximation in part (b) to the approximation in part (a) is $\int_{-\infty}^{x} g_1(s \mid y)\, ds$.

26. Let $X_1$, $X_2$ be two independent random variables each with p.d.f. $f_1(x) = e^{-x}$ for $x > 0$ and $f_1(x) = 0$ for $x \le 0$. Let $Z = X_1 - X_2$ and $W = X_1/X_2$.
    a. Find the joint p.d.f. of $X_1$ and $Z$.
    b. Prove that the conditional p.d.f. of $X_1$ given $Z = 0$ is
\[
g_1(x_1 \mid 0) =
\begin{cases}
2e^{-2x_1} & \text{for } x_1 > 0, \\
0 & \text{otherwise.}
\end{cases}
\]
    c. Find the joint p.d.f. of $X_1$ and $W$.
    d. Prove that the conditional p.d.f. of $X_1$ given $W = 1$ is
\[
h_1(x_1 \mid 1) =
\begin{cases}
4x_1 e^{-2x_1} & \text{for } x_1 > 0, \\
0 & \text{otherwise.}
\end{cases}
\]
    e. Notice that $\{Z = 0\} = \{W = 1\}$, but the conditional distribution of $X_1$ given $Z = 0$ is not the same as the conditional distribution of $X_1$ given $W = 1$. This discrepancy is known as the Borel paradox. In light of the discussion that begins on page 146 about how conditional p.d.f.'s are not like conditioning on events of probability 0, show how "$Z$ very close to 0" is not the same as "$W$ very close to 1." Hint: Draw a set of axes for $x_1$ and $x_2$, and draw the two sets $\{(x_1, x_2) : |x_1 - x_2| < \epsilon\}$ and $\{(x_1, x_2) : |x_1/x_2 - 1| < \epsilon\}$ and see how much different they are.

27. Three boys A, B, and C are playing table tennis. In each game, two of the boys play against each other and the third boy does not play. The winner of any given game $n$ plays again in game $n + 1$ against the boy who did not play in game $n$, and the loser of game $n$ does not play in game $n + 1$. The probability that A will beat B in any game that they play against each other is 0.3, the probability that A will beat C is 0.6, and the probability that B will beat C is 0.8.
Represent this process as a Markov chain with stationary transition probabilities by defining the possible states and constructing the transition matrix.

28. Consider again the Markov chain described in Exercise 27. (a) Determine the probability that the two boys who play against each other in the first game will play against each other again in the fourth game. (b) Show that this probability does not depend on which two boys play in the first game.

29. Find the unique stationary distribution for the Markov chain in Exercise 27.
Chapter 4
Expectation

4.1 The Expectation of a Random Variable
4.2 Properties of Expectations
4.3 Variance
4.4 Moments
4.5 The Mean and the Median
4.6 Covariance and Correlation
4.7 Conditional Expectation
4.8 Utility
4.9 Supplementary Exercises

4.1 The Expectation of a Random Variable

The distribution of a random variable X contains all of the probabilistic information about X. The entire distribution of X, however, is usually too cumbersome for presenting this information. Summaries of the distribution, such as the average value, or expected value, can be useful for giving people an idea of where we expect X to be without trying to describe the entire distribution. The expected value also plays an important role in the approximation methods that arise in Chapter 6.

Expectation for a Discrete Distribution

Example 4.1.1 Fair Price for a Stock. An investor is considering whether or not to invest $18 per share in a stock for one year. The value of the stock after one year, in dollars, will be 18 + X, where X is the amount by which the price changes over the year. At present X is unknown, and the investor would like to compute an "average value" for X in order to compare the return she expects from the investment to what she would get by putting the $18 in the bank at 5% interest. ◀

The idea of finding an average value as in Example 4.1.1 arises in many applications that involve a random variable. One popular choice is what we call the mean or expected value or expectation. The intuitive idea of the mean of a random variable is that it is the weighted average of the possible values of the random variable with the weights equal to the probabilities.

Example 4.1.2 Stock Price Change.
Suppose that the change in price of the stock in Example 4.1.1 is a random variable X that can assume only the four different values -2, 0, 1, and 4, and that Pr(X = -2) = 0.1, Pr(X = 0) = 0.4, Pr(X = 1) = 0.3, and Pr(X = 4) = 0.2. Then the weighted average of these values is
\[
-2(0.1) + 0(0.4) + 1(0.3) + 4(0.2) = 0.9.
\]
The investor now compares this with the interest that would be earned on $18 at 5% for one year, which is 18 × 0.05 = 0.9 dollars. From this point of view, the price of $18 seems fair. ◀
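The weighted average in Example 4.1.2 can be checked with a few lines of code. The following is a minimal sketch, not part of the original text: the helper name `expectation` and the dictionary representation of the p.f. are illustrative assumptions, and exact fractions are used so that no rounding enters the comparison.

```python
from fractions import Fraction

def expectation(pf):
    """Weighted average of the values, with the probabilities as weights.

    `pf` is a hypothetical representation of a p.f.: a dict mapping each
    possible value x to Pr(X = x).
    """
    return sum(x * p for x, p in pf.items())

# p.f. of the price change X from Example 4.1.2
pf = {
    -2: Fraction(1, 10),
    0: Fraction(4, 10),
    1: Fraction(3, 10),
    4: Fraction(2, 10),
}
assert sum(pf.values()) == 1  # the probabilities must add up to 1

mean_change = expectation(pf)          # expected price change in dollars
bank_interest = 18 * Fraction(5, 100)  # interest on $18 at 5% for one year

print(mean_change, bank_interest)
```

Both quantities come out as 9/10, which is the sense in which the $18 price is called fair in the example.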
The calculation in Example 4.1.2 generalizes easily to every random variable that assumes only finitely many values. Possible problems arise with random variables that assume more than finitely many values, especially when the collection of possible values is unbounded.

Definition 4.1.1 Mean of Bounded Discrete Random Variable. Let X be a bounded discrete random variable whose p.f. is f. The expectation of X, denoted by E(X), is a number defined as follows:
\[
E(X) = \sum_{\text{All } x} x f(x). \tag{4.1.1}
\]
The expectation of X is also referred to as the mean of X or the expected value of X.
In Example 4.1.2, E(X) = 0.9. Notice that 0.9 is not one of the possible values of X in that example. This is typically the case with discrete random variables.

Example 4.1.3 Bernoulli Random Variable. Let X have the Bernoulli distribution with parameter p, that is, assume that X takes only the two values 0 and 1 with Pr(X = 1) = p. Then the mean of X is
\[
E(X) = 0 \times (1 - p) + 1 \times p = p.
\]
◀

If X is unbounded, it might still be possible to define E(X) as the weighted average of its possible values. However, some care is needed.

Definition 4.1.2 Mean of General Discrete Random Variable. Let X be a discrete random variable whose p.f. is f. Suppose that at least one of the following sums is finite:
\[
\sum_{\text{Positive } x} x f(x), \qquad \sum_{\text{Negative } x} x f(x). \tag{4.1.2}
\]
Then the mean, expectation, or expected value of X is said to exist and is defined to be
\[
E(X) = \sum_{\text{All } x} x f(x). \tag{4.1.3}
\]
If both of the sums in (4.1.2) are infinite, then E(X) does not exist.
The reason that the expectation fails to exist if both of the sums in (4.1.2) are infinite is that, in such cases, the sum in (4.1.3) is not well-defined. It is known from calculus that the sum of an infinite series whose positive and negative terms both add to infinity either fails to converge or can be made to converge to many different values by rearranging the terms in different orders. We don't want the meaning of expected value to depend on arbitrary choices about the order in which numbers are added. If only one of the two sums in (4.1.2) is infinite, then the expected value is also infinite with the same sign as that of the sum that is infinite. If both sums are finite, then the sum in (4.1.3) converges and doesn't depend on the order in which the terms are added.

Example 4.1.4 The Mean of X Does Not Exist. Let X be a random variable whose p.f. is
\[
f(x) = \begin{cases} \dfrac{1}{2|x|(|x| + 1)} & \text{if } x = \pm 1, \pm 2, \pm 3, \ldots, \\ 0 & \text{otherwise.} \end{cases}
\]
It can be verified that this function satisfies the conditions required to be a p.f. The two sums in (4.1.2) are
\[
\sum_{x=-\infty}^{-1} x \frac{1}{2|x|(|x| + 1)} = -\infty \quad \text{and} \quad \sum_{x=1}^{\infty} x \frac{1}{2x(x + 1)} = \infty;
\]
hence, E(X) does not exist. ◀

Example 4.1.5 An Infinite Mean. Let X be a random variable whose p.f. is
\[
f(x) = \begin{cases} \dfrac{1}{x(x + 1)} & \text{if } x = 1, 2, 3, \ldots, \\ 0 & \text{otherwise.} \end{cases}
\]
The sum over negative values in Eq. (4.1.2) is 0, so the mean of X exists and is
\[
E(X) = \sum_{x=1}^{\infty} x \frac{1}{x(x + 1)} = \infty.
\]
We say that the mean of X is infinite in this case. ◀

Note: The Expectation of X Depends Only on the Distribution of X. Although E(X) is called the expectation of X, it depends only on the distribution of X. Every two random variables that have the same distribution will have the same expectation even if they have nothing to do with each other. For this reason, we shall often refer to the expectation of a distribution even if we do not have in mind a random variable with that distribution.
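To see numerically why the mean in Example 4.1.5 is infinite even though f is a valid p.f., one can compare partial sums of f(x) with partial sums of x f(x). The sketch below is illustrative only and is not part of the original text. The function name `partial_sums` is an assumption, and a finite partial sum can suggest divergence but cannot prove it.

```python
def partial_sums(n):
    """Partial sums over x = 1, ..., n for the p.f. f(x) = 1 / (x (x + 1)).

    Returns (sum of f(x), sum of x * f(x)).
    """
    total_prob = 0.0
    partial_mean = 0.0
    for x in range(1, n + 1):
        fx = 1.0 / (x * (x + 1))
        total_prob += fx         # telescopes to 1 - 1/(n + 1)
        partial_mean += x * fx   # equals the sum of 1/(x + 1): a harmonic tail
    return total_prob, partial_mean

for n in (10, 1_000, 100_000):
    prob, mean_so_far = partial_sums(n)
    print(n, round(prob, 6), round(mean_so_far, 3))
```

The first column of sums approaches 1, while the second keeps growing roughly like the logarithm of n, which is consistent with E(X) = ∞.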
Expectation for a Continuous Distribution

The idea of computing a weighted average of the possible values can be generalized to continuous random variables by using integrals instead of sums. The distinction between bounded and unbounded random variables arises in this case for the same reasons.

Definition 4.1.3 Mean of Bounded Continuous Random Variable. Let X be a bounded continuous random variable whose p.d.f. is f. The expectation of X, denoted E(X), is defined as follows:
\[
E(X) = \int_{-\infty}^{\infty} x f(x)\, dx. \tag{4.1.4}
\]
Once again, the expectation is also called the mean or the expected value.

Example 4.1.6 Expected Failure Time. An appliance has a maximum lifetime of one year. The time X until it fails is a random variable with a continuous distribution having p.d.f.
\[
f(x) = \begin{cases} 2x & \text{for } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases}
\]
Then
\[
E(X) = \int_{0}^{1} x(2x)\, dx = \int_{0}^{1} 2x^{2}\, dx = \frac{2}{3}.
\]
We can also say that the expectation of the distribution with p.d.f. f is 2/3. ◀
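The integral in Example 4.1.6 can be approximated numerically as a sanity check on Eq. (4.1.4). This is a minimal sketch under stated assumptions: the midpoint rule and the helper name `midpoint_integral` are choices made here for illustration and do not come from the text.

```python
def midpoint_integral(g, a, b, n=100_000):
    """Approximate the integral of g over [a, b] with the composite midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def pdf(x):
    """p.d.f. of the failure time X in Example 4.1.6."""
    return 2.0 * x if 0.0 < x < 1.0 else 0.0

# E(X) = integral of x * f(x); f vanishes outside (0, 1), so [0, 1] suffices.
mean = midpoint_integral(lambda x: x * pdf(x), 0.0, 1.0)
print(mean)
```

The printed value agrees with the exact answer 2/3 to many decimal places.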
For general continuous random variables, we modify Definition 4.1.2.

Definition 4.1.4 Mean of General Continuous Random Variable. Let X be a continuous random variable whose p.d.f. is f. Suppose that at least one of the following integrals is finite:
\[
\int_{0}^{\infty} x f(x)\, dx, \qquad \int_{-\infty}^{0} x f(x)\, dx. \tag{4.1.5}
\]
Then the mean, expectation, or expected value of X is said to exist and is defined to be
\[
E(X) = \int_{-\infty}^{\infty} x f(x)\, dx. \tag{4.1.6}
\]
If both of the integrals in (4.1.5) are infinite, then E(X) does not exist.

Example 4.1.7 Failure after Warranty. A product has a warranty of one year. Let X be the time at which the product fails. Suppose that X has a continuous distribution with the p.d.f.
\[
f(x) = \begin{cases} 0 & \text{for } x < 1, \\ \dfrac{2}{x^{3}} & \text{for } x \ge 1. \end{cases}
\]
The expected time to failure is then
\[
E(X) = \int_{1}^{\infty} x \frac{2}{x^{3}}\, dx = \int_{1}^{\infty} \frac{2}{x^{2}}\, dx = 2.
\]
◀

Example 4.1.8 A Mean That Does Not Exist. Suppose that a random variable X has a continuous distribution for which the p.d.f. is as follows:
\[
f(x) = \frac{1}{\pi(1 + x^{2})} \quad \text{for } -\infty < x < \infty. \tag{4.1.7}
\]
This distribution is called the Cauchy distribution.
We can verify the fact that \(\int_{-\infty}^{\infty} f(x)\, dx = 1\) by using the following standard result from elementary calculus:
\[
\frac{d}{dx} \tan^{-1} x = \frac{1}{1 + x^{2}} \quad \text{for } -\infty < x < \infty.
\]
The two integrals in (4.1.5) are
\[
\int_{0}^{\infty} \frac{x}{\pi(1 + x^{2})}\, dx = \infty \quad \text{and} \quad \int_{-\infty}^{0} \frac{x}{\pi(1 + x^{2})}\, dx = -\infty;
\]
hence, the mean of X does not exist. ◀

Interpretation of the Expectation

Relation of the Mean to the Center of Gravity. The expectation of a random variable or, equivalently, the mean of its distribution can be regarded as being the center of gravity of that distribution. To illustrate this concept, consider, for example, the p.f. sketched in Fig. 4.1. The x-axis may be regarded as a long weightless rod to which weights are attached. If a weight equal to f(x_j) is attached to this rod at each point x_j, then the rod will be balanced if it is supported at the point E(X).

[Figure 4.1 The mean of a discrete distribution. Weights f(x_1), ..., f(x_5) sit at the points x_1, ..., x_5 on the x-axis, with the balance point E(X) marked.]

Now consider the p.d.f. sketched in Fig. 4.2. In this case, the x-axis may be regarded as a long rod over which the mass varies continuously. If the density of the rod at each point x is equal to f(x), then the center of gravity of the rod will be located at the point E(X), and the rod will be balanced if it is supported at that point.

[Figure 4.2 The mean of a continuous distribution. The curve f(x) is drawn over the x-axis, with the balance point E(X) marked.]

It can be seen from this discussion that the mean of a distribution can be affected greatly by even a very small change in the amount of probability that is assigned to a large value of x. For example, the mean of the distribution represented by the p.f. in Fig. 4.1 can be moved to any specified point on the x-axis, no matter how far from the origin that point may be, by removing an arbitrarily small but positive amount of probability from one of the points x_j and adding this amount of probability at a point far enough from the origin.

Suppose now that the p.f. or p.d.f. f of some distribution is symmetric with respect to a given point x_0 on the x-axis. In other words, suppose that f(x_0 + δ) = f(x_0 - δ) for all values of δ. Also assume that the mean E(X) of this distribution exists. In accordance with the interpretation that the mean is at the center of gravity, it follows that E(X) must be equal to x_0, which is the point of symmetry. The following example emphasizes the fact that it is necessary to make certain that the mean E(X) exists before it can be concluded that E(X) = x_0.

Example 4.1.9 The Cauchy Distribution. Consider again the p.d.f. specified by Eq. (4.1.7), which is sketched in Fig. 4.3. This p.d.f. is symmetric with respect to the point x = 0. Therefore, if the mean of the Cauchy distribution existed, its value would have to be 0. However, we saw in Example 4.1.8 that the mean of X does not exist.

The reason for the nonexistence of the mean of the Cauchy distribution is as follows: When the curve y = f(x) is sketched as in Fig. 4.3, its tails approach the x-axis rapidly enough to permit the total area under the curve to be equal to 1. On the other hand, if each value of f(x) is multiplied by x and the curve y = x f(x) is sketched, as in Fig. 4.4, the tails of this curve approach the x-axis so slowly that the total area between the x-axis and each part of the curve is infinite. ◀
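The contrast drawn in Example 4.1.9 between the two areas can be made concrete with closed-form antiderivatives. The sketch below is an illustration added here, not part of the original text. It relies on two standard calculus facts: the antiderivative of 1/(π(1 + x²)) is tan⁻¹(x)/π, and the antiderivative of x/(π(1 + x²)) is ln(1 + x²)/(2π). The function names are hypothetical.

```python
import math

def area_under_pdf(t):
    """Area under the Cauchy p.d.f. between -t and t: (2 / pi) * atan(t)."""
    return 2.0 * math.atan(t) / math.pi

def positive_part(t):
    """Integral of x * f(x) from 0 to t: ln(1 + t^2) / (2 * pi)."""
    return math.log(1.0 + t * t) / (2.0 * math.pi)

for t in (10.0, 1e3, 1e6, 1e12):
    print(t, area_under_pdf(t), positive_part(t))
```

As t grows, the first quantity settles at 1 while the second increases without bound, so the positive-side integral in (4.1.5) is infinite. By symmetry the negative-side integral is its mirror image, which is why neither half can be balanced against the other.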
[Figure 4.3 The p.d.f. of a Cauchy distribution. Vertical axis: f(x); horizontal axis: x.]

[Figure 4.4 The curve y = x f(x) for the Cauchy distribution. Horizontal axis: x.]

The Expectation of a Function

Example 4.1.10 Failure Rate and Time to Failure. Suppose that appliances manufactured by a particular company fail at a rate of X per year, where X is currently unknown and hence is a random variable. If we are interested in predicting how long such an appliance will last before failure, we might use the mean of 1/X. How can we calculate the mean of Y = 1/X? ◀
Functions of a Single Random Variable. If X is a random variable for which the p.d.f. is f, then the expectation of each real-valued function r(X) can be found by applying the definition of expectation to the distribution of r(X) as follows: Let Y = r(X), determine the probability distribution of Y, and then determine E(Y) by applying either Eq. (4.1.1) or Eq. (4.1.4). For example, suppose that Y has a continuous distribution with the p.d.f. g. Then
\[
E[r(X)] = E(Y) = \int_{-\infty}^{\infty} y g(y)\, dy, \tag{4.1.8}
\]
if the expectation exists.

Example 4.1.11 Failure Rate and Time to Failure. In Example 4.1.10, suppose that the p.d.f. of X is
\[
f(x) = \begin{cases} 3x^{2} & \text{if } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases}
\]
Let r(x) = 1/x. Using the methods of Sec. 3.8, we can find the p.d.f. of Y = r(X) as
\[
g(y) = \begin{cases} 3y^{-4} & \text{if } y > 1, \\ 0 & \text{otherwise.} \end{cases}
\]
The mean of Y is then
\[
E(Y) = \int_{1}^{\infty} y \cdot 3y^{-4}\, dy = \frac{3}{2}.
\]
◀
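Example 4.1.11 states the p.d.f. of Y without showing the step. A short derivation is sketched here for convenience. It assumes that the method of Sec. 3.8 being referred to is the usual change-of-variables formula g(y) = f[s(y)] |ds(y)/dy| for a strictly monotone r with inverse s, which is the form that appears in the proof of Theorem 4.1.1.

```latex
\begin{gather*}
y = r(x) = \frac{1}{x}, \qquad x = s(y) = \frac{1}{y}, \qquad
\left| \frac{ds(y)}{dy} \right| = \frac{1}{y^{2}}, \\
g(y) = f[s(y)] \left| \frac{ds(y)}{dy} \right|
     = 3 \left( \frac{1}{y} \right)^{2} \cdot \frac{1}{y^{2}}
     = 3y^{-4} \quad \text{for } y > 1, \\
E(Y) = \int_{1}^{\infty} y \cdot 3y^{-4}\, dy
     = \left[ -\frac{3}{2} y^{-2} \right]_{1}^{\infty}
     = \frac{3}{2}.
\end{gather*}
```

The range y > 1 comes from 0 < x < 1, since r(x) = 1/x maps the interval (0, 1) onto (1, ∞).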
Although the method of Example 4.1.11 can be used to find the mean of a continuous random variable, it is not actually necessary to determine the p.d.f. of r(X) in order to calculate the expectation E[r(X)]. In fact, it can be shown that the value of E[r(X)] can always be calculated directly using the following result.

Theorem 4.1.1 Law of the Unconscious Statistician. Let X be a random variable, and let r be a real-valued function of a real variable. If X has a continuous distribution, then
\[
E[r(X)] = \int_{-\infty}^{\infty} r(x) f(x)\, dx, \tag{4.1.9}
\]
if the mean exists. If X has a discrete distribution, then
\[
E[r(X)] = \sum_{\text{All } x} r(x) f(x), \tag{4.1.10}
\]
if the mean exists.

Proof. A general proof will not be given here. However, we shall provide a proof for two special cases. First, suppose that the distribution of X is discrete. Then the distribution of Y = r(X) must also be discrete. Let g be the p.f. of Y. For this case,
\[
\sum_{y} y g(y) = \sum_{y} y \Pr[r(X) = y]
= \sum_{y} y \sum_{x : r(x) = y} f(x)
= \sum_{y} \sum_{x : r(x) = y} r(x) f(x)
= \sum_{x} r(x) f(x).
\]
Hence, Eq. (4.1.10) yields the same value as one would obtain from Definition 4.1.1 applied to Y.

Second, suppose that the distribution of X is continuous. Suppose also, as in Sec. 3.8, that r(x) is either strictly increasing or strictly decreasing with differentiable inverse s(y). Then, if we change variables in Eq. (4.1.9) from x to y = r(x),
\[
\int_{-\infty}^{\infty} r(x) f(x)\, dx = \int_{-\infty}^{\infty} y f[s(y)] \left| \frac{ds(y)}{dy} \right| dy.
\]
It now follows from Eq. (3.8.3) that the right side of this equation is equal to
\[
\int_{-\infty}^{\infty} y g(y)\, dy.
\]
Hence, Eqs. (4.1.8) and (4.1.9) yield the same value. ■

Theorem 4.1.1 is called the law of the unconscious statistician because many people treat Eqs. (4.1.9) and (4.1.10) as the definition of E[r(X)] and forget that the definition of the mean of Y = r(X) is given in Definitions 4.1.2 and 4.1.4.

Example 4.1.12 Failure Rate and Time to Failure. In Example 4.1.11, we can apply Theorem 4.1.1 to find
\[
E(Y) = \int_{0}^{1} \frac{1}{x}\, 3x^{2}\, dx = \frac{3}{2},
\]
the same result we got in Example 4.1.11. ◀
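The agreement between Examples 4.1.11 and 4.1.12 can also be checked numerically by computing E(1/X) along both routes. This is a rough sketch under stated assumptions, not part of the original text. The midpoint rule, the helper names, and the truncation point `UPPER` for the infinite integral are all choices made here for illustration, so the first route is only approximate.

```python
def midpoint_integral(g, a, b, n):
    """Approximate the integral of g over [a, b] with the composite midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x):
    """p.d.f. of the failure rate X in Example 4.1.11."""
    return 3.0 * x * x if 0.0 < x < 1.0 else 0.0

def g(y):
    """p.d.f. of Y = 1/X, as stated in Example 4.1.11."""
    return 3.0 * y ** -4 if y > 1.0 else 0.0

# Route 1, Eq. (4.1.8): integrate y * g(y) over the distribution of Y.
# The upper limit is truncated (an assumption); the neglected tail is about 1.5e-6.
UPPER = 1000.0
mean_via_g = midpoint_integral(lambda y: y * g(y), 1.0, UPPER, 200_000)

# Route 2, Eq. (4.1.9): integrate r(x) * f(x) directly, with r(x) = 1/x.
mean_via_theorem = midpoint_integral(lambda x: (1.0 / x) * f(x), 0.0, 1.0, 1_000)

print(mean_via_g, mean_via_theorem)
```

Both values land close to 3/2. The second route needs neither the p.d.f. of Y nor a truncated infinite range, which is the practical appeal of Theorem 4.1.1.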
Example 4.1.13 Determining the Expectation of X^{1/2}. Suppose that the p.d.f. of X is as given in Example 4.1.6 and that Y = X^{1/2}. Then, by Eq. (4.1.9),
\[
E(Y) = \int_{0}^{1} x^{1/2}(2x)\, dx = 2 \int_{0}^{1} x^{3/2}\, dx = \frac{4}{5}.
\]
◀

Note: In General, E[g(X)] ≠ g(E(X)). In Example 4.1.13, the mean of X^{1/2} is 4/5. The mean of X was computed in Example 4.1.6 as 2/3. Note that 4/5 ≠ (2/3)^{1/2}. In fact, unless g is a linear function, it is generally the case that E[g(X)] ≠ g(E(X)). A linear function g does satisfy E[g(X)] = g(E(X)), as we shall see in Theorem 4.2.1.

Example 4.1.14 Option Pricing. Suppose that common stock in the up-and-coming company A is currently priced at $200 per share. As an incentive to get you to work for company A, you might be offered an option to buy a certain number of shares of the stock, one year from now, at a price of $200. This could be quite valuable if you believed that the stock was very likely to rise in price over the next year.
For simplicity, suppose that the price X of the stock one year from now is a discrete random variable that can take only two values (in dollars): 260 and 180. Let p be the probability that X = 260. You want to calculate the value of these stock options, either because you contemplate the possibility of selling them or because you want to compare Company A's offer to what other companies are offering. Let Y be the value of the option for one share when it expires in one year. Since nobody would pay $200 for the stock if the price X is less than $200, the value of the stock option is 0 if X = 180. If X = 260, one could buy the stock for $200 per share and then immediately sell it for $260. This brings in a profit of $60 per share. (For simplicity, we shall ignore dividends and the transaction costs of buying and selling stocks.) Then Y = h(X) where
\[
h(x) = \begin{cases} 0 & \text{if } x = 180, \\ 60 & \text{if } x = 260. \end{cases}
\]
Assume that an investor could earn 4% risk-free on any money invested for this same year.
(Assume that the 4% includes any compounding.) If no other investment options were available, a fair cost of the option would then be what is called the present value of \(E(Y)\) in one year. This equals the value \(c\) such that \(E(Y) = 1.04c\). That is, the expected value of the option equals the amount of money the investor would have after one year without buying the option. We can find \(E(Y)\) easily:

\[ E(Y) = 0 \times (1 - p) + 60 \times p = 60p. \]

So, the fair price of an option to buy one share would be \(c = 60p/1.04 = 57.69p\).

How should one determine the probability \(p\)? There is a standard method used in the finance industry for choosing \(p\) in this example. That method is to assume that the present value of the mean of \(X\) (the stock price in one year) is equal to the current value of the stock price. That is, assume that the expected value of buying one share of stock and waiting one year to sell is the same as the result of investing the current
cost of the stock risk-free for one year (multiplying by 1.04 in this example). In our example, this means \(E(X) = 200 \times 1.04\). Since \(E(X) = 260p + 180(1 - p)\), we set

\[ 200 \times 1.04 = 260p + 180(1 - p), \]

and obtain \(p = 0.35\). The resulting price of an option to buy one share for $200 in one year would be $57.69 × 0.35 = $20.19. This price is called the risk-neutral price of the option. One can prove (see Exercise 14 in this section) that any price other than $20.19 for the option would lead to unpleasant consequences in the market. ◀

Functions of Several Random Variables

Example 4.1.15 The Expectation of a Function of Two Variables. Let \(X\) and \(Y\) have a joint p.d.f., and suppose that we want the mean of \(X^2 + Y^2\). The most straightforward but most difficult way to do this would be to use the methods of Sec. 3.9 to find the distribution of \(Z = X^2 + Y^2\) and then apply the definition of mean to \(Z\). ◀

There is a version of Theorem 4.1.1 for functions of more than one random variable. Its proof is not given here.
Theorem 4.1.2 Law of the Unconscious Statistician. Suppose that \(X_1, \ldots, X_n\) are random variables with the joint p.d.f. \(f(x_1, \ldots, x_n)\). Let \(r\) be a real-valued function of \(n\) real variables, and suppose that \(Y = r(X_1, \ldots, X_n)\). Then \(E(Y)\) can be determined directly from the relation

\[ E(Y) = \int \cdots \int_{\mathbb{R}^n} r(x_1, \ldots, x_n)\, f(x_1, \ldots, x_n)\, dx_1 \cdots dx_n, \]

if the mean exists. Similarly, if \(X_1, \ldots, X_n\) have a discrete joint distribution with p.f. \(f(x_1, \ldots, x_n)\), the mean of \(Y = r(X_1, \ldots, X_n)\) is

\[ E(Y) = \sum_{\text{All } x_1, \ldots, x_n} r(x_1, \ldots, x_n)\, f(x_1, \ldots, x_n), \]

if the mean exists. ■

Example 4.1.16 Determining the Expectation of a Function of Two Variables. Suppose that a point \((X, Y)\) is chosen at random from the square \(S\) containing all points \((x, y)\) such that \(0 \le x \le 1\) and \(0 \le y \le 1\). We shall determine the expected value of \(X^2 + Y^2\).

Since \(X\) and \(Y\) have the uniform distribution over the square \(S\), and since the area of \(S\) is 1, the joint p.d.f. of \(X\) and \(Y\) is

\[ f(x, y) = \begin{cases} 1 & \text{for } (x, y) \in S, \\ 0 & \text{otherwise.} \end{cases} \]

Therefore,

\[ E(X^2 + Y^2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x^2 + y^2) f(x, y)\,dx\,dy = \int_0^1\int_0^1 (x^2 + y^2)\,dx\,dy = \frac{2}{3}. \] ◀

Note: More General Distributions. In Example 3.2.7, we introduced a type of distribution that was neither discrete nor continuous. It is possible to define expectations for such distributions also. The definition is rather cumbersome, and we shall not pursue it here.

Summary

The expectation, expected value, or mean of a random variable is a summary of its distribution. If the probability distribution is thought of as a distribution of mass along the real line, then the mean is the center of mass. The mean of a function \(r\) of a random variable \(X\) can be calculated directly from the distribution of \(X\) without first finding the distribution of \(r(X)\). Similarly, the mean of a function of a random vector \(X\) can be calculated directly from the distribution of \(X\).

Exercises

1. Suppose that \(X\) has the uniform distribution on the interval \([a, b]\). Find the mean of \(X\).

2. If an integer between 1 and 100 is to be chosen at random, what is the expected value?

3. In a class of 50 students, the number of students \(n_i\) of each age \(i\) is shown in the following table:

   Age i    n_i
   18       20
   19       22
   20       4
   21       3
   25       1

   If a student is to be selected at random from the class, what is the expected value of his age?

4. Suppose that one word is to be selected at random from the sentence THE GIRL PUT ON HER BEAUTIFUL RED HAT. If \(X\) denotes the number of letters in the word that is selected, what is the value of \(E(X)\)?

5. Suppose that one letter is to be selected at random from the 30 letters in the sentence given in Exercise 4. If \(Y\) denotes the number of letters in the word in which the selected letter appears, what is the value of \(E(Y)\)?

6. Suppose that a random variable \(X\) has a continuous distribution with the p.d.f. \(f\) given in Example 4.1.6. Find the expectation of \(1/X\).

7. Suppose that a random variable \(X\) has the uniform distribution on the interval \([0, 1]\). Show that the expectation of \(1/X\) is infinite.

8. Suppose that \(X\) and \(Y\) have a continuous joint distribution for which the joint p.d.f. is as follows:

   \[ f(x, y) = \begin{cases} 12y^2 & \text{for } 0 \le y \le x \le 1, \\ 0 & \text{otherwise.} \end{cases} \]

   Find the value of \(E(XY)\).

9. Suppose that a point is chosen at random on a stick of unit length and that the stick is broken into two pieces at that point. Find the expected value of the length of the longer piece.

10. Suppose that a particle is released at the origin of the \(xy\)-plane and travels into the half-plane where \(x > 0\). Suppose that the particle travels in a straight line and that the angle between the positive half of the \(x\)-axis and this line is \(\alpha\), which can be either positive or negative. Suppose, finally, that the angle \(\alpha\) has the uniform distribution on the interval \([-\pi/2, \pi/2]\). Let \(Y\) be the ordinate of the point at which the particle hits the vertical line \(x = 1\). Show that the distribution of \(Y\) is a Cauchy distribution.

11. Suppose that the random variables \(X_1, \ldots, X_n\) form a random sample of size \(n\) from the uniform distribution on the interval \([0, 1]\). Let \(Y_1 = \min\{X_1, \ldots, X_n\}\), and let \(Y_n = \max\{X_1, \ldots, X_n\}\). Find \(E(Y_1)\) and \(E(Y_n)\).

12. Suppose that the random variables \(X_1, \ldots, X_n\) form a random sample of size \(n\) from a continuous distribution for which the c.d.f. is \(F\), and let the random variables \(Y_1\) and \(Y_n\) be defined as in Exercise 11. Find \(E[F(Y_1)]\) and \(E[F(Y_n)]\).

13. A stock currently sells for $110 per share. Let the price of the stock at the end of a one-year period be \(X\), which will take one of the values $100 or $300. Suppose that you have the option to buy shares of this stock at $150 per share at the end of that one-year period. Suppose that money could earn 5.8% risk-free over that one-year period. Find the risk-neutral price for the option to buy one share.

14. Consider the situation of pricing a stock option as in Example 4.1.14. We want to prove that a price other than $20.19 for the option to buy one share in one year for $200 would be unfair in some way.

    a. Suppose that an investor (who has several shares of the stock already) makes the following transactions. She buys three more shares of the stock at $200 per share and sells four options for $20.19 each. The investor must borrow the extra $519.24 necessary to make these transactions at 4% for the year. At the end of the year, our investor might have to sell four shares for $200 each to the person who bought the options. In any event, she sells enough stock to pay back the amount borrowed plus the 4 percent interest. Prove that the investor has the same net worth (within rounding error) at the end of the year as she would have had without making these transactions, no matter what happens to the stock price. (A combination of stocks and options that produces no change in net worth is called a risk-free portfolio.)

    b. Consider the same transactions as in part (a), but this time suppose that the option price is $x where \(x < 20.19\). Prove that our investor loses \(|4.16x - 84|\) dollars of net worth no matter what happens to the stock price.

    c. Consider the same transactions as in part (a), but this time suppose that the option price is $x where \(x > 20.19\). Prove that our investor gains \(4.16x - 84\) dollars of net worth no matter what happens to the stock price.

    The situations in parts (b) and (c) are called arbitrage opportunities. Such opportunities rarely exist for any length of time in financial markets. Imagine what would happen if the three shares and four options were changed to three million shares and four million options.

15. In Example 4.1.14, we showed how to price an option to buy one share of a stock at a particular price at a particular time in the future. This type of option is called a call option. A put option is an option to sell a share of a stock at a particular price $y at a particular time in the future. (If you don't own any shares when you wish to exercise the option, you can always buy one at the market price and then sell it for $y.) The same sort of reasoning as in Example 4.1.14 could be used to price a put option. Consider the same stock as in Example 4.1.14 whose price in one year is \(X\) with the same distribution as in the example and the same risk-free interest rate. Find the risk-neutral price for an option to sell one share of that stock in one year at a price of $220.

16. Let \(Y\) be a discrete random variable whose p.f. is the function \(f\) in Example 4.1.4. Let \(X = |Y|\). Prove that the distribution of \(X\) has the p.d.f. in Example 4.1.5.

4.2 Properties of Expectations

In this section, we present some results that simplify the calculation of expectations for some common functions of random variables.

Basic Theorems

Suppose that \(X\) is a random variable for which the expectation \(E(X)\) exists. We shall
present several results pertaining to the basic properties of expectations.

Theorem 4.2.1 Linear Function. If \(Y = aX + b\), where \(a\) and \(b\) are finite constants, then

\[ E(Y) = aE(X) + b. \]

Proof We first shall assume, for convenience, that \(X\) has a continuous distribution for which the p.d.f. is \(f\). Then

\[
\begin{aligned}
E(Y) = E(aX + b) &= \int_{-\infty}^{\infty} (ax + b) f(x)\,dx \\
&= a\int_{-\infty}^{\infty} x f(x)\,dx + b\int_{-\infty}^{\infty} f(x)\,dx \\
&= aE(X) + b.
\end{aligned}
\]

A similar proof can be given for a discrete distribution. ■

Example 4.2.1 Calculating the Expectation of a Linear Function. Suppose that \(E(X) = 5\). Then

\[ E(3X - 5) = 3E(X) - 5 = 10 \]

and

\[ E(-3X + 15) = -3E(X) + 15 = 0. \] ◀

The following result follows from Theorem 4.2.1 with \(a = 0\).

Corollary 4.2.1 If \(X = c\) with probability 1, then \(E(X) = c\). ■

Example 4.2.2 Investment. An investor is trying to choose between two possible stocks to buy for a three-month investment. One stock costs $50 per share and has a rate of return of \(R_1\) dollars per share for the three-month period, where \(R_1\) is a random variable.
The second stock costs $30 per share and has a rate of return of \(R_2\) per share for the same three-month period. The investor has a total of $6000 to invest. For this example, suppose that the investor will buy shares of only one stock. (In Example 4.2.3, we shall consider strategies in which the investor buys more than one stock.) Suppose that \(R_1\) has the uniform distribution on the interval \([-10, 20]\) and that \(R_2\) has the uniform distribution on the interval \([-4.5, 10]\). We shall first compute the expected dollar value of investing in each of the two stocks. For the first stock, the $6000 will purchase 120 shares, so the return will be \(120R_1\), whose mean is \(120E(R_1) = 600\). (Solve Exercise 1 in Sec. 4.1 to see why \(E(R_1) = 5\).) For the second stock, the $6000 will purchase 200 shares, so the return will be \(200R_2\), whose mean is \(200E(R_2) = 550\). The first stock has a higher expected return.

In addition to calculating expected return, we should also ask which of the two
investments is riskier. We shall now compute the value at risk (VaR) at probability level 0.97 for each investment. (See Example 3.3.7 on page 113.) VaR will be the negative of the \(1 - 0.97 = 0.03\) quantile for the return on each investment. For the first stock, the return \(120R_1\) has the uniform distribution on the interval \([-1200, 2400]\) (see Exercise 14 in Sec. 3.8) whose 0.03 quantile is (according to Example 3.3.8 on page 114) \(0.03 \times 2400 + 0.97 \times (-1200) = -1092\). So VaR = 1092. For the second stock, the return \(200R_2\) has the uniform distribution on the interval \([-900, 2000]\) whose 0.03 quantile is \(0.03 \times 2000 + 0.97 \times (-900) = -813\). So VaR = 813. Even though the first stock has higher expected return, the second stock seems to be slightly less risky in terms of VaR. How should we balance risk and expected return to choose between the two purchases? One way to answer this question is illustrated in Example 4.8.10, after we learn about utility. ◀
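The arithmetic in Example 4.2.2 can be reproduced with a short sketch. The helper names (`uniform_mean`, `uniform_quantile`, `value_at_risk`) are illustrative assumptions rather than anything defined in the text. The sketch relies only on two facts used in the example: the uniform distribution on an interval has its mean at the midpoint, and its \(p\) quantile is \(p\) times the upper endpoint plus \(1 - p\) times the lower endpoint.

```python
def uniform_mean(lo, hi):
    """Mean of the uniform distribution on [lo, hi]."""
    return (lo + hi) / 2


def uniform_quantile(p, lo, hi):
    """p quantile of the uniform distribution on [lo, hi]."""
    return p * hi + (1 - p) * lo


def value_at_risk(level, lo, hi):
    """VaR at a probability level: the negative of the (1 - level) quantile."""
    return -uniform_quantile(1 - level, lo, hi)


budget = 6000
# (price per share, lower and upper endpoint of the per-share return)
stocks = {"first": (50, -10.0, 20.0), "second": (30, -4.5, 10.0)}

results = {}
for name, (price, lo, hi) in stocks.items():
    shares = budget / price
    # Multiplying a uniform return by the number of shares rescales the interval.
    total_lo, total_hi = shares * lo, shares * hi
    results[name] = (
        uniform_mean(total_lo, total_hi),
        value_at_risk(0.97, total_lo, total_hi),
    )

for name, (mean, var) in results.items():
    print(f"{name} stock: expected return {mean:.0f}, VaR {var:.0f}")
```

Up to floating-point rounding, the results agree with the values in the example: 600 and 1092 for the first stock, 550 and 813 for the second.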
Theorem 4.2.2 If there exists a constant \(a\) such that \(\Pr(X \ge a) = 1\), then \(E(X) \ge a\). If there exists a constant \(b\) such that \(\Pr(X \le b) = 1\), then \(E(X) \le b\).

Proof We shall assume again, for convenience, that \(X\) has a continuous distribution for which the p.d.f. is \(f\), and we shall suppose first that \(\Pr(X \ge a) = 1\). Because \(X\) is bounded below, the second integral in (4.1.5) is finite. Then

\[
\begin{aligned}
E(X) &= \int_{-\infty}^{\infty} x f(x)\,dx = \int_a^{\infty} x f(x)\,dx \\
&\ge \int_a^{\infty} a f(x)\,dx = a\Pr(X \ge a) = a.
\end{aligned}
\]

The proof of the other part of the theorem and the proof for a discrete distribution are similar. ■

It follows from Theorem 4.2.2 that if \(\Pr(a \le X \le b) = 1\), then \(a \le E(X) \le b\).

Theorem 4.2.3 Suppose that \(E(X) = a\) and that either \(\Pr(X \ge a) = 1\) or \(\Pr(X \le a) = 1\). Then \(\Pr(X = a) = 1\).

Proof We shall provide a proof for the case in which \(X\) has a discrete distribution and \(\Pr(X \ge a) = 1\). The other cases are similar. Let \(x_1, x_2, \ldots\) include every value \(x > a\) such that \(\Pr(X = x) > 0\), if any. Let \(p_0 = \Pr(X = a)\). Then,

\[ E(X) = p_0 a + \sum_{j=1}^{\infty} x_j \Pr(X = x_j). \tag{4.2.1} \]
Each \(x_j\) in the sum on the right side of Eq. (4.2.1) is greater than \(a\). If we replace all of the \(x_j\)'s by \(a\), the sum can't get larger, and hence

\[ E(X) \ge p_0 a + \sum_{j=1}^{\infty} a \Pr(X = x_j) = a. \tag{4.2.2} \]

Furthermore, the inequality in Eq. (4.2.2) will be strict if there is even one \(x > a\) with \(\Pr(X = x) > 0\). This contradicts \(E(X) = a\). Hence, there can be no \(x > a\) such that \(\Pr(X = x) > 0\). ■

Theorem 4.2.4 If \(X_1, \ldots, X_n\) are \(n\) random variables such that each expectation \(E(X_i)\) is finite \((i = 1, \ldots, n)\), then

\[ E(X_1 + \cdots + X_n) = E(X_1) + \cdots + E(X_n). \]

Proof We shall first assume that \(n = 2\) and also, for convenience, that \(X_1\) and \(X_2\) have a continuous joint distribution for which the joint p.d.f. is \(f\). Then

\[
\begin{aligned}
E(X_1 + X_2) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_1 + x_2) f(x_1, x_2)\,dx_1\,dx_2 \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1 f(x_1, x_2)\,dx_1\,dx_2 + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_2 f(x_1, x_2)\,dx_1\,dx_2 \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1 f(x_1, x_2)\,dx_2\,dx_1 + \int_{-\infty}^{\infty} x_2 f_2(x_2)\,dx_2 \\
&= \int_{-\infty}^{\infty} x_1 f_1(x_1)\,dx_1 + \int_{-\infty}^{\infty} x_2 f_2(x_2)\,dx_2 \\
&= E(X_1) + E(X_2),
\end{aligned}
\]

where \(f_1\) and \(f_2\) are the marginal p.d.f.'s of \(X_1\) and \(X_2\). The proof for a discrete
distribution is similar. Finally, the theorem can be established for each positive integer \(n\) by an induction argument. ■

It should be emphasized that, in accordance with Theorem 4.2.4, the expectation of the sum of several random variables always equals the sum of their individual expectations, regardless of what their joint distribution is. Even though the joint p.d.f. of \(X_1\) and \(X_2\) appeared in the proof of Theorem 4.2.4, only the marginal p.d.f.'s figured into the calculation of \(E(X_1 + X_2)\).

The next result follows easily from Theorems 4.2.1 and 4.2.4.

Corollary 4.2.2 Assume that \(E(X_i)\) is finite for \(i = 1, \ldots, n\). For all constants \(a_1, \ldots, a_n\) and \(b\),

\[ E(a_1 X_1 + \cdots + a_n X_n + b) = a_1 E(X_1) + \cdots + a_n E(X_n) + b. \] ■

Example 4.2.3 Investment Portfolio. Suppose that the investor with $6000 in Example 4.2.2 can buy shares of both of the two stocks. Suppose that the investor buys \(s_1\) shares of the first stock at $50 per share and \(s_2\) shares of the second stock at $30 per share. Such a combination of investments is called a portfolio. Ignoring possible problems with fractional shares, the values of \(s_1\) and \(s_2\) must satisfy

\[ 50s_1 + 30s_2 = 6000, \]

in order to invest the entire $6000. The return on this portfolio will be \(s_1 R_1 + s_2 R_2\). The mean return will be

\[ s_1 E(R_1) + s_2 E(R_2) = 5s_1 + 2.75s_2. \]
For example, if \(s_1 = 54\) and \(s_2 = 110\), then the mean return is 572.5. ◀

Example 4.2.4 Sampling without Replacement. Suppose that a box contains red balls and blue balls and that the proportion of red balls in the box is \(p\) \((0 \le p \le 1)\). Suppose that \(n\) balls are selected from the box at random without replacement, and let \(X\) denote the number of red balls that are selected. We shall determine the value of \(E(X)\).

We shall begin by defining \(n\) random variables \(X_1, \ldots, X_n\) as follows: For \(i = 1, \ldots, n\), let \(X_i = 1\) if the \(i\)th ball that is selected is red, and let \(X_i = 0\) if the \(i\)th ball is blue. Since the \(n\) balls are selected without replacement, the random variables \(X_1, \ldots, X_n\) are dependent. However, the marginal distribution of each \(X_i\) can be derived easily (see Exercise 10 of Sec. 1.7). We can imagine that all the balls are arranged in the box in some random order, and that the first \(n\) balls in this arrangement are selected. Because of randomness, the probability that the \(i\)th ball in the arrangement will be red is simply \(p\). Hence, for \(i = 1, \ldots, n\),

\[ \Pr(X_i = 1) = p \quad \text{and} \quad \Pr(X_i = 0) = 1 - p. \tag{4.2.3} \]

Therefore, \(E(X_i) = 1(p) + 0(1 - p) = p\).

From the definition of \(X_1, \ldots, X_n\), it follows that \(X_1 + \cdots + X_n\) is equal to the total number of red balls that are selected. Therefore, \(X = X_1 + \cdots + X_n\) and, by Theorem 4.2.4,

\[ E(X) = E(X_1) + \cdots + E(X_n) = np. \tag{4.2.4} \] ◀

Note: In General, \(E[g(X)] \ne g(E(X))\). Theorems 4.2.1 and 4.2.4 imply that if \(g\) is a linear function of a random vector \(X\), then \(E[g(X)] = g(E(X))\). For a nonlinear function \(g\), we have already seen Example 4.1.13 in which \(E[g(X)] \ne g(E(X))\). Jensen's inequality (Theorem 4.2.5) gives a relationship between \(E[g(X)]\) and \(g(E(X))\) for another special class of functions.

Definition 4.2.1 Convex Functions. A function \(g\) of a vector argument is convex if, for every \(\alpha \in (0, 1)\), and every \(x\) and \(y\),

\[ g[\alpha x + (1 - \alpha) y] \le \alpha g(x) + (1 - \alpha) g(y). \]
Theorem 4.2.5 Jensen's Inequality. Let $g$ be a convex function, and let $X$ be a random vector with finite mean. Then $E[g(X)] \ge g(E(X))$.

The proof of Theorem 4.2.5 is not given, but one special case is left to the reader in Exercise 13.
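The contrast between linear and convex functions can be checked numerically. The following is a minimal sketch, assuming a hypothetical three-point distribution chosen only for illustration; the helper `expectation` and the functions `square` and `affine` are not from the text. It uses exact fractions so that equality and inequality can be compared without rounding.

```python
from fractions import Fraction

def expectation(pmf, g=lambda x: x):
    """Return E[g(X)] for a discrete distribution given as {value: probability}."""
    return sum(prob * g(value) for value, prob in pmf.items())

# Hypothetical three-point distribution, chosen only for illustration.
pmf = {-1: Fraction(1, 4), 0: Fraction(1, 4), 2: Fraction(1, 2)}

def square(x):
    return x * x        # a convex, nonlinear function

def affine(x):
    return 3 * x - 2    # a linear function plus a constant

mean = expectation(pmf)                         # E(X) = 3/4
print(expectation(pmf, square), square(mean))   # 9/4 9/16
print(expectation(pmf, affine), affine(mean))   # 1/4 1/4
```

For the convex function the two printed values differ, with $E[g(X)]$ the larger, as Jensen's inequality requires; for the linear function they coincide, in line with Theorems 4.2.1 and 4.2.4.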
Example 4.2.5 Sampling with Replacement. Suppose again that in a box containing red balls and blue balls, the proportion of red balls is $p$ ($0 \le p \le 1$). Suppose now, however, that a random sample of $n$ balls is selected from the box with replacement. If $X$ denotes the number of red balls in the sample, then $X$ has the binomial distribution with parameters $n$ and $p$, as described in Sec. 3.1. We shall now determine the value of $E(X)$.

As before, for $i = 1, \ldots, n$, let $X_i = 1$ if the $i$th ball that is selected is red, and let
$X_i = 0$ otherwise. Then, as before, $X = X_1 + \cdots + X_n$. In this problem, the random variables $X_1, \ldots, X_n$ are independent, and the marginal distribution of each $X_i$ is again given by Eq. (4.2.3). Therefore, $E(X_i) = p$ for $i = 1, \ldots, n$, and it follows from Theorem 4.2.4 that
\[ E(X) = np. \tag{4.2.5} \]
Thus, the mean of the binomial distribution with parameters $n$ and $p$ is $np$. The p.f. $f(x)$ of this binomial distribution is given by Eq. (3.1.4), and the mean can be computed directly from the p.f. as follows (with $q = 1 - p$):
\[ E(X) = \sum_{x=0}^{n} x \binom{n}{x} p^x q^{n-x}. \tag{4.2.6} \]
Hence, by Eq. (4.2.5), the value of the sum in Eq. (4.2.6) must be $np$. ◀

It is seen from Eqs. (4.2.4) and (4.2.5) that the expected number of red balls in a sample of $n$ balls is $np$, regardless of whether the sample is selected with or without replacement. However, the distribution of the number of red balls is different depending on whether sampling is done with or without replacement (for $n > 1$).
For example, $\Pr(X = n)$ is always smaller in Example 4.2.4, where sampling is done without replacement, than in Example 4.2.5, where sampling is done with replacement, if $n > 1$. (See Exercise 27 in Sec. 4.9.)

Example 4.2.6 Expected Number of Matches. Suppose that a person types $n$ letters, types the addresses on $n$ envelopes, and then places each letter in an envelope in a random manner. Let $X$ be the number of letters that are placed in the correct envelopes. We shall find the mean of $X$. (In Sec. 1.10, we did a more difficult calculation with this same example.)

For $i = 1, \ldots, n$, let $X_i = 1$ if the $i$th letter is placed in the correct envelope, and let $X_i = 0$ otherwise. Then, for $i = 1, \ldots, n$,
\[ \Pr(X_i = 1) = \frac{1}{n} \quad \text{and} \quad \Pr(X_i = 0) = 1 - \frac{1}{n}. \]
Therefore,
\[ E(X_i) = \frac{1}{n} \quad \text{for } i = 1, \ldots, n. \]
Since $X = X_1 + \cdots + X_n$, it follows that
\[ E(X) = E(X_1) + \cdots + E(X_n) = \frac{1}{n} + \cdots + \frac{1}{n} = 1. \]
Thus, the expected value of the number of correct matches of letters and envelopes is 1, regardless of the value of $n$. ◀
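The matching result can be confirmed by brute force for small $n$. The sketch below is an illustration only, assuming that "placing letters at random" means every one of the $n!$ assignments is equally likely; the function name `expected_matches` is hypothetical. It averages the number of correctly placed letters over all assignments.

```python
from fractions import Fraction
from itertools import permutations

def expected_matches(n):
    """Exact E(X): average number of fixed points over all n! equally likely assignments."""
    total_matches = 0
    assignments = 0
    for perm in permutations(range(n)):
        # Letter i is matched when it lands in envelope i.
        total_matches += sum(1 for letter, envelope in enumerate(perm) if letter == envelope)
        assignments += 1
    return Fraction(total_matches, assignments)

for n in range(1, 7):
    print(n, expected_matches(n))
```

Enumeration grows like $n!$, so this is only practical for small $n$; the indicator-variable argument in the example is what covers every $n$ at once.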
Expectation of a Product of Independent Random Variables

Theorem 4.2.6 If $X_1, \ldots, X_n$ are $n$ independent random variables such that each expectation $E(X_i)$ is finite ($i = 1, \ldots, n$), then
\[ E\left( \prod_{i=1}^{n} X_i \right) = \prod_{i=1}^{n} E(X_i). \]

Proof We shall again assume, for convenience, that $X_1, \ldots, X_n$ have a continuous joint distribution for which the joint p.d.f. is $f$. Also, we shall let $f_i$ denote the marginal p.d.f. of $X_i$ ($i = 1, \ldots, n$). Then, since the variables $X_1, \ldots, X_n$ are independent, it follows that at every point $(x_1, \ldots, x_n) \in R^n$,
\[ f(x_1, \ldots, x_n) = \prod_{i=1}^{n} f_i(x_i). \]
Therefore,
\[
\begin{aligned}
E\left( \prod_{i=1}^{n} X_i \right)
&= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left( \prod_{i=1}^{n} x_i \right) f(x_1, \ldots, x_n) \, dx_1 \cdots dx_n \\
&= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left[ \prod_{i=1}^{n} x_i f_i(x_i) \right] dx_1 \cdots dx_n \\
&= \prod_{i=1}^{n} \int_{-\infty}^{\infty} x_i f_i(x_i) \, dx_i = \prod_{i=1}^{n} E(X_i).
\end{aligned}
\]
The proof for a discrete distribution is similar. ∎

The difference between Theorem 4.2.4 and Theorem 4.2.6 should be emphasized.
If it is assumed that each expectation is finite, the expectation of the sum of a group of random variables is always equal to the sum of their individual expectations. However, the expectation of the product of a group of random variables is not always equal to the product of their individual expectations. If the random variables are independent, then this equality will also hold.

Example 4.2.7 Calculating the Expectation of a Combination of Random Variables. Suppose that $X_1$, $X_2$, and $X_3$ are independent random variables such that $E(X_i) = 0$ and $E(X_i^2) = 1$ for $i = 1, 2, 3$. We shall determine the value of $E[X_1^2 (X_2 - 4X_3)^2]$.

Since $X_1$, $X_2$, and $X_3$ are independent, it follows that the two random variables $X_1^2$ and $(X_2 - 4X_3)^2$ are also independent. Therefore,
\[
\begin{aligned}
E[X_1^2 (X_2 - 4X_3)^2] &= E(X_1^2) \, E[(X_2 - 4X_3)^2] \\
&= E(X_2^2 - 8 X_2 X_3 + 16 X_3^2) \\
&= E(X_2^2) - 8 E(X_2 X_3) + 16 E(X_3^2) \\
&= 1 - 8 E(X_2) E(X_3) + 16 \\
&= 17.
\end{aligned}
\]
◀

Example 4.2.8 Repeated Filtering. A filtration process removes a random proportion of particulates
in water to which it is applied. Suppose that a sample of water is subjected to this process twice. Let $X_1$ be the proportion of the particulates that are removed by the first pass. Let $X_2$ be the proportion of what remains after the first pass that is removed by the second pass. Assume that $X_1$ and $X_2$ are independent random variables with common p.d.f. $f(x) = 4x^3$ for $0 < x < 1$ and $f(x) = 0$ otherwise. Let $Y$ be the proportion of the original particulates that remain in the sample after two passes. Then $Y = (1 - X_1)(1 - X_2)$. Because $X_1$ and $X_2$ are independent, so too are $1 - X_1$ and $1 - X_2$. Since $1 - X_1$ and $1 - X_2$ have the same distribution, they have the same mean, call it $\mu$. It follows that $Y$ has mean $\mu^2$. We can find $\mu$ as
\[ \mu = E(1 - X_1) = \int_0^1 (1 - x_1) \, 4 x_1^3 \, dx_1 = 1 - \frac{4}{5} = 0.2. \]
It follows that $E(Y) = 0.2^2 = 0.04$. ◀

Expectation for Nonnegative Distributions

Theorem 4.2.7 Integer-Valued Random Variables. Let $X$ be a random variable that can take only the values $0, 1, 2, \ldots$.
Then
\[ E(X) = \sum_{n=1}^{\infty} \Pr(X \ge n). \tag{4.2.7} \]

Proof First, we can write
\[ E(X) = \sum_{n=0}^{\infty} n \Pr(X = n) = \sum_{n=1}^{\infty} n \Pr(X = n). \tag{4.2.8} \]
Next, consider the following triangular array of probabilities:
\[
\begin{array}{cccc}
\Pr(X = 1) & \Pr(X = 2) & \Pr(X = 3) & \cdots \\
 & \Pr(X = 2) & \Pr(X = 3) & \cdots \\
 & & \Pr(X = 3) & \cdots
\end{array}
\]
We can compute the sum of all the elements in this array in two different ways because all of the summands are nonnegative. First, we can add the elements in each column of the array and then add these column totals. Thus, we obtain the value $\sum_{n=1}^{\infty} n \Pr(X = n)$. Second, we can add the elements in each row of the array and then add these row totals. In this way we obtain the value $\sum_{n=1}^{\infty} \Pr(X \ge n)$. Therefore,
\[ \sum_{n=1}^{\infty} n \Pr(X = n) = \sum_{n=1}^{\infty} \Pr(X \ge n). \]
Eq. (4.2.7) now follows from Eq. (4.2.8). ∎

Example 4.2.9 Expected Number of Trials. Suppose that a person repeatedly tries to perform a certain task until he is successful. Suppose also that the probability of success on each given
trial is $p$ ($0 < p < 1$) and that all trials are independent. If $X$ denotes the number of the trial on which the first success is obtained, then $E(X)$ can be determined as follows.

Since at least one trial is always required, $\Pr(X \ge 1) = 1$. Also, for $n = 2, 3, \ldots$, at least $n$ trials will be required if and only if none of the first $n - 1$ trials results in success. Therefore,
\[ \Pr(X \ge n) = (1 - p)^{n-1}. \]
By Eq. (4.2.7), it follows that
\[ E(X) = 1 + (1 - p) + (1 - p)^2 + \cdots = \frac{1}{1 - (1 - p)} = \frac{1}{p}. \]
◀

Theorem 4.2.7 has a more general version that applies to all nonnegative random variables.

Theorem 4.2.8 General Nonnegative Random Variable. Let $X$ be a nonnegative random variable with c.d.f. $F$. Then
\[ E(X) = \int_0^{\infty} [1 - F(x)] \, dx. \tag{4.2.9} \]
The proof of Theorem 4.2.8 is left to the reader in Exercises 1 and 2 in Sec. 4.9.

Example 4.2.10 Expected Waiting Time. Let $X$ be the time that a customer spends waiting for service in a queue. Suppose that the c.d.f. of $X$ is
\[ F(x) = \begin{cases} 0 & \text{if } x \le 0, \\ 1 - e^{-2x} & \text{if } x > 0. \end{cases} \]
Then the mean of $X$ is
\[ E(X) = \int_0^{\infty} e^{-2x} \, dx = \frac{1}{2}. \]
◀

Summary

The mean of a linear function of a random vector is the linear function of the mean. In particular, the mean of a sum is the sum of the means. As an example, the mean of the binomial distribution with parameters $n$ and $p$ is $np$. No such relationship holds in general for nonlinear functions. For independent random variables, the mean of the product is the product of the means.

Exercises

1. Suppose that the return $R$ (in dollars per share) of a stock has the uniform distribution on the interval $[-3, 7]$. Suppose also that each share of the stock costs \$1.50. Let $Y$ be the net return (total return minus cost) on an investment of 10 shares of the stock. Compute $E(Y)$.

2. Suppose that three random variables $X_1, X_2, X_3$ form a random sample from a distribution for which the mean is 5. Determine the value of $E(2X_1 - 3X_2 + X_3 - 4)$.

3. Suppose that three random variables $X_1, X_2, X_3$ form a random sample from the uniform distribution on the interval $[0, 1]$. Determine the value of $E[(X_1 - 2X_2 + X_3)^2]$.

4. Suppose that the random variable $X$ has the uniform distribution on the interval $[0, 1]$, that the random variable $Y$ has the uniform distribution on the interval $[5, 9]$, and that $X$ and $Y$ are independent. Suppose also that a rectangle is to be constructed for which the lengths of two adjacent sides are $X$ and $Y$. Determine the expected value of the area of the rectangle.

5. Suppose that the variables $X_1, \ldots, X_n$ form a random sample of size $n$ from a given continuous distribution on the real line for which the p.d.f. is $f$. Find the expectation of the number of observations in the sample that fall within a specified interval $a \le x \le b$.

6. Suppose that a particle starts at the origin of the real line and moves along the line in jumps of one unit. For each jump, the probability is $p$ ($0 \le p \le 1$) that the particle will jump one unit to the left and the probability is $1 - p$ that the particle will jump one unit to the right. Find the expected value of the position of the particle after $n$ jumps.

7. Suppose that on each play of a certain game a gambler is equally likely to win or to lose. Suppose that when he wins, his fortune is doubled, and that when he loses, his fortune is cut in half.
If he begins playing with a given fortune $c$, what is the expected value of his fortune after $n$ independent plays of the game?

8. Suppose that a class contains 10 boys and 15 girls, and suppose that eight students are to be selected at random from the class without replacement. Let $X$ denote the number of boys that are selected, and let $Y$ denote the number of girls that are selected. Find $E(X - Y)$.

9. Suppose that the proportion of defective items in a large lot is $p$, and suppose that a random sample of $n$ items is selected from the lot. Let $X$ denote the number of defective items in the sample, and let $Y$ denote the number of nondefective items. Find $E(X - Y)$.

10. Suppose that a fair coin is tossed repeatedly until a head is obtained for the first time. (a) What is the expected number of tosses that will be required? (b) What is the expected number of tails that will be obtained before the first head is obtained?

11. Suppose that a fair coin is tossed repeatedly until exactly $k$ heads have been obtained. Determine the expected number of tosses that will be required. Hint: Represent the total number of tosses $X$ in the form $X = X_1 + \cdots + X_k$, where $X_i$ is the number of tosses required to obtain the $i$th head after $i - 1$ heads have been obtained.

12. Suppose that the two return random variables $R_1$ and $R_2$ in Examples 4.2.2 and 4.2.3 are independent. Consider the portfolio at the end of Example 4.2.3 with $s_1 = 54$ shares of the first stock and $s_2 = 110$ shares of the second stock.
a. Prove that the change in value $X$ of the portfolio has the p.d.f.
\[
f(x) = \begin{cases}
3.87 \times 10^{-7} (x + 1035) & \text{if } -1035 < x < 560, \\
6.1728 \times 10^{-4} & \text{if } 560 \le x \le 585, \\
3.87 \times 10^{-7} (2180 - x) & \text{if } 585 < x < 2180, \\
0 & \text{otherwise.}
\end{cases}
\]
Hint: Look at Example 3.9.5.
b. Find the value at risk (VaR) at probability level 0.97 for the portfolio.

13. Prove the special case of Theorem 4.2.5 in which the function $g$ is twice continuously differentiable and $X$ is one-dimensional.
You may assume that a twice continuously differentiable convex function has nonnegative second derivative. Hint: Expand $g(X)$ around its mean using Taylor's theorem with remainder. Taylor's theorem with remainder says that if $g(x)$ has two continuous derivatives $g'$ and $g''$ at $x = x_0$, then there exists $y$ between $x_0$ and $x$ such that
\[ g(x) = g(x_0) + (x - x_0) g'(x_0) + \frac{(x - x_0)^2}{2} g''(y). \]

4.3 Variance

Although the mean of a distribution is a useful summary, it does not convey very much information about the distribution. For example, a random variable $X$ with mean 2 has the same mean as the constant random variable $Y$ such that $\Pr(Y = 2) = 1$ even if $X$ is not constant. To distinguish the distribution of $X$ from the distribution of $Y$ in this case, it might be useful to give some measure of how spread out the distribution of $X$ is. The variance of $X$ is one such measure. The standard deviation of $X$ is the square root of the variance. The variance also plays an important role in the approximation methods that arise in Chapter 6.

Example 4.3.1 Stock Price Changes. Consider the prices $A$ and $B$ of two stocks at a time one month in the future. Assume that $A$ has the uniform distribution on the interval $[25, 35]$ and $B$ has the uniform distribution on the interval $[15, 45]$. It is easy to see (from Exercise 1 in Sec. 4.1) that both stocks have a mean price of 30. But the distributions are very different. For example, $A$ will surely be worth at least 25 while $\Pr(B < 25) = 1/3$. But $B$ has more upside potential also. The p.d.f.'s of these two random variables are plotted in Fig. 4.5. ◀
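The two claims in Example 4.3.1 (equal means, different chances of a low price) can be reproduced with a few lines of exact arithmetic. This is a minimal sketch using the standard mean and c.d.f. of a uniform distribution on $[a, b]$; the helper names `uniform_mean` and `uniform_cdf` are hypothetical.

```python
from fractions import Fraction

def uniform_mean(a, b):
    """Mean of the uniform distribution on [a, b]."""
    return Fraction(a + b, 2)

def uniform_cdf(x, a, b):
    """Pr(X <= x) for X uniform on [a, b]; equals Pr(X < x) since X is continuous."""
    if x <= a:
        return Fraction(0)
    if x >= b:
        return Fraction(1)
    return Fraction(x - a, b - a)

print(uniform_mean(25, 35), uniform_mean(15, 45))        # 30 30
print(uniform_cdf(25, 25, 35), uniform_cdf(25, 15, 45))  # 0 1/3
```

The means agree, yet only stock $B$ puts positive probability below 25, which is the kind of difference in spread that the variance is meant to summarize.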
Figure 4.5 The p.d.f.'s of two uniform distributions in Example 4.3.1. Both distributions have mean equal to 30, but they are spread out differently.

Definitions of the Variance and the Standard Deviation

Although the two random prices in Example 4.3.1 have the same mean, price $B$ is more spread out than price $A$, and it would be good to have a summary of the distribution that makes this easy to see.

Definition 4.3.1 Variance/Standard Deviation. Let $X$ be a random variable with finite mean $\mu = E(X)$. The variance of $X$, denoted by $\mathrm{Var}(X)$, is defined as follows:
\[ \mathrm{Var}(X) = E[(X - \mu)^2]. \tag{4.3.1} \]
If $X$ has infinite mean or if the mean of $X$ does not exist, we say that $\mathrm{Var}(X)$ does not exist. The standard deviation of $X$ is the nonnegative square root of $\mathrm{Var}(X)$ if the variance exists. If the expectation in Eq. (4.3.1) is infinite, we say that $\mathrm{Var}(X)$ and the standard deviation of $X$ are infinite.

When only one random variable is being discussed, it is common to denote its
standard deviation by the symbol $\sigma$, and the variance is denoted by $\sigma^2$. When more than one random variable is being discussed, the name of the random variable is included as a subscript to the symbol $\sigma$, e.g., $\sigma_X$ would be the standard deviation of $X$ while $\sigma_Y^2$ would be the variance of $Y$.

Example 4.3.2 Stock Price Changes. Return to the two random variables $A$ and $B$ in Example 4.3.1. Using Theorem 4.1.1, we can compute
\[
\begin{aligned}
\mathrm{Var}(A) &= \int_{25}^{35} (a - 30)^2 \frac{1}{10} \, da = \frac{1}{10} \int_{-5}^{5} x^2 \, dx = \frac{1}{10} \left. \frac{x^3}{3} \right|_{x=-5}^{5} = \frac{25}{3}, \\
\mathrm{Var}(B) &= \int_{15}^{45} (b - 30)^2 \frac{1}{30} \, db = \frac{1}{30} \int_{-15}^{15} y^2 \, dy = \frac{1}{30} \left. \frac{y^3}{3} \right|_{y=-15}^{15} = 75.
\end{aligned}
\]
So, $\mathrm{Var}(B)$ is nine times as large as $\mathrm{Var}(A)$. The standard deviations of $A$ and $B$ are $\sigma_A = 2.89$ and $\sigma_B = 8.66$. ◀

Note: Variance Depends Only on the Distribution. The variance and standard deviation of a random variable $X$ depend only on the distribution of $X$, just as the expectation of $X$ depends only on the distribution. Indeed, everything that can be computed from the p.f. or p.d.f. depends only on the distribution. Two random
Two random variables with the same distribution will have the same variance, even if they have nothing to do with each other.

Example 4.3.3 Variance and Standard Deviation of a Discrete Distribution. Suppose that a random variable $X$ can take each of the five values $-2$, 0, 1, 3, and 4 with equal probability. We shall determine the variance and standard deviation of $X$.
In this example,
$$E(X) = \frac{1}{5}(-2 + 0 + 1 + 3 + 4) = 1.2.$$
Let $\mu = E(X) = 1.2$, and define $W = (X - \mu)^2$. Then $\mathrm{Var}(X) = E(W)$. We can easily compute the p.f. $f$ of $W$:

x        -2      0      1      3      4
w      10.24   1.44   0.04   3.24   7.84
f(w)    1/5    1/5    1/5    1/5    1/5

It follows that
$$\mathrm{Var}(X) = E(W) = \frac{1}{5}[10.24 + 1.44 + 0.04 + 3.24 + 7.84] = 4.56.$$
The standard deviation of $X$ is the square root of the variance, namely, 2.135. ◀

There is an alternative method for calculating the variance of a distribution, which is often easier to use.

Theorem 4.3.1 Alternative Method for Calculating the Variance. For every random variable $X$, $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$.

Proof Let $E(X) = \mu$. Then
$$\mathrm{Var}(X) = E[(X - \mu)^2] = E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2.$$

Example 4.3.4 Variance of a Discrete Distribution. Once again, consider the random variable $X$ in Example 4.3.3, which takes each of the five values $-2$, 0, 1, 3, and 4 with equal probability. We shall use Theorem 4.3.1 to compute $\mathrm{Var}(X)$. In Example 4.3.3, we computed the mean of $X$ as $\mu = 1.2$. To use Theorem 4.3.1, we need
$$E(X^2) = \frac{1}{5}[(-2)^2 + 0^2 + 1^2 + 3^2 + 4^2] = 6.$$
Because $E(X) = 1.2$, Theorem 4.3.1 says that
$$\mathrm{Var}(X) = 6 - (1.2)^2 = 4.56,$$
which agrees with the calculation in Example 4.3.3. ◀

The variance (as well as the standard deviation) of a distribution provides a measure of the spread or dispersion of the distribution around its mean $\mu$. A small value of the variance indicates that the probability distribution is tightly concentrated around
$\mu$; a large value of the variance typically indicates that the probability distribution has a wide spread around $\mu$. However, the variance of a distribution, as well as its mean, can be made arbitrarily large by placing even a very small but positive amount of probability far enough from the origin on the real line.
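The two routes to the variance used in Examples 4.3.3 and 4.3.4 can be checked numerically. The following is a minimal sketch, assuming Python 3 with only the standard library; the helper name `expectation` is our own and does not come from the text. Exact fractions are used so that the two results can be compared without rounding error.

```python
from fractions import Fraction
from math import sqrt

# Distribution from Example 4.3.3: five equally likely values.
values = [-2, 0, 1, 3, 4]
prob = Fraction(1, len(values))


def expectation(g):
    """Return E[g(X)] for the discrete distribution defined above."""
    return sum(prob * g(x) for x in values)


mu = expectation(lambda x: x)                               # E(X)
var_by_definition = expectation(lambda x: (x - mu) ** 2)    # E[(X - mu)^2]
var_by_theorem = expectation(lambda x: x * x) - mu ** 2     # E(X^2) - [E(X)]^2
std_dev = sqrt(var_by_definition)

print(float(mu), float(var_by_definition), float(var_by_theorem), round(std_dev, 3))
```

Both computations give the same exact value, 114/25 = 4.56, as Theorem 4.3.1 requires.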
Example 4.3.5 Slight Modification of a Bernoulli Distribution. Let $X$ be a discrete random variable with the following p.f.:
$$f(x) = \begin{cases} 0.5 & \text{if } x = 0, \\ 0.499 & \text{if } x = 1, \\ 0.001 & \text{if } x = 10{,}000, \\ 0 & \text{otherwise.} \end{cases}$$
There is a sense in which the distribution of $X$ differs very little from the Bernoulli distribution with parameter 0.5. However, the mean and variance of $X$ are quite different from the mean and variance of the Bernoulli distribution with parameter 0.5. Let $Y$ have the Bernoulli distribution with parameter 0.5. In Example 4.1.3, we computed the mean of $Y$ as $E(Y) = 0.5$. Since $Y^2 = Y$, $E(Y^2) = E(Y) = 0.5$, so $\mathrm{Var}(Y) = 0.5 - 0.5^2 = 0.25$. The means of $X$ and $X^2$ are also straightforward calculations:
$$E(X) = 0.5 \times 0 + 0.499 \times 1 + 0.001 \times 10{,}000 = 10.499,$$
$$E(X^2) = 0.5 \times 0^2 + 0.499 \times 1^2 + 0.001 \times 10{,}000^2 = 100{,}000.499.$$
So $\mathrm{Var}(X) = 100{,}000.499 - (10.499)^2 = 99{,}890.27$. The mean and variance of $X$ are much larger than the mean and variance of $Y$. ◀

Properties of the Variance

We shall now present several theorems that state basic properties of the variance. In these theorems we shall assume that the variances of all the random variables exist. The first theorem concerns the possible values of the variance.

Theorem 4.3.2 For each $X$, $\mathrm{Var}(X) \ge 0$. If $X$ is a bounded random variable, then $\mathrm{Var}(X)$ must exist and be finite.

Proof Because $\mathrm{Var}(X)$ is the mean of a nonnegative random variable $(X - \mu)^2$, it must be nonnegative according to Theorem 4.2.2. If $X$ is bounded, then the mean exists, and hence the variance exists. Furthermore, if $X$ is bounded, then so too is $(X - \mu)^2$, so the variance must be finite.

The next theorem shows that the variance of a random variable $X$ cannot be 0 unless the entire probability distribution of $X$ is concentrated at a single point.

Theorem 4.3.3 $\mathrm{Var}(X) = 0$ if and only if there exists a constant $c$ such that $\Pr(X = c) = 1$.

Proof Suppose first that there exists a constant $c$ such that $\Pr(X = c) = 1$. Then $E(X) = c$, and $\Pr[(X - c)^2 = 0] = 1$. Therefore, $\mathrm{Var}(X) = E[(X - c)^2] = 0$.
Conversely, suppose that $\mathrm{Var}(X) = 0$. Then $\Pr[(X - \mu)^2 \ge 0] = 1$ but $E[(X - \mu)^2] = 0$. It follows from Theorem 4.2.3 that $\Pr[(X - \mu)^2 = 0] = 1$. Hence, $\Pr(X = \mu) = 1$.
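The numbers in Example 4.3.5, and the degenerate case covered by Theorem 4.3.3, can be reproduced with a short calculation. This is a hedged sketch in standard-library Python; the function `mean_and_variance`, the dictionary layout, and the constant value 7 are illustrative choices of ours, not part of the text.

```python
from fractions import Fraction


def mean_and_variance(pf):
    """Return (E(X), Var(X)) for a p.f. given as {value: probability}."""
    mean = sum(p * x for x, p in pf.items())
    second_moment = sum(p * x * x for x, p in pf.items())
    return mean, second_moment - mean ** 2   # Theorem 4.3.1


distributions = {
    "Bernoulli(0.5)": {0: Fraction(1, 2), 1: Fraction(1, 2)},
    # Example 4.3.5: probability 0.001 moved out to x = 10,000.
    "modified": {0: Fraction(500, 1000), 1: Fraction(499, 1000),
                 10_000: Fraction(1, 1000)},
    # All probability at a single, arbitrarily chosen point (Theorem 4.3.3).
    "constant": {7: Fraction(1)},
}

results = {name: mean_and_variance(pf) for name, pf in distributions.items()}
for name, (m, v) in results.items():
    print(f"{name}: mean = {float(m)}, variance = {float(v):.2f}")
```

Moving only 0.001 of the probability to 10,000 raises the variance from 0.25 to about 99,890.27, while the one-point distribution has variance exactly 0.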
Figure 4.6 The p.d.f. of a random variable $X$ together with the p.d.f.'s of $X + 3$ and $-X$. Note that the spreads of all three distributions appear the same.

Theorem 4.3.4 For constants $a$ and $b$, let $Y = aX + b$. Then
$$\mathrm{Var}(Y) = a^2\, \mathrm{Var}(X),$$
and $\sigma_Y = |a| \sigma_X$.

Proof If $E(X) = \mu$, then $E(Y) = a\mu + b$ by Theorem 4.2.1. Therefore,
$$\mathrm{Var}(Y) = E[(aX + b - a\mu - b)^2] = E[(aX - a\mu)^2] = a^2 E[(X - \mu)^2] = a^2\, \mathrm{Var}(X).$$
Taking the square root of $\mathrm{Var}(Y)$ yields $|a| \sigma_X$.
It follows from Theorem 4.3.4 that $\mathrm{Var}(X + b) = \mathrm{Var}(X)$ for every constant $b$. This result is intuitively plausible, since shifting the entire distribution of $X$ a distance of $b$ units along the real line will change the mean of the distribution by $b$ units but the shift will not affect the dispersion of the distribution around its mean. Figure 4.6 shows the p.d.f. of a random variable $X$ together with the p.d.f. of $X + 3$ to illustrate how a shift of the distribution does not affect the spread.

Similarly, it follows from Theorem 4.3.4 that $\mathrm{Var}(-X) = \mathrm{Var}(X)$. This result also is intuitively plausible, since reflecting the entire distribution of $X$ with respect to the origin of the real line will result in a new distribution that is the mirror image of the original one. The mean will be changed from $\mu$ to $-\mu$, but the total dispersion of the distribution around its mean will not be affected. Figure 4.6 shows the p.d.f. of a random variable $X$ together with the p.d.f. of $-X$ to illustrate how a reflection of the distribution does not affect the spread.
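The shift and reflection properties, and the general rule of Theorem 4.3.4, can be verified on a small discrete distribution. The sketch below assumes standard-library Python and reuses the five equally likely values from Example 4.3.3; the helper `variance` and the constants $a = -5/2$, $b = 10$ are arbitrary illustrative choices, not taken from the text.

```python
from fractions import Fraction

values = [-2, 0, 1, 3, 4]          # equally likely values from Example 4.3.3
prob = Fraction(1, len(values))


def variance(transform):
    """Return Var(transform(X)), computed directly from Definition 4.3.1."""
    ys = [transform(x) for x in values]
    mean = sum(prob * y for y in ys)
    return sum(prob * (y - mean) ** 2 for y in ys)


var_x = variance(lambda x: x)
var_shifted = variance(lambda x: x + 3)       # shift by b = 3
var_reflected = variance(lambda x: -x)        # reflection about the origin
a, b = Fraction(-5, 2), 10                    # arbitrary illustrative constants
var_linear = variance(lambda x: a * x + b)

print(var_x, var_shifted, var_reflected, var_linear, a * a * var_x)
```

The shifted and reflected variables have the same variance as $X$, and the variance of $aX + b$ equals $a^2\,\mathrm{Var}(X)$ regardless of $b$.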
Example 4.3.6 Calculating the Variance and Standard Deviation of a Linear Function. Consider the same random variable $X$ as in Example 4.3.3, which takes each of the five values $-2$, 0, 1, 3, and 4 with equal probability. We shall determine the variance and standard deviation of $Y = 4X - 7$.
In Example 4.3.3, we computed the mean of $X$ as $\mu = 1.2$ and the variance as 4.56. By Theorem 4.3.4,
$$\mathrm{Var}(Y) = 16\, \mathrm{Var}(X) = 72.96.$$
Also, the standard deviation $\sigma_Y$ of $Y$ is
$$\sigma_Y = 4\sigma_X = 4(4.56)^{1/2} = 8.54.$$ ◀

The next theorem provides an alternative method for calculating the variance of a sum of independent random variables.

Theorem 4.3.5 If $X_1, \ldots, X_n$ are independent random variables with finite means, then
$$\mathrm{Var}(X_1 + \cdots + X_n) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n).$$

Proof Suppose first that $n = 2$. If $E(X_1) = \mu_1$ and $E(X_2) = \mu_2$, then $E(X_1 + X_2) = \mu_1 + \mu_2$. Therefore,
$$\begin{aligned} \mathrm{Var}(X_1 + X_2) &= E[(X_1 + X_2 - \mu_1 - \mu_2)^2] \\ &= E[(X_1 - \mu_1)^2 + (X_2 - \mu_2)^2 + 2(X_1 - \mu_1)(X_2 - \mu_2)] \\ &= \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + 2E[(X_1 - \mu_1)(X_2 - \mu_2)]. \end{aligned}$$
Since $X_1$ and $X_2$ are independent,
$$E[(X_1 - \mu_1)(X_2 - \mu_2)] = E(X_1 - \mu_1)E(X_2 - \mu_2) = (\mu_1 - \mu_1)(\mu_2 - \mu_2) = 0.$$
It follows, therefore, that
$$\mathrm{Var}(X_1 + X_2) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2).$$
The theorem can now be established for each positive integer $n$ by an induction argument.

It should be emphasized that the random variables in Theorem 4.3.5 must be independent. The variance of the sum of random variables that are not independent will be discussed in Sec. 4.6. By combining Theorems 4.3.4 and 4.3.5, we can now obtain the following corollary.
Corollary 4.3.1 If $X_1, \ldots, X_n$ are independent random variables with finite means, and if $a_1, \ldots, a_n$ and $b$ are arbitrary constants, then
$$\mathrm{Var}(a_1 X_1 + \cdots + a_n X_n + b) = a_1^2\, \mathrm{Var}(X_1) + \cdots + a_n^2\, \mathrm{Var}(X_n).$$

Example 4.3.7 Investment Portfolio. An investor with $100,000 to invest wishes to construct a portfolio consisting of shares of one or both of two available stocks and possibly some fixed-rate investments. Suppose that the two stocks have random rates of return $R_1$ and $R_2$ per share for a period of one year. Suppose that $R_1$ has a distribution with mean 6 and variance 55, while $R_2$ has mean 4 and variance 28. Suppose that the first stock costs $60 per share and the second costs $48 per share. Suppose that money can also be invested at a fixed rate of 3.6 percent per year. The portfolio will consist of $s_1$ shares of the first stock, $s_2$ shares of the second stock, and all remaining money ($s_3$ dollars) invested at the fixed rate. The return on this portfolio will be
$$s_1 R_1 + s_2 R_2 + 0.036 s_3,$$
where the coefficients are constrained by
$$60 s_1 + 48 s_2 + s_3 = 100{,}000, \tag{4.3.2}$$
Figure 4.7 The set of all means and variances of investment portfolios in Example 4.3.7. The solid vertical line shows the range of possible variances for portfolios with a mean of 7000. (Horizontal axis: mean of portfolio return, from 4000 to 10,000. Marked on the plot: the range of variances at mean 7000, the efficient portfolios, and the efficient portfolio with mean 7000 at variance $2.55 \times 10^7$.)
as well as $s_1, s_2, s_3 \ge 0$. For now, we shall assume that $R_1$ and $R_2$ are independent. The mean and the variance of the return on the portfolio will be
$$E(s_1 R_1 + s_2 R_2 + 0.036 s_3) = 6 s_1 + 4 s_2 + 0.036 s_3,$$
$$\mathrm{Var}(s_1 R_1 + s_2 R_2 + 0.036 s_3) = 55 s_1^2 + 28 s_2^2.$$
One method for comparing a class of portfolios is to say that portfolio A is at least as good as portfolio B if the mean return for A is at least as large as the mean return for B and if the variance for A is no larger than the variance of B. (See Markowitz, 1987, for a classic treatment of such methods.) The reason for preferring smaller variance is that large variance is associated with large deviations from the mean, and for portfolios with a common mean, some of the large deviations are going to have to be below the mean, leading to the risk of large losses. Figure 4.7 is a plot of the pairs (mean, variance) for all of the possible portfolios in this example. That
is, for each $(s_1, s_2, s_3)$ that satisfy (4.3.2), there is a point in the outlined region of Fig. 4.7. The points to the right and toward the bottom are those that have the largest mean return for a fixed variance, and the ones that have the smallest variance for a fixed mean return. These portfolios are called efficient. For example, suppose that the investor would like a mean return of 7000. The vertical line segment above 7000 on the horizontal axis in Fig. 4.7 indicates the possible variances of all portfolios with mean return of 7000. Among these, the portfolio with the smallest variance is efficient and is indicated in Fig. 4.7. This portfolio has $s_1 = 524.7$, $s_2 = 609.7$, $s_3 = 39{,}250$, and variance $2.55 \times 10^7$. So, every portfolio with mean return greater than 7000 must have variance larger than $2.55 \times 10^7$, and every portfolio with variance less than $2.55 \times 10^7$ must have mean return smaller than 7000. ◀

The Variance of a Binomial Distribution

We shall now consider again the method of generating a binomial distribution presented in Sec. 4.2. Suppose that a box contains red balls and blue balls, and that the proportion of red balls is $p$ ($0 \le p \le 1$). Suppose also that a random sample of $n$ balls is selected from the box with replacement. For $i = 1, \ldots, n$, let $X_i = 1$ if the $i$th ball that is selected is red, and let $X_i = 0$ otherwise. If $X$ denotes the total number of red balls in the sample, then $X = X_1 + \cdots + X_n$ and $X$ will have the binomial distribution with parameters $n$ and $p$.

Figure 4.8 Two binomial distributions with the same mean (2.5) but different variances. (Vertical axis: p.f.; horizontal axis $x$, from 0 to 10; one of the legend entries reads $n = 10$, $p = 0.25$.)

Since $X_1, \ldots, X_n$ are independent, it follows from Theorem 4.3.5 that
$$\mathrm{Var}(X) = \sum_{i=1}^{n} \mathrm{Var}(X_i).$$
According to Example 4.1.3, $E(X_i) = p$ for $i = 1, \ldots, n$. Since $X_i^2 = X_i$ for each $i$, $E(X_i^2) = E(X_i) = p$. Therefore, by Theorem 4.3.1,
$$\mathrm{Var}(X_i) = E(X_i^2) - [E(X_i)]^2 = p - p^2 = p(1 - p).$$
It now follows that
$$\mathrm{Var}(X) = np(1 - p). \tag{4.3.3}$$
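Equation (4.3.3) can be checked against a direct summation over the binomial p.f. The following is a minimal sketch in standard-library Python (`math.comb` requires Python 3.8 or later); the function name `binomial_mean_variance` and the two parameter pairs are our own illustrative choices, picked so that both distributions have mean 2.5.

```python
from fractions import Fraction
from math import comb


def binomial_mean_variance(n, p):
    """Mean and variance of the binomial distribution, summed over its p.f."""
    pf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}
    mean = sum(k * q for k, q in pf.items())
    var = sum((k - mean) ** 2 * q for k, q in pf.items())
    return mean, var


for n, p in [(5, Fraction(1, 2)), (10, Fraction(1, 4))]:
    mean, var = binomial_mean_variance(n, p)
    # Compare with the closed forms np and np(1 - p).
    assert mean == n * p and var == n * p * (1 - p)
    print(n, float(p), float(mean), float(var))
```

Because exact fractions are used, the equality with $np(1 - p)$ holds exactly rather than only up to floating-point error.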
Figure 4.8 compares two different binomial distributions with the same mean (2.5) but different variances (1.25 and 1.875). One can see how the p.f. of the distribution with the larger variance ($n = 10$, $p = 0.25$) is higher at more extreme values and lower at more central values than is the p.f. of the distribution with the smaller variance ($n = 5$, $p = 0.5$). Similarly, Fig. 4.5 compares two uniform distributions with the same mean (30) and different variances (8.33 and 75). The same pattern appears, namely that the distribution with larger variance has higher p.d.f. at more extreme values and lower p.d.f. at more central values.

Interquartile Range

Example 4.3.8 The Cauchy Distribution. In Example 4.1.8, we saw a distribution (the Cauchy distribution) whose mean did not exist, and hence its variance does not exist. But, we might still want to describe how spread out such a distribution is.
For example, if $X$ has the Cauchy distribution and $Y = 2X$, it stands to reason that $Y$ is twice as spread out as $X$ is, but how do we quantify this? ◀

There is a measure of spread that exists for every distribution, regardless of whether or not the distribution has a mean or variance. Recall from Definition 3.3.2 that the quantile function for a random variable is the inverse of the c.d.f., and it is defined for every random variable.

Definition 4.3.2 Interquartile Range (IQR). Let $X$ be a random variable with quantile function $F^{-1}(p)$ for $0 < p < 1$. The interquartile range (IQR) is defined to be $F^{-1}(0.75) - F^{-1}(0.25)$.

In words, the IQR is the length of the interval that contains the middle half of the distribution.

Example 4.3.9 The Cauchy Distribution. Let $X$ have the Cauchy distribution. The c.d.f.
$F$ of $X$ can be found using a trigonometric substitution in the following integral:
$$F(x) = \int_{-\infty}^{x} \frac{dy}{\pi(1 + y^2)} = \frac{1}{2} + \frac{\tan^{-1}(x)}{\pi},$$
where $\tan^{-1}(x)$ is the principal inverse of the tangent function, taking values from $-\pi/2$ to $\pi/2$ as $x$ runs from $-\infty$ to $\infty$. The quantile function of $X$ is then $F^{-1}(p) = \tan[\pi(p - 1/2)]$ for $0 < p < 1$. The IQR is
$$F^{-1}(0.75) - F^{-1}(0.25) = \tan(\pi/4) - \tan(-\pi/4) = 2.$$
It is not difficult to show that, if $Y = 2X$, then the IQR of $Y$ is 4. (See Exercise 14.) ◀

Summary

The variance of $X$, denoted by $\mathrm{Var}(X)$, is the mean of $[X - E(X)]^2$ and measures how spread out the distribution of $X$ is. The variance also equals $E(X^2) - [E(X)]^2$. The standard deviation is the square root of the variance. The variance of $aX + b$, where $a$ and $b$ are constants, is $a^2\, \mathrm{Var}(X)$. The variance of the sum of independent random variables is the sum of the variances. As an example, the variance of the binomial distribution with parameters $n$ and $p$ is $np(1 - p)$. The interquartile range (IQR) is
the difference between the 0.75 and 0.25 quantiles. The IQR is a measure of spread that exists for every distribution.

Exercises

1. Suppose that $X$ has the uniform distribution on the interval [0, 1]. Compute the variance of $X$.

2. Suppose that one word is selected at random from the sentence THE GIRL PUT ON HER BEAUTIFUL RED HAT. If $X$ denotes the number of letters in the word that is selected, what is the value of $\mathrm{Var}(X)$?

3. For all numbers $a$ and $b$ such that $a < b$, find the variance of the uniform distribution on the interval $[a, b]$.

4. Suppose that $X$ is a random variable for which $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$. Show that $E[X(X - 1)] = \mu(\mu - 1) + \sigma^2$.

5. Let $X$ be a random variable for which $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$, and let $c$ be an arbitrary constant. Show that $E[(X - c)^2] = (\mu - c)^2 + \sigma^2$.

6. Suppose that $X$ and $Y$ are independent random variables whose variances exist and such that $E(X) = E(Y)$. Show that $E[(X - Y)^2] = \mathrm{Var}(X) + \mathrm{Var}(Y)$.

7. Suppose that $X$ and $Y$ are independent random variables for which $\mathrm{Var}(X) = \mathrm{Var}(Y) = 3$. Find the values of (a) $\mathrm{Var}(X - Y)$ and (b) $\mathrm{Var}(2X - 3Y + 1)$.

8. Construct an example of a distribution for which the mean is finite but the variance is infinite.

9. Let $X$ have the discrete uniform distribution on the integers $1, \ldots, n$. Compute the variance of $X$. Hint: You may wish to use the formula $\sum_{k=1}^{n} k^2 = n(n + 1)(2n + 1)/6$.

10. Consider the example efficient portfolio at the end of Example 4.3.7. Suppose that $R_i$ has the uniform distribution on the interval $[a_i, b_i]$ for $i = 1, 2$.
a. Find the two intervals $[a_1, b_1]$ and $[a_2, b_2]$. Hint: The intervals are determined by the means and variances.
b. Find the value at risk (VaR) for the example portfolio at probability level 0.97. Hint: Review Example 3.9.5 to see how to find the p.d.f. of the sum of two uniform random variables.

11. Let $X$ have the uniform distribution on the interval [0, 1]. Find the IQR of $X$.

12. Let $X$ have the p.d.f. $f(x) = \exp(-x)$ for $x \ge 0$, and $f(x) = 0$ for $x < 0$. Find the IQR of $X$.

13. Let $X$ have the binomial distribution with parameters 5 and 0.3. Find the IQR of $X$. Hint: Return to Example 3.3.9 and Table 3.1.

14. Let $X$ be a random variable whose interquartile range is $\eta$. Let $Y = 2X$. Prove that the interquartile range of $Y$ is $2\eta$.

4.4 Moments

For a random variable $X$, the means of powers $X^k$ (called moments) for $k > 2$ have useful theoretical properties, and some of them are used for additional summaries of a distribution. The moment generating function is a related tool that aids in deriving distributions of sums of independent random variables and limiting properties of distributions.

Existence of Moments

For each random variable $X$ and every positive integer $k$, the expectation $E(X^k)$ is called the $k$th moment of $X$. In particular, in accordance with this terminology, the mean of $X$ is the first moment of $X$.

It is said that the $k$th moment exists if and only if $E(|X|^k) < \infty$. If the random
If the random variable $X$ is bounded, that is, if there are finite numbers $a$ and $b$ such that $\Pr(a \le X \le b) = 1$, then all moments of $X$ must necessarily exist. It is possible, however, that all moments of $X$ exist even though $X$ is not bounded. It is shown in the next theorem that if the $k$th moment of $X$ exists, then all moments of lower order must also exist.

Theorem 4.4.1. If $E(|X|^k) < \infty$ for some positive integer $k$, then $E(|X|^j) < \infty$ for every positive integer $j$ such that $j < k$.

Proof. We shall assume, for convenience, that the distribution of $X$ is continuous and the p.d.f. is $f$. Then
$$
\begin{aligned}
E(|X|^j) &= \int_{-\infty}^{\infty} |x|^j f(x)\,dx \\
&= \int_{|x| \le 1} |x|^j f(x)\,dx + \int_{|x| > 1} |x|^j f(x)\,dx \\
&\le \int_{|x| \le 1} 1 \cdot f(x)\,dx + \int_{|x| > 1} |x|^k f(x)\,dx \\
&\le \Pr(|X| \le 1) + E(|X|^k).
\end{aligned}
$$
By hypothesis, $E(|X|^k) < \infty$. It therefore follows that $E(|X|^j) < \infty$. A similar proof holds for a discrete or a more general type of distribution.

In particular, it follows from Theorem 4.4.1 that if $E(X^2) < \infty$, then both the mean of $X$ and the variance of $X$ exist.
Theorem 4.4.1 extends to the case in which $j$ and $k$ are arbitrary positive numbers rather than just integers. (See Exercise 15 in this section.) We will not make use of such a result in this text, however.

Central Moments. Suppose that $X$ is a random variable for which $E(X) = \mu$. For every positive integer $k$, the expectation $E[(X - \mu)^k]$ is called the $k$th central moment of $X$ or the $k$th moment of $X$ about the mean. In particular, in accordance with this terminology, the variance of $X$ is the second central moment of $X$.

For every distribution, the first central moment must be 0 because
$$E(X - \mu) = \mu - \mu = 0.$$
Furthermore, if the distribution of $X$ is symmetric with respect to its mean $\mu$, and if the central moment $E[(X - \mu)^k]$ exists for a given odd integer $k$, then the value of $E[(X - \mu)^k]$ will be 0 because the positive and negative terms in this expectation will cancel one another.

Example 4.4.1. A Symmetric p.d.f. Suppose that $X$ has a continuous distribution for which the p.d.f. has the following form:
$$f(x) = c\,e^{-(x-3)^2/2} \quad \text{for } -\infty < x < \infty.$$
We shall determine the mean of $X$ and all the central moments. It can be shown that for every positive integer $k$,
$$\int_{-\infty}^{\infty} |x|^k e^{-(x-3)^2/2}\,dx < \infty.$$
Hence, all the moments of $X$ exist. Furthermore, since $f(x)$ is symmetric with respect to the point $x = 3$, then $E(X) = 3$. Because of this symmetry, it also follows that $E[(X - 3)^k] = 0$ for every odd positive integer $k$. For even $k = 2n$, we can find a recursive formula for the sequence of central moments. First, let $y = x - \mu$ in all the integral formulas. Then, for $n \ge 1$, the $2n$th central moment is
$$m_{2n} = \int_{-\infty}^{\infty} y^{2n} c\,e^{-y^2/2}\,dy.$$
Use integration by parts with $u = y^{2n-1}$ and $dv = y\,c\,e^{-y^2/2}\,dy$. It follows that $du = (2n - 1)y^{2n-2}\,dy$ and $v = -c\,e^{-y^2/2}$. So,
$$
\begin{aligned}
m_{2n} &= \int_{-\infty}^{\infty} u\,dv = uv\Big|_{y=-\infty}^{\infty} - \int_{-\infty}^{\infty} v\,du \\
&= -y^{2n-1} c\,e^{-y^2/2}\Big|_{y=-\infty}^{\infty} + (2n - 1)\int_{-\infty}^{\infty} y^{2n-2} c\,e^{-y^2/2}\,dy \\
&= (2n - 1)\,m_{2(n-1)}.
\end{aligned}
$$
Because $y^0 = 1$, $m_0$ is just the integral of the p.d.f.; hence, $m_0 = 1$. It follows that $m_{2n} = \prod_{i=1}^{n} (2i - 1)$ for $n = 1, 2, \ldots$. So, for example, $m_2 = 1$, $m_4 = 3$, $m_6 = 15$, and so on.
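The recursion in Example 4.4.1 can be checked numerically. The sketch below is an illustration only and is not part of the original example. It assumes that the normalizing constant is $c = 1/\sqrt{2\pi}$ (the value that makes the p.d.f. integrate to 1), and it approximates each integral with a midpoint rule on a truncated range. The function names, the truncation limit, and the number of steps are all choices made here.

```python
import math

# Assumed normalizing constant: the value that makes f integrate to 1.
C = 1.0 / math.sqrt(2.0 * math.pi)


def central_moment_numeric(k, half_width=12.0, steps=20_000):
    """Midpoint-rule approximation of the integral of y**k * C * exp(-y**2 / 2).

    The integral is truncated to [-half_width, half_width]; the tails
    beyond that range are negligible for the small k used here.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        y = -half_width + (i + 0.5) * h
        total += (y ** k) * C * math.exp(-0.5 * y * y)
    return total * h


def central_moment_recursive(n):
    """m_{2n} from the recursion m_{2n} = (2n - 1) * m_{2(n-1)}, with m_0 = 1."""
    m = 1
    for i in range(1, n + 1):
        m *= 2 * i - 1
    return m


if __name__ == "__main__":
    # Even central moments: recursion versus numerical integration.
    for n in range(1, 4):
        print(2 * n, central_moment_recursive(n), central_moment_numeric(2 * n))
    # An odd central moment, which symmetry forces to be (numerically) zero.
    print(3, central_moment_numeric(3))
```

The two columns printed for the even moments should agree closely, and the odd moment should be zero up to rounding error, in line with the symmetry argument in the example.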
Skewness. In Example 4.4.1, we saw that the odd central moments are all 0 for a distribution that is symmetric. This leads to the following distributional summary that is used to measure lack of symmetry.

Definition 4.4.1. Skewness. Let $X$ be a random variable with mean $\mu$, standard deviation $\sigma$, and finite third moment. The skewness of $X$ is defined to be $E[(X - \mu)^3]/\sigma^3$.

The reason for dividing the third central moment by $\sigma^3$ is to make the skewness measure only the lack of symmetry rather than the spread of the distribution.

Example 4.4.2. Skewness of Binomial Distributions. Let $X$ have the binomial distribution with parameters 10 and 0.25. The p.f. of this distribution appears in Fig. 4.8. It is not difficult to see that the p.f. is not symmetric. The skewness can be computed as follows: First, note that the mean is $\mu = 10 \times 0.25 = 2.5$ and that the standard deviation is
$$\sigma = (10 \times 0.25 \times 0.75)^{1/2} = 1.369.$$
Second, compute
$$
\begin{aligned}
E[(X - 2.5)^3] &= (0 - 2.5)^3 \binom{10}{0} 0.25^{0}\,0.75^{10} + \cdots + (10 - 2.5)^3 \binom{10}{10} 0.25^{10}\,0.75^{0} \\
&= 0.9375.
\end{aligned}
$$
Finally, the skewness is
$$\frac{0.9375}{1.369^3} = 0.3652.$$
For comparison, the skewness of the binomial distribution with parameters 10 and 0.2 is 0.4743, and the skewness of the binomial distribution with parameters 10 and 0.3 is 0.2761. The absolute value of the skewness increases as the probability of success moves away from 0.5. It is straightforward to show that the skewness of the binomial distribution with parameters $n$ and $p$ is the negative of the skewness of the binomial distribution with parameters $n$ and $1 - p$. (See Exercise 16 in this section.)

Moment Generating Functions

We shall now consider a different way to characterize the distribution of a random variable that is more closely related to its moments than to where its probability is distributed.

Definition 4.4.2. Moment Generating Function. Let $X$ be a random variable. For each real number $t$, define
$$\psi(t) = E(e^{tX}).$$
(4.4.1)

The function $\psi(t)$ is called the moment generating function (abbreviated m.g.f.) of $X$.

Note: The Moment Generating Function of $X$ Depends Only on the Distribution of $X$. Since the m.g.f. is the expected value of a function of $X$, it must depend only on the distribution of $X$. If $X$ and $Y$ have the same distribution, they must have the same m.g.f.

If the random variable $X$ is bounded, then the expectation in Eq. (4.4.1) must be finite for all values of $t$. In this case, therefore, the m.g.f. of $X$ will be finite for all values of $t$. On the other hand, if $X$ is not bounded, then the m.g.f. might be finite for some values of $t$ and might not be finite for others. It can be seen from Eq. (4.4.1), however, that for every random variable $X$, the m.g.f. $\psi(t)$ must be finite at the point $t = 0$ and at that point its value must be $\psi(0) = E(1) = 1$.

The next result explains how the name "moment generating function" arose.

Theorem 4.4.2. Let $X$ be a random variable whose m.g.f. $\psi(t)$ is finite for all values of $t$ in some open interval around the point $t = 0$. Then, for each integer $n > 0$, the $n$th moment of $X$, $E(X^n)$, is finite and equals the $n$th derivative $\psi^{(n)}(t)$ at $t = 0$. That is, $E(X^n) = \psi^{(n)}(0)$ for $n = 1, 2, \ldots$.

We sketch the proof at the end of this section.

Example 4.4.3. Calculating an m.g.f. Suppose that $X$ is a random variable for which the p.d.f. is as follows:
$$
f(x) =
\begin{cases}
e^{-x} & \text{for } x > 0, \\
0 & \text{otherwise.}
\end{cases}
$$
We shall determine the m.g.f. of $X$ and also $\operatorname{Var}(X)$.

For each real number $t$,
$$\psi(t) = E(e^{tX}) = \int_0^{\infty} e^{tx} e^{-x}\,dx = \int_0^{\infty} e^{(t-1)x}\,dx.$$
The final integral in this equation will be finite if and only if $t < 1$. Therefore, $\psi(t)$ is finite only for $t < 1$. For each such value of $t$,
$$\psi(t) = \frac{1}{1 - t}.$$
Since $\psi(t)$ is finite for all values of $t$ in an open interval around the point $t = 0$, all moments of $X$ exist. The first two derivatives of $\psi$ are
$$\psi'(t) = \frac{1}{(1 - t)^2} \quad \text{and} \quad \psi''(t) = \frac{2}{(1 - t)^3}.$$
Therefore, $E(X) = \psi'(0) = 1$ and $E(X^2) = \psi''(0) = 2$. It now follows that
$$\operatorname{Var}(X) = \psi''(0) - [\psi'(0)]^2 = 1.$$
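As a rough numerical illustration of Theorem 4.4.2 applied to Example 4.4.3, the following sketch approximates $\psi'(0)$ and $\psi''(0)$ by central finite differences of $\psi(t) = 1/(1 - t)$. It is not part of the original text: the helper names and the step size `h` are assumptions made here, and finite differences only approximate the exact derivatives computed in the example.

```python
def psi(t):
    """m.g.f. from Example 4.4.3; it is finite only for t < 1."""
    if t >= 1:
        raise ValueError("psi(t) is finite only for t < 1")
    return 1.0 / (1.0 - t)


def first_derivative_at_zero(f, h=1e-3):
    """Central-difference approximation of f'(0); h is an arbitrary small step."""
    return (f(h) - f(-h)) / (2.0 * h)


def second_derivative_at_zero(f, h=1e-3):
    """Central-difference approximation of f''(0)."""
    return (f(h) - 2.0 * f(0.0) + f(-h)) / (h * h)


mean = first_derivative_at_zero(psi)             # approximates E(X)
second_moment = second_derivative_at_zero(psi)   # approximates E(X^2)
variance = second_moment - mean ** 2             # approximates Var(X)

print(mean, second_moment, variance)
```

The printed values should be close to the exact results 1, 2, and 1 obtained in the example; the small discrepancies come from the finite step size.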
Properties of Moment Generating Functions

We shall now present three basic theorems pertaining to moment generating functions.

Theorem 4.4.3. Let $X$ be a random variable for which the m.g.f. is $\psi_1$; let $Y = aX + b$, where $a$ and $b$ are given constants; and let $\psi_2$ denote the m.g.f. of $Y$. Then for every value of $t$ such that $\psi_1(at)$ is finite,
$$\psi_2(t) = e^{bt}\,\psi_1(at). \qquad (4.4.2)$$

Proof. By the definition of an m.g.f.,
$$\psi_2(t) = E(e^{tY}) = E[e^{t(aX + b)}] = e^{bt} E(e^{atX}) = e^{bt}\,\psi_1(at).$$

Example 4.4.4. Calculating the m.g.f. of a Linear Function. Suppose that the distribution of $X$ is as specified in Example 4.4.3. We saw that the m.g.f. of $X$ for $t < 1$ is
$$\psi_1(t) = \frac{1}{1 - t}.$$
If $Y = 3 - 2X$, then the m.g.f. of $Y$ is finite for $t > -1/2$ and will have the value
$$\psi_2(t) = e^{3t}\,\psi_1(-2t) = \frac{e^{3t}}{1 + 2t}.$$

The next theorem shows that the m.g.f. of the sum of an arbitrary number of independent random variables has a very simple form. Because of this property, the m.g.f. is an important tool in the study of such sums.
Theorem 4.4.4. Suppose that $X_1, \ldots, X_n$ are $n$ independent random variables; and for $i = 1, \ldots, n$, let $\psi_i$ denote the m.g.f. of $X_i$. Let $Y = X_1 + \cdots + X_n$, and let the m.g.f. of $Y$ be denoted by $\psi$. Then for every value of $t$ such that $\psi_i(t)$ is finite for $i = 1, \ldots, n$,
$$\psi(t) = \prod_{i=1}^{n} \psi_i(t). \qquad (4.4.3)$$

Proof. By definition,
$$\psi(t) = E(e^{tY}) = E[e^{t(X_1 + \cdots + X_n)}] = E\left(\prod_{i=1}^{n} e^{tX_i}\right).$$
Since the random variables $X_1, \ldots, X_n$ are independent, it follows from Theorem 4.2.6 that
$$E\left(\prod_{i=1}^{n} e^{tX_i}\right) = \prod_{i=1}^{n} E(e^{tX_i}).$$
Hence,
$$\psi(t) = \prod_{i=1}^{n} \psi_i(t).$$

The Moment Generating Function for the Binomial Distribution. Suppose that a random variable $X$ has the binomial distribution with parameters $n$ and $p$. In Sections 4.2 and 4.3, the mean and the variance of $X$ were determined by representing $X$ as the sum of $n$ independent random variables $X_1, \ldots, X_n$. In this representation, the distribution of each variable $X_i$ is as follows:
$$\Pr(X_i = 1) = p \quad \text{and} \quad \Pr(X_i = 0) = 1 - p.$$
We shall now use this representation to determine the m.g.f. of $X = X_1 + \cdots + X_n$.

Since each of the random variables $X_1, \ldots, X_n$ has the same distribution, the m.g.f. of each variable will be the same. For $i = 1, \ldots, n$, the m.g.f. of $X_i$ is
$$\psi_i(t) = E(e^{tX_i}) = (e^t)\Pr(X_i = 1) + (1)\Pr(X_i = 0) = pe^t + 1 - p.$$
It follows from Theorem 4.4.4 that the m.g.f. of $X$ in this case is
$$\psi(t) = (pe^t + 1 - p)^n. \qquad (4.4.4)$$

Uniqueness of Moment Generating Functions. We shall now state one more important property of the m.g.f. The proof of this property is beyond the scope of this book and is omitted.

Theorem 4.4.5. If the m.g.f.'s of two random variables $X_1$ and $X_2$ are finite and identical for all values of $t$ in an open interval around the point $t = 0$, then the probability distributions of $X_1$ and $X_2$ must be identical.

Theorem 4.4.5 is the justification for the claim made at the start of this discussion, namely, that the m.g.f. is another way to characterize the distribution of a random variable.
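The closed form in Eq. (4.4.4) can be compared with a direct evaluation of $E(e^{tX})$ from the binomial p.f. The sketch below is only a sanity check under assumed values: the function names are hypothetical, and the parameters $n = 10$, $p = 0.25$ and the test points for $t$ are arbitrary choices, not values prescribed by the text.

```python
import math


def binomial_mgf_closed_form(t, n, p):
    """Closed form (p * e^t + 1 - p)^n from Eq. (4.4.4)."""
    return (p * math.exp(t) + 1.0 - p) ** n


def binomial_mgf_direct(t, n, p):
    """E(e^{tX}) computed term by term from the binomial p.f."""
    return sum(
        math.exp(t * k) * math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
        for k in range(n + 1)
    )


if __name__ == "__main__":
    n, p = 10, 0.25  # illustrative parameters only
    for t in (-1.0, 0.0, 0.5):
        print(t, binomial_mgf_closed_form(t, n, p), binomial_mgf_direct(t, n, p))
```

The two computed values should agree up to floating-point rounding for every $t$, and both should equal 1 at $t = 0$.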
The Additive Property of the Binomial Distribution. Moment generating functions provide a simple way to derive the distribution of the sum of two independent binomial random variables with the same second parameter.

Theorem 4.4.6. If $X_1$ and $X_2$ are independent random variables, and if $X_i$ has the binomial distribution with parameters $n_i$ and $p$ ($i = 1, 2$), then $X_1 + X_2$ has the binomial distribution with parameters $n_1 + n_2$ and $p$.

Proof. Let $\psi_i$ denote the m.g.f. of $X_i$ for $i = 1, 2$. It follows from Eq. (4.4.4) that
$$\psi_i(t) = (pe^t + 1 - p)^{n_i}.$$
Let $\psi$ denote the m.g.f. of $X_1 + X_2$. Then, by Theorem 4.4.4,
$$\psi(t) = (pe^t + 1 - p)^{n_1 + n_2}.$$
It can be seen from Eq. (4.4.4) that this function $\psi$ is the m.g.f. of the binomial distribution with parameters $n_1 + n_2$ and $p$. Hence, by Theorem 4.4.5, the distribution of $X_1 + X_2$ must be that binomial distribution.

Sketch of the Proof of Theorem 4.4.2

First, we indicate why all moments of $X$ are finite. Let $t > 0$ be such that both $\psi(t)$ and $\psi(-t)$ are finite. Define $g(x) = e^{tx} + e^{-tx}$. Notice that
$$E[g(X)] = \psi(t) + \psi(-t) < \infty. \qquad (4.4.5)$$
On every bounded interval of $x$ values, $g(x)$ is bounded. For each integer $n > 0$, as $|x| \to \infty$, $g(x)$ is eventually larger than $|x|^n$. It follows from these facts and (4.4.5) that $E|X^n| < \infty$.

Although it is beyond the scope of this book, it can be shown that the derivative $\psi'(t)$ exists at the point $t = 0$, and that at $t = 0$, the derivative of the expectation in Eq. (4.4.1) must be equal to the expectation of the derivative. Thus,
$$\psi'(0) = \left[\frac{d}{dt} E(e^{tX})\right]_{t=0} = E\left[\left(\frac{d}{dt} e^{tX}\right)_{t=0}\right].$$
But
$$\left(\frac{d}{dt} e^{tX}\right)_{t=0} = (Xe^{tX})_{t=0} = X.$$
It follows that
$$\psi'(0) = E(X).$$
In other words, the derivative of the m.g.f. $\psi(t)$ at $t = 0$ is the mean of $X$.

Furthermore, it can be shown that it is possible to differentiate $\psi(t)$ an arbitrary number of times at the point $t = 0$. For $n = 1, 2, \ldots$, the $n$th derivative $\psi^{(n)}(0)$ at $t = 0$ will satisfy the following relation:
$$
\begin{aligned}
\psi^{(n)}(0) &= \left[\frac{d^n}{dt^n} E(e^{tX})\right]_{t=0} = E\left[\left(\frac{d^n}{dt^n} e^{tX}\right)_{t=0}\right] \\
&= E[(X^n e^{tX})_{t=0}] = E(X^n).
\end{aligned}
$$
Thus, $\psi'(0) = E(X)$, $\psi''(0) = E(X^2)$, $\psi'''(0) = E(X^3)$, and so on.
Hence, we see that the m.g.f., if it is finite in an open interval around $t = 0$, can be used to generate all of the moments of the distribution by taking derivatives at $t = 0$.

Summary

If the $k$th moment of a random variable exists, then so does the $j$th moment for every $j < k$. The moment generating function of $X$, $\psi(t) = E(e^{tX})$, if it is finite for $t$ in a neighborhood of 0, can be used to find moments of $X$. The $k$th derivative of $\psi(t)$ at $t = 0$ is $E(X^k)$. The m.g.f. characterizes the distribution in the sense that all random variables that have the same m.g.f. have the same distribution.

Exercises

1. If $X$ has the uniform distribution on the interval $[a, b]$, what is the value of the fifth central moment of $X$?

2. If $X$ has the uniform distribution on the interval $[a, b]$, write a formula for every even central moment of $X$.

3. Suppose that $X$ is a random variable for which $E(X) = 1$, $E(X^2) = 2$, and $E(X^3) = 5$. Find the value of the third central moment of $X$.

4. Suppose that $X$ is a random variable such that $E(X^2)$ is finite. (a) Show that $E(X^2) \ge [E(X)]^2$. (b) Show that $E(X^2) = [E(X)]^2$ if and only if there exists a constant $c$ such that $\Pr(X = c) = 1$. Hint: $\operatorname{Var}(X) \ge 0$.

5. Suppose that $X$ is a random variable with mean $\mu$ and variance $\sigma^2$, and that the fourth moment of $X$ is finite. Show that
$$E[(X - \mu)^4] \ge \sigma^4.$$

6. Suppose that $X$ has the uniform distribution on the interval $[a, b]$. Determine the m.g.f. of $X$.

7. Suppose that $X$ is a random variable for which the m.g.f. is as follows:
$$\psi(t) = \frac{1}{4}(3e^t + e^{-t}) \quad \text{for } -\infty < t < \infty.$$
Find the mean and the variance of $X$.

8. Suppose that $X$ is a random variable for which the m.g.f. is as follows:
$$\psi(t) = e^{t^2 + 3t} \quad \text{for } -\infty < t < \infty.$$
Find the mean and the variance of $X$.

9. Let $X$ be a random variable with mean $\mu$ and variance $\sigma^2$, and let $\psi_1(t)$ denote the m.g.f. of $X$ for $-\infty < t < \infty$. Let $c$ be a given positive constant, and let $Y$ be a random variable for which the m.g.f. is
$$\psi_2(t) = e^{c[\psi_1(t) - 1]} \quad \text{for } -\infty < t < \infty.$$
Find expressions for the mean and the variance of $Y$ in terms of the mean and the variance of $X$.

10. Suppose that the random variables $X$ and $Y$ are i.i.d. and that the m.g.f. of each is
$$\psi(t) = e^{t^2 + 3t} \quad \text{for } -\infty < t < \infty.$$
Find the m.g.f. of $Z = 2X - 3Y + 4$.

11. Suppose that $X$ is a random variable for which the m.g.f. is as follows:
$$\psi(t) = \frac{1}{5}e^{t} + \frac{2}{5}e^{4t} + \frac{2}{5}e^{8t} \quad \text{for } -\infty < t < \infty.$$
Find the probability distribution of $X$. Hint: It is a simple discrete distribution.

12. Suppose that $X$ is a random variable for which the m.g.f. is as follows:
$$\psi(t) = \frac{1}{6}(4 + e^{t} + e^{-t}) \quad \text{for } -\infty < t < \infty.$$
Find the probability distribution of $X$.

13. Let $X$ have the Cauchy distribution (see Example 4.1.8). Prove that the m.g.f. $\psi(t)$ is finite only for $t = 0$.

14. Let $X$ have p.d.f.
$$
f(x) =
\begin{cases}
x^{-2} & \text{if } x > 1, \\
0 & \text{otherwise.}
\end{cases}
$$
Prove that the m.g.f. $\psi(t)$ is finite for all $t \le 0$ but for no $t > 0$.

15. Prove the following extension of Theorem 4.4.1: If $E(|X|^a) < \infty$ for some positive number $a$, then $E(|X|^b) < \infty$ for every positive number $b < a$. Give the proof for the case in which $X$ has a discrete distribution.

16. Let $X$ have the binomial distribution with parameters $n$ and $p$. Let $Y$ have the binomial distribution with parameters $n$ and $1 - p$. Prove that the skewness of $Y$ is the negative of the skewness of $X$. Hint: Let $Z = n - X$ and show that $Z$ has the same distribution as $Y$.

17. Find the skewness of the distribution in Example 4.4.3.
4.5 The Mean and the Median

Although the mean of a distribution is a measure of central location, the median (see Definition 3.3.3) is also a measure of central location for a distribution. This section presents some comparisons and contrasts between these two location summaries of a distribution.

The Median

It was mentioned in Sec. 4.1 that the mean of a probability distribution on the real line will be at the center of gravity of that distribution. In this sense, the mean of a distribution can be regarded as the center of the distribution. There is another point on the line that might also be regarded as the center of the distribution. Suppose that there is a point $m_0$ that divides the total probability into two equal parts, that is, the probability to the left of $m_0$ is 1/2, and the probability to the right of $m_0$ is also 1/2. For a continuous distribution, the median of the distribution introduced in Definition 3.3.3 is such a number. If there is such an $m_0$, it could legitimately be called a center of the distribution. It should be noted, however, that for some discrete distributions there will not be any point at which the total probability is divided into two parts that are exactly equal. Moreover, for other distributions, which may be either discrete or continuous, there will be more than one such point. Therefore, the formal definition of a median, which will now be given, must be general enough to include these possibilities.

Definition 4.5.1. Median. Let $X$ be a random variable. Every number $m$ with the following property is called a median of the distribution of $X$:
$$\Pr(X \le m) \ge 1/2 \quad \text{and} \quad \Pr(X \ge m) \ge 1/2.$$

Another way to understand this definition is that a median is a point $m$ that satisfies the following two requirements: First, if $m$ is included with the values of $X$ to the left of $m$, then
$$\Pr(X \le m) \ge \Pr(X > m).$$
Second, if $m$ is included with the values of $X$ to the right of $m$, then
$$\Pr(X \ge m) \ge \Pr(X < m).$$
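The two inequalities in Definition 4.5.1 can be checked mechanically for a discrete distribution. The following sketch is an illustration under assumptions made here: the function name is hypothetical, the p.f. is passed as a dictionary from values to probabilities, and the two example distributions (a fair die and a three-point distribution) are invented for the demonstration. It tests only the support points themselves; when two adjacent support points both qualify, every number between them also satisfies the definition.

```python
from fractions import Fraction


def medians_among_support(pf):
    """Return the support points m with Pr(X <= m) >= 1/2 and Pr(X >= m) >= 1/2.

    pf maps each possible value of X to its probability (exact Fractions
    are used so that the comparisons with 1/2 are not affected by rounding).
    """
    half = Fraction(1, 2)
    result = []
    for m in sorted(pf):
        left = sum(p for x, p in pf.items() if x <= m)   # Pr(X <= m)
        right = sum(p for x, p in pf.items() if x >= m)  # Pr(X >= m)
        if left >= half and right >= half:
            result.append(m)
    return result


# Hypothetical example 1: a fair six-sided die.
fair_die = {k: Fraction(1, 6) for k in range(1, 7)}

# Hypothetical example 2: a three-point distribution with a single median.
three_point = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

print(medians_among_support(fair_die))
print(medians_among_support(three_point))
```

For the fair die, both 3 and 4 satisfy the definition, so the median is not unique; for the three-point distribution, only the value 1 qualifies.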
If there is a number m such that Pr(X < m) = Pr(X > m), that is, if the number m does actually divide the total probability into two equal parts, then m will of course be a median of the distribution of X (see Exercise 16).

Note: Multiple Medians. One can prove that every distribution must have at least one median. Indeed, the 1/2 quantile from Definition 3.3.2 is a median. (See Exercise 1.) For some distributions, every number in some interval is a median. In such cases, the 1/2 quantile is the minimum of the set of all medians. When a whole interval of numbers are medians of a distribution, some writers refer to the midpoint of the interval as the median.

Example 4.5.1 The Median of a Discrete Distribution.
Suppose that X has the following discrete distribution:
Pr(X = 1) = 0.1, Pr(X = 2) = 0.2, Pr(X = 3) = 0.3, Pr(X = 4) = 0.4.
The value 3 is a median of this distribution because Pr(X ≤ 3) = 0.6, which is greater than 1/2, and Pr(X ≥ 3) = 0.7, which is also greater than 1/2. Furthermore, 3 is the unique median of this distribution. ◀

Example 4.5.2 A Discrete Distribution for Which the Median Is Not Unique. Suppose that X has the following discrete distribution:
Pr(X = 1) = 0.1, Pr(X = 2) = 0.4, Pr(X = 3) = 0.3, Pr(X = 4) = 0.2.
Here, Pr(X ≤ 2) = 1/2, and Pr(X ≥ 3) = 1/2. Therefore, every value of m in the closed interval 2 ≤ m ≤ 3 will be a median of this distribution. The most popular choice of median of this distribution would be the midpoint 2.5. ◀

Example 4.5.3 The Median of a Continuous Distribution. Suppose that X has a continuous distribution for which the p.d.f. is as follows:
f(x) = 4x^3 for 0 < x < 1, and f(x) = 0 otherwise.
The unique median of this distribution will be the number m such that
∫_0^m 4x^3 dx = ∫_m^1 4x^3 dx = 1/2.
This number is m = 1/2^(1/4). ◀
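For a discrete distribution with finitely many values, Definition 4.5.1 can be applied mechanically. The following Python sketch is one way to do so; the function name median_interval and the dictionary representation of the p.f. are illustrative assumptions, not notation from the text. It returns the smallest and largest medians, and every number between them is then also a median. Exact fractions are used so that comparisons with 1/2 are not affected by rounding.

```python
from fractions import Fraction

def median_interval(pf):
    """Return (lo, hi) such that the medians are exactly the numbers lo <= m <= hi.

    pf maps each possible value of X to its probability (assumed to sum to 1).
    """
    half = Fraction(1, 2)
    points = sorted(pf)

    # Smallest value x with Pr(X <= x) >= 1/2.
    cumulative = Fraction(0)
    for x in points:
        cumulative += pf[x]
        if cumulative >= half:
            lo = x
            break

    # Largest value x with Pr(X >= x) >= 1/2.
    tail = Fraction(0)
    for x in reversed(points):
        tail += pf[x]
        if tail >= half:
            hi = x
            break

    return lo, hi

# The p.f.'s of Examples 4.5.1 and 4.5.2.
example_451 = {1: Fraction(1, 10), 2: Fraction(2, 10), 3: Fraction(3, 10), 4: Fraction(4, 10)}
example_452 = {1: Fraction(1, 10), 2: Fraction(4, 10), 3: Fraction(3, 10), 4: Fraction(2, 10)}

print(median_interval(example_451))  # (3, 3): the unique median is 3
print(median_interval(example_452))  # (2, 3): every m with 2 <= m <= 3 is a median
```

The two printed intervals agree with the conclusions of Examples 4.5.1 and 4.5.2.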
Example 4.5.4 A Continuous Distribution for Which the Median Is Not Unique. Suppose that X has a continuous distribution for which the p.d.f. is as follows:
f(x) = 1/2 for 0 ≤ x ≤ 1, f(x) = 1 for 2.5 ≤ x ≤ 3, and f(x) = 0 otherwise.
Here, for every value of m in the closed interval 1 ≤ m ≤ 2.5, Pr(X ≤ m) = Pr(X ≥ m) = 1/2. Therefore, every value of m in the interval 1 ≤ m ≤ 2.5 is a median of this distribution. ◀

Comparison of the Mean and the Median

Example 4.5.5 Last Lottery Number. In a state lottery game, a three-digit number from 000 to 999 is drawn each day. After several years, all but one of the 1000 possible numbers has been drawn. A lottery official would like to predict how much longer it will be until that missing number is finally drawn. Let X be the number of days (X = 1 being tomorrow) until that number appears. It is not difficult to determine the distribution of X, assuming that all 1000 numbers are equally likely to be drawn each day and that the draws are independent. Let A_x stand for the event that the missing number is drawn on day x for x = 1, 2, . . . . Then {X = 1} = A_1, and for x > 1,
{X = x} = A_1^c ∩ . . . ∩ A_(x−1)^c ∩ A_x.
Since the A_x events are independent and all have probability 0.001, it is easy to see that the p.f. of X is
f(x) = 0.001(0.999)^(x−1) for x = 1, 2, . . . , and f(x) = 0 otherwise.
But, the lottery official wants to give a single-number prediction for when the number will be drawn. What summary of the distribution would be appropriate for this prediction? ◀

The lottery official in Example 4.5.5 wants some sort of “average” or “middle” number to summarize the distribution of the number of days until the last number appears. Presumably she wants a prediction that is neither excessively large nor too small. Either the mean or a median of X can be used as such a summary of the distribution. Some important properties of the mean have already been described in this chapter, and several more properties will be given later in the book.
However, for many purposes the median is a more useful measure of the middle of the distribution than is the mean. For example, every distribution has a median, but not every distribution has a mean. As illustrated in Example 4.3.5, the mean of a distribution can be made very large by removing a small but positive amount of probability from any part of the distribution and assigning this amount to a sufficiently large value of x. On the other hand, the median may be unaffected by a similar change in probabilities. If any amount of probability is removed from a value of x larger than the median and assigned to an arbitrarily large value of x, the median of the new distribution will be the same as that of the original distribution. In Example 4.3.5, all numbers in the interval [0, 1] are medians of both random variables X and Y despite the large difference in their means.

Example 4.5.6 Annual Incomes. Suppose that the mean annual income among the families in a certain community is $30,000. It is possible that only a few families in the community actually have an income as large as $30,000, but those few families have incomes that are very much larger than $30,000. As an extreme example, suppose that there are 100 families and 99 of them have income of $1,000 while the other one has income of $2,901,000. If, however, the median annual income among the families is $30,000, then at least one-half of the families must have incomes of $30,000 or more. ◀

The median has one convenient property that the mean does not have.

Theorem 4.5.1 One-to-One Function. Let X be a random variable that takes values in an interval I of real numbers. Let r be a one-to-one function defined on the interval I. If m is a median of X, then r(m) is a median of r(X).

Proof Let Y = r(X). We need to show that Pr(Y ≥ r(m)) ≥ 1/2 and Pr(Y ≤ r(m)) ≥ 1/2. Since r is one-to-one on the interval I, it must be either increasing or decreasing over the interval I. If r is increasing, then Y ≥ r(m) if and only if X ≥ m, so Pr(Y ≥ r(m)) = Pr(X ≥ m) ≥ 1/2. Similarly, Y ≤ r(m) if and only if X ≤ m and Pr(Y ≤ r(m)) ≥ 1/2 also. If r is decreasing, then Y ≥ r(m) if and only if X ≤ m. The remainder of the proof is then similar to the preceding. ■

We shall now consider two specific criteria by which the prediction of a random variable X can be judged. By the first criterion, the optimal prediction that can be made is the mean. By the second criterion, the optimal prediction is the median.

Minimizing the Mean Squared Error

Suppose that X is a random variable with mean μ and variance σ^2. Suppose also that the value of X is to be observed in some experiment, but this value must be predicted before the observation can be made. One basis for making the prediction is to select some number d for which the expected value of the square of the error X − d will be a minimum.
Definition 4.5.2 Mean Squared Error/M.S.E. The number E[(X − d)^2] is called the mean squared error (M.S.E.) of the prediction d.

The next result shows that the number d for which the M.S.E. is minimized is E(X).

Theorem 4.5.2 Let X be a random variable with finite variance σ^2, and let μ = E(X). For every number d,
E[(X − μ)^2] ≤ E[(X − d)^2]. (4.5.1)
Furthermore, there will be equality in the relation (4.5.1) if and only if d = μ.

Proof For every value of d,
E[(X − d)^2] = E(X^2 − 2dX + d^2)
= E(X^2) − 2dμ + d^2. (4.5.2)
The final expression in Eq. (4.5.2) is simply a quadratic function of d. By elementary differentiation it will be found that the minimum value of this function is attained when d = μ (the derivative with respect to d is 2d − 2μ, which is zero only at d = μ, and the second derivative is 2 > 0). Hence, in order to minimize the M.S.E., the predicted value of X should be its mean μ. Furthermore, when this prediction is used, the M.S.E. is simply E[(X − μ)^2] = σ^2. ■

Example 4.5.7 Last Lottery Number. In Example 4.5.5, we discussed a state lottery in which one number had never yet been drawn.
Let X stand for the number of days until that last number is eventually drawn. The p.f. of X was computed in Example 4.5.5 as
f(x) = 0.001(0.999)^(x−1) for x = 1, 2, . . . , and f(x) = 0 otherwise.
We can compute the mean of X as
E(X) = Σ_{x=1}^{∞} x 0.001(0.999)^(x−1) = 0.001 Σ_{x=1}^{∞} x(0.999)^(x−1). (4.5.3)
At first, this sum does not look like one that is easy to compute. However, it is closely related to the general sum
g(y) = Σ_{x=0}^{∞} y^x = 1/(1 − y),
if 0 < y < 1. Using properties of power series from calculus, we know that the derivative of g(y) can be found by differentiating the individual terms of the power series. That is,
g′(y) = Σ_{x=0}^{∞} x y^(x−1) = Σ_{x=1}^{∞} x y^(x−1),
for 0 < y < 1. But we also know that g′(y) = 1/(1 − y)^2. The last sum in Eq. (4.5.3) is g′(0.999) = 1/(0.001)^2. It follows that
E(X) = 0.001 · 1/(0.001)^2 = 1000. ◀

Minimizing the Mean Absolute Error

Another possible basis for predicting the value of a random variable X is to choose some number d for which E(|X − d|) will be a minimum.
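The value E(X) = 1000 obtained in Example 4.5.7 can be sanity-checked numerically by truncating the series in Eq. (4.5.3). The Python sketch below is only an illustration; the function names lottery_pf and truncated_mean and the chosen truncation points are assumptions made here, not part of the text.

```python
def lottery_pf(x, p=0.001):
    """p.f. from Example 4.5.5: probability that the missing number first appears on day x."""
    return p * (1 - p) ** (x - 1)

def truncated_mean(n_terms, p=0.001):
    """Partial sum of x * f(x) over x = 1, ..., n_terms."""
    return sum(x * lottery_pf(x, p) for x in range(1, n_terms + 1))

# The partial sums increase toward the limit as more terms are included.
for n_terms in (1_000, 10_000, 50_000):
    print(n_terms, truncated_mean(n_terms))
```

The partial sums approach 1000 slowly, which reflects how much of the mean comes from large values of x.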
Definition 4.5.3 Mean Absolute Error/M.A.E. The number E(|X − d|) is called the mean absolute error (M.A.E.) of the prediction d.

We shall now show that the M.A.E. is minimized when the chosen value of d is a median of the distribution of X.

Theorem 4.5.3 Let X be a random variable with finite mean, and let m be a median of the distribution of X. For every number d,
E(|X − m|) ≤ E(|X − d|). (4.5.4)
Furthermore, there will be equality in the relation (4.5.4) if and only if d is also a median of the distribution of X.

Proof For convenience, we shall assume that X has a continuous distribution for which the p.d.f. is f. The proof for any other type of distribution is similar. Suppose first that d > m. Then
E(|X − d|) − E(|X − m|) = ∫_{−∞}^{∞} (|x − d| − |x − m|) f(x) dx
= ∫_{−∞}^{m} (d − m) f(x) dx + ∫_{m}^{d} (d + m − 2x) f(x) dx + ∫_{d}^{∞} (m − d) f(x) dx
≥ ∫_{−∞}^{m} (d − m) f(x) dx + ∫_{m}^{d} (m − d) f(x) dx + ∫_{d}^{∞} (m − d) f(x) dx
= (d − m)[Pr(X ≤ m) − Pr(X > m)]. (4.5.5)
Since m is a median of the distribution of X, it follows that
Pr(X ≤ m) ≥ 1/2 ≥ Pr(X > m). (4.5.6)
The final difference in the relation (4.5.5) is therefore nonnegative. Hence,
E(|X − d|) ≥ E(|X − m|). (4.5.7)
Furthermore, there can be equality in the relation (4.5.7) only if the inequalities in relations (4.5.5) and (4.5.6) are actually equalities. A careful analysis shows that these inequalities will be equalities only if d is also a median of the distribution of X.
The proof for every value of d such that d < m is similar. ■

Example 4.5.8 Last Lottery Number. In Example 4.5.5, in order to compute the median of X, we must find the smallest number x such that the c.d.f. F(x) ≥ 0.5. For integer x, we have
F(x) = Σ_{n=1}^{x} 0.001(0.999)^(n−1).
We can use the popular formula
Σ_{n=0}^{x} y^n = (1 − y^(x+1))/(1 − y)
to see that, for integer x ≥ 1,
F(x) = 0.001 (1 − (0.999)^x)/(1 − 0.999) = 1 − (0.999)^x.
Setting this equal to 0.5 and solving for x gives x = 692.8; hence, the median of X is 693. The median is unique because F(x) never takes the exact value 0.5 for any integer x. The median of X is much smaller than the mean of 1000 found in Example 4.5.7. ◀

The reason that the mean is so much larger than the median in Examples 4.5.7 and 4.5.8 is that the distribution has probability at arbitrarily large values but is bounded below. The probability at these large values pulls the mean up because there is no probability at equally small values to balance. The median is not affected by how the upper half of the probability is distributed. The following example involves a symmetric distribution. Here, the mean and median(s) are more similar.

Example 4.5.9 Predicting a Discrete Uniform Random Variable. Suppose that the probability is 1/6 that a random variable X will take each of the following six values: 1, 2, 3, 4, 5, 6. We shall determine the prediction for which the M.S.E. is minimum and the prediction for which the M.A.E. is minimum.
In this example,
E(X) = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 3.5.
Therefore, the M.S.E. will be minimized by the unique value d = 3.5.
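For the distribution in this example, both criteria can be evaluated directly. The Python sketch below computes the M.S.E. and the M.A.E. on a grid of candidate predictions; the grid d = 1, 1.5, . . . , 6 and the function names mse and mae are illustrative assumptions, and a grid search only compares the listed candidates rather than proving anything about all d.

```python
from fractions import Fraction

values = [1, 2, 3, 4, 5, 6]
prob = Fraction(1, 6)  # each value is equally likely

def mse(d):
    """E[(X - d)^2] for the discrete uniform distribution on 1, ..., 6."""
    return sum(prob * (x - d) ** 2 for x in values)

def mae(d):
    """E(|X - d|) for the same distribution."""
    return sum(prob * abs(x - d) for x in values)

# Candidate predictions d = 1, 1.5, ..., 6, kept as exact fractions.
grid = [Fraction(k, 2) for k in range(2, 13)]

best_mse = min(mse(d) for d in grid)
best_mae = min(mae(d) for d in grid)
mse_minimizers = [d for d in grid if mse(d) == best_mse]
mae_minimizers = [d for d in grid if mae(d) == best_mae]

print([float(d) for d in mse_minimizers])  # [3.5]
print([float(d) for d in mae_minimizers])  # [3.0, 3.5, 4.0]
```

On this grid the M.S.E. has the single minimizer d = 3.5, while the M.A.E. takes the same smallest value, 3/2, at d = 3, 3.5, and 4.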
Also, every number m in the closed interval 3 ≤ m ≤ 4 is a median of the given distribution. Therefore, the M.A.E. will be minimized by every value of d such that 3 ≤ d ≤ 4 and only by such a value of d. Because the distribution of X is symmetric, the mean of X is also a median of X. ◀

Note: When the M.A.E. and M.S.E. Are Finite. We noted that the median exists for every distribution, but the M.A.E. is finite if and only if the distribution has a finite mean. Similarly, the M.S.E. is finite if and only if the distribution has a finite variance.

Summary

A median of X is any number m such that Pr(X ≤ m) ≥ 1/2 and Pr(X ≥ m) ≥ 1/2. To minimize E(|X − d|) by choice of d, one must choose d to be a median of X. To minimize E[(X − d)^2] by choice of d, one must choose d = E(X).

Exercises

1. Prove that the 1/2 quantile as defined in Definition 3.3.2 is a median as defined in Definition 4.5.1.

2. Suppose that a random variable X has a discrete distribution for which the p.f. is as follows:
f(x) = cx for x = 1, 2, 3, 4, 5, 6, and f(x) = 0 otherwise.
Determine all the medians of this distribution.

3. Suppose that a random variable X has a continuous distribution for which the p.d.f. is as follows:
f(x) = e^(−x) for x > 0, and f(x) = 0 otherwise.
Determine all the medians of this distribution.

4. In a small community consisting of 153 families, the number of families that have k children (k = 0, 1, 2, . . .) is given in the following table:

Number of children    Number of families
0                     21
1                     40
2                     42
3                     27
4 or more             23

Determine the mean and the median of the number of children per family. (For the mean, assume that all families with four or more children have only four children. Why doesn’t this point matter for the median?)

5. Suppose that an observed value of X is equally likely to come from a continuous distribution for which the p.d.f. is f or from one for which the p.d.f. is g. Suppose that f(x) > 0 for 0 < x < 1 and f(x) = 0 otherwise, and suppose also that g(x) > 0 for 2 < x < 4 and g(x) = 0 otherwise. Determine: (a) the mean and (b) the median of the distribution of X.

6. Suppose that a random variable X has a continuous distribution for which the p.d.f. f is as follows:
f(x) = 2x for 0 < x < 1, and f(x) = 0 otherwise.
Determine the value of d that minimizes (a) E[(X − d)^2] and (b) E(|X − d|).

7. Suppose that a person’s score X on a certain examination will be a number in the interval 0 ≤ X ≤ 1 and that X has a continuous distribution for which the p.d.f. is as follows:
f(x) = x + 1/2 for 0 ≤ x ≤ 1, and f(x) = 0 otherwise.
Determine the prediction of X that minimizes (a) the M.S.E. and (b) the M.A.E.

8. Suppose that the distribution of a random variable X is symmetric with respect to the point x = 0 and that E(X^4) < ∞. Show that E[(X − d)^4] is minimized by the value d = 0.

9. Suppose that a fire can occur at any one of five points along a road. These points are located at −3, −1, 0, 1, and 2 in Fig. 4.9. Suppose also that the probability that each of these points will be the location of the next fire that occurs along the road is as specified in Fig. 4.9.

[Figure 4.9 Probabilities for Exercise 9. A road with points marked at −3, −1, 0, 1, and 2; the probability labels shown are 0.4, 0.2, 0.2, 0.1, and 0.1.]

a. At what point along the road should a fire engine wait in order to minimize the expected value of the square of the distance that it must travel to the next fire?
b. Where should the fire engine wait to minimize the expected value of the distance that it must travel to the next fire?

10. If n houses are located at various points along a straight road, at what point along the road should a store be located in order to minimize the sum of the distances from the n houses to the store?

11. Let X be a random variable having the binomial distribution with parameters n = 7 and p = 1/4, and let Y be a random variable having the binomial distribution with parameters n = 5 and p = 1/2. Which of these two random variables can be predicted with the smaller M.S.E.?

12. Consider a coin for which the probability of obtaining a head on each given toss is 0.3. Suppose that the coin is to be tossed 15 times, and let X denote the number of heads that will be obtained.
a. What prediction of X has the smallest M.S.E.?
b. What prediction of X has the smallest M.A.E.?

13. Suppose that the distribution of X is symmetric around a point m. Prove that m is a median of X.

14. Find the median of the Cauchy distribution defined in Example 4.1.8.

15. Let X be a random variable with c.d.f. F. Suppose that a < b are numbers such that both a and b are medians of X.
a. Prove that F(a) = 1/2.
b. Prove that there exist a smallest c ≤ a and a largest d ≥ b such that every number in the closed interval [c, d] is a median of X.
c. If X has a discrete distribution, prove that F(d) > 1/2.

16. Let X be a random variable. Suppose that there exists a number m such that Pr(X < m) = Pr(X > m). Prove that m is a median of the distribution of X.

17. Let X be a random variable. Suppose that there exists a number m such that Pr(X < m) < 1/2 and Pr(X > m) < 1/2. Prove that m is the unique median of the distribution of X.

18. Prove the following extension of Theorem 4.5.1. Let m be the p quantile of the random variable X. (See Definition 3.3.2.) If r is a strictly increasing function, then r(m) is the p quantile of r(X).

4.6 Covariance and Correlation

When we are interested in the joint distribution of two random variables, it is useful to have a summary of how much the two random variables depend on each other.
The covariance and correlation are attempts to measure that dependence, but they only capture a particular type of dependence, namely linear dependence.

Covariance

Example 4.6.1 Test Scores. When applying for college, high school students often take a number of standardized tests. Consider a particular student who will take both a verbal and a quantitative test. Let $X$ be this student's score on the verbal test, and let $Y$ be the same student's score on the quantitative test. Although there are students who do much better on one test than the other, it might still be reasonable to expect a student who does very well on one test to do at least a little better than average on the other. We would like to find a numerical summary of the joint distribution of $X$ and $Y$ that reflects the degree to which we believe a high or low score on one test will be accompanied by a high or low score on the other test. ◀

When we consider the joint distribution of two random variables, the means, the medians, and the variances of the variables provide useful information about their marginal distributions. However, these values do not provide any information about the relationship between the two variables or about their tendency to vary together rather than independently. In this section and the next one, we shall introduce summaries of a joint distribution that enable us to measure the association between two random variables, determine the variance of the sum of an arbitrary number of dependent random variables, and predict the value of one random variable by using the observed value of some other related variable.

Definition 4.6.1 Covariance. Let $X$ and $Y$ be random variables having finite means. Let $E(X) = \mu_X$ and $E(Y) = \mu_Y$. The covariance of $X$ and $Y$, which is denoted by $\operatorname{Cov}(X, Y)$, is defined as
\[
\operatorname{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)], \tag{4.6.1}
\]
if the expectation in Eq. (4.6.1) exists.

It can be shown (see Exercise 2 at the end of this section) that if both $X$ and $Y$ have finite variance, then the expectation in Eq. (4.6.1) will exist and $\operatorname{Cov}(X, Y)$ will be finite. However, the value of $\operatorname{Cov}(X, Y)$ can be positive, negative, or zero.

Example 4.6.2 Test Scores. Let $X$ and $Y$ be the test scores in Example 4.6.1, and suppose that they have the joint p.d.f.
\[
f(x, y) = \begin{cases} 2xy + 0.5 & \text{for } 0 \le x \le 1 \text{ and } 0 \le y \le 1, \\ 0 & \text{otherwise.} \end{cases}
\]
We shall compute the covariance $\operatorname{Cov}(X, Y)$. First, we shall compute the means $\mu_X$ and $\mu_Y$ of $X$ and $Y$, respectively. The symmetry in the joint p.d.f. means that $X$ and $Y$ have the same marginal distribution; hence, $\mu_X = \mu_Y$. We see that
\[
\mu_X = \int_0^1 \int_0^1 [2x^2 y + 0.5x]\, dy\, dx = \int_0^1 [x^2 + 0.5x]\, dx = \frac{1}{3} + \frac{1}{4} = \frac{7}{12},
\]
so that $\mu_Y = 7/12$ as well. The covariance can be computed using Theorem 4.1.2. Specifically, we must evaluate the integral
\[
\int_0^1 \int_0^1 \left(x - \frac{7}{12}\right)\left(y - \frac{7}{12}\right)(2xy + 0.5)\, dy\, dx.
\]
This integral is straightforward, albeit tedious, to compute, and the result is $\operatorname{Cov}(X, Y) = 1/144$. ◀

The following result often simplifies the calculation of a covariance.

Theorem 4.6.1 For all random variables $X$ and $Y$ such that $\sigma_X^2 < \infty$ and $\sigma_Y^2 < \infty$,
\[
\operatorname{Cov}(X, Y) = E(XY) - E(X)E(Y). \tag{4.6.2}
\]

Proof It follows from Eq. (4.6.1) that
\[
\operatorname{Cov}(X, Y) = E(XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y) = E(XY) - \mu_X E(Y) - \mu_Y E(X) + \mu_X \mu_Y.
\]
Since $E(X) = \mu_X$ and $E(Y) = \mu_Y$, Eq. (4.6.2) is obtained.
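As a check on Example 4.6.2, the covariance can also be obtained from Eq. (4.6.2) without evaluating the tedious integral directly. The sketch below is only an illustration: it assumes the joint p.d.f. $2xy + 0.5$ on the unit square, and the dictionary encoding `PDF` and the helper `moment` are names introduced here, not part of the text. It uses exact rational arithmetic, relying on the fact that the integral of $x^m y^n$ over the unit square is $1/[(m+1)(n+1)]$.

```python
from fractions import Fraction

# Joint p.d.f. of Example 4.6.2 on the unit square, f(x, y) = 2xy + 1/2,
# stored as {(power of x, power of y): coefficient}. This encoding is an
# assumption made for the sketch, not notation from the text.
PDF = {(1, 1): Fraction(2), (0, 0): Fraction(1, 2)}


def moment(a, b, pdf=PDF):
    """Return E(X^a Y^b) exactly for a polynomial p.d.f. on [0, 1] x [0, 1].

    The integral of x^m * y^n over the unit square is 1 / ((m + 1)(n + 1)).
    """
    return sum(
        coeff * Fraction(1, (i + a + 1) * (j + b + 1))
        for (i, j), coeff in pdf.items()
    )


total_mass = moment(0, 0)                  # integral of the p.d.f.
mean_x = moment(1, 0)                      # mu_X
mean_y = moment(0, 1)                      # mu_Y
cov_xy = moment(1, 1) - mean_x * mean_y    # Eq. (4.6.2)

print(total_mass, mean_x, mean_y, cov_xy)  # prints: 1 7/12 7/12 1/144
```

Here $E(XY) = 25/72$, so Eq. (4.6.2) gives $25/72 - (7/12)^2 = 1/144$, in agreement with the value obtained from Eq. (4.6.1).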
The covariance between $X$ and $Y$ is intended to measure the degree to which $X$ and $Y$ tend to be large at the same time or the degree to which one tends to be large while the other is small. Some intuition about this interpretation can be gathered from a careful look at Eq. (4.6.1). For example, suppose that $\operatorname{Cov}(X, Y)$ is positive. Then $X > \mu_X$ and $Y > \mu_Y$ must occur together and/or $X < \mu_X$ and $Y < \mu_Y$ must occur together to a larger extent than $X < \mu_X$ occurs with $Y > \mu_Y$ and $X > \mu_X$ occurs with $Y < \mu_Y$. Otherwise, the mean would be negative. Similarly, if $\operatorname{Cov}(X, Y)$ is negative, then $X > \mu_X$ and $Y < \mu_Y$ must occur together and/or $X < \mu_X$ and $Y > \mu_Y$ must occur together to a larger extent than the other two inequalities. If $\operatorname{Cov}(X, Y) = 0$, then the extent to which $X$ and $Y$ are on the same sides of their respective means exactly balances the extent to which they are on opposite sides of their means.

Correlation

Although $\operatorname{Cov}(X, Y)$ gives a numerical measure of the degree to which $X$ and $Y$ vary together, the magnitude of $\operatorname{Cov}(X, Y)$ is also influenced by the overall magnitudes of $X$ and $Y$.
For example, in Exercise 5 in this section, you can prove that $\operatorname{Cov}(2X, Y) = 2\operatorname{Cov}(X, Y)$. In order to obtain a measure of association between $X$ and $Y$ that is not driven by arbitrary changes in the scales of one or the other random variable, we define a slightly different quantity next.

Definition 4.6.2 Correlation. Let $X$ and $Y$ be random variables with finite variances $\sigma_X^2$ and $\sigma_Y^2$, respectively. Then the correlation of $X$ and $Y$, which is denoted by $\rho(X, Y)$, is defined as follows:
\[
\rho(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}. \tag{4.6.3}
\]

In order to determine the range of possible values of the correlation $\rho(X, Y)$, we shall need the following result.

Theorem 4.6.2 Schwarz Inequality. For all random variables $U$ and $V$ such that $E(UV)$ exists,
\[
[E(UV)]^2 \le E(U^2)E(V^2). \tag{4.6.4}
\]
If, in addition, the right-hand side of Eq. (4.6.4) is finite, then the two sides of Eq. (4.6.4) equal the same value if and only if there are nonzero constants $a$ and $b$ such that $aU + bV = 0$ with probability 1.

Proof If $E(U^2) = 0$, then $\Pr(U = 0) = 1$. Therefore, it must also be true that $\Pr(UV = 0) = 1$. Hence, $E(UV) = 0$, and the relation (4.6.4) is satisfied. Similarly, if $E(V^2) = 0$, then the relation (4.6.4) will be satisfied. Moreover, if either $E(U^2)$ or $E(V^2)$ is infinite, then the right side of the relation (4.6.4) will be infinite. In this case, the relation (4.6.4) will surely be satisfied.

For the rest of the proof, assume that $0 < E(U^2) < \infty$ and $0 < E(V^2) < \infty$. For all numbers $a$ and $b$,
\[
0 \le E[(aU + bV)^2] = a^2 E(U^2) + b^2 E(V^2) + 2ab\,E(UV) \tag{4.6.5}
\]
and
\[
0 \le E[(aU - bV)^2] = a^2 E(U^2) + b^2 E(V^2) - 2ab\,E(UV). \tag{4.6.6}
\]
If we let $a = [E(V^2)]^{1/2}$ and $b = [E(U^2)]^{1/2}$, then it follows from the relation (4.6.5) that
\[
E(UV) \ge -[E(U^2)E(V^2)]^{1/2}. \tag{4.6.7}
\]
It also follows from the relation (4.6.6) that
\[
E(UV) \le [E(U^2)E(V^2)]^{1/2}. \tag{4.6.8}
\]
These two relations together imply that the relation (4.6.4) is satisfied.

Finally, suppose that the right-hand side of Eq. (4.6.4) is finite. Both sides of (4.6.4) equal the same value if and only if the same is true for either (4.6.7) or (4.6.8). Both sides of (4.6.7) equal the same value if and only if the rightmost expression in (4.6.5) is 0. This, in turn, is true if and only if $E[(aU + bV)^2] = 0$, which occurs if and only if $aU + bV = 0$ with probability 1. The reader can easily check that both sides of (4.6.8) equal the same value if and only if $aU - bV = 0$ with probability 1.
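The Schwarz inequality and its equality condition can be illustrated on a small discrete example. The two joint probability functions below are hypothetical, chosen only for this sketch: in `JOINT` the variables are not linearly related, while in `LINEAR` we have $V = -3U$, so that $3U + V = 0$ with probability 1. The helper names `expect` and `schwarz_sides` are likewise introduced here for illustration.

```python
from fractions import Fraction

# Hypothetical joint p.f.'s of (U, V), written as {(u, v): probability}.
JOINT = {(-1, 2): Fraction(1, 4), (0, -1): Fraction(1, 4), (2, 1): Fraction(1, 2)}
LINEAR = {(-1, 3): Fraction(1, 4), (0, 0): Fraction(1, 4), (2, -6): Fraction(1, 2)}


def expect(g, joint):
    """Return E[g(U, V)] for a discrete joint p.f."""
    return sum(p * g(u, v) for (u, v), p in joint.items())


def schwarz_sides(joint):
    """Return ([E(UV)]^2, E(U^2) E(V^2)), the two sides of (4.6.4)."""
    lhs = expect(lambda u, v: u * v, joint) ** 2
    rhs = expect(lambda u, v: u * u, joint) * expect(lambda u, v: v * v, joint)
    return lhs, rhs


lhs, rhs = schwarz_sides(JOINT)
print(lhs, rhs, lhs <= rhs)        # strict inequality for this p.f.

lhs_lin, rhs_lin = schwarz_sides(LINEAR)
print(lhs_lin, rhs_lin)            # the two sides agree when V = -3U
```

For `JOINT` the two sides are $1/4$ and $63/16$; for `LINEAR` both sides equal $729/16$, as the equality condition of Theorem 4.6.2 requires.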
A slight variant on Theorem 4.6.2 is the result we want.

Theorem 4.6.3 Cauchy-Schwarz Inequality. Let $X$ and $Y$ be random variables with finite variance. Then
\[
[\operatorname{Cov}(X, Y)]^2 \le \sigma_X^2 \sigma_Y^2, \tag{4.6.9}
\]
and
\[
-1 \le \rho(X, Y) \le 1. \tag{4.6.10}
\]
Furthermore, the inequality in Eq. (4.6.9) is an equality if and only if there are nonzero constants $a$ and $b$ and a constant $c$ such that $aX + bY = c$ with probability 1.

Proof Let $U = X - \mu_X$ and $V = Y - \mu_Y$. Eq. (4.6.9) now follows directly from Theorem 4.6.2. In turn, it follows from Eq. (4.6.3) that $[\rho(X, Y)]^2 \le 1$ or, equivalently, that Eq. (4.6.10) holds. The final claim follows easily from the similar claim at the end of Theorem 4.6.2.

Definition 4.6.3 Positively/Negatively Correlated/Uncorrelated. It is said that $X$ and $Y$ are positively correlated if $\rho(X, Y) > 0$, that $X$ and $Y$ are negatively correlated if $\rho(X, Y) < 0$, and that $X$ and $Y$ are uncorrelated if $\rho(X, Y) = 0$.

It can be seen from Eq. (4.6.3) that $\operatorname{Cov}(X, Y)$ and $\rho(X, Y)$ must have the same sign; that is, both are positive, or both are negative, or both are zero.

Example 4.6.3 Test Scores. For the two test scores in Example 4.6.2, we can compute the correlation $\rho(X, Y)$. The variances of $X$ and $Y$ are both equal to 11/144, so the correlation is $\rho(X, Y) = (1/144)/(11/144) = 1/11$. ◀

Properties of Covariance and Correlation

We shall now present four theorems pertaining to the basic properties of covariance and correlation.

The first theorem shows that independent random variables must be uncorrelated.

Theorem 4.6.4 If $X$ and $Y$ are independent random variables with $0 < \sigma_X^2 < \infty$ and $0 < \sigma_Y^2 < \infty$, then
\[
\operatorname{Cov}(X, Y) = \rho(X, Y) = 0.
\]

Proof If $X$ and $Y$ are independent, then $E(XY) = E(X)E(Y)$. Therefore, by Eq. (4.6.2), $\operatorname{Cov}(X, Y) = 0$. Also, it follows that $\rho(X, Y) = 0$.

The converse of Theorem 4.6.4 is not true as a general rule. Two dependent random variables can be uncorrelated. Indeed, even though $Y$ is an explicit function of $X$, it is possible that $\rho(X, Y) = 0$, as in the following examples.

Example 4.6.4 Dependent but Uncorrelated Random Variables. Suppose that the random variable $X$ can take only the three values $-1$, 0, and 1, and that each of these three values has the same probability. Also, let the random variable $Y$ be defined by the relation $Y = X^2$. We shall show that $X$ and $Y$ are dependent but uncorrelated.
In this example, $X$ and $Y$ are clearly dependent, since $Y$ is not constant and the value of $Y$ is completely determined by the value of $X$. However,
\[
E(XY) = E(X^3) = E(X) = 0,
\]
because $X^3$ is the same random variable as $X$. Since $E(XY) = 0$ and $E(X)E(Y) = 0$, it follows from Theorem 4.6.1 that $\operatorname{Cov}(X, Y) = 0$ and that $X$ and $Y$ are uncorrelated. ◀

[Figure 4.10: The shaded region is where the joint p.d.f. of $(X, Y)$ is constant and nonzero in Example 4.6.5. The vertical line indicates the values of $Y$ that are possible when $X = 0.5$. Axes: $x$ and $y$, each running from $-1.0$ to $1.0$.]

Example 4.6.5 Uniform Distribution Inside a Circle. Let $(X, Y)$ have joint p.d.f. that is constant on the interior of the unit circle, the shaded region in Fig. 4.10. The constant value of the p.d.f. is one over the area of the circle, that is, $1/\pi$. It is clear that $X$ and $Y$ are dependent since the region where the joint p.d.f. is nonzero is not a rectangle.
In particular, notice that the set of possible values for $Y$ is the interval $(-1, 1)$, but when $X = 0.5$, the set of possible values for $Y$ is the smaller interval $(-0.866, 0.866)$. The symmetry of the circle makes it clear that both $X$ and $Y$ have mean 0. Also, it is not difficult to see that $E(XY) = \iint xy\,f(x, y)\,dx\,dy = 0$. To see this, notice that the integral of $xy$ over the top half of the circle is exactly the negative of the integral of $xy$ over the bottom half. Hence, $\operatorname{Cov}(X, Y) = 0$, but the random variables are dependent. ◀

The next result shows that if $Y$ is a linear function of $X$, then $X$ and $Y$ must be correlated and, in fact, $|\rho(X, Y)| = 1$.

Theorem 4.6.5 Suppose that $X$ is a random variable such that $0 < \sigma_X^2 < \infty$, and $Y = aX + b$ for some constants $a$ and $b$, where $a \ne 0$. If $a > 0$, then $\rho(X, Y) = 1$. If $a < 0$, then $\rho(X, Y) = -1$.

Proof If $Y = aX + b$, then $\mu_Y = a\mu_X + b$ and $Y - \mu_Y = a(X - \mu_X)$. Therefore, by Eq. (4.6.1),
\[
\operatorname{Cov}(X, Y) = aE[(X - \mu_X)^2] = a\sigma_X^2.
\]
Since $\sigma_Y = |a|\sigma_X$, the theorem follows from Eq. (4.6.3).
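The contrast between Example 4.6.4 and Theorem 4.6.5 can be reproduced with a few lines of exact arithmetic. In the sketch below, $X$ is uniform on $\{-1, 0, 1\}$ as in Example 4.6.4; the linear coefficients $a = 2$, $b = 5$ and $a = -3$, $b = 1$ are arbitrary values assumed for illustration, and the helper functions `moments`, `correlation`, and `independent` are introduced here rather than taken from the text.

```python
from fractions import Fraction
from math import sqrt


def moments(joint):
    """Return (Cov(X, Y), Var(X), Var(Y)) for a joint p.f. {(x, y): prob}."""
    ex = sum(p * x for (x, y), p in joint.items())
    ey = sum(p * y for (x, y), p in joint.items())
    cov = sum(p * (x - ex) * (y - ey) for (x, y), p in joint.items())
    var_x = sum(p * (x - ex) ** 2 for (x, y), p in joint.items())
    var_y = sum(p * (y - ey) ** 2 for (x, y), p in joint.items())
    return cov, var_x, var_y


def correlation(joint):
    """Return rho(X, Y) as a float, following Eq. (4.6.3)."""
    cov, var_x, var_y = moments(joint)
    return float(cov) / sqrt(float(var_x * var_y))


def independent(joint):
    """Check whether the joint p.f. factors into its marginal p.f.'s."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return all(joint.get((x, y), 0) == px[x] * py[y] for x in px for y in py)


third = Fraction(1, 3)
square = {(x, x * x): third for x in (-1, 0, 1)}        # Y = X^2 (Example 4.6.4)
rising = {(x, 2 * x + 5): third for x in (-1, 0, 1)}    # Y = aX + b with a > 0
falling = {(x, -3 * x + 1): third for x in (-1, 0, 1)}  # Y = aX + b with a < 0

print(moments(square)[0], independent(square))  # zero covariance, yet dependent
print(correlation(rising), correlation(falling))
```

The first line of output shows a covariance of 0 together with a failed independence check, which is the situation of Example 4.6.4. The two correlations in the second line equal 1 and $-1$ up to floating-point rounding, as Theorem 4.6.5 predicts.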
There is a converse to Theorem 4.6.5. That is, $|\rho(X, Y)| = 1$ implies that $X$ and $Y$ are linearly related. (See Exercise 17.) In general, the value of $\rho(X, Y)$ provides a measure of the extent to which two random variables $X$ and $Y$ are linearly related. If the joint distribution of $X$ and $Y$ is relatively concentrated around a straight line in the $xy$-plane that has a positive slope, then $\rho(X, Y)$ will typically be close to 1. If the joint distribution is relatively concentrated around a straight line that has a negative slope, then $\rho(X, Y)$ will typically be close to $-1$. We shall not discuss these concepts further here, but we shall consider them again when the bivariate normal distribution is introduced and studied in Sec. 5.10.

Note: Correlation Measures Only Linear Relationship. A large value of $|\rho(X, Y)|$ means that $X$ and $Y$ are close to being linearly related and hence are closely related.
But a small value of $|\rho(X, Y)|$ does not mean that $X$ and $Y$ are not close to being related. Indeed, Example 4.6.4 illustrates random variables that are functionally related but have 0 correlation.

We shall now determine the variance of the sum of random variables that are not necessarily independent.

Theorem 4.6.6 If $X$ and $Y$ are random variables such that $\operatorname{Var}(X) < \infty$ and $\operatorname{Var}(Y) < \infty$, then
\[
\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y). \tag{4.6.11}
\]

Proof Since $E(X + Y) = \mu_X + \mu_Y$, then
\[
\begin{aligned}
\operatorname{Var}(X + Y) &= E[(X + Y - \mu_X - \mu_Y)^2] \\
&= E[(X - \mu_X)^2 + (Y - \mu_Y)^2 + 2(X - \mu_X)(Y - \mu_Y)] \\
&= \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y).
\end{aligned}
\]

For all constants $a$ and $b$, it can be shown that $\operatorname{Cov}(aX, bY) = ab\operatorname{Cov}(X, Y)$ (see Exercise 5 at the end of this section). The following then follows easily from Theorem 4.6.6.

Corollary 4.6.1 Let $a$, $b$, and $c$ be constants. Under the conditions of Theorem 4.6.6,
\[
\operatorname{Var}(aX + bY + c) = a^2\operatorname{Var}(X) + b^2\operatorname{Var}(Y) + 2ab\operatorname{Cov}(X, Y). \tag{4.6.12}
\]

A particularly useful special case of Corollary 4.6.1 is
\[
\operatorname{Var}(X - Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) - 2\operatorname{Cov}(X, Y). \tag{4.6.13}
\]

Example 4.6.6 Investment Portfolio. Consider, once again, the investor in Example 4.3.7 on page 230 trying to choose a portfolio with $100,000 to invest. We shall make the same assumptions about the returns on the two stocks, except that now we will suppose that the correlation between the two returns $R_1$ and $R_2$ is $-0.3$, reflecting a belief that the two stocks tend to react in opposite ways to common market forces. By Corollary 4.6.1, with $\operatorname{Cov}(R_1, R_2) = -0.3\sqrt{55 \times 28}$, the variance of a portfolio of $s_1$ shares of the first stock, $s_2$ shares of the second stock, and $s_3$ dollars invested at 3.6% is now
\[
\operatorname{Var}(s_1 R_1 + s_2 R_2 + 0.036 s_3) = 55 s_1^2 + 28 s_2^2 - 2(0.3)\sqrt{55 \times 28}\, s_1 s_2.
\]
We continue to assume that (4.3.2) holds. Figure 4.11 shows the relationship between the mean and variance of the efficient portfolios in this example and Example 4.3.7. Notice how the variances are smaller in this example than in Example 4.3.7. This is due to the fact that the negative correlation lowers the variance of a linear combination with positive coefficients. ◀
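The effect of the negative correlation in Example 4.6.6 can be seen numerically by applying Corollary 4.6.1 to a single portfolio. The sketch below takes the variances 55 and 28 of the two returns from the example; the share counts $s_1 = 500$ and $s_2 = 800$ are hypothetical values chosen only for illustration, and `portfolio_variance` is a helper name introduced here. The $s_3$ dollars invested at the fixed rate play the role of the constant $c$ in Eq. (4.6.12) and therefore contribute nothing to the variance.

```python
from math import sqrt

VAR_R1, VAR_R2 = 55.0, 28.0   # variances of the returns R1 and R2 (from the example)


def portfolio_variance(s1, s2, rho):
    """Var(s1*R1 + s2*R2 + c) by Eq. (4.6.12), for correlation rho."""
    cov = rho * sqrt(VAR_R1 * VAR_R2)   # Cov(R1, R2), from Eq. (4.6.3)
    return s1 ** 2 * VAR_R1 + s2 ** 2 * VAR_R2 + 2 * s1 * s2 * cov


s1, s2 = 500, 800   # hypothetical numbers of shares
var_uncorrelated = portfolio_variance(s1, s2, 0.0)
var_negative = portfolio_variance(s1, s2, -0.3)

print(var_uncorrelated, var_negative)
print(var_negative < var_uncorrelated)  # True: the negative correlation lowers the variance
```

With positive coefficients $s_1$ and $s_2$, the cross term $2 s_1 s_2 \operatorname{Cov}(R_1, R_2)$ is negative whenever the correlation is negative, which is why the second variance is the smaller one.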
< O Teorema 4.6.6 também pode ser facilmente estendido para a variancia da soma denvariaveis Theorem 4.6.6 can also be extended easily to the variance of the sum of n random aleatérias, como segue. variables, as follows. 254 Capitulo 4 Expectativa 254 Chapter 4 Expectation Figura 4.11Média e variancia Figure 4.11 Mean and vari- de carteiras de investimentos 1,5 108 ance of efficient investment 1.5x108 eficientes. ; — Correlagao - -0,3 portfolios. eg — Correlation = —0.3 g res Correlagdo - 0 Z see Correlation = 0 s 10 ; 2 108 j & 5107 — & 5x10? — 0 4000 5.000 6.000 7.000 8.000 9.000 10.000 0 4000 5000 6000 7000 8000 9000 =: 10,000 Retorno médio do portfélio Mean portfolio return Teorema SeX,..., XnSdo variaveis aleatdrias tais que Var(Xeu) <oparaeu=1,..., n, entao Theorem If X,,..., X, are random variables such that Var(X;) < co fori =1,...,7n, then 4.6.7 ( ) 4.6.7 ” »? »> n n Var Xeu = Var (Xeu}+2 Cov(Xeu, Xj). (4.6.14) Var (x x) = )¢ Var(X;) +2 5° S° Cov(X;, X;). (4.6.14) eu=1 eu=1 euy i=1 i=1 i<j ProvaPara cada variavel aleatériaS,Cov(S, S-Var(S). Portanto, utilizando o Proof For every random variable Y, Cov(Y, Y) = Var(Y). Therefore, by using the resultado do Exercicio 8 no final desta se¢do, podemos obter a seguinte rela¢do: result in Exercise 8 at the end of this section, we can obtain the following relation: ( y” ) ? ? ym. n n n n n Var Xeu =Covl Xeu,X) j= Cov(Xeu, Xj). Var (x x) = Cov > X,, > xX,|= - > Cov(X;, Xj). eu=1 eu=1_— 1 eu=1/1 i=1 i=1 j=l i=1 j=1 Separaremos a soma final nesta relagdo em duas somas: (i) a Soma daqueles termos We shall separate the final sum in this relation into two sums: (i) the sum of those para os quaiseu-e (ii) asoma dos termos para os quaiseu=/.Entdo, se usarmos 0 fato terms for which i = j and (ii) the sum of those terms for which i 4 j. Then, if we use de que Cov(Xeu, Xj=Cov(Xj, Xeu), obtemos a relagdo the fact that Cov(X;, X ;) = Cov(X ;, X;), we obtain the relation ( ? ) ? 
\[ \operatorname{Var}\Bigl(\sum_{i=1}^{n} X_i\Bigr) = \sum_{i=1}^{n} \operatorname{Var}(X_i) + \sum\sum_{i \neq j} \operatorname{Cov}(X_i, X_j) = \sum_{i=1}^{n} \operatorname{Var}(X_i) + 2 \sum\sum_{i<j} \operatorname{Cov}(X_i, X_j). \]

The following is a simple corollary to Theorem 4.6.7.

Corollary 4.6.2 If \(X_1, \ldots, X_n\) are uncorrelated random variables (that is, if \(X_i\) and \(X_j\) are uncorrelated whenever \(i \neq j\)), then

\[ \operatorname{Var}\Bigl(\sum_{i=1}^{n} X_i\Bigr) = \sum_{i=1}^{n} \operatorname{Var}(X_i). \tag{4.6.15} \]

Corollary 4.6.2 extends Theorem 4.3.5 on page 230, which states that (4.6.15) holds if \(X_1, \ldots, X_n\) are independent random variables.

Note: In General, Variances Add Only for Uncorrelated Random Variables. The variance of a sum of random variables should be calculated using Theorem 4.6.7 in general. Corollary 4.6.2 applies only for uncorrelated random variables.

Summary

The covariance of X and Y is \(\operatorname{Cov}(X, Y) = E\{[X - E(X)][Y - E(Y)]\}\). The correlation is \(\rho(X, Y) = \operatorname{Cov}(X, Y)/[\operatorname{Var}(X)\operatorname{Var}(Y)]^{1/2}\), and it measures the extent to which X and Y are linearly related.
Indeed, X and Y are precisely linearly related if and only if \(|\rho(X, Y)| = 1\). The variance of a sum of random variables can be expressed as the sum of the variances plus two times the sum of the covariances. The variance of a linear function is \(\operatorname{Var}(aX + bY + c) = a^2 \operatorname{Var}(X) + b^2 \operatorname{Var}(Y) + 2ab \operatorname{Cov}(X, Y)\).

Exercises

1. Suppose that the pair (X, Y) is uniformly distributed on the interior of a circle of radius 1. Compute \(\rho(X, Y)\).

2. Prove that if \(\operatorname{Var}(X) < \infty\) and \(\operatorname{Var}(Y) < \infty\), then \(\operatorname{Cov}(X, Y)\) is finite. Hint: By considering the relation \([(X - \mu_X) \pm (Y - \mu_Y)]^2 \geq 0\), show that

\[ |(X - \mu_X)(Y - \mu_Y)| \leq \tfrac{1}{2}\bigl[(X - \mu_X)^2 + (Y - \mu_Y)^2\bigr]. \]

3. Suppose that X has the uniform distribution on the interval [-2, 2] and \(Y = X^6\). Show that X and Y are uncorrelated.

4. Suppose that the distribution of a random variable X is symmetric with respect to the point x = 0, \(0 < E(X^4) < \infty\), and \(Y = X^2\). Show that X and Y are uncorrelated.

5. For all random variables X and Y and all constants a, b, c, and d, show that

\[ \operatorname{Cov}(aX + b, cY + d) = ac \operatorname{Cov}(X, Y). \]

6. Let X and Y be random variables such that \(0 < \sigma_X^2 < \infty\) and \(0 < \sigma_Y^2 < \infty\). Suppose that U = aX + b and V = cY + d, where \(a \neq 0\) and \(c \neq 0\). Show that \(\rho(U, V) = \rho(X, Y)\) if ac > 0, and \(\rho(U, V) = -\rho(X, Y)\) if ac < 0.

7. Let X, Y, and Z be three random variables such that Cov(X, Z) and Cov(Y, Z) exist, and let a, b, and c be arbitrary given constants. Show that

\[ \operatorname{Cov}(aX + bY + c, Z) = a \operatorname{Cov}(X, Z) + b \operatorname{Cov}(Y, Z). \]

8. Suppose that \(X_1, \ldots, X_m\) and \(Y_1, \ldots, Y_n\) are random variables such that \(\operatorname{Cov}(X_i, Y_j)\) exists for \(i = 1, \ldots, m\) and \(j = 1, \ldots, n\), and suppose that \(a_1, \ldots, a_m\) and \(b_1, \ldots, b_n\) are constants. Show that

\[ \operatorname{Cov}\Bigl(\sum_{i=1}^{m} a_i X_i, \sum_{j=1}^{n} b_j Y_j\Bigr) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_i b_j \operatorname{Cov}(X_i, Y_j). \]

9. Suppose that X and Y are two random variables, which may be dependent, and Var(X) = Var(Y). Assuming that \(0 < \operatorname{Var}(X + Y) < \infty\) and \(0 < \operatorname{Var}(X - Y) < \infty\), show that the random variables X + Y and X - Y are uncorrelated.

10. Suppose that X and Y are negatively correlated. Is Var(X + Y) larger or smaller than Var(X - Y)?

11. Show that two random variables X and Y cannot possibly have the following properties: E(X) = 3, E(Y) = 2, \(E(X^2) = 10\), \(E(Y^2) = 29\), and E(XY) = 0.

12. Suppose that X and Y have a continuous joint distribution for which the joint p.d.f. is as follows:

\[ f(x, y) = \begin{cases} \tfrac{1}{3}(x + y) & \text{for } 0 \leq x \leq 1 \text{ and } 0 \leq y \leq 2, \\ 0 & \text{otherwise.} \end{cases} \]

Determine the value of Var(2X - 3Y + 8).

13. Suppose that X and Y are random variables such that Var(X) = 9, Var(Y) = 4, and \(\rho(X, Y) = -1/6\). Determine (a) Var(X + Y) and (b) Var(X - 3Y + 4).

14. Suppose that X, Y, and Z are three random variables such that Var(X) = 1, Var(Y) = 4, Var(Z) = 8, Cov(X, Y) = 1, Cov(X, Z) = -1, and Cov(Y, Z) = 2. Determine (a) Var(X + Y + Z) and (b) Var(3X - Y - 2Z + 1).

15. Suppose that \(X_1, \ldots, X_n\) are random variables such that the variance of each variable is 1 and the correlation between each pair of different variables is 1/4. Determine \(\operatorname{Var}(X_1 + \cdots + X_n)\).

16. Consider the investor in Example 4.2.3 on page 220. Suppose that the returns \(R_1\) and \(R_2\) on the two stocks have correlation -1. A portfolio will consist of \(s_1\) shares of the first stock and \(s_2\) shares of the second stock where \(s_1, s_2 \geq 0\). Find a portfolio such that the total cost of the portfolio is $6000 and the variance of the return is 0. Why is this situation unrealistic?

17. Let X and Y be random variables with finite variance. Prove that \(|\rho(X, Y)| = 1\) implies that there exist constants a, b, and c such that aX + bY = c with probability 1. Hint: Use Theorem 4.6.2 with \(U = X - \mu_X\) and \(V = Y - \mu_Y\).

18. Let X and Y have a continuous distribution with joint p.d.f.

\[ f(x, y) = \begin{cases} x + y & \text{for } 0 \leq x \leq 1 \text{ and } 0 \leq y \leq 1, \\ 0 & \text{otherwise.} \end{cases} \]

Compute the covariance Cov(X, Y).

4.7 Conditional Expectation

Since expectations (including variances and covariances) are properties of distributions, there will exist conditional versions of all such distributional summaries as well as conditional versions of all theorems that we have proven or will later prove about expectations. In particular, suppose that we wish to predict one random variable Y using a function d(X) of another random variable X so as to minimize \(E([Y - d(X)]^2)\). Then d(X) should be the conditional mean of Y given X. There is also a very useful theorem that is an extension to expectations of the law of total probability.

Definition and Basic Properties

Example 4.7.1 Household Survey. A collection of households was surveyed, and each household reported the number of members and the number of automobiles owned. The reported numbers are in Table 4.1.

Suppose that we were to sample a household at random from those households in the survey and learn the number of members. What would then be the expected number of automobiles that they own?

The question at the end of Example 4.7.1 is closely related to the conditional distribution of one random variable given the other, as defined in Sec. 3.6.

Definition 4.7.1 Conditional Expectation/Mean. Let X and Y be random variables such that the mean of Y exists and is finite.
The conditional expectation (or conditional mean) of Y given X = x is denoted by E(Y|x) and is defined to be the expectation of the conditional distribution of Y given X = x.

For example, if Y has a continuous conditional distribution given X = x with conditional p.d.f. \(g_2(y|x)\), then

\[ E(Y|x) = \int_{-\infty}^{\infty} y\, g_2(y|x)\, dy. \tag{4.7.1} \]

Similarly, if Y has a discrete conditional distribution given X = x with conditional p.f. \(g_2(y|x)\), then

\[ E(Y|x) = \sum_{\text{All } y} y\, g_2(y|x). \tag{4.7.2} \]

Table 4.1 Reported numbers of household members and automobiles in Example 4.7.1

Number of              Number of members
automobiles      1    2    3    4    5    6    7    8
0               10    7    3    2    2    1    0    0
1               12   21   25   30   25   15    5    1
2                1    5   10   15   20   11    5    3
3                0    2    3    5    5    3    2    1

The value of E(Y|x) will not be uniquely defined for those values of x such that the marginal p.f. or p.d.f. of X satisfies \(f_1(x) = 0\). However, since these values of x form a set of points whose probability is 0, the definition of E(Y|x) at such a point is irrelevant. (See Exercise 11 in Sec. 3.6.) It is also possible that there will be some values of x such that the mean of the conditional distribution of Y given X = x is undefined. When the mean of Y exists and is finite, the set of x values for which the conditional mean is undefined has probability 0.

The expressions in Eqs. (4.7.1) and (4.7.2) are functions of x.
These functions of x can be computed before X is observed, and this idea leads to the following useful concept.

Definition 4.7.2 Conditional Means as Random Variables. Let h(x) stand for the function of x that is denoted E(Y|x) in either (4.7.1) or (4.7.2). Define the symbol E(Y|X) to mean h(X) and call it the conditional mean of Y given X. In other words, E(Y|X) is a random variable (a function of X) whose value when X = x is E(Y|x). Obviously, we could define E(X|Y) and E(X|y) analogously.

Example 4.7.2 Household Survey. Consider the household survey in Example 4.7.1. Let X be the number of members in a randomly selected household from the survey, and let Y be the number of cars owned by that household. The 250 surveyed households are all equally likely to be selected, so Pr(X = x, Y = y) is the number of households with x members and y cars, divided by 250. Those probabilities are reported in Table 4.2. Suppose that the sampled household has X = 4 members. The conditional p.f. of Y given X = 4 is \(g_2(y|4) = f(4, y)/f_1(4)\), which is the x = 4 column of Table 4.2 divided by \(f_1(4) = 0.208\), namely,

\[ g_2(0|4) = 0.0385, \quad g_2(1|4) = 0.5769, \quad g_2(2|4) = 0.2885, \quad g_2(3|4) = 0.0962. \]

The conditional mean of Y given X = 4 is then

\[ E(Y|4) = 0 \times 0.0385 + 1 \times 0.5769 + 2 \times 0.2885 + 3 \times 0.0962 = 1.442. \]

Similarly, we can compute E(Y|x) for all eight values of x. They are

x          1      2      3      4      5      6      7      8
E(Y|x)     0.609  1.057  1.317  1.442  1.538  1.533  1.75   2

Table 4.2 Joint p.f. f(x, y) of X and Y in Example 4.7.2 together with marginal p.f.'s f1(x) and f2(y)

                                   x
y          1      2      3      4      5      6      7      8      f2(y)
0          0.040  0.028  0.012  0.008  0.008  0.004  0      0      0.100
1          0.048  0.084  0.100  0.120  0.100  0.060  0.020  0.004  0.536
2          0.004  0.020  0.040  0.060  0.080  0.044  0.020  0.012  0.280
3          0      0.008  0.012  0.020  0.020  0.012  0.008  0.004  0.084
f1(x)      0.092  0.140  0.164  0.208  0.208  0.120  0.048  0.020
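The conditional means listed in Example 4.7.2 can be reproduced from the household counts of Table 4.1. The following is a minimal sketch using exact fractions; the counts are those reported in Table 4.1, while the names counts, marginal_x, conditional_pf and conditional_mean are illustrative choices rather than notation from the text.

```python
from fractions import Fraction

# counts[y][x - 1] = number of households with x members and y automobiles
# (rows y = 0..3, columns x = 1..8, copied from Table 4.1).
counts = [
    [10, 7, 3, 2, 2, 1, 0, 0],
    [12, 21, 25, 30, 25, 15, 5, 1],
    [1, 5, 10, 15, 20, 11, 5, 3],
    [0, 2, 3, 5, 5, 3, 2, 1],
]
total = sum(sum(row) for row in counts)  # number of surveyed households


def marginal_x(x):
    """Marginal p.f. f1(x) = Pr(X = x)."""
    return Fraction(sum(row[x - 1] for row in counts), total)


def conditional_pf(x):
    """Conditional p.f. g2(y|x) for y = 0, 1, 2, 3, returned as a list."""
    column_total = sum(row[x - 1] for row in counts)
    return [Fraction(row[x - 1], column_total) for row in counts]


def conditional_mean(x):
    """E(Y|x) as in Eq. (4.7.2): the sum over y of y * g2(y|x)."""
    return sum(y * g for y, g in enumerate(conditional_pf(x)))


for x in range(1, 9):
    print(x, float(marginal_x(x)), round(float(conditional_mean(x)), 3))
```

Working with the raw counts instead of the rounded entries of Table 4.2 avoids rounding error; for instance, the conditional mean at x = 4 comes out as the exact fraction 75/52.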
The random variable that takes the value 0.609 when the sampled household has one member, takes the value 1.057 when the sampled household has two members, and so on, is the random variable E(Y|X).

Example 4.7.3 A Clinical Trial. Consider a clinical trial in which a number of patients will be treated and each patient will have one of two possible outcomes: success or failure. Let P be the proportion of successes in a very large collection of patients, and let \(X_i = 1\) if the ith patient is a success and \(X_i = 0\) if not. Assume that the random variables \(X_1, X_2, \ldots\) are conditionally independent given P = p with \(\Pr(X_i = 1 | P = p) = p\). Let \(X = X_1 + \cdots + X_n\), which is the number of patients out of the first n who are successes. We now compute the conditional mean of X given P. The patients are independent and identically distributed conditional on P = p. Hence, the conditional distribution of X given P = p is the binomial distribution with parameters n and p. As we saw in Sec. 4.2, the mean of this binomial distribution is np, so E(X|p) = np and E(X|P) = nP. Later, we will show how to compute the conditional mean of P given X. This can be used to predict P after observing X.

Note: The Conditional Mean of Y Given X Is a Random Variable. Because E(Y|X) is a function of the random variable X, it is itself a random variable with its own probability distribution, which can be derived from the distribution of X. On the other hand, h(x) = E(Y|x) is a function of x that can be manipulated like any other function. The connection between the two is that when one substitutes the random variable X for x in h(x), the result is h(X) = E(Y|X).

We shall now show that the mean of the random variable E(Y|X) must be E(Y).
A similar calculation shows that the mean of E(X|Y) must be E(X).

Theorem 4.7.1 Law of Total Probability for Expectations. Let X and Y be random variables such that Y has finite mean. Then

\[ E[E(Y|X)] = E(Y). \tag{4.7.3} \]

Proof We shall assume, for convenience, that X and Y have a continuous joint distribution. Then

\[ E[E(Y|X)] = \int_{-\infty}^{\infty} E(Y|x) f_1(x)\, dx = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y\, g_2(y|x) f_1(x)\, dy\, dx. \]

Since \(g_2(y|x) = f(x, y)/f_1(x)\), it follows that

\[ E[E(Y|X)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y f(x, y)\, dy\, dx = E(Y). \]

The proof for a discrete distribution or a more general type of distribution is similar.

Example 4.7.4 Household Survey. At the end of Example 4.7.2, we described the random variable E(Y|X). Its distribution can be constructed from that description. It has a discrete distribution that takes the eight values of E(Y|x) listed near the end of that example with corresponding probabilities \(f_1(x)\) for \(x = 1, \ldots, 8\). To be specific, let Z = E(Y|X); then \(\Pr[Z = E(Y|x)] = f_1(x)\) for \(x = 1, \ldots, 8\).
The specific values are

z            0.609  1.057  1.317  1.442  1.538  1.533  1.75   2
Pr(Z = z)    0.092  0.140  0.164  0.208  0.208  0.120  0.048  0.020

We can compute \(E(Z) = 0.609 \times 0.092 + \cdots + 2 \times 0.020 = 1.348\). The reader can verify that E(Y) = 1.348 by using the values of \(f_2(y)\) in Table 4.2.

Example 4.7.5 A Clinical Trial. In Example 4.7.3, we let X be the number of patients out of the first n who are successes. The conditional mean of X given P = p was computed as E(X|p) = np, where P is the proportion of successes in a large population of patients. If the distribution of P is uniform on the interval [0, 1], then the marginal expected value of X is E[E(X|P)] = E(nP) = n/2. We will see how to calculate E(P|X) in Example 4.7.8.

Example 4.7.6 Choosing Points from Uniform Distributions. Suppose that a point X is chosen in accordance with the uniform distribution on the interval [0, 1]. Also, suppose that after the value X = x has been observed (0 < x < 1), a point Y is chosen in accordance with a uniform distribution on the interval [x, 1]. We shall determine the value of E(Y).

For each given value of x (0 < x < 1), E(Y|x) will be equal to the midpoint (1/2)(x + 1) of the interval [x, 1]. Therefore, E(Y|X) = (1/2)(X + 1) and

\[ E(Y) = E[E(Y|X)] = \frac{1}{2}[E(X) + 1] = \frac{1}{2}\Bigl(\frac{1}{2} + 1\Bigr) = \frac{3}{4}. \]

When manipulating the conditional distribution given X = x, it is safe to act as if X is the constant x. This fact, which can simplify the calculation of certain conditional means, is now stated without proof.

Theorem 4.7.2 Let X and Y be random variables, and let Z = r(X, Y) for some function r. The conditional distribution of Z given X = x is the same as the conditional distribution of r(x, Y) given X = x.

One consequence of Theorem 4.7.2 when X and Y have a continuous joint distribution is that

\[ E(Z|x) = E(r(x, Y)|x) = \int_{-\infty}^{\infty} r(x, y)\, g_2(y|x)\, dy. \]

Theorem 4.7.1 also implies that for two arbitrary random variables X and Y,

\[ E\{E[r(X, Y)|X]\} = E[r(X, Y)], \tag{4.7.4} \]

by letting Z = r(X, Y) and noting that E{E(Z|X)} = E(Z).
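Returning to Example 4.7.5, the value E[E(X|P)] = n/2 can be cross-checked by building the marginal distribution of X directly and taking its mean. The sketch below assumes the setup of that example (X given P = p is binomial with parameters n and p, and P is uniform on [0, 1]) and relies on the standard integral of p^k (1 - p)^(n - k) over [0, 1], which equals k!(n - k)!/(n + 1)!. The function names and the choice n = 10 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, factorial


def marginal_pf_of_x(n):
    """Pr(X = k) for k = 0..n when X | P = p is binomial(n, p), P uniform on [0, 1]."""
    pf = []
    for k in range(n + 1):
        # Integral over [0, 1] of p^k (1 - p)^(n - k) dp = k! (n - k)! / (n + 1)!
        integral = Fraction(factorial(k) * factorial(n - k), factorial(n + 1))
        pf.append(comb(n, k) * integral)
    return pf


def mean_of_x(n):
    """E(X) computed from the marginal p.f. of X."""
    return sum(k * prob for k, prob in enumerate(marginal_pf_of_x(n)))


n = 10  # hypothetical number of patients
print("E(X) from the marginal p.f.:", mean_of_x(n))
print("n / 2 from Theorem 4.7.1:   ", Fraction(n, 2))
```

The two printed values agree, which is what Theorem 4.7.1 predicts; the direct route needs the whole marginal distribution of X, whereas the conditional route needs only E(P).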
We can define, in a similar manner, the conditional expectation of r(X, Y) given Y and the conditional expectation of a function \(r(X_1, \ldots, X_n)\) of several random variables given one or more of the variables \(X_1, \ldots, X_n\).

Example 4.7.7 Linear Conditional Expectation. Suppose that E(Y|X) = aX + b for some constants a and b. We shall determine the value of E(XY) in terms of E(X) and \(E(X^2)\).

By Eq. (4.7.4), E(XY) = E[E(XY|X)]. Furthermore, since X is considered to be given and fixed in the conditional expectation,

\[ E(XY|X) = X E(Y|X) = X(aX + b) = aX^2 + bX. \]

Therefore,

\[ E(XY) = E(aX^2 + bX) = aE(X^2) + bE(X). \]

The mean is not the only feature of a conditional distribution that is important enough to get its own name.

Definition 4.7.3 Conditional Variance. For every given value x, let Var(Y|x) denote the variance of the conditional distribution of Y given that X = x. That is,

\[ \operatorname{Var}(Y|x) = E\{[Y - E(Y|x)]^2 \,|\, x\}. \tag{4.7.5} \]

We call Var(Y|x) the conditional variance of Y given X = x.
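As an illustration of Definition 4.7.3 (a worked case added here, not taken from the text), consider the setting of Example 4.7.6, where the conditional distribution of Y given X = x is uniform on [x, 1] for 0 < x < 1. Expanding the square in Eq. (4.7.5) gives the usual shortcut, variance equals second moment minus squared mean, applied to the conditional distribution:

```latex
\begin{align*}
E(Y \mid x) &= \frac{x + 1}{2}, \qquad
E(Y^2 \mid x) = \int_x^1 \frac{y^2}{1 - x}\, dy = \frac{1 - x^3}{3(1 - x)} = \frac{1 + x + x^2}{3}, \\
\operatorname{Var}(Y \mid x) &= E(Y^2 \mid x) - [E(Y \mid x)]^2
  = \frac{1 + x + x^2}{3} - \frac{(x + 1)^2}{4}
  = \frac{(1 - x)^2}{12}.
\end{align*}
```

This agrees with the familiar variance of a uniform distribution on an interval of length 1 - x, and it shrinks to 0 as x approaches 1, where the interval [x, 1] collapses to a point.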
The expression in Eq. (4.7.5) is once again a function v(x). We shall define Var(Y|X) to be v(X) and call it the conditional variance of Y given X.

Note: Other Conditional Quantities. In much the same way as in Definitions 4.7.1 and 4.7.3, we could define any conditional summary of a distribution that we wish. For example, conditional quantiles of Y given X = x are the quantiles of the conditional distribution of Y given X = x. The conditional m.g.f. of Y given X = x is the m.g.f. of the conditional distribution of Y given X = x, etc.

Prediction

At the end of Example 4.7.3, we considered the problem of predicting the proportion P of successes in a large population of patients given the observed number X of successes in a sample of size n. In general, consider two arbitrary random variables X and Y that have a specified joint distribution and suppose that after the value of X has been observed, the value of Y must be predicted. In other words, the predicted value of Y can depend on the value of X.
We shall assume that this predicted value d(X) must be chosen so as to minimize the mean squared error \(E\{[Y - d(X)]^2\}\).

Theorem 4.7.3 The prediction d(X) that minimizes \(E\{[Y - d(X)]^2\}\) is d(X) = E(Y|X).

Proof We shall prove the theorem in the case in which X has a continuous distribution, but the proof in the discrete case is virtually identical. Let d(X) = E(Y|X), and let \(d^*(X)\) be an arbitrary predictor. We need only prove that \(E\{[Y - d(X)]^2\} \leq E\{[Y - d^*(X)]^2\}\). It follows from Eq. (4.7.4) that

\[ E\{[Y - d(X)]^2\} = E\bigl(E\{[Y - d(X)]^2 \,|\, X\}\bigr). \tag{4.7.6} \]

A similar equation holds for \(d^*\). Let \(Z = [Y - d(X)]^2\), and let h(x) = E(Z|x). Similarly, let \(Z^* = [Y - d^*(X)]^2\) and \(h^*(x) = E(Z^*|x)\). The right-hand side of (4.7.6) is \(\int h(x) f_1(x)\, dx\), and the corresponding expression using \(d^*\) is \(\int h^*(x) f_1(x)\, dx\). So, the proof will be complete if we can prove that

\[ \int h(x) f_1(x)\, dx \leq \int h^*(x) f_1(x)\, dx. \tag{4.7.7} \]

Clearly, Eq. (4.7.7) holds if we can show that \(h(x) \leq h^*(x)\) for all x. That is, the proof is complete if we can show that \(E\{[Y - d(X)]^2 \,|\, x\} \leq E\{[Y - d^*(X)]^2 \,|\, x\}\).
When we condition on $X = x$, we are allowed to treat $X$ as if it were the constant $x$, so we need to show that $E\{[Y - d(x)]^2 \mid x\} \le E\{[Y - d^*(x)]^2 \mid x\}$. These last expressions are nothing more than the M.S.E.'s for two different predictions $d(x)$ and $d^*(x)$ of $Y$ calculated using the conditional distribution of $Y$ given $X = x$. As discussed in Sec. 4.5, the M.S.E. of such a prediction is smallest if the prediction is the mean of the distribution of $Y$. In this case, that mean is the mean of the conditional distribution of $Y$ given $X = x$. Since $d(x)$ is the mean of the conditional distribution of $Y$ given $X = x$, it must have smaller M.S.E. than every other prediction $d^*(x)$. Hence, $h(x) \le h^*(x)$ for all $x$.

If the value $X = x$ is observed and the value $E(Y \mid x)$ is predicted for $Y$, then the M.S.E. of this predicted value will be $\operatorname{Var}(Y \mid x)$, from Definition 4.7.3. It follows from Eq. (4.7.6) that if the prediction is to be made by using the function $d(X) =$
$E(Y \mid X)$, then the overall M.S.E., averaged over all the possible values of $X$, will be $E[\operatorname{Var}(Y \mid X)]$.

If the value of $Y$ must be predicted without any information about the value of $X$, then, as shown in Sec. 4.5, the best prediction is the mean $E(Y)$ and the M.S.E. is $\operatorname{Var}(Y)$. However, if $X$ can be observed before the prediction is made, the best prediction is $d(X) = E(Y \mid X)$ and the M.S.E. is $E[\operatorname{Var}(Y \mid X)]$. Thus, the reduction in the M.S.E. that can be achieved by using the observation $X$ is
\[
\operatorname{Var}(Y) - E[\operatorname{Var}(Y \mid X)]. \tag{4.7.8}
\]
This reduction provides a measure of the usefulness of $X$ in predicting $Y$. It is shown in Exercise 11 at the end of this section that this reduction can also be expressed as $\operatorname{Var}[E(Y \mid X)]$.

It is important to distinguish carefully between the overall M.S.E., which is $E[\operatorname{Var}(Y \mid X)]$, and the M.S.E. of the particular prediction to be made when $X = x$, which is $\operatorname{Var}(Y \mid x)$. Before the value of $X$ has been observed, the appropriate value for the M.S.E. of the complete process of observing $X$ and then predicting $Y$ is $E[\operatorname{Var}(Y \mid X)]$.
After a particular value $x$ of $X$ has been observed and the prediction $E(Y \mid x)$ has been made, the appropriate measure of the M.S.E. of this prediction is $\operatorname{Var}(Y \mid x)$. A useful relationship between these values is given in the following result, whose proof is left to Exercise 11.

Theorem 4.7.4 Law of Total Probability for Variances. If $X$ and $Y$ are arbitrary random variables for which the necessary expectations and variances exist, then $\operatorname{Var}(Y) = E[\operatorname{Var}(Y \mid X)] + \operatorname{Var}[E(Y \mid X)]$.

Example 4.7.8 A Clinical Trial. In Example 4.7.3, let $X$ be the number of patients out of the first 40 in a clinical trial who have success as their outcome. Let $P$ be the probability that an individual patient is a success. Suppose that $P$ has the uniform distribution on the interval $[0, 1]$ before the trial begins, and suppose that the outcomes of the patients are conditionally independent given $P = p$. As we saw in Example 4.7.3, $X$ has the binomial distribution with parameters 40 and $p$ given $P = p$. If we needed to minimize M.S.E. in predicting $P$ before observing $X$, we would use the mean of $P$,
namely, 1/2. The M.S.E. would be $\operatorname{Var}(P) = 1/12$. However, we shall soon observe the value of $X$ and then predict $P$. To do this, we shall need the conditional distribution of $P$ given $X = x$. Bayes' theorem for random variables (3.6.13) tells us that the conditional p.d.f. of $P$ given $X = x$ is
\[
g_2(p \mid x) = \frac{g_1(x \mid p)\, f_2(p)}{f_1(x)}, \tag{4.7.9}
\]
where $g_1(x \mid p)$ is the conditional p.f. of $X$ given $P = p$, namely, the binomial p.f. $g_1(x \mid p) = \binom{40}{x} p^x (1 - p)^{40 - x}$ for $x = 0, \ldots, 40$, $f_2(p) = 1$ for $0 < p < 1$ is the marginal p.d.f. of $P$, and $f_1(x)$ is the marginal p.f. of $X$ obtained from the law of total probability for random variables (3.6.12):
\[
f_1(x) = \int_0^1 \binom{40}{x} p^x (1 - p)^{40 - x}\,dp. \tag{4.7.10}
\]

Figure 4.12 The conditional p.d.f. of $P$ given $X = 18$ in Example 4.7.8. The marginal p.d.f. of $P$ (prior to observing $X$) is also shown. [Plot: density against $p$ on $[0, 1]$, with curves labeled Marginal and Conditional.]

This last integral looks difficult to compute. However, there is a simple formula for integrals of this form, namely,
\[
\int_0^1 p^k (1 - p)^{\ell}\,dp = \frac{k!\,\ell!}{(k + \ell + 1)!}. \tag{4.7.11}
\]
A proof of Eq. (4.7.11) is given in Sec. 5.8. Substituting (4.7.11) into (4.7.10) yields
\[
f_1(x) = \frac{40!}{x!\,(40 - x)!} \cdot \frac{x!\,(40 - x)!}{41!} = \frac{1}{41},
\]
for $x = 0, \ldots, 40$. Substituting this into Eq. (4.7.9) yields
\[
g_2(p \mid x) = \frac{41!}{x!\,(40 - x)!}\, p^x (1 - p)^{40 - x}, \quad \text{for } 0 < p < 1.
\]
For example, with $x = 18$, the observed number of successes in Table 2.1, a graph of $g_2(p \mid 18)$ is shown in Fig. 4.12.

If we want to minimize the M.S.E. when predicting $P$, we should use $E(P \mid x)$, the conditional mean. We can compute $E(P \mid x)$ using the conditional p.d.f. and Eq. (4.7.11):
\[
\begin{aligned}
E(P \mid x) &= \int_0^1 p\, \frac{41!}{x!\,(40 - x)!}\, p^x (1 - p)^{40 - x}\,dp \\
&= \frac{41!}{x!\,(40 - x)!} \cdot \frac{(x + 1)!\,(40 - x)!}{42!} = \frac{x + 1}{42}.
\end{aligned} \tag{4.7.12}
\]
So, after $X = x$ is observed, we will predict $P$ to be $(x + 1)/42$, which is very close to the proportion of the first 40 patients who are successes. The M.S.E. after observing $X = x$ is the conditional variance $\operatorname{Var}(P \mid x)$. We can compute this using (4.7.12) and
\[
\begin{aligned}
E(P^2 \mid x) &= \int_0^1 p^2\, \frac{41!}{x!\,(40 - x)!}\, p^x (1 - p)^{40 - x}\,dp \\
&= \frac{41!}{x!\,(40 - x)!} \cdot \frac{(x + 2)!\,(40 - x)!}{43!} = \frac{(x + 1)(x + 2)}{42 \times 43}.
\end{aligned}
\]
Using the fact that $\operatorname{Var}(P \mid x) = E(P^2 \mid x) - [E(P \mid x)]^2$, we see that
\[
\operatorname{Var}(P \mid x) = \frac{(x + 1)(41 - x)}{42^2 \times 43}.
\]
The overall M.S.E. of predicting $P$ from $X$ is the mean of the conditional M.S.E.
\[
\begin{aligned}
E[\operatorname{Var}(P \mid X)] &= E\left(\frac{(X + 1)(41 - X)}{42^2 \times 43}\right) = \frac{1}{75{,}852}\, E(-X^2 + 40X + 41) \\
&= \frac{1}{75{,}852} \left(-\frac{1}{41} \sum_{x=0}^{40} x^2 + \frac{40}{41} \sum_{x=0}^{40} x + 41\right) \\
&= \frac{1}{75{,}852} \left(-\frac{1}{41} \cdot \frac{40 \times 41 \times 81}{6} + \frac{40}{41} \cdot \frac{40 \times 41}{2} + 41\right) \\
&= \frac{301}{75{,}852} = 0.003968.
\end{aligned}
\]
In this calculation, we used two popular formulas,
\[
\sum_{k=0}^{n} k = \frac{n(n + 1)}{2}, \tag{4.7.13}
\]
\[
\sum_{k=0}^{n} k^2 = \frac{n(n + 1)(2n + 1)}{6}. \tag{4.7.14}
\]
The overall M.S.E. is quite a bit smaller than the value $1/12 = 0.08333$, which we would have obtained before observing $X$. As an illustration, Fig. 4.12 shows how much more spread out the marginal distribution of $P$ is compared to the conditional distribution of $P$ after observing $X = 18$. ◀

It should be emphasized that for the conditions of Example 4.7.8, 0.003968 is the appropriate value of the overall M.S.E. when it is known that the value of $X$ will be
available for predicting $P$ but before the explicit value of $X$ has been determined. After the value of $X = x$ has been determined, the appropriate value of the M.S.E. is $\operatorname{Var}(P \mid x) = \frac{(x + 1)(41 - x)}{75{,}852}$. Notice that the largest possible value of $\operatorname{Var}(P \mid x)$ is 0.005814 when $x = 20$ and is still much less than 1/12.

A result similar to Theorem 4.7.3 holds if we are trying to minimize the M.A.E. (mean absolute error) of our prediction rather than the M.S.E. In Exercise 16, you can prove that the predictor that minimizes M.A.E. is $d(X)$ equal to the median of the conditional distribution of $Y$ given $X$.

Summary

The conditional mean $E(Y \mid x)$ of $Y$ given $X = x$ is the mean of the conditional distribution of $Y$ given $X = x$. This conditional distribution was defined in Chapter 3. Likewise, the conditional variance $\operatorname{Var}(Y \mid x)$ of $Y$ given $X = x$ is the variance of the conditional distribution. The law of total probability for expectations says that $E[E(Y \mid X)] = E(Y)$. If we will observe $X$ and then need to predict $Y$, the predictor that leads to the smallest M.S.E. is the conditional mean $E(Y \mid X)$.
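The numbers quoted for Example 4.7.8 can be checked with exact rational arithmetic. The sketch below is an illustration only and is not part of the text; the function names (`cond_mean`, `cond_var`, `overall_mse`, `var_of_cond_mean`) are hypothetical. It takes the results derived in the example as given: the marginal p.f. of $X$ is 1/41 on $0, \ldots, 40$, $E(P \mid x) = (x + 1)/42$, and $\operatorname{Var}(P \mid x) = (x + 1)(41 - x)/(42^2 \times 43)$.

```python
from fractions import Fraction

N = 40  # number of patients observed in Example 4.7.8


def cond_mean(x):
    """E(P | x) = (x + 1) / 42, as in Eq. (4.7.12)."""
    return Fraction(x + 1, N + 2)


def cond_var(x):
    """Var(P | x) = (x + 1)(41 - x) / (42^2 * 43)."""
    return Fraction((x + 1) * (N + 1 - x), (N + 2) ** 2 * (N + 3))


def overall_mse():
    """E[Var(P | X)], averaging over the uniform marginal p.f. 1/41 of X."""
    return sum(cond_var(x) for x in range(N + 1)) / (N + 1)


def var_of_cond_mean():
    """Var[E(P | X)] under the same uniform marginal p.f. of X."""
    mean = sum(cond_mean(x) for x in range(N + 1)) / (N + 1)
    return sum((cond_mean(x) - mean) ** 2 for x in range(N + 1)) / (N + 1)


if __name__ == "__main__":
    # Overall M.S.E. of predicting P from X (301/75,852 in lowest terms).
    print(overall_mse(), float(overall_mse()))
    # Value of x with the largest conditional variance, and that variance.
    worst = max(range(N + 1), key=cond_var)
    print(worst, float(cond_var(worst)))
    # Theorem 4.7.4: E[Var(P|X)] + Var[E(P|X)] should equal Var(P) = 1/12.
    print(overall_mse() + var_of_cond_mean())
```

Using `Fraction` avoids rounding error, so the decomposition in Theorem 4.7.4 can be compared with $\operatorname{Var}(P) = 1/12$ exactly.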
Exercises

1. Consider again the situation described in Example 4.7.8. Compute the M.S.E. when using $E(P \mid x)$ to predict $P$ after observing $X = 18$. How much smaller is this than the marginal M.S.E. 1/12?

2. Suppose that 20 percent of the students who took a certain test were from school A and that the arithmetic average of their scores on the test was 80. Suppose also that 30 percent of the students were from school B and that the arithmetic average of their scores was 76. Suppose, finally, that the other 50 percent of the students were from school C and that the arithmetic average of their scores was 84. If a student is selected at random from the entire group that took the test, what is the expected value of her score?

3. Suppose that $0 < \operatorname{Var}(X) < \infty$ and $0 < \operatorname{Var}(Y) < \infty$. Show that if $E(X \mid Y)$ is constant for all values of $Y$, then $X$ and $Y$ are uncorrelated.

4. Suppose that the distribution of $X$ is symmetric with respect to the point $x = 0$, that all moments of $X$ exist, and that $E(Y \mid X) = aX + b$, where $a$ and $b$ are given constants. Show that $X^{2m}$ and $Y$ are uncorrelated for $m = 1, 2, \ldots$.

5. Suppose that a point $X_1$ is chosen from the uniform distribution on the interval $[0, 1]$, and that after the value $X_1 = x_1$ is observed, a point $X_2$ is chosen from a uniform distribution on the interval $[x_1, 1]$. Suppose further that additional variables $X_3, X_4, \ldots$ are generated in the same way. In general, for $j = 1, 2, \ldots$, after the value $X_j = x_j$ has been observed, $X_{j+1}$ is chosen from a uniform distribution on the interval $[x_j, 1]$. Find the value of $E(X_n)$.

6. Suppose that the joint distribution of $X$ and $Y$ is the uniform distribution on the circle $x^2 + y^2 < 1$. Find $E(X \mid Y)$.

7. Suppose that $X$ and $Y$ have a continuous joint distribution for which the joint p.d.f. is as follows:
\[
f(x, y) = \begin{cases} x + y & \text{for } 0 \le x \le 1 \text{ and } 0 \le y \le 1, \\ 0 & \text{otherwise.} \end{cases}
\]
Find $E(Y \mid X)$ and $\operatorname{Var}(Y \mid X)$.

8. Consider again the conditions of Exercise 7. (a) If it is observed that $X = 1/2$, what predicted value of $Y$ will have the smallest M.S.E.? (b) What will be the value of this M.S.E.?

9. Consider again the conditions of Exercise 7. If the value of $Y$ is to be predicted from the value of $X$, what will be the minimum value of the overall M.S.E.?

10. Suppose that, for the conditions in Exercises 7 and 9, a person either can pay a cost $c$ for the opportunity of observing the value of $X$ before predicting the value of $Y$ or can simply predict the value of $Y$ without first observing the value of $X$. If the person considers her total loss to be the cost $c$ plus the M.S.E. of her predicted value, what is the maximum value of $c$ that she should be willing to pay?

11. Prove Theorem 4.7.4.

12. Suppose that $X$ and $Y$ are random variables such that $E(Y \mid X) = aX + b$. Assuming that $\operatorname{Cov}(X, Y)$ exists and that $0 < \operatorname{Var}(X) < \infty$, determine expressions for $a$ and $b$ in terms of $E(X)$, $E(Y)$, $\operatorname{Var}(X)$, and $\operatorname{Cov}(X, Y)$.

13. Suppose that a person's score $X$ on a mathematics aptitude test is a number in the interval $(0, 1)$ and that his score $Y$ on a music aptitude test is also a number in the interval $(0, 1)$. Suppose also that in the population of all college students in the United States, the scores $X$ and $Y$ are distributed in accordance with the following joint p.d.f.:
\[
f(x, y) = \begin{cases} \frac{2}{5}(2x + 3y) & \text{for } 0 \le x \le 1 \text{ and } 0 \le y \le 1, \\ 0 & \text{otherwise.} \end{cases}
\]
a. If a college student is selected at random, what predicted value of his score on the music test has the smallest M.S.E.?
b. What predicted value of his score on the mathematics test has the smallest M.A.E.?

14. Consider again the conditions of Exercise 13. Are the scores of college students on the mathematics test and the music test positively correlated, negatively correlated, or uncorrelated?

15. Consider again the conditions of Exercise 13. (a) If a student's score on the mathematics test is 0.8, what predicted value of his score on the music test has the smallest M.S.E.? (b) If a student's score on the music test is 1/3, what predicted value of his score on the mathematics test has the smallest M.A.E.?

16. Define a conditional median of $Y$ given $X = x$ to be any median of the conditional distribution of $Y$ given $X = x$. Suppose that we will get to observe $X$ and then we will need to predict $Y$. Suppose that we wish to choose our prediction $d(X)$ so as to minimize mean absolute error, $E(|Y - d(X)|)$. Prove that $d(x)$ should be chosen to be a conditional median of $Y$ given $X = x$. Hint: You can modify the proof of Theorem 4.7.3 to handle this case.

17. Prove Theorem 4.7.2 for the case in which $X$ and $Y$ have a discrete joint distribution. The key to the proof is to write all of the necessary conditional p.f.'s in terms of the joint p.f. of $X$ and $Y$ and the marginal p.f. of $X$. To facilitate this, for each $x$ and $z$, give a name to the set of $y$ values such that $r(x, y) = z$.

*4.8 Utility

Much of statistical inference consists of choosing between several available actions. Generally, we do not know for certain which choice will be best, because some important random variable has not yet been observed.
For some values of that random variable one choice is best, and for other values some other choice is best. We can try to weigh the costs and benefits of the various choices against the probabilities that the various choices turn out to be best. Utility is one tool for assigning values to the costs and benefits of our choices. The expected value of the utility then balances the costs and benefits according to how likely the uncertain possibilities are.

Utility Functions

Example 4.8.1 Choice of Gambles. Consider two gambles between which a gambler must choose. Each gamble will be expressed as a random variable for which positive values mean a gain to the gambler and negative values mean a loss to the gambler. The numerical values of each random variable tell the number of dollars that the gambler gains or loses. Let $X$ have the p.f.
\[
f(x) = \begin{cases} 0.5 & \text{if } x = 500 \text{ or } x = -350, \\ 0 & \text{otherwise,} \end{cases}
\]
and let $Y$ have the p.f.
\[
g(y) = \begin{cases} 1/3 & \text{if } y = 40,\ y = 50, \text{ or } y = 60, \\ 0 & \text{otherwise.} \end{cases}
\]
It is simple to compute that $E(X) = 75$ and $E(Y) = 50$. How might a gambler choose between these two gambles? Is $X$ better than $Y$ simply because it has higher expected value? ◀

In Example 4.8.1, a gambler who does not desire to risk losing 350 dollars for the chance of winning 500 dollars might prefer $Y$, which yields a certain gain of at least 40 dollars.

The theory of utility was developed during the 1930s and 1940s to describe a person's preference among gambles like those in Example 4.8.1. According to that theory, a person will prefer a gamble $X$ for which the expectation of a certain function $U(X)$ is a maximum, rather than a gamble for which simply the expected gain $E(X)$ is a maximum.

Definition 4.8.1 Utility Function. A person's utility function $U$ is a function that assigns to each possible amount $x$ ($-\infty < x < \infty$) a number $U(x)$ representing the actual worth to the person of gaining the amount $x$.

Example 4.8.2 Choice of Gambles. Suppose that a person's utility function is $U$ and that she must choose between the gambles $X$ and $Y$ in Example 4.8.1. Then
\[
E[U(X)] = \frac{1}{2} U(500) + \frac{1}{2} U(-350) \tag{4.8.1}
\]
and
\[
E[U(Y)] = \frac{1}{3} U(60) + \frac{1}{3} U(50) + \frac{1}{3} U(40). \tag{4.8.2}
\]
The person would prefer the gamble for which the expected utility of the gain, as specified by Eq. (4.8.1) or Eq. (4.8.2), is larger.

Figure 4.13 The utility function for Example 4.8.2. [Plot: $U(x)$ against $x$.]

As a specific example, consider the following utility function that penalizes losses to a much greater extent than it rewards gains:
\[
U(x) = \begin{cases} 100 \log(x + 100) - 461 & \text{if } x \ge 0, \\ x & \text{if } x < 0. \end{cases} \tag{4.8.3}
\]
This function was chosen to be differentiable at $x = 0$, continuous everywhere, increasing, concave for $x > 0$, and linear for $x < 0$. A graph of $U(x)$ is given in Fig. 4.13.
Using this specific $U$, we compute
\[
E[U(X)] = \frac{1}{2}[100 \log(600) - 461] + \frac{1}{2}(-350) = -85.4,
\]
\[
E[U(Y)] = \frac{1}{3}[100 \log(160) - 461] + \frac{1}{3}[100 \log(150) - 461] + \frac{1}{3}[100 \log(140) - 461] = 40.4.
\]
We see that a person with the utility function in Eq. (4.8.3) would prefer $Y$ to $X$. ◀

Here, we formalize the principle that underlies the choice between gambles illustrated in Example 4.8.1.

Definition 4.8.2 Maximizing Expected Utility. We say that a person chooses between gambles by maximizing expected utility if the following conditions hold. There is a utility function $U$, and when the person must choose between any two gambles $X$ and $Y$, he will prefer $X$ to $Y$ if $E[U(X)] > E[U(Y)]$ and will be indifferent between $X$ and $Y$ if $E[U(X)] = E[U(Y)]$.

In words, Definition 4.8.2 says that a person chooses between gambles by maximizing expected utility if he will choose a gamble $X$ for which $E[U(X)]$ is a maximum.
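The expected utilities in Example 4.8.2 can be reproduced numerically. The following is a minimal sketch rather than anything from the text: the names `utility`, `expected_utility`, `gamble_x`, and `gamble_y` are hypothetical, and $\log$ is assumed to be the natural logarithm. The constant 461 in Eq. (4.8.3) appears to be a rounded value of $100 \log(100) \approx 460.517$, so the sketch uses the unrounded constant by default. That choice makes $U(0) = 0$ exactly and gives values that round to the printed $-85.4$ and $40.4$. With the rounded constant 461 the results come out near $-85.65$ and $39.92$.

```python
import math


def utility(x, c=100 * math.log(100)):
    """Utility of Eq. (4.8.3): logarithmic for gains, linear for losses.

    The constant c defaults to 100*log(100) (about 460.517), an assumed
    unrounded version of the 461 printed in the text.
    """
    if x >= 0:
        return 100 * math.log(x + 100) - c
    return x


def expected_utility(pf, u=utility):
    """E[U(X)] for a discrete gamble given as a dict {value: probability}."""
    return sum(prob * u(value) for value, prob in pf.items())


# The two gambles of Example 4.8.1.
gamble_x = {500: 0.5, -350: 0.5}
gamble_y = {40: 1 / 3, 50: 1 / 3, 60: 1 / 3}

if __name__ == "__main__":
    print(round(expected_utility(gamble_x), 1))
    print(round(expected_utility(gamble_y), 1))
    # Same computation with the rounded constant 461 from Eq. (4.8.3).
    rounded = lambda x: utility(x, c=461)
    print(expected_utility(gamble_x, rounded), expected_utility(gamble_y, rounded))
```

Either constant leads to the same conclusion as the example: the expected utility of $Y$ exceeds that of $X$.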
If one adopts a utility function, then one can (at least in principle) make choices between gambles by maximizing expected utility. The computational algorithms necessary to perform the maximization often provide a practical challenge. Conversely, if one makes choices between gambles in such a way that certain reasonable criteria apply, then one can prove that there exists a utility function such that the choices correspond to maximizing expected utility. We shall not consider this latter problem in detail here; however, it is discussed by DeGroot (1970) and Schervish (1995, chapter 3) along with other aspects of the theory of utility.

Examples of Utility Functions

Since it is reasonable to assume that every person prefers a larger gain to a smaller gain, we shall assume that every utility function $U(x)$ is an increasing function of the gain $x$. However, the shape of the function $U(x)$ will vary from person to person and will depend on each person's willingness to risk losses of various amounts in attempting to increase his gains.

For example, consider two gambles $X$ and $Y$ for which the gains have the following probability distributions:
\[
\Pr(X = -3) = 0.5, \quad \Pr(X = 2.5) = 0.4, \quad \Pr(X = 6) = 0.1 \tag{4.8.4}
\]
and
\[
\Pr(Y = -2) = 0.3, \quad \Pr(Y = 1) = 0.4, \quad \Pr(Y = 3) = 0.3. \tag{4.8.5}
\]
We shall assume that a person must choose one of the following three decisions: (i) accept gamble $X$, (ii) accept gamble $Y$, or (iii) do not accept either gamble.
We shall now determine the decision that a person would choose for three different utility functions.

Example 4.8.3 Linear Utility Function. Suppose that $U(x) = ax + b$ for some constants $a$ and $b$, where $a > 0$. In this case, for every gamble $X$, $E[U(X)] = aE(X) + b$. Hence, for every two gambles $X$ and $Y$, $E[U(X)] > E[U(Y)]$ if and only if $E(X) > E(Y)$. In other words, a person who has a linear utility function will always choose a gamble for which the expected gain is a maximum.

When the gambles $X$ and $Y$ are defined by Eqs. (4.8.4) and (4.8.5),

$$E(X) = (0.5)(-3) + (0.4)(2.5) + (0.1)(6) = 0.1$$

and

$$E(Y) = (0.3)(-2) + (0.4)(1) + (0.3)(3) = 0.7.$$

Furthermore, since the gain from not accepting either of these gambles is 0, the expected gain from choosing not to accept either gamble is clearly 0. Since $E(Y) > E(X) > 0$, it follows that a person who has a linear utility function would choose to accept gamble $Y$. If gamble $Y$ were not available, then the person would prefer to accept gamble $X$ rather than not to gamble at all. ◀

Example 4.8.4 Cubic Utility Function. Suppose that a person's utility function is $U(x) = x^3$ for $-\infty < x < \infty$. Then for the gambles defined by Eqs. (4.8.4) and (4.8.5),

$$E[U(X)] = (0.5)(-3)^3 + (0.4)(2.5)^3 + (0.1)(6)^3 = 14.35$$

and

$$E[U(Y)] = (0.3)(-2)^3 + (0.4)(1)^3 + (0.3)(3)^3 = 6.1.$$

Furthermore, the utility of not accepting either gamble is $U(0) = 0^3 = 0$. Since $E[U(X)] > E[U(Y)] > 0$, it follows that the person would choose to accept gamble $X$. If gamble $X$ were not available, the person would prefer to accept gamble $Y$ rather than not to gamble at all. ◀
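The arithmetic in Examples 4.8.3 and 4.8.4 can be checked with a short Python sketch. It encodes the gambles of Eqs. (4.8.4) and (4.8.5), treats "accept neither gamble" as a sure gain of 0, and picks the decision with the largest expected utility. The function and variable names are hypothetical, and the linear utility uses a = 1 and b = 0 as one arbitrary choice of constants with a > 0.

```python
# Each decision is a discrete gamble written as {gain: probability}.
GAMBLES = {
    "X": {-3: 0.5, 2.5: 0.4, 6: 0.1},   # Eq. (4.8.4)
    "Y": {-2: 0.3, 1: 0.4, 3: 0.3},     # Eq. (4.8.5)
    "neither": {0: 1.0},                # sure gain of 0
}


def expected_utility(u, gamble):
    """Sum of probability-weighted utilities over the possible gains."""
    return sum(p * u(x) for x, p in gamble.items())


def best_decision(u, gambles=GAMBLES):
    """Name of the decision whose expected utility is largest."""
    return max(gambles, key=lambda name: expected_utility(u, gambles[name]))


def linear(x):
    return x          # U(x) = ax + b with the illustrative choice a = 1, b = 0


def cubic(x):
    return x ** 3     # U(x) = x^3


for label, u in [("linear", linear), ("cubic", cubic)]:
    values = {name: round(expected_utility(u, g), 4) for name, g in GAMBLES.items()}
    print(label, values, "->", best_decision(u))
```

Swapping in another increasing function for `u` shows how the preferred decision depends on the shape of the utility function and not only on the expected gains.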
Example 4.8.5 Logarithmic Utility Function. Suppose that a person's utility function is $U(x) = \log(x + 4)$ for $x > -4$. Since $\lim_{x \to -4} \log(x + 4) = -\infty$, a person who has this utility function cannot choose a gamble in which there is any possibility of her gain being $-4$ or less. For the gambles $X$ and $Y$ defined by Eqs. (4.8.4) and (4.8.5),

$$E[U(X)] = (0.5)(\log 1) + (0.4)(\log 6.5) + (0.1)(\log 10) = 0.9790$$

and

$$E[U(Y)] = (0.3)(\log 2) + (0.4)(\log 5) + (0.3)(\log 7) = 1.4355.$$

Furthermore, the utility of not accepting either gamble is $U(0) = \log 4 = 1.3863$. Since $E[U(Y)] > U(0) > E[U(X)]$, it follows that the person would choose to accept gamble $Y$. If gamble $Y$ were not available, the person would prefer not to gamble at all rather than to accept gamble $X$. ◀

Selling a Lottery Ticket

Suppose that a person has a lottery ticket from which she will receive a random gain of $X$ dollars, where $X$ has a specified probability distribution. We shall determine the number of dollars for which the person would be willing to sell this lottery ticket.

Let $U$ denote the person's utility function. Then the expected utility of her gain from the lottery ticket is $E[U(X)]$. If she sells the lottery ticket for $x_0$ dollars, then her gain is $x_0$ dollars, and the utility of this gain is $U(x_0)$. The person would prefer to accept $x_0$ dollars as a certain gain rather than accept the random gain $X$ from the lottery ticket if and only if $U(x_0) > E[U(X)]$. Hence, the person would be willing to sell the lottery ticket for any amount $x_0$ such that $U(x_0) > E[U(X)]$.
If $U(x_0) = E[U(X)]$, she would be equally willing to either sell the lottery ticket or accept the random gain $X$.

Example 4.8.6 Quadratic Utility Function. Suppose that $U(x) = x^2$ for $x \ge 0$, and suppose that the person has a lottery ticket from which she will win either 36 dollars with probability 1/4 or 0 dollars with probability 3/4. For how many dollars $x_0$ would she be willing to sell this lottery ticket?

The expected utility of the gain from the lottery ticket is

$$E[U(X)] = \frac{1}{4}U(36) + \frac{3}{4}U(0) = \frac{1}{4}(36^2) + \frac{3}{4}(0) = 324.$$

Therefore, the person would be willing to sell the lottery ticket for any amount $x_0$ such that $U(x_0) = x_0^2 > 324$. Hence, $x_0 > 18$. In other words, although the expected gain from the lottery ticket in this example is only 9 dollars, the person would not sell the ticket for less than 18 dollars. ◀

Example 4.8.7 Square Root Utility Function. Suppose now that $U(x) = x^{1/2}$ for $x \ge 0$, and consider again the lottery ticket described in Example 4.8.6. The expected utility of the gain from the lottery ticket in this case is

$$E[U(X)] = \frac{1}{4}U(36) + \frac{3}{4}U(0) = \frac{1}{4}(6) + \frac{3}{4}(0) = 1.5.$$

Therefore, the person would be willing to sell the lottery ticket for any amount $x_0$ such that $U(x_0) = x_0^{1/2} > 1.5$. Hence, $x_0 > 2.25$. In other words, although the expected gain from the lottery ticket in this example is 9 dollars, the person would be willing to sell the ticket for as little as 2.25 dollars. ◀
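The selling-price rule can be illustrated numerically. The following Python sketch computes the threshold $x_0$ at which $U(x_0) = E[U(X)]$ for the lottery ticket of Examples 4.8.6 and 4.8.7 by applying the inverse of each utility function to the expected utility. This shortcut relies on both utilities being strictly increasing on $x \ge 0$; the function names are hypothetical.

```python
import math

# Lottery ticket from Example 4.8.6: (gain in dollars, probability) pairs.
TICKET = [(36, 0.25), (0, 0.75)]


def expected_utility(u, gamble):
    """Expected utility of a discrete gamble."""
    return sum(p * u(x) for x, p in gamble)


def selling_threshold(u, u_inverse, gamble):
    """Amount x0 solving U(x0) = E[U(X)]; any larger offer is accepted."""
    return u_inverse(expected_utility(u, gamble))


expected_gain = sum(p * x for x, p in TICKET)

# Quadratic utility U(x) = x^2 on x >= 0, whose inverse is the square root.
quadratic_x0 = selling_threshold(lambda x: x ** 2, math.sqrt, TICKET)

# Square-root utility U(x) = x^(1/2) on x >= 0, whose inverse is squaring.
root_x0 = selling_threshold(math.sqrt, lambda v: v ** 2, TICKET)

print(expected_gain, quadratic_x0, root_x0)
```

The quadratic utility gives a threshold above the expected gain of 9 dollars, and the square-root utility gives one below it, matching the two examples.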
Some Statistical Decision Problems

Much of the theory of statistical inference (the subject of Chapters 7-11 of this text) deals with problems in which one has to make one of several available choices. Generally, which choice is best depends on some random variable that has not yet been observed. One example was already discussed in Sec. 4.5, where we introduced the mean squared error (M.S.E.) and mean absolute error (M.A.E.) criteria for predicting a random variable. In these cases, we have to choose a number $d$ for our prediction of a random variable $Y$. Which prediction will be best depends on the value of $Y$ that we do not yet know.
Random variables like $-|Y - d|$ and $-(Y - d)^2$ are gambles, and the choice of gamble that minimizes M.A.E. or M.S.E. is the choice that maximizes an expected utility.

Example 4.8.8 Predicting a Random Variable. Suppose that $Y$ is a random variable that we need to predict. For each possible prediction $d$, there is a gamble $X_d = -|Y - d|$ that specifies our gain when we are being judged by absolute error. Alternatively, if we are being judged by squared error, the appropriate gamble to consider would be $Z_d = -(Y - d)^2$. Notice that these gambles are always negative, meaning that our gain is negative because we lose according to how far $Y$ is from the prediction $d$. If our utility $U$ is linear, then maximizing $E[U(X_d)]$ by choice of $d$ is the same as minimizing M.A.E. Also, maximizing $E[U(Z_d)]$ by choice of $d$ is the same as minimizing M.S.E. The equivalence between maximizing expected utility and minimizing the mean error would continue to hold if the prediction were allowed to depend on another random variable $W$ that we could observe before predicting. That is, our prediction would be a function $d(W)$, and $X_d = -|Y - d(W)|$ or $Z_d = -[Y - d(W)]^2$ would be the gamble whose expected utility we would want to compute. ◀

Example 4.8.9 Bounding a Random Variable. Suppose that $Y$ is a random variable and that we are interested in whether or not $Y \le c$ for some constant $c$. For example, $Y$ could be the random variable $P$ in our clinical trial Example 4.7.3. We might be interested in whether or not $P \le p_0$, where $p_0$ is the probability that a patient will be a success without any help from the treatment being studied. Suppose that we have to make one of two available decisions:

(t) continue to promote the treatment, or
(a) abandon the treatment.

If we choose $t$, suppose that we stand to gain

$$X_t = \begin{cases} 10^6 & \text{if } P > p_0, \\ -10^6 & \text{if } P \le p_0. \end{cases}$$

If we choose $a$, our gain will be $X_a = 0$. If our utility function is $U$, then the expected utility for choosing $t$ is $E[U(X_t)]$, and $t$ would be the better choice if this value is greater than $U(0)$.
For example, suppose that our utility is

$$U(x) = \begin{cases} x^{0.8} & \text{if } x \ge 0, \\ x & \text{if } x < 0. \end{cases} \tag{4.8.6}$$

Then $U(0) = 0$ and

$$E[U(X_t)] = -10^6 \Pr(P \le p_0) + [10^6]^{0.8} \Pr(P > p_0) = 10^{4.8} - (10^6 + 10^{4.8}) \Pr(P \le p_0).$$

So, $E[U(X_t)] > 0$ if $\Pr(P \le p_0) < 10^{4.8}/(10^6 + 10^{4.8}) = 0.0594$. It makes sense that $t$ is better than $a$ if $\Pr(P \le p_0)$ is small. The reason is that the utility of choosing $t$ over $a$ is only positive when $P > p_0$. This example is in the spirit of hypothesis testing, which will be the subject of Chapter 9. ◀

Example 4.8.10 Investment. In Example 4.2.2, we compared two possible stock purchases based on their expected returns and value at risk, VaR. Suppose that the investor has a nonlinear utility function for dollars. To be specific, suppose that the utility of a return of $x$ would equal $U(x)$ given in Eq. (4.8.6). We can calculate the expected utility of the return from each of the two possible stock purchases in Example 4.2.2 to decide which is more favorable.
If $R$ is the return per share and we buy $s$ shares, then the return is $X = sR$, and the expected utility of the return is

$$E[U(sR)] = \int_{-\infty}^{0} s r f(r)\, dr + \int_{0}^{\infty} (sr)^{0.8} f(r)\, dr, \tag{4.8.7}$$

where $f$ is the p.d.f. of $R$. For the first stock, the return per share is $R_1$ distributed uniformly on the interval $[-10, 20]$, and the number of shares would be $s_1 = 120$. This makes (4.8.7) equal to

$$E[U(120 R_1)] = \int_{-10}^{0} \frac{120 r}{30}\, dr + \int_{0}^{20} \frac{(120 r)^{0.8}}{30}\, dr = -12.6.$$

For the second stock, the return per share is $R_2$ distributed uniformly on the interval $[-4.5, 10]$, and the number of shares would be $s_2 = 200$. This makes (4.8.7) equal to

$$E[U(200 R_2)] = \int_{-4.5}^{0} \frac{200 r}{14.5}\, dr + \int_{0}^{10} \frac{(200 r)^{0.8}}{14.5}\, dr = 27.9.$$

With this utility function, the expected utility of the first stock purchase is actually negative because the big gains (up to $120 \times 20 = 2400$) add less to the utility ($2400^{0.8} = 506$) than the big losses (up to $120 \times -10 = -1200$) take away from the utility. The second stock purchase has positive expected utility, so it would be the preferred choice in this example. ◀

Summary

When we have to make choices in the face of uncertainty, we need to assess what our gains and losses will be under each of the uncertain possibilities. Utility is the value to us of those gains and losses. For example, if $X$ represents the random gain from a possible choice, then $U(X)$ is the value to us of the random gain we would receive if we were to make that choice. We should make the choice such that $E[U(X)]$ is as large as possible.

Exercises

1. Let $\alpha > 0$. A decision maker has a utility function for money of the form

$$U(x) = \begin{cases} x^{\alpha} & \text{if } x > 0, \\ x & \text{if } x \le 0. \end{cases}$$

Suppose that this decision maker is trying to decide whether or not to buy a lottery ticket for \$1. The lottery ticket pays \$500 with probability 0.001, and it pays \$0 with probability 0.999. What would the values of $\alpha$ have to be in order for this decision maker to prefer buying the ticket to not buying it?
2. Consider three gambles $X$, $Y$, and $Z$ for which the probability distributions of the gains are as follows:

$$\Pr(X = 5) = \Pr(X = 25) = 1/2,$$
$$\Pr(Y = 10) = \Pr(Y = 20) = 1/2,$$
$$\Pr(Z = 15) = 1.$$

Suppose that a person's utility function has the form $U(x) = x^2$ for $x > 0$. Which of the three gambles would she prefer?

3. Determine which of the three gambles in Exercise 2 would be preferred by a person whose utility function is $U(x) = x^{1/2}$ for $x > 0$.

4. Determine which of the three gambles in Exercise 2 would be preferred by a person whose utility function has the form $U(x) = ax + b$, where $a$ and $b$ are constants ($a > 0$).

5. Consider a utility function $U$ for which $U(0) = 0$ and $U(100) = 1$. Suppose that a person who has this utility function is indifferent to either accepting a gamble from which his gain will be 0 dollars with probability 1/3 or 100 dollars with probability 2/3 or accepting 50 dollars as a sure thing. What is the value of $U(50)$?

6. Consider a utility function $U$ for which $U(0) = 5$, $U(1) = 8$, and $U(2) = 10$. Suppose that a person who has this utility function is indifferent to either of two gambles $X$ and $Y$, for which the probability distributions of the gains are as follows:

$$\Pr(X = -1) = 0.6, \quad \Pr(X = 0) = 0.2, \quad \Pr(X = 2) = 0.2;$$
$$\Pr(Y = 0) = 0.9, \quad \Pr(Y = 1) = 0.1.$$

What is the value of $U(-1)$?

7. Suppose that a person must accept a gamble $X$ of the following form:

$$\Pr(X = a) = p \quad \text{and} \quad \Pr(X = 1 - a) = 1 - p,$$

where $p$ is a given number such that $0 < p < 1$. Suppose also that the person can choose and fix the value of $a$ ($0 \le a \le 1$) to be used in this gamble. Determine the value of $a$ that the person would choose if his utility function was $U(x) = \log x$ for $x > 0$.

8. Determine the value of $a$ that a person would choose in Exercise 7 if his utility function was $U(x) = x^{1/2}$ for $x \ge 0$.

9. Determine the value of $a$ that a person would choose in Exercise 7 if his utility function was $U(x) = x$ for $x \ge 0$.

10. Consider four gambles $X_1$, $X_2$, $X_3$, and $X_4$, for which the probability distributions of the gains are as follows:

$$\Pr(X_1 = 0) = 0.2, \quad \Pr(X_1 = 1) = 0.5, \quad \Pr(X_1 = 2) = 0.3;$$
$$\Pr(X_2 = 0) = 0.4, \quad \Pr(X_2 = 1) = 0.2, \quad \Pr(X_2 = 2) = 0.4;$$
$$\Pr(X_3 = 0) = 0.3, \quad \Pr(X_3 = 1) = 0.3, \quad \Pr(X_3 = 2) = 0.4;$$
$$\Pr(X_4 = 0) = \Pr(X_4 = 2) = 0.5.$$

Suppose that a person's utility function is such that she prefers $X_1$ to $X_2$. If the person were forced to accept either $X_3$ or $X_4$, which one would she choose?

11. Suppose that a person has a given fortune $A > 0$ and can bet any amount $b$ of this fortune in a certain game ($0 \le b \le A$). If he wins the bet, then his fortune becomes $A + b$; if he loses the bet, then his fortune becomes $A - b$. In general, let $X$ denote his fortune after he has won or lost. Assume that the probability of his winning is $p$ ($0 < p < 1$) and the probability of his losing is $1 - p$. Assume also that his utility function, as a function of his final fortune $x$, is $U(x) = \log x$ for $x > 0$. If the person wishes to bet an amount $b$ for which the expected utility of his fortune $E[U(X)]$ will be a maximum, what amount $b$ should he bet?

12. Determine the amount $b$ that the person should bet in Exercise 11 if his utility function is $U(x) = x^{1/2}$ for $x \ge 0$.

13. Determine the amount $b$ that the person should bet in Exercise 11 if his utility function is $U(x) = x$ for $x \ge 0$.

14. Determine the amount $b$ that the person should bet in Exercise 11 if his utility function is $U(x) = x^2$ for $x \ge 0$.

15. Suppose that a person has a lottery ticket from which she will win $X$ dollars, where $X$ has the uniform distribution on the interval $[0, 4]$. Suppose also that the person's utility function is $U(x) = x^{\alpha}$ for $x \ge 0$, where $\alpha$ is a given positive constant. For how many dollars $x_0$ would the person be willing to sell this lottery ticket?

16. Let $Y$ be a random variable that we would like to predict. Suppose that we must choose a single number $d$ as the prediction and that we will lose $(Y - d)^2$ dollars. Suppose that our utility for dollars is a square root function:

$$U(x) = \begin{cases} \sqrt{x} & \text{if } x \ge 0, \\ -\sqrt{-x} & \text{if } x < 0. \end{cases}$$

Prove that the value of $d$ that maximizes expected utility is a median of the distribution of $Y$.

17. Reconsider the conditions of Example 4.8.9. This time, suppose that $p_0 = 1/2$ and

$$U(x) = \begin{cases} x^{0.9} & \text{if } x \ge 0, \\ x & \text{if } x < 0. \end{cases}$$

Suppose also that $P$ has p.d.f. $f(p) = 56 p^6 (1 - p)$ for $0 < p < 1$. Decide whether or not it is better to abandon the treatment.

4.9 Supplementary Exercises

1. Suppose that the random variable $X$ has a continuous distribution with c.d.f. $F(x)$ and p.d.f. $f$. Suppose also that $E(X)$ exists. Prove that

$$\lim_{x \to \infty} x[1 - F(x)] = 0.$$

Hint: Use the fact that if $E(X)$ exists, then

$$E(X) = \lim_{u \to \infty} \int_{-\infty}^{u} x f(x)\, dx.$$

2. Suppose that the random variable $X$ has a continuous distribution with c.d.f. $F(x)$. Suppose also that $\Pr(X \ge 0) = 1$ and that $E(X)$ exists. Show that

$$E(X) = \int_{0}^{\infty} [1 - F(x)]\, dx.$$

Hint: You may use the result proven in Exercise 1.

3. Consider again the conditions of Exercise 2, but suppose now that $X$ has a discrete distribution with c.d.f. $F(x)$, rather than a continuous distribution. Show that the conclusion of Exercise 2 still holds.

4. Suppose that $X$, $Y$, and $Z$ are nonnegative random variables such that $\Pr(X + Y + Z \le 1.3) = 1$. Show that $X$, $Y$, and $Z$ cannot possibly have a joint distribution under which each of their marginal distributions is the uniform distribution on the interval $[0, 1]$.

5. Suppose that the random variable $X$ has mean $\mu$ and variance $\sigma^2$, and that $Y = aX + b$. Determine the values of $a$ and $b$ for which $E(Y) = 0$ and $\mathrm{Var}(Y) = 1$.

6. Determine the expectation of the range of a random sample of size $n$ from the uniform distribution on the interval $[0, 1]$.

7. Suppose that an automobile dealer pays an amount $X$ (in thousands of dollars) for a used car and then sells it for an amount $Y$. Suppose that the random variables $X$ and $Y$ have the following joint p.d.f.:

$$f(x, y) = \begin{cases} \frac{1}{36} x & \text{for } 0 < x < y < 6, \\ 0 & \text{otherwise.} \end{cases}$$

Determine the dealer's expected gain from the sale.

10. Suppose that $X_1, \ldots, X_n$ are i.i.d. random variables, each of which has a continuous distribution with median $m$. Let $Y_n = \max\{X_1, \ldots, X_n\}$. Determine the value of $\Pr(Y_n > m)$.

11. Suppose that you are going to sell cola at a football game and must decide in advance how much to order. Suppose that the demand for cola at the game, in liters, has a continuous distribution with p.d.f. $f(x)$. Suppose that you make a profit of $g$ cents on each liter that you sell at the game and suffer a loss of $c$ cents on each liter that you order but do not sell. What is the optimal amount of cola for you to order so as to maximize your expected net gain?

12. Suppose that the number of hours $X$ for which a machine will operate before it fails has a continuous distribution with p.d.f. $f(x)$. Suppose that at the time at which the machine begins operating you must decide when you will return to inspect it. If you return before the machine has failed, you incur a cost of $b$ dollars for having wasted an inspection. If you return after the machine has failed, you incur a cost of $c$ dollars per hour for the length of time during which the machine was not operating after its failure. What is the optimal number of hours to wait before you return for inspection in order to minimize your expected cost?

13. Suppose that $X$ and $Y$ are random variables for which $E(X) = 3$, $E(Y) = 1$, $\mathrm{Var}(X) = 4$, and $\mathrm{Var}(Y) = 9$. Let $Z = 5X - Y + 15$. Find $E(Z)$ and $\mathrm{Var}(Z)$ under each of the following conditions: (a) $X$ and $Y$ are independent; (b) $X$ and $Y$ are uncorrelated; (c) the correlation of $X$ and $Y$ is 0.25.

14. Suppose that $X_0, X_1, \ldots, X_n$ are independent random variables, each having the same variance $\sigma^2$. Let $Y_j = X_j - X_{j-1}$ for $j = 1, \ldots, n$, and let $\bar{Y}_n = \frac{1}{n} \sum_{j=1}^{n} Y_j$. Determine the value of $\mathrm{Var}(\bar{Y}_n)$.

15. Suppose that $X_1, \ldots, X_n$ are random variables for which $\mathrm{Var}(X_i)$ has the same value $\sigma^2$ for $i = 1, \ldots, n$ and $\rho(X_i, X_j)$ has the same value $\rho$ for every pair of values $i$ and $j$ such that $i \ne j$. Prove that $\rho \ge -\frac{1}{n - 1}$.

8. Suppose that $X_1, \ldots, X_n$ form a random sample of size $n$ from a continuous distribution with the following p.d.f.:

16.
Suppose that the joint distribution of X and Y is the { distribuigdo uniforme sobre um retangulo com lados uniform distribution over a rectangle with sides parallel fix= 2x para O<x <1, paralelos aos eixos coordenados noxy-avido. f(x) = | 2x for0<x <1, to the coordinate axes in the xy-plane. Determine the O — deoutra forma. Determine a correlagdo deXeS. 0 otherwise. correlation of X and Y. DeixarSn=maximo{M, . . . , Xn}. Avalie (Sn). 17.Suponha quenletras sdo colocadas aleatoriamente emn Let Y, = max{X},..., X,}. Evaluate E(Y,). 17. Suppose that n letters are put at random into n en- 9.Seeué uma mediana da distribuicdo dex, e seS=r(X) é envelopes, como no problema de correspondéncia descrito na 9. Ifm isa median of the distribution of X, and if Y =r(X) velopes, as in the matching problem described in Sec. 1.10. uma fun¢do nado decrescente ou nao crescente dex, Sec. 1.10. Determine a variacdo do numero de cartas is either a nondecreasing or a nonincreasing function of X, Determine the variance of the number of letters that are mostre que/(m}é uma mediana da distribuicgdo deS. colocadas nos envelopes corretos. show that r(m) is a median of the distribution of Y. placed in the correct envelopes. 4.9 Exercicios Suplementares 273 4.9 Supplementary Exercises 273 18.Suponha que a variavel aleatoriaXtem significadoye 18. Suppose that the random variable X has mean yz and variagdooz. Mostre que o terceiro momento central deX A B C D variance o”. Show that the third central moment of X can A B C D pode ser expresso como£X3}3 02-123. . . | . | . | | be expressed as E(X3) — 30? — p>. e e | e | e | | 19.Suponha queXé uma variavel aleatéria com mgf Y(t), 0 1 2 3 4 5 6 7 19. Suppose that X is a random variable with m.g.f. y(t), 0 1 2 3 4 5 6 7 significare varia¢do o2; e deixarc(tEregistro[ W(t). ; ; mean j, and variance o7; and let c(t) = log[y(t)]. Prove oo. Prove issoc(0Epec (0K on. 25.Suponha queresem o seguinte pdf conjunto: that c’(0) = w and c"(0) =o”. 25. 
Suppose that X and Y have the following joint p.d.f.: 20.Suponha queXeStem uma distribuigéo conjunta com F(x, YF 8xy para 0<y <x <1, caso 20. Suppose that X and Y have a joint distribution with f(x,y) = | 8xy for0<y<x<1, meiospxeps,desvio padraooxeas,e correlacdop. Mostre O __contrario. means jy and py, standard deviations oy and oy, and 0 otherwise. que se£(S| Xé uma funcao linear de x, entao Suponha também que o valor observado deXé 0,2. correlation p. Show that if E(Y|X) is a linear function of Suppose also that the observed value of X is 0.2. a.Qual valor previsto deStem o menor MSE? X, then a. What predicted value of Y has the smallest M.S.E.? E(S| X= ust ps(X wd b.Qual valor previsto deStem o menor MAE? E(Y|X)=pyt+ poh (x — [Ly). b. What predicted value of Y has the smallest M.A.E.? oO on 26.Para todas as variaveis aleatdériasX,S,eZ, deixe Cov(X, Y| z) * 26. For all random variables X, Y, and Z, let Cov(X, Y|z) 21.Suponha queXeSsdo variaveis aleatdérias tais que denotar a covariancia deXeSem sua distribuicgéo conjunta 21. Suppose that X and Y are random variables such that denote the covariance of X and Y in their conditional joint E(S| XF7 -(1\/A)Xe EX| S10 -S.Determine a correlagdo de condicional dadaZ=z. Prove isso E(Y|X)=7—-—(1/4)X and E(X|Y) =10 — Y. Determine distribution given Z = z. Prove that XeS. Cov(x, YFACOWX, Y|ZI the correlation of X and Y. Cov(X, Y) = E[Cov(X, Y1Z)] 22.Suponha que uma vara com 3 pés de comprimento seja + Cov[EX|Z), E(S| ZI. 22. Suppose that a stick having a length of 3 feet is broken 4+ Cov[E(X|Z), E(Y|Z)}. quebrada em dois pedagos e que 0 ponto em que a vara é into two pieces, and that the point at which the stick is quebrada seja escolhido de acordo com a pdfffx). Qual éa 27.Considere a caixa de bolas vermelhas e azuis nos Exemplos broken is chosen in accordance with the p.d.f. f(x). What 27. Consider the box of red and blue balls in Exam- correlagao entre o comprimento da pega mais longa e o 4.2.4 e 4.2.5. 
Suponha que fagamos uma amostran >1 bolinha is the correlation between the length of the longer piece ples 4.2.4 and 4.2.5. Suppose that we sample n > 1 balls comprimento da pega mais curta? com reposic¢do, e deixeXseja o numero de bolas vermelhas na and the length of the shorter piece? with replacement, and let X be the number of red balls in amostra. Entéo nds amostramosnbolas sem reposicdo, e the sample. Then we sample n balls without replacement, 23.Suponha queXeStem uma distribuigao conjunta com deixamosSseja o numero de bolas vermelhas na amostra. 23. Suppose that X and Y have a joint distribution with and we let Y be the number of red balls in the sample. correlacaop >1/2 e aquele Var (X-Var(S¥1. Mostrar Prove que Pr(X=n) >Pr.(S=n). correlation > 1/2 and that Var(X) = Var(Y) = 1. Show Prove that Pr(X =n) > Pr(Y =n). que b= - 20 € 0 valor Unico dedtal que o correspondente 28.Suponha que a fungdo utilidade de uma pessoa sejavocé(x- x2 that b = 2p is the unique value of b such that the corre- 28. Suppose that a person’s utility function is U(x) = x? lacdo deXeX+poré tambémp. parax20. Mostre que a pessoa sempre preferira fazer lation of X and X + bY is also p. for x > 0. Show that the person will always prefer to take uma aposta na qual recebera um ganho aleatério dex a gamble in which she will receive a random gain of X dol- 24.Suponha que quatro prédios de apartamentosA,B,C, eD estado délares em vez de receber a quantiaEXxcom certeza, 24. Suppose that four apartment buildings A, B, C, and D lars rather than receive the amount E(X) with certainty, localizados ao longo de uma rodovia nos pontos 0,1,3 e 5, conforme onde Pr(X20)=1 e£(X) <eo, are located along a highway at the points 0, 1, 3, and 5, as where Pr(X > 0) =1 and E(X) < oo. mostrado na figura a seguir. Suponha também que 10% dos , , shown in the following figure. Suppose also that 10 percent ., . 
empregados de uma determinada empresa vivam em edificiosA, 20 29.Uma pessoa € dadaeudolares, que ele deve alocar of the employees of a certain company live in building A, 29. A person is given m dollars, which he must allocate por cento vivem em4, 30 por cento vivem em<G, e 40 por cento vivem entre um eventoAe seu complementoAc. Suponha que ele 20 percent live in B, 30 percent live in C, and 40 percent between an event A and its complement A*. Suppose that emD. aloqueadolares paraAeeu-adélares paraAc. O ganho da live in D. he allocates a dollars to A and m —a dollars to A‘. The pessoa é entdo determinado da seguinte forma: SeA o, . person’s gain is then determined as follows: If A occurs, a.Onde a empresa deve construir seu novo escritério para ocorre, seu ganho ég1a; seAcocorre, seu ganho &g2(m-a). a. Where should the company build its new office in or- his gain is g,a; if A occurs, his gain is g)(m — a). Here, minimizar a distancia total que seus funciondarios devem Aqui, giegzrecebem constantes positivas. Suponha der to minimize the total distance that its employees g1 and gp are given positive constants. Suppose also that percorrer? também que Pr(A pe a funcao de utilidade da pessoa é must travel? Pr(A) = p and the person’s utility function is U(x) = log x b.Onde a empresa deve construir seu novo escritdrio para vocé(xregistrox parax >0. Determine o valoraque b. Where should the company build its new office in for x > 0. Determine the amount a that will maximize the minimizar a soma das distancias quadradas que seus maximizara a utilidade esperada da pessoa e mostrara order to minimize the sum of the squared distances person’s expected utility, and show that this amount does funcionarios devem percorrer? que esse valor ndo depende dos valores degiegz. that its employees must travel? not depend on the values of g; and g>. 
Chapter 5
Special Distributions

5.1 Introduction
5.2 The Bernoulli and Binomial Distributions
5.3 The Hypergeometric Distributions
5.4 The Poisson Distributions
5.5 The Negative Binomial Distributions
5.6 The Normal Distributions
5.7 The Gamma Distributions
5.8 The Beta Distributions
5.9 The Multinomial Distributions
5.10 The Bivariate Normal Distributions
5.11 Supplementary Exercises

5.1 Introduction

In this chapter, we shall define and discuss several special families of distributions that are widely used in applications of probability and statistics. The distributions that will be presented here include discrete and continuous distributions of univariate, bivariate, and multivariate types. The discrete univariate distributions are the families of Bernoulli, binomial, hypergeometric, Poisson, negative binomial, and geometric distributions. The continuous univariate distributions are the families of normal, lognormal, gamma, exponential, and beta distributions. Other continuous univariate distributions (introduced in exercises and examples) are the families of Weibull and Pareto distributions. Also discussed is the multinomial family of multivariate discrete distributions, and the bivariate normal family of bivariate continuous distributions.

We shall briefly describe how each of these families of distributions arises in applied problems and show why each might be an appropriate probability model for some experiment. For each family, we shall present the form of the p.f. or the p.d.f. and discuss some of the basic properties of the distributions in the family.

The list of distributions presented in this chapter, or in this entire text for that matter, is not intended to be exhaustive. These distributions are known to be useful in a wide variety of applied problems. In many real-world problems, however, one will need to consider other distributions not mentioned here.
The tools that we develop for use with these distributions can be generalized for use with other distributions. Our purpose in providing in-depth presentations of the most popular distributions here is to give the reader a feel for how to use probability to model the variation and uncertainty in applied problems, as well as for some of the tools that are used during probability modeling.

5.2 The Bernoulli and Binomial Distributions

The simplest type of experiment has only two possible outcomes, call them 0 and 1. If $X$ equals the outcome from such an experiment, then $X$ has the simplest type of nondegenerate distribution, which is a member of the family of Bernoulli distributions. If $n$ independent random variables $X_1, \ldots, X_n$ all have the same Bernoulli distribution, then their sum is equal to the number of the $X_i$'s that equal 1, and the distribution of the sum is a member of the binomial family.
The Bernoulli Distributions

Example 5.2.1 A Clinical Trial. The treatment given to a particular patient in a clinical trial can either succeed or fail. Let $X = 0$ if the treatment fails, and let $X = 1$ if the treatment succeeds. All that is needed to specify the distribution of $X$ is the value $p = \Pr(X = 1)$ (or, equivalently, $1 - p = \Pr(X = 0)$). Each different $p$ corresponds to a different distribution for $X$. The collection of all such distributions corresponding to all $0 \le p \le 1$ forms the family of Bernoulli distributions. ◀

An experiment of a particularly simple type is one in which there are only two possible outcomes, such as head or tail, success or failure, defective or nondefective, patient recovers or does not recover. It is convenient to designate the two possible outcomes of such an experiment as 0 and 1, as in Example 5.2.1. The following recap of Definition 3.1.5 can then be applied to every experiment of this type.
Definition 5.2.1 Bernoulli Distribution. A random variable $X$ has the Bernoulli distribution with parameter $p$ ($0 \le p \le 1$) if $X$ can take only the values 0 and 1 and the probabilities are
$$\Pr(X = 1) = p \quad \text{and} \quad \Pr(X = 0) = 1 - p. \tag{5.2.1}$$
The p.f. of $X$ can be written as follows:
$$f(x \mid p) = \begin{cases} p^x (1 - p)^{1 - x} & \text{for } x = 0, 1, \\ 0 & \text{otherwise.} \end{cases} \tag{5.2.2}$$

To verify that this p.f. $f(x \mid p)$ actually does represent the Bernoulli distribution specified by the probabilities (5.2.1), it is simply necessary to note that $f(1 \mid p) = p$ and $f(0 \mid p) = 1 - p$.

If $X$ has the Bernoulli distribution with parameter $p$, then $X^2$ and $X$ are the same random variable. It follows that
$$E(X) = 1 \cdot p + 0 \cdot (1 - p) = p,$$
$$E(X^2) = E(X) = p,$$
and
$$\operatorname{Var}(X) = E(X^2) - [E(X)]^2 = p(1 - p).$$
Furthermore, the m.g.f. of $X$ is
$$\psi(t) = E(e^{tX}) = pe^t + (1 - p) \quad \text{for } -\infty < t < \infty.$$

Definition 5.2.2 Bernoulli Trials/Process. If the random variables in a finite or infinite sequence $X_1, X_2, \ldots$ are i.i.d., and if each random variable $X_i$ has the Bernoulli distribution with parameter $p$, then it is said that $X_1, X_2, \ldots$ are Bernoulli trials with parameter $p$. An infinite sequence of Bernoulli trials is also called a Bernoulli process.

Example 5.2.2 Tossing a Coin. Suppose that a fair coin is tossed repeatedly. Let $X_i = 1$ if a head is obtained on the $i$th toss, and let $X_i = 0$ if a tail is obtained ($i = 1, 2, \ldots$). Then the random variables $X_1, X_2, \ldots$ are Bernoulli trials with parameter $p = 1/2$. ◀

Example 5.2.3 Defective Parts. Suppose that 10 percent of the items produced by a certain machine are defective and the parts are independent of each other. We will sample $n$ items at random and inspect them. Let $X_i = 1$ if the $i$th item is defective, and let $X_i = 0$ if it is nondefective ($i = 1, \ldots, n$). Then the variables $X_1, \ldots, X_n$ form $n$ Bernoulli trials with parameter $p = 1/10$. ◀

Example 5.2.4 Clinical Trials. In the many clinical trial examples in earlier chapters (Example 4.7.8, for instance), the random variables $X_1, X_2, \ldots$, indicating whether each patient is a success, were conditionally Bernoulli trials with parameter $p$ given $P = p$, where $P$ is the unknown proportion of patients in a very large population who recover. ◀

The Binomial Distributions

Example 5.2.5 Defective Parts. In Example 5.2.3, let $X = X_1 + \cdots + X_{10}$, which equals the number of defective parts among the 10 sampled parts. What is the distribution of $X$? ◀

As derived after Example 3.1.9, the distribution of $X$ in Example 5.2.5 is the binomial distribution with parameters 10 and 1/10. We repeat the general definition of binomial distributions here.

Definition 5.2.3 Binomial Distribution. A random variable $X$ has the binomial distribution with parameters $n$ and $p$ if $X$ has a discrete distribution for which the p.f. is as follows:
$$f(x \mid n, p) = \begin{cases} \binom{n}{x} p^x (1 - p)^{n - x} & \text{for } x = 0, 1, 2, \ldots, n, \\ 0 & \text{otherwise.} \end{cases} \tag{5.2.3}$$
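As an illustration of Eq. (5.2.3), the following minimal sketch evaluates the binomial p.f. for the setting of Example 5.2.5 (parameters 10 and 1/10). It assumes only Python's standard library; the function name `binomial_pf` and the variable `pf` are illustrative choices, not notation from the text.

```python
from math import comb


def binomial_pf(x, n, p):
    """Binomial p.f. f(x | n, p) from Eq. (5.2.3); returns 0 outside x = 0, ..., n."""
    if not (isinstance(x, int) and 0 <= x <= n):
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Setting of Example 5.2.5: X counts defective parts among 10 sampled parts.
n, p = 10, 0.1
pf = [binomial_pf(x, n, p) for x in range(n + 1)]

for x, value in enumerate(pf):
    print(f"Pr(X = {x}) = {value:.6f}")

# The probabilities over x = 0, ..., n should add up to 1 (up to rounding).
print("total probability:", sum(pf))
```

With $n = 1$ the same function reduces to the Bernoulli p.f. in Eq. (5.2.2), so `binomial_pf(1, 1, p)` returns `p`.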
In this distribution, $n$ must be a positive integer, and $p$ must lie in the interval $0 \le p \le 1$.

Probabilities for various binomial distributions can be obtained from the table given at the end of this book and from many statistical software programs.

The binomial distributions are of fundamental importance in probability and statistics because of the following result, which was derived in Sec. 3.1 and which we restate here in the terminology of this chapter.

Theorem 5.2.1 If the random variables $X_1, \ldots, X_n$ form $n$ Bernoulli trials with parameter $p$, and if $X = X_1 + \cdots + X_n$, then $X$ has the binomial distribution with parameters $n$ and $p$.

When $X$ is represented as the sum of $n$ Bernoulli trials as in Theorem 5.2.1, the values of the mean, variance, and m.g.f. of $X$ can be derived very easily. These values, which were already obtained in Example 4.2.5 and on pages 231 and 238, are
$$E(X) = \sum_{i=1}^{n} E(X_i) = np,$$
$$\operatorname{Var}(X) = \sum_{i=1}^{n} \operatorname{Var}(X_i) = np(1 - p),$$
and
$$\psi(t) = E(e^{tX}) = \prod_{i=1}^{n} E(e^{tX_i}) = (pe^t + 1 - p)^n. \tag{5.2.4}$$

The reader can use the m.g.f. in Eq. (5.2.4) to establish the following simple extension of Theorem 4.4.6.

Theorem 5.2.2 If $X_1, \ldots, X_k$ are independent random variables, and if $X_i$ has the binomial distribution with parameters $n_i$ and $p$ ($i = 1, \ldots, k$), then the sum $X_1 + \cdots + X_k$ has the binomial distribution with parameters $n = n_1 + \cdots + n_k$ and $p$.

Theorem 5.2.2 also follows easily if we represent each $X_i$ as the sum of $n_i$ Bernoulli trials with parameter $p$. If $n = n_1 + \cdots + n_k$, and if all $n$ trials are independent, then the sum $X_1 + \cdots + X_k$ will simply be the sum of $n$ Bernoulli trials with parameter $p$. Hence, this sum must have the binomial distribution with parameters $n$ and $p$.

Example 5.2.6 Castaneda v. Partida. Courts have used the binomial distributions to calculate probabilities of jury compositions from populations with known racial and ethnic compositions. In the case of Castaneda v. Partida, 430 U.S. 482 (1977), a local population was 79.1 percent Mexican American. During a 2.5-year period, there were 220 persons called to serve on grand juries, but only 100 were Mexican Americans. The claim was made that this was evidence of discrimination against Mexican Americans in the grand jury selection process. The court did a calculation under the assumption that grand jurors were drawn at random and independently from the population, each with probability 0.791 of being Mexican American. Since the claim was that 100 was too small a number of Mexican Americans, the court calculated the probability that a binomial random variable $X$ with parameters 220 and 0.791 would be 100 or less. The probability is very small (less than $10^{-25}$). Is this evidence of discrimination against Mexican Americans?
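The size of the probability quoted in Example 5.2.6 can be checked by summing the binomial p.f. over $x = 0, \ldots, 100$. The sketch below is one way to do so, assuming only Python's standard library; the helper name `binomial_cdf` is hypothetical, and the sum is carried out in ordinary floating-point arithmetic, so the result is an approximation.

```python
from math import comb


def binomial_cdf(k, n, p):
    """Pr(X <= k) for X binomial with parameters n and p, by direct summation."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))


# Parameters from Example 5.2.6: 220 grand jurors, each Mexican American
# with probability 0.791 under the court's assumption.
prob = binomial_cdf(100, 220, 0.791)
print(f"Pr(X <= 100) = {prob:.3e}")
print("below 1e-25:", prob < 1e-25)
```

The calculation says nothing about how the probability should be interpreted; it only reproduces the order of magnitude reported in the example.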
The small probability was calculated under the assumption that $X$ had the binomial distribution with parameters 220 and 0.791, which means that the court was assuming that there was no discrimination against Mexican Americans when performing the calculation. In other words, the small probability is the conditional probability of observing $X \le 100$ given that there is no discrimination. What should be more interesting to the court is the reverse conditional probability, namely, the probability that there is no discrimination given that $X = 100$ (or given $X \le 100$). This sounds like a case for Bayes' theorem. After we introduce the beta distributions in Sec. 5.8, we shall show how to use Bayes' theorem to calculate this probability (Examples 5.8.3 and 5.8.4). ◀

Note: Bernoulli and Binomial Distributions. Every random variable that takes only the two values 0 and 1 must have a Bernoulli distribution. However, not every sum of Bernoulli random variables has a binomial distribution. There are two conditions needed to apply Theorem 5.2.1. The Bernoulli random variables must be mutually independent, and they must all have the same parameter. If either of these conditions fails, the distribution of the sum will not be a binomial distribution. When the court did a binomial calculation in Example 5.2.6, it was defining "no discrimination" to mean that jurors were selected independently and with the same probability 0.791 of being Mexican American. If the court had defined "no discrimination" some other way, it would have needed to do a different, presumably more complicated, probability calculation.

We conclude this section with an example that shows how Bernoulli and binomial calculations can improve efficiency when data collection is costly.

Example 5.2.7 Group Testing. Military and other large organizations are often faced with the need to test large numbers of members for rare diseases.
Suppose that each test requires 278 Capítulo 5 Distribuições Especiais O leitor pode usar o mgf na Eq. (5.2.4) para estabelecer a seguinte extensão simples do Teorema 4.4.6. Teorema 5.2.2 SeX1, . . . , Xksão variáveis aleatórias independentes, e seXeutem a distribuição binomial mas com parâmetrosneuep (eu=1, . . . , k), então a somaX1+. . .+Xktem a distribuição binomial com parâmetrosn=n1+. . .+nkep. O Teorema 5.2.2 também segue facilmente se representarmos cadaXeucomo a soma deneu Ensaios de Bernoulli com parâmetrop. Sen=n1+. . .+nk, e se todosnas tentativas são independentes, então a somaX1+. . .+Xkserá simplesmente a soma denEnsaios de Bernoulli com parâmetrop. Portanto, esta soma deve ter a distribuição binomial com parâmetros nep. Exemplo 5.2.6 Castaneda v. Partida.Os tribunais têm usado as distribuições binomiais para calcular probabilidades. habilidades de composição do júri de populações com composições raciais e étnicas conhecidas. No caso deCastaneda v., 430 US 482 (1977), a população local era de 79,1% mexicano-americana. Durante um período de 2,5 anos, 220 pessoas foram chamadas para servir em grandes júris, mas apenas 100 eram mexicano-americanos. Alegou-se que esta era uma prova de discriminação contra os mexicanos-americanos no processo de seleção do grande júri. O tribunal fez um cálculo partindo do pressuposto de que os grandes jurados foram sorteados aleatoriamente e independentemente da população, cada um com probabilidade de 0,791 de serem mexicano-americanos. Como a alegação era que 100 era um número muito pequeno de mexicano-americanos, o tribunal calculou a probabilidade de que uma variável aleatória binomialXcom os parâmetros 220 e 0,791 seria 100 ou menos. A probabilidade é muito pequena (menos de 10−25). Isso é evidência de discriminação contra os mexicanos-americanos? 
a small amount of blood, and it is guaranteed to detect the disease if it is anywhere in the blood. Suppose that 1000 people need to be tested for a disease that affects 1/5 of 1 percent of all people. Let $X_j = 1$ if person $j$ has the disease and $X_j = 0$ if not, for $j = 1, \ldots, 1000$. We model the $X_j$ as i.i.d. Bernoulli random variables with parameter 0.002 for $j = 1, \ldots, 1000$. The most naive approach would be to perform 1000 tests to see who has the disease. But if the tests are costly, there may be a more economical way to test. For example, one could divide the 1000 people into 10 groups of size 100 each. For each group, take a portion of the blood sample from each of the 100 people in the group and combine them into one sample. Then test each of the 10 combined samples.
If none of the 10 combined samples has the disease, then nobody has the disease, and we needed only 10 tests instead of 1000. If only one of the combined samples has the disease, then we can test those 100 people separately, and we needed only 110 tests.

In general, let $Z_{1,i}$ be the number of people in group $i$ who have the disease for $i = 1, \ldots, 10$. Then each $Z_{1,i}$ has the binomial distribution with parameters 100 and 0.002. Let $Y_{1,i} = 1$ if $Z_{1,i} > 0$ and $Y_{1,i} = 0$ if $Z_{1,i} = 0$. Then each $Y_{1,i}$ has the Bernoulli distribution with parameter

$$\Pr(Z_{1,i} > 0) = 1 - \Pr(Z_{1,i} = 0) = 1 - 0.998^{100} = 0.181,$$

and they are independent. Then $Y_1 = \sum_{i=1}^{10} Y_{1,i}$ is the number of groups whose members we have to test individually. Also, $Y_1$ has the binomial distribution with parameters 10 and 0.181. The number of people that we need to test individually is $100Y_1$. The mean of $100Y_1$ is $100 \times 10 \times 0.181 = 181$. So, the expected total number of tests is $10 + 181 = 191$, rather than 1000.
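The one-stage calculation can be checked numerically. The following is a minimal sketch, not part of the original example; the names `binomial_pmf`, `p_group`, `expected_tests`, and `test_count_pmf` are illustrative assumptions. It uses the unrounded value of $1 - 0.998^{100}$, so the results differ slightly from the rounded figures in the text.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Pr(Y = k) for Y binomial with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

prevalence = 0.002                 # Bernoulli parameter for each person
n_groups, group_size = 10, 100     # 10 pooled samples of 100 people each

# Probability that one pooled sample tests positive: 1 - 0.998^100
p_group = 1 - (1 - prevalence) ** group_size

# Y1 is binomial(10, p_group); the total number of tests is 10 + 100 * Y1
expected_tests = n_groups + group_size * n_groups * p_group

# Full distribution of the total number of tests, keyed by the test count
test_count_pmf = {
    n_groups + group_size * y: binomial_pmf(y, n_groups, p_group)
    for y in range(n_groups + 1)
}

print(round(p_group, 3))
print(round(expected_tests, 1))
```

The dictionary `test_count_pmf` holds the distribution of $100Y_1 + 10$, so tail probabilities such as the chance of needing more tests than the naive approach can be read off by summing the relevant entries.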
One can compute the entire distribution of the total number of tests, $100Y_1 + 10$. The maximum number of tests needed by this group testing procedure is 1010, which would be the case if all 10 groups had at least one person with the disease, but this has probability $3.84 \times 10^{-8}$. In all other cases, group testing requires fewer than 1000 tests.

There are multiple-stage versions of group testing in which each of the groups that tests positive is split further into subgroups which are each tested together. If each of those subgroups is sufficiently large, they can be further subdivided into smaller sub-subgroups, etc. Finally, only the final-stage subgroups that have a positive result are tested individually. This can further reduce the expected number of tests. For example, consider the following two-stage version of the procedure described earlier. We could divide each of the 10 groups of 100 people into 10 subgroups of 10 people each.
Following the above notation, let $Z_{2,i,k}$ be the number of people in subgroup $k$ of group $i$ who have the disease, for $i = 1, \ldots, 10$ and $k = 1, \ldots, 10$. Then each $Z_{2,i,k}$ has the binomial distribution with parameters 10 and 0.002. Let $Y_{2,i,k} = 1$ if $Z_{2,i,k} > 0$ and $Y_{2,i,k} = 0$ otherwise. Notice that $Y_{2,i,k} = 0$ for $k = 1, \ldots, 10$ for every $i$ such that $Y_{1,i} = 0$. So, we only need to test individuals in those subgroups such that $Y_{2,i,k} = 1$. Each $Y_{2,i,k}$ has the Bernoulli distribution with parameter

$$\Pr(Z_{2,i,k} > 0) = 1 - \Pr(Z_{2,i,k} = 0) = 1 - 0.998^{10} = 0.0198,$$

and they are independent. Then $Y_2 = \sum_{i=1}^{10} \sum_{k=1}^{10} Y_{2,i,k}$ is the number of subgroups whose members we have to test individually. Also, $Y_2$ has the binomial distribution with parameters 100 and 0.0198. The number of people that we need to test individually is $10Y_2$. The mean of $10Y_2$ is $10 \times 100 \times 0.0198 = 19.82$. Each group that tests positive in the first stage contributes 10 pooled subgroup tests, so the number of subgroups that we need to test in the second stage is $10Y_1$, whose mean is 18.14. So, the expected total number of tests is $10 + 18.14 + 19.82 = 47.96$, which is even smaller than the 191 for the one-stage procedure described earlier. ◀

Summary

A random variable $X$ has the Bernoulli distribution with parameter $p$ if the p.f. of $X$ is $f(x|p) = p^x (1 - p)^{1-x}$ for $x = 0, 1$ and 0 otherwise. If $X_1, \ldots, X_n$ are i.i.d. random variables all having the Bernoulli distribution with parameter $p$, then we refer to $X_1, \ldots, X_n$ as Bernoulli trials, and $X = \sum_{i=1}^{n} X_i$ has the binomial distribution with parameters $n$ and $p$. Also, $X$ is the number of successes in the $n$ Bernoulli trials, where success on trial $i$ corresponds to $X_i = 1$ and failure corresponds to $X_i = 0$.

Exercises

1. Suppose that $X$ is a random variable such that $E(X^k) = 1/3$ for $k = 1, 2, \ldots$. Assuming that there cannot be more than one distribution with this same sequence of moments (see Exercise 14), determine the distribution of $X$.

2. Suppose that a random variable $X$ can take only the two values $a$ and $b$ with the following probabilities: $\Pr(X = a) = p$ and $\Pr(X = b) = 1 - p$. Express the p.f. of $X$ in a form similar to that given in Eq. (5.2.2).

3. Suppose that a fair coin (probability of heads equals 1/2) is tossed independently 10 times. Use the table of the binomial distribution given at the end of this book to find the probability that strictly more heads are obtained than tails.

4. Suppose that the probability that a certain experiment will be successful is 0.4, and let $X$ denote the number of successes that are obtained in 15 independent performances of the experiment. Use the table of the binomial distribution given at the end of this book to determine the value of $\Pr(6 \le X \le 9)$.

5. A coin for which the probability of heads is 0.6 is tossed nine times. Use the table of the binomial distribution given at the end of this book to find the probability of obtaining an even number of heads.

6. Three men A, B, and C shoot at a target. Suppose that A shoots three times and the probability that he will hit the target on any given shot is 1/8, B shoots five times and the probability that he will hit the target on any given shot is 1/4, and C shoots twice and the probability that he will hit the target on any given shot is 1/2. What is the expected number of times that the target will be hit?

7. Under the conditions of Exercise 6, assume also that all shots at the target are independent. What is the variance of the number of times that the target will be hit?

8. A certain electronic system contains 10 components. Suppose that the probability that each individual component will fail is 0.2 and that the components fail independently of each other. Given that at least one of the components has failed, what is the probability that at least two of the components have failed?

9. Suppose that the random variables $X_1, \ldots, X_n$ form $n$ Bernoulli trials with parameter $p$. Determine the conditional probability that $X_1 = 1$, given that $\sum_{i=1}^{n} X_i = k$ $(k = 1, \ldots, n)$.

10. The probability that each specific child in a given family will inherit a certain disease is $p$. If it is known that at least one child in a family of $n$ children has inherited the disease, what is the expected number of children in the family who have inherited the disease?

11. For $0 \le p \le 1$, and $n = 2, 3, \ldots$, determine the value of

$$\sum_{x=2}^{n} x(x - 1) \binom{n}{x} p^x (1 - p)^{n-x}.$$

12. If a random variable $X$ has a discrete distribution for which the p.f. is $f(x)$, then the value of $x$ for which $f(x)$ is maximum is called the mode of the distribution. If this same maximum $f(x)$ is attained at more than one value of $x$, then all such values of $x$ are called modes of the distribution. Find the mode or modes of the binomial distribution with parameters $n$ and $p$. Hint: Study the ratio $f(x + 1|n, p)/f(x|n, p)$.

13. In a clinical trial with two treatment groups, the probability of success in one treatment group is 0.5, and the probability of success in the other is 0.6. Suppose that there are five patients in each group. Assume that the outcomes of all patients are independent. Calculate the probability that the first group will have at least as many successes as the second group.

14. In Exercise 1, we assumed that there could be at most one distribution with moments $E(X^k) = 1/3$ for $k = 1, 2, \ldots$. In this exercise, we shall prove that there can be only one such distribution. Prove the following facts and show that they imply that at most one distribution has the given moments.
a. $\Pr(|X| \le 1) = 1$. (If not, show that $\lim_{k \to \infty} E(X^{2k}) = \infty$.)
b. $\Pr(X^2 \in \{0, 1\}) = 1$. (If not, prove that $E(X^4) < E(X^2)$.)
c. $\Pr(X = -1) = 0$. (If not, prove that $E(X) < E(X^2)$.)

15. In Example 5.2.7, suppose that we use the two-stage version described at the end of the example. What is the maximum number of tests that could possibly be needed by this version? What is the probability that the maximum number of tests would be required?

16. For the 1000 people in Example 5.2.7, suppose that we use the following three-stage group testing procedure. First, divide the 1000 people into five groups of size 200 each. For each group that tests positive, further divide it into five subgroups of size 40 each. For each subgroup that tests positive, further divide it into five sub-subgroups of size 8 each. For each sub-subgroup that tests positive, test all eight people. Find the expected number and maximum number of tests.

5.3 The Hypergeometric Distributions

In this section, we consider dependent Bernoulli random variables.
A common source of dependent Bernoulli random variables is sampling without replacement from a finite population. Suppose that a finite population consists of a known number of successes and failures. If we sample a fixed number of units from that population, the number of successes in our sample will have a distribution that is a member of the family of hypergeometric distributions.

Definition and Examples

Example 5.3.1 Sampling without Replacement. Suppose that a box contains $A$ red balls and $B$ blue balls. Suppose also that $n \ge 0$ balls are selected at random from the box without replacement, and let $X$ denote the number of red balls that are obtained. Clearly, we must have $n \le A + B$ or we would run out of balls. Also, if $n = 0$, then $X = 0$ because there are no balls, red or blue, drawn. For cases with $n \ge 1$, we can let $X_i = 1$ if the $i$th ball drawn is red and $X_i = 0$ if not. Then each $X_i$ has a Bernoulli distribution, but $X_1, \ldots, X_n$ are not independent in general. To see this, assume that both $A > 0$ and $B > 0$ as well as $n \ge 2$. We will now show that $\Pr(X_2 = 1|X_1 = 0) \ne \Pr(X_2 = 1|X_1 = 1)$. If $X_1 = 1$, then when the second ball is drawn there are only $A - 1$ red balls remaining out of a total of $A + B - 1$ available balls. Hence, $\Pr(X_2 = 1|X_1 = 1) = (A - 1)/(A + B - 1)$. By the same reasoning,

$$\Pr(X_2 = 1|X_1 = 0) = \frac{A}{A + B - 1} > \frac{A - 1}{A + B - 1}.$$

Hence, $X_2$ is not independent of $X_1$, and we should not expect $X$ to have a binomial distribution. ◀

The problem described in Example 5.3.1 is a template for all cases of sampling without replacement from a finite population with only two types of objects. Anything that we learn about the random variable $X$ in Example 5.3.1 will apply to every case of sampling without replacement from finite populations with only two types of objects. First, we derive the distribution of $X$.
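Before the derivation, the dependence shown in Example 5.3.1 can be confirmed by brute-force enumeration. This is only an illustrative sketch: the box contents ($A = 3$, $B = 2$) and the helper name `cond_prob_second_red` are assumptions chosen for the demonstration, not values from the text.

```python
from fractions import Fraction
from itertools import permutations

A, B = 3, 2                          # illustrative box: 3 red balls, 2 blue balls
balls = ["red"] * A + ["blue"] * B

# All equally likely ordered choices of the first two balls, by ball index
pairs = list(permutations(range(A + B), 2))

def cond_prob_second_red(first_colour):
    """Pr(X2 = 1 | first ball has the given colour), by counting outcomes."""
    matching = [(i, j) for i, j in pairs if balls[i] == first_colour]
    red_second = [(i, j) for i, j in matching if balls[j] == "red"]
    return Fraction(len(red_second), len(matching))

p_given_red = cond_prob_second_red("red")     # should equal (A - 1)/(A + B - 1)
p_given_blue = cond_prob_second_red("blue")   # should equal A/(A + B - 1)
print(p_given_red, p_given_blue)
```

The two conditional probabilities differ, which is exactly the failure of independence that rules out a binomial distribution for $X$.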
Theorem 5.3.1 Probability Function. The distribution of $X$ in Example 5.3.1 has the p.f.

$$f(x|A, B, n) = \frac{\binom{A}{x} \binom{B}{n-x}}{\binom{A+B}{n}}, \tag{5.3.1}$$

for

$$\max\{0, n - B\} \le x \le \min\{n, A\}, \tag{5.3.2}$$

and $f(x|A, B, n) = 0$ otherwise.

Proof Clearly, the value of $X$ can neither exceed $n$ nor exceed $A$. Therefore, it must be true that $X \le \min\{n, A\}$. Similarly, because the number of blue balls $n - X$ that are drawn cannot exceed $B$, the value of $X$ must be at least $n - B$. Because the value of $X$ cannot be less than 0, it must be true that $X \ge \max\{0, n - B\}$.
Hence, the value of $X$ must be an integer in the interval in (5.3.2).

We shall now find the p.f. of $X$ using combinatorial arguments from Sec. 1.8. The degenerate cases, those with $A$, $B$, and/or $n$ equal to 0, are easy to prove because $\binom{k}{0} = 1$ for all nonnegative $k$, including $k = 0$. For the cases in which all of $A$, $B$, and $n$ are strictly positive, there are $\binom{A+B}{n}$ ways to choose $n$ balls out of the $A + B$ available balls, and all of these choices are equally likely. For each integer $x$ in the interval (5.3.2), there are $\binom{A}{x}$ ways to choose $x$ red balls, and for each such choice there are $\binom{B}{n-x}$ ways to choose $n - x$ blue balls. Hence, the probability of obtaining exactly $x$ red balls out of $n$ is given by Eq. (5.3.1). Furthermore, $f(x|A, B, n)$ must be 0 for all other values of $x$, because all other values are impossible. ■

Definition 5.3.1 Hypergeometric Distribution. Let $A$, $B$, and $n$ be nonnegative integers with $n \le A + B$. If a random variable $X$ has a discrete distribution with p.f. as in Eqs.
(5.3.1) and (5.3.2), then it is said that $X$ has the hypergeometric distribution with parameters $A$, $B$, and $n$.

Example 5.3.2 Sampling without Replacement from an Observed Data Set. Consider the patients in the clinical trial whose results are tabulated in Table 2.1. We might need to reexamine a subset of the patients in the placebo group. Suppose that we need to sample 11 distinct patients from the 34 patients in that group. What is the distribution of the number of successes (no relapse) that we obtain in the subsample? Let $X$ stand for the number of successes in the subsample. Table 2.1 indicates that there are 10 successes and 24 failures in the placebo group. According to the definition of the hypergeometric distribution, $X$ has the hypergeometric distribution with parameters $A = 10$, $B = 24$, and $n = 11$. In particular, the possible values of $X$ are the integers from 0 to 10. Even though we sample 11 patients, we cannot observe 11 successes, since only 10 successes are available. ◀
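The p.f. in Eq. (5.3.1) is simple to evaluate directly. The sketch below, with the assumed helper name `hypergeom_pmf`, applies it to the parameters of Example 5.3.2 and recovers the stated range of possible values. It is an illustration under those assumptions rather than a reference implementation.

```python
from math import comb

def hypergeom_pmf(x, A, B, n):
    """p.f. of Eq. (5.3.1); zero outside the range given in (5.3.2)."""
    if x < max(0, n - B) or x > min(n, A):
        return 0.0
    return comb(A, x) * comb(B, n - x) / comb(A + B, n)

A, B, n = 10, 24, 11    # placebo group in Example 5.3.2

# Values of x with positive probability, and a check that the p.f. sums to 1
support = [x for x in range(n + 1) if hypergeom_pmf(x, A, B, n) > 0]
total = sum(hypergeom_pmf(x, A, B, n) for x in range(n + 1))

print(support[0], support[-1])
print(round(total, 12))
```

The explicit range check mirrors (5.3.2): with only 10 successes in the population, `hypergeom_pmf(11, A, B, n)` is zero even though 11 patients are sampled.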
The Mean and Variance for a Hypergeometric Distribution

Theorem 5.3.2 Mean and Variance. Let $X$ have a hypergeometric distribution with strictly positive parameters $A$, $B$, and $n$. Then

$$E(X) = \frac{nA}{A + B}, \tag{5.3.3}$$

$$\operatorname{Var}(X) = \frac{nAB}{(A + B)^2} \cdot \frac{A + B - n}{A + B - 1}. \tag{5.3.4}$$

Proof Assume that $X$ is as defined in Example 5.3.1, the number of red balls drawn when $n$ balls are selected at random without replacement from a box containing $A$ red balls and $B$ blue balls. For $i = 1, \ldots, n$, let $X_i = 1$ if the $i$th ball that is selected is red, and let $X_i = 0$ if the $i$th ball is blue. As explained in Example 4.2.4, we can imagine that the $n$ balls are selected from the box by first arranging all the balls in the box in some random order and then selecting the first $n$ balls from this arrangement. It can be seen from this interpretation that, for $i = 1, \ldots, n$,

$$\Pr(X_i = 1) = \frac{A}{A + B} \quad \text{and} \quad \Pr(X_i = 0) = \frac{B}{A + B}.$$

Therefore, for $i = 1, \ldots, n$,

$$E(X_i) = \frac{A}{A + B} \quad \text{and} \quad \operatorname{Var}(X_i) = \frac{AB}{(A + B)^2}. \tag{5.3.5}$$

Since $X = X_1 + \cdots + X_n$, the mean of $X$ is the sum of the means of the $X_i$'s, namely, Eq. (5.3.3).

Next, use Theorem 4.6.7 to write

$$\operatorname{Var}(X) = \sum_{i=1}^{n} \operatorname{Var}(X_i) + 2 \sum\sum_{i<j} \operatorname{Cov}(X_i, X_j). \tag{5.3.6}$$

Because of the symmetry among the random variables $X_1, \ldots, X_n$, every term $\operatorname{Cov}(X_i, X_j)$ in the final summation in Eq. (5.3.6) will have the same value as $\operatorname{Cov}(X_1, X_2)$. Since there are $\binom{n}{2}$ terms in this summation, it follows from Eqs. (5.3.5) and (5.3.6) that

$$\operatorname{Var}(X) = \frac{nAB}{(A + B)^2} + n(n - 1) \operatorname{Cov}(X_1, X_2). \tag{5.3.7}$$

We could compute $\operatorname{Cov}(X_1, X_2)$ directly, but it is simpler to argue as follows. If $n = A + B$, then $\Pr(X = A) = 1$ because all the balls in the box will be selected without replacement. Thus, for $n = A + B$, $X$ is a constant random variable and $\operatorname{Var}(X) = 0$. Setting Eq. (5.3.7) to 0 and solving for $\operatorname{Cov}(X_1, X_2)$ gives

$$\operatorname{Cov}(X_1, X_2) = -\frac{AB}{(A + B)^2 (A + B - 1)}.$$

Plugging this value back into Eq. (5.3.7) gives Eq. (5.3.4). ■
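The formulas in Theorem 5.3.2 can be verified exactly for specific parameters by summing over the p.f. The following sketch uses exact rational arithmetic; the parameter values and all variable names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, A, B, n):
    """Exact p.f. of Eq. (5.3.1) as a Fraction; zero outside (5.3.2)."""
    if x < max(0, n - B) or x > min(n, A):
        return Fraction(0)
    return Fraction(comb(A, x) * comb(B, n - x), comb(A + B, n))

A, B, n = 10, 24, 11    # illustrative parameters
T = A + B

# Mean and variance computed directly from the p.f.
mean = sum(x * hypergeom_pmf(x, A, B, n) for x in range(n + 1))
second_moment = sum(x * x * hypergeom_pmf(x, A, B, n) for x in range(n + 1))
variance = second_moment - mean**2

mean_formula = Fraction(n * A, T)                                   # Eq. (5.3.3)
var_formula = Fraction(n * A * B, T**2) * Fraction(T - n, T - 1)    # Eq. (5.3.4)

# Covariance found by setting Eq. (5.3.7) to 0 at n = A + B
cov = Fraction(-A * B, T**2 * (T - 1))
var_from_cov = Fraction(n * A * B, T**2) + n * (n - 1) * cov        # Eq. (5.3.7)

print(mean == mean_formula, variance == var_formula, var_from_cov == var_formula)
```

Using `Fraction` avoids floating-point rounding, so the comparisons are exact equalities rather than approximate ones.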
Comparison of Sampling Methods

If we had sampled with replacement in Example 5.3.1, the number of red balls would have the binomial distribution with parameters $n$ and $A/(A + B)$. In that case, the mean number of red balls would still be $nA/(A + B)$, but the variance would be different. To see how the variances from sampling with and without replacement are related, let $T = A + B$ denote the total number of balls in the box, and let $p = A/T$ denote the proportion of red balls in the box. Then Eq. (5.3.4) can be rewritten as follows:

\[ \operatorname{Var}(X) = np(1-p)\,\frac{T-n}{T-1}. \tag{5.3.8} \]

The variance $np(1 - p)$ of the binomial distribution is the variance of the number of red balls when sampling with replacement. The factor $\alpha = (T - n)/(T - 1)$ in Eq. (5.3.8) therefore represents the reduction in $\operatorname{Var}(X)$ caused by sampling without replacement from a finite population. This $\alpha$ is called the finite population correction in the theory of sampling from finite populations without replacement.
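The relation between the two variances can be illustrated with a short sketch. The helper names below are hypothetical, and the sketch assumes integer $A$, $B$, $n$ with $T = A + B > 1$.

```python
from fractions import Fraction


def finite_population_correction(T, n):
    """The factor alpha = (T - n) / (T - 1) from Eq. (5.3.8)."""
    return Fraction(T - n, T - 1)


def variances(A, B, n):
    """Return (variance with replacement, variance without replacement)."""
    T = A + B
    p = Fraction(A, T)
    binomial_var = n * p * (1 - p)          # sampling with replacement
    return binomial_var, binomial_var * finite_population_correction(T, n)


if __name__ == "__main__":
    print(variances(5, 10, 7))
    # The correction approaches 1 as the population grows with n fixed.
    for T in (15, 150, 15_000):
        print(T, float(finite_population_correction(T, 7)))
```

The loop makes the qualitative claim concrete: with the sample size held at 7, the correction moves toward 1 as $T$ increases, so the two variances become nearly equal.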
If $n = 1$, the value of this factor $\alpha$ is 1, because there is no distinction between sampling with replacement and sampling without replacement when only one ball is being selected. If $n = T$, then (as previously mentioned) $\alpha = 0$ and $\operatorname{Var}(X) = 0$. For values of $n$ between 1 and $T$, the value of $\alpha$ will be between 0 and 1.

For each fixed sample size $n$, it can be seen that $\alpha \to 1$ as $T \to \infty$. This limit reflects the fact that when the population size $T$ is very large compared to the sample size $n$, there is very little difference between sampling with replacement and sampling without replacement. Theorem 5.3.4 expresses this idea more formally. The proof relies on the following result, which is used several times in this text.

Theorem 5.3.3. Let $a_n$ and $c_n$ be sequences of real numbers such that $a_n$ converges to 0, and $c_n a_n^2$ converges to 0. Then

\[ \lim_{n \to \infty} (1 + a_n)^{c_n} e^{-a_n c_n} = 1. \]

In particular, if $a_n c_n$ converges to $b$, then $(1 + a_n)^{c_n}$ converges to $e^b$.

The proof of Theorem 5.3.3 is left to the reader in Exercise 11.
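A numerical illustration (not a proof) of Theorem 5.3.3 may help. The sketch below uses the assumed sequences $a_n = b/n$ and $c_n = n$ with $b = 2$, for which $c_n a_n^2 = b^2/n \to 0$ and $a_n c_n = b$; these choices and the function name are ours, not the text's.

```python
import math


def theorem_ratio(a_n, c_n):
    """(1 + a_n)**c_n * exp(-a_n * c_n), computed through logarithms."""
    # log1p(a) evaluates log(1 + a) accurately when a is close to 0.
    return math.exp(c_n * math.log1p(a_n) - a_n * c_n)


b = 2.0
# Assumed example sequences: a_n = b / n and c_n = n.
ratios = [theorem_ratio(b / n, n) for n in (10, 1_000, 100_000)]

if __name__ == "__main__":
    for n, r in zip((10, 1_000, 100_000), ratios):
        print(n, r)
    # Second claim of the theorem: (1 + a_n)**c_n approaches e**b.
    print((1 + b / 100_000) ** 100_000, math.exp(b))
```

The printed ratios climb toward 1, and the last line shows $(1 + b/n)^n$ close to $e^b$, matching the "in particular" statement.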
Theorem 5.3.4 (Closeness of Binomial and Hypergeometric Distributions). Let $0 < p < 1$, and let $n$ be a positive integer. Let $Y$ have the binomial distribution with parameters $n$ and $p$. For each positive integer $T$, let $A_T$ and $B_T$ be integers such that $\lim_{T \to \infty} A_T = \infty$, $\lim_{T \to \infty} B_T = \infty$, and $\lim_{T \to \infty} A_T/(A_T + B_T) = p$. Let $X_T$ have the hypergeometric distribution with parameters $A_T$, $B_T$, and $n$. For each fixed $n$ and each $x = 0, \ldots, n$,

\[ \lim_{T \to \infty} \frac{\Pr(Y = x)}{\Pr(X_T = x)} = 1. \tag{5.3.9} \]

Proof. Once $A_T$ and $B_T$ are both larger than $n$, the formula in (5.3.1) is $\Pr(X_T = x)$ for all $x = 0, \ldots, n$. So, for large $T$, we have

\[ \Pr(X_T = x) = \binom{n}{x} \frac{A_T!\, B_T!\, (A_T + B_T - n)!}{(A_T - x)!\, (B_T - n + x)!\, (A_T + B_T)!}. \]

Apply Stirling's formula (Theorem 1.7.5) to each of the six factorials in the second factor above. A little manipulation gives that

\[ \lim_{T \to \infty} \frac{\binom{n}{x} A_T^{A_T + 1/2}\, B_T^{B_T + 1/2}\, (A_T + B_T - n)^{A_T + B_T - n + 1/2}}{\Pr(X_T = x)\, (A_T - x)^{A_T - x + 1/2}\, (B_T - n + x)^{B_T - n + x + 1/2}\, (A_T + B_T)^{A_T + B_T + 1/2}} \tag{5.3.10} \]

equals 1.
Each of the following limits follows from Theorem 5.3.3:

\[ \lim_{T \to \infty} \left( \frac{A_T}{A_T - x} \right)^{A_T - x + 1/2} = e^{x}, \]

\[ \lim_{T \to \infty} \left( \frac{B_T}{B_T - n + x} \right)^{B_T - n + x + 1/2} = e^{n - x}, \]

\[ \lim_{T \to \infty} \left( \frac{A_T + B_T - n}{A_T + B_T} \right)^{A_T + B_T - n + 1/2} = e^{-n}. \]

Inserting these limits in (5.3.10) yields

\[ \lim_{T \to \infty} \frac{\binom{n}{x} A_T^{x} B_T^{n - x}}{\Pr(X_T = x)\,(A_T + B_T)^{n}} = 1. \tag{5.3.11} \]

Since $A_T/(A_T + B_T)$ converges to $p$, we have

\[ \lim_{T \to \infty} \frac{A_T^{x} B_T^{n - x}}{(A_T + B_T)^{n}} = p^{x}(1 - p)^{n - x}. \tag{5.3.12} \]

Together, (5.3.11) and (5.3.12) imply that

\[ \lim_{T \to \infty} \frac{\binom{n}{x} p^{x}(1 - p)^{n - x}}{\Pr(X_T = x)} = 1. \]

The numerator of this last expression is $\Pr(Y = x)$; hence, (5.3.9) holds. ∎

In words, Theorem 5.3.4 says that if the sample size $n$ represents a negligible fraction of the total population $A + B$, then the hypergeometric distribution with parameters $A$, $B$, and $n$ will be very nearly the same as the binomial distribution with parameters $n$ and $p = A/(A + B)$.

Example 5.3.3 (Population of Unknown Composition).
The hypergeometric distribution can arise as a conditional distribution when sampling is done without replacement from a finite population of unknown composition. The simplest example would be to modify Example 5.3.1 so that we still know the value of $T = A + B$ but no longer know $A$ and $B$. That is, we know how many balls are in the box, but we don't know how many are red or blue. This makes $P = A/T$, the proportion of red balls, unknown. Let $h(p)$ be the p.f. of $P$. Here $P$ is a random variable whose possible values are $0, 1/T, \ldots, (T - 1)/T, 1$. Conditional on $P = p$, we can behave as if we know that $A = pT$ and $B = (1 - p)T$, and then the conditional distribution of $X$ (the number of red balls in a sample of size $n$) is the hypergeometric distribution with parameters $pT$, $(1 - p)T$, and $n$.

Suppose now that $T$ is so large that the difference is essentially negligible between this hypergeometric distribution and the binomial distribution with parameters $n$ and $p$.
In this case, it is no longer necessary that we assume that $T$ is known. This is the situation that we had in mind (in Examples 3.4.10 and 3.6.7, as well as their many variations and other examples) when we referred to $P$ as the proportion of successes among all patients who might receive a treatment or the proportion of defectives among all parts produced by a machine. We think of $T$ as essentially infinite so that conditional on the proportion $A/T$, which we call $P$, the individual draws become independent Bernoulli trials. If either $A$ or $T$ (or both) is unknown, it makes sense that $P = A/T$ will be unknown. In the augmented experiment described on page 61, in which $P$ can be computed from the experimental outcome, we have that $P$ is a random variable.

Note: Essentially Infinite Populations. The case in which $T$ is essentially infinite in Example 5.3.3 is the motivation for using the binomial distributions as models for numbers of successes in samples from very large finite populations.
Look at Example 5.2.6, for instance. The number of Mexican Americans available to be sampled for grand jury duty is finite, but it is huge relative to the number (220) of grand jurors selected during the 2.5-year period. Technically, it is impossible that the individual grand jurors are selected independently, but the difference is too small for even the best defense attorney to make anything out of it. In the future, we will often model Bernoulli random variables as independent when we imagine selecting them at random without replacement from a huge finite population. We shall be relying on Theorem 5.3.4 in these cases without explicitly saying so.

Extending the Definition of Binomial Coefficients

There is an extension of the definition of a binomial coefficient given in Sec. 1.8 that allows a simplification of the expression for the p.f. of the hypergeometric distribution.
For all positive integers $r$ and $m$, where $r \le m$, the binomial coefficient $\binom{m}{r}$ was defined to be

\[ \binom{m}{r} = \frac{m!}{r!\,(m - r)!}. \tag{5.3.13} \]

It can be seen that the value of $\binom{m}{r}$ specified by Eq. (5.3.13) can also be written in the form

\[ \binom{m}{r} = \frac{m(m - 1) \cdots (m - r + 1)}{r!}. \tag{5.3.14} \]

For every real number $m$ that is not necessarily a positive integer and every positive integer $r$, the value of the right side of Eq. (5.3.14) is a well-defined number. Therefore, for every real number $m$ and every positive integer $r$, we can extend the definition of the binomial coefficient $\binom{m}{r}$ by defining its value as that given by Eq. (5.3.14).

The value of the binomial coefficient $\binom{m}{r}$ can be obtained from this definition for all positive integers $r$ and $m$. If $r \le m$, the value of $\binom{m}{r}$ is given by Eq. (5.3.13). If $r > m$, one of the factors in the numerator of (5.3.14) will be 0 and $\binom{m}{r} = 0$. Finally, for every real number $m$, we shall define the value of $\binom{m}{0}$ to be $\binom{m}{0} = 1$.
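The extended definition translates almost directly into code. The following is a minimal sketch, assuming $m$ is given as an integer or a `Fraction` (a stand-in for a general real number) and $r$ is a nonnegative integer; the function names are illustrative and do not come from the text.

```python
from fractions import Fraction


def extended_binomial(m, r):
    """Binomial coefficient of Eq. (5.3.14) for rational m and integer r >= 0."""
    if r < 0:
        raise ValueError("r must be a nonnegative integer")
    result = Fraction(1)  # the empty product gives the r = 0 convention
    for j in range(r):
        # Multiply in the factor (m - j) / (j + 1), one at a time.
        result = result * (Fraction(m) - j) / (j + 1)
    return result


def hypergeometric_pf(x, A, B, n):
    """Hypergeometric p.f. written with the extended coefficient."""
    if x < 0 or x > n:
        return Fraction(0)
    return (extended_binomial(A, x) * extended_binomial(B, n - x)
            / extended_binomial(A + B, n))


if __name__ == "__main__":
    print(extended_binomial(5, 2))               # ordinary case, r <= m
    print(extended_binomial(3, 5))               # r > m gives 0
    print(extended_binomial(Fraction(1, 2), 2))  # non-integer m
    print(hypergeometric_pf(6, 5, 10, 7))        # x > A, so the p.f. is 0
```

Because the coefficient vanishes automatically when $r > m$ for positive integer $m$, the p.f. needs no separate case analysis for $x > A$ or $n - x > B$.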
When this extended definition of a binomial coefficient is used, it can be seen that the value of $\binom{A}{x}\binom{B}{n - x}$ is 0 for every integer $x$ such that either $x > A$ or $n - x > B$. Therefore, we can write the p.f. of the hypergeometric distribution with parameters $A$, $B$, and $n$ as follows:

\[ f(x \mid A, B, n) = \begin{cases} \dfrac{\binom{A}{x}\binom{B}{n - x}}{\binom{A + B}{n}} & \text{for } x = 0, 1, \ldots, n, \\[2ex] 0 & \text{otherwise.} \end{cases} \tag{5.3.15} \]

It then follows from Eq. (5.3.14) that $f(x \mid A, B, n) > 0$ if and only if $x$ is an integer in the interval (5.3.2).

Summary

We introduced the family of hypergeometric distributions. Suppose that $n$ units are drawn at random without replacement from a finite population consisting of $T$ units of which $A$ are successes and $B = T - A$ are failures. Let $X$ stand for the number of successes in the sample. Then the distribution of $X$ is the hypergeometric distribution with parameters $A$, $B$, and $n$. We saw that the distinction between sampling from a finite population with and without replacement is negligible when the size of the population is huge relative to the size of the sample.
We also generalized the binomial coefficient notation so that $\binom{m}{r}$ is defined for all real numbers $m$ and all positive integers $r$.

Exercises

1. In Example 5.3.2, compute the probability that all 10 success patients appear in the subsample of size 11 from the Placebo group.

2. Suppose that a box contains five red balls and ten blue balls. If seven balls are selected at random without replacement, what is the probability that at least three red balls will be obtained?

3. Suppose that seven balls are selected at random without replacement from a box containing five red balls and ten blue balls. If $X$ denotes the proportion of red balls in the sample, what are the mean and the variance of $X$?

4. If a random variable $X$ has the hypergeometric distribution with parameters $A = 8$, $B = 20$, and $n$, for what value of $n$ will $\operatorname{Var}(X)$ be a maximum?

5. Suppose that $n$ students are selected at random without replacement from a class containing $T$ students, of whom $A$ are boys and $T - A$ are girls. Let $X$ denote the number of boys that are obtained. For what sample size $n$ will $\operatorname{Var}(X)$ be a maximum?

6. Suppose that $X_1$ and $X_2$ are independent random variables, that $X_1$ has the binomial distribution with parameters $n_1$ and $p$, and that $X_2$ has the binomial distribution with parameters $n_2$ and $p$, where $p$ is the same for both $X_1$ and $X_2$. For each fixed value of $k$ ($k = 1, 2, \ldots, n_1 + n_2$), prove that the conditional distribution of $X_1$ given that $X_1 + X_2 = k$ is hypergeometric with parameters $n_1$, $n_2$, and $k$.

7. Suppose that in a large lot containing $T$ manufactured items, 30 percent of the items are defective and 70 percent are nondefective. Also, suppose that ten items are selected at random without replacement from the lot. Determine (a) an exact expression for the probability that not more than one defective item will be obtained and (b) an approximate expression for this probability based on the binomial distribution.

8. Consider a group of $T$ persons, and let $a_1, \ldots, a_T$ denote the heights of these $T$ persons. Suppose that $n$ persons are selected from this group at random without replacement, and let $X$ denote the sum of the heights of these $n$ persons. Determine the mean and variance of $X$.

9. Find the value of $\binom{3/2}{4}$.

10. Show that for all positive integers $n$ and $k$,

\[ \binom{-n}{k} = (-1)^k \binom{n + k - 1}{k}. \]

11. Prove Theorem 5.3.3. Hint: Prove that

\[ \lim_{n \to \infty} \left[ c_n \log(1 + a_n) - a_n c_n \right] = 0 \]

by applying Taylor's theorem with remainder (see Exercise 13 in Sec. 4.2) to the function $f(x) = \log(1 + x)$ around $x = 0$.

5.4 The Poisson Distributions

Many experiments consist of observing the occurrence times of random arrivals. Examples include arrivals of customers for service, arrivals of calls at a switchboard, occurrences of floods and other natural and man-made disasters, and so forth. The family of Poisson distributions is used to model the number of such arrivals that occur in a fixed time period.
Poisson distributions are also useful approximations to binomial distributions with very small success probabilities.

Definition and Properties of the Poisson Distributions

Example 5.4.1 (Customer Arrivals). A store owner believes that customers arrive at his store at a rate of 4.5 customers per hour on average. He wants to find the distribution of the actual number $X$ of customers who will arrive during a particular one-hour period later in the day. He models customer arrivals in different time periods as independent of each other. As a first approximation, he divides the one-hour period into 3600 seconds and thinks of the arrival rate as being $4.5/3600 = 0.00125$ per second. He then says that during each second either 0 or 1 customers will arrive, and the probability of an arrival during any single second is 0.00125. He then tries to use the binomial distribution with parameters $n = 3600$ and $p = 0.00125$ for the distribution of the number of customers who arrive during the one-hour period later in the day.

He starts calculating $f$, the p.f. of this binomial distribution, and quickly discovers how cumbersome the calculations are. However, he realizes that the successive values of $f(x)$ are closely related to each other because $f(x)$ changes in a systematic way as $x$ increases. So he computes

\[ \frac{f(x + 1)}{f(x)} = \frac{\binom{n}{x + 1} p^{x + 1} (1 - p)^{n - x - 1}}{\binom{n}{x} p^{x} (1 - p)^{n - x}} = \frac{(n - x)p}{(x + 1)(1 - p)} \approx \frac{np}{x + 1}, \]

where the reasoning for the approximation at the end is as follows: For the first 30 or so values of $x$, $n - x$ is essentially the same as $n$ and dividing by $1 - p$ has almost no effect because $p$ is so small. For example, for $x = 30$, the actual value is 0.1441, while the approximation is 0.1452. This approximation suggests defining $\lambda = np$ and approximating $f(x + 1) \approx f(x)\lambda/(x + 1)$ for all the values of $x$ that matter. That is,

\[ f(1) = f(0)\lambda, \]
\[ f(2) = f(1)\frac{\lambda}{2} = f(0)\frac{\lambda^2}{2}, \]
\[ f(3) = f(2)\frac{\lambda}{3} = f(0)\frac{\lambda^3}{6}, \]
\[ \vdots \]

Continuing the pattern for all $x$ yields $f(x) = f(0)\lambda^x/x!$ for all $x$. To obtain a p.f. for $X$, he would need to make sure that $\sum_{x=0}^{\infty} f(x) = 1$. This is easily achieved by setting

\[ f(0) = \frac{1}{\sum_{x=0}^{\infty} \lambda^x/x!} = e^{-\lambda}, \]

where the last equality follows from the following well-known calculus result:

\[ e^{\lambda} = \sum_{x=0}^{\infty} \frac{\lambda^x}{x!}, \tag{5.4.1} \]

for all $\lambda > 0$. Hence, $f(x) = e^{-\lambda}\lambda^x/x!$ for $x = 0, 1, \ldots$ and $f(x) = 0$ otherwise is a p.f.

The approximation formula for the p.f. of a binomial distribution at the end of Example 5.4.1 is actually a useful p.f. that can model many phenomena of types similar to the arrivals of customers.

Definition 5.4.1 (Poisson Distribution). Let $\lambda > 0$. A random variable $X$ has the Poisson distribution with mean $\lambda$ if the p.f. of $X$ is as follows:

\[ f(x \mid \lambda) = \begin{cases} \dfrac{e^{-\lambda}\lambda^x}{x!} & \text{for } x = 0, 1, 2, \ldots, \\[2ex] 0 & \text{otherwise.} \end{cases} \tag{5.4.2} \]

At the end of Example 5.4.1, we proved that the function in Eq. (5.4.2) is indeed a p.f. In order to justify the phrase "with mean $\lambda$" in the definition of the distribution, we need to prove that the mean is indeed $\lambda$.

Theorem 5.4.1 (Mean). The mean of the distribution with p.f. equal to (5.4.2) is $\lambda$.
Proof. If $X$ has the distribution with p.f. $f(x \mid \lambda)$, then $E(X)$ is given by the following infinite series:

\[ E(X) = \sum_{x=0}^{\infty} x f(x \mid \lambda). \]

Since the term corresponding to $x = 0$ in this series is 0, we can omit this term and can begin the summation with the term for $x = 1$. Therefore,

\[ E(X) = \sum_{x=1}^{\infty} x f(x \mid \lambda) = \sum_{x=1}^{\infty} x \frac{e^{-\lambda}\lambda^x}{x!} = \lambda \sum_{x=1}^{\infty} \frac{e^{-\lambda}\lambda^{x-1}}{(x - 1)!}. \]

If we now let $y = x - 1$ in this summation, we obtain

\[ E(X) = \lambda \sum_{y=0}^{\infty} \frac{e^{-\lambda}\lambda^{y}}{y!}. \]

The sum of the series in this equation is the sum of $f(y \mid \lambda)$, which equals 1. Hence, $E(X) = \lambda$. ∎

Example 5.4.2 (Customer Arrivals). In Example 5.4.1, the store owner was approximating the binomial distribution with parameters 3600 and 0.00125 with a distribution that we now know as the Poisson distribution with mean $\lambda = 3600 \times 0.00125 = 4.5$. For $x = 0, \ldots, 9$, Table 5.1 has the binomial and corresponding Poisson probabilities.

The division of the one-hour period into 3600 seconds was somewhat arbitrary.
The owner could have divided the hour into 7200 half-seconds or 14400 quarter-seconds, etc. Regardless of how finely the time is divided, the product of the number of time intervals and the rate in customers per time interval will always be 4.5 because they are all based on a rate of 4.5 customers per hour. Perhaps the store owner would do better simply modeling the number $X$ of arrivals as a Poisson random variable with mean 4.5, rather than choosing an arbitrarily sized time interval to accommodate a tedious binomial calculation. The disadvantage to the Poisson model for $X$ is that there is positive probability that a Poisson random variable will be arbitrarily large, whereas a binomial random variable with parameters $n$ and $p$ can never exceed $n$. However, the probability is essentially 0 that a Poisson random variable with mean 4.5 will exceed 19.
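The comparison in Example 5.4.2 can be reproduced with a few lines of code. This is a minimal sketch using only the standard library; the function names are our own, and floating-point arithmetic is assumed to be accurate enough for five decimal places.

```python
import math


def binomial_pf(x, n, p):
    """Binomial p.f. with parameters n and p, evaluated at x."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pf(x, lam):
    """Poisson p.f. with mean lam, evaluated at x, as in Eq. (5.4.2)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


n, p = 3600, 0.00125   # one-second intervals, as in Example 5.4.1
lam = n * p            # Poisson mean, 4.5 up to rounding

rows = [(x, binomial_pf(x, n, p), poisson_pf(x, lam)) for x in range(10)]

if __name__ == "__main__":
    print("x  binomial  Poisson")
    for x, b, q in rows:
        print(f"{x}  {b:.5f}   {q:.5f}")
```

Changing `n` to 7200 and `p` to 0.000625 (half-second intervals) leaves `lam` unchanged, which is one way to see why the choice of interval length matters so little.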
Table 5.1 Binomial and Poisson probabilities in Example 5.4.2

x          0        1        2        3        4
Binomial   0.01108  0.04991  0.11241  0.16874  0.18991
Poisson    0.01111  0.04999  0.11248  0.16872  0.18981

x          5        6        7        8        9
Binomial   0.17094  0.12819  0.08237  0.04630  0.02313
Poisson    0.17083  0.12812  0.08237  0.04633  0.02317

Theorem 5.4.2 Variance. The variance of the Poisson distribution with mean $\lambda$ is also $\lambda$.

Proof The variance can be found by a technique similar to the one used in the proof of Theorem 5.4.1 to find the mean. We begin by considering the following expectation:
\[ E[X(X-1)] = \sum_{x=0}^{\infty} x(x-1) f(x|\lambda) = \sum_{x=2}^{\infty} x(x-1) f(x|\lambda) = \sum_{x=2}^{\infty} x(x-1) \frac{e^{-\lambda}\lambda^{x}}{x!} = \lambda^{2} \sum_{x=2}^{\infty} \frac{e^{-\lambda}\lambda^{x-2}}{(x-2)!}. \]
If we let $y = x - 2$, we obtain
\[ E[X(X-1)] = \lambda^{2} \sum_{y=0}^{\infty} \frac{e^{-\lambda}\lambda^{y}}{y!} = \lambda^{2}. \tag{5.4.3} \]
Since $E[X(X-1)] = E(X^{2}) - E(X) = E(X^{2}) - \lambda$, it follows from Eq. (5.4.3) that $E(X^{2}) = \lambda^{2} + \lambda$. Therefore,
\[ \mathrm{Var}(X) = E(X^{2}) - [E(X)]^{2} = \lambda. \tag{5.4.4} \]
Hence, the variance is also equal to $\lambda$. ■

Theorem 5.4.3 Moment Generating Function. The m.g.f.
of the Poisson distribution with mean $\lambda$ is
\[ \psi(t) = e^{\lambda(e^{t} - 1)}, \tag{5.4.5} \]
for all real $t$.

Proof For every value of $t$ ($-\infty < t < \infty$),
\[ \psi(t) = E(e^{tX}) = \sum_{x=0}^{\infty} \frac{e^{tx} e^{-\lambda} \lambda^{x}}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda e^{t})^{x}}{x!}. \]
It follows from Eq. (5.4.1) that, for $-\infty < t < \infty$,
\[ \psi(t) = e^{-\lambda} e^{\lambda e^{t}} = e^{\lambda(e^{t} - 1)}. \]
■

The mean and the variance, as well as all other moments, can be determined from the m.g.f. given in Eq. (5.4.5). We shall not derive the values of any other moments here, but we shall use the m.g.f. to derive the following property of Poisson distributions.

Theorem 5.4.4 If the random variables $X_1, \ldots, X_k$ are independent and if $X_i$ has the Poisson distribution with mean $\lambda_i$ ($i = 1, \ldots, k$), then the sum $X_1 + \cdots + X_k$ has the Poisson distribution with mean $\lambda_1 + \cdots + \lambda_k$.

Proof Let $\psi_i(t)$ denote the m.g.f. of $X_i$ for $i = 1, \ldots, k$, and let $\psi(t)$ denote the m.g.f. of the sum $X_1 + \cdots + X_k$. Since $X_1, \ldots, X_k$ are independent, it follows that, for $-\infty < t < \infty$,
\[ \psi(t) = \prod_{i=1}^{k} \psi_i(t) = \prod_{i=1}^{k} e^{\lambda_i (e^{t} - 1)} = e^{(\lambda_1 + \cdots + \lambda_k)(e^{t} - 1)}. \]
It can be seen from Eq. (5.4.5) that this m.g.f. $\psi(t)$ is the m.g.f. of the Poisson distribution with mean $\lambda_1 + \cdots + \lambda_k$. Hence, the distribution of $X_1 + \cdots + X_k$ must be as stated in the theorem. ■

A table of probabilities for Poisson distributions with various values of the mean $\lambda$ is given at the end of this book.

Example 5.4.3 Customer Arrivals. Suppose that the store owner in Examples 5.4.1 and 5.4.2 is interested not only in the number of customers that arrive in the one-hour period, but also in how many customers arrive in the next hour after that period. Let $Y$ be the number of customers that arrive in the second hour. By the reasoning at the end of Example 5.4.2, the owner might model $Y$ as a Poisson random variable with mean 4.5. He would also say that $X$ and $Y$ are independent because he has been assuming that arrivals in disjoint time intervals are independent. According to Theorem 5.4.4, $X + Y$ would have the Poisson distribution with mean $4.5 + 4.5 = 9$. What
is the probability that at least 12 customers will arrive in the entire two-hour period? We can use the table of Poisson probabilities in the back of this book by looking in the $\lambda = 9$ column. Either add up the numbers corresponding to $k = 0, \ldots, 11$ and subtract the total from 1, or add up those from $k = 12$ to the end. Either way, the result is $\Pr(X + Y \geq 12) = 0.1970$. ◀

The Poisson Approximation to Binomial Distributions

In Examples 5.4.1 and 5.4.2, we illustrated how close the Poisson distribution with mean 4.5 is to the binomial distribution with parameters 3600 and 0.00125. We shall now demonstrate a general version of that result, namely, that when the value of $n$ is large and the value of $p$ is close to 0, the binomial distribution with parameters $n$ and $p$ can be approximated by the Poisson distribution with mean $np$.

Theorem 5.4.5 Closeness of Binomial and Poisson Distributions. For each integer $n$ and each $0 < p < 1$, let $f(x|n, p)$ denote the p.f. of the binomial distribution with parameters $n$ and $p$. Let $f(x|\lambda)$ denote the p.f.
of the Poisson distribution with mean $\lambda$. Let $\{p_n\}_{n=1}^{\infty}$ be a sequence of numbers between 0 and 1 such that $\lim_{n\to\infty} n p_n = \lambda$. Then
\[ \lim_{n\to\infty} f(x|n, p_n) = f(x|\lambda), \]
for all $x = 0, 1, \ldots$.

Proof We begin by writing
\[ f(x|n, p_n) = \frac{n(n-1)\cdots(n-x+1)}{x!} \, p_n^{x} (1 - p_n)^{n-x}. \]
Next, let $\lambda_n = n p_n$ so that $\lim_{n\to\infty} \lambda_n = \lambda$. Then $f(x|n, p_n)$ can be rewritten in the following form:
\[ f(x|n, p_n) = \frac{\lambda_n^{x}}{x!} \cdot \frac{n}{n} \cdot \frac{n-1}{n} \cdots \frac{n-x+1}{n} \left(1 - \frac{\lambda_n}{n}\right)^{n} \left(1 - \frac{\lambda_n}{n}\right)^{-x}. \tag{5.4.6} \]
For each $x \geq 0$,
\[ \lim_{n\to\infty} \frac{n}{n} \cdot \frac{n-1}{n} \cdots \frac{n-x+1}{n} \left(1 - \frac{\lambda_n}{n}\right)^{-x} = 1. \]
Furthermore, it follows from Theorem 5.3.3 that
\[ \lim_{n\to\infty} \left(1 - \frac{\lambda_n}{n}\right)^{n} = e^{-\lambda}. \tag{5.4.7} \]
Since also $\lambda_n^{x} \to \lambda^{x}$, it now follows from Eq. (5.4.6) that for every $x \geq 0$,
\[ \lim_{n\to\infty} f(x|n, p_n) = \frac{e^{-\lambda}\lambda^{x}}{x!} = f(x|\lambda). \]
■

Example 5.4.4 Approximating a Probability. Suppose that in a large population the proportion of people who have a certain disease is 0.01. We shall determine the probability that in a random group of 200 people at least four people will have the disease.
In this example, we can assume that the exact distribution of the number of people having the disease among the 200 people in the random group is the binomial distribution with parameters $n = 200$ and $p = 0.01$. Therefore, this distribution can be approximated by the Poisson distribution for which the mean is $\lambda = np = 2$. If $X$ denotes a random variable having this Poisson distribution, then it can be found from the table of the Poisson distribution at the end of this book that $\Pr(X \geq 4) = 0.1428$. Hence, the probability that at least four people will have the disease is approximately 0.1428. The actual value is 0.1420. ◀

Theorem 5.4.5 says that if $n$ is large and $p$ is small so that $np$ is close to $\lambda$, then the binomial distribution with parameters $n$ and $p$ is close to the Poisson distribution with mean $\lambda$. Recall Theorem 5.3.4, which says that if $A$ and $B$ are large compared to $n$ and if $A/(A + B)$ is close to $p$, then the hypergeometric distribution with parameters $A$, $B$,
and $n$ is close to the binomial distribution with parameters $n$ and $p$. These two results can be combined into the following theorem, whose proof is left to Exercise 17.

Theorem 5.4.6 Closeness of Hypergeometric and Poisson Distributions. Let $\lambda > 0$. Let $Y$ have the Poisson distribution with mean $\lambda$. For each positive integer $T$, let $A_T$, $B_T$, and $n_T$ be integers such that $\lim_{T\to\infty} A_T = \infty$, $\lim_{T\to\infty} B_T = \infty$, $\lim_{T\to\infty} n_T = \infty$, and $\lim_{T\to\infty} n_T A_T/(A_T + B_T) = \lambda$. Let $X_T$ have the hypergeometric distribution with parameters $A_T$, $B_T$, and $n_T$. For each fixed $x = 0, 1, \ldots$,
\[ \lim_{T\to\infty} \frac{\Pr(Y = x)}{\Pr(X_T = x)} = 1. \]

Poisson Processes

Example 5.4.5 Customer Arrivals. In Example 5.4.3, the store owner believes that the number of customers that arrive in each one-hour period has the Poisson distribution with mean 4.5. What if the owner is interested in a half-hour period or a 4-hour and 15-minute period? Is it safe to assume that the number of customers that arrive in a half-hour period has the Poisson distribution with mean 2.25? ◀
In order to be sure that all of the distributions for the various numbers of arrivals in Example 5.4.5 are consistent with each other, the store owner needs to think about the overall process of customer arrivals, not just a few isolated time periods. The following definition gives a model for the overall process of arrivals that will allow the store owner to construct distributions for all the counts of customer arrivals that interest him as well as other useful things.

Definition 5.4.2 Poisson Process. A Poisson process with rate $\lambda$ per unit time is a process that satisfies the following two properties:
i. The number of arrivals in every fixed interval of time of length $t$ has the Poisson distribution with mean $\lambda t$.
ii. The numbers of arrivals in every collection of disjoint time intervals are independent.

The answer to the question at the end of Example 5.4.5 will be "yes" if the store owner makes the assumption that customers arrive according to a Poisson process with rate 4.5 per hour. Here is another example.

Example 5.4.6 Radioactive Particles. Suppose that radioactive particles strike a certain target in accordance with a Poisson process at an average rate of three particles per minute. We shall determine the probability that 10 or more particles will strike the target in a particular two-minute period.

In a Poisson process, the number of particles striking the target in any particular one-minute period has the Poisson distribution with mean $\lambda$.
Since the mean number of strikes in any one-minute period is 3, it follows that $\lambda = 3$ in this example. Therefore, the number of strikes $X$ in any two-minute period will have the Poisson distribution with mean 6. It can be found from the table of the Poisson distribution at the end of this book that $\Pr(X \geq 10) = 0.0838$. ◀

Note: Generality of Poisson Processes. Although we have introduced Poisson processes in terms of counts of arrivals during time intervals, Poisson processes are actually more general. For example, a Poisson process can be used to model occurrences in space as well as time. A Poisson process could be used to model telephone calls arriving at a switchboard, atomic particles emitted from a radioactive source, diseased trees in a forest, or defects on the surface of a manufactured product. The reason for the popularity of the Poisson process model is twofold. First, the model is computationally convenient. Second, there is a mathematical justification for the model if one makes three plausible assumptions about how the phenomena occur. We shall present the three assumptions in some detail after another example.

Example 5.4.7 Cryptosporidium in Drinking Water. Cryptosporidium is a genus of protozoa that occurs as small oocysts and can cause painful sickness and even death when ingested. Occasionally, oocysts are detected in public drinking water supplies. A concentration as low as one oocyst per five liters can be enough to trigger a boil-water advisory. In April 1993, many thousands of people became ill during a cryptosporidiosis outbreak in Milwaukee, Wisconsin. Different water systems have different systems for monitoring protozoa occurrence in drinking water. One problem with monitoring systems is that detection technology is not always very sensitive. One popular technique is to push a large amount of water through a very fine filter and then treat the material captured on the filter in a way that identifies Cryptosporidium oocysts.
The number of oocysts is then counted and recorded. Even if there is an oocyst on the filter, the probability can be as low as 0.1 that it will get counted.

Suppose that, in a particular water supply, oocysts occur according to a Poisson process with rate $\lambda$ oocysts per liter. Suppose that the filtering system is capable of capturing all oocysts in a sample, but that the counting system has probability $p$ of actually observing each oocyst that is on the filter. Assume that the counting system observes or misses each oocyst on the filter independently. What is the distribution of the number of counted oocysts from $t$ liters of filtered water?
Let $Y$ be the number of oocysts in the $t$ liters (all of which make it onto the filter). Then $Y$ has the Poisson distribution with mean $\lambda t$. Let $X_i = 1$ if the $i$th oocyst on the filter gets counted, and $X_i = 0$ if not. Let $X$ be the counted number of oocysts so that $X = X_1 + \cdots + X_y$ if $Y = y$. Conditional on $Y = y$, we have assumed that the $X_i$ are independent Bernoulli random variables with parameter $p$, so $X$ has the binomial distribution with parameters $y$ and $p$ conditional on $Y = y$. We want the marginal distribution of $X$. This can be found using the law of total probability for random variables (3.6.11). For $x = 0, 1, \ldots$,
\begin{align*}
f_1(x) &= \sum_{y=0}^{\infty} g_1(x|y) f_2(y) \\
&= \sum_{y=x}^{\infty} \binom{y}{x} p^{x} (1-p)^{y-x} e^{-\lambda t} \frac{(\lambda t)^{y}}{y!} \\
&= e^{-\lambda t} \frac{(p\lambda t)^{x}}{x!} \sum_{y=x}^{\infty} \frac{[\lambda t(1-p)]^{y-x}}{(y-x)!} \\
&= e^{-\lambda t} \frac{(p\lambda t)^{x}}{x!} \sum_{u=0}^{\infty} \frac{[\lambda t(1-p)]^{u}}{u!} \\
&= e^{-\lambda t} \frac{(p\lambda t)^{x}}{x!} e^{\lambda t(1-p)} = e^{-p\lambda t} \frac{(p\lambda t)^{x}}{x!}.
\end{align*}
This is easily recognized as the p.f. of the Poisson distribution with mean $p\lambda t$. The effect of losing a fraction $1 - p$ of the oocyst count is merely to lower the rate of the Poisson process from $\lambda$ per liter to $p\lambda$ per liter.

Suppose that $\lambda = 0.2$ and $p = 0.1$. How much water must we filter in order for there to be probability at least 0.9 that we will count at least one oocyst? The probability of counting at least one oocyst is 1 minus the probability of counting none, and the probability of counting none is $e^{-p\lambda t} = e^{-0.02t}$. So, we need $t$ large enough so that $1 - e^{-0.02t} \geq 0.9$, that is, $t \geq \ln(10)/0.02$, or about 115 liters. A typical procedure is to test 100 liters, which would have probability $1 - e^{-0.02 \times 100} = 0.86$ of detecting at least one oocyst. ◀
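The thinning result in Example 5.4.7 can be checked both in closed form and by simulation. The sketch below is not from the original text: the function names are hypothetical, and the Poisson sampler uses Knuth's multiplication method, which is an assumption chosen only to keep the example within the Python standard library.

```python
import math
import random

def liters_needed(lam, p, target):
    """Smallest t with 1 - exp(-p*lam*t) >= target."""
    return -math.log(1 - target) / (p * lam)

def detection_probability(lam, p, t):
    """Probability of counting at least one oocyst in t liters."""
    return 1 - math.exp(-p * lam * t)

def sample_poisson(mean, rng):
    """Draw one Poisson variate (Knuth's method; adequate for moderate means)."""
    limit = math.exp(-mean)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_counts(lam, p, t, trials, seed=0):
    """Draw Y ~ Poisson(lam*t), then count each of the Y oocysts with probability p."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        y = sample_poisson(lam * t, rng)
        counts.append(sum(1 for _ in range(y) if rng.random() < p))
    return counts

lam, p = 0.2, 0.1                                  # values used in the example
t_min = liters_needed(lam, p, 0.9)                 # about 115 liters
prob_100 = detection_probability(lam, p, 100)      # about 0.86
counts = simulate_counts(lam, p, 100, trials=20000)
sample_mean = sum(counts) / len(counts)
sample_var = sum((c - sample_mean) ** 2 for c in counts) / (len(counts) - 1)

if __name__ == "__main__":
    print(f"liters needed: {t_min:.2f}")
    print(f"Pr(at least one count in 100 liters): {prob_100:.4f}")
    print(f"simulated mean {sample_mean:.3f}, variance {sample_var:.3f}")
```

If the derivation holds, the simulated mean and variance should both be close to $p\lambda t = 2$, as expected for a Poisson distribution with that mean.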
Assumptions Underlying the Poisson Process Model

In what follows, we shall refer to time intervals, but the assumptions can be used equally well for subregions of two- or three-dimensional regions or sublengths of a linear distance. Indeed, a Poisson process can be used to model occurrences in any region that can be subdivided into arbitrarily small pieces. There are three assumptions that lead to the Poisson process model.

The first assumption is that the numbers of occurrences in any collection of disjoint intervals of time must be mutually independent. For example, even though an unusually large number of telephone calls are received at a switchboard during a particular interval, the probability that at least one call will be received during a forthcoming interval remains unchanged.
Similarly, even though no call has been received at the switchboard for an unusually long interval, the probability that a call will be received during the next short interval remains unchanged.

The second assumption is that the probability of an occurrence during each very short interval of time must be approximately proportional to the length of that interval. To express this condition more formally, we shall use the standard mathematical notation in which $o(t)$ denotes any function of $t$ having the property that
\[ \lim_{t\to 0} \frac{o(t)}{t} = 0. \tag{5.4.8} \]
According to (5.4.8), $o(t)$ must be a function that approaches 0 as $t \to 0$, and, furthermore, this function must approach 0 at a rate faster than $t$ itself. An example of such a function is $o(t) = t^{\alpha}$, where $\alpha > 1$. This function satisfies Eq. (5.4.8) because $t^{\alpha}/t = t^{\alpha - 1} \to 0$ as $t \to 0$. The second assumption can now be expressed as follows: There exists a constant $\lambda > 0$ such that for every time interval of length $t$, the probability of at least one occurrence during that interval has the form $\lambda t + o(t)$. Thus, for every very small value of $t$, the probability of at least one occurrence during an interval of length $t$ is equal to $\lambda t$ plus a quantity having a smaller order of magnitude.

One of the consequences of the second assumption is that the process being observed must be stationary over the entire period of observation; that is, the probability of an occurrence must be the same over the entire period.
There can be neither busy intervals, during which we know in advance that occurrences are likely to be more frequent, nor quiet intervals, during which we know in advance that occurrences are likely to be less frequent. This condition is reflected in the fact that the same constant $\lambda$ expresses the probability of an occurrence in every interval over the entire period of observation. The second assumption can be relaxed at the cost of more complicated mathematics, but we shall not do so here.

The third assumption is that, for each very short interval of time, the probability that there will be two or more occurrences in that interval must have a smaller order of magnitude than the probability that there will be just one occurrence. In symbols, the probability of two or more occurrences in a time interval of length $t$ must be $o(t)$. Thus, the probability of two or more occurrences in a small interval must be negligible in comparison with the probability of one occurrence in that interval. Of course, it follows from the second assumption that the probability of one occurrence in that same interval will itself be negligible in comparison with the probability of no occurrences.

Under the preceding three assumptions, it can be shown that the process will satisfy the definition of a Poisson process with rate $\lambda$. See Exercise 16 in this section for one method of proof.

Summary

Poisson distributions are used to model data that arrive as counts. A Poisson process with rate $\lambda$ is a model for random occurrences that have a constant expected rate $\lambda$ per unit time (or per unit area). We must assume that occurrences in disjoint time intervals (or disjoint areas) are independent and that two or more occurrences cannot happen at the same time (or place). The number of occurrences in an interval of length (or area of size) $t$ has the Poisson distribution with mean $t\lambda$.
If $n$ is large and $p$ is small, then the binomial distribution with parameters $n$ and $p$ is approximately the same as the Poisson distribution with mean $np$.
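The table look-ups quoted in Examples 5.4.3, 5.4.4, and 5.4.6 can be cross-checked by summing the p.f. directly. This is a minimal sketch with hypothetical helper names, not the book's procedure. Because the book's figures come from adding rounded table entries, the computed values may differ from them in the fourth decimal place.

```python
import math

def poisson_tail(k, mean):
    """Pr(X >= k) for X with the Poisson distribution with the given mean."""
    return 1 - sum(math.exp(-mean) * mean**x / math.factorial(x) for x in range(k))

def binomial_tail(k, n, p):
    """Pr(X >= k) for X with the binomial distribution with parameters n and p."""
    return 1 - sum(math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k))

two_hour_arrivals = poisson_tail(12, 9.0)      # Example 5.4.3, text reports 0.1970
disease_poisson = poisson_tail(4, 2.0)         # Example 5.4.4, text reports 0.1428
disease_exact = binomial_tail(4, 200, 0.01)    # Example 5.4.4, text reports 0.1420
particle_strikes = poisson_tail(10, 2 * 3.0)   # Example 5.4.6, rate 3 per minute for 2 minutes

if __name__ == "__main__":
    print(f"{two_hour_arrivals:.4f} {disease_poisson:.4f} "
          f"{disease_exact:.4f} {particle_strikes:.4f}")
```

The comparison of `disease_poisson` with `disease_exact` is a direct instance of the binomial approximation stated in the summary, with $n = 200$, $p = 0.01$, and $np = 2$.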
Exercises

1. In Example 5.4.7, with λ = 0.2 and p = 0.1, compute the probability that we would detect at least two oocysts after filtering 100 liters of water.

2. Suppose that on a given weekend the number of accidents at a certain intersection has the Poisson distribution with mean 0.7. What is the probability that there will be at least three accidents at the intersection during the weekend?

3. Suppose that the number of defects on a bolt of cloth produced by a certain process has the Poisson distribution with mean 0.4. If a random sample of five bolts of cloth is inspected, what is the probability that the total number of defects on the five bolts will be at least 6?

4. Suppose that in a certain book there are on the average λ misprints per page and that misprints occurred according to a Poisson process. What is the probability that a particular page will contain no misprints?

5. Suppose that a book with n pages contains on the average λ misprints per page. What is the probability that there will be at least m pages which contain more than k misprints?

6. Suppose that a certain type of magnetic tape contains on the average three defects per 1000 feet. What is the probability that a roll of tape 1200 feet long contains no defects?

7. Suppose that on the average a certain store serves 15 customers per hour. What is the probability that the store will serve more than 20 customers in a particular two-hour period?

8. Suppose that $X_1$ and $X_2$ are independent random variables and that $X_i$ has the Poisson distribution with mean $\lambda_i$ (i = 1, 2). For each fixed value of k (k = 1, 2, ...), determine the conditional distribution of $X_1$ given that $X_1 + X_2 = k$.

9. Suppose that the total number of items produced by a certain machine has the Poisson distribution with mean λ, all items are produced independently of one another, and the probability that any given item produced by the machine will be defective is p. Determine the marginal distribution of the number of defective items produced by the machine.

10. For the problem described in Exercise 9, let X denote the number of defective items produced by the machine, and let Y denote the number of nondefective items produced by the machine. Show that X and Y are independent random variables.

11. The mode of a discrete distribution was defined in Exercise 12 of Sec. 5.2. Determine the mode or modes of the Poisson distribution with mean λ.

12. Suppose that the proportion of colorblind people in a certain population is 0.005. What is the probability that there will not be more than one colorblind person in a randomly chosen group of 600 people?

13. The probability of triplets in human births is approximately 0.001. What is the probability that there will be exactly one set of triplets among 700 births in a large hospital?

14. An airline sells 200 tickets for a certain flight on an airplane that has only 198 seats because, on the average, 1 percent of purchasers of airline tickets do not appear for the departure of their flight. Determine the probability that everyone who appears for the departure of this flight will have a seat.

15. Suppose that internet users access a particular Web site according to a Poisson process with rate Λ per hour, but Λ is unknown. The Web site maintainer believes that Λ has a continuous distribution with p.d.f.
\[ f(\lambda) = \begin{cases} 2e^{-2\lambda} & \text{for } \lambda > 0, \\ 0 & \text{otherwise.} \end{cases} \]
Let X be the number of users who access the Web site during a one-hour period. If X = 1 is observed, find the conditional p.d.f. of Λ given X = 1.

16. In this exercise, we shall prove that the three assumptions underlying the Poisson process model do indeed imply that occurrences happen according to a Poisson process. What we need to show is that, for each t, the number of occurrences during a time interval of length t has the Poisson distribution with mean λt. Let X stand for the number of occurrences during a particular time interval of length t. Feel free to use the following extension of Eq. (5.4.7): For all real a,
\[ \lim_{u \to 0} (1 + au + o(u))^{1/u} = e^{a}. \tag{5.4.9} \]
a. For each positive integer n, divide the time interval into n disjoint subintervals of length t/n each. For i = 1, ..., n, let $Y_i = 1$ if exactly one arrival occurs in the ith subinterval, and let $A_i$ be the event that two or more occurrences occur during the ith subinterval. Let $W_n = \sum_{i=1}^{n} Y_i$. For each nonnegative integer k, show that we can write $\Pr(X = k) = \Pr(W_n = k) + \Pr(B)$, where B is a subset of $\bigcup_{i=1}^{n} A_i$.
b. Show that $\lim_{n \to \infty} \Pr\left(\bigcup_{i=1}^{n} A_i\right) = 0$. Hint: Show that $\Pr\left(\bigcap_{i=1}^{n} A_i^c\right) = (1 + o(u))^{1/u}$ where u = 1/n.
c. Show that $\lim_{n \to \infty} \Pr(W_n = k) = e^{-\lambda t}(\lambda t)^k / k!$. Hint: $\lim_{n \to \infty} n!/[n^k (n - k)!] = 1$.
d. Show that X has the Poisson distribution with mean λt.

17. Prove Theorem 5.4.6. One approach is to adapt the proof of Theorem 5.3.4 by replacing n by $n_T$ in that proof. The steps of the proof that are significantly different are the following. (i) You will need to show that $B_T - n_T$ goes to ∞. (ii) The three limits that depend on Theorem 5.3.3 need to be rewritten as ratios converging to 1. For example, the second one is rewritten as
\[ \lim_{T \to \infty} \left( \frac{B_T}{B_T - n_T + x} \right)^{B_T - n_T + x + 1/2} e^{-n_T + x} = 1. \]
You'll need a couple more such limits as well. (iii) Instead of (5.3.12), prove that
\[ \lim_{T \to \infty} \frac{n_T^{x} A_T^{x} B_T^{n_T - x}}{(A_T + B_T)^{n_T}} = \lambda^{x} e^{-\lambda}. \]

18. Let $A_T$, $B_T$, and $n_T$ be sequences, all three of which go to ∞ as T → ∞. Prove that $\lim_{T \to \infty} n_T A_T/(A_T + B_T) = \lambda$ if and only if $\lim_{T \to \infty} n_T A_T/B_T = \lambda$.

5.5 The Negative Binomial Distributions

Earlier we learned that, in n Bernoulli trials with probability of success p, the number of successes has the binomial distribution with parameters n and p. Instead of counting successes in a fixed number of trials, it is often necessary to observe the trials until we see a fixed number of successes. For example, while monitoring a piece of equipment to see when it needs maintenance, we might let it run until it produces a fixed number of errors and then repair it. The number of failures until a fixed number of successes has a distribution in the family of negative binomial distributions.
Definition and Interpretation

Example 5.5.1 Defective Parts. Suppose that a machine produces parts that can be either good or defective. Let $X_i = 1$ if the ith part is defective and $X_i = 0$ otherwise. Assume that the parts are good or defective independently of each other with $\Pr(X_i = 1) = p$ for all i. An inspector observes the parts produced by this machine until she sees four defectives. Let X be the number of good parts observed by the time that the fourth defective is observed. What is the distribution of X? ◀

The problem described in Example 5.5.1 is typical of a general situation in which a sequence of Bernoulli trials can be observed. Suppose that an infinite sequence of Bernoulli trials is available. Call the two possible outcomes success and failure, with p being the probability of success. In this section, we shall study the distribution of the total number of failures that will occur before exactly r successes have been obtained, where r is a fixed positive integer.
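The sampling scheme just described can be imitated by simulation. The sketch below is only an illustration: the defect probability p = 0.2, the seed, and the number of repetitions are assumptions chosen for the example, and the function name `failures_before_rth_success` is hypothetical.

```python
import random

def failures_before_rth_success(r, p, rng):
    """Run Bernoulli trials until the r-th success; return the number of failures."""
    successes = 0
    failures = 0
    while successes < r:
        if rng.random() < p:  # a success occurs with probability p
            successes += 1
        else:
            failures += 1
    return failures

rng = random.Random(0)  # fixed seed so that a run is repeatable
p = 0.2                 # assumed defect probability (not from the text)
r = 4                   # the inspector stops at the fourth defective part

# Each draw is one realization of X, the number of good parts observed.
draws = [failures_before_rth_success(r, p, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
print(f"average number of good parts before defective number {r}: {sample_mean:.2f}")
```

A histogram of `draws` gives an empirical picture of the distribution of X asked for in Example 5.5.1.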
Theorem 5.5.1 Sampling until a Fixed Number of Successes. Suppose that an infinite sequence of Bernoulli trials with probability of success p is available. The number X of failures that occur before the rth success has the following p.f.:
\[ f(x \mid r, p) = \begin{cases} \dbinom{r + x - 1}{x} p^{r} (1 - p)^{x} & \text{for } x = 0, 1, 2, \ldots, \\ 0 & \text{otherwise.} \end{cases} \tag{5.5.1} \]

Proof For n = r, r + 1, ..., we shall let $A_n$ denote the event that the total number of trials required to obtain exactly r successes is n. As explained in Example 2.2.8, the event $A_n$ will occur if and only if exactly r − 1 successes occur among the first n − 1 trials and the rth success is obtained on the nth trial. Since all trials are independent, it follows that
\[ \Pr(A_n) = \binom{n - 1}{r - 1} p^{r - 1} (1 - p)^{(n - 1) - (r - 1)} \cdot p = \binom{n - 1}{r - 1} p^{r} (1 - p)^{n - r}. \tag{5.5.2} \]
For each value of x (x = 0, 1, 2, ...), the event that exactly x failures are obtained before the rth success is obtained is the same as the event that the total number of trials required to obtain r successes is r + x. In other words, if X denotes the number of failures that will occur before the rth success is obtained, then $\Pr(X = x) = \Pr(A_{r+x})$. Eq. (5.5.1) now follows from Eq. (5.5.2) with n = r + x, since $\binom{r + x - 1}{r - 1} = \binom{r + x - 1}{x}$.

Definition 5.5.1 Negative Binomial Distribution. A random variable X has the negative binomial distribution with parameters r and p (r = 1, 2, ... and 0 < p < 1) if X has a discrete distribution for which the p.f. f(x|r, p) is as specified by Eq. (5.5.1).

Example 5.5.2 Defective Parts. Example 5.5.1 is worded so that defective parts are successes and good parts are failures. The distribution of the number X of good parts observed by the time of the fourth defective is the negative binomial distribution with parameters 4 and p. ◀

The Geometric Distributions

The most common special case of a negative binomial random variable is one for which r = 1. This would be the number of failures until the first success.

Definition 5.5.2 Geometric Distribution. A random variable X has the geometric distribution with parameter p (0 < p < 1) if X has a discrete distribution for which the p.f. f(x|1, p) is as follows:
\[ f(x \mid 1, p) = \begin{cases} p (1 - p)^{x} & \text{for } x = 0, 1, 2, \ldots, \\ 0 & \text{otherwise.} \end{cases} \tag{5.5.3} \]

Example 5.5.3 Triples in the Lottery. A common daily lottery game involves the drawing of three digits from 0 to 9 independently with replacement and independently from day to day. Lottery watchers often get excited when all three digits are the same, an event called triples. If p is the probability of obtaining triples, and if X is the number of days without triples before the first triple is observed, then X has the geometric distribution with parameter p. In this case, it is easy to see that p = 0.01, since there are 10 different triples among the 1000 equally likely daily numbers. ◀

The relationship between geometric and negative binomial distributions goes beyond the fact that the geometric distributions are special cases of negative binomial distributions.
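Eq. (5.5.1) and its special case Eq. (5.5.3) can be evaluated directly. The following sketch assumes integer r ≥ 1; the function names and the parameter values r = 4, p = 0.3 used for the check are illustrative assumptions, while p = 0.01 is the lottery value from Example 5.5.3.

```python
import math

def neg_binomial_pmf(x, r, p):
    """f(x | r, p) from Eq. (5.5.1), for integer r >= 1 and 0 < p < 1."""
    if x < 0:
        return 0.0
    return math.comb(r + x - 1, x) * p**r * (1 - p) ** x

def geometric_pmf(x, p):
    """f(x | 1, p) from Eq. (5.5.3)."""
    return p * (1 - p) ** x if x >= 0 else 0.0

# The probabilities should add up to 1; the sum is truncated at x = 499,
# where the remaining tail is negligible for these assumed parameters.
total = sum(neg_binomial_pmf(x, 4, 0.3) for x in range(500))
print(f"sum of f(x | 4, 0.3) over x = 0..499: {total:.12f}")

# With r = 1 the negative binomial p.f. reduces to the geometric p.f.
p_lottery = 0.01
same = all(
    math.isclose(neg_binomial_pmf(x, 1, p_lottery), geometric_pmf(x, p_lottery))
    for x in range(200)
)
print(f"r = 1 agrees with the geometric p.f.: {same}")
```

For non-integer r, which `math.comb` does not accept, the binomial coefficient would have to be computed another way, for example through the gamma function.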
Theorem 5.5.2 If $X_1, \ldots, X_r$ are i.i.d. random variables and if each $X_i$ has the geometric distribution with parameter p, then the sum $X_1 + \cdots + X_r$ has the negative binomial distribution with parameters r and p.

Proof Consider an infinite sequence of Bernoulli trials with success probability p. Let $X_1$ denote the number of failures that occur before the first success is obtained; then $X_1$ will have the geometric distribution with parameter p.

Now continue observing the Bernoulli trials after the first success. For j = 2, 3, ..., let $X_j$ denote the number of failures that occur after j − 1 successes have been obtained but before the jth success is obtained. Since all the trials are independent and the probability of obtaining a success on each trial is p, it follows that each random variable $X_j$ will have the geometric distribution with parameter p and that the random variables $X_1, X_2, \ldots$ will be independent.
Furthermore, for r = 1, 2, ..., the sum $X_1 + \cdots + X_r$ will be equal to the total number of failures that occur before exactly r successes have been obtained. Therefore, this sum will have the negative binomial distribution with parameters r and p.

Properties of Negative Binomial and Geometric Distributions

Theorem 5.5.3 Moment Generating Function. If X has the negative binomial distribution with parameters r and p, then the m.g.f. of X is as follows:
\[ \psi(t) = \left( \frac{p}{1 - (1 - p)e^{t}} \right)^{r} \quad \text{for } t < \log\left( \frac{1}{1 - p} \right). \tag{5.5.4} \]
The m.g.f. of the geometric distribution with parameter p is the special case of Eq. (5.5.4) with r = 1.

Proof Let $X_1, \ldots, X_r$ be a random sample of r geometric random variables each with parameter p. We shall find the m.g.f. of $X_1$ and then apply Theorems 4.4.4 and 5.5.2 to find the m.g.f. of the negative binomial distribution with parameters r and p.

The m.g.f. $\psi_1(t)$ of $X_1$ is
\[ \psi_1(t) = E(e^{tX_1}) = p \sum_{x=0}^{\infty} [(1 - p)e^{t}]^{x}. \tag{5.5.5} \]
The infinite series in Eq. (5.5.5) will have a finite sum for every value of t such that $0 < (1 - p)e^{t} < 1$, that is, for t < log(1/[1 − p]). It is known from elementary calculus that for every number a (0 < a < 1),
\[ \sum_{x=0}^{\infty} a^{x} = \frac{1}{1 - a}. \]
Therefore, for t < log(1/[1 − p]), the m.g.f. of the geometric distribution with parameter p is
\[ \psi_1(t) = \frac{p}{1 - (1 - p)e^{t}}. \tag{5.5.6} \]
Each of $X_1, \ldots, X_r$ has the same m.g.f., namely, $\psi_1$. According to Theorem 4.4.4, the m.g.f. of $X = X_1 + \cdots + X_r$ is $\psi(t) = [\psi_1(t)]^{r}$. Theorem 5.5.2 says that X has the negative binomial distribution with parameters r and p, and hence the m.g.f. of X is $[\psi_1(t)]^{r}$, which is the same as Eq. (5.5.4).

Theorem 5.5.4 Mean and Variance. If X has the negative binomial distribution with parameters r and p, the mean and the variance of X must be
\[ E(X) = \frac{r(1 - p)}{p} \quad \text{and} \quad \operatorname{Var}(X) = \frac{r(1 - p)}{p^{2}}. \tag{5.5.7} \]
The mean and variance of the geometric distribution with parameter p are the special case of Eq. (5.5.7) with r = 1.

Proof Let $X_1$ have the geometric distribution with parameter p. We will find the mean and variance by differentiating the m.g.f. Eq. (5.5.5):
\[ E(X_1) = \psi_1'(0) = \frac{1 - p}{p}, \tag{5.5.8} \]
\[ \operatorname{Var}(X_1) = \psi_1''(0) - [\psi_1'(0)]^{2} = \frac{1 - p}{p^{2}}. \tag{5.5.9} \]
If X has the negative binomial distribution with parameters r and p, represent it as the sum $X = X_1 + \cdots + X_r$ of r independent random variables, each having the same distribution as $X_1$. Eq. (5.5.7) now follows from Eqs. (5.5.8) and (5.5.9).

Example 5.5.4 Triples in the Lottery. In Example 5.5.3, the number X of daily draws without a triple until we see a triple has the geometric distribution with parameter p = 0.01. The total number of days until we see the first triple is then X + 1. So, the expected number of days until we observe triples is E(X) + 1 = 100. Now suppose that a lottery player has been waiting 120 days for triples to occur. Such a player might conclude from the preceding calculation that triples are "due." The most straightforward way to address such a claim would be to start by calculating the conditional distribution of X given that X ≥ 120. ◀

The next result says that the lottery player at the end of Example 5.5.4 couldn't be farther from correct. Regardless of how long he has waited for triples, the time remaining until triples occur has the same geometric distribution (and the same mean) as it had when he started waiting. The proof is simple and is left as Exercise 8.

Theorem 5.5.5 Memoryless Property of Geometric Distributions. Let X have the geometric distribution with parameter p, and let k ≥ 0. Then for every integer t ≥ 0,
\[ \Pr(X = k + t \mid X \ge k) = \Pr(X = t). \]

The intuition behind Theorem 5.5.5 is the following: Think of X as the number of failures until the first success in a sequence of Bernoulli trials. Let Y be the number of failures starting with the (k + 1)st trial until the next success. Then Y has the same distribution as X and is independent of the first k trials. Hence, conditioning on anything that happened on the first k trials, such as no successes yet, doesn't affect the distribution of Y; it is still the same geometric distribution. A formal proof can be given in Exercise 8.
In Exercise 13, you can prove that the geometric distributions are the only discrete distributions that have the memoryless property.

Example 5.5.5 Triples in the Lottery. In Example 5.5.4, after the first 120 non-triples, the process essentially starts over again and we still have to wait a geometric amount of time until the first triple. At the beginning of the experiment, the expected number of failures (non-triples) that will occur before the first success (triples) is (1 − p)/p, as given by Eq. (5.5.8). If it is known that failures were obtained on the first 120 trials, then the conditional expected total number of failures before the first success (given the 120 failures on the first 120 trials) is simply 120 + (1 − p)/p. ◀
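The memoryless property and the conditional expectation in the lottery example can be checked numerically. This is a rough sketch rather than a proof: it uses the fact that Pr(X ≥ k) = (1 − p)^k for a geometric random variable (obtained by summing the geometric series), and the truncation point 5000 and the function names are assumptions made for the illustration.

```python
def geometric_pmf(x, p):
    """Pr(X = x) = p(1 - p)^x for x = 0, 1, 2, ... (Eq. 5.5.3)."""
    return p * (1 - p) ** x if x >= 0 else 0.0

def geometric_tail(k, p):
    """Pr(X >= k) = (1 - p)^k, from summing the geometric series."""
    return (1 - p) ** k

p, k = 0.01, 120  # lottery values: p from Example 5.5.3, 120 days already waited

# Largest gap between Pr(X = k + t | X >= k) and Pr(X = t) over t = 0, ..., 199.
max_gap = max(
    abs(geometric_pmf(k + t, p) / geometric_tail(k, p) - geometric_pmf(t, p))
    for t in range(200)
)

# Conditional expected number of failures given X >= k, by a truncated sum.
# The cutoff 5000 is an arbitrary choice; the omitted tail is negligible here.
cond_mean = sum(
    x * geometric_pmf(x, p) / geometric_tail(k, p) for x in range(k, 5000)
)

print(f"largest memoryless gap: {max_gap:.2e}")
print(f"conditional mean: {cond_mean:.6f}, formula 120 + (1 - p)/p: {k + (1 - p) / p:.6f}")
```

The gap should be at the level of floating-point rounding, and the two printed means should agree to many decimal places.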
Independentemente de quanto tempo ele esperou pelos triplos, o tempo restante até que os triplos ocorram tem a mesma distribuição geométrica (e a mesma média) que tinha quando ele começou a esperar. A prova é simples e fica como Exercício 8. Teorema 5.5.5 Propriedade sem memória de distribuições geométricas.DeixarXtem a distribuição geométrica com parâmetrop, e deixark≥0. Então para cada número inteirot≥0, Pr.(X=k+t|X≥k)=Pr.(X=t). A intuição por trás do Teorema 5.5.5 é a seguinte: Pense emXcomo o número de falhas até o primeiro sucesso em uma sequência de tentativas de Bernoulli. DeixarSser o número de falhas começando com ok+1ª tentativa até o próximo sucesso. EntãoStem a mesma distribuição queXe é independente do primeirokensaios. Portanto, condicionar-se a qualquer coisa que tenha acontecido no primeiroktestes, como nenhum sucesso ainda, não afeta a distribuição deS — ainda é a mesma distribuição geométrica. Uma prova formal pode ser dada no Exercício 8. No Exercício 13, você pode provar que as distribuições geométricas são as únicas distribuições discretas que possuem a propriedade sem memória. Exemplo 5.5.5 Triplos na loteria.No Exemplo 5.5.4, após as primeiras 120 não triplas, o processo essencialmente começa de novo e ainda temos que esperar um tempo geométrico até o primeiro triplo. No início do experimento, o número esperado de falhas (não triplas) que ocorrerão antes do primeiro sucesso (triplas) é(1 -p)/p, conforme dado pela Eq. (5.5.8). Se for conhecido que as falhas foram obtidas nas primeiras 120 tentativas, então o número total condicional esperado de falhas antes do primeiro sucesso (dadas as 120 falhas nas primeiras 120 tentativas) é simplesmente 120 +(1 -p)/p. - 5.5 As Distribuigdes Binomiais Negativas 301 5.5 The Negative Binomial Distributions 301 Extensdo da defini¢ao de distribuicao binomial negativa Extension of Definition of Negative Binomial Distributon Usando a definigdo de coeficientes binomiais dada na Eq. 
By using the definition of binomial coefficients given in Eq. (5.3.14), the function $f(x|r, p)$ can be regarded as the p.f. of a discrete distribution for each number $r > 0$ (not necessarily an integer) and each number $p$ in the interval $0 < p < 1$. In other words, for $r > 0$ and $0 < p < 1$,

$$\sum_{x=0}^{\infty} \binom{r + x - 1}{x} p^r (1 - p)^x = 1. \tag{5.5.10}$$

Summary

If we observe a sequence of independent Bernoulli trials with success probability $p$, the number of failures until the $r$th success has the negative binomial distribution with parameters $r$ and $p$. The special case of $r = 1$ is the geometric distribution with parameter $p$. The sum of independent negative binomial random variables with the same second parameter $p$ has a negative binomial distribution.

Exercises

1. Consider a daily lottery as described in Example 5.5.4.
   a. Compute the probability that two particular days in a row will both have triples.
   b. Suppose that we observe triples on a particular day. Compute the conditional probability that we observe triples again the next day.

2. Suppose that a sequence of independent tosses is made with a coin for which the probability of obtaining a head on each given toss is 1/30.
   a. What is the expected number of tails that will be obtained before five heads have been obtained?
   b. What is the variance of the number of tails that will be obtained before five heads have been obtained?

3. Consider the sequence of coin tosses described in Exercise 2.
   a. What is the expected number of tosses that will be required in order to obtain five heads?
   b. What is the variance of the number of tosses that will be required in order to obtain five heads?

4. Suppose that two players A and B are trying to throw a basketball through a hoop. The probability that player A will succeed on any given throw is $p$, and he throws until he has succeeded $r$ times. The probability that player B will succeed on any given throw is $mp$, where $m$ is a given integer ($m = 2, 3, \ldots$) such that $mp < 1$, and she throws until she has succeeded $mr$ times.
   a. For which player is the expected number of throws smaller?
   b. For which player is the variance of the number of throws smaller?

5. Suppose that the random variables $X_1, \ldots, X_k$ are independent and that $X_i$ has the negative binomial distribution with parameters $r_i$ and $p$ ($i = 1, \ldots, k$). Prove that the sum $X_1 + \cdots + X_k$ has the negative binomial distribution with parameters $r = r_1 + \cdots + r_k$ and $p$.

6. Suppose that $X$ has the geometric distribution with parameter $p$. Determine the probability that the value of $X$ will be one of the even integers $0, 2, 4, \ldots$.

7. Suppose that $X$ has the geometric distribution with parameter $p$. Show that for every nonnegative integer $k$, $\Pr(X \ge k) = (1 - p)^k$.

8. Prove Theorem 5.5.5.

9. Suppose that an electronic system contains $n$ components that function independently of each other, and suppose that these components are connected in series, as defined in Exercise 5 of Sec. 3.7. Suppose also that each component will function properly for a certain number of periods and then will fail. Finally, suppose that for $i = 1, \ldots, n$, the number of periods for which component $i$ will function properly is a discrete random variable having a geometric distribution with parameter $p_i$. Determine the distribution of the number of periods for which the system will function properly.

10. Let $f(x|r, p)$ denote the p.f. of the negative binomial distribution with parameters $r$ and $p$, and let $f(x|\lambda)$ denote the p.f. of the Poisson distribution with mean $\lambda$, as defined by Eq. (5.4.2). Suppose $r \to \infty$ and $p \to 1$ in such a way that the value of $r(1 - p)$ remains constant and is equal to $\lambda$ throughout the process. Show that for each fixed nonnegative integer $x$,

$$f(x|r, p) \to f(x|\lambda).$$

11. Prove that the p.f. of the negative binomial distribution can be written in the following alternative form:

$$f(x|r, p) = \begin{cases} \dbinom{-r}{x} p^r (-[1 - p])^x & \text{for } x = 0, 1, 2, \ldots, \\ 0 & \text{otherwise.} \end{cases}$$

Hint: Use Exercise 10 in Sec. 5.3.

12. Suppose that a machine produces parts that are defective with probability $P$, but $P$ is unknown. Suppose that $P$ has a continuous distribution with p.d.f.

$$f(p) = \begin{cases} 10(1 - p)^9 & \text{if } 0 < p < 1, \\ 0 & \text{otherwise.} \end{cases}$$

Conditional on $P = p$, assume that all parts are independent of each other. Let $X$ be the number of nondefective parts observed until the first defective part. If we observe $X = 12$, compute the conditional p.d.f. of $P$ given $X = 12$.

13. Let $F$ be the c.d.f. of a discrete distribution that has the memoryless property stated in Theorem 5.5.5. Define $\ell(x) = \log[1 - F(x - 1)]$ for $x = 1, 2, \ldots$.
   a. Show that, for all integers $t, h > 0$,

$$1 - F(h - 1) = \frac{1 - F(t + h - 1)}{1 - F(t - 1)}.$$

   b. Prove that $\ell(t + h) = \ell(t) + \ell(h)$ for all integers $t, h > 0$.
   c. Prove that $\ell(t) = t\ell(1)$ for every integer $t > 0$.
   d. Prove that $F$ must be the c.d.f. of a geometric distribution.

5.6 The Normal Distributions

The most widely used model for random variables with continuous distributions is the family of normal distributions.
These distributions are the first ones we shall see podem ser integradas de forma fechada e, portanto, tabelas da cdf ou programas de whose p.d.f.’s cannot be integrated in closed form, and hence tables of the c.d.f. or computador sao necessdrias para calcular probabilidades e quantis para distribuig6es computer programs are necessary in order to compute probabilities and quantiles normais. for normal distributions. Importancia das Distribuigdes Normais Importance of the Normal Distributions Exemplo Emissdes de automéveis.Os motores de automdveis emitem uma série de poluentes indesejaveis Example Automobile Emissions. Automobile engines emit a number of undesirable pollutants 5.6.1 quando queimam gasolina. Lorenzen (1980) estudou as quantidades de varios poluentes 5.6.1 when they burn gasoline. Lorenzen (1980) studied the amounts of various pollutants emitidos por 46 motores de automéveis. Uma classe de poluentes consiste nos oxidos de emitted by 46 automobile engines. One class of polutants consists of the oxides of nitrogénio. A Figura 5.1 mostra um histograma das 46 quantidades de oxidos de nitrogen. Figure 5.1 shows a histogram of the 46 amounts of oxides of nitrogen (in nitrogénio (em gramas por milha) relatadas por Lorenzen (1980). As barras do histograma grams per mile) that are reported by Lorenzen (1980). The bars in the histogram possuem areas que equivalem as proporcées da amostra de 46 medicgées que ficam entre have areas that equal the proportions of the sample of 46 measurements that lie os pontos do eixo horizontal onde ficam as laterais das barras. Por exemplo, a quarta between the points on the horizontal axis where the sides of the bars stand. For barra (que vai de 1,0 a 1,2 no eixo horizontal) tem area 0.870x0.2 = 0.174, o que equivale a example, the fourth bar (which runs from 1.0 to 1.2 on the horizontal axis) has 8/46 porque existem oito observacgées entre 1,0 e 1,2. 
Quando quisermos fazer area 0.870 x 0.2 = 0.174, which equals 8/46 because there are eight observations declaracdes sobre probabilidades relacionadas com as emiss6es, precisaremos de uma between 1.0 and 1.2. When we want to make statements about probabilities related distribuigdo para modelar as emiss6es. A familia de distribuigdes normais apresentada to emissions, we will need a distribution with which to model emissions. The family of nesta secdo provara ser valiosa em exemplos como este. normal distributions introduced in this section will prove to be valuable in examples - such as this. < A familia de distribuigdes normais, que sera definida e discutida nesta secdo, The family of normal distributions, which will be defined and discussed in this é de longe a colecgdo mais importante de distribuigées de probabilidade. section, is by far the single most important collection of probability distributions 5.6 As Distribuig6es Normais 303 5.6 The Normal Distributions 303 Figura 5.1Histograma Figure 5.1 Histogram de emiss6es de Oxidos de 12 of emissions of oxides of 12 nitrogénio, por Exemplo 5.6.1, em nitrogen for Example 5.6.1 gramas por milha durante um 1,0 in grams per mile over a 1.0 regime de condugdo comum. common driving regimen. 2 08 g 08 3 06 £06 a oO 0,4 0.4 0,2 02 9 0,5 1,0 1,5 2,0 25 3,0 0 0.5 1.0 15 2.0 2.5 3.0 Oxidos de nitrogénio Oxides of nitrogen nas estatisticas. Existem trés raz6es principais para esta posicdo proeminente destas in statistics. There are three main reasons for this preeminent position of these distribuicgées. distributions. A primeira razdo esta diretamente relacionada as propriedades matematicas das The first reason is directly related to the mathematical properties of the normal distribuig6es normais. Demonstraremos nesta secdo e em varias secdes posteriores deste distributions. 
We shall demonstrate in this section and in several later sections of this book that if a random sample is taken from a normal distribution, then the distributions of various important functions of the observations in the sample can be derived explicitly and will themselves have simple forms. Therefore, it is a mathematical convenience to be able to assume that the distribution from which a random sample is drawn is a normal distribution.

The second reason is that many scientists have observed that the random variables studied in various physical experiments often have distributions that are approximately normal. For example, a normal distribution will usually be a close approximation to the distribution of the heights or weights of individuals in a homogeneous population of people, corn stalks, or mice, or to the distribution of the tensile strength of pieces of steel produced by a certain process. Sometimes, a simple transformation of the observed random variables has a normal distribution.

The third reason for the preeminence of the normal distributions is the central limit theorem, which will be stated and proved in Sec. 6.3. If a large random sample is taken from some distribution, then even though this distribution is not itself approximately normal, a consequence of the central limit theorem is that many important functions of the observations in the sample will have distributions which are approximately normal. In particular, for a large random sample from any distribution that has a finite variance, the distribution of the average of the random sample will be approximately normal. We shall return to this topic in the next chapter.

Properties of Normal Distributions

Definition 5.6.1 Definition and p.d.f. A random variable $X$ has the normal distribution with mean $\mu$ and variance $\sigma^2$ ($-\infty < \mu < \infty$ and $\sigma > 0$) if $X$ has a continuous distribution with the following p.d.f.:

$$f(x|\mu, \sigma^2) = \frac{1}{(2\pi)^{1/2}\sigma} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right] \quad \text{for } -\infty < x < \infty. \tag{5.6.1}$$

We should first verify that the function defined in Eq. (5.6.1) is a p.d.f. Shortly thereafter, we shall verify that the mean and variance of the distribution with p.d.f. (5.6.1) are indeed $\mu$ and $\sigma^2$, respectively.

Theorem 5.6.1 The function defined in Eq. (5.6.1) is a p.d.f.

Proof Clearly, the function is nonnegative. We must also show that

$$\int_{-\infty}^{\infty} f(x|\mu, \sigma^2)\,dx = 1. \tag{5.6.2}$$

If we let $y = (x - \mu)/\sigma$, then

$$\int_{-\infty}^{\infty} f(x|\mu, \sigma^2)\,dx = \int_{-\infty}^{\infty} \frac{1}{(2\pi)^{1/2}} \exp\left(-\frac{1}{2}y^2\right) dy.$$

We shall now let

$$I = \int_{-\infty}^{\infty} \exp\left(-\frac{1}{2}y^2\right) dy. \tag{5.6.3}$$

Then we must show that $I = (2\pi)^{1/2}$. From Eq. (5.6.3), it follows that

$$I^2 = I \cdot I = \int_{-\infty}^{\infty} \exp\left(-\frac{1}{2}y^2\right) dy \int_{-\infty}^{\infty} \exp\left(-\frac{1}{2}z^2\right) dz = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\left[-\frac{1}{2}(y^2 + z^2)\right] dy\,dz.$$

We shall now change the variables in this integral from $y$ and $z$ to the polar coordinates $r$ and $\theta$ by letting $y = r\cos\theta$ and $z = r\sin\theta$. Then, since $y^2 + z^2 = r^2$,

$$I^2 = \int_{0}^{2\pi}\int_{0}^{\infty} \exp\left(-\frac{1}{2}r^2\right) r\,dr\,d\theta = 2\pi, \tag{5.6.4}$$

where the inner integral in (5.6.4) is performed by substituting $v = r^2/2$ with $dv = r\,dr$, so the inner integral is

$$\int_{0}^{\infty} \exp(-v)\,dv = 1,$$

and the outer integral is $2\pi$. Therefore, $I = (2\pi)^{1/2}$ and Eq. (5.6.2) has been established. ∎

Example 5.6.2 Automobile Emissions. Consider the automobile engines described in Example 5.6.1. Figure 5.2 shows the histogram from Fig. 5.1 together with the normal p.d.f. having mean and variance chosen to match the observed data. Although the p.d.f. does not exactly match the shape of the histogram, it does correspond remarkably well. ◀

We could verify directly, using integration by parts, that the mean and variance of the distribution with p.d.f. given by Eq. (5.6.1) are, respectively, $\mu$ and $\sigma^2$. (See Exercise 26.) However, we need the moment generating function anyway, and then we can just take two derivatives of the m.g.f. to find the first two moments.

Theorem 5.6.2 Moment Generating Function. The m.g.f. of the distribution with p.d.f. given by Eq. (5.6.1) is

$$\psi(t) = \exp\left(\mu t + \frac{1}{2}\sigma^2 t^2\right) \quad \text{for } -\infty < t < \infty. \tag{5.6.5}$$

Figure 5.2 Histogram of emissions of oxides of nitrogen for Example 5.6.2 together with a matching normal p.d.f. (Horizontal axis: oxides of nitrogen.)

Proof By the definition of an m.g.f.,

$$\psi(t) = E(e^{tX}) = \int_{-\infty}^{\infty} \frac{1}{(2\pi)^{1/2}\sigma} \exp\left[tx - \frac{(x - \mu)^2}{2\sigma^2}\right] dx.$$

By completing the square inside the brackets (see Exercise 24), we obtain the relation

$$tx - \frac{(x - \mu)^2}{2\sigma^2} = \mu t + \frac{1}{2}\sigma^2 t^2 - \frac{[x - (\mu + \sigma^2 t)]^2}{2\sigma^2}.$$

Therefore,

$$\psi(t) = C \exp\left(\mu t + \frac{1}{2}\sigma^2 t^2\right),$$

where

$$C = \int_{-\infty}^{\infty} \frac{1}{(2\pi)^{1/2}\sigma} \exp\left\{-\frac{[x - (\mu + \sigma^2 t)]^2}{2\sigma^2}\right\} dx.$$

If we now replace $\mu$ with $\mu + \sigma^2 t$ in Eq. (5.6.1), it follows from Eq. (5.6.2) that $C = 1$. Hence, the m.g.f. of the normal distribution is given by Eq. (5.6.5). ∎

We are now ready to verify the mean and variance.

Theorem 5.6.3 Mean and Variance. The mean and variance of the distribution with p.d.f. given by Eq. (5.6.1) are $\mu$ and $\sigma^2$, respectively.

Proof The first two derivatives of the m.g.f. in Eq. (5.6.5) are

$$\psi'(t) = (\mu + \sigma^2 t) \exp\left(\mu t + \frac{1}{2}\sigma^2 t^2\right),$$

$$\psi''(t) = \left([\mu + \sigma^2 t]^2 + \sigma^2\right) \exp\left(\mu t + \frac{1}{2}\sigma^2 t^2\right).$$

Plugging $t = 0$ into each of these derivatives yields

$$E(X) = \psi'(0) = \mu \quad \text{and} \quad \text{Var}(X) = \psi''(0) - [\psi'(0)]^2 = \sigma^2.$$ ∎

Since the m.g.f. $\psi(t)$ is finite for all values of $t$, all the moments $E(X^k)$ ($k = 1, 2, \ldots$) will also be finite.

Figure 5.3 The p.d.f. of a normal distribution. (The sketch marks the points $\mu - 2\sigma$, $\mu - \sigma$, $\mu$, $\mu + \sigma$, and $\mu + 2\sigma$ on the $x$ axis and the peak height $1/[(2\pi)^{1/2}\sigma]$.)

Example 5.6.3 Stock Price Changes. A popular model for the change in the price of a stock over a period of time of length $u$ is to say that the price after time $u$ is $S_u = S_0 e^{Z_u}$, where $Z_u$ has the normal distribution with mean $\mu u$ and variance $\sigma^2 u$. In this formula, $S_0$ is the present price of the stock, and $\sigma$ is called the volatility of the stock price. The expected value of $S_u$ can be computed from the m.g.f. $\psi$ of $Z_u$:

$$E(S_u) = S_0 E(e^{Z_u}) = S_0 \psi(1) = S_0 e^{\mu u + \sigma^2 u/2}.$$ ◀

The Shapes of Normal Distributions It can be seen from Eq. (5.6.1) that the p.d.f. $f(x|\mu, \sigma^2)$ of the normal distribution with mean $\mu$ and variance $\sigma^2$ is symmetric with respect to the point $x = \mu$. Therefore, $\mu$ is both the mean and the median of the distribution. Furthermore, $\mu$ is also the mode of the distribution. In other words, the p.d.f. $f(x|\mu, \sigma^2)$ attains its maximum value at the point $x = \mu$. Finally, by differentiating $f(x|\mu, \sigma^2)$ twice, it can be found that there are points of inflection at $x = \mu + \sigma$ and at $x = \mu - \sigma$.

The p.d.f. $f(x|\mu, \sigma^2)$ is sketched in Fig. 5.3. It is seen that the curve is "bell-shaped." However, it is not necessarily true that every arbitrary bell-shaped p.d.f. can be approximated by the p.d.f. of a normal distribution. For example, the p.d.f. of a Cauchy distribution, as sketched in Fig. 4.3, is a symmetric bell-shaped curve which apparently resembles the p.d.f. sketched in Fig. 5.3.
However, since no moment of the Cauchy distribution, not even the mean, exists, the tails of the Cauchy p.d.f. must be quite different from the tails of the normal p.d.f.

Linear Transformations We shall now show that if a random variable $X$ has a normal distribution, then every linear function of $X$ will also have a normal distribution.

Theorem 5.6.4 If $X$ has the normal distribution with mean $\mu$ and variance $\sigma^2$ and if $Y = aX + b$, where $a$ and $b$ are given constants and $a \ne 0$, then $Y$ has the normal distribution with mean $a\mu + b$ and variance $a^2\sigma^2$.

Proof The m.g.f. $\psi$ of $X$ is given by Eq. (5.6.5). If $\psi_Y$ denotes the m.g.f. of $Y$, then

$$\psi_Y(t) = e^{bt}\psi(at) = \exp\left[(a\mu + b)t + \frac{1}{2}a^2\sigma^2 t^2\right] \quad \text{for } -\infty < t < \infty.$$

By comparing this expression for $\psi_Y$ with the m.g.f. of a normal distribution given in Eq. (5.6.5), we see that $\psi_Y$ is the m.g.f. of the normal distribution with mean $a\mu + b$ and variance $a^2\sigma^2$. Hence, $Y$ must have this normal distribution. ∎
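Theorem 5.6.4 can also be checked numerically by comparing two ways of computing the c.d.f. of $Y = aX + b$. The following sketch is illustrative and not from the original text: it uses `statistics.NormalDist` from the Python standard library (which is parameterized by the mean and the standard deviation, not the variance), and the values of `mu`, `sigma`, `a`, and `b` as well as the helper `cdf_of_y_via_x` are assumptions chosen for the example. A negative `a` is used on purpose to exercise the case in which the inequality reverses.

```python
from statistics import NormalDist

mu, sigma = 5.0, 2.0  # illustrative parameters of X (assumption)
a, b = -3.0, 4.0      # illustrative constants with a != 0 (assumption)

X = NormalDist(mu, sigma)
# Distribution that Theorem 5.6.4 claims for Y = aX + b:
# mean a*mu + b and variance a^2 sigma^2, i.e. standard deviation |a| sigma.
Y = NormalDist(a * mu + b, abs(a) * sigma)


def cdf_of_y_via_x(y: float) -> float:
    """Pr(aX + b <= y), computed from the c.d.f. of X alone."""
    x = (y - b) / a
    # For a > 0 the event is {X <= x}; for a < 0 it is {X >= x}.
    return X.cdf(x) if a > 0 else 1.0 - X.cdf(x)


# Compare the two c.d.f.s on a grid spanning four standard deviations of Y.
grid = [Y.mean + k * Y.stdev / 4.0 for k in range(-16, 17)]
max_gap = max(abs(cdf_of_y_via_x(y) - Y.cdf(y)) for y in grid)

print(Y.mean, Y.variance, max_gap)
```

Agreement of the two c.d.f.s on the grid, up to rounding error, is consistent with the theorem. It does not prove it, since only finitely many points are compared.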
7 A distribuig¢ao normal padrdo The Standard Normal Distribution Definicgao Distribuigdo Normal Padrdo.A distribuigdo normal com média 0 e variancia 1 é Definition | Standard Normal Distribution. The normal distribution with mean 0 and variance 1 is 5.6.2 Chamou odistribui¢ao normal padrdo. O pdf da distribuigdo normal padrdo 5.6.2 called the standard normal distribution. The p.d.f. of the standard normal distribution geralmente é denotado pelo simbolog, e o cdf é denotado pelo simbolo . Por isso, is usually denoted by the symbol ¢, and the c.d.f. is denoted by the symbol ®. Thus, ( ) 1 1 1 1 (XE f(x| 0,1 ———— experincias = X2 ara -0 <x <oo (5.6.6) x) = f(x|0, 1) = ——~ ex (-5") for -co <x <©o 5.6.6 ” | Dine 5 p o(x) = f (x10, 1) Ouyie *P\ 9 (5.6.6) e and Jx x (XF glvocéjdu = para -~<x <oo, (5.6.7) P(x) = / o(u)du for -w<x<o, (5.6.7) 00 —0o onde o simbolovocéé usado na Eq. (5.6.7) como variavel dummy de integracao. where the symbol u is used in Eq. (5.6.7) as a dummy variable of integration. O CDF(x)ndo pode ser expresso de forma fechada em termos de fungdes The c.d.f. ®(x) cannot be expressed in closed form in terms of elementary elementares. Portanto, as probabilidades para a distribuigdo normal padrao ou qualquer functions. Therefore, probabilities for the standard normal distribution or any other outra distribuigdo normal s6 podem ser encontradas por aproximacées numéricas ou normal distribution can be found only by numerical approximations or by using a usando uma tabela de valores de(xJcomo o apresentado no final deste livro. Nessa tabela, table of values of ®(x) such as the one given at the end of this book. In that table, the os valores de(x)sd0 dados apenas parax20. A maioria dos pacotes de computador que values of ®(x) are given only for x > 0. Most computer packages that do statistical fazem anialises estatisticas contém fungées que calculam o cdf e a fun¢gdo quantilica da analysis contain functions that compute the c.d.f. 
and the quantile function of the distribuigdo normal padrao. Conhecendo os valores de(x)parax20 e-1(P)para 0.5 0 and ®~'(p) for suficiente para calcular o cdf e a fungdo quantilica de qualquer distribuigdo normal em 0.5 < p <1 is sufficient for calculating the c.d.f. and the quantile function of any qualquer valor, como mostram os préximos dois resultados. normal distribution at any value, as the next two results show. Teorema Consequéncias da Simetria.Para todosxe todos 0<p <1, Theorem Consequences of Symmetry. For all x and all 0 < p <1, 9.6.5 -XF1 -(xJe “41(PF -1(1 -pdag.). (5.6.8) 5.6.5 @(-x)=1-—6(%) and @ '(p)=-—o 11 — p). (5.6.8) ProvaComo a pdf da distribuig¢do normal padrdo é simétrica em relagdo ao Proof Since the p.d.f. of the standard normal distribution is symmetric with respect pontox=0, segue-se que Pr(XSx}Pr.(X2 -x)Jpara cada numerox (-~< x <e), Desde to the point x = 0, it follows that Pr(X < x) = Pr(X > —x) for every number x (—oo < Pr(XSxE (xe Pr(X2 -x¥1 -(-x), temos a primeira equacgdo na Eq. (5.6.8). A segunda x <oo). Since Pr(X < x) = ®(x) and Pr(X > —x) =1— ®(—x), we have the first equacdo segue deixandox=-1(P)na primeira equacgdo e depois aplicando a fungao equation in Eq. (5.6.8). The second equation follows by letting x = &~!(p) in the -1para ambos os lados da equa¢do. 7 first equation and then applying the function ®~! to both sides of the equation. m Teorema Convertendo distribuigdes normais em padrdo.DeixarXtem a distribuigdo normal com Theorem Converting Normal Distributions to Standard. Let X have the normal distribution with 5.6.6 significarye variagdo o2. Deixar Seja o cdf deX. EntdoZ=(X*-~/)//otem o 5.6.6 mean yu and variance o7. Let F be the c.d.f. of X. Then Z = (X — 1)/o has the distribuigdo normal padrdo e, para todos todos oP <1, standard normal distribution, and, for all x and all 0 < p <1, x- _ Fix XH (5.6.9) F(x) = (—) (5.6.9) oO o Fi(PEuto -1(P). (5.6.10) F-\(p)=w+o\(p). 
Proof It follows immediately from Theorem 5.6.4 that $Z = (X - \mu)/\sigma$ has the standard normal distribution. Therefore,

$$F(x) = \Pr(X \le x) = \Pr\left(Z \le \frac{x - \mu}{\sigma}\right),$$

which establishes Eq. (5.6.9). For Eq. (5.6.10), let $p = F(x)$ in Eq. (5.6.9) and then solve for $x$ in the resulting equation. ■

Example 5.6.4 Determining Probabilities for a Normal Distribution. Suppose that $X$ has the normal distribution with mean 5 and standard deviation 2. We shall determine the value of $\Pr(1 < X < 8)$.

If we let $Z = (X - 5)/2$, then $Z$ will have the standard normal distribution and

$$\Pr(1 < X < 8) = \Pr\left(\frac{1 - 5}{2} < \frac{X - 5}{2} < \frac{8 - 5}{2}\right) = \Pr(-2 < Z < 1.5).$$

Furthermore,

$$\Pr(-2 < Z < 1.5) = \Pr(Z < 1.5) - \Pr(Z \le -2) = \Phi(1.5) - \Phi(-2) = \Phi(1.5) - [1 - \Phi(2)].$$

From the table at the end of this book, it is found that $\Phi(1.5) = 0.9332$ and $\Phi(2) = 0.9773$. Therefore,

$$\Pr(1 < X < 8) = 0.9105.$$ ◄

Example 5.6.5 Quantiles of Normal Distributions.
Suppose that the engineers who collected the automobile emissions data in Example 5.6.1 are interested in finding out whether most engines are serious polluters. For example, they could compute the 0.05 quantile of the distribution of emissions and declare that 95 percent of the engines of the type tested exceed this quantile. Let $X$ be the average grams of oxides of nitrogen per mile for a typical engine. Then the engineers modeled $X$ as having a normal distribution. The normal distribution plotted in Fig. 5.2 has mean 1.329 and standard deviation 0.4844. The c.d.f. of $X$ would then be $F(x) = \Phi([x - 1.329]/0.4844)$, and the quantile function would be $F^{-1}(p) = 1.329 + 0.4844\,\Phi^{-1}(p)$, where $\Phi^{-1}$ is the quantile function of the standard normal distribution, which can be evaluated using a computer or from tables. To find $\Phi^{-1}(p)$ from the table of $\Phi$, find the closest value to $p$ in the $\Phi(x)$ column and read the inverse from the $x$ column. Since the table only has values of $p > 0.5$, we use Eq. (5.6.8) to conclude that $\Phi^{-1}(0.05) = -\Phi^{-1}(0.95)$.
So, look up 0.95 in the $\Phi(x)$ column (halfway between 0.9495 and 0.9505) to find $x = 1.645$ (halfway between 1.64 and 1.65) and conclude that $\Phi^{-1}(0.05) = -1.645$. The 0.05 quantile of $X$ is then $1.329 + 0.4844 \times (-1.645) = 0.5322$. ◄

Comparisons of Normal Distributions

The p.d.f.'s of three normal distributions are sketched in Fig. 5.4 for a fixed value of $\mu$ and three different values of $\sigma$ ($\sigma = 1/2$, 1, and 2). It can be seen from this figure that the p.d.f. of a normal distribution with a small value of $\sigma$ has a high peak and is very concentrated around the mean $\mu$, whereas the p.d.f. of a normal distribution with a larger value of $\sigma$ is relatively flat and is spread out more widely over the real line.

An important fact is that every normal distribution contains the same total amount of probability within one standard deviation of its mean, the same amount within two standard deviations of its mean, and the same amount within any other fixed number of standard deviations of its mean.
In general, if $X$ has the normal distribution with mean $\mu$ and variance $\sigma^2$, and if $Z$ has the standard normal distribution, then for $k > 0$,

$$p_k = \Pr(|X - \mu| \le k\sigma) = \Pr(|Z| \le k).$$

In Table 5.2, the values of this probability $p_k$ are given for various values of $k$. These probabilities can be computed from a table of $\Phi$ or using computer programs.

Figure 5.4 The normal p.d.f. for $\mu = 0$ and $\sigma = 1/2, 1, 2$.

Table 5.2 Probabilities that normal random variables are within $k$ standard deviations of their means

k     p_k
1     0.6826
2     0.9544
3     0.9974
4     0.99994
5     $1 - 6 \times 10^{-7}$
10    $1 - 2 \times 10^{-23}$

Although the p.d.f. of a normal distribution is positive over the entire real line, it can be seen from this table that the total amount of probability outside an interval of four standard deviations on each side of the mean is only 0.00006.
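As a rough numerical check of Table 5.2, the probabilities $p_k$ can be computed directly. This is a sketch under stated assumptions rather than anything from the text: it uses Python's standard-library `statistics.NormalDist` and `math.erfc`, and the function names `prob_within` and `prob_outside` are hypothetical. The tail probability is computed through the complementary error function, using $\Pr(|Z| > k) = \operatorname{erfc}(k/\sqrt{2})$, because $1 - p_k$ is too small for large $k$ to survive the subtraction $1 - p_k$ in floating point.

```python
import math
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
STD = NormalDist()


def prob_within(k):
    """p_k = Pr(|Z| <= k) = Phi(k) - Phi(-k)."""
    return STD.cdf(k) - STD.cdf(-k)


def prob_outside(k):
    """1 - p_k = Pr(|Z| > k), computed as erfc(k / sqrt(2))
    so that very small tail probabilities keep their precision."""
    return math.erfc(k / math.sqrt(2.0))


for k in (1, 2, 3, 4, 5, 10):
    # Print both forms; for k = 5 and k = 10 only the tail form is informative.
    print(k, prob_within(k), prob_outside(k))
```

The computed values may differ from the table in the last printed digit (for $k = 1$ the computation gives about 0.6827), presumably because the table was assembled from four-place values of $\Phi$; that explanation is a guess, not a statement from the text.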
Linear Combinations of Normally Distributed Variables

In the next theorem and corollary, we shall prove the following important result: Every linear combination of random variables that are independent and normally distributed will also have a normal distribution.

Theorem 5.6.7 If the random variables $X_1, \ldots, X_k$ are independent and if $X_i$ has the normal distribution with mean $\mu_i$ and variance $\sigma_i^2$ ($i = 1, \ldots, k$), then the sum $X_1 + \cdots + X_k$ has the normal distribution with mean $\mu_1 + \cdots + \mu_k$ and variance $\sigma_1^2 + \cdots + \sigma_k^2$.

Proof Let $\psi_i(t)$ denote the m.g.f. of $X_i$ for $i = 1, \ldots, k$, and let $\psi(t)$ denote the m.g.f. of $X_1 + \cdots + X_k$. Since the variables $X_1, \ldots, X_k$ are independent, then

$$\psi(t) = \prod_{i=1}^{k} \psi_i(t) = \prod_{i=1}^{k} \exp\left(\mu_i t + \frac{1}{2}\sigma_i^2 t^2\right) = \exp\left[\left(\sum_{i=1}^{k} \mu_i\right) t + \frac{1}{2}\left(\sum_{i=1}^{k} \sigma_i^2\right) t^2\right] \quad \text{for } -\infty < t < \infty.$$

From Eq. (5.6.5), the m.g.f. $\psi(t)$ can be identified as the m.g.f. of the normal distribution for which the mean is $\sum_{i=1}^{k} \mu_i$ and the variance is $\sum_{i=1}^{k} \sigma_i^2$. Hence, the
distribution of $X_1 + \cdots + X_k$ must be as stated in the theorem. ■

The following corollary is now obtained by combining Theorems 5.6.4 and 5.6.7.

Corollary 5.6.1 If the random variables $X_1, \ldots, X_k$ are independent, if $X_i$ has the normal distribution with mean $\mu_i$ and variance $\sigma_i^2$ ($i = 1, \ldots, k$), and if $a_1, \ldots, a_k$ and $b$ are constants for which at least one of the values $a_1, \ldots, a_k$ is different from 0, then the variable $a_1 X_1 + \cdots + a_k X_k + b$ has the normal distribution with mean $a_1\mu_1 + \cdots + a_k\mu_k + b$ and variance $a_1^2\sigma_1^2 + \cdots + a_k^2\sigma_k^2$.

Example 5.6.6 Heights of Men and Women. Suppose that the heights, in inches, of the women in a certain population follow the normal distribution with mean 65 and standard deviation 1, and that the heights of the men follow the normal distribution with mean 68 and standard deviation 3. Suppose also that one woman is selected at random and, independently, one man is selected at random. We shall determine the probability that the woman will be taller than the man.
Let $W$ denote the height of the selected woman, and let $M$ denote the height of the selected man. Then the difference $W - M$ has the normal distribution with mean $65 - 68 = -3$ and variance $1^2 + 3^2 = 10$. Therefore, if we let

$$Z = \frac{1}{10^{1/2}}(W - M + 3),$$

then $Z$ has the standard normal distribution. It follows that

$$\Pr(W > M) = \Pr(W - M > 0) = \Pr\left(Z > \frac{3}{10^{1/2}}\right) = \Pr(Z > 0.949) = 1 - \Phi(0.949) = 0.171.$$

Thus, the probability that the woman will be taller than the man is 0.171. ◄

Averages of random samples of normal random variables figure prominently in many statistical calculations. To fix notation, we start with a general definition.

Definition 5.6.3 Sample Mean. Let $X_1, \ldots, X_n$ be random variables. The average of these $n$ random variables, $\frac{1}{n}\sum_{i=1}^{n} X_i$, is called their sample mean and is commonly denoted $\overline{X}_n$.

The following simple corollary to Corollary 5.6.1 gives the distribution of the sample mean of a random sample of normal random variables.
Corollary 5.6.2 Suppose that the random variables $X_1, \ldots, X_n$ form a random sample from the normal distribution with mean $\mu$ and variance $\sigma^2$, and let $\overline{X}_n$ denote their sample mean. Then $\overline{X}_n$ has the normal distribution with mean $\mu$ and variance $\sigma^2/n$.

Proof Since $\overline{X}_n = \sum_{i=1}^{n} (1/n) X_i$, it follows from Corollary 5.6.1 that the distribution of $\overline{X}_n$ is normal with mean $\sum_{i=1}^{n} (1/n)\mu = \mu$ and variance $\sum_{i=1}^{n} (1/n)^2 \sigma^2 = \sigma^2/n$. ■

Example 5.6.7 Determining a Sample Size. Suppose that a random sample of size $n$ is to be taken from the normal distribution with mean $\mu$ and variance 9. (The heights of men in Example 5.6.6 have such a distribution with $\mu = 68$.) We shall determine the minimum value of $n$ for which

$$\Pr(|\overline{X}_n - \mu| \le 1) \ge 0.95.$$

It is known from Corollary 5.6.2 that the sample mean $\overline{X}_n$ will have the normal distribution for which the mean is $\mu$ and the standard deviation is $3/n^{1/2}$. Therefore, if we let

$$Z = \frac{n^{1/2}}{3}(\overline{X}_n - \mu),$$

then $Z$ will have the standard normal distribution.
In this example, $n$ must be chosen so that

$$\Pr(|\overline{X}_n - \mu| \le 1) = \Pr\left(|Z| \le \frac{n^{1/2}}{3}\right) \ge 0.95. \tag{5.6.11}$$

For each positive number $x$, it will be true that $\Pr(|Z| \le x) \ge 0.95$ if and only if $1 - \Phi(x) = \Pr(Z > x) \le 0.025$. From the table of the standard normal distribution at the end of this book, it is found that $1 - \Phi(x) \le 0.025$ if and only if $x \ge 1.96$. Therefore, the inequality in relation (5.6.11) will be satisfied if and only if

$$\frac{n^{1/2}}{3} \ge 1.96.$$

Since the smallest permissible value of $n$ is 34.6, the sample size must be at least 35 in order that the specified relation will be satisfied. ◄

Example 5.6.8 Interval for Mean. Consider a population with a normal distribution such as the heights of men in Example 5.6.6. Suppose that we are not willing to specify the precise distribution as we did in that example, but rather only that the standard deviation is 3, leaving the mean $\mu$ unspecified. If we sample a number of men from
this population, we could try to use their sampled heights to give us some idea what $\mu$ equals. A popular form of statistical inference that will be discussed in Sec. 8.5 finds an interval that has a specified probability of containing $\mu$. To be specific, suppose that we observe a random sample of size $n$ from the normal distribution with mean $\mu$ and standard deviation 3. Then, $\overline{X}_n$ has the normal distribution with mean $\mu$ and standard deviation $3/n^{1/2}$ as in Example 5.6.7. Similarly, we can define

$$Z = \frac{n^{1/2}}{3}(\overline{X}_n - \mu),$$

which then has the standard normal distribution. Hence,

$$0.95 = \Pr(|Z| < 1.96) = \Pr\left(|\overline{X}_n - \mu| < 1.96\,\frac{3}{n^{1/2}}\right). \tag{5.6.12}$$

It is easy to verify that

$$|\overline{X}_n - \mu| < 1.96\,\frac{3}{n^{1/2}} \quad \text{if and only if}$$

$$\overline{X}_n - 1.96\,\frac{3}{n^{1/2}} < \mu < \overline{X}_n + 1.96\,\frac{3}{n^{1/2}}. \tag{5.6.13}$$

The two inequalities in Eq. (5.6.13) hold if and only if the interval

$$\left(\overline{X}_n - 1.96\,\frac{3}{n^{1/2}},\ \overline{X}_n + 1.96\,\frac{3}{n^{1/2}}\right) \tag{5.6.14}$$

contains the value of $\mu$. It follows from Eq. (5.6.12) that the probability is 0.95 that
the interval in (5.6.14) contains $\mu$. Now, suppose that the sample size is $n = 36$. Then the half-width of the interval (5.6.14) is $1.96 \times 3/36^{1/2} = 0.98$. We will not know the endpoints of the interval until after we observe $\overline{X}_n$. However, we know now that the interval $(\overline{X}_n - 0.98,\ \overline{X}_n + 0.98)$ has probability 0.95 of containing $\mu$. ◄

The Lognormal Distributions

It is very common to use normal distributions to model logarithms of random variables. For this reason, a name is given to the distribution of the original random variables before transforming.

Definition 5.6.4 Lognormal Distribution. If $\log(X)$ has the normal distribution with mean $\mu$ and variance $\sigma^2$, we say that $X$ has the lognormal distribution with parameters $\mu$ and $\sigma^2$.

Figure 5.5 Histogram of lifetimes of ball bearings and fitted lognormal p.d.f. for Example 5.6.9. (Horizontal axis: Millions of Revolutions.)

Example 5.6.9 Failure Times of Ball Bearings. Products that are subject to wear and tear are generally tested for endurance in order to estimate their useful lifetimes. Lawless (1982, example 5.2.2) describes data taken from Lieblein and Zelen (1956), which are measurements of the numbers of millions of revolutions before failure for 23 ball bearings. The lognormal distribution is one popular model for times until failure. Figure 5.5 shows a histogram of the 23 lifetimes together with a lognormal p.d.f. with parameters chosen to match the observed data. The bars of the histogram in Fig. 5.5 have areas that equal the proportions of the sample that lie between the points on the horizontal axis where the sides of the bars stand. Suppose that the engineers are interested in knowing how long to wait until there is a 90 percent chance that a ball bearing will have failed. Then they want the 0.9 quantile of the distribution of lifetimes. Let $X$ be the time to failure of a ball bearing. The lognormal distribution of $X$ plotted in Fig. 5.5 has parameters 4.15 and $0.5334^2$. The c.d.f.
of $X$ would then be $F(x) = \Phi([\log(x) - 4.15]/0.5334)$, and the quantile function would be

$$F^{-1}(p) = e^{4.15 + 0.5334\,\Phi^{-1}(p)},$$

where $\Phi^{-1}$ is the quantile function of the standard normal distribution. With $p = 0.9$, we get $\Phi^{-1}(0.9) = 1.28$ and $F^{-1}(0.9) = 125.6$. ◄

The moments of a lognormal random variable are easy to compute based on the m.g.f. of a normal distribution. If $Y = \log(X)$ has the normal distribution with mean $\mu$ and variance $\sigma^2$, then the m.g.f. of $Y$ is $\psi(t) = \exp(\mu t + 0.5\sigma^2 t^2)$. However, the definition of $\psi$ is $\psi(t) = E(e^{tY})$. Since $Y = \log(X)$, we have

$$\psi(t) = E(e^{tY}) = E(e^{t\log(X)}) = E(X^t).$$

It follows that $E(X^t) = \psi(t)$ for all real $t$. In particular, the mean and variance of $X$ are

$$E(X) = \psi(1) = \exp(\mu + 0.5\sigma^2), \tag{5.6.15}$$

$$\operatorname{Var}(X) = \psi(2) - \psi(1)^2 = \exp(2\mu + \sigma^2)[\exp(\sigma^2) - 1].$$

Example 5.6.10 Stock and Option Prices. Consider a stock like the one in Example 5.6.3 whose current price is $S_0$. Suppose that the price at $u$ time units in the future is $S_u = S_0 e^{Z_u}$, where $Z_u$ has the normal distribution with mean $\mu u$ and variance $\sigma^2 u$. Note that $S_0 e^{Z_u} = e^{Z_u + \log(S_0)}$ and $Z_u + \log(S_0)$ has the normal distribution with mean $\mu u + \log(S_0)$ and variance $\sigma^2 u$. So $S_u$ has the lognormal distribution with parameters $\mu u + \log(S_0)$ and $\sigma^2 u$.

Black and Scholes (1973) developed a pricing scheme for options on stocks whose prices follow a lognormal distribution. For the remainder of this example, we shall consider a single time $u$ and write the stock price as $S_u = S_0 e^{\mu u + \sigma u^{1/2} Z}$, where $Z$ has the standard normal distribution. Suppose that we need to price the option to buy one share of the above stock for the price $q$ at a particular time $u$ in the future. As in Example 4.1.14 on page 214, we shall use risk-neutral pricing. That is, we force the present value of $E(S_u)$ to equal $S_0$. If $u$ is measured in years and the risk-free interest rate is $r$ per year, then the present value of $E(S_u)$ is $e^{-ru}E(S_u)$. (This assumes that compounding of interest is done continuously instead of just once as it was in
Example 4.1.14. The effect of continuous compounding is examined in Exercise 25.) But $E(S_u) = S_0 e^{\mu u + \sigma^2 u/2}$. Setting $S_0$ equal to $e^{-ru} S_0 e^{\mu u + \sigma^2 u/2}$ yields $\mu = r - \sigma^2/2$ when doing risk-neutral pricing.

Now we can determine a price for the specified option. The value of the option at time $u$ will be $h(S_u)$, where

$$h(s) = \begin{cases} s - q & \text{if } s > q, \\ 0 & \text{otherwise.} \end{cases}$$

Set $\mu = r - \sigma^2/2$, and it is easy to see that $h(S_u) > 0$ if and only if

$$Z > \frac{\log\left(\frac{q}{S_0}\right) - (r - \sigma^2/2)u}{\sigma u^{1/2}}. \tag{5.6.16}$$

We shall refer to the constant on the right-hand side of Eq. (5.6.16) as $c$. The risk-neutral price of the option is the present value of $E(h(S_u))$, which equals

$$e^{-ru}E[h(S_u)] = e^{-ru} \int_{c}^{\infty} \left[S_0 e^{[r - \sigma^2/2]u + \sigma u^{1/2} z} - q\right] \frac{1}{(2\pi)^{1/2}} e^{-z^2/2}\,dz. \tag{5.6.17}$$

To compute the integral in Eq. (5.6.17), split the integrand into two parts at the $-q$. The second integral is then just a constant times the integral of a normal p.d.f., namely,

$$-e^{-ru} q \int_{c}^{\infty} \frac{1}{(2\pi)^{1/2}} e^{-z^2/2}\,dz = -e^{-ru} q [1 - \Phi(c)].$$

The first integral in Eq. (5.6.17) is
$$e^{-\sigma^2 u/2} S_0 \int_{c}^{\infty} \frac{1}{(2\pi)^{1/2}} e^{-z^2/2 + \sigma u^{1/2} z}\,dz.$$

This can be converted into the integral of a normal p.d.f. times a constant by completing the square (see Exercise 24). The result of completing the square is

$$e^{-\sigma^2 u/2} S_0 \int_{c}^{\infty} \frac{1}{(2\pi)^{1/2}} e^{-(z - \sigma u^{1/2})^2/2 + \sigma^2 u/2}\,dz = S_0[1 - \Phi(c - \sigma u^{1/2})].$$

Finally, combine the two integrals into the option price, using the fact that $1 - \Phi(x) = \Phi(-x)$:

$$S_0 \Phi(\sigma u^{1/2} - c) - q e^{-ru} \Phi(-c). \tag{5.6.18}$$

This is the famous Black-Scholes formula for pricing options. As a simple example, suppose that $q = S_0$, $r = 0.06$ (6 percent interest), $u = 1$ (one year wait), and $\sigma = 0.1$. Then (5.6.18) says that the option price should be $0.0746 S_0$. If the distribution of $S_u$ is different from the form used here, simulation techniques (see Chapter 12) can be used to help price options. ◄

The p.d.f.'s of the lognormal distributions will be found in Exercise 17 of this section. The c.d.f. of each lognormal distribution is easily constructed from the standard normal c.d.f. $\Phi$. Let $X$ have the lognormal distribution with parameters
$\mu$ and $\sigma^2$. Then

$$\Pr(X \le x) = \Pr(\log(X) \le \log(x)) = \Phi\left(\frac{\log(x) - \mu}{\sigma}\right).$$

The results from earlier in this section about linear combinations of normal random variables translate into results about products of powers of lognormal random variables. Results about sums of independent normal random variables translate into results about products of independent lognormal random variables.

Summary

We introduced the family of normal distributions. The parameters of each normal distribution are its mean and variance. A linear combination of independent normal random variables has the normal distribution with mean equal to the linear combination of the means and variance determined by Corollary 4.3.1. In particular, if $X$ has the normal distribution with mean $\mu$ and variance $\sigma^2$, then $(X - \mu)/\sigma$ has the standard normal distribution (mean 0 and variance 1). Probabilities and quantiles for normal distributions can be obtained from tables or computer programs for standard
For example, if $X$ has the normal distribution with mean $\mu$ and variance $\sigma^2$, then the c.d.f. of $X$ is $F(x) = \Phi([x - \mu]/\sigma)$ and the quantile function of $X$ is $F^{-1}(p) = \mu + \Phi^{-1}(p)\sigma$, where $\Phi$ is the standard normal c.d.f.

Exercises

1. Find the 0.5, 0.25, 0.75, 0.1, and 0.9 quantiles of the standard normal distribution.

2. Suppose that $X$ has the normal distribution for which the mean is 1 and the variance is 4. Find the value of each of the following probabilities:
a. $\Pr(X \le 3)$
b. $\Pr(X > 1.5)$
c. $\Pr(X = 1)$
d. $\Pr(2 < X < 5)$
e. $\Pr(X \ge 0)$
f. $\Pr(-1 < X < 0.5)$
g. $\Pr(|X| \le 2)$
h. $\Pr(1 \le -2X + 3 \le 8)$

3. If the temperature in degrees Fahrenheit at a certain location is normally distributed with a mean of 68 degrees and a standard deviation of 4 degrees, what is the distribution of the temperature in degrees Celsius at the same location?

4. Find the 0.25 and 0.75 quantiles of the Fahrenheit temperature at the location mentioned in Exercise 3.

5. Let $X_1$, $X_2$, and $X_3$ be independent lifetimes of memory chips. Suppose that each $X_i$ has the normal distribution with mean 300 hours and standard deviation 10 hours. Compute the probability that at least one of the three chips lasts at least 290 hours.

6. If the m.g.f. of a random variable $X$ is $\psi(t) = e^{t^2}$ for $-\infty < t < \infty$, what is the distribution of $X$?

7. Suppose that the measured voltage in a certain electric circuit has the normal distribution with mean 120 and standard deviation 2. If three independent measurements of the voltage are made, what is the probability that all three measurements will lie between 116 and 118?

8. Evaluate the integral $\int_0^\infty e^{-3x^2}\, dx$.

9. A straight rod is formed by connecting three sections A, B, and C, each of which is manufactured on a different machine. The length of section A, in inches, has the normal distribution with mean 20 and variance 0.04. The length of section B, in inches, has the normal distribution with mean 14 and variance 0.01. The length of section C, in inches, has the normal distribution with mean 26 and variance 0.04. As indicated in Fig. 5.6, the three sections are joined so that there is an overlap of 2 inches at each connection. Suppose that the rod can be used in the construction of an airplane wing if its total length in inches is between 55.7 and 56.3. What is the probability that the rod can be used?

[Figure 5.6: Sections of the rod in Exercise 9.]

10. If a random sample of 25 observations is taken from the normal distribution with mean $\mu$ and standard deviation 2, what is the probability that the sample mean will lie within one unit of $\mu$?

11. Suppose that a random sample of size $n$ is to be taken from the normal distribution with mean $\mu$ and standard deviation 2. Determine the smallest value of $n$ such that
$$\Pr(|\overline{X}_n - \mu| < 0.1) \ge 0.9.$$

12.
a. Sketch the c.d.f. $\Phi$ of the standard normal distribution from the values given in the table at the end of this book.
b. From the sketch given in part (a) of this exercise, sketch the c.d.f. of the normal distribution for which the mean is $-2$ and the standard deviation is 3.

13. Suppose that the diameters of the bolts in a large box follow a normal distribution with a mean of 2 centimeters and a standard deviation of 0.03 centimeter. Also, suppose that the diameters of the holes in the nuts in another large box follow the normal distribution with a mean of 2.02 centimeters and a standard deviation of 0.04 centimeter. A bolt and a nut will fit together if the diameter of the hole in the nut is greater than the diameter of the bolt and the difference between these diameters is not greater than 0.05 centimeter. If a bolt and a nut are selected at random, what is the probability that they will fit together?

14. Suppose that on a certain examination in advanced mathematics, students from university A achieve scores that are normally distributed with a mean of 625 and a variance of 100, and students from university B achieve scores that are normally distributed with a mean of 600 and a variance of 150. If two students from university A and three students from university B take this examination, what is the probability that the average of the scores of the two students from university A will be greater than the average of the scores of the three students from university B? Hint: Determine the distribution of the difference between the two averages.

15. Suppose that 10 percent of the people in a certain population have the eye disease glaucoma. For persons who have glaucoma, measurements of eye pressure $X$ will be normally distributed with a mean of 25 and a variance of 1. For persons who do not have glaucoma, the pressure $X$ will be normally distributed with a mean of 20 and a variance of 1. Suppose that a person is selected at random from the population and her eye pressure $X$ is measured.
a. Determine the conditional probability that the person has glaucoma given that $X = x$.
b. For what values of $x$ is the conditional probability in part (a) greater than 1/2?

16. Suppose that the joint p.d.f. of two random variables $X$ and $Y$ is
$$f(x, y) = \frac{1}{2\pi}\, e^{-(1/2)(x^2 + y^2)} \quad \text{for } -\infty < x < \infty \text{ and } -\infty < y < \infty.$$
Find $\Pr(-\sqrt{2} < X + Y < 2\sqrt{2})$.

17. Consider a random variable $X$ having the lognormal distribution with parameters $\mu$ and $\sigma^2$. Determine the p.d.f. of $X$.

18. Suppose that the random variables $X$ and $Y$ are independent and that each has the standard normal distribution. Show that the quotient $X/Y$ has the Cauchy distribution.

19. Suppose that the measurement $X$ of pressure made by a device in a particular system has the normal distribution with mean $\mu$ and variance 1, where $\mu$ is the true pressure. Suppose that the true pressure $\mu$ is unknown but has the uniform distribution on the interval [5, 15]. If $X = 8$ is observed, find the conditional p.d.f. of $\mu$ given $X = 8$.

20. Let $X$ have the lognormal distribution with parameters 3 and 1.44. Find the probability that $X \le 6.05$.

21. Let $X$ and $Y$ be independent random variables such that $\log(X)$ has the normal distribution with mean 1.6 and variance 4.5 and $\log(Y)$ has the normal distribution with mean 3 and variance 6. Find the distribution of the product $XY$.

22. Suppose that $X$ has the lognormal distribution with parameters $\mu$ and $\sigma^2$. Find the distribution of $1/X$.

23. Suppose that $X$ has the lognormal distribution with parameters 4.1 and 8. Find the distribution of $3X^{1/2}$.

24. The method of completing the square is used several times in this text. It is a useful method for combining several quadratic and linear polynomials into a perfect square plus a constant. Prove the following identity, which is one general form of completing the square:
$$\sum_{i=1}^n a_i (x - b_i)^2 + cx = \left(\sum_{i=1}^n a_i\right)\left(x - \frac{\sum_{i=1}^n a_i b_i - c/2}{\sum_{i=1}^n a_i}\right)^2 + \sum_{i=1}^n a_i \left(b_i - \frac{\sum_{j=1}^n a_j b_j}{\sum_{j=1}^n a_j}\right)^2 + \left(\sum_{i=1}^n a_i\right)^{-1}\left[c \sum_{i=1}^n a_i b_i - \frac{c^2}{4}\right]$$
if $\sum_{i=1}^n a_i \ne 0$.

25. In Example 5.6.10, we considered the effect of continuous compounding of interest. Suppose that $S_0$ dollars earn a rate of $r$ per year compounded continuously for $u$ years. Prove that the principal plus interest at the end of this time equals $S_0 e^{ru}$. Hint: Suppose that interest is compounded $n$ times at intervals of $u/n$ years each. At the end of each of the $n$ intervals, the principal gets multiplied by $1 + ru/n$. Take the limit of the result as $n \to \infty$.

26. Let $X$ have the normal distribution whose p.d.f. is given by (5.6.6). Instead of using the m.g.f., derive the variance of $X$ using integration by parts.

5.7 The Gamma Distributions

The family of gamma distributions is a popular model for random variables that are known to be positive. The family of exponential distributions is a subfamily of the gamma distributions. The times between successive occurrences in a Poisson process have an exponential distribution. The gamma function, related to the gamma distributions, is an extension of factorials from integers to all positive numbers.
The Gamma Function

Example 5.7.1 Mean and Variance of Lifetime of a Light Bulb. Suppose that we model the lifetime of a light bulb as a continuous random variable with the following p.d.f.:

$$f(x) = \begin{cases} e^{-x} & \text{for } x > 0, \\ 0 & \text{otherwise.} \end{cases}$$

If we wish to compute the mean and variance of such a lifetime, we need to compute the following integrals:

$$\int_0^\infty x e^{-x}\, dx \quad \text{and} \quad \int_0^\infty x^2 e^{-x}\, dx. \tag{5.7.1}$$

These integrals are special cases of an important function that we examine next.

Definition 5.7.1 The Gamma Function. For each positive number $\alpha$, let the value $\Gamma(\alpha)$ be defined by the following integral:

$$\Gamma(\alpha) = \int_0^\infty x^{\alpha - 1} e^{-x}\, dx. \tag{5.7.2}$$

The function $\Gamma$ defined by Eq. (5.7.2) for $\alpha > 0$ is called the gamma function.

As an example,

$$\Gamma(1) = \int_0^\infty e^{-x}\, dx = 1. \tag{5.7.3}$$

The following result, together with Eq. (5.7.3), shows that $\Gamma(\alpha)$ is finite for every value of $\alpha > 0$.

Theorem 5.7.1 If $\alpha > 1$, then

$$\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1). \tag{5.7.4}$$

Proof We shall apply the method of integration by parts to the integral in Eq. (5.7.2).
If we let $u = x^{\alpha - 1}$ and $dv = e^{-x}\, dx$, then $du = (\alpha - 1)x^{\alpha - 2}\, dx$ and $v = -e^{-x}$. Therefore,

$$\Gamma(\alpha) = \int_0^\infty u\, dv = [uv]_0^\infty - \int_0^\infty v\, du = \left[-x^{\alpha - 1} e^{-x}\right]_0^\infty + (\alpha - 1)\int_0^\infty x^{\alpha - 2} e^{-x}\, dx = 0 + (\alpha - 1)\Gamma(\alpha - 1). \; ∎$$

For integer values of $\alpha$, we have a simple expression for the gamma function.

Theorem 5.7.2 For every positive integer $n$,

$$\Gamma(n) = (n - 1)!. \tag{5.7.5}$$

Proof It follows from Theorem 5.7.1 that for every integer $n \ge 2$,

$$\Gamma(n) = (n - 1)\Gamma(n - 1) = (n - 1)(n - 2)\Gamma(n - 2) = (n - 1)(n - 2) \cdots 1 \cdot \Gamma(1) = (n - 1)!\,\Gamma(1).$$

Since $\Gamma(1) = 1 = 0!$ by Eq. (5.7.3), the proof is complete. ∎

Example 5.7.2 Mean and Variance of Lifetime of a Light Bulb. The two integrals in (5.7.1) are, respectively, $\Gamma(2) = 1! = 1$ and $\Gamma(3) = 2! = 2$. It follows that the mean of each lifetime is 1, and the variance is $2 - 1^2 = 1$.

In many statistical applications, $\Gamma(\alpha)$ must be evaluated when $\alpha$ is either a positive integer or of the form $\alpha = n + (1/2)$ for some positive integer $n$.
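The recursion in Theorem 5.7.1, the factorial identity in Theorem 5.7.2, and the light-bulb integrals in (5.7.1) can be checked numerically. The sketch below uses Python's standard `math.gamma`; the helper `integrate_midpoint`, the truncation point 60, and the chosen test values of $\alpha$ are illustrative assumptions, not part of the text.

```python
import math


def integrate_midpoint(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


# Integrals in (5.7.1), truncated at x = 60 where e^(-x) is negligible.
mean = integrate_midpoint(lambda x: x * math.exp(-x), 0.0, 60.0)
second_moment = integrate_midpoint(lambda x: x * x * math.exp(-x), 0.0, 60.0)
variance = second_moment - mean ** 2

# Theorem 5.7.2: Gamma(n) = (n - 1)! for positive integers n.
factorial_matches = all(
    math.isclose(math.gamma(n), math.factorial(n - 1)) for n in range(1, 11)
)

# Theorem 5.7.1: Gamma(a) = (a - 1) * Gamma(a - 1) for a > 1.
recursion_matches = all(
    math.isclose(math.gamma(a), (a - 1.0) * math.gamma(a - 1.0))
    for a in (1.5, 2.25, 7.8)
)

print(mean, variance, factorial_matches, recursion_matches)
```

The numerical mean and variance should both come out close to 1, in agreement with Example 5.7.2.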
It follows from Eq. (5.7.4) that for each positive integer $n$,

$$\Gamma\left(n + \frac{1}{2}\right) = \left(n - \frac{1}{2}\right)\left(n - \frac{3}{2}\right) \cdots \left(\frac{1}{2}\right)\Gamma\left(\frac{1}{2}\right). \tag{5.7.6}$$

Hence, it will be possible to determine the value of $\Gamma\left(n + \frac{1}{2}\right)$ if we can evaluate $\Gamma\left(\frac{1}{2}\right)$. From Eq. (5.7.2),

$$\Gamma\left(\frac{1}{2}\right) = \int_0^\infty x^{-1/2} e^{-x}\, dx.$$

If we let $x = (1/2)y^2$ in this integral, then $dx = y\, dy$ and

$$\Gamma\left(\frac{1}{2}\right) = 2^{1/2} \int_0^\infty \exp\left(-\frac{1}{2} y^2\right) dy. \tag{5.7.7}$$

Because the integral of the p.d.f. of the standard normal distribution is equal to 1, it follows that

$$\int_{-\infty}^\infty \exp\left(-\frac{1}{2} y^2\right) dy = (2\pi)^{1/2}. \tag{5.7.8}$$

Because the integrand in (5.7.8) is symmetric around $y = 0$,

$$\int_0^\infty \exp\left(-\frac{1}{2} y^2\right) dy = \frac{1}{2}(2\pi)^{1/2} = \left(\frac{\pi}{2}\right)^{1/2}.$$

It now follows from Eq. (5.7.7) that

$$\Gamma\left(\frac{1}{2}\right) = \pi^{1/2}. \tag{5.7.9}$$

For example, it is found from Eqs. (5.7.6) and (5.7.9) that

$$\Gamma\left(\frac{7}{2}\right) = \left(\frac{5}{2}\right)\left(\frac{3}{2}\right)\left(\frac{1}{2}\right)\pi^{1/2} = \frac{15}{8}\pi^{1/2}.$$

We present two final useful results before we introduce the gamma distributions.

Theorem 5.7.3 For each $\alpha > 0$ and each $\beta > 0$,

$$\int_0^\infty x^{\alpha - 1} \exp(-\beta x)\, dx = \frac{\Gamma(\alpha)}{\beta^\alpha}. \tag{5.7.10}$$

Proof Make the change of variables $y = \beta x$ so that $x = y/\beta$ and $dx = dy/\beta$. The result now follows easily from Eq. (5.7.2). ∎
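The chain of identities leading to $\Gamma(1/2) = \pi^{1/2}$, and the scaling rule of Theorem 5.7.3, can be sanity-checked numerically. The following sketch integrates the substituted form (5.7.7), which has no singularity at 0; the helper name, the truncation points, and the test values $\alpha = 3$, $\beta = 2$ are assumptions chosen for illustration.

```python
import math


def integrate_midpoint(f, a, b, steps=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


# Eq. (5.7.7): Gamma(1/2) = 2^(1/2) * int_0^inf exp(-y^2/2) dy (truncated at y = 12).
gamma_half_numeric = math.sqrt(2.0) * integrate_midpoint(
    lambda y: math.exp(-0.5 * y * y), 0.0, 12.0
)
gamma_half_exact = math.sqrt(math.pi)  # Eq. (5.7.9)

# Eqs. (5.7.6) and (5.7.9): Gamma(7/2) = (5/2)(3/2)(1/2) * pi^(1/2).
gamma_seven_halves = (5 / 2) * (3 / 2) * (1 / 2) * math.sqrt(math.pi)

# Theorem 5.7.3 with illustrative values alpha = 3, beta = 2.
alpha, beta = 3.0, 2.0
lhs = integrate_midpoint(lambda x: x ** (alpha - 1.0) * math.exp(-beta * x), 0.0, 40.0)
rhs = math.gamma(alpha) / beta ** alpha

print(gamma_half_numeric, gamma_half_exact, gamma_seven_halves, lhs, rhs)
```

Both sides of (5.7.10) should agree closely, and `gamma_seven_halves` should match `math.gamma(3.5)`.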
There is a version of Stirling's formula (Theorem 1.7.5) for the gamma function, which we state without proof.

Theorem 5.7.4 Stirling's Formula.

$$\lim_{x \to \infty} \frac{(2\pi)^{1/2} x^{x - 1/2} e^{-x}}{\Gamma(x)} = 1.$$

Example 5.7.3 Service Times in a Queue. For $i = 1, \ldots, n$, suppose that customer $i$ in a queue must wait time $X_i$ for service once reaching the head of the queue. Let $Z$ be the rate at which the average customer is served. A typical probability model for this situation is to say that, conditional on $Z = z$, $X_1, \ldots, X_n$ are i.i.d. with a distribution having the conditional p.d.f. $g_1(x_i|z) = z \exp(-z x_i)$ for $x_i > 0$. Suppose that $Z$ is also unknown and has the p.d.f. $f_2(z) = 2\exp(-2z)$ for $z > 0$. The joint p.d.f. of $X_1, \ldots, X_n, Z$ is then

$$f(x_1, \ldots, x_n, z) = \prod_{i=1}^n g_1(x_i|z) f_2(z) = 2 z^n \exp\left(-z[2 + x_1 + \cdots + x_n]\right), \tag{5.7.11}$$

if $z, x_1, \ldots, x_n > 0$ and 0 otherwise. In order to calculate the marginal joint distribution of $X_1, \ldots, X_n$, we must integrate $z$ out of the joint p.d.f. above.
We can apply Theorem 5.7.3 with $\alpha = n + 1$ and $\beta = 2 + x_1 + \cdots + x_n$ together with Theorem 5.7.2 to integrate the function in Eq. (5.7.11). The result is

$$\int_0^\infty f(x_1, \ldots, x_n, z)\, dz = \frac{2(n!)}{\left(2 + \sum_{i=1}^n x_i\right)^{n+1}}, \tag{5.7.12}$$

for all $x_i > 0$ and 0 otherwise. This is the same joint p.d.f. that was used in Example 3.7.5 on page 154.

The Gamma Distributions

Example 5.7.4 Service Times in a Queue. In Example 5.7.3, suppose that we observe the service times of $n$ customers and want to find the conditional distribution of the rate $Z$. We can easily find the conditional p.d.f. $g_2(z|x_1, \ldots, x_n)$ of $Z$ given $X_1 = x_1, \ldots, X_n = x_n$ by dividing the joint p.d.f. of $X_1, \ldots, X_n, Z$ in Eq. (5.7.11) by the p.d.f. of $X_1, \ldots, X_n$ in Eq. (5.7.12). The calculation is simplified by defining $y = 2 + \sum_{i=1}^n x_i$. We then obtain

$$g_2(z|x_1, \ldots, x_n) = \begin{cases} \dfrac{y^{n+1}}{n!}\, z^n e^{-yz} & \text{for } z > 0, \\ 0 & \text{otherwise.} \end{cases}$$

Distributions with p.d.f.'s like the one at the end of Example 5.7.4 are members of a commonly used family, which we now define.
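The two steps in the queue example, integrating $z$ out of (5.7.11) to get (5.7.12) and then dividing to get $g_2$, can be checked numerically for a concrete data set. The sketch below is illustrative only: the service times in `xs` are made up, and the helper names and the truncation point 40 are assumptions.

```python
import math


def integrate_midpoint(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def joint_pdf(z, xs):
    """Joint p.d.f. of X_1..X_n and Z, Eq. (5.7.11), for z > 0 and all x_i > 0."""
    return 2.0 * z ** len(xs) * math.exp(-z * (2.0 + sum(xs)))


def marginal_pdf(xs):
    """Marginal joint p.d.f. of X_1..X_n, Eq. (5.7.12)."""
    n = len(xs)
    return 2.0 * math.factorial(n) / (2.0 + sum(xs)) ** (n + 1)


def conditional_pdf(z, xs):
    """Conditional p.d.f. g_2(z | x_1..x_n) from Example 5.7.4."""
    n = len(xs)
    y = 2.0 + sum(xs)
    return y ** (n + 1) * z ** n * math.exp(-y * z) / math.factorial(n)


xs = [0.5, 1.2, 0.3]  # hypothetical observed service times

# Integrating z out of the joint p.d.f. should reproduce Eq. (5.7.12).
numeric_marginal = integrate_midpoint(lambda z: joint_pdf(z, xs), 0.0, 40.0)

# The conditional p.d.f. should integrate to 1 and equal joint / marginal.
conditional_mass = integrate_midpoint(lambda z: conditional_pdf(z, xs), 0.0, 40.0)
ratio_gap = abs(conditional_pdf(0.8, xs) - joint_pdf(0.8, xs) / marginal_pdf(xs))

print(numeric_marginal, marginal_pdf(xs), conditional_mass, ratio_gap)
```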
Definition 5.7.2 Gamma Distributions. Let $\alpha$ and $\beta$ be positive numbers. A random variable $X$ has the gamma distribution with parameters $\alpha$ and $\beta$ if $X$ has a continuous distribution for which the p.d.f. is

$$f(x|\alpha, \beta) = \begin{cases} \dfrac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x} & \text{for } x > 0, \\ 0 & \text{for } x \le 0. \end{cases} \tag{5.7.13}$$

That the integral of the p.d.f. in Eq. (5.7.13) is 1 follows easily from Theorem 5.7.3.

Example 5.7.5 Service Times in a Queue. In Example 5.7.4, we can easily recognize the conditional p.d.f. as the p.d.f. of the gamma distribution with parameters $\alpha = n + 1$ and $\beta = y$.

If $X$ has a gamma distribution, then the moments of $X$ are easily found from Eqs. (5.7.13) and (5.7.10).

[Figure 5.7: Graphs of the p.d.f.'s of several different gamma distributions with common mean of 1. Horizontal axis: $x$; vertical axis: gamma p.d.f.]

Theorem 5.7.5 Moments. Let $X$ have the gamma distribution with parameters $\alpha$ and $\beta$. For $k = 1, 2, \ldots$,

$$E(X^k) = \frac{\Gamma(\alpha + k)}{\beta^k \Gamma(\alpha)} = \frac{\alpha(\alpha + 1) \cdots (\alpha + k - 1)}{\beta^k}.$$
In particular, $E(X) = \dfrac{\alpha}{\beta}$ and $\mathrm{Var}(X) = \dfrac{\alpha}{\beta^2}$.

Proof For $k = 1, 2, \ldots$,

$$E(X^k) = \int_0^\infty x^k f(x|\alpha, \beta)\, dx = \frac{\beta^\alpha}{\Gamma(\alpha)} \int_0^\infty x^{\alpha + k - 1} e^{-\beta x}\, dx = \frac{\beta^\alpha}{\Gamma(\alpha)} \cdot \frac{\Gamma(\alpha + k)}{\beta^{\alpha + k}} = \frac{\Gamma(\alpha + k)}{\beta^k \Gamma(\alpha)}. \tag{5.7.14}$$

The expression for $E(X)$ follows immediately from (5.7.14). The variance can be computed as

$$\mathrm{Var}(X) = \frac{\alpha(\alpha + 1)}{\beta^2} - \left(\frac{\alpha}{\beta}\right)^2 = \frac{\alpha}{\beta^2}. \; ∎$$

Figure 5.7 shows several gamma distribution p.d.f.'s that all have mean equal to 1 but different values of $\alpha$ and $\beta$.

Example 5.7.6 Service Times in a Queue. In Example 5.7.5, the conditional mean service rate given the observations $X_1 = x_1, \ldots, X_n = x_n$ is

$$E(Z|x_1, \ldots, x_n) = \frac{n + 1}{2 + \sum_{i=1}^n x_i}.$$

For large $n$, the conditional mean is approximately 1 over the sample average of the service times. This makes sense since 1 over the average service time is what we generally mean by service rate.

The m.g.f. $\psi$ of $X$ can be obtained similarly.

Theorem 5.7.6 Moment Generating Function. Let $X$ have the gamma distribution with parameters $\alpha$ and $\beta$. The m.g.f. of $X$ is

$$\psi(t) = \left(\frac{\beta}{\beta - t}\right)^\alpha \quad \text{for } t < \beta. \tag{5.7.15}$$
Proof The m.g.f. is

$$\psi(t) = \int_0^\infty e^{tx} f(x|\alpha, \beta)\, dx = \frac{\beta^\alpha}{\Gamma(\alpha)} \int_0^\infty x^{\alpha - 1} e^{-(\beta - t)x}\, dx.$$

This integral will be finite for every value of $t$ such that $t < \beta$. Therefore, it follows from Eq. (5.7.10) that, for $t < \beta$,

$$\psi(t) = \frac{\beta^\alpha}{\Gamma(\alpha)} \cdot \frac{\Gamma(\alpha)}{(\beta - t)^\alpha} = \left(\frac{\beta}{\beta - t}\right)^\alpha. \; ∎$$

We can now show that the sum of independent random variables that have gamma distributions with a common value of the parameter $\beta$ will also have a gamma distribution.

Theorem 5.7.7 If the random variables $X_1, \ldots, X_k$ are independent, and if $X_i$ has the gamma distribution with parameters $\alpha_i$ and $\beta$ ($i = 1, \ldots, k$), then the sum $X_1 + \cdots + X_k$ has the gamma distribution with parameters $\alpha_1 + \cdots + \alpha_k$ and $\beta$.

Proof If $\psi_i$ denotes the m.g.f. of $X_i$, then it follows from Eq. (5.7.15) that for $i = 1, \ldots, k$,

$$\psi_i(t) = \left(\frac{\beta}{\beta - t}\right)^{\alpha_i} \quad \text{for } t < \beta.$$

If $\psi$ denotes the m.g.f. of the sum $X_1 + \cdots + X_k$, then by Theorem 4.4.4,

$$\psi(t) = \prod_{i=1}^k \psi_i(t) = \left(\frac{\beta}{\beta - t}\right)^{\alpha_1 + \cdots + \alpha_k} \quad \text{for } t < \beta.$$

The m.g.f. $\psi$ can now be recognized as the m.g.f. of the gamma distribution with parameters $\alpha_1 + \cdots + \alpha_k$ and $\beta$. Hence, the sum $X_1 + \cdots + X_k$ must have this gamma distribution. ∎
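The moment formula, the m.g.f., and the additivity result of Theorem 5.7.7 lend themselves to a quick numerical check. The sketch below is illustrative: the parameter values, the seed, and the sample size are arbitrary choices. One detail to watch is that Python's `random.gammavariate(alpha, beta)` takes a scale parameter as its second argument, so the rate $\beta$ used in this section is passed as `1.0 / beta`.

```python
import math
import random


def gamma_moment(alpha, beta, k):
    """E(X^k) = Gamma(alpha + k) / (beta^k * Gamma(alpha)), Theorem 5.7.5."""
    return math.gamma(alpha + k) / (beta ** k * math.gamma(alpha))


def gamma_mgf(alpha, beta, t):
    """psi(t) = (beta / (beta - t))^alpha for t < beta, Eq. (5.7.15)."""
    if t >= beta:
        raise ValueError("the m.g.f. is finite only for t < beta")
    return (beta / (beta - t)) ** alpha


alpha, beta = 2.5, 2.0
mean = gamma_moment(alpha, beta, 1)                   # should equal alpha / beta
variance = gamma_moment(alpha, beta, 2) - mean ** 2   # should equal alpha / beta^2

# Theorem 5.7.7 at the m.g.f. level: product of m.g.f.s vs. m.g.f. of the summed shape.
alphas = [1.5, 2.5]
t = 0.7
product_of_mgfs = math.prod(gamma_mgf(a, beta, t) for a in alphas)
mgf_of_sum = gamma_mgf(sum(alphas), beta, t)

# Simulation: sums of independent gamma draws with common rate beta.
rng = random.Random(0)
n_draws = 100_000
sums = [sum(rng.gammavariate(a, 1.0 / beta) for a in alphas) for _ in range(n_draws)]
sample_mean = sum(sums) / n_draws
sample_var = sum((s - sample_mean) ** 2 for s in sums) / (n_draws - 1)

print(mean, variance, product_of_mgfs, mgf_of_sum, sample_mean, sample_var)
```

With these values the sum should behave like a gamma distribution with parameters 4 and 2, so the simulated mean and variance should land near $4/2 = 2$ and $4/2^2 = 1$.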
The Exponential Distributions

A special case of gamma distributions provides a common model for phenomena such as waiting times. For instance, in Example 5.7.3, the conditional distribution of each service time $X_i$ given $Z$ (the rate of service) is a member of the following family of distributions.

Definition 5.7.3 Exponential Distributions. Let $\beta > 0$. A random variable $X$ has the exponential distribution with parameter $\beta$ if $X$ has a continuous distribution with the p.d.f.

$$f(x|\beta) = \begin{cases} \beta e^{-\beta x} & \text{for } x > 0, \\ 0 & \text{for } x \le 0. \end{cases} \tag{5.7.16}$$

A comparison of the p.d.f.'s for gamma and exponential distributions makes the following result obvious.

Theorem 5.7.8 The exponential distribution with parameter $\beta$ is the same as the gamma distribution with parameters $\alpha = 1$ and $\beta$. If $X$ has the exponential distribution with parameter $\beta$, then

$$E(X) = \frac{1}{\beta} \quad \text{and} \quad \mathrm{Var}(X) = \frac{1}{\beta^2}, \tag{5.7.17}$$

and the m.g.f. of $X$ is

$$\psi(t) = \frac{\beta}{\beta - t} \quad \text{for } t < \beta.$$
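Theorem 5.7.8 can be illustrated by comparing the two densities directly and by computing the mean and variance in (5.7.17) by numerical integration. This is a minimal sketch; the rate $\beta = 0.5$, the evaluation points, and the truncation point 100 are arbitrary assumptions.

```python
import math


def gamma_pdf(x, alpha, beta):
    """Gamma p.d.f. from Eq. (5.7.13); zero for x <= 0."""
    if x <= 0.0:
        return 0.0
    return beta ** alpha / math.gamma(alpha) * x ** (alpha - 1.0) * math.exp(-beta * x)


def exponential_pdf(x, beta):
    """Exponential p.d.f. from Eq. (5.7.16); zero for x <= 0."""
    return beta * math.exp(-beta * x) if x > 0.0 else 0.0


def integrate_midpoint(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


beta = 0.5
points = [-1.0, 0.1, 1.0, 2.5, 10.0]

# The exponential p.d.f. should coincide with the gamma p.d.f. for alpha = 1.
max_gap = max(abs(gamma_pdf(x, 1.0, beta) - exponential_pdf(x, beta)) for x in points)

# Mean and variance by numerical integration, truncated at x = 100.
mean = integrate_midpoint(lambda x: x * exponential_pdf(x, beta), 0.0, 100.0)
second_moment = integrate_midpoint(lambda x: x * x * exponential_pdf(x, beta), 0.0, 100.0)
variance = second_moment - mean ** 2

print(max_gap, mean, 1.0 / beta, variance, 1.0 / beta ** 2)
```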
Exponential distributions have a memoryless property similar to that stated in Theorem 5.5.5 for geometric distributions.

Theorem 5.7.9 Memoryless Property of Exponential Distributions. Let $X$ have the exponential distribution with parameter $\beta$, and let $t > 0$. Then for every number $h > 0$,
\[ \Pr(X \ge t + h \mid X \ge t) = \Pr(X \ge h). \tag{5.7.18} \]

Proof For each $t > 0$,
\[ \Pr(X \ge t) = \int_t^\infty \beta e^{-\beta x}\, dx = e^{-\beta t}. \tag{5.7.19} \]
Hence, for each $t > 0$ and each $h > 0$,
\[ \Pr(X \ge t + h \mid X \ge t) = \frac{\Pr(X \ge t + h)}{\Pr(X \ge t)} = \frac{e^{-\beta(t+h)}}{e^{-\beta t}} = e^{-\beta h} = \Pr(X \ge h). \tag{5.7.20} \]

You can prove (see Exercise 23) that the exponential distributions are the only continuous distributions with the memoryless property.

To illustrate the memoryless property, we shall suppose that $X$ represents the number of minutes that elapse before some event occurs. According to Eq. (5.7.20), if the event has not occurred during the first $t$ minutes, then the probability that the event will not occur during the next $h$ minutes is simply $e^{-\beta h}$.
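The identity in Eq. (5.7.20) can be checked numerically. The sketch below is a minimal illustration in Python; the function names are hypothetical, and the survival probability is taken directly from Eq. (5.7.19).

```python
import math


def survival(t, beta):
    """Pr(X >= t) for the exponential distribution with parameter beta, Eq. (5.7.19)."""
    return math.exp(-beta * t) if t > 0 else 1.0


def conditional_survival(t, h, beta):
    """Pr(X >= t + h | X >= t), computed as a ratio of survival probabilities."""
    return survival(t + h, beta) / survival(t, beta)
```

Whatever value of `t` is supplied, `conditional_survival(t, h, beta)` should agree with `survival(h, beta)` up to rounding error.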
The probability $e^{-\beta h}$ that appears in Eq. (5.7.20) is the same as the probability that the event would not occur during an interval of $h$ minutes starting from time 0. In other words, regardless of the length of time that has elapsed without the occurrence of the event, the probability that the event will occur during the next $h$ minutes always has the same value.

This memoryless property will not strictly be satisfied in all practical problems. For example, suppose that $X$ is the length of time for which a light bulb will burn before it fails. The length of time for which the bulb can be expected to continue to burn in the future will depend on the length of time for which it has been burning in the past. Nevertheless, the exponential distribution has been used effectively as an approximate distribution for such variables as the lengths of the lives of various products.

Life Tests

Example 5.7.7 Light Bulbs. Suppose that $n$ light bulbs are burning simultaneously in a test to determine the lengths of their lives.
We shall assume that the $n$ bulbs burn independently of one another and that the lifetime of each bulb has the exponential distribution with parameter $\beta$. In other words, if $X_i$ denotes the lifetime of bulb $i$, for $i = 1, \ldots, n$, then it is assumed that the random variables $X_1, \ldots, X_n$ are i.i.d. and that each has the exponential distribution with parameter $\beta$. What is the distribution of the length of time $Y_1$ until the first failure of one of the $n$ bulbs? What is the distribution of the length of time $Y_2$ after the first failure until a second bulb fails? ◀

The random variable $Y_1$ in Example 5.7.7 is the minimum of a random sample of $n$ exponential random variables. The distribution of $Y_1$ is easy to find.

Theorem 5.7.10 Suppose that the variables $X_1, \ldots, X_n$ form a random sample from the exponential distribution with parameter $\beta$. Then the distribution of $Y_1 = \min\{X_1, \ldots, X_n\}$ will be the exponential distribution with parameter $n\beta$.

Proof For every number $t > 0$,
\[ \Pr(Y_1 > t) = \Pr(X_1 > t, \ldots, X_n > t) = \Pr(X_1 > t) \cdots \Pr(X_n > t) = e^{-\beta t} \cdots e^{-\beta t} = e^{-n\beta t}. \]
By comparing this result with Eq. (5.7.19), we see that the distribution of $Y_1$ must be the exponential distribution with parameter $n\beta$.

The memoryless property of the exponential distributions allows us to answer the second question at the end of Example 5.7.7, as well as similar questions about later failures. After one bulb has failed, $n - 1$ bulbs are still burning.
Furthermore, regardless of the time at which the first bulb failed or which bulb failed first, it follows from the memoryless property of the exponential distribution that the distribution of the remaining lifetime of each of the other $n - 1$ bulbs is still the exponential distribution with parameter $\beta$. In other words, the situation is the same as it would be if we were starting the test over again from time $t = 0$ with $n - 1$ new bulbs. Therefore, $Y_2$ will be equal to the smallest of $n - 1$ i.i.d. random variables, each of which has the exponential distribution with parameter $\beta$. It follows from Theorem 5.7.10 that $Y_2$ will have the exponential distribution with parameter $(n - 1)\beta$. The next result deals with the remaining waiting times between failures.

Theorem 5.7.11 Suppose that the variables $X_1, \ldots, X_n$ form a random sample from the exponential distribution with parameter $\beta$. Let $Z_1 \le Z_2 \le \cdots \le Z_n$ be the random variables $X_1, \ldots, X_n$ sorted from smallest to largest. For each $k = 2, \ldots, n$, let $Y_k = Z_k - Z_{k-1}$. Then the distribution of $Y_k$ is the exponential distribution with parameter $(n + 1 - k)\beta$.

Proof At the time $Z_{k-1}$, exactly $k - 1$ of the lifetimes have ended and there are $n + 1 - k$ lifetimes that have not yet ended. For each of the remaining lifetimes, the conditional distribution of what remains of that lifetime given that it has lasted at least $Z_{k-1}$ is still exponential with parameter $\beta$ by the memoryless property. So, $Y_k = Z_k - Z_{k-1}$ has the same distribution as the minimum lifetime from a random sample of size $n + 1 - k$ from the exponential distribution with parameter $\beta$. According to Theorem 5.7.10, that distribution is exponential with parameter $(n + 1 - k)\beta$.

Relation to the Poisson Process

Example 5.7.8 Radioactive Particles. Suppose that radioactive particles strike a target according to a Poisson process with rate $\beta$, as defined in Definition 5.4.2. Let $Z_k$ be the time until the $k$th particle strikes the target for $k = 1, 2, \ldots$.
What is the distribution of $Z_1$? What is the distribution of $Y_k = Z_k - Z_{k-1}$ for $k \ge 2$? ◀

Although the random variables defined at the end of Example 5.7.8 look similar to those in Theorem 5.7.11, there are major differences. In Theorem 5.7.11, we were observing a fixed number $n$ of lifetimes that all started simultaneously. The $n$ lifetimes are all labeled in advance, and each could be observed independently of the others. In Example 5.7.8, there is no fixed number of particles being contemplated, and we have no well-defined notion of when each particle "starts" toward the target. In fact, we cannot even tell which particle is which until after they are observed. We merely start observing at an arbitrary time and record each time a particle hits.
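For comparison, the fixed-$n$ life test of Theorems 5.7.10 and 5.7.11 can be simulated directly. The Python sketch below draws $n$ i.i.d. exponential lifetimes, sorts them, and averages the gaps between successive failures; since an exponential distribution with parameter $\lambda$ has mean $1/\lambda$, the averages should be near $1/((n + 1 - k)\beta)$. The function names, the seed, and the number of trials are assumptions chosen only for illustration.

```python
import random


def mean_spacings(n, beta, trials, seed=0):
    """Average the gaps between successive failures over simulated life tests.

    Entry k-1 of the result estimates E(Y_k), where Y_1 = Z_1 and
    Y_k = Z_k - Z_{k-1} for k >= 2.
    """
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(trials):
        # n i.i.d. exponential lifetimes, sorted so that Z_1 <= ... <= Z_n
        z = sorted(rng.expovariate(beta) for _ in range(n))
        previous = 0.0
        for k, failure_time in enumerate(z):
            totals[k] += failure_time - previous
            previous = failure_time
    return [total / trials for total in totals]


def theoretical_means(n, beta):
    """Means 1 / ((n + 1 - k) * beta) implied by Theorems 5.7.10 and 5.7.11."""
    return [1.0 / ((n + 1 - k) * beta) for k in range(1, n + 1)]
```

The simulated averages only approximate the theoretical means, and the agreement improves as `trials` grows.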
Depending on how long we observe the process, we could see an arbitrary number of particles hit the target in Example 5.7.8, but we could never see more than $n$ failures in the setup of Theorem 5.7.11, no matter how long we observe. Theorem 5.7.12 gives the distributions for the times between arrivals in Example 5.7.8, and one can see how the distributions differ from those in Theorem 5.7.11.

Theorem 5.7.12 Times between Arrivals in a Poisson Process. Suppose that arrivals occur according to a Poisson process with rate $\beta$. Let $Z_k$ be the time until the $k$th arrival for $k = 1, 2, \ldots$. Define $Y_1 = Z_1$ and $Y_k = Z_k - Z_{k-1}$ for $k \ge 2$. Then $Y_1, Y_2, \ldots$ are i.i.d. and they each have the exponential distribution with parameter $\beta$.

Proof Let $t > 0$, and define $X$ to be the number of arrivals from time 0 until time $t$. It is easy to see that $Y_1 \le t$ if and only if $X \ge 1$. That is, the first particle strikes the target by time $t$ if and only if at least one particle strikes the target by time $t$. We already know that $X$ has the Poisson distribution with mean $\beta t$, where $\beta$ is the rate of the process. So, for $t > 0$,
\[ \Pr(Y_1 \le t) = \Pr(X \ge 1) = 1 - \Pr(X = 0) = 1 - e^{-\beta t}. \]
Comparing this to Eq. (5.7.19), we see that $1 - e^{-\beta t}$ is the c.d.f. of the exponential distribution with parameter $\beta$. What happens in a Poisson process after time $t$ is independent of what happens up to time $t$. Hence, the conditional distribution given $Y_1 = t$ of the gap from time $t$ until the next arrival at $Z_2$ is the same as the distribution of the time from time 0 until the first arrival. That is, the distribution of $Y_2 = Z_2 - Z_1$ given $Y_1 = t$ (i.e., $Z_1 = t$) is the exponential distribution with parameter $\beta$ no matter what $t$ is. Hence, $Y_2$ is independent of $Y_1$ and they have the same distribution. The same argument can be applied to find the distributions for $Y_3, Y_4, \ldots$.

An exponential distribution is often used in a practical problem to represent the distribution of the time that elapses before the occurrence of some event.
For example, this distribution has been used to represent such periods of time as the period for which a machine or an electronic component will operate without breaking down, the period required to take care of a customer at some service facility, and the period between the arrivals of two successive customers at a facility.

If the events being considered occur in accordance with a Poisson process, then both the waiting time until an event occurs and the period of time between any two successive events will have exponential distributions. This fact provides theoretical support for the use of the exponential distribution in many types of problems.

We can combine Theorem 5.7.12 with Theorem 5.7.7 to obtain the following.

Corollary 5.7.1 Time until $k$th Arrival. In the situation of Theorem 5.7.12, the distribution of $Z_k$ is the gamma distribution with parameters $k$ and $\beta$.
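The proof of Theorem 5.7.12 uses the fact that $Y_1 \le t$ exactly when at least one arrival occurs by time $t$. By the same reasoning, $Z_k \le t$ exactly when at least $k$ arrivals occur by time $t$, so Corollary 5.7.1 can be checked numerically by comparing a Poisson tail probability with the gamma c.d.f. The Python sketch below does this under stated assumptions: the function names are hypothetical, and the gamma c.d.f. is approximated with a simple midpoint rule rather than a library routine.

```python
import math


def poisson_tail(k, mean):
    """Pr(X >= k) when X has the Poisson distribution with the given mean."""
    below = sum(math.exp(-mean) * mean**j / math.factorial(j) for j in range(k))
    return 1.0 - below


def gamma_cdf(t, k, beta, steps=20000):
    """Midpoint-rule approximation of the gamma(k, beta) c.d.f. at t > 0."""
    h = t / steps
    constant = beta**k / math.gamma(k)
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += constant * x**(k - 1) * math.exp(-beta * x)
    return total * h
```

With rate `beta` and time `t`, the values `poisson_tail(k, beta * t)` and `gamma_cdf(t, k, beta)` should agree closely for each positive integer `k`.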
Summary

The gamma function is defined by $\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\, dx$ and has the property that $\Gamma(n) = (n - 1)!$ for $n = 1, 2, \ldots$. If $X_1, \ldots, X_n$ are independent random variables with gamma distributions all having the same second parameter $\beta$, then $\sum_{i=1}^{n} X_i$ has the gamma distribution with first parameter equal to the sum of the first parameters of $X_1, \ldots, X_n$ and second parameter equal to $\beta$. The exponential distribution with parameter $\beta$ is the same as the gamma distribution with parameters 1 and $\beta$. Hence, the sum of a random sample of $n$ exponential random variables with parameter $\beta$ has the gamma distribution with parameters $n$ and $\beta$.
For a Poisson process with rate $\beta$, the times between successive occurrences have the exponential distribution with parameter $\beta$, and they are independent. The waiting time until the $k$th occurrence has the gamma distribution with parameters $k$ and $\beta$.

Exercises

1. Suppose that $X$ has the gamma distribution with parameters $\alpha$ and $\beta$, and $c$ is a positive constant. Show that $cX$ has the gamma distribution with parameters $\alpha$ and $\beta/c$.

2. Compute the quantile function of the exponential distribution with parameter $\beta$.

3. Sketch the p.d.f. of the gamma distribution for each of the following pairs of values of the parameters $\alpha$ and $\beta$: (a) $\alpha = 1/2$ and $\beta = 1$, (b) $\alpha = 1$ and $\beta = 1$, (c) $\alpha = 2$ and $\beta = 1$.

4. Determine the mode of the gamma distribution with parameters $\alpha$ and $\beta$.

5. Sketch the p.d.f. of the exponential distribution for each of the following values of the parameter $\beta$: (a) $\beta = 1/2$, (b) $\beta = 1$, and (c) $\beta = 2$.

6. Suppose that $X_1, \ldots, X_n$ form a random sample of size $n$ from the exponential distribution with parameter $\beta$. Determine the distribution of the sample mean $\bar{X}_n$.

7. Let $X_1, X_2, X_3$ be a random sample from the exponential distribution with parameter $\beta$. Find the probability that at least one of the random variables is greater than $t$, where $t > 0$.

8. Suppose that the random variables $X_1, \ldots, X_k$ are independent and $X_i$ has the exponential distribution with parameter $\beta_i$ ($i = 1, \ldots, k$). Let $Y = \min\{X_1, \ldots, X_k\}$. Show that $Y$ has the exponential distribution with parameter $\beta_1 + \cdots + \beta_k$.

9. Suppose that a certain system contains three components that function independently of each other and are connected in series, as defined in Exercise 5 of Sec. 3.7, so that the system fails as soon as one of the components fails. Suppose that the length of life of the first component, measured in hours, has the exponential distribution with parameter $\beta = 0.001$, the length of life of the second component has the exponential distribution with parameter $\beta = 0.003$, and the length of life of the third component has the exponential distribution with parameter $\beta = 0.006$. Determine the probability that the system will not fail before 100 hours.

10. Suppose that an electronic system contains $n$ similar components that function independently of each other and that are connected in series so that the system fails as soon as one of the components fails. Suppose also that the length of life of each component, measured in hours, has the exponential distribution with mean $\mu$. Determine the mean and the variance of the length of time until the system fails.

11. Suppose that $n$ items are being tested simultaneously, the items are independent, and the length of life of each item has the exponential distribution with parameter $\beta$. Determine the expected length of time until three items have failed. Hint: The required value is $E(Y_1 + Y_2 + Y_3)$ in the notation of Theorem 5.7.11.

12. Consider again the electronic system described in Exercise 10, but suppose now that the system will continue to operate until two components have failed. Determine the mean and the variance of the length of time until the system fails.

13. Suppose that a certain examination is to be taken by five students independently of one another, and the number of minutes required by any particular student to complete the examination has the exponential distribution for which the mean is 80. Suppose that the examination begins at 9:00 a.m. Determine the probability that at least one of the students will complete the examination before 9:40 a.m.

14. Suppose again that the examination considered in Exercise 13 is taken by five students, and the first student to complete the examination finishes at 9:25 a.m. Determine the probability that at least one other student will complete the examination before 10:00 a.m.

15. Suppose again that the examination considered in Exercise 13 is taken by five students. Determine the probability that no two students will complete the examination within 10 minutes of each other.

16. It is said that a random variable $X$ has the Pareto distribution with parameters $x_0$ and $\alpha$ ($x_0 > 0$ and $\alpha > 0$) if $X$ has a continuous distribution for which the p.d.f. $f(x \mid x_0, \alpha)$ is as follows:
\[ f(x \mid x_0, \alpha) = \begin{cases} \dfrac{\alpha x_0^{\alpha}}{x^{\alpha+1}} & \text{for } x \ge x_0, \\ 0 & \text{for } x < x_0. \end{cases} \]
Show that if $X$ has this Pareto distribution, then the random variable $\log(X/x_0)$ has the exponential distribution with parameter $\alpha$.

17. Suppose that a random variable $X$ has the normal distribution with mean $\mu$ and variance $\sigma^2$. Determine the value of $E[(X - \mu)^{2n}]$ for $n = 1, 2, \ldots$.

18. Consider a random variable $X$ for which $\Pr(X > 0) = 1$, the p.d.f. is $f$, and the c.d.f. is $F$. Consider also the function $h$ defined as follows:
\[ h(x) = \frac{f(x)}{1 - F(x)} \quad \text{for } x > 0. \]
The function $h$ is called the failure rate or the hazard function of $X$. Show that if $X$ has an exponential distribution, then the failure rate $h(x)$ is constant for $x > 0$.

19. It is said that a random variable $X$ has the Weibull distribution with parameters $a$ and $b$ ($a > 0$ and $b > 0$) if $X$ has a continuous distribution for which the p.d.f. $f(x \mid a, b)$ is as follows:
\[ f(x \mid a, b) = \begin{cases} \dfrac{b}{a^{b}}\, x^{b-1} e^{-(x/a)^{b}} & \text{for } x > 0, \\ 0 & \text{for } x \le 0. \end{cases} \]
Show that if $X$ has this Weibull distribution, then the random variable $X^{b}$ has the exponential distribution with parameter $\beta = a^{-b}$.

20. It is said that a random variable $X$ has an increasing failure rate if the failure rate $h(x)$ defined in Exercise 18 is an increasing function of $x$ for $x > 0$, and it is said that $X$ has a decreasing failure rate if $h(x)$ is a decreasing function of $x$ for $x > 0$. Suppose that $X$ has the Weibull distribution with parameters $a$ and $b$, as defined in Exercise 19. Show that $X$ has an increasing failure rate if $b > 1$, and $X$ has a decreasing failure rate if $b < 1$.

21. Let $X$ have the gamma distribution with parameters $\alpha > 2$ and $\beta > 0$.
a. Prove that the mean of $1/X$ is $\beta/(\alpha - 1)$.
b. Prove that the variance of $1/X$ is $\beta^2/[(\alpha - 1)^2(\alpha - 2)]$.

22. Consider the Poisson process of radioactive particle hits in Example 5.7.8. Suppose that the rate $\beta$ of the Poisson process is unknown and has the gamma distribution with parameters $\alpha$ and $\gamma$. Let $X$ be the number of particles that strike the target during $t$ time units. Prove that the conditional distribution of $\beta$ given $X = x$ is a gamma distribution, and find the parameters of that gamma distribution.

23. Let $F$ be a continuous c.d.f. satisfying $F(0) = 0$, and suppose that the distribution with c.d.f. $F$ has the memoryless property (5.7.18). Define $\ell(x) = \log[1 - F(x)]$ for $x > 0$.
a. Show that for all $t, h > 0$,
\[ 1 - F(h) = \frac{1 - F(t + h)}{1 - F(t)}. \]
b. Prove that $\ell(t + h) = \ell(t) + \ell(h)$ for all $t, h > 0$.
c. Prove that for all $t > 0$ and all positive integers $k$ and $m$, $\ell(kt/m) = (k/m)\ell(t)$.
d. Prove that for all $t, c > 0$, $\ell(ct) = c\,\ell(t)$.
e. Prove that $g(t) = \ell(t)/t$ is constant for $t > 0$.
f. Prove that $F$ must be the c.d.f. of an exponential distribution.

24. Review the derivation of the Black-Scholes formula (5.6.18). For this exercise, assume that our stock price at time $u$ in the future is $S_0 e^{\mu u + W_u}$, where $W_u$ has the gamma distribution with parameters $\alpha u$ and $\beta$ with $\beta > 1$. Let $r$ be the risk-free interest rate.
a. Prove that $e^{-ru} E(S_u) = S_0$ if and only if $\mu = r - \alpha \log(\beta/[\beta - 1])$.
b. Assume that $\mu = r - \alpha \log(\beta/[\beta - 1])$. Let $R$ be 1 minus the c.d.f. of the gamma distribution with parameters $\alpha u$ and 1. Prove that the risk-neutral price for the option to buy one share of the stock for the price $q$ at time $u$ is $S_0 R(c[\beta - 1]) - q e^{-ru} R(c\beta)$, where
\[ c = \log\left( \frac{q}{S_0} \right) + \alpha u \log\left( \frac{\beta}{\beta - 1} \right) - ru. \]
c. Find the price for the option being considered when $u = 1$, $q = S_0$, $r = 0.06$, $\alpha = 1$, and $\beta = 10$.
Show 5.8 As Distribuigdes Beta 327 5.8 The Beta Distributions 327 5.8 As Distribuigdes Beta 5.8 The Beta Distributions A familia de distribuicgées beta 6 um modelo popular para varidveis aleatérias que The family of beta distributions is a popular model for random variables that are assumem valores no intervalo[0,1]. Um exemplo comum de tal varidvel aleatoria é a known to take values in the interval [0, 1]. One common example of such a random propor¢ao desconhecida de sucessos em uma sequéncia de tentativas de Bernoulli. variable is the unknown proportion of successes in a sequence of Bernoulli trials. A fungdo beta The Beta Function Exemplo Pecas defeituosas.Uma maquina produz peas defeituosas ou ndo, como em Example Defective Parts. A machine produces parts that are either defective or not, as in 5.8.1 Exemplo 3.6.9 na pagina 148. DeixePdenotam a proporcdo de defeitos entre todas as 5.8.1 Example 3.6.9 on page 148. Let P denote the proportion of defectives among all pecas que podem ser produzidas por esta maquina. Suponha que observemosntais pecas, parts that might be produced by this machine. Suppose that we observe n such parts, e deixeXseja o numero de defeituosos entre osnpartes observadas. Se assumirmos que as and let X be the number of defectives among the n parts observed. If we assume that partes sdo condicionalmente independentes, dadasP,entdo temos a mesma situacdo do the parts are conditionally independent given P, then we have the same situation as Exemplo 3.6.9, onde calculamos a fdp condicional dePdadoX=xcomo in Example 3.6.9, where we computed the conditional p.d.f. of P given X = x as , ‘| - - x4 n-x 2(pag.| xf Pe paraosp <1. (5.8.1) g(pix) =P OP tor cp <i. (5.8.1) 0g (1 -gnxdq So = gy" dq Estamos agora em condic6es de calcular a integral no denominador da Eq. (5.8.1). A We are now in a position to calculate the integral in the denominator of Eq. (5.8.1). 
The distribution with the resulting p.d.f. is a member of a useful family that we shall study in this section. ◀

Definition 5.8.1 The Beta Function. For each positive α and β, define
\[ B(\alpha, \beta) = \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx. \]
The function B is called the beta function.

We can show that the beta function B is finite for all α, β > 0. The proof of the following result relies on the methods from the end of Sec. 3.9 and is given at the end of this section.

Theorem 5.8.1 For all α, β > 0,
\[ B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}. \tag{5.8.2} \]

Example 5.8.2 Defective Parts. It follows from Theorem 5.8.1 that the integral in the denominator of Eq. (5.8.1) is
\[ \int_0^1 q^{x}(1-q)^{n-x}\,dq = \frac{\Gamma(x+1)\Gamma(n-x+1)}{\Gamma(n+2)} = \frac{x!\,(n-x)!}{(n+1)!}. \]
The conditional p.d.f. of P given X = x is then
\[ g_2(p|x) = \frac{(n+1)!}{x!\,(n-x)!}\, p^{x}(1-p)^{n-x}, \quad \text{for } 0 < p < 1. \]
◀

Definition of the Beta Distributions

The distribution in Example 5.8.2 is a special case of the following.

Definition 5.8.2 Beta Distributions. Let α, β > 0 and let X be a random variable with p.d.f.
\[ f(x|\alpha, \beta) = \begin{cases} \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1} & \text{for } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases} \tag{5.8.3} \]
Then X has the beta distribution with parameters α and β.

The conditional distribution of P given X = x in Example 5.8.2 is the beta distribution with parameters x + 1 and n − x + 1. It can also be seen from Eq. (5.8.3) that the beta distribution with parameters α = 1 and β = 1 is simply the uniform distribution on the interval [0, 1].

Example 5.8.3 Castaneda v. Partida. In Example 5.2.6 on page 278, 220 grand jurors were chosen from a population that is 79.1 percent Mexican American, but only 100 grand jurors were Mexican American. The expected value of a binomial random variable X with parameters 220 and 0.791 is E(X) = 220 × 0.791 = 174.02. This is much larger than the observed value of X = 100. Of course, such a discrepancy could occur by chance. After all, there is positive probability of X = x for all x = 0, . . . , 220.
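To get a feel for how unlikely such a discrepancy is under impartial selection, one can evaluate the binomial p.f. directly. The sketch below is only an illustration: the helper name `binomial_pf` and the decision to sum the lower tail Pr(X ≤ 100) are assumptions, not something the text computes at this point.

```python
import math

def binomial_pf(x, n, p):
    """Binomial p.f.: probability of exactly x successes in n trials."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 220, 0.791                 # parameters quoted in the example
mean = n * p                      # E(X) = n p
# Lower tail: probability of observing 100 or fewer Mexican American jurors.
tail = sum(binomial_pf(x, n, p) for x in range(0, 101))

print(f"E(X) = {mean:.2f}")
print(f"Pr(X <= 100) = {tail:.3e}")
```

The tail probability is positive, as the text says, but it is extremely small, which is what motivates treating P as unknown rather than fixed at 0.791.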
Let P stand for the proportion of Mexican Americans among all grand jurors that would be chosen under the current system being used. The court assumed that X had the binomial distribution with parameters n = 220 and p, conditional on P = p. We should then be interested in whether P is substantially less than the value 0.791, which represents impartial juror choice. For example, suppose that we define discrimination to mean that P ≤ 0.8 × 0.791 = 0.6328. We would like to compute the conditional probability of P ≤ 0.6328 given X = 100.

Suppose that the distribution of P prior to observing X was the beta distribution with parameters α and β. Then the p.d.f. of P was
\[ f_2(p) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha-1}(1-p)^{\beta-1}, \quad \text{for } 0 < p < 1. \]
The conditional p.f. of X given P = p is the binomial p.f.
\[ g_1(x|p) = \binom{220}{x} p^{x}(1-p)^{220-x}, \quad \text{for } x = 0, \ldots, 220. \]
We can now apply Bayes' theorem for random variables (3.6.13) to obtain the conditional p.d.f. of P given X = 100:
\[ g_2(p|100) = \frac{\binom{220}{100} p^{100}(1-p)^{120}\, \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha-1}(1-p)^{\beta-1}}{f_1(100)} = \frac{\binom{220}{100}\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta) f_1(100)}\, p^{\alpha+100-1}(1-p)^{\beta+120-1}, \tag{5.8.4} \]
for 0 < p < 1, where f_1(100) is the marginal p.f. of X at 100. As a function of p, the far right side of Eq. (5.8.4) is a constant times p^{α+100−1}(1 − p)^{β+120−1} for 0 < p < 1. As such, it is clearly the p.d.f. of a beta distribution. The parameters of that beta distribution are α + 100 and β + 120. Hence, the constant must be 1/B(100 + α, 120 + β). That is,
\[ g_2(p|100) = \frac{\Gamma(\alpha+\beta+220)}{\Gamma(\alpha+100)\Gamma(\beta+120)}\, p^{\alpha+100-1}(1-p)^{\beta+120-1}, \quad \text{for } 0 < p < 1. \tag{5.8.5} \]
After choosing values of α and β, we could compute Pr(P ≤ 0.6328|X = 100) and decide how likely it is that there was discrimination. We will see how to choose α and β after we learn how to compute the expected value of a beta random variable. ◀

Note: Conditional Distribution of P after Observing X with Binomial Distribution. The calculation of the conditional distribution of P given X = 100 in Example 5.8.3 is a special case of a useful general result. In fact, the proof of the following result is essentially given in Example 5.8.3, and will not be repeated.

Theorem 5.8.2 Suppose that P has the beta distribution with parameters α and β, and the conditional distribution of X given P = p is the binomial distribution with parameters n and p. Then the conditional distribution of P given X = x is the beta distribution with parameters α + x and β + n − x.

Moments of Beta Distributions

Theorem 5.8.3 Moments. Suppose that X has the beta distribution with parameters α and β. Then for each positive integer k,
\[ E(X^{k}) = \frac{\alpha(\alpha+1)\cdots(\alpha+k-1)}{(\alpha+\beta)(\alpha+\beta+1)\cdots(\alpha+\beta+k-1)}. \tag{5.8.6} \]
In particular,
\[ E(X) = \frac{\alpha}{\alpha+\beta}, \qquad \mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}. \]

Proof For k = 1, 2, . . . ,
\[ E(X^{k}) = \int_0^1 x^{k} f(x|\alpha, \beta)\,dx = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \int_0^1 x^{\alpha+k-1}(1-x)^{\beta-1}\,dx. \]
Therefore, by Eq. (5.8.2),
\[ E(X^{k}) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\Gamma(\alpha+k)\Gamma(\beta)}{\Gamma(\alpha+k+\beta)}, \]
which simplifies to Eq. (5.8.6). The special case of the mean is simple, while the variance follows easily from
\[ E(X^{2}) = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}. \]
■

There are too many beta distributions to provide tables in the back of the book. Any good statistical package will be able to calculate the c.d.f.'s of many beta distributions, and some packages will also be able to calculate the quantile functions. The next example illustrates the importance of being able to calculate means and c.d.f.'s of beta distributions.

Figure 5.8 Probability of discrimination as a function of β. (Plot not reproduced: the vertical axis is the probability of P at most 0.6328, from 0 to 1.0; the horizontal axis is β, from 0 to 100.)

Example 5.8.4 Castaneda v. Partida. Continuing Example 5.8.3, we are now prepared to see why, for every reasonable choice one makes for α and β, the probability of discrimination in Castaneda v. Partida is quite large. To avoid bias either for or against the defendant, we shall suppose that, before learning X, the probability that a Mexican American juror would be selected on each draw from the pool was 0.791. Let Y = 1 if a Mexican American juror is selected on a single draw, and let Y = 0 if not. Then Y has the Bernoulli distribution with parameter p given P = p and E(Y|p) = p. So the law of total probability for expectations, Theorem 4.7.1, says that
Pr(Y = 1) = E(Y) = E[E(Y|P)] = E(P).
This means that we should choose α and β so that E(P) = 0.791. Because E(P) = α/(α + β), this means that α = 3.785β. The conditional distribution of P given X = 100 is the beta distribution with parameters α + 100 and β + 120. For each value of β > 0, we can compute Pr(P ≤ 0.6328|X = 100) using α = 3.785β. Then, for each β we can check whether or not that probability is small. A plot of Pr(P ≤ 0.6328|X = 100) for various values of β is given in Fig. 5.8. From the figure, we see that Pr(P ≤ 0.6328|X = 100) < 0.5 only for β ≥ 51.5. This makes α ≥ 194.9.
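The curve described for Fig. 5.8 can be approximated without a statistical package. The sketch below is an illustration under stated assumptions, not the method used to draw the figure: the beta c.d.f. is approximated by Simpson's rule applied to the p.d.f. (5.8.3), which is adequate here because both posterior parameters exceed 1, and the names `beta_log_pdf`, `beta_cdf`, and `prob_discrimination` are hypothetical.

```python
import math

def beta_log_pdf(p, a, b):
    """Log of the beta p.d.f. (5.8.3) at 0 < p < 1, computed via log-gamma."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

def beta_cdf(t, a, b, m=4000):
    """Simpson approximation to Pr(P <= t) for a beta(a, b) variable.

    Assumes a > 1 and b > 1, so the p.d.f. vanishes at 0 and at 1.
    m is the (even) number of subintervals.
    """
    def pdf(p):
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return math.exp(beta_log_pdf(p, a, b))
    h = t / m
    total = pdf(0.0) + pdf(t)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return total * h / 3

def prob_discrimination(beta, x=100, n=220, threshold=0.6328, ratio=3.785):
    """Pr(P <= threshold | X = x) when the prior is beta(ratio * beta, beta)."""
    alpha = ratio * beta
    # Theorem 5.8.2: the posterior is beta(alpha + x, beta + n - x).
    return beta_cdf(threshold, alpha + x, beta + n - x)

for beta in (1, 20, 40, 60, 80, 100):
    print(beta, round(prob_discrimination(beta), 4))
```

The printed probabilities decrease as β grows and cross 0.5 somewhere between β = 40 and β = 60, in line with the value β ≥ 51.5 read from the figure.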
We claim that the beta distribution with parameters 194.9 and 51.5 as well as all others that make Pr(P ≤ 0.6328|X = 100) < 0.5 are unreasonable because they are incredibly prejudiced about the possibility of discrimination. For example, suppose that someone actually believed, before observing X = 100, that the distribution of P was the beta distribution with parameters 194.9 and 51.5. For this beta distribution, the probability that there is discrimination would be Pr(P ≤ 0.6328) = 3.28 × 10^{−8}, which is essentially 0. All of the other priors with β ≥ 51.5 and α = 3.785β have even smaller probabilities of {P ≤ 0.6328}. Arguing from the other direction, we have the following: Anyone who believed, before observing X = 100, that E(P) = 0.791 and the probability of discrimination was greater than 3.28 × 10^{−8}, would believe that the probability of discrimination is at least 0.5 after learning X = 100. This is then fairly convincing evidence that there was discrimination in this case. ◀

Example 5.8.5 A Clinical Trial. Consider the clinical trial described in Example 2.1.4. Let P be the proportion of all patients in a large group receiving imipramine who have no relapse (called success). A popular model for P is that P has the beta distribution with
parameters α and β. Choosing α and β can be done based on expert opinion about the chance of success and on the effect that data should have on the distribution of P after observing the data. For example, suppose that the doctors running the clinical trial think that the probability of success should be around 1/3. Let X_i = 1 if the ith patient is a success and X_i = 0 if not. We are supposing that E(X_i|p) = Pr(X_i = 1|p) = p, so the law of total probability for expectations (Theorem 4.7.1) says that
\[ \Pr(X_i = 1) = E(X_i) = E[E(X_i|P)] = E(P) = \frac{\alpha}{\alpha+\beta}. \]
If we want Pr(X_i = 1) = 1/3, we need α/(α + β) = 1/3, so β = 2α. Of course, the doctors will revise the probability of success after observing patients from the study. The doctors can choose α and β based on how that revision will occur.

Assume that the random variables X_1, X_2, . . . (the indicators of success) are conditionally independent given P = p. Let X = X_1 + . . . + X_n be the number of patients out of the first n who are successes. The conditional distribution of X given P = p is the binomial distribution with parameters n and p, and the marginal distribution of P is the beta distribution with parameters α and β. Theorem 5.8.2 tells us that the conditional distribution of P given X = x is the beta distribution with parameters α + x and β + n − x. Suppose that a sequence of 20 patients, all of whom are successes, would raise the doctors' probability of success from 1/3 up to 0.9. Then
\[ 0.9 = E(P|X = 20) = \frac{\alpha + 20}{\alpha + \beta + 20}. \]
This equation implies that α + 20 = 9β. Combining this with β = 2α, we get α = 1.18 and β = 2.35.
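The two conditions on the prior can be solved in closed form. The sketch below is an illustration only: the function name `solve_prior` and its arguments are hypothetical, and exact rational arithmetic is used simply to make the rounding to 1.18 and 2.35 visible. Writing s = α + β, the conditions α = m0·s and (α + n)/(s + n) = m1 give s = n(1 − m1)/(m1 − m0).

```python
from fractions import Fraction

def solve_prior(prior_mean, n_successes, target_mean):
    """Find (alpha, beta) of a beta prior such that
    alpha / (alpha + beta) = prior_mean, and
    (alpha + n) / (alpha + beta + n) = target_mean after n straight successes.
    """
    m0, m1, n = Fraction(prior_mean), Fraction(target_mean), n_successes
    # With s = alpha + beta: m0*s + n = m1*(s + n), so s = n(1 - m1)/(m1 - m0).
    s = n * (1 - m1) / (m1 - m0)
    alpha = m0 * s
    return alpha, s - alpha

alpha, beta = solve_prior(Fraction(1, 3), 20, Fraction(9, 10))
print(alpha, beta)                       # exact values as fractions
print(round(float(alpha), 2), round(float(beta), 2))
```

The exact solution is α = 20/17 and β = 40/17, which round to the values 1.18 and 2.35 quoted in the example.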
Finally, we can ask, what will be the distribution of P after observing some patients in the study? Suppose that 40 patients are actually observed, and 22 of them recover (as in Table 2.1). Then the conditional distribution of P given this observation is the beta distribution with parameters 1.18 + 22 = 23.18 and 2.35 + 18 = 20.35. It follows that
\[ E(P|X = 22) = \frac{23.18}{23.18 + 20.35} = 0.5325. \]
Notice how much closer this is to the proportion of successes (0.55) than was E(P) = 1/3. ◀

Proof of Theorem 5.8.1. Theorem 5.8.1, i.e., Eq. (5.8.2), is part of the following useful result. The proof uses Theorem 3.9.5 (multivariate transformation of random variables). If you did not study Theorem 3.9.5, you will not be able to follow the proof of Theorem 5.8.4.

Theorem 5.8.4 Let U and V be independent random variables with U having the gamma distribution with parameters α and 1 and V having the gamma distribution with parameters β and 1. Then
• X = U/(U + V) and Y = U + V are independent,
• X has the beta distribution with parameters α and β, and
• Y has the gamma distribution with parameters α + β and 1.
Also, Eq. (5.8.2) holds.
Proof Because U and V are independent, the joint p.d.f. of U and V is the product of their marginal p.d.f.'s, which are
\[ f_1(u) = \frac{u^{\alpha-1} e^{-u}}{\Gamma(\alpha)}, \quad \text{for } u > 0, \]
\[ f_2(v) = \frac{v^{\beta-1} e^{-v}}{\Gamma(\beta)}, \quad \text{for } v > 0. \]
So, the joint p.d.f. is
\[ f(u, v) = \frac{u^{\alpha-1} v^{\beta-1} e^{-(u+v)}}{\Gamma(\alpha)\Gamma(\beta)}, \]
for u > 0 and v > 0. The transformation from (u, v) to (x, y) is
\[ x = r_1(u, v) = \frac{u}{u+v} \quad \text{and} \quad y = r_2(u, v) = u + v, \]
and the inverse is
\[ u = s_1(x, y) = xy \quad \text{and} \quad v = s_2(x, y) = (1-x)y. \]
The Jacobian is the determinant of the matrix
\[ J = \begin{bmatrix} y & x \\ -y & 1-x \end{bmatrix}, \]
which equals y. According to Theorem 3.9.5, the joint p.d.f. of (X, Y) is then
\[ g(x, y) = f(s_1(x, y), s_2(x, y))\, y = \frac{x^{\alpha-1}(1-x)^{\beta-1} y^{\alpha+\beta-1} e^{-y}}{\Gamma(\alpha)\Gamma(\beta)}, \tag{5.8.7} \]
for 0 < x < 1 and y > 0. Notice that this joint p.d.f. factors into separate functions of x and y, and hence X and Y are independent. The marginal distribution of Y is available from Theorem 5.7.7. The marginal p.d.f. of X is obtained by integrating y out of (5.8.7):
\[ g_1(x) = \int_0^\infty \frac{x^{\alpha-1}(1-x)^{\beta-1} y^{\alpha+\beta-1} e^{-y}}{\Gamma(\alpha)\Gamma(\beta)}\,dy = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\Gamma(\alpha)\Gamma(\beta)} \int_0^\infty y^{\alpha+\beta-1} e^{-y}\,dy = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}, \tag{5.8.8} \]
where the last equation follows from (5.7.2). Because the far right side of (5.8.8) is a p.d.f., it integrates to 1, which proves Eq. (5.8.2). Also, one can recognize the far right side of (5.8.8) as the p.d.f. of the beta distribution with parameters α and β. ■

Summary

The family of beta distributions is a popular model for random variables that lie in the interval (0, 1), such as unknown proportions of success for sequences of Bernoulli trials. The mean of the beta distribution with parameters α and β is α/(α + β). If X has the binomial distribution with parameters n and p conditional on P = p, and if P has the beta distribution with parameters α and β, then, conditional on X = x, the distribution of P is the beta distribution with parameters α + x and β + n − x.

Exercises

1. Compute the quantile function of the beta distribution with parameters α > 0 and β = 1.

2. Determine the mode of the beta distribution with parameters α and β, assuming that α > 1 and β > 1.

3. Sketch the p.d.f. of the beta distribution for each of the following pairs of values of the parameters:
a. α = 1/2 and β = 1/2
b. α = 1/2 and β = 1
c. α = 1/2 and β = 2
d. α = 1 and β = 1
e. α = 1 and β = 2
f. α = 2 and β = 2
g. α = 25 and β = 100
h. α = 100 and β = 25

4. Suppose that X has the beta distribution with parameters α and β. Show that 1 − X has the beta distribution with parameters β and α.

5.
Suppose that X has the beta distribution with parameters α and β, and let r and s be given positive integers. Determine the value of E[X^r (1 − X)^s].

6. Suppose that X and Y are independent random variables, X has the gamma distribution with parameters α_1 and β, and Y has the gamma distribution with parameters α_2 and β. Let U = X/(X + Y) and V = X + Y. Show that (a) U has the beta distribution with parameters α_1 and α_2, and (b) U and V are independent. Hint: Look at the steps in the proof of Theorem 5.8.1.

7. Suppose that X_1 and X_2 form a random sample of two observed values from the exponential distribution with parameter β. Show that X_1/(X_1 + X_2) has the uniform distribution on the interval [0, 1].

8. Suppose that the proportion X of defective items in a large lot is unknown and that X has the beta distribution with parameters α and β.
a. If one item is selected at random from the lot, what is the probability that it will be defective?
b. If two items are selected at random from the lot, what is the probability that both will be defective?

9. A manufacturer believes that an unknown proportion P of parts produced will be defective. She models P as having a beta distribution. The manufacturer thinks that P should be around 0.05, but if the first 10 observed products were all defective, the mean of P would rise from 0.05 to 0.9. Find the beta distribution that has these properties.

10. A marketer is interested in how many customers are likely to buy a particular product in a particular store. Let P be the proportion of all customers in the store who will buy the product. Let the distribution of P be uniform on the interval [0, 1] before observing any data. The marketer then observes 25 customers and only six buy the product. If the customers were conditionally independent given P, find the conditional distribution of P given the observed customers.

5.9 The Multinomial Distributions

Many times we observe data that can assume three or more possible values.
The family of multinomial distributions is an extension of the family of binomial distributions to handle these cases. The multinomial distributions are multivariate distributions.

Definition and Derivation of Multinomial Distributions

Example 5.9.1 Blood Types. In Example 1.8.4 on page 34, we discussed human blood types, of which there are four: O, A, B, and AB. If a number of people are chosen at random, we might be interested in the probability of obtaining certain numbers of each blood type. Such calculations are used in the courts during paternity suits. ◀

In general, suppose that a population contains items of k different types (k ≥ 2) and that the proportion of the items in the population that are of type i is p_i
($i = 1, \ldots, k$). It is assumed that $p_i > 0$ for $i = 1, \ldots, k$, and $\sum_{i=1}^{k} p_i = 1$. Let $p = (p_1, \ldots, p_k)$ denote the vector of these probabilities.

Next, suppose that $n$ items are selected at random from the population, with replacement, and let $X_i$ denote the number of selected items that are of type $i$ ($i = 1, \ldots, k$). Because the $n$ items are selected from the population at random with replacement, the selections will be independent of each other. Hence, the probability that the first item will be of type $i_1$, the second item of type $i_2$, and so on, is simply $p_{i_1} p_{i_2} \cdots p_{i_n}$. Therefore, the probability that the sequence of $n$ outcomes will consist of exactly $x_1$ items of type 1, $x_2$ items of type 2, and so on, selected in a particular prespecified order, is $p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$. It follows that the probability of obtaining exactly $x_i$ items of type $i$ ($i = 1, \ldots, k$) is equal to the probability $p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$ multiplied by the total number of different ways in which the order of the $n$ items can be specified.

From the discussion that led to the definition of multinomial coefficients (Definition 1.9.1), it follows that the total number of different ways in which $n$ items can be arranged when there are $x_i$ items of type $i$ ($i = 1, \ldots, k$) is given by the multinomial coefficient
\[
\binom{n}{x_1, \ldots, x_k} = \frac{n!}{x_1!\, x_2! \cdots x_k!}.
\]

In the notation of multivariate distributions, let $X = (X_1, \ldots, X_k)$ denote the random vector of counts, and let $x = (x_1, \ldots, x_k)$ denote a possible value for that random vector. Finally, let $f(x \mid n, p)$ denote the joint p.f. of $X$. Then
\[
f(x \mid n, p) = \Pr(X = x) = \Pr(X_1 = x_1, \ldots, X_k = x_k)
= \begin{cases}
\dbinom{n}{x_1, \ldots, x_k} p_1^{x_1} \cdots p_k^{x_k} & \text{if } x_1 + \cdots + x_k = n, \\
0 & \text{otherwise.}
\end{cases}
\tag{5.9.1}
\]

Definition 5.9.1 Multinomial Distributions. A discrete random vector $X = (X_1, \ldots, X_k)$ whose p.f. is given by Eq. (5.9.1) has the multinomial distribution with parameters $n$ and $p = (p_1, \ldots, p_k)$.

Example 5.9.2 Attendance at a Baseball Game. Suppose that 23 percent of the people attending a certain baseball game live within 10 miles of the stadium, 59 percent live between 10 and 50 miles from the stadium, and 18 percent live more than 50 miles from the stadium. Suppose also that 20 people are selected at random from the crowd attending the game. We shall determine the probability that seven of the people selected live within 10 miles of the stadium, eight of them live between 10 and 50 miles from the stadium, and five of them live more than 50 miles from the stadium.

We shall assume that the crowd attending the game is so large that it is irrelevant whether the 20 people are selected with or without replacement. We can therefore assume that they were selected with replacement. It then follows from Eq. (5.9.1) that the required probability is
\[
\frac{20!}{7!\, 8!\, 5!} (0.23)^7 (0.59)^8 (0.18)^5 = 0.0094.
\]
◀
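The calculation in Example 5.9.2 can be reproduced with a few lines of Python. The sketch below is only an illustration of Eq. (5.9.1); the function name `multinomial_pf` is a hypothetical helper introduced here, not something defined in the text, and it assumes the counts are nonnegative integers.

```python
from math import factorial, prod


def multinomial_pf(x, p):
    """Evaluate the multinomial p.f. of Eq. (5.9.1) at the count vector x.

    Hypothetical helper: n is taken to be sum(x), so the "otherwise" branch
    of Eq. (5.9.1) (counts not summing to n) never arises here.
    """
    if len(x) != len(p):
        raise ValueError("x and p must have the same length")
    # Multinomial coefficient n! / (x_1! ... x_k!), computed with exact integers.
    coefficient = factorial(sum(x))
    for count in x:
        coefficient //= factorial(count)
    # Multiply by p_1^{x_1} ... p_k^{x_k}.
    return coefficient * prod(p_i ** x_i for p_i, x_i in zip(p, x))


# Example 5.9.2: 20 people, with 7, 8, and 5 in the three distance groups.
prob = multinomial_pf([7, 8, 5], [0.23, 0.59, 0.18])
print(round(prob, 4))  # prints 0.0094
```

The coefficient is kept as an exact integer and only the final product is a float, which avoids overflow or rounding in the factorials for moderate $n$.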
Example 5.9.3 Blood Types. Berry and Geisser (1986) estimate the probabilities of the four blood types in Table 5.3 based on a sample of 6004 white Californians that was analyzed by Grunbaum et al. (1978).

Table 5.3 Estimated probabilities of blood types for white Californians

  A      B      AB     O
  0.360  0.123  0.038  0.479

Suppose that we will select two people at random from this population and observe their blood types. What is the probability that they will both have the same blood type? The event that the two people have the same blood type is the union of four disjoint events, each of which is the event that the two people both have one of the four different blood types. Each of these events has probability $\binom{2}{2,0,0,0}$ times the square of one of the four probabilities. The probability that we want is the sum of the probabilities of the four events:
\[
\binom{2}{2,0,0,0} \left(0.360^2 + 0.123^2 + 0.038^2 + 0.479^2\right) = 0.376.
\]
◀
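As a quick check of the arithmetic in Example 5.9.3, the sum of squared probabilities can be evaluated directly. This is a minimal sketch: the dictionary simply restates the four probabilities used in the example, and `prob_same_type` is a hypothetical name chosen for illustration.

```python
def prob_same_type(type_probs):
    """Probability that two independent draws fall in the same category.

    Each category contributes the square of its probability, because the
    multinomial coefficient for the count vector (2, 0, ..., 0) equals 1.
    """
    return sum(p ** 2 for p in type_probs.values())


# Estimated blood-type probabilities used in Example 5.9.3.
blood_type_probs = {"A": 0.360, "B": 0.123, "AB": 0.038, "O": 0.479}

same = prob_same_type(blood_type_probs)
print(round(same, 3))  # prints 0.376
```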
Relation between the Multinomial and Binomial Distributions

When the population being sampled contains only two different types of items, that is, when $k = 2$, each multinomial distribution reduces to essentially a binomial distribution. The precise form of this relationship is as follows.

Theorem 5.9.1 Suppose that the random vector $X = (X_1, X_2)$ has the multinomial distribution with parameters $n$ and $p = (p_1, p_2)$. Then $X_1$ has the binomial distribution with parameters $n$ and $p_1$, and $X_2 = n - X_1$.

Proof It is clear from the definition of multinomial distributions that $X_2 = n - X_1$ and $p_2 = 1 - p_1$. Therefore, the random vector $X$ is actually determined by the single random variable $X_1$. From the derivation of the multinomial distribution, we see that $X_1$ is the number of items of type 1 that are selected if $n$ items are selected from a population consisting of two types of items. If we call items of type 1 "success," then $X_1$ is the number of successes in $n$ Bernoulli trials with probability of success on each trial equal to $p_1$. It follows that $X_1$ has the binomial distribution with parameters $n$ and $p_1$.

The proof of Theorem 5.9.1 extends easily to the following result.

Corollary 5.9.1 Suppose that the random vector $X = (X_1, \ldots, X_k)$ has the multinomial distribution with parameters $n$ and $p = (p_1, \ldots, p_k)$. The marginal distribution of each variable $X_i$ ($i = 1, \ldots, k$) is the binomial distribution with parameters $n$ and $p_i$.

Proof Choose one $i$ from $1, \ldots, k$, and define success to be the selection of an item of type $i$. Then $X_i$ is the number of successes in $n$ Bernoulli trials with probability of success on each trial equal to $p_i$.

A further generalization of Corollary 5.9.1 is that the marginal distribution of the sum of some of the coordinates of a multinomial vector has a binomial distribution. The proof is left to Exercise 1 in this section.
Corollary 5.9.2 Suppose that the random vector $X = (X_1, \ldots, X_k)$ has the multinomial distribution with parameters $n$ and $p = (p_1, \ldots, p_k)$ with $k > 2$. Let $\ell < k$, and let $i_1, \ldots, i_\ell$ be distinct elements of the set $\{1, \ldots, k\}$. The distribution of $Y = X_{i_1} + \cdots + X_{i_\ell}$ is the binomial distribution with parameters $n$ and $p_{i_1} + \cdots + p_{i_\ell}$.

As a final note, the relationship between Bernoulli and binomial distributions extends to multinomial distributions. The Bernoulli distribution with parameter $p$ is the same as the binomial distribution with parameters 1 and $p$. However, there is no separate name for a multinomial distribution with first parameter $n = 1$. A random vector with such a distribution will consist of a single 1 in one of its coordinates and $k - 1$ zeros in the other coordinates. The probability is $p_i$ that the $i$th coordinate is the 1. A $k$-dimensional vector seems an unwieldy way to represent a random object that can take only $k$ different values. A more common representation would be as a single discrete random variable $X$ that takes one of the $k$ values $1, \ldots, k$ with probabilities $p_1, \ldots, p_k$, respectively. The univariate distribution just described has no famous name associated with it; however, we have just shown that it is closely related to the multinomial distribution with parameters 1 and $(p_1, \ldots, p_k)$.

Means, Variances, and Covariances

The means, variances, and covariances of the coordinates of a multinomial random vector are given by the next result.

Theorem 5.9.2 Means, Variances, and Covariances. Let the random vector $X$ have the multinomial distribution with parameters $n$ and $p$.
The means and variances of the coordinates of $X$ are
\[
E(X_i) = np_i \quad \text{and} \quad \operatorname{Var}(X_i) = np_i(1 - p_i) \quad \text{for } i = 1, \ldots, k.
\tag{5.9.2}
\]
Also, the covariances between the coordinates are
\[
\operatorname{Cov}(X_i, X_j) = -np_i p_j \quad \text{for } i \ne j.
\tag{5.9.3}
\]

Proof Corollary 5.9.1 says that the marginal distribution of each component $X_i$ is the binomial distribution with parameters $n$ and $p_i$. Eq. (5.9.2) follows directly from this fact. Corollary 5.9.2 says that $X_i + X_j$ has the binomial distribution with parameters $n$ and $p_i + p_j$. Hence,
\[
\operatorname{Var}(X_i + X_j) = n(p_i + p_j)(1 - p_i - p_j).
\tag{5.9.4}
\]
According to Theorem 4.6.6, it is also true that
\[
\operatorname{Var}(X_i + X_j) = \operatorname{Var}(X_i) + \operatorname{Var}(X_j) + 2\operatorname{Cov}(X_i, X_j)
= np_i(1 - p_i) + np_j(1 - p_j) + 2\operatorname{Cov}(X_i, X_j).
\tag{5.9.5}
\]
Equate the right sides of (5.9.4) and (5.9.5), and solve for $\operatorname{Cov}(X_i, X_j)$. Expanding the right side of (5.9.4) gives $np_i(1 - p_i) + np_j(1 - p_j) - 2np_i p_j$, so $2\operatorname{Cov}(X_i, X_j) = -2np_i p_j$. The result is (5.9.3).

Note: Negative Covariance Is Natural for Multinomial Distributions. The negative covariance between different coordinates of a multinomial vector is natural since there are only $n$ selections to be distributed among the $k$ coordinates of the vector. If one of the coordinates is large, at least some of the others have to be small because the sum of the coordinates is fixed at $n$.

Summary

Multinomial distributions extend binomial distributions to counts of more than two possible outcomes. The $i$th coordinate of a vector having the multinomial distribution
with parameters $n$ and $p = (p_1, \ldots, p_k)$ has the binomial distribution with parameters $n$ and $p_i$ for $i = 1, \ldots, k$. Hence, the means and variances of the coordinates of a multinomial vector are the same as those of a binomial random variable. The covariance between the $i$th and $j$th coordinates is $-np_i p_j$.

Exercises

1. Prove Corollary 5.9.2.

2. Suppose that $F$ is a continuous c.d.f. on the real line, and let $a_1$ and $a_2$ be numbers such that $F(a_1) = 0.3$ and $F(a_2) = 0.8$. If 25 observations are selected at random from the distribution for which the c.d.f. is $F$, what is the probability that six of the observed values will be less than $a_1$, 10 of the observed values will be between $a_1$ and $a_2$, and nine of the observed values will be greater than $a_2$?

3. If five balanced dice are rolled, what is the probability that the number 1 and the number 4 will appear the same number of times?

4. Suppose that a die is loaded so that each of the numbers 1, 2, 3, 4, 5, and 6 has a different probability of appearing when the die is rolled. For $i = 1, \ldots, 6$, let $p_i$ denote the probability that the number $i$ will be obtained, and suppose that $p_1 = 0.11$, $p_2 = 0.30$, $p_3 = 0.22$, $p_4 = 0.05$, $p_5 = 0.25$, and $p_6 = 0.07$. Suppose also that the die is to be rolled 40 times. Let $X_1$ denote the number of rolls for which an even number appears, and let $X_2$ denote the number of rolls for which either the number 1 or the number 3 appears. Find the value of $\Pr(X_1 = 20 \text{ and } X_2 = 15)$.

5. Suppose that 16 percent of the students in a certain high school are freshmen, 14 percent are sophomores, 38 percent are juniors, and 32 percent are seniors. If 15 students are selected at random from the school, what is the probability that at least eight will be either freshmen or sophomores?

6. In Exercise 5, let $X_3$ denote the number of juniors in the random sample of 15 students, and let $X_4$ denote the number of seniors in the sample. Find the value of $E(X_3 - X_4)$ and the value of $\operatorname{Var}(X_3 - X_4)$.

7. Suppose that the random variables $X_1, \ldots, X_k$ are independent and that $X_i$ has the Poisson distribution with mean $\lambda_i$ ($i = 1, \ldots, k$). Show that for each fixed positive integer $n$, the conditional distribution of the random vector $X = (X_1, \ldots, X_k)$, given that $\sum_{i=1}^{k} X_i = n$, is the multinomial distribution with parameters $n$ and $p = (p_1, \ldots, p_k)$, where
\[
p_i = \frac{\lambda_i}{\sum_{j=1}^{k} \lambda_j} \quad \text{for } i = 1, \ldots, k.
\]

8. Suppose that the parts produced by a machine can have three different levels of functionality: working, impaired, defective. Let $p_1$, $p_2$, and $p_3 = 1 - p_1 - p_2$ be the probabilities that a part is working, impaired, and defective, respectively. Suppose that the vector $p = (p_1, p_2)$ is unknown but has a joint distribution with p.d.f.
\[
f(p_1, p_2) = \begin{cases}
12 p_1^2 & \text{for } 0 < p_1, p_2 < 1 \text{ and } p_1 + p_2 < 1, \\
0 & \text{otherwise.}
\end{cases}
\]
Suppose that we observe 10 parts that are conditionally independent given $p$, and among those 10 parts, eight are working and two are impaired. Find the conditional p.d.f. of $p$ given the observed parts. Hint: You might find Eq. (5.8.2) helpful.

5.10 The Bivariate Normal Distributions

The first family of multivariate continuous distributions for which we have a name is a generalization of the family of normal distributions to two coordinates. There is more structure to a bivariate normal distribution than just a pair of normal marginal distributions.

Definition and Derivation of Bivariate Normal Distributions

Example 5.10.1 Thyroid Hormones. Production of rocket fuel produces a chemical, perchlorate, that has found its way into drinking water supplies. Perchlorate is suspected of inhibiting thyroid function.
Experiments have been performed in which laboratory rats have been dosed with perchlorate in their drinking water. After several weeks, rats were sacrificed, and a number of thyroid hormones were measured. The levels of these hormones were then compared to the levels of the same hormones in rats that received no perchlorate in their water. Two hormones, TSH and T4, were of particular interest. Experimenters were interested in the joint distribution of TSH and T4. Although each of the hormones might be modeled with a normal distribution, a bivariate distribution is needed in order to model the two hormone levels jointly. Knowledge of thyroid activity suggests that the levels of these hormones will not be independent, because one of them is actually used by the thyroid to stimulate production of the other. ◀

If researchers are comfortable using the family of normal distributions to model each of two random variables separately, such as the hormones in Example 5.10.1, then they need a bivariate generalization of the family of normal distributions that still has normal distributions for its marginals while allowing the two random variables to be dependent. A simple way to create such a generalization is to make use of the result in Corollary 5.6.1. That result says that a linear combination of independent normal random variables has a normal distribution. If we create two different linear combinations $X_1$ and $X_2$ of the same independent normal random variables, then $X_1$ and $X_2$ will each have a normal distribution and they might be dependent. The following result formalizes this idea.

Theorem 5.10.1 Suppose that $Z_1$ and $Z_2$ are independent random variables, each of which has the standard normal distribution. Let $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$, and $\rho$ be constants such that $-\infty < \mu_i < \infty$ ($i = 1, 2$), $\sigma_i > 0$ ($i = 1, 2$), and $-1 < \rho < 1$. Define two new random variables $X_1$ and $X_2$ as follows:
\[
\begin{aligned}
X_1 &= \sigma_1 Z_1 + \mu_1, \\
X_2 &= \sigma_2 \left[ \rho Z_1 + (1 - \rho^2)^{1/2} Z_2 \right] + \mu_2.
\end{aligned}
\tag{5.10.1}
\]
The joint p.d.f. of $X_1$ and $X_2$ is
\[
f(x_1, x_2) = \frac{1}{2\pi (1 - \rho^2)^{1/2} \sigma_1 \sigma_2}
\exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left( \frac{x_1 - \mu_1}{\sigma_1} \right)^2
- 2\rho \left( \frac{x_1 - \mu_1}{\sigma_1} \right) \left( \frac{x_2 - \mu_2}{\sigma_2} \right)
+ \left( \frac{x_2 - \mu_2}{\sigma_2} \right)^2 \right] \right\}.
\tag{5.10.2}
\]

Proof This proof relies on Theorem 3.9.5 (multivariate transformation of random variables). If you did not study Theorem 3.9.5, you won't be able to follow this proof. The joint p.d.f. $g(z_1, z_2)$ of $Z_1$ and $Z_2$ is
\[
g(z_1, z_2) = \frac{1}{2\pi} \exp\left[ -\frac{1}{2} \left( z_1^2 + z_2^2 \right) \right],
\tag{5.10.3}
\]
for all $z_1$ and $z_2$.

The inverse of the transformation (5.10.1) is $(Z_1, Z_2) = (s_1(X_1, X_2), s_2(X_1, X_2))$, where
\[
\begin{aligned}
s_1(x_1, x_2) &= \frac{x_1 - \mu_1}{\sigma_1}, \\
s_2(x_1, x_2) &= \frac{1}{(1 - \rho^2)^{1/2}} \left( \frac{x_2 - \mu_2}{\sigma_2} - \rho \frac{x_1 - \mu_1}{\sigma_1} \right).
\end{aligned}
\tag{5.10.4}
\]
The Jacobian $J$ of the transformation is
\[
J = \det \begin{bmatrix}
\dfrac{1}{\sigma_1} & 0 \\[2ex]
\dfrac{-\rho}{\sigma_1 (1 - \rho^2)^{1/2}} & \dfrac{1}{\sigma_2 (1 - \rho^2)^{1/2}}
\end{bmatrix}
= \frac{1}{(1 - \rho^2)^{1/2} \sigma_1 \sigma_2}.
\tag{5.10.5}
\]
If one substitutes $s_i(x_1, x_2)$ for $z_i$ ($i = 1, 2$) in Eq. (5.10.3) and then multiplies by $|J|$, one obtains Eq. (5.10.2), which is the joint p.d.f. of $(X_1, X_2)$ according to Theorem 3.9.5.

Some simple properties of the distribution with p.d.f. in Eq. (5.10.2) are worth deriving before giving a name to the joint distribution.
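Before the formal statement, the construction in Eq. (5.10.1) can be explored numerically. The following is a rough simulation sketch, not part of the text: the parameter values, the sample size, the seed, and the helper names `simulate_pairs` and `sample_moments` are all arbitrary assumptions made for illustration. It builds $(X_1, X_2)$ from independent standard normal draws and reports the sample means, variances, and correlation, which can be compared with $\mu_i$, $\sigma_i^2$, and $\rho$.

```python
import math
import random


def simulate_pairs(mu1, mu2, sigma1, sigma2, rho, size, seed=0):
    """Generate (x1, x2) pairs using the linear combinations in Eq. (5.10.1)."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    root = math.sqrt(1.0 - rho ** 2)
    pairs = []
    for _ in range(size):
        z1 = rng.gauss(0.0, 1.0)  # independent standard normal draws
        z2 = rng.gauss(0.0, 1.0)
        x1 = sigma1 * z1 + mu1
        x2 = sigma2 * (rho * z1 + root * z2) + mu2
        pairs.append((x1, x2))
    return pairs


def sample_moments(pairs):
    """Return sample means, variances, and correlation of the pairs."""
    n = len(pairs)
    m1 = sum(x1 for x1, _ in pairs) / n
    m2 = sum(x2 for _, x2 in pairs) / n
    v1 = sum((x1 - m1) ** 2 for x1, _ in pairs) / n
    v2 = sum((x2 - m2) ** 2 for _, x2 in pairs) / n
    c12 = sum((x1 - m1) * (x2 - m2) for x1, x2 in pairs) / n
    return m1, m2, v1, v2, c12 / math.sqrt(v1 * v2)


# Arbitrary illustrative parameters (assumptions, not values from the text).
pairs = simulate_pairs(mu1=1.0, mu2=-2.0, sigma1=2.0, sigma2=0.5, rho=0.6, size=100_000)
m1, m2, v1, v2, r = sample_moments(pairs)
print(f"means: {m1:.3f}, {m2:.3f}  variances: {v1:.3f}, {v2:.3f}  correlation: {r:.3f}")
```

A simulation of this kind only suggests the properties; it does not replace the derivation that follows.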
Theorem 5.10.2 Suppose that $X_1$ and $X_2$ have the joint distribution whose p.d.f. is given by Eq. (5.10.2). Then there exist independent standard normal random variables $Z_1$ and $Z_2$ such that Eqs. (5.10.1) hold. Also, the mean of $X_i$ is $\mu_i$ and the variance of $X_i$ is $\sigma_i^2$ for $i = 1, 2$. Furthermore, the correlation between $X_1$ and $X_2$ is $\rho$. Finally, the marginal distribution of $X_i$ is the normal distribution with mean $\mu_i$ and variance $\sigma_i^2$ for $i = 1, 2$.

Proof Use the functions $s_1$ and $s_2$ defined in Eqs. (5.10.4) and define $Z_i = s_i(X_1, X_2)$ for $i = 1, 2$. By running the proof of Theorem 5.10.1 in reverse, we see that the joint p.d.f. of $Z_1$ and $Z_2$ is Eq. (5.10.3). Hence, $Z_1$ and $Z_2$ are independent standard normal random variables.

The values of the means and variances of $X_1$ and $X_2$ are easily obtained by applying Corollary 5.6.1 to Eq. (5.10.1). If one applies the result in Exercise 8 of Sec. 4.6, one obtains $\operatorname{Cov}(X_1, X_2) = \sigma_1\sigma_2\rho$. Dividing this covariance by the product $\sigma_1\sigma_2$ of the standard deviations shows that $\rho$ is the correlation. The claim about the marginal distributions of $X_1$ and $X_2$ is immediate from Corollary 5.6.1. ∎

We are now ready to define the family of bivariate normal distributions.

Definition 5.10.1 Bivariate Normal Distributions. When the joint p.d.f. of two random variables $X_1$ and $X_2$ is of the form in Eq. (5.10.2), it is said that $X_1$ and $X_2$ have the bivariate normal distribution with means $\mu_1$ and $\mu_2$, variances $\sigma_1^2$ and $\sigma_2^2$, and correlation $\rho$.

It was convenient for us to derive the bivariate normal distributions as the joint distributions of certain linear combinations of independent random variables having standard normal distributions. It should be emphasized, however, that bivariate normal distributions arise directly and naturally in many practical problems. For example, for many populations the joint distribution of two physical characteristics such as the heights and the weights of the individuals in the population will be approximately a bivariate normal distribution.
For other populations, the joint distribution of the scores of the individuals in the population on two related tests will be approximately a bivariate normal distribution.

Example 5.10.2 Anthropometry of Flea Beetles. Lubischew (1962) reports the measurements of several physical features of a variety of species of flea beetle. The investigation was concerned with whether some combination of easily obtained measurements could be used to distinguish the different species. Figure 5.9 shows a scatterplot of measurements of the first joint in the first tarsus versus the second joint in the first tarsus for a sample of 31 from the species Chaetocnema heikertingeri. The plot also includes three ellipses that correspond to a fitted bivariate normal distribution. The ellipses were chosen to contain 25%, 50%, and 75% of the probability of the fitted bivariate normal distribution. The fitted distribution is the bivariate normal distribution with means 201 and 119.3, variances 222.1 and 44.2, and correlation 0.64. ◀

Figure 5.9 Scatterplot of flea beetle data with 25%, 50%, and 75% bivariate normal ellipses for Example 5.10.2. (Horizontal axis: first tarsus joint.)

Properties of Bivariate Normal Distributions

For random variables with a bivariate normal distribution, we find that being independent is equivalent to being uncorrelated.

Theorem 5.10.3 Independence and Correlation. Two random variables $X_1$ and $X_2$ that have a bivariate normal distribution are independent if and only if they are uncorrelated.

Proof The "only if" direction is already known from Theorem 4.6.4.
For the "if" direction, assume that $X_1$ and $X_2$ are uncorrelated. Then $\rho = 0$, and it can be seen from Eq. (5.10.2) that the joint p.d.f. $f(x_1, x_2)$ factors into the product of the marginal p.d.f. of $X_1$ and the marginal p.d.f. of $X_2$. Hence, $X_1$ and $X_2$ are independent. ∎

We have already seen in Example 4.6.4 that two random variables $X_1$ and $X_2$ with an arbitrary joint distribution can be uncorrelated without being independent. Theorem 5.10.3 says that no such examples exist in which $X_1$ and $X_2$ have a bivariate normal distribution.

When the correlation is not zero, Theorem 5.10.2 gives the marginal distributions of bivariate normal random variables. Combining the marginal and joint distributions allows us to find the conditional distributions of each $X_i$ given the other one. The next theorem derives the conditional distributions using another technique.

Theorem 5.10.4 Conditional Distributions. Let $X_1$ and $X_2$ have the bivariate normal distribution whose p.d.f. is Eq. (5.10.2).
The conditional distribution of $X_2$ given that $X_1 = x_1$ is the normal distribution with mean and variance given by

$$
E(X_2 \mid x_1) = \mu_2 + \rho\sigma_2\left(\frac{x_1 - \mu_1}{\sigma_1}\right), \qquad
\operatorname{Var}(X_2 \mid x_1) = (1 - \rho^2)\sigma_2^2.
\tag{5.10.6}
$$

Proof We will make liberal use of Theorem 5.10.2 and its notation in this proof. Conditioning on $X_1 = x_1$ is the same as conditioning on $Z_1 = (x_1 - \mu_1)/\sigma_1$. When we want to find the conditional distribution of $X_2$ given $Z_1 = (x_1 - \mu_1)/\sigma_1$, we can substitute $(x_1 - \mu_1)/\sigma_1$ for $Z_1$ in the formula for $X_2$ in Eq. (5.10.1) and find the conditional distribution for the rest of the formula. That is, the conditional distribution of $X_2$ given that $X_1 = x_1$ is the same as the conditional distribution of

$$
(1 - \rho^2)^{1/2}\sigma_2 Z_2 + \mu_2 + \rho\sigma_2\left(\frac{x_1 - \mu_1}{\sigma_1}\right)
\tag{5.10.7}
$$

given $Z_1 = (x_1 - \mu_1)/\sigma_1$. But $Z_2$ is the only random variable in Eq. (5.10.7), and $Z_2$ is independent of $Z_1$. Hence, the conditional distribution of $X_2$ given $X_1 = x_1$ is the marginal distribution of Eq. (5.10.7), namely, the normal distribution with mean and variance given by Eq. (5.10.6). ∎
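Eq. (5.10.6) translates directly into code. The sketch below is a minimal illustration built on Python's standard-library `statistics.NormalDist`. The function name `conditional_x2_given_x1` and the numerical values in the usage note are hypothetical and serve only as an illustration.

```python
import math
from statistics import NormalDist


def conditional_x2_given_x1(x1, mu1, mu2, sigma1, sigma2, rho):
    """Conditional distribution of X2 given X1 = x1, following Eq. (5.10.6)."""
    # Conditional mean: linear in x1, with slope rho * sigma2 / sigma1.
    cond_mean = mu2 + rho * sigma2 * (x1 - mu1) / sigma1
    # Conditional standard deviation: does not depend on x1.
    cond_sd = math.sqrt(1.0 - rho * rho) * sigma2
    return NormalDist(mu=cond_mean, sigma=cond_sd)
```

For instance, with the made-up values $\mu_1 = \mu_2 = 0$, $\sigma_1 = 1$, $\sigma_2 = 2$, and $\rho = 0.5$, the call `conditional_x2_given_x1(1.0, 0.0, 0.0, 1.0, 2.0, 0.5)` returns a normal distribution with mean 1 and variance 3. The `cdf` and `inv_cdf` methods of the returned object then give conditional probabilities and quantiles.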
The conditional distribution of $X_1$ given that $X_2 = x_2$ cannot be derived so easily from Eq. (5.10.1) because of the different ways in which $Z_1$ and $Z_2$ enter Eq. (5.10.1). However, it is seen from Eq. (5.10.2) that the joint distribution of $X_2$ and $X_1$ is also bivariate normal with all of the subscripts 1 and 2 switched on all of the parameters. Hence, we can apply Theorem 5.10.4 to $X_2$ and $X_1$ to conclude that the conditional distribution of $X_1$ given that $X_2 = x_2$ must be the normal distribution with mean and variance

$$
E(X_1 \mid x_2) = \mu_1 + \rho\sigma_1\left(\frac{x_2 - \mu_2}{\sigma_2}\right), \qquad
\operatorname{Var}(X_1 \mid x_2) = (1 - \rho^2)\sigma_1^2.
\tag{5.10.8}
$$

We have now shown that each marginal distribution and each conditional distribution of a bivariate normal distribution is a univariate normal distribution.

Some particular features of the conditional distribution of $X_2$ given that $X_1 = x_1$ should be noted. If $\rho \neq 0$, then $E(X_2 \mid x_1)$ is a linear function of $x_1$. If $\rho > 0$, the slope of this linear function is positive. If $\rho < 0$, the slope of the function is negative.
However, the variance of the conditional distribution of $X_2$ given that $X_1 = x_1$ is $(1 - \rho^2)\sigma_2^2$, which does not depend on $x_1$. Furthermore, this variance of the conditional distribution of $X_2$ is smaller than the variance $\sigma_2^2$ of the marginal distribution of $X_2$.

Example 5.10.3 Predicting a Person's Weight. Let $X_1$ denote the height of a person selected at random from a certain population, and let $X_2$ denote the weight of the person. Suppose that these random variables have the bivariate normal distribution for which the p.d.f. is specified by Eq. (5.10.2) and that the person's weight $X_2$ must be predicted. We shall compare the smallest M.S.E. that can be attained if the person's height $X_1$ is known when her weight must be predicted with the smallest M.S.E. that can be attained if her height is not known.

If the person's height is not known, then the best prediction of her weight is the mean $E(X_2) = \mu_2$, and the M.S.E. of this prediction is the variance $\sigma_2^2$.
If it is known that the person's height is $x_1$, then the best prediction is the mean $E(X_2 \mid x_1)$ of the conditional distribution of $X_2$ given that $X_1 = x_1$, and the M.S.E. of this prediction is the variance $(1 - \rho^2)\sigma_2^2$ of that conditional distribution. Hence, when the value of $X_1$ is known, the M.S.E. is reduced from $\sigma_2^2$ to $(1 - \rho^2)\sigma_2^2$. ◀

Since the variance of the conditional distribution in Example 5.10.3 is $(1 - \rho^2)\sigma_2^2$, regardless of the known height $x_1$ of the person, it follows that the difficulty of predicting the person's weight is the same for a tall person, a short person, or a person of medium height. Furthermore, since the variance $(1 - \rho^2)\sigma_2^2$ decreases as $|\rho|$ increases, it follows that it is easier to predict a person's weight from her height when the person is selected from a population in which height and weight are highly correlated.

Example 5.10.4 Determining a Marginal Distribution. Suppose that a random variable $X$ has the normal distribution with mean $\mu$ and variance $\sigma^2$, and that for every number $x$, the conditional distribution of another random variable $Y$ given that $X = x$ is the normal distribution with mean $x$ and variance $\tau^2$.
We shall determine the marginal distribution of $Y$.

We know that the marginal distribution of $X$ is a normal distribution, and the conditional distribution of $Y$ given that $X = x$ is a normal distribution, for which the mean is a linear function of $x$ and the variance is constant. It follows that the joint distribution of $X$ and $Y$ must be a bivariate normal distribution (see Exercise 14). Hence, the marginal distribution of $Y$ is also a normal distribution. The mean and the variance of $Y$ must be determined.

The mean of $Y$ is

$$
E(Y) = E[E(Y \mid X)] = E(X) = \mu.
$$

Furthermore, by Theorem 4.7.4,

$$
\operatorname{Var}(Y) = E[\operatorname{Var}(Y \mid X)] + \operatorname{Var}[E(Y \mid X)]
= E(\tau^2) + \operatorname{Var}(X) = \tau^2 + \sigma^2.
$$

Hence, the distribution of $Y$ is the normal distribution with mean $\mu$ and variance $\tau^2 + \sigma^2$. ◀

Linear Combinations

Example 5.10.5 Heights of Husbands and Wives. Suppose that a married couple is selected at random from a certain population of married couples and that the joint distribution of the height of the wife and the height of her husband is a bivariate normal distribution. What is the probability that, in the randomly chosen couple, the wife is taller than the husband? ◀

The question asked at the end of Example 5.10.5 can be expressed in terms of the distribution of the difference between a wife's and husband's heights. This is a special case of a linear combination of a bivariate normal vector.

Theorem 5.10.5 Linear Combination of Bivariate Normals. Suppose that two random variables $X_1$ and $X_2$ have a bivariate normal distribution, for which the p.d.f. is specified by Eq. (5.10.2). Let $Y = a_1X_1 + a_2X_2 + b$, where $a_1$, $a_2$, and $b$ are arbitrary given constants. Then $Y$ has the normal distribution with mean $a_1\mu_1 + a_2\mu_2 + b$ and variance

$$
a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + 2a_1a_2\rho\sigma_1\sigma_2.
\tag{5.10.9}
$$

Proof According to Theorem 5.10.2, both $X_1$ and $X_2$ can be represented, as in Eq. (5.10.1), as linear combinations of independent and normally distributed random variables $Z_1$ and $Z_2$.
Since $Y$ is a linear combination of $X_1$ and $X_2$, it follows that $Y$ can also be represented as a linear combination of $Z_1$ and $Z_2$. Therefore, by Corollary 5.6.1, the distribution of $Y$ will also be a normal distribution. It only remains to compute the mean and variance of $Y$. The mean of $Y$ is

$$
E(Y) = a_1E(X_1) + a_2E(X_2) + b = a_1\mu_1 + a_2\mu_2 + b.
$$
It also follows from Corollary 4.6.1 that

$$
\operatorname{Var}(Y) = a_1^2\operatorname{Var}(X_1) + a_2^2\operatorname{Var}(X_2) + 2a_1a_2\operatorname{Cov}(X_1, X_2).
$$

Substituting $\operatorname{Var}(X_i) = \sigma_i^2$ and $\operatorname{Cov}(X_1, X_2) = \rho\sigma_1\sigma_2$ from Theorem 5.10.2 shows that $\operatorname{Var}(Y)$ is given by Eq. (5.10.9). ∎

Example 5.10.6 Heights of Husbands and Wives. Consider again Example 5.10.5. Suppose that the heights of the wives have a mean of 66.8 inches and a standard deviation of 2 inches, the heights of the husbands have a mean of 70 inches and a standard deviation of 2 inches, and the correlation between these two heights is 0.68. We shall determine the probability that the wife will be taller than her husband.

If we let $X$ denote the height of the wife, and let $Y$ denote the height of her husband, then we must determine the value of $\Pr(X - Y > 0)$. Since $X$ and $Y$ have a bivariate normal distribution, it follows that the distribution of $X - Y$ will be the normal distribution, with mean

$$
E(X - Y) = 66.8 - 70 = -3.2
$$

and variance

$$
\operatorname{Var}(X - Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) - 2\operatorname{Cov}(X, Y)
= 4 + 4 - 2(0.68)(2)(2) = 2.56.
$$

Hence, the standard deviation of $X - Y$ is 1.6.

The random variable $Z = (X - Y + 3.2)/1.6$ will have the standard normal distribution. It can be found from the table given at the end of this book that

$$
\Pr(X - Y > 0) = \Pr(Z > 2) = 1 - \Phi(2) = 0.0227.
$$

Therefore, the probability that the wife will be taller than her husband is 0.0227. ◀

Summary

If a random vector $(X, Y)$ has a bivariate normal distribution, then every linear combination $aX + bY + c$ has a normal distribution. In particular, the marginal distributions of $X$ and $Y$ are normal.
Also, the conditional distribution of $X$ given $Y = y$ is normal with the conditional mean being a linear function of $y$ and the conditional variance being constant in $y$. (Similarly, for the conditional distribution of $Y$ given $X = x$.) A more thorough treatment of the bivariate normal distributions and higher-dimensional generalizations can be found in the book by D. F. Morrison (1990).

Exercises

1. Consider again the joint distribution of heights of husbands and wives in Example 5.10.6. Find the 0.95 quantile of the conditional distribution of the height of the wife given that the height of the husband is 72 inches.

2. Suppose that two different tests A and B are to be given to a student chosen at random from a certain population. Suppose also that the mean score on test A is 85, and the standard deviation is 10; the mean score on test B is 90, and the standard deviation is 16; the scores on the two tests have a bivariate normal distribution; and the correlation of the two scores is 0.8. If the student's score on test A is 80, what is the probability that her score on test B will be higher than 90?

3. Consider again the two tests A and B described in Exercise 2. If a student is chosen at random, what is the probability that the sum of her scores on the two tests will be greater than 200?

4. Consider again the two tests A and B described in Exercise 2. If a student is chosen at random, what is the probability that her score on test A will be higher than her score on test B?

5. Consider again the two tests A and B described in Exercise 2. If a student is chosen at random, and her score on test B is 100, what predicted value of her score on test A has the smallest M.S.E., and what is the value of this minimum M.S.E.?

6. Suppose that the random variables $X_1$ and $X_2$ have a bivariate normal distribution, for which the joint p.d.f. is specified by Eq. (5.10.2). Determine the value of the constant $b$ for which $\operatorname{Var}(X_1 + bX_2)$ will be a minimum.

7. Suppose that $X_1$ and $X_2$ have a bivariate normal distribution for which $E(X_1 \mid X_2) = 3.7 - 0.15X_2$, $E(X_2 \mid X_1) = 0.4 - 0.6X_1$, and $\operatorname{Var}(X_2 \mid X_1) = 3.64$. Find the mean and the variance of $X_1$, the mean and the variance of $X_2$, and the correlation of $X_1$ and $X_2$.

8. Let $f(x_1, x_2)$ denote the p.d.f. of the bivariate normal distribution specified by Eq. (5.10.2). Show that the maximum value of $f(x_1, x_2)$ is attained at the point at which $x_1 = \mu_1$ and $x_2 = \mu_2$.

9. Let $f(x_1, x_2)$ denote the p.d.f. of the bivariate normal distribution specified by Eq. (5.10.2), and let $k$ be a constant such that

$$
0 < k < \frac{1}{2\pi(1 - \rho^2)^{1/2}\sigma_1\sigma_2}.
$$

Show that the points $(x_1, x_2)$ such that $f(x_1, x_2) = k$ lie on a circle if $\rho = 0$ and $\sigma_1 = \sigma_2$, and these points lie on an ellipse otherwise.

10. Suppose that two random variables $X_1$ and $X_2$ have a bivariate normal distribution, and two other random variables $Y_1$ and $Y_2$ are defined as follows:

$$
\begin{aligned}
Y_1 &= a_{11}X_1 + a_{12}X_2 + b_1, \\
Y_2 &= a_{21}X_1 + a_{22}X_2 + b_2,
\end{aligned}
$$

where

$$
\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \neq 0.
$$

Show that $Y_1$ and $Y_2$ also have a bivariate normal distribution.

11. Suppose that two random variables $X_1$ and $X_2$ have a bivariate normal distribution, and $\operatorname{Var}(X_1) = \operatorname{Var}(X_2)$. Show that the sum $X_1 + X_2$ and the difference $X_1 - X_2$ are independent random variables.

12. Suppose that the two measurements from flea beetles in Example 5.10.2 have the bivariate normal distribution with $\mu_1 = 201$, $\mu_2 = 118$, $\sigma_1 = 15.2$, $\sigma_2 = 6.6$, and $\rho = 0.64$. Suppose that the same two measurements from a second species also have the bivariate normal distribution with $\mu_1 = 187$, $\mu_2 = 131$, $\sigma_1 = 15.2$, $\sigma_2 = 6.6$, and $\rho = 0.64$. Let $(X_1, X_2)$ be a pair of measurements on a flea beetle from one of these two species. Let $a_1$, $a_2$ be constants.
   a. For each of the two species, find the mean and standard deviation of $a_1X_1 + a_2X_2$. (Note that the variances for the two species will be the same. How do you know that?)
   b. Find $a_1$ and $a_2$ to maximize the ratio of the difference between the two means found in part (a) to the standard deviation found in part (a). There is a sense in which this linear combination $a_1X_1 + a_2X_2$ does the best job of distinguishing the two species among all possible linear combinations.

13. Suppose that the joint p.d.f. of two random variables $X$ and $Y$ is proportional, as a function of $(x, y)$, to

$$
\exp\left(-\left[ax^2 + by^2 + cxy + ex + gy + h\right]\right),
$$

where $a > 0$, $b > 0$, and $c$, $e$, $g$, and $h$ are all constants. Assume that $ab > (c/2)^2$. Prove that $X$ and $Y$ have a bivariate normal distribution, and find the means, variances, and correlation.

14. Suppose that a random variable $X$ has a normal distribution, and for every $x$, the conditional distribution of another random variable $Y$ given that $X = x$ is a normal distribution with mean $ax + b$ and variance $\tau^2$, where $a$, $b$, and $\tau^2$ are constants. Prove that the joint distribution of $X$ and $Y$ is a bivariate normal distribution.

15. Let $X_1, \ldots, X_n$ be i.i.d. random variables having the normal distribution with mean $\mu$ and variance $\sigma^2$. Define $\overline{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$, the sample mean. In this problem, we shall find the conditional distribution of each $X_i$ given $\overline{X}_n$.
   a. Show that $X_i$ and $\overline{X}_n$ have the bivariate normal distribution with both means $\mu$, variances $\sigma^2$ and $\sigma^2/n$, and correlation $1/\sqrt{n}$. Hint: Let $Y = \sum_{j \neq i} X_j$. Now show that $Y$ and $X_i$ are independent normals and $\overline{X}_n$ and $X_i$ are linear combinations of $Y$ and $X_i$.
   b. Show that the conditional distribution of $X_i$ given $\overline{X}_n = \overline{x}_n$ is normal with mean $\overline{x}_n$ and variance $\sigma^2(1 - 1/n)$.

5.11 Supplementary Exercises

1. Let $X$ and $P$ be random variables. Suppose that the conditional distribution of $X$ given $P = p$ is the binomial distribution with parameters $n$ and $p$.

12. Suppose that a fair coin is tossed until at least one head and at least one tail have been obtained. Let $X$ denote the number of tosses required. Find the p.f. of $X$.
Suppose that the number of tosses that are required. Find the p.f. of X. vr oBe | Encontro distribuiezo marginal estos a 13.Suponha que um par de dados equilibrados seja oe tand Bl OF Ps the Dela distribution wi th Parameters 13. Suppose that a pair of balanced dice are rolled 120 , S g . langado 120 vezes eXdenota o numero de langamentos em 7 a g , times, and let X denote the number of rolls on which the 2.Suponha queX,S,eZsdo variaveis aleatdrias iid e que a soma dos dois numeros é 12. Use a aproximacao de 2. Suppose that X, Y, and Z are i.i.d. random variables sum of the two numbers is 12. Use the Poisson approxi- cada uma tem a distribuigdo normal padrdo. Avaliar PR( ~—- Poisson para aproximar Pr(X=3). and each has the standard normal distribution. Evaluate mation to approximate Pr(X = 3). 3X+25 627). Pr(3X + 2Y <6Z —7). } 14.Suponha que, ..., Xnformar uma amostra aleatoria a GX + 2¥ < ) 14. Suppose that X;,..., X,, form a random sample from 3.Suponha queXeSsdo variaveis aleatdrias de Poisson artir da distribuigao uniforme no intervalo [0,1]. DeixarS1 3. Suppose that X and Y are independent Poisson random the uniform distribution on the interval [0, 1]. Let Y; = p q p ¢ Pp P 1 independentes, tais que Var(X#Var(S5. Avalie 0 PR(X+ S < =min{m, ..., Xn}, Sn=maximot{%, ..., Xn}, eC Sn- 51. Mostre variables such that Var(X) + Var(Y) =5. Evaluate Pr(X + min{X,,..., X,}, Y, = max{X;,..., X,}, and W=Y, — 2). que cada uma das variaveis aleatériasS1,5n,eCtem uma Y <2). Y,. Show that each of the random variables Yj, Y,,, and W 4.Suponha queXtem uma distribui¢do normal tal que Pr distribuicao beta. 4. Suppose that X has a normal distribution such that has a beta distribution. (X <116}0.20 e Pr.(X <3280.90. Determine a média ea 15.Suponha que os eventos ocorram de acordo com um processo Pr(X < 116) = 0.20 and Pr(X < 328) = 0.90. Determine 15. Suppose that events occur in accordance with a Pois- variancia dex. de Poisson a taxa de cinco eventos por hora. 
the mean and the variance of X. son process at the rate of five events per hour. 5.Suponha que uma amostra aleatéria de quatro observacées seja a.Determine a distribuigdo do tempo de espera 71 5. Suppose that a random sample of four observations is a. Determine the distribution of the waiting time T; extraida da distribuigéo de Poisson com médiaA, e deixar Xdenota até que o primeiro evento ocorra. drawn from the Poisson distribution with mean A, and let until the first event occurs. a média amostral. Mostre isso b.Determine a distribuicgdo do tempo total de espera X denote the sample mean. Show that b. Determine the distribution of the total waiting time ( ) Teaté kocorreram eventos T, until k events have occurred — 1 ‘ > 1 _a, k . Pr.x< 5 =(AA+1 Je-aa. c.Determine a probabilidade de que nenhum dos primeirosk os Pr (x < >) = (444 le ™. c. Determine the probability that none of the first k eventos ocorrerao com intervalo de 20 minutos um do outro. events will occur within 20 minutes of one another. 6.A vidaXde um componente eletrénico tem 165 h . am funci d 6. The lifetime X of an electronic component has the 16. S that fi t functioning simul distribuigdo exponencial tal que Pr(X<10000.75. Qual _" ‘tan aque cinco eee te vein uncionanco exponential distribution such that Pr(X < 1000) = 0.75. ° oooh th li ‘tomee of the are funcroning eid, é a vida Util esperada do componente? simu a ae os ee ten ns ‘i alleen What is the expected lifetime of the component? aeons t , i fetis oes the eee i nb iG ” sejam iid e que cada tempo de vida tenha a distribuicao and that each lifetime has the exponential distribution 7.Suponha queXtem a distribuigdo normal com média / _exponencial com parametrof. Deixar Tidenota 0 tempo desde 7. Suppose that X has the normal distribution with mean —_ with parameter 8. Let T, denote the time from the begin- e varia¢dooz. Expressar £X3)em termos depecz. 0 inicio do processo até a falha de um dos componentes; e and variance o*. 
Express E(X?*) in terms of yz and o?. ning of the process until one of the components fails; and 8.Suponha que uma amostra aleatdéria de 16 observacées deixar Tsdenota o tempo total até que todos os cinco 8. Suppose that a random sample of 16 observations is let Ts denote the total time until all five components have : ; wes on . componentes falhem. Avaliar Cov(71, 75). ty . failed. Evaluate Cov(7, Ts). seja extraida da distribuigdo normal com médiaye desvio drawn from the normal distribution with mean py and stan- padrdo 12, e que independentemente outra amostra 17.Suponha queXieX2sdo varidveis aleatdrias dard deviation 12, and that independently another ran- 17. Suppose that X, and X> are independent random vari- aleatoria de 25 observacoes seja extraida da distribuigdo independentes, eXeutem a distribuicdo exponencial com dom sample of 25 observations is drawn from the normal ables, and X; has the exponential distribution with param- normal com a mesma médiape desvio padrdo 20. DeixeXe pardmetrofeu(eu=1,2). Mostre que para cada constantek >0, distribution with the same mean y and standard devia- eter 8; (i =1, 2). Show that for each constant k > 0, Sdenotar as médias amostrais dos dois pe tion 20. Let X and Y denote the sample means of the two B amostras. Avaliar PR(| X-S| <5). Pr.(Mizkx2e = =——_.,, samples. Evaluate Pr(|X — Y| <5). Pr(X, > kXy) = —?—. kBi+ Ba kB, + Bo 9.Suponha que os homens cheguem a um balcdo de vendas de acordo 9. Suppose that men arrive at a ticket counter according . . . . com um processo de Poisson a taxa de 120 por hora, e as mulheres 18.Suponha que 15.000 pessoas em uma cidade com uma populacao de toa Poisson process at the rate of 120 per hour, and women 18. 
Suppose that 15,000 people in a city with a population cheguem de acordo com um processo de Poisson independente a taxa 500.000 habitantes estejam assistindo a um determinado programa de arrive according to an independent Poisson process at the of 500,000 are watching a certain television program. If de 60 por hora. Determine a probabilidade de que quatro ou menos eee ae pessoas ‘ _ forem ones ie qual rate of 60 per hour. Determine the probability that four we People in the oe nitit contacted at random. wnat is pessoas cheguem num periodo de um minuto. 4 propabilidade aproximada de que menos de quatro delas estejam or fewer people arrive in a one-minute period. Pproy Pp y assistindo ao programa? are watching the program? 10.Suponha queX1, X2,...sdo variaveis aleatorias iid, . . . 10. Suppose that X,, X5,... are i.i.d. random variables, Le . . . cada uma das quais tem mgf y(t). DeixarS=Xi+. . .+Xwy 19.Suponha que se deseje estimar a proporcdo de pessoas em each of which has m.g.f. w(t). Let Y= X,+---+X 19. Suppose that it is desired to estimate the proportion of onde o numero de termosAnesta soma é uma variavel uma grande populagdo que possuem uma determinada where the number of terms N in this sum is a random persons in a large population who have a certain charac- aleatéria tendo a distribuicdo de Poisson com médiaA. caracteristica, Uma amostra aleatoria de 100 pessoas € variable having the Poisson distribution with mean i. the oa, ‘ random rample ° 100 Persons is selected from Assuma issoNeX, X2,.. .Sdo independentes eS=0 seA= selecionada da populagdo sem reposicao, e a proporcgdo Assume that N and X, X>,...areindependent, and Y = 0 the popu ation without replacement, and the proportion 0. Determine o mgf des. Xde pessoas na amostra que possuem a caracteristica é observada. if N =0. Determine the m.g.f. of Y. X of persons in the sample who have the characteristic is Mostre que, nao importa qudo grande seja a populacao ; ; ; ; observed. 
Show that, no matter how large the population 11.Todos os domingos de manhi, duas criangas, Craig e Jill, tentam é o desvio padrao dexé no maximo 0,05. 11. Every Sunday morning, two children, Craig and Jill, is, the standard deviation of X is at most 0.05. langar seus aeromodelos de forma independente. Em cada domingo, oo ; independently try to launch their model airplanes. On ; ; oo, ; Craig tem probabilidade de 1/3 de langamento bem-sucedido e Jill tem 20.Suponha queAtem a distribuicao binomial com each Sunday, Craig has probability 1/3 of a successful 20. Suppose that X has the binomial distribution with probabilidade de 1/5 de langamento bem-sucedido. Determine o parametrosnep, e essaStem a distribuicao binomial ; launch, and Jill has probability 1/5 of a successful launch. parameters n and p, and that Y has the negative binomial numero esperado de domingos necessarios até que pelo menos um negativa com parametrosRep, ondeRé um numero inteiro Determine the expected number of Sundays required un- distribution with parameters r and p, where r isa positive dos dois filhos tenha um lancamento bem-sucedido. positivo. Mostre que Pr(X<rFPr.(S > n-rymostrando til at least one of the two children has a successful launch. integer. Show that Pr(X <r) = Pr(Y >n — r) by showing 346 Capitulo 5 Distribuigées Especiais 346 Chapter 5 Special Distributions que tanto o lado esquerdo quanto 0 lado direito desta equacdo podem 10 alunos sdo selecionados aleatoriamente da populacdo e that both the left side and the right side of this equation 10 students are selected at random from the population, ser considerados como a probabilidade do mesmo evento em uma X,X2,X3,Xadenotam, respectivamente, o numero de can be regarded as the probability of the same event in a and let X;, X2, X3, X4 denote, respectively, the numbers sequéncia de tentativas de Bernoulli com probabilidadepde sucesso. 
calouros, alunos do segundo ano, juniores e seniores sequence of Bernoulli trials with probability p of success. of freshmen, sophomores, juniors, and seniors that are ob- 21.Suponha queAtem a distribuigdo de Poisson com média Ndo,e oPtidos. . . 21. Suppose that X has the Poisson distribution with mean tained. . . . . essa Stem a distribuigdo gama com parametros a=kef=/, ondeké a.Determinarp(Xeu, Xpara cada par de valores eue/ At, and that Y has the gamma distribution with parameters a. Determine p(X;, Xj) for each pair of values i and j um numero inteiro positivo. Mostre que Pr(X2kPr.(Sst) (eu <). a =k and B =), where k is a positive integer. Show that @ <j). mostrando que tanto o lado esquerdo quanto o lado direito desta b.Para quais valores deeue/ (eu <jép(Xeu, Xjmais Pr(X > k) =Pr(Y <tr) by showing that both the left side b. For what values of i and j (i < j) is p(X;, X;) most equac¢do podem ser considerados como a probabilidade do negativo? and the right side of this equation can be regarded as the negative? mesmo evento em um processo de Poisson no qual o numero c.Para quais valores deeue/ (eu <ép(Xeu, XJmais proximo de probability of the same event in a Poisson process in which c. For what values of i and j (i < j) is p(X;, X;) closest esperado de ocorréncias por unidade de tempo éA. 0? the expected number of occurrences per unit of time is i. to 0? 22.Suponha queXé uma variavel aleatoria com distribuigao ——-:24.Suponha queXieX2tem a normal bivariada 22. Suppose that X is a random variable having a contin- 24. Suppose that X, and X, have the bivariate normal continua com pdff(xJe cdfF(x), e para o qual Pr(Xx>0#1. distribuicdo com meiospiezsn, variagdesoz eo, uous distribution with p.d-f. f(x) and c.d.f. F(x), and for distribution with means 1; and jx», variances o? and 03, Deixe a taxa de falhah(xser conforme definido no e correlacdop. Determine a distribuigdo deXi-3.2. ‘ which Pr(X > 0) = 1. Let the failure rate h(x) beasdefined and correlation p. 
Determine the distribution of X,—3X>. Exercicio 18 da Seg. 5.7. Mostre isso in Exercise 18 of Sec. 5.7. Show that [ Jx ] 25.Suponha queAtem a distribuigdo normal padrdo ea x 25. Suppose that X has the standard normal distribution, exp - h(t)dt=1 -F(x). distribuigdo condicional deSdadoxé a distribuicao exp] - [ A(t) ar| =1- F(x). and the conditional distribution of Y given X isthe normal 0 normal com média 2X3 e varidncia 12. Determine a 0 distribution with mean 2X — 3 and variance 12. Determine distribuig¢do marginal deSe o valor dep(x, Y). . the marginal distribution of Y and the value of p(X, Y). 23.Suponha que 40% dos alunos de uma grande populagado 23. Suppose that 40 percent of the students in a large pop- sejam calouros, 30% sejam alunos do segundo ano, 20% 26.Suponha queXieX2tem uma doenga normal bivariada ulation are freshmen, 30 percent are sophomores, 20 per- 26. Suppose that X, and X> have a bivariate normal dis- sejam juniores e 10% sejam veteranos. Suponha que contribuigdo comEX2~0. AvaliarEX2 = 1X2). cent are juniors, and 10 percent are seniors. Suppose that tribution with E(X,) = 0. Evaluate E(X{X)). 2 CO —_—we¥ eS ee wou Ft felizmente Chapter Grandes amostras aleatérias 5 LARGE RANDOM SAMPLES 4 6.1Introdugdo 6.4A correcdo para continuidade 6.1 Introduction 6.4 The Correction for Continuity 6.2A Lei dos Grandes Numeros 6,5Exercicios Suplementares 6.2 The Law of Large Numbers 6.5 Supplementary Exercises 6.30 Teorema do Limite Central 6.3 The Central Limit Theorem 6.1 Introdugdao 6.1 Introduction Neste capitulo, apresentamos uma série de resultados de aproximacao que simplificam a In this chapter, we introduce a number of approximation results that simplify the andlise de grandes amostras aleatérias. Na primeira secgdo, damos dois exemplos para ilustrar analysis of large random samples. 
Example 6.1.1 Proportion of Heads. If you draw a coin from your pocket, you might feel confident that it is essentially fair. That is, the probability that it will land with head up when flipped is 1/2. However, if you were to flip the coin 10 times, you would not expect to see exactly 5 heads. If you were to flip it 100 times, you would be even less likely to see exactly 50 heads. Indeed, we can calculate the probabilities of each of these two results using the fact that the number of heads in $n$ independent flips of a fair coin has the binomial distribution with parameters $n$ and 1/2. So, if $X$ is the number of heads in 10 independent flips, we know that
$$\Pr(X = 5) = \binom{10}{5}\left(\frac{1}{2}\right)^{5}\left(1 - \frac{1}{2}\right)^{5} = 0.2461.$$
If $Y$ is the number of heads in 100 independent flips, we have
$$\Pr(Y = 50) = \binom{100}{50}\left(\frac{1}{2}\right)^{50}\left(1 - \frac{1}{2}\right)^{50} = 0.0796.$$
Even though the probability of exactly $n/2$ heads in $n$ flips is quite small, especially for large $n$, you still expect the proportion of heads to be close to 1/2 if $n$ is large. For example, if $n = 100$, the proportion of heads is $Y/100$. In this case, the probability that the proportion is within 0.1 of 1/2 is
$$\Pr\left(0.4 \le \frac{Y}{100} \le 0.6\right) = \Pr(40 \le Y \le 60) = \sum_{i=40}^{60} \binom{100}{i}\left(\frac{1}{2}\right)^{i}\left(1 - \frac{1}{2}\right)^{100-i} = 0.9648.$$
A similar calculation with $n = 10$ yields
$$\Pr\left(0.4 \le \frac{X}{10} \le 0.6\right) = \Pr(4 \le X \le 6) = \sum_{i=4}^{6} \binom{10}{i}\left(\frac{1}{2}\right)^{i}\left(1 - \frac{1}{2}\right)^{10-i} = 0.6563.$$
Notice that the probability that the proportion of heads in $n$ tosses is close to 1/2 is larger for $n = 100$ than for $n = 10$ in this example. This is due in part to the fact that we have defined "close to 1/2" to be the same for both cases, namely, between 0.4 and 0.6.
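The four probabilities in Example 6.1.1 can be checked numerically. What follows is a minimal sketch in Python that uses only the standard library. The helper names `binom_pmf` and `prob_between` are illustrative choices and do not come from the text.

```python
from math import comb


def binom_pmf(n, k, p=0.5):
    """Pr(X = k) when X has the binomial distribution with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_between(n, lo, hi, p=0.5):
    """Pr(lo <= X <= hi) for a binomial X, with both endpoints included."""
    return sum(binom_pmf(n, k, p) for k in range(lo, hi + 1))


# The values reported in Example 6.1.1 are quoted in the comments.
print(binom_pmf(10, 5))            # text reports 0.2461
print(binom_pmf(100, 50))          # text reports 0.0796
print(prob_between(100, 40, 60))   # text reports 0.9648
print(prob_between(10, 4, 6))      # text reports 0.6563
```

The sums use inclusive endpoints because the reported values 0.9648 and 0.6563 correspond to the events $40 \le Y \le 60$ and $4 \le X \le 6$.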
The calculations performed in Example 6.1.1 were simple enough because we have a formula for the probability function of the number of heads in any number of flips. For more complicated random variables, the situation is not so simple.

Example 6.1.2 Average Waiting Time. A queue is serving customers, and the $i$th customer waits a random time $X_i$ to be served. Suppose that $X_1, X_2, \ldots$ are i.i.d. random variables having the uniform distribution on the interval [0, 1]. The mean waiting time is 0.5. Intuition suggests that the average of a large number of waiting times should be close to the mean waiting time. But the distribution of the average of $X_1, \ldots, X_n$ is rather complicated for every $n > 1$. It may not be possible to calculate precisely the probability that the sample average is close to 0.5 for large samples.

The law of large numbers (Theorem 6.2.4) will give a mathematical foundation to the intuition that the average of a large sample of i.i.d. random variables, such as the waiting times in Example 6.1.2, should be close to their mean. The central limit theorem (Theorem 6.3.1) will give us a way to approximate the probability that the sample average is close to the mean.
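The intuition in Example 6.1.2 can be explored by simulation when an exact calculation is impractical. The sketch below is only an illustration under assumed settings: the tolerance 0.1, the 2000 repetitions, the seed, and the function names are all choices made here, not values from the text.

```python
import random


def average_wait(n, rng):
    """Average of n simulated waiting times, each uniform on [0, 1]."""
    return sum(rng.random() for _ in range(n)) / n


def fraction_close(n, reps=2000, tol=0.1, seed=0):
    """Estimate Pr(|sample average - 0.5| < tol) from reps simulated samples of size n."""
    rng = random.Random(seed)  # fixed seed so that repeated runs agree
    hits = sum(abs(average_wait(n, rng) - 0.5) < tol for _ in range(reps))
    return hits / reps


for n in (1, 10, 100):
    print(n, fraction_close(n))
```

For $n = 1$ the exact probability is 0.2, so the estimate should land near that value, and the estimates should rise toward 1 as $n$ grows.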
Exercises

1. The solution to Exercise 1 of Sec. 3.9 is the p.d.f. of $X_1 + X_2$ in Example 6.1.2. Find the p.d.f. of $\overline{X}_2 = (X_1 + X_2)/2$. Compare the probabilities that $\overline{X}_2$ and $X_1$ are close to 0.5. In particular, compute $\Pr(|\overline{X}_2 - 0.5| < 0.1)$ and $\Pr(|X_1 - 0.5| < 0.1)$. What feature of the p.d.f. of $\overline{X}_2$ makes it clear that the distribution is more concentrated near the mean?

2. Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables having the normal distribution with mean $\mu$ and variance $\sigma^2$. Let $\overline{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ be the sample mean of the first $n$ random variables in the sequence. Show that $\Pr(|\overline{X}_n - \mu| \le c)$ converges to 1 as $n \to \infty$. Hint: Write the probability in terms of the standard normal c.d.f. $\Phi$ and use what you know about this c.d.f.

3. This problem requires a computer program because the calculation is too tedious to do by hand. Extend the calculation in Example 6.1.1 to the case of $n = 200$ flips. That is, let $W$ be the number of heads in 200 flips of a fair coin, and compute $\Pr\left(0.4 \le \frac{W}{200} \le 0.6\right)$. What do you think is the continuation of the pattern of these probabilities as the number of flips $n$ increases without bound?

6.2 The Law of Large Numbers

The average of a random sample of i.i.d. random variables is called their sample mean. The sample mean is useful for summarizing the information in a random sample in much the same way that the mean of a probability distribution summarizes the information in the distribution. In this section, we present some results that illustrate the connection between the sample mean and the expected value of the individual random variables that comprise the random sample.

The Markov and Chebyshev Inequalities

We shall begin this section by presenting two simple and general results, known as the Markov inequality and the Chebyshev inequality. We shall then apply these inequalities to random samples.
The Markov inequality is related to the claim made on page 211 about how the mean of a distribution can be affected by moving a small amount of probability to an arbitrarily large value. The Markov inequality puts a bound on how much probability can be at arbitrarily large values once the mean is specified.

Theorem 6.2.1 Markov Inequality. Suppose that $X$ is a random variable such that $\Pr(X \ge 0) = 1$. Then for every real number $t > 0$,
$$\Pr(X \ge t) \le \frac{E(X)}{t}. \qquad (6.2.1)$$

Proof For convenience, we shall assume that $X$ has a discrete distribution for which the p.f. is $f$. The proof for a continuous distribution or a more general type of distribution is similar. For a discrete distribution,
$$E(X) = \sum_{x} x f(x) = \sum_{x < t} x f(x) + \sum_{x \ge t} x f(x).$$
Since $X$ can have only nonnegative values, all the terms in the summations are nonnegative. Therefore,
$$E(X) \ge \sum_{x \ge t} x f(x) \ge \sum_{x \ge t} t f(x) = t \Pr(X \ge t). \qquad (6.2.2)$$
Divide the extreme ends of (6.2.2) by $t > 0$ to obtain (6.2.1).

The Markov inequality is primarily of interest for large values of $t$. In fact, when $t \le E(X)$, the inequality is of no interest whatsoever, since it is known that $\Pr(X \ge t) \le 1$. However, it is found from the Markov inequality that for every nonnegative random variable $X$ whose mean is 1, the maximum possible value of $\Pr(X \ge 100)$ is 0.01. Furthermore, it can be verified that this maximum value is actually attained by every random variable $X$ for which $\Pr(X = 0) = 0.99$ and $\Pr(X = 100) = 0.01$.
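The claim that the Markov bound is attained by the two-point distribution can be verified with exact arithmetic. The following Python sketch represents a discrete p.f. as a dictionary from values to probabilities. That representation, the function names, and the fair-die comparison are assumptions made for illustration only.

```python
from fractions import Fraction


def mean(pf):
    """E(X) for a discrete p.f. given as {value: probability}."""
    return sum(x * p for x, p in pf.items())


def tail_prob(pf, t):
    """Pr(X >= t) for a discrete p.f. given as {value: probability}."""
    return sum(p for x, p in pf.items() if x >= t)


def markov_bound(pf, t):
    """Right side of (6.2.1): E(X) / t, valid when X is nonnegative and t > 0."""
    return mean(pf) / t


# Two-point distribution from the text: Pr(X = 0) = 0.99 and Pr(X = 100) = 0.01.
two_point = {0: Fraction(99, 100), 100: Fraction(1, 100)}
print(mean(two_point), tail_prob(two_point, 100), markov_bound(two_point, 100))

# A fair die (illustrative comparison): here the bound is far from tight.
die = {k: Fraction(1, 6) for k in range(1, 7)}
print(tail_prob(die, 5), markov_bound(die, 5))
```

For the two-point distribution the tail probability and the bound coincide at 1/100, while for the fair die the bound 7/10 is well above the actual probability 1/3.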
The Chebyshev inequality is related to the idea that the variance of a random variable is a measure of how spread out its distribution is. The inequality says that the probability that $X$ is far away from its mean is bounded by a quantity that increases as $\mathrm{Var}(X)$ increases.

Theorem 6.2.2 Chebyshev Inequality. Let $X$ be a random variable for which $\mathrm{Var}(X)$ exists. Then for every number $t > 0$,
$$\Pr(|X - E(X)| \ge t) \le \frac{\mathrm{Var}(X)}{t^2}. \qquad (6.2.3)$$

Proof Let $Y = [X - E(X)]^2$. Then $\Pr(Y \ge 0) = 1$ and $E(Y) = \mathrm{Var}(X)$. By applying the Markov inequality to $Y$, we obtain the following result:
$$\Pr(|X - E(X)| \ge t) = \Pr(Y \ge t^2) \le \frac{\mathrm{Var}(X)}{t^2}.$$

It can be seen from this proof that the Chebyshev inequality is simply a special case of the Markov inequality. Therefore, the comments that were given following the proof of the Markov inequality can be applied as well to the Chebyshev inequality. Because of their generality, these inequalities are very useful. For example, if $\mathrm{Var}(X) = \sigma^2$ and we let $t = 3\sigma$, then the Chebyshev inequality yields the result that
$$\Pr(|X - E(X)| \ge 3\sigma) \le \frac{1}{9}.$$
In words, the probability that any given random variable will differ from its mean by more than 3 standard deviations cannot exceed 1/9. This probability will actually be much smaller than 1/9 for many of the random variables and distributions that will be discussed in this book. The Chebyshev inequality is useful because of the fact that this probability must be 1/9 or less for every distribution. It can also be shown (see Exercise 4 at the end of this section) that the upper bound in (6.2.3) is sharp in the sense that it cannot be made any smaller and still hold for all distributions.
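To see how conservative the bound 1/9 can be, it may help to compare it with exact tail probabilities for two familiar distributions. The sketch below is an illustration only: the choice of the normal distribution and of the exponential distribution with parameter 1 (which has mean 1 and standard deviation 1), as well as the function names, are assumptions made here.

```python
from math import erfc, exp, sqrt


def chebyshev_bound(k):
    """Bound from (6.2.3) with t = k * sigma: Pr(|X - E(X)| >= k*sigma) <= 1 / k**2."""
    return 1 / k**2


def normal_tail(k):
    """Exact Pr(|X - mu| >= k*sigma) when X has a normal distribution."""
    return erfc(k / sqrt(2))


def exponential_tail(k):
    """Exact Pr(|X - 1| >= k) when X is exponential with parameter 1."""
    upper = exp(-(1 + k))                          # Pr(X >= 1 + k)
    lower = 1 - exp(-(1 - k)) if k < 1 else 0.0    # Pr(X <= 1 - k), zero once k >= 1
    return upper + lower


for k in (1, 2, 3):
    print(k, chebyshev_bound(k), normal_tail(k), exponential_tail(k))
```

At $k = 3$ the normal tail probability is roughly 0.0027 and the exponential one is $e^{-4}$, both far below the bound of 1/9.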
A desigualdade de Chebyshev é util porque esta probabilidade deve be discussed in this book. The Chebyshev inequality is useful because of the fact that ser 1/9 ou menos paratododistribuigdo. Também pode ser mostrado (ver Exercicio 4 no this probability must be 1/9 or less for every distribution. It can also be shown (see final desta secdo) que o limite superior em (6.2.3) é nitido no sentido de que nao pode ser Exercise 4 at the end of this section) that the upper bound in (6.2.3) is sharp in the diminuido e ainda valido portodosdistribuicgées. sense that it cannot be made any smaller and still hold for a// distributions. Propriedades da média amostral Properties of the Sample Mean Na Definicdo 5.6.3, definimos omédia da amostrademariaveis aleatériasX1,..., Xn In Definition 5.6.3, we defined the sample mean of n random variables X;,..., X, ser a média deles, to be their average, _ 1 = 1 Xn= —(Xi+ «+++ Xn). X, = —(X%,+---4+X,). n n A média e a variancia deXnsdo facilmente computados. The mean and the variance of X,, are easily computed. Teorema Média e Variancia da Média Amostral.Deixar%1,..., X»ser uma amostra aleatéria de Theorem Mean and Variance of the Sample Mean. Let X;,..., X,, be a random sample from 6.2.3 uma distribuigdo com médiape variacdooz. DeixarXnseja a média amostral. Entdo 6.2.3 a distribution with mean yw and variance o?. Let X,, be the sample mean. Then EXn pe Var (Xn o2/n. E(X,,) = wand Var(X,) =07/n. ProvaSegue-se dos Teoremas 4.2.1 e 4.2.4 que Proof It follows from Theorems 4.2.1 and 4.2.4 that — 12" 1 ~,_1x< 1 EXnF a EX cup Fld =p. E(X,) = - » E(X)) = — on =H eu=1 i=1 Além disso, desdeXi,..., XnSdo independentes, os Teoremas 4.3.4 e 4.3.5 dizem que Furthermore, since X;,..., X,, are independent, Theorems 4.3.4 and 4.3.5 say that 1 y 1 = Var(Xn= —Var ~ Xeu Var(X,,) = — va xi m n2 ; eu=1 i=l 12” 1 O2 1X 1 o? =— Var(Xk, —.no= —. = = — S*‘Var(X;) =— - no? =—. 
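The two identities in Theorem 6.2.3 can be checked numerically. The following is only a rough sketch under assumptions that are not in the text: it uses an exponential distribution with rate 1 (so $\mu = 1$ and $\sigma^2 = 1$), a sample size of 25, and a hypothetical helper named `simulate_sample_means`. A simulation of this kind can suggest agreement with the theorem but does not prove it.

```python
import random
import statistics


def simulate_sample_means(n, reps, rate=1.0, seed=0):
    """Return `reps` sample means, each computed from `n` independent
    exponential(rate) draws. The distribution is an assumed example."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        total = sum(rng.expovariate(rate) for _ in range(n))
        means.append(total / n)
    return means


# Exponential with rate 1: mu = 1 and sigma^2 = 1 (illustrative choice).
n, reps = 25, 20000
means = simulate_sample_means(n, reps)

# Theory: E(sample mean) = mu = 1.
print("average of the sample means: ", statistics.fmean(means))
# Theory: Var(sample mean) = sigma^2 / n = 0.04.
print("variance of the sample means:", statistics.pvariance(means))
```

The printed values should be close to 1 and 0.04, respectively, with small deviations that depend on the seed and on the number of repetitions.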
In words, the mean of $\bar{X}_n$ is equal to the mean of the distribution from which the random sample was drawn, but the variance of $\bar{X}_n$ is only $1/n$ times the variance of that distribution. It follows that the probability distribution of $\bar{X}_n$ will be more concentrated around the mean value $\mu$ than was the original distribution. In other words, the sample mean $\bar{X}_n$ is more likely to be close to $\mu$ than is the value of just a single observation $X_i$ from the given distribution.

These statements can be made more precise by applying the Chebyshev inequality to $\bar{X}_n$. Since $E(\bar{X}_n) = \mu$ and $\operatorname{Var}(\bar{X}_n) = \sigma^2/n$, it follows from the relation (6.2.3) that for every number $t > 0$,

\[ \Pr(|\bar{X}_n - \mu| \ge t) \le \frac{\sigma^2}{n t^2}. \tag{6.2.4} \]

Example 6.2.1 Determining the Required Number of Observations. Suppose that a random sample is to be taken from a distribution for which the value of the mean $\mu$ is not known, but for which it is known that the standard deviation $\sigma$ is 2 units or less. We shall determine how large the sample size must be in order to make the probability at least 0.99 that $|\bar{X}_n - \mu|$ will be less than 1 unit.

Since $\sigma^2 \le 2^2 = 4$, it follows from the relation (6.2.4) that for every sample size $n$,

\[ \Pr(|\bar{X}_n - \mu| \ge 1) \le \frac{\sigma^2}{n} \le \frac{4}{n}. \]

Since $n$ must be chosen so that $\Pr(|\bar{X}_n - \mu| < 1) \ge 0.99$, it follows that $n$ must be chosen so that $4/n \le 0.01$. Hence, it is required that $n \ge 400$.

Example 6.2.2 A Simulation. An environmental engineer believes that there are two contaminants in a water supply, arsenic and lead. The actual concentrations of the two contaminants are independent random variables $X$ and $Y$, measured in the same units. The engineer is interested in what proportion of the contamination is lead on average. That is, the engineer wants to know the mean of $R = Y/(X + Y)$. We suppose that it is a simple matter to generate as many independent pseudo-random numbers with the distributions of $X$ and $Y$ as we desire. A common way to obtain an approximation to $E[Y/(X + Y)]$ would be the following: If we sample $n$ pairs $(X_1, Y_1), \ldots, (X_n, Y_n)$ and compute $R_i = Y_i/(X_i + Y_i)$ for $i = 1, \ldots, n$, then $\bar{R}_n = \frac{1}{n} \sum_{i=1}^{n} R_i$ is a sensible approximation to $E(R)$. To decide how large $n$ should be, we can argue as in Example 6.2.1. Since it is known that $|R_i| \le 1$, it must be that $\operatorname{Var}(R_i) \le 1$. (Actually, $\operatorname{Var}(R_i) \le 1/4$, but this is harder to prove. See Exercise 14 in this section for a way to prove it in the discrete case.) According to Chebyshev's inequality, for each $\epsilon > 0$,

\[ \Pr\left( |\bar{R}_n - E(R)| \ge \epsilon \right) \le \frac{1}{n \epsilon^2}. \]

So, if we want $|\bar{R}_n - E(R)| < 0.005$ with probability 0.98 or more, the right-hand side must be at most $1 - 0.98 = 0.02$, and we should use $n \ge 1/[0.02 \times 0.005^2] = 2{,}000{,}000$.

It should be emphasized that the use of the Chebyshev inequality in Example 6.2.1 guarantees that a sample for which $n = 400$ will be large enough to meet the specified probability requirements, regardless of the particular type of distribution from which the sample is to be taken.
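The sample-size arguments in Examples 6.2.1 and 6.2.2 both solve $\sigma^2/(n t^2) \le 1 - \text{(required probability)}$ for $n$. A minimal sketch of that arithmetic follows; the function name `chebyshev_sample_size` is hypothetical, and exact rational arithmetic is an implementation choice made here to avoid floating-point rounding right at the boundary.

```python
from fractions import Fraction
from math import ceil


def chebyshev_sample_size(var_bound, tol, prob):
    """Smallest integer n with var_bound / (n * tol**2) <= 1 - prob.

    Arguments are integers or strings such as "0.99" or "1/4" so that
    Fraction keeps every intermediate value exact.
    """
    var_bound, tol, prob = Fraction(var_bound), Fraction(tol), Fraction(prob)
    return ceil(var_bound / (tol**2 * (1 - prob)))


# Example 6.2.1: sigma^2 <= 4, tolerance 1 unit, probability 0.99.
print(chebyshev_sample_size(4, 1, "0.99"))             # 400
# Example 6.2.2: Var(R_i) <= 1, tolerance 0.005, probability 0.98.
print(chebyshev_sample_size(1, "0.005", "0.98"))       # 2000000
# Same requirement with the sharper bound Var(R_i) <= 1/4.
print(chebyshev_sample_size("1/4", "0.005", "0.98"))   # 500000
```

The last call shows how much the required $n$ shrinks when the sharper variance bound mentioned in Example 6.2.2 is used.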
If further information about this distribution is available, then it can often be shown that a smaller value for $n$ will be sufficient. This property is illustrated in the next example.

Example 6.2.3 Tossing a Coin. Suppose that a fair coin is to be tossed $n$ times independently. For $i = 1, \ldots, n$, let $X_i = 1$ if a head is obtained on the $i$th toss, and let $X_i = 0$ if a tail is obtained on the $i$th toss. Then the sample mean $\bar{X}_n$ will simply be equal to the proportion of heads that are obtained on the $n$ tosses. We shall determine the number of times the coin must be tossed in order to make $\Pr(0.4 \le \bar{X}_n \le 0.6) \ge 0.7$. We shall determine this number in two ways: first, by using the Chebyshev inequality; second, by using the exact probabilities for the binomial distribution of the total number of heads.

Let $T = \sum_{i=1}^{n} X_i$ denote the total number of heads that are obtained when $n$ tosses are made. Then $T$ has the binomial distribution with parameters $n$ and $p = 1/2$. Therefore, it follows from Eq. (4.2.5) on page 221 that $E(T) = n/2$, and it follows from Eq. (4.3.3) on page 232 that $\operatorname{Var}(T) = n/4$. Because $\bar{X}_n = T/n$, we can obtain the following relation from the Chebyshev inequality:

\[ \Pr(0.4 \le \bar{X}_n \le 0.6) = \Pr(0.4n \le T \le 0.6n) = \Pr\left( \left| T - \frac{n}{2} \right| \le 0.1n \right) \ge 1 - \frac{n}{4(0.1n)^2} = 1 - \frac{25}{n}. \]

Hence, if $n \ge 84$, this probability will be at least 0.7, as required.

However, from the table of binomial distributions given at the end of this book, it is found that for $n = 15$,

\[ \Pr(0.4 \le \bar{X}_n \le 0.6) = \Pr(6 \le T \le 9) = 0.70. \]

Hence, 15 tosses would actually be sufficient to satisfy the specified probability requirement.

The Law of Large Numbers

The discussion in Example 6.2.3 indicates that the Chebyshev inequality may not be a practical tool for determining the appropriate sample size in a particular problem, because it may specify a much greater sample size than is actually needed for the particular distribution from which the sample is being taken. However, the Chebyshev inequality is a valuable theoretical tool, and it will be used here to prove an important result known as the law of large numbers.

Suppose that $Z_1, Z_2, \ldots$ is a sequence of random variables. Roughly speaking, it is said that this sequence converges to a given number $b$ if the probability distribution of $Z_n$ becomes more and more concentrated around $b$ as $n \to \infty$. To be more precise, we give the following definition.

Definition 6.2.1 Convergence in Probability. A sequence $Z_1, Z_2, \ldots$ of random variables converges to $b$ in probability if for every number $\epsilon > 0$,

\[ \lim_{n \to \infty} \Pr(|Z_n - b| < \epsilon) = 1. \]

This property is denoted by

\[ Z_n \xrightarrow{p} b, \]

and is sometimes stated simply as $Z_n$ converges to $b$ in probability.

In other words, $Z_n$ converges to $b$ in probability if the probability that $Z_n$ lies in each given interval around $b$, no matter how small this interval may be, approaches 1 as $n \to \infty$.
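For the fair coin of Example 6.2.3, the probability that $\bar{X}_n$ lies in the fixed interval $[0.4, 0.6]$ can be computed exactly and compared with the Chebyshev lower bound $1 - 25/n$. The sketch below is illustrative only; the function names are hypothetical, and the interval endpoints are handled with integer arithmetic because $0.4n$ is not always represented exactly in floating point.

```python
from math import comb


def prob_mean_in_04_06(n):
    """Exact Pr(0.4 <= T/n <= 0.6) for T ~ Binomial(n, 1/2).

    0.4 <= k/n <= 0.6 is equivalent to 4n <= 10k <= 6n, which keeps
    the endpoint computation in exact integer arithmetic.
    """
    lo = (4 * n + 9) // 10   # smallest integer k with 10k >= 4n
    hi = (6 * n) // 10       # largest integer k with 10k <= 6n
    return sum(comb(n, k) for k in range(lo, hi + 1)) / 2**n


def chebyshev_lower_bound(n):
    """Lower bound 1 - 25/n obtained from the Chebyshev inequality."""
    return 1 - 25 / n


for n in (15, 84, 200, 1000):
    print(n, chebyshev_lower_bound(n), prob_mean_in_04_06(n))
```

For $n = 15$ the exact value is $22880/32768 \approx 0.698$, which agrees with the tabulated 0.70 to two decimal places, while the Chebyshev bound at $n = 15$ is negative and therefore says nothing. As $n$ grows, the exact probability approaches 1, which is the behavior that Definition 6.2.1 describes for this one interval around $b = 1/2$.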
We shall now show that the sample mean of a random sample with finite variance always converges in probability to the mean of the distribution from which the random sample was taken.

Theorem 6.2.4 Law of Large Numbers. Suppose that $X_1, \ldots, X_n$ form a random sample from a distribution for which the mean is $\mu$ and for which the variance is finite. Let $\bar{X}_n$ denote the sample mean. Then

\[ \bar{X}_n \xrightarrow{p} \mu. \tag{6.2.5} \]

Proof. Let the variance of each $X_i$ be $\sigma^2$. It then follows from the Chebyshev inequality that for every number $\epsilon > 0$,

\[ \Pr(|\bar{X}_n - \mu| < \epsilon) \ge 1 - \frac{\sigma^2}{n \epsilon^2}. \]

Hence,

\[ \lim_{n \to \infty} \Pr(|\bar{X}_n - \mu| < \epsilon) = 1, \]

which means that $\bar{X}_n \xrightarrow{p} \mu$.

It can also be shown that Eq. (6.2.5) is satisfied if the distribution from which the random sample is taken has a finite mean $\mu$ but an infinite variance. However, the proof for this case is beyond the scope of this book.

Since $\bar{X}_n$ converges to $\mu$ in probability, it follows that there is high probability that $\bar{X}_n$ will be close to $\mu$ if the sample size $n$ is large. Hence, if a large random sample is taken from a distribution for which the mean is unknown, then the arithmetic average of the values in the sample will usually be a close estimate of the unknown mean. This topic will be discussed again in Sec. 6.3, where we introduce the central limit theorem. It will then be possible to present a more precise probability distribution for the difference between $\bar{X}_n$ and $\mu$.

The following result can be useful if we observe random variables with mean $\mu$ but are interested in $\mu^2$ or $\log(\mu)$ or some other continuous function of $\mu$. The proof is left for the reader (Exercise 15).

Theorem 6.2.5 Continuous Functions of Random Variables. If $Z_n \xrightarrow{p} b$, and if $g(z)$ is a function that is continuous at $z = b$, then $g(Z_n) \xrightarrow{p} g(b)$.

Similarly, it is almost as easy to show that if $Z_n \xrightarrow{p} b$ and $Y_n \xrightarrow{p} c$, and if $g(z, y)$ is continuous at $(z, y) = (b, c)$, then $g(Z_n, Y_n) \xrightarrow{p} g(b, c)$ (Exercise 16).
Indeed, Theorem 6.2.5 extends to any finite number $k$ of sequences that converge in probability and a continuous function of $k$ variables.

The law of large numbers helps to explain why a histogram (Definition 3.7.9) can be used as an approximation to a p.d.f.

Theorem 6.2.6 Histograms. Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables. Let $c_1 < c_2$ be two constants. Define $Y_i = 1$ if $c_1 \le X_i < c_2$ and $Y_i = 0$ if not. Then $\bar{Y}_n = \frac{1}{n} \sum_{i=1}^{n} Y_i$ is the proportion of $X_1, \ldots, X_n$ that lie in the interval $[c_1, c_2)$, and $\bar{Y}_n \xrightarrow{p} \Pr(c_1 \le X_1 < c_2)$.

Proof. By construction, $Y_1, Y_2, \ldots$ are i.i.d. Bernoulli random variables with parameter $p = \Pr(c_1 \le X_1 < c_2)$. Theorem 6.2.4 says that $\bar{Y}_n \xrightarrow{p} p$.

In words, Theorem 6.2.6 says the following: If we draw a histogram with the area of the bar over each subinterval being the proportion of a random sample that lies in the corresponding subinterval, then the area of each bar converges in probability to the probability that a random variable from the sequence lies in the subinterval. If the sample is large, we would then expect the area of each bar to be close to the probability. The same idea applies to a conditionally i.i.d. (given $Z = z$) sample, with $\Pr(c_1 \le X_1 < c_2)$ replaced by $\Pr(c_1 \le X_1 < c_2 \mid Z = z)$.

Figure 6.1 Histogram of service times for Example 6.2.4 together with graph of the conditional p.d.f. from which the service times were simulated. (Horizontal axis: Time, from 0 to 10.)

Example 6.2.4 Rate of Service. In Example 3.7.20, we drew a histogram of an observed sample of $n = 100$ service times. The service times were actually simulated as an i.i.d. sample from the exponential distribution with parameter 0.446. Figure 6.1 reproduces the histogram overlaid with the graph of $g(x \mid z_0)$ where $z_0 = 0.446$. Because the width of each bar is 1, the area of each bar equals the proportion of the sample that lies in the corresponding interval.
The area under the curve $g(x \mid z_0)$ over each interval $[c_1, c_2)$ is $\Pr(c_1 \le X_1 < c_2 \mid Z = z_0)$. Notice how closely the area under the conditional p.d.f. matches the area of each bar.

The reason that the p.d.f. and the heights of the bars in the histogram in Fig. 6.1 match so closely is that the area of each bar is converging in probability to the area under the graph of the p.d.f. The sum of the areas of the bars is 1, which is the same as the area under the graph of the p.d.f. If we had chosen the heights of the bars in the histogram to represent counts, then the sum of the areas of the bars would have been $n = 100$, and the bars would have been about 100 times as high as the p.d.f. We could choose a different width for the subintervals in the histogram and still keep the areas equal to the proportions in the subintervals.

Example 6.2.5 Rate of Service. In Example 6.2.4, we can choose 20 bars of width 0.5 instead of 10 bars of width 1. To make the area of each bar represent the proportion in the subinterval, the height of each bar should equal the proportion divided by 0.5. The probability of an observation being in each interval $[c_1, c_2)$ would be

\[ \Pr(c_1 \le X_1 < c_2 \mid Z = z) = \int_{c_1}^{c_2} g(x \mid z)\, dx \approx (c_2 - c_1)\, g([c_1 + c_2]/2 \mid z) = 0.5 \times g([c_1 + c_2]/2 \mid z). \tag{6.2.6} \]

Recall that the probability in (6.2.6) should be close to the proportion of the sample in the interval. If we divide both the probability and the proportion by 0.5, we see that the height of the histogram bar should be close to $g([c_1 + c_2]/2 \mid z)$. Hence, the graph of the p.d.f. should still be close to the heights of the histogram bars. What we are doing here is choosing $r = n(b - a)/k$ in Definition 3.7.9. Figure 6.2 shows the histogram with 20 intervals of length 0.5 together with the same p.d.f. from Fig. 6.1. The bar heights are still similar to the p.d.f., but they are much more variable in Fig. 6.2 compared to Fig. 6.1. Exercise 17 helps to explain why the bar heights are more variable in this example.

Figure 6.2 Modified histogram of service times from Example 6.2.4 together with graph of the conditional p.d.f. This time, the width of each interval is 0.5. (Horizontal axis: Time, from 0 to 10.)

The reasoning used to construct Figures 6.1 and 6.2 applies even when the subintervals used to construct the histogram have different widths. In this case, each bar should have height equal to the raw count divided by both $n$ (the sample size) and the width of the corresponding subinterval.

Weak Laws and Strong Laws

There are other concepts of the convergence of a sequence of random variables, in addition to the concept of convergence in probability that has been presented above. For example, it is said that a sequence $Z_1, Z_2, \ldots$ converges to a constant $b$ with probability 1 if

\[ \Pr\left( \lim_{n \to \infty} Z_n = b \right) = 1. \]

A careful investigation of the concept of convergence with probability 1 is beyond the scope of this book. It can be shown that if a sequence $Z_1, Z_2, \ldots$ converges to $b$ with probability 1, then the sequence will also converge to $b$ in probability. For this reason, convergence with probability 1 is often called strong convergence, whereas convergence in probability is called weak convergence. In order to emphasize the distinction between these two concepts of convergence, the result that here has been called simply the law of large numbers is often called the weak law of large numbers. The strong law of large numbers can then be stated as follows: If $\bar{X}_n$ is the sample mean of a random sample of size $n$ from a distribution with mean $\mu$, then

\[ \Pr\left( \lim_{n \to \infty} \bar{X}_n = \mu \right) = 1. \]

The proof of this result will not be given here. There are examples of sequences of random variables that converge in probability but that do not converge with probability 1. Exercise 22 is one such example. Another type of convergence is convergence in quadratic mean, which is introduced in Exercises 10-13.
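The strong law is a statement about a single infinite sequence of observations: along that one sequence, the running averages settle at $\mu$. The sketch below follows one simulated sequence of fair-coin tosses (so $\mu = 1/2$) and records the running proportion of heads. The coin, the seed, and the function name `running_means` are assumptions made for illustration, and a finite simulation can only suggest, not establish, convergence with probability 1.

```python
import random


def running_means(n_tosses, seed=0):
    """Running proportion of heads along ONE simulated sequence of
    independent fair-coin tosses."""
    rng = random.Random(seed)
    heads = 0
    path = []
    for n in range(1, n_tosses + 1):
        if rng.random() < 0.5:   # a head on toss n
            heads += 1
        path.append(heads / n)
    return path


path = running_means(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, path[n - 1])
```

The early entries of the path can be far from 1/2, while the later entries typically stay within a narrow band around it.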
Chernoff Bounds

One way to think of the Chebyshev inequality is as an application of the Markov inequality to the random variable $(X - \mu)^2$. This idea generalizes to other functions and leads to a sharper bound on the probability in the tail of a distribution when the bound applies. Before giving the general result, we give a simple example to illustrate the potential improvement that it can provide.

Example 6.2.6 Binomial Random Variable. Suppose that $X$ has the binomial distribution with parameters $n$ and 1/2. We would like a bound to the probability that $X/n$ is far from its mean 1/2. To be specific, suppose that we would like a bound for

\[ \Pr\left( \left| \frac{X}{n} - \frac{1}{2} \right| \ge \frac{1}{10} \right). \tag{6.2.7} \]

The Chebyshev inequality gives the bound $\operatorname{Var}(X/n)/(1/10)^2$, which equals $25/n$.
Em vez de aplicar a desigualdade de Chebyshev, definaS=X-n/2 e reescreva a Instead of applying the Chebyshev inequality, define Y = X — n/2 and rewrite probabilidade em (6.2.7) como a soma das duas probabilidades a seguir: the probability in (6.2.7) as the sum of the following two probabilities: ( ) ( ) pr, a1, 1 =Pr, s& 2 , @ p(X > 54h) =rr(v>4). and ’ n2 1 °, ( 10 ’ n- 2 10 10 Pr. x < 14 =pr. -s 2. (6.2.8) Pr (< < tL 5) = Pr (-v > “) . (6.2.8) n 2 = 10 10 0) 10 Para cadae >0, reescreva a primeira das probabilidades em (6.2.8) como For each s > 0, rewrite the first of the probabilities in (6.2.8) as ( n ! ( hs n ns Pr. S= ~~ =Pr, experiéncia(sYzexperiéncia ~__ Pr Y = = Pr ex Y > ex —_ 10 ™“ 10 ( 15) | Pen (ia) < Alexp.(sYI] < Elexp(sY)] ~ experiéncia(ns/10) , ~ exp(ns/10) , onde a desigualdade decorre da desigualdade de Markov. Esta equacgdo envolve a where the inequality follows from the Markov inequality. This equation involves fungdo geradora de momento deS, Ys flexp.(sYJ]. O mgf deSpode ser encontrado the moment generating function of Y, y(s) = E[exp(sY)]. The m.g.f. of Y can be aplicando o Teorema 4.4.3 comp=1/2,4a=1, eb= -n/2 juntamente com a Equa¢do found by applying Theorem 4.4.3 with p = 1/2, a =1, and b = —n/2 together with (5.2.4). O resultado é Equation (5.2.4). The result is ( 1 ) n 1 n W(SF 5 [exp.(s#1] exp-s/2) , (6.2.9) vis)= (5 [exp(s) + 1] exp(-s/2)) , (6.2.9) para todosé Deixar é=1/2 em (6.2.9) para obter o limite for all s. Let s = 1/2 in (6.2.9) to obtain the bound ( ) Pr. S2 a <W(l 2experiéncia(’-n/20) Pr (v > “) < w(1/2) exp(—n/20) ( 1 ) n 1 n =experiéncia¢-n/20) 5 [exp.(i21]exp-1A) =0.9811n. = exp(—n/20) (5 [exp(1/2) + 1] exp(-1/4)) = 0.9811". Da mesma forma, podemos escrever a segunda probabilidade em (6.2.8) como Similarly, we can write the second probability in (6.2.8) as ( n) ( Ms n ns Pr -S2 ~ =Pr. 
experiéncia(-sim/zexperiéncia ~ , 6.2.1 0 Pr -Y == => Pr ex —_ Y = ex __ ’ 6.2.10 10 resencansve 10 (62-10) ( =i) | Posy) = (5) (6.2.10) ondee >0. O maf de -SéW-e). Deixar&=1/2 em (6.2.10) e aplique a desigualdade de where s > 0. The m.g.f. of —Y is y(—s). Let s = 1/2 in (6.2.10) and apply the Markov Markov para obter o limite inequality to obtatin the bound 6.2 A Lei dos Grandes Numeros 357 6.2 The Law of Large Numbers 357 ( n ) Ch Pr. - SB 10 <1 2)experiéncia(-n/20) Pr (-y > “) < w(-1/2) exp(—n/20) ( 1 ) n 1 n =experiénciaén/20) 5 [exp.-12}1]exp(iA) =0.9811n. = exp(—n/20) (5 [exp(—1/2) + 1] exp(l/4)) = 0.9811". Portanto, obtemos o limite ( Hence, we obtain the bound ) Xx i 1 xX 1 1 Pr} j— =! — 200.9811). 6.2.11 Pr | |— — =| = — ) < 20.9811)”. 6.2.11 EF ae 79 <209eTm (62.11) (|~- 5]2 4) <2e98n (62.11) O limite em (6.2.11) diminui exponencialmente rapido 4 medida quenaumenta, enquanto o The bound in (6.2.11) decreases exponentially fast as n increases, while the Cheby- Chebyshev liga 25/ndiminui proporcionalmente para 1/n. Por exemplo, comn=100,200, shev bound 25/n decreases proportionally to 1/n. For example, with n = 100, 200, 300, os limites de Chebychev sdo 0,25, 0,125 e 0,0833. Os limites correspondentes de 300, the Chebychev bounds are 0.25, 0.125, and 0.0833. The corresponding bounds (6.2.11) sao 0,2967, 0,0440 e 0,0065. - from (6.2.11) are 0.2967, 0.0440, and 0.0065. < Aescolha deé=1/2 no Exemplo 6.2.6 era arbitrario. O Teorema 6.2.7 diz que The choice of s = 1/2 in Example 6.2.6 was arbitrary. Theorem 6.2.7 says that we podemos substituir esta escolha arbitraria pela escolha que conduz ao menor limite can replace this arbitrary choice with the choice that leads to the smallest possible possivel. A prova do Teorema 6.2.7 € uma aplicagdo direta da desigualdade de bound. The proof of Theorem 6.2.7 is a straightforward application of the Markov Markov. (Veja o Exercicio 18 nesta segdo.) inequality. (See Exercise 18 in this section.) 
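
The numerical comparison in Example 6.2.6 can be reproduced with a short script. The following is a minimal sketch, not part of the original text: the function names are illustrative assumptions, and the exact probability is obtained by summing the binomial p.f. directly. Because the script does not round the base to 0.9811 before raising it to the power $n$, its Chernoff values may differ from the quoted ones in the last digit.

```python
import math

def chebyshev_bound(n):
    # Var(X/n) / (1/10)^2 with Var(X/n) = 1/(4n), which equals 25/n.
    return 25.0 / n

def chernoff_bound_s_half(n):
    # Right-hand side of (6.2.11): 2 * [exp(-1/20) * psi_1(1/2)]^n, where
    # psi_1(s) = (1/2)(exp(s) + 1) exp(-s/2) is the n = 1 factor of (6.2.9).
    base = math.exp(-1 / 20) * 0.5 * (math.exp(0.5) + 1.0) * math.exp(-0.25)
    return 2.0 * base ** n

def exact_probability(n):
    # Pr(|X/n - 1/2| >= 1/10) for X ~ Binomial(n, 1/2).
    # The event is tested in integers as |10x - 5n| >= n to avoid rounding.
    total = sum(math.comb(n, x) for x in range(n + 1)
                if abs(10 * x - 5 * n) >= n)
    return total / 2 ** n

for n in (100, 200, 300):
    print(n,
          round(chebyshev_bound(n), 4),
          round(chernoff_bound_s_half(n), 4),
          round(exact_probability(n), 6))
```

For $n = 100$ the bound from (6.2.11) is still slightly larger than the Chebyshev bound; the exponential decay only pays off for larger $n$, which is one reason the choice of $s$ matters.
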

Theorem 6.2.7. Chernoff Bounds. Let $X$ be a random variable with moment generating function $\psi$. Then, for every real $t$,

$$\Pr(X \ge t) \le \min_{s > 0} \exp(-st)\,\psi(s).$$

Theorem 6.2.7 is most useful when $X$ is the sum of $n$ i.i.d. random variables, each with finite m.g.f., and when $t = nu$ for a large value of $n$ and some fixed $u$. This was the case in Example 6.2.6.

Example 6.2.7. Average of Geometric Random Sample. Suppose that $X_1, X_2, \ldots$ are i.i.d. geometric random variables with parameter $p$. We would like a bound on the probability that $\overline{X}_n$ is far from the mean $(1 - p)/p$. To be specific, for each fixed $u > 0$, we would like a bound for

$$\Pr\left(\left|\overline{X}_n - \frac{1 - p}{p}\right| \ge u\right). \tag{6.2.12}$$

Let $X = \sum_{i=1}^{n} X_i - n(1 - p)/p$. For each $u > 0$, Theorem 6.2.7 can be used to bound both

$$\Pr\left(\overline{X}_n \ge \frac{1 - p}{p} + u\right) = \Pr(X \ge nu), \quad\text{and}\quad \Pr\left(\overline{X}_n \le \frac{1 - p}{p} - u\right) = \Pr(-X \ge nu).$$

Since (6.2.12) equals $\Pr(X \ge nu) + \Pr(-X \ge nu)$, the bound we seek is the sum of the two bounds that we get for $\Pr(X \ge nu)$ and $\Pr(-X \ge nu)$.

The m.g.f. of $X$ can be found by applying Theorem 4.4.3 with $a = 1$ and $b = -n(1 - p)/p$ together with Theorem 5.5.3. The result is

$$\psi(s) = \left(\frac{p\exp[-s(1 - p)/p]}{1 - (1 - p)\exp(s)}\right)^n. \tag{6.2.13}$$

The m.g.f. of $-X$ is $\psi(-s)$. According to Theorem 6.2.7,

$$\Pr(X \ge nu) \le \min_{s > 0} \psi(s)\exp(-snu). \tag{6.2.14}$$

We find the minimum of $\psi(s)\exp(-snu)$ by finding the minimum of its logarithm. Using (6.2.13), we get that

$$\log[\psi(s)\exp(-snu)] = n\left\{\log(p) - s\,\frac{1 - p}{p} - \log[1 - (1 - p)\exp(s)] - su\right\}.$$

The derivative of this expression with respect to $s$ equals 0 at

$$s = -\log\left[\frac{(1 + u)p + 1 - p}{up + 1 - p}\,(1 - p)\right], \tag{6.2.15}$$

and the second derivative is positive. If $u > 0$, then the value of $s$ in (6.2.15) is positive and $\psi(s)$ is finite. Hence, the value of $s$ in (6.2.15) provides the minimum in (6.2.14). That minimum can be expressed as $q^n$, where

$$q = [p(1 + u) + 1 - p]\left[\frac{(1 + u)p + 1 - p}{up + 1 - p}\,(1 - p)\right]^{u + (1 - p)/p}, \tag{6.2.16}$$

and $0 < q < 1$. (See Exercise 19 for a proof.)
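
As a sanity check on (6.2.15) and (6.2.16), the closed form for $q$ can be compared with a direct numerical minimization of the logarithm of $\psi(s)\exp(-snu)$ divided by $n$. This is only a sketch under stated assumptions: the function names and the values $p = 0.3$, $u = 1$ are illustrative, and the crude grid search stands in for a proper optimizer. The search is restricted to $0 < s < -\log(1 - p)$, the range on which the m.g.f. in (6.2.13) is finite.

```python
import math

def log_objective(s, p, u):
    # (1/n) * log[psi(s) exp(-s n u)] using (6.2.13); needs (1 - p) e^s < 1.
    return (math.log(p) - s * (1 - p) / p
            - math.log(1 - (1 - p) * math.exp(s)) - s * u)

def s_star(p, u):
    # Stationary point from (6.2.15).
    return -math.log(((1 + u) * p + 1 - p) / (u * p + 1 - p) * (1 - p))

def q_closed_form(p, u):
    # Per-observation factor q from (6.2.16).
    ratio = ((1 + u) * p + 1 - p) / (u * p + 1 - p) * (1 - p)
    return (p * (1 + u) + 1 - p) * ratio ** (u + (1 - p) / p)

def q_grid_search(p, u, steps=20_000):
    # Crude grid minimization over the open interval (0, -log(1 - p)).
    s_max = -math.log(1 - p)
    best = min(log_objective(s_max * k / steps, p, u)
               for k in range(1, steps))
    return math.exp(best)

p, u = 0.3, 1.0  # illustrative values, not from the text
print(s_star(p, u), q_closed_form(p, u), q_grid_search(p, u))
```

The two values of $q$ should agree to several decimal places, and $q^n$ is then the Chernoff bound on $\Pr(X \ge nu)$.
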
Hence, $\Pr(X \ge nu) \le q^n$.

For $\Pr(-X \ge nu)$, we notice first that $\Pr(-X \ge nu) = 0$ if $u \ge (1 - p)/p$ because $\sum_{i=1}^{n} X_i \ge 0$. If $u \ge (1 - p)/p$, then the overall bound on (6.2.12) is $q^n$. For $0 < u < (1 - p)/p$, the value of $s$ that minimizes $\psi(-s)\exp(-snu)$ is

$$s = \log\left[\frac{(1 - u)p + 1 - p}{1 - p - up}\,(1 - p)\right],$$

which is positive when $0 < u < (1 - p)/p$. The value of $\min_{s > 0} \psi(-s)\exp(-snu)$ is $r^n$, where

$$r = [p(1 - u) + 1 - p]\left[\frac{(1 - u)p + 1 - p}{1 - p - up}\,(1 - p)\right]^{-u + (1 - p)/p},$$

and $0 < r < 1$. Hence, the Chernoff bound is $q^n$ if $u \ge (1 - p)/p$ and is $q^n + r^n$ if $0 < u < (1 - p)/p$. As such, the bound decreases exponentially fast as $n$ increases. This is a marked improvement over the Chebyshev bound, which decreases like a constant over $n$.

Summary

The law of large numbers says that the sample mean of a random sample converges in probability to the mean $\mu$ of the individual random variables, if the variance exists. This means that the sample mean will be close to $\mu$ if the size of the random sample is sufficiently large. The Chebyshev inequality provides a (crude) bound on how high the probability is that the sample mean will be close to $\mu$. Chernoff bounds can be sharper, but are harder to compute.

Exercises

1. For each integer $n$, let $X_n$ be a nonnegative random variable with finite mean $\mu_n$. Prove that if $\lim_{n\to\infty} \mu_n = 0$, then $X_n \xrightarrow{p} 0$.

2. Suppose that $X$ is a random variable for which $\Pr(X \ge 0) = 1$ and $\Pr(X \ge 10) = 1/5$. Prove that $E(X) \ge 2$.

3. Suppose that $X$ is a random variable for which $E(X) = 10$, $\Pr(X \le 7) = 0.2$, and $\Pr(X \ge 13) = 0.3$. Prove that $\operatorname{Var}(X) \ge 9/2$.

4. Let $X$ be a random variable for which $E(X) = \mu$ and $\operatorname{Var}(X) = \sigma^2$. Construct a probability distribution for $X$ such that $\Pr(|X - \mu| \ge 3\sigma) = 1/9$.

5. How large a random sample must be taken from a given distribution in order for the probability to be at least 0.99 that the sample mean will be within 2 standard deviations of the mean of the distribution?

6. Suppose that $X_1, \ldots, X_n$ form a random sample of size $n$ from a distribution for which the mean is 6.5 and the variance is 4. Determine how large the value of $n$ must be in order for the following relation to be satisfied:

$$\Pr(6 \le \overline{X}_n \le 7) \ge 0.8.$$

7. Suppose that $X$ is a random variable for which $E(X) = \mu$ and $E[(X - \mu)^4] = \beta_4$. Prove that

$$\Pr(|X - \mu| \ge t) \le \frac{\beta_4}{t^4}.$$

8. Suppose that 30 percent of the items in a large manufactured lot are of poor quality. Suppose also that a random sample of $n$ items is to be taken from the lot, and let $Q_n$ denote the proportion of the items in the sample that are of poor quality. Find a value of $n$ such that $\Pr(0.2 \le Q_n \le 0.4) \ge 0.75$ by using (a) the Chebyshev inequality and (b) the tables of the binomial distribution at the end of this book.

9. Let $Z_1, Z_2, \ldots$ be a sequence of random variables, and suppose that, for $n = 1, 2, \ldots$, the distribution of $Z_n$ is as follows:

$$\Pr(Z_n = n^2) = \frac{1}{n} \quad\text{and}\quad \Pr(Z_n = 0) = 1 - \frac{1}{n}.$$

Show that $\lim_{n\to\infty} E(Z_n) = \infty$ but $Z_n \xrightarrow{p} 0$.

10. It is said that a sequence of random variables $Z_1, Z_2, \ldots$ converges to a constant $b$ in quadratic mean if

$$\lim_{n\to\infty} E[(Z_n - b)^2] = 0. \tag{6.2.17}$$

Show that Eq. (6.2.17) is satisfied if and only if $\lim_{n\to\infty} E(Z_n) = b$ and $\lim_{n\to\infty} \operatorname{Var}(Z_n) = 0$. Hint: Use Exercise 5 of Sec. 4.3.

11. Prove that if a sequence $Z_1, Z_2, \ldots$ converges to a constant $b$ in quadratic mean, then the sequence also converges to $b$ in probability.

12. Let $\overline{X}_n$ be the sample mean of a random sample of size $n$ from a distribution for which the mean is $\mu$ and the variance is $\sigma^2$, where $\sigma^2 < \infty$. Show that $\overline{X}_n$ converges to $\mu$ in quadratic mean as $n \to \infty$.

13. Let $Z_1, Z_2, \ldots$ be a sequence of random variables, and suppose that for $n = 2, 3, \ldots$, the distribution of $Z_n$ is as follows:

$$\Pr\left(Z_n = \frac{1}{n}\right) = 1 - \frac{1}{n^2} \quad\text{and}\quad \Pr(Z_n = n) = \frac{1}{n^2}.$$

a. Does there exist a constant $c$ to which the sequence converges in probability?
b. Does there exist a constant $c$ to which the sequence converges in quadratic mean?

14. Let $f$ be a p.f. for a discrete distribution. Suppose that $f(x) = 0$ for $x \notin [0, 1]$. Prove that the variance of this distribution is at most 1/4. Hint: Prove that there is a distribution supported on just the two points $\{0, 1\}$ that has variance at least as large as $f$ does, and then prove that the variance of a distribution supported on $\{0, 1\}$ is at most 1/4.

15. Prove Theorem 6.2.5.

16. Suppose that $Z_n \xrightarrow{p} b$, $Y_n \xrightarrow{p} c$, and $g(z, y)$ is a function that is continuous at $(z, y) = (b, c)$. Prove that $g(Z_n, Y_n)$ converges in probability to $g(b, c)$.

17. Let $X$ have the binomial distribution with parameters $n$ and $p$. Let $Y$ have the binomial distribution with parameters $n$ and $p/k$ with $k > 1$. Let $Z = kY$.
a. Show that $X$ and $Z$ have the same mean.
b. Find the variances of $X$ and $Z$. Show that, if $p$ is small, then the variance of $Z$ is approximately $k$ times as large as the variance of $X$.
c. Show why the results above explain the higher variability in the bar heights in Fig. 6.2 compared to Fig. 6.1.

18. Prove Theorem 6.2.7.

19. Return to Example 6.2.7.
a. Prove that $\min_{s > 0} \psi(s)\exp(-snu)$ equals $q^n$, where $q$ is given in (6.2.16).
b. Prove that $0 < q < 1$. Hint: First, show that $q = 1$ if $u = 0$. Next, let $x = up + 1 - p$ and show that $\log(q)$ is a decreasing function of $x$.

20. Return to Example 6.2.6. Find the Chernoff bound for the probability in (6.2.7).

21. Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables having the exponential distribution with parameter 1. Let $Y_n = \sum_{i=1}^{n} X_i$ for each $n = 1, 2, \ldots$.
a. For each $u > 1$, compute the Chernoff bound on $\Pr(Y_n > nu)$.
b. What goes wrong if we try to compute the Chernoff bound when $u < 1$?

22. In this exercise, we construct an example of a sequence of random variables $Z_n$ such that $Z_n \xrightarrow{p} 0$ but

$$\Pr\left(\lim_{n\to\infty} Z_n = 0\right) = 0. \tag{6.2.18}$$

That is, $Z_n$ converges in probability to 0, but $Z_n$ does not converge to 0 with probability 1. Indeed, $Z_n$ converges to 0 with probability 0.

Let $X$ be a random variable having the uniform distribution on the interval $[0, 1]$. We will construct a sequence of functions $h_n(x)$ for $n = 1, 2, \ldots$ and define $Z_n = h_n(X)$. Each function $h_n$ will take only two values, 0 and 1. The set of $x$ where $h_n(x) = 1$ is determined by dividing the interval $[0, 1]$ into $k$ nonoverlapping subintervals of length $1/k$ for $k = 1, 2, \ldots$, arranging these intervals in sequence, and letting $h_n(x) = 1$ on the $n$th interval in the sequence for $n = 1, 2, \ldots$. For each $k$, there are $k$ nonoverlapping subintervals, so the number of subintervals with lengths $1, 1/2, 1/3, \ldots, 1/k$ is

$$1 + 2 + 3 + \cdots + k = \frac{k(k + 1)}{2}.$$

The remainder of the construction is based on this formula. The first interval in the sequence has length 1, the next two have length 1/2, the next three have length 1/3, etc.
a. For each $n = 1, 2, \ldots$, prove that there is a unique positive integer $k_n$ such that

$$\frac{(k_n - 1)k_n}{2} < n \le \frac{k_n(k_n + 1)}{2}.$$

b. For each $n = 1, 2, \ldots$, let $j_n = n - (k_n - 1)k_n/2$. Show that $j_n$ takes the values $1, \ldots, k_n$ as $n$ runs through $1 + (k_n - 1)k_n/2, \ldots, k_n(k_n + 1)/2$.
c. Define

$$h_n(x) = \begin{cases} 1 & \text{if } (j_n - 1)/k_n \le x < j_n/k_n, \\ 0 & \text{if not.} \end{cases}$$

Show that, for every $x \in [0, 1)$, $h_n(x) = 1$ for one and only one $n$ among $1 + (k_n - 1)k_n/2, \ldots, k_n(k_n + 1)/2$.
d. Show that $Z_n = h_n(X)$ takes the value 1 infinitely often with probability 1.
e. Show that (6.2.18) holds.
f. Show that $\Pr(Z_n = 0) = 1 - 1/k_n$ and $\lim_{n\to\infty} k_n = \infty$.
g. Show that $Z_n \xrightarrow{p} 0$.

23. Prove that the sequence of random variables $Z_n$ in Exercise 22 converges in quadratic mean (definition in Exercise 10) to 0.

24. In this exercise, we construct an example of a sequence of random variables $Z_n$ such that $Z_n$ converges to 0 with probability 1, but $Z_n$ fails to converge to 0 in quadratic mean. Let $X$ be a random variable having the uniform distribution on the interval $[0, 1]$. Define the sequence $Z_n$ by $Z_n = n^2$ if $0 < X < 1/n$ and $Z_n = 0$ otherwise.
a. Prove that $Z_n$ converges to 0 with probability 1.
b. Prove that $Z_n$ does not converge to 0 in quadratic mean.
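
The interval construction described in Exercise 22 is easier to follow with the indices written out. The sketch below is illustrative only and proves nothing asked in the exercise: the helper names `k_of`, `j_of`, `h`, and `hits` are assumptions, and the point $x = 3/10$ is an arbitrary choice. It computes $k_n$ and $j_n$ by direct search and lists the indices $n$ at which $h_n(x) = 1$, using exact fractions so that interval endpoints are compared without rounding.

```python
from fractions import Fraction

def k_of(n):
    # Smallest positive integer k with n <= k(k + 1)/2, which is the
    # unique k satisfying (k - 1)k/2 < n <= k(k + 1)/2.
    k = 1
    while k * (k + 1) // 2 < n:
        k += 1
    return k

def j_of(n):
    # Position of the n-th interval within its block of k_n intervals.
    k = k_of(n)
    return n - (k - 1) * k // 2

def h(n, x):
    # h_n(x) = 1 if (j_n - 1)/k_n <= x < j_n/k_n, and 0 otherwise.
    k, j = k_of(n), j_of(n)
    return 1 if Fraction(j - 1, k) <= x < Fraction(j, k) else 0

def hits(x, n_max):
    # Indices n <= n_max at which h_n(x) equals 1.
    return [n for n in range(1, n_max + 1) if h(n, x) == 1]

x = Fraction(3, 10)  # arbitrary illustrative point in [0, 1)
print([(n, k_of(n), j_of(n)) for n in range(1, 11)])
print(hits(x, 15))
```

For this $x$, exactly one index in each block $1 + (k - 1)k/2, \ldots, k(k + 1)/2$ gives the value 1, which is the pattern behind parts (c) and (d).
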

6.3 The Central Limit Theorem

The sample mean of a large random sample of random variables with mean $\mu$ and finite variance $\sigma^2$ has approximately the normal distribution with mean $\mu$ and variance $\sigma^2/n$. This result helps to justify the use of the normal distribution as a model for many random variables that can be thought of as being made up of many independent parts. Another version of the central limit theorem is given that applies to independent random variables that are not identically distributed. We also introduce the delta method, which allows us to compute approximate distributions for functions of random variables.

Statement of the Theorem

Example 6.3.1. A Large Sample. A clinical trial has 100 patients who will receive a treatment. Patients who don't receive the treatment survive for 18 months with probability 0.5 each. We assume that all patients are independent. The trial is to see whether the new treatment can increase the probability of survival significantly. Let $X$ be the number of patients out of the 100 who survive for 18 months. If the probability of success were 0.5 for the patients on the treatment (the same as without the treatment), then $X$ would have the binomial distribution with parameters $n = 100$ and $p = 0.5$. The p.f. of $X$ is graphed as a bar chart with the solid line in Fig. 6.3. The shape of the bar chart is reminiscent of a bell-shaped curve. The normal p.d.f. with the same mean $\mu = 50$ and variance $\sigma^2 = 25$ as the binomial distribution is also graphed with the dotted line.

[Figure 6.3: Comparison of the binomial p.f. with parameters 100 and 0.5 to the normal p.d.f. with mean 50 and variance 25. Horizontal axis: $x$ from 0 to 100; vertical axis from 0 to 0.08.]

In Examples 5.4.1 and 5.4.2, we illustrated how the Poisson distribution provides a good approximation to a binomial distribution with a large $n$ and small $p$. Example 6.3.1 shows how a normal distribution can be a good approximation to a binomial distribution with a large $n$ and not so small $p$. The central limit theorem (Theorem 6.3.1) is a formal statement of how normal distributions can approximate distributions of general sums or averages of i.i.d. random variables.

In Corollary 5.6.2, we saw that if a random sample of size $n$ is taken from the normal distribution with mean $\mu$ and variance $\sigma^2$, then the sample average $\overline{X}_n$ has the normal distribution with mean $\mu$ and variance $\sigma^2/n$. The simple version of the central limit theorem that we give in this section says that whenever a random sample of size $n$ is taken from any distribution with mean $\mu$ and variance $\sigma^2$, the sample average $\overline{X}_n$ will have a distribution that is approximately normal with mean $\mu$ and variance $\sigma^2/n$.

This result was established for a random sample from a Bernoulli distribution by A. de Moivre in the early part of the eighteenth century.
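
The Bernoulli case, which is the setting of Example 6.3.1, is easy to check numerically. The following minimal sketch tabulates the binomial p.f. with parameters 100 and 0.5 next to the normal p.d.f. with mean 50 and variance 25. The function names and the selection of $x$ values are illustrative assumptions, and no plotting library is used.

```python
import math

def binomial_pf(x, n=100, p=0.5):
    # Binomial p.f.: C(n, x) p^x (1 - p)^(n - x).
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def normal_pdf(x, mean=50.0, var=25.0):
    # Normal p.d.f. with the same mean and variance as the binomial.
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

for x in (35, 40, 45, 50, 55, 60, 65):
    print(x, round(binomial_pf(x), 5), round(normal_pdf(x), 5))

# Largest pointwise gap between the two curves over x = 0, ..., 100.
print(max(abs(binomial_pf(x) - normal_pdf(x)) for x in range(101)))
```

The two columns track each other closely across the whole range, consistent with the near overlap of the solid and dotted lines described for Fig. 6.3.
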
The proof for a random sample from an arbitrary distribution was given independently by J. W. Lindeberg and P. Lévy in the early 1920s. A precise statement of their theorem will be given now, and an outline of the proof of that theorem will be given later in this section. We shall also state another central limit theorem pertaining to the sum of independent random variables that are not necessarily identically distributed and shall present some examples illustrating both theorems.

Theorem 6.3.1. Central Limit Theorem (Lindeberg and Lévy). If the random variables $X_1, \ldots, X_n$ form a random sample of size $n$ from a given distribution with mean $\mu$ and variance $\sigma^2$ ($0 < \sigma^2 < \infty$), then for each fixed number $x$,

$$\lim_{n\to\infty} \Pr\left[\frac{\overline{X}_n - \mu}{\sigma/n^{1/2}} \le x\right] = \Phi(x), \tag{6.3.1}$$

where $\Phi$ denotes the c.d.f. of the standard normal distribution.

The interpretation of Eq. (6.3.1) is as follows: If a large random sample is taken from any distribution with mean $\mu$ and variance $\sigma^2$, regardless of whether this distribution is discrete or continuous, then the distribution of the random variable $n^{1/2}(\overline{X}_n - \mu)/\sigma$ will be approximately the standard normal distribution. Therefore, the distribution of $\overline{X}_n$ will be approximately the normal distribution with mean $\mu$ and variance $\sigma^2/n$, or, equivalently, the distribution of the sum $\sum_{i=1}^{n} X_i$ will be approximately the normal distribution with mean $n\mu$ and variance $n\sigma^2$. It is in this last form that the central limit theorem was illustrated in Example 6.3.1.

Example 6.3.2. Tossing a Coin. Suppose that a fair coin is tossed 900 times. We shall approximate the probability of obtaining more than 495 heads.

For $i = 1, \ldots, 900$, let $X_i = 1$ if a head is obtained on the $i$th toss and let $X_i = 0$ otherwise. Then $E(X_i) = 1/2$ and $\operatorname{Var}(X_i) = 1/4$.
Therefore, the values $X_1, \ldots, X_{900}$ form a random sample of size $n = 900$ from a distribution with mean 1/2 and variance 1/4. It follows from the central limit theorem that the distribution of the total number of heads $H = \sum_{i=1}^{900} X_i$ will be approximately the normal distribution for which the mean is $(900)(1/2) = 450$, the variance is $(900)(1/4) = 225$, and the standard deviation is $(225)^{1/2} = 15$. Therefore, the variable $Z = (H - 450)/15$ will have approximately the standard normal distribution. Thus,

$$\Pr(H > 495) = \Pr\left(\frac{H - 450}{15} > \frac{495 - 450}{15}\right) = \Pr(Z > 3) \approx 1 - \Phi(3) = 0.0013.$$

The exact probability is 0.0012 to four decimal places.

Example 6.3.3. Sampling from a Uniform Distribution. Suppose that a random sample of size $n = 12$ is taken from the uniform distribution on the interval $[0, 1]$. We shall approximate the value of $\Pr\left(\left|\overline{X}_n - \frac{1}{2}\right| \le 0.1\right)$.

The mean of the uniform distribution on the interval $[0, 1]$ is 1/2, and the variance is 1/12 (see Exercise 3 of Sec. 4.3).
Since $n = 12$ in this example, it follows from the central limit theorem that the distribution of $\overline{X}_n$ will be approximately the normal distribution with mean 1/2 and variance 1/144. Therefore, the distribution of the variable $Z = 12\left(\overline{X}_n - \frac{1}{2}\right)$ will be approximately the standard normal distribution. Hence,

$$\Pr\left(\left|\overline{X}_n - \tfrac{1}{2}\right| \le 0.1\right) = \Pr\left[12\left|\overline{X}_n - \tfrac{1}{2}\right| \le 1.2\right] = \Pr(|Z| \le 1.2) \approx 2\Phi(1.2) - 1 = 0.7698.$$

For the special case of $n = 12$, the random variable $Z$ has the form $Z = \sum_{i=1}^{12} X_i - 6$. At one time, some computers produced standard normal pseudo-random numbers by adding 12 uniform pseudo-random numbers and subtracting 6. ◀

Example 6.3.4 Poisson Random Variables. Suppose that $X_1, \ldots, X_n$ form a random sample from the Poisson distribution with mean $\theta$. Let $\overline{X}_n$ be the average. Then $\mu = \theta$ and $\sigma^2 = \theta$. The central limit theorem says that $n^{1/2}(\overline{X}_n - \theta)/\theta^{1/2}$ has approximately the standard normal distribution. In particular, the central limit theorem says that $\overline{X}_n$ should be close to $\mu$ with high probability.
The probability that $|\overline{X}_n - \theta|$ is less than some small number $c$ could be approximated using the standard normal c.d.f.:

$$\Pr\left(|\overline{X}_n - \theta| < c\right) \approx 2\Phi\left(c\,n^{1/2}\theta^{-1/2}\right) - 1.$$ (6.3.2)

◀

The type of convergence that appears in the central limit theorem, specifically, Eq. (6.3.1), arises in other contexts and has a special name.

Definition 6.3.1 Convergence in Distribution/Asymptotic Distribution. Let $X_1, X_2, \ldots$ be a sequence of random variables, and for $n = 1, 2, \ldots$, let $F_n$ denote the c.d.f. of $X_n$. Also, let $F^*$ be a c.d.f. Then it is said that the sequence $X_1, X_2, \ldots$ converges in distribution to $F^*$ if

$$\lim_{n\to\infty} F_n(x) = F^*(x),$$ (6.3.3)

for all $x$ at which $F^*(x)$ is continuous. Sometimes, it is simply said that $X_n$ converges in distribution to $F^*$, and $F^*$ is called the asymptotic distribution of $X_n$. If $F^*$ has a name, then we say that $X_n$ converges in distribution to that name.

Thus, according to Theorem 6.3.1, as indicated in Eq.
(6.3.1), the random variable $n^{1/2}(\overline{X}_n - \mu)/\sigma$ converges in distribution to the standard normal distribution, or, equivalently, the asymptotic distribution of $n^{1/2}(\overline{X}_n - \mu)/\sigma$ is the standard normal distribution.

Effect of the Central Limit Theorem The central limit theorem provides a plausible explanation for the fact that the distributions of many random variables studied in physical experiments are approximately normal. For example, a person's height is influenced by many random factors. If the height of each person is determined by adding the values of these individual factors, then the distribution of the heights of a large number of persons will be approximately normal. In general, the central limit theorem indicates that the distribution of the sum of many random variables can be approximately normal, even though the distribution of each random variable in the sum differs from the normal.

Example 6.3.5 Determining a Simulation Size.
In Example 6.2.2 on page 351, an environmental engineer wanted to determine the size of a simulation to estimate the mean proportion of water contaminant that was lead. Use of the Chebyshev inequality in that example suggested that a simulation of size 2,000,000 will guarantee that the estimate will be less than 0.005 away from the true mean proportion with probability at least 0.98. In this example, we shall use the central limit theorem to determine a much smaller simulation size that should still provide the same accuracy bound. The estimate of the mean proportion will be the average $\overline{R}_n$ of all of the simulated proportions $R_1, \ldots, R_n$ from the $n$ simulations that will be run. As we noted in Example 6.2.2, the variance of each $R_i$ is $\sigma^2 \le 1$, and hence the central limit theorem says that $\overline{R}_n$ has approximately the normal distribution with mean equal to the true mean proportion $E(R_i)$ and variance at most $1/n$.
Since the probability of being close to the mean decreases as the variance increases, we see that

$$\Pr\left(|\overline{R}_n - E(R_i)| < 0.005\right) \approx \Phi\left(\frac{0.005}{\sigma/\sqrt{n}}\right) - \Phi\left(\frac{-0.005}{\sigma/\sqrt{n}}\right) \ge \Phi\left(\frac{0.005}{1/\sqrt{n}}\right) - \Phi\left(\frac{-0.005}{1/\sqrt{n}}\right) = 2\Phi(0.005\sqrt{n}) - 1.$$

If we set $2\Phi(0.005\sqrt{n}) - 1 = 0.98$, we obtain

$$n = \frac{1}{0.005^2}\left[\Phi^{-1}(0.99)\right]^2 = 40{,}000 \times 2.326^2 = 216{,}411.$$

That is, we only need a little more than 10 percent of the simulation size that the Chebyshev inequality suggested. (Since $\sigma^2$ is actually no more than 1/4, we really only need $n = 54{,}103$. See Exercise 14 in Sec. 6.2 for a proof that a discrete distribution on the interval $[0, 1]$ can have variance at most 1/4. The continuous case is slightly more complicated, but also true.) ◀

Other Examples of Convergence in Distribution In Chapter 5, we saw three examples of limit theorems involving discrete distributions. Theorems 5.3.4, 5.4.5, and 5.4.6 all showed that a sequence of p.f.'s converged to some other p.f. In Exercise 7 in Sec. 6.5, you can prove a general result that implies that the three theorems just mentioned are examples of convergence in distribution.

The Delta Method

Example 6.3.6 Rate of Service. Customers arrive at a queue for service, and the $i$th customer is served in some time $X_i$ after reaching the head of the queue. If we assume that $X_1, \ldots, X_n$ form a random sample of service times with mean $\mu$ and finite variance $\sigma^2$, we might be interested in using $1/\overline{X}_n$ to estimate the rate of service.
The central limit theorem tells us something about the approximate distribution of $\overline{X}_n$ if $n$ is large, but what can we say about the distribution of $1/\overline{X}_n$? ◀

Suppose that $X_1, \ldots, X_n$ form a random sample from a distribution that has finite mean $\mu$ and finite variance $\sigma^2$. The central limit theorem says that $n^{1/2}(\overline{X}_n - \mu)/\sigma$ has approximately the standard normal distribution. Now suppose that we are interested in the distribution of some function $\alpha$ of $\overline{X}_n$. We shall assume that $\alpha$ is a differentiable function whose derivative is nonzero at $\mu$. We shall approximate the distribution of $\alpha(\overline{X}_n)$ by a method known in statistics as the delta method.

Theorem 6.3.2 Delta Method. Let $Y_1, Y_2, \ldots$ be a sequence of random variables, and let $F^*$ be a continuous c.d.f. Let $\theta$ be a real number, and let $a_1, a_2, \ldots$ be a sequence of positive numbers that increase to $\infty$. Suppose that $a_n(Y_n - \theta)$ converges in distribution to $F^*$. Let $\alpha$ be a function with continuous derivative such that $\alpha'(\theta) \ne 0$. Then $a_n[\alpha(Y_n) - \alpha(\theta)]/\alpha'(\theta)$ converges in distribution to $F^*$.

Proof We shall give only an outline of the proof. Because $a_n \to \infty$, $Y_n$ must get close to $\theta$ with high probability as $n \to \infty$. If not, $|a_n(Y_n - \theta)|$ would go to $\infty$ with nonzero probability and then the c.d.f. of $a_n(Y_n - \theta)$ would not converge to a c.d.f. Because $\alpha$ is continuous, $\alpha(Y_n)$ must also be close to $\alpha(\theta)$ with high probability. Therefore, we shall use a Taylor series expansion of $\alpha(Y_n)$ around $\theta$,

$$\alpha(Y_n) \approx \alpha(\theta) + \alpha'(\theta)(Y_n - \theta),$$ (6.3.4)

where we have ignored all terms involving $(Y_n - \theta)^2$ and higher powers. Subtract $\alpha(\theta)$ from both sides of Eq. (6.3.4), and then multiply both sides by $a_n/\alpha'(\theta)$ to get

$$\frac{a_n}{\alpha'(\theta)}\left[\alpha(Y_n) - \alpha(\theta)\right] \approx a_n(Y_n - \theta).$$ (6.3.5)

We then conclude that the distribution of the left side of Eq. (6.3.5) will be approximately the same as the distribution of the right side of the equation, which is approximately $F^*$.
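The conclusion of Theorem 6.3.2 can be checked numerically. The following is a minimal simulation sketch, not part of the original text. It assumes, purely for illustration, that the underlying observations are exponential with mean $\mu = 2$ (so $\sigma = \mu$), takes $Y_n$ to be the sample mean with $a_n = n^{1/2}/\sigma$ and $\theta = \mu$ (a choice justified by the central limit theorem), and uses $\alpha(x) = 1/x$. The function names `delta_method_draws` and `std_normal_cdf` are hypothetical helpers. If the theorem applies, the simulated values of $a_n[\alpha(Y_n) - \alpha(\theta)]/\alpha'(\theta)$ should have mean near 0, variance near 1, and an empirical c.d.f. close to $\Phi$.

```python
import math
import random
import statistics


def delta_method_draws(n, reps, mu=2.0, seed=0):
    """Simulate a_n * (alpha(Y_n) - alpha(theta)) / alpha'(theta) for alpha(x) = 1/x.

    Illustrative assumptions: observations are exponential with mean mu,
    so sigma = mu; Y_n is the sample mean, a_n = sqrt(n) / sigma, theta = mu.
    """
    rng = random.Random(seed)
    sigma = mu                        # exponential: standard deviation equals the mean
    a_n = math.sqrt(n) / sigma
    alpha_prime = -1.0 / mu ** 2      # derivative of 1/x evaluated at theta = mu
    draws = []
    for _ in range(reps):
        # expovariate takes the rate, which is 1 / mean
        y_n = sum(rng.expovariate(1.0 / mu) for _ in range(n)) / n
        draws.append(a_n * (1.0 / y_n - 1.0 / mu) / alpha_prime)
    return draws


def std_normal_cdf(x):
    """Standard normal c.d.f. written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


draws = delta_method_draws(n=400, reps=2000)
print("sample mean:    ", statistics.fmean(draws))
print("sample variance:", statistics.variance(draws))
print("fraction <= 1:  ", sum(d <= 1.0 for d in draws) / len(draws))
print("Phi(1):         ", std_normal_cdf(1.0))
```

The agreement is only approximate for finite $n$: the terms dropped from the Taylor expansion in Eq. (6.3.4) leave a small bias that shrinks as $n$ grows.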
The most common application of Theorem 6.3.2 occurs when $Y_n$ is the average of a random sample from a distribution with finite variance. We state that case in the following corollary.

Corollary 6.3.1 Delta Method for Average of a Random Sample. Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables from a distribution with mean $\mu$ and finite variance $\sigma^2$. Let $\alpha$ be a function with continuous derivative such that $\alpha'(\mu) \ne 0$. Then the asymptotic distribution of

$$\frac{n^{1/2}}{\sigma\,\alpha'(\mu)}\left[\alpha(\overline{X}_n) - \alpha(\mu)\right]$$

is the standard normal distribution.
Proof Apply Theorem 6.3.2 with $Y_n = \overline{X}_n$, $a_n = n^{1/2}/\sigma$, $\theta = \mu$, and $F^*$ being the standard normal c.d.f.

A common way to report the result in Corollary 6.3.1 is to say that the distribution of $\alpha(\overline{X}_n)$ is approximately the normal distribution with mean $\alpha(\mu)$ and variance $\sigma^2[\alpha'(\mu)]^2/n$.

Example 6.3.7 Rate of Service. In Example 6.3.6, we are interested in the distribution of $\alpha(\overline{X}_n)$ where $\alpha(x) = 1/x$ for $x > 0$. We can apply the delta method by finding $\alpha'(x) = -1/x^2$. It follows that the asymptotic distribution of

$$-\frac{n^{1/2}\mu^2}{\sigma}\left(\frac{1}{\overline{X}_n} - \frac{1}{\mu}\right)$$

is the standard normal distribution. Alternatively, we might say that $1/\overline{X}_n$ has approximately the normal distribution with mean $1/\mu$ and variance $\sigma^2/[n\mu^4]$. ◀

Variance Stabilizing Transformations If we were to observe a random sample of Poisson random variables as in Example 6.3.4, we would assume that $\theta$ is unknown. In such a case we cannot compute the probability in Eq. (6.3.2), because the approximate variance of $\overline{X}_n$ depends on $\theta$.
For this reason, it is sometimes desirable to transform $\overline{X}_n$ by a function $\alpha$ so that the approximate distribution of $\alpha(\overline{X}_n)$ has a variance that is a known value. Such a function is called a variance stabilizing transformation. We can often find a variance stabilizing transformation by running the delta method in reverse. In general, we note that the approximate distribution of $\alpha(\overline{X}_n)$ has variance $\alpha'(\mu)^2\sigma^2/n$. In order to make this variance constant, we need $\alpha'(\mu)$ to be a constant times $1/\sigma$. If $\sigma^2$ is a function $g(\mu)$, then we achieve this goal by letting

$$\alpha(\mu) = \int_a^{\mu} \frac{dx}{g(x)^{1/2}},$$ (6.3.6)

where $a$ is an arbitrary constant that makes the integral finite.

Example 6.3.8 Poisson Random Variables. In Example 6.3.4, we have $\sigma^2 = \theta = \mu$, so that $g(\mu) = \mu$. According to Eq. (6.3.6), we should let

$$\alpha(\mu) = \int_0^{\mu} \frac{dx}{x^{1/2}} = 2\mu^{1/2}.$$

It follows that $2\overline{X}_n^{1/2}$ has approximately the normal distribution with mean $2\theta^{1/2}$ and variance $1/n$. For each number $c > 0$, we have

$$\Pr\left(\left|2\overline{X}_n^{1/2} - 2\theta^{1/2}\right| < c\right) \approx 2\Phi\left(c\,n^{1/2}\right) - 1.$$
(6.3.7)

In Chapter 8, we shall see how to use Eq. (6.3.7) to estimate $\theta$ when we assume that $\theta$ is unknown. ◀

The Central Limit Theorem (Liapounov) for the Sum of Independent Random Variables

We shall now state a central limit theorem that applies to a sequence of random variables $X_1, X_2, \ldots$ that are independent but not necessarily identically distributed. This theorem was first proved by A. Liapounov in 1901. We shall assume that $E(X_i) = \mu_i$ and $\operatorname{Var}(X_i) = \sigma_i^2$ for $i = 1, \ldots, n$. Also, we shall let

$$Y_n = \frac{\sum_{i=1}^{n} X_i - \sum_{i=1}^{n} \mu_i}{\left(\sum_{i=1}^{n} \sigma_i^2\right)^{1/2}}.$$ (6.3.8)

Then $E(Y_n) = 0$ and $\operatorname{Var}(Y_n) = 1$. The theorem that is stated next gives a sufficient condition for the distribution of this random variable $Y_n$ to be approximately the standard normal distribution.

Theorem 6.3.3 Suppose that the random variables $X_1, X_2, \ldots$ are independent and that $E(|X_i - \mu_i|^3) < \infty$ for $i = 1, 2, \ldots$ Also, suppose that

$$\lim_{n\to\infty} \frac{\sum_{i=1}^{n} E\left(|X_i - \mu_i|^3\right)}{\left(\sum_{i=1}^{n} \sigma_i^2\right)^{3/2}} = 0.$$
(6.3.9)

Finally, let the random variable $Y_n$ be as defined in Eq. (6.3.8). Then, for each fixed number $x$,

$$\lim_{n\to\infty} \Pr(Y_n \le x) = \Phi(x).$$ (6.3.10)

The interpretation of this theorem is as follows: If Eq. (6.3.9) is satisfied, then for every large value of $n$, the distribution of $\sum_{i=1}^{n} X_i$ will be approximately the normal distribution with mean $\sum_{i=1}^{n} \mu_i$ and variance $\sum_{i=1}^{n} \sigma_i^2$. It should be noted that when the random variables $X_1, X_2, \ldots$ are identically distributed and the third moments of the variables exist, Eq. (6.3.9) will automatically be satisfied and Eq. (6.3.10) then reduces to Eq. (6.3.1).

The distinction between the theorem of Lindeberg and Lévy and the theorem of Liapounov should be emphasized. The theorem of Lindeberg and Lévy applies to a sequence of i.i.d. random variables. In order for this theorem to be applicable, it is sufficient to assume only that the variance of each random variable is finite.
The theorem of Liapounov applies to a sequence of independent random variables that are not necessarily identically distributed. In order for this theorem to be applicable, it must be assumed that the third moment of each random variable is finite and satisfies Eq. (6.3.9).

The Central Limit Theorem for Bernoulli Random Variables By applying the theorem of Liapounov, we can establish the following result.

Theorem 6.3.4 Suppose that the random variables $X_1, \ldots, X_n$ are independent and $X_i$ has the Bernoulli distribution with parameter $p_i$ ($i = 1, 2, \ldots$). Suppose also that the infinite series $\sum_{i=1}^{\infty} p_i(1 - p_i)$ is divergent, and let

$$Y_n = \frac{\sum_{i=1}^{n} X_i - \sum_{i=1}^{n} p_i}{\left(\sum_{i=1}^{n} p_i(1 - p_i)\right)^{1/2}}.$$ (6.3.11)

Then for every fixed number $x$,

$$\lim_{n\to\infty} \Pr(Y_n \le x) = \Phi(x).$$ (6.3.12)

Proof Here $\Pr(X_i = 1) = p_i$ and $\Pr(X_i = 0) = 1 - p_i$. Therefore,

$$E(X_i) = p_i, \qquad \operatorname{Var}(X_i) = p_i(1 - p_i),$$

$$E\left(|X_i - p_i|^3\right) = p_i(1 - p_i)^3 + (1 - p_i)p_i^3 = p_i(1 - p_i)\left(p_i^2 + (1 - p_i)^2\right) \le p_i(1 - p_i).$$ (6.3.13)

It follows that

$$\frac{\sum_{i=1}^{n} E\left(|X_i - p_i|^3\right)}{\left(\sum_{i=1}^{n} p_i(1 - p_i)\right)^{3/2}} \le \frac{1}{\left(\sum_{i=1}^{n} p_i(1 - p_i)\right)^{1/2}}.$$ (6.3.14)

Since the infinite series $\sum_{i=1}^{\infty} p_i(1 - p_i)$ is divergent, then $\sum_{i=1}^{n} p_i(1 - p_i) \to \infty$ as $n \to \infty$, and it can be seen from the relation (6.3.14) that Eq. (6.3.9) will be satisfied. In turn, it follows from Theorem 6.3.3 that Eq. (6.3.10) will be satisfied. Since Eq. (6.3.12) is simply a restatement of Eq. (6.3.10) for the particular random variables being considered here, the proof of the theorem is complete.

Theorem 6.3.4 implies that if the infinite series $\sum_{i=1}^{\infty} p_i(1 - p_i)$ is divergent, then the distribution of the sum $\sum_{i=1}^{n} X_i$ of a large number of independent Bernoulli random variables will be approximately the normal distribution with mean $\sum_{i=1}^{n} p_i$ and variance $\sum_{i=1}^{n} p_i(1 - p_i)$. It should be kept in mind, however, that a typical practical problem will involve only a finite number of random variables $X_1, \ldots, X_n$, rather than an infinite sequence of random variables.
In such a problem, it is not meaningful to consider whether or not the infinite series $\sum_{i=1}^{\infty} p_i(1 - p_i)$ is divergent, because only a finite number of values $p_1, \ldots, p_n$ will be specified in the problem. In a certain sense, therefore, the distribution of the sum $\sum_{i=1}^{n} X_i$ can always be approximated by a normal distribution. The critical question is whether or not this normal distribution provides a good approximation to the actual distribution of $\sum_{i=1}^{n} X_i$. The answer depends, of course, on the values of $p_1, \ldots, p_n$.

Since the normal distribution will be attained more and more closely as $\sum_{i=1}^{n} p_i(1 - p_i) \to \infty$, the normal distribution provides a good approximation when the value of $\sum_{i=1}^{n} p_i(1 - p_i)$ is large. Furthermore, since the value of each term $p_i(1 - p_i)$ is a maximum when $p_i = 1/2$, the approximation will be best when $n$ is large and the values of $p_1, \ldots, p_n$ are close to 1/2.

Example 6.3.9 Examination Questions. Suppose that an examination contains 99 questions arranged in a sequence from the easiest to the most difficult.
Suppose that the probability that a particular student will answer the first question correctly is 0.99, the probability that he will answer the second question correctly is 0.98, and, in general, the probability that he will answer the $i$th question correctly is $1 - i/100$ for $i = 1, \ldots, 99$. It is assumed that all questions will be answered independently and that the student must answer at least 60 questions correctly to pass the examination. We shall determine the probability that the student will pass.

Let $X_i = 1$ if the $i$th question is answered correctly and $X_i = 0$ otherwise. Then $E(X_i) = p_i = 1 - (i/100)$ and $\operatorname{Var}(X_i) = p_i(1 - p_i) = (i/100)[1 - (i/100)]$. Also,

$$\sum_{i=1}^{99} p_i = 99 - \frac{1}{100}\sum_{i=1}^{99} i = 99 - \frac{1}{100}\cdot\frac{(99)(100)}{2} = 49.5$$

and

$$\sum_{i=1}^{99} p_i(1 - p_i) = \frac{1}{100}\sum_{i=1}^{99} i - \frac{1}{(100)^2}\sum_{i=1}^{99} i^2 = 49.5 - \frac{1}{(100)^2}\cdot\frac{(99)(100)(199)}{6} = 16.665.$$
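The two sums above are easy to reproduce numerically. The sketch below is an illustration added here, not part of the original text. It recomputes the mean and variance of the number of correct answers and, for reference, also builds the exact p.f. of that number by adding one question at a time. The helper name `exact_pmf` is hypothetical. The exact value of $\Pr\left(\sum_{i=1}^{99} X_i \ge 60\right)$ gives a benchmark against which any normal approximation with this mean and variance can be judged.

```python
# Success probabilities from Example 6.3.9: p_i = 1 - i/100 for i = 1, ..., 99.
p = [1 - i / 100 for i in range(1, 100)]

mean = sum(p)                                # sum of p_i
variance = sum(pi * (1 - pi) for pi in p)    # sum of p_i * (1 - p_i)


def exact_pmf(probs):
    """p.f. of a sum of independent Bernoulli variables, built one variable at a time."""
    pmf = [1.0]                              # before any question, the count is 0
    for pi in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - pi)        # this question answered incorrectly
            nxt[k + 1] += mass * pi          # this question answered correctly
        pmf = nxt
    return pmf


pmf = exact_pmf(p)
exact_pass = sum(pmf[60:])                   # exact Pr(at least 60 correct)

print(f"mean = {mean:.3f}, variance = {variance:.3f}")
print(f"exact Pr(pass) = {exact_pass:.4f}")
```

Because the exact p.f. is discrete, a small gap between the exact tail probability and a normal approximation evaluated at 60 is to be expected.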
It follows from the central limit theorem that the distribution of the total number of questions that are answered correctly, which is $\sum_{i=1}^{99} X_i$, will be approximately the normal distribution with mean 49.5 and standard deviation $(16.665)^{1/2} = 4.08$. Therefore, the distribution of the variable

$$Z = \frac{\sum_{i=1}^{99} X_i - 49.5}{4.08}$$

will be approximately the standard normal distribution. It follows that

$$\Pr\left(\sum_{i=1}^{99} X_i \ge 60\right) = \Pr(Z \ge 2.5735) \approx 1 - \Phi(2.5735) = 0.0050.$$

◀

Outline of Proof of Central Limit Theorem

Convergence of the Moment Generating Functions Moment generating functions are important in the study of convergence in distribution because of the following theorem, the proof of which is too advanced to be presented here.

Theorem 6.3.5 Let $X_1, X_2, \ldots$ be a sequence of random variables. For $n = 1, 2, \ldots$, let $F_n$ denote the c.d.f. of $X_n$, and let $\psi_n$ denote the m.g.f. of $X_n$.

Also, let $X^*$ denote another random variable with c.d.f. $F^*$ and m.g.f. $\psi^*$. Suppose that the m.g.f.'s $\psi_n$ and $\psi^*$ exist ($n = 1, 2, \ldots$). If $\lim_{n\to\infty} \psi_n(t) = \psi^*(t)$ for all values of $t$ in some interval around the point $t = 0$, then the sequence $X_1, X_2, \ldots$ converges in distribution to $X^*$.

In other words, the sequence of c.d.f.'s $F_1, F_2, \ldots$ must converge to the c.d.f. $F^*$ if the corresponding sequence of m.g.f.'s $\psi_1, \psi_2, \ldots$ converges to the m.g.f. $\psi^*$.

Outline of the Proof of Theorem 6.3.1 We are now ready to outline a proof of Theorem 6.3.1, which is the central limit theorem of Lindeberg and Lévy. We shall assume that the variables $X_1, \ldots, X_n$ form a random sample of size $n$ from a distribution with mean $\mu$ and variance $\sigma^2$. We shall also assume, for convenience, that the m.g.f. of this distribution exists, although the central limit theorem is true even without this assumption.

For $i = 1, \ldots, n$, let $Y_i = (X_i - \mu)/\sigma$. Then the random variables $Y_1, \ldots, Y_n$ are i.i.d., and each has mean 0 and variance 1. Furthermore, let

$$Z_n = \frac{n^{1/2}(\overline{X}_n - \mu)}{\sigma} = \frac{1}{n^{1/2}}\sum_{i=1}^{n} Y_i.$$
eu=1 i=1 6.3 O Teorema Central do Limite 369 6.3 The Central Limit Theorem 369 Mostraremos queZnconverge na distribui¢do para uma variavel aleatoria tendo a We shall show that Z,, converges in distribution to a random variable having the distribuigdo normal padrdo, conforme indicado na Eq. (6.3.1), ao mostrar que o FGM standard normal distribution, as indicated in Eq. (6.3.1), by showing that the m.g.f. deZnconverge para o mof da distribuigdo normal padrdo. of Z,, converges to the m.g.f. of the standard normal distribution. SeW(tdenota o maf de cada v aleatoériodaravel Seu(eu=1,...., n), entdo segue If y(t) denotes the m.g.f. of each random variable Y; =1, ..., 7), thenit follows do Teorema 4.4.4 que o mgf da soma cu=1 Sewai ser [W(t]. Além disso, segue from Theorem 4.4.4 that the m.g.f. of the sum )~"_, Y; will be [y(r)]’. Also, it follows do Teorema 4.4.3 que 0 mgfén(t)deZnvai ser from Theorem 4.4.3 that the m.g.f. ¢,,(t) of Z,, will be (dn an OtFY ae &n(t) = l¥(<5) | ; Neste problema, WO E(Seu-0 ey (OK E(S2 euF1. Portanto, o Taylor In this problem, w’(0) = E(Y;) =0 and w”(0) = E(Y?) = 1. Therefore, the Taylor expansdo em série dew~(tsobre 0 ponto&0 tem o seguinte formato: series expansion of y(t) about the point t = 0 has the following form: t2 B 2 3 Y(tF pOr ty 0} 5 Or 3 OF. . W(t) = WO) +1) + YO + 3i¥ tee e B PoP, =1+ —+ —wW(p.... =14+—4+-W'(0)+---. 77 3 yO 5 3 (0) Também, Also, [ ]n Q 2 3a) n By) 2 Sy") nite 1+—+ ——~+..- , (6.3.15) th=]14+—+——S4+:-::]. 6.3.15 2n 31m Sul) 2n —3!n3/? ( Aplique o Teorema 5.3.3 com 1 +an/nigual a expressdo entre colchetes em (6.3.15) ec Apply Theorem 5.3.3 with 1 + a,,/n equal to the expression inside brackets in (6.3.15) n=n. Desde and c, =n. Since [ ] lima 2 Bv® _® lim Pew" _v now 2 3Im2 2 noo} 2 3Int/2 2° segue que (3 it follows that 1 . 1 jimao’ (t-experiéncia 5 a2 . (6.3.16) im 6, (t) = exp(5) : (6.3.16) Como o lado direito da Eq. (6.3.16) € o maf da distribuigdo normal padrdo, segue Since the right side of Eq. 
(6.3.16) is the m.g.f. of the standard normal distribution, do Teorema 6.3.5 que a distribuigdo assintdtica deZndeve ser a distribui¢do it follows from Theorem 6.3.5 that the asymptotic distribution of Z, must be the normal padrdo. standard normal distribution. Um esboco da prova do teorema do limite central de Liapounov também pode ser dado An outline of the proof of the central limit theorem of Liapounov can also be procedendo de forma semelhante, mas ndo consideraremos este problema mais detalhadamente given by proceeding along similar lines, but we shall not consider this problem further aqui. here. Resumo Summary Duas versées do teorema do limite central foram fornecidas. Concluem que a distribuicdo da Two versions of the central limit theorem were given. They conclude that the distri- média de um grande numero de varidveis aleatérias independentes esta préxima de uma bution of the average of a large number of independent random variables is close distribuigdo normal. Um teorema exige que todas as variaveis aleatérias tenham a mesma to a normal distribution. One theorem requires that the random variables all have distribuigdo com variancia finita. O outro teorema nao exige que as varidveis aleatérias sejam the same distribution with finite variance. The other theorem does not require that distribuidas de forma idéntica, mas sim que os seus terceiros momentos existam e satisfagam a the random variables be identically distributed, but instead requires that their third condicao (6.3.9). O método delta nos permite encontrar a distribuicdo aproximada de uma moments exist and satisfy condition (6.3.9). The delta method lets us find the approx- fungdo suave de uma média amostral. imate distribution of a smooth function of a sample average. 370 Capitulo 6 Grandes Amostras Aleatérias 370 Chapter 6 Large Random Samples Exercicios Exercises 1.A cada minuto, uma maquina produz um comprimento de corda 9.Um fisico faz 25 medicdes independentes da gravidade 1. 
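The limit in Eq. (6.3.16) can be checked numerically for one concrete case. The sketch below assumes, purely for illustration, that each $Y_i$ takes the values $-1$ and $+1$ with probability 1/2 each, so that it has mean 0, variance 1, and m.g.f. $\psi(t) = \cosh t$. This distribution and the function names are not taken from the text.

```python
import math


def psi(t):
    """m.g.f. of Y with Pr(Y = -1) = Pr(Y = 1) = 1/2 (assumed example; mean 0, variance 1)."""
    return math.cosh(t)


def zeta(n, t):
    """m.g.f. of Z_n = n**(-1/2) * (Y_1 + ... + Y_n), namely [psi(t / n**(1/2))]**n."""
    return psi(t / math.sqrt(n)) ** n


def standard_normal_mgf(t):
    """m.g.f. of the standard normal distribution, exp(t**2 / 2)."""
    return math.exp(0.5 * t * t)


if __name__ == "__main__":
    t = 1.0
    for n in (1, 10, 100, 1000, 10000):
        gap = standard_normal_mgf(t) - zeta(n, t)
        print(f"n = {n:5d}  zeta_n(t) = {zeta(n, t):.6f}  gap to exp(t^2/2) = {gap:.2e}")
```

For this particular choice the gap shrinks roughly like $1/n$, which matches the size of the terms dropped from the bracket in (6.3.15). Other distributions with an m.g.f. can be tried by replacing `psi`.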
Exercises

1. Each minute a machine produces a length of rope with mean of 4 feet and standard deviation of 5 inches. Assuming that the amounts produced in different minutes are independent and identically distributed, approximate the probability that the machine will produce at least 250 feet in one hour.

2. Suppose that 75 percent of the people in a certain metropolitan area live in the city and 25 percent of the people live in the suburbs. If 1200 people attending a certain concert represent a random sample from the metropolitan area, what is the probability that the number of people from the suburbs attending the concert will be fewer than 270?

3. Suppose that the distribution of the number of defects on any given bolt of cloth is the Poisson distribution with mean 5, and the number of defects on each bolt is counted for a random sample of 125 bolts. Determine the probability that the average number of defects per bolt in the sample will be less than 5.5.

4. Suppose that a random sample of size $n$ is to be taken from a distribution for which the mean is $\mu$ and the standard deviation is 3. Use the central limit theorem to determine approximately the smallest value of $n$ for which the following relation will be satisfied:

$$\Pr(|\overline{X}_n - \mu| < 0.3) \ge 0.95.$$

5. Suppose that the proportion of defective items in a large manufactured lot is 0.1. What is the smallest random sample of items that must be taken from the lot in order for the probability to be at least 0.99 that the proportion of defective items in the sample will be less than 0.13?

6. Suppose that three girls A, B, and C throw snowballs at a target. Suppose also that girl A throws 10 times, and the probability that she will hit the target on any given throw is 0.3; girl B throws 15 times, and the probability that she will hit the target on any given throw is 0.2; and girl C throws 20 times, and the probability that she will hit the target on any given throw is 0.1. Determine the probability that the target will be hit at least 12 times.

7. Suppose that 16 digits are chosen at random with replacement from the set $\{0, \ldots, 9\}$. What is the probability that their average will lie between 4 and 6?

8. Suppose that people attending a party pour drinks from a bottle containing 63 ounces of a certain liquid. Suppose also that the expected size of each drink is 2 ounces, that the standard deviation of each drink is 1/2 ounce, and that all drinks are poured independently. Determine the probability that the bottle will not be empty after 36 drinks have been poured.

9. A physicist makes 25 independent measurements of the specific gravity of a certain body. He knows that the limitations of his equipment are such that the standard deviation of each measurement is $\sigma$ units.
a. By using the Chebyshev inequality, find a lower bound for the probability that the average of his measurements will differ from the actual specific gravity of the body by less than $\sigma/4$ units.
b. By using the central limit theorem, find an approximate value for the probability in part (a).

10. A random sample of $n$ items is to be taken from a distribution with mean $\mu$ and standard deviation $\sigma$.
a. Use the Chebyshev inequality to determine the smallest number of items $n$ that must be taken in order to satisfy the following relation:

$$\Pr\left(|\overline{X}_n - \mu| \le \frac{\sigma}{4}\right) \ge 0.99.$$

b. Use the central limit theorem to determine the smallest number of items $n$ that must be taken in order to satisfy the relation in part (a) approximately.

11. Suppose that, on the average, 1/3 of the graduating seniors at a certain college have two parents attend the graduation ceremony, another third of these seniors have one parent attend the ceremony, and the remaining third of these seniors have no parents attend. If there are 600 graduating seniors in a particular class, what is the probability that not more than 650 parents will attend the graduation ceremony?

12. Let $X_n$ be a random variable having the binomial distribution with parameters $n$ and $p_n$. Assume that $\lim_{n \to \infty} n p_n = \lambda$. Prove that the m.g.f. of $X_n$ converges to the m.g.f. of the Poisson distribution with mean $\lambda$.

13. Suppose that $X_1, \ldots, X_n$ form a random sample from a normal distribution with unknown mean $\theta$ and variance $\sigma^2$. Assuming that $\theta \ne 0$, determine the asymptotic distribution of $\overline{X}_n^3$.

14. Suppose that $X_1, \ldots, X_n$ form a random sample from a normal distribution with mean 0 and unknown variance $\sigma^2$.
a. Determine the asymptotic distribution of the statistic $\left(\frac{1}{n}\sum_{i=1}^{n} X_i^2\right)^{-1}$.
b. Find a variance stabilizing transformation for the statistic $\frac{1}{n}\sum_{i=1}^{n} X_i^2$.

15. Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables each having the uniform distribution on the interval $[0, \theta]$ for some real number $\theta > 0$. For each $n$, define $Y_n$ to be the maximum of $X_1, \ldots, X_n$.
a. Show that the c.d.f. of $Y_n$ is

$$F_n(y) = \begin{cases} 0 & \text{if } y \le 0, \\ (y/\theta)^n & \text{if } 0 < y < \theta, \\ 1 & \text{if } y \ge \theta. \end{cases}$$

Hint: Read Example 3.9.6.
b. Show that $Z_n = n(Y_n - \theta)$ converges in distribution to the distribution with c.d.f.

$$F^*(z) = \begin{cases} \exp(z/\theta) & \text{if } z < 0, \\ 1 & \text{if } z \ge 0. \end{cases}$$

Hint: Apply Theorem 5.3.3 after finding the c.d.f. of $Z_n$.
c. Use Theorem 6.3.2 to find the approximate distribution of $Y_n^2$ when $n$ is large.

6.4 The Correction for Continuity

Some applications of the central limit theorem allow us to approximate the probability that a discrete random variable $X$ lies in an interval $[a, b]$ by the probability that a normal random variable lies in that interval. The approximation can be improved slightly by being careful about how we approximate $\Pr(X = a)$ and $\Pr(X = b)$.

Approximating a Discrete Distribution by a Continuous Distribution

Example 6.4.1 A Large Sample. In Example 6.3.1, we illustrated how the normal distribution with mean 50 and variance 25 could approximate the distribution of a random variable $X$ that has the binomial distribution with parameters 100 and 0.5.
In particular, if $Y$ has the normal distribution with mean 50 and variance 25, we know that $\Pr(Y \le x)$ is close to $\Pr(X \le x)$ for all $x$. But the approximation has some systematic errors. Figure 6.4 shows the two c.d.f.'s over the range $30 \le x \le 70$. The two c.d.f.'s are very close at $x = n + 0.5$ for each integer $n$. But for each integer $n$, $\Pr(Y \le x) < \Pr(X \le x)$ for $x$ a little above $n$ and $\Pr(Y \le x) > \Pr(X \le x)$ for $x$ a little below $n$. We ought to be able to make use of these systematic discrepancies in order to improve the approximation.

[Figure 6.4: Comparison of binomial and normal c.d.f.'s. Curves: Binomial, Normal; horizontal axis $x$ from 30 to 70, vertical axis from 0 to 1.0.]

Suppose that $X$ has a discrete distribution that can be approximated by a normal distribution, such as in Example 6.4.1. In this section, we shall describe a standard method for improving the quality of such an approximation based on the systematic discrepancies that were noted at the end of Example 6.4.1.

Let $f(x)$ be the p.f. of the discrete random variable $X$, and suppose that we wish to approximate the distribution of $X$ by a continuous distribution with p.d.f. $g(x)$. To aid the discussion, let $Y$ be a random variable with p.d.f. $g$. Also, for simplicity, we shall assume that all of the possible values of $X$ are integers. This condition is satisfied for the binomial, hypergeometric, Poisson, and negative binomial distributions described in this text.

If the distribution of $Y$ provides a good approximation to the distribution of $X$, then for all integers $a$ and $b$, we can approximate the discrete probability

$$\Pr(a \le X \le b) = \sum_{x=a}^{b} f(x) \tag{6.4.1}$$

by the continuous probability

$$\Pr(a \le Y \le b) = \int_{a}^{b} g(x)\,dx. \tag{6.4.2}$$

Indeed, this approximation was used in Examples 6.3.2 and 6.3.9, where $g(x)$ was the appropriate normal p.d.f. derived from the central limit theorem.
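As a rough numerical check of how (6.4.1) and (6.4.2) compare, the following sketch evaluates both for the binomial distribution with parameters 100 and 0.5 from Example 6.4.1. The endpoints $a = 45$ and $b = 55$ and all function names are illustrative choices, not taken from the text.

```python
import math


def binomial_pf(x, n, p):
    """p.f. f(x) of the binomial distribution with parameters n and p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def std_normal_cdf(z):
    """Standard normal c.d.f. Phi(z), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def discrete_prob(a, b, n, p):
    """Eq. (6.4.1): Pr(a <= X <= b) as the sum of f(x) over x = a, ..., b."""
    return sum(binomial_pf(x, n, p) for x in range(a, b + 1))


def continuous_prob(a, b, mean, sd):
    """Eq. (6.4.2): Pr(a <= Y <= b), the integral of the normal p.d.f. g from a to b."""
    return std_normal_cdf((b - mean) / sd) - std_normal_cdf((a - mean) / sd)


if __name__ == "__main__":
    n, p = 100, 0.5          # parameters from Example 6.4.1
    a, b = 45, 55            # illustrative endpoints (an assumption)
    mean, sd = n * p, math.sqrt(n * p * (1 - p))
    print("sum of f(x), Eq. (6.4.1):     ", round(discrete_prob(a, b, n, p), 4))
    print("integral of g(x), Eq. (6.4.2):", round(continuous_prob(a, b, mean, sd), 4))
```

With these endpoints the integral comes out noticeably smaller than the sum, because integrating from $a$ to $b$ covers only half of the probability attached to each endpoint.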
This simple approximation has the following shortcoming: Although $\Pr(X \ge a)$ and $\Pr(X > a)$ will typically have different values for the discrete distribution of $X$, $\Pr(Y \ge a) = \Pr(Y > a)$ because $Y$ has a continuous distribution. Another way of expressing this shortcoming is as follows: Although $\Pr(X = x) > 0$ for each integer $x$ that is a possible value of $X$, $\Pr(Y = x) = 0$ for all $x$.

Approximating a Bar Chart

The p.f. $f(x)$ of a discrete random variable $X$ can be represented by a bar chart, as sketched in Fig. 6.5. For each integer $x$, the probability of $\{X = x\}$ is represented by the area of a rectangle with a base that extends from $x - \frac{1}{2}$ to $x + \frac{1}{2}$ and with a height $f(x)$. Thus, the area of the rectangle for which the center of the base is at the integer $x$ is simply $f(x)$. An approximating p.d.f. $g(x)$ is also sketched in Fig. 6.5. A bar chart with areas of bars proportional to probabilities is analogous to a histogram (see page 165) with areas of bars proportional to proportions of a sample.
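Both points above can be seen in numbers. The sketch below again assumes the binomial distribution with parameters 100 and 0.5 and the normal distribution with mean 50 and standard deviation 5. It compares $\Pr(X \ge 50)$ with $\Pr(X > 50)$, and the bar area $f(50)$ with the area under $g$ over the base of that bar. The point $x = 50$ and the helper names are illustrative assumptions.

```python
import math

N, P = 100, 0.5          # binomial parameters (Example 6.4.1)
MEAN, SD = 50.0, 5.0     # matching normal mean and standard deviation


def binomial_pf(x):
    """Height f(x) of the bar centered at the integer x (bar width is 1)."""
    return math.comb(N, x) * P**x * (1 - P) ** (N - x)


def std_normal_cdf(z):
    """Standard normal c.d.f. Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def upper_tail(a, strict):
    """Pr(X > a) if strict is True, otherwise Pr(X >= a)."""
    start = a + 1 if strict else a
    return sum(binomial_pf(x) for x in range(start, N + 1))


def area_under_g(lo, hi):
    """Area under the normal p.d.f. g between lo and hi."""
    return std_normal_cdf((hi - MEAN) / SD) - std_normal_cdf((lo - MEAN) / SD)


if __name__ == "__main__":
    x = 50
    print("Pr(X >= 50):", round(upper_tail(x, strict=False), 4))
    print("Pr(X >  50):", round(upper_tail(x, strict=True), 4))
    print("Pr(Y >= 50) = Pr(Y > 50):", round(1.0 - std_normal_cdf(0.0), 4))
    print("bar area f(50):", round(binomial_pf(x), 4))
    print("area under g over [49.5, 50.5]:", round(area_under_g(x - 0.5, x + 0.5), 4))
```

The two discrete tail probabilities differ by exactly the bar area $f(50)$, while the continuous tail probability cannot distinguish them. The area under $g$ over the base of the bar is close to $f(50)$.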
[Figure 6.5: Approximating a bar chart by using a p.d.f. The curve is $g(x)$; the horizontal axis marks $a - \frac{1}{2}$, $a + \frac{1}{2}$, $x - \frac{1}{2}$, $x + \frac{1}{2}$, $b - \frac{1}{2}$, $b + \frac{1}{2}$.]

[Figure 6.6: Comparison of binomial c.d.f. with normal c.d.f. shifted to the right and to the left by 0.5. Curves: Binomial, Normal $(x + 0.5)$, Normal $(x - 0.5)$; horizontal axis $x$ from 30 to 70, vertical axis from 0 to 1.0.]

From this point of view, it can be seen that $\Pr(a \le X \le b)$, as specified in Eq. (6.4.1), is the sum of the areas of the rectangles in Fig. 6.5 that are centered at $a, a + 1, \ldots, b$. It can also be seen from Fig. 6.5 that the sum of these areas is approximated by the integral

$$\Pr\left(a - \frac{1}{2} < Y < b + \frac{1}{2}\right) = \int_{a - (1/2)}^{b + (1/2)} g(x)\,dx. \tag{6.4.3}$$

The adjustment from the integral in (6.4.2) to the integral in (6.4.3) is called the correction for continuity.
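The effect of moving from (6.4.2) to (6.4.3) can be sketched as follows when $g$ is a normal p.d.f. The function `normal_interval_prob` and its `correction` flag are hypothetical names introduced here. The test case, a binomial distribution with parameters 100 and 0.5 and the interval from 45 to 55, is an illustrative assumption.

```python
import math


def std_normal_cdf(z):
    """Standard normal c.d.f. Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_interval_prob(a, b, mean, sd, correction=True):
    """Normal approximation to Pr(a <= X <= b) for an integer-valued X.

    With correction=True the integral runs from a - 1/2 to b + 1/2, as in
    Eq. (6.4.3); with correction=False it runs from a to b, as in Eq. (6.4.2).
    """
    shift = 0.5 if correction else 0.0
    upper = std_normal_cdf((b + shift - mean) / sd)
    lower = std_normal_cdf((a - shift - mean) / sd)
    return upper - lower


def binomial_interval_prob(a, b, n, p):
    """Exact Pr(a <= X <= b) for X binomial(n, p), used only as a reference value."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(a, b + 1))


if __name__ == "__main__":
    n, p, a, b = 100, 0.5, 45, 55          # illustrative case (an assumption)
    mean, sd = n * p, math.sqrt(n * p * (1 - p))
    exact = binomial_interval_prob(a, b, n, p)
    for flag in (False, True):
        approx = normal_interval_prob(a, b, mean, sd, correction=flag)
        print(f"correction={flag}: approx = {approx:.4f}, error = {abs(approx - exact):.4f}")
```

In this case the corrected integral lands much closer to the exact sum than the uncorrected one. How large the improvement is in other cases depends on the distribution and on the interval.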
Example 6.4.2 A Large Sample. At the end of Example 6.4.1, we found that when $x$ was a little above an integer, the approximating probability $\Pr(Y \le x)$ is a bit smaller than the actual probability $\Pr(X \le x)$. The correction for continuity shifts the c.d.f. of $Y$ to the left by 0.5 when we want to compute $\Pr(Y \le x)$ for $x$ a little above an integer. This shift replaces $\Pr(Y \le x)$ by $\Pr(Y \le x + 0.5)$, which is larger and usually closer to $\Pr(X \le x)$. Similarly, when we want to compute $\Pr(Y \le x)$ when $x$ is a little below an integer, the correction for continuity shifts the c.d.f. of $Y$ to the right by 0.5, which replaces $\Pr(Y \le x)$ by $\Pr(Y \le x - 0.5)$. Figure 6.6 illustrates both of these shifts and shows how they each approximate the actual binomial c.d.f. better than the unshifted normal c.d.f. in Fig. 6.4.

If we use the correction for continuity, we find that the probability $f(a)$ of the single integer $a$ can be approximated as follows:

$$\Pr(X = a) = \Pr\left(a - \frac{1}{2} \le X \le a + \frac{1}{2}\right) \approx \int_{a - (1/2)}^{a + (1/2)} g(x)\,dx. \tag{6.4.4}$$

Similarly,

$$\Pr(X > a) = \Pr(X \ge a + 1) = \Pr\left(X \ge a + \frac{1}{2}\right) \approx \int_{a + (1/2)}^{\infty} g(x)\,dx. \tag{6.4.5}$$

Example 6.4.3 Examination Questions. To illustrate the use of the correction for continuity, we shall again consider Example 6.3.9. In that example, an examination contains 99 questions of varying difficulty and it is desired to determine $\Pr(X \ge 60)$, where $X$ denotes the total number of questions that a particular student answers correctly. Then, under the conditions of the example, it is found from the central limit theorem that the discrete distribution of $X$ could be approximated by the normal distribution with mean 49.5 and standard deviation 4.08. Let $Z = (X - 49.5)/4.08$.

If we use the correction for continuity, we obtain

$$\Pr(X \ge 60) = \Pr(X \ge 59.5) = \Pr\left(Z \ge \frac{59.5 - 49.5}{4.08}\right) \approx 1 - \Phi(2.4510) = 0.007.$$

This value is somewhat larger than the value 0.005, which was obtained in Sec. 6.3, without the correction.

Example 6.4.4 Coin Tossing. Suppose that a fair coin is tossed 20 times and that all tosses are independent. What is the probability of obtaining exactly 10 heads?

Let $X$ denote the total number of heads obtained in the 20 tosses. According to the central limit theorem, the distribution of $X$ will be approximately the normal distribution with mean 10 and standard deviation $[(20)(1/2)(1/2)]^{1/2} = 2.236$. If we use the correction for continuity,

$$\Pr(X = 10) = \Pr(9.5 \le X \le 10.5) = \Pr\left(-\frac{0.5}{2.236} \le Z \le \frac{0.5}{2.236}\right) \approx \Phi(0.2236) - \Phi(-0.2236) = 0.177.$$

The exact value of $\Pr(X = 10)$ found from the table of binomial probabilities given at the back of this book is 0.1762. Thus, the normal approximation with the correction for continuity is quite good.

Summary

Let $X$ be a random variable that takes only integer values. Suppose that $X$ has approximately the normal distribution with mean $\mu$ and variance $\sigma^2$. Let $a$ and $b$ be integers, and suppose that we wish to approximate $\Pr(a \le X \le b)$. The correction to the normal distribution approximation for continuity is to use $\Phi([b + 1/2 - \mu]/\sigma) - \Phi([a - 1/2 - \mu]/\sigma)$ rather than $\Phi([b - \mu]/\sigma) - \Phi([a - \mu]/\sigma)$ as the approximation.

Exercises

1. Let $X_1, \ldots, X_{30}$ be independent random variables each having a discrete distribution with p.f.

$$f(x) = \begin{cases} 1/4 & \text{if } x = 0 \text{ or } 2, \\ 1/2 & \text{if } x = 1, \\ 0 & \text{otherwise.} \end{cases}$$

Use the central limit theorem and the correction for continuity to approximate the probability that $X_1 + \cdots + X_{30}$ is at most 33.

2. Let $X$ denote the total number of successes in 15 Bernoulli trials, with probability of success $p = 0.3$ on each trial.
a. Determine approximately the value of $\Pr(X = 4)$ by using the central limit theorem with the correction for continuity.
b. Compare the answer obtained in part (a) with the exact value of this probability.

3. Using the correction for continuity, determine the probability required in Example 6.3.2.

4. Using the correction for continuity, determine the probability required in Exercise 2 of Sec. 6.3.

5. Using the correction for continuity, determine the probability required in Exercise 3 of Sec. 6.3.
6. Using the correction for continuity, determine the probability required in Exercise 6 of Sec. 6.3.

7. Using the correction for continuity, determine the probability required in Exercise 7 of Sec. 6.3.

6.5 Supplementary Exercises

1. Suppose that a pair of balanced dice are rolled 120 times, and let $X$ denote the number of rolls on which the sum of the two numbers is 7. Use the central limit theorem to determine a value of $k$ such that $\Pr(|X - 20| \le k)$ is approximately 0.95.

2. Suppose that $X$ has a Poisson distribution with a very large mean $\lambda$. Explain why the distribution of $X$ can be approximated by the normal distribution with mean $\lambda$ and variance $\lambda$. In other words, explain why $(X - \lambda)/\lambda^{1/2}$ converges in distribution, as $\lambda \to \infty$, to a random variable having the standard normal distribution.

3. Suppose that $X$ has the Poisson distribution with mean 10. Use the central limit theorem, both without and with the correction for continuity, to determine an approximate value for $\Pr(8 \le X \le 12)$. Use the table of Poisson probabilities given in the back of this book to assess the quality of these approximations.

4. Suppose that $X$ is a random variable such that $E(X^k)$ exists and $\Pr(X \ge 0) = 1$. Prove that for $k > 0$ and $t > 0$,

$$\Pr(X \ge t) \le \frac{E(X^k)}{t^k}.$$

5. Suppose that $X_1, \ldots, X_n$ form a random sample from the Bernoulli distribution with parameter $p$. Let $\bar{X}_n$ be the sample average. Find a variance stabilizing transformation for $\bar{X}_n$. Hint: When trying to find the integral of $(p[1 - p])^{-1/2}$, make the substitution $z = \sqrt{p}$ and then think about arcsin, the inverse of the sin function.

6. Suppose that $X_1, \ldots, X_n$ form a random sample from the exponential distribution with mean $\theta$. Let $\bar{X}_n$ be the sample average. Find a variance stabilizing transformation for $\bar{X}_n$.

7. Suppose that $X_1, X_2, \ldots$ is a sequence of positive integer-valued random variables. Suppose that there is a function $f$ such that for every $m = 1, 2, \ldots$, $\lim_{n \to \infty} \Pr(X_n = m) = f(m)$, $\sum_{m=1}^{\infty} f(m) = 1$, and $f(x) = 0$ for every $x$ that is not a positive integer. Let $F$ be the discrete c.d.f. whose p.f. is $f$. Prove that $X_n$ converges in distribution to $F$.

8. Let $\{p_n\}_{n=1}^{\infty}$ be a sequence of numbers such that $0 < p_n < 1$ for all $n$. Assume that $\lim_{n \to \infty} p_n = p$ with $0 < p < 1$. Let $X_n$ have the binomial distribution with parameters $k$ and $p_n$ for some positive integer $k$. Prove that $X_n$ converges in distribution to the binomial distribution with parameters $k$ and $p$.

9. Suppose that the number of minutes required to serve a customer at the checkout counter of a supermarket has an exponential distribution for which the mean is 3. Using the central limit theorem, approximate the probability that the total time required to serve a random sample of 16 customers will exceed one hour.

10. Suppose that we model the occurrence of defects on a fabric manufacturing line as a Poisson process with rate 0.01 per square foot. Use the central limit theorem (both with and without the correction for continuity) to approximate the probability that one would find at least 15 defects in 2000 square feet of fabric.

11. Let $X$ have the gamma distribution with parameters $n$ and 3, where $n$ is a large integer.
a. Explain why one can use the central limit theorem to approximate the distribution of $X$ by a normal distribution.
b. Which normal distribution approximates the distribution of $X$?

12. Let $X$ have the negative binomial distribution with parameters $n$ and 0.2, where $n$ is a large integer.
a. Explain why one can use the central limit theorem to approximate the distribution of $X$ by a normal distribution.
b. Which normal distribution approximates the distribution of $X$?
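The continuity correction summarized in Section 6.4 is easy to check numerically. The following is a minimal Python sketch, using only the standard library, that redoes the coin-tossing calculation from that section (20 fair tosses, $\Pr(X = 10)$). The function names `normal_cdf` and `approx_prob` are illustrative and do not come from the text.

```python
import math

def normal_cdf(z):
    """Standard normal c.d.f. Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def approx_prob(a, b, mu, sigma, corrected=True):
    """Normal approximation to Pr(a <= X <= b) for an integer-valued X."""
    shift = 0.5 if corrected else 0.0
    upper = (b + shift - mu) / sigma
    lower = (a - shift - mu) / sigma
    return normal_cdf(upper) - normal_cdf(lower)

# Binomial setting of the coin-tossing example: n = 20 tosses, p = 1/2.
n, p = 20, 0.5
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

approx = approx_prob(10, 10, mu, sigma)               # with the correction
exact = math.comb(n, 10) * p**10 * (1 - p)**(n - 10)  # exact binomial value

print(f"sigma  = {sigma:.3f}")
print(f"approx = {approx:.4f}")
print(f"exact  = {exact:.4f}")
```

Calling `approx_prob(10, 10, mu, sigma, corrected=False)` returns 0, because the uncorrected formula assigns no probability to a single point. This is the case in which the correction matters most.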
Chapter 7 Estimation

7.1 Statistical Inference
7.2 Prior and Posterior Distributions
7.3 Conjugate Prior Distributions
7.4 Bayes Estimators
7.5 Maximum Likelihood Estimators
7.6 Properties of Maximum Likelihood Estimators
7.7 Sufficient Statistics
7.8 Jointly Sufficient Statistics
7.9 Improving an Estimator
7.10 Supplementary Exercises

7.1 Statistical Inference

Recall our various clinical trial examples. What would we say is the probability that a future patient will respond successfully to treatment after we observe the results from a collection of other patients? This is the kind of question that statistical inference is designed to address. In general, statistical inference consists of making probabilistic statements about unknown quantities. For example, we can compute means, variances, quantiles, probabilities, and some other quantities yet to be introduced concerning unobserved random variables and unknown parameters of distributions. Our goal will be to say what we have learned about the unknown quantities after observing some data that we believe contain relevant information. Here are some other examples of questions that statistical inference can try to answer. What can we say about whether a machine is functioning properly after we observe some of its output? In a civil lawsuit, what can we say about whether there was discrimination after observing how different ethnic groups were treated? The methods of statistical inference, which we shall develop to address these questions, are built upon the theory of probability covered in the earlier chapters of this text.

Probability and Statistical Models

In the earlier chapters of this book, we discussed the theory and methods of probability. As new concepts in probability were introduced, we also introduced examples of the use of these concepts in problems that we shall now recognize as statistical inference.
Before discussing statistical inference formally, it is useful to remind ourselves of those probability concepts that will underlie inference.

Example 7.1.1 Lifetimes of Electronic Components. A company sells electronic components and they are interested in knowing as much as they can about how long each component is likely to last. They can collect data on components that have been used under typical conditions. They choose to use the family of exponential distributions to model the length of time (in years) from when a component is put into service until it fails. They would like to model the components as all having the same failure rate $\theta$, but there is uncertainty about the specific numerical value of $\theta$. To be more precise, let $X_1, X_2, \ldots$ stand for a sequence of component lifetimes in years. The company believes that if they knew the failure rate $\theta$, then $X_1, X_2, \ldots$ would be i.i.d. random variables having the exponential distribution with parameter $\theta$. (See Sec. 5.7 for the definition of exponential distributions. We are using the symbol $\theta$ for the parameter of our exponential distributions rather than $\beta$ to match the rest of the notation in this chapter.) Suppose that the data that the company will observe consist of the values of $X_1, \ldots, X_m$ but that they are still interested in $X_{m+1}, X_{m+2}, \ldots$. They are also interested in $\theta$ because it is related to the average lifetime. As we saw in Eq. (5.7.17), the mean of an exponential random variable with parameter $\theta$ is $1/\theta$, which is why the company thinks of $\theta$ as the failure rate.

We imagine an experiment whose outcomes are sequences of lifetimes as described above. As mentioned already, if we knew the value $\theta$, then $X_1, X_2, \ldots$ would be i.i.d. random variables. In this case, the law of large numbers (Theorem 6.2.4) says that the average $\frac{1}{n}\sum_{i=1}^{n} X_i$ converges in probability to the mean $1/\theta$.
And Theorem 6.2.5 says that $n/\sum_{i=1}^{n} X_i$ converges in probability to $\theta$. Because $\theta$ is a function of the sequence of lifetimes that constitute each experimental outcome, it can be treated as a random variable. Suppose that, before observing the data, the company believes that the failure rate is probably around 0.5/year but there is quite a bit of uncertainty about it. They model $\theta$ as a random variable having the gamma distribution with parameters 1 and 2. To rephrase what was stated earlier, they also model $X_1, X_2, \ldots$ as conditionally i.i.d. exponential random variables with parameter $\theta$ given $\theta$. They hope to learn more about $\theta$ from examining the sample data $X_1, \ldots, X_m$. They can never learn $\theta$ precisely, because that would require observing the entire infinite sequence $X_1, X_2, \ldots$. For this reason, $\theta$ is only hypothetically observable.

Example 7.1.1 illustrates several features that will be common to most statistical inference problems and which constitute what we call a statistical model.

Definition 7.1.1 Statistical Model. A statistical model consists of an identification of random variables of interest (both observable and only hypothetically observable), a specification of a joint distribution or a family of possible joint distributions for the observable random variables, the identification of any parameters of those distributions that are assumed unknown and possibly hypothetically observable, and (if desired) a specification for a (joint) distribution for the unknown parameter(s). When we treat the unknown parameter(s) $\theta$ as random, then the joint distribution of the observable random variables indexed by $\theta$ is understood as the conditional distribution of the observable random variables given $\theta$.

In Example 7.1.1, the observable random variables of interest form the sequence $X_1, X_2, \ldots$, while the failure rate $\theta$ is hypothetically observable. The family of possible joint distributions of $X_1, X_2, \ldots$ is indexed by the parameter $\theta$. The joint distribution of the observables corresponding to the value $\theta$ is that $X_1, X_2, \ldots$ are i.i.d. random variables each having the exponential distribution with parameter $\theta$. This is also the conditional distribution of $X_1, X_2, \ldots$ given $\theta$ because we are treating $\theta$ as a random variable. The distribution of $\theta$ is the gamma distribution with parameters 1 and 2.

Note: Redefining Old Ideas. The reader will notice that a statistical model is nothing more than a formal identification of many features that we have been using in various examples throughout the earlier chapters of this book. Some examples need only a few of the features that make up a complete specification of a statistical model, while other examples use the complete specification. In Sections 7.1-7.4, we shall introduce a considerable amount of terminology, most of which is mere formalization of concepts that have been introduced and used in several places earlier in the book. The purpose of all of this formalism is to help us to keep the concepts organized so that we can tell when we are applying the same ideas in new ways and when we are introducing new ideas.

We are now ready formally to introduce statistical inference.

Definition 7.1.2 Statistical Inference. A statistical inference is a procedure that produces a probabilistic statement about some or all parts of a statistical model.

By a "probabilistic statement" we mean a statement that makes use of any of the concepts of probability theory that were discussed earlier in the text or are yet to be discussed later in the text. Some examples include a mean, a conditional mean, a quantile, a variance, a conditional distribution for a random variable given another, the probability of an event, a conditional probability of an event given something, and so on.
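Before turning to specific inferences, it may help to write the statistical model of Example 7.1.1 in symbols. The display below is a sketch rather than part of the original example. It assumes the rate parameterization of the gamma and exponential distributions, under which the gamma distribution with parameters 1 and 2 has mean 1/2, consistent with the company's belief of about 0.5 failures per year. The symbols $\xi$ for the p.d.f. of $\theta$ and $f$ for the conditional p.d.f. of the data are notational assumptions.

```latex
\begin{align*}
\xi(\theta) &= 2e^{-2\theta}, \qquad \theta > 0, \\
f(x_1, \ldots, x_m \mid \theta) &= \prod_{i=1}^{m} \theta e^{-\theta x_i}
  = \theta^{m} \exp\Bigl(-\theta \sum_{i=1}^{m} x_i\Bigr),
  \qquad x_1, \ldots, x_m > 0.
\end{align*}
```

The first line is the distribution of the hypothetically observable parameter. The second is the conditional joint distribution of the first $m$ observables given $\theta$, which is how Definition 7.1.1 asks us to read a family of distributions indexed by a random parameter.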
In Example 7.1.1, here are some examples of statistical inferences that one might wish to make:

• Produce a random variable $Y$ (a function of $X_1, \ldots, X_m$) such that $\Pr(Y \ge \theta \mid \theta) = 0.9$.
• Produce a random variable $Y$ that we expect to be close to $\theta$.
• Compute how likely it is that the average of the next 10 lifetimes, $\frac{1}{10}\sum_{i=m+1}^{m+10} X_i$, is at least 2.
• Say something about how confident we are that $\theta \le 0.4$ after observing $X_1, \ldots, X_m$.

All of these types of inference and others will be discussed in more detail later in this book.

In Definition 7.1.1, we distinguished between observable and hypothetically observable random variables. We reserved the name observable for a random variable that we are essentially certain that we could observe if we devoted the necessary effort to observe it. The name hypothetically observable was used for a random variable that would require infinite resources to observe, such as the limit (as $n \to \infty$) of the sample averages of the first $n$ observables.
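To make the idea of a hypothetically observable parameter concrete, the following Python sketch simulates the model of Example 7.1.1. It draws one value of $\theta$ from the gamma distribution with parameters 1 and 2 (taken here as shape 1 and rate 2, an assumption about the parameterization), generates lifetimes that are conditionally i.i.d. exponential given $\theta$, and compares $m/\sum_{i=1}^{m} X_i$ with $\theta$ for growing $m$. The seed, the sample sizes, and the name `rate_estimate` are illustrative choices, not from the text.

```python
import random

rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability

# random.gammavariate(alpha, beta) treats beta as a SCALE parameter, so a
# gamma distribution with shape 1 and rate 2 corresponds to scale 1/2.
theta = rng.gammavariate(1.0, 0.5)

# random.expovariate takes the rate (1 / mean), which is theta here.
lifetimes = [rng.expovariate(theta) for _ in range(100_000)]

def rate_estimate(xs):
    """Return m / (x_1 + ... + x_m), the reciprocal of the sample average."""
    return len(xs) / sum(xs)

print(f"simulated theta = {theta:.4f}")
for m in (10, 1_000, 100_000):
    print(f"m = {m:>6}: m / sum(X) = {rate_estimate(lifetimes[:m]):.4f}")
```

As $m$ grows, the printed estimate should settle near the simulated $\theta$. In a real problem only the lifetimes are seen, never $\theta$ itself.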
In this text, such hypothetically observable random variables will correspond to the parameters of the joint distribution of the observables as in Example 7.1.1. Because these parameters figure so prominently in many of the types of inference problems that we will see, it pays to formalize the concept of parameter.

Definition 7.1.3 Parameter/Parameter space. In a problem of statistical inference, a characteristic or combination of characteristics that determine the joint distribution for the random variables of interest is called a parameter of the distribution. The set $\Omega$ of all possible values of a parameter $\theta$ or of a vector of parameters $(\theta_1, \ldots, \theta_k)$ is called the parameter space.

All of the families of distributions introduced earlier (and to be introduced later) in this book have parameters that are included in the names of the individual members of the family. For example, the family of binomial distributions has parameters that we called $n$ and $p$, the family of normal distributions is parameterized by the mean $\mu$ and variance $\sigma^2$ of each distribution, the family of uniform distributions on intervals is parameterized by the endpoints of the intervals, the family of exponential distributions is parameterized by the rate parameter $\theta$, and so on.

In Example 7.1.1, the parameter $\theta$ (the failure rate) must be positive. Therefore, unless certain positive values of $\theta$ can be explicitly ruled out as possible values of $\theta$, the parameter space $\Omega$ will be the set of all positive numbers. As another example, suppose that the distribution of the heights of the individuals in a certain population is assumed to be the normal distribution with mean $\mu$ and variance $\sigma^2$, but that the exact values of $\mu$ and $\sigma^2$ are unknown. The mean $\mu$ and the variance $\sigma^2$ determine the particular normal distribution for the heights of individuals.
So $(\mu, \sigma^2)$ can be considered a pair of parameters. In this example of heights, both $\mu$ and $\sigma^2$ must be positive. Therefore, the parameter space $\Omega$ can be taken as the set of all pairs $(\mu, \sigma^2)$ such that $\mu > 0$ and $\sigma^2 > 0$. If the normal distribution in this example represents the distribution of the heights in inches of the individuals in some particular population, we might be certain that $30 < \mu < 100$ and $\sigma^2 < 50$. In this case, the parameter space $\Omega$ could be taken as the smaller set of all pairs $(\mu, \sigma^2)$ such that $30 < \mu < 100$ and $0 < \sigma^2 < 50$.

The important feature of the parameter space $\Omega$ is that it must contain all possible values of the parameters in a given problem, in order that we can be certain that the actual value of the vector of parameters is a point in $\Omega$.

Example 7.1.2 A Clinical Trial. Suppose that 40 patients are going to be given a treatment for a condition and that we will observe for each patient whether or not they recover from the condition. We are most likely also interested in a large collection of additional patients besides the 40 to be observed. To be specific, for each patient $i = 1, 2, \ldots$, let $X_i = 1$ if patient $i$ recovers, and let $X_i = 0$ if not. As a collection of possible distributions for $X_1, X_2, \ldots$, we could choose to say that the $X_i$ are i.i.d. having the Bernoulli distribution with parameter $p$ for $0 \le p \le 1$. In this case, the parameter $p$ is known to lie in the closed interval $[0, 1]$, and this interval could be taken as the parameter space. Notice also that the law of large numbers (Theorem 6.2.4) says that $p$ is the limit as $n$ goes to infinity of the proportion of the first $n$ patients who recover.

In most problems, there is a natural interpretation for the parameter as a feature of the possible distributions of our data. In Example 7.1.2, the parameter $p$ has a natural interpretation as the proportion out of a large population of patients given the treatment who recover from the condition.
In Example 7.1.1, the parameter θ has a natural interpretation as a failure rate, that is, one over the average lifetime of a large population of lifetimes. In such cases, inference about parameters can be interpreted as inference about the feature that the parameter represents. In this text, all parameters will have such natural interpretations. In examples that one encounters outside of an introductory course, interpretations may not be as straightforward.

Examples of Statistical Inference

Here are some of the examples of statistical models and inferences that were introduced earlier in the text.

Example 7.1.3 A Clinical Trial. The clinical trial introduced in Example 2.1.4 was concerned with how likely patients are to avoid relapse while under various treatments. For each i, let Xi = 1 if patient i in the imipramine group avoids relapse and Xi = 0 otherwise. Let P stand for the proportion of patients who avoid relapse out of a large group receiving imipramine treatment. If P is unknown, we can model X1, X2, ... as i.i.d. Bernoulli random variables with parameter p conditional on P = p. The patients in the imipramine column of Table 2.1 should provide us with some information that changes our uncertainty about P. A statistical inference would consist of making a probability statement about the data and/or P, and what the data and P tell us about each other. For instance, in Example 4.7.8, we assumed that P had the uniform distribution on the interval [0, 1], and we found the conditional distribution of P given the observed results of the study. We also computed the conditional mean of P given the study results as well as the M.S.E. for trying to predict P both before and after observing the results of the study. ◀

Example 7.1.4 Radioactive Particles. In Example 5.7.8, radioactive particles reach a target according to a Poisson process with unknown rate β. In Exercise 22 of Sec. 5.7, you were asked to find the conditional distribution of β after observing the Poisson process for a certain amount of time. ◀

Example 7.1.5 Anthropometry of Flea Beetles. In Example 5.10.2, we plotted two physical measurements from a sample of 31 flea beetles together with contours of a bivariate normal distribution. The family of bivariate normal distributions is parameterized by five quantities: the two means, the two variances, and the correlation. The choice of which set of five parameters to use for the fitted distribution is a form of statistical inference known as estimation. ◀

Example 7.1.6 Interval for Mean. Suppose that the heights of men in a certain population follow the normal distribution with mean μ and variance 9, as in Example 5.6.7. This time, assume that we do not know the value of the mean μ, but rather we wish to learn about it by sampling from the population. Suppose that we decide to sample n = 36 men and let X̄n stand for the average of their heights. Then the interval (X̄n − 0.98, X̄n + 0.98) computed in Example 5.6.8 has the property that it will contain the value of μ with probability 0.95. ◀

Example 7.1.7 Discrimination in Jury Selection. In Example 5.8.4, we were interested in whether there was evidence of discrimination against Mexican Americans in juror selection. Figure 5.8 shows how people who came into the case with different opinions about the extent of discrimination (if any) could alter their opinions in the light of learning the numerical evidence presented in the case. ◀

Example 7.1.8 Service Times in a Queue. Suppose that customers in a queue must wait for service, and that we get to observe the service times of several customers. Suppose that we are interested in the rate at which customers are served. In Example 5.7.3, we let Z stand for the service rate, and in Example 5.7.4, we showed how to find the conditional distribution of Z given several observed service times. ◀
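The coverage claim in Example 7.1.6 can be checked by simulation. The sketch below is only an illustration under stated assumptions: the function names and the trial values of μ are hypothetical and do not come from the text. It uses the quantities given in the example (variance 9, so standard deviation 3, n = 36, and half-width 0.98) and counts how often the interval around the sample mean contains μ.

```python
import random
import statistics


def interval_covers_mu(mu, rng, sigma=3.0, n=36, half_width=0.98):
    """Draw n heights from N(mu, sigma^2) and report whether the interval
    (mean - half_width, mean + half_width) contains mu."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    return xbar - half_width < mu < xbar + half_width


def estimate_coverage(mu, trials=20000, seed=0):
    """Fraction of simulated samples whose interval contains mu."""
    rng = random.Random(seed)
    hits = sum(interval_covers_mu(mu, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    # The values of mu are illustrative; the coverage should not depend on them.
    for mu in (64.0, 68.0, 72.0):
        print(mu, round(estimate_coverage(mu), 3))
```

Because the interval is centered at the sample mean and its half-width is fixed, the estimated coverage should be close to 0.95 whichever value of μ is used, which is what allows the probability statement to be made without knowing μ.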
General Classes of Inference Problems

Prediction. One form of inference is to try to predict random variables that have not yet been observed. In Example 7.1.1, we might be interested in the average of the next 10 lifetimes, $\frac{1}{10}\sum_{i=m+1}^{m+10} X_i$. In the clinical trial example (Example 7.1.3), we might be interested in predicting how many patients from the next set of patients in the imipramine group will have successful outcomes. In virtually every statistical inference problem in which we have not observed all of the relevant data, prediction is possible. When the unobserved quantity to be predicted is a parameter, prediction is usually called estimation, as in Example 7.1.5.

Statistical Decision Problems. In many statistical inference problems, after the experimental data have been analyzed, we must choose a decision from some available class of decisions with the property that the consequences of each available decision depend on the unknown value of some parameter. For example, we might have to estimate the unknown failure rate θ of our electronic components when the consequences depend on how close our estimate is to the correct value θ. As another example, we might have to decide whether the unknown proportion P of patients in the imipramine group (Example 7.1.3) is larger or smaller than some specified constant when the consequences depend on where P lies relative to the constant.
This last type of inference is closely related to hypothesis testing, the subject of Chapter 9.

Experimental Design. In some statistical inference problems, we have some control over the type or the amount of experimental data that will be collected. For example, consider an experiment to determine the mean tensile strength of a certain type of alloy as a function of the pressure and temperature at which the alloy is produced. Within the limits of certain budgetary and time constraints, it may be possible for the experimenter to choose the levels of pressure and temperature at which experimental specimens of the alloy are to be produced, and also to specify the number of specimens to be produced at each of these levels. Such a problem, in which the experimenter can choose (at least to some extent) the particular experiment that is to be carried out, is called a problem of experimental design. Of course, the design of an experiment and the statistical analysis of the experimental data are closely related. One cannot design an effective experiment without considering the subsequent statistical analysis that is to be carried out on the data that will be obtained. And one cannot carry out a meaningful statistical analysis of experimental data without considering the particular type of experiment from which the data were derived.

Other Inferences. The general classes of problems described above, as well as the more specific examples that appeared earlier, are intended as illustrations of types of statistical inferences that we will be able to perform with the theory and methods introduced in this text. The range of possible models, inferences, and methods that can arise when data are observed in real research problems far exceeds what we can introduce here. It is hoped that gaining an understanding of the problems that we can cover here will give the reader an appreciation for what needs to be done when a more challenging statistical problem arises.
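As a rough illustration of the decision about P described above, the following sketch estimates the conditional probability that P exceeds a specified constant. It assumes, as in Example 4.7.8, that P has the uniform distribution on [0, 1]; under that assumption the conditional distribution of P given s successes among n Bernoulli outcomes is the beta distribution with parameters s + 1 and n − s + 1. The function name and the counts used are hypothetical, not data from the text.

```python
import random


def posterior_prob_above(successes, n, threshold, draws=100_000, seed=0):
    """Monte Carlo estimate of Pr(P > threshold | data).

    Assumes a uniform prior for P on [0, 1] and n conditionally i.i.d.
    Bernoulli outcomes, so that the posterior of P is the beta
    distribution with parameters successes + 1 and n - successes + 1.
    """
    rng = random.Random(seed)
    a, b = successes + 1, n - successes + 1
    hits = sum(rng.betavariate(a, b) > threshold for _ in range(draws))
    return hits / draws


if __name__ == "__main__":
    # Hypothetical data: 22 of 40 patients avoid relapse.
    print(round(posterior_prob_above(22, 40, 0.5), 3))
```

How such a probability is turned into a decision depends on the consequences attached to each choice, which this sketch does not model.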
Definition of a Statistic

Example 7.1.9 Failure Times of Ball Bearings. In Example 5.6.9, we had a sample of the numbers of millions of revolutions before failure for 23 ball bearings. We modeled the lifetimes as a random sample from a lognormal distribution. We might suppose that the parameters μ and σ² of that lognormal distribution are unknown and that we might wish to make some inference about them. We would want to make use of the 23 observed values in making any such inference. But do we need to keep track of all 23 values or are there some summaries of the data on which our inference will be based? ◀

Each statistical inference that we will learn how to perform in this book will be based on one or a few summaries of the available data. Such data summaries arise so often and are so fundamental to inference that they receive a special name.

Definition 7.1.4 Statistic. Suppose that the observable random variables of interest are X1, ..., Xn. Let r be an arbitrary real-valued function of n real variables. Then the random variable T = r(X1, ..., Xn) is called a statistic.

Three examples of statistics are the sample mean X̄n, the maximum Yn of the values of X1, ..., Xn, and the function r(X1, ..., Xn), which has the constant value 3 for all values of X1, ..., Xn.

Example 7.1.10 Failure Times of Ball Bearings. In Example 7.1.9, suppose that we were interested in making a statement about how far μ is from 40.
Then we might want to use the statistic

$$T = \left| \frac{1}{36} \sum_{i=1}^{36} \log(X_i) - 4 \right|$$

in our inference procedure. In this case, T is a naive measure of how far the data suggest that μ is from 40. ◀

Example 7.1.11 Interval for Mean. In Example 7.1.6, we constructed an interval that has probability 0.95 of containing μ. The endpoints of that interval, namely, X̄n − 0.98 and X̄n + 0.98, are statistics. ◀

Many inferences can proceed without explicitly constructing statistics as a preliminary step. However, most inferences will involve the use of statistics that could be identified in advance. And knowing which statistics are useful in which inferences can greatly simplify the implementation of the inference. Expressing an inference in terms of statistics can also help us to decide how well the inference meets our needs. For instance, in Example 7.1.10, if we estimate |μ − 40| by T, we can use the distribution of T to help determine how likely it is that T differs from |μ − 40| by a large amount. As we construct specific inferences later in this book, we will draw attention to those statistics that play important roles in the inference.

Parameters as Random Variables

There is some controversy over whether parameters should be treated as random variables or merely as numbers that index a distribution. For instance, in Example 7.1.3, we let P stand for the proportion of the patients who avoid relapse from a large group receiving imipramine. We then say that X1, X2, ... are i.i.d. Bernoulli random variables with parameter p conditional on P = p. Here, we are explicitly thinking of P as a random variable, and we give it a distribution. An alternative would be to say that X1, X2, ... are i.i.d. Bernoulli random variables with parameter p where p is unknown and leave it at that.
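The two descriptions in the paragraph above correspond to two ways of generating the same kind of data. The sketch below is a minimal illustration, not a method from the text: the function names are hypothetical, and the uniform distribution for P is an assumption borrowed from Example 4.7.8. In the first function p is simply a number supplied by the caller; in the second, P is drawn first and the outcomes are then drawn conditional on P = p.

```python
import random


def outcomes_fixed_p(p, n, rng):
    """Index view: p is a fixed number and X_1, ..., X_n are
    i.i.d. Bernoulli(p)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def outcomes_random_P(n, rng):
    """Random-variable view: draw P first (a uniform distribution on
    [0, 1] is assumed here), then draw X_1, ..., X_n as i.i.d.
    Bernoulli(p) conditional on P = p."""
    p = rng.random()
    return p, outcomes_fixed_p(p, n, rng)


if __name__ == "__main__":
    rng = random.Random(1)
    # Index view with an illustrative value of p.
    print(sum(outcomes_fixed_p(0.55, 40, rng)))
    # Random-variable view: the realized p is itself random.
    p, xs = outcomes_random_P(40, rng)
    print(round(p, 3), sum(xs))
```

Conditional on the value of p, both functions produce outcomes with the same distribution. The difference is only whether a distribution is attached to the parameter itself.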
If we really want to compute something like the conditional probability that the proportion P is greater than 0.5 given the observations of the first 40 patients, then we need the conditional distribution of P given the first 40 patients, and we must treat P as a random variable. On the other hand, if we are only interested in making probability statements that are indexed by the value of p, then we do not need to think about a random variable called P. For example, we might wish to find two random variables Y1 and Y2 (functions of X1, ..., X40) such that, no matter what p equals, the probability that Y1 ≤ p ≤ Y2 is at least 0.9. Some of the inferences that we shall discuss later in this book are of the former type that require treating P as a random variable, and some are of the latter type in which p is merely an index for a distribution.

Some statisticians believe that it is possible and useful to treat parameters as random variables in every statistical inference problem. They believe that the distribution of the parameter is a subjective probability distribution in the sense that it represents an individual experimenter's information and subjective beliefs about where the true value of the parameter is likely to lie.
Once they assign a distribution for a parameter, that distribution is no different from any other probability distribution used in the field of statistics, and all of the rules of probability theory apply to every distribution. Indeed, in all of the cases described in this book, the parameters can actually be identified as limits of functions of large collections of potential observations. Here is a typical example.

Example 7.1.12 Parameter as a Limit of Random Variables. In Example 7.1.3, the parameter P can be understood as follows: Imagine an infinite sequence of potential patients receiving imipramine treatment. Assume that for every integer n, the outcomes of every ordered subset of n patients from that infinite sequence have the same joint distribution as the outcomes of every other ordered subset of n patients. In other words, assume that the order in which the patients appear in the sequence is irrelevant to the joint distribution of the patient outcomes. Let Pn be the proportion of patients who don't relapse out of the first n. It can be shown that the probability is 1 that Pn converges to something as n → ∞. That something can be thought of as P, which we have been calling the proportion of successes in a very large population. In this sense, P is a random variable because it is a function of other random variables. A similar argument can be made in all of the statistical models in this book involving parameters, but the mathematics needed to make these arguments precise is too advanced to present here. (Chapter 1 of Schervish (1995) contains the necessary details.) Statisticians who argue as in this example are said to adhere to the Bayesian philosophy of statistics and are called Bayesians. ◀

There is another line of reasoning that leads naturally to treating P as a random variable in Example 7.1.12 without relying on an infinite sequence of potential patients.
Suppose that the number of potential patients is sufficiently larger than any sample that we will see to make the approximation in Theorem 5.3.4 applicable. Then P is just the proportion of successes among the large population of potential patients. Conditional on P = p, the number of successes in a sample of n patients will be approximately a binomial random variable with parameters n and p according to Theorem 5.3.4. If the outcomes of the patients in the sample are random variables, then it makes sense that the proportion of successes among those patients is also random.

There is another group of statisticians who believe that in many problems it is not appropriate to assign a distribution to a parameter but claim instead that the true value of the parameter is a certain fixed number whose value happens to be unknown to the experimenter. These statisticians would assign a distribution to a parameter only when there is extensive previous information about the relative frequencies with which similar parameters have taken each of their possible values in past experiments. If two different scientists could agree on which past experiments were similar to the present experiment, then they might agree on a distribution to be assigned to the parameter. For example, suppose that the proportion θ of defective items in a certain large manufactured lot is unknown. Suppose also that the same manufacturer has produced many such lots of items in the past and that detailed records have been kept about the proportions of defective items in past lots.
The relative frequencies for past lots could then be used to construct a distribution for θ. Statisticians who would argue this way are said to adhere to the frequentist philosophy of statistics and are called frequentists.

The frequentists rely on the assumption that there exist infinite sequences of random variables in order to make sense of most of their probability statements. Once one assumes the existence of such an infinite sequence, one finds that the parameters of the distributions being used are limits of functions of the infinite sequences, just as do the Bayesians described above. In this way, the parameters are random variables because they are functions of random variables. The point of disagreement between the two groups is whether it is useful or even possible to assign a distribution to such parameters.

Both Bayesians and frequentists agree on the usefulness of families of distributions for observations indexed by parameters. Bayesians refer to the distribution indexed by parameter value θ as the conditional distribution of the observations given that the parameter equals θ. Frequentists refer to the distribution indexed by θ as the distribution of the observations when θ is the true value of the parameter. The two groups agree that whenever a distribution can be assigned to a parameter, the theory and methods to be described in this chapter are applicable and useful. In Sections 7.2–7.4, we shall explicitly assume that each parameter is a random variable and we shall assign it a distribution that represents the probabilities that the parameter lies in various subsets of the parameter space. Beginning in Sec. 7.5, we shall consider techniques of estimation that are not based on assigning distributions to parameters.

References

In the remainder of this book, we shall consider many different problems of statistical inference, statistical decision, and experimental design.
Some books that discuss statistical theory and methods at about the same level as they will be discussed in this book were mentioned at the end of Sec. 1.1. Some statistics books written at a more advanced level are Bickel and Doksum (2000), Casella and Berger (2002), Cramér (1946), DeGroot (1970), Ferguson (1967), Lehmann (1997), Lehmann and Casella (1998), Rao (1973), Rohatgi (1976), and Schervish (1995).

Exercises

1. Identify the components of the statistical model (as defined in Definition 7.1.1) in Example 7.1.3.
2. Identify two statistical inferences mentioned in Example 7.1.3.
3. In Examples 7.1.4 and 5.7.8 (page 323), identify the components of the statistical model as defined in Definition 7.1.1.
4. In Example 7.1.6, identify the components of the statistical model as defined in Definition 7.1.1.
5. In Example 7.1.6, identify any statistical inference mentioned.
6. In Example 5.8.3 (page 328), identify the components of the statistical model as defined in Definition 7.1.1.
7. In Example 5.4.7 (page 293), identify the components of the statistical model as defined in Definition 7.1.1.
7.2 Prior and Posterior Distributions

The distribution of a parameter before observing any data is called the prior distribution of the parameter. The conditional distribution of the parameter given the observed data is called the posterior distribution. If we plug the observed values of the data into the conditional p.f. or p.d.f. of the data given the parameter, the result is a function of the parameter alone, which is called the likelihood function.

The Prior Distribution

Example 7.2.1 Lifetimes of Electronic Components. In Example 7.1.1, lifetimes $X_1, X_2, \ldots$ of electronic components were modeled as i.i.d.
exponential random variables with parameter $\theta$ conditional on $\theta$, and $\theta$ was interpreted as the failure rate of the components. Indeed, we noted that $n / \sum_{i=1}^{n} X_i$ should converge in probability to $\theta$ as $n$ goes to $\infty$. We then said that $\theta$ had the gamma distribution with parameters 1 and 2.

The distribution of $\theta$ mentioned at the end of Example 7.2.1 was assigned before observing any of the component lifetimes. For this reason, we call it a prior distribution.

Definition 7.2.1 Prior Distribution/p.f./p.d.f. Suppose that one has a statistical model with parameter $\theta$. If one treats $\theta$ as random, then the distribution that one assigns to $\theta$ before observing the other random variables of interest is called its prior distribution. If the parameter space is at most countable, then the prior distribution is discrete and its p.f. is called the prior p.f. of $\theta$. If the prior distribution is a continuous distribution, then its p.d.f. is called the prior p.d.f. of $\theta$.
We shall commonly use the symbol $\xi(\theta)$ to denote the prior p.f. or p.d.f. as a function of $\theta$.

When one treats the parameter as a random variable, the name “prior distribution” is merely another name for the marginal distribution of the parameter.

Example 7.2.2 Fair or Two-Headed Coin. Let $\theta$ denote the probability of obtaining a head when a certain coin is tossed, and suppose that it is known that the coin either is fair or has a head on each side. Therefore, the only possible values of $\theta$ are $\theta = 1/2$ and $\theta = 1$. If the prior probability that the coin is fair is 0.8, then the prior p.f. of $\theta$ is $\xi(1/2) = 0.8$ and $\xi(1) = 0.2$.

Example 7.2.3 Proportion of Defective Items. Suppose that the proportion $\theta$ of defective items in a large manufactured lot is unknown and that the prior distribution assigned to $\theta$ is the uniform distribution on the interval $[0, 1]$. Then the prior p.d.f. of $\theta$ is
\[
\xi(\theta) = \begin{cases} 1 & \text{for } 0 < \theta < 1, \\ 0 & \text{otherwise.} \end{cases} \tag{7.2.1}
\]

The prior distribution of a parameter $\theta$ must be a probability distribution over
the parameter space $\Omega$. We assume that the experimenter or statistician will be able to summarize his previous information and knowledge about where in $\Omega$ the value of $\theta$ is likely to lie by constructing a probability distribution on the set $\Omega$. In other words, before the experimental data have been collected or observed, the experimenter’s past experience and knowledge will lead him to believe that $\theta$ is more likely to lie in certain regions of $\Omega$ than in others. We shall assume that the relative likelihoods of the different regions can be expressed in terms of a probability distribution on $\Omega$, namely, the prior distribution of $\theta$.

Example 7.2.4 Lifetimes of Fluorescent Lamps. Suppose that the lifetimes (in hours) of fluorescent lamps of a certain type are to be observed and that the lifetime of any particular lamp has the exponential distribution with parameter $\theta$.
Suppose also that the exact value of $\theta$ is unknown, and on the basis of previous experience the prior distribution of $\theta$ is taken as the gamma distribution for which the mean is 0.0002 and the standard deviation is 0.0001. We shall determine the prior p.d.f. of $\theta$.

Suppose that the prior distribution of $\theta$ is the gamma distribution with parameters $\alpha_0$ and $\beta_0$. It was shown in Theorem 5.7.5 that the mean of this distribution is $\alpha_0/\beta_0$ and the variance is $\alpha_0/\beta_0^2$. Therefore, $\alpha_0/\beta_0 = 0.0002$ and $\alpha_0^{1/2}/\beta_0 = 0.0001$. Solving these two equations gives $\alpha_0 = 4$ and $\beta_0 = 20{,}000$. It follows from Eq. (5.7.13) that the prior p.d.f. of $\theta$ for $\theta > 0$ is as follows:
\[
\xi(\theta) = \frac{(20{,}000)^4}{3!}\, \theta^{3} e^{-20{,}000\,\theta}. \tag{7.2.2}
\]
Also, $\xi(\theta) = 0$ for $\theta \le 0$.

In the remainder of this section and Sections 7.3 and 7.4, we shall focus on statistical inference problems in which the parameter $\theta$ is a random variable of interest and hence will need to be assigned a distribution.
In such problems, we shall refer to the distribution indexed by $\theta$ for the other random variables of interest as the conditional distribution for those random variables given $\theta$. For example, this is precisely the language used in Example 7.2.1 where the parameter is $\theta$, the failure rate. In referring to the conditional p.f. or p.d.f. of random variables, such as $X_1, X_2, \ldots$ in Example 7.2.1, we shall use the notation of conditional p.f.’s and p.d.f.’s. For example, if we let $X = (X_1, \ldots, X_n)$ in Example 7.2.1, the conditional p.d.f. of $X$ given $\theta$ is
\[
f_n(x|\theta) = \begin{cases} \theta^{n} \exp(-\theta[x_1 + \cdots + x_n]) & \text{for all } x_i > 0, \\ 0 & \text{otherwise.} \end{cases} \tag{7.2.3}
\]

In many problems, such as Example 7.2.1, the observable data $X_1, X_2, \ldots$ are modeled as a random sample from a univariate distribution indexed by $\theta$. In these cases, let $f(x|\theta)$ denote the p.f. or p.d.f. of a single random variable under the distribution indexed by $\theta$. In such a case, using the above notation,
\[
f_n(x|\theta) = f(x_1|\theta) \cdots f(x_n|\theta).
\]
When we treat $\theta$ as a random variable, $f(x|\theta)$ is the conditional p.f. or p.d.f. of each observation $X_i$ given $\theta$, and the observations are conditionally i.i.d. given $\theta$. In summary, the following two expressions are to be understood as equivalent:

• $X_1, \ldots, X_n$ form a random sample with p.f. or p.d.f. $f(x|\theta)$.
• $X_1, \ldots, X_n$ are conditionally i.i.d. given $\theta$ with conditional p.f. or p.d.f. $f(x|\theta)$.

Although we shall generally use the wording in the first bullet above for simplicity, it is often useful to remember that the two wordings are equivalent when we treat $\theta$ as a random variable.

Sensitivity Analysis and Improper Priors

In Example 2.3.8 on page 84, we saw a situation in which two very different sets of prior probabilities were used for a collection of events. After we observed data, however, the posterior probabilities were quite similar.
In Example 5.8.4 on page 330, we used a large collection of prior distributions for a parameter in order to see how much impact the prior distribution had on the posterior probability of a single important event. It is a common practice to compare the posterior distributions that arise from several different prior distributions in order to see how much effect the prior distribution has on the answers to important questions. Such comparisons are called sensitivity analysis.

It is very often the case that different prior distributions do not make much difference after the data have been observed. This is especially true if there are a lot of data or if the prior distributions being compared are very spread out. This observation has two important implications. First, the fact that different experimenters might not agree on a prior distribution becomes less important if there are a lot of data. Second, experimenters might be less inclined to spend time specifying a prior distribution if it is not going to matter much which one is specified. Unfortunately, if one does not specify some prior distribution, there is no way to calculate a conditional distribution of the parameter given the data.

As an expedient, there are some calculations available that attempt to capture the idea that the data contain much more information than is available a priori. Usually, these calculations involve using a function $\xi(\theta)$ as if it were a prior p.d.f. for the parameter $\theta$ but such that $\int \xi(\theta)\, d\theta = \infty$, which clearly violates the definition of p.d.f. Such priors are called improper. We shall discuss improper priors in more detail in Sec. 7.3.

The Posterior Distribution

Example 7.2.5 Lifetimes of Fluorescent Lamps. In Example 7.2.4, we constructed a prior distribution for the parameter $\theta$ that specifies the exponential distribution for a collection of lifetimes of fluorescent lamps. Suppose that we observe a collection of $n$ such lifetimes. How would we change the distribution of $\theta$ to take account of the observed data?

Definition 7.2.2 Posterior Distribution/p.f./p.d.f. Consider a statistical inference problem with parameter $\theta$ and random variables $X_1, \ldots, X_n$ to be observed. The conditional distribution of $\theta$ given $X_1, \ldots, X_n$ is called the posterior distribution of $\theta$. The conditional p.f. or p.d.f. of $\theta$ given $X_1 = x_1, \ldots, X_n = x_n$ is called the posterior p.f. or posterior p.d.f. of $\theta$ and is typically denoted $\xi(\theta|x_1, \ldots, x_n)$.

When one treats the parameter as a random variable, the name “posterior distribution” is merely another name for the conditional distribution of the parameter given the data. Bayes’ theorem for random variables (3.6.13) and for random vectors (3.7.15) tells us how to compute the posterior p.d.f. or p.f. of $\theta$ after observing data. We shall review the derivation of Bayes’ theorem here using the specific notation of prior distributions and parameters.
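As a small illustration of what a posterior p.f. looks like when the prior is discrete, the sketch below takes the fair or two-headed coin of Example 7.2.2, with prior p.f. $\xi(1/2) = 0.8$ and $\xi(1) = 0.2$, and multiplies the prior by the conditional p.f. of the data and then divides by the sum over both values of $\theta$. The observed tosses (three heads) are a hypothetical data set that does not appear in the text, and the function names `posterior_pf` and `bernoulli_likelihood` are chosen here only for illustration.

```python
from fractions import Fraction


def bernoulli_likelihood(tosses, theta):
    """Conditional p.f. of the tosses given theta (1 = head, 0 = tail)."""
    value = Fraction(1)
    for x in tosses:
        value *= theta if x == 1 else 1 - theta
    return value


def posterior_pf(prior, likelihood, data):
    """Posterior p.f.: prior times likelihood, divided by the marginal p.f. of the data."""
    joint = {theta: p * likelihood(data, theta) for theta, p in prior.items()}
    marginal = sum(joint.values())  # sum over all possible values of theta
    if marginal == 0:
        raise ValueError("the data have probability 0 under every value of theta")
    return {theta: w / marginal for theta, w in joint.items()}


# Prior p.f. from Example 7.2.2.
prior = {Fraction(1, 2): Fraction(4, 5), Fraction(1): Fraction(1, 5)}

# Hypothetical data (an assumption, not from the text): three heads in a row.
post = posterior_pf(prior, bernoulli_likelihood, [1, 1, 1])
for theta, p in post.items():
    print(f"theta = {theta}: posterior probability {p}")
```

With these assumed tosses the posterior probability that the coin is fair drops from 4/5 to 1/3. A single observed tail would instead put all of the posterior probability on $\theta = 1/2$, because a tail is impossible when $\theta = 1$.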
Theorem 7.2.1 Suppose that the $n$ random variables $X_1, \ldots, X_n$ form a random sample from a distribution for which the p.d.f. or the p.f. is $f(x|\theta)$. Suppose also that the value of the parameter $\theta$ is unknown and the prior p.d.f. or p.f. of $\theta$ is $\xi(\theta)$. Then the posterior p.d.f. or p.f. of $\theta$ is
\[
\xi(\theta|x) = \frac{f(x_1|\theta) \cdots f(x_n|\theta)\, \xi(\theta)}{g_n(x)} \quad \text{for } \theta \in \Omega,
\]
where $g_n$ is the marginal joint p.d.f. or p.f. of $X_1, \ldots, X_n$.

Proof For simplicity, we shall assume that the parameter space $\Omega$ is either an interval of the real line or the entire real line and that $\xi(\theta)$ is a prior p.d.f. on $\Omega$, rather than a prior p.f. However, the proof that will be given here can be adapted easily to a problem in which $\xi(\theta)$ is a p.f.

Since the random variables $X_1, \ldots, X_n$ form a random sample from the distribution for which the p.d.f. is $f(x|\theta)$, it follows from Sec. 3.7 that their conditional joint p.d.f. or p.f. $f_n(x_1, \ldots, x_n|\theta)$ given $\theta$ is
\[
f_n(x_1, \ldots, x_n|\theta) = f(x_1|\theta) \cdots f(x_n|\theta). \tag{7.2.4}
\]
If we use the vector notation $x = (x_1, \ldots, x_n)$, then the joint p.d.f. in Eq. (7.2.4) can be written more compactly as $f_n(x|\theta)$. Eq. (7.2.4) merely expresses the fact that $X_1, \ldots, X_n$ are conditionally independent and identically distributed given $\theta$, each having p.d.f. or p.f. $f(x|\theta)$.

If we multiply the conditional joint p.d.f. or p.f. by the p.d.f. $\xi(\theta)$, we obtain the $(n+1)$-dimensional joint p.d.f. (or p.f./p.d.f.) of $X_1, \ldots, X_n$ and $\theta$ in the form
\[
f(x, \theta) = f_n(x|\theta)\, \xi(\theta). \tag{7.2.5}
\]
The marginal joint p.d.f. or p.f. of $X_1, \ldots, X_n$ can now be obtained by integrating the right-hand side of Eq. (7.2.5) over all values of $\theta$. Therefore, the $n$-dimensional marginal joint p.d.f. or p.f. $g_n(x)$ of $X_1, \ldots, X_n$ can be written in the form
\[
g_n(x) = \int_{\Omega} f_n(x|\theta)\, \xi(\theta)\, d\theta. \tag{7.2.6}
\]
Eq. (7.2.6) is just an instance of the law of total probability for random vectors (3.7.14). Furthermore, the conditional p.d.f. of $\theta$ given that $X_1 = x_1, \ldots, X_n = x_n$, namely, $\xi(\theta|x)$, must be equal to $f(x, \theta)$ divided by $g_n(x)$.
Thus, we have
\[
\xi(\theta|x) = \frac{f_n(x|\theta)\, \xi(\theta)}{g_n(x)} \quad \text{for } \theta \in \Omega, \tag{7.2.7}
\]
which is Bayes’ theorem restated for parameters and random samples. If $\xi(\theta)$ is a p.f., so that the prior distribution is discrete, just replace the integral in (7.2.6) by the sum over all of the possible values of $\theta$.

Example 7.2.6 Lifetimes of Fluorescent Lamps. Suppose again, as in Examples 7.2.4 and 7.2.5, that the distribution of the lifetimes of fluorescent lamps of a certain type is the exponential distribution with parameter $\theta$, and the prior distribution of $\theta$ is a particular gamma distribution for which the p.d.f. $\xi(\theta)$ is given by Eq. (7.2.2). Suppose also that the lifetimes $X_1, \ldots, X_n$ of a random sample of $n$ lamps of this type are observed. We shall determine the posterior p.d.f. of $\theta$ given that $X_1 = x_1, \ldots, X_n = x_n$.

By Eq. (5.7.16), the p.d.f. of each observation $X_i$ is
\[
f(x|\theta) = \begin{cases} \theta e^{-\theta x} & \text{for } x > 0, \\ 0 & \text{otherwise.} \end{cases}
\]
The joint p.d.f.
of $X_1, \ldots, X_n$ can be written in the following form, for $x_i > 0$ $(i = 1, \ldots, n)$:
\[
f_n(x|\theta) = \prod_{i=1}^{n} \theta e^{-\theta x_i} = \theta^{n} e^{-\theta y},
\]
where $y = \sum_{i=1}^{n} x_i$. As $f_n(x|\theta)$ will be used in constructing the posterior distribution of $\theta$, it is now apparent that the statistic $Y = \sum_{i=1}^{n} X_i$ will be used in any inference that makes use of the posterior distribution.

Since the prior p.d.f. $\xi(\theta)$ is given by Eq. (7.2.2), it follows that for $\theta > 0$, apart from the constant factor $(20{,}000)^4/3!$, which does not involve $\theta$ and cancels when the posterior is normalized,
\[
f_n(x|\theta)\, \xi(\theta) \propto \theta^{n+3} e^{-(y+20{,}000)\theta}. \tag{7.2.8}
\]
We need the corresponding normalizing integral, which equals $g_n(x)$ apart from the same constant factor. It is the integral of the right side of (7.2.8) over all $\theta$:
\[
\int_0^{\infty} \theta^{n+3} e^{-(y+20{,}000)\theta}\, d\theta = \frac{\Gamma(n+4)}{(y+20{,}000)^{n+4}},
\]
where the equality follows from Theorem 5.7.3. Hence,
\[
\xi(\theta|x) = \frac{\theta^{n+3} e^{-(y+20{,}000)\theta}}{\Gamma(n+4)/(y+20{,}000)^{n+4}} = \frac{(y+20{,}000)^{n+4}}{\Gamma(n+4)}\, \theta^{n+3} e^{-(y+20{,}000)\theta}, \tag{7.2.9}
\]
for $\theta > 0$. When we compare this expression with Eq. (5.7.13), we can see that it is the p.d.f. of the gamma distribution with parameters $n + 4$ and $y + 20{,}000$. Hence, this gamma distribution is the posterior distribution of $\theta$.
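The normalization step in this example can be checked numerically. The sketch below integrates $\theta^{n+3} e^{-(y+20{,}000)\theta}$ with a plain midpoint rule, compares the result with $\Gamma(n+4)/(y+20{,}000)^{n+4}$, and confirms that the normalized function agrees with the gamma p.d.f. with parameters $n+4$ and $y+20{,}000$ at one point. The values $n = 3$ and $y = 9000$, the upper integration limit, and all function names are assumptions made only for this illustration.

```python
import math


def unnormalized_posterior(theta, n, y, beta0=20_000.0):
    """theta^(n+3) * exp(-(y + beta0) * theta), the kernel of the posterior."""
    return theta ** (n + 3) * math.exp(-(y + beta0) * theta)


def gamma_pdf(theta, alpha, beta):
    """Gamma p.d.f. with parameters alpha and beta, evaluated for theta > 0."""
    log_pdf = (alpha * math.log(beta) - math.lgamma(alpha)
               + (alpha - 1) * math.log(theta) - beta * theta)
    return math.exp(log_pdf)


def integrate(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


# Hypothetical sample summary (not from the text): n lamps with total lifetime y.
n, y = 3, 9_000.0

# The kernel is negligible beyond theta = 0.005 for these values, so the
# infinite upper limit is truncated there.
numeric = integrate(lambda t: unnormalized_posterior(t, n, y), 0.0, 0.005)
closed_form = math.gamma(n + 4) / (y + 20_000.0) ** (n + 4)
print("numerical integral:", numeric)
print("closed form       :", closed_form)

# The normalized kernel should match the gamma(n + 4, y + 20,000) p.d.f.
theta = 2e-4
print(unnormalized_posterior(theta, n, y) / numeric,
      gamma_pdf(theta, n + 4, y + 20_000.0))
```

The midpoint rule is used here only to avoid a dependency on a numerical library; any standard quadrature routine would serve the same purpose.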
As a specific example, suppose that we observe the following $n = 5$ lifetimes in hours: 2911, 3403, 3237, 3509, and 3118. Then $y = 16{,}178$, and the posterior distribution of $\theta$ is the gamma distribution with parameters 9 and 36,178. The top panel of Fig. 7.1 displays both the prior and posterior p.d.f.’s in this example. It is clear that the data have caused the distribution of $\theta$ to change somewhat from the prior to the posterior.

At this point, it might be appropriate to perform a sensitivity analysis. For example, how would the posterior distribution change if we had chosen a different prior distribution? To be specific, consider the gamma prior with parameters 1 and 1000. This prior has a mean of 0.001, five times as big as that of the original prior, and a standard deviation of 0.001, ten times as big. The posterior distribution would then be the gamma distribution with parameters 6 and 17,178. The p.d.f.’s of this pair of prior and posterior are plotted in the lower panel of Fig. 7.1.
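The comparison of the two priors can be sketched in a few lines. The code below assumes that the calculation of Example 7.2.6 carries over to a general gamma prior with parameters $\alpha_0$ and $\beta_0$, giving a gamma posterior with parameters $\alpha_0 + n$ and $\beta_0 + y$. That general rule is stated here as an assumption, although it reproduces both posteriors quoted in the text. The function names are illustrative only.

```python
import math


def gamma_posterior(alpha0, beta0, lifetimes):
    """Gamma posterior parameters for exponential lifetimes (assumed conjugate rule)."""
    return alpha0 + len(lifetimes), beta0 + sum(lifetimes)


def gamma_mean_sd(alpha, beta):
    """Mean alpha/beta and standard deviation sqrt(alpha)/beta of a gamma distribution."""
    return alpha / beta, math.sqrt(alpha) / beta


lifetimes = [2911, 3403, 3237, 3509, 3118]  # the n = 5 observations, in hours
priors = {"original": (4, 20_000), "alternative": (1, 1_000)}

results = {}
for name, (alpha0, beta0) in priors.items():
    alpha1, beta1 = gamma_posterior(alpha0, beta0, lifetimes)
    results[name] = (alpha1, beta1)
    prior_mean, prior_sd = gamma_mean_sd(alpha0, beta0)
    post_mean, post_sd = gamma_mean_sd(alpha1, beta1)
    print(f"{name}: prior gamma({alpha0}, {beta0}), mean {prior_mean:.6f}, sd {prior_sd:.6f}")
    print(f"{name}: posterior gamma({alpha1}, {beta1}), mean {post_mean:.6f}, sd {post_sd:.6f}")
```

Comparing the printed means and standard deviations side by side gives a numerical counterpart to the two panels of Fig. 7.1.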
One can see that both the prior and the posterior in the bottom panel are more spread out than their counterparts in the upper panel. It is clear that the choice of prior distribution is going to make a difference with this small data set. ◄

Figure 7.1 Prior and posterior p.d.f.'s in Example 7.2.6. The top panel ("Original prior distribution") is based on the original prior. The bottom panel ("Alternative prior distribution") is based on the alternative prior that was part of the sensitivity analysis. [Plot omitted: each panel shows the prior and posterior p.d.f. against $\theta$, for $\theta$ from 0 to 0.0015.]

The names "prior" and "posterior" derive from the Latin words for "former" and "coming after." The prior distribution is the distribution of $\theta$ that comes before observing the data, and the posterior distribution comes after observing the data.

The Likelihood Function

The denominator on the right side of Eq.
(7.2.7) is simply the integral of the numerator over all possible values of $\theta$. Although the value of this integral depends on the observed values $x_1, \ldots, x_n$, it does not depend on $\theta$ and it may be treated as a constant when the right-hand side of Eq. (7.2.7) is regarded as a p.d.f. of $\theta$. We may therefore replace Eq. (7.2.7) with the following relation:

$$\xi(\theta \mid \boldsymbol{x}) \propto f_n(\boldsymbol{x} \mid \theta)\,\xi(\theta). \tag{7.2.10}$$

The proportionality symbol $\propto$ is used here to indicate that the left side is equal to the right side except possibly for a constant factor, the value of which may depend on the observed values $x_1, \ldots, x_n$ but does not depend on $\theta$. The appropriate constant factor that will establish the equality of the two sides in the relation (7.2.10) can be determined at any time by using the fact that $\int_\Omega \xi(\theta \mid \boldsymbol{x})\, d\theta = 1$, because $\xi(\theta \mid \boldsymbol{x})$ is a p.d.f. of $\theta$.

One of the two functions on the right-hand side of Eq. (7.2.10) is the prior p.d.f. of $\theta$. The other function has a special name also.

Definition 7.2.3 Likelihood Function. When the joint p.d.f. or the joint p.f.
$f_n(\boldsymbol{x} \mid \theta)$ of the observations in a random sample is regarded as a function of $\theta$ for given values of $x_1, \ldots, x_n$, it is called the likelihood function.

The relation (7.2.10) states that the posterior p.d.f. of $\theta$ is proportional to the product of the likelihood function and the prior p.d.f. of $\theta$.

By using the proportionality relation (7.2.10), it is often possible to determine the posterior p.d.f. of $\theta$ without explicitly performing the integration in Eq. (7.2.6). If we can recognize the right side of the relation (7.2.10) as being equal to one of the standard p.d.f.'s introduced in Chapter 5 or elsewhere in this book, except possibly for a constant factor, then we can easily determine the appropriate factor that will convert the right side of (7.2.10) into a proper p.d.f. of $\theta$. We shall illustrate these ideas by considering again Example 7.2.3.

Example 7.2.7 Proportion of Defective Items.
Suppose again, as in Example 7.2.3, that the proportion $\theta$ of defective items in a large manufactured lot is unknown and that the prior distribution of $\theta$ is a uniform distribution on the interval $[0, 1]$. Suppose also that a random sample of $n$ items is taken from the lot, and for $i = 1, \ldots, n$, let $X_i = 1$ if the $i$th item is defective, and let $X_i = 0$ otherwise. Then $X_1, \ldots, X_n$ form $n$ Bernoulli trials with parameter $\theta$. We shall determine the posterior p.d.f. of $\theta$.

It follows from Eq. (5.2.2) that the p.f. of each observation $X_i$ is

$$f(x \mid \theta) = \begin{cases} \theta^x (1 - \theta)^{1-x} & \text{for } x = 0, 1, \\ 0 & \text{otherwise.} \end{cases}$$

Hence, if we let $y = \sum_{i=1}^{n} x_i$, then the joint p.f. of $X_1, \ldots, X_n$ can be written in the following form for $x_i = 0$ or $1$ $(i = 1, \ldots, n)$:

$$f_n(\boldsymbol{x} \mid \theta) = \theta^y (1 - \theta)^{n-y}. \tag{7.2.11}$$

Since the prior p.d.f. $\xi(\theta)$ is given by Eq. (7.2.1), it follows that for $0 < \theta < 1$,

$$f_n(\boldsymbol{x} \mid \theta)\,\xi(\theta) = \theta^y (1 - \theta)^{n-y}. \tag{7.2.12}$$

When we compare this expression with Eq.
(5.8.3), we can see that, except for a constant factor, it is the p.d.f. of the beta distribution with parameters $\alpha = y + 1$ and $\beta = n - y + 1$. Since the posterior p.d.f. $\xi(\theta \mid \boldsymbol{x})$ is proportional to the right side of Eq. (7.2.12), it follows that $\xi(\theta \mid \boldsymbol{x})$ must be the p.d.f. of the beta distribution with parameters $\alpha = y + 1$ and $\beta = n - y + 1$. Therefore, for $0 < \theta < 1$,

$$\xi(\theta \mid \boldsymbol{x}) = \frac{\Gamma(n+2)}{\Gamma(y+1)\,\Gamma(n-y+1)}\, \theta^y (1 - \theta)^{n-y}. \tag{7.2.13}$$

In this example, the statistic $Y = \sum_{i=1}^{n} X_i$ is being used to construct the posterior distribution, and hence will be used in any inference that is based on the posterior distribution. ◄

Note: Normalizing Constant for Posterior p.d.f. The steps that got us from (7.2.12) to (7.2.13) are an example of a very common technique for determining a posterior p.d.f. We can drop any inconvenient constant factor from the prior p.d.f. and from the likelihood function before we multiply them together as in (7.2.10). Then we look at
Then we look at © produto resultante, chamamos-lheg/(O), para ver se o reconhecemos como parte de um PDF the resulting product, call it g(0), to see if we recognize it as looking like part of a que vimos em outro lugar. Se de fato encontrarmos uma distribuicgdo nomeada com pdf igual a p.d.f. that we have seen elsewhere. If indeed we find a named distribution with p.d.f. cg(@), entao nosso pdf posterior também écg/(), e nossa distribuigdo posterior tem o nome equal to cg(@), then our posterior p.d-f. is also cg(9), and our posterior distribution correspondente, assim como no Exemplo 7.2.7. has the corresponding name, just as in Example 7.2.7. Observacées e previsdes sequenciais Sequential Observations and Prediction Em muitos experimentos, as observagéesX1, ..., Xn, que formam a amostra aleatoria, In many experiments, the observations X,,..., X,, which form the random sample, devem ser obtidos sequencialmente, ou seja, um de cada vez. Em tal experimento, o valor must be obtained sequentially, that is, one at a time. In such an experiment, the deXié observado primeiro, 0 valor deX2é observado a seguir, 0 valor deX3é entdo value of X, is observed first, the value of X> is observed next, the value of X3 is then observado, e assim por diante. Suponha que a pdf anterior do parametro6é&@). Depois observed, and so on. Suppose that the prior p.d.f. of the parameter 6 is (6). After do valorxideXifoi observado, o pdf posterior €/6| x1 Joode ser calculado da maneira usual a the value x, of X; has been observed, the posterior p.d-f. €(9|x,) can be calculated in partir da relagdo the usual way from the relation &(8| xi f(x | HEA). (7.2.14) E(O|x1) x f(x4|O)E(). (7.2.14) DesdeXieX2sdo condicionalmente independentes, dados@, 0 PF condicional ou Since X, and X> are conditionally independent given 6, the conditional p.f. or pdf dex2dadoGeX1=x1é igual ao dado &sozinho, ou seja, f/x2 | 8). Portanto, a pdf p.d.f. 
of $X_2$ given $\theta$ and $X_1 = x_1$ is the same as that given $\theta$ alone, namely, $f(x_2 \mid \theta)$. Hence, the posterior p.d.f. of $\theta$ in Eq. (7.2.14) serves as the prior p.d.f. of $\theta$ when the value of $X_2$ is to be observed. Thus, after the value $x_2$ of $X_2$ has been observed, the posterior p.d.f. $\xi(\theta \mid x_1, x_2)$ can be calculated from the relation

$$\xi(\theta \mid x_1, x_2) \propto f(x_2 \mid \theta)\,\xi(\theta \mid x_1). \tag{7.2.15}$$

We can continue in this way, calculating an updated posterior p.d.f. of $\theta$ after each observation and using that p.d.f. as the prior p.d.f. of $\theta$ for the next observation. The posterior p.d.f. $\xi(\theta \mid x_1, \ldots, x_{n-1})$ after the values $x_1, \ldots, x_{n-1}$ have been observed will ultimately be the prior p.d.f. of $\theta$ for the final observed value of $X_n$. The posterior p.d.f. after all $n$ values $x_1, \ldots, x_n$ have been observed will therefore be specified by the relation

$$\xi(\theta \mid \boldsymbol{x}) \propto f(x_n \mid \theta)\,\xi(\theta \mid x_1, \ldots, x_{n-1}). \tag{7.2.16}$$

Alternatively, after all $n$ values $x_1, \ldots, x_n$ have been observed, we could calculate the posterior p.d.f. $\xi(\theta \mid \boldsymbol{x})$ in the usual way by combining the joint p.d.f. $f_n(\boldsymbol{x} \mid \theta)$
with the original prior p.d.f. $\xi(\theta)$, as indicated in Eq. (7.2.7). It can be shown (see Exercise 8) that the posterior p.d.f. $\xi(\theta \mid \boldsymbol{x})$ will be the same regardless of whether it is calculated directly by using Eq. (7.2.7) or sequentially by using Eqs. (7.2.14), (7.2.15), and (7.2.16). This property was illustrated in Sec. 2.3 (see page 80) for a coin that is known either to be fair or to have a head on each side. After each toss of the coin, the posterior probability that the coin is fair is updated.

The proportionality constants in Eqs. (7.2.14)-(7.2.16) have a useful interpretation. For example, in (7.2.16) the proportionality constant is 1 over the integral of the right side with respect to $\theta$. But this integral is the conditional p.d.f. or p.f. of $X_n$ given $X_1 = x_1, \ldots, X_{n-1} = x_{n-1}$, according to the conditional version of the law of total probability (3.7.16). For example, if $\theta$ has a continuous distribution,

$$f(x_n \mid x_1, \ldots, x_{n-1}) = \int f(x_n \mid \theta)\,\xi(\theta \mid x_1, \ldots, x_{n-1})\, d\theta. \tag{7.2.17}$$
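To make the sequential scheme of Eqs. (7.2.14)-(7.2.17) concrete, here is a small sketch using the lamp lifetimes and the gamma prior with parameters 4 and 20,000 from Example 7.2.6. It is an illustration under stated assumptions rather than anything given in the text: for exponential observations and a gamma distribution of $\theta$ with parameters $(\alpha, \beta)$, the integral in (7.2.17) is assumed to take the closed form $\alpha \beta^{\alpha} / (x + \beta)^{\alpha + 1}$, and all function names are hypothetical.

```python
import math

# Lifetimes in hours from Example 7.2.6.
lifetimes = [2911, 3403, 3237, 3509, 3118]

def log_predictive(x, shape, rate):
    """Log of Eq. (7.2.17) for an exponential observation x when theta ~ gamma(shape, rate)."""
    return math.log(shape) + shape * math.log(rate) - (shape + 1) * math.log(x + rate)

def sequential_update(shape, rate, data):
    """Update one observation at a time; each posterior becomes the next prior."""
    log_marginal = 0.0
    for x in data:
        log_marginal += log_predictive(x, shape, rate)  # normalizing constant of this step
        shape, rate = shape + 1, rate + x
    return shape, rate, log_marginal

def batch_update(shape, rate, data):
    """Update with all observations at once, as in Eq. (7.2.7)."""
    n, y = len(data), sum(data)
    log_marginal = (shape * math.log(rate) - math.lgamma(shape)
                    + math.lgamma(shape + n) - (shape + n) * math.log(rate + y))
    return shape + n, rate + y, log_marginal

seq = sequential_update(4, 20_000, lifetimes)
bat = batch_update(4, 20_000, lifetimes)
print("sequential:", seq)
print("batch:     ", bat)
```

Both routes should end at the gamma distribution with parameters 9 and 36,178, and the sum of the one-step log predictive densities should match the log of the joint marginal p.d.f. of the data, which is the chain rule applied to the normalizing constants.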
The proportionality constant in (7.2.16) is 1 over (7.2.17). So, if we are interested in predicting the $n$th observation in a sequence after observing the first $n - 1$, we can use (7.2.17), which is also 1 over the proportionality constant in Eq. (7.2.16), as the conditional p.f. or p.d.f. of $X_n$ given the first $n - 1$ observations.

Example 7.2.8 Lifetimes of Fluorescent Lamps. In Example 7.2.6, conditional on $\theta$, the lifetimes of fluorescent lamps are independent exponential random variables with parameter $\theta$. We also observed the lifetimes of five lamps, and the posterior distribution of $\theta$ was found to be the gamma distribution with parameters 9 and 36,178. Suppose that we want to predict the lifetime $X_6$ of the next lamp.

The conditional p.d.f. of $X_6$, the lifetime of the next lamp, given the first five lifetimes equals the integral of $\xi(\theta \mid \boldsymbol{x})\, f(x_6 \mid \theta)$ with respect to $\theta$. The posterior p.d.f. of $\theta$ is $\xi(\theta \mid \boldsymbol{x}) = 2.633 \times 10^{36}\, \theta^8 e^{-36{,}178\theta}$ for $\theta > 0$.
So, for $x_6 > 0$,

$$\begin{aligned} f(x_6 \mid \boldsymbol{x}) &= \int_0^\infty 2.633 \times 10^{36}\, \theta^8 e^{-36{,}178\theta}\, \theta e^{-x_6 \theta}\, d\theta \\ &= 2.633 \times 10^{36} \int_0^\infty \theta^9 e^{-(x_6 + 36{,}178)\theta}\, d\theta \\ &= 2.633 \times 10^{36}\, \frac{\Gamma(10)}{(x_6 + 36{,}178)^{10}} = \frac{9.555 \times 10^{41}}{(x_6 + 36{,}178)^{10}}. \end{aligned} \tag{7.2.18}$$

We can use this p.d.f. to perform any calculation we wish concerning the distribution of $X_6$ given the observed lifetimes. For example, the probability that the sixth lamp lasts more than 3000 hours equals

$$\Pr(X_6 > 3000 \mid \boldsymbol{x}) = \int_{3000}^\infty \frac{9.555 \times 10^{41}}{(x_6 + 36{,}178)^{10}}\, dx_6 = \frac{9.555 \times 10^{41}}{9 \times 39{,}178^{9}} = 0.4882.$$

Finally, we can continue the sensitivity analysis that was started in Example 7.2.6. If it is important to know the probability that the next lifetime is at least 3000, we can see how much influence the choice of prior distribution has made on this calculation. Using the second prior distribution (gamma with parameters 1 and 1000), we found that the posterior distribution of $\theta$ was the gamma distribution with parameters 6 and 17,178. We could compute the conditional p.d.f.
of $X_6$ given the observed data in the same way as we did with the original posterior, and it would be

$$f(x_6 \mid \boldsymbol{x}) = \frac{1.542 \times 10^{26}}{(x_6 + 17{,}178)^{7}} \quad \text{for } x_6 > 0. \tag{7.2.19}$$

With this p.d.f., the probability that $X_6 > 3000$ is

$$\Pr(X_6 > 3000 \mid \boldsymbol{x}) = \int_{3000}^\infty \frac{1.542 \times 10^{26}}{(x_6 + 17{,}178)^{7}}\, dx_6 = \frac{1.542 \times 10^{26}}{6 \times 20{,}178^{6}} = 0.3807.$$

Figure 7.2 Two possible conditional p.d.f.'s, Eqs. (7.2.18) and (7.2.19) for $X_6$ given the observed data in Example 7.2.8. The two p.d.f.'s were computed using the two different posterior distributions that were derived from the two different prior distributions in Example 7.2.6. [Plot omitted: "Comparison of conditional distributions of next observation", p.d.f. against $x_6$ from 0 to 30,000.]

As we noted at the end of Example 7.2.6, the different priors make a considerable difference in the inferences that we can make. If it is important to have a precise value
of $\Pr(X_6 > 3000 \mid \boldsymbol{x})$, we need a larger sample. The two different p.d.f.'s of $X_6$ given $\boldsymbol{x}$ can be compared in Fig. 7.2. The p.d.f. from Eq. (7.2.18) is higher for intermediate values of $x_6$, while the one from Eq. (7.2.19) is higher for the extreme values of $x_6$. ◄

Summary

The prior distribution of a parameter describes our uncertainty about the parameter before observing any data. The likelihood function is the conditional p.d.f. or p.f. of the data given the parameter when regarded as a function of the parameter with the observed data plugged in. The likelihood tells us how much the data will alter our uncertainty. Large values of the likelihood correspond to parameter values where the posterior p.d.f. or p.f. will be higher than the prior. Low values of the likelihood occur at parameter values where the posterior will be lower than the prior. The posterior distribution of the parameter is the conditional distribution of the parameter given the data.
It is obtained using Bayes' theorem for random variables, which we first saw on page 148. We can predict future observations that are conditionally independent of the observed data given $\theta$ by using the conditional version of the law of total probability that we saw on page 163.

Exercises

1. Consider again the situation described in Example 7.2.8. This time, suppose that the experimenter believes that the prior distribution of $\theta$ is the gamma distribution with parameters 1 and 5000. What would this experimenter compute as the value of $\Pr(X_6 > 3000 \mid \boldsymbol{x})$?

2. Suppose that the proportion $\theta$ of defective items in a large manufactured lot is known to be either 0.1 or 0.2, and the prior p.f. of $\theta$ is as follows:

$$\xi(0.1) = 0.7 \quad \text{and} \quad \xi(0.2) = 0.3.$$

Suppose also that when eight items are selected at random from the lot, it is found that exactly two of them are defective. Determine the posterior p.f. of $\theta$.

3. Suppose that the number of defects on a roll of magnetic recording tape has a Poisson distribution for which the mean $\lambda$ is either 1.0 or 1.5, and the prior p.f.
of $\lambda$ is as follows:

$$\xi(1.0) = 0.4 \quad \text{and} \quad \xi(1.5) = 0.6.$$

If a roll of tape selected at random is found to have three defects, what is the posterior p.f. of $\lambda$?

4. Suppose that the prior distribution of some parameter $\theta$ is a gamma distribution for which the mean is 10 and the variance is 5. Determine the prior p.d.f. of $\theta$.

5. Suppose that the prior distribution of some parameter $\theta$ is a beta distribution for which the mean is 1/3 and the variance is 1/45. Determine the prior p.d.f. of $\theta$.

6. Suppose that the proportion $\theta$ of defective items in a large manufactured lot is unknown, and the prior distribution of $\theta$ is the uniform distribution on the interval $[0, 1]$. When eight items are selected at random from the lot, it is found that exactly three of them are defective. Determine the posterior distribution of $\theta$.

7. Consider again the problem described in Exercise 6, but suppose now that the prior p.d.f. of $\theta$ is as follows:

$$\xi(\theta) = \begin{cases} 2(1 - \theta) & \text{for } 0 < \theta < 1, \\ 0 & \text{otherwise.} \end{cases}$$

As in Exercise 6, suppose that in a random sample of eight items exactly three are found to be defective. Determine the posterior distribution of $\theta$.

8. Suppose that $X_1, \ldots, X_n$ form a random sample from a distribution for which the p.d.f. is $f(x \mid \theta)$, the value of $\theta$ is unknown, and the prior p.d.f. of $\theta$ is $\xi(\theta)$. Show that the posterior p.d.f. $\xi(\theta \mid \boldsymbol{x})$ is the same regardless of whether it is calculated directly by using Eq. (7.2.7) or sequentially by using Eqs. (7.2.14), (7.2.15), and (7.2.16).

9. Consider again the problem described in Exercise 6, and assume the same prior distribution of $\theta$. Suppose now, however, that instead of selecting a random sample of eight items from the lot, we perform the following experiment: Items from the lot are selected at random one by one until exactly three defectives have been found. If we find that we must select a total of eight items in this experiment, what is the posterior distribution of $\theta$ at the end of the experiment?

10. Suppose that a single observation $X$ is to be taken from the uniform distribution on the interval $[\theta - \frac{1}{2}, \theta + \frac{1}{2}]$, the value of $\theta$ is unknown, and the prior distribution of $\theta$ is the uniform distribution on the interval $[10, 20]$. If the observed value of $X$ is 12, what is the posterior distribution of $\theta$?

11. Consider again the conditions of Exercise 10, and assume the same prior distribution of $\theta$. Suppose now, however, that six observations are selected at random from the uniform distribution on the interval $[\theta - \frac{1}{2}, \theta + \frac{1}{2}]$, and their values are 11.0, 11.5, 11.7, 11.1, 11.4, and 10.9. Determine the posterior distribution of $\theta$.

7.3 Conjugate Prior Distributions

For each of the most popular statistical models, there exists a family of distributions for the parameter with a very special property. If the prior distribution is chosen to be a member of that family, then the posterior distribution will also be a member of
that family. Such a family of distributions is called a conjugate family. Choosing a prior distribution from a conjugate family will typically make it particularly simple to calculate the posterior distribution.

Sampling from a Bernoulli Distribution

Example 7.3.1 A Clinical Trial. In Example 5.8.5 (page 330), we were observing patients in a clinical trial. The proportion $P$ of successful outcomes among all possible patients was a random variable for which we chose a distribution from the family of beta distributions. This choice made the calculation of the conditional distribution of $P$ given the observed data very simple at the end of that example. Indeed, the conditional distribution of $P$ given the data was another member of the beta family. ◄

That the result in Example 7.3.1 occurs in general is the subject of the next theorem.

Theorem 7.3.1 Suppose that $X_1, \ldots, X_n$ form a random sample from the Bernoulli distribution with parameter $\theta$, which is unknown $(0 < \theta < 1)$.
Suppose also that the prior distribution 7.3 Distribuigées Anteriores Conjugadas 395 7.3 Conjugate Prior Distributions 395 deéé a distribuigdo beta com pardmetrosa >0 ef >0. Entdo a dis- of @ is the beta distribution with parameters w > 0 and 6 > 0. Then the posterior dis- tribugao de @dado queX se XeuleL=1, ...,na distribuigdo beta com pardmetros tribution of 6 given that X; = x; (i =1,..., ) is the beta distribution with parameters a+ 1 Xeve B+ N- cust Xeu, a+ yor x; and B+n— SO") x. O Teorema 7.3.1 € apenas uma reformulacdo do Teorema 5.8.2 (pagina 329), e sua prova é Theorem 7.3.1 is just a restatement of Theorem 5.8.2 (page 329), and its proof is essencialmente o calculo do Exemplo 5.8.3. essentially the calculation in Example 5.8.3. Atualizando a Distribuigao PosteriorJUma implicagdo do Teorema 7.3.1 é a seguinte: Updating the Posterior Distribution One implication of Theorem 7.3.1 is the fol- Suponha que a propor¢do Ade itens defeituosos em uma remessa grande é desconhecida, lowing: Suppose that the proportion 6 of defective items in a large shipment is un- a distribuigdo prévia deGé a distribuigdo beta com parametrosaef, enos itens sdo known, the prior distribution of 6 is the beta distribution with parameters a and 6, selecionados um de cada vez aleatoriamente na remessa e inspecionados. Suponha que and n items are selected one at a time at random from the shipment and inspected. os itens sejam condicionalmente independentes, dado@. Se o primeiro item inspecionado Assume that the items are conditionally independent given 0. If the first item in- apresentar defeito, a distribuigdo posterior do@sera a distribuigdo beta com pardmetrosat spected is defective, the posterior distribution of 6 will be the beta distribution with 1 eB. Se o primeiro item nao for defeituoso, a distribuigdéo posterior sera a distribuigdo parameters a + 1 and £. If the first item is nondefective, the posterior distribution beta com parametrosaef+ 1. 
The process can be continued in the following way: Each time an item is inspected, the current posterior beta distribution of $\theta$ is changed to a new beta distribution in which the value of either the parameter $\alpha$ or the parameter $\beta$ is increased by one unit. The value of $\alpha$ is increased by one unit each time a defective item is found, and the value of $\beta$ is increased by one unit each time a nondefective item is found.

Definition 7.3.1 Conjugate Family/Hyperparameters. Let $X_1, X_2, \ldots$ be conditionally i.i.d. given $\theta$ with common p.f. or p.d.f. $f(x|\theta)$. Let $\Psi$ be a family of possible distributions over the parameter space $\Omega$. Suppose that, no matter which prior distribution $\xi$ we choose from $\Psi$, no matter how many observations $\boldsymbol{X} = (X_1, \ldots, X_n)$ we observe, and no matter what are their observed values $\boldsymbol{x} = (x_1, \ldots, x_n)$, the posterior distribution $\xi(\theta|\boldsymbol{x})$ is a member of $\Psi$. Then $\Psi$ is called a conjugate family of prior distributions for samples from the distributions $f(x|\theta)$.
It is also said that the family $\Psi$ is closed under sampling from the distributions $f(x|\theta)$. Finally, if the distributions in $\Psi$ are parametrized by further parameters, then the associated parameters for the prior distribution are called the prior hyperparameters and the associated parameters of the posterior distribution are called the posterior hyperparameters.

Theorem 7.3.1 says that the family of beta distributions is a conjugate family of prior distributions for samples from a Bernoulli distribution. If the prior distribution of $\theta$ is a beta distribution, then the posterior distribution at each stage of sampling will also be a beta distribution, regardless of the observed values in the sample. Also, the family of beta distributions is closed under sampling from Bernoulli distributions.

The parameters $\alpha$ and $\beta$ in Theorem 7.3.1 are the prior hyperparameters.
The corresponding parameters of the posterior distributions ($\alpha + \sum_{i=1}^{n} x_i$ and $\beta + n - \sum_{i=1}^{n} x_i$) are the posterior hyperparameters. The statistic $\sum_{i=1}^{n} X_i$ is needed to compute the posterior distribution, hence it will be needed to perform any inference based on the posterior distribution. Exercises 23 and 24 introduce a general collection of p.d.f.'s $f(x|\theta)$ for which conjugate families of priors exist. Most of the familiar named distributions are covered by these exercises. The various uniform distributions are notable exceptions.

Example 7.3.2 The Variance of the Posterior Beta Distribution. Suppose that the proportion $\theta$ of defective items in a large shipment is unknown, the prior distribution of $\theta$ is the uniform distribution on the interval [0, 1], and items are to be selected at random from the shipment and inspected until the variance of the posterior distribution of $\theta$ has been reduced to the value 0.01 or less. We shall determine the total number of defective and nondefective items that must be obtained before the sampling process is stopped.

As stated in Sec. 5.8, the uniform distribution on the interval [0, 1] is the beta distribution with parameters 1 and 1. Therefore, after $y$ defective items and $z$ nondefective items have been obtained, the posterior distribution of $\theta$ will be the beta distribution with $\alpha = y + 1$ and $\beta = z + 1$. It was shown in Theorem 5.8.3 that the variance of the beta distribution with parameters $\alpha$ and $\beta$ is $\alpha\beta/[(\alpha + \beta)^2(\alpha + \beta + 1)]$. Therefore, the variance $V$ of the posterior distribution of $\theta$ will be

$$V = \frac{(y + 1)(z + 1)}{(y + z + 2)^2 (y + z + 3)}.$$

Sampling is to stop as soon as the number of defectives $y$ and the number of nondefectives $z$ that have been obtained are such that $V \le 0.01$. It can be shown (see Exercise 2) that it will not be necessary to select more than 22 items, but it is necessary to select at least seven items. ◀

Example 7.3.3 Glove Use by Nurses. Friedland et al. (1992) studied 23 nurses in an inner-city hospital before and after an educational program on the importance of wearing gloves. They recorded whether or not the nurses wore gloves during procedures in which they might come in contact with bodily fluids. Before the educational program the nurses were observed during 51 procedures, and they wore gloves in only 13 of them. Let $\theta$ be the probability that a nurse will wear gloves two months after the educational program. We might be interested in how $\theta$ compares to 13/51, the observed proportion before the program.

We shall consider two different prior distributions for $\theta$ in order to see how sensitive the posterior distribution of $\theta$ is to the choice of prior distribution. The first prior distribution will be uniform on the interval [0, 1], which is also the beta distribution with parameters 1 and 1. The second prior distribution will be the beta distribution with parameters 13 and 38. This second prior distribution has much smaller variance than the first and has its mean at 13/51.
Someone holding the second prior distribution believes fairly strongly that the educational program will have no noticeable effect.

Two months after the educational program, 56 procedures were observed with the nurses wearing gloves in 50 of them. The posterior distribution of $\theta$, based on the first prior, would then be the beta distribution with parameters 1 + 50 = 51 and 1 + 6 = 7. In particular, the posterior mean of $\theta$ is 51/(51 + 7) = 0.88, and the posterior probability that $\theta > 2 \times 13/51$ is essentially 1. Based on the second prior, the posterior distribution would be the beta distribution with parameters 13 + 50 = 63 and 38 + 6 = 44. The posterior mean would be 0.59, and the posterior probability that $\theta > 2 \times 13/51$ is 0.95. So, even to someone who was initially skeptical, the educational program seems to have been quite effective. The probability is quite high that nurses are at least twice as likely to wear gloves after the program as they were before.

Figure 7.3 shows the p.d.f.'s of both of the posterior distributions computed above. The distributions are clearly very different. For example, the first posterior gives probability greater than 0.99 that $\theta > 0.7$, while the second gives probability less than 0.001 to $\theta > 0.7$. However, since we are only interested in the probability that $\theta > 2 \times 13/51 = 0.5098$, we see that both posteriors agree that this probability is quite large. ◀
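The beta updates in Examples 7.3.2 and 7.3.3 can be checked numerically. The following is a minimal Python sketch added for illustration, not code from the text. The helper names (`beta_update`, `beta_mean`, `beta_variance`, `beta_upper_tail`, `stops`) are hypothetical, the tail probability is approximated with a plain midpoint rule rather than an exact beta c.d.f., and the search range of 50 items is an arbitrary assumption.

```python
import math
from fractions import Fraction


def beta_update(alpha, beta, successes, failures):
    """Posterior hyperparameters under Theorem 7.3.1 (beta prior, Bernoulli data)."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean alpha / (alpha + beta) of the beta distribution."""
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    """Exact variance alpha*beta / [(alpha+beta)^2 (alpha+beta+1)] for integer parameters."""
    return Fraction(alpha * beta, (alpha + beta) ** 2 * (alpha + beta + 1))


def beta_upper_tail(alpha, beta, c, steps=20000):
    """Approximate Pr(theta > c) with a midpoint rule on the interval (c, 1).

    Assumption: a simple numerical integral is accurate enough for this check.
    """
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    width = (1.0 - c) / steps
    total = 0.0
    for k in range(steps):
        t = c + (k + 0.5) * width  # midpoint, strictly inside (0, 1)
        total += math.exp(log_norm + (alpha - 1) * math.log(t)
                          + (beta - 1) * math.log(1.0 - t))
    return total * width


def stops(y, z):
    """Stopping rule of Example 7.3.2: uniform prior, y defectives, z nondefectives."""
    return beta_variance(*beta_update(1, 1, y, z)) <= Fraction(1, 100)


# Example 7.3.2: smallest sample size at which stopping is possible for some
# outcome, and smallest sample size at which every outcome satisfies V <= 0.01.
earliest = min(n for n in range(1, 50) if any(stops(y, n - y) for y in range(n + 1)))
latest = min(n for n in range(1, 50) if all(stops(y, n - y) for y in range(n + 1)))
print(earliest, latest)  # 7 22

# Example 7.3.3: gloves worn in 50 of 56 procedures (50 successes, 6 failures).
threshold = 2 * 13 / 51  # twice the proportion observed before the program
for name, prior in [("uniform", (1, 1)), ("Beta(13, 38)", (13, 38))]:
    a, b = beta_update(*prior, successes=50, failures=6)
    print(name, (a, b), round(beta_mean(a, b), 2),
          round(beta_upper_tail(a, b, threshold), 2))
```

Exact fractions are used for the stopping rule because the boundary case at 22 items gives a variance of exactly 0.01, which a floating-point comparison could in principle misjudge.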
Figure 7.3 Posterior p.d.f.'s in Example 7.2.6. The curves are labeled by the prior that led to the corresponding posterior. (Plot not reproduced; the horizontal axis runs from 0 to 1.0, and one curve is labeled "Beta (13, 38) prior".)

Sampling from a Poisson Distribution

Example 7.3.4 Customer Arrivals. A store owner models customer arrivals as a Poisson process with unknown rate $\theta$ per hour. She assigns $\theta$ a gamma prior distribution with parameters 3 and 2.
Let $X$ be the number of customers that arrive in a specific one-hour period. If $X = 3$ is observed, the store owner wants to update the distribution of $\theta$. ◀

When samples are taken from a Poisson distribution, the family of gamma distributions is a conjugate family of prior distributions. This relationship is shown in the next theorem.

Theorem 7.3.2 Suppose that $X_1, \ldots, X_n$ form a random sample from the Poisson distribution with mean $\theta > 0$, and $\theta$ is unknown. Suppose also that the prior distribution of $\theta$ is the gamma distribution with parameters $\alpha > 0$ and $\beta > 0$. Then the posterior distribution of $\theta$, given that $X_i = x_i$ ($i = 1, \ldots, n$), is the gamma distribution with parameters $\alpha + \sum_{i=1}^{n} x_i$ and $\beta + n$.

Proof Let $y = \sum_{i=1}^{n} x_i$. Then the likelihood function $f_n(\boldsymbol{x}|\theta)$ satisfies the relation

$$f_n(\boldsymbol{x}|\theta) \propto e^{-n\theta}\theta^{y}.$$

In this relation, a factor that involves $\boldsymbol{x}$ but does not depend on $\theta$ has been dropped from the right side. Furthermore, the prior p.d.f. of $\theta$ has the form

$$\xi(\theta) \propto \theta^{\alpha - 1} e^{-\beta\theta} \quad \text{for } \theta > 0.$$
Since the posterior p.d.f. $\xi(\theta|\boldsymbol{x})$ is proportional to $f_n(\boldsymbol{x}|\theta)\xi(\theta)$, it follows that

$$\xi(\theta|\boldsymbol{x}) \propto \theta^{\alpha + y - 1} e^{-(\beta + n)\theta} \quad \text{for } \theta > 0.$$

The right side of this relation can be recognized as being, except for a constant factor, the p.d.f. of the gamma distribution with parameters $\alpha + y$ and $\beta + n$. Therefore, the posterior distribution of $\theta$ is as specified in the theorem. ∎

In Theorem 7.3.2, the numbers $\alpha$ and $\beta$ are the prior hyperparameters, while $\alpha + \sum_{i=1}^{n} x_i$ and $\beta + n$ are the posterior hyperparameters. Note that the statistic $Y = \sum_{i=1}^{n} X_i$ is used to compute the posterior distribution of $\theta$, and hence it will be part of any inference based on the posterior.

Example 7.3.5 Customer Arrivals. In Example 7.3.4, we can apply Theorem 7.3.2 with $n = 1$, $\alpha = 3$, $\beta = 2$, and $x_1 = 3$. The posterior distribution of $\theta$ given $X = 3$ is the gamma distribution with parameters 6 and 3. ◀
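The update in Theorem 7.3.2 amounts to adding the observed counts to $\alpha$ and the number of observations to $\beta$. The short Python sketch below is an illustration added here, not code from the text. The function names are hypothetical, the gamma distribution is taken in the rate parametrization (mean $\alpha/\beta$), and the counts `[3, 1, 4]` are made-up data used only to show that updating one observation at a time agrees with a single batch update.

```python
def gamma_update(alpha, beta, observations):
    """Posterior hyperparameters under Theorem 7.3.2 (gamma prior, Poisson data)."""
    return alpha + sum(observations), beta + len(observations)


def gamma_mean(alpha, beta):
    """Mean alpha / beta of the gamma distribution (rate parametrization assumed)."""
    return alpha / beta


# Example 7.3.5: prior gamma(3, 2), one observed hour with X = 3 customers.
alpha1, beta1 = gamma_update(3, 2, [3])
print(alpha1, beta1)              # 6 3
print(gamma_mean(alpha1, beta1))  # 2.0

# Hypothetical data: three hourly counts, processed one at a time.
a, b = 3, 2
for count in [3, 1, 4]:
    a, b = gamma_update(a, b, [count])
print((a, b) == gamma_update(3, 2, [3, 1, 4]))  # True
```

The last line reflects the fact that the gamma family is closed under sampling: each intermediate posterior is itself a valid gamma prior for the next observation.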
Example 7.3.6 The Variance of the Posterior Gamma Distribution. Consider a Poisson distribution for which the mean $\theta$ is unknown, and suppose that the prior p.d.f. of $\theta$ is as follows:

$$\xi(\theta) = \begin{cases} 2e^{-2\theta} & \text{for } \theta > 0, \\ 0 & \text{for } \theta \le 0. \end{cases}$$

Suppose also that observations are to be taken at random from the given Poisson distribution until the variance of the posterior distribution of $\theta$ has been reduced to the value 0.01 or less. We shall determine the number of observations that must be taken before the sampling process is stopped.

The given prior p.d.f. $\xi(\theta)$ is the p.d.f. of the gamma distribution with prior hyperparameters $\alpha = 1$ and $\beta = 2$. Therefore, after we have obtained $n$ observed values $x_1, \ldots, x_n$, the sum of which is $y = \sum_{i=1}^{n} x_i$, the posterior distribution of $\theta$ will be the gamma distribution with posterior hyperparameters $y + 1$ and $n + 2$. It was shown in Theorem 5.4.2 that the variance of the gamma distribution with parameters $\alpha$ and $\beta$ is $\alpha/\beta^2$. Therefore, the variance $V$ of the posterior distribution of $\theta$ will be

$$V = \frac{y + 1}{(n + 2)^2}.$$

Sampling is to stop as soon as the sequence of observed values $x_1, \ldots, x_n$ is such that $V \le 0.01$. Unlike Example 7.3.2, there is no uniform bound on how large $n$ needs to be because $y$ can be arbitrarily large no matter what $n$ is. Clearly, it takes at least $n = 8$ observations before $V \le 0.01$. ◀

Sampling from a Normal Distribution

Example 7.3.7 Automobile Emissions. Consider again the sampling of automobile emissions, in particular oxides of nitrogen, described in Example 5.6.1 on page 302. Prior to observing the data, suppose that an engineer believed that each emissions measurement had the normal distribution with mean $\theta$ and standard deviation 0.5 but that $\theta$ was unknown. The engineer's uncertainty about $\theta$ might be described by another normal distribution with mean 2.0 and standard deviation 1.0. After seeing the data in Fig. 5.1, how would this engineer describe her uncertainty about $\theta$? ◀

When samples are taken from a normal distribution for which the value of the mean $\theta$ is unknown but the value of the variance $\sigma^2$ is known, the family of normal distributions is itself a conjugate family of prior distributions, as is shown in the next theorem.

Theorem 7.3.3 Suppose that $X_1, \ldots, X_n$ form a random sample from a normal distribution for which the value of the mean $\theta$ is unknown and the value of the variance $\sigma^2 > 0$ is known. Suppose also that the prior distribution of $\theta$ is the normal distribution with mean $\mu_0$ and variance $v_0^2$. Then the posterior distribution of $\theta$ given that $X_i = x_i$ ($i = 1, \ldots, n$) is the normal distribution with mean $\mu_1$ and variance $v_1^2$, where

$$\mu_1 = \frac{\sigma^2 \mu_0 + n v_0^2 \bar{x}_n}{\sigma^2 + n v_0^2} \tag{7.3.1}$$

and

$$v_1^2 = \frac{\sigma^2 v_0^2}{\sigma^2 + n v_0^2}. \tag{7.3.2}$$

Proof The likelihood function $f_n(\boldsymbol{x}|\theta)$ has the form

$$f_n(\boldsymbol{x}|\theta) \propto \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \theta)^2\right].$$

Here a constant factor has been dropped from the right side. The method of completing the square (see Exercise 24 in Sec. 5.6) tells us that

$$\sum_{i=1}^{n} (x_i - \theta)^2 = n(\theta - \bar{x}_n)^2 + \sum_{i=1}^{n} (x_i - \bar{x}_n)^2.$$

By omitting a factor that involves $x_1, \ldots, x_n$ but does not depend on $\theta$, we may rewrite $f_n(\boldsymbol{x}|\theta)$ in the following form:

$$f_n(\boldsymbol{x}|\theta) \propto \exp\left[-\frac{n}{2\sigma^2} (\theta - \bar{x}_n)^2\right].$$

Since the prior p.d.f. $\xi(\theta)$ has the form

$$\xi(\theta) \propto \exp\left[-\frac{1}{2v_0^2} (\theta - \mu_0)^2\right],$$

it follows that the posterior p.d.f. $\xi(\theta|\boldsymbol{x})$ satisfies the relation

$$\xi(\theta|\boldsymbol{x}) \propto \exp\left\{-\frac{1}{2}\left[\frac{n}{\sigma^2} (\theta - \bar{x}_n)^2 + \frac{1}{v_0^2} (\theta - \mu_0)^2\right]\right\}.$$

If $\mu_1$ and $v_1^2$ are as specified in Eqs. (7.3.1) and (7.3.2), completing the square again establishes the following identity:

$$\frac{n}{\sigma^2} (\theta - \bar{x}_n)^2 + \frac{1}{v_0^2} (\theta - \mu_0)^2 = \frac{1}{v_1^2} (\theta - \mu_1)^2 + \frac{n}{\sigma^2 + n v_0^2} (\bar{x}_n - \mu_0)^2.$$

Since the final term on the right side of this equation does not involve $\theta$, it can be absorbed in the proportionality factor, and we obtain the relation

$$\xi(\theta|\boldsymbol{x}) \propto \exp\left[-\frac{1}{2v_1^2} (\theta - \mu_1)^2\right].$$

The right side of this relation can be recognized as being, except for a constant factor, the p.d.f. of the normal distribution with mean $\mu_1$ and variance $v_1^2$. Therefore, the posterior distribution of $\theta$ is as specified in the theorem. ∎
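The completing-the-square identity in the proof of Theorem 7.3.3 can be checked by expanding both sides as quadratics in $\theta$ and matching coefficients. The following worked check is a sketch added for illustration; it uses only Eqs. (7.3.1) and (7.3.2).

```latex
\begin{align*}
\text{coefficient of } \theta^2: &\quad
  \frac{n}{\sigma^2} + \frac{1}{v_0^2}
  = \frac{\sigma^2 + n v_0^2}{\sigma^2 v_0^2}
  = \frac{1}{v_1^2}, \\
\text{coefficient of } -2\theta: &\quad
  \frac{n \bar{x}_n}{\sigma^2} + \frac{\mu_0}{v_0^2}
  = \frac{\sigma^2 \mu_0 + n v_0^2 \bar{x}_n}{\sigma^2 v_0^2}
  = \frac{\mu_1}{v_1^2}, \\
\text{constant term}: &\quad
  \frac{n \bar{x}_n^2}{\sigma^2} + \frac{\mu_0^2}{v_0^2} - \frac{\mu_1^2}{v_1^2}
  = \frac{n}{\sigma^2 + n v_0^2} (\bar{x}_n - \mu_0)^2.
\end{align*}
```

The first two lines show that the terms involving $\theta$ agree on both sides, and the third line is the leftover constant, which is exactly the final term absorbed into the proportionality factor.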
In Theorem 7.3.3, the numbers $\mu_0$ and $v_0^2$ are the prior hyperparameters, while $\mu_1$ and $v_1^2$ are the posterior hyperparameters. Notice that the statistic $\bar{X}_n$ is used in the construction of the posterior distribution, and hence will play a role in any inference based on the posterior.

Example 7.3.8 Automobile Emissions. We can apply Theorem 7.3.3 to answer the question at the end of Example 7.3.7. In the notation of the theorem, we have $n = 46$, $\sigma^2 = 0.5^2 = 0.25$, $\mu_0 = 2$, and $v_0^2 = 1.0$. The average of the 46 measurements is $\bar{x}_n = 1.329$. The posterior distribution of $\theta$ is then the normal distribution with mean and variance given by

$$\mu_1 = \frac{0.25 \times 2 + 46 \times 1 \times 1.329}{0.25 + 46 \times 1} = 1.333,$$

$$v_1^2 = \frac{0.25 \times 1}{0.25 + 46 \times 1} = 0.0054. \quad ◀$$

The mean $\mu_1$ of the posterior distribution of $\theta$, as given in Eq. (7.3.1), can be rewritten as follows:

$$\mu_1 = \frac{\sigma^2}{\sigma^2 + n v_0^2}\, \mu_0 + \frac{n v_0^2}{\sigma^2 + n v_0^2}\, \bar{x}_n. \tag{7.3.3}$$

It can be seen from Eq. (7.3.3) that $\mu_1$ is a weighted average of the mean $\mu_0$ of the prior distribution and the sample mean $\bar{x}_n$. Furthermore, it can be seen that the relative weight given to $\bar{x}_n$ satisfies the following three properties: (1) For fixed values of $v_0^2$ and $\sigma^2$, the larger the sample size $n$, the greater will be the relative weight that is given to $\bar{x}_n$. (2) For fixed values of $v_0^2$ and $n$, the larger the variance $\sigma^2$ of each observation in the sample, the smaller will be the relative weight that is given to $\bar{x}_n$.
(3) For fixed values of $\sigma^2$ and $n$, the larger the variance $v_0^2$ of the prior distribution, the larger will be the relative weight that is given to $\bar{x}_n$.

Moreover, it can be seen from Eq. (7.3.2) that the variance $v_1^2$ of the posterior distribution of $\theta$ depends on the number $n$ of observations that have been taken but does not depend on the magnitudes of the observed values. Suppose, therefore, that a random sample of $n$ observations is to be taken from a normal distribution for which the value of the mean $\theta$ is unknown, the value of the variance is known, and the prior distribution of $\theta$ is a specified normal distribution. Then, before any observations have been taken, we can use Eq. (7.3.2) to calculate the actual value of the variance $v_1^2$ of the posterior distribution. However, the value of the mean $\mu_1$ of the posterior distribution will depend on the observed values that are obtained in the sample. The fact that the variance of the posterior distribution depends only on the number of observations is due to the assumption that the variance $\sigma^2$ of the individual observations is known. In Sec. 8.6, we shall relax this assumption.

Example 7.3.9 The Variance of the Posterior Normal Distribution. Suppose that observations are to be taken at random from the normal distribution with mean $\theta$ and variance 1, and that $\theta$ is unknown. Assume that the prior distribution of $\theta$ is a normal distribution with variance 4. Also, observations are to be taken until the variance of the posterior distribution of $\theta$ has been reduced to the value 0.01 or less. We shall determine the number of observations that must be taken before the sampling process is stopped.

It follows from Eq. (7.3.2) that after $n$ observations have been taken, the variance $v_1^2$ of the posterior distribution of $\theta$ will be

$$v_1^2 = \frac{4}{4n + 1}.$$

Therefore, the relation $v_1^2 \le 0.01$ will be satisfied if and only if $n \ge 99.75$. Hence, the relation $v_1^2 \le 0.01$ will be satisfied after 100 observations have been taken and not before then. ◀
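The normal-normal update of Theorem 7.3.3 and the stopping calculation of Example 7.3.9 are easy to reproduce numerically. The Python sketch below is an illustration added here, not code from the text. The function and argument names are hypothetical, and the sample size is found by a simple loop over $n$ rather than by solving the inequality in closed form.

```python
def normal_update(mu0, v0_sq, sigma_sq, n, xbar):
    """Posterior mean and variance from Eqs. (7.3.1) and (7.3.2)."""
    denom = sigma_sq + n * v0_sq
    mu1 = (sigma_sq * mu0 + n * v0_sq * xbar) / denom
    v1_sq = sigma_sq * v0_sq / denom
    return mu1, v1_sq


def observations_needed(v0_sq, sigma_sq, target):
    """Smallest n for which the posterior variance in Eq. (7.3.2) is <= target."""
    n = 0
    while sigma_sq * v0_sq / (sigma_sq + n * v0_sq) > target:
        n += 1
    return n


# Example 7.3.8: sigma^2 = 0.25, mu_0 = 2, v_0^2 = 1, n = 46, sample mean 1.329.
mu1, v1_sq = normal_update(mu0=2.0, v0_sq=1.0, sigma_sq=0.25, n=46, xbar=1.329)
print(round(mu1, 3), round(v1_sq, 4))  # 1.333 0.0054

# Example 7.3.9: sigma^2 = 1, v_0^2 = 4, target posterior variance 0.01.
print(observations_needed(v0_sq=4.0, sigma_sq=1.0, target=0.01))  # 100
```

Note that `observations_needed` never looks at the data, which matches the remark that the posterior variance depends only on $n$ when $\sigma^2$ is known.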
Example 7.3.10 Calorie Counts on Food Labels. Allison, Heshka, Sepulveda, and Heymsfield (1993) sampled 20 nationally prepared foods and compared the stated calorie contents per gram from the labels to calorie contents determined in the laboratory.

Figure 7.4 Histogram of percentage differences between observed and advertised calories in Example 7.3.10. (Histogram not reproduced; horizontal axis: laboratory calories minus label calories, from -30 to 20; vertical axis: number of foods.)
Figure 7.4 is um histograma das diferengas percentuais entre as medicées de calorias observadas a histogram of the percentage differences between the observed laboratory calorie em laboratério e o conteudo calérico anunciado nos rétulos dos alimentos. Suponha measurements and the advertised calorie contents on the labels of the foods. Suppose que modelemos a distribuigdo condicional das diferengas dadas@como a distribuicdo that we model the conditional distribution of the differences given 6 as the normal normal com média@e varidncia 100. (Nesta segdo, assumimos que a varidncia é distribution with mean @ and variance 100. (In this section, we assume that the conhecida. Na Secdo 8.6, seremos capazes de lidar com 0 caso em que a médiae a variance is known. In Sec. 8.6, we will be able to deal with the case in which the variancia sdo tratadas como variaveis aleatérias com uma distribuigdo conjunta.) mean and the variance are treated as random variables with a joint distribution.) We usara uma distribuigdo prévia paraGessa é a distribuigdo normal com média 0 e will use a prior distribution for 6 that is the normal distribution with mean 0 and a variancia de 60. Os dadosXcompreendem a colecao de 20 diferengas na Fig. 7.4, cuja variance of 60. The data X comprise the collection of 20 differences in Fig. 7.4, whose média é€ 0.125. A distribuigdo posterior de@seria entdo a distribuigdo normal com average is 0.125. The posterior distribution of 6 would then be the normal distribution média with mean 100x0 + 20x60x0.125 100 x 0 +20 x 60 x 0.125 h= ————__—____———-_ =0.1154, y= SOT KOU KOO 0.1154, 100 + 20x60 100 + 20 x 60 e variagdo and variance 100x60 100 x 60 v= ————___- =4,62. vt = ———— = 4.62. 100 + 20x60 100 + 20 x 60 Por exemplo, podemos estar interessados em saber se os embaladores estdo ou ndo a subestimar For example, we might be interested in whether or not the packagers are system- sistematicamente as calorias dos seus alimentos em pelo menos 1 por cento. 
Isto corresponderia a@ > atically understating the calories in their food by at least 1 percent. This would 1. Usando 0 Teorema 5.6.6, podemos encontrar correspond to 6 > 1. Using Theorem 5.6.6, we can find ( ) 1-9.1154 1—0.1154 Pr.(@ >1|XF1 - 1a pita) =1 -(1.12)-0.3403. Pr@ > 1\x) =1-© (=) = 1- (1.12) = 0.3403. 4.62 V4.62 Ha uma chance ndo negligenciavel, mas nao esmagadora, de que os embaladores estejam reduzindo There is a nonnegligible, but not overwhelming, chance that the packagers are uma porcentagem ou mais de seus rétulos. - shaving a percent or more off of their labels. < Amostragem de uma distribuigdo exponencial Sampling from an Exponential Distribution Exemplo Vida util dos componentes eletr6énicos.No Exemplo 7.2.1, suponha que observamos o Example Lifetimes of Electronic Components. In Example 7.2.1, suppose that we observe the 7.3.11 vida Util de tr€s componentes, X1= 3,X2= 1.5, eX3= 2.1. Estas foram modeladas como 7.3.11 lifetimes of three components, X; = 3, X7 = 1.5, and X3 = 2.1. These were modeled variaveis aleatorias exponenciais iid fornecidas@. Nossa distribuigdo prévia paraGfoi a as i.i.d. exponential random variables given @. Our prior distribution for 6 was the distribuigdo gama com os parametros 1 e 2. Qual é a distribuigdo posterior de@ dadas gamma distribution with parameters 1 and 2. What is the posterior distribution of @ essas vidas observadas? - given these observed lifetimes? < 402 Capitulo 7 Estimativa 402 Chapter 7 Estimation Ao amostrar de uma distribuigéo exponencial para a qual o valor do parametro6é When sampling from an exponential distribution for which the value of the desconhecido, a familia de distribuig6es gama serve como uma familia conjugada de parameter @ is unknown, the family of gamma distributions serves as a conjugate distribuigées anteriores, conforme mostrado no préximo teorema. family of prior distributions, as shown in the next theorem. 
Theorem 7.3.4 Suppose that $X_1, \ldots, X_n$ form a random sample from the exponential distribution with parameter $\theta > 0$ that is unknown. Suppose also that the prior distribution of $\theta$ is the gamma distribution with parameters $\alpha > 0$ and $\beta > 0$. Then the posterior distribution of $\theta$ given that $X_i = x_i$ $(i = 1, \ldots, n)$ is the gamma distribution with parameters $\alpha + n$ and $\beta + \sum_{i=1}^{n} x_i$.

Proof Again, let $y = \sum_{i=1}^{n} x_i$. Then the likelihood function $f_n(x \mid \theta)$ is
$$f_n(x \mid \theta) = \theta^n e^{-\theta y}.$$
Also, the prior p.d.f. $\xi(\theta)$ has the form
$$\xi(\theta) \propto \theta^{\alpha - 1} e^{-\beta\theta} \quad \text{for } \theta > 0.$$
It follows, therefore, that the posterior p.d.f. $\xi(\theta \mid x)$ has the form
$$\xi(\theta \mid x) \propto \theta^{\alpha + n - 1} e^{-(\beta + y)\theta} \quad \text{for } \theta > 0.$$
The right side of this relation can be recognized as being, except for a constant factor, the p.d.f. of the gamma distribution with parameters $\alpha + n$ and $\beta + y$. Therefore, the posterior distribution of $\theta$ is as specified in the theorem. ∎
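The proportionality argument in the proof can be checked numerically. The sketch below uses the lifetimes and the gamma prior with parameters 1 and 2 from Example 7.3.11. The helper names `gamma_pdf`, `posterior_hyperparameters`, and `unnormalized_posterior` are illustrative assumptions, not notation from the text. It multiplies the likelihood by the prior p.d.f. and confirms that the ratio to the claimed posterior gamma p.d.f. is the same constant at several values of $\theta$.

```python
import math


def gamma_pdf(theta, alpha, beta):
    """p.d.f. of the gamma distribution with parameters alpha and beta (rate)."""
    return beta**alpha / math.gamma(alpha) * theta ** (alpha - 1) * math.exp(-beta * theta)


def posterior_hyperparameters(alpha, beta, lifetimes):
    """Theorem 7.3.4: the posterior is gamma with parameters alpha + n and beta + sum(x_i)."""
    return alpha + len(lifetimes), beta + sum(lifetimes)


def unnormalized_posterior(theta, alpha, beta, lifetimes):
    """Likelihood theta^n * exp(-theta * y) multiplied by the prior p.d.f."""
    n = len(lifetimes)
    y = sum(lifetimes)
    return theta**n * math.exp(-theta * y) * gamma_pdf(theta, alpha, beta)


lifetimes = [3.0, 1.5, 2.1]  # observed lifetimes from Example 7.3.11
alpha1, beta1 = posterior_hyperparameters(1.0, 2.0, lifetimes)

# If the theorem holds, this ratio does not depend on theta.
ratios = [
    unnormalized_posterior(t, 1.0, 2.0, lifetimes) / gamma_pdf(t, alpha1, beta1)
    for t in (0.1, 0.5, 1.0, 2.0)
]
print(alpha1, round(beta1, 6), ratios)
```

The constant ratio is the normalizing factor that the proof ignores. Only the exponents of $\theta$ and $e$ are needed to identify the posterior family.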
The posterior distribution of $\theta$ in Theorem 7.3.4 depends on the observed value of the statistic $Y = \sum_{i=1}^{n} X_i$; hence, every inference about $\theta$ based on the posterior distribution will depend on the observed value of $Y$.

Example 7.3.12 Lifetimes of Electronic Components. In Example 7.3.11, we can apply Theorem 7.3.4 to find the posterior distribution. In the notation of the theorem and its proof, we have $n = 3$, $\alpha = 1$, $\beta = 2$, and
$$y = \sum_{i=1}^{n} x_i = 3 + 1.5 + 2.1 = 6.6.$$
The posterior distribution of $\theta$ is then the gamma distribution with parameters $1 + 3 = 4$ and $2 + 6.6 = 8.6$. ◀

The reader should note that Theorem 7.3.4 would have greatly shortened the derivation of the posterior distribution in Example 7.2.6.

Improper Prior Distributions

In Sec. 7.2, we mentioned improper priors as expedients that try to capture the idea that there is much more information in the data than is captured in our prior distribution. Each of the conjugate families that we have seen in this section has an improper prior as a limiting case.
Example 7.3.13 A Clinical Trial. What we illustrate here will apply to all examples in which the data comprise a conditionally i.i.d. sample (given $\theta$) from the Bernoulli distribution with parameter $\theta$. Consider the subjects in the imipramine group in Example 2.1.4. The proportion of successes among all patients who might get imipramine had been called $P$ in earlier examples, but let us call it $\theta$ this time in keeping with the general notation of this chapter. Suppose that $\theta$ has the beta distribution with parameters $\alpha$ and $\beta$, a general conjugate prior. There are $n = 40$ patients in the imipramine group, and 22 of them are successes. The posterior distribution of $\theta$ is the beta distribution with parameters $\alpha + 22$ and $\beta + 18$, as we saw in Theorem 7.3.1. The mean of the posterior distribution is $(\alpha + 22)/(\alpha + \beta + 40)$. If $\alpha$ and $\beta$ are small, then the posterior mean is close to 22/40, which is the observed proportion of successes. Indeed, if $\alpha = \beta = 0$, which does not correspond to a real beta distribution, then the posterior mean is exactly 22/40. However, we can look at what happens as $\alpha$ and $\beta$ get close to 0. The beta p.d.f. (ignoring the constant factor) is $\theta^{\alpha-1}(1 - \theta)^{\beta-1}$. We can set $\alpha = \beta = 0$ and pretend that $\xi(\theta) \propto \theta^{-1}(1 - \theta)^{-1}$ is the prior p.d.f. of $\theta$. The likelihood function is $f_{40}(x \mid \theta) = \binom{40}{22}\theta^{22}(1 - \theta)^{18}$. We can ignore the constant factor $\binom{40}{22}$ and obtain the product
$$\xi(\theta \mid x) \propto \theta^{21}(1 - \theta)^{17}, \quad \text{for } 0 < \theta < 1.$$
This is easily recognized as being the same as the p.d.f. of the beta distribution with parameters 22 and 18 except for a constant factor. So, if we use the improper "beta distribution" prior with prior hyperparameters 0 and 0, we get the beta posterior distribution for $\theta$ with posterior hyperparameters 22 and 18. Notice that Theorem 7.3.1 yields the correct posterior distribution even in this improper prior case. Figure 7.5 adds the p.d.f. of the posterior beta distribution calculated here to Fig. 2.4, which depicted the posterior probabilities for two different discrete prior distributions. All three posteriors are pretty close. ◀

[Figure 7.5: The posterior probabilities from Examples 2.3.7 (X) and 2.3.8 (bars) together with the posterior p.d.f. from Example 7.3.13 (solid line). Horizontal axis: $B_1, \ldots, B_{11}$.]

Definition 7.3.2 Improper Prior. Let $\xi$ be a nonnegative function whose domain includes the parameter space of a statistical model. Suppose that $\int \xi(\theta)\,d\theta = \infty$. If we pretend as if $\xi(\theta)$ is the prior p.d.f. of $\theta$, then we are using an improper prior for $\theta$.

Definition 7.3.2 is not of much use in determining an improper prior to use in a particular application. There are many methods for choosing an improper prior, and the hope is that they all lead to similar posterior distributions so that it does not much matter which of them one chooses.
The most straightforward method for choosing an improper prior is to start with the family of conjugate prior distributions, if there is such a family. In most cases, if the parameterization of the conjugate family (prior hyperparameters) is chosen carefully, the posterior hyperparameters will each equal the corresponding prior hyperparameter plus a statistic. One would then replace each of those prior hyperparameters by 0 in the formula for the prior p.d.f. This generally results in a function that satisfies Definition 7.3.2. In Example 7.3.13, each of the posterior hyperparameters was equal to the corresponding prior hyperparameter plus some statistic. In that example, we replaced both prior hyperparameters by 0 to obtain the improper prior. Here are some more examples. The method just described needs to be modified if one chooses an "inconvenient" parameterization of the conjugate prior, as in Example 7.3.15 below.

Example 7.3.14 Prussian Army Deaths. Bortkiewicz (1898) counted the numbers of Prussian soldiers killed by horsekick (a more serious problem in the nineteenth century than it is today) in 14 army units for each of 20 years, a total of 280 counts.
The 280 counts have the following values: 144 counts are 0, 91 counts are 1, 32 counts are 2, 11 counts are 3, and 2 counts are 4. No unit suffered more than four deaths by horsekick during any one year. (These data were reported and analyzed by Winsor, 1947.) Suppose that we were going to model the 280 counts as a random sample of Poisson random variables $X_1, \ldots, X_{280}$ with mean $\theta$ conditional on the parameter $\theta$. A conjugate prior would be a member of the gamma family with prior hyperparameters $\alpha$ and $\beta$. Theorem 7.3.2 says that the posterior distribution of $\theta$ would be the gamma distribution with posterior hyperparameters $\alpha + 196$ and $\beta + 280$, since the sum of the 280 counts equals 196. Unless either $\alpha$ or $\beta$ is very large, the posterior gamma distribution is nearly the same as the gamma distribution with posterior hyperparameters 196 and 280. This posterior distribution would seem to be the result of using a conjugate prior with prior hyperparameters 0 and 0. Ignoring the constant factor, the p.d.f. of the gamma distribution with parameters $\alpha$ and $\beta$ is $\theta^{\alpha-1}e^{-\beta\theta}$ for $\theta > 0$. If we let $\alpha = 0$ and $\beta = 0$ in this formula, we get the improper prior "p.d.f." $\xi(\theta) = \theta^{-1}$ for $\theta > 0$. Pretending as if this really were a prior p.d.f. and applying Bayes' theorem for random variables (Theorem 3.6.4) would yield
$$\xi(\theta \mid x) \propto \theta^{195}e^{-280\theta}, \quad \text{for } \theta > 0.$$
This is easily recognized as being the p.d.f. of the gamma distribution with parameters 196 and 280, except for a constant factor. The result in this example applies to all cases in which we model data with Poisson distributions. The improper "gamma distribution" with prior hyperparameters 0 and 0 can be used in Theorem 7.3.2, and the conclusion will still hold. ◀

Example 7.3.15 Failure Times of Ball Bearings. Suppose that we model the 23 logarithms of failure times of ball bearings from Example 5.6.9 as normal random variables $X_1, \ldots, X_{23}$ with mean $\theta$ and variance 0.25.
A conjugate prior for $\theta$ would be the normal distribution with mean $\mu_0$ and variance $v_0^2$ for some $\mu_0$ and $v_0^2$. The average of the 23 log-failure times is 4.15, so the posterior distribution of $\theta$ would be the normal distribution with mean $\mu_1 = (0.25\mu_0 + 23 \times 4.15\,v_0^2)/(0.25 + 23v_0^2)$ and variance $v_1^2 = (0.25v_0^2)/(0.25 + 23v_0^2)$. If we let $v_0^2 \to \infty$ in the formulas for $\mu_1$ and $v_1^2$, we get $\mu_1 \to 4.15$ and $v_1^2 \to 0.25/23$. Having infinite variance for the prior distribution of $\theta$ is like saying that $\theta$ is equally likely to be anywhere on the real number line. This same thing happens in every example in which we model data $X_1, \ldots, X_n$ as a random sample from the normal distribution with mean $\theta$ and known variance $\sigma^2$ conditional on $\theta$. If we use an improper "normal distribution" prior with variance $\infty$ (the prior mean does not matter), the calculation in Theorem 7.3.3 would yield a posterior distribution that is the normal distribution with mean $\bar{x}_n$ and variance $\sigma^2/n$. The improper prior "p.d.f." in this case is $\xi(\theta)$ equal to a constant. This example would be an application of the method described after Definition 7.3.2 if we had described the conjugate prior distribution in terms of the following "more convenient" hyperparameters: 1 over the variance $u_0 = 1/v_0^2$ and the mean over the variance $t_0 = \mu_0/v_0^2$. In terms of these hyperparameters, the posterior distribution has 1 over its variance equal to $u_1 = u_0 + n/0.25$ and mean over variance equal to $t_1 = \mu_1/v_1^2 = t_0 + 23 \times 4.15/0.25$. Each of $u_1$ and $t_1$ has the form of the corresponding prior hyperparameter plus a statistic. The improper prior with $u_0 = t_0 = 0$ also has $\xi(\theta)$ equal to a constant. ◀

There are improper priors for other sampling models, also. The reader can verify (in Exercise 21) that the "gamma distribution" with parameters 0 and 0 leads to results similar to those in Example 7.3.14 when the data are a random sample from an exponential distribution.
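The limiting behavior described in Example 7.3.15 can be seen numerically. The sketch below evaluates the posterior mean and variance formulas from that example for increasingly large prior variances. The function name `normal_posterior` and the prior mean of 0 are illustrative assumptions: the example leaves $\mu_0$ unspecified, and the point of the calculation is that its value stops mattering as $v_0^2$ grows.

```python
def normal_posterior(mu0, v0_sq, sigma2, n, xbar):
    """Posterior mean and variance of theta for a normal sample with known variance."""
    denom = sigma2 + n * v0_sq
    mu1 = (sigma2 * mu0 + n * v0_sq * xbar) / denom
    v1_sq = sigma2 * v0_sq / denom
    return mu1, v1_sq


# Example 7.3.15: n = 23 log-failure times, average 4.15, known variance 0.25.
# The prior mean 0.0 is a hypothetical choice made only for this illustration.
for v0_sq in (1.0, 100.0, 1e6):
    mu1, v1_sq = normal_posterior(0.0, v0_sq, 0.25, 23, 4.15)
    print(v0_sq, mu1, v1_sq)

# Limits stated in the example for an infinite prior variance.
print(4.15, 0.25 / 23)
```

As $v_0^2$ increases, the printed values approach the sample average 4.15 and the variance $0.25/23$, matching the posterior obtained from the improper constant prior.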
Exercises 23 and 24 introduce a general collection of p.d.f.'s $f(x \mid \theta)$ for which it is easy to construct improper priors.

Improper priors were introduced for cases in which the observed data contain much more information than is represented by our prior distribution. Implicitly, we are assuming that the data are rather informative. When the data do not contain much information, improper priors may be highly inappropriate.

Example 7.3.16 Very Rare Events. In Example 5.4.7, we discussed a drinking water contaminant known as cryptosporidium that generally occurs in very low concentrations. Suppose that a water authority models the oocysts of cryptosporidium in the water supply as a Poisson process with rate of $\theta$ oocysts per liter. They decide to sample 25 liters of water to learn about $\theta$. Suppose that they use the improper gamma prior with "p.d.f." $\theta^{-1}$. (This is the same improper prior used in Example 7.3.14.) If the 25-liter sample contains no oocysts, the water authority would be led to a posterior distribution for $\theta$ that was the gamma distribution with parameters 0 and 25, which is not a real distribution. No matter how many liters are sampled, the posterior distribution will not be a real distribution until at least one oocyst is observed. When sampling for rare events, one might be forced to quantify prior information in the form of a proper prior distribution in order to be able to make inferences based on the posterior distribution. ◀

Summary

For each of several different statistical models for data given the parameter, we found a conjugate family of distributions for the parameter. These families have the property that if the prior distribution is chosen from the family, then the posterior distribution is a member of the family. For data with distributions related to the Bernoulli, such as binomial, geometric, and negative binomial, the conjugate family for the success probability parameter is the family of beta distributions.
For data with distributions related to the Poisson process, such as Poisson, gamma (with known first parameter), and exponential, the conjugate family for the rate parameter is the family of gamma distributions. For data having a normal distribution with known variance, the conjugate family for the mean is the normal family. We also described the use of improper priors. Improper priors are not true probability distributions, but if we pretend that they are, we will compute posterior distributions that approximate the posteriors that we would have obtained using proper conjugate priors with extreme values of the prior hyperparameters.

Exercises

1. Consider again the situation described in Example 7.3.10. Once again, suppose that the prior distribution of $\theta$ is a normal distribution with mean 0, but this time let the prior variance be $v^2 > 0$. If the posterior mean of $\theta$ is 0.12, what value of $v^2$ was used?

2. Show that in Example 7.3.2 it must be true that $V \le 0.01$ after 22 items have been selected. Also show that $V > 0.01$ until at least seven items have been selected.

3. Suppose that the proportion $\theta$ of defective items in a large shipment is unknown and that the prior distribution
of $\theta$ is the beta distribution with parameters 2 and 200. If 100 items are selected at random from the shipment and if three of these items are found to be defective, what is the posterior distribution of $\theta$?

4. Consider again the conditions of Exercise 3. Suppose that after a certain statistician has observed that there were three defective items among the 100 items selected at random, the posterior distribution that she assigns to $\theta$ is a beta distribution for which the mean is 2/51 and the variance is $98/[(51)^2(103)]$. What prior distribution had the statistician assigned to $\theta$?

5. Suppose that the number of defects in a 1200-foot roll of magnetic recording tape has a Poisson distribution for which the value of the mean $\theta$ is unknown and that the prior distribution of $\theta$ is the gamma distribution with parameters $\alpha = 3$ and $\beta = 1$. When five rolls of this tape are selected at random and inspected, the numbers of defects found on the rolls are 2, 2, 6, 0, and 3. Determine the posterior distribution of $\theta$.

the standard deviation is 1. What is the smallest number of observations that must be included in the sample in order to reduce the standard deviation of the posterior distribution of $\theta$ to the value 0.1?

11. Suppose that a random sample of 100 observations is to be taken from a normal distribution for which the value of the mean $\theta$ is unknown and the standard deviation is 2, and the prior distribution of $\theta$ is a normal distribution. Show that no matter how large the standard deviation of the prior distribution is, the standard deviation of the posterior distribution will be less than 1/5.

12. Suppose that the time in minutes required to serve a customer at a certain facility has an exponential distribution for which the value of the parameter $\theta$ is unknown and that the prior distribution of $\theta$ is a gamma distribution for which the mean is 0.2 and the standard deviation is 1. If the average time required to serve a random sample of 20 customers is observed to be 3.8 minutes, what is the posterior distribution of $\theta$?

6. Let $\theta$ denote the average number of defects per 100 feet of a certain type of magnetic tape.

13. For a distribution with mean $\mu \neq 0$ and standard devi-
Suppose that the ation & S 0, the coefficient of variation of the distribution desconhecido e que a distribuigdo anterior de Géa definida comoo/| p/|. Considere novamente o problema value of 6 is unknown and that the prior distribution of is defined as o/|u|. Consider again the problem described distribuigdo gama com pardmetrosa= 2 e B=10. Quando um d it E H 42 h ci ted @ is the gamma distribution with parameters a = 2 and in E ise 12 Me d i t th P ficient of vari rolo de 1.200 pés desta fita é inspecionado, sao encontrados escriro de distribut 2 © supon te en Ou i c 6 =10. When a 1200-foot roll of this tape is inspected, m vt the, o> any ae het ‘ba ve er o Wh. ve the exatamente quatro defeitos. Determine a distribuicdo Varlagao Ca gistnipuicao gama anterior ae Qual 0 exactly four defects are found. Determine the posterior ton of the prior gamma Cistribution Of @ Is «. What is the . menor numero de clientes que deve ser observado ae smallest number of customers that must be observed in or- posterior de@. para reduzir o coeficiente de variagdo da distribuicdo distribution of 6. der to reduce the coefficient of variation of the posterior 7.Suponha que as alturas dos individuos de uma determinada _— posterior para 0,1? 7. Suppose that the heights of the individualsinacertain distribution to 0.1? populagdo tenham uma distribuigdo normal para a qual o . So : .. population have a normal distribution for which the value . oa, . valor da médiaGé desconhecido e o desvio padrdo é de 2 14.Mostre que a familia de distribuic6es beta é uma familia of the mean @ is unknown and the standard deviation is 14. Show that the family of beta distributions is a con- polegadas. Suponha também que a distribuicdo anterior deGé conjugada de distribuicées anteriores para amostras de uma 2 inches. 
Suppose also that the prior distribution of 6 is a Jugate family of prior distributions for samples from a uma distribuicao normal para a qual a média é 68 polegadas e distribuicao binomial negativa com um valor conhecido do normal distribution for which the mean is 68 inches and negative binomial distribution with a known value of the o desvio padrdo é 1 polegada. Se 10 pessoas forem parametroe um valor desconhecido do parametrop (0<p <1). the standard deviation is 1 inch. If 10 people are selected parameter r and an unknown value of the parameter p selecionadas aleatoriamente da populacdo e sua altura média at random from the population, and their average height is (O<p<b. for 69,5 polegadas, qual é a distribuigdo posterior de@? 15.Deixar&Oseja um pdf definido como segue para found to be 69.5 inches, what is the posterior distribution 15. Let €(0) be a p.d.f. that is defined as follows for con- constantesa >0 ef >0: of 8? stants a > 0 and p > 0: 8.Considere novamente o problema descrito no Exercicio 7. { Ba 8. Consider again the problem described in Exercise 7. BY» (atl) ,-B/0 a.Qual intervalo de 1 polegada de comprimento teve a maior &(OF ~(@O-(a1)e- 6/8 paraG >0, a. Which interval 1-inch long had the highest prior €(0) = | T@? ° ford > 0, probabilidade anterior de conter o valor de@ 0 para@s0. probability of containing the value of 6? 0 for 6 <0. b.Qual intervalo de 1 polegada de comprimento tem a maior Uma distribuicgdo com este pdf é chamada dedistribuicgo b. Which interval 1-inch long has the highest posterior A distribution with this p.d.f. is called an inverse gamma probabilidade posterior de conter o valor de@? gama inversa. probability of containing the value of 6? distribution. c.Encontre os valores das probabilidades nas partes (a) e a.Vferificar isso €(@ na verdade um pdf verificando se c. Find the values of the probabilities in parts (a) and a. Verify that €(@) is actually a p.d.f. by verifying that (b). 0&(0)de=1. (b). Jor €(8) do =1. 
9.Suponha que uma amostra aleatéria de 20 observacées seja b.Considere a familia de distribuig6es de probabilidade que 9, Suppose that a random sample of 20 observations is b. Consider the family of probability distributions that retirada de uma distribuicdo normal para a qual o valor da pode ser representada por uma pdf&@tendo a forma dada taken from a normal distribution for which the value of the can be represented by a p.d.f. €(@) having the given média 6 desconhecido e a variancia é 1. Apés a amostra para todos os pares possiveis de constantesa >0 ef >0. mean @ is unknown and the variance is 1. After the sample form for all possible pairs of constants a > 0 and > valores foram observados, verifica-se queXn=10, e que a Mostre que esta familia é uma familia conjugada de values have been observed, it is found that X,, = 10, and 0. Show that this family is a conjugate family of prior distribuicao posterior de6é uma distribuigaéo normal para distribuigées anteriores para amostras de uma distribuigao that the posterior distribution of 6 is a normal distribution distributions for samples from a normal distribution a qual a média é 8 e a variancia 6 1/25. Qual eraa normal com um valor conhecido da médiaye um valor for which the mean is 8 and the variance is 1/25. What was with a known value of the mean y and an unknown distribuicdo anterior de@? desconhecido da variancia 6. the prior distribution of 6? value of the variance 0. 10.Suponha que uma amostra aleatoria seja retirada de 16.Suponha que no Exercicio 15 0 pardmetro seja considerado 10. Suppose that a random sample is to be taken from 16. Suppose that in Exercise 15 the parameter is taken as uma distribuigdo normal para a qual o valor da média 66 o desvio padrao da distribuigdo normal, em vez da variancia. 
a normal distribution for which the value of the mean the standard deviation of the normal distribution, rather desconhecido e o desvio padrdo é 2, e a distribuicdo Determine uma familia conjugada de distribuigées anteriores @ is unknown and the standard deviation is 2, and the than the variance. Determine a conjugate family of prior anterior de6é uma distribuigdo normal para a qual para amostras de uma distribuigdo normal com prior distribution of 6 is a normal distribution for which distributions for samples from a normal distribution with 7.3 Distribuigées Anteriores Conjugadas 407 7.3 Conjugate Prior Distributions 407 um valor conhecido da médiaye um valor segue paraé€e todos os valores dex: a known value of the mean yw and an unknown value of follows for 6 € Q and all values of x: desconhecido do desvio padrdoa. the standard deviation o. f(x| -uma(O)b(xJexplc(A)a(x/]. f (10) = a()b(x) exp[c@) d(x)]. 17.S h u d inut d . ~ ~ we 17. that th b f minut t : : “iponna ale © namere ee miniros que uma pessoa ceve Aquiuma(@ec(8sdo fungées arbitrarias de@, eb(x) ed(x) : Suppose that the number Of manures @ person MUS Here a(@) and c(@) are arbitrary functions of 6, and b(x) esperar por um 6nibus todas as manhas tenha distribuigdo 530 funcées arbitrarias dex. Deixar wait for a bus each morning has the uniform distribution and d(x) are arbitrary functions of x. Let uniforme no intervalo [0,4], onde o valor do ponto final@E S { j . } on the interval [0, 6], where the value of the endpoint 6 * y *. desconhecido. Suponha também que a pdf anterior deGé o is unknown. Suppose also that the prior p.d.f. of 6 is as a seguinte: H=(a, B): uma(@)aexp[c(A)B\ de <, follows: H=j{(a, B): [ a(@)* exp[c(@) B]d@ < co}. { ; 192 Para cada(a, BX H,deixar 192 For each (a, 8) € H, let {OF bi para@=4, B é(0) = | a4 for 6 = 4, B QU de outra forma. uma(@)aexp[c(O)h) 0 otherwise. 
a(@) exp[c(0) B} éa, p(O=J amalnexpreiniBldn’ Ca,p(9) = T_ a(n) exple(n) Bldn’ Se os tempos de espera observados em trés manhas uma(naexplc(n)Ban If the observed waiting times on three successive mornings J am exple(n) Bldn sucessivas sao de 5, 3 e 8 minutos, qual € a fdp posterior de@? =e deixar _seja o conjunto de todas as distribuicées de probabilidade que are 5, 3, and 8 minutes, what is the posterior p.d.f. of 6? and let W be the set of all probability distributions that 18.A distribuigéo de Pareto com parametrosxvea(x0>0 ea tem pdfs do formato¢a p/@)para alguns(a, BJEH. 18. The Pareto distribution with parameters x9 and a have p.d.f’s of the form Cy, g(@) for some (a, B) € H. >0) é definido no Exercicio 16 da Secdo. 5.7. Mostre que a a.Mostre que é uma familia conjugada de distribuig6es a (xo > 0 and aw > 0) is defined in Exercise 16 of Sec. 5.7. a. Show that W is a conjugate family of prior distribu- familia de distribuigées de Pareto 6 uma familia conjugada priori para amostras def(x| 4). Show that the family of Pareto distributions is a conjugate tions for samples from f(x|9). de distribuicées a priori para amostras de uma b.Suponha que observamos uma amastra gleatéria ge family of prior distributions for samples from a uniform b. Suppose that we observe a random sample of size n istribuica i i amanhon da distribuicao com x| @). Seo istributi i « tettaati ‘ ‘ distribuicao uniforme no intervalo [0,6], onde o valor do tama debe ea, pmostram que os A erparsrhetros distribution on the interval [0, 6], where the value of the from the distribution with p.d.f. f(x|). If the prior ponto final@E desconhecido. posteriores sao endpoint 6 is unknown. p.d.f. of 6 is Eg,» Show that the posterior hyperpa- 0 0>P0? 19.Suponha que, ..., Xiformar uma amostra aleatéria de 19. Suppose that X;,..., X,, form a random sample from rameters are uma distribuicdo para a qual a pdff(x| 84 o seguinte: »” a distribution for which the p.d.f. 
f(x|@) is as follows: n sa0P C parfix| 9 a=a0+n, 1=Bo+ a(xeu). pdt £18) ay =a) +n, Bi = Bo + >- d(x). Oxe-1 eu=1 g-1 i=l fix| OF para O<x <1, f(x|0) = 0x for0 <x <1, _ — 0 de outra forma. 24.Mostre que cada uma das seguintes familias de distribuigées é 0 otherwise. 24. Show that each of the following families of distribu- , . , uma familia exponencial, conforme definido no Exercicio 23: ; tions is an exponential family, as defined in Exercise 23: Suponha também que 0 valor do parametro GE A familia de distribuicées de B li lor d Suppose also that the value of the parameter 0 is unknown The familv of B ki distributi ith desconhecido (@ >0), e a distribuigdo prévia deGé a aA tamiila de See heed € pernoullil com valor de (@ > 0), and the prior distribution of 6 is the gamma dis- a. k e oval ° f No ot 1 distributions with an un- distribuicdo gama com parametrosaef(a >0 ef >0). parametro desconhecidop tribution with parameters w and B (a > 0 and B > 0). De- nown value of the parameter p Determine a média e a variancia da distribuicgao b.A familia de distribuigdes de Poisson com média termine the mean and the variance of the posterior distri- b. The family of Poisson distributions with an unknown posterior de@. desconhecida bution of 6. mean . c.A familia de distribuigées binomiais negativas para a tpg: . c. The family of negative binomial distributions for 20.Suponha que modelemos os tempos de vida (em meses) de , . “ 20. Suppose that we model the lifetimes (in months) of . . i ao _ _ qual o valor deRé conhecido e 0 valor depE : : . which the value of r is known and the value of p componentes eletrénicos como variaveis aleatdérias exponenciais : electronic components as independent exponential ran- : . . : . desconhecido . , is unknown independentes com pardmetro desconhecidof. Nés modelamos# _ oe ; - dom variables with unknown parameter 8. We model 6 . en . como tendo a distribuigéo gama com parametrosae b. 
d.A familia de distribuig6es normals com media as having the gamma distribution with parameters a and d. The family of normal distributions with an unknown Acreditamos que a vida média é de quatro meses antes de vermos desconhecida e variancia conhecida b. We believe that the mean lifetime is four months before mean and a known variance quaisquer dados. Se observassemos 10 componentes com uma e.A familia de distribuigdes normais com variancia we see any data. If we were to observe 10 components with e. The family of normal distributions with an unknown vida Util média observada de seis meses, afirmariamos entdo que desconhecida e média conhecida an average observed lifetime of six months, we would then variance and a known mean a vida util média é de cinco meses. Determinara eb. Dica-Use 0 f.A familia de distribuicdes gama para a qual o valor claim that the mean lifetime is five months. Determine a f. The family of gamma distributions for which the Exercicio 21 na Secdo. 5.7. deaé desconhecido e o valor defé conhecido and b. Hint: Use Exercise 21 in Sec. 5.7. value of a is unknown and the value of £ is known 21.Suponha que, ..., Xnformar uma amostra aleatoria g.A familia de distribuic6es gama Para a qual o valor 21. Suppose that X,,..., X,, form a random sample from g. The family of gamma distributions for which the da distribuigdo exponencial com parametro@. Deixe a deaé conhecido e o valor defr desconhecido the exponential distribution with parameter 6. Let the value of a is known and the value of f is unknown distribuigdo anterior deé@ser impréprio com “pdf” 1/@para 0 h.A familia de distribuigdes beta para a qual o valor deaé prior distribution of 6 be improper with “p.d.f” 1/0 for h. The family of beta distributions for which the value >0. Encontre a distribuigdo posterior deOe mostre que a desconhecido e 0 valor defé conhecido @ > 0. Find the posterior distribution of 6 and show that of w is unknown and the value of £ is known media posterior deGe 1/xn. 
eu.A familia de distribuigdes beta para a qual o valor dea the posterior mean of @ is 1/%,. i. The family of beta distributions for which the value 22.Considere os dados do Exemplo 7.3.10. Desta vez, suponha € conhecido e o valor der desconhecido 22. Consider the data in Example 7.3.10. This time, sup- of a is known and the value of # is unknown que usemos 0 “pdf" anterior impropriog(@¥ 1 (para todos®). 25.Mostre que a familia de distribuicées uniformes nos pose that we use the improper prior “p.d.f” ¢(@)=1 (for 25. Show that the family of uniform distributions on the Encontre a distribuigdo posterior de Ge a probabilidade intervalos [0,6] para@ >0 éndouma familia exponencial all@). Find the posterior distribution of 6 and the posterior intervals [0, 6] for 6 > 0 is not an exponential family as posterior de que@ >1. conforme definido no Exercicio 23. Dica:Observe o suporte de probability that 6 > 1. defined in Exercise 23. Hint: Look at the support of each da distribuigdo unif . . eye . if distribution. 23.Considere uma distribuicdo para a qual a pdf ou o PF éf(x| cada austriouigao untrorme 23. Consider a distribution for which the p.d.f. or the p-f. union castmouiton 8), onde @pertence a algum espaco de pardmetros. Diz-se que 26.Mostre que a familia de distribuigdes uniformes discretas nos is f(x|0), where 6 belongs to some parameter space Q. It 26. Show that the family of discrete uniform distributions a familia de distribuig6es obtida deixando Qvariam em todos conjuntos de inteiros {0,1,...,@}para6um numero inteiro ndo is said that the family of distributions obtained by letting on the sets of integers {0, 1,..., 6} for 0 a nonnegative os valores em é um familia exponencial, ou um Familia negativo éngouma familia exponencial conforme definido no @ vary over all values in Q is an exponential family, or integer is not an exponential family as defined in Exer- Koopman-Darmois, sef(x| @)pode ser escrito como Exercicio 23. 
a Koopman-Darmois family, if f (x|@) can be written as cise 23. 408 Capitulo 7 Estimativa 408 Chapter 7 Estimation 7.4 Estimadores Bayesianos 7.4 Bayes Estimators Um estimador de um parametro é alguma funcdo dos dados que esperamos estar An estimator of a parameter is some function of the data that we hope is close to proximos do parametro. Um estimador Bayes 6 um estimador escolhido para minimizar a the parameter. A Bayes estimator is an estimator that is chosen to minimize the média posterior de alguma medida de qudo longe o estimador esté do parametro, como posterior mean of some measure of how far the estimator is from the parameter, erro quadrdatico ou erro absoluto. such as squared error or absolute error. Natureza de um problema de estimativa Nature of an Estimation Problem Exemplo As calorias contam nos rotulos dos alimentos.No Exemplo 7.3.10, encontramos a distribuicdo posterior Example Calorie Counts on Food Labels. In Example 7.3.10, we found the posterior distribution 7.4.1 de@, a diferenca percentual média entre as contagens de calorias medidas e anunciadas. 7.4.1 of 6, the mean percentage difference between measured and advertised calorie Um grupo de consumidores pode querer reportar um Unico numero como uma counts. A consumer group might wish to report a single number as an estimate of 0 estimativa de@ sem especificar toda a distribuicdo para@. Como escolher essa estimativa without specifying the entire distribution for 9. How to choose such a single-number de numero Unico em geral é 0 assunto desta secdo. - estimate in general is the subject of this section. < Comegamos com uma definicgdo apropriada para um pardmetro com valor real, We begin with a definition that is appropriate for a real-valued parameter such como no Exemplo 7.4.1. Uma definigdo mais geral seguira depois que nos as in Example 7.4.1. A more general definition will follow after we become more familiarizarmos com o conceito de estimativa. familiar with the concept of estimation. 

Definition 7.4.1 Estimator/Estimate. Let $X_1, \ldots, X_n$ be observable data whose joint distribution is indexed by a parameter $\theta$ taking values in a subset $\Omega$ of the real line. An estimator of the parameter $\theta$ is a real-valued function $\delta(X_1, \ldots, X_n)$. If $X_1 = x_1, \ldots, X_n = x_n$ are observed, then $\delta(x_1, \ldots, x_n)$ is called the estimate of $\theta$.

Notice that every estimator is, by nature of being a function of data, a statistic in the sense of Definition 7.1.4.

Because the value of $\theta$ must belong to the set $\Omega$, it might seem reasonable to require that every possible value of an estimator $\delta(X_1, \ldots, X_n)$ must also belong to $\Omega$. We shall not require this restriction, however. If an estimator can take values outside of the parameter space $\Omega$, the experimenter will need to decide in the specific problem whether that seems appropriate or not. It may turn out that every estimator that takes values only inside $\Omega$ has other even less desirable properties.

In Definition 7.4.1, we distinguished between the terms estimator and estimate. Because an estimator $\delta(X_1, \ldots, X_n)$ is a function of the random variables $X_1, \ldots, X_n$, the estimator itself is a random variable, and its probability distribution can be derived from the joint distribution of $X_1, \ldots, X_n$, if desired. On the other hand, an estimate is a specific value $\delta(x_1, \ldots, x_n)$ of the estimator that is determined by using specific observed values $x_1, \ldots, x_n$. If we use the vector notation $X = (X_1, \ldots, X_n)$ and $x = (x_1, \ldots, x_n)$, then an estimator is a function $\delta(X)$ of the random vector $X$, and an estimate is a specific value $\delta(x)$. It will often be convenient to denote an estimator $\delta(X)$ simply by the symbol $\delta$.

Loss Functions

Example 7.4.2 Calorie Counts on Food Labels. In Example 7.4.1, the consumer group may feel that the farther their estimate $\delta(x)$ is from the true mean difference $\theta$, the more embarrassment and possible legal action they will encounter. Ideally, they would like to quantify the amount of negative repercussions as a function of $\theta$ and the estimate $\delta(x)$. Then they could have some idea how likely it is that they will encounter various levels of hassle as a result of their estimation.

The foremost requirement of a good estimator $\delta$ is that it yield an estimate of $\theta$ that is close to the actual value of $\theta$. In other words, a good estimator is one for which it is highly probable that the error $\delta(X) - \theta$ will be close to 0. We shall assume that for each possible value of $\theta \in \Omega$ and each possible estimate $a$, there is a number $L(\theta, a)$ that measures the loss or cost to the statistician when the true value of the parameter is $\theta$ and her estimate is $a$. Typically, the greater the distance between $a$ and $\theta$, the larger will be the value of $L(\theta, a)$.

Definition 7.4.2 Loss Function. A loss function is a real-valued function of two variables, $L(\theta, a)$, where $\theta \in \Omega$ and $a$ is a real number. The interpretation is that the statistician loses $L(\theta, a)$ if the parameter equals $\theta$ and the estimate equals $a$.
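
As a small illustration of Definition 7.4.2, the sketch below writes two loss functions as ordinary Python functions: squared error and absolute error, the two measures named at the start of this section. The function names and the numerical values are illustrative assumptions, not notation from the text.

```python
def squared_error_loss(theta, a):
    """Loss L(theta, a) = (theta - a)^2."""
    return (theta - a) ** 2


def absolute_error_loss(theta, a):
    """Loss L(theta, a) = |theta - a|."""
    return abs(theta - a)


# Both losses are 0 when the estimate equals the parameter and grow
# as the estimate a moves away from theta.
theta = 2.0  # hypothetical true parameter value
for a in (2.0, 3.0, 5.0):
    print(a, squared_error_loss(theta, a), absolute_error_loss(theta, a))
```

Both functions depend on $\theta$ and $a$ only through the distance between them, which matches the remark that the loss is typically larger when the estimate is farther from the parameter.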

As before, let $\xi(\theta)$ denote the prior p.d.f. of $\theta$ on the set $\Omega$, and consider a problem in which the statistician must estimate the value of $\theta$ without being able to observe the values in a random sample. If the statistician chooses a particular estimate $a$, then her expected loss will be
\[
E[L(\theta, a)] = \int_{\Omega} L(\theta, a)\, \xi(\theta)\, d\theta. \tag{7.4.1}
\]
We shall assume that the statistician wishes to choose an estimate $a$ for which the expected loss in Eq. (7.4.1) is a minimum.

Definition of a Bayes Estimator

Suppose now that the statistician can observe the value $x$ of the random vector $X$ before estimating $\theta$, and let $\xi(\theta|x)$ denote the posterior p.d.f. of $\theta$ on $\Omega$. (The case of a discrete parameter can be handled in similar fashion.) For each estimate $a$ that the statistician might use, her expected loss in this case will be
\[
E[L(\theta, a)|x] = \int_{\Omega} L(\theta, a)\, \xi(\theta|x)\, d\theta. \tag{7.4.2}
\]
Hence, the statistician should now choose an estimate $a$ for which the expectation in Eq. (7.4.2) is a minimum.
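
The integral in Eq. (7.4.2) can be approximated numerically when no closed form is at hand. The sketch below is only an illustration under stated assumptions: the posterior is taken to be a beta distribution with parameters 5 and 15 (a hypothetical choice, not a result from the text), the loss is $(\theta - a)^2$, the integral is approximated by a midpoint rule, and the candidate estimates are restricted to a grid of hundredths. All function names are made up for this example.

```python
import math


def beta_pdf(theta, alpha, beta):
    """p.d.f. of the beta distribution with parameters alpha and beta."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm
                    + (alpha - 1) * math.log(theta)
                    + (beta - 1) * math.log(1 - theta))


def expected_loss(loss, pdf, a, lower=0.0, upper=1.0, n_steps=20000):
    """Midpoint-rule approximation of the integral of loss(theta, a) * pdf(theta)."""
    width = (upper - lower) / n_steps
    total = 0.0
    for i in range(n_steps):
        theta = lower + (i + 0.5) * width  # midpoint of the i-th subinterval
        total += loss(theta, a) * pdf(theta)
    return total * width


def squared_error(theta, a):
    return (theta - a) ** 2


def posterior(theta):
    # Hypothetical posterior p.d.f.: beta with parameters 5 and 15.
    return beta_pdf(theta, 5.0, 15.0)


# Evaluate the posterior expected loss on a grid of candidate estimates
# and keep the candidate with the smallest value.
candidates = [i / 100 for i in range(1, 100)]
losses = [expected_loss(squared_error, posterior, a, n_steps=2000)
          for a in candidates]
best = candidates[losses.index(min(losses))]
print(best)
```

Under these assumptions the minimizing candidate should coincide with the mean of the assumed posterior, $5/(5 + 15) = 0.25$. Swapping in a different loss function only requires passing a different `loss` argument.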

For each possible value $x$ of the random vector $X$, let $\delta^*(x)$ denote a value of the estimate $a$ for which the expected loss in Eq. (7.4.2) is a minimum. Then the function $\delta^*(X)$ for which the values are specified in this way will be an estimator of $\theta$.

Definition 7.4.3 Bayes Estimator/Estimate. Let $L(\theta, a)$ be a loss function. For each possible value $x$ of $X$, let $\delta^*(x)$ be a value of $a$ such that $E[L(\theta, a)|x]$ is minimized. Then $\delta^*$ is called a Bayes estimator of $\theta$. Once $X = x$ is observed, $\delta^*(x)$ is called a Bayes estimate of $\theta$.

Another way to describe a Bayes estimator $\delta^*$ is to note that, for each possible value $x$ of $X$, the value $\delta^*(x)$ is chosen so that
\[
E[L(\theta, \delta^*(x))|x] = \min_{\text{All } a} E[L(\theta, a)|x]. \tag{7.4.3}
\]

In summary, we have considered an estimation problem in which a random sample $X = (X_1, \ldots, X_n)$ is to be taken from a distribution involving a parameter $\theta$ that has an unknown value in some specified set $\Omega$.
For every given loss function $L(\theta, a)$ and every prior p.d.f. $\xi(\theta)$, the Bayes estimator of $\theta$ is the estimator $\delta^*(X)$ for which Eq. (7.4.3) is satisfied for every possible value $x$ of $X$. It should be emphasized that the form of the Bayes estimator will depend on both the loss function that is used in the problem and the prior distribution that is assigned to $\theta$. In the problems described in this text, Bayes estimators will exist. However, there are more complicated situations in which no function $\delta^*$ satisfies (7.4.3).

Different Loss Functions

By far, the most commonly used loss function in estimation problems is the squared error loss function.

Definition 7.4.4 Squared Error Loss Function. The loss function
\[
L(\theta, a) = (\theta - a)^2 \tag{7.4.4}
\]
is called squared error loss.

When the squared error loss function is used, the Bayes estimate $\delta^*(x)$ for each observed value of $x$ will be the value of $a$ for which the expectation $E[(\theta - a)^2|x]$ is a minimum.
Theorem 4.7.3 states that, when the expectation of (θ − a)² is calculated with respect to the posterior distribution of θ, this expectation will be a minimum when a is chosen to be equal to the mean E(θ | x) of the posterior distribution, if that posterior mean is finite. If the posterior mean of θ is not finite, then the expected loss is infinite for every possible estimate a. Hence, we have the following corollary to Theorem 4.7.3.

Corollary 7.4.1 Let θ be a real-valued parameter. Suppose that the squared error loss function (7.4.4) is used and that the posterior mean of θ, E(θ | X), is finite. Then, a Bayes estimator of θ is δ*(X) = E(θ | X).

Example 7.4.3 Estimating the Parameter of a Bernoulli Distribution. Let the random sample X1, ..., Xn be taken from the Bernoulli distribution with parameter θ, which is unknown and must be estimated. Let the prior distribution of θ be the beta distribution with parameters α > 0 and β > 0. Suppose that the squared error loss function is used, as specified by Eq. (7.4.4), for 0 < θ < 1 and 0 < a < 1.
We shall determine the Bayes estimator of θ.

For observed values x1, ..., xn, let $y = \sum_{i=1}^n x_i$. Then it follows from Theorem 7.3.1 that the posterior distribution of θ will be the beta distribution with parameters α1 = α + y and β1 = β + n − y. Since the mean of the beta distribution with parameters α1 and β1 is α1/(α1 + β1), the mean of this posterior distribution of θ will be (α + y)/(α + β + n). The Bayes estimate δ*(x) will be equal to this value for each observed vector x. Therefore, the Bayes estimator δ*(X) is specified as follows:
\[ \delta^*(X) = \frac{\alpha + \sum_{i=1}^n X_i}{\alpha + \beta + n}. \tag{7.4.5} \]
◀

Example 7.4.4 Estimating the Mean of a Normal Distribution. Suppose that a random sample X1, ..., Xn is to be taken from a normal distribution for which the value of the mean θ is unknown and the value of the variance σ² is known. Suppose also that the prior distribution of θ is the normal distribution with mean μ0 and variance $v_0^2$.
Suppose, finally, that the squared error loss function is to be used, as specified in Eq. (7.4.4), for −∞ < θ < ∞ and −∞ < a < ∞. We shall determine the Bayes estimator of θ.

It follows from Theorem 7.3.3 that for all observed values x1, ..., xn, the posterior distribution of θ will be a normal distribution with mean μ1 specified by Eq. (7.3.1). Therefore, the Bayes estimator δ*(X) is specified as follows:
\[ \delta^*(X) = \frac{\sigma^2 \mu_0 + n v_0^2 \bar{X}_n}{\sigma^2 + n v_0^2}. \tag{7.4.6} \]
The posterior variance of θ does not enter into this calculation. ◀

Another commonly used loss function in estimation problems is the absolute error loss function.

Definition 7.4.5 Absolute Error Loss Function. The loss function
\[ L(\theta, a) = |\theta - a| \tag{7.4.7} \]
is called absolute error loss.

For every observed value of x, the Bayes estimate δ*(x) will now be the value of a for which the expectation E(|θ − a| | x) is a minimum. It was shown in Theorem 4.5.3 that for every given probability distribution of θ, the expectation of |θ − a| will be a minimum when a is chosen to be equal to a median of the distribution of θ. Therefore, when the expectation of |θ − a| is calculated with respect to the posterior distribution of θ, this expectation will be a minimum when a is chosen to be a median of the posterior distribution of θ.

Corollary 7.4.2 When the absolute error loss function (7.4.7) is used, a Bayes estimator of a real-valued parameter is δ*(X) equal to a median of the posterior distribution of θ.

We shall now reconsider Examples 7.4.3 and 7.4.4, but we shall use the absolute error loss function instead of the squared error loss function.
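Before turning to those examples, Corollaries 7.4.1 and 7.4.2 can be checked numerically. The Python sketch below minimizes the posterior expected loss over a grid of candidate estimates. The three-point posterior distribution, the grid resolution, and all function names are illustrative assumptions and do not come from the text.

```python
# Hypothetical discrete posterior for theta (an assumption for illustration).
THETAS = [0.1, 0.3, 0.9]
PROBS = [0.2, 0.5, 0.3]


def expected_loss(a, loss):
    """Posterior expected loss E[L(theta, a) | x] for a candidate estimate a."""
    return sum(p * loss(t, a) for t, p in zip(THETAS, PROBS))


def grid_minimizer(loss, points=1001):
    """Return the grid point in [0, 1] with the smallest expected loss."""
    grid = [i / (points - 1) for i in range(points)]
    return min(grid, key=lambda a: expected_loss(a, loss))


def squared(t, a):
    return (t - a) ** 2


def absolute(t, a):
    return abs(t - a)


# Posterior mean of the hypothetical distribution (about 0.44).
posterior_mean = sum(p * t for t, p in zip(THETAS, PROBS))

# Squared error loss should be minimized at the posterior mean,
# absolute error loss at the posterior median (0.3 for this distribution).
best_squared = grid_minimizer(squared)
best_absolute = grid_minimizer(absolute)

print(posterior_mean, best_squared, best_absolute)
```

In this sketch the median is unique because the cumulative probability jumps from 0.2 to 0.7 at θ = 0.3. With a different choice of probabilities, a whole interval of medians could minimize the absolute error loss.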
Example 7.4.5 Estimating the Parameter of a Bernoulli Distribution. Consider again the conditions of Example 7.4.3, but suppose now that the absolute error loss function is used, as specified by Eq. (7.4.7). For all observed values x1, ..., xn, the Bayes estimate δ*(x) will be equal to the median of the posterior distribution of θ, which is the beta distribution with parameters α + y and β + n − y. There is no simple expression for this median. It must be determined by numerical approximations for each given set of observed values. Most statistical computer software can compute the median of an arbitrary beta distribution.

As a specific example, consider the situation described in Example 7.3.13 in which an improper prior was used. The posterior distribution of θ in that example was the beta distribution with parameters 22 and 18. The mean of this beta distribution is 22/40 = 0.55. The median is 0.5508. ◀

Example 7.4.6 Estimating the Mean of a Normal Distribution. Consider again the conditions of Example 7.4.4, but suppose now that the absolute error loss function is used, as specified by Eq. (7.4.7). For all observed values x1, ..., xn, the Bayes estimate δ*(x) will be equal to the median of the posterior normal distribution of θ. However, since the mean and the median of each normal distribution are equal, δ*(x) is also equal to the mean of the posterior distribution. Therefore, the Bayes estimator with respect to the absolute error loss function is the same as the Bayes estimator with respect to the squared error loss function, and it is again given by Eq. (7.4.6). ◀
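The numerical approximation mentioned in Example 7.4.5 can be sketched with the Python standard library alone. The approach below (Simpson's rule for the beta c.d.f. followed by bisection) is one possible method chosen for illustration, not the method used in the text, and the function names are hypothetical. It assumes both beta parameters exceed 1, so that the p.d.f. vanishes at 0 and 1.

```python
import math


def beta_pdf(t, a, b):
    """Beta(a, b) density; returns 0 at the endpoints (valid when a > 1 and b > 1)."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))


def beta_cdf(x, a, b, steps=2000):
    """Approximate the Beta(a, b) c.d.f. at x by composite Simpson's rule on [0, x]."""
    h = x / steps  # steps must be even for Simpson's rule
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(i * h, a, b)
    return total * h / 3


def beta_median(a, b, tol=1e-8):
    """Find the point where the c.d.f. equals 1/2 by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


posterior_mean = 22 / (22 + 18)       # Bayes estimate under squared error loss
posterior_median = beta_median(22, 18)  # Bayes estimate under absolute error loss
print(round(posterior_mean, 4), round(posterior_median, 4))
```

For the beta distribution with parameters 22 and 18, the two estimates differ only in the third decimal place, which matches the values 0.55 and 0.5508 quoted in the example.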
Other Loss Functions

Although the squared error loss function and, to a lesser extent, the absolute error loss function are the most commonly used ones in estimation problems, neither of these loss functions may be appropriate in a particular problem. In some problems, it might be appropriate to use a loss function having the form L(θ, a) = |θ − a|^k, where k is some positive number other than 1 or 2. In other problems, the loss that results when the error |θ − a| has a given magnitude might depend on the actual value of θ.
In such a problem, it might be appropriate to use a loss function having the form L(θ, a) = λ(θ)(θ − a)² or L(θ, a) = λ(θ)|θ − a|, where λ(θ) is a given positive function of θ. In still other problems, it might be more costly to overestimate the value of θ by a certain amount than to underestimate it by the same amount. One specific loss function that reflects this property is as follows:
\[ L(\theta, a) = \begin{cases} 3(\theta - a)^2 & \text{for } \theta \le a, \\ (\theta - a)^2 & \text{for } \theta > a. \end{cases} \]

Various other types of loss functions might be relevant in specific estimation problems. However, in this book we shall give most of our attention to the squared error and absolute error loss functions.

The Bayes Estimate for Large Samples

Effect of Different Prior Distributions

Suppose that the proportion θ of defective items in a large shipment is unknown and that the prior distribution of θ is the uniform distribution on the interval [0, 1]. Suppose also that the value of θ must be estimated, and that the squared error loss function is used.
Suppose, finally, that in a random sample of 100 items from the shipment, exactly 10 items are found to be defective. Since the uniform distribution is the beta distribution with parameters α = 1 and β = 1, and since n = 100 and y = 10 for the given sample, it follows from Eq. (7.4.5) that the Bayes estimate is δ*(x) = 11/102 = 0.108.

Next, suppose that the prior p.d.f. of θ has the form ξ(θ) = 2(1 − θ) for 0 < θ < 1, instead of being a uniform distribution, and that again in a random sample of 100 items, exactly 10 items are found to be defective. Since ξ(θ) is the p.d.f. of the beta distribution with parameters α = 1 and β = 2, it follows from Eq. (7.4.5) that in this case the Bayes estimate of θ is δ*(x) = 11/103 = 0.107.

The two prior distributions considered here are quite different. The mean of the uniform prior distribution is 1/2, and the mean of the other beta prior distribution is 1/3.
Nevertheless, because the number of observations in the sample is so large (n = 100), the Bayes estimates with respect to the two different prior distributions are almost the same. Furthermore, the values of both estimates are very close to the observed proportion of defective items in the sample, which is $\bar{x}_n = 0.1$.

Example 7.4.7 Chest Measurements of Scottish Soldiers. Quetelet (1846) reported (with some errors) data on the chest measurements (in inches) of 5732 Scottish militiamen. These data appeared earlier in an 1817 medical journal and are discussed by Stigler (1986). Figure 7.6 shows a histogram of the data. Suppose that we were to model the individual chest measurements as a random sample (given θ) of normal random variables with mean θ and variance 4. The average chest measurement is $\bar{x}_n = 39.85$. If θ had the normal prior distribution with mean μ0 and variance $v_0^2$, then using Eq. (7.3.1) the posterior distribution of θ would be normal with mean
\[ \mu_1 = \frac{4\mu_0 + 5732 \times v_0^2 \times 39.85}{4 + 5732 \times v_0^2} \]
and variance
\[ v_1^2 = \frac{4 v_0^2}{4 + 5732 v_0^2}. \]

[Figure 7.6: Histogram of chest measurements of Scottish militiamen in Example 7.4.7. Horizontal axis: chest measurement; vertical axis: count.]

The Bayes estimate will then be δ(x) = μ1. Notice that, unless μ0 is incredibly large or $v_0^2$ is very small, we will have μ1 nearly equal to 39.85 and $v_1^2$ nearly equal to 4/5732. Indeed, if the prior p.d.f. of θ is any continuous function that is positive around θ = 39.85 and is not extremely large when θ is far from 39.85, then the posterior p.d.f. of θ will very nearly be the normal p.d.f. with mean 39.85 and variance 4/5732. The mean and median of the posterior distribution are nearly $\bar{x}_n$ regardless of the prior distribution. ◀

Consistency of the Bayes Estimator

Let X1, ..., Xn be a random sample (given θ) from the Bernoulli distribution with parameter θ. Suppose that we use a conjugate prior for θ. Since θ is the mean of the distribution from which the sample is being taken, it follows from the law of large numbers discussed in Sec. 6.2 that $\bar{X}_n$ converges in probability to θ as n → ∞. Since the difference between the Bayes estimator δ*(X) and $\bar{X}_n$ converges in probability to 0 as n → ∞, it can also be concluded that δ*(X) converges in probability to the unknown value of θ as n → ∞.

Definition 7.4.6 Consistent Estimator. A sequence of estimators that converges in probability to the unknown value of the parameter being estimated, as n → ∞, is called a consistent sequence of estimators.

Thus, we have shown that the Bayes estimators δ*(X) form a consistent sequence of estimators in the problem considered here.
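The shrinking influence of the prior in the Bernoulli problem can be illustrated with exact arithmetic. The following Python sketch evaluates Eq. (7.4.5) for the two beta priors of the shipment example and then repeats the comparison for larger samples. The function name and the hypothetical larger samples (each with exactly 10 percent defectives) are assumptions made for illustration.

```python
from fractions import Fraction


def bernoulli_bayes_estimate(alpha, beta, n, y):
    """Posterior mean (alpha + y) / (alpha + beta + n) from Eq. (7.4.5)."""
    return Fraction(alpha + y, alpha + beta + n)


# Shipment example: n = 100 items, y = 10 defective.
uniform_prior = bernoulli_bayes_estimate(1, 1, 100, 10)   # beta(1, 1) prior
linear_prior = bernoulli_bayes_estimate(1, 2, 100, 10)    # beta(1, 2) prior

# Hypothetical samples of growing size with 10 percent defectives:
# record how far apart the two Bayes estimates are for each n.
gaps = []
for n in (10, 100, 1000, 10000):
    y = n // 10
    gap = abs(bernoulli_bayes_estimate(1, 1, n, y)
              - bernoulli_bayes_estimate(1, 2, n, y))
    gaps.append(gap)
    print(n, float(gap))
```

Using `Fraction` keeps the estimates exact, so the values 11/102 and 11/103 from the text are reproduced without rounding. The recorded gaps decrease as n grows, in line with the observation that both estimates approach the sample proportion.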
The practical interpretation of this result is as follows: When large numbers of observations are taken, there is high probability that the Bayes estimator will be very close to the unknown value of θ.

The results that have just been presented for estimating the parameter of a Bernoulli distribution are also true for other estimation problems. Under fairly general conditions and for a wide class of loss functions, the Bayes estimators of some parameters θ will form a consistent sequence of estimators as the sample size n → ∞. In particular, for random samples from any one of the various families of distributions discussed in Sec. 7.3, if a conjugate prior distribution is assigned to the parameters and the squared error loss function is used, the Bayes estimators will form a consistent sequence of estimators.

For example, consider again the conditions of Example 7.4.4. In that example, a random sample is taken from a normal distribution for which the value of the mean θ is unknown, and the Bayes estimator δ*(X) is specified by Eq. (7.4.6).
By the law of large numbers, $\bar{X}_n$ will converge to the unknown value of the mean θ as n → ∞. It can now be seen from Eq. (7.4.6) that δ*(X) will also converge to θ as n → ∞. Thus, the Bayes estimators again form a consistent sequence of estimators. Other examples are given in Exercises 7 and 11 at the end of this section.

More General Parameters and Estimators

So far in this section, we have considered only real-valued parameters and estimators of those parameters. There are two very common generalizations of this situation that are easy to handle with the same techniques described above. The first generalization is to multidimensional parameters such as the two-dimensional parameter of a normal distribution with unknown mean and variance. The second generalization is to functions of the parameter rather than the parameter itself. For example, if θ is the failure rate in Example 7.1.1, we might be interested in estimating 1/θ, the mean time to failure.
As another example, if our data arise from a normal distribution with unknown mean and variance, we might wish to estimate the mean only rather than the entire parameter.

The necessary changes to Definition 7.4.1 in order to handle both of the generalizations just mentioned are given in Definition 7.4.7.

Definition 7.4.7 Estimator/Estimate. Let X1, ..., Xn be observable data whose joint distribution is indexed by a parameter θ taking values in a subset Ω of k-dimensional space. Let h be a function from Ω into d-dimensional space. Define ψ = h(θ). An estimator of ψ is a function δ(X1, ..., Xn) that takes values in d-dimensional space. If X1 = x1, ..., Xn = xn are observed, then δ(x1, ..., xn) is called the estimate of ψ.

When h in Definition 7.4.7 is the identity function h(θ) = θ, then ψ = θ and we are estimating the original parameter θ. When h(θ) is one coordinate of θ, then the ψ that we are estimating is just that one coordinate.

There will be a number of examples of multidimensional parameters in later sections and chapters of this book.
Here is an example of estimating a function of a parameter.

Example 7.4.8 Lifetimes of Electronic Components. In Example 7.3.12, suppose that we want to estimate ψ = 1/θ, the mean time to failure of the electronic components. The posterior distribution of θ is the gamma distribution with parameters 4 and 8.6. If we use the squared error loss L(θ, a) = (ψ − a)², Theorem 4.7.3 says that the Bayes estimate is the mean of the posterior distribution of ψ. That is,
\[
\begin{aligned}
\delta^*(x) = E(\psi \mid x) &= E\left(\frac{1}{\theta} \,\Big|\, x\right) \\
&= \int_0^\infty \frac{1}{\theta} \cdot \frac{8.6^4}{6}\, \theta^3 e^{-8.6\theta}\, d\theta \\
&= \frac{8.6^4}{6} \int_0^\infty \theta^2 e^{-8.6\theta}\, d\theta \\
&= \frac{8.6^4}{6} \cdot \frac{2}{8.6^3} = 2.867,
\end{aligned}
\]
where the final equality follows from Theorem 5.7.3. The mean of 1/θ is slightly higher than 1/E(θ | x) = 8.6/4 = 2.15. ◀

Note: Loss Functions and Utility. In Sec. 4.8, we introduced the concept of utility to measure the values to a decision maker of various random outcomes. The concept of loss function is closely related to that of utility.
In a sense, a loss function is like the negative of a utility. Indeed, Example 4.8.8 shows how to convert absolute error loss into a utility. In that example, Y plays the role of the parameter and d(W) plays the role of the estimator. In a similar manner, one can convert other loss functions into utilities. Hence, it is not surprising that the goal of maximizing expected utility in Sec. 4.8 has been replaced by the goal of minimizing expected loss in the present section.

Limitations of Bayes Estimators

The theory of Bayes estimators, as described in this section, provides a satisfactory and coherent theory for the estimation of parameters. Indeed, according to statisticians who adhere to the Bayesian philosophy, it provides the only coherent theory of estimation that can possibly be developed. Nevertheless, there are certain limitations to the applicability of this theory in practical statistical problems.
To apply the theory, it is necessary to specify a particular loss function, such as the squared error or absolute error function, and also a prior distribution for the parameter. Meaningful specifications may exist, in principle, but it may be very difficult and time-consuming to determine them. In some problems, the statistician must determine the specifications that would be appropriate for clients or employers who are unavailable or otherwise unable to communicate their preferences and knowledge. In other problems, it may be necessary for an estimate to be made jointly by members of a group or committee, and it may be difficult for the members of the group to reach agreement about an appropriate loss function and prior distribution.

Another possible difficulty is that in a particular problem the parameter θ may actually be a vector of real-valued parameters for which all the values are unknown.
The theory of Bayes estimation, which has been developed in the preceding sections, can easily be generalized to include the estimation of a vector parameter $\theta$. However, to apply this theory in such a problem it is necessary to specify a multivariate prior distribution for the vector $\theta$ and also to specify a loss function $L(\theta, a)$ that is a function of the vector $\theta$ and the vector $a$, which will be used to estimate $\theta$. Even though the statistician may be interested in estimating only one or two components of the vector $\theta$ in a given problem, he must still assign a multivariate prior distribution to the entire vector $\theta$. In many important statistical problems, some of which will be discussed later in this book, $\theta$ may have a large number of components. In such a problem, it is especially difficult to specify a meaningful prior distribution on the multidimensional parameter space $\Omega$.

It should be emphasized that there is no simple way to resolve these difficulties.
Other methods of estimation that are not based on prior distributions and loss functions typically have practical limitations as well. These other methods also typically have serious defects in their theoretical structure.

Summary

An estimator of a parameter $\theta$ is a function $\delta$ of the data $X$. If $X = x$ is observed, the value $\delta(x)$ is called our estimate, the observed value of the estimator $\delta(X)$. A loss function $L(\theta, a)$ is designed to measure how costly it is to use the value $a$ to estimate $\theta$. A Bayes estimator $\delta^*(X)$ is chosen so that $a = \delta^*(x)$ provides the minimum value of the posterior mean of $L(\theta, a)$. That is,

\[ E[L(\theta, \delta^*(x)) \mid x] = \min_a E[L(\theta, a) \mid x]. \]

If the loss is squared error, $L(\theta, a) = (\theta - a)^2$, then $\delta^*(x)$ is the posterior mean of $\theta$, $E(\theta \mid x)$. If the loss is absolute error, $L(\theta, a) = |\theta - a|$, then $\delta^*(x)$ is a median of the posterior distribution of $\theta$. For other loss functions, locating the minimum might have to be done numerically.

Exercises

1. In a clinical trial, let the probability of successful outcome $\theta$ have a prior distribution that is the uniform distribution on the interval $[0, 1]$, which is also the beta distribution with parameters 1 and 1. Suppose that the first patient has a successful outcome. Find the Bayes estimates of $\theta$ that would be obtained for both the squared error and absolute error loss functions.

2. Suppose that the proportion $\theta$ of defective items in a large shipment is unknown, and the prior distribution of $\theta$ is the beta distribution for which the parameters are $\alpha = 5$ and $\beta = 10$.
Suppose also that 20 items are selected at random from the shipment, and that exactly one of these items is found to be defective. If the squared error loss function is used, what is the Bayes estimate of $\theta$?

3. Consider again the conditions of Exercise 2. Suppose that the prior distribution of $\theta$ is as given in Exercise 2, and suppose again that 20 items are selected at random from the shipment.
a. For what number of defective items in the sample will the mean squared error of the Bayes estimate be a maximum?
b. For what number will the mean squared error of the Bayes estimate be a minimum?

4. Suppose that a random sample of size $n$ is taken from the Bernoulli distribution with parameter $\theta$, which is unknown, and that the prior distribution of $\theta$ is a beta distribution for which the mean is $\mu_0$. Show that the mean of the posterior distribution of $\theta$ will be a weighted average having the form $\gamma_n \overline{X}_n + (1 - \gamma_n)\mu_0$, and show that $\gamma_n \to 1$ as $n \to \infty$.

5. Suppose that the number of defects in a 1200-foot roll of magnetic recording tape has a Poisson distribution for which the value of the mean $\theta$ is unknown, and the prior distribution of $\theta$ is the gamma distribution with parameters $\alpha = 3$ and $\beta = 1$. When five rolls of this tape are selected at random and inspected, the numbers of defects found on the rolls are 2, 2, 6, 0, and 3. If the squared error loss function is used, what is the Bayes estimate of $\theta$? (See Exercise 5 of Sec. 7.3.)

6. Suppose that a random sample of size $n$ is taken from a Poisson distribution for which the value of the mean $\theta$ is unknown, and the prior distribution of $\theta$ is a gamma distribution for which the mean is $\mu_0$. Show that the mean of the posterior distribution of $\theta$ will be a weighted average having the form $\gamma_n \overline{X}_n + (1 - \gamma_n)\mu_0$, and show that $\gamma_n \to 1$ as $n \to \infty$.

7. Consider again the conditions of Exercise 6, and suppose that the value of $\theta$ must be estimated by using the squared error loss function. Show that the Bayes estimators, for $n = 1, 2, \ldots$
, form a consistent sequence of estimators of $\theta$.

8. Suppose that the heights of the individuals in a certain population have a normal distribution for which the value of the mean $\theta$ is unknown and the standard deviation is 2 inches. Suppose also that the prior distribution of $\theta$ is a normal distribution for which the mean is 68 inches and the standard deviation is 1 inch. Suppose finally that 10 people are selected at random from the population, and their average height is found to be 69.5 inches.
a. If the squared error loss function is used, what is the Bayes estimate of $\theta$?
b. If the absolute error loss function is used, what is the Bayes estimate of $\theta$? (See Exercise 7 of Sec. 7.3.)

9. Suppose that a random sample is to be taken from a normal distribution for which the value of the mean $\theta$ is unknown and the standard deviation is 2, the prior distribution of $\theta$ is a normal distribution for which the standard deviation is 1, and the value of $\theta$ must be estimated by using the squared error loss function. What is the smallest random sample that must be taken in order for the mean squared error of the Bayes estimator of $\theta$ to be 0.01 or less? (See Exercise 10 of Sec. 7.3.)

10. Suppose that the time in minutes required to serve a customer at a certain facility has an exponential distribution for which the value of the parameter $\theta$ is unknown,
the prior distribution of $\theta$ is a gamma distribution for which the mean is 0.2 and the standard deviation is 1, and the average time required to serve a random sample of 20 customers is observed to be 3.8 minutes. If the squared error loss function is used, what is the Bayes estimate of $\theta$? (See Exercise 12 of Sec. 7.3.)

11. Suppose that a random sample of size $n$ is taken from an exponential distribution for which the value of the parameter $\theta$ is unknown, the prior distribution of $\theta$ is a specified gamma distribution, and the value of $\theta$ must be estimated by using the squared error loss function. Show that the Bayes estimators, for $n = 1, 2, \ldots$, form a consistent sequence of estimators of $\theta$.

12. Let $\theta$ denote the proportion of registered voters in a large city who are in favor of a certain proposition. Suppose that the value of $\theta$ is unknown, and two statisticians $A$ and $B$ assign to $\theta$ the following different prior p.d.f.'s $\xi_A(\theta)$ and $\xi_B(\theta)$, respectively:

\[ \xi_A(\theta) = 2\theta \quad \text{for } 0 < \theta < 1, \]
\[ \xi_B(\theta) = 4\theta^3 \quad \text{for } 0 < \theta < 1. \]

In a random sample of 1000 registered voters from the city, it is found that 710 are in favor of the proposition.
a. Find the posterior distribution that each statistician assigns to $\theta$.
b. Find the Bayes estimate for each statistician based on the squared error loss function.
c. Show that after the opinions of the 1000 registered voters in the random sample had been obtained, the Bayes estimates for the two statisticians could not possibly differ by more than 0.002, regardless of the number in the sample who were in favor of the proposition.

13. Suppose that $X_1, \ldots, X_n$ form a random sample from the uniform distribution on the interval $[0, \theta]$, where the value of the parameter $\theta$ is unknown. Suppose also that the prior distribution of $\theta$ is the Pareto distribution with parameters $x_0$ and $\alpha$ ($x_0 > 0$ and $\alpha > 0$), as defined in Exercise 16 of Sec. 5.7. If the value of $\theta$ is to be estimated by using the squared error loss function, what is the Bayes estimator of $\theta$? (See Exercise 18 of Sec. 7.3.)

14. Suppose that $X_1, \ldots, X_n$ form a random sample from an exponential distribution for which the value of the parameter $\theta$ is unknown ($\theta > 0$). Let $\xi(\theta)$ denote the prior p.d.f. of $\theta$, and let $\hat{\theta}$ denote the Bayes estimator of $\theta$ with respect to the prior p.d.f. $\xi(\theta)$ when the squared error loss function is used. Let $\psi = \theta^2$, and suppose that instead of estimating $\theta$, it is desired to estimate the value of $\psi$ subject to the following squared error loss function:

\[ L(\psi, a) = (\psi - a)^2 \quad \text{for } \psi > 0 \text{ and } a > 0. \]

Let $\hat{\psi}$ denote the Bayes estimator of $\psi$. Explain why $\hat{\psi} > \hat{\theta}^2$. Hint: Look at Exercise 4 in Sec. 4.4.

15. Let $c > 0$ and consider the loss function

\[ L(\theta, a) = \begin{cases} c\,|\theta - a| & \text{if } \theta < a, \\ |\theta - a| & \text{if } \theta \ge a. \end{cases} \]

Assume that $\theta$ has a continuous distribution. Prove that a Bayes estimator of $\theta$ will be any $1/(1 + c)$ quantile of the posterior distribution of $\theta$. Hint: The proof is a lot like the proof of Theorem 4.5.3. The result holds even if $\theta$ does not have a continuous distribution, but the proof is more cumbersome.

7.5 Maximum Likelihood Estimators

Maximum likelihood estimation is a method for choosing estimators of parameters that avoids using prior distributions and loss functions. It chooses as the estimate of $\theta$ the value of $\theta$ that provides the largest value of the likelihood function.

Introduction

Example 7.5.1 Lifetimes of Electronic Components. Suppose that we observe the data in Example 7.3.11 consisting of the lifetimes of three electronic components. Is there a method for estimating the failure rate $\theta$ without first constructing a prior distribution and a loss function?
◄

In this section, we shall develop a relatively simple method of constructing an estimator without having to specify a loss function and a prior distribution. It is called the method of maximum likelihood, and it was introduced by R. A. Fisher in 1912. Maximum likelihood estimation can be applied in most problems, it has a strong intuitive appeal, and it will often yield a reasonable estimator of $\theta$. Furthermore, if the sample is large, the method will typically yield an excellent estimator of $\theta$. For these reasons, the method of maximum likelihood is probably the most widely used method of estimation in statistics.

Note: Terminology. Because maximum likelihood estimation, as well as many other procedures to be introduced later in the text, does not involve the specification of a prior distribution of the parameter, some different terminology is often used in describing
the statistical models to which these procedures are applied. Rather than saying that $X_1, \ldots, X_n$ are i.i.d. with p.f. or p.d.f. $f(x \mid \theta)$ conditional on $\theta$, we might say that $X_1, \ldots, X_n$ form a random sample from a distribution with p.f. or p.d.f. $f(x \mid \theta)$ where $\theta$ is unknown. More specifically, in Example 7.5.1, we could say that the lifetimes form a random sample from the exponential distribution with unknown parameter $\theta$.

Definition of a Maximum Likelihood Estimator

Let the random variables $X_1, \ldots, X_n$ form a random sample from a discrete distribution or a continuous distribution for which the p.f. or the p.d.f. is $f(x \mid \theta)$, where the parameter $\theta$ belongs to some parameter space $\Omega$. Here, $\theta$ can be either a real-valued parameter or a vector. For every observed vector $x = (x_1, \ldots, x_n)$ in the sample, the value of the joint p.f. or joint p.d.f. will, as usual, be denoted by $f_n(x \mid \theta)$. Because of its importance in this section, we repeat Definition 7.2.3.

Definition 7.5.1 Likelihood Function. When the joint p.d.f. or the joint p.f. $f_n(x \mid \theta)$ of the observations in a random sample is regarded as a function of $\theta$ for given values of $x_1, \ldots, x_n$, it is called the likelihood function.

Consider first the case in which the observed vector $x$ came from a discrete distribution. If an estimate of $\theta$ must be selected, we would certainly not consider any value of $\theta \in \Omega$ for which it would be impossible to obtain the vector $x$ that was actually observed. Furthermore, suppose that the probability $f_n(x \mid \theta)$ of obtaining the actual observed vector $x$ is very high when $\theta$ has a particular value, say, $\theta = \theta_0$, and is very small for every other value of $\theta \in \Omega$. Then we would naturally estimate the value of $\theta$ to be $\theta_0$ (unless we had strong prior information that outweighed the evidence in the sample and pointed toward some other value). When the sample comes from a continuous distribution, it would again be natural to try to find a value of $\theta$ for which
the probability density $f_n(x \mid \theta)$ is large and to use this value as an estimate of $\theta$. For each possible observed vector $x$, we are led by this reasoning to consider a value of $\theta$ for which the likelihood function $f_n(x \mid \theta)$ is a maximum and to use this value as an estimate of $\theta$. This concept is formalized in the following definition.

Definition 7.5.2 Maximum Likelihood Estimator/Estimate. For each possible observed vector $x$, let $\delta(x) \in \Omega$ denote a value of $\theta \in \Omega$ for which the likelihood function $f_n(x \mid \theta)$ is a maximum, and let $\hat{\theta} = \delta(X)$ be the estimator of $\theta$ defined in this way. The estimator $\hat{\theta}$ is called a maximum likelihood estimator of $\theta$. After $X = x$ is observed, the value $\delta(x)$ is called a maximum likelihood estimate of $\theta$.

The expressions maximum likelihood estimator and maximum likelihood estimate are abbreviated M.L.E. One must rely on context to determine whether the abbreviation refers to an estimator or to an estimate. Note that the M.L.E.
is required to be an element of the parameter space $\Omega$, unlike general estimators/estimates for which no such requirement exists.

Examples of Maximum Likelihood Estimators

Example 7.5.2 Lifetimes of Electronic Components. In Example 7.3.11, the observed data are $X_1 = 3$, $X_2 = 1.5$, and $X_3 = 2.1$. The random variables had been modeled as a random sample of size 3 from the exponential distribution with parameter $\theta$. The likelihood function is, for $\theta > 0$,

\[ f_3(x \mid \theta) = \theta^3 \exp(-6.6\,\theta), \]

where $x = (3, 1.5, 2.1)$, so that $x_1 + x_2 + x_3 = 6.6$. The value of $\theta$ that maximizes the likelihood function $f_3(x \mid \theta)$ will be the same as the value of $\theta$ that maximizes $\log f_3(x \mid \theta)$, since log is an increasing function. Therefore, it will be convenient to determine the M.L.E. by finding the value of $\theta$ that maximizes

\[ L(\theta) = \log f_3(x \mid \theta) = 3 \log(\theta) - 6.6\,\theta. \]

Taking the derivative $dL(\theta)/d\theta = 3/\theta - 6.6$, setting the derivative to 0, and solving for $\theta$ yields $\theta = 3/6.6 = 0.455$.
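The calculation in Example 7.5.2 can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes the three lifetimes 3, 1.5, and 2.1 quoted in the example, and the names `log_likelihood`, `theta_closed_form`, and `theta_grid` as well as the search grid on (0, 2] are illustrative choices.

```python
import math

# Observed lifetimes from Example 7.3.11, as quoted in Example 7.5.2.
lifetimes = [3.0, 1.5, 2.1]
n = len(lifetimes)
total = sum(lifetimes)  # approximately 6.6


def log_likelihood(theta: float) -> float:
    """Exponential log-likelihood: n * log(theta) - theta * sum(x)."""
    return n * math.log(theta) - theta * total


# Closed-form maximizer from setting the derivative n/theta - sum(x) to 0.
theta_closed_form = n / total

# Independent check: coarse grid search over (0, 2] with step 0.0001.
# The grid is an assumption made for illustration only.
grid = [k / 10000 for k in range(1, 20001)]
theta_grid = max(grid, key=log_likelihood)

print(round(theta_closed_form, 3))  # 0.455
print(abs(theta_grid - theta_closed_form) < 1e-4)  # True
```

The grid search is only a sanity check; for this model the closed form $n / \sum x_i$ is exact, and the grid maximizer should agree with it up to the grid spacing.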
The second derivative is negative at this value of $\theta$, so it provides a maximum. The maximum likelihood estimate is then 0.455. ◄

It should be noted that in some problems, for certain observed vectors $x$, the maximum value of $f_n(x \mid \theta)$ may not actually be attained for any point $\theta \in \Omega$. In such a case, an M.L.E. of $\theta$ does not exist. For certain other observed vectors $x$, the maximum value of $f_n(x \mid \theta)$ may actually be attained at more than one point in the space $\Omega$. In such a case, the M.L.E. is not uniquely defined, and any one of these points can be chosen as the value of the estimator $\hat{\theta}$. In many practical problems, however, the M.L.E. exists and is uniquely defined.

We shall now illustrate the method of maximum likelihood and these various possibilities by considering several examples. In each example, we shall attempt to determine an M.L.E.

Example 7.5.3 Test for a Disease. Suppose that you are walking down the street and notice that the Department of Public Health is giving a free medical test for a certain disease. The
test is 90 percent reliable in the following sense: If a person has the disease, there is a probability of 0.9 that the test will give a positive response; whereas, if a person does not have the disease, there is a probability of only 0.1 that the test will give a positive response. This same test was considered in Example 2.3.1. We shall let $X$ stand for the result of the test, where $X = 1$ means that the test is positive and $X = 0$ means that the test is negative. Let the parameter space be $\Omega = \{0.1, 0.9\}$, where $\theta = 0.1$ means that the person tested does not have the disease, and $\theta = 0.9$ means that the person has the disease. This parameter space was chosen so that, given $\theta$, $X$ has the Bernoulli distribution with parameter $\theta$. The likelihood function is

\[ f(x \mid \theta) = \theta^x (1 - \theta)^{1 - x}. \]

If $x = 0$ is observed, then

\[ f(0 \mid \theta) = \begin{cases} 0.9 & \text{if } \theta = 0.1, \\ 0.1 & \text{if } \theta = 0.9. \end{cases} \]

Clearly, $\theta = 0.1$ maximizes the likelihood when $x = 0$ is observed. If $x = 1$ is observed, then

\[ f(1 \mid \theta) = \begin{cases} 0.1 & \text{if } \theta = 0.1, \\ 0.9 & \text{if } \theta = 0.9. \end{cases} \]
Clearly, $\theta = 0.9$ maximizes the likelihood when $x = 1$ is observed. Hence, we have that the M.L.E. is
\[ \hat{\theta} = \begin{cases} 0.1 & \text{if } X = 0, \\ 0.9 & \text{if } X = 1. \end{cases} \]
◀

Example 7.5.4 Sampling from a Bernoulli Distribution. Suppose that the random variables $X_1, \ldots, X_n$ form a random sample from the Bernoulli distribution with parameter $\theta$, which is unknown ($0 \le \theta \le 1$). For all observed values $x_1, \ldots, x_n$, where each $x_i$ is either 0 or 1, the likelihood function is
\[ f_n(\boldsymbol{x} \mid \theta) = \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{1 - x_i}. \tag{7.5.1} \]
Instead of maximizing the likelihood function $f_n(\boldsymbol{x} \mid \theta)$ directly, it is again easier to maximize $\log f_n(\boldsymbol{x} \mid \theta)$:
\[ L(\theta) = \log f_n(\boldsymbol{x} \mid \theta) = \sum_{i=1}^{n} \left[ x_i \log \theta + (1 - x_i) \log(1 - \theta) \right] = \left( \sum_{i=1}^{n} x_i \right) \log \theta + \left( n - \sum_{i=1}^{n} x_i \right) \log(1 - \theta). \]
Now calculate the derivative $dL(\theta)/d\theta$, set this derivative equal to 0, and solve the resulting equation for $\theta$.
If $\sum_{i=1}^{n} x_i \notin \{0, n\}$, we find that the derivative is 0 at $\theta = \bar{x}_n$, and it can be verified (for example, by examining the second derivative) that this value does indeed maximize $L(\theta)$ and the likelihood function defined by Eq. (7.5.1). If $\sum_{i=1}^{n} x_i = 0$, then $L(\theta)$ is a decreasing function of $\theta$ for all $\theta$, and hence $L$ achieves its maximum at $\theta = 0$. Similarly, if $\sum_{i=1}^{n} x_i = n$, $L$ is an increasing function, and it achieves its maximum at $\theta = 1$. In these last two cases, note that the maximum of the likelihood occurs at $\theta = \bar{x}_n$. It follows, therefore, that the M.L.E. of $\theta$ is $\hat{\theta} = \bar{X}_n$. ◀

It follows from Example 7.5.4 that if $X_1, \ldots, X_n$ are regarded as $n$ Bernoulli trials and if the parameter space is $\Omega = [0, 1]$, then the M.L.E. of the unknown probability of success on any given trial is simply the proportion of successes observed in the $n$ trials. In Example 7.5.3, we have $n = 1$ Bernoulli trial, but the parameter space is $\Omega = \{0.1, 0.9\}$ rather than $[0, 1]$, and the M.L.E. differs from the proportion of successes.
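The contrast between the two parameter spaces can be checked numerically. The following is a minimal sketch, not part of the original text: the function names, the sample of 7 successes in 10 trials, and the grid of 101 points standing in for $\Omega = [0, 1]$ are all illustrative assumptions. It evaluates the log-likelihood $L(\theta)$ of Example 7.5.4 and picks the best candidate value of $\theta$.

```python
import math


def _count_times_log(count, p):
    """Return count * log(p), using the convention 0 * log(0) = 0."""
    if count == 0:
        return 0.0
    return count * math.log(p) if p > 0 else -math.inf


def bernoulli_log_likelihood(theta, xs):
    """L(theta) = (sum x_i) log(theta) + (n - sum x_i) log(1 - theta)."""
    successes = sum(xs)
    failures = len(xs) - successes
    return _count_times_log(successes, theta) + _count_times_log(failures, 1 - theta)


def mle_over(candidates, xs):
    """Return the candidate value of theta with the largest log-likelihood."""
    return max(candidates, key=lambda theta: bernoulli_log_likelihood(theta, xs))


# Hypothetical sample: 7 successes in 10 trials.
sample = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
grid = [k / 100 for k in range(101)]  # coarse stand-in for Omega = [0, 1]

theta_hat_interval = mle_over(grid, sample)      # agrees with the sample proportion
theta_hat_two_point = mle_over([0.1, 0.9], [1])  # Omega = {0.1, 0.9}, x = 1 observed
```

On the grid, the maximizer coincides with the sample proportion, including the boundary cases where every observation is 0 or every observation is 1. With the two-point parameter space of Example 7.5.3, a single positive test gives 0.9 even though the proportion of successes is 1.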
Example 7.5.5 Sampling from a Normal Distribution with Unknown Mean. Suppose that $X_1, \ldots, X_n$ form a random sample from a normal distribution for which the mean $\mu$ is unknown and the variance $\sigma^2$ is known. For all observed values $x_1, \ldots, x_n$, the likelihood function $f_n(\boldsymbol{x} \mid \mu)$ will be
\[ f_n(\boldsymbol{x} \mid \mu) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right]. \tag{7.5.2} \]
It can be seen from Eq. (7.5.2) that $f_n(\boldsymbol{x} \mid \mu)$ will be maximized by the value of $\mu$ that minimizes
\[ Q(\mu) = \sum_{i=1}^{n} (x_i - \mu)^2 = \sum_{i=1}^{n} x_i^2 - 2\mu \sum_{i=1}^{n} x_i + n\mu^2. \]
We see that $Q$ is a quadratic in $\mu$ with positive coefficient on $\mu^2$. It follows that $Q$ will be minimized where its derivative is 0. If we now calculate the derivative $dQ(\mu)/d\mu$, set this derivative equal to 0, and solve the resulting equation for $\mu$, we find that $\mu = \bar{x}_n$. It follows, therefore, that the M.L.E. of $\mu$ is $\hat{\mu} = \bar{X}_n$. ◀

It can be seen in Example 7.5.5 that the estimator $\hat{\mu}$ is not affected by the value of the variance $\sigma^2$, which we assumed was known. The M.L.E. of the unknown mean $\mu$ is simply the sample mean $\bar{X}_n$, regardless of the value of $\sigma^2$. We shall see this again in the next example, in which both $\mu$ and $\sigma^2$ must be estimated.

Example 7.5.6 Sampling from a Normal Distribution with Unknown Mean and Variance. Suppose again that $X_1, \ldots, X_n$ form a random sample from a normal distribution, but suppose now that both the mean $\mu$ and the variance $\sigma^2$ are unknown. The parameter is then $\theta = (\mu, \sigma^2)$. For all observed values $x_1, \ldots, x_n$, the likelihood function $f_n(\boldsymbol{x} \mid \mu, \sigma^2)$ will again be given by the right side of Eq. (7.5.2). This function must now be maximized over all possible values of $\mu$ and $\sigma^2$, where $-\infty < \mu < \infty$ and $\sigma^2 > 0$. Instead of maximizing the likelihood function $f_n(\boldsymbol{x} \mid \mu, \sigma^2)$ directly, it is again easier to maximize $\log f_n(\boldsymbol{x} \mid \mu, \sigma^2)$. We have
\[ L(\theta) = \log f_n(\boldsymbol{x} \mid \mu, \sigma^2) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2. \tag{7.5.3} \]
We shall find the value of $\theta = (\mu, \sigma^2)$ for which $L(\theta)$ is maximum in three stages. First, for each fixed $\sigma^2$, we shall find the value $\hat{\mu}(\sigma^2)$ that maximizes the right side of (7.5.3). Second, we shall find the value $\widehat{\sigma^2}$ of $\sigma^2$ that maximizes $L(\theta')$ when $\theta' = (\hat{\mu}(\sigma^2), \sigma^2)$. Finally, the M.L.E. of $\theta$ will be the random vector whose observed value is $(\hat{\mu}(\widehat{\sigma^2}), \widehat{\sigma^2})$. The first stage has already been solved in Example 7.5.5. There, we obtained $\hat{\mu}(\sigma^2) = \bar{x}_n$. For the second stage, we set $\theta' = (\bar{x}_n, \sigma^2)$ and maximize
\[ L(\theta') = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \bar{x}_n)^2. \tag{7.5.4} \]
This can be maximized by setting its derivative with respect to $\sigma^2$ equal to 0 and solving for $\sigma^2$. The derivative is
\[ \frac{d}{d\sigma^2} L(\theta') = -\frac{n}{2} \frac{1}{\sigma^2} + \frac{1}{2(\sigma^2)^2} \sum_{i=1}^{n} (x_i - \bar{x}_n)^2. \]
Setting this to 0 yields
\[ \widehat{\sigma^2} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}_n)^2. \tag{7.5.5} \]
The second derivative of (7.5.4) is negative at the value of $\sigma^2$ in (7.5.5), so we have found the maximum. Therefore, the M.L.E. of $\theta = (\mu, \sigma^2)$ is
\[ \hat{\theta} = (\hat{\mu}, \widehat{\sigma^2}) = \left( \bar{X}_n, \; \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2 \right). \tag{7.5.6} \]
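The closed form in Eq. (7.5.6) is easy to compute directly. The sketch below is an illustration only: the function names and the eight data values are assumptions, not taken from the text. It computes the observed value of the M.L.E. and also evaluates the log-likelihood of Eq. (7.5.3), so that nearby parameter values can be compared against the maximizer.

```python
import math


def normal_mle(xs):
    """Observed value of the M.L.E. in Eq. (7.5.6): (sample mean, divisor-n variance)."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n  # divisor n, as in Eq. (7.5.5)
    return mu_hat, sigma2_hat


def normal_log_likelihood(mu, sigma2, xs):
    """Log-likelihood of Eq. (7.5.3) for a normal sample."""
    n = len(xs)
    sum_sq = sum((x - mu) ** 2 for x in xs)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * n * math.log(sigma2)
            - sum_sq / (2 * sigma2))


# Hypothetical observed values, chosen so the arithmetic is exact.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu_hat, sigma2_hat = normal_mle(data)
best = normal_log_likelihood(mu_hat, sigma2_hat, data)
```

For these data the sample mean is 5 and the sum of squared deviations is 32, so the estimate of the variance is 32/8 = 4. The divisor is $n$ rather than $n - 1$, which matches what Python's `statistics.pvariance` computes rather than `statistics.variance`.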
Notice that the first coordinate of the M.L.E. in Eq. (7.5.6) is called the sample mean of the data. Likewise, we call the second coordinate of this M.L.E. the sample variance. It is not difficult to see that the observed value of the sample variance is the variance of a distribution that assigns probability $1/n$ to each of the $n$ observed values $x_1, \ldots, x_n$ in the sample. (See Exercise 1.) ◀

Example 7.5.7 Sampling from a Uniform Distribution. Suppose that $X_1, \ldots, X_n$ form a random sample from the uniform distribution on the interval $[0, \theta]$, where the value of the parameter $\theta$ is unknown ($\theta > 0$). The p.d.f. $f(x \mid \theta)$ of each observation has the following form:
\[ f(x \mid \theta) = \begin{cases} \dfrac{1}{\theta} & \text{for } 0 \le x \le \theta, \\ 0 & \text{otherwise.} \end{cases} \tag{7.5.7} \]
Therefore, the joint p.d.f. $f_n(\boldsymbol{x} \mid \theta)$ of $X_1, \ldots, X_n$ has the form
\[ f_n(\boldsymbol{x} \mid \theta) = \begin{cases} \dfrac{1}{\theta^n} & \text{for } 0 \le x_i \le \theta \ (i = 1, \ldots, n), \\ 0 & \text{otherwise.} \end{cases} \tag{7.5.8} \]
It can be seen from Eq. (7.5.8) that the M.L.E. of $\theta$ must be a value of $\theta$ for which $\theta \ge x_i$ for $i = 1, \ldots, n$ and that maximizes $1/\theta^n$ among all such values.
Since $1/\theta^n$ is a decreasing function of $\theta$, the estimate will be the smallest value of $\theta$ such that $\theta \ge x_i$ for $i = 1, \ldots, n$. Since this value is $\theta = \max\{x_1, \ldots, x_n\}$, the M.L.E. of $\theta$ is $\hat{\theta} = \max\{X_1, \ldots, X_n\}$. ◀

Limitations of Maximum Likelihood Estimation

Despite its intuitive appeal, the method of maximum likelihood is not necessarily appropriate in all problems. For instance, in Example 7.5.7, the M.L.E. $\hat{\theta}$ does not seem to be a suitable estimator of $\theta$. Since $\max\{X_1, \ldots, X_n\} < \theta$ with probability 1, it follows that $\hat{\theta}$ surely underestimates the value of $\theta$. Indeed, if any prior distribution is assigned to $\theta$, then the Bayes estimator of $\theta$ will surely be greater than $\hat{\theta}$. The actual amount by which the Bayes estimator exceeds $\hat{\theta}$ will, of course, depend on the particular prior distribution that is used and on the observed values of $X_1, \ldots, X_n$. Example 7.5.7 also raises another difficulty with maximum likelihood, as we illustrate in Example 7.5.8.

Example 7.5.8 Nonexistence of an M.L.E. Suppose again that $X_1, \ldots, X_n$ form a random sample from the uniform distribution on the interval $[0, \theta]$. However, suppose now that instead of writing the p.d.f. $f(x \mid \theta)$ of the uniform distribution in the form given in Eq. (7.5.7), we write it in the following form:
\[ f(x \mid \theta) = \begin{cases} \dfrac{1}{\theta} & \text{for } 0 < x < \theta, \\ 0 & \text{otherwise.} \end{cases} \tag{7.5.9} \]
The only difference between Eq. (7.5.7) and Eq. (7.5.9) is that the value of the p.d.f. at each of the two endpoints 0 and $\theta$ has been changed by replacing the weak inequalities in Eq. (7.5.7) with strict inequalities in Eq. (7.5.9). Therefore, either equation could be used as the p.d.f. of the uniform distribution. However, if Eq. (7.5.9) is used as the p.d.f., then an M.L.E. of $\theta$ will be a value of $\theta$ for which $\theta > x_i$ for $i = 1, \ldots, n$ and which maximizes $1/\theta^n$ among all such values. It should be noted that the possible values of $\theta$ no longer include the value $\theta = \max\{x_1, \ldots, x_n\}$, because $\theta$ must be strictly greater than each observed value $x_i$ ($i = 1, \ldots, n$).
Because $\theta$ can be chosen arbitrarily close to the value $\max\{x_1, \ldots, x_n\}$ but cannot be chosen equal to this value, it follows that the M.L.E. of $\theta$ does not exist. ◀

In all of our previous discussions about p.d.f.'s, we emphasized the fact that it is irrelevant whether the p.d.f. of the uniform distribution is chosen to be equal to $1/\theta$ over the open interval $0 < x < \theta$ or over the closed interval $0 \le x \le \theta$. Now, however, we see that the existence of an M.L.E. depends on this irrelevant and unimportant choice. This difficulty is easily avoided in Example 7.5.8 by using the p.d.f. given by Eq. (7.5.7) rather than that given by Eq. (7.5.9). In many other problems as well, a difficulty of this type can be avoided simply by choosing one particular appropriate version of the p.d.f. to represent the given distribution. However, as we shall see in Example 7.5.10, the difficulty cannot always be avoided.

Example 7.5.9 Non-uniqueness of an M.L.E. Suppose that $X_1, \ldots, X_n$ form a random sample from the uniform distribution on the interval $[\theta, \theta + 1]$, where the value of the parameter $\theta$ is unknown ($-\infty < \theta < \infty$). In this example, the joint p.d.f. $f_n(\boldsymbol{x} \mid \theta)$ has the form
\[ f_n(\boldsymbol{x} \mid \theta) = \begin{cases} 1 & \text{for } \theta \le x_i \le \theta + 1 \ (i = 1, \ldots, n), \\ 0 & \text{otherwise.} \end{cases} \tag{7.5.10} \]
The condition that $\theta \le x_i$ for $i = 1, \ldots, n$ is equivalent to the condition that $\theta \le \min\{x_1, \ldots, x_n\}$. Similarly, the condition that $x_i \le \theta + 1$ for $i = 1, \ldots, n$ is equivalent to the condition that $\theta \ge \max\{x_1, \ldots, x_n\} - 1$. Therefore, instead of writing $f_n(\boldsymbol{x} \mid \theta)$ in the form given in Eq. (7.5.10), we can use the following form:
\[ f_n(\boldsymbol{x} \mid \theta) = \begin{cases} 1 & \text{for } \max\{x_1, \ldots, x_n\} - 1 \le \theta \le \min\{x_1, \ldots, x_n\}, \\ 0 & \text{otherwise.} \end{cases} \tag{7.5.11} \]
Thus, it is possible to select as an M.L.E. any value of $\theta$ in the interval
\[ \max\{x_1, \ldots, x_n\} - 1 \le \theta \le \min\{x_1, \ldots, x_n\}. \tag{7.5.12} \]
In this example, the M.L.E. is not uniquely specified. In fact, the method of maximum likelihood provides very little help in choosing an estimate of $\theta$. The likelihood of every value of $\theta$ outside the interval (7.5.12) is actually 0. Therefore, no value $\theta$ outside this interval would ever be estimated, and all values inside the interval are M.L.E.'s. ◀

Example 7.5.10 Sampling from a Mixture of Two Distributions. Consider a random variable $X$ that can come with equal probability either from the normal distribution with mean 0 and variance 1 or from another normal distribution with mean $\mu$ and variance $\sigma^2$, where both $\mu$ and $\sigma^2$ are unknown. Under these conditions, the p.d.f. $f(x \mid \mu, \sigma^2)$ of $X$ will be the average of the p.d.f.'s of the two different normal distributions. Thus,
\[ f(x \mid \mu, \sigma^2) = \frac{1}{2} \left\{ \frac{1}{(2\pi)^{1/2}} \exp\left( -\frac{x^2}{2} \right) + \frac{1}{(2\pi)^{1/2} \sigma} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right] \right\}. \tag{7.5.13} \]
Suppose now that $X_1, \ldots, X_n$ form a random sample from the distribution for which the p.d.f. is given by Eq. (7.5.13). As usual, the likelihood function $f_n(\boldsymbol{x} \mid \mu, \sigma^2)$ has the form
\[ f_n(\boldsymbol{x} \mid \mu, \sigma^2) = \prod_{i=1}^{n} f(x_i \mid \mu, \sigma^2). \tag{7.5.14} \]
To find the M.L.E. of $\theta = (\mu, \sigma^2)$, we must find values of $\mu$ and $\sigma^2$ for which $f_n(\boldsymbol{x} \mid \mu, \sigma^2)$ is maximized.

Let $x_k$ denote any one of the observed values $x_1, \ldots, x_n$. If we let $\mu = x_k$ and let $\sigma^2 \to 0$, then the factor $f(x_k \mid \mu, \sigma^2)$ on the right side of Eq. (7.5.14) will grow large without bound, while each factor $f(x_i \mid \mu, \sigma^2)$ for $x_i \ne x_k$ will approach the value
\[ \frac{1}{2(2\pi)^{1/2}} \exp\left( -\frac{x_i^2}{2} \right). \]
Hence, when $\mu = x_k$ and $\sigma^2 \to 0$, we find that $f_n(\boldsymbol{x} \mid \mu, \sigma^2) \to \infty$.

The value 0 is not a permissible estimate of $\sigma^2$, because we know in advance that $\sigma^2 > 0$. Since the likelihood function can be made arbitrarily large by choosing $\mu = x_k$ and choosing $\sigma^2$ arbitrarily close to 0, it follows that the M.L.E. does not exist.

If we try to correct this difficulty by allowing the value 0 to be a permissible estimate of $\sigma^2$, then we find that there are $n$ different M.L.E.'s of $\mu$ and $\sigma^2$; namely,
\[ \hat{\theta}_k = (\hat{\mu}, \widehat{\sigma^2}) = (X_k, 0) \quad \text{for } k = 1, \ldots, n. \]
None of these estimators seems appropriate.
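The unbounded likelihood in Example 7.5.10 can be observed numerically. The following sketch is illustrative only: the five data values, the choice $k = 3$ (index 2 in the code), and the function names are assumptions made for the demonstration. It evaluates the log of the likelihood in Eq. (7.5.14) with $\mu$ fixed at one observed value while $\sigma^2$ shrinks toward 0.

```python
import math


def mixture_pdf(x, mu, sigma2):
    """Equal-weight mixture of N(0, 1) and N(mu, sigma2), as in Eq. (7.5.13)."""
    standard = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    other = math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
    return 0.5 * (standard + other)


def mixture_log_likelihood(xs, mu, sigma2):
    """Log of the likelihood in Eq. (7.5.14)."""
    return sum(math.log(mixture_pdf(x, mu, sigma2)) for x in xs)


def log_likelihood_path(xs, k, sigma2_values):
    """Log-likelihood with mu pinned at xs[k], for each sigma2 in sigma2_values."""
    return [mixture_log_likelihood(xs, xs[k], s2) for s2 in sigma2_values]


# Made-up observations; mu is pinned at the third one.
observations = [-0.3, 0.8, 1.9, 2.4, 3.1]
shrinking_variances = [1e-4, 1e-8, 1e-16, 1e-32]
path = log_likelihood_path(observations, 2, shrinking_variances)

for s2, value in zip(shrinking_variances, path):
    print(f"sigma^2 = {s2:g}: log-likelihood = {value:.3f}")
```

Each further reduction of $\sigma^2$ raises the log-likelihood, because the factor for the pinned observation grows like $1/\sigma$ while the other factors settle at the limiting values given in the example. No finite maximizer is reached.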
Consider again the description, given at the beginning of this example, of the two normal distributions from which each observation might come. Suppose, for example, that $n = 1000$, and we use the estimator $\hat{\theta}_3 = (X_3, 0)$. Then, we would be estimating the value of the unknown variance to be 0; also, in effect, we would be behaving as if exactly one of the $X_i$'s (namely, $X_3$) comes from the given unknown normal distribution, whereas all the other 999 observation values come from the normal distribution with mean 0 and variance 1. In fact, however, since each observation was equally likely to come from either of the two distributions, it is much more probable that hundreds of observations, rather than just one, come from the unknown normal distribution. In this example, the method of maximum likelihood is obviously unsatisfactory. A Bayesian solution to this problem is outlined in Exercise 10 in Sec. 12.5. ◀

Finally, we shall mention one point concerning the interpretation of the M.L.E.
The M.L.E. is the value of $\theta$ that maximizes the conditional p.f. or p.d.f. of the data $\boldsymbol{X}$ given $\theta$. Therefore, the maximum likelihood estimate is the value of $\theta$ that assigned the highest probability to seeing the observed data. It is not necessarily the value of the parameter that appears to be most likely given the data. To say how likely different values of the parameter are, one would need a probability distribution for the parameter. Of course, the posterior distribution of the parameter (Sec. 7.2) would serve this purpose, but no posterior distribution is involved in the calculation of the M.L.E. Hence, it is not legitimate to interpret the M.L.E. as the most likely value of the parameter after having seen the data.

For example, consider a situation covered by Example 7.5.4. Suppose that we are going to flip a coin a few times, and we are concerned with whether or not it has a slight bias toward heads or toward tails.
Let $X_i = 1$ if the $i$th flip is heads and $X_i = 0$ if not. If we obtain four heads and one tail in the first five flips, the observed value of the M.L.E. will be 0.8. But it would be difficult to imagine a situation in which we would feel that the most likely value of $\theta$, the probability of heads, is as large as 0.8 based on just five tosses of what appeared a priori to be a typical coin. Treating the M.L.E. as if it were the most likely value of the parameter is very much the same as ignoring the prior information about the rare disease in the medical test of Examples 2.3.1 and 2.3.3. If the test is positive in these examples, we found (in Example 7.5.3) that the M.L.E. takes the value $\hat{\theta} = 0.9$, which corresponds to having the disease. However, if the prior probability that you have the disease is as small as in Example 2.3.1, the posterior probability that you have the disease ($\theta = 0.9$) is still small even after the positive test result.
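The gap between the M.L.E. and the posterior probability can be made concrete with Bayes' theorem. The sketch below is a hedged illustration: the prior probability of 1 in 10,000 is an assumption standing in for the figure used in Example 2.3.1, which is not restated in this section, and the function name is hypothetical. The test characteristics (0.9 and 0.1) are those of Example 7.5.3.

```python
def posterior_given_positive(prior, p_pos_if_diseased=0.9, p_pos_if_healthy=0.1):
    """Pr(disease | positive test) by Bayes' theorem."""
    numerator = p_pos_if_diseased * prior
    denominator = numerator + p_pos_if_healthy * (1 - prior)
    return numerator / denominator


# Assumed prior: 1 person in 10,000 has the disease (illustrative value).
assumed_prior = 1e-4
posterior = posterior_given_positive(assumed_prior)
print(f"posterior probability of disease: {posterior:.5f}")
```

Under this assumed prior, the posterior probability of disease after a positive test is still below one in a thousand, even though the M.L.E. after the same positive result is 0.9. With a prior of one half, the same function returns 0.9, which is the situation in which the data rather than the prior dominate.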
The test is not accurate enough to completely overcome the prior information. So too with our coin tossing; five tosses are not enough information to overcome prior beliefs about the coin being typical. Only when the data contain much more information than is available a priori would it be approximately correct to think of the M.L.E. as the value that we believe the parameter is most likely to be near. This could happen either when the M.L.E. is based on a lot of data or when there is very little prior information.

Summary

The maximum likelihood estimate of a parameter $\theta$ is that value of $\theta$ that provides the largest value of the likelihood function $f_n(\boldsymbol{x} \mid \theta)$ for fixed data $\boldsymbol{x}$. If $\delta(\boldsymbol{x})$ denotes the maximum likelihood estimate, then $\hat{\theta} = \delta(\boldsymbol{X})$ is the maximum likelihood estimator (M.L.E.). We have computed the M.L.E. when the data comprise a random sample from a Bernoulli distribution, a normal distribution with known variance, a normal distribution with both parameters unknown, or the uniform distribution on the interval $[0, \theta]$ or on the interval $[\theta, \theta + 1]$.

Exercises

1. Let $x_1, \ldots, x_n$ be distinct numbers. Let $Y$ be a discrete random variable with the following p.f.:
\[ f(y) = \begin{cases} \dfrac{1}{n} & \text{if } y \in \{x_1, \ldots, x_n\}, \\ 0 & \text{otherwise.} \end{cases} \]
Prove that $\mathrm{Var}(Y)$ is given by Eq. (7.5.5).

2. It is not known what proportion $p$ of the purchases of a certain brand of breakfast cereal are made by women and what proportion are made by men. In a random sample of 70 purchases of this cereal, it was found that 58 were made by women and 12 were made by men. Find the M.L.E. of $p$.

3. Consider again the conditions in Exercise 2, but suppose also that it is known that $\frac{1}{2} \le p \le \frac{2}{3}$. If the observations in the random sample of 70 purchases are as given in Exercise 2, what is the M.L.E. of $p$?

4. Suppose that $X_1, \ldots, X_n$ form a random sample from the Bernoulli distribution with parameter $\theta$, which is unknown, but it is known that $\theta$ lies in the open interval $0 < \theta < 1$. Show that the M.L.E. of $\theta$ does not exist if every observed value is 0 or if every observed value is 1.

5. Suppose that $X_1, \ldots, X_n$ form a random sample from a Poisson distribution for which the mean $\theta$ is unknown ($\theta > 0$).
a. Determine the M.L.E. of $\theta$, assuming that at least one of the observed values is different from 0.
b. Show that the M.L.E. of $\theta$ does not exist if every observed value is 0.

6. Suppose that $X_1, \ldots, X_n$ form a random sample from a normal distribution for which the mean $\mu$ is known, but the variance $\sigma^2$ is unknown. Find the M.L.E. of $\sigma^2$.

7. Suppose that $X_1, \ldots, X_n$ form a random sample from an exponential distribution for which the value of the parameter $\beta$ is unknown ($\beta > 0$). Find the M.L.E. of $\beta$.

8. Suppose that $X_1, \ldots, X_n$ form a random sample from a distribution for which the p.d.f. $f(x \mid \theta)$ is as follows:
\[ f(x \mid \theta) = \begin{cases} e^{\theta - x} & \text{for } x > \theta, \\ 0 & \text{for } x \le \theta. \end{cases} \]
Also, suppose that the value of $\theta$ is unknown ($-\infty < \theta < \infty$).
a. Show that the M.L.E. of $\theta$ does not exist.
b. Determine another version of the p.d.f. of this same distribution for which the M.L.E. of $\theta$ will exist, and find this estimator.

9. Suppose that $X_1, \ldots, X_n$ form a random sample from a distribution for which the p.d.f. $f(x \mid \theta)$ is as follows:
\[ f(x \mid \theta) = \begin{cases} \theta x^{\theta - 1} & \text{for } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases} \]
Also, suppose that the value of $\theta$ is unknown ($\theta > 0$). Find the M.L.E. of $\theta$.

10. Suppose that $X_1, \ldots, X_n$ form a random sample from a distribution for which the p.d.f. $f(x \mid \theta)$ is as follows:
\[ f(x \mid \theta) = \frac{1}{2} e^{-|x - \theta|} \quad \text{for } -\infty < x < \infty. \]
Also, suppose that the value of $\theta$ is unknown ($-\infty < \theta < \infty$). Find the M.L.E. of $\theta$. Hint: Compare this to the problem of minimizing M.A.E. as in Theorem 4.5.3.

11.
Suppose that X;,..., X,, form a random sample from exactly n; individuals are of type i, where nj +---+np= partir da distribuigdo uniforme no intervalo [@, 6], onde . Encontre os MLEs de&, .. . ,k. the uniform distribution on the interval [6), 6], where n. Find the M.L.E.’s of 6), ..., O. ambos @1e62sd0 desconhecidos (-~<& <@2<«).Encontre os both 6, and 6, are unknown (—oo < 6; < 6) < oo). Find the MLEs de dhe G2. 13.Suponha que os vetores bidimensionais(X1, 51), (X2, S2 M.L.Ev's of 6) and 6. 13. Suppose that the two-dimensional vectors (X,, Yj), 12.Suponha que uma determinada grande populagdo ), «++, Xn, Snformar uma amostra aleatoria a partir de 12. Suppose that a certain large population contains k (Xo, Yo), ..., (Xy, Y,) form a random sample from a bi- contenhak diferentes tipos de individuos (k22), e deixe Geu uma distribuigdo normal bivariada para a qual as médias different types of individuals (k > 2), and let 6; denote variate normal distribution for which the means of X and denotam a proporgdo de individuos do tipoeu, paraeu=1 deXe Ssdo desconhecidos, mas as variagdes deXeSe a the proportion of individuals of type i, fori =1,...,k. Y are unknown but the variances of X and Y and the cor- ,.++,k Aqui, 0S Oeu<1 e+. . .+Ok=1. Suponha também que correlagdo entreXeSsdo conhecidos. Encontre os MLE das Here, 0 < 6; <1 and 6,+---+6,=1. Suppose also that relation between X and Y are known. Find the M.L.E.’s of numa amostra aleatéria denindividuos desta populacdo, médias. in arandom sample of n individuals from this population, the means. 7.6 Propriedades dos Estimadores de Maxima Verossimilhanca 7.6 Properties of Maximum Likelihood Estimators Nesta se¢Go, exploramos varias propriedades dos MLEs, incluindo: In this section, we explore several properties of M.L.E.’s, including: A relaggo entre o MLE de um parametro e o MLE de uma fun¢gdo desse ¢ The relationship between the M.L.E. of a parameter and the M.L.E. 
of a paraémetro function of that parameter A necessidade de algoritmos computacionais ¢ The need for computational algorithms O comportamento do MLE 4 medida que o tamanho da amostra aumenta ¢ The behavior of the M.L.E. as the sample size increases A falta de dependéncia do MLE do plano amostral ¢ The lack of dependence of the M.L.E. on the sampling plan Também introduzimos um método alternativo popular de estimativa (método dos momentos) We also introduce a popular alternative method of estimation (method of mo- que as vezes concorda com a maxima verossimilhanca, mas as vezes pode ser ments) that sometimes agrees with maximum likelihood, but can sometimes be computacionalmente mais simples. computationally simpler. Invariancia Invariance Exemplo Vida Util dos componentes eletr6nicos.No Exemplo 7.1.1, o parametro foi interpretado Example Lifetimes of Electronic Components. In Example 7.1.1, the parameter 6 was interpreted 7.6.1 como a taxa de falha de componentes eletrénicos. No Exemplo 7.4.8, encontramos 7.6.1 as the failure rate of electronic components. In Example 7.4.8, we found a Bayes uma estimativa de Bayes deyY= 1/6,a vida média. Existe um método correspondente estimate of wy = 1/0, the average lifetime. Is there a corresponding method for para calcular o MLE dey? - computing the M.L.E. of y? < Suponha que, ..., Xnformar uma amostra aleatoria de uma distribuigdo para a qual Suppose that X;,..., X, form a random sample from a distribution for which o PF ou o pdf éffx| 8), onde o valor do parametro GE desconhecido. O parametro pode ser either the p.f. or the p.d-f. is f(x|9), where the value of the parameter 6 is unknown. unidimensional ou um vetor de pardmetros. Deixar@denotar o MLE de@. Assim, para The parameter may be one-dimensional or a vector of parameters. Let 6 denote the todos os valores observadosx, ..., Xn, a fungdo de verossimilhangafn(x| @) é maximizado M.L.E. of 6. 
Thus, for all observed values $x_1, \ldots, x_n$, the likelihood function $f_n(x|\theta)$ is maximized when $\theta = \hat{\theta}$.

Suppose now that we change the parameter in the distribution as follows: Instead of expressing the p.f. or the p.d.f. $f(x|\theta)$ in terms of the parameter $\theta$, we shall express it in terms of a new parameter $\psi = g(\theta)$, where $g$ is a one-to-one function of $\theta$. Is there a relationship between the M.L.E. of $\theta$ and the M.L.E. of $\psi$?

Theorem 7.6.1 Invariance Property of M.L.E.'s. If $\hat{\theta}$ is the maximum likelihood estimator of $\theta$ and if $g$ is a one-to-one function, then $g(\hat{\theta})$ is the maximum likelihood estimator of $g(\theta)$.

Proof The new parameter space is $\Gamma$, the image of $\Omega$ under the function $g$. We shall let $\theta = h(\psi)$ denote the inverse function. Then, expressed in terms of the new parameter $\psi$, the p.f. or p.d.f. of each observed value will be $f[x|h(\psi)]$, and the likelihood function will be $f_n[x|h(\psi)]$.

The M.L.E. $\hat{\psi}$ of $\psi$ will be equal to the value of $\psi$ for which $f_n[x|h(\psi)]$ is maximized.
Since $f_n(x|\theta)$ is maximized when $\theta = \hat{\theta}$, it follows that $f_n[x|h(\psi)]$ is maximized when $h(\psi) = \hat{\theta}$. Hence, the M.L.E. $\hat{\psi}$ must satisfy the relation $h(\hat{\psi}) = \hat{\theta}$ or, equivalently, $\hat{\psi} = g(\hat{\theta})$. ■

Example 7.6.2 Lifetimes of Electronic Components. According to Theorem 7.6.1, the M.L.E. of $\psi$ is one over the M.L.E. of $\theta$. In Example 7.5.2, we computed the observed value of $\hat{\theta} = 0.455$. The observed value of $\hat{\psi}$ would then be $1/0.455 = 2.2$. This is a bit smaller than the Bayes estimate using squared error loss of 2.867 found in Example 7.4.8. ◀

The invariance property can be extended to functions that are not one-to-one. For example, suppose that we wish to estimate the mean $\mu$ of a normal distribution when both the mean and the variance are unknown. Then $\mu$ is not a one-to-one function of the parameter $\theta = (\mu, \sigma^2)$. In this case, the function we wish to estimate is $g(\theta) = \mu$. There is a way to define the M.L.E. of a function of $\theta$ that is not necessarily one-to-one. One popular way is the following.
Definition 7.6.1 M.L.E. of a Function. Let $g(\theta)$ be an arbitrary function of the parameter, and let $G$ be the image of $\Omega$ under the function $g$. For each $t \in G$, define $G_t = \{\theta : g(\theta) = t\}$ and define
$$L^*(t) = \max_{\theta \in G_t} \log f_n(x|\theta).$$
Finally, define the M.L.E. of $g(\theta)$ to be $\hat{t}$ where
$$L^*(\hat{t}) = \max_{t \in G} L^*(t). \tag{7.6.1}$$

The following result shows how to find the M.L.E. of $g(\theta)$ based on Definition 7.6.1.

Theorem 7.6.2 Let $\hat{\theta}$ be an M.L.E. of $\theta$, and let $g(\theta)$ be a function of $\theta$. Then an M.L.E. of $g(\theta)$ is $g(\hat{\theta})$.

Proof We shall prove that $\hat{t} = g(\hat{\theta})$ satisfies (7.6.1). Since $L^*(t)$ is the maximum of $\log f_n(x|\theta)$ over $\theta$ in a subset of $\Omega$, and since $\log f_n(x|\hat{\theta})$ is the maximum over all $\theta$, we know that $L^*(t) \le \log f_n(x|\hat{\theta})$ for all $t \in G$. Let $\hat{t} = g(\hat{\theta})$. We are done if we can show that $L^*(\hat{t}) = \log f_n(x|\hat{\theta})$. Note that $\hat{\theta} \in G_{\hat{t}}$. Since $\hat{\theta}$ maximizes $f_n(x|\theta)$ over all $\theta$, it also maximizes $f_n(x|\theta)$ over $\theta \in G_{\hat{t}}$. Hence, $L^*(\hat{t}) = \log f_n(x|\hat{\theta})$ and $\hat{t} = g(\hat{\theta})$ is an M.L.E. of $g(\theta)$. ■

Example 7.6.3 Estimating the Standard Deviation and the Second Moment.
Suppose that $X_1, \ldots, X_n$ form a random sample from a normal distribution for which both the mean $\mu$ and the variance $\sigma^2$ are unknown. We shall determine the M.L.E. of the standard deviation $\sigma$ and the M.L.E. of the second moment of the normal distribution $E(X^2)$. It was found in Example 7.5.6 that the M.L.E. of $\theta = (\mu, \sigma^2)$ is $\hat{\theta} = (\hat{\mu}, \hat{\sigma}^2)$. From the invariance property, we can conclude that the M.L.E. $\hat{\sigma}$ of the standard deviation is simply the square root of the sample variance. In symbols, $\hat{\sigma} = (\hat{\sigma}^2)^{1/2}$. Also, since $E(X^2) = \sigma^2 + \mu^2$, the M.L.E. of $E(X^2)$ will be $\hat{\sigma}^2 + \hat{\mu}^2$. ◀

Consistency

Consider an estimation problem in which a random sample is to be taken from a distribution involving a parameter $\theta$. Suppose that for every sufficiently large sample size $n$, that is, for every value of $n$ greater than some given minimum number, there exists a unique M.L.E. of $\theta$.
Then, under certain conditions, which are typically satisfied in practical problems, the sequence of M.L.E.'s is a consistent sequence of estimators of $\theta$. In other words, in such problems the sequence of M.L.E.'s converges in probability to the unknown value of $\theta$ as $n \to \infty$.

We have remarked in Sec. 7.4 that under certain general conditions the sequence of Bayes estimators of a parameter $\theta$ is also a consistent sequence of estimators. Therefore, for a given prior distribution and a sufficiently large sample size $n$, the Bayes estimator and the M.L.E. of $\theta$ will typically be very close to each other, and both will be very close to the unknown value of $\theta$.

We shall not present any formal details of the conditions that are needed to prove this result. (Details can be found in chapter 7 of Schervish, 1995.) We shall, however, illustrate the result by considering again a random sample $X_1, \ldots, X_n$ from the Bernoulli distribution with parameter $\theta$, which is unknown ($0 \le \theta \le 1$). It was shown in Sec. 7.4 that if the given prior distribution of $\theta$ is a beta distribution, then the difference between the Bayes estimator of $\theta$ and the sample mean $\bar{X}_n$ converges to 0 as $n \to \infty$. Furthermore, it was shown in Example 7.5.4 that the M.L.E. of $\theta$ is $\bar{X}_n$. Thus, as $n \to \infty$, the difference between the Bayes estimator and the M.L.E. will converge to 0. Finally, the law of large numbers (Theorem 6.2.4) says that the sample mean $\bar{X}_n$ converges in probability to $\theta$ as $n \to \infty$. Therefore, both the sequence of Bayes estimators and the sequence of M.L.E.'s are consistent sequences.

Numerical Computation

In many problems there exists a unique M.L.E. $\hat{\theta}$ of a given parameter $\theta$, but this M.L.E. cannot be expressed in closed form as a function of the observations in the sample. In such a problem, for a given set of observed values, it is necessary to determine the value of $\hat{\theta}$ by numerical computation. We shall illustrate this situation by two examples.

Example 7.6.4 Sampling from a Gamma Distribution.
Suppose that $X_1, \ldots, X_n$ form a random sample from the gamma distribution for which the p.d.f. is as follows:
$$f(x|\alpha) = \frac{1}{\Gamma(\alpha)} x^{\alpha - 1} e^{-x} \quad \text{for } x > 0. \tag{7.6.2}$$
Suppose also that the value of $\alpha$ is unknown ($\alpha > 0$) and is to be estimated.

The likelihood function is
$$f_n(x|\alpha) = \frac{1}{\Gamma^n(\alpha)} \left( \prod_{i=1}^{n} x_i \right)^{\alpha - 1} \exp\left( -\sum_{i=1}^{n} x_i \right). \tag{7.6.3}$$
The M.L.E. of $\alpha$ will be the value of $\alpha$ that satisfies the equation
$$\frac{\partial \log f_n(x|\alpha)}{\partial \alpha} = 0. \tag{7.6.4}$$
When we apply Eq. (7.6.4) in this example, we obtain the following equation:
$$\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} = \frac{1}{n} \sum_{i=1}^{n} \log x_i. \tag{7.6.5}$$
Tables of the function $\Gamma'(\alpha)/\Gamma(\alpha)$, which is called the digamma function, are included in various published collections of mathematical tables. The digamma function is also available in several mathematical software packages. For all given values of $x_1, \ldots, x_n$, the unique value of $\alpha$ that satisfies Eq. (7.6.5) must be determined either by referring to these tables or by carrying out a numerical analysis of the digamma function.
This value will be the M.L.E. of $\alpha$. ◀

Example 7.6.5 Sampling from a Cauchy Distribution. Suppose that $X_1, \ldots, X_n$ form a random sample from a Cauchy distribution centered at an unknown point $\theta$ ($-\infty < \theta < \infty$), for which the p.d.f. is as follows:
$$f(x|\theta) = \frac{1}{\pi \left[ 1 + (x - \theta)^2 \right]} \quad \text{for } -\infty < x < \infty. \tag{7.6.6}$$
Suppose also that the value of $\theta$ is to be estimated.

The likelihood function is
$$f_n(x|\theta) = \frac{1}{\pi^n \prod_{i=1}^{n} \left[ 1 + (x_i - \theta)^2 \right]}. \tag{7.6.7}$$
Therefore, the M.L.E. of $\theta$ will be the value that minimizes
$$\prod_{i=1}^{n} \left[ 1 + (x_i - \theta)^2 \right]. \tag{7.6.8}$$
For most values of $x_1, \ldots, x_n$, the value of $\theta$ that minimizes the expression (7.6.8) must be determined by a numerical computation. ◀

An alternative to exact solution of Eq. (7.6.4) is to start with a heuristic estimator of $\alpha$ and then apply Newton's method.

Definition 7.6.2 Newton's Method. Let $f(\theta)$ be a real-valued function of a real variable, and suppose that we wish to solve the equation $f(\theta) = 0$. Let $\theta_0$ be an initial guess at the solution.
Newton's method replaces the initial guess with the updated guess
$$\theta_1 = \theta_0 - \frac{f(\theta_0)}{f'(\theta_0)}.$$

The rationale behind Newton's method is illustrated in Fig. 7.7. The function $f(\theta)$ is the solid curve. Newton's method approximates the curve by a line tangent to the curve, that is, the dashed line passing through the point $(\theta_0, f(\theta_0))$, indicated by the circle. The approximating line crosses the horizontal axis at the revised guess $\theta_1$. Typically, one replaces the initial guess with the revised guess and iterates Newton's method until the results stabilize.

Figure 7.7 Newton's method to approximate the solution to $f(\theta) = 0$. The initial guess is $\theta_0$, and the revised guess is $\theta_1$.

Example 7.6.6 Sampling from a Gamma Distribution. In Example 7.6.4, suppose that we observe $n = 20$ gamma random variables $X_1, \ldots, X_{20}$ with parameters $\alpha$ and 1. Suppose that the observed values are such that $\frac{1}{20} \sum_{i=1}^{20} \log(x_i) = 1.220$ and $\frac{1}{20} \sum_{i=1}^{20} x_i = 3.679$. We wish to use Newton's method to approximate the M.L.E. A sensible initial guess is based on the fact that $E(X_i) = \alpha$. This suggests using $\alpha_0 = 3.679$, the sample mean. The function $f(\alpha)$ is $\psi(\alpha) - 1.220$, where $\psi$ is the digamma function. The derivative $f'(\alpha)$ is $\psi'(\alpha)$, which is known as the trigamma function. Newton's method updates the initial guess $\alpha_0$ to
$$\alpha_1 = \alpha_0 - \frac{\psi(\alpha_0) - 1.220}{\psi'(\alpha_0)} = 3.679 - \frac{1.1607 - 1.220}{0.3120} = 3.871.$$
Here, we have used statistical software that computes both the digamma and trigamma functions. After two more iterations, the approximation stabilizes at 3.876. ◀

Newton's method can fail terribly if $f'(\theta)/f(\theta)$ gets close to 0 between $\theta_0$ and the actual solution to $f(\theta) = 0$. There is a multidimensional version of Newton's method, which we will not present here. There are also many other numerical methods for maximizing functions. Any text on numerical optimization, such as Nocedal and Wright (2006), will describe some of them.
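As an illustration only, the Newton iteration of Example 7.6.6 might be sketched in Python as follows. The helper names `digamma`, `trigamma`, and `newton_gamma_mle` are hypothetical and do not come from the text. The two special functions are approximated here with a standard recurrence plus asymptotic series so that the sketch needs only the standard library, whereas the text relied on statistical software. The inputs 1.220 and 3.679 are the summary values quoted in Example 7.6.6.

```python
import math


def digamma(x):
    """Approximate psi(x) = Gamma'(x) / Gamma(x) for x > 0 (illustrative helper)."""
    shift = 0.0
    # Recurrence psi(x) = psi(x + 1) - 1/x, applied until the series below is accurate.
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Asymptotic series: log x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
    series = math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + series


def trigamma(x):
    """Approximate psi'(x), the derivative of the digamma function, for x > 0."""
    shift = 0.0
    # Recurrence psi'(x) = psi'(x + 1) + 1/x^2.
    while x < 6.0:
        shift += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # Asymptotic series: 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7).
    series = inv + 0.5 * inv2 + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))
    return shift + series


def newton_gamma_mle(mean_log, alpha0, tol=1e-10, max_iter=50):
    """Solve psi(alpha) = mean_log by Newton's method, starting from alpha0."""
    alpha = alpha0
    for _ in range(max_iter):
        # Newton update for f(alpha) = psi(alpha) - mean_log, with f'(alpha) = psi'(alpha).
        step = (digamma(alpha) - mean_log) / trigamma(alpha)
        alpha -= step
        if alpha <= 0.0:
            raise ValueError("iterate left the parameter space alpha > 0")
        if abs(step) < tol:
            return alpha
    raise RuntimeError("Newton's method did not stabilize")


# Summary statistics from Example 7.6.6; the sample mean is the initial guess.
alpha_hat = newton_gamma_mle(mean_log=1.220, alpha0=3.679)
print(round(alpha_hat, 3))
```

Because the digamma function is increasing and concave, a starting value below the solution produces iterates that rise toward it. A starting value far above the solution could overshoot, which is why the sketch stops if an iterate leaves the region $\alpha > 0$.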
Method of Moments

Example 7.6.7 Sampling from a Gamma Distribution. Suppose that $X_1, \ldots, X_n$ form a random sample from the gamma distribution with parameters $\alpha$ and $\beta$. In Example 7.6.4, we explained how one could find the M.L.E. of $\alpha$ if $\beta$ were known. The method involved the digamma function, which is unfamiliar to many people. A Bayes estimate would also be difficult to find in this example because we would have to integrate a function that includes a factor of $1/\Gamma(\alpha)^n$. Is there no other way to estimate the vector parameter $\theta$ in this example? ◀

The method of moments is an intuitive method for estimating parameters when other, more attractive, methods may be too difficult. It can also be used to obtain an initial guess for applying Newton's method.

Definition 7.6.3 Method of Moments. Assume that $X_1, \ldots, X_n$ form a random sample from a distribution that is indexed by a $k$-dimensional parameter $\theta$ and that has at least $k$ finite moments. For $j = 1, \ldots, k$, let $\mu_j(\theta) = E(X_1^j|\theta)$.
Suppose that the function H(OF(tn (0), ..., Lk(O)E uma funcgao biunivoca de@. DeixarM(in,..., ’kdenotam a (0) = (44 (0), ..., Uz (8)) is a one-to-one function of 6. Let M(u4, ..., “,) denote funcdo inversa, ou seja, para todos6, the inverse function, that is, for all 6, OM (0), ..., tk(O)). 06=M(u1(0),..., Ux(O)). Defina amomentos de amostrapor eu1, 2 cui xpupara/=1, ...,k. Ométodo de Define the sample moments by mj = 4 an x/ for j =1,...,k. The method of estimador de momentosde 66 Milimetrosi, ..., euj). moments estimator of 8 is M(m,,..., mj). A maneira usual de implementar o método dos momentos é configurar okequagées eu=L/ The usual way of implementing the method of moments is to set up the k equations j(Ok entdo resolva paraé. mj; = 1; (6) and then solve for 6. Exemplo Amostragem de uma distribuigdo gama.No Exemplo 7.6.4, consideramos uma amostra de Example Sampling from a Gamma Distribution. In Example 7.6.4, we considered a sample of 7.6.8 tamanhonda distribuigdo gama com pardmetrosae 1. A média de cada 7.6.8 size n from the gamma distribution with parameters a and 1. The mean of each 7.6 Properties of Maximum Likelihood Estimators 431 such random variable is μ1(α) = α. The method of moments estimator is then ˆα = m1, the sample mean. This was the initial guess used to start Newton’s method in Example 7.6.6. ◀ Example 7.6.9 Sampling from a Gamma Distribution with Both Parameters Unknown. Theorem 5.7.5 tells us that the first two moments of the gamma distribution with parameters α and β are μ1(θ) = α β , μ2(θ) = α(α + 1) β2 . The method of moments says to replace the right-hand sides of these equations by the sample moments and then solve for α and β. In this case, we get ˆα = m2 1 m2 − m2 1 , ˆβ = m1 m2 − m2 1 as the method of moments estimators. Note that m2 − m2 1 is just the sample variance. ◀ Example 7.6.10 Sampling from a Uniform Distribution. Suppose that X1, . . . 
, Xn form a random sample from the uniform distribution on the interval [θ, θ + 1], as in Example 7.5.9. In that example, we found that the M.L.E. is not unique and there is an interval of M.L.E.’s max{x1, . . . , xn} − 1 ≤ θ ≤ min{x1, . . . , xn}. (7.6.9) This interval contains all of the possible values of θ that are consistent with the ob- served data. We shall now apply the method of moments, which will produce a single estimator. The mean of each Xi is θ + 1/2, so the method of moments estimator is Xn − 1/2. Typically, one would expect the observed value of the method of moments estimator to be a number in the interval (7.6.9). However, that is not always the case. For example, if n = 3 and X1 = 0.2, X2 = 0.99, X3 = 0.01 are observed, then (7.6.9) is the interval [−0.01, 0.01], while X3 = 0.4. The method of moments estimate is then −0.1, which could not possibly be the true value of θ. ◀ There are several examples in which method of moments estimators are also M.L.E.’s. Some of these are the subjects of exercises at the end of this section. Despite occasional problems such as Example 7.6.10, the method of moments estimators will typically be consistent in the sense of Definition 7.4.6. Theorem 7.6.3 Suppose that X1, X2, . . . are i.i.d. with a distribution indexed by a k-dimensional pa- rameter vector θ. Suppose that the first k moments of that distribution exist and are finite for all θ. Suppose also that the inverse function M in Definition 7.6.3 is contin- uous. Then the sequence of method of moments estimators based on X1, . . . , Xn is a consistent sequence of estimators of θ. Proof The law of large numbers says that the sample moments converge in prob- ability to the moments μ1(θ), . . . , μk(θ). The generalization of Theorem 6.2.5 to 7.6 Propriedades dos Estimadores de Máxima Verossimilhança 431 tal variável aleatória éμ1(α)=α. O método do estimador de momentos é entãoα̂= eu1, a média amostral. 
functions of $k$ variables implies that $M$ evaluated at the sample moments (i.e., the method of moments estimator) converges in probability to $\theta$. ■

M.L.E.'s and Bayes Estimators

Bayes estimators and M.L.E.'s depend on the data solely through the likelihood function. They use the likelihood function in different ways, but in many problems they will be very similar. When the function $f(x|\theta)$ satisfies certain smoothness conditions (as a function of $\theta$), it can be shown that the likelihood function will tend to look more and more like a normal p.d.f. as the sample size increases.
More specifically, as $n$ increases, the likelihood function starts to look like a constant (not depending on $\theta$, but possibly depending on the data) times
$$\exp\left[ -\frac{1}{2 V_n(\theta)/n} (\theta - \hat{\theta})^2 \right], \qquad (7.6.10)$$
where $\hat{\theta}$ is the M.L.E. and $V_n(\theta)$ is a sequence of random variables that typically converges as $n \to \infty$ to a limit that we shall call $v_\infty(\theta)$. When $n$ is large, the function in (7.6.10) rises quickly to its peak as $\theta$ approaches $\hat{\theta}$ and then drops just as quickly as $\theta$ moves away from $\hat{\theta}$. Under these conditions, so long as the prior p.d.f. of $\theta$ is relatively flat compared to the very peaked likelihood function, the posterior p.d.f. will look a lot like the likelihood multiplied by the constant needed to turn it into a p.d.f. The posterior mean of $\theta$ will then be approximately $\hat{\theta}$. In fact, the posterior distribution of $\theta$ will be approximately the normal distribution with mean $\hat{\theta}$ and variance $V_n(\hat{\theta})/n$. In a similar fashion, the distribution of the maximum likelihood estimator (given $\theta$) will
be approximately the normal distribution with mean $\theta$ and variance $v_\infty(\theta)/n$. The conditions and proofs needed to make these claims precise are beyond the scope of this text but can be found in chapter 7 of Schervish (1995).

Example 7.6.11 Sampling from an Exponential Distribution. Suppose that $X_1, X_2, \ldots$ are i.i.d. having the exponential distribution with parameter $\theta$. Let $T_n = \sum_{i=1}^{n} X_i$. Then the M.L.E. of $\theta$ is $\hat{\theta}_n = n/T_n$. (This was found in Exercise 7 in Sec. 7.5.) Because $1/\hat{\theta}_n$ is an average of i.i.d. random variables with finite variance, the central limit theorem tells us that the distribution of $1/\hat{\theta}_n$ is approximately normal. The mean and variance, in this case, of that approximate normal distribution are, respectively, $1/\theta$ and $1/(\theta^2 n)$. The delta method (Theorem 6.3.2) says that $\hat{\theta}_n$ then has approximately the normal distribution with mean $\theta$ and variance $\theta^2/n$. In the notation above, we have $V_n(\theta) = \theta^2$.

Next, let the prior distribution of $\theta$ be the gamma distribution with parameters $\alpha$ and $\beta$.
Theorem 7.3.4 says that the posterior distribution of $\theta$ will be the gamma distribution with parameters $\alpha + n$ and $\beta + t_n$. We conclude by showing that this gamma distribution is approximately a normal distribution. Assume for simplicity that $\alpha$ is an integer. Then the posterior distribution of $\theta$ is the same as the distribution of the sum of $\alpha + n$ i.i.d. exponential random variables with parameter $\beta + t_n$. Such a sum has approximately the normal distribution with mean $(\alpha + n)/(\beta + t_n)$ and variance $(\alpha + n)/(\beta + t_n)^2$. If $\alpha$ and $\beta$ are small, the approximate mean is then nearly $n/t_n = \hat{\theta}$, and the approximate variance is then nearly $n/t_n^2 = \hat{\theta}^2/n = V_n(\hat{\theta})/n$. ◀

Example 7.6.12 Prussian Army Deaths. In Example 7.3.14, we found the posterior distribution of $\theta$, the mean number of deaths per year by horsekick in Prussian army units based on a sample of 280 observations. The posterior distribution was found to be the gamma distribution with parameters 196 and 280. By the same argument used in Example 7.6.11, this gamma distribution is approximately the distribution of the sum of 196 i.i.d. exponential random variables with parameter 280. The distribution of this sum is approximately the normal distribution with mean $196/280$ and variance $196/280^2$. Using the same data as in Example 7.3.14, we can find the M.L.E. of $\theta$, which is the average of the 280 observations (according to Exercise 5 in Sec. 7.5). The distribution of the average of 280 i.i.d. Poisson random variables with mean $\theta$ is approximately the normal distribution with mean $\theta$ and variance $\theta/280$ according to the central limit theorem. We then have $V_n(\theta) = \theta$ in the earlier notation. The maximum likelihood estimate with the observed data is $\hat{\theta} = 196/280$, the mean of the posterior distribution. The variance of the posterior distribution is also $V_n(\hat{\theta})/n = \hat{\theta}/280$. ◀

Figure 7.8 Posterior p.d.f. together with p.d.f. of M.L.E. and approximating normal p.d.f. in Example 7.6.13. For the p.d.f. of the M.L.E., the value of $\theta = 3/6.6$ is used to make the p.d.f.'s as similar as possible. (Horizontal axis: $\theta$; vertical axis: Density; curves: Posterior, M.L.E., Normal.)

There are two common situations in which posterior distributions and distributions of M.L.E.'s are not such similar normal distributions as in the preceding discussion. One is when the sample size is not very large, and the other is when the likelihood function is not smooth. An example with small sample size is our electronic components example.

Example 7.6.13 Lifetimes of Electronic Components. In Example 7.3.12, we have a sample of $n = 3$ exponential random variables with parameter $\theta$. The posterior distribution found there was the gamma distribution with parameters 4 and 8.6. The M.L.E. is $\hat{\theta} = 3/(X_1 + X_2 + X_3)$, which has the distribution of 1 over a gamma random variable with parameters 3 and $3\theta$. Figure 7.8 shows the posterior p.d.f. along with the p.d.f. of the M.L.E. assuming that $\theta = 3/6.6$, the observed value of the M.L.E. The two p.d.f.'s, although similar, are still different.
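The curves compared in Figure 7.8 can be evaluated numerically. The sketch below is only an illustration: it uses the gamma posterior with parameters 4 and 8.6, the statement that $1/\hat{\theta}$ is gamma with parameters 3 and $3\theta$ (with $\theta = 3/6.6$), and a normal p.d.f. matched to the posterior mean and variance. The helper names (`gamma_pdf`, `mle_pdf`, `normal_pdf`, `integrate`), the evaluation grid, and the integration limits are assumptions made for this example and do not come from the text.

```python
import math

def gamma_pdf(x, shape, rate):
    """Gamma p.d.f. with the given shape and rate parameters, for x > 0."""
    return rate**shape * x**(shape - 1) * math.exp(-rate * x) / math.gamma(shape)

def posterior_pdf(theta):
    """Posterior p.d.f. of theta: gamma with parameters 4 and 8.6."""
    return gamma_pdf(theta, 4, 8.6)

def mle_pdf(u, theta=3 / 6.6, n=3):
    """P.d.f. of the M.L.E. n / (X_1 + ... + X_n) at u > 0.

    1 / M.L.E. is gamma with parameters n and n * theta, so the change of
    variables g = 1 / u contributes the Jacobian factor 1 / u**2.
    """
    return gamma_pdf(1.0 / u, n, n * theta) / u**2

def normal_pdf(theta, mean=4 / 8.6, var=4 / 8.6**2):
    """Normal p.d.f. with the same mean and variance as the posterior."""
    return math.exp(-(theta - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def integrate(f, lo, hi, steps=100_000):
    """Midpoint rule; a crude but adequate check for these smooth densities."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

# Illustrative grid of theta values (hypothetical choice).
for u in (0.25, 0.5, 1.0, 2.0):
    print(f"{u:4.2f}  posterior={posterior_pdf(u):.4f}  "
          f"mle={mle_pdf(u):.4f}  normal={normal_pdf(u):.4f}")
```

Calling `integrate(posterior_pdf, 0.0, 20.0)` or `integrate(mle_pdf, 0.0, 200.0)` gives a quick sanity check that each function is a proper density over the range that matters.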
Also, both p.d.f.'s are similar to, but still different from, the normal p.d.f. with the same mean and variance as the posterior, which also appears on the plot. ◀

An example of an unsmooth likelihood function involves the uniform distribution on the interval $[0, \theta]$.

Example 7.6.14 Sampling from a Uniform Distribution. In Example 7.5.7, we found the M.L.E. of $\theta$ based on a sample of size $n$ from the uniform distribution on the interval $[0, \theta]$. The M.L.E. is $\hat{\theta} = \max\{X_1, \ldots, X_n\}$. We can find the exact distribution of $\hat{\theta}$ using the result in Example 3.9.6. The p.d.f. of $Y = \hat{\theta}$ is
$$g_n(y|\theta) = n[F(y|\theta)]^{n-1} f(y|\theta), \qquad (7.6.11)$$
where $f(\cdot|\theta)$ is the p.d.f. of the uniform distribution on $[0, \theta]$ and $F(\cdot|\theta)$ is the corresponding c.d.f. Substituting these well-known functions into Eq. (7.6.11) yields the p.d.f. of $Y = \hat{\theta}$:
$$g_n(y|\theta) = n \left[ \frac{y}{\theta} \right]^{n-1} \frac{1}{\theta} = n \frac{y^{n-1}}{\theta^n},$$
for $0 < y < \theta$. This p.d.f. is not the least bit like a normal p.d.f.
It is very asymmetric and has its maximum at the largest possible value of the M.L.E. In fact, one can compute the mean and variance of $\hat{\theta}$, respectively, as
$$E(\hat{\theta}) = \frac{n}{n + 1}\theta, \qquad \mathrm{Var}(\hat{\theta}) = \frac{n}{(n + 1)^2 (n + 2)}\theta^2.$$
The variance goes down like $1/n^2$ instead of like $1/n$ in the approximately normal examples we saw earlier.

If $n$ is large, the posterior distribution of $\theta$ will have a p.d.f. that is approximately the likelihood function times the constant needed to make it into a p.d.f. The likelihood is in Eq. (7.5.8). Integrating that function over $\theta$ to obtain the needed constant leads to the following approximate posterior p.d.f. of $\theta$:
$$\xi(\theta|x) \approx \begin{cases} \dfrac{(n - 1)\hat{\theta}^{n-1}}{\theta^n} & \text{for } \theta > \hat{\theta}, \\ 0 & \text{otherwise.} \end{cases}$$
The mean and variance of this approximate posterior distribution are, respectively, $(n - 1)\hat{\theta}/(n - 2)$ and $(n - 1)\hat{\theta}^2/[(n - 2)^2 (n - 3)]$. The posterior mean is still nearly equal to the M.L.E. (but a little larger), and the posterior variance decreases at a rate like $1/n^2$, as does the variance of the M.L.E.
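The mean and variance formulas for $\hat{\theta}$ stated above can be checked against the p.d.f. $n y^{n-1}/\theta^n$ by numerical integration. The following is a minimal sketch under assumed values $n = 10$ and $\theta = 2$ (arbitrary choices for illustration); the function names `mle_density` and `moment` are likewise hypothetical.

```python
def mle_density(y, n, theta):
    """P.d.f. of max{X_1, ..., X_n} for a uniform sample on [0, theta]."""
    return n * y ** (n - 1) / theta ** n if 0.0 < y < theta else 0.0

def moment(k, n, theta, steps=100_000):
    """k-th moment of the M.L.E., by the midpoint rule on (0, theta)."""
    h = theta / steps
    return h * sum(((i + 0.5) * h) ** k * mle_density((i + 0.5) * h, n, theta)
                   for i in range(steps))

n, theta = 10, 2.0  # assumed values, chosen only for illustration

mean_numeric = moment(1, n, theta)
var_numeric = moment(2, n, theta) - mean_numeric ** 2

# Closed forms quoted in the text.
mean_formula = n * theta / (n + 1)
var_formula = n * theta ** 2 / ((n + 1) ** 2 * (n + 2))

print(mean_numeric, mean_formula)
print(var_numeric, var_formula)
```

Rerunning with larger `n` shows the variance shrinking roughly by a factor of 100 each time `n` grows tenfold, which is the $1/n^2$ rate described in the text.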
But the posterior distribution is not the least bit normal, as the p.d.f. has its maximum at the smallest possible value of $\theta$ and decreases from there. ◀

The EM Algorithm

There are a number of complicated situations in which it is difficult to compute the M.L.E. Many of these situations involve forms of missing data. The term "missing data" can refer to several different types of information. The most obvious would be observations that we had planned or hoped to observe but were not observed. For example, imagine that we planned to collect both heights and weights for a sample of athletes. For reasons that might be beyond our control, it is possible that we observed both heights and weights for most of the athletes, but only heights for one subset of athletes and only weights for another subset. If we model the heights and weights as having a bivariate normal distribution, we might want to compute the M.L.E. of the parameters of that distribution. For a complete collection of pairs, Exercise 24 in this section gives formulas for the M.L.E.
It is not difficult to see how much more complicated it would be to compute the M.L.E. in the situation described above with missing data.

The EM algorithm is an iterative method for approximating M.L.E.'s when missing data are making it difficult to find the M.L.E.'s in closed form. One begins (as in most iterative procedures) at stage 0 with an initial parameter vector $\theta^{(0)}$. To move from stage $j$ to stage $j + 1$, one first writes the full-data log-likelihood, which is what the logarithm of the likelihood function would be if we had observed the missing data. The values of the missing data appear in the full-data log-likelihood as random variables rather than as observed values. The "E" step of the EM algorithm is the following: Compute the conditional distribution of the missing data given the observed data as if the parameter $\theta$ were equal to $\theta^{(j)}$, and then compute the conditional mean of the full-data log-likelihood treating $\theta$ as constant and the missing
data as random variables. The E step gets rid of the unobserved random variables from the full-data log-likelihood and leaves $\theta$ where it was. For the "M" step, choose $\theta^{(j+1)}$ to maximize the expected value of the full-data log-likelihood that you just computed. The M step takes you to stage $j + 1$. Ideally, the maximization step is no harder than it would be if the missing data had actually been observed.

Example 7.6.15 Heights and Weights. Suppose that we try to observe $n = 6$ pairs of heights and weights, but we get only three complete vectors plus one lone weight and two lone heights. We model the pairs as bivariate normal random vectors, and we want to find the M.L.E. of the parameter vector $(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$. (This example is for illustrative purposes. One cannot expect to get a good estimate of a five-dimensional parameter vector with only nine observed values and no prior information.) The data are in Table 7.1. The missing weights are $X_{4,2}$ and $X_{5,2}$. The missing height
is $X_{6,1}$. The full-data log-likelihood is the sum of the logarithms of six expressions of the form Eq. (5.10.2) each with one of the rows of Table 7.1 substituted for the dummy variables $(x_1, x_2)$. For example, the term corresponding to the fourth row of Table 7.1 is
$$-\log(2\pi\sigma_1\sigma_2) - \frac{1}{2}\log(1 - \rho^2) - \frac{1}{2(1 - \rho^2)}\left[ \left( \frac{68 - \mu_1}{\sigma_1} \right)^2 - 2\rho\left( \frac{68 - \mu_1}{\sigma_1} \right)\left( \frac{X_{4,2} - \mu_2}{\sigma_2} \right) + \left( \frac{X_{4,2} - \mu_2}{\sigma_2} \right)^2 \right]. \qquad (7.6.12)$$
As an initial parameter vector we choose a naive estimate computed from the observed data:
$$\theta^{(0)} = (\mu_1^{(0)}, \mu_2^{(0)}, \sigma_1^{(0)}, \sigma_2^{(0)}, \rho^{(0)}) = (69.60, 194.75, 2.87, 14.82, 0.1764).$$
This consists of the M.L.E.'s based on the marginal distributions of the two coordinates, together with the sample correlation computed from the three complete observations. The third and fourth entries are on the standard-deviation scale: they are the square roots of the marginal variance estimates 8.24 and 219.69 obtained from the observed heights and weights in Table 7.1.

Table 7.1 Heights and weights for Example 7.6.15. The missing values are given random variable names.
Height      Weight
72          197
70          204
73          208
68          $X_{4,2}$
65          $X_{5,2}$
$X_{6,1}$   170

The E step pretends that $\theta = \theta^{(0)}$ and computes the conditional mean of the full-data log-likelihood given the observed data. For the fourth row of Table 7.1, the conditional distribution of $X_{4,2}$ given the observed data and $\theta = \theta^{(0)}$ can be found from Theorem 5.10.4 to be the normal distribution with mean
$$194.75 + 0.1764 \times 14.82 \times \left( \frac{68 - 69.60}{2.87} \right) = 193.3$$
and variance $(1 - 0.1764^2) 14.82^2 = 212.8$. The conditional mean of $(X_{4,2} - \mu_2)^2$ would then be $212.8 + (193.3 - \mu_2)^2$. The conditional mean of the expression in (7.6.12) would then be
$$-\log(2\pi\sigma_1\sigma_2) - \frac{1}{2}\log(1 - \rho^2) - \frac{1}{2(1 - \rho^2)}\left[ \left( \frac{68 - \mu_1}{\sigma_1} \right)^2 - 2\rho\left( \frac{68 - \mu_1}{\sigma_1} \right)\left( \frac{193.3 - \mu_2}{\sigma_2} \right) + \left( \frac{193.3 - \mu_2}{\sigma_2} \right)^2 + \frac{212.8}{\sigma_2^2} \right].$$
The point to notice about this last expression is that, except for the last term $212.8/\sigma_2^2$, it is exactly the contribution to the log-likelihood that we would have obtained if $X_{4,2}$ had been observed to equal 193.3, its conditional mean. Similar calculations can be done for the other two observations with missing coordinates. Each will produce a contribution to the log-likelihood that is the conditional variance of the missing coordinate divided by its variance plus what the log-likelihood would have been if the missing value had been observed to equal its conditional mean. This makes the M step almost identical to finding the M.L.E. for a completely observed data set. The only difference from the formulas in Exercise 24 is the following: For each observation that is missing $X$, add the conditional variance of $X$ given $Y$ to $\sum_{i=1}^{n} (X_i - \overline{X}_n)^2$ in both the formula for $\hat{\sigma}_1^2$ and $\hat{\rho}$.
Similarly, for each observation that is missing $Y$, add the conditional variance of $Y$ given $X$ to $\sum_{i=1}^{n} (Y_i - \overline{Y}_n)^2$ in both the formula for $\hat{\sigma}_2^2$ and $\hat{\rho}$.

We now illustrate the first iteration of the EM algorithm with the data of this example. We already have $\theta^{(0)}$, and we can compute the log-likelihood function from the observed data at $\theta^{(0)}$ as $-31.359$. To begin the algorithm, we have already computed the conditional mean and variance of the missing second coordinate from the fourth row of Table 7.1. The corresponding conditional means and variances for the fifth and sixth rows are 190.6 and 212.8 for the fifth row and 68.76 and 7.98 for the sixth row. For the E step, we replace the missing observations by their conditional means and add the conditional variances to the sums of squared deviations. For the M step, we insert the values just computed into the formulas of Exercise 24 as described above. The new vector is
$$\theta^{(1)} = (69.46, 193.81, 2.88, 14.83, 0.3742),$$
and the log-likelihood is $-31.03$.
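The first iteration just described can be sketched in a few lines. This is an illustration under stated assumptions, not the book's own program: the third and fourth entries of the parameter vector are treated as standard deviations (the reading under which the quoted conditional means and variances are reproduced), each row is assumed to miss at most one coordinate, and the names `DATA` and `em_step` are hypothetical. Because only one coordinate is missing in any row, the cross-product sum needs no correction; only the two sums of squares receive the conditional variances.

```python
import math

# Rows of Table 7.1 as (height, weight); None marks a missing coordinate.
DATA = [(72, 197), (70, 204), (73, 208), (68, None), (65, None), (None, 170)]

def em_step(theta, data=DATA):
    """One EM iteration for bivariate normal pairs with at most one
    missing coordinate per row. theta = (mu1, mu2, sd1, sd2, rho)."""
    mu1, mu2, sd1, sd2, rho = theta
    xs, ys = [], []
    extra_x = extra_y = 0.0  # accumulated conditional variances
    for x, y in data:
        if y is None:
            # E step: weight missing, use its conditional mean given height.
            y = mu2 + rho * sd2 * (x - mu1) / sd1
            extra_y += (1 - rho**2) * sd2**2
        elif x is None:
            # E step: height missing, use its conditional mean given weight.
            x = mu1 + rho * sd1 * (y - mu2) / sd2
            extra_x += (1 - rho**2) * sd1**2
        xs.append(x)
        ys.append(y)
    # M step: complete-data M.L.E. formulas, with the conditional
    # variances added to the sums of squared deviations.
    n = len(data)
    new_mu1, new_mu2 = sum(xs) / n, sum(ys) / n
    sxx = sum((x - new_mu1) ** 2 for x in xs) + extra_x
    syy = sum((y - new_mu2) ** 2 for y in ys) + extra_y
    sxy = sum((x - new_mu1) * (y - new_mu2) for x, y in zip(xs, ys))
    return (new_mu1, new_mu2, math.sqrt(sxx / n), math.sqrt(syy / n),
            sxy / math.sqrt(sxx * syy))

theta0 = (69.60, 194.75, 2.87, 14.82, 0.1764)
theta1 = em_step(theta0)
print([round(v, 4) for v in theta1])
```

Feeding the output of `em_step` back in as its next input carries the iteration forward; a stopping rule based on the change in the estimate or in the observed-data log-likelihood would be needed in practice and is left out of this sketch.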
After 32 iterations, the estimate and log-likelihood stop changing. The final estimate is
$$\theta^{(32)} = (68.86, 189.71, 3.15, 15.03, 0.8965),$$
with log-likelihood $-29.66$. ◀

Example 7.6.16 Mixture of Normal Distributions. A very popular use of the EM algorithm is in fitting mixture distributions. Let $X_1, \ldots, X_n$ be random variables such that each one is sampled either from the normal distribution with mean $\mu_1$ and variance $\sigma^2$ (with probability $p$) or from the normal distribution with mean $\mu_2$ and variance $\sigma^2$ (with probability $1 - p$), where $\mu_1 < \mu_2$. The restriction that $\mu_1 < \mu_2$ is to make the model identifiable in the following sense. If $\mu_1 = \mu_2$ is allowed, then every value of $p$ leads to the same joint distribution of the observable data. Also, if neither mean is constrained to be below the other, then switching the two means and changing $p$ to $1 - p$ will produce the same joint distribution for the observable data.
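Both identifiability failures described above are easy to see numerically. The sketch below evaluates the mixture p.d.f. of a single observation at a few points; the parameter values, the evaluation points, and the function name `mixture_pdf` are hypothetical and chosen only for illustration.

```python
import math

def mixture_pdf(x, mu1, mu2, sigma, p):
    """P.d.f. of one observation: N(mu1, sigma^2) with probability p,
    N(mu2, sigma^2) with probability 1 - p."""
    def normal_pdf(mu):
        return (math.exp(-(x - mu) ** 2 / (2 * sigma**2))
                / (math.sqrt(2 * math.pi) * sigma))
    return p * normal_pdf(mu1) + (1 - p) * normal_pdf(mu2)

xs = [-1.0, 0.0, 0.7, 2.5]  # arbitrary evaluation points

# Label switching: swap the two means and replace p by 1 - p.
original = [mixture_pdf(x, 0.0, 2.0, 1.0, 0.3) for x in xs]
switched = [mixture_pdf(x, 2.0, 0.0, 1.0, 0.7) for x in xs]

# Equal means: the value of p has no effect on the density.
equal_a = [mixture_pdf(x, 1.0, 1.0, 1.0, 0.2) for x in xs]
equal_b = [mixture_pdf(x, 1.0, 1.0, 1.0, 0.9) for x in xs]

print(all(math.isclose(a, b) for a, b in zip(original, switched)))
print(all(math.isclose(a, b) for a, b in zip(equal_a, equal_b)))
```

Since the observations are independent, equality of the single-observation densities carries over to the joint density of the whole sample.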
The restriction $\mu_1 < \mu_2$ ensures that every distinct parameter vector produces a different joint distribution for the observable data.

The data in Fig. 7.4 have the typical appearance of a distribution that is a mixture of two normals with means not very far apart. Because we have assumed that the variances of the two distributions are the same, we will not have the problem that arose in Example 7.5.10.

The likelihood function from observations $X_1 = x_1, \ldots, X_n = x_n$ is

$$\prod_{i=1}^{n} \left[ \frac{p}{(2\pi)^{1/2}\sigma} \exp\left(-\frac{(x_i - \mu_1)^2}{2\sigma^2}\right) + \frac{1-p}{(2\pi)^{1/2}\sigma} \exp\left(-\frac{(x_i - \mu_2)^2}{2\sigma^2}\right) \right]. \tag{7.6.13}$$

The parameter vector is $\theta = (\mu_1, \mu_2, \sigma^2, p)$, and maximizing the likelihood as written is a challenge. However, we can introduce missing observations $Y_1, \ldots, Y_n$, where $Y_i = 1$ if $X_i$ was sampled from the distribution with mean $\mu_1$ and $Y_i = 0$ if $X_i$ was sampled from the distribution with mean $\mu_2$. The full-data log-likelihood can be written as the sum of the logarithm of the marginal p.f. of the missing $Y$ data plus the logarithm of the conditional p.d.f. of the observed $X$ data given the $Y$ data. That is,

$$\sum_{i=1}^{n} Y_i \log(p) + \left(n - \sum_{i=1}^{n} Y_i\right) \log(1-p) - \frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left[ Y_i (x_i - \mu_1)^2 + (1 - Y_i)(x_i - \mu_2)^2 \right]. \tag{7.6.14}$$

At stage $j$ with estimate $\theta^{(j)}$ of $\theta$, the E step first finds the conditional distribution of $Y_1, \ldots, Y_n$ given the observed data and $\theta = \theta^{(j)}$. Since $(X_1, Y_1), \ldots, (X_n, Y_n)$ are independent pairs, we can find the conditional distribution separately for each pair. The joint distribution of $(X_i, Y_i)$ is a mixed distribution with p.f./p.d.f.

$$f(x_i, y_i \mid \theta^{(j)}) = \frac{\left(p^{(j)}\right)^{y_i} \left(1 - p^{(j)}\right)^{1 - y_i}}{(2\pi)^{1/2}\sigma^{(j)}} \exp\left( -\frac{1}{2\sigma^{2(j)}} \left[ y_i \left(x_i - \mu_1^{(j)}\right)^2 + (1 - y_i)\left(x_i - \mu_2^{(j)}\right)^2 \right] \right).$$

The marginal p.d.f. of $X_i$ is the $i$th factor in (7.6.13). It is straightforward to determine that the conditional distribution of $Y_i$ given the observed data is the Bernoulli distribution with parameter

$$q_i^{(j)} = \frac{p^{(j)} \exp\left(-\dfrac{\left(x_i - \mu_1^{(j)}\right)^2}{2\sigma^{2(j)}}\right)}{p^{(j)} \exp\left(-\dfrac{\left(x_i - \mu_1^{(j)}\right)^2}{2\sigma^{2(j)}}\right) + \left(1 - p^{(j)}\right) \exp\left(-\dfrac{\left(x_i - \mu_2^{(j)}\right)^2}{2\sigma^{2(j)}}\right)}. \tag{7.6.15}$$

Because the full-data log-likelihood is a linear function of the $Y_i$'s, the E step simply replaces each $Y_i$ in (7.6.14) by $q_i^{(j)}$. The result is

$$\sum_{i=1}^{n} q_i^{(j)} \log(p) + \left(n - \sum_{i=1}^{n} q_i^{(j)}\right) \log(1-p) - \frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left[ q_i^{(j)} (x_i - \mu_1)^2 + \left(1 - q_i^{(j)}\right)(x_i - \mu_2)^2 \right]. \tag{7.6.16}$$

Maximizing (7.6.16) is straightforward. Since $p$ appears in only the first two terms, we see that $p^{(j+1)}$ is just the average of the $q_i^{(j)}$'s. Also, $\mu_1^{(j+1)}$ is the weighted average of the $X_i$'s with weights $q_i^{(j)}$. Similarly, $\mu_2^{(j+1)}$ is the weighted average of the $X_i$'s with weights $1 - q_i^{(j)}$. Finally,

$$\sigma^{2(j+1)} = \frac{1}{n} \sum_{i=1}^{n} \left[ q_i^{(j)} \left(x_i - \mu_1^{(j+1)}\right)^2 + \left(1 - q_i^{(j)}\right)\left(x_i - \mu_2^{(j+1)}\right)^2 \right]. \tag{7.6.17}$$

We will illustrate the first E and M steps using the data in Example 7.3.10. For the initial parameter vector $\theta^{(0)}$, we will let $\mu_1^{(0)}$ be the average of the 10 lowest observations and $\mu_2^{(0)}$ be the average of the 10 highest observations. We set $p^{(0)} = 1/2$, and $\sigma^{2(0)}$ is the average of the sample variance of the 10 lowest observations and the sample variance of the 10 highest observations. This makes

$$\theta^{(0)} = \left(\mu_1^{(0)}, \mu_2^{(0)}, \sigma^{2(0)}, p^{(0)}\right) = (-7.65,\ 7.36,\ 46.28,\ 0.5).$$

For each of the 20 observed values $x_i$, we compute $q_i^{(0)}$. For example, $x_{10} = -4.0$. According to (7.6.15),

$$q_{10}^{(0)} = \frac{0.5 \exp\left(-\dfrac{(-4.0 + 7.65)^2}{2 \times 46.28}\right)}{0.5 \exp\left(-\dfrac{(-4.0 + 7.65)^2}{2 \times 46.28}\right) + 0.5 \exp\left(-\dfrac{(-4.0 - 7.36)^2}{2 \times 46.28}\right)} = 0.7774.$$

A similar calculation for $x_8 = 9.0$ yields $q_8^{(0)} = 0.0489$. The initial log-likelihood, calculated as the logarithm of (7.6.13), is $-75.98$. The average of the 20 $q_i^{(0)}$ values is $p^{(1)} = 0.4402$. The weighted average of the data values using the $q_i^{(0)}$'s as weights is $\mu_1^{(1)} = -7.736$, and the weighted average using the $1 - q_i^{(0)}$'s is $\mu_2^{(1)} = 6.3068$. Using (7.6.17), we get $\sigma^{2(1)} = 56.5491$. The log-likelihood rises to $-75.19$. After 25 iterations, the results settle on $\theta^{(25)} = (-21.9715,\ 2.6802,\ 48.6864,\ 0.1037)$ with a final log-likelihood of $-72.84$. The histogram from Fig. 7.4 is reproduced in Fig. 7.9.
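The E and M steps of this example can be sketched in a few lines. This is a minimal illustration, assuming only Eqs. (7.6.13), (7.6.15), and (7.6.17) as stated above; the function names and the list `toy` are hypothetical. The data of Example 7.3.10 are not reproduced here, so only the two quoted values of $q_i^{(0)}$ can be checked directly, and the full iteration is run on made-up numbers.

```python
import math

def e_step(xs, mu1, mu2, sigma2, p):
    """Eq. (7.6.15): q_i = Pr(Y_i = 1 | x_i) under the current parameters."""
    qs = []
    for x in xs:
        a = p * math.exp(-(x - mu1) ** 2 / (2.0 * sigma2))
        b = (1.0 - p) * math.exp(-(x - mu2) ** 2 / (2.0 * sigma2))
        qs.append(a / (a + b))
    return qs

def m_step(xs, qs):
    """Maximize (7.6.16): average of q, weighted means, then Eq. (7.6.17)."""
    n = len(xs)
    total_q = sum(qs)
    p = total_q / n
    mu1 = sum(q * x for q, x in zip(qs, xs)) / total_q
    mu2 = sum((1.0 - q) * x for q, x in zip(qs, xs)) / (n - total_q)
    sigma2 = sum(q * (x - mu1) ** 2 + (1.0 - q) * (x - mu2) ** 2
                 for q, x in zip(qs, xs)) / n
    return mu1, mu2, sigma2, p

def log_likelihood(xs, mu1, mu2, sigma2, p):
    """Logarithm of the likelihood in Eq. (7.6.13)."""
    c = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    return sum(
        math.log(c * (p * math.exp(-(x - mu1) ** 2 / (2.0 * sigma2))
                      + (1.0 - p) * math.exp(-(x - mu2) ** 2 / (2.0 * sigma2))))
        for x in xs
    )

# Initial vector from the example, ordered as (mu1, mu2, sigma^2, p).
theta0 = (-7.65, 7.36, 46.28, 0.5)

# The two conditional probabilities quoted in the text (x_10 = -4.0, x_8 = 9.0).
q_x10, q_x8 = e_step([-4.0, 9.0], *theta0)

# One full EM iteration on made-up data (NOT the data of Example 7.3.10).
toy = [-9.0, -7.5, -6.0, -4.0, 5.0, 6.5, 8.0, 9.0]
theta1 = m_step(toy, e_step(toy, *theta0))
ll0 = log_likelihood(toy, *theta0)
ll1 = log_likelihood(toy, *theta1)

print(q_x10, q_x8, ll0, ll1)
```

The first two printed values should agree with the 0.7774 and 0.0489 quoted above up to rounding, and `ll1` should not be smaller than `ll0`. Repeating the `e_step` and `m_step` pair until the parameters stop changing gives the full algorithm.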
Figure 7.9 shows that histogram together with the p.d.f. of an observation from the fitted mixture distribution, namely,

$$f(x) = \frac{0.1037}{(2\pi \times 48.6864)^{1/2}} \exp\left(-\frac{(x + 21.9715)^2}{2 \times 48.6864}\right) + \frac{1 - 0.1037}{(2\pi \times 48.6864)^{1/2}} \exp\left(-\frac{(x - 2.6802)^2}{2 \times 48.6864}\right).$$

In addition, the fitted p.d.f. based on a single normal distribution is also shown in Fig. 7.9. The mean and variance of that single normal distribution are 0.1250 and 110.6809, respectively.

Figure 7.9 Histogram of data from Example 7.3.10 together with fitted p.d.f. from Example 7.6.16 (solid curve). The p.d.f. has been scaled up to match the fact that the histogram gives counts rather than an estimated p.d.f. Also, the dashed curve gives the estimated p.d.f. for a single normal distribution. (Horizontal axis: laboratory calories minus label calories. Vertical axis: number of foods.)

One can prove that the log-likelihood increases with each iteration of the EM algorithm and that the algorithm converges to a local maximum of the likelihood function. As with other numerical maximization routines, it is difficult to guarantee convergence to a global maximum.

Sampling Plans

Suppose that an experimenter wishes to take observations from a distribution for which the p.f. or the p.d.f. is $f(x \mid \theta)$ in order to gain information about the value of the parameter $\theta$. The experimenter could simply take a random sample of a predetermined size from the distribution.
Instead, however, he may begin by first observing a few values at random from the distribution and noting the cost and the time spent in taking these observations. He may then decide to observe a few more values at random from the distribution and to study all the values thus far obtained. At some point, the experimenter will decide to stop taking observations and will estimate the value of $\theta$ from all the observed values that have been obtained up to that point. He might decide to stop because either he feels that he has enough information to be able to make a good estimate of $\theta$ or he cannot afford to spend any more money or time on sampling.

In this experiment, the number $n$ of observations in the sample is not fixed beforehand. It is a random variable whose value may very well depend on the magnitudes of the observations as they are obtained.

Suppose that an experimenter contemplates using a sampling plan in which, for every $n$, the decision of whether or not to stop sampling after $n$ observations have been collected is a function of the $n$ observations seen so far. Regardless of whether the experimenter chooses such a sampling plan or decides to fix the value of $n$ before any observations are taken, it can be shown that the likelihood function based on the observed values is proportional (as a function of $\theta$) to $f(x_1 \mid \theta) \cdots f(x_n \mid \theta)$. In such a situation, the M.L.E. of $\theta$ will depend only on the likelihood function and not on what type of sampling plan is used. In other words, the value of $\hat{\theta}$ depends only on the values $x_1, \ldots, x_n$ that are actually observed and does not depend on the plan (if there was one) that was used by the experimenter to decide when to stop sampling.
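One way to see where this proportionality comes from is sketched below. The notation is introduced here for illustration and is not the book's: $N$ denotes the random number of observations taken, and $s_k(x_1, \ldots, x_k)$ denotes the probability that the plan stops after the $k$th observation given the first $k$ values (0 or 1 for a deterministic rule). The sketch assumes that $s_k$ does not involve $\theta$.

```latex
% Joint p.f./p.d.f. of the observed values and of the event N = n:
% observe x_1, continue; observe x_2, continue; ...; observe x_n, stop.
\[
  f(x_1, \ldots, x_n, N = n \mid \theta)
  = \left[ \prod_{i=1}^{n} f(x_i \mid \theta) \right]
    \left[ \prod_{k=1}^{n-1} \bigl( 1 - s_k(x_1, \ldots, x_k) \bigr) \right]
    s_n(x_1, \ldots, x_n).
\]
% Only the first bracket depends on theta, so as a function of theta
\[
  f(x_1, \ldots, x_n, N = n \mid \theta)
  \;\propto\; f(x_1 \mid \theta) \cdots f(x_n \mid \theta).
\]
```

Under this assumption the stopping rule contributes only a factor that is constant in $\theta$, so it cannot change where the likelihood attains its maximum.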
To illustrate this property, suppose that the intervals of time, in minutes, between arrivals of successive customers at a certain service facility are i.i.d. random variables. Suppose also that each interval has the exponential distribution with parameter $\theta$, and that a set of observed intervals $X_1, \ldots, X_n$ form a random sample from this distribution. It follows from Exercise 7 of Sec. 7.5 that the M.L.E. of $\theta$ will be $\hat{\theta} = 1/\bar{X}_n$. Also, since the mean $\mu$ of the exponential distribution is $1/\theta$, it follows from the invariance property of M.L.E.'s that $\hat{\mu} = \bar{X}_n$. In other words, the M.L.E. of the mean is the average of the observations in the sample.

Consider now the following three sampling plans:

1. An experimenter decides in advance to take exactly 20 observations, and the average of these 20 observations turns out to be 6. Then the M.L.E. of $\mu$ is $\hat{\mu} = 6$.

2. An experimenter decides to take observations $X_1, X_2, \ldots$ until she obtains a value greater than 10. She finds that $X_i < 10$ for $i = 1, \ldots, 19$ and that $X_{20} > 10$. Hence, sampling terminates after 20 observations. If the average of these 20 observations is 6, then the M.L.E. is again $\hat{\mu} = 6$.

3. An experimenter takes observations one at a time, with no particular plan in mind, until either she is forced to stop sampling or she gets tired of sampling. She is certain that neither of these causes (being forced to stop or getting tired) depends in any way on $\mu$. If for either reason she stops as soon as she has taken 20 observations and if the average of the 20 observations is 6, then the M.L.E. is again $\hat{\mu} = 6$.

Sometimes, an experiment of this type must be terminated during an interval when the experimenter is waiting for the next customer to arrive. If a certain amount of time has elapsed since the arrival of the last customer, this time should not be omitted from the sample data, even though the full interval to the arrival of the next customer has not been observed.
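A short sketch of how an unfinished interval can enter the calculation, under the exponential model above: an interval known only to exceed $t$ contributes the factor $\Pr(X > t) = \exp(-\theta t)$, so with $m$ complete intervals the likelihood is proportional to $\theta^m \exp\left(-\theta\left[\sum x_i + t\right]\right)$, which is maximized at $\hat{\theta} = m / \left(\sum x_i + t\right)$. The function name and the numbers below are hypothetical.

```python
import math

def censored_exponential_mle(complete_intervals, unfinished_wait):
    """M.L.E. of the rate theta and of the mean mu = 1/theta when the last
    interval is only known to be longer than `unfinished_wait`.

    Maximizes theta**m * exp(-theta * (sum(x) + t)); mu_hat then follows
    from the invariance property of M.L.E.'s.
    """
    m = len(complete_intervals)
    total_time = sum(complete_intervals) + unfinished_wait
    theta_hat = m / total_time
    mu_hat = total_time / m
    return theta_hat, mu_hat

# Hypothetical data: three complete intervals, then 3 more minutes of waiting.
intervals = [4.0, 2.0, 6.0]
theta_hat, mu_hat = censored_exponential_mle(intervals, 3.0)
naive_mean = sum(intervals) / len(intervals)

print(theta_hat, mu_hat, naive_mean)
```

Counting the unfinished wait raises the estimated mean above the plain average of the complete intervals, because the extra time is added to the total while the number of complete intervals stays the same.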
Suppose, for example, that the average of the first 20 observations is 6, the experimenter waits another 15 minutes but no other customer arrives, and then she terminates the experiment. In this case, we know that the M.L.E. of $\mu$ would have to be greater than 6, since the value of the 21st observation must be greater than 15, even though its exact value is unknown. The new M.L.E. can be obtained by multiplying the likelihood function for the first 20 observations by the probability that the 21st observation is greater than 15, namely, $\exp(-15\theta)$, and finding the value of $\theta$ that maximizes this new likelihood function (see Exercise 15).

Remember that the M.L.E. is determined by the likelihood function. The only way in which the M.L.E. is allowed to depend on the sampling plan is through the likelihood function. If the decision about when to stop observing data is based solely on the observations seen so far, then this information has already been included in the likelihood function. If the decision to stop is based on something else, one needs to evaluate the probability of that "something else" given each possible value of $\theta$ and include that probability in the likelihood.

Other properties of M.L.E.'s will be discussed later in this chapter and in Chapter 8.

Summary

The M.L.E. of a function $g(\theta)$ is $g(\hat{\theta})$, where $\hat{\theta}$ is the M.L.E. of $\theta$. For example, if $\theta$ is the rate at which customers are served in a queue, then $1/\theta$ is the average service time. The M.L.E. of $1/\theta$ is 1 over the M.L.E. of $\theta$. Sometimes we cannot find a closed form expression for the M.L.E. of a parameter and we must resort to numerical methods to find or approximate the M.L.E. In most problems, the sequence of M.L.E.'s, as sample size increases, converges in probability to the parameter. When data are collected in such a way that the decision to stop collecting data is based solely on the data already observed or on other considerations that are not related to the parameter, then the M.L.E. will not depend on the sampling plan. That is, if two different sampling plans lead to proportional likelihood functions, then the value of $\theta$ that maximizes one likelihood will also maximize the other.

Exercises

1. Suppose that $X_1, \ldots, X_n$ form a random sample from a distribution with the p.d.f. given in Exercise 10 of Sec. 7.5. Find the M.L.E. of $e^{-1/\theta}$.

2. Suppose that $X_1, \ldots, X_n$ form a random sample from a Poisson distribution for which the mean is unknown. Determine the M.L.E. of the standard deviation of the distribution.

3. Suppose that $X_1, \ldots, X_n$ form a random sample from an exponential distribution for which the value of the parameter $\beta$ is unknown. Determine the M.L.E. of the median of the distribution.

4. Suppose that the lifetime of a certain type of lamp has an exponential distribution for which the value of the parameter $\beta$ is unknown. A random sample of $n$ lamps of this type are tested for a period of $T$ hours and the number $X$ of lamps that fail during this period is observed, but the times at which the failures occurred are not noted. Determine the M.L.E. of $\beta$ based on the observed value of $X$.

5. Suppose that $X_1, \ldots, X_n$ form a random sample from the uniform distribution on the interval $[a, b]$, where both endpoints $a$ and $b$ are unknown. Find the M.L.E. of the mean of the distribution.

6. Suppose that $X_1, \ldots, X_n$ form a random sample from a normal distribution for which both the mean and the variance are unknown. Find the M.L.E. of the 0.95 quantile of the distribution, that is, of the point $\theta$ such that $\Pr(X < \theta) = 0.95$.

7. For the conditions of Exercise 6, find the M.L.E. of $\nu = \Pr(X > 2)$.

8. Suppose that $X_1, \ldots, X_n$ form a random sample from a gamma distribution for which the p.d.f. is given by Eq. (7.6.2). Find the M.L.E. of $\Gamma'(\alpha)/\Gamma(\alpha)$.

9. Suppose that $X_1, \ldots, X_n$ form a random sample from a gamma distribution for which both parameters $\alpha$ and $\beta$ are unknown. Find the M.L.E. of $\alpha/\beta$.

10. Suppose that $X_1, \ldots, X_n$ form a random sample from a beta distribution for which both parameters $\alpha$ and $\beta$ are unknown. Show that the M.L.E.'s of $\alpha$ and $\beta$ satisfy the following equation:

$$\frac{\Gamma'(\hat{\alpha})}{\Gamma(\hat{\alpha})} - \frac{\Gamma'(\hat{\beta})}{\Gamma(\hat{\beta})} = \frac{1}{n} \sum_{i=1}^{n} \log \frac{X_i}{1 - X_i}.$$

11. Suppose that $X_1, \ldots, X_n$ form a random sample of size $n$ from the uniform distribution on the interval $[0, \theta]$, where the value of $\theta$ is unknown. Show that the sequence of M.L.E.'s of $\theta$ is a consistent sequence.

12. Suppose that $X_1, \ldots, X_n$ form a random sample from an exponential distribution for which the value of the parameter $\beta$ is unknown. Show that the sequence of M.L.E.'s of $\beta$ is a consistent sequence.

13. Suppose that $X_1, \ldots, X_n$ form a random sample from a distribution for which the p.d.f. is as specified in Exercise 9 of Section 7.5. Show that the sequence of M.L.E.'s of $\theta$ is a consistent sequence.

14. Suppose that a scientist desires to estimate the proportion $p$ of monarch butterflies that have a special type of marking on their wings.
a. Suppose that he captures monarch butterflies one at a time until he has found five that have this special marking. If he must capture a total of 43 butterflies, what is the M.L.E. of $p$?
b. Suppose that at the end of a day the scientist had captured 58 monarch butterflies and had found only three with the special marking. What is the M.L.E. of $p$?

15. Suppose that 21 observations are taken at random from an exponential distribution for which the mean $\mu$ is unknown ($\mu > 0$), the average of 20 of these observations is 6, and although the exact value of the other observation could not be determined, it was known to be greater than 15. Determine the M.L.E. of $\mu$.

16. Suppose that each of two statisticians $A$ and $B$ must estimate a certain parameter $\theta$ whose value is unknown ($\theta > 0$). Statistician $A$ can observe the value of a random variable $X$, which has the gamma distribution with parameters $\alpha$ and $\beta$, where $\alpha = 3$ and $\beta = \theta$; statistician $B$ can observe the value of a random variable $Y$, which has the Poisson distribution with mean $2\theta$. Suppose that the value observed by statistician $A$ is $X = 2$ and the value observed by statistician $B$ is $Y = 3$. Show that the likelihood functions determined by these observed values are proportional, and find the common value of the M.L.E. of $\theta$ obtained by each statistician.

17. Suppose that each of two statisticians $A$ and $B$ must estimate a certain parameter $p$ whose value is unknown ($0 < p < 1$). Statistician $A$ can observe the value of a random variable $X$, which has the binomial distribution with parameters $n = 10$ and $p$; statistician $B$ can observe the value of a random variable $Y$, which has the negative binomial distribution with parameters $r = 4$ and $p$. Suppose that the value observed by statistician $A$ is $X = 4$ and the value observed by statistician $B$ is $Y = 6$. Show that the likelihood functions determined by these observed values are proportional, and find the common value of the M.L.E. of $p$ obtained by each statistician.

18. Prove that the method of moments estimator for the parameter of a Bernoulli distribution is the M.L.E.

19. Prove that the method of moments estimator for the parameter of an exponential distribution is the M.L.E.

20. Prove that the method of moments estimator of the mean of a Poisson distribution is the M.L.E.

21. Prove that the method of moments estimators of the mean and variance of a normal distribution are also the M.L.E.'s.

22. Let $X_1, \ldots, X_n$ be a random sample from the uniform distribution on the interval $[0, \theta]$.
a. Find the method of moments estimator of $\theta$.
b. Show that the method of moments estimator is not the M.L.E.

23. Suppose that $X_1, \ldots, X_n$ form a random sample from the beta distribution with parameters $\alpha$ and $\beta$. Let $\theta = (\alpha, \beta)$ be the vector parameter.
a. Find the method of moments estimator for $\theta$.
b. Show that the method of moments estimator is not the M.L.E.

24. Suppose that the two-dimensional vectors $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ form a random sample from a bivariate normal distribution for which the means of $X$ and $Y$, the variances of $X$ and $Y$, and the correlation between $X$ and $Y$ are unknown. Show that the M.L.E.'s of these five parameters are as follows:

$$\hat{\mu}_1 = \bar{X}_n \quad \text{and} \quad \hat{\mu}_2 = \bar{Y}_n,$$

$$\hat{\sigma}_1^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2 \quad \text{and} \quad \hat{\sigma}_2^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y}_n)^2,$$

$$\hat{\rho} = \frac{\sum_{i=1}^{n} (X_i - \bar{X}_n)(Y_i - \bar{Y}_n)}{\left[\sum_{i=1}^{n} (X_i - \bar{X}_n)^2\right]^{1/2} \left[\sum_{i=1}^{n} (Y_i - \bar{Y}_n)^2\right]^{1/2}}.$$

Hint: First, rewrite the joint p.d.f. of each pair $(X_i, Y_i)$ as the product of the marginal p.d.f. of $X_i$ and the conditional p.d.f. of $Y_i$ given $X_i$. Second, transform the parameters to $\mu_1$, $\sigma_1^2$ and

$$\alpha = \mu_2 - \frac{\rho \sigma_2 \mu_1}{\sigma_1}, \qquad \beta = \frac{\rho \sigma_2}{\sigma_1}, \qquad \sigma_{2.1}^2 = (1 - \rho^2)\sigma_2^2.$$

Third, maximize the likelihood function as a function of the new parameters. Finally, apply the invariance property of M.L.E.'s to find the M.L.E.'s of the original parameters. The above transformation greatly simplifies the maximization of the likelihood.

25. Consider again the situation described in Exercise 24. This time, suppose that, for reasons unrelated to the values of the parameters, we cannot observe the values of $Y_{n-k+1}, \ldots, Y_n$. That is, we will be able to observe all of $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_{n-k}$, but not the last $k$ $Y$ values. Using the hint given in Exercise 24, find the M.L.E.'s of $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$.

⋆ 7.7 Sufficient Statistics

In the first six sections of this chapter, we presented some inference methods that are based on the posterior distribution of the parameter or on the likelihood function alone. There are other inference methods that are based neither on the posterior distribution nor on the likelihood function. These methods are based on the conditional distributions of various functions of the data (i.e., statistics) given the parameter. There are many statistics available in a given problem, some more useful than others. Sufficient statistics turn out to be the most useful in some sense.
Definition of a Sufficient Statistic

Example 7.7.1 Lifetimes of Electronic Components. In Examples 7.4.8 and 7.5.2, we computed estimates of the mean lifetime for electronic components based on a sample of size three from the distribution of lifetimes. The two estimates we computed were a Bayes estimate (Example 7.4.8) and an M.L.E. (Example 7.5.2). Both estimates made use of the observed data solely through the value of the statistic $X_1 + X_2 + X_3$. Is there anything special about this statistic, and if so, do such statistics exist in other problems? ◀

In many problems in which a parameter $\theta$ must be estimated, it is possible to find either an M.L.E. or a Bayes estimator that will be suitable. In some problems, however, neither of these estimators may be suitable or available. There may not be any M.L.E., or there may be more than one. Even when an M.L.E. is unique, it may not be a suitable estimator, as in Example 7.5.7, where the M.L.E. always underestimates the value of $\theta$. Reasons why there may not be a suitable Bayes estimator were presented at the end of Sec. 7.4. In such problems, the search for a good estimator must be extended beyond the methods that have been introduced thus far.

In this section, we shall define the concept of a sufficient statistic, which was introduced by R. A. Fisher in 1922, and we shall show how this concept can be used to simplify the search for a good estimator in many problems. Suppose that in a specific estimation problem, two statisticians A and B must estimate the value of the parameter $\theta$. Statistician A can observe the values of the observations $X_1, \ldots, X_n$ in a random sample, and statistician B cannot observe the individual values of $X_1, \ldots, X_n$ but can learn the value of a certain statistic $T = r(X_1, \ldots, X_n)$. In this case, statistician A can choose any function of the observations $X_1, \ldots, X_n$ as an estimator of $\theta$ (including a function of $T$). But statistician B can use only a function of $T$.
Hence, it follows that A will generally be able to find a better estimator than will B. In some problems, however, B will be able to do just as well as A. In such a problem, the single function $T = r(X_1, \ldots, X_n)$ will in some sense summarize all the information contained in the random sample, and knowledge of the individual values of $X_1, \ldots, X_n$ will be irrelevant in the search for a good estimator of $\theta$. A statistic $T$ having this property is called a sufficient statistic.

The formal definition of a sufficient statistic is based on the following intuition. Suppose that one could learn $T$ and were then able to simulate random variables $X_1', \ldots, X_n'$ such that, for every $\theta$, the joint distribution of $X_1', \ldots, X_n'$ was exactly the same as the joint distribution of $X_1, \ldots, X_n$. Such a statistic $T$ is sufficient in the sense that one could, if one felt the need, use $X_1', \ldots, X_n'$ in the same way that one would have used $X_1, \ldots, X_n$. The process of simulating $X_1', \ldots, X_n'$ is called an auxiliary randomization.

Definition 7.7.1 Sufficient Statistic. Let $X_1, \ldots, X_n$ be a random sample from a distribution indexed by a parameter $\theta$. Let $T$ be a statistic. Suppose that, for every $\theta$ and every possible value $t$ of $T$, the conditional joint distribution of $X_1, \ldots, X_n$ given that $T = t$ (and $\theta$) depends only on $t$ but not on $\theta$. That is, for each $t$, the conditional distribution of $X_1, \ldots, X_n$ given $T = t$ and $\theta$ is the same for all $\theta$. Then we say that $T$ is a sufficient statistic for the parameter $\theta$.

Return now to the intuition introduced right before Definition 7.7.1.
When one simulates $X_1', \ldots, X_n'$ in accordance with the conditional joint distribution of $X_1, \ldots, X_n$ given $T = t$, it follows that for each given value of $\theta \in \Omega$, the joint distribution of $T, X_1', \ldots, X_n'$ will be the same as the joint distribution of $T, X_1, \ldots, X_n$. By integrating out (or summing out) $T$ from the joint distribution, we see that the joint distribution of $X_1, \ldots, X_n$ is the same as the joint distribution of $X_1', \ldots, X_n'$. Hence, if statistician B can observe the value of a sufficient statistic $T$, then she can generate $n$ random variables $X_1', \ldots, X_n'$, which have the same joint distribution as the original random sample $X_1, \ldots, X_n$. The property that distinguishes a sufficient statistic $T$ from a statistic that is not sufficient may be described as follows: The auxiliary randomization used to generate the random variables $X_1', \ldots, X_n'$ after the sufficient statistic $T$ has been observed does not require any knowledge about the value of $\theta$, since the conditional joint distribution of $X_1, \ldots, X_n$ when $T$ is given does not depend on the value of $\theta$. If the statistic $T$ were not sufficient, this auxiliary randomization could not be carried out, because the conditional joint distribution of $X_1, \ldots, X_n$ for a given value of $T$ would involve the value of $\theta$, and this value is unknown.

If statistician B is concerned solely with the distribution of the estimator she uses, we can now see why she can estimate $\theta$ just as well as can statistician A, who observes the values of $X_1, \ldots, X_n$. Suppose that A plans to use a particular estimator $\delta(X_1, \ldots, X_n)$ to estimate $\theta$, and B observes the value of $T$ and generates $X_1', \ldots, X_n'$, which have the same joint distribution as $X_1, \ldots, X_n$. If B uses the estimator $\delta(X_1', \ldots, X_n')$, then it follows that the probability distribution of B's estimator will be the same as the probability distribution of A's estimator.
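The auxiliary randomization can be sketched for a small case. The code below assumes a Bernoulli sample with $T = \sum X_i$, and it relies on the standard fact (assumed here, not derived in the text above) that, given $T = t$, every arrangement of $t$ ones among the $n$ positions is equally likely. The function names and the estimator `delta` are hypothetical, chosen only for illustration.

```python
import random


def draw_sample(n, p, rng):
    """Statistician A's data: n independent Bernoulli(p) observations."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def auxiliary_randomization(n, t, rng):
    """Statistician B's simulated sample X', built from T = t alone.

    Assumption: given T = t, all arrangements of t ones among n positions
    are equally likely, so a uniform shuffle reproduces the conditional
    distribution. Note that p is not an argument of this function.
    """
    x_prime = [1] * t + [0] * (n - t)
    rng.shuffle(x_prime)
    return x_prime


def delta(x):
    """A hypothetical estimator that is not a function of T:
    the mean of the first half of the sample."""
    half = len(x) // 2
    return sum(x[:half]) / half


rng = random.Random(0)
n, p, reps = 6, 0.3, 20000
a_estimates, b_estimates = [], []
for _ in range(reps):
    x = draw_sample(n, p, rng)                         # seen by A only
    x_prime = auxiliary_randomization(n, sum(x), rng)  # B uses only T
    a_estimates.append(delta(x))
    b_estimates.append(delta(x_prime))

# Compare two summaries of the estimator's distribution for A and B.
mean_a = sum(a_estimates) / reps
mean_b = sum(b_estimates) / reps
freq0_a = a_estimates.count(0.0) / reps
freq0_b = b_estimates.count(0.0) / reps
print(mean_a, mean_b, freq0_a, freq0_b)
```

Up to simulation noise, the two means and the two frequencies should agree, even though B never sees the individual observations and never uses the value of $p$.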
This discussion illustrates why, when searching for a good estimator, a statistician can restrict the search to estimators that are functions of a sufficient statistic $T$. We shall return to this point in Sec. 7.9.

On the other hand, if statistician B is interested in basing her estimator on the posterior distribution of $\theta$, we have not yet shown why she can do just as well as statistician A. The next result (the factorization criterion) shows why even this is true. A sufficient statistic is sufficient for being able to compute the likelihood function, and hence it is sufficient for performing any inference that depends on the data only through the likelihood function. M.L.E.'s and anything based on posterior distributions depend on the data only through the likelihood function.

The Factorization Criterion

Immediately after Example 7.2.7 and Theorems 7.3.2 and 7.3.3, we pointed out that a particular statistic was used to compute the posterior distribution being discussed.
These statistics all had the property that they were all that was needed from the data to be able to compute the likelihood function. This property is another way to characterize sufficient statistics. We shall now present a simple method for finding a sufficient statistic that can be applied in many problems. This method is based on the following result, which was developed with increasing generality by R. A. Fisher in 1922, J. Neyman in 1935, and P. R. Halmos and L. J. Savage in 1949.

Theorem 7.7.1 Factorization Criterion. Let $X_1, \ldots, X_n$ form a random sample from either a continuous distribution or a discrete distribution for which the p.d.f. or the p.f. is $f(x \mid \theta)$, where the value of $\theta$ is unknown and belongs to a given parameter space $\Omega$. A statistic $T = r(X_1, \ldots, X_n)$ is a sufficient statistic for $\theta$ if and only if the joint p.d.f. or the joint p.f. $f_n(\mathbf{x} \mid \theta)$ of $X_1, \ldots, X_n$ can be factored as follows for all values of $\mathbf{x} = (x_1, \ldots, x_n) \in R^n$ and all values of $\theta \in \Omega$:
$$f_n(\mathbf{x} \mid \theta) = u(\mathbf{x})\, v[r(\mathbf{x}), \theta]. \tag{7.7.1}$$
Here, the functions $u$ and $v$ are nonnegative, the function $u$ may depend on $\mathbf{x}$ but does not depend on $\theta$, and the function $v$ will depend on $\theta$ but depends on the observed value $\mathbf{x}$ only through the value of the statistic $r(\mathbf{x})$.

Proof We shall give the proof only when the random vector $\mathbf{X} = (X_1, \ldots, X_n)$ has a discrete distribution, in which case
$$f_n(\mathbf{x} \mid \theta) = \Pr(\mathbf{X} = \mathbf{x} \mid \theta).$$
Suppose first that $f_n(\mathbf{x} \mid \theta)$ can be factored as in Eq. (7.7.1) for all values of $\mathbf{x} \in R^n$ and $\theta \in \Omega$. For each possible value $t$ of $T$, let $A(t)$ denote the set of all points $\mathbf{x} \in R^n$ such that $r(\mathbf{x}) = t$. For each given value of $\theta \in \Omega$, we shall determine the conditional distribution of $\mathbf{X}$ given that $T = t$. For every point $\mathbf{x} \in A(t)$,
$$\Pr(\mathbf{X} = \mathbf{x} \mid T = t, \theta) = \frac{\Pr(\mathbf{X} = \mathbf{x} \mid \theta)}{\Pr(T = t \mid \theta)} = \frac{f_n(\mathbf{x} \mid \theta)}{\sum_{\mathbf{y} \in A(t)} f_n(\mathbf{y} \mid \theta)}.$$
Since $r(\mathbf{y}) = t$ for every point $\mathbf{y} \in A(t)$, and since $\mathbf{x} \in A(t)$, it follows from Eq. (7.7.1) that
$$\Pr(\mathbf{X} = \mathbf{x} \mid T = t, \theta) = \frac{u(\mathbf{x})}{\sum_{\mathbf{y} \in A(t)} u(\mathbf{y})}. \tag{7.7.2}$$
Finally, for every point $\mathbf{x}$ that does not belong to $A(t)$,
$$\Pr(\mathbf{X} = \mathbf{x} \mid T = t, \theta) = 0. \tag{7.7.3}$$
It can be seen from Eqs. (7.7.2) and (7.7.3) that the conditional distribution of $\mathbf{X}$ does not depend on $\theta$. Therefore, $T$ is a sufficient statistic.

Conversely, suppose that $T$ is a sufficient statistic. Then, for every given value $t$ of $T$, every point $\mathbf{x} \in A(t)$, and every value of $\theta \in \Omega$, the conditional probability $\Pr(\mathbf{X} = \mathbf{x} \mid T = t, \theta)$ will not depend on $\theta$ and will therefore have the form
$$\Pr(\mathbf{X} = \mathbf{x} \mid T = t, \theta) = u(\mathbf{x}).$$
If we let $v(t, \theta) = \Pr(T = t \mid \theta)$, it follows that
$$f_n(\mathbf{x} \mid \theta) = \Pr(\mathbf{X} = \mathbf{x} \mid \theta) = \Pr(\mathbf{X} = \mathbf{x} \mid T = t, \theta)\,\Pr(T = t \mid \theta) = u(\mathbf{x})\, v(t, \theta).$$
Hence, $f_n(\mathbf{x} \mid \theta)$ has been factored in the form specified in Eq. (7.7.1).

The proof for a random sample $X_1, \ldots, X_n$ from a continuous distribution requires somewhat different methods and will not be given here. ∎
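The first half of the proof can be checked by brute force in a small discrete case. The sketch below assumes a Bernoulli sample with $T = \sum X_i$ (an illustrative choice, not a case worked in the proof) and computes $\Pr(\mathbf{X} = \mathbf{x} \mid T = t, p)$ exactly as the ratio of $f_n(\mathbf{x} \mid p)$ to the sum of $f_n(\mathbf{y} \mid p)$ over $A(t)$. The function names are hypothetical, and exact rational arithmetic is used so that equality can be tested without rounding.

```python
from fractions import Fraction
from itertools import product


def joint_pf(x, p):
    """f_n(x | p) for a Bernoulli sample: p^t (1 - p)^(n - t), t = sum(x)."""
    t = sum(x)
    return p**t * (1 - p) ** (len(x) - t)


def level_set(n, t):
    """A(t): all 0/1 vectors of length n whose entries sum to t."""
    return [y for y in product((0, 1), repeat=n) if sum(y) == t]


def conditional_pf(x, p):
    """Pr(X = x | T = sum(x), p), computed as in the proof:
    f_n(x | p) divided by the sum of f_n(y | p) over y in A(t)."""
    a_t = level_set(len(x), sum(x))
    return joint_pf(x, p) / sum(joint_pf(y, p) for y in a_t)


x = (1, 0, 0, 1, 0)  # n = 5, t = 2
cond_a = conditional_pf(x, Fraction(1, 4))
cond_b = conditional_pf(x, Fraction(2, 3))
print(cond_a, cond_b)
```

Here $u(\mathbf{x}) = 1$, so Eq. (7.7.2) reduces to one over the number of points in $A(t)$, which is 10 for $n = 5$ and $t = 2$; the two conditional probabilities coincide although the values of $p$ differ.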
One way to read Theorem 7.7.1 is that $T = r(\mathbf{X})$ is sufficient if and only if the likelihood function is proportional (as a function of $\theta$) to a function that depends on the data only through $r(\mathbf{x})$. That function would be $v[r(\mathbf{x}), \theta]$. When using the likelihood function for finding posterior distributions, we saw that any factor not depending on $\theta$ (such as $u(\mathbf{x})$ in Eq. (7.7.1)) can be removed from the likelihood without affecting the calculation of the posterior distribution. So, we have the following corollary to Theorem 7.7.1.

Corollary 7.7.1 A statistic $T = r(\mathbf{X})$ is sufficient if and only if, no matter what prior distribution we use, the posterior distribution of $\theta$ depends on the data only through the value of $T$.

For each value of $\mathbf{x}$ for which $f_n(\mathbf{x} \mid \theta) = 0$ for all values of $\theta \in \Omega$, the value of the function $u(\mathbf{x})$ in Eq. (7.7.1) can be chosen to be 0. Therefore, when the factorization criterion is being applied, it is sufficient to verify that a factorization of the form given in Eq. (7.7.1) is satisfied for every value of $\mathbf{x}$ such that $f_n(\mathbf{x} \mid \theta) > 0$ for at least one value of $\theta \in \Omega$.

We shall now illustrate the use of the factorization criterion by giving four examples.

Example 7.7.2 Sampling from a Poisson Distribution. Suppose that $\mathbf{X} = (X_1, \ldots, X_n)$ form a random sample from a Poisson distribution for which the value of the mean $\theta$ is unknown ($\theta > 0$). Let $r(\mathbf{x}) = \sum_{i=1}^{n} x_i$. We shall show that $T = r(\mathbf{X}) = \sum_{i=1}^{n} X_i$ is a sufficient statistic for $\theta$.

For every set of nonnegative integers $x_1, \ldots, x_n$, the joint p.f. $f_n(\mathbf{x} \mid \theta)$ of $X_1, \ldots, X_n$ is as follows:
$$f_n(\mathbf{x} \mid \theta) = \prod_{i=1}^{n} \frac{e^{-\theta}\theta^{x_i}}{x_i!} = \left(\prod_{i=1}^{n} \frac{1}{x_i!}\right) e^{-n\theta}\, \theta^{r(\mathbf{x})}.$$
Let $u(\mathbf{x}) = \prod_{i=1}^{n} (1/x_i!)$ and $v(t, \theta) = e^{-n\theta}\theta^{t}$. We now see that $f_n(\mathbf{x} \mid \theta)$ has been factored as in Eq. (7.7.1). It follows that $T = \sum_{i=1}^{n} X_i$ is a sufficient statistic for $\theta$. ◀
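The Poisson factorization can be confirmed numerically. The following sketch, with hypothetical function names and arbitrarily chosen data, evaluates the joint p.f. directly and compares it with $u(\mathbf{x})\,v(t, \theta)$. It also checks that two samples with the same sum have likelihood functions whose ratio, $u(\mathbf{x})/u(\mathbf{y})$, is free of $\theta$.

```python
from math import exp, factorial, isclose, prod


def joint_pf(x, theta):
    """Joint Poisson p.f. f_n(x | theta), computed term by term."""
    return prod(exp(-theta) * theta**xi / factorial(xi) for xi in x)


def u(x):
    """Factor that depends on the data but not on theta."""
    return prod(1 / factorial(xi) for xi in x)


def v(t, theta, n):
    """Factor that depends on the data only through t = sum(x)."""
    return exp(-n * theta) * theta**t


x = (2, 0, 5, 1)  # sum is 8
y = (3, 3, 1, 1)  # different data, same sum
thetas = (0.5, 1.5, 4.0)

factorization_ok = all(
    isclose(joint_pf(x, th), u(x) * v(sum(x), th, len(x))) for th in thetas
)
# With equal sums, the ratio of the two likelihoods is u(x) / u(y) for all theta.
ratios = [joint_pf(x, th) / joint_pf(y, th) for th in thetas]
print(factorization_ok, ratios)
```

Because the ratio is constant in $\theta$, the two samples lead to proportional likelihood functions, which is how Theorem 7.7.1 reads in terms of the likelihood.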
Example 7.7.3 Applying the Factorization Criterion to a Continuous Distribution. Suppose that $\mathbf{X} = (X_1, \ldots, X_n)$ form a random sample from a continuous distribution with the following p.d.f.:
$$f(x \mid \theta) = \begin{cases} \theta x^{\theta - 1} & \text{for } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases}$$
It is assumed that the value of the parameter $\theta$ is unknown ($\theta > 0$). Let $r(\mathbf{x}) = \prod_{i=1}^{n} x_i$. We shall show that $T = r(\mathbf{X}) = \prod_{i=1}^{n} X_i$ is a sufficient statistic for $\theta$.

For $0 < x_i < 1$ ($i = 1, \ldots, n$), the joint p.d.f. $f_n(\mathbf{x} \mid \theta)$ of $X_1, \ldots, X_n$ is as follows:
$$f_n(\mathbf{x} \mid \theta) = \theta^{n} \left(\prod_{i=1}^{n} x_i\right)^{\theta - 1} = \theta^{n}\, [r(\mathbf{x})]^{\theta - 1}. \tag{7.7.4}$$
Furthermore, if at least one value of $x_i$ is outside the interval $0 < x_i < 1$, then $f_n(\mathbf{x} \mid \theta) = 0$ for every value of $\theta \in \Omega$. The right side of Eq. (7.7.4) depends on $\mathbf{x}$ only through the value of $r(\mathbf{x})$. Therefore, if we let $u(\mathbf{x}) = 1$ and $v(t, \theta) = \theta^{n} t^{\theta - 1}$, then $f_n(\mathbf{x} \mid \theta)$ in Eq. (7.7.4) can be considered to be factored in the form specified in Eq. (7.7.1). It follows from the factorization criterion that the statistic $T = \prod_{i=1}^{n} X_i$ is a sufficient statistic for $\theta$. ◀
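A short numerical sketch can make the role of the product statistic concrete. It assumes the p.d.f. $\theta x^{\theta-1}$ on $0 < x < 1$ and works on the log scale. The closed form $\hat{\theta} = -n/\log t$ used below is not stated in the text; it is obtained here by setting the derivative of $\log v(t, \theta) = n\log\theta + (\theta - 1)\log t$ with respect to $\theta$ equal to zero. All function names and data values are hypothetical.

```python
from math import isclose, log, prod


def log_likelihood(x, theta):
    """log f_n(x | theta) for f(x | theta) = theta * x**(theta - 1), 0 < x < 1."""
    return sum(log(theta) + (theta - 1) * log(xi) for xi in x)


def log_v(t, theta, n):
    """log v(t, theta), where v(t, theta) = theta**n * t**(theta - 1)."""
    return n * log(theta) + (theta - 1) * log(t)


def mle_from_t(t, n):
    """Maximizer of v(t, theta) over theta > 0 (derived in the lead-in)."""
    return -n / log(t)


x = (0.2, 0.5, 0.9)
y = (0.3, 0.5, 0.6)  # different data, same product (0.09 up to rounding)
t_x, t_y = prod(x), prod(y)
theta_hat = mle_from_t(t_x, len(x))

# The likelihood agrees with v evaluated at T, and agrees across both samples.
same_as_v = isclose(log_likelihood(x, 1.7), log_v(t_x, 1.7, len(x)), abs_tol=1e-12)
same_across = isclose(log_likelihood(x, 1.7), log_likelihood(y, 1.7), abs_tol=1e-12)
print(theta_hat, same_as_v, same_across)
```

Since $u(\mathbf{x}) = 1$ in this example, the whole likelihood is $v(t, \theta)$, so any two samples with the same product give the same M.L.E.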
Example 7.7.4 Sampling from a Normal Distribution. Suppose that $\mathbf{X} = (X_1, \ldots, X_n)$ form a random sample from a normal distribution for which the mean $\mu$ is unknown and the variance $\sigma^2$ is known. Let $r(\mathbf{x}) = \sum_{i=1}^{n} x_i$. We shall show that $T = r(\mathbf{X}) = \sum_{i=1}^{n} X_i$ is a sufficient statistic for $\mu$.

For $-\infty < x_i < \infty$ ($i = 1, \ldots, n$), the joint p.d.f. of $\mathbf{X}$ is as follows:
$$f_n(\mathbf{x} \mid \mu) = \prod_{i=1}^{n} \frac{1}{(2\pi)^{1/2}\sigma} \exp\left[-\frac{(x_i - \mu)^2}{2\sigma^2}\right]. \tag{7.7.5}$$
This equation can be rewritten in the form
$$f_n(\mathbf{x} \mid \mu) = \frac{1}{(2\pi)^{n/2}\sigma^{n}} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n} x_i^2\right) \exp\left(\frac{\mu}{\sigma^2}\sum_{i=1}^{n} x_i - \frac{n\mu^2}{2\sigma^2}\right). \tag{7.7.6}$$
Let $u(\mathbf{x})$ be the constant factor and the first exponential factor in Eq. (7.7.6). Let $v(t, \mu) = \exp\left(\mu t/\sigma^2 - n\mu^2/(2\sigma^2)\right)$. Then $f_n(\mathbf{x} \mid \mu)$ has now been factored as in Eq. (7.7.1). It follows from the factorization criterion that $T = \sum_{i=1}^{n} X_i$ is a sufficient statistic for $\mu$. ◀

Since $\sum_{i=1}^{n} x_i = n\bar{x}_n$, we can state equivalently that the final factor in Eq. (7.7.6) depends on $x_1, \ldots, x_n$ only through the value of $\bar{x}_n$.
Therefore, in Example 7.7.4 the statistic $\bar{X}_n$ is also a sufficient statistic for $\mu$. More generally (see Exercise 13 at the end of this section), every one-to-one function of a sufficient statistic is also a sufficient statistic.

Example 7.7.5 Sampling from a Uniform Distribution. Suppose that $\mathbf{X} = (X_1, \ldots, X_n)$ form a random sample from the uniform distribution on the interval $[0, \theta]$, where the value of the parameter $\theta$ is unknown ($\theta > 0$). Let $r(\mathbf{x}) = \max\{x_1, \ldots, x_n\}$. We shall show that $T = r(\mathbf{X}) = \max\{X_1, \ldots, X_n\}$ is a sufficient statistic for $\theta$.

The p.d.f. $f(x \mid \theta)$ of each individual observation $X_i$ is
$$f(x \mid \theta) = \begin{cases} \dfrac{1}{\theta} & \text{for } 0 \le x \le \theta, \\ 0 & \text{otherwise.} \end{cases}$$
Therefore, the joint p.d.f. $f_n(\mathbf{x} \mid \theta)$ of $X_1, \ldots, X_n$ is
$$f_n(\mathbf{x} \mid \theta) = \begin{cases} \dfrac{1}{\theta^{n}} & \text{for } 0 \le x_i \le \theta \ (i = 1, \ldots, n), \\ 0 & \text{otherwise.} \end{cases}$$
It can be seen that if $x_i < 0$ for at least one value of $i$ ($i = 1, \ldots, n$), then $f_n(\mathbf{x} \mid \theta) = 0$ for every value of $\theta > 0$. Therefore, it is only necessary to consider the factorization of $f_n(\mathbf{x} \mid \theta)$ for values of $x_i \ge 0$ ($i = 1, \ldots, n$).

Let $v[t, \theta]$ be defined as follows:
$$v[t, \theta] = \begin{cases} \dfrac{1}{\theta^{n}} & \text{if } t \le \theta, \\ 0 & \text{if } t > \theta. \end{cases}$$
Notice that $x_i \le \theta$ for $i = 1, \ldots, n$ if and only if $\max\{x_1, \ldots, x_n\} \le \theta$. Therefore, for $x_i \ge 0$ ($i = 1, \ldots, n$), we can rewrite $f_n(\mathbf{x} \mid \theta)$ as follows:
$$f_n(\mathbf{x} \mid \theta) = v[r(\mathbf{x}), \theta]. \tag{7.7.7}$$
Letting $u(\mathbf{x}) = 1$, we see that the right side of Eq. (7.7.7) is in the form of Eq. (7.7.1). It follows that $T = \max\{X_1, \ldots, X_n\}$ is a sufficient statistic for $\theta$. ◀

Summary

A statistic $T = r(\mathbf{X})$ is sufficient if, for each $t$, the conditional distribution of $\mathbf{X}$ given $T = t$ and $\theta$ is the same for all values of $\theta$. So, if $T$ is sufficient, and one observed only $T$ instead of $\mathbf{X}$, one could, at least in principle, simulate random variables $\mathbf{X}'$ with the same joint distribution given $\theta$ as $\mathbf{X}$. In this sense, $T$ is sufficient for obtaining as much information about $\theta$ as one could get from $\mathbf{X}$. The factorization criterion says that $T = r(\mathbf{X})$ is sufficient if and only if the joint p.f. or p.d.f. can be factored as $f_n(\mathbf{x} \mid \theta) = u(\mathbf{x})\, v[r(\mathbf{x}), \theta]$ for some functions $u$ and $v$.
This is the most convenient way to identify whether or not a statistic is sufficient.

Exercises

Instructions for Exercises 1 to 10: In each of these exercises, assume that the random variables $X_1, \ldots, X_n$ form a random sample of size $n$ from the distribution specified in that exercise, and show that the statistic $T$ specified in the exercise is a sufficient statistic for the parameter.

1. The Bernoulli distribution with parameter $p$, which is unknown ($0 < p < 1$); $T = \sum_{i=1}^{n} X_i$.

2. The geometric distribution with parameter $p$, which is unknown ($0 < p < 1$); $T = \sum_{i=1}^{n} X_i$.

3. The negative binomial distribution with parameters $r$ and $p$, where $r$ is known and $p$ is unknown ($0 < p < 1$); $T = \sum_{i=1}^{n} X_i$.

4. The normal distribution for which the mean $\mu$ is known and the variance $\sigma^2 > 0$ is unknown; $T = \sum_{i=1}^{n} (X_i - \mu)^2$.

5. The gamma distribution with parameters $\alpha$ and $\beta$, where the value of $\alpha$ is known and the value of $\beta$ is unknown ($\beta > 0$); $T = \bar{X}_n$.

6. The gamma distribution with parameters $\alpha$ and $\beta$, where the value of $\beta$ is known and the value of $\alpha$ is unknown ($\alpha > 0$); $T = \prod_{i=1}^{n} X_i$.

7. The beta distribution with parameters $\alpha$ and $\beta$, where the value of $\beta$ is known and the value of $\alpha$ is unknown ($\alpha > 0$); $T = \prod_{i=1}^{n} X_i$.

8. The uniform distribution on the integers $1, 2, \ldots, \theta$, as defined in Sec. 3.1, where the value of $\theta$ is unknown ($\theta = 1, 2, \ldots$); $T = \max\{X_1, \ldots, X_n\}$.

9. The uniform distribution on the interval $[a, b]$, where the value of $a$ is known and the value of $b$ is unknown ($b > a$); $T = \max\{X_1, \ldots, X_n\}$.

10. The uniform distribution on the interval $[a, b]$, where the value of $b$ is known and the value of $a$ is unknown ($a < b$); $T = \min\{X_1, \ldots, X_n\}$.

11. Assume that $X_1, \ldots, X_n$ form a random sample from a distribution that belongs to an exponential family of distributions as defined in Exercise 23 of Sec. 7.3. Prove that $T = \sum_{i=1}^{n} d(X_i)$ is a sufficient statistic for $\theta$.

12. Suppose that a random sample $X_1, \ldots, X_n$ is drawn from the Pareto distribution with parameters $x_0$ and $\alpha$. (See Exercise 16 in Sec. 5.7.)
    a. If $x_0$ is known and $\alpha > 0$ unknown, find a sufficient statistic.
    b. If $\alpha$ is known and $x_0$ unknown, find a sufficient statistic.

13. Suppose that $X_1, \ldots, X_n$ form a random sample from a distribution for which the p.d.f. is $f(x \mid \theta)$, where the value of the parameter $\theta$ belongs to a given parameter space $\Omega$. Suppose that $T = r(X_1, \ldots, X_n)$ and $T' = r'(X_1, \ldots, X_n)$ are two statistics such that $T'$ is a one-to-one function of $T$; that is, the value of $T'$ can be determined from the value of $T$ without knowing the values of $X_1, \ldots, X_n$, and the value of $T$ can be determined from the value of $T'$ without knowing the values of $X_1, \ldots, X_n$. Show that $T'$ is a sufficient statistic for $\theta$ if and only if $T$ is a sufficient statistic for $\theta$.

14. Suppose that $X_1, \ldots, X_n$ form a random sample from the gamma distribution specified in Exercise 6. Show that the statistic $T = \sum_{i=1}^{n} \log X_i$ is a sufficient statistic for the parameter $\alpha$.

15. Suppose that $X_1, \ldots, X_n$ form a random sample from the beta distribution with parameters $\alpha$ and $\beta$, where the value of $\alpha$ is known and the value of $\beta$ is unknown ($\beta > 0$). Show that the following statistic $T$ is a sufficient statistic for $\beta$:
\[
T = \frac{1}{n}\left(\sum_{i=1}^{n} \log \frac{1}{1 - X_i}\right)^{4}.
\]

16. Let $\theta$ be a parameter with parameter space $\Omega$ equal to an interval of real numbers (possibly unbounded). Let $\mathbf{X}$ have p.d.f. or p.f. $f_n(\mathbf{x} \mid \theta)$ conditional on $\theta$. Let $T = r(\mathbf{X})$ be a statistic. Assume that $T$ is sufficient. Prove that, for every possible prior p.d.f. for $\theta$, the posterior p.d.f. of $\theta$ given $\mathbf{X} = \mathbf{x}$ depends on $\mathbf{x}$ only through $r(\mathbf{x})$.

17. Let $\theta$ be a parameter, and let $\mathbf{X}$ be discrete with p.f. $f_n(\mathbf{x} \mid \theta)$ conditional on $\theta$. Let $T = r(\mathbf{X})$ be a statistic. Prove that $T$ is sufficient if and only if, for every $t$ and every $\mathbf{x}$ such that $t = r(\mathbf{x})$, the likelihood function from observing $T = t$ is proportional to the likelihood function from observing $\mathbf{X} = \mathbf{x}$.

★ 7.8 Jointly Sufficient Statistics

When a parameter $\theta$ is multidimensional, sufficient statistics will typically need to
be multidimensional as well. Sometimes, no one-dimensional statistic is sufficient even when $\theta$ is one-dimensional. In either case, we need to extend the concept of sufficient statistic to deal with cases in which more than one statistic is needed in order to be sufficient.

Definition of Jointly Sufficient Statistics

Example 7.8.1. Sampling from a Normal Distribution. Return to Example 7.7.4, in which $\mathbf{X} = (X_1, \ldots, X_n)$ form a random sample from the normal distribution with mean $\mu$ and variance $\sigma^2$. This time, assume that both coordinates of the parameter $\theta = (\mu, \sigma^2)$ are unknown. The joint p.d.f. of $\mathbf{X}$ is still given by the right side of Eq. (7.7.5). But now, we would refer to the joint p.d.f. as $f_n(\mathbf{x} \mid \theta)$. With both $\mu$ and $\sigma^2$ unknown, there no longer appears to be a single statistic that is sufficient. ◀

We shall continue to suppose that the variables $X_1, \ldots, X_n$ form a random sample from a distribution for which the p.d.f. or the p.f. is $f(x \mid \theta)$, where the parameter $\theta$
must belong to some parameter space $\Omega$. However, we shall now explicitly consider the possibility that $\theta$ may be a vector of real-valued parameters. For example, if the sample comes from a normal distribution for which both the mean $\mu$ and the variance $\sigma^2$ are unknown, then $\theta$ would be a two-dimensional vector whose components are $\mu$ and $\sigma^2$. Similarly, if the sample comes from a uniform distribution on some interval $[a, b]$ for which both endpoints $a$ and $b$ are unknown, then $\theta$ would be a two-dimensional vector whose components are $a$ and $b$. We shall, of course, continue to include the possibility that $\theta$ is a one-dimensional parameter.

In almost every problem in which $\theta$ is a vector, as well as in some problems in which $\theta$ is one-dimensional, there does not exist a one-dimensional statistic $T$ that is sufficient. In such a problem it is necessary to find two or more statistics $T_1, \ldots, T_k$ that together are jointly sufficient statistics in a sense that will now be described.
Suppose that in a given problem the statistics $T_1, \ldots, T_k$ are defined by $k$ different functions of the vector of observations $\mathbf{X} = (X_1, \ldots, X_n)$. Specifically, let $T_i = r_i(\mathbf{X})$ for $i = 1, \ldots, k$. Loosely speaking, the statistics $T_1, \ldots, T_k$ are jointly sufficient statistics for $\theta$ if a statistician who learns only the values of the $k$ functions $r_1(\mathbf{X}), \ldots, r_k(\mathbf{X})$ can estimate every component of $\theta$ and every function of the components of $\theta$, as well as one who observes the $n$ individual values of $X_1, \ldots, X_n$. More formally, we have the following definition.

Definition 7.8.1. Jointly Sufficient Statistics. Suppose that for each $\theta$ and each possible value $(t_1, \ldots, t_k)$ of $(T_1, \ldots, T_k)$, the conditional joint distribution of $(X_1, \ldots, X_n)$ given $(T_1, \ldots, T_k) = (t_1, \ldots, t_k)$ does not depend on $\theta$. Then $T_1, \ldots, T_k$ are called jointly sufficient statistics for $\theta$.

A version of the factorization criterion exists for jointly sufficient statistics. The
proof will not be given, but it is similar to the proof of Theorem 7.7.1.

Theorem 7.8.1. Factorization Criterion for Jointly Sufficient Statistics. Let $r_1, \ldots, r_k$ be functions of $n$ real variables. The statistics $T_i = r_i(\mathbf{X})$, $i = 1, \ldots, k$, are jointly sufficient statistics for $\theta$ if and only if the joint p.d.f. or the joint p.f. $f_n(\mathbf{x} \mid \theta)$ can be factored as follows for all values of $\mathbf{x} \in R^n$ and all values of $\theta \in \Omega$:
\[
f_n(\mathbf{x} \mid \theta) = u(\mathbf{x})\, v[r_1(\mathbf{x}), \ldots, r_k(\mathbf{x}), \theta]. \tag{7.8.1}
\]
Here the functions $u$ and $v$ are nonnegative, the function $u$ may depend on $\mathbf{x}$ but does not depend on $\theta$, and the function $v$ will depend on $\theta$ but depends on $\mathbf{x}$ only through the $k$ functions $r_1(\mathbf{x}), \ldots, r_k(\mathbf{x})$. ■

Example 7.8.2. Jointly Sufficient Statistics for the Parameters of a Normal Distribution. Suppose that $X_1, \ldots, X_n$ form a random sample from a normal distribution for which both the mean $\mu$ and the variance $\sigma^2$ are unknown. The joint p.d.f. of $X_1, \ldots, X_n$ is given by Eq. (7.7.6), and it can be seen that this joint p.d.f.
depends on $\mathbf{x}$ only through the values of $\sum_{i=1}^{n} x_i$ and $\sum_{i=1}^{n} x_i^2$. Therefore, by the factorization criterion, the statistics $T_1 = \sum_{i=1}^{n} X_i$ and $T_2 = \sum_{i=1}^{n} X_i^2$ are jointly sufficient statistics for $\mu$ and $\sigma^2$. ◀

Suppose now that in a given problem the statistics $T_1, \ldots, T_k$ are jointly sufficient statistics for some parameter vector $\theta$. If $k$ other statistics $T_1', \ldots, T_k'$ are obtained from $T_1, \ldots, T_k$ by a one-to-one transformation, then it can be shown that $T_1', \ldots, T_k'$ will also be jointly sufficient statistics for $\theta$.

Example 7.8.3. Another Pair of Jointly Sufficient Statistics for the Parameters of a Normal Distribution. Suppose again that $X_1, \ldots, X_n$ form a random sample from a normal distribution for which both the mean $\mu$ and the variance $\sigma^2$ are unknown. Let $T_1' = \hat{\mu}$, the sample mean, and let $T_2' = \hat{\sigma}^2$, the sample variance. Thus,
\[
T_1' = \bar{X}_n \quad \text{and} \quad T_2' = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X}_n)^2.
\]
We shall show that $T_1'$ and $T_2'$ are jointly sufficient statistics for $\mu$ and $\sigma^2$.
Let $T_1$ and $T_2$ be the jointly sufficient statistics for $\mu$ and $\sigma^2$ derived in Example 7.8.2. Then
\[
T_1' = \frac{1}{n} T_1 \quad \text{and} \quad T_2' = \frac{1}{n} T_2 - \frac{1}{n^2} T_1^2.
\]
Also, equivalently,
\[
T_1 = n T_1' \quad \text{and} \quad T_2 = n\left(T_2' + T_1'^{\,2}\right).
\]
Hence, the statistics $T_1'$ and $T_2'$ are obtained from the jointly sufficient statistics $T_1$ and $T_2$ by a one-to-one transformation. It follows, therefore, that $T_1'$ and $T_2'$ themselves are jointly sufficient statistics for $\mu$ and $\sigma^2$. ◀

We have now shown that the jointly sufficient statistics for the unknown mean and variance of a normal distribution can be chosen to be either $T_1$ and $T_2$, as given in Example 7.8.2, or $T_1'$ and $T_2'$, as given in Example 7.8.3.

Example 7.8.4. Jointly Sufficient Statistics for the Parameters of a Uniform Distribution. Suppose that $X_1, \ldots, X_n$ form a random sample from the uniform distribution on the interval $[a, b]$, where the values of both endpoints $a$ and $b$ are unknown ($a < b$). The joint p.d.f.
$f_n(\mathbf{x} \mid a, b)$ of $X_1, \ldots, X_n$ will be 0 unless all the observed values $x_1, \ldots, x_n$ lie between $a$ and $b$; that is, $f_n(\mathbf{x} \mid a, b) = 0$ unless $\min\{x_1, \ldots, x_n\} \ge a$ and $\max\{x_1, \ldots, x_n\} \le b$. Furthermore, for every vector $\mathbf{x}$ such that $\min\{x_1, \ldots, x_n\} \ge a$ and $\max\{x_1, \ldots, x_n\} \le b$, we have
\[
f_n(\mathbf{x} \mid a, b) = \frac{1}{(b - a)^n}.
\]
For each two numbers $y$ and $z$, we shall let $h(y, z)$ be defined as follows:
\[
h(y, z) =
\begin{cases}
1 & \text{for } y \le z, \\
0 & \text{for } y > z.
\end{cases}
\]
For every value of $\mathbf{x} \in R^n$, we can then write
\[
f_n(\mathbf{x} \mid a, b) = \frac{h[a, \min\{x_1, \ldots, x_n\}]\; h[\max\{x_1, \ldots, x_n\}, b]}{(b - a)^n}.
\]
Since this expression depends on $\mathbf{x}$ only through the values of $\min\{x_1, \ldots, x_n\}$ and $\max\{x_1, \ldots, x_n\}$, it follows that the statistics $T_1 = \min\{X_1, \ldots, X_n\}$ and $T_2 = \max\{X_1, \ldots, X_n\}$ are jointly sufficient statistics for $a$ and $b$. ◀
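The factorization in Example 7.8.4 can be checked numerically. The following Python sketch is an illustration only and is not part of the original text; the function names and the two sample vectors are assumptions chosen for the demonstration. It evaluates the joint p.d.f. once from the full sample and once from the minimum, the maximum, and the sample size alone.

```python
def uniform_joint_pdf(xs, a, b):
    """Joint p.d.f. of an i.i.d. sample from the uniform distribution on [a, b]."""
    if a >= b:
        raise ValueError("need a < b")
    if all(a <= x <= b for x in xs):
        return (b - a) ** (-len(xs))
    return 0.0


def h(y, z):
    """Indicator h(y, z) from Example 7.8.4: 1 if y <= z, otherwise 0."""
    return 1.0 if y <= z else 0.0


def uniform_joint_pdf_from_stats(t1, t2, n, a, b):
    """The same p.d.f., computed only from t1 = min, t2 = max, and the sample size n."""
    return h(a, t1) * h(t2, b) / (b - a) ** n


# Two hypothetical samples with different values but the same minimum and maximum.
sample_1 = [0.2, 1.7, 0.9, 1.1]
sample_2 = [1.7, 0.2, 0.3, 1.6]

for a, b in [(0.0, 2.0), (0.5, 2.0), (0.0, 1.5), (-1.0, 3.0)]:
    p1 = uniform_joint_pdf(sample_1, a, b)
    p2 = uniform_joint_pdf(sample_2, a, b)
    p_stats = uniform_joint_pdf_from_stats(min(sample_1), max(sample_1), len(sample_1), a, b)
    print((a, b), p1, p2, p_stats)
```

For every pair $(a, b)$ tried, the three printed values agree, which is what joint sufficiency of the minimum and maximum predicts for samples of the same size.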
Minimal Sufficient Statistics

In a given problem, we want to try to find a sufficient statistic or a set of jointly sufficient statistics for $\theta$, because the values of such statistics summarize all the relevant information about $\theta$ contained in the random sample. When a set of jointly sufficient statistics is known, the search for a good estimator of $\theta$ is simplified because we need consider only functions of these statistics as possible estimators. Therefore, in a given problem it is desirable to find, not merely any set of jointly sufficient statistics, but the simplest set of jointly sufficient statistics. That is, we want the set of sufficient statistics that requires us to consider the smallest collection of possible estimators. (We make this more precise in Definition 7.8.3.) For example, it is correct but completely useless to say that in every problem the $n$ observations $X_1, \ldots, X_n$ are jointly sufficient statistics.
We shall now describe another set of jointly sufficient statistics that exist in every problem and are slightly more useful.

Definition 7.8.2. Order Statistics. Suppose that $X_1, \ldots, X_n$ form a random sample from some distribution. Let $Y_1$ denote the smallest value in the random sample, let $Y_2$ denote the next smallest value, let $Y_3$ denote the third smallest value, and so on. In this way, $Y_n$ denotes the largest value in the sample, and $Y_{n-1}$ denotes the next largest value. The random variables $Y_1, \ldots, Y_n$ are called the order statistics of the sample.

Now let $y_1 \le y_2 \le \cdots \le y_n$ denote the values of the order statistics for a given sample. If we are told the values of $y_1, \ldots, y_n$, then we know that these $n$ values were obtained in the sample. However, we do not know which one of the observations $X_1, \ldots, X_n$ actually yielded the value $y_1$, which one actually yielded the value $y_2$, and so on.
All we know is that the smallest of the values of $X_1, \ldots, X_n$ was $y_1$, the next smallest value was $y_2$, and so on.

Theorem 7.8.2. Order Statistics Are Sufficient in Random Samples. Let $X_1, \ldots, X_n$ form a random sample from a distribution for which the p.d.f. or the p.f. is $f(x \mid \theta)$. Then the order statistics $Y_1, \ldots, Y_n$ are jointly sufficient for $\theta$.

Proof. Let $y_1 \le y_2 \le \cdots \le y_n$ denote the values of the order statistics. The joint p.d.f. or joint p.f. of $X_1, \ldots, X_n$ has the following form:
\[
f_n(\mathbf{x} \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta). \tag{7.8.2}
\]
Since the order of the factors in the product on the right side of Eq. (7.8.2) is irrelevant, Eq. (7.8.2) could just as well be rewritten in the form
\[
f_n(\mathbf{x} \mid \theta) = \prod_{i=1}^{n} f(y_i \mid \theta).
\]
Hence, $f_n(\mathbf{x} \mid \theta)$ depends on $\mathbf{x}$ only through the values of $y_1, \ldots, y_n$. It follows, therefore, that the order statistics $Y_1, \ldots, Y_n$ are jointly sufficient statistics for $\theta$. ■
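The key step in the proof of Theorem 7.8.2, that the product in Eq. (7.8.2) does not depend on the order of its factors, can be illustrated with a short Python sketch. This is an informal check and not part of the original text. The normal density is used only as one convenient choice of $f(x \mid \theta)$, and the sample, the seed, and the function names are assumptions made for the demonstration.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """p.d.f. of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def joint_pdf(xs, mu, sigma):
    """Joint p.d.f. of an i.i.d. sample: the product in Eq. (7.8.2)."""
    return math.prod(normal_pdf(x, mu, sigma) for x in xs)


def order_statistics(xs):
    """Values y_1 <= ... <= y_n of the order statistics."""
    return sorted(xs)


rng = random.Random(0)                        # fixed seed, for reproducibility only
x = [rng.gauss(1.0, 2.0) for _ in range(6)]   # hypothetical sample, in observed order
x_shuffled = rng.sample(x, len(x))            # same values attached to different indices
y = order_statistics(x)

for mu, sigma in [(0.0, 1.0), (1.0, 2.0), (-1.0, 1.5)]:
    values = [joint_pdf(v, mu, sigma) for v in (x, x_shuffled, y)]
    print((mu, sigma), values)
```

Up to floating-point rounding, the three values on each printed line coincide: the joint p.d.f. can be recovered from the sorted values alone, whatever the parameter values.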
In words, Theorem 7.8.2 says that it is sufficient to know the set of $n$ numbers that were obtained in the sample, and it is not necessary to know which particular one of these numbers was, for example, the value of $X_3$.

To see how the order statistics are simpler than the full data vector in the sense of having fewer possible estimators, note that $X_3$ is an estimator based on the full data vector, but $X_3$ cannot be determined from the order statistics. Hence $X_3$ is not an estimator that we would need to consider if we based our inference on the order statistics. The same is true of all of the averages of the form $(X_{i_1} + \cdots + X_{i_k})/k$ for $\{i_1, \ldots, i_k\}$ a proper subset of $\{1, \ldots, n\}$, as well as many other functions. On the other hand, every estimator based on the order statistics is also a function of the full data.

In each of the examples that have been given in this section and in Sec.
7.7, we considered a distribution for which either there was a single sufficient statistic or there were two statistics that were jointly sufficient. For some distributions, however, the order statistics $Y_1, \ldots, Y_n$ are the simplest set of jointly sufficient statistics that exist, and no further reduction in terms of sufficient statistics is possible.

Example 7.8.5. Sufficient Statistics for the Parameter of a Cauchy Distribution. Suppose that $X_1, \ldots, X_n$ form a random sample from a Cauchy distribution centered at an unknown point $\theta$ ($-\infty < \theta < \infty$). The p.d.f. $f(x \mid \theta)$ of this distribution is given by Eq. (7.6.6), and the joint p.d.f. $f_n(\mathbf{x} \mid \theta)$ of $X_1, \ldots, X_n$ is given by Eq. (7.6.7). It can be shown that the only jointly sufficient statistics that exist in this problem are the order statistics $Y_1, \ldots, Y_n$ or some other set of $n$ statistics $T_1, \ldots, T_n$ that can be derived from the order statistics by a one-to-one transformation. The details of the argument will not be given here. ◀
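Although the full argument is omitted, a small numerical experiment can make the claim of Example 7.8.5 plausible. If a statistic $T$ were sufficient, the factorization criterion would force the ratio $f_n(\mathbf{x} \mid \theta)/f_n(\mathbf{y} \mid \theta)$ to be free of $\theta$ whenever $\mathbf{x}$ and $\mathbf{y}$ give the same value of $T$. The Python sketch below is not part of the original text, and its two samples are hypothetical. It assumes the Cauchy p.d.f. in Eq. (7.6.6) has the standard form $1/\{\pi[1 + (x - \theta)^2]\}$ and compares two samples with equal sums. It shows only that the sample sum is not sufficient for the Cauchy location parameter; it does not prove the stronger statement in the example.

```python
import math


def cauchy_log_lik(xs, theta):
    """Log of the joint p.d.f. for a Cauchy sample centered at theta (assumed standard form)."""
    return sum(-math.log(math.pi * (1.0 + (x - theta) ** 2)) for x in xs)


def normal_log_lik(xs, mu, sigma=1.0):
    """Log of the joint p.d.f. for a normal sample with known standard deviation sigma."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)
        for x in xs
    )


# Two hypothetical samples with the same sum (3.0) but different order statistics.
x = [-1.0, 0.0, 4.0]
y = [0.5, 1.0, 1.5]

thetas = [-2.0, 0.0, 1.0, 3.0]

# Log of the likelihood ratio f_n(x | theta) / f_n(y | theta) at each theta.
normal_diffs = [normal_log_lik(x, t) - normal_log_lik(y, t) for t in thetas]
cauchy_diffs = [cauchy_log_lik(x, t) - cauchy_log_lik(y, t) for t in thetas]

print("normal:", normal_diffs)   # essentially constant in theta
print("cauchy:", cauchy_diffs)   # changes with theta
```

For the normal model with known variance the log ratio is the same at every $\theta$, in line with Example 7.7.4, whereas for the Cauchy model it varies with $\theta$.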
These considerations lead us to the concepts of a minimal sufficient statistic and a minimal set of jointly sufficient statistics. A sufficient statistic $T$ is a minimal sufficient statistic if every function of $T$, which itself is a sufficient statistic, is a one-to-one function of $T$. Formally, we shall use the following definition, which is equivalent to the informal definition just given.

Definition 7.8.3 Minimal (Jointly) Sufficient Statistic(s). A statistic $T$ is a minimal sufficient statistic if $T$ is sufficient and is a function of every other sufficient statistic. The statistics in a vector $\boldsymbol{T} = (T_1, \ldots, T_k)$ are minimal jointly sufficient statistics if the coordinates of $\boldsymbol{T}$ are jointly sufficient statistics and $\boldsymbol{T}$ is a function of every other set of jointly sufficient statistics.

In Example 7.8.5, the order statistics $Y_1, \ldots, Y_n$ are minimal jointly sufficient statistics.
Maximum Likelihood Estimators and Bayes Estimators as Sufficient Statistics

For the next two theorems, let $X_1, \ldots, X_n$ form a random sample from a distribution for which the p.f. or the p.d.f. is $f(x|\theta)$, where the value of the parameter $\theta$ is unknown and one-dimensional.

Theorem 7.8.3 M.L.E. and Sufficient Statistics. Let $T = r(X_1, \ldots, X_n)$ be a sufficient statistic for $\theta$. Then the M.L.E. $\hat{\theta}$ of $\theta$ depends on the observations $X_1, \ldots, X_n$ only through the statistic $T$. Furthermore, if $\hat{\theta}$ is itself sufficient, then it is minimal sufficient.

Proof We show first that $\hat{\theta}$ is a function of every sufficient statistic. Let $T = r(\boldsymbol{X})$ be a sufficient statistic. The factorization criterion Theorem 7.7.1 says that the likelihood function $f_n(\boldsymbol{x}|\theta)$ can be written in the form

$$f_n(\boldsymbol{x}|\theta) = u(\boldsymbol{x})\, v[r(\boldsymbol{x}), \theta].$$

The M.L.E. $\hat{\theta}$ is the value of $\theta$ for which $f_n(\boldsymbol{x}|\theta)$ is a maximum. It follows, therefore, that $\hat{\theta}$ will be the value of $\theta$ for which $v[r(\boldsymbol{x}), \theta]$ is a maximum.
Since $v[r(\boldsymbol{x}), \theta]$ depends on the observed vector $\boldsymbol{x}$ only through the function $r(\boldsymbol{x})$, it follows that $\hat{\theta}$ will also depend on $\boldsymbol{x}$ only through the function $r(\boldsymbol{x})$. Thus, the estimator $\hat{\theta}$ is a function of $T = r(\boldsymbol{X})$.

Since the estimator $\hat{\theta}$ is a function of the observations $X_1, \ldots, X_n$ and is not a function of the parameter $\theta$, the estimator is itself a statistic. If $\hat{\theta}$ is actually a sufficient statistic, then it is minimal sufficient because we just showed that it is a function of every other sufficient statistic.

Theorem 7.8.3 can be extended easily to the case in which the parameter $\theta$ is multidimensional. If $\theta = (\theta_1, \ldots, \theta_k)$ is a vector of $k$ real-valued parameters, then the M.L.E. vector $(\hat{\theta}_1, \ldots, \hat{\theta}_k)$ will depend on the observations $X_1, \ldots, X_n$ only through the functions in a set of jointly sufficient statistics. If the vector of the estimators $\hat{\theta}_1, \ldots, \hat{\theta}_k$ is a set of jointly sufficient statistics, then they are minimal jointly sufficient statistics because they are functions of every set of jointly sufficient statistics.
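The claim of Theorem 7.8.3 can be checked numerically in a simple case. The sketch below is an illustration only, not part of the original text: it assumes a Poisson model with mean $\theta$, for which $T = \sum_i X_i$ is sufficient (Example 7.7.2), and it uses a hypothetical helper `grid_mle` that maximizes the log-likelihood over a fixed grid. Two made-up samples of the same size with the same value of $T$ should then produce the same maximizer.

```python
import math

def poisson_log_likelihood(theta, xs):
    # log f_n(x | theta) for an i.i.d. Poisson(theta) sample
    return sum(-theta + x * math.log(theta) - math.lgamma(x + 1) for x in xs)

def grid_mle(xs, grid):
    # Hypothetical helper: grid value with the largest log-likelihood
    return max(grid, key=lambda theta: poisson_log_likelihood(theta, xs))

# Grid of candidate values 0.001, 0.002, ..., 10.000 (an arbitrary choice)
grid = [k / 1000 for k in range(1, 10001)]

# Two illustrative samples with n = 3 and the same sufficient statistic T = 6
sample_a = [0, 3, 3]
sample_b = [2, 2, 2]

mle_a = grid_mle(sample_a, grid)
mle_b = grid_mle(sample_b, grid)
print(mle_a, mle_b)
```

The two log-likelihoods differ only by the term coming from $u(\boldsymbol{x})$, which does not involve $\theta$, so the location of the maximum is determined by $T$ and $n$ alone.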
Example 7.8.6 Minimal Jointly Sufficient Statistics for the Parameters of a Normal Distribution. Suppose that $X_1, \ldots, X_n$ form a random sample from a normal distribution for which both the mean $\mu$ and the variance $\sigma^2$ are unknown. It was shown in Example 7.5.6 that the M.L.E.'s $\hat{\mu}$ and $\hat{\sigma}^2$ are the sample mean and the sample variance. Also, it was shown in Example 7.8.3 that $\hat{\mu}$ and $\hat{\sigma}^2$ are jointly sufficient statistics. Hence, $\hat{\mu}$ and $\hat{\sigma}^2$ are minimal jointly sufficient statistics.

The statistician in Example 7.8.6 can restrict the search for good estimators of $\mu$ and $\sigma^2$ to functions of minimal jointly sufficient statistics. It follows, therefore, from Example 7.8.6 that if the M.L.E.'s $\hat{\mu}$ and $\hat{\sigma}^2$ themselves are not used as estimators of $\mu$ and $\sigma^2$, the only other estimators that need to be considered are functions of $\hat{\mu}$ and $\hat{\sigma}^2$.

The results above concerning M.L.E.'s also pertain to Bayes estimators.
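As a brief numerical aside on Example 7.8.6, the following sketch computes the normal M.L.E.'s twice: once from the full data and once from the two sums $\sum x_i$ and $\sum x_i^2$ alone. The function names and the data values are illustrative assumptions, and the sample variance is taken with divisor $n$, as is usual for the M.L.E. The pair of sums is a one-to-one transformation of $(\hat{\mu}, \hat{\sigma}^2)$ for fixed $n$, so both routes should agree.

```python
import math

def normal_mle_from_data(xs):
    # M.L.E.'s computed directly from the observations
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n  # divisor n, not n - 1
    return mu_hat, sigma2_hat

def normal_mle_from_sums(n, sum_x, sum_x2):
    # Same M.L.E.'s, using only n, sum(x_i), and sum(x_i^2)
    mu_hat = sum_x / n
    sigma2_hat = sum_x2 / n - mu_hat ** 2
    return mu_hat, sigma2_hat

# Made-up observations, for illustration only
xs = [2.1, 3.4, 1.9, 4.0, 2.6]

direct = normal_mle_from_data(xs)
via_sums = normal_mle_from_sums(len(xs), sum(xs), sum(x * x for x in xs))
print(direct, via_sums)
```

Any other data set of the same size with the same two sums would give the same pair of estimates, which is the practical content of restricting attention to functions of $\hat{\mu}$ and $\hat{\sigma}^2$.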
Theorem 7.8.4 Bayes Estimator and Sufficient Statistics. Let $T = r(\boldsymbol{X})$ be a sufficient statistic for $\theta$. Then every Bayes estimator $\hat{\theta}$ of $\theta$ depends on the observations $X_1, \ldots, X_n$ only through the statistic $T$. Furthermore, if $\hat{\theta}$ is itself sufficient, then it is minimal sufficient.

Proof Let the prior p.d.f. or p.f. of $\theta$ be $\xi(\theta)$. It follows from relation (7.2.10) and the factorization criterion that the posterior p.d.f. $\xi(\theta|\boldsymbol{x})$ will satisfy the following relation:

$$\xi(\theta|\boldsymbol{x}) \propto v[r(\boldsymbol{x}), \theta]\, \xi(\theta).$$

It can be seen from this relation that the posterior p.d.f. of $\theta$ will depend on the observed vector $\boldsymbol{x}$ only through the value of $r(\boldsymbol{x})$. Since the Bayes estimator of $\theta$ with respect to a specified loss function is calculated from this posterior p.d.f., the estimator also will depend on the observed vector $\boldsymbol{x}$ only through the value of $r(\boldsymbol{x})$. In other words, the Bayes estimator is a function of $T = r(\boldsymbol{X})$.
Since the Bayes estimator $\hat{\theta}$ is itself a statistic and is a function of every sufficient statistic $T$, if $\hat{\theta}$ is also sufficient, then it is minimal sufficient.

Theorem 7.8.4 also extends to vector parameters and jointly sufficient statistics.

Summary

Statistics $T_1 = r_1(\boldsymbol{X}), \ldots, T_k = r_k(\boldsymbol{X})$ are jointly sufficient if and only if the joint p.f. or p.d.f. can be factored as $f_n(\boldsymbol{x}|\theta) = u(\boldsymbol{x})\, v[r_1(\boldsymbol{x}), \ldots, r_k(\boldsymbol{x}), \theta]$, for some functions $u$ and $v$. It is clear from this factorization that the original data $X_1, \ldots, X_n$ are jointly sufficient. In order to be useful, a sufficient statistic should be a simpler function than the entire data. A minimal sufficient statistic is the simplest function that is still sufficient; that is, it is a sufficient statistic that is a function of every sufficient statistic. Since the likelihood function is a function of every sufficient statistic, according to the factorization criterion, a sufficient statistic that can be determined from the likelihood function is minimal sufficient. In particular, if an M.L.E. or Bayes estimator is sufficient, then it is minimal sufficient.

Exercises

Instructions for Exercises 1 to 4: In each exercise, assume that the random variables $X_1, \ldots, X_n$ form a random sample of size $n$ from the distribution specified in the exercise, and show that the statistics $T_1$ and $T_2$ specified in the exercise are jointly sufficient statistics.

1. A gamma distribution for which both parameters $\alpha$ and $\beta$ are unknown ($\alpha > 0$ and $\beta > 0$); $T_1 = \prod_{i=1}^{n} X_i$ and $T_2 = \sum_{i=1}^{n} X_i$.

2. A beta distribution for which both parameters $\alpha$ and $\beta$ are unknown ($\alpha > 0$ and $\beta > 0$); $T_1 = \prod_{i=1}^{n} X_i$ and $T_2 = \prod_{i=1}^{n} (1 - X_i)$.

3. A Pareto distribution (see Exercise 16 of Sec. 5.7) for which both parameters $x_0$ and $\alpha$ are unknown ($x_0 > 0$ and $\alpha > 0$); $T_1 = \min\{X_1, \ldots, X_n\}$ and $T_2 = \prod_{i=1}^{n} X_i$.

4. The uniform distribution on the interval $[\theta, \theta + 3]$, where the value of $\theta$ is unknown ($-\infty < \theta < \infty$); $T_1 = \min\{X_1, \ldots, X_n\}$ and $T_2 = \max\{X_1, \ldots, X_n\}$.

5. Suppose that the vectors $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ form a random sample of two-dimensional vectors from a bivariate normal distribution for which the means, the variances, and the correlation are unknown. Show that the following five statistics are jointly sufficient: $\sum_{i=1}^{n} X_i$, $\sum_{i=1}^{n} Y_i$, $\sum_{i=1}^{n} X_i^2$, $\sum_{i=1}^{n} Y_i^2$, and $\sum_{i=1}^{n} X_i Y_i$.

6. Consider a distribution for which the p.d.f. or the p.f. is $f(x|\theta)$, where the parameter $\theta$ is a $k$-dimensional vector belonging to some parameter space $\Omega$. It is said that the family of distributions indexed by the values of $\theta$ in $\Omega$ is a $k$-parameter exponential family, or a $k$-parameter Koopman-Darmois family, if $f(x|\theta)$ can be written as follows for $\theta \in \Omega$ and all values of $x$:

$$f(x|\theta) = a(\theta)\, b(x) \exp\left[\sum_{i=1}^{k} c_i(\theta)\, d_i(x)\right].$$

Here, $a$ and $c_1, \ldots, c_k$ are arbitrary functions of $\theta$, and $b$ and $d_1, \ldots, d_k$ are arbitrary functions of $x$. Suppose now that $X_1, \ldots, X_n$ form a random sample from a distribution which belongs to a $k$-parameter exponential family of this type, and define the $k$ statistics $T_1, \ldots, T_k$ as follows:

$$T_i = \sum_{j=1}^{n} d_i(X_j) \quad \text{for } i = 1, \ldots, k.$$

Show that the statistics $T_1, \ldots, T_k$ are jointly sufficient statistics for $\theta$.

7. Show that each of the following families of distributions is a two-parameter exponential family as defined in Exercise 6:
a. The family of all normal distributions for which both the mean and the variance are unknown
b. The family of all gamma distributions for which both $\alpha$ and $\beta$ are unknown
c. The family of all beta distributions for which both $\alpha$ and $\beta$ are unknown

8. Suppose that $X_1, \ldots, X_n$ form a random sample from an exponential distribution for which the value of the parameter $\beta$ is unknown ($\beta > 0$). Is the M.L.E. of $\beta$ a minimal sufficient statistic?

9. Suppose that $X_1, \ldots, X_n$ form a random sample from the Bernoulli distribution with parameter $p$, which is unknown ($0 \le p \le 1$). Is the M.L.E. of $p$ a minimal sufficient statistic?

10. Suppose that $X_1, \ldots, X_n$ form a random sample from the uniform distribution on the interval $[0, \theta]$, where the value of $\theta$ is unknown ($\theta > 0$). Is the M.L.E. of $\theta$ a minimal sufficient statistic?

11. Suppose that $X_1, \ldots, X_n$ form a random sample from a Cauchy distribution centered at an unknown point $\theta$ ($-\infty < \theta < \infty$). Is the M.L.E. of $\theta$ a minimal sufficient statistic?

12. Suppose that $X_1, \ldots, X_n$ form a random sample from a distribution for which the p.d.f. is as follows:

$$f(x|\theta) = \begin{cases} \dfrac{2x}{\theta^2} & \text{for } 0 \le x \le \theta, \\ 0 & \text{otherwise.} \end{cases}$$

Here, the value of the parameter $\theta$ is unknown ($\theta > 0$). Determine the M.L.E. of the median of this distribution, and show that this estimator is a minimal sufficient statistic for $\theta$.

13. Suppose that $X_1, \ldots, X_n$ form a random sample from the uniform distribution on the interval $[a, b]$, where both endpoints $a$ and $b$ are unknown. Are the M.L.E.'s of $a$ and $b$ minimal jointly sufficient statistics?

14. For the conditions of Exercise 5, the M.L.E.'s of the means, the variances, and the correlation are given in Exercise 24 of Sec. 7.6. Are these five estimators minimal jointly sufficient statistics?

15. Suppose that $X_1, \ldots, X_n$ form a random sample from the Bernoulli distribution with parameter $p$, which is unknown, and that the prior distribution of $p$ is a certain specified beta distribution. Is the Bayes estimator of $p$ with respect to the squared error loss function a minimal sufficient statistic?

16. Suppose that $X_1, \ldots, X_n$ form a random sample from a Poisson distribution for which the value of the mean $\lambda$ is unknown, and that the prior distribution of $\lambda$ is a certain specified gamma distribution. Is the Bayes estimator of $\lambda$ with respect to the squared error loss function a minimal sufficient statistic?

17. Suppose that $X_1, \ldots, X_n$ form a random sample from a normal distribution for which the value of the mean $\mu$ is unknown and the value of the variance is known, and the prior distribution of $\mu$ is a certain specified normal distribution. Is the Bayes estimator of $\mu$ with respect to the squared error loss function a minimal sufficient statistic?

7.9 Improving an Estimator

In this section, we show how to improve upon an estimator that is not a function of a sufficient statistic by using an estimator that is a function of a sufficient statistic.

The Mean Squared Error of an Estimator

Example 7.9.1 Customer Arrivals. A store owner is interested in the probability $p$ that exactly one customer will arrive during a typical hour. She models customer arrivals as a Poisson process with rate $\theta$ per hour and observes how many customers arrive during each of $n$ hours, $X_1, \ldots, X_n$. She converts each $X_i$ to $Y_i = 1$ if $X_i = 1$ and $Y_i = 0$ if $X_i \ne 1$. Then $Y_1, \ldots, Y_n$ is a random sample from the Bernoulli distribution with parameter $p$. The store owner then estimates $p$ by $\delta(\boldsymbol{X}) = \sum_{i=1}^{n} Y_i / n$. Is this a good estimator? In particular, if the store owner wants to minimize mean squared error, is there another estimator that we can show is better?

In general, suppose that $\boldsymbol{X} = (X_1, \ldots, X_n)$ form a random sample from a distribution for which the p.d.f. or the p.f. is $f(x|\theta)$, where the parameter $\theta$ must belong to some parameter space $\Omega$. In this section, $\theta$ can be a one-dimensional parameter or a vector of parameters. For each random variable $Z = g(X_1, \ldots, X_n)$, we shall let $E_\theta(Z)$ denote the expectation of $Z$ calculated with respect to the joint p.d.f. or joint p.f. $f_n(\boldsymbol{x}|\theta)$. If we were thinking of $\theta$ as a random variable, then $E_\theta(Z) = E(Z|\theta)$. For example, if $f_n(\boldsymbol{x}|\theta)$ is a p.d.f., then

$$E_\theta(Z) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(\boldsymbol{x})\, f_n(\boldsymbol{x}|\theta)\, dx_1 \cdots dx_n.$$

We shall suppose that the value of $\theta$ is unknown and that we want to estimate some function $h(\theta)$. If $\theta$ is a vector, $h(\theta)$ might be one of the coordinates or a function of all coordinates, and so on. We shall assume that the squared error loss function is to be used. Also, for each given estimator $\delta(\boldsymbol{X})$ and every given value of $\theta \in \Omega$, we shall let $R(\theta, \delta)$ denote the M.S.E. of $\delta$ calculated with respect to the given value of $\theta$. Thus,

$$R(\theta, \delta) = E_\theta\left([\delta(\boldsymbol{X}) - h(\theta)]^2\right). \qquad (7.9.1)$$

If we do not assign a prior distribution to $\theta$, then it is desired to find an estimator $\delta$ for which the M.S.E. $R(\theta, \delta)$ is small for every value of $\theta \in \Omega$ or, at least, for a wide range of values of $\theta$.

Suppose now that $\boldsymbol{T}$ is a vector of jointly sufficient statistics for $\theta$. In the remainder of this section we shall refer to $\boldsymbol{T}$ simply as the sufficient statistic. If $\boldsymbol{T}$ is one-dimensional, just pretend that we wrote it as $T$. Consider a statistician A who plans to use a particular estimator $\delta(\boldsymbol{X})$. In Sec. 7.7 we remarked that another statistician B who learns only the value of the sufficient statistic $\boldsymbol{T}$ can generate, by means of an auxiliary randomization, an estimator that will have exactly the same distribution as $\delta(\boldsymbol{X})$ and, in particular, will have the same mean squared error as $\delta(\boldsymbol{X})$ for every value of $\theta \in \Omega$. We shall now show that even without using an auxiliary randomization, statistician B can find an estimator $\delta_0$ that depends on the observations $\boldsymbol{X}$ only through the sufficient statistic $\boldsymbol{T}$ and is at least as good an estimator as $\delta$ in the sense that $R(\theta, \delta_0) \le R(\theta, \delta)$, for every value of $\theta \in \Omega$.

Conditional Expectation When a Sufficient Statistic Is Known

We shall define the estimator $\delta_0(\boldsymbol{T})$ by the following conditional expectation:

$$\delta_0(\boldsymbol{T}) = E_\theta[\delta(\boldsymbol{X})|\boldsymbol{T}]. \qquad (7.9.2)$$

Since $\boldsymbol{T}$ is a sufficient statistic, the conditional joint distribution of $X_1, \ldots, X_n$ for each given value of $\boldsymbol{T}$ is the same for every value of $\theta \in \Omega$. Therefore, for any given value of $\boldsymbol{T}$, the conditional expectation of the function $\delta(\boldsymbol{X})$ will be the same for every value of $\theta \in \Omega$. It follows that the conditional expectation in Eq. (7.9.2) will depend on the value of $\boldsymbol{T}$ but will not actually depend on the value of $\theta$. In other words, the function $\delta_0(\boldsymbol{T})$ is indeed an estimator of $\theta$ because it depends only on the observations $\boldsymbol{X}$ and does not depend on the unknown value of $\theta$.
For this reason, we can omit the subscript $\theta$ on the expectation symbol $E$ in Eq. (7.9.2), and we can write the relation as follows:

$$\delta_0(\boldsymbol{T}) = E[\delta(\boldsymbol{X})|\boldsymbol{T}]. \qquad (7.9.3)$$

We can now prove the following theorem, which was established independently by D. Blackwell and C. R. Rao in the late 1940s.

Theorem 7.9.1 Let $\delta(\boldsymbol{X})$ be an estimator, let $\boldsymbol{T}$ be a sufficient statistic for $\theta$, and let the estimator $\delta_0(\boldsymbol{T})$ be defined as in Eq. (7.9.3). Then for every value of $\theta \in \Omega$,

$$R(\theta, \delta_0) \le R(\theta, \delta). \qquad (7.9.4)$$

Furthermore, if $R(\theta, \delta) < \infty$, there is strict inequality in (7.9.4) unless $\delta(\boldsymbol{X})$ is a function of $\boldsymbol{T}$.

Proof If the M.S.E. $R(\theta, \delta)$ is infinite for a given value of $\theta \in \Omega$, then the relation (7.9.4) is automatically satisfied. We shall assume, therefore, that $R(\theta, \delta) < \infty$. It follows from part (a) of Exercise 4 in Sec. 4.4 that

$$E_\theta\left([\delta(\boldsymbol{X}) - \theta]^2\right) \ge \left(E_\theta[\delta(\boldsymbol{X})] - \theta\right)^2,$$

and it can be shown that this same relationship must also hold if the expectations are replaced by conditional expectations given $\boldsymbol{T}$.
Therefore,

$$E_\theta\left([\delta(\boldsymbol{X}) - \theta]^2 \,\middle|\, \boldsymbol{T}\right) \ge \left(E_\theta[\delta(\boldsymbol{X})|\boldsymbol{T}] - \theta\right)^2 = [\delta_0(\boldsymbol{T}) - \theta]^2. \qquad (7.9.5)$$

It now follows from relation (7.9.5) that

$$R(\theta, \delta_0) = E_\theta\left[\{\delta_0(\boldsymbol{T}) - \theta\}^2\right] \le E_\theta\left\{E_\theta\left[\{\delta(\boldsymbol{X}) - \theta\}^2 \,\middle|\, \boldsymbol{T}\right]\right\} = E_\theta\left[\{\delta(\boldsymbol{X}) - \theta\}^2\right] = R(\theta, \delta),$$

where the next-to-last equality follows from Theorem 4.7.1, the law of total probability for expectations. Hence, $R(\theta, \delta_0) \le R(\theta, \delta)$ for every value of $\theta \in \Omega$.

Finally, suppose that $R(\theta, \delta) < \infty$ and that $\delta(\boldsymbol{X})$ is not a function of $\boldsymbol{T}$. That is, there is no function $g(\boldsymbol{T})$ such that $\Pr(\delta(\boldsymbol{X}) = g(\boldsymbol{T})\,|\,\boldsymbol{T}) = 1$. Then part (b) of Exercise 4 in Sec. 4.4 (conditional on $\boldsymbol{T}$) says that there is strict inequality in (7.9.4).

Example 7.9.2 Customer Arrivals. Return now to Example 7.9.1. Let $\theta$ stand for the rate of customer arrivals in units per hour. Then $\boldsymbol{X}$ forms a random sample from the Poisson distribution with mean $\theta$. Example 7.7.2 shows us that a sufficient statistic is $T = \sum_{i=1}^{n} X_i$. The distribution of $T$ is the Poisson distribution with mean $n\theta$. We shall now compute

$$\delta_0(T) = E[\delta(\boldsymbol{X})|T],$$
(Recall that $Y_i = 1$ if $X_i = 1$ and $Y_i = 0$ if $X_i \ne 1$, so that $\delta(X)$ is the proportion of hours in which exactly one customer arrives.) For each $i$ and each possible value $t$ of $T$, it is easy to see that

\[ E(Y_i \mid T = t) = \Pr(X_i = 1 \mid T = t) = \frac{\Pr(X_i = 1, T = t)}{\Pr(T = t)} = \frac{\Pr\big(X_i = 1, \sum_{j \ne i} X_j = t - 1\big)}{\Pr(T = t)}. \]

For $t = 0$, $\Pr(X_i = 1 \mid T = 0) = 0$ trivially. For $t > 0$, we see that

\[ \Pr(T = t) = \frac{e^{-n\theta}(n\theta)^t}{t!}, \]

\[ \Pr\Big(X_i = 1, \sum_{j \ne i} X_j = t - 1\Big) = e^{-\theta}\theta \times \frac{e^{-[n-1]\theta}([n-1]\theta)^{t-1}}{(t-1)!} = \frac{e^{-n\theta}[n-1]^{t-1}\theta^t}{(t-1)!}. \]

The ratio of these two probabilities is

\[ E(Y_i \mid T = t) = \frac{t}{n}\left(1 - \frac{1}{n}\right)^{t-1}. \tag{7.9.6} \]

It follows that

\[ \delta_0(t) = E[\delta(X) \mid T = t] = E\left[\frac{1}{n}\sum_{i=1}^n Y_i \,\Big|\, T = t\right] = \frac{1}{n}\sum_{i=1}^n E(Y_i \mid T = t). \]

According to Eq. (7.9.6), all $E(Y_i \mid T = t)$ are the same, so $\delta_0(t)$ is the right-hand side of Eq. (7.9.6). That $\delta_0(T)$ is better than $\delta(X)$ under squared error loss follows from Theorem 7.9.1.

A result similar to Theorem 7.9.1 holds if $R(\theta, \delta)$ is defined as the M.A.E. of an estimator for a given value of $\theta \in \Omega$ instead of the M.S.E. of $\delta$.
In other words, suppose that $R(\theta, \delta)$ is defined as follows:

\[ R(\theta, \delta) = E_\theta(|\delta(X) - \theta|). \tag{7.9.7} \]

Then it can be shown (see Exercise 10 at the end of this section) that Theorem 7.9.1 is still true.

Definition 7.9.1 Inadmissible/Admissible/Dominates. Suppose that $R(\theta, \delta)$ is defined by either Eq. (7.9.1) or Eq. (7.9.7). It is said that an estimator $\delta$ is inadmissible if there exists another estimator $\delta_0$ such that $R(\theta, \delta_0) \le R(\theta, \delta)$ for every value of $\theta \in \Omega$ and there is strict inequality in this relation for at least one value of $\theta \in \Omega$. Under these conditions, it is also said that the estimator $\delta_0$ dominates the estimator $\delta$. An estimator $\delta_0$ is admissible if there is no other estimator that dominates $\delta_0$.

In the terminology of Definition 7.9.1, Theorem 7.9.1 can be summarized as follows: An estimator $\delta$ that is not a function of the sufficient statistic $T$ alone must be inadmissible. Theorem 7.9.1 also explicitly identifies an estimator $\delta_0 = E(\delta(X) \mid T)$ that dominates $\delta$.
However, this part of the theorem is somewhat less useful in a practical problem, because it is usually very difficult to calculate the conditional expectation $E(\delta(X) \mid T)$. Theorem 7.9.1 is valuable mainly because it provides further strong evidence that we can restrict our search for a good estimator of $\theta$ to those estimators that depend on the observations only through a sufficient statistic.

Example 7.9.3 Estimating the Mean of a Normal Distribution. Suppose that $X_1, \ldots, X_n$ form a random sample from a normal distribution for which the mean $\mu$ is unknown and the variance is known, and let $Y_1 < \cdots < Y_n$ denote the order statistics of the sample, as defined in Sec. 7.8. If $n$ is an odd number, then the middle observation $Y_{(n+1)/2}$ is called the sample median. If $n$ is an even number, then each value between the two middle observations $Y_{n/2}$ and $Y_{(n/2)+1}$ is a sample median, but the particular value $\frac{1}{2}[Y_{n/2} + Y_{(n/2)+1}]$ is often referred to as the sample median.
Since the normal distribution from which the sample is drawn is symmetric with respect to the point $\mu$, the median of the normal distribution is $\mu$. Therefore, we might consider the use of the sample median, or a simple function of the sample median, as an estimator of $\mu$. However, it was shown in Example 7.7.4 that the sample mean $\bar{X}_n$ is a sufficient statistic for $\mu$. It follows from Theorem 7.9.1 that every function of the sample median that might be used as an estimator of $\mu$ will be dominated by some other function of $\bar{X}_n$. In searching for an estimator of $\mu$, we need consider only functions of $\bar{X}_n$.

Example 7.9.4 Estimating the Standard Deviation of a Normal Distribution. Suppose that $X_1, \ldots, X_n$ form a random sample from a normal distribution for which both the mean $\mu$ and the variance $\sigma^2$ are unknown, and again let $Y_1 < \cdots < Y_n$ denote the order statistics of the sample.
The difference $Y_n - Y_1$ is called the range of the sample, and we might consider using some simple function of the range as an estimator of the standard deviation $\sigma$. However, it was shown in Example 7.8.2 that the statistics $\sum_{i=1}^n X_i$ and $\sum_{i=1}^n X_i^2$ are jointly sufficient for the parameters $\mu$ and $\sigma^2$. Therefore, every function of the range that might be used as an estimator of $\sigma$ will be dominated by a function of $\sum_{i=1}^n X_i$ and $\sum_{i=1}^n X_i^2$.

Example 7.9.5 Failure Times of Ball Bearings. Suppose that we wish to estimate the mean failure time of the ball bearings described in Example 5.6.9 based on the sample of 23 observed failure times. Let $Y_1, \ldots, Y_{23}$ be the observed failure times (not the logarithms). We might consider using the average $\bar{Y}_n = \frac{1}{23}\sum_{i=1}^{23} Y_i$ as an estimator. Suppose that we continue to model the logarithms $X_i = \log(Y_i)$ as normal random variables with mean $\theta$ and variance 0.25. Then $Y_i$ has the lognormal distribution with parameters $\theta$ and 0.25. From Eq. (5.6.15), the mean of $Y_i$ is $\exp(\theta + 0.125)$, the mean failure time.
However, we know that $\bar{X}_n$ is sufficient. Since $\bar{Y}_n$ is not a function of $\bar{X}_n$, there is a function of $\bar{X}_n$ that improves on $\bar{Y}_n$ as an estimator of the mean failure time. We can actually find which function that is. First, write

\[ E(\bar{Y}_n \mid \bar{X}_n) = \frac{1}{n}\sum_{i=1}^n E(Y_i \mid \bar{X}_n). \tag{7.9.8} \]

In Exercise 15 of Sec. 5.10, you proved that the conditional distribution of $X_i$ given $\bar{X}_n = \bar{x}_n$ is the normal distribution with mean $\bar{x}_n$ and variance $0.25(1 - 1/n)$ for every $i$. It follows that, for each $i$, the conditional distribution of $Y_i$ given $\bar{X}_n$ is the lognormal distribution with parameters $\bar{X}_n$ and $0.25(1 - 1/n)$. Hence, it follows from Eq. (5.6.15) that the conditional mean of $Y_i$ given $\bar{X}_n$ is $\exp[\bar{X}_n + 0.125(1 - 1/n)]$ for all $i$, and Eq. (7.9.8) equals $\exp[\bar{X}_n + 0.125(1 - 1/n)]$ as well.

Limitation of the Use of Sufficient Statistics

When the foregoing theory of sufficient statistics is applied in a statistical problem, it is important to keep in mind the following limitation.
The existence and the form of a sufficient statistic in a particular problem depend critically on the form of the function assumed for the p.d.f. or the p.f. A statistic that is a sufficient statistic when it is assumed that the p.d.f. is $f(x \mid \theta)$ may not be a sufficient statistic when it is assumed that the p.d.f. is $g(x \mid \theta)$, even though $g(x \mid \theta)$ may be quite similar to $f(x \mid \theta)$ for every value of $\theta \in \Omega$. Suppose that a statistician is in doubt about the exact form of the p.d.f. in a specific problem but assumes for convenience that the p.d.f. is $f(x \mid \theta)$; suppose also that the statistic $T$ is a sufficient statistic under this assumption. Because of the statistician's uncertainty about the exact form of the p.d.f., he may wish to use an estimator of $\theta$ that performs reasonably well for a wide variety of possible p.d.f.'s, even though the selected estimator may not meet the requirement that it should depend on the observations only through the statistic $T$.

An estimator that performs reasonably well for a wide variety of possible p.d.f.'s, even though it may not necessarily be the best available estimator for any particular family of p.d.f.'s, is often called a robust estimator. We shall consider robust estimators further in Chapter 10.

The preceding discussion also raises another useful point to keep in mind. In Sec.
7.2, we introduced sensitivity analysis as a way to study the effect of the choice of prior distribution on an inference. The same idea can be applied to any feature of a statistical model that is chosen by a statistician. In particular, the distribution for the observations given the parameters, as defined through $f(x \mid \theta)$, is often chosen for convenience rather than through a careful analysis. One can perform an inference repeatedly using different distributions for the observable data. The comparison of the resulting inferences from each choice is another form of sensitivity analysis.

Summary

Suppose that $T$ is a sufficient statistic, and we are trying to estimate a parameter with squared error loss. Suppose that an estimator $\delta(X)$ is not a function of $T$. Then $\delta$ can be improved by using $\delta_0(T)$, the conditional mean of $\delta(X)$ given $T$. Because $\delta_0(T)$ has the same mean as $\delta(X)$ and its variance is no larger, it follows that $\delta_0(T)$ has M.S.E. that is no larger than that of $\delta(X)$.

Exercises

1. Suppose that the random variables $X_1, \ldots, X_n$ form a random sample of size $n$ ($n \ge 2$) from the normal distribution with mean 0 and unknown variance $\theta$. Suppose also that for every estimator $\delta(X_1, \ldots, X_n)$, the M.S.E. $R(\theta, \delta)$ is defined by Eq. (7.9.1). Explain why the sample variance is an inadmissible estimator of $\theta$.

2. Suppose that the random variables $X_1, \ldots, X_n$ form a random sample of size $n$ ($n \ge 2$) from the uniform distribution on the interval $[0, \theta]$, where the value of the parameter $\theta$ is unknown ($\theta > 0$) and must be estimated. Suppose also that for every estimator $\delta(X_1, \ldots, X_n)$, the M.S.E. $R(\theta, \delta)$ is defined by Eq. (7.9.1). Explain why the estimator $\delta_1(X_1, \ldots, X_n) = 2\bar{X}_n$ is inadmissible.

3. Consider again the conditions of Exercise 2, and let the estimator $\delta_1$ be as defined in that exercise. Determine the value of the M.S.E. $R(\theta, \delta_1)$ for $\theta > 0$.

4. Consider again the conditions of Exercise 2. Let $Y_n = \max\{X_1, \ldots, X_n\}$ and consider the estimator $\delta_2(X_1, \ldots, X_n) = Y_n$.
a. Determine the M.S.E. $R(\theta, \delta_2)$ for $\theta > 0$.
b. Show that for $n = 2$, $R(\theta, \delta_2) = R(\theta, \delta_1)$ for $\theta > 0$.
c. Show that for $n \ge 3$, the estimator $\delta_2$ dominates the estimator $\delta_1$.

5. Consider again the conditions of Exercises 2 and 4. Show that there exists a constant $c^*$ such that the estimator $c^* Y_n$ dominates every other estimator having the form $cY_n$ for $c \ne c^*$.

6. Suppose that $X_1, \ldots, X_n$ form a random sample of size $n$ ($n \ge 2$) from the gamma distribution with parameters $\alpha$ and $\beta$, where the value of $\alpha$ is unknown ($\alpha > 0$) and the value of $\beta$ is known. Explain why $\bar{X}_n$ is an inadmissible estimator of the mean of this distribution when the squared error loss function is used.

7. Suppose that $X_1, \ldots, X_n$ form a random sample from an exponential distribution for which the value of the parameter $\beta$ is unknown ($\beta > 0$) and must be estimated by using the squared error loss function. Let $\delta$ be the estimator such that $\delta(X_1, \ldots, X_n) = 3$ for all possible values of $X_1, \ldots, X_n$.
a. Determine the value of the M.S.E. $R(\beta, \delta)$ for $\beta > 0$.
b. Explain why the estimator $\delta$ must be admissible.
8. Suppose that a random sample of $n$ observations is taken from a Poisson distribution for which the value of the mean $\theta$ is unknown ($\theta > 0$), and the value of $\beta = e^{-\theta}$ must be estimated by using the squared error loss function. Since $\beta$ is equal to the probability that an observation from this Poisson distribution will have the value 0, a natural estimator of $\beta$ is the proportion $\hat{\beta}$ of observations in the random sample that have the value 0. Explain why $\hat{\beta}$ is an inadmissible estimator of $\beta$.

9. For every random variable $X$, show that $|E(X)| \le E(|X|)$.

10. Let $X_1, \ldots, X_n$ form a random sample from a distribution for which the p.d.f. or the p.f. is $f(x \mid \theta)$, where $\theta \in \Omega$. Suppose that the value of $\theta$ must be estimated, and that $T$ is a sufficient statistic for $\theta$. Let $\delta$ be an arbitrary estimator of $\theta$, and let $\delta_0$ be another estimator defined by the relation $\delta_0 = E(\delta \mid T)$. Show that for every value of $\theta \in \Omega$,

\[ E_\theta(|\delta_0 - \theta|) \le E_\theta(|\delta - \theta|). \]

11. Suppose that the variables $X_1, \ldots, X_n$ form a random sample from a distribution for which the p.d.f. or the p.f. is $f(x \mid \theta)$, where $\theta \in \Omega$, and let $\hat{\theta}$ denote the M.L.E. of $\theta$. Suppose also that the statistic $T$ is a sufficient statistic for $\theta$, and let the estimator $\delta_0$ be defined by the relation $\delta_0 = E(\hat{\theta} \mid T)$. Compare the estimators $\hat{\theta}$ and $\delta_0$.

12. Suppose that $X_1, \ldots, X_n$ form a sequence of $n$ Bernoulli trials for which the probability $p$ of success on any given trial is unknown ($0 \le p \le 1$), and let $T = \sum_{i=1}^n X_i$. Determine the form of the estimator $E(X_1 \mid T)$.

13. Suppose that $X_1, \ldots, X_n$ form a random sample from a Poisson distribution for which the value of the mean $\theta$ is unknown ($\theta > 0$). Let $T = \sum_{i=1}^n X_i$, and for $i = 1, \ldots, n$, let the statistic $Y_i$ be defined as follows:

\[ Y_i = \begin{cases} 1 & \text{if } X_i = 0, \\ 0 & \text{if } X_i > 0. \end{cases} \]

Determine the form of the estimator $E(Y_i \mid T)$.

14. Consider again the conditions of Exercise 8. Determine the form of the estimator $E(\hat{\beta} \mid T)$. You may wish to use results obtained while solving Exercise 13.

15. Find the M.L.E. of $\exp(\theta + 0.125)$ in Example 7.9.5. Both the M.L.E. and the estimator in Example 7.9.5 have the form $\exp(\bar{X}_n + c)$ for some constant $c$. Find the value $c$ so that the estimator $\exp(\bar{X}_n + c)$ has the smallest possible M.S.E.

16. In Example 7.9.1, find the formula for $p$ in terms of $\theta$, the mean of each $X_i$. Also find the M.L.E. of $p$ and show that the estimator $\delta_0(T)$ in Example 7.9.2 is nearly the same as the M.L.E. if $n$ is large.
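The improvement obtained in Example 7.9.2 can be checked numerically. The Python sketch below is an illustration and not part of the original text. It assumes that the quantity being estimated is $p = \Pr(X_i = 1) = \theta e^{-\theta}$, the Poisson probability of exactly one arrival (as Exercise 16 suggests), and the function names `poisson_pmf`, `delta0`, and `exact_risks` are hypothetical. Because $n\delta(X) = \sum Y_i$ has the binomial distribution with parameters $n$ and $p$, the M.S.E. of $\delta$ is $p(1-p)/n$; the M.S.E. of $\delta_0(T)$ from Eq. (7.9.6) is approximated by summing over the Poisson distribution of $T$, truncated far into the upper tail.

```python
import math


def poisson_pmf(k, mean):
    """Pr(T = k) for a Poisson variable with the given mean (log space for stability)."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def delta0(t, n):
    """Improved estimator from Eq. (7.9.6): (t/n) * (1 - 1/n)**(t - 1), and 0 when t = 0."""
    if t == 0:
        return 0.0
    return (t / n) * (1.0 - 1.0 / n) ** (t - 1)


def exact_risks(theta, n, t_max=None):
    """Return (p, MSE of delta, mean of delta0, MSE of delta0).

    Assumption (not stated in this section): the target is p = theta * exp(-theta).
    """
    p = theta * math.exp(-theta)       # Pr(X_i = 1) under the Poisson model
    mse_delta = p * (1.0 - p) / n      # delta is unbiased; n * delta ~ Binomial(n, p)

    mean_t = n * theta                 # T ~ Poisson(n * theta)
    if t_max is None:
        # Truncation point chosen well beyond the bulk of the Poisson mass.
        t_max = int(mean_t + 12.0 * math.sqrt(mean_t) + 50)

    mean_delta0 = 0.0
    mse_delta0 = 0.0
    for t in range(t_max + 1):
        weight = poisson_pmf(t, mean_t)
        value = delta0(t, n)
        mean_delta0 += weight * value
        mse_delta0 += weight * (value - p) ** 2
    return p, mse_delta, mean_delta0, mse_delta0


if __name__ == "__main__":
    p, mse_delta, mean_delta0, mse_delta0 = exact_risks(theta=2.0, n=10)
    print("target p           :", p)
    print("mean of delta0(T)  :", mean_delta0)
    print("M.S.E. of delta(X) :", mse_delta)
    print("M.S.E. of delta0(T):", mse_delta0)
```

Under these assumptions, the computed mean of $\delta_0(T)$ should agree with $p$, and its M.S.E. should come out smaller than that of $\delta(X)$, in line with Theorem 7.9.1. The truncation point `t_max` is a convenience of this sketch; the exact risk is an infinite sum.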
7.10 Supplementary Exercises

1. A program will be run with 25 different sets of input. Let $\theta$ stand for the probability that an execution error will occur during a single run. We believe that, conditional on $\theta$, each run of the program will encounter an error with probability $\theta$ and that the different runs are independent. Prior to running the program, we believe that $\theta$ has the uniform distribution on the interval $[0, 1]$. Suppose that we get errors during 10 of the 25 runs.
a. Find the posterior distribution of $\theta$.
b. If we wanted to estimate $\theta$ by $\hat{\theta}$ using squared error loss, what would our estimate $\hat{\theta}$ be?

2. Suppose that $X_1, \ldots, X_n$ are i.i.d. with $\Pr(X_i = 1) = \theta$ and $\Pr(X_i = 0) = 1 - \theta$, where $\theta$ is unknown ($0 \le \theta \le 1$). Find the M.L.E. of $\theta^2$.

3. Suppose that the proportion $\theta$ of bad apples in a large lot is unknown and has the following prior p.d.f.:

\[ \xi(\theta) = \begin{cases} 60\theta^2(1 - \theta)^3 & \text{for } 0 < \theta < 1, \\ 0 & \text{otherwise.} \end{cases} \]

Suppose that a random sample of 10 apples is drawn from the lot, and it is found that three are bad. Find the Bayes estimate of $\theta$ with respect to the squared error loss function.

4. Suppose that $X_1, \ldots, X_n$ form a random sample from a uniform distribution with the following p.d.f.:

\[ f(x \mid \theta) = \begin{cases} \dfrac{1}{\theta} & \text{for } \theta \le x \le 2\theta, \\ 0 & \text{otherwise.} \end{cases} \]

Assuming that the value of $\theta$ is unknown ($\theta > 0$), determine the M.L.E. of $\theta$.

5. Suppose that $X_1$ and $X_2$ are independent random variables, and that $X_i$ has the normal distribution with mean $b_i\mu$ and variance $\sigma_i^2$ for $i = 1, 2$. Suppose also that $b_1$, $b_2$, $\sigma_1^2$, and $\sigma_2^2$ are known positive constants, and that $\mu$ is an unknown parameter. Determine the M.L.E. of $\mu$ based on $X_1$ and $X_2$.

6. Let $\psi(\alpha) = \Gamma'(\alpha)/\Gamma(\alpha)$ for $\alpha > 0$ (the digamma function). Show that

\[ \psi(\alpha + 1) = \psi(\alpha) + \frac{1}{\alpha}. \]

7. Suppose that a regular light bulb, a long-life light bulb, and an extra-long-life light bulb are being tested. The lifetime $X_1$ of the regular bulb has the exponential distribution with mean $\theta$, the lifetime $X_2$ of the long-life bulb has the exponential distribution with mean $2\theta$, and the lifetime $X_3$ of the extra-long-life bulb has the exponential distribution with mean $3\theta$.
a. Determine the M.L.E. of $\theta$ based on the observations $X_1$, $X_2$, and $X_3$.
b. Let $\psi = 1/\theta$, and suppose that the prior distribution of $\psi$ is the gamma distribution with parameters $\alpha$ and $\beta$. Determine the posterior distribution of $\psi$ given $X_1$, $X_2$, and $X_3$.

8. Consider a Markov chain with two possible states $s_1$ and $s_2$ and with stationary transition probabilities as given in the following transition matrix $P$:

\[ P = \begin{array}{c|cc} & s_1 & s_2 \\ \hline s_1 & \theta & 1 - \theta \\ s_2 & 3/4 & 1/4 \end{array} \]

where the value of $\theta$ is unknown ($0 \le \theta \le 1$). Suppose that the initial state $X_1$ of the chain is $s_1$, and let $X_2, \ldots, X_{n+1}$ denote the state of the chain at each of the next $n$ successive periods. Determine the M.L.E. of $\theta$ based on the observations $X_2, \ldots, X_{n+1}$.

9. Suppose that an observation $X$ is drawn from a distribution with the following p.d.f.:

\[ f(x \mid \theta) = \begin{cases} \dfrac{1}{\theta} & \text{for } 0 < x < \theta, \\ 0 & \text{otherwise.} \end{cases} \]

Also, suppose that the prior p.d.f. of $\theta$ is

\[ \xi(\theta) = \begin{cases} \theta e^{-\theta} & \text{for } \theta > 0, \\ 0 & \text{otherwise.} \end{cases} \]

Determine the Bayes estimator of $\theta$ with respect to (a) the mean squared error loss function and (b) the absolute error loss function.

10. Suppose that $X_1, \ldots, X_n$ form $n$ Bernoulli trials with parameter $\theta = (1/3)(1 + \beta)$, where the value of $\beta$ is unknown ($0 \le \beta \le 1$). Determine the M.L.E. of $\beta$.

11. The method of randomized response is sometimes used to conduct surveys on sensitive topics. A simple version of the method can be described as follows: A random sample of $n$ persons is drawn from a large population. For each person in the sample there is probability 1/2 that the person will be asked a standard question and probability 1/2 that the person will be asked a sensitive question. ... an unknown probability $p$ that she will give a positive response. The statistician can observe only the total number $X$ of positive responses that were given by the $n$ persons in the sample. He cannot observe which of these persons were asked the sensitive question or how many persons in the sample were asked the sensitive question. Determine the M.L.E. of $p$ based on the observation $X$.

12. Suppose that a random sample of four observations is to be drawn from the uniform distribution on the interval $[0, \theta]$, and that the prior distribution of $\theta$ has the following p.d.f.:

\[ \xi(\theta) = \begin{cases} \dfrac{1}{\theta^2} & \text{for } \theta \ge 1, \\ 0 & \text{otherwise.} \end{cases} \]

Suppose that the values of the observations in the sample are found to be 0.6, 0.4, 0.8, and 0.9. Determine the Bayes estimate of $\theta$ with respect to the squared error loss function.

13. For the conditions of Exercise 12, determine the Bayes estimate of $\theta$ with respect to the absolute error loss function.

14. Suppose that $X_1, \ldots, X_n$ form a random sample from a distribution with the following p.d.f.:

\[ f(x \mid \beta, \theta) = \begin{cases} \beta e^{-\beta(x - \theta)} & \text{for } x \ge \theta, \\ 0 & \text{otherwise,} \end{cases} \]

where $\beta$ and $\theta$ are unknown ($\beta > 0$, $-\infty < \theta < \infty$). Determine a pair of jointly sufficient statistics.

15. Suppose that $X_1, \ldots, X_n$ form a random sample from the Pareto distribution with parameters $x_0$ and $\alpha$ (see Exercise 16 of Sec. 5.7), where $x_0$ is unknown and $\alpha$ is known. Determine the M.L.E. of $x_0$.

16. Determine whether the estimator found in Exercise 15 is a minimal sufficient statistic.

17. Consider again the conditions of Exercise 15, but suppose now that both parameters $x_0$ and $\alpha$ are unknown. Determine the M.L.E.'s of $x_0$ and $\alpha$.

18. Determine whether the estimators found in Exercise 17 are minimal jointly sufficient statistics.

19. Suppose that the random variable $X$ has a binomial distribution with an unknown value of $n$ and a known value of $p$ ($0 < p < 1$). Determine the M.L.E. of $n$ based on the observation $X$. Hint: Consider the ratio

\[ \frac{f(x \mid n + 1, p)}{f(x \mid n, p)}. \]

20. Suppose that two observations $X_1$ and $X_2$ are drawn
Alem disso, esta selecao do padrao ou da aleatoriamente de uma distribuicdo uniforme com a seguinte thermore, this selection of the standard or the sensitive at random from a uniform distribution with the following questdo delicada é feita independentemente de pessoa para pdf: question is made independently from person to person. pdf: pessoa. Se for feita a pergunta padrdo a uma pessoa, entado { If a person is asked the standard question, then there is ha 1/2 probabilidade de ela dar uma resposta positiva; no fix| OF xa para OSx< Oou 265x<36, probability 1/2 that she will give a positive response; how- f(x|0) = a for0<x <6 or20 <x <30, entanto, se ela fizer a pergunta delicada, entao ha O de outra forma, ever if she is asked the sensitive question, then there is ~ (0 — otherwise, 7.10 Exercicios Suplementares 463 7.10 Supplementary Exercises 463 onde o valor de@E desconhecido(@ >0). Determine o MLE seixarat, 2,.. .Seja uma sequéncia de numeros reais. Informatica where the value of 6 is unknown (6 > 0). Determine the let x1, x2, ... be a sequence of real numbers. Computing de para cada um dos seguintes pares de valores ev=1(Xeu-Xnprequer diretamente que primeiro calculemosxn — M.L.E. of @ for each of the following pairs of observed = 7”_,(x; — X,)? directly requires that we first compute x, observados deXieX2: e ainda tem tudonobservacées disponiveis para que possamos values of X; and X>: and then still have all n observations available so that we aXiz7 eX%=9 calcular xew Xnpara cadaeu. Também sené muito grande, entdo a. X,=7and X,=9 can compute x; — X, for each i. Also, if n is very large, computarxnadicionando oxeujuntos podem produzir grandes erros then computing x,, by adding the x,’s together can pro- b.x1= 4 eX2= 9 a . b. X,=4 and X,=9 . de arredondamento quando 0 préximoxeutorna-se muito pequeno duce large rounding errors once the next x; becomes very c.X1= 5 eX2= 9 em relagdo ao montante acumulado. ce X;=5and X,=9 small relative to the accummulated sum. 
21.Suponha que uma amostra aleatériam,..., Xndeve ser a.Prove a formula aparentemente mais eficiente > 21. Suppose that a random sample X,..., X,, is to be a. Prove the seemingly more efficient formula retirado da distribuigdo normal com média desconhecida Ge n y taken from the normal distribution with unknown mean n n variancia 100, e a distribuicdo anterior de@é a distribuicdo (Xeu-Xnp= Xau XZ, 6 and variance 100, and the prior distribution of 6 is the SoG; —x,)= > x? — nx?. normal com média especificadayoe variancia 25. Suponha que eu=1 eu=1 . normal distribution with specified mean jg and variance i=l i=l @deve ser estimado usando a fungdo de perda de erro , , 25. Suppose that @ is to be estimated using the squared . . bas me Com esta formula, poderiamos acumular a soma . . With this formula, we could accummulate the sum quadratico, e o custo de amostragem de cada observacdo é d oe, " error loss function, and the sampling cost of each obser- f the x.’s and x2” tely and f t each ob 0,25 (em unidades apropriadas). Se o custo total do vse ce relax2 da. Pan © sblema - ee vation is 0.25 (in appropriate units). If the total cost of the ° fi © Xi tt an i) Wewor Natit! fer the « ° a procedimento de estimativa for igual 4 perda esperada do et ene Ainda so ‘onadue, 9 problema do erro de estimation procedure is equal to the expected loss of the vation a blem : . hi ved 7 i Sutter the Touncins estimador Bayes mais 0 custo de amostragem (0,25)n, qual é arrecon amento mencionado acima. Bayes estimator plus the sampling cost (0.25)n, what is the b Prov one role men for i why d h d o tamanho da amostranpara 0 qual o custo total sera minimo? b.Prove as seguintes formulas que reduzem o problema do erro sample size n for which the total cost will be a minimum? » Prove the following formulas that reduce the round- de arredondamento no acumulo de uma soma. Para cada ing error problem in accummulating a sum. 
For each 22.Suponha que®M, ..., Xnformar uma amostra aleatéria da inteiron 22. Suppose that X;,..., X, form a random sample from integer n distribuigdo de Poisson com média desconhecida@, e a 1 the Poisson distribution with unknown mean @, and the 1 variancia desta distribuicdo deve ser estimada usando a Xme1=XnF pe Or - Xy, variance of this distribution is to be estimated using the Xn41=Xy + Tat —Xn), fungdo de perda de erro quadratico. Determine se a varidncia n squared error loss function. Determine whether or not the n+ da amostra é ou ndo um estimador admissivel. yt »” n sample variance is an admissible estimator. n+l ” n (Xeu-Xr1 P= (XeurXnpt? me tex ni Soa _ Xp 41)” = Soa — ¥,)- + Ta ntl - X,)°- 23.As formulas (7.5.6) para a média amostral e a variancia eu=1 eu=1 23. The formulas (7.5.6) for the sample mean and sam- i=l i=l at amostral $40 de importancia teorica, mas podem ser ineficientes Estas formulas permitem-nos esquecer cadaxeudepois de usd-lo ple varrance are of theoretical Importance, but they can These formulas allow us to forget each x; after we use ou produzir resultados imprecisos se usadas para calculos . . be inefficient or produce inaccurate results if used for nu- : _. / para atualizar as duas formulas. : . . it to update the two formulas. numéricos com amostras muito grandes. Por exemplo, merical calculation with very large samples. 
For example, > TTT OT TTOSNNNOT~rVE (3S a ONNma—sv > felizmente Chapter 8.1A distribuig¢do amostral de uma estatistica 8.6Andalise Bayesiana de Amostras de uma 8.1 The Sampling Distribution of a Statistic 8.6 Bayesian Analysis of Samples from a Normal 8.2As distribuigées qui-quadrado Distribuigdo Normal 8.2 The Chi-Square Distributions Distribution 8.3Distribuigdo Conjunta da Média Amostral e 8.7Estimadores imparciais 8.3 Joint Distribution of the Sample Mean and 8.7 Unbiased Estimators Variancia Amostral 8.8Informagées sobre Pescador Sample Variance 8.8 Fisher Information 8.40 tDistribuicgées 8,9Exercicios Suplementares 8.4 The ¢ Distributions 8.9 Supplementary Exercises 8,5Intervalos de confiancga 8.5 Confidence Intervals 8.1 A distribuicdo amostral de uma estatistica 8.1 The Sampling Distribution of a Statistic Uma estatistica 6 uma funcéo de algumas varidveis aleatorias observaveis e, portanto, é ela A Statistic is a function of some observable random variables, and hence is itself a mesma uma variavel aleatéria com uma distribuicao. Essa distribuicao 6 a sua distribuicéo random variable with a distribution. That distribution is its sampling distribution, amostral e diz-nos quais os valores que a estatistica provavelmente assumiré e qual a and it tells us what values the statistic is likely to assume and how likely it is to probabilidade de assumir esses valores antes de observar os nossos dados. Quando a assume those values prior to observing our data. When the distribution of the distribuicao dos dados observaéveis é indexada por um parametro, a distribuicao amostral é observable data is indexed by a parameter, the sampling distribution is specified especificada como a distribuicao da estatistica para um determinado valor do parametro. as the distribution of the statistic for a given value of the parameter. 
Statistics and Estimators

Example 8.1.1 A Clinical Trial. In the clinical trial first introduced in Example 2.1.4, let $\theta$ stand for the proportion who do not relapse among all possible imipramine patients. We could use the observed proportion of patients without relapse in the imipramine group to estimate $\theta$. Prior to observing the data, the proportion of sampled patients with no relapse is a random variable $T$ that has a distribution and will not exactly equal the parameter $\theta$. However, we hope that $T$ will be close to $\theta$ with high probability. For example, we could try to compute the probability that $|T - \theta| < 0.1$. Such calculations require that we know the distribution of the random variable $T$. In the clinical trial, we modeled the responses of the 40 patients in the imipramine group as conditionally (given $\theta$) i.i.d. Bernoulli random variables with parameter $\theta$. It follows that the conditional distribution of $40T$ given $\theta$ is the binomial distribution with parameters 40 and $\theta$. The distribution of $T$ can be derived easily from this. Indeed, $T$ has the following p.f. given $\theta$:

$$f(t \mid \theta) = \binom{40}{40t} \theta^{40t} (1 - \theta)^{40(1 - t)}, \quad \text{for } t = 0, \tfrac{1}{40}, \tfrac{2}{40}, \ldots, \tfrac{39}{40}, 1,$$

and $f(t \mid \theta) = 0$ otherwise. ◀

The distribution at the end of Example 8.1.1 is called the sampling distribution of the statistic $T$, and we can use it to help address questions such as how close we expect $T$ to be to $\theta$ prior to observing the data. We can also use the sampling distribution of $T$ to help to determine how much we will learn about $\theta$ by observing $T$. If we are trying to decide which of two different statistics to use as an estimator, their sampling distributions can be useful for helping us to compare them.

The concept of sampling distribution applies to a larger class of random variables than statistics.

Definition 8.1.1 Sampling Distribution. Suppose that the random variables $X = (X_1, \ldots, X_n)$ form a random sample from a distribution involving a parameter $\theta$ whose value is unknown. Let $T$ be a function of $X$ and possibly $\theta$. That is, $T = r(X_1, \ldots, X_n, \theta)$. The distribution of $T$ (given $\theta$) is called the sampling distribution of $T$. We will use the notation $E_\theta(T)$ to denote the mean of $T$ calculated from its sampling distribution.

The name "sampling distribution" comes from the fact that $T$ depends on a random sample and so its distribution is derived from the distribution of the sample.

Often, the random variable $T$ in Definition 8.1.1 will not depend on $\theta$, and hence will be a statistic as defined in Definition 7.1.4. In particular, if $T$ is an estimator of $\theta$ (as defined in Definition 7.4.1), then $T$ is also a statistic because it is a function of $X$. Therefore, in principle, it is possible to derive the sampling distribution of each estimator of $\theta$. In fact, the distributions of many estimators and statistics have already been found in previous parts of this book.

Example 8.1.2 Sampling Distribution of the M.L.E. of the Mean of a Normal Distribution.
Suppose that $X_1, \ldots, X_n$ form a random sample from the normal distribution with mean $\mu$ and variance $\sigma^2$. We found in Examples 7.5.5 and 7.5.6 that the sample mean $\bar{X}_n$ is the M.L.E. of $\mu$. Furthermore, it was found in Corollary 5.6.2 that the distribution of $\bar{X}_n$ is the normal distribution with mean $\mu$ and variance $\sigma^2/n$. ◀

In this chapter, we shall derive, for random samples from a normal distribution, the distribution of the sample variance and the distributions of various functions of the sample mean and the sample variance. These derivations will lead us to the definitions of some new distributions that play important roles in problems of statistical inference. In addition, we shall study certain general properties of estimators and their sampling distributions.

Purpose of the Sampling Distribution

Example 8.1.3 Lifetimes of Electronic Components. Consider the company in Example 7.1.1 that sells electronic components. They model the lifetimes of these components as i.i.d. exponential random variables with parameter $\theta$ conditional on $\theta$. They model $\theta$ as having the gamma distribution with parameters 1 and 2. Now, suppose that they are about to observe $n = 3$ lifetimes, and they will use the posterior mean of $\theta$ as an estimator. According to Theorem 7.3.4, the posterior distribution of $\theta$ will be the gamma distribution with parameters $1 + 3 = 4$ and $2 + \sum_{i=1}^{3} X_i$. The posterior mean will then be $\hat{\theta} = 4/(2 + \sum_{i=1}^{3} X_i)$.

Prior to observing the three lifetimes, the company may want to know how likely it is that $\hat{\theta}$ will be close to $\theta$. For example, they may want to compute $\Pr(|\hat{\theta} - \theta| < 0.1)$. In addition, other interested parties such as customers might be interested in how close the estimator is going to be to $\theta$. But these others might not wish to assign the same prior distribution to $\theta$. Indeed, some of them may wish to assign no prior distribution at all. We shall soon see that all of these people will find it useful to determine the sampling distribution of $\hat{\theta}$. What they do with that sampling distribution will differ, but they will all be able to make use of the sampling distribution. ◀

In Example 8.1.3, after the company observes the three lifetimes, they will be interested only in the posterior distribution of $\theta$. They could then compute the posterior probability that $|\hat{\theta} - \theta| < 0.1$. However, before the sample is taken, both $\hat{\theta}$ and $\theta$ are random and $\Pr(|\hat{\theta} - \theta| < 0.1)$ involves the joint distribution of $\hat{\theta}$ and $\theta$. The sampling distribution is merely the conditional distribution of $\hat{\theta}$ given $\theta$. Hence, the law of total probability says that

$$\Pr(|\hat{\theta} - \theta| < 0.1) = E\left[\Pr(|\hat{\theta} - \theta| < 0.1 \mid \theta)\right].$$

In this way, the company makes use of the sampling distribution of $\hat{\theta}$ as an intermediate calculation on the way to computing $\Pr(|\hat{\theta} - \theta| < 0.1)$.

Example 8.1.4 Lifetimes of Electronic Components.
In Example 8.1.3, the sampling distribution of $\hat{\theta}$ does not have a name, but it is easy to see that $\hat{\theta}$ is a monotone function of the statistic $T = \sum_{i=1}^{3} X_i$ that has the gamma distribution with parameters 3 and $\theta$ (conditional on $\theta$). So, we can compute the c.d.f. $F(\cdot \mid \theta)$ for the sampling distribution of $\hat{\theta}$ from the c.d.f. $G(\cdot \mid \theta)$ of the distribution of $T$. Argue as follows. For $t > 0$,

$$\begin{aligned} F(t \mid \theta) &= \Pr(\hat{\theta} \le t \mid \theta) \\ &= \Pr\left(\frac{4}{2 + T} \le t \,\Big|\, \theta\right) \\ &= \Pr\left(T \ge \frac{4}{t} - 2 \,\Big|\, \theta\right) \\ &= 1 - G\left(\frac{4}{t} - 2 \,\Big|\, \theta\right). \end{aligned}$$

For $t \le 0$, $F(t \mid \theta) = 0$. Most statistical computer packages include the function $G$, which is the c.d.f. of a gamma distribution. The company can now compute, for each $\theta$,

$$\Pr(|\hat{\theta} - \theta| < 0.1 \mid \theta) = F(\theta + 0.1 \mid \theta) - F(\theta - 0.1 \mid \theta). \qquad (8.1.1)$$

Figure 8.1 shows a graph of this probability as a function of $\theta$. To complete the calculation of $\Pr(|\hat{\theta} - \theta| < 0.1)$, we must integrate (8.1.1) with respect to the distribution of $\theta$, that is, the gamma distribution with parameters 1 and 2. This integral cannot be performed in closed form and requires a numerical approximation. One such approximation would be a simulation, which will be discussed in Chapter 12. In this example, the approximation yields $\Pr(|\hat{\theta} - \theta| < 0.1) \approx 0.478$.

Also included in Fig. 8.1 is the calculation of $\Pr(|\hat{\theta} - \theta| < 0.1 \mid \theta)$ using $\hat{\theta} = 3/T$, the M.L.E. of $\theta$. The sampling distribution of the M.L.E. can be derived in Exercise 9 at the end of this section. Notice that the posterior mean has higher probability of being close to $\theta$ than does the M.L.E. when $\theta$ is near the mean of the prior distribution. When $\theta$ is far from the prior mean, the M.L.E. has higher probability of being close to $\theta$. ◀

Another case in which the sampling distribution of an estimator is needed is when the statistician must decide which one of two or more available experiments should be performed in order to obtain the best estimator of $\theta$. For example, if she must choose which sample size to use for an experiment, then she will typically base her decision on the sampling distributions of the different estimators that might be used for each sample size.

Figure 8.1 Plot of $\Pr(|\hat{\theta} - \theta| < 0.1 \mid \theta)$ for both $\hat{\theta}$ equal to the posterior mean and $\hat{\theta}$ equal to the M.L.E. in Example 8.1.4.

As mentioned at the end of Example 8.1.3, there are statisticians who do not wish to assign a prior distribution to $\theta$. Those statisticians would not be able to calculate a posterior distribution for $\theta$. Instead, they would base all of their statistical inferences on the sampling distribution of whatever estimators they chose. For example, a statistician who chose to use the M.L.E. of $\theta$ in Example 8.1.4 would need to deal with the entire curve in Fig. 8.1 corresponding to the M.L.E. in order to decide how likely it is that the M.L.E. will be closer to $\theta$ than 0.1. Alternatively, she might choose a different measure of how close the M.L.E. is to $\theta$.
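The two curves in Fig. 8.1 can be recomputed from Eq. (8.1.1) without a statistical package, because the gamma c.d.f. has a closed form when the first parameter is the integer 3. The sketch below assumes the gamma distribution is parameterized by a rate (so that the prior with parameters 1 and 2 has mean 1/2); the function names are hypothetical and chosen only for this illustration.

```python
import math


def gamma3_cdf(x, rate):
    """C.d.f. G(x | rate) of the gamma distribution with parameters 3 and rate."""
    if x <= 0:
        return 0.0
    y = rate * x
    # Closed form for integer first parameter 3 (an Erlang distribution).
    return 1.0 - math.exp(-y) * (1.0 + y + y * y / 2.0)


def cdf_posterior_mean(t, theta):
    """F(t | theta) for the posterior mean 4 / (2 + T)."""
    if t <= 0:
        return 0.0
    return 1.0 - gamma3_cdf(4.0 / t - 2.0, theta)


def cdf_mle(t, theta):
    """C.d.f. of the M.L.E. 3 / T, obtained by the same monotonicity argument."""
    if t <= 0:
        return 0.0
    return 1.0 - gamma3_cdf(3.0 / t, theta)


def prob_close(cdf, theta, eps=0.1):
    """Pr(|estimator - theta| < eps | theta), as in Eq. (8.1.1)."""
    return cdf(theta + eps, theta) - cdf(theta - eps, theta)


for theta in (0.25, 0.5, 1.0, 2.0):
    print(theta,
          round(prob_close(cdf_posterior_mean, theta), 4),
          round(prob_close(cdf_mle, theta), 4))
```

Evaluating `prob_close` on a fine grid of `theta` values between 0 and 2 gives the points needed to redraw both curves.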
Example 8.1.5 Lifetimes of Electronic Components. Suppose that a statistician chooses to estimate $\theta$ by the M.L.E., $\hat{\theta} = 3/T$, instead of the posterior mean in Example 8.1.4. This statistician may not find the graph in Fig. 8.1 very useful unless she can decide which $\theta$ values are most important to consider. Instead of calculating $\Pr(|\hat{\theta} - \theta| < 0.1 \mid \theta)$, she might compute

$$\Pr\left(\left|\frac{\hat{\theta}}{\theta} - 1\right| < 0.1 \,\Big|\, \theta\right). \qquad (8.1.2)$$

This is the probability that $\hat{\theta}$ is within 10% of the value of $\theta$. The probability in (8.1.2) could be computed from the sampling distribution of the M.L.E. Alternatively, one can notice that $\hat{\theta}/\theta = 3/(\theta T)$, and the distribution of $\theta T$ is the gamma distribution with parameters 3 and 1. Hence, $\hat{\theta}/\theta$ has a distribution that does not depend on $\theta$. It follows that $\Pr(|\hat{\theta}/\theta - 1| < 0.1 \mid \theta)$ is the same number for all $\theta$. In the notation of Example 8.1.4, the c.d.f. of $\theta T$ is $G(\cdot \mid 1)$, and hence

$$\begin{aligned} \Pr\left(\left|\frac{\hat{\theta}}{\theta} - 1\right| < 0.1 \,\Big|\, \theta\right) &= \Pr\left(\left|\frac{3}{\theta T} - 1\right| < 0.1 \,\Big|\, \theta\right) \\ &= \Pr\left(0.9 < \frac{3}{\theta T} < 1.1 \,\Big|\, \theta\right) \\ &= \Pr(2.73 < \theta T < 3.33 \mid \theta) \\ &= G(3.33 \mid 1) - G(2.73 \mid 1) = 0.134. \end{aligned}$$
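The value 0.134 in the last display can be checked by hand with the closed-form c.d.f. of the gamma distribution with parameters 3 and 1. The snippet below is a small sketch of that check; it uses the unrounded endpoints $3/1.1$ and $3/0.9$ rather than 2.73 and 3.33, and the helper name `gamma3_cdf_rate1` is an assumption made for this illustration.

```python
import math


def gamma3_cdf_rate1(x):
    """C.d.f. G(x | 1) of the gamma distribution with parameters 3 and 1."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-x) * (1.0 + x + x * x / 2.0)


lower = 3.0 / 1.1  # theta*T must exceed this for the ratio to stay below 1.1
upper = 3.0 / 0.9  # theta*T must stay below this for the ratio to exceed 0.9
p = gamma3_cdf_rate1(upper) - gamma3_cdf_rate1(lower)
print(round(p, 3))  # prints 0.134
```

Because the calculation never refers to $\theta$, the same number applies to every value of the parameter.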
O estatistico pode agora afirmar que a probabilidade é de 0,134 de que o MLE de@estara The statistician can now claim that the probability is 0.134 that the M.L.E. of 6 will dentro de 10% do valor de@, nado importa o que. - be within 10% of the value of 6, no matter what 6 is. < A variavel aleatoria676no Exemplo 8.1.5 6 um exemplo de umquantidade fundamental, que The random variable 6/@ in Example 8.1.5 is an example of a pivotal quantity, sera definido e usado extensivamente na Sec. 8.5. which will be defined and used extensively in Sec. 8.5. 468 Capitulo 8 Distribuigées amostrais de estimadores 468 Chapter 8 Sampling Distributions of Estimators Figura 8.2Lote de Pr(| 7 6| Figure 8.2 Plot of Pr(|7 — <0.1| Ano Exemplo 8.1.6. 1,00 \ [7 6| < 0.1|6) in Example 8.1.6. 1.00 \ / = 0,95 . . = 0.95 . . o S @ 0,90 ° ° Y 090 ° ° 3 -\ lf. Ss -\ lf. Ss \ 4A \ ra A . . . . & 0,85 2 if. & 085 . hw rn. * 0,80 . Ra Nuon . * 0.80 . Re Nunn tl . 0,75 mee a 0.75 mee a 0 0,2 0,4 0,6 0,8 1,0 vocé 0 0.2 04 0.6 0.8 10 8 Exemplo Um ensaio clinico.No Exemplo 8.1.1, encontramos a distribuigdo amostral de 7,0 pr6- Example A Clinical Trial. In Example 8.1.1, we found the sampling distribution of T, the pro- 8.1.6 parcela de pacientes sem recidiva no grupo da imipramina. Usando essa distribui¢ao, 8.1.6 portion of patients without relapse in the imipramine group. Using that distribution, podemos desenhar um grafico semelhante ao da Figura 8.1. Ou seja, para cadaG, we can draw a plot similar to that in Fig. 8.1. That is, for each 6, we can compute podemos calcular Pr(| 7-9| <0.1| 8). O grafico aparece na Fig. 8.2. Os saltos e a natureza Pr(|T — 0| < 0.16). The plot appears in Fig. 8.2. The jumps and cyclic nature of the ciclica da trama se devem a discrigdo da distribuigdo de 7A menor probabilidade é 0,7318 plot are due the discreteness of the distribution of T. The smallest probability is em6@= 0.5. (Os pontos isolados que aparecem abaixo da parte principal do grafico em@ 0.7318 at 6 = 0.5. 
(The isolated points that appear below the main part of the graph at $\theta$ equal to each multiple of 1/40 would appear equally far above the main part of the graph, if we had plotted $\Pr(|T - \theta| \le 0.1 \mid \theta)$ instead of $\Pr(|T - \theta| < 0.1 \mid \theta)$.)

Summary

The sampling distribution of an estimator $\hat\theta$ is the conditional distribution of the estimator given the parameter. The sampling distribution can be used as an intermediate calculation in assessing the properties of a Bayes estimator prior to observing data. More commonly, the sampling distribution is used by those statisticians who prefer not to use prior and posterior distributions. For example, before the sample has been taken, the statistician can use the sampling distribution of $\hat\theta$ to calculate the probability that $\hat\theta$ will be close to $\theta$. If this probability is high for every possible value of $\theta$, then the statistician can feel confident that the observed value of $\hat\theta$ will be close
to $\theta$. After the data are observed and a particular estimate is obtained, the statistician would like to continue feeling confident that the particular estimate is likely to be close to $\theta$, even though explicit posterior probabilities cannot be given. It is not always safe to draw such a conclusion, however, as we shall illustrate at the end of Example 8.5.11.

Exercises

1. Suppose that a random sample $X_1, \ldots, X_n$ is to be taken from the uniform distribution on the interval $[0, \theta]$ and that $\theta$ is unknown. How large a random sample must be taken in order that
$$\Pr(|\max\{X_1, \ldots, X_n\} - \theta| \le 0.1\,\theta) \ge 0.95,$$
for all possible $\theta$?

2. Suppose that a random sample is to be taken from the normal distribution with unknown mean $\theta$ and standard deviation 2. How large a random sample must be taken in order that $E_\theta(|\bar X_n - \theta|^2) \le 0.1$ for every possible value of $\theta$?

3. For the conditions of Exercise 2, how large a random sample must be taken in order that $E_\theta(|\bar X_n - \theta|) \le 0.1$ for every possible value of $\theta$?
4. For the conditions of Exercise 2, how large a random sample must be taken in order that $\Pr(|\bar X_n - \theta| \le 0.1) \ge 0.95$ for every possible value of $\theta$?

5. Suppose that a random sample is to be taken from the Bernoulli distribution with unknown parameter $p$. Suppose also that it is believed that the value of $p$ is in the neighborhood of 0.2. How large a random sample must be taken in order that $\Pr(|\bar X_n - p| \le 0.1) \ge 0.75$ when $p = 0.2$?

6. For the conditions of Exercise 5, use the central limit theorem in Sec. 6.3 to find approximately the size of a random sample that must be taken in order that $\Pr(|\bar X_n - p| \le 0.1) \ge 0.95$ when $p = 0.2$.

7. For the conditions of Exercise 5, how large a random sample must be taken in order that $E_p(|\bar X_n - p|^2) \le 0.01$ when $p = 0.2$?

8. For the conditions of Exercise 5, how large a random sample must be taken in order that $E_p(|\bar X_n - p|^2) \le 0.01$ for every possible value of $p$ $(0 \le p \le 1)$?

9. Let $X_1, \ldots, X_n$ be a random sample from the exponential distribution with parameter $\theta$. Find the c.d.f. for the sampling distribution of the M.L.E. of $\theta$. (The M.L.E. itself was found in Exercise 7 in Sec. 7.5.)

8.2 The Chi-Square Distributions

The family of chi-square ($\chi^2$) distributions is a subcollection of the family of gamma distributions. These special gamma distributions arise as sampling distributions of variance estimators based on random samples from a normal distribution.

Definition of the Distributions

Example 8.2.1  M.L.E. of the Variance of a Normal Distribution. Suppose that $X_1, \ldots, X_n$ form a random sample from the normal distribution with known mean $\mu$ and unknown variance $\sigma^2$. The M.L.E. of $\sigma^2$ is found in Exercise 6 in Sec. 7.5. It is

$$\hat\sigma_0^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2.$$

The distributions of $\hat\sigma_0^2$ and $\hat\sigma_0^2/\sigma^2$ are useful in several statistical problems, and we shall derive them in this section.
In this section, we shall introduce and discuss a particular class of gamma distributions known as the chi-square ($\chi^2$) distributions. These distributions, which are closely related to random samples from a normal distribution, are widely applied in the field of statistics. In the remainder of this book, we shall see how they are applied in many important problems of statistical inference. In this section, we shall present the definition of the $\chi^2$ distributions and some of their basic mathematical properties.

Definition 8.2.1  $\chi^2$ Distributions. For each positive number $m$, the gamma distribution with parameters $\alpha = m/2$ and $\beta = 1/2$ is called the $\chi^2$ distribution with $m$ degrees of freedom. (See Definition 5.7.2 for the definition of the gamma distribution with parameters $\alpha$ and $\beta$.)

It is common to restrict the degrees of freedom $m$ in Definition 8.2.1 to be an integer. However, there are situations in which it will be useful for the degrees of freedom to
not be integers, so we will not make that restriction.

If a random variable $X$ has the $\chi^2$ distribution with $m$ degrees of freedom, it follows from Eq. (5.7.13) that the p.d.f. of $X$ for $x > 0$ is

$$f(x) = \frac{1}{2^{m/2}\,\Gamma(m/2)}\, x^{(m/2)-1} e^{-x/2}. \tag{8.2.1}$$

Also, $f(x) = 0$ for $x \le 0$.

A short table of $p$ quantiles for the $\chi^2$ distribution for various values of $p$ and various degrees of freedom is given at the end of this book. Most statistical software packages include functions to compute the c.d.f. and the quantile function of an arbitrary $\chi^2$ distribution.

It follows from Definition 8.2.1, and it can be seen from Eq. (8.2.1), that the $\chi^2$ distribution with two degrees of freedom is the exponential distribution with parameter 1/2 or, equivalently, the exponential distribution for which the mean is 2. Thus, the following three distributions are all the same: the gamma distribution with parameters $\alpha = 1$ and $\beta = 1/2$, the $\chi^2$ distribution with two degrees of freedom, and the exponential distribution for which the mean is 2.
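The identity between the $\chi^2$ distribution with two degrees of freedom and the exponential distribution with mean 2 can be checked pointwise. The sketch below codes Eq. (8.2.1) directly; the function names and the evaluation points are chosen only for illustration.

```python
import math


def chi2_pdf(x: float, m: float) -> float:
    """P.d.f. of the chi-square distribution with m degrees of freedom, Eq. (8.2.1)."""
    if x <= 0:
        return 0.0
    return x ** (m / 2 - 1) * math.exp(-x / 2) / (2 ** (m / 2) * math.gamma(m / 2))


def exponential_pdf(x: float, rate: float) -> float:
    """P.d.f. of the exponential distribution with parameter `rate` (mean 1 / rate)."""
    return rate * math.exp(-rate * x) if x > 0 else 0.0


# With m = 2 the two densities should agree at every point.
for x in (0.5, 1.0, 2.0, 5.0):
    print(x, chi2_pdf(x, 2), exponential_pdf(x, 0.5))
```

Since `m` is not required to be an integer, the same function also covers the non-integer degrees of freedom allowed by Definition 8.2.1.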
Properties of the Distributions

The means and variances of $\chi^2$ distributions follow immediately from Theorem 5.7.5, and are given here without proof.

Theorem 8.2.1  Mean and Variance. If a random variable $X$ has the $\chi^2$ distribution with $m$ degrees of freedom, then $E(X) = m$ and $\mathrm{Var}(X) = 2m$.

Furthermore, it follows from the moment generating function given in Eq. (5.7.15) that the m.g.f. of $X$ is

$$\psi(t) = \left(\frac{1}{1 - 2t}\right)^{m/2} \quad \text{for } t < \frac{1}{2}.$$

The additivity property of the $\chi^2$ distribution, which is presented without proof in the next theorem, follows directly from Theorem 5.7.7.

Theorem 8.2.2  If the random variables $X_1, \ldots, X_k$ are independent and if $X_i$ has the $\chi^2$ distribution with $m_i$ degrees of freedom $(i = 1, \ldots, k)$, then the sum $X_1 + \cdots + X_k$ has the $\chi^2$ distribution with $m_1 + \cdots + m_k$ degrees of freedom.

We shall now establish the basic relation between the $\chi^2$ distributions and the standard normal distribution.

Theorem 8.2.3  Let $X$ have the standard normal distribution. Then the random variable $Y = X^2$ has the $\chi^2$
distribution with one degree of freedom.

Proof  Let $f(y)$ and $F(y)$ denote, respectively, the p.d.f. and the c.d.f. of $Y$. Also, since $X$ has the standard normal distribution, we shall let $\phi(x)$ and $\Phi(x)$ denote the p.d.f. and the c.d.f. of $X$. Then for $y > 0$,

$$\begin{aligned}
F(y) &= \Pr(Y \le y) = \Pr(X^2 \le y) = \Pr(-y^{1/2} \le X \le y^{1/2}) \\
&= \Phi(y^{1/2}) - \Phi(-y^{1/2}).
\end{aligned}$$

Since $f(y) = F'(y)$ and $\phi(x) = \Phi'(x)$, it follows from the chain rule for derivatives that

$$f(y) = \phi(y^{1/2})\left(\frac{1}{2}\, y^{-1/2}\right) + \phi(-y^{1/2})\left(\frac{1}{2}\, y^{-1/2}\right).$$

Furthermore, since $\phi(y^{1/2}) = \phi(-y^{1/2}) = (2\pi)^{-1/2} e^{-y/2}$, it now follows that

$$f(y) = \frac{1}{(2\pi)^{1/2}}\, y^{-1/2} e^{-y/2} \quad \text{for } y > 0.$$

By comparing this equation with Eq. (8.2.1) for $m = 1$ (and recalling that $\Gamma(1/2) = \pi^{1/2}$), it is seen that the p.d.f. of $Y$ is indeed the p.d.f. of the $\chi^2$ distribution with one degree of freedom.

We can now combine Theorem 8.2.3 with Theorem 8.2.2 to obtain the following result, which provides the main reason that the $\chi^2$ distribution is important in statistics.

Corollary 8.2.1  If the random variables $X_1, \ldots, X_m$ are i.i.d.
with the standard normal distribution, then the sum of squares $X_1^2 + \cdots + X_m^2$ has the $\chi^2$ distribution with $m$ degrees of freedom.

Example 8.2.2  M.L.E. of the Variance of a Normal Distribution. In Example 8.2.1, the random variables $Z_i = (X_i - \mu)/\sigma$ for $i = 1, \ldots, n$ form a random sample from the standard normal distribution. It follows from Corollary 8.2.1 that the distribution of $\sum_{i=1}^{n} Z_i^2$ is the $\chi^2$ distribution with $n$ degrees of freedom. It is easy to see that $\sum_{i=1}^{n} Z_i^2$ is precisely the same as $n\hat\sigma_0^2/\sigma^2$, which appears in Example 8.2.1. So the distribution of $n\hat\sigma_0^2/\sigma^2$ is the $\chi^2$ distribution with $n$ degrees of freedom. The reader should also be able to see that the distribution of $\hat\sigma_0^2$ itself is the gamma distribution with parameters $n/2$ and $n/(2\sigma^2)$ (Exercise 13).

Example 8.2.3  Acid Concentration in Cheese. Moore and McCabe (1999, p. D-1) describe an experiment conducted in Australia to study the relationship between taste and the chemical composition of cheese. One chemical whose concentration can affect taste is lactic
One chemical whose concentration can affect taste is lactic fabricantes de queijo que desejam estabelecer uma base de clientes fiéis gostariam que o sabor acid. Cheese manufacturers who want to establish a loyal customer base would like fosse praticamente 0 mesmo sempre que um cliente comprasse 0 queijo. A variacdo nas the taste to be about the same each time a customer purchases the cheese. The vari- concentracées de produtos quimicos como o acido lactico pode levar a variagao no sabor do ation in concentrations of chemicals like lactic acid can lead to variation in the taste queijo. Suponha que modelemos a concentracao de Acido lactico em varios pedacos de queijo of cheese. Suppose that we model the concentration of lactic acid in several chunks como variaveis aleatérias normais independentes com médiape variacdo oz. Estamos of cheese as independent normal random variables with mean yw and variance o?. interessados em saber o quanto essas concentracées diferem do valory. Deixar Xi,..., Xk We are interested in how much these concentrations differ from the value yw. Let sejam as concentragdes emkpedacos e deixe Zeu=(Xeu- p/)/o.Entao X1,..., X, be the concentrations in k chunks, and let Z; = (X; — )/o. Then 1yk yk 1 k 2 =k — 2, & 2_¢ 2 =F My nl SO 2, Y=7 Di wP aT YZ, k k k ¢ k + eu=1 eu=1 i=l i=l é uma medida de quantodconcentragées diferem dey. Suponha que uma diferenca is one measure of how much the k concentrations differ from jz. Suppose that a dif- devocéou mais na concentracdo de acido lactico é suficiente para causar uma ference of u or more in lactic acid concentration is enough to cause a noticeable diferenga notavel no sabor. Poderiamos entdo desejar calcular Pr(S<vocé2). De difference in taste. We might then wish to calculate Pr(Y < u2). According to Corol- acordo com 0 Corolario 8.2.1, a distribuigdo de C=kY/o2zéyzcomkgraus de liberdade. lary 8.2.1, the distribution of W =kY/o? is x* with k degrees of freedom. Hence, Portanto, Pr.(S<vocé2Pr.(C< ku2/o2). 
$$\Pr(Y \le u^2) = \Pr(W \le k u^2/\sigma^2).$$

For example, suppose that $\sigma^2 = 0.09$, and we are interested in $k = 10$ cheese chunks. Furthermore, suppose that $u = 0.3$ is the critical difference of interest. We can write

$$\Pr(Y \le 0.3^2) = \Pr\left(W \le \frac{10 \times 0.09}{0.09}\right) = \Pr(W \le 10). \tag{8.2.2}$$

Using the table of quantiles of the $\chi^2$ distribution with 10 degrees of freedom, we see that 10 is between the 0.5 and 0.6 quantiles. In fact, the probability in Eq. (8.2.2) can be found by computer software to equal 0.56, so there is a 44 percent chance that the average squared difference between lactic acid concentration and mean concentration in 10 chunks will be more than the desired amount. If this probability is too large, the manufacturer might wish to invest some effort in reducing the variance of lactic acid concentration.

Summary

The chi-square distribution with $m$ degrees of freedom is the same as the gamma distribution with parameters $m/2$ and $1/2$.
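Because a $\chi^2$ distribution is a gamma distribution with shape $m/2$, its c.d.f. has a closed form whenever $m$ is even. The sketch below relies on that standard fact (it is not derived in the text) to reproduce the value 0.56 from Eq. (8.2.2) in Example 8.2.3; the function name `chi2_cdf_even` is hypothetical.

```python
import math


def chi2_cdf_even(x: float, m: int) -> float:
    """C.d.f. of the chi-square distribution with an even number m of degrees of freedom.

    For even m, the gamma c.d.f. with shape m/2 and rate 1/2 reduces to
    1 - exp(-x/2) * sum_{j=0}^{m/2 - 1} (x/2)^j / j!.
    """
    if m <= 0 or m % 2 != 0:
        raise ValueError("this closed form needs a positive even m")
    if x <= 0:
        return 0.0
    half = x / 2.0
    partial = sum(half**j / math.factorial(j) for j in range(m // 2))
    return 1.0 - math.exp(-half) * partial


sigma2 = 0.09  # variance of the lactic acid concentration in Example 8.2.3
k = 10         # number of cheese chunks
u = 0.3        # critical difference

prob = chi2_cdf_even(k * u**2 / sigma2, k)  # Pr(W <= 10) with 10 degrees of freedom
print(round(prob, 2))  # 0.56
```

For odd or non-integer degrees of freedom this shortcut does not apply, and a statistical package's c.d.f. routine would be the natural tool.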
It is the distribution of the sum of squares of a sample of $m$ independent standard normal random variables. The mean of the $\chi^2$ distribution with $m$ degrees of freedom is $m$, and the variance is $2m$.

Exercises

1. Suppose that we will sample 20 chunks of cheese in Example 8.2.3. Let $T = \sum_{i=1}^{20} (X_i - \mu)^2/20$, where $X_i$ is the concentration of lactic acid in the $i$th chunk. Assume that $\sigma^2 = 0.09$. What number $c$ satisfies $\Pr(T \le c) = 0.9$?

2. Find the mode of the $\chi^2$ distribution with $m$ degrees of freedom $(m = 1, 2, \ldots)$.

3. Sketch the p.d.f. of the $\chi^2$ distribution with $m$ degrees of freedom for each of the following values of $m$. Locate the mean, the median, and the mode on each sketch. (a) $m = 1$; (b) $m = 2$; (c) $m = 3$; (d) $m = 4$.

4. Suppose that a point $(X, Y)$ is to be chosen at random in the $xy$-plane, where $X$ and $Y$ are independent random variables and each has the standard normal distribution. If a circle is drawn in the $xy$-plane with its center at the origin, what is the radius of the smallest circle that can be chosen in order for there to be probability 0.99 that the point $(X, Y)$ will lie inside the circle?

5. Suppose that a point $(X, Y, Z)$ is to be chosen at random in three-dimensional space, where $X$, $Y$, and $Z$ are independent random variables and each has the standard normal distribution. What is the probability that the distance from the origin to the point will be less than 1 unit?

6. When the motion of a microscopic particle in a liquid or a gas is observed, it is seen that the motion is irregular because the particle collides frequently with other particles. The probability model for this motion, which is called Brownian motion, is as follows: A coordinate system is chosen in the liquid or gas. Suppose that the particle is at the origin of this coordinate system at time $t = 0$, and let $(X, Y, Z)$ denote the coordinates of the particle at any time $t > 0$. The random variables $X$, $Y$, and $Z$ are i.i.d., and each of them has the normal distribution with mean 0 and variance $\sigma^2 t$. Find the probability that at time $t = 2$ the particle will lie within a sphere whose center is at the origin and whose radius is $4\sigma$.

7. Suppose that the random variables $X_1, \ldots, X_n$ are independent, and each random variable $X_i$ has a continuous c.d.f. $F_i$. Also, let the random variable $Y$ be defined by the relation $Y = -2 \sum_{i=1}^{n} \log F_i(X_i)$. Show that $Y$ has the $\chi^2$ distribution with $2n$ degrees of freedom.

8. Suppose that $X_1, \ldots, X_n$ form a random sample from the uniform distribution on the interval $[0, 1]$, and let $W$ denote the range of the sample, as defined in Example 3.9.7. Also, let $g_n(x)$ denote the p.d.f. of the random variable $2n(1 - W)$, and let $g(x)$ denote the p.d.f. of the $\chi^2$ distribution with four degrees of freedom. Show that
$$\lim_{n \to \infty} g_n(x) = g(x) \quad \text{for } x > 0.$$

9. Suppose that $X_1, \ldots, X_n$ form a random sample from the normal distribution with mean $\mu$ and variance $\sigma^2$. Find the distribution of
$$\frac{n(\bar X_n - \mu)^2}{\sigma^2}.$$

10. Suppose that six random variables $X_1, \ldots, X_6$ form a random sample from the standard normal distribution, and let
$$Y = (X_1 + X_2 + X_3)^2 + (X_4 + X_5 + X_6)^2.$$
Determine a value of $c$ such that the random variable $cY$ will have a $\chi^2$ distribution.

11. If a random variable $X$ has the $\chi^2$ distribution with $m$ degrees of freedom, then the distribution of $X^{1/2}$ is called a chi ($\chi$) distribution with $m$ degrees of freedom. Determine the mean of this distribution.

12. Consider again the situation described in Example 8.2.3. How small would $\sigma^2$ need to be in order for $\Pr(Y \le 0.09) \ge 0.9$?

13. Prove that the distribution of $\hat\sigma_0^2$ in Examples 8.2.1 and 8.2.2 is the gamma distribution with parameters $n/2$ and $n/(2\sigma^2)$.

8.3 Joint Distribution of the Sample Mean and Sample Variance

Suppose that our data form a random sample from a normal distribution. The sample mean $\hat\mu$ and sample variance $\hat\sigma^2$ are important statistics that are computed in order to estimate the parameters of the normal distribution. Their marginal
Their marginal marginais nos ajudam a entender quéo boa cada uma delas é como estimador do distributions help us to understand how good each of them is as an estimator of pardmetro correspondente. No entanto, a distribuic¢go marginal de p’depende the corresponding parameter. However, the marginal distribution of ju depends em o.A distribuiggo conjunta de pre o2nos permitird fazer inferéncias sobre [ ono. The joint distribution of (i and o? will allow us to make inferences about ju sem referéncia a Oo. without reference to o. Independéncia da média amostral e variancia amostral Independence of the Sample Mean and Sample Variance Exemplo Chuva de nuvens semeadas.Simpson, Olsen e Eden (1975) descrevem um experimento Example Rain from Seeded Clouds. Simpson, Olsen, and Eden (1975) describe an experiment 8.3.1 no qual uma amostra aleatoria de 26 nuvens foi semeada com nitrato de prata para ver se produziam 8.3.1 in which a random sample of 26 clouds were seeded with silver nitrate to see if they mais chuva do que nuvens ndo semeadas. Suponha que, em uma escala logaritmica, as nuvens nao produced more rain than unseeded clouds. Suppose that, on a log scale, unseeded semeadas normalmente produzissem uma precipitagdo média de 4. Ao comparar a média das nuvens clouds typically produced a mean rainfall of 4. In comparing the mean of the seeded semeadas com a média nao semeada, pode-se ver naturalmente até que ponto a precipitacdo média clouds to the unseeded mean, one might naturally see how far the average log-rainfall logaritmica das nuvens semeadas/é de 4. Mas a variacdo da precipitacdo dentro da amostra também of the seeded clouds (i is from 4. But the variation in rainfall within the sample is also é importante. Por exemplo, se compararmos duas amostras diferentes de nuvens semeadas, seria de important. 
For example, if one compared two different samples of seeded clouds, one would expect the average rainfalls in the two samples to be different just due to variation between clouds. In order to be confident that seeding the clouds really produced more rain, we would want the average log-rainfall to exceed 4 by a large amount compared to the variation between samples, which is closely related to the variation within samples. Since we do not know the variance for seeded clouds, we compute the sample variance $\hat\sigma^2$. Comparing $\hat\mu - 4$ to $\hat\sigma^2$ requires us to consider the joint distribution of the sample mean and the sample variance.

Suppose that $X_1, \ldots, X_n$ form a random sample from the normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2$. Then, as was shown in Example 7.5.6, the M.L.E.'s of $\mu$ and $\sigma^2$ are the sample mean $\bar X_n$ and the sample variance $(1/n) \sum_{i=1}^{n} (X_i - \bar X_n)^2$. In this section, we shall derive the joint distribution of these two estimators.
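For concreteness, the two estimators can be computed as follows. This is a minimal sketch using Python's standard `statistics` module; the data values are invented for illustration and have nothing to do with the cloud-seeding experiment.

```python
from statistics import fmean, pvariance


def normal_mles(sample):
    """Return the M.L.E.'s of the mean and variance of a normal sample.

    The variance estimate divides the sum of squared deviations by n,
    which is what pvariance computes; statistics.variance would divide
    by n - 1 instead and is not the M.L.E.
    """
    mean = fmean(sample)
    return mean, pvariance(sample, mu=mean)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up numbers
print(normal_mles(data))  # (5.0, 4.0)
```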
We already know from Corollary 5.6.2 that the sample mean itself has the normal distribution with mean $\mu$ and variance $\sigma^2/n$. We shall establish the noteworthy property that the sample mean and the sample variance are independent random variables, even though both are functions of the same random variables $X_1, \dots, X_n$. Furthermore, we shall show that, except for a scale factor, the sample variance has the $\chi^2$ distribution with $n - 1$ degrees of freedom. More precisely, we shall show that the random variable $\sum_{i=1}^n (X_i - \bar X_n)^2/\sigma^2$ has the $\chi^2$ distribution with $n - 1$ degrees of freedom. This result is also a rather striking property of random samples from a normal distribution, as the following discussion indicates.
Because the random variables $X_1, \dots, X_n$ are independent, and because each has the normal distribution with mean $\mu$ and variance $\sigma^2$, the random variables $(X_1 - \mu)/\sigma, \dots, (X_n - \mu)/\sigma$ are also independent, and each of these variables has the standard normal distribution. It follows from Corollary 8.2.1 that the sum of their squares $\sum_{i=1}^n (X_i - \mu)^2/\sigma^2$ has the $\chi^2$ distribution with $n$ degrees of freedom. Hence, the striking property mentioned in the previous paragraph is that if the population mean $\mu$ is replaced by the sample mean $\bar X_n$ in this sum of squares, the effect is simply to reduce the degrees of freedom in the $\chi^2$ distribution from $n$ to $n - 1$. In summary, we shall establish the following theorem.

Theorem 8.3.1 Suppose that $X_1, \dots, X_n$ form a random sample from the normal distribution with mean $\mu$ and variance $\sigma^2$.
Then the sample mean $\bar X_n$ and the sample variance $(1/n)\sum_{i=1}^n (X_i - \bar X_n)^2$ are independent random variables, $\bar X_n$ has the normal distribution with mean $\mu$ and variance $\sigma^2/n$, and $\sum_{i=1}^n (X_i - \bar X_n)^2/\sigma^2$ has the $\chi^2$ distribution with $n - 1$ degrees of freedom.

Furthermore, it can be shown that the sample mean and the sample variance are independent only when the random sample is drawn from a normal distribution. We shall not consider this result further in this book. However, it does emphasize the fact that the independence of the sample mean and the sample variance is indeed a noteworthy property of samples from a normal distribution.

The proof of Theorem 8.3.1 makes use of transformations of several variables as described in Sec. 3.9 and of the properties of orthogonal matrices. The proof appears at the end of this section.

Example 8.3.2 Rain from Seeded Clouds. Figure 8.3 is a histogram of the logarithms of the rainfalls from the seeded clouds in Example 8.3.1.
Suppose that these logarithms $X_1, \dots, X_{26}$ are modeled as i.i.d. normal random variables with mean $\mu$ and variance $\sigma^2$. If we are interested in how much variation there is in rainfall among the seeded clouds, we can compute the sample variance $\hat\sigma^2 = \sum_{i=1}^{26} (X_i - \bar X_n)^2/26$. The distribution of $U = 26\hat\sigma^2/\sigma^2$ is the $\chi^2$ distribution with 25 degrees of freedom. We can use this distribution to tell us how likely it is that $\hat\sigma^2$ will overestimate or underestimate $\sigma^2$ by various amounts. For example, the $\chi^2$ table in this book says that the 0.25 quantile of the $\chi^2$ distribution with 25 degrees of freedom is 19.94, so $\Pr(U \le 19.94) = 0.25$.

Figure 8.3 Histogram of log-rainfalls from seeded clouds. Horizontal axis: log(rainfall).

It follows that

$$0.25 = \Pr\left(\frac{\hat\sigma^2}{\sigma^2} \le \frac{19.94}{26}\right) = \Pr\left(\hat\sigma^2 \le 0.77\,\sigma^2\right). \tag{8.3.1}$$

That is, there is probability 0.25 that $\hat\sigma^2$ will underestimate $\sigma^2$ by 23 percent or more. The observed value of $\hat\sigma^2$ is 2.460 in this example.
The probability calculated in Eq. (8.3.1) has nothing to do with how far 2.460 is from $\sigma^2$. Eq. (8.3.1) tells us the probability (prior to observing the data) that $\hat\sigma^2$ would be at least 23% below $\sigma^2$.

Estimation of the Mean and Standard Deviation

We shall assume that $X_1, \dots, X_n$ form a random sample from the normal distribution with unknown mean $\mu$ and unknown standard deviation $\sigma$. Also, as usual, we shall denote the M.L.E.'s of $\mu$ and $\sigma$ by $\hat\mu$ and $\hat\sigma$. Thus,

$$\hat\mu = \bar X_n \quad\text{and}\quad \hat\sigma = \left[\frac{1}{n}\sum_{i=1}^n (X_i - \bar X_n)^2\right]^{1/2}.$$

Notice that $\hat\sigma^2 = \widehat{\sigma^2}$, the M.L.E. of $\sigma^2$. For the remainder of this book, when referring to the M.L.E. of $\sigma^2$, we shall use whichever symbol $\hat\sigma^2$ or $\widehat{\sigma^2}$ is more convenient. As an illustration of the application of Theorem 8.3.1, we shall now determine the smallest possible sample size $n$ such that the following relation will be satisfied:

$$\Pr\left(|\hat\mu - \mu| < \tfrac{1}{5}\sigma \ \text{ and } \ |\hat\sigma - \sigma| < \tfrac{1}{5}\sigma\right) \ge \frac{1}{2}. \tag{8.3.2}$$

In other words, we shall determine the minimum sample size $n$ for which the probability will be at least 1/2 that neither $\hat\mu$ nor $\hat\sigma$ will differ from the unknown value it is estimating by more than $(1/5)\sigma$.

Because of the independence of $\hat\mu$ and $\hat\sigma$, the relation (8.3.2) can be rewritten as follows:

$$\Pr\left(|\hat\mu - \mu| < \tfrac{1}{5}\sigma\right)\Pr\left(|\hat\sigma - \sigma| < \tfrac{1}{5}\sigma\right) \ge \frac{1}{2}. \tag{8.3.3}$$

If we let $p_1$ denote the first probability on the left side of the relation (8.3.3), and let $U$ be a random variable that has the standard normal distribution, this probability can be written in the following form:

$$p_1 = \Pr\left(\frac{\sqrt{n}\,|\hat\mu - \mu|}{\sigma} < \frac{1}{5}\sqrt{n}\right) = \Pr\left(|U| < \frac{1}{5}\sqrt{n}\right).$$

Similarly, if we let $p_2$ denote the second probability on the left side of the relation (8.3.3), and let $V = n\hat\sigma^2/\sigma^2$, this probability can be written in the following form:

$$p_2 = \Pr\left(0.8 < \frac{\hat\sigma}{\sigma} < 1.2\right) = \Pr\left(0.64n < \frac{n\hat\sigma^2}{\sigma^2} < 1.44n\right) = \Pr(0.64n < V < 1.44n).$$

By Theorem 8.3.1, the random variable $V$ has the $\chi^2$ distribution with $n - 1$ degrees of freedom.
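The probabilities $p_1$ and $p_2$ can also be evaluated numerically rather than read from tables. The following is a minimal sketch using only the Python standard library. The function names (`std_normal_cdf`, `chi2_cdf`, `p1`, `p2`, `smallest_n`) are illustrative and do not come from the text, and the $\chi^2$ c.d.f. is computed with a power series for the regularized incomplete gamma function, which is an implementation choice assumed here. Values computed this way can differ in the second decimal place from values obtained by table lookup.

```python
import math


def std_normal_cdf(z):
    """C.d.f. of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chi2_cdf(x, df):
    """C.d.f. of the chi-square distribution with df degrees of freedom,
    using the series P(a, z) = z^a e^(-z) * sum_k z^k / Gamma(a + k + 1)
    with a = df / 2 and z = x / 2."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total = term
    for k in range(1, 10_000):
        term *= z / (a + k)
        total += term
        if term < 1e-16 * total:
            break
    return min(total, 1.0)


def p1(n):
    """Pr(|U| < sqrt(n) / 5) for standard normal U."""
    return 2.0 * std_normal_cdf(math.sqrt(n) / 5.0) - 1.0


def p2(n):
    """Pr(0.64 n < V < 1.44 n) for V chi-square with n - 1 d.f."""
    return chi2_cdf(1.44 * n, n - 1) - chi2_cdf(0.64 * n, n - 1)


def smallest_n(target=0.5, n_max=200):
    """First n >= 2 with p1(n) * p2(n) >= target, or None if not found."""
    for n in range(2, n_max + 1):
        if p1(n) * p2(n) >= target:
            return n
    return None


print(p1(21), p2(21), p1(21) * p2(21))
print(smallest_n())
```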
For each specific value of $n$, the values of $p_1$ and $p_2$ can be found, at least approximately, from the table of the standard normal distribution and the table of the $\chi^2$ distribution given at the end of this book. In particular, after various values of $n$ have been tried, it will be found that for $n = 21$ the values of $p_1$ and $p_2$ are $p_1 = 0.64$ and $p_2 = 0.78$. Hence, $p_1 p_2 = 0.50$, and it follows that the relation (8.3.2) will be satisfied for $n = 21$.

Proof of Theorem 8.3.1

We already knew from Corollary 5.6.2 that the distribution of the sample mean was as stated in Theorem 8.3.1. What remains to prove is the stated distribution of the sample variance and the independence of the sample mean and sample variance.

Orthogonal Matrices

We begin with some properties of orthogonal matrices that are essential for the proof.

Definition 8.3.1 Orthogonal Matrix. It is said that an $n \times n$ matrix $A$ is orthogonal if $A^{-1} = A'$, where $A'$ is the transpose of $A$.
In other words, a matrix $A$ is orthogonal if and only if $AA' = A'A = I$, where $I$ is the $n \times n$ identity matrix. It follows from this latter property that a matrix is orthogonal if and only if the sum of the squares of the elements in each row is 1 and the sum of the products of the corresponding elements in every pair of different rows is 0. Alternatively, a matrix is orthogonal if and only if the sum of the squares of the elements in each column is 1 and the sum of the products of the corresponding elements in every pair of different columns is 0.

Properties of Orthogonal Matrices We shall now derive two important properties of orthogonal matrices.

Theorem 8.3.2 Determinant is 1. If $A$ is orthogonal, then $|\det A| = 1$.

Proof To prove this result, it should be recalled that $\det A = \det A'$ for every square matrix $A$. Also recall that $\det AB = (\det A)(\det B)$ for square matrices $A$ and $B$. Therefore,

$$\det(AA') = (\det A)(\det A') = (\det A)^2.$$

Also, if $A$ is orthogonal, then $AA' = I$, and it follows that

$$\det(AA') = \det I = 1.$$
Hence $(\det A)^2 = 1$ or, equivalently, $|\det A| = 1$.

Theorem 8.3.3 Squared Length Is Preserved. Consider two $n$-dimensional random vectors

$$X = \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix} \quad\text{and}\quad Y = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix}, \tag{8.3.4}$$

and suppose that $Y = AX$, where $A$ is an orthogonal matrix. Then

$$\sum_{i=1}^n Y_i^2 = \sum_{i=1}^n X_i^2. \tag{8.3.5}$$

Proof This result follows from the fact that $A'A = I$, because

$$\sum_{i=1}^n Y_i^2 = Y'Y = X'A'AX = X'X = \sum_{i=1}^n X_i^2.$$

Multiplication of a vector $X$ by an orthogonal matrix $A$ corresponds to a rotation of $X$ in $n$-dimensional space possibly followed by changing the signs of some coordinates. Neither of these operations can change the length of the original vector $X$, and that length equals $\left(\sum_{i=1}^n X_i^2\right)^{1/2}$.

Together, these two properties of orthogonal matrices imply that if a random vector $Y$ is obtained from a random vector $X$ by an orthogonal linear transformation $Y = AX$, then the absolute value of the Jacobian of the transformation is 1 and $\sum_{i=1}^n Y_i^2 = \sum_{i=1}^n X_i^2$.
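The definition and the two properties above can be checked numerically on a small case. The sketch below is illustrative only: the matrix `A`, the vector `x`, and all helper names are assumptions made for this example, and the determinant helper handles the $3 \times 3$ case only.

```python
def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def is_orthogonal(m, tol=1e-12):
    """Check A A' = I entry by entry, up to a small tolerance."""
    n = len(m)
    prod = matmul(m, transpose(m))
    return all(abs(prod[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def squared_length(v):
    return sum(x * x for x in v)


# A rotation in the first two coordinates followed by a sign change
# in the third coordinate (example matrix chosen for illustration).
A = [[0.6, 0.8, 0.0],
     [-0.8, 0.6, 0.0],
     [0.0, 0.0, -1.0]]
x = [1.5, -2.0, 0.25]
y = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

print(is_orthogonal(A))                        # rows are orthonormal
print(abs(det3(A)))                            # Theorem 8.3.2
print(squared_length(x), squared_length(y))    # Theorem 8.3.3
```

The sign change in the third row makes the determinant negative, which is why Theorem 8.3.2 is stated for the absolute value.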
We combine Theorems 8.3.2 and 8.3.3 to obtain a useful fact about orthogonal transformations of a random sample of standard normal random variables.

Theorem 8.3.4 Suppose that the random variables $X_1, \dots, X_n$ are i.i.d. and each has the standard normal distribution. Suppose also that $A$ is an orthogonal $n \times n$ matrix, and $Y = AX$. Then the random variables $Y_1, \dots, Y_n$ are also i.i.d., each also has the standard normal distribution, and $\sum_{i=1}^n X_i^2 = \sum_{i=1}^n Y_i^2$.

Proof The joint p.d.f. of $X_1, \dots, X_n$ is as follows, for $-\infty < x_i < \infty$ $(i = 1, \dots, n)$:

$$f_n(x) = \frac{1}{(2\pi)^{n/2}} \exp\left(-\frac{1}{2}\sum_{i=1}^n x_i^2\right). \tag{8.3.6}$$

Suppose that $A$ is an orthogonal $n \times n$ matrix and that the random variables $Y_1, \dots, Y_n$ are defined by the relation $Y = AX$, where the vectors $X$ and $Y$ are as specified in Eq. (8.3.4). This is a linear transformation, so the joint p.d.f. of $Y_1, \dots, Y_n$ is obtained from Eq. (3.9.20) and equals

$$g_n(y) = \frac{1}{|\det A|}\, f_n(A^{-1}y).$$

Let $x = A^{-1}y$.
Since $A$ is orthogonal, $|\det A| = 1$ and $\sum_{i=1}^n y_i^2 = \sum_{i=1}^n x_i^2$, as we just proved. So,

$$g_n(y) = \frac{1}{(2\pi)^{n/2}} \exp\left(-\frac{1}{2}\sum_{i=1}^n y_i^2\right). \tag{8.3.7}$$

It can be seen from Eq. (8.3.7) that the joint p.d.f. of $Y_1, \dots, Y_n$ is exactly the same as the joint p.d.f. of $X_1, \dots, X_n$.

Proof of Theorem 8.3.1

Random Samples from the Standard Normal Distribution We shall begin by proving Theorem 8.3.1 under the assumption that $X_1, \dots, X_n$ form a random sample from the standard normal distribution. Consider the $n$-dimensional row vector $u$, in which each of the $n$ components has the value $1/\sqrt{n}$:

$$u = \left[\frac{1}{\sqrt{n}} \ \cdots \ \frac{1}{\sqrt{n}}\right]. \tag{8.3.8}$$

Since the sum of the squares of the $n$ components of the vector $u$ is 1, it is possible to construct an orthogonal matrix $A$ such that the components of the vector $u$ form the first row of $A$. This construction, called the Gram-Schmidt method, is described in textbooks on linear algebra such as Cullen (1972) and will not be discussed here.
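Although the text leaves the construction to linear algebra references, a short sketch may help make it concrete. The code below uses one common variant (modified Gram-Schmidt applied to $u$ followed by the standard basis vectors); this choice, the function name `gram_schmidt_with_first_row`, and the value $n = 5$ are assumptions for illustration, not the text's own procedure. It then checks two consequences of having $u$ as the first row: the first coordinate of $Y = AX$ is $\sqrt{n}\,\bar X_n$, and, because squared length is preserved (Theorem 8.3.3), the remaining squared coordinates sum to $\sum_{i=1}^n (X_i - \bar X_n)^2$.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt_with_first_row(u, tol=1e-10):
    """Extend the unit vector u to n orthonormal rows.

    Candidates are the standard basis vectors; each has its projections
    on the rows found so far removed, and is kept if a nonzero remainder
    is left."""
    n = len(u)
    rows = [list(u)]
    for j in range(n):
        v = [1.0 if k == j else 0.0 for k in range(n)]
        for r in rows:
            c = dot(v, r)
            v = [vk - c * rk for vk, rk in zip(v, r)]
        norm = math.sqrt(dot(v, v))
        if norm > tol:
            rows.append([vk / norm for vk in v])
        if len(rows) == n:
            break
    return rows


n = 5
u = [1.0 / math.sqrt(n)] * n
A = gram_schmidt_with_first_row(u)

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [dot(row, x) for row in A]
xbar = sum(x) / n

# First coordinate of Y versus sqrt(n) times the sample mean.
print(y[0], math.sqrt(n) * xbar)
# Remaining squared coordinates versus the sum of squared deviations.
print(sum(v * v for v in y[1:]), sum((xi - xbar) ** 2 for xi in x))
```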
Vamos supor que tal matrizAfoi construido, e definiremos novamente as variaveis We shall assume that such a matrix A has been constructed, and we shall again define aleatoriasS1,..., Snpela transformacgaoS= MACHADO. the random variables Y;,..., Y,, by the transformation Y = AX. Uma vez que os componentes devocéformar a primeira linha deA, segue que Since the components of u form the first row of A, it follows that yr" Vi "4 _ S1=UX= VE oy Xn. (8.3.9) Yj =uX =) =X; = vnX,,. (8.3.9) n — /n eu=1 i=l n Além disso, pelo Teorema 8.3.4, 2 eu=1 XA 2 on sgu-Portanto, Furthermore, by Theorem 8.3.4, an x? = an Y?. Therefore, ? y” »? 2 ” n n n 2 n = = 72 > 2 2 2 2 2 0+ y Seu= "Stu S21= — XteunXr= (Xeu-Xn). Soyrs doy? -¥i =o Xo -nX) = SOX, - X,,)’. eu=2 eu=1 eu=1 eu=1 i=2 i=1 i=1 i=1 Obtivemos assim a relagado We have thus obtained the relation ” ” n n Seu=" (Xeu-Xnp. (8.3.10) So v7= 0%) - X,)’. (8.3.10) eu=2 eu=1 i=2 i=l E conhecido pelo Teorema 8.3.4 que 0 aleatOrio Vay responsabilidadessi, ..., Srestao dentro- It is known from Theorem 8.3.4 that the random variables Y;,..., Y,, are in- dependente. Portanto, as duas variaveis aleatdriasSie sp S#SH0 independentes, dependent. Therefore, the two random variables Y, and })7"_, y? are independent, e segue das Eqs. (8.3.9) e (8.3.10) queXne 7 eu (Xeu-Xn)oesto dentro- and it follows from Eqs. (8.3.9) and (8.3.10) that X,, and 7"_,(X; — X,,)? are in- dependente. Além disso, €é conhecido pelo Teorema 8.3.4 que 077-1 variaveis dependent. Furthermore, it is known from Theorem 8.3.4 that the n — 1 random aleatérias&2, ..., SnSdo iid, e que cada uma dessas variaveis aleatérias tem téle esta- variables Y>,..., Y,, are i.i.d., and that each of these random variables has the stan- distribuigdo normal padrdo. Portanto, pelo Corolario 8.2.1 a variavel aleatéria 0 2 Stey dard normal distribution. Hence, by Corollary 8.2.1 the random variable }°"_, y? tem gxadistribuigao com,-1 graus de liberdade. Segue-se da Eq. (8.3.10) has the x? 
distribution with n — 1 degrees of freedom. It follows from Eq. (8.3.10) que ~ @=1(Xeu-Xnktambém tem oyzdistribuicdo comn-1 graus de liberdade. that 7" (X; - X,,)° also has the x? distribution with n — 1 degrees of freedom. Amostras aleatérias de uma distribuigao normal arbitrariaAté agora, ao provar Random Samples from an Arbitrary Normal Distribution Thus far, in proving Teorema 8.3.1, consideramos apenas amostras aleatorias da distribuigdo normal padrdo. Theorem 8.3.1, we have considered only random samples from the standard normal Suponha agora que as variaveis aleatériasXi,..., Xnformar uma amostra aleatoria a distribution. Suppose now that the random variables X,,..., X, form a random partir de uma distribuigéo normal arbitraria com médiaye variacdocz. sample from an arbitrary normal distribution with mean yw and variance o?. Se deixarmosZeu=(Xeu- L///oparaeu=1,..., 7, entéo as variaveis aleatdriasZ,..., Zn If we let Z; = (X; — w)/o fori =1,...,n, then the random variables Z,,..., Z, sdo independentes e cada um tem a distribuicao normal padrdo. Em outras palavras, a are independent, and each has the standard normal distribution. In other words, the distribuicaéo conjunta deZ1,..., Zné o mesmo que a distribuigdo conjunta de um aleatério joint distribution of Z),..., Z, is the same as the joint distribution of a random amostra da distribuigdo normal padrdo. Resulta dos resultados aN sample from the standard normal distribution. It follows from the results that have acabei de obter issoZne ust feurZnJa 2s0mos independentes’ eet (Zeur Znpe just been obtained that Z,, and )~"_,(Z; — Z,)? are independent, and }¥_4(Z; —Z,)" tem oy2distribuigdo comn-1 graus de liberdade. No entanto, Zn=(Xn-t//oe has the x? distribution with n — 1 degrees of freedom. However, Z,, = (X, — 1)/o and ” 1> n n 1 n (ZerZnp= — (XXpp. (8.3.11) (Zi = Zn)? = (K - X,)”. 
(8.3.11) O02 ; a2 # eu=1 eu=1 i=l i=1 — n — Concluimos agora que a média amostralXne a variancia da amostra(1/n) 2 eu=1 (Xeu- We now conclude that the sample mean X,, and the sample variance (1/n) )~"_,(X; — Xnbsao independentes, e que a variavel aleatéria no lado direito da Eq. (8.3.11) tem o X,,)° are independent, and that the random variable on the right side of Eq. (8.3.11) Xedistribuigdo comn-1 graus de liberdade. Todos os resultados declarados no has the x? distribution with n — 1 degrees of freedom. All the results stated in Teorema 8.3.1 foram agora estabelecidos. Theorem 8.3.1 have now been established. 8.3 Distribuicao Conjunta da Média Amostral e Variancia Amostral 479 8.3 Joint Distribution of the Sample Mean and Sample Variance 479 Resumo Summary Deixar™,..., Xnser uma amostra aleatoria da distribuigao normal com médiay/ Let X,,..., X, be a random sample from the normal distribution with mean yu e variagdooz. Entdo a média amostralLEXn=1 7 eu=iXeuve variagdo amostral and variance o”. Then the sample mean ji = X,, = ‘ >), X; and sample variance 02=1)4 eu=1(Xeu-Xnbsao varidveis aleatorias independentes. Além disso,pem o oz = 4 (Xi - X,,)* are independent random variables. Furthermore, i has the distribuigdo normal com médiape variagdoo2/n, eno2/o2ztem uma distribuigdo qui- normal distribution with mean yu and variance o*/n, and no2/o7 has a chi-square quadrado comr7-1 graus de liberdade. distribution with n — 1 degrees of freedom. Exercicios Exercises 1.Assuma issoX,..., Xnformar uma amostra aleatoria 5.Suponha que as variaveis aleatdériasXieX2sdo 1. Assume that X;,..., X,, form arandom sample from 5. Suppose that the random variables X, and X> are inde- da distribuigdo normal com médiaye variagdo oz. independentes e que cada um tem distribuigdo normal the normal distribution with mean yw and variance o”7. _ pendent, and that each has the normal distribution with Mostre issoa3tem a distribuicdo gama com parametros com médiae variagdooz. 
Prove que as variaveis Show that o2 has the gamma distribution with parameters | mean yz and variance o*. Prove that the random variables (7-1) en/2.02). aleatérias X1+X2eX1-X2sdo independentes. (n — 1)/2 and n/(202). X, +X, and X; — X> are independent. 2.Determine se cada uma das cinco matrizes aseguiré ©-Suponha queN, . .., Xnformar uma amostra aleatoria da 2. Determine whether or not each of the five following ® Suppose that Xj, ..., X,, form a random sample from ortogonal ou nao: distribuigdo normal com médiaye variacgdo oz. Supondo matrices is orthogonal: the normal distribution with mean j1 and variance o?. As- [ ] [ ] que o tamanho da amostrané 16, determine os valores suming that the sample size n is 16, determine the values 0 1 0 08 0 0.6 das seguintes probabilidades: 0 1 0 0.8 0 0.6 of the following probabilities: ad 0 1 b.-0.6 0 0.8 [ yn ] a]O 0 1 b.| -0.6 0 0.8 100 0 i 1 0 a.Pr.1 20281 7 eui(Xeuw-[ps2o2 100 0-1 O a. Pr| $0? <i D(X; - pw) < 207] 4 [ _ ] 1 _ - ve VL VL 2 _— LL LL 1.2-1yu _ 2 2 Los o 06! 5 4 i b.Pri1 20813 euni(XewXnhS20 o8 0 06 Zo OR b. Prffo2 <1 (x; — ¥, 7 < 207] v1 1 1 1 c.-0.6 0 08 d,| “4 - + ol 7.Suponha que, ..., Xnformar uma amostra aleatoria da ¢. 0.6 0 08 | d. BB Va 7. Suppose that X,,..., X,, form a random sample from 005 0 wa wk al distribuigdo normal com médiaye variagdooz, e deixara2 0 05 0 ee the normal distribution with mean pz and variance o”, and 3 3 3 denota a variancia da amostra. Determine o menor v3 v3 v3 let c? denote the sample variance. Determine the smallest [ 1 1 1 1 | valores denpara 0 qual as seguintes relacées sdo satisfeitas: 1 1 1 1 values of n for which the following relations are satisfied: 2 2 2 2 ( ) 7 2 2 3 + 4~--i 1 2 a.Pr.o o<15. 2095 -2 _1 2 1 a. Pr( & < 1.5) > 0.95 elo z zr Ff | = e 2 ~2 2 2 o 1 = 1 ld 1 ( ) “} 1 I _d I ~ || b.Pré | o3-c2| <1 20220.8 2 2 "2 2 b. Pr(\o? —o|< 30?) > 0.8 _1 1 1 11 _l I I _d 2 2 2 2 2 2 2 2 oo ; 8.Suponha queXxtem oyz2distribuigdo com 200 graus de 8. 
Suppose that X has the x? distribution with 200 degrees 3.a. Construa um 2x2 matriz ortogonal para a qual a liberdade. Explique por que o teorema do limite central pode 3.a. Construct a 2 x 2 orthogonal matrix for which the of freedom. Explain why the central limit theorem can be primeira linha é a seguinte: ser usado para determinar o valor aproximado de Pr(160<X< first row is as follows: used to determine the approximate value of Pr(160 < X < 240e encontre esse valor aproximado. 240) and find this approximate value. [a vu. ph +4). 2 2 9.Suponha que cada um dos dois estatisticos,AeB, obtém v2 v2 9. Suppose that each of two statisticians, A and B, inde- independentemente uma amostra aleatéria de 20 observacées da pendently takes a random sample of 20 observations from b.Construa um 3x3 matriz ortogonal para a qual a primeira distribuigaéo normal com média desconhecidaye variancia b. Construct a 3 x 3 orthogonal matrix for which the the normal distribution with unknown mean « and known linha é a seguinte: conhecida 4. Suponha também que 0 estatisticoAfidescobre que a first row is as follows: variance 4. Suppose also that statistician A finds the sam- variancia amostral em sua amostra aleatéria é 3,8, e o estatisticoB ple variance in his random sample to be 3.8, and statis- VE ve vi]. , vn a pot + 4). . ; , 5 5 3 fiacha que a variancia amostral em sua amostra aleatéria é 9,4. Aw; Ww tician B finds the sample variance in her random sample AS h oo. | a . Para qual amostra aleatoria a média amostral provavelmente 4S hat th d ‘ables X,. X dx to be 9.4. For which random sample is the sample mean 4.UpOnna que as varlavels a eatoriasXi, X2, eX3Sa0 estara mais préxima do valor desconhecido de? 4. Suppose that the random variables X1, X2, and X are likely to be closer to the unknown value of 1? iid, e que cada um tem a distribuigdo normal padrdo. Além i.i.d., and that each has the standard normal distribution. 
disso, suponha que Also, suppose that Si= Op A+ 0.6X2, Yy = 0.8X4 + 0.6X>, S= — 2(0.3Xi- 0.4X2- 0.5.3), V Y, = V2(0.3X, — 0.4X> — 0.5X3), S3= — 20.3Xi- 0.4X2+ 0.53). Y; = V2(0.3X, — 0.4X> + 0.5X3). Encontre a distribuigdo conjunta deS1, S2, eS3. Find the joint distribution of Y,, Y>, and Y3. 480 Capitulo 8 Distribuigées amostrais de estimadores 480 Chapter 8 Sampling Distributions of Estimators 8.4 O¢Distribuigdes 8.4 The t Distributions Quando nossos dados sao uma amostra da distribuigao normal com média pe When our data are a sample from the normal distribution with mean ju and vari- varidncia o2, a distribui¢go deZ=m 2(L- p06 a distribuic¢go normal padrao, onde pr ance o”, the distribution of Z = n'/*(ji — w)/o is the standard normal distribution, é a média amostral. Se 026 desconhecido, podemos substituir o por um where ji is the sample mean. If 0” is unknown, we can replace o by an estimator estimador (semelhante ao MLE) na formula paraZ.A varidvel aleatoria resultante (similar to the M.L.E.) in the formula for Z. The resulting random variable has tem otdistribuic¢do comn-1 graus de liberdade e é util para fazer inferéncias the t distribution with n — 1 degrees of freedom and is useful for making inferences apenas sobre pl, mesmo quando pl e o2zsao desconhecidos. about yz alone even when both ys and o? are unknown. Definigdo das Distribuigdes Definition of the Distributions Exemplo Chuva de nuvens semeadas.Considere a mesma amostra de medicées logaritmicas de precipitacdo Example Rain from Seeded Clouds. Consider the same sample of log-rainfall measurements 8.4.1 de 26 nuvens semeadas do Exemplo 8.3.2. Suponha agora que estamos interessados 8.4.1 from 26 seeded clouds from Example 8.3.2. Suppose now that we are interested in em saber até que ponto a média amostralXndessas medicées é da médiay/. Nés how far the sample average X,, of those measurements is from the mean jz. We know sabemos issom2(Xn-/otem a distribuigdo normal padrdo, mas ndo sabemoso. 
that $n^{1/2}(\bar{X}_n - \mu)/\sigma$ has the standard normal distribution, but we do not know $\sigma$. If we replace $\sigma$ by an estimator $\hat{\sigma}$ such as the M.L.E., or something similar, what is the distribution of $n^{1/2}(\bar{X}_n - \mu)/\hat{\sigma}$, and how can we make use of this random variable to make inferences about $\mu$? ◄

In this section, we shall introduce and discuss another family of distributions, called the t distributions, which are closely related to random samples from a normal distribution. The t distributions, like the $\chi^2$ distributions, have been widely applied in important problems of statistical inference. The t distributions are also known as Student's distributions (see Student, 1908), in honor of W. S. Gosset, who published his studies of this distribution in 1908 under the pen name "Student." The distributions are defined as follows.

Definition 8.4.1 t Distributions. Consider two independent random variables $Y$ and $Z$, such that $Y$ has the $\chi^2$ distribution with $m$ degrees of freedom and $Z$ has the standard normal
Suponha que uma variavel aleatdériaXé definido pela equacgdo distribution. Suppose that a random variable X is defined by the equation Z Z Y eu (< ) Entdo a distribuigdo deXé chamado detdistribui¢ao comeugraus de liberdade. Then the distribution of X is called the t distribution with m degrees of freedom. A derivagao do pdf doddistribuigdo comeugraus de liberdade faz uso dos The derivation of the p.d.f. of the ¢ distribution with m degrees of freedom makes métodos da Seg. 3.9 e sera fornecido no final desta secdo. Mas declaramos o use of the methods of Sec. 3.9 and will be given at the end of this section. But we state resultado aqui. the result here. Teorema Fungdo densidade de probabilidade.O pdf doddistribuigdo com eugraus de liberdade Theorem Probability Density Function. The p.d.f. of the t distribution with m degrees of freedom 8.4.1 é CX 8.4.1 is eutl ) - (myn r("#) 3\ —m+/2 2 x2 2 x ———_ H+ — para -~< x <oo, (8.4.2) {14+ — for — 00 <x <0. (8.4.2) (mm2 F eu (mx)'/7T (4) m Momentos das Distribuig6ées tEembora a média dotdistribuicdo nado existe Moments of the t Distributions Although the mean of the ¢ distribution does not quandoeus1, a média existe para cada valor dem >1. E claro que sempre que a exist when m < 1, the mean does exist for every value of m > 1. Of course, whenever média existe, seu valor é 0 devido a simetria datdistribuicdo. the mean does exist, its value is 0 because of the symmetry of the r distribution. 8.4 OdDistribuicgdes 481 8.4 Thet Distributions 481 Em geral, se uma variavel aleatoriaXtem oddistribuigdo comeugraus de liberdade (m > In general, if a random variable X has the ¢ distribution with m degrees of 1), entéo pode-se mostrar queF/(| X| k) <eparak <me essaF/(| X| kK »parakzeu. Seeué um freedom (m > 1), then it can be shown that E(|X|*) < 00 fork < mand that E(|X|) = numero inteiro, o primeiroeu+1 momentos deXexistem, mas ndo existem momentos de co for k > m. 
If $m$ is an integer, the first $m - 1$ moments of $X$ exist, but no moments of higher order exist. It follows, therefore, that the m.g.f. of $X$ does not exist.

It can be shown (see Exercise 1 at the end of this section) that if $X$ has the t distribution with $m$ degrees of freedom ($m > 2$), then $\operatorname{Var}(X) = m/(m - 2)$.

Relation to Random Samples from a Normal Distribution

Example 8.4.2 Rain from Seeded Clouds. Return to Example 8.4.1. We have already seen that $Z = n^{1/2}(\bar{X}_n - \mu)/\sigma$ has the standard normal distribution. Furthermore, Theorem 8.3.1 says that $\bar{X}_n$ (and hence $Z$) is independent of $Y = n\hat{\sigma}^2/\sigma^2$, which has the $\chi^2$ distribution with $n - 1$ degrees of freedom. It follows that $Z/(Y/[n - 1])^{1/2}$ has the t distribution with $n - 1$ degrees of freedom. We shall show how to use this fact after stating the general version of this result. ◄

Theorem 8.4.2 Suppose that $X_1, \ldots, X_n$ form a random sample from the normal distribution with mean $\mu$ and variance $\sigma^2$. Let $\bar{X}_n$ denote the sample mean, and define
$$\sigma' = \left[\frac{\sum_{i=1}^{n}(X_i - \bar{X}_n)^2}{n - 1}\right]^{1/2}. \tag{8.4.3}$$
Then $n^{1/2}(\bar{X}_n - \mu)/\sigma'$ has the t distribution with $n - 1$ degrees of freedom.

Proof Define $S_n^2 = \sum_{i=1}^{n}(X_i - \bar{X}_n)^2$. Next, define $Z = n^{1/2}(\bar{X}_n - \mu)/\sigma$ and $Y = S_n^2/\sigma^2$. It follows from Theorem 8.3.1 that $Y$ and $Z$ are independent, $Y$ has the $\chi^2$ distribution with $n - 1$ degrees of freedom, and $Z$ has the standard normal distribution. Finally, define $U$ by
$$U = \frac{Z}{\left(\dfrac{Y}{n-1}\right)^{1/2}}.$$
It follows from the definition of the t distribution that $U$ has the t distribution with $n - 1$ degrees of freedom. It is easily seen that $U$ can be rewritten as
$$U = \frac{n^{1/2}(\bar{X}_n - \mu)}{\left(\dfrac{S_n^2}{n-1}\right)^{1/2}}. \tag{8.4.4}$$
The denominator of the expression on the right side of Eq. (8.4.4) is easily recognized as $\sigma'$ defined in Eq. (8.4.3). ∎

The first rigorous proof of Theorem 8.4.2 was given by R. A. Fisher in 1923.

One important aspect of Eq. (8.4.4) is that neither the value of $U$ nor the distribution of $U$ depends on the value of the variance $\sigma^2$. In Example 8.4.1, we tried replacing $\sigma$ in the random variable $Z = n^{1/2}(\bar{X}_n - \mu)/\sigma$ by $\hat{\sigma}$. Instead, Theorem 8.4.2 suggests that we should replace $\sigma$ by $\sigma'$ defined in Eq. (8.4.3).
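The claim that the value of $U$ does not depend on $\sigma$ can be illustrated with a small sketch. It computes $\sigma'$ from Eq. (8.4.3) and $U$ from Eq. (8.4.4), then stretches every deviation from $\mu$ by a constant factor, which mimics drawing the same standardized sample with a different $\sigma$. The data values, the helper names, and the `rescale` device are all hypothetical and chosen only for illustration.

```python
import math


def sigma_prime(xs):
    """Estimator sigma' of Eq. (8.4.3): root of S_n^2 / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs)
    return math.sqrt(s2 / (n - 1))


def u_statistic(xs, mu):
    """U = n^(1/2) (sample mean - mu) / sigma', as in Eq. (8.4.4)."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(n) * (xbar - mu) / sigma_prime(xs)


def rescale(xs, mu, factor):
    """Stretch each deviation from mu by factor (same effect as multiplying sigma)."""
    return [mu + factor * (x - mu) for x in xs]


if __name__ == "__main__":
    data = [4.1, 5.3, 3.8, 6.0, 5.1]  # made-up observations
    mu = 5.0                           # assumed true mean
    print(u_statistic(data, mu))
    print(u_statistic(rescale(data, mu, 7.5), mu))  # same value up to rounding
```

Both the numerator and $\sigma'$ scale by the same factor, so the ratio is unchanged; this is the algebraic reason the distribution of $U$ is free of $\sigma$.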
If we replace $\sigma$ by $\sigma'$, we produce the random variable $U$ in Eq. (8.4.4) that does not involve $\sigma$ and also has a distribution that does not depend on $\sigma$.

The reader should notice that $\sigma'$ differs from the M.L.E. $\hat{\sigma}$ of $\sigma$ by a constant factor,
$$\sigma' = \left(\frac{S_n^2}{n-1}\right)^{1/2} = \left(\frac{n}{n-1}\right)^{1/2}\hat{\sigma}. \tag{8.4.5}$$
It can be seen from Eq. (8.4.5) that for large values of $n$ the estimators $\sigma'$ and $\hat{\sigma}$ will be very close to each other. The estimator $\sigma'$ will be discussed further in Sec. 8.7.

If the sample size $n$ is large, the probability that the estimator $\sigma'$ will be close to $\sigma$ is high. Hence, replacing $\sigma$ by $\sigma'$ in the random variable $Z$ will not greatly change the standard normal distribution of $Z$. For this reason, it is plausible that the t distribution with $n - 1$ degrees of freedom should be close to the standard normal distribution if $n$ is large. We shall return to this point more formally later in this section.

Example 8.4.3 Rain from Seeded Clouds. Return to Example 8.4.2.
Under the assumption that the observations $X_1, \ldots, X_n$ (log-rainfalls) are independent with common normal distribution, the distribution of $U = n^{1/2}(\bar{X}_n - \mu)/\sigma'$ is the t distribution with $n - 1$ degrees of freedom. With $n = 26$, the table of the t distribution tells us that the 0.9 quantile of the t distribution with 25 degrees of freedom is 1.316, so $\Pr(U \le 1.316) = 0.9$. It follows that
$$\Pr\left(\bar{X}_n \le \mu + 0.2581\,\sigma'\right) = 0.9,$$
because $1.316/(26)^{1/2} = 0.2581$. That is, the probability is 0.9 that $\bar{X}_n$ will be no more than 0.2581 times $\sigma'$ above $\mu$. Of course, $\sigma'$ is a random variable as well as $\bar{X}_n$, so this result is not as informative as we might have hoped. In Sections 8.5 and 8.6, we will show how to make use of the t distribution to make some standard inferences about the unknown mean $\mu$. ◄

Relation to the Cauchy Distribution and to the Standard Normal Distribution

It can be seen from Eq. (8.4.2) (and Fig. 8.4) that the p.d.f. $g(x)$ is a symmetric, bell-shaped function with its maximum value at $x = 0$.
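The two numbers quoted in Example 8.4.3 can be checked without a table. The sketch below, which assumes nothing beyond the density in Eq. (8.4.2), integrates that density numerically to approximate the c.d.f. with 25 degrees of freedom at 1.316 and then recomputes the constant $1.316/(26)^{1/2}$. The function names and the Simpson integration are illustrative choices, not the method used in the text, which reads the quantile from a table.

```python
import math


def t_pdf(x: float, m: float) -> float:
    """p.d.f. of the t distribution with m degrees of freedom, Eq. (8.4.2)."""
    log_const = (math.lgamma((m + 1) / 2) - math.lgamma(m / 2)
                 - 0.5 * math.log(m * math.pi))
    return math.exp(log_const - (m + 1) / 2 * math.log1p(x * x / m))


def t_cdf(x: float, m: float, steps: int = 2000) -> float:
    """Approximate c.d.f.: 1/2 plus the Simpson integral of the density from 0 to |x|."""
    b = abs(x)
    h = b / steps
    total = t_pdf(0.0, m) + t_pdf(b, m)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, m)
    half = total * h / 3
    # The density is symmetric about 0, so the c.d.f. at -x mirrors the c.d.f. at x.
    return 0.5 + half if x >= 0 else 0.5 - half


if __name__ == "__main__":
    n = 26
    quantile = 1.316                   # 0.9 quantile of t with 25 d.f., taken from the example
    print(t_cdf(quantile, n - 1))      # close to 0.9
    print(quantile / math.sqrt(n))     # close to 0.2581
```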
Thus, its general shape is similar to that of the p.d.f. of a normal distribution with mean 0. However, as $x \to \infty$ or $x \to -\infty$, the tails of the p.d.f. $g(x)$ approach 0 much more slowly than do the tails of the p.d.f. of a normal distribution. In fact, it can be seen from Eq. (8.4.2) that the t distribution with one degree of freedom is the Cauchy distribution, which was defined in Example 4.1.8. The p.d.f. of the Cauchy distribution was sketched in Fig. 4.3. It was shown in Example 4.1.8 that the mean of the Cauchy distribution does not exist, because the integral that specifies the value of the mean is not absolutely convergent. It follows that, although the p.d.f. of the t distribution with one degree of freedom is symmetric with respect to the point $x = 0$, the mean of this distribution does not exist.

It can also be shown from Eq. (8.4.2) that, as $n \to \infty$, the p.d.f. $g(x)$ converges to the p.d.f. $\phi(x)$ of the standard normal distribution for every value of $x$ ($-\infty < x < \infty$).
This follows from Theorem 5.3.3 and the following result:
$$\lim_{m \to \infty} \frac{\Gamma\!\left(m + \frac{1}{2}\right)}{\Gamma(m)\, m^{1/2}} = 1. \tag{8.4.6}$$

Figure 8.4 p.d.f.'s of standard normal and t distributions. [Plot of density against x with four curves: normal, Cauchy, 5 degrees of freedom, 20 degrees of freedom.]

(See Exercise 7 for a way to prove the above result.) Hence, when $n$ is large, the t distribution with $n$ degrees of freedom can be approximated by the standard normal distribution. Figure 8.4 shows the p.d.f. of the standard normal distribution together with the p.d.f.'s of the t distributions with 1, 5, and 20 degrees of freedom so that the reader can see how the t distributions get closer to normal as the degrees of freedom increase.

A short table of $p$ quantiles for the t distribution with $m$ degrees of freedom for various values of $p$ and $m$ is given at the end of this book. The probabilities in the first line of the table, corresponding to $m = 1$, are those for the Cauchy distribution.
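The limit in Eq. (8.4.6) is easy to probe numerically. The sketch below evaluates the ratio on the log scale with `math.lgamma`; the function name `gamma_ratio` is an illustrative choice. It is only a numerical illustration, not a proof (Exercise 7 asks for the proof). Evaluated at $m/2$ and divided by $(2\pi)^{1/2}$, the same ratio gives the normalizing constant of Eq. (8.4.2), which is why the constant tends to that of the standard normal p.d.f.

```python
import math


def gamma_ratio(m: float) -> float:
    """Gamma(m + 1/2) / (Gamma(m) * m**0.5), computed on the log scale."""
    return math.exp(math.lgamma(m + 0.5) - math.lgamma(m) - 0.5 * math.log(m))


if __name__ == "__main__":
    # The ratio increases toward 1 as m grows.
    for m in (1, 10, 100, 1000, 10 ** 6):
        print(m, gamma_ratio(m))
```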
The probabilities in the bottom line of the table corresponding to $m = \infty$ are those for the standard normal distribution. Most statistical packages include a function to compute the c.d.f. and the quantile function of an arbitrary t distribution.

Derivation of the p.d.f.

Suppose that the joint distribution of $Y$ and $Z$ is as specified in Definition 8.4.1. Then, because $Y$ and $Z$ are independent, their joint p.d.f. is equal to the product $f_1(y) f_2(z)$, where $f_1(y)$ is the p.d.f. of the $\chi^2$ distribution with $m$ degrees of freedom and $f_2(z)$ is the p.d.f. of the standard normal distribution. Let $X$ be defined by Eq. (8.4.1) and, as a convenient device, let $W = Y$. We shall determine first the joint p.d.f. of $X$ and $W$. From the definitions of $X$ and $W$,
$$Z = X\left(\frac{W}{m}\right)^{1/2} \quad \text{and} \quad Y = W. \tag{8.4.7}$$
The Jacobian of the transformation (8.4.7) from $X$ and $W$ to $Y$ and $Z$ is $(W/m)^{1/2}$. The joint p.d.f. $f(x, w)$ of $X$ and $W$ can be obtained from the joint p.d.f. $f_1(y) f_2(z)$ by
replacing $y$ and $z$ by the expressions given in (8.4.7) and then multiplying the result by $(w/m)^{1/2}$. It is then found that the value of $f(x, w)$ is as follows, for $-\infty < x < \infty$ and $w > 0$:
$$f(x, w) = f_1(w)\, f_2\!\left(x\left(\frac{w}{m}\right)^{1/2}\right)\left(\frac{w}{m}\right)^{1/2} = c\, w^{(m+1)/2 - 1} \exp\left\{-\frac{1}{2}\left(1 + \frac{x^2}{m}\right) w\right\}, \tag{8.4.8}$$
where
$$c = \left[2^{(m+1)/2}\,(m\pi)^{1/2}\,\Gamma\!\left(\frac{m}{2}\right)\right]^{-1}.$$
The marginal p.d.f. $g(x)$ of $X$ can be obtained from Eq. (8.4.8) by using the relation
$$g(x) = \int_0^{\infty} f(x, w)\, dw = c \int_0^{\infty} w^{(m+1)/2 - 1} \exp[-w\, h(x)]\, dw,$$
where $h(x) = [1 + x^2/m]/2$. It follows from Eq. (5.7.10) that
$$g(x) = c\, \frac{\Gamma\!\left((m+1)/2\right)}{h(x)^{(m+1)/2}}.$$
Substituting the formula for $c$ into this yields the function in (8.4.2).

Summary

Let $X_1, \ldots, X_n$ be a random sample from the normal distribution with mean $\mu$ and variance $\sigma^2$. Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $\sigma' = \left(\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2\right)^{1/2}$. Then the distribution of $n^{1/2}(\bar{X}_n - \mu)/\sigma'$ is the t distribution with $n - 1$ degrees of freedom.

Exercises

1. Suppose that $X$ has the t distribution with $m$ degrees of freedom ($m > 2$). Show that $\operatorname{Var}(X) = m/(m - 2)$. Hint: To evaluate $E(X^2)$, restrict the integral to the positive half of the real line and change the variable from $x$ to
$$y = \frac{x^2/m}{1 + x^2/m}.$$
Compare the integral with the p.d.f. of a beta distribution. Alternatively, use Exercise 21 in Sec. 5.7.

2. Suppose that $X_1, \ldots, X_n$ form a random sample from the normal distribution with unknown mean $\mu$ and unknown standard deviation $\sigma$, and let $\hat{\mu}$ and $\hat{\sigma}$ denote the M.L.E.'s of $\mu$ and $\sigma$. For the sample size $n = 17$, find a value of $k$ such that $\Pr(\hat{\mu} > \mu + k\hat{\sigma}) = 0.95$.

3. Suppose that the five random variables $X_1, \ldots, X_5$ are i.i.d. and that each has the standard normal distribution. Determine a constant $c$ such that the random variable
$$\frac{c\,(X_1 + X_2)}{\left(X_3^2 + X_4^2 + X_5^2\right)^{1/2}}$$
will have a t distribution.

4. By using the table of the t distribution given in the back of this book, determine the value of the integral
$$\int_{-\infty}^{2.5} \frac{dx}{(12 + x^2)^2}.$$

5. Suppose that the random variables $X_1$ and $X_2$ are independent and that each has the normal distribution with mean 0 and variance $\sigma^2$. Determine the value of
$$\Pr\left[\frac{(X_1 + X_2)^2}{(X_1 - X_2)^2} < 4\right].$$
Hint:
$$(X_1 - X_2)^2 = 2\left[\left(X_1 - \frac{X_1 + X_2}{2}\right)^2 + \left(X_2 - \frac{X_1 + X_2}{2}\right)^2\right].$$

6. In Example 8.2.3, suppose that we will observe $n = 20$ cheese chunks with lactic acid concentrations $X_1, \ldots, X_{20}$. Find a number $c$ so that $\Pr(\bar{X}_{20} \le \mu + c\sigma') = 0.95$.

7. Prove the limit formula Eq. (8.4.6). Hint: Use Theorem 5.7.4.

8. Let $X$ have the standard normal distribution, and let $Y$ have the t distribution with five degrees of freedom. Explain why $c = 1.63$ provides the largest value of the difference $\Pr(-c < X < c) - \Pr(-c < Y < c)$. Hint: Start by looking at Fig. 8.4.

8.5 Confidence Intervals

Confidence intervals provide a method of adding more information to an estimator $\hat{\theta}$ when we wish to estimate an unknown parameter $\theta$. We can find an interval $(A, B)$ that we think has high probability of containing $\theta$. The length of such an interval gives us an idea of how closely we can estimate $\theta$.
Confidence Intervals for the Mean of a Normal Distribution

Example 8.5.1 Rain from Seeded Clouds. In Example 8.3.2, the average of the $n = 26$ log-rainfalls from the seeded clouds is $\bar{X}_n$. This may be a sensible estimator of $\mu$, the mean log-rainfall from a seeded cloud, but it doesn't give any idea how much stock we should place in the estimator. The standard deviation of $\bar{X}_n$ is $\sigma/(26)^{1/2}$, and we could estimate $\sigma$ by an estimator like $\sigma'$ from Eq. (8.4.3). Is there a sensible way to combine these two estimators into an inference that tells us both what we should estimate for $\mu$ and how much confidence we should place in the estimator? ◄

Assume that $X_1, \ldots, X_n$ form a random sample from the normal distribution with mean $\mu$ and variance $\sigma^2$. Construct the estimators $\bar{X}_n$ of $\mu$ and $\sigma'$ of $\sigma$. We shall now show how to make use of the random variable
$$U = \frac{n^{1/2}(\bar{X}_n - \mu)}{\sigma'} \tag{8.5.1}$$
from Eq. (8.4.4) to address the question at the end of Example 8.5.1. We know that $U$
has the t distribution with $n - 1$ degrees of freedom. Hence, we can calculate the c.d.f. of $U$ and/or quantiles of $U$ using either statistical software or tables such as those in the back of this book. In particular, we can compute $\Pr(-c < U < c)$ for every $c > 0$. The inequalities $-c < U < c$ can be translated into inequalities involving $\mu$ by making use of the formula for $U$ in Eq. (8.5.1). Simple algebra shows that $-c < U < c$ is equivalent to
$$\bar{X}_n - \frac{c\sigma'}{n^{1/2}} < \mu < \bar{X}_n + \frac{c\sigma'}{n^{1/2}}. \tag{8.5.2}$$
Whatever probability we can assign to the event $\{-c < U < c\}$ we can also assign to the event that Eq. (8.5.2) holds. For example, if $\Pr(-c < U < c) = \gamma$, then
$$\Pr\left(\bar{X}_n - \frac{c\sigma'}{n^{1/2}} < \mu < \bar{X}_n + \frac{c\sigma'}{n^{1/2}}\right) = \gamma. \tag{8.5.3}$$
One must be careful to understand the probability statement in Eq. (8.5.3) as being a statement about the joint distribution of the random variables $\bar{X}_n$ and $\sigma'$ for fixed values of $\mu$ and $\sigma$. That is, it is a statement about the sampling distribution of $\bar{X}_n$ and …

Cálculo de Probabilidades

1. Page 115

1. Since there are 11 integers between 10 and 20 and the distribution is uniform, $P(X = x) = \frac{1}{11}$ for every $x$ among the listed integers.
Since 6 of these integers are even and 5 are odd, the required probability, that $X = 2k$ for some $5 \le k \le 10$, $k \in \mathbb{N}$, is $\frac{6}{11}$.

2. Since $P(X = x) = cx$ for $x = 1, \ldots, 5$, and $f$ is a probability function only if $\sum_{x=1}^{5} f(x) = 1$, we need
$$c + 2c + 3c + 4c + 5c = 1 \implies 15c = 1 \implies c = \frac{1}{15}.$$

3. Write the possible outcomes of the two rolls as pairs $(d_1, d_2)$ with $d_1, d_2 = 1, \ldots, 6$. The table lists every pair together with the absolute value of the difference:

(1,1): 0 | (1,2): 1 | (1,3): 2 | (1,4): 3 | (1,5): 4 | (1,6): 5
(2,1): 1 | (2,2): 0 | (2,3): 1 | (2,4): 2 | (2,5): 3 | (2,6): 4
(3,1): 2 | (3,2): 1 | (3,3): 0 | (3,4): 1 | (3,5): 2 | (3,6): 3
(4,1): 3 | (4,2): 2 | (4,3): 1 | (4,4): 0 | (4,5): 1 | (4,6): 2
(5,1): 4 | (5,2): 3 | (5,3): 2 | (5,4): 1 | (5,5): 0 | (5,6): 1
(6,1): 5 | (6,2): 4 | (6,3): 3 | (6,4): 2 | (6,5): 1 | (6,6): 0

Since the dice are balanced, the 36 pairs are equally likely, so each pair has probability $\frac{1}{36}$. We use this information to sketch the p.f. of $X$.

[Figure: sketch of the p.f. of X.]

4. In 10 tosses of the coin there are $2^{10} = 1024$ possible outcomes. Since the probability of success equals the probability of failure here, every outcome has probability $(0.5)^{10}$, which is multiplied by the number of ways of choosing the $n$ heads among the 10 results. That is, the p.f. of $X = n$ heads in 10 tosses is
$$\Pr(X = n) = \binom{10}{n}(0.5)^{10}.$$

5. Drawing 5 balls, we have 5 positions that can each be red or black, giving a total of $2^5 = 32$ possible outcomes; the number of them with $0 \le n \le 5$ red balls is $\binom{5}{n}$, which gives the following p.f.:

[Figure: sketch of the p.f. of X.]

6. Binomial distribution: $P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$, $k = 0, \ldots, n$.
$$\Pr(X < 6) = \sum_{n=0}^{5} \Pr(X = n) = \sum_{n=0}^{5} \binom{15}{n}(0.5)^n (0.5)^{15-n} = 0.00003 + 0.00045 + 0.00320 + 0.01389 + 0.04166 + 0.09164 = 0.15087$$

7.
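$\Pr(X \ge 5)$ under $b(8; 0.7)$ equals $\Pr(X \le 3)$ under $b(8; 0.3)$, and looking at the table we find
$$\Pr(X \le 3) = \sum_{n=0}^{3} \Pr(X = n) = 0.0576 + 0.1977 + 0.2965 + 0.2541 = 0.8059.$$

8. $\Pr(X = k)$, where $k$ is the number of red balls selected, follows the binomial distribution with $p = 0.1$ and $n = 20$. Using the probability of the complementary event, $\Pr(X \ge 3) = 1 - \Pr(X < 3)$, which is
$$\Pr(X \ge 3) = 1 - \sum_{k=0}^{2} P(X = k) = 1 - 0.1216 - 0.2702 - 0.2852 = 0.3231.$$

9. For $f$ to be a probability function we must have $\sum_{n=0}^{\infty} f(n) = 1$. Using the convergence of the geometric series,
$$\frac{c}{1 - \frac{1}{2}} = 1 \implies 2c = 1 \implies c = \frac{1}{2}.$$

2. Page 122

1. $\Pr\left(X \le \frac{8}{27}\right) = \int_0^{8/27} f(x)\, dx = \sqrt{\left(\frac{8}{27}\right)^3} = 0.1613$

2. Sketching the p.d.f. $f(x) = \frac{4}{3}(1 - x^3)$, $0 < x < 1$:

[Figure: sketch of the p.d.f.]

(a) $\Pr\left(X < \frac{1}{2}\right) = \frac{4}{3}\left[x - \frac{x^4}{4}\right]_0^{1/2} = \frac{4}{3}\left(\frac{1}{2} - \frac{1}{64}\right) = \frac{4 \cdot 31}{3 \cdot 64} = \frac{31}{48}$

(b) $\Pr\left(\frac{1}{4} < X < \frac{3}{4}\right) = \frac{4}{3}\left[x - \frac{x^4}{4}\right]_{1/4}^{3/4} = \frac{4}{3}\left(\frac{3}{4} - \frac{81}{1024} - \frac{1}{4} + \frac{1}{1024}\right) = \frac{4(768 - 81 - 256 + 1)}{3 \cdot 1024} = \frac{9}{16}$

(c) $\Pr\left(X > \frac{1}{3}\right) = \frac{4}{3}\left[x - \frac{x^4}{4}\right]_{1/3}^{1} = \frac{4}{3}\left(1 - \frac{1}{4} - \frac{1}{3} + \frac{1}{324}\right) = \frac{4(324 - 81 - 108 + 1)}{3 \cdot 324} = \frac{136}{243}$

3. Sketching the p.d.f. $f(x) = \frac{1}{36}(9 - x^2)$, $-3 \le x \le 3$:

[Figure: sketch of the p.d.f.]

(a) $\Pr(X < 0) = \frac{1}{36}\left[9x - \frac{x^3}{3}\right]_{-3}^{0} = \frac{81 - 27}{108} = \frac{1}{2}$

(b) $\Pr(-1 \le X \le 1) = \frac{1}{36}\left[9x - \frac{x^3}{3}\right]_{-1}^{1} = \frac{27 - 1 + 27 - 1}{36 \cdot 3} = \frac{13}{27}$

(c) $\Pr(X > 2) = \frac{1}{36}\left[9x - \frac{x^3}{3}\right]_{2}^{3} = \frac{81 - 27 - 54 + 8}{36 \cdot 3} = \frac{2}{27}$

4. (a) Since $\int_1^2 c x^2\, dx = 1$ is required for $f$ to be a p.d.f., we have $\frac{7c}{3} = 1$, so $c = \frac{3}{7}$. Sketching the p.d.f.:

[Figure: sketch of the p.d.f.]

(b) $\Pr\left(X > \frac{3}{2}\right) = \int_{3/2}^{2} \frac{3}{7} x^2\, dx = \frac{1}{7}\left(8 - \frac{27}{8}\right) = \frac{37}{56}$

5. (a) $\int_0^t \frac{x}{8}\, dx = \frac{t^2}{16} = \frac{1}{4} \implies t = 2$

(b) $\int_t^4 \frac{x}{8}\, dx = 1 - \frac{t^2}{16} = \frac{1}{2} \implies t = 2\sqrt{2}$

6. We have $\Pr(S = 0) = \Pr(0 < X < 1/2)$, $\Pr(S = 1) = \Pr(1/2 < X < 3/2)$, and so on, so that $\Pr(S = k) = \Pr(k - 1/2 < X < k + 1/2)$. The p.f. of $S$ is therefore
$$f(k) = \int_{k - 1/2}^{k + 1/2} \frac{x}{8}\, dx, \quad k = 1, \ldots, 4,$$
with the upper limit cut off at 4 when $k = 4$.

7. If the distribution is uniform, then $f(x) = c$ with $c\,(8 - (-2)) = 1 \implies c = \frac{1}{10}$, so the p.d.f. of $X$ is $f(x) = \frac{1}{10}$, $-2 \le x \le 8$.
$$\Pr(0 < X < 7) = 7 \cdot \frac{1}{10} = \frac{7}{10}$$

8.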
(a) For $f$ to be a p.d.f. we must have
$$\int_0^{\infty} c\, e^{-2x}\, dx = \lim_{b \to \infty}\left[-\frac{c\, e^{-2x}}{2}\right]_0^{b} = \frac{c}{2} = 1,$$
so $c = 2$. Sketching the p.d.f.:

[Figure: sketch of the p.d.f.]

(b) $\Pr(1 < X < 2) = \int_1^2 2e^{-2x}\, dx = -e^{-4} + e^{-2} = 0.1170$

3. Page 132

1. The c.d.f. of $X$ is $F(x) = \Pr(X \le x)$, a step function that is constant on each of $x < 0$, $0 \le x < 1$, and $x \ge 1$ (the values are illegible in the source). Its sketch is:

[Figure: sketch of the c.d.f.]

2. The c.d.f. of $X$ is
$$F(x) = \Pr(X \le x) = \begin{cases} 0, & x < -2 \\ 0.4, & -2 \le x < 0 \\ 0.5, & 0 \le x < 1 \\ 0.8, & 1 \le x < 4 \\ 1, & x \ge 4 \end{cases}$$
and the sketch of $F(x)$ is:

[Figure: sketch of the c.d.f.]

3. In this case the probability of heads (success) is 0.5 and the probability of tails (failure) is the same on each toss. Then $F(x) = 1 - (0.5)^x$ and its sketch is:

[Figure: sketch of the c.d.f.]

4. Reading the graph, we find:
(a) $\Pr(X < -1) = 0$
(b) $\Pr(X < 0) = 0.1$
(c) $\Pr(X \le 0) = 0.2$
(d) $\Pr(X = 1) = 0$
(e) $\Pr(0 < X \le 3) = 0.7$
(f) $\Pr(0 < X < 3) = 0.5$
(g) $\Pr(0 \le X \le 3) = 0.6$
(h) $\Pr(1 < X \le 2) = 0$
(i) $\Pr(X > 5) = 0$
(j) $\Pr(X \ge 5) = 0$
(k) $\Pr(3 \le X \le 4) = 0$

5. In this case the p.d.f. is $f(x) = \frac{x^2}{9}$, $0 \le x \le 3$, and the sketch is:

[Figure: sketch of the p.d.f.]

6. In this case the p.d.f. is $f(x) = e^{x-3}$, $x \le 3$, and the sketch is:

[Figure: sketch of the p.d.f.]

7. In this case the c.d.f. of $X$, that is, $\Pr(X \le x)$, is defined piecewise (the expression is illegible in the source), and the sketch is:

[Figure: sketch of the c.d.f.]

9. Since $X$ is uniform on $[0, 5]$, its p.d.f. is $f(x) = 0.2$ for $0 < x < 5$. The distribution of $S$ then has $\Pr(S = 0) = 0.2$, $\Pr(S = 5) = 0.4$, and $\Pr(S = X) = 0.4$. The c.d.f. of $S$ is therefore $F(s) = 0$ for $s < 0$, $F(s) = 0.6$ for $3 \le s < 5$, and $F(s) = 1$ for $s \ge 5$ (the piece for $0 \le s < 3$ is illegible in the source), and the sketch is:

[Figure: sketch of the c.d.f.]

10. The quantile function $F^{-1}(p)$ is equal to the smallest $x$ such that $F(x) \ge p$, for $0 < p < 1$, which gives $F^{-1}(p) = \sqrt{9p} = 3\sqrt{p}$.
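The definition of the quantile function used in solution 10 (the smallest $x$ with $F(x) \ge p$) can be applied mechanically and compared with the closed form $3\sqrt{p}$. The sketch below assumes the c.d.f. is $F(x) = x^2/9$ on $[0, 3]$, which is what the answer $\sqrt{9p}$ implies but which is not stated explicitly in the solution. The bisection search and all function names are illustrative choices.

```python
import math


def cdf(x: float) -> float:
    """Assumed c.d.f.: F(x) = x^2 / 9 on [0, 3], 0 to the left, 1 to the right."""
    if x <= 0.0:
        return 0.0
    if x >= 3.0:
        return 1.0
    return x * x / 9.0


def quantile_closed_form(p: float) -> float:
    """Closed form from solution 10: F^{-1}(p) = 3 * sqrt(p)."""
    return 3.0 * math.sqrt(p)


def quantile_bisect(p: float, lo: float = 0.0, hi: float = 3.0, tol: float = 1e-12) -> float:
    """Smallest x in [lo, hi] with F(x) >= p, located by bisection (F is nondecreasing)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) >= p:
            hi = mid   # mid already satisfies F(mid) >= p, so the answer is at or left of mid
        else:
            lo = mid   # F(mid) < p, so the answer is to the right of mid
    return hi


if __name__ == "__main__":
    for p in (0.1, 0.25, 0.5, 0.9):
        print(p, quantile_bisect(p), quantile_closed_form(p))
```

Bisection needs only that $F$ is nondecreasing, so the same search works for a c.d.f. with no closed-form inverse.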


Exercícios - Degroot - Cálculo das Probabilidades 1 - 2023-2

UERJ ·

Estatística ·

Cálculo das Probablidades 1
