1
Análise Matemática
UMG
2
Análise Matemática
UMG
1
Análise Matemática
UMG
14
Análise Matemática
UMG
1
Análise Matemática
UMG
1
Análise Matemática
UMG
1
Análise Matemática
UMG
1
Análise Matemática
UMG
1
Análise Matemática
UMG
1
Análise Matemática
UMG
Texto de pré-visualização
Now that you have your topic it is time to structure your IA Although there are no strict rules for how to assemble your IA Math IAs generally follow this flow 1 Rationale 2 Plan of action 3 Methodology 4 Evaluation and Extensions 5 Conclusion It may look like a lot but dont worry It is simpler than you think and these steps will be further explained in order below I will use my own IA in Math AI SL as an example but feel free to crossreference this with any sample IAs provided by your teacher Statistical Analysis of the Correlation between Social Media Usage and Academic Performance An Approach via Estimation Theory and Statistical Inference Estudante Sofia Número do Candidato x Matemática Análise e Abordagens Nível Superior Colégio Exemplo IB 4 de fevereiro de 2026 4 de fevereiro de 2026 Resumo This work investigates the functional relationship between daily social media usage time and academic performance in International Baccalaureate students Through a theoretical framework based on probability theory and statistical inference we formulate the problem as a Gaussian linear regression model We employ maximum likelihood and least squares estimators deriving their asymptotic properties The analysis includes hypothesis testing on the model parameters using NeymanPearson theory construction of confidence intervals via the central limit theorem and mo del diagnostics through residual analysis We demonstrate that under the model assumptions there exists a statistically significant negative correlation p 001 between the variables with a coefficient of determination of R2 0623 The work illustrates the application of advanced mathematical methods to a realworld pro blem following the rigorous standards of pure mathematics Keywords Statistical inference linear regression estimation theory Gaussian models hypothesis testing correlation analysis 1 Conteúdo 1 Introduction and Theoretical Foundations 4 11 Problem Context 4 12 Rigorous Mathematical Formulation 4 13 Mathematical Literature Review 4 2 Methodology and Experimental Design 5 21 Probability Space and Measure 5 22 Sampling Procedure 5 23 Measured Variables 5 24 Data Collection 6 3 Mathematical Analysis Estimation Theory 6 31 Maximum Likelihood Estimators 6 32 Properties of Estimators 6 33 Fisher Information Matrix 7 4 Results and Statistical Inference 7 41 Collected Data 7 42 Descriptive Statistics 8 43 Correlation Analysis 8 44 Regression Parameter Estimation 8 45 Analysis of Variance ANOVA 9 46 Confidence Intervals 9 5 Model Diagnostics and Validation 9 51 Residual Analysis 9 52 Influence Analysis 10 53 CrossValidation 10 2 6 Discussion and Critical Reflection 10 61 Interpretation of Results 10 62 Model Limitations 11 63 Generalization of Results 11 64 Extensions and Future Research 11 7 Conclusion 11 A Appendix A Complete Mathematical Proofs 12 A1 Proof of GaussMarkov Theorem 12 A2 Proof of tdistribution for ˆβ1 12 B Appendix B Computational Codes 12 B1 Python for Statistical Analysis 12 B2 LaTeX for PGFPlots Graphics 13 C Appendix C Questionnaire and Instruments 14 3 1 Introduction and Theoretical Foundations 11 Problem Context Let Ω be the set of all International Baccalaureate students endowed with a σalgebra F and a probability measure P We define two measurable random variables X Ω R daily social media time in hours Y Ω 0 45 predicted IB diploma score The main objective is to investigate the existence and nature of the statistical dependence between X and Y 12 Rigorous Mathematical Formulation We consider that the pair X Y follows an unknown bivariate distribution FXY x y We assume that the conditional relationship EY X x can be approximated by a linear function This leads to the parametric model Yi β0 β1Xi εi i 1 n 1 where we assume the following axioms A1 Linearity EεiXi 0 A2 Homoscedasticity VarεiXi σ2 A3 Nonautocorrelation Covεi εj 0 for i j A4 Normality εi N0 σ2 A5 IID Sample Xi Yin i1 is a simple random sample 13 Mathematical Literature Review The problem of parameter estimation in linear models has a rich history in mathematics We cite three fundamental results Teorema 11 GaussMarkov Under hypotheses A1A3 the ordinary least squares OLS estimators of β0 and β1 are the Best Linear Unbiased Estimators BLUE 4 Teorema 12 Distribution of OLS Estimators Under hypotheses A1A4 the OLS estimators follow normal distributions β1 N β 1 σ 2 n i1 x i x 2 β0 N β 0 σ 2 1 n x 2 n i1 x i x 2 Teorema 13 FisherCochran Under normality of errors the statistics β 1 β 1 β 1 and n 2 σ 2 σ 2 follow Students t and χ 2 distributions respectively and are independent 2 Methodology and Experimental Design 21 Probability Space and Measure We formally define the measurable space Ω F P R 0 45 B R 2 P θ where θ β 0 β 1 σ 2 Θ R 2 R is the parameter vector and P θ is the family of probability measures induced by model 1 22 Sampling Procedure We used stratified random sampling by school year Let Ω k 3 k 1 be a partition of Ω corresponding to IB years The sample measure is n k n Ω k Ω 3 k 1 n k n 50 23 Measured Variables X i Average daily social media time precision 01 hour Y i Predicted total IB diploma score scale 045 Z i Control variable daily study hours 24 Data Collection Data were collected through a validated instrument with Cronbachs α 087 The final sample consists of n 50 independent observations 3 Mathematical Analysis Estimation Theory 31 Maximum Likelihood Estimators The likelihood function for model 1 is L β 0 β 1 σ 2 x y n i 1 1 2 π σ 2 exp y i β 0 β 1 x i 2 2 σ 2 2 The corresponding loglikelihood function is ℓ β 0 β 1 σ 2 n 2 log 2 π σ 2 1 2 σ 2 n i 1 y i β 0 β 1 x i 2 3 Proposição 31 MLE for Linear Regression The maximum likelihood estimators MLE are β 1 MLE n i 1 x i x y i y n i 1 x i x 2 4 β 0 MLE y β 1 MLE x 5 σ MLE 2 1 n n i 1 y i β 0 MLE β 1 MLE x i 2 6 Demonstração We derive the firstorder conditions ℓ β 0 1 σ 2 n i 1 y i β 0 β 1 x i 0 ℓ β 1 1 σ 2 n i 1 x i y i β 0 β 1 x i 0 Solving this linear system yields the above expressions 32 Properties of Estimators Teorema 32 Inconsistency of MLE for σ 2 The MLE of σ 2 is biased E σ 2 MLE n 2 n σ 2 An unbiased estimator is s2 1n2 Σi1 to n εi2 Teorema 33 Asymptotic Efficiency Under regularity conditions the MLEs are asymptotically normal and efficient n θMLE θ0 𝒩0 I1θ0 where Iθ is the Fisher information matrix 33 Fisher Information Matrix For the Gaussian linear model the Fisher information matrix is Iθ nσ2 Σxiσ2 0 Σxiσ2 Σxi2σ2 0 0 0 n2σ4 7 The CramérRao lower bound for the variance of any unbiased estimator of β1 is Varβ1 σ2 Σi1 to n xi x2 4 Results and Statistical Inference 41 Collected Data Table 1 presents a subset of the collected data Tabela 1 Sample Data n 50 i Xi hours Yi score Zi control 1 12 42 35 2 35 35 20 3 08 44 40 4 42 32 15 5 21 38 30 50 28 36 25 42 Descriptive Statistics We calculate the following sample statistics x 1n Σi1 to n xi 284 hours y 1n Σi1 to n yi 3672 points sx2 1n1 Σi1 to n xi x2 146 sy2 1n1 Σi1 to n yi y2 1837 sxy 1n1 Σi1 to n xi xyi y 428 43 Correlation Analysis Definição 41 Pearson Correlation Coefficient The population correlation coefficient is ρ CovXY VarX VarY Its natural estimator is r sxy sx sy 0824 Proposição 41 Distribution of r under H0 ρ 0 Under bivariate normality and H0 we have t rn2 1 r2 tn2 For our data t 082448 1 0679 872 p 00001 We reject H0 with significance level α 001 44 Regression Parameter Estimation Applying the OLS estimators β1 Σ xi xyi y Σ xi x2 20986 7154 2934 β0 y β1 x 3672 2934 284 4505 Therefore the estimated equation is Ŷi 4505 2934 Xi 45 Analysis of Variance ANOVA Tabela 2 ANOVA Table Source SS df MS F pvalue Regression 61542 1 61542 7612 00001 Residual 37258 48 776 Total 98800 49 The coefficient of determination is R2 SS Regression SS Total 61542 98800 0623 46 Confidence Intervals Using Students t distribution β1 t097548 β1 2934 2011 0337 3612 2256 With 95 confidence each additional hour of social media reduces the score by between 226 and 361 points 5 Model Diagnostics and Validation 51 Residual Analysis We define residuals as εi yi ŷi We test the hypotheses 1 Normality ShapiroWilk test W 0982 p 0647 do not reject normality 2 Homoscedasticity BreuschPagan test χ2 234 p 0126 do not reject homoscedasticity 3 Independence DurbinWatson test d 192 p 0432 do not reject independence 52 Influence Analysis We calculate Cooks distance for each observation Di êi2 p MSE hii 1 hii2 where hii are the diagonal elements of the hat matrix H XXT X1 XT No observation has Di 05 indicating absence of extreme influential points 53 CrossValidation We performed kfold crossvalidation with k 5 MSECV 1n sumk15 sumi foldk yi ŷik2 823 Compared to the full model MSE 776 this indicates good stability 6 Discussion and Critical Reflection 61 Interpretation of Results The coefficient β1 2934 has limited causal interpretation due to possible omitted variables Formally n β1 β1 CovXε VarX If there are confounding variables Z correlated with X and Y then CovXε 0 and the estimator is inconsistent 62 Model Limitations 1 Omitted Variable Bias Factors such as intelligence intrinsic motivation and family support were not measured 2 Measurement Error Social media time was selfreported subject to memory bias 3 Nonlinearity The true relationship may be nonlinear but our model assumes linearity 4 Reverse Causality Poor performance may lead to increased social media use not the opposite 63 Generalization of Results Inference to the population requires the sample to be representative Our convenience sampling procedure violates this assumption limiting external validity 64 Extensions and Future Research 1 Fixed Effects Models Control unobserved individual heterogeneity 2 Instrumental Variables Use exogenous variations to identify causal effects 3 Panel Data Models Longitudinal data to analyze temporal dynamics 4 Quantile Regression Analyze effects at different points of the distribution 7 Conclusion This work demonstrated the rigorous application of statistical inference methods to the problem of the relationship between social media use and academic performance Through precise mathematical formulation efficient parameter estimation rigorous hypothesis tes ting and comprehensive model diagnostics we found statistically significant evidence of a negative correlation between the variables The main contribution lies not only in the empirical results but in illustrating how pure mathematics probability theory mathematical statistics linear algebra provides the necessary tools for rigorous scientific analysis The work highlights the importance of methodological rigor assumption verification and cautious interpretation of results For IB students this study serves as an example of how apparently simple problems can be approached with mathematical sophistication demonstrating the beauty and utility of mathematics as a language for understanding the world 11 A Appendix A Complete Mathematical Proofs A1 Proof of GaussMarkov Theorem Let β XT X1 XT y be the OLS estimator For any other linear unbiased estimator β Ay we have Varβ VarAy A Vary AT σ2 A AT Varβ σ2 XT X1 The difference is Varβ Varβ σ2AAT XT X1 Since A XT X1 XT D with D X 0 for unbiasedness then AAT XT X1 DDT XT X1 Therefore the difference is positive semidefinite proving OLS is BLUE A2 Proof of tdistribution for β1 Under normality β1 N β1 σ2Sxx and n2 s2 σ2 χ2n2 Furthermore β1 and s2 are independent Hence β1 β1 s Sxx tn2 B Appendix B Computational Codes B1 Python for Statistical Analysis import numpy as np import scipystats as stats Data example X nparray social media hours Y nparray IB scores Linear regression n lenX Xmean npmeanX Ymean npmeanY Sxx npsumX Xmean2 Sxy npsumX XmeanY Ymean beta1 Sxy Sxx beta0 Ymean beta1 Xmean Standard errors Ypred beta0 beta1 X residuals Y Ypred s2 npsumresiduals2 n2 sebeta1 npsqrts2 Sxx ttest tstat beta1 sebeta1 pvalue 2 1 statstcdfabststat n2 B2 LaTeX for PGFPlots Graphics begintikzpicture beginaxis xlabelSocial Media Time hours ylabelPredicted IB Score gridboth width08 extwidth height06 extwidth addplotonly marks blue table 12 42 35 35 08 44 42 32 21 38 addplotred thick domain05 4505 2934x endaxis endtikzpicture 13 C Appendix C Questionnaire and Instruments The questionnaire consisted of 1 Informed consent form 2 Demographic data age gender school year 3 Average daily social media time scale 024 hours 4 Predicted scores in each IB component 5 Control variables study hours sleep extracurricular activities 14 Bibliographic References Referências 1 Casella G Berger R L 2002 Statistical Inference 2nd ed Duxbury Press 2 Draper N R Smith H 1998 Applied Regression Analysis 3rd ed Wiley 3 Lehmann E L Casella G 2006 Theory of Point Estimation 2nd ed Sprin ger 4 Seber G A F Lee A J 2012 Linear Regression Analysis 2nd ed Wiley 5 Shao J 2003 Mathematical Statistics 2nd ed Springer 6 White H 1980 A HeteroskedasticityConsistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity Econometrica 484 817838 7 Wooldridge J M 2010 Econometric Analysis of Cross Section and Panel Data 2nd ed MIT Press 8 Greene W H 2012 Econometric Analysis 7th ed Pearson 9 Hastie T Tibshirani R Friedman J 2009 The Elements of Statistical Lear ning 2nd ed Springer 10 Efron B Tibshirani R J 1994 An Introduction to the Bootstrap Chapman and HallCRC 15
1
Análise Matemática
UMG
2
Análise Matemática
UMG
1
Análise Matemática
UMG
14
Análise Matemática
UMG
1
Análise Matemática
UMG
1
Análise Matemática
UMG
1
Análise Matemática
UMG
1
Análise Matemática
UMG
1
Análise Matemática
UMG
1
Análise Matemática
UMG
Texto de pré-visualização
Now that you have your topic it is time to structure your IA Although there are no strict rules for how to assemble your IA Math IAs generally follow this flow 1 Rationale 2 Plan of action 3 Methodology 4 Evaluation and Extensions 5 Conclusion It may look like a lot but dont worry It is simpler than you think and these steps will be further explained in order below I will use my own IA in Math AI SL as an example but feel free to crossreference this with any sample IAs provided by your teacher Statistical Analysis of the Correlation between Social Media Usage and Academic Performance An Approach via Estimation Theory and Statistical Inference Estudante Sofia Número do Candidato x Matemática Análise e Abordagens Nível Superior Colégio Exemplo IB 4 de fevereiro de 2026 4 de fevereiro de 2026 Resumo This work investigates the functional relationship between daily social media usage time and academic performance in International Baccalaureate students Through a theoretical framework based on probability theory and statistical inference we formulate the problem as a Gaussian linear regression model We employ maximum likelihood and least squares estimators deriving their asymptotic properties The analysis includes hypothesis testing on the model parameters using NeymanPearson theory construction of confidence intervals via the central limit theorem and mo del diagnostics through residual analysis We demonstrate that under the model assumptions there exists a statistically significant negative correlation p 001 between the variables with a coefficient of determination of R2 0623 The work illustrates the application of advanced mathematical methods to a realworld pro blem following the rigorous standards of pure mathematics Keywords Statistical inference linear regression estimation theory Gaussian models hypothesis testing correlation analysis 1 Conteúdo 1 Introduction and Theoretical Foundations 4 11 Problem Context 4 12 Rigorous Mathematical Formulation 4 13 Mathematical Literature Review 4 2 Methodology and Experimental Design 5 21 Probability Space and Measure 5 22 Sampling Procedure 5 23 Measured Variables 5 24 Data Collection 6 3 Mathematical Analysis Estimation Theory 6 31 Maximum Likelihood Estimators 6 32 Properties of Estimators 6 33 Fisher Information Matrix 7 4 Results and Statistical Inference 7 41 Collected Data 7 42 Descriptive Statistics 8 43 Correlation Analysis 8 44 Regression Parameter Estimation 8 45 Analysis of Variance ANOVA 9 46 Confidence Intervals 9 5 Model Diagnostics and Validation 9 51 Residual Analysis 9 52 Influence Analysis 10 53 CrossValidation 10 2 6 Discussion and Critical Reflection 10 61 Interpretation of Results 10 62 Model Limitations 11 63 Generalization of Results 11 64 Extensions and Future Research 11 7 Conclusion 11 A Appendix A Complete Mathematical Proofs 12 A1 Proof of GaussMarkov Theorem 12 A2 Proof of tdistribution for ˆβ1 12 B Appendix B Computational Codes 12 B1 Python for Statistical Analysis 12 B2 LaTeX for PGFPlots Graphics 13 C Appendix C Questionnaire and Instruments 14 3 1 Introduction and Theoretical Foundations 11 Problem Context Let Ω be the set of all International Baccalaureate students endowed with a σalgebra F and a probability measure P We define two measurable random variables X Ω R daily social media time in hours Y Ω 0 45 predicted IB diploma score The main objective is to investigate the existence and nature of the statistical dependence between X and Y 12 Rigorous Mathematical Formulation We consider that the pair X Y follows an unknown bivariate distribution FXY x y We assume that the conditional relationship EY X x can be approximated by a linear function This leads to the parametric model Yi β0 β1Xi εi i 1 n 1 where we assume the following axioms A1 Linearity EεiXi 0 A2 Homoscedasticity VarεiXi σ2 A3 Nonautocorrelation Covεi εj 0 for i j A4 Normality εi N0 σ2 A5 IID Sample Xi Yin i1 is a simple random sample 13 Mathematical Literature Review The problem of parameter estimation in linear models has a rich history in mathematics We cite three fundamental results Teorema 11 GaussMarkov Under hypotheses A1A3 the ordinary least squares OLS estimators of β0 and β1 are the Best Linear Unbiased Estimators BLUE 4 Teorema 12 Distribution of OLS Estimators Under hypotheses A1A4 the OLS estimators follow normal distributions β1 N β 1 σ 2 n i1 x i x 2 β0 N β 0 σ 2 1 n x 2 n i1 x i x 2 Teorema 13 FisherCochran Under normality of errors the statistics β 1 β 1 β 1 and n 2 σ 2 σ 2 follow Students t and χ 2 distributions respectively and are independent 2 Methodology and Experimental Design 21 Probability Space and Measure We formally define the measurable space Ω F P R 0 45 B R 2 P θ where θ β 0 β 1 σ 2 Θ R 2 R is the parameter vector and P θ is the family of probability measures induced by model 1 22 Sampling Procedure We used stratified random sampling by school year Let Ω k 3 k 1 be a partition of Ω corresponding to IB years The sample measure is n k n Ω k Ω 3 k 1 n k n 50 23 Measured Variables X i Average daily social media time precision 01 hour Y i Predicted total IB diploma score scale 045 Z i Control variable daily study hours 24 Data Collection Data were collected through a validated instrument with Cronbachs α 087 The final sample consists of n 50 independent observations 3 Mathematical Analysis Estimation Theory 31 Maximum Likelihood Estimators The likelihood function for model 1 is L β 0 β 1 σ 2 x y n i 1 1 2 π σ 2 exp y i β 0 β 1 x i 2 2 σ 2 2 The corresponding loglikelihood function is ℓ β 0 β 1 σ 2 n 2 log 2 π σ 2 1 2 σ 2 n i 1 y i β 0 β 1 x i 2 3 Proposição 31 MLE for Linear Regression The maximum likelihood estimators MLE are β 1 MLE n i 1 x i x y i y n i 1 x i x 2 4 β 0 MLE y β 1 MLE x 5 σ MLE 2 1 n n i 1 y i β 0 MLE β 1 MLE x i 2 6 Demonstração We derive the firstorder conditions ℓ β 0 1 σ 2 n i 1 y i β 0 β 1 x i 0 ℓ β 1 1 σ 2 n i 1 x i y i β 0 β 1 x i 0 Solving this linear system yields the above expressions 32 Properties of Estimators Teorema 32 Inconsistency of MLE for σ 2 The MLE of σ 2 is biased E σ 2 MLE n 2 n σ 2 An unbiased estimator is s2 1n2 Σi1 to n εi2 Teorema 33 Asymptotic Efficiency Under regularity conditions the MLEs are asymptotically normal and efficient n θMLE θ0 𝒩0 I1θ0 where Iθ is the Fisher information matrix 33 Fisher Information Matrix For the Gaussian linear model the Fisher information matrix is Iθ nσ2 Σxiσ2 0 Σxiσ2 Σxi2σ2 0 0 0 n2σ4 7 The CramérRao lower bound for the variance of any unbiased estimator of β1 is Varβ1 σ2 Σi1 to n xi x2 4 Results and Statistical Inference 41 Collected Data Table 1 presents a subset of the collected data Tabela 1 Sample Data n 50 i Xi hours Yi score Zi control 1 12 42 35 2 35 35 20 3 08 44 40 4 42 32 15 5 21 38 30 50 28 36 25 42 Descriptive Statistics We calculate the following sample statistics x 1n Σi1 to n xi 284 hours y 1n Σi1 to n yi 3672 points sx2 1n1 Σi1 to n xi x2 146 sy2 1n1 Σi1 to n yi y2 1837 sxy 1n1 Σi1 to n xi xyi y 428 43 Correlation Analysis Definição 41 Pearson Correlation Coefficient The population correlation coefficient is ρ CovXY VarX VarY Its natural estimator is r sxy sx sy 0824 Proposição 41 Distribution of r under H0 ρ 0 Under bivariate normality and H0 we have t rn2 1 r2 tn2 For our data t 082448 1 0679 872 p 00001 We reject H0 with significance level α 001 44 Regression Parameter Estimation Applying the OLS estimators β1 Σ xi xyi y Σ xi x2 20986 7154 2934 β0 y β1 x 3672 2934 284 4505 Therefore the estimated equation is Ŷi 4505 2934 Xi 45 Analysis of Variance ANOVA Tabela 2 ANOVA Table Source SS df MS F pvalue Regression 61542 1 61542 7612 00001 Residual 37258 48 776 Total 98800 49 The coefficient of determination is R2 SS Regression SS Total 61542 98800 0623 46 Confidence Intervals Using Students t distribution β1 t097548 β1 2934 2011 0337 3612 2256 With 95 confidence each additional hour of social media reduces the score by between 226 and 361 points 5 Model Diagnostics and Validation 51 Residual Analysis We define residuals as εi yi ŷi We test the hypotheses 1 Normality ShapiroWilk test W 0982 p 0647 do not reject normality 2 Homoscedasticity BreuschPagan test χ2 234 p 0126 do not reject homoscedasticity 3 Independence DurbinWatson test d 192 p 0432 do not reject independence 52 Influence Analysis We calculate Cooks distance for each observation Di êi2 p MSE hii 1 hii2 where hii are the diagonal elements of the hat matrix H XXT X1 XT No observation has Di 05 indicating absence of extreme influential points 53 CrossValidation We performed kfold crossvalidation with k 5 MSECV 1n sumk15 sumi foldk yi ŷik2 823 Compared to the full model MSE 776 this indicates good stability 6 Discussion and Critical Reflection 61 Interpretation of Results The coefficient β1 2934 has limited causal interpretation due to possible omitted variables Formally n β1 β1 CovXε VarX If there are confounding variables Z correlated with X and Y then CovXε 0 and the estimator is inconsistent 62 Model Limitations 1 Omitted Variable Bias Factors such as intelligence intrinsic motivation and family support were not measured 2 Measurement Error Social media time was selfreported subject to memory bias 3 Nonlinearity The true relationship may be nonlinear but our model assumes linearity 4 Reverse Causality Poor performance may lead to increased social media use not the opposite 63 Generalization of Results Inference to the population requires the sample to be representative Our convenience sampling procedure violates this assumption limiting external validity 64 Extensions and Future Research 1 Fixed Effects Models Control unobserved individual heterogeneity 2 Instrumental Variables Use exogenous variations to identify causal effects 3 Panel Data Models Longitudinal data to analyze temporal dynamics 4 Quantile Regression Analyze effects at different points of the distribution 7 Conclusion This work demonstrated the rigorous application of statistical inference methods to the problem of the relationship between social media use and academic performance Through precise mathematical formulation efficient parameter estimation rigorous hypothesis tes ting and comprehensive model diagnostics we found statistically significant evidence of a negative correlation between the variables The main contribution lies not only in the empirical results but in illustrating how pure mathematics probability theory mathematical statistics linear algebra provides the necessary tools for rigorous scientific analysis The work highlights the importance of methodological rigor assumption verification and cautious interpretation of results For IB students this study serves as an example of how apparently simple problems can be approached with mathematical sophistication demonstrating the beauty and utility of mathematics as a language for understanding the world 11 A Appendix A Complete Mathematical Proofs A1 Proof of GaussMarkov Theorem Let β XT X1 XT y be the OLS estimator For any other linear unbiased estimator β Ay we have Varβ VarAy A Vary AT σ2 A AT Varβ σ2 XT X1 The difference is Varβ Varβ σ2AAT XT X1 Since A XT X1 XT D with D X 0 for unbiasedness then AAT XT X1 DDT XT X1 Therefore the difference is positive semidefinite proving OLS is BLUE A2 Proof of tdistribution for β1 Under normality β1 N β1 σ2Sxx and n2 s2 σ2 χ2n2 Furthermore β1 and s2 are independent Hence β1 β1 s Sxx tn2 B Appendix B Computational Codes B1 Python for Statistical Analysis import numpy as np import scipystats as stats Data example X nparray social media hours Y nparray IB scores Linear regression n lenX Xmean npmeanX Ymean npmeanY Sxx npsumX Xmean2 Sxy npsumX XmeanY Ymean beta1 Sxy Sxx beta0 Ymean beta1 Xmean Standard errors Ypred beta0 beta1 X residuals Y Ypred s2 npsumresiduals2 n2 sebeta1 npsqrts2 Sxx ttest tstat beta1 sebeta1 pvalue 2 1 statstcdfabststat n2 B2 LaTeX for PGFPlots Graphics begintikzpicture beginaxis xlabelSocial Media Time hours ylabelPredicted IB Score gridboth width08 extwidth height06 extwidth addplotonly marks blue table 12 42 35 35 08 44 42 32 21 38 addplotred thick domain05 4505 2934x endaxis endtikzpicture 13 C Appendix C Questionnaire and Instruments The questionnaire consisted of 1 Informed consent form 2 Demographic data age gender school year 3 Average daily social media time scale 024 hours 4 Predicted scores in each IB component 5 Control variables study hours sleep extracurricular activities 14 Bibliographic References Referências 1 Casella G Berger R L 2002 Statistical Inference 2nd ed Duxbury Press 2 Draper N R Smith H 1998 Applied Regression Analysis 3rd ed Wiley 3 Lehmann E L Casella G 2006 Theory of Point Estimation 2nd ed Sprin ger 4 Seber G A F Lee A J 2012 Linear Regression Analysis 2nd ed Wiley 5 Shao J 2003 Mathematical Statistics 2nd ed Springer 6 White H 1980 A HeteroskedasticityConsistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity Econometrica 484 817838 7 Wooldridge J M 2010 Econometric Analysis of Cross Section and Panel Data 2nd ed MIT Press 8 Greene W H 2012 Econometric Analysis 7th ed Pearson 9 Hastie T Tibshirani R Friedman J 2009 The Elements of Statistical Lear ning 2nd ed Springer 10 Efron B Tibshirani R J 1994 An Introduction to the Bootstrap Chapman and HallCRC 15