This document provides an introduction to correlation and regression analysis. It defines key concepts like variables, random variables, and probability distributions. It discusses how correlation measures the strength and direction of a linear relationship between two variables. Correlation coefficients range from -1 to 1, with values closer to these extremes indicating stronger correlation. The document also introduces determination coefficients, which measure the proportion of variance in one variable explained by the other. Regression analysis builds on correlation to study and predict the average value of one variable based on the values of other explanatory variables.
Topic: Variance
Student Name: Sonia Khan
Class: B.Ed. 2.5
Project Name: "Young Teachers' Professional Development (TPD)"
Project Founder: Prof. Dr. Amjad Ali Arain
Faculty of Education, University of Sindh, Pakistan
This document provides an overview of linear regression analysis. It discusses (1) why regression is used, including for description, adjustment for covariates, identifying predictors, and prediction; (2) the basics of linear regression in predicting an interval outcome variable based on predictor variables; and (3) how to conduct univariate linear regression in SPSS, including interpreting results and ensuring assumptions are met. Key assumptions include no outliers, independent data points, and normally distributed residuals with constant variance.
The document discusses correlation and linear regression. It defines Pearson and Spearman correlation as statistical techniques to measure the relationship between two variables. Pearson correlation measures the linear association between interval variables, while Spearman correlation measures statistical dependence between two variables using their rank order. Linear regression finds the best fit linear relationship between a dependent and independent variable to predict changes in one based on the other. The key assumptions and interpretations of correlation coefficients and regression lines are also covered.
- Regression analysis is a statistical tool used to examine relationships between variables and can help predict future outcomes. It allows one to assess how the value of a dependent variable changes as the value of an independent variable is varied.
- Simple linear regression involves one independent variable, while multiple regression can include any number of independent variables. Regression analysis outputs include coefficients, residuals, and measures of fit like the R-squared value.
- An example uses home size and price data from 10 houses to generate a linear regression equation predicting that price increases by around $110 for each additional square foot. This model explains 58% of the variation in home prices.
This document provides an overview of regression analysis and two-way tables. It defines key concepts such as regression lines, correlation, residuals, and marginal and conditional distributions. Regression finds the linear relationship between two variables to make predictions. The least squares regression line minimizes the vertical distance between the data points and the line. Correlation and the coefficient of determination r2 measure how well the regression line fits the data. Two-way tables summarize the relationship between two categorical variables through marginal and conditional distributions.
The document provides information about binomial probability distributions including:
- Binomial experiments have a fixed number (n) of independent trials with two possible outcomes and a constant probability (p) of success.
- The binomial probability distribution gives the probability of getting exactly x successes in n trials. It is calculated using the binomial coefficient and p and q=1-p.
- The mean, variance and standard deviation of a binomial distribution are np, npq, and √npq respectively.
- Examples demonstrate calculating probabilities of outcomes for binomial experiments and determining if results are significantly low or high using the range rule of μ ± 2σ.
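The formulas above are easy to check directly. A short sketch (the coin-flip numbers are illustrative, not from the document):

```python
import math

def binom_pmf(x, n, p):
    """P(exactly x successes in n trials), via the binomial coefficient."""
    q = 1 - p
    return math.comb(n, x) * p**x * q**(n - x)

n, p = 50, 0.5                     # e.g. 50 fair coin flips
mean = n * p                       # np
sd = math.sqrt(n * p * (1 - p))    # sqrt(npq)

# Range rule: results outside mu +/- 2*sigma are significantly low/high
lo, hi = mean - 2 * sd, mean + 2 * sd
print(binom_pmf(25, 50, 0.5))      # probability of the most likely outcome
print(40 > hi)                     # 40 heads is significantly high -> True
```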
This document discusses confidence intervals for population means and proportions. It explains how to construct confidence intervals using the normal distribution for large sample sizes (n ≥ 30) and the t-distribution for small sample sizes. Formulas are provided for calculating margin of error and determining necessary sample size. Guidelines are given for determining whether to use the normal or t-distribution based on sample size and characteristics. Confidence intervals can be constructed for variance and standard deviation using the chi-square distribution.
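For the large-sample (n ≥ 30) case, the normal-based interval can be sketched with the standard library. The sample numbers here are made up for illustration:

```python
import math
from statistics import NormalDist

# Hypothetical sample: mean 72.5, standard deviation 12.3, n = 45
xbar, s, n = 72.5, 12.3, 45
confidence = 0.95

# Critical z for a two-sided 95% interval (about 1.96)
z = NormalDist().inv_cdf(0.5 + confidence / 2)

margin = z * s / math.sqrt(n)   # margin of error E = z * s / sqrt(n)
print(f"{confidence:.0%} CI: ({xbar - margin:.2f}, {xbar + margin:.2f})")
```

For small samples, the z critical value would be replaced by a t value with n − 1 degrees of freedom, as the summary notes.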
This presentation discusses binomial probability distributions through the following key points:
- It defines basic terminology related to random experiments, events, and variables. The binomial distribution specifically describes discrete data from Bernoulli processes.
- It outlines the notation and assumptions for binomial distributions, including that there are two possible outcomes for each trial (success/failure), a fixed number of trials, and constant probabilities of success/failure.
- It presents three methods for calculating binomial probabilities: the binomial probability formula, table method, and using technology like Excel.
- It discusses measures of central tendency and dispersion for binomial distributions and how the shape of the distribution depends on the number of trials and probability of success.
- Real-world examples illustrate applications of the binomial distribution.
The document provides an overview of regression analysis. It defines regression analysis as a technique used to estimate the relationship between a dependent variable and one or more independent variables. The key purposes of regression are to estimate relationships between variables, determine the effect of each independent variable on the dependent variable, and predict the dependent variable given values of the independent variables. The document also outlines the assumptions of the linear regression model, introduces simple and multiple regression, and describes methods for model building including variable selection procedures.
Chapter 6: Simple Regression and Correlation — Rione Drevale
There is a significant positive correlation between amount of feed intake and live weight of broilers. The correlation coefficient (r) between feed intake and live weight is 0.726, which is statistically significant (p = 0.017). On average, broilers gain approximately 0.5 kg of live weight for every 1 kg of feed consumed.
Regression analysis is a statistical technique for predicting a dependent variable based on one or more independent variables. Simple linear regression fits a straight line to the data to predict a continuous dependent variable (y) from a single independent variable (x). The output is an equation of the form y= b0 + b1x + ε, where b0 is the y-intercept, b1 is the slope, and ε is the error. Multiple linear regression extends this to include more than one independent variable. Regression analysis calculates the "best fit" line that minimizes the residuals, or differences between predicted and observed y values.
Regression analysis is a statistical technique used to estimate the relationships between variables. It allows one to predict the value of a dependent variable based on the value of one or more independent variables. The document discusses simple linear regression, where there is one independent variable, as well as multiple linear regression which involves two or more independent variables. Examples of linear relationships that can be modeled using regression analysis include price vs. quantity, sales vs. advertising, and crop yield vs. fertilizer usage. The key methods for performing regression analysis covered in the document are least squares regression and regressions based on deviations from the mean.
The t-test is used to compare the means of two groups and has three main applications:
1) Compare a sample mean to a population mean.
2) Compare the means of two independent samples.
3) Compare the values of one sample at two different time points.
There are two main types: the independent-measures t-test for samples not matched, and the matched-pair t-test for samples in pairs. The t-test assumes normal distributions and equal variances between groups. Examples are provided to demonstrate hypothesis testing for each application.
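The independent-measures statistic can be computed straight from its definition. A minimal sketch with made-up data (not one of the document's examples):

```python
import math

def independent_t(sample1, sample2):
    """Independent-measures t statistic with pooled variance
    (assumes normality and equal variances, as the t-test requires)."""
    n1, n2 = len(sample1), len(sample2)
    m1 = sum(sample1) / n1
    m2 = sum(sample2) / n2
    ss1 = sum((x - m1) ** 2 for x in sample1)
    ss2 = sum((x - m2) ** 2 for x in sample2)
    sp2 = (ss1 + ss2) / (n1 + n2 - 2)          # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))    # standard error of the difference
    return (m1 - m2) / se, n1 + n2 - 2         # t statistic and df

t, df = independent_t([4, 5, 6, 5], [2, 3, 4, 3])
print(round(t, 3), df)   # t = 3.464 with 6 degrees of freedom
```

The hypothesis test then compares |t| against the critical value from a t table (2.447 for a two-tailed test at α = .05 with df = 6), so this difference would be judged significant.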
This chapter summary covers simple linear regression models. Key topics include determining the simple linear regression equation, measures of variation such as total, explained, and unexplained sums of squares, assumptions of the regression model including normality, homoscedasticity and independence of errors. Residual analysis is discussed to examine linearity and assumptions. The coefficient of determination, standard error of estimate, and Durbin-Watson statistic are also introduced.
The Cramer-Rao Inequality provides us with a lower bound on the variance of an unbiased estimator for a parameter.
The Cramér-Rao Inequality: Let X = (X1, X2, ..., Xn) be a random sample from a distribution with density function f(x|θ), where θ is a scalar parameter. Under certain regularity conditions on f(x|θ), for any unbiased estimator φ̂(X) of φ(θ), the variance of φ̂(X) is bounded below by [φ′(θ)]² divided by n times the Fisher information.
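Written out in full, with a standard illustrative case (the normal-mean example is an added illustration, not from the document):

```latex
\operatorname{Var}\bigl(\hat{\varphi}(X)\bigr) \;\ge\;
  \frac{[\varphi'(\theta)]^2}{n\, I(\theta)}, \qquad
I(\theta) = E\!\left[\left(\frac{\partial}{\partial\theta}
  \ln f(X \mid \theta)\right)^{\!2}\right].

% Example: X_i ~ N(theta, sigma^2) with sigma^2 known, estimating phi(theta) = theta.
% Here I(theta) = 1/sigma^2, so the bound is sigma^2/n, which Var(X-bar) attains:
% the sample mean is an efficient (minimum-variance unbiased) estimator of theta.
```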
Regression analysis measures the average relationship between two or more variables using their original data units. There are two main types: simple regression involving two variables, and multiple regression involving more than two variables. Regression can be linear, following a straight line, or non-linear/curvilinear. A simple linear regression model relates a dependent variable Y to an independent variable X plus an error term. Estimating the model involves calculating the slope/regression coefficient and intercept. Multiple regression relates a dependent variable to two or more independent variables using a multiple correlation coefficient.
This document discusses key concepts in statistics for engineers and scientists such as point estimates, properties of good estimators, confidence intervals, and the t-distribution. A point estimate is a single numerical value used to estimate a population parameter from a sample. A good estimator must be unbiased, consistent, and relatively efficient. A confidence interval provides a range of values that is likely to contain the true population parameter based on the sample data and confidence level. The t-distribution is similar to the normal distribution but has greater variance and depends on degrees of freedom. Examples are provided to demonstrate how to calculate confidence intervals for means using the normal and t-distributions.
This document discusses correlation and regression. Correlation describes the strength and direction of a linear relationship between two variables, while regression allows predicting a dependent variable from an independent variable. It provides examples of calculating the correlation coefficient r to determine the strength and direction of relationships between variables like education and self-esteem or family income and number of children. The regression equation describes the linear regression line and can be used to predict values of the dependent variable from known values of the independent variable.
Simple Linear Regression: Step-By-Step — Dan Wellisch
This presentation was made for our meetup group, found here: https://www.meetup.com/Chicago-Technology-For-Value-Based-Healthcare-Meetup/, on 9/26/2017. Our group focuses on applying technology to healthcare in order to create better healthcare.
Correlation analysis measures the relationship between two or more variables. The sample correlation coefficient r ranges from -1 to 1, indicating the degree of linear relationship between variables. A value of 0 indicates no linear relationship, while values closer to 1 or -1 indicate a strong positive or negative linear relationship. Excel can be used to calculate r using the CORREL function.
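The coefficient the CORREL function returns can also be computed directly from its definition. A minimal sketch:

```python
import math

def pearson_r(x, y):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3], [2, 4, 6]))   # exactly linear and increasing: r = 1.0
print(pearson_r([1, 2, 3], [6, 4, 2]))   # exactly linear and decreasing: r = -1.0
```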
This document discusses correlation and regression analysis. It defines correlation as a statistical measure of how two variables are related. A correlation coefficient between -1 and 1 indicates the strength and direction of the linear relationship between variables. A scatterplot can show this graphically. Regression analysis involves using one variable to predict scores on another variable. Simple linear regression uses one independent variable to predict a dependent variable, while multiple regression uses two or more independent variables. The goal is to identify the regression line that best fits the data with the least error. The coefficient of determination, R2, indicates how much variance in the dependent variable is explained by the independent variables.
This document discusses correlation and regression analysis. It defines correlation as a statistical measure of how related two variables are. A correlation coefficient between -1 and 1 indicates the strength and direction of the relationship. Scatterplots visually depict the relationship between variables. Regression analysis predicts the value of a dependent variable based on the value of one or more independent variables. The regression equation represents the line of best fit through the data points that minimizes the residuals.
This document discusses correlation and regression analysis. It defines correlation as a statistical measure of how strongly two variables are related. A correlation coefficient between -1 and 1 indicates the strength and direction of the linear relationship between variables. Regression analysis allows us to predict the value of a dependent variable based on the value of one or more independent variables. Simple linear regression involves one independent variable, while multiple regression involves two or more independent variables to predict the dependent variable. The document provides examples and formulas for calculating correlation, regression lines, explained and unexplained variance, and the coefficient of determination.
This document discusses key statistical concepts including random variables, probability distributions, expected value, variance, and correlation. It defines discrete and continuous random variables and explains how probability distributions assign probabilities to the possible values of a random variable. It also defines important metrics like expected value and variance, and how they are calculated for discrete and continuous random variables. The document concludes by explaining correlation, how the correlation coefficient measures the strength and direction of linear association between two variables, and how it is calculated.
This document provides an overview of correlation and linear regression analysis. It defines correlation as a statistical measure of the relationship between two variables. Pearson's correlation coefficient (r) ranges from -1 to 1, with values farther from 0 indicating a stronger linear relationship. Positive values indicate an increasing relationship, while negative values indicate a decreasing relationship. The coefficient of determination (r2) represents the proportion of shared variance between variables. While correlation indicates linear association, it does not imply causation. Multiple regression allows predicting a continuous dependent variable from two or more independent variables.
This presentation covered the following topics:
1. Definition of Correlation and Regression
2. Meaning of Correlation and Regression
3. Types of Correlation and Regression
4. Karl Pearson's methods of correlation
5. Bivariate Grouped data method
6. Spearman's Rank correlation Method
7. Scatter diagram method
8. Interpretation of correlation coefficient
9. Lines of Regression
10. Regression Equations
11. Difference between correlation and regression
12. Related examples
Correlation and regression analysis are statistical methods used to determine if a relationship exists between variables and describe the nature of that relationship. A scatter plot graphs the independent and dependent variables and allows visualization of any trends in the data. The correlation coefficient measures the strength and direction of the linear relationship between variables, ranging from -1 to 1. Regression finds the linear "best fit" line that minimizes the residuals and can be used to predict dependent variable values.
Correlation and regression analysis are statistical methods used to determine if a relationship exists between variables and describe the nature of that relationship. A scatter plot graphs the independent and dependent variables and allows visualization of any trends in the data. The correlation coefficient measures the strength and direction of the linear relationship between variables, ranging from -1 to 1. Regression finds the linear "best fit" line that minimizes the residuals, or differences between observed and predicted dependent variable values. The coefficient of determination measures how much variation in the dependent variable is explained by the regression model.
This document discusses correlation and defines it as the statistical relationship between two variables, where a change in one variable results in a corresponding change in the other. It describes different types of correlation including positive, negative, simple, partial and multiple. Methods for studying correlation are also outlined, including scatter diagrams and Karl Pearson's coefficient of correlation (represented by r), which quantifies the strength and direction of the linear relationship between two variables from -1 to 1. The coefficient of determination (r2) is also introduced, which expresses the proportion of variance in one variable that is predictable from the other.
FSE 200
Adkins Page 1 of 10
Simple Linear Regression
Correlation only measures the strength and direction of the linear relationship between two quantitative variables. If the relationship is linear, then we would like to try to model that relationship with the equation of a line. We will use a regression line to describe the relationship between an explanatory variable and a response variable.
A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.
Ex. It has been suggested that there is a relationship between sleep deprivation of employees and the ability to complete simple tasks. To evaluate this hypothesis, 12 people were asked to solve simple tasks after having been without sleep for 15, 18, 21, and 24 hours. The sample data are shown below.
Subject   Hours without sleep, x   Tasks completed, y
1         15                       13
2         15                       9
3         15                       15
4         18                       8
5         18                       12
6         18                       10
7         21                       5
8         21                       8
9         21                       7
10        24                       3
11        24                       5
12        24                       4
Draw a scatterplot and describe the relationship. Lay a straight-edge on top of the plot and move it around until you find what you think might be a “line of best fit.” Then try to predict the number of tasks completed for someone having been without sleep 16 hours.
Was your line the same as that of the classmate sitting next to you? Probably not. We need a method that we can use to find the "best" regression line to use for prediction. The method we will use is called least-squares. No line will pass exactly through all the points in the scatterplot. When we use the line to predict a y for a given x value, if there is a data point with that same x value, we can compute the error (residual): residual = observed y − predicted ŷ.
Our goal is going to be to make the vertical distances from the line as small as possible. The most commonly used method for doing this is the least-squares method.
The least-squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
Equation of the Least-Squares Regression Line
· Least-Squares Regression Line: ŷ = b0 + b1x
· Slope of the Regression Line: b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
· Intercept of the Regression Line: b0 = ȳ − b1x̄
Generally, regression is performed using statistical software. Clearly, given the appropriate information, the above formulas are simple to use.
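Applied to the sleep-deprivation data above, the least-squares formulas give a concrete line. A sketch in Python rather than statistical software:

```python
# Sleep-deprivation data from the example above
x = [15, 15, 15, 18, 18, 18, 21, 21, 21, 24, 24, 24]   # hours without sleep
y = [13, 9, 15, 8, 12, 10, 5, 8, 7, 3, 5, 4]           # tasks completed

n = len(x)
xbar = sum(x) / n                     # 19.5
ybar = sum(y) / n                     # 8.25
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
sxx = sum((a - xbar) ** 2 for a in x)

b1 = sxy / sxx            # slope: about -0.944 tasks per extra hour awake
b0 = ybar - b1 * xbar     # intercept: about 26.67

# Prediction for 16 hours without sleep (the question posed earlier)
print(round(b0 + b1 * 16, 1))   # roughly 11.6 tasks
```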
Once we have the regression line, how do we interpret it, and what can we do with it?
The slope of a regression line is the rate of change: the amount of change in ŷ when x increases by 1.
The intercept of the regression line is the value of ŷ when x = 0. It is statistically meaningful only when x can take on values that are close to zero.
To make a prediction, just substitute an x-value into the equation and find ŷ.
To plot the line on a scatterplot, just find a couple of points on the regression line, one near each end of the range of x in the data. Plot the points and connect them with a line.
Discriminant analysis (DA) is a statistical technique used to predict group membership when the dependent variable is categorical and the independent variables are continuous. It identifies which variables discriminate between two or more naturally occurring groups. DA develops a linear equation to predict group membership based on weighted combinations of predictor variables. It aims to maximize the distance between group means to achieve strong discriminatory power. Like regression, DA assumes variables are normally distributed, cases are randomly sampled, and groups are mutually exclusive and collectively exhaustive. It requires at least two groups with minimal overlap and similar group sizes of at least five cases. DA can classify new cases into groups based on the discriminant functions derived from existing data.
This document provides an introduction to linear regression analysis. It discusses how regression finds the best fitting straight line to describe the relationship between two variables. The regression line minimizes the residuals, or errors, between the predicted Y values from the line and the actual data points. The accuracy of predictions from the regression model can be evaluated using the correlation coefficient (r) and the standard error of estimate. Multiple linear regression extends this process to model relationships between a dependent variable Y and two or more independent variables (X1, X2, etc).
This material is most useful for BBA students taking the subject "Data Analysis and Modeling". It covers the content of the chapter on data regression models. Visit www.ramkumarshah.com.np for more.
1. Regression analysis is a statistical technique used to model relationships between variables and make predictions. It can be used to describe relationships, estimate coefficients, make predictions, and control systems.
2. Linear regression models describe straight-line relationships between variables, while non-linear models describe curved relationships. The goodness of fit of a model can be evaluated using the coefficient of determination.
3. The least squares method is used to fit regression lines by minimizing the sum of the squared vertical distances between observed and estimated y-values for a regression of y on x, or minimizing the sum of squared horizontal distances for a regression of x on y.
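The two fits described in point 3 are genuinely different lines. A small check (the example numbers are mine) also confirms the standard identity that the product of the two slopes equals r²:

```python
def slopes(x, y):
    """Least-squares slope of y on x and of x on y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / syy    # b_yx, b_xy

b_yx, b_xy = slopes([1, 2, 3, 4], [2, 3, 5, 6])
print(b_yx, b_xy)      # 1.4 and 0.7: two different regression lines
print(b_yx * b_xy)     # 0.98, the coefficient of determination r^2
```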
Factor Extraction Method in Factor Analysis with Example in R Studio — GauravRajole
This document discusses various factor extraction methods used in factor analysis including principal factor method, unweighted least squares method, and generalized least squares method. It provides details on principal factor method, testing the sufficiency of factor numbers, interpreting factors, calculating factor scores, and an example application of factor analysis. The key points are:
1) Principal factor method estimates factors to explain maximum variation in the original data set and is similar to principal component analysis.
2) Testing the sufficiency of factor numbers uses a likelihood ratio test statistic to determine if factors are sufficient to explain the variance in the data.
3) Factor loadings indicate the correlation between variables and factors and are used to interpret and group factors.
This document discusses correlation and regression analysis. It defines correlation analysis as examining the relationship between two or more variables, and regression analysis as examining how one variable changes when another specific variable changes in value. It covers positive and negative correlation, linear and non-linear correlation, and how to calculate the coefficient of correlation. Regression analysis and regression equations are introduced for using a known variable to predict an unknown variable. Examples are provided to illustrate key concepts.
Chapter 10: Correlation and Regression
10.1: Correlation
This document provides an overview of simple linear regression. It defines regression as determining the statistical relationship between variables where changes in one variable depend on changes in another. Regression analysis is used for prediction and exploring relationships between dependent and independent variables. The key aspects covered include:
- Dependent variables change due to independent variables.
- Lines of regression show the relationship between the variables.
- The method of least squares is used to determine the line of best fit that minimizes the error between predicted and actual values.
- Linear regression models take the form of y = a + bx and are used for tasks like prediction and determining impact of independent variables.
1) The document discusses risk-return analysis and the efficient frontier. It introduces the Capital Market Line (CML), which shows superior portfolio combinations when investing in both risky and risk-free assets.
2) The CML is tangent to the efficient frontier at the market portfolio, which offers the highest Sharpe Ratio. The Sharpe Ratio represents excess return per unit of risk.
3) With access to risk-free borrowing and lending, investors are no longer confined to the efficient frontier, but can choose portfolios along the CML based on their individual risk preferences.
NPV and IRR are commonly used methods for evaluating investment decisions, but each has limitations. NPV compares the present value of cash inflows to the investment cost, accepting projects where NPV is positive. IRR is the discount rate that sets NPV to zero, accepting projects where IRR exceeds the opportunity cost of capital. However, IRR can indicate acceptance of projects with negative NPV. Managers also consider payback period and accounting rate of return, but these methods ignore timing of cash flows. Cash flows rather than accounting profits drive NPV analysis. Overall, NPV is preferred but multiple methods are often examined when evaluating large capital investments.
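The two criteria are easy to state in code. A sketch with made-up cash flows (a year-0 outflow followed by inflows):

```python
def npv(cash_flows, rate):
    """Net present value of cash flows indexed by year, starting at year 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def irr(cash_flows, lo=-0.99, hi=10.0, tol=1e-6):
    """IRR via bisection: the discount rate at which NPV = 0.
    Assumes NPV is decreasing and changes sign exactly once on [lo, hi],
    which holds for an initial outflow followed by inflows."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(cash_flows, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

flows = [-100, 60, 60]               # invest 100, receive 60 in each of 2 years
print(round(npv(flows, 0.10), 2))    # 4.13 -> accept at a 10% cost of capital
print(round(irr(flows), 4))          # about 0.1307, i.e. IRR of roughly 13.1%
```

Since IRR (13.1%) exceeds the 10% opportunity cost of capital and NPV is positive, both rules agree here; the summary's point is that they can disagree on less conventional cash-flow patterns.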
This document provides an overview of key concepts in bond valuation, equity valuation, and financial mathematics. It defines geometric sequences and series, and explains how to calculate the present value of perpetuities, annuities, growing perpetuities, and delayed perpetuities. It also provides examples of calculating present values for specific financial scenarios.
1. The document discusses different types of business entities including sole proprietorships, partnerships, and corporations. It notes key characteristics of each like ownership, liability, taxation, and life of the business.
2. It then covers topics relevant to corporate finance including the roles of financial managers, the goal of maximizing shareholder wealth, and investment and financing decisions.
3. The document also discusses concepts such as the balance sheet, time value of money, net present value, and discounted cash flow analysis which are important tools used in financial management.
1) The document discusses the envelope theorem and its application to optimization problems with parameters. The envelope theorem describes how optimal values of decision variables change with parameters.
2) It provides an example of optimizing a quadratic function with respect to a single parameter b, showing how the optimal values of x and y vary with b.
3) The envelope theorem can be applied to constrained and unconstrained optimization problems. It allows calculating the rate of change of the optimal objective value with respect to parameters.
This document provides an overview of matrix algebra concepts for business students. It defines key terms like matrix, order, types of matrices including identity, diagonal and triangular matrices, and matrix operations such as addition, subtraction and multiplication. It also explains determinants, which indicate whether a system of linear equations has a unique solution. For a 2×2 matrix, the determinant is the difference of the products of the diagonal elements. This document serves as a basic introduction and recap of matrix algebra.
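For the 2×2 case described, a one-line check (the example matrix and system are mine):

```python
def det2(m):
    """Determinant of a 2x2 matrix [[a, b], [c, d]] = a*d - b*c."""
    (a, b), (c, d) = m
    return a * d - b * c

# Nonzero determinant -> the system 2x + y = 5, x + y = 3 has a unique solution
print(det2([[2, 1], [1, 1]]))   # 1
# Zero determinant -> rows are linearly dependent, no unique solution
print(det2([[1, 2], [2, 4]]))   # 0
```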
1) The document discusses basic rules and concepts of integration, including that integration is the inverse process of differentiation and that the indefinite integral of a function f(x) is notated as ∫f(x) dx = F(x) + c, where F(x) is the primitive function and c is the constant of integration.
2) Methods of integration discussed include the substitution method, where a function is substituted for the variable, and integration by parts, which uses the product rule in reverse to solve integrals involving products.
3) Finding the constant of integration c requires knowing the value of the primitive function F(x) at a specific point, which eliminates the family of functions and isolates a single one.
1) The document provides an overview of key concepts in probability and statistics, including random variables, probability distributions, and characteristics of distributions such as expected value and variance.
2) It defines key probability terms such as population, sample, mutually exclusive events, independent events, and exhaustive events. It also covers how to calculate the probability of single and multiple events.
3) The document distinguishes between discrete and continuous random variables and probability distributions. It explains how probability distributions associate probabilities with individual outcomes for discrete variables but use probability density functions to provide probabilities over intervals for continuous variables.
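For a discrete random variable these summaries are direct sums over the distribution. A sketch using a fair die (my example, not the document's):

```python
# Fair six-sided die: each outcome has probability 1/6
outcomes = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

mean = sum(x * p for x, p in zip(outcomes, probs))               # E[X]
var = sum((x - mean) ** 2 * p for x, p in zip(outcomes, probs))  # Var(X)

print(mean)            # 3.5
print(round(var, 4))   # 2.9167 (= 35/12)
```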
For z = f(g(x, y)), the chain rule gives ∂z/∂x = f′ · ∂g/∂x and ∂z/∂y = f′ · ∂g/∂y.
The document discusses multi-variable functions and their derivatives. It defines partial derivatives as the slope of a multi-variable function with respect to one variable, holding the other variables constant. It provides examples of calculating partial derivatives using limits and applying rules like the product and chain rules. Formulas are given for finding the partial derivatives of a function z with respect to x and y at a specific point.
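A concrete instance of the chain rule for partial derivatives (my example): take z = f(g(x, y)) with f(u) = u³ and g(x, y) = x² + y.

```latex
z = (x^2 + y)^3, \qquad
\frac{\partial z}{\partial x}
  = f'(g)\,\frac{\partial g}{\partial x}
  = 3(x^2 + y)^2 \cdot 2x, \qquad
\frac{\partial z}{\partial y}
  = f'(g)\,\frac{\partial g}{\partial y}
  = 3(x^2 + y)^2 \cdot 1.
```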
The document provides an overview of basic calculus concepts including:
- Exponents and exponent rules for multiplying, dividing, and raising to powers.
- Algebraic expressions including monomials, binomials, polynomials, and equations.
- Common identities for exponents, polynomials, trigonometric functions.
- The definition of a function as a correspondence between variables where each input has a single output.
- Examples of basic functions including power, exponential, logarithmic, and trigonometric functions.
The Dynamic of Business Cycle in Kalecki's Theory: Duality in the Nature of I... — Farzad Javidanrad
This document summarizes Kalecki's theory of business cycles, which attributes cycles to the dual nature of investment in capitalism. It introduces Kalecki's concepts of short-period equilibrium and a dynamic process as a chain of short-period equilibria. Key aspects of Kalecki's theory included differentiating between workers' consumption and capitalists' consumption, and introducing a time lag between investment decisions and investment output through the concept of a "gestation period". This dynamic process with a lag is what drives the cyclical nature of capitalism according to Kalecki's theory.
This document provides an overview of financial markets and institutions. It begins by defining key terms like financial systems and markets. It then describes different types of financial markets including capital markets, money markets, commodity markets, and more. It also outlines various financial institutions like commercial banks, investment banks, insurance companies, and others. The document discusses how funds flow through the financial system directly and indirectly. It also touches on important concepts like asymmetric information, free rider problems, and how financial development relates to economic growth. Finally, it introduces Minsky's financial instability hypothesis and how periods of stability can lead to increased risk-taking and potential financial crises.
The Jewish Trinity : Sabbath,Shekinah and Sanctuary 4.pdfJackieSparrow3
we may assume that God created the cosmos to be his great temple, in which he rested after his creative work. Nevertheless, his special revelatory presence did not fill the entire earth yet, since it was his intention that his human vice-regent, whom he installed in the garden sanctuary, would extend worldwide the boundaries of that sanctuary and of God’s presence. Adam, of course, disobeyed this mandate, so that humanity no longer enjoyed God’s presence in the little localized garden. Consequently, the entire earth became infected with sin and idolatry in a way it had not been previously before the fall, while yet in its still imperfect newly created state. Therefore, the various expressions about God being unable to inhabit earthly structures are best understood, at least in part, by realizing that the old order and sanctuary have been tainted with sin and must be cleansed and recreated before God’s Shekinah presence, formerly limited to heaven and the holy of holies, can dwell universally throughout creation
Principles of Roods Approach!!!!!!!.pptxibtesaam huma
Principles of Rood’s Approach
Treatment technique used in physiotherapy for neurological patients which aids them to recover and improve quality of life
Facilitatory techniques
Inhibitory techniques
No, it's not a robot: prompt writing for investigative journalismPaul Bradshaw
How to use generative AI tools like ChatGPT and Gemini to generate story ideas for investigations, identify potential sources, and help with coding and writing.
A talk from the Centre for Investigative Journalism Summer School, July 2024
Integrated Marketing Communications (IMC)- Concept, Features, Elements, Role of advertising in IMC
Advertising: Concept, Features, Evolution of Advertising, Active Participants, Benefits of advertising to Business firms and consumers.
Classification of advertising: Geographic, Media, Target audience and Functions.
Satta Matka Dpboss Kalyan Matka Results Kalyan ChartMohit Tripathi
SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA MATKA RESULT KALYAN MATKA TIPS SATTA MATKA MATKA COM MATKA PANA JODI TODAY BATTA SATKA MATKA PATTI JODI NUMBER MATKA RESULTS MATKA CHART MATKA JODI SATTA COM INDIA SATTA MATKA MATKA TIPS MATKA WAPKA ALL MATKA RESULT LIVE ONLINE MATKA RESULT KALYAN MATKA RESULT DPBOSS MATKA 143 MAIN MATKA KALYAN MATKA RESULTS KALYAN CHART
Kalyan Matka Kalyan Result Satta Matka Result Satta Matka Kalyan Satta Matka Kalyan Open Today Satta Matka Kalyan
Kalyan today kalyan trick kalyan trick today kalyan chart kalyan today free game kalyan today fix jodi kalyan today matka kalyan today open Kalyan jodi kalyan jodi trick today kalyan jodi trick kalyan jodi ajj ka.
Beginner's Guide to Bypassing Falco Container Runtime Security in Kubernetes ...anjaliinfosec
This presentation, crafted for the Kubernetes Village at BSides Bangalore 2024, delves into the essentials of bypassing Falco, a leading container runtime security solution in Kubernetes. Tailored for beginners, it covers fundamental concepts, practical techniques, and real-world examples to help you understand and navigate Falco's security mechanisms effectively. Ideal for developers, security professionals, and tech enthusiasts eager to enhance their expertise in Kubernetes security and container runtime defenses.
Credit limit improvement system in odoo 17Celine George
In Odoo 17, confirmed and uninvoiced sales orders are now factored into a partner's total receivables. As a result, the credit limit warning system now considers this updated calculation, leading to more accurate and effective credit management.
Understanding and Interpreting Teachers’ TPACK for Teaching Multimodalities i...Neny Isharyanti
Presented as a plenary session in iTELL 2024 in Salatiga on 4 July 2024.
The plenary focuses on understanding and intepreting relevant TPACK competence for teachers to be adept in teaching multimodality in the digital age. It juxtaposes the results of research on multimodality with its contextual implementation in the teaching of English subject in the Indonesian Emancipated Curriculum.
2. Some Basic Concepts:
o Variable: A letter (symbol) which represents the elements of a specific set.
o Random Variable: A variable whose values appear at random, according to a probability distribution.
o Probability Distribution: A rule (function) which assigns a probability to the values of a random variable (individually, or to a set of them). E.g. for the number of heads 𝒙 in one toss of a fair coin (outcomes 𝐻, 𝑇):

𝒙        0     1
𝑃(𝑥)    0.5   0.5

In two tosses the outcomes are 𝐻𝐻, 𝐻𝑇, 𝑇𝐻, 𝑇𝑇.
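The coin-toss distribution above can be simulated to see the probabilities emerge as relative frequencies. A minimal sketch, assuming numpy is available (the sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fair-coin example from the slide: x takes the value 0 (tails) or 1 (heads),
# each with probability 0.5.
values = np.array([0, 1])
probs = np.array([0.5, 0.5])

draws = rng.choice(values, size=100_000, p=probs)

# The observed relative frequency of each value approaches P(x).
freq_heads = np.mean(draws == 1)
print(round(freq_heads, 2))  # close to 0.5
```

With a large number of draws the empirical frequency settles near the assigned probability, which is exactly what the distribution table encodes.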
3. Correlation:
Is there any relation between:
fast food sales and different seasons?
a specific crime and religion?
smoking cigarettes and lung cancer?
maths score and overall score in an exam?
temperature and earthquakes?
cost of advertisement and number of sold items?
To answer each question two sets of corresponding data need to
be randomly collected.
Let random variable "𝒙" represents the first group of
data and random variable "𝒚" represents the second.
Question: Is this true that students who have a better
overall result are good in maths?
4. Our aim is to find out whether there is any linear association between 𝒙 and 𝒚. In statistics, the technical term for linear association is “correlation”. So, we are looking to see if there is any correlation between the two scores.
“Linear association”: the variables are in relation at their levels, i.e. 𝒙 with 𝒚, not with 𝒚², 𝒚³, 𝟏/𝒚 or even ∆𝒚.
Imagine we have a random sample of scores in a school as follows:
5. In our example, the correlation between 𝒙 and 𝒚 can be shown in a scatter diagram:

[Scatter diagram: maths score (X, 0 to 100) against overall score (Y, 0 to 100). Caption: Correlation between maths score and overall score.]

The graph shows a positive correlation between maths scores and overall scores, i.e. when 𝒙 increases, 𝒚 increases too.
6. Different scatter diagrams show different types of correlation:
• Is this enough? Are we happy?
Certainly not!! We think we know things better when they are described by numbers!!!!
Scatter diagrams are informative, but to find the degree (strength) of a correlation between two variables we need a numerical measurement.
Adapted from www.pdesas.org
7. Following the work of Francis Galton on the regression line, in 1896 Karl Pearson introduced a formula for measuring the correlation between two variables, called the Correlation Coefficient or Pearson’s Correlation Coefficient.
For a sample of size 𝒏, the sample correlation coefficient 𝒓_𝒙𝒚 can be calculated by:

𝒓_𝒙𝒚 = [(1/𝒏) Σ(𝒙ᵢ − x̄)(𝒚ᵢ − ȳ)] / [√((1/𝒏) Σ(𝒙ᵢ − x̄)²) · √((1/𝒏) Σ(𝒚ᵢ − ȳ)²)] = 𝒄𝒐𝒗(𝒙, 𝒚) / (𝑺_𝒙 · 𝑺_𝒚)

Where x̄ and ȳ are the mean values of 𝒙 and 𝒚 in the sample and 𝑺 represents the biased version of the “standard deviation”*. The covariance between 𝒙 and 𝒚 (𝒄𝒐𝒗(𝒙, 𝒚)) shows how much 𝒙 and 𝒚 change together.
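The sample formula can be sketched directly in code. A minimal version assuming numpy; the score data below are hypothetical, and note that the 1/𝒏 factors cancel between numerator and denominator:

```python
import numpy as np

def pearson_r(x, y):
    """Sample correlation r_xy = cov(x, y) / (S_x * S_y), written with the
    biased (divide-by-n) covariance and standard deviations; the 1/n
    factors cancel, so the bias correction does not change the result."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

maths = [55, 60, 72, 80, 90]      # hypothetical maths scores (x)
overall = [58, 64, 70, 78, 88]    # hypothetical overall scores (y)
r = pearson_r(maths, overall)
print(round(r, 3))
```

The result agrees with numpy's built-in `np.corrcoef(maths, overall)[0, 1]`, which computes the same quantity.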
8. Alternatively, if there is an opportunity to observe all available data, the population correlation coefficient (𝝆_𝒙𝒚) can be obtained by:

𝝆_𝒙𝒚 = 𝑬[(𝒙ᵢ − 𝝁_𝒙)(𝒚ᵢ − 𝝁_𝒚)] / [√(𝑬(𝒙ᵢ − 𝝁_𝒙)²) · √(𝑬(𝒚ᵢ − 𝝁_𝒚)²)] = 𝒄𝒐𝒗(𝒙, 𝒚) / (𝝈_𝒙 · 𝝈_𝒚)

Where 𝑬, 𝝁 and 𝝈 are the expected value, mean and standard deviation of the random variables, respectively, and each expectation is an average over the whole population of size 𝑵.
Question: Under what conditions can we use this
population correlation coefficient?
9. If 𝒙 = 𝒂𝒚 + 𝒃 (for 𝒂, 𝒃 ∈ 𝑹 and 𝒂 > 𝟎), then 𝒓_𝒙𝒚 = 𝟏:
maximum (perfect) positive correlation.
If 𝒙 = 𝒂𝒚 + 𝒃 (for 𝒂, 𝒃 ∈ 𝑹 and 𝒂 < 𝟎), then 𝒓_𝒙𝒚 = −𝟏:
maximum (perfect) negative correlation.
If there is no linear association between 𝒙 and 𝒚, then 𝒓_𝒙𝒚 = 𝟎.
Note 1: If there is no linear association between two random variables, they might have a non-linear association or no association at all.
11. [Scatter diagrams, adapted and modified from www.tice.agrocampus-ouest.fr]

Positive linear association: 𝒓_𝒙𝒚 = 𝟏 (perfect), 𝒓_𝒙𝒚 ≈ 𝟏 (strong), 𝟎 < 𝒓_𝒙𝒚 < 𝟏 (weak)
No linear association: 𝒓_𝒙𝒚 = 𝟎 (no correlation)
Negative linear association: −𝟏 < 𝒓_𝒙𝒚 < 𝟎 (weak), 𝒓_𝒙𝒚 ≈ −𝟏 (strong), 𝒓_𝒙𝒚 = −𝟏 (perfect)

(In the perfect-correlation panels, the slope of the line reflects whether 𝑺_𝒙 > 𝑺_𝒚, 𝑺_𝒙 = 𝑺_𝒚 or 𝑺_𝒙 < 𝑺_𝒚.)
12. Some properties of the correlation coefficient (sample or population):
a. It lies between -1 and 1, i.e. −𝟏 ≤ 𝒓_𝒙𝒚 ≤ 𝟏.
b. It is symmetrical with respect to 𝒙 and 𝒚, i.e. 𝒓_𝒙𝒚 = 𝒓_𝒚𝒙. This means the direction of calculation is not important.
c. It is just a pure number, independent of the unit of measurement of 𝒙 and 𝒚.
d. It is independent of the choice of origin and scale of the measurements of 𝒙 and 𝒚, that is:

𝒓_𝒙𝒚 = 𝒓_(𝒂𝒙+𝒃)(𝒄𝒚+𝒅)   (𝒂, 𝒄 > 𝟎)
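Property (d) can be checked numerically. A small sketch assuming numpy; the simulated data and the particular shift/scale constants are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)   # y linearly related to x, plus noise

r_xy = np.corrcoef(x, y)[0, 1]
# Property (d): shifting and positively rescaling either variable leaves
# the correlation coefficient unchanged: r_xy = r_{ax+b, cy+d} for a, c > 0.
r_transformed = np.corrcoef(3.0 * x + 7.0, 0.5 * y - 2.0)[0, 1]

print(np.isclose(r_xy, r_transformed))  # True
```

Intuitively, standardising each variable (subtracting its mean and dividing by its standard deviation) absorbs any positive affine transformation, so the coefficient cannot change.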
13. e. If 𝒙 and 𝒚 are statistically independent, i.e. 𝒇(𝒙, 𝒚) = 𝒇(𝒙) · 𝒇(𝒚), where 𝒇(𝒙, 𝒚) is the joint Probability Density Function (PDF), then 𝒓_𝒙𝒚 = 𝟎.
Important Note:
Many researchers wrongly construct a theory based on nothing more than a simple correlation test.
Correlation does not imply causation.
If there is a high correlation between the number of smoked cigarettes and the number of infected lung cells, it does not necessarily mean that smoking causes lung cancer. A causality test (such as the Granger causality test) is different from a correlation test.
In a causality test it is important to know the direction of causality (e.g. 𝒙 on 𝒚 and not vice versa), but in correlation we are only trying to find out whether two variables move together (in the same or opposite directions).
14. Determination Coefficient and Correlation Coefficient:
𝒓_𝒙𝒚 = ±𝟏: a perfect linear relationship between the variables, i.e. 𝒙 is the only factor which describes the variations of 𝒚 at the level (linearly); 𝒚 = 𝒂 + 𝒃𝒙.
𝒓_𝒙𝒚 ≈ ±𝟏: 𝒙 is not the only factor which describes the variations of 𝒚, but we can still imagine a line that represents this relationship, passing through most of the points or having, in total, a minimum vertical distance from them. This line is called the “line of best fit”, known technically as the “regression line”.
Adapted from www.ncetm.org.uk/public/files/195322/G3fb.jpg
The graph shows a line of best fit between the age of a car and its price. Imagine the line has the equation 𝒚 = 𝒂 + 𝒃𝒙.
15. The criterion for choosing one line among others is the goodness of fit, which can be calculated through the determination coefficient, 𝒓².
In the previous example, the age of a car is only one factor among many others that explain its price. Can you find some other factors?
If 𝒚 and 𝒙 represent the price and age of cars respectively, the percentage of the variation of 𝒚 which is determined (explained) by the variation of 𝒙 is called the “determination coefficient”.
The determination coefficient can be understood better through Venn-Euler diagrams:
16. [Venn-Euler diagrams: circles for y and x, overlapping progressively more until y = x]

𝒓² = 𝟎: none of the variation of y can be determined by x (no linear association).
𝒓² ≈ 𝟎: a small percentage of the variation of y can be determined by x (weak linear association).
𝒓² ≈ 𝟏: a large percentage of the variation of y can be determined by x (strong linear association).
𝒓² = 𝟏: all of the variation of y can be determined by x and no other factors (complete linear association).
The shaded area shows the percentage of the variation of y which can be determined by x. It is easy to see that 𝟎 ≤ 𝒓² ≤ 𝟏.
17. Although the determination coefficient (𝒓²) is conceptually different from the correlation coefficient (𝒓_𝒙𝒚), one can be calculated from the other; in fact:

𝒓_𝒙𝒚 = ±√(𝒓²)

Or, alternatively:

𝒓² = 𝒃² · [(1/𝒏) Σ(𝒙ᵢ − x̄)²] / [(1/𝒏) Σ(𝒚ᵢ − ȳ)²] = 𝒃² · 𝑺_𝒙² / 𝑺_𝒚²

Where 𝒃 is the slope coefficient in the regression line 𝒚 = 𝒂 + 𝒃𝒙.
Note: If 𝒚 = 𝒂 + 𝒃𝒙 is the regression line of 𝒚 on 𝒙, and 𝒙 = 𝒄 + 𝒅𝒚 is the regression line of 𝒙 on 𝒚, then 𝒓² = 𝒃 · 𝒅.
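The note above, 𝒓² = 𝒃 · 𝒅, can be verified numerically. A sketch assuming numpy, with arbitrary simulated data; both slopes are computed from the covariance formulas on this slide:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = 1.5 * x + rng.normal(size=300)   # linear signal plus noise

# Slope of the regression of y on x: b = cov(x, y) / S_x^2
b = np.cov(x, y, bias=True)[0, 1] / np.var(x)
# Slope of the regression of x on y: d = cov(x, y) / S_y^2
d = np.cov(x, y, bias=True)[0, 1] / np.var(y)

r = np.corrcoef(x, y)[0, 1]
print(np.isclose(r**2, b * d))  # True
```

The identity holds exactly because b·d = cov²/(S_x²·S_y²), which is the definition of r² rearranged.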
18. Summary of Correlation & Determination Coefficients:
• Correlation means a linear association between two random variables which
could be positive or negative or zero.
• Linear association means that variables are in relations at their levels
(linearly).
• Correlation coefficient measures the strength of linear association between
two variables. It could be calculated for a sample or for the whole population.
• The value of the correlation coefficient lies between -1 and 1; values at the extremes show the strongest correlation (negative or positive), and moving towards zero the correlation becomes weaker.
• Correlation does not imply causation.
• The determination coefficient shows the percentage of the variation of one variable which can be described by another variable, and it is a measure of the goodness of fit for lines passing through the plotted points.
• The value of the determination coefficient lies between 0 and 1 and can be obtained from the correlation coefficient by squaring it.
19. • Knowing that two random variables are linearly associated is not, by itself, very satisfying. Sometimes there is a strong idea that the variation of one variable can solidly explain the variation of another.
• To test this idea (hypothesis) we need another analytical approach, called “regression analysis”.
• In regression analysis we try to study or predict the mean (average) value of a dependent variable 𝒀 based on the knowledge we have about the independent (explanatory) variable(s) 𝑿₁, 𝑿₂, …, 𝑿ₙ. This is familiar to those who know the meaning of conditional probabilities, as we are going to build a linear model whose deterministic part is:

𝑬(𝒀 | 𝑿₁, 𝑿₂, …, 𝑿ₙ) = 𝜷₀ + 𝜷₁𝑿₁ + 𝜷₂𝑿₂ + ⋯ + 𝜷ₙ𝑿ₙ
20. • The deterministic part of the regression model does reflect the
structure of the relationship between 𝒀 and 𝑿′ 𝒔 in a
mathematical world but we live in a stochastic world.
• God’s knowledge (if the term is applicable) is deterministic but
our perception about everything in this world is always
stochastic and our model should be built in this way.
• To understand the concept of a stochastic model, let's consider an example:
If we make a model between monthly consumption expenditure 𝑪 and monthly income 𝑰, the model cannot be deterministic (mathematical) such that for every value of 𝑰 there is one and only one value of 𝑪 (which is the concept of a functional relationship in maths). Why?
21. Although income is the main variable determining the amount of consumption expenditure, many other factors, such as the mood of people, their wealth, the interest rate, etc., are overlooked in a simple mathematical model such as 𝑪 = 𝒇(𝑰), yet their influences can change the value of 𝑪 even at the same level of 𝑰. We assume that the average impact of all these omitted variables is random (sometimes positive and sometimes negative). So, in order to make a realistic model, we need to add a stochastic (random) term 𝒖 to our mathematical model: 𝑪 = 𝒇(𝑰) + 𝒖
𝑰         𝑪
£1000     £800, £750, £900
£1400     £1000, £1200, £1150
  ⋮          ⋮

The change in consumption expenditure comes from the change of income (𝐼) or the change of some random elements (𝑢), so we can write 𝑪 = 𝒇(𝑰) + 𝒖
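The stochastic model C = f(I) + u can be simulated to reproduce the pattern in the table: several different consumption values at the same income level. A sketch assuming numpy; the intercept, slope, and noise scale are purely illustrative, not estimates:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical deterministic part: C = 100 + 0.6 * I (numbers are illustrative).
income = np.repeat([1000.0, 1400.0], 3)        # two income levels, observed 3 times each
u = rng.normal(loc=0.0, scale=60.0, size=6)    # random impact of omitted factors
consumption = 100 + 0.6 * income + u

# At the same income level, consumption differs from observation to observation:
print(consumption[:3])  # three different C values for I = 1000
```

The deterministic part pins down the average consumption at each income level; the stochastic term u scatters individual observations around that average.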
22. • The general stochastic model for our purpose would be as follows, and is called the “Linear Regression Model**”:

𝒀ᵢ = 𝑬(𝒀ᵢ | 𝑿₁ᵢ, …, 𝑿ₙᵢ) + 𝒖ᵢ

Which can be written as:

𝒀ᵢ = 𝜷₀ + 𝜷₁𝑿₁ᵢ + 𝜷₂𝑿₂ᵢ + ⋯ + 𝜷ₙ𝑿ₙᵢ + 𝒖ᵢ

Where 𝒊 (𝑖 = 1, 2, …, 𝑛) indexes the observations (days, weeks, months, years, etc.) and 𝒖ᵢ is an error (stochastic) term, a representative of all other influential variables which are not considered in the model and are ignored.
• The deterministic part of the model,

𝑬(𝒀ᵢ | 𝑿₁ᵢ, …, 𝑿ₙᵢ) = 𝜷₀ + 𝜷₁𝑿₁ᵢ + 𝜷₂𝑿₂ᵢ + ⋯ + 𝜷ₙ𝑿ₙᵢ

is called the Population Regression Function (PRF).
23. • The general form of the Linear Regression Model with 𝒌 explanatory variables and 𝒏 observations can be shown in matrix form as:

𝒀(𝒏×𝟏) = 𝑿(𝒏×(𝒌+𝟏)) 𝜷((𝒌+𝟏)×𝟏) + 𝒖(𝒏×𝟏)

Or simply:

𝒀 = 𝑿𝜷 + 𝒖

Where

𝒀 = [𝑌₁, 𝑌₂, …, 𝑌ₙ]ᵀ,  𝜷 = [𝛽₀, 𝛽₁, …, 𝛽ₖ]ᵀ,  𝒖 = [𝑢₁, 𝑢₂, …, 𝑢ₙ]ᵀ

and 𝑿 is the design matrix whose rows are the observations (with a column of ones for the intercept):

𝑿 = ⎡ 1  𝑋₁₁  𝑋₂₁  …  𝑋ₖ₁ ⎤
    ⎢ 1  𝑋₁₂  𝑋₂₂  …  𝑋ₖ₂ ⎥
    ⎢ ⋮   ⋮    ⋮   ⋱   ⋮  ⎥
    ⎣ 1  𝑋₁ₙ  𝑋₂ₙ  …  𝑋ₖₙ ⎦

𝒀 is also called the regressand and 𝑿 is the matrix of regressors.
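The matrix form translates directly into code: OLS solves the normal equations (X′X)b = X′Y, so b = (X′X)⁻¹X′Y. A sketch assuming numpy, with arbitrary simulated data and illustrative true parameters:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
beta = np.array([2.0, 1.0, -0.5])            # true beta0, beta1, beta2 (illustrative)

X = np.column_stack([np.ones(n), X1, X2])    # design matrix with intercept column
u = rng.normal(scale=0.1, size=n)            # small error term
Y = X @ beta + u

# OLS in matrix form: b = (X'X)^{-1} X'Y, computed via a linear solve
# rather than an explicit inverse (numerically safer).
b = np.linalg.solve(X.T @ X, X.T @ Y)
print(np.round(b, 2))  # close to the true parameters
```

With a small error variance the estimates land very close to the true β, as the unbiasedness discussed later suggests.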
24. • 𝜷₀ is the intercept and the 𝜷ᵢ's are slope coefficients, also called regression parameters. The value of each parameter shows the effect of a one-unit change in the associated regressor 𝑿ᵢ on the mean value of the regressand 𝒀ᵢ. The idea is to estimate the unknown values of the population regression parameters using estimators based on sample data.
• The sample counterpart of the regression line can be written in the form:

𝒀ᵢ = Ŷᵢ + ûᵢ

or

𝒀ᵢ = 𝒃₀ + 𝒃₁𝑿₁ᵢ + 𝒃₂𝑿₂ᵢ + ⋯ + 𝒃ₙ𝑿ₙᵢ + 𝒆ᵢ

Where Ŷᵢ = 𝒃₀ + 𝒃₁𝑿₁ᵢ + 𝒃₂𝑿₂ᵢ + ⋯ + 𝒃ₙ𝑿ₙᵢ is the deterministic part of the sample model, called the “Sample Regression Function (SRF)”, the 𝒃ᵢ's are estimators of the unknown parameters 𝜷ᵢ's, and ûᵢ = 𝒆ᵢ is a residual.
25. The following graph shows the important elements of the PRF and the SRF:

[Graph, adapted and altered from http://marketingclassic.blogspot.co.uk/2011_12_01_archive.html]

In the PRF: 𝒀ᵢ − 𝑬(𝒀 | 𝑿ᵢ) = 𝒖ᵢ
In the SRF: 𝒀ᵢ − Ŷᵢ = ûᵢ = 𝒆ᵢ

𝑺𝑹𝑭: Ŷᵢ = 𝒃₀ + 𝒃₁𝑿ᵢ  (estimation of 𝒀ᵢ based on the SRF)
𝑷𝑹𝑭: 𝑬(𝒀 | 𝑿ᵢ) = 𝜷₀ + 𝜷₁𝑿ᵢ  (estimation of 𝒀ᵢ based on the PRF)

The PRF is a hypothetical line about which we have no direct knowledge; we try to estimate its parameters based on the data in the sample.
26. • Now the question is how to calculate the 𝒃ᵢ's based on the sample observations, and how to ensure that they are good and unbiased estimators of the 𝜷ᵢ's in the population.
• There are two main methods of calculating the 𝒃ᵢ's and constructing the SRF, called the “method of Ordinary Least Squares (OLS)” and the “method of Maximum Likelihood (ML)”. Here we focus on the OLS method, as it is the most widely used. For simplicity, we start with the two-variable PRF (𝒀ᵢ = 𝜷₀ + 𝜷₁𝑿ᵢ) and its SRF counterpart (Ŷᵢ = 𝒃₀ + 𝒃₁𝑿ᵢ).
• According to the OLS method we try to minimise the sum of the squared residuals in a hypothetical sample; i.e.

Σûᵢ² = Σ𝒆ᵢ² = Σ(𝒀ᵢ − Ŷᵢ)² = Σ(𝒀ᵢ − 𝒃₀ − 𝒃₁𝑿ᵢ)²    (A)
27. • It is obvious from the previous equation that the sum of squared residuals is a function of 𝒃₀ and 𝒃₁, i.e.

Σ𝒆ᵢ² = 𝒇(𝒃₀, 𝒃₁)

because if these two parameters (intercept and slope) change, Σ𝒆ᵢ² will change (see the graph on slide 25).
• Differentiating A partially with respect to 𝒃₀ and 𝒃₁, and following the first-order (necessary) conditions for optimisation in calculus, we have:

∂(Σ𝒆ᵢ²)/∂𝒃₀ = −𝟐 Σ(𝒀ᵢ − 𝒃₀ − 𝒃₁𝑿ᵢ) = −𝟐 Σ𝒆ᵢ = 𝟎
∂(Σ𝒆ᵢ²)/∂𝒃₁ = −𝟐 Σ𝑿ᵢ(𝒀ᵢ − 𝒃₀ − 𝒃₁𝑿ᵢ) = −𝟐 Σ𝑿ᵢ𝒆ᵢ = 𝟎    (B)
28. After simplification we reach two equations with two unknowns, 𝒃₀ and 𝒃₁:

Σ𝒀ᵢ = 𝒏𝒃₀ + 𝒃₁ Σ𝑿ᵢ
Σ𝒀ᵢ𝑿ᵢ = 𝒃₀ Σ𝑿ᵢ + 𝒃₁ Σ𝑿ᵢ²

Where 𝒏 is the sample size. So:

𝒃₁ = Σ(𝑿ᵢ − X̄)(𝒀ᵢ − Ȳ) / Σ(𝑿ᵢ − X̄)² = Σ𝒙ᵢ𝒚ᵢ / Σ𝒙ᵢ² = 𝒄𝒐𝒗(𝒙, 𝒚) / 𝑺_𝒙²

Where 𝑺_𝒙 is the biased version of the sample standard deviation, i.e. with 𝒏 instead of (𝒏 − 𝟏) in the denominator:

𝑺_𝒙 = √( Σ(𝑿ᵢ − X̄)² / 𝒏 )
29. And

𝒃₀ = Ȳ − 𝒃₁X̄

• The 𝒃₀ and 𝒃₁ obtained from the OLS method are point estimators of 𝜷₀ and 𝜷₁ in the population, but in order to test hypotheses about the population parameters we need knowledge about the distributions of their estimators. For that reason we need to make some assumptions about the explanatory variables and the error term in the PRF (see the equations in B to find the reason).
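The closed-form solutions from slides 28 and 29 can be sketched directly. A minimal version assuming numpy; the simulated data and true parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=50)
Y = 3.0 + 2.0 * X + rng.normal(size=50)   # true b0 = 3, b1 = 2 (illustrative)

x = X - X.mean()                          # deviation form x_i = X_i - Xbar
y = Y - Y.mean()
b1 = np.sum(x * y) / np.sum(x**2)         # b1 = sum(x_i y_i) / sum(x_i^2)
b0 = Y.mean() - b1 * X.mean()             # b0 = Ybar - b1 * Xbar

print(round(b0, 1), round(b1, 1))
```

The estimates should land near the true values of 3 and 2, with the gap shrinking as the sample grows or the error variance falls.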
The Assumptions Underlying the OLS Method:
1. The regression model is linear in terms of its parameters (coefficients).*
2. The values of the explanatory variable(s) are fixed in repeated sampling. This means that the nature of the explanatory variables (𝑿's) is non-stochastic. The only stochastic variables are the error term (𝒖ᵢ) and the regressand (𝒀ᵢ).
3. The disturbance (error) terms are normally distributed with zero mean and equal variance, given the values of the 𝑿's. That is: 𝒖ᵢ ~ 𝑵(𝟎, 𝝈²)
30. 4. There is no autocorrelation between error terms, i.e.

𝒄𝒐𝒗(𝒖ᵢ, 𝒖ⱼ) = 𝟎

This means they are completely random and there is no association between them or any pattern in their appearance.
5. There is no correlation between the error terms and the explanatory variables, i.e.

𝒄𝒐𝒗(𝒖ᵢ, 𝑿ᵢ) = 𝟎

6. The number of observations (sample size) should be bigger than the number of parameters in the model.
7. The model should be logically and correctly specified, in terms of both its functional form and the type and nature of the variables entering the model.
These are the assumptions of the Classical Linear Regression Model (CLRM), and they are sometimes called the Gaussian assumptions on linear regression models.
31. • Under these assumptions, and also the central limit theorem, the OLS estimators in the sampling distribution (repeated sampling), when 𝒏 → ∞, have a normal distribution:

𝒃₀ ~ 𝑵(𝜷₀, (Σ𝑿ᵢ² / (𝒏 Σ𝒙ᵢ²)) · 𝝈²)

𝒃₁ ~ 𝑵(𝜷₁, 𝝈² / Σ𝒙ᵢ²)

where 𝝈² is the variance of the error term (𝒗𝒂𝒓(𝒖ᵢ) = 𝝈²), which can itself be estimated through the estimator σ̂, where:

σ̂ = √( Σ𝒆ᵢ² / (𝒏 − 𝟐) )

or, when there are 𝒌 parameters in the model:

σ̂ = √( Σ𝒆ᵢ² / (𝒏 − 𝒌) )
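The sampling distribution of 𝒃₁ can be illustrated by Monte Carlo: hold the 𝑿's fixed (assumption 2), redraw the errors many times, and re-estimate. A sketch assuming numpy; all the parameter values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(6)
n, beta0, beta1, sigma = 30, 1.0, 0.5, 2.0      # illustrative values
X = rng.uniform(0, 10, size=n)                  # fixed in repeated sampling
x = X - X.mean()

# Repeated sampling: draw new errors, re-estimate b1 each time.
b1_draws = []
for _ in range(5000):
    Y = beta0 + beta1 * X + rng.normal(scale=sigma, size=n)
    y = Y - Y.mean()
    b1_draws.append(np.sum(x * y) / np.sum(x**2))
b1_draws = np.array(b1_draws)

# Theory: b1 ~ N(beta1, sigma^2 / sum(x_i^2))
print(round(b1_draws.mean(), 2), round(b1_draws.var(), 3),
      round(sigma**2 / np.sum(x**2), 3))
```

The Monte Carlo mean should sit near β₁ = 0.5 (unbiasedness) and the Monte Carlo variance near the theoretical value σ²/Σxᵢ².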
32. • Based on the assumptions of the classical linear regression model (CLRM), the Gauss-Markov theorem asserts that the least squares estimators have the minimum variance among unbiased estimators. So they are the Best Linear Unbiased Estimators (BLUE).
Interval Estimation For Population Parameters:
• In order to construct a confidence interval for the unknown 𝜷's (the PRF's parameters) we can follow either the Z distribution (if we have prior knowledge about 𝝈) or the t-distribution (if we use σ̂ instead).
• The confidence interval for the slope parameter at any significance level 𝜶 would be*:

𝑷( 𝒃₁ − 𝒁(𝜶/𝟐) · σ̂(𝒃₁) ≤ 𝜷₁ ≤ 𝒃₁ + 𝒁(𝜶/𝟐) · σ̂(𝒃₁) ) = 𝟏 − 𝜶

Or

𝑷( 𝒃₁ − 𝒕(𝜶/𝟐, 𝒏−𝟐) · σ̂(𝒃₁) ≤ 𝜷₁ ≤ 𝒃₁ + 𝒕(𝜶/𝟐, 𝒏−𝟐) · σ̂(𝒃₁) ) = 𝟏 − 𝜶
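The t-based interval can be sketched end to end. A minimal version assuming numpy and scipy are available; the data are simulated with an illustrative true slope of 0.5:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 40
X = rng.uniform(0, 10, size=n)
Y = 1.0 + 0.5 * X + rng.normal(size=n)       # true beta1 = 0.5 (illustrative)

x, y = X - X.mean(), Y - Y.mean()
b1 = np.sum(x * y) / np.sum(x**2)
b0 = Y.mean() - b1 * X.mean()
e = Y - (b0 + b1 * X)

sigma_hat = np.sqrt(np.sum(e**2) / (n - 2))  # sigma-hat with n-2 degrees of freedom
se_b1 = sigma_hat / np.sqrt(np.sum(x**2))    # estimated standard error of b1

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)  # t_{alpha/2, n-2}
ci_low, ci_high = b1 - t_crit * se_b1, b1 + t_crit * se_b1
print(round(ci_low, 3), round(ci_high, 3))
```

Over repeated samples, an interval constructed this way covers the true β₁ in roughly (1 − α) of the samples; any single interval either does or does not contain it.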
33. Hypothesis Testing For Parameters:
• The critical values (Z or t) in the confidence intervals can be used to find the rejection area(s) and test any hypothesis on the parameters.
• For example, to test 𝑯₀: 𝜷₁ = 𝟎 against the alternative 𝑯₁: 𝜷₁ ≠ 𝟎, after finding the critical t values (meaning we do not have prior knowledge of 𝝈 and use σ̂ instead) at some significance level 𝜶, we will have two critical regions, and if the value of the test statistic

𝒕 = (𝒃₁ − 𝜷₁) / ( σ̂ / √(Σ𝒙ᵢ²) )

falls in a critical region, 𝑯₀: 𝜷₁ = 𝟎 must be rejected.
• In case we have more than one slope parameter, the degrees of freedom for the t-distribution will be the sample size 𝒏 minus the number of estimated parameters including the intercept, i.e. for 𝒌 parameters 𝒅𝒇 = 𝒏 − 𝒌.
34. Determination Coefficient 𝒓² and Goodness of Fit:
• In earlier slides we talked about the determination coefficient and its relationship with the correlation coefficient. The coefficient of determination 𝒓² comes to our attention once there is no issue with the estimation of the regression parameters.
• It is a measure which shows how well the SRF fits the data.
• To understand this measure properly, let's look at it from a different angle.
We know that

𝒀ᵢ = Ŷᵢ + 𝒆ᵢ   where 𝒆ᵢ = 𝒀ᵢ − Ŷᵢ

and, in deviation form, after subtracting Ȳ from both sides:

𝒀ᵢ − Ȳ = (Ŷᵢ − Ȳ) + 𝒆ᵢ

[Figure adapted from Basic Econometrics, Gujarati, p. 76]
35. So:

𝒀ᵢ − Ȳ = (Ŷᵢ − Ȳ) + (𝒀ᵢ − Ŷᵢ)

Or in deviation form:

𝒚ᵢ = ŷᵢ + 𝒆ᵢ

By squaring both sides and summing over the sample we have:

Σ𝒚ᵢ² = Σŷᵢ² + 𝟐 Σŷᵢ𝒆ᵢ + Σ𝒆ᵢ² = Σŷᵢ² + Σ𝒆ᵢ²

Where Σŷᵢ𝒆ᵢ = 𝟎 according to the OLS assumptions 3 and 5. And if we change it to the non-deviated form:

Σ(𝒀ᵢ − Ȳ)² = Σ(Ŷᵢ − Ȳ)² + Σ(𝒀ᵢ − Ŷᵢ)²

• Σ(𝒀ᵢ − Ȳ)²: total variation of the observed Y values around their mean = Total Sum of Squares = TSS
• Σ(Ŷᵢ − Ȳ)²: total explained variation of the estimated Y values around their mean = Explained Sum of Squares (explained by the explanatory variables) = ESS
• Σ(𝒀ᵢ − Ŷᵢ)²: total unexplained variation of the observed Y values around the regression line = Residual Sum of Squares (explained by the error terms) = RSS
36. Dividing both sides by the Total Sum of Squares (TSS) we have:

𝟏 = ESS/TSS + RSS/TSS = Σ(Ŷᵢ − Ȳ)² / Σ(𝒀ᵢ − Ȳ)² + Σ(𝒀ᵢ − Ŷᵢ)² / Σ(𝒀ᵢ − Ȳ)²

Where Σ(Ŷᵢ − Ȳ)² / Σ(𝒀ᵢ − Ȳ)² = ESS/TSS is the percentage of the variation of the actual (observed) 𝒀ᵢ which is explained by the explanatory variables (by the regression line).
• A good reader knows that this is not a new concept; the determination coefficient 𝒓² was already described as a measure of the goodness of fit between different alternative sample regression functions (SRFs).

𝟏 = 𝒓² + RSS/TSS  →  𝒓² = 𝟏 − RSS/TSS = 𝟏 − Σ𝒆ᵢ² / Σ(𝒀ᵢ − Ȳ)²
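The decomposition TSS = ESS + RSS, and hence r², can be checked numerically for any OLS fit with an intercept. A sketch assuming numpy, with arbitrary simulated data:

```python
import numpy as np

rng = np.random.default_rng(8)
X = rng.uniform(0, 10, size=60)
Y = 2.0 + 1.2 * X + rng.normal(size=60)

x, y = X - X.mean(), Y - Y.mean()
b1 = np.sum(x * y) / np.sum(x**2)
Y_hat = Y.mean() + b1 * x          # fitted values of the SRF
e = Y - Y_hat                      # residuals

TSS = np.sum((Y - Y.mean())**2)
ESS = np.sum((Y_hat - Y.mean())**2)
RSS = np.sum(e**2)

r2 = 1 - RSS / TSS
print(np.isclose(TSS, ESS + RSS),                   # TSS = ESS + RSS
      np.isclose(r2, np.corrcoef(X, Y)[0, 1]**2))   # r^2 equals the squared correlation
```

Both checks hold exactly (up to floating-point error) because the OLS residuals are orthogonal to the fitted values, which is what kills the cross term Σŷᵢeᵢ.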
37. • A good model must have a reasonably high 𝒓², but this does not mean that any model with a high 𝒓² is a good model. An extremely high 𝒓² could be the result of a spurious regression line, due to a variety of reasons such as non-stationarity of the data, a cointegration problem, etc.
• In a regression model with two parameters, 𝒓² can be directly calculated:

𝒓² = Σ(Ŷᵢ − Ȳ)² / Σ(𝒀ᵢ − Ȳ)² = Σ(𝒃₀ + 𝒃₁𝑿ᵢ − 𝒃₀ − 𝒃₁X̄)² / Σ(𝒀ᵢ − Ȳ)² = 𝒃₁² Σ(𝑿ᵢ − X̄)² / Σ(𝒀ᵢ − Ȳ)² = 𝒃₁² Σ𝒙ᵢ² / Σ𝒚ᵢ² = 𝒃₁² 𝑺_𝑿² / 𝑺_𝒀²

Where 𝑺_𝑿² and 𝑺_𝒀² are the variances of 𝑿 and 𝒀 respectively.
38. Multiple Regression Analysis:
• If there is more than one explanatory variable in the regression we need additional assumptions about the independence of the explanatory variables, in particular that there is no exact linear relationship between them.
• The population and sample regression models for a three-variable model can be described as follows:

In the population: 𝒀ᵢ = 𝜷₀ + 𝜷₁𝑿₁ᵢ + 𝜷₂𝑿₂ᵢ + 𝒖ᵢ
In the sample:     𝒀ᵢ = 𝒃₀ + 𝒃₁𝑿₁ᵢ + 𝒃₂𝑿₂ᵢ + 𝒆ᵢ

• The OLS estimators can be obtained by minimising Σ𝒆ᵢ². The values of the SRF parameters, in deviation form, are as follows:

𝒃₁ = [ (Σx1i·yi)(Σx2i²) − (Σx2i·yi)(Σx1i·x2i) ] / [ (Σx1i²)(Σx2i²) − (Σx1i·x2i)² ]
39. 𝒃₂ = [ (Σx2i·yi)(Σx1i²) − (Σx1i·yi)(Σx1i·x2i) ] / [ (Σx1i²)(Σx2i²) − (Σx1i·x2i)² ]

And the intercept parameter is calculated in the non-deviated form as:

𝒃₀ = Ȳ − 𝒃₁X̄₁ − 𝒃₂X̄₂

• Under the classical assumptions, and also the central limit theorem, the OLS estimators in the sampling distribution (repeated sampling), when 𝒏 → ∞, have a normal distribution:

𝒃₁ ~ 𝑵( 𝜷₁, 𝝈ᵤ² · Σx2i² / [ (Σx1i²)(Σx2i²) − (Σx1i·x2i)² ] )

𝒃₂ ~ 𝑵( 𝜷₂, 𝝈ᵤ² · Σx1i² / [ (Σx1i²)(Σx2i²) − (Σx1i·x2i)² ] )
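The two-regressor closed forms above can be cross-checked against a generic least-squares solver. A sketch assuming numpy, with arbitrary simulated data (the regressors are deliberately correlated, so the cross-product terms matter):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 80
X1 = rng.normal(size=n)
X2 = 0.5 * X1 + rng.normal(size=n)          # correlated regressors
Y = 1.0 + 2.0 * X1 - 1.0 * X2 + rng.normal(size=n)

x1, x2, y = X1 - X1.mean(), X2 - X2.mean(), Y - Y.mean()
s11, s22, s12 = np.sum(x1**2), np.sum(x2**2), np.sum(x1 * x2)
s1y, s2y = np.sum(x1 * y), np.sum(x2 * y)

den = s11 * s22 - s12**2                    # common denominator of b1 and b2
b1 = (s1y * s22 - s2y * s12) / den
b2 = (s2y * s11 - s1y * s12) / den
b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()

# Cross-check against numpy's least-squares solver on the full design matrix.
X = np.column_stack([np.ones(n), X1, X2])
b_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0]
print(np.allclose([b0, b1, b2], b_lstsq))  # True
```

Both routes solve the same normal equations, so the answers coincide up to floating-point error.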
40. • The distribution of the intercept parameter 𝒃₀ is not of primary concern, as in many cases it has no practical importance.
• If the variance of the disturbance (error) term (𝝈ᵤ²) is not known, the residual variance (sample variance) σ̂ᵤ² can be used, which is an unbiased estimator of the former:

σ̂ᵤ² = Σ𝒆ᵢ² / (𝒏 − 𝒌)

Where 𝒌 is the number of parameters in the model (including the intercept 𝒃₀). Therefore, in a regression model with two slope parameters and one intercept parameter the residual variance can be calculated by:

σ̂ᵤ² = Σ𝒆ᵢ² / (𝒏 − 𝟑)
41. So, for a model with two slope parameters, the unbiased estimates of the variances of these parameters are:

𝑺²(𝒃₁) = [Σ𝒆ᵢ²/(𝒏 − 𝟑)] · Σx2i² / [ (Σx1i²)(Σx2i²) − (Σx1i·x2i)² ] = σ̂ᵤ² / [ Σx1i² (𝟏 − 𝒓²₁₂) ]

Where 𝒓²₁₂ = (Σx1i·x2i)² / (Σx1i² · Σx2i²).

And

𝑺²(𝒃₂) = [Σ𝒆ᵢ²/(𝒏 − 𝟑)] · Σx1i² / [ (Σx1i²)(Σx2i²) − (Σx1i·x2i)² ] = σ̂ᵤ² / [ Σx2i² (𝟏 − 𝒓²₁₂) ]
42. The Coefficient of Multiple Determination (𝑹² and 𝑹̄²):
The concept of the coefficient of determination used for a bivariate model can be extended to a multivariate model.
• If 𝑹² denotes the coefficient of multiple determination, it shows the proportion (percentage) of the total variation of 𝒀 explained by the explanatory variables, and it is calculated by:

𝑹² = ESS/TSS = Σŷᵢ² / Σ𝒚ᵢ² = (𝒃₁ Σyi·x1i + 𝒃₂ Σyi·x2i) / Σ𝒚ᵢ²    (C)

And we know that 𝟎 ≤ 𝑹² ≤ 𝟏.
Note that 𝑹² can also be calculated through the RSS, i.e.

𝑹² = 𝟏 − RSS/TSS = 𝟏 − Σ𝒆ᵢ² / Σ𝒚ᵢ²
43. • 𝑹² is likely to increase when an additional explanatory variable is included (see C). Therefore, when we have two alternative models with the same dependent variable 𝒀 but a different number of explanatory variables, we should not be misled by the high 𝑹² of the model with more variables.
• To solve this problem we need to bring the degrees of freedom into consideration, as a reduction factor against adding further explanatory variables. So, the adjusted 𝑹², denoted 𝑹̄², is considered as an alternative coefficient of determination and is calculated as:

𝑹̄² = 𝟏 − [Σ𝒆ᵢ²/(𝒏 − 𝒌)] / [Σ𝒚ᵢ²/(𝒏 − 𝟏)] = 𝟏 − [(𝒏 − 𝟏)/(𝒏 − 𝒌)] · Σ𝒆ᵢ²/Σ𝒚ᵢ² = 𝟏 − [(𝒏 − 𝟏)/(𝒏 − 𝒌)] (𝟏 − 𝑹²)
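The point about 𝑹² never falling when a regressor is added, while 𝑹̄² penalises the lost degrees of freedom, can be demonstrated. A sketch assuming numpy; the data and the deliberately irrelevant extra column are illustrative:

```python
import numpy as np

def r2_and_adjusted(Y, X_cols):
    """R^2 and adjusted R^2 for an OLS fit with an intercept.
    k counts all parameters including the intercept."""
    Y = np.asarray(Y, dtype=float)
    X = np.column_stack([np.ones(len(Y))] + [np.asarray(c, float) for c in X_cols])
    n, k = X.shape
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    e = Y - X @ b
    rss = np.sum(e**2)
    tss = np.sum((Y - Y.mean())**2)
    r2 = 1 - rss / tss
    r2_adj = 1 - (n - 1) / (n - k) * (1 - r2)
    return r2, r2_adj

rng = np.random.default_rng(10)
X1 = rng.normal(size=50)
noise_col = rng.normal(size=50)                 # irrelevant regressor
Y = 1.0 + 2.0 * X1 + rng.normal(size=50)

r2_small, adj_small = r2_and_adjusted(Y, [X1])
r2_big, adj_big = r2_and_adjusted(Y, [X1, noise_col])
# R^2 never falls when a regressor is added; adjusted R^2 can.
print(r2_big >= r2_small)  # True
```

Comparing `adj_big` with `adj_small` shows whether the extra column paid for its degree of freedom; for a pure noise column it usually does not.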
44. Partial Correlation Coefficients:
• For a three-variable regression model such as

𝒀ᵢ = 𝒃₀ + 𝒃₁𝑿₁ᵢ + 𝒃₂𝑿₂ᵢ + 𝒆ᵢ

we can talk about three linear associations (correlations): between 𝒀 and 𝑿₁ (r_yx1), between 𝒀 and 𝑿₂ (r_yx2), and finally between 𝑿₁ and 𝑿₂ (r_x1x2). These correlations are called simple (gross) correlation coefficients, but they do not reflect the true linear association between two variables, as the influence of the third variable on the other two is not removed.
• The net linear association between two variables can be obtained through the partial correlation coefficient, where the influence of the third variable is removed (the variable is held constant). Symbolically, r_yx1.x2 represents the partial correlation coefficient between 𝒀 and 𝑿₁, holding 𝑿₂ constant.
45. • The two partial correlation coefficients in our model can be calculated as follows:

r_yx1.x2 = (r_yx1 − r_yx2 · r_x1x2) / [ √(𝟏 − r²_x1x2) · √(𝟏 − r²_yx2) ]

r_yx2.x1 = (r_yx2 − r_yx1 · r_x1x2) / [ √(𝟏 − r²_x1x2) · √(𝟏 − r²_yx1) ]

• The correlation coefficient r_x1x2.y has no practical importance. Specifically, when the direction of causality is from the 𝑿's to 𝒀 we can simply use the simple correlation coefficient in this case:

r = Σx1·x2 / √(Σx1² · Σx2²)

• Partial correlation coefficients can be used to find out which explanatory variable has more linear association with the dependent variable.
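The first formula can be sketched in code and used to show why partialling matters: two variables can look correlated only because both depend on a third. The setup below is a hypothetical illustration, assuming numpy:

```python
import numpy as np

def partial_corr(y, x1, x2):
    """r_yx1.x2: correlation of y and x1 with x2 held constant,
    built from the three simple correlation coefficients."""
    r_y1 = np.corrcoef(y, x1)[0, 1]
    r_y2 = np.corrcoef(y, x2)[0, 1]
    r_12 = np.corrcoef(x1, x2)[0, 1]
    return (r_y1 - r_y2 * r_12) / np.sqrt((1 - r_12**2) * (1 - r_y2**2))

rng = np.random.default_rng(11)
x2 = rng.normal(size=500)
x1 = x2 + rng.normal(size=500)
y = x2 + rng.normal(size=500)    # y depends on x2 only, not on x1

# The simple correlation of y and x1 is sizeable (both inherit x2)...
print(round(np.corrcoef(y, x1)[0, 1], 2))
# ...but the partial correlation, holding x2 constant, is near zero.
print(round(partial_corr(y, x1, x2), 2))
```

The gap between the two numbers is exactly the "influence of the third variable" that the simple coefficient fails to remove.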
46. Hypothesis Testing in Multiple Regression Models:
In a multiple regression model, hypotheses are formed to test different aspects of this type of regression model:
i. Testing a hypothesis about an individual parameter of the model. For example;

𝑯₀: 𝜷ⱼ = 𝟎 against 𝑯₁: 𝜷ⱼ ≠ 𝟎

If 𝝈 is unknown and is replaced by σ̂, the test statistic

𝒕 = (𝒃ⱼ − 𝜷ⱼ) / 𝒔𝒆(𝒃ⱼ) = 𝒃ⱼ / 𝒔𝒆(𝒃ⱼ)   (under 𝑯₀)

follows the t-distribution with 𝒏 − 𝒌 df (for a regression model with three parameters, including the intercept, 𝐝𝐟 = 𝒏 − 𝟑).
47. ii. Testing a hypothesis about the equality of two parameters in the model. For example,

𝑯₀: 𝜷ᵢ = 𝜷ⱼ against 𝑯₁: 𝜷ᵢ ≠ 𝜷ⱼ

Again, if 𝝈 is unknown and is replaced by σ̂, the test statistic

𝒕 = [(𝒃ᵢ − 𝒃ⱼ) − (𝜷ᵢ − 𝜷ⱼ)] / 𝒔𝒆(𝒃ᵢ − 𝒃ⱼ) = (𝒃ᵢ − 𝒃ⱼ) / √( 𝒗𝒂𝒓(𝒃ᵢ) + 𝒗𝒂𝒓(𝒃ⱼ) − 𝟐𝒄𝒐𝒗(𝒃ᵢ, 𝒃ⱼ) )

follows the t-distribution with 𝒏 − 𝒌 df.
• If the value of the test statistic |𝒕| > 𝒕(𝜶/𝟐, 𝒏−𝒌), we must reject 𝑯₀; otherwise there is not enough evidence to reject it.
48. iii. Testing a hypothesis about the overall significance of the estimated model, by checking whether all the slope parameters are simultaneously zero. For example, to test

𝑯₀: 𝜷ᵢ = 𝟎 (∀𝒊) against 𝑯₁: ∃𝜷ᵢ ≠ 𝟎

the analysis of variance (ANOVA) table can be used to find out whether the mean sum of squares (MSS) due to the regression (the explanatory variables) is very far from the MSS due to the residuals. If it is, the variation of the explanatory variables contributes more to the variation of the dependent variable than the variation of the residuals does, so the ratio

[MSS due to regression (explanatory variables)] / [MSS due to residuals (random elements)]

should be much higher than one.
49. • The ANOVA table for the three-variable regression model can be formed as follows:

Source of variation            Sum of Squares (SS)          df       Mean Sum of Squares (MSS)
Due to Explanatory Variables   b1·Σyi·x1i + b2·Σyi·x2i      𝟐        (b1·Σyi·x1i + b2·Σyi·x2i)/𝟐
Due to Residuals               Σ𝒆ᵢ²                         𝒏 − 𝟑    σ̂² = Σ𝒆ᵢ²/(𝒏 − 𝟑)
Total                          Σ𝒚ᵢ²                         𝒏 − 𝟏

• The test statistic

𝑭 = (ESS/df) / (RSS/df) = [ (b1·Σyi·x1i + b2·Σyi·x2i)/𝟐 ] / [ Σ𝒆ᵢ²/(𝒏 − 𝟑) ]

follows the F-distribution with 2 and 𝒏 − 𝟑 df. If it is not much bigger than 1, the regression is meaningless and we cannot reject the null hypothesis that all slope coefficients are simultaneously equal to zero.
50. • In general, to test the overall significance of the sample regression for a multi-variable model with 𝒌 parameters (including the intercept, so 𝒌 − 𝟏 slope parameters), the null and alternative hypotheses and the test statistic are as follows:

𝑯₀: 𝜷₁ = 𝜷₂ = ⋯ = 𝜷ₖ₋₁ = 𝟎
𝑯₁: at least one 𝜷ᵢ ≠ 𝟎

𝑭 = [ESS/(𝒌 − 𝟏)] / [RSS/(𝒏 − 𝒌)]

• If 𝑭 > 𝑭(𝜶, 𝒌−𝟏, 𝒏−𝒌) we reject 𝑯₀ at the significance level 𝜶; otherwise there is not enough evidence to reject it.
• It is sometimes easier to use the determination coefficient 𝑹² to run the above test, because

𝑹² = ESS/TSS → ESS = 𝑹² · TSS

and also

RSS = (𝟏 − 𝑹²) · TSS
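The overall F-test, written in terms of 𝑹², can be sketched end to end. A minimal version assuming numpy and scipy; the simulated data and true parameters are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12)
n = 60
X1, X2 = rng.normal(size=n), rng.normal(size=n)
Y = 1.0 + 2.0 * X1 + 0.5 * X2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), X1, X2])
k = X.shape[1]                         # number of parameters incl. intercept
b = np.linalg.lstsq(X, Y, rcond=None)[0]
e = Y - X @ b
R2 = 1 - np.sum(e**2) / np.sum((Y - Y.mean())**2)

# F = (n-k)/(k-1) * R^2/(1-R^2), compared with the critical value F_{alpha, k-1, n-k}
F = (n - k) / (k - 1) * R2 / (1 - R2)
F_crit = stats.f.ppf(0.95, k - 1, n - k)    # alpha = 0.05
print(F > F_crit)   # H0 (all slope parameters zero) is rejected here
```

Because the simulated slopes are clearly non-zero, the statistic lands far above the critical value; with pure-noise regressors it would typically not.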
51. • The ANOVA table can also be written as:

Source of variation            Sum of Squares (SS)   df       Mean Sum of Squares (MSS)
Due to Explanatory Variables   𝑹² Σ𝒚ᵢ²              𝒌 − 𝟏    𝑹² Σ𝒚ᵢ² / (𝒌 − 𝟏)
Due to Residuals               (𝟏 − 𝑹²) Σ𝒚ᵢ²        𝒏 − 𝒌    σ̂² = (𝟏 − 𝑹²) Σ𝒚ᵢ² / (𝒏 − 𝒌)
Total                          Σ𝒚ᵢ²                 𝒏 − 𝟏

• So, the test statistic F can be written as:

𝑭 = [ 𝑹² Σ𝒚ᵢ² / (𝒌 − 𝟏) ] / [ (𝟏 − 𝑹²) Σ𝒚ᵢ² / (𝒏 − 𝒌) ] = [(𝒏 − 𝒌)/(𝒌 − 𝟏)] · 𝑹²/(𝟏 − 𝑹²)
52. iv. Testing hypotheses about parameters when they satisfy certain restrictions.*
e.g. 𝑯₀: 𝜷ᵢ + 𝜷ⱼ = 𝟏 against 𝑯₁: 𝜷ᵢ + 𝜷ⱼ ≠ 𝟏
v. Testing hypotheses about the stability of the estimated regression model over a specific time period or across two cross-sectional units.**
vi. Testing hypotheses about different functional forms of regression models.***