This document discusses correlation and regression. Correlation describes the strength and direction of a linear relationship between two variables, while regression allows predicting a dependent variable from an independent variable. It provides examples of calculating the correlation coefficient r to determine the strength and direction of relationships between variables like education and self-esteem or family income and number of children. The regression equation describes the linear regression line and can be used to predict values of the dependent variable from known values of the independent variable.
The document discusses regression analysis and its key concepts. Regression analysis is used to understand the relationship between two or more variables and make predictions. There are two main types: simple linear regression, which involves two variables, and multiple regression, which involves more than two variables. Regression lines show the average relationship between the variables and can be used to predict outcomes. The regression coefficients measure the change in the dependent variable for a unit change in the independent variable. The standard error of the estimate indicates how close the data points are to the regression line.
The document discusses different types of correlation and methods for studying correlation. It describes Karl Pearson's coefficient of correlation, which measures the strength and direction of a linear relationship between two variables. The coefficient ranges from -1 to 1, where -1 is a perfect negative correlation, 0 is no correlation, and 1 is a perfect positive correlation. The document also discusses other types of correlation coefficients like Spearman's rank correlation coefficient and methods for analyzing correlation like scatter plots.
Regression analysis is a statistical technique for investigating relationships between variables. Simple linear regression defines a relationship between two variables (X and Y) using a best-fit straight line. Multiple regression extends this to model relationships between a dependent variable Y and multiple independent variables (X1, X2, etc.). Regression coefficients are estimated to define the regression equation, and R-squared and the standard error can be used to assess the goodness of fit of the regression model to the data. Regression analysis has applications in pharmaceutical experimentation such as analyzing standard curves for drug analysis.
A binomial random variable is the number of successes x in n repeated trials of a binomial experiment. The probability distribution of a binomial random variable is called a binomial distribution. Suppose we flip a coin two times and count the number of heads (successes).
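The two-coin-flip example can be sketched with the binomial probability mass function (a minimal illustration; the helper name `binomial_pmf` is ours):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Two fair coin flips: distribution of the number of heads.
dist = {k: binomial_pmf(k, 2, 0.5) for k in range(3)}
print(dist)  # {0: 0.25, 1: 0.5, 2: 0.25}
```

The probabilities sum to 1, as a distribution must.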
This document presents information about regression analysis. It defines regression as the dependence of one variable on another and lists the objectives as defining regression, describing its types (simple, multiple, linear), assumptions, models (deterministic, probabilistic), and the method of least squares. Examples are provided to illustrate simple regression of computer speed on processor speed. Formulas are given to calculate the regression coefficients and lines for predicting y from x and x from y.
The document discusses correlation and regression analysis. It defines correlation as the statistical relationship between two variables, where a change in one variable corresponds to a change in the other. The key types of correlation are positive, negative, simple, partial and multiple, and linear and non-linear. Regression analysis establishes the average relationship between an independent and dependent variable in order to predict or estimate values of the dependent variable based on the independent variable. Methods for studying correlation include scatter diagrams and Karl Pearson's coefficient of correlation, while regression analysis uses equations to model the linear relationship between variables.
The t-test is used to compare the means of two groups and has three main applications:
1) Compare a sample mean to a population mean.
2) Compare the means of two independent samples.
3) Compare the values of one sample at two different time points.
There are two main types: the independent-measures t-test for unmatched samples, and the matched-pair t-test for paired samples. The t-test assumes normally distributed data and equal variances between groups. Examples are provided to demonstrate hypothesis testing for each application.
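The independent-measures t statistic under the equal-variances assumption can be sketched as follows (the sample data are illustrative, not from the source):

```python
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Independent-measures t statistic with a pooled variance estimate
    (assumes normality and equal population variances)."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    return (mean(sample1) - mean(sample2)) / se

t = pooled_t([5.1, 4.9, 5.3, 5.0], [4.2, 4.4, 4.1, 4.5])
```

The resulting t would then be compared against a critical value with n1 + n2 - 2 degrees of freedom.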
The document discusses Spearman's rank correlation coefficient, a nonparametric measure of statistical dependence between two variables. It assumes values between -1 and 1, with -1 indicating a perfect negative correlation and 1 a perfect positive correlation. The steps involve converting values to ranks, calculating the differences between ranks, and determining if there is a statistically significant correlation based on the test statistic and critical values. An example calculates Spearman's rho using rankings of cricket teams in test and one day international matches.
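The rank-and-difference steps described above can be sketched with the standard formula rs = 1 - 6Σd²/(n(n² - 1)) (a minimal version assuming no tied ranks; the data are illustrative):

```python
def spearman_rho(x, y):
    """Spearman's rank correlation; assumes no tied values."""
    n = len(x)
    rank = lambda v: [sorted(v).index(a) + 1 for a in v]  # rank 1 = smallest
    rx, ry = rank(x), rank(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

rho = spearman_rho([1, 2, 3, 4], [10, 20, 30, 40])  # perfectly monotone increasing
```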
Correlation is a statistical technique used to determine the degree to which two quantitative variables are related, without allowing causal relationships to be inferred.
The document provides information about the Chi-square test, including:
- It is a non-parametric test used to evaluate categorical data using contingency tables. The test statistic follows a Chi-square distribution.
- It can test for independence between variables and goodness of fit to theoretical distributions.
- Key steps involve calculating expected frequencies, taking the difference between observed and expected, and summing the results.
- The test interprets higher Chi-square values as less likelihood the results are due to chance. Modifications like Yates' correction and Fisher's exact test address limitations for small sample sizes.
Regression analysis is a statistical technique used to estimate the relationships between variables. It allows one to predict the value of a dependent variable based on the value of one or more independent variables. The document discusses simple linear regression, where there is one independent variable, as well as multiple linear regression which involves two or more independent variables. Examples of linear relationships that can be modeled using regression analysis include price vs. quantity, sales vs. advertising, and crop yield vs. fertilizer usage. The key methods for performing regression analysis covered in the document are least squares regression and regressions based on deviations from the mean.
Satyaki Aparajit Mishra presented on the topic of standard error and predictability limits. Standard error is used to estimate the standard deviation from a sample. It is calculated by dividing the standard deviation by the square root of the sample size. A larger standard error means the sample mean is less reliable at estimating the population mean. Standard error helps determine how far sample estimates may be from the true population values. Mishra discussed estimating standard error from a single sample and how standard error is used to test hypotheses. He provided an example of testing if a coin flip was unbiased using the standard error of the proportion of heads observed.
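The calculation described (standard deviation divided by the square root of the sample size) can be sketched as:

```python
from statistics import stdev

def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(sample) / len(sample) ** 0.5

se = standard_error([2, 4, 4, 4, 5, 5, 7, 9])
```

A larger sample shrinks the standard error, making the sample mean a more reliable estimate of the population mean.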
This document discusses correlation and different types of correlation analysis. It defines correlation as a statistical analysis that measures the relationship between two variables. There are three main types of correlation: (1) simple and multiple correlation based on the number of variables, (2) linear and non-linear correlation based on the relationship between variables, and (3) positive and negative correlation based on the direction of change between variables. The degree of correlation is measured using correlation coefficients that range from -1 to +1. Common methods to study correlation include scatter diagrams and Karl Pearson's coefficient of correlation.
The chi-square test is used to compare observed data with expected data. It was developed by Karl Pearson in 1900. The chi-square test calculates the sum of the squares of the differences between the observed and expected frequencies divided by the expected frequency. The chi-square value is then compared to a critical value to determine if there is a significant difference between the observed and expected results. The degrees of freedom, which determine the critical value, are calculated based on the number of rows and columns in a contingency table. The chi-square test can be used to test goodness of fit, independence of attributes, and other hypotheses.
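The statistic described above (the sum of squared observed-minus-expected differences, each divided by the expected frequency) can be sketched with an illustrative fair-die example:

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 60 rolls of a supposedly fair die: expected frequency is 10 per face.
stat = chi_square([8, 12, 9, 11, 10, 10], [10] * 6)
```

The statistic would then be compared to the critical chi-square value for the appropriate degrees of freedom (here 6 - 1 = 5).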
This presentation shows how correlation is useful in the field of forensic science, covering the types of correlation, its significance, methods of studying it, and Karl Pearson's method of correlation.
This document discusses correlation and the Pearson's coefficient of correlation. It defines correlation as the relationship between two variables, which can be positive, negative, or zero. The Pearson's coefficient of correlation, represented by r, measures the strength and direction of this relationship. The document provides examples of calculating r using the product-moment method for different sets of data. It interprets the resulting r values and discusses advantages and limitations of the product-moment correlation method.
This document provides an overview of statistical tests of significance used to analyze data and determine whether observed differences could reasonably be due to chance. It defines key terms like population, sample, parameters, statistics, and hypotheses. It then describes several common tests including z-tests, t-tests, F-tests, chi-square tests, and ANOVA. For each test, it outlines the assumptions, calculation steps, and how to interpret the results to evaluate the null hypothesis. The goal of these tests is to determine if an observed difference is statistically significant or could reasonably be expected due to random chance alone.
This document discusses various types of analysis of variance (ANOVA) statistical tests. It begins with an introduction to one-way ANOVA for comparing the means of three or more independent groups. Requirements for one-way ANOVA include a nominal independent variable with three or more levels and a continuous dependent variable. Assumptions of one-way ANOVA include normality and homogeneity of variances. The document then briefly discusses two-way ANOVA, MANOVA, ANOVA with repeated measures, and related statistical tests. Examples of each type of ANOVA are provided.
Simple Correlation: Karl Pearson's correlation coefficient and Spearman's …
The document discusses correlation and different correlation coefficients. It defines correlation as a linear relationship between two variables and explains that correlation coefficients measure the strength and direction of this relationship. It describes Karl Pearson's correlation coefficient and Spearman's rank correlation coefficient as common methods to determine correlation. It provides examples of calculating correlation using these methods and discusses limitations of correlation analysis.
This document defines correlation and discusses different types of correlation. It states that correlation refers to the relationship between two variables, where their values change together. There can be positive correlation, where variables change in the same direction, or negative correlation, where they change in opposite directions. Correlation can also be linear, nonlinear, simple, multiple, or partial. The degree of correlation is measured by the coefficient of correlation, which ranges from -1 to 1. Graphic and algebraic methods like scatter diagrams and calculating the coefficient can be used to study correlation.
The document discusses correlation and regression, explaining that correlation describes the strength of a linear relationship between two variables, while regression tells us how to draw the straight line described by the correlation. It provides examples of using correlation coefficients to determine the strength and direction of relationships between independent and dependent variables, and discusses calculating correlation coefficients and using regression analysis to predict variable relationships and outcomes.
The document discusses correlation and regression analysis. It provides examples to calculate the simple correlation coefficient (r) between two quantitative variables, finding the correlation between weight and blood pressure using a sample data set. It also explains how to find the regression equation between two variables and use it to predict outcomes. For example, the regression equation is used to predict weight given age using another sample data set.
This document provides an overview of regression models and analysis techniques. It introduces simple and multiple linear regression, as well as logistic regression. It discusses assessing regression models, cross-validation, model selection, and using regression models for prediction. Additionally, it covers the similarities and differences between linear and logistic regression, and assessing correlation without inferring causation. Scatter plots, correlation coefficients, and computing regression equations are also summarized.
Unit-I, BP801T: BIOSTATISTICS AND RESEARCH METHODOLOGY (Theory)
Correlation: definition, Karl Pearson's coefficient of correlation, multiple correlation, with pharmaceutical examples.
Correlation: is there a relationship between two variables?
Correlation by Neeraj Bhandari (Surkhet, Nepal)
The regression coefficients are 0.8 and 0.2.
The coefficient of correlation r is the geometric mean of the regression coefficients:
√(0.8 × 0.2) = √0.16 = 0.4
Therefore, the value of the coefficient of correlation is 0.4.
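This geometric-mean calculation can be checked numerically (the variable names byx and bxy for the two regression coefficients are ours; since both coefficients are positive here, r takes the positive root):

```python
# r as the geometric mean of the two regression coefficients.
byx, bxy = 0.8, 0.2
r = (byx * bxy) ** 0.5  # sqrt(0.16) = 0.4
```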
This document discusses correlation and regression analysis. It begins by outlining the chapter's objectives and providing an introduction to investigating relationships between variables using statistical analysis. The document then presents examples of collecting data to study potential relationships between variables like stone dimensions, human heights and weights, and sprint and long jump performances. It introduces various statistical measures for quantifying relationships in data, including covariance, Pearson's product moment correlation coefficient, and Spearman's rank correlation coefficient. Examples are provided to demonstrate calculating and interpreting these statistics. Limitations of correlation analysis are also noted.
The document discusses correlation and linear regression analysis. It provides examples to calculate correlation coefficients and find regression equations to describe the relationship between two variables. Specifically:
- It shows how to calculate the Pearson correlation coefficient (r) between two variables to determine the strength and direction of their linear relationship.
- An example calculates r between age and weight using a sample data set.
- It also demonstrates calculating the Spearman rank correlation coefficient (rs) as a non-parametric measure of correlation.
- Linear regression finds the best fitting line that predicts a dependent variable (y) from an independent variable (x). The regression equation describes this line.
- An example calculates the regression equation and uses it
Correlation and regression are statistical techniques used to describe relationships between variables:
- Correlation determines the strength and direction of relationships between two variables without implying causation.
- Scatter plots show the pattern of relationships as positive, negative, or no correlation.
- Regression predicts the value of an outcome or dependent variable based on the value of an independent variable.
- The regression equation defines the linear relationship between variables as y=mx+b, where m is the slope and b is the y-intercept.
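Fitting the slope m and intercept b by least squares can be sketched as follows (the data set is illustrative and chosen to lie exactly on a line):

```python
from statistics import mean

def least_squares(xs, ys):
    """Slope m and intercept b of the least-squares line y = m*x + b."""
    mx, my = mean(xs), mean(ys)
    m = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return m, my - m * mx

m, b = least_squares([1, 2, 3, 4], [3, 5, 7, 9])  # points on y = 2x + 1
```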
This document provides an introduction to regression and correlation analysis. It discusses simple and multiple linear regression models, how to interpret regression coefficients, and how to check the assumptions and adequacy of regression models. Key aspects covered include computing the regression line using the least squares method, interpreting the slope and intercept, checking the normality of residuals, and examining residual plots to validate the model. The goal of regression analysis is to model the relationship between a dependent variable and one or more independent variables.
The document discusses various statistical concepts related to correlation and regression. It defines the coefficient of correlation as a measure of the strength and direction of the relationship between variables ranging from +1 to -1. A value close to 0 indicates no relationship, while values close to +1 or -1 indicate a strong positive or negative linear relationship, respectively. It also discusses the covariance and correlation of random variables, Pearson correlation coefficient, Spearman rank correlation, partial correlation coefficient, and multiple correlation coefficients. Finally, it provides a definition of regression as a technique to determine the mathematical relationship between two variables using a regression line equation.
This document provides an overview of correlation and the Pearson correlation coefficient. It discusses how the Pearson r describes the direction, form, and strength of the linear relationship between two variables. It explains how to calculate r using the sum of products formula and interpret the results. The text also covers hypothesis testing with r and reporting correlations. Alternatives to the Pearson r are mentioned but not covered in detail.
This presentation covered the following topics:
1. Definition of Correlation and Regression
2. Meaning of Correlation and Regression
3. Types of Correlation and Regression
4. Karl Pearson's methods of correlation
5. Bivariate Grouped data method
6. Spearman's Rank Correlation Method
7. Scatter Diagram Method
8. Interpretation of the correlation coefficient
9. Lines of Regression
10. Regression Equations
11. Difference between correlation and regression
12. Related examples
Lesson 27: Using statistical techniques in analyzing data
The document discusses statistical techniques for analyzing data, including scatter diagrams, correlation coefficients, regression analysis, and chi-square tests. It provides examples of using scatter diagrams to visualize the relationship between two variables, calculating the Pearson correlation coefficient to determine the strength of linear relationships, and using simple linear regression to find the regression equation that best predicts a dependent variable from an independent variable. It also explains how to perform a chi-square test to analyze relationships between categorical variables by comparing observed and expected frequencies.
This document provides an introduction to simple linear regression and correlation. It defines key terms like independent and dependent variables, and explains how to estimate regression coefficients using the least squares method. Graphs like scatter plots are used to visualize the linear relationship between two variables. The correlation coefficient measures the strength of the linear association. Regression seeks to predict a dependent variable from an independent variable, while correlation simply measures the degree of association between two variables.
The document discusses various methods of correlation analysis. It begins by defining correlation as a statistical technique used to measure the strength and direction of association between two quantitative variables. Some key points made in the document include:
- Correlation can be positive (variables move in the same direction), negative (variables move in opposite directions), or zero (no relationship).
- Methods for measuring correlation discussed include scatter diagrams, Karl Pearson's coefficient, and Spearman's rank correlation coefficient.
- Correlation can be simple, partial, or multiple depending on the number of variables studied. It can also be linear or non-linear based on the relationship between the variables.
- Correlation only measures association but does not determine
Correlation _ Regression Analysis statistics.pptx
This document discusses correlation and related statistical concepts. Correlation measures the strength and direction of association between two quantitative variables. A correlation of 0 means no association, 1 means perfect positive association, and -1 means perfect negative association. Correlation is independent of the measurement units and scaling of the variables. Hypothesis testing is used to make inferences about the population correlation based on a sample correlation. The null hypothesis is that the population correlation is 0, and alternative hypotheses specify a non-zero correlation. The test statistic follows Student's t distribution. The null hypothesis is rejected if the calculated t exceeds the critical value, or if the p-value is less than the significance level.
This document discusses various methods of determining correlation between two variables, including scatter diagram methods, Karl Pearson's correlation coefficient, and Spearman's rank correlation method. It provides examples of calculating Karl Pearson's coefficient using direct and shortcut methods. The key points covered are:
1. Correlation analysis determines the nature and strength of the relationship between two variables. Common methods include scatter diagrams, Karl Pearson's coefficient, and Spearman's rank correlation.
2. Karl Pearson's coefficient calculates the covariance between two variables and divides it by the product of their standard deviations, resulting in a value between -1 and 1.
3. Spearman's rank correlation method involves ranking the data values and calculating the differences between their ranks
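Point 2 above (covariance divided by the product of the standard deviations) can be sketched as follows, using population statistics and illustrative data:

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Karl Pearson's coefficient: covariance over the product of standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))

r = pearson_r([1, 2, 3], [2, 4, 6])  # exactly proportional data
```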
Most variables show some kind of relationship: for example, there is a relationship between profits and dividends paid, or between income and expenditure. With the help of correlation analysis, we can measure in a single figure the degree of relationship existing between the variables.
Correlation analysis contributes to the understanding of economic behaviour, aids in locating the critically important variables on which others depend, may reveal to the economist the connections by which disturbances spread, and suggests the paths through which stabilizing forces may become effective.
The document discusses regression analysis, including definitions, uses, calculating regression equations from data, graphing regression lines, the standard error of estimate, and limitations. Regression analysis is a statistical technique used to understand the relationship between variables and allow for predictions. The document provides examples of calculating regression equations from various data sets and determining the standard error of estimate.
This document describes 11 different types of diodes: Zener diode, varactor diode, light-emitting diode (LED), photodiode, laser diode, Schottky diode, PIN diode, tunnel diode, small signal diode, large signal diode, and Shockley diode. It provides details on each diode type, including its basic structure and functions, symbol, and common applications. The document also includes separate sections focused on describing the key characteristics and uses of LEDs, photodiodes, and laser diodes.
Transistors are electronic devices made of semiconductor material that can act as both an insulator and conductor. They have three layers - an emitter, base, and collector - and come in two types: NPN and PNP bipolar junction transistors (BJTs). BJTs use both holes and electrons as charge carriers. Transistors have different operating regions - cutoff, linear, and saturation - and can be used as electronic switches or amplifiers by controlling the base current to mimic an input signal with greater amplitude at the collector. Common transistor types include BJTs, JFETs, FETs, and MOS transistors.
The document discusses single diode circuits and diode equations. It provides the following key points:
- A conducting diode has a voltage drop of about 0.7 V for silicon and 0.3 V for germanium. To forward-bias a diode, the anode must be more positive than the cathode; to reverse-bias it, the anode must be less positive than the cathode.
- The diode equation relates diode current (iD) to voltage (vD) and includes parameters like the reverse saturation current (IS), temperature, and the ideality factor.
- An example circuit is solved using Kirchhoff's law to find the diode voltage (VD) and current (iD) at the operating point, calculating
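The diode equation mentioned above (the Shockley equation, iD = IS·(e^(vD/(n·VT)) − 1)) can be sketched numerically; the parameter values below are illustrative assumptions for a silicon diode at room temperature, not taken from the source:

```python
from math import exp

def diode_current(vD, IS=1e-12, n=1.0, VT=0.02585):
    """Shockley diode equation: iD = IS * (exp(vD / (n * VT)) - 1).
    IS (saturation current), n (ideality factor), and VT (thermal voltage
    at ~300 K) are assumed example values."""
    return IS * (exp(vD / (n * VT)) - 1)

i_forward = diode_current(0.7)   # substantial forward current
i_reverse = diode_current(-1.0)  # tiny leakage, about -IS
```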
The document discusses silicon controlled rectifiers (SCRs), which are four-layer semiconductor devices containing three PN junctions. SCRs can be switched on and off rapidly to control the delivery of power to a load. Unlike a diode, an SCR can be made to operate as either an open circuit or conducting rectifier depending on gate triggering. The SCR consists of alternating P-type and N-type layers with three junctions (J1, J2, J3). Applying a positive voltage to the gate terminal forward biases J3, allowing current to flow and triggering the SCR into its conducting on state. Common applications of SCRs include rectification, power supplies, static switches, motor speed controls
Rectifiers convert alternating current (AC) to direct current (DC). There are two main types: half-wave and full-wave rectifiers. Half-wave rectifiers only conduct current during one half of the AC cycle, resulting in lower power output. Full-wave rectifiers conduct current during both halves of the cycle, doubling the output frequency and improving power output. Common full-wave rectifier circuits include the center-tap and bridge rectifier configurations using different diode arrangements.
An operational amplifier (op-amp) is an integrated circuit that can amplify or compare signals. It consists of transistors, resistors, and capacitors. Op-amps are used to build amplifiers, summers, integrators, differentiators, and comparators. They obey golden rules to make the difference between their input pins zero. Op-amps are also used in analog to digital converters, which sample analog signals and convert them to digital signals for processing.
A voltage multiplier is an electrical circuit that uses capacitors and diodes to convert AC power to higher DC voltage. There are different types depending on the output voltage, including half-wave and full-wave doublers, and triplers and quadruplers that output higher multiples of the input voltage. Voltage multipliers function by charging capacitors on alternating half-cycles to add voltage levels. They are used to provide high voltages in applications like CRTs, lasers, x-rays, and particle accelerators. While lower current and delays are disadvantages, voltage multipliers provide high voltage at low cost as an alternative to transformers.
There are two main types of transistors: bipolar junction transistors (BJT) and field effect transistors (FET). BJTs use both holes and electrons as current carriers and include NPN and PNP types, while FETs use only one carrier type and include JFETs and MOSFETs. MOSFETs are particularly important as they can be easily integrated into circuits. MOSFETs operate in different modes depending on the voltage applied to the gate and include depletion, enhancement, linear, and saturation modes.
Clipper and clamper circuits are used to modify signal waveforms. Clipper circuits remove portions of a signal that exceed a reference level, cutting off either positive or negative portions. Clamper circuits shift the entire signal up or down without changing its shape, setting either the positive or negative peak at a desired level. Common circuit types include positive and negative clippers and clampers, which use diodes and capacitors to clip or shift the signal in a particular direction relative to the reference level.
The document discusses different types of probability distributions including binomial, Poisson, and normal distributions. It provides examples of how to calculate probabilities for each distribution and approximations that can be used. It also defines common random variables like the number of successes in a binomial experiment and the number of events occurring in a Poisson distribution.
- There are 52 cards in a standard deck with 4 suits (clubs, hearts, diamonds, spades) and 13 cards in each suit
- The document provides examples of calculating probabilities of drawing certain cards/card combinations from a deck both with and without replacement
- Probabilities are calculated by considering the number of desired cards relative to the total number of cards in the deck at each draw
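The with/without-replacement distinction can be sketched with exact fractions, using drawing two aces as an illustrative case:

```python
from fractions import Fraction

# P(two aces in a row) without replacement: the second draw sees 51 cards.
p_without = Fraction(4, 52) * Fraction(3, 51)
# With replacement the deck is restored to 52 cards between draws.
p_with = Fraction(4, 52) * Fraction(4, 52)
```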
This document provides an overview of estimation and hypothesis testing. It defines key statistical concepts like population and sample, parameters and estimates, and introduces the two main methods in inferential statistics - estimation and hypothesis testing.
It explains that hypothesis testing involves setting a null hypothesis (H0) and an alternative hypothesis (Ha), calculating a test statistic, determining a p-value, and making a decision to accept or reject the null hypothesis based on the p-value and significance level. The four main steps of hypothesis testing are outlined as setting hypotheses, calculating a test statistic, determining the p-value, and making a conclusion.
Examples are provided to demonstrate left-tailed, right-tailed, and two-tailed hypothesis tests
This document provides an overview of probability concepts including:
- Probability is a numerical measure of the likelihood of an event occurring, ranging from 0 (impossible) to 1 (certain).
- An experiment generates outcomes that make up the sample space. Events are collections of outcomes.
- Simple events have a defined probability based on being equally likely. The probability of an event is the sum of probabilities of the simple events it contains.
- Rules like the multiplication rule for independent events and additive rule for unions allow calculating probabilities of composite events.
- Complement and conditional probabilities relate the probabilities of events. Independent events do not influence each other's probabilities.
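The additive and complement rules above can be sketched with a single fair die as the sample space (the events chosen are illustrative):

```python
# Rolling one fair die: events are subsets of the sample space.
outcomes = {1, 2, 3, 4, 5, 6}
p = lambda event: len(event) / len(outcomes)  # equally likely simple events

even, high = {2, 4, 6}, {4, 5, 6}
# Additive rule for the union: subtract the overlap so it is not counted twice.
p_union = p(even) + p(high) - p(even & high)
# Complement rule.
p_not_even = 1 - p(even)
```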
Here is the tree diagram and sample space for flipping a coin and rolling a die:
H → 1, 2, 3, 4, 5, 6
T → 1, 2, 3, 4, 5, 6
Sample space: {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}
There are 12 outcomes in the sample space.
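The same sample space can be generated as a Cartesian product (a minimal sketch of the coin-and-die experiment):

```python
from itertools import product

# Each outcome pairs a coin face with a die roll: H1 ... T6.
sample_space = [coin + str(die) for coin, die in product("HT", range(1, 7))]
```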
The document discusses permutations and combinations. It provides examples of calculating permutations and combinations for different scenarios like selecting committees from a group of people and arranging books on a shelf. Formulas for permutations (nPr) and combinations (nCr) are given. Order matters for permutations but not for combinations. The key difference between the two is explained.
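The committee and bookshelf scenarios mentioned above can be sketched with the standard nPr and nCr formulas (the group sizes are illustrative):

```python
from math import perm, comb

# Committees of 3 chosen from 10 people: order does not matter.
committees = comb(10, 3)    # nCr = 10! / (3! * 7!) = 120
# Arranging 3 of 5 distinct books on a shelf: order matters.
arrangements = perm(5, 3)   # nPr = 5! / 2! = 60
```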
This document discusses analytical representation of data through descriptive statistics. It begins by showing raw, unorganized data on movie genre ratings. It then demonstrates organizing this data into a frequency distribution table and bar graph to better analyze and describe the data. It also calculates averages for each movie genre. The document then discusses additional descriptive statistics measures like the mean, median, mode, and percentiles to further analyze data through measures of central tendency and dispersion.
This document discusses frequency distributions and how to construct them from raw data. It provides examples of creating stem-and-leaf displays, frequency tables, relative frequency tables, and cumulative frequency tables from various data sets. Key concepts covered include class width, class boundaries, tallying data, and calculating relative frequencies and percentages. Overall, the document serves as a tutorial on how to organize and summarize data using various types of frequency distributions.
Graphical representations such as charts and graphs are used to display data in a visual format. They allow readers to focus on key aspects of the evidence or data at a higher level rather than getting caught up in minor details. However, some numeric detail and potential relationships within the data may be lost compared to a table. There are many different types of graphs used in statistics including box plots, histograms, pie charts, bar graphs and line graphs. Each graph has strengths for displaying certain types of data such as continuous versus discrete values.
This document provides an introduction to statistical theory. It discusses why statistics are studied and defines key statistical concepts such as populations, samples, parameters, statistics, descriptive statistics, inferential statistics, and the different types of data and variables. It also covers experimental design, methods for collecting data such as surveys and sampling, and different sampling methods like random, stratified, cluster, and systematic sampling.
The document discusses finite state machines (FSMs) and algorithmic state machines (ASMs). FSMs have a fixed set of states and can only be in one state at a time. ASMs provide a flowchart-like diagram representation of FSMs and are suitable for more complex FSMs with many inputs and outputs. ASMs have three main building blocks - state boxes, decision boxes, and conditional output boxes. State boxes represent states, decision boxes represent condition expressions, and conditional output boxes represent Mealy-type outputs that depend on state and inputs. The document provides examples of converting state diagrams to ASM charts and vice versa.
2. 2
Correlation and Regression
Correlation describes the strength of a
linear relationship between two variables
Regression tells us how to draw the straight
line described by the correlation
3. 3
Correlation and Regression
• For example:
A sociologist may be interested in the relationship
between education and self-esteem or Income and
Number of Children in a family.
Independent Variables
Education
Family Income
Dependent Variables
Self-Esteem
Number of Children
4. 4
Correlation and Regression
• For example:
• May expect: As education increases, self-esteem
increases (positive relationship).
• May expect: As family income increases, the number
of children in families declines (negative relationship).
Independent Variables → Dependent Variables
Education → Self-Esteem (+)
Family Income → Number of Children (-)
6. 6
Correlation
• Correlation is a statistical technique used to determine the degree to which two variables are related.
• A correlation is a relationship between two
variables. The data can be represented by the
ordered pairs (x, y) where x is the independent
(or explanatory) variable, and y is the
dependent (or response) variable.
7. 7
Correlation
Example:
A scatter plot can be used to determine whether a linear (straight line) correlation exists between two variables.

x:  1   2   3   4   5
y: -4  -2  -1   0   2

[Scatter plot of the (x, y) pairs]
8. 8
Linear Correlation
• Positive Linear Correlation: as x increases, y tends to increase.
• Negative Linear Correlation: as x increases, y tends to decrease.
• No Correlation
• Nonlinear Correlation
9. 9
Correlation Coefficient
• It is also called Pearson's correlation or
product moment correlation coefficient
• The correlation coefficient is a measure of
the strength and the direction of a linear
relationship between two variables. The
symbol r represents the sample correlation
coefficient. The formula for r is
r = [nΣxy - (Σx)(Σy)] / [√(nΣx² - (Σx)²) · √(nΣy² - (Σy)²)]
10. 10
The sign of r denotes the nature of
association
while the value of r denotes the strength of
association.
11. 11
If the sign is +ve this means the relation is
direct (an increase in one variable is
associated with an increase in the
other variable and a decrease in one
variable is associated with a
decrease in the other variable).
While if the sign is -ve this means an
inverse or indirect relationship (which
means an increase in one variable is
associated with a decrease in the other).
12. 12
The value of r ranges between -1 and +1. The value of r denotes the strength of the association, as illustrated below:

r = -1: perfect indirect (inverse) correlation
-1 < r ≤ -0.75: strong indirect
-0.75 < r ≤ -0.25: intermediate indirect
-0.25 < r < 0: weak indirect
r = 0: no relation
0 < r < 0.25: weak direct
0.25 ≤ r < 0.75: intermediate direct
0.75 ≤ r < 1: strong direct
r = +1: perfect direct correlation
13. 13
If r = 0, there is no association or correlation between the two variables.
If 0 < |r| < 0.25: weak correlation.
If 0.25 ≤ |r| < 0.75: intermediate correlation.
If 0.75 ≤ |r| < 1: strong correlation.
If |r| = 1: perfect correlation.
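These cut-offs translate directly into a small Python helper. This is an illustrative sketch, not code from the slides; the function name `describe_r` is chosen here.

```python
def describe_r(r):
    """Classify a correlation coefficient r by the cut-offs listed above.

    The sign of r gives the direction (direct/inverse); |r| gives the strength.
    """
    if r == 0:
        return "no correlation"
    direction = "direct" if r > 0 else "inverse"
    a = abs(r)
    if a < 0.25:
        strength = "weak"
    elif a < 0.75:
        strength = "intermediate"
    elif a < 1:
        strength = "strong"
    else:
        strength = "perfect"
    return f"{strength} {direction}"
```

For example, `describe_r(0.986)` gives "strong direct" and `describe_r(-0.4)` gives "intermediate inverse".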
14. 14
Linear Correlation
• Strong negative correlation: r = -0.91
• Weak positive correlation: r = 0.42
• Strong positive correlation: r = 0.88
• Nonlinear correlation: r = 0.07
15. 15
Calculating a Correlation Coefficient

r = [nΣxy - (Σx)(Σy)] / [√(nΣx² - (Σx)²) · √(nΣy² - (Σy)²)]

In Words (In Symbols):
1. Find the sum of the x-values (Σx).
2. Find the sum of the y-values (Σy).
3. Multiply each x-value by its corresponding y-value and find the sum (Σxy).
16. 16
Calculating a Correlation Coefficient
4. Square each x-value and find the sum (Σx²).
5. Square each y-value and find the sum (Σy²).
6. Use these five sums to calculate the correlation coefficient.
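The six steps above can be sketched as a small Python function; this is an illustration (the name `pearson_r` is chosen here, not taken from the slides).

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from the five sums described in steps 1-6."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)                 # steps 1-2
    sum_xy = sum(x * y for x, y in zip(xs, ys))     # step 3
    sum_x2 = sum(x * x for x in xs)                 # step 4
    sum_y2 = sum(y * y for y in ys)                 # step 5
    # step 6: plug the five sums into the formula for r
    return (n * sum_xy - sum_x * sum_y) / (
        math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    )
```

Applied to the five-point data set of the worked example (x = 1..5, y = -3, -1, 0, 1, 2), this returns about 0.986.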
17. 17
Correlation Coefficient
Example:
Calculate the correlation coefficient r for the following data.

 x    y    xy   x²   y²
 1   -3   -3    1    9
 2   -1   -2    4    1
 3    0    0    9    0
 4    1    4   16    1
 5    2   10   25    4

Σx = 15   Σy = -1   Σxy = 9   Σx² = 55   Σy² = 15
18. 18
Correlation Coefficient
Example (continued):
r = [nΣxy - (Σx)(Σy)] / [√(nΣx² - (Σx)²) · √(nΣy² - (Σy)²)]
  = [5(9) - (15)(-1)] / [√(5(55) - 15²) · √(5(15) - (-1)²)]
  = 60 / (√50 · √74)
  ≈ 0.986
There is a strong positive linear correlation between x and y.
19. 19
Correlation Coefficient
Example:
The following data represent the number of hours 12 different students watched television during the weekend and the scores each student earned on a test the following Monday.

Hours, x:       0   1   2   3   3   5   5   5   6   7   7  10
Test score, y: 96  85  82  74  95  68  76  84  58  65  75  50

a.) Display the scatter plot.
b.) Calculate the correlation coefficient r.
22. 22
Correlation Coefficient
Example continued:
r = [nΣxy - (Σx)(Σy)] / [√(nΣx² - (Σx)²) · √(nΣy² - (Σy)²)]
  = [12(3724) - (54)(908)] / [√(12(332) - 54²) · √(12(70836) - 908²)]
  ≈ -0.831
• There is a strong negative linear correlation.
• As the number of hours spent watching TV increases,
the test scores tend to decrease.
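As a check, the same computation in Python (a sketch written for this document, not part of the original slides):

```python
import math

# Hours of TV watched (x) and test scores (y) for the 12 students.
hours = [0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
scores = [96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50]

n = len(hours)
sx, sy = sum(hours), sum(scores)
sxy = sum(x * y for x, y in zip(hours, scores))
sx2 = sum(x * x for x in hours)
sy2 = sum(y * y for y in scores)

r = (n * sxy - sx * sy) / (
    math.sqrt(n * sx2 - sx ** 2) * math.sqrt(n * sy2 - sy ** 2)
)
print(round(r, 3))  # -0.831
```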
23. 23
Example:
A sample of 6 children was selected; data about their age in years and weight in kilograms were recorded as shown in the following table. It is required to find the correlation between age and weight.

Serial No.   Age (years)   Weight (kg)
     1            7            12
     2            6             8
     3            8            12
     4            5            10
     5            6            11
     6            9            13
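The age-weight correlation for this table can be checked with a short Python sketch (an illustration added here, not from the slides):

```python
import math

ages = [7, 6, 8, 5, 6, 9]          # x: age in years
weights = [12, 8, 12, 10, 11, 13]  # y: weight in kg

n = len(ages)
sx, sy = sum(ages), sum(weights)
sxy = sum(a * w for a, w in zip(ages, weights))
sx2 = sum(a * a for a in ages)
sy2 = sum(w * w for w in weights)

r = (n * sxy - sx * sy) / (
    math.sqrt(n * sx2 - sx ** 2) * math.sqrt(n * sy2 - sy ** 2)
)
print(round(r, 2))  # 0.76: a direct correlation just above the 0.75 "strong" cut-off
```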
29. 29
Example
[Scatter plot: Tree Height, y versus Trunk Diameter, x]
• r = 0.886 → relatively strong positive linear association between x and y
32. 32
Regression Analyses
• Regression techniques are concerned with predicting some variables from knowledge of others
• Regression is the process of predicting variable Y using variable X
33. 33
Types of Regression Models
Positive Linear Relationship
Negative Linear Relationship
Relationship NOT Linear
No Relationship
34. 34
Regression
Uses a variable (x) to predict some outcome
variable (y)
Tells you how values in y change as a
function of changes in values of x
35. 35
The regression line makes the sum of the squares of the
residuals smaller than for any other line
Regression minimizes residuals
[Scatter plot: SBP (mmHg) versus Wt (kg) with the fitted regression line]
36. 36
By using the least squares method (a procedure that
minimizes the vertical deviations of plotted points
surrounding a straight line) we are
able to construct a best fitting straight line to the scatter
diagram points and then formulate a regression equation
in the form of:
b = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n]

ŷ = ȳ + b(x - x̄)

ŷ = a + bX

The regression equation describes the regression line mathematically by giving its intercept and slope.
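The least-squares formulas above can be written directly in Python; this is a sketch (the name `least_squares` is chosen here, not from the slides):

```python
def least_squares(xs, ys):
    """Slope b and intercept a of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    b = (sxy - sx * sy / n) / (sx2 - sx ** 2 / n)  # slope
    a = sy / n - b * sx / n                        # intercept: y-bar - b * x-bar
    return a, b
```

For the five-point data set used earlier (x = 1..5, y = -3, -1, 0, 1, 2), this gives a = -3.8 and b = 1.2.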
37. 37
Correlation and Regression
• The statistical equation for a line:
Ŷ = a + bX
Where:
Ŷ = the line's position on the vertical axis at any point (estimated value of the dependent variable)
X = the line's position on the horizontal axis at any point (the value of the independent variable for which you want an estimate of Y)
b = the slope of the line (called the coefficient)
a = the intercept with the Y axis, where X equals zero
39. 39
Exercise
A sample of 6 persons was selected; the values of their age (x variable) and their weight (y variable) are shown in the following table. Find the regression equation, and determine the predicted weight when age is 8.5 years.

Serial no.   Age (x)   Weight (y)
     1          7          12
     2          6           8
     3          8          12
     4          5          10
     5          6          11
     6          9          13
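The exercise can be checked numerically with a Python sketch (the rounded values below follow from the least-squares formulas applied to this table; the code is an illustration, not from the slides):

```python
ages = [7, 6, 8, 5, 6, 9]          # x: age in years
weights = [12, 8, 12, 10, 11, 13]  # y: weight in kg

n = len(ages)
sx, sy = sum(ages), sum(weights)
sxy = sum(x * y for x, y in zip(ages, weights))
sx2 = sum(x * x for x in ages)

b = (sxy - sx * sy / n) / (sx2 - sx ** 2 / n)  # slope
a = sy / n - b * sx / n                        # intercept

print(round(a, 2), round(b, 2))  # 4.69 0.92, i.e. y-hat = 4.69 + 0.92x
print(round(a + b * 8.5, 2))     # predicted weight at age 8.5: 12.54 kg
```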
43. 43
We create a regression line by plotting two estimated values of ŷ against their x components, then extending the line to the right and left.
44. 44
Regression Line
Example:
a.) Find the equation of the regression line.
b.) Use the equation to find the expected value of y when x is 2.3.

 x    y    xy   x²   y²
 1   -3   -3    1    9
 2   -1   -2    4    1
 3    0    0    9    0
 4    1    4   16    1
 5    2   10   25    4

Σx = 15   Σy = -1   Σxy = 9   Σx² = 55   Σy² = 15
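Parts a.) and b.) can be checked with a short Python sketch using the least-squares formulas from the earlier slide (not code from the original document):

```python
xs = [1, 2, 3, 4, 5]
ys = [-3, -1, 0, 1, 2]

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sx2 = sum(x * x for x in xs)

b = (sxy - sx * sy / n) / (sx2 - sx ** 2 / n)  # slope
a = sy / n - b * sx / n                        # intercept

print(round(a, 1), round(b, 1))  # -3.8 1.2, i.e. y-hat = -3.8 + 1.2x
print(round(a + b * 2.3, 2))     # expected value at x = 2.3: -1.04
```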
46. 46
Regression Line
Example:
The following data represent the number of hours 12
different students watched television during the
weekend and the scores of each student who took a
test the following Monday.
a.) Find the equation of the regression line.
b.) Use the equation to find the expected test score
for a student who watches 9 hours of TV.
48. 48
Exercise
• Find the correlation between age and blood pressure using simple (Pearson's) and Spearman's correlation coefficients, and comment.
• Find the regression equation.
• What is the predicted blood pressure for a man aged 25 years?