Introduces and explains the use of multiple linear regression, a multivariate correlational statistical technique. For more info, see the lecture page at http://goo.gl/CeBsv. See also the slides for the MLR II lecture http://www.slideshare.net/jtneill/multiple-linear-regression-ii
The document discusses simple linear regression. It defines key terms like regression equation, regression line, slope, intercept, residuals, and residual plot. It provides examples of using sample data to generate a regression equation and evaluating that regression model. Specifically, it shows generating a regression equation from bivariate data, checking assumptions visually through scatter plots and residual plots, and interpreting the slope as the marginal change in the response variable from a one unit change in the explanatory variable.
Logistic regression is a statistical model used to predict binary outcomes like disease presence/absence from several explanatory variables. It is similar to linear regression but for binary rather than continuous outcomes. The document provides an example analysis using logistic regression to predict risk of HHV8 infection from sexual behaviors and infections like HIV. The analysis found HIV and HSV2 history were associated with higher odds of HHV8 after adjusting for other variables, while gonorrhea history was not a significant independent predictor.
Regression analysis is a statistical technique for predicting a dependent variable based on one or more independent variables. Simple linear regression fits a straight line to the data to predict a continuous dependent variable (y) from a single independent variable (x). The output is an equation of the form y = b0 + b1x + ε, where b0 is the y-intercept, b1 is the slope, and ε is the error. Multiple linear regression extends this to include more than one independent variable. Regression analysis calculates the "best fit" line that minimizes the residuals, or differences between predicted and observed y values.
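The least squares fit described here can be sketched in a few lines of Python; the data below are made up purely for illustration:

```python
import numpy as np

# Hypothetical bivariate sample: x = predictor, y = response
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.0, 8.2, 9.9])

# Least squares estimates: b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2),
# b0 = y_bar - b1 * x_bar
x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

# Residuals: differences between observed and predicted y values
residuals = y - (b0 + b1 * x)
```

The residuals of a least squares fit with an intercept always sum to zero, which is a quick sanity check on the computation.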
- Regression analysis is a statistical technique for modeling relationships between variables, where one variable is dependent on the others. It allows predicting the average value of the dependent variable based on the independent variables.
- The key assumptions of regression models are that the error terms are normally distributed with zero mean and constant variance, and are independent of each other.
- Linear regression specifies that the dependent variable is a linear combination of the parameters, though the independent variables need not be linearly related. In simple linear regression with one independent variable, the least squares estimates of the intercept and slope are calculated to minimize the sum of squared errors.
Logistic regression allows prediction of discrete outcomes from continuous and discrete variables. It addresses the same kinds of questions as discriminant analysis and multiple regression, but without their distributional assumptions. There are two main types: binary logistic regression for dichotomous dependent variables, and multinomial logistic regression for variables with more than two categories. Binary logistic regression expresses the log odds of the dependent variable as a function of the independent variables. Logistic regression assesses the effects of multiple explanatory variables on a binary outcome variable. It is useful when the dependent variable is not normally distributed, homoscedasticity does not hold, or the normality and linearity assumptions are suspect.
This document presents information about regression analysis. It defines regression as the dependence of one variable on another and lists the objectives as defining regression, describing its types (simple, multiple, linear), assumptions, models (deterministic, probabilistic), and the method of least squares. Examples are provided to illustrate simple regression of computer speed on processor speed. Formulas are given to calculate the regression coefficients and lines for predicting y from x and x from y.
This document provides an overview of logistic regression, including when and why it is used, the theory behind it, and how to assess logistic regression models. Logistic regression predicts the probability of categorical outcomes given categorical or continuous predictor variables. It relaxes the normality and linearity assumptions of linear regression. The relationship between predictors and outcomes is modeled using an S-shaped logistic function. Model fit, predictors, and interpretations of coefficients are discussed.
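As an illustrative sketch (not taken from the slides), the S-shaped logistic function that links a linear predictor to a probability can be written as:

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Log-odds model with hypothetical placeholder coefficients b0, b1:
#   log(p / (1 - p)) = b0 + b1 * x
b0, b1 = -2.0, 0.5
x = 4.0
p = logistic(b0 + b1 * x)  # predicted probability at x
```

The coefficients b0 and b1 here are arbitrary placeholders; in practice they are estimated from data by maximum likelihood.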
Regression analysis is a statistical technique used to investigate relationships between variables. It allows one to determine the strength of the relationship between a dependent variable (usually denoted by Y) and one or more independent variables (denoted by X). Multiple regression extends this to analyze the relationship between a dependent variable and multiple independent variables. The goals of regression analysis are to understand how the dependent variable changes with the independent variables and to use the independent variables to predict the value of the dependent variable. It requires the dependent variable to be continuous and the independent variables can be either continuous or categorical.
Multiple regression analysis allows researchers to examine the relationship between one dependent or outcome variable and two or more independent or predictor variables. It extends simple linear regression to model more complex relationships. Stepwise regression is a technique that automates the process of building regression models by sequentially adding or removing variables based on statistical criteria. It begins with no variables in the model and adds variables one at a time based on their contribution to the model until none improve it significantly.
- The document discusses simple linear regression analysis and how to use it to predict a dependent variable (y) based on an independent variable (x).
- Key points covered include the simple linear regression model, estimating regression coefficients, evaluating assumptions, making predictions, and interpreting results.
- Examples are provided to demonstrate simple linear regression analysis using data on house prices and sizes.
This document provides an overview of regression analysis and two-way tables. It defines key concepts such as regression lines, correlation, residuals, and marginal and conditional distributions. Regression finds the linear relationship between two variables to make predictions. The least squares regression line minimizes the vertical distance between the data points and the line. Correlation and the coefficient of determination r2 measure how well the regression line fits the data. Two-way tables summarize the relationship between two categorical variables through marginal and conditional distributions.
This document provides an example of simple linear regression with one independent variable. It explains that linear regression finds the line of best fit by estimating values for the slope (b1) and y-intercept (b0) that minimize the sum of the squared errors between the observed data points and the regression line. It provides the formulas for calculating the least squares estimates of b1 and b0. The document includes a table of temperature and sales data and a corresponding scatter plot as an example of simple linear regression analysis.
It is most useful for BBA students studying the subject "Data Analysis and Modeling".
It covers the content of the chapter on data regression models.
Visit www.ramkumarshah.com.np for more.
Assumptions of Linear Regression - Machine Learning (Kush Kulshrestha)
There are 5 key assumptions in linear regression analysis:
1. There must be a linear relationship between the dependent and independent variables.
2. The error terms cannot be correlated with each other.
3. The independent variables cannot be highly correlated with each other.
4. The error terms must have constant variance (homoscedasticity).
5. The error terms must be normally distributed. Violations of these assumptions can result in poor model fit or inaccurate predictions. Various tests can be used to check for violations.
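Assumption 3 (predictors not highly correlated with each other) is often checked with the variance inflation factor. A minimal sketch, with synthetic data standing in for real predictors:

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor for column j of predictor matrix X:
    VIF = 1 / (1 - R^2), where R^2 comes from regressing X[:, j] on
    the remaining columns. Values well above ~5-10 flag multicollinearity."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X_ok = np.column_stack([x1, rng.normal(size=200)])               # independent predictors
X_bad = np.column_stack([x1, x1 + 0.01 * rng.normal(size=200)])  # near-duplicate predictors
```

With independent columns the VIF sits near 1; with near-duplicate columns it explodes, signalling a violation of assumption 3.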
The document provides an overview of regression analysis including:
- Regression analysis is a statistical process used to estimate relationships between variables and predict unknown values.
- The document outlines different types of regression like simple, multiple, linear, and nonlinear regression.
- Key aspects of regression like scatter diagrams, regression lines, and the method of least squares are explained.
- An example problem is worked through demonstrating how to calculate the slope and y-intercept of a regression line using the least squares method.
Simple Linear Regression: Step-By-Step (Dan Wellisch)
This presentation was made on 9/26/2017 to our meetup group, found here: https://www.meetup.com/Chicago-Technology-For-Value-Based-Healthcare-Meetup/. Our group is focused on applying technology to healthcare in order to create better healthcare.
This document outlines a course on multivariate data analysis. It introduces key topics that will be covered, including matrix algebra, the multivariate normal distribution, principal component analysis, factor analysis, cluster analysis, discriminant analysis, and canonical correlations. The course workload consists of 40% theory and 60% practice, including a group project and weekly presentations. R will be the main software used. Examples of multivariate data and applications in various fields like business, health, and education are also provided.
Regression analysis is a statistical technique for investigating relationships between variables. Simple linear regression defines a relationship between two variables (X and Y) using a best-fit straight line. Multiple regression extends this to model relationships between a dependent variable Y and multiple independent variables (X1, X2, etc.). Regression coefficients are estimated to define the regression equation, and R-squared and the standard error can be used to assess the goodness of fit of the regression model to the data. Regression analysis has applications in pharmaceutical experimentation such as analyzing standard curves for drug analysis.
This document discusses point and interval estimation. It defines an estimator as a function used to infer an unknown population parameter based on sample data. Point estimation provides a single value, while interval estimation provides a range of values with a certain confidence level, such as 95%. Common point estimators include the sample mean and proportion. Interval estimators account for variability in samples and provide more information than point estimators. The document provides examples of how to construct confidence intervals using point estimates, confidence levels, and standard errors or deviations.
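A minimal sketch of a normal-approximation interval estimate for a mean, along the lines described above (the sample values are hypothetical):

```python
import math

def mean_ci(xs, z=1.96):
    """Approximate 95% interval estimate for the population mean:
    sample mean +/- z * standard error (z = 1.96 for 95% confidence)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)  # sample variance
    se = math.sqrt(var / n)                           # standard error of the mean
    return mean - z * se, mean + z * se

lo, hi = mean_ci([9, 10, 11, 10, 10])  # hypothetical sample
```

The point estimate (the sample mean) sits at the centre of the interval; the interval's width reflects the variability the point estimate alone does not convey.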
This document discusses correlation and regression analysis. It defines correlation analysis as examining the relationship between two or more variables, and regression analysis as examining how one variable changes when another specific variable changes in volume. It covers positive and negative correlation, linear and non-linear correlation, and how to calculate the coefficient of correlation. Regression analysis and regression equations are introduced for using a known variable to predict an unknown variable. Examples are provided to illustrate key concepts.
Here are the key steps and results:
1. Load the data and run a multiple linear regression with x1 as the target and x2, x3 as predictors.
R-squared is 0.89
2. Add x4, x5 as additional predictors.
R-squared increases to 0.94
3. Add x6, x7 as additional predictors.
R-squared further increases to 0.98
So as more predictors are added, the R-squared value increases, indicating more of the variation in x1 is explained by the model. However, adding too many predictors can lead to overfitting.
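One common guard against this is the adjusted R-squared, which penalizes extra predictors. A small sketch using the R-squared values from the steps above, assuming for illustration a sample of n = 30 observations:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 penalizes extra predictors: unlike plain R^2, it can
    fall when a new predictor adds little real explanatory power.
    n = number of observations, k = number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# R^2 values from the three steps above, with the predictor count at each step
steps = [(0.89, 2), (0.94, 4), (0.98, 6)]
adjusted = [adjusted_r2(r2, 30, k) for r2, k in steps]  # each slightly below its raw R^2
```

If an added predictor does not improve the fit by more than chance would predict, adjusted R-squared goes down even though plain R-squared goes up.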
This chapter discusses various statistical tests used for multiple regression analysis, including:
1. Testing individual regression coefficients and the overall model significance.
2. Testing whether two or more coefficients are equal.
3. Testing if coefficients satisfy certain restrictions.
4. Testing the stability of a regression model over time using the Chow test.
5. Testing linear vs. log-linear functional forms using the MacKinnon-White-Davidson test.
The chapter outlines different statistical approaches like confidence intervals, F-tests, and t-tests to evaluate hypotheses about coefficients and models.
Linear regression (probabilistic interpretation) (hitesh saini)
Linear regression can be interpreted probabilistically where the model assumes that the target variable is a linear combination of the features and some random error. This error is assumed to be normally distributed. Linear regression finds the coefficients that minimize this error to fit the linear model to the data.
This document discusses multiple linear regression analysis conducted to assess staff satisfaction levels at an educational institution. A questionnaire was administered to staff across multiple locations. Factor analysis was used to identify the variables that best predict overall satisfaction. A regression model was developed using satisfaction as the dependent variable and questions regarding workplace expectations, resources, communication, recognition, development opportunities, and opinions as independent variables. The model was analyzed in SPSS and showed high explanatory power, with no issues of multicollinearity between predictors.
This chapter summary covers simple linear regression models. Key topics include determining the simple linear regression equation, measures of variation such as total, explained, and unexplained sums of squares, assumptions of the regression model including normality, homoscedasticity and independence of errors. Residual analysis is discussed to examine linearity and assumptions. The coefficient of determination, standard error of estimate, and Durbin-Watson statistic are also introduced.
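The Durbin-Watson statistic mentioned here checks the independence-of-errors assumption via first-order autocorrelation in the residuals. A minimal sketch on synthetic residual series:

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 suggest no first-order
    autocorrelation in the residuals; values near 0 or 4 suggest
    positive or negative autocorrelation, respectively."""
    residuals = np.asarray(residuals, dtype=float)
    diff = np.diff(residuals)
    return np.sum(diff ** 2) / np.sum(residuals ** 2)

rng = np.random.default_rng(1)
independent = rng.normal(size=1000)          # independent errors -> DW near 2
trending = np.cumsum(rng.normal(size=1000))  # strongly autocorrelated errors -> DW near 0
```

The synthetic series here stand in for the residuals of a fitted regression model.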
Regression analysis is a statistical technique used to model relationships between variables. It allows one to predict the average value of a dependent variable based on the value of one or more independent variables. The key ideas are that the dependent variable is influenced by the independent variables in a linear or curvilinear fashion, and regression provides an equation to estimate the dependent variable given values of the independent variables. Common applications of linear regression include forecasting, determining relationships between variables, and estimating how changes in one variable impact another.
The document provides an overview of regression analysis concepts including:
- Regression analysis is used to understand relationships between variables and predict the value of one variable based on another.
- A regression model has a dependent variable on the y-axis and an independent variable on the x-axis.
- Examples of how to perform regression analysis are provided including creating a scatter plot and calculating parameters like the slope and intercept.
- Key concepts for measuring the fit of a linear regression model are defined including variability, correlation coefficient, coefficient of determination, and standard error.
This document provides an overview of linear regression analysis. It begins by defining regression analysis and describing its uses in prediction, forecasting, and understanding relationships between variables. It then covers simple and multivariate linear regression, discussing modeling relationships between one or more predictor and response variables. The document explains linear regression in R and how to evaluate model performance using analysis of variance (ANOVA) and other metrics like the coefficient of correlation. Key concepts like residuals, least squares estimation, and assumptions of linear regression are also introduced.
1. The document discusses techniques for using the chi-square distribution to test hypotheses, including goodness-of-fit tests to compare observed and expected frequencies and contingency table analysis to test for relationships between categorical variables.
2. Examples are provided to demonstrate how to conduct chi-square tests, including stating hypotheses, selecting test statistics, computing expected frequencies, and determining whether to reject the null hypothesis.
3. One example analyzes survey data on hospital admissions of seniors to determine if it is consistent with national data, while another uses data on ex-prisoners' adjustments to test for a relationship between adjustment and living location. Both examples compute chi-square statistics that do not reject the null hypotheses.
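The chi-square statistic for a 2x2 contingency table can be computed by hand; this sketch compares observed counts to the counts expected under independence:

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 contingency table
        [[a, b],
         [c, d]]
    summing (observed - expected)^2 / expected over the four cells,
    with expected counts = row total * column total / grand total."""
    n = a + b + c + d
    chi2 = 0.0
    for obs, row, col in [(a, a + b, a + c), (b, a + b, b + d),
                          (c, c + d, a + c), (d, c + d, b + d)]:
        expected = row * col / n
        chi2 += (obs - expected) ** 2 / expected
    return chi2
```

The statistic is then compared against a chi-square critical value with 1 degree of freedom to decide whether to reject the null hypothesis of independence.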
The document provides an overview of multiple linear regression (MLR). MLR allows predicting a dependent variable from multiple independent variables. It extends simple linear regression by incorporating additional predictors. Key points covered include: purposes of MLR for explanation and prediction; assumptions of the method; interpreting R-squared values; comparing unstandardized and standardized regression coefficients; and testing the statistical significance of predictors.
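The comparison of unstandardized and standardized coefficients mentioned here is a simple rescaling; as a sketch:

```python
def standardized_beta(b, sd_x, sd_y):
    """Convert an unstandardized slope b into a standardized (beta) weight:
    the number of SDs the outcome changes per 1-SD change in the predictor.
    This puts predictors measured on different scales on a common footing."""
    return b * sd_x / sd_y
```

For example, a raw slope of 0.5 for a predictor with SD 2.0 and an outcome with SD 4.0 gives a beta weight of 0.25.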
This document provides instructions for performing multiple regression analysis in SPSS. It demonstrates entering variables, running the regression using the enter, stepwise, and backward methods, and interpreting the output including R-square values, F-tests, beta coefficients, and equations for predicting the dependent variable based on the independent variables. Age and education were identified as the best predictors of months of full-time employment using both the stepwise and backward regression methods.
- The document outlines a class on multiple regression analysis and its applications.
- It discusses using regression to analyze the relationship between sales of a product and its own price as well as the price of a substitute or complementary product.
- As an example, it describes a group exercise analyzing sales data of Best Foods and Kraft mayonnaise to determine if they are substitute or complementary goods.
This document provides an overview of multiple linear regression analysis. It describes using multiple regression to model the relationship between a dependent variable and multiple independent variables. Key points covered include: setting up and interpreting a multiple regression equation; computing measures like the standard error, coefficient of determination, and adjusted coefficient of determination; conducting hypothesis tests on the regression coefficients and overall model; evaluating assumptions; and using residual analysis to validate the model. An example is presented using data on home heating costs to develop a multiple regression model relating costs to temperature, insulation, and furnace age.
Linear regression analysis predicts the value of a dependent variable based on the value of an independent variable. It requires continuous variables, a linear relationship between variables, no outliers, independent observations, homoscedasticity, and normally distributed residuals. The analysis identifies whether changes in the independent variable reliably predict changes in the dependent variable.
This document provides an overview of multiple regression analysis. It defines multiple regression, explains how to interpret regression coefficients and outputs, and discusses best practices for variable selection and assessing assumptions. Examples are provided on how to conduct multiple regression in SPSS to analyze customer survey data from two restaurants. Advanced topics like multicollinearity and dummy variables are also mentioned.
A budget is a financial plan that shows expected income and expenditure over a period of time. It is important to budget to ensure money is available to cover costs, reduce impulse buying, avoid debt, and save for emergencies. A budget accounts for regular income like wages, irregular income like bonuses, and non-cash benefits. It also accounts for fixed, irregular, and discretionary expenses. By tracking income, expenses, and the difference between the two, a budget identifies surpluses or deficits at the end of each period.
This document provides an introduction to econometrics and regression analysis. It defines econometrics as the application of statistical methods to economic data and models. The document outlines the methodology of econometrics, including specifying economic theories as mathematical and econometric models, obtaining data, estimating models, hypothesis testing, forecasting, and using models for policy purposes. It also discusses key concepts in regression analysis such as the dependent and explanatory variables, and distinguishes regression from correlation and causation.
- Regression analysis is a statistical technique used to measure the relationship between two quantitative variables and make causal inferences.
- A regression model graphs the relationship between a dependent variable (Y axis) and one or more independent variables (X axis). The goal is to find the linear equation that best fits the data.
- The regression equation takes the form Y = a + bX, where a is the intercept, b is the slope coefficient, and X and Y are the variables. The coefficient b indicates the strength and direction of the relationship.
Explains some advanced uses of multiple linear regression, including partial correlations, analysis of residuals, interactions, and analysis of change. See also previous lecture http://www.slideshare.net/jtneill/multiple-linear-regression
The document discusses simple linear regression and correlation methods. It defines deterministic and probabilistic models for describing the relationship between two variables. A simple linear regression model assumes a population regression line with intercept a and slope b, where observations may deviate from the line by some random error e. Key assumptions of the model are that e has a normal distribution with mean 0 and constant variance across values of x, and errors are independent. The slope b estimates the average change in y per unit change in x.
This is a comprehensive PPT on regression analysis. It covers methods for identifying the IV, DV, mediators, and moderators; how to interpret results using the parameters, R-squared, and t-tests; and the distinction between linear and non-linear regression.
This document discusses standard and hierarchical multiple regression. It provides examples using data on academic achievement (GPA) predicted from minutes spent studying, motivation, and anxiety. Standard multiple regression is used to assess how much variance in GPA is explained collectively by the three predictors. Specifically, it finds the predictors explain 65% of variance in GPA. It also describes interpreting individual predictor importance through coefficients like beta weights. Hierarchical regression is mentioned but not demonstrated.
This document discusses multiple regression analysis. It begins by introducing multiple regression as an extension of simple linear regression that allows for modeling relationships between a response variable and multiple explanatory variables. It then covers topics such as examining variable distributions, building regression models, estimating model parameters, and assessing overall model fit and significance of individual predictors. An example demonstrates using multiple regression to build a model for predicting cable television subscribers based on advertising rates, station power, number of local families, and number of competing stations.
This document presents data from a case-control study examining the association between oral contraceptive use and myocardial infarction. It provides odds ratios comparing cases and controls stratified by smoking status. The odds ratio for oral contraceptive use adjusted for smoking is 4.5. Additional data are presented examining an outbreak of gastroenteritis at a nursing home, including attack rates by potential risk factors like age, sex, floor of residence, meal location, and protein supplement consumption. A logistic regression analysis found protein supplement consumption to be significantly associated with gastroenteritis.
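The odds ratio for a 2x2 table can be computed directly; the counts in this sketch are hypothetical and are not the study's data:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 exposure-by-outcome table:
        a = exposed cases,    b = exposed controls,
        c = unexposed cases,  d = unexposed controls.
    OR = (a / c) / (b / d) = (a * d) / (b * c)."""
    return (a * d) / (b * c)

# Hypothetical counts for illustration only
or_example = odds_ratio(90, 20, 10, 10)
```

An odds ratio of 1 indicates no association; stratifying (or modelling) by a confounder such as smoking status, as in the study, yields an adjusted odds ratio.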
A presentation on correlation and regression for engineering students studying probability and statistics. The presentation is designed according to the syllabus of the Institute of Engineering (IOE), Tribhuvan University, but the content is similar to that of most engineering universities.
This chapter discusses exploring relationships between two quantitative variables using scatterplots and measuring the strength of linear relationships using correlation coefficients. Scatterplots show the joint distribution of two variables by plotting one on the x-axis and the other on the y-axis. Interpreting scatterplots involves examining the overall pattern, direction, strength, and outliers of the relationship. Correlation coefficients measure the direction and strength of linear relationships between -1 and 1. The chapter also discusses adding categorical variables to scatterplots and facts about correlation such as its sensitivity to outliers.
This document discusses descriptive statistics used in research. It defines descriptive statistics as procedures used to organize, interpret, and communicate numeric data. Key aspects covered include frequency distributions, measures of central tendency (mode, median, mean), measures of variability, bivariate descriptive statistics using contingency tables and correlation, and describing risk to facilitate evidence-based decision making. The overall purpose of descriptive statistics is to synthesize and summarize quantitative data for analysis in research.
Performed statistical analysis on a chosen data table and examined relationships among different data fields using IBM SPSS software.
Methodologies: multiple linear regression, logistic regression
IBM SPSS
This document provides an introduction to biostatistics. It defines biostatistics as the branch of statistics dealing with biological data. It discusses different types of data, methods of data presentation including tables, charts and graphs. It also covers measures of central tendency and dispersion, sampling methods, tests of significance including chi-square test and t-test, and correlation and regression. The overall purpose is to introduce basic statistical concepts and methods used for analyzing health and medical data.
This document presents data from several epidemiological studies analyzing risk factors for various health outcomes. It includes tables of data on myocardial infarction and oral contraceptive use, smoking status and myocardial infarction, and age and signs of coronary heart disease. It also includes graphs of the relationship between age and systolic blood pressure. The document introduces logistic regression as a statistical method for analyzing this type of categorical outcome data and explains key aspects of logistic regression including the logistic function, maximum likelihood estimation, and testing of models.
This document provides an overview of biostatistics and statistical methods used for medical and biological data. It discusses topics including descriptive statistics, statistical inference through estimation, hypothesis testing, and confidence intervals. Specific statistical tools covered include correlation, regression, chi-square tests, multivariate techniques like PCA and clustering, and time series analysis. Examples are provided for hypothesis testing on means and proportions. The document defines key biostatistical concepts like variables, parameters, statistics, and sampling distributions.
This document discusses quantitative and qualitative data analysis techniques. It covers:
- Displays for numerical (frequency charts, histograms) and categorical data (bar charts, pie charts, contingency tables).
- Measures for numerical data including mean, median, mode, range, variance, standard deviation, and quartiles.
- Scatter plots to examine relationships between two quantitative variables and measures of association like covariance and correlation coefficient.
- Contingency tables to study relationships between two categorical variables and examine dependency/independency.
- An example analyzing Titanic passenger data using contingency tables to examine the "first-class passengers first" policy.
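The correlation coefficient mentioned above can be computed from the covariance scaled by the standard deviations; a minimal sketch:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient: the covariance of two variables
    divided by the product of their standard deviations, always in [-1, 1]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Values near +1 or -1 indicate a strong positive or negative linear association; values near 0 indicate little linear association.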
Talk given at ISCB 2016 Birmingham
For indications and treatments where their use is possible, n-of-1 trials represent a promising means of investigating potential treatments for rare diseases. Each patient permits repeated comparison of the treatments being investigated and this both increases the number of observations and reduces their variability compared to conventional parallel group trials.
However, whether the framework used for analysis is randomisation-based or model-based produces puzzling differences in inference. This can easily be shown by starting on the one hand with the randomisation philosophy associated with the Rothamsted school of inference and building up the analysis through the block + treatment structure approach associated with John Nelder's theory of general balance (as implemented in GenStat®), or starting on the other hand with a plausible variance component approach through a mixed model. However, it can be shown that these differences relate not so much to the modelling approach per se as to the questions one attempts to answer: ranging from testing whether there was a difference between treatments in the patients studied, to predicting the true difference for a future patient, via making inferences about the effect in the average patient.
This in turn yields interesting insight into the long-run debate over the use of fixed or random effect meta-analysis.
Some practical issues of analysis will also be covered in R and SAS®, in which languages some functions and macros to facilitate analysis have been written. It is concluded that n-of-1 trials hold great promise in investigating chronic rare diseases, but that careful consideration of matters of purpose, design, and analysis is necessary to make best use of them.
Acknowledgement
This work is partly supported by the European Union’s 7th Framework Programme for research, technological development and demonstration under grant agreement no. 602552. “IDEAL”
Correlation & Regression Analysis using SPSS (Parag Shah)
Concepts of correlation, simple linear regression, and multiple linear regression, and their analysis using SPSS, including how to check the validity of the regression assumptions.
The document discusses biostatistics and statistics. It defines biostatistics as the application of statistics to topics in biology, with a focus on health applications such as survival analysis and longitudinal data analysis. It also discusses the role of biostatisticians in guiding experimental design, analyzing data, and interpreting results. The document then defines statistics and describes some key concepts in statistics including data collection, presentation of data, and drawing inferences from data. It discusses various methods of presenting data numerically and graphically, including tables, graphs, charts and diagrams. It also covers measures of central tendency like mean, median and mode, as well as measures of dispersion such as range, variance and standard deviation.
Linear regression is a statistical method used to explain the relationship between variables. The document discusses:
1) An agenda covering regression, diagnostics, differences between linear and logistic regression, assumptions, and interview questions.
2) Details on linear regression including understanding the algorithm, assumptions around linearity, normality, multicollinearity, autocorrelation, and homoscedasticity.
3) How to check if assumptions are violated including residual plots, Q-Q plots, and various statistical tests.
The document provides an in-depth overview of linear regression modeling, assumptions, and how to diagnose potential issues.
The document outlines key aspects of the self and its strivings, including self-concept, agency, identity, and self-regulation. It discusses self-schemas and how they motivate behavior to confirm one's self-view. Well-developed self-schemas allow people to efficiently process self-information and predict their own behavior. The document also examines how identity directs behaviors to express social roles and relate the self to society. Developing self-regulation skills through social learning helps achieve long-term goals by overriding impulses.
The document is a 31 slide presentation summarizing key concepts about personal control beliefs from Chapter 10 of Understanding Motivation and Emotion by Johnmarshall Reeve. It covers topics like self-efficacy, learned helplessness, mastery versus helplessness orientation, and reactance theory. Diagrams and tables from the textbook are reproduced to illustrate important models and studies.
This document is a chapter summary on mindsets from the book Understanding Motivation and Emotion by John Reeve. It outlines four main mindsets discussed in the chapter: deliberative vs implemental, promotion vs prevention, growth vs fixed, and consistency vs dissonance. For each mindset, it describes the key differences in thoughts, goals, definitions of success/failure, and downstream consequences on behavior. It also covers topics like achievement goals, cognitive dissonance, and motivation processes in depth.
The document summarizes goal setting and goal striving theories from the literature. It discusses how discrepancies between ideal and actual states create motivation to set goals. Setting specific, difficult goals can improve performance by directing attention and effort. Feedback is important for monitoring progress and generating positive or negative emotions. Both short-term and long-term goal setting have benefits, as do implementation intentions and knowing when to disengage from unattainable goals.
This document summarizes key concepts about implicit motives from the book "Understanding Motivation and Emotion" by John Reeve. It discusses implicit motives like achievement, affiliation, and power, and the social incentives that activate each motive. Conditions that involve and satisfy needs for achievement, affiliation/intimacy, and power are outlined. The leadership motive pattern in relation to power is also summarized.
The document summarizes key concepts about physiological needs from a psychology textbook. It discusses how physiological needs give rise to psychological drives that motivate behaviors to satisfy those needs. Specific needs examined include thirst, hunger, and sex. Environmental and self-regulatory factors that influence these needs are also explored.
The document summarizes the history of motivation theory from ancient philosophical origins to contemporary multi-level perspectives. It describes early grand theories about will, instinct and drive that dominated but declined as limitations were recognized. This led to the rise of mini-theories focusing on specific motivational phenomena. Current views emphasize multiple influences on motivation from neurological to social-cognitive levels and the active role of the individual.
This document outlines the syllabus for a university unit on motivation and emotion. It provides information on the teaching staff, learning outcomes, topics to be covered, assessment tasks including developing a topic, writing a book chapter and creating a multimedia presentation on the topic, grading criteria, and key dates. Students will have flexible online or blended delivery options. The unit aims to integrate theories and research to explain the role of motivation and emotion in human behavior.
Development and evaluation of the PCYC Catalyst outdoor adventure interventio...James Neill
This document summarizes the development and evaluation of the PCYC Catalyst Outdoor Adventure Youth Intervention Program. The program aims to provide a positive intervention for at-risk youth ages 13-16 using a 15-day outdoor adventure program. Research evaluated the program's effects on life effectiveness, mental health, and behavioral conduct. Results found moderate improvements in these areas from pre- to post-program and at 6-12 month follow up. Interviews revealed youth found the program challenging but fostered personal and social development. Recommendations focused on continued evaluation and ensuring program integrity.
This document summarizes 20 individual emotions categorized into three sections: basic emotions, self-conscious emotions, and cognitively complex emotions. For each emotion, the summary provides a brief description of what causes the emotion and its evolutionary function based on the source text by Reeve (2015). The basic emotions include fear, anger, disgust, sadness, joy, interest, and contempt. The self-conscious emotions include shame, guilt, embarrassment, pride, and triumph. The cognitively complex emotions include envy, gratitude, disappointment, regret, hope, schadenfreude, empathy, and compassion. Images are provided as examples to illustrate some of the emotions.
This document discusses how and why to edit Wikipedia. It explains that editing Wikipedia allows you to share your expertise, network with other experts, and build your online identity and that of your department, university or region. It also outlines how to get started editing Wikipedia, such as creating an account, making a user page, adding articles to your watchlist, and starting with small edits to existing articles. The basics of editing include adding bold, italics and underlining text, inserting headings, links, images and tables, and adding citations.
Going open (education): What, why, and how?James Neill
This document discusses open education, which involves using open educational resources (OERs) and reducing barriers to participation. It explains that open education has advantages like greater accessibility, flexibility, opportunities for quality improvement, and lower long-term costs. The document provides guidance on how to transition to open education, such as using well-known open education platforms, minimizing barriers, encouraging student contributions, and engaging with open communities of practice.
Introduction to motivation and emotion 2013James Neill
This document provides an overview of a university course on motivation and emotion. It includes the unit outline, contact information for the teaching staff, learning outcomes, syllabus, lecture and tutorial topics and schedules, required textbook, assessment tasks, student feedback, and an introduction to the study of motivation and emotion. The key assessments are a book chapter, multimedia presentation, and online quizzes. The lectures will cover theories and research on motivation in the first half and emotion in the second half. Tutorials provide hands-on activities related to the content and assessments.
Summary and conclusion - Survey research and design in psychologyJames Neill
This document provides an overview and summary of a lecture on survey research and design in psychology. It covers the following key points:
- Survey research involves using standardized questionnaires to collect data on psychological phenomena. It has become a popular social science method since the 1920s.
- Survey design considerations include whether the survey is self-administered or interview-based, the types of questions used, and response formats. Proper sampling and minimizing biases are also important.
- Analysis of survey data involves descriptive statistics, graphs, and correlations to describe and explore relationships in the data. Tools like exploratory factor analysis can be used to develop psychometric instruments. Multiple linear regression allows predicting outcomes from multiple variables.
The effects of green exercise on stress, anxiety and moodJames Neill
The document summarizes research on the effects of green exercise, which is physical exercise performed in natural settings. Several studies have found that green exercise is associated with moderate short-term improvements in both positive and negative indicators of psychological well-being, including reduced stress, anxiety, and negative mood, as well as increased positive mood and vitality. The effects appear to be influenced somewhat by the perceived naturalness of the environment and type of environmental cognition used during exercise, though more research is still needed to fully understand the underlying psychological processes.
The document outlines an intervention and review session, beginning with participants sharing insights from studying motivation and emotion, followed by a discussion of interventions described in Chapter 17 of a motivation textbook including supporting psychological needs, increasing growth mindsets, promoting emotion knowledge, and cultivating compassion. The review section will cover key topics from Chapters 1 through 16 of the motivation textbook.
Visualiation of quantitative informationJames Neill
This document discusses data visualization and graphing techniques. It covers levels of measurement, principles of graphing, and types of univariate graphs like bar charts, pie charts, histograms, and box plots. It emphasizes graphical integrity and avoiding distortion to clearly communicate the true story of the data.
Front Desk Management in the Odoo 17 ERPCeline George
Front desk officers are responsible for taking care of guests and customers. Their work mainly involves interacting with customers and business partners, either in person or through phone calls.
Split Shifts From Gantt View in the Odoo 17Celine George
Odoo allows users to split long shifts into multiple segments directly from the Gantt view.Each segment retains details of the original shift, such as employee assignment, start time, end time, and specific tasks or descriptions.
How to Store Data on the Odoo 17 WebsiteCeline George
Here we are going to discuss how to store data in Odoo 17 Website.
It includes defining a model with few fields in it. Add demo data into the model using data directory. Also using a controller, pass the values into the template while rendering it and display the values in the website.
AI Risk Management: ISO/IEC 42001, the EU AI Act, and ISO/IEC 23894PECB
As artificial intelligence continues to evolve, understanding the complexities and regulations regarding AI risk management is more crucial than ever.
Amongst others, the webinar covers:
• ISO/IEC 42001 standard, which provides guidelines for establishing, implementing, maintaining, and continually improving AI management systems within organizations
• insights into the European Union's landmark legislative proposal aimed at regulating AI
• framework and methodologies prescribed by ISO/IEC 23894 for identifying, assessing, and mitigating risks associated with AI systems
Presenters:
Miriama Podskubova - Attorney at Law
Miriama is a seasoned lawyer with over a decade of experience. She specializes in commercial law, focusing on transactions, venture capital investments, IT, digital law, and cybersecurity, areas she was drawn to through her legal practice. Alongside preparing contract and project documentation, she ensures the correct interpretation and application of European legal regulations in these fields. Beyond client projects, she frequently speaks at conferences on cybersecurity, online privacy protection, and the increasingly pertinent topic of AI regulation. As a registered advocate of Slovak bar, certified data privacy professional in the European Union (CIPP/e) and a member of the international association ELA, she helps both tech-focused startups and entrepreneurs, as well as international chains, to properly set up their business operations.
Callum Wright - Founder and Lead Consultant Founder and Lead Consultant
Callum Wright is a seasoned cybersecurity, privacy and AI governance expert. With over a decade of experience, he has dedicated his career to protecting digital assets, ensuring data privacy, and establishing ethical AI governance frameworks. His diverse background includes significant roles in security architecture, AI governance, risk consulting, and privacy management across various industries, thorough testing, and successful implementation, he has consistently delivered exceptional results.
Throughout his career, he has taken on multifaceted roles, from leading technical project management teams to owning solutions that drive operational excellence. His conscientious and proactive approach is unwavering, whether he is working independently or collaboratively within a team. His ability to connect with colleagues on a personal level underscores his commitment to fostering a harmonious and productive workplace environment.
Date: June 26, 2024
Tags: ISO/IEC 42001, Artificial Intelligence, EU AI Act, ISO/IEC 23894
-------------------------------------------------------------------------------
Find out more about ISO training and certification services
Training: ISO/IEC 42001 Artificial Intelligence Management System - EN | PECB
Webinars: https://pecb.com/webinars
Article: https://pecb.com/article
-------------------------------------------------------------------------------
Still I Rise by Maya Angelou
-Table of Contents
● Questions to be Addressed
● Introduction
● About the Author
● Analysis
● Key Literary Devices Used in the Poem
1. Simile
2. Metaphor
3. Repetition
4. Rhetorical Question
5. Structure and Form
6. Imagery
7. Symbolism
● Conclusion
● References
-Questions to be Addressed
1. How does the meaning of the poem evolve as we progress through each stanza?
2. How do similes and metaphors enhance the imagery in "Still I Rise"?
3. What effect does the repetition of certain phrases have on the overall tone of the poem?
4. How does Maya Angelou use symbolism to convey her message of resilience and empowerment?
Beyond the Advance Presentation for By the Book 9John Rodzvilla
In June 2020, L.L. McKinney, a Black author of young adult novels, began the #publishingpaidme hashtag to create a discussion on how the publishing industry treats Black authors: “what they’re paid. What the marketing is. How the books are treated. How one Black book not reaching its parameters casts a shadow on all Black books and all Black authors, and that’s not the same for our white counterparts.” (Grady 2020) McKinney’s call resulted in an online discussion across 65,000 tweets between authors of all races and the creation of a Google spreadsheet that collected information on over 2,000 titles.
While the conversation was originally meant to discuss the ethical value of book publishing, it became an economic assessment by authors of how publishers treated authors of color and women authors without a full analysis of the data collected. This paper would present the data collected from relevant tweets and the Google database to show not only the range of advances among participating authors split out by their race, gender, sexual orientation and the genre of their work, but also the publishers’ treatment of their titles in terms of deal announcements and pre-pub attention in industry publications. The paper is based on a multi-year project of cleaning and evaluating the collected data to assess what it reveals about the habits and strategies of American publishers in acquiring and promoting titles from a diverse group of authors across the literary, non-fiction, children’s, mystery, romance, and SFF genres.
Beginner's Guide to Bypassing Falco Container Runtime Security in Kubernetes ...anjaliinfosec
This presentation, crafted for the Kubernetes Village at BSides Bangalore 2024, delves into the essentials of bypassing Falco, a leading container runtime security solution in Kubernetes. Tailored for beginners, it covers fundamental concepts, practical techniques, and real-world examples to help you understand and navigate Falco's security mechanisms effectively. Ideal for developers, security professionals, and tech enthusiasts eager to enhance their expertise in Kubernetes security and container runtime defenses.
1. Lecture 7
Survey Research & Design in Psychology
James Neill, 2017
Creative Commons Attribution 4.0
Image source:http://commons.wikimedia.org/wiki/File:Vidrarias_de_Laboratorio.jpg
Multiple Linear Regression I
2. 2
Overview
1. Readings
2. Correlation (Review)
3. Simple linear regression
4. Multiple linear regression
5. Summary
6. MLR I Quiz - Practice questions
5. 5
Purposes of
correlational statistics
Explanatory - Regression
e.g., cross-sectional study
(all data collected at same
time)
Predictive - Regression
e.g., longitudinal study
(predictors collected prior
to outcome measures)
6. 6
Linear correlation
● Linear relations between interval or ratio
variables
● Best fitting straight-line on a scatterplot
7. 7
Correlation – Key points
• Covariance = sum of cross-products
(unstandardised)
• Correlation = sum of cross-products
(standardised), ranging from -1 to 1
(sign indicates direction, value indicates size)
• Coefficient of determination (r²)
indicates % of shared variance
• Correlation does not necessarily
equal causality
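The covariance and correlation definitions above can be sketched in a few lines of code (an illustrative addition, not part of the original slides; the function names are mine):

```python
# Sketch: covariance as the averaged cross-product of deviations,
# and correlation as its standardised version, ranging from -1 to 1.
def covariance(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # sum of cross-products of deviations, divided by n - 1
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    # standardise the covariance by both standard deviations
    return covariance(x, y) / (covariance(x, x) * covariance(y, y)) ** 0.5

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r = correlation(x, y)   # perfectly linear relation, so r = 1
```

Squaring r gives the coefficient of determination (r²), the % of shared variance described above.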
8. 8
Correlation is shared variance
Venn diagrams are helpful for depicting
relations between variables.
[Venn diagram: r² = .32 shared variance; .68 unique variance in each variable]
10. 10
What is simple linear regression?
• An extension of correlation
• Best-fitting straight line for a scatterplot
between two variables:
• predictor (X) – also called an independent
variable (IV)
• outcome (Y) - also called a
dependent variable (DV) or criterion variable
• LR uses an IV to explain/predict a DV
• Helps to understand relationships and
possible causal effects of one variable
on another.
11. 11
Least squares criterion
The line of best fit minimises the total sum
of squares of the vertical deviations for
each case.
a = point at which the line of best fit
crosses the Y-axis
b = slope of the line of best fit
residuals = vertical (Y) distance between
the line of best fit and each observation
(unexplained variance)
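The least squares criterion can be sketched numerically (an illustrative helper, not from the slides; the closed-form slope and intercept formulas are assumed from standard texts):

```python
# Sketch: closed-form least-squares slope (b) and intercept (a) that
# minimise the sum of squared vertical deviations (residuals).
def least_squares(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx   # the line of best fit passes through the mean point
    return a, b

a, b = least_squares([0, 1, 2, 3], [1, 3, 5, 7])   # exact line y = 2x + 1
```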
12. 12
Linear Regression - Example:
Cigarettes & coronary heart disease
IV = Cigarette
consumption
DV = Coronary
Heart Disease
Landwehr & Watkins (1987, cited in Howell, 2004, pp. 216-218)
13. 13
Linear regression - Example:
Cigarettes & coronary heart disease
(Howell, 2004)
Research question:
How fast does CHD mortality rise
with a one unit increase in smoking?
• IV = Av. # of cigs per adult per day
• DV = CHD mortality rate (deaths per
10,000 per year due to CHD)
• Unit of analysis = Country
15. 15
Linear regression - Example:
Scatterplot with Line of Best Fit
[Scatterplot: X = Cigarette Consumption per Adult per Day (2-12); Y = CHD Mortality per 10,000 (0-30)]
16. 16
Linear regression equation
(without error)
Ŷ = bX + a
Ŷ = predicted values of Y
a = Y-intercept = level of Y when X is 0
b = slope = rate of predicted ↑/↓ in Y
scores for each unit increase in X
17. 17
Y = bX + a + e
X = IV values
Y = DV values
a = Y-axis intercept
b = slope of line of best fit
(regression coefficient)
e = error
Linear regression equation
(with error)
18. 18
Linear regression – Example:
Equation
Variables:
• (DV) = predicted rate of CHD mortality
• X (IV) = mean # of cigarettes per adult
per day per country
Regression co-efficients:
• b = rate of ↑/↓ of CHD mortality for each
extra cigarette smoked per day
• a = baseline level of CHD (i.e., CHD
when no cigarettes are smoked)
19. 19
Linear regression – Example:
Explained variance
• r = .71
• r² = .71² = .51
• p < .05
• Approximately 50% of the variability
in CHD mortality is associated with
variability in countries' smoking rates.
20. 20
Linear regression – Example:
Test for overall significance
ANOVA
             Sum of Squares   df   Mean Square       F    Sig.
Regression          454.482    1       454.482   19.59    .00
Residual            440.757   19        23.198
Total               895.238   20
Predictors: (Constant), Cigarette Consumption per Adult per Day
Dependent Variable: CHD Mortality per 10,000
● r = .71, r² = .51, p < .05
21. 21
Linear regression – Example:
Regression coefficients - SPSS
Coefficients
                                  Unstandardized          Standardized
                                  B       Std. Error      Beta            t      Sig.
(Constant)                        2.37    2.941                           .80    .43
Cigarette Consumption
per Adult per Day                 2.04    .461            .713            4.4    .00
Dependent Variable: CHD Mortality per 10,000
(a = the Constant, 2.37; b = the slope, 2.04)
22. 22
Linear regression - Example:
Making a prediction
● What if we want to predict CHD mortality
when cigarette consumption is 6?
● We predict that 14.61 / 10,000 people in a
country with an average cigarette
consumption of 6 per person will die of
CHD per annum.
Ŷ = bX + a = 2.04X + 2.37
Ŷ = 2.04 × 6 + 2.37 = 14.61
23. 23
Linear regression - Example:
Accuracy of prediction - Residual
• Finnish smokers smoke 6
cigarettes/adult/day
• We predict 14.61 deaths /10,000
• But Finland actually has 23
deaths / 10,000
• Therefore, the error (“residual”)
for this case is 23 - 14.61 = 8.39
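The prediction and residual above can be verified in a couple of lines (a sketch added here, using the coefficients b = 2.04 and a = 2.37 reported on the slides):

```python
# Reproducing the slide's prediction for Finland (X = 6 cigarettes/day)
# with the reported regression coefficients.
b, a = 2.04, 2.37
y_hat = b * 6 + a        # predicted CHD mortality -> 14.61 per 10,000
residual = 23 - y_hat    # observed minus predicted -> 8.39
```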
25. 25
Hypothesis testing
Null hypotheses (H₀):
• a (Y-intercept) = 0
Unless the DV is ratio (meaningful 0), we are not
usually very interested in the a value (starting
value of Y when X is 0).
• b (slope of line of best fit) = 0
26. 26
Linear regression – Example:
Testing slope and intercept
Coefficients
                                  Unstandardized          Standardized
                                  B       Std. Error      Beta            t      Sig.
(Constant)                        2.37    2.941                           .80    .43
Cigarette Consumption
per Adult per Day                 2.04    .461            .713            4.4    .00
Dependent Variable: CHD Mortality per 10,000
(a = the Constant; b = the slope)
a is not significant -
baseline CHD may be
negligible.
b is significant (positive) -
smoking is positively
associated with CHD
27. 27
Linear regression - Example
Does a tendency to
‘ignore problems’ (IV)
predict
‘psychological distress’ (DV)?
28. 28
[Scatterplot: X = Ignore the Problem (0-5); Y = Psychological Distress (20-140); R² = .106]
The line of best fit seeks to minimise the sum of squared residuals.
PD is measured in the direction of mental health - i.e., high scores mean less distress.
Higher IP scores indicate greater frequency of ignoring problems as a way of coping.
29. 29
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .325   .106       .102                19.4851
Predictors: (Constant), IGNO2 ACS Time 2 - 11. Ignore
Ignoring Problems accounts for ~10% of the
variation in Psychological Distress
Linear regression - Example
R = .32, R² = .11, Adjusted R² = .10
The predictor (Ignore the Problem) explains
approximately 10% of the variance in the
dependent variable (Psychological Distress).
30. 30
ANOVA
Model 1      Sum of Squares   df    Mean Square        F      Sig.
Regression         9789.888    1       9789.888   25.785     .000
Residual          82767.884  218        379.669
Total             92557.772  219
Predictors: (Constant), IGNO2 ACS Time 2 - 11. Ignore
Dependent Variable: GWB2NEG
The population relationship between Ignoring
Problems and Psychological Distress is
unlikely to be 0% because p < .001
(i.e., reject the null hypothesis that there is no
relationship)
Linear regression - Example
31. 31
Coefficients
                               Unstandardized          Standardized
Model 1                        B         Std. Error    Beta           t        Sig.
(Constant)                     118.897   4.351                        27.327   .000
IGNO2 ACS Time 2 - 11. Ignore  -9.505    1.872         -.325          -5.078   .000
Dependent Variable: GWB2NEG
PD = 119 - 9.5*IP
There is a sig. a or constant (Y-intercept) - this
is the baseline level of Psychological Distress.
In addition, Ignore Problems (IP) is a
significant predictor of Psychological Distress
(PD).
Linear regression - Example
33. 33
Linear regression summary
• Linear regression is for
explaining or predicting the
linear relationship between two
variables
• Y = bX + a + e
• Ŷ = bX + a
(b is the slope; a is the Y-intercept)
35. 35
What is multiple linear regression (MLR)?
Visual model
Linear Regression (single predictor): X → Y
Multiple Linear Regression (multiple predictors): X1, X2, X3, X4, X5 → Y
36. 36
What is MLR?
• Use of several IVs to predict a DV
• Weights each predictor (IV)
according to the strength of its
linear relationship with the DV
• Makes adjustments for inter-
relationships among predictors
• Provides a measure of overall fit (R)
38. 38
What is MLR?
A 3-way scatterplot can depict the correlational
relationship between 3 variables.
However, it is difficult to graph/visualise
relationships among four or more variables via scatterplot.
39. 39
General steps
1. Develop a diagrammatic model
and express a research
question and/or hypotheses
2. Check assumptions
3. Choose type of MLR
4. Interpret output
5. Develop a regression equation
(if needed)
40. 40
• ~50% of the variance in CHD
mortality could be explained by
cigarette smoking (using LR)
• Strong effect - but what about the
other 50% (‘unexplained’
variance)?
• What about other predictors?
–e.g., exercise and cholesterol?
LR → MLR example:
Cigarettes & coronary heart disease
41. 41
MLR – Example
Research question 1
How well do these three IVs:
• # of cigarettes / day (IV1)
• exercise (IV2) and
• cholesterol (IV3)
predict
• CHD mortality (DV)?
[Path diagram: Cigarettes, Exercise, Cholesterol → CHD Mortality]
42. 42
MLR – Example
Research question 2
To what extent do personality factors
(IVs) predict annual income (DV)?
[Path diagram: Extraversion, Neuroticism, Psychoticism → Income]
43. 43
MLR - Example
Research question 3
“Does the # of years of formal study
of psychology (IV1) and the no. of
years of experience as a
psychologist (IV2) predict clinical
psychologists’ effectiveness in
treating mental illness (DV)?”
[Path diagram: Study, Experience → Effectiveness]
44. 44
MLR - Example
Your example
Generate your own MLR research
question
(e.g., based on some of the following variables):
• Gender & Age
• Enrolment Type
• Hours
• Stress
• Time management
– Planning
– Procrastination
– Effective actions
• Time perspective
– Past-Negative
– Past-Positive
– Present-Hedonistic
– Present-Fatalistic
– Future-Positive
– Future-Negative
45. 45
Assumptions
• Levels of measurement
• Sample size
• Normality (univariate, bivariate, and multivariate)
• Linearity: Linear relations between IVs & DVs
• Homoscedasticity
• Multicollinearity
– IVs are not overly correlated with one another
(e.g., not over .7)
• Residuals are normally distributed
46. 46
Levels of measurement
• DV = Continuous
(Interval or Ratio)
• IV = Continuous or Dichotomous
(if neither, may need to recode
into a dichotomous variable
or create dummy variables)
47. 47
Dummy coding
• “Dummy coding” converts a more
complex variable into a series of
dichotomous variables
(i.e., 0 or 1)
• Several dummy variables can be
created from a variable with a
higher level of measurement.
48. 48
Dummy coding - Example
• Religion
(1 = Christian; 2 = Muslim; 3 = Atheist)
in this format, can't be an IV in regression
(a linear correlation with a categorical variable doesn't
make sense)
• However, it can be dummy coded into
dichotomous variables:
– Christian (0 = no; 1 = yes)
– Muslim (0 = no; 1 = yes)
– Atheist (0 = no; 1 = yes) (redundant)
• These variables can then be used as IVs.
• More information (Dummy variable (statistics), Wikiversity)
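The religion example above can be sketched with plain dictionaries (an illustrative toy, not from the slides; real analyses would use statistical software to create dummy variables):

```python
# Sketch: dummy coding a 3-category variable into 0/1 indicator variables,
# dropping one redundant (reference) category.
codes = {1: "Christian", 2: "Muslim", 3: "Atheist"}

def dummy_code(values, reference="Atheist"):
    # one dichotomous column per non-reference category
    names = [n for n in codes.values() if n != reference]
    return [{n: int(codes[v] == n) for n in names} for v in values]

rows = dummy_code([1, 2, 3, 1])
# rows[0] == {"Christian": 1, "Muslim": 0}   (a Christian respondent)
# rows[2] == {"Christian": 0, "Muslim": 0}   (Atheist = both zero)
```

Dropping the reference category avoids perfect multicollinearity among the dummies.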
49. 49
Sample size:
Rules of thumb
• Enough data is needed to provide reliable estimates
of the correlations.
• N >= 50 cases and N >= 10 to 20 cases x no. of
IVs, otherwise the estimates of the regression line are probably
unstable and are unlikely to replicate if the study is repeated.
• Green (1991) and Tabachnick & Fidell (2013)
suggest:
– 50 + 8(k) for testing an overall regression model and
– 104 + k when testing individual predictors (where k is the
number of IVs)
– Based on detecting a medium effect size (β >= .20), with
critical α <= .05, with power of 80%.
50. 50
Sample size:
Rules of thumb
Q: Should a researcher conduct an MLR
with 4 predictors with 200 cases?
A: Yes; satisfies all rules of thumb:
• N > 50 cases
• N > 20 cases x 4 = 80 cases
• N > 50 + 8 x 4 = 82 cases
• N > 104 + 4 = 108 cases
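The four rules of thumb above can be combined into a single check (a sketch; the function name is mine):

```python
# Sketch: minimum N satisfying all four sample-size rules of thumb
# for an MLR with k predictors.
def required_n(k):
    return max(50,            # N >= 50 cases
               20 * k,        # N >= 20 cases per IV
               50 + 8 * k,    # overall model test (Green, 1991)
               104 + k)       # individual predictor tests

required_n(4)   # 108: the binding rule here is N >= 104 + k
```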
51. 51
Dealing with outliers
Extreme cases should be deleted or
modified if they are overly influential.
• Univariate outliers -
detect via initial data screening
(e.g., min. and max.)
• Bivariate outliers -
detect via scatterplots
• Multivariate outliers -
unusual combination of predictors – detect via
Mahalanobis' distance
52. 52
Multivariate outliers
• A case may be within normal range for
each variable individually, but be a
multivariate outlier based on an unusual
combination of responses which unduly
influences multivariate test results.
• e.g., a person who:
–Is 18 years old
–Has 3 children
–Has a post-graduate degree
54. 54
Multivariate outliers
• Mahalanobis' distance (MD)
– Distributed as χ² with df equal to the number of
predictors (with critical α = .001)
– Cases with a MD greater than the critical value
are multivariate outliers.
• Cook’s D
– Cases with CD values > 1 are multivariate
outliers.
• Use either MD or CD
• Examine cases with extreme MD or CD
scores - if in doubt, remove & re-run.
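Mahalanobis' distance can be illustrated for the two-predictor case (a minimal sketch, hand-inverting the 2×2 covariance matrix; real analyses would use SPSS or a matrix library, and compare the result against the χ² critical value):

```python
# Sketch: squared Mahalanobis distance of a point from the centroid of
# two-variable data, d' S^-1 d, with S the sample covariance matrix.
def mahalanobis_sq(point, data):
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    sxx = sum((p[0] - mx) ** 2 for p in data) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in data) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in data) / (n - 1)
    det = sxx * syy - sxy ** 2          # determinant of the 2x2 covariance
    dx, dy = point[0] - mx, point[1] - my
    # expand d' S^-1 d using the closed-form 2x2 inverse
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
```

A case at the centroid gets distance 0; unusual combinations of the two predictors get large values even if each variable is individually in range.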
55. 55
Normality &
homoscedasticity
Normality
• If variables are non-normal,
this will create
heteroscedasticity
Homoscedasticity
• Variance around the
regression line should be
the same throughout the
distribution
• Even spread in residual
plots
56. 56
Multicollinearity
• Multicollinearity – IVs shouldn't be
overly correlated (e.g., over .7) – if so,
consider combining them into a single
variable or removing one.
• Singularity - perfect correlations among
IVs.
• Leads to unstable regression
coefficients.
57. 57
Multicollinearity
Detect via:
Correlation matrix - are there
large correlations among IVs?
Tolerance statistics - if < .3 then
exclude that variable.
Variance Inflation Factor (VIF) –
if > 3, then exclude that variable.
VIF is the reciprocal of Tolerance
(so use one or the other – not both)
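The reciprocal relation between Tolerance and VIF can be shown directly (a sketch; for the two-IV case the R² of one IV regressed on the other is simply r²):

```python
# Sketch: Tolerance = 1 - R^2 of an IV regressed on the remaining IVs;
# VIF is the reciprocal of Tolerance.
def vif_from_r2(r2):
    tolerance = 1 - r2
    return 1 / tolerance

vif_from_r2(0.49)   # two IVs correlated at r = .7 -> VIF ~ 1.96
```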
58. 58
Causality
• Like correlation, regression does
not tell us about the causal
relationship between variables.
• In many analyses, the IVs and DVs
could be swapped around –
therefore, it is important to:
–Take a theoretical position
–Acknowledge alternative explanations
59. 59
Multiple correlation coefficient
(R)
• “Big R” (capitalised)
• Equivalent of r, but takes into
account that there are multiple
predictors (IVs)
• Always positive, between 0 and 1
• Interpretation is similar to that for r
(correlation coefficient)
60. 60
Coefficient of determination (R²)
• “Big R squared”
• Squared multiple correlation
coefficient
• Always include R²
• Indicates the % of variance in
DV explained by combined
effects of the IVs
• Analogous to r²
61. 61
Rule of thumb for
interpretation of R²
• .00 = no linear relationship
• .10 = small (R ~ .3)
• .25 = moderate (R ~ .5)
• .50 = strong (R ~ .7)
• 1.00 = perfect linear relationship
R² > .30 is "good" in social sciences
62. 62
Adjusted R²
• R² is explained variance in a sample.
• Adjusted R² is used for estimating
explained variance in a population.
• Report R² and adjusted R².
• Particularly for small N and where
results are to be generalised, take
more note of adjusted R².
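The adjustment can be computed with the standard formula (assumed from common texts, not shown on the slides); plugging in the earlier Model Summary values (R² = .106, n = 220, k = 1) reproduces the adjusted R² of .102:

```python
# Sketch: adjusted R^2 shrinks R^2 according to sample size n and
# number of predictors k (standard formula, assumed here).
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

round(adjusted_r2(0.106, 220, 1), 3)   # ~ .102, matching the SPSS output
```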
63. 63
Multiple linear regression –
Test for overall significance
• Shows if there is a significant
linear relationship between the X
variables taken together and Y
• Examine F and p in the ANOVA
table to determine the likelihood
that the explained variance in Y
could have occurred by chance
64. 64
Regression coefficients
• Y-intercept (a)
• Slopes (b):
–Unstandardised
–Standardised
• Slopes are the weighted loading of
each IV on the DV, adjusted for the
other IVs in the model.
65. 65
Unstandardised
regression coefficients
• B = unstandardised regression
coefficient
• Used for regression equations
• Used for predicting Y scores
• But can’t be compared with other Bs
unless all IVs are measured on the
same scale
66. 66
Standardised
regression coefficients
• Beta (β) = standardised regression
coefficient
• Useful for comparing the relative
strength of predictors
• In LR, β = r; in MLR, β = r only
when the IVs are uncorrelated.
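The conversion between unstandardised and standardised slopes can be sketched (the relation β = B × SDx / SDy is assumed from standard texts, not stated on the slides):

```python
# Sketch: convert an unstandardised slope B to a standardised beta
# by rescaling with the standard deviations of the IV and DV.
def standardise_b(b, sd_x, sd_y):
    return b * sd_x / sd_y
```

Because betas are on a common (standard-deviation) scale, they can be compared across predictors, unlike the raw Bs.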
67. 67
Test for significance:
Independent variables
Indicates the likelihood of a linear
relationship between each IV (Xi)
and Y occurring by chance.
Hypotheses:
H0: βi = 0 (No linear relationship)
H1: βi ≠ 0 (Linear relationship
between Xi and Y)
68. 68
Relative importance of IVs
• Which IVs are the most important?
• To answer this, compare the
standardised regression
coefficients (βs)
69. 69
Y = b1x1 + b2x2 +.....+ bixi + a + e
• Y = observed DV scores
• bi = unstandardised regression
coefficients (the Bs in SPSS) -
slopes
• x1 to xi = IV scores
• a = Y axis intercept
• e = error (residual)
Regression equation
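The coefficients in the equation above are found by least squares. A minimal sketch in plain Python, solving the normal equations for two hypothetical IVs; the data are made up and noise-free (y is built as 1 + 2*x1 + 3*x2), so the known intercept and slopes should be recovered:

```python
# Sketch: estimate Y = a + b1*x1 + b2*x2 by least squares
# (normal equations solved by Gauss-Jordan elimination).
# Data are made up: y = 1 + 2*x1 + 3*x2 exactly, with no error term,
# so the fit should recover a = 1, b1 = 2, b2 = 3.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y  = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

# Design matrix with an intercept column
X = [[1, u, v] for u, v in zip(x1, x2)]

# Normal equations: (X'X) coef = X'y
xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]

# Gauss-Jordan elimination: reduce xtx to the identity,
# leaving the solution in xty
for i in range(3):
    piv = xtx[i][i]
    xtx[i] = [v / piv for v in xtx[i]]
    xty[i] /= piv
    for j in range(3):
        if j != i:
            f = xtx[j][i]
            xtx[j] = [vj - f * vi for vj, vi in zip(xtx[j], xtx[i])]
            xty[j] -= f * xty[i]

a_hat, b1_hat, b2_hat = xty
print(round(a_hat, 6), round(b1_hat, 6), round(b2_hat, 6))  # 1.0 2.0 3.0
```

In practice statistical software does this (with better numerics); the sketch just shows that the slopes and intercept fall out of one linear system.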
70. 70
Multiple linear regression -
Example
“Does ‘ignoring problems’ (IV1)
and ‘worrying’ (IV2)
predict ‘psychological distress’
(DV)?”
73. 73
Multiple linear regression -
Example
Together, Ignoring Problems and Worrying
explain 30% of the variance in Psychological
Distress in the Australian adolescent
population (R2 = .30, adjusted R2 = .29).
75. 75
Coefficients(a)

                      Unstandardized          Standardized
                      Coefficients            Coefficients
Model 1               B          Std. Error   Beta           t        Sig.
(Constant)            138.932    4.680                       29.687   .000
Worry                 -11.511    1.510        -.464          -7.625   .000
Ignore the Problem    -4.735     1.780        -.162          -2.660   .008

a. Dependent Variable: Psychological Distress
Multiple linear regression -
Example
Worry predicts about three times as much
variance in Psychological Distress as Ignoring
the Problem, although both are significant,
negative predictors of mental health.
76. 76
Linear Regression
PD (hat) = 119 – 9.50*Ignore
R2 = .11
Multiple Linear Regression
PD (hat) = 139 – 4.7*Ignore – 11.5*Worry
R2 = .30
Multiple linear regression -
Example – Prediction equations
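The two prediction equations above can be applied directly. A short sketch, using hypothetical scores (Ignore = 3, Worry = 4 are made up for illustration):

```python
# Sketch: apply the LR and MLR prediction equations from the slide.
# The scores below are hypothetical, chosen only for illustration.
ignore, worry = 3, 4

pd_lr  = 119 - 9.50 * ignore                  # simple linear regression
pd_mlr = 139 - 4.7 * ignore - 11.5 * worry    # multiple linear regression

print(pd_lr, round(pd_mlr, 1))  # 90.5 78.9
```

Note how adding Worry to the model changes both the intercept and the slope for Ignore, so the two equations give different predictions for the same Ignore score.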
77. 77
Confidence interval for the slope
Mental Health (PD) is reduced by between 8.5 and
14.5 units per increase of Worry units.
Mental Health (PD) is reduced by between 1.2 and
8.2 units per increase in Ignore the Problem units.
78. 78
Multiple linear regression - Example
Effect of violence, stress, social support
on internalising behaviour problems
Kliewer, Lepore, Oskin, & Johnson, (1998)
Image source: http://cloudking.com/artists/noa-terliuc/family-violence.php
79. 79
Multiple linear regression –
Example – Violence study
• Participants were children:
– 8 - 12 years
– Lived in high-violence areas, USA
• Hypotheses:
– Stress → ↑ internalising behaviour
– Violence → ↑ internalising behaviour
– Social support → ↓ internalising behaviour
80. 80
Multiple linear regression –
Example - Variables
• Predictors
–Degree of witnessing violence
–Measure of life stress
–Measure of social support
• Outcome
–Internalising behaviour
(e.g., depression, anxiety, withdrawal
symptoms) – measured using the
Child Behavior Checklist (CBCL)
81. 81
Correlations (Pearson)

                                  Amount violence   Current   Social
                                  witnessed         stress    support
Current stress                    .050
Social support                    .080              -.080
Internalizing symptoms on CBCL    .200*             .270**    -.170

*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).
Correlations
amongst
the IVs
Correlations
between the
IVs and the DV
82. 82
Model Summary

R      R Square   Adjusted R Square   Std. Error of the Estimate
.37a   .135       .108                2.2198

a. Predictors: (Constant), Social support, Current stress, Amount violence witnessed

R2: 13.5% of the variance in children's internalising
symptoms can be explained by the 3 predictors.
83. 83
Regression coefficients
Coefficients(a)

                            Unstandardized         Standardized
                            Coefficients           Coefficients
                            B       Std. Error     Beta           t      Sig.
(Constant)                  .477    1.289                         .37    .712
Amount violence witnessed   .038    .018           .201           2.1    .039
Current stress              .273    .106           .247           2.6    .012
Social support              -.074   .043           -.166          -2     .087

a. Dependent Variable: Internalizing symptoms on CBCL
2 predictors
have
p < .05
84. 84
Regression equation
• A separate coefficient or slope for
each variable
• An intercept (here it's called b0)

Ŷ = b1X1 + b2X2 + b3X3 + b0
  = 0.038*Wit + 0.273*Stress - 0.074*SocSupp + 0.477
85. 85
Interpretation
• Slopes for Witness and Stress are +ve;
slope for Social Support is -ve.
• Holding Stress and Social Support
constant, a one unit increase in Witness
predicts a .038 unit increase in
Internalising symptoms.

Ŷ = b1X1 + b2X2 + b3X3 + b0
  = 0.038*Wit + 0.273*Stress - 0.074*SocSupp + 0.477
86. 86
Predictions
Q: If Witness = 20, Stress = 5, and
SocSupp = 35, what would we
predict internalising symptoms to be?
A: .012

Ŷ = .038*Wit + .273*Stress - .074*SocSupp + .477
  = .038(20) + .273(5) - .074(35) + .477
  = .012
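The slide's worked prediction can be reproduced directly from the coefficients in the output table above:

```python
# Sketch: the slide's prediction, computed from the reported
# coefficients (B values from the Coefficients table).
wit, stress, socsupp = 20, 5, 35
y_hat = 0.038 * wit + 0.273 * stress - 0.074 * socsupp + 0.477
print(round(y_hat, 3))  # 0.012
```

The near-zero result reflects the positive contributions of Witness and Stress being almost cancelled by the negative Social Support term at these scores.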
87. 87
Multiple linear regression - Example
The role of human, social, built, and natural
capital in explaining life satisfaction at the
country level:
Towards a National Well-Being Index (NWI)
Vemuri & Costanza (2006)
88. 88
Variables
• IVs:
–Human & Built Capital
(Human Development Index)
–Natural Capital
(Ecosystem services per km2)
–Social Capital
(Press Freedom)
• DV = Life satisfaction
• Units of analysis: Countries
(N = 57; mostly developed countries, e.g., in Europe
and America)
89. 89
● There are moderately strong positive and
statistically significant linear relations between
the IVs and the DV
● The IVs have small to moderate positive
inter-correlations.
93. 93
Types of MLR
• Standard or direct (simultaneous)
• Hierarchical or sequential
• Stepwise (forward & backward)
Image source: https://commons.wikimedia.org/wiki/File:IStumbler.png
94. 94
• All predictor variables are entered
together (simultaneously)
• Allows assessment of the relationship
between all predictor variables and the
outcome (Y) variable if there is good
theoretical reason for doing so.
• Manual technique & commonly used.
• If you're not sure what type of MLR to
use, start with this approach.
Direct or Standard
95. 95
• IVs are entered in blocks or stages.
–Researcher defines order of entry for the
variables, based on theory.
–May enter ‘nuisance’ variables first to
‘control’ for them, then test ‘purer’ effect of
next block of important variables.
• R2 change - additional variance in Y
explained at each stage of the regression.
– F test of R2 change.
Hierarchical (Sequential)
96. 96
• Example
– Drug A is a cheap, well-proven drug which reduces
AIDS symptoms
– Drug B is an expensive, experimental drug which
could help to cure AIDS
– Hierarchical linear regression:
• Step 1: Drug A (IV1)
• Step 2: Drug B (IV2)
• DV = AIDS symptoms
• Research question: To what extent does Drug B
reduce AIDS symptoms above and beyond the effect
of Drug A?
• Examine the change in R2 between Step 1 & Step 2
Hierarchical (Sequential)
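The F test of the R2 change mentioned above can be sketched with the standard formula F = (ΔR2 / k_added) / ((1 − R2_full) / (N − k_full − 1)). All numbers below are hypothetical (the deck gives no figures for the Drug A / Drug B example):

```python
# Sketch: F test for the R^2 change between two hierarchical steps.
# All numbers are hypothetical, not from the deck.
def f_change(r2_full, r2_reduced, n, k_full, k_added):
    """F = (delta R^2 / k_added) / ((1 - r2_full) / (n - k_full - 1))."""
    return ((r2_full - r2_reduced) / k_added) / \
           ((1 - r2_full) / (n - k_full - 1))

# Suppose Step 1 (Drug A alone) explains 20% of symptom variance and
# Step 2 (adding Drug B) raises that to 26%, with N = 100.
f = f_change(r2_full=0.26, r2_reduced=0.20, n=100, k_full=2, k_added=1)
print(round(f, 2))  # 7.86
```

This F is evaluated on (k_added, N − k_full − 1) degrees of freedom; a significant result would suggest Drug B adds explanatory power above and beyond Drug A.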
97. 97
• Computer-driven – controversial.
• Starts with 0 predictors, then the
strongest predictor is entered into
the model, then the next strongest,
etc., if each meets a criterion (e.g., p
< .05)
Forward selection
98. 98
• Computer-driven – controversial.
• All predictor variables are
entered, then the weakest
predictors are removed, one by
one, if they meet a criterion (e.g., p
> .05)
Backward elimination
99. 99
• Computer-driven – controversial.
• Combines forward & backward.
• At each step, variables may be
entered or removed if they meet
certain criteria.
• Useful for developing the best
prediction equation from a large
number of variables.
• Redundant predictors are removed.
Stepwise
100. 100
Which method?
• Standard: To assess impact of
all IVs simultaneously
• Hierarchical: To test IVs in a
specific order (based on
hypotheses derived from theory)
• Stepwise: If the goal is accurate
statistical prediction from a large
# of variables - computer driven
102. 102
Summary: General steps
1. Develop model and hypotheses
2. Check assumptions
3. Choose type
4. Interpret output
5. Develop a regression equation
(if needed)
103. 103
Summary: Linear regression
1. Best-fitting straight line for a
scatterplot of two variables
2. Y = bX + a + e
1. Predictor (X; IV)
2. Outcome (Y; DV)
3. Least squares criterion
4. Residuals are the vertical
distance between actual and
predicted values
104. 104
Summary:
MLR assumptions
1. Level of measurement
2. Sample size
3. Normality
4. Linearity
5. Homoscedasticity
6. Collinearity
7. Multivariate outliers
8. Residuals should be normally
distributed
105. 105
Summary:
Level of measurement and
dummy coding
1. Levels of measurement
1. DV = Interval or ratio
2. IV = Interval or ratio or dichotomous
2. Dummy coding
1. Convert complex variables into series of
dichotomous IVs
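The dummy-coding step above can be sketched briefly. The category names below are hypothetical; one category is held out as the reference, so a 3-category variable becomes 2 dichotomous IVs:

```python
# Sketch: dummy-code a 3-category variable into 2 dichotomous IVs.
# Category names are hypothetical; "single" is the reference category.
categories = ["single", "married", "divorced"]
data = ["married", "single", "divorced", "married"]

# One dummy per non-reference category:
# 1 if the case is in that category, else 0.
dummies = [
    {f"is_{c}": int(value == c) for c in categories[1:]}
    for value in data
]
print(dummies[0])  # {'is_married': 1, 'is_divorced': 0}
```

A case coded 0 on both dummies is in the reference category, which is why k categories need only k − 1 dummies.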
107. 107
Summary:
MLR output
1. Overall fit
1. R, R2, adjusted R2
2. F, p
2. Coefficients
1. Relation between each IV and the DV,
adjusted for the other IVs
2. B, β, t, p, and rp
3. Regression equation (if useful)
Y = b1x1 + b2x2 +.....+ bixi + a + e
109. 109
MLR I Quiz –
Practice question 1
A linear regression analysis produces the
equation Y = 0.4X + 3. This indicates
that:
(a) When Y = 0.4, X = 3
(b) When Y = 0, X = 3
(c) When X = 3, Y = 0.4
(d) When X = 0, Y = 3
(e) None of the above
110. 110
MLR I Quiz –
Practice question 2
Multiple linear regression is a
________ type of statistical analysis.
(a) univariate
(b) bivariate
(c) multivariate
111. 111
MLR I Quiz –
Practice question 3
Multiple linear regression is a
________ type of statistical analysis.
(a) univariate
(b) bivariate
(c) multivariate
112. 112
MLR I Quiz –
Practice question 4
The following types of data can be used in
MLR (choose all that apply):
(a) Interval or higher DV
(b) Interval or higher IVs
(c) Dichotomous IVs
(d) All of the above
(e) None of the above
113. 113
MLR I Quiz –
Practice question 5
In MLR, the square of the multiple
correlation coefficient, R2, is called the:
(a) Coefficient of determination
(b) Variance
(c) Covariance
(d) Cross-product
(e) Big R
114. 114
MLR I Quiz –
Practice question 6
In MLR, a residual is the difference
between the predicted Y and actual Y
values.
(a) True
(b) False
115. 115
Next lecture
• Review of MLR I
• Semi-partial correlations
• Residual analysis
• Interactions
• Analysis of change
116. 116
References
Howell, D. C. (2004). Chapter 9: Regression. In D. C. Howell,
Fundamental statistics for the behavioral sciences (5th ed., pp. 203-
235). Belmont, CA: Wadsworth.
Howitt, D., & Cramer, D. (2011). Introduction to statistics in psychology
(5th ed.). Harlow, UK: Pearson.
Kliewer, W., Lepore, S. J., Oskin, D., & Johnson, P. D. (1998). The role of
social and cognitive processes in children's adjustment to community
violence. Journal of Consulting and Clinical Psychology, 66, 199-209.
Landwehr, J. M., & Watkins, A. E. (1987). Exploring data: Teacher's
edition. Palo Alto, CA: Dale Seymour Publications.
Tabachnick, B. G., & Fidell, L. S. (2013). Multiple regression [includes
example write-ups]. In Using multivariate statistics (6th ed.,
International ed., pp. 117-170). Boston, MA: Allyn and Bacon.
Vemuri, A. W., & Costanza, R. (2006). The role of human, social, built,
and natural capital in explaining life satisfaction at the country level:
Toward a National Well-Being Index (NWI). Ecological Economics,
58(1), 119-133.
117. 117
Open Office Impress
● This presentation was made using
Open Office Impress.
● Free and open source software.
● http://www.openoffice.org/product/impress.html
Editor's Notes
7126/6667 Survey Research & Design in Psychology
Semester 1, 2017, University of Canberra, ACT, Australia
James T. Neill
Home page: http://en.wikiversity.org/wiki/Survey_research_and_design_in_psychology
Lecture page: http://en.wikiversity.org/wiki/Survey_research_and_design_in_psychology/Lectures/Multiple_linear_regression_I
Image source:http://commons.wikimedia.org/wiki/File:Vidrarias_de_Laboratorio.jpg
License: Public domain
Description:
Introduces and explains the use of linear regression and multiple linear regression in the context of psychology.
This lecture is accompanied by a computer-lab based tutorial, the notes for which are available here:
http://ucspace.canberra.edu.au/display/7126/Tutorial+-+Multiple+linear+regression
Image source: http://commons.wikimedia.org/wiki/File:Information_icon4.svg
License: Public domain
Assumed knowledge: A solid understanding of linear correlation.
Image source: http://commons.wikimedia.org/wiki/File:Information_icon4.svg
License: Public domain
Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
Regression tends not to be used for Exploratory or Descriptive purposes.
Image source: http://commons.wikimedia.org/wiki/File:Information_icon4.svg
License: Public domain
Image name:
James Neill
License: Public domain
Image source: Unknown
To describe the regression line, need the slope of the line of best fit and the point at which it touches the Y-axis.
A line of best fit can be applied using any method e.g., by eye/hand.
Another way is to use the Method of Least Squares – a formula which minimizes the sum of the vertical deviations
See also - http://www.hrma-agrh.gc.ca/hr-rh/psds-dfps/dafps_basic_stat2_e.asp#D
Image sources: Unknown
Example from Landwehr & Watkins (1987), cited in Howell (2004, pp. 216-218) and accompanying powerpoint lecture notes).
Image source: Public domain
http://commons.wikimedia.org/wiki/File:Scatterplot_WithLRForumulaIndicated.png
Y = a + bx + e
X = predictor value (IV)
Y = predicted value (DV)
a = Y-axis intercept (i.e., value of Y when X is 0)
b = unstandardised regression coefficient (i.e., B in SPSS) – the slope of the line of best fit; the average rate at which Y changes with a one unit change in X
e = error
LaTeX: \hat Y=bx+a
R2 needs to be provided first
Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
The intercept is labeled “constant.”
Slope is labeled by name of predictor variable.
The variance of these residuals is indicated by the standard error in the regression coefficients table
Image source: Unknown
(rho - population correlation) = 0
Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
Significance tests of the slope and intercept are given as t-tests.
The t values in the second from right column are tests on slope and intercept.
The associated p values are next to them.
The slope is significantly different from zero, but not the intercept.
Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
Note that high scores indicate good mental health, i.e., absence of distress
Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
R = correlation [multiple correlation in MLR]
R2 = % of variance explained
Adjusted R2 = % of variance, reduced estimate, bigger adjustments for small samples
In this case, Ignoring Problems accounts for ~10% of the variation in Psychological Distress
Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
The MLR ANOVA table provides a significance test of R
It is NOT a “normal ANOVA” (test of mean differences)
Tests whether a significant (non-zero) amount of variance is explained (null hypothesis: zero variance explained)
In this case a significant amount of Psychological Distress variance is explained by Ignoring Problems, F(1,218) = 25.78, p < .01
Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
Multiple regression coefficient table
Analyses the relationship of each IV with the DV
For each IV, examine B, Beta, t and sig.
B = unstandardised regression coefficient[use in prediction equations]
Beta (b) = standardised regression coefficient[use to compare predictors with one another]
t-test & sig. shows the statistical likelihood of a DV-IV relationship being caused by chance alone
Image source: http://commons.wikimedia.org/wiki/File:Information_icon4.svg
License: Public domain
Image source: http://commons.wikimedia.org/wiki/File:Example_path_diagram_%28conceptual%29.png
License: Public domain
Use of several IVs to predict a DV
Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
In MLR there are:
multiple predictor X variables (IVs) and
a single predicted Y (DV)
Interrelationships between predictors e.g. IVs = height, gender →
DV = weight
IVs = metric (interval or ratio) or dichotomous, e.g. age and gender
DV = metric (interval or ratio), e.g., pulse
Linear relations exist between IVs & DVs, e.g., check scatterplots
IVs are not overly correlated with one another (e.g., over .7) – if so, apply cautiously
Assumptions for correlation apply e.g., watch out for outliers, non-linear relationships, etc.
Homoscedasticity – similar/even spread of data from line of best throughout the distribution
For more on assumptions, see http://www.visualstatistics.net/web%20Visual%20Statistics/Visual%20Statistics%20Multimedia/correlation_assumtions.htm
Image source: Unknown
Regression can establish correlational link, but cannot determine causation.
The coefficient of determination is a measure of how well the regression line represents the data. If the regression line passes exactly through every point on the scatter plot, it would be able to explain all of the variation and R2 would be 1. The further the line is away from the points, the less it is able to explain. If the scatterplot is completely random and there is zero relationship between the IVs and the DV, then R2 will be 0.
As number of predictors approaches N, R2 is inflated
β = r in LR but this is only true in MLR when the IVs are uncorrelated.
If IVs are uncorrelated (usually not the case) then you can simply use the correlations between the IVs and the DV to determine the strength of the predictors.
If the IVs are standardised (usually not the case), then the unstandardised regression coefficients (B) can be compared to determine the strength of the predictors.
If the IVs are measured using the same scale (sometimes the case), then the unstandardised regression coefficients (B) can meaningfully be compared.
The MLR equation has multiple regression coefficients and a constant (intercept).
Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
It is a good idea to get into the habit of drawing Venn diagrams to represent the degree of linear relationship between variables.
Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
95% CI
Image source: http://cloudking.com/artists/noa-terliuc/family-violence.php
Kliewer, Lepore, Oskin, & Johnson, (1998)
Data available at www.duxbury.com/dhowell/StatPages/More_Stuff/Kliewer.dat
Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
CBCL = Child Behavior Checklist
Predictors are largely independent of each other.
Stress and Witnessing Violence are significantly correlated with Internalizing.
Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
R2 has same interpretation as r2. 13.5% of variability in Internal accounted for by variability in Witness, Stress, and SocSupp.
Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
t test on two slopes (Violence and Stress) are positive and significant.
SocSupp is negative and not significant. However the size of the effect is not much different from the two significant effects.
Image source: Unknown
Image source: Unknown
Re the 2nd point - the same holds true for other predictors.
Image source: Unknown
Image source: Vemuri & Costanza (2006).
Nigeria, India, Bangladesh, Ghana, China and Philippines were treated as outliers and excluded from the analysis.
Manual technique & commonly used.
e.g., some treatment variables may be less expensive and these could be entered first to find out whether or not there is additional justification for the more expensive treatments
These residual slides are based on Francis (2007) – MLR (Section 5.1.4) – Practical Issues & Assumptions, pp. 126-127 and Allen and Bennett (2008)
Note that according to Francis, residual analysis can test:
Additivity (i.e., no interactions b/w IVs) (but this has been left out for the sake of simplicity)