How To Draw A Regression Line

Scatter Diagrams and Regression Lines

A. Constructing a Scatter Diagram

Constructing a scatter diagram is a fairly straightforward process. First decide which variable is going to be your x-value and which variable is going to be your y-value.

Find the minimum and maximum of your x-values and set up a uniform number line on your horizontal axis so that the values extend from the minimum x-value to the maximum x-value but not much farther.

Next find the minimum and maximum of your y-values and set up a uniform number line on the vertical axis so that the values extend from the minimum y-value to the maximum y-value but not much farther.

Once the axes are set, you just treat each pair of x- and y-values as an ordered pair, and you plot these ordered pairs on the coordinate axes you just created.

For example, consider Table 1 on page 178 of Sullivan, which is reproduced below. Usually you would pick the first row or column as your x-values and the second row or column as your y-values. Once you make that determination, your axes should be fairly similar to those shown in the figure below.

Club-Head Speed (mph)   100   102   103   101   105   100    99   105
Distance (yards)        257   264   274   266   277   263   258   275

The lowest club-head speed is 99 and the highest is 105, and the x-axis shown extends from 99 to 105. The shortest distance is 257 and the longest is 277, but since multiples of five were used to mark the scale, the vertical scale extends a little below 257 to 255, and a little above 277 to 280.

Each speed-distance pair in Table 1 on page 178 will be represented by a point on the scatter diagram. To plot the point represented by the first pair in the table, you find 100 on the x-axis and then move up to a height representing 257 on the y-axis. Since 257 is between 255 and 260, but closer to 255, that is where the point should be. In a similar fashion, points would be plotted for the other pairs in Table 1. Once a point has been plotted for each pair and a title has been added to the chart, the scatter diagram is complete.

[Figure: scatter diagram of the golf data]
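The axis-setup steps above can be sketched in Python. This is only an illustration, not part of the original handout: the helper name `axis_limits` is hypothetical, and the tick steps (1 mph on the horizontal axis, 5 yards on the vertical axis) are assumptions chosen to match the worked example.

```python
import math

# Data from Table 1 (club-head speed in mph, distance in yards).
speeds = [100, 102, 103, 101, 105, 100, 99, 105]
distances = [257, 264, 274, 266, 277, 263, 258, 275]

def axis_limits(values, step):
    """Round the data range outward to the nearest multiples of `step`.

    `axis_limits` is a hypothetical helper used only for this sketch.
    """
    low = math.floor(min(values) / step) * step
    high = math.ceil(max(values) / step) * step
    return low, high

x_limits = axis_limits(speeds, 1)     # ticks every 1 mph (assumed)
y_limits = axis_limits(distances, 5)  # ticks every 5 yards, as in the text

# Each (x, y) pair becomes one point on the scatter diagram.
points = list(zip(speeds, distances))

print("x-axis runs from", x_limits[0], "to", x_limits[1])
print("y-axis runs from", y_limits[0], "to", y_limits[1])
print("first point to plot:", points[0])
```

Rounding outward to the tick step keeps the axes just wide enough to hold the data, which is the property the text asks for.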

B. Using the TI 83/84 Calculator to Find Equations of Regression Lines and Coefficients of Determination and Correlation

To find coefficients of determination and correlation, you must first make a change in the settings on your calculator. Press "2ND" and "0" (zero). This will bring up a list of all the procedures in the calculator in alphabetical order. Use the down-arrow key to put the triangle cursor next to "DiagnosticOn". Make sure you choose "DiagnosticOn" and NOT "DiagnosticOff". Then press "ENTER" twice. On your calculator screen, you should see:

DiagnosticOn

Done

You should only need to do this process again if (1) your calculator is reset to the manufacturer's default settings, or (2) you start using a different calculator.

To find regression lines and coefficients of determination and correlation, you need to be working with two variables, with each value of the first variable paired with one value of the second. For this demonstration, we will use the data from Table 1 again:

Club-Head Speed (mph)   100   102   103   101   105   100    99   105
Distance (yards)        257   264   274   266   277   263   258   275

Press "STAT" and "ENTER". Enter the numbers for the first variable under "L1". Enter the numbers for the second variable under "L2". When you are finished, the data entry screen should look like the following:

L1      L2
103     274
101     266
105     277
100     263
99      258
105     275
------  ------

The first two rows appear to be missing, but if you press the up-arrow a few times, both rows will reappear.

The numbers in the first column are paired with the same numbers in the second column that they were in the original table of data (except numbers in the first row were paired with numbers in the second row in the original table). If the pairing is changed, your results will most likely be incorrect.

Now press "STAT". Move the cursor to "CALC" and select "LinReg(ax+b)" using the down-arrow button. Pressing "ENTER" twice will bring up the following display:

LinReg

y = ax + b

a = 3.166101695

b = -55.79661017

r² = .8811498758

r = .9386958377

If your data is in lists other than L1 and L2, you can choose those lists by pressing "ENTER" just one time after selecting "LinReg(ax+b)", typing in the correct lists, and pressing "ENTER" again. If your data is in L4 and L5, for example, you would press "2ND", "4", "," (comma), "2ND", "5" and then "ENTER" to compute the regression line for those lists.

For this example, the equation of the regression line is y = 3.166x - 55.797. The coefficient of determination, 0.881, says that about 88.1% of the variation in the data is determined by the regression line. The correlation coefficient, 0.939, indicates a strong positive correlation. See Coefficients of Determination and Correlation below to find out how to interpret the coefficients of determination and correlation.
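For readers without a TI 83/84 at hand, the same four numbers can be reproduced with the standard least-squares formulas. The following is a minimal sketch in plain Python, not the calculator's own procedure; the function name `linear_regression` is hypothetical.

```python
import math

# Data from Table 1: L1 holds club-head speed, L2 holds distance.
speeds = [100, 102, 103, 101, 105, 100, 99, 105]
distances = [257, 264, 274, 266, 277, 263, 258, 275]

def linear_regression(xs, ys):
    """Return slope a, intercept b, r squared, and r for the line y = ax + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r * r, r

a, b, r2, r = linear_regression(speeds, distances)
print(f"a  = {a:.9f}")
print(f"b  = {b:.8f}")
print(f"r2 = {r2:.10f}")
print(f"r  = {r:.10f}")
```

The pairing of the two lists matters here just as it does on the calculator: `zip` matches the first speed with the first distance, the second with the second, and so on.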

To plot the regression line on the scatter diagram, you need to find two points on the regression line. Since you have the equation of the regression line, all you need are some x-values to plug into this equation. The minimum and maximum x-values are good to use, but you could use any numbers that are close to these x-values. In the above example, 99 is the minimum x-value and 105 is the maximum. Plugging these into the equation gives:

y = 3.166 × 99 - 55.797 = 257.6,

or x = 99 and y = 257.6 for the first point, and

y = 3.166 × 105 - 55.797 = 276.6,

or x = 105 and y = 276.6 for the second point.
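The two substitutions above can be checked with a few lines of Python. This is a small sketch using the rounded coefficients from the text; the function name `predict` is hypothetical.

```python
def predict(x, a=3.166, b=-55.797):
    """Height of the regression line y = ax + b at a given x."""
    return a * x + b

x_min, x_max = 99, 105  # smallest and largest club-head speeds in the data

# Two points are enough to draw the line on the scatter diagram.
endpoints = [(x, round(predict(x), 1)) for x in (x_min, x_max)]
print(endpoints)
```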

Plotting these two points on the scatter diagram and drawing a line through them gives a graph of the regression line. When the regression line is plotted correctly, about half of the data points will be above the line and the other half will be below the line. If your line is below or above much more than half of the data points, then you have done something wrong. Usually this indicates that you recorded a wrong number somewhere or you have switched the x-values and y-values at some step in the process.
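The "about half above, half below" check can also be done numerically. The sketch below counts points on each side of the line for the golf data. The 0.5-yard tolerance for calling a point "on" the line is an assumption made for this illustration; it stands in for what the eye would judge as touching the line on a hand-drawn chart.

```python
speeds = [100, 102, 103, 101, 105, 100, 99, 105]
distances = [257, 264, 274, 266, 277, 263, 258, 275]

# Coefficients as shown on the calculator display.
a, b = 3.166101695, -55.79661017

TOUCH_TOLERANCE = 0.5  # yards; assumed threshold for "touching" the line

above = below = touching = 0
for x, y in zip(speeds, distances):
    residual = y - (a * x + b)  # positive when the point is above the line
    if abs(residual) <= TOUCH_TOLERANCE:
        touching += 1
    elif residual > 0:
        above += 1
    else:
        below += 1

print("above:", above, "below:", below, "touching:", touching)
```

A badly lopsided count is the numerical version of the warning sign described above.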

C. Good and Bad Scatter Diagrams

Figure 1: GOOD Scatter Diagram

This is a GOOD scatter diagram. It has a title and both axes are labeled. Both scales extend only as far as the data values and not much farther. Notice that the regression line goes through the middle of the points. Three points are above the regression line and three points are below it, while two points just touch the regression line.

Figure 2: BAD Scatter Diagram

This is a BAD scatter diagram. Notice that there are no data values to the left of 90 on the horizontal axis, and yet the horizontal scale goes all the way down to zero. As a result, most of the left side of the chart is empty and all the data values are squeezed against the right side of the chart.

Figure 3: BAD Scatter Diagram

This is a BAD scatter diagram. Notice that there are no data values below 250 on the vertical axis, and yet the vertical scale goes all the way down to zero. As a result, most of the bottom part of the chart is empty and all the data values are pushed up against the top of the graph.

Figure 4: BAD Scatter Diagram

This is a BAD scatter diagram. While both scales are restricted, they still go a lot farther than they need to. As a result, the data is forced into a very small area of the chart and there is a lot of blank space around it. There is no need for the horizontal axis to go below 95 or above 110. There is no need for the vertical axis to go below 250 or above 280. The smaller the range of data on each axis is, the more the chart becomes focused on the data. See how the data points take up most of the graph in Figure 1 above.

Figure 5: BAD Scatter Diagram

This is a BAD scatter diagram. The vertical axis has been extended to show the line, but since the line is nowhere near the data, this is not the regression line. Usually when this happens, it means the x and y variables have been switched somewhere in the process of finding the regression line. Always set up the scatter diagram first. Then if the regression line is nowhere near the data, that means you made a mistake in computing the regression line. One thing to try if this happens is switching the x and y values.
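One way a line like the one in Figure 5 can arise is sketched below, under the assumption that the two lists were swapped when the line was fitted but the result was then drawn on the original axes. The function name `fit_line` is hypothetical.

```python
speeds = [100, 102, 103, 101, 105, 100, 99, 105]
distances = [257, 264, 274, 266, 277, 263, 258, 275]

def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

correct_slope, correct_intercept = fit_line(speeds, distances)
swapped_slope, swapped_intercept = fit_line(distances, speeds)  # x and y switched

# Evaluate both lines at the smallest speed in the data, x = 99.
correct_y = correct_slope * 99 + correct_intercept
swapped_y = swapped_slope * 99 + swapped_intercept

print(f"correct line at x = 99: {correct_y:.1f}")
print(f"swapped line at x = 99: {swapped_y:.1f}")
```

The correct line passes through the data near 258 yards, while the swapped line sits far below every observed distance, which is why drawing the scatter diagram first makes the mistake easy to spot.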

D. Coefficients of Determination and Correlation

The coefficient of determination, r², tells what percent of the variation in data values is explained by the regression line. If this percent is less than 100%, then the difference between 100% and the coefficient of determination tells what percent of the variation is determined by something other than the regression line.

Examples

a) If r² = 0.82, then 82% of the variation is determined by the regression line, and 18% of the variation is determined by some other factor or factors.

b) If r² = 0.47, then 47% of the variation is determined by the regression line, and 53% of the variation is determined by some other factor or factors.
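The "percent of variation" reading can be made concrete with the golf data. For a least-squares line, the coefficient of determination equals 1 minus the ratio of the leftover squared error around the line to the total squared variation around the mean. The sketch below is an illustration only; the variable names are hypothetical, and the coefficients are the ones from the calculator display.

```python
speeds = [100, 102, 103, 101, 105, 100, 99, 105]
distances = [257, 264, 274, 266, 277, 263, 258, 275]

# Coefficients as shown on the calculator display.
a, b = 3.166101695, -55.79661017

mean_y = sum(distances) / len(distances)

# Total variation of the distances around their mean.
total_variation = sum((y - mean_y) ** 2 for y in distances)
# Variation left over after the regression line has been fitted.
unexplained_variation = sum((y - (a * x + b)) ** 2
                            for x, y in zip(speeds, distances))

r2 = 1 - unexplained_variation / total_variation
explained_pct = round(100 * r2, 1)
unexplained_pct = round(100 * (1 - r2), 1)

print(f"r2 = {r2:.4f}")
print(f"{explained_pct}% explained by the line, {unexplained_pct}% by other factors")
```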

The correlation coefficient, r, tells how close the scatter diagram points are to being on a line. If the correlation coefficient is positive, the line slopes upward. If the correlation coefficient is negative, the line slopes downward. All values of the correlation coefficient are between -1 and 1, inclusive.

The correlation scale* below provides a way to categorize the values of correlation coefficients.

According to this scale, a correlation coefficient of 0.2 would indicate a weak positive correlation, while a coefficient of -0.9 would indicate a strong negative correlation. A correlation coefficient of 1.0 indicates a perfect positive correlation.

* This scale has been revised and expanded from the correlation scale presented in Jay Devore and Nicholas Farnum, Applied Statistics for Engineers and Scientists, 2nd edition, Brooks/Cole 2005, p. 109.
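A scale of this kind can be expressed as a small lookup function. The cutoffs in the sketch below (0.1, 0.5 and 0.8) are assumptions, NOT the values from the scale referred to above; they were chosen only so that the results agree with the examples quoted in this article. The function name `describe_correlation` is hypothetical.

```python
def describe_correlation(r):
    """Label a correlation coefficient using ASSUMED cutoffs.

    The thresholds 0.1, 0.5 and 0.8 are illustrative guesses, not the
    published scale; adjust them to match the scale you are using.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    size = abs(r)
    if size < 0.1:
        return "no correlation"
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.8:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

for value in (0.2, -0.9, 1.0, 0.017):
    print(value, "->", describe_correlation(value))
```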

Exercise: Interpret the values of r² and r given below.

1) r² = 0.452

2) r² = 0.913

3) r² = 0.721

4) r² = 0.264

5) r = 0.431

6) r = -0.083

7) r = 0.972

8) r = -1.0

9) r = 0.681

10) r = -0.753

11) r = 0.047

12) r = -0.994

The following scatter diagrams are provided to give you some idea of how correlation coefficients and coefficients of determination relate to how points are clustered around a regression line.

[Scatter diagram: Distance vs. Club-Head Speed]

The regression line has the equation: Distance = 3.17 × Speed - 55.80. The correlation coefficient is 0.939, which signifies a strong positive correlation. The coefficient of determination is 0.881, indicating that 88.1% of the variation in the data is determined by the regression line.

[Scatter diagram: Life Expectancy vs. Gestation]

The regression line has the equation: Life Exp. = 0.0261 × Gestation + 7.87. The correlation coefficient is 0.726, which signifies a moderate positive correlation. The coefficient of determination is 0.527, indicating that 52.7% of the variation in the data is determined by the regression line.

[Scatter diagram: IQ vs. MRI Count]

The regression line has the equation: IQ = 0.0172 × (MRI Count) + 119.22. The correlation coefficient is 0.357, which signifies a weak positive correlation. The coefficient of determination is 0.128, indicating that 12.8% of the variation in the data is determined by the regression line.

[Scatter diagram: TECO vs. GE]

The regression line has the equation: TECO = 0.0170 × GE + 0.0427. The correlation coefficient is 0.017, which shows no correlation between the annual rates of return for the two stocks. The coefficient of determination is 0.0003, indicating that virtually none of the variation in the data is determined by the regression line.

[Scatter diagram: TECO vs. Cisco]

The regression line has the equation: TECO = -0.112 × Cisco + 0.0888. The correlation coefficient is -0.235, which signifies a weak negative correlation. The coefficient of determination is 0.055, indicating that 5.5% of the variation in the data is determined by the regression line.

[Scatter diagram: Percent vs. ERA]

The regression line has the equation: Percent = -0.111 × ERA + 0.977. The correlation coefficient is -0.660, which signifies a moderate negative correlation. The coefficient of determination is 0.436, indicating that 43.6% of the variation in the data is determined by the regression line.

[Scatter diagram: MPG vs. Weight]

The regression line has the equation: MPG = -0.00617 × Weight + 41.46. The correlation coefficient is -0.892, which signifies a strong negative correlation. The coefficient of determination is 0.796, indicating that 79.6% of the variation in the data is determined by the regression line.

E. Outliers and Influential Observations in a Scatter Diagram

If there is a regression line on a scatter diagram, you can identify outliers. An outlier for a scatter diagram is the point or points that are farthest from the regression line. Distance from a point to the regression line is the length of the line segment that is perpendicular to the regression line and extends from the point to the regression line. (See the figure below.) Note that outliers for a scatter plot are very different from outliers for a boxplot.

There is usually at least one outlier, and usually only one outlier, on a scatter diagram. If one point of a scatter diagram is farther from the regression line than some other point, then the scatter diagram has at least one outlier. If two or more points are the same farthest distance from the regression line (not a common occurrence), then each of these points is an outlier. If all points of the scatter diagram are the same distance from the regression line (which very rarely happens), then there is no outlier. (See the GeoGebra applet Scatter Diagram Outliers.)
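The definition above can be applied directly to the golf data. The sketch below uses the standard formula for the perpendicular distance from a point (x, y) to the line y = ax + b, namely |ax - y + b| / sqrt(a² + 1), and keeps every point that ties for the largest distance. The function name `perpendicular_distance` is hypothetical.

```python
import math

speeds = [100, 102, 103, 101, 105, 100, 99, 105]
distances = [257, 264, 274, 266, 277, 263, 258, 275]

# Coefficients as shown on the calculator display.
a, b = 3.166101695, -55.79661017

def perpendicular_distance(x, y, slope, intercept):
    """Length of the perpendicular segment from (x, y) to y = slope*x + intercept."""
    return abs(slope * x - y + intercept) / math.sqrt(slope ** 2 + 1)

points = list(zip(speeds, distances))
gaps = [perpendicular_distance(x, y, a, b) for x, y in points]
farthest = max(gaps)

# Keep every point tied for the farthest distance, as the text describes.
outliers = [p for p, gap in zip(points, gaps) if math.isclose(gap, farthest)]
print("outlier(s):", outliers)
```

Because the perpendicular distance is measured in the units of the plot, changing the scale of either axis can change which point is farthest, so the result depends on how the data is expressed.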

An influential observation (inf. obs.) is a point on a scatter diagram that has a large horizontal gap containing no points between it and the vast majority of the other points. As shown in the graph below, there can be more than one influential observation. If there is no large horizontal gap between data points in a scatter diagram, there are no influential observations. In many cases, a scatter diagram will have no influential observations, but influential observations should be identified if they occur.

When an influential observation is moved up or down and the regression line is recomputed, the new line will be much closer to the new location of the influential observation. If a non-influential observation is relocated, the recomputed regression line will be in nearly the same position as the original regression line. Thus the influential observation "influences" the location of the regression line. (See the Influential Observations GeoGebra applet.)
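The effect described above can be seen with a small experiment. The data set in the sketch below is made up for illustration (five points bunched together and one point far to the right across a large horizontal gap); it is not taken from the handout, and the function names are hypothetical. Each test moves one point up by 10 units and measures how far the refitted line rises at that point's x-value.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def line_rise_after_move(xs, ys, index, delta):
    """How far the fitted line moves at xs[index] when ys[index] shifts by delta."""
    slope0, intercept0 = fit_line(xs, ys)
    moved = list(ys)
    moved[index] += delta
    slope1, intercept1 = fit_line(xs, moved)
    x = xs[index]
    return (slope1 * x + intercept1) - (slope0 * x + intercept0)

# Hypothetical data: the last point is separated by a large horizontal gap.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 3, 4, 5, 6, 21]

rise_influential = line_rise_after_move(xs, ys, index=5, delta=10)
rise_ordinary = line_rise_after_move(xs, ys, index=2, delta=10)

print(f"line follows the influential point by {rise_influential:.2f} of 10 units")
print(f"line follows the ordinary point by {rise_ordinary:.2f} of 10 units")
```

In this made-up example the line follows the isolated point almost all the way, while it barely responds when a point in the middle of the cluster is moved.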

Exercises: Identify influential observations and outliers in the scatter diagrams shown below.

Source: https://www2.southeastern.edu/Academics/Faculty/dgurney/Math241/StatTopics/ScatGen.htm

Posted by: oneallaremas.blogspot.com
