An Excel Tool for Testing Differences between Correlations

I recently had a client who had conducted a study of employee turnover using two different measures of intention to quit. We regressed these measures separately on measures of extrinsic and intrinsic satisfaction. We then sought to address the question of whether there was a difference in the degree to which the two measures of intention to quit could be predicted from the two satisfaction measures. This problem boils down to testing for the difference between the two R2s or Rs resulting from the separate regressions.

This is clearly not a case of testing for the difference between two correlations from independent samples. Both of the regressions were computed on the same sample. However, the two correlations were non-overlapping, which means that no variables were common to both correlations. This is apparent in the case of the separate intention to quit measures. However, it is less apparent but equally true that the composite score based on the regression weights derived for the prediction of each intention to quit measure constitutes a completely separate variable from the composite score based on the regression weights derived for the prediction of the other intention to quit measure. Thus, we have a case of correlations between dependent (because they are computed for the same sample) but non-overlapping variables.

The proper procedure for testing the difference between correlations fitting the above description was not fully resolved until the publication of a paper in Psychological Methods by Raghunathan, Rosenthal, and Rubin (1996). That paper describes a modification of the Pearson-Filon (PF) method that substitutes Fisher z transformations for the correlations appearing in the original PF formula. The resulting formula for this revised test is as follows:

 where: zr12 = Fisher z transformation of r12
z
r34 = Fisher z transformation of r34
k
= (r13 – r23r12)(r24 – r23r34)
+ (r14 – r13r34)(r23 – r13r12)
+ (r13 – r14r34)(r24 – r14r12)
+ (r14 – r12r24)(r23 – r24r34)

The value of ZPF is referenced to the Z distribution to obtain its p-value.

The ZPF formula is clearly not one that the typical researcher, or even the typical statistician, wants to hammer out on a calculator each time the need for it arises. Weaver and Wuensch (2013) published an excellent article that explains the intricacies of comparing correlations and regression coefficients, and provides SPSS and SAS syntax programs for conducting the necessary calculations (the full manuscript of this article is available here: http://core.ecu.edu/psyc/wuenschk/W&W/Weaver&Wuensch_2013.pdf;

for the SPSS syntax files, see:
https://sites.google.com/a/lakeheadu.ca/bweaver/Home/statistics/spss/my-spss-page/weaver_wuensch;

for the SAS syntax files, see: http://core.ecu.edu/psyc/wuenschk/W&W/W&W-SAS.htm.)

However, not everyone has access to either SPSS or SAS, and it's often much easier to perform calculations on summary data (e.g., already computed correlations and regression coefficients) in Excel than getting into the syntax runs of SPSS or SAS. As far as I can determine, there has not been a set of Excel calculator worksheets that correctly computes all the tests of the differences between correlations and regression coefficients within each of the categories of independent and dependent, overlapping and non-overlapping, and their combinations. In response to the absence of such an Excel-based set of calculators, I offer the Correlation and Slope Comparator. This is available for download at the Stat Tools tab of this website, and is described below.

The Correlation and Slope Comparator

This set of tools is provided in the form of a collection of worksheet calculators within an Excel workbook. I cannot take full credit for this tool. In the course of searching for an Excel-based calculator for the test of the difference between dependent non-overlapping correlations, I ran across a worksheet that contained a version of the test I was seeking along with tests of two other types of correlation pairs and the correction for attenuation. I updated the test of the difference between dependent non-overlapping correlations to reflect the latest version of the formula from the Raghunathan, et al. (1996) article. I also added the tests for the difference of a correlation from a hypothesized value and for the difference between slopes. Finally, I cleaned up the formatting and the appearance of the worksheets prior to data entry.

I have been unable to relocate the source from which I downloaded the spreadsheet that became the basis for this set of tools, or to identify the originator of the spreadsheet I adapted. However, if he or she stumbles upon this site and recognizes his or her work as the basis for this tool, I wish to acknowledge this person's good work and contributions to this further evolution of the original set of calculators.

The Correlation and Slope Comparator contains the following calculation tabs:

 

  • Dependent overlapping correlations: Tests for the significance of the difference between two correlations in the situation where the two correlations share a common variable (e.g., r1,2 and r1,3) and both correlations were computed on the same cases.

 

 

 

  • Dependent non-overlapping correlations: Tests for the significance of the difference between two correlations in the situation where there is no variable in common between the two correlations (e.g., r1,2 and r3,4), and both correlations were computed on the same cases.

 

 

 

  • Independent samples correlations: Tests for the significance of the difference between two correlations in the situation where each correlation was computed on a different sample of cases. [Note: The example invariably used in this case is the correlation between the same two variables in different samples (i.e., complete overlap). There potentially are hidden and as yet unexplored complications for comparisons involving 50% and zero overlap between the variables correlated in separate samples.]

 

 

 

  • Difference from hypothesized correlation: Tests for the significance of the difference between an observed correlation and the hypothesized value of the correlation. The hypothesized value may be zero or any other value between -1.0 and +1.0.

 

 

 

  • Difference between slopes: Tests for the significance of the difference between two (raw score) slopes (i.e., also commonly referred to as b weights or raw score regression coefficients) from a regression equation. The slopes may reference the same x and y variables in the same or different samples, or different x variables for regression equations computed for different samples. It's difficult to imagine a need to compare slopes for regressions of different y variables in either the same or different samples, but there are no indications in the literature that the computation of the pooled standard error of the difference in such slopes would be any different than in the more conventional situations.

 

 

 

  • Disattenuation of correlation: This computes three different ways of correcting a correlation for unreliability in the variables being correlated: correcting only for unreliability in the x variable, correcting only for unreliability in the y variable, and correcting for unreliability in both of the variables.

 


The Correlation and Slope Comparator can be downloaded by clicking this link:

The Correlation and Slope Comparator (Excel worksheet calculators)