Several efforts have been made to offer a quick and easy tool for producing the needed sets of observations, but the tools offered to date all share a common serious flaw: they fail to account for the chance correlation that inevitably arises between the initial observations randomly generated for a pair of variables. Consequently, any procedure subsequently applied to induce the desired correlation will produce a correlation that deviates from the desired level by the amount of that initial chance correlation. One fairly recent article even went so far as to assert that applying the Box-Muller normality transformation to the variables eliminates any chance correlation between them, which is absurd.

The Correlated Random Variable Generator Excel macro offered on the "Stats Tools" tab of this website (see also the link below) is the only standalone tool currently available that produces normally distributed random bivariate data sets, of any number of observations, with precisely (to 3 decimal places) the degree of correlation the user specifies. In addition, the user may specify the means and standard deviations of the two variables, and may limit the values of each variable to fall within specified upper and lower boundaries.
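The core trick — removing the chance correlation before mixing in the target one — can be sketched in a few lines of Python. This is my own illustrative reconstruction of the idea (the function name and interface are mine), not the macro's actual code:

```python
import numpy as np

def correlated_pair(n, rho, means=(0.0, 0.0), sds=(1.0, 1.0), seed=None):
    """Generate two normal variables whose sample correlation is exactly rho."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)
    # Standardize x; center y.
    x = (x - x.mean()) / x.std()
    y = y - y.mean()
    # Remove the chance correlation: residualize y against x,
    # leaving a component exactly orthogonal to x in this sample.
    resid = y - (y @ x / n) * x
    resid /= resid.std()
    # Mix x back in at exactly the target correlation.
    y_new = rho * x + np.sqrt(1.0 - rho**2) * resid
    return means[0] + sds[0] * x, means[1] + sds[1] * y_new
```

Because the chance correlation is removed exactly in the sample, not just in expectation, the resulting correlation matches the target to machine precision.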

I should also mention that many researchers need random data generated for 3 or more variables with a specified multivariate correlational structure, either with or without sampling error. I have developed a standalone compiled program for producing such data sets, called *Monte Carlo/PC*. This program sells for $150 and may be purchased from me. Please send requests for information about purchasing it to *inquiry@prostatservices.com*, or call Professional Statistical Services at the number on the contact tab of www.prostatservices.com.

The Correlated Random Variable Generator can be downloaded by clicking this link: Correlated Random Variable Generator (Excel macro)

Up to now these tests have been inaccessible to many who sought to use them, because they were available only in several very expensive or very technically challenging statistical software systems (i.e., SAS, SPSS, Stata, and R). I have developed an Excel macro worksheet for calculating these tests, and made it available for download under the Stats Tools tab of the Prostatservices.com website, free of charge to anyone. It is currently set up with a limit of 20 strata. If any users require the ability to test the odds ratios of more strata than this, they may contact me (Jeffrey Kane) through this website to arrange the necessary modifications.
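For readers who want to check the worksheet's arithmetic, or who work outside Excel, the published formulas can be sketched in Python. This is my own illustrative implementation (the function name and interface are mine), not the macro's code:

```python
from math import sqrt

def breslow_day(strata):
    """Breslow-Day homogeneity-of-odds-ratios statistic with Tarone's
    correction. `strata` is a list of 2x2 tables [[a, b], [c, d]].
    Returns (bd_stat, tarone_stat, df); refer each statistic to a
    chi-square distribution with df = K - 1 degrees of freedom."""
    # Mantel-Haenszel estimate of the common odds ratio.
    num = sum(a * d / (a + b + c + d) for (a, b), (c, d) in strata)
    den = sum(b * c / (a + b + c + d) for (a, b), (c, d) in strata)
    psi = num / den
    bd = dev_sum = var_sum = 0.0
    for (a, b), (c, d) in strata:
        r1, r2, c1 = a + b, c + d, a + c
        # Expected a-cell under the common odds ratio solves
        # A*(r2 - c1 + A) = psi*(r1 - A)*(c1 - A), a quadratic in A.
        qa, qb, qc = 1.0 - psi, (r2 - c1) + psi * (r1 + c1), -psi * r1 * c1
        if abs(qa) < 1e-12:                 # psi == 1: equation is linear
            A = -qc / qb
        else:                               # take the admissible root
            roots = [(-qb + s * sqrt(qb * qb - 4 * qa * qc)) / (2 * qa)
                     for s in (1, -1)]
            A = next(r for r in roots
                     if max(0, c1 - r2) - 1e-9 <= r <= min(r1, c1) + 1e-9)
        V = 1.0 / (1 / A + 1 / (r1 - A) + 1 / (c1 - A) + 1 / (r2 - c1 + A))
        bd += (a - A) ** 2 / V
        dev_sum += a - A                    # for Tarone's correction
        var_sum += V
    tarone = bd - dev_sum ** 2 / var_sum
    return bd, tarone, len(strata) - 1
```

Tarone's correction subtracts a small term that restores the asymptotic chi-square distribution when the Mantel-Haenszel estimator is used for the common odds ratio.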

The Breslow-Day and Tarone tests calculator can be downloaded by clicking this link: The Breslow-Day and Tarone Tests Calculator (Excel worksheet calculators)

Note that this tool works with multiple linear regression models as well as simple linear regression models. To use it with multiple regression models, you will have to compute the predicted values for your models, and then compute the simple regression of the observed Y values on the predicted Y values for each model. Use the slopes, intercepts, and minimums and maximums for these models to generate the graphs (the predicted Y variable will be the X variable in these analyses).
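The preparatory computation just described can be sketched with numpy (the function and variable names here are my own, for illustration only):

```python
import numpy as np

def observed_on_predicted(X, y):
    """Fit a multiple regression of y on the columns of X, then return
    the quantities the graphing step needs: the slope and intercept of
    observed y regressed on predicted y, plus the minimum and maximum
    of the predicted values (the X axis of the graph)."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    y_hat = X1 @ beta                            # predicted values
    slope, intercept = np.polyfit(y_hat, y, 1)   # observed on predicted
    return slope, intercept, y_hat.min(), y_hat.max()
```

One caveat worth noting: when an ordinary least squares model is fit and evaluated on the same sample, the regression of observed on predicted values has slope 1 and intercept 0 exactly, so the models being graphed will differ chiefly in the range and spread of their predicted values.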

The Regression Graph Creator can be downloaded by clicking this link:

Regression Graph Creator (Excel worksheet calculators)

Unfortunately, ANCOVA is only able to achieve its intended purpose in the very rare case where the slopes of the regressions of the dependent variable on the covariate are identical in all of the groups being compared. This restriction on its applicability is conventionally stretched to allow slope differences up to the point where they become statistically significant. In fact, even such nonsignificant slope differences result in the retention of extraneous covariate variance within each group, which reduces the power of the test of group differences by enlarging the error term. In a large proportion of studies (25-50% in my experience), especially those involving 50 or fewer subjects per group, the assumption of homogeneity of within-group regression slopes is violated to a significant degree, rendering ANCOVA formally inapplicable. Until now, the finding that a data set fails to satisfy this "homogeneity of regression slopes" assumption has left analysts with no alternative methodology for excluding covariate variance from the comparison of group means.

I have recently submitted an article for review in a major journal that proposes a new method which can be used in place of ANCOVA to exclude variance due to one or more extraneous covariates from the dependent variable on which group means are to be compared. The proposed method is called Analysis of Covariate Residuals, or ANCOVRES. The applicability of this method is unaffected by any degree of differences between the slopes of within-group regressions of the dependent variable on the covariate (i.e., lack of homogeneity of regression). It achieves complete exclusion of covariate influence on the dependent variable within each group being compared, thereby maximizing the power of the comparison for the given sample. The adjusted data is quite easily computed, especially in the case of one covariate, and this adjusted data is subsequently analyzed through ordinary ANOVA or t-test.
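Since the article is still under review, the exact computations are not yet public; the following is only my reading of the residualization idea as described above, sketched in Python with names of my own invention, and should not be taken as the article's published algorithm:

```python
import numpy as np

def ancovres_adjust(y, covariate, groups):
    """Within each group, remove the covariate's linear influence from y
    and restore the group mean, yielding adjusted scores that can then be
    analyzed with an ordinary ANOVA or t-test. Illustrative sketch only."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(covariate, dtype=float)
    groups = np.asarray(groups)
    adjusted = np.empty_like(y)
    for g in np.unique(groups):
        m = groups == g
        # Within-group regression of y on the covariate; no homogeneity
        # of slopes is required, since each group gets its own slope.
        slope, intercept = np.polyfit(x[m], y[m], 1)
        resid = y[m] - (intercept + slope * x[m])   # covariate variance removed
        adjusted[m] = resid + y[m].mean()           # keep the group mean
    return adjusted
```

By construction, the adjusted scores preserve each group's mean while being uncorrelated with the covariate within every group.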

An abbreviated version of the article proposing the new method is available for download from the "Articles" section of this website, or by clicking this link:

https://www.prostatservices.com/articles/beyond-ancova-a-new-method-for-excluding-the-influence-of-covariates-in-comparing-group-means

This is clearly not a case of testing for the difference between two correlations from independent samples. Both of the regressions were computed on the same sample. However, the two correlations were non-overlapping, which means that no variables were common to both correlations. This is apparent in the case of the separate intention to quit measures. However, it is less apparent but equally true that the composite score based on the regression weights derived for the prediction of each intention to quit measure constitutes a completely separate variable from the composite score based on the regression weights derived for the prediction of the other intention to quit measure. Thus, we have a case of correlations between dependent (because they are computed for the same sample) but non-overlapping variables.

The proper procedure for testing the difference between correlations fitting the above description was not fully resolved until the publication of a paper in *Psychological Methods* by Raghunathan, Rosenthal, and Rubin (1996). That paper describes a modification of the Pearson-Filon (PF) method that substitutes Fisher z transformations for the correlations appearing in the original PF formula. The resulting formula for this revised test is as follows:

$$
Z_{PF} = \frac{\left(z_{r_{12}} - z_{r_{34}}\right)\sqrt{N-3}}{\sqrt{2 - \dfrac{k}{\left(1 - r_{12}^{2}\right)\left(1 - r_{34}^{2}\right)}}}
$$

where:

$z_{r_{12}}$ = Fisher z transformation of $r_{12}$

$z_{r_{34}}$ = Fisher z transformation of $r_{34}$

$k = (r_{13} - r_{12}r_{23})(r_{24} - r_{23}r_{34}) + (r_{14} - r_{13}r_{34})(r_{23} - r_{12}r_{13}) + (r_{13} - r_{14}r_{34})(r_{24} - r_{12}r_{14}) + (r_{14} - r_{12}r_{24})(r_{23} - r_{24}r_{34})$

The value of ZPF is referenced to the Z distribution to obtain its p-value.
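For readers working in Python, the statistic is a direct transcription of the published formulas. This is my own transcription (function name and argument order are mine); verify any serious use against the Weaver and Wuensch syntax discussed next:

```python
from math import atanh, sqrt

def zpf(r12, r34, r13, r14, r23, r24, n):
    """ZPF test (Raghunathan, Rosenthal, & Rubin, 1996) for the difference
    between dependent, non-overlapping correlations r12 and r34, where
    r13, r14, r23, r24 are the four cross-correlations among the variables."""
    k = ((r13 - r12 * r23) * (r24 - r23 * r34)
         + (r14 - r13 * r34) * (r23 - r12 * r13)
         + (r13 - r14 * r34) * (r24 - r12 * r14)
         + (r14 - r12 * r24) * (r23 - r24 * r34))
    # atanh is the Fisher z transformation.
    num = (atanh(r12) - atanh(r34)) * sqrt(n - 3)
    den = sqrt(2 - k / ((1 - r12**2) * (1 - r34**2)))
    return num / den
```

When the four cross-correlations are all zero, the statistic reduces to the familiar independent-samples comparison of Fisher z values.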

The ZPF formula is clearly not one that the typical researcher, or even the typical statistician, wants to hammer out on a calculator each time the need for it arises. Weaver and Wuensch (2013) published an excellent article that explains the intricacies of comparing correlations and regression coefficients, and provides SPSS and SAS syntax programs for conducting the necessary calculations (the full manuscript of this article is available here: http://core.ecu.edu/psyc/wuenschk/W&W/Weaver&Wuensch_2013.pdf;

for the SPSS syntax files, see:

https://sites.google.com/a/lakeheadu.ca/bweaver/Home/statistics/spss/my-spss-page/weaver_wuensch;

for the SAS syntax files, see: http://core.ecu.edu/psyc/wuenschk/W&W/W&W-SAS.htm.)

However, not everyone has access to SPSS or SAS, and it is often much easier to perform calculations on summary data (e.g., already-computed correlations and regression coefficients) in Excel than to set up syntax runs in SPSS or SAS. As far as I can determine, no set of Excel calculator worksheets has been available that correctly computes all of the tests of differences between correlations and regression coefficients within each of the categories of independent and dependent, overlapping and non-overlapping, and their combinations. In response to this absence, I offer the Correlation and Slope Comparator, which is available for download at the Stat Tools tab of this website and is described below.

**The Correlation and Slope Comparator**

This set of tools is provided in the form of a collection of worksheet calculators within an Excel workbook. I cannot take full credit for this tool. In the course of searching for an Excel-based calculator for the test of the difference between dependent non-overlapping correlations, I ran across a worksheet that contained a version of the test I was seeking along with tests of two other types of correlation pairs and the correction for attenuation. I updated the test of the difference between dependent non-overlapping correlations to reflect the latest version of the formula from the Raghunathan, et al. (1996) article. I also added the tests for the difference of a correlation from a hypothesized value and for the difference between slopes. Finally, I cleaned up the formatting and the appearance of the worksheets prior to data entry.

I have been unable to relocate the source from which I downloaded the spreadsheet that became the basis for this set of tools, or to identify the originator of the spreadsheet I adapted. However, if he or she stumbles upon this site and recognizes his or her work as the basis for this tool, I wish to acknowledge this person's good work and contributions to this further evolution of the original set of calculators.

The Correlation and Slope Comparator contains the following calculation tabs:

**Dependent overlapping correlations**: Tests for the significance of the difference between two correlations in the situation where the two correlations share a common variable (e.g., r_{1,2} and r_{1,3}) and both correlations were computed on the same cases.

**Dependent non-overlapping correlations**: Tests for the significance of the difference between two correlations in the situation where there is no variable in common between the two correlations (e.g., r_{1,2} and r_{3,4}), and both correlations were computed on the same cases.

**Independent samples correlations**: Tests for the significance of the difference between two correlations in the situation where each correlation was computed on a different sample of cases. [Note: The example invariably used in this case is the correlation between the same two variables in different samples (i.e., complete overlap). There potentially are hidden and as yet unexplored complications for comparisons involving 50% and zero overlap between the variables correlated in separate samples.]

**Difference from hypothesized correlation**: Tests for the significance of the difference between an observed correlation and the hypothesized value of the correlation. The hypothesized value may be zero or any other value between -1.0 and +1.0.

**Difference between slopes**: Tests for the significance of the difference between two raw-score slopes (also commonly referred to as b weights or raw-score regression coefficients) from a regression equation. The slopes may reference the same x and y variables in the same or different samples, or different x variables for regression equations computed for different samples. It is difficult to imagine a need to compare slopes for regressions of different y variables in either the same or different samples, but there are no indications in the literature that the computation of the pooled standard error of the difference in such slopes would be any different than in the more conventional situations.

**Disattenuation of correlation**: This computes three different ways of correcting a correlation for unreliability in the variables being correlated: correcting only for unreliability in the x variable, correcting only for unreliability in the y variable, and correcting for unreliability in both of the variables.
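Two of the simpler tabs are easy to verify by hand. In Python, using the standard textbook formulas (these function names and interfaces are my own, for illustration; they are not the worksheet's code):

```python
from math import atanh, sqrt

def corr_vs_hypothesis(r, rho0, n):
    """Large-sample Z test of an observed correlation r against a
    hypothesized value rho0, via Fisher z transformations (atanh)."""
    return (atanh(r) - atanh(rho0)) * sqrt(n - 3)

def disattenuate(r_xy, rel_x=1.0, rel_y=1.0):
    """Correct r_xy for unreliability. Pass rel_x and/or rel_y (the
    reliability coefficients) to correct for either variable alone
    (leaving the other at 1.0) or for both at once."""
    return r_xy / sqrt(rel_x * rel_y)
```

For example, `corr_vs_hypothesis(0.5, 0.0, 103)` reproduces the familiar Fisher z test of a correlation against zero, and `disattenuate(0.42, rel_x=0.7, rel_y=0.84)` corrects for unreliability in both variables.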

The Correlation and Slope Comparator can be downloaded by clicking this link:

The Correlation and Slope Comparator (Excel worksheet calculators)

I encounter this view from time to time, usually indirectly through students whose professors are compelling them to use R or SAS. This propensity to derogate statistical systems that have been made more accessible to a wide range of potential users, and which are designed to expedite the conduct of lengthy analyses, seems to stem from at least three different motives. One source of such behavior is a weak ego. It is unfortunately not rare to find people who can feel good about themselves only by putting other people down. Since far fewer people make the effort to learn R, or have the resources to acquire access to SAS, a noticeable proportion of the people attracted to these software systems are motivated by the desire to gain some bragging rights over others.

Another source of this “put down” behavior is crass commercialism. If you can convince the public that your relatively rare skill can solve their problems better than a skill that is held more widely or that even could be acquired by the end-user him-/herself, you can charge more for your skill.

Least reprehensible, but no less damaging, is the belief that these harder to acquire software packages (where the barrier to acquisition is either the learning curve or the cost of the package) actually do everything better than the more widely accessible packages.

We can dismiss those driven by the ego-inflation motive as just your garden variety jerk who will always be with us. In my 23 years as an academic I encountered more of them ensconced in university faculties than I care to even think about. More importantly, my purpose here is not to try to change the views and behavior of any of these people who seek to put down more accessible statistical packages. Instead, my purpose is to give the rest of you an accurate understanding of the relative merits of the most widely used statistical packages.

The fact of the matter is that no statistical packages are “world class” in regard to all of the criteria by which such packages can be judged, and practically all of the packages are “world class” in some respects. Let’s consider what these criteria are in relation to widely used, all-purpose statistical software packages. Here is my list (feel free to write in to add more):

- Ease of use

- Learning curve

- Depth of menu-driven procedures

- Range, quality, and ease of use of statistical procedures offered

- Modifiability of analytical output specifications

- Ease of transforming table output to formatting conventions (e.g., APA)

- Range of graphical output offered

- Speed of handling large data sets

- Ease and flexibility of data importation

- Ease of results exportation

- Thoroughness and interpretability of results output

- Ease and flexibility of data set manipulation

- Pricing for individuals

- Thoroughness and informativeness of documentation

I have written a review of the top 5 statistical software systems (i.e., SPSS, SAS, R, Minitab, and Stata) that evaluates these systems against each of the above criteria. It is available as an article in the Articles section of this website (A Review of the Top Five Statistical Software Systems).

Has knowledge become something that is disseminated on the basis of wealth? It certainly seems so from my vantage point. This basis for access to the repositories of knowledge seems to undermine the democratic principles of equal opportunity and the free flow of ideas and information. Moreover, much of the research reported in these journal articles was conducted with government funding. How is it that such research ends up being a commodity that is apportioned on the basis of having the ability to pay for it? This is a good example of where the interests of free enterprise and capitalism run counter to the public interest.

I would be very interested in hearing your views on what can be done to expand public access to scientific and professional journals, and to documents in repositories. There are a few examples of freely accessible repositories of journal articles, such as ERIC and JSTOR. However, the coverage of these repositories is spotty and limited, undoubtedly due to the threat they pose to the profits of the pay-to-read services. Could public funding be provided to expand the coverage of these free services? Another possibility might be to enable the Library of Congress, which has access to all publications, to offer free online access to its entire collection. This could probably be done for the cost of, say, one aircraft carrier, and might do more to advance the cause of peace than an armada of such ships.

I am structuring this blog so that you can leave a comment or question in response to any of the blog entries I make. If you would like me to address a question you have, or to start up a blog thread on any particular topic that I haven't addressed, please send me an email with details to inquiry@prostatservices.com. I will attempt to prepare a blog entry that responds to your request, or at least post your interest or need for information so that others can reply.

Don't worry that your request for information might seem too elementary or simple -- we all started learning these subjects at ground zero, and those of us who have some knowledge are standing on the shoulders of others who took the time to help us understand what probably seemed to them to be simple things. Too many of the teachers and professionals in this field seem to want to make its concepts and methods sound difficult, apparently in an effort to elevate their own stature. I've also run into a considerable number of experts who view their own understanding of a statistical concept or method as a competitive edge, not to be shared with others. Quite candidly, I hold people in both of these categories in utter contempt. Knowledge is to be shared, and people seeking help with statistics are much better off being left with understanding rather than in awe.

“If a significant interaction is obtained, it means that a different relationship is seen for different levels of an independent variable….One implication of obtaining a significant interaction is that a statement of each main effect will not fully capture the results of the study. …The general rule is that when an interaction effect is present, the information it supplies is more enriched—more complete—than the information contained in the outcome of the main effects of those variables composing it. Sometimes … a main effect is moderately representative of the results (although it is still not completely adequate to fully explicate the data). Other times … the main effects paint a nonrepresentative picture of the study's outcome.”

A second important point that Meyers et al. make is that post hoc analyses of the “simple effects” encompassed by an interaction should proceed by pairwise comparison of the levels of each factor within the levels of the other factor(s) in the interaction. For example, if A is a 2-level factor and B is a 3-level factor, pairwise comparisons should be made between the 3 pairs of B levels within each of the two A levels. The Type I error levels of these comparisons should be corrected for family-wise error (e.g., using Bonferroni, LSD, or other procedure). Meyers et al. also recommend that the comparisons be done both ways (i.e., between B levels within each A level, and between A levels within each B level), although they note that others (e.g., Keppel, 1991) suggest that only one of these be chosen on a priori conceptual grounds. In either case, this is quite a different approach from the one other widely read authors recommend. For example, Howell (2009, pp. 424-426) recommends analyzing whether the overall differences are significant between each level of a factor within each level of the other factor(s) involved in an interaction. This is adequate when there are only two levels of the factor being compared within each of the other factor’s levels. However, when there are 3 or more levels being compared (e.g., 3 levels of B within each level of A), Howell’s overall difference approach does not tell us which specific pairs of levels differ significantly within each level of the other factor. In order to fully understand the nature of the interaction, we must use the pairwise comparison approach.

This leads to a final point that this message should cover: How does one obtain pairwise comparisons of interaction category means? I will address this question in relation to the use of SPSS. The menu options in SPSS do not allow for post hoc analyses of the pairwise combinations of interacting factors in factorial ANOVA analyses conducted using the GLM methods. In order to obtain the desired output we need to add a statement to the syntax of the GLM univariate ANOVA command. If we paste the syntax from a simple 2-way between-subjects ANOVA with post hoc tests specified (which the SPSS menu system limits to only the main effects) and the minimum of other options selected, we get the following (assume A = 2 levels, B = 3 levels):

UNIANOVA C BY A B
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /POSTHOC=B(TUKEY)
  /CRITERIA=ALPHA(.05)
  /DESIGN=A B A*B.

In order to obtain the results of the post hoc pairwise comparisons of the interaction categories (i.e., 3 comparisons within each of the 2 levels of A), we need to add the following line to the above syntax:

/EMMEANS=TABLES(A*B) compare (B) adj (BONFERRONI)

This would result in the following syntax statement:

UNIANOVA C BY A B
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /POSTHOC=B(TUKEY)
  /EMMEANS=TABLES(A*B) compare (B) adj (BONFERRONI)
  /CRITERIA=ALPHA(.05)
  /DESIGN=A B A*B.

The resulting output for the post hoc analyses of the interaction categories would look like this (with simulated data):

| A | (I) B | (J) B | Mean Difference (I-J) | Std. Error | Sig. |
|---|-------|-------|-----------------------|------------|------|
| A1 | B1 | B2 | .030 | .039 | 1.000 |
| | | B3 | .170* | .039 | <.001 |
| | B2 | B3 | .140* | .039 | .001 |
| A2 | B1 | B2 | -.470* | .039 | <.001 |
| | | B3 | .065 | .039 | .291 |
| | B2 | B3 | .535* | .039 | <.001 |

In the above example of output, I have modified the standard SPSS output to eliminate redundant categories. I have not explored the command syntax required to obtain comparable output from SAS, Stata, or Minitab, but I strongly suspect each has provisions for producing these comparisons.

**References**

Howell, D. C. (2009). *Statistical methods for psychology* (7th ed.). Belmont, CA: Cengage Wadsworth.

Meyers, L. S., Gamst, G., & Guarino, A. (2006). *Applied multivariate research: Design and interpretation*. Thousand Oaks, CA: Sage.
