Package 'correlatio'

Title: Visualize Details Behind Pearson's Correlation Coefficient
Description: Helps visualize what Pearson's correlation coefficient summarizes. It visualizes the coefficient's main constituent, namely the distances of the individual values from their respective mean. The visualization thereby shows what the etymology of the word correlation contains: in pairwise combination, bringing back (see the package vignette for more details). I hope that the 'correlatio' package helps some people understand and critically evaluate what Pearson's correlation coefficient summarizes in a single number, i.e., to what degree and why Pearson's correlation coefficient may (or may not) be warranted as a measure of association.
Authors: Marcel Miché [aut, cre]
Maintainer: Marcel Miché <[email protected]>
License: MIT + file LICENSE
Version: 0.1.0
Built: 2025-02-19 07:16:45 UTC
Source: https://github.com/mmiche/correlatio


Documentation of this correlatio package.

Description

This R package helps visualize what Pearson's correlation coefficient summarizes.

All R packages used in this correlatio package, or used to develop it, are listed below. Thanks to the many R package developers!

References

Wickham H (2016). “Programming with ggplot2.” ggplot2: Elegant Graphics for Data Analysis, 241–253.

Müller K, Wickham H (2023). tibble: Simple Data Frames. R package version 3.2.1, https://CRAN.R-project.org/package=tibble.

R Core Team (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

Wickham H, Hester J, Chang W, Bryan J (2021). devtools: Tools to Make Developing R Packages Easier. R package version 2.4.3, https://CRAN.R-project.org/package=devtools.

Wickham H, Bryan J (2021). usethis: Automate Package and Project Setup. R package version 2.0.1, https://CRAN.R-project.org/package=usethis.

Wickham H, Danenberg P, Csárdi G, Eugster M (2021). roxygen2: In-Line Documentation for R. R package version 7.1.2, https://CRAN.R-project.org/package=roxygen2.

Boshnakov GN (2022). “Rdpack: Update and Manipulate Rd Documentation Objects.” doi:10.5281/zenodo.3925612, R package version 2.3.


Apply and visualize Pearson's product-moment correlation.

Description

Compute all components of Pearson's correlation coefficient and visualize the most important part of what the coefficient summarizes. This most important part is the difference between each value of a variable and that variable's mean. While visualizing this part may appear superfluous to some people, other people may benefit from it. See the vignette of this 'correlatio' package for further explanations.

Usage

corrio(data = NULL, visualize = TRUE)

Arguments

data

A data.frame with two columns, which shall be correlated by Pearson's product-moment method.

visualize

A single logical value (default: TRUE), which determines whether the data shall be visualized in two plots.

Value

A list with four elements: a data.frame (name: dat), a list (name: details), and two graphs (names: plot1 and plot2). dat contains these five columns:

  1. x Values of the first variable (= x).

  2. y Values of the second variable (= y).

  3. x-mean(x) Difference between x and the mean of x.

  4. y-mean(y) Difference between y and the mean of y.

  5. covVec Product of x-mean(x) and y-mean(y).

details is a list with 12 objects, each of which contains an explanation as attribute:

  1. Mean of variable 1 (variable 1 = x).

  2. Mean of variable 2 (variable 2 = y).

  3. Sum of all negative products (negSum): (x-mean(x)) * (y-mean(y)).

  4. Sum of all positive products (posSum): (x-mean(x)) * (y-mean(y)).

  5. Numerator of covariance formula: Sum of negSum and posSum.

  6. Denominator of covariance formula: n - 1.

  7. Covariance: numeratorCov/denominatorCov.

  8. Standard deviation of variable 1 (i.e., x): R command sd().

  9. Standard deviation of variable 2 (i.e., y): R command sd().

  10. Product of standard deviations (prodSD) of variables 1 and 2 (i.e., x and y).

  11. Correlation: Covariance/prodSD.

  12. Percentages of value pairs by the direction of their deviations from the respective mean: s = same direction, c = contrary directions, n = no direction.

plot1 and plot2 are two ways of visualizing the connection between the individual values and their respective mean value.
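
The components listed above can be traced by hand in base R. The following is a minimal sketch, not the package's own code: the toy vectors x and y are made up for illustration, the object names mirror those in the lists above, and classifying a pair as "n" when its product is exactly zero is an assumption.

```r
# Toy data (hypothetical, chosen so that both negative and positive products occur).
x <- c(1, 2, 3, 4, 5)
y <- c(1, 4, 2, 3, 5)

# Columns 3 to 5 of dat: deviations from the mean and their products.
dx <- x - mean(x)
dy <- y - mean(y)
covVec <- dx * dy

# Items 3 to 7 of details: covariance from negative and positive products.
negSum <- sum(covVec[covVec < 0])
posSum <- sum(covVec[covVec > 0])
numeratorCov <- negSum + posSum
denominatorCov <- length(x) - 1
covariance <- numeratorCov / denominatorCov

# Items 8 to 11 of details: standardize the covariance.
prodSD <- stats::sd(x) * stats::sd(y)
r <- covariance / prodSD

# Item 12 of details (assumption: zero product means "no direction").
directions <- ifelse(covVec > 0, "s", ifelse(covVec < 0, "c", "n"))
pct <- 100 * prop.table(table(factor(directions, levels = c("s", "c", "n"))))

# The hand-computed value should agree with R's built-in function.
stopifnot(isTRUE(all.equal(r, stats::cor(x, y))))
```

Splitting the numerator into negSum and posSum shows how much the pairs that deviate in contrary directions pull the coefficient down against the pairs that deviate in the same direction.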

Author(s)

Marcel Miché

References

Curran-Everett D (2010). “Explorations in statistics: correlation.” Advances in physiology education, 34(4), 186–191.

Wickham H (2016). “Programming with ggplot2.” ggplot2: Elegant Graphics for Data Analysis, 241–253.

Examples

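# Simulate one data.frame with 100 observations and a theoretical correlation of .6.
simData <- simcor(obs = 100, rhos = .6)
# Compute the components of the correlation and draw both plots.
corrio(data = simData[[1]], visualize = TRUE)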

Linearly transform one scale into another scale.

Description

Transform the values of a variable to a new scale by using a linear model. Additionally, select the number of decimal digits to which the transformed values are rounded.

Usage

lineartransform(futureRange = c(1, 5), vec = NULL, digits = NULL)

Arguments

futureRange

Vector that shows the range of the new scale, e.g., c(1, 5).

vec

A vector which contains the values that shall be transformed to the new scale.

digits

A single integer that sets the number of decimal digits to which the transformed values are rounded.

Value

A vector with the linearly transformed values, rounded to the number of digits set in the function argument 'digits'.
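
One way such a transformation can be done with lm is sketched below. This is an illustration under stated assumptions, not the package's implementation: the function name rescale_sketch is hypothetical, and it assumes that the minimum and maximum of vec are mapped onto the two values of futureRange.

```r
# Hypothetical helper, not part of the correlatio package.
rescale_sketch <- function(vec, futureRange = c(1, 5), digits = 0) {
  # Two anchor points: old minimum -> new minimum, old maximum -> new maximum.
  anchors <- data.frame(old = range(vec), new = futureRange)
  # A straight line through two points fits them exactly.
  fit <- stats::lm(new ~ old, data = anchors)
  # Apply the fitted line to every original value, then round.
  newVals <- stats::predict(fit, newdata = data.frame(old = vec))
  round(unname(newVals), digits)
}

rescale_sketch(vec = c(-2, 0, 1, 2), futureRange = c(1, 5), digits = 0)
```

The sketch needs at least two distinct values in vec, because otherwise the slope of the line is undefined.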

Author(s)

Marcel Miché

References

lm; linear model command from the stats package

Examples

someValues <- stats::rnorm(n=10)
# Linearly transform to values between 1 and 5, rounded to zero digits.
lineartransform(futureRange = c(1, 5), vec = someValues, digits = 0)

Simulate two correlated variables.

Description

Simulate pairs of variables with a predefined correlation between them.

Usage

simcor(obs = 100, rhos = c(-0.5, 0.5))

Arguments

obs

A single integer that determines the number of simulated observations in each variable of a pair.

rhos

A vector with at least one value that shows the theoretical correlation between the simulated pair of variables.

Value

A list with as many data.frames (each consisting of two columns) as there are values passed to the function argument 'rhos'.
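
A common way to simulate two variables with a known theoretical correlation is sketched below. It is not necessarily how simcor works internally: the function name simcor_sketch and the column names x and y are hypothetical, and the construction assumes standard normal variables.

```r
# Hypothetical helper, not part of the correlatio package.
simcor_sketch <- function(obs = 100, rhos = c(-0.5, 0.5)) {
  lapply(rhos, function(rho) {
    x <- stats::rnorm(obs)   # first variable
    e <- stats::rnorm(obs)   # independent noise
    # Mix x and noise so that the theoretical correlation of x and y equals rho.
    y <- rho * x + sqrt(1 - rho^2) * e
    data.frame(x = x, y = y)
  })
}

set.seed(1)
sims <- simcor_sketch(obs = 2000, rhos = c(-0.8, 0.7))
# Sample correlations scatter around the theoretical values.
sapply(sims, function(d) stats::cor(d$x, d$y))
```

The sample correlation of each data.frame only approximates the requested value, and the approximation tends to improve as obs grows.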

Author(s)

Marcel Miché

References

pdf, see headline: Simulating data with known correlations

Examples

# Simulate a list with two data.frames. The first one contains variables that are correlated
# around -.8, the second one around .7. Both data.frames contain 200 observations.
simcor(obs = 200, rhos = c(-.8, .7))