Chapter 2. Correlation
Displaying the Relationship Between Two Variables
Contingency table
To inspect the relationship between two categorical/qualitative variables, we can construct a contingency table.
A contingency table is very similar to a frequency table, the major difference being that whereas a frequency table only concerns a single variable, a contingency table concerns two variables.
A contingency table displays the distribution of one variable in the rows of the table and the distribution of a second variable in the columns of the table.
Below is a contingency table displaying the relationship between gender and pet preference:
\[\begin{array}{l|c|c|c}
 & \text{Prefers dogs} & \text{Prefers cats} & \text{Total}\\
\hline
\text{Male} & 25 & 15 & 40\\
\text{Female} & 20 & 40 & 60\\
\hline
\text{Total} & 45 & 55 & 100\\
\end{array}\]
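In practice, a contingency table is built by counting how often each combination of categories occurs. The minimal Python sketch below assumes the raw data are stored as one (gender, preference) pair per respondent; the list of responses is synthesized purely to reproduce the counts in the table above, and `contingency_table` is a hypothetical helper, not a library function.

```python
from collections import Counter

# Hypothetical raw data: one (gender, preference) pair per respondent.
# Synthesized only to reproduce the counts in the table above.
responses = (
    [("Male", "Prefers dogs")] * 25
    + [("Male", "Prefers cats")] * 15
    + [("Female", "Prefers dogs")] * 20
    + [("Female", "Prefers cats")] * 40
)

def contingency_table(pairs):
    """Cross-tabulate (row, column) pairs into {row: {column: count}} with totals."""
    counts = Counter(pairs)
    # Keep categories in order of first appearance
    rows = list(dict.fromkeys(r for r, _ in pairs))
    cols = list(dict.fromkeys(c for _, c in pairs))
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    # Row totals
    for r in rows:
        table[r]["Total"] = sum(table[r][c] for c in cols)
    # Column totals, including the grand total in the "Total" column
    table["Total"] = {c: sum(table[r][c] for r in rows) for c in cols + ["Total"]}
    return table

table = contingency_table(responses)
for row, cells in table.items():
    print(row, cells)
# Male {'Prefers dogs': 25, 'Prefers cats': 15, 'Total': 40}
# Female {'Prefers dogs': 20, 'Prefers cats': 40, 'Total': 60}
# Total {'Prefers dogs': 45, 'Prefers cats': 55, 'Total': 100}
```

Libraries such as pandas offer a ready-made cross-tabulation function; the plain `Counter` version is used here only to make the counting step explicit.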
Because the rows and columns contain different numbers of cases, the relationship between the two variables is not immediately obvious. To get a better understanding of the relationship, we can convert the absolute frequencies in the table into either column or row percentages:
- To calculate the column percentage of a cell, we divide the absolute frequency in the cell by the corresponding column total and multiply by 100
- To calculate the row percentage of a cell, we divide the absolute frequency in the cell by the corresponding row total and multiply by 100
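The two conversion rules can be sketched in Python as follows. This is a minimal illustration assuming the counts are stored in a nested dictionary (rows: gender, columns: pet preference) with the totals left out; `row_percentages` and `column_percentages` are hypothetical helper names.

```python
# Counts from the contingency table above, without the totals
counts = {
    "Male": {"Prefers dogs": 25, "Prefers cats": 15},
    "Female": {"Prefers dogs": 20, "Prefers cats": 40},
}

def row_percentages(table):
    """Divide each cell by its row total and multiply by 100."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * n / total for col, n in cells.items()}
    return result

def column_percentages(table):
    """Divide each cell by its column total and multiply by 100."""
    cols = list(next(iter(table.values())).keys())
    col_totals = {c: sum(cells[c] for cells in table.values()) for c in cols}
    return {
        row: {c: 100 * cells[c] / col_totals[c] for c in cols}
        for row, cells in table.items()
    }

for row, cells in row_percentages(counts).items():
    print(row, {c: f"{p:.1f}%" for c, p in cells.items()})
# Male {'Prefers dogs': '62.5%', 'Prefers cats': '37.5%'}
# Female {'Prefers dogs': '33.3%', 'Prefers cats': '66.7%'}
```

Row percentages sum to 100% within each row, so they compare the groups in the rows (here, men and women) on an equal footing; column percentages do the same for the groups in the columns.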
The table below is the result of converting the absolute frequencies into row percentages:
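\[\begin{array}{l|c|c|c}
 & \text{Prefers dogs} & \text{Prefers cats} & \text{Total}\\
\hline
\text{Male} & 62.5\% & 37.5\% & 100\%\\
\text{Female} & 33.3\% & 66.7\% & 100\%\\
\hline
\text{Total} & 45\% & 55\% & 100\%\\
\end{array}\]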
These data suggest that there is a relationship between gender and pet preference.
Specifically, men tend to prefer dogs over cats (62.5% vs 37.5%), whereas women tend to prefer cats over dogs (66.7% vs 33.3%).
#\phantom{0}#
Scatterplot
To visually inspect the relationship between two numerical/quantitative variables, we can construct a scatterplot.
A scatterplot is an #X#-#Y# graph, with one variable plotted along each axis. Each dot represents the pair of scores belonging to a single individual. For example, the table below lists the time studied and the grade obtained for each of 12 students.
\[\begin{array}{c|c|c}
\text{Student}&\text{Time studied}&\text{Grade}\\
\hline
1&5&5.0\\
2&5&6.2\\
3&8&7.1\\
4&9&6.3\\
5&10&6.2\\
6&11&8.1\\
7&11&6.3\\
8&12&7.0\\
9&14&7.5\\
10&16&7.4\\
11&17&9.0\\
12&17&7.6\\
\end{array}\]
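The idea of plotting each pair of scores as a dot can be sketched without any plotting library. The minimal Python example below, using a hypothetical `text_scatterplot` helper, places one `*` per student on a character grid, with time studied on the horizontal axis and grade on the vertical axis; a real analysis would normally use a plotting package instead.

```python
# Data from the table above: time studied and grade per student
time_studied = [5, 5, 8, 9, 10, 11, 11, 12, 14, 16, 17, 17]
grades = [5.0, 6.2, 7.1, 6.3, 6.2, 8.1, 6.3, 7.0, 7.5, 7.4, 9.0, 7.6]

def text_scatterplot(xs, ys, width=40, height=12):
    """Return a character-grid scatterplot: x runs left to right, y bottom to top."""
    x_min, y_min = min(xs), min(ys)
    # Avoid division by zero if all values on an axis are equal
    x_span = (max(xs) - x_min) or 1
    y_span = (max(ys) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each score to a column/row index on the grid
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is printed at the top
    lines = ["|" + "".join(r) for r in grid]
    lines.append("+" + "-" * width)  # horizontal axis
    return "\n".join(lines)

print(text_scatterplot(time_studied, grades))
```

Because scores are rounded to grid cells, two nearby points could share a cell at low resolution; increasing `width` and `height` reduces that.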
#\phantom{0}#
To get an impression of the relationship between the two variables, we can draw a 'cloud' around the dots in the scatterplot.
In this case, the cloud has the shape of an ellipse pointing from the bottom-left to the top-right, indicating a positive linear relationship.
This suggests that students who study longer also tend to get higher grades.
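As a rough numerical check of the direction suggested by the cloud, one can compare the average grade of students who studied longer than average with that of students who did not. The Python sketch below does this for the data in the table above; it is only a crude sanity check of the direction, not a measure of how strong or how linear the relationship is.

```python
from statistics import mean

# Data from the table above
time_studied = [5, 5, 8, 9, 10, 11, 11, 12, 14, 16, 17, 17]
grades = [5.0, 6.2, 7.1, 6.3, 6.2, 8.1, 6.3, 7.0, 7.5, 7.4, 9.0, 7.6]

mean_time = mean(time_studied)  # 11.25 for these data
above = [g for t, g in zip(time_studied, grades) if t > mean_time]
below = [g for t, g in zip(time_studied, grades) if t <= mean_time]

print(f"Studied more than average: mean grade {mean(above):.2f}")  # 7.70
print(f"Studied average or less:   mean grade {mean(below):.2f}")  # 6.46
```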