10.29 Rescale Data using Rank

20240814

A rank is assigned to each observation according to the position of its value in the sorted order of the variable. If all the values are distinct and there are no NAs then the ranks are simply the integers from 1 to the number of observations.
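As a minimal sketch of the simplest case, the following ranks three distinct values with no NAs. The vector x is illustrative data, not taken from any dataset used in this book.

```r
# Three distinct values with no NAs (illustrative data).
x <- c(30, 10, 20)

# Each value receives its position in the sorted order.
r <- rank(x)
print(r)        # 3 1 2

# With no ties and no NAs the ranks are a permutation of 1..n.
print(sort(r))  # 1 2 3
```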

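Equal values (ties) need to be resolved in some way. The R function base::rank() provides several approaches through its ties.method= argument. The default, 'average', assigns to every observation sharing a common value the average of the ranks they would otherwise occupy. This keeps the sum of the ranks the same as if there were no ties. Other approaches can, for example, ensure a unique rank for every observation even where the original values are the same ('first', 'last', 'random'), or assign all tied observations the lowest or highest of their ranks ('min', 'max'). NAs are handled specially through the na.last= argument. By default (na.last='keep') an NA receives the rank NA. With na.last=TRUE the NAs are placed at the end of the ranking, each receiving its own rank, incrementing in order of occurrence.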

Some examples:

> rank(c(10,10))
[1] 1.5 1.5
> rank(c(10,10), ties.method='first')
[1] 1 2
> rank(c(10,10), ties.method='last')
[1] 2 1
> rank(c(10,10), ties.method='min')
[1] 1 1
> rank(c(10,10), ties.method='max')
[1] 2 2
> rank(c(10,10), ties.method='average')
[1] 1.5 1.5
> rank(c(10,10), ties.method='random')
[1] 2 1
> rank(c(10,10), ties.method='random')
[1] 1 2
> rank(c(10,10), ties.method='random')
[1] 2 1
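The treatment of NAs is controlled by the na.last= argument of base::rank(). The following sketch, using an illustrative vector x, compares its four documented settings. The variable names are chosen here only for illustration.

```r
# Illustrative data containing two missing values.
x <- c(20, NA, 10, NA)

# Default na.last = "keep": NAs keep the rank NA.
r_keep <- rank(x)
print(r_keep)   # 2 NA 1 NA

# na.last = TRUE: NAs ranked last, in order of occurrence.
r_last <- rank(x, na.last = TRUE)
print(r_last)   # 2 3 1 4

# na.last = FALSE: NAs ranked first, in order of occurrence.
r_first <- rank(x, na.last = FALSE)
print(r_first)  # 4 1 3 2

# na.last = NA: NAs are dropped, so the result is shorter than x.
r_drop <- rank(x, na.last = NA)
print(r_drop)   # 2 1
```

Keeping the default is usually the safer choice when the ranks are stored back alongside the original variable, since the result then has the same length as the input and missing values remain missing.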


Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0