Email news@statisticool.com to sign up to receive news and updates
Digit Preference Heaping
2/7/16
Here is some R code I wrote to handle something called digit preference heaping. I was following some article in 2009 when I wrote this. Of course, Murphy's Law applies, and I cannot find that article.
In any case, digit preference is the phenomenon of people reporting numbers with a certain digit. The classic example is people reporting their weight in numbers that end in 0 or 5. You weigh 173lbs? No problem, just report 170 or 175 because that "seems right".
This is, of course, a major theoretical issue. I say theoretical, because in practice, often surveys ask for, and accept, estimates from people and companies. Additionally, many different types of adjustments are often carried out on the data. Nevertheless, here is a simple way to address digit preference heaping, that is, when some numbers are "heaped" in a pile on numbers with a certain digit.
In this toy example, people will report tend to report their numbers using numbers that end with a 5.

I generated 100,000 uniform random numbers between 170 and 200. This is the "original data". Then I randomly changed 70% of the numbers ending in 4 to make them end in 5. This is the "extra 5 data". Then I used the statistical technique in the code to spread out the "extra 5 data" to make it the "adjusted data".
Here is my R script with comments:
#sample size
n<-100000
originaldata<-round(runif(n,170,200))
layout(1:3)
hist(originaldata)
minorig<-min(originaldata)
maxorig<-max(originaldata)
extra5data<-originaldata
#add extra numbers ending in 5
for (i in 1:n)
{
#change 70% of the numbers ending in 4 to ending in 5
if (runif(1)>.3 && substring(extra5data[i],3)=="4") {extra5data[i]<-extra5data[i]+1}
}
hist(extra5data,xlim=c(minorig,maxorig))
#spread out these extra, probably false but maybe true, numbers ending in 5
#on average keep them around that number however, since they may be valid
adjusteddata<-extra5data
#the larger this is, the more they get spread out to either direction
parameter<-.1
for (i in 1:n)
{
#if the original measurements are thought to be too small or too large, make the
#low and up endpoints assymetric accordingly
p<-runif(1,1-parameter,1+parameter)
#spread out 40% of the numbers ending in 5
if (runif(1)>.6 && substring(adjusteddata[i],3)=="5")
{adjusteddata[i]<-round(adjusteddata[i]*p)}
}
#if adjusted obs is less than minimum or greater than maximum from original series
#adjust appropriately
for (i in 1:n)
{
if (adjusteddata[i]{adjusteddata[i]<-minorig}
}
for (i in 1:n)
{
if (adjusteddata[i]>maxorig)
{adjusteddata[i]<-maxorig}
}
hist(adjusteddata,xlim=c(minorig,maxorig))
#adjustment preserves mean and adds to variance
mean(originaldata);mean(extra5data);mean(adjusteddata)
sd(originaldata);sd(extra5data);sd(adjusteddata)
Note that you don't want to spread out all the "extra 5 data" because obviously some of the data ending in 5 is valid. Also, note that the means of the data before and after the adjustment are generally the same. However, the more you spread out the "extra 5 data", the larger the variance will be of the "adjusted data". As mentioned, this is a simple way to think about adjusting data when it is suspected that digit preference heaping is present. Thanks for reading.
Please anonymously VOTE on the content you have just read:
Like:Dislike:
If you enjoyed any of my content, please consider supporting it in a variety of ways:
- PLEASE take a moment to check out two GoFundMe fundraisers I set up. The idea is to make it possible for me to pursue my passions. My goal is to be able to create free randomized educational worksheets and create poetry on a full-time basis. THANK YOU for your support!
- Email news@statisticool.com to sign up to receive news and updates
- Donate any amount via PayPal
- Take my Five Poem Challenge
- Subscribe to my YouTube channel
- Visit my Amazon author page
- Buy what you need on Amazon using my affiliate link
- Follow me on Twitter here
- Buy ad space on Statisticool.com
AFFILIATE LINK DISCLOSURE: Some links included on this page may be affiliate links. If you purchase a product or service with the affiliate link provided I may receive a small commission (at no additional charge to you). Thank you for the support!