Florent Destremau

Generating random survey ratings in PHP.

At Windoo, we make a SaaS for HR people, with surveys, pulse questions and such… When we try to generate fixtures, or even demo accounts, it’s important to have realistic data. For instance, if all your ratings are at 5/5, it’s not going to help you simulate live data.

The constraints

A survey has multiple rating questions, such as “Do you think the last event was successful according to your needs?” or “Would you recommend the same event to another person from your company?”. What we want is simple: when generating random replies, specific questions should end up with an average rating close to a target we choose.

We would also like a roughly “normal” distribution of replies. The center should be around the target rating we want to have in the end, and the standard deviation sets how widely the replies spread around it.

The solution

Generating targeted normal values

A normally distributed value with a given mean and standard deviation can be generated in PHP with a simple function:

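function nrand(float $mean, float $sd): float
{
    // $x must be strictly positive: log(0) is -INF and would make the result infinite.
    $x = mt_rand(1, mt_getrandmax()) / mt_getrandmax();
    $y = mt_rand() / mt_getrandmax();

    // Box-Muller: turn two uniform values into one standard normal value,
    // then scale it by $sd and shift it by $mean.
    return sqrt(-2 * log($x)) * cos(2 * pi() * $y) * $sd + $mean;
}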

This comes from a Stack Overflow answer and uses the Box-Muller transform.
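To convince yourself that the function behaves as expected, you can draw a large number of values and compare the empirical mean and standard deviation with the ones you asked for. This is only a quick sanity-check sketch: the seed, the sample size and the 3.5 / 1.0 parameters are arbitrary choices for illustration, and nrand is repeated (with $x kept above 0) so that the snippet runs on its own.

```php
<?php
// Box-Muller sampler, repeated here so the snippet is self-contained.
function nrand(float $mean, float $sd): float
{
    $x = mt_rand(1, mt_getrandmax()) / mt_getrandmax(); // in (0, 1], so log($x) is finite
    $y = mt_rand() / mt_getrandmax();

    return sqrt(-2 * log($x)) * cos(2 * pi() * $y) * $sd + $mean;
}

mt_srand(42); // fixed seed, only to make the check repeatable

$n = 100000;
$samples = [];
for ($i = 0; $i < $n; $i++) {
    $samples[] = nrand(3.5, 1.0);
}

// Empirical mean and (sample) standard deviation.
$mean = array_sum($samples) / $n;
$variance = 0.0;
foreach ($samples as $value) {
    $variance += ($value - $mean) ** 2;
}
$sd = sqrt($variance / ($n - 1));

printf("mean = %.3f, sd = %.3f\n", $mean, $sd);
```

Both printed values should land close to 3.5 and 1.0; they will never match exactly, since this is a random sample.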

With this in place, a random rating function looks like this:

function randomRating(float $target): int
{
    // round() returns a float, so cast it explicitly to honor the int return type.
    $rating = (int) round(nrand($target, 1));

    // Clamp to the 1 to 5 rating scale.
    return max(1, min(5, $rating));
}

This gives you results approximately centered around the $target value. The generated ratings will not necessarily average exactly to the target, but they will look “realistic”. You can of course play around with the standard deviation (1 here) to make the distribution sharper or flatter around the average.
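To see what playing with the standard deviation does, here is a small sketch that counts how many ratings fall on each value for a few spreads. Note that the optional $sd parameter on randomRating and the ratingHistogram helper are additions made for this illustration, not part of our fixture code, and the seed and sample size are arbitrary.

```php
<?php
function nrand(float $mean, float $sd): float
{
    $x = mt_rand(1, mt_getrandmax()) / mt_getrandmax(); // in (0, 1], so log($x) is finite
    $y = mt_rand() / mt_getrandmax();

    return sqrt(-2 * log($x)) * cos(2 * pi() * $y) * $sd + $mean;
}

// Same idea as the article's randomRating, with a hypothetical $sd parameter.
function randomRating(float $target, float $sd = 1.0): int
{
    return (int) max(1, min(5, round(nrand($target, $sd))));
}

// Hypothetical helper: counts how many of $n generated ratings fall on each value 1..5.
function ratingHistogram(float $target, float $sd, int $n): array
{
    $counts = array_fill(1, 5, 0);
    for ($i = 0; $i < $n; $i++) {
        $counts[randomRating($target, $sd)]++;
    }

    return $counts;
}

mt_srand(7); // fixed seed, only to make the run repeatable

foreach ([0.5, 1.0, 2.0] as $sd) {
    $counts = ratingHistogram(3.0, $sd, 10000);
    printf("sd = %.1f: %s\n", $sd, json_encode($counts));
}
```

A small standard deviation piles most replies onto the target value, while a large one spreads them across the whole scale and pushes a noticeable share of them onto 1 and 5.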

In practice, the closer the target gets to 1 or 5, the larger the gap between the target and the resulting average. Ratings above 5 are reduced to 5 (and ratings below 1 are raised to 1), so the values that would have balanced the average are cut off and the average is pulled back toward the middle of the scale. We found a solution for that, but it is beyond the scope of this article.
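The gap is easy to measure. The sketch below averages a large number of generated ratings for a few targets; averageRating is a hypothetical helper written for this illustration, and the seed, targets and sample size are arbitrary. It only shows the effect and is not the fix mentioned above.

```php
<?php
function nrand(float $mean, float $sd): float
{
    $x = mt_rand(1, mt_getrandmax()) / mt_getrandmax(); // in (0, 1], so log($x) is finite
    $y = mt_rand() / mt_getrandmax();

    return sqrt(-2 * log($x)) * cos(2 * pi() * $y) * $sd + $mean;
}

function randomRating(float $target): int
{
    $rating = (int) round(nrand($target, 1));

    return max(1, min(5, $rating)); // clamp to the 1 to 5 scale
}

// Hypothetical helper: average of $n ratings generated for one target.
function averageRating(float $target, int $n): float
{
    $sum = 0;
    for ($i = 0; $i < $n; $i++) {
        $sum += randomRating($target);
    }

    return $sum / $n;
}

mt_srand(2024); // fixed seed, only to make the run repeatable

foreach ([3.0, 4.0, 4.8] as $target) {
    printf("target = %.1f, average = %.2f\n", $target, averageRating($target, 50000));
}
```

With a target of 3.0 the clamping is symmetric and the average stays on target. With a target of 4.8 and a standard deviation of 1, the average should come out at roughly 4.5, because more than half of the draws end up stuck on 5 while the low ones are kept as they are.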

Choosing random targets

One of the problems that arose when we started generating our fixtures is that each survey reply has several question replies, so if we want different averages in our final result, we need to find “random” targets for each question. The targets also need to be deterministic, so that scripts running in parallel all generate replies against the same target for a given question.

What we need is some kind of “hashing” of the question to determine a random-ish target. What I found was PHP’s crc32 function, which returns a checksum of a string. It’s kind of like MD5, but the result is an int (always non-negative on 64-bit builds). From it we can derive a percentage. Let’s say we want a predictable, random-ish percentage between 60% and 100% from a survey question title; we would write this:

    // % 41 yields 0..40, so the result covers 60..100 inclusive
    $targetPercentage = crc32($question->getTitle()) % 41 + 60;

For this to be a target rating, just divide by 20 and you get a target rating between 3 and 5:

    $targetRating = (crc32($question->getTitle()) % 41 + 60) / 20;
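Put together, this can be wrapped in a small function that works on a plain string. This is only a sketch: targetRatingFor is a hypothetical name, the titles are sample data, the modulus 41 is used so that 100% is actually reachable, and the code assumes a 64-bit PHP build where crc32 never returns a negative number.

```php
<?php
// Hypothetical helper: maps a question title to a stable target rating in [3, 5].
function targetRatingFor(string $title): float
{
    // crc32() is deterministic, so the same title always gives the same target.
    // Assumes a 64-bit build, where the checksum is never negative.
    $percentage = crc32($title) % 41 + 60; // 60..100 inclusive

    return $percentage / 20; // 3.0..5.0
}

// Sample titles, for illustration only.
$titles = [
    'Do you think the last event was successful according to your needs?',
    'Would you recommend the same event to another person from your company?',
];

foreach ($titles as $title) {
    printf("%.2f  %s\n", targetRatingFor($title), $title);
}
```

Because the target depends only on the title, two scripts running side by side will compute the same value without having to share any state.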

And voilà!