Reminds me of an encounter on BoardGameArena where the top ranked 7 Wonders Duel player complained there was a randomization bug (the Great Library never offered the science progress token). I thought he was raging (who hasn’t heard a poker player complain about bad luck) but turns out the developer checked the code and did in fact find this was a bug!
Are 155 throws enough to evaluate bias? That seems like more times than I'd like to roll some dice, but not enough to gain real measurement confidence. By what criteria is the person assigning the traffic light ratings? What about face coplanarity? Get this enthusiast in a metrology lab!
No, it's much too low. OP shows Pearson's X^2 for their results, but that alone is meaningless. p-value would be the interesting metric. I haven't computed it (although we could from the results) but I expect it to be very high, i.e. it's likely to observe these results even with perfect dice.
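To make the p-value point concrete, here is a rough sketch of how one could estimate it by simulation instead of looking it up in a table. Everything specific in it is my assumption, not OP's data: I assume a d6, and the counts are made up so that they sum to 155. The function names `chi_square` and `simulated_p_value` are just illustrative. The idea is to roll a perfect virtual die 155 times, over and over, and see how often the X^2 comes out at least as large as the observed one.

```python
import random

def chi_square(counts):
    """Pearson's X^2 of observed face counts against a uniform expectation."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def simulated_p_value(counts, trials=5_000, seed=0):
    """Estimate the p-value by Monte Carlo.

    Returns the fraction of simulated fair-die experiments (same number of
    faces and throws) whose X^2 is at least as large as the observed X^2.
    """
    rng = random.Random(seed)
    faces = len(counts)
    throws = sum(counts)
    observed = chi_square(counts)
    at_least_as_extreme = 0
    for _ in range(trials):
        sim = [0] * faces
        for _ in range(throws):
            sim[rng.randrange(faces)] += 1
        # Small tolerance so float rounding does not break ties.
        if chi_square(sim) >= observed - 1e-9:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials

# Hypothetical d6 counts, invented for illustration (they sum to 155).
hypothetical_counts = [30, 22, 27, 24, 28, 24]
print("X^2     =", round(chi_square(hypothetical_counts), 3))
print("p-value ~", simulated_p_value(hypothetical_counts))
```

A large p-value here only says the counts are compatible with a fair die. It does not prove fairness, and with so few throws a mildly biased die would usually look the same.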
The giveaway is the handling of uncertainty. That's too many decimal places for some of these measurements: 10um (0.01mm) is not reliably measurable by a cheapo caliper, and even trying to do it with a good caliper or micrometer, you'll find that everyday objects simply cannot be reliably straightforwardly measured with that level of precision. (You need cleaning procedures, standardized handling, standardized sampling, etc.) And quoting "4.1g (5.1% too heavy)" versus "4.0g (2.6% too heavy)" is just absurd: that last digit really doesn't mean much. So don't treat it like it does.
For example, on my random first d6 at hand, I get 4.47g from my nice scale and somewhere between 14.82 and 14.85 mm on the first face dimension, depending on how I measure, from my Mitutoyo caliper. I have a micrometer in the shop, but you can see that it'd be pointless to go get it. The next two faces are (14.79 to 14.84) and (14.76 to 14.87), so it's consistently like this.
Likewise, χ² to five decimal places isn't terribly useful... especially since you haven't really described the test you're running....
In general there's a lot of "look at me make measurements" here that might be impressive. There is very little "what is the true value of this measurement, and how well can we assert that", and simply not enough "is this the right thing to be measuring, and how much does that factor matter". That last one is critical: the actual weight of a die is, I think, not important at all. It's weight distribution that matters, so who cares about 0.1g of difference. Unless you're making a batch uniformity claim? But really this evidence just says more about your measuring equipment. And it's well known that different color resins, especially black, white, and red, are pretty differently loaded with pigments, so they have different properties. You can't just expect them to be the same, but the author seems surprised that they aren't.
And then we get to "These dice are safe to use" without any real description of the criteria or threshold. I say "this report is not safe to use (for serious purposes)"!
It's cute, it's a fun little minute to read on the internet this morning. But it's silly, and if my students back in the day or coworkers today sent it to me, they'd be getting red ink and remedial lectures in measurement uncertainty.
This reminds me of a D&D dice website that went into way too much detail about how they weren't fair and I remember photos of them stacked on top of each other to show the variations in manufacturing.
> The reason casino dice have such sharp edges is to get them to stop rolling faster with less tumbling. The more a die tumbles the more likely it will present any issues with it.
If I understand it correctly, the justification is this: if a die is biased (usually a heavier face), this bias will manifest with a higher chance the longer the die rolls. But if it stops abruptly, for whatever reason (bumping against the edge of the table, other dice, or having a shape that prevents longer roll time, like the casino dice) this bias will be less likely to manifest. Did I get this explanation right?
I have quite a few sets of dice for D&D, nearly all of which favour aesthetics over balance. That said, I prefer to use simpler plastic dice with rounded edges at the table. Sharp-edged dice stop very abruptly and tend to show bias based on how they were held. The same is true of metal dice, which are heavier and tend to land instead of roll. This isn’t really the outcome you want.
For role-playing game purposes - not for gambling or serious competition or encryption of your super-valuable secrets - there is a question of what sort of randomization is needed:
* Truly random outcomes: Doesn't hurt
* Pseudo-random outcomes: Good enough?
* Unpredictable but unequally distributed outcomes: As long as nobody can know what will happen, is that sufficient?
* Unknown outcomes: As long as the players can't predict the outcome, that's what counts. If the game manager can avoid bias somehow, why not have them pick the number? Even use family birthdays, old phone numbers, etc., like people do with passwords.
All devices will output unequal distributions for most realistic N, and especially for shorter series. Games are played mostly in shorter series. Does it matter if, over the long run, the device outputs a perfectly equal distribution?
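A quick way to see the short-series effect is to simulate an ideal die and look at how lopsided the counts are at different N. This is only a sketch: the d20, the roll counts, and the helper names `face_counts` and `relative_spread` are my own choices for illustration.

```python
import random
from collections import Counter

def face_counts(sides, rolls, rng):
    """Roll an ideal die `rolls` times; return how often each face came up."""
    tally = Counter(rng.randint(1, sides) for _ in range(rolls))
    return [tally[face] for face in range(1, sides + 1)]

def relative_spread(counts):
    """(most frequent face - least frequent face) / expected count per face."""
    expected = sum(counts) / len(counts)
    return (max(counts) - min(counts)) / expected

rng = random.Random(42)  # fixed seed so the run is repeatable
for rolls in (20, 200, 2_000, 200_000):
    counts = face_counts(20, rolls, rng)
    print(f"{rolls:>7} rolls: min={min(counts)} max={max(counts)} "
          f"relative spread={relative_spread(counts):.3f}")
```

Even this perfectly fair virtual d20 looks badly "unequal" over a session-sized number of rolls, and the relative spread only shrinks once N gets far beyond what anyone rolls at a table.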
Need a d10 roll? Just look at the last digit of the current second on your clock. Is it random? No, but it approximates randomness if you only make a roll sporadically.
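For the curious, the clock trick is a one-liner. A minimal sketch, assuming a d10 numbered 0 to 9. The `clock_d10` name and the optional `now` parameter (there only so it can be checked against a known timestamp) are mine.

```python
import time

def clock_d10(now=None):
    """Return 0-9: the last digit of the seconds field of the given time.

    `now` is seconds since the epoch; it defaults to the current time.
    """
    if now is None:
        now = time.time()
    return time.gmtime(now).tm_sec % 10

print(clock_d10())
```

Two calls within the same second return the same digit, which is why this only works if you roll sporadically.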
A short video about what happened:
https://www.youtube.com/shorts/4ekc9Xwynkc
https://www.gamescience.com/about-1
(Note: the sprue left by his sharp-edged process has since been proven to result in more bias than the tumbling undergone by the round-edged process.)