[talk] passwd entropy and strength

Sun Nov 5 10:02:00 EST 2017

This is something I've tinkered with for a long while, and thought I'd
raise it on talk@ even though it's a bit OT from BSD land.

Looking for feedback/correction on my argument here.

I toyed a bit with the xkcd passwds... the well-known https://xkcd.com/936/.

The passwords in question are:

Tr0ub4dor &3 (aka "Tr0")

versus

correct horse battery staple (aka "correct")

There is a unix program called ent in all BSD ports
(https://www.fourmilab.ch/random/) which calculates entropy with a bunch
of different measurements.  I don't fully understand all the tests, but
I generally look at the first one, "Entropy", which is bits of entropy per
byte, i.e., how much entropy/randomness per character. So if you have a
five-character passwd and 3 bits of entropy per byte, the passwd entropy
would be 5 characters x 3 bits of entropy per byte = 15 bits of entropy.
I think that's correct.

Now the first passwd, Tr0..., has 3.546... bits per byte, and the second,
correct..., has 3.590... Therefore:

Tr0... 3.546 x 12 characters (including the space) is about 42.55 bits of entropy

correct... 3.590 x 28 characters is about 100.5 bits of entropy
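
For anyone who wants to reproduce those per-byte figures without ent, here
is a minimal Python sketch. It assumes ent's "Entropy" line is the Shannon
entropy of the byte frequencies in the input, and that the strings were fed
to ent with a trailing newline (e.g. via echo), which is what seems to bring
the results close to 3.546 and 3.590. The function name is made up for
illustration. Note that this only looks at how often each byte repeats
within the one string.

```python
import math
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte-frequency distribution, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Assumption: the input to ent carried a trailing newline (e.g. from echo).
for passwd in ("Tr0ub4dor &3", "correct horse battery staple"):
    per_byte = entropy_bits_per_byte((passwd + "\n").encode("ascii"))
    # Multiply by the visible length, as in the arithmetic above.
    total_bits = per_byte * len(passwd)
    print(f"{passwd!r}: {per_byte:.3f} bits/byte x {len(passwd)} = {total_bits:.2f} bits")
```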

This seems to confirm the argument of the xkcd cartoon. The correct...
passwd has more than double the bits of entropy and is easier to
remember.

But then look at the password strength wikipedia page, specifically the
chart entitled "desired password entropy".
https://en.wikipedia.org/wiki/Password_strength/

The Tr0 passwd falls in the column of "All ASCII printable
characters" and the correct passwd is in the "case insensitive Latin
alphabet".  Even by this measure the Tr0 passwd is about 72 bits of
entropy and the correct passwd is about 128 bits of entropy.
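
The chart boils down to length times log2 of the universe size, assuming
every character is picked uniformly at random. A quick sketch of that
arithmetic follows; the function name is made up, and the lengths used (11
characters for Tr0 without the space, 28 for correct with the spaces counted
as if they were letters) are assumptions chosen to land in the same
neighbourhood as the 72 and 128 figures.

```python
import math


def charset_bits(length: int, universe: int) -> float:
    """Bits of entropy if each character is drawn uniformly from `universe` symbols."""
    return length * math.log2(universe)


# Assumed lengths: 11 printable-ASCII characters, 28 lower-case Latin characters.
print(f"Tr0 (95-universe):     {charset_bits(11, 95):.1f} bits")
print(f"correct (26-universe): {charset_bits(28, 26):.1f} bits")
```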

Certainly the recommended method from the cartoon would likely be a
massive improvement for most users. And with that, I don't think it's a
bad thing if there is some real mathematical improvement in passwd strength.

The problem with that argument, however, is the same problem as with
Diceware. The words, as with Diceware, are all in standard US English
dictionaries, and most methods of bits/entropy calculation don't take
that into account. Both xkcd and Diceware only use a "26 character
universe", i.e., they limit their content to the lower-case English
alphabet. So what may look better in terms of some calculations doesn't
add up when it comes to detecting patterns, i.e., English words.
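
To put a rough number on that gap: if an attacker knows the passwd is a
handful of words picked at random from a known list, the entropy depends on
the size of the list and not on the character count. A minimal sketch,
assuming a 2048-word list (the 11 bits per word the xkcd cartoon uses) and
the 7776-word Diceware list; the function names are made up for
illustration.

```python
import math


def wordlist_bits(num_words: int, list_size: int) -> float:
    """Bits of entropy for words drawn uniformly at random from a known list."""
    return num_words * math.log2(list_size)


def charset_bits(length: int, universe: int) -> float:
    """Bits of entropy if each character were drawn uniformly from `universe` symbols."""
    return length * math.log2(universe)


# Four random words, counted per word rather than per character.
print(f"xkcd, 4 words of 2048:     {wordlist_bits(4, 2048):.1f} bits")
print(f"Diceware, 4 words of 7776: {wordlist_bits(4, 7776):.1f} bits")
# The same 28-character phrase, counted as if each character were random.
print(f"28 chars, 26-universe:     {charset_bits(28, 26):.1f} bits")
```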

So someone getting some of the passwd really just needs a "Wheel of
Fortune" approach to determining a passwd in full.

co__ect ho_se batte_y staple

"Can I buy an 'r'?"

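The "Wheel of Fortune" step is easy to sketch. The word list below is a
tiny hypothetical stand-in for a real dictionary file, and the function
name is made up; the point is only that a partly known word narrows to a
handful of candidates very quickly.

```python
import re

# Hypothetical mini word list, for illustration only; a real attack would
# load a full dictionary.
WORDS = ["correct", "collect", "connect", "horse", "house",
         "battery", "buttery", "staple", "stable"]


def candidates(pattern, words=WORDS):
    """Return the words matching a pattern where '_' stands for one unknown letter."""
    regex = re.compile(
        "".join("[a-z]" if ch == "_" else re.escape(ch) for ch in pattern)
    )
    return [w for w in words if regex.fullmatch(w)]


for partial in "co__ect ho_se batte_y staple".split():
    print(partial, "->", candidates(partial))
```
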
And "universe" is one of the criteria I think matters for password
strength. All ASCII printable characters is a 95 universe, which is much
better than the 26 in a lower-case (or upper-case) US English alphabet one.

Length matters, of course, but I think it's too often the main criterion
for users and sysadmins, and the thing that intimidates users the most.

But it's the third criterion that matters in this case, what I call
"diversity", i.e., can you find it in any dictionary? Does it contain
common digraphs (in English, th, ng, st, etc.)?  Both xkcd and Diceware do.

Going back to the passwd strength chart, how long does a passwd in a
26-character alphabet universe have to be to equal one that uses the
95-character ASCII universe?

alpha only/ASCII (characters needed)

9/7 (40 bits/entropy in total)
18/13 (80 bits)
28/20 (128 bits)

etc.
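
For anyone wanting to check those lengths, the minimum length for a target
is the target bits divided by log2 of the universe size, rounded up. A
minimal sketch, assuming uniformly random characters; the function name is
made up for illustration.

```python
import math


def min_length(target_bits: float, universe: int) -> int:
    """Smallest number of uniformly random characters reaching `target_bits`."""
    return math.ceil(target_bits / math.log2(universe))


# alpha only (26) / all printable ASCII (95)
for bits in (40, 80, 128):
    print(f"{min_length(bits, 26)}/{min_length(bits, 95)} ({bits} bits)")
```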

The conclusion, to me, is that getting users to use the whole ASCII
universe of 95 characters with diversity is an achievable goal, and
removes the easy brute-force dictionary attacks that are so common.

Can't imagine a more perfect Sunday morning post.

g

-- 

5822 F82D 665B 5C6A 915B FAD4 B014 1CEE 545A A6C6