[talk] passwd entropy and strength

Sun Nov 5 12:32:18 EST 2017

On November 5, 2017 at 10:03:21, George Rosamond
(george at ceetonetechnology.com) wrote:
> Now the first Tr0... passwd has 3.546... bits per byte, and the second
> correct... has 3.590... Therefore:
>
> Tr0.. 3.546 x 12 digits (including space) and is 42.48 bits of entropy
>
> correct... 3.590 x 28 digits and is 100.5 bits of entropy
>
> This seems to confirm the argument of the xkcd cartoon. The easier
> passwd correct... has more than double the bits of entropy and is easier
> to remember.
>
> The problem with that argument, however, is the same problem with
> Diceware. The words, like Diceware, are all in standard US English
> dictionaries, and most methods of bits/entropy calculation don't take
> that into account. Both xkcd and Diceware only use a "26 character
> universe", ie, they limit their content to lower-case alphabet English.
> So what may look better in terms of some calculations, doesn't add up
> when it comes to determining patterns, ie, English words.
>
> So someone getting some of the passwd really just needs a "Wheel of
> Fortune" approach to determining a passwd in full.
>
> co__ect ho_se batte_y staple
>
> "Can I buy an 'r'?"
>
> And "universe" is one of the criteria I think matters for password
> strength. The ASCII printable characters form a universe of 95, which is
> much better than the 26 of a lower-case (or upper-case) US English
> alphabet.
>
> Length matters, of course, but I think it's too often the main criterion
> for users and sysadmins, and the thing that intimidates users the most.
>
> But it's the third criterion that matters in this case, what I call
> "diversity", ie, can you find it in any dictionary? Does it contain
> common digraphs (in English, th, ng, st, etc)? Both xkcd and Diceware do.

	The argument in XKCD, as you’ve mentioned, is about finding a better
entropy:memory ratio, since hard-to-remember passwords tend to end up
on post-it notes, where anyone can read them.

	I’m a fan of the short English-ish sentence approach. However, it
doesn’t have anywhere near the entropy you calculated above. The
reason is that if it’s known they’re English words, which you should
assume in order to calculate minimum entropy, then an attacker guesses
whole words rather than characters, and there’s a lot less variation.

	/usr/share/dict/words has 235,886 entries on my Mac. If each word is
picked uniformly at random, that’s log2(235,886) = ~17.847 bits of
entropy per word. The average English speaker’s vocabulary seems to
hover at about 15,000 words, which is ~13.872 bits of entropy per word.
Multiply that by 4 words and you get ~55.5 bits of entropy. Not a huge
improvement over the mixed number-letter case.
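
	For anyone who wants to redo the arithmetic, here is a rough sketch.
It assumes every word is drawn uniformly and independently from the
list, which is generous: people picking their own words do worse. The
function names are made up for illustration, and the list sizes are
just the two figures mentioned above.

```python
import math


def bits_per_word(wordlist_size: int) -> float:
    """Entropy of one word drawn uniformly at random from the list."""
    return math.log2(wordlist_size)


def passphrase_bits(wordlist_size: int, n_words: int) -> float:
    """Entropy of n_words independent uniform draws from the same list."""
    return n_words * bits_per_word(wordlist_size)


if __name__ == "__main__":
    # 235,886: /usr/share/dict/words on one Mac; 15,000: rough vocabulary size.
    for size in (235_886, 15_000):
        print(f"{size:>7} words: {bits_per_word(size):.3f} bits/word, "
              f"{passphrase_bits(size, 4):.1f} bits for 4 words")
```

Swapping in the size of whatever word list you actually use shows how
little the total moves compared to simply adding another word.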

	I am a big fan of this approach, though, since the
bits-of-entropy-per-bits-of-neuron ratio is really high. Our brains
are made for remembering language, and while scrambles of words aren’t
ideal for recall (vs real grammatical sentences), they’re a /lot/
better than what amounts to random letter substitution.

> The conclusion, to me, is that getting users to use the whole ASCII
> universe of 95 characters with diversity is an achievable goal, and
> removes the easy brute force dictionary attacks so common.

	I’d rather see longer strings of words, personally, since I think
it’s more efficient when meat-space storage is taken into account. My
ideal circumstance is basically to have a well-remembered sentence
that acts as a master password to 1Password or Keychain or something,
and then offload literally all of the rest of the completely random
and un-rememberable passwords to that.
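
	To put a rough number on the trade-off between the two approaches,
here is a back-of-the-envelope sketch of how many units (words or
characters) it takes to reach a given strength. It assumes uniformly
random choices in every case, and the 80-bit target and the
15,000-word list are arbitrary illustrations, not recommendations.

```python
import math


def units_needed(target_bits: float, universe_size: int) -> int:
    """Smallest number of uniform random picks from a universe of the
    given size whose combined entropy reaches target_bits."""
    return math.ceil(target_bits / math.log2(universe_size))


if __name__ == "__main__":
    target = 80  # arbitrary example target, in bits
    for label, size in (("words from a 15,000-word list", 15_000),
                        ("printable ASCII characters", 95),
                        ("lower-case letters", 26)):
        print(f"{units_needed(target, size):>3} random {label}")
```

The comparison only covers guessing resistance; which of those is
easier to keep in your head is the part the calculation can’t see.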

-bjc