4 Comments
Noah Haskell:

Nice essay. I appreciate the Numberwang references, and the Rockwell Retro Encabulator video made me chuckle. Looking forward to the follow-up posts.

Erick Wales:

(This started out with me disagreeing with you but in the end I think I am just making your point, would love to hear your thoughts on this)

> people are imagining themselves to be measuring the informational content of a ghost!

While I agree that you can't invoke some ideal language, any particular language – given its alphabet, grammar and vocabulary – has a measurable information density, and the informational content of any given phrase can be assigned an approximate mathematical value (e.g. its Shannon information).

Though I think a straight bits-per-letter methodology wouldn’t be very representative of the actual information content of a particular proposition. For example:

“Dogs have tails and walk on four legs”

and

“Electrons have half-integer spin”

don’t seem to contain the same amount of information, even though they contain about the same number of letters. The point, though, is that they do contain information; even if it cannot be measured exactly, it exists and is finite.
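To make the "straight bits-per-letter methodology" concrete, here is a minimal sketch (my illustration, not anything from the original comment) of a zeroth-order Shannon entropy estimate: it scores a string purely by its character frequencies, so the two example sentences come out roughly alike despite their very different meanings.

```python
from collections import Counter
from math import log2

def bits_per_letter(text: str) -> float:
    """Zeroth-order Shannon entropy of a string, in bits per character,
    estimated from the string's own character frequencies."""
    counts = Counter(text)
    n = len(text)
    # H = -sum(p * log2(p)) over the observed character distribution
    return -sum((c / n) * log2(c / n) for c in counts.values())

s1 = "Dogs have tails and walk on four legs"
s2 = "Electrons have half-integer spin"

# Both land in the same few-bits-per-character range, despite
# wildly different semantic content -- which is the objection above.
print(f"{bits_per_letter(s1):.2f} bits/char")
print(f"{bits_per_letter(s2):.2f} bits/char")
```

Note this measures only the symbol statistics of the string itself; it says nothing about what the sentence is *about*, which is exactly why it fails to separate the dog sentence from the electron one.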

Although, if I only gave you the details of a language and a single phrase, I think it is impossible to measure the information. You need some context as to the relative meaning of the individual words and their interactions. I wonder whether the techniques used to generate the internal vector representations of language in LLMs could be used to generate a contextual measure of information?

This measure would vary depending on the encoding strategy and training data, but parameterized in the right way it could still be relevant or interesting. The meaning also changes over time: for example, my second statement was gibberish 200 years ago and might be naive in 200 more (these are changes in the "context", or training set, used to build the model).

Anyway, even if you, having written 576 pages wrestling with long sentences full of large words, think your book contains an immense amount of information (arguably true), it still doesn't mean you're right or have said anything of actual value. Information content ≠ truth.

Nathan Ormond:

Hey -- appreciate your thoughts here!

I was going to write about this with a clarifying response but it's quite a lot and probably worth its own post. I will write more about this in future and don't disagree with everything you've said. I do think I need to spell out what my views of language are and how this stuff integrates with that!

Erick Wales:

Really great piece, Nathan! Thank you for sharing. It's too bad, though, that my prior probability in my own Bayesian methods is .999 and my credence in your writing is .000001, so it's just clearly and intuitively obvious that Bayesian analytic philosophy is the best!!
