... it is said that he [Babbage] sent the following letter to Alfred, Lord Tennyson about a couplet in "The Vision of Sin":
Every minute dies a man,
Every minute one is born
I need hardly point out to you that this calculation would tend to keep the sum total of the world's population in a state of perpetual equipoise, whereas it is a well-known fact that the said sum total is constantly on the increase. I would therefore take the liberty of suggesting that in the next edition of your excellent poem the erroneous calculation to which I refer should be corrected as follows:
Every minute dies a man,
And one and a sixteenth is born
I may add that the exact figures are 1.167, but something must, of course, be conceded to the laws of metre.
Not from one token, from one embedding. Text carries relatively little information per token: it is possible to compress several token embeddings into a single token embedding.
The how varies. The CALM paper seems to have used an MLP to compress an N×D input (N embeddings of size D) into a single D-dimensional embedding, and another MLP to decompress it back; a rough sketch follows the links below.
https://en.wikipedia.org/wiki/A_picture_is_worth_a_thousand_...
https://arxiv.org/abs/2010.11929
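To make that concrete, here is a minimal sketch of the compress/decompress pair in PyTorch. This is not the paper's actual architecture; the hidden size, GELU activation, and reconstruction loss are my own assumptions, just to show the shape of the idea.

    import torch
    import torch.nn as nn

    N, D, H = 4, 768, 2048   # assumed sizes: 4 token embeddings of dim 768

    # One MLP flattens the N x D block down to a single D-dim vector...
    encoder = nn.Sequential(nn.Linear(N * D, H), nn.GELU(), nn.Linear(H, D))
    # ...and a second MLP expands that vector back out to N x D.
    decoder = nn.Sequential(nn.Linear(D, H), nn.GELU(), nn.Linear(H, N * D))

    x = torch.randn(32, N, D)        # a batch of 32 chunks of N embeddings
    z = encoder(x.flatten(1))        # (32, D): one embedding per chunk
    x_hat = decoder(z).view(32, N, D)

    # Trained as an autoencoder: the reconstruction should match the input.
    loss = nn.functional.mse_loss(x_hat, x)
    loss.backward()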
"""
"""It's the same as LLMs being able to "decode" Base64, or work with sub-word tokens for that matter, it just learns to predict that:
<compressed representation> will be followed by (or preceded by) <decompressed representation>, or vice versa.
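For illustration, here is what such a training pair could look like in the Base64 case; make_pair is a hypothetical helper for laying the pair out as text, not anything from a real training pipeline.

    import base64

    # Hypothetical helper: lay the (encoded, decoded) pair out side by side,
    # so a model seeing many such pairs can learn the mapping purely through
    # next-token prediction.
    def make_pair(text: str) -> str:
        encoded = base64.b64encode(text.encode()).decode()
        return f"Base64: {encoded}\nDecoded: {text}"

    print(make_pair("Every minute dies a man"))
    # Base64: RXZlcnkgbWludXRlIGRpZXMgYSBtYW4=
    # Decoded: Every minute dies a man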