What stood out to me is how the MLP’s expansion–contraction process mirrors how we sometimes need to stretch ideas into bigger spaces before distilling them back down.
Do you think this mechanism also shapes the kinds of “world knowledge” associations LLMs surface beyond the immediate text?
Interesting connection, Frankline! There does seem to be something universal about spaciousness leading to better connections.
As for world knowledge: yes, one of the ideas behind the MLP layers is that they learn facts about the world, which get incorporated as adjustments to the token embeddings.
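The expand-then-contract shape of an MLP block can be sketched in a few lines. This is a minimal illustration, not any specific model's implementation: the dimensions (`d_model=8`, a 4x expansion) and the ReLU nonlinearity are illustrative assumptions.

```python
import numpy as np

# Illustrative sizes: a small token embedding expanded 4x.
d_model, d_hidden = 8, 32

rng = np.random.default_rng(0)
W_up = rng.standard_normal((d_model, d_hidden)) * 0.1    # expansion weights
W_down = rng.standard_normal((d_hidden, d_model)) * 0.1  # contraction weights

def mlp(x):
    h = np.maximum(0.0, x @ W_up)  # stretch into the wider hidden space (ReLU)
    return h @ W_down              # distill back down to the token dimension

x = rng.standard_normal(d_model)   # one token embedding
y = mlp(x)
print(x.shape, "->", (x @ W_up).shape, "->", y.shape)
```

The output has the same shape as the input, so the block's adjustment can simply be added back onto the token embedding via the residual connection.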
Nicely put
Thanks for the clarity, Mike.