I prefer to think of them as interpolation machines, not extrapolation machines. They can project within the space they’re trained in, and what they produce may not be in their training corpus, but it must be implied by it. I don’t know if this is sufficient to make them too weak to create original “ideas” of this sort, but I think it is sufficient to make them incapable of original thought, as opposed to expected thought that is very complex to evaluate.
People keep saying this, but if you try to interpret this at all literally, it just doesn’t work. Like, it’s phrased like it should have a precise meaning, right? Like, people even mention convex hulls when talking about it.
But if you actually take the convex hull of some encoding of sentences as vectors, it isn’t true. The outputs are not in the convex hull of the training data.
I guess it’s supposed to be a metaphor and not literal, but in that case it’s confusing.
Especially seeing as there are contexts in machine learning where literal interpolation vs literal extrapolation is relevant.
So, please, find a better way to say it than saying that “it can only interpolate”?
If it's all just points in the multidimensional space, why would the thing be restricted to some operations and not others? I'm not buying the argument.
Sorry, I don't understand what you mean. Are you agreeing or disagreeing with me?
If it can only interpolate in a literal sense, that means that it only produces good outputs on convex combinations of inputs that appear in the training set. That's what interpolation means. But if you take the embedding vectors of sentences/prompts, and then take the convex hull of these, it is not typical for a new sentence that isn't in the training set to have its embedding vector inside that convex hull.
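To make "inside the convex hull" concrete, here is a rough sketch of how one could check it. Everything in it is an assumption for illustration: random Gaussian vectors stand in for real sentence embeddings, the dimensions are tiny compared to a real model, and `separating_direction` is a made-up helper name. It uses a plain perceptron to look for a hyperplane through the query point with all training points strictly on one side. If one exists, the query provably is not a convex combination of the training points. If the search gives up, the result is inconclusive, not proof that the point is inside.

```python
import random


def separating_direction(points, x, max_updates=1000):
    """Perceptron search for w with w . (p - x) > 0 for every p in points.

    If such a w exists, a hyperplane through x has every point strictly on
    one side, so x is NOT a convex combination of the points. Returns w if
    found, or None if the update budget runs out (x may be inside the hull).
    """
    diffs = [[pj - xj for pj, xj in zip(p, x)] for p in points]
    w = [0.0] * len(x)
    for _ in range(max_updates):
        # Find any training point that is not yet strictly on the w side.
        violated = next(
            (v for v in diffs if sum(wj * vj for wj, vj in zip(w, v)) <= 0.0),
            None,
        )
        if violated is None:
            return w  # certificate: x is outside the hull
        w = [wj + vj for wj, vj in zip(w, violated)]
    return None  # inconclusive


if __name__ == "__main__":
    random.seed(0)
    dim, n_train = 20, 100  # toy sizes, far smaller than real embeddings
    train = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_train)]
    fresh = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(10)]

    # The centroid is a convex combination by construction, so no certificate.
    centroid = [sum(col) / n_train for col in zip(*train)]
    print("centroid provably outside:",
          separating_direction(train, centroid) is not None)

    outside = sum(separating_direction(train, q) is not None for q in fresh)
    print(f"{outside} of {len(fresh)} fresh points provably outside the hull")
```

An exact membership test would instead be a linear-programming feasibility problem (find nonnegative weights summing to 1 that reproduce the query). The perceptron version is only used here because it needs nothing beyond the standard library.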
I’m not sure I follow your end-to-end reasoning. In an n-dimensional space, interpolation along and within the convex hull is pretty much what they’re doing. How can it possibly not be? How would it interpolate a point that’s not within its vector space? Yes, it’s very complex, with nonlinear transformations and very high dimensionality, and residuals and other features add more complexity to the shape of the hull. But an LLM cannot infer a concept to which it has no information channel. That’s clearly nonsense. The fact that they do bounded, learned, nonlinear compositional generalizations over a representational space induced by training is by nature interpolation, not extrapolation. I’m sorry, but I believe their immense power has you confusing math with magic.