There are (relatively simple) examples of what the transformer architecture is s...

lucubratory · on May 19, 2024

Can you provide those examples?

mjburgess · on May 19, 2024

All statistical AI systems are models of ensemble/population conditional probabilities between pairs of low-validity measures. In practice, almost all relevant distributions are time-varying and causal, and require a large number of high-validity measures to capture.

E.g., LLMs model, say, all books ever written using the frequencies with which words co-occur at certain distances from other words.

But these words are about the world (people, events, etc.), and these change daily in ways that completely change the words' future distribution (e.g., consider what all people said about Ukraine/Russia pre/post a few hours of 2022).

The LLM has no mechanism to be sensitive to what causes this distribution shift, which can be radical for any given topic, and happen over minutes.

All models of conditional probabilities of this kind end up being good only at predicting on-average, canonical answers that are stable over long periods.
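
A toy sketch of the co-occurrence point, in Python. This is not how an LLM works internally; it is a plain bigram counter, and the two tiny corpora, the function names, and the example words are all made up for illustration. It only shows that a frequency model frozen before a shift keeps giving the pre-shift answer until someone retrains it on new data.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each word is immediately followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if `word` was never seen."""
    followers = counts.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

# Hypothetical corpora: text written before and after some event.
before = [
    "ukraine talks continue",
    "ukraine talks stall",
    "ukraine talks resume",
]
after = [
    "ukraine war begins",
    "ukraine war escalates",
    "ukraine war continues",
    "ukraine war intensifies",
]

frozen = train_bigram(before)             # training stopped before the event
retrained = train_bigram(before + after)  # someone re-ran training on new data

print(predict_next(frozen, "ukraine"))     # still the pre-event continuation
print(predict_next(retrained, "ukraine"))  # changes only because of retraining
```

Nothing in `frozen` can react to the event; the prediction changes only when a person assembles new data and calls `train_bigram` again.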

nl · on May 19, 2024

> The LLM has no mechanism to be sensitive to what causes this distribution shift, which can be radical for any given topic, and happen over minutes.

This sounds so logical and authoritative. And yet:

me> What event would cause a change in what all people said about Ukraine/Russia pre/post a few hours of 2022

GPT4O> A significant event that caused a drastic change in global discussions about Ukraine and Russia in 2022 was the Russian invasion of Ukraine, which began on February 24, 2022. This military escalation led to widespread condemnation from the international community, significant geopolitical shifts, and a surge in media coverage. Before this invasion, discussions were likely more focused on diplomatic tensions, historical conflicts, and regional stability. After the invasion, the discourse shifted to topics such as warfare, humanitarian crises, sanctions against Russia, global security, and support for Ukraine.

mjburgess · on May 19, 2024

Right... because it's been trained on those news stories.

The point is that a model whose training stopped in 2021 would not produce the history of Ukraine (etc.) that a person writing in 2023 would.

The later GPTs are trained on the user-provided prompts/answers of previous GPTs, so this process (which isn't the LLM, but the activity of research staff at OpenAI) is what's inducing approximate tracking of some changes in meaning.

Whilst this works for any changes over-represented in the new training data, (1) the LLM isn't doing that, it's the researchers; (2) this process is vastly expensive and time-intensive; and (3) it only tracks changes with a high word frequency in new data.

If you could run the months-long, 1 GWh, tens-of-millions-USD training process every minute of the day, you would resolve the inability of the model to track major news stories... but would not resolve its ability to track, say, the user changing their clothes.

The model's sensitivity to stuff in the world arises because humans prepare the training data to bring about apparent sensitivity. Absent the activity of these humans, the whole thing drifts gradually into irrelevance.

nl · on May 19, 2024

> would not resolve its ability to track, say, the user changing their clothes.

In-context learning works fine for this (and does for the Russia/Ukraine change too).

But yes, sure. It can be outdated in the same way a person cut off from news can be.

We've never argued that a shipwrecked person who was unaware of news became less intelligent because of that, just that their knowledge is outdated.

Additionally, the whole point of machine learning is to make systems that learn so they remain useful.

It seems likely that a model will soon (one year? five years? one month? who knows...) be able to continually watch video broadcast news and videos of your home, continually updating its model.

In this case it would understand both the Ukraine issue and what you are wearing. Is it now suddenly intelligent? It's true it might be more useful, but to me that is a different thing.

JohnKemeny · on May 19, 2024

On Limitations of the Transformer Architecture

https://arxiv.org/abs/2402.08164