We’re in a world where LLMs are basically going to be extensions of how we think. An additional thing we use to do a lot of thinking tasks.
As a piano player, it’s important to work hands separately. Sometimes your right hand will carry the melody and your left hand the harmony, sometimes vice versa. Sometimes there may be more than just two “voices”/melodies/lines between your two hands. Even as a very good (as in getting paid to do it) sight reader, I learn a lot working all the voices/melodic lines separately.
Singers do similar things like singing only the vowels to keep themselves in the right placement. Learning handstands, you have to work your wrists, rotator cuffs, core (which is many things), etc. separately. Yoga, Pilates, and running also help us learn to break problems down this way.
Anyway, all that to say: If LLMs are gonna be a natural extension of how we think, we need to understand what parts of problem-solving LLMs are good for, and what parts our brains are for. The nice thing about working these bits “separately” is that one side is done for us. So we just need to consciously practice using our brains.
As programmers, that means maybe we conscientiously practice writing things ourselves sometimes, remembering that even if this sacrifices short-term “velocity” (whose measurement is problematic, but I digress), it preserves our long-term ability to do good work. And I think any of the above physical/artistic practices (or countless others), worked in these ways, will help reinforce this entire mindset.
I think kids of the coming generation will be sharply divided on their ability to conscientiously practice things separately. It’s been happening, but I suspect LLMs will accelerate it unless how we actually teach kids can catch up.
> We’re in a world where LLMs are basically going to be extensions of how we think
If that's the case then we're in trouble based on my experience. This week I've been using ChatGPT to help figure out some old linux platform that I need to resurrect. It's very good at quickly searching and surfacing relevant information online, and that's helpful, but if I did not have a lot of experience at linux administration to be able to see where it was suggesting the wrong thing, or initially dismissing the right thing, then I'd just be thrashing.
The LLM is helping me because I know what I need, and it can search and read faster than I can. But it's not really very smart.
> An additional thing we use to do a lot of thinking tasks.
Which is to say, an additional thing you're going to be forced to pay a lifelong tithe to a trillion-dollar company in order to do a lot of thinking tasks.
I’m rather optimistic about the future of smaller open-source models and market competition actually doing its job here, honestly. I myself, again, err on the side of doing things with my own brain. But there are many things LLMs are useful for, and they’re definitely better than a “rubber duck” if you don’t trust them blindly.
I assume they are called hooks, .vscode/settings.json - you can put some linters/tests which run automatically (from my understanding, something similar to git hooks, hence the reason I called them hooks). I generally hate the concept and I generally dislike vscode so... yeah.
I'd hope most functional adults understand that the Fields Medal and basically every other annual "prize" out there is awarded to both "recombinant" innovations and "new-dimensional thinking" innovations. Humans aren't going to come up with "new-dimensional" innovations in every field, every single year.
I'd say yes, LLMs "just" recombine things. I still don't think if you trained an LLM with every pre-Newton/Leibniz algebra/geometry/trig text available, it could create calculus. (I'm open to being proven wrong.) But stuff like this is exactly the type of innovation LLMs are great at, and that doesn't discount the need for humans to also be good at "recombinant" innovation. We still seem to be able to do a lot that they cannot in terms of synthesizing new ideas.
> Humans aren't going to come up with "new-dimensional" innovations in every field, every single year.
In fact, they are rarer, specifically because they are harder to produce. This is also why it is much harder to get LLMs to be really innovative. Human intelligence is a lot of things; it is deeply multifaceted.
Also, I'm not sure why CS people act like axioms are where you start. Finding them is very very difficult. It can take some real innovation because you're trying to get rid of things, not build on top of. True for a lot of science too. You don't just build up. You tear down. You translate. You go sideways. You zoom in. You zoom out. There are so many tools at your disposal. There's so much math that has no algorithmic process to it. If you think it all is, your image is too ideal (pun(s) intended).
But at the same time I get it: it is a level of math (and science) people never even come into contact with. People think they're good at math because they can do calculus. You're leagues ahead of most others around you, yes, and be proud of that. But don't let that distance deceive you into believing you're anywhere near the experts. That's true for much more than just math, but it's easy to demonstrate to people that they don't understand math. Granted, most people don't want to learn, which is perfectly okay too.
To keep my usual rant short: I think you’re assuming a categorical distinction between those two types of innovations that just doesn’t exist. Calculus certainly required some fundamental paradigm shifts, but there’s a reason that they didn’t have to make up many words wholesale to explain it!
Also we shouldn’t be thinking about what LLMs are good at, but rather what any computer ever might be good at. LLMs are already only one (essential!) part of the system that produced this result, and we’ve only had them for 3 years.
Also also this is a tiny nitpick but: the fields medal is every 4 years, AFAIR. For that exact reason, probably!
It took humans thousands of years, then hundreds of years, to come to terms with very basic concepts about numbers.
It's amazing to me when people talk about recombining things, or following up on things, as somehow lesser work.
People can't separate themselves from the perspective they were given when they learned the concepts, a perspective that those who developed the concepts didn't have because it didn't exist yet.
Simple things are hard, or everything simple would have been done hundreds of years ago, and that is certainly not the case. Seeing something others have not noticed is very hard, when we don't have the concepts that the "invisible" things right in front of us will teach us.
It's why the invention of teaching has been so important. Took a long time for humans to develop calculus. A long time to then refine it and make it much more useful. But then in a year or two an average person can learn what took hundreds of years to invent. It's crazy to equate these tasks as being the same. Even incremental innovation is difficult. You have to see something billions of people haven't. But there's also paradigm shifts and well... if you're not considered crazy at first then did you really shift a paradigm?
And yet it is still taught in less than optimal form, lacking algebraic closure in ways that are completely unnecessary.
It isn't a secret, but the percentage of people who don't know that, plus the percentage of mathematicians who vaguely or more directly know that, but habitually use the broken, more difficult (i.e. less algebraic) notation is ... virtually everyone.
I am not trying to pick on calculus, this is everywhere. Important and useful concepts are right in front of all of us, that we don't see even in the context of what we are relatively fluent with.
Because we learn quickly, where we have (almost always inherited) the right preparatory perspectives (earned over lifetimes by others), we vastly overrate our ability to reason independently.
Were I to guess, they're talking about the different derivatives. Here's at least something that might introduce you to some of the shortcuts people take, but it's far from complete [0]. (You can probably find more if you search for things like how physicists use the derivative wrong. I make this critique as someone with a degree in physics, too.)
I often say that math is taught through a game of telephone. It's a fantastic example of the problem with "I just care that it works" type of attitudes. The problem is that if that's your actual belief, then you wouldn't be saying that, because you'd need to dig deeper. Caring about it working is exactly the reason people do dig deeper and bring up issues. The reason things fall apart less in math is because the language was specifically invented to make miscommunication difficult. That's why it's overly pedantic. That's why we use formal languages rather than natural ones. So we should rephrase "I just care that it works" as what it actually is: "I just care that it works for this exact case." That makes it easier to see the problem. If you don't know the subject in more detail then you can't actually know if it breaks in that use case. The broken parts are completely invisible to you! Which undermines your own stated goal.
This goes for a lot more than math. But with math being a formal language, it's just easier to point out how people misunderstand. If you're an expert in any field, you've probably seen this same phenomenon in that domain, though. People's overconfidence and their refusal to get deeper knowledge actually just undermine their whole goal. I'd honestly call this a form of Gell-Mann Amnesia.
OpenAI themselves must not have a "reasonable definition of L", then. Their own papers and press releases refer to GPT-2 (from 2019) as a "large language model".
Yes, and 1.5 billion parameters meets no reasonable current definition of large. It would be considered a tiny language model. OpenAI themselves refer to their small/fast models as small models all over their documentation.
The term doesn't change its meaning because something new comes along.
The point of the term "large" is to highlight the massive parameter count (compared to traditional statistical models, where having 1.5 billion parameters was basically unheard of). It leads to the "double descent" phenomenon that allows them to generalize in ways traditional statistical models can't.
The idea that the "large" descriptor was just a subjective exclamation, like "oh wow this model is pretty large ain't it", is revisionism.
Yes, it does. That's why OpenAI refers to its small models as small. They are just so different. The capabilities have changed dramatically. The use cases are wildly different. The architectures are quite different. Even the core idea of attention is different. Training them is materially different. Serving them is materially different. A 1.5-billion-parameter model from 2019 is so different from today's LLMs that they really don't have much in common. What we have now is quite similar to what we had a couple years ago though.
Sure we do, since Fei-Fei Li and team created that annotated dataset, which made it possible to train the first LLMs. So LLMs have been here for more than a decade already.
Fei-Fei was annotating images... the second L in LLM is for "language". The first language models named LLMs at the time were trained on language data, with an objective function of predicting the next token. It had nothing to do with the ImageNet data. ImageNet data was used in... vision models.
The "Attention Is All You Need" paper didn't ever use the term LLM or large language model because the phrase didn't exist in industry.
When people say this what they mean is that we've had plausibly useful LLMs for around three years, and I would say that is basically true. The stuff before 2023 could barely be classified above the level of an interesting toy.
I think your comment about inventing new words is an interesting one. One of the things that I believe limits our ability to discover new ideas is our ability to describe related concepts. For example, the reason we still can't have clear discussions on consciousness is probably partly due to the fact that the necessary concepts haven't been cemented in language. We need new language before we can describe consciousness.
I would guess LLMs are limited in their ability to be genuinely novel because they are trained on a fixed language. It makes research into the internal languages developed by LLMs during training all the more interesting.
The fundamental paradigm shift is the categorical distinction. And what would constitute many new words for you? It introduced a bunch of concepts and terms which we take for granted today, including "derivative", "integral", "infinitesimal", "limit" and even "function". The latter two were not new words, but what does it matter? The associated meanings were new.
I agree with almost all of what you have stated, save for a minor nitpick: I frankly don't think most functional adults think about the Fields Medal, similar annual prizes, or the qualities of the innovations of their candidate pools. I also think that that's totally okay. I think among a certain learned cohort of adults it's okay to hope that, and I think it's okay to imagine an idealized world where having an opinion on this sort of matter is a baseline, but I don't think it's realistic or fair to imply that (what I believe handwavily to be a majority of) adults are nonfunctional for not sharing this understanding.
I think an LLM trained on pre-calculus material would easily stumble into reinventing at least early calculus. It's already pretty easy for students to stumble into calculus from solid enough fundamentals.
We even think that the Babylonian astronomers figured out they could integrate over velocity to predict the position of Jupiter.
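For anyone curious what "integrating over velocity" amounts to in its simplest form, here's a rough Python sketch. It assumes a velocity that changes linearly over a time span, so the displacement is just the area of a trapezoid, and it compares that against summing many small constant-velocity steps. The function names and the numbers are purely illustrative assumptions on my part, not a reconstruction of the actual Babylonian procedure.

```python
def trapezoid_displacement(v_start, v_end, duration):
    """Displacement when velocity changes linearly from v_start to v_end."""
    # Area of a trapezoid: average of the two parallel sides times the width.
    return (v_start + v_end) / 2 * duration


def summed_displacement(v_start, v_end, duration, steps):
    """Same quantity, approximated by adding up many small constant-velocity steps."""
    dt = duration / steps
    total = 0.0
    for i in range(steps):
        # Velocity at the midpoint of the i-th step (linear interpolation).
        t_mid = (i + 0.5) * dt
        v = v_start + (v_end - v_start) * t_mid / duration
        total += v * dt
    return total


# Illustrative numbers only: velocity in arcminutes/day, over 60 days.
exact = trapezoid_displacement(12.0, 9.5, 60)
approx = summed_displacement(12.0, 9.5, 60, steps=600)
print(exact, approx)
```

The two agree because the velocity is linear here; the step-summing version is the one that generalizes to arbitrary velocity curves, which is where calculus proper starts.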
> I still don't think if you trained an LLM with every pre-Newton/Leibniz algebra/geometry/trig text available, it could create calculus. (I'm open to being proven wrong.)
The experiment is feasible. If it were performed and produced a positive result, what would it imply/change about how you see LLMs?
GP was stating that they don't believe this would happen (I don't either), but also to make the point that it's a falsifiable view. (At least in theory. In practice, there probably won't even be enough historical text to train an LLM on). No, I don't think it would be falsified. Asking what if I'm wrong is kind of redundant. If I'm wrong, I'm wrong, duh.
The problem is the amount of data with that cutoff is really minuscule to produce anything powerful. You might be able to generate a lot of 1700s sounding data, you’d have to be careful not to introduce newer concepts or ways of thinking in that synthetic data though. A lot of modern texts talk about rates of change and the like in ways that are probably influenced by preexisting knowledge of calculus.
Without passing opinion on GP's point, I think that just proves it's hard to establish a data set that doesn't bias toward the result you're hoping to find.
Time cutoff LLMs are regularly posted to HN. It takes just one success to prove feasibility.
Besides, we can forecast our thoughts and actions to imagined scenarios unconditioned on their possibility. Something doesn't have to be possible for us to imagine our reactions.
I don't think it's really feasible - there just isn't enough training data before calculus. I would guess all the mathematical and philosophical texts available to Newton and Leibniz would fit on a CD-ROM with loads of space to spare.
The “cars stopping in random places everywhere in any remotely urban area” thing has become a huge problem in general. It’s probably our clearest sign of the fundamental scalability problems of car-centric design.
Assuming we can’t significantly reduce car usage (and noting that you can still prioritize bike/pedestrian-friendliness and assume this), we really need regular car equivalents to bus stops. For Waymo or human rideshare drivers, or just non-transactional human families, say, dropping grandma off at a brunch restaurant. And significant fines + license points for anyone who stops anywhere outside them, like they do now, once established. The idea is no different than frequent trash cans and significant littering fines, really.
(I’m just spitballing here and am open to being wrong, just putting the idea out there as someone who’s noticed how much worse driving in cities has become over time.)
In France, especially in Paris, you have large "delivery" parking places where you are allowed to stop but not to park.
Unfortunately, with the rise of bike lanes, those spots are now quite dangerous, as the delivery person has to cross the lane to access the sidewalk and bicycles refuse to slow down, as usual.
We’ve had a looming crisis for decades of young people increasingly not understanding a lot of the fundamentals of mathematical logic. And I think treating LLMs (which are amazing tools) as “AI,” and having it play this type of role, is the final step towards a lot of unrecoverable self-destruction.
We need to remember that the core of what “logic” is can be understood by every human mind, and that it’s our individual responsibility to endeavor to build this understanding, not delegate or hand-wave it. For all of human history, delegating/hand-waving away basic logic that can be understood by actuarial/engineering types has never gone well in the long term.
You do if what you are implementing requires it. Beyond this, if you don't understand the code the AI agent outputted, you shouldn't let other people run it in production.
I’d say both local data structures and algorithms atop them, and external services like DBs, etc., are both just “resources” in a more abstract sense. Optimizing performance is a matter of using the right resources for the right things. Algorithms help a lot when you’re building FE components (even if the server is rendering them, or “rendering” responses for the FE).
I’d also argue “micro-ORMs” like Diesel (which isn’t really much like ActiveRecord, Hibernate, etc., but more a very thin DSL/interface that maps SQL types to Rust types), combined with LLMs, are the ideal solution (assuming we still want humans to be able to easily understand and trust the code generated). And there’s a big argument to be made for schema migration management being done at the app level (with plain SQL for migrations).
All that said, at work, we use Rails. And ActiveRecord’s “includes/preload/eager_load” methods are fantastic solutions to 99% of cases of querying for things efficiently, and are far more clear than all the SQL you’d have to write to replicate them.
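To make the "querying for things efficiently" point concrete, here's a rough sketch of the N+1 problem that eager loading avoids. It's in Python with the standard `sqlite3` module rather than Rails, and the `authors`/`posts` tables, the `run` helper, and both functions are made up for illustration. The second function only mimics the general shape of a preload (one query for the records, one batched `IN (...)` query for the association); it's not what ActiveRecord literally executes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO posts VALUES (1, 1, 'First'), (2, 2, 'Second'), (3, 1, 'Third');
""")

query_count = 0  # counts how many SQL statements each approach issues


def run(sql, params=()):
    """Execute a query and bump the counter (hypothetical helper)."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def titles_with_authors_n_plus_one():
    """One query for posts, then one extra query per post for its author."""
    result = []
    for _id, author_id, title in run("SELECT id, author_id, title FROM posts ORDER BY id"):
        (name,) = run("SELECT name FROM authors WHERE id = ?", (author_id,))[0]
        result.append((title, name))
    return result


def titles_with_authors_preloaded():
    """One query for posts, one batched query for all referenced authors."""
    posts = run("SELECT id, author_id, title FROM posts ORDER BY id")
    author_ids = sorted({author_id for _id, author_id, _title in posts})
    placeholders = ",".join("?" for _ in author_ids)
    rows = run(f"SELECT id, name FROM authors WHERE id IN ({placeholders})", author_ids)
    names = dict(rows)  # author id -> name
    return [(title, names[author_id]) for _id, author_id, title in posts]


query_count = 0
slow = titles_with_authors_n_plus_one()
n_plus_one_queries = query_count

query_count = 0
fast = titles_with_authors_preloaded()
preloaded_queries = query_count

print(n_plus_one_queries, preloaded_queries)
```

Both return the same data; the difference is that the first approach's query count grows with the number of posts, while the second stays at two. Having the ORM write that batching for you is most of the appeal.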
I’ve only taken two buses in Brazil (Goiânia to Pirenópolis and back), and can definitely report that this was not the case there. It was incredibly hot and dry there until you hit the mountains, and the AC barely worked. Granted, I think this was one of the crappier bus lines, and they had a monopoly on this particular route.
There’s an entire Linux distro (Asahi) for MacBooks. Apple has never released a Mac with a locked bootloader.
And macOS frankly provides a far better Unix experience than ChromeOS, in my experience, having actually used both (including for development, though only for a short time on ChromeOS because it was horrible).
Apple did not lock the bootloader, but they do not provide documentation for their products.
What would have been a trivial porting work with documentation, becomes extremely time-consuming and hard work without documentation.
That is why Asahi Linux lags by several years with the support for Apple computers, and it is unlikely that this lag time will ever be reduced. Even for the old Apple computers the hardware support is only partial, so such computers are never as useful for running Linux as AMD/Intel based computers.