Hacker News | runarberg's comments

I wouldn’t call them effective so much as motivating. I think for people who would not be motivated otherwise, this methodology is actually fine, as the alternative is probably nothing. However, if you are motivated, almost any other method is more effective than Duolingo (or alternatives), even the old Duolingo with the forums and everything.

Simon Willison’s analogy does not apply unless that other team was fired immediately after delivering the image resize service, or (more commonly) the service was built by a one-off contractor. The difference is the trust model. We trust that our company has hired a competent team that maintains knowledge of the image resizing service, responds to bug reports and feature requests, and knows how to fix and implement them.

Now, I have been on HN long enough to know that we used to despise the contractor-written code we depended on.


Why does the team need to be "fired"?

The single person who built the service might just quit and go to another job. They might be external consultants who rotate away when the contract ends. It might be a SaaS product where you control neither the code nor the composition of their team.

We have trusted services, contractors and teams within our companies before. Now suddenly _everyone_ has ALWAYS read and meticulously analyzed every single line of code they have ever imported to a project?


As your parent comment says, it’s about trust. People don’t hire contractors with poor reputations. Same with SaaS services. That’s why you see so much stuff about branding and customer testimonials. It can be gamed, but it usually works well enough.

LLMs have no reputation to lose. Their work may or may not be aligned with your goals, and they can’t care whether they messed up.


Personally, if my company had one person write a utility that my team depended on, and that person quit soon after delivery, I would be pissed. I would demand that my team take ownership of the utility and gain intimate knowledge of it, and I would voice my concerns to the management who decided to hand a task like that to a single person. I would then explain the concept of bus factor to that management, and how they had just violated best practices: next time they are about to hand a task like that to a single person, they should instead hand it to the team that is gonna rely on the utility.

I’ve noticed you are posting a lot of studies: some peer reviewed and some not, some arguing against your point, and some showing mixed results.

Are you a researcher in the pedagogical sciences? Regardless, you have to admit that the original claim has very little evidence behind it despite being testable. And the caveat you tag onto the end is a pretty massive one: from the sources you provided, it seems that the students who use AI in the way you claim has been shown to be effective are in the minority anyway.


I'm not a researcher in the official sense; my interest is that of a parent whose kids are interested in programming and will graduate into a world upended by AI, and who wants to know how best to prepare them for it. I always look to empirical evidence whenever there is a conflict of opinions, and there certainly are many opinions here!

I initially banned them from using LLMs for homework or coding assignments because, as above, my intuition is that you learn best by doing, and you won't learn anything if LLMs do everything for you.

On the other hand, I personally have learned an insane amount about a new subject simply by pair programming and conversing with an LLM. I could not even "cheat" and let the LLM do everything, because the problem I tackled is not really addressed anywhere! This forced me to experiment a lot, which helped me learn very quickly.

This led me to wonder what "disciplined" use of LLMs can do for learning... which is how I came across a whole bunch of these studies.

I think your concern is really about whether LLMs are used with discipline, rather than about the overall effect of LLMs on learning. And I would agree: students will just be too tempted to use them to cheat. However, I think those who have the discipline to use them judiciously can supercharge their learning like never before, but only as long as they do the hard work of "building the muscles" without AI.


> a Junior (in ANY subject) has the ability to LEARN so much faster with an AI research assistant

This is a testable hypothesis with a severe lack of citations. Intuition would argue the opposite. We learn by using our brains; if we offload the thinking to a machine and copy its output, we don’t learn. A child does not learn multiplication by using a calculator, and a language learner will not learn a new language by machine translating every sentence. In both cases, all they’ve learnt is how to use a tool to do what they skipped learning.


This seems to me like one of those things where people go into it with widely different initial assumptions.

1. AI is for cheating and doing the work for you. Obviously it won't help you learn faster because you won't have to do any thinking at all.

2. AI is an always-available question answering machine. It's like having a teaching assistant who you can ask about anything at any time. This means you can greatly accelerate the process of learning new things.

I'm in team 2, but given how many people are in team 1 (and may not even acknowledge team 2 as a possibility), I suspect there may be some core values or different-types-of-people factors at play here.


This is also a testable hypothesis. I would like to see usage statistics before making assumptions here, but my gut feeling is that the overwhelming majority of AI usage (like > 90%) would fall into your category 1.

But even with category 2, I think AI is still not absolved of being a cheating machine. Doing research is a skill, and if you ask AI to do the research for you, that is a skill a junior developer simply never learns.


This is interesting and relevant: https://www.sciencedirect.com/science/article/pii/S095947522...

"The expertise reversal effect is present when instructional assistance leads to increased learning gains in novices, but decreased learning gains in experts."

There's a whole lot of depth to the question of how AI tools support or atrophy learning for different levels of expertise.


Actually, you're both right. Using AI as a supplementary learning aid (i.e. students use AI as a personalized tutor but still do the assignments themselves) produces better outcomes. But using AI as a crutch (i.e. using it to do the assignments) produces worse outcomes.

There is even preliminary research evidence for this, e.g. https://www.mdpi.com/2076-3417/14/10/4115 and https://www.sciencedirect.com/science/article/pii/S2666920X2...


> students use AI as a personalized tutor but still do the assignments themselves.

So your first study actually concludes the opposite. It found that all AI users performed worse, but the effect was smaller for students who used AI as a tutor.

I don’t quite understand the second meta-analysis. I understand that they conclude that using an AI tutor shows significant improvement, but I don’t understand the methodology. I may be misunderstanding, but it seems to simply count papers that show positive outcomes and reach its conclusion that way. I think that methodology is deeply flawed, as it will amplify whichever biases are present in the studies it uses. I also think the lack of control groups is a major issue. If we are comparing an AI tutor to nothing, of course the AI tutor is gonna perform better. We need to compare it to traditional methods. And this is especially relevant in our discussion, because junior developers usually have excellent access to senior developers (via peer review, pair programming, etc.), much better than students’ access to tutors, for that matter.
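To illustrate the concern about tallying positive papers, here is a minimal Python sketch with made-up numbers (the `studies` list is hypothetical and not taken from either paper). Four small studies report a positive effect and one large study reports a small negative one. A simple vote count comes out 4 to 1 in favor, while a fixed-effect, inverse-variance weighted estimate, one standard way of pooling results in a meta-analysis, comes out slightly negative, because the large, precise study carries far more weight.

```python
# Hypothetical (effect size, standard error) pairs; invented, not real study data.
studies = [
    (0.4, 0.4),    # small, imprecise study with a positive effect
    (0.4, 0.4),
    (0.4, 0.4),
    (0.4, 0.4),
    (-0.1, 0.05),  # one large, precise study with a small negative effect
]


def vote_count(results):
    """Tally studies by the sign of their reported effect (positive, negative)."""
    positive = sum(1 for effect, _ in results if effect > 0)
    negative = sum(1 for effect, _ in results if effect < 0)
    return positive, negative


def pooled_effect(results):
    """Fixed-effect pooled estimate: each study is weighted by 1 / SE^2."""
    weights = [1.0 / se ** 2 for _, se in results]
    weighted_sum = sum(w * effect for w, (effect, _) in zip(weights, results))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    print("vote count (positive, negative):", vote_count(studies))
    print("pooled effect:", round(pooled_effect(studies), 3))
```

This only shows that a sign tally ignores study size and precision; a real meta-analysis would also have to deal with heterogeneity and publication bias, which neither method here addresses.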

So from the meta-analysis I picked the paper with the strongest claim (trying to steel-man it), which is this one: https://online-journal.unja.ac.id/JIITUJ/article/view/34809/...

It claims the following in the abstract:

> The results indicated that students employing AI tutors shown significant improvements in problem-solving and personalized learning compared to the control group.

Now, regarding the control group, the paper claims this (also in the abstract):

> Participants were allocated to a control group receiving conventional training and an experimental group utilizing AI technology,

But when I look into the methodology section I see this:

> The researchers classified the patients into two groups: MathGPT and Flexi 2.0

MathGPT and Flexi 2.0 are both AI tutors. Now I am confused: where is the control group, and how was this “conventional training” conducted?

The methodology section actually tells a different story from the abstract:

> This research utilized a quantitative methodology via a quasi-experimental design.

By quasi-experimental design they mean that they tested the same students before and after the AI intervention and concluded that the AI tutor helped them improve. That is not what a control group means, so the researchers are actually lying by omission in the abstract. This is a spectacularly bad experimental design, and I wondered how it passed peer review, so I looked up the publisher: Jurnal Ilmiah Ilmu Terapan Universitas Jambi. Not exactly a reputable journal.
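Why a missing control group matters can be shown with a small sketch, assuming invented scores for a hypothetical AI-tutor group and a hypothetical control group taught conventionally. A one-group pre/post design only reports the first group's gain, which mixes the intervention's effect with ordinary learning over the same period. Subtracting the control group's gain (a difference-in-differences estimate) separates the two.

```python
from statistics import mean

# Hypothetical test scores (out of 100); invented for illustration only.
ai_group_pre = [52, 60, 47, 55, 58]
ai_group_post = [63, 70, 58, 66, 68]
control_pre = [50, 61, 48, 56, 57]
control_post = [60, 71, 57, 66, 67]


def pre_post_gain(pre, post):
    """Average improvement within one group (all a one-group design can report)."""
    return mean(post) - mean(pre)


def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Treatment group's gain minus the control group's gain over the same period."""
    return pre_post_gain(treat_pre, treat_post) - pre_post_gain(ctrl_pre, ctrl_post)


if __name__ == "__main__":
    print("AI group gain:", round(pre_post_gain(ai_group_pre, ai_group_post), 1))
    print("control gain:", round(pre_post_gain(control_pre, control_post), 1))
    print("difference-in-differences:",
          round(diff_in_diff(ai_group_pre, ai_group_post, control_pre, control_post), 1))
```

With these made-up numbers the AI group gains 10.6 points, which looks impressive on its own, but the control group gains 9.8, leaving only about 0.8 points attributable to the tutor. A real study would also need randomization or matching and a significance test; this is only the arithmetic.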

I still stand by my claim that there is no evidence for this testable hypothesis. I suspect that your first link is actually correct in that AI is bad for students, and just less bad if it is used as a tutor.


I hadn't looked at that study you selected, but yeah, the methodology conflicts with the abstract. (It also low-key seems to be an ad for "Flexi 2.0.") It does seem to be a shady paper, with a small N, in a journal of questionable repute.

That said, there are 80+ other studies listed in the meta-study, which is pretty frank about its limitations. (Note the snippets about positive biases in the conclusion.) It goes for quantity over quality and is transparent about the statistical findings of each study (or lack thereof; see the count of "Not reported"s). These references report a myriad of results, but across the spectrum, from well-designed studies at reputable venues to the other end, they follow the same themes, so I don't think this can be dismissed that easily.

But if you want, here's more research with similar findings (some of which I linked in a sibling comment https://news.ycombinator.com/item?id=48241839):

https://scale.stanford.edu/ai/repository/ai-meets-classroom-...

https://arxiv.org/html/2601.20245v2 (from Anthropic)

This article summarizes some of the above and more studies and has similar findings: https://maxmynter.substack.com/p/learn-to-code-with-llms-i-r...


This was the only study from the meta analysis that I read, and I picked it because it made the strongest claim out of all of them.

This is from the opening of the results section of the meta-analysis:

> In the final screening phase, a rigorous full-text analysis evaluated the methodological robustness and empirical validity of the remaining studies. [...] The final corpus comprised 88 studies that demonstrated robust empirical evidence for LLM applications in educational contexts.

The inclusion of the study I read does not give me confidence that this statement is true. And the fact that they reach their conclusion by simply tallying up the positive vs. negative studies makes me conclude that this meta-analysis is practically useless. They do admit this in the conclusion (which is probably why it passed peer review, assuming the peer reviewer didn’t read the same citation I did, because I am 100% certain they would have asked for it to be excluded). But that pretty much leaves us with nothing. We are exactly where we started: no evidence that LLMs help students beyond traditional methods.

Now, I am not gonna read that Anthropic study. It reminds me of cigarette companies finding the health benefits of cigarettes. That leaves the excellent 3-study review. In their first study they found that LLMs have negative effects on students (in line with the first link you showed me). In the second study they found no effect. And in the third study they found mixed (nuanced) effects, where using LLMs as a tutor helped students in one aspect but had negative effects on others. This is by far the best study you have presented to me, but it still does not change my opinion. There is little evidence that LLMs (even when used as a tutor) help people learn better than traditional methods.

What makes me even more against this sentiment is this quote from the conclusion of the 3-study review paper:

> Our results suggest that students prefer to use LLMs to substitute rather than complement learning activities.

So on their own, students are more likely to use LLMs in a way that is harmful to their learning. I would expect similar behavior from junior developers.


As a precondition, I think we have to assume that the person in question 1) wants to learn, 2) is smart enough to absorb new info and apply it, 3) reflects enough to adjust their approach when hitting bottlenecks or making mistakes, and 4) has a drive to create. Without these, self-driven learning is not viable, and that has very little to do with AI.

For such a person, I believe AI can be very empowering for learning. Like Google, Wikipedia, Stack Overflow, and arXiv before it, AI tools give access to a lot of information. They let you quickly dig deep into any topic you can imagine. And yes, the quality is variable, so one needs to find ways to filter and synthesize from imperfect info. But that was also the case before. Furthermore, AI tools can be used to find holes in arguments or papers. And when coding, one can use them to test things out in practice. These are also powerful (albeit imperfect) learning tools. But they will not apply themselves.


Who is talking about self-driven learning? Every workplace teaches its juniors how to do their job and how to become better at it.

And since we are talking about junior developers, it is safe to assume your conditions (1), (2), and (4) are all true; if any of them were false, why would that person have applied for and gotten a job as a junior developer? As for condition (3), every workplace eventually hires a person who does not fulfill it; then they either fire that person, or they give them a talk and the developer grows out of it and changes their behavior to fulfill that condition.

Aside: you listed 4 conditions for learning. I am not sure these are actually recognized as conditions by behavioral science. In fact, I doubt they are, and I suspect these conditions are just your opinions (man).


And in doing so you spend what, 100 watt-hours per bad idea? Compared to how many megawatt-hours of AI’s failed attempts at proving its math capabilities to investors, only to prolong the AI bubble another month?

I bet your stupid ideas also taught you a valuable lesson and you learned at least something from the experience; maybe your next idea won’t be so dumb, and those 100 watt-hours weren’t actually wasted (though it may feel like they were). Compare that to a failed LLM experiment, where all those billions upon billions of computations are completely wasted: the model knows exactly as much after a failed experiment as it did going into it. Those megawatt-hours were simply wasted, turned into heat, paid for by raising the power bills of the datacenter’s neighbors.


I am also an AI skeptic, but I would rather have used the analogy of 1000 monkeys at 1000 typewriters eventually writing the complete works of Shakespeare.

When you consider the amount of computation that went into this discovery, it is less impressive. It's like how, if you spend a lot of fuel, you can travel really fast, much faster than a bicyclist. Similarly Go-engines can beat the best humans at go, but they spend several orders of magnitude more energy to do so.

Mathematicians prove or disprove conjectures all the time and use orders of magnitude less energy to do so. Using LLMs is kind of just throwing money at the problem and hoping it works. In this case it did. But this is not the most efficient way to do it, and it won’t scale.


> Go-engines can beat the best humans at go, but they spend several orders of magnitude more energy to do so.

From what I can find, AlphaGo Zero runs at ~400 watts, while a human brain uses ~20. So that's really only about 1 order of magnitude of difference.
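The arithmetic behind that estimate can be checked in a few lines of Python, using the rough wattages quoted above (~400 W and ~20 W, taken from the comment rather than measured): the ratio is 20, and its base-10 logarithm is about 1.3, so a bit over one order of magnitude. It covers running power only.

```python
import math

# Rough power figures quoted in the comment (watts); not measured values.
ALPHAGO_ZERO_WATTS = 400
HUMAN_BRAIN_WATTS = 20


def orders_of_magnitude(a, b):
    """Base-10 logarithm of the ratio a / b."""
    return math.log10(a / b)


ratio = ALPHAGO_ZERO_WATTS / HUMAN_BRAIN_WATTS
oom = orders_of_magnitude(ALPHAGO_ZERO_WATTS, HUMAN_BRAIN_WATTS)
print(f"ratio: {ratio:.0f}x, orders of magnitude: {oom:.2f}")
```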

Of course, training costs are a different question entirely.


I would call it cultural theft, but a better term is cultural appropriation, and the original cartoon, though iconic, did it worse. Aladdin was first written sometime in the 9th or 10th century (the oldest surviving complete manuscript of 1001 Nights is from the 15th century). It was translated into English in the 18th century.

Disney made a cartoon of the story without understanding the culture it comes from, with the main purpose of selling it to an audience with even less understanding. And the result was a horrible misrepresentation of somebody else’s cultural heritage.


You are arguing in theoreticals, so you should not be surprised if your answers are hypotheticals.

In reality, most art is made because the artist has something to say, and the money they get from it is only motivating inasmuch as it enables the artist to make more art. So I would guess that in a world without copyright protection we would just find other ways to pay artists, and a very similar amount of art would be produced.

You can see an example of this in Iceland, where the market is way too small for art aimed at the domestic market to make enough money solely through sales (possible with music; rare with books; not possible with movies). Instead, the state has an extensive “artist salary” program, which pays artists regardless of how well their art sells. Unsurprisingly, Iceland produces a lot of art and has many working artists.


Cool. Let me know when the government is willing to pay me to write full time---I would love to quit my job and do that instead. I think it's a great idea!

I think you may be too optimistic about the state of affairs under capitalism. Things that don't benefit the owning class very rarely change without direct action from the working class that puts adequate pressure on the rich, i.e. actions that threaten their profits.

In this analogy, most people are already using an excavator; however, some people are using an expensive driverless excavator that sometimes digs the wrong hole, and your boss is wondering why they are paying you to prompt the excavator when they know just as well as you how to prompt the machine to dig big holes.
