
If you are a senior developer [1] responsible for delivering projects and you have to delegate to mid-level ticket takers, you are already dealing with developers who are also non-deterministic and whose quality you can never fully trust.

Hell, my coding is non-deterministic, with different degrees of quality depending on what else I have going on.

But just like a developer, an LLM can also reason over intent based on clearly named functions, modularity, etc.

[1] If someone is pulling well-defined tickets off the board, they are a mid-level developer regardless of title.



An LLM cannot reason about anything. It can produce text that someone reading it can plausibly interpret as reasoning. When a human provides a plausible explanation, it means they either had someone else provide it to them or they actually understand the issue. An LLM cannot understand anything; it can only produce output based on training data in which similar input is likely to have produced similar output in the past. A human can tell you they don't understand or don't know something, but an LLM is unlikely to have the training that would produce that kind of output. You are more likely to always get something that looks correct but might not be.

An LLM can automate the part of the process where a human might take slightly longer, but ultimately any output generated by an LLM cannot be trusted and should be checked by a human who understands the issue...and that is actually the hard part, where humans will struggle, so they won't actually do it.

When a human produces the output, they are performing the following actions:
- analysing the issue
- analysing the existing process
- building an understanding of the existing process
- building an understanding of how the issue affects the existing process
- producing the output to address the issue in the existing process
- checking the output as it is being produced
- updating their understanding of the existing process with lessons learned from the above
- checking the final product to ensure that it has solved the original issue and hasn't broken some other part of the system

An LLM can help speed up one of those steps (producing the output) at the expense of slowing down the others (which were already slow) and reducing the understanding and reliability of the existing system, which will make future iterations even slower.

An LLM can be used to speed up the generation of examples, but just as in the past you could not just copy an example from some random internet search result, you should not just copy LLM output without understanding it...and that is the slow part, where an LLM might not help (and might actually make things worse) for most people.

And in the past, when you encountered comprehensive and well-documented output, you could assume that the human who put in that amount of effort actually understood what they were doing and wouldn't have expended that much effort to generate garbage. You cannot make that same assumption now with LLMs.


Again, this is not true, and here is a recent real-world example.

For context: for the project I'm about to describe, I did the 3-week discovery process where I iterated through the design. I designed the architecture from an empty AWS account with IaC and an empty git repo. I know every decision that was made and why.

An issue was reported while the client was testing - a duplicate message was displayed to the user.

I gave Codex three pieces of information: the two duplicate IDs and the fact that the message was a duplicate.

Codex:

1. It created and ran a query against the Postgres database after finding the ARN of the credentials. (You don't have to pass credentials to the database in AWS; you pass the Secrets Manager entry directly to the database, as long as you have permission to both. This was the Dev account.) I didn't tell it which database to use or where I was storing the event.

2. It found the lambda that stored the events in the database.

3. It looked at the CloudFormation template to figure out that the Lambda was triggered by messages in an SQS queue.

4. Looking at the same template, it saw that the SQS queue was subscribed to an SNS topic.

5. It found the code that sent the events - a 3,000-line Lambda.

6. It was able to explain what the Lambda did and determine that there wasn't a bug in the logic.

7. It saw that the flow was data-driven and got its information from a DynamoDB (DDB) table whose name was set by an environment variable.

8. It then looked at the CloudFormation template that deployed that Lambda.

9. After using that CloudFormation template to figure out the schema, it ran a query on the DDB table.
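Step 1 most likely relies on the RDS Data API (available for Aurora, including Aurora PostgreSQL), where each call names the cluster ARN and a Secrets Manager secret ARN instead of a username and password. A minimal sketch, assuming boto3's `rds-data` client; the table `events`, the column `event_id`, and the function name `query_events_by_id` are hypothetical, and the client is passed in so the sketch can run without AWS:

```python
import json
from typing import Any


def query_events_by_id(client: Any, cluster_arn: str, secret_arn: str,
                       database: str, event_ids: list[str]) -> list[dict[str, Any]]:
    """Look up events by ID through the RDS Data API (hypothetical schema).

    `client` is expected to behave like boto3.client("rds-data").
    """
    if not event_ids:
        return []  # "IN ()" is not valid SQL, so skip the call entirely
    # One named placeholder per ID (:id0, :id1, ...); the Data API binds by name.
    placeholders = ", ".join(f":id{i}" for i in range(len(event_ids)))
    params = [
        {"name": f"id{i}", "value": {"stringValue": event_id}}
        for i, event_id in enumerate(event_ids)
    ]
    response = client.execute_statement(
        resourceArn=cluster_arn,  # the Aurora cluster to run against
        secretArn=secret_arn,     # Secrets Manager entry; no password in code
        database=database,
        # Hypothetical table and column names.
        sql=f"SELECT * FROM events WHERE event_id IN ({placeholders})",
        parameters=params,
        formatRecordsAs="JSON",   # rows come back as a JSON string
    )
    return json.loads(response["formattedRecords"])
```

In real use you would pass `boto3.client("rds-data")`. The caller's IAM identity needs permission both to call the Data API and to read the secret, which lines up with the "as long as you have permission to both" remark above.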

It then told me that there was a duplicate entry in the database.
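Because a DynamoDB primary key already prevents two items with the same key, a "duplicate entry" in a data-driven table like the one in step 7 presumably means two items with different keys but the same content. A minimal sketch of that check, assuming the scanned items are already in memory as plain dicts; the function name and the attribute names `pk`, `sk`, and `message` are hypothetical:

```python
from collections import defaultdict
from typing import Any, Iterable


def find_duplicate_items(items: Iterable[dict[str, Any]],
                         content_attrs: tuple[str, ...]) -> dict[tuple, list[dict[str, Any]]]:
    """Group items by the given attributes and keep only groups with more than one item."""
    groups: dict[tuple, list[dict[str, Any]]] = defaultdict(list)
    for item in items:
        # A missing attribute becomes None so the item still gets a grouping key.
        key = tuple(item.get(attr) for attr in content_attrs)
        groups[key].append(item)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Hypothetical rows, shaped like the plain dicts a boto3 Table.scan() returns in "Items".
rows = [
    {"pk": "flow#1", "sk": "step#1", "message": "Welcome"},
    {"pk": "flow#1", "sk": "step#2", "message": "Your order shipped"},
    {"pk": "flow#1", "sk": "step#3", "message": "Your order shipped"},
]
dupes = find_duplicate_items(rows, ("pk", "message"))
for key, group in dupes.items():
    print(key, [row["sk"] for row in group])
# prints: ('flow#1', 'Your order shipped') ['step#2', 'step#3']
```

For a real table, a full Scan has to follow `LastEvaluatedKey` across pages before grouping; the sketch skips that and only shows the comparison itself.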

I knew the entire structure of the system; again, I designed all of this myself. I wanted to see how Codex would do.

Everything you are saying an LLM can't do, a modern LLM can do.

I won't even go into how well it debugged a vibe-coded internal website just from my telling it to use a Docker container with headless Chromium and Playwright. It debugged the site by taking screenshots while navigating and making changes.



