The opposite of that has been happening for 20 years now with cloud compute.
It won't happen with AI models either.
It's almost ingrained in the American business model now. Outsource everything. Nobody wants to manage a room full of servers when they can spend 2-3x as much and outsource that headache along with the responsibility for it.
Same will happen with AI. Whether that means paying Anthropic that premium or paying AWS.
I'm in a relatively small business, we recently had an outage related to our local infrastructure.
I got pressure from the CEO saying it wasn't reliable to host our own infrastructure anymore even though our total internal down time over the last 5 years is significantly less than even a single of the larger recent AWS outages.
Everyone wants to shuck the chore and the responsibility.
> The opposite of that has been happening for 20 years now with cloud compute.
It won't happen with AI models either.
AI is different.
Cloud computing genuinely is cheaper on average. It's better than paying for cisco servers, and at scale, it's cheaper than managed platforms (ala Heroku), and it's a coin toss for when you're in the middle ground and constantly approaching the point of rebuilding poor-man versions of existing products but with very very expensive engineering salaries.
In contrast, local models offer dramatic savings, and are magnitude of orders better in certain aspects: like stability - the performance is all over the place with traditional AI companies as they divert compute to their next big thing.
The benefits to maintaining your own infrastructure are pretty moderate to low, with very high risk.
And also, alternate models are pretty easy to use and easy to swap out unlike the vendor lock-in that exists with cloud services.
I agree. The other thing here is that, once you can run LLMs on a single piece of commodity hardware (whether that includes one GPU or several), the difference between cloud vs. on-premise LLMs will largely be about where your hardware is located. There will be very little software configuration involved (just an HTTP endpoint that talks to the GPU). This is decidedly different from cloud products where the moat of hyperscalers is largely in the software and services on top of the hardware, not the hardware itself. (Sure, GPUs will eventually break & need replacement, too, but there's no state to lose, so that's already orders of magnitude easier than replacing hard drives.)
There's also a difference in the cost of downtime. A server hosting your website or SaaS, if it's down for five minutes, costs you a lot of real revenue. So you plan for redundancy, you set up automatic failover so that if one node goes down the next node can handle the load while the first one reboots, and so on. But for the LLM that's just serving your local model? You can tell everyone "Hey, we're taking it down for a 15-minute window, so plan your lunch break while it's down". Unplanned downtime can interrupt what people were doing and cost you productivity and thus money, but it's a lot easier to schedule planned downtime and have people work on non-model-using tasks during those periods: the model is helpful, but not essential.
You pay a 3x markup to rent a server through AWS than managing your own. You pay for convenience. At shall annals that's fine, but for large companies with their own datacenters, you generally do things in house.
> Cloud computing genuinely is cheaper on average.
For some applications, sure. Availability is a large part of what one is paying for with cloud computing, but it's also something that not every business needs.
If you sacrifice availability and have a pure-compute use case (low durability requirements), on-prem can quickly end up cheaper for far better hardware.
AI is different because you can't encrypt it. An running on someone else's hardware is basically just 'trust me bro! I won't read it!'. Of course you can say that about I.e. database too, but at least you can run it on your own dedicated hardware in some datacenter, so it is password protected, you can encrypt it at rest and you will only know the key.
With AI, no, you can't . model needs plain text to be able to work. If somebody will be able to figure out models with asymmetric keys will make a lot of money.
For many companies (country-dependent) that's not really why they use cloud services vs purchasing. It's tax shenanigans and business process overhead. OpEx vs CapEx, and a small (%) bump in the huge AWS bill no one will even notice or a $30k+ invoice for hardware that has to go through rigorous review and 3 departments.
Same reason people pay for things through the AWS marketplace (like Vanta) instead of having to go through their invoicing process.
Good point. Maybe there'll be companies that maintain your on-premise GPU cluster just like there are companies that service the coffee machine in your office?
It's just not comparable though is it? You need cloud services because it's physically impossible to use your single home computer as a server, CDN, load balancer, mass storage, security service, and distributed system.
But AI is just weights, you can run a reasonably intelligent model at home, or on a few GPUs if you're a small-medium sized company, and it doesn't require dedicated maintenance.
If you're a medium-large company, you should definitely run your own AI because you can max out the CPUs more often. You're not only able to run privately and locally, but you're also able to run efficiently.
> I got pressure from the CEO saying it wasn't reliable to host our own infrastructure anymore even though our total internal down time over the last 5 years is significantly less than even a single of the larger recent AWS outages.
Same here. My job as a software dev does not require me to self-host services we need and use. Quite the opposite. But, I am reluctant to hand over all control to AWS or equivalent for several reasons that I will get into here.
I have found that Infrastructure as Code (IaC) and modern tools like opentofu, ansible, combined with frontier AI models and harnesses gives you superpowers in this space. Almost all of our self-hosted services are fully managed by these tools. e.g. We perform backups and test them more often now than we ever did before. Entirely because it is so much easier to do all of that now.
There is efficiency in the cloud model for models. So maybe there is a scope for Apple or an "Apple for AI" in the AI compute game - mainly from the perspective of privacy etc.
And once the servers are in space, everything is fully out there.
AI is definitely different. Cloud compute is incredibly convenient to the point where even if AWS is more expensive it's just so _nice_. LLM models are much more abstract and while I can't easily swap AWS for Hetzner to save 80% of my costs I can absolutely get close to that for many of LLM tasks, even today.
I suspect Anthropic and gang all know that that's why they are buying up dev tools and shifting towards long-running agents because that's where they can get AWS's "nicesness" that they can charge for.
I suppose cloud won because:
- nobody wants to deal with the networking stack on the internet
- you want servers alive all the time
- it's businesses running their software on servers to serve to customers
That's an interesting take, however there is no ongoing maintenance related to local models, maybe the only effort is giving more capable machines to the workforce; but yeah I can see how it might feel like a barrier.
The hardware, the power systems, the cooling systems. They need maintenance.
The OS needs updates, file systems get corrupted.
Fans get dirty.
All the things that you need to deal with in hosting your own server infrastructure you have to deal with when hosting your own AI infrastructure (which runs on servers...)
However, you can get many of the benefits of a "local model" by outsourcing all the hardware maintenance but still using an open model. Guaranteed repeatability for one.
A lot of the reason people outsource normal software is its brittle security properties, not sure that even applies to an LLM - it can go and look up the latest security best practices just like an engineer can.
on prem cloud is harder because of the scale up and scale down requirements. If you are a growing business which most decent ones are, you constantly have to think about that.
Did you build your own house using tools that you forged from iron-rich ore yourself? Did you grow your own wheat to make bread for your lunchtime sandwich today?
There's a reason most people pay other people to do these things for them.
It's a longstanding management principle, so old that people may not even say it explicitly any more, which states "focus on your core competencies," the corollary of which is "outsource anything that is not a core competency."
I can see how it makes sense for companies, because money is "only money" but an ongoing operational distraction can be much more costly, as in, it can be detrimental to the success of the overall business.
outsource that headache along with the responsibility for it
You know what gives me headaches? When I'm in the middle of a session and the model gets rug-pulled out from under me because somebody at the model provider didn't pay the Trump bill that month.
Or when someone at the model provider decides that the curve-fitting algorithm in my graphics package looks a little too much like Skynet for comfort.
Or when they do any number of other things to undermine my work for the sake of their business model, some of which I won't even notice until the damage is done.
The sad thing is, if you know how inference works, you know that it really is insanely wasteful for everybody to run it locally. If anything naturally belongs in the cloud, it's inference. But at the same time, what choice are we being given?
Inference basically looks like this (neglecting a whole bunch of stuff):
for t in tokens_in_context
for p in model_weights
do something with p*t
The expensive part is fetching each weight from memory, which is why VRAM/HBM is such a big deal. Conceptually, for a huge, dense (non-MoE) model, the inner loop might run a trillion times for every token generated.
Obviously that's not how it really works in practice, but the point is, if you are only running one prompt at a time, each weight gets fetched, applied to the token being processed, and then never touched again until the next token is processed.
So when you submit a prompt to a model that's running a bunch of other peoples' contexts concurrently, it can reuse each weight multiple times before moving on to the next one:
for p in model_weights
for u in users
for t in u's context
do something with p*t
The same is true in an agent-heavy scenario where you have several contexts in play at once.
Worst case, in terms of energy efficiency, is a single user sitting around waiting for a single response. I don't feel like I'm explaining it well, but the core idea is that every time a weight is fetched from memory, you want to get as much work done as possible with it.
Wouldn't it be the purview of the cops to update Flock that the plate is no longer of interest and to stop alerting on it? I'm no fan of Flock, but let's put the onus where it is deserved.
For those of us stuck on normal android, is there a way to achieve that? I know it used to work with some firewall apps but nowdays they all require root access.
It looks like you can't revoke the internet permission, but you can use the firewall via ADB. Settings are lost on reboot, but you can use an automation with Tasker or similar to set them on boot:
Or you can set your DNS resolver to dns.adguard-dns.com and it blocks almost all ads. You can search "private dns" in Android settings app and set it there.
iOS allows this, but only on mobile data, which is pretty infuriating. Why should I not be able to also restrict apps from dialing home/anywhere just because I'm on a Wi-Fi network (which isn't even necessarily unmetered)?
It's really annoying. I have a sudoku game on my phone, works great but give it internet access and it's suddenly full of sketchy adverts.
If I'm playing it on my commute, it's usable with mobile data disabled for the app. But when the train stops in a station long enough to auto-connect to wifi, immediate full screen adverts :(
Then don’t use an ad supported app? I have one as supported app on my phone - Overcast. The developer created their own ad platform and serves topic based ads based on the podcast you are listening to right now. Ironically enough I started to pay for a subscription even though it didn’t give me any real benefit just to support him until he started having ads.
I’m gonna be That Guy for a minute: if you enjoy using a Sudoku app, isn’t there one available on more acceptable terms, e.g. a single purchase or a IAP that removes the ads from this one? I’m not saying you have to pay like $3.99/week for a scam one, but more like pointing out that if you don’t like ads (as I also don’t) why not support the developers who believe in selling software to you for a few bucks rather than selling your annoyance to Google via Adsense?
yes. it's an argument that since EVs are heavier than fossil-fuel vehicles due to their batteries, that they generate more particulate emissions (brakes/tire dust) than fossil-fuel vehicles.
it's a wrong argument, but it's still circulated in groups of factually-challenged people
Nobody said they generate more but simply that they generate some. Modern petrol engines output very little particulates so almost all the particulates are from tyres and brakes. Why would EVs produce any less?
While EVs are heavier—increasing tire wear—their regenerative braking significantly reduces brake dust, and they eliminate tailpipe exhaust entirely. Overall, EVs offer a net reduction in particulates.
> Overall, EVs offer a net reduction in particulates.
Nobody said anything to the contrary.
I am sceptical about the reduction versus a modern, efficient hybrid, though. Those can use regenerative braking too.
EVs are heavier which increases road wear. Everyone loves to forget about the road.
When it comes to particulates and other issues, EVs are just "less bad". We still need to push for walking, cycling and trams and stop pretending that EVs solve the bigger problems. I hate how every comment on HN that doesn't sing the praises of EVs from the rooftops gets immediately downvoted. We can do better than "less bad". We should be aiming much higher.
I wish EVs happened earlier, before the explosion in fossil fuels that led to enormous vehicles with full air-conditioning "cabins" (more like portable living rooms). EVs being slow to charge is an extremely good thing for us. It makes it obvious that this energy isn't free and takes a while to accumulate. If this was obvious from the start, I doubt people would have wanted these huge, inefficient things. Imagine opting for a climate controlled cabin or a larger vehicle if it meant a significant increase in charging time. Nobody would go for it unless they really had to.
> EVs are heavier which increases road wear. Everyone loves to forget about the road.
Passenger vehicles are pretty negligible when it comes to road wear compared to trucks (1000 times more). The weight is more important when we consider freight trucks (electric freight trains just get the power from overhead cables or a third rail). As freight trucks transition to electric, we will definitely have more road wear to worry about.
> When it comes to particulates and other issues, EVs are just "less bad".
Is this a perfect is the enemy of good argument? I mean sure, using public transit, bikes, and walking is better than using private personal transportation. But I can tell you...Beijing has all of that and electric cars are still much better than the ICEs they used to have.
> I hate how every comment on HN that doesn't sing the praises of EVs from the rooftops gets immediately downvoted.
All kinds of Perfect is the enemy of good comments generally get downvoted because the fallacy is overused on HN.
> Imagine opting for a climate controlled cabin or a larger vehicle if it meant a significant increase in charging time.
The WSJ and Daily Mail both ran stories with headlines explicitly stating that they generate more particulates. I can't find any credible source stating the same, so I'm assuming the stories were the usual agenda fiction, but they do exist.
It's an argument that means you can still say cars are bad even if they're electric, which may be true but also clearly leans into some people's preexisting biases
It won't happen with AI models either.
It's almost ingrained in the American business model now. Outsource everything. Nobody wants to manage a room full of servers when they can spend 2-3x as much and outsource that headache along with the responsibility for it.
Same will happen with AI. Whether that means paying Anthropic that premium or paying AWS.
I'm in a relatively small business, we recently had an outage related to our local infrastructure.
I got pressure from the CEO saying it wasn't reliable to host our own infrastructure anymore even though our total internal down time over the last 5 years is significantly less than even a single of the larger recent AWS outages.
Everyone wants to shuck the chore and the responsibility.
reply