Hacker Newsnew | past | comments | ask | show | jobs | submit | sathackr's commentslogin

The opposite of that has been happening for 20 years now with cloud compute.

It won't happen with AI models either.

It's almost ingrained in the American business model now. Outsource everything. Nobody wants to manage a room full of servers when they can spend 2-3x as much and outsource that headache along with the responsibility for it.

Same will happen with AI. Whether that means paying Anthropic that premium or paying AWS.

I'm in a relatively small business, we recently had an outage related to our local infrastructure.

I got pressure from the CEO saying it wasn't reliable to host our own infrastructure anymore even though our total internal down time over the last 5 years is significantly less than even a single of the larger recent AWS outages.

Everyone wants to shuck the chore and the responsibility.


> The opposite of that has been happening for 20 years now with cloud compute. It won't happen with AI models either.

AI is different.

Cloud computing genuinely is cheaper on average. It's better than paying for cisco servers, and at scale, it's cheaper than managed platforms (ala Heroku), and it's a coin toss for when you're in the middle ground and constantly approaching the point of rebuilding poor-man versions of existing products but with very very expensive engineering salaries.

In contrast, local models offer dramatic savings, and are magnitude of orders better in certain aspects: like stability - the performance is all over the place with traditional AI companies as they divert compute to their next big thing.

The benefits to maintaining your own infrastructure are pretty moderate to low, with very high risk.

And also, alternate models are pretty easy to use and easy to swap out unlike the vendor lock-in that exists with cloud services.


> AI is different.

I agree. The other thing here is that, once you can run LLMs on a single piece of commodity hardware (whether that includes one GPU or several), the difference between cloud vs. on-premise LLMs will largely be about where your hardware is located. There will be very little software configuration involved (just an HTTP endpoint that talks to the GPU). This is decidedly different from cloud products where the moat of hyperscalers is largely in the software and services on top of the hardware, not the hardware itself. (Sure, GPUs will eventually break & need replacement, too, but there's no state to lose, so that's already orders of magnitude easier than replacing hard drives.)


There's also a difference in the cost of downtime. A server hosting your website or SaaS, if it's down for five minutes, costs you a lot of real revenue. So you plan for redundancy, you set up automatic failover so that if one node goes down the next node can handle the load while the first one reboots, and so on. But for the LLM that's just serving your local model? You can tell everyone "Hey, we're taking it down for a 15-minute window, so plan your lunch break while it's down". Unplanned downtime can interrupt what people were doing and cost you productivity and thus money, but it's a lot easier to schedule planned downtime and have people work on non-model-using tasks during those periods: the model is helpful, but not essential.

There's no economic reason why running a model locally should be better than using a cloud hosted version.

“There is no reason anyone would want a computer in their home." - Ken Olson, Founder of Digital Equipment Corporation, in 1977

In hindsight this is getting truer, what with the push of dumb terminal for everyone

Everyone has at least one in their pocket right now though.

Sure there is. Keeping your IP in house.

You pay a 3x markup to rent a server through AWS than managing your own. You pay for convenience. At shall annals that's fine, but for large companies with their own datacenters, you generally do things in house.

> Cloud computing genuinely is cheaper on average.

For some applications, sure. Availability is a large part of what one is paying for with cloud computing, but it's also something that not every business needs.

If you sacrifice availability and have a pure-compute use case (low durability requirements), on-prem can quickly end up cheaper for far better hardware.


AI is different because you can't encrypt it. An running on someone else's hardware is basically just 'trust me bro! I won't read it!'. Of course you can say that about I.e. database too, but at least you can run it on your own dedicated hardware in some datacenter, so it is password protected, you can encrypt it at rest and you will only know the key.

With AI, no, you can't . model needs plain text to be able to work. If somebody will be able to figure out models with asymmetric keys will make a lot of money.


For many companies (country-dependent) that's not really why they use cloud services vs purchasing. It's tax shenanigans and business process overhead. OpEx vs CapEx, and a small (%) bump in the huge AWS bill no one will even notice or a $30k+ invoice for hardware that has to go through rigorous review and 3 departments.

Same reason people pay for things through the AWS marketplace (like Vanta) instead of having to go through their invoicing process.


Good point. Maybe there'll be companies that maintain your on-premise GPU cluster just like there are companies that service the coffee machine in your office?

This is far more likely than everyone racking their own servers.

> on-premise GPU cluster

Renting a GPU server from a cloud and hosting your own llama.cpp is the path of least resistance.


It's just not comparable though is it? You need cloud services because it's physically impossible to use your single home computer as a server, CDN, load balancer, mass storage, security service, and distributed system.

But AI is just weights, you can run a reasonably intelligent model at home, or on a few GPUs if you're a small-medium sized company, and it doesn't require dedicated maintenance.


If you're a medium-large company, you should definitely run your own AI because you can max out the CPUs more often. You're not only able to run privately and locally, but you're also able to run efficiently.

> I got pressure from the CEO saying it wasn't reliable to host our own infrastructure anymore even though our total internal down time over the last 5 years is significantly less than even a single of the larger recent AWS outages.

Same here. My job as a software dev does not require me to self-host services we need and use. Quite the opposite. But, I am reluctant to hand over all control to AWS or equivalent for several reasons that I will get into here.

I have found that Infrastructure as Code (IaC) and modern tools like opentofu, ansible, combined with frontier AI models and harnesses gives you superpowers in this space. Almost all of our self-hosted services are fully managed by these tools. e.g. We perform backups and test them more often now than we ever did before. Entirely because it is so much easier to do all of that now.


IMO local-vs-cloud may be a misleading dichotomy, versus:

    1. Individual dev machines
    2. Shared local server
    3. Shared server in corporate cloud
    4. Third-party LLM SaaS provider
Even if you don't want your laptop melting, there are still some important differences between 3 and 4 in terms of data privacy and security.

There is efficiency in the cloud model for models. So maybe there is a scope for Apple or an "Apple for AI" in the AI compute game - mainly from the perspective of privacy etc.

And once the servers are in space, everything is fully out there.


> It won't happen with AI models either.

AI is definitely different. Cloud compute is incredibly convenient to the point where even if AWS is more expensive it's just so _nice_. LLM models are much more abstract and while I can't easily swap AWS for Hetzner to save 80% of my costs I can absolutely get close to that for many of LLM tasks, even today.

I suspect Anthropic and gang all know that that's why they are buying up dev tools and shifting towards long-running agents because that's where they can get AWS's "nicesness" that they can charge for.


Still though, perhaps the existence of low-margin, generic, cloud LLM's puts some downward pressure on the 'brand name' companies?

I suppose cloud won because: - nobody wants to deal with the networking stack on the internet - you want servers alive all the time - it's businesses running their software on servers to serve to customers

Do these apply to AI?


That's an interesting take, however there is no ongoing maintenance related to local models, maybe the only effort is giving more capable machines to the workforce; but yeah I can see how it might feel like a barrier.

The hardware, the power systems, the cooling systems. They need maintenance.

The OS needs updates, file systems get corrupted.

Fans get dirty.

All the things that you need to deal with in hosting your own server infrastructure you have to deal with when hosting your own AI infrastructure (which runs on servers...)


However, you can get many of the benefits of a "local model" by outsourcing all the hardware maintenance but still using an open model. Guaranteed repeatability for one.

A lot of the reason people outsource normal software is its brittle security properties, not sure that even applies to an LLM - it can go and look up the latest security best practices just like an engineer can.


on prem cloud is harder because of the scale up and scale down requirements. If you are a growing business which most decent ones are, you constantly have to think about that.

> Everyone wants to shuck the chore and the responsibility.

Which gives all the power to the big techs. I'll never understand why the average company seems to have no problem with this.


Did you build your own house using tools that you forged from iron-rich ore yourself? Did you grow your own wheat to make bread for your lunchtime sandwich today?

There's a reason most people pay other people to do these things for them.


It's a longstanding management principle, so old that people may not even say it explicitly any more, which states "focus on your core competencies," the corollary of which is "outsource anything that is not a core competency."

I can see how it makes sense for companies, because money is "only money" but an ongoing operational distraction can be much more costly, as in, it can be detrimental to the success of the overall business.


> in the American business model

AI company valuations won't survive if they're only for the "American business model".


Exactly. American businesses aren't even particularly efficient or well run

outsource that headache along with the responsibility for it

You know what gives me headaches? When I'm in the middle of a session and the model gets rug-pulled out from under me because somebody at the model provider didn't pay the Trump bill that month.

Or when someone at the model provider decides that the curve-fitting algorithm in my graphics package looks a little too much like Skynet for comfort.

Or when they do any number of other things to undermine my work for the sake of their business model, some of which I won't even notice until the damage is done.

The sad thing is, if you know how inference works, you know that it really is insanely wasteful for everybody to run it locally. If anything naturally belongs in the cloud, it's inference. But at the same time, what choice are we being given?


What about inference suggests it naturally belongs in the cloud?

Inference basically looks like this (neglecting a whole bunch of stuff):

    for t in tokens_in_context
        for p in model_weights
            do something with p*t
The expensive part is fetching each weight from memory, which is why VRAM/HBM is such a big deal. Conceptually, for a huge, dense (non-MoE) model, the inner loop might run a trillion times for every token generated.

Obviously that's not how it really works in practice, but the point is, if you are only running one prompt at a time, each weight gets fetched, applied to the token being processed, and then never touched again until the next token is processed.

So when you submit a prompt to a model that's running a bunch of other peoples' contexts concurrently, it can reuse each weight multiple times before moving on to the next one:

    for p in model_weights
        for u in users
           for t in u's context
              do something with p*t
The same is true in an agent-heavy scenario where you have several contexts in play at once.

Worst case, in terms of energy efficiency, is a single user sitting around waiting for a single response. I don't feel like I'm explaining it well, but the core idea is that every time a weight is fetched from memory, you want to get as much work done as possible with it.


That makes a lot of sense, thank you. I think a pirate cloud of local models could make sense, but that would be regulated into oblivion

The Mark of the Beast


Do you do consulting?

Please reach out to me if so.

My username at gmail


"They can't remove it without knowing who the warrant is for" is absolutely Flocks problem.

They're alerting on a license plate but yet somehow they can't turn off that license plate alert using just the license plate number? Fucking bullshit


Wouldn't it be the purview of the cops to update Flock that the plate is no longer of interest and to stop alerting on it? I'm no fan of Flock, but let's put the onus where it is deserved.


I can tell you've never worked on government software.


MEGA is now headquartered in Hungary...who until very recently was run by someone very much aligned with the far right movement.


GrapheneOS still does this -- allows controlling internet access on a per-app basis.


It's one of the big reasons I advocate for graphene even if one chooses to install Google services afterward.

Also notable: as of last year, OnePlus allowed mobile and WiFi network toggle, effectively doing the same thing.


For those of us stuck on normal android, is there a way to achieve that? I know it used to work with some firewall apps but nowdays they all require root access.


Rethink DNS can block internet access of an app (besides doing DNS-based blocking, etc.): https://rethinkdns.com

It uses the VPN functionality, but you can stack a Wireguard VPN on top of it.


Netguard No Root Firewall still works for me: https://github.com/M66B/NetGuard


+1 for Netguard, it is awesome. A bit clumsy UI, but indispensible.


It looks like you can't revoke the internet permission, but you can use the firewall via ADB. Settings are lost on reboot, but you can use an automation with Tasker or similar to set them on boot:

https://www.reddit.com/r/tasker/comments/1mxjnvs/how_to_bloc...


Not the same thing, but you can install an app like Blokada Libre to block ads and trackers in all apps.

https://blokada.org/


Or you can set your DNS resolver to dns.adguard-dns.com and it blocks almost all ads. You can search "private dns" in Android settings app and set it there.


This has the disadvantage that you can’t whitelist specific domains, which is something I need pretty often.


You can signup for private adguard dns, then you should be able to whitelist domains.


Go to settings > App > $SCUMMY_APP > Mobile Data & WiFi. Uncheck all.


Not a thing on stock android


Why does Apple not give that Wi-Fi option there? I mean, is there a reason we’d be sympathetic to?


iOS allows this, but only on mobile data, which is pretty infuriating. Why should I not be able to also restrict apps from dialing home/anywhere just because I'm on a Wi-Fi network (which isn't even necessarily unmetered)?


It's really annoying. I have a sudoku game on my phone, works great but give it internet access and it's suddenly full of sketchy adverts.

If I'm playing it on my commute, it's usable with mobile data disabled for the app. But when the train stops in a station long enough to auto-connect to wifi, immediate full screen adverts :(


Then don’t use an ad supported app? I have one as supported app on my phone - Overcast. The developer created their own ad platform and serves topic based ads based on the podcast you are listening to right now. Ironically enough I started to pay for a subscription even though it didn’t give me any real benefit just to support him until he started having ads.

I’ve found a lot of useful podcasts from the ads.


The OS ought to let you deny internet access to an app entirely, but DNS-based adblocking might solve your problem: https://mullvad.net/en/help/dns-over-https-and-dns-over-tls


I’m gonna be That Guy for a minute: if you enjoy using a Sudoku app, isn’t there one available on more acceptable terms, e.g. a single purchase or a IAP that removes the ads from this one? I’m not saying you have to pay like $3.99/week for a scam one, but more like pointing out that if you don’t like ads (as I also don’t) why not support the developers who believe in selling software to you for a few bucks rather than selling your annoyance to Google via Adsense?


It's almost like 'shadow-banned'

it's not listed as flagged or dead, but it's also not on the first 4 pages despite https://www.pbs.org/newshour/world/trump-warns-a-whole-civil... being on the 4th page with only 23 points


It was on page #11 last I checked, past 330.


Its on /active I think


This absolutely demands the attention of every American


yes. it's an argument that since EVs are heavier than fossil-fuel vehicles due to their batteries, that they generate more particulate emissions (brakes/tire dust) than fossil-fuel vehicles.

it's a wrong argument, but it's still circulated in groups of factually-challenged people


Nobody said they generate more but simply that they generate some. Modern petrol engines output very little particulates so almost all the particulates are from tyres and brakes. Why would EVs produce any less?


While EVs are heavier—increasing tire wear—their regenerative braking significantly reduces brake dust, and they eliminate tailpipe exhaust entirely. Overall, EVs offer a net reduction in particulates.


Now compare that to a 2200lb civic with a 5 speed you can engine brake with.


> Overall, EVs offer a net reduction in particulates.

Nobody said anything to the contrary.

I am sceptical about the reduction versus a modern, efficient hybrid, though. Those can use regenerative braking too.

EVs are heavier which increases road wear. Everyone loves to forget about the road.

When it comes to particulates and other issues, EVs are just "less bad". We still need to push for walking, cycling and trams and stop pretending that EVs solve the bigger problems. I hate how every comment on HN that doesn't sing the praises of EVs from the rooftops gets immediately downvoted. We can do better than "less bad". We should be aiming much higher.

I wish EVs happened earlier, before the explosion in fossil fuels that led to enormous vehicles with full air-conditioning "cabins" (more like portable living rooms). EVs being slow to charge is an extremely good thing for us. It makes it obvious that this energy isn't free and takes a while to accumulate. If this was obvious from the start, I doubt people would have wanted these huge, inefficient things. Imagine opting for a climate controlled cabin or a larger vehicle if it meant a significant increase in charging time. Nobody would go for it unless they really had to.


> EVs are heavier which increases road wear. Everyone loves to forget about the road.

Passenger vehicles are pretty negligible when it comes to road wear compared to trucks (1000 times more). The weight is more important when we consider freight trucks (electric freight trains just get the power from overhead cables or a third rail). As freight trucks transition to electric, we will definitely have more road wear to worry about.

> When it comes to particulates and other issues, EVs are just "less bad".

Is this a perfect is the enemy of good argument? I mean sure, using public transit, bikes, and walking is better than using private personal transportation. But I can tell you...Beijing has all of that and electric cars are still much better than the ICEs they used to have.

> I hate how every comment on HN that doesn't sing the praises of EVs from the rooftops gets immediately downvoted.

All kinds of Perfect is the enemy of good comments generally get downvoted because the fallacy is overused on HN.

> Imagine opting for a climate controlled cabin or a larger vehicle if it meant a significant increase in charging time.

It really depends on how much you need to drive.


The WSJ and Daily Mail both ran stories with headlines explicitly stating that they generate more particulates. I can't find any credible source stating the same, so I'm assuming the stories were the usual agenda fiction, but they do exist.


It's an argument that means you can still say cars are bad even if they're electric, which may be true but also clearly leans into some people's preexisting biases


ev particulate is identical if not less than fossil fuel.

same tires (actually a little harder due to being LRR tires) same brakes (that get used significantly less thanks to regenerative braking)


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: