
I’m not sure tax depreciation rates are the best measure here. Those GPUs will be used for much longer than 6 years, and the returns from the businesses will be an order of magnitude longer.


The jury is still out on this. Those tax-based depreciation schedules are largely a relic of traditional data centers, where workloads are fairly moderate compared to AI use cases. Additionally, power and rack space constraints can complicate things quite a bit. If next-gen chips are significantly more efficient and you are currently constrained by power availability, you might pull your old servers and replace them with newer ones regardless of how much useful life they have left.


Azure ran its K-80/P-100 fleets a bit longer, for 8-9 years. Google does 9 years for TPUs.

In the current generation there are plenty of questions around:

- viability of training-to-inference cascades (the key to extended life), given custom ASICs hitting production, as Cerebras did early this year.

- energy efficiency of older chips in tight energy environments: new grid capacity constraints alone favor running newer, more efficient chips, even ignoring a perhaps short-term (< 1 year) price shock due to war.

- lower MTBF: compared to older GPUs, modern nodes are 8-GPU clusters built on 2/3 nm process nodes that depend on HBM memory, and the tolerances are much lower, especially for training.

- new DCs are being spun up under less-than-ideal conditions due to permitting, parts supply, and other constraints, which will impact the operating environment.

Notwithstanding all these issues, and even taking a generous 10-year useful life, the expenses dwarf every megaproject before them.
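To make the useful-life question concrete, here is a minimal sketch of straight-line depreciation under different assumed lifetimes. The capex figure is a made-up placeholder, not a number from this thread, and real schedules (tax or book) may not be straight-line.

```python
def annual_depreciation(capex: float, useful_life_years: float, salvage: float = 0.0) -> float:
    """Straight-line depreciation: the same expense in every year of useful life."""
    if useful_life_years <= 0:
        raise ValueError("useful_life_years must be positive")
    return (capex - salvage) / useful_life_years


# Hypothetical fleet cost in USD, chosen only to make the arithmetic easy to follow.
HYPOTHETICAL_CAPEX = 120e9

# Lifetimes discussed in the thread: 3-4 years physical, 6 years tax, 10 years generous.
for life in (3, 6, 10):
    per_year = annual_depreciation(HYPOTHETICAL_CAPEX, life)
    print(f"{life:>2}-year life: ${per_year / 1e9:.0f}B per year")
```

The point is only that the assumed life moves the annual expense by a large factor: going from 3 to 10 years cuts it by more than two thirds, which is why the choice of schedule matters so much here.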


> Those GPUs will be used for much longer than 6 years

Will it be worth the cost of electricity to run them if the flops/watt of newer chips is much higher?


If demand is less than supply, definitely not.

If every latest-gen chip is booked solid and there is still unmet demand, why would you decommission?
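A rough way to frame the exchange above: an old chip is worth keeping powered on as long as what you can charge for its compute exceeds the electricity it burns. The sketch below assumes hypothetical efficiency figures, a hypothetical power price, and a hypothetical market price for compute; it ignores repairs, rack space, and cooling beyond a simple PUE multiplier.

```python
def power_cost_per_pflop_hour(flops_per_watt: float, usd_per_kwh: float, pue: float = 1.0) -> float:
    """Electricity cost in USD of sustaining 1 PFLOP/s for one hour."""
    watts = 1e15 / flops_per_watt      # power needed to sustain 1 PFLOP/s
    kwh = watts * pue / 1000.0         # energy used over one hour, in kWh
    return kwh * usd_per_kwh


def worth_running(revenue_per_pflop_hour: float, flops_per_watt: float,
                  usd_per_kwh: float, pue: float = 1.0) -> bool:
    """True if revenue for the compute covers the electricity to produce it."""
    return revenue_per_pflop_hour > power_cost_per_pflop_hour(flops_per_watt, usd_per_kwh, pue)


# All figures below are illustrative assumptions, not measurements.
OLD_CHIP_FLOPS_PER_WATT = 5e10    # hypothetical older generation
NEW_CHIP_FLOPS_PER_WATT = 2e11    # hypothetical newer generation, 4x more efficient
USD_PER_KWH = 0.08

for name, eff in (("old", OLD_CHIP_FLOPS_PER_WATT), ("new", NEW_CHIP_FLOPS_PER_WATT)):
    cost = power_cost_per_pflop_hour(eff, USD_PER_KWH)
    print(f"{name} chip: ${cost:.2f} of electricity per PFLOP-hour")

# With unmet demand the price of compute stays high and both chips clear the bar;
# with oversupply the price falls and the old chip drops out first.
for price in (5.0, 1.0):
    print(price, worth_running(price, OLD_CHIP_FLOPS_PER_WATT, USD_PER_KWH),
          worth_running(price, NEW_CHIP_FLOPS_PER_WATT, USD_PER_KWH))
```

Under these made-up numbers the old chip costs four times as much electricity per unit of compute, so it is the first to become unprofitable when prices drop, but it still earns its keep while demand outstrips supply.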


Actually, the physical lifetime (not financial depreciation) of AI data center GPUs is even lower (3 to 4 years).


Like, they break? Or it just becomes more profitable for the data center to replace them?


It will become more expensive to fix than to replace. They are also more energy-intensive to operate than the newer generation. MTBF is significant: the older the fleet gets, the higher the failure rates.

A typical node today is an 8-GPU node. You have to keep replacing failed GPUs, at increasing frequency, by cannibalizing parts from other GPUs, since nobody is selling new GPUs of that model anymore.

In addition to outright failure, there are higher error rates in computation; in graphics this tends to show up as flickers, screen artifacts, and so on.

Azure operated K-80s and P-100s for 9 and 7 years respectively, but they were running as 2-GPU nodes and of course were much simpler than today's HBM behemoths on 2/5 nm process nodes. Google operates its custom ASIC TPUs for about 8-9 years.
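Node size alone changes the failure picture. As a minimal sketch, assume each GPU fails independently with the same annual probability (a simplification, and the 5% rate below is a hypothetical placeholder, not a measured figure); then the chance that a node loses at least one GPU in a year grows quickly with GPUs per node.

```python
def node_failure_probability(gpu_annual_failure_rate: float, gpus_per_node: int) -> float:
    """P(at least one GPU in the node fails within a year), assuming independent failures."""
    if not 0.0 <= gpu_annual_failure_rate <= 1.0:
        raise ValueError("gpu_annual_failure_rate must be between 0 and 1")
    return 1.0 - (1.0 - gpu_annual_failure_rate) ** gpus_per_node


# Hypothetical per-GPU annual failure rate, for illustration only.
ASSUMED_RATE = 0.05

for gpus in (2, 8):
    p = node_failure_probability(ASSUMED_RATE, gpus)
    print(f"{gpus}-GPU node: {p:.1%} chance of at least one GPU failure per year")
```

With the same assumed per-GPU rate, an 8-GPU node is roughly three and a half times as likely as a 2-GPU node to need a repair in a given year, before accounting for any increase in the per-GPU rate as the fleet ages.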

With custom inference ASICs like Cerebras hitting production, it is also not clear that cascading NVIDIA training chips down to inference will deliver the 5-6 year useful life.



