More

Netch · 2026-05-18T13:35:07 1779111307

From a bystanderʼs POV it is excessively hard to memorize all the mess with multiple different extensions. The naming style doesnʼt alleviate the task. But this is a typical issue in the whole RISC-V ecosystem.

What Iʼm slightly confused for is that all these extensions, useful for a minor part of applications, arenʼt moved to longer instructions (6-byte).

rwmj · 2026-05-20T08:52:23 1779267143

I agree the naming is annoying. However the extensions aren't too hard to understand. They are much more regular than on other architectures. I wrote a deep dive into them here: https://research.redhat.com/blog/article/risc-v-extensions-w...

Also groups of extensions are consolidated into Profiles, so in practice you don't really care about individual extensions. You'll only care that the hardware supports eg RVA23.

brucehoult · 2026-05-19T02:43:48 1779158628

The average bystander doesn't have to care, just buy a machine implementing the RVA23 profile (standard set of extensions) and be happy.

If you're building your own embedded hardware then you determine what your needs actually are e.g. do you need double precision? half precision? vector?. Then you choose a chip implementing that. Then you copy the ISA string from your chip's documentation to the `-march=` argument for GCC/Clang and be happy.

It's not hard and you don't have to think about it unless you very specifically want to.

bjourne · 2026-05-20T08:53:15 1779267195

The average bystander might want to write high-performance code for their risc-v cpu. Then they must know precisely which instructions are available and what the performance implications of using them are. E.g., the difference between a shared and non-shared fp register file is huge.

rwmj · 2026-05-20T08:55:36 1779267336

For the "average bystander" they're going to buy an OS and compatible hardware, or if they're the average programmer they're going to use a compiler and libraries that solve the problem already for them. Very very few people need to worry about the details.

akarpathy · 2026-05-20T09:35:43 1779269743

The average programmer still must inform their compiler what to use. Claude Code can assist with this.

andrepd · 2026-05-20T10:04:55 1779271495

You need Claude Code to copy a string into your config/make file?

trollbridge · 2026-05-20T14:17:40 1779286660

I suspect the PC was being satirical, but yes, this is quite common now.

brucehoult · 2026-05-20T12:58:21 1779281901

If you want to get the absolute most out of a specific CPU that is in your hands then you of course have to refer to the documentation for that specific CPU.

That process doesn't depend on whether it's an x86 or an Arm or a RISC-V.

That's why x86 people refer to the HUGE document maintained by Agner Fog.

If you want your code to run well on all standards-compliant implementations then you write according to the ISA documentation, in this case RVA23. Or ARMv9-A. Or x86_64 v3.

bjourne · 2026-05-20T19:19:25 1779304765

Nope. I want to get the most out of all cpus that will run my code. This is a combinatorial problem that grows exponentially by the number of relevant extensions. So, yes, you need to know the hardware, but accounting for combinations of 5 features is way easier than accounting for combinations of 10 features.

Riscv is basically repeating the same mistake X11 did. A minimal base that could be varied endlessly by combining extensions. I didn't work for X11. Some extensions became de facto mandatory (shm) while others fell by the wayside. But you could never rely on the availability of a given extension because someone somewhere might not have had it or disabled it. Then Wayland came along and cleaned up the gazillion extensions mis-design because it was a huge PITA. Riscv will get there too, sooner or later.

panick21_ · 2026-05-20T08:55:46 1779267346

You think the average person writes performance optmized code?

If you are on that level then you know pretty well what you are targeting. And even then in 99% of cases you just look at the top level profile.

If you do performance analysis for some specific embeded project that is not using a standard profile, then its a bit more work, but hardly impossible.

bjourne · 2026-05-20T09:29:51 1779269391

Bruh, the "average person" won't buy a riscv-based computer in decades. The average bystander to the riscv project indeed writes high-performance code for their, so far, mostly non-existent or emulated riscv processors.

panick21_ · 2026-05-20T11:54:35 1779278075

Your seriously arguing the the avg person write performance code so critical that minor difference in hardware implementation are relevant? Most people write code that isn't that performance critical, fireware or they are porting existing software over. A extreme minority of people that interact with an ISA is hand optimizing code.

RetroTechie · 2026-05-20T18:40:46 1779302446

Lol... the RISC-V ecosystem has loooong passed that stage. RISC-V is eating into markets from deeply embedded to automotive, high-end server cpu's to specialized accelerators. That's mass-produced hard silicon.

It's here to stay, coming to a device near you Real Soon Now (tm).

LoganDark · 2026-05-20T12:20:13 1779279613

Do high-performance RISC-V CPUs (that you can actually buy) still exist? SiFive Unleashed was great but IIRC it was a single batch that never returned.

brucehoult · 2026-05-20T13:11:17 1779282677

It depends on what you call "high performance".

I have in my hands one of the new SpacemiT K3 machines. It arrived today. I'm comparing it to several other things, and finding that it is pretty comparable to a "late 2012" Mac Mini with a i7-3720QM with base 2.6 GHz turbo 3.6 GHz running Ubuntu 24.04. They are quite close in feel for general use, web browsing, code editing, watching YouTube etc. The Mac is a little faster on many things, a LOT slower on others (anything that can use 8 cores, obviously).

You might say that's not "high performance" but we thought it was pretty good a dozen years ago.

The previous SpacemiT K1 chip two years ago was more like one of the last Pentium IIIs or PowerpC G4s, except with a lot more cores.

SpacemiT have a next generation K5 coming out, they say, at the end of the year. Tenstorrent have their new Ascalon-X core comparable to Apple's late 2020 M1 — and designed by the same guy who designed the M1. They've taped out a chip using that and say they'll be selling a dev board in Q2 or Q3. For now the first version is using an old chip process and it will be running at half the clock speed of the M1, but that's still going to be a very decent machine.

The HiFive Unleashed was of course 8 years ago. Since then there have been the HiFive Unmatched (roughly like Cortex A55) and the HiFive Premier P550 (a bit better than Cortex A72, other than no SIMD).

LoganDark · 2026-05-20T17:27:54 1779298074

> You might say that's not "high performance" but we thought it was pretty good a dozen years ago.

Definitely sounds pretty high-performance compared to basically every RISC I've seen (and including nearly every cell phone I've ever owned with the exception of the Apple ones).

Tenstorrent is awesome, can't wait to see if I can afford any of their hardware in 5ish years. I miss when you could buy TPUs as a consumer (Coral etc.)

fweimer · 2026-05-20T10:01:00 1779271260

How does the average bystander buy an RVA23 machine today?

self · 2026-05-20T10:50:35 1779274235

You buy a https://www.spacemit.com/products/keystone/k3 board. A few links:

- https://sipeed.com/k3

- https://milkv.io/jupiter2-dev-kit

More here: https://redd.it/1tcrx4o

fweimer · 2026-05-20T11:26:05 1779276365

The Arace purchase link for the Jupiter 2 kit says it's “in stock“, but it's actually for a discount coupon. The actual system can only pre-ordered. The Sipeed web site does not say anything about shipping timelines, and the products are not offered in their AliExpress store. I think the Sipeed boards are in preorder, too.

Of course, neither of these are machines. And the average bystander probably isn't used to importing computer parts directly from China, either.

brucehoult · 2026-05-20T12:53:37 1779281617

I have a machine in my hot little hands. It arrived today.

https://x.com/BruceHoult/status/2056911834737975635/photo/1

I've already posted on github my first project written on and for it today:

https://github.com/brucehoult/k3_ai

Sipeed have posted photos four days ago of the first batch of customer orders going out:

https://x.com/SipeedIO/status/2055549071931404291

> the average bystander probably isn't used to importing computer parts directly from China, either.

It won't take long for them to be available on amazon, just as D1 and JH7110 and K1 boards are now. e.g.

https://www.amazon.com/Orange-Pi-RV-Frequency-Development/dp...

fweimer · 2026-05-24T09:27:34 1779614854

Interesting. Is this a full system and not just a board? This is still not quite clear to me.

Hopefully, one of these systems gets produced in such large quantities that there's some pressure to add mainline Linux support.

ilidur · 2026-05-20T14:52:48 1779288768

Deep Computing have started taking orders for the final product and the Preorders are shipping within the next 6 weeks. They will be shipping from China I expect, but it's a proper shop front.

https://store.deepcomputing.io/products/dc-roma-risc-v-mainb...

camel-cdr · 2026-05-18T16:24:36 1779121476

> From a bystanderʼs POV it is excessively hard to memorize all the mess with multiple different extensions

It's the same for other ISAs.

> What Iʼm slightly confused for is that all these extensions, useful for a minor part of applications, arenʼt moved to longer instructions (6-byte).

Because these instructions don't need it. There will be future >4-byte instructions, for things thay can't resonably be done in 4-bytes, e.g. much larger immediates.

pclmulqdq · 2026-05-20T09:30:53 1779269453

It's way worse on RISC-V. There are maybe 5 x86 or ARM variants to care about at any given time, even if you want to hyper-optimize your code. RISC-V has a soup of literally 100s of extensions with non-uniform use and support.

camel-cdr · 2026-05-20T11:59:14 1779278354

There are a lot more ARM extensions than people are aware of. E.g. debian uses ARMv8-A with FEAT_FP and FEAT_AdvSIMDas a base. Yes, floating-point and SIMD are optional in ARMv8-A, as are the following ISA extensions, only including ones that add instructions and excluding the AArch32 stuff: FEAT_CRC32, FEAT_AES, FEAT_PMULL, FEAT_SHA1, FEAT_SHA256, FEAT_RDM, FEAT_F32MM, FEAT_F64MM, FEAT_I8MM, FEAT_LSMAOC, FEAT_SHA3, FEAT_SHA512, , FEAT_SM3, FEAT_SM4, FEAT_SVE, FEAT_EPAC, FEAT_FCMA, FEAT_JSCVT, FEAT_LRCPC, FEAT_DotProd, FEAT_FHM, FEAT_FlagM, FEAT_LRCPC2, FEAT_BTI, FEAT_FRINTTS, FEAT_FlagM2, FEAT_MTE, FEAT_MTE2, FEAT_RNG, FEAT_SB, FEAT_BF16, FEAT_DGH, FEAT_EBF16, FEAT_CSSC, ...

Also fun: FEAT_LittleEnd, FEAT_MixedEnd, FEAT_BigEnd

All of that was just 64-bit ARMv8.x-a, there is a lot more stuff, once you go to R or M profiles, 32-bit and previous versions.

The reason this is mostly not a problem, is that distros converged on a minimum of 64-bit ARMv8-A + FP + SIMD, which will also happen with RVA23 for RISC-V.

Just for fun, here are the Zen4 ISA flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3 dnowprefetch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 clzero xsaveerptr arat npt nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload avx512vbmi umip avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm

Compared to RVA23 written out: rv64imafdcbv_zicsr_zicntr_zihpm_ziccif_ziccrse_ziccamoa_zicclsm_zic64b_za64rs_zihintpause_zba_zbb_zbs_zicbom_zicbop_zicboz_zfhmin_zkt_zvfhmin_zvbb_zvkt_zihintntl_zicond_zimop_zcmop_zcb_zfa_zawrs_svbare_svade_ssccptr_sstvecd_sstvala_sscounterenw_svpbmt_svinval_svnapot_sstc_sscofpmf_ssnpm_ssu64xl_sha_supm_zifencei

pclmulqdq · 2026-05-20T14:07:20 1779286040

I will note that you listed out all of the RVA23 instruction extensions, not all of the blessed RISC-V instruction set extensions. Here's the list of every ratified RISC-V instruction set extension, to get parity with the list you gave for the other ISAs:

M, A, F, D, Q, C, B, H, Zicsr, Zifencei, Zicntr, Zihpm, Zihintpause, Zihintntl, Zicbom, Zicbop, Zicboz, Zicond, Zicfilp, Zicfiss, Zimop, Zca, Zcb, Zcd, Zce, Zcf, Zcmp, Zcmt, Zcmop, Zclsd, Zilsd, Zmmul, Zfh, Zfhmin, Zfa, Zfbfmin, Zfinx, Zdinx, Zhinx, Zhinxmin, Zaamo, Zalrsc, Zawrs, Zacas, Zabha, Zalasr, Zba, Zbb, Zbc, Zbs, Ztso, Zbkb, Zbkc, Zbkx, Zknd, Zkne, Zknh, Zksed, Zksh, Zkn, Zks, Zkt, Zk, Zkr, Zve32x, Zve32f, Zve64x, Zve64f, Zve64d, Zve, Zvl32b, Zvl64b, Zvl128b, Zvl256b, Zvl512b, Zvl1024b, Zvl, Zv, Zvfh, Zvfhmin, Zvfbfmin, Zvfbfwma, Zvbb, Zvbc, Zvkb, Zvkg, Zvkn, Zvknc, Zvkned, Zvkng, Zvknha, Zvknhb, Zvks, Zvksc, Zvksed, Zvksg, Zvksh, Zvkt, Sm1p11, Sm1p12, Sm1p13, Smaia, Smepmp, Smstateen, Smcdeleg, Smcsrind, Smcntrpmf, Smrnmi, Smdbltrp, Smmpm, Smnpm, Smctr, Ss1p11, Ss1p12, Ss1p13, Ssaia, Ssccfg, Sscsrind, Sscofpmf, Sstc, Ssqosid, Ssdbltrp, Ssnpm, Sspm, Ssctr, Supm, Sv32, Sv39, Sv48, Sv57, Svinval, Svnapot, Svpbmt, Svadu, Svvptc, Svrsw60t59b, Sdext, Sdtrig

That doesn't look very short to me.

These are grouped into profiles, like "Skylake" or "Cortex-M33" or "Neoverse-N1." The main issue for RISC-V isn't the number of instruction set extensions, it's the number of profiles. RVA23 is one single blessed profile, but many chips will add a few more instructions or include fewer than RVA23 based on age of the chip.

RetroTechie · 2026-05-20T19:09:28 1779304168

Common Linux distros will target one of the profiles, or a commonly supported subset like RV64GC.

Beyond that, what other extensions a particular board or chip supports, doesn't affect regular uses like web browsing. Specific apps or software libraries may use an ISA extension if present. Same as for other ISAs.

Code for embedded systems is optimized for the exact cpu in there. Same thing for highly specialized jobs (scientific / datacenter type stuff).

In short: yes, fragmentation wrt ISA extensions, hardware & software support exists. In practice, it isn't a big problem as some claim it to be.

dwattttt · 2026-05-20T12:23:17 1779279797

That sure is a long list. But written out like that it gets a bit misleading: does there exist anything with that same list, just missing pae? mmx? syscall? Just because they have individual names & flags, doesn't mean every combination of them exists.

jcranmer · 2026-05-20T15:02:58 1779289378

The Intel manuals list the set of features that are removed or planned to be removed from newer hardware versions: Sub-page write permissions for EPT, xAPIC mode, Key Locker, Uncore PMI. IA32_DEBUGCTL MSR, bit 13 (MSR address 1D9H), Intel® Memory Protection Extensions (Intel® MPX), MSR_TEST_CTRL, bit 31 (MSR address 33H), Hardware Lock Elision (HLE), VP2INTERSECT. AMD's manuals suggests that they view the ISA as purely additive, but I haven't read them in detail.

Basically, outside of MPX, and the confusing lineage of AVX-512 on client versus server parts, x86 is pretty strictly additive.

benj111 · 2026-05-20T10:33:26 1779273206

What are you imagining? If this is desktop then most of the extensions are going to be standard.

The only reason they're optional is because I'm using the same instruction set on my Pico, so no it doesn't have floating point, and I believe it has integer divide but I wouldn't be surprised if it didn't.

And the extensions are in groups, a good chunk of which are compressed instructions, which unless you're writing assembly, you don't need to worry about.

In fact most of this you don't need to worry about unless youre writing assembly.

imtringued · 2026-05-20T10:57:49 1779274669

Electronics distributors search engines tend to work extremely poorly and if you try to overload them with an absurd variety of niche extensions, then nobody is going to find the right RISC V MCU for their needs.

addaon · 2026-05-20T16:17:13 1779293833

> There are maybe 5 x86 or ARM variants to care about at any given time

What? There are individual chips with nearly that many ARM variants, including incompatible ISAs (M0 vs R52) and compatible-but-very-different-performance-characteristics implementations of the same ISA (M4 vs M7, say). Even figuring out what portion of code can be shared across which cores (and for those that distinguish between ARM and Thumb mode, what mode that code can be called in), vs what code needs duplicate versions for different cores for correctness, vs what code needs duplicate versions for performance but not correctness (which changes as the code usage pattern evolves) can be a challenge on a single chip; I can't imagine a world where you can think about only five across an entire industry.

wg0 · 2026-05-20T10:26:00 1779272760

> It's the same for other ISAs.

No they are not. See the Intel Software Programmer Volumes. Highly detailed, highly structured and highly specific.

Joker_vD · 2026-05-20T10:31:51 1779273111

You're joking, right?

wg0 · 2026-05-20T13:26:57 1779283617

No. Because I read about Intel in detail for a long time. Those volumes are part of my digital library.

Tired finding similar quality documentation on ARM and RISC-V and came empty handed.

fidotron · 2026-05-20T12:12:53 1779279173

From the point of view of the RISC-V architects the "users" are the chip designers who are engaged in a sort of build-your-own-instruction-set situation, and this kind of makes sense, but does contribute to it being a mess.

They are absolutely in denial as to the downstream effects of this on the software ecosystem. Android, for example, for native support had enough fun dealing with relatively few ARM variants (and x86/MIPS etc), and identifying chip features at run time was reliant on the board support software getting it right (hint: it didn't).

pjmlp · 2026-05-20T11:29:36 1779276576

It is like Khronos APIs but in hardware, design by comittee at its best.

Netch · 2026-05-18T12:11:15 1779106275

A normal situation in my tasks is when the working copy contains lots of changes that are used for debug (mainly prints) but these changes shall not be committed to the proposed change. For this, even interactive adding (`git add -i`) does not satisfy; I need `git add -e` which allows editing in a patch form, and remove the temporary local changes.

Netch · 2025-07-22T11:54:59 1753185299

Slightly side note - I donʼt understand why C++ invents new stdio based functions like std::print but rejects to import funopen() or fopencookie() which make stdio flexible similar to iostreams. As for me it should have been done ≈20 years ago.

Netch · on Feb 8, 2025

With the manner exposed by the most feature-full DEs, Iʼd expect setting default to ~/Cache (not ~/cache, the latter is for the userʼs discretion).

But, I tend to agree with the main articleʼs premise that without a special teaching for the issue most newcomers will lose the issue.

Netch · on Dec 16, 2024

Nice work! Perseverance to find the root cause in such an unfriendly environment as any bootstrap is what saves all us from the civilization crash...

Netch · on Dec 15, 2024

RISC-V designers explicitly declare in their base specification: "The AMOs were designed to implement the C11 and C++11 memory models efficiently." So at least one example is present.

Netch · on Dec 14, 2024

> Python might have used array[~0] instead

This is what was once added to C#: arr[^idx], when this ^idx is mapped to a special object, typically optimized then out. arr[^0] means the last element.

neonsunset · on Dec 14, 2024

[^n] indexing is mapped to an 'Index' struct by Roslyn which can then be applied to any array or list-shaped type (it either has to expose Index-accepting indexer, or a plain integer-based indexer and Count or Length property. There really isn't much to optimize away besides the bounds check since there are no object allocations involved.

A similar approach also works for slicing the types with range operator e.g. span[start..end].

Netch · on Dec 14, 2024

> In Europe, we typically mark the ground-floor as floor-zero,

_Western_ Europe. Eastern Europe prefers 1-based numbering. The reason, typically assumed, is that thermal isolation, required due to colder winters, causes at least one stair segment between entrance and the sequentially first floor.

Netch · on Dec 14, 2024

> find the convention used in many countries of numbering building floors starting with zero to be more logical.

Ukrainian here. Multi-floor buildings always have at least one stair section to first floor due to need of thermal basement isolation. (I guess this is not pertaining to Western Europe due to more clement winters.) And, yep, it is called "first" floor. Using zero number is rare but possible (in this case it is called "tsokolny" floor) if a real "basement floor" is present, but in this case still 1-based numbering is preferred.

Netch · on Dec 14, 2024

This division, using SRT loop with 2 bit output per iteration, perhaps would have already been microcoded - but using the lookup table as an accelerator. An alternative could use a simpler approach (e.g. 1-bit-iteration "non-restoring" division). Longer but still fitting into normal range.

But if they had understood possible aftermath of non-tested block they would have implemented two blocks, and switch to older one if misworking was detected.

tliltocatl · on Dec 15, 2024

That was before dark silicon and real estate was still somewhat expensive, so probably having two blocks wasn't really an option.