Comp.lang.c Frequently Asked Questions (c-faq.com)
124 points by hliyan on Sept 12, 2021 | 54 comments


Newsgroup FAQs are a goldmine. I still find myself referring to the comp.graphics.algorithms FAQ repeatedly:

http://www.faqs.org/faqs/graphics/algorithms-faq/


Does anyone know what happened to rtfm.mit.edu? It was, quite recently, an FTP site where all the Usenet FAQs were archived.


Might have something to do with Firefox removing FTP support.


Connecting with an FTP client shows that the connection times out.


1.1. How should I decide which integer type to use?

Programmer: Hmmm, I think these three should be shorts and these 10 can fit in chars.

Modern compiler: Fuck that, you're all 32-bits. I ain't got time for unaligned memory access...


And then you get to the unfortunate situation that is the x64 ABIs - int is still 32-bit, leading to a bunch of extra movsx instructions if you use ints for things like indexing.


Use size_t to declare anything that will be used as an index.


This is literally the reason `size_t` was created.


Why isn't it called index_t then?


As a fellow student of "why is this named that, what does this name mean" I have found that asking about counterfactuals is rarely satisfying. Naming stuff is hard and the question assumes a level of intent that I think is rarely present. Asking "how did this thing get named that" works out better, because that occasionally gets written down.


My point was that it seems more likely that size_t was created to represent sizes than to represent indexes.

I don't disagree that size_t is an appropriate type for indexes, but I don't think indexes are literally the reason it was created.


It's named for sizeof, a C operator which returns a positive integer. Annoyingly, the core C language itself doesn't define what the type of that integer is; the standard library does, though, naming it size_t.

Now, sizeof does measure the size of things, but, one of the obvious sizes you can measure is an array†, and that's definitely also the maximum index value for the array, so I do think it's fair to say that's (part of) literally why size_t exists.

† One of C's treacherous footguns. In the scope where the array was defined it's an array, and sizeof(array) tells you how big that array is. But, passed as a parameter it becomes a pointer and sizeof(resulting_pointer) is the size of the pointer, not the array :(
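A minimal sketch of that footgun, for anyone who hasn't been bitten yet. The function names are made up for illustration, and the parameter is written as a pointer to make explicit what an array parameter really is:

```c
#include <stddef.h>

/* An array passed as an argument arrives as a pointer to its first
   element, so sizeof here reports the pointer's size, not the array's. */
size_t size_seen_by_callee(const int *values)
{
    return sizeof values;   /* same as sizeof(const int *) */
}

/* In the scope where the array is defined, sizeof sees the whole array. */
size_t size_seen_in_defining_scope(void)
{
    int values[10] = {0};
    return sizeof values;   /* 10 * sizeof(int) bytes */
}
```

The usual workaround is to pass the element count alongside the pointer.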


>Now, sizeof does measure the size of things, but, one of the obvious sizes you can measure is an array†, and that's definitely also the maximum index value for the array

Make sure to be careful and realize that sizeof my_array returns the number of bytes that my_array uses, not the number of elements. An array of 10 ints likely has sizeof == 40, while indexing past 9 would be undefined behavior.
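The usual idiom for getting the element count rather than the byte count is to divide by the size of one element. A small sketch (the macro name ARRAY_LEN and the function are hypothetical, and the macro only works on a real array, not on a pointer):

```c
#include <stddef.h>

/* Number of elements in a true array: total bytes / bytes per element. */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

long sum_ten(void)
{
    int values[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    long total = 0;

    /* Valid indexes run from 0 to ARRAY_LEN(values) - 1, i.e. 0..9. */
    for (size_t i = 0; i < ARRAY_LEN(values); i++)
        total += values[i];

    return total;
}
```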


Excellent point, that largely contradicts my earlier position in fact. I was thinking of an array of chars and should have considered that's only a special case.


The core language says that the result of sizeof is size_t, defined in <stddef.h> (and other headers).

size_t is a typedef (alias) for an implementation-defined unsigned integer type.

It could have been worded differently with the same meaning. For example, the core language (section 6 of the standard) could have said that sizeof yields a result of an implementation-defined unsigned integer type without referring to "size_t". The library section (section 7) already says that size_t "is the unsigned integer type of the result of the sizeof operator". Personally I think that referring to size_t in the language section adds clarity, even if it's slightly redundant.

In a given implementation, a compiler might arrange for sizeof to yield a result of type unsigned long, for example. The corresponding <stddef.h> header must then define size_t as unsigned long for the implementation to be correct.
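If you want to see that for yourself, C11's _Generic can check the type of a sizeof expression at compile time. A quick sketch, assuming a C11 compiler; the function name is invented for the example:

```c
#include <stddef.h>

/* Returns 1 if the type of a sizeof expression is compatible with
   size_t, 0 otherwise. _Generic selects a branch based on the type of
   its controlling expression without evaluating it. */
int sizeof_yields_size_t(void)
{
    return _Generic(sizeof(int), size_t: 1, default: 0);
}
```

On a conforming implementation this returns 1 whatever underlying type size_t happens to be.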


Well, my point is that "more likely" and "more reasonable" are rarely reasons why things were done in the first place so using the benefit of hindsight to try and suss out cause and effect will frequently lead you to plausible but incorrect answers (specifically for the history and naming of things).

Then again, being wrong about something seems faster than asking a question, and this little thread has already unearthed interesting answers, so maybe you have the right of it!


I agree and disagree. Yes it was created for size. But when indexing std::vector you can index up to the size, and thus should use the same type for both.


Because C doesn't really have indexes, it has pointers and offsets. offset_t would make more sense than index_t.


There's also ssize_t and ptrdiff_t for offsets that might be negative. Use is pretty nuanced, read the docs before using, etc.
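A rough illustration of the ptrdiff_t half of that (ssize_t is POSIX rather than standard C, so it's left out here). The function names are hypothetical, and subtracting pointers is only defined when both point into the same array:

```c
#include <stddef.h>

/* Signed distance, in elements, from one pointer to another.
   Both pointers must point into (or one past) the same array. */
ptrdiff_t signed_distance(const int *from, const int *to)
{
    return to - from;
}

/* Walking backwards through an array gives a negative offset,
   which an unsigned size_t could not represent. */
ptrdiff_t demo_backward_offset(void)
{
    int values[8] = {0};
    return signed_distance(&values[5], &values[2]);
}
```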


And intptr_t and uintptr_t for results of computations resulting in pointers. Sadly C's type system isn't really powerful enough to properly take advantage of these.


I think the next (or current - I've lost track) version of C++ has an idx_t type which is an unsigned int of some sort, which will be the recommended type for for loops.


I don't see a reference to "idx_t" anywhere in the latest draft of the C++ standard.


Yep, I can't find anything either - I thought I read it four or five years ago in an interview with someone big in the C++ community.


Because it's the return type of the sizeof operator.


Modern compilers generally prefer register-sized data.


Ah thanks, that's what I was thinking. But somehow I got caught up in unaligned cacheline accesses, which are much larger than 32-bits. It's early where I am. :)


I mean there are 4 different concepts at play here:

Unaligned memory access: types generally have an alignment equal to their size, and structs have the alignment of their largest member, because many architectures get quite cross with unaligned access, e.g. trying to load a 32b integer from memory into a register when it's not aligned to 32b. This is where compilers pad structures to ensure everything is aligned properly by default, and where (in C/C++ anyway) you want to take care of your struct layout to avoid unnecessary padding. That doesn't prevent "these three should be shorts and these 10 can fit in chars." at all, but if you intersperse them the structure will increase in size due to padding.

Then there's C being specified in terms of lowest capabilities, so while "char" and "short" are at least 8 and at least 16 bits… they can also be larger (POSIX does specify that char must be exactly 8 bits but non-POSIX platforms don't require that). That matches everything being 32b. It's a somewhat common occurrence for DSPs to have char of 16 or 32 bits; it really has nothing to do with the compiler (except insofar as the compiler does what the architecture description specifies).

Then there's the preferred data size, which is probably more a function of the architecture's ALUs than of its native registers, e.g. if an architecture only works on 32b data then computations on 8 bit data will require loading 8 bits, zeroing the upper 24, performing the computation in 32b, copying whichever bit is concerned to the overflow flag, then zeroing that. Whereas performing the computation on a 32b-native type would require loading 32b, performing the computation, done (that was never a concern on x86/64 but IIRC older revisions of ARM could only natively work in 32b, and not all possible arithmetic operations got expanded to 8/16b at the same time).

And finally there's the cache line alignment. There are actually two opposite issues with cacheline alignments: you usually want your structure to not span cachelines (as that gets more expensive) but sometimes you want your structures to always start at the start of a cacheline (and possibly take the entirety of the cacheline) to avoid false sharing issues: if two unrelated structures are on the same cache line and they're used from different cores, the cores will need a lot more synchronisation than if they were on different cache lines.
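To make the padding point concrete, here's a sketch of the "three shorts and ten chars" case laid out two ways. The struct and function names are made up, and the actual sizes depend on the ABI; the only thing assumed is that short has a stricter alignment than char, which is true on common platforms:

```c
#include <stddef.h>

/* chars and shorts interleaved: the compiler typically inserts a
   padding byte after each lone char so the next short is aligned. */
struct interleaved {
    char  c0;
    short s0;
    char  c1;
    short s1;
    char  c2;
    short s2;
    char  rest[7];
};

/* Same members grouped by type: shorts first, then the ten chars,
   so little or no internal padding is needed. */
struct grouped {
    short s[3];
    char  c[10];
};

/* Bytes saved by grouping; the value is implementation-specific. */
size_t padding_saved(void)
{
    return sizeof(struct interleaved) - sizeof(struct grouped);
}
```

Both structs hold the same data; only the member order differs.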


cache alignment also gets stupid complicated when you add in what kind of malloc you use, c.f. google's tcmalloc: https://github.com/google/tcmalloc


Also worth reading is C infrequently asked questions https://www.seebs.net/faqs/c-iaq.html


Worth noting that parts of it are trying to be funny, parts of it are plain wrong, and parts of it are dangerously misleading. There may be one or two useful bits in it somewheres too.


It may be based in truth, but it's a humor piece.


For entertainment? Tone seems a bit over the top sarcastic.


If it is only the tone that bothers you, maybe you should abstain from programming C. ;-)


First tone then content...


Written by my former colleague Steve Summit, probably the most knowledgeable person this side of Dennis Ritchie about programming C, especially in Unix environments.


A giant among men ;)


Ooooh, this one's good - http://c-faq.com/struct/retcrash.html


I think I’ve done that one.


While I don’t particularly like the structure of the FAQ (one page per item is definitely tedious to read through - I wish it came in a PDF or scrollable form) this is a goldmine of succinct and zero frills information. I learned a few things.


http://c-faq.com/versions.html has downloads for ascii versions (that may or may not be up to date)

http://www.faqs.org/faqs/C-faq/faq/ has an ascii version from July 3, 2004 (again, I wouldn’t know whether that is up to date, but given its age, it may not be)


> Most programs do not need precise control over these sizes; many programs that do try to achieve this control would be better off if they didn't.

Interesting that modern languages like Rust, Go, and Zig all lean towards using integer types with well defined sizes. I think that we've learned that the less exact definitions in C can cause problems. For example, if you develop and test in an environment where int is 32 bits, you could easily end up with bugs that only exist if int is 16 bits.


Modern C (from C99 onward) introduces a few families of sized types -- [u]int[_least|_fast|][8|16|32|64]_t. The non-least-or-fast types are not guaranteed to exist (e.g., int8_t is optional because implementation may be unreasonable on a 24-bit DSP with no sub-word operations), but the _least and _fast types are generally what you want anyway, unless you're relying on unsigned overflow behavior.

In my experience, modern code is either written almost entirely in terms of sized values like int32_t (when assuming a "normal" platform), or almost entirely in terms of _least/_fast values (when going for maximum portability). The only case that "naked" int/long is still common in this model is in a few specific uses such as loop indices where it's idiomatic.

The _fast and _least types are especially interesting, pragmatically. "int_fast8_t" means "I need a signed value that can store values between -128 and 127, doesn't have defined behavior on overflow, and is allowed to fill a register / use a word worth of memory." "int_least8_t" means "I need a signed value that can store values between -128 and 127, doesn't have defined behavior on overflow, and I am likely to have a significant number of these in memory so please pack them as efficiently as reasonable." (And of course we have bit packing when we need to pack more efficiently than reasonable; bit-packing with int_fast8_t allows you to represent "I need this to take exactly 8 bits of memory, whatever it costs compute-wise, because I'm basically filling memory with these.")

In general, I find that these types give modern C a good balance between declaring requirements on types and not overly trying to control their size.
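A small sketch of how that split might look in practice: a compact _least type for the bulk data sitting in memory, and a _fast type for the value being computed on. The function name is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Samples are stored as int_least8_t because there may be many of them;
   the running total uses int_fast32_t because it is worked on in
   registers and only needs "at least 32 bits, whatever is quickest". */
int_fast32_t sum_samples(const int_least8_t *samples, size_t count)
{
    int_fast32_t total = 0;

    for (size_t i = 0; i < count; i++)
        total += samples[i];

    return total;
}
```

Unlike int8_t, both int_least8_t and int_fast32_t are required to exist on every C99 implementation.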


> bit-packing with int_fast8_t allows you to represent

I assume you meant `int_least8_t`?


Either is valid, but int_fast8_t is more meaningful, in my mind. Using an int_fast8_t encourages the compiler to use the fastest representation, except when explicitly bit-packed. When the value is pulled into a register it's probably not going to matter (unless the compiler is doing something quite silly with regards to handling unsigned overflow for a uint*), but it means that spilling it to the stack, etc, will prefer word-sized operations (which may or may not be preferred, but is my default). Meanwhile, the bit packing is fully specifying the representation when the struct containing the value is written to memory, so there's no distinction between _fast and _least here.


> And of course we have bit packing when we need to pack more efficiently than reasonable; bit-packing with *int_fast8_t* allows you to represent "I need this to take exactly 8 bits of memory, whatever it costs compute-wise, because I'm basically filling memory with these."

AFAICT, "int_fast8_t" is generally typedef to "int", and "int_least8_t" is generally typedef to "char", which conflicts with "I need this to take exactly 8 bits of memory, whatever it costs compute-wise, because I'm basically filling memory with these."


Right. The context here is when combined with bit-packing, i.e. `int_fast8_t : 8` vs `int_least8_t : 8` in a struct or similar.
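Roughly what that looks like, as a sketch. One caveat that isn't from the comments above: the standard only guarantees bit-fields of type _Bool, signed int and unsigned int, so using the <stdint.h> typedefs here relies on the compiler accepting them (the mainstream ones do). The struct and function names are invented for the example:

```c
#include <stdint.h>

/* Both fields are declared 8 bits wide, so the storage layout is set by
   the bit-field width rather than by _fast versus _least. Allowing these
   typedefs as bit-field types is implementation-defined. */
struct reading {
    int_fast8_t  level  : 8;
    int_least8_t offset : 8;
};

/* Store a small value in each field and read it back. */
int roundtrip_level(int v)
{
    struct reading r = {0};
    r.level = (int_fast8_t)v;
    return r.level;
}

int roundtrip_offset(int v)
{
    struct reading r = {0};
    r.offset = (int_least8_t)v;
    return r.offset;
}
```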


Link to source of quote: http://c-faq.com/decl/exactsizes.html

I'm fond of Ada's approach: generally you just specify the range you want and let the compiler figure out the best integer size to use. You only need to specify a size if you're doing low-level work or FFI.


Technically the wrong type was used if no less than 32 bits was desired. A common failing when you learn a compiler/platform or two instead of the language. Such has been called many things, among them 'everything is a VAX'.


Nice list, but would be cooler if it was scrollable. One question per page is a bit tedious.


I often cite section 6. It's the best explanation I've found of the (often confusing and counterintuitive) relationship between arrays and pointers in C.

If you think arrays are really pointers, you need to read section 6 of the FAQ.


I love this one:

Why do some people write if(0 == x) instead of if(x == 0)?

http://c-faq.com/style/revtest.html



The maintainer, Steve Summit, lives up to his name. He’s self-described as “five foot twenty-two”.


Last update in 2005.



