>At each level a caller might need 5% of the functionality of any given dependency. The deeper the dependency tree gets, the more waste piles on. Eventually you end up in a world where your simple binary is 500 MiB of code you never actually call, but all you did was take that one dependency to format a number.
So what is the compiler doing? Why doesn't it remove unused code?
I guess "dependency" here means something higher-level, which your compiler can't assume you will never use.
For example, you know you will never call one of the main functions in the parsing library with one of its arguments set to "XML", because you know for sure that XML is not used in your domain (say, you have a solid project constraint that puts XML out of scope).
Unfortunately, the XML-handling code makes up 95% of the library, and you can't tell your compiler "I won't need this; I promise never to call that function with the argument set to XML."
Why can't the compiler detect that it will not be used? Tree shaking is well implemented in JavaScript compilers, an ecosystem that suffers extensively from this problem.
It should be possible to build a dependency graph and analyze which functions can actually end up in scope. After all, the same is already done for closures.
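A minimal sketch of the idea, assuming a toy call graph where every function name is hypothetical: start from the entry point, walk every edge breadth-first, and drop whatever was never visited. This is plain reachability over a static call graph, which is the core of both tree shaking and dead code elimination at function granularity.

```python
from collections import deque

# Hypothetical call graph: each function maps to the functions it may call.
CALL_GRAPH: dict[str, set[str]] = {
    "main": {"format_number"},
    "format_number": {"pad_left"},
    "pad_left": set(),
    "parse_xml": {"build_dom", "resolve_entities"},
    "build_dom": set(),
    "resolve_entities": set(),
}


def reachable(graph: dict[str, set[str]], roots: set[str]) -> set[str]:
    """Return every function transitively callable from the roots (BFS)."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        fn = queue.popleft()
        for callee in graph.get(fn, set()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def shake(graph: dict[str, set[str]], roots: set[str]) -> dict[str, set[str]]:
    """Keep only the functions some root can reach ("tree shaking")."""
    live = reachable(graph, roots)
    return {fn: callees for fn, callees in graph.items() if fn in live}


kept = shake(CALL_GRAPH, {"main"})
print(sorted(kept))  # ['format_number', 'main', 'pad_left']
```

The XML functions disappear here only because nothing reachable from `main` has an edge to them. The moment one reachable function *might* call `parse_xml`, the whole XML subtree is kept.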
A more realistic example: something like printf or scanf. It can take arguments of multiple types. It reads the computer's locale from the environment and does locale-dependent number and date formatting, and it also supports the various time zones that it reads from the OS.
And you always run it in a data center that uses one specific locale, only the UTC time zone, and very few simple types. But all of this can only be known at runtime, except maybe the types, if the compiler is good.
You as the implementer might know the user will never input XML, so doc_format can't be 'xml' (you might even add error handling in case the user does input it), but how can you communicate this to the compiler?
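A toy model of why a runtime value defeats dead code elimination, assuming a hypothetical library whose `parse` function dispatches on `doc_format`. Edges are guarded by the argument value; if the optimiser can prove the value at build time (a simplified form of constant propagation), it follows only the matching edge, but if the value arrives at runtime it must assume every branch may be taken. All function names here are illustrative, not a real library's API.

```python
from collections import deque
from typing import Optional

# Hypothetical library: parse() dispatches on its doc_format argument.
# Each edge is (callee, guard): guard None means "always may be called",
# otherwise the edge is taken only when doc_format equals the guard.
GUARDED_CALLS = {
    "main": [("parse", None)],
    "parse": [("parse_json", "json"), ("parse_xml", "xml")],
    "parse_json": [],
    "parse_xml": [("build_dom", None), ("resolve_entities", None)],
    "build_dom": [],
    "resolve_entities": [],
}


def live_functions(doc_format: Optional[str]) -> set[str]:
    """Functions the optimiser must keep.

    doc_format is the value the compiler can prove at build time,
    or None if it is only known at runtime (user input, config file...).
    """
    seen = {"main"}
    queue = deque(["main"])
    while queue:
        fn = queue.popleft()
        for callee, guard in GUARDED_CALLS[fn]:
            # An unknown argument means every guarded edge might be taken.
            may_call = guard is None or doc_format is None or guard == doc_format
            if may_call and callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


print(sorted(live_functions("json")))  # ['main', 'parse', 'parse_json']
print(sorted(live_functions(None)))    # the XML subtree is kept as well
```

Your error handling for 'xml' lives at runtime too, so it does not help the analysis. In practice the usual escape hatch is a build-time switch (a preprocessor define, a Cargo feature, a bundler-level constant), and that only works if the library author exposes one.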
What you're calling "tree shaking" is more commonly called "dead code elimination" in compilers, and is one of the basic optimisations that any production compiler would implement.
A surprising amount of code may sit on rarely used or undocumented code paths (for example, ones taken only when the DEBUG environment variable is 1, or because a plugin is enabled even if it is never actually used), and so it is not shaken out by the compiler.
Plenty of libraries with "verbose" logging flags ship way more than you'd assume. I remember lots of npm libraries that require `winston`, for example, and are runtime-configurable. Or Java libraries that require Log4j. With Rust it's getting hard to remember, because everything today seems to pull in the fucking kitchen sink...
And even going beyond "debug", plenty of libraries ship features that are downright unwanted by consumers.
The two most famous recent examples are Heartbleed (in OpenSSL's TLS heartbeat extension) and Log4Shell (in Log4j's JNDI lookup feature).